Could additional percentiles please be added to the aggregation types for the GroupBy and Column Aggregator nodes?
At the moment there is the median, which is the 50th percentile, but what about the upper and lower quartiles (75th and 25th percentiles respectively), as well as the 90th and 10th percentiles?
I don't see a very efficient way of calculating these on multiple columns of a table without resorting to loops, counting rows, applying variables and suchlike.
Indeed, those are already on the feature list. However, it's not that easy, because several definitions exist, especially for quartiles (e.g. when the quartile position is not a whole number). You would want to choose which definition to use.
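To illustrate the point about competing definitions, here is a minimal sketch in plain Python (not KNIME code, and not necessarily the definition KNIME will use). It relies on the standard library's `statistics.quantiles`, which offers two interpolation methods; the helper name `quartiles_by_method` and the sample values are made up for this example.

```python
from statistics import quantiles


def quartiles_by_method(values):
    """Return the three quartile cut points under two common definitions.

    "exclusive" and "inclusive" are the two methods the Python standard
    library provides; other tools implement further variants.
    """
    return {
        method: quantiles(values, n=4, method=method)
        for method in ("exclusive", "inclusive")
    }


# Hypothetical sample: eight values, so the quartile positions are not whole numbers.
sample = [1, 2, 3, 4, 5, 6, 7, 8]
result = quartiles_by_method(sample)

for method, cuts in result.items():
    print(method, cuts)
```

For this sample the lower quartile comes out as 2.25 under the "exclusive" method and 2.75 under the "inclusive" one, while the median is 4.5 in both cases. That is why a node offering quartiles would need to state, or let you pick, the definition.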
OK, thanks, glad to know it's on your radar.
Isn't the real problem not in the GroupBy node, but in the inefficient implementation of loops? If loops were (almost) as fast as single nodes, there would be no need for more aggregations, right? Just my two cents...
I doubt it is the loops per se that are the problem.
A large part of the inefficiency in loops (and everything else in KNIME) is in the data coming out of and going back into nodes.
That data needs to be parsed, checked, processed, formatted, written back to files, compressed, decompressed, etc.
None of that is needed once a single node does all your processing.
Besides, the issue here is not execution speed, but that the algorithm has to be recreated from nodes. On that I agree 100% with Simon: it would be nice if those aggregation types existed.
Even if there are different definitions, picking one and explaining the choice would already be very helpful.
I'm pleased to announce that with the next KNIME release (end of July) the GroupBy node will offer many more statistics, e.g. second moment, quantile, correlation, covariance, sum of squares/logs, kurtosis, skewness, and MAD (mean/median absolute deviation).