I have a table sorted by 'volume percentage' as follows:
buy_trader_id volume volume percentage
3 23136 15.207
201 19046 12.519
9 13488 8.866
166 8937 5.874
31 8113 5.333
.......................... (and so on)
Now I wish to obtain traders contributing top 25% of volume. Here, they are the first two: '3' and '201' since the sum of volume percentages is 15.207+12.519 > 25%
In other words, I want to use a condition so that I get only those rows for which cumulative sum of volume percentages becomes 25%
I added a simpler example to aarons one. You can actually use the java snippet to calculate the cumulative sum of a row, and the rest is filtering rows.
I have wanted to do such tasks previously too. It would be much easier if it was possible to have an additional feature in the groupby node which you can aggregate by a specific percentile. I.e. the 10th and 90th percentile are always handy for getting the range of the majority of the data population,
in this case you could request the upper quartile, 25th percentile.