nth Percentile Calculation

racecar · April 5, 2016, 12:04am

Anyone know how to calculate nth percentile (e.g. 5th or 95th percentile) across rows (or columns) of values (of type double) for each row (or column) in a table?

(looks like the 'GroupBy' node has a P^2 percentile approximation but no good old fashioned percentile method)

If there aren't any Nodes available that do this out of the box perhaps someone would be kind enough to point me to resources showing how to use the Python scripting node for accomplishing this type of mathematical task (assuming I would use something like this in Python: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.quantile.html)

many thanks!

thor · April 5, 2016, 9:45am

The general term for percentile or quartile is quantile. The GroupBy node has an aggregation function for quantiles which you can configure to your needs, e.g. 0.9 for the 90th percentile.
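For the pandas route mentioned in the original question, a minimal sketch might look like the following. The table and its column names are made up for illustration, and wiring it into the KNIME Python scripting node is not shown. Note that pandas interpolates linearly between data points by default, so results may differ slightly from what a given KNIME node reports.

```python
import pandas as pd

# Hypothetical table of doubles; column names are invented for illustration.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
    "c": [5.0, 5.0, 5.0, 5.0, 5.0],
})

# 5th and 95th percentile of every column (axis=0 is the default).
# The result has one row per requested quantile and one column per input column.
col_q = df.quantile([0.05, 0.95])

# 95th percentile of every row (axis=1 reduces across the columns).
# The result is a Series with one value per input row.
row_p95 = df.quantile(0.95, axis=1)

print(col_q)
print(row_p95)
```

Quantiles are passed as fractions between 0 and 1, so 0.95 means the 95th percentile.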

racecar · April 5, 2016, 8:19pm

Easier than I could have hoped. Thanks!

Geo · April 6, 2016, 12:38am

Actually, I've always wondered why the Statistics node or any other not yet existing dedicated node which would carry the wonderful name "Quantiles" is not able to do this. Not to say that I would not be happy with the GroupBy solution but Statistics seemed a more intuitive place to look for such functionality.

For example, the Calculate median function in Statistics already does the preparation work to extract the median, why not extract a collection object with the desired n-tiles and let the user choose the quantile variety (quartiles, deciles, etc) in the configuration screen via a drop-down menu instead of the tick box?

armingrudd · December 1, 2017, 9:10am

How can I calculate the percentile rank for each column record?

I have a column which is the scores for some test and I want to know the percentile rank for each user in that test.

johannes_clarifydata · December 3, 2017, 10:58am

Hi, you can calculate the percentiles (or percentile groups) in the way described above and then use the "Rule Engine" node to match test scores to percentiles (or percentile groups). I don't think that there is a simpler i.e. built-in function to do this.
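As a rough illustration of that idea outside KNIME, the sketch below computes quartile cut points and then assigns each score to a group, which is what the Rule Engine rules would do. The scores are invented, and the choice of quartiles is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical test scores, invented for illustration.
scores = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])

# Step 1: compute the cut points (here the 25th, 50th and 75th percentiles).
cuts = scores.quantile([0.25, 0.5, 0.75])

# Step 2: match each score to its group, as a set of threshold rules would.
# searchsorted counts how many cut points are <= the score, so groups run
# from 1 (lowest quarter) to 4 (highest quarter).
group = np.searchsorted(cuts.to_numpy(), scores.to_numpy(), side="right") + 1

result = pd.DataFrame({"score": scores, "quartile_group": group})
print(result)
```

This only places each score in a coarse group; it does not give an exact percentile rank per record.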

armingrudd · December 3, 2017, 4:20pm

It seems this approach doesn't work for what I'm looking for. I have several columns and many records (rows) that I need to have the percentile rank for each record of the columns.

The quantile aggregation function in the GroupBy node outputs a single number for the percent value you choose.

I need the reverse: a node that takes all the values of a column and outputs the percentile rank for each record.

Iris · December 3, 2017, 5:03pm

The Groupby node now supports calculating quantiles.

Best, Iris

armingrudd · December 3, 2017, 9:38pm

I knew about the quantile function in groupby node but as I said in my last comment, I need to calculate the percentile rank for each record. In groupby node one value is given to the percent we choose but I want to determine the percentile rank for each record of a whole column.

I have a dataset which contains user IDs and their scores in some tests. I want to have the percentile rank for each user in each test. For example user 1 in test 1 is better than 32% of all users and so on.

With the GroupBy node and the quantile function I can set a percent and get the score. I want the reverse, applied to all records of a column.
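One possible way to get this with pandas, for example from a Python scripting node, is sketched below. The user IDs, test names and scores are invented, and "percentile rank" is assumed here to mean the percentage of users with a strictly lower score, matching the "better than 32% of all users" reading; other definitions exist.

```python
import pandas as pd

# Hypothetical scores; user IDs and test names are invented for illustration.
scores = pd.DataFrame(
    {"test1": [50.0, 70.0, 70.0, 90.0], "test2": [3.0, 1.0, 4.0, 2.0]},
    index=["u1", "u2", "u3", "u4"],
)


def percentile_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of rows with a strictly lower value, computed per column."""
    # rank(method="min") gives tied values the lowest rank in their group,
    # so rank - 1 is the number of strictly smaller values in that column.
    # count() ignores missing values, which themselves stay NaN in the result.
    return (df.rank(method="min") - 1) / df.count() * 100


ranks = percentile_rank(scores)
print(ranks)
```

With this definition, tied scores share the same rank and the lowest score in a column always gets 0.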

armingrudd · December 5, 2017, 7:17pm

I have added a new topic for my question about percentile rank calculation here.

mauuuuu5 · April 6, 2021, 7:52pm

Hi @armingrudd, it seems the link is dead. Could you please update it?

Best Regards

Mau

armingrudd · April 7, 2021, 11:17pm

dylanknowles · July 25, 2022, 4:49pm

For anyone who finds this from search, what you want to do is calculate the nth percentile, join on the original population, row filter by rule, and then exclude anyone who is less than the nth percentile. E.g. 75th percentile = 3, population is [A=1, B=2, C=1, D=4, … ] would include D (and anyone else over 3).
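The same steps can be sketched in pandas as follows. The population below is invented for illustration and is not the one from the example above. A boolean mask stands in for the join and rule-based row filter.

```python
import pandas as pd

# Hypothetical population; names and values are invented for illustration.
population = pd.DataFrame(
    {"name": ["A", "B", "C", "D", "E"], "value": [1.0, 2.0, 1.0, 4.0, 3.0]}
)

# Step 1: compute the nth percentile of the column (here the 75th).
threshold = population["value"].quantile(0.75)

# Steps 2 and 3: compare every row against the threshold and drop anyone
# below it. Rows exactly at the threshold are kept; use > to drop them too.
kept = population[population["value"] >= threshold]

print(threshold)
print(kept)
```

Whether rows exactly at the threshold should be kept depends on the use case, so the comparison operator is worth checking against your own requirement.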