Percentile Rank Calculation


#1

How can I calculate the percentile rank for each column record?

I have a dataset which contains user IDs and their scores in some test. I want to have the percentile rank for each user. For example user 1 is better than 32% of all users and so on.

I didn't find any node to do that for me. How can I do that in KNIME?

 


#2

Hi Armin,

thanks for the new thread!

How about you calculate the rank with the RANK node and than you divide the rank by the number of rows and multiply with 100.

Would this help?

Best, Iris 


#3

I did also think about using rank node but there is a problem with this approach too.

One simple formula for percentile rank is "PRx= 100 (C(i-1) / (N-1))" (which is used in excel) where "C(i-1)" is the number of values blow the interest value and "N" is the total number of the values.

Another formula is "PRx= 100 ((C(i-1) + 0.5 Ci) / N)" where "C(i-1)" is the number of values blow the interest value, "Ci" is the number of values equal to the interest value and "N" is the total number of values. In this formula "PR" is the percent of values equal or lower than the interest value in contrast to the first formula that "PR" is the percent of values just lower than the interest value.

In rank node I cannot have the same "i" for equal values and count all "i-1" values at the same time. It means if I set the ranking mode to "Standard", for same values I will have different PR and if I set the mode to "Dense", I'll miss counting equal values.

The ideal solution is that KNIME adds this feature (Percentile Rank) as a part of Rank node (plus some improvements to the node), but the best solution for now I think is that I output my data as an Excel file and do the ranking in Excel (using PERCENTRANK.INC function) and then read the file again in KNIME.


#4

Hey armingrudd,

I have added your feature-request. Thank you very much for the suggestion!

However, I don't understand why you have to use the PERCENTRANK.INC function in Excel. Attached you will find a workflow that does the exact same thing with the Rank node and a Math Formula, or did I miss something completely? (PERCENTRANK.INC value is included in the data as reference).

Cheers,
Ferry


#5

Dear Ferry,

I had it wrong in my mind about rank modes in KNIME. Your answer is completely correct.

Thank you so much for the reply and the workflow example.

KNIME is the best! ;)

Cheers,

Armin


#6

Hi,

 

Attached is the another version for rank using python node using Ferry's workflow.

 

Enjoy KNIME!


#7

Attached is the final. Sorry!


#8

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.