How can I calculate the percentile rank for each column record?
I have a dataset which contains user IDs and their scores in some test. I want to have the percentile rank for each user. For example user 1 is better than 32% of all users and so on.
I didn't find any node to do that for me. How can I do that in KNIME?
I did also think about using rank node but there is a problem with this approach too.
One simple formula for percentile rank is "PRx= 100 (C(i-1) / (N-1))" (which is used in excel) where "C(i-1)" is the number of values blow the interest value and "N" is the total number of the values.
Another formula is "PRx= 100 ((C(i-1) + 0.5 Ci) / N)" where "C(i-1)" is the number of values blow the interest value, "Ci" is the number of values equal to the interest value and "N" is the total number of values. In this formula "PR" is the percent of values equal or lower than the interest value in contrast to the first formula that "PR" is the percent of values just lower than the interest value.
In rank node I cannot have the same "i" for equal values and count all "i-1" values at the same time. It means if I set the ranking mode to "Standard", for same values I will have different PR and if I set the mode to "Dense", I'll miss counting equal values.
The ideal solution is that KNIME adds this feature (Percentile Rank) as a part of Rank node (plus some improvements to the node), but the best solution for now I think is that I output my data as an Excel file and do the ranking in Excel (using PERCENTRANK.INC function) and then read the file again in KNIME.
I have added your feature-request. Thank you very much for the suggestion!
However, I don't understand why you have to use the PERCENTRANK.INC function in Excel. Attached you will find a workflow that does the exact same thing with the Rank node and a Math Formula, or did I miss something completely? (PERCENTRANK.INC value is included in the data as reference).