When I use the Kruskal-Wallis test node in KNIME, I get a p-value of about 1.
But when I run a Kruskal-Wallis test in the R Snippet node, I get a p-value near zero.
A workflow that reproduces my problem is attached.
In the R snippet I have written:
y <- knime.in$"Column 0"
aj <- knime.in$"Column 1"
h <- kruskal.test(x=aj,g=y)
What am I missing?
Interesting. Any chance KNIME and R are using opposite null hypotheses?
I don't know. The node description in KNIME is basically only a link to Wikipedia, and all the R tutorials say that the null hypothesis is that the groups come from the same population. I am not very familiar with any of this.
Having looked further into it, it appears that KNIME and R get different values for the H test statistic, so it is no surprise that they also get different p-values. In the case of your data set:
KNIME: H = -197.685
R: H = 48.535
Since the H test statistic is approximately distributed as chi-squared with N-1 degrees of freedom (N = number of groups, 3 in your case), it can never be negative, so I believe there is something wrong with the way KNIME calculates it for your example data set. I will look further into it.
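As a quick check of the two numbers above, here is a minimal R sketch that turns each reported H into a p-value. It assumes the p-value is the upper tail of a chi-squared distribution with 2 degrees of freedom (3 groups minus 1). The variable names are mine, and I am only assuming that KNIME derives its p-value in this way.

```r
# H values as reported above for the example data set
h_r     <- 48.535
h_knime <- -197.685

# Degrees of freedom: number of groups minus one (3 groups here)
df <- 3 - 1

# Upper-tail probability of the chi-squared distribution
p_r     <- pchisq(h_r,     df = df, lower.tail = FALSE)
p_knime <- pchisq(h_knime, df = df, lower.tail = FALSE)

print(p_r)
print(p_knime)
```

Under that assumption p_r comes out near zero, while p_knime is exactly 1, because a chi-squared variable has no probability mass below zero. That would be consistent with the p-value of about 1 seen in the KNIME node.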
Update after further investigation: it appears your data set contains a large number of ties. When there are many ties, the value of H from the simplified formula needs to be adjusted by dividing it by a correction factor (see step 4 here: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance).
It appears that the R function kruskal.test applies the correction while the current KNIME implementation of the same test does not apply the correction. I haven't run all the calculations but it looks like this may lead to a "wrong" value for the H statistic and as a consequence to a wrong p-value.
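To illustrate the tie correction, here is a small R sketch that computes H by hand with and without the correction factor and compares the result with kruskal.test. The toy data are made up for illustration and are not the data set from the attached workflow; the variable names are hypothetical.

```r
# Hypothetical toy data with many tied values (not the poster's data)
x <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4)
g <- factor(rep(c("a", "b", "c"), each = 4))

n <- length(x)
r <- rank(x)                           # tied values receive mid-ranks
rank_sums   <- tapply(r, g, sum)       # sum of ranks per group
group_sizes <- tapply(r, g, length)    # observations per group

# Simplified formula, no tie correction
h_uncorrected <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Correction factor: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tie group
ties <- as.vector(table(x))
correction <- 1 - sum(ties^3 - ties) / (n^3 - n)
h_corrected <- h_uncorrected / correction

# Reference value from R's built-in test
kt <- kruskal.test(x = x, g = g)

print(c(uncorrected = h_uncorrected,
        corrected   = h_corrected,
        kruskal     = unname(kt$statistic)))
```

The corrected value should agree with the statistic from kruskal.test, while the uncorrected one is smaller whenever ties are present. Note that the uncorrected value is still non-negative here, so the correction alone may not account for a negative H.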
I think you have found a bug or at least a missing check for a large number of ties in the way the node has been implemented in KNIME.
Thank you for clearing up my confusion.
Should I submit a bug report, or is it now known to the KNIME people?
I think that if you are able to edit the subject of this post and prepend [BUG] to its title, it will get the necessary attention from the KNIME team.
Dear Cherubin7th and marco_ghislanzoni,
thank you for reporting this. I have also tried to replicate this in R, and it seems that there is something wrong with the way KNIME calculates it.
I have reported this to the KNIME tech team.
I will keep you updated.
Thanks for discussing this issue in detail. Marco is correct, we didn't implement the correction factor. We will discuss it and potentially add it to a future release.