Contradiction between R and KNIME for the Kruskal-Wallis test

Hello,

When I use the Kruskal-Wallis Test node in KNIME, I get a p-value of about 1.

But when I use the R Snippet node and run a Kruskal-Wallis test in it, I get a p-value near zero.

A workflow that reproduces my problem is attached.

In the R Snippet I have written:

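# Column 0 holds the group labels, Column 1 the measured values
y <- knime.in$"Column 0"
aj <- knime.in$"Column 1"

# Kruskal-Wallis rank sum test of aj across the groups in y
h <- kruskal.test(x = aj, g = y)
h

# Pass the input table through unchanged
knime.out <- knime.in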

What am I missing?

 

Thank you

Interesting. Any chance KNIME and R are using opposite null hypotheses?

 

I don't know. The node description in KNIME is basically only a link to Wikipedia, and all the R tutorials say that the null hypothesis is that the groups come from the same population. I am not very familiar with any of this.

Having looked further into it, it appears that KNIME and R get different values for the H test statistic, so it is no surprise that they also give different p-values. In the case of your data set:

KNIME: H = -197.685

R: H = 48.535

Since the H test statistic is approximately chi-squared distributed with k - 1 degrees of freedom (k = number of groups, 3 in your case, so 2 degrees of freedom), and a chi-squared variable can never be negative, I believe there is something wrong with the way KNIME calculates it for your example data set. I will look further into it.
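To see how the two H values above turn into such different p-values, here is a minimal sketch of the chi-squared upper-tail probability in plain Python. It assumes (not confirmed for KNIME) that both tools convert H to a p-value as P(X > H) for a chi-squared X with 2 degrees of freedom. The helper name `chi2_sf_even_df` is hypothetical, and the closed form it uses is only valid for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared X with even df.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!,
    which holds only for positive, even degrees of freedom.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even degrees of freedom")
    if x <= 0:
        # A chi-squared variable is never negative, so all its mass lies above x.
        return 1.0
    half = x / 2
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)**i / i! incrementally
        total += term
    return math.exp(-half) * total

# Three groups -> 3 - 1 = 2 degrees of freedom; H values as posted above.
for label, h in [("R", 48.535), ("KNIME", -197.685)]:
    print(label, chi2_sf_even_df(h, df=2))
```

With 2 degrees of freedom the upper tail is simply exp(-H/2), so H = 48.535 gives roughly 2.9e-11, while any negative H gives exactly 1, which would match the "about 1" seen in KNIME under the assumption above.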

Update after further investigation: it appears that your data set contains a large number of ties. When there are many ties, the value of H obtained from the simplified formula needs to be adjusted by dividing it by a correction factor, 1 - Σ(t³ - t)/(N³ - N), where each t is the number of observations sharing one tied value and N is the total number of observations (see step 4 here: https://en.wikipedia.org/wiki/Kruskal%E2%80%93Wallis_one-way_analysis_of_variance).
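The correction described above can be sketched in plain Python as follows. This is an illustration of the textbook formula, not KNIME's or R's actual code; the function names and the toy data are hypothetical. Tied values get the average of the rank positions they occupy, H comes from the simplified formula 12/(N(N+1)) Σ R_i²/n_i - 3(N+1), and the corrected H is that value divided by the tie correction factor.

```python
from collections import Counter

def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def kruskal_wallis_h(values, groups):
    """Return (H from the simplified formula, H divided by the tie correction factor)."""
    n = len(values)
    ranks = average_ranks(values)
    sizes = Counter(groups)
    rank_sums = Counter()
    for r, g in zip(ranks, groups):
        rank_sums[g] += r
    h = 12.0 / (n * (n + 1)) * sum(rank_sums[g] ** 2 / sizes[g] for g in sizes) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each group of tied values.
    # Note: if every value is identical the factor is 0 and this division fails.
    tie_sizes = Counter(values).values()
    correction = 1.0 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    return h, h / correction

# Hypothetical toy data with many ties (not the poster's data set).
values = [1, 1, 2, 2, 3, 3, 3, 4, 4]
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
h, h_corrected = kruskal_wallis_h(values, groups)
print(f"{h:.4f} {h_corrected:.4f}")  # prints "6.0667 6.4425"
```

Because the factor lies between 0 and 1, the correction can only increase H, and with average ranks the uncorrected H is itself never negative, so omitting the correction makes H too small rather than changing its sign.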

It appears that the R function kruskal.test applies the correction, while the current KNIME implementation of the same test does not. I haven't run all the calculations, but it looks like this may lead to a "wrong" value for the H statistic and, as a consequence, to a wrong p-value.

I think you have found a bug, or at least a missing check for a large number of ties, in the way the node has been implemented in KNIME.

Cheers,
Marco.

Thank you for clearing up my confusion.

Should I submit a bug report, or is this now known to the KNIME team?

I think that if you are able to modify the subject of this post by prepending [BUG] to its title, it will get the necessary attention from the KNIME team.

Cheers,
Marco.

Dear Cherubin7th and marco_ghislanzoni,

thank you for reporting this. I have also tried to reproduce this in R, and it seems that there is something wrong with the way KNIME calculates the statistic.

I have reported this to the KNIME technical team.

I will keep you updated.

Thank you,

Cheers,

Vincenzo

Hi @all,

thanks for discussing this issue in detail. Marco is correct: we didn't implement the correction factor. We will discuss it and potentially add it in a future release.

Best,

Christian