STATISTICAL TEST MANY TIMES

Hi,

I’m very new to KNIME, and I’m struggling with one of the steps.

I’m trying to build a prediction model for medical data. My study aims to classify whether a patient will be easy or difficult to intubate during surgery, so each patient is labelled “Easy” or “Difficult”. This outcome might be related to the patient’s voice, so several voice parameters have been measured in both groups of patients.

Once the data is ready, I want to run some statistical tests to identify which parameters differ significantly between the groups. I would like to use the Kolmogorov-Smirnov Test to compare the same parameter for “Easy” and “Difficult” patients. I have already split each feature into two columns. For example, the fundamental frequency of the voice in “easy” patients is in one column and the fundamental frequency in “difficult” patients is in another.

Image of input data in the statistical test:

Image of dialog for Kolmogorov-Smirnov Test:

I would like to run the Kolmogorov-Smirnov Test with the first and second columns as the test columns and evaluate the results, then do the same with the third and fourth columns, and so on for every pair of columns. As the picture shows, the two columns in each pair have the same name except for the ending (maybe this can help).

I have an idea of what I should do: create a loop that feeds each pair of columns into the test and evaluates the results. But how can I select the pair of columns I need on each iteration? And how do I pass them to the Kolmogorov-Smirnov Test?

Thank you very much in advance,

Helena

Hi @helfortuny

I don’t know your exact column names, so I assume the order of the columns to test is consistent.
It looks like the column names can be used to unpivot your data set.
Once that is done, you can use a group loop to run the tests. See if this helps you move forward.
statistical_test_many_times.knwf (73.8 KB)
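
Outside KNIME, the pairing idea can be sketched in a few lines of Python. This is only an illustration and does not show the contents of the attached workflow. The suffixes `_easy` and `_difficult` and the example column names are assumptions, since the real column names are not shown in this thread.

```python
from collections import defaultdict

# Hypothetical suffixes: the real column endings are not shown in the thread.
SUFFIXES = ("_easy", "_difficult")


def pair_columns(column_names, suffixes=SUFFIXES):
    """Group column names that share a base name and differ only in their ending."""
    groups = defaultdict(dict)
    for name in column_names:
        for suffix in suffixes:
            if name.endswith(suffix):
                base = name[: -len(suffix)]
                groups[base][suffix] = name
                break
    # Keep only the parameters for which both columns are present.
    return {
        base: tuple(found[s] for s in suffixes)
        for base, found in groups.items()
        if len(found) == len(suffixes)
    }


# Made-up column names, for illustration only.
columns = ["f0_easy", "f0_difficult", "jitter_easy", "jitter_difficult", "patient_id"]
pairs = pair_columns(columns)

for base, (easy_col, difficult_col) in pairs.items():
    # On each iteration, these two columns would be handed to the test.
    print(base, easy_col, difficult_col)
```

Each iteration then plays the role of one pass of the group loop: one parameter, two columns to compare.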


gr. Hans


Wow, amazing!!! Thanks a lot!!!


I have another question related to this topic. I’ve followed your procedure, and it seems to work well. However, I have doubts about how to feed the data into the Kolmogorov-Smirnov Test. The previous node, called Pivoting, presents the data like this:

The missing values appear because the dataset contains more “easy” patients than “difficult” ones. So, in every loop iteration for a given parameter, there is more data for “easy” than for “difficult”. Is that a problem for carrying out the Kolmogorov-Smirnov test? If it’s not a problem, the missing values in the “difficult” column can be skipped, but I would like to use all the data in the “easy” column. Are the settings arranged correctly like this?
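
As a general note, the two-sample Kolmogorov-Smirnov test compares two empirical distribution functions, so the two samples do not need to be the same size. The sketch below is plain Python with made-up numbers, not the KNIME node. It drops missing values per column rather than per row, so every “easy” value is kept, and computes only the test statistic D. Getting a p-value would normally be left to a statistics library, for example `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic D; the two samples may differ in size."""
    # Drop missing values separately in each column, not row by row.
    a = sorted(v for v in sample_a if v is not None)
    b = sorted(v for v in sample_b if v is not None)
    if not a or not b:
        raise ValueError("both samples need at least one value")
    n_a, n_b = len(a), len(b)
    # D is the largest gap between the two empirical CDFs.
    # Both are step functions, so checking every observed value is enough.
    return max(
        abs(bisect_right(a, x) / n_a - bisect_right(b, x) / n_b)
        for x in a + b
    )


# Made-up values: five "easy" patients, two "difficult", padded with missing cells.
easy = [1.0, 2.0, 3.0, 4.0, 5.0]
difficult = [4.0, 5.0, None, None, None]

d = ks_statistic(easy, difficult)
print(d)
```

Because the missing cells are filtered per column, the padding in the shorter column does not remove any rows from the longer one.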