Problem of many irrelevant correlations

I have data on several thousand students.
Each student has to be tested on each of 12 questions.
There are 2 test methods: objective and subjective.
Tests are set at two points in time: Pre and Post.
I inherited the data. I did not design its gathering.
The column headers (variable names) take the form:
Time_Method_question e.g.
Pre_objective_01, Pre_objective_12, Pre_subjective_02, etc.
I know that it would have been better to have used more variables, but this is the
data set I inherited.
I wish to calculate the correlation between the two methods separately for each time point.

It is easy to produce a lot of correlations using the linear correlation node. However, it is also easy to get too many irrelevant coefficients. Objective question 1 with subjective questions 2 to 12, for example.
I merely wish to do objective question 1 with subjective question 1, objective question 2
with subjective question 2 and so on down to question 12.
The former procedural programmer in me keeps saying “that should be easy using loops”.
Has anyone got a simple way (for a relative Knime newbie) to produce, at each time point, only the 12 correlations that I need and not the 144 that I get?

One can, of course, kludge one’s way through the problem by filtering out columns all over the place and concatenating results at the end. That, though, is fragile and labour intensive. There must be a better way.

With thanks (and embarrassment)

3rd March 2024

Hello @LaurieMoseley
If I understood correctly, you can unpivot all the column values, then split the resulting $Column Names$ column, using the underscore character as the delimiter.

Finally, you can pivot on the ‘method’ part of the split (objective/subjective), which gives two columns to compare.
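
For anyone who finds the reshaping easier to follow as code, here is a minimal pandas sketch of the same unpivot / split / pivot idea. It is not a KNIME workflow, only an illustration of the logic. The function name `paired_correlations`, the `student` index name and the tiny demo table are all made up for the example; the only thing taken from the question is the Time_Method_question column naming.

```python
import pandas as pd


def paired_correlations(wide: pd.DataFrame) -> pd.DataFrame:
    """Pearson r between objective and subjective scores,
    one coefficient per (time, question) pair."""
    # Unpivot: one row per (student, original column name, score).
    long = (
        wide.rename_axis("student")
        .reset_index()
        .melt(id_vars="student", var_name="column", value_name="score")
    )
    # Split names such as "Pre_objective_01" on the underscore.
    parts = long["column"].str.split("_", expand=True)
    long["time"], long["method"], long["question"] = parts[0], parts[1], parts[2]
    # Pivot the method back out so that objective and subjective
    # end up as two side-by-side columns.
    paired = long.pivot_table(
        index=["student", "time", "question"], columns="method", values="score"
    ).reset_index()
    # Correlate the two columns within each (time, question) group.
    return (
        paired.groupby(["time", "question"])[["objective", "subjective"]]
        .apply(lambda g: g["objective"].corr(g["subjective"]))
        .rename("r")
        .reset_index()
    )


# Hypothetical demo data: 5 students, 2 Pre questions and 1 Post question.
demo = pd.DataFrame(
    {
        "Pre_objective_01": [1, 2, 3, 4, 5],
        "Pre_subjective_01": [2, 4, 6, 8, 10],
        "Pre_objective_02": [1, 2, 3, 4, 5],
        "Pre_subjective_02": [5, 4, 3, 2, 1],
        "Post_objective_01": [1, 3, 2, 5, 4],
        "Post_subjective_01": [1, 3, 2, 5, 4],
    }
)
result = paired_correlations(demo)
print(result)
```

With the full data set this should return 24 rows (12 questions at each of the 2 time points) instead of a 144-cell matrix per time point, assuming every column follows the naming pattern exactly.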

I hope this helps. If you need more help, you can share some sample data.


