Linear Correlation Filter - Define which column to keep


I’ve been using the Linear Correlation node to filter out columns that are redundant for, for example, decision trees. Now I came across a case where the correlation node (and the filter) confuses me:
Example: Correlation Example – KNIME Hub

In the example, Column A contains either nothing or “Sold”, and Column B contains some IDs.
In one direction I can see a strong correlation: all rows with ID “28”, for example, are sold, but conversely not all sold rows have ID “28” → Should the correlation matrix be asymmetric in this case? Like a strong correlation in one direction but a weak one in the other?
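
To make the two directions concrete, here is a minimal Python sketch that computes both conditional shares. The rows are made up for illustration (they are not taken from the linked workflow), and the variable names are my own.

```python
from collections import Counter

# Hypothetical (status, id) rows; an empty string stands for "nothing".
rows = [
    ("Sold", "28"), ("Sold", "28"), ("Sold", "28"),
    ("", "17"), ("Sold", "17"), ("", "31"),
]

pair_counts = Counter(rows)
status_counts = Counter(status for status, _ in rows)
id_counts = Counter(id_ for _, id_ in rows)

# Direction 1: share of rows with ID "28" that are sold.
p_sold_given_28 = pair_counts[("Sold", "28")] / id_counts["28"]

# Direction 2: share of sold rows that carry ID "28".
p_28_given_sold = pair_counts[("Sold", "28")] / status_counts["Sold"]

print(p_sold_given_28, p_28_given_sold)
```

In this toy data the first share is higher than the second, which is the one-sidedness described above.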

The problem that arises later is that the Linear Correlation Filter always seems to process the table in column order, meaning it keeps the first column and removes the second. In this example that is not really what you want, because you end up removing the column with more information in it.

I hope my question/problem is understandable; if not, let me know!

Thanks for your help!

Update: I calculated Cramér’s V manually and got the same result. However, it seems that bias correction is not implemented in the KNIME node, hence the very strong correlation even though it’s only one-sided.
Cramér’s V - Wikipedia
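
For anyone who wants to repeat the manual check, here is a rough Python sketch of Cramér’s V with and without the bias correction described in the Wikipedia article. It is not how the KNIME node computes it; the function name `cramers_v`, the `bias_correction` flag and the sample data are my own assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys, bias_correction=False):
    """Cramér's V between two equally long sequences of category labels."""
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (observed[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n

    if bias_correction:
        # Correction as given in the Wikipedia article on Cramér's V.
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        r = r - (r - 1) ** 2 / (n - 1)
        k = k - (k - 1) ** 2 / (n - 1)

    denom = min(r - 1, k - 1)
    return sqrt(phi2 / denom) if denom > 0 else 0.0


# Hypothetical data, not taken from the linked workflow.
status = ["Sold", "Sold", "Sold", "", "Sold", ""]
ids = ["28", "28", "28", "17", "17", "31"]

v_plain = cramers_v(status, ids)
v_swapped = cramers_v(ids, status)
v_corrected = cramers_v(status, ids, bias_correction=True)
print(v_plain, v_swapped, v_corrected)
```

Swapping the two columns gives the same value, so the measure itself is symmetric by construction. On this small sample the bias-corrected value comes out lower than the uncorrected one.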

For the workflow, to get rid of the first row I would use a different approach in this case and use the Low Variance Filter to remove it.
For your general question I can’t provide an answer, sorry.
