Hi,
I’ve been using the Linear Correlation node to filter out columns that are redundant for, say, decision trees. Now I’ve come across a case where the correlation node (and the filter) confuses me:
Example: Correlation Example – KNIME Hub
In the example, column A contains either nothing or “Sold”, and column B contains some IDs.
In one direction I can see a strong relationship: all rows with ID “28”, for example, are sold. In the other direction it is weaker: not all sold rows have ID “28”. → Should the correlation matrix be asymmetric in this case, i.e. a strong correlation in one direction but a weak one in the other?
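To make the two directions concrete, here is a minimal Python sketch on made-up data (not the table from the Hub workflow, and not KNIME code). The rows, the helper names `conditional_share` and `pearson`, and the 0/1 encoding are all my own assumptions for illustration. It shows that the two conditional shares differ, while a Pearson-style coefficient gives the same value whichever column comes first. I have not checked which measure the node actually uses for string columns.

```python
import math

# Hypothetical toy rows: (column A, column B). Not the data from the Hub example.
rows = [
    ("Sold", "28"),
    ("Sold", "28"),
    ("Sold", "31"),
    ("", "31"),
    ("", "45"),
    ("", "45"),
]

def conditional_share(rows, given_index, given_value, target_index, target_value):
    """Share of rows with target_value among the rows where the given column equals given_value."""
    matching = [r for r in rows if r[given_index] == given_value]
    hits = [r for r in matching if r[target_index] == target_value]
    return len(hits) / len(matching)

def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long numeric lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Direction 1: of all rows with ID "28", what share is sold?
sold_given_28 = conditional_share(rows, 1, "28", 0, "Sold")
# Direction 2: of all sold rows, what share has ID "28"?
id28_given_sold = conditional_share(rows, 0, "Sold", 1, "28")

# Encode both columns as 0/1 indicators (an assumption made only for this sketch).
is_sold = [1 if a == "Sold" else 0 for a, _ in rows]
is_28 = [1 if b == "28" else 0 for _, b in rows]

print("share sold given ID 28:", sold_given_28)
print("share ID 28 given sold:", id28_given_sold)
print("pearson(A, B):", pearson(is_sold, is_28))
print("pearson(B, A):", pearson(is_28, is_sold))
```

So the asymmetry I am describing shows up in the conditional shares, but a single coefficient per column pair, as in a correlation matrix, cannot express it.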
The practical problem later on is that the Linear Correlation Filter always seems to process the columns in table order, meaning it keeps the first column of a correlated pair and removes the second. In this example that is not what you want, because you end up removing the column that carries more information.
I hope my question is understandable; if not, let me know!
Thanks for your help!