I just ran a workflow with “PCA” and “PCA Compute” nodes. In both nodes it is possible to define columns to include in or exclude from the PCA. But looking at the output (covariance matrix and loadings), it appears that the column I excluded is still part of the calculation.
Here’s a small part of the output with the “Concentration A” column present in the input file, but excluded in the “PCA Compute” node configuration:
eigenvalue Concentration A Feature1 Feature2
12.2156918 0.695533692 0.127841119 0.143758568
7.319064878 -0.703948333 0.168538412 0.183373315
4.497250157 0.079419285 -0.015860147 0.076603463
2.500490397 -0.109694686 -0.381236912 -0.195004551
1.482244583 -0.015410615 -0.011739858 -0.262190766
The “Concentration A” column is still present in the output file when it shouldn’t be. In addition, if I put a column filter in front of the “PCA Compute” node and remove this column, it no longer shows up (naturally), but the eigenvalues and loadings are also different. That seems to indicate that the column was taken into account for the calculation in the example above. Here is the output with the column filtered out beforehand:
eigenvalue Feature1 Feature2
9.815795857 0.198503037 0.220085087
4.52425556 4.85E-04 0.091242809
2.577484582 0.38346734 0.203967157
1.484091467 0.008478681 -0.255752408
That shouldn’t happen, unless I’m missing something (the node configuration appears to be pretty straightforward).
BTW: this is using KNIME 2.9.1
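
As a sanity check on the reasoning above, here is a minimal sketch outside KNIME (Python with NumPy) of why differing eigenvalues point to the excluded column having been used. It assumes the PCA is an eigendecomposition of the covariance matrix; how KNIME computes it internally is not confirmed here. The data is synthetic and the helper pca_eigen and the variable names are hypothetical, chosen only to mirror the column names in the output above.

```python
import numpy as np


def pca_eigen(data):
    """Eigenvalues (descending) and loadings (one row per component)
    of the covariance matrix of `data` (rows = samples, columns = features)."""
    cov = np.cov(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: symmetric matrices
    order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
    return eigenvalues[order], eigenvectors[:, order].T


# Synthetic stand-in data, NOT the data from the post.
rng = np.random.default_rng(0)
n = 200
feature1 = rng.normal(size=n)
feature2 = 0.5 * feature1 + rng.normal(size=n)
concentration_a = 0.8 * feature1 + rng.normal(size=n)

all_columns = np.column_stack([concentration_a, feature1, feature2])
filtered = all_columns[:, 1:]  # "Concentration A" removed before the PCA

eig_all, load_all = pca_eigen(all_columns)
eig_filtered, load_filtered = pca_eigen(filtered)

print("with Concentration A:   ", eig_all)
print("without Concentration A:", eig_filtered)
```

If an exclude setting really removed the column before the computation, the result should match the filtered case: one eigenvalue fewer, no loading for the excluded column, and the same numbers as with an upstream column filter. Getting a loading column for “Concentration A” and different eigenvalues is what one would expect if the column was still part of the input.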