How to evaluate the performance of a dimensionality reduction algorithm

Hi, I need to know how to evaluate the performance of a dimensionality reduction algorithm in KNIME.
Which node do I need to use to find out whether the result of the algorithm is good,
for example whether reducing 10 columns to 2 is good enough or whether it should be 3?

You can use the Principal Component Analysis (PCA) nodes to perform dimensionality reduction.

You can use the eigenvalue column in the spectral decomposition output of the PCA Compute node to understand how much information is included in each PCA dimension of your data:
(a) Copy the eigenvalues column to Excel.
(b) Sort the values in descending order (largest to smallest).
(c) Scale the values so that the total is 100%.
(d) The cumulative total tells you how much of the information is included in the first 'n' dimensions.

In the following table 96.23% of the information in the original data is carried in the first four dimensions from the PCA output.

Dimension   Eigenvalue    Scaled    Cumulative
1            0.03689082   66.86%     66.86%
2            0.01209765   21.93%     88.79%
3            0.00293366    5.32%     94.11%
4            0.00116910    2.12%     96.23%
5            0.00073716    1.34%     97.56%
6            0.00053667    0.97%     98.54%
7            0.00046425    0.84%     99.38%
8            0.00016431    0.30%     99.68%
9            0.00012344    0.22%     99.90%
10           0.00005543    0.10%    100.00%
11          -0.00000000    0.00%    100.00%
Total        0.05517250  100.00%
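
If you would rather not do this by hand in Excel, steps (b) to (d) can be sketched in a few lines of Python. This is only an illustration outside KNIME, not a KNIME node: the function name `explained_variance` is made up for this example, the eigenvalues are copied from the table above, and the near-zero 11th value is entered as 0.0.

```python
from itertools import accumulate

# Eigenvalues copied from the table above (assumption: the -0.00000000
# entry for dimension 11 is treated as exactly 0.0).
eigenvalues = [
    0.03689082, 0.01209765, 0.00293366, 0.00116910, 0.00073716,
    0.00053667, 0.00046425, 0.00016431, 0.00012344, 0.00005543, 0.0,
]


def explained_variance(values):
    """Return (sorted eigenvalues, share of total, cumulative share)."""
    ordered = sorted(values, reverse=True)   # step (b): largest first
    total = sum(ordered)
    scaled = [v / total for v in ordered]    # step (c): shares sum to 1
    cumulative = list(accumulate(scaled))    # step (d): running total
    return ordered, scaled, cumulative


ordered, scaled, cumulative = explained_variance(eigenvalues)

# Print one row per dimension: index, eigenvalue, share, cumulative share.
for i, (ev, s, c) in enumerate(zip(ordered, scaled, cumulative), start=1):
    print(f"{i:>2}  {ev:.8f}  {s:7.2%}  {c:7.2%}")
```

Reading the cumulative column answers the "2 or 3 columns" question directly: in this data set two dimensions keep about 88.79% of the information and three keep about 94.11%.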

In practice, you do not need to do this by hand. In the PCA Apply node you can either select the number of dimensions you want in your output (it will tell you the percentage of information retained), or directly enter the minimum percentage of information you want to retain and it will work out how many dimensions you need.
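
To make the second option concrete, here is a rough Python sketch of the kind of calculation involved: given the eigenvalues and a minimum fraction of information to retain, find the smallest number of dimensions that reaches it. This is an assumption about the logic, not KNIME's actual implementation, and `dimensions_needed` is a hypothetical name used only for this example.

```python
from itertools import accumulate


def dimensions_needed(eigenvalues, min_fraction):
    """Smallest number of leading PCA dimensions whose cumulative
    eigenvalue share is at least min_fraction (a value between 0 and 1)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for n, running in enumerate(accumulate(ordered), start=1):
        if running / total >= min_fraction:
            return n
    # Fallback for rounding effects when min_fraction is (almost) 1.0.
    return len(ordered)


# Eigenvalues copied from the table in this thread.
eigenvalues = [
    0.03689082, 0.01209765, 0.00293366, 0.00116910, 0.00073716,
    0.00053667, 0.00046425, 0.00016431, 0.00012344, 0.00005543, 0.0,
]

print(dimensions_needed(eigenvalues, 0.90))
print(dimensions_needed(eigenvalues, 0.95))
```

With the eigenvalues from the table, a 90% threshold needs 3 dimensions and a 95% threshold needs 4, which matches the cumulative column.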



@mohammad_alqoqa what I can offer are these links to articles about dimension reduction from the KNIME universe:


Thank you very much.
