How to evaluate the performance of a dimensionality reduction algorithm

mohammad_alqoqa · November 26, 2023, 9:40pm

Hi, I need to know how to evaluate the performance of a dimensionality reduction algorithm in KNIME.
Which node do I need to use to tell whether the result of this algorithm is good,
for example whether 2 columns instead of 10 is enough or whether it must be 3?

DiaAzul · November 27, 2023, 12:27am

You can use the Principal Component Analysis (PCA) nodes to perform dimensionality reduction.
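Outside KNIME, the core of what a PCA node computes can be sketched in a few lines of NumPy. This is a minimal sketch, assuming PCA on the covariance matrix of the input columns. The function names `pca_eigenvalues` and `project` are hypothetical, and KNIME's PCA Compute node may differ in details such as normalization.

```python
import numpy as np

def pca_eigenvalues(data: np.ndarray):
    """Eigenvalues (largest first) and matching eigenvectors of the covariance
    matrix of `data` (rows = observations, columns = features)."""
    cov = np.cov(data, rowvar=False)           # np.cov centers the columns itself
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: symmetric matrices, ascending order
    order = np.argsort(eigenvalues)[::-1]      # re-sort to descending
    return eigenvalues[order], eigenvectors[:, order]

def project(data: np.ndarray, eigenvectors: np.ndarray, k: int) -> np.ndarray:
    """Project centered data onto the first k principal components."""
    centered = data - data.mean(axis=0)
    return centered @ eigenvectors[:, :k]

# Usage: reduce 10 random columns to 3 PCA dimensions
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
vals, vecs = pca_eigenvalues(X)
reduced = project(X, vecs, 3)
print(reduced.shape)
```

The eigenvalues returned here play the same role as the eigenvalue column described next: each one measures how much variance its dimension carries.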

You can use the eigenvalue column in the spectral decomposition output of the PCA Compute node to understand how much information is carried by each PCA dimension of your data:
(a) Copy the eigenvalues column to Excel.
(b) Sort the values in descending order (largest to smallest).
(c) Scale the values so that they total 100%.
(d) The cumulative total tells you how much of the information is retained in the first ‘n’ dimensions.
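If you would rather script steps (b) to (d) than use Excel, they can be sketched as below. The function name `explained_variance` is hypothetical; the eigenvalues are the ones from the table below.

```python
from itertools import accumulate

def explained_variance(eigenvalues):
    """Return (scaled, cumulative) percentages following steps (b)-(d)."""
    values = sorted(eigenvalues, reverse=True)     # (b) largest to smallest
    total = sum(values)
    scaled = [100.0 * v / total for v in values]   # (c) scale so the total is 100 %
    cumulative = list(accumulate(scaled))          # (d) running total
    return scaled, cumulative

# Eigenvalues copied from the table in this post
eigenvalues = [0.03689082, 0.01209765, 0.00293366, 0.00116910, 0.00073716,
               0.00053667, 0.00046425, 0.00016431, 0.00012344, 0.00005543,
               -0.00000000]
scaled, cumulative = explained_variance(eigenvalues)
for dim, (s, c) in enumerate(zip(scaled, cumulative), start=1):
    print(f"{dim}\t{s:.2f}%\t{c:.2f}%")
```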

In the following table 96.23% of the information in the original data is carried in the first four dimensions from the PCA output.

Dimension	Eigenvalue	Scaled	Cumulative
1	0.03689082	66.86%	66.86%
2	0.01209765	21.93%	88.79%
3	0.00293366	5.32%	94.11%
4	0.00116910	2.12%	96.23%
5	0.00073716	1.34%	97.56%
6	0.00053667	0.97%	98.54%
7	0.00046425	0.84%	99.38%
8	0.00016431	0.30%	99.68%
9	0.00012344	0.22%	99.90%
10	0.00005543	0.10%	100.00%
11	-0.00000000	0.00%	100.00%

Total	0.05517250	100.00%
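Written as a formula, the cumulative column is the share of the total eigenvalue sum carried by the first n dimensions. The symbols are introduced here for illustration: λ_i is the i-th largest eigenvalue and p is the number of dimensions (11 in this table). The worked line checks the 96.23% figure for n = 4:

```latex
R(n) = \frac{\sum_{i=1}^{n} \lambda_i}{\sum_{i=1}^{p} \lambda_i},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p

R(4) = \frac{0.03689082 + 0.01209765 + 0.00293366 + 0.00116910}{0.05517250}
     = \frac{0.05309123}{0.05517250} \approx 0.9623
```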

In practice, you can use the PCA Apply node and either select the number of dimensions you want in your output (it will tell you the percentage of information retained), or directly enter the minimum percentage of information you want to retain and let it work out how many dimensions you need.
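The two PCA Apply options correspond roughly to the two functions sketched below: the retained fraction for a fixed number of dimensions, and the smallest number of dimensions that reaches a minimum fraction. Both function names are hypothetical, and it is an assumption that PCA Apply uses exactly this "at least the threshold" rule internally.

```python
from itertools import accumulate

def retained_fraction(eigenvalues, k):
    """Fraction of the total variance kept by the k largest components."""
    values = sorted(eigenvalues, reverse=True)
    return sum(values[:k]) / sum(values)

def dimensions_needed(eigenvalues, min_fraction):
    """Smallest k whose leading components keep at least min_fraction (0-1]."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    for k, running in enumerate(accumulate(values), start=1):
        if running / total >= min_fraction:
            return k
    return len(values)  # guards against floating-point shortfall at 1.0

# Eigenvalues copied from the table in this post
eigenvalues = [0.03689082, 0.01209765, 0.00293366, 0.00116910, 0.00073716,
               0.00053667, 0.00046425, 0.00016431, 0.00012344, 0.00005543,
               -0.00000000]
for k in (2, 3):
    print(k, f"{retained_fraction(eigenvalues, k):.2%}")
print(dimensions_needed(eigenvalues, 0.95))
```

For this data, 2 dimensions keep about 88.79% of the information and 3 keep about 94.11%, so whether 2 or 3 is "good" depends on the threshold you choose; a 95% minimum would require 4.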

DiaAzul

mlauber71 · November 27, 2023, 9:15am

@mohammad_alqoqa what I can offer are these links to articles about dimension reduction from the KNIME universe:

mohammad_alqoqa · November 27, 2023, 1:01pm

Thank you very much.

system · December 4, 2023, 1:02pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.