PCA Node in a Workflow from Examples is Taking Forever to Execute

cageybee · May 2, 2018, 3:38pm

Hi,

I downloaded the Topic Extraction workflow from the Examples server, replaced the document source, and hit execute. The PCA node has been stuck at 40% for almost an hour already. I should note that the documents were resumes and there were only 94 of them. KNIME is now (Not Responding).

Why is it taking so long?

amartin · May 8, 2018, 7:08am

Hi cageybee,

Most likely you have many rows and columns after creating bit vectors from your documents (check the output of the Document Vector node). PCA complexity is O(min(p^3, n^3)), where n is the number of rows and p the number of columns (check this paper), and that is why the PCA node might take so long to execute. You can try restarting KNIME, reducing the number of rows/columns (by sampling your documents with the Row Sampling node), and executing it again.
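
To get a feel for that cubic bound, here is a minimal Python sketch. It only evaluates the O(min(p^3, n^3)) expression quoted above and ignores constant factors; it is not a measurement of how the KNIME PCA node actually behaves. The function name and the table sizes are hypothetical, chosen for illustration.

```python
def pca_cost_estimate(n_rows: int, n_cols: int) -> int:
    """Rough operation count from the O(min(p^3, n^3)) bound.

    Constant factors are ignored, so only ratios between
    two estimates are meaningful, not the absolute numbers.
    """
    return min(n_rows, n_cols) ** 3


# Hypothetical table sizes, not taken from the actual workflow.
full = pca_cost_estimate(1_000, 1_000)
sampled = pca_cost_estimate(500, 500)

# Halving the smaller dimension cuts the estimate by a factor of 8.
ratio = full // sampled
print(f"full={full}, sampled={sampled}, ratio={ratio}")
```

Under this bound, the smaller of the two dimensions dominates the cost, so the output dimensions of the Document Vector node are the numbers to plug in.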

Best,
Anna

cageybee · May 8, 2018, 1:11pm

Anna, I left the process running overnight and it was stuck at 40% the entire time and never moved beyond that.