PCA Node in a Workflow from Examples is Taking Forever to Execute


I downloaded the Topic Extraction workflow from the Examples server, replaced the document source, and hit execute. The PCA node has been stuck at 40% for almost an hour now. I should note that the documents are resumes and there are only 94 of them. KNIME is now showing (Not Responding).

Why is it taking so long?

Hi cageybee,

Most likely you end up with many rows and columns after creating bit vectors from your documents (check the output of the Document Vector node). The complexity of PCA is O(min(p^3, n^3)), where n is the number of rows and p the number of columns (check this paper), which is why the PCA node can take so long to execute. You can try restarting KNIME, reducing the number of rows/columns (for example by sampling your documents with the Row Sampling node), and executing it again.
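
To illustrate the min(p^3, n^3) bound outside of KNIME, here is a minimal NumPy sketch. It is not how the PCA node is implemented (I am not making any claim about that); it only shows that PCA can be computed from whichever of the n x n or p x p matrices is smaller. The 94 rows match the number of resumes, while the 5000 term columns, the 5% density, and the function name pca_small_side are made-up assumptions for the example.

```python
import numpy as np

def pca_small_side(X, k):
    """Return the top-k principal component scores of X (rows = documents).

    Eigendecomposes the smaller of the n x n Gram matrix and the
    p x p scatter matrix, so the cubic cost applies to min(n, p).
    """
    Xc = X - X.mean(axis=0)          # center each column
    n, p = Xc.shape
    if n <= p:
        gram = Xc @ Xc.T             # n x n, cheap when there are few documents
        vals, vecs = np.linalg.eigh(gram)
        order = np.argsort(vals)[::-1][:k]
        # Scores equal U * S, where gram = U S^2 U^T
        return vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
    scatter = Xc.T @ Xc              # p x p, cheap when there are few columns
    vals, vecs = np.linalg.eigh(scatter)
    order = np.argsort(vals)[::-1][:k]
    return Xc @ vecs[:, order]

# Hypothetical bit-vector matrix: 94 documents, 5000 terms (assumed sizes)
rng = np.random.default_rng(0)
X = (rng.random((94, 5000)) < 0.05).astype(float)

scores = pca_small_side(X, k=10)
print(X.shape, scores.shape)
```

With only 94 rows the n x n side is tiny, whereas a 5000 x 5000 matrix is far more expensive to decompose. That is why checking the number of columns coming out of the Document Vector node is worthwhile.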


Anna, I left the process running overnight. It was stuck at 40% the entire time and never moved beyond that.