Problem with analysis of large dataset of similarities

newflame · March 15, 2012, 9:32pm

Hi all,

Forgive me if this is a naive question.

I have a large table of similarities, 8000x8000, which I would like to analyse, e.g. cluster and plot using PCA etc. The table is largely (though not exactly) symmetrical, all values are between 0 and 1, and it takes up 400M on disk. However, I'm struggling to get KNIME to do anything with it without running out of heap. Has anyone got a good idea of how to proceed? So far I've tried PCA directly and tried filtering, but no good. I'm sure there is an obvious approach but I just can't see it.

Thanks in advance for any help!

richards99 · March 15, 2012, 10:45pm

Have you tried changing the settings in the node(s) causing the heap issue? In the memory policy tab of the node (or nodes), try getting it to write tables to disk.

Also try increasing the memory KNIME can use by editing the knime.ini file in the KNIME installation directory. Change -Xmx512m to -Xmx1024m, or even higher if your machine allows (the option is case-sensitive, with a capital X). You will need to restart KNIME for this to take effect.
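
A rough estimate shows why the default heap is too tight for this table. The sketch below is only back-of-the-envelope arithmetic: it assumes each similarity is held as an 8-byte double and ignores any per-cell or per-row overhead KNIME adds, so real usage will be higher. The function names are made up for illustration.

```python
# Rough estimate of the raw memory needed for a dense similarity matrix.
# Assumption: each value is stored as an 8-byte double; overhead added by
# KNIME's own table and cell objects is NOT included.

def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """Return the raw size in bytes of a dense rows x cols matrix."""
    return rows * cols * bytes_per_value


def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


raw = dense_matrix_bytes(8000, 8000)
print(f"raw matrix size: {to_mib(raw):.0f} MiB")

# Compare against a few candidate heap sizes (values passed to -Xmx, in MiB).
for heap_mib in (512, 1024, 3072):
    headroom = heap_mib - to_mib(raw)
    print(f"-Xmx{heap_mib}m leaves about {headroom:.0f} MiB of headroom")
```

Under these assumptions the raw values alone nearly fill a 512 MB heap, which leaves almost nothing for PCA or clustering to work with.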

Does this help?

Simon.

newflame · March 16, 2012, 10:28am

Thanks Simon, I gave it 3G and that did the trick!

Tim