Why does KNIME run a little slowly?


I am running a large data analysis in KNIME on Windows 7. Everything is working perfectly, but I was wondering why the calculations are taking so long (2 days) although the processor is not very busy (the light is just blinking and I am able to run many other programs) and only about 0.5 GB of RAM is being used (I got the value from the Task Manager). Is there any way for KNIME to use the full capacity of the machine and speed up the data analysis? I do run other programs like R on the same machine, and it happily consumes 2.5 GB of RAM.


Thanks all,


The amount of memory KNIME can use is controlled by the heap space allocation.

If you are on a 32-bit system you can set this to about 1 GB; on 64-bit systems you can use as much memory as you like (within reason).

You can change this value by editing the knime.ini file located in the KNIME directory. Change the -Xmx{number}m entry to -Xmx2g for 2 GB.
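
To see what the -Xmx setting actually does, here is a minimal standalone Java sketch (KNIME runs on the Java VM, but this is not KNIME code, and the class name HeapCheck is made up for illustration). It prints the heap ceiling next to the memory currently in use. The heap grows on demand, so the used figure can sit well below the ceiling.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper limit the JVM will try to use, governed by -Xmx.
        System.out.println("Max heap (MiB):  " + toMiB(rt.maxMemory()));
        // Heap the JVM has reserved from the operating system so far.
        System.out.println("Committed (MiB): " + toMiB(rt.totalMemory()));
        // Portion of the committed heap that is currently occupied.
        System.out.println("Used (MiB):      " + toMiB(rt.totalMemory() - rt.freeMemory()));
    }
}
```

Running it as `java -Xmx2g HeapCheck` should report a maximum of roughly 2 GB, while the used value stays small because the program allocates almost nothing.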





Thanks for the hint, Sam. I found the knime.ini file; the value is set to 1024, yet the program is only using 472 MB at the moment.

I am still trying to find a way to speed this up. The calculations I am doing are a lot of small processes (100 iterations of a decision tree for multiple datasets). I am not sure if that has to do with the RAM, but I am not happy about my machine using 1/4 of its power and taking a few days to do the calculations. Can you think of anything I can check to speed up the process?


Many thanks,


Are the nodes all in a linear path? Not all of the nodes take advantage of multi-core processors. So, for example, if you have a quad core and run only 1 thread at a time per node, and only 1 node at a time, you will use only around 25% of your CPU.

Let's say you have a workflow Node 1 --> Node 2 --> Node 3 --> end. Could you split the output of Node 1 into multiple parts and run Node 2 --> Node 3 on each part in parallel, then concatenate --> end instead?
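
The split / process / concatenate idea can be sketched outside KNIME in plain Java. This is only an illustration under assumptions: the class ChunkedRun and the method processRow are hypothetical stand-ins for the Node 2 --> Node 3 work, not anything from the KNIME API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedRun {
    // Hypothetical stand-in for the work "Node 2 --> Node 3" does on one row.
    static int processRow(int row) {
        return row * row;
    }

    // Splits rows into at most `parts` contiguous chunks of near-equal size.
    static List<List<Integer>> split(List<Integer> rows, int parts) {
        List<List<Integer>> chunks = new ArrayList<>();
        int size = (rows.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < rows.size(); start += size) {
            chunks.add(rows.subList(start, Math.min(start + size, rows.size())));
        }
        return chunks;
    }

    // Processes each chunk on its own thread, then concatenates in chunk order.
    static List<Integer> runParallel(List<Integer> rows, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (List<Integer> chunk : split(rows, parts)) {
                futures.add(pool.submit(() -> {
                    List<Integer> out = new ArrayList<>();
                    for (int row : chunk) {
                        out.add(processRow(row));
                    }
                    return out;
                }));
            }
            List<Integer> result = new ArrayList<>();
            for (Future<List<Integer>> f : futures) {
                result.addAll(f.get()); // waits for that chunk to finish
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            rows.add(i);
        }
        // One chunk per available core, as an assumption about a sensible default.
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(runParallel(rows, cores));
    }
}
```

Because the results are collected in the same order the chunks were submitted, the concatenated output keeps the original row order. This only helps when the rows can be processed independently of each other.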

I believe the Parallel Chunk nodes (from KNIME Labs) can help you automate this. I've attached an example.

As a general rule, I find that looping carries an overhead, so tasks that should be quick can take a while.



I see what's happening here. Yes, the processes are arranged in a linear fashion. I will try to split them into separate jobs and see if they run faster. Thanks.