How to increase maximum working threads for all nodes

Hi,

Can somebody help me increase maximum working threads for all nodes from 100 to 300?

I have about 26 Gigs of memory. So that won't be an issue. Thanks!

Roger

My understanding is the default is 16. 

File > Preferences > KNIME

"Maximum working threads for all nodes = XXX" 

Change the value here. 

You need to worry about CPU usage as well as memory, and you may see little to no benefit from increasing the thread count past a certain point.

When I enter more than 100 it says "Value must be an integer between 1 and 100". I want about 300 threads.

Unless your computer has more than 100 cores, it doesn't make sense to enter larger values. Everything will get slower instead because of scheduling overhead.

Actually, I just want to run 300 Python processes. I have done this outside of KNIME, just in Python. I wanted to replicate this in KNIME.

Running them all at the same time on the same machine is not going to be efficient.
Spawning the processes N at a time, where N is the number of cores, will usually give the best results, unless the processes are waiting on external sources such as web services.
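
A minimal sketch of the "N at a time" idea in plain Python, outside of KNIME. The names `CHILD_CODE`, `run_one` and `run_batched` are hypothetical, and the child job (counting lines in a file) is only a stand-in for whatever statistic the real processes compute. A thread pool is used purely to cap how many child processes are alive at once:

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-file job: a child Python process that prints a line count.
CHILD_CODE = "import sys; print(sum(1 for _ in open(sys.argv[1])))"


def run_one(path):
    """Launch one child process for `path` and return its parsed output."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, path],
        capture_output=True,
        text=True,
        check=True,  # raise if the child exits with a non-zero status
    )
    return int(result.stdout.strip())


def run_batched(paths, max_parallel=None):
    """Run one child process per path, at most `max_parallel` at a time."""
    if max_parallel is None:
        # Assumption: one child per core, as suggested above.
        max_parallel = os.cpu_count() or 1
    # Each worker thread blocks on one child, so the pool size is the cap
    # on concurrently running processes. Results keep the input order.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, paths))
```

Calling `run_batched(list_of_300_paths)` would still process all 300 files, just never more than `max_parallel` at once; raising `max_parallel` is an easy way to check whether more concurrency really helps on a given machine.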

These processes open files and run a statistic on each file. Not very CPU-intensive work. For me, the number of processes is roughly proportional to performance (the time it takes to complete). It does consume a lot of memory, but I have enough to run about 300 processes. So back to my original question: is there a way to increase the number of processes to 300? From the discussion, it looks like there is not?

The processes have to have at least one of the following factors limiting their execution speed:

- Memory, but you say you have plenty (but don't underestimate the importance of OS-specific file buffers etc.)
- External factors such as network latency and calls to other servers/services, but you say these are not involved.
- CPU
- IO

In the case of a CPU-bound job, spawning more processes to do more of the same is not going to have a positive effect on overall performance; it might even slow things down a bit due to task switching.

In the case of IO-bound jobs there is even more black magic involved, but in general you can observe that (especially on traditional rotating-platter hard disks) even having number-of-cores processes fighting over disk access already has a significant degrading effect on the overall runtime of your jobs. In fact, in my experience purely IO-bound jobs are best run sequentially rather than in parallel.

If you are running on Linux or a Mac, you can use standard sysadmin tools like top and iotop to see what is going on and roughly benchmark the process, and then tune back the number of cores that KNIME uses. Apply logic to the results.
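
For a rough benchmark from Python itself, something like the following sketch could complement top and iotop. It times the same batch of work at several concurrency levels; `benchmark` is a hypothetical helper, and `job` is assumed to be any function that handles one item (for example, one that launches a single child process for one file):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def benchmark(job, items, worker_counts):
    """Time `job` over `items` once per concurrency level.

    Returns a dict mapping each worker count to elapsed wall-clock seconds.
    """
    timings = {}
    for n in worker_counts:
        start = time.perf_counter()
        # At most n calls to `job` are in flight at any moment.
        with ThreadPoolExecutor(max_workers=n) as pool:
            list(pool.map(job, items))  # consume results to wait for all jobs
        timings[n] = time.perf_counter() - start
    return timings
```

Comparing, say, `benchmark(job, files, [1, 4, 16, 100])` shows where extra concurrency stops paying off. Bear in mind that repeated runs over the same files may be served from the OS file cache, so later measurements can look faster than a cold run would be.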

 
