We have created an Azure VM with Windows Server 2016 and installed KNIME Server on it. We have noticed that when using parallel processing, it creates far fewer chunks than, for instance, when I run the workflow on my laptop. What could the reasons for this be?
Do you use the automatic chunk count?
It determines the chunk count based on the system, especially its CPU.
The formula for automatic chunk count is
1.5 * #available Processors (rounded up)
So now the problem is, what is an available processor? It depends…
For example, on an Intel i5 CPU without Hyper-Threading, the number of available processors is equal to the number of cores.
On an Intel i7 CPU with Hyper-Threading, the number of available processors is equal to the number of cores * 2.
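As a rough illustration of the formula above, here is a minimal Java sketch. The class and method names are made up for this example and are not KNIME's actual code; it assumes that "rounded up" means the ceiling of 1.5 times the value returned by Runtime.availableProcessors().

```java
public class ChunkCountSketch {
    // Assumed reading of the formula quoted above:
    // chunks = ceil(1.5 * availableProcessors). Hypothetical helper, not KNIME code.
    static int autoChunkCount(int availableProcessors) {
        return (int) Math.ceil(1.5 * availableProcessors);
    }

    public static void main(String[] args) {
        // Ask the JVM how many processors it believes it can use.
        int procs = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + procs);
        System.out.println("automatic chunk count: " + autoChunkCount(procs));
    }
}
```

Under this reading, 4 reported processors give 6 chunks, 8 give 12, and a single reported processor gives 2.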
However, for Azure I'm not sure whether Java obtains the correct number, as the Java documentation states:
Returns the number of processors available to the Java virtual machine.
This value may change during a particular invocation of the virtual machine. Applications that are sensitive to the number of available processors should therefore occasionally poll this property and adjust their resource usage appropriately.
see here: https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#availableProcessors() .
So it could be that Azure scales in such a way that the number of available processors changes during execution, or that the number of available processors is simply unknown to the JVM, in which case it would be 1 and we would thus obtain 2 chunks.
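To check on the VM whether the reported value really is low or changes over time, one could poll it a few times, as the documentation suggests. This is only a sketch: the class name, the helper and the sampling interval are arbitrary choices for illustration, and it has to be run with the same Java runtime that the KNIME installation uses to be meaningful.

```java
public class ProcessorPoll {
    // Hypothetical helper: samples Runtime.availableProcessors() several times,
    // pausing between samples, and returns every value that was seen.
    static int[] pollProcessors(int samples, long pauseMillis) {
        int[] seen = new int[samples];
        for (int i = 0; i < samples; i++) {
            seen[i] = Runtime.getRuntime().availableProcessors();
            if (i < samples - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Keep the interrupt flag and continue sampling without pausing.
                    Thread.currentThread().interrupt();
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Five samples, one second apart (arbitrary values).
        for (int n : pollProcessors(5, 1000)) {
            System.out.println("available processors: " + n);
        }
    }
}
```

If the printed value is constant but smaller than the number of vCPUs, the JVM is seeing fewer processors than the VM offers; if it varies, the count is changing during execution.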
Could you provide some information: number of cores, what kind of cpu, number of rows of the input table, and how many parallel chunks you get?
I am working with Willem on this KNIME project and will try to answer your questions.
((get-counter "\Processor(*)\% idle time").countersamples | select instancename).length - 1 = 4
4 x Intel E5-2673 v4 @ 2.3 GHz
Standard D4s v3 (4 vcpus, 16 GiB memory)
We have split the processing by number of rows using parallel chunks: 3 x 15,000 and 1 x 235,000.
5 chunks are created.
To process 533K rows in total, it takes nearly 3 hours. On Willem's i7 PC @ 1.8 GHz, he gets 11-12 chunks and it takes 45 minutes!
Sorry for the late reply.
So, concerning the CPU: the Intel E5-2673 v4 has 40 virtual cores (20 cores + Hyper-Threading); thus, for a single CPU you should already get ~60 chunks.
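Assuming the formula quoted earlier in this thread (1.5 * available processors, rounded up) is what is actually applied, the reported chunk counts can be mapped back to the processor count the JVM would have reported. The sketch below does that by brute force; the class and method names are hypothetical and the upper bound of 128 processors is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCountInverse {
    // Assumed formula from this thread: ceil(1.5 * availableProcessors).
    static int autoChunkCount(int availableProcessors) {
        return (int) Math.ceil(1.5 * availableProcessors);
    }

    // Returns every processor count (1..128) that would yield the observed chunk count.
    static List<Integer> processorsFor(int observedChunks) {
        List<Integer> matches = new ArrayList<>();
        for (int p = 1; p <= 128; p++) {
            if (autoChunkCount(p) == observedChunks) {
                matches.add(p);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Chunk counts mentioned in this thread: 5 (Azure VM), 11-12 (i7 PC), ~60 (full CPU).
        for (int chunks : new int[] {5, 11, 12, 60}) {
            System.out.println(chunks + " chunks <- processors " + processorsFor(chunks));
        }
    }
}
```

On that assumption, 5 chunks would correspond to 3 reported processors, 11-12 chunks to 7-8, and 60 chunks to 40. The 4 vCPUs of the D4s v3 instance would give 6 chunks.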
Unfortunately, it seems that the code returns a wrong number of available processors on virtual machines (see for example https://stackoverflow.com/questions/55596774/runtime-getruntime-availableprocessors-returning-1-even-though-many-cores-av ; in that case it's AWS). I've created a ticket to track the error internally and to investigate a possible solution.
For now, the workaround is to adjust the workflow and set the number of chunks manually. The drawback is that you always have to adjust it when you run the workflow on another machine.