Azure VM - KNIME Server

#1

We have created an Azure VM running Windows Server 2016 and installed KNIME Server on it. We have noticed that parallel processing creates far fewer chunks on the VM than when I run the same workflow on my laptop. What could be the reason for this?


#2

Hi,

Do you use the automatic chunk count?
It derives the chunk count from the system, in particular from its CPU.
The formula for the automatic chunk count is

1.5 * #available Processors (rounded up)

So the question is: what counts as an available processor? It depends…
For example, on an Intel i5 CPU without Hyper-Threading, the number of available processors equals the number of cores.
On an Intel i7 CPU with Hyper-Threading, the number of available processors equals the number of cores * 2.
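
To make the arithmetic concrete, here is a minimal sketch of the formula above in Java. The class and method names are made up for illustration and are not taken from the KNIME source; it only assumes the formula is exactly "1.5 * available processors, rounded up".

```java
// Hypothetical sketch of the formula quoted above; not KNIME's actual code.
class ChunkCountSketch {

    // ceil(1.5 * available processors)
    static int autoChunkCount(int availableProcessors) {
        return (int) Math.ceil(1.5 * availableProcessors);
    }

    public static void main(String[] args) {
        // Example processor counts: 1, 4 and 8 logical processors.
        int[] examples = {1, 4, 8};
        for (int processors : examples) {
            System.out.println(processors + " processors -> "
                    + autoChunkCount(processors) + " chunks");
        }
    }
}
```

Under that assumption, 1 processor gives 2 chunks, 4 processors give 6 chunks, and 8 logical processors (4 cores with Hyper-Threading) give 12 chunks.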

However, for Azure I’m not sure whether Java obtains the correct number, as the Java documentation states:

Returns the number of processors available to the Java virtual machine.
This value may change during a particular invocation of the virtual machine. Applications that are sensitive to the number of available processors should therefore occasionally poll this property and adjust their resource usage appropriately.

See https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#availableProcessors()
So it could be that Azure scales in such a way that the number of available processors changes during execution, or that the number of available processors is unknown to the JVM altogether, in which case it would be 1 and we would obtain 2 chunks.
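
One way to check what the JVM actually sees on the VM is to poll the value a few times, as the documentation suggests. This is only a rough sketch: the class name, the sample count and the pause are arbitrary choices, and it would need to run with the same Java installation that KNIME Server uses to be meaningful.

```java
import java.util.Arrays;

// Hypothetical helper: samples Runtime.availableProcessors() several times.
class ProcessorPollSketch {

    // Reads the value `samples` times, pausing `pauseMillis` between reads.
    static int[] sample(int samples, long pauseMillis) {
        int[] seen = new int[samples];
        for (int i = 0; i < samples; i++) {
            seen[i] = Runtime.getRuntime().availableProcessors();
            if (i < samples - 1) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop waiting between reads.
                    Thread.currentThread().interrupt();
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Five reads, one second apart; differing values would mean the count changes at runtime.
        System.out.println(Arrays.toString(sample(5, 1000)));
    }
}
```

If every sample shows the same number, the changing-processor-count explanation becomes less likely.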

Could you provide some information: the number of cores, the kind of CPU, the number of rows in the input table, and how many parallel chunks you get?

Cheers,
Moritz


#3

Hi Moritz

I am working with Willem on this KNIME project and will try to answer your questions.

#Available Processors
((Get-Counter "\Processor(*)\% Idle Time").CounterSamples | Select-Object InstanceName).Length - 1 = 4

CPU
4 x Intel E5-2673 v4 @ 2.3 GHz
Standard D4s v3 (4 vCPUs, 16 GiB memory)

#ROWS
We have split the processing by row count, using || chunks for 3 x 15,000 rows and 1 x 235,000 rows.

|| Chunks

5 are created.

Processing 533K rows in total takes nearly 3 hours… On Willem's i7 PC @ 1.8 GHz he gets 11-12 chunks and it takes 45 minutes!

Cheers
Graham
