Chunk size for streaming

l.thomas · March 12, 2018, 4:42pm

Hello,
I found the description of the chunk size parameter for streaming execution a bit confusing when it states that “Larger chunk values yield better runtime”.
As I understand it, the data is streamed between nodes in chunks of rows, with the number of rows per chunk set by this parameter.
In my case, I am using streaming with the image processing nodes, and I get the best execution time by setting the chunk size to 1, so that each row is streamed individually.
Would that not also be the case with a classical table of strings and numbers, for instance?

Not really related to the previous question, but the job manager of individual nodes now also has a “Test for streaming and distributed Processing” option. Is there any information about that?

Thanks!

gab1one · March 12, 2018, 5:00pm

Datasets with “small” rows and a short processing time per row profit from a larger chunk size. For datasets with “large” rows, where more time is spent processing each row, a smaller chunk size is better. BTW, how large are your images?
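
To make the trade-off concrete, here is a rough toy model in Python. It is only a sketch and not how KNIME actually implements streaming: it assumes every node pays a fixed hand-off cost per chunk plus a fixed cost per row, and that all nodes in the pipeline cost the same. The function name `pipeline_time` and all the numbers are made up for illustration.

```python
import math

def pipeline_time(n_rows, chunk_size, time_per_row, overhead_per_chunk, n_stages=2):
    """Estimated wall time for a pipeline of n_stages equal-cost nodes (toy model)."""
    # The last chunk may be partial; it is treated as full, so this is an upper bound.
    n_chunks = math.ceil(n_rows / chunk_size)
    # Time one node spends on one full chunk, including the fixed hand-off cost.
    stage_time = overhead_per_chunk + chunk_size * time_per_row
    # Classic pipeline estimate: the first chunk passes through every stage,
    # then one further chunk finishes per stage_time.
    return (n_chunks + n_stages - 1) * stage_time

# Hypothetical numbers: 100 rows, 10 ms overhead per chunk.
for label, t_row in [("cheap rows", 0.001), ("expensive rows", 1.0)]:
    for chunk in (1, 10, 100):
        total = pipeline_time(100, chunk, t_row, 0.01)
        print(f"{label:>14}, chunk={chunk:>3}: {total:8.2f} s")
```

In this model, cheap rows favor large chunks because the per-chunk overhead dominates, while expensive rows favor small chunks because the downstream node can start working sooner instead of waiting for a whole large chunk.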

l.thomas · March 13, 2018, 8:16am

Thanks, Gabriel, for the fast reply!
When I use the original image, i.e. without cropping, it is 2048x2048 pixels at 16 bits, so 8 MB per image file.
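
For reference, the size estimate can be checked with a quick calculation. This assumes a single-channel, uncompressed image; the variable names are just for illustration.

```python
width, height = 2048, 2048   # pixels
bits_per_pixel = 16          # single channel, 16-bit

# Raw pixel data only; file headers and compression are ignored.
size_bytes = width * height * bits_per_pixel // 8
size_mib = size_bytes / (1024 * 1024)

print(f"{size_bytes} bytes = {size_mib:.1f} MiB per image")
```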