How does chunk size work in the Python Script node (Python Scripting)?

Hi everyone,

I'm working with the "Python Scripting" node, but I don't understand how the "chunk size" setting in the "Options" tab of the dialog window works.

Since the scikit-learn estimator I'm using only provides fit and doesn't support partial_fit, I want to figure out why the result changes when I set the chunk size to 5, 50, and 200.

The DataFrame has 170 rows, and at the end of the process I always get the same number of rows.

I attached all the images.

Thanks for the attention.

 

Hi EmanueleNeo,

The chunk size defines how many rows get pushed into your Python node at a time. Be careful with it if the data needs to be processed all at once. In your case, a smaller chunk size means that you are training your model on less data in each iteration, which leads to a lower overall accuracy. Does that make sense?
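To make this concrete, here is a minimal sketch in plain Python. It assumes the script really is executed once per chunk, which may not hold for every Python node. `MeanModel`, `iter_chunks`, `train_per_chunk` and `train_incrementally` are hypothetical names made up for the illustration; they are not part of KNIME or scikit-learn. The toy model just "learns" the mean of the values it sees, with a `fit` that starts from scratch and a `partial_fit` that accumulates.

```python
from itertools import islice


def iter_chunks(rows, chunk_size):
    """Yield consecutive lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


class MeanModel:
    """Toy stand-in for an estimator (hypothetical, not a scikit-learn class)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def fit(self, values):
        # A plain fit() discards whatever was learned before.
        self.total = float(sum(values))
        self.count = len(values)
        return self

    def partial_fit(self, values):
        # An incremental fit keeps accumulating across calls.
        self.total += sum(values)
        self.count += len(values)
        return self

    @property
    def mean(self):
        return self.total / self.count


def train_per_chunk(rows, chunk_size):
    """Call fit() on every chunk; only the last chunk survives."""
    model = MeanModel()
    for chunk in iter_chunks(rows, chunk_size):
        model.fit(chunk)
    return model


def train_incrementally(rows, chunk_size):
    """Call partial_fit() on every chunk; all rows contribute."""
    model = MeanModel()
    for chunk in iter_chunks(rows, chunk_size):
        model.partial_fit(chunk)
    return model


rows = list(range(170))  # stand-in for the 170-row table

for size in (5, 50, 200):
    refit = train_per_chunk(rows, size)
    incremental = train_incrementally(rows, size)
    print(size, refit.count, refit.mean, incremental.count, incremental.mean)
```

With chunk sizes 5 and 50, the model trained with `fit` has only seen the rows of the last chunk, while a chunk size of 200 covers all 170 rows in one go. The number of rows coming out stays 170 in every case, because every row still passes through the node.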

Best,

Jeany

Hi Emanuele,

Also, are you using the Python nodes from the Community Contributions or the ones in "Scripting"? Depending on which ones you use, chunk size may have a different meaning. In the case of "Scripting", it is only a technical parameter that shouldn't influence your results at all (the final pandas DataFrame in Python will always be the same). If it still makes a difference, it would be great if you could:

* Double-check whether your algorithm has some randomization in it. Can you maybe set a seed?

* Provide a small example workflow. This would make it much easier for us to debug the problem.
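On the seed point, here is a small sketch of what fixing a seed buys you. It uses only Python's standard `random` module, and `shuffled_split` is a hypothetical helper written for this post. With scikit-learn, the usual equivalent is passing `random_state` to the estimator or to `train_test_split`, provided the function you use accepts it.

```python
import random


def shuffled_split(n_rows, n_test, seed=None):
    """Shuffle the row indices and split them into train and test parts.

    Hypothetical helper for illustration. With seed=None the shuffle
    differs from run to run; with a fixed seed it is reproducible.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return indices[n_test:], indices[:n_test]


# Two runs with the same seed produce exactly the same split.
train_a, test_a = shuffled_split(170, 34, seed=42)
train_b, test_b = shuffled_split(170, 34, seed=42)
print(test_a == test_b)

# Without a seed, the split will usually differ between runs,
# so a model trained on it can score differently each time.
train_c, test_c = shuffled_split(170, 34)
print(len(train_c), len(test_c))
```

If the results stop changing once the seed is fixed, the differences came from the randomization and not from the chunk size.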

Best,


Christian