Python Scripting Node - Stop at 70%

trj · June 8, 2020, 2:57pm

Hello,

My Python script runs to completion both in debug mode and in the node (I checked with comments to follow the progress). But nothing comes out: the node is stuck at 70%.
So the content is ready in the DataFrame, but nothing arrives at the output port.

Can anybody help?

Best
Jerome

scapuzzi · June 9, 2020, 2:12am

Can you share more information? Don’t forget you always need to finish the code with:

output_table = df

(“df” is whatever you called the dataframe. It is called “input_table” by default.)
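
As a minimal sketch of that pattern, assuming the node convention described above (the node provides `input_table` and reads `output_table` back): here `input_table` is faked with a small made-up DataFrame so the snippet also runs outside KNIME, and the column names are hypothetical.

```python
import pandas as pd

# Inside KNIME the node supplies input_table; it is faked here
# (hypothetical data) so the sketch also runs as a plain script.
input_table = pd.DataFrame({"sample": ["a", "b", "c"], "value": [1.0, 2.0, 3.0]})

# Work on a copy so the original input stays untouched.
df = input_table.copy()
df["value_doubled"] = df["value"] * 2

# Last step: hand the result back to the node's output port.
output_table = df
```

If the final assignment is missing, the node has nothing to return, however much work the script did before that point.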

trj · June 10, 2020, 6:45am

The code does end with output_table = df.

In the end I let the node run for hours and it raised an error:

Execute failed: The requested buffersize during serialization exceeds the maximum buffer size. Please consider decreasing the ‘Rows per chunk’ parameter in the ‘Options’ tab of the configuration dialog.

It was set to 500’000 and I tried reducing it to 5’000. Same error…

My computer has 64 GB of memory, so it’s not about memory.

Any other tips?

mlauber71 · June 10, 2020, 2:05pm

@trj can you tell us more about the data you are using initially (size, number of rows)? Have you tried it with a fraction of the data to see if it works at all?
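
One way to run such a test, sketched under the assumption that the data is already in a pandas DataFrame: cut it down to its leading rows and columns before assigning it to the output. The helper name `take_fraction` and the sample sizes are hypothetical.

```python
import numpy as np
import pandas as pd

def take_fraction(df: pd.DataFrame, row_frac: float = 1.0, col_frac: float = 1.0) -> pd.DataFrame:
    """Return the leading fraction of rows and columns (at least one of each)."""
    n_rows = max(1, int(len(df) * row_frac))
    n_cols = max(1, int(df.shape[1] * col_frac))
    return df.iloc[:n_rows, :n_cols]

# Hypothetical stand-in for the real data.
full = pd.DataFrame(np.arange(20 * 50).reshape(20, 50))
small = take_fraction(full, row_frac=0.5, col_frac=0.5)
print(full.shape, "->", small.shape)
```

If the node finishes with the reduced frame but not with the full one, the problem is more likely the size or shape of the data than the script itself.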

These are the things that come to my mind to fix this:

Store the data from within Python as a Parquet file and then read it back in KNIME (not the most elegant way, but it may work) *1)
When you define the output_table statement at the end, try writing it like this:

output_table = df.copy()

There have been instances when this .copy() thing has worked. Not sure why.
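
The first suggestion could look roughly like the sketch below. It assumes pandas with a Parquet engine (pyarrow or fastparquet) is installed in the Python environment the node uses; the helper name `write_for_knime` is hypothetical, and the file would then be read with a Parquet reading node on the KNIME side.

```python
import pandas as pd

def write_for_knime(df: pd.DataFrame, path: str) -> None:
    """Write df to a Parquet file that can be read back outside the script."""
    out = df.copy()
    # Some pandas/engine versions reject non-string column names,
    # so cast them to strings to be safe.
    out.columns = [str(c) for c in out.columns]
    # index=False keeps the pandas index out of the file.
    out.to_parquet(path, index=False)

# Example call (the path is a placeholder):
# write_for_knime(df, "result.parquet")
```

This only bypasses the transfer through the output port; the table still has the same number of rows and columns once it is read back.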

*1)

trj · June 10, 2020, 2:50pm

@mlauber71 Thanks for your answer and the various directions to explore.
But saving to a file and reading it back does not work either: too many values for KNIME.

Indeed, I think this is a limitation of KNIME regarding the number of columns.
With a subset it works well, and I needed to transpose the data in order to reduce the number of columns.

My dataset has 48 rows and 3’000’000 columns. That was the issue.
I tried to export the second DataFrame column as a “list”, but then KNIME can no longer handle it.

My solution (I’m sure I can find something better):

Create the DataFrame and transpose it
GroupBy to create a list (failed because too many values)
Use only Python nodes to do all the manipulation until the minimum number of parameters is reached.
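
The first and last steps can be sketched roughly as follows. The sizes are scaled down (48 rows by 1,000 columns instead of 3,000,000), the column names are made up, and the `mean`/`std` summary is only a hypothetical placeholder for whatever manipulation the project actually needs.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: few rows, very many columns (scaled down here).
n_rows, n_cols = 48, 1_000
rng = np.random.default_rng(0)
wide = pd.DataFrame(
    rng.normal(size=(n_rows, n_cols)),
    columns=[f"feature_{i}" for i in range(n_cols)],
)

# Step 1: transpose so the many columns become rows.
tall = wide.T
tall.columns = [f"obs_{i}" for i in range(n_rows)]

# Last step: keep reducing inside Python, so that only a compact
# summary (one row per original column) leaves the node.
summary = (
    pd.DataFrame({"mean": tall.mean(axis=1), "std": tall.std(axis=1)})
    .rename_axis("feature")
    .reset_index()
)

output_table = summary
print(wide.shape, "->", tall.shape, "->", output_table.shape)
```

The point of the design is that the table crossing from Python into KNIME is long and narrow rather than short and extremely wide.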

So I found KNIME’s limit when it comes to manipulating a huge amount of data.

I’ll have to use the KNIME Python node to keep going with that project, and use KNIME as an orchestrator and visualisation tool to show the process.

Thanks all for the support!
Best
Jerome

scapuzzi · June 10, 2020, 3:10pm

That link is a 404. Seems like an interesting topic - is there a live link?

mlauber71 · June 10, 2020, 3:57pm

There seems to be a problem with the KNIME hub at the moment. I will attach the workflow here. I hope the hub will be back soon.

kn_example_python_read_parquet_file.knwf (258.3 KB)

edit: the hub seems to be coming back. Sometimes if you press reload the content will re-appear. I hope this will be fixed soon.
