Hello,
I’m trying to manage memory usage when converting data from the Python Script (Labs) node to a KNIME table. Converting the pandas DataFrame to PyArrow was causing memory issues, so I’m skipping the DataFrame step altogether.
I’ve tried several ways of outputting the PyArrow table, but I always get the following error:
Executing the Python script failed: Traceback (most recent call last):
  File "", line 190, in
  File "/home/ubuntu/knime/configuration/org.eclipse.osgi/673/0/.cp/src/main/python/knime_arrow_table.py", line 344, in append
    batch = ArrowBatch(data, sentinel)
  File "/home/ubuntu/knime/configuration/org.eclipse.osgi/673/0/.cp/src/main/python/knime_arrow_table.py", line 109, in __init__
    raise ValueError("Can only create a Batch with data")
ValueError: Can only create a Batch with data
The approach that seems most efficient is to build a list of PyArrow arrays (one per column) and apply a schema. I’ve also tried appending the data as record batches, chunked arrays and Python dicts, but I get the same error.
I’ve included a minimal version of the Python Script (Labs) node. It is much smaller than the original, but I’ve kept parts of the consume() function that may seem unnecessary, in case the solution lies in how the HTTP response data is returned. (Maybe it can stay in PyArrow format, rather than going from PyArrow to a Python dict and back to PyArrow?)
As you’ll see at line 190 of the script, converting the data to a pandas DataFrame (which is converted to PyArrow behind the scenes) and outputting it as a KNIME table works fine, so I must be going wrong with PyArrow somewhere.
Thanks in advance.