Python Knio write error

Hi, I tried the new Python Labs node and have an issue when writing tables back:

import knime_io as knio
import pyarrow as pa
# This example script simply outputs the node's input table.
df = knio.input_tables[0].to_pandas()
df = df[['Cat','CustomerID']]
ar = pa.Table.from_pandas(df)
# does not work
knio.output_tables[0] = knio.write_table(ar)

# does work
# knio.output_tables[0] = knio.write_table(knio.input_tables[0])

Inside the node's dialog both variants work, but when I close the node and execute the workflow, the first one throws an error. It just tells me an exception occurred in the Python kernel. Any ideas? Thanks

@Daniel_Weikert could you give us more details, for example what the error message says? Even better, could you provide a workflow that demonstrates the error?

Two possibilities: date and time variables, where there seem to be problems.

Or might there be an issue with RowID and index in pandas?

Thanks a lot for your comments @mlauber71 .
I made sure to remove any datetime columns and reset the index in the pandas df.
It works smoothly as long as I have the node open. Only when running the workflow it fails.
Error is
Execute failed: An exception occured while running the Python kernel. See log for details.
but the KNIME log does not show anything. Which log is this message pointing to?

Are the new nodes a big performance boost? Just wondering, because the old node works fine with this two-line test script.

@Daniel_Weikert could you check your Python setup and maybe try a fresh one, following the KNIME Python Guide or the provided YAML files?

Logs in debug mode:

Also you might create a minimal workflow that could be checked.

Hi @Daniel_Weikert,

could you try to run

knio.output_tables[0] = knio.write_table(df)

instead of

ar = pa.Table.from_pandas(df)
knio.output_tables[0] = knio.write_table(ar)

and see if that works? KNIME performs some additional type handling when converting between Pandas and PyArrow internally, but when you do it manually KNIME cannot do that for you. So that could be the culprit.

A minimal workflow using the same data types (but maybe randomized data) as your real workflow would help us to reproduce and understand the problem better.



Thanks a lot @carstenhaubold
That worked. So it is never required to convert the dataframe back, even if it was converted to pandas after reading? Will the output writer always accept a pandas DataFrame?

Yes, since we allow reading input tables as either PyArrow or pandas, we also accept both of those types as output.

May I ask which data types are inside your data frame? Could you show us the output of print(df.dtypes)?

The dataframe had several types including datetime, float, and category.
But the output I converted back to PyArrow contained only two categorical columns, and with those alone it already failed.

Thanks for the feedback, we'll have a look at categorical columns!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.