Python node differing batch size error

I’ve built a hacky Python row filter that seems to work fine for smaller tables; however, for a somewhat bigger table (~76k rows × 300 columns) I run into a batching error.

The code is super simple:

import knime.scripting.io as knio

# Filter out rows where any required column contains the
# missing-value sentinel -9999999.
if len(knio.flow_variables["required_columns"]) != 0:

    df = knio.input_tables[0].to_pandas()

    # Keep rows where none of the required columns equal the sentinel.
    mask = ~df[knio.flow_variables["required_columns"]].isin([-9999999]).any(axis=1)

    knio.output_tables[0] = knio.Table.from_pandas(df[mask])
else:
    # No required columns configured: pass the input through unchanged.
    knio.output_tables[0] = knio.input_tables[0]

An exception was raised by the Python Proxy. Return Message:
Traceback (most recent call last):
  File "/home/user/software/knime_5.2.5/bundling/envs/org_knime_pythonscripting/lib/python3.11/site-packages/py4j/clientserver.py", line 617, in _call_proxy
    return_value = getattr(self.pool[obj_id], method)(*params)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.scripting.nodes_5.2.4.v202405171011/src/main/python/_knime_scripting_launcher.py", line 237, in closeOutputs
    raise e
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.scripting.nodes_5.2.4.v202405171011/src/main/python/_knime_scripting_launcher.py", line 231, in closeOutputs
    self._backends.tear_down_arrow(flush=check_outputs)
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.scripting.nodes_5.2.4.v202405171011/src/main/python/knime/scripting/_backend.py", line 345, in tear_down_arrow
    b.tear_down_arrow(flush and is_active_backend)
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.scripting.nodes_5.2.4.v202405171011/src/main/python/knime/scripting/_backend.py", line 186, in tear_down_arrow
    self._write_all_tables()
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.scripting.nodes_5.2.4.v202405171011/src/main/python/knime/scripting/_backend.py", line 173, in _write_all_tables
    table._write_to_sink(sink)
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.arrow_5.2.0.v202311290857/src/main/python/knime/_arrow/_table.py", line 321, in _write_to_sink
    sink.write(batch)
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.arrow_5.2.0.v202311290857/src/main/python/knime/_arrow/_backend.py", line 392, in write
    self._write_batch(data)
  File "/home/user/software/knime_5.2.5/plugins/org.knime.python3.arrow_5.2.0.v202311290857/src/main/python/knime/_arrow/_backend.py", line 397, in _write_batch
    raise ValueError(
ValueError: Tried writing a batch after a batch with a different size than the first batch. Only the last batch of a table can have a different size than the first batch.
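For context, a pyarrow Table can consist of record batches of unequal length, for example after concatenation or filtering, and the error message says the sink only tolerates a shorter batch at the very end. A minimal, KNIME-independent illustration of such a layout (pure pyarrow, constructed artificially here):

import pyarrow as pa

# concat_tables keeps the chunks of its inputs separate, so the
# resulting table carries record batches of different sizes.
t = pa.concat_tables([
    pa.table({"x": list(range(5))}),  # batch of 5 rows
    pa.table({"x": list(range(3))}),  # batch of 3 rows
    pa.table({"x": list(range(4))}),  # batch of 4 rows
])
print([len(b) for b in t.to_batches()])  # [5, 3, 4]

A layout like this violates the stated constraint: the 3-row batch differs in size from the first batch but is not the last one.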

Interestingly, the error only occurs in the pass-through case, which is equivalent to the default example script:

import knime.scripting.io as knio

# This example script simply outputs the node's input table.

knio.output_tables[0] = knio.input_tables[0]

Forcing a conversion to a pandas DataFrame and back circumvents whatever is happening in the batching process:

import knime.scripting.io as knio

# Round-trip the input table through pandas before outputting it.

knio.output_tables[0] = knio.Table.from_pandas(knio.input_tables[0].to_pandas())
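Presumably this works because Table.from_pandas materializes the whole DataFrame into a fresh Arrow table with a single (and therefore trivially uniform) batch layout. A pure-pyarrow sketch of that effect, outside of KNIME (assuming pyarrow and pandas are installed; the uneven layout is again constructed artificially):

import pyarrow as pa

# Build a table whose record batches have unequal lengths.
t = pa.concat_tables([
    pa.table({"x": list(range(5))}),
    pa.table({"x": list(range(3))}),
])
print([len(b) for b in t.to_batches()])  # [5, 3]

# Round-tripping through pandas collapses everything into one batch.
rebuilt = pa.Table.from_pandas(t.to_pandas())
print([len(b) for b in rebuilt.to_batches()])  # [8]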

From my understanding, KNIME uses Arrow internally, so, consistent with that, converting to pyarrow and back does not fix the issue:

import knime.scripting.io as knio

# Round-trip the input table through pyarrow; this does not help.

knio.output_tables[0] = knio.Table.from_pyarrow(knio.input_tables[0].to_pyarrow())
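That matches the hypothesis that to_pyarrow/from_pyarrow preserves the input’s existing batch layout, while the pandas round trip rebuilds it. If so, printing the batch sizes should reveal an irregular batch in the middle, and pyarrow’s combine_chunks() (which merges all batches into one) might be a cheaper workaround than converting through pandas. An untested sketch:

import knime.scripting.io as knio

t = knio.input_tables[0].to_pyarrow()

# Diagnostic: per the error message, only the last entry printed
# here may differ from the first.
print([len(b) for b in t.to_batches()])

# Possible workaround: merge all record batches into a single one,
# which trivially satisfies the uniform-batch-size constraint.
knio.output_tables[0] = knio.Table.from_pyarrow(t.combine_chunks())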

Another minor finding is that Python nodes seem to stay editable in linked components.

Hi @Ellison ,
Thanks for reporting this. Our developers are aware of this and will be working on it.

Thanks,
Sanket

