How to use Apache Arrow in Python scripting?

Previously I was using FlatBuffers or CSV to transfer the data.
But the data serialization takes too much time, as I am always processing more than 10 million rows.

I tried to switch the serialization to Apache Arrow, but ran into this problem.

I’m using Windows 10, KNIME 3.5.2, Anaconda3, and Apache Arrow 0.9.

When I open the Python (1->1) node, it says Python failed to start and throws this error:

Traceback (most recent call last):
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.5.0.v201712011355\py\PythonKernel.py", line 1376, in <module>
    kernel.run()
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.5.0.v201712011355\py\PythonKernel.py", line 390, in run
    self.run_command(self.read_string())
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.5.0.v201712011355\py\PythonKernel.py", line 398, in run_command
    handler.execute(self)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.5.0.v201712011355\py\PythonKernel.py", line 1139, in execute
    data_frame = kernel.bytes_to_data_frame(data_bytes)
  File "C:\Program Files\KNIME\plugins\org.knime.python2_3.5.0.v201712011355\py\PythonKernel.py", line 412, in bytes_to_data_frame
    column_names = self._serializer.column_names_from_bytes(data_bytes)
  File "C:\Program Files\KNIME\plugins\org.knime.python2.serde.arrow_3.5.0.v201712122252\py\ArrowSerialization.py", line 88, in column_names_from_bytes
    deserialize_data_frame(path)
  File "C:\Program Files\KNIME\plugins\org.knime.python2.serde.arrow_3.5.0.v201712122252\py\ArrowSerialization.py", line 281, in deserialize_data_frame
    stream_reader = pyarrow.RecordBatchStreamReader(f)
  File "C:\Anaconda3\envs\py36_knime\lib\site-packages\pyarrow\ipc.py", line 58, in __init__
    self._open(source)
  File "ipc.pxi", line 260, in pyarrow.lib._RecordBatchReader._open
  File "error.pxi", line 77, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Old metadata version not supported

Hi zzenx,

Downgrading Arrow to version 0.7 should solve the problem:
conda install -n py36_knime -c conda-forge pyarrow=0.7.0
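
After the downgrade, it may help to confirm that the interpreter KNIME points at really picked up the 0.7 release. The following is only a minimal sketch, assuming it is run with the Python from the py36_knime environment; the helper names (major_minor, matches_expected) and the EXPECTED constant are made up for illustration and are not part of KNIME or pyarrow:

```python
# Sketch: check whether the installed pyarrow is a 0.7.x release.
# EXPECTED and both helper functions are illustrative assumptions.
EXPECTED = (0, 7)


def major_minor(version):
    """Return the (major, minor) pair of a version string such as '0.7.0'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def matches_expected(version, expected=EXPECTED):
    """True if the version string has the expected major and minor numbers."""
    return major_minor(version) == expected


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:
        print("pyarrow is not installed in this environment")
    else:
        if matches_expected(pyarrow.__version__):
            print("pyarrow", pyarrow.__version__, "matches 0.7.x")
        else:
            print("pyarrow", pyarrow.__version__, "does not match 0.7.x")
```

If the script reports a different version, the conda command above was probably applied to another environment than the one configured in the KNIME Python preferences.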

Marcel

Thanks, it's really working.
