Apache Arrow Serialization Error

Hi All!

I am trying to increase the speed of my Python nodes since it seems that there is a bottleneck with getting the data out of the Python node.

I’ve seen forum discussions about switching the serialization to Apache Arrow. I have pyarrow version 0.17 installed in my Python 3 Anaconda environment (which I have configured in KNIME). After selecting Apache Arrow serialization, I get the following error when I open the Python node:

org.knime.python2.kernel.PythonIOException: 'pyarrow.lib.ChunkedArray' object has no attribute 'name'

I’ve seen pyarrow version requirements in old forum posts and am wondering what the current requirements are. Any help is greatly appreciated!

Hi @TardisPilot,

KNIME currently uses Apache Arrow version 0.11, so you will need to downgrade the pyarrow package in your environment to that version (e.g., via conda install -n your_environment pyarrow=0.11). The current version requirements are listed in the py36_knime.yml file in the Python installation guide.
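After downgrading, a quick sanity check along the following lines might help confirm that the environment really picked up the expected version. This is only a minimal sketch: the helper names (matches_required_series, installed_pyarrow_version) are hypothetical and not part of KNIME or pyarrow, and the 0.11 requirement is simply the one mentioned above, so adjust it if the installation guide lists something different.

```python
# Hypothetical helpers for illustration; not part of KNIME or pyarrow.
# The required series (0.11) is taken from the reply above and is an assumption
# that should be checked against the current installation guide.
REQUIRED_SERIES = (0, 11)


def matches_required_series(version, required=REQUIRED_SERIES):
    """Return True if the leading major.minor part of `version` equals `required`."""
    parts = version.split(".")
    try:
        leading = tuple(int(p) for p in parts[:len(required)])
    except ValueError:
        # Non-numeric components (e.g. "abc") cannot match.
        return False
    return leading == required


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if pyarrow is missing."""
    try:
        import pyarrow
    except ImportError:
        return None
    return pyarrow.__version__


if __name__ == "__main__":
    version = installed_pyarrow_version()
    if version is None:
        print("pyarrow is not installed in this environment")
    elif matches_required_series(version):
        print("pyarrow", version, "matches the expected 0.11 series")
    else:
        print("pyarrow", version, "does not match the expected 0.11 series")
```

Run it with the Python interpreter of the same conda environment that is configured in KNIME; otherwise it reports the version of a different environment.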

Sorry to hear that you are experiencing performance issues with the Python nodes. We are currently working on some improvements in this respect. We are also planning to upgrade Apache Arrow to a more recent version soon.

Marcel

Hi @MarcelW,

Thank you for the clarification and the link! I really appreciate it.
