Python numbers are wrong

jenniferh · August 27, 2018, 9:51am

Dear all,
today I tried to evaluate a Keras model in KNIME within a Python node. That worked so far, but when I send the predictions to an output table, a lot of weird numbers show up in the KNIME table. They are clearly not in the table when I print it to the Python output (or write it as a CSV).
Here is a screenshot:

The attached Excel file contains the actual numbers as written by pandas to_csv:
predictions.xlsx (24.8 KB)

The column types are float32, in case that helps.

It would be great to get a hint about what is going wrong here. Unfortunately, I can’t really reproduce it in a different (shareable) setting, but I’ll keep trying.

Thanks in advance,
Jennifer

MarcelW · August 27, 2018, 10:33am

Hi Jennifer,

This looks like a number overflow/loss of precision during conversion of Python floats into Java floats. What serialization library did you use to transfer the data to KNIME? You can find that under File > Preferences > KNIME > Python > Serialization library.
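For illustration, here is a minimal sketch of one way float32 values can look "wrong" after crossing a type boundary: widening to a 64-bit float keeps the stored binary value but exposes digits that the shorter float32 representation hides. Whether this is what actually happens in the Flatbuffers path is an assumption and is not confirmed in this thread; the values below are made up.

```python
import numpy as np

# A float32 value, like the float32 prediction columns mentioned above.
x32 = np.float32(0.1)

# Widening to float64 does not change the stored binary value,
# but printing it now shows the digits float32 could not represent exactly.
x64 = np.float64(x32)

print(x32)
print(x64)

# Values that are exactly representable in binary survive unchanged.
exact = np.float64(np.float32(0.25))
print(exact)
```

If the numbers in the KNIME table differ from the printed ones by far more than trailing digits like these, the cause is likely something other than plain widening.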

A shareable setting (i.e., an example workflow that reproduces the problem) would greatly help if that’s possible.

Thanks for reporting!

Marcel

jenniferh · August 27, 2018, 11:40am

Hi Marcel,

thanks for the quick feedback.
The serialization is indeed the problem: I used the “standard” one (Flatbuffers Column Serialization), and when I switched to the experimental CSV one, it seems to work fine. (The Apache Arrow serialization does not work at all.)

With Flatbuffer:

With CSV Serialization:

I found a way to reproduce it:
It is reproducible if I read the CSV with pandas read_csv in the Python Source node and write it to a KNIME table.
I have attached the workflow and the CSV (as CSV upload is not permitted, I attached it as a *.txt file).
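The reproduction described above could be sketched roughly as follows. This is not the content of the attached workflow: the CSV text and column names are made-up stand-ins for the attached file, and `output_table` is assumed to be the variable name the Python Source node reads its result from.

```python
import io

import pandas as pd

# Made-up stand-in for the attached predictions file.
csv_text = "pred_a,pred_b\n0.12,0.88\n0.97,0.03\n"

# In the real workflow this would read the attached file from disk instead.
df = pd.read_csv(io.StringIO(csv_text))

# Check the column types before the table is handed over to KNIME.
print(df.dtypes)

# Assumption: the Python Source node picks up the DataFrame named output_table.
output_table = df
```

Comparing the printed values and dtypes with what arrives in the KNIME table helps narrow the problem down to the transfer step.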

Python_issue.knwf (19.4 KB)
predictions.txt (24.3 KB)

Is there anything I should be aware of if I use the CSV serialization as it is labelled experimental? Then I would just keep it as is until the other serializations work as expected.

Thanks again,
Jennifer

MarcelW · August 27, 2018, 12:23pm

Hi Jennifer,

I’m glad that helped. I’d recommend Apache Arrow over CSV. To make Arrow work, you’d need to install pyarrow, version 0.7.0, in your Python environment. The package is available via the conda-forge channel. A command to install it would look like this:
conda install -n py35_knime -c conda-forge pyarrow=0.7.0
where py35_knime is the name of your conda environment.

Thanks for providing a workflow to reproduce the problem, we’ll work on fixing it as soon as possible.

Edit: This will be fixed in a future version of KNIME.

Marcel

jenniferh · August 27, 2018, 12:48pm

Hi Marcel,

I had installed pyarrow via pip (version 0.10.0) before you mentioned conda, and it gave me this error (that’s why I said earlier that it does not work at all):

ERROR PythonKernel pyarrow.lib.ArrowIOError: Invalid flatbuffers message.
ERROR Python Source 4:120 Execute failed: java.lang.Exception: Failed to receive message from Python or forward received message

Removing it and installing it via conda as you suggested gives:

ERROR PythonKernel AttributeError: module 'pyarrow' has no attribute 'OSFile'
ERROR Python Source 4:120 Execute failed: java.lang.Exception: Failed to receive message from Python or forward received message.

Any idea? (Python 3.6.5, conda 4.3.30 on Ubuntu 16.04 LTS)

Thanks a lot!
Jennifer

christian.birkhold · August 27, 2018, 1:52pm

Currently, we only support arrow 0.7.x. Can you try with that version? Sorry for the trouble, we’re planning to update to 0.10.x.
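A quick way to see which pyarrow version the Python environment actually picks up is sketched below. The helper `is_supported_pyarrow` is a hypothetical name introduced here for illustration; it only encodes the 0.7.x restriction stated in this thread.

```python
def is_supported_pyarrow(version: str) -> bool:
    """Return True for 0.7.x, the only series this thread says is supported."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "7"


try:
    import pyarrow

    installed = pyarrow.__version__
    if is_supported_pyarrow(installed):
        print("pyarrow", installed, "is in the supported 0.7.x series")
    else:
        print("pyarrow", installed, "is outside the supported 0.7.x series")
except ImportError:
    print("pyarrow is not installed in this environment")
```

Running this with the same interpreter that KNIME is configured to use shows whether a leftover pip install is shadowing the conda package.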

jenniferh · August 27, 2018, 2:06pm

Sorry, it first gave me the second error above, but restarting helped. Now it works fine! Thank you so much!
