Strange behaviour of KNIME Python (1=>1) Node

Hi,

I have a really strange error and I have not been able to fix it so far. I have a Python script that takes a DataFrame and returns a pandas DataFrame with 3 columns, where each element in every column is itself a list. The script worked totally fine, but now I'm getting a completely random error:

I know this kind of error and what it means, but it makes no sense here, because the script runs completely fine when I click on Execute in the node.

Furthermore, if I just swap my returned DataFrame final_df for the input DataFrame df_invoices, the error goes away. That makes no sense to me, because the script does essentially the same thing and just returns the old, unaltered DataFrame instead. And what does this have to do with the error above? Why is there an error outside of the node but not inside?

Can someone please help me?

EDIT: If I comment everything out and just return a pandas DataFrame containing numbers, I get the following error:

It is so weird because it worked fine before.

Best regards
Armin

Hi @ArminFan,

The error seems to happen on the “way back” from Python to KNIME, that is, outside of your script, when KNIME is converting the pandas data frame into a KNIME table again. This conversion only happens when you actually execute the node, not in the configuration dialogue (because in the dialogue, all data stays inside Python and no actual KNIME table is created).

Errors like the one you are experiencing can happen if KNIME fails to find a proper equivalent KNIME table representation for an output pandas data frame. This is because the KNIME table format is more restrictive than that of a data frame, so not every valid data frame can be converted into a valid KNIME table.
It is hard to tell whether that is also the underlying problem in your case. Could you share your knime.log file, or check for yourself whether the log contains any Python tracebacks that describe the error in more detail (e.g. that point to specific lines in the code performing the conversion)?
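Because the conversion only runs after the script has finished, one cheap way to narrow the problem down is to scan the output data frame for cells whose Python type may not map onto a KNIME column before returning it. The following is a minimal sketch; the helper name `find_suspicious_cells` and the tuple of "safe" types are illustrative assumptions, not KNIME's actual conversion rules.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: per-cell types that are expected to convert cleanly. This is an
# illustrative guess for debugging, not KNIME's real type mapping.
SAFE_TYPES = (int, float, str, bool, list, np.integer, np.floating, np.bool_)


def find_suspicious_cells(df: pd.DataFrame):
    """Return (column, row label, type name) for every cell whose value is not
    one of SAFE_TYPES, e.g. a NumPy array wrapped inside a cell."""
    findings = []
    for column in df.columns:
        for label, value in df[column].items():
            if value is None or isinstance(value, SAFE_TYPES):
                continue
            findings.append((column, label, type(value).__name__))
    return findings


# Usage: check a table before handing it back to KNIME.
df = pd.DataFrame({"ok": [1, 2], "bad": [np.array([1]), np.array([2, 3])]})
for column, row, type_name in find_suspicious_cells(df):
    print(f"column {column!r}, row {row}: {type_name}")
```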

Marcel

Hi @MarcelW ,


I attached my log for you. I was thinking the same: maybe there is some kind of data that KNIME can't process. My DataFrame currently looks like this:

So nothing too special, just integers and floats.

Regards,
Armin

Hi @ArminFan,

Do the individual elements (numbers) of the data frame happen to be wrapped in numpy arrays?

I can reproduce the “non-byte type passed to CreateByteArray” error like this:

```python
import pandas as pd
import numpy as np

# A single-element NumPy array inside a cell fails when KNIME converts the table
output_table = pd.DataFrame({"my_test_column": [ np.array([1]) ]})
```

whereas this version works:

```python
import pandas as pd

# A plain Python int in the cell converts without problems
output_table = pd.DataFrame({"my_test_column": [ 1 ]})
```
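If the arrays only ever hold a single value, as in the first reproduction above, they can be unwrapped into plain Python scalars before the table is returned. A minimal sketch, assuming the affected column is the `my_test_column` from the example and using a hypothetical helper `unwrap_scalars` that relies on NumPy's `ndarray.item()`:

```python
import numpy as np
import pandas as pd


def unwrap_scalars(value):
    """Hypothetical helper: replace a single-element NumPy array with the plain
    Python scalar it holds and leave every other value untouched."""
    if isinstance(value, np.ndarray) and value.size == 1:
        return value.item()  # .item() returns a built-in int/float/bool
    return value


table = pd.DataFrame({"my_test_column": [np.array([1]), np.array([7])]})
table["my_test_column"] = table["my_test_column"].map(unwrap_scalars)
output_table = table
print(output_table["my_test_column"].tolist())  # [1, 7]
```

Arrays with more than one element pass through unchanged, so they still need separate handling.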

Similarly, I can reproduce the “The truth value…” error like this:

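```python
import pandas as pd
import numpy as np

# A NumPy array with several elements inside a cell triggers the "truth value" error on conversion
output_table = pd.DataFrame({"my_test_column": [ np.array([1, 2, 3]) ]})
```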

whereas this works:

```python
import pandas as pd

# A plain Python list in the cell converts fine
output_table = pd.DataFrame({"my_test_column": [ [1, 2, 3] ]})
```
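When the arrays are 1-dimensional, converting each one with `ndarray.tolist()` right before returning yields the list cells of the working example. A minimal sketch, assuming the NumPy work has already produced one 1-D array per row (the sample arrays below are made up):

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the result of a NumPy computation: one 1-D array per row.
table = pd.DataFrame({"my_test_column": [np.array([1, 2, 3]), np.array([4, 5])]})

# Convert each array into a plain Python list just before returning, so the
# cells look like the working "list" example rather than the failing one.
table["my_test_column"] = table["my_test_column"].map(
    lambda v: v.tolist() if isinstance(v, np.ndarray) else v
)
output_table = table
print(output_table["my_test_column"].tolist())  # [[1, 2, 3], [4, 5]]
```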

NumPy arrays as data frame elements are (currently) not supported by KNIME, because KNIME does not have any built-in n-dimensional array types.
Still, we should definitely produce more helpful error messages in such cases, and potentially treat NumPy arrays that contain only a single element or are 1-dimensional separately (the latter should be representable in KNIME in the form of collection cells).
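For arrays with more than one dimension, which have no direct KNIME equivalent, one possible workaround (an assumed approach, not a KNIME feature) is to store the flattened values and the original shape as two list columns and reshape them again in a later Python node. The helpers `split_arrays` and `rebuild_arrays` are hypothetical:

```python
import numpy as np
import pandas as pd


def split_arrays(arrays) -> pd.DataFrame:
    """Hypothetical helper: turn n-dimensional arrays into two list-valued
    columns, the flattened values and the original shape."""
    return pd.DataFrame(
        {
            "values": [np.asarray(a).ravel().tolist() for a in arrays],
            "shape": [list(np.shape(a)) for a in arrays],
        }
    )


def rebuild_arrays(values, shapes) -> list:
    """Inverse of split_arrays: reshape each flat list back into an array."""
    return [np.array(v).reshape(s) for v, s in zip(values, shapes)]


matrices = [np.arange(6).reshape(2, 3), np.eye(2)]
output_table = split_arrays(matrices)  # only lists of numbers in the cells
print(output_table.loc[0, "shape"])  # [2, 3]

# Later, e.g. in another Python node that receives the table again:
restored = rebuild_arrays(output_table["values"], output_table["shape"])
```

How the list columns are typed on the KNIME side depends on its conversion, so checking the resulting column types once is advisable.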

Marcel

Hi @MarcelW,

Thank you very much, that solved my problem. I had wondered why this error never happened before: I always used lists, but for this use case I really needed a NumPy array, so I just converted it back to a list before returning it. Thanks!

The error message about “.any() or .all()” was confusing, because it led me to spend an hour testing a list comprehension with an “and” in it, which ultimately turned out to be completely fine.
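The “.any() or .all()” wording is NumPy's standard `ValueError` for asking a multi-element array for a single truth value, for example via `if`, `and`, `or`, `not` or `bool()`. The sketch below shows that an `and` is harmless on plain numbers but fails on an array; that KNIME's conversion code hit this on the array cells, rather than anything in the list comprehension, is an assumption consistent with the reproduction above.

```python
import numpy as np

values = [1, 2, 3]
array = np.array(values)

# On plain Python numbers an "and" is unproblematic.
print([x for x in values if x > 0 and x < 3])  # [1, 2]

# Asking a whole multi-element array for one True/False is ambiguous,
# so NumPy raises a ValueError that suggests .any() or .all().
try:
    if array and True:
        pass
except ValueError as err:
    print(err)

# The explicit reductions state what is meant.
print(array.any(), array.all())  # True True
```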

Best regards,
Armin

Great! – Yes, we should probably do a better job of pointing out whether an error happens inside or outside the user-provided script. I am going to open an internal ticket for this and the improvements mentioned in the post above.

@ArminFan, for what it's worth: you could still combine KNIME, Python and NumPy arrays; just don't pass the arrays through KNIME tables, but save them separately and reuse them whenever needed:

Hi @mlauber71,

Thanks, that's a good point. Writing the arrays to disk or a server and reading them back is indeed an option.
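One way to follow this suggestion is NumPy's own `.npy` format: save each array in one Python node, pass only the file paths through the KNIME table as strings, and reload the arrays with `np.load` in a later node. A minimal sketch; the temporary directory and the `row_0`/`row_1` file names are hypothetical placeholders for a real storage location:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# HYPOTHETICAL storage location; a real workflow would more likely use a
# shared folder or a workflow-relative path instead of a temp directory.
storage_dir = tempfile.mkdtemp()

arrays = {"row_0": np.arange(6).reshape(2, 3), "row_1": np.eye(3)}

# Node A: write each array to its own .npy file and return only the paths,
# which are plain strings that a KNIME table can hold.
paths = []
for key, arr in arrays.items():
    path = os.path.join(storage_dir, f"{key}.npy")
    np.save(path, arr)
    paths.append(path)
output_table = pd.DataFrame({"array_path": paths})

# Node B: read the paths from the incoming table and reload the arrays.
reloaded = [np.load(p) for p in output_table["array_path"]]
print(reloaded[0].shape)  # (2, 3)
```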

Regards
Armin