Python 1=>2 node display error for molecules

#1

Hi,
I was trying to use a Python 1=>2 Scripting node to transform Molecules and then output them into two tables.
KNIME crashed multiple times while trying to display the table. The behavior was not reproducible: sometimes I got the desired table and sometimes just weird numbers, but each time with a pointer error.

I tried to go back to a very easy script and found a (or the?) problem:

First, if I create the two tables separately, e.g.
output_table_1 = pd.DataFrame({"a": [molecule, molecule], "b": [1, 2]})
output_table_2 = pd.DataFrame({"a": [molecule, molecule], "b": [1, 2]})

I see the two tables with the RDKit Molecule and everything works fine.

If I do something like

x = pd.DataFrame({"a": [molecule, molecule], "b": [1, 2]})
output_table_1 = x
output_table_2 = x

The first table is displayed correctly, but the second one shows weird numbers instead of the molecule.
[Screenshot from 2019-04-12 15-05-35]

This happens with both the Apache Arrow and the Flatbuffers serialization.

I have attached parts of the error messages (I can attach the whole if needed) and a simple workflow to reproduce the error.

I am not sure whether I am doing something weird or whether there is a communication problem between RDKit, Python and KNIME. (It works without the RDKit molecules.)

System: Ubuntu 16.04 LTS
KNIME: 3.7.1
RDKit KNIME integration 3.6.0.v201903281548

Python: Anaconda environment with Python 3.6.7, rdkit 2019.03.1.0 (the error also occurred with the latest 2018 release; I just upgraded today) and pyarrow 0.11.0

I am happy to provide more information if needed.

Thanks in advance for your ideas!

jennifer

Python_fail.knwf (22.3 KB)
errs.txt (4.4 KB)


#2

Have you tried

output_table_2 = x.copy()
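
To illustrate the difference: plain assignment only binds a second name to the same DataFrame object, whereas copy() creates a new one. Below is a minimal sketch, using a hypothetical `Molecule` class as a stand-in so that it runs without RDKit. Why sharing one object confuses the node is not something I can confirm; the sketch only shows the pandas side.

```python
import pandas as pd


class Molecule:
    """Hypothetical stand-in for an RDKit molecule (not part of RDKit)."""

    def __init__(self, smiles):
        self.smiles = smiles


molecule = Molecule("Cc1ccccc1")
x = pd.DataFrame({"a": [molecule, molecule], "b": [1, 2]})

alias = x               # second name for the very same DataFrame object
independent = x.copy()  # new DataFrame object

same_object = alias is x
copy_is_new = independent is not x

# copy() duplicates the column arrays, but the Python objects stored in an
# object column are not copied recursively; both frames reference them.
shares_molecule = independent["a"].iloc[0] is x["a"].iloc[0]
```

Since the stored objects are only referenced, the copy does not duplicate the molecules themselves.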

Also, with pandas it can make sense to make some transformations permanent with
inplace=True

Or it is something else, like KNIME currently not supporting Python >= 3.7.


#3

Hi,
thanks for the suggestions. copy() indeed helps.

Still, I would assume that this should not be needed, and it most certainly should not crash KNIME in some cases?
For me, copy() becomes problematic with larger datasets (I currently have loads of RAM, but I would like to ship that metanode and I think it should be as efficient as possible).

What I am doing is basically creating a DataFrame and then filtering the data into two subsets based on some criteria. Hence inplace is not possible (only copy() works, but that is not very efficient, I guess).
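
A minimal sketch of that kind of split, with made-up data and strings standing in for the molecules: a boolean mask selects one subset and its negation selects the other. Boolean indexing already returns new DataFrames, so no explicit copy() is involved here.

```python
import pandas as pd

# Illustrative data; the strings stand in for RDKit molecules.
df = pd.DataFrame({
    "mol": ["Cc1ccccc1", "CCO", "c1ccccc1"],
    "A": ["Yes", "No", "Yes"],
})

keep = df["A"] == "Yes"  # boolean Series with one entry per row
kept = df[keep]          # rows that satisfy the criterion
rejected = df[~keep]     # all remaining rows

kept_index = list(kept.index)
rejected_index = list(rejected.index)
```

Every row lands in exactly one of the two subsets, and both keep their original index labels.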

I am using Python 3.6, so it should not be a compatibility issue here.

Edit:
I just tried to recreate what I am trying to do. Interestingly, I can execute the script inside the node, but executing the whole node gives the error: Execute failed: 'Series' object has no attribute 'ToBinary', even though all objects are DataFrames. I am not sure if I am totally confused here or if the Python 1=>2 node is.

from rdkit import Chem
import pandas as pd

#flow_variables = {}
flow_variables['Keep_all'] = "No"
flow_variables['keep_mixtures'] = "No"
flow_variables['keep_nonorganic'] = "No"

mol1 = Chem.MolFromSmiles('Cc1ccccc1')
mol2 = Chem.MolFromSmiles('Cc1ccccc1')
mol3 = Chem.MolFromSmiles('Cc1ccccc1')
mol4 = Chem.MolFromSmiles('Cc1ccccc1')

# Named all_rows rather than `all` to avoid shadowing the Python builtin.
all_rows = pd.DataFrame({'col1': [mol1, mol2, mol3, mol4], 'A': ["Yes", "No", "Yes", "Yes"],
                         'Mixture': ["Yes", "No", "No", "No"], 'Nonorganic': ["Yes", "No", "No", "No"]})

output_table_1 = all_rows.copy()

x2 = all_rows.copy()

if flow_variables['Keep_all'] == "No":
    output_table_1 = output_table_1[output_table_1['A'] == "Yes"]
    out1 = x2[x2['A'] != "Yes"]
else:
    out1 = pd.DataFrame()

if flow_variables['keep_mixtures'] == "No":
    output_table_1 = output_table_1[output_table_1['Mixture'] == "No"]
    out2 = x2[x2['Mixture'] != "No"]
else:
    out2 = pd.DataFrame()

if flow_variables["keep_nonorganic"] == "No":
    output_table_1 = output_table_1[output_table_1['Nonorganic'] == "No"]
    out3 = x2[x2['Nonorganic'] != "No"]
else:
    out3 = pd.DataFrame()

output_table_2 = pd.concat([out1, out2, out3])

I would really appreciate any input from your side.
Thanks


#4

Okay.
I figured out the following:
The RDKit molecules make KNIME crash, as described here: Forum post

Nevertheless, two things remain open:

  1. the rendering issue for the second output is still weird, and
  2. the above error is unexplained: Execute failed: 'Series' object has no attribute 'ToBinary'
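
One observation on the second point, offered as a guess rather than a confirmed cause: pd.concat keeps the original index labels, so a row that fails more than one criterion ends up in output_table_2 several times under the same label. Looking up such a label returns a Series instead of a single object, which would at least be consistent with the 'Series' in the message. The sketch below uses strings in place of molecules and builds the rejected set from one combined mask, so each row appears only once. Whether this resolves the KNIME error is untested.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["m1", "m2", "m3", "m4"],  # strings stand in for molecules
    "A": ["Yes", "No", "Yes", "Yes"],
    "Mixture": ["Yes", "No", "No", "No"],
    "Nonorganic": ["Yes", "No", "No", "No"],
})

# Approach from the script: one rejected subset per criterion, then concat.
parts = [
    df[df["A"] != "Yes"],
    df[df["Mixture"] != "No"],
    df[df["Nonorganic"] != "No"],
]
concatenated = pd.concat(parts)
has_duplicates = concatenated.index.has_duplicates
lookup = concatenated["col1"].loc[0]  # label 0 occurs twice

# Alternative: combine the criteria into one mask and split once.
keep = (df["A"] == "Yes") & (df["Mixture"] == "No") & (df["Nonorganic"] == "No")
kept, rejected = df[keep], df[~keep]
```

With the combined mask the two outputs are disjoint and have unique index labels.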

Sorry if the post is messy. I am willing to split, move or rewrite it if necessary.
