Use of the RDKit library in a Python snippet

Dear KNIME community,

I am currently trying to implement a Python snippet that requires the RDKit library (see code below). The input table is a single column of SMILES entries in String format.

I get the following error:

ERROR     PythonSnippetNodeModel             [17:24:44] SMILES Parse Error: syntax error for input: r
ERROR     PythonSnippetNodeModel             Traceback (most recent call last):
ERROR     PythonSnippetNodeModel               File "/tmp/analyze7779091166834973996.py", line 232, in <module>
ERROR     PythonSnippetNodeModel                 for atom in m.GetAtoms():
ERROR     PythonSnippetNodeModel             AttributeError: 'NoneType' object has no attribute 'GetAtoms'
ERROR     Python Snippet                     Execute failed: No python output table found, check script output

I believe the issue is that I use the Mol class from RDKit within the code, which is not a dictionary or one of the basic data types (int, etc.).
I also faced a similar issue with a table containing an SDF column, and could resolve the problem by deleting that column.

My question is the following: is there a way to use the RDKit data type in a Python snippet?

I am quite stuck, thank you for your help!!! :)

Cheers,
Jose Manuel
OS - CentOS 6.5 x64
KNIME 2.9.2

PS: Here is the code I wrote. It is designed to filter out molecules with isotopes or inorganic atoms by replacing the unwanted SMILES with a tag marking them as not valid.

from rdkit import Chem

pyOut = {}
k = kIn.keys()
smiles = k[0]

# VARIABLES
aOrganic = ['C','H','N','O','S','P','F','Br','Cl','I']

for row in range(0,len(smiles)):

    m = Chem.MolFromSmiles(smiles[row])
    is_organic = True

    for atom in m.GetAtoms():
        aName = atom.GetSymbol()
        aIsotope = atom.GetIsotope()        # values 0 if not an isotope
        if aName not in aOrganic or aIsotope != 0:
            is_organic = False
if is_organic:
    output = Chem.MolToSmiles(m)

else:
    output = 'unvalid molecule'

pyOut = output
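
A note on the traceback above: the AttributeError is what you get when Chem.MolFromSmiles cannot parse its input, because it returns None instead of raising. The parse error for the single character "r" also suggests that the loop may be walking over the characters of the column name (k[0]) rather than over the column values, although that depends on how kIn is laid out, which is an assumption here. Below is a minimal sketch of the same filter with a None guard and one result per row. The names rdkit_atoms, is_organic, tag_column and INVALID_TAG are made up for illustration, and the RDKit import sits inside the helper so the filtering logic can be tried without RDKit installed.

```python
# Allowed element symbols, taken from the aOrganic list in the original snippet.
ORGANIC = {'C', 'H', 'N', 'O', 'S', 'P', 'F', 'Br', 'Cl', 'I'}
INVALID_TAG = 'invalid molecule'  # hypothetical tag value


def rdkit_atoms(smiles):
    """Return [(symbol, isotope), ...] for a SMILES string, or None if unparsable."""
    from rdkit import Chem  # imported lazily so the rest runs without RDKit
    mol = Chem.MolFromSmiles(smiles)  # returns None on a parse failure
    if mol is None:
        return None
    return [(atom.GetSymbol(), atom.GetIsotope()) for atom in mol.GetAtoms()]


def is_organic(atoms):
    """True if parsing succeeded, every symbol is allowed and no isotope is set."""
    if atoms is None:
        return False
    return all(symbol in ORGANIC and isotope == 0 for symbol, isotope in atoms)


def tag_column(smiles_values, atoms_of=rdkit_atoms):
    """Return one string per input row: the input SMILES, or INVALID_TAG."""
    return [s if is_organic(atoms_of(s)) else INVALID_TAG for s in smiles_values]
```

Assuming kIn maps each column name to a list of values and pyOut is expected to have the same shape (the original snippet hints at this, but it is not confirmed here), the wiring would be something like pyOut = {'filtered': tag_column(kIn[k[0]])}. Unlike the original, this sketch returns the input SMILES unchanged instead of the canonical form from Chem.MolToSmiles.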

Hi,

At the moment there is really no straightforward way that I am aware of to do this.

Hopefully someone else has a better answer, but otherwise I suspect it will have to wait until the new Python interactive nodes are available.

-greg

 

The column type is not supported as you already assumed.

I don't know much about RDKit data. Is there any way to write the KNIME table to a file, pass the path to this file to the snippet, and read the file within Python for further processing?

I'm not aware of any better solution...

Thank you for your answers!

Actually, I did something similar to what niederle suggested: I called my Python script within KNIME using a bash node, and then used an SDF reader to bring the data back into the workflow.

I am pretty sure this is not the most straightforward way to do it, but it worked immediately for me.
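
For anyone wanting to try the same route, the external script could look roughly like the sketch below. It is not the script actually used here. It assumes the SMILES were written out from the workflow beforehand as a plain text file with one SMILES per line, and the names smiles_to_sdf.py, read_smiles, smiles_to_sdf and the input_smiles field are hypothetical. RDKit is imported only inside the conversion function.

```python
import sys


def read_smiles(path):
    """Read one SMILES per line, stripping whitespace and skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def smiles_to_sdf(in_path, out_path):
    """Write every parsable SMILES to an SDF file; return (written, skipped)."""
    from rdkit import Chem  # lazy import: only needed for the conversion itself
    writer = Chem.SDWriter(out_path)
    written = skipped = 0
    for smi in read_smiles(in_path):
        mol = Chem.MolFromSmiles(smi)  # None when the SMILES cannot be parsed
        if mol is None:
            skipped += 1
            continue
        mol.SetProp('input_smiles', smi)  # kept as a data field in the SDF record
        writer.write(mol)
        written += 1
    writer.close()
    return written, skipped


# Only converts when called with both paths, e.g. from a bash node:
#   python smiles_to_sdf.py input.smi output.sdf
if __name__ == '__main__' and len(sys.argv) == 3:
    print('%d written, %d skipped' % smiles_to_sdf(sys.argv[1], sys.argv[2]))
```

The resulting SDF file can then be picked up by an SDF reader in the workflow, as described above. Rows that RDKit cannot parse are counted and skipped rather than stopping the run.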

If anyone has a better idea before the new Python snippets are released, don't hesitate to share it with me! ;)

Jose Manuel

Hi Jose,

Windows update 'restart/postpone' popup killed my first attempt at replying to this(!)  Here's attempt #2

My understanding is that the Python Snippet node is very fussy about input and output column types - so only 'simple' types (String, Integer, etc) are allowed.  So in your case I think you should be fine - String in, String out.  But not all String columns are equal!

My guess is that either you have some additional non-simple-type columns in your input table (e.g. RDKit Mol, Smiles, etc.) that are messing things up, even if you aren't referring to those columns in the script, or, more likely, you have used a Column Rename node to convert a Smiles-type column to a String column(?). If so, the result isn't String-like enough for the Python Snippet node.

From memory when I have needed to do the same sort of thing, I have either used a Java Snippet node to make a new String column from the content of the Smiles column, or - even easier - used the String Manipulation node's string() method to the same effect.

Hopefully this helps - but if not, I can look back to find a concrete example.

 

Kind regards

James 

Hi James,

Sorry for the late answer...

I actually use the String Manipulation node really often, so I noticed this behavior (and learned the hard way to fear wild conversions using column rename)!

However, this would not help me, as I wanted to use the RDKit objects within KNIME (that's why I had to do this bulky workaround... and I will probably keep on doing it until Python snippets can deal with non-standard data types).

Anyway thank you for your suggestion!

Cheers

Jose Manuel

Hi,

Last May, Greg mentioned some "new Python interactive nodes". Aaron also mentioned something like this in June.

Has anybody heard anything about these?

It would be very nice to be able to use these with non-standard columns (SDF, RDKit, ...).

Cheers,
Jose Manuel