RDKit Substructure Counter node

I've used the RDKit Substructure Counter node and the Create BitVector node (setting a bit if count >= 1) to generate a "fingerprint" from a set of query molecules. The query molecules are input as SMARTS strings, and the target is a SMILES string.
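
As a rough sketch of the fingerprint step described above (plain Python, not what the KNIME nodes actually do internally; the function name `counts_to_bits` and the example counts are made up for illustration), each query's match count becomes one bit, set when the count is at least 1:

```python
def counts_to_bits(counts, threshold=1):
    """Return one 0/1 bit per query, set when its match count >= threshold."""
    return [1 if c >= threshold else 0 for c in counts]


# Hypothetical substructure counts for five queries against one target.
example_counts = [0, 3, 1, 0, 2]
fingerprint = counts_to_bits(example_counts)
print(fingerprint)  # [0, 1, 1, 0, 1]
```

With this thresholding, the bit for a query should agree with a yes/no substructure test for the same query, which is why the two approaches were expected to give the same result.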

This seems to work; however, I'm getting very different results from a Python script that calls RDKit HasSubstructMatch directly on the same input. A fragment of the Python code is:

    from rdkit.Chem import MolFromSmarts

    for q in queries:
        mq = MolFromSmarts(q)  # parse each query string as SMARTS

Does anybody know why these might differ? Presumably the KNIME node is calling essentially the same RDKit functions under the hood.

BTW, the Indigo Substructure Match Counter seems to agree with the RDKit Substructure Counter, so I suspect that the Python code isn't working as expected?!



Are you running the Python script externally to KNIME? Could you try running it in a Python snippet node to see if it is an RDKit version issue?

Have you looked at some of the matches to see if they are correct? For example, can you match the SMARTS by eye for one of the queries and check that it agrees with the generated fingerprint?

I've run the Python code within a KNIME Python Script (2:1) node. KNIME points to the same Python version as I use when I run stand-alone Python; however, this might not be the same as the RDKit version used by the KNIME RDKit nodes.

The query molecules are generated elsewhere. One mismatch is:

mol = 'C[C@@H](NC(=O)[C@H](CCCCN)N1Cc2[nH]c3ccccc3c2C[C@@H](NC(=O)Cc4ccccc4)C1=O)c5ccccc5'

query = 'C:C:C'

The query was generated by the Indigo MCS Scaffold Finder, and seems a bit odd to me: 'aliphatic carbons joined by aromatic bonds'?



I have discovered where the workflow was going wrong: the input file for the queries had a "molecule type cast" node with the option somehow reset to "smiles" when it should have been "smarts". Once I changed this, the two methods agree with each other.
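
For anyone hitting the same thing, here is a minimal sketch of why the cast type matters. It follows the anisole example from the RDKit getting-started documentation rather than my actual queries, and it assumes RDKit is importable from Python. The same pattern string is parsed once as SMARTS and once as SMILES, then matched against the same target:

```python
from rdkit import Chem

# Anisole as the target; written in Kekule form, RDKit perceives the ring as aromatic.
target = Chem.MolFromSmiles('C1=CC=CC=C1OC')

pattern = 'COC'
as_smarts = Chem.MolFromSmarts(pattern)  # in SMARTS, 'C' means aliphatic carbon only
as_smiles = Chem.MolFromSmiles(pattern)  # plain carbons; atom aromaticity is not part of the query

smarts_hit = target.HasSubstructMatch(as_smarts)
smiles_hit = target.HasSubstructMatch(as_smiles)
print(smarts_hit, smiles_hit)
```

The SMARTS version should not match (the ring carbon next to the oxygen is aromatic), while the SMILES version should. A query like 'C:C:C' read as SMILES instead of SMARTS would plausibly behave differently in the same way, which fits the mismatch I was seeing.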

Led astray by daft errors :-(