SMARTS Query node - problems with different SMARTS lists

j_wollenhaupt · July 20, 2022, 9:42pm

Dear all,

I really like the SMARTS Query node because it is fully multi-threaded. I am using it for substructure searches because my queries are lists of SMARTS (and your substructure node only takes SMILES).
It worked well with one of my SMARTS lists, but only partly or not at all with my other lists (the molecules were not filtered correctly).

Could someone tell me what the SMARTS need to look like for this node, and whether I can convert my lists automatically so that they work with it properly?

I am uploading the lists as well as some example molecules as input.

I would be very grateful for any help.

Best,
Jan

kienerj · July 21, 2022, 8:11am

Can’t really provide any help except:

RDKit also flags all patterns from the faulty table as faulty. I'm far from being a SMARTS expert, so I can't judge their validity. In general, not all SMARTS parsers are created equal, so while the program that created them can deal with them, CDK and RDKit seem unable to do so.
The question is: what is the source of the SMARTS? Are they hand-made? If so, maybe check them again. If they come from an application or toolkit, maybe find a way to use the same toolkit in KNIME. ChemAxon/Marvin, for example, also has nodes (at extra cost), or you could use the External Tool node for the search if the toolkit has a CLI application.
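To narrow down which rows a given parser rejects, the list can be split into accepted and rejected patterns before it goes into a node. The following is only a minimal sketch: `partition_smarts` and `rdkit_parser` are hypothetical helper names, not part of KNIME or any toolkit, and it assumes the parser returns `None` for a pattern it cannot read (as RDKit's `Chem.MolFromSmarts` does). It says nothing about whether CDK would accept the same rows.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[int, str]


def partition_smarts(
    patterns: Iterable[str],
    parse: Callable[[str], Optional[object]],
) -> Tuple[List[Row], List[Row]]:
    """Split patterns into (accepted, rejected) rows for the given parser.

    Each row is (index in the input list, stripped pattern text).
    """
    accepted: List[Row] = []
    rejected: List[Row] = []
    for row, raw in enumerate(patterns):
        text = raw.strip()
        if not text:
            continue  # skip blank rows
        try:
            query = parse(text)
        except Exception:
            query = None  # treat a parser exception as a rejection
        if query is None:
            rejected.append((row, text))
        else:
            accepted.append((row, text))
    return accepted, rejected


def rdkit_parser() -> Callable[[str], Optional[object]]:
    """Return RDKit's SMARTS parser; imported lazily so RDKit is only needed here."""
    from rdkit import Chem  # assumption: RDKit is installed

    return Chem.MolFromSmarts
```

With RDKit available, `partition_smarts(my_list, rdkit_parser())` would give the rejected rows together with their positions, which makes it easier to inspect them by hand or to compare against another toolkit.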

j_wollenhaupt · July 22, 2022, 2:05pm

Yes, they are hand-made.
Sometimes there is a publication they came from, but sometimes they were simply inherited from nice colleagues in industry.

They work well in the RDK Substructure node (so they cannot be completely “wrong”), but there you have to loop over the list of queries as flow variables, because there is no second input.
The CDK node has two inputs and handles the parallelization fully internally (20-25 times faster in total).
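The difference between looping over queries one at a time and passing both tables in at once can be sketched roughly as follows. This is not how either node is implemented; `match_table` is a hypothetical helper, the matcher is passed in as a plain function, and whether threads give any real speed-up in Python depends on the matcher used. It only illustrates the two-input data flow.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

M = TypeVar("M")
Q = TypeVar("Q")


def match_table(
    molecules: Sequence[M],
    queries: Sequence[Q],
    has_match: Callable[[M, Q], bool],
    workers: int = 4,
) -> List[List[int]]:
    """For each molecule, return the indices of all queries that match it."""

    def hits_for(mol: M) -> List[int]:
        # Every query is tested against this molecule in a single pass.
        return [j for j, query in enumerate(queries) if has_match(mol, query)]

    # Molecules are distributed over worker threads; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hits_for, molecules))
```

With RDKit molecules and parsed SMARTS queries, the matcher could be `lambda mol, query: mol.HasSubstructMatch(query)`.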
