substructure search with rdkit

hi guys,

We have some problems with a KNIME project: we have two databases and want to take each molecule from the first database and use it to search for substructures in the second one with the RDKit Substructure Filter node.

We know how to find substructures for one molecule. Our problem is automating the process, so that on every loop iteration the next molecule from the first db is taken to search for substructures in the second db. Maybe we have to use flow variables, but we don't know how...

thanks for your help!

best regards,


Hi Jada,
A KNIME node that does substructure searches of the molecules on input 2, using the molecules on input 1 as queries, is something that we’ve talked about doing. A concrete question for you: if you have M molecules and N queries, what would you like to find in the output table?
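
To make the question concrete, here is a rough sketch of three shapes the output could take. It is plain Python with made-up placeholder names (`mol_1`, `query_A`, the `hits` matrix); none of it comes from an existing node, it only illustrates the choice.

```python
# Hypothetical match results for M = 3 molecules and N = 2 queries:
# hits[i][j] is True when query j was found in molecule i.
molecules = ["mol_1", "mol_2", "mol_3"]
queries = ["query_A", "query_B"]
hits = [
    [True, False],
    [True, True],
    [False, False],
]

# Option 1: one row per matching (molecule, query) pair.
pair_rows = [
    (mol, qry)
    for mol, row in zip(molecules, hits)
    for qry, hit in zip(queries, row)
    if hit
]

# Option 2: one row per molecule, with the list of queries that matched it.
per_molecule = {
    mol: [qry for qry, hit in zip(queries, row) if hit]
    for mol, row in zip(molecules, hits)
}

# Option 3: only the molecules that matched at least one query.
matched_any = [mol for mol, row in zip(molecules, hits) if any(row)]

print(pair_rows)
print(per_molecule)
print(matched_any)
```

Option 1 grows to as many as M x N rows, option 2 always has M rows, and option 3 drops the information about which query matched.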


Hi Jada,

Sorry for the rather slow response - I guess you may already have worked around this problem by now(?). If not, then I think the answer is "yes": you can accomplish this using flow variables with the current RDKit nodes.

I have made a quick example workflow (KNIME 2.3.0) that starts with two small SMILES 'databases' and loops over each mol in the 2nd, passing it as the query to the Substructure Filter node that is processing the 1st. There is also a little bit of processing afterwards to join the query mols back in, to double-check the match.
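
For anyone who wants to see the logic outside KNIME, the same idea can be sketched in plain Python with the RDKit toolkit. This is only an illustration of the loop, not the workflow itself: the SMILES strings and the variable names are invented for the example, and it assumes RDKit is installed.

```python
from rdkit import Chem

# Two small, made-up SMILES 'databases' (assumptions for illustration only).
database_smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O"]  # 1st: gets filtered
query_smiles = ["CO", "c1ccccc1"]                        # 2nd: supplies queries

# Parse the first database once; MolFromSmiles returns None on bad input.
database_mols = [(smi, Chem.MolFromSmiles(smi)) for smi in database_smiles]

# Loop over each molecule of the 2nd database, use it as the substructure
# query against the 1st, and keep the query next to every hit so the match
# can be double-checked afterwards.
matched_rows = []
for q_smi in query_smiles:
    query = Chem.MolFromSmiles(q_smi)
    if query is None:
        continue  # skip queries that could not be parsed
    for smi, mol in database_mols:
        if mol is not None and mol.HasSubstructMatch(query):
            matched_rows.append((q_smi, smi))

for q_smi, smi in matched_rows:
    print(q_smi, "is a substructure of", smi)
```

The outer loop plays the role of the KNIME loop with a flow variable, and storing the query alongside each hit corresponds to the join step at the end of the workflow.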

I'm not sure if there is an easy way to share workflows on the KNIME site(?), but I couldn't find one - and I couldn't upload a .zip file to my GoogleDocs - so I have set up a 'RapidShare' account and put the example there:

Hope this helps.

Kind regards