Bug in RDKit Substructure Searching nodes

It seems that if the substructure query contains Rx groups, then the substructure search fails to find any hits.
For example, using Marvin sketcher and drawing a phenyl ring with an R1 group, and then defining R1 to be chloro or SH, gives the SMARTS: [$(Cl),$([#16])]-[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1.
When this is run against a dataset containing these motifs, no hits are found. I have also tried keeping the query as SDF, and the substructure search still returns 0 hits.
Both the RDKit Molecule Substructure Filter and RDKit Substructure Filter give the same 0 hits. The node was set to return at least 1 matching hit.
Sending the same example to the Indigo Substructure Matcher node (as SMARTS or SDF) returns the desired number of results.
Could this bug please be fixed?
Thanks, Simon.

Looking at your query, I suspect the issue is aromaticity detection, which the RDKit is very sensitive to. Write the SMARTS in aromatic form and try again.

I should have mentioned: that was the first thing I tried, but with no success.


Hi Simon
The problem here is that your SMARTS includes explicit single and double bonds. This tells the substructure matcher to look for single and double (not aromatic) bonds.
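
To illustrate this outside KNIME, here is a minimal sketch using the RDKit Python API. It assumes the nodes behave the same way as the Python toolkit, and the chlorobenzene test molecule is just an example I picked, not something from the original dataset.

```python
from rdkit import Chem

# SMARTS as exported from the sketcher: explicit single (-) and double (=) ring bonds
kekule_query = Chem.MolFromSmarts(
    "[$(Cl),$([#16])]-[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1"
)

# Example target (assumed for illustration): chlorobenzene.
# The ring is perceived as aromatic when the SMILES is parsed.
chlorobenzene = Chem.MolFromSmiles("Clc1ccccc1")

# Check that every ring bond in the target is flagged aromatic
ring_bonds_aromatic = all(
    bond.GetIsAromatic() for bond in chlorobenzene.GetBonds() if bond.IsInRing()
)

# The explicit single/double bonds in the query cannot match aromatic bonds
explicit_bond_match = chlorobenzene.HasSubstructMatch(kekule_query)

print(ring_bonds_aromatic, explicit_bond_match)
```

The ring bonds in the target are aromatic, so a query that asks for alternating single and double bonds finds nothing, even though the motif is clearly there.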

If you export the molecule from the sketcher as a MOL (or SDF) cell you shouldn’t have this problem, but then you’ll lose the R groups (the RDKit parser doesn’t currently support those).

What you might try is kekulizing the molecules you want to search against. I think that should work.

I will put together a workflow showing a couple of alternatives here and share it in a bit.


The attached workflow shows three solutions to this.

  1. I use your SMARTS query but kekulize the molecules in the table to be searched.
  2. I make the query aromatic in Marvin before exporting the SMARTS. This expresses what you’re actually trying to do with the query.
  3. I split the query into two separate molecules and export them as MOL.
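
For anyone who prefers scripting, the three approaches can be roughly sketched with the RDKit Python API. This is only an approximation of what the workflow does: the target molecules are made-up examples, the aromatic SMARTS is hand-written rather than exported from Marvin, and SMILES strings stand in for the two MOL queries.

```python
from rdkit import Chem

# Hypothetical targets: two that should hit and one (toluene) that should not
targets = {
    "chlorobenzene": Chem.MolFromSmiles("Clc1ccccc1"),
    "thiophenol": Chem.MolFromSmiles("Sc1ccccc1"),
    "toluene": Chem.MolFromSmiles("Cc1ccccc1"),
}

# Original query with explicit single and double ring bonds
kekule_query = Chem.MolFromSmarts(
    "[$(Cl),$([#16])]-[#6]-1=[#6]-[#6]=[#6]-[#6]=[#6]-1"
)

# Hand-written aromatic form of the same query (assumption, not the Marvin export)
aromatic_query = Chem.MolFromSmarts("[$(Cl),$([#16])]-c1ccccc1")

# Stand-ins for the two separate MOL queries, one per R-group definition
split_queries = [Chem.MolFromSmiles("Clc1ccccc1"), Chem.MolFromSmiles("Sc1ccccc1")]


def kekulized(mol):
    """Return a copy with aromatic flags cleared and explicit single/double bonds."""
    copy = Chem.Mol(mol)
    Chem.Kekulize(copy, clearAromaticFlags=True)
    return copy


hits = {}
for name, mol in targets.items():
    hits[name] = (
        # 1. Kekulize the target, keep the original SMARTS
        kekulized(mol).HasSubstructMatch(kekule_query),
        # 2. Keep the target aromatic, use the aromatic SMARTS
        mol.HasSubstructMatch(aromatic_query),
        # 3. Match either of the two split queries
        any(mol.HasSubstructMatch(q) for q in split_queries),
    )

for name, result in hits.items():
    print(name, result)
```

All three approaches should agree on these examples: chlorobenzene and thiophenol match, toluene does not.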

Hopefully this helps!
Forum aromatic substructures.knwf (56.4 KB)