RDKit Substructure Counter doesn't return the right number of matched substructures


I’m trying to match a set of SMARTS patterns against a specific target structure, with this SMILES:

[N-]=[N+]=Nc1ccc(/C=C/c2nc3ccc([N+](=O)[O-])cc3s2)cc1

When I try to match the azide SMARTS [#6][$([NX2-1][NX2+1]#[NX1]),$([NX2]=[NX2+1]=[NX1-1])] using the RDKit Substructure Counter node, I don't get the right number of matches.

The node returns a Total Hit Count of 2 instead of 1. Am I missing something about the way this type of SMARTS is matched (it contains two versions of the NNN moiety)?
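As a rough illustration of how the comma in this SMARTS behaves: inside a single atom bracket [...], the comma is a logical OR, so the two recursive alternatives (the [NX2-1][NX2+1]#[NX1] form and the [NX2]=[NX2+1]=[NX1-1] form) are both tested against the same nitrogen and can contribute at most one hit for that atom. The sketch below is a hand-written toy model, not RDKit's matcher. The atom and bond tables are hypothetical and cover only the azide and the ring carbon it is attached to, and the two recursive SMARTS are hard-coded as Python predicates.

```python
# Hypothetical toy model (hand-built, not parsed from SMILES): the azide
# N=N=N plus the aromatic carbon it is bonded to.
# "X" mimics the SMARTS X primitive (total connections).
ATOMS = {
    0: {"element": "N", "charge": -1, "X": 1},  # terminal [N-]
    1: {"element": "N", "charge": +1, "X": 2},  # central [N+]
    2: {"element": "N", "charge": 0, "X": 2},   # N bonded to the ring
    3: {"element": "C", "charge": 0, "X": 3},   # ipso aromatic carbon
}
BONDS = {(0, 1): 2, (1, 2): 2, (2, 3): 1}  # (atom, atom): bond order


def neighbors(i):
    """Yield (neighbor index, bond order) for atom i."""
    for (a, b), order in BONDS.items():
        if a == i:
            yield b, order
        elif b == i:
            yield a, order


def is_n(i, x, charge=None):
    """Mimic SMARTS [NX<x>], optionally with a formal-charge constraint."""
    atom = ATOMS[i]
    return (atom["element"] == "N" and atom["X"] == x
            and (charge is None or atom["charge"] == charge))


def alt_ionic(i):
    """Hard-coded $([NX2-1][NX2+1]#[NX1]) evaluated at atom i."""
    if not is_n(i, 2, -1):
        return False
    for j, o1 in neighbors(i):
        if o1 == 1 and is_n(j, 2, +1):
            for k, o2 in neighbors(j):
                if k != i and o2 == 3 and is_n(k, 1):
                    return True
    return False


def alt_double(i):
    """Hard-coded $([NX2]=[NX2+1]=[NX1-1]); [NX2] has no charge constraint."""
    if not is_n(i, 2):
        return False
    for j, o1 in neighbors(i):
        if o1 == 2 and is_n(j, 2, +1):
            for k, o2 in neighbors(j):
                if k != i and o2 == 2 and is_n(k, 1, -1):
                    return True
    return False


def is_azide_n(i):
    # The comma inside [...] is an OR: one atom, one boolean test.
    return alt_ionic(i) or alt_double(i)


def count_c_azide_matches():
    """Count unique matches of the two-atom query [#6][azide N]."""
    matches = set()
    for (a, b), order in BONDS.items():
        for c, n in ((a, b), (b, a)):
            # Implicit SMARTS bond: single (or aromatic); here the C-N bond is single.
            if order == 1 and ATOMS[c]["element"] == "C" and is_azide_n(n):
                matches.add((c, n))
    return len(matches)


print(count_c_azide_matches())  # 1
```

In this molecule only the double-bond alternative matches, on the nitrogen bonded to the ring, so the two-atom query yields a single hit. Because both alternatives are evaluated as one test on one atom, having two versions of the NNN moiety in the pattern should not by itself double the count.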

Thank you

Hi Serge,

From the configuration dialog, it looks like you have an older version of the RDKit nodes installed. I can’t reproduce the problem with the newer version. Could you please try updating to see if that clears up the problem?


Hi Greg,

Thank you for your feedback. Interesting… I actually have the latest version of the RDKit nodes.

When I run that query on its own, I do get only one hit, but when I run the full list of queries against the same structure, this specific SMARTS (azide) gets a hit count of 2, which is wrong. I have attached a simplified workflow demonstrating this issue.

Test_QueryCount.knwf (28.5 KB)

PS: The version installed seems to be 4.2.0.v202103031420

Ah, I see. That’s a subtle one.
The azide query in the workflow you sent actually includes an extra atom.
So it’s matching the C connected to the azide (that’s the second atom in the query) and its two neighbors (that’s the first atom in the query), which gives two separate matches.
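To see why one extra atom at the front of the query doubles the count, here is a small toy enumerator for linear queries. It assumes, for illustration only, that the workflow query is shaped like [#6][#6][azide N]; the exact query from the workflow isn't reproduced here. The graph is a hypothetical hand-built model of the azidophenyl ring, the azide nitrogen is simply flagged rather than matched with the recursive SMARTS, and matches are deduplicated by atom set, roughly like RDKit's default uniquified matching (GetSubstructMatches with uniquify=True).

```python
# Hypothetical toy graph: the phenyl ring plus the azide N bonded to it.
# Atom 2 is the azide N, 3 is the ipso carbon, 4-8 are the other ring carbons.
ELEMENT = {2: "N", 3: "C", 4: "C", 5: "C", 6: "C", 7: "C", 8: "C"}
BONDS = [(2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 3)]
AZIDE_N = 2  # assumed: the atom the recursive azide SMARTS picks out

# Build an adjacency list from the bond list.
ADJ = {i: set() for i in ELEMENT}
for a, b in BONDS:
    ADJ[a].add(b)
    ADJ[b].add(a)


def is_c(i):
    return ELEMENT[i] == "C"  # stands in for SMARTS [#6]


def is_azide_n(i):
    return i == AZIDE_N


def linear_matches(query):
    """Return embeddings of a linear query (a list of atom predicates),
    keeping only one match per distinct set of atoms."""
    found = []
    seen = set()

    def extend(path):
        if len(path) == len(query):
            key = frozenset(path)
            if key not in seen:
                seen.add(key)
                found.append(tuple(path))
            return
        # Grow the path through bonded atoms not already used.
        for nbr in sorted(ADJ[path[-1]]):
            if nbr not in path and query[len(path)](nbr):
                extend(path + [nbr])

    for start in sorted(ELEMENT):
        if query[0](start):
            extend([start])
    return found


print(linear_matches([is_c, is_azide_n]))        # [(3, 2)]
print(linear_matches([is_c, is_c, is_azide_n]))  # [(4, 3, 2), (8, 3, 2)]
```

Each ortho carbon gives a different atom set, so both embeddings survive deduplication and the count is 2, while the intended two-atom query has only one embedding.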
Here’s the output from the RDKit Molecule Highlighting node for those matches:



Hi Greg,

Thank you for your help and for spotting this! A very silly mistake on my part. I missed it even when looking at the query in a SMARTS viewer.

Best regards