RDKit Molecule Substructure Filter incorrectly matches molecules with aromatic sulfur atoms as metal-containing compounds

Hi guys,

When I use the RDKit Molecule Substructure Filter node to filter out metal-containing compounds, the node also flags compounds containing aromatic sulfur atoms as if they contained metals.

In the attached workflow I used the SMARTS from the Bristol-Myers Squibb HTS Deck Filters to filter out metal-containing compounds. The example dataset has 14 molecules: 5 contain metal atoms and 9 contain aromatic sulfur atoms. As you can see, the RDKit Molecule Substructure Filter matches all 14 compounds as if they contained metal atoms. In contrast, CDK correctly splits the example set into metal-containing and non-metal-containing compounds.

There is probably a problem in how the RDKit node interprets SMARTS containing certain metal atoms, such as scandium (Sc). It would be good if this could be corrected.

This problem was originally mentioned in this post but I thought it was appropriate to open a dedicated forum thread for it.

Gio

RDKit_metal_containing_matching_smarts_problem.knwf (63.7 KB)

What’s going on here is a disagreement about what the SMARTS [as] means.
The BMS filters clearly intended it to mean “aromatic arsenic”. The RDKit, on the other hand, interprets it as [a&s], which means “aromatic and aromatic sulfur”.
As far as I know there’s no authoritative source for which elements should be recognized as aromatic, but this is a case where the RDKit’s SMARTS and SMILES parsers disagree with each other, so it should be fixed.

We’ll get it fixed in the backend and then update the KNIME nodes.

Thanks for pointing out the problem!

I would assume that [as] is meant to be aromatic arsenic, based on the OpenSMILES definitions:

Bracket Atoms
bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'
symbol ::= element_symbols | aromatic_symbols | '*'
isotope ::= NUMBER
element_symbols ::= …
aromatic_symbols ::= 'b' | 'c' | 'n' | 'o' | 'p' | 's' | 'se' | 'as'
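
To make the two readings of [as] concrete, here is a small toy sketch in plain Python. It is not the RDKit or CDK parser, and the function names are made up for illustration. One function takes the longest aromatic symbol from the list above; the other treats each character as a separate primitive joined by an implicit AND, which is roughly the [a&s] reading described earlier in the thread.

```python
# Toy illustration only: NOT the actual RDKit or CDK parsing code.
# Aromatic symbols allowed inside brackets, per the OpenSMILES list above.
AROMATIC_SYMBOLS = {
    "b": "boron", "c": "carbon", "n": "nitrogen", "o": "oxygen",
    "p": "phosphorus", "s": "sulfur", "se": "selenium", "as": "arsenic",
}


def longest_match_reading(body: str) -> str:
    """Read the longest aromatic symbol at the start of the bracket contents."""
    for length in (2, 1):  # try two-letter symbols before one-letter ones
        name = AROMATIC_SYMBOLS.get(body[:length])
        if name is not None:
            return "aromatic " + name
    raise ValueError(f"no aromatic symbol at start of {body!r}")


def per_character_reading(body: str) -> str:
    """Read each character as its own primitive, joined by an implicit AND."""
    meanings = {"a": "any aromatic atom"}  # SMARTS 'a' primitive
    for symbol, name in AROMATIC_SYMBOLS.items():
        if len(symbol) == 1:
            meanings[symbol] = "aromatic " + name
    return " AND ".join(meanings[ch] for ch in body)


if __name__ == "__main__":
    print(longest_match_reading("as"))   # aromatic arsenic
    print(per_character_reading("as"))   # any aromatic atom AND aromatic sulfur
```

Under the second reading, any aromatic sulfur (a thiophene, for example) satisfies the pattern, which would explain why the sulfur-containing compounds end up in the metal bucket.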


@greglandrum and Steve @Vernalis, thank you for the contributions.
I would also assume that “[as]” is aromatic Arsenic.
