Handling of MDL query features (link nodes)

 

Hi,

I have found that certain MDL query features (link nodes in this case) are lost when converting to RDKit or Indigo molecules.  I have attached a screenshot showing the SDF string output for (1) the MOL output from the query sketched in MarvinSketch; (2) the RDKit version; and (3) the Indigo version (I had to add an Indigo->Query Mol node in this case to get the SDF string rendering).

As you can see, in the RDKit and Indigo molecules the M  LIN line is missing (although the atom list query feature, M  ALS, is retained).  Would it be possible to add support for the link node query feature, and for any other MDL query features that are missing?  I think these are what most chemists are familiar with when sketching query molecules.
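
For anyone who wants to check this without comparing screenshots, here is a minimal sketch in plain Python that lists which "M  xxx" property-line tags disappear between two MOL blocks. The function name property_tags and the two sample fragments are hypothetical and only illustrative; they are not the actual output from my workflow, and a real check would read the full SDF strings from the three nodes.

```python
def property_tags(molblock):
    """Return the three-letter tags found on 'M  XXX' property lines."""
    tags = set()
    for line in molblock.splitlines():
        # Property lines in a V2000 MOL block start with 'M  ' followed by a tag.
        if line.startswith("M  ") and len(line) >= 6:
            tags.add(line[3:6])
    tags.discard("END")  # the terminator is not a query feature
    return tags


# Hypothetical property-block fragments, for illustration only.
original = "M  ALS   1  2 F N   O   \nM  LIN  1   2   3   1   3\nM  END"
converted = "M  ALS   1  2 F N   O   \nM  END"

lost = property_tags(original) - property_tags(converted)
print(sorted(lost))  # ['LIN']
```

This only compares tags, so it would not notice a feature that is kept but altered.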

Kind regards

James

(cross-posted on RDKit forum - http://tech.knime.org/forum/rdkit/handling-of-mdl-query-features-link-nodes)

 

Hi James,

Unfortunately, link nodes are not yet supported in Indigo, and we do not plan to implement them in the near future, because an efficient implementation of a substructure matching algorithm for link nodes is not trivial.

For what purpose do you use link nodes, and could you give us a good example? Do you want to use them for substructure matching, or only for molecule manipulation?

Best regards,
Mikhail

Hi Mikhail,

I tend to use link nodes in substructure matching, for example to search for 4- to 6-membered rings in a single search.  At the moment, the only way around the issue that I can see is to convert the MarvinSketch molecule to SMARTS, and then to Indigo Query Molecules.  On conversion to SMARTS, the ChemAxon code fully enumerates the link node possibilities as an 'OR' list.  For example, azetidine (a 4-membered N-containing ring) with the two carbons next to the N each set as a link node with 1-3 repeats is output as all 9 (3x3) possible combinations:

[$(C1CNC1),$(C1CCNC1),$(C1CCNCC1),$(C1CCNC1),$(C1CCNCC1),$(C1CCCNCC1),$(C1CCNCC1),$(C1CCCNCC1),$(C1CCCNCCC1)]
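
To make the enumeration above concrete, here is a minimal sketch in plain Python of how such an 'OR' list can be built for this azetidine case. The function names are my own, and this is not ChemAxon's code: the ring atoms may be written in a different order from the listing above, although each alternative describes the same ring.

```python
from itertools import product


def enumerate_ring_smarts(min_repeat=1, max_repeat=3):
    """Expand the two link-node carbons flanking the ring N into explicit rings."""
    repeats = range(min_repeat, max_repeat + 1)
    rings = []
    for left, right in product(repeats, repeat=2):
        # One fixed ring carbon, `left` carbons, the N, `right` carbons, ring closure.
        rings.append("C1" + "C" * left + "N" + "C" * right + "1")
    return rings


def as_or_list(rings):
    """Wrap each alternative in a recursive SMARTS $() and join with ',' (OR)."""
    return "[" + ",".join("$(" + ring + ")" for ring in rings) + "]"


rings = enumerate_ring_smarts()
sizes = sorted({sum(ch in "CN" for ch in ring) for ring in rings})

print(len(rings))  # 9
print(sizes)       # [4, 5, 6, 7, 8]
print(as_or_list(rings))
```

Several of the 9 combinations give the same ring size, so the list could be deduplicated before searching without changing the hits.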

This works fine from a substructure search point of view.  The original link node position information is lost, of course, but this is not a big problem.  I think the greater concern is whether people using MarvinSketch end up with the results they are expecting if certain query features are not recognised by certain toolkits.

Kind regards

James