Substructure match counter and Query molecule to Indigo nodes: suggested improvements

The substructure match counter node is very useful (I would have used it and saved a lot of effort preparing and re-formatting data for this recent publication - http://dx.doi.org/10.1021/jm200187y - yes, we did cite Knime!). However, I have a few suggested improvements:

1.  The query input table - currently only the first query row appears to be used, even though multi-row query tables are accepted.  It would be very useful if multiple query rows could have their counts appended as separate columns, named either using the values in a second column selected in the node configuration or (perhaps optionally, as an alternative) systematically (Query_1, Query_2, etc.).

2.  Query rendering - the rendering of queries is not fully representative of all the query features. For example, the query SMARTS string [NX3;H2,H1,H0;!$(NS);!$(NC=O);!$(NC=N);!$(Nc)] (intended to capture aliphatic amines) is simply rendered as 'N' by the query molecule to indigo node (although it does still seem to behave as expected in the substructure match counter).

3.  The query molecule to indigo node rejects any SMARTS strings containing 'h' definitions - please can it be fixed to handle these?
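
The column naming proposed in point 1 could be sketched roughly as follows. This is only an illustration in plain Python, not the node's actual implementation: the function name `query_column_names`, the `prefix` parameter, and the " (#n)" suffix for repeated labels are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def query_column_names(
    n_queries: int,
    labels: Optional[Sequence[str]] = None,
    prefix: str = "Query",
) -> List[str]:
    """Return one output column name per query row (hypothetical helper).

    If no label column is selected, names are systematic: Query_1, Query_2, ...
    If labels are given, each label is used as the column name; blank labels
    fall back to the systematic name for that row.
    """
    if labels is None:
        return [f"{prefix}_{i}" for i in range(1, n_queries + 1)]
    if len(labels) != n_queries:
        raise ValueError("need exactly one label per query row")

    names: List[str] = []
    seen = {}  # how many times each base name has been used so far
    for i, label in enumerate(labels, start=1):
        base = label.strip() if label and label.strip() else f"{prefix}_{i}"
        seen[base] = seen.get(base, 0) + 1
        # Assumed convention: repeated labels get a numeric suffix so that
        # every count column keeps a distinct name.
        names.append(base if seen[base] == 1 else f"{base} (#{seen[base]})")
    return names


systematic = query_column_names(3)
labelled = query_column_names(3, ["amine", "", "amine"])
```

Here `systematic` holds the Query_1, Query_2, Query_3 style names, while `labelled` takes names from the second column and falls back to the systematic name for the blank label.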

 Thanks

Steve

Hello Steve,

Thank you for the good comments about the substructure match counter node.

Reply to your suggestions:

1. This sounds reasonable and I think we will implement it soon.

2. We will also try to add better rendering for these structures. I think the whole string should be rendered as a single atom, because I don't know how this query could be rendered as a structure in a standard way. There are no IUPAC recommendations for query structures.

3. We are strongly against the 'h' notation in SMARTS. Hydrogens can be stored implicitly or explicitly in the database, and this difference should not affect the search results. "H" (total hydrogen count) may be used instead. Do you have reasons to use "h" instead of "H"?
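
To illustrate the concern in point 3, here is a toy sketch in plain Python. It does not use Indigo or any other toolkit; the `Nitrogen` record and the two `matches_*` helpers are hypothetical. It assumes the usual SMARTS meaning of the two primitives: "H<n>" tests the total hydrogen count, while "h<n>" tests only the implicit hydrogens.

```python
from dataclasses import dataclass


@dataclass
class Nitrogen:
    """Toy atom record (hypothetical): how its hydrogens happen to be stored."""
    explicit_h: int  # hydrogens stored as explicit neighbour atoms
    implicit_h: int  # hydrogens implied by valence, not stored as atoms

    @property
    def total_h(self) -> int:
        # Total hydrogen count, independent of storage form.
        return self.explicit_h + self.implicit_h


def matches_total_h(atom: Nitrogen, n: int) -> bool:
    """Mimics the SMARTS primitive H<n> (total hydrogen count)."""
    return atom.total_h == n


def matches_implicit_h(atom: Nitrogen, n: int) -> bool:
    """Mimics the SMARTS primitive h<n> (implicit hydrogen count)."""
    return atom.implicit_h == n


# The same primary amine nitrogen, stored in two different ways.
implicit_form = Nitrogen(explicit_h=0, implicit_h=2)
explicit_form = Nitrogen(explicit_h=2, implicit_h=0)

# "H2" gives the same answer for both storage forms.
same_for_H = matches_total_h(implicit_form, 2) == matches_total_h(explicit_form, 2)
# "h2" matches only the form with implicit hydrogens.
same_for_h = matches_implicit_h(implicit_form, 2) == matches_implicit_h(explicit_form, 2)
```

In this sketch `same_for_H` is true and `same_for_h` is false, which is the storage dependence described above.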

With best regards,
Mikhail

Hello Steve,

You could try the nightly build to check the rendering of query molecules.

Currently the query molecule "[NX3;H2,H1,H0;!$(NS);!$(NC=O);!$(NC=N);!$(Nc)]-c-c-c-c-c-c-c-c-c-c-c-c-c" is rendered as in the attached image "with_description.png". It contains an explanation of the query structure, but it is too long. I think we will just print the SMARTS expression, as shown in the picture "smarts.png" (it is not fully correct yet). Which variant do you prefer?

With best regards,
Mikhail

Mikhail,

Thanks for this - I think the simple SMARTS should be OK; I'm not sure that the description adds enough to be worth the extra space it takes up.  Maybe others will comment differently, in which case it might be worth a node configuration option to select between them?

Steve

Steve,

Renderer options are specified for the whole Knime instance, not for a particular node.

We will think about an option, but I'm not sure that such an option is necessary. We will add rendering of compact SMARTS expressions in one of the next releases.

Mikhail

Hello Steve,

The latest Substructure match counter from the nightly build (version 1.1.0.0001123) now accepts multiple queries. In that case, multiple columns with counters are created. Also, if highlighting is specified, then all matches are highlighted.

Best regards,
Mikhail