Substructure Matcher - Align Matched Structures

Hi,

I am currently using the Substructure Matcher node to take an input query molecule from MarvinSketch (in MOL format) and use it to filter a list of molecules.

What I would really like is for the query molecule's 2D coordinates to be 'fixed' for the matches. Selecting the 'Align Matched Structures' checkbox does align all of the hits, but apparently to the conformation found in the first incoming molecule(?). It would be great if there were an option to use the coordinates of the query molecule instead.

Kind regards

James

Hi James,

You could try the implementation in the current nightly build, 1.0.0.0001075, which now has an "Align by query" option.
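
The thread does not say how "Align by query" works internally. As an illustration only, one common way to lay a hit out like its query is a least-squares rigid fit: overlay the matched atoms' 2D coordinates onto the query's coordinates, then apply the same rotation and translation to every atom of the hit. The Java sketch below assumes the matcher has already produced a mapping from query atoms to target atoms. The class and method names (`AlignByQuerySketch`, `alignToQuery`) are hypothetical and are not part of the Indigo API. The sketch handles rotation only, not mirror flips.

```java
// Hypothetical illustration: rigid 2D least-squares overlay of a hit onto query coordinates.
// Not Indigo's implementation of "Align by query".
class AlignByQuerySketch {

    /**
     * Rotates and translates all target atoms so that the matched atoms
     * best overlay the query atoms in the least-squares sense (no mirroring).
     *
     * @param target  2D coordinates of every atom in the hit, {x, y} per atom
     * @param mapping mapping[i] = index of the target atom matched to query atom i
     * @param query   2D coordinates of the query atoms, {x, y} per atom
     */
    static double[][] alignToQuery(double[][] target, int[] mapping, double[][] query) {
        int n = query.length;
        if (n == 0) throw new IllegalArgumentException("need at least one matched atom");

        // Centroids of the matched target atoms and of the query atoms
        double pcx = 0, pcy = 0, qcx = 0, qcy = 0;
        for (int i = 0; i < n; i++) {
            pcx += target[mapping[i]][0];
            pcy += target[mapping[i]][1];
            qcx += query[i][0];
            qcy += query[i][1];
        }
        pcx /= n; pcy /= n; qcx /= n; qcy /= n;

        // Summed dot and cross products of the centred pairs give the optimal angle
        double dot = 0, cross = 0;
        for (int i = 0; i < n; i++) {
            double px = target[mapping[i]][0] - pcx, py = target[mapping[i]][1] - pcy;
            double qx = query[i][0] - qcx, qy = query[i][1] - qcy;
            dot += px * qx + py * qy;
            cross += px * qy - py * qx;
        }
        double theta = Math.atan2(cross, dot);
        double c = Math.cos(theta), s = Math.sin(theta);

        // Rotate every atom about the target centroid, then move it onto the query centroid
        double[][] out = new double[target.length][2];
        for (int j = 0; j < target.length; j++) {
            double x = target[j][0] - pcx, y = target[j][1] - pcy;
            out[j][0] = c * x - s * y + qcx;
            out[j][1] = s * x + c * y + qcy;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] query = {{0, 0}, {1, 0}};           // query drawn along the x axis
        double[][] target = {{5, 5}, {5, 6}, {4, 6}};  // hit drawn rotated and shifted
        int[] mapping = {0, 1};                        // target atoms 0 and 1 match the query
        for (double[] p : alignToQuery(target, mapping, query)) {
            System.out.printf("%.3f %.3f%n", p[0], p[1]);
        }
    }
}
```

Running `main` moves the two matched atoms onto (0, 0) and (1, 0) and carries the unmatched third atom rigidly along to (1, 1), up to floating-point rounding.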

With best regards,
Mikhail

Hi Mikhail,

This works perfectly. Being able to specify how structures are aligned is a really handy addition. Medicinal chemists always prefer their molecules aligned in specific ways, and now this is possible with the "Align by query" feature.

I really like it. James, thanks for the suggestion, and Mikhail, thanks for implementing it.

Simon.

Hi Mikhail,

I've just had a chance to try this over the weekend, and it works very well! However, during testing I found that the Substructure Matcher node gives errors for certain molecules:

ERROR Substructure Matcher Execute failed: element: can not calculate implicit hydrogens on aromatic N, charge 1, degree 3, 0 radical electrons

I have tracked this down to a problem parsing the following molecule, which I believe is perfectly valid:

CCOc1ccc2[n+]([O-])c(N)c(-c3ccc([N+](=O)[O-])cc3)[n+]([O-])c2c1

Actually, I have found on a number of occasions that the Molecule->Indigo node lets through molecules that other Indigo nodes subsequently have problems with... Anyway, I hope this error report helps make Indigo even better, and thanks once again for the improved Substructure Matcher!

Kind regards

James

PS @Simon: glad you agree this is a useful addition; I have certainly found that many of us medchemists want easy access to this functionality!

Hello James,
Thank you for the feedback!
Did you pass this molecule as SMILES or as a molfile?
CCOc1ccc2[n+]([O-])c(N)c(-c3ccc([N+](=O)[O-])cc3)[n+]([O-])c2c1
I have checked it, and in SMILES format there are no problems with it. If it was passed in molfile format, then some operations on this molecule cannot be performed, because the number of hydrogens on the N+ atoms in the aromatic ring is uncertain: the molfile format doesn't support specifying the number of implicit hydrogens. For example, take these molecules in molfile format: CC1=CN=C(O)N1 and CC1=CNC(O)=N1. After aromatization they become Cc1cnc(O)[nH]1 and Cc1c[nH]c(O)n1, but in a standard molfile the difference between them gets lost. To resolve this, ChemAxon introduced an extension to the molfile format that writes implicit hydrogens in an MRV_IMPLICIT_H section. We don't support it yet, and this is not planned, but if it is very important for you, we can implement it.
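
To make the ambiguity concrete, here is a minimal Java sketch that derives the implicit hydrogen count of a nitrogen from its integer (Kekulé) bond orders. The names `ImplicitHydrogenSketch`, `nitrogenValence` and `implicitH` are hypothetical, and the valence rules are deliberately simplified to neutral N and N+; this is not Indigo's code. For the two tautomers above the counts differ, and for an aromatic N+ bonded to an oxide oxygen, as in the molecule from this thread, the answer is 0 or 1 depending on whether one of its ring bonds is double. Once a molfile stores those ring bonds as aromatic, the integer orders this calculation needs are no longer available.

```java
// Illustrative only: simplified valence rules for nitrogen, not Indigo's implementation.
class ImplicitHydrogenSketch {

    // Default valence of nitrogen for the charges this sketch covers
    static int nitrogenValence(int charge) {
        if (charge == 0) return 3;
        if (charge == 1) return 4; // N+ is isoelectronic with carbon
        throw new IllegalArgumentException("charge not covered by this sketch: " + charge);
    }

    // Implicit H = default valence - sum of explicit integer bond orders
    static int implicitH(int charge, int[] bondOrders) {
        int sum = 0;
        for (int order : bondOrders) sum += order;
        int h = nitrogenValence(charge) - sum;
        if (h < 0) throw new IllegalArgumentException("valence exceeded");
        return h;
    }

    public static void main(String[] args) {
        // CC1=CN=C(O)N1: the N=C nitrogen has bonds {1, 2}, the ring N1 has bonds {1, 1}
        System.out.println(implicitH(0, new int[]{1, 2})); // 0
        System.out.println(implicitH(0, new int[]{1, 1})); // 1
        // Aromatic N+ bonded to O-: {2, 1, 1} or {1, 1, 1} depending on the Kekule structure
        System.out.println(implicitH(1, new int[]{2, 1, 1})); // 0
        System.out.println(implicitH(1, new int[]{1, 1, 1})); // 1
    }
}
```
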
Indigo has a dearomatization node that dearomatizes such molecules. In general, dearomatization is not a fast operation, so we decided not to do it automatically. Also, because of the limitations of the molfile format, it is not a good idea to store aromatized molecules in this format. So you can first dearomatize such molecules and then aromatize them again; in this case the information about the number of implicit hydrogens will be preserved.
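
Dearomatization amounts to finding a Kekulé structure: choosing ring bonds to be double so that every aromatic atom that needs a double bond gets exactly one, which is a perfect-matching problem. The Java sketch below uses plain backtracking and hypothetical names (`KekulizeSketch`, `kekulize`); it is not Indigo's dearomatizer. It shows why the hydrogen information matters: a five-membered ring in which one atom is marked as a pyrrole-type [nH] (needing no double bond) kekulizes, while the same ring with no atom exempted has no valid assignment. A naive search like this one can also take exponential time in the worst case.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative backtracking Kekulé assignment; not Indigo's dearomatization algorithm.
class KekulizeSketch {

    /**
     * Returns the indices (into 'bonds') of the bonds chosen as double bonds,
     * or null if no Kekulé structure exists. needsDouble[a] says whether atom a
     * must carry exactly one double bond (a pyrrole-type [nH] does not).
     */
    static List<Integer> kekulize(int atomCount, int[][] bonds, boolean[] needsDouble) {
        boolean[] used = new boolean[atomCount];
        List<Integer> chosen = new ArrayList<>();
        return search(0, bonds, needsDouble, used, chosen) ? chosen : null;
    }

    private static boolean search(int atom, int[][] bonds, boolean[] needsDouble,
                                  boolean[] used, List<Integer> chosen) {
        // Skip atoms that are already paired or need no double bond
        while (atom < used.length && (used[atom] || !needsDouble[atom])) atom++;
        if (atom == used.length) return true; // every atom that needed a double bond has one

        for (int b = 0; b < bonds.length; b++) {
            int u = bonds[b][0], v = bonds[b][1];
            int other = (u == atom) ? v : (v == atom) ? u : -1;
            if (other < 0 || used[other] || !needsDouble[other]) continue;
            used[atom] = used[other] = true;
            chosen.add(b);
            if (search(atom + 1, bonds, needsDouble, used, chosen)) return true;
            chosen.remove(chosen.size() - 1); // backtrack: undo this double bond
            used[atom] = used[other] = false;
        }
        return false;
    }

    public static void main(String[] args) {
        int[][] ring = {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 0}}; // five-membered ring
        // Atom 0 is an [nH]: it needs no double bond
        System.out.println(kekulize(5, ring, new boolean[]{false, true, true, true, true})); // [1, 3]
        // No atom exempted: five atoms cannot all be paired
        System.out.println(kekulize(5, ring, new boolean[]{true, true, true, true, true})); // null
    }
}
```

In the first case bonds 1-2 and 3-4 become double and atom 0 keeps its hydrogen; without knowing which atom carries the H, the search has nothing valid to return.
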
If you have many such cases, where Indigo accepts the molecule via the Molecule-to-Indigo transformation but later throws an error, we might consider adding an option to the Molecule-to-Indigo node. With this option, the node would accept only fully defined molecules, in which all implicit hydrogens are defined. What do you think about adding such an option? This is not done automatically because Indigo can still perform some operations on such molecules: for example, it can dearomatize them.
Best regards,
Mikhail

Hi Mikhail,

Sorry for the slow response! Yes, I was passing the molecule(s) in MOL (SDF) format. Your suggested process of 'dearomatize', then 'aromatize' works, thanks. However, after doing this, I am having problems with the Substructure Matcher node when I choose to align to the query:

ERROR Substructure Matcher Execute failed: array: reserve(): no memory  

2011-10-13 10:54:46,278 ERROR KNIME-Worker-2 Substructure Matcher : Execute failed: array: reserve(): no memory
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 Substructure Matcher : Execute failed: array: reserve(): no memory
com.ggasoftware.indigo.IndigoException: array: reserve(): no memory
    at com.ggasoftware.indigo.Indigo.checkResult(Indigo.java:49)
    at com.ggasoftware.indigo.IndigoObject.clone(IndigoObject.java:63)
    at com.ggasoftware.indigo.knime.submatcher.IndigoSubstructureMatcherNodeModel.execute(IndigoSubstructureMatcherNodeModel.java:127)
    at org.knime.core.node.NodeModel.execute(NodeModel.java:668)
    at org.knime.core.node.NodeModel.executeModel(NodeModel.java:524)
    at org.knime.core.node.Node.execute(Node.java:873)
    at org.knime.core.node.workflow.SingleNodeContainer.performExecuteNode(SingleNodeContainer.java:840)
    at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:100)
    at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:124)
    at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:239)
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 WorkflowManager : Substructure Matcher 0:0:4 doBeforePostExecution
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 NodeContainer : Substructure Matcher 0:0:4 has new state: POSTEXECUTE
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 KnimeResourceNavigator : Node message changed: ERROR: Error in sub flow.
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 NodeContainer : Align_structures_to_scaffold (v3) 0:0 has new state: EXECUTING
2011-10-13 10:54:46,278 DEBUG KNIME-Worker-2 WorkflowManager : Substructure Matcher 0:0:4 doAfterExecute - failure

I am working with an input of ~10,000 molecules (Win7 64-bit; 32-bit KNIME 2.4.2; latest Indigo nodes)

Kind regards

James

Hello James,

Sorry for the delay in replying.

Thank you for the bug report. I have reproduced the issue. The problem lies in how Java and KNIME manage memory when native C++ code is involved. I think we will solve this issue soon, as was done in the RDKit nodes.

For now, you need to set the memory policy for the Indigo nodes manually: select "Write tables to disk" in the Memory Policy tab of the node's configuration window.

Best regards,
Mikhail