Problem with R-group decomposer

Hi,

I have just encountered a bug(?) with the RDKit R-group decomposition node, so I thought I would share!

I was working with a random set of 10000 eMolecules structures -> RDKit (9985 remain) -> Substructure filter (indole SMARTS c1cc2ccccc2n1, 280 hits) -> finally R-group decomposition with the same indole SMARTS passed in as the scaffold. I get the following:

ERROR R group decomposition Execute failed: ("MolSanitizeException"): null

and the log shows:

 

2011-09-06 05:11:49,234 DEBUG KNIME-Worker-5 LocalNodeExecutionJob : R group decomposition 0:2:19 Start execute
2011-09-06 05:11:49,331 DEBUG KNIME-Worker-5 R group decomposition : reset
2011-09-06 05:11:49,332 DEBUG KNIME-Worker-5 R group decomposition : clean output ports.
2011-09-06 05:11:49,333 ERROR KNIME-Worker-5 R group decomposition : Execute failed: ("MolSanitizeException"): null
2011-09-06 05:11:49,334 DEBUG KNIME-Worker-5 R group decomposition : Execute failed: ("MolSanitizeException"): null
org.RDKit.MolSanitizeException
at org.RDKit.RDKFuncsJNI.getMolFrags__SWIG_4(Native Method)
at org.RDKit.RDKFuncs.getMolFrags(RDKFuncs.java:908)
at org.rdkit.knime.nodes.rgroups.RDKitRGroupsNodeModel.execute(RDKitRGroupsNodeModel.java:238)
at org.knime.core.node.NodeModel.execute(NodeModel.java:668)
at org.knime.core.node.NodeModel.executeModel(NodeModel.java:524)
at org.knime.core.node.Node.execute(Node.java:873)
at org.knime.core.node.workflow.SingleNodeContainer.performExecuteNode(SingleNodeContainer.java:840)
at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:100)
at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:124)
at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:239)
2011-09-06 05:11:49,335 DEBUG KNIME-Worker-5 WorkflowManager : R group decomposition 0:2:19 doBeforePostExecution
2011-09-06 05:11:49,335 DEBUG KNIME-Worker-5 NodeContainer : R group decomposition 0:2:19 has new state: POSTEXECUTE
2011-09-06 05:11:49,336 DEBUG KNIME-Worker-5 KnimeResourceNavigator : Node message changed: ERROR: Error in sub flow.
2011-09-06 05:11:49,336 DEBUG KNIME-Worker-5 NodeContainer : eMolecules 0:2 has new state: EXECUTING
2011-09-06 05:11:49,336 DEBUG KNIME-Worker-5 WorkflowManager : R group decomposition 0:2:19 doAfterExecute - failure
2011-09-06 05:11:49,336 DEBUG KNIME-Worker-5 R group decomposition : reset
2011-09-06 05:11:49,337 DEBUG KNIME-Worker-5 R group decomposition : clean output ports.
2011-09-06 05:11:49,337 DEBUG KNIME-Worker-5 NodeContainer : R group decomposition 0:2:19 has new state: IDLE
2011-09-06 05:11:49,338 DEBUG KNIME-Worker-5 R group decomposition : Configure succeeded. (R group decomposition)
2011-09-06 05:11:49,338 DEBUG KNIME-Worker-5 NodeContainer : R group decomposition 0:2:19 has new state: CONFIGURED
2011-09-06 05:11:49,338 DEBUG KNIME-Worker-5 KnimeResourceNavigator : state changed to IDLE
2011-09-06 05:11:49,338 DEBUG KNIME-Worker-5 NodeContainer : eMolecules 0:2 has new state: IDLE
2011-09-06 05:11:49,339 DEBUG KNIME-Worker-5 NodeContainer : eMolecules 0:2 has new state: IDLE
2011-09-06 05:11:49,339 DEBUG KNIME-WFM-Parent-Notifier NodeContainer : Workflow Manager 0 has new state: IDLE
2011-09-06 05:12:29,975 DEBUG main NodeContainerEditPart : R group decomposition 0:2:19 (CONFIGURED)
2011-09-06 05:12:29,976 DEBUG main NodeContainerEditPart : TableRow To Variable 0:2:18 (EXECUTED)
2011-09-06 05:12:31,633 DEBUG main OpenPortViewAction : Open Port View TableRow To Variable (#1)
 
Kind regards
 
James

Vacation-slowed reply:

It is probably a bug in the R-group decomposition node. If it's what I think it is, the fix is straightforward.

-greg

James,

Can you please export the workflow and attach it to a message here so that I can use it for testing?

Thanks,

-greg

Hi Greg,

I finally tracked down the workflow (it was on my home PC, not my work laptop!). I made a slight amendment (table writer / table reader) so that I could save it with the 10,000-molecule set rather than the parent set of 5 million eMolecules (which made the file > 100MB!). I also checked that I could still reproduce the error on my home PC (Win7 64-bit; KNIME 2.4.2; RDKit nodes 2.0.0.0001061).

Kind regards

James

Thanks. That's what I needed.

The problem occurs with molecules where a carbazole is present instead of the simple indole.

What an R-group decomposition should do in this case is an interesting question, but in the short term I will check in a fix so that no R groups are generated in these cases.
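
To make the carbazole case concrete, here is a small toy sketch in plain Python. It is not the RDKit node's code and does not use the RDKit API; the atom numbering, the bond lists and the helper names (side_chains, bridging_fragments) are all made up for illustration. The idea is simply that if you remove the scaffold atoms, a normal substituent touches the scaffold at one atom, while the extra ring of a carbazole touches it at two (the indole 2- and 3-positions), so it cannot be cut off as a single ordinary R group.

```python
from collections import defaultdict

# Hypothetical atom numbering for the indole scaffold c1cc2ccccc2n1:
# 0 = C2, 1 = C3, 2 = C3a, 3..6 = C4..C7, 7 = C7a, 8 = N1
INDOLE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
                (6, 7), (7, 8), (2, 7), (8, 0)]
CORE = set(range(9))

# Carbazole: four extra carbons (9..12) closing a ring between C3 and C2
CARBAZOLE_BONDS = INDOLE_BONDS + [(1, 9), (9, 10), (10, 11), (11, 12), (12, 0)]
# 2,3-dimethylindole: two independent one-atom substituents
DIMETHYL_BONDS = INDOLE_BONDS + [(0, 9), (1, 10)]


def side_chains(bonds, core):
    """Return (fragment_atoms, core_attachment_atoms) per non-core fragment."""
    core = set(core)
    neighbours = defaultdict(set)
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    fragments = []
    for start in sorted(neighbours):
        if start in core or start in seen:
            continue
        # Flood-fill over non-core atoms, noting every core atom we touch
        stack, fragment, attachments = [start], set(), set()
        while stack:
            atom = stack.pop()
            if atom in fragment:
                continue
            fragment.add(atom)
            for nbr in neighbours[atom]:
                if nbr in core:
                    attachments.add(nbr)
                elif nbr not in fragment:
                    stack.append(nbr)
        seen |= fragment
        fragments.append((sorted(fragment), sorted(attachments)))
    return fragments


def bridging_fragments(bonds, core):
    """Fragments bonded to more than one scaffold atom (ring closures)."""
    return [f for f in side_chains(bonds, core) if len(f[1]) > 1]


print(side_chains(DIMETHYL_BONDS, CORE))          # two fragments, one attachment each
print(bridging_fragments(CARBAZOLE_BONDS, CORE))  # one fragment attached at atoms 0 and 1
```

A decomposition could either skip molecules with such bridging fragments (roughly what the short-term fix does, as far as described here) or report the fragment once with both attachment points.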

-greg

Thanks Greg, the short-term fix is much appreciated!

As you probably saw, the workflow also contains the equivalent Indigo nodes - which I think do a rather elegant job in the cases where there is cyclisation between two scaffold substitution points (and I particularly like the rendering of the scaffold in these cases - even highlighting whether the ring is perceived as aromatic or not).

So in this indole scaffold example, R3 covers all of the examples where the 2- and 3-positions of the indole are cyclised to give an extra ring.

Would a similar approach be viable for RDKit?

Kind regards

James