Problems with threshold parameter of RDKit MCS node

Dear KNIME and RDKit users,

I have a problem with the threshold parameter of the RDKit MCS node. In principle, this parameter sets the fraction of input molecules that must contain the Maximum Common Substructure (MCS), and it ranges from 0.0 to 1.0.

As you can see from the attached example workflow below, if I run this node on a compound set whose MCS has 28 atoms, it works as expected with both threshold = 1.0 and threshold = 0.5.

If I add a couple of different compounds to the same set and run the node with threshold = 1.0, I obtain an empty output. This is expected, because a threshold of 1.0 requires the MCS to cover all the compounds. However, if I set the threshold to 0.5, I would expect to retrieve the same 28-atom MCS, because more than 90% of the molecules share it. Unfortunately, I don’t retrieve any MCS in that case either: the node gives an empty cell as output.
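To make the expectation concrete, here is a rough Python sketch of the reasoning. The counts (20 compounds sharing the MCS plus 2 added outliers) are hypothetical and not taken from the workflow, and the rule "required matches = ceil(threshold × number of molecules)" is my assumption about how the threshold is meant to be read, not a statement of the node's actual implementation. The helper `find_mcs_smarts` is also hypothetical: it only shows how the same question could be put to the RDKit Python API (`rdFMCS.FindMCS` with its `threshold` argument), assuming RDKit is installed, and I am assuming the KNIME node relies on the same underlying MCS code.

```python
import math


def required_matches(n_molecules: int, threshold: float) -> int:
    """Smallest number of molecules that must contain the MCS.

    Assumption: the threshold is read as a fraction of the input set,
    rounded up to a whole number of molecules.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return math.ceil(threshold * n_molecules)


def mcs_expected(n_sharing: int, n_total: int, threshold: float) -> bool:
    """True if a substructure shared by n_sharing molecules should qualify."""
    return n_sharing >= required_matches(n_total, threshold)


def find_mcs_smarts(smiles_list, threshold):
    """Hypothetical helper: ask RDKit directly for the MCS of a SMILES list.

    RDKit is imported lazily so the arithmetic above works without it.
    """
    from rdkit import Chem
    from rdkit.Chem import rdFMCS

    mols = [Chem.MolFromSmiles(smiles) for smiles in smiles_list]
    result = rdFMCS.FindMCS(mols, threshold=threshold)
    return result.smartsString, result.numAtoms


# Hypothetical counts: 20 compounds share the MCS, 2 outliers were added.
n_sharing, n_total = 20, 22
print(required_matches(n_total, 1.0), mcs_expected(n_sharing, n_total, 1.0))
print(required_matches(n_total, 0.5), mcs_expected(n_sharing, n_total, 0.5))
```

Under these assumptions, a threshold of 1.0 needs all 22 molecules to match, so an empty result is reasonable, while a threshold of 0.5 needs only 11, so the MCS shared by 20 of them should be returned.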

Am I missing something or is this a bug?

Thanks in advance for any help!

rdkit_mcs_threshold_problem.knwf (78.5 KB)

Hi @gcincilla -

Let me ask internally about this one (I’m not a chemist).


Possibly a question for @greglandrum or @manuelschwarze - it looks like either a bug in the RDKit node/library, or the threshold parameter doesn’t actually do what its description suggests.

Steve


@gcincilla, this appears to be a bug in the underlying RDKit MCS code.
I’ve filed the issue on GitHub: rdkit/rdkit, Issue #6578, "partial MCS failure".

Hopefully we’ll be able to get it fixed.

-greg


@greglandrum, thank you very much for checking that and for filing the issue on GitHub.
Gio

