Update to v1.36.3 - MMP Bug fix

Vernalis · May 18, 2023, 4:36pm

We have updated the nightly and 4.7 stable builds to v1.36.3. This fixes a minor bug in the MMP fragmentation nodes that was brought to our attention via email.

The bug was that the MMP Molecule Fragment node (MMP Molecule Fragment (RDKit) – KNIME Community Hub) mis-estimated the number of possible fragmentations for the Limit by Complexity / Maximum Fragmentations options when the number of cuts being made was not 1.

The bug was discovered while answering a question about what the setting is meant to do, so here is the explanation given to the user by email, in case anyone else finds it useful:

I added this feature to prevent unexpectedly complex molecules (i.e. molecules which, despite passing a prefilter by heavy atom count (HAC), could still be fragmented in a large number of ways) from grinding the fragmentation of a set of compounds to a halt while the node spent a long time on one or two ‘pathological’ molecules. I implemented this with a crude approach, as follows:

Identify all the bonds which it is possible to cut for the specified number of cuts (you can see which bonds those are using the ‘MMP Show Cuttable Bonds’ node)

As each fragmentation is a combination of those bonds, we can use the combination formula nCr = n! / ((n-r)! r!), where n is the number of cuttable bonds and r is the number of cuts to make. There are 2 additional things to consider:

When r = 1, the number needs to reflect that each bond can be cut in 2 directions, e.g. A-B can result in a ‘key’ A-* and ‘value’ B-*, and also in a ‘key’ B-* with ‘value’ A-*. This is why the 23 matching bonds for r = 1 fail when the threshold is 45, but pass when the threshold is 46 (= 2 × 23). I agree this is slightly unintuitive (I had to think quite hard about this when I was looking back at the source code just now), but I think it is correct.

When r = 2, we need to add in the possibility of each bond being cut twice if that option is selected, i.e. so that A-B becomes ‘key’ A-*.B-* with ‘value’ *-*. So in addition to the nCr combinations from above, we need to add a further n combinations (one per cuttable bond). The example has 8 cuttable bonds, which for 2 cuts gives 8 + 8!/(6!2!) = 8 + 28 = 36, which is not the behaviour the node is showing (see below)!
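The estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the node's actual source code: the function and parameter names are made up for this example, and the "pass if the estimate is less than or equal to the threshold" comparison is inferred from the 45 / 46 example rather than taken from the implementation.

```python
from math import comb


def estimate_fragmentations(n_bonds: int, n_cuts: int, allow_double_cut: bool = False) -> int:
    """Rough upper estimate of the number of fragmentations (hypothetical helper).

    n_bonds: number of cuttable bonds (n)
    n_cuts: number of cuts to make (r)
    allow_double_cut: whether a single bond may be cut twice (only relevant for r = 2)
    """
    if n_cuts == 1:
        # Each bond can be cut in 2 directions (key/value swapped)
        return 2 * n_bonds
    estimate = comb(n_bonds, n_cuts)  # nCr = n! / ((n-r)! r!)
    if n_cuts == 2 and allow_double_cut:
        # One extra fragmentation per bond for the 'cut the same bond twice' case
        estimate += n_bonds
    return estimate


def passes_filter(n_bonds: int, n_cuts: int, threshold: int, allow_double_cut: bool = False) -> bool:
    # Assumption: a molecule is kept when the estimate does not exceed the threshold
    return estimate_fragmentations(n_bonds, n_cuts, allow_double_cut) <= threshold


print(estimate_fragmentations(23, 1))                        # 23 bonds, 1 cut
print(passes_filter(23, 1, 45), passes_filter(23, 1, 46))    # thresholds 45 and 46
print(estimate_fragmentations(8, 2, allow_double_cut=True))  # 8 bonds, 2 cuts
```

With these inputs the sketch reproduces the numbers quoted above: 46 for the 23-bond single-cut case, and 8 + 28 = 36 for the 8-bond double-cut case.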

The caveat is that this approach will sometimes over-estimate molecular complexity when there is symmetry. Take the SMILES string Fc1ccc(C(F)(F)F)cc1 and suppose we are only making cuts at bonds to F. There are 4 such bonds (i.e. n = 4), and so this method estimates for 2 cuts that there are 4 + 4!/(2!2!) = 4 + 6 = 10 combinations. However, in reality, three of the C-F bonds are symmetry-equivalent, and so the actual fragmentations would be one ‘double cut’ to the Ar-F bond, one ‘double cut’ to a CF3 C-F bond (not 3, one for each F separately), one with a cut to Ar-F and a cut to a CF3 C-F bond, and one with 2 cuts to 2 separate CF3 group C-F bonds, i.e. 4 fragmentations. Clearly this is not ideal, but the filter was intended to be a rough filtering step to avoid pathological failures in a workflow.
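To make the over-estimate concrete, here is a small pure-Python sketch that counts the 2-cut combinations for this example both ways. The symmetry classes are assigned by hand (the labels "Ar-F" and "CF3" are just illustrative tags), so this only demonstrates the counting argument; it is not how the node, or RDKit, detects symmetry.

```python
from itertools import combinations

# Hand-assigned symmetry class of each cuttable C-F bond in Fc1ccc(C(F)(F)F)cc1
bond_classes = ["Ar-F", "CF3", "CF3", "CF3"]
bond_ids = range(len(bond_classes))

pairs = list(combinations(bond_ids, 2))  # cuts to two different bonds
doubles = [(i, i) for i in bond_ids]     # the same bond cut twice

# What the crude filter counts: every combination, ignoring symmetry
naive_estimate = len(pairs) + len(doubles)

# Collapse combinations that differ only by swapping equivalent bonds.
# 'pair' and 'double' are kept apart so that two different CF3 bonds are not
# confused with one CF3 bond cut twice.
distinct = {("pair", tuple(sorted(bond_classes[i] for i in cut))) for cut in pairs}
distinct |= {("double", bond_classes[i]) for i, _ in doubles}

print(naive_estimate)  # the crude estimate
print(len(distinct))   # symmetry-distinct fragmentations
```

This gives 10 for the crude estimate and 4 once equivalent bonds are merged, matching the counts in the paragraph above.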

Hopefully the above clarifies things.

Steve
