Update to v1.36.3 - MMP Bug fix

We have updated the nightly build and 4.7 stable builds to v1.36.3. This fixes a minor bug in the MMP fragmentation nodes brought to our attention via email.

The bug was that the MMP Molecule Fragment node (MMP Molecule Fragment (RDKit) – KNIME Community Hub) mis-estimated the number of possible fragmentations for the Limit by Complexity / Maximum Fragmentations options when the number of cuts being made was not 1.

As this bug was discovered while answering a question about what the setting is meant to do, here is the explanation given by email to the user, in case anyone else finds it useful:

I added this feature to prevent unexpectedly complex molecules (i.e. molecules which, despite passing a prefilter by heavy atom count (HAC), can still be fragmented in a large number of ways) from causing the fragmentation of a set of compounds to grind to a halt while one or two ‘pathological’ molecules keep the node waiting for a long time. I implemented this with a crude approach as follows:

  • Identify all the bonds which it is possible to cut for the specified number of cuts (you can see which bonds those are using the ‘MMP Show Cuttable Bonds’ node)
  • As each fragmentation is a combination of those bonds, we can use the combination formula nCr = n! / ((n-r)! r!), where n is the number of cuttable bonds and r is the number of cuts to make. There are 2 additional things to consider:
    • When r = 1, the number needs to reflect that each bond can be cut in 2 directions, e.g. A-B can result in a ‘key’ A-* with ‘value’ B-*, and also in a ‘key’ B-* with ‘value’ A-*. This is why the 23 matching bonds for r = 1 fail when the threshold is 45, but pass when the threshold is 46 (= 2 × 23). I agree this is slightly unintuitive (I had to think quite hard about this when I was looking back at the source code just now), but I think it is correct.
    • When r = 2, we need to add in the possibility of each bond being cut twice if that option is selected, i.e. so that A-B becomes ‘key’ A-*.B-* with ‘value’ *-*. So in addition to the nCr combinations from above we need to add a further n combinations, one for each cuttable bond. The example has 8 cuttable bonds, which give for 2 cuts 8 + 8!/(6!2!) = 8 + 28 = 36 – which is not the behaviour the node is showing – see below!
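The estimate described in the bullets above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the node's actual source code: the function names `estimate_fragmentations` and `passes_limit`, and the `allow_double_cut` flag, are hypothetical names made up for this example. It also assumes a molecule passes when the estimate is less than or equal to the threshold, which is what the 45 / 46 example implies.

```python
from math import comb


def estimate_fragmentations(n_bonds, n_cuts, allow_double_cut=True):
    """Rough upper estimate of the number of fragmentations.

    n_bonds: number of cuttable bonds (n in the text)
    n_cuts: number of cuts to make (r in the text)
    allow_double_cut: hypothetical flag standing in for the
        'cut each bond twice' option; only relevant when n_cuts == 2
    """
    if n_cuts == 1:
        # Each bond can be cut in 2 directions (key/value swapped)
        return 2 * n_bonds
    estimate = comb(n_bonds, n_cuts)  # nCr
    if n_cuts == 2 and allow_double_cut:
        # One extra fragmentation per bond for the double cut
        estimate += n_bonds
    return estimate


def passes_limit(n_bonds, n_cuts, threshold, allow_double_cut=True):
    # Assumption: the molecule is kept when the estimate does not
    # exceed the threshold
    return estimate_fragmentations(n_bonds, n_cuts, allow_double_cut) <= threshold


print(estimate_fragmentations(23, 1))  # 23 bonds, 1 cut
print(estimate_fragmentations(8, 2))   # 8 bonds, 2 cuts, double cuts allowed
```

With these assumptions, 23 bonds and 1 cut give 46, and 8 bonds with 2 cuts give 36, matching the figures worked through above.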

The caveat is that this approach will sometimes over-estimate molecular complexity when there is symmetry. Take the example SMILES string Fc1ccc(C(F)(F)F)cc1 and the situation where we are only making cuts at bonds to F. There are 4 such bonds (i.e. n = 4), and so this method estimates for 2 cuts that there are 4 + 4!/(2!2!) = 4 + 6 = 10 combinations. However, in reality, three of the C-F bonds are symmetrically identical, and so the actual fragmentations would be one ‘double cut’ to the Ar-F bond, one ‘double cut’ to a C-F bond of the ArC(F)(F)F group (not 3 separate ones, one for each F), one with a cut to Ar-F and a cut to a CF3 C-F bond, and one with 2 cuts to 2 separate CF3 group C-F bonds, i.e. 4 fragmentations. Clearly this is not ideal, but the filter was intended to be a rough filtering step to avoid pathological failures in a workflow.
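To make the size of the over-estimate concrete, here is a small sketch comparing the crude estimate with a symmetry-aware count for the example above. It does not use RDKit or any code from the node: the symmetry class labels are assigned by hand, the function names are hypothetical, and it assumes that two fragmentations are equivalent whenever the cut bonds belong to the same symmetry classes. That assumption holds for this molecule but is a simplification in general.

```python
from itertools import combinations
from math import comb


def crude_two_cut_estimate(n_bonds):
    # n double cuts plus nC2 pairs of different bonds
    return n_bonds + comb(n_bonds, 2)


def distinct_two_cut_fragmentations(bond_classes):
    """Count 2-cut fragmentations up to symmetry.

    bond_classes: one hand-assigned symmetry label per cuttable bond
    """
    # Double cut of one bond: one distinct result per symmetry class
    double_cuts = set(bond_classes)
    # Cuts to two different bonds: distinct unordered pairs of labels
    pair_cuts = {tuple(sorted(pair)) for pair in combinations(bond_classes, 2)}
    return len(double_cuts) + len(pair_cuts)


# Fc1ccc(C(F)(F)F)cc1: one Ar-F bond and three equivalent CF3 C-F bonds
bond_classes = ["Ar-F", "CF3", "CF3", "CF3"]

print(crude_two_cut_estimate(len(bond_classes)))      # crude estimate
print(distinct_two_cut_fragmentations(bond_classes))  # symmetry-aware count
```

Under these assumptions the crude estimate is 10 while the symmetry-aware count is 4, as described above.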

Hopefully the above clarifies.

Steve
