We have updated the nightly and 4.7 stable builds to v1.36.3. This fixes a minor bug in the MMP fragmentation nodes, brought to our attention via email.
The bug was that the MMP Molecule Fragment node (MMP Molecule Fragment (RDKit) – KNIME Community Hub) mis-estimated the number of possible fragmentations for the `Limit by Complexity` / `Maximum Fragmentations` options when the number of cuts being made was not 1.
As this bug was discovered while answering a question about what the setting is meant to do, here is the explanation given to the user by email, in case anyone else finds it useful:
I added this feature to prevent unexpectedly complex molecules (i.e. molecules which, despite passing a prefilter by heavy atom count (HAC), could still be fragmented in a large number of ways) from bringing the fragmentation of a set of compounds to a complete halt while the node waits a long time for one or two ‘pathological’ molecules to complete. I implemented this with a crude approach, as follows:
- Identify all the bonds which it is possible to cut for the specified number of cuts (you can see which bonds those are using the ‘MMP Show Cuttable Bonds’ node)
- As each fragmentation is a combination of those bonds, we can use the combination formula `nCr = n! / ((n-r)!r!)`, where `n` is the number of cuttable bonds and `r` is the number of cuts to make. There are 2 additional things to consider:
  - When `r` = 1, the number needs to reflect that each bond can be cut in 2 directions, e.g. `A-B` can result in a ‘key’ `B-*`, and also a ‘key’ `A-*`. This is why the 23 matching bonds for `r` = 1 fail when the threshold is 45, but pass when the threshold is 46 (= 2 × 23). I agree this is slightly unintuitive (I had to think quite hard about this when I was looking back at the source code just now), but I think it is correct.
  - When `r` = 2, we need to add in the possibility of each bond being cut twice if that option is selected (i.e. giving `*-*`), so in addition to the `nCr` combinations from above we need to add a further `n` combinations, one per cuttable bond. The example has 8 cuttable bonds, which for 2 cuts gives 8 + 8!/(6!2!) = 8 + 28 = 36, which is not the behaviour the node is showing (see below)!
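The estimate described in the steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the node's actual source: the function names `estimate_fragmentations` and `passes_limit`, the `allow_double_cut` flag, and the `<=` threshold comparison are assumptions chosen to match the numbers quoted above, and the two special cases are applied only for `r` = 1 and `r` = 2 as described.

```python
from math import comb


def estimate_fragmentations(n_bonds: int, n_cuts: int,
                            allow_double_cut: bool = False) -> int:
    """Rough upper estimate of the number of fragmentations (hypothetical helper).

    n_bonds: number of cuttable bonds (n)
    n_cuts:  number of cuts to make (r)
    """
    if n_cuts == 1:
        # Each bond can be cut in 2 directions (A-* and B-* keys).
        return 2 * n_bonds
    estimate = comb(n_bonds, n_cuts)  # nCr = n! / ((n-r)! r!)
    if n_cuts == 2 and allow_double_cut:
        # One extra fragmentation per bond for cutting the same bond twice.
        estimate += n_bonds
    return estimate


def passes_limit(estimate: int, max_fragmentations: int) -> bool:
    # Assumption: a molecule is kept when the estimate does not exceed the limit.
    return estimate <= max_fragmentations


print(estimate_fragmentations(23, 1))                         # 2 * 23
print(estimate_fragmentations(8, 2, allow_double_cut=True))   # 8 + 8C2
print(passes_limit(estimate_fragmentations(23, 1), 45))
print(passes_limit(estimate_fragmentations(23, 1), 46))
```

Under these assumptions, 23 cuttable bonds with a single cut give an estimate of 46, which is rejected at a threshold of 45 and accepted at 46, and 8 cuttable bonds with 2 cuts (double cuts allowed) give 36.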
The caveat is that this approach will sometimes over-estimate molecular complexity when there is symmetry. Suppose an example SMILES string `Fc1ccc(C(F)(F)F)cc1` and the situation where we are only making cuts at bonds to `F`. There are 4 such bonds (i.e. `n` = 4), and so this method estimates for 2 cuts that there are `4 + 4!/(2!2!) = 4 + 6 = 10` combinations. However, in reality, three of the C-F bonds are symmetrically identical and so the actual fragmentations would be one ‘double cut’ to the `Ar-F` bond, one ‘double cut’ to the `ArC(F)(F)F` (not 3 cuts to each F separately), one with a cut to `Ar-F` and a cut to a `C-F` bond, and one with 2 cuts to 2 separate `C-F` bonds, i.e. 4 fragmentations. Clearly this is not ideal, but the filter was intended to be a rough filtering step to avoid pathological failures in a workflow.
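To make the over-count concrete, here is a small Python sketch of the `Fc1ccc(C(F)(F)F)cc1` example. It does not perceive symmetry from the structure: the symmetry classes in `bond_classes` are assigned by hand (one `Ar-F` bond and three equivalent `C-F` bonds), and the variable names are purely illustrative.

```python
from itertools import combinations
from math import comb

# Hand-assigned symmetry class for each of the 4 cuttable bonds to F.
bond_classes = ["Ar-F", "C-F", "C-F", "C-F"]
n = len(bond_classes)

# Naive estimate for 2 cuts: one double cut per bond plus nC2 pairs of bonds.
naive_estimate = n + comb(n, 2)

# Symmetry-aware count: fragmentations that differ only by swapping
# equivalent bonds collapse into a single entry in the set.
distinct = set()
for cls in bond_classes:
    distinct.add(("double", cls))                     # same bond cut twice
for a, b in combinations(bond_classes, 2):
    distinct.add(("pair", tuple(sorted((a, b)))))     # two different bonds

print(naive_estimate)   # 4 + 6
print(len(distinct))    # symmetry-distinct fragmentations
```

With these hand-assigned classes, the naive estimate is 10 while only 4 symmetry-distinct fragmentations remain, matching the counts in the paragraph above.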
Hopefully the above clarifies.