Chemistry Q: Counting the Number of Fused Rings and the Number of Large Rings

It's probably not a completely KNIME-specific question, but it's one I want to solve in KNIME. It is rather specific to chemistry.

Is it possible to do either of these two tasks using SMARTS, or another format, within KNIME?

- Counting the number of fused rings within a molecule (aliphatic or aromatic)

- Counting the number of rings with 8 or more members

 

I see no easy way of doing this with SMARTS, and none of the molecule descriptors from RDKit, Indigo, CDK, or MOE (unless I have missed them) seem to have this capability either.

 

Simon.

Hi Simon,

Finding the number of fused rings is never trivial. I don't know if there is a better solution, but I use an SVL snippet in MOE with the code below, which outputs an "int" value:

app Unbond sm_MatchAll['*!@*',Atoms []];
result=length uniq aMoleculeNumber sm_MatchAll['[*rx3]',Atoms[]];

 

Basically, the first line breaks all the non-ring bonds, and the second line counts how many of the resulting fragments contain at least one ring atom with 3 "ring connections" (ring bonds).

 

It seems to work but, honestly, I haven't tested this rigorously, so I welcome any feedback.

 

Giovanni
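Giovanni's first snippet can be mirrored outside MOE. The sketch below is a minimal, dependency-free Python version, assuming you already have the molecule's ring bonds as atom-index pairs (with RDKit, for instance, `[(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds() if b.IsInRing()]`). Keeping only those bonds plays the role of the `Unbond` step; each connected fragment is then one ring system, and it is counted if some atom has exactly three ring bonds, as `[*rx3]` requires. Note that this gives the number of fused ring systems, not the number of rings inside them. The function name and the hand-numbered toy molecules are illustrative.

```python
from collections import defaultdict


def count_fused_ring_systems(ring_bonds):
    """Count ring systems containing an atom with exactly three ring bonds.

    ring_bonds: iterable of (atom_i, atom_j) index pairs for ring bonds only,
    which plays the role of breaking every non-ring bond first.
    """
    adj = defaultdict(set)
    for i, j in ring_bonds:
        adj[i].add(j)
        adj[j].add(i)

    seen = set()
    fused_systems = 0
    for start in list(adj):
        if start in seen:
            continue
        # Walk one connected fragment (= one ring system) iteratively.
        stack, fragment = [start], []
        seen.add(start)
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for neighbour in adj[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Same test as the SMARTS [*rx3]: ring connectivity of exactly 3.
        if any(len(adj[atom]) == 3 for atom in fragment):
            fused_systems += 1
    return fused_systems


# Toy inputs: ring bonds only, atoms numbered by hand.
BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
               (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]
# Naphthalene plus a separate phenyl ring (atoms 10-15); the non-ring bond
# linking them is simply absent from the ring-bond list.
PHENYLNAPHTHALENE = NAPHTHALENE + [(i + 10, j + 10) for i, j in BENZENE]

for name, bonds in [("benzene", BENZENE), ("naphthalene", NAPHTHALENE),
                    ("phenylnaphthalene", PHENYLNAPHTHALENE)]:
    print(name, count_fused_ring_systems(bonds))
```

A spiro atom has four ring bonds, so a purely spiro-linked system is not counted, which matches the `rx3` test.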

 

If you are up for a little experiment, perhaps the PaDEL nodes can help you with this.

You can find it here: http://padel.nus.edu.sg/software/padeldescriptor/

Note that the PaDEL nodes internally use CDK 1.4, so you will most likely need to convert your compounds to something general (e.g. SMILES) and then to the CDK 1.4 representation, using the converter from the PaDEL repository.

There are 1875 descriptors in PaDEL, including n8Ring and nFRing for your 8-membered rings and fused rings, respectively.

In my experience, PaDEL can be a bit fiddly to get working correctly (mainly due to its duplicate copy of an older CDK library), but it produces some very valuable results.

 

Thanks Giovanni,

That's a useful descriptor in itself, and it works nicely, but it's not quite what I was after.

By "Number of Fused Rings", I meant "Number of Rings Fused Together", if that's any clearer!

So, for example:

Toluene, Thiophene, Chlorobenzene, Piperidine would give the answer 1

Naphthalene, Indole, Indane, Chroman would give the answer 2

A ring system with a benzene ring fused to a 5-membered ring, which is itself fused to another benzene ring, would give the answer 3.

I suspect this is not easy to script?

Simon.
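Under that definition, the quantity is the number of rings in the largest fused ring system. A minimal sketch, assuming the rings are available as atom-index tuples (RDKit's `mol.GetRingInfo().AtomRings()` returns the SSSR in this form) and assuming that two rings count as fused when they share at least two atoms, i.e. a bond; spiro rings, which share only one atom, are kept in separate systems. Rings are merged into systems with a small union-find. The function names and the hand-numbered toy molecules are illustrative.

```python
def ring_system_sizes(rings):
    """Group rings into fused systems; return rings per system, largest first.

    rings: list of atom-index tuples, one per ring (for example an SSSR).
    Assumption: rings sharing two or more atoms are treated as fused.
    """
    parent = list(range(len(rings)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    atom_sets = [set(ring) for ring in rings]
    for a in range(len(rings)):
        for b in range(a + 1, len(rings)):
            if len(atom_sets[a] & atom_sets[b]) >= 2:
                parent[find(a)] = find(b)

    sizes = {}
    for idx in range(len(rings)):
        root = find(idx)
        sizes[root] = sizes.get(root, 0) + 1
    return sorted(sizes.values(), reverse=True)


def max_fused_ring_count(rings):
    """Rings in the largest fused system (0 for an acyclic molecule)."""
    sizes = ring_system_sizes(rings)
    return sizes[0] if sizes else 0


TOLUENE = [(0, 1, 2, 3, 4, 5)]  # the methyl carbon is in no ring
NAPHTHALENE = [(0, 1, 2, 3, 4, 9), (4, 5, 6, 7, 8, 9)]
# Fluorene: benzene, five-membered ring, benzene (Simon's answer-3 example).
FLUORENE = [(0, 1, 2, 3, 4, 5), (5, 4, 8, 7, 6), (7, 8, 9, 10, 11, 12)]
SPIRO = [(0, 1, 2, 3, 4), (4, 5, 6, 7, 8)]  # spiro[4.4]nonane

for name, rings in [("toluene", TOLUENE), ("naphthalene", NAPHTHALENE),
                    ("fluorene", FLUORENE), ("spiro", SPIRO)]:
    print(name, max_fused_ring_count(rings), ring_system_sizes(rings))
```

Whether spiro junctions should count as "fused together" is a chemistry decision; changing the `>= 2` threshold to `>= 1` would merge them.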

Hi Ellert,

From reading about these PaDEL descriptors, they sound like exactly what I need. I will experiment with them on my home machine, but unfortunately the PaDEL nodes are not part of our corporate build, because they are not distributed in KNIME's Community section.

Could KNIME or the PaDEL developers include the PaDEL nodes in the Community release, as has been done with the CDK, Indigo, RDKit, etc. nodes? That would help make them part of future standard KNIME releases.

Simon.

How about a SMARTS like:

*~@*(~@*)~@*(~@*)~@*

I think this should give you the number of bonds of any type shared between two rings. You would then need to add 1 to the substructure match count to get your answer.

Steve

Hi Steve,

Thanks for the attempt. That SMARTS means nothing to me, but I have given it a try. However, I don't think it quite works.

For monocyclics it gives 0 and for bicyclics 1, and tricyclics with a central 6-membered ring give 2, but for tricyclics in which the central ring is 5-membered, it gives 3!

I am guessing that the bond in the central 5-membered ring that connects the two outer rings is counted as well.

Simon.
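Simon's diagnosis can be checked on a hand-numbered fluorene (benzene, five-membered ring, benzene). The sketch assumes rings are given as ordered atom tuples, each atom bonded to the next and the last to the first. `smarts_like_count` is only a rough stand-in for Steve's SMARTS: it counts ring bonds whose two atoms each carry at least three ring bonds, which is what the pattern's central bond requires. `shared_bond_count` counts bonds lying in two rings, which is what Steve was after. Adding 1 to that gives the ring count for ortho-fused chains such as anthracene or fluorene, but not for peri-fused systems (pyrene's four rings share five bonds). Function names and toy data are illustrative.

```python
from collections import Counter, defaultdict


def ring_bond_sets(rings):
    """Bonds of each ring as sorted atom pairs; rings are ordered atom tuples."""
    return [
        {tuple(sorted((ring[k], ring[(k + 1) % len(ring)]))) for k in range(len(ring))}
        for ring in rings
    ]


def smarts_like_count(rings):
    """Rough stand-in for *~@*(~@*)~@*(~@*)~@*: ring bonds whose two atoms
    each have at least three ring bonds."""
    all_bonds = set().union(*ring_bond_sets(rings))
    ring_degree = defaultdict(int)
    for i, j in all_bonds:
        ring_degree[i] += 1
        ring_degree[j] += 1
    return sum(1 for i, j in all_bonds
               if ring_degree[i] >= 3 and ring_degree[j] >= 3)


def shared_bond_count(rings):
    """Bonds lying in two or more of the given rings (the fusion bonds)."""
    counts = Counter(bond for bonds in ring_bond_sets(rings) for bond in bonds)
    return sum(1 for n in counts.values() if n >= 2)


# Hand-numbered SSSR rings; each atom is bonded to the next in its tuple.
NAPHTHALENE = [(0, 1, 2, 3, 4, 9), (4, 5, 6, 7, 8, 9)]
# Fluorene: benzene (0-5), five-membered ring (4, 5, 6, 7, 8), benzene (7-12).
FLUORENE = [(0, 1, 2, 3, 4, 5), (5, 4, 8, 7, 6), (7, 8, 9, 10, 11, 12)]

for name, rings in [("naphthalene", NAPHTHALENE), ("fluorene", FLUORENE)]:
    print(name, smarts_like_count(rings), shared_bond_count(rings) + 1)
```

On fluorene the SMARTS-like count picks up bond (4, 8) in the five-membered ring as well as the two fusion bonds, while the shared-bond count stays at 2.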

Hi Simon,

If you have access to the JChem nodes, there are Chemical Terms functions such as

fusedRingCount
and
largestRingSize

available. I don't think it will be possible to calculate these descriptors (easily) with SMARTS, since SMARTS patterns essentially describe substructures: you would have to formulate (more or less) every relevant case as a separate substructure. E.g., you could create a SMARTS for 8-rings, 9-rings, 10-rings, etc.

Nils
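Once the rings are available as a list (for example from RDKit's `mol.GetRingInfo().AtomRings()`, which gives the SSSR as atom-index tuples), the large-ring part of Simon's question no longer needs one SMARTS per ring size. A minimal sketch; the function names are hypothetical. One caveat: the SSSR contains only the smallest rings, so the 10-atom perimeter of naphthalene is not reported as a ring, which is usually the desired behaviour here.

```python
def count_large_rings(rings, min_size=8):
    """Number of rings with at least `min_size` atoms."""
    return sum(1 for ring in rings if len(ring) >= min_size)


def largest_ring_size(rings):
    """Atom count of the largest ring, or 0 for an acyclic molecule."""
    return max((len(ring) for ring in rings), default=0)


# Toy ring lists (atom indices numbered by hand).
NAPHTHALENE = [(0, 1, 2, 3, 4, 9), (4, 5, 6, 7, 8, 9)]
CYCLOOCTANE = [tuple(range(8))]
# Hypothetical molecule with separate 6-, 8- and 12-membered rings.
MIXED = [tuple(range(6)), tuple(range(6, 14)), tuple(range(14, 26))]

for name, rings in [("naphthalene", NAPHTHALENE), ("cyclooctane", CYCLOOCTANE),
                    ("mixed", MIXED)]:
    print(name, count_large_rings(rings), largest_ring_size(rings))
```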

Hi Simon,

Unfortunately, the PaDEL nodes are not maintained actively enough to have a fixed place in the repositories, partly also because they depend on an older version of CDK.
If they renamed that dependency, it would certainly resolve some issues. For example, at the moment they have a MoleculetoCDK node to convert compounds to the representation used within PaDEL, which is CDK 1.4. This node has the same name as the standard MoleculetoCDK node, which converts to CDK 1.5. A lot of problems follow from this, such as the node being replaced by the other version after a workflow is saved and reloaded. Ugh.

It was suggested to the PaDEL developers that they rename MoleculetoCDK to MoleculetoPaDEL, but they have not been very forthcoming in doing so.
So the plugin remains a bit obscure and outside the standard repositories.

If your corporate build allows only official nodes, then at the moment you are somewhat out of luck with PaDEL, I am afraid.

As weskamp suggested, there might indeed be JChem functionality such as ringCountOfSize. You might cross-check its output against the PaDEL results on your non-corporate system as a sanity check; that would actually be very interesting, and I would love to get feedback on it!

 

Cheers,
Ellert.

Hi Simon,

If I understand correctly what you are after, the SVL script below could help: it returns a string with the concatenated sizes (numbers of rings) of the ring systems in the molecule, i.e. 1 for monocyclic systems, 2 for bicyclic ones, 3 for tricyclic ones, and so on.

app Unbond sm_MatchAll['*!@*',Atoms []];
foo=app uniq aMoleculeNumber SmallestRings Atoms[];
result=swrite['{}', freq[uniq foo,foo]];

 

Giovanni
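The per-system ring counts from Giovanni's second script can also be obtained without enumerating rings. After the non-ring bonds are dropped, each connected fragment is one ring system, and its number of smallest rings equals its cyclomatic number, bonds - atoms + 1. A minimal Python sketch, assuming the input is the list of ring bonds only (e.g. collected with RDKit's `Bond.IsInRing()`); the function name and hand-numbered molecules are illustrative.

```python
from collections import defaultdict


def rings_per_system(ring_bonds):
    """Number of rings in each ring system, largest first.

    ring_bonds: (atom_i, atom_j) pairs for ring bonds only. Each connected
    fragment of this graph is one ring system, and its ring count is the
    cyclomatic number E - V + 1.
    """
    adj = defaultdict(set)
    for i, j in ring_bonds:
        adj[i].add(j)
        adj[j].add(i)

    seen, counts = set(), []
    for start in list(adj):
        if start in seen:
            continue
        stack, n_atoms, degree_sum = [start], 0, 0
        seen.add(start)
        while stack:
            atom = stack.pop()
            n_atoms += 1
            degree_sum += len(adj[atom])
            for neighbour in adj[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        n_bonds = degree_sum // 2  # every bond is seen from both ends
        counts.append(n_bonds - n_atoms + 1)
    return sorted(counts, reverse=True)


BENZENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
FLUORENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),  # benzene A
            (4, 8), (8, 7), (7, 6), (6, 5),                  # five-membered ring
            (8, 9), (9, 10), (10, 11), (11, 12), (12, 7)]    # benzene B
# Fluorene plus a separate phenyl ring (atoms 13-18).
WITH_PHENYL = FLUORENE + [(i + 13, j + 13) for i, j in BENZENE]

print(rings_per_system(BENZENE), rings_per_system(WITH_PHENYL))
```

Using sets for the adjacency means a bond listed twice (for example once per ring) is only counted once.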

Hi,

Thanks for all the tips. Unfortunately, I don't have access to the JChem nodes, so those are out.

I have had a look at the PaDEL nodes at home, but I may be missing something here. The PaDEL descriptor node only lists around 50 descriptors, as opposed to the 1875 mentioned, and neither the number of rings with 8 or more members nor the number of fused rings is among those 50. It also seems quite slow to calculate the available descriptors, even for a handful of simple structures.

EDIT: I've got it now: each of these 50 descriptors returns lots of sub-descriptors, so the "NumberofRings" option is what I need here to get the desired descriptors. Getting PaDEL to a point where it can be permanently integrated into the community nodes would be a great addition. Some minor improvements would help, such as the ability to retain the structure column and any existing columns after the descriptor node, and an easy way to untick all the descriptors in the node instead of having to untick them one by one. Also, because there are SO many descriptors, the column names don't adequately explain what each descriptor calculates. A link in the node description to a webpage listing what they all do, or incorporating these descriptions into the node itself (on another tab or similar), would be useful.

Thankfully, I have found a solution outside of KNIME for now, using internal techniques. But I think these would be useful descriptors to have in some of the community chemistry implementations in the future.

Thanks for all the suggestions!

Simon.

 

Descriptions of all PaDEL descriptors are available on the various tabs of this Excel sheet.

And you are right, the interface is not optimal :-)

I am very late to this discussion; I found it while googling for a solution to the same question. Here is a suggested RDKit solution that counts the number of atoms contained in exactly two rings and divides by two. Benzene returns zero, naphthalene one, and anthracene two. I also have not rigorously tested this and would appreciate comments. Thanks,

Konrad

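#!/usr/bin/env python

from rdkit import Chem


def FusedRingCount(mol):
    """Count atoms in exactly two SSSR rings ([*R2]) and halve the result."""
    RingFusedAtom = Chem.MolFromSmarts('[*R2]')
    matches = mol.GetSubstructMatches(RingFusedAtom)
    # Integer division keeps the result an int under Python 3
    return len(matches) // 2


if __name__ == '__main__':
    for smiles in ('c1ccccc1', 'c1ccc2ccccc2c1', 'c1ccc2cc3ccccc3cc2c1'):
        print(smiles, FusedRingCount(Chem.MolFromSmiles(smiles)))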
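Konrad's halving rule can be tried without RDKit by counting, in a list of SSSR rings (given here as hand-numbered atom tuples), the atoms that occur in exactly two rings, which is what `[*R2]` matches. The sketch below is an illustration, not his code, and the function name is hypothetical. It agrees on ortho-fused systems such as naphthalene and anthracene, but on a peri-fused system such as phenalene, whose three rings all meet at one central atom, only three atoms lie in exactly two rings (the central atom lies in three), so the rule reports a single fusion.

```python
from collections import Counter


def r2_halving_count(rings):
    """Atoms that lie in exactly two of the given rings, integer-halved.

    Mimics counting matches of the SMARTS [*R2] and dividing by two.
    """
    membership = Counter(atom for ring in rings for atom in set(ring))
    return sum(1 for n in membership.values() if n == 2) // 2


NAPHTHALENE = [(0, 1, 2, 3, 4, 9), (4, 5, 6, 7, 8, 9)]
ANTHRACENE = [(0, 1, 2, 3, 4, 5), (4, 6, 7, 8, 9, 5), (7, 10, 11, 12, 13, 8)]
# Phenalene: central atom 0 is shared by all three rings.
PHENALENE = [(0, 1, 4, 5, 6, 2), (0, 2, 7, 8, 9, 3), (0, 3, 10, 11, 12, 1)]

for name, rings in [("naphthalene", NAPHTHALENE), ("anthracene", ANTHRACENE),
                    ("phenalene", PHENALENE)]:
    print(name, r2_halving_count(rings), "from", len(rings), "rings")
```

For Simon's "rings fused together" number, adding 1 to the result works for naphthalene (2) and anthracene (3), but gives 2 for phenalene, which has 3 rings.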