Indigo Nodes Request

Hi, please can I request Indigo nodes for the following:

- Enumerator/Combichem node which takes one or more scaffolds with defined Rx positions (R1, R2, R3, R4) as input on one port. Either the scaffolds output from port 1 of the decomposer node or the scaffolds from port 0 would be good templates for this. The second port takes a list of groups with attachment points spread across multiple columns; the R group columns that the decomposer node currently outputs would be ideal. Within the Enumerator node you then choose which R group column to match to R1 on the scaffold, which to match to R2, and so on.

- Transformation node where you can choose where the reaction is to take place, giving an output of the molecules with attachment points. I hope it can be used to define not only attachment points but also Rx groups for a scaffold, for example taking toluene and specifying the introduction of an R1 group at the position ortho to the methyl.

- Improvement to the "Molecular Properties" node to include Polar Surface Area (PSA), Hydrogen Bond Donors (HBD) and Hydrogen Bond Acceptors (HBA). I appreciate these can be worked out with looping and the like, but we really need a user-friendly way of doing this simply from an interface. Also QSAR properties such as SLogP, Lipinski Rule of 5 and Molecular Volume.

- Alignment node which takes a dataset of molecules in one port and a set of scaffold(s) in the other port, and simply aligns the molecules from the dataset to the same orientation as those drawn in the scaffold set. This is useful as chemists often want structures drawn in the same orientation to aid visual interpretation of molecules. For example, if the scaffold is an indole drawn with the nitrogen at the bottom right, the 6-membered ring on the left and the 5-membered ring on the right, then all the molecules in the output dataset will be redrawn in the same way.

- Stereo Enumerator node which will take a molecule which has chiral centres present and will enumerate the molecule into all the possible enantiomers/diastereomers. Also if double bonds are present, E and Z enumeration is undertaken.

- Isotope Calculator node which will take a molecule and calculate the common isotopic masses of the molecule with relative abundance, i.e. HCl would be 36 (75%) and 38 (25%). This would be really nice for the medicinal chemists in calculating masses from an LCMS, something missing from KNIME at the moment.

- Scaffold Detector node which does not calculate just the Maximum Common Scaffold but calculates commonly exemplified scaffolds within the dataset.

- PAINS Detector node. Based on the excellent work of others on this site in implementing a workflow around PAINS detection, can this be simplified into one node that provides a count per molecule of the problematic groups found?

- Tautomer Standardiser node which will take a set of structures and make sure they are all drawn in the same tautomer, i.e. 2-pyridinone and 2-pyridinol are drawn in the same way. This can be very useful to have them uniform in substructure searching and matched pair detection. Currently I see no tool in KNIME capable of this.

- Chirality Finder node which returns only molecules with chiral centres present. If possible, it would be good for the results to be ordered so that enantiomeric and diastereomeric pairs are listed next to each other in the table.

- Substructure Dictionary Search, like the RDKit node, which will search a dataset of molecules using multiple substructure queries, with an option to return only the results which match at least x of the substructures.

- Indigo Molecule to IUPAC Name Calculator node which will take structures and generate an IUPAC name from them.

- IUPAC Name To Structure node which will take IUPAC names and convert them to Indigo Molecules.
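
To make the Isotope Calculator request above concrete, here is a minimal sketch in plain Python of the kind of calculation such a node might perform. It is not Indigo or KNIME code: the function name `isotope_pattern`, the `ISOTOPES` table and the `min_abundance` cutoff are assumptions made for illustration, the abundances are approximate natural values, and only nominal (integer) masses are used.

```python
from collections import defaultdict

# Approximate natural isotopic abundances, keyed by nominal mass number.
# Only a few elements are listed; a real node would need a full table.
ISOTOPES = {
    "H": {1: 0.999885, 2: 0.000115},
    "C": {12: 0.9893, 13: 0.0107},
    "Cl": {35: 0.7576, 37: 0.2424},
}


def isotope_pattern(formula, min_abundance=0.001):
    """Return sorted (nominal mass, fractional abundance) pairs.

    `formula` maps element symbols to atom counts, e.g. {"H": 1, "Cl": 1}.
    Peaks below `min_abundance` are dropped from the result.
    """
    pattern = {0: 1.0}  # start with an empty molecule of mass 0
    for element, count in formula.items():
        for _ in range(count):
            # Combine the current pattern with one more atom's isotopes.
            combined = defaultdict(float)
            for mass, prob in pattern.items():
                for iso_mass, iso_prob in ISOTOPES[element].items():
                    combined[mass + iso_mass] += prob * iso_prob
            pattern = dict(combined)
    return sorted((m, p) for m, p in pattern.items() if p >= min_abundance)


hcl = isotope_pattern({"H": 1, "Cl": 1})
for mass, abundance in hcl:
    print(f"{mass}: {abundance:.1%}")
```

For HCl this gives peaks at 36 and 38 in roughly the 75:25 ratio mentioned above; the very small deuterium-containing peaks fall below the cutoff.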

These are just a selection of nodes I would love to see in KNIME based around chemistry. The current implementation of Indigo nodes has been terrific and I would love to see them expanded to include some of the above suggestions. Others feel free to add to this list or suggest favourites etc.

Simon.

Hello Simon,

Thank you for this great, detailed list. We are already working on some nodes (such as the alignment node), and all the new nodes will be included in our plans. We will release a new version of the Indigo nodes soon. I think it will appear after 22 August because I will be on vacation, but maybe we will release it earlier.

With best regards,
Mikhail

Thanks Mikhail for adding these nodes to your list, I can't wait to see the new Indigo nodes. There was another future node I forgot to add to the list as well:

- Remove Duplicates node. This will take a list of molecules and remove any duplicates, retaining only the first of the duplicate molecules along with its other column properties. An option to remove isomers would be great too. This is always a pain in medicinal chemistry projects where you have isomers, as usually one isomer is active and one is inactive. It would be great to sort the list of molecules beforehand in order of activity using the KNIME Sorter node, and then use this "Remove Duplicates" node to remove the remaining isomers, such that only the active isomer is retained.
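
The sort-then-deduplicate idea above can be sketched in plain Python as follows. This is only an illustration of the logic, not how any existing node works: `remove_duplicates` is a hypothetical helper, the rows are made-up example data, and the `flat_smiles` column is assumed to hold a stereo-free canonical structure string computed upstream by a chemistry toolkit.

```python
def remove_duplicates(rows, key):
    """Keep only the first row seen for each key, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# Made-up example rows; "flat_smiles" is assumed to ignore stereochemistry,
# so both isomers of cpd-1 share the same key.
rows = [
    {"name": "cpd-1 (R)", "flat_smiles": "CC(N)C(=O)O", "ic50_nM": 850.0},
    {"name": "cpd-1 (S)", "flat_smiles": "CC(N)C(=O)O", "ic50_nM": 12.0},
    {"name": "cpd-2", "flat_smiles": "c1ccccc1O", "ic50_nM": 300.0},
]

# Sort most active first (lowest IC50), as the Sorter node would do,
# then drop later rows that share a structure key.
by_activity = sorted(rows, key=lambda r: r["ic50_nM"])
unique = remove_duplicates(by_activity, key=lambda r: r["flat_smiles"])

for row in unique:
    print(row["name"], row["ic50_nM"])
```

Because the list is sorted before deduplication, the row kept for each structure is the most active isomer, and its other column values travel with it.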

Simon.

Well, one feature that I dearly miss from the early versions of the CDK toolkit, and that can't be found anywhere else, is the ability to enumerate structures from a molecular formula.

A large space of potential molecules is as yet unsampled, and a structure generator like the GENMDeterministicGenerator would be fantastic. When O and N are present in the molecular formula, the number of potential structures for even simple formulae can be ginormous. Thus a structure generator needs to use a 'Goodlist' of structures that should be present and a 'Badlist' of structures that should not be generated. This reduces the output to a manageable number. For example, a bad structure could be a 3-membered ring fused to larger rings, or a reactive fragment.

A related class in CDK that's very useful is the VicinitySampler. It starts with a structure and samples nearby structural space. One would think that features like these are what would set Indigo apart from other toolkits. If the underlying Indigo code is based on graph libraries, then it should be feasible, although challenging, to develop.