GroupBy

I have problems with GroupBy, see enclosed document.

 

Hi, you want the InChIKey column in the green box in the group tab of the GroupBy node, and all remaining columns in the red box.

In the options tab, choose any column and Count as the aggregation type. If you want to keep all the other columns, add them in the options tab too, choosing an appropriate aggregation type such as First.
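For anyone who finds the node settings easier to follow as code, here is a rough sketch of what that configuration does: one output row per key, a Count, and First for every column you want to keep. It is plain Python, not KNIME code, and the column names (`structure_key`, `name`, `supplier`) and the helper `group_by` are made up for illustration.

```python
def group_by(rows, key_column, keep_columns):
    """Return one row per distinct key, with a count and the first value of each kept column."""
    groups = {}
    for row in rows:
        key = row[key_column]
        if key not in groups:
            # First time this key is seen: start a new group.
            groups[key] = {key_column: key, "count": 0}
            for col in keep_columns:
                groups[key][col] = row[col]  # "First" aggregation
        groups[key]["count"] += 1  # "Count" aggregation
    return list(groups.values())


# Hypothetical input table: the first and third rows share the same key.
rows = [
    {"structure_key": "KEY-A", "name": "sample 1", "supplier": "X"},
    {"structure_key": "KEY-B", "name": "sample 2", "supplier": "Y"},
    {"structure_key": "KEY-A", "name": "sample 3", "supplier": "Z"},
]

result = group_by(rows, "structure_key", ["name", "supplier"])
for row in result:
    print(row)
```

A count greater than 1 marks a key that occurred more than once in the input.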

Personally, for comparing structures to see which are identical, I use canonicalised SMILES rather than InChIKeys. Use the RDKit Canon SMILES node for this task.

 

Hope it helps

 

simon.

Hi Simon

Maybe one should not work at 1:30 in the morning. I should have looked at one of my earlier workflows where I also used SMILES.

I learned how to add the remaining fields. In the past I always used a Joiner node. Is there an advantage to using canonical SMILES? Is it faster?

Thanks a lot for your help.

Alex

 

With normal SMILES, the same structure can be represented by different SMILES strings, depending on which atom the string starts from and the order in which the structure is written out. To get around that, canonicalised SMILES were developed, which means you will always get the same SMILES for the same structure, provided the same toolkit does the canonicalisation. Although this doesn't speed things up, it does mean that identical structures end up in the same group, so no duplicated structures are left after a GroupBy.

 

simon.

Sorry, I did not make myself clear. My question was whether there is an advantage to using canonical SMILES compared to InChIKeys.

Alex

Ah okay, I don't know the answer to that. I've always used SMILES myself; the string is generally shorter than an InChIKey, so I would assume it may have some marginal speed advantage. My experience with InChIKeys is minimal.

Simon.

I would be careful comparing canonical SMILES and InChIKeys, since they are apples and oranges.

A SMILES is a representation of a molecule in the sense that you can recover the molecular structure from it. The same holds true for an InChI. An InChIKey, however, is a hashed representation from which it is not (trivially) possible to recover the original structure. Collisions (i.e., two different structures with the same InChIKey) can occur, although they are rare.

So, if you compare two SD files using InChIKeys (alone, and not only as a pre-filter), you should be aware that you might get the wrong answer for a few compounds.
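To make the "pre-filter" idea concrete, here is a minimal sketch in plain Python. The `toy_key` function is a deliberately tiny hash (one hex character of SHA-256) so that collisions are guaranteed in a small example; it is NOT the InChIKey algorithm, and the `structure-N` strings are placeholders standing in for canonical SMILES or InChI strings. The point is only the pattern: bucket by the hashed key first, then confirm duplicates on the full, recoverable representation.

```python
import hashlib
from collections import defaultdict


def toy_key(structure_string, hex_chars=1):
    """Very short hash, used only to force collisions (not a real InChIKey)."""
    digest = hashlib.sha256(structure_string.encode("utf-8")).hexdigest()
    return digest[:hex_chars]


def find_duplicates(structure_strings):
    """Pre-filter by hashed key, then confirm on the full string."""
    buckets = defaultdict(list)
    for s in structure_strings:
        buckets[toy_key(s)].append(s)  # step 1: cheap pre-filter on the key

    duplicates = []
    for candidates in buckets.values():
        seen = set()
        for s in candidates:
            if s in seen:  # step 2: exact comparison of the full string
                duplicates.append(s)
            seen.add(s)
    return duplicates


# 17 distinct placeholder structures plus one genuine duplicate.
strings = ["structure-%d" % i for i in range(17)] + ["structure-3"]

distinct_strings = len(set(strings))               # 17
distinct_keys = len({toy_key(s) for s in strings})  # at most 16 one-character keys

print("distinct structures:", distinct_strings)
print("distinct toy keys:  ", distinct_keys)
print("confirmed duplicates:", find_duplicates(strings))
```

With only 16 possible one-character keys and 17 distinct strings, at least two different "structures" must share a key, so grouping on the key alone would merge them. The second step reports only the genuine duplicate.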

Good comment! I will use canonical SMILES.

Alex