GroupBy Node: case-sensitive grouping

Dear all,

I am currently working on a small cheminformatics workflow that clusters a range of chemical substances. I calculate Murcko scaffolds (RDKit) and then use the SMILES of these generated scaffolds to cluster my list of compounds with GroupBy.

Doing this, I came across a problem: apparently, GroupBy does not differentiate between

  • c1ccccc1 and
  • C1CCCCC1

In chemical terms, these structures are very distinct: the first one is benzene (an aromatic ring), the second one is cyclohexane.
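To illustrate why this matters, here is a minimal Python sketch (not KNIME code; the `group_by_key` helper is hypothetical). It contrasts grouping on the exact SMILES string with grouping on a lowercased key, which is effectively what a case-insensitive GroupBy would do:

```python
from collections import defaultdict

def group_by_key(values, key):
    """Group values into lists keyed by key(value); first-seen key order is kept."""
    groups = defaultdict(list)
    for v in values:
        groups[key(v)].append(v)
    return dict(groups)

scaffolds = ["c1ccccc1", "C1CCCCC1", "c1ccccc1"]

# Case-sensitive: exact string comparison keeps benzene and cyclohexane apart.
case_sensitive = group_by_key(scaffolds, key=lambda s: s)
# Case-insensitive: lowercasing the key merges the two distinct scaffolds.
case_insensitive = group_by_key(scaffolds, key=str.lower)

print(len(case_sensitive))    # 2
print(len(case_insensitive))  # 1
```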

Has anyone else come across this, and is there a way to get case-sensitive grouping in the GroupBy node?

Any help is appreciated,

Kind regards

Joachim

Hi Joachim,
you are right, case-sensitive grouping would be a useful feature to have. I don't think it is currently possible, but I have a workaround you may be able to use. Using a String Manipulation node, I prepend a § to each capital letter (I assume that this character will never occur in a SMILES string). Then I do the GroupBy, and after that I remove the character again using another String Manipulation node. I hope the attached workflow helps a bit!
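The escape/group/unescape idea can be sketched in Python as follows. This is an illustration of the technique, not the attached KNIME workflow. `escape_capitals`, `unescape` and `group_case_insensitive` are hypothetical names. The last one is a stand-in for a case-insensitive grouping step, and the marker is assumed never to appear in the SMILES input:

```python
import re
from collections import defaultdict

MARKER = "§"  # assumption: this character never occurs in a SMILES string

def escape_capitals(smiles: str) -> str:
    # Prepend the marker to every uppercase ASCII letter, so that lowercasing
    # can no longer make "C1CCCCC1" equal to "c1ccccc1".
    return re.sub(r"([A-Z])", MARKER + r"\1", smiles)

def unescape(smiles: str) -> str:
    # Remove every marker again once grouping is done.
    return smiles.replace(MARKER, "")

def group_case_insensitive(values):
    # Hypothetical stand-in for a grouping step that compares keys lowercased.
    groups = defaultdict(list)
    for v in values:
        groups[v.lower()].append(v)
    return groups

scaffolds = ["c1ccccc1", "C1CCCCC1"]
escaped = [escape_capitals(s) for s in scaffolds]
grouped = group_case_insensitive(escaped)
result = [[unescape(v) for v in members] for members in grouped.values()]
print(result)  # [['c1ccccc1'], ['C1CCCCC1']]
```

The marker only has to make upper- and lowercase spellings differ after case folding. Any character that cannot occur in the input would work equally well.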
Kind regards
Alexander

GroupBy SMILES.knwf (9.6 KB)


Hi Alexander,
your workaround does the job nicely. Thank you for your quick reply.

Are there plans to add case sensitivity to the GroupBy node? From my perspective, this would be a very useful feature.

Kind regards

Joachim

Hi,
I thought about it again, found it really strange that GroupBy would be case-insensitive, and so I tried it myself again. For me, GroupBy creates two groups for the example strings you gave above, so the workaround should not even be necessary. Can you share your workflow? Now I am curious how it ends up case-insensitive for you.
Kind regards
Alexander


Hi Alexander,
I can confirm your result. Strangely enough, when I tried this morning (before posting in the forum), I could not separate the two cases and needed your workaround. Now, while creating a shareable workflow for you, I checked again, and GroupBy nicely differentiates between both cases even without your workaround.
Is KNIME known to suffer from the Monday morning blues? ;-)

Thank you for your help and support. I think we can close this case.

Joachim

