Speedy SMILES Desalt output format


I’m a bit confused by the Speedy SMILES Desalt output format: the column it produces is of a type that the RDKit Functional Group Filter node does not recognize as SMILES input. I feed the Speedy SMILES Desalt node with a SMILES column (type: SMI) that was generated using the Molecule Type Cast node.

Please advise!


This is in KNIME 3.5.3, with Vernalis KNIME Nodes 1.12.8.v201803161143 and RDKit KNIME integration 3.3.1.v201804190838.


Hi Evert,

My guess is that you don’t have the ‘Keep only first unique component’ option selected. In that case, a collection cell is returned containing all distinct components that share the largest Heavy Atom Count (HAC). If that is the behaviour that you want, then follow the node with an Ungroup node.

Otherwise, if you check the ‘Keep only first unique component’ option, you will get a normal column of SMILES, according to the following rules:

  • If there is a unique component with the largest HAC, then this will be returned - unambiguously!
  • If several components tie on HAC and you don’t select the ‘Keep the longest SMILES string’ tiebreak option, then the SMILES returned will be the ‘first’. NB this is an arbitrary first: each component is put into a Java HashSet and the first member is used, so the order is essentially random.
  • If several components tie on HAC and you do select the ‘Keep the longest SMILES string’ option, then the assumption is that the component you want to keep is the most complicated one, which in turn is assumed to have the longest SMILES, and that is the component kept. NB some salts, e.g. tartrate, have a long, complex SMILES (e.g. D-Tartaric acid is O[C@@H]([C@H](O)C(O)=O)C(O)=O, 29 characters), which can easily win over another HAC=10 component, e.g. NCc1ccccc1CN (12 characters).
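To make the rules above concrete, here is a rough Python sketch of the selection logic. It is not the Vernalis implementation: the function names are made up for illustration, and the heavy atom count is a crude regex tokenizer that only handles bracket atoms and the usual organic-subset symbols. It does reproduce the tartrate pitfall described above.

```python
import re

# Crude SMILES atom tokenizer (an assumption of this sketch, not a full parser):
# bracket atoms first, then two-letter halogens, then single-letter atoms.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Explicit hydrogens such as [H], [2H] or [H+] are not heavy atoms.
_EXPLICIT_H = re.compile(r"\[\d*H[+-]?\d*\]")


def heavy_atom_count(smiles):
    """Approximate HAC: count atom tokens, skipping explicit hydrogens."""
    return sum(1 for tok in _ATOM.findall(smiles) if not _EXPLICIT_H.fullmatch(tok))


def largest_components(smiles):
    """Return the set of distinct components sharing the largest HAC."""
    unique = set(smiles.split("."))
    top = max(heavy_atom_count(c) for c in unique)
    return {c for c in unique if heavy_atom_count(c) == top}


def pick_component(smiles, keep_longest=False):
    """Pick one component; without the tiebreak the choice among ties is arbitrary."""
    candidates = largest_components(smiles)
    if keep_longest:
        return max(candidates, key=len)
    # Set iteration order is arbitrary, much like taking the 'first' HashSet member.
    return next(iter(candidates))


# Both components have HAC = 10, so the longer tartaric acid SMILES wins the tiebreak.
mixture = "NCc1ccccc1CN.O[C@@H]([C@H](O)C(O)=O)C(O)=O"
print(pick_component(mixture, keep_longest=True))
```

With `keep_longest=False` the same mixture may return either component, which is why returning the whole collection is the safer choice when ties matter.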

It’s a bit of a rough-and-ready approximation, but it works reasonably well most of the time.
If you want to be ‘safe’, then return them all, and either Ungroup and take them all, or use a Snippet Row Splitter with a snippet text something like:

return $...(Largest Component(s))$.length > 1;

(where the $…$ bit is the name of the group column), and then reprocess the rows with multiple components using e.g. the RDKit Salt Stripper node. The RDKit node uses a salt dictionary, so it will not be fooled by e.g. tartrate; it only removes salts that are in the dictionary.
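Outside KNIME, the same split-then-reprocess idea looks roughly like the Python sketch below. Everything in it is illustrative: `SALT_DICTIONARY` is a tiny made-up dictionary compared by exact SMILES text, whereas the RDKit node matches its salt definitions by structure, and keeping all components when every one of them is a known salt is an assumption of this sketch.

```python
# Hypothetical mini salt dictionary, matched by exact SMILES string only.
SALT_DICTIONARY = {
    "Cl",
    "[Na+]",
    "[Cl-]",
    "O[C@@H]([C@H](O)C(O)=O)C(O)=O",  # tartaric acid, as written above
}


def split_rows(rows):
    """Mimic the row splitter: rows with more than one component are ambiguous."""
    ambiguous = [row for row in rows if len(row) > 1]
    unambiguous = [row for row in rows if len(row) <= 1]
    return ambiguous, unambiguous


def strip_known_salts(components):
    """Drop components found in the dictionary; keep everything if all are salts."""
    kept = [c for c in components if c not in SALT_DICTIONARY]
    return kept or list(components)


rows = [
    ["CCO"],                                            # single largest component
    ["NCc1ccccc1CN", "O[C@@H]([C@H](O)C(O)=O)C(O)=O"],  # HAC tie
]
ambiguous, unambiguous = split_rows(rows)
for row in ambiguous:
    print(strip_known_salts(row))
```

Because the dictionary decides what counts as a salt, the length of the tartrate SMILES no longer matters for the ambiguous rows.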

Hope that helps / clarifies?


Many thanks for your extensive and fast answer. Your guess was correct, I had not ticked the ‘Keep only first unique component’ option. Ticking this solved my problem, but your additional pointers are also very useful information.

