I'm working on a workflow for calculating descriptors from SMILES to be used for ML models. I came across some other workflows that use the RDKit Salt Stripper node, and I would like to understand its actual benefit in this specific workflow. I have the same question about the Add Hs, Generate Coords, and Optimize Geometry nodes used before the RDKit Descriptor Calculation node.
Thank you in advance
I moved the topic to the appropriate channel so that it becomes more visible to the RDKit community.
It would be great if you could add a link to the workflow in question. If the workflow is on the KNIME Hub, you also have the option to start a discussion right on the workflow page. Look for a “Start Discussion” button at the bottom of that page.
This notifies the workflow developers, who are best placed to answer your question.
When calculating descriptors, you want to work on the parent structures. Depending on their source, however, the structures sometimes contain counterions or salts as well, and these would then be treated as part of the molecule and contribute to values such as molecular weight. You can remove them with the RDKit Salt Stripper node and then calculate the descriptors.
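To make the idea of a parent structure concrete, here is a minimal Python sketch that keeps only the largest dot-separated fragment of a SMILES string. This is an assumption-laden stand-in, not what the RDKit Salt Stripper node does internally (the node works from salt definitions); the function names `heavy_atom_count` and `largest_fragment` are hypothetical, and the atom counting is a rough regex heuristic rather than a real SMILES parser.

```python
import re

# One atom token in a SMILES string: a bracket atom such as [Na+],
# a two-letter organic-subset atom (Cl, Br), or a one-letter atom.
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")

# Bracket atoms that are plain hydrogen, e.g. [H], [H+], [2H].
HYDROGEN = re.compile(r"\[\d*H[+-]?\d*\]")


def heavy_atom_count(fragment: str) -> int:
    """Roughly count non-hydrogen atoms in one SMILES fragment (heuristic)."""
    tokens = ATOM_TOKEN.findall(fragment)
    return sum(1 for tok in tokens if not HYDROGEN.fullmatch(tok))


def largest_fragment(smiles: str) -> str:
    """Return the dot-separated fragment with the most heavy atoms.

    Assumption: the largest fragment is the parent structure. This is a
    simplification and can be wrong, e.g. for large counterions.
    """
    fragments = [frag for frag in smiles.split(".") if frag]
    if not fragments:
        return smiles
    return max(fragments, key=heavy_atom_count)


if __name__ == "__main__":
    # Sodium salt of aspirin: the [Na+] counterion is dropped.
    print(largest_fragment("CC(=O)Oc1ccccc1C(=O)[O-].[Na+]"))
    # Triethylamine hydrochloride written as two fragments.
    print(largest_fragment("Cl.CCN(CC)CC"))
```

Note that the remaining fragment can still carry a charge (the carboxylate in the first example); this sketch does not neutralize it.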
As for the RDKit Descriptor Calculation node, I believe it does not require 3D structures as input. It is enough to convert e.g. SMILES strings with the Molecule Type Cast node and use the result as input. There is also no need to add hydrogens first; the descriptor values will be the same.