Taking SMILES from PubChem

tamaraa21 · July 13, 2020, 4:39pm

I noticed that the RDKit canonical SMILES node does not always recognize the SMILES in my original data set. After I use this node, there are always some rows with a ? in the canonical SMILES column. Rather than manually looking up each molecule in PubChem by its SID or REGID number and pasting the canonical SMILES into my data, I wanted to know whether there is a node that would do this automatically, given the molecule's SID and REGID?

docminus2 · October 1, 2020, 7:48pm

You could try the PubChem API to download structures by SID.
(Unless that is where your original data set comes from? That is not obvious from your question.)
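For illustration, here is a minimal Python sketch of such a lookup against PubChem's PUG REST interface, e.g. from a Python scripting node. It is only a sketch under assumptions: the helper names are made up for this example, it assumes each SID maps to at least one standardized compound (CID), and the property name CanonicalSMILES is passed as a parameter because PubChem has revised its SMILES property names over time, so check the current PUG REST documentation before relying on it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def sid_to_cids_url(sid):
    """URL that lists the compound IDs (CIDs) linked to a substance ID (SID)."""
    return f"{PUG_REST}/substance/sid/{int(sid)}/cids/JSON"


def cid_smiles_url(cid, prop="CanonicalSMILES"):
    """URL that returns one SMILES property of a compound as plain text.

    The default property name is an assumption; adjust it if PubChem
    rejects it or returns something unexpected.
    """
    return f"{PUG_REST}/compound/cid/{int(cid)}/property/{quote(prop)}/TXT"


def parse_cids(payload):
    """Collect all CIDs from the decoded JSON of the SID-to-CID request."""
    entries = payload.get("InformationList", {}).get("Information", [])
    cids = []
    for entry in entries:
        cids.extend(entry.get("CID", []))
    return cids


def fetch_smiles_for_sid(sid, timeout=30):
    """Return a SMILES string for the first CID linked to the SID, or None.

    Performs two network requests; HTTP errors (e.g. unknown SID) propagate
    to the caller as urllib exceptions.
    """
    with urlopen(sid_to_cids_url(sid), timeout=timeout) as resp:
        cids = parse_cids(json.load(resp))
    if not cids:
        return None
    with urlopen(cid_smiles_url(cids[0]), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip() or None
```

Calling fetch_smiles_for_sid(sid) only for the rows where the canonical SMILES column shows a ? keeps the number of requests small; PubChem limits request rates, so pause between calls on larger tables.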
Not an RDKit question per se. The real question is why your original SMILES are not recognized. Perhaps because they are faulty? That can happen with salts or certain structural elements such as nitro groups.
You could try the RDKit structure normalizer as a means to clean up your compounds.
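To illustrate the nitro case: RDKit rejects SMILES that draw a nitro group with a neutral, five-valent nitrogen (e.g. CN(=O)=O), while the charge-separated form C[N+](=O)[O-] parses fine. The sketch below is a crude text-level pre-clean, not what the structure normalizer does internally; the function name is hypothetical and it only covers the two most common spellings of the pattern.

```python
import re

# Neutral five-valent nitro spellings and their charge-separated equivalents.
# Only these two textual patterns are handled; anything else is left untouched.
_NITRO_REWRITES = [
    (re.compile(r"N\(=O\)=O"), "[N+](=O)[O-]"),
    (re.compile(r"O=N\(=O\)"), "O=[N+]([O-])"),
]


def fix_pentavalent_nitro(smiles):
    """Rewrite five-valent nitro groups in a SMILES string to the charged form."""
    for pattern, replacement in _NITRO_REWRITES:
        smiles = pattern.sub(replacement, smiles)
    return smiles


print(fix_pentavalent_nitro("CN(=O)=O"))         # C[N+](=O)[O-]
print(fix_pentavalent_nitro("O=N(=O)c1ccccc1"))  # O=[N+]([O-])c1ccccc1
```

The rewritten strings can then be passed back to the RDKit node (or to Chem.MolFromSmiles in a script) to see whether the ? rows disappear. If they do not, the failure has some other cause and the original strings are worth inspecting by hand.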
Personally, I find the Indigo2 nodes more efficient for sanitizing in KNIME. But perhaps that is just me.
