RDKit - handling of organo-metallic compounds?

SOH979 · January 22, 2015, 4:01pm

Has anyone come up with a strategy for handling organometallic compounds input as SMILES?

With "unusual" apparent valencies on N, O, etc., and the various co-ordination numbers found on transition metals, RDKit typically throws a tantrum... as do CDK and Indigo!

BTW, I can probably manage without molecules like ferrocene.

Also, as the data I'm looking at comes from a crystallographic DB, many compounds have multiple small ligand/solvation molecules as well as the "compound of interest" - is there an easy way to remove these?

Cheers,

Steve.

greglandrum · January 30, 2015, 8:08am

Hi Steve,

There's really no good way to express organometallics as standard SMILES: the types of bonds that show up are just too varied and different from what you get in organic molecules. There are some extensions to SMILES supported by the ChemAxon tools that may work, but I haven't tried them for organometallics and they wouldn't help with the RDKit anyway (though adding support for at least some of those extensions is on our ToDo list). The best I'm aware of for dealing with organometallics in standard SMILES, and it's pretty poor, is to break all the ligand-metal bonds and just express things as dot-separated structures.

For removing solvents, etc.: the RDKit salt-stripper node can be used for this, but you would need to provide it with the set of species that should be removed.
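As a rough illustration of the "provide the set of species to remove" idea, here is a minimal plain-Python sketch. It is not the RDKit salt-stripper node and does not use RDKit at all: it only compares dot-separated fragments as strings. The function name `strip_species` and the contents of `SPECIES_TO_REMOVE` are hypothetical, and the sketch assumes each unwanted species is written exactly the same way in your data as in the set (so "OC" would not match "CO"; canonicalizing the SMILES first would be needed for that).

```python
# Hypothetical, user-supplied set of species to strip; extend it for your data.
SPECIES_TO_REMOVE = {"O", "CO", "CCO", "ClCCl", "CC#N", "[Na+]", "[Cl-]"}


def strip_species(smiles, unwanted=SPECIES_TO_REMOVE):
    """Drop dot-separated fragments that exactly match an unwanted species.

    Assumption: matching is by exact string, so the same species written
    with a different atom order will not be recognised.
    """
    fragments = smiles.split(".")
    # Keep every non-empty fragment that is not in the removal set.
    kept = [frag for frag in fragments if frag and frag not in unwanted]
    # Returns an empty string if every fragment was in the removal set.
    return ".".join(kept)


if __name__ == "__main__":
    print(strip_species("O.CC(=O)O[Cu]OC(C)=O.O"))
```

Fragments that are not in the set are left untouched, so a counter-ion or co-ligand you did not list will survive.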

Best,

-greg

Docminus · January 30, 2015, 1:34pm

Would InChI keys work (better)?

SOH979 · February 16, 2015, 1:57pm

Hi,

InChI may well work better, but the data I have only has SMILES strings.

Cheers,

Steve.

Docminus · February 16, 2015, 3:10pm

Perhaps the CIR node, under Community Nodes, Talete, can help convert?

http://tech.knime.org/book/chemical-identifier-resolver-for-knime-trusted-extension

Ellert_van_Koperen · February 16, 2015, 3:18pm

If you want to simply rip out the ligand and solvent stuff, you can cut up the SMILES yourself: split it on the dot (.) character and keep only the fragment that is the longest, stringwise.

This worked for me in over 99% of cases; only very rarely did a solvent have a longer notation than the compound.
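The split-and-keep-longest approach can be sketched in a few lines of plain Python. The function name `keep_longest_fragment` is made up for illustration, and "longest" here means the most characters in the SMILES string, not the most atoms, which is why it can occasionally pick a solvent over the compound.

```python
def keep_longest_fragment(smiles):
    """Return the longest dot-separated fragment of a SMILES string.

    Length is measured in characters of the fragment's SMILES text.
    """
    # Ignore empty pieces that a stray leading or trailing dot would create.
    fragments = [frag for frag in smiles.split(".") if frag]
    if not fragments:
        return smiles
    # max() returns the first fragment of maximal length, so ties go to
    # the fragment that appears earliest in the input.
    return max(fragments, key=len)


if __name__ == "__main__":
    print(keep_longest_fragment("O.CC(=O)Oc1ccccc1C(=O)O.O"))
```

One caveat: if a ring-closure digit is shared across a dot (as in "C1.C1"), the two pieces are actually bonded, and a plain string split would separate them incorrectly.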
