How to obtain the list of fragments defining a certain fingerprint?

Hi everybody,

I would like to know whether there is a way to obtain, for a given molecule, all the fragments/substructures that define its fingerprint vector (e.g. Morgan, FeatMorgan, etc.).

Thanks in advance,

Giovanni

Hi Giovanni

As you have yet to get a response, here is the limited information I have.

I don't believe CDK, RDKit or Indigo have implemented any method of identifying which substructures correspond to each bit index in their hashed fingerprints. ChemAxon has implemented this for ECFP and FCFP: http://www.chemaxon.com/jchem/doc/user/ECFP.html
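Some context on why the mapping is hard for hashed fingerprints: a circular (ECFP/Morgan-style) fingerprint hashes each atom environment into a bit index, so the fragment behind a bit can only be recovered if the generator records which (center atom, radius) produced that bit at hashing time. Several environments can also collide on the same bit. The sketch below is a toy, stdlib-only illustration of that bookkeeping. It assumes a hypothetical molecule representation (element symbols plus `(i, j, bond_order)` tuples) and a simplified identifier scheme using only element and degree as initial atom invariants; it is not the algorithm used by ChemAxon, RDKit, CDK or Indigo (for example, it does not remove duplicate environments as real ECFP does).

```python
import hashlib
from collections import defaultdict, deque


def stable_hash(*parts) -> int:
    """Deterministic 32-bit hash (Python's built-in hash() of str is salted per process)."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def build_neighbors(bonds):
    """Adjacency list: atom -> [(neighbor_atom, bond_order), ...]."""
    neighbors = defaultdict(list)
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))
    return neighbors


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Toy ECFP-like fingerprint returning {bit: [(center_atom, radius), ...]}.

    The list per bit is the provenance record: it says which environments set
    that bit, and has more than one entry when environments collide.
    """
    neighbors = build_neighbors(bonds)
    # Radius 0: identifier from element symbol and heavy-atom degree (hypothetical invariants).
    ids = {a: stable_hash(sym, len(neighbors[a])) for a, sym in enumerate(atoms)}
    bit_info = defaultdict(list)
    for a, ident in ids.items():
        bit_info[ident % n_bits].append((a, 0))
    for r in range(1, radius + 1):
        # Fold each atom's previous identifier with its neighbors' (bond order, identifier).
        ids = {
            a: stable_hash(ids[a], *sorted((order, ids[n]) for n, order in neighbors[a]))
            for a in range(len(atoms))
        }
        for a, ident in ids.items():
            bit_info[ident % n_bits].append((a, r))
    return dict(bit_info)


def environment_atoms(bonds, center, radius):
    """Atoms within `radius` bonds of `center`: the fragment behind one (center, radius) entry."""
    neighbors = build_neighbors(bonds)
    seen = {center: 0}
    queue = deque([center])
    while queue:
        a = queue.popleft()
        if seen[a] == radius:
            continue
        for n, _ in neighbors[a]:
            if n not in seen:
                seen[n] = seen[a] + 1
                queue.append(n)
    return sorted(seen)


# Acetic acid, heavy atoms only: CH3-C(=O)-OH
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
info = circular_fingerprint(atoms, bonds, radius=1, n_bits=64)
for bit, envs in sorted(info.items()):
    for center, r in envs:
        print(f"bit {bit}: center atom {center}, radius {r}, atoms {environment_atoms(bonds, center, r)}")
```

The design choice to illustrate is that provenance has to be captured while hashing; once only the folded bit vector is stored, the mapping back to fragments is lost.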

MACCS keys will likely vary between implementations. Here is an example for RDKit: http://rdkit.org/docs/api/rdkit.Chem.MACCSkeys-pysrc.html

PubChem keys can be found here: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
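Dictionary-based keys such as MACCS or PubChem work differently: each bit index is tied to one predefined substructure definition (in RDKit's MACCS implementation, a SMARTS pattern), so reading back which substructure set a bit is just a table lookup. The sketch below mimics that idea with a hypothetical four-key dictionary and a toy molecule representation (element symbols plus `(i, j, bond_order)` tuples). The keys are illustrative only and are not real MACCS or PubChem keys.

```python
# Hypothetical toy representation: element symbols plus (atom_i, atom_j, bond_order) bonds.


def has_element(symbol):
    """Key predicate: the molecule contains at least one atom of `symbol`."""
    return lambda atoms, bonds: symbol in atoms


def has_bond(sym_a, sym_b, order):
    """Key predicate: the molecule contains a sym_a-sym_b bond of the given order."""
    wanted = sorted((sym_a, sym_b))

    def check(atoms, bonds):
        return any(o == order and sorted((atoms[i], atoms[j])) == wanted for i, j, o in bonds)

    return check


# Each key index is tied to one human-readable definition, so every bit explains itself.
KEYS = {
    0: ("contains nitrogen", has_element("N")),
    1: ("contains oxygen", has_element("O")),
    2: ("C=O double bond", has_bond("C", "O", 2)),
    3: ("C=C double bond", has_bond("C", "C", 2)),
}


def keyed_fingerprint(atoms, bonds):
    """Return the sorted list of set key indices."""
    return [k for k, (_, pred) in sorted(KEYS.items()) if pred(atoms, bonds)]


def explain(bits):
    """Map set bits back to their substructure definitions."""
    return [KEYS[k][0] for k in bits]


# Acetic acid, heavy atoms only: CH3-C(=O)-OH
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
bits = keyed_fingerprint(atoms, bonds)
print(bits, explain(bits))  # [1, 2] ['contains oxygen', 'C=O double bond']
```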

Did that help at all?

Cheers

Sam

Hi Sam,

Thank you for the references on the fragments behind dictionary-based fingerprints (MACCS and PubChem keys). However, I was more interested in obtaining the fragments behind public, open-source, hashed fingerprints (e.g. CDK-JCompoundMapper, RDKit, Indigo). Unfortunately, from what you said, only ChemAxon has implemented this, and I imagine it is available only under a commercial license.

I hope this feature will also be implemented in the open-source nodes in the future. Such functionality would be very useful and would add transparency to fingerprint generator nodes, which can sometimes look like black boxes.

Thank you anyway for your quick reply! I hope someone will post here if there is any news on this front.

Cheers,

Giovanni