Hi everybody,
I would like to know if there is a way to obtain, for a given molecule, all the fragments/substructures that define its fingerprint vector (e.g. Morgan, FeatMorgan, etc.).
Thanks in advance,
Giovanni
Hi Giovanni
As you have yet to get a response, I will let you know what limited information I have.
I don't believe CDK, RDKit or Indigo have implemented any method for identifying which substructure each bit index of their hashed fingerprints corresponds to. ChemAxon have implemented it for ECFP and FCFP: http://www.chemaxon.com/jchem/doc/user/ECFP.html.
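For what it's worth, RDKit's Python API (as opposed to the KNIME nodes) does expose this for Morgan fingerprints: `AllChem.GetMorganFingerprintAsBitVect` accepts a `bitInfo` dict, which it fills with the (central atom, radius) pairs that set each bit. Below is a minimal sketch, assuming RDKit is installed; the helper name `morgan_bit_fragments` is hypothetical. Passing `useFeatures=True` gives the feature-based (FCFP-like) variant, which I assume is what the FeatMorgan option refers to. Newer RDKit releases deprecate this function in favour of `rdFingerprintGenerator`, so it may log a deprecation warning.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def morgan_bit_fragments(smiles, radius=2, n_bits=2048, use_features=False):
    """Hypothetical helper: map each on-bit of a Morgan fingerprint to the
    SMILES of the atom environment(s) that set it."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")

    # RDKit fills bit_info as {bit: ((center_atom_idx, radius), ...)}
    bit_info = {}
    fp = AllChem.GetMorganFingerprintAsBitVect(
        mol, radius, nBits=n_bits, useFeatures=use_features, bitInfo=bit_info
    )

    fragments = {}
    for bit in fp.GetOnBits():
        smis = set()  # several environments can collide on one bit
        for center, rad in bit_info[bit]:
            # Bond indices within `rad` bonds of the central atom
            env = list(Chem.FindAtomEnvironmentOfRadiusN(mol, rad, center)) if rad > 0 else []
            if not env:
                # Radius 0: the environment is just the central atom
                smis.add(Chem.MolFragmentToSmiles(mol, atomsToUse=[center]))
                continue
            atoms = {center}
            for b in env:
                bond = mol.GetBondWithIdx(b)
                atoms.update((bond.GetBeginAtomIdx(), bond.GetEndAtomIdx()))
            smis.add(
                Chem.MolFragmentToSmiles(
                    mol, atomsToUse=sorted(atoms), bondsToUse=env, rootedAtAtom=center
                )
            )
        fragments[bit] = sorted(smis)
    return fragments


if __name__ == "__main__":
    for bit, frags in morgan_bit_fragments("CCO").items():
        print(bit, frags)
```

Note that the fragments are written as SMILES of the environment only; the Morgan invariants (e.g. hydrogen count, ring membership) that also feed into the hash are not fully visible in them.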
MACCS keys will likely vary between implementations. Here is an example for RDKit: http://rdkit.org/docs/api/rdkit.Chem.MACCSkeys-pysrc.html.
The PubChem keys can be found here: ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
Did that help at all?
Cheers
Sam
Hi Sam,
Thank you for the references on the fragments behind dictionary-based fingerprints (MACCS and PubChem keys). However, I was more interested in obtaining the fragments behind public, open-source hashed fingerprints (e.g. CDK-JCompoundMapper, RDKit, Indigo). Unfortunately, from what you said, only ChemAxon has implemented this, and I imagine it is available only under a commercial license.
I hope this feature will also be implemented in the open-source nodes in the future. I think such functionality would be very interesting and useful, and it would add transparency to fingerprint generator nodes, which can sometimes seem like black boxes.
Thank you anyway for your quick reply! I hope someone will let me know if there is any news on this front.
Cheers,
Giovanni