Information about CDK extended fingerprints

Hi guys,

I'm using the CDK Fingerprints node from the nightly build (CDK version 1.5.1). It offers an "extended" fingerprint option. Do you know where I can find information about these fingerprints? I cannot find much documentation about them. Specifically, I would like to know the following:

  1. Are they circular fingerprints based on the extended connectivity of atoms (for example RDKit Morgan fingerprints or SciTegic ECFP)?

  2. What is the default radius (or linear path length, if that is the case) of the enumerated fragments?

Thanks in advance for any help,

Gio

Hi Gio,

the only information available is in the source files:

https://github.com/egonw/cdk/blob/master/base/standard/src/main/java/org/openscience/cdk/fingerprint/Fingerprinter.java

https://github.com/egonw/cdk/blob/master/descriptor/fingerprint/src/main/java/org/openscience/cdk/fingerprint/ExtendedFingerprinter.java

The default linear path length is 8. According to the API documentation, the Fingerprinter class creates a fingerprint from all paths up to length N starting from each atom in the molecule and returns the unique set of such paths. The ExtendedFingerprinter generates an extended fingerprint with additional bits describing rings: bits that tell whether the structure has 0 rings, 1 or fewer rings, 2 or fewer rings, ... 10 or fewer rings (referring to the smallest set of smallest rings), and bits that tell whether there is a fused ring system with 1, 2, ... 8 or more rings in it.
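To illustrate what "all paths up to length N starting from each atom" means, here is a minimal Python sketch. It is not CDK's implementation (which is Java, labels paths with bond symbols as well as atoms, and uses its own hash); the toy graph format, the function names `linear_paths` and `hashed_fingerprint`, counting length in bonds, and the md5-based folding are all assumptions for illustration. The ring bits of the extended variant are not reproduced.

```python
import hashlib


def linear_paths(atoms, adjacency, max_bonds=8):
    """Enumerate every simple path of 0..max_bonds bonds from each atom.

    atoms: list of element symbols, indexed by atom number.
    adjacency: dict mapping atom index -> list of neighbour indices.
    ASSUMPTION: length is counted in bonds and only element symbols are
    used as labels (no bond orders, aromaticity or charges).
    """
    paths = set()

    def walk(path):
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        # A path read in either direction is the same fragment: keep one form.
        paths.add(min(forward, backward))
        if len(path) - 1 == max_bonds:
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:  # simple paths only: never revisit an atom
                walk(path + [nbr])

    for start in range(len(atoms)):
        walk([start])
    return paths  # the unique set of paths


def hashed_fingerprint(paths, size=1024):
    """Fold each path string into one position of a fixed-size bit vector.

    md5 is used only because Python's built-in str hash changes between
    runs; it is NOT the hash function CDK uses.
    """
    bits = set()
    for p in paths:
        digest = hashlib.md5(p.encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % size)
    return bits


# Ethanol (C-C-O) as a toy graph: atom index -> neighbour indices.
ethanol_atoms = ["C", "C", "O"]
ethanol_adj = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(linear_paths(ethanol_atoms, ethanol_adj)))  # prints ['C', 'C-C', 'C-C-O', 'C-O', 'O']
```

Because every fragment is a path, a branch point such as a quaternary carbon is only ever seen as separate C-C-C strings, never as one unit; the maximum depth is also what keeps the number of enumerated paths manageable in ring systems.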

Cheers,

Stephan

Hi again Stephan,

Thanks for your answer and for the GitHub references.

Unfortunately I didn't quite understand what you mean when you say "...generates an extended fingerprint...". Do you mean an extended-connectivity fingerprint, like ECFP or RDKit Morgan fingerprints?

My main concern is whether these fingerprints are circular (i.e. they take into account the structural information of tertiary and quaternary centers), like the ones I mentioned, or linear (i.e. they consider only linear paths through the molecule). This is important to know, as the former generally perform better than the latter.
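The circular (extended-connectivity) idea Gio describes can be sketched in a few lines of Python. This is a simplified, hypothetical illustration of Morgan/ECFP-style iteration, not CDK's, RDKit's or SciTegic's implementation: real ECFP uses richer atom invariants (charge, hydrogen count, ring membership), includes bond orders and removes duplicate environments. The graph format, the name `circular_identifiers` and the md5-based hash are assumptions.

```python
import hashlib


def _stable_hash(obj):
    # Deterministic 32-bit integer from any repr-able object
    # (Python's built-in hash() of strings changes between runs).
    return int.from_bytes(hashlib.md5(repr(obj).encode()).digest()[:4], "big")


def circular_identifiers(atoms, adjacency, radius=2):
    """Morgan/ECFP-style environment identifiers (simplified sketch).

    Iteration 0 labels each atom by (element, degree). Each further
    iteration combines an atom's current identifier with the sorted
    identifiers of its neighbours, so after k iterations an identifier
    encodes the atom's whole neighbourhood out to k bonds, branches included.
    """
    ids = [_stable_hash((atoms[i], len(adjacency[i]))) for i in range(len(atoms))]
    features = set(ids)
    for _ in range(radius):
        # The comprehension reads the previous iteration's ids throughout.
        ids = [_stable_hash((ids[i], tuple(sorted(ids[n] for n in adjacency[i]))))
               for i in range(len(atoms))]
        features.update(ids)
    return features


# Neopentane: one quaternary carbon (atom 0) bonded to four methyl carbons.
neopentane_atoms = ["C"] * 5
neopentane_adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(len(circular_identifiers(neopentane_atoms, neopentane_adj, radius=1)))
```

After one iteration the central atom's identifier describes the quaternary center with all four neighbours at once, which is the kind of branching information a set of linear paths cannot represent as a single feature.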

Does anybody know?

Gio

Hi Gio,

sorry for the delay in replying. The fingerprints are linear, i.e. they consider linear paths.

The latest nightly CDK library version (the core library, not the KNIME plug-in) contains circular fingerprints.

Cheers,

Stephan

Hi Stephan,

Thanks again for the clarification. It would be very nice to have circular fingerprints integrated into the KNIME CDK plug-in. Unfortunately I cannot help with this, as I cannot program in Java. I hope this can be done in the future.

Thanks anyway,

Gio

Hi Gio,

in preparation for the next release, I have added the CDK circular fingerprints to the plug-in (nightly build). Give them a go if they are still relevant to you.

Cheers,

Stephan

I am working on a Bayesian model to predict molecular affinities from molecular fingerprints. At this stage, I would like to analyse why a particular ligand scores high or low. To do that, I want to look at the bits with the highest contributions and extract the molecular features those bits encode. This goes in the direction of QSAR. Have you ever done that? Or would you recommend another way of doing it?
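One common way to get per-bit contributions from a naive Bayes model is to compute a log-odds weight for each bit and rank the bits a ligand has switched on. The Python sketch below assumes fingerprints stored as sets of on-bit indices and binary active/inactive labels; the function names are hypothetical, and this is not necessarily how your model is built. It is a simplification: the score only sums weights over on-bits, whereas a full Bernoulli naive Bayes also scores the bits that are off.

```python
import math


def bit_log_odds(fingerprints, labels, n_bits):
    """Log-odds weight per bit.

    fingerprints: list of sets of on-bit indices (one set per molecule).
    labels: list of bools, True for actives.
    Returns w[b] = log P(bit b on | active) - log P(bit b on | inactive),
    with add-one (Laplace) smoothing so unseen bits do not give log(0).
    """
    on_active = [0] * n_bits
    on_inactive = [0] * n_bits
    for fp, is_active in zip(fingerprints, labels):
        counts = on_active if is_active else on_inactive
        for b in fp:
            counts[b] += 1
    n_act = sum(1 for lab in labels if lab)
    n_inact = len(labels) - n_act
    return [math.log((on_active[b] + 1) / (n_act + 2))
            - math.log((on_inactive[b] + 1) / (n_inact + 2))
            for b in range(n_bits)]


def score(fp, weights):
    """Simplified ligand score: sum of weights over the bits that are on."""
    return sum(weights[b] for b in fp)


def top_contributions(fp, weights, k=5):
    """Return up to k on-bits of one ligand, highest weight first."""
    return sorted(fp, key=lambda b: weights[b], reverse=True)[:k]
```

Mapping a top-ranked bit back to chemistry requires the fingerprinter to record which fragments set each bit while fingerprinting; with hashed fingerprints, several unrelated fragments can share one bit, so a single bit does not always correspond to a single substructure.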