About circular fingerprints

Hi

Is there a description of the Morgan and FeatMorgan fingerprints, similar to the articles on ECFP/FCFP? What are the default invariants for the Morgan fingerprint, what is its default atom identifier length, and how is the fixed-length fingerprint produced?

As far as I understand, you apply the same algorithm as in *CFP, but with different invariants?

Regards

The Morgan algorithm for fingerprinting was implemented according to the published algorithm for the ECFP/FCFP fingerprints.

The invariants used for the standard Morgan fingerprint in the RDKit KNIME nodes include the same terms as the published algorithm: atomic number, degree, number of Hs, formal charge, isotopic mass, and ring membership. The actual invariant values end up being different from the ones in ECFP because the hashing function is different, but the calculated similarity values between molecules are very, very similar.
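
To illustrate why a different hashing function gives different values from the same terms, here is a minimal sketch that combines the six terms listed above into one 32-bit value. The `AtomInvariants` struct, the `hashCombine` helper (a boost-style hash combine written out by hand) and `initialIdentifier` are hypothetical names chosen for this example; this is not the hash that RDKit or the ECFP paper actually uses.

```cpp
#include <cstdint>

// Hypothetical container for the per-atom terms named in the text.
struct AtomInvariants {
  int atomicNum;
  int degree;
  int numHs;
  int formalCharge;
  int isotope;
  bool inRing;
};

// Boost-style hash combine, written out so the sketch needs no libraries.
// Assumption for illustration only; RDKit's real hash is different.
inline void hashCombine(std::uint32_t &seed, std::uint32_t value) {
  seed ^= value + 0x9e3779b9u + (seed << 6) + (seed >> 2);
}

// Fold all terms into a single 32-bit starting identifier for one atom.
std::uint32_t initialIdentifier(const AtomInvariants &a) {
  std::uint32_t seed = 0;
  hashCombine(seed, static_cast<std::uint32_t>(a.atomicNum));
  hashCombine(seed, static_cast<std::uint32_t>(a.degree));
  hashCombine(seed, static_cast<std::uint32_t>(a.numHs));
  hashCombine(seed, static_cast<std::uint32_t>(a.formalCharge));
  hashCombine(seed, static_cast<std::uint32_t>(a.isotope));
  hashCombine(seed, a.inRing ? 1u : 0u);
  return seed;
}
```

Swapping in another hash changes every number that comes out, but two atoms with the same terms still get the same identifier, which is why similarity values stay close even though the raw values do not match ECFP.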

The invariants for the FeatMorgan fingerprints use pharmacophoric features. Here are the definitions:

    // Definitions for feature points adapted from:
    // Gobbi and Poppinger, Biotech. Bioeng. _61_ 47-54 (1998)
    const char *smartsPatterns[6]={
      "[$([N;!H0;v3,v4&+1]),\
$([O,S;H1;+0]),\
n&H1&+0]", // Donor
      "[$([O,S;H1;v2;!$(*-*=[O,N,P,S])]),\
$([O,S;H0;v2]),\
$([O,S;-]),\
$([N;v3;!$(N-*=[O,N,P,S])]),\
n&H0&+0,\
$([o,s;+0;!$([o,s]:n);!$([o,s]:c:n)])]", // Acceptor
      "[a]", //Aromatic
      "[F,Cl,Br,I]",//Halogen
      "[#7;+,\
$([N;H2&+0][$([C,a]);!$([C,a](=O))]),\
$([N;H1&+0]([$([C,a]);!$([C,a](=O))])[$([C,a]);!$([C,a](=O))]),\
$([N;H0&+0]([C;!$(C(=O))])([C;!$(C(=O))])[C;!$(C(=O))])]", // Basic
      "[$([C,S](=[O,S,P])-[O;H1,-1])]" //Acidic
    };

 

I'm not sure what you mean by "default atom identifier length".

The fixed-length fingerprint is generated by folding each bit ID with a modulo: bitId = bitId % fp_length
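
A minimal sketch of that folding step, assuming the unfolded identifier is an unsigned 32-bit integer. The function name `foldBitId` and the parameter `fpLength` are made up for this example and do not come from the RDKit source.

```cpp
#include <cstddef>
#include <cstdint>

// Map one sparse identifier onto a bit position in a fingerprint of
// fpLength bits. fpLength must be non-zero.
std::size_t foldBitId(std::uint32_t bitId, std::size_t fpLength) {
  return static_cast<std::size_t>(bitId % fpLength);
}
```

For instance, with a 2048-bit fingerprint an identifier of 2050 lands on position 2, and so does an identifier of 2, so distinct identifiers can collide on the same bit.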

-greg

Hi greg,

Thanks for answering me.

I think you gave me the information I needed. By "default identifier length" I meant the length of the final identifier (e.g. in ECFP it is a 32-bit word, I think), but I guess that is not so important.

Regards

Hi again.

I'm sorry for the stupid questions, but some things are still unclear to me. When the modulo operation bitId = bitId % fp_length is performed for each bitId, is the result only a single set bit? As far as I understand, the fp output from the node is in hexadecimal format; if I use the 'column rename' node to convert the fp to a string, do I get the binary fingerprint in the end?

Regards

Each atom environment (defined by atom invariants and radius) can set a bit, so a molecule normally ends up with many set bits, not just one.
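
As a rough sketch of how several environments end up in one fingerprint, the code below folds a list of environment identifiers into a bit vector. The names `foldEnvironments`, `countSetBits` and `toBitString` are hypothetical, and printing position 0 first is an arbitrary choice for this example; it is not necessarily the order the KNIME renderer uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Set one bit per environment identifier after folding it to fpLength bits.
// fpLength must be non-zero.
std::vector<bool> foldEnvironments(const std::vector<std::uint32_t> &envIds,
                                   std::size_t fpLength) {
  std::vector<bool> bits(fpLength, false);
  for (std::uint32_t id : envIds) {
    bits[id % fpLength] = true;  // colliding identifiers share one bit
  }
  return bits;
}

// Number of bits that are on in the folded fingerprint.
std::size_t countSetBits(const std::vector<bool> &bits) {
  std::size_t n = 0;
  for (bool b : bits) {
    if (b) ++n;
  }
  return n;
}

// Render the fingerprint as a string of '0' and '1', position 0 first.
std::string toBitString(const std::vector<bool> &bits) {
  std::string s;
  s.reserve(bits.size());
  for (bool b : bits) {
    s.push_back(b ? '1' : '0');
  }
  return s;
}
```

With the identifiers 1, 9, 17 and 4 and a toy length of 8, the first three all fold to position 1, so only two bits are set and the string reads 01001000.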

The fp output of the node is a standard KNIME fingerprint type. The Rename node gives you the bit string when you give it a fingerprint column. You can also change the renderer to show the bit string if you would like.

-greg