I have a large dataset of compounds (>900 compounds) and I would like to calculate the fingerprint similarity of each of these compounds to each of 107 reference compounds (the two sets are in separate files). However, I don't want a single combined similarity index of the 107 against the 900 compounds. Instead, for each of the 900 compounds I want 107 similarity values, one per reference compound. I am not sure how to do this and I have been trying for a while to work it out without much luck!
I think I need some type of loop that appends a similarity column (of 900 values) for each of the 107 compounds. I am new to KNIME, with no other users around me to ask, so if someone could help that would be great!
No loop is needed here. Just generate fingerprints for your 900 compounds and, in a separate table, generate fingerprints for your 107 reference compounds. Personally I use RDKit fingerprints with the Morgan setting.
Now use the Indigo or CDK fingerprint similarity calculator node and choose to calculate the maximum similarity. For each of the 900 compounds, this will report the highest similarity score against the reference compounds.
Hope that helps.
Thank you for your help! I need all of the similarity measures against the 107 compounds, not just the maximum similarity. I did manage to do it using a chunk loop with a loop end that appends columns (feeling pretty pleased with myself for a beginner!). I am using these similarity measures as molecular descriptors for QSAR models, which is why I need all of them.
Out of interest, I know you mentioned Morgan fingerprints, but in your opinion which are the best structural fingerprint keys to use? I am using PubChem fingerprints (at the moment!), and the other option I had in mind was the MACCS 166 keys, but I would appreciate any comments!
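For anyone comparing the two outputs discussed above, here is a minimal sketch of the difference between the full set of similarities (one column per reference compound) and the maximum-similarity value. It is plain Python rather than a KNIME workflow. The fingerprints are made-up sets of on-bit positions, and the Tanimoto coefficient is an assumption, since the thread does not name the similarity metric.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_matrix(queries, references):
    """One row per query compound, one column per reference compound."""
    return [[tanimoto(q, r) for r in references] for q in queries]


# Hypothetical toy fingerprints (sets of on-bit positions), not real data.
references = [{1, 2, 3, 4}, {2, 3, 5}, {7, 8}]
queries = [{1, 2, 3}, {7, 8, 9}]

# Full matrix: every query against every reference (usable as descriptors).
matrix = similarity_matrix(queries, references)

# Maximum similarity: collapses each row to its single best match.
max_per_query = [max(row) for row in matrix]

for row, best in zip(matrix, max_per_query):
    print([round(v, 3) for v in row], "max:", round(best, 3))
```

With 900 compounds and 107 references, the same idea gives a 900 x 107 table, which is what the column-appending loop builds up one chunk at a time.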
There are probably others much better positioned to answer this more thoroughly than me, but as I understand it the MACCS fingerprints mainly report which functional groups and other predefined substructures are present in the molecule (and, for a few keys, whether more than one is present); they don't really report how those groups are connected. For example, that an alcohol is present on a cyclohexyl ring with an amine at the 4-position would not be encoded within a MACCS fingerprint, only that an amine and an alcohol are present.
Morgan fingerprints (closely related to ECFP, extended-connectivity fingerprints) encode the connectivity around each atom out to a given radius, which can usually be specified. A radius of 2 or 3 is relatively standard; these correspond to ECFP4 and ECFP6, which are named by diameter. FeatMorgan is a variant of the Morgan fingerprint in which atoms are classed by functional type rather than by element, so equivalent groups are treated as the same, e.g. Br and Cl are both classed as halogens.
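To make the distinction concrete, here is a toy sketch in plain Python. It is not the real MACCS key set and not RDKit's Morgan implementation: `group_keys` is a hypothetical stand-in for a presence-only key fingerprint, `circular_environments` is a simplified Morgan-style scheme, and the `atom_class` mapping is an assumed, much reduced version of feature-based atom typing. Molecules are hand-written heavy-atom graphs.

```python
def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_keys(elements):
    """Toy presence-only keys: which heteroatoms occur, ignoring position."""
    return {e for e in elements if e != "C"}


def circular_environments(elements, bonds, radius, atom_class=None):
    """Simplified Morgan-style features: atom environments out to `radius` bonds."""
    atom_class = atom_class or {}
    neighbours = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Radius 0: atom type (optionally mapped to a class) plus neighbour count.
    ids = {i: (atom_class.get(elements[i], elements[i]), len(neighbours[i]))
           for i in neighbours}
    features = set(ids.values())
    for _ in range(radius):
        # Each new identifier combines the atom's own id with its neighbours' ids.
        ids = {i: (ids[i], tuple(sorted(ids[j] for j in neighbours[i])))
               for i in neighbours}
        features |= set(ids.values())
    return features


def aminocyclohexanol(n_position):
    """Cyclohexane ring (atoms 0-5), O on atom 0, N on ring atom `n_position`."""
    elements = ["C"] * 6 + ["O", "N"]
    ring = [(i, (i + 1) % 6) for i in range(6)]
    return elements, ring + [(0, 6), (n_position, 7)]


el_14, bonds_14 = aminocyclohexanol(3)  # 1,4-substituted
el_13, bonds_13 = aminocyclohexanol(2)  # 1,3-substituted

key_sim = tanimoto(group_keys(el_14), group_keys(el_13))
circ_sim = tanimoto(circular_environments(el_14, bonds_14, 2),
                    circular_environments(el_13, bonds_13, 2))
print("presence-only keys:", key_sim, "circular:", round(circ_sim, 3))

# Feature-style variant: Cl and Br collapse into one (assumed) "halogen" class.
halogens = {"Cl": "halogen", "Br": "halogen"}
chloro = (["C", "C", "Cl"], [(0, 1), (1, 2)])
bromo = (["C", "C", "Br"], [(0, 1), (1, 2)])
element_sim = tanimoto(circular_environments(*chloro, 2),
                       circular_environments(*bromo, 2))
feature_sim = tanimoto(circular_environments(*chloro, 2, halogens),
                       circular_environments(*bromo, 2, halogens))
print("element-based:", round(element_sim, 3), "feature-based:", feature_sim)
```

In this toy, the presence-only keys cannot tell the two isomers apart, while the circular environments can. The element-based environments separate the chloro and bromo compounds, and the feature-based ones treat them as identical.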
Hope that helps. I guess it depends what you want from your similarities.