I'm a CDK lover and I'm trying to use JCompoundMapper (which is based on CDK) in order to measure molecular similarity.
I'm seeing very odd behavior. If I calculate the fingerprint vector for the same molecule with two different nodes (but identical settings), I obtain two different fingerprint vectors. This makes it impossible to obtain a good estimate of molecular similarity. I'm using Daylight-like atom types and extended connectivity fingerprints.
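For context, a quick sanity check: fingerprinting the same molecule twice should give a Tanimoto similarity of exactly 1.0, and anything lower means the fingerprint itself is not reproducible. Below is a minimal sketch using plain `java.util.BitSet`, not JCompoundMapper's API; the bit positions are made up for illustration.

```java
import java.util.BitSet;

public class TanimotoSketch {
    // Tanimoto coefficient for binary fingerprints: |A AND B| / |A OR B|.
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        // Convention chosen here: two empty fingerprints count as identical.
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    public static void main(String[] args) {
        // Hypothetical bit positions standing in for two runs on the same molecule.
        BitSet run1 = new BitSet(1024);
        run1.set(3); run1.set(17); run1.set(512);
        BitSet run2 = (BitSet) run1.clone();
        // A second, different fingerprint sharing two of the three bits.
        BitSet other = new BitSet(1024);
        other.set(3); other.set(17); other.set(900);
        System.out.println(tanimoto(run1, run2));  // prints 1.0
        System.out.println(tanimoto(run1, other)); // prints 0.5
    }
}
```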
I suspect there is something wrong with the hash function. Is there perhaps a random seed involved? Can anybody help me with this?
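I don't know JCompoundMapper's internals, so this is only a guess at one possible cause: in Java, the default `Object.hashCode()` is not required to stay the same from one execution to the next, so any fingerprint that hashes (or iterates a `HashSet` of) objects without a content-based hash can change between runs. The sketch below assumes hypothetical string labels for atom environments and folds them into bits with 32-bit FNV-1a, a hash that depends only on the input bytes and therefore gives the same bit vector on every run.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

public class StableFeatureHashing {
    // 32-bit FNV-1a over UTF-8 bytes; int overflow gives the required mod 2^32 arithmetic.
    static int fnv1a32(String s) {
        int h = 0x811C9DC5; // FNV offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193; // FNV prime
        }
        return h;
    }

    // Fold each feature string into one bit of a fixed-length fingerprint.
    static BitSet fingerprint(List<String> features, int length) {
        BitSet bits = new BitSet(length);
        for (String f : features) {
            bits.set(Math.floorMod(fnv1a32(f), length)); // floorMod keeps the index non-negative
        }
        return bits;
    }

    public static void main(String[] args) {
        // Hypothetical atom-environment labels; a real fingerprinter would generate these.
        List<String> features = List.of("C.3|0", "C.3|1:C.2,O.2", "O.2|0");
        System.out.println(fingerprint(features, 1024).equals(fingerprint(features, 1024))); // prints true
    }
}
```

Comparing the output of the two KNIME nodes bit by bit, as above, would show whether the difference comes from the hashing step or from somewhere earlier (for example atom typing).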
The KNIME implementation of JCompoundMapper can be found at: http://sourceforge.net/projects/jcompoundmapper/files/
I just realized that the problem does not affect all fingerprint types. For example, with the MOLPRINT 2D-like fingerprint the problem does not appear. Anyway, does anyone here have experience with JCompoundMapper?
Marvellous, I didn't know about the JCompoundMapper KNIME implementation until now, thanks!
I could reproduce your issue but I have no experience with the JCompoundMapper library in general. I would suggest you contact the developers directly because I am not sure whether they monitor the KNIME-CDK forum.
You could also try to run the same functions via the command line using the jar file and check if that works.
OK Stephan, thanks for your suggestions.
It would be nice if the JCompoundMapper node were technically validated and delivered in KNIME as a community contribution package. It offers many interesting molecular fingerprints that are not available in other packages such as RDKit or CDK (e.g. SHED keys, 2D and 3D two- and three-point pharmacophores, topological atom triplets, etc.).
That would be fantastic indeed. The same applies to the PaDEL descriptors, which are also based on CDK and have a KNIME plug-in available. The next 'stable' release of CDK (v. 1.6.x) might be a good time to contact the respective projects to talk about opportunities.