Dear All (particularly Manuel and Greg!),
I have recently been wondering about a use-case in KNIME that I think would need some 'bridging' between the RDKit Fingerprint node and the RDKit Molecule Highlighting node - perhaps a new "RDKit Fingerprint (explain)" node?
So the use case is that after using e.g. ECFP4-like fingerprints to build a model, I want to go back and colour the substructures in the molecules that are represented by certain bits in the fingerprint.
This is, I think, equivalent to what is documented here.
So I was thinking perhaps a new option could be exposed that would (while generating the fingerprints) add one column per bit (like the current Expand Bit Vector node) - but instead of containing 1's and 0's the cells would contain lists of atoms involved in setting the bit.
These columns could then be used in the Molecule Highlighting node - for example one could use the Fingerprint Bayesian Learner and then use the top 3 bit columns for one category to colour atoms green, and the bottom 3 to colour red.
One could also think to numerically scale the colouring based on the importance/score associated with the bit(?)
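To make the scaling idea concrete, here is a minimal sketch in plain Python. It assumes we already have, for one molecule, a mapping from bit to the atoms that set it, and a per-bit score taken from the model; the function names, the bit numbers and the scores are all made up for illustration and are not part of any existing node. Positive totals fade from white to green and negative ones from white to red, in line with the green/red idea above.

```python
def atom_weights(bit_atoms, bit_scores):
    """Sum, for every atom, the scores of all bits that atom helps to set."""
    weights = {}
    for bit, atoms in bit_atoms.items():
        score = bit_scores.get(bit, 0.0)  # bits without a score contribute nothing
        for atom in atoms:
            weights[atom] = weights.get(atom, 0.0) + score
    return weights


def weight_to_rgb(weight, max_abs):
    """Map a weight to an RGB triple: white at 0, green if positive, red if negative."""
    if max_abs == 0:
        return (1.0, 1.0, 1.0)
    frac = min(abs(weight) / max_abs, 1.0)
    if weight >= 0:
        return (1.0 - frac, 1.0, 1.0 - frac)  # white -> green
    return (1.0, 1.0 - frac, 1.0 - frac)      # white -> red


# Hypothetical data: bit 10 is set by atoms 0 and 1, bit 20 by atoms 1 and 2.
bit_atoms = {10: [0, 1], 20: [1, 2]}
bit_scores = {10: 2.0, 20: -0.5}  # made-up importances from a model

weights = atom_weights(bit_atoms, bit_scores)
max_abs = max(abs(w) for w in weights.values())
colours = {atom: weight_to_rgb(w, max_abs) for atom, w in weights.items()}
```

Summing the scores per atom is only one possible choice; taking the maximum or an average per atom would be equally easy to swap in.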
(disclaimer - if the functionality is already available somewhere, then 1000 apologies in advance!)
This is a nice idea that may cover a number of different use cases. One, for example, is explaining the contribution of atoms to a prediction (which I think is what you are getting at with the Bayesian learner?).
Riniker and Landrum have published some work on this kind of area: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3852750/
For an example of what this could look like in KNIME, my presentation from last year's UGM shows how I implemented this using Lhasa's chemical engine: https://www.lhasalimited.org/Public/Library/2015/The%20use%20of%20KNIME%20to%20support%20research%20activity.pdf
see slides 41 - 43 (screencaps shown below).
This implementation can be achieved with RDKit structures now that the highlighting is available (or has it always been there?).
It would be interesting to have a node that can extract the atoms that:
1) Contribute to a bit in the fingerprint
2) Are the root atom for a bit in the fingerprint
It could return a list of atom index positions that could then be put into the highlight node? Or, as you say, provide an option to append the atom positions when generating the fingerprint. Or would it be easier to have a node where you set the structure, the fingerprint and which bit positions to highlight?
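As a rough sketch of what such a node might compute: in the RDKit Python API, `AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=..., bitInfo=info)` fills `info` with entries of the form bit -> ((root atom index, radius), ...). Given that mapping and the bond connectivity, the root atoms and the contributing atoms for a chosen set of bits can be collected as below. The code is plain Python on toy data so that it runs without RDKit; the helper names, the adjacency list and the bit numbers are hypothetical, and treating "contributing atoms" as everything within `radius` bonds of the root is my assumption about what we want to highlight.

```python
from collections import deque


def atoms_within_radius(adjacency, root, radius):
    """Breadth-first search: atom indices at most `radius` bonds from `root`."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        atom = queue.popleft()
        if distance[atom] == radius:
            continue  # do not expand beyond the requested radius
        for neighbour in adjacency[atom]:
            if neighbour not in distance:
                distance[neighbour] = distance[atom] + 1
                queue.append(neighbour)
    return sorted(distance)


def explain_bits(adjacency, bit_info, bits):
    """For each requested bit, list its root atoms and all contributing atoms."""
    explained = {}
    for bit in bits:
        roots, atoms = set(), set()
        for root, radius in bit_info.get(bit, ()):  # unset bits give empty lists
            roots.add(root)
            atoms.update(atoms_within_radius(adjacency, root, radius))
        explained[bit] = {"roots": sorted(roots), "atoms": sorted(atoms)}
    return explained


# Toy molecule: three heavy atoms in a chain, 0-1-2 (e.g. ethanol, C-C-O).
adjacency = {0: [1], 1: [0, 2], 2: [1]}
# Hypothetical bitInfo-style mapping: bit -> [(root atom, radius), ...]
bit_info = {10: [(0, 1)], 20: [(1, 0), (2, 0)]}

explained = explain_bits(adjacency, bit_info, [10, 20, 99])
```

The "atoms" list is what could go straight into the highlighting node, while "roots" covers the second case above; a bit that is not set in the molecule (99 here) simply gives two empty lists.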
I hope my comments provide some useful discussion.
Thanks for the pointers - I knew I must have seen this somewhere before! :)
We can discuss at length in Berlin!