Is it possible to use RDKit or other tools to extract descriptors or fingerprints for large molecules like proteins?
Is there a node to convert a PDB column to SDF or Mol that RDKit recognises?
The RDKit doesn’t have any fingerprinting functionality that would be particularly useful for proteins. Since proteins aren’t particularly chemically diverse and are quite big, standard chemical fingerprints don’t tend to work well for them. Still, depending on what you want to do with them, you could try the RDKit Count-Based Fingerprint node and use the AtomPair fingerprint. You should almost certainly increase the number of bits used (on the Advanced tab) to 9192 or more.
The easiest way to convert a PDB cell to SDF is to use the ChemAxon/Infocom MolConverter node.
I hope this helps,
I tried the count-based fingerprint node.
It has given this error:
ERROR RDKit Count-Based Fingerprint 3:237 Fingerprint Type ‘AtomPair’ could not be calculated: Only values 0…255 can be stored in the vector
I attach the SDF list of structures that caused the error. I ran the fingerprint node with the RDKit Mol format as input.
sdf.zip (143.1 KB)
Will increasing the number of bits to 10000 or 100000 improve the fingerprint results?
Ah, that’s an entertaining one that I didn’t think of in advance.
The problem here is that the count vector used by the fingerprint node can only store counts of up to 255. Many atom pairs in proteins will appear more often than this.
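To make the limit concrete, here is a toy illustration in plain Python. It is not the node's actual code: the function name byte_count_vector and the feature IDs are made up for the example, and a bytearray simply stands in for any count vector that stores one byte per bin.

```python
from collections import Counter


def byte_count_vector(feature_ids, n_bins):
    """Fold feature IDs into n_bins and store each count in a single byte.

    Hypothetical helper for illustration only; a bytearray element can
    only hold values from 0 to 255.
    """
    counts = Counter(fid % n_bins for fid in feature_ids)
    vec = bytearray(n_bins)
    for idx, count in counts.items():
        vec[idx] = count  # raises ValueError when count > 255
    return vec


small = [7] * 200  # one feature seen 200 times: fits in a byte
large = [7] * 300  # the same feature seen 300 times: does not fit

print(list(byte_count_vector(small, 8)))

for n_bins in (8, 10000, 100000):
    try:
        byte_count_vector(large, n_bins)
    except ValueError as err:
        print(n_bins, "bins: cannot store count:", err)
```

In this toy version, adding more bins only spreads different features across more slots. A single feature that occurs more than 255 times still overflows its slot, whatever the vector length.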
Unfortunately this is a limitation within the code of the node itself and not something you’re going to be able to work around by adjusting parameters. Sorry about that.
If you have Python installed and can use the RDKit Python integration, I can provide a sample workflow that shows how to generate fingerprints that way. Again, it may not be worth the trouble since these fingerprints are unlikely to be particularly useful.
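As a rough idea of what the Python route could look like (this is only a sketch, not the sample workflow mentioned above), the snippet below computes a hashed AtomPair count fingerprint directly with the RDKit. It assumes the RDKit can be imported in your Python environment. The helper name atom_pair_counts and the PDB file name in the comment are made up for illustration, and a long carbon chain stands in for a protein.

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors


def atom_pair_counts(mol, n_bits=8192):
    """Return {bin index: count} for a hashed AtomPair count fingerprint.

    Hypothetical helper; n_bits is the length of the hashed vector.
    """
    fp = rdMolDescriptors.GetHashedAtomPairFingerprint(mol, nBits=n_bits)
    return fp.GetNonzeroElements()


# Stand-in for a large molecule: a 300-carbon chain built from SMILES.
# For a real protein you would load the structure instead, for example:
#   mol = Chem.MolFromPDBFile("protein.pdb")  # hypothetical file name
mol = Chem.MolFromSmiles("C" * 300)

counts = atom_pair_counts(mol)
print("non-zero bins:", len(counts))
print("largest count:", max(counts.values()))
```

The counts come back as ordinary integers, so bins that occur more than 255 times are kept rather than rejected.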
I have Python 3.7 installed with KNIME 4.
I plan to build a QSPR model from proteins.
I think it is worth trying.
Ok, I will try and put something together in the next couple of days.