I am trying to perform multiple Fingerprint similarities between a few compounds and a library of around 250k compounds. The few compounds are different each time but this library does not change.
To cut down the execution time of the protocol, I am writing the 250k fingerprints to a CSV file once, from another protocol (just like Leander here http://tech.knime.org/forum/knime-general/how-to-write-fingerprints-of-molecules-to-a-file).
I would like to read from this file and use the Indigo Fingerprint Similarity node. However, once read from a CSV Reader node, I cannot find a way to typecast the fingerprint column back to a BitVectorValue (from the String type used to write to the CSV file), which is required for the Fingerprint Similarity node.
I have tried using the Molecule Type Cast, Column Rename, Java Snippet (simple), OpenBabel nodes and have not found a solution yet.
Do I need to save the fingerprints to another file format?
Any idea on how to do this would be greatly appreciated.
Would it not be better to export the data to a KNIME Table File (using the Table Writer node) instead of saving it as a CSV file, and then read it in with the Table Reader node? That way the column types would be retained.
I do agree though, a Typecasting facility to the BitVectorValue would be useful.
I believe you can get this done with the Bitvector Generator node (from Mining > Item Sets/Association Rules). Just be sure to specify the "Parse bitvector from strings" option and you should be good to go. If not, please post back and let us know.
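For readers wondering what parsing a bit vector from a string amounts to, here is a minimal Python sketch. It is not the node's implementation, and it assumes the CSV column holds plain strings of '0' and '1' characters (the actual export may use another encoding, such as hex). The function names are hypothetical.

```python
def parse_bit_string(bits: str) -> int:
    """Turn a string of '0'/'1' characters into an integer used as a bit vector.

    Assumption: the leftmost character is the most significant bit.
    """
    cleaned = bits.strip()
    if not cleaned or set(cleaned) - {"0", "1"}:
        raise ValueError(f"not a bit string: {bits!r}")
    return int(cleaned, 2)


def to_bit_string(vector: int, length: int) -> str:
    """Write the bit vector back as a fixed-length string, e.g. for a CSV cell."""
    return format(vector, f"0{length}b")


if __name__ == "__main__":
    fp = parse_bit_string("0010110")
    print(bin(fp).count("1"))    # number of set bits
    print(to_bit_string(fp, 7))  # round trip back to text
```

The point is only that the string in the CSV and the typed bit vector carry the same information; the conversion step restores the type that the similarity node expects.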
In theory, I like Simon's suggestion better because of the impact on execution time: the Bitvector Generator node would have to run on all 250k rows, which may take a few extra seconds.
However, I did not know of such a node so I would like to thank Aaron too for the suggestion. It may come in handy in quite a few situations.
Simon, I will try using the Table file. Thanks for the tip on how to preserve column types.
For anyone coming across this thread like me:
1. Read in the bit string as a string.
2. Use the “Fingerprint From Binary String Node” with the “Dense” fingerprint type to convert it to a bitvector. Either type is correct, but Dense matches the RDKit output.
3. Always use the “Bit Vector Distances” for the fastest Tanimoto searches. Bit comparisons are always the fastest; if you use the Tanimoto or Similarity Search node alone, it may be much slower, although the Similarity Search node may have been improved since.
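To illustrate why bit comparisons are fast, here is a rough Python sketch of a Tanimoto search done directly on bit vectors. It is not what any KNIME node does internally, and it assumes the fingerprints are '0'/'1' strings of equal length. The names `tanimoto` and `search`, and the compound IDs in the example, are made up for illustration.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient: shared set bits divided by bits set in either vector."""
    common = bin(a & b).count("1")
    union = bin(a | b).count("1")
    return common / union if union else 0.0


def search(query_bits: str, library: dict, top_n: int = 3) -> list:
    """Rank library fingerprints (ID -> bit string) by similarity to the query."""
    query = int(query_bits, 2)
    scored = [(name, tanimoto(query, int(bits, 2))) for name, bits in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Hypothetical compound IDs and toy 8-bit fingerprints.
    library = {"cpd-1": "11001010", "cpd-2": "11001000", "cpd-3": "00110101"}
    for name, score in search("11001010", library):
        print(name, round(score, 3))
```

Each comparison is just one AND, one OR and two bit counts, which is why working on a true bit vector type tends to beat anything that compares the fingerprints as text.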