Writing fingerprints in something other than bitstrings?


I am currently struggling with writing fingerprints (e.g. from CDK or RDKit) to a file in something other than a bitstring (the typical way to do so is to use a Column Rename node to convert the fingerprint column into a (bit)string column). I would prefer Base64 for file size reasons, or at least hex format. Java converters exist for these, but they require the fingerprint to be in the right input format, and it is not yet obvious to me which internal format KNIME uses for fingerprints.

For example, fingerprints cannot be converted directly to Base64 using javax.xml.bind.DatatypeConverter.printBase64Binary($FINGERPRINT$), because this requires $FINGERPRINT$ to be a byte array (byte[]), but the compiler reports it as java.lang.String (which surprises me, as I expected a binary fingerprint to be represented by a binary object). So one could expect that converting the fingerprint to a byte array first should help:

byte[] fp_bin = $FINGERPRINT$.getBytes();
String Base64 = javax.xml.bind.DatatypeConverter.printBase64Binary(fp_bin);

This does produce a valid Base64 string, but it is the wrong size.

So, does anybody have a solution for this?


You can use the BitVector generator node, which will generate hex.


In the particular case of RDKit fingerprints, I have figured out in the meantime that the surprisingly short Base64 string is a result of RDKit fingerprints being run-length encoded. To get fixed-length Base64 strings you need to create the full-length fingerprint first, e.g. via the bitstring, which can be generated using a "Column Rename" node applied to the fingerprint column with output format "String Value". The bitstring can then be converted to a byte array and written as Base64 with a "Java Snippet" node:

String bitString = $FP_RDKIT$;
// One output byte per 8 characters of the bitstring
byte[] fp_bytes = new byte[bitString.length() / 8];

for (int i = 0; i < bitString.length(); i += 8) {
  // Parse each 8-character chunk as a base-2 number
  String byteString = bitString.substring(i, i + 8);
  fp_bytes[i / 8] = (byte) Integer.parseInt(byteString, 2);
}

return javax.xml.bind.DatatypeConverter.printBase64Binary(fp_bytes);

(The return type of the snippet is String, of course.)
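
For anyone who wants to try the conversion outside of KNIME, here is a minimal standalone sketch of the same idea. It rests on a few assumptions: the class and method names (FingerprintEncoder, bitStringToBytes, toBase64, toHex) are made up for illustration, the bitstring length is assumed to be a multiple of 8, and the leftmost character of each 8-character chunk is treated as the most significant bit, as in the snippet above. It uses java.util.Base64 (available since Java 8) instead of javax.xml.bind.DatatypeConverter, since the javax.xml.bind module is no longer bundled with the JDK from Java 11 on. It also adds a hex variant, since hex was mentioned as an acceptable alternative.

```java
import java.util.Base64;

// Hypothetical helper class, not part of KNIME, CDK or RDKit.
class FingerprintEncoder {

    // Packs a string of '0'/'1' characters into bytes, 8 characters per byte.
    // Assumption: the first character of each chunk is the most significant bit.
    static byte[] bitStringToBytes(String bitString) {
        if (bitString.length() % 8 != 0) {
            throw new IllegalArgumentException(
                "bitstring length must be a multiple of 8, got " + bitString.length());
        }
        byte[] bytes = new byte[bitString.length() / 8];
        for (int i = 0; i < bitString.length(); i += 8) {
            // parseInt with radix 2 yields 0..255; the cast keeps the low 8 bits
            bytes[i / 8] = (byte) Integer.parseInt(bitString.substring(i, i + 8), 2);
        }
        return bytes;
    }

    // Base64 encoding of the packed bits (about 4 characters per 24 bits).
    static String toBase64(String bitString) {
        return Base64.getEncoder().encodeToString(bitStringToBytes(bitString));
    }

    // Lowercase hex encoding of the packed bits (2 characters per 8 bits).
    static String toHex(String bitString) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bitStringToBytes(bitString)) {
            sb.append(String.format("%02x", b & 0xff)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String bits = "0100000101000010"; // toy 16-bit example, not a real fingerprint
        System.out.println(toBase64(bits)); // prints QUI=
        System.out.println(toHex(bits));    // prints 4142
    }
}
```

In terms of size, a 1024-bit fingerprint takes 1024 characters as a bitstring, 256 as hex and 172 as Base64 (including padding), so Base64 is the most compact of the three text formats.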