Hi,
Structural Interaction Fingerprints (SIFts) describe the interactions of a molecule (e.g. a drug) with the individual amino acid residues of a protein. Once parsed into a CSV-like format, a typical SIFt file may look like this (the first row is the header):
Moleculename,TYR122,SER232,LYS111
MYDRUG1, 10001100,00000000,11000000
MYDRUG2, 00110001,11000011,00000000
Each of the eight bit positions has a particular meaning.
I am attempting to import such a file using the CSV Reader, File Reader and XLS Reader nodes, and then to cluster the molecules. Note that the three per-residue fingerprints may either be combined into one bit vector or used separately.
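To make "combined vs. separate" concrete, here is a minimal sketch that compares two molecules with Tanimoto (Jaccard) similarity, a common choice for clustering bit vectors. The similarity measure and the molecule MYDRUG3 are assumptions for illustration only; they are not from the data above and not necessarily what KNIME's clustering nodes use.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto (Jaccard) similarity of two equal-length '0'/'1' strings."""
    if len(a) != len(b):
        # Bit vectors of unequal length cannot be compared position by position.
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    on_a = {i for i, c in enumerate(a) if c == "1"}
    on_b = {i for i, c in enumerate(b) if c == "1"}
    union = on_a | on_b
    if not union:
        # Both all-zero: similarity is undefined; this sketch returns 0.0.
        return 0.0
    return len(on_a & on_b) / len(union)


# Blocks in header order: TYR122, SER232, LYS111.
drug1 = ["10001100", "00000000", "11000000"]  # MYDRUG1 from the example
drug3 = ["10000100", "00000000", "11000011"]  # hypothetical molecule

# Combined: one 24-bit vector per molecule.
combined = tanimoto("".join(drug1), "".join(drug3))
# Separate: one similarity per residue.
per_residue = [tanimoto(x, y) for x, y in zip(drug1, drug3)]
print(combined, per_residue)
```

The combined score (4/7 here) weights every set bit equally across residues, while the per-residue scores keep each residue's contribution visible. Either way, all vectors must have the same length, which is why the import issues below matter.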
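Issue #1: A bit string of eight zeros is always imported as 0, because the column is detected as numeric. For the same reason, strings with leading zeros, such as 00110001, lose those zeros. When these numbers are then converted to bit strings with the bitstring generator, the resulting bit vectors have unequal lengths, which may affect the clustering results.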
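The effect can be reproduced outside KNIME. This is a minimal sketch, assuming the reader simply parses each value as an integer and prints it back; the helper names are hypothetical. Because every SIFt block is known to be exactly 8 bits, left-padding with zeros restores the original string.

```python
def as_numeric_then_back(bits: str) -> str:
    # Mimic a reader that types the column as an integer: the text is
    # parsed with int() and printed again, so leading zeros are lost.
    return str(int(bits))


def restore_width(value: str, width: int = 8) -> str:
    # Left-pad with zeros back to the fixed per-residue width.
    # Only safe because every block is known to be exactly `width` bits.
    if len(value) > width or set(value) - {"0", "1"}:
        raise ValueError(f"not a {width}-bit string: {value!r}")
    return value.zfill(width)


for original in ["10001100", "00000000", "00110001"]:
    truncated = as_numeric_then_back(original)
    print(original, "->", truncated, "->", restore_width(truncated))
# prints:
# 10001100 -> 10001100 -> 10001100
# 00000000 -> 0 -> 00000000
# 00110001 -> 110001 -> 00110001
```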
Issue #2: If the bit strings above are wrapped in quotes to declare them as strings, they are imported correctly, but it then becomes hard to merge them into a single bit vector. This takes several operations and is not very satisfactory.
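For comparison, this is what the merge step amounts to when the columns are kept as text. It is a minimal sketch using Python's standard csv module, not a KNIME workflow; `read_sift` and `SAMPLE` are hypothetical names. It also zero-pads any value that has already lost its leading zeros.

```python
import csv
import io

SAMPLE = """Moleculename,TYR122,SER232,LYS111
MYDRUG1, 10001100,00000000,11000000
MYDRUG2, 00110001,11000011,00000000
"""


def read_sift(text: str, width: int = 8):
    """Return (residue names, {molecule: concatenated bit string})."""
    # csv.reader never converts types, so '00000000' stays a string;
    # skipinitialspace drops the space after 'MYDRUG1,'.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    residues = next(reader)[1:]
    fingerprints = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        name, *blocks = row
        padded = []
        for block in blocks:
            block = block.strip().zfill(width)
            if len(block) != width or set(block) - {"0", "1"}:
                raise ValueError(f"{name}: bad block {block!r}")
            padded.append(block)
        # Concatenate in header order into one fixed-length bit string.
        fingerprints[name] = "".join(padded)
    return residues, fingerprints


residues, fps = read_sift(SAMPLE)
print(residues)          # ['TYR122', 'SER232', 'LYS111']
print(fps["MYDRUG1"])    # 100011000000000011000000
```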
Issue #3: I formatted the Excel cells as Text via the Format menu and removed the double quotes. On reopening the file, Excel and OpenOffice both display the values as text, so the sequence of eight 0's is shown correctly. However, importing this Excel file into KNIME again turns them into a single 0. KNIME does not take the cell type metadata into account, nor does it offer a dialog for changing the types of individual columns during import.
I posted this here rather than in the cheminformatics forum because it is a general issue with bit vectors.