Hello,
I have converted PubChem bit vectors to strings and split each bit position into its own column so that I can use the bit positions as predictor variables in QSAR models. I know which bit positions contribute to the activity and would like to retrieve the chemical features those bits represent. PubChem provides a description of each bit (ftp://ftp.ncbi.nih.gov/pubchem/data_spec/pubchem_fingerprints.txt). However, the PubChem fingerprint I computed with the CDK node has 896 bits, not the 881 bits expected for the PubChem fingerprint, so I cannot map bits to PubChem features directly. I also tried treating the last 15 bits as padding, but the bits that are on and off within the first 881 positions do not agree with the structures, so I do not believe it is a simple matter of padding. Any insight into how to map CDK PubChem bit positions to the chemical features described in the above link would be much appreciated.
Thanks,
-Dan