Fingerprint

Into what format should molecular fingerprints (e.g. Morgan) be transformed so that they can be used together with all the other numeric descriptors in ML nodes? I know some of the learners have an option to use a fingerprint instead of other molecular descriptors, but I'd like to use fingerprints and molecular descriptors as part of the same feature space.

Thanks

Hi @mining12

A fingerprint is encoded in KNIME as a column of type BitVector. The -Expand Bit Vector- node expands a bit vector column and generates as many columns as there are bits in the bit vector.

If, for instance, your Morgan fingerprint is of size 1024, you could expand it to 1024 columns using this node. If your aim is to mix different types of variables, i.e. bit vectors and PhysChem descriptors, you could use the -Expand Bit Vector- node first and then the -Column Appender- node (or the -Joiner- node) to combine the different sources of columns. You would also need to normalize the columns (for instance using Gaussian normalization) so that they are all on a comparable scale. Otherwise you introduce a bias towards the variables with the largest ranges.
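For anyone who wants to see the logic outside of KNIME, here is a minimal sketch of the same three steps (expand, append, normalize) in plain Python. It is only an illustration and not what the nodes do internally: the 4-bit fingerprints, the two descriptor values and the helper names `expand_bits` and `zscore_columns` are all made up for the example.

```python
from statistics import mean, pstdev


def expand_bits(bitstring):
    """Turn a bit string such as '1010' into one 0/1 integer per bit."""
    return [int(ch) for ch in bitstring]


def zscore_columns(rows):
    """Z-score (Gaussian) normalize each column; constant columns become 0.0."""
    scaled_columns = []
    for col in zip(*rows):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append(
            [0.0 if sigma == 0 else (v - mu) / sigma for v in col]
        )
    # Transpose back from columns to rows
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy data: 4-bit fingerprints and two numeric descriptors per molecule
fingerprints = ["1010", "0110", "1100"]
descriptors = [[180.2, 1.2], [250.3, 2.5], [310.4, 3.1]]

# Step 1 + 2: expand each fingerprint and append the descriptor columns
combined = [expand_bits(fp) + desc for fp, desc in zip(fingerprints, descriptors)]

# Step 3: bring all columns onto a comparable scale
features = zscore_columns(combined)
```

Each row of `features` then has one value per bit plus one per descriptor. Note that a bit that is never set (or always set) carries no information, which is why the sketch maps constant columns to zero instead of dividing by zero.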

Hope all this helps to answer your question.

Best

Ael

Thanks, that works well. BTW, is there a y-scrambler/randomization node that randomizes the classes?
