minhash vector


I have a text file of minhash vectors, created with an external tool, that I want to use with ML algorithms. I currently don't know where to start or how to go about it, so I have a couple of questions.

  1. I am not sure how to read the file correctly so that I can use the vectors. It is currently just a plain text file that looks as follows (I can upload a test file too):
    AA 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903
    BB 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903
    CC 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903

  2. Once I can read these vectors, which learner can I use?

Thank you in advance,

Hi Pieter,

You can use the attached workflow to read your minhash vector file. The Random Forest Learner node can use bit vectors directly (e.g., fingerprints generated by the RDKit Fingerprint node), but that is not true for numeric vectors like the ones in your example. One option is to split each vector into numeric columns first and use the resulting columns as features. This example on the KNIME Hub shows how to train a random forest in KNIME: https://kni.me/w/Eot69vL_DI5V79us
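To illustrate the splitting step outside of KNIME, here is a minimal Python sketch. It assumes each line holds an identifier, then whitespace, then semicolon-separated integers, as in your example. The function names `parse_minhash_line` and `read_minhash_text` are made up for this illustration, and the sample text is a shortened stand-in built from a few values in your post, not your real data.

```python
from typing import Dict, List, Tuple


def parse_minhash_line(line: str) -> Tuple[str, List[int]]:
    """Split one line into its identifier and a list of integer hash values."""
    # Assumption: the identifier and the vector are separated by whitespace.
    identifier, values = line.strip().split(None, 1)
    # Assumption: hash values are separated by semicolons; empty pieces are skipped.
    vector = [int(v) for v in values.split(";") if v.strip()]
    return identifier, vector


def read_minhash_text(text: str) -> Dict[str, List[int]]:
    """Parse all non-empty lines; each vector position becomes one numeric feature."""
    rows: Dict[str, List[int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between records
        identifier, vector = parse_minhash_line(line)
        rows[identifier] = vector
    return rows


# Shortened, hypothetical sample (the real vectors are longer).
sample = "AA 2548668;3827191;6208312\n\nBB 2548668;3827191;52903\n"
rows = read_minhash_text(sample)
for identifier, vector in rows.items():
    print(identifier, vector)
```

Each position in the resulting lists corresponds to one numeric column, which is the same table layout you get after splitting the vector into columns in KNIME.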

support_read_molecule_vectors.knwf (643.5 KB)

Kind regards,
