I have a text file with minhash vectors that I want to use with ML algorithms. I created the file with an external tool, and I don't yet know where to start or how to go about it. I have a couple of questions.
I am not sure how to read the file correctly so that I can use the vectors. Currently it is just a plain text file that looks like the sample below (I can upload a test file too).
For example:
AA 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903
BB 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903
…
…
…
CC 2548668;3827191;6208312;4793180;544975; … 9013842;7510492;3094977;52903
Once I can read these vectors, which learner can I use?
You can use the attached workflow to read your minhash vector file. The Random Forest Learner node can use bit vectors directly (e.g., fingerprints generated by the RDKit Fingerprint node), but that is not true for numeric vectors like the ones in your example. One option is to split the vectors into numerical columns first and use the resulting columns as features. This example on the KNIME Hub shows how to train a random forest in KNIME: https://kni.me/w/Eot69vL_DI5V79us
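If it helps to see the "split into numerical columns" step outside KNIME, here is a minimal Python sketch. It is not the attached workflow. It assumes each line is a label, then whitespace, then hash values separated by semicolons, as in the sample above. The function names and the file name in the usage note are hypothetical.

```python
from typing import List, Tuple


def parse_minhash_line(line: str) -> Tuple[str, List[int]]:
    """Split one line into (label, list of integer hash values).

    Assumed layout: "<label> <v1>;<v2>;...;<vn>".
    """
    parts = line.split(None, 1)  # split on the first run of whitespace
    if not parts:
        raise ValueError("empty line")
    label = parts[0]
    vector_text = parts[1] if len(parts) > 1 else ""
    # Skip empty tokens so a trailing ';' does not break the parse.
    values = [int(tok) for tok in vector_text.split(";") if tok.strip()]
    return label, values


def read_minhash_file(path: str) -> Tuple[List[str], List[List[int]]]:
    """Read the whole file into a list of labels and a list of rows.

    Each row holds one vector; position i in every row becomes feature column i.
    """
    labels: List[str] = []
    rows: List[List[int]] = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # ignore blank lines
            label, values = parse_minhash_line(line)
            labels.append(label)
            rows.append(values)
    return labels, rows
```

Calling `read_minhash_file("minhash.txt")` (hypothetical file name) would give one label per row and one integer column per vector position, which is the same table shape the learner expects after splitting the vectors in KNIME.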