I am trying to learn KNIME and build a text classifier. I am going through the Document Classification example and using it as the basis for my own. My data is in a CSV file with one column for the text and another for the category.
The example uses Table Reader nodes to read in data. I took the concatenated result of that model and exported it to a CSV so that it mimics my data. Everything is built the same, and I get identical results up to the Term Filtering node. Basically, instead of using the Table Reader, I read the exact same data via File Reader. However, when I get to the Bag of Words part in the Term Filtering, I suddenly get different results: the example model produces many more rows of output. Is there something I'm not accounting for? All of the settings are identical, but the results are vastly different. Is the Strings To Document node tokenizing the CSV entries differently? I just used the default tokenizer.