Different Results when processing example CSV

GoHawks1423 · July 17, 2017, 2:42am

I am trying to learn KNIME and build a text classifier. I am going through the Document Classification example and using it as the basis for my own workflow. My data is a CSV file with one column for the text and another for the category.

The example uses Table Reader nodes to read in the data. I took the concatenated output of that workflow and exported it to a CSV file so that it mimics my data. Everything else is built the same way, and I get identical results up to the Term Filtering node; the only difference is that I read the exact same data with a File Reader instead of a Table Reader. However, when I get to the Bag of Words step in the Term Filtering section, the results suddenly diverge: the example workflow produces many more output rows. Is there something I'm not accounting for? All of the settings are identical, but the results are vastly different. Is Strings To Document tokenizing the CSV entries differently? I just used the default tokenizer.
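One thing worth ruling out (an assumption, not a confirmed diagnosis) is whether the CSV round trip kept every document intact. Text fields that contain commas, double quotes or line breaks only survive if the reader honors quoting. The sketch below uses Python's standard `csv` module on hypothetical sample rows; `read_naive` is a hypothetical stand-in for a reader configured without quote handling, not a model of KNIME's File Reader.

```python
import csv
import io

# Hypothetical sample rows (text, category); real documents often contain
# commas, double quotes and line breaks, which is where CSV round trips break.
rows = [
    ("Stocks rose, then fell", "finance"),
    ('He said "hello"\nand left', "news"),
    ("Plain text", "misc"),
]


def write_csv(rows):
    """Serialize rows; csv.writer quotes any field that needs it."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf)
    writer.writerow(["text", "category"])
    writer.writerows(rows)
    return buf.getvalue()


def read_quote_aware(data):
    """Parse honoring quotes, so embedded commas and newlines stay in one cell."""
    reader = csv.reader(io.StringIO(data, newline=""))
    next(reader)  # skip the header row
    return [tuple(record) for record in reader]


def read_naive(data):
    """Hypothetical misconfigured reader: split on line breaks and commas, ignore quotes."""
    lines = data.splitlines()[1:]  # drop the header
    return [tuple(line.split(",")) for line in lines]


data = write_csv(rows)
print(read_quote_aware(data) == rows)  # True: the quote-aware round trip is lossless
print(len(read_naive(data)), "rows from naive parsing vs", len(rows))  # 4 vs 3
```

If the imported table has a different row count, or text cells that look cut off at a comma or quote, the reader's quote and line-break settings are a likely place to look.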

Iris · July 26, 2017, 8:38am

Hi Hawks,

this is really difficult to diagnose without actually looking at your workflow.

Best, Iris

UtilityHawk · September 9, 2017, 4:59am

Here is my workflow. Maybe I'm doing it wrong. I'm just learning this software, but this looked fairly straightforward.

01_Document_clustering.knwf

UtilityHawk · September 9, 2017, 6:14am

As an example, when I run the workflow up to the Bag of Words Creator, the output table in the example has 25,009 rows. With exactly the same settings, I only get 1,726 rows from my CSV import.
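To see where the row count drops, one option is to export the document texts from both branches and compare them document by document. The sketch below rests on two assumptions: that a bag-of-words table has roughly one row per distinct term in each document, and that a plain lowercase whitespace tokenizer is a fair stand-in (KNIME's default tokenizer splits text differently). `doc_term_pairs`, `diff_docs` and the sample data are hypothetical.

```python
from collections import Counter


def doc_term_pairs(documents):
    """Distinct (doc_id, term) pairs, roughly one bag-of-words row each.

    Assumption: a plain lowercase whitespace tokenizer, not KNIME's default tokenizer.
    """
    return {
        (doc_id, term)
        for doc_id, text in documents.items()
        for term in text.lower().split()
    }


def diff_docs(reference, candidate):
    """Map each document whose distinct-term count differs to (reference, candidate)."""
    ref_counts = Counter(doc_id for doc_id, _ in doc_term_pairs(reference))
    cand_counts = Counter(doc_id for doc_id, _ in doc_term_pairs(candidate))
    return {
        doc_id: (ref_counts[doc_id], cand_counts[doc_id])
        for doc_id in ref_counts.keys() | cand_counts.keys()
        if ref_counts[doc_id] != cand_counts[doc_id]
    }


# Hypothetical data: d1 was truncated at the first comma on the CSV path.
reference = {"d1": "Stocks rose, then fell", "d2": "Plain text"}
from_csv = {"d1": "Stocks rose", "d2": "Plain text"}

print(len(doc_term_pairs(reference)), len(doc_term_pairs(from_csv)))  # 6 4
print(diff_docs(reference, from_csv))  # {'d1': (4, 2)}
```

Documents that show up in the diff point to either truncated text on import or terms being filtered differently downstream.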

system · June 2, 2023, 9:45pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.