Wrong Term Frequencies

I have the following sentence in a “Table Creator” node: “It is going to rain today”.
First, I convert it to a document with “Strings To Document”, remove the punctuation, and create a BoW with the “Bag of Words Creator”. Then I use “TF” to create a column with term frequencies.

Although I have a single document in which every term occurs exactly once, I end up with an absolute term frequency of 2 for each term.

KNIME Workflow for TF (zip-file)

It would be great if you could have a look at it.
Thank you in advance.

Hi @b0raas,

You have also set the Sentences column as the Title in the dialog of the Strings To Document node.
This doubles the term frequencies, because the Title is also taken into account.
If you want to avoid this, you can select “Empty String” as the Title in the dialog.
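
To illustrate the effect outside of KNIME, here is a minimal Python sketch. It is not the KNIME implementation; it only assumes, as described above, that title and text are both counted. The function name `absolute_tf` is made up for this example.

```python
from collections import Counter


def absolute_tf(title: str, text: str) -> Counter:
    """Count terms over title and text together.

    Assumption: this mimics the behaviour described above, where the
    title contributes to the term counts. It is not KNIME code.
    """
    # str.split() without arguments ignores leading/trailing whitespace,
    # so an empty title adds no tokens.
    tokens = f"{title} {text}".lower().split()
    return Counter(tokens)


sentence = "It is going to rain today"

# Same column used for title and text: every term is counted twice.
with_title = absolute_tf(title=sentence, text=sentence)

# Empty title: every term is counted once.
empty_title = absolute_tf(title="", text=sentence)

print(with_title["rain"], empty_title["rain"])
```

With the sentence used as both title and text, every count comes out as 2; with an empty title, every count is 1.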

Best regards,
Julian

Thanks, this was indeed the error. However, I am wondering why the “Document” column turns into an empty string after I change the title to “Empty String”.

I always thought the “Document” column was content-related rather than title-related.

Hi @b0raas,

No, it is actually the other way around. Only the title is displayed in the Document cell. :)

It is so weird, because the column that appears empty is then used as the input for the punctuation removal and the “Bag of Words” model, which then produces the desired output.
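
A small Python sketch of why this is consistent. The `Document` class below is a hypothetical stand-in, not the KNIME type; it only assumes what was said in this thread, namely that the cell shows the title while the text is still stored and processed.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a document cell (not the KNIME class)."""

    title: str
    text: str

    def __str__(self) -> str:
        # Assumption from the thread: only the title is displayed.
        return self.title

    def terms(self) -> list[str]:
        # Downstream steps work on the stored content, not on the display.
        return f"{self.title} {self.text}".split()


doc = Document(title="", text="It is going to rain today")

displayed = str(doc)   # what the table cell would show under this assumption
terms = doc.terms()    # what later processing would still see

print(repr(displayed), terms)
```

The cell looks empty because the title is empty, but the text is still there for the following nodes.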
