Preprocessed Document (Bag of Words, Terms)

Hey all,
Is there a way in KNIME to display the preprocessed documents with only the terms that remain after filtering?

This picture shows the output of the Reference Row Filter node.


As you can see, the column “preprocessed documents” still contains the words that have already been filtered out.

I am having trouble calculating the relative term frequency with the ‘TF’ node. As you can see in the figure, the word ‘PIN’ appears twice in the record. Instead of dividing that count by the number of remaining terms in the document, in this case 8, the node divides it by the total number of words, even though some words have been filtered out and should no longer be counted.
image

image

I would be very grateful for your help.

Thanks,
Canan

Hey Canan,

The Reference Row Filter does not remove terms from a document; it only filters rows. If you filter rows from your table, those rows are removed, but the terms remain in the document itself. The node you are looking for is the Dictionary Filter. Pass the documents (not the Bag of Words) to the upper input port and a dictionary (a column containing Strings) to the second input port. Afterwards you can apply the Bag Of Words Creator.

Then the calculation of the term frequency should also be correct.
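
To illustrate why the order matters, here is a minimal sketch in plain Python. It is not KNIME code and does not reproduce how the Dictionary Filter or TF nodes are implemented; the function names and the example sentence are made up for illustration. It only shows that the relative term frequency changes depending on whether unwanted words are removed from the document itself before counting.

```python
from collections import Counter


def dictionary_filter(tokens, dictionary):
    """Keep only the tokens that appear in the dictionary (hypothetical helper)."""
    return [token for token in tokens if token in dictionary]


def relative_tf(tokens):
    """Relative term frequency: occurrences of a term / number of terms in the document."""
    total = len(tokens)
    if total == 0:
        return {}
    return {term: count / total for term, count in Counter(tokens).items()}


# Made-up document with 13 words; 'PIN' appears twice.
original = ["the", "PIN", "card", "was", "blocked", "after", "a",
            "wrong", "PIN", "entry", "at", "bank", "terminal"]

# Made-up dictionary of the terms that should be kept.
dictionary = {"PIN", "card", "blocked", "wrong", "entry", "bank", "terminal"}

# Filtering the document itself leaves 8 terms.
filtered = dictionary_filter(original, dictionary)

tf_unfiltered = relative_tf(original)["PIN"]  # 2 / 13: filtered-out words still counted
tf_filtered = relative_tf(filtered)["PIN"]    # 2 / 8: only remaining terms counted

print(len(filtered), tf_unfiltered, tf_filtered)
```

Removing rows from the term table corresponds to the first case: the document still contains all 13 words, so the denominator stays too large. Filtering the document first and rebuilding the bag of words corresponds to the second case.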

Cheers,

Julian