How to find TF-IDF of specific words?

Hi experts. I am new to KNIME. I have 5 text files named 1500.txt, 1600.txt, ..., 1900.txt. Using Python, I selected the 200 most frequently used words from 1500.txt and counted how often each of them occurs in the other text files. The following is an example of those words and their counts per file:

Words  1500  1600  1700  1800  1900
love   1775   904   897   887   798
great  1564   832  2044  2025  1574
good   1508  1599  2009  1671  1329
thee   1494  1644  1023   339    75
lord   1203   877  2110   823    84

 

My question: I want to calculate the TF-IDF of these 200 words in each text file. I am attaching the list of 200 words for reference. I hope my question is clear.

Best wishes

Alam

Hi Alam,

Use the Flat File Document Parser to read in the txt files. Use the Dictionary Tagger to tag the 200 most frequent words that you counted before. Then use the General Tag Filter (to filter out all terms that have not been tagged, which reduces the number of terms and the amount of data) -> Bag of Words Creator -> TF (absolute).
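The workflow above ends at the absolute term frequency; TF-IDF additionally needs an inverse document frequency (IDF) factor multiplied onto each TF value. Since the word counts were originally produced in Python, the same idea can be sketched there as a cross-check. This is only a minimal sketch under stated assumptions: the function name `tf_idf_for_words`, the simple regex tokenizer, the natural-log IDF formula log(N / df), and the toy texts are all illustrative choices, not something taken from KNIME, and the KNIME nodes may use a different IDF variant.

```python
import math
import re
from collections import Counter


def tf_idf_for_words(docs, words, smooth=False):
    """Compute TF-IDF for a fixed word list (hypothetical helper).

    docs:  {document name: full text}
    words: the dictionary terms of interest (e.g. the 200 frequent words)
    Returns {word: {document name: tf-idf}} with absolute counts as TF.
    """
    vocab = {w.lower() for w in words}
    # Absolute term frequency per document, restricted to the dictionary
    # words. The tokenizer is an assumption: lowercase runs of letters
    # and apostrophes.
    counts = {
        name: Counter(t for t in re.findall(r"[a-z']+", text.lower()) if t in vocab)
        for name, text in docs.items()
    }
    n_docs = len(docs)
    scores = {}
    for w in sorted(vocab):
        # Document frequency: number of documents containing the word
        df = sum(1 for c in counts.values() if c[w] > 0)
        if smooth:
            # Smoothed variant, so words present in every document keep weight
            idf = math.log((1 + n_docs) / (1 + df)) + 1
        else:
            idf = math.log(n_docs / df) if df else 0.0
        scores[w] = {name: counts[name][w] * idf for name in docs}
    return scores


# Toy texts for illustration only; real use would read 1500.txt etc. from disk.
toy_docs = {
    "1500.txt": "love thee lord love",
    "1600.txt": "love great good",
    "1700.txt": "great good lord",
}
for word, per_doc in tf_idf_for_words(toy_docs, ["love", "thee", "lord"]).items():
    print(word, per_doc)
```

One thing to keep in mind with only five documents: a word that occurs in every file has df = N, so the plain IDF is log(1) = 0 and its TF-IDF is 0 in all files. All five sample words in the table above occur in every file, so they would fall into this case; passing `smooth=True` in this sketch is one way to keep such words from collapsing to zero.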

I hope this helps.

Cheers, Kilian

Thanks, Kilian, for the reply. But as I mentioned in my other post, I am having a problem loading the text file.

Best wishes

Alam

I tried the Flat File Document Parser and it worked for me, see: https://tech.knime.org/forum/knime-textprocessing/flat-file-document-parser-problem

Cheers, Kilian
