How to find TF-IDF of specific words?

alamsaqib · November 25, 2015, 5:07am

Hi experts. I am new to KNIME. I have five text files named 1500.txt, 1600.txt, ..., 1900.txt. Using Python, I selected the 200 most frequently used words in 1500.txt and counted how often each of them occurs in the other text files. Here is a sample of those counts:

Words	1500	1600	1700	1800	1900
love	1775	904	897	887	798
great	1564	832	2044	2025	1574
good	1508	1599	2009	1671	1329
thee	1494	1644	1023	339	75
lord	1203	877	2110	823	84

Now I want to calculate the TF-IDF of each of these 200 words in each text file. I am attaching the full list of 200 words for reference. I hope my question is clear.
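For reference, here is a minimal Python sketch of what the calculation looks like on the five sample rows above. It assumes the common textbook definition, TF-IDF = raw count × ln(N / df), where N is the number of files and df is the number of files containing the word; other tools may define it differently. The function name `tf_idf` and the `smooth` flag are made up for this illustration. The smoothed variant, ln((1 + N) / (1 + df)) + 1, is the one scikit-learn uses by default.

```python
import math

# Counts copied from the sample table above (one list entry per file).
files = ["1500", "1600", "1700", "1800", "1900"]
counts = {
    "love":  [1775, 904, 897, 887, 798],
    "great": [1564, 832, 2044, 2025, 1574],
    "good":  [1508, 1599, 2009, 1671, 1329],
    "thee":  [1494, 1644, 1023, 339, 75],
    "lord":  [1203, 877, 2110, 823, 84],
}

def tf_idf(counts, n_docs, smooth=False):
    """Return {word: [score per file]} using raw counts as TF.

    Hypothetical helper for illustration only.
    """
    scores = {}
    for word, row in counts.items():
        # Document frequency: number of files in which the word occurs.
        df = sum(1 for c in row if c > 0)
        if smooth:
            idf = math.log((1 + n_docs) / (1 + df)) + 1
        else:
            idf = math.log(n_docs / df) if df else 0.0
        scores[word] = [c * idf for c in row]
    return scores

scores = tf_idf(counts, len(files))
for word, row in scores.items():
    print(word, row)
```

One thing worth noticing: every sample word occurs in all five files, so with the plain formula the IDF is ln(5/5) = 0 and every score is zero. With only five documents and very common words, the smoothed variant (or plain TF) may be more informative.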

Best wishes

Alam

wordscount.csv

kilian.thiel · November 27, 2015, 9:14am

Hi Alam,

Use the Flat File Document Parser to read in the txt files. Use the Dictionary Tagger to tag the 200 most frequent words that you counted before. Then use the General Tag Filter to filter out all terms that have not been tagged (this reduces the number of terms and the amount of data), followed by Bag of Words Creator -> TF (absolute).
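For comparison, the same idea (keep only the dictionary words, then count absolute term frequencies) can be sketched in plain Python. This is not the KNIME workflow itself, just a rough equivalent under some assumptions: the tokenizer is a simple regular expression, the function names are invented for this example, the two tiny texts are made up as stand-ins for the real files, and the IDF step, ln(N / df), is added on top of the TF step described above.

```python
import math
import re
from collections import Counter

def term_counts(text, vocabulary):
    """Absolute term frequency, restricted to the dictionary words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in vocabulary)

def tf_idf_table(docs, vocabulary):
    """docs maps a document name to its text; returns {name: {word: score}}."""
    tf = {name: term_counts(text, vocabulary) for name, text in docs.items()}
    n_docs = len(docs)
    table = {}
    for name, counts in tf.items():
        table[name] = {}
        for word in vocabulary:
            # Number of documents that contain the word at least once.
            df = sum(1 for c in tf.values() if c[word] > 0)
            idf = math.log(n_docs / df) if df else 0.0
            table[name][word] = counts[word] * idf
    return table

# Made-up texts standing in for the real files (which would be read from disk).
docs = {
    "a.txt": "Love is great, love is good.",
    "b.txt": "Good lord, thee again.",
}
vocabulary = {"love", "great", "good", "thee", "lord"}

table = tf_idf_table(docs, vocabulary)
for name, row in table.items():
    print(name, row)
```

Filtering to the dictionary before counting keeps the table small, which is the same reason for filtering out untagged terms in the workflow.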

I hope this helps.

Cheers, Kilian

alamsaqib · November 27, 2015, 11:02am

Thanks for the reply, Kilian. But as I mentioned in my other post, I am having a problem loading the text file.

Best wishes

Alam

kilian.thiel · November 27, 2015, 12:07pm

I tried the Flat File Document Parser and it worked for me; see: https://tech.knime.org/forum/knime-textprocessing/flat-file-document-parser-problem

Cheers, Kilian
