How to find TF-IDF of specific words?

alamsaqib · November 25, 2015, 9:45am

Hi experts. I am new to KNIME. I have 5 text files named 1500.txt, 1600.txt, ... 1900.txt. Using Python, I selected the 200 most frequently used words from 1500.txt and counted how often each of them occurs in the other text files. Here is an example of those words:

Words	1500	1600	1700	1800	1900
love	1775	904	897	887	798
great	1564	832	2044	2025	1574
good	1508	1599	2009	1671	1329
thee	1494	1644	1023	339	75
lord	1203	877	2110	823	84

My question is: how can I calculate the TF-IDF of each of these 200 words in each text file? I am attaching the list of 200 words for reference. I hope my question is clear.

Best wishes

Alam

wordscount.csv

ImNotGoodSry · November 25, 2015, 10:25am

Hi Alam,

You can use either the Math Formula node or the Java Snippet node. The corresponding formulas can be found here:

https://en.wikipedia.org/wiki/Tf–idf
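
For quick reference, one common variant described on that page can be written as below. The symbols are my own choice for illustration: f_{t,d} is the raw count of term t in document d, D is the set of documents, and N = |D| (here N = 5). Other weighting variants exist, so treat this as one option rather than the only definition.

```latex
\mathrm{tf}(t,d) = f_{t,d}
\qquad
\mathrm{idf}(t,D) = \log\frac{N}{\left|\{\, d \in D : t \in d \,\}\right|}
\qquad
\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t,D)
```

Note that a word that occurs in every one of the N documents gets idf = log(N/N) = 0 under this variant.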

I hope that's what you were looking for.

Best,
Marc

alamsaqib · November 25, 2015, 12:18pm

Thanks Marc. Actually, I don't know which nodes to use or how to set them up. Could you please help me build the workflow and walk me through the nodes step by step?

Best wishes

Alam

ImNotGoodSry · November 25, 2015, 3:47pm

You can import the data from your CSV file using the CSV Reader node. After that, you can use one or more Math Formula nodes to calculate whatever you need (normalized term frequency, inverse document frequency). Alternatively, you can use the Java Snippet node if you are familiar with Java.
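
Since you already use Python, here is a minimal sketch of the same calculation outside KNIME, so you can check the numbers your nodes produce. It assumes the tab-separated layout from your post (the real wordscount.csv may use a different delimiter), uses the raw count as the term frequency, and uses log(N / df) as the inverse document frequency. The function names `read_counts` and `tf_idf` are made up for this example. A normalized term frequency would also need the total number of words in each file, which is not in the table.

```python
import csv
import io
import math

# Sample rows copied from the question (tab-separated counts per file).
SAMPLE = """Words\t1500\t1600\t1700\t1800\t1900
love\t1775\t904\t897\t887\t798
great\t1564\t832\t2044\t2025\t1574
good\t1508\t1599\t2009\t1671\t1329
thee\t1494\t1644\t1023\t339\t75
lord\t1203\t877\t2110\t823\t84
"""


def read_counts(text, delimiter="\t"):
    """Return (document names, {word: [count per document]})."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    counts = {row[0]: [int(c) for c in row[1:]] for row in reader if row}
    return header[1:], counts


def tf_idf(counts):
    """Raw-count tf times log(N / df), one score per document."""
    n_docs = len(next(iter(counts.values())))
    scores = {}
    for word, row in counts.items():
        df = sum(1 for c in row if c > 0)  # documents containing the word
        idf = math.log(n_docs / df) if df else 0.0
        scores[word] = [c * idf for c in row]
    return scores


docs, counts = read_counts(SAMPLE)
scores = tf_idf(counts)
for word, row in scores.items():
    print(word, [round(v, 3) for v in row])
```

One thing to watch for: all five sample words occur in all five files, so their idf is log(5/5) = 0 and every score printed above is 0. With this plain idf variant, only words that are missing from at least one file get a non-zero TF-IDF.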

If you still don't understand what I mean, you may find some tutorials and example workflows somewhere on the KNIME homepage.

Good luck! :-)

Marc

alamsaqib · November 25, 2015, 3:53pm

Thanks a lot Marc :)
