How to find TF-IDF of specific words?

Hi experts. I am new to KNIME. I have 5 text files named 1500.txt, 1600.txt, ... 1900.txt. Using Python, I selected the 200 most frequently used words from 1500.txt and counted how often each of them occurs in the other text files. Here is a sample of those counts:

Words  1500  1600  1700  1800  1900
love   1775   904   897   887   798
great  1564   832  2044  2025  1574
good   1508  1599  2009  1671  1329
thee   1494  1644  1023   339    75
lord   1203   877  2110   823    84

 

Now I want to calculate the TF-IDF of each of these 200 words in each text file. I am attaching the list of 200 words for reference. I hope my question is clear.

Best wishes

Alam

Hi Alam,

you can use either the Math Formula node or the Java Snippet node. The corresponding formulas can be found here:

https://en.wikipedia.org/wiki/Tf–idf
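
For quick reference, one common variant described on that page can be written as below. The symbols are my own labelling for this thread: f_{t,d} is the raw count of word t in file d, N is the number of files (5 here), and n_t is the number of files that contain t. Other weighting schemes exist, so treat this as one reasonable choice rather than the only definition.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{n_t}, \qquad
\mathrm{tfidf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)
```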

I hope that's what you were looking for.

Best,
Marc

Thanks Marc. Actually, I don't know which nodes to use or how to set them up. Could you please help me build the workflow and explain the nodes step by step?

Best wishes

Alam

You can import the data from your CSV file using the CSV Reader node. After that, you can use one or more Math Formula nodes to calculate the quantities you need (normalized term frequency, inverse document frequency). Alternatively, you can use the Java Snippet node if you are familiar with Java.
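
Since the question mentions Python, here is a rough sketch of the same calculation outside KNIME, which may help to check what the Math Formula nodes should produce. It assumes the plain variant tf = count / total words in the file and idf = ln(N / number of files containing the word). The function name `tf_idf` and the `doc_totals` argument are made up for this example. The real total word count of each file is not given in this thread, so the column sums of the five sample rows are used as a stand-in.

```python
import math

# Counts copied from the sample table in the question: word -> count per file.
counts = {
    "love":  {"1500": 1775, "1600": 904,  "1700": 897,  "1800": 887,  "1900": 798},
    "great": {"1500": 1564, "1600": 832,  "1700": 2044, "1800": 2025, "1900": 1574},
    "good":  {"1500": 1508, "1600": 1599, "1700": 2009, "1800": 1671, "1900": 1329},
    "thee":  {"1500": 1494, "1600": 1644, "1700": 1023, "1800": 339,  "1900": 75},
    "lord":  {"1500": 1203, "1600": 877,  "1700": 2110, "1800": 823,  "1900": 84},
}


def tf_idf(counts, doc_totals):
    """Return word -> {file -> tf-idf} using tf = count/total, idf = ln(N/df)."""
    docs = list(doc_totals)
    n_docs = len(docs)
    scores = {}
    for word, per_doc in counts.items():
        # Document frequency: number of files in which the word occurs at all.
        df = sum(1 for d in docs if per_doc.get(d, 0) > 0)
        idf = math.log(n_docs / df) if df else 0.0
        scores[word] = {
            d: (per_doc.get(d, 0) / doc_totals[d]) * idf for d in docs
        }
    return scores


# ASSUMPTION: stand-in totals (sum of the five sample rows per file).
# Replace these with the real number of words in each text file.
files = ["1500", "1600", "1700", "1800", "1900"]
doc_totals = {d: sum(row[d] for row in counts.values()) for d in files}

scores = tf_idf(counts, doc_totals)
for word, row in scores.items():
    print(word, [round(row[d], 4) for d in files])
```

One thing to be aware of: every word in the sample table occurs in all five files, so with this plain idf each of them gets idf = ln(5/5) = 0 and therefore a TF-IDF of 0 in every file. If that is not what you want, the Wikipedia page lists smoothed idf variants that avoid it.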

If this is still unclear, you can find tutorials and example workflows on the KNIME website.

Good luck! :-)

Marc

Thanks a lot Marc :)
