Hi experts. I am new to KNIME. I have 5 text files named 1500.txt, 1600.txt, ..., 1900.txt. Using Python, I selected the 200 most frequently used words from 1500.txt and compared those words with the other text files. The following is an example of those words:
Words | 1500 | 1600 | 1700 | 1800 | 1900
love  | 1775 |  904 |  897 |  887 |  798
great | 1564 |  832 | 2044 | 2025 | 1574
good  | 1508 | 1599 | 2009 | 1671 | 1329
thee  | 1494 | 1644 | 1023 |  339 |   75
lord  | 1203 |  877 | 2110 |  823 |   84
Now the question is: I want to calculate the TF-IDF of these 200 words in each text file. I am attaching the list of 200 words for reference. I hope my question is clear.
Best wishes
Alam
Hi Alam,
you can use either the Math Formula node or the Java Snippet node. The corresponding formulas can be found here:
https://en.wikipedia.org/wiki/Tf–idf
I hope that's what you were looking for.
Best,
Marc
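For quick reference, one common textbook form of the formulas on that page is sketched below. The symbols are assumptions introduced here, not taken from the thread: f_{t,d} is the raw count of term t in document d, N is the number of documents (5 in this case), and D is the set of documents. Other weighting variants exist, so treat this as one option rather than the definitive definition.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}
\qquad
\mathrm{idf}(t,D) = \log\frac{N}{\left|\{\, d \in D : t \in d \,\}\right|}
\qquad
\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t,D)
```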
Thanks Marc. Actually, I don't know which nodes to use or how to do it. Could you please help me build the workflow and describe the nodes step by step?
Best wishes
Alam
You can import the data from your CSV file using the CSV Reader node. Thereafter you can use one or more Math Formula nodes to calculate whatever you want (normalized term frequency, inverse document frequency). Alternatively, you can use the Java Snippet node if you are familiar with Java.
If you still don't understand what I mean, you may find some tutorials and example workflows somewhere on the KNIME homepage.
Good luck! :-)
Marc
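Since the counts were produced with Python in the first place, the same calculation can be sketched there before rebuilding it with Math Formula nodes. This is a minimal sketch under stated assumptions, not a KNIME workflow: the function name `tf_idf` and the `smooth` flag are hypothetical, the counts are the five sample rows from the question, and each file's length is approximated by the sum of the listed counts because the real total word counts are not given in the thread.

```python
import math

# Counts copied from the sample table in the question (5 of the 200 words).
docs = ["1500", "1600", "1700", "1800", "1900"]
counts = {
    "love":  {"1500": 1775, "1600": 904,  "1700": 897,  "1800": 887,  "1900": 798},
    "great": {"1500": 1564, "1600": 832,  "1700": 2044, "1800": 2025, "1900": 1574},
    "good":  {"1500": 1508, "1600": 1599, "1700": 2009, "1800": 1671, "1900": 1329},
    "thee":  {"1500": 1494, "1600": 1644, "1700": 1023, "1800": 339,  "1900": 75},
    "lord":  {"1500": 1203, "1600": 877,  "1700": 2110, "1800": 823,  "1900": 84},
}


def tf_idf(counts, docs, smooth=False):
    """Return {word: {doc: tf-idf}} from a table of raw counts.

    Assumption: a document's length is approximated by the sum of the
    counts of the listed words; replace `totals` with the real word
    totals of each file if they are available.
    """
    n_docs = len(docs)
    totals = {d: sum(row[d] for row in counts.values()) for d in docs}
    result = {}
    for word, row in counts.items():
        # Document frequency: number of files in which the word occurs.
        df = sum(1 for d in docs if row[d] > 0)
        if smooth:
            # Smoothed variant (assumption): log((1 + N) / (1 + df)) + 1
            idf = math.log((1 + n_docs) / (1 + df)) + 1
        else:
            # Plain variant: log(N / df); 0 if the word occurs nowhere.
            idf = math.log(n_docs / df) if df else 0.0
        result[word] = {
            d: (row[d] / totals[d] if totals[d] else 0.0) * idf for d in docs
        }
    return result


plain = tf_idf(counts, docs)
smoothed = tf_idf(counts, docs, smooth=True)

for word in counts:
    print(word, [round(smoothed[word][d], 4) for d in docs])
```

One thing to watch for: every word in the sample occurs in all five files, so the plain inverse document frequency is log(5/5) = 0 and the plain TF-IDF of those words is 0 everywhere. The smoothed variant, or comparing normalized term frequencies directly, may be more informative for this kind of data.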