Hi all, I need some help. I have 5 CSV files named 15Century, 16Century, 17Century, 18Century and 19Century; each file contains the 200 most frequently used words from that century. I have another text file named "unanimous doc.txt". I want to compare all 5 century files with this one file (unanimous doc.txt) so that the result shows which century this document belongs to. I am attaching all the files. I need a workflow for this.
Hi Philip, thanks for your reply. Actually, I am not very good with KNIME; I am just a beginner. I need a workflow for this problem. I haven't created one.
Building the workflow yourself is the best way of learning how things work :)
Some suggestions to get you started: have a look at the Text Processing extension to transform your text document into the same structure as the XXCentury files (lines with [word, count]).
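If it helps to see the target structure outside KNIME, here is a rough plain-Python sketch of that transformation. It is not the Text Processing extension itself; the function name `top_word_counts`, the simple regex tokenizer, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def top_word_counts(text, n=200):
    """Return the n most frequent lowercase words as (word, count) pairs."""
    # Very simple tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)

# Toy input, invented for illustration only.
sample = "The cat and the dog and the bird"
rows = top_word_counts(sample, n=3)
for word, count in rows:
    print(word, count)
```

For the real document you would read the contents of "unanimous doc.txt" into `text` and keep `n=200` so that it lines up with the century files.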
Then compare the result to the individual XXCentury files. You could try transforming the word counts to probabilities and then apply some measure for comparing probability distributions, such as Kullback-Leibler divergence; the century with the smallest divergence from your document would then be the best match.
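To make the comparison idea concrete, here is a minimal Python sketch, assuming the word counts are already available as dictionaries. The function names, the add-one smoothing (used so that words missing from one list do not cause a division by zero), and the toy counts are assumptions for illustration, not part of any KNIME node.

```python
import math

def to_probabilities(counts, vocabulary, alpha=1.0):
    """Turn raw counts into smoothed probabilities over a shared vocabulary."""
    total = sum(counts.get(w, 0) for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts.get(w, 0) + alpha) / total for w in vocabulary}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q); both must share the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)

def closest_century(doc_counts, century_counts):
    """Return the century whose word distribution is closest to the document."""
    # One shared vocabulary keeps the scores comparable across centuries.
    vocabulary = set(doc_counts)
    for counts in century_counts.values():
        vocabulary |= set(counts)
    p = to_probabilities(doc_counts, vocabulary)
    scores = {
        name: kl_divergence(p, to_probabilities(counts, vocabulary))
        for name, counts in century_counts.items()
    }
    return min(scores, key=scores.get), scores

# Toy counts, invented for illustration only (not taken from the attached files).
doc = {"thou": 5, "the": 10}
centuries = {
    "16Century": {"thou": 6, "the": 9},
    "19Century": {"railway": 4, "the": 12},
}
best, scores = closest_century(doc, centuries)
print(best, scores)
```

Note that KL divergence is not symmetric, so the direction (document versus century) is a design choice; a symmetric measure such as Jensen-Shannon divergence would be a reasonable alternative.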
A book such as "Data Science for Business" may offer some guidance. It will not provide you with the KNIME workflow (or with any other code), but it will give you the general approach needed for text mining (as well as an example use case very similar to yours).