I want to calculate TF-IDF by document, but I don't know how to do this.
I am able to do this by term, but I don't know how to do it by document. Kindly advise me. This will help me compare one document with another.
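TF-IDF cannot be calculated for a document as a whole; it is calculated for a term within a document. It combines how often the term occurs in that document (TF) with how rare the term is across the whole set of documents (IDF, derived from the number of documents that contain the term).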
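For reference, this is the common textbook definition. KNIME's TF and IDF nodes may use a variant (for example relative instead of absolute term frequency), so exact numbers can differ:

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \log \frac{N}{n_t}
```

Here tf(t, d) is the number of occurrences of term t in document d, N is the number of documents, and n_t is the number of documents containing t. Worked example with N = 4 and the natural log: a term that occurs 3 times in d and appears in only 1 document gets 3 · ln 4 ≈ 4.16, while a term that appears in all 4 documents gets idf = ln 1 = 0, so its TF-IDF is 0 in every document. A document therefore ends up with one TF-IDF value per term, not a single value overall.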
For comparing documents, I would point you to our example server. Kilian made some very nice examples there, e.g. 009002_DocumentClustering, which generates a vector for each document and then clusters them.
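To illustrate the idea of comparing documents through vectors, here is a minimal Python sketch (not the KNIME example workflow itself). It assumes you already have a table with one row per (document, term, TF-IDF weight); the rows below are made-up toy values. It pivots those rows into one sparse vector per document and compares two documents with cosine similarity, a common choice for this kind of comparison.

```python
import math
from collections import defaultdict

# Hypothetical toy input: one row per (document, term, TF-IDF weight).
# The weights are illustrative, not taken from a real corpus.
rows = [
    ("doc1", "cat", 0.41), ("doc1", "mat", 1.10), ("doc1", "sat", 1.10),
    ("doc2", "cat", 0.41), ("doc2", "fish", 1.10),
    ("doc3", "stock", 1.10), ("doc3", "fell", 1.10),
]

def to_vectors(rows):
    """Pivot (doc, term, weight) rows into one sparse {term: weight} vector per document."""
    vectors = defaultdict(dict)
    for doc, term, weight in rows:
        vectors[doc][term] = weight
    return dict(vectors)

def cosine(u, v):
    """Cosine similarity of two sparse vectors; terms missing from a vector count as 0."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vecs = to_vectors(rows)
for a, b in [("doc1", "doc2"), ("doc1", "doc3")]:
    print(a, b, round(cosine(vecs[a], vecs[b]), 3))
```

Documents that share weighted terms score above 0, while documents with no terms in common (doc1 and doc3 here) score exactly 0. The same vectors could also be fed into a clustering algorithm instead of a pairwise comparison.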
In case you haven't found a solution yet, attached is an example workflow showing how to compute the TF and IDF values and multiply them using the Math Formula node.
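Outside KNIME, the same steps (TF per document and term, IDF per term, then multiply) can be sketched in plain Python. This is an illustrative sketch assuming raw counts for TF and idf = log(N / document frequency); the corpus is a made-up toy example, and the attached workflow may use different TF/IDF variants.

```python
import math
from collections import Counter

# Toy corpus (illustrative only): document id -> list of tokens.
corpus = {
    "doc1": "the cat sat on the mat".split(),
    "doc2": "the cat ate the fish".split(),
    "doc3": "stock markets fell".split(),
}

# TF table: one row per (document, term) with the raw count of the term in that document.
tf_rows = [(doc, term, count)
           for doc, tokens in corpus.items()
           for term, count in Counter(tokens).items()]

# IDF table: one value per term, idf = log(N / number of documents containing the term).
n_docs = len(corpus)
doc_freq = Counter(term for tokens in corpus.values() for term in set(tokens))
idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Join on the term and multiply, analogous to computing TF * IDF in a Math Formula step.
tfidf_rows = [(doc, term, tf * idf[term]) for doc, term, tf in tf_rows]

for doc, term, weight in tfidf_rows:
    print(f"{doc}\t{term}\t{weight:.3f}")
```

The result still has one row per (document, term) pair, which is why TF-IDF is a per-term value; grouping these rows by document gives the per-document vectors used for comparison.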
I guess the usefulness of the text processing features depends a lot on what one intends to do. I've used them to perform some transformations and mining on a single text column (no title, no authors) for a supervised classification task. For this kind of task, I found the current document class a bit too complex. It might be worthwhile to allow setting some options in the String To Document node (e.g. title, authors) to "none" instead of having to choose an empty string variable for each of them.