I am new to KNIME and already struggling with it. I am thinking of using KNIME as the main tool in my bachelor project.
Now to the problem. I have a CSV file in which each row represents a document (one string). I want to transform it into a vector representation (document-term matrix or term-document matrix).
I am able to read in the file, and KNIME recognises each row as a string, but when I use the Strings To Document node, it outputs a Document column in which each row consists only of "".
The data output table view shows only the document title. If the title is empty, "" is shown.
To create a numerical representation, use the Document Vector node on a bag of words, as shown e.g. in the classification example (http://tech.knime.org/document-classification-example).
Here are some links which may help you get started with KNIME Textprocessing:
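To make the idea concrete outside of KNIME, here is a minimal Python sketch of what turning a bag of words into document vectors means. It is only an illustration of the concept, not how the KNIME nodes are implemented: the function name document_term_matrix, the sample documents, and the naive lowercase/whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] counts term j in document i."""
    # Bag of words per document. Tokenization here is a naive assumption:
    # lowercase the text and split on whitespace.
    bags = [Counter(doc.lower().split()) for doc in documents]
    # The vocabulary is the sorted set of all terms seen in any document.
    vocabulary = sorted(set().union(*bags))
    # One row per document, one column per vocabulary term.
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, matrix


docs = ["the cat sat", "the dog sat on the mat"]
vocab, dtm = document_term_matrix(docs)
```

Each row of dtm is one document and each column is one term from vocab. Transposing it would give the term-document matrix.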
Thanks for the links. I already had a look at these.
As I understand it, I have to use the Strings To Document node. The problem with the Strings To Document node is that I have to specify columns for Title, Full Text and Authors. Whatever I do, Strings To Document does not seem to be the right node for me. But then, which one am I supposed to use?
The Strings To Document node is the right node to create documents from strings. To create the author and title columns, simply use the Java Snippet node: add a column containing an author string, e.g. "John Doe", and a column containing a title, e.g. "TitleX", with X as the row index.
Attached is an example workflow (requiring KNIME Textprocessing >=2.9) showing how to create document vectors from strings.
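If it helps to see the logic spelled out, the following Python sketch does the same thing outside of KNIME: it gives every text row a constant author and a title built from the row index. This is not the Java Snippet code itself. The function name add_title_and_author, the column names, the sample texts, and counting rows from 0 are assumptions for illustration.

```python
import csv
import io


def add_title_and_author(rows, author="John Doe"):
    """Yield (title, author, text) for each text, titled by its row index."""
    for index, text in enumerate(rows):
        # "TitleX" with X as the row index (starting at 0 here, an assumption).
        yield (f"Title{index}", author, text)


texts = ["first document text", "second document text"]

# Write the enriched rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "author", "text"])
writer.writerows(add_title_and_author(texts))
csv_output = buffer.getvalue()
```

A CSV prepared this way already has the three columns that Strings To Document asks for, so they can be selected directly in the node dialog.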