I am new to KNIME and already struggling with it. I am thinking of using KNIME as the main tool for my bachelor project.

Now to the problem. I have a CSV file in which each row represents a document (one string). I want to transform it into a vector representation (a document-term matrix or term-document matrix).

I am able to read in the file, and KNIME recognises each row as a string, but when I use the Strings To Document node, it outputs a Document column in which each row consists only of "".

Only the document title is shown in the data output table view, so if the title is empty, "" is shown.

To create a numerical representation, use the Document Vector node on a bag of words, as shown, e.g., in the classification example.
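For readers who want to see what a document vector built from a bag of words amounts to, here is a minimal plain-Python sketch. It is not KNIME code and does not reproduce the Document Vector node's behaviour; the function name `document_term_matrix`, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def document_term_matrix(documents):
    """Return (vocabulary, matrix): one row per document, one column per term."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    # KNIME's tokenizer and preprocessing nodes will behave differently.
    bags = [Counter(doc.lower().split()) for doc in documents]

    # The vocabulary is the sorted set of all terms seen in any document.
    vocabulary = sorted({term for bag in bags for term in bag})

    # Each row holds the term counts of one document; a Counter returns 0
    # for terms that do not occur in that document.
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, matrix


# Hypothetical sample data: each string stands for one CSV row.
docs = ["the cat sat", "the dog sat on the cat"]
vocabulary, matrix = document_term_matrix(docs)

print(vocabulary)
for row in matrix:
    print(row)
```

Transposing `matrix` would give the term-document form instead of the document-term form.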

Here are some links that may help you get started with KNIME Textprocessing:


Online documentation:

Example workflows:

Kilian

Thanks for the links. I already had a look at these.

As I understand it, I have to use the Strings To Document node. The problem with this node is that I have to specify columns for Title, Full Text, and Authors. Whatever I do, Strings To Document does not seem to be the right node for me. But then, which one am I supposed to use?

The Strings To Document node is the right node to create documents from strings. To create authors, simply use the Java Snippet node to add a column containing a string, e.g. "John Doe", and a title column, e.g. "TitleX", with X as the row index.
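To make the shape of the resulting table concrete, here is a small plain-Python sketch of the same idea: each input string gets a constant author and a title derived from its row index. This is not Java Snippet code; the function name `add_title_and_author`, the dictionary keys, and the sample rows are hypothetical, and the index is assumed to start at 0.

```python
def add_title_and_author(texts, author="John Doe"):
    """Attach a generated title and a constant author to each text row."""
    rows = []
    for index, text in enumerate(texts):
        rows.append({
            "title": f"Title{index}",  # "TitleX", with X as the row index
            "author": author,          # the same placeholder author for every row
            "text": text,              # the original string becomes the full text
        })
    return rows


# Hypothetical sample data: each string stands for one CSV row.
rows = add_title_and_author(["first document text", "second document text"])
for row in rows:
    print(row)
```

With columns like these in place, the title, author, and full-text settings of the Strings To Document node each have a column to point at, and the table view shows the generated title rather than "".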

Attached is an example workflow (requiring KNIME Textprocessing >= 2.9) showing how to create document vectors from strings.

Kilian