Stemmer problem (3.1)

Thanks for providing version 3.1 of KNIME Analytics Platform. I am trying to adapt an example from the text mining webinar and replace the deprecated nodes with the new ones. While configuring the Stemmer node (e.g. Snowball Stemmer), I cannot select "Preprocessed document"; only "Document" is available in the drop-down menu for the document column.

Best regards, Stefan

Hi Stefan,

with 3.1 a few things changed, making text processing a bit easier. The preprocessing nodes no longer cut away additional columns. Therefore the old nodes had to be deprecated and new nodes had to be implemented. Additionally, the new preprocessing nodes are streamable.

The dialog of the preprocessing nodes changed a bit. By default the "Document" column is selected in the drop-down menu. However, you can select any other column of type Document as well. The input of a preprocessing node should no longer be a bag of words. It should simply be a list of documents (a table with at least one document column).
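To make the column handling concrete, here is a minimal conceptual sketch in plain Python. It is not the KNIME API: the `preprocess` and `remove_digits` helpers and the "Category" column are made up for illustration. It only models the idea that a preprocessing step works on one selected document column, either replaces it or appends a new one, and leaves all other columns alone.

```python
# Conceptual model only: plain Python, not the KNIME API.
# preprocess() and remove_digits() are hypothetical helpers.

def preprocess(rows, column, func, append_as=None):
    """Apply func to one document column of every row.

    If append_as is None, the selected column is replaced; otherwise the
    result is written to a new column. All other columns are kept.
    """
    target = append_as or column
    return [{**row, target: func(row[column])} for row in rows]


def remove_digits(text):
    # Stand-in for a number filter: drop every digit character.
    return "".join(ch for ch in text if not ch.isdigit())


# A "list of documents": one document column plus an additional column.
table = [
    {"Document": "KNIME 3.1 is out", "Category": "news"},
    {"Document": "42 new nodes", "Category": "release"},
]

# Replace the selected column by its preprocessed version.
replaced = preprocess(table, "Document", remove_digits)

# Keep the original and append the result as a second document column.
appended = preprocess(table, "Document", remove_digits,
                      append_as="Preprocessed Document")
```

In the appended variant both columns are available afterwards, so a following step has to be pointed at "Preprocessed Document" explicitly if it should continue from the preprocessed text.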

What exactly is the input of the Snowball Stemmer node? How many document columns are in that table? Can you see all document columns in the drop-down menu of the Snowball Stemmer node dialog?

Cheers, Kilian

Hi Kilian,

the workflow I’ve created consists of a chain of several preprocessing nodes: “Number Filter”, “Punctuation Erasure”, “Stop word Filter”, “Case Converter” and “Snowball Stemmer”.

First attempt: I tried to configure the preprocessing nodes by always selecting the column “Document” as the document column and replacing it by itself. But when I tried to configure the “Snowball Stemmer”, I got the error message “No column in spec compatible to DocumentValue” and couldn’t continue.

Second attempt: I reconfigured the “Number Filter” to append the column “Preprocessed Document” without replacing the column “Document”. In the subsequent nodes I selected the column “Preprocessed Document” as the document column and replaced it by itself (same strategy as in the first attempt). This way it was always possible to compare the two columns: “Document” (the original document) and “Preprocessed Document”. However, the “Snowball Stemmer” offered only the column “Document” and not “Preprocessed Document” in its dialog, so I couldn't see all document columns in the drop-down menu. It was possible to continue without an error message, but then none of the preprocessing steps had any effect...

Any ideas?

Best regards, Stefan

Hi Stefan,

in your chain of nodes the Case Converter comes before the Snowball Stemmer. Since the Snowball Stemmer node complains that there is no document column, I assume that you are using the wrong Case Converter. There are two Case Converter nodes, one for strings and one for documents. You need the one for documents, contained in the Textprocessing extension / folder! Btw., it is recommended to use the Case Converter after stemming, since some stemmers might consider lower or upper case during the stemming routine.
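A small sketch of why the order can matter. This is a toy, hypothetical stemmer written in plain Python, not the Snowball algorithm and not KNIME code. It assumes a stemmer that leaves capitalized tokens alone (treating them as proper nouns), which is just one way a stemmer could take case into account.

```python
# Toy example only: NOT the Snowball algorithm, just a case-aware stand-in.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(token):
    """Hypothetical stemmer that takes case into account."""
    # Assumption: capitalized tokens are proper nouns and are left untouched.
    if token[:1].isupper():
        return token
    # Strip the first matching suffix, keeping a stem of at least 3 characters.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


tokens = ["Reading", "hosted", "meetings"]

# Stem first, convert case afterwards: the place name "Reading" survives.
stem_then_lower = [toy_stem(t).lower() for t in tokens]

# Convert case first: the stemmer can no longer tell it was capitalized.
lower_then_stem = [toy_stem(t.lower()) for t in tokens]

print(stem_then_lower)  # ['reading', 'host', 'meeting']
print(lower_then_stem)  # ['read', 'host', 'meeting']
```

With this toy stemmer the two orders disagree on the first token only, which is exactly the kind of difference that gets lost when the case converter runs too early.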

Cheers, Kilian

Hi Kilian,

thanks, now it works! :-)

Stefan