Weird KNIME behaviour when integrating an R Snippet into a workflow: "Node created an empty data table"

Hello everyone,

This is my first time working with R in KNIME and I am doing something wrong. I am trying to modify the sentiment example from the KNIME Example Server (009007_SentimentClassification) to use it with tweets instead of IMDB reviews.

I got a very poor result when I tried this workflow with my tweet data (66%), so I modified some parts of the original workflow, starting with Punctuation Erasure, because I noticed that this node has a bug, as described in

https://tech.knime.org/forum/knime-textprocessing/punctuation-erasure

The original part was: ... >> Punctuation Erasure >> Number Filter >> N Chars Filter >> Stop Word Filter >> Case Converter >> Snowball Stemmer >> Bag Of Words Creator >> ...
 

I am trying to replace most of the nodes with one R Snippet node with the following contents:

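library('stringr')

# punctuation erasure: replace every punctuation character with a space
knime.in$"Document" <- str_replace_all(knime.in$"Document", "[[:punct:]]", " ")
# rename only the "Document" column; assigning a single name to colnames()
# would leave any other columns named NA
colnames(knime.in)[colnames(knime.in) == "Document"] <- "Document2"
# reduce multiple space characters to 1 and trim leading/trailing whitespace
knime.in$"Document2" <- str_trim(gsub("\\s+", " ", knime.in$"Document2", perl=TRUE))
# convert to lower case; the result must be assigned, otherwise it is discarded
knime.in$"Document2" <- tolower(knime.in$"Document2")

knime.out <- knime.in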

And the workflow part becomes: ... >> R Snippet >> Strings To Document >> Number Filter >> N Chars Filter >> Stop Word Filter >> Case Converter >> Column Filter >> Snowball Stemmer >> Bag Of Words Creator >> ...

At the Bag Of Words Creator node I received the warning:

Node created an empty data table.

So far I have not been able to resolve this problem. I have checked every step and the settings of every node that I used, without any luck. Do you have any idea what is wrong?

 

Use the Strings To Document node after the R Snippet node. That forces you to work with the two columns (Sentiment and Text), but at least R will understand the data. Otherwise R appears to see only the Title and not the full text.

BTW it occurs to me that you should not put Sentiment into the Title, because it then becomes part of the actual Document and therefore part of the explanatory variables. That's at least how it was set in the attached workflow... The alternative is to create an empty string column and use that as the title or text.
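
To make that concrete, here is a minimal sketch of what the R Snippet could do with the two plain string columns. It is only an illustration under some assumptions: the column names Sentiment, Text and Title follow the description above and may differ in your table, the sample data frame and its POS/NEG labels are made up so the code can be tried outside KNIME (inside the node, knime.in is supplied by KNIME), and clean_text is a hypothetical helper written in base R rather than stringr.

```r
# Hypothetical stand-in for the node input; inside the R Snippet node,
# KNIME provides knime.in and this assignment should be removed.
knime.in <- data.frame(
  Sentiment = c("POS", "NEG"),
  Text = c("Loved it!!!  Great   movie :)", "  Worst... film, EVER."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: clean a character vector of raw texts.
clean_text <- function(x) {
  x <- gsub("[[:punct:]]", " ", as.character(x))  # punctuation -> space
  x <- gsub("\\s+", " ", x)                       # collapse whitespace runs
  x <- trimws(x)                                  # drop leading/trailing space
  tolower(x)                                      # lower case
}

knime.out <- knime.in
# Clean only the text column; Sentiment stays untouched as the class label.
knime.out$Text <- clean_text(knime.out$Text)
# Empty string column to use as the title, so Sentiment is not part of the document.
knime.out$Title <- ""
```

In Strings To Document you would then pick Title as the title column and Text as the full text, and keep Sentiment as a separate class column.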