Column types changed when passing data from one R Snippet node to another

I want to use KNIME to work with R, but I'm finding that when I output a data frame from one node and feed it into another, some columns change from character type to numeric type. If I pass an R data frame from one node to the next, shouldn't I end up with an equivalent object?

I constructed a simple two-node example to illustrate. The code for the two nodes is below. I create a simple R data frame with two character columns and one numeric, set it as the output of the first node and the input of the second node.

The first column (patients) is the patient ID. It needs to be treated as an identifier, not a number, so leading "0" should not be dropped. The second column (diagnoses) is a bit vector, a string of "1" and "0", encoding the presence or absence of six disease conditions. So the value "001000" should not be converted to "1000".

Node 1 R Snippet

patients <- c( '0088', '0247', '0880', '1271' )
diagnoses <- c( '001000', '011001', '000000', '110000' )
age <- c( 80, 70, 09, 8 )

R <- data.frame( patients, diagnoses, age, stringsAsFactors=FALSE )

print( str( R ) )

Node 2 R Snippet

print( str( R ) )

Node 1 Result

'data.frame':	4 obs. of  3 variables:
 $ patients : chr  "0088" "0247" "0880" "1271"
 $ diagnoses: chr  "001000" "011001" "000000" "110000"
 $ age      : num  80 70 9 8

Node 2 Result

'data.frame':	4 obs. of  3 variables:
 $ patients : int  88 247 880 1271
 $ diagnoses: int  1000 11001 0 110000
 $ age      : int  80 70 9 8

At the end of Node 1, the data frame returned via the variable 'R' has two columns of mode 'character' and one of mode 'numeric'. When the second node is run and the variable named 'R' is initialized to the data object passed in via the input port, all of the columns have been converted to class 'integer' (shown as 'int' by str()), even those that were definitely exported as 'character'.

What do I need to do so that I can create an R object in one R Snippet node and use it in another R Snippet node?

I'm not sure whether there is any workaround (you might like to wait for an answer from the KNIME people).
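As a possible stopgap inside the second node, the lost leading zeros could be padded back on, assuming every ID is exactly 4 characters wide and every diagnosis string exactly 6 (as in the example above). This is only a sketch: the data frame below is a hand-built stand-in for what Node 2 receives, and the fixed widths are an assumption that would not hold for variable-length identifiers.

```r
# Stand-in for the data frame as it arrives in Node 2 (all integer columns).
R <- data.frame(
  patients  = c(88L, 247L, 880L, 1271L),
  diagnoses = c(1000L, 11001L, 0L, 110000L),
  age       = c(80L, 70L, 9L, 8L)
)

# Re-pad with leading zeros to the assumed fixed widths (4 and 6).
R$patients  <- sprintf("%04d", R$patients)
R$diagnoses <- sprintf("%06d", R$diagnoses)

str(R)
```

This restores the strings only because the widths are known in advance; it does not stop the conversion from happening.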

If not, you could try to work with our R Scripting Integration nodes (http://tech.knime.org/community/scripting). At least your example will work as you would expect it. You just have to run a local R server (can be started in R if you have installed the package "Rserve"). If you have further questions, don't hesitate to contact me.

Thanks Antje, and thanks rake for your R question. Our Local R integration is limited by the underlying data exchange, which is based on CSV files. However, the scripting integration from MPI (that Antje mentioned) as well as our R Remote nodes preserve R data types, because they exchange data through the Rserve package.
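To illustrate why a CSV-based exchange loses the column types, here is a small plain-R sketch of a CSV round trip. It uses write.csv() and read.csv() purely for illustration; the exact calls the Local R nodes use internally are not shown in this thread, so treat that part as an assumption.

```r
# Same data as in Node 1.
original <- data.frame(
  patients  = c("0088", "0247", "0880", "1271"),
  diagnoses = c("001000", "011001", "000000", "110000"),
  age       = c(80, 70, 9, 8),
  stringsAsFactors = FALSE
)

tmp <- tempfile(fileext = ".csv")
write.csv(original, tmp, row.names = FALSE)

# Default read: types are guessed from the text, so digit-only
# strings come back as integers and the leading zeros are gone.
guessed <- read.csv(tmp)
str(guessed)

# Declaring the column classes keeps the strings intact.
kept <- read.csv(tmp, colClasses = c(patients = "character",
                                     diagnoses = "character"))
str(kept)

unlink(tmp)
```

A CSV file carries no type information, so whatever reads it has to guess unless it is told the column classes explicitly.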

Antje, I know that I initially included Community Contributions, but somehow I didn't find the R Scripting stuff. Maybe something was wrong with my Eclipse configuration, or I somehow just wasn't seeing it. I've fixed that. Now that I've used R Scripting a bit, yes, this design is much more to my liking. The R objects seem to remain "as is".

I had also experimented with the "R Snippet (Remote)" nodes, using Rserve. However, I did not notice that there was a difference in the objects passed from one node to the next. I had assumed that the Remote versions were identical to the Local versions and differed only in where the R code was run. That said, I found it much easier to debug R using Local, since "print()" debug statements are available in KNIME via Right-Click "View: R Std Output".

-- Randy Kerber
    San Jose, California
