I want to use KNIME to work with R, but when I output a data frame from one node and input it into another, some columns change from character type to numeric type. If I pass an R data frame from one node to the next, shouldn't I end up with an equivalent object?
I constructed a simple two-node example to illustrate. The code for the two nodes is below. I create a simple R data frame with two character columns and one numeric, set it as the output of the first node and the input of the second node.
The first column (patients) is the patient ID. It needs to be treated as an identifier, not a number, so leading zeros must not be dropped. The second column (diagnoses) is a bit vector stored as a string of "1" and "0" characters, encoding the presence or absence of six disease conditions. So the value "001000" must not be converted to 1000.
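To make the concern concrete, here is a small sketch in plain R (outside KNIME) of what an integer conversion does to these two columns. The variable names as_int, restored_patients and restored_diagnoses are only for illustration. The leading zeros can be padded back with formatC, but only because the widths (4 and 6) are known in advance, so this is a workaround rather than a real fix.

```r
patients  <- c("0088", "0247", "0880", "1271")
diagnoses <- c("001000", "011001", "000000", "110000")

# Converting to integer discards the leading zeros.
as_int <- as.integer(patients)
print(as_int)

# Padding back to a fixed width recovers the strings,
# assuming every value is known to have the same width.
restored_patients  <- formatC(as.integer(patients),  width = 4, flag = "0", format = "d")
restored_diagnoses <- formatC(as.integer(diagnoses), width = 6, flag = "0", format = "d")

print(all(restored_patients == patients))
print(all(restored_diagnoses == diagnoses))
```

If the identifiers had varying lengths, the original strings could not be reconstructed this way.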
Node 1 R Snippet
patients <- c( '0088', '0247', '0880', '1271' )
diagnoses <- c( '001000', '011001', '000000', '110000' )
age <- c( 80, 70, 09, 8 )
R <- data.frame( patients, diagnoses, age, stringsAsFactors=FALSE )
print( str( R ) )
Node 2 R Snippet
print( str( R ) )
Node 1 Result
'data.frame': 4 obs. of 3 variables:
 $ patients : chr "0088" "0247" "0880" "1271"
 $ diagnoses: chr "001000" "011001" "000000" "110000"
 $ age : num 80 70 9 8
Node 2 Result
'data.frame': 4 obs. of 3 variables:
 $ patients : int 88 247 880 1271
 $ diagnoses: int 1000 11001 0 110000
 $ age : int 80 70 9 8
At the end of Node 1, the data frame returned via the variable 'R' has two columns of type 'character' and one of type 'numeric'. When Node 2 runs and the variable named 'R' is initialized from the data passed in via the input port, every column arrives as integer ('int' in the str output), including the two that were definitely exported as 'character'.
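The same change can be reproduced in plain R if the data frame is written to a text file and read back with type guessing. This is only a sketch of one possible mechanism. I am assuming, without having confirmed it, that the data passes through something like a CSV or a typed KNIME table between the nodes. The names tmp, guessed and kept are hypothetical.

```r
patients  <- c("0088", "0247", "0880", "1271")
diagnoses <- c("001000", "011001", "000000", "110000")
age       <- c(80, 70, 9, 8)
R <- data.frame(patients, diagnoses, age, stringsAsFactors = FALSE)

# Round-trip through a CSV file (an assumed stand-in for the node-to-node transfer).
tmp <- tempfile(fileext = ".csv")
write.csv(R, tmp, row.names = FALSE)

# With default type guessing, number-like strings are read back as integers.
guessed <- read.csv(tmp, stringsAsFactors = FALSE)
str(guessed)

# Declaring the column classes keeps the identifiers as character.
kept <- read.csv(tmp, colClasses = c(patients = "character",
                                     diagnoses = "character",
                                     age = "numeric"))
str(kept)

unlink(tmp)
```

If something similar happens inside KNIME, the column types would have to be fixed wherever the table is handed over, not in the R code of Node 1.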
What do I need to do so that I can create an R object in one R Snippet node and use it in another R Snippet node?