As I understand it, the input dataset for an R node is read from a CSV file generated by the previous node. This is somewhat inefficient, because data frames in R are stored column-wise, whereas the other nodes in KNIME work row-wise.
There is some room for efficiency improvements if you change this scheme slightly. Instead of a single CSV file, you can generate n text files (n being the number of columns in the CSV file), each containing a single column.
Then, instead of issuing a single read.table call to read the whole CSV file, you can build the same data frame with something similar to
do.call( cbind, lapply( dir(), function(x) read.table( x, header = TRUE ) ) )
For this to work, R's working directory must be set to a directory that contains only the single-column files. Note that dir() returns the file names in alphabetical order, so the files should be named so that this order matches the intended column order.
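To make the idea concrete, here is a minimal sketch of the round trip in R. It is only an illustration under my own assumptions, not what KNIME actually does: write_columns and read_columns are hypothetical helper names, the zero-padded file names are just one way to keep the alphabetical order equal to the column order, and the tiny data frame is made up.

```r
# Hypothetical helper: write each column of a data frame to its own text file.
write_columns <- function(df, dir_path) {
  dir.create(dir_path, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_along(df)) {
    # Zero-padded prefix so that alphabetical file order equals column order.
    file_name <- sprintf("%04d_%s.txt", i, names(df)[i])
    write.table(df[i], file.path(dir_path, file_name), row.names = FALSE)
  }
}

# Hypothetical helper: read the single-column files back into one data frame.
read_columns <- function(dir_path) {
  files <- list.files(dir_path, full.names = TRUE)  # sorted alphabetically
  columns <- lapply(files, read.table, header = TRUE,
                    stringsAsFactors = FALSE)
  # Every element is a one-column data frame, so cbind returns a data frame.
  do.call(cbind, columns)
}

# Made-up example data, only to show the round trip.
example <- data.frame(id = 1:3,
                      name = c("a", "b", "c"),
                      score = c(0.5, 1.5, 2.5),
                      stringsAsFactors = FALSE)

column_dir <- tempfile("columns")
write_columns(example, column_dir)
restored <- read_columns(column_dir)
```

Using full.names = TRUE in list.files avoids having to change the working directory, which may or may not be preferable inside a node.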
I did some tests based on the dataset at
http://www.cs.utexas.edu/users/pstone/Workshops/2004icml/GenderTrainingSet.zip
and I got consistent speed gains of 20-25%.
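A comparison of this kind can be sketched as follows. This is not the script I used for the figures above: it runs on synthetic random data rather than the dataset linked above, the sizes n_rows and n_cols are arbitrary assumptions, and the timings will vary with the machine and the shape of the data.

```r
# Synthetic stand-in data (an assumption; NOT the GenderTrainingSet data).
set.seed(1)
n_rows <- 2000
n_cols <- 50
synthetic <- as.data.frame(matrix(rnorm(n_rows * n_cols), nrow = n_rows))

work_dir <- tempfile("timing")
dir.create(work_dir)
csv_file <- file.path(work_dir, "all_columns.csv")
column_dir <- file.path(work_dir, "columns")
dir.create(column_dir)

# Write the same data once as a single CSV file and once as one file per column.
write.table(synthetic, csv_file, sep = ",", row.names = FALSE)
for (i in seq_along(synthetic)) {
  write.table(synthetic[i],
              file.path(column_dir, sprintf("%04d.txt", i)),
              row.names = FALSE)
}

# Time the single-file read.
single_time <- system.time(
  from_csv <- read.table(csv_file, header = TRUE, sep = ",")
)

# Time the per-column read followed by cbind.
column_time <- system.time(
  from_columns <- do.call(
    cbind,
    lapply(list.files(column_dir, full.names = TRUE),
           read.table, header = TRUE)
  )
)

# Elapsed seconds for both approaches.
print(c(single_file = single_time[["elapsed"]],
        per_column = column_time[["elapsed"]]))
```

Repeating each measurement several times and comparing the averages gives a more reliable picture than a single run.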
Best regards,
Carlos J. Gil Bellosta