Small performance hack for R nodes

As I understand it, the input dataset for an R node is read from a CSV file generated by the previous node. This is somewhat inefficient, as data frames in R are stored “column-wise” rather than “row-wise”, as is the case for other nodes in KNIME.

There is some room for efficiency improvements if you change this scheme slightly. Instead of a single CSV file, you can generate n text files (n being the number of columns in the CSV file), each containing a single column.

Then, instead of submitting a single read.table statement to read the whole CSV file, you can get the same data frame by doing something similar to do.call( cbind, lapply( dir(), function(x) read.table( x, header = TRUE ) ) )

For this to work, R's working directory must be set to a directory that contains only the per-column files, because dir() lists every file in the working directory.
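To make the idea concrete, here is a minimal sketch of the round trip. It is only an illustration under my own assumptions, not how KNIME actually does it: the function names write_columns and read_columns, the comma separator, and the zero-padded file names (used so that the files sort back into the original column order) are all hypothetical. In KNIME the per-column files would be written by the previous node; write_columns stands in for that step only so the example is self-contained. It passes the directory explicitly instead of changing the working directory.

```r
# Hypothetical helper: write each column of a data frame to its own text file.
# (Stand-in for whatever the upstream node would produce.)
write_columns <- function(df, dir_path) {
  dir.create(dir_path, showWarnings = FALSE, recursive = TRUE)
  for (i in seq_along(df)) {
    # Zero-padded index so that sorting by file name preserves column order.
    file_name <- sprintf("%04d.txt", i)
    # df[i] is a one-column data frame, so the column name is kept as header.
    write.table(df[i], file.path(dir_path, file_name),
                row.names = FALSE, quote = TRUE, sep = ",")
  }
  invisible(dir_path)
}

# Hypothetical helper: read every file in dir_path as a single column
# and bind the columns into one data frame.
read_columns <- function(dir_path) {
  files <- sort(list.files(dir_path, full.names = TRUE))
  cols <- lapply(files, function(f) {
    read.table(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)
  })
  # lapply returns a list of one-column data frames; cbind joins them.
  do.call(cbind, cols)
}

# Usage: round trip of a small data frame through a temporary directory.
example_df <- data.frame(id = 1:3,
                         value = c(0.5, 1.5, 2.5),
                         label = c("x", "y", "z"),
                         stringsAsFactors = FALSE)
example_dir <- file.path(tempdir(), "column_files")
unlink(example_dir, recursive = TRUE)
write_columns(example_df, example_dir)
result <- read_columns(example_dir)
str(result)
```

Note that the sketch uses lapply rather than sapply, so that each element stays a one-column data frame and the result of cbind is a data frame and not a matrix.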

I did some tests based on the dataset at

and I got systematic 20-25% speed gains.

Best regards,

Carlos J. Gil Bellosta