Dear all,
I have just started using KNIME and have run into a problem executing R scripts within a workflow. After extraction, I send a data table to an "R Learner" node, in which I perform feature selection with the R package caret. Unfortunately, the script (below) fails during execution with: Error in { : task 1 failed - "undefined columns selected".
If I save the input table of the KNIME node as CSV and import it into R, the same code runs with no errors. I checked the R library path loaded within KNIME, and it corresponds to the one used by native R. I tried running the code on a single processor, with no success. I also tried the same code (with small adaptations to the input/output objects) in the R Snippet node from both the community contribution (with a local server) and the standard implementation, and got the same error. On the other hand, other R scripts run without problems within KNIME.
I am wondering why the same R code gives different outcomes when executed within KNIME and in the native R environment.
Any help is really appreciated.
Thanks,
Luigi
-----------------------------------------------------
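## SETTINGS
ncpu <- 2    # number of cores registered with doMC
niters <- 2  # passed to rfeControl(number = ...), i.e. the number of CV folds
numCVs <- 2  # passed to rfeControl(repeats = ...), i.e. the number of CV repeats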
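## EXECUTE
library(caret)
library(doMC)
registerDoMC(ncpu)
# Import the table passed in by KNIME and drop rows with missing values
tableIn <- knime.in
tableIn <- tableIn[complete.cases(tableIn), ]
# Response: the column whose name starts with "LD"
yId <- grep('^LD', colnames(tableIn))
y <- tableIn[, yId]
# Predictors: everything except the first column and the response
x <- tableIn[, -c(1, yId)]
rownames(x) <- tableIn[, grep('^database_', colnames(tableIn))]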
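## REMOVE ZERO VARIANCE VARIABLES
xzv <- x[, !nearZeroVar(x, saveMetrics = TRUE)$zeroVar, drop = FALSE]
## REMOVE HIGHLY CORRELATED VARIABLES
correlationMatrix <- cor(xzv)
highlyCorrelated <- findCorrelation(correlationMatrix, cutoff = 0.85)
# Guard against an empty index: xzv[, -integer(0)] would drop every column
xzvhc <- if (length(highlyCorrelated) > 0) xzv[, -highlyCorrelated, drop = FALSE] else xzv
dim(xzv)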
Ctrl <- rfeControl(functions = caretFuncs, rerank = FALSE, method = "repeatedcv",
                   saveDetails = FALSE, number = niters, repeats = numCVs, verbose = TRUE,
                   returnResamp = "final", p = .75, index = NULL,
                   timingSamps = 0, seeds = NULL, allowParallel = TRUE)
## N SET VARIABLES TO EVALUATE
## Subset sizes are powers of two (4, 8, ...), capped at 2^6 = 64
nvar <- 0
i <- 0
while (nvar < dim(xzv)[2]) {
  i <- i + 1
  nvar <- 2^(i + 1)
}
if (i > 6) {i <- 6}
sizevar <- 2^(2:i)
rfefit <- rfe(xzvhc, y, metric = "RMSE", maximize = FALSE, method = "pls",
              rfeControl = Ctrl, sizes = sizevar)
# Return the final fitted model to KNIME
knime.model <- rfefit$fit
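-----------------------------------------------------
One difference between the two runs that may be worth ruling out is the column names. read.csv() uses check.names = TRUE by default, so names that are not syntactically valid in R (spaces, leading digits) are rewritten via make.names() on import, whereas the table arriving as knime.in might still carry the original names. The sketch below only illustrates that round-trip effect; the data frame and its column names are made up for illustration and are not taken from my actual table.

```r
# Hypothetical stand-in for knime.in: column names that are not
# syntactically valid R names (a space, a leading digit).
knime_like <- data.frame(
  "database_id"   = c("a", "b", "c"),
  "LD50 value"    = c(1.2, 3.4, 5.6),
  "2D descriptor" = c(0.1, 0.2, 0.3),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Round-trip through CSV, as when testing the script in native R.
csv_path <- tempfile(fileext = ".csv")
write.csv(knime_like, csv_path, row.names = FALSE)
csv_like <- read.csv(csv_path, stringsAsFactors = FALSE)  # check.names = TRUE by default

# List the columns whose names changed during the round trip.
changed <- names(knime_like) != names(csv_like)
name_diff <- data.frame(
  in_knime = names(knime_like)[changed],
  in_csv   = names(csv_like)[changed],
  stringsAsFactors = FALSE
)
print(name_diff)
```

If the real table shows differences like these, applying colnames(tableIn) <- make.names(colnames(tableIn), unique = TRUE) at the top of the script inside KNIME would make both environments see the same names. Whether this is what actually triggers the error is only a guess.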