I am building a PLS model in R and split the data into training and test sets with the Partitioning node (settings: Relative [%] = 60, Draw randomly). This gives me a training set of 43 observations and a test set of 29.
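To reproduce this split outside KNIME, a 60/40 random split of 72 rows can be sketched in base R as shown below. This is not the Partitioning node's own code. The data frame `df` and its values are simulated placeholders, and using `round()` to size the training set is an assumption that happens to match the 43/29 counts reported here.

```r
# Hypothetical stand-in for the 72-row input table (simulated values)
set.seed(1)
df <- data.frame(Retention.Time = rnorm(72), matrix(rnorm(72 * 3), ncol = 3))

# Draw 60% of the row indices at random for training; the rest form the test set
n_train   <- round(0.6 * nrow(df))   # 0.6 * 72 = 43.2, rounded to 43
train_idx <- sample(nrow(df), n_train)
train <- df[train_idx, ]
test  <- df[-train_idx, ]

c(train = nrow(train), test = nrow(test))   # train 43, test 29
```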
Then I use the R Learner (Local) node as follows:
# PLS model
require(pls) # load PLS library
x <- R[, 2:87] # send all but the first column as X; Y is the first column
fit <- plsr(R$"Retention.Time" ~ as.matrix(x), ncomp = 5, data = R, validation = "LOO")
R <- fit # pass model to KNIME
Then I use the R Predictor node as follows:
require(pls) # load PLS library
pred <- predict(RMODEL, RDATA, ncomp = 5) # apply PLS to the test data 'RDATA'
R <- cbind(RDATA, pred) # <=====BUG HERE======>
This causes the following error message:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 29, 43 Calls: cbind -> cbind -> data.frame.
This is odd, because I would expect to get the test data alongside the test-set predictions.
If I change the code to debug the problem, I get the following:
R <- RDATA # shows that RDATA is the TEST set of 29 rows, so I'm sending the right data
R <- pred # is 43 rows, NOT 29, so it's predicting the TRAINING set, not the TEST set!
Unless I am being incredibly stupid (always possible), it looks as if the predictions are being applied to the training set held in the R model rather than to the test set supplied at the input.
I see the same problem with the standard R lm (multiple linear regression) model, so it's not specific to the pls package.
Could someone look into this? It's a showstopper for R-based regression methods.
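A likely explanation, sketched here with base R's `lm()` because the same behaviour is reported with it: when a formula refers to objects that are not columns of `data` (such as `R$"Retention.Time"` and `as.matrix(x)`), `predict()` cannot find those names in the new data. It therefore falls back to the formula's environment, which still holds the training objects. The simulated data and the names `x_train`, `y_train`, `fit_env` and `fit_df` below are hypothetical.

```r
set.seed(42)
# Simulated 43-row training set and 29-row test set (hypothetical data)
train <- data.frame(y = rnorm(43), x = rnorm(43))
test  <- data.frame(y = rnorm(29), x = rnorm(29))

# Pattern from the question: the formula refers to objects outside 'data'
x_train <- train$x
y_train <- train$y
fit_env <- lm(y_train ~ x_train)
# 'x_train' is not a column of 'test', so predict() looks it up in the
# formula environment and returns one value per TRAINING row (R also warns)
pred_env <- suppressWarnings(predict(fit_env, newdata = test))
length(pred_env)   # 43, not 29

# Fix: name the columns in the formula and pass the table through 'data'
fit_df  <- lm(y ~ x, data = train)
pred_df <- predict(fit_df, newdata = test)
length(pred_df)    # 29
out <- cbind(test, pred = pred_df)   # row counts now agree
```

For the PLS case, the corresponding learner line would presumably be `plsr(Retention.Time ~ ., data = R, ncomp = 5, validation = "LOO")`, assuming `R` contains only the response and the predictor columns. Note also that `predict()` on a `plsr` model returns a three-dimensional array (observations × responses × components), so `drop()` may help before `cbind()`.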
I am still struggling with the syntax here, even for the simple linear model in R.
Imagine I have two columns called "StdConc" and "Area" and a simple five-point linear regression:
R Learner node (fed with a 5-row, 2-column table):
X <- R$"StdConc"
Y <- R$"Area"
# fit model
model <- lm(Y ~ X, na.action = na.exclude)
# get model summary
summary(model)
R <- model
R Predictor node (fed with a new set of 17 data points):
R<-cbind(RDATA, predict(RMODEL, RDATA))
This gives the following error:
ERROR R Predictor Execute failed: Execution of R script failed: 'newdata' had 17 rows but variable(s) found have 5 rows
(This is because 'predict' is returning results from the 5-row TRAINING set, which makes me think there is still an error in this node.)
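If the cause is that the formula refers to the vectors X and Y rather than to table columns, the linear-model case can be fixed by naming the columns in the formula and passing the table as `data`. The sketch below simulates the KNIME variables (`R` in the learner, `RMODEL` and `RDATA` in the predictor) with made-up values so it runs in a plain R session. It assumes that the 17-row prediction table also has a column named `StdConc`. Note that `newdata` has to be a data frame containing that column, not a bare vector.

```r
# --- Simulated inputs (hypothetical values standing in for the KNIME tables) ---
R <- data.frame(StdConc = c(1, 2, 5, 10, 20),
                Area    = c(10.2, 19.8, 50.5, 99.1, 201.3))
new_points <- data.frame(StdConc = seq(0.5, 25, length.out = 17))

# --- R Learner script: columns named in the formula, table passed as 'data' ---
model <- lm(Area ~ StdConc, data = R, na.action = na.exclude)
R <- model                                   # hand the model to KNIME

# --- R Predictor script (inside KNIME, RMODEL and RDATA are provided) ---
RMODEL <- R
RDATA  <- new_points
pred <- predict(RMODEL, newdata = RDATA)     # StdConc is looked up inside RDATA
R <- cbind(RDATA, Predicted.Area = pred)     # 17 rows in, 17 rows out
```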
If I try:
X<-R$"Area" #(the new data = test set)
R<-cbind(X, predict(RMODEL, newdata=X))
I have now given up on the R Predictor node, as I found that the default RDATA dataset is some relic from a previous analysis (and 'R' doesn't work either).
I have a workaround:
I have a "Table to R" node with the following code:
###########################
#PLS (table has Y variable as last column)
###########################
require(pls)
x <- R[, 1:(ncol(R) - 1)] # all columns except the last one (Y)
y <- R$"Y"
fit <- plsr(y ~ ., ncomp = 5, data = x, validation = "LOO")
R <- fit # pass model to KNIME
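When selecting "all but the last column", it is easy to write `ncol(R-1)` where `ncol(R) - 1` is meant, and the two differ. `R - 1` subtracts 1 from every cell and still has as many columns as `R`, so `1:ncol(R-1)` keeps the response. Combined with `y ~ .`, that would let `Y` enter the model as a predictor. Here is a quick check on a made-up three-column table (the column names are hypothetical):

```r
# Hypothetical 3-column table with the response in the last column
R <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), Y = c(7, 8, 9))

ncol(R - 1)    # 3: 'R - 1' is still a 3-column table
ncol(R) - 1    # 2: one fewer than the number of columns

x_all   <- R[, 1:ncol(R - 1)]       # keeps a, b AND Y
x_preds <- R[, 1:(ncol(R) - 1)]     # keeps only a and b
names(x_preds)                      # "a" "b"
```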
And then I use a "R+ Table to R" node with the following code:
That's still odd. However, I would like to invite you to try out the new R (Interactive) integration released with KNIME 2.8 in KNIME Labs. Sorry again for the hassle; I hope we can sort things out while looking at the new nodes.
I am really looking forward to trying the new R (interactive) nodes, but am having trouble getting them to work.
When I tried to use the "R View (Table)" node, I got the following error:
R cannot be intialized.
R_HOME does not contain a folder with name "bin".
R_HOME is meant to be the path to the folder which is the root of Rs installation tree.
It contains a bin folder which itself contains the R executable.
Please change the R settings in the preferences.
So I set the Path to R Home for R (Labs) to "C:\Program Files\R\R-3.0.1". Now I get the following error:
R cannot be intialized.
null
What is the correct way of specifying the R_HOME? Any help would be appreciated.
The R_HOME is correct! I just added an FAQ entry explaining the solution for the new R (Labs) integration, see here. The fix is to point to jri.dll (64-bit) with a system property that needs to be added to knime.ini.
I followed the steps outlined in the FAQ regarding R (Interactive), except for "remove the feature org.knime.features.rengine.r2.feature.group from the installation ...", because I couldn't find org.knime.features.rengine.r2.feature.group in my installation.
When I try to configure any node from the R (Interactive) group, I still get the "R cannot be initialized" error. When I try to execute the node, I get the "RController" error. I am using KNIME 2.8.2 and R 3.0.1.
I think for R 3 you have to add the following line to your knime.ini:
-Djava.library.path=C:\Users\xyz\Documents\R\win-library\3.0\rJava\jri\x64
Hope this helps.
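The JRI library lives inside the rJava package, so its folder depends on which library rJava was installed into (a per-user win-library in the line above). One way to find the folder is to ask R. The sketch assumes, as in this thread, that 64-bit Windows builds keep jri.dll in an x64 subfolder. Check the printed listing before copying the line into knime.ini. R reports paths with forward slashes, so adjust the separators if your knime.ini uses backslashes.

```r
# Where does the installed rJava package keep its JRI native library?
jri_dir <- system.file("jri", package = "rJava")   # "" if rJava is not installed

if (nzchar(jri_dir)) {
  # Assumption: 64-bit R on Windows keeps jri.dll in an 'x64' subfolder
  if (.Platform$OS.type == "windows" && .Machine$sizeof.pointer == 8) {
    jri_dir <- file.path(jri_dir, "x64")
  }
  print(list.files(jri_dir, pattern = "jri"))      # should list the native library
  ini_line <- paste0("-Djava.library.path=", jri_dir)
  cat(ini_line, "\n")
} else {
  ini_line <- NA_character_
  message("rJava is not installed in this R library")
}
```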
That was the source of the problem! I had installed 64-bit R without the 32-bit libraries. I reinstalled R-3.0.2 64-bit and included the 32-bit libraries this time. This fixed the problem, and the R (Interactive) nodes in KNIME are now working.
I included a similar line in knime.ini. It is slightly different from the one you specified, to reflect the correct location of jri.dll on my machine. This is the line I included in knime.ini:
I am running KNIME 2.9.2 and am trying to set up the R (Interactive) nodes. I have been getting this error:
R cannot be intialized.
R_HOME is not a directory.
R_HOME is meant to be the path to the folder which is the root of Rs installation tree.
It contains a bin folder which itself contains the R executable.
Please change the R settings in the preferences.
I am on Mac OS X 10.7.5 with R 3.0.1.
I have set R_HOME in the preferences to many different paths (e.g. /Library/Frameworks/R.framework/Resources/R and /usr/bin/R).
I installed rJava.
In knime.ini I added: -Djava.library.path=/Library/Frameworks/R.framework/Versions/Current/Resources/library/rJava/jri/
I removed the feature org.knime.features.rengine.r2.feature.group from the installation.
After these steps I still get the same error. Do you have any suggestions as to what might be going wrong?
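KNIME's message says R_HOME must be the folder that contains bin, and "R_HOME is not a directory" suggests the configured path points at a file. As far as I know, on a standard macOS framework install both paths tried above are launcher scripts rather than folders. By contrast, `R.home()` reports the installation root that R itself uses; this is often /Library/Frameworks/R.framework/Resources on macOS, but check the printed value. Below is a small check to run in an R console. `check_r_home` is a hypothetical helper, not part of KNIME or R.

```r
# Candidate R_HOME values: the two tried above, plus what R itself reports
candidates <- c("/Library/Frameworks/R.framework/Resources/R",
                "/usr/bin/R",
                R.home())

# Hypothetical helper: is the path a directory, and does it contain 'bin'?
check_r_home <- function(path) {
  data.frame(path    = path,
             is_dir  = dir.exists(path),
             has_bin = dir.exists(file.path(path, "bin")),
             stringsAsFactors = FALSE)
}

res <- do.call(rbind, lapply(candidates, check_r_home))
print(res)   # a usable R_HOME should show TRUE in both columns
```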