Serious bug in R predictor? Always predicts Training set not Test set data

I am building a PLS model in R. I split the data into training and test sets using the Partitioning node (settings: Relative[%] 60, Draw randomly). This gives me a training set of 43 observations and a test set of 29 observations.

Then I use the R Learner (Local) node as follows:

#PLS model
require(pls)            #load PLS library
x<-R[,2:87]            #send all but first column as X, Y is the first column
fit <- plsr(R$"Retention.Time" ~as.matrix(x), ncomp=5 ,data=R,validation="LOO")
R<-(fit) # pass model to Knime

Then I use the R Predictor node as follows:

require(pls)             #load PLS library
pred<-predict(RMODEL,RDATA, ncomp=5)     #Apply PLS to test data 'RDATA'
R<-cbind(RDATA,pred) #<=====BUG HERE======>

Causes an error message:
Error in data.frame(..., check.names = FALSE) :  arguments imply differing number of rows: 29, 43 Calls: cbind -> cbind -> data.frame.

This is odd, because you would expect the test data alongside the test predictions.

If I change the code to debug the problem I get the following:

R<-RDATA    # shows that RDATA is the TEST set of 29 rows so I'm sending the right data
R<-pred       # is 43 rows NOT 29, so it's predicting the TRAINING set, not the TEST set!

Unless I am being incredibly stupid (always possible), it looks as if the predictions are being applied to the training set held in the R model not the test set being applied at the input.

I see the same problem when using the standard R lm (multiple linear regression) model, so it's not specific to the pls package.

Is it possible for someone to look into this? It's a showstopper for R-based regression methods.

Many thanks,

Mark
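
One possible explanation, not confirmed anywhere in this thread: the formula above refers to `R$"Retention.Time"` and `as.matrix(x)`, which are objects in the workspace rather than named columns, so `predict` may not be able to match them against the columns of the test table. The sketch below, run in plain R outside KNIME with made-up data, fits on named columns instead. `train` and `test` are hypothetical stand-ins for the learner's `R` and the predictor's `RDATA`, and the 10 predictor columns replace the 86 in the post.

```r
library(pls)

set.seed(1)
# Made-up table: 72 rows, response "Retention.Time" first, then 10 predictors
n <- 72
p <- 10
predictors <- as.data.frame(matrix(rnorm(n * p), n, p))
names(predictors) <- paste0("X", seq_len(p))
dat <- data.frame(Retention.Time = predictors$X1 + rnorm(n), predictors)

# 43 training rows and 29 test rows, as in the post
idx   <- sample(n, 43)
train <- dat[idx, ]
test  <- dat[-idx, ]

# The formula uses column names, and the table is passed through data=
fit <- plsr(Retention.Time ~ ., ncomp = 5, data = train, validation = "LOO")

# predict() can now look up the same column names in the new table
pred <- predict(fit, newdata = test, ncomp = 5)   # array: 29 x 1 x 1
out  <- cbind(test, Predicted = drop(pred))
nrow(out)
```

In the KNIME nodes the equivalent would presumably be `data = R` in the learner and `newdata = RDATA` in the predictor, but that mapping is an assumption.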

Dear Mark,

You could use this learner:

library(pls)
y <- your Y variable    # placeholder: the response
x <- as.matrix(R)
model <- mvr(y ~ x, 10, data = R, method = "simpls")  # ncomp = 10, for example
R <- model

For the predictor:

library(pls)
x <- as.matrix(R[your range ])
R <- cbind(x, predict(RMODEL, x))

This could work well for you.

bye

fab

I am still struggling with the syntax here, even for a simple linear model in R.

Imagine I have two columns called "StdConc" and "Area" and a simple 5-point linear regression:

R-Learner Node: (fed with a 5 row, 2 column table)

X<-R$"StdConc"
Y<-R$"Area"
# fit model
model <- lm (Y~X, na.action = na.exclude)
# get model summary
summary(model)
R<-model

R-Predictor node (fed with a new set of 17 data points)

R<-cbind(RDATA, predict(RMODEL, RDATA))

Gives the following error
ERROR     R Predictor     Execute failed: Execution of R script failed: 'newdata' had 17 rows but variable(s) found have 5 rows
(This is because 'predict' is returning results from the 5-row TRAINING set, which makes me think there is still an error in this node.)


If I try

X<-R$"Area" #(the new data = test set)
R<-cbind(X, predict(RMODEL, newdata=X))

I get the same error

and if I try

R<-cbind(R$"Area", predict(RMODEL, newdata=R$"Area"))

I get a prediction but again of the TRAINING data NOT the new data of 17 rows

So my question is: what is the right syntax to get it to predict the new data (test set)?

Many thanks
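
One possible reading of the error above, not confirmed in this thread: the model was fitted on free-standing vectors `X` and `Y`, so `predict` looks for a variable named `X` in the new table, does not find one, and falls back to the training values. A minimal sketch in plain R (outside KNIME, with made-up numbers) contrasts that with a formula that uses column names. `train` and `new_data` are hypothetical stand-ins for the learner's `R` and the predictor's `RDATA`.

```r
# Made-up calibration data: 5 training points, 17 new points
train    <- data.frame(StdConc = 1:5, Area = c(2.1, 3.9, 6.2, 8.1, 9.8))
new_data <- data.frame(StdConc = seq(0.5, 8.5, by = 0.5))

# Pattern from the post: the formula refers to vectors, not table columns
X <- train$StdConc
Y <- train$Area
model_vec <- lm(Y ~ X)
# new_data has no column called "X", so predict() reuses the training X
# and warns that 'newdata' had 17 rows but the variables found have 5
pred_vec <- suppressWarnings(predict(model_vec, newdata = new_data))
length(pred_vec)   # 5

# Alternative: name the columns in the formula and pass the table via data=
model_col <- lm(Area ~ StdConc, data = train)
pred_col  <- predict(model_col, newdata = new_data)
length(pred_col)   # 17

result <- cbind(new_data, Predicted = pred_col)
```

The new table only needs a column with the same name as the predictor in the formula (`StdConc` here); the response column may be absent.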

I have now given up on the R Predictor node, as I found that the default RDATA dataset is some relic from a previous analysis (and 'R' doesn't work either).

I have a workaround:

I have a "Table to R" node with the following code:

 

###########################
#PLS (table has Y variable as last column)
###########################
require(pls)
x<-R[,1:(ncol(R)-1)]   # predictors: every column except the last (Y)
y<-R$"Y"
fit <- plsr(y~., ncomp=5 ,data=x,validation="LOO")
R<-(fit) # pass model to Knime
 
And then I use a "R+ Table to R" node with the following code:
 
require(pls)
pred<-predict(fit,R, ncomp=5)
R<-pred

This seems to work as expected

That's still odd. However, I would like to invite you to test out the new R (interactive) integration released with KNIME 2.8 in KNIME Labs. Sorry again for the hassle and I hope we can sort out things while looking at the new nodes?

I am really looking forward to trying the new R (interactive) nodes, but am having trouble getting them to work.

 

When I tried to use the "R View (Table)" node, I got the following error:
    R cannot be intialized.
    R_HOME does not contain a folder with name "bin".
    R_HOME is meant to be the path to the folder which is the root of Rs installation tree.
    It contains a bin folder which itself contains the R executable.
    Please change the R settings in the preferences.

 

So, I set the Path to R Home for R (labs) to "C:\Program Files\R\R-3.0.1".  Now I get the following error:
     R cannot be intialized.
     null

 

What is the correct way of specifying the R_HOME?  Any help would be appreciated.

The R_HOME is correct! I just added an FAQ entry explaining the solution for the new R (labs) integration, see here. The fix is to point to the jri.dll (64-bit) with a system property that needs to be added to knime.ini.

I followed the steps outlined in the FAQ regarding R (Interactive).  All steps in the FAQ were followed except for "remove the feature org.knime.features.rengine.r2.feature.group from the installation ..." because I couldn't find org.knime.features.rengine.r2.feature.group in my installation.

When I try to configure any node from the R (Interactive) group, I am still getting "R cannot be initialized" error.  When I try to execute the node, I get the "RController" error.  I am using KNIME 2.8.2 and R 3.0.1.

What am I missing?  Any help would be appreciated.

I think for R 3 you have to add the following line to your knime.ini:

-Djava.library.path=C:\Users\xyz\Documents\R\win-library\3.0\rJava\jri\x64

Hope this helps?

Is it possible that you installed R 64-bit (without the 32-bit libs)? We currently have a problem when only 64-bit is installed; we are going to fix that.

That was the source of the problem! I had installed R 64bit without the 32-bit libs.  I re-installed R-3.0.2 64-bit and included the 32-bit libs this time.  This has fixed the problem.  R(Interactive) nodes in KNIME are now working.

Thank you for your help.

kannan

I included a similar line in knime.ini.  It is slightly different from the one you specified, to reflect the actual location of jri.dll on my machine.  This is the line I included in knime.ini:

-Djava.library.path=C:\Users\myusername\Documents\R\R\win-library\3.0\rJava\jri\x64

I am still getting "R cannot be initialized" and "RController" error.  What else could be causing this?

Thank you.
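
For anyone checking which folder to put in `-Djava.library.path`, R itself can report where rJava keeps its JRI library. This is a small sketch, assuming rJava was installed with `install.packages("rJava")`. The variable name `jri_dir` is illustrative, and the exact subfolder and file name vary by platform and architecture.

```r
# Folder inside the installed rJava package that holds the JRI native library;
# system.file() returns "" when the package or folder cannot be found
jri_dir <- system.file("jri", package = "rJava")

if (nzchar(jri_dir)) {
  print(jri_dir)
  # List candidate native libraries below that folder
  # (for example jri.dll under x64/ or i386/ on Windows)
  print(list.files(jri_dir, pattern = "jri", recursive = TRUE, full.names = TRUE))
} else {
  message("rJava (or its jri folder) was not found in the current library paths")
}
```

Comparing the printed folder with the path in knime.ini is a quick way to rule out a typo or a second R library location.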

 

We addressed and fixed this problem in KNIME 2.9.

Hello,

I am running KNIME 2.9.2 and am trying to set it up to use the R (Interactive) nodes. I have been getting this error:

R cannot be intialized.
R_HOME is not a directory.
R_HOME is meant to be the path to the folder which is the root of Rs installation tree.
It contains a bin folder which itself contains the R executable.
Please change the R settings in the preferences.

I am on Mac OS 10.7.5 with R 3.0.1

  • I set the R_HOME in the preferences to many things (e.g. I tried /Library/Frameworks/R.framework/Resources/R and /usr/bin/R).
  • I installed rJava.
  • In knime.ini I put: -Djava.library.path=/Library/Frameworks/R.framework/Versions/Current/Resources/library/rJava/jri/
  • I removed the feature org.knime.features.rengine.r2.feature.group from the installation.

After these steps I still have the same error. Do you have suggestions what might be going wrong?

Many Thanks
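
One way to find a candidate value for the R_HOME preference is to ask R directly. A minimal sketch, run in an ordinary R session; whether KNIME accepts exactly this path is an assumption, not something this thread confirms, and `r_home` is just an illustrative name.

```r
# Root of the running R installation (a directory, not the R executable)
r_home <- R.home()
print(r_home)

# The KNIME error message expects a "bin" folder directly under R_HOME
print(dir.exists(file.path(r_home, "bin")))
```

A path such as /usr/bin/R typically points at the R executable itself rather than at a directory, which would be consistent with the "R_HOME is not a directory" message.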

Hiyas, 

I actually use: /Library/Frameworks/R.framework/Versions/3.0/Resources/library/rJava/jri

in my knime.ini, could you give that a try?

Regards,

Aaron