Cross Validation Help

Hi

I’m learning to use the KNIME cross validation node. I was able to build a model using the “leave one out” method and Weka’s SMOreg learner.
I followed the online instructions for the cross validation node, but I’m not sure how to take this model and score new data with it.
For example, say I have another dataset where the values of my outcome are not yet available (say, future sales), so I need to predict them using the SMOreg model developed in the cross validation node. How can I do that?

Normally, I would use the “predictor” and connect it to both the learner and a separate dataset, but I don’t know if I can attach a predictor to a cross validation node and if so, how to do that.

To make things easier to explain, here is a picture of what the workflow looks like now (click on the picture to enlarge it):

http://yfrog.com/3dcrossvalidationj

Also, here’s my workflow:

http://www.megaupload.com/?d=WBXIKPYW

Just enter the code in the upper right corner into the box next to it and the zip file will start downloading.

Thank you very much for your help !!!

Cross validation is not intended to train models for future use. It is meant to estimate how well the model will perform on data similar to the training and test data. Why not train the SMOReg node on your complete test data?
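
To illustrate what leave-one-out cross validation produces, here is a minimal Python sketch. It is not KNIME or Weka code: a simple one-variable least-squares fit stands in for SMOreg, and the names `fit_line`, `leave_one_out_mae`, `ad_spend` and `sales`, as well as all the numbers, are made up for illustration. Each fold fits a throwaway model on all rows but one, and only the error on the held-out row is kept.

```python
# Leave-one-out cross validation with a stand-in learner.
# Assumption: 1-D least squares replaces SMOreg; the data are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def leave_one_out_mae(xs, ys):
    """Mean absolute error over n folds, each holding out one row."""
    errors = []
    for i in range(len(xs)):
        # Train on every row except row i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # Score the held-out row; the fold model itself is not kept.
        errors.append(abs(a * xs[i] + b - ys[i]))
    return sum(errors) / len(errors)


ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical predictor
sales = [2.1, 3.9, 6.2, 7.8, 10.1]     # hypothetical known outcome

mae = leave_one_out_mae(ad_spend, sales)
print("estimated error on unseen rows:", mae)
```

The result is a single error estimate, not a model you can attach a predictor to.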

Hi Thor, thank you for your reply.

I didn’t know cross validation was not used to score a model on a new dataset.

I thought it was basically an ensemble of SMOReg models, each fit on a piece of the data, that vote or have their outputs averaged at the end. I guess I was wrong.

By the way, what do you mean by “Why not train the SMOReg node on your complete test data?”

In this case I need to predict future values because the values of my outcome are not yet available.
Therefore, I wouldn’t be able to train the SMOReg learner node on a dataset that only has values of the predictors (and not the outcome).

Maybe I misunderstood what you were saying.

Thanks

But you used some dataset with target values for your cross-validation, didn’t you? Why don’t you just use this complete dataset to train the SMOReg model?
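
As a rough sketch of that idea in Python (again not KNIME or Weka code): a one-variable least-squares fit stands in for SMOreg, and `fit_line`, `predict`, `labeled_x`, `labeled_y` and `new_x` are hypothetical names with invented values. The model is trained once on every row that has a target value and then applied to the rows that do not, which is the usual learner plus predictor setup.

```python
# Train once on ALL labeled rows, then score rows with no outcome yet.
# Assumption: 1-D least squares replaces SMOreg; the data are invented.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def predict(model, xs):
    """Apply a fitted (a, b) model to new predictor values."""
    a, b = model
    return [a * x + b for x in xs]


# The same table that was used for cross validation (targets known).
labeled_x = [1.0, 2.0, 3.0, 4.0, 5.0]
labeled_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# New rows: predictor values known, outcome not yet available.
new_x = [6.0, 7.0]

final_model = fit_line(labeled_x, labeled_y)   # the "learner" step
future_sales = predict(final_model, new_x)     # the "predictor" step
print(future_sales)
```

The cross validation error then serves as the estimate of how well this final model is likely to do on the new rows.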

Hi Thor,

Yes, of course I can do that.

Thanks for taking the time to clarify.