Test and Training Data

Using the LinearRegression (3.6) node in Weka to create a model from a sample of a larger data set produces the error "column has more possible values in test data than in the training data" when that trained model is applied to all of the data. Is there any way to ignore these new possible values in the test data and just run the predictor using the variables created from the training data?


Are you sure you are using the correct predictor node? Linear Regression shouldn't look at possible values at all since it's using numbers only.

(The Linear Regression node can handle nominal values too, as those are modeled as individual variables, one for each possible value.)
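To illustrate why a value that appears only in the test data is a problem, here is a minimal plain-Python sketch of dummy (indicator) encoding, assuming each nominal value seen in training becomes its own 0/1 variable. This is not Weka's or KNIME's internal code; the function names `dummy_columns` and `encode` are hypothetical.

```python
def dummy_columns(training_values):
    """Return the sorted distinct nominal values seen in the training data.
    Each one becomes a separate 0/1 input variable (assumed encoding)."""
    return sorted(set(training_values))


def encode(value, columns):
    """Encode one nominal value as indicator variables.
    A value never seen in training has no column, so it cannot be encoded."""
    if value not in columns:
        raise ValueError(f"value {value!r} not seen in training data")
    return [1 if value == c else 0 for c in columns]


train_colors = ["red", "green", "red", "blue"]
cols = dummy_columns(train_colors)
print(cols)                  # ['blue', 'green', 'red']
print(encode("red", cols))   # [0, 0, 1]
```

Calling `encode("purple", cols)` raises a `ValueError`, which is the same situation the error message describes: the model has no variable for the new value.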

I got it to work by filtering the test data so that its possible values match the training data values. I did drop some test data, but the amount was insignificant.
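The workaround of filtering the test data can be sketched outside KNIME/Weka in plain Python, assuming rows are dictionaries and you know which columns are nominal. The helper `filter_unseen` and the sample column `region` are hypothetical; the sketch keeps only test rows whose nominal values all occur in the training data and reports how many rows were dropped, so you can check that the loss is insignificant.

```python
def filter_unseen(train_rows, test_rows, nominal_columns):
    """Drop test rows containing nominal values absent from the training data.
    Returns (kept_rows, number_of_dropped_rows)."""
    # Collect the set of allowed values for each nominal column from training.
    allowed = {col: {row[col] for row in train_rows} for col in nominal_columns}
    kept = [
        row for row in test_rows
        if all(row[col] in allowed[col] for col in nominal_columns)
    ]
    return kept, len(test_rows) - len(kept)


train = [{"region": "north", "x": 1.0}, {"region": "south", "x": 2.0}]
test = [
    {"region": "north", "x": 1.5},
    {"region": "east", "x": 3.0},   # "east" never appears in training
    {"region": "south", "x": 2.5},
]
kept, dropped = filter_unseen(train, test, ["region"])
print(dropped)  # 1
```

Dropping rows is the simplest option; the trade-off is that predictions are not produced for those rows at all.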