Very bad result in regression

I have a dataset with two inputs and one output. I applied linear regression with 10-fold cross-validation, but the scorer shows a negative R². This did not improve when I reduced the inputs from two to one or used leave-one-out cross-validation (LOOCV).

correlation matrix-.xlsx (5.2 KB)
KNIME_project7.knwf (46.2 KB)

Hi zizoo,
It looks like you do not have enough data: 10-fold cross-validation on 55 samples does not make much sense. Furthermore, if you plot the values with a Scatter Plot node, you will see that the relationship between the dependent and independent variables does not appear to be linear.
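Outside of KNIME, the same sanity check can be sketched in Python with scikit-learn (assumed available). The data below is synthetic and only stands in for the actual 55-sample dataset: the target is deliberately non-linear in the input, which is enough to drive the cross-validated R² negative with a linear model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a small dataset whose target is NOT a linear
# function of the input (here y = x^2): 55 samples, as in the original post.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 55)
y = x ** 2 + rng.normal(0, 0.2, size=55)
X = x.reshape(-1, 1)

# 10-fold CV on 55 samples leaves only 5-6 points per test fold, and a
# linear model cannot capture the quadratic shape, so the cross-validated
# R^2 comes out negative -- worse than just predicting the mean.
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
print("fold R^2:", np.round(scores, 2))
print("mean R^2:", scores.mean())
```

A negative R² here is not a bug in the scorer: it simply means the fitted line predicts held-out points worse than a constant mean would.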

I tried to train a Random Forest model without cross-validation, using an 80/20 partitioning into train and test data (the Partitioning node), and got R² = 0.32.
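The same experiment can be sketched in scikit-learn: an 80/20 split followed by a random-forest regressor. The dataset below is synthetic (the real data is not reproduced here), so the resulting R² will not match the 0.32 figure above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic two-input dataset with a mildly non-linear target, standing in
# for the real data.
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(200, 2))
y = X[:, 0] + X[:, 1] ** 2 + rng.normal(0, 0.3, size=200)

# 80/20 partitioning, as with the KNIME Partitioning node.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print("test R^2:", r2_score(y_test, model.predict(X_test)))
```

Unlike linear regression, the forest can pick up the non-linear part of the relationship, which is why it scores better on the held-out data.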


Hi Anna,
I have more data, but it is unbalanced, which is why I decided to discard it.
From a classification point of view, I think it is possible to add class weights in an SVM, but I am not sure how to do this for regression problems with the Weka toolbox.
Do you recommend any other Weka regression tools for this type of dataset?

Hi Zied,

You might want to convert your regression problem into a classification problem (Number to String node) and check the LibSVM Weka classifier, which has an option for assigning weights to classes.
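The two steps above can be sketched with scikit-learn rather than Weka (the data, class labels, and cluster positions below are illustrative assumptions): first the numeric target is binned into string classes (the counterpart of the Number to String node), then an SVM is trained with per-class weights, the scikit-learn analogue of LibSVM's class-weight option.

```python
import numpy as np
from sklearn.svm import SVC

# Imbalanced synthetic data: 90 samples of class "low", 10 of class "high",
# standing in for a numeric target that was binned into string classes.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(90, 2)),   # majority cluster
    rng.normal(3.0, 0.5, size=(10, 2)),   # minority cluster
])
y = np.array(["low"] * 90 + ["high"] * 10)

# class_weight="balanced" reweights each class inversely to its frequency,
# so the minority class is not drowned out during training.
clf = SVC(kernel="rbf", class_weight="balanced")
clf.fit(X, y)
pred = clf.predict(X)
print("predicted classes:", sorted(set(pred)))
```

Explicit weights per class (e.g. `class_weight={"high": 9.0, "low": 1.0}`) work the same way if you want finer control than the automatic balancing.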