Simple Regression Tree predictions show zero deviation from the actual values

I'm repurposing my workflow to predict a different column out of the 40 columns in my data. The training data has 1768 records and the test data has 442 records. The problem, which I've now seen for 6 different columns, is that abs($Predicted_Value$ - $Actual_Value$) is 0 for every record with the Simple Regression Tree (Learner and Predictor). The same differences for Random Forest and Tree Ensemble are very close to 0 as well. When I initially ran the workflow to predict Hours worked, the error margin was approximately 27% (which is probably pretty good considering the small data size). However, whenever I run it for any other field, the prediction matches the exact value of that field. I've tried resetting the Reader, Partitioner and Learner nodes, to no avail. Any ideas?

Hi Karl,

It is not easy to diagnose this without access to your test data set, but it is possible that this is not a malfunction of KNIME.

My hypothesis is that your data set contains a high number of repeated records. Since you are training on 80% of the data set (a fairly large share) and testing on the remaining 20%, heavy repetition would mean that the training set covers almost all of the cases in the test set, which would explain the high performance of the predictor.
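One way to test the repetition hypothesis outside KNIME is to measure how many test records also occur verbatim in the training data. The following is only a minimal sketch in plain Python; the function name `duplicate_coverage` and the sample rows are made up for illustration and are not taken from the original data.

```python
def duplicate_coverage(train_rows, test_rows):
    """Return the fraction of test rows that also occur verbatim in the training rows."""
    # Rows are compared as whole tuples (all columns, target included).
    seen = {tuple(row) for row in train_rows}
    if not test_rows:
        return 0.0
    hits = sum(1 for row in test_rows if tuple(row) in seen)
    return hits / len(test_rows)


# Hypothetical example rows, not the poster's data.
train = [(1, "a", 40), (2, "b", 35), (1, "a", 40)]
test = [(1, "a", 40), (3, "c", 20)]

coverage = duplicate_coverage(train, test)
print(coverage)
```

A coverage close to 1.0 would support the idea that the trees are largely looking up records they have already seen.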

Did you try to simply plot the data to check what they look like?

Did you try to use less data in your training set and verify what happens to the performance of the predictors?
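The effect of a smaller training set can be illustrated with a toy experiment. The sketch below does not use a real regression tree: it uses a lookup table that memorizes the mean target per distinct feature row, as a rough stand-in for a fully grown tree on heavily repeated data. The function name `memorizing_mae` and the synthetic data are assumptions for illustration only.

```python
from statistics import mean


def memorizing_mae(rows, targets, train_fraction):
    """Fit a lookup-table predictor on the first part of the data; return the MAE on the rest."""
    n_train = max(1, int(len(rows) * train_fraction))
    train_x, train_y = rows[:n_train], targets[:n_train]
    test_x, test_y = rows[n_train:], targets[n_train:]

    # Memorize all targets seen for each distinct feature row.
    table = {}
    for x, y in zip(train_x, train_y):
        table.setdefault(x, []).append(y)
    fallback = mean(train_y)  # prediction for feature rows never seen in training

    errors = [
        abs((mean(table[x]) if x in table else fallback) - y)
        for x, y in zip(test_x, test_y)
    ]
    return mean(errors)


# Hypothetical data: only 5 distinct feature rows, each repeated 200 times.
rows = [(i % 5,) for i in range(1000)]
targets = [10.0 * (i % 5) for i in range(1000)]

mae_large = memorizing_mae(rows, targets, 0.8)    # training set covers every distinct row
mae_small = memorizing_mae(rows, targets, 0.002)  # training set misses most distinct rows
print(mae_large, mae_small)
```

If the error only becomes non-zero once the training set is made very small, repetition in the data is a plausible explanation for the perfect predictions.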

How many tree levels are you allowing in the learner nodes? Have you tried limiting the trees to fewer levels and checking the effect on the performance of the predictors?


Without the data, one can indeed only guess. I assume you removed all columns that directly or indirectly act as unique identifiers from the data set before feeding it to the tree?
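A quick way to look for such columns is to flag any input column that is nearly unique per row, or whose values each map to exactly one target value (for example a copy or a rescaled version of the target). This is a rough sketch in plain Python; the function name `leakage_report`, the 0.95 threshold and the sample columns are all hypothetical.

```python
def leakage_report(rows, target, unique_ratio=0.95):
    """Map each suspicious input column to the reasons it was flagged."""
    n = len(rows)
    report = {}
    for col in rows[0]:
        if col == target:
            continue
        reasons = []

        # Identifier-like columns have (almost) one distinct value per row.
        values = [row[col] for row in rows]
        if len(set(values)) / n >= unique_ratio:
            reasons.append("nearly unique per row")

        # A column "determines" the target if each of its values
        # always appears with the same target value.
        targets_by_value = {}
        for row in rows:
            targets_by_value.setdefault(row[col], set()).add(row[target])
        if all(len(t) == 1 for t in targets_by_value.values()):
            reasons.append("determines target")

        if reasons:
            report[col] = reasons
    return report


# Hypothetical records, not the poster's data.
records = [
    {"id": 1, "dept": "a", "hours": 40, "hours_x2": 80},
    {"id": 2, "dept": "a", "hours": 35, "hours_x2": 70},
    {"id": 3, "dept": "b", "hours": 40, "hours_x2": 80},
    {"id": 4, "dept": "b", "hours": 20, "hours_x2": 40},
]

report = leakage_report(records, target="hours")
print(report)
```

Any column flagged this way is worth excluding from the learner inputs before re-running the workflow, since a tree can reproduce the target exactly from it.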