Evaluating regression modelling results

cason · February 14, 2019, 7:53pm

Hi,

I have set up a regression model to try to predict a customer's spend in Q3 from their spend in Q1 and Q2 of the same year, plus frequency and recency scores for the same time horizon. The input variables I used, in full, are:

CustomerID
Frequency Score - Compiled score of how often the customer has purchased
Recency Score - Compiled score of how recently the customer has purchased
SpendQ1 - Total Monetary Spend in Q1 of the year
SpendQ2 - Total Monetary Spend in Q2 of the year
SpendQ3 - Target Attribute

To train the Linear Regression Learner, I partitioned the data set 70/30 and sent the 70% partition to the learner (minus the CustomerID attribute) and the 30% partition to the predictor. The scorer result I get from the predictor looks like:

and if I plot the SpendQ3 and the Predicted SpendQ3 it looks like:

To me this looks pretty good in terms of predictive output, but I have learned to always be a bit careful with regression outputs and would like a second opinion. Do the results (and the method used to obtain them) look accurate and trustworthy, or am I missing something fundamental?
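
For anyone who wants to reproduce the setup outside the workflow, here is a minimal Python sketch of the same steps: a 70/30 partition, dropping CustomerID, fitting a linear regression on the 70% part and predicting SpendQ3 for the 30% part. The data is synthetic and made up purely for illustration, the helper names (`solve_linear_system`, `fit_ols`, `predict`) are hypothetical, and the fit uses plain ordinary least squares via the normal equations, which may differ in detail from what the Linear Regression Learner does internally.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, targets):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coefs, rows):
    """Apply fitted coefficients to new rows."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]

# Synthetic customers (assumption: invented data, not the real data set).
rng = random.Random(0)
data = []
for customer_id in range(200):
    frequency = rng.randint(1, 5)
    recency = rng.randint(1, 5)
    spend_q1 = rng.uniform(0, 1000)
    spend_q2 = rng.uniform(0, 1000)
    spend_q3 = (20 + 10 * frequency + 5 * recency
                + 0.4 * spend_q1 + 0.5 * spend_q2 + rng.gauss(0, 25))
    data.append((customer_id, frequency, recency, spend_q1, spend_q2, spend_q3))

# Random 70/30 partition.
rng.shuffle(data)
cut = len(data) * 70 // 100
train, test = data[:cut], data[cut:]

# Drop CustomerID (column 0); SpendQ3 (last column) is the target.
x_train = [row[1:5] for row in train]
y_train = [row[5] for row in train]
x_test = [row[1:5] for row in test]
y_test = [row[5] for row in test]

coefs = fit_ols(x_train, y_train)
predicted = predict(coefs, x_test)
```

The point of the sketch is only that the model never sees the 30% partition during fitting, so any score computed from `y_test` and `predicted` is an out-of-sample score.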

Thanks so much for your input.

ipazin · February 20, 2019, 1:09pm

Hi there!

Sorry for the slow response. Do you doubt your data, your skills, or the power of linear regression?

I’m not sure a model can be properly evaluated without seeing the data and knowing its context, but from what you wrote I don’t see you missing anything fundamental. Are all 4 variables statistically significant in your model? I guess the R² from the Predictor node is also high?
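
If you want to double-check R² by hand on the 30% partition, a minimal Python sketch could look like the following. The numbers are made up for illustration, the function names are my own, and I am assuming the standard definition R² = 1 - SS_res / SS_tot; RMSE is included because it is in the same units as the spend.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical hold-out values (assumption: not the poster's data).
actual_q3 = [100.0, 200.0, 300.0, 400.0]
predicted_q3 = [110.0, 190.0, 310.0, 390.0]

print(round(r_squared(actual_q3, predicted_q3), 3))  # 0.992
print(round(rmse(actual_q3, predicted_q3), 3))       # 10.0
```

A high R² on the hold-out partition, rather than only on the training partition, is what would support trusting the predictions.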

I don’t know what industry you are in, but the variables you have should explain your target attribute pretty well.

From