Hi,
I have set up a regression model to predict a customer's spend in Q3 from their spend in Q1 and Q2 of the same year, plus frequency and recency scores for the same time horizon. The full list of input variables I used is:
CustomerID
Frequency Score - Compiled score of how often the customer has purchased
Recency Score - Compiled score of how recently the customer has purchased
SpendQ1 - Total monetary spend in Q1 of the year
SpendQ2 - Total monetary spend in Q2 of the year
SpendQ3 - Target Attribute
To train the Linear Regression Learner, I partitioned the data set 70/30 and sent the 70% partition to the learner (minus the CustomerID attribute) and the 30% partition to the predictor. The scorer result I get from the predictor looks like this:
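For reference, the workflow in the previous paragraph (random 70/30 partition, CustomerID left out of the features, a linear regression fitted on the 70% and scored on the 30%) can be sketched in plain Python. This is an illustration under stated assumptions, not the tool's own implementation: the column names come from the variable list above, the data are synthetic, ordinary least squares is solved through the normal equations, and R², MAE and RMSE are assumed to be representative of what a numeric scorer reports.

```python
import math
import random

# Column names follow the post; CustomerID is an identifier, so it is not a feature.
FEATURES = ["FrequencyScore", "RecencyScore", "SpendQ1", "SpendQ2"]
TARGET = "SpendQ3"


def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training set and a hold-out (test) set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + [float(r[f]) for f in FEATURES] for r in rows]  # leading 1.0 is the intercept
    y = [float(r[TARGET]) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, one coefficient per feature in FEATURES order]


def predict(beta, row):
    return beta[0] + sum(b * row[f] for b, f in zip(beta[1:], FEATURES))


def score(actual, predicted):
    """R^2, mean absolute error and root mean squared error on the given rows."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return {
        "r2": 1 - ss_res / ss_tot,
        "mae": sum(abs(a - p) for a, p in zip(actual, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Synthetic customers, made up for illustration only (not real data).
rng = random.Random(0)
rows = []
for cid in range(200):
    q1, q2 = rng.uniform(0, 500), rng.uniform(0, 500)
    freq, rec = rng.randint(1, 5), rng.randint(1, 5)
    q3 = 20 + 0.3 * q1 + 0.6 * q2 + 10 * freq - 5 * rec + rng.gauss(0, 25)
    rows.append({"CustomerID": cid, "FrequencyScore": freq, "RecencyScore": rec,
                 "SpendQ1": q1, "SpendQ2": q2, "SpendQ3": q3})

train, test = partition(rows)   # 70% to the learner, 30% held out
beta = fit_ols(train)           # fit on the training partition only
metrics = score([r[TARGET] for r in test], [predict(beta, r) for r in test])
print({name: round(value, 3) for name, value in metrics.items()})
```

The important design point is that the metrics are computed only on the 30% the model never saw during fitting; scoring on the training rows would overstate the fit.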
And if I plot SpendQ3 against the predicted SpendQ3, it looks like this:
To me this looks pretty good in terms of predictive output, but I have learned to be careful with regression outputs and would like a second opinion. Do the results (and the method used to obtain them) look accurate and trustworthy, or am I missing something fundamental?
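One common way to judge whether a good-looking fit is trustworthy is to score simple baselines on the same hold-out rows. The sketch below assumes two naive rules that are not part of the original setup: predicting the training-set mean of SpendQ3 for every customer, and carrying SpendQ2 forward as the Q3 prediction. The helper names and all numbers are hypothetical, chosen only to show the mechanics.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination relative to the mean of `actual`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def compare_to_baselines(test_rows, model_predictions, train_mean_q3):
    """Score the model and two naive rules on the same hold-out rows."""
    actual = [r["SpendQ3"] for r in test_rows]
    candidates = {
        "model": model_predictions,
        "train mean": [train_mean_q3] * len(actual),             # everyone gets the training average
        "SpendQ3 = SpendQ2": [r["SpendQ2"] for r in test_rows],  # carry last quarter forward
    }
    return {name: {"mae": mae(actual, p), "r2": r2(actual, p)}
            for name, p in candidates.items()}


# Made-up hold-out rows and predictions, only to show the mechanics.
test_rows = [
    {"SpendQ2": 100, "SpendQ3": 110},
    {"SpendQ2": 200, "SpendQ3": 190},
    {"SpendQ2": 50, "SpendQ3": 70},
    {"SpendQ2": 300, "SpendQ3": 330},
]
results = compare_to_baselines(test_rows, model_predictions=[105, 195, 60, 320], train_mean_q3=175)
for name, m in results.items():
    print(f"{name:>18}: MAE={m['mae']:.1f}  R2={m['r2']:.3f}")
```

The comparison is relative: if the model's hold-out error is only slightly lower than that of the carry-forward rule, a high R² mostly reflects how closely SpendQ3 tracks SpendQ2 rather than extra information from the frequency and recency scores.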
Thanks so much for your input.