Churn Prediction Modeling

I am currently constructing a Churn Prediction model. I watched the YouTube tutorial video (very helpful!) on how to construct one, and I have completed one using my data. I was hoping to get some further assistance with interpreting the results and improving the accuracy of the model.

Any help at all is greatly appreciated!


Hello srwolfendale,

In the example shown on YouTube we used a decision tree because of its nice tree visualization and highlighting property. However, be aware that you can use any other available machine learning algorithm as long as it produces nominal class-like predictions (for instance, a Random Forest). A Random Forest will often improve your prediction, since it trains many decision trees and combines their votes (note that a Random Forest generally needs a fairly large amount of data).

Whatever machine learning algorithm you choose, you always need to train it and evaluate it. For this reason, the Partitioning node is required to set aside most of the data (~70-80%) for training and the smaller remaining part (~20-30%) for evaluation.
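
If it helps to see the idea outside of KNIME, here is a minimal plain-Python sketch of what such a split does. It is only an illustration, not how the Partitioning node is implemented; the function name `partition`, the seed, and the customer IDs are made up for the example.

```python
import random

def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle rows and split them into a training and an evaluation set."""
    shuffled = list(rows)  # copy so the caller's data is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical customer IDs standing in for real data rows.
customers = list(range(100))
train, test = partition(customers, train_fraction=0.8)
print(len(train), len(test))
```

Every row ends up in exactly one of the two sets, so the evaluation rows are never seen during training.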

In addition, you can often increase the accuracy of your prediction with the following techniques:

- treat missing and outlier values;

- feature engineering -> this step helps to extract more information from existing data. The new information takes the form of new features, which may have a higher ability to explain the variance in the training data;

- feature selection -> find the subset of attributes that best explains the relationship between the independent variables and the dependent variable;

- use an ensemble method (this technique combines the results of multiple weak models to produce better results). Please have a look at the Will they Blend series post about this.
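
To make the first point in the list a bit more concrete, here is a small plain-Python sketch of one possible treatment: fill missing values with the median and cap outliers at fixed bounds. The column, the bounds, and the helper name `impute_and_cap` are assumptions for illustration; the right strategy depends on your data.

```python
from statistics import median

def impute_and_cap(values, lower, upper):
    """Replace missing entries (None) with the median, then clip to [lower, upper]."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [min(max(fill if v is None else v, lower), upper) for v in values]

# Hypothetical monthly call minutes; None marks a missing value, 9000 an outlier.
minutes = [120, None, 95, 9000, 130]
cleaned = impute_and_cap(minutes, lower=0, upper=1000)
print(cleaned)
```

The median is used here rather than the mean because the mean itself would be pulled upwards by the outlier.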

Sometimes an improvement in the model’s accuracy can be due to over-fitting. To check for this, you can use the cross-validation technique. This method helps to achieve more generalized relationships.
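
As a rough sketch of how cross-validation works: the data is cut into k folds, and each fold is used once for testing while the others are used for training. The "model" below is just a placeholder that always predicts the most frequent class, and the labels are invented; in practice you would shuffle the rows first and plug in your real learner.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; each row is used for testing exactly once."""
    indices = list(range(n_rows))
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def majority_class(labels):
    """Placeholder 'model': always predict the most frequent training label."""
    return max(set(labels), key=labels.count)

# Hypothetical churn labels (1 = churned).
labels = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = []
for train_idx, test_idx in k_fold_indices(len(labels), k=5):
    prediction = majority_class([labels[i] for i in train_idx])
    hits = sum(labels[i] == prediction for i in test_idx)
    scores.append(hits / len(test_idx))

mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

If the accuracy varies a lot from fold to fold, or is much lower than the accuracy on the training data, that is a hint that the model is over-fitting.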

For interpretation (from 

So, we trained a model. But what if the model has not learned anything useful? We need to evaluate it before running it for real on real data. For the evaluation, we take the 20% of the data that we kept aside and did not use in the training phase, and feed it to a Decision Tree Predictor node. This node applies the model to all data rows one by one and produces the probability that each customer will churn, given his/her contract and operational data (P(Churn=0/1)). Depending on the value of this probability, a predicted class is assigned to the data row (Prediction (Churn) =0/1).
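
The step from probability to predicted class can be pictured like this. The sketch assumes a binary problem and a cut-off of 0.5, which is an assumption made for the example and not a statement about how the predictor node handles ties; the probabilities are invented.

```python
def predict_class(p_churn, threshold=0.5):
    """Assign class 1 (churn) when P(Churn=1) reaches the threshold, else 0."""
    return 1 if p_churn >= threshold else 0

# Hypothetical P(Churn=1) values for four customers.
probabilities = [0.10, 0.65, 0.50, 0.32]
predictions = [predict_class(p) for p in probabilities]
print(predictions)
```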

The number of times that the predicted class coincides with the original churn class is the basis for any measure of model quality calculated by the Scorer node.
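
As an illustration of the kind of counting involved, here is a short plain-Python sketch that builds the four confusion-matrix counts and derives accuracy and recall for the churn class from them. The labels are made up, and this is not the Scorer node's code, just the standard definitions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, treating 1 (churn) as positive."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1
        elif a == 0 and p == 1:
            counts["FP"] += 1
        elif a == 0 and p == 0:
            counts["TN"] += 1
        else:
            counts["FN"] += 1
    return counts

# Hypothetical original and predicted churn classes for eight customers.
actual = [0, 0, 1, 1, 0, 1, 0, 0]
predicted = [0, 1, 1, 0, 0, 1, 0, 0]

c = confusion_counts(actual, predicted)
accuracy = (c["TP"] + c["TN"]) / len(actual)
recall_churn = c["TP"] / (c["TP"] + c["FN"])  # share of real churners that were caught
print(c, accuracy, recall_churn)
```

Accuracy alone can look good even when most churners are missed, so it is worth checking the recall on Churn=1 as well.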

Notice that the customers with Churn=0 are, hopefully, many more than the customers with Churn=1. If you want to take this fact into account and give more weight to the errors made on the class Churn=1, then you can introduce an Equal Size Sampling node on the test set to under-sample the more numerous class Churn=0.
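
Under-sampling to equal class sizes can be sketched as follows. This is an assumed, simplified version of the idea (random sampling down to the size of the smallest class) and not the node's actual implementation; the rows and the helper name `equal_size_sample` are invented.

```python
import random

def equal_size_sample(rows, label_of, seed=0):
    """Randomly under-sample every class down to the size of the smallest class."""
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced

# Hypothetical rows of (customer_id, churn): 90 non-churners and 10 churners.
rows = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]
balanced = equal_size_sample(rows, label_of=lambda row: row[1])
print(len(balanced), sum(churn for _, churn in balanced))
```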

Notice also that the Scorer node ‒ or any other scoring node ‒ allows you to evaluate and compare different models. A subsequent Sorter node would allow you to select and retain only the best performing model.

Hope this helps,




I would add a few resources that are of interest for the churn prediction problem:

The great thing about these resources is that they go way beyond the data modeling step.

For a more problem-oriented approach, you can also check out the website Cross Validated.



Thank you very much for your detailed response, Vincenzo. This has been tremendously helpful in developing my Predictive Churn model. Would you have time to discuss where I currently am and any advice you have for taking it to the next level?