In the example shown on YouTube we used a decision tree because of its nice tree visualization and highlighting properties. However, be aware that you can use any other available machine learning algorithm, as long as it produces nominal (class) predictions: for instance, a Random Forest. A Random Forest should improve your predictions, since it trains multiple decision trees at once (note, though, that applying a Random Forest algorithm requires a fairly large amount of data).
Whatever machine learning algorithm you choose, you always need to train it and evaluate it. For this reason, the Partitioning node is required to set aside most of the data (~70-80%) for training and the small remaining amount (~20-30%) for evaluation.
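Outside KNIME, the Partitioning step amounts to a random split of the rows; here is a minimal sketch in plain Python (the 100 integer "rows" and the 70/30 ratio are just illustrative stand-ins):

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into a training set and an evaluation set."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = rows[:]              # copy, so the input list stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))             # stand-in for 100 data rows
train, test = partition(rows)
print(len(train), len(test))        # 70 30
```

In KNIME you would normally also enable stratified sampling on the churn column, so both partitions keep the same class proportions.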
In addition, you can increase the accuracy of your predictions with the following techniques:
- treat missing values and outliers;
- feature engineering -> this step helps to extract more information from the existing data, in the form of new features. These features may have a higher ability to explain the variance in the training data;
- feature selection -> find the subset of attributes that best explains the relationship between the independent variables and the dependent variable;
- use an ensemble method (this technique simply combines the results of multiple weak models to produce better results). Please have a look at the Will They Blend series post about this: https://www.knime.org/blog/KNIMEAnalyticsPlatform-meets-R-and-Python.
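To illustrate the ensemble idea from the last bullet, the simplest combination scheme is a majority vote over the class predictions of several weak models; a minimal sketch (the three votes below stand in for three hypothetical models):

```python
from collections import Counter

def majority_vote(predictions):
    """Combine class predictions from several models: most frequent class wins."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical weak models voting on one customer (churn = 0 or 1)
votes = [1, 0, 1]
print(majority_vote(votes))  # 1
```

Real ensemble methods such as bagging or boosting are more elaborate, but they all rest on this principle of aggregating several imperfect predictions.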
Sometimes, an improvement in a model's accuracy can also be due to over-fitting. To guard against this, you can use the cross-validation technique (https://en.wikipedia.org/wiki/Cross-validation_(statistics)). This method helps to achieve more generalized relationships.
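The core of k-fold cross-validation is just generating k complementary train/test index splits; a bare-bones sketch (in KNIME this is what the cross-validation loop nodes handle for you, and the sizes below are only an example):

```python
def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n))
    fold_size = n // k
    for i in range(k):
        # Each fold takes a different contiguous slice as the test set...
        test = indices[i * fold_size:(i + 1) * fold_size]
        # ...and trains on everything else.
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

for train_idx, test_idx in k_fold_indices(10, 5):
    print(len(train_idx), len(test_idx))  # 8 2, on each of the 5 folds
```

If the model scores well on the training folds but much worse on the held-out folds, that gap is the over-fitting the paragraph above warns about.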
For interpretation (from https://www.knime.org/knime-applications/churn-prediction):
So, we trained a model. But what if the model has not learned anything useful? We need to evaluate it before running it on real data. For the evaluation, we use the 20% of the data that we kept aside and did not use in the training phase to feed a Decision Tree Predictor node. This node applies the model to the data rows one by one and produces the likelihood that each customer will churn, given his/her contract and operational data (P(Churn=0/1)). Depending on the value of this probability, a predicted class is assigned to the data row (Prediction (Churn)=0/1).
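The last step of the Predictor, mapping a churn probability to a predicted class, boils down to thresholding; a minimal sketch (the 0.5 cut-off is the usual default, and the probabilities below are invented):

```python
def predict_class(p_churn, threshold=0.5):
    """Map P(Churn=1) to a predicted class label, 0 or 1."""
    return 1 if p_churn >= threshold else 0

print(predict_class(0.73))  # 1  -> predicted to churn
print(predict_class(0.20))  # 0  -> predicted to stay
```

Moving the threshold away from 0.5 is one way to trade false positives against false negatives.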
The number of times that the predicted class coincides with the original churn class is the basis for any measure of model quality, as calculated by the Scorer node.
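What the Scorer node computes can be sketched as counting the agreements between predicted and actual classes (the five example labels here are made up):

```python
def accuracy(actual, predicted):
    """Fraction of rows where the predicted class matches the actual class."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)

actual    = [0, 0, 1, 1, 0]
predicted = [0, 1, 1, 1, 0]
print(accuracy(actual, predicted))  # 0.8
```

The Scorer also reports the full confusion matrix, from which the other quality measures (precision, recall, and so on) are derived.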
Notice that the customers with Churn=0 are, hopefully, many more than the customers with Churn=1. If you want to take this fact into account and give more weight to the error made on the class Churn=1, then you can introduce an Equal Size Sampling node on the test set to under-sample the more numerous class Churn=0.
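Under-sampling the majority class, as the Equal Size Sampling node does, can be sketched like this (the ten rows and their 8-to-2 class split are hypothetical):

```python
import random

def equal_size_sample(rows, labels, seed=42):
    """Under-sample so that every class keeps the same number of rows."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(members) for members in by_class.values())  # minority class size
    sampled = []
    for members in by_class.values():
        sampled.extend(rng.sample(members, n))  # keep n random rows per class
    return sampled

rows   = list(range(10))
labels = [0] * 8 + [1] * 2          # class 0 heavily outnumbers class 1
print(len(equal_size_sample(rows, labels)))  # 4  (2 rows per class)
```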
Notice also that the Scorer node ‒ or any other scoring node ‒ allows you to evaluate and compare different models. A subsequent Sorter node would allow you to select and retain only the best performing model.
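Picking the best performing model, as the Sorter node would after scoring, amounts to sorting the models by their score; a tiny sketch (the model names and accuracy values below are invented):

```python
# Hypothetical accuracies produced by a Scorer for three candidate models
scores = {"decision_tree": 0.81, "random_forest": 0.88, "naive_bayes": 0.76}

# Retain only the model with the highest accuracy
best = max(scores, key=scores.get)
print(best)  # random_forest
```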
Hope this helps,