I am analyzing a credit portfolio to estimate the borrowers' probability of default (PD).
The statistical model for fitting PD is logistic regression, so I am using the "Logistic regression learner" (KNIME 3.3.3) to estimate the parameters alpha and beta. This node does not calculate the Accuracy Ratio, AUROC, odds ratio, etc.
I think the correct way to obtain those measures is the predictor node, but if you have a different way to do the same thing, I am open to your advice.
In the Predictor node I can obtain the probability columns, which are a transformation of the estimated score, but I can't understand how the "Prediction column name" column is calculated. Which cutoff does the software use to set the predicted target variable equal to 1?
Neither the Logistic Regression Learner node nor the (Logistic) Regression Predictor Node can calculate all the measures you’re looking for. But you can calculate most of them easily.
- Odds Ratio: The odds ratio for a variable is the exponential of its regression coefficient, exp(beta). The first data output port of the Logistic Regression Learner node contains the regression coefficients, so you can calculate the odds ratios from it with the Math Formula node.
- Accuracy: The Scorer node outputs the accuracy. It also provides the confusion matrix, which I always find worth a look.
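To make the two calculations above concrete outside of KNIME, here is a minimal Python sketch. It is not what the nodes do internally; the coefficient values and variable names are made up for illustration, and the labels are assumed to be coded as 0/1 with 1 meaning default.

```python
import math

# Hypothetical coefficients, standing in for the learner's coefficient table.
coefficients = {"debt_ratio": 0.8, "years_as_customer": -0.25}


def odds_ratios(coefs):
    """Return exp(beta) for every coefficient in the mapping."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1
        elif a == 0 and p == 1:
            counts["fp"] += 1
        elif a == 0 and p == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def accuracy(cm):
    """Share of correctly classified rows."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())


if __name__ == "__main__":
    print(odds_ratios(coefficients))
    cm = confusion_matrix([1, 0, 1, 0], [1, 0, 0, 0])
    print(cm, accuracy(cm))
```

An odds ratio above 1 means the odds of default rise as the variable increases, and a value below 1 means they fall.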
Regarding your last question: the cutoff for the prediction is 0.5. If you want to use a different cutoff, check the “append columns with predicted probabilities” checkbox in the (Logistic) Regression Predictor node and add a Rule Engine node after the Predictor node to apply your own cutoff.
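The logic of such a custom cutoff is simple. A rough Python sketch follows; the function name and the sample probabilities are hypothetical, and treating a probability exactly equal to the cutoff as a default is my assumption, not something I have checked against the Predictor node.

```python
def classify(prob_default, cutoff=0.5):
    """Return 1 (default) if the predicted PD reaches the cutoff, else 0."""
    # Assumption: a probability exactly equal to the cutoff counts as a default.
    return 1 if prob_default >= cutoff else 0


if __name__ == "__main__":
    # Hypothetical predicted probabilities of default.
    predicted_pd = [0.05, 0.30, 0.55, 0.90]
    print([classify(p) for p in predicted_pd])              # cutoff 0.5
    print([classify(p, cutoff=0.2) for p in predicted_pd])  # stricter cutoff
```

For a credit portfolio, where defaults are usually rare, a cutoff below 0.5 is often more useful because it flags more borrowers as potential defaults.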
FYI: In the newest KNIME version (4.3) the Regression Predictor node has changed and a new node called Logistic Regression Predictor node has been added.