AUC from logistic regression

MarcB · July 30, 2020, 6:27pm

Good evening,

I am running several algorithms (random forests and logistic regression with Laplace/LASSO regularization) on a binary classification problem so that I can compare them afterwards.

I have built the LR algorithm (middle section of the attached workflow) after 10-fold cross-validation and I have two questions:

1. How do I get a single AUC derived from the combined effect of all (non-excluded) predictors, instead of one AUC for each predictor?
2. How can I get access to the variables remaining in the final model after Laplace/LASSO regularization?

Example LR.knwf (68.0 KB)

Thank you,
Marc

ScottF · July 30, 2020, 9:43pm

Hi @MarcB -

I can’t run your workflow directly since the data isn’t included. But to try and answer your questions:

1. In the ROC Curve node, make sure that in the green include dialog you put only the probability of the positive class, and not other numeric values.
2. I believe you can check which model coefficients have been forced to zero (or close to it) by looking at the second output port of the Logistic Regression Learner node.
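
For anyone following along outside KNIME, here is a minimal Python sketch of the same two ideas: one AUC computed from the model's positive-class probability (rather than one per predictor), and a list of the predictors whose coefficients were not shrunk to zero. This is not how the KNIME nodes are implemented. The coefficient values, predictor names, rows, and the `tol` threshold are all hypothetical and only there for illustration.

```python
import math


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def positive_probability(row, intercept, coefs):
    """Combine all predictors into a single P(class = positive)."""
    z = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def retained_predictors(coefs, tol=1e-6):
    """Predictors whose coefficient is not (numerically) zero."""
    return sorted(name for name, b in coefs.items() if abs(b) > tol)


# Hypothetical fitted model: two coefficients were shrunk to (almost) zero.
intercept = -0.5
coefs = {"age": 0.8, "bmi": 0.0, "marker_a": -1.2, "marker_b": 1e-9}

# Hypothetical held-out rows and their true classes.
rows = [
    {"age": 1.0, "bmi": 0.3, "marker_a": -0.5, "marker_b": 0.1},
    {"age": -0.4, "bmi": 1.1, "marker_a": 0.9, "marker_b": 0.7},
    {"age": 0.2, "bmi": -0.6, "marker_a": -1.0, "marker_b": 0.2},
    {"age": -1.2, "bmi": 0.5, "marker_a": 0.4, "marker_b": -0.3},
]
labels = [1, 0, 1, 0]

# One score per row, hence one AUC for the whole model.
scores = [positive_probability(r, intercept, coefs) for r in rows]
model_auc = auc(labels, scores)

print("model AUC:", model_auc)
print("retained predictors:", retained_predictors(coefs))
```

The point of the sketch is that the ROC curve is drawn over a single column (the predicted probability), so adding any raw predictor columns to it produces extra, per-predictor curves.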

MarcB · July 31, 2020, 10:50am

Thank you Scott, all clear. I am worried, though, that all predictors seem to be discarded: the coefficients are close to 0, the standard errors are very inflated, and the p-values are 1.00, yet the final AUC is 1.00. (The workflow is attached without data because the file is too big.) I have reviewed the flow and it is not obvious to me what may be wrong, aside from having a small test set.

Example.knwf (39.2 KB)

Thank you,
Marc

ScottF · August 3, 2020, 3:15pm

Without the data in the executed workflow, it’s not obvious to me what else might be wrong here.

Have you tried classifiers other than logistic regression here? Is your primary interest in this case prediction or interpretability?

MarcB · August 3, 2020, 10:40pm

Hi @ScottF (and others),

Please find attached a sample from the dataset (some cases and predictors have been removed to meet the upload size limit). I am also working with Random Forests and SVM. Logistic Regression with LASSO was one of the selected algorithms because it is easy to interpret, but in the end the main aim is prediction of the class, and the AUCs from all models will be compared to determine which model provides the best classification.

Z_Example.knwf (3.4 MB)

Thank you,
Marc

ScottF · August 4, 2020, 4:14pm

I think this is covering a lot of the same ground as your other thread below, and is likely to suffer from the same problem discussed in detail there: some sort of data leakage. I’ll close this thread to keep the forum tidy.
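
As a rough illustration of why leakage is a plausible explanation for an AUC of exactly 1.00, the Python sketch below compares a noisy but honest predictor with a column that is effectively a copy of the class label. It is a toy example under stated assumptions, not a diagnosis of the attached workflow: the data are synthetic, and the names `honest` and `leaked` are made up for the example.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


random.seed(0)

# Synthetic, balanced binary target: 100 negatives and 100 positives.
labels = [i % 2 for i in range(200)]

# An honest predictor: related to the class, but with overlapping noise.
honest = [y * 1.0 + random.gauss(0.0, 1.0) for y in labels]

# A leaked predictor: a column derived from the outcome itself
# (for example, a field that is only filled in after the class is known).
leaked = [float(y) for y in labels]

honest_auc = auc(labels, honest)
leaked_auc = auc(labels, leaked)

print("AUC with honest predictor:", round(honest_auc, 3))
print("AUC with leaked predictor:", leaked_auc)
```

A predictor that separates the classes perfectly also tends to make unregularized logistic regression estimates unstable, which may be related to the inflated standard errors reported earlier in the thread; checking each input column for this pattern on its own is one way to look for the leak.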