HELP! :) Why is my Backward Feature Elimination sooooo slow

Hi there,

I have configured a Backward Feature Elimination node on a logistic regression with 8,603 rows and 34 columns, and I have not been able to get this loop to finish. It has been hours since I started the process. I feel like this is not a lot of data and it shouldn't be taking this long. Any help is really appreciated.


Hello Allyson,

You are right: for your setting this should not take hours (assuming you have reasonable processing power).
Do any nodes in the loop seem to hang for a long time?
If possible could you share your workflow?



Hi Adrian,

Unfortunately, I can’t share my workflow because I am working with sensitive data. A logistic regression model with all of my variables would include 74 parameters, including all of the dummy variables. I have tried switching between Stochastic Average Gradient and Iteratively Reweighted Least Squares, but the process gets hung up at the Logistic Regression Learner. Depending on the settings, I get one of these two errors: “The algorithm did not reach convergence after the specified number of epochs. Setting the epoch limit higher might result in a better model.” and “The covariance matrix could not be calculated because the observed Fisher information matrix was singular.” I get the epochs error until I increase the number of epochs to 1,000,000; however, at a million the loop will not finish.

Hello Allyson,

Yes, I understand that you can’t share sensitive data, so I will do my best to help based on issues I have encountered with the Logistic Regression Learner.

Does the IRLS algorithm work at all? If not, the most common cause is highly correlated features.
You can either use the correlation filter nodes found under Feature Selection, or use the SAG algorithm with regularization by selecting a prior for your model parameters (a Gauss prior corresponds to L2/ridge regression, a Laplace prior to the L1/LASSO penalty).
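Outside of KNIME, both remedies can be sketched in scikit-learn terms. This is only an illustration: the toy DataFrame, the 0.9 correlation threshold, and the column names are all made up, not taken from your workflow.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy data standing in for your table; "b" is a near-duplicate of "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})
y = (a + X["c"] > 0).astype(int)

# Correlation filter: drop one feature from each highly correlated pair,
# looking only at the upper triangle so each pair is checked once.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
X_filtered = X.drop(columns=to_drop)

# Regularization: penalty="l2" plays the role of the Gauss prior,
# penalty="l1" (with a solver that supports it) that of the Laplace prior.
model = LogisticRegression(penalty="l2", C=1.0).fit(X_filtered, y)
```

Here the near-duplicate column `"b"` is the one removed, which is exactly what the correlation filter nodes do for you inside the loop.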
If none of this helps, another common reason for this error with the SAG algorithm is that some features take on much larger values than others; to prevent this, you can use the Normalizer node to z-score normalize your features.
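For the scale issue, here is a minimal sketch of what z-score normalization does, again with scikit-learn standing in for the Normalizer node; the two column scales below are invented for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Two features on wildly different scales, which can stall SAG-style solvers.
rng = np.random.default_rng(1)
X = np.column_stack([
    rng.normal(loc=50_000, scale=15_000, size=300),  # large-scale feature
    rng.normal(loc=0.5, scale=0.1, size=300),        # small-scale feature
])
y = (X[:, 1] + 0.05 * rng.normal(size=300) > 0.5).astype(int)

# Z-score normalization: every column ends up with mean 0 and std 1,
# so the gradient steps treat all features on an equal footing.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(solver="sag", max_iter=1000).fit(X_scaled, y)
```

After scaling, each column has mean 0 and standard deviation 1, which is what the Normalizer node's z-score option produces.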
Please also note that you don’t need to dummy encode your categorical variables, as the learner does this internally.

Kind regards,