Error with Logistic Regression Learner

 Hi,

 When I attempt to run the Logistic Regression Learner I keep getting the error: "Execute failed: Matrix is singular."

What is the reason for this and how can I resolve it?

Regards,

SC

Basic linear algebra tells us that a multiple regression can't be solved when the matrix is singular. A singular matrix here means that some of the X columns in your input table are linearly dependent, for example perfectly correlated with each other. Remove the redundant columns and try again. Use the correlation node to compute the correlation matrix, then either filter out the highly correlated columns completely or keep just one of each group.
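To make the "singular" part concrete, here is a minimal Python sketch (plain Python, not KNIME code). The data and the helper names `gram` and `det3` are made up for illustration. The assumption is that the solver has to invert a matrix built from the input columns, something like X^T X; when one column is an exact multiple of another, that matrix has determinant zero and cannot be inverted.

```python
def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical data: intercept, x1, x2, where x2 is exactly 2 * x1
X_dependent = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 8]]

# Same rows, except the last x2 value no longer follows x1 exactly
X_independent = [[1, 1, 2], [1, 2, 4], [1, 3, 6], [1, 4, 9]]

det_dependent = det3(gram(X_dependent))      # zero: X^T X is singular
det_independent = det3(gram(X_independent))  # non-zero: X^T X can be inverted

print(det_dependent, det_independent)
```

Integer data is used on purpose so the determinant comes out as an exact zero rather than a tiny floating-point number.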

Okay, so let's dig a little deeper into my problem then.

A snippet of my input data set looks like this:

Row ID, Label, Gender_Female, Gender_Male, 50_Percent_Bill_Paid, 100_Percent_Bill_Paid

1, true, 1, 0, 1, 1

2, false, 1, 0, 1, 0

3, false, 0, 1, 0, 0

Does this mean I should get rid of the correlated columns above? Gender_Female and Gender_Male are dependent on each other, so should I remove one of those columns?

Can you please elaborate a little more now that you know what my data looks like?

Thanks.

Regards,

SC

Yes, the genders are perfectly negatively correlated. See what happens when one of them is removed. If your dataset has many more columns, it's usually better to visualize the correlations between all columns using the correlation node.
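Outside of KNIME, the same check can be sketched in a few lines of plain Python. This is only an illustration using the three rows posted above. The helper names `pearson` and `drop_correlated` and the 0.99 threshold are my own choices, not anything the correlation node exposes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two columns; None if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def drop_correlated(cols, threshold):
    """Greedily keep a column only if |r| with every kept column is below threshold."""
    kept = []
    for name, values in cols.items():
        rs = [pearson(values, cols[k]) for k in kept]
        # Constant columns give r = None and are not treated as correlated here
        if all(r is None or abs(r) < threshold for r in rs):
            kept.append(name)
    return kept


# The three sample rows from the post, stored column-wise
columns = {
    "Gender_Female": [1, 1, 0],
    "Gender_Male": [0, 0, 1],
    "50_Percent_Bill_Paid": [1, 1, 0],
    "100_Percent_Bill_Paid": [1, 0, 0],
}

r_gender = pearson(columns["Gender_Female"], columns["Gender_Male"])
kept = drop_correlated(columns, threshold=0.99)
print(r_gender, kept)
```

With only these three rows, 50_Percent_Bill_Paid also happens to match Gender_Female exactly, so it gets dropped along with Gender_Male. On the full table the result would most likely look different.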

I understand. Thanks. I also found the problem with my data set. Because my data was binned, some columns contained only zeros. I believe this is what the error actually meant by a singular matrix: each column should have had at least one non-zero value.

Thank you for all of your help InsilicoConsulting!

The low variance filter node is very useful for removing columns where all values are the same. I would have recommended it, but your sample data did not have such a column! :-)
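For anyone who wants to do the same check by hand, here is a rough Python sketch of the idea. The table and the function name `constant_columns` are hypothetical. Since it only compares values for equality, it works for string columns as well as numeric ones.

```python
def constant_columns(table):
    """Return the names of columns whose values are all identical."""
    return [name for name, values in table.items() if len(set(values)) <= 1]


# Hypothetical binned table: "Bin_90_100" is all zeros, "Region" is a constant string
table = {
    "Bin_0_10": [1, 0, 0, 1],
    "Bin_90_100": [0, 0, 0, 0],
    "Region": ["EU", "EU", "EU", "EU"],
    "Gender": ["F", "F", "M", "F"],
}

to_drop = constant_columns(table)
filtered = {name: values for name, values in table.items() if name not in to_drop}
print(to_drop, list(filtered))
```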

cheers

Hi!

I have the same problem. Even if I remove the correlated characteristics / columns with a correlation greater than 0.3, it still doesn’t work.

It only runs when I have 2 columns (correlation 0.018) left, but then the analysis doesn’t have much value…

Is there another way to get around the singular matrix? Are there other possible reasons for the singularity?

(The variance filter node doesn’t help since I have nominal data (strings).)

Thanks!

 

Hi sorinps,

the current implementation is unfortunately very fragile when it comes to the properties of your dataset.

You can still check if you have constant columns (containing only a single value), although I believe the learner should complain if that's the case.

Probably your best bet is to wait for the next release (which will actually be this week ;)), because it will include another solver for logistic regression that is much more robust to "bad" data than the old one. (It also lets you use regularization, which can be an effective countermeasure against those problems.)
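As a rough intuition for why regularization helps, here is a small Python sketch. It is not the KNIME solver. It assumes an L2 (ridge) penalty, which adds a constant `lam` to the diagonal of the matrix that has to be inverted, and the data and helper names are made up for illustration.

```python
def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def det2(m):
    """Determinant of a 2x2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def add_ridge(m, lam):
    """Add lam to every diagonal entry, i.e. compute m + lam * I."""
    return [[v + (lam if i == j else 0) for j, v in enumerate(row)]
            for i, row in enumerate(m)]


# Hypothetical data: the second column is exactly twice the first
X = [[1, 2], [2, 4], [3, 6], [4, 8]]

G = gram(X)                            # [[30, 60], [60, 120]]
det_plain = det2(G)                    # 0: singular
det_ridge = det2(add_ridge(G, lam=1))  # 31 * 121 - 60 * 60 = 151: invertible

print(det_plain, det_ridge)
```

Even with perfectly dependent columns, the penalized matrix is invertible, so the solver has something to work with.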

Cheers,

nemad