Linear Regression Singularity Error

Snowy · January 8, 2020, 10:13pm

Hi,
I’ve noticed some odd results from the Linear Regression Learner when the values of the independent variable are constant.
Example: Y = Unit sales, X = Avg Selling Price.
Data set:

Linear Regression Learner Result:

Weka Linear Regression:

R lm(Unit_Sales ~ Avg_Selling_Price):

What’s the best way to avoid erroneous results like the one pictured for the Linear Regression Learner? It seems that the coefficient for the independent variable should be 0 or undefined.
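The "singularity" in the title can be made concrete. With an intercept, the design matrix X has the columns [1, x]; when every x has the same value, the second column is just a multiple of the first, so XᵀX has determinant zero and the slope is not identifiable. This is why R's `lm` reports such a coefficient as `NA` rather than a number. A minimal Python sketch, using made-up prices (not the data set above) and exact fractions so rounding noise cannot hide the zero:

```python
from fractions import Fraction

def normal_matrix(x):
    """Return X^T X for the design matrix X = [1, x] (intercept + one predictor)."""
    xs = [Fraction(str(v)) for v in x]  # exact decimal arithmetic
    n = len(xs)
    sx = sum(xs)
    sxx = sum(v * v for v in xs)
    return [[Fraction(n), sx], [sx, sxx]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical prices for illustration only.
constant_price = [4.99, 4.99, 4.99, 4.99]
varying_price = [4.99, 5.49, 3.99, 4.49]

print(det2(normal_matrix(constant_price)))  # 0: singular, slope not identifiable
print(det2(normal_matrix(varying_price)))   # 5: invertible, slope is well defined
```

The determinant equals n·Σ(x − x̄)², so it is zero exactly when the price has zero variance.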

Is there a setting I can change, or a node that lets me easily remove a group when the value is constant across all rows in the group?

I’m running thousands of linear regressions in a loop and using the coefficient to calculate price elasticity, so it’s important that I get the correct avg_selling_price coefficient. Using the R Snippet isn’t a bad option, but it is much slower than the Linear Regression Learner (perhaps because I’m loading the stargazer library to convert the output to a data frame?).
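A minimal Python sketch of such a loop, assuming (this is not stated in the post) that elasticity is taken at the means of a linear model, E = b · mean(P) / mean(Q). The helper names and the data are hypothetical; the point is that a constant-price group returns `None` instead of a misleading coefficient:

```python
from statistics import fmean, pvariance

def price_slope(prices, sales):
    """OLS slope of sales on price, or None when price is constant (singular case)."""
    if len(prices) < 2 or pvariance(prices) == 0:
        return None  # slope is not identifiable; skip rather than report a number
    mx, my = fmean(prices), fmean(sales)
    sxy = sum((x - mx) * (y - my) for x, y in zip(prices, sales))
    sxx = sum((x - mx) ** 2 for x in prices)
    return sxy / sxx

def elasticity_at_means(prices, sales):
    """Point elasticity b * mean(P) / mean(Q) for a linear model (an assumption)."""
    b = price_slope(prices, sales)
    if b is None:
        return None
    return b * fmean(prices) / fmean(sales)

# Hypothetical groups: one with varying prices, one with a constant price.
groups = {
    "product_a": ([4.99, 5.49, 3.99, 4.49], [120, 100, 160, 140]),
    "product_b": ([2.50, 2.50, 2.50], [30, 42, 35]),
}
for name, (p, q) in groups.items():
    print(name, elasticity_at_means(p, q))
```

`statistics.pvariance` uses exact arithmetic internally, so it returns exactly 0 for identical values and the equality test is safe here.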

Thanks for your help and any potential solutions.

ScottF · January 9, 2020, 8:00pm

Hi @Snowy -

Thanks for posting about this - I will create a ticket and let the dev team know.

In the meantime, you can remove problematic constant-value columns, which make the regression singular, using the Constant Value Column Filter node.

Snowy · January 10, 2020, 4:39pm

Thanks, ScottF.
The Constant Value Column Filter will just remove that single column, right? The rest of the table would still be passed through the rest of the workflow, in this case into the regression model, which would then fail because the column being used as a variable no longer exists. What I really need is a way to pass an empty table to the Empty Table Switch when the value is constant.
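To illustrate the concern: a column filter works on columns, not rows, so a constant-price group keeps all of its rows but loses the predictor the learner expects. A conceptual Python sketch; `drop_constant_columns` is a hypothetical stand-in, not the node's actual implementation:

```python
def drop_constant_columns(rows):
    """Keep only the columns whose values differ somewhere in the table."""
    if not rows:
        return rows
    varying = [col for col in rows[0] if len({row[col] for row in rows}) > 1]
    return [{col: row[col] for col in varying} for row in rows]

# Hypothetical group in which the price never changes.
group = [
    {"Unit_Sales": 30, "Avg_Selling_Price": 2.50},
    {"Unit_Sales": 42, "Avg_Selling_Price": 2.50},
    {"Unit_Sales": 35, "Avg_Selling_Price": 2.50},
]
filtered = drop_constant_columns(group)
print(len(filtered), sorted(filtered[0]))  # 3 ['Unit_Sales']: rows survive, predictor is gone
```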

Also, I mentioned this in another thread several months back, but I think it would be great if KNIME had an open bug tracker, as some other open source projects do. Here’s an example from the phpBB forum software: https://tracker.phpbb.com/secure/Dashboard.jspa

Snowy · January 21, 2020, 6:59pm

FYI: I managed to solve this problem by calculating the variance of the average price with the Math Formula node and combining it with a Rule Engine node.

Math Formula:
COL_VAR($Avg_Selling_Price$)

Rule Engine:
$Variance$ = 0 => TRUE

(Exclude TRUE matches)

This way, a table in which the price is constant is never passed into the Linear Regression Learner node.
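The same logic, sketched in Python as a hypothetical stand-in for the Math Formula and rule-based filtering steps (function names are illustrative, not KNIME APIs). Whether COL_VAR computes a sample or a population variance, it is zero exactly when all values are equal. The small tolerance is an added assumption: floating-point arithmetic may leave a tiny nonzero variance for values that are really constant, so comparing against a threshold can be safer than testing for exactly 0.

```python
from statistics import pvariance

def add_variance_column(rows, column):
    """Append the column's variance to every row, like COL_VAR in a Math Formula node."""
    var = pvariance([r[column] for r in rows]) if rows else 0.0
    return [dict(r, Variance=var) for r in rows]

def exclude_constant(rows, tol=1e-12):
    """Rule 'Variance = 0 => TRUE' with TRUE matches excluded; tol guards against rounding."""
    return [r for r in rows if r["Variance"] > tol]

# Hypothetical groups: constant price vs. varying price.
constant_group = [{"Unit_Sales": s, "Avg_Selling_Price": 2.50} for s in (30, 42, 35)]
varying_group = [
    {"Unit_Sales": s, "Avg_Selling_Price": p}
    for s, p in ((120, 4.99), (100, 5.49), (160, 3.99))
]

for g in (constant_group, varying_group):
    kept = exclude_constant(add_variance_column(g, "Avg_Selling_Price"))
    print(len(kept))  # 0 for the constant group, 3 for the varying one
```

An empty result for a constant group is exactly what an Empty Table Switch can then route around the learner.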
