I’ve noticed some odd results from the Linear Regression Learner when the independent variable has the same value in every observation.
Y = Unit sales. X = Avg Selling Price
[Screenshots: Linear Regression Learner result; Weka Linear Regression result]
What’s the best way to avoid erroneous results like the Linear Regression Learner output above? It seems to me that the coefficient for the independent variable should be 0 or undefined.
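For a single predictor, the OLS slope is b = cov(X, Y) / var(X). When every X value is identical, var(X) = 0, so the slope is undefined rather than zero, and any number a tool still reports is just its way of handling a singular system. A minimal Python sketch of that formula with an explicit guard (the function `ols_slope` is hypothetical and not part of KNIME or Weka):

```python
from typing import Optional, Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """OLS slope b in y = a + b*x, or None when b is undefined.

    b = Sxy / Sxx, where Sxx is the sum of squared deviations of x.
    If every x is identical, Sxx is zero and b cannot be estimated.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Test for constant x on the raw values: checking Sxx == 0 can fail
    # because rounding in the mean may leave a tiny nonzero Sxx.
    if len(set(x)) < 2:
        return None
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    print(ols_slope([4.99, 4.99, 4.99], [120, 95, 130]))  # constant price -> None
    print(ols_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # -> 2.0
```

The distinct-value check is used instead of comparing Sxx to zero because a floating-point mean of identical prices is not always exactly equal to those prices.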
Is there a setting I can change, or a node, that lets me easily remove a group if every row in the group has the same value?
I’m running thousands of linear regressions in a loop and using the coefficient to calculate price elasticity, so it’s important that I get the correct avg_selling_price coefficient. Using the R Snippet isn’t a bad option, but it is much slower than the Linear Regression Learner (perhaps because I’m loading the stargazer library to convert the results to a data frame?).
Thanks for your help and any potential solutions!
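A minimal Python sketch of such a loop, assuming the data arrive as hypothetical (group, avg_selling_price, unit_sales) tuples and that elasticity is the point elasticity at the means of a linear model, e = b · mean(P) / mean(Q). Both the record layout and that elasticity definition are assumptions; the thread does not say which definition is used. Groups whose price never varies are reported as None instead of being fitted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def elasticities(
    rows: Iterable[Tuple[str, float, float]],
) -> Dict[str, Optional[float]]:
    """Per-group point price elasticity at the means, or None if undefined.

    rows: hypothetical (group, avg_selling_price, unit_sales) records.
    """
    groups = defaultdict(list)
    for group, price, units in rows:
        groups[group].append((price, units))

    result: Dict[str, Optional[float]] = {}
    for group, pairs in groups.items():
        prices = [p for p, _ in pairs]
        units = [q for _, q in pairs]
        # Constant price (or a single row): the slope is undefined, so skip.
        if len(set(prices)) < 2:
            result[group] = None
            continue
        n = len(pairs)
        mean_p = sum(prices) / n
        mean_q = sum(units) / n
        sxx = sum((p - mean_p) ** 2 for p in prices)
        sxy = sum((p - mean_p) * (q - mean_q) for p, q in pairs)
        slope = sxy / sxx  # OLS coefficient of avg_selling_price
        # Point elasticity at the means; undefined when mean sales is zero.
        result[group] = slope * mean_p / mean_q if mean_q != 0 else None
    return result


if __name__ == "__main__":
    data = [
        ("A", 10.0, 100.0), ("A", 12.0, 80.0),
        ("B", 5.0, 40.0), ("B", 5.0, 55.0),  # constant price
    ]
    print(elasticities(data))  # group B maps to None
```

Returning None for constant groups keeps them visible in the output, so they can be reviewed rather than silently carrying a meaningless coefficient.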
Hi @Snowy -
Thanks for posting about this - I will create a ticket and let the dev team know.
In the meantime, you can remove problematic single-value (constant) columns with the Constant Value Column Filter node.
The Constant Value Column Filter will just remove that single column, right? The rest of the table would still continue through the workflow, in this case into the regression model, which would then fail because the column being used as the independent variable no longer exists. What I really need is a way to pass an empty table to the Empty Table Switch when the value is constant.
Also, I mentioned this in another thread several months back, but I think it would be great if KNIME had a publicly accessible bug tracker, as some other open source projects do. Here’s an example from the phpBB forum software: https://tracker.phpbb.com/secure/Dashboard.jspa
FYI, I managed to solve this problem by calculating the variance of the average selling price with the Math Formula node and combining it with the Rule Engine node:
$Variance$ = 0 => TRUE
(Exclude TRUE matches)
This way, a table with a constant average price is never passed to the Linear Regression Learner node.
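The same filter can be sketched outside KNIME. This Python version assumes rows are dicts with hypothetical keys `group` and `avg_selling_price`; it computes the population variance of the price per group, flags groups whose variance is zero, and excludes every row of a flagged group, mirroring the variance-equals-zero rule with TRUE matches excluded. The tolerance `tol` is an added assumption: a variance computed in floating point can come out as a tiny nonzero value for prices that are in fact identical, so an exact zero test may let such a group through. A suitable `tol` depends on the scale of the prices.

```python
from collections import defaultdict
from typing import Dict, List


def drop_constant_price_groups(rows: List[dict], tol: float = 1e-12) -> List[dict]:
    """Keep only rows whose group has a non-constant avg_selling_price."""
    prices: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        prices[row["group"]].append(row["avg_selling_price"])

    constant = set()
    for group, values in prices.items():
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        # Rule: variance is (numerically) zero => TRUE
        if variance <= tol:
            constant.add(group)

    # "Exclude TRUE matches": drop every row belonging to a flagged group.
    return [row for row in rows if row["group"] not in constant]


if __name__ == "__main__":
    rows = [
        {"group": "A", "avg_selling_price": 4.99},
        {"group": "A", "avg_selling_price": 4.99},
        {"group": "B", "avg_selling_price": 3.00},
        {"group": "B", "avg_selling_price": 3.50},
    ]
    print(drop_constant_price_groups(rows))  # only group B rows remain
```

If every group is constant, the function returns an empty list, which plays the role of the empty table that the Empty Table Switch would route around the learner.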