Hi,
I’ve noticed some odd results with the linear regression learner when the observations are constant.
Example…
Y = Unit sales. X = Avg Selling Price
Data set:
What’s the best way to avoid erroneous results as pictured in the Linear Regression Learner? It seems as though the coefficient for the independent variable should be 0 or undefined.
Is there a setting I can change or a node that allows me to easily remove a group if all rows in the group are constant?
I’m doing thousands of linear regressions via a loop and taking the coefficient to calculate price elasticity, so it’s important that I can get the correct avg_selling_price coefficient. Using the R snippet isn’t a bad option, but it is much slower than the Linear Regression Learner (perhaps because I’m loading stargazer library to convert to a data frame?).
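For reference, here is a short worked formula for why the coefficient is undefined rather than 0 in this case. It assumes a simple one-variable ordinary least squares fit, with x as Avg Selling Price and y as Unit sales; the Linear Regression Learner may compute its estimate differently internally.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
% If every x_i equals the same constant c, then \bar{x} = c and x_i - \bar{x} = 0 for all i,
% so both the numerator and the denominator are 0 and \hat{\beta}_1 = 0/0 is undefined.
```

Any number a solver reports for the slope in that situation is a numerical artifact, not an estimate.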
Thanks, ScottF.
The constant value column filter will just remove the single column, right? The rest of the table would be passed through the rest of the workflow – in this case, into the regression model, which would then fail, as the column (that was being used as a variable) no longer exists. I really need a way to pass a blank table using the Empty Table Switch, if the value is constant.
Also - I mentioned it in another thread several months back, but I think it would be great if KNIME had an open bug tracker, similar to what some other open source projects have. Here’s an example from the phpBB forum software: https://tracker.phpbb.com/secure/Dashboard.jspa
FYI - I managed to solve this problem by calculating the variance of the average price with the Math Formula node and combining it with the Rule Engine node.
Math Formula:
COL_VAR($Avg_Selling_Price$)
Rule Engine:
$Variance$ = 0 => TRUE
(Exclude true matches)
This way a table that has a constant value is never passed into the regression learner node.
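For anyone doing the same thing outside of KNIME nodes, the variance guard above can be sketched in plain Python. This is only an illustration of the idea, not what the Linear Regression Learner does internally: the function name `slopes_by_group`, the tuple layout, and the sample rows are all made up for the example, and the slope is computed with the textbook one-variable least squares formula.

```python
from collections import defaultdict
from statistics import mean, pvariance


def slopes_by_group(rows):
    """Return {group: slope of unit sales on price}, skipping constant-price groups.

    rows: iterable of (group, avg_selling_price, unit_sales) tuples (hypothetical layout).
    """
    grouped = defaultdict(list)
    for group, price, sales in rows:
        grouped[group].append((price, sales))

    slopes = {}
    for group, pairs in grouped.items():
        prices = [p for p, _ in pairs]
        sales = [s for _, s in pairs]
        # Same guard as COL_VAR + Rule Engine: zero variance in price means
        # the slope is undefined, so the group never reaches the regression.
        if len(prices) < 2 or pvariance(prices) == 0:
            continue
        p_bar, s_bar = mean(prices), mean(sales)
        cov = sum((p - p_bar) * (s - s_bar) for p, s in pairs)
        var = sum((p - p_bar) ** 2 for p in prices)
        slopes[group] = cov / var
    return slopes


# Made-up sample data: group "A" has a constant price, group "B" does not.
rows = [
    ("A", 2.0, 10), ("A", 2.0, 12), ("A", 2.0, 11),
    ("B", 1.0, 20), ("B", 2.0, 16), ("B", 3.0, 12),
]
result = slopes_by_group(rows)
print(result)
```

Group "A" is dropped and only group "B" gets a coefficient. With real price data that is nearly but not exactly constant, comparing the variance against a small tolerance instead of exactly 0 may be the safer choice.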