Linear Regression Learner "not enough data"

Hey,
I stumbled upon this topic:

Is there any explanation for this behaviour? Why can’t I fit n parameters having n rows of data?

Best regards,
Hendrik.

As far as I can tell, there isn’t a reason why the Linear Regression Learner SHOULDN’T be able to give you the correct answer. My guess is that a minimum number of observations is hardcoded into the node. (For what it’s worth, many statistics textbooks recommend ~10 observations per parameter to get a result you can trust.) It looks like KNIME wants at least 3 observations.

Also, just FYI it does seem like the Linear Regression Learner needs some cleaning up. I posted an issue about singularities (i.e. all X values are the same) and I haven’t seen anything about this bug being fixed in the changelogs.

Thanks for the pointer to the other problem. Really sounds like they should rework that node…
I cannot confirm that my problem disappears with >= 3 rows, by the way. I get the same error message with 5 rows / 5 vars.

Hi @HendrikE,

We are using an external package (org.apache.commons.math3.stat.regression) for the Linear Regression Learner. Its documentation states that an exception is thrown if there are fewer observations than variables (observations < variables). However, the code checks for observations <= variables. This is where your problem arises. We are not sure whether that is a bug or was intended by the package developer, because results for the case observations = variables are not very reliable.
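
To make the difference between the two conditions concrete, here is a minimal sketch. It is a paraphrase of the two checks as described above, not the library's actual source; the class and method names are hypothetical, and whether "variables" includes the intercept is an assumption left open here.

```java
public class ObservationGuardSketch {

    // Check as described in the documentation:
    // reject only when there are fewer observations than variables.
    public static boolean rejectedAsDocumented(int observations, int variables) {
        return observations < variables;
    }

    // Check as described for the code:
    // reject also when observations and variables are equal.
    public static boolean rejectedAsImplemented(int observations, int variables) {
        return observations <= variables;
    }

    public static void main(String[] args) {
        int variables = 5;
        // Compare both checks just below, at, and just above the boundary.
        for (int observations = 4; observations <= 6; observations++) {
            System.out.printf("obs=%d vars=%d documented=%b implemented=%b%n",
                    observations, variables,
                    rejectedAsDocumented(observations, variables),
                    rejectedAsImplemented(observations, variables));
        }
    }
}
```

The two checks disagree only at the boundary: with 5 rows and 5 variables, the documented condition would let the fit proceed, while the stricter one rejects it.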

Best,
Janina

Thanks a lot for your comments @janina. Mathematically, the case where the number of observations equals the number of variables is perfectly defined and has a unique solution (provided the data are not singular), so, for what it’s worth, I would consider this a bug and would expect it to be fixed in KNIME.
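
As an illustration of the square case, here is a minimal sketch that solves n equations for n parameters directly, using plain Gaussian elimination with partial pivoting. This is not what KNIME or Commons Math does internally; the class name, the tolerance of 1e-12, and the toy data are assumptions made for the example.

```java
import java.util.Arrays;

public class ExactFitSketch {

    /** Solves a * beta = y for a square matrix a; throws if a is (numerically) singular. */
    public static double[] solve(double[][] a, double[] y) {
        int n = y.length;
        // Build an augmented copy [a | y] so the inputs are left untouched.
        double[][] m = new double[n][n + 1];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n] = y[i];
        }
        // Forward elimination with partial pivoting.
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                throw new ArithmeticException("singular design matrix: no unique solution");
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            for (int r = col + 1; r < n; r++) {
                double factor = m[r][col] / m[col][col];
                for (int c = col; c <= n; c++) {
                    m[r][c] -= factor * m[col][c];
                }
            }
        }
        // Back substitution.
        double[] beta = new double[n];
        for (int i = n - 1; i >= 0; i--) {
            double sum = m[i][n];
            for (int c = i + 1; c < n; c++) {
                sum -= m[i][c] * beta[c];
            }
            beta[i] = sum / m[i][i];
        }
        return beta;
    }

    public static void main(String[] args) {
        // 3 rows, 3 parameters: an intercept column plus two predictors,
        // generated from y = 1 + 2*x1 + 3*x2.
        double[][] x = {{1, 0, 0}, {1, 1, 0}, {1, 0, 1}};
        double[] y = {1, 3, 4};
        System.out.println(Arrays.toString(solve(x, y))); // prints [1.0, 2.0, 3.0]
    }
}
```

The fit reproduces the data exactly, so there are no residual degrees of freedom left to estimate standard errors from, which may be why such results are called unreliable. If the rows are linearly dependent (for example, all X values identical), the sketch throws instead of returning coefficients.
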
Best,
Luis

Hi @janina,
thanks for your answer. Is this the class you are using?

https://commons.apache.org/proper/commons-math/javadocs/api-3.3/src-html/org/apache/commons/math3/stat/regression/SimpleRegression.html

Or could you please point to the source? I would really like to see whether I can make sense of it in the context.

Hello @HendrikE,

we are using this one: https://commons.apache.org/proper/commons-math/javadocs/api-3.3/src-html/org/apache/commons/math3/stat/regression/MillerUpdatingRegression.html#line.43

Best,
Janina