Polynomial Regression Learner Error


I'm trying to set up my first workflow with regression modeling. I've generated a data set containing four input variables (x, y, z, k) and one output variable. x, y, and z take values in the interval between 0 and 2, and k is randomly set to 0 or 1.

The output variable was computed as: x^2 + 2*y + z^(0.5) + k.

I'm then trying to do a basic polynomial regression analysis. I've attached a screenshot of the workflow: the data is partitioned, the first part is used for the learner, and the second part is used to test the predictor.

With a maximum polynomial degree of 1, it works with reasonable errors. With higher polynomial degrees, however, I get the following error:

ERROR  Polynomial Regression (Learner)  Execute failed: The attributes of the data samples are not mutually independent.

Can someone please tell me what could be the cause? I've observed that if I exclude input variable 'k' from the analysis, it works fine for any polynomial degree.



Would it be possible to see an example workflow where this fails? You could use a Table Creator node that is similar to your current input but contains only fake data.

Regards, Aaron

The error message tells you that there are at least two variables in the input data set that are dependent on each other. Since the regression algorithm works with matrix inversions, and a matrix inversion cannot be performed if there are dependent columns, it's not possible to perform a regression (with this algorithm).
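To make the inversion problem concrete, here is a minimal sketch in plain Python. It assumes the learner solves something like the normal equations, which requires inverting X^T X; I don't know the node's actual implementation, and the helper names `gram` and `determinant` as well as the sample numbers are made up for illustration. When two columns of X are identical, the determinant of X^T X is exactly zero, so no inverse exists.

```python
from fractions import Fraction


def gram(X):
    """Return X^T X for a matrix X given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(Fraction(a) * b for a, b in zip(ci, cj)) for cj in cols]
            for ci in cols]


def determinant(M):
    """Determinant via Gaussian elimination with exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        # Find a row at or below the diagonal with a non-zero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot: the matrix is singular
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return det


# Hypothetical data: columns are (intercept, a, b).
independent = [[1, 1, 2], [1, 2, 1], [1, 3, 5], [1, 4, 3]]
dependent = [[1, 1, 1], [1, 2, 2], [1, 3, 3], [1, 4, 4]]  # b is a copy of a

print(determinant(gram(independent)))  # non-zero, so X^T X can be inverted
print(determinant(gram(dependent)))    # 0, so X^T X cannot be inverted
```

Exact fractions are used only so that the zero is exact; with floating point you would typically compare against a small tolerance instead.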

The dependence doesn't need to be present in the original dataset; it can also arise in the higher-order columns that get generated for degrees > 1. Consider -1 0 1 for column A and 1 0 1 for column B: they are not dependent at degree 1, but they become dependent at degree 2, where their squares are 1 0 1 and 1 0 1.
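A small Python sketch of that effect, assuming the learner adds one column per power of each input and no cross terms (an assumption about how the higher-order columns are built, not something confirmed here). The helpers `power_columns` and `duplicate_columns` are hypothetical names. It also covers the 0/1 variable from the question: any value that is 0 or 1 satisfies k^2 = k, so k and k^2 would be identical columns, which would be consistent with the error going away when k is excluded.

```python
def power_columns(columns, degree):
    """Return {"name^p": values} for p = 1..degree (assumed: no cross terms)."""
    expanded = {}
    for name, values in columns.items():
        for p in range(1, degree + 1):
            expanded[f"{name}^{p}"] = [v ** p for v in values]
    return expanded


def duplicate_columns(expanded):
    """Return all pairs of column names whose values are identical."""
    names = list(expanded)
    return [(a, b)
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if expanded[a] == expanded[b]]


ab = {"A": [-1, 0, 1], "B": [1, 0, 1]}
print(duplicate_columns(power_columns(ab, 1)))  # no duplicates at degree 1
print(duplicate_columns(power_columns(ab, 2)))  # includes ('A^2', 'B^2')

# A 0/1 column like k from the question: squaring it changes nothing.
k = {"k": [0, 1, 1, 0]}
print(duplicate_columns(power_columns(k, 2)))   # [('k^1', 'k^2')]
```

Identical columns are only the simplest case of dependence; any column that is a linear combination of the others causes the same failure, and this check would not catch those.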

I got this error when I imported an Excel spreadsheet without specifying that it had a header row. Checking that box on the import fixed the problem.
