Polynomial Regression

Hello,

I already have a new problem here.

When I use the Polynomial Regression learner and predictor, I get a fair result. The image is attached here. How can I get the resulting mathematical formula (expression)?

The "learned coefficients" view of the learner node shows coefficients that do not seem consistent with the curve.

For example, the curve is param3_3 = f(time) and the degree of the polynomial is 7.

But the learned coefficients give:

Intercept=-6785,2029

coefficient x^1=2,1246

coefficient x^2=0

and 0 until coefficient x^7.

 

And is there a way of getting the correct degree of the polynomial automatically?

Nevertheless, the Linear regression node gives a correct result.

Thanks in advance!

Heej.

As for the Polynomial Regression node, it comes with a "Learned Coefficients" view, and you can simply copy the information from that view. If you need to extract that information automatically, you need to parse the PMML/XML that the node outputs. We have an example on our public server, in the XML category, that shows how this works for the Linear Regression Learner. Give it a try.
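For anyone who wants to do the parsing outside of KNIME, here is a minimal Python sketch of the idea. It is not the public-server example workflow mentioned above. The PMML fragment is hand-built from the coefficients quoted further down in this thread, and the namespace and wrapper elements are assumptions, since a real export contains more (header, data dictionary and so on). The helper names `local_name` and `read_coefficients` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed PMML fragment using the values quoted in this thread.
# The namespace and surrounding elements are assumptions about the export.
PMML_TEXT = """
<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <RegressionModel functionName="regression">
    <RegressionTable intercept="-6785.202880859375">
      <NumericPredictor name="time" coefficient="2.124564051628113"/>
      <NumericPredictor name="time" exponent="2" coefficient="-2.7360484818927944E-4"/>
      <NumericPredictor name="time" exponent="3" coefficient="1.862055487578118E-8"/>
      <NumericPredictor name="time" exponent="4" coefficient="-7.121122810350988E-13"/>
      <NumericPredictor name="time" exponent="5" coefficient="1.479886500042083E-17"/>
      <NumericPredictor name="time" exponent="6" coefficient="-1.405104655556755E-22"/>
      <NumericPredictor name="time" exponent="7" coefficient="2.7892993013667567E-28"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tags."""
    return tag.rsplit("}", 1)[-1]


def read_coefficients(pmml_text):
    """Return (intercept, {exponent: coefficient}) from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    intercept = None
    coefficients = {}
    for element in root.iter():
        name = local_name(element.tag)
        if name == "RegressionTable":
            intercept = float(element.get("intercept"))
        elif name == "NumericPredictor":
            # A missing exponent attribute means exponent 1 (the linear term).
            exponent = int(element.get("exponent", "1"))
            coefficients[exponent] = float(element.get("coefficient"))
    return intercept, coefficients


intercept, coefficients = read_coefficients(PMML_TEXT)
print("intercept:", intercept)
for exponent in sorted(coefficients):
    print(f"x^{exponent}: {coefficients[exponent]}")
```

Matching on the local tag name rather than a fixed namespace keeps the sketch usable if the PMML version in the export differs. It assumes a single predictor column; with several columns you would also key on the `name` attribute.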

Hi Gabriel,

Thank you for your help but I'm not sure you've got my point.

The thing is, the polynomial learner outputs the coefficients above, but they don't seem correct. If you have a look at the curve (polynomial of degree 7!), the coefficients in front of x^2, x^3, ... x^7 can't all be equal to zero, x being the time here. But that's what the learner gives!

So how can I get the correct coefficients?

Also, I'm not trying to get the coefficients automatically, but rather looking for a way of finding the degree of the polynomial that best matches my data. Perhaps a loop with the Entropy node in it, but that is not so simple with the learner node (its output is not data but a model).

Best.

Heej

Are you using the latest version of KNIME? There was an error in the view that shifted all coefficients one to the left which sounds very much like your problem. In 2.5.4 it is fixed. If you have a look at the PMML model you will see the correct coefficients in the XML tree, though.

Concerning the "correct" degree, it generally doesn't hurt to select a degree that is "too high" in the learner, as the superfluous coefficients will just be zero. If you want to find the first coefficient that is close to zero, Thomas already suggested extracting that information from the XML of the PMML model.

... and to answer your last question, please have a look at the ensemble nodes, which can also be used in a loop to collect (PMML) models by using the Model/PMML to Cell and Cell to Model/PMML nodes to translate models into cells, and vice versa.

First, I am actually using the latest version of KNIME: the 2.5.4.

Then, I don't have the Model/PMML to Cell and Cell to Model/PMML nodes in my KNIME, and they are not in the list under Help/Install new software/ etc. either.

Last but not least, here is the PMML document output by the polynomial regression learner node:

  • RegressionTable intercept="-6785.202880859375"
  • NumericPredictor name="time" coefficient="2.124564051628113"
  • NumericPredictor name="time" exponent="2" coefficient="-2.7360484818927944E-4"
  • NumericPredictor name="time" exponent="3" coefficient="1.862055487578118E-8"
  • NumericPredictor name="time" exponent="4" coefficient="-7.121122810350988E-13"
  • NumericPredictor name="time" exponent="5" coefficient="1.479886500042083E-17"
  • NumericPredictor name="time" exponent="6" coefficient="-1.405104655556755E-22"
  • NumericPredictor name="time" exponent="7" coefficient="2.7892993013667567E-28"

As you will have noticed, unfortunately and oddly enough, the coefficients are indeed almost equal to zero, except for the first one and the intercept. If you plot the function with these coefficients, you simply get a line, which is not what you see on the output plot I attached before.

I might have misunderstood some of your proposals. But I can't solve it for now.

 

Heej

To get these additional nodes, you need to install the "KNIME Ensemble Learning Methods" extension. The nodes are then contained in the Data Mining / Ensemble Learning / Utility Nodes category.

I cannot fully agree. The plot of the function depends on the magnitude of the input values. If the time values are very large (and the huge intercept indicates this), then the x^2 and x^3 terms will dominate even though their coefficients are much smaller than the one for x^1.
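To make this concrete, here is a small Python sketch that evaluates each term of the polynomial separately, using the coefficients from the PMML output quoted above. The time value of 10000 is purely an assumption chosen for illustration, since the actual range of the data is not given in this thread. The names `term_values` and `evaluate` are made up for this example.

```python
# Coefficients copied from the PMML output quoted in this thread.
INTERCEPT = -6785.202880859375
COEFFICIENTS = {
    1: 2.124564051628113,
    2: -2.7360484818927944e-4,
    3: 1.862055487578118e-8,
    4: -7.121122810350988e-13,
    5: 1.479886500042083e-17,
    6: -1.405104655556755e-22,
    7: 2.7892993013667567e-28,
}


def term_values(time):
    """Return {exponent: coefficient * time**exponent} for every term."""
    # `**` is Python's power operator (`^` would be bitwise XOR in Python).
    return {
        exponent: coefficient * time ** exponent
        for exponent, coefficient in COEFFICIENTS.items()
    }


def evaluate(time):
    """Evaluate the full polynomial, intercept included."""
    return INTERCEPT + sum(term_values(time).values())


# Assumption: a time value in the tens of thousands, not taken from the data.
ASSUMED_TIME = 10000.0

for exponent, value in sorted(term_values(ASSUMED_TIME).items()):
    print(f"x^{exponent} term: {value:.1f}")
print(f"f({ASSUMED_TIME}) = {evaluate(ASSUMED_TIME):.1f}")
```

At that assumed scale the x^2 and x^3 terms come out with magnitudes similar to the x^1 term, so the curve is far from a straight line even though the higher coefficients look like zeros when printed.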

Hey thor, you were completely right. I hadn't thought enough about the result it gave at first.

When I plotted the resulting expression to check whether it matches the predictor's result, I used the "$time$^3" notation, which gave a weird curve. When I wrote "$time$*$time$*$time$" instead, it worked. So yes, the coefficients were the right ones.

Thanks to all of you!