Polynomial regression model learning using categorical variables

#1

Hello. I’d like to know if it’s possible in KNIME to learn a non-linear regression model that is capable of dealing with categorical variables. More specifically, my problem has the following variables:

Input variables:

  1. Position in the plate with 5 levels (categorical):
  • Top left
  • Top right
  • Center
  • Bottom left
  • Bottom right
  2. Orientation with 4 levels (categorical):
  • YZ
  • ZY
  • YZ45
  • Y45Z
  3. Thickness (double)
  4. Energy density (integer)
  5. Pressure (double)
  6. Number of times reused (integer)

Output:

Mechanical resistance of the piece


#2

Hi @glimachave,
that is possible if you convert the categorical values first. A common technique is one-hot encoding, where you create one column for each possible value: the column corresponding to the value that is present is set to 1, and all others to 0. In KNIME this is done with the One to Many node. I hope that helps!
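For illustration outside KNIME, the same transformation can be sketched in Python with pandas (the column names and sample rows below are hypothetical, mirroring the variables in the question):

```python
import pandas as pd

# Hypothetical sample rows for the two categorical inputs.
df = pd.DataFrame({
    "position": ["Top left", "Center", "Bottom right"],
    "orientation": ["YZ", "ZY", "YZ45"],
})

# One-hot encode: one 0/1 column per category level, equivalent in
# spirit to what KNIME's One to Many node produces.
encoded = pd.get_dummies(df, columns=["position", "orientation"])
print(sorted(encoded.columns))
```

Each original categorical column is replaced by one indicator column per level, e.g. `position_Top left`, `position_Center`, and so on.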
Kind regards
Alexander


#3

Hello @AlexanderFillbrunn. Thanks for your reply. I’ve done that, but my question is more about the procedure after the encoding. Once the variables are encoded, I can’t use a polynomial regression learner of degree 2 (one cannot fit a parabola to a variable that only takes the values 0 and 1), and a linear model doesn’t capture the behaviour of the other variables very well. So in terms of modeling, what would you recommend I do in this case?


#4

Hi @glimachave,
how about using a Simple Regression Tree? Here you don’t even need one-hot encoding. If a single tree is not good enough or you don’t care about interpretability, one of the tree ensemble models might also be useful, e.g. Gradient Boosted Trees or Random Forest.
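As a rough illustration of this suggestion outside KNIME, here is a minimal sketch with scikit-learn on made-up data (all variable names, codings, and the target formula are invented for the example; note that feeding integer category codes to a tree is a common shortcut, not strictly identical to the nominal splits KNIME's tree learners perform):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical toy data: integer-coded categoricals plus one numeric input.
n = 200
position = rng.integers(0, 5, n)       # 5 plate positions, coded 0..4
orientation = rng.integers(0, 4, n)    # 4 orientations, coded 0..3
thickness = rng.uniform(1.0, 5.0, n)   # numeric input (double)
X = np.column_stack([position, orientation, thickness])

# Made-up mechanical-resistance target, just to have something to fit.
y = 10.0 * thickness + 2.0 * position + rng.normal(0, 0.5, n)

# Trees split on thresholds, so no one-hot encoding is required here.
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(round(model.score(X, y), 2))  # training R^2
```

The same idea carries over to Gradient Boosted Trees; in KNIME you would use the corresponding Learner/Predictor node pairs instead.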
Kind regards
Alexander


#5

In fact, once I used the One to Many node, the polynomial regression worked (it ignored the coefficients for the degree-2 terms of the dummy variables), and the model apparently has good accuracy. Thanks !!
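This outcome makes sense: for a 0/1 dummy variable, squaring the column reproduces the column itself, so the degree-2 terms of one-hot variables carry no new information and the learner can absorb or drop them. A tiny check:

```python
import numpy as np

# A 0/1 dummy column: 0**2 == 0 and 1**2 == 1, so the squared
# column is identical to the original one.
dummy = np.array([0, 1, 1, 0, 1])
print(np.array_equal(dummy ** 2, dummy))  # True
```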
