Setting the range of possible values for the Regression Predictor

lunali98 · February 4, 2022, 1:31pm

Hello, KNIME community! I continue to study predictive analytics and KNIME
My predictive model predicts negative values of the marketing metric CTR, although there are no rows with negative values in the training data (this is impossible). How can I adjust the range of possible values for Regression Predictor so that the predictions can only take positive values?

aworker · February 4, 2022, 1:48pm

Hello @lunali98

This behavior is model dependent. If your model is numerically bound to provide “only positive” output predicted values then they should be positive. Otherwise they do not necessarily need to be + and the model has the freedom of generating negative values. For instance, a linear regression may predict negative values, and more generally speaking, it can produce values out of the “Training Set” bounds. Alternatively, there are ML models such as Deep Learning with RELU or Sigmoid output transfer functions which should never produce negative values.

In the case of using a ML regression model that is not bounded, you could always truncate the negative values to bring them to a zero value. This is absolutely correct since you know -a priori- that negative values are not possible. You can perfectly apply your statistical estimators after this truncation. This is also absolutely perfect. The same if your positive values should be bounded to a maximum positive value, you could also truncate them. And more generally you can truncate your predicted results to any value if a priori you know that they should not go beyond well known predefined bounds.

What kind of model are you using to achieve the regression ?

Hope is helps.

Best

Ael

lunali98 · February 4, 2022, 2:08pm

Thanks for the clarification!
I use the Linear Regression Learner node based on the workflow given in the tutorials. If linear regression can produce any values, does that mean that for my case, where the target values can be from 0 to 3.921, I should use a different model?

aworker · February 4, 2022, 2:25pm

@lunali98 My pleasure!

Not necessarily. One should use a linear model if:

The underlying nature of the problem is linear (Is it the case of your data?)
If you want to limit the solution to a linear model (more unusual case if you know a priori that your problem is not linear)

Besides the underlying nature of your data (linear or not), keep in mind that data is most often noisy which leads ML models to predict values “out of theoretical bounds”.

What is the nature of your data ? Can you describe it ? This is the first thing to consider when doing ML modeling.

ML modeling is driven by the nature of data

Looking forward to knowing your answer.
Hope it helps.

Best

Ael

system · May 5, 2022, 2:26pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.