How can I score a y in a loop regression model with multiple x?

Hello everyone,

I’m currently trying to implement a linear regression model on different target with a column list loop start.
Basically in each column involved in the loop I have an Y which is varying on the values of other columns.

I am wondering if is possible, either via an x-scoring or via a parameter optimization, to find all the possible combination of x that score a certain Y (for all the loops).

Ex:

I have Y1, Y2 and Y3 and all three of them are depending on x1, x2, x3, x4 and so on…
Can I calculate the most probable values of all the x to meet a certain score of Y, like:

Y1=A
Y2=B
Y3=C

With a loop process?

I really appreaciate your help, thank you a lot!

Hi,
what do you mean by “Can I calculate the most probable values of all the x to meet a certain score of Y”. What would be the most probable value and which score do you mean? Do you want to build 1 regression model for each Y with all other X as predictors, then find the corresponding score (i.e. RMSE, R^2, …)? Or do you want to calculate for each Y the subset of X that produces the best score?
Kind regards
Alexander

Hi Alexander,

Thank you for answering.
Probably I didn’t explain fully my problem.
I would like to forecast an Y vector with n values: Y=[y1,y2,y3,…]. This is because Sum(y1,y2,…yn)=100 (or 1 if they need to be normalized).

The Y vector, so each y(i) (y1,y2,y3,…), depend on the same x that will be used to predict the the actual Y vector.

Can I do it in Knime, and how?
Is the workflow that I am using close to what I want to obtain?

Really thank you for your help!!

Hi,
now I understand, but I don’t think we have a model that does it. You can, however, maybe just train a model for each component of Y and then use the math formula node to normalize the outputs of the models manually.
For your workflow: generally it looks good, but why do you need two loops? Why don’t you do the prediction also in the first loop?
Kind regards
Alexander

Hi Alexander,

So far I followed this workflow that I found here between the examples:

Here it use the same exact workflow for multiple target, and since in my structure I have different y(i) I thought could have been a very similar approach.
Since I’m using the second loop as forecasting, to not run the loop all the time I change the data to be used in the model.

How do you mean to implement the prediction in the first loop?

Shall force the model before the second loop to create an outcome where Y=100 given by the sum of all the y(i)?
Or do I need to weight each predicted y(i) (so, after the loop) in order to obtain a sum of all the y(i) = 100?

Which is the best case, and how do I do that, is is possible to see a small example that you think could work?

Thank you a lot.

Hi,
the way you do it is also fine, but theoretically you can immediately apply a model with the predictor once you trained it. Then you’d need only one loop, but the downside is that every time you change the prediction logic, you also need to rerun the training. So your approach is probably even better. To normalize your predictions, lets pretend the predicted values are in columns p1, p2, and p3. Then you can use the Math Formula (Multi Column) with this formula: 100 * $$CURRENT_COLUMN$$ / ($p1$ + $p2$ + $p3$).
Kind regards
Alexander

Hi Alexander,

I’ll give it a try.
Last things, since I have an extensive number of column, is there a way to make a summation formula like ( ∑ $p(i)$ ) instead of summing column by column.
Above that, can I implement also a rounding formula, that round the outcome numbers at the second digit after the comma?

Thank you again!

Hello Stefano,
you can calculate the sum before using the math formula node using the Column Aggregator node. You can then use the new column instead of the manual sum. For rounding there is the round(val, precision) function in the math formula node.
Kind regards
Alexander

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.