Multiple linear regression - Backwards elimination

Good morning everyone,

I’m currently trying to set up a new workflow for multiple linear regression, by which I mean a linear regression that works on several target columns (defined by a “Column List Loop Start” node) and uses many predictors (more than 200) for each target column.

I’m currently facing two issues that raise some questions:

  1. Here are the statistics from the linear regression; as you can see, there are no P>|t| values.

-Why is this happening?
-Will backward elimination work if I don’t have p-values for the predictors?

  2. When I insert the backward elimination into the column list loop, I have a problem with the “Backward Feature Elimination End” node.

In that node I have to set the target column and the prediction column, which are compared to perform the actual elimination.


I can control the target column by using the column defined by the “Column List Loop Start” node as a flow variable, but I cannot control the prediction column, whose name changes on each iteration.

-Is there a way to control it with a wildcard, for example one based on “Prediction”?

Please let me know and thank you in advance for your help.
Stefano
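On the first question, one plausible cause of the missing P>|t| values (an assumption, not confirmed anywhere in this thread) is that with more than 200 predictors the model has no residual degrees of freedom left. Each coefficient's p-value comes from its t-statistic, whose standard error uses the residual variance estimate RSS / (n - p - 1), where n is the number of rows and p the number of predictors; if n - p - 1 ≤ 0 that estimate does not exist. A second possible cause, not checked here, is predictors that are exact linear combinations of each other. The Python sketch below only illustrates the degrees-of-freedom check; the function names are hypothetical and this is not how KNIME computes its statistics.

```python
def residual_degrees_of_freedom(n_rows: int, n_predictors: int, intercept: bool = True) -> int:
    """Degrees of freedom left over for estimating the residual variance."""
    return n_rows - n_predictors - (1 if intercept else 0)


def residual_variance(rss: float, n_rows: int, n_predictors: int, intercept: bool = True):
    """Estimate sigma^2 as RSS / (n - p - 1); return None when it cannot be estimated.

    Standard errors, t-statistics and therefore P>|t| all depend on this
    value, so when it is None no p-values can be reported.
    """
    df = residual_degrees_of_freedom(n_rows, n_predictors, intercept)
    if df <= 0:
        return None
    return rss / df


# Hypothetical sizes: 200 predictors but only 150 rows.
print(residual_degrees_of_freedom(150, 200))  # -51
print(residual_variance(0.0, 150, 200))       # None
# With enough rows (here 221) the estimate exists: 10.0 / 20.
print(residual_variance(10.0, 221, 200))      # 0.5
```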

Hi Stefano,
I don’t have an answer for your first question right now, but I can tell you that the missing p-values should not matter for feature elimination. To control the prediction column via a flow variable, you can use the String Manipulation (Variable) node to build the prediction column name from the name of the target column. You can use the expression

join("Prediction (", $targetcol$, ")")

in it to create the name of the prediction column.
Kind regards,
Alexander
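To illustrate Alexander’s point that missing p-values should not matter: backward feature elimination only compares the target column with the prediction column. The Python sketch below assumes a greedy scheme in which each step tries removing every remaining feature once and drops the one whose removal gives the lowest error. `fit_and_predict` is a hypothetical stand-in for the learner and predictor inside the loop, and the toy “model” simply sums the selected columns; this is a sketch of the idea, not the KNIME node’s implementation.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def mean_squared_error(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Error computed only from the target column and the prediction column."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction)) / len(target)


def backward_elimination(
    features: List[str],
    target: Sequence[float],
    fit_and_predict: Callable[[List[str]], Sequence[float]],
) -> List[Tuple[List[str], float]]:
    """Greedy backward elimination; returns (feature subset, error) per level."""
    remaining = list(features)
    history = [(list(remaining), mean_squared_error(target, fit_and_predict(remaining)))]
    while len(remaining) > 1:
        best_error, best_drop = None, None
        # Try removing each remaining feature once.
        for feature in remaining:
            candidate = [f for f in remaining if f != feature]
            error = mean_squared_error(target, fit_and_predict(candidate))
            if best_error is None or error < best_error:
                best_error, best_drop = error, feature
        # Drop the feature whose removal hurts the prediction least.
        remaining.remove(best_drop)
        history.append((list(remaining), best_error))
    return history


# Toy data (hypothetical): the target is exactly x1 + x2, x3 is noise.
columns: Dict[str, List[float]] = {
    "x1": [1, 2, 3, 4],
    "x2": [0, 1, 0, 1],
    "x3": [5, -5, 5, -5],
}
target = [1, 3, 3, 5]


def sum_model(selected: List[str]) -> List[float]:
    """Stand-in 'model': predicts the row-wise sum of the selected columns."""
    return [sum(columns[name][i] for name in selected) for i in range(len(target))]


history = backward_elimination(["x1", "x2", "x3"], target, sum_model)
for subset, error in history:
    print(subset, error)
# ['x1', 'x2', 'x3'] 25.0
# ['x1', 'x2'] 0.0
# ['x1'] 0.5
```

No p-value appears anywhere: the only inputs to the selection are the target values and the predictions.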


Hi Alexander,

Thank you for your response.
Why would you use a Join function?

As I understand it, the “String Manipulation (Variable)” node would receive the flow variables from the “Predictor” node at its input port.
In the predictor output, I get a “Prediction (…)” column whose name changes with each iteration.

I need to create a variable that identifies that column with a function (probably a search) and then use this variable in the “Backward Feature Elimination” node to set the prediction column.

I don’t think

join("Prediction (", $targetcol$, ")")

would work, since I don’t have any column called targetcol.

Am I right? Am I missing something?
Thank you for your help
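Stefano’s search-based idea can be sketched in Python, purely to show the logic: pick the one column whose name starts with “Prediction (”. The function name and the prefix default are hypothetical; in KNIME the result would still have to be turned into a flow variable by some node, which this sketch does not cover. Failing loudly on zero or several matches avoids silently comparing the target against the wrong column.

```python
from typing import Iterable


def find_prediction_column(column_names: Iterable[str], prefix: str = "Prediction (") -> str:
    """Return the single column whose name starts with `prefix`.

    Raises ValueError if there is no match or more than one match.
    """
    matches = [name for name in column_names if name.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one column starting with {prefix!r}, found {matches}")
    return matches[0]


# Hypothetical predictor output for the iteration where the target is "ABC".
print(find_prediction_column(["x1", "x2", "ABC", "Prediction (ABC)"]))  # Prediction (ABC)
```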

Hi,
$targetcol$ is just a placeholder for the flow variable holding the actual column name, as output by the Column List Loop Start node. I think it is actually called currentColumn or something like that. If your target column is “ABC”, then the string manipulation expression returns “Prediction (ABC)”, and that is the column you have to compare “ABC” to.
Kind regards,
Alexander
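Alexander’s expression simply concatenates a fixed prefix, the target column name and a closing parenthesis. A Python equivalent is sketched below, assuming the predictor keeps the “Prediction (<target>)” naming seen in this thread; if a custom prediction column name is configured, the built name will not match. The function name and the list of targets are hypothetical.

```python
def prediction_column_name(target_column: str) -> str:
    """Python equivalent of join("Prediction (", $targetcol$, ")")."""
    return "Prediction (" + target_column + ")"


# Hypothetical list of target columns iterated by the loop.
for target in ["ABC", "DEF"]:
    print(target, "->", prediction_column_name(target))
# ABC -> Prediction (ABC)
# DEF -> Prediction (DEF)
```

Building the name from the loop variable avoids any search: each iteration knows its target, so the matching prediction column name follows directly.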
