I’m currently trying to set up a new workflow for multiple linear regression. By that I mean a linear regression that runs on different target columns (defined by a “Column List Loop Start” node) and uses a large number of predictors (more than 200) for each target column.
I’m currently facing two issues that raise some questions:
Here is a list of the statistics on the linear regression, and we can see that there are no P>|t| values.
In fact, in that node I have to set the target column and the prediction column to be compared in order to perform the actual elimination.
I can control the target column by using, as a flow variable, the column defined by the “Column List Loop Start” node, but I cannot control the prediction column, whose name changes with each iteration.
- Is there a way to control it with a wildcard, for example (using Prediction)?
Please let me know and thank you in advance for your help.
Stefano
Hi Stefano,
I don’t have an answer for your first question right now, but I can tell you that the missing P should not matter for feature elimination. For controlling the prediction column via flow variable, you can use the String Manipulation (Variable) node to build the variable name based on the name of the target columns. You can use the expression
join("Prediction (", $targetcol$, ")")
in it to create the name of the prediction column.
Kind regards,
Alexander
Thank you for your response.
Why would you use the join function?
I think the “String Manipulation (Variable)” node would receive, at its input port, the variables from the “predictor” node.
In the predictor, each iteration produces a “Prediction (…)” column whose name changes based on the iteration.
I need to create a variable that identifies that column with a function (probably a search) and use this variable in the “Backward Feature Elimination” node to set the prediction column.
I don’t think
join("Prediction (", $targetcol$, ")")
would work, since I don’t have any column called targetcol.
Am I right? Am I missing something?
Thank you for your help
Hi,
$targetcol$ is just the placeholder for the actual column name, as it is output by the Column List Loop Start node. I think it is actually called currentColumn or something like that. If your target column is “ABC”, then the string manipulation expression returns “Prediction (ABC)” and that is the column that you have to compare “ABC” to.
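To illustrate the idea outside of KNIME, here is a minimal Python sketch of what the expression does in each loop iteration. The function name prediction_column and the sample column names are made up for illustration, and it assumes the predictor names its output “Prediction (<target>)” as described above.

```python
def prediction_column(target: str) -> str:
    """Mimic join("Prediction (", target, ")"): concatenate the three pieces."""
    return "".join(["Prediction (", target, ")"])


# Hypothetical target columns, standing in for what the loop start node iterates over.
targets = ["ABC", "DEF"]

for target in targets:
    # Hypothetical table columns after the predictor has run in this iteration.
    columns = [target, "x1", "x2", "Prediction (" + target + ")"]

    pred = prediction_column(target)
    # The generated name should match a column that actually exists in the table.
    assert pred in columns
    print(target, "is compared to", pred)
```

So no search or wildcard is needed: the prediction column name follows directly from the target column name in every iteration.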
Kind regards,
Alexander