Create loop for regression calculation

Hi,

I’m trying to create a workflow to make linear regressions easier to read.

I have several columns, and I would like to do the regression for column 1 vs 2, column 1 vs 3, column 1 vs 4 and so on.

I have been having difficulty using the loop to facilitate this. After that I want to compare the regression value with the real value.

I believe there is an easier way than creating an endless number of nodes, as I did in the example below.

If you have a tutorial to share, I would appreciate it.

Hi @ddgrespan,

I think you could do this with the Column List Loop Start node. Essentially, any columns in the Include section get looped through and any in the Exclude section don’t. So I’m assuming you would keep all non-looped data, including column 1, in Exclude, put everything else in Include, and build on it from there.
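Outside KNIME, the same idea (keep one reference column fixed and loop over the others) can be sketched in plain Python. This is only an illustration: the column names `ano`, `col2`, `col3` and the numbers are made up, and it assumes column 1 is the predictor while each looped column is the target.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regress_each_column(table, reference):
    """Fit reference -> column for every column except the reference."""
    xs = table[reference]
    results = {}
    for name, ys in table.items():
        if name == reference:
            continue  # the reference column is never looped over
        slope, intercept = fit_line(xs, ys)
        results[name] = {
            "slope": slope,
            "intercept": intercept,
            "predicted": [slope * x + intercept for x in xs],
            "actual": ys,  # kept next to the prediction for comparison
        }
    return results


# Hypothetical data, only for illustration.
table = {
    "ano": [2018, 2019, 2020, 2021],
    "col2": [10.0, 12.0, 14.0, 16.0],
    "col3": [5.0, 4.0, 6.0, 5.0],
}
results = regress_each_column(table, "ano")
```

Each entry in `results` holds the predicted and the actual values side by side, which is the comparison asked for in the first post.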

Thanks,

Matt

Hi @ddgrespan, and welcome to the KNIME forum.
You can start with a list of files (all the CSV paths). If they are all in a single folder structure, you can start with a ‘List Files/Folders’ node, ending up with all your paths in ‘Path Format’ in one column.

A second column with the Excel paths may be necessary if each Excel input comes from a different file.

Afterwards you can start your iterations with a ‘Table Row to Variable Loop Start’ node, connecting it to the ‘CSV Reader’ through the variable port and using the Path variable to feed ‘file_selection’. The same goes for the Excel reader, which will need a variable port connection as well.

Then the final workflow will have a single path through your linear regression nodes, and a standard Loop End to collect your data, which will replace all the joiners in your workflow.
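As a rough analogy in Python, the loop described above lists the files, runs each one through the same regression step, and collects one result row per file at the end. This is a sketch under assumptions, not the KNIME workflow itself: the file names, the column names `ano` and `value`, and the helper names are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regress_file(path):
    """Loop body: read one CSV and fit one regression."""
    with path.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    xs = [float(row["ano"]) for row in rows]
    ys = [float(row["value"]) for row in rows]
    slope, intercept = fit_line(xs, ys)
    return {"file": path.name, "slope": slope, "intercept": intercept}


def run_loop(folder):
    """List the files, run the same body on each, collect the rows."""
    collected = []
    for path in sorted(Path(folder).glob("*.csv")):
        collected.append(regress_file(path))
    return collected


# Hypothetical input files, written to a temporary folder for the demo.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.csv").write_text("ano,value\n2019,1\n2020,3\n2021,5\n")
    Path(folder, "b.csv").write_text("ano,value\n2019,9\n2020,6\n2021,3\n")
    collected = run_loop(folder)
```

The single `regress_file` function plays the role of the one set of regression nodes, and the `collected` list plays the role of the Loop End.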

BR

What I imagined was this

image

And the result for one operation is correct: I get both the predicted column and the actual data.

But I’m having difficulty getting the same for the rest of the columns.

Best regards!

@ddgrespan
Let’s focus on one thing at a time. First we can look at how to read files in a loop, and afterwards you can start adding the rest of the functionality.

In this workflow you can learn how to handle files in a loop (the lower approach).

A sketch of the workflow would be something like:

Good luck

Agree with @gonhaddock, as it does not look like all your data is in the same file, and there may be other relevant specifics that need to be taken care of first.
So give his solution a try.
br

Hi @Daniel_Weikert , @ddgrespan
I have doubts about the data source and what it looks like.
If you read the challenge, you can assume that there is a single data source arranged in columns:

But I can see in the workflow a CSV connected to the Learner and an Excel connected to the Predictor, so there are at least two files.

Looking at the arrangement of the workflow, there are multiple CSV Readers. That would be very inefficient for a single source, where one reader could easily be connected to many Column Filters (?).

So if you can provide some more information about the sources, we could help with a more precise workflow arrangement.

BR

Hi,

Here’s the model I’m using:

As you can see, I’m trying to create the regression with reference to “ano”.

When I run it, it works, but in the second loop iteration the following error appears:
image

Any suggestions on how to optimize this?

Well, you exclude Ano. Have you tried including it so the target is always there?
br