How to: Loop that drops one column in each iteration

Hi,

What:
I am trying to build a Random Forest scorer combined with a loop and variable importance. Let me explain:
I have a big table with 50 columns on which I want to run a Random Forest regression. To prepare the Random Forest, I first evaluate its variable importance. I have found that roughly 20 columns are not relevant and only confuse the model, so it is better to delete them. I found those 20 columns by trial and error, and I want to make the process automatic.
My idea is to make a loop of 50 iterations. In each iteration I look up the least relevant column in the Variable Importance node I have created and remove that column for the next iteration. If that works, the first iteration evaluates 50 columns, the second the best 49 from the previous iteration, the next 48, then 47… until no columns are left.
In each iteration I run a Random Forest regression and score it. Once all iterations have run, the output is a table with 50 rows, each holding the column deleted in that iteration and its score. Then I plot it to see where the most reliable setup lies.
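For readers who think better in code, the loop described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not a KNIME workflow: the `fit` callback is hypothetical and stands in for the learner, scorer, and variable-importance nodes, and `toy_fit` with its weights is invented purely so the sketch runs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: trains a model on the given columns and returns
# (score, {column: importance}). In KNIME this role would be played by the
# learner, the scorer, and the variable-importance node.
FitFn = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def eliminate_by_importance(
    columns: List[str], fit: FitFn
) -> List[Tuple[int, str, float]]:
    """Drop the least important column each round.

    Returns one row per round: (columns used, column dropped, score).
    """
    remaining = list(columns)
    log = []
    while remaining:
        score, importance = fit(remaining)
        # The column with the lowest importance is removed for the next round.
        worst = min(remaining, key=lambda c: importance[c])
        log.append((len(remaining), worst, score))
        remaining.remove(worst)
    return log


# Toy stand-in for a real model, invented for this example only:
# two useful columns and two noise columns that slightly hurt the score.
TRUE_WEIGHT = {"a": 5.0, "b": 3.0, "noise1": 0.0, "noise2": 0.0}


def toy_fit(cols: List[str]) -> Tuple[float, Dict[str, float]]:
    signal = sum(TRUE_WEIGHT[c] for c in cols)
    noise_cols = sum(1 for c in cols if TRUE_WEIGHT[c] == 0.0)
    score = signal / 8.0 - 0.1 * noise_cols
    return score, {c: TRUE_WEIGHT[c] for c in cols}


log = eliminate_by_importance(list(TRUE_WEIGHT), toy_fit)
# The round with the highest score marks the column set worth keeping.
best = max(log, key=lambda row: row[2])
```

Plotting the score column of `log` against the number of columns gives the kind of curve mentioned above, and `best` points at the round where the score peaks.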

Problem:
After 3 whole days of trying, I have not been able to delete one column in each iteration. KNIME doesn’t allow me to connect a node holding the result of an iteration and use it to delete a column at the beginning of the loop.
In the picture you can see one of the several tests I have run, where KNIME does not let me connect those 2 nodes. It doesn’t work with this simple method, nor with variables or external tables to which I add a new column each iteration.
I am stuck. Need help!!

Thanks,

You can use the Feature Selection loop for this.

This is the Loop Start you would use:

And on the hub you can also find an example workflow:

Hi there @Apeire,

I usually advocate “first try it yourself and then ask”, but there is no need to do that for 3 whole days…

Welcome to KNIME Community! :slight_smile:

Br,
Ivan

Hi Iris,

Many thanks for answering. I saw your solution and went straight to test it, as it looks quite interesting, but after 2 days of testing and trying to understand how it works I am still struggling. The results don’t make sense, so I am pretty sure I don’t understand the node yet. I have a new theory to try. If I don’t get it right, I will come back hoping for enlightenment.

“See” you soon.

Thanks,

When I am close to the solution I like to ask. But when I am so far away I don’t like to ask silly questions without even understanding what I want.
:sweat_smile:

All right,

I have been “playing” with this great Feature Selection Loop. It covers what I was looking for: once it has completed a loop, it goes back, deletes a column, and loops again. Fantastic, I finally made it work pretty well. But I have some concerns about it.

  1. I have no idea what it does between loops. If I have 50 columns it deletes 1 per loop, but between loops it runs 50 other loops to find the best model with the X columns available. What does it do with these columns? Does it eliminate them one at a time? Does it shuffle them one at a time? I would like to know in order to understand the process, but I couldn’t find that information.

  2. Before testing Feature Selection Loop Start (FLS) I designed a node that does everything FLS does between loops: it runs X loops (e.g. 50), varying one column at a time (shuffling it), and finds out which column to eliminate. Then I chained 50 of these nodes in sequence; each node eliminates one column, the least important one, and reports its R^2, so I get more or less the same exercise as FLS. This results in a TERRIBLE, truly terribly inefficient model that takes 20-30 times longer than FLS, and I cannot even run more than 10 of these nodes at the same time because it crashes KNIME. But here is the good point and my question: the result was a noticeably more reliable and accurate model than with FLS, and I think it is because I shuffle the columns before training. So, is there a way to make FLS shuffle the columns, or is there another loop node that lets me do what my custom node does?
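To make the two ideas in these questions concrete, here is a small Python sketch contrasting them. It rests on assumptions: as far as I understand the backward elimination strategy of the Feature Selection Loop, each round retrains the model once per remaining column with that column left out and drops the column whose removal scores best, whereas the custom node described above ranks columns by shuffling them one at a time. The names `permutation_importance`, `backward_step`, `predict`, and the toy data are all hypothetical, and the "model" here is a fixed function rather than a trained Random Forest.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (R^2)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    rows: List[Row],
    y: Sequence[float],
    predict: Callable[[Row], float],
    columns: List[str],
    seed: int = 0,
) -> Dict[str, float]:
    """Shuffle one column at a time and record how much R^2 drops."""
    rng = random.Random(seed)
    baseline = r2(y, [predict(r) for r in rows])
    drops = {}
    for col in columns:
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with only this one column replaced by a shuffled value.
        permuted = [{**r, col: v} for r, v in zip(rows, shuffled)]
        drops[col] = baseline - r2(y, [predict(r) for r in permuted])
    return drops


def backward_step(
    columns: List[str], score_without: Callable[[str], float]
) -> str:
    """One wrapper-style round: the column whose removal scores best is dropped."""
    return max(columns, key=score_without)


# Toy data, invented for the example: y depends on "x" only.
rows = [{"x": float(i), "junk": float((i * 7) % 5)} for i in range(20)]
y = [3.0 * r["x"] for r in rows]
predict = lambda r: 3.0 * r["x"]  # assumed, already-"trained" model

drops = permutation_importance(rows, y, predict, ["x", "junk"])
least_important = min(drops, key=drops.get)

# Hypothetical scores of models retrained without each column.
toy_scores = {"x": 0.10, "junk": 0.95}
dropped = backward_step(["x", "junk"], toy_scores.get)
```

Both approaches pick `junk` on this toy data, but they do so differently: the shuffle-based ranking needs one trained model per round, while the wrapper-style round needs one retrained model per remaining column, so the two can rank columns differently on real data.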

I attach some pictures of what I do in my custom node. The images are in the order in which you enter each node.

Many thanks in advance
