Forward feature selection

Dear community,
I’m using KNIME for data mining, and I’m running forward feature selection on a very large dataset with just over 3000 columns before applying a regression algorithm. The problem is that the loop never finishes (it ran all night without completing). Can someone help me?

Hello Piera,

in the Feature Selection Loop Start node you can specify a threshold for the maximal number of features you want to find. This reduces the search space dramatically if the desired number of features is much smaller than the total number of features.
You can also reduce your feature set by removing constant features (Low Variance Filter) or highly correlated features (Linear Correlation and Correlation Filter). A smaller feature set requires fewer iterations in your feature selection loop and hence reduces the runtime.
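To illustrate the idea of pre-filtering outside of KNIME, here is a minimal Python sketch. It is not how the Low Variance Filter or Correlation Filter nodes are implemented; the function names, the thresholds and the greedy "keep the first column seen" rule are assumptions made for the example.

```python
from math import sqrt
from statistics import pvariance


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def prefilter(columns, min_variance=0.0, max_abs_corr=0.95):
    """Drop (near-)constant columns and columns highly correlated with a kept one.

    `columns` maps a column name to its list of numeric values.
    Thresholds are illustrative defaults, not KNIME settings.
    """
    kept = {}
    for name, values in columns.items():
        # Step 1: skip columns whose variance is at or below the threshold.
        if pvariance(values) <= min_variance:
            continue
        # Step 2: skip columns that nearly duplicate an already kept column.
        if any(abs(pearson(values, other)) > max_abs_corr for other in kept.values()):
            continue
        kept[name] = values
    return kept


data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],  # constant
    "d": [1.0, 0.0, 1.0, 0.0],
}
print(list(prefilter(data)))  # ['a', 'd']
```

Every column removed this way is one less candidate that the feature selection loop has to try in each round.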

Best,

nemad

Yes, I tried setting a threshold of 20, but it still takes a very long time and never seems to finish. I will now try applying the Linear Correlation and Correlation Filter nodes. Thank you very much for your reply. I’ll let you know soon.

The problem is that even with a maximal feature set size of 20, the algorithm has to perform 62790 loop iterations (just over 3000 to select the first feature, one fewer for the second feature, and so on).
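To make that count concrete, here is a small Python sketch (not KNIME code; the function name is made up for illustration). Forward selection tries every remaining candidate in each round, so the total is a simple sum. With exactly 3000 columns it comes to 59810; the 62790 quoted above corresponds to 3149 columns, which fits "just over 3000".

```python
def forward_selection_iterations(n_features: int, max_selected: int) -> int:
    """Count loop iterations for greedy forward feature selection.

    Round k (0-based) evaluates each of the n_features - k candidates
    that have not been selected yet.
    """
    if not 0 <= max_selected <= n_features:
        raise ValueError("max_selected must be between 0 and n_features")
    return sum(n_features - k for k in range(max_selected))


print(forward_selection_iterations(3000, 20))    # 59810
print(forward_selection_iterations(3149, 20))    # 62790
print(forward_selection_iterations(3000, 3000))  # 4501500 (no threshold)
```

Each of these iterations runs the whole loop body once, so the total runtime is roughly the iteration count multiplied by the time it takes to train and score one model.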