Hello,
I would like to try all combinations of features to select the best combination to give the highest accuracy from regression.
I found the Feature Selection Loop Start and End nodes, but they only offer the forward, backward, random and genetic-algorithm strategies for selecting features.
None of them tests all combinations.
Is it possible to provide a workflow that forces the selection of all the combinations possible?
The run will take much more time, but it will guarantee that the best model is used.
Hi zizoo,
There seems to be no native node for exhaustive search :(. If you are proficient with Python/R/Java, you can try a snippet node in the corresponding language. For example, in Python there seems to be an ExhaustiveFeatureSelector in the mlxtend package, or you can generate all possible combinations using the itertools module, get those out into KNIME, and then use them as masks to select the various feature subsets natively in KNIME.
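A minimal sketch of the itertools idea, assuming the feature names are available as a plain Python list. The names `feature_names` and `all_feature_masks` are made up for illustration. Each mask is a tuple of booleans, one per feature, which could be written to a table and used in KNIME to filter columns:

```python
from itertools import combinations

def all_feature_masks(feature_names):
    """Yield one boolean mask per non-empty subset of feature_names."""
    n = len(feature_names)
    for size in range(1, n + 1):
        for chosen in combinations(range(n), size):
            picked = set(chosen)
            # True at position i means "keep feature i" in this subset.
            yield tuple(i in picked for i in range(n))

feature_names = ["a", "b", "c"]  # hypothetical feature names
masks = list(all_feature_masks(feature_names))
for mask in masks:
    # Translate the mask back into the selected feature names.
    print([name for name, keep in zip(feature_names, mask) if keep])
```

For n features this yields 2^n - 1 masks, so it is only practical for a small number of features.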
What kind of model are you interested in? I’m a bit confused, because you mentioned “… accuracy from regression”. If you plan to use a linear model only, you can use the H2O GLM, which lets you constrain the number of parameters using Lasso regularisation. Would that be a viable option?
Best regards,
Mischa
Hello @zizoo,
there is a way to do brute-force feature selection via KNIME’s Parameter Optimization Loop.
I created a workflow that shows how to do it: https://hub.knime.com/nemad/space/Brute_Force_Feature_Selection
Besides the long runtime, this approach has the downside that configuring the Parameter Optimization Loop Start is tedious for a large number of features. However, in that case the loop will run forever anyway, so the longer configuration won’t have a big relative impact on the overall time spent on the workflow.
Cheers,
Adrian
Hi @lisovyi,
I plan to try different algorithms including linear regression, XGBoost and neural networks.
The reason for using the exhaustive search is to find the one or two features that provide the highest accuracy, as measured by the highest R2.
Hi @nemad
I have hundreds of features that would have to go into the Parameter Optimization Loop Start node.
Is there an option to automate the configuration of this node, so that I don’t have to enter the features manually?
Hi,
the strategy very much depends on the goal. If your goal is to determine a subset of the most important features for each model, why don’t you use model interpretation tools that were released in KNIME 4.0? For example, see this nice workflow:
https://kni.me/w/hl-WmjRhnteq_Aq-
As for your second question, at the moment there is no straightforward way to dynamically configure the Parameter Optimization Loop in your use case (it is possible to configure the numerical parameters via flow variables, but not the number of parameters). However, we have it on our TODO list. Maybe Adrian can come up with a workaround.
Best regards,
Mischa.
Hello @zizoo,
unfortunately, there is currently no workaround that would work for hundreds of features.
However, in this case, an exhaustive search is not feasible anyway:
For n features an exhaustive search has 2^n iterations.
In order to better illustrate what this means let’s consider the specific case of 300 features, in which the exhaustive search would have 2^300 > 10^90 iterations.
That is more iterations than there are atoms in the observable universe.
It’s a different story if you are only interested in subsets of a certain (small) size e.g. 2.
The number of such subsets is given by the binomial coefficient.
In the case of 300 features and subset size 2, this amounts to only 44850 possible subsets. A search through this space will still take a long time but it won’t take literally forever like the exhaustive search.
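These counts are easy to check in Python. The snippet below is only a sanity check of the arithmetic above, using the standard-library `math.comb` (available from Python 3.8); the variable names are made up for illustration:

```python
import math

n_features = 300

exhaustive = 2 ** n_features       # every subset, including the empty one
pairs = math.comb(n_features, 2)   # subsets of size exactly 2

print(exhaustive > 10 ** 90)  # True
print(pairs)                  # 44850
```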
I believe it should be possible to realize such a search in KNIME using a nested combination of Column List Loops.
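Outside of the native loop nodes, for example in a Python snippet node, the same fixed-size search could be sketched roughly as follows. This is an illustration under assumptions, not the workflow from this thread: `best_subsets` and `toy_score` are hypothetical names, and `score` stands in for whatever trains a model on the given columns and returns its R2.

```python
from itertools import combinations

def best_subsets(feature_names, subset_size, score, top=3):
    """Score every subset of the given size and return the best `top` ones.

    `score` is a caller-supplied function that takes a tuple of feature
    names and returns a number where higher is better (e.g. R2).
    """
    results = [(score(subset), subset)
               for subset in combinations(feature_names, subset_size)]
    # Highest score first; ties keep their original enumeration order.
    results.sort(key=lambda item: item[0], reverse=True)
    return results[:top]

# Toy stand-in for a real model evaluation: it simply pretends that
# "x1" and "x3" are the informative features.
def toy_score(subset):
    return sum(0.4 for name in subset if name in ("x1", "x3"))

features = ["x0", "x1", "x2", "x3"]  # hypothetical feature names
ranking = best_subsets(features, subset_size=2, score=toy_score, top=1)
print(ranking[0][1])
```

With 300 features and `subset_size=2`, `score` would be called 44850 times, so the cost of a single model fit dominates the runtime.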
Cheers,
Adrian
Dear @nemad
I tried the nested column loop as you suggested with generic data in the workflow below.
It is not configured properly.
Could you please help me fix it? KNIME_project5.knwf (29.8 KB)
Thanks,
Hi @nemad,
I could find the reason for the loop failure.