Can KNIME do the feature selection and then parameter optimization simultaneously?
For example, we select feature 1 and then try to optimize the parameter for that feature.

Now, I have tried to do it as shown in the picture.
I am not sure whether it can store both the parameters and the features.

Parameter optimization only makes sense once the features are set; otherwise you are doing much more work than is actually needed.

You would also need to cross-validate each feature combination; otherwise you are simply optimizing the features for one specific training set rather than for a generalized one.

For each feature selection step you would do CV, and then, once you have the features, do CV again inside each parameter optimization loop, which makes everything much more computationally expensive. Hence I'm not really a fan of feature selection done this way, especially for trees: trees should be relatively immune to unimportant features. For feature selection I would simply go with a linear correlation filter and a low variance filter.
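To see why nesting parameter optimization inside feature selection gets expensive, here is a back-of-the-envelope sketch in Python. The numbers (10 features, 5 CV folds, 20 parameter combinations) are made-up assumptions for illustration, not values from this thread:

```python
# Rough count of model trainings for sequential vs. simultaneous search.
# Assumption: forward feature selection over n_features, each candidate
# feature set cross-validated; a grid search of n_param_combos, also
# cross-validated.

def feature_selection_trainings(n_features: int, cv_folds: int) -> int:
    # Forward selection evaluates n + (n-1) + ... + 1 candidate sets,
    # each requiring cv_folds train/test runs.
    candidate_sets = n_features * (n_features + 1) // 2
    return candidate_sets * cv_folds

def param_search_trainings(n_param_combos: int, cv_folds: int) -> int:
    # A plain grid search: one CV run per parameter combination.
    return n_param_combos * cv_folds

# Sequential: select features once, then tune parameters once.
sequential = feature_selection_trainings(10, 5) + param_search_trainings(20, 5)

# Simultaneous: run the full parameter search inside every
# feature-selection candidate evaluation.
simultaneous = 10 * (10 + 1) // 2 * 20 * 5

print(sequential)    # 275 + 100 = 375 trainings
print(simultaneous)  # 55 * 20 * 5 = 5500 trainings
```

With these toy numbers the simultaneous search trains roughly 15 times as many models, which matches the point above about computational cost.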

Also, random forest and boosting are relatively insensitive to parameter optimization compared to, say, SVMs or neural networks, where wrong parameters lead to unusable models. If a forest performs badly with 100 trees, it will not magically become excellent with 1000 trees. So, while not 100% correct, you can probably get away with optimizing once and reusing those parameters for the same data set even if new data is added (assuming the new data is only a small fraction of the total).


Hi @Kpa

If you really want to do it simultaneously, you need to use a regular “Loop End” node, which then collects both pieces of information in one table. Since you then don't have a “Feature Selection Model” to filter your columns, you also need to do the filtering on your own. The “Extract Column Header”, “Insert Column Header”, and “Reference Column Filter” nodes may be useful for this process.

Doing feature selection and parameter optimization simultaneously can be very expensive, so to save time you could first do the parameter optimization and afterwards the feature selection. As an optional step, you could then run another parameter optimization on the selected features. This may give slightly worse results than doing it simultaneously, but it saves you time and work.
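Outside of KNIME, the same sequential recipe can be sketched in Python with scikit-learn. This is purely illustrative; the random forest model, the parameter grid, and the synthetic data are my own assumptions, not part of the workflow discussed here:

```python
# Sequential recipe: tune parameters -> select features -> optionally re-tune.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption, just for a runnable example).
X, y = make_classification(n_samples=200, n_features=12,
                           n_informative=4, random_state=0)

# Step 1: parameter optimization on all features.
grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search.fit(X, y)

# Step 2: cross-validated forward feature selection with the tuned model.
selector = SequentialFeatureSelector(search.best_estimator_,
                                     n_features_to_select=4, cv=3)
selector.fit(X, y)
X_sel = selector.transform(X)

# Step 3 (optional): another parameter optimization on the selected features.
search2 = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3)
search2.fit(X_sel, y)
print(search2.best_params_, search2.best_score_)
```

Note that each stage is cross-validated on its own, which keeps the cost linear in the number of stages rather than multiplicative, as in the fully simultaneous search.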

Also worth a look is the workflow we published a few weeks ago, which does fully automated machine learning: https://www.knime.com/blog/intelligently-automating-machine-learning-artificial-intelligence-and-data-science. It performs fully automatic parameter optimization of models and feature engineering for your data. You can download and use it here: knime://EXAMPLES/50_Applications/36_Guided_Analytics_for_ML_Automation

Cheers,
Simon


Actually, I have already done the feature selection first and then optimized the parameters.
I just had a doubt about whether we can do it simultaneously or not.
So thank you so much for your answer and for pointing me to the useful example.

That workflow doesn’t work if you deselect any tree-based model.


Hi @beginner,

What exactly do you mean? What does not work?

What I mean is that it wasn’t very well tested.

What doesn’t work: (see the attached screenshots of the errors)

Thanks for reporting this; we will look into it and upload a fixed version. A workaround may be to check “Finetune Model Parameters” and leave the settings as they are on the next two pages.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

We released a new version of the workflow. You can find it here: knime://EXAMPLES/50_Applications/36_Guided_Analytics_for_ML_Automation
The bug you reported has been fixed.
