How to evaluate multiple (classification) models and choose the best one? (without doing too much manual work)

Hello,

I am looking into KNIME as an analytical workbench and data mining tool. It looks very promising and I have already built some models. However, I find it hard to find an efficient way to optimize models and choose among them.

So, as an example, let's look at a classification problem.

  • I have three classification models (algorithms):
    • Decision tree
    • Logistic regression
    • SVM
  • I have a dataset split in train, validation and test (60/20/20)

My questions are:

  1. How can I use the train and validation sets to optimize the learners?
    1. How can I run different versions of the learners (e.g. Decision Tree with and without pruning, SVM with a polynomial, HyperTangent or RBF kernel)?
    2. With the validation set I would like to optimize the parameters of the algorithms, such as the minimum number of records per node, gamma, kappa, etc.
  2. How can I compare the performance of the different models on the different data sets and select my best model?

Doing something like this in SAS EM would require 'just a couple of nodes' (and a lot of Euros :)). I'm not sure how to do this in KNIME, or whether it is possible at all…

Any help would be appreciated.

regards,

Geoffrey

Do you want to automate this? 

You could do some loops that iterate over different configurations to achieve your decision tree learning. Or you could manually configure the multiple options and stick them in a meta node (this would be my preference if you only have a few different configs).
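To make that idea concrete, here is a minimal sketch of the loop-over-configurations pattern in Python/scikit-learn, standing in for the KNIME learner nodes (the dataset, split and parameter values are illustrative assumptions, not from the original question):

```python
# Minimal sketch: loop over learner configurations, score each on the
# validation set, keep the best. scikit-learn stands in for KNIME's
# learner nodes; dataset and parameter values are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# 60/20/20 train/validation/test split, as in the original question
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

# Each entry plays the role of one configuration / one metanode branch
configs = {
    "Decision Tree (no pruning)": DecisionTreeClassifier(random_state=42),
    "Decision Tree (pruned)": DecisionTreeClassifier(ccp_alpha=0.01, random_state=42),
    "SVM (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
    "SVM (polynomial kernel)": SVC(kernel="poly", degree=3),
    "Logistic Regression": LogisticRegression(max_iter=5000),
}

# Train each configuration on the train set, compare on the validation set
val_scores = {}
for name, model in configs.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)
    print(f"{name}: validation accuracy = {val_scores[name]:.3f}")

# Pick the winner, then report its performance on the untouched test set
best = max(val_scores, key=val_scores.get)
print(f"Best: {best}, test accuracy = {configs[best].score(X_test, y_test):.3f}")
```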

There are performance-calculating nodes like "Scorer" which will produce performance metrics for classification models. I'm sure there is an equivalent for regression.
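Continuing the sketch above, the kind of output the Scorer node produces for a classifier corresponds roughly to a confusion matrix plus accuracy:

```python
# What the Scorer node computes, in scikit-learn terms (reuses the
# fitted models and validation split from the sketch above).
from sklearn.metrics import confusion_matrix, accuracy_score

y_pred = configs[best].predict(X_val)    # predictions of the selected model
print(confusion_matrix(y_val, y_pred))   # rows: actual class, columns: predicted
print(f"accuracy = {accuracy_score(y_val, y_pred):.3f}")
```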

For optimising parameters, look into the parameter optimisation loop nodes in the Labs section. They should do what you are after.
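The parameter optimisation loop boils down to the same pattern: a loop start defines a parameter range, an objective is computed per iteration, and the loop end keeps the best value. A rough Python equivalent (reusing the split from the sketch above; the gamma range is an assumption):

```python
# Rough equivalent of a parameter optimisation loop: sweep one
# parameter over a range, track the objective, keep the best.
import numpy as np
from sklearn.svm import SVC

best_gamma, best_score = None, -np.inf
for gamma in np.logspace(-4, 1, 12):      # loop start: parameter range
    model = SVC(kernel="rbf", gamma=gamma)
    model.fit(X_train, y_train)           # reuses the earlier split
    score = model.score(X_val, y_val)     # objective on validation data
    if score > best_score:                # loop end: remember the best
        best_gamma, best_score = gamma, score

print(f"best gamma = {best_gamma:.4g}, validation accuracy = {best_score:.3f}")
```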

Cheers

Sam

Dear @swebb,
Sorry for reviving such an old post, but I hope you're still around and can see (and answer) this post of mine.
I have a similar situation, comparing the performance of different algorithms, and I would like to apply some loops to find their best configurations. Can you help me with this task? I can easily share the anonymized data of my research with you…
Thanks in advance for any help you can offer…
B.R.,
Rogério.

You could try out the AutoML component to avoid creating individual models yourself.
br


@rogerius1st I see these approaches (though it would be better to open a new thread and leave such old ones alone).

First: AutoML Regression and Classification Examples. As @Daniel_Weikert has mentioned, these are the components currently being developed by KNIME:

  • AutoML Regression and Classification Examples – KNIME Community Hub (example on the hub for classification and regression)
  • Guided Automation – KNIME Community Hub (guided automation)
  • Compute and Visualize Global Feature Importance Metrics – KNIME Community Hub (global feature importance)

There is a blog collection describing the approach:

https://www.knime.com/integrated-deployment-knime-blog-series


Second: H2O.ai AutoML in KNIME for classification problems (my own little approach utilising H2O.ai AutoML)

H2O.ai AutoML (generic KNIME nodes) in KNIME for classification problems - a powerful auto-machine-learning framework

Sparkling predictions and encoded labels - “the poor man’s ML Ops”

Results get evaluated with an R node collection (Model Quality Classification – Graphics – KNIME Community Hub) and are stored in sub-folders.
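For a sense of what such an AutoML run does, here is a minimal sketch using H2O's Python API, the same engine the KNIME H2O nodes wrap (the file name and the "target" column are hypothetical placeholders):

```python
# Minimal H2O AutoML sketch: train a leaderboard of models and pick
# the leader. File name and target column are hypothetical.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
df = h2o.import_file("my_data.csv")
df["target"] = df["target"].asfactor()    # categorical target -> classification
train, valid = df.split_frame(ratios=[0.8], seed=42)

aml = H2OAutoML(max_models=20, max_runtime_secs=600, seed=42)
aml.train(y="target", training_frame=train, leaderboard_frame=valid)

print(aml.leaderboard.head())   # all trained models, ranked by the default metric
best_model = aml.leader         # best model, ready for predict() or export
```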


Third: The KNIME Model Process Factory (2017), an older approach by @Iris for collecting and evaluating models.
