How to evaluate multiple (classification) models and choose the best one? (without doing too much manual work)

Hello,

I am looking into KNIME as an analytical workbench and data mining tool. It looks very promising and I have already built some models. However, I find it hard to find an efficient way to optimize different models and choose among them.

As an example, let's look at a classification problem.

  • I have three classification models (algorithms)
    • Decision tree
    • Logistic regression
    • SVM
  • I have a dataset split into train, validation and test sets (60/20/20)

My questions are

  1. How can I use the train and validation sets to optimize the learners?
    1. How can I run different versions of the learners (e.g. decision tree with and without pruning, SVM with polynomial, HT or RBF kernel)?
    2. With the validation set I would like to optimize the parameters of the algorithms, such as the minimum number of records per node, gamma, kappa, etc.
  2. How can I compare the performance of the different models on the different data sets and select the best model?

Doing something like this in SAS EM would require 'just a couple of nodes' (and a lot of Euros :)). I'm not sure how to do this in KNIME, or whether it is possible at all.

Any help would be appreciated.

regards,

Geoffrey

Do you want to automate this? 

You could build loops that iterate over different configurations for your decision tree learning. Or you could manually configure the multiple options and put them in a meta node (this would be my preference if you only have a few different configs).

There are performance-calculating nodes like "Scorer" which compute performance measures for classification models. I'm sure there is an equivalent for regression.
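To illustrate what scoring a classifier usually amounts to, here is a minimal sketch in plain Python (not KNIME). It compares actual and predicted labels and reports a confusion matrix and the accuracy. The labels are made up, and the function names are hypothetical helpers, not part of any KNIME API.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of predictions")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up labels, purely for illustration.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)

# Rows are the actual class, columns the predicted class.
print("actual yes:", matrix[("yes", "yes")], matrix[("yes", "no")])
print("actual no: ", matrix[("no", "yes")], matrix[("no", "no")])
print(f"accuracy = {acc:.3f}")
```

Running the same scoring on the validation set for each candidate model gives one comparable number per model, which is what you need to rank them.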

For optimising parameters look into the parameter optimisation loop nodes in the labs section. They should do what you are after.
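Conceptually, a parameter optimisation loop tries each candidate setting, trains on the train set, scores on the validation set, and keeps the best setting. The test set is used only once at the end. The rough sketch below shows that idea in plain Python (not KNIME). The tiny 1-D k-nearest-neighbour classifier, the synthetic data, and the candidate values of `k` are all assumptions chosen to keep the example self-contained. It is a brute-force search and not necessarily the strategy the KNIME nodes use.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training rows nearest to x (single feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, rows, k):
    """Share of rows whose label is predicted correctly from the train set."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)


# Synthetic data: label is 1 when x > 0.5, with roughly 10% label noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# 60/20/20 split into train, validation and test.
train, validation, test = data[:120], data[120:160], data[160:]

# Try each candidate parameter value and score it on the validation set.
results = {k: accuracy(train, validation, k) for k in (1, 3, 5, 7, 9)}
best_k = max(results, key=results.get)

# Touch the test set only once, with the chosen parameter.
test_accuracy = accuracy(train, test, best_k)

for k, score in results.items():
    print(f"k={k}: validation accuracy {score:.3f}")
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

The same pattern extends to comparing different learners: treat the choice of algorithm as one more setting, pick the winner on the validation set, and report the test score only for that winner.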

Cheers

Sam