I am looking into KNIME as an analytical workbench and data mining tool. It looks very promising, and I have built some models already. However, I find it hard to find an efficient way to optimize and choose among different models.
As an example, take a classification problem.
- I have three classification models (algorithms)
- Decision tree
- Logistic regression
- SVM
- I have a dataset split into train, validation and test sets (60/20/20)
My questions are:
- How can I use the train and validation sets to optimize the learner?
- How can I run different versions of the learners (e.g. decision tree with and without pruning, SVM with Poly, HT or RBF kernel)?
- With the validation set I would like to optimize the parameters of the algorithms, such as the minimum number of records per node, gamma, kappa, etc.
- How can I compare the performance of the different models on the different data sets and select my best model?
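To make concrete what I mean, here is a minimal sketch of the procedure in plain Python, outside KNIME. Everything in it is made up for illustration: the toy data, the one-feature "stump" learner, the `min_records` parameter and all function names are assumptions, not KNIME nodes or API. It only shows the pattern I am after: fit each candidate on train, pick the winner on validation, and score that winner once on test.

```python
import random

def make_data(n=200, seed=1):
    """Toy data (hypothetical): one feature x in [0, 10], label 1 if x > 5, 10% label noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = int(x > 5)
        if rng.random() < 0.1:
            y = 1 - y
        rows.append((x, y))
    return rows

def split_60_20_20(rows, seed=0):
    """Shuffle, then split into train/validation/test (60/20/20)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * 60 // 100
    n_val = len(rows) * 20 // 100
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def fit_majority(train):
    """Baseline learner: always predict the most frequent training label."""
    label = int(2 * sum(y for _, y in train) >= len(train))
    return lambda x: label

def fit_stump(train, min_records):
    """Toy stand-in for a tree: one threshold on x, chosen by training accuracy,
    keeping at least min_records training rows on each side of the split."""
    xs = sorted(x for x, _ in train)
    best_model, best_acc = fit_majority(train), -1.0
    for i in range(min_records, len(xs) - min_records):
        t = xs[i]
        model = lambda x, t=t: int(x >= t)
        acc = accuracy(model, train)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model

# Candidate models: different learners and different parameter settings.
candidates = {
    "majority": fit_majority,
    "stump(min_records=1)": lambda tr: fit_stump(tr, 1),
    "stump(min_records=10)": lambda tr: fit_stump(tr, 10),
    "stump(min_records=50)": lambda tr: fit_stump(tr, 50),
}

train, val, test = split_60_20_20(make_data())

# Fit every candidate on train and score it on validation.
models = {name: fit(train) for name, fit in candidates.items()}
scores = {name: accuracy(model, val) for name, model in models.items()}

# Select on validation, then evaluate the winner once on the held-out test set.
best_name = max(scores, key=scores.get)
best_val_acc = scores[best_name]
test_acc = accuracy(models[best_name], test)

for name, acc in scores.items():
    print(f"{name}: validation accuracy {acc:.3f}")
print(f"selected {best_name}, test accuracy {test_acc:.3f}")
```

The test set is touched only once, after the choice has been made, so its accuracy is not biased by the selection step. What I am looking for is the KNIME equivalent of the `candidates` loop and the selection step.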
Doing something like this in SAS EM would require 'just a couple of nodes' (and a lot of Euros :)). I am not sure how to do this in KNIME, or whether it is possible at all.
Any help would be appreciated.