I am looking into KNIME as an analytical workbench and data mining tool. It looks very promising, and I have already built some models. However, I find it hard to find an efficient way to optimize and choose among different models.
As an example, let's look at a classification problem.
I have three classification models (algorithms):
Decision tree
Logistic regression
SVM
I have a dataset split into train, validation and test sets (60/20/20).
My questions are:
How can I use the train and validation sets to optimize the learner?
How can I run different versions of the learners (e.g. decision tree with and without pruning, SVM with a polynomial, HyperTangent (HT) or RBF kernel)?
With the validation set, I would like to optimize the parameters of the algorithms, such as the minimum number of records per node, gamma, kappa, etc.
How can I compare the performance of the different models on the different data sets and select my best model?
Doing something like this in SAS EM would require 'just a couple of nodes' (and a lot of Euros :)). I am not sure how to do this in KNIME, or whether it is possible at all.
You could use loops that iterate over different configurations of your decision tree learner. Or you could manually configure the multiple options and put them in a metanode (this would be my preference if you only have a few different configs).
There are performance-calculating nodes like "Scorer", which produces performance measures (a confusion matrix and accuracy statistics) for classification models. I'm sure there is an equivalent for regression.
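To make the scoring step concrete, here is a minimal sketch in plain Python (not KNIME code) of the kind of measures a scoring step for classification computes: a confusion matrix and the accuracy. The function names and the sample labels are made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of rows where the prediction matches the actual label."""
    if not actual:
        raise ValueError("no rows to score")
    matrix = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)


# Hypothetical labels, for illustration only.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "no", "yes"]

matrix = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
print(matrix)
print(acc)
```

Running the same scoring on the validation predictions of each model gives numbers that can be compared side by side.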
For optimising parameters, look into the parameter optimisation loop nodes in the labs section. They should do what you are after.
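In case it helps to see the idea outside of KNIME, below is a rough sketch in plain Python of what such a parameter loop does: train with each candidate setting on the train set, score on the validation set, keep the best setting, and only then score once on the test set. Everything in it is an assumption for illustration: the data is synthetic, a tiny k-nearest-neighbour classifier stands in for the learners mentioned in the question, and `k` stands in for parameters like gamma or the minimum number of records per node.

```python
import random


def knn_predict(train, point, k):
    """Majority vote among the k training rows closest to `point`."""
    nearest = sorted(
        train,
        key=lambda row: (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2,
    )[:k]
    votes = [label for _, _, label in nearest]
    # sorted() keeps the result deterministic if two labels tie.
    return max(sorted(set(votes)), key=votes.count)


def accuracy(train, rows, k):
    """Share of `rows` classified correctly by k-NN fitted on `train`."""
    hits = sum(knn_predict(train, (x, y), k) == label for x, y, label in rows)
    return hits / len(rows)


# Synthetic two-class data (hypothetical, not from the thread).
rng = random.Random(0)
rows = []
for _ in range(200):
    label = rng.choice([0, 1])
    rows.append((rng.gauss(label * 2.0, 1.0), rng.gauss(label * 2.0, 1.0), label))

# 60/20/20 split into train, validation and test.
train, validation, test = rows[:120], rows[120:160], rows[160:]

# The "parameter optimisation loop": one validation score per candidate value.
param_grid = [1, 3, 5, 7, 9]
validation_scores = {k: accuracy(train, validation, k) for k in param_grid}
best_k = max(param_grid, key=lambda k: validation_scores[k])

# The test set is touched only once, with the chosen parameter.
test_score = accuracy(train, test, best_k)
print(validation_scores, best_k, test_score)
```

The same pattern extends to comparing several algorithms: collect the best validation score per algorithm, pick the winner, and report its test score.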
Dear @swebb,
Sorry for reviving such an old post… But I hope you’re still here and can receive (and answer) this post of mine.
I have a similar situation, comparing the performance of different algorithms, and I wish to apply some loops to find their best configurations. Can you help me with this task? I can easily share with you the anonymized data of my research…
Thanks in advance for any help you can offer…
B.R.,
Rogério.