Hi zizoo, yes, this is why it should work correctly. Loops in KNIME need to be opened and closed in the correct order, e.g. the inner loop must end before the outer loop.
Hi again,
I adjusted the workflow a bit, as shown below. Essentially, I added a row and a column filter to get the average accuracy value from the cross validation. I am not sure which one is correct.
I also found a difference in the result at the optimization loop end.
Hello @zizoo,
what's correct really depends on what you want to achieve.
But I guess your goal is to find the parameters that give you the best model accuracy, in which case your workflow is correct.
Just a quick note: You actually don't need the row and column filter because the Scorer node already outputs the accuracy as a flow variable. Simply connect the Scorer with the Parameter Optimization Loop End via the red flow variable connection and you are good to go.
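For intuition, here is roughly what that optimization loop does, sketched outside KNIME in plain Python (synthetic data, and scikit-learn's RBF SVM with gamma and C as stand-in parameters; this is an illustration of the idea, not the KNIME implementation):

```python
# Sketch of a parameter optimization loop: train and score the model for
# every parameter combination and keep the one with the best accuracy.
# Synthetic data; gamma and C are stand-ins for whatever is being tuned.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=180, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

best_params, best_acc = None, -1.0
for gamma in np.arange(0.1, 1.01, 0.1):
    for C in np.arange(0.1, 2.01, 0.1):
        model = SVC(kernel="rbf", gamma=gamma, C=C).fit(X_train, y_train)
        acc = model.score(X_test, y_test)          # what the Scorer reports
        if acc > best_acc:                         # what the Loop End keeps track of
            best_params, best_acc = (gamma, C), acc
print("Best parameters:", best_params, "accuracy:", best_acc)
```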
Cheers,
nemad
Hi, @nemad,
I got problems with the SVM optimization loop. Can you help me to configure it correctly?
I am working with some COVID data from my state, and I am trying to find out whether one (or more) of three regional epidemiological rates can serve to predict effects on a specific market. My goal is to compare the predictive performance of a few algorithms.
I have 6 possible classes for real metrics collected in a survey, and predictions should also fit in one of the same 6 classes. Input data are stored in this XLSX file:
Help with PpredictionsClassifcations.xlsx (9.7 KB)
I built a workflow as follows:
I applied the RBF kernel to it and added its two parameters: a) sigma; and b) Overlapping penalty.
But it is not working properly. This is the Confusion Matrix I got:
Of course, the only "right" predictions fall in the correct class just by chance. My "real" accuracy is "0%".
I have also tested it manually with a series of different value ranges (from very narrow to very broad), with and without the loop (Parameter Optimization Loop Start / End). Predicted values are classified mostly into just one class, and I can't find a better configuration for this loop and the prediction/classification, which are stored in this XLSX file:
Help with SVM Predictions.xlsx (6.2 KB)
Can you help me with this task? Any help will be greatly appreciated.
Thank you.
B.R.,
Rogério.
P.S.: sorry for reviving such an old post, but no better idea came to my mind.
Hello @rogerius1st,
There are a few things that I don't understand:
- Why the Denormalizer after the Scorer node?
- Which parameters are you optimizing over?
- What is the warning the learner node displays?
- Your ROC Curve only shows the performance of the last iteration of the parameter optimization. Are all other iterations similarly bad?
All in all it is hard to tell what is going wrong from just the snippet you posted. Is there any chance that you could share the workflow?
In the meantime, I'd double-check the normalization, which can easily throw off an SVM. Do you use the same normalization model for both training and testing data, i.e. is the model produced by the Normalizer of the training data applied to the testing data via the Normalizer (Apply) node?
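If not, that is worth fixing first. As a rough illustration of the intended pattern (outside KNIME, with scikit-learn's MinMaxScaler standing in for the Normalizer / Normalizer (Apply) pair and synthetic data):

```python
# Sketch: fit the normalization model on the training data only and
# re-use that exact model for the test data, never refit it there.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=180, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

scaler = MinMaxScaler().fit(X_train)   # "Normalizer" on the training data
X_train_n = scaler.transform(X_train)
X_test_n = scaler.transform(X_test)    # "Normalizer (Apply)": same model, no refit

model = SVC(kernel="rbf").fit(X_train_n, y_train)
print("Test accuracy:", model.score(X_test_n, y_test))
```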
Best regards,
Adrian
Dear Adrian,
Thanks for your help. Indeed, you have already helped me many times, as you have helped many other beginners like myself in other posts. Now, some answers to your questions:
- I had realized that normalization wasn't strictly necessary for the classes (collected in a survey), because they were already between "0" and "1"; however, the three rates are not normalized. I applied the "Normalizer" node before every "Learner" node and the "Denormalizer" node after every "Predictor" node of each algorithm whose performance I am trying to compare (and before the Scorer node), because I read in a post on the KNIME forum that "[…] algorithms predict better if working with normalized data […]". Isn't that so? Indeed, I found no difference with or without the Denormalizer node, whether or not I normalized the real metrics (those collected in the survey) or the predictions. I wish to emphasize that I am (trying to) compare some local COVID epidemiological rates as potential predictors of local market retractions. So it made sense to me to work with normalized data, since these rates weren't normalized beforehand, although the market metrics were.
- I'm trying to optimize over two parameters (since I chose the RBF kernel): a) sigma (from 0.1 to 1.0; step 0.1); and b) Overlapping penalty (from 0.1 to 2.0; step 0.1). After looking at (the new version of) my workflow, do you think other value ranges would be advisable for my situation?
- The learner node's warning is a message about rejecting a few columns, and this is a very good point in your comments. After reading your answer (and trying to address it), I reworked the workflow, applying a "Column Filter" to exclude those additional columns (which I had no intention of using anyway), and got the workflow that follows:
Help with SVM Loop.knwf (42.8 KB)
Previously, I had also tried to apply the SMOTE node (on the training partition, with two, three, and four times over-sampling), but it rendered the same results: accuracy = 14.454%, with (almost) all predictions in the same class.
Then I simply removed the SMOTE node, and it worked! Unexpectedly well, by the way. It seems so much better now that it resembles the result one can get with "overtraining". See the Confusion Matrix below:
I am still using the same input data:
Help with PpredictionsClassifcations.xlsx (15.7 KB)
As it hadn't worked before (and does now), I guess that I had previously neglected the extra columns, which had a harmful effect on the former prediction configurations. Now it's working for me. Thank you once again.
- My ROC Curves are now working with all classes, and their current predictions are functional, notwithstanding the due considerations about my VERY scarce number of rows.
- Did you manage to download (and open in KNIME) the new version of this branch of my workflow? Does it seem (a little) more comprehensible to you?
Thank you once more for all your help.
I wish you all the best.
Bye.
Rogério.
Hello @rogerius1st,
first of all, sorry for the late reply, I was on vacation last week.
Thank you for posting your workflow, that makes it so much easier to figure out what is going on.
Some observations:
- You perform normalization on your full dataset before splitting it into training and testing data. This can skew your model because the testing data affects the normalizer model that you would later have to apply to new data in order to apply the model in practice. Therefore the Normalizer should be used after the Partitioning node and then applied to the testing data via the Normalizer (Apply).
- In your case the best value for Overlapping penalty is the stop value of your parameter search. If you fall on a parameter boundary like this, it might make sense to extend the parameter range to see if larger Overlapping penalty values would yield an even better model.
- In your workflow the best model happens to be the last model to be trained but in general you have to retrain your model once the best parameter combination is found and then evaluate the performance on an independent third dataset.
- You might want to employ cross validation together with parameter optimization to get the most robust results. Here is a workflow that shows how to do this: Parameter Optimization Loop with Cross Validation – KNIME Hub (a rough sketch of this pattern follows after this list).
- The Denormalizer before the Scorer is not needed because the scorer looks at string columns that are not affected by a Normalizer anyway.
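To make the retrain-and-evaluate pattern with cross-validated parameter optimization concrete, here is a rough scikit-learn sketch of the same idea outside KNIME (gamma and C stand in loosely for sigma and the Overlapping penalty, the ranges are illustrative, and the data is synthetic):

```python
# Sketch: cross-validated parameter search on the training split only,
# then retrain with the best parameters and evaluate exactly once on the
# held-out test split. Ranges and stand-in parameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=180, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Normalization lives inside the pipeline, so every CV fold fits it on its
# own training part only (the Normalizer / Normalizer (Apply) idea).
pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))
param_grid = {
    "svc__gamma": np.arange(0.1, 1.01, 0.1),  # stand-in for sigma
    "svc__C": np.arange(0.1, 2.01, 0.1),      # stand-in for the overlapping penalty
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)                  # parameter optimization + cross validation

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
# refit=True (the default) retrains on the full training split with the best
# parameters, so the test split is only touched once, here:
print("Held-out test accuracy:", search.score(X_test, y_test))
```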
Cheers,
Adrian
Hi again, @nemad.
Sorry for my late reply. And thanks a lot for the enlightenment. I suppose I haven't learned (yet) how to use most of KNIME's nodes properly.
- I moved the "Normalizer" nodes from before to after the "Partitioning" node, applying them only to the training data, added the "Normalizer (Apply)" node to the test data (feeding its blue square input with the model from the Normalizer's blue square output), and removed the "Denormalizer" as well, as you suggested.
- About the Overlapping penalty values you mentioned, I wasn't sure what a good range would be. I simply tried to double the "neutral" value of "1" (i.e., I applied the interval [0.01; 2.0] with a step size of 0.01); the best parameter came out as 1.79 for the penalty.
- Afterward, I increased the penalty's upper value to "3" (although I still don't know whether it makes real sense to test such a value, which I suppose is high), and the best parameters for accuracy didn't change very much: 2.22 for the penalty and 0.12 for sigma. This is what I got in a "Surface Plot" node (I show two views, a front view and a bottom-up view, to make it easier to see how this surface appears):
and
I thought it might mean: lower sigmas and lower penalties lead to higher accuracies. Am I right?
This is my new Confusion Matrix:
- By the way: those optimized parameters generated the maximized accuracy of 76.34% in the Confusion Matrix, but on the same run I got the value of 0.782 when clicking on "Best Parameters" in the Parameter Optimization Loop End node. Shouldn't they have the same value? Otherwise, which one should I use?
- Regarding your suggestion of testing the performance on a third, independent dataset: unfortunately, I don't have one. My total dataset is very small, 182 instances (rows). Setting the partition to 70% gives me 127 rows for training and 55 for prediction. But I have no additional data for validating my models (data different from training and testing).
- I tried to adapt @paolotamag's example (for cross validation) that you had linked, though I think I need some help with this adaptation (or translation). Would you help?
My workflow became like this:
Or here is the shareable file of the modified workflow:
Help (3) with Optimization loops on SVM.knwf (106.3 KB)
Thanks for any help you can lend me.
B.R., Rogério
Hello Rogério,
the surface plot is a great idea to visualize the optimization!
However, there might be a bug in it because the accuracy should only take on values between 0 and 1.
Those optimized parameters generated the maximized accuracy of 76.34% in the Confusion Matrix, but on the same run I got the value of 0.782 when clicking on "Best Parameters" in the Parameter Optimization Loop End node. Shouldn't they have the same value? Otherwise, which one should I use?
That is indeed odd. Can you perhaps send me the workflow that produced this incoherence, so that I can figure out if there is a bug on our end?
But I have no additional data for validating my models (data different from training and testing).
That's where cross validation can help you. You would split your data with the Partitioning node to get a training and a test set. Then you use cross validation in combination with the Parameter Optimization loop to find the best parameters, train your final model on the full training data, and apply it to your test data to get a less biased estimate of your model's performance on unseen data.
I tried to adapt @paolotamag's example (for cross validation) that you had linked, though I think I need some help with this adaptation (or translation). Would you help?
In your case you will want to replace the ROC curve with the Scorer to calculate the accuracy and collect it with a Variable Loop End node instead of the Loop End node that Paolo uses in the example.
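For intuition, this is what that inner cross-validation block computes for a single parameter combination, sketched with scikit-learn (synthetic data; gamma and C are illustrative stand-ins): one accuracy per fold, whose mean is the value the surrounding parameter optimization loop then maximizes.

```python
# Sketch: the per-fold accuracies the Scorer would report inside the
# cross-validation block, and their mean, which is the single number the
# parameter optimization loop collects for this parameter combination.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=180, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
fold_acc = cross_val_score(SVC(kernel="rbf", gamma=0.1, C=1.0),
                           X, y, cv=5, scoring="accuracy")
print("Accuracy per fold:", fold_acc)
print("Mean accuracy:", fold_acc.mean())
```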
Cheers,
Adrian
Hello there @rogerius1st
Did you check out the Parameter Optimization (Table) component by @k10shetty1?
It gives you drag-and-drop functionality and lets you perform parameter optimization with minimal configuration.
The component's settings expose just the right level of detail.
Here is an example workflow on classification: Parameter Optimization (Table) Component on Random Forest – KNIME Hub
Here is an example on regression: Parameter Optimization (Table) Component on MLP – KNIME Hub
For more examples and explanations, take a look at the following resources.
Keerthan is an expert on this topic and co-author of the blog post:
and of this example space:
Dear Keerthan @k10shetty1
Thanks for all your support. I (tried to) study all the material available at the links you sent, and tried to apply that "Component" to my situation. I haven't got good results yet, probably because I did not properly understand them. I tried this Component for comparing the performance of five classification algorithms, and I received "error messages" from most of them:
- k-NN:
- MLP:
- PNN:
- SVM and Naïve Bayes: no warning messages, but (apparently) stuck in an endless loop.
Although I have tried a series of parameters and configurations (since your post, and throughout the whole past week), I think I still need help (a lot of help!) with this. Below I send you my starting material:
a) an XLSX file with 3 rates which may supposedly be related to 1 business metric;
Help with PpredictionsClassifcations.xlsx (8.1 KB)
b) the workflow on which I tried to apply your suggestion with the cited "Component".
KNWF_to_Paolo_Keerthan_3.knwf (847.6 KB)
If you can help me, it will be greatly appreciated.
Thank you
B.R.,
Rogério.
Hello there, @rogerius1st.
I ran your workflow and would make the following changes:
- SVM and Naïve Bayes: the "Brute Force" strategy tries every possible parameter combination within the start and end ranges. Given that you have 200 values for the "Overlapping penalty" parameter and 100 values for the "sigma" parameter, that is 200 × 100 = 20,000 combinations, so trying everything would take a long time. I would recommend that you either increase the step size or use a different strategy in the component configuration (see the random-search sketch after this list).
- PNN and MLP: please update the variable values in your Learner node's "Flow Variables" section within the captured segment.
- KNN: the Parameter Optimization component works with a model that includes a Learner and a Predictor node. Because this is not the case with KNN, please use the Counting Loop Start node instead.
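To illustrate the alternative-strategy suggestion for SVM and Naïve Bayes, here is a minimal sketch (scikit-learn, outside KNIME, with synthetic data and illustrative ranges) of a random search that samples a fixed budget of combinations instead of the full 200 × 100 grid:

```python
# Sketch: random search evaluates a fixed budget of parameter combinations
# (here 50) instead of all 200 x 100 = 20,000 grid points.
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=180, n_features=3, n_informative=3,
                           n_redundant=0, n_classes=6, n_clusters_per_class=1,
                           random_state=0)
param_distributions = {
    "C": uniform(0.01, 2.0),      # stand-in for the overlapping penalty range
    "gamma": uniform(0.01, 1.0),  # stand-in for the sigma range
}
search = RandomizedSearchCV(SVC(kernel="rbf"), param_distributions,
                            n_iter=50, cv=5, scoring="accuracy", random_state=0)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```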
Please see the attached workflow for updates.
KNWF_to_Paolo_Keerthan.knwf (1.9 MB)
Thanks for sharing @paolotamag .
Why is the bottom workflow captured? I do not understand the reason for this. Can you elaborate?
br
The captured workflow at the bottom is the model (Learner and Predictor) that you would like to optimize.
Instead of building everything from scratch, you only need to capture the Learner and Predictor, configure the Learner's flow variables to be optimized, and connect the component together with the data and parameter settings.
Did you read the blog posts? It is well explained at the end.
Probably not in enough detail; I just rushed through it quickly. To me it looks like the optimization is learned in the "train full model" part, so I assume this is the part which needs to be captured.
br
Dear Keerthan,
Thanks for your direct (and detailed) answer, thanks for your comments, and special thanks for the (uphill) task you probably faced while working on my KNWF.
I updated my workflow based on your suggestions (increasing the step size where possible), as far as I could understand the specifics of your changes. I am sending you (attached) my new version of that workflow.
WF_answers_to_Keerthan 1.knwf (1.9 MB)
Still, I feel compelled to discuss with you a few things I couldn't work out:
- (On your original answer:) the component presents only 10 results in the Parameter Optimization Loop End – All parameters view. How can I increase this number?
a) on SVM, I've increased the step sizes, as you suggested, changing them to:
and thus I got a low "best accuracy" (of 0.2835) in the "Best parameters" right-click option of the component:
Opening the Component, I also saw that almost all SVM predictions (except for 4 of the 55 instances) fell into the same class (= "0.1"). So I guess the real accuracy was "0", because some original instances belonged to that class, and the few "right" predictions coincided with it just by chance. What could I do now to fix that?
b) on k-NN (testing k values in the interval [2; 20]), the accuracy attained was 0.273 (for both 2 and 3 nearest neighbors). After these results, I applied the "Line Plot" node, which rendered this graph:
According to the elbow method, k = 2 to 3 (the first "elbow") would be the best values (I chose k = 3). I also tried to test (and compare with) almost the same setup, but with no loops and with k = 3, which rendered the following Confusion Matrix:

The classification in this matrix was somewhat strange. The data seem to be fully dispersed across it, which suggests that the "right" classifications were generated just by chance. Does it look the same to you? If so, what could I do to reduce this dispersion?
c) on Naïve Bayes (NB), (almost) everything was different.
That happened while I was trying to follow your suggestions. I increased the step size according to the table below:
Thus, my "best results" were remarkably different from the ones I got before:
Then I opened the Component and selected "All parameters" (on the "Parameter Optimization Loop End"), and I saw that the accuracies were equal (= 0.709) for all 10 parameter sets. Are the Naïve Bayes results really so different from the former ones?
d) on MLP, I applied the following (as you suggested):
And got these results (= 0.7717), which are somewhat similar to NB's, but remarkably different from the remaining algorithms:
Would you mind helping me understand such large differences?
e) on PNN (as in your original answer, using "Minimum standard deviation" as "Theta Minus" and "Threshold Standard Deviation" as "Theta Plus"), I also increased the step size, as follows:
And got these results:
The PNN's accuracy (= 0.299) is somewhat close to SVM's and to k-NN's results, but once again very different from NB's and MLP's.
Can you enlighten me about what is happening (or should have happened)?
Thanks again for all your help.
B.R.,
Rogério.
Dear Keerthan,
There are a few things I wish to ask you that I forgot to include in yesterday's reply:
- On (c) (Naïve Bayes), there was a larger difference between the accuracy generated in the Component's "Best parameters" results:
and the accuracy calculated directly when I apply only the same "best parameters" (i.e., with no loops) to the Learner node, in a "manual check":
- On the SVM, there was a similar difference, but this time much smaller:
0.2834 (on "Best parameters") versus 0.2546 (on a manual check with the same parameters).
- On the MLP, this difference was:
0.7165 (on "Best parameters") versus 0.5454 (on a manual check with the same parameters).
- And on the PNN, the situation was worse:
0.2992 (on "Best parameters"), but with "Minimum standard deviation" (the Theta Minus) = 0.976, versus the "Threshold standard deviation" (which was expected to be the Theta Plus, but is higher than the former, and so is not applicable). How could this be so?
Would you mind helping me to understand the origin of these differences?
Thanks once again for all your effort in this "Evangelism task".
B.R.,
Rogério.
Hi @rogerius1st ,
@k10shetty1 is on vacation; that is why he is not answering.
Please however consider this:
The framework we put in place is really general, but the results you get mostly depend on:
- Your data, that is, the distribution of the features in the sample you are training the model with. The sample, however big (and yours is really small), is just an approximation of the reality it describes, of which we are not experts and which we cannot help you with.
- The model trained. Each model you have trained has its own training procedure, and those parameters control that training. To answer your questions, we would need to recall the details of each model's training and go through the exercise of understanding what each of these parameters means.
You did not select one model, you selected many, and this multiplies the effort of understanding the various algorithms and what each of their parameters controls. Moreover, each model might need a different kind of data preparation.
To help you we would need to:
- Understand your data.
- Recall how the models are trained and what each parameter means.
Please notice, however, that you are asking us to perform a gigantic task.
This task would not even work if the results you are getting are simply due to chance, as they are highly dependent on how you partition into train and test.
The approach we can provide guidance on is a bit more basic, but more in line with the work of many business data scientists in this domain: try and try and try, then pick the combination which gives the best results (after proper validation), and explain the best-performing model with XAI or by analyzing its structure (for example, a decision tree or logistic regression is interpretable after training).
When no attempt increases the performance, this usually has nothing to do with a single step (such as parameter optimization), but rather with something more fundamental, like how the data was collected and whether the task at hand is even feasible.
In your case you can achieve good performance even with little data if you select the right model, rather than by performing parameter optimization. It seems to me that without proper data preparation many of the models you selected won't train, no matter what parameters you adopt.
Given this, please consider understanding (on your own) what each of those parameters means, and read papers on how they should be optimized and/or how the data should be prepared (for example, an MLP requires normalization, and so on).
If you can't do the above on your own, we recommend the AutoML components (blog post), which do all these steps automatically for you, optimizing as much as possible within a reasonable amount of time.
Given all of that:
- I downloaded your workflow.
- I used the same Partitioning node for all models (that is the very least we can do before comparing performance; even if you are using the same seed and settings in every Partitioning node, it would be a pain to control and change them in all their occurrences).
- I added the AutoML component and compared its output to yours.
- The best result I could find like this is: Gradient Boosted Trees with an accuracy of 100% on the validation set!
The main issue here is that you have really little data: 180ish rows!
With this little data you want to do so much! Too much!
Partitioning first for a global train and test split.
Then, on the training set (127 rows), partitioning again for parameter optimization with cross validation.
Then, on the test set (55 rows), getting statistically significant performance metrics.
Also why did you pick those precise models? Was this an assignment?
On the other hand, the performance is measured with accuracy on a multiclass string target that was binned from a numerical target.
Did you consider doing a regression from the start? Did you consider reducing the number of classes to two?
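For illustration only, here is a minimal sketch (scikit-learn, with synthetic data standing in for the real table and an RBF support vector regressor chosen arbitrarily) of what the regression formulation could look like, scored with mean absolute error instead of multiclass accuracy:

```python
# Sketch: keep the target numeric and treat it as regression instead of
# binning it into classes; evaluate with mean absolute error per CV fold.
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

X, y = make_regression(n_samples=180, n_features=3, noise=10.0, random_state=0)
model = make_pipeline(MinMaxScaler(), SVR(kernel="rbf"))
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("MAE per fold:", mae)
print("Average MAE:", mae.mean())
```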
I added the AutoML Regression to the workflow too.
Open the View of the 3 components in the workflow to inspect results.
I am having a bit of a déjà vu here, since I have tried before to build a workflow comparing several multiclass models with seemingly similar data, and we also discussed the structure and quality of the data:
My impression then was that it had more to do with the task and also the target. I suggested that formulating it as a regression task might help, but we never managed to finish that conversation. I wonder whether, given more data, an SVM might perform better on a regression task.
A support vector classifier came out on top the last time around, although the model overall was not very good.
Another relevant link might be this one.