Optimisation of SVM parameters

Hi zizoo, yes, this is why it should work correctly. Loops in KNIME need to be opened and closed in the correct order, e.g. the inner loop must end before the outer loop.

Hi again,
I adjusted the workflow a bit, as shown below. I basically added a row filter and a column filter to extract the average accuracy from the cross validation. I am not sure which one is correct.
I found a difference in the result of the optimization loop end.

Hello @zizoo,

what’s correct really depends on what you want to achieve.
But I guess your goal is to find the parameters that give you optimal accuracy of your model, in which case your workflow is correct.
Just a quick note: You actually don’t need the row and column filter because the Scorer node already outputs the accuracy as flow variable. Simply connect the Scorer with the Parameter Optimization Loop End via the red flow variable connection and you are good to go.

Cheers,

nemad

Hi, @nemad,
I am having problems with the SVM optimization loop. Can you help me configure it correctly?
I am working with some COVID data from my state, and I am trying to find out whether one (or more) of three regional epidemiological rates can serve to predict effects on a specific market. My goal is to compare the predictive performance of a few algorithms.
I have 6 possible classes for real metrics collected in a survey, and predictions should also fit in one of the same 6 classes. Input data are stored in this XLSX file:
Help with PpredictionsClassifcations.xlsx (9.7 KB)
I built a workflow as follows:

I applied the RBF kernel to it and added its two parameters: a) sigma and b) overlapping penalty.
But it is not working properly. This is the Confusion Matrix I got:
image
Of course, the only “right” predictions fall into the correct class just by chance, so my “real” accuracy is effectively 0%.
I have also tested it manually with a series of different value ranges (from very narrow to very broad), with and without the loop (Parameter Optimization Loop Start / End). The predicted values fall chiefly into just one class, and I can’t find a better configuration for this loop. The predictions/classifications are stored in this XLSX file:
Help with SVM Predictions.xlsx (6.2 KB)
Can you help me with this task? Any help will be greatly appreciated.
Thank you.
B.R.,
Rogério.
P.S.: sorry for reviving such an old post, but no better idea came to my mind.

Hello @rogerius1st,

There are a few things that I don’t understand:

  • Why the Denormalizer after the Scorer node?
  • Which parameters are you optimizing over?
  • What is the warning the learner node displays?
  • Your ROC Curve only shows the performance of the last iteration of the parameter optimization. Are all other iterations similarly bad?

All in all it is hard to tell what is going wrong from just the snippet you posted. Is there any chance that you could share the workflow?

In the meantime I’d double check the normalization, which can easily throw off an SVM. Do you use the same normalization model for both training and testing data, i.e. is the model produced by the Normalizer on the training data applied to the testing data via the Normalizer (Apply) node?
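To illustrate the idea outside of KNIME, here is a purely illustrative scikit-learn sketch (not your workflow and not the KNIME nodes themselves): the normalization model is fit on the training split only and then re-used unchanged on the test split, which is what the Normalizer / Normalizer (Apply) pair does.

```python
# Rough scikit-learn analogy of Normalizer -> Normalizer (Apply):
# the scaler is fit on the training data only and then re-used on the test data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # stand-in for the three epidemiological rates
y = rng.integers(0, 6, size=200)     # stand-in for the six classes

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = MinMaxScaler().fit(X_train)   # "Normalizer": learns min/max from the training data only
X_train_n = scaler.transform(X_train)
X_test_n = scaler.transform(X_test)    # "Normalizer (Apply)": same model, no re-fitting on test data

model = SVC(kernel="rbf").fit(X_train_n, y_train)
print("Test accuracy:", model.score(X_test_n, y_test))
```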

Best regards,
Adrian


Dear Adrian,
Thanks for your help. Indeed, you have already helped me many times, just as you have helped many other beginners like me in other posts. Now, some answers to your questions:

  1. I had previously realized that normalization wasn’t strictly necessary for the classes (which I had collected in a survey), because they were already between “0” and “1”; the three rates, however, are not normalized. I applied the “Normalizer” node before every “Learner” node and the “Denormalizer” node after every “Predictor” node (and before the Scorer node) of each algorithm whose performance I am trying to compare, because I read (in a post on the KNIME forum) that “[…] algorithms predict better if working with normalized data […]”. Isn’t that so? Indeed, I found no difference with or without the Denormalizer node, whether or not I normalized the real metrics (those collected in the survey) or the predictions. I want to emphasize that I am (trying to) compare some local COVID epidemiological rates as potential predictors of local market retractions. So it made sense to me to work with normalized data, since these rates weren’t normalized, although the market metrics were.

  2. I’m trying to optimize over two parameters (as I chose the RBF kernel): a) sigma (from 0.1 to 1.0, step 0.1) and b) overlapping penalty (from 0.1 to 2, step 0.1). After looking at (the new version of) my workflow, do you think other ranges of values would be advisable for my situation?

  3. The learner node displays a warning that a few columns were rejected. And this is a very good point in your comments. After reading your answer (and trying to answer it), I rewrote the workflow, applying a “Column Filter” to exclude those additional columns (which I had no intention of using from then on), and I got the workflow that follows:
    Help with SVM Loop.knwf (42.8 KB)
    Previously, I had also tried to apply the SMOTE node (on the training partition, with two, three, and four times oversampling), but it gave the same results: accuracy = 14.454%, and (almost) all the predictions in the same class.
    But then I simply removed the SMOTE node, and it worked! Unexpectedly well, by the way; it now looks so much better that it resembles the result one can get with “overtraining”. See the Confusion Matrix below:
    image
    I am still using the same input data:
    Help with PpredictionsClassifcations.xlsx (15.7 KB)
    As it hadn’t worked before (and does now), I guess I had previously neglected the extra columns, which had a harmful effect on the earlier prediction configurations. Now it’s working for me. Thank you once again.

  4. My ROC curves now work with all classes, and the current predictions are functional, notwithstanding the caveats related to my VERY small number of rows.

  5. Did you manage to download (and open in KNIME) the new version of this branch of my workflow? Does it seem (a little) more comprehensible to you?

Thank you once more for all your help.
I wish you all the best.
Bye.
Rogério.

Hello @rogerius1st,
first of all, sorry for the late reply, I was on vacation last week.
Thank you for posting your workflow, that makes it so much easier to figure out what is going on.

Some observations:

  • You perform normalization on your full dataset before splitting it into training and testing data. This can skew your model because the testing data influences the normalizer model that you would later have to apply to new data when using the model in practice. Therefore, the Normalizer should be used after the Partitioning node and then applied to the testing data via the Normalizer (Apply) node.
  • In your case the best value for Overlapping penalty is the stop value of your parameter search. If the optimum falls on a parameter boundary like this, it makes sense to extend the parameter range to see whether larger Overlapping penalty values would yield an even better model (see the sketch after this list).
  • In your workflow the best model happens to be the last model trained, but in general you have to retrain your model once the best parameter combination is found and then evaluate its performance on an independent third dataset.
  • You might want to employ cross validation together with parameter optimization to get the most robust results. Here is a workflow that shows how to do this: Parameter Optimization Loop with Cross Validation – KNIME Hub
  • The Denormalizer before the Scorer is not needed because the scorer looks at string columns that are not affected by a Normalizer anyway.
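To make the boundary point more concrete, here is a purely illustrative scikit-learn sketch (not the KNIME loop itself), with C playing the role of the Overlapping penalty and synthetic data standing in for yours: if the best penalty lands on the upper end of the searched range, widen the range and search again.

```python
# Purely illustrative "best value on the boundary" check with a grid search.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

c_grid = np.arange(0.1, 2.01, 0.1)                   # initial range for the penalty
search = GridSearchCV(SVC(kernel="rbf", gamma=0.5), {"C": c_grid}, cv=5).fit(X, y)
best_c = search.best_params_["C"]

if np.isclose(best_c, c_grid.max()):
    # The best value sits on the upper boundary of the searched range:
    # widen the range and search again to see if a larger penalty does even better.
    wider_grid = np.arange(2.0, 6.01, 0.2)
    search = GridSearchCV(SVC(kernel="rbf", gamma=0.5), {"C": wider_grid}, cv=5).fit(X, y)
    best_c = search.best_params_["C"]

print("Best penalty (C):", best_c, "with CV accuracy", round(search.best_score_, 3))
```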

Cheers,
Adrian


Hi again, @nemad.
Sorry for my late reply, and thanks a lot for the enlightenment. I suppose I haven’t yet learned how to use most of KNIME’s nodes properly.

  1. I moved the “Normalizer” nodes from before to after the “Partitioning” node, applying them only to the training data, added a “Normalizer (Apply)” for the test data (feeding its blue model input with the model output of the Normalizer), and removed the “Denormalizer” as well, as you suggested.
  2. Regarding the Overlapping penalty values you mentioned, I wasn’t sure what a good range would be. I simply tried doubling the “neutral” value of 1 (i.e., I applied the interval [0.01; 2.0] with a step size of 0.01), and the results gave a best penalty of 1.79.
  3. Afterward, I increased the upper penalty value to 3 (although I still don’t know whether it makes sense to test such a high value), and the best parameters for accuracy didn’t change very much: 2.22 for the penalty and 0.12 for sigma. This is what I got in a “Surface Plot” node (I show two views, a front view and a bottom-up view, to make it easier to see how the surface looks):

    and

    I thought it might mean: lower sigmas and lower penalties lead to higher accuracies. Am I right?
    This is my new Confusion Matrix:
    image
  4. By the way: those optimized parameters produced the maximum accuracy of 76.34% in the Confusion Matrix, but on the same run I got a value of 0.782 when clicking on “Best Parameters” of the Parameter Optimization Loop End node. Shouldn’t they have the same value? Otherwise, which one should I use?
  5. Regarding your suggestion of testing the performance on a third, independent dataset: unfortunately, I don’t have one. My total dataset is very small, with 182 instances (rows). Setting the partitioning to 70% gives me 127 rows for training and 55 for prediction, but I have no additional data for validating my models (different from the training and testing data).
  6. I tried to adapt @paolotamag’s example (for cross validation) that you linked, though I think I need some help with this adaptation (or translation). Would you help?
    My workflow became like this:

    Or the sharable link of the modified workflow:
    Help (3) with Optimization loops on SVM.knwf (106.3 KB)

Thanks for any help you can lend me.
B.R., Rogério

Hello Rogério,

the surface plot is a great idea to visualize the optimization!
However, there might be a bug in it, because the accuracy should only take values between 0 and 1.
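If it helps, a purely illustrative scikit-learn/matplotlib sketch of such an accuracy surface over (sigma, penalty) could look like this; note that scikit-learn parametrizes the RBF kernel with gamma, which only roughly corresponds to sigma (gamma = 1 / (2 * sigma²)), and the data below are synthetic stand-ins.

```python
# Illustrative accuracy "surface" over a (sigma, penalty) grid, analogous to feeding
# "All parameters" from the Parameter Optimization Loop End into a Surface Plot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=182, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

sigmas = np.arange(0.1, 1.01, 0.1)        # 10 sigma values
penalties = np.arange(0.1, 3.01, 0.1)     # 30 penalty (C) values
grid = {"C": penalties, "gamma": 1.0 / (2.0 * sigmas ** 2)}

search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5).fit(X, y)
# The grid is expanded with C varying slowest and gamma fastest,
# so the scores reshape into a (penalty, sigma) matrix.
acc = search.cv_results_["mean_test_score"].reshape(len(penalties), len(sigmas))

plt.pcolormesh(sigmas, penalties, acc, shading="auto")
plt.xlabel("sigma")
plt.ylabel("Overlapping penalty (C)")
plt.colorbar(label="cross-validated accuracy")   # always between 0 and 1
plt.show()
```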

Those optimized parameters produced the maximum accuracy of 76.34% in the Confusion Matrix, but on the same run I got a value of 0.782 when clicking on “Best Parameters” of the Parameter Optimization Loop End node. Shouldn’t they have the same value? Otherwise, which one should I use?

That is indeed odd. Can you perhaps send me the workflow that produced this incoherence, so that I can figure out if there is a bug on our end?

I have no additional data for validating my models (different from the training and testing data).

That’s where cross validation can help you. You would split your data with the Partitioning node to get a training and a test set. Then you use cross validation in combination with the Parameter Optimization loop to find the best parameters, train your final model on the full training data, and apply it to your test data to get a less biased estimate of your model’s performance on unseen data.
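Just as an analogy for the procedure (a purely illustrative scikit-learn sketch, not the KNIME nodes themselves), the whole pattern looks roughly like this:

```python
# 1) hold out a test set, 2) run the parameter search with cross validation on the
# training data only, 3) refit the best model on the full training data,
# 4) score it once on the untouched test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=182, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)

pipe = make_pipeline(MinMaxScaler(), SVC(kernel="rbf"))      # normalization stays inside the CV
param_grid = {"svc__C": np.arange(0.1, 3.01, 0.1),           # "Overlapping penalty"
              "svc__gamma": np.arange(0.1, 1.01, 0.1)}       # stands in for sigma (not identical)

search = GridSearchCV(pipe, param_grid, cv=5)                # refits the best model on all training rows
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy (training data):", round(search.best_score_, 3))
print("Accuracy on the held-out test set:", round(search.score(X_test, y_test), 3))
```

Note that the cross-validated accuracy on the training data and the accuracy on the held-out test set are generally not identical, which is also why two different accuracy numbers from two different evaluation points should not be expected to match exactly.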

I tried to adapt @paolotamag’s example (for cross validation) that you linked, though I think I need some help with this adaptation (or translation). Would you help?

In your case you will want to replace the ROC curve with the Scorer to calculate the accuracy and collect it with a Variable Loop End node instead of the Loop End node that Paolo uses in the example.

Cheers,

Adrian


Hello there @rogerius1st
Have you checked out the Parameter Optimization (Table) component by @k10shetty1?

It gives you drag-and-drop functionality for performing parameter optimization with minimal configuration.

The component has settings that expose you to the right level of detail.

Here is an example workflow on classification: Parameter Optimization (Table) Component on Random Forest – KNIME Hub

Here is an example on regression: Parameter Optimization (Table) Component on MLP – KNIME Hub

For more examples and explanations, take a look at the following resources.

Keerthan is an expert on this topic and co-author of the blog post:

and of this example space:


Dear Keerthan @k10shetty1
Thanks for all your support. I (tried to) study all the material in the links you sent and tried to apply that Component to my situation. I haven’t gotten good results yet, probably because I don’t understand it properly. I tried the Component to compare the performance of five classification algorithms and received error messages for most of them:
  • k-NN:
    image

  • MLP:
    image
  • PNN:
    image
  • SVM and Naïve Bayes: no warning messages, but (apparently) stuck in an endless loop.
    Although I have tried (since your post, and throughout the past week) a series of parameters and configurations, I think I still need help (a lot of help!) with this. Below I send you my starting material:
    a) an XLSX file with 3 rates which are supposedly related to 1 business metric;
    Help with PpredictionsClassifcations.xlsx (8.1 KB)
    b) the workflow on which I tried to apply your suggestion with the cited ‘Component’.
    KNWF_to_Paolo_Keerthan_3.knwf (847.6 KB)

If you can help me, it will be greatly appreciated.
Thank you
B.R.,
Rogério.

Hello there, @rogerius1st.

I ran your workflow and would make the following changes:

  • SVM and Naïve Bayes - The ‘Brute Force’ strategy tries every possible parameter combination between the start and end values. Given that you have 200 values for the ‘Overlapping penalty’ parameter and 100 values for the ‘sigma’ parameter, trying everything would take a long time. I would recommend that you either increase the step size or use a different strategy in the component configuration (see the sketch after this list).

  • Please update variable values in your Learner node’s ‘Flow Variable’ section for PNN and MLP, within the capture segment.

  • For k-NN - The Parameter Optimization component works with a model consisting of a Learner and a Predictor node. Because this is not the case with k-NN, please use the Counting Loop Start node instead.
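As a rough illustration of the grid size and of a random-search alternative (a scikit-learn sketch on synthetic data, not the component itself):

```python
# Brute force would need 200 x 100 = 20,000 trained models; a random search
# samples a fixed number of combinations from the same ranges instead.
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

penalties = np.arange(0.01, 2.001, 0.01)   # 200 Overlapping-penalty values
sigmas = np.arange(0.01, 1.001, 0.01)      # 100 sigma values
print("Brute-force combinations:", len(penalties) * len(sigmas))   # 20,000

X, y = make_classification(n_samples=200, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

search = RandomizedSearchCV(SVC(kernel="rbf"),
                            {"C": uniform(0.01, 1.99),       # samples C in [0.01, 2.0]
                             "gamma": uniform(0.01, 0.99)},  # stands in for sigma
                            n_iter=50, cv=5, random_state=0)
search.fit(X, y)
print("Best of 50 random combinations:", search.best_params_)
```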

Please see the attached workflow for updates.
KNWF_to_Paolo_Keerthan.knwf (1.9 MB)


Thanks for sharing @paolotamag .
Why is the bottom workflow captured? I do not understand the reason for this. Can you elaborate?
br

The captured workflow at the bottom is the learner and predictor of the model that you would like to optimize.
Instead of building it from scratch, you need to capture only the learner and predictor, configure the learner flow variables to be optimized, and connect the component with the data and the parameter settings.
Did you read the blog posts? It is well explained at the end.


Probably not in enough detail; I just rushed through it quickly. To me it looks like the optimization is learned in the “train full model” part, so I assume this is the part that needs to be captured.
br


Dear Keerthan,
Thanks for your direct (and detailed) answer, for your comments, and special thanks for the (uphill) task you probably faced while working on my KNWF.

I updated my workflow based on your suggestions (where possible, increasing the step size), as far as I could understand the specifics of your changes. I am sending you (attached) my new version of that workflow.
WF_answers_to_Keerthan 1.knwf (1.9 MB)

Still, I feel compelled to raise a few things I couldn’t work out:

  1. (regarding your original answer) the component presents only 10 results in the Parameter Optimization Loop End → All parameters view. How can I increase this number?
  2. on SVM, I increased the step size as you suggested, changing the settings to:
    image
    and thus I got a low “best accuracy” (of 0.2835) in the ‘Best parameters’ right-click option of the component:
    image

Opening the Component, I also saw that almost all SVM predictions (all but 4 of the 55 instances) fell into the same class (“0.1”). So I guess the real accuracy was practically 0, because some original instances happened to belong to that class, and a few predictions matched it just by chance. What could I do to fix that?

b) on k-NN (testing k values in the interval [2; 20]), the accuracy attained was 0.273 (for both 2 and 3 nearest neighbors). After these results, I applied the ‘Line Plot’ node, which produced this graph:

According to the elbow method, k = 2 to 3 (the first “elbow”) would be the best values (I chose k = 3). I also tested (almost) the same setup for comparison, but with no loops and with k = 3, which gave me the following Confusion Matrix:
image
The classification in this matrix is somewhat strange. The data seem to be fully dispersed across it, which suggests that the “right” classifications were generated just by chance. Does it look the same to you? If so, what could I do to reduce this dispersion?
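(For reference, the k sweep I described corresponds roughly to the following scikit-learn sketch; it is purely illustrative, with synthetic data standing in for my spreadsheet.)

```python
# Evaluate k-NN for k = 2..20 with cross validation and look for the "elbow" in the curve.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=182, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

ks = list(range(2, 21))
acc = [cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean() for k in ks]

plt.plot(ks, acc, marker="o")
plt.xlabel("k (number of nearest neighbors)")
plt.ylabel("cross-validated accuracy")
plt.show()
```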

c) on Naïve Bayes (NB), (almost) everything was different.
That happened when I was trying to follow your suggestions. I increased the step size according to the table below:
image

Thus, my “best results” were remarkably different from the ones I got before:
image

Then I opened the Component and selected “All parameters” (on the “Parameter Optimization Loop End”), and I saw that the accuracies were equal (0.709) for all 10 parameter sets. Are the Naïve Bayes results really so different from the former ones?

d) on MLP, I applied the following (as you suggested):
image

And got these results (0.7717), which are somewhat similar to NB’s, but remarkably different from the remaining algorithms:
image
Would you mind helping me understand such large differences?

e) on PNN (as in your original answer, using ‘Minimum standard deviation’ as ‘Theta Minus’ and ‘Threshold Standard Deviation’ as ‘Theta Plus’), I also increased the step size, as follows:
image
And got these results:
image

The PNN’s accuracy (= 0.299) is somewhat close to SVM’s and to k-NN’s results, but once again very different from NB’s and MLP’s.

Can you enlighten me about what is happening (or should have happened)?

Thanks again for all your help.
B.R.,
Rogério.

Dear Keerthan,
There are a few things I meant to ask you but forgot to include in yesterday’s reply:

  • on item (c) (Naïve Bayes), there was a larger difference between the accuracy shown in the Component’s “Best parameters” results:
    image
    and the accuracy calculated directly when I apply the same “best parameters” (i.e., with no loops) in the Learner node, as a “manual check”:
    image

  • On the SVM, there was a similar difference, but this time much smaller:
    0.2834 (on “Best parameters”) versus 0.2546 (on a manual check with the same parameters).

  • On the MLP, this difference was:
    0.7165 (on “Best parameters”) versus 0.5454 (on a manual check with the same parameters).

  • And on the PNN, the situation was worse:
    0.2992 (on “Best parameters”), but with the “Minimum standard deviation” (the Theta Minus) at 0.976 versus the “Threshold standard deviation” (which was expected to be the Theta Plus, but it is higher than the former, and so it is not applicable). How could this be so?
    image

Would you mind helping me to understand the origin of these differences?

Thanks once again for all your effort in this “Evangelism task”.
B.R.,
Rogério.

Hi @rogerius1st ,

@k10shetty1 is on vacation; that is why he is not answering.

Please however consider this:

The framework we put in place is really general, but the results you get mostly depend on:

  • Your data, that is, the distribution of the features in the sample you are training the model with. The sample, however big (and yours is really small), is only an approximation of the reality it describes, of which we are not experts and cannot help you with.
  • The models trained. Each model you trained has its own training procedure, and those parameters control that training. In order to answer your questions we would need to recall the details of each model’s training and work through what each of these parameters means.
    You did not select one model, you selected many, and this multiplies the effort of understanding the various algorithms and what each of their parameters controls. Moreover, each model might need a different kind of data preparation.

To help you we would need to:

  • Understand your data.
  • Recall how the models are trained and what each parameter means.

Please note, however, that you are asking for a gigantic task.
This task would not even work if the results you are getting are simply due to chance, as they are highly dependent on how you partition training and test data.
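To give a feeling for how much the split alone can move the numbers with a dataset of this size, here is a small, purely illustrative scikit-learn sketch on synthetic data of roughly the same size:

```python
# How much does the accuracy move when only the random 70/30 partition changes?
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=182, n_features=3, n_informative=3, n_redundant=0,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

scores = []
for seed in range(20):                                   # 20 different 70/30 partitions
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              stratify=y, random_state=seed)
    scores.append(SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te))

print("accuracy over 20 splits: mean %.3f, std %.3f" % (np.mean(scores), np.std(scores)))
# With only ~55 test rows, a spread of several percentage points comes from the split alone.
```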

The approach we can provide guidance on is a bit more basic but more in line with the work of many business data scientists in this domain: try and try and try, then pick the combination that gives the best results (after proper validation), and explain the best-performing model with XAI or by analyzing its structure (for example, a decision tree or a logistic regression is interpretable after training).

When no attempt increases the performance, usually this has nothing to do with a single step (such as parameter optimization) but rather with something more fundamental, like how the data was collected and whether the task at hand is even feasible.

In your case you can achieve good performance even with little data if you select the right model, rather than by performing parameter optimization. It seems to me that without proper data preparation many of the models you selected won’t train, no matter what parameters you adopt.

Given this, please consider understanding (on your own) what each of those parameters means and reading papers on how they should be optimized and/or how the data should be prepared (for example, MLP requires normalization, and so on).

If you can’t do the above on your own, we recommend considering the AutoML components (blog post), which do all these steps automatically for you, optimizing as much as possible within a reasonable amount of time.

Given all of that:

  1. I downloaded your workflow.
  2. I used the same Partitioning node for all models (that is the very least we can do before comparing performances; even if you use the same seed and settings in every Partitioning node, it would be a pain to control and change them in all their occurrences everywhere).
  3. I added the AutoML component and compared its output to yours.
  4. The best result I could find this way is Gradient Boosted Trees, with an accuracy of 100% on the validation set!

The main issue here is that you have really little data: 180ish rows!
With this little data you want to do so much! Too much!
Partitioning at first for global train and test.
Then, on the training set (127 rows), partitioning again for parameter optimization with cross validation.
Then, on the test set (55 rows), getting statistically significant performance metrics.

Also why did you pick those precise models? Was this an assignment?

On the other hand, the performance is measured with accuracy on a multiclass string target that was binned from a numerical target.

Did you consider doing a regression from the start? Did you consider reducing the number of classes to two?

I added the AutoML Regression to the workflow too.

Open the View of the 3 components in the workflow to inspect results.

2022-11-28_13h19_07


I am having a bit of a déjà vu here :slight_smile: since I tried to build a workflow comparing several multiclass models with seemingly similar data before, and we also discussed the structure and quality of the data:

My impression then was that it had more to do with the task and also the target. I suggested that formulating it as a regression task might help, but we never managed to finish that conversation. I wonder whether, given more data, an SVM might perform better on a regression task.

A support vector classifier came out on top the last time around, although the model overall was not very good.

Another relevant link might be this one.