Optimisation of SVM parameters

Hello,

I would like to optimise the parameters of the SVR to get the highest R² for my model. I tried to follow some tutorials here, but my workflow is still not working.

I put my trainer and learner between the start and end of an optimisation loop.

Could you please help me to fix it?

Thanks,

Zied

Hello zizoo,

Can you upload an example workflow? That would make it much easier to help you, because from your text I can only guess what the problem might be.
If you don't want to upload your full workflow, just upload the part that contains the optimization loop; you can even replace your data with e.g. the output of a "Data Generator".

If that's not possible, then I would check whether you are replacing the right parameters with flow variables.

Cheers,

nemad
 

Hi Nemad,
I am trying to optimise the hyperparameters of the Weka SVM, run with cross validation.
I attached a part of my workflow. I used the Data Generator and got the following error:
WARN LibSVM (3.7) 6:271 Unable to merge flow object stacks: Conflicting FlowObjects: <Loop Context (Head 6:323, Tail unassigned)> - iteration 0 vs. <Loop Context (Head 6:281, Tail unassigned)> - iteration 0 (loops/scopes not properly nested?)

I also tried a simple partitioning (without cross validation) and it is not working either. In the flow variables I cannot properly see the parameters of the Weka SVM, and they also all seem to be strings.

Hi again, with some trial and error I managed to get it running, but I am not sure whether the workflow is correct this way. I attached a picture.

Hi zizoo, yes, this is how it should work. Loops in KNIME need to be opened and closed in the correct order, e.g. an inner loop must end before the outer loop.

Hi again,
I adjusted the workflow a bit, as shown below. I basically added a Row Filter and a Column Filter to extract the average accuracy from the cross validation. I am not sure which one is correct.
I found a difference in the result of the Parameter Optimization Loop End.

Hello @zizoo,

what’s correct really depends on what you want to achieve.
But I guess your goal is to find the parameters that give you optimal accuracy of your model, in which case your workflow is correct.
Just a quick note: You actually don’t need the row and column filter because the Scorer node already outputs the accuracy as flow variable. Simply connect the Scorer with the Parameter Optimization Loop End via the red flow variable connection and you are good to go.
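For reference, the logic this loop implements can be sketched in a few lines of Python. This is a toy sketch only: `train_and_score` is a hypothetical stand-in for the Learner → Predictor → Scorer chain, and its objective function is made up for illustration.

```python
# Toy sketch of a "Brute Force" parameter optimization loop:
# try every (sigma, penalty) pair, score it, and keep the best.

def train_and_score(sigma, penalty):
    # Hypothetical objective standing in for the accuracy flow variable
    # the Scorer node would produce; peaks at sigma=0.3, penalty=1.5.
    return 1.0 - abs(sigma - 0.3) - 0.1 * abs(penalty - 1.5)

best_acc, best_params = float("-inf"), None
for i in range(1, 11):            # sigma: 0.1 .. 1.0, step 0.1
    for j in range(1, 21):        # penalty: 0.1 .. 2.0, step 0.1
        sigma, penalty = round(0.1 * i, 1), round(0.1 * j, 1)
        acc = train_and_score(sigma, penalty)
        if acc > best_acc:
            best_acc, best_params = acc, (sigma, penalty)

print(best_params, best_acc)      # best combination found by the sweep
```

The Parameter Optimization Loop Start plays the role of the two `for` loops, and the Loop End keeps track of the best score, which is why it only needs the accuracy flow variable as input.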

Cheers,

nemad

Hi, @nemad,
I got problems with the SVM optimization loop. Can you help me to configure it correctly?
I am working with some COVID data for my state, and I am trying to find out whether one (or more) of three regional epidemiological rates can serve to predict effects on a specific market. My goal is to compare the predictive performance of a few algorithms.
I have 6 possible classes for real metrics collected in a survey, and predictions should also fit in one of the same 6 classes. Input data are stored in this XLSX file:
Help with PpredictionsClassifcations.xlsx (9.7 KB)
I built a workflow as follows:

I applied on it the RBF kernel and added its two parameters: a) sigma; and b) Overlapping penalty.
But it is not working properly. This is the Confusion Matrix I got:
image
Of course, the only “right” predictions fall into the due class just by chance; my “real” accuracy is “0%”.
I have also tested it manually with a series of different value ranges (from very tiny to very broad), with and without the loop (Parameter Optimization Loop Start / End). Predicted values are chiefly classified into just one class, and I can’t find a better configuration for this loop and the prediction/classification, which are stored in this XLSX file:
Help with SVM Predictions.xlsx (6.2 KB)
Can you help me with this task? Any help will be greatly appreciated.
Thank you.
B.R.,
Rogério.
P.S.: sorry for reviving such an old post, but no better idea came to my mind.

Hello @rogerius1st,

There are a few things that I don’t understand:

  • Why the Denormalizer after the Scorer node?
  • Which parameters are you optimizing over?
  • What is the warning the learner node displays?
  • Your ROC Curve only shows the performance of the last iteration of the parameter optimization. Are all other iterations similarly bad?

All in all it is hard to tell what is going wrong from just the snippet you posted. Is there any chance that you could share the workflow?

In the meantime I’d double check the normalization, which can easily throw off an SVM. Do you use the same normalization model for both training and testing data, i.e. is the model produced by the Normalizer of the training data applied to the testing data via the Normalizer (Apply) node?

Best regards,
Adrian


Dear Adrian,
Thanks for your help. Indeed, you have already helped me many times, with many other posts in which you helped many other beginners like me. Now, some answers to your questions:

  1. I had previously realized that normalization wasn’t strictly necessary, because the classes (which I had collected in a survey) were already between “0” and “1”; the three rates, however, are not normalized. I applied the “Normalizer” node before every “Learner” node, and the “Denormalizer” node after every “Predictor” node of each algorithm whose performance I am comparing (and before the Scorer node), because I read in a post on the KNIME Forum that “[…] algorithms predict better if working with normalized data […]”. Isn’t that so? In fact, I found no difference with or without the Denormalizer node, whether normalizing the real metrics (those collected in the survey) or the predictions. To emphasize: I am (trying to) compare some of COVID’s local epidemiological rates as potential predictors of local market retractions, so it made sense to me to work with normalized data, since these rates weren’t normalized before, although the market metrics were.

  2. I’m trying to optimize over two parameters (as I chose the RBF kernel): a) sigma (from 0.1 to 1.0; step 0.1); and b) overlapping penalty (from 0.1 to 2; step 0.1). After looking at (the new version of) my workflow, do you think other value ranges would be advisable for my situation?

  3. The learner node’s warning is a message about a few rejected columns. And this is a very good point in your comments. After reading your answer (and trying to answer it), I rewrote the workflow, applying a “Column Filter” to exclude those additional columns (which I had no intention of using henceforward), and got the workflow that follows:
    Help with SVM Loop.knwf (42.8 KB)
    Formerly, I had also tried to apply the SMOTE node (on the training partition, with two, three, and four times oversampling), but it rendered the same results: accuracy = 14.454%, and (almost) all predictions in the same class.
    But then I simply removed the SMOTE node, and it worked! Unexpectedly well, by the way. It seems so much better now that it resembles the result one can get with overfitting. See the Confusion Matrix below:
    image
    I am still using the same input data:
    Help with PpredictionsClassifcations.xlsx (15.7 KB)
    As it hadn’t worked before (and does now), I guess I had neglected the excess columns, which had a harmful effect on the former prediction configurations. Now it’s working for me. Thank you once again.

  4. My ROC Curves are now working with all classes, and their current predictions are functional, bearing in mind the due considerations about my VERY small number of rows.

  5. Did you manage to download (and open in KNIME) the new version of this branch of my workflow? And does it seem (a little) more comprehensible to you?

Thank you once more for all your help.
I wish you all the best.
Bye.
Rogério.

Hello @rogerius1st,
first of all, sorry for the late reply, I was on vacation last week.
Thank you for posting your workflow, that makes it so much easier to figure out what is going on.

Some observations:

  • You perform normalization on your full dataset before splitting it into training and testing data. This can skew your model because the testing data affects the normalizer model that you would later have to apply to new data in order to apply the model in practice. Therefore the Normalizer should be used after the Partitioning node and then applied to the testing data via the Normalizer (Apply).
  • In your case the best value for Overlapping penalty is the stop value of your parameter search. If you fall on a parameter boundary like this, it might make sense to extend the parameter range to see if larger Overlapping penalty values would yield an even better model.
  • In your workflow the best model happens to be the last model to be trained but in general you have to retrain your model once the best parameter combination is found and then evaluate the performance on an independent third dataset.
  • You might want to employ cross validation together with parameter optimization to get the most robust results. Here is a workflow that shows how to do this: Parameter Optimization Loop with Cross Validation – KNIME Hub
  • The Denormalizer before the Scorer is not needed because the scorer looks at string columns that are not affected by a Normalizer anyway.
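The second bullet (the best value landing on the edge of the searched range) can be checked mechanically. A small sketch in plain Python, with a hypothetical helper name and the ranges from the thread:

```python
def on_boundary(best, start, stop, step):
    # If the winner sits within one step of either end of the range,
    # the true optimum may lie outside the searched interval.
    return abs(best - start) < step / 2 or abs(best - stop) < step / 2

# Overlapping penalty searched over [0.1, 2.0] with step 0.1, winner 2.0:
print(on_boundary(2.0, 0.1, 2.0, 0.1))   # boundary hit -> extend the range
print(on_boundary(1.3, 0.1, 2.0, 0.1))   # interior -> range was wide enough
```

Whenever the check fires, rerunning the optimization with a wider range (as suggested above) is the cheap way to confirm whether the model keeps improving.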

Cheers,
Adrian


Hi again, @nemad.
Sorry for my late reply, and thanks a lot for the enlightenment. I suppose I haven’t learned (yet) how to use most of KNIME’s nodes properly.

  1. I moved the “Normalizer” nodes from before to after the “Partitioning” node, applying them only to the training data; added the “Normalizer (Apply)” for the test data, feeding its blue square input with the model from the blue square output of the former; and removed the “Denormalizer” as well, as you suggested.
  2. About the range of Overlapping penalty values that you mentioned, I wasn’t sure what a good range would be. I simply tried doubling the “neutral” value of “1” (i.e., applied the interval [0.01; 2.0] with a step size of 0.01) → the results gave a best penalty of 1.79.
  3. Afterward, I increased the upper penalty bound to “3” (although I still don’t know whether it makes real sense to test such a value, which I suppose is high), and the best parameters for accuracy didn’t change very much: the penalty became 2.22 and sigma 0.12. This is what I got in a “Surface Plot” node (I show you two views, a front view and a bottom-up view, to make it easier to understand how this surface appeared to me):

    and

    I thought it might mean: lower sigmas and lower penalties lead to higher accuracies. Am I right?
    This is my new Confusion Matrix:
    image
  4. By the way: those optimized parameters generated a maximized accuracy of 76.34% in the Confusion Matrix, but I got (in the same run) the value of 0.782 by clicking on “Best Parameters” of the Parameter Optimization Loop End node. Shouldn’t they have the same value? Otherwise, which one should I use?
  5. Regarding your suggestion of testing performance on a third, independent dataset: unfortunately, I don’t have one. My total dataset is very small, with 182 instances (rows). Setting the partition to 70% gives me 127 rows for training and 55 for prediction, but I have no additional data for validating my models (on data different from training and testing).
  6. I tried to adapt @paolotamag’s example (for cross validation) that you had linked, though I think I need some help with this adaptation (or translation). Would you help?
    My workflow became like this:

    Or the sharable link of the modified workflow:
    Help (3) with Optimization loops on SVM.knwf (106.3 KB)

Thanks for any help you can lend me.
B.R., Rogério

Hello Rogério,

the surface plot is a great idea to visualize the optimization!
However, there might be a bug in it because the accuracy should only take on values between 0 and 1.

Those optimized parameters generated a maximized accuracy of 76.34% in the Confusion Matrix, but I got (in the same run) the value of 0.782 by clicking on “Best Parameters” of the Parameter Optimization Loop End node. Shouldn’t they have the same value? Otherwise, which one should I use?

That is indeed odd. Can you perhaps send me the workflow that produced this incoherence, so that I can figure out if there is a bug on our end?

But I got no additional data for validating my models (on other data different from training and testing).

That’s where cross validation can help you. You would split your data with the partitioning node to get a training and a test set. Then you use cross validation in combination with the Parameter Optimization loop to find the best parameters and then train your final model using the full training data and apply that to your test data to get a less biased estimation of your model on unseen data.
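The scheme above can be sketched with scikit-learn as an assumed stand-in for the KNIME nodes (synthetic data; `gamma`/`C` play the roles of sigma and the overlapping penalty): the cross-validated search runs on the training split only, the best model is automatically refitted on the full training data, and the untouched test split gives one final, less biased evaluation.

```python
# Parameter search with cross validation on the training split,
# followed by a single evaluation on the held-out test split.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=182, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

param_grid = {"gamma": [0.01, 0.1, 1.0], "C": [0.5, 1.0, 2.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold CV
search.fit(X_train, y_train)     # CV happens inside the training split only

test_acc = search.score(X_test, y_test)  # refitted model, unseen test data
print(search.best_params_, test_acc)
```

The key point mirrored from the advice above: the test split is touched exactly once, after the best parameter combination has already been chosen.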

I tried to adapt @paolotamag’s example (for cross validation) that you had linked, though I think I need some help with this adaptation (or translation). Would you help?

In your case you will want to replace the ROC Curve with the Scorer to calculate the accuracy, and collect it with a Variable Loop End node instead of the Loop End node that Paolo uses in the example.

Cheers,

Adrian


Hello there @rogerius1st,
Did you check the Parameter Optimization (Table) component by @k10shetty1?

This gives you drag-and-drop functionality to perform parameter optimization with minimal configuration.

The component has settings that expose the right level of detail.

Here is an example workflow on classification: Parameter Optimization (Table) Component on Random Forest – KNIME Hub

Here is an example on regression: Parameter Optimization (Table) Component on MLP – KNIME Hub

For more examples and explanation, take a look at the following resources.

Keerthan is an expert on this topic and co-author of the blog post:

and of this example space:


Dear Keerthan @k10shetty1
Thanks for all your support. I (tried to) study all the material in the links you sent, and tried to apply that ‘Component’ to my situation. I haven’t gotten good results yet, probably because I don’t properly understand them. I tried this Component to compare the performance of five classification algorithms and received ‘error messages’ for most of them:
  • k-NN:
image

  • MLP:
    image
  • PNN:
    image
  • SVM and Naïve Bayes: no warning messages, but (apparently) stuck in an endless loop.
    Though I have tried (since your post, and all through the past week) a series of parameters and configurations, I think I still need help (a lot of help!) with it. Below I send you my starting material:
    a) an XLSX file with 3 rates which may supposedly be related to 1 business metric;
    Help with PpredictionsClassifcations.xlsx (8.1 KB)
    b) the workflow on which I tried to apply your suggestion with the cited ‘Component’.
    KNWF_to_Paolo_Keerthan_3.knwf (847.6 KB)

If you can help me, it will be greatly appreciated.
Thank you
B.R.,
Rogério.

Hello there, @rogerius1st.

I ran your workflow and would make the following changes:

  • SVM and Naïve Bayes – The ‘Brute Force’ technique tries every possible parameter combination between the start and end of the range. Given that you have 200 values for the ‘Overlapping penalty’ parameter and 100 values for the ‘sigma’ parameter, trying everything would take a long time. I would recommend that you either increase the step size or use a different strategy in the component configuration.

  • Please update variable values in your Learner node’s ‘Flow Variable’ section for PNN and MLP, within the capture segment.

  • For KNN, the Parameter Optimization component works with a Model that includes a Learner and Predictor node. Because this is not the case with KNN, please use the Counting Loop Start node.
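A back-of-the-envelope check of the first bullet, in plain Python (the ranges are the ones mentioned in the thread):

```python
# Brute force cost: one train/score cycle per grid point.
penalty_values = int(round((2.0 - 0.01) / 0.01)) + 1   # 0.01..2.0, step 0.01
sigma_values = int(round((1.0 - 0.01) / 0.01)) + 1     # 0.01..1.0, step 0.01
total = penalty_values * sigma_values
print(penalty_values, sigma_values, total)             # 200 * 100 = 20000

# A 10x larger step shrinks the grid roughly 100-fold:
coarse = ((penalty_values - 1) // 10 + 1) * ((sigma_values - 1) // 10 + 1)
print(coarse)
```

With each cycle involving an SVM training run plus scoring, the difference between 20,000 and a few hundred iterations is what makes the loop feel "endless".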

Please see the attached workflow for updates.
KNWF_to_Paolo_Keerthan.knwf (1.9 MB)


Thanks for sharing @paolotamag .
Why is the bottom workflow captured? I do not understand the reason for this. Can you elaborate?
br

The captured workflow at the bottom is the model learner and predictor that you would like to optimize.
Instead of building from scratch, you only need to capture the learner and predictor, configure the learner flow variables to be optimized, and connect the component together with the data and parameter settings.
Did you read the blog posts? It is well explained at the end.


Probably not in enough detail, I just rushed through them. To me it looks like the optimization is learned in the “train full model” part, so I assume this is the part which needs to be captured.
br


Dear Keerthan,
Thanks for your direct (and detailed) answer. Thanks for your comments, and special thanks for the (uphill) task you probably had while working on my KNWF.

I updated my workflow based on your suggestions (increasing the step size where possible), as far as I could understand the specifics of your changes. I am sending you (attached) my new version of that workflow.
WF_answers_to_Keerthan 1.knwf (1.9 MB)

Though, I feel compelled to mention a few things I couldn’t get to work:

  1. (on your original answer) the component presents only 10 results in the Parameter Optimization Loop End → “All parameters”. How can I increase this number?
  2. on SVM, I increased the step size as you suggested, changing it to:
    image
    and thus I got a low “best accuracy” (of 0.2835) in the ‘Best parameters’ right-click option of the component:
    image

Opening the Component, I also saw that almost all SVM predictions (except for 4 out of 55 instances) fell into the same class (= “0.1”). So I guess the real accuracy was “0”, because the few “right” predictions simply coincided by chance with instances that belonged to that class. What could I do now to fix that?

b) on k-NN (testing k values in the interval [2; 20]), the accuracy attained was 0.273 (for both 2 and 3 nearest neighbors). After these results, I applied the ‘Line Plot’ node, which rendered this graph:

.
According to the Elbow Method, k = 2 to 3 (the first “elbow”) would be the best values (I chose k = 3). I also tried to test (and compare) (almost) the same setup, but with no loops and with k = 3, which rendered the following Confusion Matrix:
image
The data classification in this matrix was somewhat strange. The data seem to be fully dispersed across it, which suggests that the “right classifications” were generated just by chance. Does it look the same to you? In that case, what could I do to reduce this dispersion?
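The k sweep behind the elbow reading can be sketched with scikit-learn's `KNeighborsClassifier` as an assumed stand-in for KNIME's K Nearest Neighbor node (synthetic data, sized like the 182-row dataset in the thread): score each k on held-out data and look for the bend in the curve.

```python
# Sweep k over [2, 20] and record the held-out accuracy for each value.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=182, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

scores = {}
for k in range(2, 21):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores[k] = knn.score(X_test, y_test)

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])   # the k -> accuracy curve is in `scores`
```

Plotting `scores` against k reproduces the Line Plot view; with so few rows, a couple of points moving between classes can shift the whole curve, which is one reason dispersed confusion matrices like the one above appear.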

c) on Naïve Bayes (NB), (almost) everything was different.
That happened while I was trying to follow your suggestions. I increased the step size according to the table below:
image

Thus, my “best results” were remarkably different from the ones I got before:
image

Then I opened the Component and selected “All parameters” (in the “Parameter Optimization Loop End”), and saw that the accuracies were equal (= 0.709) for all 10 parameter sets. Are Naïve Bayes’ results really so different from the former ones?

d) on MLP, I applied the following (as you suggested):
image

And got these results (= 0.7717), which are somewhat similar to NB’s, but remarkably different from the remaining algorithms:
image
Would you mind helping me to understand such high differences?

e) on PNN (as in your original answer, using ‘Minimum standard deviation’ as ‘Theta Minus’ and ‘Threshold Standard Deviation’ as ‘Theta Plus’), I also increased the step size:
image
And got these results:
image

The PNN’s accuracy (= 0.299) is somewhat close to SVM’s and to k-NN’s results, but once again very different from NB’s and MLP’s.

Can you enlighten me about what is happening (or should have happened)?

Thanks again for all your help.
B.R.,
Rogério.