External validation dataset

zizoo · July 6, 2018, 4:10pm

Hello, I used the SVM Learner node to train and the SVM Predictor node to predict, inside an X-Partitioner / X-Aggregator loop set up for 10-fold cross validation. I connected the Scorer node to the loop end.
Is the accuracy I get the average over the 10 folds of the cross validation? (example 1)
Is this accuracy sufficient to know the robustness of the model and to compare with other models?
Is an external dataset required to test the prediction ability of the model? (example 2)
Should different models be compared by each model's prediction accuracy on the external dataset? And what is the accuracy from the cross validation useful for?
I attached a workflow with the different scenarios, and I hope to find out which setup is best.

KNIME_project9.knwf (31.8 KB)

Thanks,
Zied

zizoo · July 9, 2018, 10:14am

Hello,
Is there any feedback for my questions above?
Thanks,
Zied

zizoo · July 26, 2018, 8:17am

Hello,
Is there any feedback for my questions above?
Thanks,
Zied

beginner · July 26, 2018, 9:14am

1. For the validation set (lower part of your screenshot), you need a separate learner right after the partitioning node, and you need to connect that learner to the 2nd predictor.

About the cross validation:

IMHO the way you do it is wrong. You need to score each fold separately. I do this after the X-Aggregator, using a Group Loop Start that groups on fold#. Then you can actually get your median and quartile values and hence see the variance of your model! For example, you can draw a box plot showing the variance between all the runs (I usually do 10x10 CV) for the metrics of interest.
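
Outside KNIME, the same per-fold scoring idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the `rows` table, its made-up values, and the helper name `accuracy_per_fold` are hypothetical stand-ins for the X-Aggregator output (fold number, actual class, predicted class), not anything taken from the attached workflow.

```python
from collections import defaultdict
from statistics import median, quantiles

# Hypothetical rows as they might come out of the X-Aggregator:
# (fold number, actual class, predicted class). Values are made up.
rows = [
    (0, "a", "a"), (0, "b", "b"), (0, "a", "b"), (0, "b", "b"),
    (1, "a", "a"), (1, "b", "b"), (1, "a", "a"), (1, "b", "b"),
    (2, "a", "b"), (2, "b", "a"), (2, "a", "a"), (2, "b", "b"),
    (3, "a", "a"), (3, "b", "b"), (3, "a", "a"), (3, "b", "a"),
]


def accuracy_per_fold(rows):
    """Group rows by fold number and return {fold: accuracy}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for fold, actual, predicted in rows:
        totals[fold] += 1
        hits[fold] += int(actual == predicted)
    return {fold: hits[fold] / totals[fold] for fold in sorted(totals)}


fold_acc = accuracy_per_fold(rows)
scores = list(fold_acc.values())

# Median and quartiles of the per-fold accuracies show the spread
# that a single overall accuracy value hides.
fold_median = median(scores)
q1, q2, q3 = quantiles(scores, n=4)

print(fold_acc)
print("median:", fold_median, "quartiles:", (q1, q2, q3))
```

The list `scores` is what you would feed into a box plot. With repeated cross validation (such as 10x10 CV) you would collect one score per fold per repetition in the same way.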