SMOTE for unbalanced data (SVM)

Hi everyone, I have developed a support vector machine (SVM) model on an unbalanced dataset with two classes. I tried using SMOTE to oversample the minority class, so I now have a database with an equal number of elements in both classes. On the original database, I used the RBF kernel and optimized sigma and C, with decent results. With SVM + SMOTE, whatever parameters I use, the SVM does not seem able to separate the classes (in practice it gives me SP = 1 and SE = 0). I have already developed other models with SMOTE on the same DB (kNN, MLP, random forest) with good results, but SVM + SMOTE does not seem to work. Has anyone had this problem?
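For readers unfamiliar with the metrics: SP = 1 together with SE = 0 is what you get when a classifier assigns every sample to the negative class. The sketch below illustrates this with made-up toy labels. The function name `sensitivity_specificity` and the choice of `1` as the positive class are assumptions for illustration, not something taken from the original workflow.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (SE, SP) for binary labels; `positive` marks the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes so the sketch never divides by zero.
    se = tp / (tp + fn) if (tp + fn) else float("nan")
    sp = tn / (tn + fp) if (tn + fp) else float("nan")
    return se, sp


# Toy labels (made up): 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that always predicts the negative class.
y_all_negative = [0] * len(y_true)

se, sp = sensitivity_specificity(y_true, y_all_negative)
print(se, sp)  # 0.0 1.0
```

If the metrics computed on the real predictions match this pattern, the SVM has most likely collapsed to predicting a single class rather than learning a poor but non-trivial boundary.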

Hi @degapifa,

Could you elaborate a bit more about your input data? For instance, how many variables (dimensions) are you using? Are you able to train an SVM if you do an equal size sampling of the input data?
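As a rough illustration of what equal size sampling means here, the sketch below downsamples every class to the size of the smallest one and returns the row indices to keep. It uses only the Python standard library. The function name `equal_size_sample` and the toy class sizes are hypothetical, and the corresponding KNIME node may behave differently in its details.

```python
import random


def equal_size_sample(labels, seed=0):
    """Return sorted row indices with each class downsampled to the smallest class size."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        # Draw n_min rows without replacement from each class.
        kept.extend(rng.sample(idxs, n_min))
    return sorted(kept)


# Toy label column (made up): 20 rows of class "A", 100 rows of class "B".
labels = ["A"] * 20 + ["B"] * 100
kept = equal_size_sample(labels)
kept_labels = [labels[i] for i in kept]
print(len(kept), kept_labels.count("A"), kept_labels.count("B"))  # 40 20 20
```

Training the SVM on such a subset is a quick way to check whether the problem lies with the synthetic samples or with the SVM setup itself.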

Best regards,
Stefan

Thanks for the reply. The DB consists of:
6370 samples, of which 940 belong to class A and 5430 to class B. There are 1060 features (all normalized) for each sample. SMOTE balances the classes at 5430 vs. 5430.
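To make the numbers concrete: balancing 940 against 5430 means SMOTE has to generate 5430 - 940 = 4490 synthetic class A samples, so most of class A ends up being interpolated data. The sketch below shows the core idea of SMOTE (interpolating between a minority sample and one of its nearest minority neighbours) using only the standard library. The function `smote_sketch`, its parameters, and the toy points are illustrative assumptions; the KNIME SMOTE node and library implementations differ in details.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (the core idea of SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Class sizes from the post: number of synthetic samples needed to reach 5430 vs. 5430.
n_needed = 5430 - 940
print(n_needed)  # 4490

# Toy 2-D minority class (made-up values) to show the interpolation itself.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(toy_minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Every synthetic point lies on a line segment between two existing minority samples, which is worth keeping in mind when the feature space has 1060 dimensions.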

In my experience, that already qualifies as a high-dimensional dataset. I am not an expert, but I am aware that SMOTE has some shortcomings, which have led to alternative approaches being discussed in the scientific literature.

None of the alternative approaches are implemented in KNIME (at least to my knowledge) but you might get lucky with the Python or R Integration.

Best,
Stefan

Just to leave this here: @malik has published an article about Recursive Feature Elimination SVM (RFE-SVM) that includes a KNIME implementation, which might come in handy: "Recursive Cluster Elimination based Rank Function (SVM-RCE-R) implemented in KNIME"

Best,
Stefan

I will try, thanks!