Hi, everyone. I’d like to ask for help with a problem that occurs while using the MultilayerPerceptron (3.7) node from Weka.
I’m dealing with a class-imbalanced classification problem for a university project. In my workflow I try different algorithms to deal with this problem. One of those is MLP, and I’m using the MultilayerPerceptron (3.7) node implemented by Weka. In particular, I chose to use the CostSensitiveClassifier (3.7) node with MultilayerPerceptron as the selected classifier.
This is the problem: during the feature selection procedure (I use a wrapper loop to select the relevant attributes) the classifier is able to give reasonable results in terms of recall, precision, F1-measure and so on. After the feature selection I evaluate the classifier's performance on a new sample of data (never used before) to avoid overfitting. But now, the exact same classifier, with the exact same settings, is unable to detect the records belonging to the minority class. Every record is classified as part of the majority class, and because of that I obtain recall = 0, precision = 0, F1-measure = 0, etc.
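Just to make the symptom concrete: when a classifier predicts the majority class for every record, all minority-class metrics collapse to zero exactly as described. A minimal scikit-learn sketch with hypothetical toy labels (not your actual data or the Weka node) shows this:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Hypothetical labels: 1 = minority class, 0 = majority class (8:2 imbalance).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A degenerate classifier that assigns every record to the majority class.
y_pred = [0] * len(y_true)

# With zero true positives (and zero predicted positives), recall, precision
# and F1 for the minority class are all 0.
print(recall_score(y_true, y_pred, pos_label=1))                      # 0.0
print(precision_score(y_true, y_pred, pos_label=1, zero_division=0))  # 0.0
print(f1_score(y_true, y_pred, pos_label=1, zero_division=0))         # 0.0
```

So the metrics themselves are consistent: the real question is why the trained model degenerates to an all-majority predictor on the new partition.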
I don’t understand why the classifier works before but not now. The settings are exactly the same. The only thing that changes is the data used to train the classifier and to test it.
First of all, thank you for the answer. I’d like to say that I’m a beginner; this is a workflow for the Machine Learning course at my university, and I use only the techniques that the professor showed during the lectures.
To deal with overfitting I split the dataset into partition A and partition B. I use partition A to do the feature selection, splitting it into a training set and a validation set. Then I use all of partition A to learn the classifier and partition B to test it. I think this procedure is right for avoiding overfitting, or at least that is what the professors told us.
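For reference, the partition scheme described above can be sketched in scikit-learn (just to illustrate the idea, not the KNIME Partitioning node; the toy data is hypothetical). One detail worth checking in your workflow: with a rare minority class, the splits should be stratified so both partitions keep the same class ratio, otherwise partition B may contain almost no minority records.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 records, 90 majority (0) vs. 10 minority (1).
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10

# Partition A (feature selection + final training) vs. partition B (final test).
# stratify=y preserves the 90/10 class ratio in both partitions.
X_a, X_b, y_a, y_b = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Inside partition A: training vs. validation split for the wrapper loop.
X_train, X_val, y_train, y_val = train_test_split(
    X_a, y_a, test_size=0.3, stratify=y_a, random_state=0)

print(sum(y_a), sum(y_b))  # minority counts stay proportional: 7 3
```

If the KNIME Partitioning nodes in your workflow use "take from top" or plain random sampling instead of stratified sampling, that alone can explain very different behaviour between the two evaluations.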
Well, partitioning is for avoiding data leakage, not overfitting. To avoid overfitting in neural nets, additional techniques (like dropout layers) can be used. Another option could be using class weights. (I don’t know if Weka supports that; I have always used Python until now.)
Have you thought about trying oversampling techniques to balance your dataset for the training part?
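As a concrete example of the simplest variant, random oversampling duplicates minority records (with replacement) until the classes are balanced. A sketch with hypothetical data (scikit-learn here just for illustration; KNIME has SMOTE and equal-size sampling nodes for the same purpose):

```python
from sklearn.utils import resample

# Hypothetical training data with a 90/10 imbalance.
majority = [([i], 0) for i in range(90)]
minority = [([i], 1) for i in range(10)]

# Draw minority records with replacement until the classes are balanced.
# Important: only the TRAINING set is resampled; the test set must keep
# its natural class distribution.
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0)
balanced = majority + minority_up

print(sum(1 for _, label in balanced if label == 1))  # 90
```

Resampling before the train/test split would leak duplicated minority records into the test set, so the order of operations matters.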
Unfortunately, the problem is still there.
I tried different settings, and I tried different training and test sets, but the results don't improve. I took into consideration everything I know about overfitting and so on.
I’d like to add that other students from my course have the same problem. Is it possible that there is some kind of problem with the latest version of KNIME and the Weka package? Maybe I need to install some other extension?