Problem with MultilayerPerceptron(3.7) from Weka

Hi, everyone. I’d like to ask for help with a problem that occurs while using the MultilayerPerceptron(3.7) node from Weka.

I’m working on a class-imbalanced classification problem for a university project. In my workflow I try different algorithms to deal with this problem. One of them is an MLP, for which I’m using the MultilayerPerceptron(3.7) node implemented by Weka. In particular, I chose to use CostSensitiveClassifier(3.7) with MultilayerPerceptron as the selected classifier.

This is the problem: during the feature selection procedure (I use a wrapper loop to select the relevant attributes), the classifier gives reasonable results in terms of recall, precision, F1-measure and so on. After the feature selection, I evaluate the classifier’s performance on a new sample of data (never used before) to avoid overfitting. But now the exact same classifier, with the exact same settings, is unable to detect the records belonging to the minority class. Every record is classified as part of the majority class, and because of that I get recall = 0, precision = 0, F1-measure = 0, etc.
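For reference, here is a minimal pure-Python sketch (hypothetical labels, not taken from my data) of how precision, recall and F1 for the minority class are computed. It shows that all three drop to zero as soon as the classifier never predicts class 1, even though accuracy can still look good. Strictly speaking, precision is 0/0 in that case; the sketch reports it as 0.0 by choice.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the minority (positive) class.

    If the positive class is never predicted, precision is 0/0;
    this sketch reports it as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical test set: 2 minority records out of 10, everything predicted as class 0.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10

print(binary_metrics(y_true, y_pred))  # (0.0, 0.0, 0.0)

# Accuracy still looks fine because the majority class dominates.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.8
```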

I don’t understand why the classifier worked before and doesn’t work now. The settings are exactly the same; the only thing that changes is the data used to train and test the classifier.

I hope someone is able to help me.
Thanks!

Hi @GianluCav & welcome to the KNIME community!

Without going deeper, I would say that your MLP is overfitting:

Have you checked this?

Would it be possible to upload your workflow here? It would help us help you and maybe suggest a solution :wink:

Best

Ael

First of all, thank you for the answer. I should say that I’m a beginner: this workflow is for the Machine Learning course at my university, and I only use the things that the professor showed during the lectures.

This is the workflow: KnimeForum – Google Drive

To deal with overfitting, I split the dataset into partition A and partition B. I use partition A for the feature selection, splitting it into a training set and a validation set. Then I use partition A to train the classifier and partition B to test it. I think this procedure is right to avoid overfitting, or at least that is what the professors told us :sweat_smile:
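The procedure above can be sketched in plain Python as follows. This is only an illustration with made-up data, and it assumes the splits are stratified, i.e. each class is divided in the same proportion. That matters with a small minority class: with a plain random split, partition B could end up with very few minority records. The `stratified_split` helper is hypothetical, not a KNIME or Weka function.

```python
import random
from collections import defaultdict


def stratified_split(records, labels, fraction, seed=42):
    """Split (record, label) pairs into two partitions, keeping the class ratio in both.

    `fraction` is the share of each class that goes into the first partition.
    """
    by_class = defaultdict(list)
    for rec, lab in zip(records, labels):
        by_class[lab].append((rec, lab))
    rng = random.Random(seed)
    first, second = [], []
    for items in by_class.values():
        rng.shuffle(items)
        cut = round(len(items) * fraction)
        first.extend(items[:cut])
        second.extend(items[cut:])
    return first, second


# Hypothetical data: 90 majority (0) and 10 minority (1) records.
records = list(range(100))
labels = [0] * 90 + [1] * 10

# Partition A (feature selection + final training) and partition B (final test only).
part_a, part_b = stratified_split(records, labels, fraction=0.7)

# Inside partition A: training and validation sets for the wrapper feature selection.
train, valid = stratified_split(
    [rec for rec, _ in part_a], [lab for _, lab in part_a], fraction=0.8
)

# Print size and number of minority records for each set.
for name, part in [("A", part_a), ("B", part_b), ("train", train), ("valid", valid)]:
    print(name, len(part), sum(lab for _, lab in part))
```

Partition B is touched only once, at the very end; everything that involves choices (attributes, settings) happens inside partition A.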

Hi,
well, partitioning is there to avoid data leakage, not overfitting. To reduce overfitting in neural nets, additional layers (like Dropout) can be used. Another option could be using class weights (I don’t know if Weka supports that; I have always used Python until now).
Have you thought about trying oversampling techniques to balance your dataset for the training part?
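A minimal sketch of both ideas in plain Python, with hypothetical names and data: random oversampling duplicates minority-class rows of the training set only (the test partition keeps its real class distribution), and `balanced_class_weights` computes weights inversely proportional to class frequency, the same formula scikit-learn uses for `class_weight="balanced"`. How to feed either into the Weka nodes is not shown here.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every class
    has as many rows as the largest one. Use on training data only."""
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for lab, class_rows in by_class.items():
        extra = [rng.choice(class_rows) for _ in range(target - len(class_rows))]
        for row in class_rows + extra:
            out_rows.append(row)
            out_labels.append(lab)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {lab: n / (k * c) for lab, c in counts.items()}


# Hypothetical training set: 8 majority rows and 2 minority rows (made-up features).
train_rows = [[i, i * 0.5] for i in range(10)]
train_labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(train_rows, train_labels)
print(bal_labels.count(0), bal_labels.count(1))  # 8 8

weights = balanced_class_weights(train_labels)
print(weights)  # {0: 0.625, 1: 2.5}
```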
br

Hi @Daniel_Weikert

@GianluCav has kindly uploaded and shared his workflow so that people can have a look at it. You can therefore download it and see which balancing techniques he is using :slight_smile:

Best

Ael

Unfortunately, the problem is still there.
I tried different settings and different training and test sets, but I still don’t get usable results. I took into consideration everything I know about overfitting and so on.
I’d like to add that other students from my course have the same problem. Is it possible that there is some kind of problem with the latest version of KNIME and the Weka package? Maybe I need to install some other extension?

I found the problem. This is the output of the inducer used for the MLP:


As you can see, even when the probability calculated by the inducer that a record belongs to Class 1 is equal to 1, the inducer classifies that record as Class 0. Can somebody explain why?
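One way to make this measurable is to recompute, for every row, the label implied by the probability columns and list the rows where it disagrees with the reported prediction. For a plain classifier the implied label is the most probable class; if a cost matrix is used at prediction time, it is the class with the lowest expected cost. The sketch below is plain Python with hypothetical numbers and an assumed cost-matrix convention (rows = true class, columns = predicted class). It is not the Weka implementation and it does not explain the cause; it only finds the affected rows.

```python
def min_expected_cost_label(probs, cost):
    """Class index j minimizing sum_i probs[i] * cost[i][j].

    Assumed convention: cost[i][j] is the cost of predicting class j when the
    true class is i. With the 0/1 cost matrix this is the most probable class.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)


def find_mismatches(prob_rows, predicted, cost):
    """Indices of rows whose reported label differs from the label implied by the probabilities."""
    return [
        k
        for k, (probs, pred) in enumerate(zip(prob_rows, predicted))
        if min_expected_cost_label(probs, cost) != pred
    ]


zero_one = [[0, 1], [1, 0]]
cost_sensitive = [[0, 1], [5, 0]]  # hypothetical: missing a class-1 record costs 5

# Hypothetical probability columns (P(class 0), P(class 1)) and reported labels.
prob_rows = [[0.9, 0.1], [0.0, 1.0], [0.75, 0.25]]
predicted = [0, 0, 0]

print(find_mismatches(prob_rows, predicted, zero_one))        # [1]
print(find_mismatches(prob_rows, predicted, cost_sensitive))  # [1, 2]
```

A row like the second one (P(class 1) = 1 but labelled 0) is flagged under any sensible cost matrix, so it points at how the prediction column is produced rather than at the model itself.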
