How can I find out whether the model is overfitting?

@mohammad_alqoqa a few things come to mind:

  • You leave out several variables from the dataset. I assume this is a deliberate decision.
  • When you split the data you do not set a seed, so the splits might not be exactly reproducible.
  • To prevent overfitting in Random Forests (or other tree-based algorithms), two parameters to check are the maximum tree depth (maybe limit it to 6 or 8, or run a parameter optimization) and the minimum number of records a node must contain.
  • You can also try increasing the number of trees (rounds) you allow, maybe to 2000 or so.
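To make the points above concrete outside KNIME, here is a minimal Python sketch. It assumes scikit-learn and uses a synthetic dataset from `make_classification` as a stand-in for your data. It fixes the seed for the split and the forest, limits tree depth and minimum leaf size, and compares training AUC with held-out AUC. A training AUC close to 1.0 combined with a clearly lower test AUC is the typical sign of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

SEED = 42  # fixed seed so the split and the forest are reproducible

# Synthetic stand-in for the real (numeric-only) feature table
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=SEED)

# Stratified split with a fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=SEED)


def train_test_auc(max_depth, min_samples_leaf, n_estimators=200):
    """Fit a random forest and return (training AUC, held-out AUC)."""
    model = RandomForestClassifier(
        n_estimators=n_estimators,          # number of trees
        max_depth=max_depth,                # None = grow trees fully
        min_samples_leaf=min_samples_leaf,  # minimum records per leaf
        random_state=SEED,
    )
    model.fit(X_train, y_train)
    auc_train = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
    auc_test = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    return auc_train, auc_test


results = {}
for depth, leaf in [(None, 1), (8, 5), (6, 20)]:
    auc_train, auc_test = train_test_auc(depth, leaf)
    results[(depth, leaf)] = (auc_train, auc_test)
    # A large gap between training and test AUC points to overfitting
    print(f"max_depth={depth}, min_samples_leaf={leaf}: "
          f"train AUC={auc_train:.3f}, test AUC={auc_test:.3f}, "
          f"gap={auc_train - auc_test:.3f}")
```

The example uses 200 trees to keep it quick. If limiting depth and leaf size shrinks the gap while the test AUC stays about the same, the constrained forest is usually the safer choice.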

You can also try other data preparation methods (think vtreat), although they might not help you much in this case.

Since all your features are numeric, it might be worth trying deep learning, which could give you a different perspective.
With the H2O.ai AutoML node you can easily let KNIME try some DL without much effort. Maybe let it run for an hour and see what happens.

You might also want to think about which metric you want to optimize. With AutoML you can let the node decide; in your case you could start with the classic AUC.

This could also be a case for hyperparameter optimization, but KNIME does not (yet?) support categorical hyperparameters out of the box. You could try some Python code instead (I have prepared examples with XGBoost and LightGBM; one would have to extend the objective to test).
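As a rough sketch of what such a search can look like in Python, the following assumes scikit-learn's `GridSearchCV` with a `RandomForestClassifier` as a stand-in. It is not the XGBoost/LightGBM code mentioned above, and the synthetic data is only a placeholder. Categorical hyperparameters such as `criterion` or `max_features` simply go into the grid as lists of strings, and `scoring="roc_auc"` optimizes AUC with seeded 5-fold cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

SEED = 42  # fixed seed for data, folds and forest

# Synthetic stand-in for the real feature table
X, y = make_classification(n_samples=600, n_features=15, n_informative=5,
                           random_state=SEED)

# Categorical (string) and numeric hyperparameters side by side
param_grid = {
    "criterion": ["gini", "entropy"],    # categorical
    "max_features": ["sqrt", "log2"],    # categorical
    "max_depth": [4, 8],                 # numeric
    "min_samples_leaf": [1, 10],         # numeric
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=SEED),
    param_grid,
    scoring="roc_auc",  # optimize AUC
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED),
)
search.fit(X, y)

print("best parameters:", search.best_params_)
print(f"best cross-validated AUC: {search.best_score_:.3f}")
```

For a larger search space, `RandomizedSearchCV` accepts the same kind of grid and samples a fixed number of combinations from it instead of trying all of them.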
