Bad correlation

You can check the following things:

  • check the importance of your variables in the model and see whether the top ones might contain ‘leaks’ that somehow give away the correct answer
  • see whether you have variables such as year stored as absolute rather than relative values (like year=2016, whose meaning might change over time)
  • check whether your training, test and validation datasets are truly separated. E.g. you have households that you split by person, and one member ends up in test and another in training, but they share pretty much the same data and target
  • see whether your new data contains all the variables in the same quality as your original set
  • think about what your accuracy actually means in the context of your business question (cf. links below)
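
To illustrate the household point, here is a minimal Python sketch of a split that keeps every household entirely in either training or test. The column names (household_id, person_id, target), the helper split_by_group and the fake data are all made up for this example; this is not taken from your workflow.

```python
import random


def split_by_group(rows, group_key, test_fraction=0.3, seed=42):
    """Split rows so that each group lands entirely in train or in test."""
    # Shuffle the distinct group ids, not the individual rows.
    groups = sorted({row[group_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(groups)

    # Reserve a share of the groups (at least one) for the test set.
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])

    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


# Fake data (assumption): 10 households with 2 persons each.
rows = [
    {"household_id": h, "person_id": f"{h}-{p}", "target": h % 2}
    for h in range(10)
    for p in range(2)
]

train, test = split_by_group(rows, "household_id")

# Households that appear on both sides would indicate a leak.
shared = {r["household_id"] for r in train} & {r["household_id"] for r in test}
print(len(train), len(test), shared)
```

If shared is not empty for your own split, the model may partly be recognising households it has already seen, which could explain good test results followed by poor results on new data.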

Then you might provide us with more details or even a sample workflow. You can use fake data if you cannot share the original data.


Models for 0/1 or Yes/No Targets

Understand metrics like AUC and Gini (and use H2O.ai)
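
As a rough illustration of the two metrics named in that link, here is a small pure-Python sketch. It assumes the common definitions: AUC as the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one (ties count half), and Gini = 2 * AUC - 1. The function names and the toy labels and scores are made up for this example.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count half
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini coefficient derived from AUC (0 = random, 1 = perfect ranking)."""
    return 2.0 * auc(labels, scores) - 1.0


# Toy data (assumption): 0/1 targets and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(auc(labels, scores), gini(labels, scores))
```

Unlike plain accuracy, both numbers depend only on how well the scores rank the cases, not on a particular cut-off, so they can be easier to interpret when the two classes are very unevenly distributed.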
