Missing values

ScottF · April 1, 2019, 7:30pm

Some algorithms (like decision trees) can handle missing values - sort of - by treating such values as their own class. This isn’t ideal, but maybe in your case it’s good enough. Other algorithms don’t handle missing values well at all, so you are forced to impute.
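To make those two options concrete, here is a minimal sketch in plain Python. The function names, the `"MISSING"` label, and the toy columns are made up for illustration, and `None` is assumed to mark a missing entry. The first function treats missing as its own category; the second fills a numeric column with the mean of the observed values and keeps a flag so the model can still see which entries were filled in.

```python
from statistics import mean


def missing_as_category(values, label="MISSING"):
    """Replace None in a categorical column with an explicit category label."""
    return [label if v is None else v for v in values]


def impute_mean_with_flag(values):
    """Fill None in a numeric column with the mean of the observed values.

    Returns the filled column plus a parallel list of booleans marking
    which entries were originally missing.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


# Toy data (hypothetical); None marks a missing entry.
colors = ["red", None, "blue", "red"]
ages = [20.0, None, 40.0, 30.0]

colors_filled = missing_as_category(colors)
ages_filled, ages_was_missing = impute_mean_with_flag(ages)

print(colors_filled)
print(ages_filled, ages_was_missing)
```

Keeping the flag column is a design choice, not a requirement: it lets a downstream model tell "really 30" apart from "unknown, filled with 30", at the cost of an extra feature.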

Another strategy here might be to predict the missing values in your dataset using the other values that aren’t missing, but then you are making some assumptions about the nature of the data. Since you are trying to build a classifier to identify unusual patterns, this might not be what you want: predicted values tend to look typical by construction, which can hide the very patterns you are looking for.
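As a rough sketch of that "predict the missing values" idea, the snippet below fits a straight line on the complete rows and uses it to fill in the gaps. Everything here is an assumption for illustration: the two-column layout, the helper names, the choice of a simple least-squares line as the predictor, and the toy numbers. A real dataset would likely need a richer model.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variance; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def impute_by_regression(rows):
    """Fill missing y values (None) using a line fitted on the complete rows.

    rows is a list of (x, y) pairs where x is always observed.
    """
    complete = [(x, y) for x, y in rows if y is not None]
    slope, intercept = fit_line([x for x, _ in complete],
                                [y for _, y in complete])
    return [(x, slope * x + intercept if y is None else y) for x, y in rows]


# Toy data (hypothetical); the third row is missing its y value.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0)]
rows_filled = impute_by_regression(rows)
print(rows_filled)
```

Note that the filled-in value sits exactly on the fitted line, so it can never look unusual. That is the assumption being made about the data, and it is why this approach can work against you when the goal is to spot unusual patterns.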

Then again, imputation and averaging beforehand make certain assumptions too, so no strategy is perfect.

The very unsatisfying answer to “how should I handle missing values” is “it depends”.