I’m trying to build a Random Forest classifier that assigns records with 9 categorical features to one of 8 classes. However, when the classifier encounters a combination of the 9 feature values that does not appear in the training set, or when it cannot classify a record reliably, I need it to label that record as “other” or “unknown” instead of misclassifying it.
I’m currently using the built-in Random Forest Learner and Predictor nodes.
My initial thought was either to add artificial records to the training set to create this “other” bucket, or to use the prediction confidence to decide when a record should fall into it.
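To make the confidence idea concrete, here is a rough sketch of what I have in mind, written in Python with scikit-learn rather than the Learner and Predictor nodes. The function names, the `threshold` of 0.6, and the `"other"` label are all placeholders I made up for illustration, and I am assuming the forest's averaged class probability is a reasonable stand-in for the predictor's confidence value.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder


def fit_forest(X_train: pd.DataFrame, y_train, seed: int = 0):
    """Hypothetical helper: fit an encoder and a forest, and remember seen rows."""
    # Category values never seen in training are encoded as -1 instead of failing.
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(encoder.fit_transform(X_train), y_train)
    # Every feature combination present in the training set, as tuples of strings.
    seen = set(map(tuple, X_train.astype(str).to_numpy()))
    return encoder, forest, seen


def predict_with_other(encoder, forest, seen, X_new: pd.DataFrame,
                       threshold: float = 0.6, other_label: str = "other"):
    """Hypothetical helper: predict, then relabel doubtful rows as other_label."""
    proba = forest.predict_proba(encoder.transform(X_new))
    # Most probable class per row, copied into an object array so it can be edited.
    labels = forest.classes_[proba.argmax(axis=1)].astype(object)
    confidence = proba.max(axis=1)
    # Rows whose exact feature combination never occurred in the training set.
    unseen = np.array(
        [tuple(row) not in seen for row in X_new.astype(str).to_numpy()]
    )
    # Assumed rule: low confidence OR unseen combination goes to the "other" bucket.
    labels[(confidence < threshold) | unseen] = other_label
    return labels
```

The threshold would need tuning on held-out data: set it too high and most records end up as “other”, set it too low and the bucket is rarely used.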
How would I go about this?