Active Learning for Document Classification

This workflow defines a fully automated web-based application that labels your data using active learning. It was designed so that business analysts can easily go through documents and label them with any number of classes. In each iteration the user labels more documents, and the model is retrained on all instances labeled so far. With every new iteration, the model proposes the next documents to label based on an exploration vs. exploitation approach. Once the overall potential has fallen below a value the user is happy with, they can exit the loop and export the model to label the remaining instances.
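To make the selection step concrete, here is a minimal Python sketch of one way to rank unlabeled documents and decide when to stop. It is not the workflow's actual implementation. The assumptions are mine: model uncertainty is measured as normalized entropy of the predicted class probabilities (the exploitation part), each document comes with a precomputed "potential" between 0 and 1 (the exploration part), and the two are mixed with a hypothetical `explore_weight`. The function names and the toy data are illustrative only.

```python
import math


def normalized_entropy(probs):
    """Entropy of a class-probability vector, scaled to the range [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def select_for_labeling(candidates, batch_size=2, explore_weight=0.5):
    """Return the ids of the documents to show the user next.

    candidates maps doc_id -> (class_probs, potential).
    Assumption: the score is a weighted mix of potential (exploration)
    and model uncertainty (exploitation).
    """
    scored = []
    for doc_id, (probs, potential) in candidates.items():
        uncertainty = normalized_entropy(probs)
        score = explore_weight * potential + (1 - explore_weight) * uncertainty
        scored.append((score, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:batch_size]]


def should_stop(candidates, threshold):
    """True when the summed potential of the unlabeled pool is below threshold."""
    return sum(potential for _, potential in candidates.values()) < threshold


# Toy pool: (predicted class probabilities, potential) per unlabeled document.
unlabeled = {
    "doc_a": ([0.5, 0.5], 0.1),     # model is unsure, region already explored
    "doc_b": ([0.9, 0.1], 0.9),     # model is fairly sure, region unexplored
    "doc_c": ([0.99, 0.01], 0.05),  # model is sure, region already explored
}
picked = select_for_labeling(unlabeled, batch_size=2, explore_weight=0.5)
```

Setting `explore_weight` to 0 turns this into plain uncertainty sampling, while values near 1 favor documents from regions that have not been labeled yet.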

This is a companion discussion topic for the original entry at

This is a valuable first step in labeling. A possible next step is Positive-Unlabeled (PU) learning. This is sometimes done in medical problems: low-probability false positives are deleted from the training set, and the cleaned set is then used to train a second model. The process can be repeated iteratively in a loop with a Reference Row Filter.
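The pruning loop described in this reply could look roughly like the Python sketch below. It only illustrates the idea and is not a full PU-learning method or a KNIME node configuration. The threshold `min_prob`, the round limit, the function names and the toy one-dimensional centroid "model" are all assumptions made for this example. The list comprehension that keeps rows plays a role similar to filtering the training table against a reference set.

```python
def train_centroid(rows):
    """Toy stand-in for a real classifier on 1-D features.

    rows is a list of (feature, label) pairs with label 1 = positive.
    Returns a function giving a pseudo-probability of the positive class.
    """
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)

    def predict_pos(x):
        d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return predict_pos


def prune_low_confidence_positives(rows, train, min_prob=0.3, max_rounds=5):
    """Repeatedly drop positives that the current model scores below min_prob.

    Returns the cleaned training rows and a model trained on them.
    """
    current = list(rows)
    for _ in range(max_rounds):
        predict_pos = train(current)
        kept = [(x, y) for x, y in current
                if y == 0 or predict_pos(x) >= min_prob]
        # Stop when nothing was removed, or when removal would leave no positives.
        if len(kept) == len(current) or not any(y == 1 for _, y in kept):
            break
        current = kept
    return current, train(current)


# Toy training set: the positive at 1.0 sits among the negatives
# and is treated as a suspected false positive.
training = [(9.0, 1), (10.0, 1), (11.0, 1), (1.0, 1),
            (0.0, 0), (1.0, 0), (2.0, 0)]
cleaned, second_model = prune_low_confidence_positives(training, train_centroid)
```

In practice `train` would wrap whatever learner the workflow already uses. The loop shape stays the same: score the labeled positives, drop the low-probability ones, retrain.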