Emil the TeacherBot - Creating a Subset of the Training Set Based on the Most Uncertain Predicted Class

This workflow is one of several workflows that address a data mining scenario at the intersection of active learning, text mining, stream mining, and service-oriented knowledge discovery architectures. This particular workflow creates a subset of the training set based on the most uncertain predicted classes. It first reads the entire training set. It then processes the questions and predicts a class for each one. The loop body computes the differences between the top three class probabilities predicted for each question. Finally, it builds a subset of the training set from the questions whose predicted class is the most uncertain and saves that subset as a new table.
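The selection step can be illustrated outside KNIME with a short Python sketch. It is not the workflow itself, only an assumption about the idea: each question has a dictionary of class probabilities, the gaps between the top three probabilities are computed, and a question counts as "most uncertain" when the gap between its first and second most likely classes is smallest (margin-based uncertainty sampling). The function names, the `rows` format, and the subset size `k` are hypothetical.

```python
from typing import Dict, List, Tuple

def top3_gaps(probs: Dict[str, float]) -> Tuple[float, float]:
    """Return (p1 - p2, p2 - p3) for the three highest class probabilities."""
    top = sorted(probs.values(), reverse=True)[:3]
    # Pad with zeros if the model predicts fewer than three classes.
    top += [0.0] * (3 - len(top))
    return top[0] - top[1], top[1] - top[2]

def most_uncertain_subset(rows: List[Tuple[str, Dict[str, float]]], k: int) -> List[str]:
    """Return the ids of the k questions with the smallest top-1/top-2 gap.

    Assumption: a small gap between the two best classes means the
    predicted class is uncertain. Ties are broken by question id.
    """
    scored = [(top3_gaps(probs)[0], qid) for qid, probs in rows]
    scored.sort()
    return [qid for _, qid in scored[:k]]

# Hypothetical predictions for three questions.
rows = [
    ("q1", {"A": 0.90, "B": 0.05, "C": 0.05}),
    ("q2", {"A": 0.40, "B": 0.38, "C": 0.22}),
    ("q3", {"A": 0.50, "B": 0.30, "C": 0.20}),
]

if __name__ == "__main__":
    print(most_uncertain_subset(rows, 2))  # ['q2', 'q3']
```

Here only the first gap drives the ranking; the second gap (top-2 vs top-3) is returned as well so it could serve as a tie-breaker or an alternative criterion, in the spirit of the workflow's use of the top three probabilities.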


This is a companion discussion topic for the original entry at https://kni.me/w/PzOh6F2Fvo3kf4cw