Binary Classification Inspector predicting upon new data

paolotamag · December 11, 2019, 11:31am

Hi @ReidF,
I am happy you are liking our new node and its view.
The node is meant to give an interactive solution to find the best threshold and select a single model from its view. However it also compute a number a number of things regardless its view acting as some kind of a new “Scorer node” only for binary classification. In fact it outputs the settings for you to store and re-apply with other nodes (like for example a Rule Engine node) to a deployment model.

First of all by default you can set the node to optimize the threshold for all input probabilities (that is for all different models). At the output of the node you should see the optimal threshold and the associated performance metrics as well as the classifications given those threshold (check the box “Append new predictions for all models”). This is computed without any in-view user interaction if you change the threshold from default (0.5) to the optimization of a performance metric, for example F-score (see picture).

Without any interaction with those settings this would be the first output:

This instead is the output of the second port:

As you can see the node is creating new classification columns storing the new predicted classes given the new threshold. However this does not mean the node is able to score new data for which probabilities are not available. You need to use a predictor node for that. The best and most generic predictor node that you can use for all PMML models is the following (recently updated in 4.1 to output probabilities): https://kni.me/n/jya1a6bRZlBmer8S

However no model is selected (“Selected Model Prediction” column) and no flow variable with the selected model threshold is applied. You will need to automatically select the best model (highest F-Measure for example).

This tiny workflow (https://kni.me/w/1b6mEyoQIMW3qvoK) is supposed to show you how you can score new data (no ground truth (Target column) is available) and apply the best threshold found using the training set.

Step 1 use the predictor node.
Step 2 apply the threshold found by the Binary Classification Inspector node with a Rule Engine node