This is Paolo from KNIME.
I realized there are lots of materials and examples on machine learning interpretability, but they are all scattered across the Hub, so I wanted to write a post where all the materials can be found on a single web page (including the Twitter videos, for example!). Enjoy!
Interpretability view with Shapley Values and Partial Dependence: https://kni.me/w/AIM_-dYRkW3Qq7UO
Video of this view:
The cool feature of this view is that you can select one or more Shapley Value explanations (feature importance) in the Bubble Chart, see the actual values explaining the predictions in the Violin Plot, and visualize the corresponding ICE curves in the Partial Dependence view. There is also a Surrogate Decision Tree. This tree is deliberately overfitted to your model's predictions, so its splits are based on the predictions of the real model rather than on the ground truth.
If you have a model in PMML format, you can already use the view through this component: https://kni.me/w/dsMwugXNrpY7wdcV
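To make the ICE / Partial Dependence part concrete, here is a minimal sketch in plain Python. It is not what the KNIME view runs internally: `model`, the two features and the grid are made up for illustration. The idea it shows is that an ICE curve re-scores one row while a single feature sweeps over a grid, and the partial dependence curve is the point-by-point average of those ICE curves.

```python
from statistics import mean

def model(row):
    """Hypothetical stand-in for a trained model: scores one row of two features."""
    x0, x1 = row
    # The second feature only matters once the first one exceeds 10.
    return 0.5 * x0 + (2.0 * x1 if x0 > 10 else 0.0)

def ice_curve(predict, row, feature_idx, grid):
    """Predictions for one row while a single feature sweeps over `grid`."""
    curve = []
    for value in grid:
        modified = list(row)
        modified[feature_idx] = value  # overwrite only the feature of interest
        curve.append(predict(modified))
    return curve

def partial_dependence(predict, rows, feature_idx, grid):
    """Average the ICE curves of all rows, point by point."""
    curves = [ice_curve(predict, r, feature_idx, grid) for r in rows]
    return [mean(points) for points in zip(*curves)]

rows = [(9.0, 0.4), (11.0, 0.6), (12.5, 0.8)]
grid = [9.0, 10.0, 11.0, 12.0]

ice = [ice_curve(model, r, 0, grid) for r in rows]   # one curve per row
pd_curve = partial_dependence(model, rows, 0, grid)  # one averaged curve
```

Plotting every curve in `ice` together with `pd_curve` gives the usual ICE picture: the individual curves show where rows react differently, which the average alone would hide.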
A more complex example where models are automatically trained and explained is available here: https://kni.me/w/kQNYoitDenHzA-HH
The view should automatically visualize the explanations of any PMML model. You also need to supply data that the PMML model is able to score and define how many explanations to visualize. Computing explanations can be computationally expensive, so be careful!
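To give a feeling for why explanations get expensive, here is a rough sketch of exact Shapley values for a single row in plain Python. It assumes a simplified setup that is not taken from the KNIME nodes: features outside a coalition are replaced by the values of one hypothetical `background` row, and `toy_model` is invented for the example. The exact computation needs on the order of n * 2^n model calls for n features, which is why practical tools sample instead.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, row, background):
    """Exact Shapley values for one row.

    Features absent from a coalition take the value of the background row
    (a simplifying assumption for this sketch).
    """
    n = len(row)

    def value(coalition):
        mixed = [row[i] if i in coalition else background[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(x):
    """Hypothetical model with one additive term and one interaction."""
    return 2.0 * x[0] + x[1] * x[2]

row = [1.0, 2.0, 3.0]
background = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, row, background)

# The contributions add up to the gap between the row's prediction
# and the background prediction.
gap = toy_model(row) - toy_model(background)
```

With 3 features this is instant, but every extra feature doubles the number of coalitions, and that is before multiplying by the number of rows you want to explain.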
Workflow and composite view to visualize and explore LIME explanations: https://kni.me/w/3NciU4lnW6e4RMk1
Video of this view:
This view is designed to browse LIME explanations as tiny bar charts (small multiples).
While browsing them, you can also keep an eye on the distribution of the confusion matrix classes or of some features of interest (for example the alcohol and sulphites of the wine).
This view is hardcoded to the wine dataset, but it makes a good example of how to visualize explanations interactively.
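For anyone wondering what sits behind each of those tiny bar charts, here is a simplified sketch of the LIME idea in plain Python. It is an illustration under assumptions, not the KNIME or the original LIME implementation: it perturbs the row with Gaussian noise, weights the samples with an exponential kernel, and fits an unregularized weighted linear model, while real LIME typically adds regularization and feature selection. `toy_wine_model` and all parameter values are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_explain(predict, row, n_samples=500, scale=1.0, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model around `row`.

    Returns one local coefficient per feature (the bar heights).
    """
    rng = random.Random(seed)
    d = len(row)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        sample = [v + rng.gauss(0.0, scale) for v in row]
        dist2 = sum((s - v) ** 2 for s, v in zip(sample, row))
        w = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        x = [1.0] + sample                        # leading 1.0 is the intercept
        y = predict(sample)
        for i in range(d + 1):
            xty[i] += w * x[i] * y
            for j in range(d + 1):
                xtx[i][j] += w * x[i] * x[j]
    coefs = solve(xtx, xty)  # weighted least squares via the normal equations
    return coefs[1:]         # drop the intercept

def toy_wine_model(x):
    """Hypothetical quality score from (alcohol, sulphites)."""
    alcohol, sulphites = x
    return 1.0 / (1.0 + math.exp(-(0.8 * (alcohol - 10.5) - 3.0 * sulphites)))

explanation = lime_explain(toy_wine_model, [11.0, 0.6])
```

Each entry of `explanation` is the local slope for one feature around that wine, so its sign and size are what a small-multiples bar chart would show for that row.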
Compare SHAP and Shapley Values: https://kni.me/w/lOTFT_OHrVaWoV4P
This example compares a single explanation computed with these two similar strategies.
The explanations are compared via stacked bar charts, which take the form of force plots (scroll all the way down on the Hub page to see the picture).
The SHAP Summarizer is used because SHAP does not need as much training data as the Shapley Values loop does, and it is inefficient without a summarized version of the dataset. The summarizer downsizes the sampling table to 100 rows.
This view is hardcoded to the Titanic dataset, but it makes a good example of how to visualize explanations interactively.
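The downsizing of the sampling table to 100 rows can be pictured with a small k-means sketch in plain Python. This assumes a k-means style summary, similar in spirit to what `shap.kmeans` does in the Python shap library; whether the KNIME SHAP Summarizer works exactly this way is an assumption here, and the data is random toy data.

```python
import random

def kmeans_summary(rows, k, n_iter=10, seed=0):
    """Summarize `rows` into k prototype rows.

    Returns (centroids, weights), where each weight is the number of
    original rows assigned to that prototype in the last assignment pass.
    """
    rng = random.Random(seed)
    centroids = [list(r) for r in rng.sample(rows, k)]
    assignment = [0] * len(rows)
    for _ in range(n_iter):
        # Assign every row to its nearest centroid (squared Euclidean distance).
        for idx, row in enumerate(rows):
            assignment[idx] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
            )
        # Move each centroid to the mean of its assigned rows.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if assignment[i] == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    weights = [assignment.count(c) for c in range(k)]
    return centroids, weights

# Toy sampling table: 500 rows, 2 numeric features.
data_rng = random.Random(42)
table = [(data_rng.gauss(30, 10), data_rng.gauss(50, 20)) for _ in range(500)]

prototypes, weights = kmeans_summary(table, k=100)
```

The explainer then only has to score combinations built from 100 weighted prototype rows instead of the full table, which is where the speed-up comes from.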
Explain with Shapley Values anomalies/outliers detected with H2O Isolation Forest: https://kni.me/w/nPu68HZEu7Iy9N_4
A visualization is not available here yet, but most approaches from the previous examples could be used.
I was planning to write a blog post on all those examples at some point.
Let’s see if I find the time in 2020.