Edit: broken links were updated and Integrated Deployment added.
Hi, this is Paolo from KNIME.
I realized there are lots of materials and examples on machine learning interpretability, but they are all scattered across the Hub, so I wanted to write a post where all the materials can be found on a single web page (including the Twitter videos, for example!). Enjoy!
Interpretability view with Shapley Values and Partial Dependence:
Video of this view:
The cool feature of this view is that you can select one or more Shapley Value explanations (feature importance) in the Bubble Chart, see the actual values explaining the predictions in the Violin Plot, and visualize the ICE curves in the Partial Dependence view. There is also a Surrogate Decision Tree. This tree is deliberately overfitted to your predictions, so its splits are based on the predictions of the real model rather than on the ground truth.
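For readers who want to see what the ICE curves and the partial dependence curve actually compute, here is a minimal sketch in plain Python. It is not what the KNIME nodes do internally: `toy_model`, the two features, the rows, and the grid are all made up for illustration.

```python
from statistics import mean


def toy_model(row):
    """Hypothetical stand-in for a trained classifier: returns a score in [0, 1]."""
    feature_a, feature_b = row
    score = 0.1 * feature_a + 0.5 * feature_b - 0.6
    return min(1.0, max(0.0, score))


def ice_curves(model, rows, feature_index, grid):
    """One curve per row: the prediction as a single feature sweeps over the grid."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """The partial dependence curve is the point-wise average of the ICE curves."""
    return [mean(points) for points in zip(*curves)]


# Made-up rows (feature_a, feature_b) and a made-up grid for feature_a.
rows = [(9.0, 0.4), (11.0, 0.6), (13.0, 0.8)]
grid = [8.0, 10.0, 12.0, 14.0]

curves = ice_curves(toy_model, rows, feature_index=0, grid=grid)
pd_curve = partial_dependence(curves)
```

Each ICE curve belongs to one row, which is why selecting explanations in the view changes which curves you see, while the partial dependence curve summarizes all of them.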
It used to work with PMML nodes:
A more recent example where models are automatically trained and explained is available here using Integrated Deployment:
The view should automatically visualize the explanations of any classification model correctly captured with Integrated Deployment.
Workflow and composite view to visualize and explore LIME explanations:
Video of this view:
This view is designed to browse LIME explanations as tiny bar charts (small multiples).
While browsing them you can also keep an eye on the distribution of the confusion matrix classes or of some features of interest (here, alcohol and sulphites of the wine).
This view is hardcoded for the wine dataset, but it is a good example of how to visualize explanations interactively.
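If you have not met LIME before, the idea behind each tiny bar chart can be sketched roughly like this: perturb the row around its original values, weight the perturbed samples by how close they stay to the row, and fit a weighted linear model whose coefficients become the bars. This is a simplified LIME-style sketch in plain Python, not the LIME library and not the KNIME implementation. `black_box`, the feature scales, and the kernel width are assumptions for illustration only.

```python
import math
import random


def black_box(row):
    """Hypothetical model to explain: a sigmoid over two wine-like features."""
    alcohol, sulphites = row
    z = 0.8 * (alcohol - 10.5) + 3.0 * (sulphites - 0.6)
    return 1.0 / (1.0 + math.exp(-z))


def solve(matrix, rhs):
    """Solve a small linear system with Gauss-Jordan elimination and partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def local_linear_explanation(model, instance, scales, n_samples=500,
                             kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one row.

    Returns one weight per feature, expressed per `scale` unit of that feature.
    """
    rng = random.Random(seed)
    k = len(instance) + 1  # +1 for the intercept
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in instance]  # standardized perturbation
        sample = [x + s * zi for x, s, zi in zip(instance, scales, z)]
        weight = math.exp(-sum(zi * zi for zi in z) / kernel_width ** 2)
        features = [1.0] + z
        y = model(sample)
        # Accumulate the normal equations of weighted least squares.
        for i in range(k):
            xtwy[i] += weight * features[i] * y
            for j in range(k):
                xtwx[i][j] += weight * features[i] * features[j]
    coefficients = solve(xtwx, xtwy)
    return coefficients[1:]  # drop the intercept


# Made-up row (alcohol, sulphites) and made-up perturbation scales.
weights = local_linear_explanation(black_box, [10.5, 0.6], scales=[1.0, 0.15])
```

The two numbers in `weights` are what one small bar chart would show for this row: the sign tells you in which direction each feature pushes the prediction locally.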
Compare SHAP and Shapley Values:
This example compares a single explanation computed with these two similar strategies.
The explanations are compared via stacked bar charts, which take the form of force plots (scroll all the way down on the Hub page to see the picture).
The SHAP Summarizer is used because SHAP does not need as much training data as the Shapley Values loop does, and it is inefficient without this summarized version of the dataset, which downsizes the sampling table to 100 rows.
This view is hardcoded for the Titanic dataset, but it is a good example of how to visualize explanations interactively.
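To make the role of the sampling table more concrete, here is a small sketch of exact Shapley values for one prediction in plain Python. It enumerates every feature ordering, so it only works for a handful of features; the KNIME Shapley Values loop and SHAP rely on sampling and approximations instead. `toy_model`, the Titanic-like feature names, and all the numbers are made up.

```python
from itertools import permutations
from statistics import mean


def toy_model(row):
    """Hypothetical survival score, linear in three Titanic-like features."""
    age, fare, is_female = row
    return 0.2 + 0.5 * is_female + 0.002 * fare - 0.003 * age


def coalition_value(model, instance, background, present):
    """Average prediction when the features in `present` keep the instance's
    values and all other features are taken from the background rows."""
    predictions = []
    for bg_row in background:
        mixed = [instance[i] if i in present else bg_row[i]
                 for i in range(len(instance))]
        predictions.append(model(mixed))
    return mean(predictions)


def shapley_values(model, instance, background):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(instance)
    contributions = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        present = set()
        previous = coalition_value(model, instance, background, present)
        for i in order:
            present.add(i)
            current = coalition_value(model, instance, background, present)
            contributions[i] += current - previous
            previous = current
    return [c / len(orderings) for c in contributions]


# Made-up sampling table (age, fare, is_female) and a made-up row to explain.
background = [(22.0, 7.25, 0), (38.0, 71.28, 1), (26.0, 7.92, 1), (35.0, 53.1, 1)]
instance = (30.0, 50.0, 1)

phi = shapley_values(toy_model, instance, background)
baseline = mean(toy_model(row) for row in background)
```

The values in `phi` add up to the prediction for `instance` minus `baseline`, which is why they can be stacked as bars in a force plot: start at the baseline and add one contribution per feature until you reach the prediction. A smaller sampling table means fewer model calls per coalition, which is the motivation for summarizing it.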
Explain anomalies/outliers detected with H2O Isolation Forest using Shapley Values:
A visualization is not available here yet, but most approaches from the previous examples could be used.
Where to find more examples:
KNIME Machine Learning Interpretability Examples:
Verified Components category:
Examples adopting Integrated Deployment for explaining any ML algorithm with a single configurable component:
I was planning to write a blog post on all those examples at some point.
Let’s see if I find the time in 2021.