Explaining model predictions with Shapley values + conditional inference trees R package
www.nr.no
September 3rd, 2020
Annabelle Redelmeier & Martin Jullum
Explanation problem
► Suppose you have a black-box model that predicts the price of car insurance based on some features.
► How can we explain the prediction of the black-box model to a customer?
Figure: Janet's features (age, gender, type of car, # accidents, time since car registered) are fed into the black-box model, which predicts $123/month.
Explanation problem
► One way to explain a black-box model is to show how the features contribute to the overall prediction (i.e. $123).

Figure: bar chart decomposing the $123 prediction into contributions from Features 1–5 (axis from 0 to 150, with 123 marked).
► To calculate these contributions, we can use
Shapley values.
Why are explanations important?
► Engineer/scientist making the model: Are there problems
with the model? Edge cases? Biases?
► Society: Is the model fair? Legality?
► Individual: Do I trust the model prediction/outcome?
► Company/group using the model: Do customers trust me? Can I back the model up?

Adadi & Berrada (2018), "Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)"
To put explanations in context

|  | Model agnostic (usable with any ML model) | Model specific (tied to a model like xgboost or regression) |
|---|---|---|
| Local explanation (explain a specific prediction) | LIME, Shapley values, Explanation Vectors, Counterfactual explanations | Saliency maps |
| Global explanation (understand the whole logic of the model) | Partial dependence plots, Model distillation | Activation maximization, Decision trees, Rule lists |

This talk focuses on Shapley values: local, model-agnostic explanations.
Shapley values
► Shapley values originate in cooperative game theory (1953).
► Setting: a game with $M$ players cooperating to maximize the total gains of the game.
► Goal: distribute the total gains in a "fair" way:
► Axiom 1: Players that contribute nothing get payout = 0.
► Axiom 2: Two players that contribute the same get equal payout.
► Axiom 3: The sum of the payouts = total gains.
► Lloyd Shapley (1953) found the unique way to distribute the total gains that satisfies these three axioms.
Shapley values
► We assume player $j$ may contribute differently depending on whom they cooperate with.
► Suppose player $j$ cooperates with the players in group $S$.
► Then the marginal contribution of player $j$ to group $S$ is

$v(S \cup \{j\}) - v(S)$

where $v$ is the contribution function: $v(S \cup \{j\})$ is the contribution with player $j$, and $v(S)$ is the contribution without player $j$.
Shapley values

► The Shapley value of player $j$ (the payout for player $j$) is defined as

$\phi_j = \sum_{S \subseteq M \setminus \{j\}} w(S)\left[v(S \cup \{j\}) - v(S)\right]$

where $M$ is the set of all players in the game and $w(S)$ is the weight function, $w(S) = \frac{|S|!\,(|M| - |S| - 1)!}{|M|!}$.
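The definition above can be computed directly by summing over all subsets. A minimal brute-force sketch in Python (the toy game and the helper name `shapley_values` are made up for illustration; the talk's own implementation is an R package):

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values: for each player j, sum the weighted marginal
    contributions v(S ∪ {j}) - v(S) over all subsets S of M \\ {j}."""
    M = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                # Standard Shapley weight w(S) = |S|! (|M|-|S|-1)! / |M|!
                w = factorial(size) * factorial(M - size - 1) / factorial(M)
                total += w * (v(frozenset(S) | {j}) - v(frozenset(S)))
        phi[j] = total
    return phi

# Toy 2-player game: v(∅)=0, v({1})=1, v({2})=2, v({1,2})=4
gains = {frozenset(): 0, frozenset({1}): 1,
         frozenset({2}): 2, frozenset({1, 2}): 4}
phi = shapley_values([1, 2], gains.__getitem__)

# Axiom 3 (efficiency): the payouts sum to the total gains
assert abs(phi[1] + phi[2] - 4) < 1e-12
```

Here $\phi_1 = \tfrac{1}{2}(1-0) + \tfrac{1}{2}(4-2) = 1.5$ and $\phi_2 = 2.5$, matching the hand calculation.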
Example

► 2 players: $x_1$ and $x_2$.
► Then $x_1$'s Shapley value is

$\phi_{x_1} = \sum_{S \subseteq M \setminus \{x_1\}} w(S)\left[v(S \cup \{x_1\}) - v(S)\right]$

► The power set of $M \setminus \{x_1\}$ is $\mathcal{P}(M \setminus \{x_1\}) = \{\{x_2\}, \emptyset\}$, so

$\phi_{x_1} = w_1\left[v(\{x_2, x_1\}) - v(\{x_2\})\right] + w_2\left[v(\{\emptyset, x_1\}) - v(\emptyset)\right]$
How does this translate to ML?

► Given individual = Janet.
► Feature values are:
▪ Age = 55 years
▪ Gender = woman
▪ Type of car = Volvo
▪ # Accidents = 3
▪ Time since car registered = 3.2 years
► Predicted value = $123.

► The cooperative game becomes the individual.
► The players become the feature values of the given individual.
► The total gains become the predicted value of the given individual.
How does this translate to ML?

$\phi_j = \sum_{S \subseteq M \setminus \{j\}} w(S)\left[v(S \cup \{j\}) - v(S)\right]$

where $\phi_j$ is the contribution of feature $j$, $M$ is the set of all features in the ML model, $S$ is a subset of features, and $v(S)$ is the contribution of feature set $S$.

We have access to almost all we need. We're only missing $v(S)$…

$v(S) = E\left[f(x) \mid x_S = x_S^*\right]$

where $f$ is the ML model and $M$ is the set of all features. Here we condition on the features in $S$ being equal to the feature values of the individual (Janet).

Example: $S$ = {Age, gender}, $\bar{S}$ = {Type of car, # Accidents, Time since registration}.
Problems with Shapley values in ML

$\phi_j = \sum_{S \subseteq M \setminus \{j\}} w(S)\left[v(S \cup \{j\}) - v(S)\right]$

1. Calculating $\phi_j$ when $M$ is large:
• If $|M| = 10$, there are $2^{10} = 1024$ subsets.
• If $|M| = 30$, there are $2^{30} \approx 1.1$ billion!
2. Estimating $v(S)$.
2) Computing $v(S)$

► $v(S)$ is rarely known.
▪ (Lundberg & Lee, 2017a) assume the features are independent.
▪ $v(S)$ can then be estimated by sampling from the full data set and calculating an average.
▪ (Aas et al., 2019) estimate $v(S)$ parametrically and non-parametrically with various methods.
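Under the independence assumption, estimating $v(S)$ reduces to plugging in Janet's values for the features in $S$ and averaging the model output over the remaining features sampled from the data. A sketch under invented assumptions (the model `f`, the data, and `x_star` are all made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: f(x) = 3*x0 + 2*x1 + x2
def f(X):
    return 3 * X[:, 0] + 2 * X[:, 1] + X[:, 2]

# Training data with independent standard-normal features
X_train = rng.normal(size=(5000, 3))
x_star = np.array([1.0, -0.5, 2.0])  # the individual's feature vector

def v_independence(S, x_star, X_train, f):
    """Estimate v(S) = E[f(x) | x_S = x_S*] assuming feature independence:
    fix the S-features at x_S*, sample the rest from the training data."""
    X = X_train.copy()
    X[:, S] = x_star[S]          # condition on the features in S
    return f(X).mean()           # average over the sampled complements

v_S = v_independence([0, 2], x_star, X_train, f)
# For this linear model with mean-zero independent features,
# v({0, 2}) ≈ 3*1.0 + 1*2.0 = 5.0
```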
2) Computing $v(S)$

► (Aas et al., 2019)'s methods to estimate $v(S)$:

1. Empirical method:
   1. Calculate a distance $D_S(x^*, x_i)$ between the feature values being explained and every training instance.
   2. Use this distance to calculate a weight $w_S(x^*, x_i) = \exp\left(-\frac{D_S(x^*, x_i)^2}{2\sigma^2}\right)$ for each training instance.
   3. Estimate

      $v(S) \approx \frac{\sum_k w_S(x^*, x_k)\, f(x_{\bar{S}}^k, x_S^*)}{\sum_k w_S(x^*, x_k)}$

      where $f$ is the ML model and the sum runs over either samples or all training instances.
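The three steps above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the model `f` and data are invented, and a plain Euclidean distance on the $S$-features stands in for the paper's distance measure:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(X):  # hypothetical black-box model
    return X[:, 0] * X[:, 1] + X[:, 2]

X_train = rng.normal(size=(2000, 3))
x_star = np.array([1.0, 1.0, 0.0])

def v_empirical(S, x_star, X_train, f, sigma=0.5):
    """Empirical estimate of v(S): Gaussian-kernel weights from the
    distance on the S-features, then a weighted average of f."""
    S = np.asarray(S)
    # Step 1: distance between x* and each training instance on S
    D = np.linalg.norm(X_train[:, S] - x_star[S], axis=1)
    # Step 2: kernel weight w_S(x*, x_i) = exp(-D^2 / (2 sigma^2))
    w = np.exp(-D**2 / (2 * sigma**2))
    # Step 3: combine each instance's S-bar part with x*'s S part
    X = X_train.copy()
    X[:, S] = x_star[S]
    return np.sum(w * f(X)) / np.sum(w)

v_S = v_empirical([0], x_star, X_train, f)
```

For this `f`, conditioning on $x_0 = 1$ with independent features gives $v(\{0\}) \approx E[x_1 + x_2] = 0$.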
2) Computing $v(S)$

► (Aas et al., 2019)'s methods to estimate $v(S)$:

2. Gaussian method:
   1. Assume the features are jointly Gaussian. This means we have an explicit form for the conditional distribution $p(x_{\bar{S}} \mid x_S = x_S^*)$.
   2. Estimate the conditional mean and covariance matrix.
   3. Sample $k$ times from this conditional Gaussian distribution with the estimated mean and covariance matrix.
   4. Estimate

      $v(S) \approx \frac{1}{k} \sum_{i=1}^{k} f(x_{\bar{S}}^{(i)}, x_S^*)$

      where $f$ is the ML model and the $x_{\bar{S}}^{(i)}$ are the samples from the conditional Gaussian.
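The Gaussian method uses the standard closed form for conditioning a multivariate normal. A sketch with an invented model and two correlated features:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(X):  # hypothetical black-box model
    return X.sum(axis=1)

# Correlated Gaussian training data (rho = 0.8)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
X_train = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

def v_gaussian(S, x_star, X_train, f, k=1000):
    """Gaussian estimate of v(S): sample x_Sbar from the conditional
    normal p(x_Sbar | x_S = x_S*) fitted to the training data."""
    S = np.asarray(S)
    Sbar = np.setdiff1d(np.arange(X_train.shape[1]), S)
    mu = X_train.mean(axis=0)
    Sigma = np.cov(X_train, rowvar=False)
    # Conditional mean and covariance of a multivariate Gaussian
    A = Sigma[np.ix_(Sbar, S)] @ np.linalg.inv(Sigma[np.ix_(S, S)])
    mu_c = mu[Sbar] + A @ (x_star[S] - mu[S])
    Sigma_c = Sigma[np.ix_(Sbar, Sbar)] - A @ Sigma[np.ix_(S, Sbar)]
    samples = rng.multivariate_normal(mu_c, Sigma_c, size=k)
    # Combine sampled S-bar parts with the fixed x_S* part and average f
    X = np.tile(x_star, (k, 1))
    X[:, Sbar] = samples
    return f(X).mean()

v_S = v_gaussian([0], np.array([1.0, 0.0]), X_train, f)
# With rho = 0.8, E[x1 | x0 = 1] ≈ 0.8, so v({0}) ≈ 1.0 + 0.8 = 1.8
```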
2) Computing $v(S)$

► The problem is that (Aas et al., 2019)'s methods assume the features are continuously distributed.

We extend (Aas et al., 2019)'s methods to handle mixed (i.e. continuous, categorical, ordinal) features using conditional inference trees.
► ctree (Hothorn et al., 2006) is a tree-fitting statistical model, like CART and C4.5.
▪ What is so great about a tree?
► Difference from CART/C4.5:
▪ ctree chooses the splitting feature and split point using hypothesis tests.
How do we build a ctree?

1. Test each feature with the partial hypothesis test $H_0^j: F(Y \mid X_j) = F(Y)$.
2. If the global $p$-value is $< \alpha$, choose the feature that is most dependent on $Y$ (smallest $p$-value).
3. Find a split point based on this feature (with your favourite splitting algorithm).
Can handle multivariate responses!
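The feature-selection step can be sketched as below. This is a toy stand-in, not the ctree algorithm itself: a Pearson correlation test replaces ctree's permutation tests, and a Bonferroni correction gives the global $p$-value; the data and names are invented:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# y depends strongly on feature 0, weakly on 1, not at all on 2
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=500)

def select_split_feature(X, y, alpha=0.05):
    """ctree-style split selection: test each feature's association
    with y, form a Bonferroni-adjusted global p-value, and split on
    the feature with the smallest p-value (strongest dependence)."""
    m = X.shape[1]
    pvals = np.array([stats.pearsonr(X[:, j], y)[1] for j in range(m)])
    global_p = min(1.0, pvals.min() * m)  # Bonferroni-adjusted
    if global_p >= alpha:
        return None                       # stop: no significant dependence
    return int(pvals.argmin())

split_feature = select_split_feature(X, y)
```

With this data the strongly associated feature 0 is selected.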
► To estimate $v(S)$:

1. We fit a ctree where the tree features are the features in $S$ and the tree response is the set of features not in $S$:

   $x_{\bar{S}} \sim x_S$ (response ~ features)

Example: $S$ = {Age, gender}, $\bar{S}$ = {Type of car, # Accidents, Time since registration}.
2. Given the tree, we find the leaf node based on Janet's feature values: $S$ = {Age = 55, gender = woman}.

Other training observations in the leaf:

| gender | age | car | accidents | time since registration |
|--------|-----|-------|-----------|-------------------------|
| woman | 55 | Volvo | 0 | 1 |
| woman | 54 | Tesla | 2 | 1 |
| woman | 52 | BMW | 1 | 0 |
| woman | 60 | BMW | 3 | 2 |
| woman | 55 | Nissan | 0 | 1 |
3. We sample from the leaf and use these samples to estimate

$v(S) \approx \frac{1}{k} \sum_{i=1}^{k} f(x_{\bar{S}}^{(i)}, x_S^*)$

where $f$ is the ML model and the $x_{\bar{S}}^{(i)}$ are sampled from the training observations in the leaf.
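Steps 2–3 can be sketched numerically. The model `f` and the encoded leaf values are invented (the car column is dropped for simplicity so the toy model stays numeric); the leaf rows mirror the example above:

```python
import numpy as np

def f(X):
    # Hypothetical black-box model on numeric columns
    # [age, accidents, time_since_registration]
    return 100 + 0.5 * X[:, 0] + 10 * X[:, 1] - 5 * X[:, 2]

# Training observations that landed in Janet's leaf
# (columns: age, accidents, time since registration)
leaf = np.array([[55, 0, 1.0],
                 [54, 2, 1.0],
                 [52, 1, 0.0],
                 [60, 3, 2.0],
                 [55, 0, 1.0]])

def v_ctree(leaf, S_cols, x_star_S, f, k=1000, seed=0):
    """Estimate v(S) by resampling rows from the leaf, overwriting the
    S-columns with Janet's values, and averaging the model output."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(leaf), size=k)
    X = leaf[idx].copy()
    X[:, S_cols] = x_star_S      # condition on Janet's S-features
    return f(X).mean()

# Condition on S = {age = 55}; accidents and registration time
# are drawn from the leaf's empirical distribution
v_S = v_ctree(leaf, [0], [55.0], f)
```

Because the S-features are resampled within the leaf, dependence between the conditioning features and the rest is respected, and categorical columns pose no problem.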
Simulation studies

1. Simulate dependent categorical data to act as our features.
2. Define a linear response model with fixed coefficients,

$y = \beta_0 + \sum_{j} \sum_{c} \beta_{jc}\, \mathbb{1}(x_j = c) + \varepsilon, \quad \varepsilon \sim N(0, 1)$

where $\mathbb{1}(\cdot)$ is the indicator function.
3. Convert the categorical data to numerical data using one-hot encoding so that we can use the methods in (Aas et al., 2019).
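Steps 1–3 can be sketched as below. The generating mechanism (thresholded correlated Gaussians) and all coefficients are made-up stand-ins for the study's actual setup:

```python
import numpy as np

rng = np.random.default_rng(4)
n, levels = 1000, 3

# 1. Dependent categorical features: threshold correlated Gaussians
z = rng.multivariate_normal([0, 0], [[1, 0.7], [0.7, 1]], size=n)
X_cat = np.digitize(z, [-0.5, 0.5])          # category labels in {0, 1, 2}

# 2. Linear response via indicators, with fixed (made-up) coefficients
beta = np.array([[0.0, 1.0, 2.0],            # feature 0, levels 0..2
                 [0.0, -1.0, 0.5]])          # feature 1, levels 0..2
y = sum(beta[j, X_cat[:, j]] for j in range(2)) + rng.normal(size=n)

# 3. One-hot encode so continuous-feature methods can be applied
X_onehot = np.concatenate(
    [(X_cat[:, j:j + 1] == np.arange(levels)).astype(float)
     for j in range(2)],
    axis=1)
```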
Simulation studies

4. Calculate the true Shapley values using the true conditional expectations.
5. Evaluate the methods using the mean absolute error (MAE):

$\mathrm{MAE} = \frac{1}{T} \sum_{i=1}^{T} \frac{1}{M} \sum_{j=1}^{M} \left| \phi_j(x_i) - \hat{\phi}_j(x_i) \right|$

where $T$ is the number of test observations, $M$ is the number of features, $\phi_j(x_i)$ is the true Shapley value of feature $j$, and $\hat{\phi}_j(x_i)$ is its estimate.
Simulation study 1

Figure: mean absolute error as a function of the dependence of the features, comparing the methods of (Lundberg & Lee, 2017a) and (Aas et al., 2019).
Computation time

► Measured over all dependence levels $\rho$ and test observations.
► Note: the Gaussian method is slow because
▪ it has to call the "predict" function more often than the empirical and ctree methods, and
▪ it requires matrix inversions and sampling.
Simulation study 2

Figure: mean absolute error as a function of the dependence of the features, comparing the methods of (Lundberg & Lee, 2017a) and (Aas et al., 2019).
Computation time

► Measured over all dependence levels $\rho$ and test observations.
Limitations

► The ctree/Gaussian/empirical methods cannot be used for more than 23 (30?) features due to computational problems (regardless of one-hot encoding…). What can we do to improve this?
▪ GroupSHAP?
▪ New approach to sampling: (Grah and Thouvenot, 2020)¹?

¹ "A Projected Stochastic Gradient Algorithm for Estimating Shapley Value Applied in Attribute Importance"