
The Mythos of Model Interpretability

Zachary C. Lipton

https://arxiv.org/abs/1606.03490

Outline

• What is interpretability?

• What are its desiderata?

• What model properties confer interpretability?

• Caveats, pitfalls, and takeaways

What is Interpretability?

• Many papers make axiomatic claims that some model is interpretable and therefore preferable

• But what interpretability is and precisely what desiderata it serves are seldom defined

• Does interpretability hold consistent meaning across papers?

Inconsistent Definitions

• Papers use the words interpretable, explainable, intelligible, transparent, and understandable, both interchangeably (within papers) and inconsistently (across papers)

• One common thread, however, is that interpretability is something other than performance

We want good models

(Slide diagram; label: Evaluation Metric)

We also want interpretable models

(Slide diagram; labels: Evaluation Metric, Interpretation)

The Human Wants Something the Metric Doesn’t

(Slide diagram; labels: Evaluation Metric, Interpretation)

What Gives?

• So either the metric captures everything and people seeking interpretable models are crazy, or…

• The metrics / loss functions we optimize are fundamentally mismatched with real-life objectives

• We hope to refine the discourse on interpretability, introducing more specific language

• Through the lens of the literature, we create a taxonomy of both objectives & methods

Outline

• What is interpretability?

• What are its desiderata?

• What model properties confer interpretability?

• Caveats, pitfalls, and takeaways

Trust

• Does the model know when it’s uncertain? (see the calibration sketch after this list)

• Does the model make the same mistakes as a human?

• Are we comfortable with the model?
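A minimal sketch of how the first question might be probed empirically, assuming scikit-learn; the data and model here are stand-ins chosen only for illustration:

```python
# Minimal calibration check: does predicted confidence match empirical accuracy?
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.calibration import calibration_curve

X, y = make_classification(n_samples=5000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

# Bin predictions by confidence and compare to the observed frequency of the
# positive class; a well-calibrated model lies near the diagonal.
frac_pos, mean_pred = calibration_curve(y_te, probs, n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")
```

Calibration is only one proxy for "knowing when it's uncertain," but it is cheap to check and easy to communicate.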

Causality

• We may want models to tell us something about the natural world

• Supervised models are trained simply to make predictions, but often used to take actions

• Caruana et al. (2015) show a pneumonia mortality predictor (for use in triage) that assigns lower risk to asthma patients

Transferability

• The idealized training setup often differs from the real world

• The real problem may be non-stationary, noisier, etc.

• We want sanity checks that the model doesn’t depend on weaknesses of the training setup

Informativeness

• We may train a model to make a decision

• But its real purpose is to aid a person in making a decision

• Thus an interpretation may be valuable simply for the extra bits it carries

Outline

• What is interpretability?

• What are its desiderata?

• What model properties confer interpretability?

• Caveats, pitfalls, and takeaways

Transparency

• Proposed techniques for conferring interpretability tend to fall into two categories

• Transparency addresses understanding how the model works

• Explainability concerns the model’s ability to offer some (potentially post-hoc) explanation

Simulatability

• One notion of transparency is simplicity

• This accords with papers advocating small decision trees

• A model is transparent if a person can step through the algorithm in reasonable time (a small worked example follows)
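A minimal sketch of a simulatable model in this sense, assuming scikit-learn (the iris data is a stand-in chosen only for illustration):

```python
# A depth-2 decision tree is small enough for a person to step through by hand.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned rules; each path is a human-checkable if/else chain.
print(export_text(tree, feature_names=iris.feature_names))
```

The printed rule list is short enough that a person can trace any prediction by hand, which is the point of simulatability.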

Decomposability

• A relaxed notion requires understanding individual components of a model

• Such as: the weights of a linear model or the nodes of a decision tree (see the sketch below)
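A minimal sketch of reading such components, assuming scikit-learn on synthetic data (all names are illustrative):

```python
# Each coefficient of a (standardized) linear model can be read on its own:
# sign gives direction of association, magnitude gives strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=4, random_state=0)
X = StandardScaler().fit_transform(X)  # put features on a common scale

model = LinearRegression().fit(X, y)
for i, w in enumerate(model.coef_):
    print(f"feature_{i}: weight = {w:+.2f}")
```

The usual caveat applies: per-weight readings are only meaningful when features are on comparable scales and not strongly correlated.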

Transparent Algorithms

• A yet weaker notion requires only that we understand the behavior of the learning algorithm

• E.g. convergence of convex optimization, generalization bounds (one such guarantee is shown below)
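As a hedged illustration of such a guarantee, a standard textbook bound (not specific to this talk): for a convex, L-smooth function f minimized by gradient descent with step size 1/L,

```latex
f(x_k) - f(x^\star) \;\le\; \frac{L \,\lVert x_0 - x^\star \rVert^2}{2k}
```

so suboptimality decays at rate O(1/k), a statement about the algorithm's behavior that holds regardless of what the learned model looks like.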

Post-Hoc Interpretability

“Ah yes, something cool is happening in node 750,345,167… maybe it sees a cat?”

“Maybe we’ll see something awesome if we jiggle the inputs?”
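The "jiggle the inputs" idea can be made concrete as perturbation-based sensitivity analysis. A minimal sketch, assuming NumPy; `predict` is a hypothetical stand-in for any trained model's scoring function:

```python
# Perturbation-based sensitivity: nudge each input feature and measure how
# much the prediction moves.
import numpy as np

def predict(x):
    # Stand-in black box: any model's real-valued output would go here.
    return np.tanh(x @ np.array([0.5, -2.0, 0.1]))

def sensitivity(x, eps=1e-3):
    base = predict(x)
    scores = np.zeros_like(x)
    for i in range(len(x)):
        nudged = x.copy()
        nudged[i] += eps
        scores[i] = (predict(nudged) - base) / eps  # finite-difference slope
    return scores

x = np.array([1.0, 0.5, -1.0])
print(sensitivity(x))  # larger magnitude => prediction more sensitive here
```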

Verbal Explanations

• Just as people generate explanations (absent transparency), we might train a (possibly separate) model to generate explanations

• We might consider image captions as interpretations of object predictions

(Image: Karpathy et al 2015)

Saliency Maps

• While the full relationship between input and output might be impossible to describe succinctly, local explanations are potentially useful (see the sketch below)

(Image: Wang et al 2016)
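A minimal sketch of one common local explanation, a gradient-based saliency map, assuming PyTorch and a tiny stand-in network (not the method from the cited figure):

```python
# Gradient saliency: |d output / d input| highlights which input entries
# most affect the prediction locally. The network here is a tiny stand-in.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 10, requires_grad=True)
score = model(x).sum()
score.backward()

saliency = x.grad.abs().squeeze()  # per-feature local importance
print(saliency)
```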

Case-Based Explanations

• Another way to generate a post-hoc explanation might be to retrieve labeled items that are deemed similar by the model (see the sketch below)

• For some models, we can retrieve histories from similar patients

(Image: Mikolov et al 2014)
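A minimal sketch of that retrieval step, assuming scikit-learn; the embedding matrix stands in for whatever representation the model has learned (names are illustrative):

```python
# Case-based explanation: to explain a prediction on a query point, show the
# training examples closest to it in the model's representation space.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 32))   # stand-in for learned representations
labels = rng.integers(0, 2, size=100)     # stand-in training labels

index = NearestNeighbors(n_neighbors=3).fit(embeddings)
query = rng.normal(size=(1, 32))
_, idx = index.kneighbors(query)

print("Most similar training cases:", idx[0], "with labels", labels[idx[0]])
```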

Outline

• What is interpretability?

• What are its desiderata?

• What model properties confer interpretability?

• Caveats, pitfalls, and takeaways

Discussion Points

• Linear models are not strictly more interpretable than deep learning

• Claims about interpretability must be qualified

• Transparency may be at odds with the goals of AI

• Post-hoc interpretations may potentially mislead

Thanks!

Acknowledgments: Zachary C. Lipton was supported by the Division of Biomedical Informatics at UCSD, via training grant (T15LM011271) from the NIH/NLM. Thanks to Charles Elkan, Julian McAuley, David Kale, Maggie Makar, Been Kim, Lihong Li, Rich Caruana, Daniel Fried, Jack Berkowitz, & Sepp Hochreiter

References:

The Mythos of Model Interpretability (ICML Workshop on Human Interpretability 2016) - ZC Lipton. https://arxiv.org/abs/1606.03490

Directly Modeling Missing Data with RNNs (MLHC 2016) - ZC Lipton, DC Kale, R Wetzel. https://arxiv.org/abs/1606.04130

Learning to Diagnose (ICLR 2016) - ZC Lipton, DC Kale, C Elkan, R Wetzel. https://arxiv.org/abs/1511.03677

Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-day Readmission (2015) - R Caruana et al. http://dl.acm.org/citation.cfm?id=2788613