Causal discovery and prediction mechanisms

Shohei Shimizu Shiga University / Osaka University / RIKEN AIP Causal discovery and prediction mechanisms France/Japan Machine Learning Workshop Paris, September, 2017

Transcript of Causal discovery and prediction mechanisms

Page 1: Causal discovery and prediction mechanisms

Shohei Shimizu
Shiga University / Osaka University / RIKEN AIP

Causal discovery and prediction mechanisms

France/Japan Machine Learning Workshop, Paris, September 2017

Page 2: Causal discovery and prediction mechanisms

Introduction


Page 3: Causal discovery and prediction mechanisms

Causal discovery

• Unsupervised learning of causal relations
• Estimate the causal structure under some assumptions
  – Typically, directed acyclic graph (DAG)

[Figure: Data (variables × observations) + Assumptions → Discovery → causal graph over x1, ..., x6]

Page 4: Causal discovery and prediction mechanisms

Correlation does not imply causation

• Correlation: Countries eating more chocolates have more Nobel laureates

• Causation: Increasing a country's chocolate consumption increases its number of Nobel laureates

[Figure: scatter plot of Nobel against Chocolate (Corr. 0.791), with causal graphs over Chclt and Nobel in which GDP acts as a hidden common cause]

Messerli12NEJM; Maurage+13JNutrition
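The hidden-common-cause reading can be made concrete with a small simulation. The sketch below is an illustration only: the model (GDP → Chclt, GDP → Nobel), the coefficients and the Gaussian noise are assumptions chosen for the example, not estimates from the chocolate/Nobel data. The two observed variables correlate strongly, yet setting chocolate consumption by intervention leaves the Nobel variable unaffected.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def simulate(n, do_chclt=None, seed=0):
    """Sample (chclt, nobel) from a hypothetical model GDP -> Chclt, GDP -> Nobel.

    If do_chclt is given, it is called with the random generator and returns
    the intervened chocolate value, which cuts the GDP -> Chclt edge.
    """
    rng = random.Random(seed)
    chclt, nobel = [], []
    for _ in range(n):
        gdp = rng.gauss(0, 1)                 # hidden common cause
        c = gdp + 0.5 * rng.gauss(0, 1)       # observational mechanism for Chclt
        if do_chclt is not None:
            c = do_chclt(rng)                 # the intervention overrides it
        nb = gdp + 0.5 * rng.gauss(0, 1)      # Nobel never depends on Chclt
        chclt.append(c)
        nobel.append(nb)
    return chclt, nobel

obs_c, obs_n = simulate(5000)
int_c, int_n = simulate(5000, do_chclt=lambda rng: rng.uniform(-2, 2))
print("observational correlation:", round(corr(obs_c, obs_n), 3))
print("correlation under do(Chclt):", round(corr(int_c, int_n), 3))
```

In this assumed model the observational correlation is high while the correlation under intervention is close to zero, which is the gap between the two bullet points above.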

Page 5: Causal discovery and prediction mechanisms

Conventional applications

• Epidemiology (Rosenstrom et al., 2012): do sleep problems cause depression mood, or does depression mood cause sleep problems?
• Economics (Moneta et al., 2012): relations among OpInc.gr, Empl.gr, Sales.gr and R&D.gr at times t, t+1 and t+2
• Neuroscience (Boukrina & Graves, 2013)
• Chemistry (Campomanes et al., 2014)

Page 6: Causal discovery and prediction mechanisms

New application

• Analyze the mechanisms of predictive models (Blobaum & Shimizu, MLSP2017)

• See what happens when intervening on features
  – Having changed the value (or distribution) of chocolate consumption of a country
  – Not comparing the chocolate consumptions of two different countries


Page 7: Causal discovery and prediction mechanisms

Example

• Predictive model: $\hat{y} = f(x_1, x_2)$

• Intervention effects of features on prediction depend on the causal structure

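[Figure: three causal graphs over $X_1$, $X_2$ and $\hat{Y}$, with coefficients 5 and -3 on the edges into $\hat{Y}$ and a coefficient 2 on the edge between $X_1$ and $X_2$, structural equation $x_2 = 2x_1 + e_2$]

$\frac{d}{dx_1} E(\hat{Y} \mid do(X_1)) = -1$ or $\frac{d}{dx_1} E(\hat{Y} \mid do(X_1)) = 5$, depending on the causal structure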
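The two values -1 and 5 can be reproduced with a short calculation. The sketch below assumes a linear predictor $\hat{y} = 5x_1 - 3x_2$ and three candidate linear structures (X1 causes X2 with coefficient 2, no edge, and X2 causes X1); this reading of the figure is an assumption, and the function names are hypothetical. For a linear acyclic model, the effect of do(x_1) on the prediction is the sum over features of the predictor weight times the total causal effect of x_1 on that feature.

```python
def total_effect(B, cause, target):
    """Total causal effect of `cause` on `target` in a linear acyclic model.

    B[child][parent] is the direct coefficient in
    x_child = sum_parent B[child][parent] * x_parent + e_child.
    The result sums the products of coefficients over all directed paths.
    Returning 1.0 at the cause itself also ignores its parents, which is
    what an intervention on the cause does.
    """
    if target == cause:
        return 1.0
    return sum(coef * total_effect(B, cause, parent)
               for parent, coef in B.get(target, {}).items())

def prediction_effect(B, weights, cause):
    """d/dx_cause E(y_hat | do(x_cause)) for y_hat = sum_j weights[j] * x_j."""
    return sum(w * total_effect(B, cause, feature)
               for feature, w in weights.items())

weights = {"x1": 5.0, "x2": -3.0}      # assumed predictor: y_hat = 5*x1 - 3*x2

x1_causes_x2 = {"x2": {"x1": 2.0}}     # assumed: x2 = 2*x1 + e2
no_edge = {}                           # assumed: x1 and x2 causally unrelated
x2_causes_x1 = {"x1": {"x2": 2.0}}     # assumed: x1 = 2*x2 + e1

print(prediction_effect(x1_causes_x2, weights, "x1"))  # -1.0
print(prediction_effect(no_edge, weights, "x1"))       # 5.0
print(prediction_effect(x2_causes_x1, weights, "x1"))  # 5.0
```

When X1 causes X2, the intervention also moves x_2, so the effect is 5 + (-3)(2) = -1. In the other two structures only the direct weight 5 remains, although the predictive model is identical in all three cases.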

Page 8: Causal discovery and prediction mechanisms

Analysis of predictive mechanisms

• Combine the causal model and predictive model to model the prediction mechanism

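[Figure: three graphs over $X_1$, $X_2$, $X_3$, $X_4$: the causal model (with $Y$), the predictive model (with $\hat{Y}$), and the prediction mechanism model (with both $Y$ and $\hat{Y}$)]

• Causal model: $x_4 = f_4(y, e_4)$, ⋮
• Predictive model: $\hat{y} = f(x_1, x_2, x_3, x_4)$ (Lasso, SVM, Decision Tree, Deep Learning, etc.)
• Prediction mechanism model: $E(\hat{y} \mid do(x_i = c))$, $p(\hat{y} \mid do(x_i = c))$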
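One way to read the combination of the two models is as a forward simulation: sample the features from the causal model with one feature fixed by intervention, then pass the samples through the predictive model. The sketch below is a minimal illustration under stated assumptions. The causal model, its coefficients and the stand-in predictor `predict` are all hypothetical, and any fitted model could take the predictor's place.

```python
import random

def sample(rng, do=None):
    """Draw one sample from a hypothetical causal model over x1..x4 and y.

    `do` maps variable names to fixed values; an intervened variable ignores
    its own mechanism but still feeds its children.
    """
    do = do or {}
    v = {}

    def assign(name, value):
        # The mechanism is always evaluated, so the random stream stays
        # aligned across different interventions (common random numbers).
        v[name] = do.get(name, value)

    assign("x1", rng.gauss(0, 1))
    assign("x2", 0.8 * v["x1"] + rng.gauss(0, 1))
    assign("x3", rng.gauss(0, 1))
    assign("y", v["x2"] + v["x3"] + rng.gauss(0, 1))
    assign("x4", 0.5 * v["y"] + rng.gauss(0, 1))   # x4 is an effect of y
    return v

def predict(v):
    """Stand-in for a fitted predictive model y_hat = f(x1, x2, x3, x4)."""
    return 0.2 * v["x1"] + 0.5 * v["x2"] + 0.5 * v["x3"] + 0.6 * v["x4"]

def predictions_under(do, n=20000, seed=0):
    """Samples from p(y_hat | do(...)) obtained by forward simulation."""
    rng = random.Random(seed)
    return [predict(sample(rng, do)) for _ in range(n)]

def mean(values):
    return sum(values) / len(values)

# Monte Carlo estimates of E(y_hat | do(x1 = c)) for two values of c.
for c in (0.0, 1.0):
    print(c, mean(predictions_under({"x1": c})))
```

In this assumed model a unit change in x1 shifts the expected prediction by 0.2 + 0.5(0.8) + 0.6(0.5)(0.8) = 0.84, larger than the predictor's own weight 0.2 on x1, because the intervention propagates to x2, y and x4. The returned samples can also be summarized as a histogram to approximate the full distribution of the prediction under the intervention.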

Page 9: Causal discovery and prediction mechanisms

Illustrative example

• Auto-MPG (miles per gallon) dataset
• Linear regression
• Which variable has the greatest intervention effect on MPG prediction?
• On which variable should we intervene to obtain a certain MPG prediction?

[Figure: causal graph over Cylinders, Displacement, Weight, Horsepower, Acceleration and MPG, together with the prediction $\widehat{MPG}$]

Desired MPG prediction    Suggested intervention on cylinders
15                        8
21                        6
30                        4

Page 10: Causal discovery and prediction mechanisms

Causal discovery

Key idea

[Figure: Data (variables × observations) + Assumptions → Discovery → causal graph over x1, ..., x6]

Page 11: Causal discovery and prediction mechanisms

Difficulty of causal discovery

• Many causal models give the same data distribution

[Figure: three candidate causal graphs over Chclt and Nobel, each with GDP as a hidden common cause, next to the scatter plot of Nobel against Chocolate (Corr. 0.791)]

Page 12: Causal discovery and prediction mechanisms

Linear Non-Gaussian Acyclic Model (Shimizu+06JMLR)

• Linear DAG with non-Gaussian and independent errors

• Identifiable: causal directions and coefficients
• Various extensions

– Nonlinearity (Hoyer+08NIPS, Zhang+09UAI)

– Cyclicity (Lacerda+08UAI)

– Hidden common causes (Hoyer+08IJAR; Henao+10JMLR; Shimizu+14JMLR)

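$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$

[Figure: example graph over x1, x2, x3 with coefficients $b_{21}$, $b_{23}$, $b_{13}$ and errors $e_1$, $e_2$, $e_3$]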

Page 13: Causal discovery and prediction mechanisms

Different causal directions give different data distributions

• Use the idea of Blind Source Separation (ICA: Independent Component Analysis)

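x1 → x2:

$\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}}_{\mathbf{s}}$

x2 → x1:

$\underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix}}_{\mathbf{A}} \underbrace{\begin{bmatrix} e_1 \\ e_2 \end{bmatrix}}_{\mathbf{s}}$

[Figure: for each direction, the graph over x1 and x2 with errors e1 and e2, and the scatter plot of x2 against x1; the zero entry of A sits in a different position in the two models]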
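That the two directions give different distributions can be checked numerically. The sketch below is not the ICA-based estimation named on the slide. It swaps in a simpler pairwise check: regress each variable on the other and see in which direction the residual looks independent of the regressor. The dependence measure, the absolute correlation of squared values, is a crude proxy chosen for brevity. It works here because the assumed uniform errors have non-zero excess kurtosis, and a kernel-based independence measure would be more general. All names and parameter values are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [u - m for u in v]

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    a, b = center(a), center(b)
    num = sum(u * w for u, w in zip(a, b))
    return num / (sum(u * u for u in a) * sum(w * w for w in b)) ** 0.5

def residual(y, x):
    """Residual of the least-squares regression of y on x."""
    x, y = center(x), center(y)
    slope = sum(u * w for u, w in zip(x, y)) / sum(u * u for u in x)
    return [w - slope * u for u, w in zip(x, y)]

def dependence(a, b):
    """Crude dependence proxy: |corr(a^2, b^2)| on centered data."""
    a, b = center(a), center(b)
    return abs(corr([u * u for u in a], [w * w for w in b]))

def infer_direction(x1, x2):
    """Pick the direction whose residual looks more independent of the cause."""
    forward = dependence(x1, residual(x2, x1))    # candidate model x1 -> x2
    backward = dependence(x2, residual(x1, x2))   # candidate model x2 -> x1
    return "x1 -> x2" if forward < backward else "x2 -> x1"

# Simulated data from x1 = e1, x2 = b21 * x1 + e2 with uniform (non-Gaussian) errors.
rng = random.Random(0)
b21 = 1.0
x1 = [rng.uniform(-1, 1) for _ in range(5000)]
x2 = [b21 * u + rng.uniform(-1, 1) for u in x1]
print(infer_direction(x1, x2))
```

With Gaussian errors both residuals would be independent of their regressors and the comparison would be uninformative, which is why the non-Gaussianity assumption matters.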

Page 14: Causal discovery and prediction mechanisms

Further examples

• Hidden common causes: Overcomplete ICA
  – Hoyer+08IJAR, Henao+11JMLR, Shimizu+14JMLR

$x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{q=1}^{Q} \lambda_{iq} u_q + e_i$

• Time-series: Blind deconvolution
  – Swanson+97, Hyvarinen+10JMLR
  – Subsampling (Gong+15ICML)

$\mathbf{x}(t) = \sum_{\tau=0}^{k} \mathbf{B}_\tau \mathbf{x}(t-\tau) + \mathbf{e}(t)$

Page 15: Causal discovery and prediction mechanisms

Conclusion


Page 16: Causal discovery and prediction mechanisms

Conclusion

• Causal discovery
  – Unsupervised learning of causal relations
  – Capable of identifying the causal structure under some assumptions
• Analysis of prediction mechanisms
  – A new application of causal discovery
  – Combining causal models and predictive models to model their prediction mechanisms



Page 18: Causal discovery and prediction mechanisms

Data-driven approach

1. Make assumptions on causal relations
2. Derive constraints that should hold in data
3. Find the best model(s) that satisfy the constraints actually holding in data

Three candidates: a. b. c.

[Figure: three candidate causal graphs over Chclt and Nobel]

If Chclt and Nobel are actually independent, select the rightmost causal graph
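The three steps can be sketched for this two-variable case. The code below makes several assumptions that are not on the slide: the candidates are taken to be a. Chclt → Nobel, b. Nobel → Chclt and c. no edge, so that only c implies independence. Independence is approximated by a permutation test of zero correlation, which is a simplification. The function names and the threshold `alpha` are hypothetical.

```python
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5

def permutation_pvalue(x, y, n_perm=1000, seed=0):
    """p-value for the null hypothesis of zero correlation, by permuting y."""
    rng = random.Random(seed)
    observed = abs(corr(x, y))
    y = list(y)                      # copy so the caller's data is not shuffled
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if abs(corr(x, y)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_candidates(chclt, nobel, alpha=0.05):
    """Keep the candidates whose implied constraint matches the data.

    Assumed candidates: a. Chclt -> Nobel, b. Nobel -> Chclt, c. no edge.
    Only c implies that Chclt and Nobel are independent.
    """
    independent = permutation_pvalue(chclt, nobel) > alpha
    return ["c"] if independent else ["a", "b"]

# Simulated dependent data (hypothetical coefficients).
rng = random.Random(1)
chclt = [rng.gauss(0, 1) for _ in range(200)]
nobel = [0.8 * c + rng.gauss(0, 1) for c in chclt]
print(select_candidates(chclt, nobel))
```

When dependence is detected, this constraint alone cannot separate a from b, which is why step 3 speaks of the best model(s). Separating them requires further assumptions such as non-Gaussian errors.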