Causal discovery and prediction mechanisms
Shohei Shimizu
Shiga University / Osaka University / RIKEN AIP
Causal discovery and prediction mechanisms
France/Japan Machine Learning Workshop, Paris, September 2017
Introduction
Causal discovery
• Unsupervised learning of causal relations
• Estimate the causal structure under some assumptions
  – Typically, a directed acyclic graph (DAG)
[Figure: data matrix (variables × observations) + assumptions → discovery → causal graph over x1, ..., x6]
Correlation does not imply causation
• Correlation: Countries that eat more chocolate have more Nobel laureates
• Causation: Increasing a country's chocolate consumption increases its number of Nobel laureates
[Figure: plot of Nobel laureates against chocolate consumption, corr. 0.791, with graphs over Chclt and Nobel in which GDP is a hidden common cause (Messerli12NEJM; Maurage+13JNutrition)]
Conventional applications
• Epidemiology: Sleep problems → Depression mood, or Depression mood → Sleep problems? (Rosenstrom et al., 2012)
• Economics: OpInc.gr, Empl.gr, Sales.gr, and R&D.gr at times t, t+1, and t+2 (Moneta et al., 2012)
• Neuroscience (Boukrina & Graves, 2013)
• Chemistry (Campomanes et al., 2014)
New application
• Analyze the mechanisms of predictive models (Blobaum & Shimizu, MLSP2017)
• See what happens when intervening on features
  – That is, changing the value (or distribution) of a country's chocolate consumption
  – Not comparing the chocolate consumption of two different countries
Example
• Predictive model: $\hat{y} = f(x_1, x_2)$
• Intervention effects of features on prediction depend on the causal structure
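[Figure: the predictive model $\hat{y} = 5x_1 - 3x_2$ combined with different causal structures over $X_1$ and $X_2$]
• If $X_1$ causes $X_2$ with $x_2 = 2x_1 + e_2$: $\frac{\partial}{\partial x_1} E(\hat{Y} \mid do(X_1)) = -1$
• If $X_1$ has no effect on $X_2$: $\frac{\partial}{\partial x_1} E(\hat{Y} \mid do(X_1)) = 5$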
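The dependence on causal structure can be checked numerically. The following is a minimal sketch, assuming the predictive model ŷ = 5x1 − 3x2 and the structural equation x2 = 2x1 + e2; the standard normal noise, the alternative structure in which x2 does not react to x1, and all function names are illustrative assumptions.

```python
import numpy as np

def predict(x1, x2):
    # Predictive model assumed for this sketch: y_hat = 5*x1 - 3*x2
    return 5.0 * x1 - 3.0 * x2

def mean_prediction_under_do_x1(c, x1_causes_x2, n=100_000, seed=0):
    """Monte Carlo estimate of E[y_hat | do(x1 = c)]."""
    rng = np.random.default_rng(seed)
    x1 = np.full(n, float(c))      # do(x1 = c): x1 is set from outside
    noise = rng.standard_normal(n) # assumed noise distribution
    if x1_causes_x2:
        x2 = 2.0 * x1 + noise      # x2 = 2*x1 + e2 reacts to the intervention
    else:
        x2 = noise                 # x2 is unaffected by the intervention
    return predict(x1, x2).mean()

def intervention_slope(x1_causes_x2):
    # Difference of E[y_hat | do(x1 = c)] between c = 1 and c = 0.
    # Both calls share a seed, so the noise cancels in the difference.
    return (mean_prediction_under_do_x1(1.0, x1_causes_x2)
            - mean_prediction_under_do_x1(0.0, x1_causes_x2))

slope_x1_to_x2 = intervention_slope(True)   # analytically 5 + (-3)*2 = -1
slope_no_effect = intervention_slope(False) # analytically 5 (direct effect only)
print(slope_x1_to_x2, slope_no_effect)
```

The same predictive model gives slopes of opposite sign, which is why the causal structure has to be known or estimated before interpreting feature interventions.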
Analysis of predictive mechanisms
• Combine the causal model and predictive model to model the prediction mechanism
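[Figure: three graphs over X1, X2, X3, X4, and Y: the causal model, the predictive model, and the combined prediction mechanism model]
• Causal model: $x_4 = f_4(y, e_4)$, ⋮
• Predictive model: $\hat{y} = f(x_1, x_2, x_3, x_4)$ (Lasso, SVM, Decision Tree, Deep Learning, etc.)
• Prediction mechanism model: $E(\hat{y} \mid do(x_i = c))$, $p(\hat{y} \mid do(x_i = c))$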
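One way to picture the combination is to sample the features from the causal model under an intervention and push the samples through the predictive model. This is only a sketch under strong assumptions: a linear acyclic causal model over the features alone, standard normal errors, a known causal ordering, and a hypothetical three-feature predictor. None of the coefficients or names below come from the talk.

```python
import numpy as np

def sample_features_under_do(B, order, i, c, n, rng):
    """Sample from the linear acyclic model x = Bx + e under do(x_i = c).

    B[j, k] is the direct effect of x_k on x_j; `order` is a causal ordering.
    """
    x = np.zeros((n, B.shape[0]))
    for j in order:
        if j == i:
            x[:, j] = c  # the intervention replaces x_i's own equation
        else:
            x[:, j] = x @ B[j] + rng.standard_normal(n)
    return x

def prediction_under_do(predict, B, order, i, c, n=200_000, seed=0):
    """Return the mean and the samples of y_hat under do(x_i = c)."""
    rng = np.random.default_rng(seed)
    y_hat = predict(sample_features_under_do(B, order, i, c, n, rng))
    return y_hat.mean(), y_hat

# Hypothetical causal model: x0 -> x1 -> x2 (zero-based indices)
B = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
order = [0, 1, 2]

def predict(x):
    # Hypothetical predictive model; any fitted model's predict function could stand here
    return 5.0 * x[:, 0] - 3.0 * x[:, 1] + 1.0 * x[:, 2]

mean_do_x0, samples_do_x0 = prediction_under_do(predict, B, order, i=0, c=1.0)
mean_do_x1, samples_do_x1 = prediction_under_do(predict, B, order, i=1, c=1.0)
print(mean_do_x0, mean_do_x1)
```

The returned samples approximate p(ŷ | do(x_i = c)), and their mean approximates E(ŷ | do(x_i = c)). Because the predictor is only called as a black box, a nonlinear model could be swapped in without changing the sampling code.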
Illustrative example
• Auto-MPG (miles per gallon) dataset
• Linear regression
• Which variable has the greatest intervention effect on MPG prediction?
• On which variable should we intervene to obtain a certain MPG prediction?
[Figure: causal graph over Cylinders, Displacement, Weight, Horsepower, Acceleration, and MPG, with the prediction $\widehat{MPG}$]
Desired MPG prediction    Suggested intervention on cylinders
15                        8
21                        6
30                        4
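When the prediction mechanism is linear, the mean prediction under do(x_i = c) is an affine function of c, so the intervention value for a desired prediction can be solved for directly. The sketch below assumes such an affine mechanism; the function names and the toy coefficients are hypothetical and are not estimates from the Auto-MPG data.

```python
def solve_linear_intervention(mean_prediction, target):
    """Find c with mean_prediction(c) == target, assuming mean_prediction is affine in c."""
    intercept = mean_prediction(0.0)
    slope = mean_prediction(1.0) - intercept
    if slope == 0.0:
        raise ValueError("intervening on this feature does not change the mean prediction")
    return (target - intercept) / slope

# Hypothetical affine mechanism E[y_hat | do(x_i = c)] = 2 - c (illustrative numbers only)
def toy_mean_prediction(c):
    return 2.0 - 1.0 * c

c_star = solve_linear_intervention(toy_mean_prediction, target=5.0)
print(c_star)  # -3.0
```

A feature with zero slope cannot be used to steer the prediction, which is why the helper raises an error in that case.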
Causal discovery
Key idea
[Figure: data matrix (variables × observations) + assumptions → discovery → causal graph over x1, ..., x6]
Difficulty of causal discovery
• Many causal models give the same data distribution
[Figure: three candidate causal graphs relating Chclt and Nobel, each with GDP as a hidden common cause, shown next to the plot of Nobel laureates against chocolate consumption (corr. 0.791)]
Linear Non-Gaussian Acyclic Model (Shimizu+06JMLR)
• Linear DAG with non-Gaussian and independent errors
• Identifiable: causal directions and coefficients
• Various extensions
  – Nonlinearity (Hoyer+08NIPS, Zhang+09UAI)
  – Cyclicity (Lacerda+08UAI)
  – Hidden common causes (Hoyer+08IJAR; Henao+11JMLR; Shimizu+14JMLR)
$$x_i = \sum_{j \neq i} b_{ij} x_j + e_i$$
[Figure: example DAG over x1, x2, x3 with coefficients b21, b23, b13 and errors e1, e2, e3]
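To make the model concrete, data can be generated from x = Bx + e by solving for x = (I − B)^{-1} e. The sketch below uses the nonzero pattern b13, b21, b23 of the example graph; the coefficient values, the uniform error distribution, and the sample size are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical coefficient values; only the nonzero pattern follows the example graph
b13, b21, b23 = 0.8, 1.5, -0.7
B = np.array([[0.0, 0.0, b13],   # x1 = b13*x3 + e1
              [b21, 0.0, b23],   # x2 = b21*x1 + b23*x3 + e2
              [0.0, 0.0, 0.0]])  # x3 = e3

# Independent non-Gaussian errors: uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1
e = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 3))

# x = Bx + e  <=>  x = (I - B)^{-1} e, applied row-wise to the samples
A = np.linalg.inv(np.eye(3) - B)
x = e @ A.T

# Acyclicity: B is strictly lower triangular once variables are put in causal order x3, x1, x2
order = [2, 0, 1]
B_perm = B[np.ix_(order, order)]
is_acyclic = bool(np.allclose(np.triu(B_perm), 0.0))
print(is_acyclic)
```

The matrix A = (I − B)^{-1} plays the role of the mixing matrix when the model is viewed as an ICA problem.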
Different causal directions give different data distributions
• Use the idea of Blind Source Separation (ICA: Independent Component Analysis)
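Model with $x_1 \to x_2$:
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ b_{21} & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}, \quad \text{i.e. } x = As$$
Model with $x_2 \to x_1$:
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & b_{12} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \end{bmatrix}, \quad \text{i.e. } x = As$$
[Figure: for each model, the graph over x1 and x2 with errors e1 and e2, and the plot of x1 against x2; the zero entry of the mixing matrix A sits in a different position in the two models]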
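The asymmetry between the two directions can be seen with a simple pairwise check. This is not the ICA-based estimation itself but a rough proxy that relies on the same fact: with non-Gaussian errors, the least-squares residual is independent of the regressor only in the correct direction. The dependence measure (correlation of squares), the uniform errors, and the coefficient 0.8 are assumptions made for this sketch.

```python
import numpy as np

def residual_dependence(cause, effect):
    """|corr(cause^2, residual^2)| after regressing effect on cause by least squares."""
    cause = cause - cause.mean()
    effect = effect - effect.mean()
    slope = (cause @ effect) / (cause @ cause)
    residual = effect - slope * cause
    return abs(np.corrcoef(cause**2, residual**2)[0, 1])

def infer_direction(x1, x2):
    # Prefer the direction whose residual looks more independent of the regressor
    if residual_dependence(x1, x2) < residual_dependence(x2, x1):
        return "x1 -> x2"
    return "x2 -> x1"

rng = np.random.default_rng(0)
n = 20_000
e1 = rng.uniform(-1.0, 1.0, n)  # non-Gaussian, independent errors
e2 = rng.uniform(-1.0, 1.0, n)

# Data generated with x1 -> x2
x1 = e1
x2 = 0.8 * x1 + e2
direction_a = infer_direction(x1, x2)

# Data generated with x2 -> x1
z2 = e2
z1 = 0.8 * z2 + e1
direction_b = infer_direction(z1, z2)
print(direction_a, direction_b)
```

With Gaussian errors both residual dependences would be close to zero and the comparison would carry no information, which is the reason the non-Gaussianity assumption matters.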
Further examples
• Hidden common causes: Overcomplete ICA
  – Hoyer+08IJAR, Henao+11JMLR, Shimizu+14JMLR
  $$x_i = \sum_{j \neq i} b_{ij} x_j + \sum_{q=1}^{Q} \lambda_{iq} u_q + e_i$$
• Time-series: Blind deconvolution
  – Swanson+97, Hyvarinen+10JMLR
  – Subsampling (Gong+15ICML)
  $$x(t) = \sum_{\tau=0}^{k} B_\tau x(t-\tau) + e(t)$$
Conclusion
• Causal discovery
  – Unsupervised learning of causal relations
  – Capable of identifying the causal structure under some assumptions
• Analysis of prediction mechanisms
  – A new application of causal discovery
  – Combining causal models and predictive models to model their prediction mechanisms
Data-driven approach
1. Make assumptions on causal relations
2. Derive constraints that should hold in data
3. Find the best model(s) that satisfy the constraints actually holding in data
[Figure: three candidate causal graphs a., b., c. over Chclt and Nobel]
If Chclt and Nobel are actually independent, select the rightmost causal graph (c).
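The three steps can be sketched for the two-variable case. The code assumes that candidates a and b each contain an edge between the two variables (and so imply dependence) while candidate c has no edge; it also uses a permutation test on the correlation as a stand-in for a full independence test. Both are assumptions of this sketch, as are the function names and the synthetic data.

```python
import numpy as np

def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Permutation p-value for H0: a and b are uncorrelated (a proxy for independence)."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(a, b)[0, 1])
    count = 0
    for _ in range(n_perm):
        # Shuffling b breaks any association with a
        if abs(np.corrcoef(a, rng.permutation(b))[0, 1]) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)

def select_candidates(a, b, alpha=0.05):
    # Step 2: graphs a and b imply dependence, graph c implies independence (assumed)
    # Step 3: keep the candidates whose constraint matches what the data show
    if permutation_pvalue(a, b) < alpha:
        return ["a", "b"]
    return ["c"]

# Synthetic stand-ins for the two variables (not real chocolate or Nobel data)
rng = np.random.default_rng(1)
chclt = rng.standard_normal(200)
nobel = 0.9 * chclt + 0.3 * rng.standard_normal(200)
print(select_candidates(chclt, nobel))
```

Note that dependence alone cannot separate candidates a and b; that is where additional assumptions such as non-Gaussianity come in.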