EPSE 581C: Causal Inference for Applied Researchers
Ed Kroc
University of British Columbia
May 22, 2019
Ed Kroc (UBC) Causal Inference May 22, 2019 1 / 48
Last time
Model misspecification and (some of) its effects
Today
More model misspecification and (some of) its effects
Consistency and unbiasedness of estimators
Regression Discontinuity (RD) design
Suppose our data look like this:
Regression discontinuity design
Estimation:
It would be unreasonable to assume equal slopes on both sides of the threshold. Thus, we may propose the model:
$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$
Under this specification, our estimate of the ACE is:
$$\widehat{ACE}(X = 2) = \hat{E}(Y(1) \mid X = 2) - \hat{E}(Y(0) \mid X = 2) = \hat\beta_T + 2\hat\beta_{TX}.$$
But what if we misspecified the model by assuming equal slopes on both sides of the threshold?
This would produce a case of model misspecification.
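The estimation step above can be sketched numerically. This is a minimal illustration rather than the lecture's own analysis: the simulated data, the threshold at X = 2, and the coefficient values (`b0`, `bT`, `bX`, `bTX`) are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical RD data: running variable X on [0, 4], threshold at X = 2.
n = 2000
x = rng.uniform(0.0, 4.0, size=n)
t = (x >= 2.0).astype(float)          # treatment is assigned by the threshold

# Assumed true coefficients (illustrative only, not taken from the lecture).
b0, bT, bX, bTX = 1.0, 0.5, 0.3, 0.4
y = b0 + bT * t + bX * x + bTX * t * x + rng.normal(0.0, 0.1, size=n)

# Least-squares fit of Y = b0 + bT*T + bX*X + bTX*T*X + error.
design = np.column_stack([np.ones(n), t, x, t * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

ace_hat = coef[1] + 2.0 * coef[3]     # estimated ACE at X = 2
ace_true = bT + 2.0 * bTX             # ACE at X = 2 under the assumed truth
print(round(ace_hat, 3), ace_true)
```

Because the fitted model matches the data-generating model here, the estimate should land close to the assumed ACE at the threshold.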
Model misspecification
Broadly construed, there are three main types of model misspecification:
(1) Misspecification of the random error structure.
Heteroskedasticity of errors
Autocorrelation of errors (response)
(2) Misspecification of the “link” function.
Severe lack of normality of errors
(3) Misspecification of the covariate structure.
Misspecified functional form for covariates
Omitted covariates
All three of these issues are common to all forms of regression analysis (including factor analysis, SEMs, mixed effects modelling, etc.)
In practice, (3) can be very difficult to detect and to properly correct for. Unfortunately, (3) is also the most important case.
Regression assumptions: descriptive/predictive vs. causal
The best way to check for violations of any of the regression assumptions is by examining residual plots (or standardized residual plots for GLMs).
One should always plot residuals vs. fitted values, and residuals vs. each predictor (even this is not sufficient to detect violations).
If all assumptions are satisfied, then all residual plots should look something like a random blob:
If errors autocorrelate, residual vs. fitted plot may look like:
If errors have unequal variances, then the residuals vs. fitted plot may look like:
If the functional form of the predictors is misspecified, then the residuals vs. fitted plot may look like:
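As a rough numerical stand-in for eyeballing a residuals vs. fitted plot, one can average the residuals within bands of the fitted values. The sketch below assumes a hypothetical data set that is truly quadratic in x but is fitted with a straight line; the binning into thirds is an illustrative choice, not a standard diagnostic from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 600
x = rng.uniform(0.0, 1.0, size=n)
y = 1.0 + 2.0 * x**2 + rng.normal(0.0, 0.05, size=n)   # truly quadratic in x

# Fit the (misspecified) straight-line model y = a + b*x.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fitted = design @ coef
resid = y - fitted

# Mean residual within the lower, middle, and upper third of fitted values.
order = np.argsort(fitted)
low, mid, high = np.array_split(resid[order], 3)
bin_means = [low.mean(), mid.mean(), high.mean()]
print(np.round(bin_means, 3))
```

A "random blob" would give band means near zero. Here the means should be positive, negative, positive across the three bands, which is the curved pattern that signals a misspecified functional form.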
Model misspecification: misspecified covariate function
(3) Misspecification of the covariate structure.
Must respecify the functional form of the model.
Usually diagnosable by looking at residual plots and/or examining the raw data, but this is rarely trivial.
Moreover, a better functional form may be too complicated to reasonably estimate given the amount of data we have.
We are taught to err on the side of simplicity in explanatory/predictive inference, but for causal inference this issue cannot be downplayed.
Model misspecification: misspecified covariate function
Suppose we propose the misspecified model for our example data in the previous diagram:
$$Y = \beta_0 + \beta_T T + \beta_X X + \delta'.$$
Under this specification, our estimate of the ACE is:
$$\widehat{ACE}_{wrong}(X = 2) = \hat{E}(Y(1) \mid X = 2) - \hat{E}(Y(0) \mid X = 2) = \hat\beta'_T.$$
However, we know that the more appropriate estimate from the properly specified model is
$$\widehat{ACE}_{right}(X = 2) = \hat\beta_T + 2\hat\beta_{TX}.$$
Model misspecification: misspecified covariate function
Misspecified regression model in orange:
Model misspecification: misspecified covariate function
Thus, the estimate from our misspecified model is off by:
$$\widehat{ACE}_{right}(X = 2) - \widehat{ACE}_{wrong}(X = 2) = \hat\beta_T + 2\hat\beta_{TX} - \hat\beta'_T.$$
It is very important to notice that
$$\hat\beta_T \neq \hat\beta'_T.$$
This is because our estimates depend on the model specification.
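To see that the coefficient on T really does change with the specification, here is a small simulated comparison. Everything in it is assumed for illustration: a binary T drawn independently of X, no main effect of T, and an interaction coefficient of 1; the helper `ols` is a hypothetical convenience function.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.uniform(0.0, 1.0, size=n)
t = rng.integers(0, 2, size=n).astype(float)   # binary T, independent of X

# Assumed truth: no main effect of T, but a T-by-X interaction.
y = 0.2 + 0.0 * t + 0.5 * x + 1.0 * t * x + rng.normal(0.0, 0.05, size=n)

def ols(columns, response):
    """Least-squares coefficients for a design with an intercept first."""
    design = np.column_stack([np.ones(len(response))] + columns)
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef

right = ols([t, x, t * x], y)   # beta_0, beta_T, beta_X, beta_TX
wrong = ols([t, x], y)          # beta'_0, beta'_T, beta'_X

x0 = 0.9                        # covariate value at which the ACE is evaluated
ace_right = right[1] + x0 * right[3]
ace_wrong = wrong[1]
print(round(right[1], 3), round(wrong[1], 3))
print(round(ace_right, 3), round(ace_wrong, 3))
```

Under these assumptions the correctly specified fit should put the coefficient on T near 0, while the misspecified fit should put it near 0.5, so the two ACE estimates at X = 0.9 disagree noticeably.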
Model misspecification: misspecified covariate function
Recall: we proposed the misspecified model for our example data:
$$Y = \beta_0 + \beta_T T + \beta_X X + \delta'.$$
In actuality, the true model is:
$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$
So the error in the misspecified model, $\delta'$, does not satisfy the necessary assumptions of the regression framework. In particular:
$$\delta' = \beta_{TX}\, T \cdot X + \delta,$$
so $\delta'$ is confounded with T and X; i.e. $\delta'$ is not independent of T or X.
Model misspecification: misspecified covariate function
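Under our model misspecification, we know that a term is missing from our model; i.e. the interaction of T and X is absorbed into the error term:
$$\delta' = \beta_{TX}\, T \cdot X + \delta.$$
Thus, $\mathrm{Cov}(\delta', T) \neq 0$, and so (treating X and T as uncorrelated, so that the $\beta_X X$ term drops out of the covariance with T)
$$\beta_T = \frac{\mathrm{Cov}(Y, T) - \mathrm{Cov}(\delta', T)}{\mathrm{Var}(T)}.$$
However, our standard regression estimators assume that there are no violations of assumptions; thus, our actual estimate is:
$$\hat\beta'_T = \frac{\widehat{\mathrm{Cov}}(Y, T)}{\widehat{\mathrm{Var}}(T)} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(t_i - \bar{t})}{\sum_{i=1}^{n} (t_i - \bar{t})^2}.$$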
Model misspecification: misspecified covariate function
We know that the actual population parameter we are interested in is $\beta_T$ from the correctly specified model:
$$Y = \beta_0 + \beta_T T + \beta_X X + \beta_{TX}\, T \cdot X + \delta.$$
Doing the same covariance algebra as before, and noting that all regression assumptions are (mostly) satisfied since the model is properly specified, we find
$$\beta_T = \frac{\mathrm{Cov}(Y, T) - \beta_{TX}\,\mathrm{Cov}(TX, T)}{\mathrm{Var}(T)} = \frac{\mathrm{Cov}(Y, T)}{\mathrm{Var}(T)} - \beta_{TX}\, E(X),$$
where the last equality uses a binary T that is independent of X.
But using the misspecified model, we do not estimate this! Instead, we estimate only the first term:
$$\beta'_T = \frac{\mathrm{Cov}(Y, T)}{\mathrm{Var}(T)}.$$
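The covariance algebra can be spelled out as follows. This is a sketch under assumptions that the slides leave implicit: T is binary (so $T^2 = T$), T is independent of X (so $\mathrm{Cov}(X, T) = 0$), and $\mathrm{Cov}(\delta, T) = 0$.

```latex
\begin{align*}
\mathrm{Cov}(Y, T)
  &= \beta_T \mathrm{Var}(T) + \beta_X \mathrm{Cov}(X, T)
     + \beta_{TX} \mathrm{Cov}(TX, T) + \mathrm{Cov}(\delta, T) \\
  &= \beta_T \mathrm{Var}(T) + \beta_{TX} \mathrm{Cov}(TX, T), \\[4pt]
\mathrm{Cov}(TX, T)
  &= E(T^2 X) - E(TX)\,E(T) \\
  &= E(T)\,E(X) - E(T)^2 E(X) \\
  &= E(X)\,E(T)\{1 - E(T)\} = E(X)\,\mathrm{Var}(T).
\end{align*}
```

Dividing the first identity by $\mathrm{Var}(T)$ and substituting the second gives $\beta_T = \mathrm{Cov}(Y, T)/\mathrm{Var}(T) - \beta_{TX} E(X)$, so the misspecified target $\beta'_T$ differs from $\beta_T$ by exactly $\beta_{TX} E(X)$ under these assumptions.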
Model misspecification: Ex. 1
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: two panels plotting y (0 to 2) against x (0 to 1).]
Model misspecification: Ex. 1
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: residuals vs. fitted values for each model. Left panel: residuals(mod.w) against fitted(mod.w). Right panel: residuals(mod.r) against fitted(mod.r).]
Clear evidence of model misspecification in residuals vs. fitted plot!
Model misspecification: Ex. 1
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: two panels plotting y (0 to 2) against x (0 to 1).]
$$\widehat{ACE}_{wrong}(X = 0.5) = \hat\beta'_T = 0.515$$
$$\widehat{ACE}_{right}(X = 0.5) = \hat\beta_T + 0.5\,\hat\beta_{TX} = 0.008 + 0.5 \times 0.999 = 0.508$$
Not too bad... but what if the misspecification was worse?
Model misspecification: Ex. 2
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: two panels plotting y (-2.5 to 0) against x (0 to 1).]
Model misspecification: Ex. 2
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: residuals vs. fitted values for each model. Left panel: residuals(mod.w2) against fitted(mod.w2). Right panel: residuals(mod.r) against fitted(mod.r).]
Clear evidence of model misspecification in residuals vs. fitted plot!
Model misspecification: Ex. 2
Misspecified model on LEFT; properly specified model on RIGHT:
[Figure: two panels plotting y (-2.5 to 0) against x (0 to 1).]
$$\widehat{ACE}_{wrong}(X = 0.5) = \hat\beta'_T = -0.152$$
$$\widehat{ACE}_{right}(X = 0.5) = \hat\beta_T + 0.5\,\hat\beta_{TX} + (0.5)^2\,\hat\beta_{TX^2} = -0.527$$
The misspecified model's ACE estimate is roughly 3.5 times too small in magnitude.
Model misspecification: ignore fit statistics
Notice: fit statistics are useless here.
That is, misspecified models can still “fit” the data very well.
Good enough for explanatory modelling.
Not good enough for causal modelling!
Ignore all fit statistics when performing causal modelling, including:
Goodness-of-fit F-tests
$R^2$ statistics
Information criterion statistics (AIC, BIC, DIC, etc.)
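A small simulation can make the point about fit statistics concrete. The data-generating model below (strong main effect of X, T-by-X interaction of 1, binary T independent of X) is an assumption chosen so that the misspecified model still fits well; `fit` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.uniform(0.0, 1.0, size=n)
t = rng.integers(0, 2, size=n).astype(float)
# Assumed truth: strong effect of X plus a T-by-X interaction.
y = 0.2 + 3.0 * x + 1.0 * t * x + rng.normal(0.0, 0.05, size=n)

def fit(columns):
    """Return least-squares coefficients and R^2 for an intercept model."""
    design = np.column_stack([np.ones(n)] + columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - resid.var() / y.var()
    return coef, r2

coef_wrong, r2_wrong = fit([t, x])          # omits the interaction
coef_right, r2_right = fit([t, x, t * x])   # correctly specified

x0 = 0.9
ace_wrong = coef_wrong[1]
ace_right = coef_right[1] + x0 * coef_right[3]
print(round(r2_wrong, 3), round(r2_right, 3))
print(round(ace_wrong, 3), round(ace_right, 3))
```

In this setup the misspecified model should still report a high R-squared, yet its ACE estimate at X = 0.9 should sit far from the value of 0.9 implied by the assumed truth.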
Model misspecification: ignore statistical significance
Notice: statistical significance of model coefficient estimates is irrelevant here.
Recall numerical Ex. 1:
All estimates significant for misspecified model
In properly specified model, intercept ($\hat\beta_0$) and marginal treatment ($\hat\beta_T$) estimates not statistically significant.
Recall numerical Ex. 2:
All estimates significant for misspecified model
In properly specified model, intercept ($\hat\beta_0$) and marginal first-order treatment ($\hat\beta_T$) estimates not statistically significant.
Model misspecification: bigger sample size will never fix the problem
It is common “wisdom” that the more data you have, the better you will be able to quantify your effects of interest.
This is true for explanatory/descriptive and predictive modelling, but false for causal modelling.
Consistency and unbiasedness of estimators
There are two extremely important and desirable properties we usually like our estimators to have:
Consistency
Unbiasedness
Other properties are also often desirable (e.g. asymptotic normality), but consistency and unbiasedness are by far the most important.
Unbiasedness of estimators
Generally, an estimator $\hat\theta$ for some population parameter $\theta$ of a random variable of interest X is called unbiased if:
$$E(\hat\theta) = \theta$$
In words, an estimator is unbiased for its estimand (what it is trying to estimate) if, on average, the estimator equals the estimand.
Example: In a random sample, the sample mean, $\hat\theta = \frac{1}{n}\sum_{i=1}^{n} X_i$, is an unbiased estimator of the population mean, $\theta = E(X)$:
$$E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} E(X) = \frac{n\,E(X)}{n} = E(X).$$
Consistency of estimators
Generally, an estimator $\hat\theta$ is called consistent if, as the sample size increases without bound, the sample value of $\hat\theta$ approaches a single number, a:
$$\text{for all } \varepsilon > 0, \quad \lim_{n \to \infty} \Pr\left(|\hat\theta - a| > \varepsilon \mid S_n\right) = 0,$$
where $S_n$ denotes a random sample of size n.
If an estimator is both unbiased and consistent, then not only does its average value equal the true estimand of interest, but as we increase the sample size, the estimator becomes more and more precise about this true value.
That is, such an estimator is both accurate and precise as sample size increases.
Consistency and unbiasedness of estimators
It is entirely possible that an estimator is consistent but biased; e.g. the unadjusted sample variance:
$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
It is also entirely possible that an estimator is unbiased but inconsistent; e.g. using the average of the sample min and max to estimate the population mean:
$$\frac{\max\{x_i : 1 \le i \le n\} + \min\{x_i : 1 \le i \le n\}}{2}$$
Estimators can also be neither unbiased nor consistent. Very bad!
Also, some estimators are asymptotically unbiased and consistent.
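The "consistent but biased" case can be checked by simulation. The sketch below assumes a normal population with variance 4 (an arbitrary illustrative choice) and compares the average of the unadjusted (1/n) sample variance with its known expectation, (n - 1)/n times the true variance.

```python
import numpy as np

rng = np.random.default_rng(5)
sigma2 = 4.0                      # assumed true variance (illustrative)
reps = 5000

summary = {}
for n in (5, 50, 500):
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    v = samples.var(axis=1)       # ddof=0: the unadjusted (1/n) variance
    expected = (n - 1) / n * sigma2
    summary[n] = (v.mean(), expected, v.std())
    print(n, round(v.mean(), 3), round(expected, 3), round(v.std(), 3))
```

The average estimate should fall below 4 for small n (bias), while both the bias and the spread of the estimator shrink as n grows (consistency).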
Consistency and unbiasedness of estimators: Example
The sample mean is an unbiased and consistent estimator of the population mean (for population random variables with finite mean):
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
The average of the sample extremes is unbiased for the mean of a symmetric population but, in practice, inconsistent (for a normal population its spread shrinks only at a logarithmic rate in n):
$$\mathrm{Avg}(\min, \max) := \frac{\max\{x_i : 1 \le i \le n\} + \min\{x_i : 1 \le i \le n\}}{2}$$
Consistency and unbiasedness of estimators: Example
Example: Suppose we have 30 observations from a normally distributed population, $X \sim N(3.3, 1)$.
These observations generate the two sample statistics:
$$\bar{X} = 3.39, \qquad \mathrm{Avg}(\min, \max) = 3.28$$
Both seem pretty good. This is no accident either.
Consistency and unbiasedness of estimators: Example
[Figure: two histograms (y-axis: Frequency). Left: "Sampling Distribution of the Sample Mean" (x-axis: Sample Mean). Right: "Sampling Distribution of the Sample Average Spread" (x-axis: Sample Average Spread).]
Simulated 1000 draws of 30 observations from X to create these (estimated) sampling distributions.
Both estimators unbiased, but average of extremes is not very precise...
Consistency and unbiasedness of estimators: Example
[Figure: two histograms (y-axis: Frequency). Left: "Histogram of avg" (the sample mean). Right: "Histogram of rng" (the sample average of extremes).]
Increased sample size: simulated 1000 draws of 100 observations from X to create these new (estimated) sampling distributions.
Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.
Consistency and unbiasedness of estimators: Example
[Figure: two histograms (y-axis: Frequency). Left: "Histogram of avg" (the sample mean). Right: "Histogram of rng" (the sample average of extremes).]
Increased sample size: simulated 1000 draws of 1000 observations from X to create these new (estimated) sampling distributions.
Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.
Consistency and unbiasedness of estimators: Example
[Figure: two histograms (y-axis: Frequency). Left: "Histogram of avg" (the sample mean). Right: "Histogram of rng" (the sample average of extremes).]
Increased sample size: simulated 1000 draws of 10,000 observations from X to create these new (estimated) sampling distributions.
Notice: sample mean gets more precise with larger sample size, but sample average of extremes does not.
Observe consistency of sample mean, inconsistency of sample average of extremes.
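The simulations behind these histograms can be approximated as below. This is a reconstruction under the stated setup (1000 draws from N(3.3, 1) at each sample size), not the lecture's original code; the seed and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, reps = 3.3, 1.0, 1000

spread = {}
for n in (30, 100, 1000, 10_000):
    means = np.empty(reps)
    midranges = np.empty(reps)
    for r in range(reps):
        sample = rng.normal(mu, sigma, size=n)
        means[r] = sample.mean()
        # Average of the sample extremes (min and max).
        midranges[r] = (sample.max() + sample.min()) / 2.0
    spread[n] = (means.std(), midranges.std())
    print(n, round(means.mean(), 3), round(means.std(), 4),
          round(midranges.mean(), 3), round(midranges.std(), 4))
```

Both estimators should be centred near 3.3 at every sample size. The spread of the sample mean should fall roughly like one over the square root of n, while the spread of the average of extremes should shrink far more slowly, if at all, over this range.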
Consistency and unbiasedness of estimators
When all the usual regression assumptions hold, the standard estimators for the model coefficients (e.g. maximum likelihood or ordinary least squares estimators) are consistent and unbiased for the true population values of those parameters.
However, when the regression model is misspecified, the estimators are still consistent in the sense defined earlier (they converge to a single value), but that value is generally not the true parameter: they are no longer unbiased. Moreover, they are not even asymptotically unbiased.
White (1982), Econometrica: MLEs of regression coefficients will approach the values that minimize the Kullback-Leibler divergence between the specified model and the true model.
Consistency and unbiasedness of estimators
UPSHOT: if the functional form of your model is misspecified, and/or if you are missing important covariates, it doesn’t matter how much data you have: your estimates will always be wrong, even if they are very precise.
This is a HUGE problem for causal inference.
It is common “wisdom” that the more data you have, the better you will be able to quantify your effects of interest; this is false when performing model-based causal inference.
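The "precisely wrong" behaviour can be illustrated with a simulation in which the sample size grows but the model stays misspecified. The data-generating model (binary T independent of X, interaction coefficient 1, no main effect of T) and the helper `wrong_ace` are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

def wrong_ace(n):
    """Coefficient on T from the model that omits the T*X term."""
    x = rng.uniform(0.0, 1.0, size=n)
    t = rng.integers(0, 2, size=n).astype(float)
    y = 0.2 + 0.5 * x + 1.0 * t * x + rng.normal(0.0, 0.05, size=n)
    design = np.column_stack([np.ones(n), t, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

true_ace = 0.9   # ACE at X = 0.9 under the assumed truth: 0 + 0.9 * 1
results = {}
for n in (100, 1_000, 10_000, 100_000):
    draws = np.array([wrong_ace(n) for _ in range(100)])
    results[n] = (draws.mean(), draws.std())
    print(n, round(draws.mean(), 3), round(draws.std(), 4))
```

As n grows, the spread of the misspecified estimate should collapse, but around a value near 0.5 rather than the assumed ACE of 0.9 at X = 0.9: more data buys precision, not correctness.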
Model misspecification: omitted variables
So far, we have only focused on model misspecification where the functional form of the covariates is misspecified, but our models always contained all explanatory variables.
In practical non-experimental research, we will always be missing some confounders; we can’t measure everything, or even know everything we should be measuring!
Detecting important omitted variables can be very difficult.
Residual plots are still the way to go, but they will not always suggest omitted variable bias.
This is why the exchangeability of treatment is so important in an RD design: treatment is “as good as” randomly assigned near the threshold; thus, biasing effects of omitted variables should be negligible (near the threshold).
A return to controlled experiments
Why don’t we hear about these issues (omitted variables, model misspecification) in the context of controlled experiments?
ANSWER: usually, well-controlled experiments bypass these issues by design.
Example: Does an increase in NO2 in native SE BC soil cause Arabidopsis lyrata leaves to grow larger?
3 × 5 factorial design on 90 seeds:
3 levels of NO2: control (native soil), 1.5 times average NO2 concentration, 2 times average NO2 concentration.
5 time points (after sprouting), no repeated measures: 5 days, 10 days, 15 days, 20 days, 25 days.
Outcome measure: length of eighth leaves.
A return to controlled experiments
Here, we could propose a full 2-way ANOVA model:
$$\mathrm{Len} = \mu + \tau_{NO2} + \tau_{age} + \tau_{NO2 \times age} + \varepsilon,$$
where $\tau_X$ denotes the average treatment effect of X, $\mu$ denotes the grand average length of eighth leaves (over all nitrogen levels and time points), and $\varepsilon$ denotes random error.
Experiment is controlled to fix the values of possible confounders: e.g. humidity, light, water, O2 levels, etc.
Levels of explanatory factors are also fixed; NO2 and age are continuous variables, but experimental control fixes the possible values these variables can assume to finite sets.
However, suppose there was some unknown confounder V that we didn’t account for: e.g. maybe 10 of the 90 seeds are less viable than the others.
But here, randomization of seeds to experimental treatments (NO2 × age) will likely remove the effect of this confounder:
$$\Pr(\text{seed } i \in NO2 \times age \mid V) = \Pr(\text{seed } i \in NO2 \times age).$$
Therefore,
$$\Pr(\mathrm{Len} \mid NO2, age, V) = \Pr(\mathrm{Len} \mid NO2, age)$$
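The balancing effect of randomization can be sketched by re-randomizing the 90 seeds many times. The setup below assumes a balanced design (6 seeds in each of the 15 NO2-by-age cells) and flags 10 seeds as less viable; both are illustrative assumptions consistent with the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n_seeds, n_cells = 90, 15                 # 3 NO2 levels x 5 ages, 6 seeds per cell
v = np.zeros(n_seeds)
v[:10] = 1.0                              # hypothetical: 10 less viable seeds

# Balanced cell labels: each of the 15 cells appears 6 times.
cells = np.repeat(np.arange(n_cells), n_seeds // n_cells)

reps = 5000
share = np.zeros(n_cells)
for _ in range(reps):
    assignment = rng.permutation(cells)   # randomize seeds to cells
    # Fraction of each cell's 6 seeds that are less viable in this draw.
    share += np.bincount(assignment, weights=v, minlength=n_cells) / 6.0
share /= reps
print(np.round(share, 3))                 # each entry should be near 10/90
```

Averaged over randomizations, every cell should receive the same share of less viable seeds (about 0.111), which is the sense in which assignment does not depend on V. Any single randomization can still be unbalanced by chance, which is why the slide says "likely".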
What about misspecifying the functional form of the model?
Not an issue in ANOVA of controlled, randomized experiments.
Notice: ANOVA model does not have to posit an explicit functional form between response and covariates because all covariates are categorized into finitely many, controlled factor levels.
Suppose Len is (positive, concave down) quadratically related to age. Then average treatment effects $\tau_{age}$ will increase quadratically over the 5 fixed ages since we estimate the average effect for each fixed age.
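The contrast between per-level means and an assumed functional form can be sketched as follows. The growth curve `true_mean`, the noise level, and the 200 plants per age are hypothetical choices for illustration (the example design has only 18 seeds per age).

```python
import numpy as np

rng = np.random.default_rng(8)
ages = np.array([5.0, 10.0, 15.0, 20.0, 25.0])

def true_mean(age):
    """Hypothetical concave-down quadratic growth curve (illustrative)."""
    return 0.4 * age - 0.008 * age**2

per_age = 200                                   # far more than 18 per age, to reduce noise
age = np.repeat(ages, per_age)
length = true_mean(age) + rng.normal(0.0, 0.1, size=age.size)

# ANOVA-style estimate: one mean per fixed age level, no functional form assumed.
cell_means = np.array([length[age == a].mean() for a in ages])

# Straight-line fit in age, for contrast.
slope, intercept = np.polyfit(age, length, 1)
line_at_ages = intercept + slope * ages

anova_error = np.abs(cell_means - true_mean(ages)).max()
line_error = np.abs(line_at_ages - true_mean(ages)).max()
print(round(anova_error, 3), round(line_error, 3))
```

The per-age means should track the quadratic curve closely at each fixed age, whereas the straight-line fit should miss it by a visibly larger margin, even though both use the same data.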
A return to controlled experiments
Contrast with observational protocol: if we cannot control the age of the plants, then we are forced to quantify the average effect of a function of age on response, e.g.
$$\mathrm{Len} = \beta_0 + \tau_{NO2} + \beta_{age} \cdot age + \beta_{NO2 \times age} \cdot \tau_{NO2} \cdot age + \varepsilon$$
Such a regression model assumes a linear relationship between age and response.
But since we have no control over age, we are forced to model all ages simultaneously; this is much harder to do than to simply calculate the average effect of age on response for a finite, fixed number of age categories.
A return to controlled experiments
A natural idea may be to simply ad hoc categorize age; i.e. we observe 90 plants in the wild with arbitrary ages, but then categorize age after the fact into 3 categories: 0–9 days, 10–19 days, 20–29 days.
But this only fixes the problem if sample units are exchangeable (over nitrogen treatment and all possible confounders) within each ad hoc age category.
However, is nitrogen level fixed in the wild? Probably not!
And older plants may be exposed to more light and water (otherwise the plants would die before reaching 10 days of age).
Therefore, in order to ensure exchangeability of sample units over treatments, we now have to account for these omitted variables, as well as the functional relationships between them, and between nitrogen... So we are back to our model misspecification issues.
Next time
The Neyman-Rubin causal model
Propensity scores