Transcript of Advanced Quantitative Research Methodology, Lecture: Overview of Matching for Causal Inference

Advanced Quantitative Research Methodology, Lecture Notes: Matching Methods for Causal Inference

Gary King
GaryKing.org
March 31, 2013

© Copyright 2013 Gary King, All Rights Reserved.

Overview

Problem: Model dependence (review)

Solution: Matching to preprocess data (review)

Problem: Many matching methods & specifications

Solution: The Space Graph helps us choose

Problem: The most commonly used method can increase imbalance!

Solution: Other methods do not share this problem

(Coarsened Exact Matching is simple, easy, and powerful)

Lots of insights revealed in the process



Model Dependence Example
Replication: Doyle and Sambanis, APSR 2000

Data: 124 post-World War II civil wars

Dependent variable: peacebuilding success

Treatment variable: multilateral UN peacekeeping intervention (0/1)

Control variables: war type, severity, duration; development status; etc.

Counterfactual question: UN intervention switched for each war

Data analysis: logit model

The question: How model dependent are the results?


Two Logit Models, Apparently Similar Results

                      Original "Interactive" Model        Modified Model
Variables             Coeff      SE      P-val      Coeff      SE      P-val
Wartype              −1.742    .609     .004       −1.666    .606     .006
Logdead               −.445    .126     .000        −.437    .125     .000
Wardur                 .006    .006     .258         .006    .006     .342
Factnum              −1.259    .703     .073       −1.045    .899     .245
Factnum2               .062    .065     .346         .032    .104     .756
Trnsfcap               .004    .002     .010         .004    .002     .017
Develop                .001    .000     .065         .001    .000     .068
Exp                  −6.016   3.071     .050       −6.215   3.065     .043
Decade                −.299    .169     .077        −.284    .169     .093
Treaty                2.124    .821     .010        2.126    .802     .008
UNOP4                 3.135   1.091     .004         .262   1.392     .851
Wardur*UNOP4            —        —        —          .037    .011     .001
Constant              8.609   2.157     .000        7.978   2.350     .000
N                       122                           122
Log-likelihood      −45.649                       −44.902
Pseudo R²              .423                          .433

Doyle and Sambanis: Model Dependence
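The two logit models above fit the sample almost identically, yet they can disagree sharply about the counterfactual question (UN intervention switched for each war). A minimal sketch of that check, using simulated placeholder data with the table's variable names (not the Doyle and Sambanis replication data):

```python
# Fit two logits differing only by one interaction term, then compare
# counterfactual predictions. Wide disagreement = model dependence.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 122
df = pd.DataFrame({
    "wardur": rng.integers(1, 600, n).astype(float),  # war duration (months)
    "logdead": rng.normal(9, 2, n),                   # log battle deaths
    "UNOP4": rng.integers(0, 2, n),                   # UN operation (0/1)
})
p = 1 / (1 + np.exp(-(-1 + 0.5 * df.UNOP4 - 0.1 * df.logdead
                      + 0.002 * df.wardur)))
df["success"] = rng.binomial(1, p)

m1 = smf.logit("success ~ wardur + logdead + UNOP4", df).fit(disp=0)
m2 = smf.logit("success ~ wardur + logdead + UNOP4 + wardur:UNOP4",
               df).fit(disp=0)

# Counterfactual data: switch the UN intervention for every war.
cf = df.assign(UNOP4=1 - df["UNOP4"])
# If the models were interchangeable, these predictions would agree.
print(np.abs(m1.predict(cf) - m2.predict(cf)).describe())
```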

Overview of Matching for Causal Inference

Goal: reduce model dependence

A nonparametric, non-model-based approach

Makes parametric models work better rather than substituting for them (i.e., matching is not an estimator; it's a preprocessing method)

Should have been called pruning (no bias is introduced if pruning is a function of T and X, but not Y)

Apply the model to the preprocessed (pruned) rather than the raw data

Violates the "more data is better" principle, but that only applies when you know the DGP

Overall idea:

If each treated unit exactly matches a control unit w.r.t. X, then: (1) the treated and control groups are identical, (2) X is no longer a confounder, and (3) there is no need to worry about the functional form (X̄_T − X̄_C is good enough).

If the treated and control groups are better balanced than when you started, due to pruning, model dependence is reduced. (A sketch of exact matching as pruning follows below.)

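A minimal sketch of the "overall idea" (not King's software): exact matching as pruning with pandas, followed by a difference in means. The column names (treat, outcome, covars) are placeholders.

```python
import pandas as pd

def exact_match_att(df: pd.DataFrame, treat: str, outcome: str,
                    covars: list[str]) -> float:
    """ATT from exact matching: prune strata lacking either group."""
    # Keep only covariate strata that contain both treated and control units.
    kept = df.groupby(covars).filter(lambda g: g[treat].nunique() == 2)
    # Within a stratum the groups are identical on X, so X cannot confound
    # and a difference in means suffices; weight strata by treated counts.
    stats = kept.groupby(covars).apply(
        lambda g: pd.Series({
            "te": g.loc[g[treat] == 1, outcome].mean()
                  - g.loc[g[treat] == 0, outcome].mean(),
            "n1": (g[treat] == 1).sum(),
        })
    )
    return float((stats["te"] * stats["n1"]).sum() / stats["n1"].sum())
```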


Model Dependence: A Simpler Example
(King and Zeng, 2006: fig. 4, Political Analysis)

What to do?

Preprocess I: Eliminate the extrapolation region

Preprocess II: Match (prune bad matches) within the interpolation region

Model remaining imbalance


Remove Extrapolation Region, then Match

Must remove data (selecting on X) to avoid extrapolation.

Options to find the "common support" of p(X | T = 1) and p(X | T = 0):

1. Exact match, so support is defined only at data points
2. Less but still conservative: the convex hull approach
   - let T* and X* denote subsets of T and X such that {1 − T*, X*} falls within the convex hull of {T, X}
   - use X* as the estimate of common support (deleting the remaining observations)
3. Other approaches, based on distance metrics, propensity scores, etc.
4. Easiest: Coarsened Exact Matching, no separate step needed (a sketch follows after this list)

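A minimal sketch of Coarsened Exact Matching, assuming simple equal-width bins (King's cem and MatchIt software choose coarsenings more carefully). Coarsening and then exact-matching on the bins prunes to common support and matches in one step, which is why no separate common-support step is needed:

```python
import pandas as pd

def cem_prune(df: pd.DataFrame, treat: str, covars: list[str],
              bins: int = 5) -> pd.DataFrame:
    """Return the rows retained by a basic CEM solution."""
    # Coarsen each covariate into equal-width bins (an assumption here).
    coarse = df[covars].apply(lambda c: pd.cut(c, bins=bins, labels=False))
    stratum = coarse.astype(str).agg("-".join, axis=1)  # coarsened stratum id
    # Keep only strata containing both treated and control units.
    return df.groupby(stratum).filter(lambda g: g[treat].nunique() == 2)
```

The retained data can then be passed to any estimator, weighting strata as in the exact-matching sketch earlier.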

Matching within the Interpolation Region
(Ho, Imai, King, Stuart, 2007: fig. 1, Political Analysis)

[Figure: scatterplot of Outcome (0-12) against Education (years, 12-28), built up over several slides: treated units (T) appear first, then control units (C), and matching prunes the control units far from any treated unit until only well-matched controls remain.]

Matching reduces model dependence, bias, and variance.

Empirical Illustration: Carpenter, AJPS, 2002

Hypothesis: Democratic Senate majorities slow FDA drug approval time

n = 408 new drugs (262 approved, 146 pending)

Lognormal survival model (a sketch of this setup follows below)

Seven oversight variables (median adjusted ADA scores for the House and Senate committees as well as for the House and Senate floors, Democratic majority in the House and Senate, and Democratic presidency)

18 control variables (clinical factors, firm characteristics, media variables, etc.)

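A minimal sketch of this modeling setup, not Carpenter's actual specification: a lognormal survival (accelerated failure time) model in which approved drugs are events and pending drugs are right-censored. The data here are simulated placeholders; lifelines is one Python library that provides this model.

```python
import numpy as np
import pandas as pd
from lifelines import LogNormalAFTFitter

rng = np.random.default_rng(0)
n = 408
dem_senate = rng.integers(0, 2, n)
# Simulated lognormal approval times; a Democratic Senate lengthens them.
months = np.exp(rng.normal(3.0 + 0.3 * dem_senate, 0.8, n))
approved = (months < 60).astype(int)   # pending drugs are censored at 60
months = np.minimum(months, 60)

df = pd.DataFrame({"months": months, "approved": approved,
                   "dem_senate": dem_senate})
aft = LogNormalAFTFitter()
aft.fit(df, duration_col="months", event_col="approved")
print(aft.summary)  # does dem_senate lengthen approval time?
```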

Evaluating Reduction in Model Dependence

Focus on the causal effect of a Democratic majority in the Senate (identified by Carpenter as not robust).

Omit post-treatment variables.

Use one-to-one nearest neighbor propensity score matching.

Discard 49 units (2 treated and 17 control units).

Run 262,143 possible specifications and calculate the ATE for each. (That is 2^18 − 1: one specification for every non-empty subset of the 18 control variables; a sketch follows below.)

Look at the variability in the ATE estimate across specifications.

(Normal applications would run only one or a small number of specifications.)

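A minimal sketch of the specification-robustness exercise: one regression per non-empty subset of the control variables, collecting the treatment-coefficient estimate each time. The DataFrame and column names are placeholders, not Carpenter's data, and a simple OLS coefficient stands in for the ATE estimator; running all 262,143 fits takes a while.

```python
from itertools import combinations
import numpy as np
import statsmodels.api as sm

def specification_curve(df, treat, outcome, controls):
    """Yield the treatment coefficient for every non-empty control subset."""
    for k in range(1, len(controls) + 1):
        for subset in combinations(controls, k):
            X = sm.add_constant(df[[treat, *subset]])
            yield sm.OLS(df[outcome], X).fit().params[treat]

# With 18 controls this yields 2**18 - 1 = 262,143 estimates, e.g.:
# effects = np.fromiter(
#     specification_curve(df, "dem_senate", "approval_time", controls),
#     dtype=float)
# print(effects.std(), np.percentile(effects, [2.5, 97.5]))
```

The spread of these estimates, before and after matching, is exactly what the next figure plots.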

Reducing Model Dependence

[Figure: density of the estimated effect across specifications (x-axis roughly −80 to −30) for the raw data and the matched data, with the point estimate of Carpenter's specification using the raw data marked; the matched-data distribution is far more concentrated.]

Figure: Histogram of estimated in-sample average treatment effect for the treated (ATT) of the Democratic Senate majority on FDA drug approval time across 262,143 specifications.

Another Example: Jeffrey Koch, AJPS, 2002

[Figure: density of the estimated average treatment effect (x-axis roughly −0.05 to 0.10) for the raw data and the matched data, with the point estimate from the raw data marked.]

Figure: Estimated effects of being a highly visible female Republican candidate across 63 possible specifications with the Koch data.

How Matching Works

Notation:
  Y_i  Dependent variable
  T_i  Treatment variable (0/1)
  X_i  Pre-treatment covariates

Treatment effect for treated (T_i = 1) observation i:

  TE_i = Y_i(T_i = 1) − Y_i(T_i = 0) = observed − unobserved

Estimate the unobserved Y_i(0) with Y_j from matched (X_i ≈ X_j) controls:

  Ŷ_i(0) = Y_j(0), or a model Ŷ_i(0) = ĝ_0(X_j)

Prune unmatched units to improve balance (so that X is unimportant)

QoI: the Sample Average Treatment effect on the Treated,

  SATT = (1/n_T) ∑_{i ∈ {T_i = 1}} TE_i

or the Feasible Sample Average Treatment effect on the Treated (FSATT). (A sketch of computing SATT from matched pairs follows below.)


Page 85: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Method 1: Mahalanobis Distance Matching

1 Preprocess (Matching)

Distance(X_i, X_j) = √[(X_i − X_j)′ S⁻¹ (X_i − X_j)]

Match each treated unit to the nearest control unit
Control units: not reused; pruned if unused
Prune matches if Distance > caliper

2 Estimation: Difference in means or a model
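A hedged sketch of the preprocessing step, assuming simulated covariates and a greedy (not optimal) matching order; real analyses would use dedicated matching software:

import numpy as np

rng = np.random.default_rng(0)
X_t = rng.normal(1.0, 1.0, size=(20, 2))   # treated covariates (simulated)
X_c = rng.normal(0.0, 1.0, size=(60, 2))   # control covariates (simulated)

S_inv = np.linalg.inv(np.cov(np.vstack([X_t, X_c]).T))  # inverse sample covariance

def mahalanobis(a, b):
    d = a - b
    return np.sqrt(d @ S_inv @ d)

caliper = 1.0                       # prune matches with Distance > caliper
available = set(range(len(X_c)))    # control units are not reused
matches = {}
for i, x in enumerate(X_t):
    if not available:
        break
    j = min(available, key=lambda j: mahalanobis(x, X_c[j]))
    if mahalanobis(x, X_c[j]) <= caliper:
        matches[i] = j
        available.remove(j)         # controls left in `available` are pruned

print(f"{len(matches)} of {len(X_t)} treated units matched")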

Mahalanobis Distance Matching

[Figure sequence: treated (T) and control (C) units plotted by Education (12–28 years) against Age (20–80); successive slides show the treated units, the control pool, the nearest-control matches, and the pruning of unmatched controls.]
Method 2: Propensity Score Matching

1 Preprocess (Matching)

Reduce the k elements of X to a scalar: π_i ≡ Pr(T_i = 1 | X) = 1 / (1 + e^{−X_i β})

Distance(X_i, X_j) = |π_i − π_j|
Match each treated unit to the nearest control unit
Control units: not reused; pruned if unused
Prune matches if Distance > caliper

2 Estimation: Difference in means or a model
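A minimal sketch of the two steps, assuming simulated data and scikit-learn's LogisticRegression as one way to estimate π_i; the matching order is greedy for simplicity:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                       # k = 4 covariates (simulated)
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))     # treatment depends on X

pi = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]  # pi_i = Pr(T_i = 1 | X)

treated = np.where(T == 1)[0]
controls = set(np.where(T == 0)[0])                 # control units are not reused
caliper = 0.05
matches = {}
for i in treated:
    if not controls:
        break
    j = min(controls, key=lambda j: abs(pi[i] - pi[j]))
    if abs(pi[i] - pi[j]) <= caliper:               # prune if Distance > caliper
        matches[i] = j
        controls.remove(j)

print(f"{len(matches)} matched pairs; unused controls are pruned")

Note that the match is made on the scalar π alone, which is exactly why two units with very different X values can end up paired.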

Propensity Score Matching

[Figure sequence: the same Education vs. Age scatter of treated (T) and control (C) units, projected onto a one-dimensional propensity score running from 0 to 1; matches are made on the score, units far apart in covariate space can be paired, and unmatched controls are pruned.]
Method 3: Coarsened Exact Matching

1 Preprocess (Matching)

Temporarily coarsen X as much as you're willing
e.g., Education (grade school, high school, college, graduate)
Easy to understand, or can be automated as for a histogram

Apply exact matching to the coarsened X, C(X)
Sort observations into strata, each with unique values of C(X)
Prune any stratum with 0 treated or 0 control units

Pass on the original (uncoarsened) units, except those pruned

2 Estimation: Difference in means or a model

Weight the controls in each stratum to equal the number of treated units (a sketch follows)
Can apply other matching methods within CEM strata (they inherit CEM's properties)
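A hedged sketch of the algorithm just listed, with hypothetical coarsening cutpoints and a simplified per-stratum weight (n_treated / n_control within each stratum, so the weighted controls sum to the stratum's treated count):

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
age = rng.integers(20, 80, size=300)     # simulated covariates
educ = rng.integers(12, 28, size=300)
T = rng.binomial(1, 0.3, size=300)

# Temporarily coarsen X: C(X) is the pair of bin indices
age_bins = [30, 40, 55, 65]              # hypothetical cutpoints
educ_bins = [16, 18, 22]                 # roughly HS / BA / MA / PhD
C = list(zip(np.digitize(age, age_bins), np.digitize(educ, educ_bins)))

strata = defaultdict(list)               # exact matching on C(X)
for idx, sig in enumerate(C):
    strata[sig].append(idx)

weights = np.zeros(300)
for members in strata.values():
    n_t = sum(T[m] for m in members)
    n_c = len(members) - n_t
    if n_t == 0 or n_c == 0:
        continue                         # prune stratum: no valid comparison
    for m in members:                    # pass on original (uncoarsened) units
        weights[m] = 1.0 if T[m] == 1 else n_t / n_c

print(f"units retained: {(weights > 0).sum()} of 300")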

Coarsened Exact Matching

[Figure sequence: the Education vs. Age scatter of treated (T) and control (C) units; Education is coarsened into HS / BA / MA / PhD / 2nd PhD and Age into "Drinking age", "Don't trust anyone over 30", "The Big 40", "Senior Discounts", "Retirement", and "Old"; strata with no treated or no control units are pruned, and the retained (uncoarsened) units are passed on.]
The Bias-Variance Trade Off in Matching

Bias (& model dependence) = f(imbalance, importance, estimator), so we measure imbalance instead

Variance = f(matched sample size, estimator), so we measure matched sample size instead

Bias-variance trade off ⇒ imbalance-n trade off

Measuring Imbalance

Classic measure: Difference of means (for each variable)
Better measure: Difference of multivariate histograms,

L1(f, g; H) = (1/2) ∑_{ℓ1···ℓk ∈ H(X)} |f_{ℓ1···ℓk} − g_{ℓ1···ℓk}|
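A minimal sketch of this L1 measure, assuming the multivariate binning H is supplied by hand: bin treated and control units on the same coarsening, normalize each group's histogram, and sum half the absolute cell-by-cell differences, so 0 means identical histograms and 1 means no overlap:

import numpy as np
from collections import Counter

def L1(X_t, X_c, bin_edges):
    # cell signature (l_1, ..., l_k) for every unit
    sig = lambda X: [tuple(np.digitize(x, e) for x, e in zip(row, bin_edges))
                     for row in X]
    f = Counter(sig(X_t))          # treated cell counts
    g = Counter(sig(X_c))          # control cell counts
    cells = set(f) | set(g)
    return 0.5 * sum(abs(f[c] / len(X_t) - g[c] / len(X_c)) for c in cells)

rng = np.random.default_rng(0)
X_t = rng.normal(1, 1, size=(50, 2))     # simulated treated covariates
X_c = rng.normal(0, 1, size=(200, 2))    # simulated control covariates
edges = [np.linspace(-3, 4, 8), np.linspace(-3, 4, 8)]   # hypothetical H
print(L1(X_t, X_c, edges))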

Comparing Matching Methods

Standard approach

MDM & PSM: Choose matched n, match, check imbalance
CEM: Choose imbalance, match, check matched n
Best practice: iterate (Ugh!)
Choose a matched solution, and the choice of matching method becomes irrelevant

An alternative approach

Compute lots of matching solutions,
Identify the frontier of lowest imbalance for each given n, and
Choose a matching solution among those on the frontier (a sketch of the frontier step follows)
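A sketch of the frontier step only, under the assumption that many candidate solutions have already been scored as (matched n, imbalance) pairs by any metric, such as the L1 function sketched earlier; the pairs below are purely illustrative:

import numpy as np

def frontier(solutions):
    """solutions: list of (n, imbalance). Keep, from the full sample down,
    each solution whose imbalance beats every larger-n solution."""
    best = {}
    for n, imb in solutions:
        best[n] = min(imb, best.get(n, np.inf))
    pts, low = [], np.inf
    for n in sorted(best, reverse=True):
        if best[n] < low:
            low = best[n]
            pts.append((n, low))
    return pts

sols = [(100, 0.8), (80, 0.6), (80, 0.7), (60, 0.65), (40, 0.3), (20, 0.35)]
print(frontier(sols))   # [(100, 0.8), (80, 0.6), (40, 0.3)]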

A Space Graph: Real Data (King, Nielsen, Coberley, Pope, and Wells, 2011)

[Figure sequence: space graphs for the Healthways, Called/Not Called, FDA, and Lalonde-subset data, plotting the n of the matched sample ("variance", shrinking toward the right) against L1 imbalance ("bias") for the raw data and for Random Pruning, PSM, MDM, and CEM solutions.]
Space Graphs: Different Imbalance Metrics

[Figure: space graphs for the Aid Shocks data under three imbalance metrics (L1, average difference in means, and average Mahalanobis discrepancy), each marking the published PSM solution and the published PSM solution with a 1/4 sd caliper.]
PSM Approximates Random Matching in Balanced Data

[Figure: units on two well-balanced covariates (each on a −2 to 2 scale); CEM and MDM match nearest neighbors, while PSM pairs units that are far apart in covariate space.]
Destroying CEM with PSM's Two Step Approach

[Figure: the same covariate space; CEM matches pair nearby units, while CEM-generated PSM matches scramble those pairs across the space.]
Pause for Conclusions

Propensity score matching:

The problem:
Imbalance can be worse than in the original data
Can increase imbalance when removing the worst matches
Approximates random matching in well-balanced data
(Random matching increases imbalance)

The cause: unnecessary first-stage dimension reduction

Implications:
Balance checking is required
Adjusting for potentially irrelevant covariates with PSM is a mistake
Adjusting experimental data with PSM is a mistake
Reestimating the propensity score after eliminating noncommon support may be a mistake

CEM and Mahalanobis do not have PSM's problems
CEM > Mahalanobis > Propensity Score (in many data sets and sims)
(Your performance may vary)
You can easily check with the Space Graph
CEM is the easiest and most powerful; let's look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 157: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:

The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 158: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 159: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original data

Can increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 160: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matches

Approximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 161: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 162: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reduction

Implications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 163: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking requiredAdjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66

Page 164: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

Pause for Conclusions

Propensity score matching:The problem:

Imbalance can be worse than original dataCan increase imbalance when removing the worst matchesApproximates random matching in well-balanced data(Random matching increases imbalance)

The Cause: unnecessary 1st stage dimension reductionImplications:

Balance checking required

Adjusting for potentially irrelevant covariates with PSM is a mistakeAdjusting experimental data with PSM is a mistakeReestimating the propensity score after eliminating noncommonsupport may be a mistake

CEM and Mahalanobis do not have PSM’s problemsCEM > Mahalanobis > Propensity Score (in many data sets andsims)(Your performance may vary)You can easily check with the Space GraphCEM is the easiest and most powerful; let’s look more deeply. . .

Gary King (Harvard, IQSS) 57 / 66


Problems With Matching Methods (other than CEM)

Don't eliminate the extrapolation region

Don't work with multiply imputed data

Most violate the congruence principle

Largest class of matching methods (EPBR, e.g., propensity scores, Mahalanobis distance): requires normal data (or DMPES); all X's must have the same effect on Y; Y must be a linear function of X; aims only for expected (not in-sample) imbalance; in practice, we're lucky if mean imbalance is reduced

Not well designed for observational data:

Least important (variance): matched n chosen ex ante

Most important (bias): imbalance reduction checked ex post

Hard to use: improving balance on one variable can reduce it on others

Best practice: choose n, match, check; tweak, match, check; tweak, match, check; ... (see the loop sketch after this slide)

Actual practice: choose n, match, publish, STOP. (Is balance even improved?)

Gary King (Harvard, IQSS) 58 / 66
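A hedged sketch of the match-check-tweak loop the slide recommends, using a greedy 1:1 nearest-neighbor match within a caliper on a toy score. The score, the caliper grid, and the greedy pairing rule are all illustrative assumptions, not the procedure of any particular package.

```python
# Match-check-tweak loop sketch (toy data; greedy 1:1 caliper matching).
import numpy as np

rng = np.random.default_rng(1)
score_t = rng.beta(4, 2, size=100)   # hypothetical scores, treated units
score_c = rng.beta(2, 4, size=400)   # hypothetical scores, control units

def match(caliper: float) -> list[tuple[int, int]]:
    """Greedily pair each treated unit with the nearest unused control
    inside the caliper; treated units with no such control are dropped."""
    used: set[int] = set()
    pairs = []
    for i, s in enumerate(score_t):
        dist = np.abs(score_c - s)
        dist[list(used)] = np.inf          # each control used at most once
        j = int(np.argmin(dist))
        if dist[j] <= caliper:
            pairs.append((i, j))
            used.add(j)
    return pairs

def check(pairs: list[tuple[int, int]]) -> float:
    """Mean within-pair distance; in real work you would check covariate
    balance here, not just distance on the matching score."""
    if not pairs:
        return float("inf")
    return float(np.mean([abs(score_t[i] - score_c[j]) for i, j in pairs]))

for caliper in (0.20, 0.10, 0.05, 0.02):   # "tweak"
    pairs = match(caliper)                  # "match"
    print(f"caliper={caliper:.2f}  matched n={len(pairs)}  "
          f"imbalance={check(pairs):.4f}")  # "check"
```

Tightening the caliper reduces imbalance but shrinks the matched n, which is exactly the tension the slide points to: n cannot sensibly be fixed ex ante.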


CEM as an MIB Method

Coarsening determines the level of imbalance (a code sketch follows this slide)

Convenient monotonicity property: reducing maximum imbalance on one X has no effect on the others

We Prove: setting ε bounds the treated-control group difference, within strata and globally, for means, variances, skewness, covariances, comoments, coskewness, cokurtosis, quantiles, and the full multivariate histogram. ⇒ Setting ε controls all multivariate treatment-control differences, interactions, and nonlinearities, up to the chosen level (the matched n is determined ex post)
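For the simplest entry on that list, the mean, the bound is easy to state: within any CEM stratum s, treated and control values of covariate X_j lie in the same bin of width ε_j, so the within-stratum means differ by at most ε_j, and under the usual CEM weighting the global difference is a convex combination of within-stratum ones. A one-line paraphrase of the slide's claim, for the mean only:

\[
\left|\bar{X}_j^{\,t}(s) - \bar{X}_j^{\,c}(s)\right| \le \epsilon_j \ \text{ for every stratum } s
\quad\Longrightarrow\quad
\left|\bar{X}_j^{\,t} - \bar{X}_j^{\,c}\right| \le \epsilon_j .
\]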

What if coarsening is set...

too coarse? You're left modeling the remaining imbalances

not coarse enough? n may be too small

as coarse as you're comfortable with, but n is still too small? No magic method of matching can save you; you're stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66
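A minimal CEM sketch, assuming two continuous covariates and a single fixed bin width ε for both (real coarsenings can differ per variable and need not be fixed-width). The data and names are toy; this is not the cem or MatchIt implementation.

```python
# Coarsened exact matching sketch: bin, stratify, prune, count.
from collections import defaultdict
import numpy as np

rng = np.random.default_rng(2)
n = 1000
T = rng.integers(0, 2, size=n)                                # treatment indicator
X = rng.normal(loc=0.5 * T[:, None], scale=1.0, size=(n, 2))  # two covariates

eps = 0.5                                   # coarsening width (the slide's epsilon)
cells: dict[tuple, dict[str, list[int]]] = defaultdict(lambda: {"t": [], "c": []})
for i in range(n):
    key = tuple(np.floor(X[i] / eps).astype(int))   # coarsened stratum id
    cells[key]["t" if T[i] == 1 else "c"].append(i)

# Exact-match on the coarsened strata: keep only cells with both groups;
# every unit in a cell missing one group is pruned.
matched = {k: v for k, v in cells.items() if v["t"] and v["c"]}
kept = sum(len(v["t"]) + len(v["c"]) for v in matched.values())
print(f"strata retained: {len(matched)}   units retained: {kept} of {n}")
# The matched n falls out ex post, as the slide says: a finer eps tightens
# the imbalance bound but retains fewer strata (and hence fewer units).
```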

Page 208: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for:

means, variances, skewness,covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 209: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for:

means, variances, skewness,covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 210: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for:

means, variances, skewness,covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 211: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means,

variances, skewness,covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 212: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances,

skewness,covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 213: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances, skewness,

covariances, comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 214: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances, skewness,covariances,

comoments, coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 215: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances, skewness,covariances, comoments,

coskewness, co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 216: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances, skewness,covariances, comoments, coskewness,

co-kurtosis, quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66

Page 217: Advanced Quantitative Research Methodology, Lecture · Overview of Matching for Causal Inference Goal: reduce model dependence A nonparametric, non-model-based approach Makes parametric

CEM as an MIB Method

Coarsening determines the level of imbalance

Convenient monotonicity property: Reducing maximum imbalance onone X : no effect on others

We Prove: setting ε bounds the treated-control group difference,within strata and globally, for: means, variances, skewness,covariances, comoments, coskewness, co-kurtosis,

quantiles, and fullmultivariate histogram.=⇒ Setting ε controls all multivariate treatment-control differences,interactions, and nonlinearities, up to the chosen level (matched n isdetermined ex post)

What if coarsening is set . . .

too coarse?

You’re left modeling remaining imbalances

not coarse enough?

n may be too small

as large as you’re comfortable with, but n is still too small?

No magic method of matching can save you; You’re stuck modeling or collecting better data

Gary King (Harvard, IQSS) 59 / 66


End of planned slides for today; others follow

Gary King (Harvard, IQSS) 60 / 66


Other CEM properties we prove

Automatically eliminates extrapolation region (no separate step)

Bounds model dependence

Bounds causal effect estimation error

Meets the congruence principle

The principle: data space = analysis space. Estimators that violate it are nonrobust and counterintuitive.

CEM: ε_j is set using each variable's own units. E.g., calipers (strata centered on each unit) would bin a college dropout with a 1st-year grad student, yet would not bin Bill Gates with Warren Buffett.

Approximate invariance to measurement error:

                  CEM   pscore   Mahalanobis   Genetic
% Common Units   96.5     70.2          80.9      80.0

Fast and memory-efficient even for large n; can be fully automated

Simple to teach: coarsen, then exact match

Gary King (Harvard, IQSS) 61 / 66
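
Because each ε_j lives in the variable's own units, the analyst can supply cutpoints with substantive meaning directly. A hedged sketch of that idea; the cutpoints and data below are invented for illustration, and this is not the interface of the R/Stata cem software:

```python
import pandas as pd

# Hypothetical cutpoints chosen in each variable's natural units
cutpoints = {
    "educ":   [0, 12, 16, 18, 30],         # < HS, HS, BA, grad school
    "income": [0, 3e4, 1e5, 1e6, 1e12],    # coarse income bands
}

def coarsen(df, cutpoints):
    """Replace each variable by the index of the user-chosen bin it falls in."""
    return pd.DataFrame({
        col: pd.cut(df[col], bins=edges, labels=False, include_lowest=True)
        for col, edges in cutpoints.items()
    })

df = pd.DataFrame({"educ": [11, 17, 12, 16], "income": [2e4, 4e4, 9e9, 8e9]})
print(coarsen(df, cutpoints))
# A college dropout (educ 11) and a 1st-year grad student (educ 17) never
# share an educ bin, while Gates-and-Buffett-scale incomes do share the top
# income band: the analyst controls which distinctions matter, in the data's
# own units, which is what the congruence principle asks for.
```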


Imbalance Measures

Variable-by-Variable Difference in Global Means:

I_1^{(j)} = \left| \bar{X}^{(j)}_{m_T} - \bar{X}^{(j)}_{m_C} \right|, \quad j = 1, \dots, k

Multivariate Imbalance: difference in histograms (bins fixed ex ante):

L_1(f, g) = \sum_{\ell_1 \cdots \ell_k} \left| f_{\ell_1 \cdots \ell_k} - g_{\ell_1 \cdots \ell_k} \right|

Local Imbalance by Variable (given strata fixed by the matching method):

I_2^{(j)} = \frac{1}{S} \sum_{s=1}^{S} \left| \bar{X}^{(j)}_{m^s_T} - \bar{X}^{(j)}_{m^s_C} \right|, \quad j = 1, \dots, k

Gary King (Harvard, IQSS) 62 / 66
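
The three measures transcribe directly into code. A sketch in Python; the equal-width ex-ante binning used for L_1 is an assumption of this sketch, as is the stratum representation for I_2:

```python
import numpy as np

def I1(XT, XC):
    """Variable-by-variable absolute difference in global means."""
    return np.abs(XT.mean(axis=0) - XC.mean(axis=0))

def L1(XT, XC, bins=5):
    """Multivariate L1: sum of absolute differences between the treated and
    control multivariate histograms, on bins fixed ex ante (equal-width here).
    Note: published versions often halve this so it lies in [0, 1]."""
    lo = np.minimum(XT.min(axis=0), XC.min(axis=0))
    hi = np.maximum(XT.max(axis=0), XC.max(axis=0))
    edges = [np.linspace(l, h, bins + 1) for l, h in zip(lo, hi)]
    fT, _ = np.histogramdd(XT, bins=edges)
    fC, _ = np.histogramdd(XC, bins=edges)
    f, g = fT / len(XT), fC / len(XC)     # relative frequencies
    return np.abs(f - g).sum()

def I2(strata):
    """Local imbalance: average over strata of |treated mean - control mean|,
    given strata produced by a matching method. `strata` is an iterable of
    (XT_s, XC_s) covariate arrays, one pair per stratum."""
    diffs = [np.abs(XT_s.mean(axis=0) - XC_s.mean(axis=0))
             for XT_s, XC_s in strata]
    return np.mean(diffs, axis=0)
```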


CEM in Practice: EPBR-Compliant Data

Monte Carlo: X_T ∼ N_5(0, Σ) and X_C ∼ N_5(1, Σ); n = 2,000; reps = 5,000. Allow MAH & PSC to match with replacement; use automated CEM.

Difference in means (I_1):

          X1     X2     X3     X4     X5   Seconds
initial  1.00   1.00   1.00   1.00   1.00
MAH       .20    .20    .20    .20    .20      .28
PSC       .11    .06    .03    .06    .03      .16
CEM       .04    .02    .06    .06    .04      .08

Local (I_2) and multivariate (L_1) imbalance:

          X1     X2     X3     X4     X5     L1
initial                                     1.24
PSC      2.38   1.25    .74   1.25    .74   1.18
MAH       .56    .36    .29    .36    .29   1.13
CEM       .42    .26    .17    .22    .19    .78

CEM dominates the EPBR methods even in EPBR-compliant data

Gary King (Harvard, IQSS) 63 / 66
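
A skeleton of this simulation in Python, reusing cem_match from the earlier sketch and reporting I_1 only. This is not the original replication code: Σ, the bin count, and the sample size here are placeholders, and the MAH/PSC arms are omitted:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
k, n = 5, 1000
Sigma = np.eye(k)            # placeholder covariance; the slide's Σ is unspecified

XT = rng.multivariate_normal(np.zeros(k), Sigma, size=n)   # treated
XC = rng.multivariate_normal(np.ones(k), Sigma, size=n)    # control

X = pd.DataFrame(np.vstack([XT, XC]), columns=[f"X{j+1}" for j in range(k)])
treat = pd.Series([1] * n + [0] * n)

matched = cem_match(X, treat, n_bins=4)     # cem_match from the earlier sketch
mT = X[matched & (treat == 1)].mean()
mC = X[matched & (treat == 0)].mean()
print("initial I1:", np.abs(XT.mean(axis=0) - XC.mean(axis=0)).round(2))
print("matched n: ", int(matched.sum()))    # determined ex post, as above
print("CEM I1:    ", (mT - mC).abs().round(2).to_numpy())
```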


CEM in Practice: Non-EPBR Data

Monte Carlo: Exact replication of Diamond and Sekhon (2005), using data from Dehejia and Wahba (1999). CEM coarsening automated.

           BIAS      SD     RMSE   Seconds     L1
initial  −423.7  1566.5   1622.6      .00    1.28
MAH       784.8   737.9   1077.2      .03    1.08
PSC       260.5  1025.8   1058.4      .02    1.23
GEN        78.3   499.5    505.6    27.38    1.12
CEM         .8    111.4    111.4      .03     .76

CEM works well in non-EPBR data too

Gary King (Harvard, IQSS) 64 / 66


CEM Extensions I

CEM and Multiple Imputation for Missing Data

1. Put each missing observation in the stratum where the plurality of its imputations fall
2. Pass the uncoarsened imputations on to the analysis stage
3. Use the usual MI combining rules to analyze

Multicategory Treatments: No modification necessary; keep all strata with ≥ 1 unit having each value of T (L_1 is the max difference across treatment groups)

Continuous Treatments: Coarsen treatment and apply CEM as usual

Blocking in Randomized Experiments: no modification needed; randomly assign T within CEM strata

Automating user choices: histogram bin-size calculations, estimated SATT error bound, progressive coarsening

Detecting Extreme Counterfactuals

Gary King (Harvard, IQSS) 65 / 66
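
Two of these extensions are mechanical enough to sketch in Python. Both assume strata are already represented as hashable labels; the function names and the half-and-half assignment rule are illustrative, not the cem software's API:

```python
import numpy as np
import pandas as pd
from collections import Counter

def assign_stratum(imputed_strata):
    """MI step 1: place a unit with missing covariates in the stratum where
    the plurality of its m imputed, coarsened versions fall."""
    return Counter(imputed_strata).most_common(1)[0][0]

def block_randomize(strata, rng=None):
    """Blocking: within each pre-formed CEM stratum, shuffle the units and
    assign treatment to (roughly) half of them."""
    if rng is None:
        rng = np.random.default_rng()
    T = pd.Series(0, index=strata.index)
    for _, idx in strata.groupby(strata).groups.items():
        shuffled = rng.permutation(np.asarray(idx))
        T.loc[shuffled[: len(shuffled) // 2]] = 1
    return T

# one unit imputed m = 5 times; its coarsened signature under each imputation:
print(assign_stratum([(1, 2), (1, 2), (0, 2), (1, 2), (1, 1)]))   # -> (1, 2)
print(block_randomize(pd.Series(list("aaabbcccc")),
                      np.random.default_rng(1)).tolist())
```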


For papers, software (for R and Stata), tutorials, etc.

http://GKing.Harvard.edu/cem

Gary King (Harvard, IQSS) 66 / 66