Propensity score matching for simple and clustered...

Post on 07-Sep-2018


Propensity score matching for simple and clustered data using SPSS and R

Felix Thoemmes & Wang Liao

Support provided to first author by IES grant “Matching Strategies for Observational Studies with Multilevel Data in Educational Research”

Increasing use of propensity scores

[Figure: chart covering the years 1983 to 2011, with a vertical axis running from 0 to 10000. Source: Web of Science]

Propensity scores

e(x) = p(z = 1 | x)

e(x) = propensity score
p = probability
z = treatment assignment (1 = treatment group, 0 = control group)
| = conditional on
x = vector of covariates

Propensity scores

A single number summary based on all available covariates that expresses the probability that a given subject is assigned to the treatment condition, based on the values of the set of observed covariates:

e(x) = p(z = 1 | x)

[Figure: probability of receiving treatment (vertical axis) by actual assignment, Control vs. Treatment (horizontal axis)]

Why propensity scores?

• Is there anything that we can do with propensity scores that we cannot do with multiple regression?

Comparison

Propensity scores | Regression adjustment
Tool to strengthen causal conclusions | Tool to strengthen causal conclusions
Models relationship between confounders and treatment | Models relationship between confounders and outcome
Specification of functional form can be checked via balance measures | Specification of functional form can be checked via examination of residuals
Easy assessment of overlap – little potential for extrapolation | Overlap is assessed in multi-dimensional space – often extrapolated
No routine assumptions about linearity and interactions | Classic ANCOVA assumes linearity and absence of interaction, but this can be relaxed
Outcome variable unknown | Outcome variable part of the model
Sample size can be diminished through matching, loss of power | Sample size stays constant, power can increase due to covariates
Causal effect for treated, untreated, local comparison | Causal effect extrapolated to population

Steps: Selection, Estimation, Conditioning, Model Checks, Effect Estimation

Selection

• Selection of covariates is the single most important aspect to ensure unbiasedness of the causal effect

• Debate in the literature (see Rubin, Pearl, 2009, Statistics in Medicine) on how to select covariates

• Include variables that are confounders (based on your theoretical background knowledge)

• Exclude variables that are affected by the treatment (potential mediators)

• Exclude variables that are instrumental variables

• Exclude variables that are collider variables and induce dependencies

• Correlational evidence as a basis for variable selection can mislead

Estimation

• Traditionally estimated using logistic regression

• May require iterative respecification of the model (e.g., adding interactions or polynomial terms) until balance is acceptable

• Data mining approaches offer some promise

• Covariate-balancing propensity score (K. Imai)
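As a rough illustration of the traditional approach, the sketch below fits a logistic regression of treatment on one covariate by plain gradient ascent and returns the fitted probabilities as propensity scores. The function names, the toy data, the learning rate, and the iteration count are all assumptions made for this example; in practice one would use an established routine such as those wrapped by MatchIt.

```python
import math

def fit_logistic(X, z, lr=0.1, n_iter=5000):
    """Fit a logistic regression by batch gradient ascent.

    X: list of covariate rows, z: list of 0/1 treatment indicators.
    Returns [intercept, slope_1, ..., slope_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))  # observed minus fitted
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, xi):
    """Estimated e(x) = p(z = 1 | x) for one covariate row."""
    eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical toy data: treatment becomes more likely as x grows.
X = [[0.0], [0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5]]
z = [0, 0, 1, 0, 1, 0, 1, 1]

beta = fit_logistic(X, z)
scores = [propensity(beta, xi) for xi in X]
```

The resulting `scores` are what the later conditioning steps (matching, stratification, weighting) operate on.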

Conditioning

Matching can be done in many different ways:

• 1:1, 1:k nearest neighbor matching

• 1:1, 1:k optimal matching

• k:k full matching

• Kernel matching

• Synthetic matching
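To make the simplest of these concrete, here is a minimal sketch of greedy 1:1 nearest neighbor matching on the propensity score, without replacement and with an optional caliper. The function name, the data layout (dicts of unit id to score), and the choice to process treated units from the highest score downward are assumptions for illustration, not a description of any particular package.

```python
def nearest_neighbor_match(treated, controls, caliper=None):
    """Greedy 1:1 nearest neighbor matching without replacement.

    treated, controls: dicts mapping unit id -> estimated propensity score.
    caliper: maximum allowed score distance (None = no limit).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # controls not yet used
    pairs = []
    # Assumed ordering: treated units with the largest scores are matched first.
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if caliper is not None and abs(available[c_id] - t_ps) > caliper:
            continue  # no acceptable control; this treated unit stays unmatched
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs

# Hypothetical scores for three treated and four control units.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.75, "c2": 0.50, "c3": 0.35, "c4": 0.10}
pairs = nearest_neighbor_match(treated, controls, caliper=0.10)
```

Because the procedure is greedy, the result can depend on the order in which treated units are processed; optimal matching avoids this by minimizing the total distance over all pairs.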

Conditioning

Other approaches include:

• Stratification (form subclasses based on estimated propensity score)

• Weighting (use propensity score to construct weights that balance groups)

• Regression adjustment (use propensity score as a covariate)
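The first two alternatives can be sketched in a few lines. The weights below follow the usual inverse-probability formulas (for the ATE, 1/e for treated and 1/(1 − e) for controls; for the ATT, 1 for treated and e/(1 − e) for controls), and the subclasses are formed by rank so that strata are of roughly equal size. Function names and the toy values are assumptions for this example.

```python
def ipw_weights(z, ps, estimand="ATE"):
    """Inverse-probability weights from treatment indicators z and scores ps."""
    weights = []
    for zi, e in zip(z, ps):
        if estimand == "ATE":
            w = 1.0 / e if zi == 1 else 1.0 / (1.0 - e)
        elif estimand == "ATT":
            w = 1.0 if zi == 1 else e / (1.0 - e)
        else:
            raise ValueError("estimand must be 'ATE' or 'ATT'")
        weights.append(w)
    return weights

def subclasses(ps, n_strata=5):
    """Assign each unit to one of n_strata rank-based strata (0 = lowest scores)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(ps)
    return strata

# Hypothetical data: two treated and two control units.
z = [1, 0, 1, 0]
ps = [0.8, 0.2, 0.6, 0.4]

w_ate = ipw_weights(z, ps, "ATE")
w_att = ipw_weights(z, ps, "ATT")
strata = subclasses(ps, n_strata=2)
```

Scores very close to 0 or 1 produce extreme weights, which is one reason overlap is checked before weighting.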

Model Checks

• Check of covariate balance

– standardized difference of covariates (and squares, interactions)

– various diagnostic graphs

• Region of common support (distributional overlap)

– graphical assessment (e.g., histograms)
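A standardized difference can be computed as sketched below. This version divides the mean difference by the pooled standard deviation of the two groups, which is one common definition; software packages differ in the denominator they use (for example, the treated-group standard deviation), so treat the formula and the function name as assumptions of this example.

```python
import math
from statistics import mean, variance

def standardized_difference(x_treated, x_control):
    """(mean_t - mean_c) / sqrt((var_t + var_c) / 2) for one covariate."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical covariate values in a matched sample.
x_t = [2.0, 4.0, 6.0]
x_c = [1.0, 3.0, 5.0]

d = standardized_difference(x_t, x_c)

# The same check applied to the squared covariate, as the slide suggests.
d_sq = standardized_difference([v * v for v in x_t], [v * v for v in x_c])
```

Values of `d` near zero indicate that the groups are balanced on that covariate; the check is repeated for every covariate and, where relevant, for squares and interactions.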

Effect Estimation

• Estimate of treatment effect

– Mean difference

– Standard error depends on the conditioning scheme
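For the special case of 1:1 matched pairs, the mean difference and a paired standard error can be sketched as follows. The paired standard error is an assumption that suits this particular conditioning scheme only; weighting, stratification, or matching with replacement call for different standard errors. The function name and outcome values are hypothetical.

```python
import math
from statistics import mean, stdev

def matched_pair_effect(pairs):
    """Mean difference and paired standard error for 1:1 matched outcomes.

    pairs: list of (y_treated, y_control) tuples, one per matched pair.
    """
    diffs = [y_t - y_c for y_t, y_c in pairs]
    estimate = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))
    return estimate, se

# Hypothetical outcomes for four matched pairs.
pairs = [(12.0, 10.0), (15.0, 14.0), (11.0, 8.0), (14.0, 12.0)]
estimate, se = matched_pair_effect(pairs)
```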

Propensity scores in R and SPSS

• The R package “MatchIt” from Ho et al. performs a wide variety of these tasks

• “PSM for SPSS” is an SPSS implementation of MatchIt and several other R packages (e.g., “RItools”, “cem”, “optmatch”)

MatchIt in R

• Offers various ways to estimate the propensity score (including generalized additive models)

• Offers various ways to match (including full matching, nearest neighbor matching, exact matching)

• Offers various ways to fine-tune the matching (caliper, discarding of units outside region of overlap)

• Provides output of

– Balance table (standardized difference)

– Diagnostic balance plots

PSM in SPSS

• Offers most (but not all) of the features of MatchIt

• In addition

– Reports Hansen & Bowers overall chi-square test of balance

– Reports King’s multivariate imbalance measure

– Supports multi-level data (fixed and random effects models)

Multi-level data

• Selection is on level 1, unit of analysis is on level 1, but clustering is present

SPSS PSM

• Options to estimate fixed effects models or random effects models (user defines which slopes should be random)

• PS estimated based on a model that allows intercepts and slopes to be estimated as random effects (allowed to vary across clusters)

• Conditioning within clusters (CWC), conditioning across clusters (CAC)

• Flexible PS MLM modeling choices

• Balance checks within clusters and globally
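The difference between conditioning within and across clusters can be illustrated with a small sketch of the within-cluster case: each treated unit may only be matched to a control from its own cluster. This is a hypothetical illustration of the idea, not the algorithm used by the SPSS plugin; the function name, the record layout, and the caliper value are assumptions.

```python
from collections import defaultdict

def match_within_clusters(units, caliper=0.1):
    """Greedy 1:1 matching on the propensity score, restricted to each cluster.

    units: list of dicts with keys 'id', 'cluster', 'treat' (0/1), 'ps'.
    Returns a list of (treated_id, control_id) pairs.
    """
    by_cluster = defaultdict(list)
    for u in units:
        by_cluster[u["cluster"]].append(u)

    pairs = []
    for members in by_cluster.values():
        controls = [u for u in members if u["treat"] == 0]
        for t in (u for u in members if u["treat"] == 1):
            if not controls:
                break
            best = min(controls, key=lambda c: abs(c["ps"] - t["ps"]))
            if abs(best["ps"] - t["ps"]) <= caliper:
                pairs.append((t["id"], best["id"]))
                controls.remove(best)  # match without replacement
    return pairs

# Hypothetical data: unit b1 has a close control only in the other cluster.
units = [
    {"id": "a1", "cluster": "A", "treat": 1, "ps": 0.60},
    {"id": "a2", "cluster": "A", "treat": 0, "ps": 0.58},
    {"id": "a3", "cluster": "A", "treat": 0, "ps": 0.69},
    {"id": "b1", "cluster": "B", "treat": 1, "ps": 0.70},
    {"id": "b2", "cluster": "B", "treat": 0, "ps": 0.30},
]
pairs = match_within_clusters(units, caliper=0.1)
```

In this toy data `b1` stays unmatched under within-cluster conditioning, whereas conditioning across clusters could pair it with `a3`; that trade-off between sample retention and cluster-level comparability is what the CWC/CAC choice controls.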

formula

treat ~ x1 + x2 + x3 + x1:x2 + I(x1^2)

method

nearest, full, optimal, genetic, exact, subclass