
Causal Inference in Clinical Trials

Lingling Li

Department of Ambulatory Care and Prevention, Harvard Medical School

June 11, 2008

Li, L. (Institute) June 11, 2008 1 / 76

Motivating Example I

The Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT)

Examine the effect of the use of right heart catheterization (RHC) on survival time up to 30 days

Observational study with 5735 critically ill adult patients

2184 patients received RHC

Motivating Example I

For those 5735 patients, data available on

A dichotomous treatment indicator variable A

A = 1 indicates receiving RHC during the initial 24 hours of an ICU stay (2184 patients)

Response variable Y: survival time up to 30 days

A vector of baseline covariates L including age, years of education, income, weight, blood pressure, etc.

Motivating Example I

Confounding: the decision to use or withhold RHC was left to the discretion of the physician, based on patient characteristics

Analyzed using different methods in several published papers (Connors et al., 1996; Lin et al., 1998; Tan, 2006)

Lack of evidence suggesting beneficial effects of RHC on 30-day survival

The appearance of a harmful effect could be explained away by unmeasured confounding

Motivating Example II

The AIDS Clinical Trial Group (ACTG) study 021, a double-blind randomized clinical trial

Compare the effect of bactrim versus aerosolized pentamidine (AP) as prophylaxis therapy for pneumocystis pneumonia (PCP) in AIDS patients

Primary endpoint: time to recurrent PCP

Crossover to the other treatment allowed if PCP developed

Secondary endpoint: time to death

Motivating Example II

310 patients accrued; 94 died; the remaining 216 patients were either alive at the end of the trial or had dropped out during it

Of the 94 deaths, 21 crossed over and 37 stopped all prophylactic therapy (21 for non-medical reasons and 16 for medical indications)

Suppose we are interested in the survival status at the end of the trial:

Y = 1 if dead at the end of the trial

Y = 0 if alive at the end of the trial

Motivating Example II

Crossover to the other treatment arm (nonrandom nonadherence)

Informative drop-out due to medical indications

Standard methods: intent-to-treat analysis, per-protocol/as-treated analysis

Outline

Introduction to Causality

Confounding in Observational Studies

Dependent Censoring in Randomized Trials

Sensitivity Analysis for Nonignorable Censoring


Introduction

There are generally two notions of causation:

1. Cause of an effect: first observe an event/outcome, and subsequently identify the causes or events that led to the observed outcome

2. Effect of a cause: assess the effect of a well-defined exposure or intervention, e.g., does smoking cause lung cancer? Does azidothymidine (AZT) prevent the onset of AIDS among HIV-infected patients?

Introduction

An example of (1):

In the 1980s, an unusually high number of patients were dying from a combination of syndromes including a rare skin cancer (Kaposi's sarcoma) and pneumonia

Later, HIV was found to be the cause

This lecture is limited to (2), with a focus on challenges in clinical trials

Introduction

Why do we need a formal theory of causation?

Make it explicit what we mean by "causal effect", i.e., what is the quantity/estimand we seek?

Give explicit assumptions under which "association is causation", and therefore standard statistical methods may be used

Give explicit assumptions needed for the identification of causal effects even when "association is not causation"

Allow for the derivation of new statistical methods when standard and familiar methods fail

Counterfactuals

Suppose you are contemplating taking an aspirin for your headache, and the outcome $Y$ denotes whether or not you are headache-free within, say, the next hour.

As a thought experiment, think of two potential outcomes

$Y_0$: headache outcome after not taking aspirin

$Y_1$: headache outcome after taking aspirin

Note that everything else remains exactly the same

Exactly one of the two will be observed, depending on whether or not you decide to take the aspirin, but never both

Counterfactuals

$Y_a$ is the outcome that you would observe if, possibly contrary to fact, you followed treatment $a \in \{0, 1\}$.

The English sentence "aspirin has no causal effect on my headache" is the mathematical statement about my potential outcomes:

$$Y_1 = Y_0.$$

Suppose a larger $Y$ indicates a better outcome. An individual has a beneficial causal effect of aspirin if $Y_1 > Y_0$, or a harmful causal effect of aspirin if $Y_1 < Y_0$.

Counterfactuals

Consistency assumption: the observed outcome $Y$ satisfies

$$Y = A Y_1 + (1 - A) Y_0$$

ID   A   Y   Y0   Y1
1    0   0   0    ?
2    0   1   1    ?
3    1   1   ?    1
4    1   1   ?    1

Note that since both outcomes are never simultaneously observed, it is impossible to evaluate individual causal effects.

Counterfactuals

Remark: the mere definition of the potential outcome $Y_a$ carries the so-called assumption of no interference between units (i.e., the stable unit treatment value assumption). Under this assumption, the value of the outcome of one subject who receives treatment $a$ is unaffected by the treatments received by the other subjects in the population.

As an example, the potential outcome would be ill-defined if $A = 0$ was placebo and $A = 1$ was treatment with a vaccine for a highly contagious disease. If the vaccine is effective, the value of $Y_0$ for a person would depend on whether or not that person's contacts get the vaccine.

Without the no-interference assumption the notation $Y_a$ would be meaningless. We would need a definition of a potential outcome for each possible value taken by the vector of treatments of all remaining subjects in the population!

Association vs Causal Measures in the Population

The variables we can hope to observe are $(A, Y)$.

ID   A   Y   Y0   Y1
1    0   0   0    0
2    0   1   1    1
3    1   1   1    1
4    1   1   1    1

Average causal effect (a causal effect measure in the population)

$$\psi \equiv E(Y_1) - E(Y_0) = 3/4 - 3/4 = 0$$

Difference of conditional expectations (an association measure)

$$\zeta = E(Y \mid A = 1) - E(Y \mid A = 0) = 2/2 - 1/2 = 1/2$$
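The two measures on this slide can be recomputed directly from the four-person table. The sketch below is only an illustration: the tuple layout and the helper name `mean` are choices made here, not part of the lecture.

```python
from fractions import Fraction

# Finite population from the table: (A, Y0, Y1) for IDs 1..4.
population = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 1, 1),
    (1, 1, 1),
]

def mean(values):
    """Exact average of a sequence of integers."""
    values = list(values)
    return Fraction(sum(values), len(values))

# Consistency: the observed outcome is Y = A*Y1 + (1 - A)*Y0.
observed = [(a, a * y1 + (1 - a) * y0) for a, y0, y1 in population]

# Causal measure: psi = E(Y1) - E(Y0), averaging over everyone.
psi = mean(y1 for _, _, y1 in population) - mean(y0 for _, y0, _ in population)

# Association measure: zeta = E(Y | A=1) - E(Y | A=0), using observed data only.
zeta = (mean(y for a, y in observed if a == 1)
        - mean(y for a, y in observed if a == 0))

print(psi, zeta)  # 0 1/2
```

The causal contrast needs both potential outcomes for every person, which is why it can be computed here only because the table is hypothetical; the association contrast uses the observable pair (A, Y) alone.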

Counterfactuals

Technical interlude: in the expressions $E(Y \mid A = 1)$, $E(Y \mid A = 0)$, $E(Y_1)$ and $E(Y_0)$, the random variables $Y$, $(Y_1, Y_0)$ are to be understood as the outcome and potential outcomes of a person chosen at random from the finite population of four individuals. The expectations, of course, coincide with the averages of the values of the variables over the four members of the finite population. Later on, we will regard these variables as arising from one random draw from an infinite population, and the expectations will be understood as averages of the values of the variables over the infinitely many members of the population.

Randomization

Suppose you randomize the population of patients to either aspirin with probability $p > 0$ or to no aspirin with probability $1 - p > 0$. Then, with $\perp\!\!\!\perp$ denoting independence, it holds that

$$Y_a \perp\!\!\!\perp A, \quad \text{for } a = 0, 1,$$

because $Y_1$ and $Y_0$ are, like gender and age, pre-treatment variables.

In such a case we have, for $a = 0, 1$, that

$$P(Y_a = 1) \underbrace{=}_{\text{by randomization and } 0 < p < 1} P(Y_a = 1 \mid A = a) \underbrace{=}_{\text{by consistency}} P(Y = 1 \mid A = a)$$

Randomization

Thus, the probability distribution of the counterfactuals $Y_a$, $a = 0, 1$, can be written in terms of the distribution of the observed data $(Y, A)$ and hence it is identified.

Note that randomization does not imply $Y \perp\!\!\!\perp A$, since $Y = A Y_1 + (1 - A) Y_0$ is determined by treatment and therefore is a post-treatment variable.

In fact, under randomization,

$$Y \perp\!\!\!\perp A \iff Y_1 \overset{D}{=} Y_0 \quad (Y_1 \text{ and } Y_0 \text{ equal in distribution})$$
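A small simulation can make the identification argument concrete. The sketch below is an assumption-laden illustration, not part of the lecture: the binary trait `frail`, all probabilities, and the function name `simulate` are hypothetical. It compares the true $P(Y_1 = 1)$ with the observable $P(Y = 1 \mid A = 1)$ when treatment is assigned by a coin flip and when it depends on the trait.

```python
import random

rng = random.Random(0)
n = 100_000

def simulate(randomized):
    """Return (P(Y1 = 1) in the population, P(Y = 1 | A = 1) in the data)."""
    sum_y1 = 0
    treated = treated_y = 0
    for _ in range(n):
        frail = rng.random() < 0.5               # hypothetical pre-treatment trait
        y0 = int(rng.random() < (0.2 if frail else 0.6))
        y1 = int(rng.random() < (0.4 if frail else 0.8))
        if randomized:
            p_treat = 0.5                        # coin flip, ignores the patient
        else:
            p_treat = 0.8 if frail else 0.2      # assignment depends on the trait
        a = int(rng.random() < p_treat)
        y = a * y1 + (1 - a) * y0                # consistency
        sum_y1 += y1
        treated += a
        treated_y += a * y
    return sum_y1 / n, treated_y / treated

rct = simulate(randomized=True)
obs = simulate(randomized=False)
print("randomized:    ", rct)
print("non-randomized:", obs)
```

Under these assumed probabilities the two numbers should nearly agree in the randomized run and differ clearly in the non-randomized run, where frail patients are over-represented among the treated.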


Motivating Example I

For those 5735 patients, data available on

A dichotomous treatment indicator variable A

A = 1 indicates receiving RHC during the initial 24 hours of an ICU stay (2184 patients)

Response variable Y: survival time up to 30 days

A vector of baseline covariates L including age, years of education, income, weight, blood pressure, etc.

The decision to use or withhold RHC was left to the discretion of the physician, based on patient characteristics

Outline

Introduction to Causality

Confounding in Observational Studies

Dependent Censoring in Randomized Trials

Sensitivity Analysis for Nonignorable Censoring

Observational Study

Randomization is not feasible for ethical or practical reasons

Observed data come from a cross-sectional observational study

A random sample from $(L, A, Y)$, where $L$ is a vector of pre-treatment covariates.

No unmeasured confounders assumption (NUCA):

$$Y_a \perp\!\!\!\perp A \mid L, \quad \text{for } a = 0, 1,$$

i.e., $Y_a$ and $A$ are conditionally independent given $L$.

Observational Study

The intuition behind NUCA is similar to that of randomization

Within each level of $L$, $A$ is randomly assigned

The assignment probabilities are likely to depend on $L$

NUCA is achievable only if all common predictors of $A$ and $Y$ are measured

The G-formula.

Theorem 1: if (i) consistency, (ii) NUCA and (iii) positivity, $P(A = a \mid L) > 0$ w.p. 1, hold (together abbreviated CNP), the distribution of $Y_a$ is identified and it satisfies

$$p_a(y) = \int p(y \mid l, a) \, dF(l)$$

$p_a(y) \equiv p_{Y_a}(y)$: the density (with respect to some measure $\mu$) of $Y_a$ at $y$

$p(y \mid l, a) \equiv p_{Y \mid L, A}(y \mid l, a)$: the conditional density of $Y$ (at $y$) given $L = l$ and $A = a$

$F(l)$: the c.d.f. of $L$ at $l$

The G-formula.

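Proof: for $a = 0, 1$,

$$\begin{aligned}
p_a(y) &\equiv p_{Y_a}(y) \\
&= \int p_{Y_a \mid L}(y \mid l) \, dF(l) \\
&= \int p_{Y_a \mid L, A}(y \mid l, a) \, dF(l) && \text{by NUCA (conditional randomization given } L\text{) and positivity} \\
&= \int p_{Y \mid L, A}(y \mid l, a) \, dF(l) && \text{by consistency}
\end{aligned}$$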
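For a discrete covariate the G-formula reduces to a weighted sum, $E(Y_a) = \sum_l E(Y \mid L = l, A = a) P(L = l)$. The following is a minimal plug-in sketch under that assumption; the eight records and the function names are hypothetical and chosen only so the arithmetic can be checked by hand.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical records (l, a, y): binary covariate, treatment, outcome.
records = [
    (0, 0, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1),
    (1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 1, 1),
]

def g_formula_mean(records, a):
    """Plug-in estimate of E(Y_a) = sum_l E(Y | L=l, A=a) * P(L=l)."""
    n = len(records)
    outcomes = defaultdict(list)     # l -> outcomes observed with A = a
    level_count = defaultdict(int)   # l -> number of subjects with L = l
    for l, a_i, y in records:
        level_count[l] += 1
        if a_i == a:
            outcomes[l].append(y)
    total = Fraction(0)
    for l, count in level_count.items():
        ys = outcomes[l]
        if not ys:                   # positivity fails in the sample
            raise ValueError(f"no subject with A={a} at L={l}")
        total += Fraction(sum(ys), len(ys)) * Fraction(count, n)
    return total

def crude_mean(records, a):
    """E(Y | A = a), ignoring L."""
    ys = [y for _, a_i, y in records if a_i == a]
    return Fraction(sum(ys), len(ys))

psi_g = g_formula_mean(records, 1) - g_formula_mean(records, 0)
zeta = crude_mean(records, 1) - crude_mean(records, 0)
print(psi_g, zeta)  # 1/2 1/4
```

The explicit positivity check mirrors condition (iii) of the theorem: if some level of L has no subject with A = a, the inner conditional mean is undefined and the plug-in estimate cannot be formed without extrapolation.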

Crude vs Counterfactual Means

In general, the crude mean $E(Y \mid A = a)$ is not equal to the counterfactual mean $E(Y_a)$.

$E(Y_a)$: the mean of the outcome $Y$ if, contrary to fact, everyone in the population was forced to take treatment $A = a$.

$E(Y \mid A = a)$: the mean of the outcome $Y$ among the sub-population who chose to take treatment $A = a$.

Crude vs Counterfactual Means

Under the assumptions of the previous theorem (CNP), the causal treatment effect can be identified as

$$E(Y_a) \overset{(1)}{=} E[E(Y \mid A = a, L)] \overset{(2)}{=} E[E(Y \mid A = a, \pi(a, L))] \overset{(3)}{=} E\left[\frac{I(A = a)}{\pi(a, L)} Y\right],$$

where $\pi(a, L) \equiv p(A = a \mid L)$ is the so-called propensity score (PS).

Standard Regression

Equality (1) holds since

$$E(Y_a) = E[E(Y_a \mid L)] = E[E(Y_a \mid A = a, L)] = E[E(Y \mid A = a, L)].$$

Suppose $a \in \{0, 1\}$ and $E(Y \mid A, L) = b_0(L; \eta) + \beta A$; then

$$\psi = E(Y_{a=1}) - E(Y_{a=0}) = \beta$$

The maximum likelihood estimator $\hat{\beta}$ is consistent and attains the efficiency bound IF the parametric model for $E(Y \mid A, L)$ is correctly specified.
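The step from the regression model to $\psi = \beta$ can be spelled out by substituting the model into equality (1). This is only a worked restatement of the slide's claim, using the same symbols:

```latex
\begin{aligned}
E(Y_a) &= E\left[E(Y \mid A = a, L)\right]
        = E\left[b_0(L; \eta) + \beta a\right]
        = E\left[b_0(L; \eta)\right] + \beta a, \\
\psi &= E(Y_{a=1}) - E(Y_{a=0})
      = \left\{E\left[b_0(L; \eta)\right] + \beta\right\} - E\left[b_0(L; \eta)\right]
      = \beta .
\end{aligned}
```

The covariate term $E[b_0(L; \eta)]$ cancels because the model has no interaction between $A$ and $L$; with an interaction, $\psi$ would instead be an average of $L$-specific effects.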

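Propensity Score Adjustments

Equality (3)

$$E[E(Y \mid A = a, \pi(a, L))] \overset{(3)}{=} E\left[\frac{I(A = a)}{\pi(a, L)} Y\right]$$

holds since

$$\begin{aligned}
E\left[\frac{I(A = a)}{\pi(a, L)} Y\right]
&= E\left\{E\left[\frac{I(A = a)}{\pi(a, L)} Y \,\Big|\, A, \pi(a, L)\right]\right\} \\
&= E\left\{\frac{I(A = a)}{\pi(a, L)} E[Y \mid A = a, \pi(a, L)]\right\} \\
&= E\left\{\frac{E(I(A = a) \mid \pi(a, L))}{\pi(a, L)} E[Y \mid A = a, \pi(a, L)]\right\} \\
&= E\left(E[Y \mid A = a, \pi(a, L)]\right),
\end{aligned}$$

where the last step uses $E(I(A = a) \mid \pi(a, L)) = \pi(a, L)$.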

Propensity Score Adjustments

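Equality (2)

$$E[E(Y \mid A = a, L)] \overset{(2)}{=} E[E(Y \mid A = a, \pi(a, L))]$$

holds since equality (3) holds and

$$\begin{aligned}
E\left[\frac{I(A = a)}{\pi(a, L)} Y\right]
&= E\left\{E\left[\frac{I(A = a)}{\pi(a, L)} Y \,\Big|\, A, L\right]\right\} \\
&= E\left\{\frac{I(A = a)}{\pi(a, L)} E[Y \mid A = a, L]\right\} \\
&= E\left\{\frac{E(I(A = a) \mid L)}{\pi(a, L)} E[Y \mid A = a, L]\right\} \\
&= E\left(E[Y \mid A = a, L]\right)
\end{aligned}$$

$Y_a \perp\!\!\!\perp A \mid L \Rightarrow Y_a \perp\!\!\!\perp A \mid \pi(a, L)$ (Rosenbaum and Rubin, 1984)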

Stratification by the Propensity Score

Exploiting the formula $E(Y_a) = E[E(Y \mid \pi(a, L), A = a)]$, Rosenbaum, Rubin and co-authors, in a number of papers dating back to the mid-80's, have proposed the following estimator of $E(Y_a)$.

1. Specify a model $\pi(L; \alpha) = \pi(1, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(1 \mid L)$, say $\pi(L; \alpha) = \{1 + \exp(-\alpha^T L)\}^{-1}$, indexed by a finite-dimensional vector $\alpha$. Estimate $\alpha$ with its ML estimator $\hat{\alpha}$ using data $(A_i, L_i)$, $i = 1, \dots, n$.

2. Stratify the units into, say, quintiles of the estimated propensity scores $\pi(L_i; \hat{\alpha})$, $i = 1, \dots, n$.

3. Estimate the mean of $Y$ among those having $A = a$, separately for each stratum.

4. Compute the weighted average of the means estimated in step 3, with weights equal to the proportion of subjects of the sample in each stratum.
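The four steps above can be sketched end to end on simulated data. Everything in this block is an assumption made for illustration: a single covariate `x` stands in for $L$, the data-generating model (true effect 1) is invented, the logistic fit is a hand-written Newton-Raphson rather than any particular library routine, and the function names are hypothetical.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, a, iters=25):
    """Step 1: ML fit of P(A=1 | x) = expit(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, ai in zip(xs, a):
            p = expit(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += ai - p                 # score for the intercept
            g1 += (ai - p) * x           # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def stratified_mean(ys, a, scores, arm, n_strata=5):
    """Steps 2-4: score quantile strata, within-stratum means, weighted average."""
    order = sorted(range(len(ys)), key=lambda i: scores[i])
    n = len(order)
    estimate = 0.0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        arm_y = [ys[i] for i in idx if a[i] == arm]
        estimate += (len(idx) / n) * (sum(arm_y) / len(arm_y))
    return estimate

# Simulated observational data (hypothetical model, true effect = 1).
rng = random.Random(1)
n = 5000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
a = [int(rng.random() < expit(x)) for x in xs]
ys = [2.0 * x + 1.0 * ai + rng.gauss(0.0, 1.0) for x, ai in zip(xs, a)]

b0, b1 = fit_logistic(xs, a)
scores = [expit(b0 + b1 * x) for x in xs]
psi_strat = stratified_mean(ys, a, scores, 1) - stratified_mean(ys, a, scores, 0)

treated = [y for y, ai in zip(ys, a) if ai == 1]
control = [y for y, ai in zip(ys, a) if ai == 0]
psi_crude = sum(treated) / len(treated) - sum(control) / len(control)
print(round(psi_crude, 2), round(psi_strat, 2))
```

In this setup the stratified estimate should land much closer to the true effect than the crude difference, but typically not exactly on it, because some confounding remains within each of the five strata.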

Stratification by the Propensity Score

Extendable to multiple treatment arms (Zanutto et al., 2005) under the validity of the following ordinal logit model:

$$\log\left\{\frac{\Pr(A \ge a)}{\Pr(A < a)}\right\} = \alpha_a + \pi^{*}(L)$$

for some known function $\pi^{*}(L)$, e.g., $\pi^{*}(L) = \gamma^T L$.

Does not entirely remove confounding if, within each stratum, there is still residual confounding by $L$, i.e., if $A$ and $Y_a$ are not independent within each stratum (Lunceford and Davidian, 2004).

Inverse Probability Weighting (IPW) Estimators

Exploiting the formula $E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)} Y\right]$, we can construct the following estimator:

1. Specify a model $\pi(a, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(a \mid L)$, say $\pi(1, L; \alpha) = \{1 + \exp(-\alpha^T L)\}^{-1}$ if $a \in \{0, 1\}$. Estimate $\alpha$ with its ML estimator $\hat{\alpha}$ using data $(A_i, L_i)$, $i = 1, \dots, n$.

2. Construct an estimator of $E(Y_a)$ using $P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} Y\right\}$, where $P_n(Z) \equiv n^{-1} \sum_{i=1}^{n} Z_i$ for any $n$ i.i.d. copies $Z_1, \dots, Z_n$ of $Z$.

3. Naturally, $\hat{\psi}_{ipw1} \equiv P_n\left\{\frac{I(A = 1)}{\pi(1, L; \hat{\alpha})} Y\right\} - P_n\left\{\frac{I(A = 0)}{\pi(0, L; \hat{\alpha})} Y\right\}$ is a consistent estimator of $\psi = E(Y_{a=1}) - E(Y_{a=0})$.

IPW Estimator: A Heuristic Explanation

Suppose that $L$ is a binary baseline covariate, with $L = 1$ indicating bad health.

1. Suppose there are 100 subjects with $L = 1$ and another 100 subjects with $L = 0$

2. 80 subjects with $L = 1$ are assigned to $A = 1$ and $E(Y \mid L = 1, A = 1) = 30$; the remaining 20 are assigned to $A = 0$ and $E(Y \mid L = 1, A = 0) = 20$

3. 30 subjects with $L = 0$ are assigned to $A = 1$ and $E(Y \mid L = 0, A = 1) = 70$; the remaining 70 are assigned to $A = 0$ and $E(Y \mid L = 0, A = 0) = 60$

4. Under $Y_a \perp\!\!\!\perp A \mid L$, to estimate $E(Y_{a=1})$, each of the 80 subjects with $L = 1$ and $A = 1$ counts for themselves and for the other 20 subjects with $L = 1$ but assigned to $A = 0$ (weight $1 + \frac{20}{80} = \frac{1}{0.8}$); similarly, each of the 30 subjects with $L = 0$ and $A = 1$ counts for themselves and for the other 70 subjects with $L = 0$ and $A = 0$ (weight $1 + \frac{70}{30} = \frac{1}{0.3}$)
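The weights in this example can be carried through to the counterfactual means. The sketch below uses only the cell counts and cell means stated on the slide; the dictionary layout and function names are choices made here for illustration.

```python
from fractions import Fraction

# Cells of the heuristic example: (L, A) -> (number of subjects, E(Y | L, A)).
cells = {
    (1, 1): (80, 30), (1, 0): (20, 20),
    (0, 1): (30, 70), (0, 0): (70, 60),
}
n = sum(count for count, _ in cells.values())   # 200 subjects in total

def propensity(a, l):
    """P(A = a | L = l), read off the cell counts."""
    return Fraction(cells[(l, a)][0], cells[(l, 0)][0] + cells[(l, 1)][0])

def ipw_mean(a):
    """E(Y_a): each subject with A = a is weighted by 1 / P(A = a | L)."""
    total = sum(count * mean_y / propensity(a, l)
                for (l, a_cell), (count, mean_y) in cells.items() if a_cell == a)
    return total / n

def crude_mean(a):
    """E(Y | A = a): unweighted mean among subjects with A = a."""
    num = sum(c * m for (l, a_cell), (c, m) in cells.items() if a_cell == a)
    den = sum(c for (l, a_cell), (c, _) in cells.items() if a_cell == a)
    return Fraction(num, den)

print(1 / propensity(1, 1), 1 / propensity(1, 0))   # 5/4 10/3
print(ipw_mean(1), ipw_mean(0))                     # 50 40
print(float(crude_mean(1) - crude_mean(0)))         # about -10.2
```

With these numbers the weighted contrast is +10, matching the within-stratum differences (30 vs 20 and 70 vs 60), while the crude contrast has the opposite sign because the treated group is dominated by subjects in bad health.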

Ratio Estimators

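$E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)} Y\right]$ can be rewritten as

$$E(Y_a) = \left\{E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\right]\right\}^{-1} E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)} Y\right]$$

Suppose $\pi(a, L; \alpha)$ is correctly specified; then $E(Y_a)$ can be estimated by

$$\hat{E}(Y_a) = P_n\left\{\frac{I(A = a) Y}{\pi(a, L; \hat{\alpha})}\right\} \left\{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]\right\}^{-1}$$

$\hat{\psi}_{ipw2} \equiv \hat{E}(Y_1) - \hat{E}(Y_0)$

$\hat{\psi}_{ipw2}$ is generally more efficient than $\hat{\psi}_{ipw1}$

Known as a ratio estimator in the sampling literature (Horvitz and Thompson, 1952)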
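One way to see the difference between the two IPW forms is to add a constant to every outcome. The sketch below assumes a known propensity (the values in `p_treat`) and a tiny hypothetical sample; the names `ipw1` and `ipw2` follow the subscripts used on the slides, but the code itself is only an illustration.

```python
from fractions import Fraction

# Hypothetical sample (l, a, y) and an assumed known propensity P(A=1 | L=l).
sample = [(0, 1, 4), (0, 0, 2), (0, 0, 3), (1, 1, 7), (1, 1, 9), (1, 0, 5)]
p_treat = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def weight(a, l):
    """Inverse probability weight 1 / P(A = a | L = l)."""
    p1 = p_treat[l]
    return 1 / (p1 if a == 1 else 1 - p1)

def ipw1(sample, a):
    """P_n{ I(A=a) Y / pi(a, L) }: weighted sum divided by n."""
    return sum(weight(a, l) * y for l, a_i, y in sample if a_i == a) / len(sample)

def ipw2(sample, a):
    """Ratio form: weighted sum divided by the sum of the weights."""
    num = sum(weight(a, l) * y for l, a_i, y in sample if a_i == a)
    den = sum(weight(a, l) for l, a_i, y in sample if a_i == a)
    return num / den

shifted = [(l, a, y + 10) for l, a, y in sample]   # add 10 to every outcome

print(ipw1(sample, 1), ipw2(sample, 1))                      # 56/9 28/5
print(ipw1(shifted, 1) - ipw1(sample, 1))                    # 100/9
print(ipw2(shifted, 1) - ipw2(sample, 1))                    # 10
```

The ratio form moves by exactly the added constant because its weights are normalized to sum to one, whereas the unnormalized form moves by the constant times the average realized weight. This invariance is one informal reason the ratio form tends to behave better in finite samples.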

IPW Estimators with Augmentations

In fact, in the model in which a parametric form $\pi(a, L; \alpha)$ for $p_{A \mid L}(a \mid L)$ is assumed, there exists a class of IPW estimators (with augmentations) of the form

$$\hat{E}(Y_a)_{aipw,d} = \underbrace{\frac{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} Y\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]}}_{\text{ipw2 estimator}} - \underbrace{\frac{P_n\left[\left(\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} - 1\right) d(L)\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]}}_{\text{augmentation term}}$$

$\hat{\alpha}$ is the ML estimator

$d(\cdot)$ is an arbitrary function of $L$

When $\pi(a, L; \alpha)$ is correct, the augmentation term converges to 0 in probability for any $d$.

Remarks

The IPW (augmented) estimators, and the estimator based on PS stratification, will tend to give highly variable and unreliable estimates when the propensity scores are close to 1 or to 0 for some values of $L$.

This problem is often referred to in the literature as the problem of poor overlap of the propensity scores between the treated and untreated.

Remarks

The regression estimator based on a parametric model, $\phi(a, L; \eta)$, for $E(Y \mid A = a, L)$ does not suffer from this problem because it extrapolates from the fitted model $\hat{E}(Y \mid A = a, L) = \phi(a, L; \hat{\eta})$ to produce estimates of $E(Y \mid A = a, l)$ for values of $l$ with $\pi(a, l) \approx 0$. However, the estimation now depends strongly on the validity of the regression model $\phi(a, L; \eta)$.

Propensity Score Matching

Rosenbaum and Rubin (1983) first proposed this approach to estimate the average treatment effect on the treated, i.e., $E(Y_{a=1} \mid A = 1) - E(Y_{a=0} \mid A = 1)$.

1. Specify a model $\pi(L; \alpha) = \pi(1, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(1 \mid L)$, say $\pi(L; \alpha) = \{1 + \exp(-\alpha^T L)\}^{-1}$, indexed by a finite-dimensional vector $\alpha$. Estimate $\alpha$ with its ML estimator $\hat{\alpha}$ using data $(A_i, L_i)$, $i = 1, \dots, n$, and set $\hat{\pi}_i = \pi(L_i; \hat{\alpha})$.

2. Match each treated subject to one or more untreated subjects on the propensity score, based on a chosen distance (Rosenbaum, 2002)

3. The estimator can be constructed as

$$\hat{\psi}_{PSM} = \frac{\sum_i I(A_i = 1)\left(Y_i - Y_{m,i}\right)}{\sum_i I(A_i = 1)},$$

where $Y_{m,i}$ is the average outcome of a few subjects in the untreated group (the "controls") with propensity scores close to $\hat{\pi}_i$.
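Step 3 can be sketched with nearest-neighbor matching on the score. The propensity scores and outcomes below are hypothetical, the absolute difference in scores is just one possible distance, and matching here is done with replacement; none of these choices is prescribed by the slide.

```python
# Hypothetical (estimated propensity score, outcome) pairs.
treated = [(0.80, 12.0), (0.60, 10.0), (0.30, 7.0)]
controls = [(0.75, 9.0), (0.55, 8.0), (0.35, 6.0), (0.10, 3.0)]

def matched_control_mean(score, controls, k=1):
    """Y_m: average outcome of the k controls whose scores are closest to `score`."""
    nearest = sorted(controls, key=lambda c: abs(c[0] - score))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def psm_att(treated, controls, k=1):
    """Average of Y_i - Y_{m,i} over the treated subjects."""
    diffs = [y - matched_control_mean(s, controls, k) for s, y in treated]
    return sum(diffs) / len(diffs)

print(psm_att(treated, controls))        # 2.0
print(psm_att(treated, controls, k=2))   # 2.5
```

The control with score 0.10 is never used when k = 1 because no treated subject has a comparable score, which illustrates how matching restricts the comparison to the region of overlap.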

Propensity Score Matching

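Suppose $\hat{\alpha}$ converges in probability to $\alpha^{*}$

Suppose model $\pi(L; \alpha)$ is correctly specified; then $\hat{\psi}_{PSM}$ converges in probability to

$$\begin{aligned}
& E\left\{\left[E(Y \mid A = 1, \pi(L; \alpha^{*})) - E(Y \mid A = 0, \pi(L; \alpha^{*}))\right] \mid A = 1\right\} \\
&= E\left\{\left[E(Y_{a=1} \mid A = 1, \pi(L; \alpha^{*})) - E(Y_{a=0} \mid A = 1, \pi(L; \alpha^{*}))\right] \mid A = 1\right\} \\
&= E(Y_{a=1} - Y_{a=0} \mid A = 1)
\end{aligned}$$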

Confounding in Observational Studies

Identify causation from association

Standard regression

Propensity Score (PS) adjustments

PS stratification

Inverse probability weighting (IPW)

PS matching

Outline

Introduction to Causality

Confounding in Observational Studies

Dependent Censoring in Randomized Trials

Sensitivity Analysis for Nonignorable Censoring

Motivating Example II

The AIDS Clinical Trial Group (ACTG) study 021, a double-blind randomized clinical trial

Compare the effect of bactrim versus aerosolized pentamidine (AP) as prophylaxis therapy for pneumocystis pneumonia (PCP) in AIDS patients

Our endpoint of interest: the survival status at the end of the trial

Motivating Example II

Informative drop-out: patients stopped prophylactic therapies for non-medical reasons and medical indications

Nonrandom nonadherence: crossover to the other treatment arm if PCP developed

Causal effect: the effect of the two prophylaxis therapies on the survival rate at the end of the trial, if everyone followed the assigned therapy and did not stop the therapy except for toxicity

(Artificially) regard subjects as dependently censored at the first time a subject stops therapy for reasons other than toxicity or switches therapy

Motivating Example II

Appropriate palliative therapies are available to combat the toxicity

Causal effect: the effect of the two prophylaxis therapies on the survival rate at the end of the trial, if everyone followed the assigned therapy and never stopped the therapy

(Artificially) regard subjects as dependently censored at the first time a subject stops therapy for any reason or switches therapy

A Randomized Clinical Trial with Missing Data

Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, L_i)$, $i = 1, 2, \dots, n$.

$L_i$: a vector of baseline covariates

$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$

$Y_i$: continuous outcome measured at the end of a fixed follow-up period

$R_i$: missingness indicator; $Y_i$ is observed if $R_i = 1$

Missing Completely at Random (MCAR)

MCAR:

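$$p(R = 1 \mid Y, A, L) = p(R = 1 \mid A, L), \quad \text{i.e., } R \perp\!\!\!\perp Y \mid A = a, L,$$

so that

$$p_{Y \mid R = 1, A, L}(y \mid R = 1, a, l) = p_{Y \mid A, L}(y \mid a, l)$$

Further, as $A \perp\!\!\!\perp L$ and $p_{Y_a}(y) = p_{Y \mid A = a}(y)$ by randomization,

$$\begin{aligned}
E[E(Y \mid R = 1, A = a, L)]
&= \int E(Y \mid A = a, L = l) \, p_L(l) \, d\mu(l) \\
&\overset{A \perp\!\!\!\perp L}{=} \int E(Y \mid A = a, L = l) \, p_{L \mid A}(l \mid a) \, d\mu(l) \\
&= E(Y \mid A = a) \overset{\text{rand.}}{=} E(Y_a)
\end{aligned}$$

Standard complete-case analysis is valid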

Motivating Example II

Per-protocol analysis is valid if the probability of being censored is independent of the outcome given the treatment assignment and some pre-treatment covariates

Apparently NOT true here:

stopping therapy due to medical indications and other reasons

switching therapy if PCP developed

PCP status and the medical indications are likely to affect the survival status

A Randomized Clinical Trial with Missing Data

Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, L_i, W_i)$, $i = 1, 2, \dots, n$.

$L_i$: a vector of baseline covariates

$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$

$W_i$: a vector of post-treatment covariates correlated with $Y_i$ and always observed

$Y_i$: continuous outcome measured at the end of a fixed follow-up period

$R_i$: missingness indicator; $Y_i$ is observed if $R_i = 1$

Missing at Random (MAR)

MAR:

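$$p(R = 1 \mid Y, A = a, L, W) = p(R = 1 \mid A = a, L, W), \quad \text{i.e., } R \perp\!\!\!\perp Y \mid A = a, L, W$$

Positivity: $\Pr(R = 1 \mid A = a, L, W) > 0$ w.p. 1

However, in general $\bar{W}_1 \equiv (L, W) \perp\!\!\!\perp A$ DOES NOT hold. Consequently, in general,

$$\begin{aligned}
E[E(Y \mid R = 1, A = a, \bar{W}_1)]
&= \int E(Y \mid A = a, \bar{W}_1 = \bar{w}_1) \, p_{\bar{W}_1}(\bar{w}_1) \, d\mu(\bar{w}_1) \\
&\ne \int E(Y \mid A = a, \bar{W}_1 = \bar{w}_1) \, p_{\bar{W}_1 \mid A}(\bar{w}_1 \mid a) \, d\mu(\bar{w}_1) = E(Y_a)
\end{aligned}$$

Standard regression using complete cases will be biased.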

A "Missing Data" Problem

As thoroughly discussed in Robins et al. (1994), the estimation of the average treatment effect $\psi = E(Y_{a=1}) - E(Y_{a=0})$ can be viewed as a "missing data" problem.

Suppose we want to estimate $E(Y_{a=1})$

"Full data": $(Y_{a=1}, A, L)$

$Y_{a=1}$ is observed for the subgroup with $A = 1$, but missing for the other subgroup with $A = 0$

Under NUCA (i.e., $Y_a \perp\!\!\!\perp A \mid L$), $p(A = 1 \mid Y_{a=1}, L) = p(A = 1 \mid L)$

Similarly, for $E(Y_{a=0})$

"Full data": $(Y_{a=0}, A, L)$

$Y_{a=0}$ is observed for the subgroup with $A = 0$, but missing for the other subgroup with $A = 1$

Inverse Probability Censoring Weighting (IPCW)

Suppose we are interested in $\tau_1 = E(Y_{a=1})$

By randomization, $p_{Y_{a=1}}(y) = p_{Y \mid A = 1}(y)$

$$p_{Y \mid A, R}(y \mid A = 1, R = 1) \ne p_{Y \mid A, R}(y \mid A = 1, R = 0)$$

$$p_{Y \mid A, R, \bar{W}_1}(y \mid A = 1, R = 1, \bar{w}_1) = p_{Y \mid A, R, \bar{W}_1}(y \mid A = 1, R = 0, \bar{w}_1)$$

Within each level of $\bar{W}_1$, by assumption, the distribution of $Y$ among those with $R = 0$, though not observed in our data, equals the distribution of $Y$ among those with $R = 1$.

Observational study                 Missing data in RCT
E(Y1)                               E(Y1) = E(Y | A = 1)
Whole population                    Sub-population with A = 1
Y1                                  Y
A                                   R
Y1 observed if A = 1                Y observed if R = 1
L                                   W̄1
p(A | L, Y1) = p(A | L)             p(R | A = 1, W̄1, Y) = p(R | A = 1, W̄1)

Inverse Probability Weighting (IPW) Estimators

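Exploiting the formula $E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)} Y\right]$,

$$\hat{E}_1(Y_a) \equiv P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} Y\right\}$$

$$\hat{E}_2(Y_a) = P_n\left\{\frac{I(A = a) Y}{\pi(a, L; \hat{\alpha})}\right\} \left\{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]\right\}^{-1}, \quad \text{as } E\left[\frac{I(A = a)}{\pi(a, L; \alpha)}\right] = 1$$

...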

Inverse Probability Censoring Weighting (IPCW)

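Let $\pi_1(\bar{W}_1) \equiv \Pr(R = 1 \mid \bar{W}_1, A = 1)$

$$E\left[\frac{R Y}{\pi_1(\bar{W}_1)} \,\Big|\, A = 1\right] = E\left\{E(Y \mid R = 1, \bar{W}_1, A = 1) \mid A = 1\right\} \overset{\text{MAR}}{=} E(Y \mid A = 1) \overset{\text{rand.}}{=} E(Y_{a=1})$$

Impose a parametric form $\pi_1(\bar{W}_1; \alpha)$ for $\Pr(R = 1 \mid \bar{W}_1, A = 1)$, and let $\hat{\alpha}$ be the ML estimator

A natural estimator is $\hat{\tau}_{1,ipcw} = P_{n_1}\left[\frac{R}{\pi_1(\bar{W}_1; \hat{\alpha})} Y\right]$, where $n_1$ is the size of the treated group, and $P_{n_1}(Z)$ is the sample average of $Z_1, Z_2, \dots, Z_{n_1}$ within the treated group.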
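A small worked example may help contrast the IPCW estimator with a complete-case mean. In the sketch below, a single binary variable `w` stands in for $\bar{W}_1$, the eight treated-arm records are hypothetical, and the "parametric form" is taken to be the saturated model, whose ML estimate is simply the observed retention proportion at each level of `w`.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical treated arm (A = 1): records (w, r, y); y is None when R = 0.
arm = [
    (0, 1, 10), (0, 1, 12), (0, 1, 14), (0, 0, None),
    (1, 1, 4), (1, 0, None), (1, 0, None), (1, 0, None),
]

def fit_pi(arm):
    """ML estimate of Pr(R = 1 | W, A = 1) under a saturated model in w."""
    seen, total = defaultdict(int), defaultdict(int)
    for w, r, _ in arm:
        total[w] += 1
        seen[w] += r
    return {w: Fraction(seen[w], total[w]) for w in total}

def ipcw_mean(arm):
    """P_{n1}[ R Y / pi_1(W) ]: weighted sum over completers, divided by n1."""
    pi_hat = fit_pi(arm)
    return sum(y / pi_hat[w] for w, r, y in arm if r == 1) / len(arm)

def complete_case_mean(arm):
    """Unweighted mean of Y among subjects with R = 1."""
    ys = [y for _, r, y in arm if r == 1]
    return Fraction(sum(ys), len(ys))

print(fit_pi(arm))                              # retention 3/4 at w=0, 1/4 at w=1
print(ipcw_mean(arm), complete_case_mean(arm))  # 8 10
```

Here subjects with `w = 1` both drop out more often and have lower outcomes, so the complete-case mean (10) overstates the arm mean; up-weighting the single completer at `w = 1` by 4 restores the value 8 implied by averaging the two stratum means equally.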

IP(T)W Estimators with Augmentations

Recall the class of IP(T)W estimators (with augmentations)

$$\hat{E}(Y_a)_{aipw,d} = \underbrace{\frac{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} Y\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]}}_{\text{ratio estimator}} - \underbrace{\frac{P_n\left[\left(\frac{I(A = a)}{\pi(a, L; \hat{\alpha})} - 1\right) d(L)\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat{\alpha})}\right]}}_{\text{augmentation term}}$$

The parameter of interest is $E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)} Y\right]$ under CNP

$\pi(a, L; \alpha)$ is an assumed parametric form for $p_{A \mid L}(a \mid L)$ and $\hat{\alpha}$ is the ML estimator using $(A_i, L_i)$, $i = 1, \dots, n$

$d(\cdot)$ is an arbitrary function of $L$

When $\pi(a, L; \alpha)$ is correct, the augmentation term converges to 0 in probability for any $d$.

IPCW Estimators with Augmentations

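Similarly, a class of IPCW estimators with augmentation terms

$$\hat{\tau}_{1,ipcw,d} = \frac{P_{n_1}\left[\frac{R}{\pi_1(\bar{W}_1; \hat{\alpha})} Y\right]}{P_{n_1}\left[\frac{R}{\pi_1(\bar{W}_1; \hat{\alpha})}\right]} - \frac{P_{n_1}\left[\left(\frac{R}{\pi_1(\bar{W}_1; \hat{\alpha})} - 1\right) d(\bar{W}_1)\right]}{P_{n_1}\left[\frac{R}{\pi_1(\bar{W}_1; \hat{\alpha})}\right]}$$

The parameter of interest is $E(Y_{a=1}) = E(Y_{a=1} \mid A = 1) = E\left[\frac{R}{\pi_1(\bar{W}_1)} Y \,\Big|\, A = 1\right]$ under MAR and positivity

$\pi_1(\bar{W}_1; \alpha)$ is an assumed parametric form for $\Pr(R = 1 \mid \bar{W}_1, A = 1)$ and $\hat{\alpha}$ is the ML estimator using realizations of $(R, \bar{W}_1)$ in the treated group only

$d(\cdot)$ is an arbitrary function of $\bar{W}_1$

When $\pi_1(\bar{W}_1; \alpha)$ is correct, the augmentation term converges to 0 in probability for any $d$.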

Dependent Censoring in a Longitudinal Study

Suppose the surrogate variables $W$ are to be measured at fixed times $k = 1, 2, \dots, K - 1$.

For any $k \le K - 1$, let $\bar{W}_{k,i} = (W_{0,i} = L_i, W_{1,i}, \dots, W_{k-1,i})$, the $W$-history for subject $i$ up to but not including the $k$th occasion.

The outcome $Y_i$ is measured at time $K$, the end of follow-up.

$R_{k,i} = 1$ if subject $i$ remains in follow-up at time $k$, and 0 otherwise.

Monotone missingness pattern, i.e., $R_{k,i} = 0 \Rightarrow R_{s,i} = 0$ for any $s \ge k$.

Dependent Censoring in a Longitudinal Study

For any $1 \le k \le K$,

1. Sequentially ignorable missingness (SIM):

$$\Pr(R_k = 1 \mid R_{k-1} = 1, \bar{W}_k, A, Y) = \Pr(R_k = 1 \mid R_{k-1} = 1, A, \bar{W}_k)$$

2. Positivity: with probability 1,

$$\lambda_k(a, \bar{W}_k) \equiv \Pr(R_k = 1 \mid R_{k-1} = 1, \bar{W}_k, A = a) \ge \sigma > 0$$

$\pi(a, \bar{W}_K) \equiv \Pr(R_K = 1 \mid \bar{W}_K, A = a) = \prod_{k=1}^{K} \lambda_k(a, \bar{W}_k)$. Let $R = R_K$

$E(Y_a) = E(Y_a \mid A = a) = E\left[\frac{R}{\pi(a, \bar{W}_K)} Y \,\Big|\, A = a\right]$ under (i) consistency, (ii) SIM, and (iii) positivity

Dependent Censoring in a Longitudinal Study

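Impose a parametric form $\{\lambda_k(1, \bar{W}_k; \alpha) : 1 \le k \le K\}$ indexed by a finite-dimensional parameter vector $\alpha$

The partial likelihood for the missingness mechanism

$$\mathcal{L} = \prod_{k=1}^{K} \left\{\lambda_k(1, \bar{W}_k; \alpha)^{R_k} \left(1 - \lambda_k(1, \bar{W}_k; \alpha)\right)^{1 - R_k}\right\}^{R_{k-1}}$$

The estimator $\hat{\alpha}$ is obtained by maximizing the partial likelihood $\prod_{i=1}^{n_1} \mathcal{L}_i$ using $(R_i \bar{W}_{K,i}, \dots, R_{k,i} \bar{W}_{k,i}, \dots, R_{1,i} \bar{W}_{1,i}, W_{0,i})$ for $i = 1, \dots, n_1$, the subjects in the treated group

$\hat{\pi}(1, \bar{W}_K) = \prod_{k=1}^{K} \lambda_k(1, \bar{W}_k; \hat{\alpha})$ and $\hat{E}(Y_{a=1}) = P_{n_1}\left\{\frac{R}{\hat{\pi}(1, \bar{W}_K)} Y\right\}$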
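Once the conditional retention probabilities $\lambda_k$ have been fitted, the weight for a completer is the reciprocal of their product. The sketch below takes the fitted $\lambda_k$ values as given (they are hypothetical numbers here, with $K = 2$) and shows only the final weighting step, not the partial-likelihood fit.

```python
from math import prod

def cumulative_retention(lambdas):
    """pi(1, W_K) = product over k of the fitted lambda_k(1, W_k)."""
    return prod(lambdas)

# Hypothetical treated arm: (fitted lambda_1..lambda_K, R = R_K, outcome).
# For dropouts the full history is unobserved, so no lambdas are listed.
subjects = [
    ([0.9, 0.8], 1, 10.0),
    ([0.9, 0.5], 1, 6.0),
    (None, 0, None),
    (None, 0, None),
]

def ipcw_longitudinal_mean(subjects):
    """P_{n1}{ R Y / pi_hat }: only completers contribute, dividing by n1."""
    total = sum(y / cumulative_retention(lams)
                for lams, r, y in subjects if r == 1)
    return total / len(subjects)

print(cumulative_retention([0.9, 0.8]))
print(ipcw_longitudinal_mean(subjects))
```

Only subjects with $R = 1$ need a weight, which is why the estimator can be computed even though the later covariate history of dropouts is never seen.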

Outline

Introduction to Causality

Confounding in Observational Studies

Dependent Censoring in Randomized Trials

Sensitivity Analysis for Nonignorable Censoring


Back to the MAR Setting

Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, \bar{W}_{1,i} = (L_i, W_i))$, $i = 1, 2, \dots, n$.

$L_i$: a vector of baseline covariates

$W_i$: a vector of post-treatment covariates correlated with $Y_i$ and always observed

$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$

$Y_i$: continuous outcome measured at the end of a fixed follow-up period

$R_i$: missingness indicator; $Y_i$ is observed if $R_i = 1$

Nonignorable Censoring

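Ignorable censoring: $p(R = 1 \mid Y, A = a, \bar{W}_1) = p(R = 1 \mid A = a, \bar{W}_1)$

Nonignorable censoring: $p(R = 1 \mid Y, A = a, \bar{W}_1) \ne p(R = 1 \mid A = a, \bar{W}_1)$

Under ignorability, the likelihood for one individual is given by

$$\begin{aligned}
& \left\{\left[p_{Y \mid R, \bar{W}_1, A = a}(y \mid R = 1, \bar{w}_1)\right]^{R} p_{\bar{W}_1 \mid A = a}(\bar{w}_1) \times \left[\pi(a, \bar{W}_1; \alpha)\right]^{R} \left(1 - \pi(a, \bar{W}_1; \alpha)\right)^{1 - R}\right\} \\
&= \left\{\left[p_{Y \mid \bar{W}_1, A = a}(y \mid \bar{w}_1; \eta_1)\right]^{R} p_{\bar{W}_1 \mid A = a}(\bar{w}_1; \eta_2) \times \left[\pi(a, \bar{W}_1; \alpha)\right]^{R} \left(1 - \pi(a, \bar{W}_1; \alpha)\right)^{1 - R}\right\}
\end{aligned}$$

The parameter of interest $\tau = E(Y \mid A = a)$ depends only on $\eta_1$ and $\eta_2$.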

Nonignorable Censoring

$p(R = 1 \mid Y, A = a, \bar{W}_1) = \pi(a, \bar{W}_1, Y; \alpha)$ instead of $\pi(a, \bar{W}_1; \alpha)$

For example, $\pi(a, \bar{W}_1, Y; \alpha) = \operatorname{expit}(\alpha^T \bar{W}_1 + \gamma Y)$, where $\operatorname{expit}(x) = e^{x} / (1 + e^{x})$

$\gamma = 0 \iff$ ignorable censoring

$\gamma$: a measure of the association between the censoring indicator $R$ and the outcome $Y$ after adjusting for $\bar{W}_1$

A larger magnitude of $\gamma$ implies more hidden bias

Sensitivity Analysis

For any given value of $\gamma$,

The ML estimator $\hat{\alpha}_{ML}$ is not obtainable, as the following likelihood depends on the unobserved outcomes:

$$\mathcal{L} = \prod_{i=1}^{n_a} \frac{e^{(\alpha^T \bar{W}_{1,i} + \gamma Y_i) R_i}}{1 + e^{\alpha^T \bar{W}_{1,i} + \gamma Y_i}}$$

A consistent and asymptotically normal (CAN) estimator $\hat{\alpha}$ is still obtainable by solving the following estimating equations:

$$\sum_{i=1}^{n_a} \left[\frac{R_i}{\pi(a, \bar{W}_{1,i}, Y_i; \alpha)} - 1\right] d(\bar{W}_{1,i}) = 0,$$

where $d(\cdot)$ is a vector of arbitrary functions of $\bar{W}_{1,i}$ with the same dimension as $\alpha$
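The estimating equation only involves $Y_i$ for subjects with $R_i = 1$, which is what makes it solvable. The sketch below works through the simplest case as an illustration: an intercept-only model $\pi = \operatorname{expit}(\alpha + \gamma Y)$ with $d \equiv 1$, for which the equation has a closed-form solution. The data, the grid of $\gamma$ values, and the function names are hypothetical; with covariates in the model a numerical root-finder would be needed instead.

```python
import math

# Hypothetical arm: observed outcomes (R = 1) and the number of dropouts (R = 0).
observed_y = [1.0, 2.0, 2.5, 3.0, 4.0]
n_missing = 3
n = len(observed_y) + n_missing

def solve_alpha(gamma):
    """Solve sum_i { R_i / expit(alpha + gamma*Y_i) - 1 } = 0 for scalar alpha.

    Since 1/expit(z) = 1 + exp(-z), the equation reduces to
    exp(-alpha) * sum_{R=1} exp(-gamma*Y_i) = n_missing.
    """
    return math.log(sum(math.exp(-gamma * y) for y in observed_y) / n_missing)

def ipcw_mean(gamma):
    """IPCW estimate of the arm mean for a fixed sensitivity parameter gamma."""
    alpha = solve_alpha(gamma)
    weights = [1.0 + math.exp(-(alpha + gamma * y)) for y in observed_y]  # 1 / pi
    return sum(w * y for w, y in zip(weights, observed_y)) / n

for gamma in (-0.5, 0.0, 0.5):
    print(gamma, round(ipcw_mean(gamma), 3))
```

At $\gamma = 0$ the estimate equals the complete-case mean; a positive $\gamma$ (higher outcomes more likely to be observed) pulls the estimate down, and a negative $\gamma$ pulls it up. Reporting the estimate over a plausible range of $\gamma$ is the sensitivity analysis.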

Sensitivity Analysis

The IPCW estimate is robust to nonignorable censoring if it is insensitive to the value of $\gamma$

More flexible models for the missingness mechanism: $\pi(a, \bar{W}_1, Y; \alpha) = \operatorname{expit}\left(\alpha^T \bar{W}_1 + \gamma q_a(Y, \bar{W}_1)\right)$, with $q_a(Y, \bar{W}_1)$ being a known function of $Y$ and $\bar{W}_1$

The choice of $d(\cdot)$ in the estimating equations affects the efficiency

See Rotnitzky et al. (1998, 2001) and Scharfstein et al. (1999) for detailed discussions

Conclusions (I): the framework of counterfactuals can be useful

Make it explicit what we mean by "causal effect", i.e., what is the quantity/estimand we seek?

Give explicit assumptions under which "association is causation", and therefore standard statistical methods may be used

Give explicit assumptions needed for the identification of causal effects even when "association is not causation"

Allow for the derivation of new statistical methods when standard and familiar methods fail

Conclusions (II): methods need to be used with caution

The counterfactual framework may not apply to some settings

Make sure the estimand has the right meaning and interpretation

Check the validity of the associated assumptions; some assumptions are not even testable, e.g., NUCA

Expert knowledge can be used to enhance the plausibility of the assumptions

References

Connors, AF., Speroff, T., Dawson, NV., et al. (1996). The effectiveness of right heart catheterization in the initial care of critically ill patients. Journal of the American Medical Association 276, 889-897.

Fischl, MA., Parker, CB., Pettinelli, C., et al. (1990). A randomized controlled trial of a reduced daily dose of Zidovudine in patients with the acquired immuno-deficiency syndromes. New England Journal of Medicine 323, 1009-1014.

Horvitz, DG. and Thompson, DJ. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663-685.

Lin, DY., Psaty, BM., Kronmal, RA. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948-963.

Lunceford, JK. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23, 2937-2960.

Robins, JM. and Finkelstein, DM. (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 779-788.

Robins, JM., Rotnitzky, A., Zhao, LP. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846-866.

Rosenbaum, PR. and Rubin, DB. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55.

Rosenbaum, PR. and Rubin, DB. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516-524.

Rosenbaum, PR. (1998). Propensity score. In Encyclopedia of Biostatistics, Volume 5, Armitage P, Colton T (eds). Wiley: New York, 3551-3555.

Rosenbaum, PR. (2002). Observational Studies, 2nd edn. New York: Springer-Verlag.

Rotnitzky, A. and Robins, JM. (1995). Semiparametric regression estimation in the presence of dependent censoring. Biometrika 82(4), 805-820.

Rotnitzky, A., Robins, JM., Scharfstein, DO. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321-1339.

Rotnitzky, A., Scharfstein, DO., Su, TL., Robins, JM. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103-113.

Scharfstein, DO., Robins, JM., Eddings, W., Rotnitzky, A. (2001). Inference in randomized studies with informative censoring and discrete time-to-event endpoints. Biometrics 57(2), 404-413.

Scharfstein, DO. and Robins, JM. (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika 89(3), 617-634.

Scharfstein, DO., Rotnitzky, A., Robins, JM. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association 94, 1096-1146.

Tan, Z. (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association 101(476), 1619-1637.

Zanutto, E., Lu, B., and Hornik, R. (2005). Using propensity score subclassification for multiple treatment doses to evaluate a national antidrug media campaign. Journal of Educational and Behavioral Statistics 30(1), 59-73.