Causal Inference in Clinical Trials
Lingling Li
Department of Ambulatory Care and Prevention, Harvard Medical School
June 11, 2008
Li, L. (Institute) June 11, 2008 1 / 76
Motivating Example I
The Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT)
Examine the effect of the use of right heart catheterization (RHC) on survival time up to 30 days
Observational study with 5735 critically ill adult patients
2184 patients received RHC
Motivating Example I
For those 5735 patients, data available on
A dichotomous treatment indicator variable A
A = 1 indicates receiving RHC during the initial 24 hours of an ICU stay (2184)
Response variable Y: survival time up to 30 days
A vector of baseline covariates L including age, years of education, income, weight, blood pressure, etc.
Motivating Example I
Confounding: decision to use or withhold RHC was left to the discretion of the physician based on patient characteristics
Analyzed using different methods in several published papers (Connors et al., 1996; Lin et al., 1998; Tan, Z., 2006)
Lack of evidence suggesting beneficial effects of RHC on the 30-day survival
The appearance of a harmful effect could be explained away by unmeasured confounding
Motivating Example II
The AIDS Clinical Trial Group (ACTG) study 021, a double-blind randomized clinical trial
Compare the effect of bactrim versus aerosolized pentamidine (AP) as prophylaxis therapy for pneumocystis pneumonia (PCP) in AIDS patients
Primary endpoint: time to recurrent PCP
Cross over to the other treatment allowed if PCP developed
Secondary endpoint: time to death
Motivating Example II
310 patients accrued, 94 died, the remaining 216 patients either alive at the end of the trial or dropped out in the middle
Of the 94 deaths, 21 crossed over, 37 stopped all prophylactic therapy (21 for non-medical reasons and 16 for medical indications)
Suppose we are interested in the survival status at the end of the trial:
Y = 1 if dead at the end of the trial
Y = 0 if alive at the end of the trial
Motivating Example II
Cross over to the other treatment arm (nonrandom nonadherence)
Informative drop-out due to medical indications
Standard methods: Intent-to-treat analysis, Per-protocol/As-treated analysis
Outline
Introduction to Causality
Confounding in Observational Studies
Dependent Censoring in Randomized Trials
Sensitivity Analysis for Nonignorable Censoring
Introduction
There are generally two notions of causation:
1. Cause of an effect: first observe an event/outcome, and subsequently identify the causes or events that lead to the observed outcome
2. Effect of a cause: assess the effect of a well-defined exposure or intervention, e.g., does smoking cause lung cancer? Does azidothymidine (AZT) prevent the onset of AIDS among HIV-infected patients?
Introduction
An example of (1):
In the 1980s, an unusually high number of patients were dying from a combination of syndromes including a rare skin cancer (Kaposi's sarcoma) and pneumonia
Later, HIV was found to be the cause
This lecture is limited to (2), with a focus on challenges in clinical trials
Introduction
Why do we need a formal theory of causation?
Make it explicit what we mean by "causal effect", i.e., what is the quantity/estimand we seek?
Give explicit assumptions under which "association is causation", and therefore standard statistical methods may be used
Give explicit assumptions needed for the identification of causal effects even when "association is not causation"
Allow for the derivation of new statistical methods when standard and familiar methods fail
Counterfactuals
Suppose you are contemplating taking an aspirin for your headache, and the outcome Y denotes whether or not you are headache-free within, say, the next hour.
As a thought experiment, think of two potential outcomes
$Y_0$: headache outcome after not taking aspirin
$Y_1$: headache outcome after taking aspirin
Note that everything else remains exactly the same
Exactly one of the two will be observed, depending on whether or not you decide to take the aspirin, but never both
Counterfactuals
$Y_a$ is the outcome that you would observe if, possibly contrary to fact, you followed treatment $a \in \{0, 1\}$.
The English sentence "aspirin has no causal effect on my headache" is the mathematical statement about my potential outcomes:
$Y_1 = Y_0$.
Suppose a larger Y indicates a better outcome. An individual has a beneficial causal effect of aspirin if $Y_1 > Y_0$, or a harmful causal effect of aspirin if $Y_1 < Y_0$.
Counterfactuals
Consistency assumption: the observed outcome Y satisfies
$Y = A Y_1 + (1 - A) Y_0$
ID  A  Y  Y0  Y1
1   0  0  0   ?
2   0  1  1   ?
3   1  1  ?   1
4   1  1  ?   1
Note that since both outcomes are never simultaneously observed, it is impossible to evaluate individual causal effects.
Counterfactuals
Remark: the mere definition of the potential variable $Y_a$ carries the so-called assumption of no interference between units (i.e., the stable unit treatment value assumption). Under this assumption, the value of the outcome of one subject who receives treatment a is unaffected by the treatments received by the other subjects in the population.
As an example, the potential outcome would be ill-defined if A = 0 was placebo and A = 1 was treatment with a vaccine for a highly contagious disease. Obviously, if the vaccine is effective, the value of $Y_0$ would depend on whether or not the person's contacts get the vaccine.
Without the no-interference assumption the notation $Y_a$ would be meaningless. We would need a definition of the potential variable for each possible value taken by the vector of treatments of all remaining subjects in the population!
Association vs Causal Measures in the Population
The variables we can hope to observe are (A, Y).
ID  A  Y  Y0  Y1
1   0  0  0   0
2   0  1  1   1
3   1  1  1   1
4   1  1  1   1
Average causal effect (a causal effect measure in the population)
$\psi \equiv E(Y_1) - E(Y_0) = 3/4 - 3/4 = 0$
Difference of conditional expectations (an association measure)
$\zeta = E(Y \mid A = 1) - E(Y \mid A = 0) = 2/2 - 1/2 = 1/2$
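The two measures on this slide can be recomputed directly from the four-person table. The sketch below is only an illustration: the variable names are ours, and exact fractions are used so the results match the slide's arithmetic.

```python
from fractions import Fraction

# Finite population from the table above: (ID, A, Y0, Y1)
population = [
    (1, 0, 0, 0),
    (2, 0, 1, 1),
    (3, 1, 1, 1),
    (4, 1, 1, 1),
]

def mean(values):
    values = list(values)
    return Fraction(sum(values), len(values))

# Consistency: the observed outcome is Y = A*Y1 + (1 - A)*Y0
observed = [(a, a * y1 + (1 - a) * y0) for _, a, y0, y1 in population]

# Causal measure: needs both potential outcomes of every individual
psi = mean(y1 for _, _, _, y1 in population) - mean(y0 for _, _, y0, _ in population)

# Association measure: needs only the observed pairs (A, Y)
zeta = mean(y for a, y in observed if a == 1) - mean(y for a, y in observed if a == 0)

print(psi, zeta)  # 0 1/2
```

In real data only `observed` is available, which is why the rest of the lecture is about assumptions that let association measures recover causal ones.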
Counterfactuals
Technical interlude: in the expressions $E(Y \mid A = 1)$, $E(Y \mid A = 0)$, $E(Y_1)$ and $E(Y_0)$, the random variables Y and $(Y_1, Y_0)$ are to be understood as the outcome and potential outcomes of a person chosen at random from the finite population of four individuals. The expectations, of course, coincide with the averages of the values of the variables over the four members of the finite population. Later on, we will regard these variables as arising from one random draw from an infinite population, and the expectations will be understood as averages of the values of the variables over the infinitely many members of that population.
Randomization
Suppose you randomize the population of patients to either aspirin with probability p > 0 or to no aspirin with probability 1 - p > 0. Then, with $\perp\!\!\!\perp$ denoting independence, it holds that
$Y_a \perp\!\!\!\perp A$, for a = 0, 1
because $Y_1$ and $Y_0$ are, like gender and age, pretreatment variables.
In such a case we have, for a = 0, 1, that
\[
\begin{aligned}
P(Y_a = 1) &= P(Y_a = 1 \mid A = a) && \text{by randomization and } 0 < p < 1 \\
&= P(Y = 1 \mid A = a) && \text{by consistency}
\end{aligned}
\]
Randomization
Thus, the probability distribution of the counterfactuals $Y_a$, a = 0, 1, can be written in terms of the distribution of the observed data (Y, A) and hence it is identified.
Note that randomization does not imply $Y \perp\!\!\!\perp A$, since $Y = A Y_1 + (1 - A) Y_0$ is determined by treatment and therefore is a post-treatment variable.
In fact, under randomization,
$Y \perp\!\!\!\perp A \iff Y_1 \overset{D}{=} Y_0$ ($Y_1$ and $Y_0$ equal in distribution)
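A small simulation can make the identification argument concrete. Everything in this sketch is hypothetical: the randomization probability `p`, the pre-treatment trait `u` and the outcome probabilities are invented for illustration. It checks that, when treatment is a coin flip, $P(Y = 1 \mid A = a)$ is close to $P(Y_a = 1)$, while Y itself still depends on A.

```python
import random

random.seed(2008)
n = 100_000
p = 0.3  # assumed probability of being randomized to aspirin

rows = []
for _ in range(n):
    u = random.random()                               # pre-treatment trait
    y0 = 1 if random.random() < 0.2 + 0.5 * u else 0  # potential outcome without aspirin
    y1 = 1 if random.random() < 0.4 + 0.5 * u else 0  # potential outcome with aspirin
    a = 1 if random.random() < p else 0               # coin flip, ignores (y0, y1)
    y = a * y1 + (1 - a) * y0                         # consistency
    rows.append((a, y, y0, y1))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

p_y1 = mean(y1 for _, _, _, y1 in rows)                  # P(Y_1 = 1), not observable in practice
p_y0 = mean(y0 for _, _, y0, _ in rows)                  # P(Y_0 = 1), not observable in practice
p_y_given_a1 = mean(y for a, y, _, _ in rows if a == 1)  # P(Y = 1 | A = 1)
p_y_given_a0 = mean(y for a, y, _, _ in rows if a == 0)  # P(Y = 1 | A = 0)

print(f"P(Y1=1)={p_y1:.3f}  P(Y=1|A=1)={p_y_given_a1:.3f}")
print(f"P(Y0=1)={p_y0:.3f}  P(Y=1|A=0)={p_y_given_a0:.3f}")
```

The two observable conditional frequencies differ from each other because the simulated treatment has an effect, which is the sense in which Y is a post-treatment variable.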
Motivating Example I
The Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT)
Examine the effect of the use of right heart catheterization (RHC) on survival time up to 30 days
Observational study with 5735 critically ill adult patients
2184 patients received RHC
Motivating Example I
For those 5735 patients, data available on
A dichotomous treatment indicator variable A
A = 1 indicates receiving RHC during the initial 24 hours of an ICU stay (2184)
Response variable Y: survival time up to 30 days
A vector of baseline covariates L including age, years of education, income, weight, blood pressure, etc.
Decision to use or withhold RHC was left to the discretion of the physician based on patient characteristics
Outline
Introduction to Causality
Confounding in Observational Studies
Dependent Censoring in Randomized Trials
Sensitivity Analysis for Nonignorable Censoring
Observational Study
Randomization not applicable due to ethical and practical reasons
Observed data comes from a cross-sectional observational study
A random sample from (L, A, Y), where L is a vector of pre-treatment covariates.
No unmeasured confounders assumption (NUCA):
$Y_a \perp\!\!\!\perp A \mid L$, for a = 0, 1
i.e., $Y_a$ and A are conditionally independent given L.
Observational Study
The intuition behind NUCA is similar to that of randomization
Within each level of L, A is randomly assigned
The randomization probabilities are likely to depend on L
NUCA is achievable only if all common predictors of A and Y are measured
The G-formula.
Theorem 1: if (i) consistency, (ii) NUCA and (iii) positivity, $P(A = a \mid L) > 0$, hold w.p. 1 (together, CNP), the distribution of $Y_a$ is identified and it satisfies
\[
p_a(y) = \int p(y \mid l, a)\, dF(l)
\]
$p_a(y) \equiv p_{Y_a}(y)$: the density (with respect to some measure $\mu$) of $Y_a$ at y
$p(y \mid l, a) \equiv p_{Y \mid L, A}(y \mid l, a)$: the conditional density of Y (at y) given L = l and A = a
$F(l)$: the c.d.f. of L at l
The G-formula.
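Proof: for a = 0, 1
\[
\begin{aligned}
p_a(y) &\equiv p_{Y_a}(y) \\
&= \int p_{Y_a \mid L}(y \mid l)\, dF(l) \\
&= \int p_{Y_a \mid L, A}(y \mid l, a)\, dF(l) && \text{by NUCA (randomization within levels of } L\text{) and positivity} \\
&= \int p_{Y \mid L, A}(y \mid l, a)\, dF(l) && \text{by consistency}
\end{aligned}
\]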
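For a discrete L, the G-formula integral becomes a weighted sum of stratum-specific means, $\sum_l E(Y \mid L = l, A = a) P(L = l)$. The sketch below assumes a tiny hypothetical data set of `(l, a, y)` records and a helper name `g_formula_mean` of our own choosing. It is meant only to show the arithmetic, not a production estimator.

```python
from collections import defaultdict

def g_formula_mean(records, a):
    """Standardized mean: sum over l of E(Y | L=l, A=a) * P(L=l), for discrete L."""
    n = len(records)
    n_l = defaultdict(int)      # number of units at each level of L
    n_la = defaultdict(int)     # number of units with A == a at each level of L
    sum_y = defaultdict(float)  # sum of Y among units with A == a at each level of L
    for l, a_i, y in records:
        n_l[l] += 1
        if a_i == a:
            n_la[l] += 1
            sum_y[l] += y
    total = 0.0
    for l, count in n_l.items():
        if n_la[l] == 0:
            raise ValueError(f"positivity fails in stratum L={l}")
        total += (sum_y[l] / n_la[l]) * (count / n)
    return total

# Hypothetical records (L, A, Y): treatment is more common when L = 1
records = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1),
    (1, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
]

g1 = g_formula_mean(records, 1)  # 0.5 * 1 + 0.5 * (2/3)
g0 = g_formula_mean(records, 0)  # 0.5 * (1/3) + 0.5 * 1
crude1 = sum(y for _, a, y in records if a == 1) / 4  # E(Y | A = 1)
crude0 = sum(y for _, a, y in records if a == 0) / 4  # E(Y | A = 0)
print(g1 - g0, crude1 - crude0)
```

In this toy data the standardized contrast and the crude contrast differ, because the distribution of L is not the same in the two treatment groups.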
Crude vs Counterfactual Means
In general, the crude mean $E(Y \mid A = a)$ is not equal to the counterfactual mean $E(Y_a)$.
$E(Y_a)$: the mean of the outcome Y if, contrary to the fact, everyone in the population was forced to take treatment A = a.
$E(Y \mid A = a)$: the mean of the outcome Y among the sub-population who chose to take treatment A = a.
Crude vs Counterfactual Means
Under the assumptions of the previous theorem (CNP), the causal treatment effect can be identified as
\[
E(Y_a) \overset{(1)}{=} E\left[E(Y \mid A = a, L)\right] \overset{(2)}{=} E\left[E(Y \mid A = a, \pi(a, L))\right] \overset{(3)}{=} E\left[\frac{I(A = a)}{\pi(a, L)}\, Y\right],
\]
where $\pi(a, L) \equiv p(A = a \mid L)$, the so-called propensity score (PS).
Standard Regression
Equality (1) holds since
\[
E(Y_a) = E\left[E(Y_a \mid L)\right] = E\left[E(Y_a \mid A = a, L)\right] = E\left[E(Y \mid A = a, L)\right].
\]
Suppose $a \in \{0, 1\}$ and $E(Y \mid A, L) = b_0(L; \eta) + \beta A$; then
\[
\psi = E(Y_{a=1}) - E(Y_{a=0}) = \beta
\]
The maximum likelihood estimator $\hat\beta$ is consistent and attains the efficiency bound IF the parametric model for $E(Y \mid A, L)$ is correctly specified.
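Propensity Score Adjustments
Equality (3)
\[
E\left[E(Y \mid A = a, \pi(a, L))\right] \overset{(3)}{=} E\left[\frac{I(A = a)}{\pi(a, L)}\, Y\right]
\]
holds since
\[
\begin{aligned}
E\left[\frac{I(A = a)}{\pi(a, L)}\, Y\right]
&= E\left\{E\left[\frac{I(A = a)}{\pi(a, L)}\, Y \,\Big|\, A = a, \pi(a, L)\right]\right\} \\
&= E\left\{\frac{I(A = a)}{\pi(a, L)}\, E\left[Y \mid A = a, \pi(a, L)\right]\right\} \\
&= E\left\{\frac{E\left(I(A = a) \mid \pi(a, L)\right)}{\pi(a, L)}\, E\left[Y \mid A = a, \pi(a, L)\right]\right\} \\
&= E\left(E\left[Y \mid A = a, \pi(a, L)\right]\right)
\end{aligned}
\]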
Propensity Score Adjustments
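Equality (2)
\[
E\left[E(Y \mid A = a, L)\right] \overset{(2)}{=} E\left[E(Y \mid A = a, \pi(a, L))\right]
\]
holds since equality (3) holds and
\[
\begin{aligned}
E\left[\frac{I(A = a)}{\pi(a, L)}\, Y\right]
&= E\left\{E\left[\frac{I(A = a)}{\pi(a, L)}\, Y \,\Big|\, A = a, L\right]\right\} \\
&= E\left\{\frac{I(A = a)}{\pi(a, L)}\, E\left[Y \mid A = a, L\right]\right\} \\
&= E\left\{\frac{E\left(I(A = a) \mid L\right)}{\pi(a, L)}\, E\left[Y \mid A = a, L\right]\right\} \\
&= E\left(E\left[Y \mid A = a, L\right]\right)
\end{aligned}
\]
$Y_a \perp\!\!\!\perp A \mid L \Rightarrow Y_a \perp\!\!\!\perp A \mid \pi(a, L)$ (Rosenbaum and Rubin, 1984)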
Stratification by the Propensity Score
Exploiting the formula $E(Y_a) = E\left[E(Y \mid \pi(a, L), A = a)\right]$, Rosenbaum, Rubin and co-authors, in a number of papers dating back to the mid-1980s, have proposed the following estimator of $E(Y_a)$.
1. Specify a model $\pi(L; \alpha) = \pi(1, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(1 \mid L)$, say $\pi(L; \alpha) = \left\{1 + \exp\left(-\alpha^T L\right)\right\}^{-1}$, indexed by a finite-dimensional vector $\alpha$. Estimate $\alpha$ with its ML estimator $\hat\alpha$ using data $(A_i, L_i)$, $i = 1, \ldots, n$.
2. Stratify the units into, say, quintiles of the estimated propensity scores $\pi(L_i; \hat\alpha)$, $i = 1, \ldots, n$.
3. Estimate the mean of Y among those having A = a, separately for each stratum.
4. Compute the weighted average of the means estimated in 3) with weights equal to the proportion of subjects of the sample in each stratum.
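Steps 2 to 4 above can be sketched in a few lines once estimated propensity scores are available. The function name `ps_stratified_mean` and the toy inputs are hypothetical. Step 1 (fitting the logistic model) is assumed to have been done elsewhere, and strata are formed by rank so that they have equal size.

```python
def ps_stratified_mean(ps, a, y, arm, n_strata=5):
    """Stratify on the estimated propensity score, average Y among units with
    A == arm inside each stratum, then weight each stratum by its sample share."""
    n = len(ps)
    order = sorted(range(n), key=lambda i: ps[i])
    estimate = 0.0
    for s in range(n_strata):
        # Equal-sized strata by rank (quintiles when n_strata == 5)
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        in_arm = [y[i] for i in idx if a[i] == arm]
        if not in_arm:
            raise ValueError(f"stratum {s} has no unit with A={arm}")
        estimate += (len(idx) / n) * (sum(in_arm) / len(in_arm))
    return estimate

# Hypothetical estimated propensity scores, treatments and outcomes
ps = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
a = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5]

mean_treated = ps_stratified_mean(ps, a, y, arm=1)    # about 3.0
mean_untreated = ps_stratified_mean(ps, a, y, arm=0)  # about 2.0
print(mean_treated - mean_untreated)
```

The explicit error for an empty cell is a design choice of this sketch: a stratum with no unit in one arm is a symptom of poor overlap, and silently skipping it would change the estimand.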
Stratification by the Propensity Score
Extendable to multiple treatment arms (Zanutto et al., 2005) under the validity of the following ordinal logit model:
\[
\log\left\{\frac{\Pr(A \ge a)}{\Pr(A < a)}\right\} = \alpha_a + \pi^*(L)
\]
for some known function $\pi^*(L)$, e.g., $\pi^*(L) = \gamma^T L$.
Does not entirely remove confounding if, within each stratum, there is still residual confounding by L, i.e., if A and $Y_a$ are not independent within each stratum (Lunceford and Davidian, 2004).
Inverse Probability Weighting (IPW) Estimators
Exploiting the formula $E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\, Y\right]$, we can construct the following estimator
1. Specify a model $\pi(a, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(a \mid L)$, say $\pi(1, L; \alpha) = \left\{1 + \exp\left(-\alpha^T L\right)\right\}^{-1}$ if $a \in \{0, 1\}$. Estimate $\alpha$ with its ML estimator $\hat\alpha$ using data $(A_i, L_i)$, $i = 1, \ldots, n$
2. Construct an estimator of $E(Y_a)$ using $P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\, Y\right\}$, where $P_n(Z) \equiv n^{-1} \sum_{i=1}^{n} Z_i$ for any n iid copies $Z_1, \ldots, Z_n$ of Z
3. Naturally, $\hat\psi_{ipw1} \equiv P_n\left\{\frac{I(A = 1)}{\pi(1, L; \hat\alpha)}\, Y\right\} - P_n\left\{\frac{I(A = 0)}{\pi(0, L; \hat\alpha)}\, Y\right\}$ is a consistent estimator of $\psi = E(Y_{a=1}) - E(Y_{a=0})$
IPW Estimator: A Heuristic Explanation
Suppose that L is a binary baseline covariate, with L = 1 indicating bad health.
1. Suppose there are 100 subjects with L = 1 and another 100 subjects with L = 0
2. 80 subjects with L = 1 are assigned to A = 1, with $E(Y \mid L = 1, A = 1) = 30$; the remaining 20 are assigned to A = 0, with $E(Y \mid L = 1, A = 0) = 20$
3. 30 subjects with L = 0 are assigned to A = 1, with $E(Y \mid L = 0, A = 1) = 70$; the remaining 70 are assigned to A = 0, with $E(Y \mid L = 0, A = 0) = 60$
4. Under $Y_a \perp\!\!\!\perp A \mid L$, to estimate $E(Y_{a=1})$, each of the 80 subjects with L = 1 and A = 1 counts for themselves and for the other 20 subjects with L = 1 but assigned to A = 0 (weight $1 + \frac{20}{80} = \frac{1}{0.8}$); similarly, each of the 30 subjects with L = 0 and A = 1 counts for themselves and for the other 70 subjects with L = 0 and A = 0 (weight $1 + \frac{70}{30} = \frac{1}{0.3}$)
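The heuristic can be checked numerically with the cell counts and cell means given on the slide. The code below is a sketch with our own function names. It works with cell totals rather than individual records, which is enough here because the weight is constant within a cell.

```python
# Cells from the heuristic example: (L, A, number of subjects, mean of Y)
cells = [
    (1, 1, 80, 30.0),
    (1, 0, 20, 20.0),
    (0, 1, 30, 70.0),
    (0, 0, 70, 60.0),
]
n = sum(count for _, _, count, _ in cells)

def propensity(a, l):
    """P(A = a | L = l), read off the cell counts."""
    in_stratum = sum(c for l_, _, c, _ in cells if l_ == l)
    in_cell = sum(c for l_, a_, c, _ in cells if l_ == l and a_ == a)
    return in_cell / in_stratum

def ipw_mean(a):
    """(1/n) * sum_i I(A_i = a) * Y_i / P(A = a | L_i), computed from cell totals."""
    return sum(c * m / propensity(a, l) for l, a_, c, m in cells if a_ == a) / n

def crude_mean(a):
    """E(Y | A = a): plain average among those who received treatment a."""
    total = sum(c * m for _, a_, c, m in cells if a_ == a)
    return total / sum(c for _, a_, c, _ in cells if a_ == a)

print(round(ipw_mean(1), 6), round(ipw_mean(0), 6))      # 50.0 40.0
print(round(crude_mean(1), 2), round(crude_mean(0), 2))  # 40.91 51.11
```

With these numbers the weighted contrast is +10 while the crude contrast is about -10.2, because treated subjects are concentrated in the bad-health stratum.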
Ratio Estimators
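$E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\, Y\right]$ can be re-written as
\[
E(Y_a) = \left[E\left\{\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\right\}\right]^{-1} E\left\{\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\, Y\right\}
\]
Suppose $\pi(a, L; \alpha)$ is correctly specified; $E(Y_a)$ can be estimated by
\[
\hat E(Y_a) = P_n\left\{\frac{I(A = a)\, Y}{\pi(a, L; \hat\alpha)}\right\} \left[P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right\}\right]^{-1}
\]
$\hat\psi_{ipw2} \equiv \hat E(Y_1) - \hat E(Y_0)$
$\hat\psi_{ipw2}$ is generally more efficient than $\hat\psi_{ipw1}$
Known as a ratio estimator in the sampling literature (Horvitz and Thompson, 1952)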
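The difference between the unnormalized and the ratio form is easiest to see on a toy sample in which the weights do not sum to n. In the sketch below, `ipw1` and `ipw2` are our own names for the two forms, and `pi` holds hypothetical values of $P(A = 1 \mid L_i)$ that are taken as known rather than estimated.

```python
def ipw1(a_obs, y, pi, arm):
    """Unnormalized IPW mean: P_n{ I(A = arm) * Y / pi }."""
    n = len(y)
    return sum(yi / p for ai, yi, p in zip(a_obs, y, pi) if ai == arm) / n

def ipw2(a_obs, y, pi, arm):
    """Ratio IPW mean: same numerator, divided by P_n{ I(A = arm) / pi }."""
    num = sum(yi / p for ai, yi, p in zip(a_obs, y, pi) if ai == arm)
    den = sum(1.0 / p for ai, p in zip(a_obs, pi) if ai == arm)
    return num / den

# Hypothetical sample where every outcome equals 10
a_obs = [1, 1, 0, 0]
y = [10.0, 10.0, 10.0, 10.0]
pi = [0.5, 0.25, 0.5, 0.25]  # assumed P(A = 1 | L_i) for each unit

print(ipw1(a_obs, y, pi, arm=1))  # 15.0
print(ipw2(a_obs, y, pi, arm=1))  # 10.0
```

Because the treated weights here sum to 6 instead of 4, the unnormalized form returns 15 for a constant outcome of 10, while the ratio form returns 10. This boundedness is one informal reason the ratio form tends to behave better in finite samples.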
IPW Estimators with Augmentations
In fact, in the model that assumes a parametric form $\pi(a, L; \alpha)$ for $p_{A \mid L}(a \mid L)$, there exists a class of IPW estimators (with augmentations) of the form
\[
\hat E(Y_a)_{aipw,d} = \underbrace{\frac{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\, Y\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right]}}_{\text{ipw2 estimator}} - \underbrace{\frac{P_n\left[\left(\frac{I(A = a)}{\pi(a, L; \hat\alpha)} - 1\right) d(L)\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right]}}_{\text{augmentation term}}
\]
$\hat\alpha$ is the ML estimator
$d(\cdot)$ is an arbitrary function of L
When $\pi(a, L; \alpha)$ is correct, the augmentation term converges to 0 in probability for any d.
Remarks
The IPW (augmented) estimators and the estimator based on PS stratification will tend to give highly variable and unreliable estimates when the propensity scores are close to 1 or to 0 for some values of L.
This problem is often referred to in the literature as the problem of poor overlap of the propensity scores between the treated and untreated.
Remarks
The regression estimator based on a parametric model, $\phi(a, L; \eta)$, for $E(Y \mid A = a, L)$ does not suffer from this problem because it extrapolates from the fitted model $\hat E(Y \mid A = a, L) = \phi(a, L; \hat\eta)$ to produce estimates of $E(Y \mid A = a, l)$ for values of l with $\pi(a, l) \approx 0$.
However, the estimation now strongly depends on the validity of the regression model $\phi(a, L; \eta)$.
Propensity Score Matching
Rosenbaum and Rubin (1983) first proposed this approach to estimate the average treatment effect on the treated, i.e., $E(Y_{a=1} \mid A = 1) - E(Y_{a=0} \mid A = 1)$.
1. Specify a model $\pi(L; \alpha) = \pi(1, L; \alpha)$ for the conditional probability of taking treatment $p_{A \mid L}(1 \mid L)$, say $\pi(L; \alpha) = \left\{1 + \exp\left(-\alpha^T L\right)\right\}^{-1}$, indexed by a finite-dimensional vector $\alpha$. Estimate $\alpha$ with its ML estimator $\hat\alpha$ using data $(A_i, L_i)$, $i = 1, \ldots, n$, and set $\hat\pi_i = \pi(L_i; \hat\alpha)$
2. Match each treated subject to one or more untreated subjects on the propensity score, based on a certain distance (Rosenbaum, 2002)
3. The estimator can be constructed as
\[
\hat\psi_{PSM} = \frac{\sum_i I(A_i = 1)\left(Y_i - Y_{m,i}\right)}{\sum_i I(A_i = 1)},
\]
where $Y_{m,i}$ is the average outcome of a few subjects in the untreated group (the "controls") with propensity scores close to $\hat\pi_i$.
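One simple way to realize steps 2 and 3 is nearest-neighbour matching with replacement on the absolute difference in estimated propensity scores. That particular distance and the function name `psm_att` are assumptions of this sketch, not something the slides prescribe. The inputs are hypothetical fitted scores, treatments and outcomes.

```python
def psm_att(ps, a, y, n_matches=1):
    """Average over treated units of Y_i minus the mean outcome of the
    n_matches untreated units with the closest estimated propensity score."""
    controls = [i for i in range(len(a)) if a[i] == 0]
    treated = [i for i in range(len(a)) if a[i] == 1]
    if not treated or len(controls) < n_matches:
        raise ValueError("need at least one treated unit and enough controls")
    diffs = []
    for i in treated:
        # Matching with replacement: a control may serve several treated units
        nearest = sorted(controls, key=lambda j: abs(ps[j] - ps[i]))[:n_matches]
        y_matched = sum(y[j] for j in nearest) / n_matches
        diffs.append(y[i] - y_matched)
    return sum(diffs) / len(diffs)

# Hypothetical estimated propensity scores, treatments and outcomes
ps = [0.20, 0.25, 0.60, 0.65, 0.90]
a = [0, 1, 0, 1, 0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]

att = psm_att(ps, a, y)
print(att)  # 2.5
```

Here the treated unit with score 0.25 is matched to the control with score 0.20, and the treated unit with score 0.65 to the control with score 0.60, giving differences 2 and 3.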
Propensity Score Matching
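Suppose $\hat\alpha$ converges in probability to $\alpha^*$
Suppose model $\pi(L; \alpha)$ is correctly specified; then $\hat\psi_{PSM}$ converges in probability to
\[
\begin{aligned}
& E\left\{\left[E(Y \mid A = 1, \pi(L; \alpha^*)) - E(Y \mid A = 0, \pi(L; \alpha^*))\right] \mid A = 1\right\} \\
&= E\left\{\left[E(Y_{a=1} \mid A = 1, \pi(L; \alpha^*)) - E(Y_{a=0} \mid A = 1, \pi(L; \alpha^*))\right] \mid A = 1\right\} \\
&= E(Y_{a=1} - Y_{a=0} \mid A = 1)
\end{aligned}
\]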
Confounding in Observational Studies
Identify causation from association
Standard regression
Propensity Score (PS) adjustments
PS stratification
Inverse probability weighting (IPW)
PS matching
Outline
Introduction to Causality
Confounding in Observational Studies
Dependent Censoring in Randomized Trials
Sensitivity Analysis for Nonignorable Censoring
Motivating Example II
The AIDS Clinical Trial Group (ACTG) study 021, a double-blind randomized clinical trial
Compare the effect of bactrim versus aerosolized pentamidine (AP) as prophylaxis therapy for pneumocystis pneumonia (PCP) in AIDS patients
Our endpoint of interest: the survival status at the end of the trial
Motivating Example II
Informative drop-out: patients stopped prophylactic therapies for non-medical reasons and medical indications
Nonrandom nonadherence: cross over to the other treatment arm if PCP developed
Causal effect: the effect of the two prophylaxis therapies on the survival rate at the end of the trial, if everyone followed the assigned therapy and did not stop the therapy except for toxicity
(Artificially) regard subjects as dependently censored at the first time a subject stops therapy due to reasons other than toxicity or switches therapy
Motivating Example II
Appropriate palliative therapies available to combat the toxicity
Causal effect: the effect of the two prophylaxis therapies on the survival rate at the end of the trial, if everyone followed the assigned therapy and never stopped the therapy
(Artificially) regard subjects as dependently censored at the first time a subject stops therapy due to any reason or switches therapy
A Randomized Clinical Trial with Missing Data
Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, L_i)$, $i = 1, 2, \ldots, n$.
$L_i$: a vector of baseline covariates
$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$
$Y_i$: continuous outcome measured at the end of a fixed follow-up period
$R_i$: missing indicator, $Y_i$ observed if $R_i = 1$
Missing Completely at Random (MCAR)
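MCAR:
$p(R = 1 \mid Y, A, L) = p(R = 1 \mid A, L)$, i.e., $R \perp\!\!\!\perp Y \mid A = a, L$, so that
$p_{Y \mid R = 1, A, L}(y \mid R = 1, a, l) = p_{Y \mid A, L}(y \mid a, l)$
Further, as $A \perp\!\!\!\perp L$ and $p_{Y_a}(y) = p_{Y \mid A = a}(y)$ by randomization,
\[
\begin{aligned}
E\left[E(Y \mid R = 1, A = a, L)\right]
&= \int p(Y \mid A = a, l)\, p_L(l)\, d\mu(l) \\
&\overset{A \perp\!\!\!\perp L}{=} \int p(Y \mid A = a, l)\, p_{L \mid A}(l \mid a)\, d\mu(l) \\
&= E(Y \mid A = a) \overset{\text{rand.}}{=} E(Y_a)
\end{aligned}
\]
Standard complete case analysis is valid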
Motivating Example II
Per-protocol analysis is valid if the probability of being censored is independent of the outcome given the treatment assignment and some pre-treatment covariates
Apparently NOT true
Stopping therapy due to medical indications and other reasons
Switching therapy if PCP developed
PCP status and the medical indications are likely to affect the survival status
A Randomized Clinical Trial with Missing Data
Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, L_i, W_i)$, $i = 1, 2, \ldots, n$.
$L_i$: a vector of baseline covariates
$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$
$W_i$: a vector of post-treatment covariates correlated with $Y_i$ and always observed
$Y_i$: continuous outcome measured at the end of a fixed follow-up period
$R_i$: missing indicator, $Y_i$ observed if $R_i = 1$
Missing at Random (MAR)
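MAR:
$p(R = 1 \mid Y, A = a, L, W) = p(R = 1 \mid A = a, L, W)$, i.e., $R \perp\!\!\!\perp Y \mid A = a, L, W$
Positivity: $\Pr(R = 1 \mid A = a, L, W) > 0$ w.p. 1
However, in general $\bar W_1 \equiv (L, W) \perp\!\!\!\perp A$ DOES NOT hold.
Equivalently,
\[
\begin{aligned}
E\left[E(Y \mid R = 1, A = a, \bar W_1)\right]
&= \int p_{Y \mid A, \bar W_1}(y \mid a, \bar w_1)\, p_{\bar W_1}(\bar w_1)\, d\mu(\bar w_1) \\
&\ne \int p_{Y \mid A, \bar W_1}(y \mid a, \bar w_1)\, p_{\bar W_1 \mid A}(\bar w_1 \mid a)\, d\mu(\bar w_1) = E(Y_a)
\end{aligned}
\]
Standard regression using complete cases will be biased.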
A "Missing Data" Problem
As thoroughly discussed in Robins et al. (1994), the estimation of the average treatment effect $\psi = E(Y_{a=1}) - E(Y_{a=0})$ can be viewed as a "missing data" problem.
Suppose we want to estimate $E(Y_{a=1})$
"full data": $(Y_{a=1}, A, L)$
$Y_{a=1}$ is observed for the subgroup with A = 1, but missing for the other subgroup with A = 0
Under NUCA (i.e., $Y_a \perp\!\!\!\perp A \mid L$), $p(A = 1 \mid Y_{a=1}, L) = p(A = 1 \mid L)$
Similarly, for $E(Y_{a=0})$
"full data": $(Y_{a=0}, A, L)$
$Y_{a=0}$ is observed for the subgroup with A = 0, but missing for the other subgroup with A = 1
Inverse Probability Censoring Weighting (IPCW)
Suppose we are interested in $\tau_1 = E(Y_{a=1})$
By randomization, $p_{Y_{a=1}}(y) = p_{Y \mid A = 1}(y)$
$p_{Y \mid A, R}(y \mid A = 1, R = 1) \ne p_{Y \mid A, R}(y \mid A = 1, R = 0)$
$p_{Y \mid A, R, \bar W_1}(y \mid A = 1, R = 1, \bar w_1) = p_{Y \mid A, R, \bar W_1}(y \mid A = 1, R = 0, \bar w_1)$
Within each level of $\bar W_1$, by assumption, the distribution of Y among those with R = 0, though not observed in our data, equals the distribution of Y among those with R = 1.
Observational study                      Missing data in RCT
$E(Y_1)$                                 $E(Y_1) = E(Y \mid A = 1)$
Whole population                         Sub-population with A = 1
$Y_1$                                    Y
A                                        R
$Y_1$ observed if A = 1                  Y observed if R = 1
L                                        $\bar W_1$
$p(A \mid L, Y_1) = p(A \mid L)$         $p(R \mid A = 1, \bar W_1, Y) = p(R \mid A = 1, \bar W_1)$
Inverse Probability Weighting (IPW) Estimators
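Exploiting the formula $E(Y_a) = E\left[\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\, Y\right]$,
\[
\hat E_1(Y_a) \equiv P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\, Y\right\}
\]
\[
\hat E_2(Y_a) = P_n\left\{\frac{I(A = a)\, Y}{\pi(a, L; \hat\alpha)}\right\} \left[P_n\left\{\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right\}\right]^{-1} \quad \text{as } E\left\{\frac{I(A = a)}{\pi(a, L; \alpha)}\right\} = 1
\]
...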
Inverse Probability Censoring Weighting (IPCW)
Let $\pi_1(\bar W_1) \equiv \Pr(R = 1 \mid \bar W_1, A = 1)$
\[
E\left\{\frac{R\, Y}{\pi_1(\bar W_1)} \,\Big|\, A = 1\right\} = E\left\{E(Y \mid R = 1, \bar W_1, A = 1) \mid A = 1\right\} \overset{\text{MAR}}{=} E(Y \mid A = 1) \overset{\text{rand.}}{=} E(Y_{a=1})
\]
Impose a parametric form $\pi_1(\bar W_1; \alpha)$ for $\Pr(R = 1 \mid \bar W_1, A = 1)$, and let $\hat\alpha$ be the ML estimator
A natural estimator is $\hat\tau_{1,ipcw} = P_{n_1}\left[\frac{R}{\pi_1(\bar W_1; \hat\alpha)}\, Y\right]$, where $n_1$ is the size of the treated group, and $P_{n_1}(Z)$ is the sample average of $Z_1, Z_2, \ldots, Z_{n_1}$ within the treated group.
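When $\bar W_1$ is discrete, the ML fit of a saturated model for $\pi_1$ is just the observed fraction with R = 1 at each level, so the IPCW estimator can be sketched without any model fitting. The function name `ipcw_mean` and the eight-subject treated arm below are hypothetical.

```python
from collections import Counter

def ipcw_mean(w, r, y):
    """IPCW mean within one arm: P_n1{ R * Y / pi_1(W) } for discrete W, with
    pi_1(w) estimated by the observed fraction with R = 1 at W = w."""
    n1 = len(w)
    n_w = Counter(w)
    n_obs = Counter(wi for wi, ri in zip(w, r) if ri == 1)
    for level in n_w:
        if n_obs[level] == 0:
            raise ValueError(f"positivity fails at W={level}")
    total = 0.0
    for wi, ri, yi in zip(w, r, y):
        if ri == 1:  # Y is only used where it is observed
            total += yi / (n_obs[wi] / n_w[wi])
    return total / n1

# Hypothetical treated arm: drop-out happens only when W = 1
w = [0, 0, 0, 0, 1, 1, 1, 1]
r = [1, 1, 1, 1, 1, 1, 0, 0]
y = [10.0, 10.0, 10.0, 10.0, 20.0, 20.0, None, None]

tau_ipcw = ipcw_mean(w, r, y)
complete_case = sum(yi for yi, ri in zip(y, r) if ri == 1) / sum(r)
print(tau_ipcw, round(complete_case, 2))  # 15.0 13.33
```

The two observed subjects with W = 1 each receive weight 2, standing in for the two who dropped out, which moves the estimate from the complete-case mean of about 13.33 up to 15.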
IP(T)W Estimators with Augmentations
Recall the class of IP(T)W estimators (with augmentations)
\[
\hat E(Y_a)_{aipw,d} = \underbrace{\frac{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\, Y\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right]}}_{\text{ratio estimator}} - \underbrace{\frac{P_n\left[\left(\frac{I(A = a)}{\pi(a, L; \hat\alpha)} - 1\right) d(L)\right]}{P_n\left[\frac{I(A = a)}{\pi(a, L; \hat\alpha)}\right]}}_{\text{augmentation term}}
\]
The parameter of interest is $E(Y_a) = E\left\{\frac{I(A = a)}{p_{A \mid L}(a \mid L)}\, Y\right\}$ under CNP
$\pi(a, L; \alpha)$ is an assumed parametric form for $p_{A \mid L}(a \mid L)$ and $\hat\alpha$ is the ML estimator using $(A_i, L_i)$, $i = 1, \ldots, n$
$d(\cdot)$ is an arbitrary function of L
When $\pi(a, L; \alpha)$ is correct, the augmentation term converges to 0 in probability for any d.
IPCW Estimators with Augmentations
Similarly, a class of IPCW estimators with augmentation terms
\[
\hat\tau_{1,ipcw,d} = \frac{P_{n_1}\left[\frac{R}{\pi_1(\bar W_1; \hat\alpha)}\, Y\right]}{P_{n_1}\left[\frac{R}{\pi_1(\bar W_1; \hat\alpha)}\right]} - \frac{P_{n_1}\left[\left(\frac{R}{\pi_1(\bar W_1; \hat\alpha)} - 1\right) d(\bar W_1)\right]}{P_{n_1}\left[\frac{R}{\pi_1(\bar W_1; \hat\alpha)}\right]},
\]
The parameter of interest is $E(Y_{a=1}) = E(Y_{a=1} \mid A = 1) = E\left\{\frac{R}{\pi_1(\bar W_1)}\, Y \,\Big|\, A = 1\right\}$ under MAR and positivity
$\pi_1(\bar W_1; \alpha)$ is an assumed parametric form for $\Pr(R = 1 \mid \bar W_1, A = 1)$ and $\hat\alpha$ is the ML estimator using realizations of $(R, \bar W_1)$ in the treated group only
$d(\cdot)$ is an arbitrary function of $\bar W_1$
When $\pi_1(\bar W_1; \alpha)$ is correct, the augmentation term converges to 0 in probability for any d.
Dependent Censoring in a Longitudinal Study
Suppose the surrogate variables W are to be measured at fixed times $k = 1, 2, \ldots, K - 1$.
For any $k \le K - 1$, let $\bar W_{k,i} = (W_{0,i} = L_i, W_{1,i}, \ldots, W_{k-1,i})$, the W-history for subject i up to but not including the kth occasion.
The outcome $Y_i$ is measured at time K, the end of follow-up.
$R_{k,i} = 1$ if subject i remains in the follow-up at time k, and 0 otherwise.
Monotone missing pattern, i.e., $R_{k,i} = 0 \Rightarrow R_{s,i} = 0$ for any $s \ge k$.
Dependent Censoring in a Longitudinal Study
For any $1 \le k \le K$,
1. Sequentially ignorable missingness (SIM):
\[
\Pr(R_k = 1 \mid R_{k-1} = 1, \bar W_k, A, Y) = \Pr(R_k = 1 \mid R_{k-1} = 1, A, \bar W_k)
\]
2. Positivity: with probability 1,
\[
\lambda_k(a, \bar W_k) \equiv \Pr(R_k = 1 \mid R_{k-1} = 1, \bar W_k, A = a) \ge \sigma > 0
\]
$\pi(a, \bar W_K) \equiv \Pr(R_K = 1 \mid \bar W_K, A = a) = \prod_{k=1}^{K} \lambda_k(a, \bar W_k)$. Let $R = R_K$
$E(Y_a) = E(Y_a \mid A = a) = E\left\{\frac{R}{\pi(a, \bar W_K)}\, Y \,\Big|\, A = a\right\}$ under (i) Consistency, (ii) SIM, and (iii) Positivity
Dependent Censoring in a Longitudinal Study
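Impose a parametric form $\{\lambda_k(1, \bar W_k; \alpha) : 1 \le k \le K\}$ indexed by a finite-dimensional parameter vector $\alpha$
The partial likelihood for the missing mechanism
\[
L = \prod_{k=1}^{K} \left\{\lambda_k(1, \bar W_k; \alpha)^{R_k} \left(1 - \lambda_k(1, \bar W_k; \alpha)\right)^{1 - R_k}\right\}^{R_{k-1}}
\]
The estimator $\hat\alpha$ is obtained by maximizing the partial likelihood $\prod_{i=1}^{n_1} L_i$ using $(R_i \bar W_{K,i}, \ldots, R_{k,i} \bar W_{k,i}, \ldots, R_{1,i} \bar W_{1,i}, W_{0,i})$ for $i = 1, \ldots, n_1$, the subjects in the treated group
\[
\hat\pi(1, \bar W_K) = \prod_{k=1}^{K} \lambda_k(1, \bar W_k; \hat\alpha) \quad \text{and} \quad \hat E(Y_{a=1}) = P_{n_1}\left\{\frac{R}{\hat\pi(1, \bar W_K)}\, Y\right\}
\]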
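Once the visit-specific probabilities $\lambda_k$ have been fitted, the weight for a subject who completes follow-up is the inverse of their product. The sketch below takes the fitted values as given: the function names, the list layout `(r, lambdas, y)` and the numbers are all hypothetical, and the partial-likelihood fit itself is not shown.

```python
import math

def cumulative_retention(lambdas):
    """pi_hat = product over k of lambda_k: probability of staying uncensored through time K."""
    return math.prod(lambdas)

def ipcw_longitudinal_mean(subjects):
    """subjects: list of (r, lambdas, y) for one arm, where r = R_K, lambdas holds the
    fitted lambda_k for k = 1..K, and y is the outcome (unused when r == 0).
    Returns P_n1{ R * Y / pi_hat }."""
    total = 0.0
    for r, lambdas, y in subjects:
        if r == 1:
            total += y / cumulative_retention(lambdas)
    return total / len(subjects)

# Hypothetical treated arm with K = 2 visits; drop-outs contribute only to n1
subjects = [
    (1, [0.5, 0.5], 8.0),  # pi_hat = 0.25, weight 4
    (1, [1.0, 0.5], 4.0),  # pi_hat = 0.50, weight 2
    (0, None, None),
    (0, None, None),
]

estimate = ipcw_longitudinal_mean(subjects)
print(estimate)  # 10.0
```

Subjects who drop out need no fitted product here, since their term in the average is zero. They still count in the denominator $n_1$.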
Outline
Introduction to Causality
Confounding in Observational Studies
Dependent Censoring in Randomized Trials
Sensitivity Analysis for Nonignorable Censoring
Back to the MAR Setting
Suppose we have i.i.d. observations $O_i = (R_i Y_i, A_i, \bar W_{1,i} = (L_i, W_i))$, $i = 1, 2, \ldots, n$.
$L_i$: a vector of baseline covariates
$W_i$: a vector of post-treatment covariates correlated with $Y_i$ and always observed
$A_i$: a dichotomous variable indicating the randomly assigned treatment arm, $A_i \in \{0, 1\}$
$Y_i$: continuous outcome measured at the end of a fixed follow-up period
$R_i$: missing indicator, $Y_i$ observed if $R_i = 1$
Nonignorable Censoring
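Ignorable censoring: $p(R = 1 \mid Y, A = a, \bar W_1) = p(R = 1 \mid A = a, \bar W_1)$
Nonignorable censoring: $p(R = 1 \mid Y, A = a, \bar W_1) \ne p(R = 1 \mid A = a, \bar W_1)$
Because of ignorability, the likelihood for one individual is given by
\[
\begin{aligned}
& \left[p_{Y \mid R, \bar W_1, A = a}(y \mid R = 1, \bar w_1)\right]^{R} p_{\bar W_1 \mid A = a}(\bar w_1) \times \left[\pi(a, \bar W_1; \alpha)\right]^{R} \left(1 - \pi(a, \bar W_1; \alpha)\right)^{1 - R} \\
&= \left[p_{Y \mid \bar W_1, A = a}(y \mid \bar w_1; \eta_1)\right]^{R} p_{\bar W_1 \mid A = a}(\bar w_1; \eta_2) \times \left[\pi(a, \bar W_1; \alpha)\right]^{R} \left(1 - \pi(a, \bar W_1; \alpha)\right)^{1 - R}
\end{aligned}
\]
The parameter of interest $\tau = E(Y \mid A = a)$ depends on $\eta_1$ and $\eta_2$.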
Nonignorable Censoring
$p(R = 1 \mid Y, A = a, \bar W_1) = \pi(a, \bar W_1, Y; \alpha)$ instead of $\pi(a, \bar W_1; \alpha)$
For example, $\pi(a, \bar W_1, Y; \alpha) = \operatorname{expit}\left(\alpha^T \bar W_1 + \gamma Y\right)$
$\gamma = 0 \iff$ ignorable censoring
$\gamma$: a measure of the correlation between the censoring indicator R and the outcome Y after adjusting for $\bar W_1$
A larger value of $\gamma$ implies more hidden bias
Sensitivity Analysis
For any given value of $\gamma$,
The ML estimator $\hat\alpha_{ML}$ is not obtainable, as the following likelihood depends on the unobserved outcome
\[
L = \prod_{i=1}^{n_a} \frac{e^{\left(\alpha^T \bar W_{1,i} + \gamma Y_i\right) R_i}}{1 + e^{\alpha^T \bar W_{1,i} + \gamma Y_i}}
\]
A consistent and asymptotically normal (CAN) estimator $\hat\alpha$ is still obtainable by solving the following estimating equations
\[
\sum_{i=1}^{n_a} \left(\frac{R_i}{\pi(a, \bar W_{1,i}, Y_i; \alpha)} - 1\right) d(\bar W_{1,i}) = 0
\]
where $d(\cdot)$ is a vector of arbitrary functions of $\bar W_{1,i}$, and has the same dimension as $\alpha$
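In the simplest special case the estimating equation has a closed-form solution, which makes the sensitivity analysis easy to sketch. The code assumes an intercept-only selection model $\pi(Y) = \operatorname{expit}(\alpha + \gamma Y)$ with no covariates and $d \equiv 1$. This simplification, the function name `ipcw_under_gamma` and the data are ours. Only observed outcomes enter the equation because terms with $R_i = 0$ contribute $-1$.

```python
import math

def ipcw_under_gamma(y_obs, n_missing, gamma):
    """Intercept-only selection model pi(Y) = expit(alpha + gamma * Y) with d = 1.
    Solves sum_i (R_i / pi_i - 1) = 0 for alpha and returns (alpha, IPCW mean)."""
    n = len(y_obs) + n_missing
    # 1 / expit(x) = 1 + exp(-x), so the equation reduces to
    # exp(-alpha) * sum over observed of exp(-gamma * Y_i) = n_missing
    s = sum(math.exp(-gamma * yi) for yi in y_obs)
    alpha = math.log(s / n_missing)
    weights = [1.0 + math.exp(-(alpha + gamma * yi)) for yi in y_obs]
    tau = sum(wi * yi for wi, yi in zip(weights, y_obs)) / n
    return alpha, tau

# Hypothetical arm: four observed outcomes and two drop-outs
y_obs = [1.0, 2.0, 3.0, 4.0]
n_missing = 2

for gamma in (-1.0, 0.0, 1.0):
    alpha_hat, tau_hat = ipcw_under_gamma(y_obs, n_missing, gamma)
    print(f"gamma={gamma:+.1f}  alpha={alpha_hat:.3f}  tau={tau_hat:.3f}")
```

At $\gamma = 0$ every observed subject gets the same weight and the estimate equals the complete-case mean. A positive $\gamma$ says that larger outcomes were more likely to be observed, so small observed outcomes are up-weighted and the estimate falls below the complete-case mean.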
Sensitivity Analysis
The IPCW estimator is robust if it is insensitive to different values of $\gamma$
More flexible models for the missing mechanism: $\pi(a, \bar W_1, Y; \alpha) = \operatorname{expit}\left(\alpha^T \bar W_1 + \gamma\, q_a(Y, \bar W_1)\right)$, with $q_a(Y, \bar W_1)$ being a known function of Y and $\bar W_1$
The choice of $d(\cdot)$ in the estimating equations affects the efficiency
See Rotnitzky et al. (1998, 2001) and Scharfstein et al. (1999) for detailed discussions
Conclusions (I): the framework of counterfactuals can be useful
Make it explicit what we mean by "causal effect", i.e., what is the quantity/estimand we seek?
Give explicit assumptions under which "association is causation", and therefore standard statistical methods may be used
Give explicit assumptions needed for the identification of causal effects even when "association is not causation"
Allow for the derivation of new statistical methods when standard and familiar methods fail
Conclusions (II): methods need to be used with caution
The counterfactual framework may not apply to some settings
Make sure the estimand has the right meaning and interpretation
Check the validity of the associated assumptions; some assumptions are not even testable, e.g., NUCA
Expert knowledge can be used to enhance the plausibility of the assumptions
References
Connors, AF., Speroff, T., Dawson, NV., et al. (1996). The effectiveness of right heart catheterization in the initial care of critically ill patients. Journal of the American Medical Association 276, 889-897.
Fischl, MA., Parker, CB., Pettinelli, C., et al. (1990). A randomized controlled trial of a reduced daily dose of zidovudine in patients with the acquired immunodeficiency syndrome. New England Journal of Medicine 323, 1009-1014.
Horvitz, DG. and Thompson, DJ. (1952). A generalization of sampling without replacement from a finite universe. Journal of the American Statistical Association 47, 663-685.
Lin, DY., Psaty, BM., Kronmal, RA. (1998). Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54, 948-963.
Lunceford, JK. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Statistics in Medicine 23, 2937-2960.
Robins, JM. and Finkelstein, DM. (2000). Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics, 779-788.
Robins, JM., Rotnitzky, A., Zhao, LP. (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association 89, 846-866.
Rosenbaum, PR. and Rubin, DB. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41-55.
Rosenbaum, PR. and Rubin, DB. (1984). Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79, 516-524.
Rosenbaum, PR. (1998). Propensity score. In Encyclopedia of Biostatistics, Volume 5, Armitage P, Colton T (eds). Wiley: New York, 3551-3555.
Rosenbaum, PR. (2002). Observational Studies, 2nd edn. New York: Springer-Verlag.
Rotnitzky, A. and Robins, JM. (1995). Semiparametric regression estimation in the presence of dependent censoring. Biometrika 82(4), 805-820.
Rotnitzky, A., Robins, JM., Scharfstein, DO. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association 93, 1321-1339.
Rotnitzky, A., Scharfstein, DO., Su, TL., Robins, JM. (2001). Methods for conducting sensitivity analysis of trials with potentially nonignorable competing causes of censoring. Biometrics 57, 103-113.
Scharfstein, DO., Robins, JM., Eddings, W., Rotnitzky, A. (2001). Inference in randomized studies with informative censoring and discrete time-to-event endpoints. Biometrics 57(2), 404-413.
Scharfstein, DO. and Robins, JM. (2002). Estimation of the failure time distribution in the presence of informative censoring. Biometrika 89(3), 617-634.
Scharfstein, DO., Rotnitzky, A., Robins, JM. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion). Journal of the American Statistical Association 94, 1096-1146.
Tan, Z. (2006). A distributional approach for causal inference using propensity scores. Journal of the American Statistical Association 101(476), 1619-1637.
Zanutto, E., Lu, B., and Hornik, R. (2005). Using propensity score subclassification for multiple treatment doses to evaluate a national antidrug media campaign. Journal of Educational and Behavioral Statistics 30(1), 59-73.