

Analysis of disease progression from clinical observations of US Air Force active duty members infected with the human immunodeficiency virus: distribution of AIDS survival time from interval-censored observations

Jorge Aragón*, Mary Weston*‡ and Ronald Warner†

A non-parametric estimator of the AIDS survival time (after developing AIDS) is computed for the AIDS data set from the US Air Force (USAF). Survival times are unobservable. They are censored by the screening mechanism. The Armstrong Laboratory's Epidemiologic Research Division maintains data on over 954 active duty US Air Force (USAF) individuals who tested positive for human immunodeficiency virus (HIV) antibodies. Many have been clinically evaluated seven times since 1986. The HIV-positive individual is classified in seven stages of the disease complex as time progresses. Exact times of transition from one stage to the next are unknown. It is known that transition occurred between two consecutive evaluations. The aim of this study is to analyse distributions of the times that individuals spend in each stage of the HIV disease complex. We will discuss methods used to obtain non-parametric estimators of the distribution of times that individuals spend in stage 6. Finally, it is hoped to model the median time spent in each stage of the disease. This, along with incidence and separation data, will allow the prediction of the impact of HIV disease on USAF individuals and medical care systems.

Keywords: Concave programming problem; non-parametric maximum likelihood estimator

The US Air Force data file for HIV/AIDS contains screening information on 954 infected individuals who have been evaluated at least once since 1986. Some individuals have been clinically evaluated as many as seven times. The number of infections and evaluations per individual will increase as time passes. At each evaluation the infected individual is classified into one of seven stages of the disease. The different stages are the Walter Reed (WR) classes 1 to 6 and, finally, death. The description of the different WR stages is progressive:

Stage 1: Infected, but no symptoms.
Stage 2: First signs of infection, such as swollen lymph glands.
Stage 3: Abnormally low number of T-helper cells (with or without symptoms).
Stage 4: Low response to skin-test battery for body's ability to fight disease.
Stage 5: No response to skin-test battery, or yeast infection in the mouth.
Stage 6: AIDS.

* Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA. † Epidemiologic Research Division (AOES), Armstrong Laboratory, Brooks AFB, TX 78235, USA. ‡ To whom correspondence should be addressed

0264-410X/93/05/0552-03 © 1993 Butterworth-Heinemann Ltd. Vaccine, Vol. 11, Issue 5, 1993

Our primary interest is to study the progression of the disease from one stage to the next within an infected individual. The aim is to obtain non-parametric estimators of the distribution of the time that an infected individual spends in each stage of the disease. The exact transition times from one stage to the next are unknown. It is only known that transition occurred within an interval of time, a situation known as interval censoring. Since the origin (infection) for each individual is also unknown, we consider first the backward process and restrict the analysis to those individuals that provide information about stage 6 (92 observations). Of these 92 individuals, 76 are dead. For the deaths, the origin of this backward process is the date of death. For those that are alive, the origin of the backward process is the date of the last medical evaluation. From this subset of the USAF data we estimate (non-parametrically) the distribution of the AIDS survival time (time that an infected individual spends in Walter Reed stage 6) for those infected individuals who develop AIDS.

A non-parametric estimator of a survival distribution was introduced by Kaplan and Meier¹ for right-censored data. Turnbull² developed the expectation-maximization (EM) algorithm for doubly-censored data, which is an iterative method based on the self-consistency concept introduced by Efron³. Groeneboom⁴ presented an alternative method, called the iterative convex minorant (CM) algorithm, for interval-censored observations. The method used here is an adaptation of the iterative CM algorithm by Aragon and Eberly⁵.

METHODS

Let $X_i$ be the time that the ith infected individual spends in stage 6 and let $(T_i, U_i)$ be the interval censoring the variable $X_i$. The random variable $X_i$ is unobservable. We only observe $T_i$, $U_i$ and the indicators $d_i = 1\{X_i \le T_i\}$ and $e_i = 1\{T_i < X_i \le U_i\}$. Note that $d_i = 1$ corresponds to a left-censored observation and $d_i = e_i = 0$ corresponds to a right-censored observation. Therefore $U_i$ is not needed if $d_i = 1$ and $T_i$ is not needed if $d_i = e_i = 0$. Figure 1 illustrates the three cases. It is assumed that $X_i$ and $(T_i, U_i)$ are independent.

Figure 1 Three different cases of interval censoring for non-parametric estimation: left-censored ($d = 1$, $X \le T$), interval-censored ($e = 1$, $T < X \le U$) and right-censored ($d = e = 0$, $X > U$)

We follow the approach described by Groeneboom. The likelihood function, conditional on those who develop AIDS, is given by

$$L(F) = \prod_{i=1}^{n} F(T_i)^{d_i} \, \bigl(F(U_i) - F(T_i)\bigr)^{e_i} \, \bigl(1 - F(U_i)\bigr)^{1 - d_i - e_i} \qquad (1)$$

where $F$ is the cumulative distribution function of $X_i$ and $n$ is the sample size.

The non-parametric maximum likelihood estimator (NPMLE) of the distribution $F$ is the function $\hat{F}$ maximizing Equation (1). Note that only the values of $F$ at the points $T_i$ or $U_i$ matter for the maximization problem.

If we let $y_i = F(T_{(i)})$, where $T_{(i)}$ is the ith order statistic of the set $\{T_1, \ldots, T_n, U_1, \ldots, U_n\}$, the maximization problem is formulated as follows. Maximize the (conditional) log likelihood

$$\phi(y) = \sum_{i \in I_0} \log(y_i) + \sum_{i \in I_1} \log(1 - y_i) + \sum_{i \in I_2} \log(y_i - y_{i'}) \qquad (2)$$

over the simplex $S = \{ y \in \mathbb{R}^m : 0 \le y_1 \le \cdots \le y_m \le 1 \}$, where $\{1, \ldots, m\} = I_0 \cup I_1 \cup I_2$ is a disjoint union, and where $i' < i$ is an index dependent on the choice of $i$.

The algorithm to solve this concave programming problem, called the iterative damped CM algorithm, is described in detail by Aragón and Eberly⁵. It is an adaptation of the CM algorithm developed by Groeneboom, modified in order to have global convergence. There are no convergence results for the CM algorithms of Groeneboom.
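As a rough numerical illustration of the maximization problem (2), the sketch below estimates F at the relevant endpoints with a generic self-consistency (EM) iteration. This is a different and simpler technique than the iterative damped CM algorithm of reference 5, whose details are not reproduced in this paper. The function names `cell_range` and `em_npmle`, the cell construction and the fixed iteration count are assumptions made for this example. Observations are (T, U, d, e) tuples, and T < U is assumed for interval-censored records.

```python
import math


def cell_range(T, U, d, e, index, k):
    """Half-open range of cell indices compatible with one observation.

    With sorted endpoints c[0] < ... < c[k-1], cell j (j < k) is the
    interval (c[j-1], c[j]] and cell k is (c[k-1], infinity).
    """
    if d == 1:                       # left-censored: X <= T
        return 0, index[T] + 1
    if e == 1:                       # interval-censored: T < X <= U
        return index[T] + 1, index[U] + 1
    return index[U] + 1, k + 1       # right-censored: X > U


def em_npmle(obs, n_iter=1000):
    """Self-consistency (EM) estimate of F at the relevant endpoints.

    Illustrative only: NOT the iterative damped CM algorithm.
    Returns (endpoints, F at endpoints, log likelihood).
    """
    # Only T matters when d = 1, only U when d = e = 0, both when e = 1.
    cuts = sorted({T for T, U, d, e in obs if d == 1 or e == 1}
                  | {U for T, U, d, e in obs if d == 0})
    index = {c: j for j, c in enumerate(cuts)}
    k, n = len(cuts), len(obs)
    ranges = [cell_range(T, U, d, e, index, k) for T, U, d, e in obs]

    p = [1.0 / (k + 1)] * (k + 1)    # uniform starting masses on the cells
    for _ in range(n_iter):
        new = [0.0] * (k + 1)
        for lo, hi in ranges:
            denom = sum(p[lo:hi])    # current probability of this observation
            for j in range(lo, hi):
                new[j] += p[j] / denom
        p = [x / n for x in new]

    loglik = sum(math.log(sum(p[lo:hi])) for lo, hi in ranges)
    F, acc = [], 0.0
    for j in range(k):               # cumulative sums give F at each endpoint
        acc += p[j]
        F.append(acc)
    return cuts, F, loglik


# Toy data: two left-censored records at T = 1 and one right-censored at U = 1.
toy = [(1.0, 0.0, 1, 0), (1.0, 0.0, 1, 0), (0.0, 1.0, 0, 0)]
cuts, F, loglik = em_npmle(toy)
```

For the toy data the estimate of F at the single endpoint 1.0 is 2/3, which is the maximizer of 2 log y + log(1 - y). The same function can be applied to the 92 records of Table 1, but its output has not been checked here against Table 2.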

RESULTS AND DISCUSSION

The 92 observations corresponding to the infected individuals from the USAF data who developed AIDS are listed in Table 1. The first 16 observations correspond to individuals who are still alive. For this group, the value of u is computed as the time interval between the first and last evaluations in which the individual was classified in stage 6. The unit of time used is months (28 days per month). For the remaining 76 observations the value of t (if positive) is computed as the time interval between the date of death and (1) the last date on which the individual was classified in stage 5, if d = 0, or (2) the first date on which the individual was classified in stage 6, if e = 0.

The value of u (if positive) is computed as the time interval between the date of death and the last date on which the individual is classified in stage 5. The estimated distribution is given in Table 2 and Figure 2.

Table 1 Walter Reed stage 6 data from US Air Force

    T      U  d  e        T      U  d  e
 0.00  35.36  0  0    22.25  24.96  0  1
 0.00  18.00  0  0    16.32  38.93  0  1
 0.00  18.11  0  0    11.89  48.04  0  1
 0.00  14.43  0  0    24.36  37.82  0  1
 0.00  17.18  0  0     7.93  24.89  0  1
 0.00  16.68  0  0    21.25  28.96  0  1
 0.00  17.50  0  0    24.39  41.14  0  1
 0.00  26.04  0  0    14.61  30.07  0  1
 0.00  18.21  0  0     9.57  28.07  0  1
 0.00  22.43  0  0     9.75  26.43  0  1
 0.00  15.04  0  0    11.89  29.18  0  1
 0.00  22.64  0  0     5.89  18.93  0  1
 0.00   7.96  0  0    18.25  27.32  0  1
 0.00  17.50  0  0     6.54  22.96  0  1
 0.00  37.46  0  0    11.32  23.54  0  1
 0.00  37.61  0  0    13.50  31.18  0  1
26.21  27.57  0  1     0.00  52.61  0  0
10.21  29.07  0  1     0.00  15.82  0  0
 6.07  12.54  0  1     0.00  13.64  0  0
14.96  28.71  0  1     0.00  13.64  0  0
20.00  25.21  0  1     0.00  16.32  0  0
18.96  39.21  0  1     0.00  15.14  0  0
 3.32  29.36  0  1     0.00  12.79  0  0
 3.11   0.00  1  0     0.00   9.79  0  0
 7.57   0.00  1  0     0.00  11.04  0  0
 6.29   0.00  1  0     0.00  18.04  0  0
12.32   0.00  1  0     0.00  10.57  0  0
 8.00   0.00  1  0     0.00  17.54  0  0
 8.93   0.00  1  0     0.00   5.79  0  0
 3.64   0.00  1  0     0.00   5.86  0  0
20.07   0.00  1  0     0.00   7.54  0  0
 2.07   0.00  1  0     0.00  10.89  0  0
10.25   0.00  1  0     0.00  20.29  0  0
 7.32   0.00  1  0     0.00  13.25  0  0
 7.82   0.00  1  0     0.00   8.39  0  0
24.75   0.00  1  0     0.00  11.82  0  0
14.68   0.00  1  0     0.00   7.29  0  0
 7.96   0.00  1  0     0.00   2.43  0  0
16.25   0.00  1  0     0.00  17.39  0  0
13.54   0.00  1  0     0.00   4.71  0  0
20.29   0.00  1  0     0.00  24.75  0  0
13.86   0.00  1  0     0.00  18.64  0  0
 3.86   0.00  1  0     0.00  29.75  0  0
 5.46   0.00  1  0     0.00  19.93  0  0
19.36   0.00  1  0     0.00  11.36  0  0
 0.00  22.75  0  0     0.00  23.14  0  0

Table 2 Estimated cumulative probabilities of time (months) from AIDS to death

Timeᵃ   Cumulative prob.
 2.07   0.16
 6.29   0.21
 7.32   0.25
18.93   0.31
22.96   0.51
24.89   0.65
26.43   0.83
37.82   0.90
52.61   0.90

ᵃ Timing is measured in months of 28 days
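The time values in Table 1 are differences between dates, expressed in months of 28 days. A minimal sketch of that conversion follows. The helper name `months_between` and the example dates are hypothetical, since the paper does not list the underlying evaluation dates.

```python
from datetime import date

DAYS_PER_MONTH = 28  # convention stated in the text


def months_between(earlier: date, later: date) -> float:
    """Elapsed time from `earlier` to `later`, in months of 28 days."""
    return (later - earlier).days / DAYS_PER_MONTH


# Hypothetical individual who is still alive: u is the time between the
# first and last evaluations at which the individual was in stage 6.
first_stage6 = date(1989, 3, 1)
last_stage6 = date(1990, 3, 1)
u = round(months_between(first_stage6, last_stage6), 2)
print(u)  # 365 days / 28 = 13.04 (rounded to two decimals)
```

Rounding to two decimals matches the precision used in Table 1.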

From Figure 2 we can see that the median survival time in stage 6 is approximately 23 months, which is consistent with results reported in the literature. Longini et al.⁶ estimated, from US Army data, a mean time of 1.3 years for individuals older than 30, and 2 years for the age group 25 years or younger.
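The median quoted above can also be read directly from the estimated cumulative probabilities of Table 2. The sketch below treats the estimate as a step function and returns the first tabulated time at which it reaches a given level. The helper name `quantile` is an assumption of this example, and the data are the nine rows of Table 2.

```python
# (time in 28-day months, estimated cumulative probability), from Table 2
TABLE_2 = [
    (2.07, 0.16), (6.29, 0.21), (7.32, 0.25),
    (18.93, 0.31), (22.96, 0.51), (24.89, 0.65),
    (26.43, 0.83), (37.82, 0.90), (52.61, 0.90),
]


def quantile(table, q):
    """First tabulated time whose cumulative probability is at least q.

    Returns None when the estimate never reaches q.
    """
    for time, prob in table:
        if prob >= q:
            return time
    return None


median = quantile(TABLE_2, 0.5)
print(median)  # 22.96, i.e. approximately 23 months
```

Because the estimate levels off at 0.90, quantiles above that level are not identified by these data, and `quantile(TABLE_2, 0.95)` returns `None`.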

It is clear from the graph of the estimated distribution that the USAF sample size is not large enough to provide a reliable estimate. This will improve with time as more information becomes available. The data do not provide information on treatment or reporting delays for deaths. This may be a problem since it is likely that there are reporting delays of deaths which may be a source of bias.

One last observation is that the clinical evaluation which defines the censoring mechanism may not be independent of the survival time. The fact that infected individuals are not evaluated at random times, and some of them decide on their own when to show up for an evaluation, suggests that an informative censoring mechanism is more appropriate. If this is the case, special care should be taken when interpreting the estimator of the survival time distribution.

Figure 2 Estimated cumulative probability distribution of survival after AIDS diagnosis (USAF 1990 data). Horizontal axis: months.

ACKNOWLEDGEMENTS

The authors wish to thank Major (Dr) Daniel R. Lucey, USAF, MC and Major (Dr) Craig W. Hendrix, Wilford Hall USAF Medical Center, Lackland AFB, TX for providing the data for patients entering Walter Reed stages 5 and 6.

REFERENCES

1 Kaplan, E.L. and Meier, P. Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc. 1958, 53, 457-481

2 Turnbull, B.W. Nonparametric estimation of a survivorship function with doubly-censored data. J. Am. Stat. Assoc. 1974, 69, 169-173

3 Efron, B. The two sample problem with censored data. Proc. 5th Berkeley Symp. 1967, Vol. 4, pp. 831-853

4 Groeneboom, P. Nonparametric Maximum Likelihood Estimation for Interval Censoring and the Deconvolution Problem. Technical Report, Statistics Department, Stanford University, 1991

5 Aragon, J. and Eberly, D. On Convergence of the Convex Minorant Algorithms for Distribution Estimation Under Censored Data. Technical Report, Statistics Department, Stanford University, 1991

6 Longini, I.M., Clark, W.S., Gardner, L.I. and Brundage, J.F. The dynamics of CD4+ T-lymphocyte decline in HIV-infected individuals: a Markov model approach. J. AIDS 4, 1141-1147
