The Quality of Well-Being Scale: critical...

International Journal for Quality in Health Care 1998; Volume 10, Number 6: pp. 509-520

The Quality of Well-Being Scale: critical similarities and differences with SF-36

ROBERT M. KAPLAN, THEODORE G. GANIATS, WILLIAM J. SIEBER AND JOHN P. ANDERSON

Department of Family and Preventative Medicine, University of California, CA, USA

Abstract

Purpose. To summarize the development and application of a generic measure of health-related quality of life known as the Quality of Well-Being Scale (QWB).

Background. The QWB is part of a general health policy model. The measure includes functional components for mobility, physical activity, and social activity. In addition, it includes a comprehensive list of symptoms and problems. QWB scoring allows placement of each individual on a continuum of wellness ranging from 0 (for dead) to 1.0 (for asymptomatic full function). The General Health Policy Model combines this point-in-time measure with information on prognosis and mortality to estimate quality-adjusted life-years (QALYs). Forming a ratio of program costs to QALYs yields estimates of cost per quality-adjusted life-year. Evidence supports the validity of the QWB for a wide variety of applications in population monitoring, descriptive studies of patient populations, and clinical trials. We offer a variety of comparisons between the QWB and the Medical Outcomes Study Short Form 36 (SF-36).

Main findings. In comparison with the SF-36, the QWB provides less information on health profiles, but has the advantage of providing a metric that can be used for cost-utility or cost-effectiveness analysis. Although ceiling effects are common for some SF-36 subscales, perfect scores on the QWB are very rare. The QWB has an approximately normal distribution for populations of adults. Although the QWB is often criticized for not including a mental health component, we present evidence documenting the validity of the QWB for patients with psychological and psychiatric diagnoses.

Conclusions. The QWB and SF-36 are alternative comprehensive measures of health outcomes.

Keywords: outcomes assessment, outcomes research, quality of life, Quality of Well-Being Scale (QWB), SF-36

A variety of different methods are now available to measure health-related quality of life. Many of these measures are specific to particular illnesses or diseases. However, there is a continuing need for general or generic measures that can be used for population monitoring, evaluation research, individual clinical decisions, or as outcome measures in randomized clinical trials. The purposes of this paper are to: (i) review the Quality of Well-Being Scale (QWB) and (ii) compare the QWB with the Medical Outcomes Study 36 Item Short Form (SF-36).

QWB-general health policy model

General background

The General Health Policy Model grew out of substantive theories in economics, psychology, medicine, and public health. The model of health status includes components for mortality (death), morbidity (health-related quality of life) and time. The rationale for the model is that diseases and disabilities are important for two reasons. First, illness may cause the life expectancy to be shortened. Second, illness may make life less desirable at times prior to death (health-related quality of life) [1-5].

Central to the General Health Policy Model is a general conceptualization of quality of life. The QWB is a method of measuring quality of life for calculations in the model. The QWB is a preference-weighted measure combining three scales of functioning with a measure of symptoms and problems to produce a point-in-time expression of well-being that runs from 0 (for death) to 1.0 (for asymptomatic full function) [3]. The model separates aspects of health status and life quality into distinct components. These are life expectancy (mortality), functioning and symptoms (morbidity), preference for observed functional states (utility) and

Presented at Henry Ford Health System, Center for Health System Studies, conference, Using Outcomes Data to Compare Plans, Networks, and Providers, Dearborn, MI, 23 April 1998.
Address correspondence to Robert M. Kaplan, University of California, San Diego, Department of Family and Preventative Medicine, 9500 Gilman Drive, 0628 La Jolla, CA 92093-0628, USA. Tel: +1 619 534 6058. Fax: +1 619 534 7517. E-mail: [email protected]

© 1998 International Society for Quality in Health Care and Oxford University Press


duration of stay in health states (prognosis). In addition to classification into observable levels of function, individuals are also classified by symptoms or problems. Symptoms, such as fatigue or a sore throat, might not be directly observable by others, while problems, such as a missing limb, might be noticeable by others. On any particular day, nearly 80% of the general population is optimally functional. However, over an interval of 8 consecutive days, only 12% experience no symptoms [3]. Symptoms or problems may be severe, such as serious joint pain, or minor, such as taking medication or following a prescribed diet for health reasons.

In order to obtain preference weights for observable health states, peer judges place the observable states of health and functioning onto a preference continuum ranging from 0 for death to 1.0 for asymptomatic full function [4-6]. In addition to the morbidity component, the model requires mortality data, such as from life tables [7], direct measurement [8], or clinical experience. The quality-adjusted life expectancy is the current life expectancy adjusted for diminished quality of life associated with dysfunctional states and the durations of stay in each state. The model quantifies the health activity or treatment program in terms of the quality-adjusted life-years that it produces or saves.

Reliability

Reliability is an estimate of the proportion of variance in a test or measure that is true score. Subtracting this proportion from 1.0 gives an estimate of the proportion of variation assumed to be error. There are several potential sources of error. Two important sources are item sampling and time sampling. One of the basic tenets of psychometric theory is that each item in a test or measure is an unbiased and representative sample from the domain under study. In the construction of an intelligence test, for example, there is an infinite number of items that might represent intellectual ability. The Domain Sampling Model [9] assumes that items are sampled from this domain of performance and that each item is an unbiased estimate of the underlying trait. Reliability is estimated from the inter-item correlations, and the extent to which the items are intercorrelated characterizes the reliability of the measure. Reliable measures are those for which the component items measure the same construct.

Measures of health status may be derived from a very different theoretical model. Items may not be considered random samples from a large domain because each may have a very specific meaning. For example, report of a severe headache offers very specific information. The item is not randomly sampled from all possible symptoms and the meaning of reporting a headache is very different from the meaning of reporting difficulty urinating. Thus, item sampling, as known in psychometric theory, is not necessarily relevant to many health status measures. A question about a headache, as used in clinical medicine, is not randomly selected from a large domain of questions about health.

A second type of reliability in psychometric theory concerns time sampling. Psychological traits are considered to be stable over the course of time. If Sally is intelligent today, we expect her to be equally intelligent 2 weeks from today. Variation in her performance across the 2 weeks might be attributable to measurement error. Thus, test-retest estimates are an important source of information.

This same logic may not apply to health status measures. If Sally is very sick today, we may not expect her to be equally sick in 2 weeks' time. Differences between health scores taken at two points in time may mean that she recovered from her illness or that she got sicker. When the underlying construct is expected to change over time, test-retest evaluations may have very little meaning. As a result, traditional reliability data have less meaning for the QWB.

We are not implying that utility measures are exempt from reliability assessment. Measurement error is assessed in several other ways. For example, we consider reliability assessed over short intervals when large swings in health status are not expected. In addition, we can consider agreement on classification of functioning using different methods.

Considerable evidence shows that the QWB scores assessed on consecutive days are highly reliable. Table 1 summarizes consecutive day QWB correlations for a variety of populations. As the table shows, the measure has good short term stability [10]. A second method for calculating interday reliability estimates the per cent agreement for reports of dysfunction on consecutive days. The agreement percentage (AP) is calculated as:

$$\mathrm{AP} = \frac{\text{no. of agreements}}{\text{no. of agreements} + \text{no. of disagreements}}$$

These values are shown in the righthand column of Table 1. The values were not available for three of the seven populations. The reported values are from Anderson et al. [10].

Table 1 Correlations between QWB estimated on consecutive days in various populations and interday agreement per cent¹

Population                                                    Reliability   Agreement
General population adults (n = 681)                           0.96          0.82
General population children (n = 274)                         0.93          0.87
Adults in California (Indochinese community) (n = 598)        0.94          0.94
Burn patients (n = 143)                                       0.83          0.97
Non-head trauma patients² (n = 1048)                          0.90
Chronic obstructive pulmonary disease patients² (n = 84)      0.98
Diabetes patients² (n = 70)                                   0.96

¹ Data from [10] and JP Anderson, unpublished work.
² Agreement data not available for trauma, COPD and diabetes patients.
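The agreement percentage defined in the text can be sketched in a few lines of Python. This is only an illustration: the function name `agreement_percentage`, the coding of each report as a boolean (True when dysfunction is reported on that day), and the five example respondents are assumptions made here, not part of the published method or data.

```python
from typing import Sequence


def agreement_percentage(day1: Sequence[bool], day2: Sequence[bool]) -> float:
    """AP = agreements / (agreements + disagreements) over paired reports."""
    if len(day1) != len(day2):
        raise ValueError("both days must cover the same respondents")
    if not day1:
        raise ValueError("at least one paired report is required")
    # A pair agrees when the same report is given on both days.
    agreements = sum(a == b for a, b in zip(day1, day2))
    disagreements = len(day1) - agreements
    return agreements / (agreements + disagreements)


# Made-up example (not study data): True = dysfunction reported that day.
day1 = [True, False, False, True, False]
day2 = [True, False, True, True, False]
print(agreement_percentage(day1, day2))  # 4 of 5 pairs agree -> 0.8
```

Because the denominator is simply the number of paired reports, AP ranges from 0 (no pair agrees) to 1 (every pair agrees), which is the scale used in the Agreement column of Table 1.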


[Figure 1 plot: QWB score (y-axis) by age group (x-axis) for sinusitis, diabetes, and emphysema.]

Figure 1 Sinus disease and diabetes in the general population. (The data in the figure are from [7].)

Validity

Validity defines the range of inferences that can be made on the basis of a scale score [3]. Evidence for construct validity of health status measures is often provided by correlations with other measures of the same construct. The QWB has now been used in a wide variety of different studies. The validity evidence will be reviewed briefly and the reader is referred to the specific papers. The QWB has been used in both population and clinical studies. Figure 1 summarizes QWB scores, estimated from the US National Health Interview Survey, in relation to three problems: sinusitis, diabetes, and emphysema. In each of three age groups, sinus disease was shown to be a less serious problem than diabetes which, in turn, had less impact than emphysema [7].

Figure 2 summarizes the relationship between the QWB and measures of cognitive impairment for patients with Alzheimer's disease. Those with disease are significantly lower on the QWB and the degree of illness is systematically related to QWB score [11].

[Figure 2 plot: QWB (y-axis) by group (x-axis): Control, Mild, Moderate, Severe.]

Figure 2 QWB by level of cognitive impairment in Alzheimer's disease. (The data in the figure are from [11].)

A variety of studies have evaluated the relationship between QWB and disease caused by the human immune deficiency virus (HIV). A series of studies was conducted at the University of California, San Diego (UCSD) HIV Neurobehavioral Research Center (HNRC). These studies evaluated the relationship between neurocognitive impairment, disability, and the QWB. Figure 3 summarizes some of the relationships between the QWB and a variety of outcomes. When QWB scores were broken down by HIV grouping, the Centers for Disease Control (CDC) IV group was significantly lower (0.661) than the CDC A/B groups (0.755) and the control groups (0.802) (upper left panel). The difference between class IV and class A/B is about 0.10 units of well-being, suggesting that individuals lose 0.1 equivalents of well years of life for each year they are in the AIDS category in comparison to the asymptomatic groups. In comparison to the uninfected controls, this would equal 1 year of life loss for each 10 infected individuals. The QWB was shown to be significantly associated with CD4+ lymphocytes (P<0.001) (upper middle), β2-microglobulin quartile (P<0.03) (upper right), neurologists' ratings of dysfunction (P<0.001) (lower left), clinician ratings of neuropsychological impairment (P<0.04) (lower center), future vital status (P<0.05) (lower right) and several psychiatric variables including Profile of Mood States (POMS) scores for vigor (P<0.001) and dejection (P<0.001). Multivariate models demonstrated high covariation between predictors of QWB. These results suggest that the QWB is a significant correlate of biological, neuropsychological, neurological, psychiatric, and mortality outcomes for male HIV infected patients [12].

The QWB has also been used in clinical trials and studies to evaluate medical and surgical therapies in conditions such as chronic obstructive pulmonary disease [13], AIDS [14], cystic fibrosis [15], diabetes mellitus [16], atrial fibrillation [17], lung transplantation [18], arthritis [19], cancer [5], depression [20,21] and several other conditions [22]. Further, the method has been used for health resource allocation modeling and has served as the basis for an innovative experiment on rationing of health care by the state of Oregon [23,24]. Studies have also demonstrated that the QWB is responsive to clinical change derived from surgery [25] or medical conditions such as rheumatoid arthritis [26], AIDS [27] and cystic fibrosis [28].

QWB-SA 1.04

One of the major concerns about the QWB is that it must be administered by a trained interviewer. Although we still believe that the interviewer-administered QWB is the optimal way to collect health outcome information, we have recently developed a self-administered form, which is known as the Quality of Well-Being Self-Administered (QWB-SA) version 1.04. The self-administered form can be completed in about 10 minutes and can be machine scored. Initial data on the validity and reliability of the QWB-SA 1.04 have been reported recently [29].
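The well-year reasoning in the HIV comparison above (a difference in mean QWB read as well-years lost per person-year) can be sketched as follows. The helper name `well_years_lost` and its parameters are hypothetical; only the two group means, 0.755 for CDC A/B and 0.661 for CDC IV, come from the text.

```python
def well_years_lost(reference_qwb: float, group_qwb: float,
                    persons: int = 1, years: float = 1.0) -> float:
    """Well-years lost = (difference in mean QWB) x persons x years."""
    return (reference_qwb - group_qwb) * persons * years


# Group means reported in the text.
CDC_AB = 0.755
CDC_IV = 0.661

# One person for one year: about 0.1 well-years.
print(round(well_years_lost(CDC_AB, CDC_IV), 3))              # 0.094

# Ten people for one year: roughly one well-year in total.
print(round(well_years_lost(CDC_AB, CDC_IV, persons=10), 2))  # 0.94
```

The sketch treats the difference in group means as constant over the interval, which is a simplification; the General Health Policy Model also accounts for transitions between states and for mortality.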


[Figure 3 panels: (1a) Mean QWB and 95% confidence interval by group; (1b) Mean QWB by CD4+ lymphocytes (CD4+ cells: <199, 200-500, >500); (1c) QWB by serum beta-2 microglobulin quartile (1 to 4); (1d) QWB by neurologist rating of central impairment (1 to 5); (1e) QWB by neuropsychologist global rating of impairment (None, Mild, > Mild); (1f) Vital status by baseline QWB (Dead, N=46; Alive, N=466).]

Figure 3 Relationship between QWB and CDC group (A), CD4+ cells (B), β2-microglobulin (C), neurologists' ratings of central nervous system dysfunction (D), neuropsychological test summary score of impairment (E), and eventual death within 18 months (F). Error bars show 95% confidence intervals. (The data in the figure are from [14].)

In summary, the general QWB and QWB-SA 1.04 have evidence supporting validity in a variety of different specific diseases. The measures have been shown to be responsive to change and their application has been found to be feasible in population studies and clinical trials.

Comparisons between the QWB and SF-36: theory

The SF-36

The SF-36 has established itself as the most commonly used quality of life measure in the world. The SF-36 grew out of work by the RAND organization and the Medical Outcomes Study (MOS) [30]. The SF-36 includes eight health concepts: physical functioning, role-physical, bodily pain, general health perceptions, vitality, social functioning, role-emotional, and mental health. The measure has been given to literally hundreds of thousands of respondents and has an excellent record of reliability and validity [31]. In the following sections we describe some of the similarities between the QWB and SF-36. Then, we will pinpoint some of the differences between these methods.

Common history

Although not commonly recognized, the QWB and SF-36 have some common roots. Both approaches are outgrowths of early attempts to develop health status indexes in the early 1970s. The QWB evolved from a health status index originally developed by J.W. Bush and his associates at UCSD. In order to develop a population health status index, Bush et al. developed scales for mobility, physical activity and social activity [3]. These scales were based on extensive reviews of a wide variety of questionnaires used for government surveys and epidemiologic monitoring studies.

At about the same time RAND was funded to conduct the Health Insurance Experiment. The San Diego group provided their health status index for use in the study. The RAND group found the measure cumbersome, but incorporated the basic scales of mobility, physical activity and social activity into their survey instrument. They divided the social activity scale into two components of social contacts and self-care. The San Diego group had always included symptoms and problems in addition to observable function. The RAND group decided not to include specific symptom components. Further, the RAND group emphasized the use of the measures as profiles while the San Diego group evolved toward the creation of a single score. Whereas the SF-36 development excluded preference weighted scoring, QWB single-score development allowed for QWB use as a profile.

As part of the Health Insurance Experiment, the RAND group completed detailed psychometric studies of the function status questionnaire along with a variety of other measures. The next step in this programmatic research endeavor was the large scale MOS. This study continued to refine measures originally developed for the Health Insurance Experiment. Ultimately, this resulted in the current SF-36.


[Figure 4 plot: score (y-axis) by SF-36 scale (x-axis: PF, RP, BP, GH, VT, SF, RE, MH) for the control and treatment groups.]

Figure 4 Comparison of SF-36 profiles in hypothetical treatment of headache. PF, physical functioning; RP, role-physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role-emotional; MH, mental health.

Although the SF-36 has evolved considerably from the original function status measures, it is also important to emphasize that many of the concepts remain similar.

The choice between measures such as the QWB and the SF-36 is difficult. In the following sections we discuss several issues that may guide these decisions. We begin with a more theoretical discussion of the differences between profile and utility based measures. Then, we critique methods for comparing measures and offer some data on correlations between measures. Finally, we address the need for separate physical and mental health measures.

Theory: profile versus utility scoring

A variety of generic measures assess the dimension of life quality [32]. These typically include physical functioning, emotional functioning, and some symptomatic complaints. The specific dimensions vary from measure to measure and there is substantial debate about which dimensions should be included [33].

One of the most important distinctions among measures is in how data are scaled and reported. Profile approaches report a series of scores and characterize individual respondents or groups as profiles. The focus of attention is on the components of health rather than on an overall summary. An example of one profile is shown in Figure 4. The figure shows two hypothetical profiles corresponding to treatments for headache. In this hypothetical clinical trial, patients with recurrent headaches were randomly assigned to a treatment or control group. As the figure shows, those who received the treatment scored higher on measures of role-physical, bodily pain, and physical functioning. However, they scored lower on general health perceptions, vitality, and mental health. The drug, in this case, was successful in relieving headaches but produced significant drowsiness. Ultimately, clinicians must offer some general interpretation of these profiles by applying a weighting system. They must decide if they are more concerned about physical role function or about vitality in general health perceptions. Judgment about the relative importance of various dimensions is common and typically is done implicitly, arbitrarily, and in an idiosyncratic way. Thus, physicians or patients may idiosyncratically ignore a particular test result or a particular symptom because another one is more important to them. However, the process by which relative importance is evaluated can be studied explicitly and measures of perceived relative importance can become part of the measure.

These problems in interpreting profiles for clinical decision making explain why it has been difficult to use profiles for cost-effectiveness and cost-utility analysis. Comparison of different options for the use of common resources requires overall quantification of health outcomes using a common measurement unit. To a large extent, this integrating preference or utility function is the most important feature of cost-utility analysis. Despite many attempts, popular outcome measures such as the Sickness Impact Profile (SIP), SF-36, the McMaster Health Index Questionnaire, and the Nottingham


Health Profile (NHP) have not been useful for cost-effectiveness analysis because they have multiple outcome dimensions. Some profiles, such as the SIP and the NHP, yield a single summary score, but are not scaled on a 0 to 1.0 continuum with a clear reference to death. Thus, they are not useful for cost-effectiveness or cost-utility studies.

In addition, it is sometimes difficult to capture the total clinical picture using profile measures because most treatments have side-effects as well as benefits. A successful surgery for prostate cancer, for example, might be associated with impotence and incontinence. The major challenges are in determining what it means when someone experiences a side-effect and whether the benefits justify these side-effects. How do we determine whether or not observable side-effects are important? Should the patient who feels sleepy discontinue his or her medication? Should a patient with insulin dependent diabetes mellitus discontinue therapy because he or she develops skin problems at the injection sites? Skin problems are a nuisance, but without treatment the patient would die. Often the issue is not whether treatment causes side-effects, but how we should place these side-effects within the perspective of total health. Ultimately, we must decide whether treatment produces a net benefit or a net deficit in health status.

Often determining net benefit requires difficult trade offs. A treatment may extend life expectancy by treating a pathophysiologic abnormality. However, the treatment may reduce quality of life or life satisfaction. Preferences and utility measures attempt to quantify quality of life in relation to life expectancy. These methods can be used to estimate net benefit. The assessment of net benefit requires summary measures that scale relative desirability of various health outcomes. The scaling represents utility or preferences for components of health.

Measuring preference evokes many technical and methodological challenges [6,34]. Different methods of preference measurement can yield different results [35], a finding that should not be surprising because the various approaches to preference assessment are based on different underlying conceptual models and the methods ask different questions. Decision theory based methods are similar in that they place wellness on a continuum between 0.0 and 1.0.

Decision theory methods are refinements of generic survival analysis. In traditional survival analysis, those who are alive are statistically coded as 1.0 while those who are dead are statistically coded as 0.0. Mortality can result from any disease and survival analysis allows the comparison between different diseases. For example, we can state the life expectancy for those who will eventually die of heart disease and compare it to the life expectancy of those who eventually die of cancer. Thus, there is an advantage over disease specific measures such as heart ejection fractions and tumor size. The difficulty is that everyone who remains alive is given the same score. A person confined to bed with an irreversible coma is alive and is counted the same as someone who is actively participating in athletics. Utility assessment, on the other hand, allows the quantification of levels of wellness on the continuum anchored by death and optimum function.

[Figure 5 plot: QWB subscale adjustment (y-axis, 0 to 0.06) for the Mobility, Physical, and Social subscales (x-axis).]

Figure 5 QWB subscale adjustments in late life psychosis patients of different levels of severity. (The data in the figure are from [37].)

Measured utilities are needed to calculate quality-adjusted life-years (QALYs). These methods are required in order to perform cost-utility analysis [36]. QALYs integrate mortality and morbidity to express health status in terms of equivalents of well-years of life. Suppose a woman who has a life expectancy of 75 years dies of breast cancer at age 50; the disease was associated with 25 lost life-years. If 100 women died at age 50 (and also had a life expectancy of 75 years), 2500 (100 x 25 years) life-years would be lost.

Death is not the only outcome of concern in cancer. Many adults suffer from the disease, leaving them somewhat disabled over long periods of time. Although they are still alive, the quality of their lives has diminished. QALYs take into consideration the quality of life consequences of these illnesses. For example, a disease that reduces quality of life by one half will take away 0.5 QALYs over the course of 1 year. If it affects two people, it will take away 1 year (2 x 0.5) over a 1-year period. A pharmaceutical treatment that improves quality of life by 0.2 for each of five individuals will result in the equivalent of one QALY if the benefit is maintained over a 1-year period. This system has the advantage of considering both benefits and side-effects of programs in terms of the common QALY units. Although QALYs are typically assessed for patients, they can also be measured for others, including care givers who are placed at risk because they experience excess stress and burden.

One of the important contrasts between the QWB and the SF-36 is that the QWB is primarily a preference-based measure designed to produce estimates of QALYs. The SF-36 is primarily a profile measure. However, summary scores are available for the SF-36 mental and physical health components. Similarly, profile information can be obtained from the QWB and QWB-SA. In fact, some investigators prefer to report profile information along with the overall score. For example, Patterson et al. [37] evaluated QWB outcomes for patients with late onset schizophrenia. The study is interesting because it is always assumed that the variation in outcomes for patients with mental illness will be reflected by mental health items. Figure 5 summarizes some profile information from the Patterson study. It compares patients who are normal, or have low, moderate, or high degrees of dysfunction associated with schizophrenia (evaluated by formal psychiatric interviews) in relation to QWB scales for

514

Page 7: The Quality of Well-Being Scale: critical 36rmkaplan.bol.ucla.edu/Robert_M._Kaplan/1998_Publications_files/0308.pdf · (i) review the Quality of Well-Being Scale (QWB) and (ii) Central

QWB versus SF-36

Table 2 Principal concepts and domains of health-related quality of life contained in general preference weighted instruments

for assessing quality-adjusted life years

Instrument

Health Utility Years of

Disability Index Healthy Quality of Quality, ofDistress ................................ Life Well-being Life and

Concept Index EuroQol 15D Mark I Mark II Mark III HP2K scale Health

Healthperceptions ** ** **

Social function

Socialrelations ** ** **

Usualsocialrole ** ** **

Intimacy/sexual function

Communication/speech ** ** **

Psychological functionCognitivefunction ** ** **Emotionalfunction ** ** ** ** **

Mood/feelings **

Physical function

Mobility ** ** ** ** ** ** **

Physicalactivity ** ** ** ** ** **Self-care ** ** ** ** **

ImpairmentSensory function/loss ** ** **

Symptoms/impairments ** ** ** ** ** ** **

From Gold MR, Siegel JE, Russel LB, Weinstein MC. (eds). Cost-effectivenessin Health and Medicine.New York: Oxford Universit3, Press,1996.

mobility, physical activity, and social activity. The scaling was related quality of life. Although there has been considerablethe adjustment (from 1.0) in the QWB scale. Thus, higher interest in measuring the cost-effectiveness of treatments,

scores in the figure represent more dysfunction. As the figure little is known about the validity of general outcome measuresdemonstrates, those with more severe schizophrenia have and it is often difficult to choose between different ap-

higher adjustments on the QWB subscales. The figure is proaches. Some authors have attempted to simplify the taskinteresting because it demonstrates that QWB profile in- by offering summary tables. An example of one such summary

formation can be reported if the investigator desires. Further, is shown as Table 2.the figure demonstrates that traditional functional scales Despite the attractiveness of this approach, there are also

capture variability associated with mental illness. We will some difficulties. In particular, creators of the tables typicallyreview the issue of mental heatlh in more detail in the last examine the names of subscales, rather than the content of

section of the paper, the measures. Consider the example of sensory function orloss. According to Table 2, sensory functioning is notincludedin the Q\VB. The creators of the table came to this conclusion

because there is no subscale on the Q_rB named sensory

Comparisons between the QWB and function. However, the QWB-SA includes symptoms for loss

SF-36: studies of vision, loss of hearing, impairment of vision (includingwearing glasses or contact lenses), problems with taste and

Content and face validity smell, and so on. In fact the newer self-administered Q'_,q3(Q_rB-SA) devotes a major portion of the questionnaire to

In 1993, the US Department of Health and Human Services items on sensation and sensory organs. The symptoms include

appointed a multi-disiplinary group of methodologists to any hearing loss, blindness in one eye, blindness in both eyes,recommend standardized strategies for the evaluation of any problems with vision (floaters, double or distorted vision),health care. The panel, which released its report in 1996, eye pain, sensitivity to light, ear aches, difficulty in balance,

suggested that standardized outcomes analyses be conducted and a variety of others. Indeed, the QWB-SA includes muchto evaluate the cost-effectiveness of medical care [36]. These more content on sensory functioning than do measures that

analyses require preference weighted measures of health are identified as including content on sensory functioning.
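The difference between classifying an instrument by its subscale names and by its item content can be made concrete with a small sketch. The subscale names below follow the description of the QWB in this paper, and the item list contains only the sensory symptoms quoted above, not the full questionnaire. The keyword list and both function names are hypothetical and serve only to illustrate the argument.

```python
# Simplified, partial description of the QWB-SA for illustration only.
# "items" holds just the sensory symptoms quoted in the text above.
QWB_SA = {
    "subscales": [
        "mobility",
        "physical activity",
        "social activity",
        "symptoms and problems",
    ],
    "items": [
        "any hearing loss",
        "blindness in one eye",
        "blindness in both eyes",
        "any problems with vision (floaters, double or distorted vision)",
        "eye pain",
        "sensitivity to light",
        "ear aches",
        "difficulty in balance",
        "problems with taste and smell",
    ],
}

# Hypothetical keyword list, chosen by hand for this example.
SENSORY_KEYWORDS = (
    "hearing", "blindness", "vision", "eye", "light",
    "ear", "balance", "taste", "smell",
)


def covered_by_subscale_name(instrument, concept):
    """Name-based check: is the concept mentioned in any subscale name?"""
    return any(concept in name for name in instrument["subscales"])


def items_covering(instrument, keywords):
    """Content-based check: list the items that mention any keyword."""
    return [
        item for item in instrument["items"]
        if any(keyword in item for keyword in keywords)
    ]


name_based = covered_by_subscale_name(QWB_SA, "sensory")
content_based = items_covering(QWB_SA, SENSORY_KEYWORDS)
print("Sensory function found by subscale name:", name_based)
print("Items with sensory content:", len(content_based))
```

Under these assumptions the name-based check reports no sensory coverage, while the content-based check finds every quoted symptom, which mirrors the discrepancy between Table 2 and the questionnaire itself.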


Table 3 Correlations between QWB and MOS components in four studies

SF-36                        Fryback   Anderson   Hughes   Andresen
Physical function            0.690     0.533      0.62     0.511
Role-physical                0.566     0.462      0.64     0.281
Bodily pain                  0.560     0.173      0.42     0.410
General health perception    0.522     0.525      0.37     0.492
Vitality                     0.505     0.481      0.47     0.468
Social function              0.397     0.474      0.38     0.277
Role-emotional               0.175     0.283      0.24     0.167
Mental health                0.301     0.272      0.23     0.383

All studies used the SF-36 except Hughes et al. [40] who used a modification of the SF-36 known as the MOS-HIV-34.

Figure 6 Mean SF-36 scores versus age-adjusted national average scores: SF-36 scale scores (PF, RP, BP, GH, MH, RE, VT, SF) for patients with atrial fibrillation and for the MOS normative sample. (The data in the figure are from [42].)
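The pattern in Table 3 is easier to see when each row is summarized across the four studies. The sketch below copies the correlations from Table 3 and reports the range and the unweighted mean for each SF-36 component. The unweighted mean is only a descriptive convenience introduced here; it is not a pooled estimate and is not part of the original analyses. The function names are illustrative.

```python
# Correlations between the QWB total score and SF-36 components (Table 3).
STUDIES = ("Fryback", "Anderson", "Hughes", "Andresen")
TABLE_3 = {
    "Physical function": (0.690, 0.533, 0.62, 0.511),
    "Role-physical": (0.566, 0.462, 0.64, 0.281),
    "Bodily pain": (0.560, 0.173, 0.42, 0.410),
    "General health perception": (0.522, 0.525, 0.37, 0.492),
    "Vitality": (0.505, 0.481, 0.47, 0.468),
    "Social function": (0.397, 0.474, 0.38, 0.277),
    "Role-emotional": (0.175, 0.283, 0.24, 0.167),
    "Mental health": (0.301, 0.272, 0.23, 0.383),
}


def summarize(table):
    """Return one summary row per component, sorted by descending mean."""
    rows = []
    for component, correlations in table.items():
        rows.append({
            "component": component,
            "min": min(correlations),
            "max": max(correlations),
            "mean": sum(correlations) / len(correlations),
        })
    return sorted(rows, key=lambda row: row["mean"], reverse=True)


def lowest_study(table, component):
    """Name the study with the lowest correlation for a component."""
    correlations = table[component]
    return STUDIES[correlations.index(min(correlations))]


summary = summarize(TABLE_3)
for row in summary:
    print(f"{row['component']:<27} {row['min']:.3f}-{row['max']:.3f} "
          f"(mean {row['mean']:.3f})")
print("Lowest bodily pain correlation:", lowest_study(TABLE_3, "Bodily pain"))
```

The same dictionary can be reused to check any statement made about the table, for example which study produced the smallest correlation for a given component.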

Correlations between the QWB and SF-36

A growing number of studies have applied both the SF-36 and the QWB. Table 3 summarizes correlations between the QWB and SF-36 components in four recent studies. The first study is based on a population sample in Beaver Dam, Wisconsin involving 1356 adults (560 men and 796 women). The mean age of the sample was 64.1 years [38]. In this sample, all of the participants lived in community settings. The second study involved 201 adults with serious illnesses, including 99 with AIDS, 74 with cancer, and 28 with other terminal illnesses [39]. The AIDS patients were younger (mean = 38 years) while those with cancer were older (mean age = 61 years). Those with other illnesses were also older (mean = 65 years). The third sample was a group of 100 HIV infected men (mean age 35.3 years). These men completed the MOS-HIV-34, which is a 34-item adaptation of the SF-36 for people with HIV disease [40]. The fourth sample included 301 community dwelling older adults (124 men and 177 women). All participants were older than 65 years and 44% of the men and 53% of the women were older than 75 years [41].

In all studies correlations between the QWB total score and the SF-36 physical function scale are substantial. Similarly, correlations between the QWB scale and the role-emotional scale are low in all four studies. As the table shows, correlations between the QWB and SF-36 components are quite consistent across these very different patient populations. The only exception is the low correlation between total QWB and bodily pain in the terminally ill sample. Overall, the available evidence suggests that the SF-36 and QWB tap most of the same variation in health status.

Ceiling effects

Measures differ in the extent to which they are responsive to minor variations in wellness. Some of our early studies demonstrated that nearly 80% of the general population has no functional limitations on a particular day [9]. Thus, functional items on surveys such as the National Health Interview Survey show the great majority of people to be well. On the other hand, only about 12% of the general population experience no symptom or problem during an average week [2]. Fryback et al. [38] studied the distributional properties of the QWB and SF-36 scales in Beaver Dam, Wisconsin. They found the QWB to be approximately normally distributed with scores of 1.0 being exceptionally rare. This finding has also been reported by our group [3] and others [41].

Ganiats et al. [42] have evaluated the QWB, SF-36, and other measures in clinical trials for patients with atrial fibrillation. Confirming other studies, they found 0% of patients at the ceiling level for the QWB. In contrast, scores on many SF-36 scales were at their ceiling level. In other words, improvements on these dimensions would be difficult to detect because the highest values had already been recorded. For example, nearly 70% of respondents obtained the highest score for role-emotional and 54% obtained the highest score for social functioning. This was a concern because these patients were afflicted with a serious chronic illness. However, the SF-36 scores for this elderly population were comparable to those from the MOS normative sample (Figure 6).

One of the issues in the application of QWB, SF-36, and other measures is the percentage of respondents who have incomplete questionnaires. In the atrial fibrillation studies, all measures were administered to at least 300 patients. The percentage of patients who had any data missing for each measure was recorded. Because it was interviewer administered, no patient had incomplete survey responses for the QWB. The EQ-5D had 6% incomplete survey responses. The Health Utility Index (HUI) had 11% and the SF-36 had 26%. The new self-administered QWB also has had some missing data problems similar to those of other measures. In the recent study by Andresen et al. [41], missing data for the QWB-SA were common: about 3% of the respondents failed to fill out items on chronic diseases. Overall, the missing


data rate was about the same as it is for the SF-36. However, a scoring routine has been worked out for the SF-36 that allows the measures to be scored with up to 50% of the items missing. We are currently working on a missing data scoring protocol for the QWB-SA.

Mental health

Despite widespread interest in the concept of a QALY among practitioners in many different specialties, these ideas have received very little attention in the mental health fields. This reflects the widespread belief that mental health and physical health outcomes are conceptually distinct.

Although many questionnaires include different dimensions, they still may be tapping the same constructs. For example, a measure without a mental health component does not necessarily neglect mental health. Mental health symptoms may be included and the impact of mental health, cognitive functioning, or mental retardation may be represented in questions about role functioning. Some measures have multiple dimensions for mental health symptoms while others include fewer items that ask about problems in general. It is not clear that multiple measures are more capable of detecting clinical differences. This remains an empirical question for systematic analysis. A common strategy is to report outcomes along multiple dimensions.

Several years ago Kaplan and Anderson [43] argued that there are many similarities in mental health and physical health outcomes. The preference- and utility-based measures that are assumed to ignore mental health include the basic dimensions of observable functioning, symptoms, and duration. Mental health problems, like physical health problems, can be represented by symptoms and by disrupted role functioning. Consider some examples. Suppose that a patient has the primary symptom of a cough. If the cough does not disrupt role function, the preference or utility weighted score might show a small deviation from 1.0. If the cough is more serious and keeps the person at home, the score will be lower. If the cough is very severe, it might limit the person to a hospital and may have serious disruptive effects upon role functioning. This would necessitate an even lower score. Coughs can be of different duration. A cough associated with an acute respiratory infection may have a serious impact on functioning that may last only a short period of time. This would be indicated by a minor deviation in QALYs. A chronic cough associated with obstructive lung disease would be associated with significant loss of QALYs because duration is a major component of the calculation.

Now consider the case of a person with depression. Depression may be a symptom reported by a patient just as a cough is reported by other patients. Depression without disruption of role function would cause a minor variation of wellness. If the depression caused the person to stay at home the preference or utility weighted score would be lower. Severe depression might require the person to be in a hospital or special facility and would result in a lower score. Depressions, like coughs, are of different durations. Depression of long duration would cause the loss of more QALYs than would depression of short duration.

It is commonly asserted that the QWB excludes mental health content. Existing empirical evidence supports the validity of the QWB in studies of patients known to have impaired mental health. One study evaluated the validity of the QWB as an outcome measure for older psychotic patients. Seventy-two psychotic patients and 28 matched controls from the San Diego Veterans Affairs Medical Center completed the QWB; the Structured Clinical Interview for the DSM-III-R patient version, Scales for the Assessment of Positive and Negative Symptoms (SAPS and SANS), and the Global Severity Index (GSI) from the Brief Symptom Inventory were administered to all subjects. The QWB was significantly correlated with the SANS (r = -0.52, P < 0.001), the SAPS (r = -0.57, P < 0.001) and the GSI (r = -0.62, P < 0.001). There was a strong and significant linear relationship between QWB and severity of illness (as classified by the SANS and the SAPS). In addition, component scores of the QWB (i.e. mobility, physical activity, social activity, and worst symptom) were significantly lower among patients as compared to controls, and declined systematically as psychiatric symptoms increased [37]. In a related study, Rapaport et al. [44] found that patients with schizophrenia had QWB scores that were significantly lower (0.6) than age matched non-schizophrenia controls (0.7). QWB differences between these two groups were comparable in magnitude to a wide array of psychiatry specific measures.

Several other studies have now documented the sensitivity of the QWB to mental health problems. For example, a variety of studies has demonstrated the validity of the QWB for assessing depression among patients with HIV disease. In one study, ratings of depression using the Hamilton Depression Scale (HAM-D) were obtained from 285 HIV patients and 84 HIV negative men participating in the HNRC cohort. The data were obtained at baseline and 6 months later. Depression was defined as Hamilton scores greater than 10. The study demonstrated a systematic relationship between Hamilton scores and QWB scores at baseline [45]. In addition, 22 HIV-positive subjects experienced increases of 10 points on the Hamilton Scale between the first evaluation and that at 6 months. For these individuals, significant reductions in QWB scores were observed. Analyses of QWB symptoms suggested greater symptom severity among those whose Hamilton scores increased. In addition, there was a greater reduction in physical activity. In other words, mood affected both symptoms and physical function.

Pyne et al. [20,21] compared QWB scores between patients with major depression and controls. The depressed patients were divided into mild, moderate and severe groups according to their scores on the HAM-D. There was a step-wise decrease in QWB scores for the controls through each of the patient groups. Similar results were observed for Beck Depression Inventory (BDI) scores. The difference between each of these groups is highly significant (P < 0.001). In order to evaluate which of six variables (age, sex, family history, presence of Axis III diagnosis, comorbid Axis I diagnosis, HAM-D) best predicted the QWB score in this patient population, a hierarchical multiple regression analysis was


conducted. Although Axis III disorders are significant predictors of QWB, depression (HAM-D and BDI) predicts QWB when Axis III diagnoses are controlled statistically.

The Beaver Dam study offers some evidence that the QWB is sensitive to mental health disorders in the community. The study offers QWB scores for individuals with a wide variety of conditions. Respondents with self-reported depression had the third lowest mean QWB score among all conditions [46].

QALYs can be used to compare treatments for physical and mental health problems. Suppose, for example, that a treatment for anxiety elevates patients from a level of 0.65 to a level of 0.75. Suppose further that this treatment benefit lasted for 1 year. Each patient would gain 0.10 QALY ((0.75 - 0.65) x 1 year = 0.10 QALY) for each year the benefit was observed. The treatment benefit would be expressed in terms of general QALY units. The productivity of the providers could be compared with providers in other areas of health care. All providers in health care use resources. Dividing the cost of a treatment by the QALY productivity provides the cost:utility ratio. Measuring productivity of mental health-related treatments in QALY units would allow the investments in mental health services to be compared directly to those in other aspects of health care.

One of the challenges to single-score measures is the argument that physical and mental health are different dimensions and that any measure combining them into a single index is like combining apples and oranges (CE Schwartz, RM Kaplan, JP Anderson, T Holbrook and MW Genderson, unpublished work). Clearly, mental and physical health services are different. Yet, providers compete for the same health care resources and there must be some basis for deciding how to allocate these resources.

We have conducted several studies designed to determine if mental and physical health clearly separate into well-defined dimensions. Ware et al. [31] have clearly shown that factor analyses of the SF-36 suggest separate physical and mental health dimensions. In a series of studies, we factor analyzed the QWB using groups of patients with multiple sclerosis (n = 263), non-insulin dependent diabetes mellitus (n = 420), non-head traumatic injury (n = 852), AIDS (n = 99), and cancer (n = 74). Each QWB item was treated as dichotomous and the matrix of 0s and 1s was subjected to factor analysis with orthogonal rotation. Items with factor loadings > 0.40 were regarded as consistent with a particular factor.

The analysis consistently revealed two strong factors. The first factor combined physical and psychological symptoms. The same factor emerged across these diverse patient populations. The items that consistently load on this factor include upset stomach, general tiredness, feeling upset, depressed or crying, dizziness, trouble sleeping, and excessive worry. In contrast to the symptom factor, the second dimension was defined by function. The second factor includes items such as spending the day in a wheelchair, difficulty in walking, and limitations in social activity (CE Schwartz, RM Kaplan, JP Anderson, T Holbrook and MW Genderson, unpublished work).

In three of these four populations, SF-36 data were also available. Using the same strategy, we have replicated factor analyses showing the separation of mental and physical health for the SF-36. How might we explain why the QWB factor analysis mixes mental and physical health symptoms, whereas the SF-36 factor analysis separates them? One explanation is the difference between symptoms and functioning. In both factor analyses, functioning items emerge as a separate factor. The mental health items on the SF-36 use a different format and place greater emphasis on symptomatic activity. The SF-36 also separates instructions for role-emotional and physical functioning scales. Respondents are instructed to consider disruptions in activities that result from physical or mental health problems. Thus, separation of physical and emotional components in factor analysis might be expected because respondents had been prompted to think differently about them. It is worth noting that the separate dimensions of physical and mental health reported by Ware et al. [31] derive in part from their use of orthogonal rotation techniques. Methods that accommodate correlated factors (i.e. oblique rotation) have been shown to offer a better fit for health data [47].

Others have noted the complications resulting from the separation of physical and mental health. For example, Simon et al. evaluated 536 primary care patients before and after treatment with antidepressant medications. The mental health treatment was associated with improvements in both mental and physical health subscales of the SF-36 (physical function, role-physical, bodily pain, and general health perceptions). However, because of an artifact in the scoring system, the physical health summary score was unchanged. The study is important for two reasons: it demonstrates the complications in interpreting SF-36 summary scores; and it demonstrates that mental health treatments may have significant effects on measures believed to represent physical health [48]. We believe this underscores the fuzzy boundary between physical and mental health.

Summary

Outcomes researchers now have a variety of validated approaches to assess the costs, risks and benefits in medical care. The QWB and the SF-36 are two methods that arise from similar traditions. Many components of the measures are similar and correlations between the measures are substantial. Until recently, the SF-36 was much easier and less expensive to administer. However, an inexpensive self-administered form of the QWB is now available.

There are also important distinctions between the QWB and SF-36. The QWB places greater emphasis on symptoms and provides more clinical information. For example, it offers the clinician symptom reports similar to a review of systems. Further, the QWB can be used for policy analysis because outcomes can be translated into QALYs. Although it is possible to create profiles from the QWB, it does less well than the SF-36 for characterizing multi-dimensional patterns of outcome. The QWB and SF-36 represent different


measurement approaches. The SF-36 is rooted in psychometric theory. The QWB, although adhering to some principles of psychometric theory, arises from a decision theory tradition. Because of the way it is constructed, some aspects of psychometric theory do not apply directly to the QWB. For example, traditional test-retest reliability has little value for assessing either the QWB or the SF-36. The QWB is better suited to policy analysis and to economic studies that require the calculation of a QALY. Investigators interested in cost-effectiveness or cost-utility analysis should consider using the QWB or a related utility-based measure. Investigators interested in reviewing a profile of outcomes may do better to use the well-established and well-validated SF-36.

We believe that better measurement technologies will replace many current approaches to health status assessment. Continuing research is necessary to build the next generation of outcome measures.

Acknowledgements

This work was supported in part by grants R01 HS 09170 and 5RO1NS33519 from the Agency for Health Care Policy and Research and grant P60 AR 40770 from the National Institutes of Health.

References

1. Kaplan RM. Decisions about prostate cancer screening in managed care. Curr Opin Oncol 1997; 9: 480-486.

2. Kaplan RM, Anderson JP. A general health policy model: update and applications. Health Serv Res 1988; 23: 203-234.

3. Kaplan RM, Bush JW, Berry CC. Health status index: category rating versus magnitude estimation for measuring levels of well-being. Med Care 1979; 17: 501-525.

4. Kaplan RM. Human preference measurement for health decisions and the evaluation of long-term care. In Kane RL, Kane RA, eds. Values and Long-Term Care. Lexington, MA: Lexington Books, 1982: pp. 157-188.

5. Kaplan RM. Quality of life assessment for cost/utility studies in cancer. Cancer Treat Rev 1993; 19: S85-S96.

6. Kaplan RM. Profile versus utility based measures of outcome for clinical trials. In Staquet MJ, Hays RD, Fayers PM, eds. Quality of Life Assessment in Clinical Trials. London: Oxford University Press, 1998: pp. 69-90.

7. Erickson P, Kendall EA, Anderson J, Kaplan RM. Using composite health status measures to assess the nation's health. Med Care 1989; 27: S66-S76.

8. Anderson JP, Kaplan RM, Coons SJ, Schneiderman LJ. Comparison of the Quality of Well-being Scale and the SF-36 results in two studies of ill adults: AIDS and cancer. J Clin Epidemiol 1998; 51: 755-762.

9. Nunnally JC, Bernstein IH. Psychometric Theory, 3rd edn. New York, NY: McGraw Hill, 1994.

10. Anderson JP, Kaplan RM, Berry CC et al. Interday reliability of function assessment for a health status measure. The Quality of Well-Being scale. Med Care 1989; 27: 1076-1083.

11. Kerner DN, Patterson TL, Grant I, Kaplan RM. Validity of the Quality of Well-being scale for patients with Alzheimer's disease. J Aging Health 1998; 10: 44-61.

12. Kaplan RM, Anderson JP, Patterson TL et al. Validity of the Quality of Well-being Scale for persons with HIV infection. Psychosom Med 1995; 57: 138-147.

13. Kaplan RM, Atkins CJ, Timms R. Validity of a quality of well-being scale as an outcome measure in chronic obstructive pulmonary disease. J Chron Dis 1984; 37: 85-95.

14. Kaplan RM, Patterson TL, Kerner DN et al. The Quality of Well-being Scale in asymptomatic HIV infected patients. Qual Life Res 1997; 6: 507-514.

15. Orenstein DM, Nixon PA, Ross EA, Kaplan RM. The quality of well-being in cystic fibrosis. Chest 1989; 95: 344-347.

16. Kaplan RM, Hartwell SL, Wilson DK, Wallace JP. Effects of diet and exercise interventions on control and quality of life in non-insulin-dependent diabetes mellitus. J Gen Intern Med 1987; 2: 220-228.

17. Ganiats TG, Palinkas LA, Kaplan RM. Comparison of Quality of Well-Being scale and Functional Status Index in patients with atrial fibrillation. Med Care 1992; 30: 958-964.

18. Squier H, Ries AL, Kaplan RM et al. Quality of well-being predicts survival in lung transplantation candidates. Am J Resp Crit Care Med 1995; 152: 2032-2036.

19. Kaplan RM, Kozin F, Anderson JP. Measuring quality of life in arthritis patients (including discussion of a general health-decision model). Qual Life Cardiovasc Care 1988; 4: 131-139.

20. Pyne JM, Patterson TL, Kaplan RM et al. Quality of life assessment for patients with major depression. Psychiatr Serv 1997; 48: 224-230.

21. Pyne JM, Patterson TL, Kaplan RM et al. Preliminary longitudinal assessment of quality of life in patients with major depression. Psychopharmacol Bull 1997; 33: 23-29.

22. Kaplan RM. Application of a general health policy model in the American health care crisis. J R Soc Med 1993; 86: 277-281.

23. Kaplan RM. Hippocratic Predicament: Affordability, Access, and Accountability in Health Care. San Diego, CA: Academic Press, 1993.

24. Kaplan RM. Value judgment in the Oregon Medicaid experiment. Med Care 1994; 32: 975-88.

25. Hodgkin PS. Health impact of endoscopic sinus surgery assessed by the Quality of Well-being (QWB) Scale. Unpublished paper. San Diego, CA: University of California, 1994.

26. Bombardier C, Ware J, Russell IJ et al. Auranofin therapy and quality of life in patients with rheumatoid arthritis. Results of a multicenter trial. Am J Med 1986; 81: 565-578.

27. Kaplan RM, Anderson JP, Wu AW et al. The Quality of Well-being Scale. Applications in AIDS, cystic fibrosis, and arthritis. Med Care 1989; 27: S27-S43.

28. Orenstein DM, Kaplan RM. Measuring the quality of well-being in cystic fibrosis and lung transplantation. The importance of the area under the curve. Chest 1991; 100: 1016-1018.

29. Kaplan RM, Sieber WJ, Ganiats TG. The Quality of Well-being Scale: comparison of the interviewer-administered version with a self-administered questionnaire. Psychol Health 1997; 12: 783-791.

30. Ware Jr JE, Phillips J, Yody BB, Adamczyk J. Assessment tools: functional health status and patient satisfaction. Am J Med Qual 1996; 11: S50-S53.

31. Ware Jr JE, Kosinski M, Bayliss MS et al. Comparison of methods for the scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care 33: AS264-AS279.

32. Spilker B (ed). Quality of Life and Pharmacoeconomics in Clinical Trials. New York, NY: Raven, 1996: pp. 309-322.

33. Kaplan RM, Anderson JP. The general health policy model: an integrated approach. In Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. New York, NY: Raven, 1996: pp. 309-322.

34. Coons SJ, Kaplan RM. Cost-utility analysis. In Bootman JL, Townsend RJ, McGhan WF, eds. Principles of Pharmacoeconomics, 2nd edn. Cincinnati, OH: Harvey Whitney Books, 1996: pp. 102-126.

35. Revicki DA, Kaplan RM. Relationship between psychometric and utility-based approaches to the measurement of health-related quality of life. Qual Life Res 1993; 2: 477-487.

36. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-effectiveness in Health and Medicine. New York: Oxford University Press, 1996.

37. Patterson TL, Kaplan RM, Grant I et al. Quality of well-being in late-life psychosis. Psychiatry Res 1996; 63: 169-181.

38. Fryback DG, Lawrence WF, Martin PA et al. Predicting Quality of Well-being scores from the SF-36: results from the Beaver Dam Health Outcomes Study. Med Decis Making 1997; 17: 1-9.

39. Anderson JP, Kaplan RM, Schneiderman LJ. Effects of offering advance directives on quality adjusted life expectancy and psychological well-being among ill adults. J Clin Epidemiol 1994; 47: 761-772.

40. Hughes TE, Kaplan RM, Coons SJ et al. Construct validities of the quality of well-being scale and the MOS-HIV-34 health survey for HIV-infected patients. Med Decis Making 1997; 17: 439-446.

41. Andresen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed version of the Quality of Well-being (QWB-SA) Questionnaire among older adults. Med Care 1998; 36: 1349-1360.

42. Ganiats TG, Sieber WJ, Barber E, Barrett-Connor E. Initial comparison of four generic quality of life instruments. Qual Life Res 1997; 6: 648.

43. Kaplan RM, Anderson JP. The Quality of Well-being Scale: rationale for a single quality of life index. In Walker SR, Rosser R, eds. Quality of Life: Assessment and Application. London: MTP Press, 1988: pp. 51-77.

44. Rapaport MH, Bazzetta J, McAdams LA et al. Validation of the Scale of Functioning in older outpatients with schizophrenia. Am J Geriatr Psychiatry 1996; 4: 218-228.

45. Rubin HC, Patterson TL, Atkinson JH et al. Tracking effects of depression on life quality in HIV. 2nd International Conference on AIDS Impact, Brighton UK, July 7-10, 1994.

46. Fryback DG, Dasbach EJ, Klein R et al. The Beaver Dam Health Outcomes Study: initial catalog of health-state quality factors. Med Decis Making 1993; 13: 89-102.

47. Hays RD, Marshall GN, Wang EYI, Sherbourne CD. Four-year cross-lagged associations between physical and mental health in the Medical Outcomes Study. J Consult Clin Psychol 1994; 62: 441-449.

48. Simon GE, Revicki DA, Grothaus L, Vonkorff M. SF-36 summary scores: are physical and mental health truly distinct? Med Care 36: 567-572.

Accepted for publication 4 September 1998



Presented at Henry Ford Health System, Center for Health System Studies, conference, Using Outcomes Data to Compare Plans, Networks, and Providers, Dearborn, MI, 23 April 1998.

Address correspondence to Robert M. Kaplan, University of California, San Diego, Department of Family and Preventative Medicine, 9500 Gilman Drive, 0628 La Jolla, CA 92093-0628, USA. Tel: +1 619 534 6058. Fax: +1 619 534 7517. E-mail: [email protected]

© 1998 International Society for Quality in Health Care and Oxford University Press

A variety of different methods are now available to measure health-related quality of life. Many of these measures are specific to particular illnesses or diseases. However, there is a continuing need for general or generic measures that can be used for population monitoring, evaluation research, individual clinical decisions, or as outcome measures in randomized clinical trials. The purposes of this paper are to: (i) review the Quality of Well-Being Scale (QWB) and (ii) compare the QWB with the Medical Outcomes Study 36 Item Short Form (SF-36).

QWB-general health policy model

General background

The General Health Policy Model grew out of substantive theories in economics, psychology, medicine, and public health. The model of health status includes components for mortality (death), morbidity (health-related quality of life) and time. The rationale for the model is that diseases and disabilities are important for two reasons. First, illness may cause the life expectancy to be shortened. Second, illness may make life less desirable at times prior to death (health-related quality of life) [1-5].

Central to the General Health Policy Model is a general conceptualization of quality of life. The QWB is a method of measuring quality of life for calculations in the model. The QWB is a preference-weighted measure combining three scales of functioning with a measure of symptoms and problems to produce a point-in-time expression of well-being that runs from 0 (for death) to 1.0 (for asymptomatic full function) [3]. The model separates aspects of health status and life quality into distinct components. These are life expectancy (mortality), functioning and symptoms (morbidity), preference for observed functional states (utility) and duration of stay in health states (prognosis). In addition to classification into observable levels of function, individuals are also classified by symptoms or problems. Symptoms, such as fatigue or a sore throat, might not be directly observable by others, while problems, such as a missing limb, might be noticeable by others. On any particular day, nearly 80% of the general population is optimally functional. However, over an interval of 8 consecutive days, only 12% experience no symptoms [3]. Symptoms or problems may be severe, such as serious joint pain, or minor, such as taking medication or following a prescribed diet for health reasons.

In order to obtain preference weights for observable health states, peer judges place the observable states of health and functioning onto a preference continuum ranging from 0 for death to 1.0 for asymptomatic full function [4-6]. In addition to the morbidity component, the model requires mortality data as from life tables [7], direct measurement [8], or clinical experience. The quality-adjusted life expectancy is the current life expectancy adjusted for diminished quality of life associated with dysfunctional states and the durations of stay in each state. The model quantifies the health activity or treatment program in terms of the quality-adjusted life-years that it produces or saves.

Reliability

Reliability is an estimate of the proportion of variance in a test or measure that is true score. Subtracting this proportion from 1.0 gives an estimate of the proportion of variation assumed to be error. There are several potential sources of error. Two important sources are item sampling and time sampling. One of the basic tenets of psychometric theory is that each item in a test or measure is an unbiased and representative sample from the domain under study. In the construction of an intelligence test, for example, there is an infinite number of items that might represent intellectual ability. The Domain Sampling Model [9] assumes that items are sampled from this domain of performance and that each item is assumed to be an unbiased estimate of the underlying trait. Reliability is estimated from the inter-item correlations and the extent to which the items are intercorrelated characterizes the reliability of the measure. Reliable measures are those for which the component items measure the same construct.

Measures of health status may be derived from a very different theoretical model. Items may not be considered random samples from a large domain because each may have a very specific meaning. For example, report of a severe headache offers very specific information. The item is not randomly sampled from all possible symptoms and the meaning of reporting a headache is very different from the meaning of reporting difficulty urinating. Thus, item sampling, as known in psychometric theory, is not necessarily relevant to many health status measures. A question about a headache, as used in clinical medicine, is not randomly selected from a large domain of questions about health.

A second type of reliability in psychometric theory concerns time sampling. Psychological traits are considered to be stable over the course of time. If Sally is intelligent today, we expect her to be equally intelligent 2 weeks from today. Variation in her performance across the 2 weeks might be attributable to measurement error. Thus, test-retest estimates are an important source of information.

This same logic may not apply to health status measures. If Sally is very sick today, we may not expect her to be equally sick in 2 weeks time. Differences between health scores taken at two points in time may mean that she recovered from her illness or that she got sicker. When the underlying construct is expected to change over time, test-retest evaluations may have very little meaning. As a result, traditional reliability data have less meaning for the QWB.

We are not implying that utility measures are exempt from reliability assessment. Measurement error is assessed in several other ways. For example, we consider reliability assessed over short intervals when large swings in health status are not expected. In addition, we can consider agreement on classification of functioning using different methods.

Considerable evidence shows that the QWB scores assessed on consecutive days are highly reliable. Table 1 summarizes consecutive day QWB correlations for a variety of populations. As the table shows, the measure has good short term stability [10]. A second method for calculating interday reliability estimated agreement per cent for reports of dysfunction on consecutive days. The agreement percentage (AP) is calculated as:

AP = (no. of agreements) / (no. of agreements + no. of disagreements)

These values are shown in the righthand column of Table 1. The values were not available for three of the seven populations. The reported values are from Anderson et al. [10].

Table 1 Correlations between QWB estimated on consecutive days in various populations and interday agreement per cent¹

Population                                                 Reliability   Agreement
General population adults (n = 681)                        0.96          0.82
General population children (n = 274)                      0.93          0.87
Adults in California, Indochinese community (n = 598)      0.94          0.94
Burn patients (n = 143)                                    0.83          0.97
Non-head trauma patients² (n = 1048)                       0.90
Chronic obstructive pulmonary disease patients² (n = 84)   0.98
Diabetes patients² (n = 70)                                0.96

¹ Data from [10] and JP Anderson, unpublished work.
² Agreement data not available for trauma, COPD and diabetes patients.
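The agreement percentage defined in this section is a simple ratio, and a minimal sketch may make the calculation concrete. The function name `agreement_percentage` and the ten paired reports below are hypothetical and are not taken from Anderson et al. [10]; the sketch assumes each respondent is classified once per day as reporting dysfunction or not.

```python
def agreement_percentage(day1, day2):
    """Return AP = agreements / (agreements + disagreements) for paired reports."""
    if len(day1) != len(day2):
        raise ValueError("both days must classify the same respondents")
    if not day1:
        raise ValueError("at least one respondent is required")
    # A pair agrees when the respondent is classified the same way on both days.
    agreements = sum(a == b for a, b in zip(day1, day2))
    disagreements = len(day1) - agreements
    return agreements / (agreements + disagreements)


# Hypothetical data: True means dysfunction was reported on that day.
day1 = [True, False, False, True, False, True, False, False, True, False]
day2 = [True, False, True, True, False, True, False, False, False, False]

ap = agreement_percentage(day1, day2)
print(f"AP = {ap:.2f}")
```

In this made-up sample, eight of the ten paired classifications match, so AP is 0.80, a value on the same 0 to 1 scale as the agreement column of Table 1.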


Validity

Validity defines the range of inferences that can be made on the basis of a scale score [3]. Evidence for construct validity of health status measures is often provided by correlations with other measures of the same construct. The QWB has now been used in a wide variety of different studies. The validity evidence will be reviewed briefly and the reader is referred to the specific papers. The QWB has been used in both population and clinical studies. Figure 1 summarizes QWB scores, estimated from the US National Health Interview Survey, in relation to three problems: sinusitis, diabetes, and emphysema. In each of three age groups, sinus disease was shown to be a less serious problem than diabetes which, in turn, had less impact than emphysema [7].

[Figure 1: QWB score (vertical axis, 0.50 to 1.00) by age group (15-44, 45-64, 65+) for sinusitis, diabetes and emphysema.]

Figure 1 Sinus disease and diabetes in the general population. (The data in the figure are from [7].)

Figure 2 summarizes the relationship between the QWB and measures of cognitive impairment for patients with Alzheimer's Disease. Those with disease are significantly lower on the QWB and the degree of illness is systematically related to QWB score [11]. A variety of studies have evaluated the relationship between QWB and disease caused by the human immune deficiency virus (HIV). A series of studies was conducted at the University of California, San Diego (UCSD) HIV Neurobehavioral Research Center (HNRC). These studies evaluated the relationship between neurocognitive impairment, disability, and the QWB. Figure 3 summarizes some of the relationships between the QWB and a variety of outcomes. When QWB scores were broken down by HIV grouping, the Centers for Disease Control (CDC) IV group was significantly lower (0.661) than the CDC A/B groups (0.755) and the control groups (0.802) (upper left panel). The difference between class IV and class A/B is about 0.10 units of well-being, suggesting that individuals lose 0.1 equivalents of well years of life for each year they are in the AIDS category in comparison to the asymptomatic groups. In comparison to the uninfected controls, this would equal 1 year of life loss for each 10 infected individuals. The QWB was shown to be significantly associated with CD4+ lymphocytes (P<0.001) (upper middle), β-2 microglobulin quartile (P<0.03) (upper right), neurologists' ratings of dysfunction (P<0.001) (lower left), clinician ratings of neuropsychological impairment (P<0.04) (lower center), future vital status (P<0.05) (lower right) and several psychiatric variables including Profile of Mood States (POMS) scores for vigor (P<0.001) and dejection (P<0.001). Multivariate models demonstrated high covariation between predictors of QWB. These results suggest that the QWB is a significant correlate of biological, neuropsychological, neurological, psychiatric, and mortality outcomes for male HIV infected patients [12].

[Figure 2: QWB score (vertical axis) by group (Control, Mild, Moderate, Severe).]

Figure 2 QWB by level of cognitive impairment in Alzheimer's disease. (The data in the figure are from [11].)

The QWB has also been used in clinical trials and studies to evaluate medical and surgical therapies in conditions such as chronic obstructive pulmonary disease [13], AIDS [14], cystic fibrosis [15], diabetes mellitus [16], atrial fibrillation [17], lung transplantation [18], arthritis [19], cancer [5], depression [20,21] and several other conditions [22]. Further, the method has been used for health resource allocation modeling and has served as the basis for an innovative experiment on rationing of health care by the state of Oregon [23,24]. Studies have also demonstrated that the QWB is responsive to clinical change derived from surgery [25] or medical conditions such as rheumatoid arthritis [26], AIDS [27] and cystic fibrosis [28].

QWB-SA 1.04

One of the major concerns about the QWB is that it must be administered by a trained interviewer. Although we still believe that the interviewer-administered QWB is the optimal way to collect health outcome information, we have recently developed a self-administered form, which is known as the Quality of Well-Being Self-Administered (QWB-SA) version 1.04. The self-administered form can be completed in about 10 minutes and can be machine scored. Initial data on the validity and reliability of the QWB-SA 1.04 have been reported recently [29].


[Figure 3: six panels. (A) Mean QWB and 95% confidence interval by group (Control, CDC-A, CDC-B, CDC-4). (B) Mean QWB by CD4+ lymphocytes (<199, 200-500, >500 cells). (C) QWB by serum beta-2 microglobulin quartile (1 to 4). (D) QWB by neurologist rating of central impairment (1 to 5). (E) QWB by neuropsychologist global rating (None, Mild, > Mild). (F) Vital status by baseline QWB (Dead, N=46; Alive, N=466).]

Figure 3 Relationship between QWB and CDC group (A), CD4+ cells (B), β2-microglobulin (C), neurologists' ratings of central nervous system dysfunction (D), neuropsychological test summary score of impairment (E), and eventual death within 18 months (F). Error bars show 95% confidence intervals. (The data in the figure are from [14].)
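The group means quoted in the text for panel A (0.661 for CDC IV and 0.755 for CDC A/B) can be turned into well-years by simple subtraction and multiplication. The following is a minimal sketch of that arithmetic only; the function name `well_years_lost` is hypothetical, and the sketch assumes the difference in group means stays constant over the period considered.

```python
# Mean QWB scores by group, as quoted in the text accompanying Figure 3.
MEAN_QWB = {"CDC A/B": 0.755, "CDC IV": 0.661}


def well_years_lost(qwb_reference, qwb_group, persons=1, years=1.0):
    """Well-years lost by a group relative to a reference group.

    Assumes the difference in mean QWB is constant over the whole period.
    """
    return (qwb_reference - qwb_group) * persons * years


per_person_year = well_years_lost(MEAN_QWB["CDC A/B"], MEAN_QWB["CDC IV"])
per_ten_persons = well_years_lost(MEAN_QWB["CDC A/B"], MEAN_QWB["CDC IV"], persons=10)

print(round(per_person_year, 3))  # 0.094, i.e. about 0.1 well-years per person-year
print(round(per_ten_persons, 2))  # 0.94, i.e. roughly 1 well-year per 10 person-years
```

The rounded result is consistent with the statement that the class IV and class A/B groups differ by about 0.10 units of well-being.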

In summary, the general QWB and QWB-SA 1.04 have evidence supporting validity in a variety of different specific diseases. The measures have been shown to be responsive to change and their application has been found to be feasible in population studies and clinical trials.

Comparisons between the QWB and SF-36: theory

The SF-36

The SF-36 has established itself as the most commonly used quality of life measure in the world. The SF-36 grew out of work by the RAND organization and the Medical Outcomes Study (MOS) [30]. The SF-36 includes eight health concepts: physical functioning, role-physical, bodily pain, general health perceptions, vitality, social functioning, role-emotional, and mental health. The measure has been given to literally hundreds of thousands of respondents and has an excellent record of reliability and validity [31]. In the following sections we describe some of the similarities between the QWB and SF-36. Then, we will pinpoint some of the differences between these methods.

Common history

Although not commonly recognized, the QWB and SF-36 have some common roots. Both approaches are outgrowths of early attempts to develop health status indexes in the early 1970s. The QWB evolved from a health status index originally developed by J.W. Bush and his associates at UCSD. In order to develop a population health status index, Bush et al. developed scales for mobility, physical activity and social activity [3]. These scales were based on extensive reviews of a wide variety of questionnaires used for government surveys and epidemiologic monitoring studies.

At about the same time RAND was funded to conduct the Health Insurance Experiment. The San Diego group provided their health status index for use in the study. The RAND group found the measure cumbersome, but incorporated the basic scales of mobility, physical activity and social activity into their survey instrument. They divided the social activity scale into two components of social contacts and self-care. The San Diego group had always included symptoms and problems in addition to observable function. The RAND group decided not to include specific symptom components. Further, the RAND group emphasized the use of the measures as profiles while the San Diego group evolved toward the creation of a single score. Whereas the SF-36 development excluded preference weighted scoring, QWB single-score development allowed for QWB use as a profile.

As part of the Health Insurance Experiment, the RAND group completed detailed psychometric studies of the function status questionnaire along with a variety of other measures. The next step in this programmatic research endeavor was the large scale MOS. This study continued to refine measures originally developed for the Health Insurance Experiment. Ultimately, this resulted in the current SF-36.


[Figure 4: SF-36 scale scores (vertical axis) for Control and Treatment groups across the eight SF-36 scales (horizontal axis).]

Figure 4 Comparison of SF-36 profiles in hypothetical treatment of headache. PF, physical functioning; RP, role-physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role-emotional; MH, mental health.
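A profile such as the one in Figure 4 gives eight separate scores per group and no overall summary, so any overall verdict depends on how the scales are weighted. The sketch below illustrates that dependence under stated assumptions: every score and weight is hypothetical (the figure's numeric values are not reported in the text), and the weighted average shown is an illustration only, not the preference-weighted scoring used by the QWB.

```python
SCALES = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

# Hypothetical 0-100 scale scores; treatment is higher on PF, RP and BP and
# lower on GH, VT and MH, mirroring the direction of the headache example.
control = {"PF": 70, "RP": 65, "BP": 60, "GH": 75, "VT": 72, "SF": 74, "RE": 76, "MH": 78}
treatment = {"PF": 78, "RP": 80, "BP": 82, "GH": 68, "VT": 60, "SF": 74, "RE": 76, "MH": 70}


def weighted_summary(profile, weights):
    """Collapse a profile into one number using explicit importance weights."""
    total_weight = sum(weights.values())
    return sum(profile[scale] * weights[scale] for scale in weights) / total_weight


equal = {scale: 1.0 for scale in SCALES}
pain_focused = {**equal, "BP": 3.0}                               # pain relief matters most
vitality_focused = {**equal, "GH": 3.0, "VT": 3.0, "MH": 3.0}     # drowsiness matters most

for label, weights in [("equal", equal), ("pain", pain_focused), ("vitality", vitality_focused)]:
    c = weighted_summary(control, weights)
    t = weighted_summary(treatment, weights)
    print(f"{label:9s} control={c:.1f} treatment={t:.1f}")
```

With these made-up numbers the treatment group looks better under the pain-focused weights and worse under the vitality-focused weights, which is the kind of implicit judgment a single preference-weighted score is meant to make explicit.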

Although the SF-36 has evolved considerably from the a treatment or control group. As the figure shows, thoseoriginal function status measures, it is also important to who received the treatment scored higher on measures of

emphasize that many of the concepts remain similar, role-physical, bodily pain, and physical functioning. However,

The choice between measures such as the QWB and the they scored lower on general health perceptions, vitality, andSF-36 is difficult. In the following sections we discuss several mental health. The drug, in this case, was successful in

issues that may guide these decisions. We begin with a more relieving headaches but produced significant drowsiness. UI-

theoretical discussion of the differences between profile timately, clinicians must offer some general interpretation of

and utility based measures. Then, we critique methods for these profiles by applying a weighting system. They mustcomparing measures and offer some data on correlations decide if they are more concerned about physical role function

between measures. Finally, we address the need for separate or about vitality in general health perceptions. Judgment aboutphysical and mental health measures, the relative importance of various dimensions is common and

typically is done implicitly, arbitrarily, and in an idiosyncratic

Theory: profile versus utility scoring way. Thus, physicians or patients may idiosyncratically ignore

A variety of generic measures assess the dimension of life a particular test result or a particular symptom because

quality [32]. These typically include physical functioning, another one is more important to them. However, the processemotional functioning, and some symptomatic complaints, by which relative importance is evaluated can be studiedThe specific dimensions vary from measure to measure and explicit]), and measures of perceived relative importance canthere is substantial debate about which dimensions should become part of the measure.

be included [33]. These problems in interpreting profiles for clinical decision

One of the most important distinctions among measures making explain why it has been difficult to use profiles for

is in how data are scaled and reported. Profile approaches cost-effectiveness and cost-utility analysis. Comparison ofreport a series of scores and characterize individual re- different options for the use of common resources requires

spondents or groups as profiles. The focus of attention is overall quantification of health outcomes using a commonon the components of health rather than on an overall measurement unit. To a large extent, this integrating pref-

summary. An example of one profile is shown in Figure 4. erence or utility, function is the most important feature of

The figure shows two hypothetical profiles corresponding to cost-utility analysis. Despite many attempts, popular outcome

treatments for headache. In this hTpothetical clinical trial, measures such as the Sickness Impact Profile (SIP), SF-36, thepatients with recurrent headaches were randomly assigned to McMaster Health Index Questionnaire, and the Nottingham

513

Page 18: The Quality of Well-Being Scale: critical 36rmkaplan.bol.ucla.edu/Robert_M._Kaplan/1998_Publications_files/0308.pdf · (i) review the Quality of Well-Being Scale (QWB) and (ii) Central

R.M.Kaplanetal.

Health Profile (NHP) have not been useful for cost-ef- 0.06

fectiveness analysis because they have multiple outcome 0.05-dimensions. Some profiles, such as the SIP and the NHPyield a single summary score, but are not scaled on a 0 to E 0.04-"_ = Normals1.0 continuum with a clear reference to death. Thus, they _ 0.03- mLoware not useful for cost-effectiveness or cost-utility studies. _ =Moderate

In addition, it is sometimes difficult to capture the total _ 0.02-clinical picture using profile measures because most treat- O 0.01-ments have side-effects as well as benefits. A successful

surgery for prostate cancer, for example, might be associated O- Mobility Physical Socialwith impotence and incontinence. The major challenges arein determining what it means when someone experiences a Figure S QWB subscale adjustments in late life psychosisside-effect and whether tile benefits justify these side-effects, patients of different levels of severity. (The data in the figureHow do we determine whether or not observable side-effects are from [37].)

are important? Should the patient who feels sleepy discontinue

his or her medication? Should a patient with insulin dependent Measured utilities are needed to calculate quality-adjusteddiabetes mellitus discontinue therapy because he or she life-years (QALYs). These methods are required in order todevelops skin problems at the injection sites? Skin problems perform cost-utility analysis [36]. QALYs integrate mortalit3.,are a nuisance, but without treatment the patient would die. and morbidity to express health status in terms of equivalentsOften the issue is not whether treatment causes side-effects, of well-years of life. Suppose a woman who has a lifebut how we should place these side-effects within the per- expectancy of 75 years dies of breast cancer at age 50; thespective of total health. Ultimately, we must decide whether disease was associated with 25 lost life-years, if 100 womentreatment produces a net benefit or a net deficit in health died at age 50 (and also had a life expectancy of 75 years)status. 2500 (100 x 25 years) life-years would be lost.

Often detern-fining net benefit requires difficult trade offs.. Death is not the only outcome of concern in cancer. ManyA treatment may extend life expectancy by treating a patho- adults suffer from the disease leaving them somewhat disabledphysiologic abnormality. However, the treatment may reduce over long periods of time. Although still alive, the quality ofquality of life or life satisfaction. Preferences and utility their lives has diminished. QALYs take into considerationmeasures attempt to quantify quality of life in relation to life the quality of life consequences of these illnesses. For example,expectancy. These methods can be used to estimate net a disease that reduces quality of life by one half will takebenefit. The assessment of net benefit requires summary away 0.5 QALYs over the course of I year. If it effects twomeasures that scale relative desirability of various health people, it will take away 1 year (2 x 0.5) over a 1-year period.outcomes. The scaling represents utility or preferences for A pharmaceutical treatment that improves quality of life bycomponentsof health. 0.2 for each of five individualswill result in the equivalent

Measuring preference evokes many technical and method- of one QALY if the benefit is maintained over a 1-yearological challenges [6,34]. Different methods of preference period. This system has the advantage of considering bothmeasurement can yield different results 1735],a finding that benefits and side-effects of programs in terms of the commonshould not be surprising because the various approaches to QALY units. Although QALYs are typically assessed forpreference assessment are based on different underlying patients, they can also be measured for others, including careconceptual models and the methods ask different questions, givers who are placed at risk because they experience excessDecision theory based methods are similar in that they place stress and burden.wellness on a continuum between 0.0 and 1.0. One of the important contrasts between the Q\VJB and

Decision theory methods are refinements of generic sur- the SF-36 is that the QWB is primarily a preference-basedrival analysis. In traditional survival analysis, those who are measure designed to produce estimates of QALYs. The SF-alive are statistically coded as 1.0 while those who are dead 36 is primarily a profile measure. However, summary scoresare statistically coded as 0.0. Mortality can result from any are available for the SF-36 mental and physical health corn-disease and survival analysis allows the comparison between portents. Similarl}; profile information can be obtained fromdifferent diseases. For example, we can state the life ex- the QWB and QWB-SA. in fact, some investigators preferpectancy for those who will eventually die of heart disease to report profile information along with the overall score.and compare it to the life expectancy to those who eventually For example, Patterson el al. [37] evaluated Q_rB outcomesdie 0fcancer. Thus, there is an advantage over disease specific for patients with late onset schizophrenia. The study ismeasures such as heart eiection fractions and tumor size. interesting because it is always assumed that the variation inThe difficulty is that everyone who remains alive is given the outcomes for patients with mental illness will be reflectedsame score. A person confined to bed with an irreversible by mental health items. Figure 5 summarizes some profilecoma is alive and is counted the same as someone who is information from the Patterson study. It compares patientsactively participating in athletics. Utility assessment, on the who are normal, or have low, moderate, or high degreesother hand, allows the quantification of levels of wellness on of dysfunction associated with schizophrenia (evaluated bythe continuum anchored by death and optimum function, formal psychiatric interviews) in relation to QWB scales for

514

Page 19: The Quality of Well-Being Scale: critical 36rmkaplan.bol.ucla.edu/Robert_M._Kaplan/1998_Publications_files/0308.pdf · (i) review the Quality of Well-Being Scale (QWB) and (ii) Central

QWB versus SF-36

Table 2 Principal concepts and domains of health-related quality of life contained in general preference weighted instrumentsfor assessing quality-adjusted life years

Instrument

Health Utility' Years of

Disability Index Healthy Qualityof QualityofDistress ................................ Life Well-being Life and

co9 pt................................................. E?2!<........ece!e..............E e!th.........Healthperceptions ** ** **

Social function

Socialrelations ** ** **

Usualsocialrole ** ** **

Intimacy/sexual function

Communication/speech ** ** **

Psychological function

Cognitivefunction ** ** **Emotionalfunction ** ** ** ** **

Mood/feelings **

Physical functionMobility ** ** ** ** ** ** **

Physicalactivity ** ** ** ** ** **Self-care ** ** ** ** **

ImpairmentSensory function/loss ** ** **

Symptoms/impairments ** ** ** ** ** ** **

From Gold MR, Siegel JE, Russel LB, Weinstein MC. (eds). Cost-effeclivenessin Health and Medidne. New York: Oxford University Press,1996.

mobility, physical activity, and social activity. The scaling was the adjustment (from 1.0) in the QWB scale. Thus, higher scores in the figure represent more dysfunction. As the figure demonstrates, those with more severe schizophrenia have higher adjustments on the QWB subscales. The figure is interesting because it demonstrates that QWB profile information can be reported if the investigator desires. Further, the figure demonstrates that traditional functional scales capture variability associated with mental illness. We will review the issue of mental health in more detail in the last section of the paper.

Comparisons between the QWB and SF-36: studies

Content and face validity

In 1993, the US Department of Health and Human Services appointed a multidisciplinary group of methodologists to recommend standardized strategies for the evaluation of health care. The panel, which released its report in 1996, suggested that standardized outcomes analyses be conducted to evaluate the cost-effectiveness of medical care [36]. These analyses require preference weighted measures of health-related quality of life. Although there has been considerable interest in measuring the cost-effectiveness of treatments, little is known about the validity of general outcome measures and it is often difficult to choose between different approaches. Some authors have attempted to simplify the task by offering summary tables. An example of one such summary is shown as Table 2.

Despite the attractiveness of this approach, there are also some difficulties. In particular, creators of the tables typically examine the names of subscales, rather than the content of the measures. Consider the example of sensory function or loss. According to Table 2, sensory functioning is not included in the QWB. The creators of the table came to this conclusion because there is no subscale on the QWB named sensory function. However, the QWB-SA includes symptoms for loss of vision, loss of hearing, impairment of vision (including wearing glasses or contact lenses), problems with taste and smell, and so on. In fact the newer self-administered QWB (QWB-SA) devotes a major portion of the questionnaire to items on sensation and sensory organs. The symptoms include any hearing loss, blindness in one eye, blindness in both eyes, any problems with vision (floaters, double or distorted vision), eye pain, sensitivity to light, ear aches, difficulty in balance, and a variety of others. Indeed, the QWB-SA includes much more content on sensory functioning than do measures that are identified as including content on sensory functioning.
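The cost-utility analyses recommended by the panel rest on simple arithmetic: a change in a preference-weighted score is multiplied by the time it lasts to give QALYs, and the cost of the program is divided by that figure. The following is a minimal Python sketch of that arithmetic, using the anxiety-treatment figures given later in this paper (a rise from 0.65 to 0.75 lasting 1 year). The function names and the treatment cost of 2000 are illustrative assumptions and do not come from the paper.

```python
def qaly_gain(score_before, score_after, years):
    """QALYs gained = change in well-being score x years the benefit lasts."""
    for score in (score_before, score_after):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie between 0 (dead) and 1.0 (full function)")
    if years < 0:
        raise ValueError("duration must be non-negative")
    return (score_after - score_before) * years


def cost_per_qaly(cost, qalys):
    """Cost/utility ratio: program cost divided by QALYs produced."""
    if qalys <= 0:
        raise ValueError("ratio is only defined for a positive QALY gain")
    return cost / qalys


# Scores and duration are the paper's anxiety example; the cost is hypothetical.
gain = qaly_gain(0.65, 0.75, 1.0)
ratio = cost_per_qaly(2000.0, gain)
print(f"{gain:.2f} QALY gained, {ratio:,.0f} cost units per QALY")
```

Because both physical and mental health treatments can be passed through the same two functions, their ratios are expressed in the same units and can be compared directly.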


Table 3 Correlations between QWB and MOS components in four studies

SF-36                       Fryback   Anderson   Hughes   Andresen
Physical function           0.690     0.533      0.62     0.511
Role-physical               0.566     0.462      0.64     0.281
Bodily pain                 0.560     0.173      0.42     0.410
General health perception   0.522     0.525      0.37     0.492
Vitality                    0.505     0.481      0.47     0.468
Social function             0.397     0.474      0.38     0.277
Role-emotional              0.175     0.283      0.24     0.167
Mental health               0.301     0.272      0.23     0.383

All studies used the SF-36 except Hughes et al. [40] who used a modification of the SF-36 known as the MOS-HIV-34.

Figure 6 Mean SF-36 scores versus age-adjusted national average scores, by scale (PF, RP, BP, GH, MH, RE, VT, SF), for patients with atrial fibrillation and the MOS normative sample. (The data in the figure are from [42].)
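The pattern in Table 3 can be checked with a few lines of arithmetic. The Python sketch below tabulates the four correlations for each SF-36 scale and reports their mean and range. The variable names and the choice of mean and range as summaries are illustrative assumptions, not analyses reported by the cited studies, and averaging correlations is only a rough descriptive device.

```python
# Correlations between the QWB and each SF-36 scale, copied from Table 3.
# Column order: Fryback [38], Anderson [39], Hughes [40], Andresen [41].
TABLE_3 = {
    "Physical function": [0.690, 0.533, 0.62, 0.511],
    "Role-physical": [0.566, 0.462, 0.64, 0.281],
    "Bodily pain": [0.560, 0.173, 0.42, 0.410],
    "General health perception": [0.522, 0.525, 0.37, 0.492],
    "Vitality": [0.505, 0.481, 0.47, 0.468],
    "Social function": [0.397, 0.474, 0.38, 0.277],
    "Role-emotional": [0.175, 0.283, 0.24, 0.167],
    "Mental health": [0.301, 0.272, 0.23, 0.383],
}


def summarize(correlations):
    """Return (mean, range) for one row of the table; descriptive only."""
    mean = sum(correlations) / len(correlations)
    spread = max(correlations) - min(correlations)
    return mean, spread


summary = {scale: summarize(values) for scale, values in TABLE_3.items()}

highest_mean = max(summary, key=lambda s: summary[s][0])
lowest_mean = min(summary, key=lambda s: summary[s][0])
widest_range = max(summary, key=lambda s: summary[s][1])

for scale, (mean, spread) in summary.items():
    print(f"{scale:<26} mean r = {mean:.3f}  range = {spread:.3f}")
print("highest mean:", highest_mean)
print("lowest mean:", lowest_mean)
print("widest range:", widest_range)
```

With the Table 3 values, physical function has the highest mean correlation, role-emotional the lowest, and bodily pain the widest range across the four studies.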

Correlations between the QWB and SF-36

A growing number of studies have applied both the SF-36 and the QWB. Table 3 summarizes correlations between the QWB and SF-36 components in four recent studies. The first study is based on a population sample in Beaver Dam, Wisconsin involving 1356 adults (560 men and 796 women). The mean age of the sample was 64.1 years [38]. In this sample, all of the participants lived in community settings. The second study involved 201 adults with serious illnesses, including 99 with AIDS, 74 with cancer, and 28 with other terminal illnesses [39]. The AIDS patients were younger (mean = 38 years) while those with cancer were older (mean age = 61 years). Those with other illnesses were also older (mean = 65 years). The third sample was a group of 100 HIV infected men (mean age 35.3 years). These men completed the MOS-HIV-34 which is a 34-item adaptation of the SF-36 for people with HIV disease [40]. The fourth sample included 301 community dwelling older adults (124 men and 177 women). All participants were older than 65 years and 44% of the men and 53% of the women were older than 75 years [41].

In all studies correlations between the QWB total score and the SF-36 physical function scale are substantial. Similarly, correlations between the QWB scale and the role-emotional scale are low in all four studies. As the table shows, correlations between the QWB and SF-36 components are quite consistent across these very different patient populations. The only exception is the low correlation between total QWB and bodily pain in the terminally ill sample. Overall, the available evidence suggests that the SF-36 and QWB tap most of the same variation in health status.

Ceiling effects

Measures differ in the extent to which they are responsive to minor variations in wellness. Some of our early studies demonstrated that nearly 80% of the general population has no functional limitations on a particular day [9]. Thus, functional items on surveys such as the National Health Interview Survey show the great majority of people to be well. On the other hand, only about 12% of the general population experience no symptom or problem during an average week [2]. Fryback et al. [38] studied the distributional properties of the QWB and SF-36 scales in Beaver Dam, Wisconsin. They found the QWB to be approximately normally distributed with scores of 1.0 being exceptionally rare. This finding has also been reported by our group [3] and others [41].

Ganiats et al. [42] have evaluated the QWB, SF-36, and other measures in clinical trials for patients with atrial fibrillation. Confirming other studies, they found 0% of patients at the ceiling level for the QWB. In contrast, many SF-36 scales were at their ceiling level. In other words, improvements on these dimensions would be difficult to detect because the highest values had already been recorded. For example, nearly 70% of respondents obtained the highest score for role-emotional and 54% obtained the highest score for social functioning. This was a concern because these patients were afflicted with a serious chronic illness. However, the SF-36 scores for this elderly population were comparable to those from the MOS normative sample (Figure 6).

One of the issues in the application of QWB, SF-36, and other measures is the percentage of respondents who have incomplete questionnaires. In the atrial fibrillation studies, all measures were administered to at least 300 patients. The percentage of patients who had any data missing for each measure was recorded. Because it was interviewer administered, no patient had incomplete survey responses for the QWB. The EQ-5D had 6% incomplete survey responses. The Health Utility Index (HUI) had 11% and the SF-36 had 26%. The new self-administered QWB also has had some missing data problems similar to those of other measures. In the recent study by Andresen et al. [41], missing data for the QWB-SA were common: about 3% of the respondents failed to fill out items on chronic diseases. Overall, the missing


data rate was about the same as it is for the SF-36. However, the SF-36 has worked out a scoring routine that allows the measures to be scored with up to 50% of the items missing. We are currently working on a missing data scoring protocol for the QWB-SA.

Mental health

Despite widespread interest in the concept of a QALY among practitioners in many different specialties, these ideas have received very little attention in the mental health fields. This reflects the widespread belief that mental health and physical health outcomes are conceptually distinct.

Although many questionnaires include different dimensions, they still may be tapping the same constructs. For example, a measure without a mental health component does not necessarily neglect mental health. Mental health symptoms may be included and the impact of mental health, cognitive functioning, or mental retardation may be represented in questions about role functioning. Some measures have multiple dimensions for mental health symptoms while others include fewer items that ask about problems in general. It is not clear that multiple measures are more capable of detecting clinical differences. This remains an empirical question for systematic analysis. A common strategy is to report outcomes along multiple dimensions.

Several years ago Kaplan and Anderson [43] argued that there are many similarities in mental health and physical health outcomes. The preference- and utility-based measures that are assumed to ignore mental health include the basic dimensions of observable functioning, symptoms, and duration. Mental health problems, like physical health problems, can be represented by symptoms and by disrupted role functioning. Consider some examples. Suppose that a patient has the primary symptom of a cough. If the cough does not disrupt role function, the preference or utility weighted score might show a small deviation from 1.0. If the cough is more serious and keeps the person at home, the score will be lower. If the cough is very severe, it might limit the person to a hospital and may have serious disruptive effects upon role functioning. This would necessitate an even lower score. Coughs can be of different duration. A cough associated with an acute respiratory infection may have a serious impact on functioning that may last only a short period of time. This would be indicated by a minor deviation in QALYs. A chronic cough associated with obstructive lung disease would be associated with significant loss of QALYs because duration is a major component of the calculation.

Now consider the case of a person with depression. Depression may be a symptom reported by a patient just as a cough is reported by other patients. Depression without disruption of role function would cause a minor variation of wellness. If the depression caused the person to stay at home the preference or utility weighted score would be lower. Severe depression might require the person to be in a hospital or special facility and would result in a lower score. Depressions, like coughs, are of different durations. Depression of long duration would cause the loss of more QALYs than would depression of short duration.

It is commonly asserted that the QWB excludes mental health content. Existing empirical evidence supports the validity of the QWB in studies of patients known to have impaired mental health. One study evaluated the validity of the QWB as an outcome measure for older psychotic patients. Seventy-two psychotic patients and 28 matched controls from the San Diego Veterans Affairs Medical Center completed the QWB; the Structured Clinical Interview for the DSM-III-R patient version, Scales for the Assessment of Positive and Negative Symptoms (SAPS and SANS), and the Global Severity Index (GSI) from the Brief Symptom Inventory were administered to all subjects. The QWB was significantly correlated with the SANS (r = -0.52, P < 0.001), the SAPS (r = -0.57, P < 0.001) and the GSI (r = -0.62, P < 0.001). There was a strong and significant linear relationship between QWB and severity of illness (as classified by the SANS and the SAPS). In addition, component scores of the QWB (i.e. mobility, physical activity, social activity, and worst symptom) were significantly lower among patients as compared to controls, and declined systematically as psychiatric symptoms increased [37]. In a related study, Rapaport et al. [44] found that patients with schizophrenia had QWB scores that were significantly lower (0.6) than age matched non-schizophrenia controls (0.7). QWB differences between these two groups were comparable in magnitude to a wide array of psychiatry-specific measures.

Several other studies have now documented the sensitivity of the QWB to mental health problems. For example, a variety of studies has demonstrated the validity of the QWB for assessing depression among patients with HIV disease. In one study, ratings of depression using the Hamilton Depression Scale (HAM-D) were obtained from 285 HIV patients and 84 HIV negative men participating in the HNRC cohort. The data were obtained at baseline and 6 months later. Depression was defined as Hamilton scores greater than 10. The study demonstrated a systematic relationship between Hamilton scores and QWB scores at baseline [45]. In addition, 22 HIV-positive subjects experienced increases of 10 points on the Hamilton Scale between the first evaluation and that at 6 months. For these individuals, significant reductions in QWB scores were observed. Analyses of QWB symptoms suggested greater symptom severity among those whose Hamilton scores increased. In addition, there was a greater reduction in physical activity. In other words, mood affected both symptoms and physical function.

Pyne et al. [20,21] compared QWB scores between patients with major depression and controls. The depressed patients were divided into mild, moderate and severe groups according to their scores on the HAM-D. There was a step-wise decrease in QWB scores for the controls through each of the patient groups. Similar results were observed for Beck Depression Inventory (BDI) scores. The difference between each of these groups is highly significant (P < 0.001). In order to evaluate which of six variables (age, sex, family history, presence of Axis III diagnosis, comorbid Axis I diagnosis, HAM-D) best predicted the QWB score in this patient population, a hierarchical multiple regression analysis was


conducted. Although Axis III disorders are significant predictors of QWB, depression (HAM-D and BDI) predicts QWB when Axis III diagnoses are controlled statistically.

The Beaver Dam study offers some evidence that the QWB is sensitive to mental health disorders in the community. The study offers QWB scores for individuals with a wide variety of conditions. Respondents with self-reported depression had the third lowest mean QWB score among all conditions [46].

QALYs can be used to compare treatments for physical and mental health problems. Suppose, for example, that a treatment for anxiety elevates patients from a level of 0.65 to a level of 0.75. Suppose further that this treatment benefit lasted for 1 year. Each patient would gain 0.10 QALY ((0.75 - 0.65) x 1 year = 0.10 QALY) for each year the benefit was observed. The treatment benefit would be expressed in terms of general QALY units. The productivity of the providers could be compared with providers in other areas of health care. All providers in health care use resources. Dividing the cost of a treatment by the QALY productivity provides the cost:utility ratio. Measuring productivity of mental health-related treatments in QALY units would allow the investments in mental health services to be compared directly to those in other aspects of health care.

One of the challenges to single-score measures is the argument that physical and mental health are different dimensions and that any measure combining them into a single index is like combining apples and oranges (CE Schwartz, RM Kaplan, JP Anderson, T Holbrook and MW Genderson, unpublished work). Clearly, mental and physical health services are different. Yet, providers compete for the same health care resources and there must be some basis for deciding how to allocate these resources.

We have conducted several studies designed to determine if mental and physical health clearly separate into well-defined dimensions. Ware et al. [31] have clearly shown that factor analyses of the SF-36 suggest separate physical and mental health dimensions. In a series of studies, we factor analyzed the QWB using groups of patients with multiple sclerosis (n = 263), non-insulin dependent diabetes mellitus (n = 420), non-head traumatic injury (n = 852), AIDS (n = 99), and cancer (n = 74). Each QWB item was treated as dichotomous and the matrix of 0s and 1s was subjected to factor analysis with orthogonal rotation. Items with factor loadings > 0.40 were regarded as consistent with a particular factor.

The analysis consistently revealed two strong factors. The first factor combined physical and psychological symptoms. The same factor emerged across these diverse patient populations. The items that consistently load on this factor include upset stomach, general tiredness, feeling upset, depressed or crying, dizziness, trouble sleeping, and excessive worry. In contrast to the symptom factor, the second dimension was defined by function. The second factor includes items such as spending the day in a wheelchair, difficulty in walking, and limitations in social activity (CE Schwartz, RM Kaplan, JP Anderson, T Holbrook and MW Genderson, unpublished work).

In three of these four populations, SF-36 data were also available. Using the same strategy, we have replicated factor analyses showing that mental and physical health separate for the SF-36. How might we explain why the QWB factor analysis mixes mental and physical health symptoms, whereas the SF-36 factor analysis separates them? One explanation is the difference between symptoms and functioning. In both factor analyses, functioning items emerge as a separate factor. The mental health items on the SF-36 use a different format and place greater emphasis on symptomatic activity. The SF-36 also separates instructions for role-emotional and physical functioning scales. Respondents are instructed to consider disruptions in activities that result from physical or mental health problems. Thus, separation of physical and emotional components in factor analysis might be expected because respondents had been prompted to think differently about them. It is worth noting that the separate dimensions of physical and mental health reported by Ware et al. [31] derive in part from their use of orthogonal rotation techniques. Methods that accommodate correlated factors (i.e. oblique rotation) have been shown to offer a better fit for health data [47].

Others have noted the complications resulting from the separation of physical and mental health. For example, Simon et al. evaluated 536 primary care patients before and after treatment with antidepressant medications. The mental health treatment was associated with improvements in both mental and physical health subscales of the SF-36 (physical function, role-physical, bodily pain, and general health perceptions). However, because of an artifact in the scoring system, the physical health summary score was unchanged. The study is important for two reasons: it demonstrates the complications in interpreting SF-36 summary scores; and it demonstrates that mental health treatments may have significant effects on measures believed to represent physical health [48]. We believe this underscores the fuzzy boundary between physical and mental health.

Summary

Outcomes researchers now have a variety of validated approaches to assess the costs, risks and benefits in medical care. The QWB and the SF-36 are two methods that arise from similar traditions. Many components of the measures are similar and correlations between the measures are substantial. Until recently, the SF-36 was much easier and less expensive to administer. However, an inexpensive self-administered form of the QWB is now available.

There are also important distinctions between the QWB and SF-36. The QWB places greater emphasis on symptoms and provides more clinical information. For example, it offers the clinician symptom reports similar to a review of systems. Further, the QWB can be used for policy analysis because outcomes can be translated into QALYs. Although it is possible to create profiles from the QWB, it does less well than the SF-36 for characterizing multi-dimensional patterns of outcome. The QWB and SF-36 represent different


measurement approaches. The SF-36 is rooted in psychometric theory. The QWB, although adhering to some principles of psychometric theory, arises from a decision theory tradition. Because of the way it is constructed, some aspects of psychometric theory do not apply directly to the QWB. For example, traditional test-retest reliability has little value for assessing either the QWB or the SF-36. The QWB is better suited to policy analysis and to economic studies that require the calculation of a QALY. Investigators interested in cost-effectiveness or cost-utility analysis should consider using the QWB or a related utility-based measure. Investigators interested in reviewing a profile of outcomes may be better to use the well-established and well-validated SF-36.

We believe that better measurement technologies will replace many current approaches to health status assessment. Continuing research is necessary to build the next generation of outcome measures.

Acknowledgements

This work was supported in part by grants R01 HS 09170 and 5RO1NS33519 from the Agency for Health Care Policy and Research and grant P60 AR 40770 from the National Institutes of Health.

References

1. Kaplan RM. Decisions about prostate cancer screening in managed care. Curr Opin Oncol 1997; 9: 480-486.
2. Kaplan RM, Anderson JP. A general health policy model: update and applications. Health Serv Res 1988; 23: 203-234.
3. Kaplan RM, Bush JW, Berry CC. Health status index: category rating versus magnitude estimation for measuring levels of well-being. Med Care 1979; 17: 501-525.
4. Kaplan RM. Human preference measurement for health decisions and the evaluation of long-term care. In Kane RL, Kane RA, eds. Values and Long-Term Care. Lexington, MA: Lexington Books, 1982: pp. 157-188.
5. Kaplan RM. Quality of life assessment for cost/utility studies in cancer. Cancer Treat Rev 1993; 19: S85-S96.
6. Kaplan RM. Profile versus utility based measures of outcome for clinical trials. In Staquet MJ, Hays RD, Fayers PM, eds. Quality of Life Assessment in Clinical Trials. London: Oxford University Press, 1998: pp. 69-90.
7. Erickson P, Kendall EA, Anderson J, Kaplan RM. Using composite health status measures to assess the nation's health. Med Care 1989; 27: S66-S76.
8. Anderson JP, Kaplan RM, Coons SJ, Schneiderman LJ. Comparison of the Quality of Well-being Scale and the SF-36 results in two studies of ill adults: AIDS and cancer. J Clin Epidemiol 1998; 51: 755-762.
9. Nunnally JC, Bernstein IH. Psychometric Theory, 3rd edn. New York, NY: McGraw Hill, 1994.
10. Anderson JP, Kaplan RM, Berry CC et al. Interday reliability of function assessment for a health status measure. The Quality of Well-Being scale. Med Care 1989; 27: 1076-1083.
11. Kerner DN, Patterson TL, Grant I, Kaplan RM. Validity of the Quality of Well-being scale for patients with Alzheimer's Disease. J Aging Health 1998; 10: 44-61.
12. Kaplan RM, Anderson JP, Patterson TL et al. Validity of the Quality of Well-being Scale for persons with HIV infection. Psychosomatic Med 1995; 57: 138-147.
13. Kaplan RM, Atkins CJ, Timms R. Validity of a quality of well-being scale as an outcome measure in chronic obstructive pulmonary disease. J Chron Dis 1984; 37: 85-95.
14. Kaplan RM, Patterson TL, Kerner DN et al. The Quality of Well-being Scale in asymptomatic HIV infected patients. Qual Life Res 1997; 6: 507-514.
15. Orenstein DM, Nixon PA, Ross EA, Kaplan RM. The quality of well-being in cystic fibrosis. Chest 1989; 95: 344-347.
16. Kaplan RM, Hartwell SL, Wilson DK, Wallace JP. Effects of diet and exercise interventions on control and quality of life in non-insulin-dependent diabetes mellitus. J Gen Intern Med 1987; 2: 220-228.
17. Ganiats TG, Palinkas LA, Kaplan RM. Comparison of Quality of Well-Being scale and Functional Status Index in patients with atrial fibrillation. Med Care 1992; 30: 958-964.
18. Squier H, Ries AL, Kaplan RM et al. Quality of well-being predicts survival in lung transplantation candidates. Am J Respir Crit Care Med 1995; 152: 2032-2036.
19. Kaplan RM, Kozin F, Anderson JP. Measuring quality of life in arthritis patients (including discussion of a general health-decision model). Qual Life Cardiovasc Care 1988; 4: 131-139.
20. Pyne JM, Patterson TL, Kaplan RM et al. Quality of life assessment for patients with major depression. Psychiatr Serv 1997; 48: 224-230.
21. Pyne JM, Patterson TL, Kaplan RM et al. Preliminary longitudinal assessment of quality of life in patients with major depression. Psychopharmacol Bull 1997; 33: 23-29.
22. Kaplan RM. Application of a general health policy model in the American health care crisis. J R Soc Med 1993; 86: 277-281.
23. Kaplan RM. Hippocratic Predicament: Affordability, Access, and Accountability in Health Care. San Diego, CA: Academic Press, 1993.
24. Kaplan RM. Value judgment in the Oregon Medicaid experiment. Med Care 1994; 32: 975-88.
25. Hodgkin PS. Health impact of endoscopic sinus surgery assessed by the Quality of Well-being (QWB) Scale. Unpublished paper. San Diego, CA: University of California, 1994.
26. Bombardier C, Ware J, Russell IJ et al. Auranofin therapy and quality of life in patients with rheumatoid arthritis. Results of a multicenter trial. Am J Med 1986; 81: 565-578.
27. Kaplan RM, Anderson JP, Wu AW et al. The Quality of Well-being Scale. Applications in AIDS, cystic fibrosis, and arthritis. Med Care 1989; 27: S27-S43.


28. Orenstein DM, Kaplan RM. Measuring the quality of well-being in cystic fibrosis and lung transplantation. The importance of the area under the curve. Chest 1991; 100: 1016-1018.
29. Kaplan RM, Sieber WJ, Ganiats TG. The Quality of Well-being Scale: comparison of the interviewer-administered version with a self-administered questionnaire. Psychol Health 1997; 12: 783-791.
30. Ware Jr JE, Phillips J, Yody BB, Adamczyk J. Assessment tools: functional health status and patient satisfaction. Am J Med Qual 1996; 11: S50-S53.
31. Ware Jr JE, Kosinski M, Bayliss MS et al. Comparison of methods for the scoring and statistical analysis of SF-36 health profile and summary measures: summary of results from the Medical Outcomes Study. Med Care 33: AS264-AS279.
32. Spilker B (ed). Quality of Life and Pharmacoeconomics in Clinical Trials. New York, NY: Raven, 1996: pp. 309-322.
33. Kaplan RM, Anderson JP. The general health policy model: an integrated approach. In Spilker B, ed. Quality of Life and Pharmacoeconomics in Clinical Trials. New York, NY: Raven, 1996: pp. 309-322.
34. Coons SJ, Kaplan RM. Cost-utility analysis. In Bootman JL, Townsend RJ, McGhan WF, eds. Principles of Pharmacoeconomics, 2nd edn. Cincinnati, OH: Harvey Whitney Books, 1996: pp. 102-126.
35. Revicki DA, Kaplan RM. Relationship between psychometric and utility-based approaches to the measurement of health-related quality of life. Qual Life Res 1993; 2: 477-487.
36. Gold MR, Siegel JE, Russell LB, Weinstein MC. Cost-effectiveness in Health and Medicine. New York: Oxford University Press, 1996.
37. Patterson TL, Kaplan RM, Grant I et al. Quality of well-being in late-life psychosis. Psychiatry Res 1996; 63: 169-181.
38. Fryback DG, Lawrence WF, Martin PA et al. Predicting Quality of Well-being scores from the SF-36: results from the Beaver Dam Health Outcomes Study. Med Decis Making 1997; 17: 1-9.
39. Anderson JP, Kaplan RM, Schneiderman LJ. Effects of offering advance directives on quality adjusted life expectancy and psychological well-being among ill adults. J Clin Epidemiol 1994; 47: 761-772.
40. Hughes TE, Kaplan RM, Coons SJ et al. Construct validities of the quality of well-being scale and the MOS-HIV-34 health survey for HIV-infected patients. Med Decis Making 1997; 17: 439-446.
41. Andresen EM, Rothenberg BM, Kaplan RM. Performance of a self-administered mailed version of the Quality of Well-being (QWB-SA) Questionnaire among older adults. Med Care 1998; 36: 1349-1360.
42. Ganiats TG, Sieber WJ, Barber E, Barrett-Connor E. Initial comparison of four generic quality of life instruments. Qual Life Res 1997; 6: 648.
43. Kaplan RM, Anderson JP. The Quality of Well-being Scale: rationale for a single quality of life index. In Walker SR, Rosser R, eds. Quality of Life: Assessment and Application. London: MTP Press, 1988: pp. 51-77.
44. Rapaport MH, Bazzetta J, McAdams LA et al. Validation of the Scale of Functioning in older outpatients with schizophrenia. Am J Geriatr Psychiatry 1996; 4: 218-228.
45. Rubin HC, Patterson TL, Atkinson JH et al. Tracking effects of depression on life quality in HIV. 2nd International Conference on AIDS Impact, Brighton UK, July 7-10, 1994.
46. Fryback DG, Dasbach EJ, Klein R et al. The Beaver Dam Health Outcomes Study: initial catalog of health-state quality factors. Med Decis Making 1993; 13: 89-102.
47. Hays RD, Marshall GN, Wang EY, Sherbourne CD. Four-year cross-lagged associations between physical and mental health in the Medical Outcomes Study. J Consult Clin Psychol 1994; 62: 441-449.
48. Simon GE, Revicki DA, Grothaus L, Vonkorff M. SF-36 summary scores: are physical and mental health truly distinct? Med Care 36: 567-572.

Accepted for publication 4 September 1998
