External validation of the SAPS II, APACHE II and APACHE III prognostic models in South England: a...

Received: 3 September 2001 · Accepted: 7 November 2002 · Published online: 18 January 2003 · © Springer-Verlag 2003

Work performed in the Department of Intensive Care Medicine, Portsmouth Hospitals NHS Trust, Queen Alexandra Hospital, Portsmouth PO6 3LY, Hampshire, UK. Funding support: This study was supported by departmental funds. There was no conflict of interest.

Abstract Objective: External validation of three prognostic models in adult intensive care patients in South England. Design: Prospective cohort study. Setting: Seventeen intensive care units (ICUs) in the South West Thames Region in South England. Patients and participants: Data of 16,646 patients were analysed. Interventions: None. Measurements and results: We directly compared the predictive accuracy of three prognostic models (SAPS II, APACHE II and APACHE III), using formal tests of calibration and discrimination. The external validation showed a similar pattern for all three models tested: good discrimination, but imperfect calibration. The areas under the receiver operating characteristic (ROC) curves, used to test discrimination, were 0.835 and 0.867 for APACHE II and III, and 0.852 for the SAPS II model. Model calibration was assessed by Hosmer-Lemeshow C-statistics and was χ²=232.1 for APACHE II, χ²=443.3 for APACHE III and χ²=287.5 for SAPS II. Conclusions: Disparity in case mix, a higher prevalence of outcome events and important unmeasured patient mix factors are possible sources for the decay of the models' predictive accuracy in our population. The lack of generalisability of standard prognostic models requires their validation and re-calibration before they can be applied with confidence to new populations. Customisation of existing models may become an important strategy to obtain authentic information on disease severity, which is a prerequisite for reliably measuring and comparing the quality and cost of intensive care.

Keywords Intensive care unit · Intensive care · Severity-of-illness index · Prognostication · Outcome · Hospital mortality

Intensive Care Med (2003) 29:249–256 · DOI 10.1007/s00134-002-1607-9 · ORIGINAL

Dieter H. Beck · Gary B. Smith · John V. Pappachan · Brian Millar

External validation of the SAPS II, APACHE II and APACHE III prognostic models in South England: a multicentre study

D.H. Beck (✉)
Department of Anaesthesiology and Intensive Care, Charité Hospital, Humboldt University, Schumannstrasse 20-21, 10098 Berlin, Germany
e-mail: [email protected]
Tel.: +44-30-2189111
Fax: +44-30-2189111

G.B. Smith
Department of Intensive Care Medicine, Queen Alexandra Hospital, Portsmouth, Hampshire, PO6 3LY, UK

J.V. Pappachan
Department of Anaesthesia, Southampton General Hospital, Southampton, UK

B. Millar
Critical Audit Ltd., St. George's Hospital Medical School, London, UK

D.H. Beck
Lindauerstr. 10, 10781 Berlin, Germany

Introduction

Prognostic models inevitably reflect the medical culture and the population characteristics of the countries in which they originate. The development of the Simplified Acute Physiology Score (SAPS) II [1] was based on a cohort of ICU patients from Europe and North America. The Acute Physiology and Chronic Health Evaluation (APACHE) II and III systems [2, 3] were developed and validated in the United States (US). Although the methods have been reported to adapt well to different environments [4, 5], all but one of the published studies [6] that formally assessed the models' predictive accuracy when applied to new populations from other institutions or countries have shown the same pattern: good discrimination, but imperfect calibration [7, 8, 9, 10, 11, 12, 13, 14, 15, 16].

Few investigations have directly compared the predictive accuracy of the newer prognostic models on a larger scale in the same independent patient population. Rowan et al. applied the Mortality Prediction Model (MPM0) and the APACHE II system to British and Irish ICU patients [14]. The Mortality Prediction Model (MPM) II and SAPS II were evaluated in a cohort of 16,060 ICU patients from 12 European countries [9]. A study of 9,420 European patients compared SAPS II, MPM II and APACHE II [7]. Five prognostic models, including SAPS II, APACHE II and APACHE III, were externally validated in Scotland [15]. In England, the APACHE II and III systems have been compared only once, in a single-centre investigation [16].

In the present study, we performed the external validation of three predictive models and directly compared their performance in the same independent population of intensive care patients in South England.

Materials and methods

Hospitals

The study was conducted from 1 April 1993 to 31 December 1996. During this period, data on all admissions to 17 general ICUs in the Western Division of the South Thames Regional Health Authority and the Portsmouth Health District were collected. Sixteen ICUs were located in district general hospitals and one in a teaching hospital. Data collection was performed over the entire study period in 15 ICUs. In one ICU, data collection began on 1 June 1995 and continued until the end of the study period. One ICU participated only from 1 April to 30 June 1994. Specific approval by the ethics committee was not required since data had already been collected for clinical purposes.

Patients

Patients who had been admitted to the ICU because beds in a separate coronary care or high dependency unit (HDU) were not available in the hospitals concerned were not considered to require intensive therapy and were excluded a priori from the study. During the study period, 20,988 patients who required intensive therapy were admitted. Patients with primary burn injury (n=31), ICU stay less than 4 h (n=956), age less than 16 years (n=1,469) and re-admissions (n=1,322) were excluded. After applying these exclusion criteria, 17,210 patients were included in the study. A further 564 patients had to be excluded because of missing mortality probabilities, which left a total of 16,646 patients for the data analysis. Only the data from the first ICU admission in any one hospital admission were considered. Patients after coronary artery surgery were also excluded from the study.
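The cohort flow described above can be checked with simple arithmetic. The following is a minimal sketch that only restates the counts given in the text; the variable names are illustrative and not part of the study.

```python
# Cohort flow as reported in the Patients section (counts taken from the text).
admitted = 20_988

# Exclusion criteria and the number of patients removed by each.
exclusions = {
    "primary burn injury": 31,
    "ICU stay less than 4 h": 956,
    "age less than 16 years": 1_469,
    "re-admission": 1_322,
}

# Patients later dropped because no mortality probability could be computed.
missing_probabilities = 564

included = admitted - sum(exclusions.values())
analysed = included - missing_probabilities

print(f"included after exclusions: {included}")
print(f"available for analysis:    {analysed}")
```

The two printed totals correspond to the 17,210 included and 16,646 analysed patients stated in the text.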

Data collection

Written guidelines for data definition and formal training of the personnel involved in the process of data acquisition were provided to ensure the reliability of the data collected. On a quarterly basis, data of a randomly selected sample of 20% of the admissions from each participating ICU were cross-validated against the respective medical and nursing records and other available documentation. All documented data were checked for implausible and outlying values. Data including age and sex, pre-existing co-morbidity, diagnostic category and type of admission (medical vs elective and emergency surgical) were collected using the criteria and definitions described by the developers of the APACHE III model [3]. The APACHE III diagnoses were mapped to corresponding APACHE II diagnoses using a system made available to us by APACHE Medical Systems. Diagnostic classification is not required for the SAPS II model. The vital status at ICU and hospital discharge was documented for all patients. Within each ICU, the electronic storage of demographic, clinical and physiological data was undertaken using a personal computer equipped with a specially developed database programme (Wardwatcher, Critical Audit, London, UK).

Scores and mortality probabilities

The calculations of the individual severity-of-illness scores for each model were based upon the most deranged physiological values recorded within the first 24 h of ICU admission. Missing physiological variables were considered to be normal and assigned zero points. The mortality probabilities for APACHE II and SAPS II were calculated using the original regression equations [1, 2]. In the case of the APACHE III method, the respective computations were provided by APACHE Medical Systems.
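As an illustration of how a score is converted into a mortality probability, the sketch below uses the logistic equation published with the SAPS II model [1]. The coefficients are not restated in this paper and are quoted here from the original SAPS II description as an assumption that should be checked against reference [1] before any use. The helper `total_score` and the component names are hypothetical; it only illustrates the rule that missing variables are scored as normal (zero points).

```python
import math


def total_score(points):
    """Sum component points; a missing component (None) counts as zero points."""
    return sum(p if p is not None else 0 for p in points.values())


def saps2_mortality_probability(score):
    """Predicted hospital mortality for a SAPS II score.

    Assumed form (from the original SAPS II publication, not from this paper):
    logit = -7.7631 + 0.0737 * score + 0.9971 * ln(score + 1)
    """
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical component points for one patient; 'bilirubin' was not measured.
patient = {"age": 12, "heart_rate": 4, "systolic_bp": 5, "bilirubin": None, "gcs": 13}
score = total_score(patient)
print(score, round(saps2_mortality_probability(score), 3))
```

Because the transformation is non-linear, the probability at the mean score is not the same as the mean of the individual probabilities, which is why the two are reported separately in Table 1.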

Statistical analysis

The discriminative ability of the models was assessed using receiver operating characteristic (ROC) curves. The area under the ROC curve (AUC) is an expression of the model's ability to discriminate correctly between survivors and non-survivors [17]. Hosmer-Lemeshow C-statistics and calibration curves were employed for assessing the calibration. Hosmer-Lemeshow C-tests are summary statistics for assessing the agreement between the actual and predicted death rates [18, 19]. Calibration curves were constructed by plotting the predicted death rates, stratified by 5% intervals of mortality risk (x-axis), against the observed death rates (y-axis).
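The AUC can be read as the probability that a randomly chosen non-survivor received a higher predicted risk than a randomly chosen survivor. The following minimal sketch computes it directly from that definition, counting ties as one half. It is not necessarily the procedure used by the authors, the data are hypothetical, and the pairwise loop is quadratic, so a rank-based implementation would be preferable for a cohort of this size.

```python
def roc_auc(probabilities, died):
    """Area under the ROC curve from pairwise comparisons.

    probabilities: predicted mortality risk per patient
    died:          truthy for non-survivors, falsy for survivors
    """
    non_survivors = [p for p, d in zip(probabilities, died) if d]
    survivors = [p for p, d in zip(probabilities, died) if not d]
    if not non_survivors or not survivors:
        raise ValueError("need at least one survivor and one non-survivor")

    concordant = 0.0
    for p_dead in non_survivors:
        for p_alive in survivors:
            if p_dead > p_alive:
                concordant += 1.0   # correctly ordered pair
            elif p_dead == p_alive:
                concordant += 0.5   # tie counts as half
    return concordant / (len(non_survivors) * len(survivors))


# Hypothetical example: four patients, two of whom died.
print(roc_auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

An AUC of 0.5 corresponds to chance and 1.0 to perfect separation of survivors and non-survivors.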

Standardized mortality ratios (SMRs), obtained by dividing the observed by the expected number of deaths, and 95% confidence intervals were computed to test the models' uniformity-of-fit across subgroups of ICU patients. Simple χ² tests were applied to analyse differences between predicted and actual outcomes. Continuous variables and differences between the means of normally distributed variables were analysed by Student's t-test.
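The SMR calculation can be sketched as follows. The paper does not state how the confidence intervals were obtained; the sketch assumes a common approximation in which the observed number of deaths is treated as a Poisson count and the interval is formed on the log scale. The function name is illustrative, and the expected numbers of deaths are the column totals of Table 3.

```python
import math


def smr_with_ci(observed, expected, z=1.96):
    """Standardized mortality ratio with an approximate 95% confidence interval.

    Assumption: log(SMR) is approximately normal with standard error
    1 / sqrt(observed), i.e. the observed deaths are treated as Poisson.
    """
    smr = observed / expected
    half_width = z / math.sqrt(observed)
    return smr, smr * math.exp(-half_width), smr * math.exp(half_width)


# Observed hospital deaths and model-specific expected deaths (Table 3 totals).
observed_deaths = 4419
expected_deaths = {"SAPS II": 3781.7, "APACHE II": 3731.0, "APACHE III": 3575.0}

for model, expected in expected_deaths.items():
    smr, lower, upper = smr_with_ci(observed_deaths, expected)
    print(f"{model}: SMR {smr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under this assumption the overall ratios come out as 1.17 (1.13 to 1.20) for SAPS II, 1.18 (1.15 to 1.22) for APACHE II and 1.24 (1.20 to 1.27) for APACHE III, in line with the totals reported in Tables 2 and 4.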

Results

The overall hospital death rate (26.5%) was significantly higher than predicted by all three models (p<0.0001, t-test). Hospital mortality varied 2.8-fold across the participating ICUs (15.4–43.1%). The overall ICU mortality was 18.3% and showed a 3.4-fold variation between the individual ICUs (9.6–32.4%). SMRs for the whole population were 1.17 (CI: 1.13–1.20; p<0.001) for SAPS II, 1.18 (CI: 1.15–1.22; p<0.001) for APACHE II and 1.24 (CI: 1.20–1.27; p<0.001) for APACHE III.

Demographic and clinical patient data are summarised in Table 1. Hospital mortality for females, who accounted for 41.2% (n=6,866) of the patients, was similar to that of male patients. Non-survivors were significantly older than survivors (p<0.0001, t-test). Mortality was significantly higher for medical than for surgical admissions (p<0.0001, χ² test), and emergency surgical patients had a significantly higher death rate than elective surgical patients (p<0.0001, χ² test). For all models, mean severity-of-illness scores and mean mortality probabilities were significantly higher (p<0.0001, t-test) for non-survivors.

For all models, the SMRs for individual ICUs showed wide variation across the spectrum of participating hospitals (Table 2). The SMRs ranged from 0.92 to 1.53 for SAPS II, from 0.93 to 1.55 for APACHE II and from 0.98 to 1.56 for the APACHE III model. Only two of seventeen intensive care units had mortality ratios less than 1.0. Significant discrepancies between the models occurred in only three ICUs. Applying the SAPS II and APACHE II methods, “ICU performance” was as good as expected for seven of the seventeen units. Observed hospital mortality was higher than predicted by SAPS II and APACHE II (SMR>1) for ten ICUs. When the


Table 1 Demographic characteristics, operative status and observed mortality rates. Non-survivors were significantly older (t-test, p<0.001). Mortality was significantly higher for medical versus surgical patients and emergency versus elective surgical patients (p<0.0001, χ² test). For all models, mean scores and mortality probabilities were significantly higher for non-survivors (p<0.001, t-test)

                               All             Survivors       Non-survivors
                               (n=16,646)      (n=12,227)      (n=4,419)

Age (years, range)             61 (16–101)     58 (16–101)     68 (16–99)

Male (%)                       9770 (58.8)     7232 (43.4)     2538 (13.4)
Female (%)                     6866 (41.2)     4975 (30.0)     1881 (11.3)
Medical (%)                    9817 (59.0)     6704 (68.3)     3113 (31.7)
Surgical (%)                   6829 (41.0)     6169 (80.9)     1305 (19.1)
  Elective                     4185 (25.1)     3864 (89.1)     421 (10.1)
  Emergency                    2644 (15.9)     2305 (66.6)     884 (33.4)

Median length of stay (days)
  Hospital stay                9.2 (1.3–17.4)  10.5 (5.1–18.9) 4.1 (1.3–10.5)
  ICU stay                     1.8 (0.9–4.0)   1.7 (0.9–3.7)   2.3 (0.9–6.2)

Scores
  SAPS II                      34 (17)         28 (14)         51 (18)
  APACHE II                    15 (7)          13 (6)          22 (8)
  APACHE III                   57 (25)         48 (22)         85 (29)

Mortality probabilities (%)
  SAPS II                      22.7 (27)       13.8 (17)       47.3 (29)
  APACHE II                    22.4 (23)       15.2 (15)       42.3 (25)
  APACHE III                   21.5 (26)       12.3 (16)       46.8 (28)

Table 2 Intensive care unit and hospital mortality and standardized mortality ratios (SMRs) for the individual intensive care units (95% CI confidence intervals for the SMRs). The p value is for simple chi-square tests used for the analysis of the differences between observed and estimated death rates

                                          SAPS II                    APACHE II                  APACHE III
ICU    Admissions  ICU mort.  Hosp. mort. SMR   95% CI     p value   SMR   95% CI     p value   SMR   95% CI     p value
                   (%)        (%)

1      504         13.9       23.2        1.09  0.90–1.30  0.40      0.99  0.82–1.18  0.93      1.23  1.02–1.48  0.0273
2      728         19.9       30.8        1.19  1.04–1.35  <0.01     1.28  1.11–1.45  <0.001    1.36  1.19–1.55  <0.001
3      777         19.7       29.3        1.20  1.05–1.36  <0.01     1.29  1.12–1.46  <0.001    1.32  1.15–1.50  <0.001
4      1,167       15.2       22.7        1.06  0.93–1.19  0.37      1.03  0.91–1.16  0.67      1.08  0.96–1.22  0.197
5      829         23.4       33.7        1.18  1.04–1.32  <0.01     1.22  1.08–1.38  <0.001    1.39  1.23–1.57  <0.001
6      1,092       24.3       35.8        1.12  1.01–1.23  0.03      1.23  1.11–1.35  <0.001    1.23  1.11–1.35  <0.001
7      121         13.2       24.8        1.38  0.93–1.98  0.09      1.22  0.82–1.75  0.31      1.21  0.81–1.73  0.349
8      432         32.4       43.1        1.53  1.32–1.77  <0.001    1.55  1.34–1.79  <0.001    1.56  1.34–1.80  <0.001
9      2,484       9.6        15.4        0.92  0.83–1.01  0.09      0.93  0.84–1.02  0.14      0.98  0.88–1.08  0.657
10     860         31.2       4.2 [sic]   1.37  1.23–1.52  <0.001    1.48  1.33–1.64  <0.001    1.38  1.24–1.53  <0.001
11     789         21.0       28.0        1.12  0.97–1.27  0.11      1.10  0.96–1.26  0.16      1.18  1.03–1.34  0.0173
12     1,125       17.2       26.5        1.23  1.09–1.37  <0.001    1.28  1.14–1.43  <0.001    1.29  1.15–1.44  <0.001
13     640         18.1       27.0        1.44  1.23–1.67  <0.001    1.40  1.20–1.62  <0.001    1.32  1.13–1.53  0.0003
14     2,562       17.0       24.9        1.31  1.21–1.42  <0.001    1.18  1.09–1.28  <0.001    1.20  1.10–1.29  <0.001
15     1,447       20.2       25.3        1.08  0.97–1.20  0.14      1.16  1.04–1.28  <0.01     1.29  1.16–1.43  <0.001
16     424         18.4       26.7        1.23  1.02–1.48  0.03      1.10  0.91–1.32  0.33      1.29  1.06–1.55  0.0086
17     665         12.9       21.4        0.94  0.79–1.11  0.49      1.06  0.89–1.25  0.53      1.18  1.00–1.39  0.0508

Total  16,646      18.3       26.6        1.17  1.13–1.20  <0.001    1.18  1.15–1.22  <0.001    1.24  1.20–1.27  <0.001

APACHE III equation was used for case mix adjustment, only four ICUs performed as expected; for thirteen units, the actual mortality was higher than predicted by APACHE III.

Analysis of the overall goodness-of-fit demonstrated good discrimination, but imperfect calibration for all three models. Discrimination, as tested by the AUC, was best for the APACHE III model. For both SAPS II (0.852 vs 0.860) and APACHE III (0.867 vs 0.90), the AUCs were only marginally smaller than those of the original databases [1, 3]. For APACHE II, the AUC was 0.835; no AUC was reported in the original description [2].

Hosmer-Lemeshow C-statistics showed poor calibration (p<0.001) for all three models (Table 3): SAPS II (χ²=287.5), APACHE II (χ²=232.1) and APACHE III (χ²=443.3). The calibration curves for APACHE II and III (Figs. 1a and b) lay close to the diagonal at both ends of the spectrum of predicted risks (<20% and >80%). For the risk strata from 25% to 75%, both curves lay above the line of ideal prediction; this deviation was more pronounced for APACHE III. The SAPS II curve (Fig. 1c) lay closer to the diagonal for the strata with predicted risks from 25% to 75%, but deviated from the calibration line for higher risk groups (>80%). SAPS II risk predictions were significantly different from observed mortality for 13 out of 20 risk groups. The respective figures for APACHE II and APACHE III were 14 and 15. Overall, the mortality of high risk patients was better reflected by the curves of the APACHE models, whereas the mortality in the intermediate risk groups was better described by the SAPS II curve. Low risk patients were similarly well described by all methods.
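The 20 risk groups referred to above follow from stratifying predicted risk in 5% intervals. A minimal sketch of how such calibration-curve points could be tabulated is given below; the function name and the example data are hypothetical, and the significance testing per risk group reported in the text is not reproduced.

```python
def calibration_table(probabilities, died, n_bins=20):
    """Group patients into equal-width risk strata (5% wide for n_bins=20).

    Returns one row per non-empty stratum with the mean predicted risk
    (x-axis of the calibration curve) and the observed death rate (y-axis).
    """
    counts = [0] * n_bins
    predicted_sum = [0.0] * n_bins
    deaths = [0] * n_bins

    for p, d in zip(probabilities, died):
        i = min(int(p * n_bins), n_bins - 1)  # a risk of exactly 1.0 goes in the top stratum
        counts[i] += 1
        predicted_sum[i] += p
        deaths[i] += 1 if d else 0

    rows = []
    for i in range(n_bins):
        if counts[i] == 0:
            continue
        rows.append({
            "lower": i / n_bins,
            "upper": (i + 1) / n_bins,
            "n": counts[i],
            "predicted": predicted_sum[i] / counts[i],
            "observed": deaths[i] / counts[i],
        })
    return rows


# Hypothetical example with five patients.
for row in calibration_table([0.02, 0.04, 0.52, 0.53, 0.97], [0, 0, 1, 0, 1]):
    print(row)
```

Points above the diagonal (observed greater than predicted) indicate that the model underestimates mortality in that stratum.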

More than 90% of our ICU patients were admitted from the operating/recovery room, the emergency department and the wards of the same hospital. Less than 10% were transferred from ICUs and wards of other hospitals, or were directly admitted to the ICU (Table 4). Apart from the predictions by SAPS II for admissions from the emergency department (SMR=0.99), the actual mortality was higher than estimated (SMR>1) for the three major patient locations with all three models. SMRs were similarly raised for patients transferred from other intensive care units. Predictions matched the observed mortality for patients directly admitted and the group admitted from non-ICU facilities of other hospitals.


Table 3 Hosmer-Lemeshow C-statistics for SAPS II, APACHE II and APACHE III (df degrees of freedom, No. number)

          SAPS II                            APACHE II                          APACHE III
Patients  Decile      No. of deaths          Decile      No. of deaths          Decile      No. of deaths
                      Observed  Estimated                Observed  Estimated                Observed  Estimated

1,664     0–1.5       22        15.4         0–2.7       32        24.7         0–1.3       35        11.5
1,665     1.5–2.9     77        39.5         2.7–4.8     71        61.7         1.3–2.4     46        30.7
1,664     2.9–5.2     121       67.2         4.8–7.3     128       98.8         2.4–3.5     77        53.0
1,665     5.2–7.2     166       104.3        7.3–10.1    187       143.0        3.5–6.4     128       85.4
1,665     7.2–16.7    223       155.0        10.1–14.1   235       199.7        6.4–9.7     212       132.0
1,664     16.7–26.6   303       233.5        14.1–19.3   351       276.0        9.7–15.5    303       205.5
1,665     26.6–41.5   534       355.4        19.3–27.1   472       385.7        15.5–25.1   468       332.9
1,664     41.5–57.5   640       552.8        27.1–38.7   710       543.1        25.1–40.2   723       531.1
1,665     57.5–66.1   998       885.5        38.7–56.9   925       785.6        40.2–64.5   1036      857.9
1,665     66.1–100    1335      1373.2       56.9–100    1308      1212.8       65.5–100    1391      206.8 [sic]
16,646                4419      3781.7                   4419      3731.0                   4419      3575.0

          χ²=287.5, df=8, p<0.0001           χ²=232.1, df=8, p<0.0001           χ²=443.3, df=8, p<0.0001
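For grouped data of the kind shown in Table 3, the Hosmer-Lemeshow C-statistic sums, over the risk groups, the squared difference between observed and estimated deaths scaled by the binomial variance of the estimate. The sketch below uses this standard textbook form with hypothetical groups; it is an illustration under that assumption and makes no attempt to reproduce the published χ² values.

```python
def hosmer_lemeshow_c(groups):
    """Hosmer-Lemeshow C-statistic from grouped data.

    groups: iterable of (n_patients, observed_deaths, estimated_deaths).
    Each group contributes (O - E)^2 / (E * (1 - E / n)), which equals the
    sum of the chi-square terms for deaths and for survivors in that group.
    """
    statistic = 0.0
    for n, observed, estimated in groups:
        variance = estimated * (1.0 - estimated / n)
        statistic += (observed - estimated) ** 2 / variance
    return statistic


# Hypothetical example: two risk groups of 100 patients each.
example = [
    (100, 10, 10.0),  # perfect agreement, contributes 0
    (100, 30, 20.0),  # 10 excess deaths, contributes 100 / 16
]
print(hosmer_lemeshow_c(example))
```

Larger values indicate poorer agreement between observed and estimated deaths; the statistic is referred to a χ² distribution with the degrees of freedom given in the table.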

Table 4 Standardized mortality ratios for patient location before ICU admission. 95% confidence intervals for the standardized mortality ratios in parentheses

Location                  Admissions (%)   Deaths (%)   SAPS II            APACHE II          APACHE III

Operating/recovery room   6,829 (41.0)     19.1         1.20 (1.13–1.26)   1.09 (1.03–1.15)   1.25 (1.19–1.32)
Emergency department      4,369 (26.4)     24.7         0.99 (0.93–1.05)   1.19 (1.12–1.26)   1.24 (1.17–1.32)
Ward                      3,987 (24.0)     39.8         1.33 (1.26–1.39)   1.28 (1.22–1.35)   1.24 (1.18–1.31)
ICU other hospital        674 (4.0)        36.8         1.22 (1.08–1.39)   1.26 (1.11–1.46)   1.21 (1.07–1.37)
Other hospital            591 (3.6)        30.3         0.99 (0.85–1.15)   1.05 (0.90–1.21)   1.07 (0.92–1.24)
Direct admission          169 (1.0)        6.5          0.69 (0.34–1.23)   0.82 (0.41–1.47)   1.09 (0.54–1.96)

Total                     16,646 (100)                  1.17 (1.13–1.20)   1.18 (1.15–1.22)   1.24 (1.20–1.27)

Discussion

The external validation of three widely used prognostic models showed good discrimination, but imperfect calibration for all three models when applied to the same independent population of English ICU patients. Our results accord with other reports on the performance of APACHE II and APACHE III in Britain [10, 12]. The same pattern was observed in the external validation of the SAPS II, APACHE II and III models in Scottish intensive care patients [15] and was also reported for the SAPS II model applied to European patients [8, 9]. In a large US database, the external validation of the APACHE III model showed very good discrimination of the model but imperfect calibration [11]. Only one study reported good calibration for the APACHE II model, but again imperfect calibration for the two other methods tested [6].

Case mix

Disparity between the case mix of test and reference databases is one of the main sources for the decay of the predictive accuracy of prognostic models when applied to populations other than those they were developed for [20, 21]. In fact, substantial differences in the case mix of UK patients and the population of the original APACHE II and III investigations have been reported [12, 22, 23]. The SAPS II model also failed to adjust adequately for differences in the case mix profiles of ICU patients from various European countries [8, 9, 24, 25]. The distribution of case mix factors examined in our study was comparable to that reported by Pappachan et al. [12], who used the same database but a shorter study period than we did in the present investigation. In essence, patients in England were older, had greater co-morbidity and were sicker than the US patients. More English patients were admitted from the hospital wards rather than from the emergency department, and the proportion of emergency surgical admissions was higher. Different models use different inclusion criteria, for instance the minimal length of ICU stay. We applied the criterion described for the original APACHE III model (exclusion of ICU stays shorter than 4 h), while for the SAPS II and APACHE II reference databases a minimal ICU stay of more than 8 h was used. This could have altered the case mix and may have contributed to the differences in performance between the models.

Quantifying case mix

Many studies have described the impact of case mix on the predictive accuracy of prognostic models, but methods for quantifying these effects have not been widely applied. Using computer simulation techniques, Murphy-Filkins et al. [26] analysed the effect of the diversity in patient mix on the performance of the MPM0 admission model. A stepwise increase and decrease of the proportions of each of the model's variables was performed until a “critical percentage” was reached at which the model's calibration for the population was no longer considered acceptable. This is an important concept because it quantifies the influence of major patient factors on model performance for a particular population. The MPM0 admission model [27] includes almost exclusively binary variables that refer to the presence or absence of a certain condition, but this method could be modified for physiology-based severity systems, which primarily rely on continuous variables, and may also be applicable for quantifying unmeasured case mix factors.


Fig. 1 a APACHE II calibration curve. b APACHE III calibration curve. c SAPS II calibration curve. The bars represent the number of patients in each risk group. The dashed diagonal line indicates ideal prediction (predicted = observed mortality). Error bars represent the 95% confidence intervals

Unmeasured case mix

Most prognostic models have traditionally focused on patient characteristics, deranged physiology and co-morbidities, but it is increasingly recognised that outcomes also depend on a variety of clinical and non-clinical factors that are not measured and therefore cannot be adjusted for by the models. If one or more important variables are not present in the model, this model may not be transportable to centres with a different case mix [20]. Possibly relevant, but unmeasured, case mix factors can comprise the entire spectrum from pre- to post-ICU care, including the organisational and structural framework of care, admission and discharge policies, and the level of discharge facilities available. Currently, many of these factors seem to undergo rapid transformation.

At the time of our investigation, no pre-defined, uniform admission and discharge criteria existed and most of the participating hospitals did not have designated intermediate care facilities. It was impossible to determine exactly the number of available high dependency unit (HDU) beds in the hospitals during the study period because the provision of the few HDU beds varied, depending on the actual funding and the availability of trained nursing staff. However, the fact that 71% of our ICU patients were discharged to “normal” wards and less than 1% were discharged to HDU facilities in the same hospital is a reflection of the lack of intermediate care facilities in most hospitals. Similarly, about 4% of our ICU patients were transferred to intensive care units and another 4% were discharged to non-ICU facilities in other hospitals.

In a recent investigation [28], we examined the effects of discharge facilities on post-ICU mortality, using data from a single intensive care unit which also participated in the present study. Our results demonstrated that premature ICU discharge, defined by inappropriately high discharge TISS scores, was common and was associated with increased mortality. Patients who were discharged “prematurely” to intermediate care facilities had lower mortality rates than patients who were discharged to hospital wards. Thus, the provision of HDU beds contributed to a reduction in post-ICU deaths for such patients. It can be assumed that the low availability of intermediate care beds could also have contributed to the higher mortality rates observed in the present investigation. The absence of uniform admission and discharge policies may also explain the large variation between the individual ICUs, but these factors are difficult to assess in a standardised manner.

Patient location

Patient location before ICU admission is an example of an important case mix factor which is not measured by APACHE II and SAPS II. By contrast, it was introduced as a predictor variable into the APACHE III equation, but accounts for only 1% of the model's explanatory power [3]. Patient location serves as an indirect or proxy measure of the delay in instituting intensive therapy, which is assumed to be associated with increased mortality. For instance, a large proportion of patients admitted from the hospital wards, rather than from the emergency department, is considered an indicator of delayed intensive therapy and has been shown to be associated with higher mortality rates [11, 24]. Compared with the data from Europe [9] and the US [11], a larger proportion of our patients was admitted from the hospital wards (UK: 24.0% vs Europe: 18.2% vs US: 11.7%) and a smaller proportion was admitted from the emergency department (UK: 26.4% vs Europe: 29.2% vs US: 39.0%). Except for the predictions by SAPS II for admissions from the emergency department (SMR=0.99), the observed mortality in our population was higher than estimated (SMR>1) for the three major patient locations with all three models.

Prevalence of outcome events

Another reason for the deterioration of the models' fit is that the prevalence of the outcome events in a new series of patients may be lower or higher than in the population in which the model was developed [29]. The hospital mortality rates reported in the original descriptions of the APACHE II, APACHE III and SAPS II models were 19.6%, 17.0% and 21.8%, respectively. The mortality rate for our population was 26.5%. Other investigations reported similarly increased mortality rates ranging from 29.4% to 34.1% [8, 15, 22]. Two computer simulation studies showed that as the difference between original and simulated mortality rates increases, the calibration of the models deteriorates progressively [30, 31]. The higher hospital mortality rates in our and other studies may have contributed to the consistently observed deterioration in the calibration of standard models for new populations.
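The effect of a changed outcome prevalence can be illustrated with a toy calculation. The sketch below is not the method of the simulation studies cited above [30, 31]; it assumes, purely for illustration, that the true risk of every patient equals the model prediction shifted by a constant on the logit scale, and it uses hypothetical predicted risks. Such a shift preserves the ranking of patients, so discrimination is unchanged, while the ratio of true to predicted deaths drifts away from 1.

```python
import math


def shift_risk(p, delta):
    """Shift a probability by delta on the logit scale (assumed mechanism)."""
    odds = p / (1.0 - p) * math.exp(delta)
    return odds / (1.0 + odds)


def expected_smr(predicted, delta):
    """Ratio of true to predicted deaths when true risks are shifted by delta."""
    true_deaths = sum(shift_risk(p, delta) for p in predicted)
    return true_deaths / sum(predicted)


# Hypothetical predicted risks for six patients.
predicted = [0.02, 0.05, 0.10, 0.20, 0.40, 0.70]

for delta in (0.0, 0.25, 0.5):
    print(f"logit shift {delta:.2f}: expected SMR {expected_smr(predicted, delta):.3f}")
```

The larger the shift in baseline risk, the further the expected SMR moves from 1, even though the ordering of patients by risk is the same.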

Customisation

Prognostic models that imperfectly characterise the mortality of a specific population can be adjusted by customising them to obtain more reliable mortality estimates [29, 30]. Customisation of the original SAPS II, APACHE III and MPM0 admission models was successfully employed to improve the models’ fit for Italian, Spanish and European ICU patients [8, 32, 33]. We re-calibrated the original SAPS II model for a population of 15,511 ICU patients in South England [34], using the same database as in the current investigation, and showed that customisation resulted in significantly improved calibration: the Hosmer-Lemeshow χ2 value decreased from 306 before to 14.5 after customisation, while model discrimination was not affected. Customised models are useful for quality assurance purposes [30] and good, re-calibrated local severity models can be used for performing reliable quality of care comparisons [35]. The major disadvantage is that re-calibration of standard models precludes further external comparisons and restricts the use of the re-calibrated model to the population for which customisation had been performed.
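One common form of customisation refits only an intercept and a slope on the logit of the original predicted probability, leaving the ranking of patients, and therefore discrimination, unchanged. The sketch below illustrates that idea with a plain Newton-Raphson fit. It is an assumption-laden illustration and not necessarily the procedure used in [34]; the function `recalibrate` and the toy data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def recalibrate(original_risks, died, iterations=25):
    """Fit logit(new_risk) = a + b * logit(original_risk) by maximum
    likelihood (Newton-Raphson) and return the pair (a, b)."""
    x = [logit(p) for p in original_risks]
    a, b = 0.0, 1.0  # start from the unmodified model
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, died):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # score for the intercept
            g1 += (yi - mu) * xi   # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


# Hypothetical toy data: four risk levels, 20 patients each, with more
# deaths than the original model predicts (illustration only).
levels = [(0.1, 4), (0.3, 9), (0.5, 13), (0.7, 17)]
original_risks, died = [], []
for risk, deaths in levels:
    original_risks += [risk] * 20
    died += [1] * deaths + [0] * (20 - deaths)

a, b = recalibrate(original_risks, died)
new_risks = [expit(a + b * logit(p)) for p in original_risks]

smr_before = sum(died) / sum(original_risks)
smr_after = sum(died) / sum(new_risks)
print(f"a={a:.3f}  b={b:.3f}  SMR before={smr_before:.2f}  SMR after={smr_after:.2f}")
```

Because the fit includes an intercept, the recalibrated probabilities sum to the observed number of deaths in the fitting sample, so the SMR in that sample becomes 1. This also shows why a customised model is tied to the population on which it was refitted.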

Changes in structure, organisation and financing of the health care systems are having profound effects on the provision of hospital and intensive care and may further degrade the validity of standard severity models. The earlier discharge of patients from acute care hospitals has become common practice in many countries and can affect the performance of models that rely on vital status at hospital discharge as the principal outcome measure. In-hospital mortality may no longer represent an adequate end point and alternative outcome measurements are needed [36, 37]. Increasing pressure on ICU beds may result in delayed or refused ICU admission [38, 39] or premature ICU discharge [40], and will consequently alter the composition of the ICU populations and their mortality experience. Prognostic models should be validated and, where appropriate, can be periodically re-calibrated to maintain their validity in the process of these dynamic developments.

In conclusion, the mortality probabilities for all models tested did not accurately reflect the mortality experience of ICU patients in South England. Disparities in case mix and mortality rates between Britain and other countries are the most likely explanations for the models’ deficiencies in fitting British ICU populations. Further research should focus on the detailed description of the case mix of ICU populations, including methods for quantifying measured and unmeasured patient mix factors. Attempts should be made to investigate newly evolving trends in the delivery of health care and their effects on the composition of ICU populations and validity of outcome measures. Comparisons regarding quality issues are limited by the lack of valid methods for standardising case mix, but as we are becoming increasingly accountable for the quality and the cost of intensive care, reliable data on its efficiency and efficacy are needed. Authentic information on disease severity is a prerequisite for reliable quality of care comparisons and the re-calibration of existing prognostic models seems a pragmatic strategy to obtain this information without much delay.

Acknowledgements We acknowledge the directors and staff of the following ICUs for their co-operation and enthusiasm in the collection of data for this study: Ashford Hospital, Middlesex; Crawley Hospital; East Surrey Hospital, Redhill; Epsom General Hospital; Frimley Park Hospital; Haywards Heath Hospital; Kingston Hospital; Mayday Hospital, Croydon; Queen Mary’s Hospital, Roehampton; Royal Surrey County Hospital, Guildford; St. Helier Hospital, Carshalton; St. Peter’s Hospital, Chertsey; St. Richard’s Hospital, Chichester; Worthing Hospital; General ICU, St. George’s Hospital, Tooting; Queen Alexandra Hospital, Portsmouth and St. Mary’s Hospital, Portsmouth.


References

1. Le Gall JR, Lemeshow S, Saulnier F (1993) Development of a new scoring system, the SAPS II, from a European/North American multicenter study. JAMA 270:2957–2963

2. Knaus WA, Draper EA, Wagner DP et al. (1985) APACHE II: a severity of disease classification system. Crit Care Med 13:818–829

3. Knaus WA, Wagner DP, Draper EA et al. (1991) The APACHE III prognostic system: risk prediction of hospital mortality for critically ill hospitalized adults. Chest 100:1619–1639

4. Wong DT, Crofts SL, Gomez M, McGuire GP, Byrick RJ (1995) Evaluation of predictive ability of APACHE II system and hospital outcome in Canadian intensive care patients. Crit Care Med 23:1177–1183

5. Sirio CA, Tajimi K, Tase C, Knaus WA, Wagner DP, Hirasawa H et al. (1992) An initial comparison of intensive care in Japan and the United States. Crit Care Med 20:1207–1215

6. Markgraf R, Deutschinoff G, Pientka L, Scholten T (2000) Comparison of Acute Physiology and Chronic Health Evaluation score II and III and Simplified Acute Physiology Score II: a prospective cohort study evaluating these models to predict outcome in a German multidisciplinary intensive care unit. Crit Care Med 28:26–33

7. Castella X, Artigas A, Bion J, Kari A (1995) A comparison of severity of illness scoring systems for intensive care unit patients: results of a multicenter, multinational study. Crit Care Med 23:1327–1335

8. Apolone G, Bertolini G, D’Amico R, Iapichino G, Cattaneo A, De Salvo G, Melotti RM (1996) The performance of SAPS II in a cohort of Italian ICUs: results from GiViTI. Intensive Care Med 22:1368–1378

9. Moreno R, Reis Miranda D, Fidler V, Van Schilfgaarde R (1998) Evaluation of two outcome prediction models on an independent database. Crit Care Med 26:50–61

10. Rowan KM, Kerr JH, Major E, McPherson K, Short A, Vessey MP (1993) Intensive Care Society’s APACHE II study in Britain and Ireland II: outcome comparisons of intensive care units after adjustment for case mix by the American APACHE II method. BMJ 307:977–981

11. Zimmerman JE, Wagner DP, Draper EA et al. (1998) Evaluation of Acute Physiology and Chronic Health Evaluation III predictions of hospital mortality in an independent database. Crit Care Med 26:1317–1326

12. Pappachan JV, Millar B, Bennett D, Smith GB (1999) Comparison of outcome from intensive care admission after adjustment for case mix by the APACHE III prognostic system. Chest 115:802–805

13. Bastos PG, Sun X, Wagner DP, Knaus WA, Zimmerman JE (1996) Application of the APACHE III prognostic system in Brazilian intensive care units: a prospective multicenter study. Intensive Care Med 22:564–570

14. Rowan KM, Kerr JH, Major E et al. (1994) Intensive Care Society’s Acute Physiology and Chronic Health Evaluation (APACHE II) study in Britain and Ireland: a prospective, multicenter, cohort study comparing two methods for predicting outcome for adult intensive care patients. Crit Care Med 22:1392–1401

15. Livingston BM, MacKirdy FN, Howie JC, Jones R, Norrie JD (2000) Assessment of the performance of five intensive care scoring models within a large Scottish database. Crit Care Med 28:1820–1827

16. Beck DH, Taylor BL, Millar B, Smith GB (1997) Prediction of outcome from intensive care: a prospective cohort study comparing APACHE II and III in a UK intensive care unit. Crit Care Med 25:9–15

17. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36

18. Hosmer DW, Lemeshow S (1989) Applied logistic regression. John Wiley, New York, pp 135–149

19. Lemeshow S, Hosmer DW (1982) A review of the goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 115:92–106

20. Altman DG, Royston P (2000) What do we mean by validating prognostic models? Stat Med 19:453–473

21. Harrell FE, Lee K (1984) Regression modelling strategies for improved prognostic prediction. Stat Med 3:143–152

22. Goldhill DR, Withington PS (1996) The effect of case mix adjustment on mortality as predicted by APACHE II. Intensive Care Med 22:415–419

23. Rowan KM, Kerr JH, Major E et al. (1993) Intensive Care Society’s APACHE II study in Britain and Ireland I: variations in case mix of adult admissions to general intensive care units and impact on outcome. BMJ 307:972–977

24. Moreno R, Apolone G, Reis Miranda D (1998) Evaluation of the uniformity of fit of general outcome prediction models. Intensive Care Med 24:40–47

25. Metnitz PGH, Vesely H, Valentin A, Popow C, Hiesmayr M, Lenz K, Krenn CG, Steltzer H (1999) Evaluation of an interdisciplinary data set for national intensive care unit assessment. Crit Care Med 27:1486–1491

26. Murphy-Filkins RL, Teres D, Lemeshow S, Hosmer DW (1996) Effect of changing patient mix on the performance of intensive care unit severity-of-illness models: how to distinguish a general from a speciality intensive care unit. Crit Care Med 24:1968–1973

27. Lemeshow S, Teres D, Klar J et al. (1993) Mortality Probability Models (MPM II) based on an international cohort of intensive care unit patients. JAMA 270:2478–2486

28. Beck DH, McQuillan PJ, Smith GB (2002) Waiting for the break of dawn? The effects of discharge TISS scores, discharge time and discharge facilities on mortality after intensive care. Intensive Care Med 28:1287–1293

29. Miller ME, Hui SL, Tierney WM (1991) Validation techniques for logistic regression models. Stat Med 10:1213–1226

30. Zhu PG, Lemeshow S, Hosmer DW, Klar J, Avrunin J, Teres D (1996) Factors affecting the performance of the models in the Mortality Prediction Model II system and strategies of customization: a simulation study. Crit Care Med 24:57–63

31. Glance LG, Osler TM, Papadakos P (2000) Effect of mortality rate on the performance of the Acute Physiology and Chronic Health Evaluation II: a simulation study. Crit Care Med 28:3424–3428

32. Rivera-Fernandez R, Vazquez-Mata G, Bravo M, Aguayo-Hoyos E, Zimmerman JE, Wagner DP, Knaus WA (1998) The APACHE III prognostic system: customized mortality predictions for Spanish ICU patients. Intensive Care Med 24:574–581

33. Moreno R, Apolone G (1997) Impact of different customization strategies in the performance of a general severity score. Crit Care Med 25:2001–2008

34. Beck DH, Smith GB, Pappachan JV (2002) The effects of two methods for customising the original Simplified Acute Physiology Score (SAPS) II for intensive care patients from South England. Anaesthesia 57:785–793

35. Sirio CA, Shepardson LB, Rotondi AJ, Cooper GS, Angus DC, Harper DL, Rosenthal GE (1999) Community-wide assessment of intensive care outcomes using a physiologically based prognostic measure. Chest 115:793–801

36. Teres D, Lemeshow S (1998) As American as apple pie and APACHE (editorial). Crit Care Med 26:1297–1298

37. Ridley S (2001) Critical care outcomes. Anaesthesia 56:1–3

38. Smith GB, Taylor BL, McQuillan PJ, Nials E (1995) Rationing intensive care: intensive care provision varies widely in Britain. BMJ 310:1412–1413

39. Metcalfe A, McPherson K, Sloggett A (1997) Mortality among appropriately referred patients refused admission to intensive care units. Lancet 350:7–11

40. Goldfrad C, Rowan K (2000) Consequences of discharges from intensive care at night. Lancet 355:1138–1142
