Tackling over-dispersion in NHS performance indicators
Robert Irons (Analyst – Statistician)
Dr David Cromwell (Team Leader)
20/10/2004
2
Outline of presentation
• NHS Star Ratings Model
• Criticism of some of the indicators
• The reason – overdispersion
• Options for tackling the problem
• Our solution – an additive random effects model
• Effects on the ratings indicators
3
Performance Assessment in the UK
• 1990s: Government focused on efficiency
• 1997: Labour replaces Conservative government
• Late 90s: Labour focus on quality & efficiency
– Define Performance Assessment Framework
– Publish NHS Plan in 2000
– Commission for Health Improvement (CHI) created
– Performance ratings first published in 2001, responsibility passed to CHI for 2003 publication
– Healthcare Commission replaces CHI in April 2004, has broader inspection role
4
NHS Performance Ratings
• An ‘at a glance’ assessment of NHS trusts’ performance
– Performance rated as 0, 1, 2, or 3 stars
– Yearly publication
• Focus on how trusts deliver government priorities
– Linked to implementation of key policies
• Priorities and Planning framework
• National Service Frameworks
• Have limited role in direct quality improvement
– Modernisation Agency helps trusts with low rating
5
Scope of NHS ratings
2001 2002 2003 2004
Acute trusts
Ambulance trusts
Mental health trusts
Primary care trusts
6
The ratings model
• Overall rating derived from many different indicators
– and affected by Clinical Governance Reviews
• Two types of indicators, organised in 4 groups
– Key targets & Balanced Scorecard indicators
– BS indicators grouped into 3 focus areas
• Patient focus, clinical focus, capacity & capability

Trust type            Key targets  Balanced scorecard
Acute                 9            35
Ambulance             4            19
Mental health         7            31
Primary care trusts   9            33
7
Combining the indicators
• Indicators are measured on different scales
– Categorical (e.g. Yes/No)
– Proportional (e.g. proportion of patients waiting longer than 15 months)
– Rates (e.g. mortality rate within 30 days following selected surgical procedures)
• Further complication
– Performance on some indicators is measured against published targets, which define thresholds
– Performance on other indicators is based on relative differences between trusts
8
Combining the indicators
• Indicators first transformed so they are all on an equivalent scale
• Key targets assigned to three levels:
– achieved
– under-achieved
– significantly under-achieved
• Balanced scorecard indicators
– 1 – significantly below average (worst performance)
– 2 – below average
– 3 – average
– 4 – above average
– 5 – significantly above average (best performance)
9
Transforming the indicators
• Key target indicators transformed using thresholds defined by government policy
• Balanced scorecard indicators transformed via several methods
– Percentile method
– Statistical method
– Absolute method, if policy target exists
– Mapping method (for indicators with ordinal scales)

Method            Acute trusts  Ambulance trusts  Mental health trusts  Primary care trusts
Percentile        11            3                 9                     11
Statistical       12            8                 9                     11
Absolute          8             3                 5                     4
Defined mapping   4             5                 8                     7
10
Transforming the indicators: the statistical method

Indicators                  Acute trusts  Ambulance trusts  Mental health trusts  Primary care trusts
Clinical indicators         4             -                 2                     -
Patient survey              5             5                 4                     5
Staff survey                3             3                 3                     3
Change in rate indicators   -             -                 -                     3
11
The old statistical method
• Based on simple confidence intervals
• 95% and 99% confidence intervals calculated for a trust’s indicator value
• Trust confidence interval compared with the overall national rate (effectively a single point)

Band                          Score  Criterion
Significantly below average   1      no 99% confidence interval overlap: higher values
Below average                 2      no 95% confidence interval overlap: higher values
Average                       3      overlapping 95% confidence intervals, e.g. England: 5.51% to 5.55%
Above average                 4      no 95% confidence interval overlap: lower values
Significantly above average   5      no 99% confidence interval overlap: lower values
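The banding rule in this table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `old_band`, the normal-approximation interval (value ± z × standard error) and the example numbers are not from the slides, and lower indicator values are taken to be better, as for readmission rates.

```python
from statistics import NormalDist


def old_band(value, se, national_low, national_high):
    """Band a trust from 1 (worst) to 5 (best) by confidence-interval overlap.

    Assumptions (not from the slides): intervals are value +/- z * se,
    and lower indicator values mean better performance.
    """
    z95 = NormalDist().inv_cdf(0.975)  # about 1.96
    z99 = NormalDist().inv_cdf(0.995)  # about 2.58

    def side_clear_of_national(z):
        # Which side of the national interval the trust interval lies on,
        # or None if the two intervals overlap.
        low, high = value - z * se, value + z * se
        if low > national_high:
            return "higher"
        if high < national_low:
            return "lower"
        return None

    side99 = side_clear_of_national(z99)
    side95 = side_clear_of_national(z95)
    if side99 == "higher":
        return 1  # significantly below average
    if side95 == "higher":
        return 2  # below average
    if side99 == "lower":
        return 5  # significantly above average
    if side95 == "lower":
        return 4  # above average
    return 3      # average


# Hypothetical trust with a 7.0% rate and standard error 0.4,
# compared with the England interval quoted on the slide.
print(old_band(7.0, 0.4, 5.51, 5.55))
```

Because the national interval is so narrow, the band depends almost entirely on the trust's own standard error, which is why precisely measured trusts are so easily pushed into the extreme bands.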
12
The old statistical method: problematic
• Not a proper statistical hypothesis test
• Differentiates between trusts whenever differences exceed the level expected from sampling variation alone
• On some indicators, this led to too many NHS trusts being assigned to the significantly good/bad bands
13
Working example: standardised readmission rate of patients within 28 days of initial discharge

Band                          Number of trusts
Significantly below average   32
Below average                 6
Average                       40
Above average                 13
Significantly above average   49
Total                         140

[Figure: SAR for trusts with > 50 readmissions; vertical axis from 0 to 2]
14
Readmissions within 28 days of discharge: funnel plot (2003/04 data)
[Figure: funnel plot against expected re-admissions; each trust plotted as its old band (1 to 5), with 95% and 99% limits]
15
Mortality within 30 days of selected surgical procedures: funnel plot (2003/04 data)
[Figure: funnel plot with x-axis labelled "exp"; each trust plotted as its old band (1 to 5), with 95% and 99% limits]
16
Z scores
• Standardised residual
• Z scores are used to summarise ‘extremeness’ of the indicators
• Funnel plot limits approximate to the naïve Z score
• Naïve Z score given by
– $z_i = (y_i - t)/s_i$
– Where $y_i$ is the indicator value, $t$ is the value the trust is compared against (e.g. the national rate), and $s_i$ is the local standard error
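As a minimal sketch of the naïve Z score and the matching funnel limits, assuming a normal approximation: the function names `naive_z` and `funnel_limits` and the numbers in the usage lines are illustrative, not taken from the slides.

```python
from statistics import NormalDist


def naive_z(y, t, s):
    """Naive Z score: distance of indicator y from comparison value t, in standard errors s."""
    return (y - t) / s


def funnel_limits(t, s, level=0.95):
    """Lower and upper funnel limits around t for a trust with standard error s.

    Assumes normally distributed sampling error; level is the two-sided coverage.
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return t - z_crit * s, t + z_crit * s


# Hypothetical trust: ratio of 1.3 against a comparison value of 1.0, standard error 0.1.
z = naive_z(1.3, 1.0, 0.1)
lower, upper = funnel_limits(1.0, 0.1, level=0.95)
print(z, lower, upper)
```

A trust falls outside the 95% funnel exactly when the absolute value of its naïve Z score exceeds the same critical value, which is why the two views are interchangeable.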
17
Dealing with over-dispersion
• Three options were considered
– Use of an ‘interval null hypothesis’
– Allow for over-dispersion using a ‘multiplicative variance model’
– …or a ‘random-effects additive variance model’
18
Interval null hypothesis
• Similar to the naïve Z score or standard funnel limits
• Uses a judgement of what constitutes a normal range for the indicator
• Define normal range (e.g. percentiles, national rate ± x%)
• Funnel limits then defined as:
– Upper/lower limit = Range limit ± (x * si0)
• Reduces number of significant results
• But might be considered somewhat arbitrary
• Interval could be defined based on previous years’ data, or prior knowledge
• Makes minimal use of the sampling error
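The limit formula on this slide could be sketched as follows. The slide does not say what the multiplier x is, so it is assumed here to be a normal critical value such as 1.96, and `s0` is assumed to be the trust's standard error under the null. The function names and numbers are illustrative only.

```python
def interval_null_limits(range_low, range_high, s0, x=1.96):
    """Funnel limits built around a judged 'normal range' instead of a single point.

    range_low, range_high: the normal range for the indicator (a judgement).
    s0: the trust's standard error under the null (assumed meaning of si0).
    x:  multiplier on the standard error (assumed to be a critical value).
    """
    return range_low - x * s0, range_high + x * s0


def outside_interval_null(y, range_low, range_high, s0, x=1.96):
    """True if indicator value y lies outside the widened limits."""
    lower, upper = interval_null_limits(range_low, range_high, s0, x)
    return y < lower or y > upper


# Hypothetical normal range of 0.9 to 1.1 and a trust with standard error 0.05.
print(interval_null_limits(0.9, 1.1, 0.05, x=2.0))
print(outside_interval_null(1.25, 0.9, 1.1, 0.05, x=2.0))
```

The width of the normal range is added to every trust's limits regardless of its size, which is the sense in which the approach reduces the number of significant results but rests on a judgement.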
19
Interval null hypothesis: a funnel plot
[Figure: funnel plot with x-axis labelled "exp"; each trust plotted as its old band (1 to 5), with 95% and 99% limits]
20
Multiplicative variance model
• Inflates the variance associated with each observation by an over-dispersion factor ($\phi$):
$$\sum_i z_i^2 = \text{Pearson } X^2, \qquad \hat{\phi} = X^2 / I$$
where $I$ is the number of trusts
• Limits on funnel plot are then expanded by $\sqrt{\hat{\phi}}$
• Do not want to be influenced by the outliers we are trying to identify
• Data are first winsorised (shrinks the extreme z-values in)
• Over-dispersion factor could be provisionally defined based on previous years’ data
• Statistically respectable, based on a ‘quasi-likelihood’ approach
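A minimal sketch of the multiplicative adjustment, assuming the over-dispersion factor (called `phi` here) is the mean of the squared winsorised Z scores and that funnel limits are widened by its square root. The helper names, the percentile convention inside `winsorise` and the example Z scores are assumptions for illustration.

```python
import math


def winsorise(z_scores, q):
    """Pull the lowest and highest 100*q% of Z scores in to the q and 1-q cut-offs."""
    ordered = sorted(z_scores)
    n = len(ordered)
    k = int(n * q)  # number of scores shrunk at each end
    if k == 0:
        return list(z_scores)
    z_low, z_high = ordered[k], ordered[n - 1 - k]
    return [min(max(z, z_low), z_high) for z in z_scores]


def overdispersion_factor(z_scores, q=0.1):
    """phi = (sum of squared winsorised Z scores) / number of trusts."""
    zw = winsorise(z_scores, q)
    return sum(z * z for z in zw) / len(zw)


def multiplicative_limits(t, s, phi, z_crit=1.96):
    """Funnel limits around t, widened by sqrt(phi)."""
    half_width = z_crit * s * math.sqrt(phi)
    return t - half_width, t + half_width


# Hypothetical naive Z scores with two extreme trusts.
z = [-10, -2, -1, 0, 0.5, 1, 1.5, 2, 3, 12]
print(overdispersion_factor(z, q=0.0), overdispersion_factor(z, q=0.1))
```

In this toy example the two extreme trusts dominate the unwinsorised factor, which is the reason the slides winsorise before estimating it.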
21
Multiplicative over-dispersion: a funnel plot (not winsorised, $\hat{\phi}$ = 21.45)
[Figure: funnel plot against expected re-admissions; each trust plotted as its old band (1 to 5), with 95% and 99% limits]
22
Multiplicative over-dispersion: a funnel plot (10% winsorised, $\hat{\phi}$ = 13.97)
[Figure: funnel plot against expected re-admissions; each trust plotted as its old band (1 to 5), with 95% and 99% limits]
23
Winsorising
• Winsorising consists of shrinking in the extreme Z-scores to some selected percentile, using the following method.
1. Rank cases according to their naive Z-scores.
2. Identify Zq and Z1-q, the (100*q)% most extreme top and bottom naive Z-scores, where q might, for example, be 0.1
3. Set the lowest (100*q)% of Z-scores to Zq, and the highest (100*q)% of Z-scores to Z1-q. These are the Winsorised statistics.
• This retains the same number of Z-scores but discounts the influence of outliers.
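The three steps above can be sketched as below. The exact percentile convention is not given on the slide, so the cut-offs are taken here as the values at positions k and n-1-k of the sorted scores, with k = int(n*q); that choice and the function name are assumptions.

```python
def winsorise(z_scores, q=0.1):
    """Winsorise Z scores: shrink the lowest and highest 100*q% to the cut-off values.

    Returns a list of the same length and in the same order as the input.
    """
    ordered = sorted(z_scores)       # step 1: rank the naive Z scores
    n = len(ordered)
    k = int(n * q)                   # number of scores shrunk at each end
    if k == 0:
        return list(z_scores)
    z_q = ordered[k]                 # step 2: lower cut-off Z_q
    z_1q = ordered[n - 1 - k]        #         upper cut-off Z_(1-q)
    # step 3: anything beyond a cut-off is set to that cut-off
    return [min(max(z, z_q), z_1q) for z in z_scores]


# Hypothetical naive Z scores with one extreme value at each end.
print(winsorise([-10, -2, -1, 0, 0.5, 1, 1.5, 2, 3, 12], q=0.1))
```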
24
Winsorising
[Figure: two histograms of the naïve Z scores $z_i$ (x-axis from -14.909 to 11.148, y-axis: fraction), one non-winsorised and one 10% winsorised]
25
Random effects additive variance model
• Based on a technique developed for meta-analysis
• Originally designed for combining the results of disparate studies into the same effect
• In meta-analysis terms, consider the indicator value of each trust to be a separate study
• Essentially seeks to compare each trust to a ‘null distribution’ instead of a point
• Assumes that $E[y_i] = \theta_i$, and $V[\theta_i] = \tau^2$
• Uses a method-of-moments method to estimate $\tau^2$ (DerSimonian and Laird, 1986)
• Based on winsorised estimate of $\phi$
26
Random effects additive variance model
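• If $I\hat{\phi} < (I - 1)$ then
– the data are not over-dispersed, and $\hat{\tau}^2 = 0$
– use standard funnel limits/naïve Z scores
• Otherwise:
$$\hat{\tau}^2 = \frac{I\hat{\phi} - (I - 1)}{\sum_i w_i - \sum_i w_i^2 / \sum_i w_i}$$
• Where $w_i = 1 / s_i^2$
• The new random-effects Z score is then calculated as:
$$z_i^D = \frac{y_i - \theta_0}{\sqrt{s_i^2 + \hat{\tau}^2}}$$
where $\theta_0$ is the value trusts are compared against ($t$ in the naïve Z score)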
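A minimal sketch of the additive random-effects calculation on this slide. It assumes the between-trust variance (called `tau2` here) is estimated by the method of moments from the over-dispersion factor `phi` and the weights w_i = 1/s_i^2, and that the adjusted Z score divides by the square root of s_i^2 + tau2. The function names and example numbers are illustrative.

```python
import math


def tau_squared(std_errors, phi):
    """Method-of-moments estimate of the between-trust variance tau^2.

    std_errors: local standard errors s_i, one per trust.
    phi: over-dispersion factor (ideally from winsorised Z scores).
    Returns 0.0 when I * phi < I - 1, i.e. no evidence of over-dispersion.
    """
    n_trusts = len(std_errors)
    if n_trusts * phi < n_trusts - 1:
        return 0.0
    weights = [1.0 / (s * s) for s in std_errors]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    return (n_trusts * phi - (n_trusts - 1)) / (sum_w - sum_w2 / sum_w)


def random_effects_z(y, theta0, s, tau2):
    """Z score against a null distribution with variance s^2 + tau^2."""
    return (y - theta0) / math.sqrt(s * s + tau2)


# Hypothetical standard errors for three trusts and an over-dispersion factor of 2.
tau2 = tau_squared([0.1, 0.2, 0.5], phi=2.0)
print(tau2, random_effects_z(1.3, 1.0, 0.4, tau2))
```

With `tau2` equal to zero the second function reduces to the naïve Z score, matching the fallback described in the first bullet.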
27
Comparing to a ‘null distribution’
[Figure: SAR for trusts with > 50 readmissions; vertical axis from 0 to 2]
28
Additive over-dispersion: a funnel plot (20% winsorised)
[Figure: funnel plot against expected re-admissions; each trust plotted as its old band (1 to 5), with 95% and 99.8% limits]
29
Effects on the banding of trusts: readmissions 2002/03 data

Method                            Significantly below average  Below average  Average  Above average  Significantly above average
Previous banding method           32                           6              40       13             49
Random-effects (20% winsorised)   3                            9              101      21             6
30
Why we chose the additive variance method
• Generally avoids situations where two trusts which have the same value for the indicator get put in different bands because of precision
• A multiplicative model would increase the variance at some trusts more than at others
– e.g. a small trust with large variance would be affected much more than a large trust with small variance
• By contrast, an additive model increases the variance at all trusts by the same amount
• Better conceptual fit with our understanding of the problem, that the factors inflating variance affect all trusts equally, so an additive model is preferable
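A small numerical illustration of this argument, using entirely hypothetical numbers: two trusts with the same indicator value but different standard errors, an over-dispersion factor `phi` of 4 and a between-trust variance `tau2` of 0.04. The function names are illustrative.

```python
import math


def z_naive(y, t, s):
    return (y - t) / s


def z_multiplicative(y, t, s, phi):
    # Each variance s^2 is multiplied by phi, so every Z score shrinks by sqrt(phi).
    return (y - t) / (s * math.sqrt(phi))


def z_additive(y, t, s, tau2):
    # The same tau2 is added to every trust's variance.
    return (y - t) / math.sqrt(s * s + tau2)


# Hypothetical values: same indicator value, different precision.
t, y = 1.0, 1.25
phi, tau2 = 4.0, 0.04
for label, s in [("large trust (s = 0.05)", 0.05), ("small trust (s = 0.2)", 0.2)]:
    print(label,
          round(z_naive(y, t, s), 2),
          round(z_multiplicative(y, t, s, phi), 2),
          round(z_additive(y, t, s, tau2), 2))
```

Under these assumed numbers the multiplicative Z scores keep the same 4:1 ratio as the naïve ones, so only the large trust lies beyond 1.96, whereas the additive Z scores are much closer together and both fall inside 1.96.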
31
References:
DJ Spiegelhalter (2004) Funnel plots for comparing institutional performance. Statistics in Medicine, 24 (to appear)
DJ Spiegelhalter (2004) Handling over-dispersion of performance indicators (submitted)
R DerSimonian & N Laird (1986) Meta-analysis in clinical trials. Controlled Clinical Trials, 7:177-188

Acknowledgements:
David Spiegelhalter
Adrian Cook
Theo Georghiou
Thank you