Embracing uncertainty when
reasoning about outcomes
Dr Andy Fugard
Research Fellow
EBPU Masterclass 7
Timetable
10:30 – 10:45 Registration
10:45 – 11:00 Introductions
11:00 – 11:30 What data are you collecting? What’s it for?
11:30 – 13:00 Recap on basics of uncertainty
• Summary stats (mean, median, SD, quantiles/percentiles)
• Reasoning from sample to population
• (Im)precision, confidence/uncertainty intervals and p-values
• Effect size and power
13:00 – 14:00 Lunch
14:00 – 15:00 Statistical techniques often used to assess change in CAMHS
• Reliable change index
• “Added-value”/estimated treatment effect scores
15:00 – 15:15 Coffee break
15:15 – 16:00 Basics of statistical process control
• Run charts
• Control charts
The one slide summary…
• Statistical methods are needed to extract meaning from the noise
– There are a lot of methods
– You might be pleasantly surprised how helpful your (or a friend’s) old undergrad stats textbook is
– Don’t forget about statistician colleagues
• Interpretation of data is equally essential
– Forget the numbers for a moment
– Think: what’s actually happening clinically?
– What do the theories say?
– What do the systematic reviews say helps? NICE Guidelines? Your service users?
Warmup – what data do you
collect?
Summary statistics
Data (N = 200) randomly generated using SDQ Parent norms
14 5 15 1 12 7 7 14 19 21 0 9 0 10 0 11 8 3
9 7 7 3 7 11 13 12 13 12 0 12 12 12 11 0 9 11
14 18 9 17 0 5 0 3 8 11 1 2 0 8 16 0 1 3
9 1 8 11 4 4 11 16 9 10 14 14 3 3 10 12 8 8
8 16 7 7 8 13 2 12 10 6 5 5 21 12 17 13 0 0
0 20 10 8 11 0 7 0 9 11 9 7 4 13 4 4 13 11
13 0 4 0 4 18 3 0 12 14 7 8 6 1 14 8 8 12
6 14 16 12 8 8 11 5 2 8 13 6 12 1 19 13 8 8
16 9 6 7 12 8 8 5 1 4 0 18 11 3 12 14 18 0
7 11 0 12 9 20 10 7 13 2 17 12 13 0 2 3 7 15
15 6 16 6 6 6 1 5 2 0 5 7 5 18 12 8 1 0
12 7
What a lot of statistics is about
• Reducing data in various ways
• Uncovering relationships
• Drawing inferences about a population based on a random sample of that population
All stats packages will compute the things on the following slides (and deal with tricky details). This is meant to build intuitions; don’t compute by hand!
(Arithmetic) Mean:
sum all numbers and divide by N
14 5 15 1 12 7 7 14 19 21 0 9 0 10 0 11 8 3
9 7 7 3 7 11 13 12 13 12 0 12 12 12 11 0 9 11
14 18 9 17 0 5 0 3 8 11 1 2 0 8 16 0 1 3
9 1 8 11 4 4 11 16 9 10 14 14 3 3 10 12 8 8
8 16 7 7 8 13 2 12 10 6 5 5 21 12 17 13 0 0
0 20 10 8 11 0 7 0 9 11 9 7 4 13 4 4 13 11
13 0 4 0 4 18 3 0 12 14 7 8 6 1 14 8 8 12
6 14 16 12 8 8 11 5 2 8 13 6 12 1 19 13 8 8
16 9 6 7 12 8 8 5 1 4 0 18 11 3 12 14 18 0
7 11 0 12 9 20 10 7 13 2 17 12 13 0 2 3 7 15
15 6 16 6 6 6 1 5 2 0 5 7 5 18 12 8 1 0
12 7
$\frac{14 + 5 + 15 + 1 + 12 + \dots + 7}{200} = 8.2$
Median: sort & take the middle value
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2 2
2 2 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6
6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
8 9 9 9 9 9 9 9 9 9 9 10 10 10 10 10 10 11
11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 12 12 12
12 12 12 12 12 12 12 12 12 12 12 12 12 13 13 13 13 13
13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 15 15 15
16 16 16 16 16 16 17 17 17 18 18 18 18 18 19 19 20 20
21 21
What about the rest of the data? (Histogram)
Or… (Discrete histogram)
Percentiles
Read off the sorted data above:
• 50th percentile / median: 8
• 25th percentile / 1st quartile: 4
• 0th percentile: 0 (the minimum)
• 100th percentile: 21 (the maximum)
Things you can do with
percentiles
Commonly seen
• Interquartile range: spans the middle 50% of the data, from the 25th to the 75th percentile
Other possibilities
• The middle 80%
• Or the top 5%
What might be useful?
Things you can do with percentiles:
norms (here SDQ)
Standard deviation
• Measure of how spread out values are
• Start with the variance: average* of squared differences between mean and each value: $(x_1 - \text{mean})^2, (x_2 - \text{mean})^2, \dots, (x_n - \text{mean})^2$
• Standard deviation is the square root of this
• In original units of variable, e.g., SDQ points, age in years, …
*Or almost: divide by (N−1)
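The summary statistics so far can be sketched in a few lines of Python using only the standard library. This is an illustration rather than the package used for the slides: the `counts` table is tallied from the sorted listing of the 200 values, and all variable names are made up for the example.

```python
import statistics

# Frequency of each score (0-21), tallied from the sorted data listing.
counts = {0: 23, 1: 9, 2: 6, 3: 9, 4: 8, 5: 9, 6: 9, 7: 16, 8: 20,
          9: 10, 10: 6, 11: 13, 12: 19, 13: 11, 14: 9, 15: 3, 16: 6,
          17: 3, 18: 5, 19: 2, 20: 2, 21: 2}
scores = sorted(s for s, k in counts.items() for _ in range(k))

n = len(scores)
mean = sum(scores) / n
median = statistics.median(scores)
q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points

# Variance with the (N - 1) divisor, then SD as its square root.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = variance ** 0.5

print(f"N = {n}, mean = {mean:.1f}, median = {median}")
print(f"quartiles = {q1}, {q2}, {q3}; SD = {sd:.2f}")
```

Different packages use slightly different rules for percentiles that fall between two data points, so quartiles can differ a little from one tool to another on other data sets.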
Rules of thumb for normally distributed (bell curve) data
• ~68% of data lie within 1 SD of the mean
• ~95% of data lie within 2 SD of the mean
• ~99.7% of data lie within 3 SD of the mean
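These rules of thumb can be checked directly. A small sketch, assuming an exactly normal distribution; the function name `share_within` is made up for the example.

```python
import math

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

Real questionnaire scores are often skewed or bounded at zero, so the percentages are a guide rather than a guarantee.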
Reasoning from sample to
population
Thinking about quality of care provided by a service
Is one patient enough? Two? Five?
Discuss: what might vary that’s
out of control of clinician?
Made up example
• Data from two teams
• 20 patients in each
• Dependent variable: levels of difficulties at
end of treatment
Results
Team A mean 5.2
SD 2.3
Team B mean 7.4
SD 2.7
[Figure: difficulties (y-axis, 0 to 8) for Team A and Team B]
What we have
• Sample estimates of the population
– Mean
– Standard deviation
• Means definitely differ in
the sample of 40
What we (typically) want to know
• Do the means differ in the population?
• Do the results generalise beyond these 40 patients?
• Would the result replicate in another
sample from the same population?
You know the mean outcome for a random sample
What’s it likely to be for the population?
Null hypothesis significance testing
• One way to reason from sample to population
• Requires:
– Null hypothesis (H0): e.g., the (population) means are the same
– Alternative hypothesis: e.g., the (population) means are different
• We know the sample means were different
• Hope is that we can reject H0
Example: the t statistic
• Computed from sample means, SDs, & Ns
• Gives a standardised measure
• Related to difference in sample means
• A bigger number (further from zero) → bigger difference
• Closer to zero → smaller difference
• For our example we get t = 2.8
• Sign gives the direction of the difference, e.g., t = −2.8 would have been in the opposite direction
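As a sketch of how such a t value arises from means, SDs and Ns: the code below assumes the standard equal-variance (pooled) two-sample t statistic and takes the difference as Team B minus Team A. The slides do not state which variant was used; this choice is consistent with the 38 degrees of freedom reported later. The function name is made up for the example.

```python
import math

def t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample t statistic (pooled variance) from summary statistics."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_b - mean_a) / se_diff

# Team A: mean 5.2, SD 2.3; Team B: mean 7.4, SD 2.7; 20 patients in each.
t = t_from_summary(5.2, 2.3, 20, 7.4, 2.7, 20)
df = 20 + 20 - 2
print(f"t({df}) = {t:.1f}")
```

Swapping the two teams in the call flips the sign of t but not its size.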
Pretend…
There is no difference in population means.
Simulate 10,000 studies where there’s no difference in the population means
[Figure: histogram of the simulated t values; x-axis t from −4 to 4, y-axis frequency from 0 to 1,500]
• Occasionally large positive differences in sample
• Occasionally large negative differences in sample
• Mostly only small differences in sample
But remember: NO DIFFERENCE IN POPULATION
[Figure: the same distribution as a density; x-axis t from −4 to 4, y-axis density from 0.0 to 0.3]
Normalise so blue area = 1
Blue area between two values gives probability of getting those values in sample
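The simulation idea can be sketched as follows. The population mean (6.0) and SD (2.5), the normal distribution, the seed and the function names are all assumptions made for illustration; the slides do not say how their simulation was set up.

```python
import math
import random

def t_stat(a, b):
    """Pooled-variance two-sample t statistic for two lists of scores."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mb - ma) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def simulate_null(n_studies=10_000, n_per_team=20, seed=1):
    """t values from studies in which both teams share one population."""
    rng = random.Random(seed)
    ts = []
    for _ in range(n_studies):
        # Both teams are drawn from the same (hypothetical) population.
        a = [rng.gauss(6.0, 2.5) for _ in range(n_per_team)]
        b = [rng.gauss(6.0, 2.5) for _ in range(n_per_team)]
        ts.append(t_stat(a, b))
    return ts

ts = simulate_null()
small = sum(abs(t) < 1 for t in ts) / len(ts)
extreme = sum(abs(t) >= 2.8 for t in ts) / len(ts)
print(f"|t| < 1 in {small:.0%} of studies; |t| >= 2.8 in {extreme:.1%}")
```

The share of simulated studies with a t at least as far from zero as the observed one is a simulation-based estimate of the p-value.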
Can we find the null distribution
without simulation?
• William Gosset
• 1899: Joined Guinness in Dublin
• Developed statistics to help with
quality control in brewing
• Published under pseudonym
Student
• Worked out the t-distribution [Student (1908). The probable error
of a mean. Biometrika 6, 1–25.]
... the t distribution is now computed
in all stats software …
What is a p-value then?
• We got t = 2.8 in the sample
• How likely is this, assuming that the
population means are the same?
• We don’t want it to be likely
• We want to be in a world where there is a difference!
• The hypothesis didn’t specify a direction
[Figure: t distribution; x-axis t from −4 to 4, y-axis density from 0.0 to 0.4; the area in each tail beyond t = ±2.8 is .004]
t(38) = 2.8, p = .008
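Stats software does this calculation for you. To show what is going on, here is a sketch that computes the two-tailed area under the t density by numerical integration (Simpson's rule) with only the standard library. The function names are made up for the example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-tailed p-value: area above +|t| plus area below -|t|."""
    upper = abs(t)
    h = upper / steps
    # Simpson's rule for the area between 0 and |t| (steps must be even).
    area = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = area * h / 3
    return 2 * (0.5 - central_half)

print(f"t(38) = 2.8: p = {two_sided_p(2.8, 38):.3f}")
```

Both tails are counted because the hypothesis did not specify a direction.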
Final step
• Current conventions say that p < 0.05 is
“statistically significant”
• So can reject H0
• Note statistically significant doesn’t mean
clinically significant – the magnitude of the
difference matters!
Confidence intervals
• Give interval where population value is
likely to be
• With a given degree of confidence
• 95% confidence means that in 95% of
studies, the true population value will be
included in the interval
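A sketch of the usual calculation for a 95% confidence interval of a mean, applied to Team A from the made up example (mean 5.2, SD 2.3, N = 20). The critical value 2.093 is the tabulated t value for 19 degrees of freedom; the function name is made up for the example.

```python
import math

def mean_ci(mean, sd, n, crit):
    """Confidence interval for a mean: mean +/- crit * SD / sqrt(N)."""
    se = sd / math.sqrt(n)
    return mean - crit * se, mean + crit * se

# 2.093 is the tabulated two-sided 95% critical value of t with 19 df.
low, high = mean_ci(5.2, 2.3, 20, crit=2.093)
print(f"Team A: 95% CI {low:.1f} to {high:.1f}")
```

For large samples the critical value approaches 1.96.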
[Figure series: 95% confidence intervals of the means for groups A and B (y-axis: score, 0 to 8), one panel per result below]
t(198) = 0, p = 1
t(198) = 0.676, p = 0.5
t(198) = 1.653, p = 0.1
t(198) = 1.93, p = 0.055
t(198) = 1.972, p = 0.05
t(198) = 2.017, p = 0.045
Note: CIs can overlap and there’s still a significant difference
t(198) = 2.601, p = 0.01
t(198) = 2.839, p = 0.005
t(198) = 3.339, p = 0.001
Why this funnel shape?
• Standard error of the mean = $\frac{\text{SD}}{\sqrt{N}}$
• Sample size ↑ … error ↓
• So for smaller services you’d expect greater between-service differences in (sample) means
• Even if there’s no population difference
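A quick sketch of how the standard error shrinks with sample size. The SD of 2.5 is a made up value for illustration, as is the function name.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / math.sqrt(n)

for n in (10, 40, 160, 640):
    print(f"N = {n:>3}: SE = {standard_error(2.5, n):.2f}")
```

Each fourfold increase in N halves the standard error, which is why the spread of sample means narrows towards larger services.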
Also happens for proportions: relationship
between sample size and recovery rates
Graphed using public Adult IAPT data downloaded from NHS IC web page
Effect size and power
• Effect size: how big the effect is
• For example how large a difference in
means or in proportions
• The larger the effect, the easier it is to detect
• More data → more precise estimate of population quantity → more likely to detect an effect = more power
Power analysis
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Examples with proportions
• Compare proportion of families dropping
out of treatment between two teams
• Or try to reduce drop-outs
• Or increase proportion of children who say
they felt listened to
Minimum sample size needed in each group for test of difference between two proportions (power = 80%, searching for p < .05)
Rows: proportion in Group 1. Columns: proportion in Group 2.

Group 1    Group 2
           0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8
0.2        195
0.3         59   292
0.4         30    80   355
0.5         18    38    93   387
0.6         12    22    42    97   387
0.7          9    14    23    42    93   355
0.8          6     9    14    22    38    80   292
0.9          5     6     9    12    18    30    59   195
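The slides do not say how the table was computed. One common approximation, sketched below, uses the arcsine-transformed difference between the two proportions (Cohen's h) and a normal approximation; rounded to the nearest whole number it gives the same values as the table entries tried in the usage line. Treat it as an illustration, not as the method behind the table. The function name is made up for the example.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group using the arcsine effect size h."""
    h = 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return round(2 * z**2 / h**2)

# 0.1 vs 0.2, 0.4 vs 0.5 and 0.1 vs 0.9
print(n_per_group(0.1, 0.2), n_per_group(0.4, 0.5), n_per_group(0.1, 0.9))
```

Rounding to the nearest whole number is slightly optimistic; rounding up is the cautious choice when planning.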
Examples with means
• Compare average outcomes between two
teams
• Or test whether training has any effect on
outcomes
• Compare clinicians’ outcomes?
Minimum sample size needed in each group for test of difference between two means (power = 80%, searching for p < .05)

Effect size d (in SD units)    N in each group
0.1                            1571
0.2                             393
0.3                             175
0.4                              99
0.5                              64
0.6                              45
0.7                              33
0.8                              26
0.9                              20
1.0                              17

$d = \frac{\text{mean}_1 - \text{mean}_2}{\text{SD}}$
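A rough version of this calculation can be sketched with a normal approximation. It is not the exact t-based calculation that power software uses, so it comes out equal to or one below the table values. The function name is made up for the example.

```python
import math
from statistics import NormalDist

def n_per_group_means(d, alpha=0.05, power=0.80):
    """Normal-approximation n per group for effect size d (in SD units)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(2 * z**2 / d**2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group_means(d)} per group")
```

Because N depends on 1/d², halving the effect size you want to detect roughly quadruples the sample you need.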
What about interpretation?
Let’s start somewhere where it
seems to be easier…
Rizo and Südhof (2002)
Zhang, Wu, Wang, et al (2011)
There’s still a large gap
between data and theory!
“My friend told me Chomsky said something very sad. He said that today we don't need theory. All we need to do is tell people, empirically, what is going on. Here, I violently disagree: facts are facts, and they are precious, but they can work in this way or that. Facts alone are not enough. […] I'm sorry, I'm an old-fashioned continental European. Theory is sacred and we need it more than ever.” – Slavoj Žižek, interview in New Statesman, 29 October 2009
Start with the basics
• Data quality
– What’s the return rate?
– Any sign of systematic biases?
– Miscoded data?
• Domain coverage of measure
– Broad spectrum vs. narrow focus
– Are the problems/strengths covered at all?
• Case mix
– Comorbidity
– Emotional problems vs. developmental conditions
“In the absence of randomization, one has to
work very hard to demonstrate that
unbalanced patient characteristics or referral
practices could not have substantially
influenced the treatment outcome
comparison”
– Clark, Fairburn, and Wessely (2008, p. 631)
Discuss: where to look for ideas
for factors influencing outcomes?
Some ideas…
• Therapeutic alliance
– Correlates with outcomes (Shirk, Karver, Brown, 2011)
– Parent alliance relates to less frequent cancellations and CYP alliance to outcomes (Hawley & Weisz, 2005)
• Getting feedback on therapy (Bickman, Kelley, Breda, Andrade & Riemer, 2011)
• Practitioner skill (Scott, Carby, Rendu, 2008)
• Steer clear of “service restructuring”?
• Normalising the possibility to change therapist if things aren’t working out? (Evidence?)
Skill and outcome… (Scott, Carby, Rendu, 2008)
Assessing change I:
reliable change index
What if patient discharged?
A problem with change scores
Improvement/deterioration could be due to chance and to change in underlying problems
What do you see?
One solution: reliable change indices (Jacobson & Truax 1991)
$$\text{RCI} = \frac{\text{post} - \text{pre}}{SE_{\text{diff}}}$$
where post − pre is the change score and $SE_{\text{diff}}$ is the standard error of the difference:
$$SE_{\text{diff}} = SD_{\text{pre}} \sqrt{2} \sqrt{1 - r}$$
where $SD_{\text{pre}}$ is the SD for the pre-score and $r$ is the reliability, e.g., Cronbach’s α or test-retest reliability.
Example
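• $SD_{\text{pre}} = 6.65$, $r = 0.88$
• $SE_{\text{diff}} = SD_{\text{pre}} \sqrt{2} \sqrt{1 - r}$
• $= 6.65 \times \sqrt{2} \times \sqrt{1 - 0.88}$
• $= 3.257821$
• Then… $\text{RCI} = \frac{\text{post} - \text{pre}}{3.257821}$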
z-scores
• The null distribution is a z-score, i.e.,
– Normally distributed
– Mean 0
– SD 1
• So you can compute a p-value
• See Excel file at http://www.corc.uk.net/resources/downloads/
Change-score    RCI     p
1               0.31    0.759
2               0.61    0.539
3               0.92    0.357
4               1.23    0.220
5               1.53    0.125
6               1.84    0.066
7               2.15    0.032
8               2.46    0.014
9               2.76    0.006
10              3.07    0.002
11              3.38    0.001
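The RCI calculation and its p-value can be sketched as follows, using the SD (6.65) and reliability (0.88) from the worked example. The pre and post scores in the usage line are hypothetical, and the function names are made up for the example.

```python
import math
from statistics import NormalDist

def se_diff(sd_pre, r):
    """Standard error of the difference: SD_pre * sqrt(2) * sqrt(1 - r)."""
    return sd_pre * math.sqrt(2) * math.sqrt(1 - r)

def rci(pre, post, sd_pre, r):
    """Reliable change index: change score divided by SE_diff."""
    return (post - pre) / se_diff(sd_pre, r)

def p_value(z):
    """Two-sided p-value for a z-score."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical case: score of 20 before treatment and 13 afterwards.
z = rci(pre=20, post=13, sd_pre=6.65, r=0.88)
print(f"RCI = {z:.2f}, p = {p_value(z):.3f}")
```

The sign of the RCI shows the direction of change; the p-value depends only on its size.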
There are some variations
on this theme
• Cronbach α or test-retest reliability used
• Sometimes (e.g., Barkham et al 2012) the
SD of the difference is used rather than
time 1 SD
• These values can come from large norm
samples (preferred) or sometimes (have
to) come from smaller samples
Important message: just tell everyone what
you did and be consistent
Another useful thing to do with this:
compute a reliable change criterion
• Work out 1.96 × SEdiff (why 1.96?)
• Change greater than this amount is reliable change (with 95% confidence)
• Example
– SEdiff = 3.257821 from example before
– Multiply by 1.96 = around 6.4
– If the change scores are integers, then this means a change of 7 or more (in either direction) is reliable
• Sometimes provided by measure developers
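A short sketch of the criterion calculation. It also hints at where 1.96 comes from: it is the z value that leaves 2.5% in each tail of a normal distribution. Variable names are made up for the example.

```python
import math
from statistics import NormalDist

se_diff = 3.257821                       # SEdiff from the RCI example
z_crit = NormalDist().inv_cdf(0.975)     # about 1.96: 2.5% in each tail
criterion = z_crit * se_diff

# Change must exceed the criterion, so take the next whole number up.
smallest_integer_change = math.floor(criterion) + 1

print(f"criterion = {criterion:.1f}, so a change of "
      f"{smallest_integer_change} or more points is reliable")
```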
Assessing change II:
“added value” score
The problem
• Many factors other than mental health intervention can change symptom scores
• Examples
– Helpful friends, family, teacher
– Development
– Referral at peak of problems
– Regression to the mean
– Various response biases
– Negative life episodes
Intuition
• Suppose you’re okay at bowling but not fantastic
• You throw – strike!
• What happens on your next turn?
• Analogy:
– Skill + luck
– True levels of difficulties + random variation
Regression to the mean
Think of all questionnaire scores as consisting of a true score and an error component
Measured score = true score + error
(But beware, sometimes some of the “error” is the signal, e.g., a particularly good/distressing day)
Now suppose…
• You measure the same thing twice
• It hasn’t really changed
• And there is no measurement error
Each line connects a person’s score at time 1 and time 2
What happens if you add
measurement error, but the true
score doesn’t change…?
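One way to explore this question is a small simulation. Everything in it is hypothetical: the true-score mean and SD, the size of the error, the cut-off of 17 and the seed are chosen only to illustrate the idea.

```python
import random

rng = random.Random(0)
n_people = 5000

# Hypothetical values: true scores vary between people, error varies by occasion.
true_scores = [rng.gauss(12, 4) for _ in range(n_people)]
time1 = [t + rng.gauss(0, 4) for t in true_scores]
time2 = [t + rng.gauss(0, 4) for t in true_scores]   # same true score, new error

# Pick the people who scored highest at time 1, as a referral threshold might.
high = [i for i in range(n_people) if time1[i] >= 17]
mean_t1 = sum(time1[i] for i in high) / len(high)
mean_t2 = sum(time2[i] for i in high) / len(high)

print(f"{len(high)} high scorers: time 1 mean {mean_t1:.1f}, "
      f"time 2 mean {mean_t2:.1f}")
```

Nobody's true score changed, yet the group selected for high scores at time 1 looks better on average at time 2. That is regression to the mean.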
Example
(Parent SDQ in CORC)
r = .30
p << 0.001
N = 18140
Added value score (Goodman and Goodman)
• Developed using BCAMHS 2004 data
• Parent-rated SDQs
• 609 people with clinical problems
• Mostly (84%) not attending CAMHS
• Regression model developed predicting
Time 2 from Time 1 scores 6 months
earlier
[Figure: scores at outset and 6 months later; non-CAMHS sample modelled]
Produces equation for change due, e.g., to
• regression to mean
• spontaneous recovery
[Figure: change in a CAMHS case from outset to 6 months later, showing the actual score, the predicted non-CAMHS score, and the AVS as the gap between them]
Final added-value score
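$$\text{AVS} = (2.3 + 0.8\,\text{Total}_1 + 0.2\,\text{Impact}_1 - 0.3\,\text{Emotion}_1) - \text{Total}_2$$
The bracketed term is the predicted T2 score if had received no treatment; $\text{Total}_2$ is the actual T2 score with treatment.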
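The formula translates directly into code. In this sketch the argument names and the example scores are hypothetical, and reading "Emotion" as the emotional subscale score is an assumption; check the developers' notes before using it on real data.

```python
def added_value_score(total1, impact1, emotion1, total2):
    """AVS: predicted untreated Time 2 total minus actual Time 2 total."""
    predicted_total2 = 2.3 + 0.8 * total1 + 0.2 * impact1 - 0.3 * emotion1
    return predicted_total2 - total2

# Hypothetical case: total 20, impact 5, emotion 6 at outset;
# total 14 six months later.
avs = added_value_score(total1=20, impact1=5, emotion1=6, total2=14)
print(f"AVS = {avs:.1f}")
```

A positive AVS means the actual Time 2 score is lower (fewer difficulties) than the score predicted without treatment.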
Evidence it works
• Two parenting programme RCTs now
supporting the AVS
– Ford, Hutchings, Bywater, Goodman, &
Goodman (2009) Br J Psychiatry. 194(6).
– Sebastian Rotheray’s talk at 2012 RCP conf.
• Control group has an AVS ≈ 0
• Treatment group has AVS ≈ treat − control
• Not yet tested for emotional problems
Notes on the AVS http://www.sdqinfo.com/c5.html
• Use only for largish samples, e.g., 100 cases
• Confidence intervals for individual cases are plus or minus 10 points
• Only applies to Parent-reported SDQs
• “Although initial findings on added value scores are promising, they should not be taken too seriously until accumulating experimental data from around the world tells us more about the formula's own strengths and difficulties!”
What about for other measures?
• Difficult to find data for people not receiving treatment
• The Parent-SDQ AVS based on original sample of 7977 cases in general population
• One source is waiting list controls in RCTs
• More to come, initially for school counselling populations (Cooper, Fugard, McArthur, Pybis, in preparation)
A rapid intro to statistical
process control
Basic ideas
• Common cause: causes of variation which affect all parts of the system, for instance noisiness in the measurement
• Special cause: causes of variation which do not affect all parts of the system, or not all of the time
Focus on trying to spot special causes, e.g., due to variation in practice
Perla R J et al. BMJ Qual Saf 2011;20:46-51
Run chart
Rules for run charts (giving p < .05)
• Shift: at least 6 consecutive points all above/below median (ignore points on median)
• Trend: at least 5 points all rising/falling (ignore like points)
• Run: too few or too many points on one side of median (statistical tables for how many there have to be; see Perla et al 2011)
• Astronomical point: something visually odd (not probability based for run charts – better to use control charts, coming later)
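The shift and run rules can be sketched in code. The weekly values are hypothetical and the function names are made up for the example; the trend rule is left out to keep the sketch short.

```python
from statistics import median

def longest_shift(values):
    """Longest run of consecutive points on one side of the median,
    skipping points that fall exactly on the median."""
    med = median(values)
    sides = [v > med for v in values if v != med]
    longest = current = 0
    previous = None
    for side in sides:
        current = current + 1 if side == previous else 1
        longest = max(longest, current)
        previous = side
    return longest

def count_runs(values):
    """Number of runs: groups of consecutive points on one side of the median."""
    med = median(values)
    sides = [v > med for v in values if v != med]
    return sum(1 for i, s in enumerate(sides) if i == 0 or s != sides[i - 1])

# Hypothetical weekly values, not from the slides.
weekly = [12, 14, 11, 13, 12, 10, 16, 17, 15, 18, 16, 17, 19, 15]
print("longest shift:", longest_shift(weekly), "| runs:", count_runs(weekly))
print("shift signal:", longest_shift(weekly) >= 6)
```

The number of runs is then compared with the published limits for the number of points that do not fall on the median (Perla et al 2011).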
Columns: total number of data points on the run chart that do not fall on the median; lower limit for the number of runs (fewer runs than this is ‘too few’); upper limit for the number of runs (more runs than this is ‘too many’)

Points not on median    Lower limit    Upper limit
10                      3              9
11                      3              10
12                      3              11
13                      4              11
14                      4              12
15                      5              12
16                      5              13
17                      5              13
18                      6              14
19                      6              15
20                      6              16
More in Perla R J et al. BMJ Qual Saf 2011;20:46-51
Fictional example – based on a real service
Clinician    Number of face-to-face sessions this week
Claire       24
Panos        22
Thom         21
Johannes     20
Keith        19
Berit        19
Jenny        19
Polly        16
Bianca       14
Ellie        12
Weekly targets came from Above: 20 face-to-face sessions per week
Now easily tracked by service manager
Actual session counts shared each week with everyone
Why is Claire performing better than Ellie?
One reason: random variation
Control chart (see Caulcutt 2004)
Upper/lower control lines (UCL/LCL): mean +/– 3 SD
Upper/lower warning lines (UWL/LWL): mean +/– 2 SD
Process has changed if one of:
• 1 point above UCL
• 1 point below LCL
• 2 consecutive points between UCL and UWL
• 2 consecutive points between LCL and LWL
• 8 consecutive points on same side of mean
Concise summary: monitor for a while before judging!
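A sketch of these rules in code. It assumes the lines are computed from a baseline period using the ordinary sample SD; the slides do not say how the SD should be estimated, and control chart texts sometimes use other estimates. The weekly counts and the function names are hypothetical.

```python
from statistics import mean, stdev

def control_lines(baseline):
    """Centre line plus warning (2 SD) and control (3 SD) lines."""
    m, s = mean(baseline), stdev(baseline)
    return {"mean": m, "LCL": m - 3 * s, "LWL": m - 2 * s,
            "UWL": m + 2 * s, "UCL": m + 3 * s}

def signals(points, lines):
    """(index, rule) pairs for each point at which a rule is met."""
    found = []
    for i, x in enumerate(points):
        if x > lines["UCL"] or x < lines["LCL"]:
            found.append((i, "beyond control line"))
        pair = points[max(0, i - 1): i + 1]
        if len(pair) == 2 and (
                all(lines["UWL"] < p <= lines["UCL"] for p in pair)
                or all(lines["LCL"] <= p < lines["LWL"] for p in pair)):
            found.append((i, "2 in warning zone"))
        last8 = points[max(0, i - 7): i + 1]
        if len(last8) == 8 and (
                all(p > lines["mean"] for p in last8)
                or all(p < lines["mean"] for p in last8)):
            found.append((i, "8 on same side of mean"))
    return found

# Hypothetical weekly session counts for one clinician.
baseline = [19, 21, 18, 20, 22, 19, 20, 21, 18, 22]
lines = control_lines(baseline)
print(signals([20, 19, 27, 21], lines))
```

With few baseline points the lines themselves are uncertain, which is another reason to monitor for a while before judging.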
Further reading
Thank you!