Client Assessment and Other New Uses of Reliability Will G Hopkins Physiology and Physical Education...
Client Assessment and Other New Uses of Reliability
Will G Hopkins
Physiology and Physical Education
University of Otago, Dunedin NZ
• Reliability: the Essentials
• Assessment of Individual Clients and Patients
• Estimation of Sample Size for Experiments
• Estimation of Individual Responses to a Treatment
Part of a mini-symposium presented at the Annual Meeting of the American College of Sports Medicine in Baltimore, May 31, 2001
Reliability: the Essentials
Reliability is reproducibility of a measurement if or when you repeat the measurement.
It's crucial for clinicians…
  because you need good reproducibility to monitor small but clinically important changes in an individual patient or client.
It's crucial for researchers…
  because you need good reproducibility to quantify such changes in controlled trials with samples of reasonable size.
Reliability: the Essentials
How do we quantify reliability?
Easy to understand for one subject tested many times:

Subject   Trial 1   Trial 2   Trial 3   Trial 4   Trial 5   Trial 6   Mean ± SD
Chris     72        76        74        79        79        77        76.2 ± 2.8

The 2.8 is the standard error of measurement.
  I call it the typical error, because it's the typical difference between the subject's true value and the observed values.
  It's the random error or noise in our assessment of clients and in our experimental studies.
  Strictly, this standard deviation of a subject's values is the total error of measurement.
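The mean and SD for the single subject can be reproduced with a few lines of Python. This is only a sketch using the six trial values from the slide; the variable names are illustrative.

```python
from statistics import mean, stdev

# Chris's six trials, taken from the slide.
trials = [72, 76, 74, 79, 79, 77]

subject_mean = mean(trials)
# stdev() is the sample SD (n - 1 in the denominator). For one subject
# tested repeatedly, the slide calls this SD the typical error.
typical_error = stdev(trials)

print(f"{subject_mean:.1f} ± {typical_error:.1f}")  # prints 76.2 ± 2.8
```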
Reliability: the Essentials
We usually measure reliability with many subjects tested a few times:

Subject      Trial 1   Trial 2   Trial 2-1
Chris        72        76         4
Jo           53        58         5
Kelly        60        60         0
Pat          84        82        -2
Sam          67        73         6
Mean ± SD:                        2.6 ± 3.4

The 3.4 divided by √2 is the typical error (about 2.4).
The 3.4 multiplied by ±1.96 are the limits of agreement (about ±6.7).
The 2.6 is the change in the mean.
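The statistics for the five-subject example can be checked as follows. This sketch assumes the typical error is the SD of the difference scores divided by √2 and takes the limits of agreement as ±1.96 times that SD, as stated on the slide; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Trial 1 and Trial 2 scores for the five subjects on the slide.
trial1 = {"Chris": 72, "Jo": 53, "Kelly": 60, "Pat": 84, "Sam": 67}
trial2 = {"Chris": 76, "Jo": 58, "Kelly": 60, "Pat": 82, "Sam": 73}

# Difference score (Trial 2 - Trial 1) for each subject.
diffs = [trial2[name] - trial1[name] for name in trial1]

change_in_mean = mean(diffs)            # 2.6
sd_diff = stdev(diffs)                  # about 3.4
typical_error = sd_diff / sqrt(2)       # about 2.4
limits_of_agreement = 1.96 * sd_diff    # about ±6.7

print(f"change in mean  = {change_in_mean:.1f}")
print(f"SD of diffs     = {sd_diff:.1f}")
print(f"typical error   = {typical_error:.1f}")
print(f"limits of agree = ±{limits_of_agreement:.1f}")
```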
Reliability: the Essentials
And we can define retest correlations: Pearson (for two trials) and intraclass (two or more trials).
[Figure: scatterplot of Trial 2 against Trial 1, both axes from 50 to 90. Intraclass r = 0.95; Pearson r = 0.96.]
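As a rough check, both correlations can be computed from the five-subject example used earlier in the talk. Two assumptions are made here that the slide does not state: that the plotted points are those five subjects, and that the intraclass correlation is the consistency form derived from a two-way analysis of variance (subjects by trials). Under those assumptions the values round to 0.96 and 0.95.

```python
from math import sqrt
from statistics import mean

# Assumed to be the data behind the scatterplot (five subjects, two trials).
trial1 = [72, 53, 60, 84, 67]
trial2 = [76, 58, 60, 82, 73]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def icc_consistency(trials):
    """Consistency ICC from two-way ANOVA mean squares.

    trials is a list of k lists, each holding one trial's scores for the
    same n subjects in the same order.
    """
    k, n = len(trials), len(trials[0])
    grand = mean([v for t in trials for v in t])
    subject_means = [mean([t[i] for t in trials]) for i in range(n)]
    trial_means = [mean(t) for t in trials]
    ss_total = sum((v - grand) ** 2 for t in trials for v in t)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_trials = n * sum((m - grand) ** 2 for m in trial_means)
    ms_subjects = ss_subjects / (n - 1)
    ms_error = (ss_total - ss_subjects - ss_trials) / ((n - 1) * (k - 1))
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


r = pearson_r(trial1, trial2)
icc = icc_consistency([trial1, trial2])
print(f"Pearson r = {r:.2f}, intraclass r = {icc:.2f}")
```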
Assessment of Individual Clients and Patients
When you test or retest an individual, take account of relative magnitudes of signal and noise.
The signal is what you are trying to measure.
  It's the smallest clinically or practically important change (within the individual) or difference (between two individuals or between an individual and a criterion value).
  Rarely it's larger changes or differences.
Assessment of Individual Clients and Patients
The noise is the typical error of measurement.
It needs to come from a reliability study in which there are no real changes in the subjects.
• Or in which any real changes are the same for all subjects.
• Otherwise the estimate of the noise will be too large.
• Time between tests is therefore as short as necessary.
• A practice trial may be important, to avoid real changes.
• If published error is not relevant to your situation, do your own reliability study.
Assessment of Individual Clients and Patients
If noise << signal...
  Example: body mass; noise in scales = 0.1 kg, signal = 1 kg.
  The scales are effectively noise-free.
  Accept the measurement without worry.
If noise >> signal...
  Example: speed at ventilatory threshold; noise = 3%, signal = 1%.
  The noise swamps all but large changes or differences.
  Find a better test.
Assessment of Individual Clients and Patients
If noise ≈ signal...
  Examples: many lab and field tests.
  Accept the result of the test cautiously.
  Or improve assessment by...
    1. averaging several tests
    2. using confidence limits
    3. using likelihoods
    4. possibly using Bayesian adjustment
1. Average several tests to reduce the noise.
• Noise reduces by a factor of 1/√n, where n = number of tests.
2. Use likely (confidence) limits for the subject's true value.
• Practically useful confidence is less than the 95% of research.
• For a single test, single score ± typical error are 68% confidence limits.
• For test and retest, change score ± typical error are 52% confidence limits.
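The 68% and 52% figures can be checked with a short calculation. The sketch below assumes normally distributed measurement error and that a change score has an error SD of √2 times the typical error (two independent tests); the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def noise_after_averaging(typical_error, n_tests):
    """Typical error of the mean of n_tests independent tests."""
    return typical_error / sqrt(n_tests)


def coverage(half_width, error_sd):
    """Chance that the true value lies within ± half_width of the observed
    value, assuming normally distributed error with SD error_sd."""
    z = half_width / error_sd
    return 2 * NormalDist().cdf(z) - 1


te = 1.0  # typical error, in arbitrary units

single_score = coverage(te, te)            # score ± typical error
change_score = coverage(te, sqrt(2) * te)  # change ± typical error

print(f"single score: {single_score:.0%}")  # prints single score: 68%
print(f"change score: {change_score:.0%}")  # prints change score: 52%
print(f"noise after averaging 4 tests: {noise_after_averaging(te, 4)}")
```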
Assessment of Individual Clients and Patients
• Example of likely limits for a change score: noise (typical error) = 1.0, smallest important change = 0.9.
[Figure: change-score axis from -2 to 2, divided at -0.9 and 0.9 into negative, trivial and positive regions.]
If you see a change of 1.5, the true change is 52% likely to be between 0.5 and 2.5: "a positive change?"
If you see a change of 0.5, the true change is 52% likely to be between -0.5 and 1.5: "no real change?"
Assessment of Individual Clients and Patients
3. Use likelihoods that the true value is greater/less than an important reference value or values.
• More precise than confidence limits, but needs a spreadsheet for the calculations.
• For single scores, the reference value is usually a pass-fail threshold.
• For change scores, the best reference values are ± the smallest important change.
• Same example of a change score, to illustrate likelihoods: noise (typical error) = 1.0, smallest important change = 0.9.
Assessment of Individual Clients and Patients
[Figure: change-score axis from -2 to 2, divided at -0.9 and 0.9 into negative, trivial and positive regions.]
If you see 1.5, chances are...
  66% the true change is positive;
  29% the true change is trivial;
  5% the true change is negative.
  "a positive change"
If you see 0.5, chances are...
  39% the true change is positive;
  45% the true change is trivial;
  16% the true change is negative.
  "maybe no real change"
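These likelihoods can be approximated without a spreadsheet. The sketch below assumes the true change is normally distributed around the observed change with SD equal to √2 times the typical error; the slides do not say which distribution the original spreadsheet uses, so small discrepancies are possible (for an observed change of 1.5 this gives about 4.5% for a negative change, against the 5% quoted).

```python
from math import sqrt
from statistics import NormalDist


def chances(observed_change, typical_error, smallest_important):
    """Chances that the true change is negative, trivial or positive."""
    # Error SD of a change score: two independent tests, each with the
    # typical error (normality is an assumption of this sketch).
    dist = NormalDist(mu=observed_change, sigma=sqrt(2) * typical_error)
    negative = dist.cdf(-smallest_important)
    positive = 1 - dist.cdf(smallest_important)
    trivial = 1 - negative - positive
    return {"negative": negative, "trivial": trivial, "positive": positive}


for observed in (1.5, 0.5):
    result = chances(observed, typical_error=1.0, smallest_important=0.9)
    summary = ", ".join(f"{k} {v:.1%}" for k, v in result.items())
    print(f"observed {observed}: {summary}")
```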
Assessment of Individual Clients and Patients
4. Go Bayesian?
• That is, take into account your prior belief about the likely outcome of the test.
• When you scale down or reject outright an unlikely high score, you are being a Bayesian...
  because you attribute the high score partly or entirely to noise, not the client.
Assessment of Individual Clients and Patients
• To go Bayesian quantitatively…
  1. specify your prior belief with likely limits;
  2. combine your belief with the observed score and the noise to give…
  3. an adjusted score with adjusted likely limits or likelihoods.
• But how do you specify your prior belief believably?
  Example: if you believe a change couldn't be outside ±3, where does the ±3 come from, and what likely limits define couldn't? 80%, 90%, 95%, 99%... ?
• So use Bayes qualitatively but not quantitatively.
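To see why the choice of likely limits matters, here is a minimal sketch of one possible quantitative adjustment. It is not taken from the talk: it assumes a normal prior centered on zero, normal measurement error, and the usual precision-weighted combination of the two. The observed change of 1.5 and typical error of 1.0 are borrowed from the earlier example, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def bayes_adjust(observed, noise_sd, prior_limit, prior_confidence,
                 prior_mean=0.0):
    """Combine a normal prior with a noisy observation.

    prior_limit and prior_confidence express the belief "the true value is
    within ± prior_limit with this level of confidence".
    Returns (adjusted score, SD of the adjusted score).
    """
    z = NormalDist().inv_cdf(0.5 + prior_confidence / 2)
    prior_sd = prior_limit / z
    w_obs = 1 / noise_sd ** 2      # precision of the observation
    w_prior = 1 / prior_sd ** 2    # precision of the prior belief
    adjusted = (w_obs * observed + w_prior * prior_mean) / (w_obs + w_prior)
    adjusted_sd = sqrt(1 / (w_obs + w_prior))
    return adjusted, adjusted_sd


noise_sd = sqrt(2) * 1.0  # error SD of a change score, typical error = 1.0

# Same belief ("couldn't be outside ±3") read at different confidence levels.
for confidence in (0.80, 0.90, 0.95, 0.99):
    adjusted, sd = bayes_adjust(1.5, noise_sd, 3.0, confidence)
    print(f"±3 as {confidence:.0%} limits: adjusted change {adjusted:.2f} "
          f"(SD {sd:.2f})")
```

The adjusted change depends noticeably on whether ±3 is read as 80% or 99% limits, which illustrates the difficulty of specifying the prior believably.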
Estimation of Sample Size for Experiments
Based on having acceptable precision for the effect.
  Precision is defined by 95% likely limits.
  Estimate of likely limits needs typical error from a reliability study in which the time frame is the same as in the experiment.
  If published error is not relevant, try to do your own reliability study.
Acceptable limits…
  can't be both substantially positive and negative, in the worst case of observed effect = 0.
[Figure: effect (change score) axis centered on 0, divided at -d and d into negative, trivial and positive regions.]
Therefore 95% likely limits = ± d = smallest important effect.
Estimation of Sample Size for Experiments
For a crossover, 95% likely limits = ±[√2 × (typical error)/√(sample size)] × t(0.975, DF) = ± d, where DF is the degrees of freedom in the experiment.
Therefore sample size = 2 × t² × (typical error)²/d² ≈ 8(typical error)²/d² (because t ≈ 2), so...
Estimation of Sample Size for Experiments
When typical error ≈ smallest effect, sample size ≈ 8.
  For a study with a control group, sample size ≈ 32 (4x as many).
  Beware: typical error in an experiment is often larger than in a reliability study, so you may need more subjects.
When typical error << smallest effect, sample size could be ~1, but use ~8 in each group to ensure sample is representative.
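The rule of thumb, sample size ≈ 8(typical error)²/d² for a crossover and four times as many with a control group, can be sketched as below. The function names are illustrative, and the result is only as good as the approximation behind the factor of 8 (a t value close to 2).

```python
from math import ceil


def crossover_sample_size(typical_error, smallest_effect):
    """Approximate sample size for a crossover: 8 * (error / effect)^2."""
    return ceil(8 * typical_error ** 2 / smallest_effect ** 2)


def controlled_trial_sample_size(typical_error, smallest_effect):
    """Approximate total sample size with a control group (4x as many)."""
    return ceil(4 * 8 * typical_error ** 2 / smallest_effect ** 2)


# Typical error equal to the smallest effect.
print(crossover_sample_size(1.0, 1.0))         # prints 8
print(controlled_trial_sample_size(1.0, 1.0))  # prints 32

# Noise three times the signal (e.g. noise = 3%, signal = 1%).
print(crossover_sample_size(3.0, 1.0))         # prints 72
```

When the typical error is much smaller than the smallest effect the formula returns very small numbers, which is where the advice to use about 8 per group for a representative sample applies.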
Estimation of Sample Size for Experiments
When typical error >> smallest effect…
  Test 100s of subjects to estimate small effects.
  Or test fewer subjects many times pre and post the treatment.
  Or use a smaller sample and find a test with a smaller typical error.
  Or use a smaller sample and hope for a large effect.
• Because larger effects need less precision.
• If you get a small effect, tell the editor your study will contribute to a meta-analysis.
Individual Responses to a Treatment
An important but neglected aspect of research.
How to see them? Three ways.
1. Display each subject's values:
[Figure: two panels showing each subject's test score (axis 40 to 80) at pre and post for the drug and placebo groups. Panel titles: "No Individual Responses to Drug" and "Substantial Individual Responses to Drug". Annotations: "same reliability", "different reliability".]
Individual Responses to a Treatment
2. Look for an increase in the standard deviation of the treatment group in the post test.
• But you might miss it:
[Figure: two panels of test score (axis 40 to 80) against time (pre, post) for the drug and placebo groups. Panel titles: "Each subject's values" and "Means and standard deviations". Annotation: "relatively larger".]
Individual Responses to a Treatment
3. Look for a bigger standard deviation of the post-pre change scores in the treatment group.
• Now much easier to see any individual responses:
[Figure: two panels. "Each subject's values": test score (axis 40 to 80) at pre and post for the drug and placebo groups. "Means and SDs of change scores": post-pre score (axis 0 to 30) for the placebo and drug groups.]
To present the magnitude of individual responses...
Individual Responses to a Treatment
Express individual responses as a standard deviation.
  Example: effect of drug = 14 ± 7 units (mean ± SD).
This SD for individual responses is free of measurement error.
• It is NOT the SD of the change score for the drug group.
There is a simple formula for this SD (see next slide), but getting its likely limits is more challenging.
If you find individual responses, try to account for them in your analysis using subject characteristics as covariates.
Individual Responses to a Treatment
How to derive this standard deviation:
  From the standard deviations of the change scores of the treatment and control groups: √(SD_treat² − SD_cont²).
  Or from analysis of the treatment and control groups as reliability studies: √[2(error_treat² − error_cont²)].
  Or by using mixed modeling, especially to get its confidence limits.
Identify subject characteristics responsible for the individual responses by using repeated-measures analysis of covariance.
  This approach also increases precision of the estimate of the mean effect.
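The first two derivations can be sketched in a few lines. The example values below are hypothetical, not data from the talk, and raising an error when the control group is the more variable one is a choice made for this sketch rather than anything the slides prescribe.

```python
from math import sqrt


def sd_individual_responses(sd_change_treat, sd_change_cont):
    """SD of individual responses from the SDs of the change scores in the
    treatment and control groups: sqrt(SD_treat^2 - SD_cont^2)."""
    variance = sd_change_treat ** 2 - sd_change_cont ** 2
    if variance < 0:
        # Sketch-level choice: report the problem rather than return a value.
        raise ValueError("control group is more variable than treatment group")
    return sqrt(variance)


def sd_individual_responses_from_errors(error_treat, error_cont):
    """Same quantity from the typical errors of the two groups analysed as
    reliability studies: sqrt(2 * (error_treat^2 - error_cont^2))."""
    # A change score has SD = sqrt(2) * typical error.
    return sd_individual_responses(sqrt(2) * error_treat, sqrt(2) * error_cont)


# Hypothetical SDs of change scores: 5 units (treatment), 3 units (control).
print(sd_individual_responses(5.0, 3.0))  # prints 4.0
```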
This presentation, spreadsheets, more information at:
A New View of Statistics
[Image: site menu with the sections SUMMARIZING DATA, GENERALIZING TO A POPULATION, Simple & Effect Statistics, Precision of Measurement, Confidence Limits, Statistical Models, Dimension Reduction, Sample-Size Estimation.]
newstats.org