Introducing Statistical Inference with Randomization Tests
Allan Rossman
Cal Poly – San Luis Obispo
Outline
2×2 tables
  - Activity/example 1: Dolphin therapy?
  - Activity/example 2: Murderous nurse?
Quantitative response
  - Activity/example 3: Sleep deprivation?
  - Activity/example 4: Age discrimination?
  - Activity/example 5: Memory study?
Extensions, reflections, further reading
Example 1: Dolphin therapy?
Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment
Is dolphin therapy more effective than control?
Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

                     Dolphin therapy   Control group   Total
Subject improved            10               3           13
Subject did not              5              12           17
Total                       15              15           30
Proportion               0.667           0.200
Example 1 (cont.)
Standard approach: Could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
  - But technical conditions do not hold
  - But this would be approximate anyway
  - But how does this relate to what “significance” means?
Example 1 (cont.)
Alternative: Simulate random assignment process many times, see how often such an extreme result occurs
  - Assume no treatment effect (null model)
  - Re-randomize 30 subjects to two groups (using cards)
      - Assuming 13 improvers, 17 non-improvers regardless
  - Determine number of improvers in dolphin group
      - Or, equivalently, difference in improvement proportions
  - Repeat large number of times (turn to computer)
  - Ask whether observed result is in tail of distribution
      - Indicating saw a surprising result under null model
      - Providing evidence that dolphin therapy is more effective
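The card-shuffling steps above can be sketched in Python. This is only an illustration under the null model described on the slide; the function name, the number of repetitions, and the seed are assumptions, not part of the original activity.

```python
import random

def simulate_dolphin(num_reps=10_000, seed=0):
    """Approximate the chance of 10 or more improvers landing in the dolphin group.

    Null model: the 13 improvers and 17 non-improvers would have had the
    same outcome regardless of which group they were assigned to.
    """
    rng = random.Random(seed)          # seeded so the sketch is repeatable
    cards = [1] * 13 + [0] * 17        # 1 = improver, 0 = non-improver
    observed = 10                      # improvers in the actual dolphin group
    extreme = 0
    for _ in range(num_reps):
        rng.shuffle(cards)             # re-randomize the 30 subjects
        improvers_in_dolphin = sum(cards[:15])   # first 15 cards = dolphin group
        if improvers_in_dolphin >= observed:
            extreme += 1
    return extreme / num_reps

approx_p = simulate_dolphin()
print(f"approximate p-value: {approx_p:.4f}")
```

The proportion printed is a simulation estimate, so it will vary a little with the seed and the number of repetitions.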
Example 1 (cont.)
Conclusion: Experimental result is statistically significant
What does that mean; what is logic behind that?
  - Experimental result very unlikely to occur by chance alone
  - A difference in success proportions at least as large as .467 (in favor of dolphin group) would happen in less than 2% of all possible random assignments if dolphin therapy was not effective
Example 1 (cont.)
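Exact randomization distribution
  - Hypergeometric distribution
  - Fisher’s Exact Test

$$ p\text{-value} = \frac{\binom{13}{10}\binom{17}{5} + \binom{13}{11}\binom{17}{4} + \binom{13}{12}\binom{17}{3} + \binom{13}{13}\binom{17}{2}}{\binom{30}{15}} = .0127 $$

[Figure: distribution plot, Hypergeometric, N=30, M=13, n=15; x-axis X, y-axis Probability (0.00 to 0.30); upper tail from X = 10 marked with probability 0.0127]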
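The exact hypergeometric tail probability can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, M, n, k):
    """P(X >= k), where X counts successes among n subjects drawn
    without replacement from N subjects, M of whom are successes."""
    total = comb(N, n)                 # all possible random assignments
    upper = min(M, n)                  # largest possible count of successes
    tail = sum(comb(M, x) * comb(N - M, n - x) for x in range(k, upper + 1))
    return tail / total

# Dolphin study: 30 subjects, 13 improvers, 15 assigned to dolphin therapy,
# 10 improvers observed in the dolphin group.
p_exact = hypergeom_upper_tail(N=30, M=13, n=15, k=10)
print(round(p_exact, 4))  # 0.0127
```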
Example 2: Murderous Nurse?
Murder trial: U.S. vs. Kristin Gilbert
  - Accused of giving patients fatal dose of heart stimulant
  - Data presented for 18 months of 8-hour shifts
Relative risk: 6.34

                   Gilbert on shift   Gilbert not on shift   Total
Death occurred            40                   34               74
No death                 217                 1350             1567
Total                    257                 1384             1641
Proportion             0.156                0.025
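The proportions and the relative risk follow directly from the table counts. A short sketch of the arithmetic (variable names are illustrative):

```python
# Counts from the table: shifts with a death, and total shifts, by group
deaths_on, shifts_on = 40, 257        # Gilbert on shift
deaths_off, shifts_off = 34, 1384     # Gilbert not on shift

risk_on = deaths_on / shifts_on       # proportion of her shifts with a death
risk_off = deaths_off / shifts_off    # proportion of other shifts with a death
relative_risk = risk_on / risk_off    # ratio of the two proportions

print(f"{risk_on:.3f} {risk_off:.3f} {relative_risk:.2f}")  # 0.156 0.025 6.34
```

The ratio is computed from the unrounded proportions; dividing the rounded values 0.156 and 0.025 would give a slightly different number.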
Example 2 (cont.)
Structurally the same as previous example, but with one crucial difference
  - No random assignment to groups
Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
      - Perhaps she worked intensive care unit or night shift
Is statistical significance still relevant?
  - Yes, to see if “random chance” can plausibly be ruled out as an explanation
  - Some statisticians disagree
Example 2 (cont.)
Incredibly unlikely to observe such a difference/ratio by chance alone, if there were no difference between the groups
But this does not prove, or perhaps even strongly suggest, guilt
  - Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
      - Perhaps she worked intensive care unit or night shift
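To get a feel for "incredibly unlikely", one can compute the same kind of hypergeometric tail used in Example 1. This sketch assumes, purely for the calculation, that the 74 deaths are spread over the 1641 shifts as if the 257 Gilbert shifts were chosen at random; that is a modeling assumption, not something the observational data guarantee, and the helper name is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(N, M, n, k):
    """P(X >= k) for n items drawn without replacement from N, M of them 'successes'."""
    total = comb(N, n)
    tail = sum(comb(M, x) * comb(N - M, n - x) for x in range(k, min(M, n) + 1))
    return tail / total                # exact integer ratio, converted to float

# 1641 shifts, 74 with a death, 257 shifts worked by Gilbert,
# 40 deaths observed on her shifts.
p_chance = hypergeom_upper_tail(N=1641, M=74, n=257, k=40)
print(f"{p_chance:.3g}")
```

A tiny probability here only rules out "random chance" as an explanation; it says nothing about which of the other explanations is correct.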
Example 3: Lingering sleep deprivation?
Does sleep deprivation have harmful effects on cognitive functioning three days later?
  - 21 subjects; random assignment
Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
[Figure: dotplots of improvement by sleep condition (deprived, unrestricted); improvement axis from -16 to 40]
Example 3 (cont.)
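Could calculate test statistic, p-value from approximate “sampling” distribution (if conditions are met)

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{19.82 - 3.90}{\sqrt{\dfrac{14.73^2}{10} + \dfrac{12.17^2}{11}}} = \frac{15.92}{5.93} = 2.68 $$

$$ p\text{-value} = \Pr(t \ge 2.68) \approx .008 $$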
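The t statistic can be reproduced from the summary statistics on the slide. A minimal sketch of the arithmetic (variable names are illustrative; the p-value itself needs a t distribution routine and is not computed here):

```python
from math import sqrt

# Summary statistics as they appear in the slide's formula
mean_1, sd_1, n_1 = 19.82, 14.73, 10
mean_2, sd_2, n_2 = 3.90, 12.17, 11

diff = mean_1 - mean_2                              # difference in group means
std_error = sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)     # unpooled standard error
t_stat = diff / std_error

print(f"difference = {diff:.2f}, SE = {std_error:.2f}, t = {t_stat:.2f}")
# difference = 15.92, SE = 5.93, t = 2.68
```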
Example 3 (cont.)
Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
Start with tactile simulation using index cards
  - Write each “score” on a card
  - Shuffle the cards
  - Randomly deal out 11 for deprived group, 10 for unrestricted group
  - Calculate difference in group means
  - Repeat many times
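The index-card procedure translates almost line for line into code. The sketch below is one possible version; the function name is hypothetical, and the scores are made-up placeholders (the study's 21 improvement scores are not listed in these slides), so the printed value is not the study's result.

```python
import random
from statistics import mean

def randomization_p_value(deprived, unrestricted, num_reps=1000, seed=0):
    """Approximate one-sided p-value for mean(unrestricted) - mean(deprived)."""
    rng = random.Random(seed)
    observed = mean(unrestricted) - mean(deprived)
    cards = list(deprived) + list(unrestricted)     # one "card" per score
    n_deprived = len(deprived)
    extreme = 0
    for _ in range(num_reps):
        rng.shuffle(cards)                          # shuffle the cards
        new_deprived = cards[:n_deprived]           # deal out the deprived group
        new_unrestricted = cards[n_deprived:]       # the rest are unrestricted
        diff = mean(new_unrestricted) - mean(new_deprived)
        if diff >= observed - 1e-12:                # small tolerance for float rounding
            extreme += 1
    return extreme / num_reps

# Hypothetical scores for illustration only: 11 deprived, 10 unrestricted
toy_deprived = [-5, -2, 0, 1, 3, 4, 6, 8, 9, 12, 15]
toy_unrestricted = [2, 7, 10, 11, 14, 16, 19, 22, 25, 30]
approx_p = randomization_p_value(toy_deprived, toy_unrestricted)
print(f"approximate p-value: {approx_p:.3f}")
```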
Example 3 (cont.)
Then use technology to simulate this randomization process
Applet: www.rossmanchance.com/applets/ (Randomization Tests)
[Figure: histogram of difference in group means by random assignment (x-axis, from -18 to 18) against number of randomizations (y-axis, 0 to 120)]
Approx p-value = 13 / 1000
Example 3 (cont.)
Conclusion: Fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later
  - Justification: Experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone, if there were no effect of the sleep deprivation
Easy to analyze medians instead
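Switching to medians only changes the statistic computed on each shuffle. One way to show this is to pass the statistic in as a parameter; the function name and the toy scores below are hypothetical and are not the study's data.

```python
import random
from statistics import mean, median

def randomization_p_value_stat(deprived, unrestricted, statistic=median,
                               num_reps=1000, seed=0):
    """Approximate one-sided p-value for statistic(unrestricted) - statistic(deprived)."""
    rng = random.Random(seed)
    observed = statistic(unrestricted) - statistic(deprived)
    cards = list(deprived) + list(unrestricted)
    n_deprived = len(deprived)
    extreme = 0
    for _ in range(num_reps):
        rng.shuffle(cards)                          # re-randomize group labels
        diff = statistic(cards[n_deprived:]) - statistic(cards[:n_deprived])
        if diff >= observed - 1e-12:                # tolerance for float rounding
            extreme += 1
    return extreme / num_reps

# Hypothetical scores for illustration only
toy_deprived = [-5, -2, 0, 1, 3, 4, 6, 8, 9, 12, 15]
toy_unrestricted = [2, 7, 10, 11, 14, 16, 19, 22, 25, 30]
p_median = randomization_p_value_stat(toy_deprived, toy_unrestricted, statistic=median)
p_mean = randomization_p_value_stat(toy_deprived, toy_unrestricted, statistic=mean)
print(f"medians: {p_median:.3f}, means: {p_mean:.3f}")
```

The shuffling logic is untouched, which is the point: the same randomization works for any test statistic.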
Example 4: Age discrimination?
Martin vs. Westvaco (Statistics in Action)
  - Employee ages: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
  - Fired employee ages: 55, 55, 64
  - Robert Martin: 55 years old
Do the data provide evidence that the firing process was not “random”?
  - How unlikely is it that a “random” firing process would produce such a large average age?
Example 4 (cont.)
Exact permutation distribution:
Exact p-value: 6 / 120 = .05
[Figure: histogram of the exact permutation distribution; x-axis mean age (fired), from 32 to 56; y-axis Frequency, 0 to 20]
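With only ten employees, every possible set of three firings can be listed, so no simulation is needed. The sketch below assumes the three fired employees are those aged 55, 55, and 64, which is the set consistent with the 6 / 120 figure; variable names are illustrative.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
fired = [55, 55, 64]                   # assumption: ages of the three fired employees

observed_total = sum(fired)            # comparing totals is the same as comparing means
all_sets = list(combinations(ages, 3)) # every way to choose 3 of the 10 employees
as_extreme = [s for s in all_sets if sum(s) >= observed_total]

exact_p = len(as_extreme) / len(all_sets)
print(len(as_extreme), len(all_sets), exact_p)  # 6 120 0.05
```

`combinations` treats the three 55-year-olds as distinct people, which is what the random-firing model requires.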
Example 5: Memorizing letters
You will be given a string of 30 letters
  - Memorize as many as you can in 20 seconds (in order)
Design questions
  - What kind of study is this?
  - What kind of randomness was used in this study?
  - What are the variables, and what kind are they?
Analysis questions
  - Do boxplots suggest a significant difference?
  - Simulate a randomization test, interpret the results
Extensions
Matched pairs design
  - Randomize within pairs (e.g., by flipping coin)
Comparing more than 2 groups
  - Alternative to chi-square, ANOVA
  - Same use of randomization
  - Somewhat harder to define test statistic
Regression/correlation
  - Randomize/permute one of the variables
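Two of these extensions can be sketched briefly. For matched pairs, the coin flip within each pair amounts to randomly flipping the sign of each within-pair difference; for correlation, one variable is shuffled while the other stays fixed. The function names and data below are hypothetical, both tests are one-sided, and the correlation helper assumes neither variable is constant.

```python
import random
from statistics import mean

def paired_randomization_p_value(differences, num_reps=1000, seed=0):
    """Sign-flip test: under the null, each within-pair difference could equally be +d or -d."""
    rng = random.Random(seed)
    observed = mean(differences)
    extreme = 0
    for _ in range(num_reps):
        flipped = [d if rng.random() < 0.5 else -d for d in differences]  # coin flip per pair
        if mean(flipped) >= observed - 1e-12:
            extreme += 1
    return extreme / num_reps

def correlation(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation_permutation_p_value(xs, ys, num_reps=1000, seed=0):
    """Permute ys while holding xs fixed; count correlations at least as large as observed."""
    rng = random.Random(seed)
    observed = correlation(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(num_reps):
        rng.shuffle(shuffled)
        if correlation(xs, shuffled) >= observed - 1e-12:
            extreme += 1
    return extreme / num_reps

# Hypothetical data for illustration only
toy_differences = [3, 5, -1, 4, 2, 6, -2, 7]
toy_x = [1, 2, 3, 4, 5, 6, 7, 8]
toy_y = [2, 1, 4, 3, 7, 5, 8, 9]
p_paired = paired_randomization_p_value(toy_differences)
p_corr = correlation_permutation_p_value(toy_x, toy_y)
print(f"paired: {p_paired:.3f}, correlation: {p_corr:.3f}")
```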
Reflections
You can do this at beginning of course
  - Then repeat for new scenarios with more richness
  - Spiraling could lead to deeper conceptual understanding
Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
Makes clear that “inference” goes beyond data in hand
Very powerful, easily generalized
  - Flexibility in choice of test statistic (e.g. medians, odds ratio)
  - Generalize to more than two groups
Takes advantage of modern computing power
Does not require assumptions of normality
Fisher on randomization tests
“The statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method.”
– R.A. Fisher (1936)
Ptolemaic curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.”
– George Cobb (TISE, 2007)
Further reading
Ernst (2005), Statistical Science
Scheaffer and Tabor (2008), Mathematics Teacher
Rossman (2008), Statistics Education Research Journal
Statistics: A Guide to the Unknown (ed. R. Peck)
NSF-funded project: http://statweb.calpoly.edu/csi/