Benjamin OlkenMIT
Sampling and Sample Size
Course Overview
1. What is evaluation?2. Measuring impacts (outcomes,
indicators)3. Why randomize?4. How to randomize5. Threats and Analysis6. Sampling and sample size7. RCT: Start to Finish8. Cost Effectiveness Analysis and
Scaling Up
Course Overview
1. What is evaluation?2. Measuring impacts (outcomes,
indicators)3. Why randomize?4. How to randomize5. Threats and Analysis6. Sampling and sample size7. RCT: Start to Finish8. Cost Effectiveness Analysis and
Scaling Up
What’s the average result?
• If you were to roll a die once, what is the “expected result”? (i.e. the average)
Possible results & probability: 1 die
1 2 3 4 5 60
1/6
Rolling 1 die: possible results & average
0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.00
1/6
Frequency Average
What’s the average result?
• If you were to roll two dice once, what is the expected average of the two dice?
Rolling 2 dice:Possible totals & likelihood
2 3 4 5 6 7 8 9 10 11 12
Fre-quency
0.027777777777777
9
0.055555555555555
5
0.083333333333333
3
0.111111111111111
0.138888888888889
0.166666666666667
0.138888888888889
0.111111111111111
0.083333333333333
3
0.055555555555555
5
0.027777777777777
9
0
1/8
1/5
Rolling 2 dice: possible totals 12 possible totals, 36 permutations
Die 1
Die 2
2 3 4 5 6 7
3 4 5 6 7 8
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12
Rolling 2 dice:Average score of dice & likelihood
1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6
Fre-quency
0.02777777777777
79
0.05555555555555
55
0.08333333333333
33
0.11111111111111
1
0.13888888888888
9
0.16666666666666
7
0.13888888888888
9
0.11111111111111
1
0.08333333333333
33
0.05555555555555
55
0.02777777777777
79
0
1/8
1/5
Lik
elih
oo
d
Outcomes and Permutations
• Putting together permutations, you get:1. All possible outcomes2. The likelihood of each of those
outcomes– Each column represents one possible
outcome (average result)– Each block within a column represents one
possible permutation (to obtain that average)
2.5
Rolling 3 dice:16 results 318, 216 permutations
1 1 1/3 1 2/3 2 2 1/3 2 2/3 3 3 1/3 3 2/3 4 4 1/3 4 2/3 5 5 1/3 5 2/3 6
Fre-quency
0.00462962962962964
0.01388888888888
89
0.02777777777777
79
0.04629629629629
64
0.06944444444444
45
0.09722222222222
22
0.11574074074074
1
0.125 0.125 0.11574074074074
1
0.09722222222222
22
0.06944444444444
45
0.04629629629629
64
0.02777777777777
79
0.01388888888888
89
0.00462962962962964
0
1/8
Rolling 4 dice: 21 results, 1296 permutations
1 1.25 1.5 1.75 2 2.25 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5 4.75 5 5.25 5.5 5.75 60%
2%
4%
6%
8%
10%
12%
14%
Rolling 5 dice:26 results, 7776 permutations
1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 2.8 3 3.2 3.4 3.6 3.8 4 4.2 4.4 4.6 4.8 5 5.2 5.4 5.6 5.8 60%
2%
4%
6%
8%
10%
12%
Looks like a bell curve, or a normal distribution
1
1.16
6666
6666
6667
1.33
3333
3333
3333 1.
5
1.66
6666
6666
6667
1.83
3333
3333
3333 2
2.16
6666
6666
6667
2.33
3333
3333
3333 2.
5
2.66
6666
6666
6667
2.83
3333
3333
3333 3
3.16
6666
6666
6667
3.33
3333
3333
3334 3.
5
3.66
6666
6666
6667
3.83
3333
3333
3334 4
4.16
6666
6666
6667
4.33
3333
3333
3334 4.
5
4.66
6666
6666
6667
4.83
3333
3333
3334 5
5.16
6666
6666
6667
5.33
3333
3333
3334 5.
5
5.66
6666
6666
6666
5.83
3333
3333
3334
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
Rolling 10 dice:50 results, >60 million permutations
>99% of all rolls will yield an average between 3 and 4
1 1.72222222222222 2.44444444444444 3.16666666666666 3.88888888888888 4.6111111111111 5.333333333333320.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
Rolling 30 dice:150 results, 2 x 10 23 permutations*
>99% of all rolls will yield an average between 3 and 4
1 1.73333333333333 2.46666666666666 3.19999999999999 3.93333333333332 4.66666666666669 5.400000000000060.0%
0.5%
1.0%
1.5%
2.0%
Rolling 100 dice: 500 results, 6 x 10 77 permutations
Rolling dice: 2 lessons
1. The more dice you roll, the closer most averages are to the true average(the distribution gets “tighter”)-THE LAW OF LARGE NUMBERS-
2. The more dice you roll, the more the distribution of possible averages (the sampling distribution) looks like a bell curve (a normal distribution)-THE CENTRAL LIMIT THEOREM-
Which of these is more accurate?
I. II.
A. B. C.
0% 0%0%
A. I.B. II.C. Don’t know
Accuracy versus Precision
Precision
(Sampl
e Size)
Accuracy (Randomization)
truth
estimates
Accuracy versus Precision
Precision
(Sampl
e Size)
Accuracy (Randomization)
truth truthestimates
estimates
truth truthestimates
estimates
THE basic questions in statistics
• How confident can you be in your results?
• How big does your sample need to be?
THAT WAS JUST THE INTRODUCTION
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
Baseline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
frequency
test scores
Mean = 26
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
26 frequency
mean
test scores
Standard Deviation = 20
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0
100
200
300
400
500
600
26 frequency sd
mean
test scores
1 Standard Deviation
Let’s do an experiment
• Take 1 Random test score from the pile of 16,000 tests
• Write down the value• Put the test back• Do these three steps again• And again• 8,000 times• This is like a random sample of
8,000 (with replacement)
Good, the average of the sample is about 26…
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
What can we say about this sample?
But…
• … I remember that as my sample goes, up, isn’t the sampling distribution supposed to turn into a bell curve?
• (Central Limit Theorem)• Is it that my sample isn’t large
enough?
This is the distribution of my sample of 8,000 students!
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
Population v. sampling distribution
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
How do we get from here…
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
100200300400500
test scores
To here…
This is the distribution of the population(Population Distribution)
This is the distribution of Means from all Random Samples(Sampling distribution)
Draw 10 random students, take the average, plot it: Do this 5 & 10 times.
Inadequate sample size
No clear distribution around population mean
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 5 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 10 Samples
More sample means around population mean
Still spread a good deal
Draw 10 random students: 50 and 100 times
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means With 50 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520123456789
10
Frequency of Means with 100 Samples
Distribution now significantly more normal
Starting to see peaks
Draws 10 random students: 500 and 1000 times
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520
10
20
30
40
50
60
70
80
Frequency of Means With 500 Samples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 520
10
20
30
40
50
60
70
80
Frequency of Means With 1000 Samples
Draw 10 Random students
• This is like a sample size of 10• What happens if we take a sample
size of 50?
N = 10 N = 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 5 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 10 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 5 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
Frequency of Means With 10 Samples
Draws of 10 Draws of 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
12
14
Frequency of Means With 50 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
5
10
15
20
Frequency of Means with 100 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
2
4
6
8
10
12
14
Frequency of Means With 50 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
4
8
12
16
Frequency of Means With 100 Samples
Draws of 10 Draws of 50
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
102030405060708090
Frequency of Means With 500 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
20406080
100120140160
Frequency of Means With 1000 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
102030405060708090
Frequency of Means With 500 Samples
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 520
20406080
100120140160
Frequency of Means With 1000 Samples
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
Population & sampling distribution: Draw 1 random student (from 8,000)
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26mean
frequency
freq (N=1)
test scores
Sampling Distribution: Draw 4 random students (N=4)
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26mean
frequency
freq (N=4)
test scores
Law of Large Numbers : N=9
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26mean
frequency
freq (N=9)
test scores
Law of Large Numbers: N =100
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26mean
frequency
freq (N=100)
test scores
The white line is a theoretical distribution
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26
mean
frequency
dist_1
freq (N=1)
test scores
Central Limit Theorem: N=1
Central Limit Theorem : N=4
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26
mean
fre-quency
dist_4
freq (N=4)
test scores
Central Limit Theorem : N=9
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26
mean
frequency
dist_9
freq (N=9)
test scores
Central Limit Theorem : N =100
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26
mean
frequency
dist_100
freq (N=100)
test scores
Outline
• Sampling distributions– population distribution– sampling distribution– law of large numbers/central limit
theorem– standard deviation and standard error
• Detecting impact
Standard deviation/error
• What’s the difference between the standard deviation and the standard error?
• The standard error = the standard deviation of the sampling distributions
Variance and Standard Deviation
• Variance = 400
• Standard Deviation = 20
• Standard Error = SE =
Standard Deviation/ Standard Error
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
26
mean
frequency
sd
dist_1
test scores
Sample size ↑ x4, SE ↓ ½
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-0.5%
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
3.5%
4.0%
4.5%
26
mean
frequency
sd
se4
dist_4
test scores
Sample size ↑ x9, SE ↓ ?
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
-1.0%
0.0%
1.0%
2.0%
3.0%
4.0%
5.0%
6.0%
7.0%
26mean
frequency
sd
se9
dist_9
test scores
Sample size ↑ x100, SE ↓?
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
0.0%
5.0%
10.0%
15.0%
20.0%
25.0%
26
mean
frequency
sd
se100
dist_100
test scores
Outline
• Sampling distributions• Detecting impact
– significance– effect size – power– baseline and covariates– clustering– stratification
Baseline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
50
100
150
200
250
300
350
400
450
500
test scores
We implement the Balsakhi Program (lecture 3)
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
Case 2: Remedial Education in IndiaEvaluating the Balsakhi Program
Incorporating random assignment into the program
After the balsakhi programs, these are the endline test scores
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
test scores
Endline test scores
0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
0
20
40
60
80
100
120
140
160
Endline test scores
The impact appears to be?
A. PositiveB. NegativeC. No impactD. Don’t know
0123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
0
50
100
150
200
250
300
350
400
450
500
Baseline test scores
Stop! That was the control group. The treatment group is red.
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
control
treatment
test scores
Post-test: control & treatment
Is this impact statistically significant?
A. B. C.
0% 0%0%A. YesB. NoC. Don’t know
01234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991000
20
40
60
80
100
120
140
160
control
treatment
control μ
treatment μ
test scores
Average Difference = 6 points
One experiment: 6 points
One experiment
Two experiments
A few more…
A few more…
Many more…
A whole lot more…
…
Running the experiment thousands of times…
By the Central Limit Theorem, these are normally distributed
Hypothesis testing
• In criminal law, most institutions follow the rule: “innocent until proven guilty”
• The presumption is that the accused is innocent and the burden is on the prosecutor to show guilt– The jury or judge starts with the “null
hypothesis” that the accused person is innocent
– The prosecutor has a hypothesis that the accused person is guilty
74
Hypothesis testing
• In program evaluation, instead of “presumption of innocence,” the rule is: “presumption of insignificance”
• The “Null hypothesis” (H0) is that there was no (zero) impact of the program
• The burden of proof is on the evaluator to show a significant effect of the program
Hypothesis testing: conclusions
• If it is very unlikely (less than a 5% probability) that the difference is solely due to chance: – We “reject our null hypothesis”
• We may now say: – “our program has a statistically
significant impact”
What is the significance level?
• Type I error: rejecting the null hypothesis even though it is true (false positive)
• Significance level: The probability that we will reject the null hypothesis even though it is true
Hypothesis testing: 95% confidence
YOU CONCLUDE
Effective No Effect
THE TRUTH
Effective Type II Error
(low power)
No Effect
Type I Error(5% of the time)
78
What is Power?
• Type II Error: Failing to reject the null hypothesis (concluding there is no difference), when indeed the null hypothesis is false.
• Power: If there is a measureable effect of our intervention (the null hypothesis is false), the probability that we will detect an effect (reject the null hypothesis)
Assume two effects: no effect and treatment effect β
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Before the experiment
H0Hβ
Anything between lines cannot be distinguished from 0
Impose significance level of 5%
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceHβH0
Shaded area shows % of time we would find Hβ true if it was
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Can we distinguish Hβ from H0 ?
HβH0
What influences power?
• What are the factors that change the proportion of the research hypothesis that is shaded—i.e. the proportion that falls to the right (or left) of the null hypothesis curve?
• Understanding this helps us design more powerful experiments
83
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
Effect Size: 1*SE
• Hypothesized effect size determines distance between means
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
1 Standard Deviation
HβH0
Effect Size = 1*SE
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
The Null Hypothesis would be rejected only 26% of the time
Power: 26%If the true impact was 1*SE…
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
HβH0
Bigger hypothesized effect size distributions farther apart
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Effect Size: 3*SE
3*SE
Bigger Effect size means more power
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Effect size 3*SE: Power= 91%
H0
Hβ
What effect size should you use when designing your experiment?
A. B. C. D.
25% 25%25%25%A. Smallest effect size that is still cost effective
B. Largest effect size you estimate your program to produce
C. BothD. Neither
• Let’s say we believe the impact on our participants is “3”
• What happens if take up is 1/3?• Let’s show this graphically
Effect size and take-up
Let’s say we believe the impact on our participants is “3”
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
Effect Size: 3*SE
3*SE
Take up is 33%. Effect size is 1/3rd
• Hypothesized effect size determines distance between means
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
1 Standard Deviation
HβH0
Take-up is reflected in the effect size
Back to: Power = 26%
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
HβH0
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
By increasing sample size you increase…
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
A. B. C. D. E.
0% 0% 0%0%0%
A. AccuracyB. PrecisionC. BothD. NeitherE. Don’t know
Power: Effect size = 1SD, Sample size = N
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
Power: Sample size = 4N
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 64%
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significance
Power: Sample size = 9
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
What are typical ways to reduce the underlying variance
A. B. C. D. E. F.
0% 0% 0%0%0%0%
A.Include covariates
B.Increase the sample
C.Do a baseline survey
D.All of the aboveE.A and BF.A and C
Variance
• There is sometimes very little we can do to reduce the noise
• The underlying variance is what it is• We can try to “absorb” variance:
– using a baseline– controlling for other variables
• In practice, controlling for other variables (besides the baseline outcome) buys you very little
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
Sample split: 50% C, 50% T
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
Equal split gives distributions that are the same “fatness”
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 91%
If it’s not 50-50 split?
• What happens to the relative fatness if the split is not 50-50.
• Say 25-75?
Sample split: 25% C, 75% T
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
significanceH0
Hβ
Uneven distributions, not efficient, i.e. less power
-4 -3 -2 -1 0 1 2 3 4 5 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
control
treatment
power
Power: 83%
Allocation to T v C
2
2
1
2
21 )(nn
XXsd
12
2
2
1
2
1)( 21 XXsd
15.13
4
1
1
3
1)( 21 XXsd
Power: main ingredients
1. Effect Size2. Sample Size 3. Variance4. Proportion of sample in T vs. C5. Clustering
Clustered design: intuition
• You want to know how close the upcoming national elections will be
• Method 1: Randomly select 50 people from entire Indian population
• Method 2: Randomly select 5 families, and ask ten members of each family their opinion
114
Low intra-cluster correlation (Rho)
HIGH intra-cluster correlation (rho)
All uneducated people live in one village. People with only primary education live in another. College grads live in a third, etc. Rho on education will be..
A. B. C. D.
25% 25%25%25%A. HighB. LowC. No effect on rhoD. Don’t know
If rho is high, what is a more efficient way of increasing power?
A. B. C. D.
0% 0%0%0%
A. Include more clusters in the sample
B. Include more people in clusters
C. BothD. Don’t know
Testing multiple treatments
Control Group Balsakhi
CAL program Balsakhi + CAL
←0.15 SD→
↖
0.25 SD↘
↑0.15 SD
↓
↑0.10 SD
↓
←0.10 SD→
↗
0.05 SD↙
50 50
50 100100
100200
200
END
Top Related