PRACTICAL SAMPLING FOR IMPACT EVALUATIONS
Marie-Hélène Cloutier

AADAPT Workshop South Asia
Goa, December 17-21, 2009


INTRODUCTION

Ideally, we want to compare what happens to the same schools with and without the program. But this is impossible → use statistics.

  Define treatment and control groups
  Compare the mean value of the outcome (e.g. test scores)
  Random assignment ensures comparability but does not remove noise…

How big should the groups be, and how should we select them?

Warning!
  The goal is to give an overview of how sampling features affect what it is possible to learn from an impact evaluation
  The goal is not to make you a sampling expert or give you a headache

INTRODUCTION

Sampling frame - Representativeness / external validity
  Which populations or groups are we interested in, and where do we find them?

Sample size - Groups large enough to credibly detect a meaningful effect
  How many people/schools/units should be interviewed/observed from that population?

SAMPLING FRAME

Census vs. sample?
  Sample: lower cost, faster data collection (avoids capturing dynamics), and smaller data set (improved data quality)

Who are we interested in? Depends on feasibility and what you want to learn
  a) All schools?
  b) All public schools?
  c) All public primary schools?
  d) All public primary schools in a particular region?

External validity
  Can findings from a sample of population (c) inform appropriate programs to help secondary schools?
  Can findings from a sample of population (d) inform national policy?

SAMPLING FRAME

Finding the units we're interested in
  Depends on the size and type of experiment

Experiment                                  Primary sampling unit
Piloting new national textbooks             Schools or classrooms
Early literacy program                      Classrooms for grades 1-3
Incentives for teachers in rural schools    Schools classified as rural

Required information before sampling
  Complete listing of all units of observation available for sampling in each area or group

SAMPLE SIZE AND CONFIDENCE

Example: a simpler question than program impact
Say we wanted to know the average annual expenses of a school.
  Option 1: We go out and interview 5 randomly selected headmasters and take the average of their responses.
  Option 2: We interview 1,000 randomly selected headmasters and average their responses.

Which average is likely to be closer to the true average? Why?


With IE, we need many observations to say with confidence whether the average outcome in the treatment group is > or < the average outcome in the control group
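The headmaster example can be tried out with a small simulation. This is only a sketch: the population of school expenses below is made up (10,000 schools, normally distributed around 50,000 with a standard deviation of 15,000), and `spread_of_sample_means` is a hypothetical helper name, not something from the slides. It draws many samples of each size and reports how far the sample average typically lands from the truth.

```python
import random
import statistics


def spread_of_sample_means(population, sample_size, draws=500, seed=0):
    """Standard deviation of the sample mean across repeated random samples."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(draws)
    ]
    return statistics.pstdev(means)


# Hypothetical population of annual school expenses (made-up numbers).
rng = random.Random(42)
expenses = [rng.gauss(50_000, 15_000) for _ in range(10_000)]

spread_5 = spread_of_sample_means(expenses, 5)
spread_1000 = spread_of_sample_means(expenses, 1000)

print(f"typical error with 5 headmasters:     {spread_5:,.0f}")
print(f"typical error with 1,000 headmasters: {spread_1000:,.0f}")
```

With these assumed numbers, the average of 1,000 responses should land much closer to the true average than the average of 5, because the spread of a sample mean shrinks roughly with the square root of the sample size.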

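CALCULATING SAMPLE SIZE

Main things to be aware of:
1. Detectable effect size
2. Probability of type 1 error (significance)
   Probability of type 2 error (1 – power)
3. Variance of outcome(s)

There is a formula…

$$N = \frac{4\sigma^2 \left(z_{\alpha/2} + z_{\beta}\right)^2}{D^2}\,\left[1 + \rho\,(H - 1)\right]$$

where D is the detectable effect size, σ² the variance of the outcome, z_{α/2} and z_β the critical values for significance and power, ρ the intra-cluster correlation, and H the number of units per cluster.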
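A minimal sketch of the calculation, assuming the formula takes the common two-group form N = 4σ²(z_{α/2} + z_β)²/D² × [1 + ρ(H − 1)], where N is the total sample split equally between treatment and control. The function name `sample_size`, its parameter names, and the input values (a 5-point effect, a standard deviation of 19) are illustrative assumptions, not values prescribed by the slides.

```python
import math
from statistics import NormalDist


def sample_size(effect, sd, alpha=0.05, power=0.80, icc=0.0, cluster_size=1):
    """Total N (both groups together) for a two-sided test, equal-sized groups.

    effect       -- smallest effect we want to detect (D)
    sd           -- standard deviation of the outcome (sigma)
    icc          -- intra-cluster correlation (rho); 0 means no clustering
    cluster_size -- number of units per cluster (H)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    design_effect = 1 + icc * (cluster_size - 1)   # inflation due to clustering
    n = 4 * sd**2 * (z_alpha + z_beta) ** 2 / effect**2 * design_effect
    return math.ceil(n)


print("5-point effect:            ", sample_size(effect=5, sd=19))
print("2.5-point effect:          ", sample_size(effect=2.5, sd=19))
print("5-point effect, clustered: ", sample_size(effect=5, sd=19, icc=0.2, cluster_size=30))
```

Halving the detectable effect roughly quadruples the required sample, and clustering multiplies it by the design effect. Real designs usually need further adjustments (attrition, unequal groups, covariates), so this should be read as a starting point only.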


CALCULATING SAMPLE SIZE
DETECTABLE EFFECT SIZE

What is an effect size?
  The extent to which the intervention affects the outcome of interest
  E.g. 10% increase in test scores, 25% increase in completion rate

Harder to capture (detect) a smaller effect

CALCULATING SAMPLE SIZE
DETECTABLE EFFECT SIZE

Who is taller?
Detecting smaller differences is harder

CALCULATING SAMPLE SIZE
DETECTABLE EFFECT SIZE

With larger samples, it is easier to detect smaller effects
E.g. Are test scores different in schools where teachers receive a bonus than in schools where they do not?

Sample                        Test scores    Can we say it is different?
10 schools with bonus         68%            With very low confidence
10 schools without bonus      65%
10 schools with bonus         80%            With high confidence
10 schools without bonus      50%
500 schools with bonus        68%            With high confidence
500 schools without bonus     65%
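The three comparisons in the bonus example can be checked with a rough two-sample test. The slides give only the group averages, so the sketch below assumes a standard deviation of 10 percentage points in school-level scores (`ASSUMED_SD`, a made-up value) and uses a normal approximation, which is crude for groups of 10 schools. `p_value` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

ASSUMED_SD = 10.0  # assumption: school-level SD of test scores, not given in the slides


def p_value(mean_treated, mean_control, n_per_group, sd=ASSUMED_SD):
    """Two-sided p-value for a difference in means (normal approximation)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = (mean_treated - mean_control) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


for n, treated, control in [(10, 68, 65), (10, 80, 50), (500, 68, 65)]:
    p = p_value(treated, control, n)
    print(f"{n:>3} schools per group, {treated}% vs {control}%: p = {p:.4f}")
```

Under this assumption, a 3-point gap between two groups of 10 schools is well within what noise alone could produce, while the same gap between two groups of 500 schools, or a 30-point gap between groups of 10, is very unlikely to be noise.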

CALCULATING SAMPLE SIZE
DETECTABLE EFFECT SIZE

How to determine the detectable effect size?
  Smallest effect that would prompt a policy response
  Smallest cost-effective effect

E.g. Constructing toilets for girls
  significantly ↑ girls' access by 10%.
    Great - let's think about how we can scale this up.
  significantly ↑ girls' access by 0.5%.
    Great… uh… wait: we spent all of that money and it only increased access by that much?

CALCULATING SAMPLE SIZE
TYPE 1 AND TYPE 2 ERRORS

Minimize 2 types of statistical error:

Type 1 error → repeating/continuing a bad program
  Minimized after data collection, during analysis

Type 2 error → stopping/not scaling up a good program
  Minimized before data collection

                                Conclusion, based on data analysis, is that…
Intervention has an effect      there is an impact    cannot say there is an impact
(in reality)
No                              Type 1 error          OK
Yes                             OK                    Type 2 error

CALCULATING SAMPLE SIZE
TYPE 1 AND TYPE 2 ERRORS

Type 1: significance
  Lower significance level (smaller α) → larger samples
  Common levels: α = 1% or α = 5%
    1% or 5% probability that there is no effect but we think we found one

1 − Type 2: power
  Higher power → larger samples
  Common levels: 1 − β = 80% or 1 − β = 90%
    20% or 10% probability that there is an effect but we cannot detect it
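To get a feel for how much stricter significance or higher power costs, the sketch below compares the factor (z_{α/2} + z_β)² across the common levels. It assumes a two-sided test and the usual normal-approximation sample-size formula, in which the required sample is proportional to this factor; `size_multiplier` is a hypothetical helper name.

```python
from statistics import NormalDist


def size_multiplier(alpha, power):
    """(z_{alpha/2} + z_beta)^2: the part of the sample size driven by alpha and power."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


baseline = size_multiplier(alpha=0.05, power=0.80)
for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        ratio = size_multiplier(alpha, power) / baseline
        print(f"alpha = {alpha:.0%}, power = {power:.0%}: {ratio:.2f}x the baseline sample")
```

Everything else held equal, moving from α = 5% and 80% power to α = 1% and 90% power comes close to doubling the required sample.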

CALCULATING SAMPLE SIZE
VARIANCE IN OUTCOME

Less underlying variance → easier to detect a difference → smaller sample

CALCULATING SAMPLE SIZE
VARIANCE IN OUTCOME

How do we know this before we decide our sample size and collect our data?
  Ideal: pre-existing data, but often… non-existent
    Example: EMIS, school census, national assessment
  Can use pre-existing data from a similar population

Makes this a bit of guesswork, not an exact science

FURTHER ISSUES

1. Multiple treatment arms
2. Group-disaggregated results
3. Clustered design
4. Stratification

FURTHER ISSUES
1. MULTIPLE TREATMENT ARMS

Straightforward to compare each treatment separately to the comparison group

To compare multiple treatment groups with each other → larger samples
  Especially if the treatments are very similar, because differences between treatment groups would be smaller
  Like fixing a very small detectable effect size

E.g. Distinguishing between two scholarship amounts

FURTHER ISSUES
2. GROUP-DISAGGREGATED RESULTS

Are effects different for men and women? For different grades?

Estimating differences in treatment impacts (heterogeneous effects) → larger samples
  Especially if the groups are expected to react in a similar way

FURTHER ISSUES
3. CLUSTERED DESIGN

Sampling units are clusters rather than individuals
  Very common in education: the outcome of interest is at the student level, but the sampling/randomization units are villages/schools/classrooms

Example: Impact of teacher training on student test scores

Primary sampling unit       Schools
Secondary sampling unit     Teachers
Outcomes unit               Students

FURTHER ISSUES
3. CLUSTERED DESIGN

Why?
  Minimize or remove contamination
    – E.g. In the deworming program, schools were chosen as the unit because worms are contagious
  Basic feasibility/political considerations
    – E.g. School feeding: cannot include and exclude different students from the same school
  Only natural choice
    – E.g. Any education intervention that affects an entire classroom (e.g. flipcharts, teacher training)

FURTHER ISSUES
3. CLUSTERED DESIGN

Implications of clustering
  Outcomes for all the individuals within a unit may be correlated
    All villagers are exposed to the same weather
    All students share a schoolmaster
    The program affects all students at the same time
    The members of a village interact with each other

The sample size needs to be adjusted for this correlation
  More correlation between outcomes → larger sample

Adequate number of groups!!! (often matters more than the number of individuals per group)
  E.g. You CANNOT randomize at the level of the district, with one treated district and one control district!
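One common way to adjust for within-cluster correlation is the design effect, 1 + ρ(H − 1), where ρ is the intra-cluster correlation and H the number of individuals per cluster. The sketch below assumes equal-sized clusters and an illustrative ρ of 0.2 (a made-up value) and compares two ways of surveying the same 2,000 students. The function names are hypothetical.

```python
def design_effect(icc, cluster_size):
    """Variance inflation from clustering: 1 + rho * (H - 1)."""
    return 1 + icc * (cluster_size - 1)


def effective_sample_size(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(icc, cluster_size)


ICC = 0.2  # assumed intra-cluster correlation, for illustration only

few_big = effective_sample_size(n_clusters=20, cluster_size=100, icc=ICC)
many_small = effective_sample_size(n_clusters=100, cluster_size=20, icc=ICC)

print(f"20 schools x 100 students: worth about {few_big:.0f} independent students")
print(f"100 schools x 20 students: worth about {many_small:.0f} independent students")
```

With the same total number of students, spreading them over more schools carries far more information, which is why the number of groups deserves attention before the number of individuals per group.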

FURTHER ISSUES
4. STRATIFYING

What?
  Sub-populations/blocks defined by the values of the control variables
  Common strata: geography, gender, sector, etc.
  Treatment assignment (or sampling) occurs within these groups

Why?
  Ensures treatment and control groups are balanced
  ↓ sample size because
    ↓ variance of the outcome of interest in each stratum (most when there is high correlation between stratification variables and outcome)
    ↓ correlation of units within clusters

FURTHER ISSUES
4. STRATIFYING

Geography example: What's the impact in a particular region?
  Sometimes hard to say with any confidence

[Figure: units marked T (treatment) and C (control)]

FURTHER ISSUES
4. STRATIFYING

Why do we need strata?
  Random assignment to treatment within geographical units
  Within each unit, ½ will be treatment, ½ will be control

Similar logic for gender, type of school, school size, etc.
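Random assignment within strata can be sketched in a few lines. The school list and region names below are invented for illustration, and `stratified_assignment` is a hypothetical helper: it shuffles the units inside each stratum and puts the first half in treatment and the rest in control (with an odd number of units, the extra one goes to control).

```python
import random
from collections import defaultdict


def stratified_assignment(units, stratum_of, seed=0):
    """Randomly assign half of each stratum to treatment ("T"), the rest to control ("C")."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    assignment = {}
    for stratum in sorted(by_stratum):  # fixed order keeps the result reproducible
        members = by_stratum[stratum]
        rng.shuffle(members)
        half = len(members) // 2
        for position, unit in enumerate(members):
            assignment[unit] = "T" if position < half else "C"
    return assignment


# Hypothetical sampling frame: (school id, region) pairs.
regions = ["North"] * 6 + ["South"] * 4 + ["East"] * 8
schools = [(f"school_{i:02d}", region) for i, region in enumerate(regions)]

assignment = stratified_assignment(schools, stratum_of=lambda school: school[1])

for region in ("North", "South", "East"):
    treated = sum(1 for s, arm in assignment.items() if s[1] == region and arm == "T")
    control = sum(1 for s, arm in assignment.items() if s[1] == region and arm == "C")
    print(f"{region}: {treated} treatment, {control} control")
```

Because the split happens inside each region, no region can end up with all treatment or all control schools by chance.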

SUMMING UP

Your sample size will determine how much you can learn from your IE

Some judgment and guesswork in calculations, but important to spend time on them
  If sample size is too low: waste of time and money
    You will not be able to detect a non-zero impact with any confidence

Questions?

EXAMPLE/EXERCISE

Example: Sampling efficiency

We generated data from a population
  Compute the mean and variance
  Select random samples of different sizes and compute the average
  And see how close to the real population value we get

                      Mean     Standard deviation    Confidence interval (95%)
Population, 100000    61       14.91                 -
Sample, 3000          61.39    15                    [60.84, 61.94]
Sample, 1000          60.86    15.07                 [59.91, 61.80]
Sample, 300           61.77    14.59                 [60.09, 63.45]
Sample, 30            66.73    14.75                 [61.35, 72.11]
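A sketch of how an exercise like this could be reproduced. It assumes a normally distributed population of 100,000 values with mean 61 and standard deviation 15, and a normal-approximation interval of ±1.96 standard errors. The slides do not say how their data or intervals were generated, so the numbers will not match the table exactly. `summarize` is a hypothetical helper name.

```python
import math
import random
import statistics

rng = random.Random(2009)

# Assumed population: 100,000 values, roughly mean 61 and SD 15.
population = [rng.gauss(61, 15) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
print(f"Population, {len(population)}: mean = {pop_mean:.2f}, sd = {statistics.pstdev(population):.2f}")


def summarize(sample):
    """Mean, standard deviation and approximate 95% confidence interval of a sample."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    half_width = 1.96 * sd / math.sqrt(len(sample))
    return mean, sd, (mean - half_width, mean + half_width)


results = {}
for n in (3000, 1000, 300, 30):
    results[n] = summarize(rng.sample(population, n))
    mean, sd, (low, high) = results[n]
    print(f"Sample, {n}: mean = {mean:.2f}, sd = {sd:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The pattern to look for is the one in the table: the sample standard deviation stays close to 15 at every size, while the confidence interval around the mean widens sharply as the sample shrinks.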

EXAMPLE/EXERCISE

Example: Sample size

Country X wishes to improve students' math performance in grade 2. To do so, the Ministry of Education of X decides to distribute new math textbooks that those students can take home. One year earlier, a national test in math indicated that the average test score was 40% with a standard deviation of 19. The national statistics indicate that 15% of the students repeat grade 2. Distributing the textbooks costs on average $125 (cost of the book and distribution). Given that the Minister is unsure of the impact of this program, he would like you to evaluate it.

List the different items that you need in order to determine your sample size. Set the value of those items.