Impact Evaluation Toolbox
Transcript of Impact Evaluation Toolbox
Gautam Rao, University of California, Berkeley

Presentation credit: Temina Madon
Impact Evaluation
1) The “final” outcomes we care about: identify and measure them
Measuring Impacts
INPUTS → OUTPUTS → OUTCOMES → IMPACTS

The difficulty of showing causality increases as you move from inputs to impacts.
Measuring Impacts
HIV Awareness Campaign

INPUTS → OUTPUTS → OUTCOMES → IMPACTS

- Outputs: no. of street plays; users targeted at govt. clinics
- Impacts: knowledge about HIV, sexual behavior, incidence

The difficulty of showing causality increases as you move from outputs to impacts.
Impact Evaluation
1) The “final” outcomes we care about: identify and measure them
2) The true “causal” effect of the intervention
   - Counterfactual: what would have happened in the absence of the intervention?
   - Compare measured outcomes with the counterfactual → causal effect
Toolbox for Impact Evaluation

Non- or Quasi-Experimental:
1) Before vs. After
2) With / Without Program
3) Difference-in-Differences
4) Discontinuity Methods
5) Multivariate Regression
6) Instrumental Variables

Experimental Method (Gold Standard):
7) Randomized Evaluation
Naïve Comparisons
1) Before-After the program
2) With-Without the program
An Example – Millennium Village Project (MVP)
An intensive package of interventions to spark development in rural Africa:
- 2005 to 2010 (and ongoing)
- Non-randomly selected sites
- Poorer villages targeted
Before vs After
With vs. Without Program
MVP villages
Comparison Villages
With a little more data
Before-After Comparison

Compare before and after the intervention:
- A = Observed outcome before (2002)
- A’ = Assumed outcome with no intervention (2004)
- B = Observed outcome after (2004)
- B − A = Estimated impact

[Chart: malnutrition, before (2002: A) vs. after (2004: A’, B)]

We observe that villagers provided with training experience an increase in malnutrition from 2002 to 2004.
With-Without Comparison

Now compare villagers in program region A to those in another region B. We find that our “program” villagers have a larger increase in malnutrition than those in region B.

Did the program have a negative impact?
- Not necessarily (remember program placement!)
  - Region B is closer to the food aid distribution point, so villagers can access food easily (observable)
  - Region A is less collectivized: villagers are less likely to share food with their malnourished neighbors (unobservable)
Problems with Simple Comparisons

Before-After
Problem: many things change over time, not just the project. Omitted variables.

With-Without
Problem: comparing apples with oranges. Why did the enrollees decide to enroll? Selection bias.
Another Example of Selection Bias

Impact of health insurance:
- Can we just compare those with and without health insurance?
  - Who purchases insurance?
  - What happens if we compare the health of those who sign up with those who don’t?
Multivariate Regression

- Change in outcome in the treatment group, controlling for observable characteristics
- Requires theorizing about which observable characteristics may affect the outcome of interest besides the programme
- Issues:
  - How many characteristics can be accounted for? (omitted variable bias)
  - Requires a large sample if many factors are to be controlled for
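A minimal sketch of this idea in Python, on simulated data (all names and numbers are hypothetical): an observable characteristic drives both program take-up and the outcome, so the naive with-without comparison is biased, while a regression that controls for it recovers the effect — provided no unobserved confounders remain, which is exactly the assumption the slide flags.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: 'educ' influences both program take-up and the outcome.
educ = rng.normal(size=n)
program = (educ + rng.normal(size=n) > 0).astype(float)    # selection on an observable
outcome = 2.0 * program + 1.5 * educ + rng.normal(size=n)  # true effect = 2.0

# Naive with-without comparison: biased upward, because enrollees
# have more education on average.
naive = outcome[program == 1].mean() - outcome[program == 0].mean()

# OLS with the observable as a control recovers the effect.
X = np.column_stack([np.ones(n), program, educ])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = beta[1]  # coefficient on the program indicator, ~2.0
```

The bias in `naive` is exactly the omitted-variable term: the education gap between enrollees and non-enrollees times education's effect on the outcome.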
Multivariate Regression

Compare before and after the intervention:
- A’ = Assumed outcome with no intervention
- B = Observed outcome
- B − A = Estimated impact

Control for other factors:
- C = Actual outcome with no intervention (control)
- B − C = True impact

[Chart: malnutrition, before (2002: A) vs. after (2004: A’, B, C)]
To Identify Causal Links:

We need to know:
- The change in outcomes for the treatment group
- What would have happened in the absence of the treatment (the “counterfactual”)

At baseline, the comparison group must be identical (in observable and unobservable dimensions) to those receiving the program.
Creating the Counterfactual

Women aged 14-25 years
What we can observe

Women aged 14-25 years, screened by:
- income
- education level
- ethnicity
- marital status
- employment
What we can’t observe

Women aged 14-25 years

What we can’t observe:
- risk tolerance
- entrepreneurialism
- generosity
- respect for authority
Selection based on Observables
With Program
Without Program
What can happen (unobservables)
With Program
Without Program
Randomization
With Program
Without Program
Randomization
With Program
Without Program
Methods & Study Designs

Study Designs:
- Clustering
- Phased Roll-Out
- Selective Promotion
- Variation in Treatments

Methods Toolbox:
- Randomization
- Regression Discontinuity
- Difference in Differences
- Matching
- Instrumental Variables
Methods Toolbox
Randomization
Use a lottery to give all people an equal chance of being in control or treatment groups
With a large enough sample, it guarantees that all factors/characteristics will be equal between groups, on average
Only difference between 2 groups is the intervention itself
“Gold Standard”
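The lottery logic above can be sketched in a few lines of Python on simulated data (all numbers hypothetical): random assignment balances a baseline trait across arms on average, so a simple difference in means estimates the effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Lottery: every unit has an equal chance of treatment or control.
treatment = rng.permutation(n) < n // 2

# Hypothetical outcome: a baseline trait plus a true effect of 3.0.
trait = rng.normal(size=n)
outcome = 10 + trait + 3.0 * treatment

# Randomization balances the trait between the two groups on average...
balance = trait[treatment].mean() - trait[~treatment].mean()     # ~0

# ...so the difference in mean outcomes is an unbiased effect estimate.
effect = outcome[treatment].mean() - outcome[~treatment].mean()  # ~3
```

With a large enough sample, `balance` shrinks toward zero for observed and unobserved traits alike — that is what makes the design the "gold standard."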
Ensuring Validity

- National population → (random sample) → sample of the national population: EXTERNAL VALIDITY
- Sample → (randomization) → treatment vs. control groups: INTERNAL VALIDITY
Ensuring Validity

Internal validity:
- Have we estimated a causal effect?
- Have we identified a valid counterfactual?

External validity:
- Is our estimate of the impact “generalizable”?
- e.g. will IE results from Kerala generalize to Punjab?
Methods Toolbox: Regression Discontinuity

When you can’t randomize, use a transparent & observable criterion for who is offered a program.

Examples: age, income eligibility, test scores (scholarship), political borders

Assumption: the threshold creates a discontinuity in program participation, but does not otherwise affect the outcomes of interest.
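A common way to estimate the jump at the threshold is a local linear fit on each side of the cutoff. The sketch below uses simulated data with a hypothetical income-eligibility rule (all numbers invented); the bandwidth choice here is arbitrary, where a real analysis would select it with care.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Hypothetical running variable: income; program offered below the cutoff.
income = rng.uniform(0, 100, n)
cutoff = 40.0
treated = income < cutoff
outcome = 0.05 * income + 3.0 * treated + rng.normal(0, 1, n)  # true jump = 3.0

# Local linear fit within a bandwidth on each side of the cutoff; the
# estimated effect is the difference of the two fits at the threshold.
h = 10.0  # bandwidth (chosen by hand here)

def fit_at_cutoff(mask):
    x, y = income[mask] - cutoff, outcome[mask]
    coeffs = np.polyfit(x, y, 1)          # slope and intercept
    return np.polyval(coeffs, 0.0)        # predicted outcome at the cutoff

left = fit_at_cutoff(treated & (income > cutoff - h))    # program side
right = fit_at_cutoff(~treated & (income < cutoff + h))  # non-program side
rd_effect = left - right                                  # ~3.0
```

Note that `rd_effect` is identified only for people near the cutoff — the generalizability limitation discussed below.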
Methods Toolbox: Regression Discontinuity

[Charts: outcomes for Non-Poor vs. Poor on either side of the eligibility threshold, without and with the program’s discontinuity]
Methods Toolbox: Regression Discontinuity

Example: South Africa pensions program

| Age   | Women | Men |
|-------|-------|-----|
| 55-59 | 16%   | 5%  |
| 60-64 | 77%   | 22% |
| 65+   |       | 60% |

What happens to children living in these households?

- GIRLS with pension-eligible women vs. almost-eligible: 0.6 s.d. heavier for age
- BOYS with pension-eligible women vs. almost-eligible: 0.3 s.d. heavier for age
- With pension-eligible men vs. almost-eligible: no difference
Methods Toolbox: Regression Discontinuity

Limitations:
- Poor generalizability: only measures impacts for those near the threshold.
- Must have a well-enforced eligibility rule.
Methods Toolbox: Difference in Differences

A 2-way comparison of observed changes in outcomes (Before-After) for a sample of participants and non-participants (With-Without).

[Chart: Pre vs. Post outcomes for the two groups]
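The 2-way comparison reduces to a few lines of arithmetic. The group means below are hypothetical, not from the slides:

```python
# Hypothetical group means for the 2x2 difference-in-differences table.
means = {
    ("participants", "pre"): 50.0, ("participants", "post"): 65.0,
    ("comparison",   "pre"): 48.0, ("comparison",   "post"): 55.0,
}

# First difference: change over time within each group
# (nets out fixed differences between the groups).
change_participants = means[("participants", "post")] - means[("participants", "pre")]  # 15.0
change_comparison = means[("comparison", "post")] - means[("comparison", "pre")]        #  7.0

# Second difference: nets out the time trend common to both groups.
did_estimate = change_participants - change_comparison                                   #  8.0
```

The estimate is causal only under the assumption stated next: absent the program, both groups would have changed by the same amount.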
Methods Toolbox: Difference in Differences

Big assumption: in the absence of the program, participants and non-participants would have experienced the same changes (“parallel trends”).

Robustness checks: compare the two groups across several periods prior to the intervention; compare variables unrelated to the program pre/post.

Can be combined with randomization, regression discontinuity, and matching methods.
Methods Toolbox: Time Series Analysis

Also called: interrupted time series, trend-break analysis

- Collect data on a continuous variable, across numerous consecutive periods (high frequency), to detect changes over time correlated with the intervention.
- Challenges: the expected pattern should be well-defined upfront; often need to filter out noise.
- Variant: panel data analysis
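A minimal interrupted-time-series sketch on simulated data (all numbers hypothetical): fit a common trend plus a level shift at the intervention date, then read off the estimated shift.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical monthly series: 24 periods before the intervention, 24 after.
t = np.arange(48)
post = (t >= 24).astype(float)
y = 100 + 0.5 * t + 6.0 * post + rng.normal(0, 1, 48)  # true level shift = 6.0

# Segmented regression: intercept, common linear trend, and a level
# shift ("trend break") at the interruption.
X = np.column_stack([np.ones(48), t, post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
level_shift = beta[2]  # ~6.0
```

A richer specification would also allow the slope to change after the interruption; this sketch estimates only the jump in level.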
Methods Toolbox: Matching

Pair each program participant with one or more non-participants, based on observable characteristics.

Big assumption required: in the absence of the program, participants and non-participants would have been the same. (No unobservable traits influence participation.)

Tip: combine with difference in differences.
Methods Toolbox: Matching

A propensity score is assigned to each unit:

score(x_i) = Pr(Z_i = 1 | X_i = x_i)

where Z = treatment assignment and x = observed covariates.

- Requires Z to be independent of outcomes given x…
- Works best when x is related to both outcomes and selection (Z)
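Given estimated propensity scores, the simplest version is nearest-neighbor matching. The sketch below uses invented scores and outcomes; in practice the scores would come from a logistic regression of Z on the observed covariates X.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical propensity scores (in practice: fitted Pr(Z=1 | X=x)).
score_t = rng.uniform(0.3, 0.8, 50)    # participants
score_c = rng.uniform(0.1, 0.9, 500)   # pool of non-participants

# Outcomes depend on the covariates (through the score) plus a true
# program effect of 1.0 for participants.
y_t = 5.0 + 2.0 * score_t + 1.0 + rng.normal(0, 0.1, 50)
y_c = 5.0 + 2.0 * score_c + rng.normal(0, 0.1, 500)

# Nearest-neighbor matching: pair each participant with the
# non-participant whose propensity score is closest.
match = np.abs(score_t[:, None] - score_c[None, :]).argmin(axis=1)

# Average treatment effect on the treated (ATT), ~1.0 here.
att = (y_t - y_c[match]).mean()
```

This recovers the effect only because nothing unobserved drives participation in the simulation — the "big assumption" above.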
Methods Toolbox: Matching

Often need large data sets for baseline and follow-up:
- Household data
- Environmental & ecological data (context)

Problems with ex-post matching: more bias with “covariates of convenience” (i.e. age, marital status, race, sex).

Match prospectively if possible!
Methods Toolbox: Instrumental Variables

Example: we want to learn the effect of access to health clinics on maternal mortality.

Find a variable – an “instrument” – which affects access to health clinics but does not affect maternal mortality directly, e.g. changes in the distance from home to the nearest clinic.
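A sketch of the two-stage least squares (2SLS) logic on simulated data; the variable names and numbers are hypothetical. An unobserved trait confounds the naive comparison, but the instrument (distance) isolates variation in access that is unrelated to the confounder.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Hypothetical data: unobserved health awareness drives both clinic
# access and mortality, confounding the naive comparison.
awareness = rng.normal(size=n)
distance = rng.uniform(1, 20, n)  # the instrument
access = (awareness - 0.1 * distance + rng.normal(size=n) > -1).astype(float)
mortality = 10 - 2.0 * access - 1.0 * awareness + rng.normal(size=n)  # true effect = -2.0

# Naive comparison: biased, since high-awareness people seek access anyway.
naive = mortality[access == 1].mean() - mortality[access == 0].mean()

# 2SLS. First stage: predict access from the instrument.
X1 = np.column_stack([np.ones(n), distance])
gamma, *_ = np.linalg.lstsq(X1, access, rcond=None)
access_hat = X1 @ gamma

# Second stage: regress mortality on the predicted (exogenous) access.
X2 = np.column_stack([np.ones(n), access_hat])
beta, *_ = np.linalg.lstsq(X2, mortality, rcond=None)
iv_effect = beta[1]  # ~ -2.0
```

The estimate is valid only if distance affects mortality solely through access (the exclusion restriction), which the code cannot test — it must be argued.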
Overview of Study Designs

Study Designs:
- Clusters
- Phased Roll-Out: not everyone has access to the intervention at the same time (supply variation)
- Selective Promotion: the program is available to everyone (universal access or already rolled out)
- Variation in Treatments
Study Designs: Cluster vs. Individual

- Cluster when spillovers or practical constraints prevent individual randomization.
- But it is easier to get a big enough sample if we randomize individuals.

[Diagram: individual randomization vs. group randomization]
Issues with Clusters

- Assignment at a higher level is sometimes necessary:
  - Political constraints on differential treatment within a community
  - Practical constraints: confusing for one person to implement different versions
  - Spillover effects may require higher-level randomization
- Many groups are required because of within-community (“intraclass”) correlation
- Group assignment may also require many observations (measurements) per group
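The intraclass-correlation point can be quantified with the standard design effect, DEFF = 1 + (m − 1)ρ, where m is the cluster size and ρ the intraclass correlation. A small illustration (the cluster sizes and ρ below are hypothetical):

```python
def design_effect(cluster_size, icc):
    """Variance inflation from randomizing groups instead of individuals."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_individuals, cluster_size, icc):
    """Individually-randomized sample size with the same precision."""
    return n_individuals / design_effect(cluster_size, icc)

# e.g. 40 schools of 50 pupils with intraclass correlation 0.10:
# 2000 pupils carry roughly the information of ~339 independent pupils.
n_eff = effective_sample_size(2000, 50, 0.10)
```

This is why cluster designs need many groups: adding pupils within a school buys far less precision than adding schools.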
Example: Deworming

School-based deworming program in Kenya:
- Treatment of individuals affects the untreated (spillovers) by interrupting oral-fecal transmission
- Cluster at the school level instead of the individual
- Outcome measure: school attendance

Want to find the cost-effectiveness of the intervention even if not all children are treated.

Findings: 25% decline in absenteeism.
Possible Units for Assignment

- Individual
- Household
- Clinic
- Hospital
- Village
- Women’s association
- Youth group
- School
Unit of Intervention vs. Analysis

20 intervention villages vs. 20 comparison villages
- Target population → (random sample) → evaluation sample
- Evaluation sample → (random assignment) → treatment and control groups (unit of intervention/randomization)
- Within groups → (random sample again?) → unit of analysis
Study Designs: Phased Roll-Out

Use when a simple lottery is impossible (because no one can be excluded) but you have control over timing.
Phased Roll-Out

Clusters are phased in one period at a time, so the not-yet-treated clusters serve as the comparison group:

| Cluster | Period A | Period B | Period C | Period D |
|---------|----------|----------|----------|----------|
| 1       | –        | Program  | Program  | Program  |
| 2       | –        | –        | Program  | Program  |
| 3       | –        | –        | –        | Program  |
Study Designs: Randomized Encouragement

Randomize who gets an extra push to participate in the program. Among the population:
- Some already have the program
- Some are unlikely to join the program regardless
- Some are likely to join the program if pushed
Evaluation Toolbox: Variation in Treatment
Advantages of “Experiments”

- Clear and precise causal impact
- Relative to other methods:
  - Much easier to analyze
  - Can be cheaper (smaller sample sizes)
  - Easier to explain
  - More convincing to policymakers
  - Methodologically uncontroversial
When to think impact evaluation?

- EARLY! Plan evaluation into your program design and roll-out phases.
- When you introduce a CHANGE into the program:
  - New targeting strategy
  - Variation on a theme
  - Creating an eligibility criterion for the program
- Scale-up is a good time for impact evaluation: the sample size is larger.
When is prospective impact evaluation impossible?

- Treatment was selectively assigned and announced, and there is no possibility for expansion
- The program is over (retrospective)
- Universal take-up already
- The program is national and non-excludable, e.g. freedom of the press, exchange rate policy (though sometimes some components can be randomized)
- Sample size is too small to make it worth it