www.worldbank.org/ieinpractice
The World Bank
Human Development Network
Spanish Impact Evaluation Fund
Laura Rawlings
Safety Nets Core Course, December 2011
This material builds on Gertler, P. J.; Martinez, S., Premand, P., Rawlings, L. B. and Christel M. J. Vermeersch, 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.
MEASURING IMPACT
Impact Evaluation for Policy Makers
Outline
Why and What to Evaluate?
The Causal Inference Problem
Where do comparison groups come from?
Randomized assignment
Illustrations
Why Evaluate?
Credible evidence as a foundation for…
1 Results-based management
Governments and managers are being judged by their programs' performance, not their control of inputs. Shift in focus from inputs to outcomes, from threat to tools.
2 Knowledge on development effectiveness
Need evidence on what works
Improve program/policy implementation
3 Accountability and transparency
Between government and civil society
Between programs and beneficiaries
4 Information key to sustainability
Budget negotiations, program sustainability across political administrations
Informing beliefs and the press
Monitoring vs. Evaluation
Dimension | Monitoring | Evaluation
Frequency | Regular, continuous | Periodic
Coverage | All programs | Selected programs, aspects
Data | Universal | Sample based
Depth of information | Tracks implementation, looks at WHAT | Tailored, often to performance and impact, looks at WHY
Cost | Cost spread out | Can be high
Utility | Continuous program improvement, management | Major program decisions
Evaluations
A systematic, objective assessment of an on-going or
completed project, program, or policy, its design,
implementation and/or results, asking
o Descriptive Questions to determine what is taking place and describe aspects of a process.
o Normative Questions to compare what is taking place to what should be taking place.
o Cause-and-Effect Questions to examine outcomes and assess what difference the intervention makes in outcomes.
Impact Evaluation Answers
What is the effect of a public works program on beneficiaries' income?
Does providing skills training improve employment of disadvantaged youth?
Are vocational training or business grant programs more effective in improving rural households' risk management?
Is it necessary to complement an entrepreneurship training intervention with transfers to foster business creation?
Impact Evaluation
An assessment of the causal effect of a project, program or policy on beneficiaries. Uses a counterfactual…
o to estimate what the state of the beneficiaries would have been in the absence of the program (the control or comparison group), compared to the observed state of beneficiaries (the treatment group), and
o to determine intermediate or final outcomes attributable to the intervention.
Prospective vs Retrospective Evaluation
Retrospective Evaluation is necessary when we have to work with a pre-assigned program (expanding an existing program) and existing data (baseline?)
In Prospective Evaluation, the evaluation is designed in parallel with the assignment of the program, and the baseline data can be gathered.
When to use Impact Evaluation?
Evaluate impact selectively, when project is:
Innovative
Replicable/scalable
Strategically relevant
Evaluation will fill knowledge gap
Substantial policy impact
Use evaluation within a program to test alternatives and improve programs
Beyond "does my program work?"
Towards "which design is more effective?"
Impact Evaluation 2.0
Using a Theory of Change to derive a well-defined policy question
Results chains are a simple approach to mapping the causal logic/theory of change underpinning a program, answering 3 questions:
What are the intended results of the program?
How will we achieve the intended results?
How will we know we have achieved the intended results?
Typical Results Chain
Inputs
• Financial, human, and other resources mobilized to support activities
• Budgets, staffing, other available resources
Activities
• Actions taken or work performed to convert inputs into specific outputs
• Series of activities undertaken to produce goods and services
Outputs
• Products resulting from converting inputs into tangible outputs
• Goods and services produced and delivered, under the control of the implementing agency
Outcomes
• Changes resulting from use of outputs by targeted population
• Not fully under the control of implementing agency
Final Outcomes
• The final objective of the program
• Long-term goals
• Changes in outcomes with multiple drivers
Implementation (SUPPLY SIDE): Inputs, Activities, Outputs
Results (DEMAND + SUPPLY): Outcomes, Final Outcomes
Public Works Program Results Chain Example
Inputs
• Budget for PW Program
• Ministry of Labor staff
• Staff from participating municipalities
Activities
• Setting of sub-minimum wage
• Information campaign
• Enrollment
• Selection of sites, contracting and training of PW operators
Outputs (Annual)
• 50,000 jobs
• $1,000,000 in wages
• > 75% of program costs transferred as wages
• 2,000 PW subprojects produced
Outcomes
• Net income transfer to households
• Skills acquired
• Utility, maintenance of PWs
Final Outcomes
• Income, employment
• Beneficiary households:
- income, assets
- health, nutrition
- education
• Aggregate unemployment, poverty
Implementation (SUPPLY SIDE): Inputs, Activities, Outputs
Results (DEMAND + SUPPLY): Outcomes, Final Outcomes
Causal Inference
Counterfactuals
False Counterfactuals
Before & After (Pre & Post)
Enrolled & Not Enrolled (Apples & Oranges)
Our Objective
Estimate the causal effect (impact) of intervention (P) on outcome (Y).
(P) = Program or Treatment
(Y) = Outcome Indicator, Measure of Success
Example: What is the effect of a Public Works Program (P) on Household Consumption (Y)?
Problem of Missing Data
For a program beneficiary:
α = (Y | P=1) - (Y | P=0)
we observe (Y | P=1): Household Consumption (Y) with a public works program (P=1)
but we do not observe (Y | P=0): Household Consumption (Y) without a public works program (P=0)
Solution
Estimate what would have happened to Y in the absence of P.
We call this the Counterfactual.
Estimating impact of P on Y
α = (Y | P=1) - (Y | P=0)
IMPACT = Outcome with treatment - counterfactual
OBSERVE (Y | P=1): Outcome with treatment
o Intention to Treat (ITT) – Those offered treatment
o Treatment on the Treated (TOT) – Those receiving treatment
ESTIMATE (Y | P=0): The Counterfactual
o Use comparison or control group
Example: What is the Impact of…
giving Fulanito additional pocket money (P)
on Fulanito's consumption of candies (Y)?
In reality, use statistics
Group | Average Y
Treatment | 6 candies
Comparison | 4 candies
IMPACT = 6 - 4 = 2 candies
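The comparison of group averages above can be sketched in a few lines of Python. The individual candy counts are hypothetical, chosen only so that the group averages match the 6 and 4 candies on the slide; the function name is also an illustration, not part of the original material.

```python
from statistics import mean

def estimated_impact(treatment_outcomes, comparison_outcomes):
    """Impact estimate: average outcome with treatment minus the
    average outcome in the comparison group (the estimated counterfactual)."""
    return mean(treatment_outcomes) - mean(comparison_outcomes)

# Hypothetical candy counts per child (averages are 6 and 4).
treatment = [5, 6, 7, 6]
comparison = [3, 4, 5, 4]

impact = estimated_impact(treatment, comparison)
print(impact)
```

The estimate is only as good as the comparison group: it equals the causal effect when the comparison children are valid "clones" of the treated ones.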
Finding good comparison groups
We want to find clones for the Fulanitos in our
programs.
The treatment and comparison groups should
o have identical characteristics
o except for benefiting from the intervention.
In practice, use program eligibility & assignment
rules to construct valid estimates of the
counterfactuals
Counterfeit Counterfactual #1: Before & After
What is the effect of a Public Works program (P) on consumption (Y)?
(1) Observe only beneficiaries (P=1)
(2) Two observations in time: consumption at T=0 (1997) and consumption at T=1 (1998)
[Figure: consumption (Y) against time. B = 233 at T=0 (1997); A = 268 at T=1 (1998).]
IMPACT = A - B = α = $35
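Case 1: What's the problem?
[Figure: consumption (Y) against time. B = 233 at T=0; A = 268 at T=1; A - B = α = $35. C and D are two unknown levels at T=1, each marked "Impact?".]
Economic Boom:
o Real Impact = A - C
o A - B is an overestimate
Economic Recession:
o Real Impact = A - D
o A - B is an underestimate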
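A small numeric sketch of why the before-and-after comparison misleads. A = 268 and B = 233 come from the slide; the counterfactual levels C and D are not given in the original, so the values below are hypothetical and serve only to show the direction of the bias.

```python
def impact(outcome_with_program, counterfactual):
    """Impact = observed outcome minus the outcome without the program."""
    return outcome_with_program - counterfactual

B = 233.0  # consumption before the program (T=0), from the slide
A = 268.0  # consumption after the program (T=1), from the slide

# Before & After uses B as if it were the counterfactual at T=1.
before_after = impact(A, B)

# Hypothetical true counterfactuals at T=1 (assumed values):
C = 250.0  # boom: consumption would have risen even without the program
D = 220.0  # recession: consumption would have fallen without the program

real_impact_boom = impact(A, C)
real_impact_recession = impact(A, D)

print(before_after, real_impact_boom, real_impact_recession)
```

In the boom scenario the before-and-after figure exceeds the real impact, and in the recession scenario it falls short, which is the pattern the slide describes.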
Counterfeit Counterfactual #2: Enrolled & Not Enrolled
If we have post-treatment data on
o Enrolled: treatment group
o Not-enrolled: "control" group (counterfactual)
- Those ineligible to participate.
- Or those that choose NOT to participate.
Selection Bias
o Reason for not enrolling may be correlated with outcome (Y)
- Control for observables.
- But not un-observables!
o Estimated impact is confounded with other things.
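Selection bias can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: an unobserved "motivation" trait raises both the chance of enrolling and the outcome itself, and the true program effect is fixed at 2.0. None of these numbers come from the original material.

```python
import random
from statistics import mean

def enrolled_vs_not_enrolled(n=20000, true_effect=2.0, seed=0):
    """Simulate self-selection and return the naive difference in means."""
    rng = random.Random(seed)
    enrolled, not_enrolled = [], []
    for _ in range(n):
        motivation = rng.gauss(0, 1)            # unobserved characteristic
        chooses_to_enroll = motivation + rng.gauss(0, 1) > 0
        outcome_without_program = 10 + 3 * motivation + rng.gauss(0, 1)
        if chooses_to_enroll:
            enrolled.append(outcome_without_program + true_effect)
        else:
            not_enrolled.append(outcome_without_program)
    return mean(enrolled) - mean(not_enrolled)

TRUE_EFFECT = 2.0
naive_estimate = enrolled_vs_not_enrolled(true_effect=TRUE_EFFECT)
print(round(naive_estimate, 2), "vs true effect", TRUE_EFFECT)
```

Because the more motivated units enroll, the naive comparison attributes their higher baseline outcomes to the program. Controlling for observables would not help here, since motivation is not observed.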
Keep in Mind
B&A (Before & After)
Compare: Same individuals before and after they receive P.
Problem: Other things may have happened over time.
E&NE (Enrolled & Not Enrolled)
Compare: Group of individuals enrolled in a program with group that chooses not to enroll.
Problem: Selection bias. We don't know why they are not enrolled.
Both counterfactuals may lead to biased estimates of the counterfactual and the impact.
IE Methods Toolbox
Randomized Assignment
Randomized Promotion
Discontinuity Design
Difference-in-Differences (DD)
Matching (P-Score matching)
IE Methods Toolbox
All impact evaluations estimate the counterfactual, using control or comparison groups: What would the treatment group be like in the absence of the program?
1. Experimental/Randomized Assignment
- uses randomized assignment to determine who gets program treatment(s) and who is control among eligible beneficiaries
- can be used ethically in cases where program cannot reach all potential beneficiaries at once; or to test program alternatives
- random assignment creates statistically equivalent groups (treatment and control) which allows a valid estimate of the counterfactual
2. Quasi-Experimental
- mimics experimental designs
- methods to create comparison groups include:
• Randomized promotion
• Regression Discontinuity
• Differences in Differences
• Statistical Matching
Method is derived from rules of program operation
Where Do Comparison Groups come from?
The rules of program operation determine the evaluation strategy.
We can almost always find a valid comparison group if:
o the operational rules for selecting beneficiaries are equitable, transparent and accountable;
o the evaluation is designed prospectively.
5 methods in IE Toolbox
5 methods in IE toolbox take different approaches to generate comparison groups and estimate the counterfactual:
1 Randomized Assignment
2 Randomized Promotion
3 Regression Discontinuity Design (RDD)
4 Difference-in-Differences (DD)
5 Matching
Choosing your IE method(s)
Choose the best possible design given the operational context:
Best Design
o Best comparison group you can find + least operational risk
Have we controlled for everything?
o Internal validity
o Good comparison group
Is the result valid for everyone?
o External validity
o Evaluation results apply to population we're interested in
Choosing an IE design for your program
Use opportunities to generate good comparison
groups and ensure baseline data is collected.
3 questions to determine which method is
appropriate for a given program
Money: Does the program have sufficient resources to
achieve scale and reach full coverage of all eligible
beneficiaries?
Targeting Rules: Who is eligible for program benefits? Is the
program targeted based on an eligibility cut-off or is it
available to everyone?
Timing: How are potential beneficiaries enrolled in the
program – all at once or in phases over time?
Choosing your IE method(s)
Timing | Excess demand, Targeted | Excess demand, Universal | No excess demand, Targeted | No excess demand, Universal
Phased Roll-out | 1 Randomized assignment; 4 RDD | 1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching | 1 Randomized assignment; 4 RDD | 1 Randomized assignment to phases; 2 Randomized promotion to early take-up; 3 DD with 5 Matching
Immediate Roll-out | 1 Randomized assignment; 4 RDD | 1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching | 4 RDD | If less than full take-up: 2 Randomized promotion; 3 DD with 5 Matching
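The matrix can also be read as a lookup from the three operational questions (money, targeting rules, timing) to candidate methods. The sketch below encodes the cells as they appear on the slide; the dictionary layout, key format and function name are illustrative choices, not part of the original material.

```python
# Keys: (excess_demand, targeted, rollout), rollout is "phased" or "immediate".
METHODS_BY_RULES = {
    (True, True, "phased"): ["Randomized assignment", "RDD"],
    (True, False, "phased"): ["Randomized assignment", "Randomized promotion",
                              "DD with matching"],
    (False, True, "phased"): ["Randomized assignment", "RDD"],
    (False, False, "phased"): ["Randomized assignment to phases",
                               "Randomized promotion to early take-up",
                               "DD with matching"],
    (True, True, "immediate"): ["Randomized assignment", "RDD"],
    (True, False, "immediate"): ["Randomized assignment", "Randomized promotion",
                                 "DD with matching"],
    (False, True, "immediate"): ["RDD"],
    # This cell applies only if take-up is less than full.
    (False, False, "immediate"): ["Randomized promotion", "DD with matching"],
}

def candidate_methods(excess_demand, targeted, rollout):
    """Return the candidate IE methods for a program's operational rules."""
    return METHODS_BY_RULES[(excess_demand, targeted, rollout)]

# Example: a targeted program with no excess demand, rolled out immediately.
print(candidate_methods(excess_demand=False, targeted=True, rollout="immediate"))
```

A lookup like this only narrows the options; the final choice still depends on which design gives the best comparison group with the least operational risk.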
Randomized Treatments & Controls
Eligibles > Number of Benefits
o Randomize!
o Lottery for who is offered benefits
o Fair, transparent and ethical way to assign benefits to equally deserving populations.
Oversubscription
o Give each eligible unit the same chance of receiving treatment
o Compare those offered treatment with those not offered treatment (controls).
Randomized Phase In
o Give each eligible unit the same chance of receiving treatment first, second, third…
o Compare those offered treatment first, with those offered later (controls).
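The two lotteries described above can be sketched as follows. This is a minimal illustration, assuming eligible units are identified by simple IDs; the function names and the use of a fixed seed (so the draw can be reproduced and audited) are assumptions, not part of the original material.

```python
import random

def lottery(eligible_ids, n_benefits, seed=None):
    """Oversubscription: every eligible unit has the same chance of being
    offered treatment. Returns (offered, controls)."""
    rng = random.Random(seed)
    shuffled = list(eligible_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_benefits], shuffled[n_benefits:]

def randomized_phase_in(eligible_ids, n_phases, seed=None):
    """Phase-in: every eligible unit has the same chance of being treated
    first, second, third... Returns one list of IDs per phase."""
    rng = random.Random(seed)
    shuffled = list(eligible_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_phases] for i in range(n_phases)]

# Hypothetical example: 100 eligible units, 30 benefits, 3 phases.
offered, controls = lottery(range(100), n_benefits=30, seed=1)
phases = randomized_phase_in(range(100), n_phases=3, seed=1)
print(len(offered), len(controls), [len(p) for p in phases])
```

In the phase-in design, units scheduled for later phases serve as controls for the earlier ones until they themselves enter the program.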
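Randomized treatments and controls
[Figure: units marked as eligible or ineligible. 1. Population; 2. Evaluation sample; 3. Randomize treatment (treatment and comparison). External Validity is labeled between steps 1 and 2; Internal Validity is labeled at step 3.]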
Unit of Randomization
Choose according to type of program
o Individual/Household
o School/Health Clinic/catchment area
o Block/Village/Community
o Ward/District/Region
Keep in mind
o Need "sufficiently large" number of units to detect minimum desired impact: Power.
o Spillovers/contamination
o Operational and survey costs
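To give a sense of what "sufficiently large" means, the sketch below uses the standard normal-approximation formula for comparing two group means, n per group = 2 (z_{1-α/2} + z_{power})² σ² / δ². It assumes individual-level randomization, equal group sizes and no clustering; the function name and the example numbers are hypothetical. Randomizing at the school or village level would require more units than this formula suggests.

```python
from math import ceil
from statistics import NormalDist

def units_per_group(min_impact, outcome_sd, alpha=0.05, power=0.80):
    """Units needed in each group to detect `min_impact` with a two-sided
    test at level `alpha` and the given power (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * outcome_sd / min_impact) ** 2
    return ceil(n)

# Hypothetical: detect an impact of half a standard deviation.
n_large_impact = units_per_group(min_impact=0.5, outcome_sd=1.0)
# Detecting a smaller impact requires many more units.
n_small_impact = units_per_group(min_impact=0.25, outcome_sd=1.0)
print(n_large_impact, n_small_impact)
```

The required sample grows with the square of the ratio between outcome variability and the minimum impact to be detected, which is why small expected impacts are costly to evaluate.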
Randomized Assignment
How do we know we have
good clones?
In the absence of the program,
treatment and comparisons should be
identical
Let's compare their characteristics at baseline (T=0)
Case 1: Entrepreneurship Promotion in Tunisia
Context (Premand et al 2011)
The Problem – High Youth Unemployment in Tunisia:
23% among all higher education graduates in 2009
Persistent over time (46% 18 months after graduation)
Objectives:
Better align skills of recent graduates with private sector needs
Foster entrepreneurship among cohort of young graduates
Intervention:
National reform of last semester of undergraduate studies
In lieu of standard curriculum, opportunity to (i) participate in entrepreneurship training, (ii) receive individual coaching, (iii) write a business plan and (iv) participate in a business plan competition
Case 1: Entrepreneurship Promotion in Tunisia
Innovative program with prospective evaluation design
Evaluation question:
Does entrepreneurship training increase employment among university graduates?
Evaluation design:
Registration open to all 3rd year students
Oversubscription – no clear means to prioritize needs
Individual randomization (computer-based, stratification by gender and type of major)
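A minimal sketch of computer-based individual randomization stratified by gender and type of major. The slides do not state the share assigned to treatment, so the 50/50 split within each stratum is an assumption, as are the field names, the seed and the applicant data.

```python
import random

def stratified_assignment(students, seed=0):
    """Randomize within each (gender, major) stratum; half of each stratum
    (rounded down) goes to treatment. The 50/50 split is an assumption."""
    rng = random.Random(seed)
    strata = {}
    for student in students:
        key = (student["gender"], student["major"])
        strata.setdefault(key, []).append(student["id"])
    assignment = {}
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)
        n_treated = len(ids) // 2
        for position, student_id in enumerate(ids):
            group = "treatment" if position < n_treated else "comparison"
            assignment[student_id] = group
    return assignment

# Hypothetical applicants: 2 genders x 2 types of major x 4 students each.
students = []
for gender in ("F", "M"):
    for major in ("business", "science"):
        for _ in range(4):
            students.append({"id": len(students), "gender": gender, "major": major})

assignment = stratified_assignment(students)
print(assignment)
```

Stratifying guarantees that treatment and comparison groups have the same mix of gender and major by construction, instead of relying on chance alone.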
Case 1: Entrepreneurship Promotion in Tunisia
Randomized Assignment
[Figure: 1. Population (3rd year university students; ineligible units are also marked); 2. Students interested in participating (1702 students); 3. Randomize treatment: Treatment and Comparison.]
Case 1: Entrepreneurship Promotion in Tunisia
How do we know we have good clones?
Randomized assignment worked in achieving balance…
Timeline: Baseline in December 2009, intervention in 2010, final evaluation results 2011
Variable | Control | Treatment | Diff | Std. Err.
Male | 34% | 33% | -0.01 | 0.01
Age | 22.97 | 23.08 | 0.11 | 0.09
Average grade in 2nd year | 11.46 | 11.51 | 0.05 | 0.05
Has work experience | 69.7% | 71.7% | 0.02 | 0.02
Has work experience related to business plan | 61.0% | 63.0% | 0.02 | 0.03
Knows an entrepreneur | 59.0% | 63.1% | 0.04 | 0.03
Is willing to take risk | 0.94 | 0.95 | 0.93 | -0.02
Household size | 6.46 | 6.48 | 0.03 | 0.05
Father is employed | 36.5% | 33.6% | -0.03 | 0.02
Father is self-employed | 27.5% | 27.9% | 0.00 | 0.01
Mother is employed | 9.5% | 9.5% | 0.00 | 0.01
Mother is self-employed | 7.2% | 8.2% | 0.01 | 0.02
Monthly income between 0 and 300 DT | 23.9% | 25.1% | 0.01 | 0.02
Monthly income between 301 and 500 DT | 30.1% | 29.2% | -0.01 | 0.02
Monthly income between 501 and 1000 DT | 21.6% | 19.5% | -0.02 | 0.01
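One common way to read a balance table is to divide each difference by its standard error and check whether the ratio is small. The sketch below does this for three rows of the table, assuming the last column is the standard error of the difference (the slide does not say so explicitly) and using the conventional 1.96 threshold for a 5% two-sided test. The function name is illustrative.

```python
CRITICAL_VALUE = 1.96  # conventional threshold for a 5% two-sided test

def t_ratio(diff, std_err):
    """Ratio of a treatment-control difference to its standard error."""
    return diff / std_err

# (Diff, Std. Err.) for three baseline characteristics from the table.
rows = {
    "Age": (0.11, 0.09),
    "Average grade in 2nd year": (0.05, 0.05),
    "Household size": (0.03, 0.05),
}

balanced = {
    name: abs(t_ratio(diff, std_err)) < CRITICAL_VALUE
    for name, (diff, std_err) in rows.items()
}
for name, ok in balanced.items():
    print(name, "balanced" if ok else "check")
```

With many characteristics, a few differences can look significant by chance even when randomization worked, so the overall pattern matters more than any single row.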
Keep in Mind
Randomized Assignment
Randomized assignment, with large enough samples, produces 2 statistically equivalent groups.
We have identified the perfect clone.
[Figure: randomized beneficiary and randomized comparison]
Feasible for prospective evaluations with over-subscription/excess demand.
Most pilots and new programs fall into this category.
Consider evaluating relative effectiveness of alternative program design options.
Remember
The objective of impact evaluation
is to estimate the causal effect or
impact of a program on outcomes
of interest.
Remember
To estimate impact, we need to estimate the counterfactual.
o what would have happened in the absence of the program and
o use comparison or control groups.
References (Methods)
Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B. and Christel M. J. Vermeersch, 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). Spanish and French version forthcoming.
Khandker, Shahidur R., Gayatri B. Koolwal, and Hussain A. Samad, 2009, Handbook on Impact Evaluation: Quantitative Methods and Practices, The World Bank, Washington DC.
Fitzsimons, E. and M. Vera-Hernández, 2009, "A Practitioner's Guide to Evaluating the Impact of Labor Market Programs", Employment Policy Primer N. 12, World Bank, Washington DC.