The World Bank Human Development Network Spanish Impact Evaluation Fund
www.worldbank.org/ieinpractice

Laura Rawlings
Safety Nets Core Course, December 2011

This material builds on Gertler, P. J., S. Martinez, P. Premand, L. B. Rawlings, and C. M. J. Vermeersch, 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

MEASURING IMPACT

Impact Evaluation for Policy Makers

Outline

Why and What to Evaluate?

The Causal Inference Problem

Where do comparison groups come from?

Randomized assignment

Illustrations

Why Evaluate?

Credible evidence as a foundation for…

1. Results-based management: Governments and managers are being judged by their programs' performance, not their control of inputs. The focus shifts from inputs to outcomes, and evaluation shifts from threat to tool. Evidence helps improve program/policy implementation.

2. Knowledge on development effectiveness: We need evidence on what works, both to inform beliefs and the press.

3. Accountability and transparency: Between government and civil society, and between programs and beneficiaries.

4. Information key to sustainability: Budget negotiations, and program sustainability across political administrations.

Monitoring vs. Evaluation

Monitoring Evaluation

Frequency Regular, Continuous Periodic

Coverage All programs Selected program, aspects

Data Universal Sample based

Depth of

InformationTracks implementation,

looks at WHAT

Tailored, often to performance

and impact/ WHY

Cost Cost spread out Can be high

UtilityContinuous program

improvement, managementMajor program decisions

Evaluations

A systematic, objective assessment of an on-going or completed project, program, or policy, its design, implementation and/or results, asking:

o Descriptive questions, to determine what is taking place and describe aspects of a process.

o Normative questions, to compare what is taking place to what should be taking place.

o Cause-and-effect questions, to examine outcomes and assess what difference the intervention makes to outcomes.

Impact Evaluation Answers

What is the effect of a public works program on beneficiaries' income?

Does providing skills training improve employment of disadvantaged youth?

Are vocational training or business grant programs more effective in improving rural households' risk management?

Is it necessary to complement an entrepreneurship training intervention with transfers to foster business creation?

Impact Evaluation

An assessment of the causal effect of a project, program or policy on beneficiaries. Uses a counterfactual…

o to estimate what the state of the beneficiaries would have been in the absence of the program (the control or comparison group), compared to the observed state of beneficiaries (the treatment group), and

o to determine the intermediate or final outcomes attributable to the intervention.

Prospective vs. Retrospective Evaluation

A retrospective evaluation is necessary when we must work with a pre-assigned program (e.g., expanding an existing program) and existing data (is there a baseline?).

In a prospective evaluation, the evaluation is designed in parallel with the assignment of the program, and baseline data can be gathered.

When to Use Impact Evaluation?

Evaluate impact selectively, when the project is:

Innovative
Replicable/scalable
Strategically relevant
Filling a knowledge gap
Of substantial policy impact

Use evaluation within a program to test alternatives and improve programs:

Beyond "does my program work?"
Towards "which design is more effective?"

Impact Evaluation 2.0

Use a theory of change to derive a well-defined policy question.

A results chain is a simple approach to mapping the causal logic/theory of change underpinning a program, answering three questions:

What are the intended results of the program?
How will we achieve the intended results?
How will we know we have achieved the intended results?

Typical Results Chain

Implementation (SUPPLY SIDE):

Inputs: Financial, human, and other resources mobilized to support activities; budgets, staffing, other available resources.

Activities: Actions taken or work performed to convert inputs into specific outputs; the series of activities undertaken to produce goods and services.

Outputs: Products resulting from converting inputs into tangible outputs; goods and services produced and delivered, under the control of the implementing agency.

Results (DEMAND + SUPPLY):

Outcomes: Changes resulting from use of outputs by the targeted population; not fully under the control of the implementing agency.

Final Outcomes: The final objective of the program; long-term goals; changes in outcomes with multiple drivers.

Public Works Program Results Chain Example

Implementation (SUPPLY SIDE):

Inputs: Budget for PW program; Ministry of Labor staff; staff from participating municipalities.

Activities: Setting of a sub-minimum wage; information campaign; enrollment; selection of sites, contracting and training of PW operators.

Outputs (annual): 50,000 jobs; $1,000,000 in wages; >75% of program costs transferred as wages; 2,000 PW subprojects produced.

Results (DEMAND + SUPPLY):

Outcomes: Net income transfer to households; skills acquired; utility and maintenance of PWs.

Final Outcomes: Income and employment. For beneficiary households: income, assets, health, nutrition, education. Aggregate unemployment and poverty.

Causal Inference

Counterfactuals

False Counterfactuals

Before & After (Pre & Post)

Enrolled & Not Enrolled (Apples & Oranges)

Our Objective

Estimate the causal effect (impact) of an intervention (P) on an outcome (Y).

(P) = Program or treatment
(Y) = Outcome indicator, measure of success

Example: What is the effect of a Public Works Program (P) on Household Consumption (Y)?

Causal Inference

What is the impact of (P) on (Y)?

α = (Y | P=1) - (Y | P=0)

Can we all go home?

Problem of Missing Data

For a program beneficiary:

α = (Y | P=1) - (Y | P=0)

We observe (Y | P=1): household consumption (Y) with a public works program (P=1).

But we do not observe (Y | P=0): household consumption (Y) without a public works program (P=0).

Solution: Estimate what would have happened to Y in the absence of P. We call this the Counterfactual.

Estimating impact of P on Y

OBSERVE (Y | P=1): the outcome with treatment.

ESTIMATE (Y | P=0): the counterfactual, using a comparison or control group.

o Intention to Treat (ITT): the effect among those offered treatment.
o Treatment on the Treated (TOT): the effect among those receiving treatment.

α = (Y | P=1) - (Y | P=0)

IMPACT = outcome with treatment - counterfactual
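To see the ITT/TOT distinction in numbers, here is a minimal simulated sketch (all numbers hypothetical, not from the slides): with partial take-up among those offered the program, ITT is diluted, and the standard Bloom adjustment (dividing ITT by the difference in take-up rates) recovers the effect on actual participants.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical randomized offer of a program (P): about half the sample.
offered = rng.integers(0, 2, n).astype(bool)
# Hypothetical take-up: 80% of those offered actually enroll; no one else does.
enrolled = offered & (rng.random(n) < 0.8)
# Hypothetical outcome: consumption rises by $35 for those who enroll.
y = 200 + 35 * enrolled + rng.normal(0, 20, n)

# ITT: compare everyone offered vs. not offered, regardless of take-up.
itt = y[offered].mean() - y[~offered].mean()
# TOT via the standard Bloom adjustment: scale ITT by the take-up difference.
take_up = enrolled[offered].mean() - enrolled[~offered].mean()
tot = itt / take_up
print(f"ITT ~ {itt:.1f}, TOT ~ {tot:.1f}")  # roughly 28 and 35
```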

Example: What is the Impact of…

…giving Fulanito additional pocket money (P) on Fulanito's consumption of candies (Y)?

The Perfect Clone

Fulanito: 6 candies. Fulanito's clone: 4 candies.

IMPACT = 6 - 4 = 2 candies

In reality, use statistics

Treatment group: average Y = 6 candies. Comparison group: average Y = 4 candies.

IMPACT = 6 - 4 = 2 candies
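"Use statistics" in practice means comparing group averages and checking the difference against sampling noise. A minimal sketch with simulated, hypothetical candy counts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical candy counts for two statistically equivalent groups.
treatment = rng.poisson(6, size=200)   # averages ~6 candies
comparison = rng.poisson(4, size=200)  # averages ~4 candies

impact = treatment.mean() - comparison.mean()
t_stat, p_value = stats.ttest_ind(treatment, comparison)
print(f"impact ~ {impact:.2f} candies (t = {t_stat:.1f}, p = {p_value:.3f})")
```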

Finding good comparison groups

We want to find clones for the Fulanitos in our programs. The treatment and comparison groups should:

o have identical characteristics,
o except for benefiting from the intervention.

In practice, use program eligibility and assignment rules to construct valid estimates of the counterfactual.

Causal Inference

Counterfactuals

False Counterfactuals

Before & After (Pre & Post)

Enrolled & Not Enrolled (Apples & Oranges)

Counterfeit Counterfactual #1: Before & After

What is the effect of a Public Works program (P) on consumption (Y)?

(1) Observe only beneficiaries (P=1).
(2) Two observations in time: consumption at T=0 (1997) and consumption at T=1 (1998).

[Chart: consumption rises from B = $233 in 1997 to A = $268 in 1998.]

α = IMPACT = A - B = $35

Case 1: What's the problem?

Before & After attributes the entire change from B ($233) to A ($268) to the program, but the counterfactual path from B is unknown.

[Chart: from B, consumption could have risen to C (economic boom) or fallen to D (recession) even without the program.]

Economic boom: real impact = A - C, so A - B = $35 is an overestimate.
Economic recession: real impact = A - D, so A - B = $35 is an underestimate.
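A quick simulation makes this concrete. In this hypothetical sketch (numbers invented to echo the chart), part of the observed $35 change comes from an economy-wide boom, so the before-and-after estimate overstates the program's effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000

true_impact = 20   # hypothetical program effect
boom = 15          # hypothetical economy-wide change over the same year

y_1997 = 233 + rng.normal(0, 10, n)   # beneficiaries before (B)
y_1998 = y_1997 + boom + true_impact  # beneficiaries after (A)

estimate = y_1998.mean() - y_1997.mean()  # A - B
print(f"before-after estimate ~ {estimate:.0f}; true impact = {true_impact}")
# The $15 boom is wrongly attributed to the program: A - B overestimates.
```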

Counterfeit Counterfactual #2: Enrolled & Not Enrolled

If we have post-treatment data on:

o Enrolled: treatment group.
o Not enrolled: "control" group (counterfactual): those ineligible to participate, or those who chose NOT to participate.

Selection bias:

o The reason for not enrolling may be correlated with the outcome (Y). We can control for observables, but not for un-observables!
o The estimated impact is confounded with other things.
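A minimal simulated sketch of selection bias (all numbers hypothetical): here an unobserved trait drives both enrollment and the outcome, so comparing enrolled with not-enrolled shows a large "impact" even though the program does nothing.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Unobserved motivation drives BOTH enrollment and the outcome.
motivation = rng.normal(0, 1, n)
enroll_prob = 1 / (1 + np.exp(-motivation))  # more motivated -> more likely to enroll
enrolled = rng.random(n) < enroll_prob

# The program has NO effect here (true impact = 0), yet...
y = 100 + 10 * motivation + rng.normal(0, 5, n)

naive = y[enrolled].mean() - y[~enrolled].mean()
print(f"enrolled-vs-not estimate ~ {naive:.1f} (true impact = 0)")
```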

Enrolled & Not Enrolled

B&A. Compare: the same individuals before and after they receive P. Problem: other things may have happened over time.

E&NE. Compare: a group of individuals enrolled in a program with a group that chooses not to enroll. Problem: selection bias. We don't know why they are not enrolled.

Keep in Mind

! Both counterfeit counterfactuals may lead to biased estimates of the counterfactual and the impact.

IE Methods Toolbox

Randomized Assignment
Randomized Promotion
Regression Discontinuity Design (RDD)
Difference-in-Differences (DD)
Matching (P-score matching)

All impact evaluations estimate the counterfactual, using control or comparison groups: what would the treatment group be like in the absence of the program?

1. Experimental/Randomized Assignment

- uses randomized assignment to determine who gets the program treatment(s) and who serves as control, among eligible beneficiaries
- can be used ethically in cases where the program cannot reach all potential beneficiaries at once, or to test program alternatives
- randomized assignment creates statistically equivalent groups (treatment and control), which allows a valid estimate of the counterfactual

2. Quasi-Experimental

- mimics experimental designs; the method is derived from the rules of program operation
- methods to create comparison groups include:
• Randomized promotion
• Regression Discontinuity
• Difference-in-Differences
• Statistical Matching

Where Do Comparison Groups Come From?

The rules of program operation determine the evaluation strategy. We can almost always find a valid comparison group if:

o the operational rules for selecting beneficiaries are equitable, transparent and accountable;
o the evaluation is designed prospectively.

5 Methods in the IE Toolbox

The five methods in the IE toolbox take different approaches to generate comparison groups and estimate the counterfactual:

1. Randomized Assignment
2. Randomized Promotion
3. Regression Discontinuity Design (RDD)
4. Difference-in-Differences (DD)
5. Matching

Choosing your IE method(s)

Choose the best possible design given the operational context: the best comparison group you can find, with the least operational risk.

o Internal validity: a good comparison group. Have we controlled for everything?
o External validity: the evaluation results apply to the population we're interested in. Is the result valid for everyone?

Choosing an IE design for your program

Use opportunities to generate good comparison groups and ensure baseline data are collected.

Three questions determine which method is appropriate for a given program:

Money: Does the program have sufficient resources to achieve scale and reach full coverage of all eligible beneficiaries?

Targeting rules: Who is eligible for program benefits? Is the program targeted based on an eligibility cut-off, or is it available to everyone?

Timing: How are potential beneficiaries enrolled in the program: all at once, or in phases over time?

Choosing your IE method(s)

With excess demand (money insufficient for full coverage):

  Targeted, phased roll-out:      1 Randomized assignment; 4 RDD
  Targeted, immediate roll-out:   1 Randomized assignment; 4 RDD
  Universal, phased roll-out:     1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching
  Universal, immediate roll-out:  1 Randomized assignment; 2 Randomized promotion; 3 DD with 5 Matching

With no excess demand (full coverage):

  Targeted, phased roll-out:      1 Randomized assignment; 4 RDD
  Targeted, immediate roll-out:   4 RDD
  Universal, phased roll-out:     1 Randomized assignment to phases; 2 Randomized promotion to early take-up; 3 DD with 5 Matching
  Universal, immediate roll-out:  if less than full take-up, 2 Randomized promotion; 3 DD with 5 Matching
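Purely as an illustration, the table can be encoded as a lookup. The sketch below mirrors the cells above one-for-one; the function name and flags are invented for this example:

```python
def candidate_methods(excess_demand: bool, targeted: bool, phased: bool) -> list[str]:
    """Return the candidate IE methods from the decision table above."""
    if excess_demand:
        if targeted:
            return ["1 Randomized assignment", "4 RDD"]
        return ["1 Randomized assignment", "2 Randomized promotion",
                "3 DD with 5 Matching"]
    if targeted:
        # No excess demand, targeted: randomization only if roll-out is phased.
        return ["1 Randomized assignment", "4 RDD"] if phased else ["4 RDD"]
    if phased:
        return ["1 Randomized assignment to phases",
                "2 Randomized promotion to early take-up",
                "3 DD with 5 Matching"]
    return ["if less than full take-up: 2 Randomized promotion",
            "3 DD with 5 Matching"]

print(candidate_methods(excess_demand=True, targeted=True, phased=False))
# -> ['1 Randomized assignment', '4 RDD']
```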


IE Methods Toolbox

Randomized Assignment
Randomized Promotion
Regression Discontinuity Design (RDD)
Difference-in-Differences (DD)
Matching (P-score matching)

Randomized Treatments & Controls

o Randomize! A lottery for who is offered benefits.
o A fair, transparent and ethical way to assign benefits to equally deserving populations.

Oversubscription (eligibles > number of benefits):
o Give each eligible unit the same chance of receiving treatment.
o Compare those offered treatment with those not offered treatment (controls).

Randomized phase-in:
o Give each eligible unit the same chance of receiving treatment first, second, third…
o Compare those offered treatment first with those offered it later (controls).
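A lottery like this is one line of code. A minimal sketch (hypothetical pool and benefit numbers) for the oversubscription case; for a phase-in, the same draw determines the order in which units are offered the program:

```python
import numpy as np

rng = np.random.default_rng(4)

eligible_ids = np.arange(500)  # hypothetical pool of eligible units
n_benefits = 200               # oversubscription: 500 eligibles > 200 benefits

# A lottery: every eligible unit gets the same chance of being drawn.
order = rng.permutation(eligible_ids)
treatment = order[:n_benefits]  # offered benefits now
control = order[n_benefits:]    # not offered (or, in a phase-in, offered later)
print(len(treatment), len(control))  # 200 offered, 300 controls
```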

Randomized treatments and controls

[Diagram: 1. Population (eligible vs. ineligible units), which determines external validity. 2. Evaluation sample. 3. Randomize treatment into treatment and comparison groups, which determines internal validity.]

Unit of Randomization

Choose according to the type of program:

o Individual/household
o School/health clinic/catchment area
o Block/village/community
o Ward/district/region

Keep in mind:

o You need a "sufficiently large" number of units to detect the minimum desired impact: power (see the sketch below).
o Spillovers/contamination.
o Operational and survey costs.
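For the power point above, a back-of-the-envelope calculation for individually randomized designs uses the standard two-sample formula. This sketch assumes equal arms and a two-sided test, with illustrative numbers; clustered randomization would require more units by the design effect:

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Sample size per arm for a two-sided test comparing two means
    under individual randomization with equal-sized arms."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * (z * sd / effect) ** 2))

# Hypothetical: detect a $35 consumption impact, outcome sd = $150.
print(n_per_arm(effect=35, sd=150))  # ~289 households per arm
```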

Randomized Assignment

How do we know we have good clones?

In the absence of the program, treatment and comparison groups should be identical. Let's compare their characteristics at baseline (T=0).

Case 1: Entrepreneurship Promotion in Tunisia
Context (Premand et al. 2011)

The problem: high youth unemployment in Tunisia, at 23% among all higher education graduates in 2009, and persistent over time (46% unemployed 18 months after graduation).

Objectives:
- Better align the skills of recent graduates with private sector needs
- Foster entrepreneurship among a cohort of young graduates

Intervention: a national reform of the last semester of undergraduate studies. In lieu of the standard curriculum, students could (i) participate in entrepreneurship training, (ii) receive individual coaching, (iii) write a business plan, and (iv) participate in a business plan competition.

Case 1: Entrepreneurship Promotion in Tunisia

An innovative program with a prospective evaluation design.

Evaluation question: Does entrepreneurship training increase employment among university graduates?

Evaluation design:
- Registration open to all 3rd-year students
- Oversubscription, with no clear means to prioritize needs
- Individual randomization (computer-based, stratified by gender and type of major)
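A sketch of how such a stratified lottery can be run (the data frame and the 50/50 split are hypothetical; the actual study's shares may differ). Randomizing within each gender-by-major cell guarantees balance on the stratifiers by construction:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Hypothetical applicant list mirroring the Tunisia design (values invented).
df = pd.DataFrame({
    "student_id": np.arange(1702),
    "gender": rng.choice(["F", "M"], size=1702),
    "major": rng.choice(["business", "engineering", "law"], size=1702),
})

# Stratified randomization: a separate 50/50 lottery inside each cell.
df["arm"] = "comparison"
for _, idx in df.groupby(["gender", "major"]).groups.items():
    shuffled = rng.permutation(np.asarray(idx))
    df.loc[shuffled[: len(shuffled) // 2], "arm"] = "treatment"

print(df.groupby(["gender", "major", "arm"]).size())
```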

Case 1: Entrepreneurship Promotion in Tunisia
Randomized Assignment

[Diagram: 1. Population: all 3rd-year university students. 2. Students interested in participating (1,702 students). 3. Randomize treatment into treatment and comparison groups.]

Case 1: Entrepreneurship Promotion in Tunisia
How do we know we have good clones?

Randomized assignment worked in achieving balance…

Timeline: baseline in December 2009, intervention in 2010, final evaluation results in 2011.

                                               Control   Treatment   Diff    Std. Err.
Male                                           34%       33%         -0.01   0.01
Age                                            22.97     23.08       0.11    0.09
Average grade in 2nd year                      11.46     11.51       0.05    0.05
Has work experience                            69.7%     71.7%       0.02    0.02
Has work experience related to business plan   61.0%     63.0%       0.02    0.03
Knows an entrepreneur                          59.0%     63.1%       0.04    0.03
Is willing to take risk                        0.94      0.95        0.93    -0.02
Household size                                 6.46      6.48        0.03    0.05
Father is employed                             36.5%     33.6%       -0.03   0.02
Father is self-employed                        27.5%     27.9%       0.00    0.01
Mother is employed                             9.5%      9.5%        0.00    0.01
Mother is self-employed                        7.2%      8.2%        0.01    0.02
Monthly income between 0 and 300 DT            23.9%     25.1%       0.01    0.02
Monthly income between 301 and 500 DT          30.1%     29.2%       -0.01   0.02
Monthly income between 501 and 1000 DT         21.6%     19.5%       -0.02   0.01
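A balance table like this is typically produced by a difference-in-means test per baseline covariate. A minimal sketch on simulated, hypothetical data; with a valid randomization, most differences should be small and statistically indistinguishable from zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Hypothetical baseline data for the two randomized arms (control, treatment).
arms = {
    "age": (rng.normal(23.0, 2.0, 850), rng.normal(23.0, 2.0, 852)),
    "male": ((rng.random(850) < 0.34).astype(float),
             (rng.random(852) < 0.33).astype(float)),
}

for var, (control, treatment) in arms.items():
    diff = treatment.mean() - control.mean()
    t_stat, p_value = stats.ttest_ind(treatment, control)
    print(f"{var}: diff = {diff:+.3f}, p = {p_value:.2f}")
```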

Keep in Mind

! In Randomized Assignment with large enough samples, the lottery produces two statistically equivalent groups: a randomized beneficiary group and a randomized comparison group. We have identified the perfect clone.

Randomized assignment is feasible for prospective evaluations with oversubscription/excess demand. Most pilots and new programs fall into this category.

Consider evaluating the relative effectiveness of alternative program design options.

Remember

The objective of impact evaluation is to estimate the causal effect or impact of a program on outcomes of interest.

Remember

To estimate impact, we need to estimate the counterfactual:

o what would have happened in the absence of the program, and
o use comparison or control groups.

Remember

We have a toolbox with 5 methods to identify good comparison groups.

Remember

Choose the best evaluation method that is feasible in the program's operational context.

References (Methods)

Gertler, P. J., S. Martinez, P. Premand, L. B. Rawlings, and C. M. J. Vermeersch, 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). Spanish and French versions forthcoming.

Khandker, S. R., G. B. Koolwal, and H. A. Samad, 2009, Handbook on Impact Evaluation: Quantitative Methods and Practice, The World Bank, Washington DC.

Fitzsimons, E. and M. Vera-Hernández, 2009, "A Practitioner's Guide to Evaluating the Impact of Labor Market Programs", Employment Policy Primer No. 12, World Bank, Washington DC.

Spanish and French versions available as well at www.worldbank.org/ieinpractice.

Thank You