
Evaluation Design: Experiments, Quasi-Experiments, and Non-Experiments

T1  x  T2
C1     C2

Something happens here (the "x" marks where the program occurs; pre-post reflexive design).

Page 2:

Core Concepts

1. Causal Analysis and the Counter-Factual
   1. The Competing Hypotheses Framework
2. Calculation of Program Effect (effect size)
3. Randomization Process
4. Control versus Comparison Group
5. Treatment Effect
   1. Average Treatment Effect (ATE)
   2. Intention to Treat (ITT)
   3. Treatment on the Treated (TOT)

Page 3:

THE COUNTERFACTUAL:

Page 4:

Compared to what?

[Timeline: T=1 → Program → T=2. Program Effect?]

The counter-factual is the exercise of trying to figure out what the outcome would have been if there had been no program intervention. So we need a point of reference to measure that scenario…

Page 5:

Were suicide rates HIGH for a specific high school in suburban California?

[Figure: 95% CI around the average rate per year at the high school, compared against several possible null hypotheses: the population average (all Californians), all HS students, or all suburban HS students.]

These are all valid counter-factuals.

How we define our comparison drives the conclusions.

Page 6:

What is the implied counter-factual?

[Timeline: T=1 → Program → T=2. Program Effect?]

If you see a p-value, there is often an implied counter-factual. By figuring out what it is, you can often tell what went wrong with the research design.

Page 10:

Another Example:

Page 11:

[Diagram: four groups G1–G4, each observed at t=0 through t=4; an "x" marks the point at which each group receives treatment, dividing them into treatment groups and control groups.]

Page 12:

[Diagram: groups G1–G4, each observed at t=0 through t=4; an "x" marks the treatment point, distinguishing the treatment group from the control group.]

Specific tests: treatment gains for late treatment?

Page 13:

A valid counter-factual allows us to answer the following two questions:

1) Compared to what? The program outcomes are different than outcomes in the comparison group. The comparison group is defined by the researcher.

In some special cases the comparison group is identical (statistically speaking) to the treatment group. In this case we call it a “control” group.

Page 14:

Selecting a good (valid) counter-factual is hard:

There is no going back… we can’t unlearn a lesson, undo the effects of a drug, or reverse the effects of a program.

How do we identify a plausible counter-factual?

Page 15:

A valid counter-factual allows us to answer the following two questions:

1) Compared to what? The program outcomes are different than outcomes in the comparison group. The comparison group is defined by the researcher.

In some special cases the comparison group is identical (statistically speaking) to the treatment group. In this case we call it a “control” group.

2) How big is the program effect? Is the difference meaningful (statistically significant and socially salient)?

In the simple case the program effect is just the difference between the average outcomes of the treatment and control groups, but in practice there are many ways to calculate an effect.

Page 16:

Effect Size

Is the Princeton Review Class Effective?

Treatment Group: GRE=580

Control Group: GRE=440

Effect = 580 – 440 = 140

X̄T – X̄C = 140, p-value = 0.023 (null hypothesis: Diff = 0)

SE( X̄T – X̄C ) = √( varT / nT + varC / nC )

Is it significant?

How big is the effect?

“Effect” translates to a calculation of impact plus statistical significance.
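A minimal sketch of this calculation in Python. The means come from the slide; the variances and group sizes are hypothetical, chosen only to illustrate the SE formula:

```python
# Hedged sketch of the effect-size calculation from the GRE example.
# Means are from the slide; variances and group sizes are hypothetical.
import math

mean_t, mean_c = 580, 440    # treatment and control group mean GRE scores
var_t, var_c = 8000, 9000    # hypothetical sample variances
n_t, n_c = 30, 30            # hypothetical group sizes

effect = mean_t - mean_c                    # difference in means
se = math.sqrt(var_t / n_t + var_c / n_c)   # SE of the difference
t_stat = effect / se                        # test statistic against Diff = 0

print(effect)            # 140
print(round(se, 1))
print(round(t_stat, 2))
```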

Page 17:

Effect Size in a Correlation Study:

[Scatterplot: "Dosage and Response" — Caffeine (mm) on the x-axis, Heart Rate (per min) on the y-axis.]

For a one-unit change in X, we expect a β1 change in Y.

y = β0 + β1x

Null hypothesis: Slope = 0. Here β1 = 140.

How big is the effect?

Is it significant?
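As a sketch of reading an effect size off a regression slope (the dosage/response pairs below are hypothetical, not the slide's data):

```python
# Effect size as a regression slope: for a one-unit change in X we
# expect a beta_1 change in Y. Data points are hypothetical.
import numpy as np

caffeine = np.array([50.0, 55.0, 60.0, 65.0, 70.0])       # X: dosage
heart_rate = np.array([80.0, 95.0, 105.0, 120.0, 130.0])  # Y: response

b1, b0 = np.polyfit(caffeine, heart_rate, 1)  # fit y = b0 + b1*x

print(round(b1, 2))  # slope: change in heart rate per unit of caffeine
```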

Page 18:

THE THREE WAYS WE CALCULATE EFFECTS

Page 19:

Two “counterfeit” or “weak” counterfactuals

Pre-Post Effect: [time=1 → Program → time=2; the group compared to itself before and after]

Post-Only Effect: [time=1 → Program → time=2; outcomes compared after the program only]

These are only valid when certain conditions are met.

Page 20:

Example: Do Charter Schools Outperform Public Schools?

http://www.rightmichigan.com/story/2011/6/21/23927/4600

“Don't get me wrong. I am not opposed to charter schools on principle. My beef with charter schools is that most skim the most motivated students out of the poorest communities, and many have disproportionately small numbers of children who need special education or who are English-language learners. The typical charter, operating in this way, increases the burden on the regular public schools, while privileging the lucky few. Continuing on this path will further disable public education in the cities and hand over the most successful students to private entrepreneurs.”

http://blogs.edweek.org/edweek/Bridging-Differences/2009/11/obama-and-duncan-are-wrong-abo.html

Page 21:

The difference in difference estimator:

Group / Time   T=1   T=2
Treatment      T1    T2
Control        C1    C2

Difference between treated and untreated: T2 – C2

But what about the trend???

Program Impact = (T2 – T1) – (C2 – C1)

where (T2 – T1) captures gains during the program period and (C2 – C1) captures gains as a result of trend.
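Using hypothetical values for the four cells, the estimator works out as:

```python
# Difference-in-difference with hypothetical cell means.
t1, t2 = 50.0, 70.0   # treatment group at T=1 and T=2
c1, c2 = 48.0, 56.0   # control group at T=1 and T=2

naive = t2 - c2                   # post-only comparison, ignores trend
trend = c2 - c1                   # gains from trend alone
impact = (t2 - t1) - (c2 - c1)    # gains net of trend

print(naive, trend, impact)
```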

Page 22:

The difference in difference estimator:

[Diagram: outcome over time for the policy/program group (T1 → T2) and the control/comparison group (C1 → C2); total change = trend + effect.]

First Difference: Effect = T2 – T1
Second Difference: Effect = T2 – C2 (*trend*)
Difference in Difference: Effect = (T2 – T1) – (C2 – C1)

Page 23:

Example of the difference-in-difference estimate in practice

Page 24:

Comparing teacher “effects”

Distribution of diff-in-diff scores

Page 25:

Breaking it Down:

            Pre       Post
Control     a         a + c
Treatment   a + b     a + b + c + d

Single Diff 1 = (a+c) – (a) = c
Single Diff 2 = (a+b+c+d) – (a+b) = (c+d)

Effect = (T2 – T1) – (C2 – C1)
Effect = ( c + d ) – c = d

Page 26:

As a regression of dummies

Y = β0 + β1·Period2 + β2·TreatGroup + β3·(Period2 × TreatGroup)

Group / Time   T=1   T=2
Treatment      T1    T2
Control        C1    C2

Slope   Effect                Interpretation
β0      C1                    Baseline
β1      C2 – C1               Trend
β2      T1 – C1               Initial Difference
β3      (T2–T1) – (C2–C1)     Treatment effect
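A sketch showing that the four dummy-regression coefficients recover exactly the quantities in the table. The cell values are hypothetical and noiseless so each β can be read off directly:

```python
# Diff-in-diff as a regression of dummies: one observation per cell,
# with hypothetical noiseless means so each beta is exact.
import numpy as np

period2 = np.array([0.0, 1.0, 0.0, 1.0])      # 1 if T=2
treat   = np.array([0.0, 0.0, 1.0, 1.0])      # 1 if treatment group
y       = np.array([48.0, 56.0, 50.0, 70.0])  # C1, C2, T1, T2

X = np.column_stack([np.ones(4), period2, treat, period2 * treat])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[0] = C1 (baseline), beta[1] = C2-C1 (trend),
# beta[2] = T1-C1 (initial difference), beta[3] = diff-in-diff effect
print(np.round(beta, 6))  # [48.  8.  2. 12.]
```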

Page 27:

Putting Graph & Regression Together

Yi,t = a + b·Treati,t + c·Posti,t + d·( Treati,t × Posti,t ) + ei,t

            Pre       Post
Control     a         a + c
Treatment   a + b     a + b + c + d

Single Diff 1 = (a+c) – (a) = c
Single Diff 2 = (a+b+c+d) – (a+b) = (c+d)

Diff-in-Diff = (Single Diff 2 – Single Diff 1) = (c+d) – c = d

Page 28:

Validity of two “counterfeit” counterfactuals:

Pre-Post:
(T2 – T1) – (C2 – C1) = T2 – T1
IFF
C2 – C1 = 0 (no trend)

Post-Only:
(T2 – T1) – (C2 – C1) = T2 – C2
IFF
C1 – T1 = 0 (equivalent at time 1)

[Diagram: outcomes T1 → T2 and C1 → C2 over the program period.]

This is why we use randomization or matching.

Page 29:

Textbook notation for the effect calculation:

Program Effect = E( Y | P=1 ) – E( Y | P=0 )

In plain English, we calculate the effect by subtracting the average outcome E(Y) of the control group (P=0) from the average outcome of the treatment group (P=1).
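The textbook calculation above can be sketched as code, using hypothetical outcomes:

```python
# The textbook effect calculation E(Y|P=1) - E(Y|P=0) on hypothetical data.
outcomes = [70, 68, 72, 55, 57, 53]   # observed outcome Y for each person
program  = [1, 1, 1, 0, 0, 0]         # P = 1 if treated, 0 if control

mean_treated = sum(y for y, p in zip(outcomes, program) if p == 1) / program.count(1)
mean_control = sum(y for y, p in zip(outcomes, program) if p == 0) / program.count(0)

effect = mean_treated - mean_control
print(effect)  # 15.0
```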

Why might this be misleading?

Page 32:

Validity of two “counterfeit” counterfactuals:

Pre-Post:
(T2 – T1) – (C2 – C1) = T2 – T1
IFF
C2 – C1 = 0 (no trend)

Post-Only:
(T2 – T1) – (C2 – C1) = T2 – C2
IFF
C1 – T1 = 0 (equivalent at time 1)

[Diagram: outcomes T1 → T2 and C1 → C2 over the program period.]

Randomization gives us this. Therefore we can use post-test-only estimators.

This is the exception, not the rule.

Page 33:

DIFFERENT TYPES OF COUNTERFACTUALS IN PRACTICE

Page 34:

Varieties of the Counter-Factual:

Pre-Post Effect: [T=1 → Program → T=2]

Post-Only Effect: [T=1 → Program → T=2]

Pre-Post With Control (Diff-in-Diff): [T=1 → Program → T=2; Effect = A – B]

Interrupted Time Series: [outcome over Time; Effect measured at the interruption]

Regression Discontinuity: [Effect measured at the Program Qualification cutoff]

Page 35:

Interrupted Time Series

Page 36:

Regression discontinuity design

Source: Martinez, 2006, Course notes

Page 37:

Regression Discontinuity

Page 38:

Core Concepts

1. Causal Analysis and the Counter-Factual
   1. The Competing Hypotheses Framework
2. Calculation of Program Effect (effect size)
3. Randomization Process
4. Control versus Comparison Group
5. Treatment Effect
   1. Average Treatment Effect (ATE)
   2. Intention to Treat (ITT)
   3. Treatment on the Treated (TOT)

Page 39:

SUCCESSFUL AND UNSUCCESSFUL RANDOMIZATION

Page 41:

Is this problematic?

Page 42:

Bonferroni Correction:

When we want to be 95% confident that two groups are the same, and we measure those groups using a set of contrasts, our decision rule can no longer be to reject the null (that the groups are the same) whenever a p-value < 0.05. A “contrast” is a comparison of means of any measured characteristic between two groups.

If each contrast has a 5% chance of producing a p-value below 0.05 by chance, then the probability of observing at least one contrast with a p-value that small is greater than 5%! It is approximately n × 0.05, where n is the number of contrasts.

So if we want to be 95% confident about the groups (not just the individual contrasts), we have to adjust our decision rule to 0.05/n.

For example, if we have 10 contrasts, then our decision rule is now 0.05/10, or 0.005. The p-value of at least one contrast must be below 0.005 for us to conclude that the groups are different.

x1 <- rbinom( 10000, 6, 0.05 )     # 10,000 simulated sets of 6 contrasts at alpha = 0.05
table( x1 ) / 10000                # share of sets with 0, 1, 2, ... chance rejections
y1 <- rbinom( 10000, 6, 0.05/6 )   # same simulation at the Bonferroni-adjusted alpha
table( y1 ) / 10000
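The slide's R snippet simulates these two decision rules with rbinom; an equivalent sketch in Python:

```python
# Family-wise error simulation: with 6 contrasts each tested at 0.05,
# the chance of at least one false rejection is about 1 - 0.95**6 = 0.26;
# at the Bonferroni-adjusted 0.05/6 it falls back to roughly 0.05.
import random

random.seed(1)

def familywise_rate(alpha, n_contrasts=6, trials=10_000):
    """Fraction of trials with at least one chance 'significant' contrast."""
    hits = sum(
        any(random.random() < alpha for _ in range(n_contrasts))
        for _ in range(trials)
    )
    return hits / trials

fw_unadjusted = familywise_rate(0.05)
fw_bonferroni = familywise_rate(0.05 / 6)
print(fw_unadjusted)  # roughly 0.26
print(fw_bonferroni)  # roughly 0.05
```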

Page 43:

Test for “Happy” Randomization

0.05 / 6 = 0.0083

x1 <- rbinom( 10000, 6, 0.05 )
table( x1 ) / 10000
y1 <- rbinom( 10000, 6, 0.05/6 )
table( y1 ) / 10000

Page 44:

RCT versus Natural Experiments:

1. RCT assumes complete control over the assignment process

2. Natural Experiments often utilize randomization:

Charter Schools Example

Vietnam Veterans Example

Page 45:

“Control” Versus “Comparison” Groups

[Diagram: outcome over time for the policy/program group (T1 → T2) and the control/comparison group (C1 → C2).]

First Difference: Effect = T2 – T1
Second Difference: Effect = T2 – C2 (*trend*)
Difference in Difference: Effect = (T2 – T1) – (C2 – C1)

Two Considerations:

Does C1 = T1? Not usually for comparison groups.

Is C2 – C1 an accurate reflection of trend? In some cases, the comparison group can adequately capture trend.

Page 46:

DIFFERENT INTERPRETATIONS OF PROGRAM EFFECTS

Page 47:

Estimation of the counter-factual:

Program Effect = E( Y | P=1 ) – E( Y | P=0 )

Operationalized as:

Program Effect = ( T2 – T1 ) – ( C2 – C1 )

[Bed-net example groups: Control Group; Given Bed Nets; Given Bed Nets AND Use Them!]

Page 48:

Calculation of Treatment Effects:

Page 49:

Terminology:

• “Average” treatment effects
  – Treatment on the Treated (TOT) Effects
  – Intention to Treat (ITT) Effects

[Bed-net example groups: Given Bed Nets; Given Bed Nets AND Use Them!]
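A minimal sketch of the distinction, with hypothetical bed-net numbers. The ITT-to-TOT rescaling shown here is the standard Bloom adjustment, which assumes non-users receive no effect:

```python
# ITT vs. TOT with hypothetical numbers. ITT compares everyone ASSIGNED
# to get bed nets against the control group; TOT rescales by the share
# who actually USE them (Bloom adjustment, assuming no effect on non-users).
assigned_mean = 8.0    # outcome (e.g., malaria episodes per 100) if assigned nets
control_mean = 10.0    # outcome in the control group
compliance = 0.5       # share of assignees who actually use the nets

itt = assigned_mean - control_mean   # effect of being given nets
tot = itt / compliance               # effect on those who use them

print(itt, tot)
```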

Page 50:

Exam Question:

What is the difference between non-compliance and attrition?

Page 51:

CAMPBELL SCORES: ELIMINATING COMPETING HYPOTHESES

Page 52:

http://www.youtube.com/watch?v=7DDF8WZFnoU

Can Ants Count?

Page 53:

Competing Hypotheses

The Program Hypothesis:

The change that we saw in our study group above and beyond the comparison group (the effect size) was a result of the program.

The Competing Hypothesis:

The change that we saw in our study group above and beyond the comparison group was a result of _______.

(insert any item of the Campbell Score)

Page 54:

The Campbell Score:

Omitted Variable Bias:
• Selection / Omitted Variables
• Non-Random Attrition

Trends in the Data:
• Maturation
• Secular Trends
• Testing
• Seasonality
• Regression to the Mean

Study Calibration:
• Measurement Error
• Time-Frame of Study

Contamination Factors:
• Intervening Events

Page 55:

Competing Hypothesis #1

Selection Into a Program

If people have a choice to enroll in a program, those that enroll will be different than those that do not.

This is a source of omitted variable bias.

The Fix:

Randomization into treatment and control groups.

Randomization must be “happy”!

Page 56:

Test for “Happy” Randomization

x1 <- rbinom( 10000, 6, 0.04 )
table( x1 ) / 10000

Page 57:

Competing Hypothesis #2

Non-Random Attrition

If the people that leave a program or study are different than those that stay, the calculation of effects will be biased.

The Fix:

Examine characteristics of those that stay versus those that leave.

[Microfinance example of an artificial effect size: five clients with incomes $3.00, $2.50, $2.00, $1.50, $1.00 — Mean = $2.00. After the poorest clients attrit, the remaining Mean = $2.50.]
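The microfinance figure can be reproduced as a tiny sketch (assuming, as the figure suggests, that the poorest clients drop out):

```python
# Attrition bias sketch: nobody's income changes, but if the poorest
# clients leave the study, the follow-up mean rises anyway.
incomes = [3.00, 2.50, 2.00, 1.50, 1.00]

baseline_mean = sum(incomes) / len(incomes)        # 2.0

stayers = [y for y in incomes if y >= 2.00]        # assume the two poorest attrit
followup_mean = sum(stayers) / len(stayers)        # 2.5

artificial_effect = followup_mean - baseline_mean  # positive, with no real change
print(baseline_mean, followup_mean, artificial_effect)
```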

Page 58:

Test for Attrition

This can also be tested another way: do the background characteristics of the group at T1 equal those at T2?

Page 59:

Separating Trend from Effects

[Diagram: when C1 = T1 at baseline, the total gains during the study split into trend plus treatment effect; T2 – C2 removes the trend.]

Page 60:

Separating Trend from Effects

[Diagram: when C1 ≠ T1 at baseline and the actual trend exceeds what T2 – C2 captures, T2 – C2 does NOT fully remove the trend.]

NOTE: diff-in-diff separates trends even when groups are not equivalent.

Page 61:

Separating Trend from Effects

[Diagram: when C1 ≠ T1 at baseline, T2 – C2 can also remove too much trend.]

NOTE: diff-in-diff separates trends even when groups are not equivalent.

Page 62:

Competing Hypothesis #3

Maturation

Occurs when growth is expected naturally, such as increase in cognitive ability of children because of natural development independent of program effects.

The Fix:

Use a comparison group to remove the trend.

Pre-Post With Control: [T=1 → Program → T=2; Effect = A – B]

Page 63:

Competing Hypothesis #4

Secular Trends

Very similar to maturation, except the trend in the data is caused by a global process outside of individuals, such as economic or cultural trends.

The Fix:

Use a comparison group to remove the trend.

Pre-Post With Control: [T=1 → Program → T=2; Effect = A – B]

Page 64:

Competing Hypothesis #5

Seasonality

Data with seasonal trends or other cycles will have natural highs and lows.

The Fix:

Only compare observations from the same time period, or average observations over an entire year (or cycle period).

[Diagram: seasonal cycle with April and October marked against the general trend.]
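A sketch of why the fix works, using a made-up series with a pure 12-month cycle and no real change:

```python
# Seasonality sketch: a flat series with a 12-month cycle. Comparing
# April to October looks like a large change; comparing April to the
# next April (same point in the cycle) correctly shows none.
import math

def outcome(month):
    """Hypothetical outcome: constant level plus a seasonal cycle."""
    return 100 + 10 * math.sin(2 * math.pi * month / 12)

april, october = 3, 9   # month indices

cross_season = outcome(october) - outcome(april)    # misleading comparison
same_season = outcome(april + 12) - outcome(april)  # same point in the cycle

print(round(cross_season, 2), round(same_season, 2))
```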

Page 65:

Competing Hypothesis #6

Testing

When the same group is exposed repeatedly to the same set of questions or tasks they can improve independent of any training.

The Fix:

This problem only applies to a small set of programs. Change tests, use post-test only designs, or use a control group that receives the test.

Pre-Post With Control: [T=1 → Program → T=2; Effect = A – B]

Page 66:

Competing Hypothesis #7

Regression to the Mean

When you observe an extreme outcome in one time period, the outcome in the next period has a higher probability of being closer to the mean than of staying the same or becoming more extreme. As a result, quality improvement programs that target low-performing units often have a built-in improvement bias regardless of program effects.

The Fix:

Take care not to select a study group from the top or bottom of the distribution in a single time period (only high or low performers).
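A simulation sketch of the built-in bias (all numbers hypothetical): units are selected because of a bad period-1 draw, so on average they improve in period 2 with no program at all.

```python
# Regression to the mean: scores are true ability plus independent noise.
# Selecting the worst period-1 performers guarantees an average "gain"
# in period 2 even though nothing was done to them.
import random

random.seed(7)

n = 10_000
ability = [random.gauss(50, 5) for _ in range(n)]
score1 = [a + random.gauss(0, 10) for a in ability]
score2 = [a + random.gauss(0, 10) for a in ability]

cutoff = sorted(score1)[n // 10]                  # bottom 10% in period 1
enrolled = [i for i in range(n) if score1[i] < cutoff]

avg_gain = sum(score2[i] - score1[i] for i in enrolled) / len(enrolled)
print(round(avg_gain, 1))  # clearly positive, with no program at all
```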

Page 67:

Regression to the Mean Example

[Figure: Home Runs by Derek Jeter Per Season, 1996–2010; Average = 15.6.]

Page 68:

Competing Hypothesis #8

Measurement Error

If there is significant measurement error in the dependent variable, the added noise makes it harder to detect true effects and can make programs look less effective.

The Fix:

Use better measures of dependent variables.

Page 69:

Competing Hypothesis #9

Study Time-Frame

If the study is not long enough, it may look like the program had no impact when in fact it did. If the study is too long, then attrition becomes a problem.

The Fix:

Use prior knowledge or research from the study domain to pick an appropriate study period.

Examples:

• Michigan Affirmative Action Study
• Iowa liquor law change

Page 71:

Competing Hypothesis #10

Intervening Events

Has something happened during the study that affects one of the groups (treatment or control) but not the other?

The Fix:

If there is an intervening event, it may be hard to remove the effects from the study.