Juan-Camilo Cárdenas, Universidad de los Andes
Jim Murphy, University of Alaska Anchorage
Experimental Methods in Social Ecological Systems
Slide 2
Agenda Day 1
Noon-12:15  Welcome, introductions
12:15-1:15  Play Game #1 (CPR: 1 species vs. 4 species)
1:15-2:00  Debrief Game #1 and other results from the field
2:00-2:15  Break
2:15-3:15  Game #2 (Beans game)
3:15-4:00  Debrief Game #2
4:00-4:15  Break
4:15-5:00  Basics of experimental design
Homework for Day 2: Think of an interesting question or problem to be worked on in groups tomorrow.
Slide 3
Agenda Day 2
8:30-9:15  Designing and running experiments in the field
9:15-10:15  Classwork: work in groups solving experimental design problems
10:15-10:30  Break
10:30-11:15  Discussion of group solutions
11:15-Noon  Begin designing your own experiment (form groups based on the best ideas proposed)
Noon-1:00  Lunch
1:00-1:30  Continue designing your own experiment (work in groups)
1:30-2:30  Present designs
2:30-3:00  Feedback: how could we make this workshop better?
Slide 4
Materials online
We will create a web site with materials from the workshop. Please give us your email address (write neatly!!) and we will send you a link when it is ready.
Slide 5
Why run experiments?
Slide 6
Slide 7
Slide 8
Types of experiments
1. Speaking to Theorists
- Test a theory or discriminate between theories
  - Compare theoretical predictions with experimental observations
  - Does non-cooperative game theory accurately predict aggregate behavior in an unregulated CPR?
- Explore the causes of a theory's failure
  - If what you observe in the lab differs from theory, try to figure out why.
  - Communication increases cooperation in a CPR even though it is cheap talk. Why?
  - Is my experiment designed correctly? What caused the failure?
- Theory stress tests (boundary experiments)
Slide 9
Types of experiments (cont.)
2. Searching for Facts
- Establish empirical regularities as a basis for new theory
  - In most sciences, new theories are often preceded by much observation: "I keep noticing this. What's going on here?"
- The Double Auction
  - Years of experimental data showed its efficiency even though no formal models had been developed to explain why.
- Behavioral Economics
  - Many experiments have identified anomalies, but no theory has yet been developed to explain them.
Slide 10
Types of experiments (cont.)
3. Whispering in the Ears of Princes
- Evaluate policy proposals
  - Alternative institutions for auctioning emissions permits
  - Allocating space shuttle resources
- Test bed for new institutions
  - Electric power markets
  - Water markets
  - Pollution permits
  - FCC spectrum licenses
Slide 11
Basics of Experimental Design
Slide 12
Baseline static CPR game
- Common pool resource experiment
- Social dilemma
  - Individual vs. group interests
  - Benefits to cooperation
  - Incentives not to cooperate
- Field experiments in rural Colombia
  - Groups of 5 people
  - Decide how much to extract/harvest from a shared natural resource
Slide 13
Subjects choose a level of extraction from 0 to 8, ranging from low harvest levels (conservative) to high harvest levels.
Slide 14
Payoffs also depend on the choices of the other 4 group members.
Slide 15
Slide 16
Group earnings are largest if all choose 1.
Slide 17
Strong incentives to harvest more than 1
Slide 18
Nash equilibrium: all choose 6
Social optimum: all choose 1
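The equilibrium and optimum can be checked by brute force. The payoff table used in the field sessions is not reproduced in these notes, so the sketch below assumes a hypothetical payoff function (value of own harvest minus a quadratic cost, plus a group benefit from units left in the resource). ALPHA, BETA, and GAMMA are illustrative values chosen only so that, as on the slides, the symmetric Nash equilibrium is 6 and the social optimum is 1 on the 0-8 scale.

```python
# Hypothetical CPR payoff, NOT the payoff table used in the Colombian sessions.
ALPHA, BETA, GAMMA = 29, 2, 5   # illustrative parameters (assumed)
N_PLAYERS, MAX_HARVEST = 5, 8
CHOICES = range(MAX_HARVEST + 1)


def payoff(own: int, others: list[int]) -> int:
    """Private value of own harvest minus a quadratic cost, plus a benefit
    from every unit left unharvested by the whole group."""
    left_in_resource = N_PLAYERS * MAX_HARVEST - (own + sum(others))
    return ALPHA * own - BETA * own**2 + GAMMA * left_in_resource


def is_symmetric_nash(x: int) -> bool:
    """True if no player gains by deviating when the other 4 all choose x."""
    others = [x] * (N_PLAYERS - 1)
    return all(payoff(d, others) <= payoff(x, others) for d in CHOICES)


def group_earnings(x: int) -> int:
    """Total earnings of the group when all 5 players choose x."""
    return N_PLAYERS * payoff(x, [x] * (N_PLAYERS - 1))


nash = [x for x in CHOICES if is_symmetric_nash(x)]
optimum = max(CHOICES, key=group_earnings)
print("symmetric Nash:", nash, "| social optimum:", optimum)
print("group earnings at optimum vs Nash:", group_earnings(optimum), group_earnings(nash[0]))
```

Only symmetric profiles are checked here, which is enough to illustrate the gap between individual and group incentives.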
Slide 19
Comment on payoff tables
- The early CPR experiments typically used payoff tables.
- We don't live in a world of payoff tables.
  - Tables frame how a person should think about the game.
  - A lot of numbers, hard to read
  - Too abstract?
- More recent CPR experiments use richer ecological contexts, e.g., managing a fishery is different from managing an irrigation system.
Slide 20
Objective
To explore the interaction between:
- Formal regulations imposed on a community to conserve local natural resources
- Informal, non-binding verbal agreements to do the same
Slide 21
Possible 2x3 factorial design

                      External Enforcement
                      None        Low          Medium
Communication  No     Baseline    Low          Medium
               Yes    Comm Only   Low + Comm   Medium + Comm

- Groups of N=5 participants
- Each group plays 10 rounds of one of the 6 treatments
- Enforcement
  - Individual harvest quota = 1 (social optimum)
  - Exogenous probability of audit
  - Fine (per unit of violation) if caught exceeding the quota
- Participants paid based on cumulative earnings in all 10 rounds
- The Baseline and Comm Only treatments have been conducted ad nauseam. Are they necessary?
Slide 22
Baselines and replication
- Replication
  - In any experimental science, it is important for key results to be replicated to test robustness.
  - Links to previous research. Is your sample unique?
- Baseline or control group
  - The baseline treatment also gives us a basis for evaluating the effect of each treatment.
  - In any experimental study, it is crucial to think carefully about the relevant control!
Slide 23
Alternative design
- Stage 1: Baseline CPR (5 rounds)
- Stage 2: one of the 5 remaining treatments (5 rounds): Comm Only, Low, Low + Comm, Medium, Medium + Comm
- Advantage: having all groups play the Stage 1 baseline facilitates a clean comparison across groups.
- Disadvantage: fewer rounds of the Stage 2 treatments. Enough time to converge?
- Disadvantage(?): all Stage 2 decisions are conditioned on having already played a baseline.
Slide 24
Optimal sample size

                      External Enforcement
                      None        Low          Medium
Communication  No     Baseline    Low          Medium
               Yes    Comm Only   Low + Comm   Medium + Comm

- Groups of N=5 participants
- How many groups per treatment cell?
Slide 25
John List's notes on sample size
Also see: John A. List, Sally Sadoff, and Mathis Wagner (2011), "So you want to run an experiment, now what? Some simple rules of thumb for optimal experimental design," Experimental Economics 14: 439-457.
Slide 26
Some Design Insights
A. 0 (control) / 1 (treatment), equal outcome variances
B. 0/1 treatment, unequal outcome variances
C. Treatment intensity: no longer binary
D. Clusters
Slide 27
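Some Design Rules of Thumb for Differences in Between-Subject Experiments
Assume that $X_0 \sim N(\mu_0, \sigma_0^2)$ and $X_1 \sim N(\mu_1, \sigma_1^2)$, and that the minimum detectable effect is $\mu_1 - \mu_0 = \delta$.
$H_0: \mu_0 = \mu_1$ and $H_1: \mu_1 - \mu_0 = \delta$.
We need the difference in sample means $\bar X_1 - \bar X_0$ to satisfy:
1. Significance level (probability of Type I error) $= \alpha$: reject $H_0$ when $\frac{\bar X_1 - \bar X_0}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} \ge t_{\alpha/2}$
2. Power (1 minus the probability of Type II error) $= 1 - \beta$: $\frac{\delta}{\sqrt{\sigma_0^2/n_0 + \sigma_1^2/n_1}} = t_{\alpha/2} + t_\beta$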
Slide 28
Standard Case
Slide 29
Power
A. Our usual approach stems from the standard regression model: under a true null, what is the probability of observing the coefficient that we observed?
B. Power calculations are quite different: if the alternative hypothesis is true, what is the probability that the estimated coefficient lies outside the 95% CI defined under the null?
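Point B can be made concrete with a short calculation. The sketch below assumes the normal approximation (standard normal quantiles stand in for the t-values, a large-sample assumption), two cells with n observations each, and a common standard deviation σ. Under the alternative, the test statistic is centered at $(\delta/\sigma)\sqrt{n/2}$, and power is the probability that it lands beyond the critical value $t_{\alpha/2}$. The function name is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, used in place of t quantiles (large-sample assumption)


def power_two_sample(effect_sd: float, n_per_cell: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of equal means with n_per_cell
    observations in each of two cells and a common standard deviation."""
    t_crit = Z.inv_cdf(1 - alpha / 2)               # rejection threshold under H0
    shift = effect_sd * (n_per_cell / 2) ** 0.5     # center of the statistic under H1
    # Probability of landing beyond the upper threshold when H1 is true
    # (the opposite tail is negligible for positive effects and is ignored).
    return Z.cdf(shift - t_crit)


for d, n in [(1.0, 15.68), (0.5, 64), (0.7, 30)]:
    print(f"effect = {d} sd, n = {n} per cell: power = {power_two_sample(d, n):.3f}")
```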
Slide 30
Sample Sizes for Differences in Means (Equal Variances)
Solving equations 1 and 2 assuming equal variances $\sigma_0^2 = \sigma_1^2 = \sigma^2$ gives
$n_0 = n_1 = n = 2\,(t_{\alpha/2} + t_\beta)^2 \left(\frac{\sigma}{\delta}\right)^2$
Note that the necessary sample size:
- Increases rapidly with the desired significance level ($t_{\alpha/2}$) and power ($t_\beta$).
- Increases proportionally with the variance of outcomes ($\sigma^2$).
- Decreases in inverse proportion to the square of the minimum detectable effect size ($\delta$).
Sample size depends on the ratio of effect size to standard deviation. Hence, effect sizes can just as easily be expressed in standard deviations.
Slide 31
Standard practice is to use $\alpha = 0.05$ and to have power of 0.80 ($\beta = 0.20$).
So if we want to detect a one-standard-deviation change using the standard approach, we would need:
$n = 2\,(1.96 + 0.84)^2 \cdot (1)^2 = 15.68$ observations in each cell.
A half-standard-deviation change is detectable with $4 \times 15.68 \approx 63$ observations per cell.
n = 30 seems to be the magic number in many experimental studies: it detects roughly a 0.70 std. dev. change.
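The arithmetic on this slide can be reproduced with the standard library. This is a sketch under the same normal approximation; `n_per_cell` and `min_detectable_effect_sd` are hypothetical helper names, and effects are expressed in standard deviations ($\delta/\sigma$) as suggested above.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal quantiles stand in for t_{alpha/2} and t_beta


def n_per_cell(effect_sd: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """n = 2 (t_{alpha/2} + t_beta)^2 / (delta / sigma)^2 observations per cell."""
    t_alpha = Z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    t_beta = Z.inv_cdf(power)            # about 0.84 when power = 0.80
    return 2 * (t_alpha + t_beta) ** 2 / effect_sd**2


def min_detectable_effect_sd(n: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Invert the same formula: smallest delta / sigma detectable with n per cell."""
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * (2 / n) ** 0.5


for d in (1.0, 0.5):
    print(f"delta = {d} sd: {n_per_cell(d):.2f} per cell")
print(f"n = 30 per cell: delta = {min_detectable_effect_sd(30):.2f} sd")
```

With unrounded quantiles the one-standard-deviation case comes out near 15.70 rather than 15.68, and n = 30 gives about 0.72 sd, consistent with the rough 0.70 above.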
Slide 32
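Sample Size Rules of Thumb
Assuming $\alpha = 0.05$ and $\beta = 0.20$ requires n subjects. Then:
- $\alpha = 0.05$ and $\beta = 0.05$ requires 1.65n
- $\alpha = 0.01$ and $\beta = 0.20$ requires 1.49n
- $\alpha = 0.01$ and $\beta = 0.05$ requires 2.27n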
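These multipliers follow from the fact that, for a fixed $\delta$ and $\sigma$, n is proportional to $(t_{\alpha/2} + t_\beta)^2$. A minimal sketch, assuming standard normal quantiles and a hypothetical helper name:

```python
from statistics import NormalDist

Z = NormalDist()


def relative_sample_size(alpha: float, beta: float,
                         base_alpha: float = 0.05, base_beta: float = 0.20) -> float:
    """n is proportional to (t_{alpha/2} + t_beta)^2 once delta and sigma are fixed,
    so the ratio to the benchmark design depends only on alpha and beta."""
    def k(a: float, b: float) -> float:
        return (Z.inv_cdf(1 - a / 2) + Z.inv_cdf(1 - b)) ** 2
    return k(alpha, beta) / k(base_alpha, base_beta)


for a, b in [(0.05, 0.05), (0.01, 0.20), (0.01, 0.05)]:
    print(f"alpha = {a}, beta = {b}: {relative_sample_size(a, b):.2f} n")
```

The first ratio comes out near 1.66 with unrounded quantiles; the 1.65 above reflects rounding.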
Slide 33
Example from a recent undergrad research project
- A local homeless shelter was conducting a fundraising campaign. They asked us to replicate List's study of the effects of matching contributions.
- The shelter wanted the same 4 treatments as in List (no match, 1:1, 2:1, and 3:1) to test whether higher match ratios would increase contributions.
- A local oil company agreed to donate up to $5000 to provide a match for money donated.
Slide 34
Fundraising example
- The shelter had funds to send out 16,000 letters to high-income women in Anchorage who had never donated before.
- The expected response rate was about 3 to 4% (n ≈ 480-640).
- Question: How many treatments should we run if we expect about 500 responses?
- They said a meaningful treatment effect would be ~$25.
- The standard deviation from previous campaigns was ~$100.
Slide 35
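Sample size
A $25 effect with a $100 standard deviation is a 0.25 std. dev. change, so each cell needs about 15.68 × (100/25)² ≈ 251 responses. With only about 500 expected responses, we could conduct only 2 treatments.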
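The same calculation generalizes to any number of treatments. This sketch assumes equally sized cells, each large enough for a pairwise comparison at $\alpha = 0.05$ and power 0.80, and ignores any multiple-comparison adjustment; `responses_needed` is a hypothetical helper.

```python
import math

N_FACTOR = 2 * (1.96 + 0.84) ** 2   # = 15.68 for alpha = 0.05, power = 0.80


def responses_needed(n_treatments: int, effect: float, sd: float) -> tuple[float, float]:
    """Per-cell and total responses needed to detect `effect` between any two
    equally sized cells, using n = 15.68 * (sd / effect)**2 per cell."""
    per_cell = N_FACTOR * (sd / effect) ** 2
    return per_cell, n_treatments * per_cell


for k in (2, 3, 4):
    per_cell, total = responses_needed(k, effect=25, sd=100)
    print(f"{k} treatments: {math.ceil(per_cell)} per cell, about {math.ceil(total)} responses")
```

The four treatments the shelter originally wanted would need roughly twice the expected number of responses.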
Slide 36
Sample Sizes for Differences in Means (Unequal Variances)
Another rule of thumb: if the outcome variances are not equal, then the ratio of the optimal proportions of the total sample in the control and treatment groups equals the ratio of the standard deviations:
$\frac{n_0}{n_1} = \frac{\sigma_0}{\sigma_1}$
Example: Communication tends to reduce the variance, so perhaps allocate fewer groups to this treatment.
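To see the rule at work, the sketch below splits a fixed sample in proportion to the standard deviations and compares the standard error of the difference in means with an equal split. The standard deviations (baseline twice as variable as communication) and the total of 60 are hypothetical numbers for illustration only.

```python
import math


def allocate(total_n: int, sd_control: float, sd_treatment: float) -> tuple[int, int]:
    """Split total_n so that n_control / n_treatment = sd_control / sd_treatment."""
    n_control = round(total_n * sd_control / (sd_control + sd_treatment))
    return n_control, total_n - n_control


def se_diff(n_control: int, n_treatment: int, sd_control: float, sd_treatment: float) -> float:
    """Standard error of the difference in sample means."""
    return math.sqrt(sd_control**2 / n_control + sd_treatment**2 / n_treatment)


# Hypothetical: baseline outcomes twice as variable as communication outcomes.
SD_BASE, SD_COMM, TOTAL = 2.0, 1.0, 60
n_base, n_comm = allocate(TOTAL, SD_BASE, SD_COMM)
print("proportional split:", n_base, n_comm, "se =", round(se_diff(n_base, n_comm, SD_BASE, SD_COMM), 4))
print("equal split:       ", 30, 30, "se =", round(se_diff(30, 30, SD_BASE, SD_COMM), 4))
```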
Slide 37
Treatment levels

                      External Enforcement
                      None        Low          Medium          High
Communication  No     Baseline    Low          Medium          High
               Yes    Comm Only   Low + Comm   Medium + Comm   High + Comm

How many levels of enforcement do we need? Do we need 3 levels of enforcement?
Slide 38
What about Treatment Levels?
- Assume that you are interested in understanding the intensity of treatment: the level of enforcement (e.g., audit probability).
- Assume that the outcome variance is equal across the cells.
- How should you allocate the sample if the audit probability could be anywhere between 0 and 1?
  - For simplicity, say X = 25%, 50%, or 75%.
  - Assume that you have 1000 subjects available.
Slide 39
Reconsider what we are doing: $Y = X\beta + e$
One goal in this case is to derive the most precise estimate of $\beta$ by using exogenous variation in X.
Recall that the standard error of $\hat\beta$ is $\sqrt{\operatorname{var}(e) / (n \cdot \operatorname{var}(X))}$.
Slide 40
Rules of Thumb
- Linear: 1/2 of the sample at X=25%, 0 at X=50%, 1/2 at X=75%
- Quadratic: 1/4 of the sample at X=25%, 1/2 at X=50%, 1/4 at X=75%
Intuition: the test for a quadratic effect compares the mean of the outcomes at the extremes to the mean of the outcome at the midpoint.
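Both rules can be checked numerically. The sketch below assumes the 1000 subjects and three audit probabilities from the example, an outcome standard deviation of 1 (an assumption), and a simple regression $Y = a + bX + e$. It compares the standard error of the slope, $\sqrt{\operatorname{var}(e)/(n\operatorname{var}(X))}$, and of the curvature contrast (average of the extreme means minus the midpoint mean) under an equal split and under each rule. Function names are hypothetical.

```python
import math

N = 1000                      # subjects available
LEVELS = (0.25, 0.50, 0.75)   # audit probabilities X


def slope_se(shares, sigma=1.0):
    """se(beta_hat) = sqrt(var(e) / (n var(X))) for Y = a + bX + e."""
    mean = sum(s * x for s, x in zip(shares, LEVELS))
    var_x = sum(s * (x - mean) ** 2 for s, x in zip(shares, LEVELS))
    return math.sqrt(sigma**2 / (N * var_x))


def curvature_se(shares, sigma=1.0):
    """se of (mean at 25% + mean at 75%) / 2 - mean at 50%: the quadratic test."""
    n_lo, n_mid, n_hi = (s * N for s in shares)
    if min(n_lo, n_mid, n_hi) == 0:
        return math.inf   # contrast cannot be estimated without all three levels
    return sigma * math.sqrt(1 / (4 * n_lo) + 1 / (4 * n_hi) + 1 / n_mid)


designs = {"equal thirds": (1/3, 1/3, 1/3),
           "linear rule": (1/2, 0, 1/2),
           "quadratic rule": (1/4, 1/2, 1/4)}
for name, shares in designs.items():
    print(f"{name:14s} slope se = {slope_se(shares):.4f}  curvature se = {curvature_se(shares):.4f}")
```

Putting all subjects at the extremes maximizes var(X) and so minimizes the slope's standard error, while the 1/4, 1/2, 1/4 split gives the tightest curvature contrast.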
Slide 41
Intra-cluster Correlation
What happens when the level of randomization differs from the unit of observation?
- Think of randomization at the village level, or at the store level, with outcomes observed at the individual level.
- Classic example: comparing two textbooks. Randomization is over classrooms; observations are at the individual level.
Another example: To test the robustness of results, you may want to conduct the experiments in multiple communities. How do you allocate treatments across communities, especially if the number of participants per village is small?
- In our Colombian enforcement study, we replicated the entire design in three regions.
- In a separate CPR experiment in Russia, we visited 3 communities in one region. Each treatment was conducted once in each community.
  - We are assuming that the differences across communities are small.
  - We cannot make cross-community comparisons.
Slide 42
Intracluster Correlation
Real Sample Size: $RSS = \frac{mk}{CE}$
- m = number of subjects in a cluster
- k = number of clusters
- $CE = 1 + \rho(m - 1)$
- $\rho = \frac{s_B^2}{s_B^2 + s_W^2}$ = intracluster correlation coefficient
- $s_B^2$ = variance between clusters
- $s_W^2$ = variance within clusters
Slide 43
Intracluster Correlation
What does $\rho = 0$ mean?
- No correlation of responses within a cluster
- No need to adjust optimal sample sizes
What does $\rho = 1$ mean?
- All responses within a cluster are identical
- Large adjustment needed: RSS is reduced to the number of clusters
Slide 44
Example
Pilot testing confirms our suspicion, yielding $\rho = 0.04$. They wish to detect a 1/10 std. dev. change. Using the standard approach, what should the sample size equal?
Slide 45
$\rho = 0$: What is n?
Sample size formula: $n = 2(t_{\alpha/2} + t_\beta)^2 (\sigma/\delta)^2 = 15.68 \times 100$
n = 1568 at each level; 3136 total.
Slide 46
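Example
$RSS = \frac{mk}{CE} = \frac{784 \times 4}{1 + 0.04(784 - 1)} \approx 97$!
What is the required sample size?
$n = 2(t_{\alpha/2} + t_\beta)^2 \times 100 \times (1 + 783 \times 0.04) = 15.68 \times 3232 \approx 50{,}678$ at each incentive level! (Note that with $\rho = 0$: $15.68 \times 100 = 1568$.)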
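The cluster adjustment can be reproduced in a few lines. This sketch implements the formulas above ($\rho$, $CE = 1 + \rho(m-1)$, $RSS = mk/CE$) with the example's values and the rounded factor 15.68; the function names are hypothetical.

```python
def icc(var_between: float, var_within: float) -> float:
    """rho = s_B^2 / (s_B^2 + s_W^2)."""
    return var_between / (var_between + var_within)


def design_effect(m: int, rho: float) -> float:
    """CE = 1 + rho (m - 1), for m subjects per cluster."""
    return 1 + rho * (m - 1)


def real_sample_size(m: int, k: int, rho: float) -> float:
    """RSS = m k / CE: how many independent observations k clusters are worth."""
    return m * k / design_effect(m, rho)


N_FACTOR = 2 * (1.96 + 0.84) ** 2      # 15.68: alpha = 0.05, power = 0.80
EFFECT_SD = 0.1                        # detect a 1/10 std. dev. change
RHO, M, K = 0.04, 784, 4               # values from the example

n_independent = N_FACTOR / EFFECT_SD**2            # rho = 0 benchmark
n_clustered = n_independent * design_effect(M, RHO)
print(f"rho = 0: n = {n_independent:.0f} per level")
print(f"RSS = {real_sample_size(M, K, RHO):.1f}")
print(f"rho = {RHO}: n = {n_clustered:,.0f} per level")
```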
Slide 47
Randomized factorial design
Advantages
- Independence among the factor variables
- Can explore interactions between factors
Disadvantages
- The number of treatments grows quickly with an increase in the number of factors or levels within a factor
- Example: Conduct the experiment in multiple communities and use community as a treatment variable
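Random assignment in a full factorial can be sketched as follows. It assumes the 2x3 Communication x Enforcement design and 36 groups (an illustrative number giving 6 groups per cell). Groups are shuffled with a seeded generator and then dealt across cells so that cell sizes stay balanced; `assign_groups` is a hypothetical helper.

```python
import itertools
import random

COMMUNICATION = ("No", "Yes")
ENFORCEMENT = ("None", "Low", "Medium")


def assign_groups(group_ids, seed=None):
    """Randomly assign groups to the cells of the full factorial, as evenly as possible."""
    cells = list(itertools.product(COMMUNICATION, ENFORCEMENT))
    ids = list(group_ids)
    random.Random(seed).shuffle(ids)   # seeded so the assignment can be reproduced
    # Deal the shuffled groups round-robin so cell sizes differ by at most one.
    return {gid: cells[i % len(cells)] for i, gid in enumerate(ids)}


assignment = assign_groups(range(36), seed=1)
for cell in itertools.product(COMMUNICATION, ENFORCEMENT):
    members = sorted(g for g, c in assignment.items() if c == cell)
    print(cell, members)
```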
Slide 48
Fractional factorial design
- Say we want to add informal sanctions with a 3:1 ratio: I can pay $3 to reduce your earnings by $1.
  - 1 new factor with 2 levels
- To run all combinations would require 2x2x2 = 8 treatments.
- Assume the optimal sample size per cell is 6 groups of 5 people (30 total per cell).
  - 8 treatments x 30 people/cell = 240 people
- Assume you can only recruit about half that (~120).
  - You could run only 3 groups per cell (15 people): you lose power/significance.
- Solution: conduct a balanced subset of treatments.

                      External Enforcement
                      Low          Medium
Communication  No     Low          Medium
               Yes    Low + Comm   Medium + Comm
Slide 49
Fractional factorial design
If you are considering this approach, there are a few different design options depending on the effects you want to capture, the number of treatments, etc. This is just one example!
[Figure: example fractional factorial design with factors Communication, External Enforcement, and Sanctions]
Slide 50
Fractional factorial design
- Advantage: dramatically reduces the number of trials
- Disadvantage: achieves balance by systematically confounding some direct effects with some interactions. This may not be serious, but you will lose the ability to analyze all of the different possible interactions.
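The confounding can be seen directly in -1/+1 coding. The sketch below takes a standard textbook half fraction of the 2x2x2 design (defining relation I = ABC, an assumption; it is not necessarily the subset pictured above), keeps 4 of the 8 treatments, and checks that every factor stays balanced while the Sanctions main-effect column is identical to the Communication x Enforcement interaction column.

```python
import itertools
from math import prod

FACTORS = ("Communication", "Enforcement", "Sanctions")        # each coded -1 / +1
full = list(itertools.product((-1, 1), repeat=len(FACTORS)))  # 2x2x2 = 8 runs

# Half fraction: keep the runs where A*B*C = +1 (defining relation I = ABC).
half = [run for run in full if prod(run) == 1]


def column(runs, idx):
    """Coded column for one factor (one index) or an interaction (several indices)."""
    return [prod(run[i] for i in idx) for run in runs]


for run in half:
    print(dict(zip(FACTORS, run)))

# Each factor appears at both levels equally often ...
print([sum(column(half, (i,))) for i in range(3)])      # [0, 0, 0]
# ... but Sanctions is aliased with the Communication x Enforcement interaction.
print(column(half, (2,)) == column(half, (0, 1)))       # True
```

Any difference attributed to Sanctions in this fraction could equally be the Communication x Enforcement interaction, which is the loss described above.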
Slide 51
Nuisance Variables
Other factors of little or no primary interest that can also affect decisions. These nuisance effects could be significant. Common examples:
- Gender, age, nationality (most socio-economic variables)
- Selection bias
  - Recruitment: open to whoever shows up vs. random selection
- Experience
  - Participated in previous experiments
- Learning
  - A concern in multi-round experiments
- Non-experiment interactions
  - People talking before an experiment while waiting to start
  - In a community, people may hear about the experiment from others
Slide 52
Confounded variables
Confounding occurs when the effects of two independent variables are intertwined so that you cannot determine which of the variables is responsible for the observed effect.
Example: What are some potential confounds when comparing the Baseline with Low?

                      External Enforcement
                      None        Low          Medium
Communication  No     Baseline    Low          Medium
               Yes    Comm Only   Low + Comm   Medium + Comm
Slide 53
Another design approach
If trying to identify factors that influence decisions, try adding them one at a time. Imposing a fine for non-compliance differs from the baseline CPR in multiple ways. Possible confounds:
- FRAME: The simple existence of a quota may send a signal about expected behavior, independent of any audits or fines.
- GUILT = FRAME + audit: Getting audited may generate feelings of guilt because the individual is privately reminded about anti-social choices.
- FINE = FRAME + GUILT (audit) + fine for violations
Are people responding to the expected penalty? Or are they responding to the frame from the quota?
Slide 54
3 Sources of variability
1. Conditions of interest (wanted)
2. Measurement error (unwanted)
   - People can make mistakes, misunderstand instructions, or make typos.
3. Experimental material and process (unwanted)
   - No two people are identical, and their responses to the same situation may not be the same, even if your theory predicts otherwise.
Slide 55
Design in a nutshell
- Isolate the effects of interest
- Control what you can
- Randomize the rest
Slide 56
Some Practical Advice
Slide 57
Some thoughts in no particular order
- Think carefully about your research question
  - Formulate testable hypotheses grounded in theory
  - How does your idea contribute to the literature?
- Think carefully about possible results and how they would be interpreted
  - What if results are consistent with theory/expectations? What if they are not?
  - Be prepared for either possibility
- Prepare code for data analysis BEFORE running experiments
  - This forces you to think carefully about what your data will look like, and what you want to get out of it.
Slide 58
Some thoughts on data analysis
- Are your data discrete, binary, or continuous?
  - Multinomial logit, ordered probit, logit, Poisson, linear
- Repeated observations or one-shot decisions?
  - Random effects, hierarchical mixed models, nonparametrics
Slide 59
More thoughts
Subject payments and salience
- One distinguishing feature of economic experiments is that subjects are paid based on their decisions and possibly the decisions of others.
- Pay enough for subjects to take the experiment seriously.
- Avoid tournaments
  - E.g., giving a bonus to the person who earns the most money
- Typically pay in cash; some field experiments may use another medium.
- Never use deception!
- Keep earnings and decisions private.
Slide 60
Instructions
Think carefully about every word in your instructions.
- Framing effects: "partner" in the UG (ultimatum game) or "your opponent"?
  - Could frame the UG as an offer to sell at a price
- Using examples
  - I used the example of a $14/$6 split. Does that suggest proposers should take more than half?
  - What if I used a 10/10 split? Or 6/14?
  - Could give multiple examples
- Experiment length
  - Be aware that people get tired and bored
Slide 61
Other stuff
- Strategy method
- Hot vs. cold decisions
- Paying for just one round in a multi-round game
- AB-BA designs for within-subject comparisons
- Playing multiple games and paying for just one
- Factor levels should allow for enough distance between hypotheses
  - E.g., the social optimum is that people will harvest 10% of the fish, while the Nash equilibrium predicts 15%.
  - The Nash equilibrium and social optimum should be farther apart.
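Two of these items, AB-BA counterbalancing and paying for a single randomly chosen round, are easy to get wrong by hand. The sketch below assumes 20 subjects and 10 rounds (illustrative numbers), and `ab_ba_orders` and `paid_round` are hypothetical helper names: half of the subjects, chosen at random, play condition A then B, the rest play B then A, and one round is drawn uniformly to determine payment.

```python
import random


def ab_ba_orders(subject_ids, seed=None):
    """Counterbalance a within-subject comparison: a random half of subjects
    play condition A then B, the other half play B then A."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return {sid: ("A", "B") if i < half else ("B", "A") for i, sid in enumerate(ids)}


def paid_round(n_rounds, rng):
    """Pick the single round (1..n_rounds) that determines payment,
    with every round equally likely to count."""
    return rng.randint(1, n_rounds)


orders = ab_ba_orders(range(20), seed=42)
print(sum(order == ("A", "B") for order in orders.values()), "subjects play A first")
print("paid round:", paid_round(10, random.Random(7)))
```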