Single Factor Experimental Designs I
Lawrence R. Gordon
Psychology Research Methods I
Simplest Experimental Designs: One “IV” with two levels
Independent groups – Between-subjects, random assignment
Matched groups – Between-subjects, block random assignment
Nonequivalent groups – Between-subjects, selected person variables
Repeated measures – Within-subjects, assignable levels or “time”
Stroop Experiment
BLUE  RED  GREEN  BLACK
[Figure: bar chart of Reading Times (sec), axis 0 to 120, by Reading Instructions (NC vs. NCWd)]
Question: BOWER “DROODLES” EXPERIMENT
[Figure: bar chart of # Droodles (axis 8.71 to 8.79) for the No Words and Words conditions]
Mean # Droodles recalled with no cue = 7.03 correct (.91 incorrect)
Mean # Droodles recalled with cue = 8.26 correct (.33 incorrect)
ARE THESE MEANS “DIFFERENT”?
YES - why? Later...
[Figure: bar chart of Mean # Recalled (axis 0 to 10) by Condition (With Words, Without Words), showing number correct and number incorrect]
Question: CLASS “HAVING FUN” EXAMPLE
[Figure: bar chart of Time Est (axis 0 to 14) for the More Fun and Less Fun conditions]
MORE Fun mean time estimated = 8.6
LESS Fun mean time estimated = 12.5
ARE THESE MEANS “DIFFERENT”?
YES - Why? Later...
ANSWERS
How do we make these “judgments”?
INFERENTIAL STATISTICS
– Null hypothesis significance test
– An instance of a very general decision scheme
NULL HYPOTHESIS SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ABSTRACT
RESEARCH QUESTION: Do teachers score higher on the GRE Verbal than the national average?
– VERBAL NULL HYPOTHESIS: The teacher population GRE-V mean is equal to the national average of 476
– SYMBOLIC NULL HYPOTHESIS: H0: μ_GRE-V = 476
– VERBAL ALTERNATIVE HYPOTHESIS: The teacher population GRE-V mean is different from the national average of 476
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: μ_GRE-V ≠ 476
RESEARCH QUESTION: Do males or females tend to score better on the GRE-Verbal?
– VERBAL NULL HYPOTHESIS: The male and female GRE-V population means are not different
– SYMBOLIC NULL HYPOTHESIS: H0: μ_M = μ_F
– VERBAL ALTERNATIVE HYPOTHESIS: The male and female GRE-V population means are different
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: μ_M ≠ μ_F
RESEARCH QUESTION: Is there a relationship between GPA (X) and starting salary (Y) for college grads?
– VERBAL NULL HYPOTHESIS: The population correlation between GPA and starting salary is equal to zero
– SYMBOLIC NULL HYPOTHESIS: H0: ρ_XY = 0
– VERBAL ALTERNATIVE HYPOTHESIS: The population correlation between GPA and starting salary is not equal to zero
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: ρ_XY ≠ 0
Adapted from Johnson and Christensen (2000). Educational Research. Allyn & Bacon
NULL HYPOTHESIS SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ASSESS
– ASSUMING null is true, what is the “chance” (probability) of obtaining the data we did?
ASSESS
Key question: “IF I assume that the null hypothesis is true, is my sample statistic so unlikely that it makes more sense to reject the null hypothesis (and thereby accept the alternative)?”
Key concept: the “Sig” or “p =” is the answer to the question “how likely is a result at least as extreme as mine IF the null hypothesis is true?”
NULL HYPOTHESIS SIGNIFICANCE TESTING
ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)
ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of obtaining the data we did?
DECIDE
– IF the chance is “small enough,” reject H0 and INFER the “Effect” is real (what can go wrong?)
DECIDE
One way to assess: compare p (or Sig.) from SPSS to a preselected level of “small enough” (the “significance level,” α) and reject the null if p is equal to or less than that level
– REJECT NULL if p ≤ α (usually α = .05!)
– Example: reject null if p = .037 ≤ α = .05 (it is!)
– NOTE: you select α (usually .05); p is computed from your data!
BUT WHAT CAN GO WRONG? Errors…!
THE DECISION SCHEME
Examples
A major problem is that NO decision is ever “guaranteed” to be right
Some examples:
– Fire alarm
– Jury trial
– NHST
FIRE ALARM
The Decision Scheme
NULL IS TRUE (H0): NO FIRE
– DECIDE: RETAIN NULL (NO ALARM PULLED): CORRECT! No one bothered
– DECIDE: REJECT NULL (PULL ALARM): ERROR TYPE I, "False alarm"
ALT IS TRUE (H1): FIRE
– DECIDE: RETAIN NULL (NO ALARM PULLED): ERROR TYPE II, "Missed" fire
– DECIDE: REJECT NULL (PULL ALARM): CORRECT! LIVES SAVED!
TRIAL BY JURY
The Decision Scheme
NULL IS TRUE (H0): Defendant is "really" innocent (assumed!)
– DECIDE: RETAIN NULL (ACQUIT): CORRECT! Innocent person is freed
– DECIDE: REJECT NULL (CONVICT): ERROR TYPE I, Innocent person is convicted
ALT IS TRUE (H1): Defendant is "really" guilty
– DECIDE: RETAIN NULL (ACQUIT): ERROR TYPE II, Guilty person gets off
– DECIDE: REJECT NULL (CONVICT): CORRECT! Guilty person does the time
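Null Hypothesis Significance Testing
The Decision Scheme
NULL IS TRUE (H0): "Really" just CHANCE
– DECIDE: RETAIN NULL ("No effect found"): CORRECT! Probability 1 − α = .95
– DECIDE: REJECT NULL ("Effect found"): ERROR TYPE I, probability α = .05, the "Level of Significance"
ALT IS TRUE (H1): "Really" an EFFECT
– DECIDE: RETAIN NULL ("No effect found"): ERROR TYPE II, probability β
– DECIDE: REJECT NULL ("Effect found"): CORRECT! Probability 1 − β, the "Power"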
The Decision Scheme: Comments
If the reality is “chance”, we are correct by NOT inferring an effect, or wrong if we do.
• TYPE I ERROR: Reject null when null is true
• Probability of a Type I error is α (alpha), the “level of significance”
If the reality is “effect”, we are correct BY inferring an effect, or wrong if we do not.
• TYPE II ERROR: Retain null when null is false
• Probability of a Type II error is β (beta)
– More common to use Power = 1 − β
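A rough simulation can make α and power concrete. The sketch below is not from the slides. It assumes normally distributed scores with a known standard deviation and, for simplicity, uses a two-sided z test in place of the t test. The names two_sided_z_p and rejection_rate, the 50 scores per group, and the 0.5 SD effect are illustrative choices only.

```python
import random
from statistics import NormalDist, mean


def two_sided_z_p(sample_a, sample_b, sigma):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_diff, n=50, sigma=1.0, alpha=0.05, sims=4000, seed=1):
    """Proportion of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        group_a = [rng.gauss(0.0, sigma) for _ in range(n)]
        group_b = [rng.gauss(true_diff, sigma) for _ in range(n)]
        if two_sided_z_p(group_a, group_b, sigma) <= alpha:
            rejections += 1
    return rejections / sims


# Reality is "chance": every rejection is a Type I error.
type_i_rate = rejection_rate(true_diff=0.0)

# Reality is "effect" (assumed 0.5 SD): every rejection is a correct decision.
power = rejection_rate(true_diff=0.5)

print(f"Type I error rate: {type_i_rate:.3f}")
print(f"Power (1 - beta):  {power:.3f}")
print(f"Type II rate:      {1 - power:.3f}")
```

When the null is true, the rejection rate should land close to the chosen α. When the assumed effect is real, the rejection rate estimates power, which comes out at roughly .70 under these particular assumptions.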
Question: BOWER “DROODLES” EXPERIMENT
[Figure: bar chart of # Droodles (axis 8.71 to 8.79) for the No Words and Words conditions]
Mean # Droodles recalled with no cue = 7.03 correct (.91 incorrect)
Mean # Droodles recalled with cue = 8.26 correct (.33 incorrect)
ARE THESE MEANS “DIFFERENT”?
YES - why?
[Figure: bar chart of Mean # Recalled (axis 0 to 10) by Condition (With Words, Without Words), showing number correct and number incorrect]
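ANSWERS REVISITED
Bower Experiment F’02
INFERENTIAL STATISTICS
Group Statistics
Number of Pictures Remembered
– No Words: N = 95, Mean = 8.79, Std. Deviation = 2.71, Std. Error Mean = .28
– With Words: N = 99, Mean = 8.74, Std. Deviation = 2.23, Std. Error Mean = .22
Number of Pictures Incorrect
– No Words: N = 95, Mean = .74, Std. Deviation = 1.05, Std. Error Mean = .11
– With Words: N = 99, Mean = .13, Std. Deviation = .34, Std. Error Mean = 3.41E-02
Group Statistics
Number Correct
– Without Words: N = 97, Mean = 7.0309, Std. Deviation = 2.35608, Std. Error Mean = .23922
– With Words: N = 97, Mean = 8.2577, Std. Deviation = 2.40346, Std. Error Mean = .24403
Number Incorrect
– Without Words: N = 97, Mean = .9072, Std. Deviation = 1.07124, Std. Error Mean = .10877
– With Words: N = 97, Mean = .3299, Std. Deviation = .80002, Std. Error Mean = .08123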
BOWER EXPERIMENT: Compare Groups on Each of Two DVs
t-test for Equality of Means
– Number of Droodles Correct: t = -3.590, df = 192, Sig. (2-tailed) = .000
– Number of Droodles Incorrect: t = 4.253, df = 192, Sig. (2-tailed) = .000
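The t values in the SPSS output can be recovered from the group statistics alone. The sketch below is an illustration and not part of the original slides. It assumes the equal-variances (pooled) form of the independent-samples t test and takes the N = 97 means and standard deviations from the Group Statistics table; the function name pooled_t is hypothetical.

```python
from math import sqrt


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: each group's variance weighted by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df


# Group 1 = Without Words, group 2 = With Words (values from the table).
t_correct, df_correct = pooled_t(7.0309, 2.35608, 97, 8.2577, 2.40346, 97)
t_incorrect, df_incorrect = pooled_t(0.9072, 1.07124, 97, 0.3299, 0.80002, 97)

print(f"Number correct:   t({df_correct}) = {t_correct:.3f}")
print(f"Number incorrect: t({df_incorrect}) = {t_incorrect:.3f}")
```

The sign of t depends only on which group is entered first. Here Without Words minus With Words is negative for number correct and positive for number incorrect, matching the signs in the output.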
Question: CLASS “HAVING FUN” EXAMPLE
[Figure: bar chart of Time Est (axis 0 to 14) for the More Fun and Less Fun conditions]
MORE Fun mean time estimated = 8.6
LESS Fun mean time estimated = 12.5
ARE THESE MEANS “DIFFERENT”?
YES - Why?
ANSWERS REVISITED
“Having Fun” Example
Inferential Statistics
Independent Samples Test: t-test for Equality of Means
– Estimate of 10 minute interval (equal variances assumed): t = -6.353, df = 98, Sig. (2-tailed) = .000, Mean Difference = -3.880
Group Statistics: Estimate of 10 minute interval
– 'More fun' (Captions): N = 50, Mean = 8.604, Std. Deviation = 2.722
– 'Less fun' (No Captions): N = 50, Mean = 12.484, Std. Deviation = 3.353
A New Example – from scratch:
Doob and Gross (1968)... Status of frustrator as an inhibitor of horn-honking responses. J Social Psychology, 76, 213-218.
IV: Low vs. high status car
DV: Latency of following car to honk when light turns green and car doesn’t move (seconds)
Results:
– Low-status -- N=15, mean = 7.12 (SD = 2.77) sec
– High-status -- N=20, mean = 9.23 (SD = 2.82) sec
SPSS output and interpretation
SPSS COMPUTATION
Doob & Gross (1968) Data
Columns: Status Group, Latency to Honk Horn (sec)
Low status 1.68
Low status 6.42
Low status 8.58
Low status 6.85
Low status 10.59
Low status 3.26
Low status 9.44
Low status 4.84
Low status 4.98
Low status 12.31
Low status 9.01
Low status 6.13
Low status 7.86
Low status 6.71
Low status 8.14
High status 9.10
High status 7.83
High status 11.22
High status 5.29
High status 13.20
High status 11.79
High status 3.87
High status 7.41
High status 8.40
High status 14.05
High status 4.44
High status 8.11
High status 9.81
High status 11.79
High status 6.84
High status 12.64
High status 8.68
High status 10.66
High status 9.95
High status 9.53
Group Statistics: Latency to Honk Horn (sec)
– Low status: N = 15, Mean = 7.1200, Std. Deviation = 2.7727, Std. Error Mean = .7159
– High status: N = 20, Mean = 9.2305, Std. Deviation = 2.8203, Std. Error Mean = .6306
Independent Samples Test: Latency to Honk Horn (sec)
Levene's Test for Equality of Variances: F = .024, Sig. = .877
t-test for Equality of Means
– Equal variances assumed: t = -2.207, df = 33, Sig. (2-tailed) = .034, Mean Difference = -2.1105, Std. Error Difference = .9564, 95% Confidence Interval of the Difference: Lower = -4.0564, Upper = -.1646
– Equal variances not assumed: t = -2.212, df = 30.586, Sig. (2-tailed) = .035, Mean Difference = -2.1105, Std. Error Difference = .9541, 95% Confidence Interval of the Difference: Lower = -4.0574, Upper = -.1636
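Both rows of the SPSS table can be reproduced from the raw latencies. The sketch below is an illustration, not the SPSS procedure itself. It computes the pooled (equal variances assumed) t and the Welch-Satterthwaite version (equal variances not assumed) using only the Python standard library; the function name independent_t and the dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Latency to honk (sec), copied from the data listing.
low = [1.68, 6.42, 8.58, 6.85, 10.59, 3.26, 9.44, 4.84, 4.98, 12.31,
       9.01, 6.13, 7.86, 6.71, 8.14]
high = [9.10, 7.83, 11.22, 5.29, 13.20, 11.79, 3.87, 7.41, 8.40, 14.05,
        4.44, 8.11, 9.81, 11.79, 6.84, 12.64, 8.68, 10.66, 9.95, 9.53]


def independent_t(a, b):
    """Pooled and Welch t statistics for two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)

    # Equal variances assumed: pool the two variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = sqrt(pooled_var * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch standard error and adjusted df.
    q1, q2 = v1 / n1, v2 / n2
    se_welch = sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))

    return {
        "mean_diff": diff,
        "t_pooled": diff / se_pooled,
        "df_pooled": n1 + n2 - 2,
        "t_welch": diff / se_welch,
        "df_welch": df_welch,
    }


result = independent_t(low, high)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With a Levene test this far from significance (Sig. = .877), the two rows give nearly the same answer, which is why the slides interpret the equal-variances row.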
NHST: Two Independent Means
ABSTRACT
– H0 “chance”: μ1 = μ2 OR μ1 − μ2 = 0
– H1 “effect”: μ1 ≠ μ2 OR μ1 − μ2 ≠ 0
ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of the empirical sample outcome?
– Compute independent-sample t; note the “p = ”.
DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.
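The ASSESS and DECIDE steps can be sketched in a few lines. SPSS reports the p-value directly; the code below is only an illustration of where that number comes from. It approximates the two-tailed area under Student's t density by Simpson's rule, and the function names t_pdf, two_tailed_p, and decide are hypothetical.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    norm = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return norm * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


def decide(p, alpha=0.05):
    """DECIDE step: reject H0 when p is at or below the preselected alpha."""
    return "reject H0" if p <= alpha else "retain H0"


# Doob & Gross (1968): t = -2.207 with df = 33 (equal variances assumed).
p_value = two_tailed_p(-2.207, 33)
print(f"p = {p_value:.3f} -> {decide(p_value)}")
```

For the horn-honking data this approximation agrees with the Sig. (2-tailed) value of .034 in the SPSS table, so the null hypothesis of equal mean latencies is rejected at α = .05.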
Repeated-measures
Definitional Example
“Family therapy for anorexia” (1994)
Before and after family therapy - weights
– Using SPSS
• t-test on paired-samples
– Was there a change?
Everitt (1994) Data
Columns: Before therapy, After therapy, After - Before
83.8 95.2 11.4
83.3 94.3 11.0
86.0 91.5 5.5
82.5 91.9 9.4
86.7 100.3 13.6
79.6 76.7 -2.9
76.9 76.8 -.1
94.2 101.6 7.4
73.4 94.9 21.5
80.5 75.2 -5.3
81.6 77.8 -3.8
82.1 95.5 13.4
77.6 90.7 13.1
83.5 92.5 9.0
89.9 93.8 3.9
86.0 91.7 5.7
87.3 98.0 10.7
One-Sample Test (Test Value = 0)
– After - Before: t = 4.185, df = 16, Sig. (2-tailed) = .001, Mean Difference = 7.265
Repeated-measures
Definitional Example
“Family therapy for anorexia” (1994)
SPSS -- standard analysis for paired-samples:
Paired Samples Statistics (Pair 1)
– Before therapy: Mean = 83.229, N = 17, Std. Deviation = 5.017
– After therapy: Mean = 90.494, N = 17, Std. Deviation = 8.475
Paired Samples Correlations (Pair 1)
– Before therapy & After therapy: N = 17, Correlation = .538, Sig. = .026
Paired Samples Test (Pair 1)
– Before therapy - After therapy: Paired Differences Mean = -7.265, Std. Deviation = 7.157, t = -4.185, df = 16, Sig. (2-tailed) = .001
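The paired-samples t test is a one-sample t test on the difference scores, which is why the One-Sample Test and the Paired Samples Test give the same result apart from sign. The sketch below is an illustration using only the Python standard library, with the weights copied from the Everitt (1994) data listing; the variable names are arbitrary.

```python
from math import sqrt
from statistics import mean, stdev

before = [83.8, 83.3, 86.0, 82.5, 86.7, 79.6, 76.9, 94.2, 73.4, 80.5,
          81.6, 82.1, 77.6, 83.5, 89.9, 86.0, 87.3]
after = [95.2, 94.3, 91.5, 91.9, 100.3, 76.7, 76.8, 101.6, 94.9, 75.2,
         77.8, 95.5, 90.7, 92.5, 93.8, 91.7, 98.0]

# One difference score per person (After - Before).
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)                 # sample SD of the differences
se_diff = sd_diff / sqrt(n)            # standard error of the mean difference
t_paired = mean_diff / se_diff         # test of H0: mean difference = 0
df_paired = n - 1

print(f"mean difference = {mean_diff:.3f}, SD = {sd_diff:.3f}")
print(f"t({df_paired}) = {t_paired:.3f}")
```

Computing Before - After instead flips the sign of the mean difference and of t, as in the Paired Samples Test output, without changing the two-tailed p-value.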
Can a Machine Tickle?
Reference: Harris & Christenfeld (1999) Psychonomic Bulletin & Review.
Problem: Why can’t you tickle yourself? Reflex vs Interpersonal explanations. Review of long history and more recent research.
Method: 21 & 14 - UCSD undergrads, age 18-28. Tickled twice on foot by “machine” vs “person” (both really a person). Measured by both self-report and behavior rating of video. Open answers to “was sensation produced by the E different than [that] produced by the machine?”
Can a machine, cont…
Results: interrater reliability, descriptives in table, inferential tests in text of article:
“There was no hint of any difference between tickle responses produced by the experimenter and by the machine for behavior [t(32)=0.27, n.s.] or self-report [t(32)=0.12, n.s.].”
Discussion:
– “generally favorable to the view that the tickle response is some form of innate stereotyped motor behavior, perhaps akin to a reflex”
– suggests “that ticklish laughter itself does not require any belief that another human being is producing the stimulation.”
NHST: Two Related Means
ABSTRACT
– H0 “chance”: μ1 − μ2 = 0 OR μ1 = μ2
– H1 “effect”: μ1 − μ2 ≠ 0 OR μ1 ≠ μ2
ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of the empirical sample outcome?
– Compute related-sample t; note the “p = ”.
DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.
REVIEW & SUMMARY
Two-level single factor (IV) designs:
– Independent groups *
– Nonequivalent groups *
– Matched groups **
– Repeated measures **
* use t-test for independent means
** use t-test for paired means