Single Factor Experimental Designs I Lawrence R. Gordon Psychology Research Methods I.


Simplest Experimental Designs: One “IV” with two levels

Independent groups – Between-subjects, random assignment

Matched groups – Between-subjects, block random assignment

Nonequivalent groups – Between-subjects, selected person variables

Repeated measures – Within-subjects, assignable levels or “time”

Stroop Experiment

Color words: BLUE, RED, GREEN, BLACK

[Figure: Reading Times (sec), axis 0 to 120, by Reading Instructions (NC vs. NCWd)]

Question: BOWER “DROODLES” EXPERIMENT

[Figure: # Droodles, axis 8.71 to 8.79, by condition (No Words vs. Words)]

Mean # Droodles recalled with no cue = 7.03 correct (.91 incorrect)

Mean # Droodles recalled with cue = 8.26 correct (.33 incorrect)

ARE THESE MEANS “DIFFERENT”?

YES - why? Later...

[Figure: Mean # Recalled, axis 0 to 10, by Condition (With Words vs. Without Words); bars show number correct and number incorrect]

Question: CLASS “HAVING FUN” EXAMPLE

[Figure: Time Est, axis 0 to 14, by condition (More Fun vs. Less Fun)]

MORE Fun mean time estimated = 8.6

LESS Fun mean time estimated = 12.5


ARE THESE MEANS “DIFFERENT”?

YES - Why? Later...

ANSWERS

How do we make these “judgments”?

INFERENTIAL STATISTICS
– Null hypothesis significance test
– An instance of a very general decision scheme

NULL HYPOTHESIS SIGNIFICANCE TESTING

ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)

ABSTRACT RESEARCH QUESTION: Do teachers score higher on the GRE VERBAL than the national average?
– VERBAL NULL HYPOTHESIS: The teacher population GRE-V mean is equal to the national average of 476
– SYMBOLIC NULL HYPOTHESIS: H0: μ_GRE-V = 476
– VERBAL ALTERNATIVE HYPOTHESIS: The teacher population GRE-V mean is different from the national average of 476
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: μ_GRE-V ≠ 476

ABSTRACT RESEARCH QUESTION: Do males or females tend to score better on the GRE-Verbal?
– VERBAL NULL HYPOTHESIS: The male and female GRE-V population means are not different
– SYMBOLIC NULL HYPOTHESIS: H0: μ_M = μ_F
– VERBAL ALTERNATIVE HYPOTHESIS: The male and female GRE-V population means are different
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: μ_M ≠ μ_F

ABSTRACT RESEARCH QUESTION: Is there a relationship between GPA (X) and starting salary (Y) for college grads?
– VERBAL NULL HYPOTHESIS: The population correlation between GPA and starting salary is equal to zero
– SYMBOLIC NULL HYPOTHESIS: H0: ρ_XY = 0
– VERBAL ALTERNATIVE HYPOTHESIS: The population correlation between GPA and starting salary is not equal to zero
– SYMBOLIC ALTERNATIVE HYPOTHESIS: H1: ρ_XY ≠ 0

Adapted from Johnson and Christensen (2000). Educational Research. Allyn & Bacon

NULL HYPOTHESIS SIGNIFICANCE TESTING

ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)

ASSESS
– ASSUMING null is true, what is the “chance” (probability) of obtaining the data we did?

ASSESS

Key question: “IF I assume that the null hypothesis is true, is my sample statistic so unlikely that it makes more sense to reject the null hypothesis (and thereby accept the alternative)?”

Key concept: the “Sig” or “p =” is the answer to the question “how likely is a result at least as extreme as mine IF the null hypothesis is true?”
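One way to make “ASSUMING the null is true” concrete is an exact randomization (permutation) test. This is a different procedure from the t-test that SPSS reports in these slides and is used here only as an illustration. The sketch below uses made-up scores for two hypothetical groups. If the condition labels do not matter, every way of splitting the scores into the two groups is equally likely, so p is the proportion of splits whose mean difference is at least as extreme as the observed one.

```python
from itertools import combinations

def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    scores = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(scores)

    def mean_diff(sum_a):
        # Mean of group A minus mean of group B for a given split.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    at_least_as_extreme = 0
    n_assignments = 0
    # Enumerate every possible way to pick which scores land in group A.
    for idx in combinations(range(len(scores)), n_a):
        n_assignments += 1
        if abs(mean_diff(sum(scores[i] for i in idx))) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_assignments

# Hypothetical scores, not taken from any study in these slides.
group_a = [3, 5, 6, 8]
group_b = [7, 9, 10, 12]
p = exact_permutation_p(group_a, group_b)
print(f"p = {p:.4f}")  # 4 of the 70 possible splits are this extreme
```

With these made-up scores p is about .057, so a difference this large would not be especially rare if only chance were operating.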

NULL HYPOTHESIS SIGNIFICANCE TESTING

ABSTRACT
– Null hypothesis “Chance” (H0)
– Alternative hypothesis “Effect” (H1)

ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of obtaining the data we did?

DECIDE
– IF the chance is “small enough,” reject H0 and INFER the “Effect” is real (what can go wrong?)

DECIDE

One way to assess: Compare p (or Sig.) from SPSS to a preselected level of “small enough” (“significance level”) and reject the null if p is equal to or less than that level

– REJECT NULL if p ≤ α (usually α = .05!)

– Example: reject null if p = .037 ≤ α = .05 (it is!)

– NOTE: you select α (usually .05); p is computed from your data!
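The decision rule above can be written as a one-line comparison. The sketch below is only an illustration; the function name `decide` and its return strings are made up for this example.

```python
def decide(p_value, alpha=0.05):
    """Return the NHST decision for a p-value and a preselected alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject when p is equal to or less than the significance level.
    return "reject null" if p_value <= alpha else "retain null"

print(decide(0.037))  # the slide's example: .037 is below .05
print(decide(0.20))
```

Note that `alpha` is fixed before looking at the data, while `p_value` comes from the analysis.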

BUT WHAT CAN GO WRONG? Errors…!

THE DECISION SCHEME

Examples

A major problem is that NO decision is ever “guaranteed” to be right

Some examples:
– Fire alarm
– Jury trial
– NHST

FIRE ALARM

The Decision Scheme

                          NULL IS TRUE (H0)           ALT IS TRUE (H1)
                          NO FIRE                     FIRE

DECIDE: RETAIN NULL       CORRECT!                    ERROR TYPE II
NO ALARM PULLED           No one bothered             "Missed" fire

DECIDE: REJECT NULL       ERROR TYPE I                CORRECT!
PULL ALARM                "False alarm"               LIVES SAVED!

TRIAL BY JURY

The Decision Scheme

                          NULL IS TRUE (H0)                            ALT IS TRUE (H1)
                          Defendant is "really" innocent (assumed!)    Defendant is "really" guilty

DECIDE: RETAIN NULL       CORRECT!                                     ERROR TYPE II
ACQUIT                    Innocent person is freed                     Guilty person gets off

DECIDE: REJECT NULL       ERROR TYPE I                                 CORRECT!
CONVICT                   Innocent person is convicted                 Guilty person does the time

Null Hypothesis Significance Testing

The Decision Scheme

                          NULL IS TRUE (H0)                 ALT IS TRUE (H1)
                          "Really" just CHANCE              "Really" an EFFECT

DECIDE: RETAIN NULL       CORRECT!                          ERROR TYPE II
"No effect found"         1 - α = .95                       β

DECIDE: REJECT NULL       ERROR TYPE I                      CORRECT!
"Effect found"            α = .05 "Level of Significance"   1 - β "Power"

The Decision Scheme: Comments

If the reality is “chance”, we are correct by NOT inferring an effect, or wrong if we do.

• TYPE I ERROR: Reject null when null is true
• Probability of a Type I error is α (alpha) -- the “level of significance”

If the reality is “effect”, we are correct BY inferring an effect, or wrong if we do not.

• TYPE II ERROR: Retain null when null is false
• Probability of a Type II error is β (beta)
– More common to use Power = 1 - β
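The two error rates can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original slides: it uses a one-sample z-test with a known standard deviation of 1 (simpler than the t-tests used elsewhere in this deck), a sample size of 25, and a made-up effect of 0.5 when the alternative is true.

```python
import random
from statistics import NormalDist, mean

def z_test_p(sample, null_mean=0.0, sigma=1.0):
    """Two-tailed p-value for a one-sample z-test with known sigma."""
    z = (mean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, n=25, alpha=0.05, n_experiments=2000, seed=1):
    """Proportion of simulated experiments in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p(sample) <= alpha:
            rejections += 1
    return rejections / n_experiments

type_i_rate = rejection_rate(true_mean=0.0)  # reality is "chance"
power = rejection_rate(true_mean=0.5)        # reality is "effect"
print(f"Type I error rate ~ {type_i_rate:.3f} (alpha = .05)")
print(f"Power ~ {power:.3f}; Type II error rate ~ {1 - power:.3f}")
```

When the null is really true, the rejection rate should land near α. When the made-up effect is real, the rejection rate estimates power, and the remainder is the Type II error rate β.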

Question: BOWER “DROODLES” EXPERIMENT

[Figure: # Droodles, axis 8.71 to 8.79, by condition (No Words vs. Words)]

Mean # Droodles recalled with no cue = 7.03 correct (.91 incorrect)

Mean # Droodles recalled with cue = 8.26 correct (.33 incorrect)

ARE THESE MEANS “DIFFERENT”?

YES - why?

[Figure: Mean # Recalled, axis 0 to 10, by Condition (With Words vs. Without Words); bars show number correct and number incorrect]

ANSWERS REVISITED: Bower Experiment F’02

INFERENTIAL STATISTICS

Group Statistics

                                Condition     N    Mean   Std. Deviation   Std. Error Mean
Number of Pictures Remembered   No Words      95   8.79   2.71             .28
                                With Words    99   8.74   2.23             .22
Number of Pictures Incorrect    No Words      95   .74    1.05             .11
                                With Words    99   .13    .34              3.41E-02

Group Statistics

                    Condition       N    Mean     Std. Deviation   Std. Error Mean
Number Correct      Without Words   97   7.0309   2.35608          .23922
                    With Words      97   8.2577   2.40346          .24403
Number Incorrect    Without Words   97   .9072    1.07124          .10877
                    With Words      97   .3299    .80002           .08123

BOWER EXPERIMENT: Compare Groups on Each of Two DVs

t-test for Equality of Means

                               t        df    Sig. (2-tailed)
Number of Droodles Correct     -3.590   192   .000
Number of Droodles Incorrect   4.253    192   .000
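The t values in the Bower output can be reproduced from the second Group Statistics table alone (N = 97 per group). The sketch below assumes the equal-variances (pooled) form of the independent-samples t; the function name `pooled_t` is made up for this illustration.

```python
from math import sqrt

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Independent-samples t (equal variances assumed) from summary stats."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff, df

# Values copied from the Group Statistics table (Without Words, With Words).
t_correct, df = pooled_t(97, 7.0309, 2.35608, 97, 8.2577, 2.40346)
t_incorrect, _ = pooled_t(97, 0.9072, 1.07124, 97, 0.3299, 0.80002)
print(f"Number correct:   t({df}) = {t_correct:.3f}")
print(f"Number incorrect: t({df}) = {t_incorrect:.3f}")
```

The sign of t depends only on which group is entered first; here “Without Words” minus “With Words” gives a negative t for number correct.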

Question: CLASS “HAVING FUN” EXAMPLE

[Figure: Time Est, axis 0 to 14, by condition (More Fun vs. Less Fun)]

MORE Fun mean time estimated = 8.6

LESS Fun mean time estimated = 12.5

ARE THESE MEANS “DIFFERENT”?

YES - Why?

ANSWERS REVISITED: “Having Fun” Example

Inferential Statistics

Independent Samples Test: t-test for Equality of Means

                                                           t        df   Sig. (2-tailed)   Mean Difference
Estimate of 10 minute interval   Equal variances assumed   -6.353   98   .000              -3.880

Group Statistics

                                 Experimental Conditions    N    Mean     Std. Deviation
Estimate of 10 minute interval   'More fun' (Captions)      50   8.604    2.722
                                 'Less fun' (No Captions)   50   12.484   3.353

A New Example – from scratch:

Doob and Gross (1968)... Status of frustrator as an inhibitor of horn-honking responses. J Social Psychology, 76, 213-218.

– IV: Low vs. high status car
– DV: Latency of following car to honk when light turns green and car doesn’t move (seconds)
– Results:
  – Low-status -- N=15, X̄_Lo = 7.12 (SD = 2.77) sec
  – High-status -- N=20, X̄_Hi = 9.23 (SD = 2.82) sec
– SPSS output and interpretation

SPSS COMPUTATION: Doob & Gross (1968) Data

       Status Group   Latency to Honk Horn (sec)
1      Low status     1.68
2      Low status     6.42
3      Low status     8.58
4      Low status     6.85
5      Low status     10.59
6      Low status     3.26
7      Low status     9.44
8      Low status     4.84
9      Low status     4.98
10     Low status     12.31
11     Low status     9.01
12     Low status     6.13
13     Low status     7.86
14     Low status     6.71
15     Low status     8.14
16     High status    9.10
17     High status    7.83
18     High status    11.22
19     High status    5.29
20     High status    13.20
21     High status    11.79
22     High status    3.87
23     High status    7.41
24     High status    8.40
25     High status    14.05
26     High status    4.44
27     High status    8.11
28     High status    9.81
29     High status    11.79
30     High status    6.84
31     High status    12.64
32     High status    8.68
33     High status    10.66
34     High status    9.95
35     High status    9.53

Group Statistics

                             Status Group   N    Mean     Std. Deviation   Std. Error Mean
Latency to Honk Horn (sec)   Low status     15   7.1200   2.7727           .7159
                             High status    20   9.2305   2.8203           .6306

Independent Samples Test

Latency to Honk Horn (sec)

Levene's Test for Equality of Variances: F = .024, Sig. = .877

t-test for Equality of Means

                                      Equal variances assumed   Equal variances not assumed
t                                     -2.207                    -2.212
df                                    33                        30.586
Sig. (2-tailed)                       .034                      .035
Mean Difference                       -2.1105                   -2.1105
Std. Error Difference                 .9564                     .9541
95% Confidence Interval, Lower        -4.0564                   -4.0574
95% Confidence Interval, Upper        -.1646                    -.1636
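The SPSS output for the Doob & Gross data can be checked from the raw latencies listed above. The sketch below is one possible hand computation, not SPSS's own code: it computes the pooled t (“equal variances assumed”) and the Welch t (“equal variances not assumed”), and it obtains the two-tailed p by numerically integrating the t density with Simpson's rule, an approach chosen here only to stay within the Python standard library.

```python
from math import exp, lgamma, log, pi, sqrt
from statistics import mean, variance

low = [1.68, 6.42, 8.58, 6.85, 10.59, 3.26, 9.44, 4.84, 4.98, 12.31,
       9.01, 6.13, 7.86, 6.71, 8.14]
high = [9.10, 7.83, 11.22, 5.29, 13.20, 11.79, 3.87, 7.41, 8.40, 14.05,
        4.44, 8.11, 9.81, 11.79, 6.84, 12.64, 8.68, 10.66, 9.95, 9.53]

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value via Simpson's rule (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    area = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2.0 * area * h / 3.0  # probability between -|t| and +|t|
    return 1.0 - central

n1, n2 = len(low), len(high)
v1, v2 = variance(low), variance(high)
diff = mean(low) - mean(high)

# Equal variances assumed: pooled variance, df = n1 + n2 - 2.
df_pooled = n1 + n2 - 2
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Equal variances not assumed: Welch t with Satterthwaite df.
a, b = v1 / n1, v2 / n2
t_welch = diff / sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

p_pooled = two_tailed_p(t_pooled, df_pooled)
p_welch = two_tailed_p(t_welch, df_welch)
print(f"Pooled: t({df_pooled}) = {t_pooled:.3f}, p = {p_pooled:.3f}")
print(f"Welch:  t({df_welch:.3f}) = {t_welch:.3f}, p = {p_welch:.3f}")
```

Levene's test (Sig. = .877) gives no reason to doubt equal variances here, and the two versions of the test lead to the same decision at α = .05.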

NHST: Two Independent Means

ABSTRACT
– H0 “chance”: μ1 = μ2 OR μ1 - μ2 = 0
– H1 “effect”: μ1 ≠ μ2 OR μ1 - μ2 ≠ 0

ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of the empirical sample outcome?
– Compute independent-sample t; note the “p = ”.

DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.

Repeated-measures: Definitional Example

“Family therapy for anorexia” (1994)

Before and after family therapy - weights
– Using SPSS
• t-test on paired-samples
– Was there a change?

Everitt (1994) Data

       Before therapy   After therapy   After - Before
1      83.8             95.2            11.4
2      83.3             94.3            11.0
3      86.0             91.5            5.5
4      82.5             91.9            9.4
5      86.7             100.3           13.6
6      79.6             76.7            -2.9
7      76.9             76.8            -.1
8      94.2             101.6           7.4
9      73.4             94.9            21.5
10     80.5             75.2            -5.3
11     81.6             77.8            -3.8
12     82.1             95.5            13.4
13     77.6             90.7            13.1
14     83.5             92.5            9.0
15     89.9             93.8            3.9
16     86.0             91.7            5.7
17     87.3             98.0            10.7

One-Sample Test (Test Value = 0)

                 t       df   Sig. (2-tailed)   Mean Difference
After - Before   4.185   16   .001              7.265

Repeated-measures: Definitional Example

“Family therapy for anorexia” (1994)

SPSS -- standard analysis for paired-samples:

Paired Samples Statistics

                          Mean     N    Std. Deviation
Pair 1   Before therapy   83.229   17   5.017
         After therapy    90.494   17   8.475

Paired Samples Correlations

                                          N    Correlation   Sig.
Pair 1   Before therapy & After therapy   17   .538          .026

Paired Samples Test

                                          Paired Differences
                                          Mean     Std. Deviation   t        df   Sig. (2-tailed)
Pair 1   Before therapy - After therapy   -7.265   7.157            -4.185   16   .001
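The paired-samples t is the same thing as a one-sample t on the difference scores, which is why the two SPSS outputs for these data agree apart from sign. The sketch below recomputes it from the Everitt (1994) weights listed earlier, taking After minus Before; the variable names are chosen for this illustration only.

```python
from math import sqrt
from statistics import mean, stdev

before = [83.8, 83.3, 86.0, 82.5, 86.7, 79.6, 76.9, 94.2, 73.4, 80.5,
          81.6, 82.1, 77.6, 83.5, 89.9, 86.0, 87.3]
after = [95.2, 94.3, 91.5, 91.9, 100.3, 76.7, 76.8, 101.6, 94.9, 75.2,
         77.8, 95.5, 90.7, 92.5, 93.8, 91.7, 98.0]

# One difference score per person (After - Before).
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
mean_diff = mean(diffs)
sd_diff = stdev(diffs)  # sample SD of the differences

# t = mean difference divided by its standard error; df = n - 1.
t = mean_diff / (sd_diff / sqrt(n))
df = n - 1
print(f"mean difference = {mean_diff:.3f}, SD = {sd_diff:.3f}")
print(f"t({df}) = {t:.3f}")
```

Subtracting in the other order (Before minus After) flips the sign of the mean difference and of t, as in the Paired Samples Test table, without changing p.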

Can a Machine Tickle?

Reference: Harris & Christenfeld (1999) Psychonomic Bull. & Rev.

Problem: Why can’t you tickle yourself? Reflex vs Interpersonal explanations. Review of long history and more recent research.

Method: 21 & 14 - UCSD undergrads, age 18-28. Tickled twice on foot by “machine” vs “person” (both really a person). Measured by both self-report and behavior rating of video. Open answers to “was sensation produced by the E different than [that] produced by the machine?”

Can a machine, cont…

Results: interrater reliability, descriptives in table, inferential tests in text of article:
“There was no hint of any difference between tickle responses produced by the experimenter and by the machine for behavior [t(32)=0.27, n.s.] or self-report [t(32)=0.12, n.s.].”

Discussion:
– “generally favorable to the view that the tickle response is some form of innate stereotyped motor behavior, perhaps akin to a reflex”
– suggests “that ticklish laughter itself does not require any belief that another human being is producing the stimulation.”

NHST: Two Related Means

ABSTRACT
– H0 “chance”: μ1 - μ2 = 0 OR μ1 = μ2
– H1 “effect”: μ1 - μ2 ≠ 0 OR μ1 ≠ μ2

ASSESS
– ASSUMING H0 is true, what is the probability or “chance” of the empirical sample outcome?
– Compute related-sample t; note the “p = ”.

DECIDE
– IF the chance is “small enough,” reject H0; otherwise do not.
– If p(t|Null) ≤ α (= .05), reject null and interpret alternative.

REVIEW & SUMMARY

Two-level single factor (IV) designs:
– Independent groups *
– Nonequivalent groups *
– Matched groups **
– Repeated measures **

* use t-test for independent means
** use t-test for paired means
