Effect Size

Beyond p value: Effect Size (April 4, 2008, Guy Lion)

description

Presentation introducing a better hypothesis testing methodology: Effect Size.

Transcript of Effect Size

Page 1: Effect Size

Beyond p value: Effect Size

April 4, 2008

Guy Lion

Page 2: Effect Size

P value vs Effect Size

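By increasing sample size, you can show there is a statistically significant* difference between two Means. The Effect Size evaluates how material that difference is.

P value = probability of observing a difference at least this large if the underlying Means were the same (abbreviated in these slides as “probability sample Means are the same”).

(1 – P) or C.L. (Confidence Level) = the slides’ shorthand for the probability that the sample Means are different.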

Effect Size = how different sample Means are.

*Statistically significant does not imply “significant.” [Webster: … of consequence.]

Page 3: Effect Size

Effect Size in Plain English

Large Effect Size is visible without looking at a large sample.

Other Effect Size examples:
• The Kaplan course raises SAT math scores by 60 points;
• This WOW initiative increases # Solutions by 30 units.

When analyzing Effect Size we deal with standardized units…

With sea lions, gender has a large Effect Size.

With pugs, gender has a small Effect Size.

Page 4: Effect Size

Effect Size Measures

Cohen’s d = (Mean_Pilot – Mean_Control) / Pooled Standard Deviation

Cohen’s d is similar to the unpaired t test t value. It relies on Standard Deviations instead of Standard Errors.

Hedges’ g is a more accurate version of Cohen’s d:

Hedges’ g = Cohen’s d x Adjustment for small sample
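The two measures above can be sketched in Python from summary statistics alone. This is a minimal illustration, not the deck's own spreadsheet: the function names are hypothetical, and the small-sample adjustment 1 - 3 / (4 x df - 1) is an assumption (a commonly used approximation), since the transcript does not preserve the slide's exact adjustment formula. The inputs are the Pilots and Controls figures from the deck's worked example.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent samples."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def cohens_d(mean_diff, sd_pooled):
    """Difference in means expressed in pooled standard deviations."""
    return mean_diff / sd_pooled

def hedges_multiplier(n1, n2):
    """Small-sample adjustment (assumed form: 1 - 3 / (4 * df - 1))."""
    df = n1 + n2 - 2
    return 1 - 3 / (4 * df - 1)

# Summary statistics taken from the deck's Pilots vs Controls example.
n_pilot, sd_pilot = 27, 42.2
n_control, sd_control = 81, 77.4
mean_diff = 15.3  # average difference as reported on the slide

sd_pooled = pooled_sd(sd_pilot, n_pilot, sd_control, n_control)
d = cohens_d(mean_diff, sd_pooled)
j = hedges_multiplier(n_pilot, n_control)
g = d * j

print(f"Pooled SD: {sd_pooled:.1f}")
print(f"Cohen's d: {d:.2f}")
print(f"Hedges' g multiplier: {j:.2f}")
print(f"Hedges' g: {g:.2f}")
```

With these inputs the sketch gives a pooled standard deviation of about 70.4, a Cohen's d of about 0.22 and a multiplier of about 0.99, in line with the figures reported later in the deck.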

Page 5: Effect Size

Pilots vs Controls

In this example, there is a 1.9 Standard deviation difference between Pilots and Controls.

[Figure: Distribution of Controls vs Pilots. Two distribution curves plotted against Z value (x-axis), with the vertical axis running from 0.0% to 4.5%. Distance between the two means = 1.9 Z.]

Page 6: Effect Size

Effect Size Info

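[Figure: Effect Size. Two overlapping distribution curves plotted against Z value (x-axis), with the vertical axis running from 0.0% to 4.5%. Labels mark the Effect Size, the Midpoint, the Overlap Area, the Nonoverlap Area, and the Pilot Avg. Percentile Standing.]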

Effect Size                                   1.90
Pilot Avg. Percentile Standing                97.1%
Midpoint                                      0.95
A          Area under curve                   82.9%
B          Area beyond curve                  17.1%
C = 2(B)   Overlap area                       34.2%
D = 1 - C  Nonoverlap area                    65.8%
E = D/A    Percent of Nonoverlap              79.4%
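The quantities A through E can be reproduced from the Effect Size alone, assuming both groups are Normal with equal spread. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `overlap_summary` and the dictionary keys are illustrative choices, not names from the deck.

```python
from statistics import NormalDist

def overlap_summary(effect_size):
    """Percentile standing and overlap areas for a standardized effect size."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    midpoint = effect_size / 2
    area_under = std_normal.cdf(midpoint)      # A: area under curve up to the midpoint
    area_beyond = 1 - area_under               # B: area beyond the midpoint
    overlap = 2 * area_beyond                  # C = 2(B)
    nonoverlap = 1 - overlap                   # D = 1 - C
    return {
        "percentile_standing": std_normal.cdf(effect_size),
        "midpoint": midpoint,
        "A_area_under": area_under,
        "B_area_beyond": area_beyond,
        "C_overlap": overlap,
        "D_nonoverlap": nonoverlap,
        "E_percent_nonoverlap": nonoverlap / area_under,  # E = D/A
    }

summary = overlap_summary(1.9)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

For an Effect Size of 1.9 this returns a percentile standing of about 97.1% and a percent of nonoverlap of about 79.4%, matching the row above.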

Page 7: Effect Size

An Example

Testing Difference in Units - Solutions

Two sample Unpaired t test

                          Pilots   Controls
Sample size                   27         81
Average                    109.1       93.9
Standard deviation          42.2       77.4
Standard error               8.1        8.6
Average difference          15.3

Output
Group standard error                        11.84
t statistic                                 1.289
Degree of freedom                           106
Group's t statistics:
P Value, prob. samples are same:
  a) One tail                               10.0%
  b) Two tail                               20.0%
  c) Using normal distribution              19.7%
Confidence level samples are different:
  a) Two tail t distribution                80.0%
  b) Using normal distribution              80.3%

Effect Size - using Averages

                          Pilots   Controls
Average                    109.1       93.9
Standard deviation          42.2       77.4
Average difference          15.3

Output
Pooled - Standard deviation                 70.4
Cohen's d value                             0.22
Hedges' g multiplier                        0.99
Hedges' g                                   0.22
Percentile standing - Pilot Avg.            58.5%
Effect size strength                        Small
Effect size midpoint                        0.11
A          Area under curve                 54.3%
B          Area beyond curve                45.7%
C = 2B     Overlap area                     91.4%
D = 1 - C  Nonoverlap area                  8.6%
E = D/A    Percent Nonoverlap (Cohen)       15.8%

Effect size interpretation (Cohen 1988)
0.0   < Small
0.2   Small
0.5   Medium
0.8   Large
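The t test side of this example can be approximated from the summary statistics. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the group standard error is taken as the square root of the sum of the two squared standard errors (which reproduces the slide's 11.84), and only the normal-distribution p value is computed, because the Python standard library has no t distribution. The one-tail and two-tail t values would need a t-distribution CDF from a statistics package.

```python
import math
from statistics import NormalDist

def unpaired_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Unpaired t statistic and normal-approximation p value from summary stats."""
    se1 = sd1 / math.sqrt(n1)              # standard error of sample 1
    se2 = sd2 / math.sqrt(n2)              # standard error of sample 2
    group_se = math.sqrt(se1**2 + se2**2)  # standard error of the difference
    t_stat = (mean1 - mean2) / group_se
    # Two-tail p value using the standard normal instead of the t distribution.
    p_two_tail_normal = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {
        "se1": se1,
        "se2": se2,
        "group_se": group_se,
        "t_stat": t_stat,
        "p_two_tail_normal": p_two_tail_normal,
    }

# Pilots first, Controls second, using the rounded values shown on the slide.
result = unpaired_t_from_summary(109.1, 42.2, 27, 93.9, 77.4, 81)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Because the inputs are the rounded averages shown on the slide, the output is close to, but not identical with, the slide's t statistic of 1.289 and normal-distribution p value of 19.7%.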

Page 8: Effect Size

ES Confidence Interval

The Effect Size standard deviation formula lets us build Confidence Intervals around Effect Size values.

Effect Size - using Averages

                          Pilots   Controls
Average                    109.1       93.9
Standard deviation          42.2       77.4
Average difference          15.3

Output
Pooled - Standard deviation                 70.4
Cohen's d value                             0.22
Hedges' g multiplier                        0.99
Hedges' g                                   0.22
Percentile standing - Pilot Avg.            58.5%
Effect size strength                        Small
Effect size midpoint                        0.11
Area under curve                            54.3%
Area beyond curve                           45.7%
Overlap area                                91.4%
Nonoverlap area                             8.6%
Percent Nonoverlap (Cohen)                  15.8%

Confidence Interval

a) In standardized units:
Effect Size standard deviation              0.22

C.I.      Z value     Low      High
80.0%     1.28       -0.07     0.50
95.0%     1.96       -0.22     0.65

b) In regular units:

C.I.      Z value     Low      High
80.0%     1.28       -5.0      35.5
95.0%     1.96      -15.7      46.2
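The standardized Confidence Intervals can be sketched as follows. The transcript does not preserve the Effect Size standard deviation formula itself, so the one used here, sqrt((n1 + n2) / (n1 x n2) + d² / (2 x (n1 + n2))), is an assumption: it is a commonly used large-sample approximation, and it happens to reproduce the 0.22 shown above. Function names are illustrative.

```python
import math
from statistics import NormalDist

def effect_size_sd(d, n1, n2):
    """Approximate standard deviation of Cohen's d (assumed formula)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))

def effect_size_ci(d, n1, n2, confidence):
    """Two-sided confidence interval around d, using normal Z values."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * effect_size_sd(d, n1, n2)
    return d - half_width, d + half_width

# Values from the deck's example: average difference 15.3, pooled SD 70.4.
d = 15.3 / 70.4
n_pilot, n_control = 27, 81

es_sd = effect_size_sd(d, n_pilot, n_control)
ci_80 = effect_size_ci(d, n_pilot, n_control, 0.80)
ci_95 = effect_size_ci(d, n_pilot, n_control, 0.95)

print(f"Effect Size standard deviation: {es_sd:.2f}")
print(f"80% C.I.: {ci_80[0]:.2f} to {ci_80[1]:.2f}")
print(f"95% C.I.: {ci_95[0]:.2f} to {ci_95[1]:.2f}")
```

Multiplying the bounds by the pooled standard deviation (70.4) converts them to regular units. With these rounded inputs the converted bounds land near, but not exactly on, the slide's figures.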

Page 9: Effect Size

1st Nonparametric test: Gamma Index

              Pilots   Controls
                  11         10
                  14         12
                  15         12
                  15         15
                  17         16
                  17         16
                  18         18
                  19         20
                  20         27

Average         16.2       16.2
Difference       0.0

Median          17.0       16.0

Output
Effect Size - Gamma Index
# < Control Median                 4
Proportion                         44.4%
Z Value (left tail)                -0.14

Adjusted Effect Size - Gamma Index
# > Control Median                 5
Proportion                         55.6%
Z Value (left tail)                0.14

The adjusted Gamma Index is recalculated so that it has the same sign as Cohen's d and Hedges' g.
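A minimal sketch of the Gamma Index calculation on the nine Pilots and nine Controls listed on this slide, using only the Python standard library. The function name and the `adjusted` flag are illustrative, not terms from the deck.

```python
from statistics import NormalDist, median

pilots = [11, 14, 15, 15, 17, 17, 18, 19, 20]
controls = [10, 12, 12, 15, 16, 16, 18, 20, 27]

def gamma_index(pilots, controls, adjusted=True):
    """Share of Pilots on one side of the Control median, as a left-tail Z value.

    adjusted=False counts Pilots below the Control median;
    adjusted=True counts Pilots above it, so the sign matches Cohen's d.
    """
    control_median = median(controls)
    if adjusted:
        count = sum(1 for x in pilots if x > control_median)
    else:
        count = sum(1 for x in pilots if x < control_median)
    proportion = count / len(pilots)
    # inv_cdf is only defined for proportions strictly between 0 and 1.
    z_value = NormalDist().inv_cdf(proportion)
    return count, proportion, z_value

raw = gamma_index(pilots, controls, adjusted=False)
adj = gamma_index(pilots, controls, adjusted=True)
print(f"# < Control Median: {raw[0]}, proportion {raw[1]:.1%}, Z {raw[2]:.2f}")
print(f"# > Control Median: {adj[0]}, proportion {adj[1]:.1%}, Z {adj[2]:.2f}")
```

Pilots equal to the Control median are counted on neither side, which is why the two proportions need not sum to 100% in general.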

Page 10: Effect Size

2nd Nonparametric test: Cliff Delta

Control: 1, 1, 2, 2, 2, 3, 3, 3, 4, 5
Pilots:  1, 2, 3, 4, 4, 5

Dominance Matrix
Pilot > Control = 1. Pilot = Control = 0. Pilot < Control = -1

                        Pilots
Controls      1      2      3      4      4      5
    1         0      1      1      1      1      1
    1         0      1      1      1      1      1
    2        -1      0      1      1      1      1
    2        -1      0      1      1      1      1
    2        -1      0      1      1      1      1
    3        -1     -1      0      1      1      1
    3        -1     -1      0      1      1      1
    3        -1     -1      0      1      1      1
    4        -1     -1     -1      0      0      1
    5        -1     -1     -1     -1     -1      0

Average    -0.8   -0.3    0.3    0.7    0.7    0.9

Overall average (Cliff Delta): 0.25

Cliff Delta ranges between +1, when all values of one group are higher than the values of the other group, and -1, when the reverse is true. Two completely overlapping distributions would have a Cliff Delta of 0.

Page 11: Effect Size

Cliff Delta efficiently…

Control: 1, 1, 2, 2, 2, 3, 3, 3, 4, 5 (10 values)

Pilots     > Control (weight 1)     < Control (weight -1)     Cliff Delta
  1                0                         8                   -0.8
  2                2                         5                   -0.3
  3                5                         2                    0.3
  4                8                         1                    0.7
  4                8                         1                    0.7
  5                9                         0                    0.9

Average                                                           0.25

[(1 x 0) – (1 x 8)]/10 = -0.8
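Both routes to Cliff Delta, the full dominance matrix and the count-based shortcut, can be sketched in Python on the slide's data. The function names are illustrative, and using `bisect` on the sorted Controls is one possible way to get the counts, not necessarily how the original spreadsheet did it.

```python
from bisect import bisect_left, bisect_right

controls = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5]
pilots = [1, 2, 3, 4, 4, 5]

def cliffs_delta_matrix(pilots, controls):
    """Average of the dominance matrix: +1, 0 or -1 for every Pilot/Control pair."""
    total = sum((p > c) - (p < c) for p in pilots for c in controls)
    return total / (len(pilots) * len(controls))

def cliffs_delta_counts(pilots, controls):
    """Same result from per-Pilot counts of Controls below and above."""
    sorted_controls = sorted(controls)
    n = len(sorted_controls)
    per_pilot = []
    for p in pilots:
        below = bisect_left(sorted_controls, p)       # Controls smaller than p
        above = n - bisect_right(sorted_controls, p)  # Controls larger than p
        per_pilot.append((1 * below - 1 * above) / n)
    return per_pilot, sum(per_pilot) / len(per_pilot)

delta_matrix = cliffs_delta_matrix(pilots, controls)
per_pilot, delta_counts = cliffs_delta_counts(pilots, controls)

print("Per-Pilot values:", [round(v, 1) for v in per_pilot])
print(f"Cliff Delta (matrix): {delta_matrix:.2f}")
print(f"Cliff Delta (counts): {delta_counts:.2f}")
```

The count-based version avoids building the full matrix: it needs one sort of the Controls and two binary searches per Pilot value.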

Page 12: Effect Size

Cliff Delta vs Cohen’s d

Cliff Delta

Columns: A = Area under curve; B = Area beyond curve; C = 2(B) = Overlap area; D = 1 - C = Nonoverlap area; E = D/A = Percent of Nonoverlap.

Cohen's d      Pilot Avg.
Effect Size    Percentile Standing    Midpoint      A        B        C        D        E
0.20           57.9%                  0.10        54.0%    46.0%    92.0%     8.0%    14.8%
0.50           69.1%                  0.25        59.9%    40.1%    80.3%    19.7%    33.0%
0.80           78.8%                  0.40        65.5%    34.5%    68.9%    31.1%    47.4%

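If distributions are Normal, Cliff Delta corresponds to the Percent of Nonoverlap (column E) within a Cohen's d framework, so the Small, Medium and Large thresholds of 0.2, 0.5 and 0.8 map to roughly 0.15, 0.33 and 0.47.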

Page 13: Effect Size

Effect Size and Sample Size requirement

n = (Z x Group S.D./Effect Size)²

The above formula results from the algebraic transformation of: t stat or Z value = Difference in Means/Group Standard Error. Here the Group Standard Error equals Group S.D./√n, and the Effect Size is the Difference in Means expressed in units.

Sample size requirement selecting Effect Size in units and 2-tail p value α level.

Input
Standard deviation                      45
Effect size in units                    15
Group standard deviation                63.6
P value α                               0.05
Z value                                 1.96
Effect Size (standardized, 15/45)       0.33
Sample size required                    70
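The sample size formula n = (Z x Group S.D./Effect Size)² can be checked with a short Python sketch. It assumes, as the slide's inputs suggest, that both groups share the same standard deviation and the same sample size, so that the Group S.D. is sqrt(45² + 45²), about 63.6. The function name is illustrative.

```python
import math
from statistics import NormalDist

def required_sample_size(sd, effect_in_units, alpha=0.05):
    """Per-group sample size from n = (Z x Group S.D. / Effect Size)^2.

    Assumes both groups have the same standard deviation and size.
    """
    group_sd = math.sqrt(sd**2 + sd**2)      # combined S.D. of the two groups
    z = NormalDist().inv_cdf(1 - alpha / 2)  # 2-tail Z value for alpha
    n = (z * group_sd / effect_in_units) ** 2
    return math.ceil(n)                      # round up to whole observations

n_required = required_sample_size(sd=45, effect_in_units=15, alpha=0.05)
print(f"Sample size required per group: {n_required}")
```

The unrounded result is about 69.1, which rounds up to the 70 shown on the slide. Note that the formula, as stated, has no term for statistical power.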

Page 14: Effect Size

Testing Sample Size

Sample size requirement selecting Effect Size in units and 2-tail p value α level.

Input
Standard deviation                      45
Effect size                             15
Group standard deviation                63.6
P value α                               0.05
Z value                                 1.96
Effect size                             0.33
Sample size required                    70

Unpaired t test - testing the above sample

                                  Pilots   Controls
Sample size                           70         70
Average                            115.0      100.0
Standard deviation                  45.0       45.0
Standard error                      5.38       5.38
Avg. difference or Effect Size      15.0

Output
Group standard error                        7.61
t statistic                                 1.972
Degree of freedom                           138
Group's t statistics:
P Value, prob. samples are same:
  a) One tail                               2.5%
  b) Two tail                               5.1%
  c) Using normal distribution              4.9% (using NORMSDIST)

n = (Z x Group S.D./Effect Size)²

Page 15: Effect Size

Conclusion

• P value does not tell us how different two samples are.

• Effect Size and its Confidence Intervals give much information on how different two samples are.

• Effect Size in units allows us to derive a required sample size to meet a p value α threshold.