
The use & abuse of tests

• Statistical significance ≠ practical significance

• Significance ≠ proof of effect (confounds)

• Lack of significance ≠ lack of effect


Factors that affect a hypothesis test

• the actual obtained difference

• the magnitude of the sample variance (s²)

• the sample size (n)

• the significance level (alpha)

• whether the test is one-tailed or two-tailed


Why might a hypothesis test fail to find a real result?
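Each of these factors enters the test statistic directly. Here is a minimal Python sketch of the one-sample z statistic used in the visual example later on; the specific numbers are illustrative, not from the slides.

```python
import math

def one_sample_z(sample_mean, mu0, sigma, n):
    """z statistic for a one-sample test of the mean, sigma known."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# A bigger obtained difference, a smaller variance, or a larger n all push
# z further from zero, so significance at a fixed alpha is easier to reach.
print(one_sample_z(52, 50, 5, n=10))  # ≈ 1.26: short of Zcrit = 1.64 (one-tailed, alpha = .05)
print(one_sample_z(52, 50, 5, n=40))  # ≈ 2.53: same difference, now significant
```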


Two types of error

We either accept or reject the H0. Either way, we could be wrong:

                     H0 is true                H0 is false
Reject H0            Type I error              correct decision
                     (false positive rate)     ("sensitivity" or "power")
Accept H0            correct decision          Type II error
                                               (false negative rate)


Error probabilities

When the null hypothesis is true:

P(Type I Error) = alpha

When the alternative hypothesis is true:

P(Type II Error) = beta
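In symbols, both error probabilities are conditional on which hypothesis is actually true:

```latex
\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})
\qquad
\beta = P(\text{fail to reject } H_0 \mid H_A \text{ true})
```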



Type I error

The “false positive rate”

• We decide there is an effect when none exists; we reject the null wrongly

• By choosing an alpha as our criterion, we are deciding the amount of Type I error we are willing to live with.

• The p-value is the probability, if the null were true, of a result at least as extreme as the one we obtained; rejecting only when p &lt; alpha caps the Type I error rate at alpha
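As a sanity check on the "alpha = Type I error rate" claim, here is a minimal simulation sketch (assuming NumPy and SciPy are available): when H0 is really true, a .05 criterion rejects about 5% of the time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 10, 10_000

# Generate data where H0 is true (the mean really is 50), test repeatedly,
# and count false rejections; the rate should hover near alpha.
rejections = 0
for _ in range(trials):
    sample = rng.normal(loc=50, scale=5, size=n)
    t, p = stats.ttest_1samp(sample, popmean=50)
    if t > 0 and p / 2 < alpha:  # one-tailed rejection rule
        rejections += 1

print(rejections / trials)  # ≈ 0.05
```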


Type II error

The "false negative rate"

• We decide there is nothing going on, and we miss the boat – the effect was really there and we didn't catch it.

• Cannot be set directly; it fluctuates with sample size, sample variability, effect size, and alpha

• Could be due to high variability, an insensitive measure, or a small effect


Power

The “sensitivity” of the test

• The likelihood of picking up on an effect, given that it is really there.

• Related to Type II error: power = 1 − beta


A visual example

(We are only going to work through a one-tailed example.)

We are going to collect a sample of 10 highly successful leaders & innovators and measure their scores on a scale that measures tendencies toward manic states.

We hypothesize that this group has a greater tendency toward mania than does the general population (µ = 50 and σ = 5).
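In symbols, the setup is:

```latex
H_0\colon \mu = 50 \qquad H_A\colon \mu > 50
\qquad n = 10,\ \sigma = 5,\ \sigma_{\bar{X}} = \frac{5}{\sqrt{10}} \approx 1.58
```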

Step 1: Decide on alpha and identify your decision rule (Zcrit)

[Figure: the null distribution centered at µ0 = 50 (Z = 0), with the rejection region beyond Zcrit = 1.64]

Step 2: State your decision rule in units of the sample mean (Xcrit)

[Figure: the null distribution again (µ0 = 50, Z = 0), with the cutoff Zcrit = 1.64 re-expressed as Xcrit = 52.61]
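The conversion from Z units to sample-mean units uses the standard error; depending on how Zcrit is rounded (1.64 vs. 1.645), the result lands at roughly 52.6, which the slides carry as 52.61:

```latex
\bar{X}_{\text{crit}} = \mu_0 + Z_{\text{crit}} \cdot \frac{\sigma}{\sqrt{n}}
= 50 + 1.645 \times \frac{5}{\sqrt{10}} \approx 52.6
```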

Step 3: Identify µA, the suspected true population mean for your sample

[Figure: an alternative distribution centered at µA = 55 drawn alongside the null (µ0 = 50); the acceptance region runs up to Xcrit = 52.61 and the rejection region lies beyond it]

Step 4: How likely is it that this alternative distribution would produce a mean in the rejection region?

[Figure: the alternative distribution centered at µA = 55 (its Z = 0); Xcrit = 52.61 sits at Z = −1.51 on this distribution, splitting it into beta (below Xcrit) and power (the rejection region above); µ0 = 50 shown for reference]
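Finishing Step 4's arithmetic in Python (SciPy assumed for the normal distribution):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_a, sigma, n = 50, 55, 5, 10
se = sigma / sqrt(n)                 # ≈ 1.58

x_crit = mu0 + norm.ppf(0.95) * se   # ≈ 52.6, the decision rule from Step 2
z_on_alt = (x_crit - mu_a) / se      # ≈ -1.51: Xcrit's position on the alternative

beta = norm.cdf(z_on_alt)            # P(Type II error) ≈ 0.065
power = 1 - beta                     # ≈ 0.94
print(f"beta = {beta:.3f}, power = {power:.3f}")
```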

Power & Error

[Figure: null and alternative distributions with µ0, Xcrit, and µA marked; alpha is the area of the null beyond Xcrit, beta the area of the alternative below it]

Power is a function of

• The chosen alpha level (α)

• The true difference between µ0 and µA

• The size of the sample (n)

• The standard deviation (s or σ)

(The last two combine into the standard error.)

Changing alpha

[Figure, animated across several slides: the same null and alternative distributions (µ0, Xcrit, µA) redrawn as alpha changes; moving Xcrit trades area between the alpha region under the null and the beta region under the alternative]

• Raising alpha gives you less Type II error (more power) but more Type I error. A trade-off.
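A quick numeric version of the trade-off, reusing the example's numbers (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_a, se = 50, 55, 5 / sqrt(10)

# Loosening alpha pulls Xcrit toward mu0: we tolerate more false positives,
# and in exchange more of the alternative distribution clears the cutoff.
for alpha in (0.01, 0.05, 0.10):
    x_crit = mu0 + norm.ppf(1 - alpha) * se
    power = 1 - norm.cdf((x_crit - mu_a) / se)
    print(f"alpha = {alpha:.2f}: Xcrit = {x_crit:.2f}, power = {power:.3f}")
```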

Changing distance between µ0 and µA

[Figure, animated across several slides: µA moves further from µ0 while Xcrit stays put; the alpha region is unchanged and the beta region shrinks]

• Increasing the distance between µ0 and µA lowers Type II error (improves power) without changing Type I error
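Numerically, holding alpha at .05 and moving µA (SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

mu0, se, alpha = 50, 5 / sqrt(10), 0.05
x_crit = mu0 + norm.ppf(1 - alpha) * se  # fixed: alpha is unchanged

# The further the true mean sits from 50, the more of the alternative
# distribution falls beyond Xcrit.
for mu_a in (52, 55, 58):
    power = 1 - norm.cdf((x_crit - mu_a) / se)
    print(f"muA = {mu_a}: power = {power:.3f}")  # 0.35, 0.94, 1.00
```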

Changing standard error

[Figure, animated across several slides: both distributions redrawn narrower as the standard error shrinks; their overlap around Xcrit diminishes, so the alpha and beta regions recede together]

• Decreasing standard error simultaneously reduces both kinds of error and improves power.
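One way to see this numerically is to hold the decision rule at Xcrit = 52.61 and shrink the standard error (here, by raising n); both error rates then fall together (a sketch under that assumption, SciPy assumed):

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_a, sigma, x_crit = 50, 55, 5, 52.61  # decision rule held fixed

# Tighter distributions mean less of the null spills past Xcrit (alpha)
# and less of the alternative falls short of it (beta).
for n in (5, 10, 40):
    se = sigma / sqrt(n)
    alpha = 1 - norm.cdf((x_crit - mu0) / se)
    beta = norm.cdf((x_crit - mu_a) / se)
    print(f"n = {n:2d}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```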

To increase power

• Try to make µA really different from the null-hypothesis value (if possible)

• Loosen your alpha criterion (from .05 to .10, for example)

• Reduce the standard error (increase the size of the sample, or reduce variability)

For a given level of alpha and a given sample size, power is directly related to effect size. See Cohen's power tables, described in your text.
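For the running example, the effect size in Cohen's d terms is (55 − 50)/5 = 1.0; a minimal normal-approximation sketch (not Cohen's tables themselves, SciPy assumed) ties d, n, and alpha back to power:

```python
from math import sqrt
from scipy.stats import norm

mu0, mu_a, sigma, n, alpha = 50, 55, 5, 10, 0.05
d = (mu_a - mu0) / sigma  # Cohen's d = 1.0, a "large" effect

# One-tailed, one-sample z-test power as a function of d, n, and alpha.
power = 1 - norm.cdf(norm.ppf(1 - alpha) - d * sqrt(n))
print(f"d = {d:.1f}, power = {power:.3f}")  # ≈ 0.94, matching Step 4
```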