POSC 202A: Lecture 9 Lecture: statistical significance.

Transcript of POSC 202A: Lecture 9 Lecture: statistical significance.

Page 1: POSC 202A: Lecture 9 Lecture: statistical significance.

POSC 202A: Lecture 9

Lecture: statistical significance.

Page 2: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

The fundamental question of statistical significance:

Page 3: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

The fundamental Question:

How likely is the result we observe to be the product of chance?

This question drives all of the tests we perform by allowing us to differentiate the systematic from the stochastic.

Page 4: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Confidence Interval: An interval calculated from sample data that is guaranteed to capture the true population parameter in a fixed percentage of all possible samples (not in every sample).

It tells us how large an interval we need to create in order to capture the true population value in some fixed percentage of the intervals we draw.

Page 5: POSC 202A: Lecture 9 Lecture: statistical significance.
Page 6: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Think of it this way: We can draw a sample in order to estimate some statistic. If we repeat this over and over, we start to create a sampling distribution.

To create a 95% CI, we need to see how large an interval around the statistic we must create so that, in 95% of the samples drawn, the interval contains the true population parameter (value).

Page 386 has a nice graph of this.
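
The repeated-sampling idea can be checked with a small simulation. This is only a sketch under stated assumptions: the true proportion (0.6), the sample size (400), the number of trials, and the function name `coverage_rate` are illustrative choices, not part of the lecture. It uses the usual standard deviation formula for a proportion and an interval of +/- 2 standard deviations.

```python
import math
import random


def coverage_rate(true_p, n, trials, z=2.0, seed=0):
    """Share of simulated samples whose interval contains true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one sample of n yes/no answers and compute its proportion.
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hat = successes / n
        # Standard deviation of the sampling distribution, estimated from the sample.
        sd = math.sqrt(p_hat * (1 - p_hat) / n)
        # Count the sample as a hit if its interval covers the true value.
        if p_hat - z * sd <= true_p <= p_hat + z * sd:
            hits += 1
    return hits / trials


# Hypothetical settings: true proportion 0.6, samples of 400, 2000 repetitions.
print(coverage_rate(true_p=0.6, n=400, trials=2000))
```

The printed share should land close to 0.95, which is what "95% confidence" refers to: the method captures the true value in about 95% of samples.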

Page 7: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Confidence Interval:

An example: a 95% confidence interval is the range needed to capture the true population value in 95% of the intervals we draw from a population.

Page 8: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Confidence Interval:

The interval thus captures the variability inherent in using samples to draw inferences about a population.

To construct the interval, we need an estimate of the population mean and the standard deviation.

Reported in the form: Estimate +/- Margin of Error

Generally, the interval is:

mean - (z*sd); mean + (z*sd)

Page 9: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Confidence Interval:

To construct the interval, we need an estimate of the population mean and the standard deviation.

How do we calculate this for a sampling distribution?

This is for proportions:

Mean = p

$$\text{S.D.} = \sqrt{\frac{p(1-p)}{n}}$$
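
As a quick illustration of the proportion formula, here is a minimal sketch. The function name `proportion_sd` is a hypothetical choice, and the inputs (p = 0.6, n = 400) are the Dear Abby figures used later in the lecture.

```python
import math


def proportion_sd(p, n):
    """Standard deviation of the sampling distribution of a proportion."""
    return math.sqrt(p * (1 - p) / n)


# With p = 0.6 and n = 400, this matches the lecture's worked value.
print(round(proportion_sd(0.6, 400), 4))  # 0.0245
```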

Page 10: POSC 202A: Lecture 9 Lecture: statistical significance.

Refresher: Rule of Thumb

Recall that with a normal distribution:

Apportionment of area about the mean is:

+/- 1sd = 68%
+/- 2sd = 95%
+/- 3sd = 99.7%

So, a 95% CI corresponds to

x̄ +/- 2sd = 95%
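
These areas can be verified numerically. The sketch below uses the standard identity that the area within k standard deviations of the mean of a normal distribution is erf(k / sqrt(2)); the function name `area_within` is a hypothetical choice.

```python
import math


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The result for k = 2 is slightly above 95%; an area of exactly 95% corresponds to about 1.96 standard deviations, so using 2 is a convenient rounding.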

Page 11: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Let's create an example using a sample from Dear Abby, in which 400 women responded, of whom 60% would rather just cuddle than have sex with their husbands.

What do we want to know?

Page 12: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Is 60% beyond what we would expect to see due to chance alone?

Is this likely to be just random variation?

What do we want to know?

Page 13: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Let's create an example using a sample from Dear Abby, in which 400 women responded, of whom 60% would rather just cuddle than have sex with their husbands.

Sample mean = p

$$\text{S.D.} = \sqrt{\frac{p(1-p)}{n}}$$

p = .6, so:

$$\text{S.D.} = \sqrt{\frac{.6(1-.6)}{400}} = \sqrt{\frac{.24}{400}} = \frac{.49}{20} = .0245$$

Page 14: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

Our confidence interval is thus:

x̄ +/- 2sd = 95% CI

Or

.6 +/- 2(.0245) = .6 - .049 and .6 + .049
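
The same arithmetic can be written as a short function. This is a sketch, assuming the lecture's rule of +/- 2 standard deviations for a 95% interval; the name `confidence_interval` is hypothetical.

```python
import math


def confidence_interval(p, n, z=2.0):
    """Return (low, high) for a proportion p observed in a sample of size n."""
    sd = math.sqrt(p * (1 - p) / n)  # standard deviation for a proportion
    margin = z * sd                  # margin of error
    return p - margin, p + margin


low, high = confidence_interval(0.6, 400)
print(round(low, 3), round(high, 3))  # 0.551 0.649
```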

Page 15: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

.6 +/- 2(.0245) = .6 - .049 and .6 + .049

Round off to .05

[Figure: number line marking the 95% CI from .55 to .65, centered on x̄ = .60]

Page 16: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

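Intervals built this way capture the true population parameter in 95% of all samples. For our sample, the interval runs from .55 to .65.

[Figure: number line marking the 95% CI from .55 to .65, centered on x̄ = .60]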

Page 17: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

Intervals built this way capture the true population parameter in 95% of all samples. For our sample, the interval runs from .55 to .65.

From this we conclude that we are 95% confident that between 55% and 65% of women prefer cuddling to sex.

Page 18: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

What would we have expected if they answered randomly?

Page 19: POSC 202A: Lecture 9 Lecture: statistical significance.

Exercise

A study of graduate placement examines 50 graduates and finds that 20% of them earn jobs at universities that are not teaching intensive.

Construct a confidence interval that allows us to assess whether this result is too small to attribute to chance.

Page 20: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

How do our estimates change with the size of the sample?

Recall we found

$$\text{S.D.} = \sqrt{\frac{.6(1-.6)}{400}} = \sqrt{\frac{.24}{400}} = \frac{.49}{20} = .0245$$

The population average (mean) stays the same regardless of sample size. But what of the SD?

$$\text{S.D.} = \sqrt{\frac{p(1-p)}{n}}$$

Page 21: POSC 202A: Lecture 9 Lecture: statistical significance.

Sample Size and Confidence Intervals

Where the sample statistic is .5

             Sample #1  Sample #2  Sample #3  Sample #4  Sample #5  Sample #6  Sample #7
Mean         0.5        0.5        0.5        0.5        0.5        0.5        0.5
N            100        200        300        400        500        1000       10000
Sd           0.05       0.035      0.029      0.025      0.022      0.016      0.005
95% CI from  0.4        0.43       0.44       0.45       0.46       0.47       0.49
95% CI to    0.6        0.57       0.56       0.55       0.54       0.53       0.51
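
The rows above can be regenerated with a few lines of code, which makes the effect of sample size easy to explore. This is a sketch assuming the lecture's +/- 2 standard deviation rule; the helper name `ci_row` is hypothetical.

```python
import math


def ci_row(p, n, z=2.0):
    """Return (sd, low, high) for proportion p and sample size n."""
    sd = math.sqrt(p * (1 - p) / n)
    return sd, p - z * sd, p + z * sd


# Same sample sizes as the table, with the sample statistic fixed at .5.
for n in (100, 200, 300, 400, 500, 1000, 10000):
    sd, low, high = ci_row(0.5, n)
    print(f"{n:>6}  {sd:.3f}  {low:.2f}  {high:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard deviation: going from 100 to 400 respondents moves it from 0.05 to 0.025.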

Page 22: POSC 202A: Lecture 9 Lecture: statistical significance.

Sample Size and Standard Error of an Average

[Figure: standard error (vertical axis, 0 to 0.06) plotted against sample size (horizontal axis, 0 to 6000), with points labeled at sample sizes 100, 200, 300, 400, 500, 1000, and 5000]

Page 23: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

A shortcut for approximating the 95% CI for a proportion:

$$\frac{1}{\sqrt{N}}$$

More accurate as you get closer to an even split (i.e. 50%)

Page 24: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

Mean                              0.10   0.20   0.30   0.40   0.50
Sample Size                       900    900    900    900    900
SD calculation, sqrt(p(1-p)/N)    0.010  0.013  0.015  0.016  0.017
2 x SD                            0.020  0.027  0.031  0.033  0.033
2 SD approximation, 1/sqrt(N)     0.033  0.033  0.033  0.033  0.033
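
A short sketch comparing the exact margin (2 x SD) with the 1/sqrt(N) shortcut, using the same sample size of 900. The function names `exact_margin` and `shortcut_margin` are hypothetical.

```python
import math


def exact_margin(p, n, z=2.0):
    """Margin of error computed from the proportion's standard deviation."""
    return z * math.sqrt(p * (1 - p) / n)


def shortcut_margin(n):
    """Shortcut margin of error that ignores the observed proportion."""
    return 1 / math.sqrt(n)


for p in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(p, round(exact_margin(p, 900), 3), round(shortcut_margin(900), 3))
```

The shortcut is exact at p = .5, since 2 x sqrt(.25/N) = 1/sqrt(N), and it overstates the margin as p moves away from an even split.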

Page 25: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Recall--The fundamental Question:

How likely is the result we observe to be the product of chance?

This question drives all of the tests we perform by allowing us to differentiate the systematic from the stochastic.

Significance testing is all about comparisons.

Page 26: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Significance testing is all about comparisons.

Is what we observe close to or far from what we expect?

Is what we observe so far from what we expect that we cannot attribute what we see to chance alone?

Page 27: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

An example: Imagine we took a valid random sample of women’s preferences and got the same result as Dear Abby’s survey of women’s cuddling preferences (60% preferred cuddling to sex; 400 women responded).

How would we conduct a significance test?

We need to identify the appropriate comparison.

Page 28: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

How would we conduct a significance test?

What if women randomly answered “snuggle” or “sex”?

Page 29: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

What would we expect if women just answered randomly?

If so, we might expect 50% of respondents to prefer snuggling.

Page 30: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

What would we expect if women just answered randomly?

If so, we might expect 50% of respondents to prefer snuggling.

Then we need to know if what we observed (60%) is too large a result to be attributable to chance alone.

How can we determine this?

Page 31: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

How can we determine this?

One way is to estimate a 95% confidence interval around our sample mean (60%) and see if it contains the result we would expect due to chance alone (50%).

Page 32: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

Recall, we created the following interval earlier. We can simply look to see if .50 is within the interval constructed.

Since it is not, our sample result is statistically significantly different from chance.

[Figure: number line marking the 95% CI from .55 to .65, centered on x̄ = .60, with .50 lying outside the interval]
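
The interval check can be sketched in code. The assumptions here are the +/- 2 standard deviation rule for a 95% interval and a chance value of 0.5; the function name `differs_from_chance` is hypothetical.

```python
import math


def differs_from_chance(p_hat, n, chance=0.5, z=2.0):
    """True when the chance value falls outside the interval around p_hat."""
    sd = math.sqrt(p_hat * (1 - p_hat) / n)
    low, high = p_hat - z * sd, p_hat + z * sd
    return not (low <= chance <= high)


# Dear Abby figures: 60% of 400 respondents.
print(differs_from_chance(0.6, 400))  # True
```

A result closer to an even split, such as 52% of 400 respondents, would give an interval that contains .50, and the function would return False.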

Page 33: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

How can we determine this?

A second way is to conduct a significance test by solving for areas under the normal curve. Recall we know how to find the likelihood that some event occurs using the formula:

$$Z = \frac{X_i - \bar{X}}{\text{S.D.}}$$

This formula asks whether what we see is too far from what we expect to be attributed to chance alone.

Page 34: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

We simply calculate the number of standard units that what we observe lies from random chance.

$$Z = \frac{X_i - \bar{X}}{\text{S.D.}}$$

Then use the Z table to obtain the likelihood that we see a sample result as large as 60% if the true value is 50%.

$$Z = \frac{60 - 50}{2.45} \approx 4.08$$

Page 35: POSC 202A: Lecture 9 Lecture: statistical significance.

Statistical Significance

Our Z table only goes to 3.4!

Less than 1 time in 10,000 would we see a sample mean of 60% if the process were driven by chance alone.
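
When the printed Z table runs out, the tail area can be computed directly. The sketch below uses the standard identity P(Z > z) = 0.5 * erfc(z / sqrt(2)) for a standard normal variable. The function names are hypothetical, and the standard deviation of .0245 is the value from the earlier Dear Abby calculation.

```python
import math


def z_score(observed, expected, sd):
    """Number of standard units between the observed and expected values."""
    return (observed - expected) / sd


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = z_score(0.60, 0.50, 0.0245)
print(round(z, 2))            # 4.08
print(upper_tail(z) < 1e-4)   # True: rarer than 1 time in 10,000
```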

Page 36: POSC 202A: Lecture 9 Lecture: statistical significance.

Confidence Interval

We could illustrate this process on the normal curve as well. The question: How likely is it that we would see a result above 60%?

[Figure: normal curve with .50, .55, and .60 (Xi) marked on the horizontal axis; areas labeled "Z>.9998" and "Z<.0001"]