Warm up On slide. Section 11.1 Chi-Square Inference Summary Means (Hypothesis Test and Confidence...


Warm up
• On slide

Section 11.1

Chi-Square

Inference Summary

Means (Hypothesis Test and Confidence Intervals)
• One-sample z procedures
• One-sample t procedures
• Matched pairs t procedures
• Two-sample t procedures

Proportions (Hypothesis Test and Confidence Intervals)
• One-proportion z procedures
• Two-proportion z procedures

The questions then are…

• What if we want to compare MORE than 2 proportions?
– e.g. Let’s examine the proportion of high school students who go on to four-year colleges. Is that proportion different based on race (White, African American, Asian, Hispanic)? We’d be comparing 4 proportions!

• What if we want to check whether actual results fit a predicted model?
– e.g. We want to evaluate the results of mating two red-eyed fruit flies by comparing the actual results to the predicted model.

• What if we want to compare two categorical variables to see if there is a relationship?
– e.g. Is smoking behavior (current smoker, former smoker, never smoked) associated with socioeconomic status (high, medium, low)?

The answer is…

χ²
Spelled Chi-Squared.

Pronounced like KITE without the “te.”

Then there were three
• There are three types of tests
– Goodness of fit
– Homogeneity of Proportions
– Association / Independence

• Today our focus will be the Chi-Squared Goodness of Fit test.

Goodness of Fit
• The Chi-squared goodness of fit test measures whether an observed sample distribution is significantly different from the hypothesized distribution.

• The idea is to compare the observed counts in each category to the expected count for each category based on the hypothesized distribution.

• H0: The specified distribution of the categorical variable is correct.

• Ha: The specified distribution of the categorical variable is not correct.

Conditions
• Use the chi-squared test if
– SRS
– All the expected counts are at least 1.
– No more than 20% of expected counts are less than 5.
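
The two numeric conditions can be checked mechanically once the expected counts are known. The sketch below is only an illustration: the function name `check_gof_conditions` and the `is_srs` flag are hypothetical, and whether the sample really is an SRS has to be judged from how the data were collected.

```python
def check_gof_conditions(expected_counts, is_srs=True):
    """Return True if the chi-squared conditions listed on the slide hold.

    is_srs is supplied by the analyst; code cannot verify random sampling.
    """
    n_categories = len(expected_counts)
    # Condition: all expected counts are at least 1.
    all_at_least_one = all(e >= 1 for e in expected_counts)
    # Condition: no more than 20% of expected counts are less than 5.
    share_below_five = sum(1 for e in expected_counts if e < 5) / n_categories
    return is_srs and all_at_least_one and share_below_five <= 0.20


# Expected counts taken from the M&M'S calculation worked later in these slides.
mm_expected = [14.40, 12.00, 9.60, 8.40, 7.80, 7.80]
print(check_gof_conditions(mm_expected))
```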

• Mars, Incorporated makes milk chocolate candies. Here’s what the company’s Consumer Affairs Department says about the color distribution of its M&M’S Milk Chocolate Candies: On average, the new mix of colors of M&M’S Milk Chocolate Candies will contain 13 percent of each of browns and reds, 14 percent yellows, 16 percent greens, 20 percent oranges and 24 percent blues.

The one-way table below summarizes the data from a sample bag of M&M’S Milk Chocolate Candies. In general, one-way tables display the distribution of a categorical variable for the individuals in a sample.

Since the company claims that 24% of all M&M’S Milk Chocolate Candies are blue, but only 9 of the 60 candies in our sample bag (15%) are blue, we might believe that something fishy is going on. We could use the one-sample z test for a proportion from Chapter 9 to test the hypotheses

H0: p = 0.24
Ha: p ≠ 0.24

where p is the true population proportion of blue M&M’S. We could then perform additional significance tests for each of the remaining colors.
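
As a rough sketch of what that single-color test looks like, the Python below runs the one-sample z test for blue. The counts (9 blue out of 60 candies) are read off the worked chi-square calculation in these slides, and the function name `one_proportion_z_test` is hypothetical.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-sided one-sample z test for a proportion; returns (z, P-value)."""
    p_hat = successes / n
    # Standard error uses the hypothesized proportion p0, as in a significance test.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Standard normal lower-tail area at -|z|, computed with the error function.
    lower_tail = 0.5 * (1 + math.erf(-abs(z) / math.sqrt(2)))
    return z, 2 * lower_tail


# Assumption: 9 blue candies in a bag of 60, taken from the chi-square calculation.
z, p_value = one_proportion_z_test(9, 60, 0.24)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

Repeating this for all six colors would mean six separate tests, which is the motivation for a single test that looks at the whole distribution at once.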

Hypotheses

The null hypothesis in a chi-square goodness-of-fit test should state a claim about the distribution of a single categorical variable in the population of interest. In our example, the appropriate null hypothesis is

H0: The company’s stated color distribution for M&M’S Milk Chocolate Candies is correct.

Ha: The company’s stated color distribution for M&M’S Milk Chocolate Candies is not correct.

We can also write the hypotheses in symbols as

H0: p_blue = 0.24, p_orange = 0.20, p_green = 0.16, p_yellow = 0.14, p_red = 0.13, p_brown = 0.13

Ha: At least one of the p_i’s is incorrect

where p_color = the true population proportion of M&M’S Milk Chocolate Candies of that color.

The formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Remember Σ means sum. So compute this fraction for each category (O is the observed count, E is the expected count) and add them all up!

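$$\chi^2 = \frac{(9 - 14.40)^2}{14.40} + \frac{(8 - 12.00)^2}{12.00} + \frac{(12 - 9.60)^2}{9.60} + \frac{(15 - 8.40)^2}{8.40} + \frac{(10 - 7.80)^2}{7.80} + \frac{(6 - 7.80)^2}{7.80}$$

$$\chi^2 = 2.025 + 1.333 + 0.600 + 5.186 + 0.621 + 0.415 = 10.180$$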

P-value = .0703
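
The arithmetic above can be reproduced with a short Python sketch. The helper names are hypothetical, the assignment of the counts 10 and 6 to red and brown is an assumption (both have the same expected count, so the result is unaffected), and the upper-tail area is computed with a standard series for the incomplete gamma function rather than a table. The degrees of freedom are the number of categories minus 1.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total):
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


# Order: blue, orange, green, yellow, red, brown (red/brown pairing is assumed).
observed = [9, 8, 12, 15, 10, 6]
claimed = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]

n = sum(observed)                    # 60 candies in the bag
expected = [n * p for p in claimed]  # expected count = n * claimed proportion
statistic = chi_square_statistic(observed, expected)
df = len(observed) - 1               # categories minus 1
p_value = chi_square_upper_tail(statistic, df)

print(f"chi-square = {statistic:.3f}, df = {df}, P-value = {p_value:.4f}")
```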

Example
• Back in 1980, the US population had the following distribution by age:

Age Group       Percent of the Population
0 to 24         41.39%
25 to 44        27.68%
45 to 64        19.64%
65 and older    11.28%

1996…
• Suppose I take a sample of 500 US residents in 1996 and find the following distribution:

Age Group       Count
0 to 24         177
25 to 44        158
45 to 64        101
65 and older    64
Total           500

I want to know: does the distribution of my sample in 1996 match the distribution of age from 1980?

Let’s Compare:

Age Group   Observed (based on sample of 500)   Expected (based on 1980 percentage * 500)
0-24        177                                 206.95
25-44       158                                 138.4
45-64       101                                 98.2
65+         64                                  56.4

Help me fill in the last column!

[Bar chart: US Population by Age 1980 and 1996. Horizontal axis: Age (0 to 24, 25 to 44, 45 to 64, 65+). Vertical axis: Percent (0 to 0.45). Series: 1980 and 1996.]

We see that the distributions are different. The question is ARE THEY SIGNIFICANTLY DIFFERENT?
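
One way to answer that question is to run the goodness of fit calculation on the age data. The sketch below recomputes the expected counts (1980 percentage times 500), each category's contribution to the statistic, and the total. The 5% significance level is an assumption, not something the slides specify; 7.815 is the chi-square table critical value for 3 degrees of freedom at that level.

```python
age_groups = ["0 to 24", "25 to 44", "45 to 64", "65 and older"]
percent_1980 = [41.39, 27.68, 19.64, 11.28]   # hypothesized distribution
observed_1996 = [177, 158, 101, 64]           # sample of 500 residents

n = sum(observed_1996)
expected = [n * pct / 100 for pct in percent_1980]
contributions = [(o - e) ** 2 / e for o, e in zip(observed_1996, expected)]
statistic = sum(contributions)
df = len(age_groups) - 1   # categories minus 1

for group, o, e, c in zip(age_groups, observed_1996, expected, contributions):
    print(f"{group:<14}{o:>5}{e:>10.2f}{c:>9.3f}")
print(f"chi-square = {statistic:.3f}, df = {df}")

# Assumption: alpha = 0.05, so compare against the df = 3 table value.
CRITICAL_VALUE_05_DF3 = 7.815
significant = statistic > CRITICAL_VALUE_05_DF3
print("Significantly different at the 5% level:", significant)
```

Under that assumed 5% level, the statistic exceeds the critical value, so the 1996 sample would be judged significantly different from the 1980 distribution.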

Characteristics of the Chi-Squared Statistic

• Chi-Square is ALWAYS (always? Yes, always) skewed RIGHT.

• As the degrees of freedom increase, the graph becomes less skewed. It becomes more symmetric and looks more like a normal curve.

• The total area under a chi-square curve is 1. WHY?
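
These three characteristics can be illustrated numerically. A chi-square curve is a probability density curve, which is why its total area is 1; its skewness equals the square root of 8/df, which is always positive and shrinks as df grows. The sketch below is a rough check, not a proof: the function names, the integration range, and the step count are arbitrary choices for illustration.

```python
import math


def chi_square_pdf(x, df):
    """Height of the chi-square density curve at x > 0."""
    half = df / 2
    log_height = (half - 1) * math.log(x) - x / 2 - half * math.log(2) - math.lgamma(half)
    return math.exp(log_height)


def area_under_curve(df, upper=150.0, steps=60_000):
    """Midpoint-rule approximation of the area from 0 to upper."""
    width = upper / steps
    return sum(chi_square_pdf((i + 0.5) * width, df) for i in range(steps)) * width


def skewness(df):
    """Skewness of the chi-square distribution: sqrt(8 / df)."""
    return math.sqrt(8 / df)


for df in (3, 5, 10, 30):
    print(f"df = {df:>2}: area = {area_under_curve(df):.4f}, skewness = {skewness(df):.3f}")
```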

In Calc
• Put Observed in L1 and Expected in L2
• Stat, Test, χ² GOF-Test
• Enter your df (number of categories − 1)

• CAUTION!!!! You still need to know how to use the formula and table… Sometimes your calculator will give you an error! This happened in the 2008 Free Response!

How to recognize χ² Goodness of Fit

• You have many percents and you want to know if your sample matches the distribution.

Homework

Chapter 11 #9, 10, 13(a-c), 15, 19-22 (explain)