Economics 173 Business Statistics Lecture 7 Fall, 2001 Professor J. Petry


http://www.cba.uiuc.edu/jpetry/Econ_173_fa01/

Organization of Techniques

• Keeping track of the different tests we are conducting is best done with the “Decision Tree” and “Summary” provided in Chapter 22 of your book.

• As we go through the chapters, you should be using the decision tree and summary to do your problems.
– You will be given copies of both for the exams.
– We will use the version at the end of the book (Chapter 22) so that you have the same one to use during the midterm and the final.
– The versions we are handing out today include statistical tables, which, as we announced last class, will no longer be used in this course.

• Develop a process to work each problem. My process is . . .
– Read the question at least twice.
– Ask myself what type of question this feels like. Parameter? H1?
– Go down the decision tree formally.

Organization of Techniques

Example 1:

In a recent municipal election the high cost of housing became an important issue. A candidate seeking to unseat an incumbent claimed that the average family spends more than 30% of its annual income on housing. A housing expert was asked to investigate the claim. A random sample of 125 households was asked to report the percentage of household income spent on housing costs. Assuming you were given the data, what technique would you use to determine if the candidate was correct at the 5% significance level?

Example 2: The number of Internet users is rapidly increasing. A recent survey reveals that there are about 30 million Internet users in North America. Suppose 200 of these people were surveyed and asked to report how many hours they spent on the Internet last week. Assuming you were given the data, what technique would you use to estimate with 95% confidence the average amount of time spent by all North Americans on the Internet?
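To make Examples 1 and 2 concrete, here is a minimal Python sketch, assuming the technique chosen from the decision tree is the t-test and the t-interval for a single population mean with σ unknown (quantitative data, parameter μ). The function names are hypothetical, and the caller must supply the critical value `t_crit` (for example, from Excel), because Python's standard library has no t-distribution.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, d.f.) for H0: mu = mu0, where t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return (xbar - mu0) / (s / math.sqrt(n)), n - 1


def t_interval(sample, t_crit):
    """Return xbar -/+ t_crit * s / sqrt(n); t_crit = t_{alpha/2, n-1} is supplied by the caller."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return xbar - half_width, xbar + half_width
```

For Example 1, one would call `one_sample_t(percentages, 30)` and reject $H_0: \mu = 30$ in favor of $H_1: \mu > 30$ if the statistic exceeds $t_{0.05}$ with 124 degrees of freedom. For Example 2, one would call `t_interval(hours, t_crit)` with $t_{0.025}$ and 199 degrees of freedom. Here `percentages` and `hours` are placeholders for the sample data, which the slides do not give.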

Organization of Techniques

Example 3:

A rock promoter is in the process of deciding whether to book a new band for a rock concert. He knows that this band appeals almost exclusively to teenagers. According to the latest census, there are 400,000 teenagers in the area. Assuming you were provided the data, what technique would you use to estimate the proportion of teenagers who will attend the concert?
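For Example 3, a plausible choice is the z-interval estimator of a population proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, which relies on the normal approximation to the binomial and so needs a reasonably large sample. The sketch below assumes the data reduce to a count of surveyed teenagers who say they will attend. The function name is hypothetical, and `NormalDist` from Python's standard library supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return p_hat -/+ z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width
```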

Example 4: Some traffic experts believe that the major cause of highway collisions is the differing speeds of cars. That is, when some cars are driven slowly while others are driven fast, cars tend to congregate in bunches, increasing the probability of accidents. Thus, the greater the variation in speeds, the greater the number of collisions that occur. Suppose that one expert believes that when the variance exceeds 18 (mph)², the number of accidents will be unacceptably high. Assuming you are provided the data, what technique would you use to test whether the variance in speeds exceeds 18 (mph)²?
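Example 4 points to the chi-squared test of a population variance, with $H_0: \sigma^2 = 18$ against $H_1: \sigma^2 > 18$ and test statistic $\chi^2 = (n-1)S^2/\sigma_0^2$ on $n - 1$ degrees of freedom. Below is a minimal sketch of the statistic, assuming the recorded speeds arrive as a list of numbers. The function name is hypothetical, and the critical value $\chi^2_{0.05,\,n-1}$ has to come from Excel or another package.

```python
import statistics


def variance_chi_square(sample, sigma0_sq):
    """Return (chi2, d.f.) for H0: sigma^2 = sigma0_sq, where chi2 = (n - 1) * s^2 / sigma0_sq."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance (n - 1 divisor)
    return (n - 1) * s_sq / sigma0_sq, n - 1
```

With the speed data, `variance_chi_square(speeds, 18)` would be compared with the right-tail critical value; `speeds` is a placeholder for data the slide does not give.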

Inference about the Comparison of Two Populations

Chapter 12

12.1 Introduction

• A variety of techniques are presented whose objective is to compare two populations.

• We are interested in:
– The difference between two means.
– The ratio of two variances.
– The difference between two proportions.

12.2 Inference about the Difference between Two Means: Independent Samples

• Two random samples are drawn from the two populations of interest.

• Because we are interested in the difference between the two means, we compute the sample mean $\bar{x}$ for each sample and base our inference on the statistic $\bar{x}_1 - \bar{x}_2$.

The Sampling Distribution of $\bar{x}_1 - \bar{x}_2$

• $\bar{x}_1 - \bar{x}_2$ is normally distributed if the (original) population distributions are normal.

• $\bar{x}_1 - \bar{x}_2$ is approximately normally distributed if the (original) populations are not normal, but the sample sizes are large.

• The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.

• The variance of $\bar{x}_1 - \bar{x}_2$ is $\sigma_1^2/n_1 + \sigma_2^2/n_2$.
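A quick Monte Carlo check can make the expected-value and variance results concrete. This Python sketch assumes normal populations with hypothetical parameters ($\mu_1 = 50$, $\sigma_1 = 10$, $n_1 = 10$; $\mu_2 = 45$, $\sigma_2 = 6$, $n_2 = 15$). For these values the theory predicts a mean of 5 and a variance of $10^2/10 + 6^2/15 = 12.4$ for $\bar{x}_1 - \bar{x}_2$. The function name is also hypothetical.

```python
import random
import statistics


def simulate_diff_of_means(mu1, sigma1, n1, mu2, sigma2, n2, reps=20_000, seed=1):
    """Simulate `reps` values of xbar1 - xbar2 from independent normal samples."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical populations: mu1 = 50, sigma1 = 10, n1 = 10; mu2 = 45, sigma2 = 6, n2 = 15
diffs = simulate_diff_of_means(50, 10, 10, 45, 6, 15)
print(statistics.fmean(diffs))     # should be near mu1 - mu2 = 5
print(statistics.variance(diffs))  # should be near 10**2/10 + 6**2/15 = 12.4
```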

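• If the sampling distribution of $\bar{x}_1 - \bar{x}_2$ is normal or approximately normal, we can write:

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

• Z can be used to build a test statistic or a confidence interval for $\mu_1 - \mu_2$.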
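When both population variances are known, the Z formula translates directly into code. Below is a minimal Python sketch. The function names and the numbers in the usage line are hypothetical, and `NormalDist` from the standard library supplies $z_{\alpha/2}$ for the interval.

```python
import math
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1_sq, sigma2_sq, n1, n2, delta0=0.0):
    """Z = ((xbar1 - xbar2) - delta0) / sqrt(sigma1^2/n1 + sigma2^2/n2); delta0 is mu1 - mu2 under H0."""
    se = math.sqrt(sigma1_sq / n1 + sigma2_sq / n2)
    return ((xbar1 - xbar2) - delta0) / se


def two_sample_z_interval(xbar1, xbar2, sigma1_sq, sigma2_sq, n1, n2, confidence=0.95):
    """(xbar1 - xbar2) -/+ z_{alpha/2} * sqrt(sigma1^2/n1 + sigma2^2/n2)."""
    se = math.sqrt(sigma1_sq / n1 + sigma2_sq / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = xbar1 - xbar2
    return diff - z * se, diff + z * se


print(two_sample_z(55, 50, 36, 64, 4, 4))  # 1.0, since the standard error is sqrt(9 + 16) = 5
```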

• In practice, the Z statistic is rarely used, because the population variances $\sigma_1^2$ and $\sigma_2^2$ are usually unknown.

• Instead, we construct a t statistic by replacing the population variances with the sample variances $S_1^2$ and $S_2^2$.


• Two cases are considered when producing the t-statistic.

– The two unknown population variances are equal.

– The two unknown population variances are not equal.

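Case I: The two variances are equal

• Calculate the pooled variance estimate by:

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

• $S_p^2$ is a weighted average of $S_1^2$ and $S_2^2$ (with weights $n_1 - 1$ and $n_2 - 1$), so it always lies between the two sample variances.

Example: $S_1^2 = 25$; $S_2^2 = 30$; $n_1 = 10$; $n_2 = 15$. Then,

$$S_p^2 = \frac{(10 - 1)(25) + (15 - 1)(30)}{10 + 15 - 2} = 28.04347$$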
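The pooled-variance formula and the example above can be checked with a short Python function (the name `pooled_variance` is hypothetical):

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """S_p^2 = ((n1 - 1) * S1^2 + (n2 - 1) * S2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


print(round(pooled_variance(25, 10, 30, 15), 5))  # 28.04348
```

The exact value is 645/23. The printed 28.04348 is the slide's 28.04347 rounded rather than truncated.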

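• Construct the t-statistic as follows:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad d.f. = n_1 + n_2 - 2$$

• Perform a hypothesis test: $H_0: \mu_1 - \mu_2 = 0$; $H_1: \mu_1 - \mu_2 > 0$, or $< 0$, or $\neq 0$.

• Build an interval estimate:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $1 - \alpha$ is the confidence level.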
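Here is a minimal Python sketch of the equal-variances t-statistic, assuming only summary statistics (means, sample variances, sample sizes) are available. The function name is hypothetical. Given the Example 12.2 summary figures from the Excel printout in this lecture, it reproduces Excel's t Stat of about 0.9273 with 48 degrees of freedom.

```python
import math


def pooled_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2, delta0=0.0):
    """Equal-variances t-statistic; returns (t, d.f.) with d.f. = n1 + n2 - 2."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - delta0) / se, n1 + n2 - 2


# Example 12.2 summary statistics (Design A, then Design B) from the Excel printout
t, df = pooled_t(6.288, 0.847766667, 25, 6.016, 1.3030667, 25)
print(round(t, 4), df)  # 0.9273 48
```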

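Case II: The two variances are unequal

• The t-statistic and its degrees of freedom are:

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}, \qquad d.f. = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$

• Run a hypothesis test as needed, or build an interval estimate:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

where $1 - \alpha$ is the confidence level.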
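The unequal-variances statistic and its degrees-of-freedom formula can be sketched the same way in Python (the function name is hypothetical). With the Example 12.1 summary statistics from the Excel printout in this lecture, it gives t ≈ −2.091 and d.f. ≈ 122.6. The printout reports 123, which suggests Excel rounds the degrees of freedom to a whole number.

```python
import math


def welch_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2, delta0=0.0):
    """Unequal-variances t-statistic and its (generally non-integer) d.f."""
    a, b = s1_sq / n1, s2_sq / n2
    t = ((xbar1 - xbar2) - delta0) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Example 12.1 summary statistics (consumers, then non-consumers) from the Excel printout
t, df = welch_t(604.023, 4102.98, 43, 633.234, 10669.8, 107)
```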

• Example 12.1
– Do people who eat high-fiber cereal for breakfast consume, on average, fewer calories for lunch than people who do not eat high-fiber cereal for breakfast?

– A sample of 150 people was randomly drawn. Each person was identified as a consumer or a non-consumer of high-fiber cereal.

– For each person, the number of calories consumed at lunch was recorded.

Calories consumed at lunch

Consumers   Non-consumers
568         705
498         819
589         706
681         509
540         613
646         582
636         601
739         608
539         787
596         573
607         428
529         754
637         741
617         628
633         537
555         748
...         ...

Solution:
• The data are quantitative.
• The parameter to be tested is the difference between two means.
• The claim to be tested is that the mean caloric intake of consumers ($\mu_1$) is less than that of non-consumers ($\mu_2$).

• Identifying the technique

– The hypotheses are:
$H_0: (\mu_1 - \mu_2) = 0$
$H_1: (\mu_1 - \mu_2) < 0$ (that is, $\mu_1 < \mu_2$)

– To check the relationship between the variances, we use computer output to find the sample standard deviations: $S_1 = 64.05$ and $S_2 = 103.29$. It appears that the variances are unequal.

– We run the t-test for unequal variances.

t-Test: Two-Sample Assuming Unequal Variances

                               Consumers    Nonconsumers
Mean                           604.023      633.234
Variance                       4102.98      10669.8
Observations                   43           107
Hypothesized Mean Difference   0
df                             123
t Stat                         -2.09107
P(T<=t) one-tail               0.01929
t Critical one-tail            1.65734
P(T<=t) two-tail               0.03858
t Critical two-tail            1.97944

• At the 5% significance level, there is sufficient evidence to reject the null hypothesis: the one-tail P-value (0.01929) is less than 0.05. We infer that people who eat high-fiber cereal for breakfast consume fewer calories at lunch, on average, than people who do not.

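• Solving by hand

– The interval estimator for the difference between two means is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$

– With the Example 12.1 figures and $t_{\alpha/2} = 1.9796$:

$$(604.02 - 633.23) \pm 1.9796\sqrt{\frac{64.05^2}{43} + \frac{103.29^2}{107}} = -29.21 \pm 27.65$$

so, at the 95% confidence level, $-56.86 < \mu_1 - \mu_2 < -1.56$.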
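The hand calculation can be reproduced with a short Python sketch (the function name is hypothetical). It uses the means rounded to two decimals, the sample standard deviations, and the critical value 1.9796 from the slide.

```python
import math


def welch_interval(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """(xbar1 - xbar2) -/+ t_crit * sqrt(s1^2/n1 + s2^2/n2); t_crit is supplied by the caller."""
    diff = xbar1 - xbar2
    half_width = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - half_width, diff + half_width


lo, hi = welch_interval(604.02, 64.05, 43, 633.23, 103.29, 107, t_crit=1.9796)
print(round(lo, 2), round(hi, 2))
```

The interval comes out at about (−56.86, −1.56). Because it lies entirely below zero, it is consistent with the one-tail test's conclusion that $\mu_1 < \mu_2$.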

• Example 12.2

– Does job design (referring to worker movements) affect workers’ productivity?

– Two job designs are being considered for the production of a new computer desk.

– Two samples are randomly and independently selected:
• A sample of 25 workers assembled a desk using design A.
• A sample of 25 workers assembled the desk using design B.
• The assembly times were recorded.

– Do the assembly times of the two designs differ?

Assembly times in minutes

Design-A   Design-B
6.8        5.2
5.0        6.7
7.9        5.7
5.2        6.6
7.6        8.5
5.0        6.5
5.9        5.9
5.2        6.7
6.5        6.6
...        ...

Solution

• The data are quantitative.

• The parameter of interest is the difference between two population means.

• The claim to be tested is whether a difference between the two designs exists.

The Excel printout

t-Test: Two-Sample Assuming Equal Variances

                               Design-A       Design-B
Mean                           6.288          6.016
Variance                       0.847766667    1.3030667
Observations                   25             25
Pooled Variance                1.075416667
Hypothesized Mean Difference   0
df                             48
t Stat                         0.927332603
P(T<=t) one-tail               0.179196744
t Critical one-tail            1.677224191
P(T<=t) two-tail               0.358393488
t Critical two-tail            2.01063358

In the printout, the Variance row gives $S_1^2$ and $S_2^2$, Pooled Variance is $S_p^2$, df is the degrees of freedom, t Stat is the t-statistic, and P(T<=t) one-tail and P(T<=t) two-tail are the P-values of the one-tail and two-tail tests.

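A 95% confidence interval for $\mu_1 - \mu_2$ is calculated as follows:

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (6.288 - 6.016) \pm 2.0106\sqrt{1.075\left(\frac{1}{25} + \frac{1}{25}\right)} = 0.272 \pm 0.5896 = [-0.3176,\ 0.8616]$$

Thus, at the 95% confidence level,

$$-0.3176 < \mu_1 - \mu_2 < 0.8616$$

Notice: “Zero” is included in the interval, so at the 5% significance level there is not enough evidence to infer that the mean assembly times of the two designs differ (the two-tail P-value in the printout is 0.358).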
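Here is a minimal Python sketch of the equal-variances interval and the zero check (the function name is hypothetical). It uses the Example 12.2 summary statistics and $t_{\alpha/2} = 2.0106$ from the slide.

```python
import math


def pooled_interval(xbar1, s1_sq, n1, xbar2, s2_sq, n2, t_crit):
    """(xbar1 - xbar2) -/+ t_crit * sqrt(S_p^2 * (1/n1 + 1/n2)); t_crit is supplied by the caller."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)  # pooled variance
    half_width = t_crit * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


lo, hi = pooled_interval(6.288, 0.847766667, 25, 6.016, 1.3030667, 25, t_crit=2.0106)
contains_zero = lo <= 0 <= hi  # zero inside the 95% interval: the 5% two-tail test would not reject H0
```

With the unrounded pooled variance (about 1.0754), the interval comes out near [−0.3177, 0.8617]. The slide's [−0.3176, 0.8616] uses $S_p^2$ rounded to 1.075. Either way, `contains_zero` is True.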

Checking the required conditions for the equal-variances case (Example 12.2)

[Figure: histograms of assembly times for Design A and Design B]

The distributions are not perfectly bell shaped, but they seem to be approximately normal. Since the technique is robust to moderate departures from normality, we can be confident about the results.

Example

• Exercise 12.20 from the book
• Random samples were drawn from each of two populations. The data are stored in columns 1 and 2, respectively, in file XR12-20.

• Is there sufficient evidence at the 5% significance level to infer that the mean of population 1 is greater than the mean of population 2?

                          X1          X2
Mean                      246.80      239.66
Standard Error            2.88        0.94
Median                    247.00      240.00
Mode                      280.00      240.00
Standard Deviation        28.81       11.57
Sample Variance           829.90      133.81
Kurtosis                  0.34        0.02
Skewness                  -0.02       0.02
Range                     162.00      61.00
Minimum                   158.00      213.00
Maximum                   320.00      274.00
Sum                       24680.00    35949.00
Count                     100.00      150.00
Confidence Level(95.0%)   5.72        1.87

t-Test: Two-Sample Assuming Unequal Variances

                               X1           X2
Mean                           246.8        239.66
Variance                       829.89899    133.8097987
Observations                   100          150
Hypothesized Mean Difference   0
df                             121
t Stat                         2.3551335
P(T<=t) one-tail               0.0100626
t Critical one-tail            1.657545
P(T<=t) two-tail               0.0201252
t Critical two-tail            1.9797653
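To finish Exercise 12.20, this minimal Python sketch (function name hypothetical) recomputes the unequal-variances t-statistic from the summary statistics and applies the rejection rule for $H_1: \mu_1 - \mu_2 > 0$. It uses the one-tail critical value from the printout. The printout uses the unequal-variances test, which is consistent with sample variances as different as 829.90 and 133.81.

```python
import math


def welch_t(xbar1, s1_sq, n1, xbar2, s2_sq, n2):
    """Unequal-variances t-statistic (H0: mu1 - mu2 = 0) and its d.f."""
    a, b = s1_sq / n1, s2_sq / n2
    t = (xbar1 - xbar2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Means, sample variances, and counts for X1 and X2 from the printout
t, df = welch_t(246.8, 829.89899, 100, 239.66, 133.8097987, 150)
t_crit_one_tail = 1.657545  # "t Critical one-tail" reported by Excel
reject_h0 = t > t_crit_one_tail  # right-tail rejection region for H1: mu1 - mu2 > 0
print(round(t, 4), round(df, 1), reject_h0)
```

This gives t ≈ 2.3551 and d.f. ≈ 120.5 (Excel shows 121). Because 2.3551 exceeds 1.6575, `reject_h0` is True. Equivalently, the one-tail P-value of 0.0101 is below 0.05, so at the 5% level there is sufficient evidence to infer that the mean of population 1 is greater than the mean of population 2.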