prediction interval Solution 4.2.11. - University of Hawaiigrw/Classes/2013-2014... · Math 472...

Math 472 Homework Assignment 4

Problem 4.2.11. Let X1, X2, . . . , Xn, Xn+1 be a random sample of sizen + 1, n > 1, from a distribution that is N(µ, σ2). Let X =

∑ni=1Xi/n

and S2 =∑n

i=1(Xi −X)2/(n− 1). Find the constant c so that the statistic

c(X −Xn+1)/S has a t-distribution. If n = 8, determine k such that

P (X − kS < X9 < X + kS) = 0.80.

The observed interval (x − ks, x + ks) is often called an 80% predictioninterval for X9.

Solution 4.2.11. The random variable X−Xn+1 has a normal distributionsince it is a linear combination of independent normally distributed randomvariables. Its expected value and variance are

E[X −Xn+1] = µ− µ = 0,

Var[X −Xn+1] =σ2

n+ σ2 = σ2

(n+ 1

n

).

Therefore,√n/(n+ 1)(X −Xn+1)/σ has a standard normal distribution.

By Student’s theorem, X and S2 are independent random variables, and(n− 1)S2/σ2 has a χ-square distribution with r = n− 1 degrees of freedom.Since X and S2 are functions of X1, . . . , Xn, we see that X, S2, and Xn+1 areindependent random variables, therefore X −Xn+1 and S2 are independentrandom variables. From these observations we conclude that√

n/(n+ 1)(X −Xn+1)/σ√S2/σ2

=

√n/(n+ 1)(X −Xn+1)

S

has a t-distribution with r = n− 1 degrees of freedom. It follows that

c =

√n

n+ 1.

Let n = 8. Then c =√

8/9 = 2√

2/3. Let T7 be a t-distributed randomvariable with 7 degrees of freedom and let t0.10,7 be the number definedby P (T7 > t0.10,7) = 0.10. Then P (−t0.10,7 < T7 < t0.10,7) = 0.80. This,combined with the observation that

−t0.10,7 <c(X −X9)

S< t0.10,7 ⇐⇒ X − t0.10,7

cS < X9 < X +

t0.10,7c

S,

shows that

k =t0.10,7c

=3t0.10,7

2√

2

.=

(3)(1.415)

2.828

.= 1.501.

1

2

Problem 4.2.18. Let X1, X2, . . . , Xn be a random sample from N(µ, σ2),where both parameters µ and σ2 are unknown. A confidence interval for σ2

can be found as follows. We know that (n − 1)S2/σ2 is a random variablewith a χ2(n − 1) distribution. Thus we can find constants a and b so thatP ((n− 1)S2/σ2 < b) = 0.975 and P (a < (n− 1)S2/σ2 < b) = 0.95.

(a) Show that this second probability statement can be written as

P

((n− 1)S2

b< σ2 <

(n− 1)S2

a

)= 0.95.

(b) If n = 9 and s2 = 7.93, find a 95% confidence interval for σ2.(c) If µ is known, how would you modify the preceding procedure for finding

a confidence interval for σ2?

Solution 4.2.18.(a) With the numbers a and b chosen as above, the result follows imme-

diately from the observation that

a <(n− 1)S2

σ2< b ⇐⇒ (n− 1)S2

b< σ2 <

(n− 1)S2

a.

(b) Let W be a χ2-distributed random variable with 8 degrees of free-dom. Then P (W < 17.5345)

.= 0.975 and P (2.1797 < W < 17.5345)

.=

0.95. With n = 9 we calculate the lower confidence limit to be (n −1)s2/b

.= (8)(7.93)/(17.5345)

.= 3.618 and the upper confidence limit to be

(n − 1)s2/a.= (8)(7.93)/(2.1797)

.= 29.104. Therefore the 95% confidence

interval for σ2 is

(3.618, 29.104).

(c) If µ is known then we can replace (n−1)S2/σ2 with∑n

i=1(Xi−µ)2/σ2,which has a χ-square distribution with n degrees of freedom, as a pivotalrandom variable.

Problem 4.2.25. To illustrate Exercise 4.2.24, letX1, . . . , X9 and Y1, . . . , Y12represent two independent random samples from the respective normal dis-tributions N(µ1, σ

21) and N(µ2, σ

22). It is given that σ21 = 3σ22, but σ22 is

unknown. Define a random variable that has a t-distribution that can beused to find a 95% confidence interval for µ1 − µ2.

Solution 4.2.25. Let X =∑9

i=1Xi/9 and Y =∑12

i=1 Yi/12. Since X − Yis a linear combination of independent normal random variables, it has a

3

normal distribution with

E[X − Y ] = µ1 − µ2

Var[X − Y ] =σ219

+σ2212

=σ219

+3σ2112

= σ21

(1

9+

1

4

).

Thus the random variable

(X − Y )− (µ1 − µ2)σ1√

1/9 + 1/4(1)

has a standard normal distribution.Since 8S2

1/σ21 and 11S2

2/σ22 = 11S2

2/(3σ21) are independent random vari-

ables that have χ2-distributions with respective degrees of freedom 8 and11, the random variable

19S2p/σ

21 = 8S2

1/σ21 + 11S2

2/(3σ21) = (8S2

1 + 11S22/3)/σ21

has a χ2-distribution with 19 degrees of freedom. Therefore Sp/σ1 is thesquare root of a χ2-distributed random variable divided by its degrees offreedom. Since Student’s Theorem implies that X, Y , S2

1 , and S22 are inde-

pendent random variables, we see that the random variable in equation (1)and S2

p are independent. Therefore

(X−Y )−(µ1−µ2)σ1√

1/9+1/4

Sp/σ1=

(X − Y )− (µ1 − µ2)Sp√

1/9 + 1/4

has a t-distribution with 19 degrees of freedom that can be used for a 95%confidence interval for µ1 − µ2, where S2

p = (8S21 + 11S2

2/3)/19.

Problem 4.2.27. Let X1, X2, . . . , Xn and Y1, Y2, . . . , Ym be two indepen-dent random samples from the respective normal distributions N(µ1, σ

21)

and N(µ2, σ22), where the four parameters are unknown. To construct a

confidence interval for the ratio, σ21/σ22, of the variances, form the quotient

of the two independent χ2 variables, each divided by its degrees of freedom,namely,

(2) F =

(m−1)S22

σ22

/(m− 1)

(n−1)S22

σ21

/(n− 1)=S22/σ

22

S21/σ

21

,

where S21 and S2

2 are the respective sample variances.

(a) What kind of distribution does F have?(b) From the appropriate table, a and b can be found so that P (F < b) =

0.975 and P (a < F < b) = 0.95.

4

(c) Rewrite the second probability statement as

(3) P

[aS21

S22

<σ21σ22

< bS21

S22

]= 0.95.

The observed values, s21 and s22, can be inserted in these inequalities toprovide a 95% confidence interval for σ21/σ

22.

Solution 4.2.27.(a) The random variable F in equation (2) has an F -distribution with

r1 = m numerator degrees of freedom and r2 = n denominator degrees offreedom.

(b) Using the notation of Table V on page 661, b = F0.025(m,n) anda = F0.975(m,n).

(c) A direct calculation shows that

a <S22/σ

22

S21/σ

21

< b ⇐⇒ aS21

S22

<σ21σ22

< bS21

S22

.

Combining this calculation with P (a < F < b) = 0.95 leads immediatelyto equation (3). We therefore find a 95% confidence interval for the ratioσ21/σ

22 is given by (

F0.975(m,n)S21

S22

, F0.0275(m,n)S21

S22

).

Problem 4.6.5. Assume that the weight of cereal in a “10-ounce box” isN(µ, σ). To test H0 : µ = 10.1 against H1 : µ > 10.1, we take a randomsample of size n = 16 and observe that x = 10.4 and s = 0.4.

(a) Do we accept or reject H0 at the 5% significance level?(b) What is the approximate p-value of this test?

Solution 4.6.5.(a) The test statistic,

√16(X − 10.1)/S has a Student’s t-distribution

with r = 15 degrees of freedom. The realization of the test statistic ist = (4)(10.4 − 10.1)/(0.4) = 3, and the critical value for the right-tailedalternative H1 is t0.05,15

.= 1.753. Since t > t0.05,15, we reject H0 at the 5%

significance level.(b) The approximate p-value of this test is p = P (T15 > 3)

.= 0.0045.

Problem 4.6.6. Each of 51 golfers hit three golf balls of brand X andthree golf balls of brand Y in a random order. Let Xi and Yi equal theaverages of the distances traveled by the brand X and brand Y golf ballshit by the ith golfer, i = 1, 2, . . . , 51. Let Wi = Xi − Yi, i = 1, 2, . . . , 51.Test H0 : µW = 0 against H1 : µW > 0, where µW is the mean of thedifferences. If w = 2.07 and s2W = 84.63, would H0 be accepted or rejectedat an α = 0.05 significance level? What is the p-value of this test?

5

Solution 4.6.6. We may assume that the distribution of the test statistic(W − µW )/(SW /

√51) is approximately the standard normal distribution.

Under the null hypothesis the realization of the test statistic is

z.=

w − 0√s2w/n

.=

2.07√84.63/51

.= 1.607.

The critical value for the right-tailed alternative hypothesis is z0.05.= 1.645,

and since z < z0.05 we accept H0 at the α = 0.05 significance level. Thep-value of this test is p

.= P (Z > 1.607)

.= 0.054.

Problem 4.6.7. Among the data collected for the World Health Organi-zation air quality monitoring project is a measure of suspended particlesin µg/m3. Let X and Y equal the concentration of suspended particles inµg/m3 in the city center (commercial district) for Melbourne and Houston,respectively. Using n = 13 observations of X and m = 16 observations ofY , we test H0 : µX = µY against H1 : µX < µY .

(a) Define the test statistic and critical region, assuming that the unknownvariances are equal. Let α = 0.05.

(b) If x = 72.9, sx = 25.6, y = 81.7, and sy = 28.3, calculate the value ofthe test statistic and state your conclusion.

Solution 4.6.7.(a) We are testing a difference in means and the sample sizes are small.

This suggests using the random variable

T27 =(X − Y )− (µX − µY )

Sp√

1/13 + 1/16,

where S2p =

12S2X + 15S2

Y

27,

as our test statistic. The random variable T27 has an approximate Student’st-distribution with 27 degrees of freedom. We are testing the left-tailedalternative µX − µY < 0, which leads to the critical region t < −t0.05,27,where t0.05,27

.= 1.703.

(b) From the given data, the realization of Sp is

sp =

√12s2x + 15s2y

27

.=

√(12)(25.6)2 + (15)(28.3)2

27

.= 27.133

and, assuming H0 is true, the realization of T27 is

t =(x− y)− 0

sp√

1/13 + 1/16

.=

72.9− 81.7

(27.133)(0.3734)

.= −0.8686.

Since t > −1.703, t lies outside the critical region. Therefore these dataprovide insufficient evidence to reject H0 at the significance level α = 0.05.The p-value of this test is p

.= P (T27 < −1.703)

.= 0.196.

6

Problem 4.6.8. Let p equal the proportion of drivers who use a seat beltin a country that does not have a mandatory seat belt law. It was claimedthat p = 0.14. An advertising campaign was conducted to increase thisproportion. Two months after the campaign, y = 104 out of a randomsample of n = 590 drivers were wearing their seat belts. Was the campaignsuccessful?

Solution 4.6.8. We will perform a large sample hypothesis test to testwhether the value of p increased after the advertising campaign. We testwhether the data support rejection of the null hypothesis H0 : p = 0.14 infavor of the right-tailed alternative hypothesis Ha : p > 0.14. We will usethe random variable

Z =p− p√

p(1− p)/n,

whose distribution is approximately the standard normal distribution, asthe test statistic. We will test at the significance level of α = 0.05. Theapproximate rejection region is z > z0.05

.= 1.645.

From the given data, the realized value of p is p = 104/590.= 0.176 and,

assume the null hypothesis is true, the realized value of Z is

z.=

0.176− 0.14√(0.14)(1− 0.14)/590

.= 2.539.

and the approximate p-value of this test is p.= P (Z > 2.539)

.= 0.0056.

Since z > 1.645 these data support rejection of the null hypothesis at thesignificance level of α = 0.05. In fact, the p-value shows that these datawould support rejection of H0 in favor of Ha even at the significance levelof α = 0.01. These data certainly support the claim that the proportion ofseat belt use increased following the advertising campaign.

Problem 4.6.9. In Exercise 4.2.18 we found a confidence interval for thevariance σ2 using the variance S2 of a random sample of size n arising fromN(µ, σ2), where the mean µ is unknown. In testing H0 : σ2 = σ20 againstH1 : σ2 > σ20, use the critical region defined by (n − 1)S2/σ20 ≥ c. That is,reject H0 and accept H1 if S2 ≥ cσ20/(n− 1). If n = 13 and the significancelevel α = 0.025, determine c.

Solution 4.6.9. Let the random variable W have a χ2-distribution withr = n− 1 = 12 degrees of freedom. By Student’s Theorem, 12S2/σ20 has thesame distribution, therefore c is determined by the property P (W ≥ c) =0.025, which yields c

.= 23.337.

Problem 4.6.10. In Exercise 4.2.27, in finding a confidence interval forthe ratio of the variances of two normal distributions, we used a statisticS21/S

22 , which has an F -distribution when those two variances are equal. If

we denote that statistic by F , we can test H0 : σ21 = σ22 against H1 : σ21 > σ22using the critical region F ≥ c. If n = 13, m = 11, and α = 0.05, find c.

7

Solution 4.6.10. By Student’s Theorem, the numerator degrees of freedomis r1 = n−1 = 12 and the denominator degrees of freedom is r2 = m−1 = 10.The value of c is determined by the property P (F ≥ c) = 0.05. Using thenotation of Table V on page 662 we find that c = F0.05(12, 10), and usingthe program R we find c

.= 2.913.

prediction interval Solution 4.2.11. - University of Hawaiigrw/Classes/2013-2014... · Math 472...

Documents

Transcript of prediction interval Solution 4.2.11. - University of Hawaiigrw/Classes/2013-2014... · Math 472...