prediction interval Solution 4.2.11. - University of Hawaiigrw/Classes/2013-2014... · Math 472...
Click here to load reader
Transcript of prediction interval Solution 4.2.11. - University of Hawaiigrw/Classes/2013-2014... · Math 472...
Math 472 Homework Assignment 4
Problem 4.2.11. Let X1, X2, . . . , Xn, Xn+1 be a random sample of sizen + 1, n > 1, from a distribution that is N(µ, σ2). Let X =
∑ni=1Xi/n
and S2 =∑n
i=1(Xi −X)2/(n− 1). Find the constant c so that the statistic
c(X −Xn+1)/S has a t-distribution. If n = 8, determine k such that
P (X − kS < X9 < X + kS) = 0.80.
The observed interval (x − ks, x + ks) is often called an 80% predictioninterval for X9.
Solution 4.2.11. The random variable X−Xn+1 has a normal distributionsince it is a linear combination of independent normally distributed randomvariables. Its expected value and variance are
E[X −Xn+1] = µ− µ = 0,
Var[X −Xn+1] =σ2
n+ σ2 = σ2
(n+ 1
n
).
Therefore,√n/(n+ 1)(X −Xn+1)/σ has a standard normal distribution.
By Student’s theorem, X and S2 are independent random variables, and(n− 1)S2/σ2 has a χ-square distribution with r = n− 1 degrees of freedom.Since X and S2 are functions of X1, . . . , Xn, we see that X, S2, and Xn+1 areindependent random variables, therefore X −Xn+1 and S2 are independentrandom variables. From these observations we conclude that√
n/(n+ 1)(X −Xn+1)/σ√S2/σ2
=
√n/(n+ 1)(X −Xn+1)
S
has a t-distribution with r = n− 1 degrees of freedom. It follows that
c =
√n
n+ 1.
Let n = 8. Then c =√
8/9 = 2√
2/3. Let T7 be a t-distributed randomvariable with 7 degrees of freedom and let t0.10,7 be the number definedby P (T7 > t0.10,7) = 0.10. Then P (−t0.10,7 < T7 < t0.10,7) = 0.80. This,combined with the observation that
−t0.10,7 <c(X −X9)
S< t0.10,7 ⇐⇒ X − t0.10,7
cS < X9 < X +
t0.10,7c
S,
shows that
k =t0.10,7c
=3t0.10,7
2√
2
.=
(3)(1.415)
2.828
.= 1.501.
1
2
Problem 4.2.18. Let X1, X2, . . . , Xn be a random sample from N(µ, σ2),where both parameters µ and σ2 are unknown. A confidence interval for σ2
can be found as follows. We know that (n − 1)S2/σ2 is a random variablewith a χ2(n − 1) distribution. Thus we can find constants a and b so thatP ((n− 1)S2/σ2 < b) = 0.975 and P (a < (n− 1)S2/σ2 < b) = 0.95.
(a) Show that this second probability statement can be written as
P
((n− 1)S2
b< σ2 <
(n− 1)S2
a
)= 0.95.
(b) If n = 9 and s2 = 7.93, find a 95% confidence interval for σ2.(c) If µ is known, how would you modify the preceding procedure for finding
a confidence interval for σ2?
Solution 4.2.18.(a) With the numbers a and b chosen as above, the result follows imme-
diately from the observation that
a <(n− 1)S2
σ2< b ⇐⇒ (n− 1)S2
b< σ2 <
(n− 1)S2
a.
(b) Let W be a χ2-distributed random variable with 8 degrees of free-dom. Then P (W < 17.5345)
.= 0.975 and P (2.1797 < W < 17.5345)
.=
0.95. With n = 9 we calculate the lower confidence limit to be (n −1)s2/b
.= (8)(7.93)/(17.5345)
.= 3.618 and the upper confidence limit to be
(n − 1)s2/a.= (8)(7.93)/(2.1797)
.= 29.104. Therefore the 95% confidence
interval for σ2 is
(3.618, 29.104).
(c) If µ is known then we can replace (n−1)S2/σ2 with∑n
i=1(Xi−µ)2/σ2,which has a χ-square distribution with n degrees of freedom, as a pivotalrandom variable.
Problem 4.2.25. To illustrate Exercise 4.2.24, letX1, . . . , X9 and Y1, . . . , Y12represent two independent random samples from the respective normal dis-tributions N(µ1, σ
21) and N(µ2, σ
22). It is given that σ21 = 3σ22, but σ22 is
unknown. Define a random variable that has a t-distribution that can beused to find a 95% confidence interval for µ1 − µ2.
Solution 4.2.25. Let X =∑9
i=1Xi/9 and Y =∑12
i=1 Yi/12. Since X − Yis a linear combination of independent normal random variables, it has a
3
normal distribution with
E[X − Y ] = µ1 − µ2
Var[X − Y ] =σ219
+σ2212
=σ219
+3σ2112
= σ21
(1
9+
1
4
).
Thus the random variable
(X − Y )− (µ1 − µ2)σ1√
1/9 + 1/4(1)
has a standard normal distribution.Since 8S2
1/σ21 and 11S2
2/σ22 = 11S2
2/(3σ21) are independent random vari-
ables that have χ2-distributions with respective degrees of freedom 8 and11, the random variable
19S2p/σ
21 = 8S2
1/σ21 + 11S2
2/(3σ21) = (8S2
1 + 11S22/3)/σ21
has a χ2-distribution with 19 degrees of freedom. Therefore Sp/σ1 is thesquare root of a χ2-distributed random variable divided by its degrees offreedom. Since Student’s Theorem implies that X, Y , S2
1 , and S22 are inde-
pendent random variables, we see that the random variable in equation (1)and S2
p are independent. Therefore
(X−Y )−(µ1−µ2)σ1√
1/9+1/4
Sp/σ1=
(X − Y )− (µ1 − µ2)Sp√
1/9 + 1/4
has a t-distribution with 19 degrees of freedom that can be used for a 95%confidence interval for µ1 − µ2, where S2
p = (8S21 + 11S2
2/3)/19.
Problem 4.2.27. Let X1, X2, . . . , Xn and Y1, Y2, . . . , Ym be two indepen-dent random samples from the respective normal distributions N(µ1, σ
21)
and N(µ2, σ22), where the four parameters are unknown. To construct a
confidence interval for the ratio, σ21/σ22, of the variances, form the quotient
of the two independent χ2 variables, each divided by its degrees of freedom,namely,
(2) F =
(m−1)S22
σ22
/(m− 1)
(n−1)S22
σ21
/(n− 1)=S22/σ
22
S21/σ
21
,
where S21 and S2
2 are the respective sample variances.
(a) What kind of distribution does F have?(b) From the appropriate table, a and b can be found so that P (F < b) =
0.975 and P (a < F < b) = 0.95.
4
(c) Rewrite the second probability statement as
(3) P
[aS21
S22
<σ21σ22
< bS21
S22
]= 0.95.
The observed values, s21 and s22, can be inserted in these inequalities toprovide a 95% confidence interval for σ21/σ
22.
Solution 4.2.27.(a) The random variable F in equation (2) has an F -distribution with
r1 = m numerator degrees of freedom and r2 = n denominator degrees offreedom.
(b) Using the notation of Table V on page 661, b = F0.025(m,n) anda = F0.975(m,n).
(c) A direct calculation shows that
a <S22/σ
22
S21/σ
21
< b ⇐⇒ aS21
S22
<σ21σ22
< bS21
S22
.
Combining this calculation with P (a < F < b) = 0.95 leads immediatelyto equation (3). We therefore find a 95% confidence interval for the ratioσ21/σ
22 is given by (
F0.975(m,n)S21
S22
, F0.0275(m,n)S21
S22
).
Problem 4.6.5. Assume that the weight of cereal in a “10-ounce box” isN(µ, σ). To test H0 : µ = 10.1 against H1 : µ > 10.1, we take a randomsample of size n = 16 and observe that x = 10.4 and s = 0.4.
(a) Do we accept or reject H0 at the 5% significance level?(b) What is the approximate p-value of this test?
Solution 4.6.5.(a) The test statistic,
√16(X − 10.1)/S has a Student’s t-distribution
with r = 15 degrees of freedom. The realization of the test statistic ist = (4)(10.4 − 10.1)/(0.4) = 3, and the critical value for the right-tailedalternative H1 is t0.05,15
.= 1.753. Since t > t0.05,15, we reject H0 at the 5%
significance level.(b) The approximate p-value of this test is p = P (T15 > 3)
.= 0.0045.
Problem 4.6.6. Each of 51 golfers hit three golf balls of brand X andthree golf balls of brand Y in a random order. Let Xi and Yi equal theaverages of the distances traveled by the brand X and brand Y golf ballshit by the ith golfer, i = 1, 2, . . . , 51. Let Wi = Xi − Yi, i = 1, 2, . . . , 51.Test H0 : µW = 0 against H1 : µW > 0, where µW is the mean of thedifferences. If w = 2.07 and s2W = 84.63, would H0 be accepted or rejectedat an α = 0.05 significance level? What is the p-value of this test?
5
Solution 4.6.6. We may assume that the distribution of the test statistic(W − µW )/(SW /
√51) is approximately the standard normal distribution.
Under the null hypothesis the realization of the test statistic is
z.=
w − 0√s2w/n
.=
2.07√84.63/51
.= 1.607.
The critical value for the right-tailed alternative hypothesis is z0.05.= 1.645,
and since z < z0.05 we accept H0 at the α = 0.05 significance level. Thep-value of this test is p
.= P (Z > 1.607)
.= 0.054.
Problem 4.6.7. Among the data collected for the World Health Organi-zation air quality monitoring project is a measure of suspended particlesin µg/m3. Let X and Y equal the concentration of suspended particles inµg/m3 in the city center (commercial district) for Melbourne and Houston,respectively. Using n = 13 observations of X and m = 16 observations ofY , we test H0 : µX = µY against H1 : µX < µY .
(a) Define the test statistic and critical region, assuming that the unknownvariances are equal. Let α = 0.05.
(b) If x = 72.9, sx = 25.6, y = 81.7, and sy = 28.3, calculate the value ofthe test statistic and state your conclusion.
Solution 4.6.7.(a) We are testing a difference in means and the sample sizes are small.
This suggests using the random variable
T27 =(X − Y )− (µX − µY )
Sp√
1/13 + 1/16,
where S2p =
12S2X + 15S2
Y
27,
as our test statistic. The random variable T27 has an approximate Student’st-distribution with 27 degrees of freedom. We are testing the left-tailedalternative µX − µY < 0, which leads to the critical region t < −t0.05,27,where t0.05,27
.= 1.703.
(b) From the given data, the realization of Sp is
sp =
√12s2x + 15s2y
27
.=
√(12)(25.6)2 + (15)(28.3)2
27
.= 27.133
and, assuming H0 is true, the realization of T27 is
t =(x− y)− 0
sp√
1/13 + 1/16
.=
72.9− 81.7
(27.133)(0.3734)
.= −0.8686.
Since t > −1.703, t lies outside the critical region. Therefore these dataprovide insufficient evidence to reject H0 at the significance level α = 0.05.The p-value of this test is p
.= P (T27 < −1.703)
.= 0.196.
6
Problem 4.6.8. Let p equal the proportion of drivers who use a seat beltin a country that does not have a mandatory seat belt law. It was claimedthat p = 0.14. An advertising campaign was conducted to increase thisproportion. Two months after the campaign, y = 104 out of a randomsample of n = 590 drivers were wearing their seat belts. Was the campaignsuccessful?
Solution 4.6.8. We will perform a large sample hypothesis test to testwhether the value of p increased after the advertising campaign. We testwhether the data support rejection of the null hypothesis H0 : p = 0.14 infavor of the right-tailed alternative hypothesis Ha : p > 0.14. We will usethe random variable
Z =p− p√
p(1− p)/n,
whose distribution is approximately the standard normal distribution, asthe test statistic. We will test at the significance level of α = 0.05. Theapproximate rejection region is z > z0.05
.= 1.645.
From the given data, the realized value of p is p = 104/590.= 0.176 and,
assume the null hypothesis is true, the realized value of Z is
z.=
0.176− 0.14√(0.14)(1− 0.14)/590
.= 2.539.
and the approximate p-value of this test is p.= P (Z > 2.539)
.= 0.0056.
Since z > 1.645 these data support rejection of the null hypothesis at thesignificance level of α = 0.05. In fact, the p-value shows that these datawould support rejection of H0 in favor of Ha even at the significance levelof α = 0.01. These data certainly support the claim that the proportion ofseat belt use increased following the advertising campaign.
Problem 4.6.9. In Exercise 4.2.18 we found a confidence interval for thevariance σ2 using the variance S2 of a random sample of size n arising fromN(µ, σ2), where the mean µ is unknown. In testing H0 : σ2 = σ20 againstH1 : σ2 > σ20, use the critical region defined by (n − 1)S2/σ20 ≥ c. That is,reject H0 and accept H1 if S2 ≥ cσ20/(n− 1). If n = 13 and the significancelevel α = 0.025, determine c.
Solution 4.6.9. Let the random variable W have a χ2-distribution withr = n− 1 = 12 degrees of freedom. By Student’s Theorem, 12S2/σ20 has thesame distribution, therefore c is determined by the property P (W ≥ c) =0.025, which yields c
.= 23.337.
Problem 4.6.10. In Exercise 4.2.27, in finding a confidence interval forthe ratio of the variances of two normal distributions, we used a statisticS21/S
22 , which has an F -distribution when those two variances are equal. If
we denote that statistic by F , we can test H0 : σ21 = σ22 against H1 : σ21 > σ22using the critical region F ≥ c. If n = 13, m = 11, and α = 0.05, find c.
7
Solution 4.6.10. By Student’s Theorem, the numerator degrees of freedomis r1 = n−1 = 12 and the denominator degrees of freedom is r2 = m−1 = 10.The value of c is determined by the property P (F ≥ c) = 0.05. Using thenotation of Table V on page 662 we find that c = F0.05(12, 10), and usingthe program R we find c
.= 2.913.