Econ 246 Topic 2 page 1
Topic 2: Estimation Introduction
Objective: Using sample information to learn about p________n characteristics.
(i) Estimation: Want to measure the _____ of a population parameter. For example, one could use the sample statistic \(\bar{X}\) to measure μ.
How "good" is the e________? What are the methods for estimating population parameters?
(ii) Hypothesis Testing: Want to test the v_______ of some statement about the population characteristics.
(iii) Prediction: F_______ a population parameter in the future.
For now, we will:
1) Present criteria for judging how well a s_____ statistic e________ the population parameter.
2) Analyse several popular methods for estimating these parameters.
We will now apply our knowledge of the probability of sampling statistics, and use sample measures, to learn about population measures.
Developing methods for accurately estimating the value of population parameters is an extremely important part of statistical analysis.
Estimation: Estimators and Estimates
Definition:
ESTIMATOR: Random variables used to e_______ population parameters are called estimators.
ESTIMATES: Specific ______ of these estimators of population parameters are called estimates.
Example: \(\bar{X}\) is an estimator of μ:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
This formula is an estimator.
"\(\bar{X} = 10\)": a specific value of the estimator is an estimate of μ.
Remarks: It is important to note the distinction between the two:
Estimator: a formula or r___ used to measure a population parameter on the basis of s_____ information. E.g. a formula defines a statistic.
Estimate: the value c_______ for a given sample when using an estimator.
E.g. Want to determine the mean of a population. Must use \(\bar{X} = \sum X_i / n\) as the estimator. The numerical value of \(\bar{X}\) is the estimate. If the value of \(\bar{X}\) is 10, then 10 is the point estimate of μ.
E.g. We could also use the sample m_____ or m___ as estimators of the population mean μ. What makes \(\bar{X}\) a better _________ than the median or the mode?
It is not necessary for an estimate of a population parameter to be one single value.
The estimate could be a _____ of values.
Two Types of Estimation:
1) Point Estimation: estimates that specify a single value of the population parameter are point e_______s.
Example: The estimated value of new Japanese cars is $45,000. I.e. \(\bar{X}\) = $45,000.
2) Interval Estimation: Estimates that specify a r____ of values are called interval estimates.
Example: The mean price of new Japanese cars will fall somewhere between $25,000 and $45,000. This specifies a range of values indicating that we believe the mean price for the population lies w_____ this interval.
Remarks: Point estimation can be m________g because we must use estimation to determine the estimate. Hence, there is an element of uncertainty introduced into the estimation process.
Must report some measure of the r________ of the estimation process. (How accurate is our estimation process?)
Reliability of Estimation Depends On:
  E_________ method
  S_______ method
  Size of the sample
  Characteristics (parameters) of the parent population.
Question: "Which estimator is best?"
The choice of optimal estimator needs to be made with any estimation problem. Generally, the choice of estimator in a given situation depends on how well the estimator satisfies certain criteria.
\(\bar{X}\) and \(s^2\) are functions of the random sample values of the \(X_i\)'s. These statistics have a sampling d___________. The information in the sampling distribution of an estimator provides one basis for determining how effective or ____ an estimator is.
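Ex: We use \(\bar{X}\) to estimate μ. We know \(E(\bar{X}) = \mu\) and \(V(\bar{X}) = \sigma^2/n\). So, on a_____, using \(\bar{X}\) to generate an estimate of μ gives the right answer, but not necessarily for any particular sample:
Suppose we know that μ = 10. We draw three samples: \(\bar{X}_1 = 12\), \(\bar{X}_2 = 8\), \(\bar{X}_3 = 10\), and so on for k samples:
\[ \sum_{i=1}^{k} \bar{X}_i / k = 10 = \mu \]
The average of the sample means over repeated samples converges to the expected value of the sampling distribution of \(\bar{X}\), which is the population mean.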
Also, as the sample size gets bigger, the variance of X , V( X ), ________. So, as an estimator of μ, X is more reliable if we have large samples of data.
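The two facts used above, \(E(\bar{X}) = \mu\) and \(V(\bar{X}) = \sigma^2/n\), can be checked with a small simulation. The sketch below is only an illustration: the population values (μ = 10, σ = 2), the sample sizes, the number of repetitions and the function name `simulate_sample_means` are all assumptions chosen for the example, not part of the notes.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from N(mu, sigma^2)
    and return the list of their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]

# Hypothetical population: mu = 10, sigma = 2 (so sigma^2 = 4).
means_small = simulate_sample_means(mu=10.0, sigma=2.0, n=5, reps=5000)
means_large = simulate_sample_means(mu=10.0, sigma=2.0, n=50, reps=5000)

avg_small = statistics.fmean(means_small)      # should be near mu = 10
var_small = statistics.pvariance(means_small)  # should be near 4/5 = 0.8
var_large = statistics.pvariance(means_large)  # should be near 4/50 = 0.08

print("average of sample means (n=5):", avg_small)
print("variance of sample means (n=5):", var_small)
print("variance of sample means (n=50):", var_large)
```

Individual sample means still miss μ, but their average is close to μ and their spread shrinks as n grows.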
Topic 2 page 13
Sampling Properties of Good Estimators: (4 Primary categories)
(I) Un______ness:
On a______, the value of the estimate should equal the population parameter being estimated. If the average value of the estimator does not equal the actual parameter value, the estimator is a biased estimator.
Ideally, an estimator has a bias of ____, in which case it is said to be unbiased:
"An estimator is said to be unbiased if the expected value of the estimator is equal to the ____ value of the parameter being estimated."
Generally, if θ is a population parameter to be estimated and \(\hat{\theta}\) is an estimator, where \(\hat{\theta} = \hat{\theta}(X_1, X_2, \ldots, X_n)\), then \(\hat{\theta}\) is said to be an unbiased estimator of θ if:
\[ E(\hat{\theta}) = \theta. \]
Example 1) \(E(\bar{X}) = \mu\) under simple random sampling (Topic 1); so \(\bar{X}\) is an unbiased estimator of μ.
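Example 2) If \(X_i \sim N(\mu, \sigma^2)\), then it can be shown that:
\[ \sum_{i=1}^{n} \left( \frac{X_i - \bar{X}}{\sigma} \right)^2 \ \text{is} \ \chi^2_{\nu} \quad \text{(Chi-square)}, \qquad \nu = n - 1. \]
Taking the expected value of the \(\chi^2\) we get:
\[ E\left[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = n - 1. \]
Since the sample variance is
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2, \]
we rearrange the expectation equation above:
\[ E\left[ \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \sigma^2, \quad \text{i.e.} \quad E(s^2) = \sigma^2. \]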
The sample ________ is an unbiased estimator of the population variance.
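The result \(E(s^2) = \sigma^2\) can be illustrated by simulation. The sketch below compares the average of the divisor-(n − 1) estimator with the average of the divisor-n estimator over many samples. The population (normal with σ = 3), the sample size n = 5 and the function name are assumptions made for this illustration only.

```python
import random
import statistics

def average_variance_estimates(sigma, n, reps, seed=1):
    """Average two variance estimators over `reps` samples of size n
    drawn from N(0, sigma^2): divisor n-1 and divisor n."""
    rng = random.Random(seed)
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total_n_minus_1 += statistics.variance(sample)   # divisor n - 1
        total_n += statistics.pvariance(sample)          # divisor n
    return total_n_minus_1 / reps, total_n / reps

# Hypothetical setting: sigma^2 = 9, n = 5.
avg_s2, avg_divisor_n = average_variance_estimates(sigma=3.0, n=5, reps=20000)
print("average of s^2 (divisor n-1):", avg_s2)           # near 9
print("average with divisor n:", avg_divisor_n)          # near 9 * 4/5 = 7.2
```

With divisor n the estimator is centred on \(\sigma^2 (n-1)/n\) rather than \(\sigma^2\), which is why the sample variance uses n − 1.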
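[Figure: sampling densities \(f(\hat{\theta})\) and \(f(\tilde{\theta})\) of two estimators. \(\hat{\theta}\) is unbiased, \(E(\hat{\theta}) = \theta\); \(\tilde{\theta}\) is biased, \(E(\tilde{\theta}) \neq \theta\), with Bias\((\tilde{\theta}) = E(\tilde{\theta}) - \theta\).]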
Unbiasedness criterion requires only that the _______ value of the estimator equal the population parameter.
It says nothing about the potential dispersion of the estimator values around the population parameter. It only seems sensible that most of the values of a good estimator should be reasonably close to the population parameter. For this reason, the next property of a good estimator is important.
(II) Efficiency: The most efficient estimator among a group of unbiased estimators is the one with the s______ variance (or dispersion of values).
If \(\hat{\theta}\) and \(\tilde{\theta}\) are both unbiased estimators of θ, and the variance of \(\hat{\theta}\) is less than or equal to the variance of \(\tilde{\theta}\), \(V(\hat{\theta}) \le V(\tilde{\theta})\), then \(\hat{\theta}\) is an efficient estimator of θ, relative to \(\tilde{\theta}\). The most efficient estimator is called the best unbiased estimator, where '____' implies minimum v_______.
Relative Efficiency: is defined as the ratio of the variances of the two estimators:
\[ \text{R.E.} = \frac{\text{Variance of the first estimator}}{\text{Variance of the second estimator}} = \frac{V(\hat{\theta})}{V(\tilde{\theta})} \]
R.E. < 1 if \(\hat{\theta}\) is efficient relative to \(\tilde{\theta}\).
R.E. > 1 if \(\tilde{\theta}\) is efficient relative to \(\hat{\theta}\).
R.E. = 1: equally efficient.
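As a rough illustration of relative efficiency, the sketch below estimates the variance of the sample mean and of the sample median by simulation and forms their ratio. It assumes a standard normal population, samples of size 25 and 5000 repetitions; these settings and the function name are choices made for the example, not taken from the notes.

```python
import random
import statistics

def relative_efficiency_mean_vs_median(n, reps, seed=2):
    """Estimate R.E. = V(sample mean) / V(sample median)
    for samples of size n from a standard normal population."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means) / statistics.pvariance(medians)

rel_eff = relative_efficiency_mean_vs_median(n=25, reps=5000)
print("R.E. of mean relative to median:", rel_eff)
```

Under these assumptions the ratio comes out below 1, so the sample mean is efficient relative to the median for a normal population.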
What if one or both of the estimators are biased? We then calculate the MSE of the estimators to determine which estimator is best:
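Definition: "Mean Squared Error" (MSE)
The mean squared error (MSE) of \(\hat{\theta}\) is \(E(\hat{\theta} - \theta)^2\):
\[
\begin{aligned}
MSE(\hat{\theta}) &= E\big[(\hat{\theta} - \theta)^2\big] && \text{(add and subtract } E(\hat{\theta})) \\
&= E\big[(\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2\big] && \text{(group in two)} \\
&= E\big[\{(\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta)\}^2\big] && \text{(expand)} \\
&= E\big[(\hat{\theta} - E(\hat{\theta}))^2\big] + E\big[(E(\hat{\theta}) - \theta)^2\big] + 2E\big[(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta)\big] \\
&= \text{Variance} + \text{Bias}^2 + 0 \\
&= V(\hat{\theta}) + [Bias(\hat{\theta})]^2
\end{aligned}
\]
The cross term is zero because \(E(\hat{\theta}) - \theta\) is a constant and \(E[\hat{\theta} - E(\hat{\theta})] = 0\).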
The MSE enables us to compare biased estimators.
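The decomposition MSE = Variance + Bias² can be checked numerically. In the sketch below, expectations are replaced by averages over a short list of estimates, so the identity holds exactly for the empirical quantities. The list of estimates and the value θ = 10 are hypothetical numbers used only for illustration.

```python
import statistics

def mse_decomposition(estimates, theta):
    """Return (MSE, variance, bias) of a list of estimates of theta,
    using averages over the list in place of expectations."""
    mse = statistics.fmean([(e - theta) ** 2 for e in estimates])
    variance = statistics.pvariance(estimates)   # average squared deviation from the mean
    bias = statistics.fmean(estimates) - theta
    return mse, variance, bias

# Hypothetical estimates of a parameter whose true value is 10.
estimates = [12.0, 8.0, 10.0, 13.0]
mse, variance, bias = mse_decomposition(estimates, theta=10.0)
print("MSE:", mse)
print("Variance + Bias^2:", variance + bias ** 2)
```

For these four values the average estimate is 10.75, so the bias is 0.75, and the MSE of 4.25 splits into a variance of 3.6875 plus a squared bias of 0.5625.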
Note: \(MSE(\hat{\theta}) = V(\hat{\theta})\) if \(Bias(\hat{\theta}) = 0\).
Definition: Let \(\hat{\theta}\) and \(\tilde{\theta}\) be two estimators of θ. Then \(\hat{\theta}\) is efficient compared to \(\tilde{\theta}\) if \(MSE(\hat{\theta}) \le MSE(\tilde{\theta})\).
Graphical illustration:
(I) [Figure: sampling densities \(f(\hat{\theta})\) (unbiased, \(E(\hat{\theta}) = \theta\)) and \(f(\tilde{\theta})\) (biased, \(E(\tilde{\theta}) \neq \theta\)), with Bias\((\tilde{\theta}) = E(\tilde{\theta}) - \theta\).]
In this example, \(V(\hat{\theta}) < V(\tilde{\theta})\), and \(\hat{\theta}\) is unbiased while \(\tilde{\theta}\) is biased. Hence, \(MSE(\hat{\theta}) < MSE(\tilde{\theta})\). A smaller-variance, unbiased estimator is better than a larger-variance, biased estimator.
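(II) Biased and Efficient Versus Unbiased and Inefficient
[Figure: sampling densities \(f(\hat{\theta})\) (unbiased, \(E(\hat{\theta}) = \theta\)) and \(f(\tilde{\theta})\) (biased, \(E(\tilde{\theta}) \neq \theta\)), with Bias\((\tilde{\theta}) = E(\tilde{\theta}) - \theta\).]
Here \(V(\hat{\theta}) > V(\tilde{\theta})\), but \(\hat{\theta}\) is unbiased while \(\tilde{\theta}\) is biased. MSE trade-off!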
Must compare the MSE numbers:
\[ MSE(\hat{\theta}) = V(\hat{\theta}), \qquad MSE(\tilde{\theta}) = V(\tilde{\theta}) + [Bias(\tilde{\theta})]^2 \]
\(\tilde{\theta}\) may be a better estimator of θ if \(MSE(\tilde{\theta})\) is less than \(MSE(\hat{\theta})\).
(III) Sufficiency: "An estimator is said to be sufficient if it uses all the ___________ about the population parameter that the sample can provide."
The estimator incorporates all of the information available from the ______. Sufficient estimators take into account each sample observation and any information that is generated by these observations.
Example: 'Not sufficient': the median is ___ a sufficient estimator because it only ranks the observations to obtain the middle value.
Median: \(X_i = (1, 3, 5)\); median = 3.
Example: 'Not sufficient': the mode is not sufficient because it uses only the most frequently occurring observation as an estimator of the population mean.
Mode: \(X_i = (1, 1, 1, 3, 4, 4, 5)\); mode = 1.
Example: 'Sufficient': The sample ____ is a sufficient estimator because it uses the entire data set.
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{n} (X_1 + X_2 + \cdots + X_n) \]
Sufficiency is a necessary condition for efficiency.
(IV) Consistency: 'Large Sample Property'
Usually the ____________ of an estimator will change as the sample size changes. (The sample ____ changes the distribution.)
The properties of estimators for large sample sizes (as n → N or infinity) are important. (Bias and inefficiency of estimators may change as n approaches infinity.) Properties of estimators based on distributions approached as n becomes large are called asymptotic properties.
These properties may differ from the finite or small sample properties. Consistency is the most important asymptotic property:
A consistent estimator achieves c__________ in probability of the estimator to the population parameter as the sample size n increases. (Beyond this course.) What we will discuss is a 'stronger' notion of consistency:
Mean Square Consistency: Recall: MSE = _________ + _____². An estimator is mean square consistent if its MSE → 0 as the sample size, n, becomes large.
"An estimator, \(\hat{\theta}\), is mean square consistent if its MSE, \(E(\hat{\theta} - \theta)^2\), approaches zero as the sample size becomes large."
\[ E(\hat{\theta}) \to \theta \ \text{and} \ V(\hat{\theta}) \to 0 \ \text{as} \ n \to \infty. \]
Note: If an estimator is mean square consistent, then it will also be consistent in the convergence in probability sense. But an estimator may be consistent in the convergence in probability sense, yet not be mean square consistent.
Mean square consistency is not a 'necessary' condition for convergence in probability.
Consistency implies that the spread of the probability distribution of the estimator becomes smaller and smaller for large samples (i.e. the variance decreases as more information about the population is used in each sample). The distribution becomes more centred about the ___ value of the parameter (bias getting smaller). And in the limit as n → ∞, the probability distribution of the estimator degenerates into a single "spike" at the true value.
When n → ∞: bias² → 0 and var → 0.
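Example #1: Unbiased Estimator \(\hat{\theta}\): \(E(\hat{\theta}) = \theta\)
Note: Sampling distribution changes as n increases.
[Figure: sampling densities \(f(\hat{\theta})\) for n = n1, n2, n3 and n = ∞, with n1 < n2 < n3 < ∞, all centred at \(E(\hat{\theta}) = \theta\) and becoming more concentrated as n increases.]
\(V(\hat{\theta}) \to 0\) as \(n \to \infty\).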
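Example #2: Biased Estimator \(\hat{\theta}\): \(E(\hat{\theta}) \neq \theta\), but
(i) Bias\((\hat{\theta}) \to 0\) as \(n \to \infty\). I.e. \(E(\hat{\theta}) \to \theta\) as \(n \to \infty\). (Asymptotic unbiasedness.)
(ii) Var\((\hat{\theta}) \to 0\) as \(n \to \infty\). (Asymptotically efficient)
[Figure: sampling densities \(f(\hat{\theta})\) for n = n1, n2, n3 and n = ∞; the centres \(E(\hat{\theta} \mid n = n_1)\), \(E(\hat{\theta} \mid n = n_2)\), \(E(\hat{\theta} \mid n = n_3)\) move toward θ as n increases.]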
Example: Let θ be a random variable with a mean of μ and a variance of σ²: θ ~ (μ, σ²).
Let \(\hat{\theta}\) be an estimator of θ based on a random sample of size n such that:
\[ E(\hat{\theta}) = \theta + \frac{3}{n}, \qquad V(\hat{\theta}) = \frac{\sigma^2}{4n}. \]
A) Is \(\hat{\theta}\) an unbiased estimator of θ? Explain.
\(E(\hat{\theta}) = \theta\) if it is an unbiased estimator of θ. Here \(E(\hat{\theta}) = \theta + 3/n \neq \theta\). Not unbiased.
B) Is \(\hat{\theta}\) a consistent estimator of θ?
Consistency implies that the bias and the variance both approach 0 as n → ∞. So:
\[ \text{bias} = \frac{3}{n} \to 0 \ \text{as} \ n \to \infty, \qquad V(\hat{\theta}) = \frac{\sigma^2}{4n} \to 0 \ \text{as} \ n \to \infty. \]
Consistent estimator.
C) Calculate the \(MSE(\hat{\theta})\) and compare it to the \(MSE(\bar{X})\).
\[ MSE(\hat{\theta}) = V(\hat{\theta}) + (bias(\hat{\theta}))^2 = \frac{\sigma^2}{4n} + \left(\frac{3}{n}\right)^2 = \frac{1}{4n^2}\left(\sigma^2 n + 36\right) \]
\[ MSE(\bar{X}) = V(\bar{X}) = \frac{\sigma^2}{n} \]
The \(MSE(\bar{X})\) is ______ than the \(MSE(\hat{\theta})\).
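The comparison in part C can be tabulated for a few sample sizes. The sketch below codes the two MSE formulas from this example, with bias 3/n and variance σ²/(4n) for the estimator and σ²/n for the sample mean. The value σ² = 4 and the function names are assumptions for illustration only.

```python
def mse_theta_hat(sigma2, n):
    """MSE of the example estimator: variance sigma^2/(4n) plus squared bias (3/n)^2."""
    variance = sigma2 / (4 * n)
    bias = 3 / n
    return variance + bias ** 2

def mse_sample_mean(sigma2, n):
    """MSE of the sample mean: unbiased, so MSE equals the variance sigma^2/n."""
    return sigma2 / n

sigma2 = 4.0  # hypothetical population variance
for n in (10, 100, 1000):
    print(n, mse_theta_hat(sigma2, n), mse_sample_mean(sigma2, n))
```

Both MSE values shrink toward zero as n grows, which is the mean square consistency property. Which of the two is smaller at a given n depends on σ² and n under these formulas.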
Estimating Unknown Parameters
Interval Estimation
Recall, the major weakness of a point estimate is that it can be misleading. It does not permit the "expression" of any degree of uncertainty about the estimate.
We need to quantify this un_________.
The usual way to express this uncertainty (i.e. probability of error) about an estimate is to define an ________ of values in which the population parameter is likely to be.
This is known as interval estimation.
Confidence Intervals
In Topic 1, we determined the values of a and b such that \(P(a \le \bar{X} \le b)\) = some predetermined value. The interval (a, b) is called a probability interval for \(\bar{X}\).
Example: \(P(a \le \bar{X} \le b) = 0.90\) based on a random sample of size n, drawn from a population with a known mean μ.
[Figure: standard normal density for Z, centred at \(E(\bar{X}) = \mu\), with limits −1.645 and 1.645; central area = 90%, area = 5% in each tail.]
We know the random variable \(\bar{X}\) will fall in the probability interval (a, b) 90% of the time. Each ______ drawn generates a value for \(\bar{X}\). The interval states that the random variable \(\bar{X}\) will fall within this 90% interval 90% of the time, and fall outside the interval 10% of the time.
Question: "What if μ is unknown and we want to construct a confidence interval for μ based on \(\bar{X}\)?"
In a confidence interval, the exact sampling _________ of the estimator is applied to construct confidence limits that assume a preset probability of error for the interval estimate. The confidence interval defines an interval based on \(\bar{X}\) such that μ is likely to fall within such intervals 90% of the time the method is employed, creating a 90% confidence interval. Hence, on average, 90 out of 100 such intervals calculated from samples of size n will contain μ.
So, our objective is to preset a _____ of values which covers the true parameter with some fixed probability.
Note: This example does not state that there is a 90% probability that μ lies within the interval.
μ is not a random variable. It is a c_______: E(μ) = μ, with no distribution.
If μ lies within the interval, then the probability that μ is in the interval = 1.
If μ lies outside the interval, the probability is zero.
It is improper to make probability statements about the values of population parameters.
Those parameters are not random!
When constructing an interval estimate, the boundaries (endpoints) are based on ______ information. Each sample will generate different values. Hence, these intervals are r______ variables. So, we can make proper probability statements about the "proportion of intervals" that would include a specific parameter value.
Notation: Use the term confidence interval when specifying the upper and lower limit on the likely value of a parameter. A 90% confidence interval for a parameter declares: "The probability is 90% that the interval to be determined on the basis of the sample evidence would be one that includes the population parameter."
Example of 80% Confidence Intervals: P(interval includes μ) = 0.80
[Figure: five samples (1 to 5) with intervals centred at \(\bar{X}_1, \ldots, \bar{X}_5\) plotted against μ. Four out of five samples cover μ.]
Confidence Interval For μ (σ Known)
Illustration with reference to estimating the mean of a normal population.
Assume: \(X_i \sim N(\mu, \sigma^2)\), where i = 1, 2, 3, ...
Also assume σ² is known.
Use \(\bar{X}\) as a point estimator of μ, where \(\bar{X} \sim N(\mu, \sigma^2/n)\).
If we standardize:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1). \]
To have 95% of the area in the centre, we must have 2.5% of the area in each tail. Z = 1.96 has 97.5% of the area to its left.
So:
\[ P[-1.96 \le Z \le 1.96] = 0.95 \]
\[ P\left[ -1.96 \le \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \le 1.96 \right] = 0.95 \]
If we rearrange this:
\[ P\left[ \mu - 1.96 \frac{\sigma}{\sqrt{n}} \le \bar{X} \le \mu + 1.96 \frac{\sigma}{\sqrt{n}} \right] = 0.95 \]
This is the range of values over which we are 95% confident \(\bar{X}\) will lie.
Use this to determine the interval which has 95% probability of containing μ, i.e. the 95% confidence interval for μ.
i) Upper limit:
\[ \bar{X} \le \mu + 1.96 \frac{\sigma}{\sqrt{n}} \]
rearrange:
\[ \mu \ge \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \quad \text{(Lower Limit)} \]
ii) Lower limit:
\[ \mu - 1.96 \frac{\sigma}{\sqrt{n}} \le \bar{X} \]
rearrange:
\[ \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \quad \text{(Upper Limit)} \]
So the probability that the random interval
\[ \left[ \bar{X} - 1.96 \frac{\sigma}{\sqrt{n}}, \ \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}} \right] \]
covers μ is 0.95. "In 95% of samples, μ will lie within the interval calculated for each sample."
Remarks:
(i) The interval is random because it depends on the random variable \(\bar{X}\). μ is a fixed value. Hence, μ does not have a probability distribution.
(ii) The intervals are based on the sampling distribution of \(\bar{X}\).
Fix n (sample size) and repeatedly draw samples.
Compute \(\bar{X}\) and the interval \(\left[ \bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} \right]\) each time.
95% of such intervals will _______ μ.
But, there is no guarantee that on any particular draw, the interval will cover μ.
Either it does or does not.
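The repeated-sampling interpretation in remark (ii) can be sketched as a simulation: fix n, draw many samples, build the interval \(\bar{X} \pm 1.96\,\sigma/\sqrt{n}\) each time, and count how often it covers μ. The population values (μ = 23, σ = 8, n = 25), the number of repetitions and the function name are assumptions made for this illustration.

```python
import random
import statistics
from math import sqrt

def coverage_rate(mu, sigma, n, reps, z=1.96, seed=3):
    """Fraction of intervals xbar +/- z*sigma/sqrt(n) that cover mu,
    over `reps` samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    half_width = z * sigma / sqrt(n)   # sigma is treated as known
    covered = 0
    for _ in range(reps):
        xbar = statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width <= mu <= xbar + half_width:
            covered += 1
    return covered / reps

rate = coverage_rate(mu=23.0, sigma=8.0, n=25, reps=5000)
print("proportion of intervals covering mu:", rate)
```

The proportion should be close to 0.95, while any single interval either covers μ or does not.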
What Determines the Length of the Interval?
(I) n ⇒ the sample ____
(II) σ 2 ⇒ population ________ (big variation means big interval)
(III) The chosen probability ⇒ (confidence level)
(IV) Sampling method
The Probability of An Error
It is often more convenient to refer to the probability that a confidence interval will not include the parameter of interest.
Notation:
Let "α" = the probability that a confidence interval will __ include the parameter of interest. The value of α is known as the probability of making an _____.
α indicates the p_________ of times that one will be incorrect in assuming that the intervals would contain the population parameter.
Usually the confidence interval is referred to as being of size 100(1−α)%.
Example: (Continued from above) In 5% of the cases, on average, the intervals calculated as \(\left[ \bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}} \right]\) would not contain μ. So, if α = 0.05, then the associated interval is a 100(1−α)% confidence interval (i.e. a 95% confidence interval). If a 90% confidence interval is specified, then α = 1 − (90/100) = 0.10.
There is a 'trade-off' between the value of α and the size of the confidence interval.
The lower the value of α, the larger the interval must be. (Assuming all factors are held constant.)
To be very confident that the interval will cover the population parameter, a relatively _____ interval is necessary. If one need not be very confident that the population parameter will be within the interval, then a small interval is fine. The value of "α" is usually arbitrarily set at either 1%, 5%, or 10%, representing 99%, 95% and 90% confidence intervals, respectively.
Although this choice is standard, it may not necessarily lead to an optimal trade-off between the size of the confidence interval and the risk of making an error.
Note:
(i) Confidence intervals are constructed on the basis of sample information. So, changes in 'α' and 'n' affect the ____ of the interval. The larger the sample size, the more confident one can be about the estimate of the population parameter. Hence, the smaller the interval needs to be to guarantee a given level of confidence.
(ii) It is usually desirable to have as _____ a confidence interval as possible. But the optimal interval size must be decided by considering the "trade-off" between the cost of sampling and the amount of risk of making an error that one wants to assume.
*The trade-off between the risk of making an error and the sample size is an important issue. More on this later.
Determining A Confidence Interval
Both n and α are usually set in advance. But we can solve for an unknown 'α' or n under certain conditions.
In addition to specifying 'α' before we construct a confidence interval, we must also specify how much of the total error, 'α', is attributed to the possibility that the _____ parameter might be larger than the upper limit of the confidence interval. The remainder of the error is attributed to the possibility that the true parameter might be _______ than the lower bound of the interval.
Usually this split of 'α' is obtained by dividing 'α' equally between the upper and lower tails of the distribution:
"The common procedure for determining confidence intervals is to exclude half of 'α' (i.e. α/2) on the high side and half of 'α' on the low side."
In applying this procedure, we are assuming that the ______y of under- and over-estimating the true parameter is the same.
This may not be true!
The decision on how to divide 'α' should depend on how serious or c______ it is to make errors on the high side relative to errors on the low side.
Problem: It can be very difficult to determine the cost of making an error.
Next: The other factor that influences the boundaries of a particular confidence interval is the sampling distribution of the statistic used to estimate the population parameter. Since different sampling distributions are used for estimating different population parameters, such as the mean or the variance, we will next examine the process of constructing intervals for each of these cases:
Confidence Interval for μ With σ Known
We will examine in this section a confidence interval for the population parameter μ based on a random sample drawn from a normal parent population with a known population standard deviation.
We use X as the sample statistic for estimating μ (see previous section for details) because:
E( X ) = μ (unbiased estimator)
\(\sigma_{\bar{X}} = \sigma / \sqrt{n}\)
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
has a standard normal distribution.
Notation: Let \(Z_\alpha\) represent the point for which the probability of observing values of Z greater than \(Z_\alpha\) is α: \(P(Z > Z_\alpha) = \alpha\), and the cumulative probability at this point is \(F(Z_\alpha) = 1 - \alpha\).
Example: \(F(Z_{0.25}) = 0.75\):
[Figure: standard normal density for Z; the point \(Z_{0.25} = 0.675\) has area (1 − α) = 0.75 to its left and area α = 0.25 to its right.]
This notation allows us to represent the proportion of the total area under a normal probability density function in two ways:
F(Z): area to the left of a point Z.
\(Z_\alpha\): a point Z having an area of 'α' to its right.
Since the normal distribution is _________, the negative values (\(-Z_\alpha\)) can denote points in the lower left tail of the normal distribution below which a proportion 'α' of the area is excluded.
Example: 90% Confidence Interval
[Figure: standard normal density with limits \(-Z_{0.05} = -1.645\) and \(Z_{0.05} = 1.645\); central area (1 − α) = 0.90, area α/2 = 0.05 in each tail, cumulative area 0.95 to the left of 1.645.]
For finding the upper and lower confidence interval boundaries, one only has to find the points excluding (α/2) proportion of the area in each tail of the normal distribution, such that the total area ________ equals α and the area included in the interval between the upper and lower limits is (1-α).
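Method:
Let \(\alpha/2 = P(Z \ge Z_{\alpha/2})\) and \(\alpha/2 = P(Z \le -Z_{\alpha/2})\).
[Figure: standard normal density with limits \(-Z_{\alpha/2}\) and \(Z_{\alpha/2}\); central area (1 − α), area α/2 in each tail.]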
Example: If α = 0.10 (90% confidence interval):
\[ Z_{\alpha/2} = Z_{0.10/2} = Z_{0.05} \]
The value that satisfies this is the point Z at which F(Z) = 0.95, i.e. Z = 1.645.
\[ P(-1.645 \le Z \le 1.645) = 1 - 0.05 - 0.05 = 0.90 \]
[Figure: standard normal density with limits −1.645 and 1.645; central area (1 − α) = 0.90, area α/2 = 0.05 in each tail.]
Thus, the probability that "Z" falls between the two limits \(-Z_{\alpha/2}\) and \(Z_{\alpha/2}\) is represented by:
\[ P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1 - \alpha \]
which is a 100(1−α)% probability interval for Z, any standardized normal variable:
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}. \]
To derive a confidence interval for μ, we simply rearrange the expression \(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}\):
\[ \bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
The equation represents the 100(1−α)% confidence interval for μ, when σ is _____ and the parent population is _____l.
Example: If α = 0.025, the value of \(Z_{\alpha/2}\) that satisfies
\[ P(Z \ge Z_{\alpha/2}) = P(Z \ge Z_{0.0125}) = 0.0125 \]
is the same point as F(Z) = 0.____. From the Z table, the value of \(Z_{\alpha/2} = Z_{0.0125}\) = ____.
[Figure: standard normal density for Z with limits −2.24 and 2.24; central area (1 − α) = 0.975 (97.5% C.I.), area α/2 = 0.0125 in each tail.]
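The table look-ups for \(Z_\alpha\) and \(Z_{\alpha/2}\) can be reproduced with the inverse normal CDF from Python's standard library. The sketch below is one way to do it; the helper names `z_upper` and `z_critical` are hypothetical and not part of the notes.

```python
from statistics import NormalDist

def z_upper(tail_area):
    """Point Z_alpha with area `tail_area` to its right: F(Z_alpha) = 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - tail_area)

def z_critical(alpha):
    """Two-sided critical value Z_{alpha/2} for a 100(1 - alpha)% interval."""
    return z_upper(alpha / 2.0)

for alpha in (0.10, 0.05, 0.025, 0.01):
    print(alpha, round(z_critical(alpha), 3))
```

Rounded as in a Z table, these give 1.645 for α = 0.10, 1.96 for α = 0.05, 2.24 for α = 0.025 and 2.576 for α = 0.01.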
Example 1: A marketing firm is interested in determining the average percentage of university students who pay off their credit cards each month. A random sample of 25 students out of a population of 2300 is taken to determine the average percent. The sample average is 23(%).
(i) Assuming a normal population with a standard deviation of 8(%), find a 95% confidence interval for the population mean.
α = 0.05. α/2 = 0.05/2 = 0.025. Z(0.975) = 1.96.
\[ \bar{X} - Z_{0.025} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.025} \frac{\sigma}{\sqrt{n}} \]
\[ = 23 \pm 1.96 \left( \frac{8}{5} \right) = 23 \pm 3.136 \]
95% Confidence Interval: 19.864 to 26.136; Width = 6.272.
Hence, if many repeated samples were taken, 95% of the intervals constructed as
\[ \bar{X} \pm 1.96 \left( \frac{8}{5} \right) \]
for a sample size of 25 will include the true value of the unknown average value. Whether this particular interval does or does not contain μ is unknown. However, we now have a measurement of the uncertainty associated with our statement about the actual value of μ.
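The calculation in part (i) can be written as a small function. This is a sketch under the same assumptions as the example (normal population, σ known); the function name `z_confidence_interval` is hypothetical, and the critical value comes from the standard library's inverse normal CDF rather than a printed table, so the endpoints agree with the hand calculation only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known:
    xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Credit card example: xbar = 23, sigma = 8, n = 25.
lower, upper = z_confidence_interval(23.0, 8.0, 25, confidence=0.95)
print("95% CI:", round(lower, 3), "to", round(upper, 3))
```

Changing `confidence` or `n` in the call shows directly how the interval widens with higher confidence and narrows with a larger sample.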
(ii) Give a 99% confidence interval for the average % of students who pay off their credit cards each month. Compare it to (i).
99% Confidence interval: α = 0.01. α/2 = 0.01/2 = 0.005. Z(0.995) = 2.576.
\[ \bar{X} - Z_{0.005} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{0.005} \frac{\sigma}{\sqrt{n}} \]
\[ = 23 \pm 2.576 \left( \frac{8}{5} \right) = 23 \pm 4.1216 \]
99% Confidence Interval: 18.8784 to 27.1216; Width = 8.2432.
Compared to part (i), the 99% confidence interval is wider.
(iii) Suppose a 95% confidence interval is required but you want the interval to be narrower than part (i). What do you do?
Change the sample ____. Let n = 64. A 95% confidence interval has α = 0.05.
\[ \bar{X} \pm Z_{0.025} \frac{\sigma}{\sqrt{n}} \]
implies the width of the CI equals
\[ 2 \times Z_{0.025} \frac{\sigma}{\sqrt{n}}. \]
The confidence interval:
\[ = 23 \pm 1.96 \left( \frac{8}{8} \right) = 23 \pm 1.96 \]
95% Confidence Interval: 21.04 to 24.96; Width = 3.92.
In general, the width of the confidence interval around \(\bar{X}\) is
\[ 2 \times Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}. \]
More on this later.
If we wanted to dictate the width of the confidence interval, we could solve for n. Let the width of the confidence interval be 2.5:
\[ 2 Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) = 2.5 \]
\[ 2 (1.96) \left( \frac{8}{\sqrt{n}} \right) = 2.5 \]
\[ \frac{2 (15.68)}{\sqrt{n}} = 2.5 \quad \Rightarrow \quad \frac{31.36}{\sqrt{n}} = 2.5 \]
rearranging:
\[ \sqrt{n} = \frac{31.36}{2.5} = 12.544 \quad \Leftrightarrow \quad n = (12.544)^2 = 157.352 \]
We need a sample size of 158 to achieve a 95% confidence interval of width 2.5. Given some 'α', a narrower confidence interval can be derived with a larger sample size.
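Solving the width equation for n and rounding up can be sketched as follows. The function name `sample_size_for_width` is hypothetical. It uses the library value of \(Z_{\alpha/2}\), so the unrounded n differs slightly from the hand calculation with 1.96, but the rounded-up answer is the same.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_width(width, sigma, confidence=0.95):
    """Smallest n such that the interval xbar +/- Z_{alpha/2}*sigma/sqrt(n)
    has total width no larger than `width` (sigma known)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    # width = 2 * z * sigma / sqrt(n)  =>  n = (2 * z * sigma / width)^2
    return ceil((2.0 * z * sigma / width) ** 2)

n_needed = sample_size_for_width(width=2.5, sigma=8.0, confidence=0.95)
print("required sample size:", n_needed)
```

Rounding up rather than to the nearest integer keeps the achieved width at or below the target.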
"What if the assumption of normality for the population is relaxed?"
If the sample size, n, is small, then the distribution of
\[ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
is not ______ and there is no easy way to determine a confidence interval. But, according to the ___, when n is large (n > 30),
\[ \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
is approximately normally distributed and we can apply the formula.
Applied Questions:
9.12: When properly adjusted, the period of intense heat applied in a fabricating process is normally distributed with a variance of σ² = 0.50 (seconds)². Four random cycles of the process are monitored and the results are heat periods of 51, 52, 50, and 51 seconds.
A) Find an unbiased estimate of μ.
\[ \bar{X} = \frac{51 + 52 + 50 + 51}{4} = 51 \]
Since \(E(\bar{X}) = \mu\), \(\bar{X}\) is an unbiased estimator.
B) Construct a 95% confidence interval for μ.
\[ \bar{X} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) = 51 \pm 1.96 (0.3535) = 51 \pm 0.6929 \]
⇒ confidence interval = 50.3 to 51.7
Confidence Intervals for μ, With σ Unknown
The assumption that the population standard deviation σ is known may not be applicable to many practical problems. I.e. if the population mean is unknown, it is highly unlikely that the population standard deviation will be known.
The population σ must be estimated from sample information using 's' (the sample std. deviation).
Hence, usually the t-distributed random variable is employed to solve these problems. (Assuming the parent population is ______.)
\[ t = \frac{\bar{X} - \mu}{s / \sqrt{n}} \quad \text{with } (n - 1) = \nu \text{ degrees of freedom} \]
The methodology to determine a 100(1−α)% confidence interval is the same as the 'σ known' case, except the "limits" of the confidence interval are found by using t-values instead of Z-values.
Notation:
The t-value is labelled with two subscripts: \(t_{\alpha/2, \nu}\). The α/2 is the probability in the upper tail; ν represents the degrees of freedom.
Recall, 't' represents a family of distributions whose shape depends on the sample size.
As n increases, the t-distribution approaches the normal distribution.
Example: n = 20, d.f. = 19, 95% confidence interval:
\[ P(t_{19} \ge 2.093) = 0.025 \]
[Figure: density of \(t_{19}\) with limits −2.093 and 2.093; central area = 1 − α = 95%, area α/2 = 2.5% in each tail.]
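So,
\[ P\left[ -t_{\alpha/2, \nu} \le \frac{\bar{X} - \mu}{s / \sqrt{n}} \le t_{\alpha/2, \nu} \right] = 1 - \alpha \]
Rearranging and solving for μ gives the 100(1−α)% confidence interval for μ, population normal, σ unknown:
\[ \bar{X} - t_{\alpha/2, \nu} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2, \nu} \frac{s}{\sqrt{n}} \]
For the example above (n = 20):
\[ \bar{X} \pm t_{\alpha/2, \nu} \left( \frac{s}{\sqrt{n}} \right) = \bar{X} \pm (2.093) \left( \frac{s}{\sqrt{20}} \right) \]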
Example: Air B.C. flies daily to Prince George to deliver mail. The trip from Vancouver follows an identical route. Owing to variations in weather patterns and landing clearance, the actual flight time varies. In a sample of 5 trips, the following flight times are recorded: (55, 57, 65, 47, 76) minutes. Determine a 90% confidence interval for μ.
Determine an unbiased estimate of μ:
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i = \frac{1}{5} [55 + 57 + 65 + 47 + 76] = 300 / 5 = 60 \text{ minutes} \]
The sample variance:
\[ s^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2 \]
\(X_i\)      \((X_i - \bar{X})\)      \((X_i - \bar{X})^2\)
55       55 − 60 = −5       25
57       57 − 60 = −3       9
65       65 − 60 = 5        25
47       47 − 60 = −13      169
76       76 − 60 = 16       256
Σ = 300  Σ = 0              Σ = 484
\[ s^2 = \frac{1}{n - 1} \sum (X_i - \bar{X})^2 = \frac{1}{4} (484) = 121 \]
\[ s = 11 \text{ minutes} \]
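ν = n − 1 = 4 degrees of freedom.
\(F(t_4) = 1 - \alpha/2 = 1 - 0.05 = 0.95\), so \(t_{0.05, 4} = 2.132\).
Substituting values into the formula:
\[ \bar{X} - t_{\alpha/2, \nu} \frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2, \nu} \frac{s}{\sqrt{n}} \]
\[ 60 - 2.132 \left( \frac{11}{\sqrt{5}} \right) \le \mu \le 60 + 2.132 \left( \frac{11}{\sqrt{5}} \right) \]
\[ 60 - 10.489 \le \mu \le 60 + 10.489 \]
\[ 49.51 \le \mu \le 70.49 \]
90% confidence interval: 49.51 to 70.49; Width = 20.98.
The interval (49.51 to 70.49) represents a 90% confidence interval for the mean flight time per trip to Prince George, B.C.
[Figure: t density with 4 degrees of freedom, limits −2.132 and 2.132, central area 1 − α.]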
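The flight-time calculation can be sketched in code. Python's standard library has no t-distribution quantile function, so this sketch takes the critical value 2.132 from the t table, as in the example above, and passes it in as an argument. The function name `t_confidence_interval` is hypothetical.

```python
from math import sqrt
import statistics

def t_confidence_interval(sample, t_value):
    """Confidence interval for mu when sigma is unknown:
    xbar +/- t * s / sqrt(n), with t supplied from a t table."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)          # sample std. deviation, divisor n - 1
    half_width = t_value * s / sqrt(n)
    return xbar - half_width, xbar + half_width

flight_times = [55, 57, 65, 47, 76]       # minutes, from the example
lower, upper = t_confidence_interval(flight_times, t_value=2.132)  # t_{0.05,4}
print("90% CI:", round(lower, 2), "to", round(upper, 2))
```

The function computes \(\bar{X} = 60\) and s = 11 from the data, so only the table value needs to be supplied.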
Remarks: This confidence interval depicts how often the ____ mean is likely not to lie in the intervals calculated, α = 0.10, if this procedure is repeated over and over.
We can decrease the size of the interval by either increasing the chance of _____, α, or by increasing the sample size n.
So, allowing a greater risk of error, or incurring a greater sampling cost to increase n, results in a smaller confidence interval.
Larger α ⇒ \(t_{\alpha/2}\) ↓
Larger n ⇒ \(s / \sqrt{n}\) ↓
Determining the Size of the Sample (n)
Many times researchers do not know what is the optimal sample size for determining good estimates from our estimators of population parameters. In this situation, researchers can determine the optimal sample size if they: 1) Know what level of confidence is desired (100(1-α)).
2) Know what is the maximum difference (D) allowed between the point estimate of the population parameter and the true value of the population parameter. Let ‘D’ = largest allowable sampling error.
Topic 2 page 56
3 Cases:
(I) Population Normal and σ Known
We want to determine "n" for a 100(1-α)% confidence interval for μ. Since
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \text{ is } N(0,1),$$
if we require a level of confidence = (1-α), then the Z variable equation results in a 100(1-α)% interval for $(\bar{X} - \mu)$:
$$-Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le (\bar{X} - \mu) \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Econ 246 Topic 2 page 57
Concentrating on the right (upper) tail:
$$(\bar{X} - \mu) \le Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
means the maximum value that $(\bar{X} - \mu)$ can assume is $Z_{\alpha/2}\,\sigma/\sqrt{n}$.
Recall, the researcher dictates the largest sampling error allowed, $|\bar{X} - \mu| = D$. Since this error is on either side of the true mean, we can set
$$D = |\bar{X} - \mu| = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}.$$
Topic 2 page 58
Solving for n, we can determine the value of "n" that guarantees with 100(1-α)% confidence that $|\bar{X} - \mu|$ will be no larger than D.
To Determine the Minimum Sample Size in Estimating the Mean:
$$n = \frac{(Z_{\alpha/2})^2\,\sigma^2}{D^2}$$
Econ 246 Topic 2 page 59
Example: Credit Card Debt
We want a 99% confidence interval for the mean percentage of students who pay off their credit cards, such that the sample mean X̄ and μ differ by no more than 1.5%.
$$|\bar{X} - \mu| \le D = 1.5\%$$
If: D = 1.5, σ = 8, Zα/2 = 2.576 when α/2 = 0.005.
Substitute into the equation:
Topic 2 page 60
$$n = \frac{(Z_{\alpha/2})^2\,\sigma^2}{D^2}$$
$$n = \left[\frac{(2.576)(8)}{1.5}\right]^2 = \left[\frac{20.608}{1.5}\right]^2$$
$$n = (13.738)^2 = 188.75$$
Rounding up, we need a random sample of 189 students to ensure that 99% of the time the value of X̄ will be within 1.5% of the true population mean μ.
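The same calculation can be sketched in Python. The function name `min_sample_size` and its parameters are illustrative, not from the notes; the z value is computed with the standard library's `statistics.NormalDist` instead of being read from a table, so it differs slightly from 2.576 but leads to the same rounded-up answer.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, max_error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= max_error (sigma known)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 2.576 for 99% confidence
    n_exact = (z * sigma / max_error) ** 2
    return math.ceil(n_exact)                     # always round up to a whole unit


n_required = min_sample_size(sigma=8.0, max_error=1.5, confidence=0.99)
print(n_required)
```

Rounding up rather than to the nearest integer is what guarantees the error bound D is not exceeded.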
Econ 246 Topic 2 page 45
(II) Population Not Normal and σ Known
By the ___, the sampling distribution of X̄ approaches the normal distribution as "n" increases.
Hence, once n is determined using the same method as above, check to see if "n" is greater than 30.
If it is, our method of solution is appropriate.
Topic 2 page 46
(III) Population Normal and σ Unknown
If the population is normal, but the standard deviation is unknown, use the t-variable:
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
instead of Z. So, first specify the maximum sampling error $D = |\bar{X} - \mu|$.
Topic 2 page 47
PROBLEMS:
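s is calculated from the sample, but we have not determined the optimal ______ size.
The t-value tα/2,ν depends on the sample size, which is unknown. Solve using iteration: start with a trial n, look up t for n-1 degrees of freedom, recompute n, and repeat until n stops changing.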
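One way the iteration might look is sketched below. This is an assumption about the procedure, not the notes' own algorithm: it starts from the Z-based sample size, then repeatedly replaces Z with the table value of t for n - 1 degrees of freedom. The inputs s = 8 (from a hypothetical pilot sample) and D = 4 are invented for illustration, and the small table of two-sided 95% t values is copied from a standard t table.

```python
import math

# Two-sided 95% critical values t_{0.025, df}, copied from a standard t table.
T_CRIT_95 = {
    10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160, 14: 2.145, 15: 2.131,
    16: 2.120, 17: 2.110, 18: 2.101, 19: 2.093, 20: 2.086, 21: 2.080,
    22: 2.074, 23: 2.069, 24: 2.064, 25: 2.060, 26: 2.056, 27: 2.052,
    28: 2.048, 29: 2.045, 30: 2.042,
}


def iterate_sample_size(s, max_error, z_start=1.96, max_rounds=20):
    """Iterate n = (t * s / D)^2, updating t for n - 1 degrees of freedom."""
    n = math.ceil((z_start * s / max_error) ** 2)   # first guess uses Z
    for _ in range(max_rounds):
        t = T_CRIT_95[n - 1]                        # KeyError if df is outside the table
        n_next = math.ceil((t * s / max_error) ** 2)
        if n_next == n:                             # n stopped changing
            return n
        n = n_next
    return n  # not converged within max_rounds; the caller should check


# Hypothetical inputs: pilot-sample s = 8 and allowable error D = 4.
n_required = iterate_sample_size(s=8.0, max_error=4.0)
print(n_required)
```

With these inputs the sequence of trial sizes is 16, 19, 18, 18, so the iteration settles at n = 18.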
Topic 2 page 48
Confidence Interval for σ²
If a researcher is interested in determining the "variability" or "volatility" of a population characteristic, it is often desirable to construct a confidence interval for an unknown population variance σ².
Example: May be interested in not just the average time but the variability of time people spend on the Internet.
Example: Variation in temperature in some tropical location.
Topic 2 page 49
Construction of a 100(1-α)% Confidence Interval for σ²
Recall, when sampling from a normal population, that
(i) the variable $\left[\frac{(n-1)s^2}{\sigma^2}\right]$ has a $\chi^2_\nu$ distribution, with ν = n-1 degrees of freedom,
(ii) $\chi^2_\nu$ values are always ________, and
(iii) the $\chi^2_\nu$ distribution is not symmetrical. It is asymmetric.
Hence, when we determine a confidence interval for a random variable that follows a $\chi^2_\nu$ distribution,
(1) we cannot use values such as $\pm\chi^2_{\alpha/2,\nu}$ that were used for the Z and t-distributions.
Topic 2 page 50
(2) We must look up two separate values, one for each tail, of the cumulative chi-square distribution $F(\chi^2_\nu)$.
Econ 246 Topic 2 page 51
Example: ν = (n-1) = 15, α = 10%, α/2 = 0.05
$\chi^2_{upper} = 25$ for α/2 = 0.05 in the upper tail.
$\chi^2_{lower} = 7.26$ for α/2 = 0.05 in the lower tail.
The area between these two limits of the chi-square distribution must contain (0.95 - 0.05) = 0.90 of the total area. These two limits define the 90% probability interval for a chi-square distributed random variable.
Topic 2 page 52
[Figure: χ² distribution with 15 degrees of freedom (mean = 15). Lower limit 7.26, F(7.26) = 0.05; upper limit 25, F(25) = 0.95; area α/2 = 0.05 in each tail.]
Topic 2 page 53
Using these $\chi^2_\nu$ limit values and substituting in the $\chi^2_\nu$ variable $\chi^2_\nu = \left[\frac{(n-1)s^2}{\sigma^2}\right]$, we can derive the following probability interval:
$$P\left[\chi^2_{lower} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{upper}\right] = 1 - \alpha.$$
Solving these inequalities for the unknown parameter σ², we derive the 100(1-α)% confidence interval for σ² (parent population normal):
$$\frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}}$$
Topic 2 page 54
Notice the upper value of $\chi^2_\nu$ lands in the denominator of the lower endpoint for σ² and the lower value of $\chi^2_\nu$ is located in the denominator of the upper endpoint for σ².
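The swap of the limits comes from taking reciprocals. As a sketch of the intermediate steps (every term is positive, so inverting each term reverses the direction of the inequalities, and multiplying through by $(n-1)s^2$ then isolates σ²):

$$\chi^2_{lower} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{upper}
\;\Longrightarrow\;
\frac{1}{\chi^2_{upper}} \le \frac{\sigma^2}{(n-1)s^2} \le \frac{1}{\chi^2_{lower}}
\;\Longrightarrow\;
\frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}}$$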
Econ 246 Topic 2 page 58
Example: A consumer research group is interested in the variation of minutes of operation of several brands of batteries. We label each type of battery "A", "B" and "C". Company A claims its mean battery lifetime is 750 minutes. Company B claims an average of 670 minutes and Company C claims its average life is 721 minutes. The consumer group is interested in estimating the variance of the population of batteries from each company, using a sample of size 5. The best point estimate of σ² is s², which will be determined from the samples. To create an interval estimate for σ² with a confidence level of 95% (and hence a risk of error = 0.05), compute the confidence interval for each company. Assume the parent population is normal.
Topic 2 page 59
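| Company A: Xi | Company A: (Xi − X̄)² | Company B: Xi | Company B: (Xi − X̄)² | Company C: Xi | Company C: (Xi − X̄)² |
|---|---|---|---|---|---|
| | 0.64 | | 0.04 | 735 | 184.96 |
| | 84.64 | | 7191.04 | 740 | 345.96 |
| | 27.04 | | 635.04 | 760 | 1489.96 |
| | 1.44 | | 8873.64 | 690 | 985.96 |
| | 219.04 | | 1211.04 | 682 | 1552.36 |
| ΣXi = 3729 | Σ = 332.8 | ΣXi = 3424 | Σ = 17,910.8 | ΣXi = 3607 | Σ = 4,557.28 |
| X̄A = 745.8 | S²A = 83.2 | X̄B = 684.8 | S²B = 4477.7 | X̄C = 721.4 | S²C = 1139.32 |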
Recall $s^2 = \frac{1}{n-1}\sum (X_i - \bar{X})^2$.
Econ 246 Topic 2 page 60
To compute the confidence interval for each, we need the $\chi^2_\nu$ limit values for ν = 4, α = 0.05, α/2 = 0.025. Looking at the chi-square distribution table:
$$\chi^2_{lower} = 0.484$$
$$\chi^2_{upper} = 11.1$$
$$\frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}} \quad \Leftarrow \text{Equation 9.8}$$
Company A:
$$\left[\frac{(4)(83.2)}{11.1} \le \sigma^2 \le \frac{(4)(83.2)}{0.484}\right] = 29.98 \le \sigma^2 \le 687.60$$
Width of the interval for battery A = 657.62 minutes².
Topic 2 page 61
Company B:
$$\left[\frac{(4)(4477.7)}{11.1} \le \sigma^2 \le \frac{(4)(4477.7)}{0.484}\right] = 1613.58 \le \sigma^2 \le 37{,}005.78$$
Width of the interval for battery B = 35,392.20 minutes².
Company C:
$$\left[\frac{(4)(1139.32)}{11.1} \le \sigma^2 \le \frac{(4)(1139.32)}{0.484}\right] = 410.57 \le \sigma^2 \le 9415.87$$
Width of the interval for battery C = 9005.3 minutes².
Topic 2 page 62
Taking square roots, we convert back to the original units:
| Company | Lower Boundary | Upper Boundary | Width (minutes) |
|---|---|---|---|
| A | 5.48 | 26.22 | 20.74 |
| B | 40.17 | 192.37 | 152.20 |
| C | 20.26 | 97.04 | 76.78 |
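The three battery intervals follow one pattern, so they can be produced in a loop. The sketch below is illustrative only: `variance_interval` is a made-up helper name, the sample variances and the chi-square limits 0.484 and 11.1 are the values quoted in this example, and nothing is looked up or computed from a distribution.

```python
import math


def variance_interval(s2, n, chi2_lower, chi2_upper):
    """CI for sigma^2: (n-1)s^2/chi2_upper <= sigma^2 <= (n-1)s^2/chi2_lower."""
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower


# Sample variances from the battery example (n = 5 for each company).
sample_variances = {"A": 83.2, "B": 4477.7, "C": 1139.32}
# Chi-square table values for 4 degrees of freedom, alpha/2 = 0.025.
CHI2_LOWER, CHI2_UPPER = 0.484, 11.1

sd_intervals = {}
for company, s2 in sample_variances.items():
    var_lo, var_hi = variance_interval(s2, 5, CHI2_LOWER, CHI2_UPPER)
    # Square roots convert the limits from minutes^2 back to minutes.
    sd_intervals[company] = (math.sqrt(var_lo), math.sqrt(var_hi))
    sd_lo, sd_hi = sd_intervals[company]
    print(f"{company}: {var_lo:.2f} <= var <= {var_hi:.2f}; "
          f"{sd_lo:.2f} <= sd <= {sd_hi:.2f}")
```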
Topic 2 page 63
Example: (9.43) A sample of size 15 has a standard deviation of 3. The population measure is the distance (kilometres) between home and the work location of commuters. Find a 90% confidence interval for the true population variance. Assume that the population is normally distributed.
n = 15, n-1 = 14, s = 3, α = 0.10, α/2 = 0.05
Topic 2 page 64
A confidence interval for the population variance:
Looking at the chi-square table:
$\chi^2_{upper} = 23.7$ for α/2 = 0.05 in the upper tail.
$\chi^2_{lower} = 6.57$ for α/2 = 0.05 in the lower tail.
$$\frac{(n-1)s^2}{\chi^2_{upper}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{lower}}$$
$$= \frac{(14)(9)}{23.7} \le \sigma^2 \le \frac{(14)(9)}{6.57}$$
$$= (5.32 \le \sigma^2 \le 19.18)$$
Topic 2 page 65
[Figure: χ² distribution with 14 degrees of freedom (mean = 14). Lower limit 6.57, F(6.57) = 0.05; upper limit 23.7, F(23.7) = 0.95; area α/2 = 0.05 in each tail.]
Topic 2 page 66
Question: A random sample of 16 observations from an approximately normally distributed population is as follows: [215, 328, 433, 124, 214, 320, 436, 342, 219, 121, 138, 244, 319, 425, 334, 247].
A) Construct a 90% confidence interval for the population variance.
B) Construct a 90% confidence interval for the population standard deviation.
Topic 2 page 67
Sample variance: s² = 11026.36504
90% confidence interval:
Topic 2 page 68
$$s^2 = 11026.36504, \quad n = 16, \quad \alpha = 0.10$$
$$\chi^2_{upper,15} = 24.9958$$
$$\chi^2_{lower,15} = 7.260944$$
Topic 2 page 69
A)
$$\left[\frac{(n-1)s^2}{\chi^2_{upper,\nu}}\right] \le \sigma^2 \le \left[\frac{(n-1)s^2}{\chi^2_{lower,\nu}}\right]$$
$$\left[\frac{(15)(11026.36504)}{24.9958}\right] \le \sigma^2 \le \left[\frac{(15)(11026.36504)}{7.260944}\right]$$
$$\left[\frac{165395.4756}{24.9958}\right] \le \sigma^2 \le \left[\frac{165395.4756}{7.260944}\right]$$
$$(6616.9307 \le \sigma^2 \le 22778.7841)$$
B) Taking square roots: 81.3445 ≤ σ ≤ 150.9264
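Parts A and B can be reproduced directly from the 16 observations. The sketch below is a hedged check rather than the official solution: it computes s² with the standard library and uses the two chi-square limits quoted in the solution (24.9958 and 7.260944) as given inputs. Because it works from the raw data, its results may differ from the printed figures in the later decimal places.

```python
import math
import statistics

observations = [215, 328, 433, 124, 214, 320, 436, 342,
                219, 121, 138, 244, 319, 425, 334, 247]

n = len(observations)
s2 = statistics.variance(observations)   # sample variance, divides by n - 1

# Chi-square limits for 15 degrees of freedom, alpha/2 = 0.05 (values from the solution).
chi2_upper, chi2_lower = 24.9958, 7.260944

var_low = (n - 1) * s2 / chi2_upper      # upper chi-square limit gives the lower endpoint
var_high = (n - 1) * s2 / chi2_lower     # lower chi-square limit gives the upper endpoint
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"A) {var_low:.2f} <= variance <= {var_high:.2f}")
print(f"B) {sd_low:.2f} <= std dev <= {sd_high:.2f}")
```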
Topic 2 page 70
Question: For a random variable $\chi^2_\nu$, which has a chi-square distribution with 8 degrees of freedom, determine the values of a and b such that $P(a \le \chi^2 \le b) = 0.75$ and such that the areas in each tail are equal. (3 marks)
Topic 2 page 71
$$\chi^2_{0.125,\,8} = 3.7965 = a$$
$$\chi^2_{0.875,\,8} = 12.6361 = b$$
$$P(3.7965 \le \chi^2 \le 12.6361) = 0.75$$
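Tail probabilities of 0.125 are not in most printed tables, so a and b can be checked numerically. The sketch below relies on a closed form of the chi-square CDF that holds only for an even number of degrees of freedom (8 here) and inverts it by bisection. The function names and the search range [0, 200] are assumptions made for this illustration, not part of the notes.

```python
import math


def chi2_cdf_even_df(x, df):
    """Chi-square CDF for EVEN df: 1 - exp(-x/2) * sum_{k < df/2} (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires an even df")
    half = x / 2.0
    partial_sum = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return 1.0 - math.exp(-half) * partial_sum


def chi2_quantile_even_df(p, df, hi=200.0, tol=1e-8):
    """Invert the CDF by bisection on [0, hi] (hi is an assumed upper bound)."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


tail = (1.0 - 0.75) / 2.0                  # equal tails: 0.125 on each side
a = chi2_quantile_even_df(tail, 8)
b = chi2_quantile_even_df(1.0 - tail, 8)
print(f"a = {a:.4f}, b = {b:.4f}")
```

For odd degrees of freedom this closed form does not apply, and a table or a statistics library would be needed instead.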