Statistical Intervals for a Single Sample
From only one sample,
An interval has been found.
Because the sample was ample,
The results were quite profound!
- author unknown circa 2007
Chapter 8A
A Diversion – the sampling distributions or distributions arising from the normal
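If $Z_1, Z_2, \ldots, Z_n$ are independent standard normal random variables, then

$$Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi^2(n)$$
chi-square distribution with n degrees of freedom

$$T = \frac{Z}{\sqrt{\chi^2(n)/n}} \sim t(n)$$
t-distribution with n degrees of freedom (Z standard normal and independent of the chi-square variable)

$$F = \frac{\chi^2(n)/n}{\chi^2(m)/m} \sim F(n, m)$$
F-distribution with n degrees of freedom in the numerator and m degrees of freedom in the denominator (the two chi-square variables independent)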
What will be Chi-square?
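Let $X_i$ be the ith sample value from a normal population:

$$X_i \sim N(\mu, \sigma^2), \qquad Z_i = \frac{X_i - \mu}{\sigma} \sim N(0, 1)$$

Therefore

$$\sum_{i=1}^{n} \left( \frac{X_i - \mu}{\sigma} \right)^2 \sim \chi^2(n)$$

However

$$\sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)$$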
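A quick simulation can make the n versus n - 1 degrees of freedom visible. This is only a sketch: the values n = 5, mu = 10, sigma = 2, the number of repetitions, and the function name are all illustrative assumptions, not from the slides. It relies on the fact that a chi-square variable has mean equal to its degrees of freedom.

```python
import random
import statistics


def simulate_chi_square_means(n=5, mu=10.0, sigma=2.0, reps=20000, seed=1):
    """Average the two sums of squares over many simulated normal samples.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    total_known_mean = 0.0   # sum of ((X_i - mu) / sigma)^2
    total_sample_mean = 0.0  # (n - 1) S^2 / sigma^2
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        total_known_mean += sum(((x - mu) / sigma) ** 2 for x in xs)
        total_sample_mean += (n - 1) * statistics.variance(xs) / sigma ** 2
    return total_known_mean / reps, total_sample_mean / reps


avg_with_mu, avg_with_xbar = simulate_chi_square_means()
# The first average should sit near n = 5, the second near n - 1 = 4.
print(avg_with_mu, avg_with_xbar)
```

Replacing mu by the sample mean costs one degree of freedom, which is why the second average drops by about one.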
More about Student
The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality of beer brews. "Student" was his pen name.
Gosset was a statistician for the Guinness brewery in Dublin, Ireland, and was hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes.
Gosset published the t test in Biometrika in 1908, but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret.
F-Distribution named after R.A. Fisher
Born: 17 February 1890, East Finchley, London, England
Died: 29 July 1962 (aged 72), Adelaide, Australia
Residence: England, Australia
Nationality: British
Field: Statistics, Genetics, Natural selection
Institutions: Rothamsted Experimental Station, University College London, Cambridge University
Alma mater: Cambridge University
Academic advisors: Sir James Jeans, F.J.M. Stratton
Notable students: C.R. Rao
Known for: Maximum likelihood, Fisher information, Analysis of variance
Notable prizes: Royal Medal (1938), Copley Medal (1955)
Three types of Intervals
- Confidence Interval – bounds a population parameter or distribution parameter.
- Tolerance Interval – bounds a stated proportion of the distribution at a certain confidence level.
- Prediction Interval – bounds a single future observation; assumptions on the population distribution are critical here.
Overheard at a rest stop
"I know that my average driving time on this daily route has been 2.3 hours over the last 7 days. However, that is based on a sample and therefore is unlikely to equal my population mean. What I really need is some way to measure how precise this estimate is."
- A typical prob-stat graduate
Confidence Interval - A statement consisting of two values between which the population parameter is estimated to lie.
Reliability – degree of confidence – the probability with which the population parameter will be "captured" by the two values.
Precision – the length of the confidence interval (a measure of the error in estimating the parameter).
"I am 95% confident that my mean driving time to work is between 37.4 minutes and 41.2 minutes."
Length = 41.2 – 37.4 = 3.8 minutes
Best estimate is midpoint = 39.3 minutes
Error = 1.9 minutes
The Bigger Picture of a Confidence Interval
General Approach:
Estimate ± Reliability Factor × Standard Error
- Estimate: our point estimate
- Reliability Factor: our confidence
- Reliability Factor × Standard Error: our precision
The Biggest Picture of a Confidence Interval
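$$\Pr\{\, l(x_1, \ldots, x_n) \le \theta \le u(x_1, \ldots, x_n) \,\} = 1 - \alpha$$
- $\alpha$: Measure of Risk
- $l$ and $u$: Measure of Uncertainty (precision); these are random variables, i.e. statistics
- $\theta$: population parameter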
The length of a confidence interval is a measure of the precision of estimation.
100(1 – α)% of the C.I.s constructed this way contain the mean. Watch the interpretation of this concept.
Confidence Interval on Mean of Normal Distribution Variance Known
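$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
has a standard normal distribution

$$P\left( -z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2} \right) = 1 - \alpha$$
with a little algebra,
$$P\left( \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n} \right) = 1 - \alpha$$
This is our 100(1 – α)% Confidence Interval on μ.
$z_{\alpha/2}$ is the upper α/2 percentage point from standard normal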
Our Very First Real Confidence Interval
A sample of 100 batteries is tested for operating life. They averaged (mean) 10 hours before failing. The manufacturer has assured us that the population variance is 16 hours². Find a 95 percent confidence interval for the mean life of this particular type of battery.

$$n = 100, \quad \bar{x} = 10 \text{ hr}, \quad \sigma = 4 \text{ hr}, \quad z_{\alpha/2} = z_{.025} = 1.96$$

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 10 \pm 1.96\,\frac{4}{\sqrt{100}} = (9.216,\ 10.784)$$
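The battery interval can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `z_confidence_interval` is a hypothetical helper, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided interval on the mean when sigma is known (hypothetical helper)."""
    alpha = 1.0 - confidence
    # Upper alpha/2 percentage point of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Battery example: n = 100, sample mean 10 hr, sigma = sqrt(16) = 4 hr.
lower, upper = z_confidence_interval(xbar=10.0, sigma=4.0, n=100)
print(round(lower, 3), round(upper, 3))  # 9.216 10.784
```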
Sample Size and Precision
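$$P\left( \bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n} \right) = 1 - \alpha$$
Equivalently:
$$|\bar{x} - \mu| \le z_{\alpha/2}\,\sigma/\sqrt{n}$$
Define the error by
$$E = |\bar{x} - \mu|$$
Choose n as below and we can be 100(1 – α)% confident that the error will not exceed E.
$$n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2$$
Think of E as a measure of practical significance.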
Our Very First Real Confidence Interval Revisited
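For our battery problem, what sample size is required to reduce the error to 0.5 hr with 99% confidence?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(2.58)^2 (16)}{(0.5)^2} = 426.0096 \approx 426$$

Strictly, n must be at least 426.0096, so rounding up to n = 427 is what guarantees the error bound.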
Problem 8-12
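Life of a 75-watt bulb is normally distributed with std dev = 25 hrs. Suppose we want to be 95% confident that the error in estimating mean life is less than 5 hours. Find a sample size.

$$n = \left( \frac{z_{\alpha/2}\,\sigma}{E} \right)^2 = \left( \frac{1.96(25)}{5} \right)^2 = 96.04 \approx 96$$

Rounding up to n = 97 keeps the error strictly within 5 hours.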
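Both sample-size calculations follow the same formula, so a small helper covers them. This is a sketch only: `required_sample_size` is a hypothetical name, and it rounds up to the next whole observation, which is the conservative choice when the raw value is not an integer.

```python
from math import ceil


def required_sample_size(z, sigma, error):
    """Smallest whole n with z * sigma / sqrt(n) <= error (hypothetical helper)."""
    raw = (z * sigma / error) ** 2
    return ceil(raw)


# Battery problem: z = 2.58, sigma = 4 hr, E = 0.5 hr (raw value 426.0096).
battery_n = required_sample_size(2.58, 4.0, 0.5)
# Problem 8-12: z = 1.96, sigma = 25 hr, E = 5 hr (raw value 96.04).
bulb_n = required_sample_size(1.96, 25.0, 5.0)
print(battery_n, bulb_n)  # 427 97
```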
Problem 8-10
Diameter of holes for a cable harness is normally distributed with a standard deviation of .01 in. A random sample of 10 yields average diameter of 1.5045 in. Find a 99% two-sided confidence interval.
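$$\bar{x} - z_{0.005}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{0.005}\,\sigma/\sqrt{n}$$
$$1.5045 - 2.58(0.01)/\sqrt{10} \le \mu \le 1.5045 + 2.58(0.01)/\sqrt{10}$$
$$1.4963 \le \mu \le 1.5127$$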
Interpreting a Confidence Interval
The confidence interval is a random interval.
The appropriate interpretation of a confidence interval (for example on μ) is: the observed interval [l, u] brackets the true value of μ, with confidence 100(1 – α)%.
Examine Figure 8-1 on the next slide.
Repeated Confidence Intervals, gen. samples
Sample
  1    1.886   1.014  -1.534   0.192   0.801  -0.429  -0.579   0.647   0.149   1.015
  2    1.040  -1.008  -0.225   0.374   0.168   0.607  -1.439  -1.070   1.355   0.994
  3    1.091  -0.447   1.393   1.105  -0.012  -1.986  -1.518   0.749   1.244  -1.123
  4   -1.618   0.874   0.484  -1.761  -0.653  -0.432   1.695   0.487  -1.589  -0.908
  5   -1.246  -0.386   0.222  -0.326   0.969   0.225   0.824  -1.450   0.399   0.566
  6    0.301  -1.002   1.791  -0.212   1.403   0.669  -0.071  -0.306   1.576  -0.171
  (samples 7 to 95 not shown)
 96   -0.778  -0.977   0.361  -1.247  -0.045  -0.213  -1.772  -0.052   0.666   1.273
 97    0.277   0.646  -0.693  -1.306  -1.311  -0.489   0.743  -0.313  -0.219  -1.278
 98    0.725   2.182  -0.855  -0.831   0.359   0.295  -1.639   1.165  -1.099   0.090
 99   -0.227  -1.575   1.890   0.497   0.211   0.408  -0.542   1.423   1.832  -0.073
100   -0.425  -0.328   0.475   1.241   0.210   1.409   0.641  -2.964  -1.120   0.983
Random samples from a standard normal distribution, N(0,1).
Generated in Excel as NORMSINV(RAND())
Repeated Confidence Intervals, hits and misses
  mean   std dev   90% lower   90% upper   95% lower   95% upper   90% misses   95% misses
-0.104    0.717     -0.476       0.268      -0.549       0.340         0            0
 0.375    0.807     -0.044       0.793      -0.125       0.875         0            0
 0.327    0.929     -0.155       0.809      -0.249       0.903         0            0
 0.064    1.117     -0.515       0.643      -0.628       0.756         0            0
 0.335    1.426     -0.405       1.074      -0.549       1.219         0            0
 0.071    1.214     -0.559       0.701      -0.681       0.823         0            0
-0.346    1.080     -0.907       0.214      -1.016       0.324         0            0
 0.403    0.531      0.127       0.678       0.074       0.732         1            1
 0.217    1.124     -0.366       0.800      -0.480       0.914         0            0
 0.534    1.022      0.004       1.064      -0.099       1.168         1            0
 0.387    0.750     -0.002       0.776      -0.078       0.852         0            0
-0.212    1.253     -0.862       0.438      -0.988       0.564         0            0
 0.469    0.759      0.076       0.863      -0.001       0.940         1            0
 0.305    0.612     -0.012       0.622      -0.074       0.684         0            0
 0.374    0.745     -0.013       0.760      -0.088       0.836         0            0
-0.024    0.489     -0.278       0.230      -0.327       0.279         0            0
Totals ->                                                                14            9
Since these are not 10 and 5, respectively, is there an error?
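The experiment is easy to repeat. The sketch below is an illustration under stated assumptions, not a reconstruction of the spreadsheet: it draws 100 samples of size 10 from N(0, 1) and builds each interval with the known σ = 1, so the bounds may be computed differently from the ones tabulated above. The function name and seed are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def count_misses(confidence, n_samples=100, n=10, seed=2007):
    """Count intervals that fail to cover the true mean 0 (illustrative sketch)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    half_width = z * 1.0 / sqrt(n)  # assumes sigma = 1 is known
    misses = 0
    for _ in range(n_samples):
        xbar = mean([rng.gauss(0.0, 1.0) for _ in range(n)])
        if not (xbar - half_width <= 0.0 <= xbar + half_width):
            misses += 1
    return misses


misses_90 = count_misses(0.90)
misses_95 = count_misses(0.95)
print(misses_90, misses_95)
```

Changing the seed changes the counts. The number of misses is itself a random variable, so counts other than exactly 10 and 5 are not an error.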
Repeated Confidence Intervals, .90
n = 100, p = 0.1

misses   prob     Cum Prob
   0    0.0000    0.0000
   1    0.0003    0.0003
   2    0.0016    0.0019
   3    0.0059    0.0078
   4    0.0159    0.0237
   5    0.0339    0.0576
   6    0.0596    0.1172
   7    0.0889    0.2061
   8    0.1148    0.3209
   9    0.1304    0.4513
  10    0.1319    0.5832
  11    0.1199    0.7030
  12    0.0988    0.8018
  13    0.0743    0.8761
  14    0.0513    0.9274
  15    0.0327    0.9601
  16    0.0193    0.9794
  17    0.0106    0.9900
  18    0.0054    0.9954
  19    0.0026    0.9980
  20    0.0012    0.9992

How are these probabilities being generated?
Let X = RV, number of misses. Then X ~ Bin(100, .1)
E[X] = np = 10
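The tabulated probabilities come straight from the binomial formula. A minimal sketch, with `binom_pmf` and `binom_cdf` as hypothetical helper names:

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# 90% intervals: each of the 100 intervals misses with probability 0.1.
print(round(binom_pmf(10, 100, 0.1), 4))       # about 0.1319
print(round(1 - binom_cdf(13, 100, 0.1), 4))   # P(14 or more misses), about 0.124
# 95% intervals: miss probability 0.05.
print(round(1 - binom_cdf(8, 100, 0.05), 4))   # P(9 or more misses), about 0.063
```

Seeing 14 or more misses at the 90% level has a chance of roughly one in eight, so the observed totals are unremarkable.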
Repeated Confidence Intervals, .95
n = 100, p = 0.05

misses   prob     Cum Prob
   0    0.0059    0.0059
   1    0.0312    0.0371
   2    0.0812    0.1183
   3    0.1396    0.2578
   4    0.1781    0.4360
   5    0.1800    0.6160
   6    0.1500    0.7660
   7    0.1060    0.8720
   8    0.0649    0.9369
   9    0.0349    0.9718
  10    0.0167    0.9885
  11    0.0072    0.9957
  12    0.0028    0.9985
Major Point: Can you see how probability helps us assess the risk associated with statistical inference?
Our Very First Real Confidence Interval Revisited Again
A One-Sided Confidence Interval
Based upon the sample of 100 batteries averaging (mean) 10 hours to failure, and with the manufacturer continuing to assure us that the population variance is 16 hours², find a 95 percent lower confidence bound for the mean life of this particular type of battery.

$$z_{.05} = 1.6449$$

$$\bar{x} - z_{\alpha}\,\frac{\sigma}{\sqrt{n}} = 10 - 1.6449\,\frac{4}{\sqrt{100}} = 9.342 \text{ hr}$$

That is, μ ≥ 9.342 hr with 95% confidence.
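For a one-sided bound all of α goes into a single tail, so the quantile changes from z at α/2 to z at α. A minimal sketch, with `lower_confidence_bound` as a hypothetical helper name:

```python
from math import sqrt
from statistics import NormalDist


def lower_confidence_bound(xbar, sigma, n, confidence=0.95):
    """One-sided lower bound on the mean when sigma is known (hypothetical helper)."""
    # All of alpha sits in one tail, so use the 'confidence' quantile directly.
    z = NormalDist().inv_cdf(confidence)
    return xbar - z * sigma / sqrt(n)


bound = lower_confidence_bound(xbar=10.0, sigma=4.0, n=100)
print(round(bound, 3))  # 9.342
```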
A Transition…
The previous development of a confidence interval was limited in two ways:
- Needed a Normal population
- Needed to know the standard deviation of the Normal distribution
The Central Limit Theorem eliminates the need to explicitly know the population is normal – for large samples Z will still be approximately standard normal
We can estimate σ using the sample standard deviation, s.
Confidence Interval on Mean of Normal – Variance Unknown
Same form as for the known-variance case, but the measure of risk now comes from the t distribution, and we boldly use the sample standard deviation, protected by the heavy-tailed t distribution even when the sample size is small!
Remember we are still assuming that observations from the underlying population are normally distributed.
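100(1 – α) percent confidence interval:
$$\bar{x} - t_{\alpha/2,\,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\, s/\sqrt{n}$$
where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of a t distribution with n – 1 d.f.'s.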
Where Does it Come From? Do we care?
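From earlier:
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad \frac{(n-1) S^2}{\sigma^2} \sim \chi^2(n-1)$$
Therefore
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1) S^2}{\sigma^2} \Big/ (n-1)}}$$
- numerator is standard normal
- denominator is the square root of a chi-square divided by its d.f. (n-1)
- for a normal sample, $\bar{X}$ and $S^2$ are independent
T has a t distribution with n-1 degrees of freedom.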
Are we still caring?
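$$P\left( -t_{\alpha/2,\,n-1} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2,\,n-1} \right) = 1 - \alpha$$
with a little algebra,
$$P\left( \bar{X} - t_{\alpha/2,\,n-1}\, S/\sqrt{n} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\, S/\sqrt{n} \right) = 1 - \alpha$$
This is our 100(1 – α)% Confidence Interval on μ.
$t_{\alpha/2,\,n-1}$ is the upper α/2 percentage point from the t-distribution with n – 1 degrees of freedom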
Confidence Interval on Mean of Normal – Variance Unknown
For large samples the distributional assumption is not critical. If the sample size is not large, use the t distribution.
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
has a t distribution with n – 1 degrees of freedom if $X_1, X_2, \ldots, X_n$ are a sample from a normal distribution with unknown mean and variance.
As n → ∞, the t distribution becomes standard normal.
t Distribution Converges to Standard Normal
For large sample size, use the normal distribution even if the variance is unknown.
The t distribution
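$t_{\alpha,k}$ is the value of the random variable T with k degrees of freedom above which we have area (probability) α.
Usually k is the number of degrees of freedom associated with the sample standard deviation (n – 1).
$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\;\Gamma(k/2)} \cdot \frac{1}{\left[ (x^2/k) + 1 \right]^{(k+1)/2}}, \qquad -\infty < x < \infty$$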
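The density can be evaluated directly to see both the heavy tails and the convergence to the standard normal. A sketch using only the standard library; `t_pdf` and `normal_pdf` are hypothetical helper names, and `lgamma` is used in place of the gamma function purely to avoid overflow for large k.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    # Ratio of gamma functions computed on the log scale for numerical safety.
    coef = exp(lgamma((k + 1) / 2) - lgamma(k / 2)) / sqrt(pi * k)
    return coef * (x * x / k + 1) ** (-(k + 1) / 2)


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)


for k in (1, 5, 30, 1000):
    # The tail value at x = 3 shrinks toward the normal value as k grows.
    print(k, t_pdf(3.0, k), normal_pdf(3.0))
```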
Our Very First CI using the t-distribution
Sulfur dioxide and nitrogen oxide are products of fossil fuel consumption. These compounds can be carried long distances and converted to acid before being deposited in the form of "acid rain." The following sulfur dioxide concentrations (in micrograms per cubic meter) were obtained from different locations in a forest thought to have been damaged by acid rain. Estimate the mean concentration in the forest.

52.7   43.9   41.7   71.5   47.6   55.1
62.2   56.5   33.4   61.8   54.3   50
45.3   63.4   53.9   65.5   66.6   70
52.4   38.6   46.1   44.4   60.7   56.4

$$n = 24, \quad \bar{x} = 53.92, \quad s^2 = 101.48, \quad t_{.025,23} = 2.069$$

$$\bar{x} - t_{\alpha/2,\,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\, s/\sqrt{n}$$

$$53.92 - 2.069\,\frac{10.07}{\sqrt{24}} \le \mu \le 53.92 + 2.069\,\frac{10.07}{\sqrt{24}} \quad\Rightarrow\quad (49.67,\ 58.17)$$

Average concentration in undamaged areas is 20 µg/m³
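The summary statistics and the interval can be recomputed from the 24 readings. This sketch takes the critical value 2.069 from the slide rather than computing it, since the Python standard library has no t quantile function; `t_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, stdev

# Sulfur dioxide concentrations (micrograms per cubic meter) from the slide.
so2 = [
    52.7, 43.9, 41.7, 71.5, 47.6, 55.1,
    62.2, 56.5, 33.4, 61.8, 54.3, 50.0,
    45.3, 63.4, 53.9, 65.5, 66.6, 70.0,
    52.4, 38.6, 46.1, 44.4, 60.7, 56.4,
]

T_CRIT = 2.069  # t_{.025,23}, taken from the slide


def t_confidence_interval(data, t_crit):
    """Two-sided interval on the mean with sigma estimated by s (hypothetical helper)."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


lower, upper = t_confidence_interval(so2, T_CRIT)
print(len(so2), round(mean(so2), 2), round(stdev(so2) ** 2, 2))
print(lower, upper)  # close to the slide's (49.67, 58.17)
```

Small differences in the last digit can appear because the slide rounds the mean and standard deviation before forming the interval. Either way the whole interval lies well above the 20 µg/m³ level quoted for undamaged areas.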