
Statistical Intervals for a Single Sample

From only one sample,

An interval has been found.

Because the sample was ample,

The results were quite profound!

- author unknown circa 2007

Chapter 8A

What to Look Forward to this Week

Today only!

A Diversion – the sampling distributions or distributions arising from the normal

If Z1, Z2, ..., Zn are independent standard normal random variables, then

$$Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi^2(n)$$

chi-square distribution with n degrees of freedom

$$T = \frac{Z}{\sqrt{\chi^2(n)/n}} \sim t(n)$$

t-distribution with n degrees of freedom

$$F = \frac{\chi^2(n)/n}{\chi^2(m)/m} \sim F(n, m)$$

F-distribution with n degrees of freedom in the numerator and m degrees of freedom in the denominator

What will be Chi-square?

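Let $X_i$ be the ith sample value from a normal population:

$$X_i \sim N(\mu, \sigma^2)$$

$$Z_i = \frac{X_i - \mu}{\sigma} \sim N(0, 1)$$

Therefore

$$\sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)$$

However

$$\sum_{i=1}^{n} \left(\frac{X_i - \bar{X}}{\sigma}\right)^2 = \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$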
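A quick simulation can make the chi-square claim for the sample variance more tangible. The sketch below is only an illustration: the function name `scaled_sample_variance`, the seed, and the values n = 5, mu = 10, sigma = 2 are assumptions chosen for the demo, not values from the slides. It draws many normal samples, forms (n - 1)S^2 / sigma^2 for each, and compares the simulated mean and variance with those of a chi-square variable with n - 1 degrees of freedom (mean k, variance 2k).

```python
import random
import statistics


def scaled_sample_variance(n, mu, sigma, rng):
    """Draw n normal values and return (n - 1) * S^2 / sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)  # sample variance, divisor n - 1
    return (n - 1) * s2 / sigma ** 2


rng = random.Random(2007)        # fixed seed so the run is repeatable
n, mu, sigma = 5, 10.0, 2.0      # illustrative values, not from the slides
draws = [scaled_sample_variance(n, mu, sigma, rng) for _ in range(20000)]

# A chi-square variable with k degrees of freedom has mean k and variance 2k.
sim_mean = statistics.mean(draws)
sim_var = statistics.variance(draws)
print(f"simulated mean     = {sim_mean:.3f} (chi-square mean is {n - 1})")
print(f"simulated variance = {sim_var:.3f} (chi-square variance is {2 * (n - 1)})")
```

The simulated moments should land near 4 and 8, matching n - 1 degrees of freedom rather than n.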

The Chi-square Distribution

k = degrees of freedom

The t-distribution, also known as Student’s t

f(x) =

v = k = df

More about Student

The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality of beer brews. "Student" was his pen name.

Gosset was a statistician for the Guinness brewery in Dublin, Ireland, and was hired due to Claude Guinness's innovative policy of recruiting the best graduates from Oxford and Cambridge to apply biochemistry and statistics to Guinness' industrial processes.

Gosset published the t test in Biometrika in 1908, but was forced to use a pen name by his employer who regarded the fact that they were using statistics as a trade secret.

The F-distribution

f(x)

B(m,n) is the Beta function

An interesting property:

F-Distribution named after R.A. Fisher

Born: 17 February 1890, East Finchley, London, England
Died: 29 July 1962 (aged 72), Adelaide, Australia
Residence: England, Australia
Nationality: British
Fields: Statistics, Genetics, Natural selection
Institutions: Rothamsted Experimental Station, University College London, Cambridge University
Alma mater: Cambridge University
Academic advisors: Sir James Jeans, F.J.M. Stratton
Notable students: C.R. Rao
Known for: Maximum likelihood, Fisher information, Analysis of variance
Notable prizes: Royal Medal (1938), Copley Medal (1955)

Three types of Intervals

- Confidence Interval – bounds a population parameter or distribution parameter.

- Tolerance Interval – bounds a proportion of the distribution at a certain confidence level.

- Prediction Interval – bounds a single observation; assumptions on the population distribution are critical here.

Overheard at a rest stop

I know that my average driving time on this daily route has been 2.3 hours over the last 7 days. However, that is based on a sample and therefore is unlikely to equal my population mean. What I really need is some way to measure how precise this estimate is.

A typical prob-stat graduate

Confidence Interval – a statement consisting of two values between which the population parameter is estimated to lie.

Reliability – the degree of confidence: the probability with which the population parameter will be “captured” by the two values.

Precision – the length of the confidence interval (a measure of the error in estimating the parameter).

I am 95% confident that my mean driving time to work is between 37.4 minutes and 41.2 minutes.

41.2 – 37.4 = 3.8
Best estimate is midpoint = 39.3
Error = 1.9 minutes

The Big Picture of a Confidence Interval

(L, U) is a 100(1 − α)% CI for the population parameter

The Bigger Picture of a Confidence Interval

General Approach:

Estimate ± Reliability Factor × Standard Error

Slide annotations: our point estimate (the estimate), our confidence (the reliability factor), our precision (the standard-error term).

The Biggest Picture of a Confidence Interval

$$\Pr\{\, l(x_1, \ldots, x_n) \le \theta \le u(x_1, \ldots, x_n) \,\} = 1 - \alpha$$

- α: measure of risk
- l and u: measure of uncertainty (precision); these are random variables, i.e. statistics
- θ: population parameter

The length of a confidence interval is a measure of the precision of estimation.

100(1 – α)% of the C.I.s constructed this way contain the mean. Watch the interpretation of this concept.

Confidence Interval on Mean of Normal Distribution Variance Known

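$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$ has a standard normal distribution.

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$$

With a little algebra,

$$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$

This is our 100(1 − α)% Confidence Interval on μ.

$z_{\alpha/2}$ is the upper α/2 percentage point from the standard normal.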

Our Very First Real Confidence Interval

A sample of 100 batteries is tested for operating life. The batteries averaged 10 hours before failing. The manufacturer has assured us that the population variance is 16 hours² (so σ = 4 hours). Find a 95 percent confidence interval for the mean life of this particular type of battery.

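$$n = 100,\quad \bar{x} = 10 \text{ hr},\quad \sigma = 4 \text{ hr}$$

$$z_{0.025} = 1.96$$

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 10 \pm 1.96\,\frac{4}{\sqrt{100}} = (9.216,\ 10.784)$$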
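The battery interval can be checked numerically. This is a minimal sketch, assuming Python's standard library `statistics.NormalDist` for the normal quantile; the helper name `z_interval` is hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence=0.95):
    """Two-sided CI for a normal mean when sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Battery example: n = 100, sample mean 10 hr, sigma = 4 hr.
lower, upper = z_interval(xbar=10.0, sigma=4.0, n=100, confidence=0.95)
print(f"95% CI for mean battery life: ({lower:.3f}, {upper:.3f}) hours")
```

The exact quantile (about 1.95996) is used here instead of the table value 1.96, so the endpoints agree with the slide to three decimals.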

Sample Size and Precision

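$$P\left(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$$

Equivalently:

$$|\bar{x} - \mu| \le z_{\alpha/2}\,\sigma/\sqrt{n}$$

Define the error by

$$E = |\bar{x} - \mu|$$

Choose n as below and we can be 100(1 − α)% confident that the error will not exceed E.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$

Think of E as a measure of practical significance.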

Our Very First Real Confidence Interval Revisited

For our battery problem, what sample size is required to reduce the error to 0.5 hr with 99% confidence?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(2.58)^2\,(16)}{(0.5)^2} = 426.0096 \approx 426$$

Problem 8-12

Life of a 75 watt bulb is normally distributed with std dev = 25 hrs. Suppose we want to be 95% confident that the error in estimating mean life is less than 5 hours. Find a sample size.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96\,(25)}{5}\right)^2 = 96.04 \approx 96$$
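The two sample-size calculations above can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function name `required_sample_size` is hypothetical, and it uses the exact normal quantile from `statistics.NormalDist` rather than the table values 1.96 and 2.58, so the raw results differ slightly from the slides (about 424.6 instead of 426.0 for the batteries). Rounding up with `ceil` is a common textbook convention for guaranteeing the error bound; the slides round to the nearest integer instead.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, max_error, confidence):
    """Unrounded n = (z_{alpha/2} * sigma / E)^2 for a known-sigma mean CI."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return (z * sigma / max_error) ** 2


# Problem 8-12: sigma = 25 hr, E = 5 hr, 95% confidence.
bulbs = required_sample_size(sigma=25.0, max_error=5.0, confidence=0.95)
# Battery problem: sigma = 4 hr, E = 0.5 hr, 99% confidence.
batteries = required_sample_size(sigma=4.0, max_error=0.5, confidence=0.99)

print(f"bulbs:     raw n = {bulbs:.2f}, rounded up = {ceil(bulbs)}")
print(f"batteries: raw n = {batteries:.2f}, rounded up = {ceil(batteries)}")
```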

Problem 8-10

Diameter of holes for a cable harness is normally distributed with a standard deviation of .01 in. A random sample of 10 yields average diameter of 1.5045 in. Find a 99% two-sided confidence interval.

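$$\bar{x} - z_{0.005}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{0.005}\,\sigma/\sqrt{n}$$

$$1.5045 - 2.58\,(0.01)/\sqrt{10} \le \mu \le 1.5045 + 2.58\,(0.01)/\sqrt{10}$$

$$1.4963 \le \mu \le 1.5127$$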

Interpreting a Confidence Interval

The confidence interval is a random interval.

The appropriate interpretation of a confidence interval (for example on μ) is: the observed interval [l, u] brackets the true value of μ, with confidence 100(1 − α)%.

Examine Figure 8-1 on the next slide.

Figure 8-1 Repeated construction of a confidence interval for μ

Repeated Confidence Intervals, gen. samples

Sample
1      1.886   1.014  -1.534   0.192   0.801  -0.429  -0.579   0.647   0.149   1.015
2      1.040  -1.008  -0.225   0.374   0.168   0.607  -1.439  -1.070   1.355   0.994
3      1.091  -0.447   1.393   1.105  -0.012  -1.986  -1.518   0.749   1.244  -1.123
4     -1.618   0.874   0.484  -1.761  -0.653  -0.432   1.695   0.487  -1.589  -0.908
5     -1.246  -0.386   0.222  -0.326   0.969   0.225   0.824  -1.450   0.399   0.566
6      0.301  -1.002   1.791  -0.212   1.403   0.669  -0.071  -0.306   1.576  -0.171
...
96    -0.778  -0.977   0.361  -1.247  -0.045  -0.213  -1.772  -0.052   0.666   1.273
97     0.277   0.646  -0.693  -1.306  -1.311  -0.489   0.743  -0.313  -0.219  -1.278
98     0.725   2.182  -0.855  -0.831   0.359   0.295  -1.639   1.165  -1.099   0.090
99    -0.227  -1.575   1.890   0.497   0.211   0.408  -0.542   1.423   1.832  -0.073
100   -0.425  -0.328   0.475   1.241   0.210   1.409   0.641  -2.964  -1.120   0.983

Random samples from a standard normal distribution, N(0,1).

Generated in Excel as NORMSINV(RAND())
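The same experiment is easy to repeat outside a spreadsheet. The sketch below is an illustration, not a reproduction of the slide's workbook: it assumes the known-variance interval x̄ ± z·σ/√n with σ = 1, uses `random.gauss` in place of NORMSINV(RAND()), and the function name `count_misses` and the seed are arbitrary choices. It draws 100 samples of size 10 from N(0, 1) and counts how many 90% and 95% intervals fail to cover the true mean of 0.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def count_misses(samples, sigma, confidence, true_mean=0.0):
    """Count samples whose known-sigma CI fails to contain true_mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    misses = 0
    for sample in samples:
        half_width = z * sigma / sqrt(len(sample))
        xbar = mean(sample)
        if not (xbar - half_width <= true_mean <= xbar + half_width):
            misses += 1
    return misses


rng = random.Random(8)  # arbitrary seed; change it to see other outcomes
samples = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(100)]

misses_90 = count_misses(samples, sigma=1.0, confidence=0.90)
misses_95 = count_misses(samples, sigma=1.0, confidence=0.95)
print(f"90% intervals that missed: {misses_90} of {len(samples)}")
print(f"95% intervals that missed: {misses_95} of {len(samples)}")
```

Different seeds give different miss counts, scattered around 10 and 5. Because each 95% interval contains the matching 90% interval, the 95% miss count can never exceed the 90% count.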

Repeated Confidence Intervals, hits and misses

mean     std dev   90% lower   90% upper   95% lower   95% upper   90% misses   95% misses
-0.104   0.717     -0.476      0.268       -0.549      0.340       0            0
 0.375   0.807     -0.044      0.793       -0.125      0.875       0            0
 0.327   0.929     -0.155      0.809       -0.249      0.903       0            0
 0.064   1.117     -0.515      0.643       -0.628      0.756       0            0
 0.335   1.426     -0.405      1.074       -0.549      1.219       0            0
 0.071   1.214     -0.559      0.701       -0.681      0.823       0            0
-0.346   1.080     -0.907      0.214       -1.016      0.324       0            0
 0.403   0.531      0.127      0.678        0.074      0.732       1            1
 0.217   1.124     -0.366      0.800       -0.480      0.914       0            0
 0.534   1.022      0.004      1.064       -0.099      1.168       1            0
 0.387   0.750     -0.002      0.776       -0.078      0.852       0            0
-0.212   1.253     -0.862      0.438       -0.988      0.564       0            0
 0.469   0.759      0.076      0.863       -0.001      0.940       1            0
 0.305   0.612     -0.012      0.622       -0.074      0.684       0            0
 0.374   0.745     -0.013      0.760       -0.088      0.836       0            0
-0.024   0.489     -0.278      0.230       -0.327      0.279       0            0

Totals (all 100 samples) -> 14 misses at 90%, 9 misses at 95%

Since these are not 10 and 5, respectively, is there an error?

Repeated Confidence Intervals, .90

n = 100, p = 0.1

misses   prob(p)   Cum Prob
0        0.0000    0.0000
1        0.0003    0.0003
2        0.0016    0.0019
3        0.0059    0.0078
4        0.0159    0.0237
5        0.0339    0.0576
6        0.0596    0.1172
7        0.0889    0.2061
8        0.1148    0.3209
9        0.1304    0.4513
10       0.1319    0.5832
11       0.1199    0.7030
12       0.0988    0.8018
13       0.0743    0.8761
14       0.0513    0.9274
15       0.0327    0.9601
16       0.0193    0.9794
17       0.0106    0.9900
18       0.0054    0.9954
19       0.0026    0.9980
20       0.0012    0.9992

How are these probabilities being generated?

Let X = RV, number of misses. Then X ~ Bin(100, .1)

E[X] = np = 10
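The binomial probabilities for the number of misses can be regenerated directly from the binomial formula. A minimal sketch, assuming only Python's standard library (`math.comb`, available from Python 3.8); the helper names `binomial_pmf` and `binomial_cdf` are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), by direct summation."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 100, 0.10  # 100 intervals, each missing with probability 0.10
for misses in (0, 5, 10, 14):
    print(f"misses = {misses:2d}  prob = {binomial_pmf(misses, n, p):.4f}  "
          f"cum prob = {binomial_cdf(misses, n, p):.4f}")

expected_misses = n * p  # E[X] = np
print(f"E[X] = {expected_misses:.1f}")
```

Setting `p = 0.05` gives the corresponding probabilities for 95% intervals.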

Repeated Confidence Intervals, .95

n = 100, p = 0.05

misses   prob(p)   Cum Prob
0        0.0059    0.0059
1        0.0312    0.0371
2        0.0812    0.1183
3        0.1396    0.2578
4        0.1781    0.4360
5        0.1800    0.6160
6        0.1500    0.7660
7        0.1060    0.8720
8        0.0649    0.9369
9        0.0349    0.9718
10       0.0167    0.9885
11       0.0072    0.9957
12       0.0028    0.9985

Major Point: Can you see how probability helps us assess the risk associated with statistical inference?

One-sided Confidence Bounds

Our Very First Real Confidence Interval Revisited Again

A One-Sided Confidence Interval

We again use the sample of 100 batteries averaging (mean) 10 hours to failure. The manufacturer continues to assure us that the population variance is 16 hours² (σ = 4 hours). Find a 95 percent lower confidence bound for the mean life of this particular type of battery.

$$z_{0.05} = 1.6449$$

$$\bar{x} - z_{\alpha}\,\frac{\sigma}{\sqrt{n}} = 10 - 1.6449\,\frac{4}{\sqrt{100}} = 9.342 \text{ hr}$$
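For a one-sided bound, all of α goes into one tail, so the quantile is z_α rather than z_{α/2}. The sketch below checks the battery figure; the function name `lower_confidence_bound` is hypothetical, and the normal quantile comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def lower_confidence_bound(xbar, sigma, n, confidence=0.95):
    """One-sided lower bound on a normal mean when sigma is known."""
    z = NormalDist().inv_cdf(confidence)  # upper alpha point, alpha = 1 - confidence
    return xbar - z * sigma / sqrt(n)


bound = lower_confidence_bound(xbar=10.0, sigma=4.0, n=100, confidence=0.95)
print(f"95% lower confidence bound on mean life: {bound:.3f} hr")
```

The one-sided bound (about 9.342 hr) sits above the lower end of the two-sided 95% interval (9.216 hr), because the one-sided version uses the smaller multiplier 1.645 instead of 1.96.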

A Transition…

The previous development of a confidence interval was limited in two ways:

- Needed a Normal population

- Needed to know the standard deviation of the Normal distribution

The Central Limit Theorem eliminates the need to explicitly know the population is normal – for a large enough sample, Z will still be approximately standard normal.

We can estimate σ using the sample standard deviation, s.

Confidence Interval on Mean of Normal – Variance Unknown

Same form as for the normal – the measure of risk now comes from the t distribution, and we boldly use the sample standard deviation, protected by the heavy-tailed t distribution even when the sample size is small!

Remember we are still assuming that observations from the underlying population are normally distributed.

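100(1 − α) percent confidence interval:

$$\bar{x} - t_{\alpha/2,\,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\, s/\sqrt{n}$$

where $t_{\alpha/2,\,n-1}$ is the upper 100α/2 percentage point of a t distribution with n − 1 d.f.'s.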

Where Does it Come From? Do we care?

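From earlier:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n-1}}}$$

The numerator is standard normal.

The denominator is the square root of a chi-square divided by its d.f. (n-1).

T has a t distribution with n-1 degrees of freedom.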

Are we still caring?

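$$P\left(-t_{\alpha/2,\,n-1} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$

With a little algebra,

$$P\left(\bar{X} - t_{\alpha/2,\,n-1}\, S/\sqrt{n} \le \mu \le \bar{X} + t_{\alpha/2,\,n-1}\, S/\sqrt{n}\right) = 1 - \alpha$$

This is our 100(1 − α)% Confidence Interval on μ.

$t_{\alpha/2,\,n-1}$ is the upper α/2 percentage point from the t-distribution with n − 1 degrees of freedom.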

Confidence Interval on Mean of Normal – Variance Unknown

For large samples the distributional assumption is not critical. If the sample size is not large, use the t distribution.

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with n − 1 degrees of freedom if $X_1, X_2, \ldots, X_n$ are a sample from a normal distribution with unknown mean and variance.

As n → ∞, the t distribution becomes standard normal.

t Distribution Converges to Standard Normal

For large sample size, use the normal distribution even if the variance is unknown.

The t distribution

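$t_{\alpha,k}$ is the value of the random variable T with k degrees of freedom above which we have area (probability) α.

Usually k is the number of degrees of freedom associated with the sample standard deviation (n − 1).

$$f(x) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\;\Gamma(k/2)} \cdot \frac{1}{\left[(x^2/k) + 1\right]^{(k+1)/2}}, \qquad -\infty < x < \infty$$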
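The t density, and its convergence to the standard normal as k grows, can be checked numerically. This is a sketch, assuming the standard form of the t density with k degrees of freedom; the function name `t_pdf` is hypothetical. It works on the log scale with `math.lgamma` so that large k does not overflow the gamma function.

```python
from math import exp, lgamma, log, log1p, pi
from statistics import NormalDist


def t_pdf(x, k):
    """Density of Student's t with k degrees of freedom, evaluated at x."""
    log_density = (lgamma((k + 1) / 2) - lgamma(k / 2)
                   - 0.5 * log(pi * k)
                   - (k + 1) / 2 * log1p(x * x / k))
    return exp(log_density)


normal = NormalDist()  # standard normal for comparison
for k in (1, 5, 30, 1000):
    print(f"k = {k:4d}  f(0) = {t_pdf(0.0, k):.4f}  f(3) = {t_pdf(3.0, k):.4f}")
print(f"normal    f(0) = {normal.pdf(0.0):.4f}  f(3) = {normal.pdf(3.0):.4f}")
```

For small k the density at x = 3 is well above the normal value, which is the heavy tail that protects small-sample intervals; by k = 1000 the two densities are nearly indistinguishable.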

Our Very First CI using the t-distribution

Sulfur dioxide and nitrogen oxide are products of fossil fuel consumption. These compounds can be carried long distances and converted to acid before being deposited in the form of “acid rain.” The following sulfur dioxide concentrations (in micrograms per cubic meter) were obtained from different locations in a forest thought to have been damaged by acid rain. Estimate the mean concentration in the forest.

52.7   43.9   41.7   71.5   47.6   55.1
62.2   56.5   33.4   61.8   54.3   50.0
45.3   63.4   53.9   65.5   66.6   70.0
52.4   38.6   46.1   44.4   60.7   56.4

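$$n = 24,\quad \bar{x} = 53.92,\quad s^2 = 101.48,\quad t_{0.025,\,23} = 2.069$$

$$\bar{x} - t_{\alpha/2,\,n-1}\, s/\sqrt{n} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\, s/\sqrt{n}$$

$$53.92 - 2.069\,\frac{10.07}{\sqrt{24}} \le \mu \le 53.92 + 2.069\,\frac{10.07}{\sqrt{24}} \quad\Rightarrow\quad (49.67,\ 58.17)$$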

Average concentration in undamaged areas is 20 μg/m³
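The sulfur dioxide interval can be recomputed from the 24 readings. This is a minimal sketch using only the standard library: the critical value 2.069 is taken from the slide as a table lookup (the standard library has no t quantile function), and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Sulfur dioxide concentrations (micrograms per cubic meter), 24 locations.
so2 = [
    52.7, 43.9, 41.7, 71.5, 47.6, 55.1,
    62.2, 56.5, 33.4, 61.8, 54.3, 50.0,
    45.3, 63.4, 53.9, 65.5, 66.6, 70.0,
    52.4, 38.6, 46.1, 44.4, 60.7, 56.4,
]

T_CRIT = 2.069  # t_{0.025, 23}, table value quoted on the slide

n = len(so2)
xbar = mean(so2)
s = stdev(so2)  # sample standard deviation, divisor n - 1
half_width = T_CRIT * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"n = {n}, mean = {xbar:.2f}, s = {s:.2f}")
print(f"95% CI for the mean concentration: ({lower:.2f}, {upper:.2f})")
```

Carrying full precision gives endpoints within about 0.01 of the slide's (49.67, 58.17), which were computed from the rounded values 53.92 and 10.07. Either way the whole interval lies far above the 20 μg/m³ level for undamaged areas.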

It’s Official Now

Stay Tuned – next time…
