Sampling Distributions A review by Hieu Nguyen (03/27/06)

Post on 01-Apr-2015


Parameter vs Statistic

A parameter is a description for the entire population.

Example: A parameter for the US population is the proportion of all people who support President Bush’s nomination of Samuel Alito to the Supreme Court.

p=.74

Parameter vs Statistic

A statistic is a description of a sample taken from the population. It is only an estimate of the population parameter.

Example: In a poll of 1001 Americans, 73% of those surveyed supported Alito’s nomination.

p-hat=.73

Bias

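The bias of a statistic is the difference between the mean of its sampling distribution and the population parameter.

A statistic is unbiased if the mean of its sampling distribution equals the population parameter. A single sample can still miss the parameter.

Example: The sample proportion is unbiased. The poll gave p-hat=.73, but over many repeated polls the values of p-hat average out to the population proportion.

mean of p-hat=.74=p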

Sampling Variability

Samples naturally have varying results. The mean or sample proportion of one sample may be different from that of another.

In the poll mentioned before, p-hat=.73. A repetition of the same poll may have p-hat=.75.

Central Limit Theorem (CLT)

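Populations that are wildly skewed may cause samples to vary a great deal.

However, the CLT states that when the samples are large enough, the sample means (or sample proportions) follow an approximately normal distribution centered at the population parameter, whatever the shape of the population.

The CLT is related to the law of large numbers, which says that the sample mean gets closer to the population mean as the sample size grows.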

CLT Example

Imagine that many polls of 1001 Americans are done to find the proportion of those who supported Alito’s nomination.

Although the poll results vary, most samples have a sample proportion that is close to the population parameter p=.74.

CLT Example

Plot the sample proportions of all the samples to see the effects of the CLT. Notice how there are more sample proportions near the population parameter p=.74.

This histogram is actually a sampling distribution.
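The repeated polls described in this example can be imitated with a short simulation. This is only a sketch: the function name `simulate_polls`, the number of polls (2000), and the fixed seed are assumptions chosen for illustration, not part of the original review.

```python
import random
import statistics

def simulate_polls(p=0.74, n=1001, polls=2000, seed=0):
    """Return the sample proportion from each simulated poll of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(polls)]

p_hats = simulate_polls()
print("mean of sample proportions:", round(statistics.mean(p_hats), 4))
print("SD of sample proportions:  ", round(statistics.stdev(p_hats), 4))

# Crude text histogram: count the sample proportions in bins of width .01.
bins = {}
for p_hat in p_hats:
    left = int(p_hat * 100) / 100  # left edge of the bin
    bins[left] = bins.get(left, 0) + 1
for left in sorted(bins):
    print(f"{left:.2f} {'#' * (bins[left] // 20)}")
```

The printed bars should pile up around .74 and thin out on both sides, which is the bell shape the slide refers to.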

Sampling Distributions: Definition

Textbook definition:

A sampling distribution is the distribution of values taken by the statistic in all possible samples of the same size from the same population.

In other words, a sampling distribution is a histogram of the statistics from all samples of the same size from a population.
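For a very small population, "all possible samples of the same size" can be listed outright. The sketch below assumes a made-up population of five values and samples of size 2 drawn without replacement; the numbers are hypothetical and serve only to make the definition concrete.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population, for illustration only
n = 2                          # sample size

# The statistic (here the sample mean) for every possible sample of size n.
sample_means = [mean(sample) for sample in combinations(population, n)]

# The sampling distribution: each value of the statistic and how often it occurs.
distribution = Counter(sample_means)
for value in sorted(distribution):
    print(value, "#" * distribution[value])

print("population mean:     ", mean(population))
print("mean of sample means:", mean(sample_means))
```

The last two lines print the same number, which illustrates that the sample mean is an unbiased statistic.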

Two Most Common Types of Sampling Distributions

Sample Proportion Distribution
Distribution of the sample proportions of samples from a population

Sample Mean Distribution
Distribution of the sample means of samples from a population

For both types, the ideal shape is a normal distribution.

Sampling Distributions: Conditions

Before assuming that a sampling distribution is normal, check the following conditions:
Plausible independence
Randomness
Each sample is less than 10% of the population

Sampling Distributions As Normal Distributions

When all conditions are met, the sampling distribution can be considered a normal distribution with a center and a spread.

Note: With sample proportion distributions, another condition must be met:
Success-failure condition: there must be at least 10 successes and 10 failures according to the population parameter and sample size (np ≥ 10 and nq ≥ 10, where q = 1 - p)
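The two numeric conditions (the 10% condition and the success-failure condition) can be checked mechanically. The helper below is a sketch; the name `normal_approx_ok` and the population size of 300 million are assumptions for illustration. Independence and randomness depend on how the sample was collected and cannot be checked by a formula.

```python
def normal_approx_ok(p, n, population_size):
    """Check the 10% condition and the success-failure condition."""
    q = 1 - p
    ten_percent = n < 0.10 * population_size         # sample is under 10% of population
    success_failure = n * p >= 10 and n * q >= 10    # at least 10 successes and failures
    return ten_percent and success_failure

# Hypothetical population size; the poll itself had n = 1001 and p = .74.
print(normal_approx_ok(p=0.74, n=1001, population_size=300_000_000))
# A much smaller poll expects only 20 * .26 = 5.2 failures, so it fails the check.
print(normal_approx_ok(p=0.74, n=20, population_size=300_000_000))
```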

Sampling Distributions As Normal Distributions: Equations

Sample Proportion Distribution
p = population proportion (given)
q = 1 - p

$SD(\hat{p}) = \sqrt{\frac{pq}{n}}$

$N(p, SD(\hat{p}))$

Sample Mean Distribution
μ = population mean (given)
σ = population standard deviation (given)

$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}$

$N(\mu, SD(\bar{y}))$

Sampling Distributions As Normal Distributions: Note

If any of the parameters are unknown, use the statistics from a sample to approximate them.

Using Sampling Distributions

Sampling distributions can estimate the probability of getting a certain statistic in a random sample.

Use z-scores or the NormalCDF function in the TI-83/84.

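Using Sampling Distributions: Z-Scores w/ Example

Use the z-score table to find the appropriate probabilities.

$z = \frac{\hat{p} - p}{SD(\hat{p})}$

$P(\hat{p} < \text{value})$ OR $P(\hat{p} > \text{value})$

Example: Find the probability that a poll of 1001 Americans will return a sample proportion of .72 or less for those who support Alito’s nomination.

$p = .74$

$SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.74 \times .26}{1001}} = .0139$

$z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{.72 - .74}{.0139} = -1.443$

$P(\hat{p} < .72) = .0749$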
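The z-score calculation for the poll example can be reproduced in a few lines. This sketch uses `statistics.NormalDist` from the Python standard library in place of a printed z-table, which is a substitution and not the method the slides use.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.74, 1001                 # population proportion and poll size
sd_p_hat = sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(pq/n)
z = (0.72 - p) / sd_p_hat         # z-score of a sample proportion of .72
prob = NormalDist().cdf(z)        # P(p-hat < .72) under the standard normal model

print("SD(p-hat):", round(sd_p_hat, 4))
print("z:        ", round(z, 3))
print("P:        ", round(prob, 4))
```

Because the code does not round z to two decimal places before the lookup, its probability can differ from the table value in the last digit.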

Using Sampling Distributions: NormalCDF Function w/ Example

The syntax for the NormalCDF function is:

NormalCDF(lower limit, upper limit, μ, σ)

Example: Find the probability that a sample of size 25 will have a mean of 5 or less, given that the population has a mean of 7 and a standard deviation of 3.

$\mu = 7$

$\sigma = 3$

$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{25}} = .6$

$\text{NormalCDF}(0, 5, 7, .6) = .000429$
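For readers without a TI-83/84, the same four-argument call can be imitated in Python. The function `normalcdf` below is a hypothetical helper written for this sketch (it is not a library function); it mirrors the calculator's argument order and relies on `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

sd_y_bar = 3 / sqrt(25)               # SD(y-bar) = sigma / sqrt(n)
prob = normalcdf(0, 5, 7, sd_y_bar)   # same arguments as the calculator example

print("SD(y-bar):", sd_y_bar)
print("P:        ", round(prob, 6))
```

Passing `float("-inf")` or `float("inf")` as a limit gives a one-sided tail, which plays the role of a very large negative or positive limit on the calculator.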

Sampling Distribution for Two Populations

Use a difference sampling distribution if the question presents 2 different populations.

$\sigma_{x-y} = \sqrt{\sigma_x^2 + \sigma_y^2}$

$\mu_{x-y} = \mu_x - \mu_y$

Sampling Distribution for Two Populations: Example
(adapted from AP Statistics – Chapter 9 – Sampling Distribution Multiple Choice Questions)

Medium oranges have a mean weight of 14oz and a standard deviation of 2oz. Large oranges have a mean weight of 18oz and a standard deviation of 3oz. Find the probability of finding a medium orange that weighs more than a large orange.

$\mu_x = 14$, $\sigma_x = 2$ (medium oranges)

$\mu_y = 18$, $\sigma_y = 3$ (large oranges)

$\mu_{y-x} = \mu_y - \mu_x = 18 - 14 = 4$

$\sigma_{y-x} = \sqrt{\sigma_y^2 + \sigma_x^2} = \sqrt{3^2 + 2^2} = 3.606$

$\text{NormalCDF}(-\infty, 0, 4, 3.606) = .134$
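The orange example can be checked numerically. This sketch assumes, as the example does, that both weights follow normal models and that the two oranges are chosen independently. It builds the difference model from the two formulas and also through `NormalDist` subtraction, which applies the same rule for independent normal variables.

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x = 14, 2   # medium oranges
mu_y, sigma_y = 18, 3   # large oranges

# Difference model for (large - medium), built from the two formulas.
mu_diff = mu_y - mu_x
sigma_diff = sqrt(sigma_y**2 + sigma_x**2)
prob = NormalDist(mu_diff, sigma_diff).cdf(0)   # P(large - medium < 0)

# The same model obtained by subtracting two independent normal models.
diff = NormalDist(mu_y, sigma_y) - NormalDist(mu_x, sigma_x)

print("mean of difference:", mu_diff)
print("SD of difference:  ", round(sigma_diff, 3))
print("P(medium > large): ", round(prob, 3))
print("same via NormalDist subtraction:", round(diff.cdf(0), 3))
```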

Example Problem
(adapted from De Veaux Sampling Distribution Models Exercise #42)

Ayrshire cows average 47 pounds of milk a day, with a standard deviation of 6 pounds. For Jersey cows, the mean daily production is 43 pounds, with a standard deviation of 5 pounds. Assume that Normal models describe milk production for these breeds.

A) We select an Ayrshire at random. What’s the probability that she averages more than 50 pounds of milk a day?
B) What’s the probability that a randomly selected Ayrshire gives more milk than a randomly selected Jersey?
C) A farmer has 20 Jerseys. What’s the probability that the average production for this small herd exceeds 45 pounds of milk a day?
D) A neighboring farmer has 10 Ayrshires. What’s the probability that his herd average is at least 5 pounds higher than the average for the Jersey herd?

Example Problem Solution

First, check the assumptions:
Independent samples
Randomness
Sample represents less than 10% of the population

Example Problem Solution

A) Use the normal model to estimate the appropriate probability.

$\mu = 47$

$\sigma = 6$

$z = \frac{50 - 47}{6} = .5$

$P(x > 50) = .309$

$\text{NormalCDF}(50, \infty, 47, 6) = .309$

Example Problem Solution

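B) Create a normal model for the difference between Ayrshires and Jerseys. Use the model to estimate the appropriate probability.

$\mu_a = 47$, $\sigma_a = 6$

$\mu_j = 43$, $\sigma_j = 5$

$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$

$\sigma_{a-j} = \sqrt{\sigma_a^2 + \sigma_j^2} = \sqrt{6^2 + 5^2} = 7.810$

$z = \frac{0 - 4}{7.810} = -.512$

$P(x > 0) = .696$

$\text{NormalCDF}(0, \infty, 4, 7.810) = .696$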

Example Problem Solution

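C) Create a sampling distribution model for which n=20 Jerseys. Use the model to estimate the appropriate probability.

$\mu = 43$

$\sigma = 5$

$n = 20$

$SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{20}} = 1.118$

$z = \frac{45 - 43}{1.118} = 1.789$

$P(\bar{y} > 45) = .0367$

$\text{NormalCDF}(45, \infty, 43, 1.118) = .0367$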

Example Problem Solution

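D) First create a sampling distribution model for 10 random Ayrshires and 20 random Jerseys. Then create a normal model for the difference between the 10 Ayrshires and 20 Jerseys.

$\mu_j = 43$, $\sigma_j = 5$, $n_j = 20$

$SD(\bar{y}_j) = \frac{\sigma_j}{\sqrt{n_j}} = \frac{5}{\sqrt{20}} = 1.118$

$\mu_a = 47$, $\sigma_a = 6$, $n_a = 10$

$SD(\bar{y}_a) = \frac{\sigma_a}{\sqrt{n_a}} = \frac{6}{\sqrt{10}} = 1.897$

$\mu_{a-j} = \mu_a - \mu_j = 47 - 43 = 4$

$\sigma_{a-j} = \sqrt{SD(\bar{y}_a)^2 + SD(\bar{y}_j)^2} = \sqrt{1.897^2 + 1.118^2} = 2.202$

$z = \frac{5 - 4}{2.202} = .454$

$P(x > 5) = .325$

$\text{NormalCDF}(5, \infty, 4, 2.202) = .325$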
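All four parts of the cow problem can be verified together. This is a sketch that assumes the Normal models and independence stated in the problem; the helper `upper_tail` is a hypothetical name introduced here, and `statistics.NormalDist` stands in for the calculator.

```python
from math import sqrt
from statistics import NormalDist

ayrshire = NormalDist(47, 6)   # daily milk for one Ayrshire, in pounds
jersey = NormalDist(43, 5)     # daily milk for one Jersey, in pounds

def upper_tail(dist, x):
    """P(value > x) under the given normal model."""
    return 1 - dist.cdf(x)

# A) One Ayrshire gives more than 50 pounds.
a = upper_tail(ayrshire, 50)

# B) One Ayrshire gives more than one Jersey: the difference exceeds 0.
b = upper_tail(ayrshire - jersey, 0)

# C) The mean of 20 Jerseys exceeds 45: SD shrinks to sigma / sqrt(n).
jersey_herd = NormalDist(43, 5 / sqrt(20))
c = upper_tail(jersey_herd, 45)

# D) The mean of 10 Ayrshires beats the mean of 20 Jerseys by more than 5.
ayrshire_herd = NormalDist(47, 6 / sqrt(10))
d = upper_tail(ayrshire_herd - jersey_herd, 5)

for label, value in zip("ABCD", (a, b, c, d)):
    print(label, round(value, 4))
```

Subtracting two `NormalDist` objects gives a model whose mean is the difference of the means and whose SD is the square root of the summed variances, matching the two-population formulas used in parts B and D.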