Chapter 8: Sampling Variability and Sampling Distributions (cparrish.sewanee.edu/stat204...)

Stat 204, Part 4 Inference

Chapter 8: Sampling Variability and Sampling Distributions

These notes reflect material from our text, Statistics: Learning from Data, First Edition, by Roxy Peck, published by CENGAGE Learning, 2015.

Sampling distributions

Three distributions : population, data, sampling

Sampling distribution of the sample proportion

Sampling distribution of the sample mean

[Figure: Population distribution vs. sampling distribution of sample mean. Histograms (y-axis: Frequency) labeled population and sample means.]

LLN and CLT

LLN: X̄n → µ as n gets larger.

CLT: the distribution of X̄n approaches a normal distribution as n gets larger.

Each theorem applies only when certain conditions are satisfied. The CLT requires

• independent sample elements (from less than 10% of the population)

• relatively large sample size (for instance, at least 30 elements)

• data which are not highly skewed and without extreme outliers

in order to conclude that the sample mean is approximately normally distributed with standard deviation approximately SE (OIS, p. 168).
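The two statements above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the exponential population, the seed, and the sample sizes are illustrative choices that do not come from the text. It tracks the running mean of one growing sample (LLN) and then counts how often a sample mean lands within 1.96 standard errors of µ (CLT).

```python
import numpy as np

rng = np.random.default_rng(204)  # seed chosen arbitrarily for reproducibility

# Hypothetical population: exponential with mean mu = 1 and sd sigma = 1.
mu, sigma = 1.0, 1.0

# LLN: the running mean of one growing sample settles near mu.
draws = rng.exponential(scale=mu, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)
for n in (10, 1_000, 100_000):
    print(f"n = {n:>6}: sample mean = {running_mean[n - 1]:.4f}")

# CLT: for n = 50, roughly 95% of sample means should fall within
# 1.96 standard errors of mu if the normal approximation is reasonable.
n, n_samples = 50, 10_000
sample_means = rng.exponential(scale=mu, size=(n_samples, n)).mean(axis=1)
se = sigma / np.sqrt(n)
coverage = np.mean(np.abs(sample_means - mu) < 1.96 * se)
print(f"fraction within 1.96 SE: {coverage:.3f}")
```

The population here is deliberately skewed, so the coverage figure gives a rough sense of how well the normal approximation holds at a moderate sample size.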

Spring 2016

Sampling distributions of the sample mean

Uniform distribution.

[Figure: Uniform Distribution, U[0,1] (y-axis: Probability), with three histograms of the sample mean (x-axis: Sample mean; y-axis: Frequency). One panel is labeled n=2, n.samples=10000.]
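The sample-mean histograms for the uniform population can be imitated with a short simulation. This is a sketch assuming NumPy; n = 2 with 10000 samples matches the one panel label that survives in these notes, while the other sample sizes (5 and 30) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)  # arbitrary seed

def uniform_sample_means(n, n_samples=10_000):
    """Return n_samples sample means, each from n draws of U[0,1]."""
    return rng.uniform(0.0, 1.0, size=(n_samples, n)).mean(axis=1)

# U[0,1] has mean 1/2 and standard deviation sqrt(1/12).
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

results = {}
for n in (2, 5, 30):  # n = 2 is from the notes; 5 and 30 are assumed
    means = uniform_sample_means(n)
    results[n] = means
    print(f"n = {n:>2}: mean of sample means = {means.mean():.4f}, "
          f"sd = {means.std(ddof=1):.4f}, "
          f"sigma/sqrt(n) = {sigma / np.sqrt(n):.4f}")

# A text histogram for n = 2 shows the triangular shape of the
# distribution of the mean of two uniform draws.
counts, edges = np.histogram(results[2], bins=10, range=(0.0, 1.0))
for count, left in zip(counts, edges[:-1]):
    print(f"{left:.1f} {'#' * (int(count) // 50)}")
```

As n grows, the spread of the sample means should shrink in line with σ/√n while the center stays near 1/2.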

Exponential distribution.

[Figure: Exponential Distribution (y-axis: Probability), with three histograms of the sample mean (x-axis: Sample mean; y-axis: Frequency).]
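For a skewed population such as the exponential, one way to watch the CLT at work is to measure how the skewness of the sample means fades as n grows. The sketch below assumes NumPy and an exponential population with rate 1; the sample sizes are assumptions, since the panel labels of the figure are not recoverable. For this population the skewness of the sample mean is 2/√n in theory.

```python
import numpy as np

rng = np.random.default_rng(2016)  # arbitrary seed

def skewness(values):
    """Sample skewness: the average cubed z-score."""
    z = (values - values.mean()) / values.std()
    return np.mean(z ** 3)

skew_by_n = {}
for n in (2, 10, 50):  # sample sizes are assumed, not taken from the figure
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_n[n]:.3f} "
          f"(theory: {2 / np.sqrt(n):.3f})")
```

A skewness near zero indicates a roughly symmetric histogram, so the decreasing values correspond to sample-mean histograms that look progressively more normal.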

Normal distribution. The sampling distribution of the sample mean for a normal population is itself normal, even for small sample sizes.

[Figure: Normal Distribution (y-axis: Probability), with three histograms of the sample mean (x-axis: Sample mean; y-axis: Frequency).]
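The sample mean from a normal population is exactly normal because a linear combination of independent normal variables is itself normal. A short derivation follows; the notation X_1, ..., X_n for the individual draws is an assumption, since the notes do not introduce it.

```latex
\begin{aligned}
\bar{X}_n &= \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad X_i \sim N(\mu, \sigma^2) \ \text{independent}, \\
E[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{\sigma^2}{n}, \\
\bar{X}_n &\sim N\!\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for every } n \ge 1.
\end{aligned}
```

No appeal to the CLT is needed here, which is why the result holds even for n = 1 or n = 2.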

Sampling distributions of counts and proportions

Imagine taking a sample of size 100 from a Bernoulli population with p = 0.6. Do this 1000 times and make a histogram of the counts of successes. Take another 1000 samples of size 100 and make a histogram of the proportions of successes in the samples. What do you observe?

[Figure: three panels. Population Distribution (p=0.6), x-axis: Outcome; Sampling Distribution, Counts, x-axis: results; Sampling Distribution, Proportions, x-axis: results.]
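The experiment described above is easy to run by simulation. The following sketch assumes NumPy; the values p = 0.6, n = 100, and 1000 samples come from the exercise, while the seed and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(100)  # arbitrary seed

p, n, n_samples = 0.6, 100, 1000  # values from the exercise above

# Each row is one sample of n Bernoulli(p) outcomes (True = success).
samples = rng.random(size=(n_samples, n)) < p
counts = samples.sum(axis=1)

# A second, independent batch of samples for the proportions.
proportions = (rng.random(size=(n_samples, n)) < p).mean(axis=1)

print(f"counts:      mean = {counts.mean():.2f}, "
      f"sd = {counts.std(ddof=1):.2f}")
print(f"proportions: mean = {proportions.mean():.4f}, "
      f"sd = {proportions.std(ddof=1):.4f}")
```

The counts should center near np = 60 with standard deviation near √(np(1 − p)) ≈ 4.9, and the proportions near p = 0.6 with standard deviation near √(p(1 − p)/n) ≈ 0.049. The two histograms have the same bell shape; only the scale of the horizontal axis differs, because a proportion is a count divided by n.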

Distribution of a statistic (mean or proportion)

Two scenarios: (1) If we are studying a quantitative variable and we know the population parameters µ and σ, what can be said of the sample statistics? (2) If we know the sample statistics x̄ and s, what can be said of the population parameters? The first question is the easier to answer, but we rarely know the population parameters in practice. The second question leads to much of contemporary statistics.

We might be studying a quantitative variable in a population (normal or not) with population parameters µ and σ, and we wish to know the sampling distribution of the sample mean x̄. Or we might be studying a categorical variable in a population with proportion p, and we wish to know the sampling distribution of the sample proportion p̂. The following table summarizes the sampling distributions in both cases (Probability and Statistics, Open Learning Initiative, CMU).

| Variable | Statistic | Shape | Center | Standard Error | Conditions |
|---|---|---|---|---|---|
| quantitative (σ known) | x̄ | Normal | µ | σ/√n | n ≥ 30 or approx. normal |
| quantitative (σ unknown) | x̄ | t | µ | s/√n | n ≥ 30 or approx. normal |
| categorical | p̂ | Normal | p | √(p(1 − p)/n) | min(np, n(1 − p)) ≥ 15 |
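The standard-error column and the condition for proportions translate directly into a few helper functions. This is a sketch only; the function names and the example inputs are hypothetical and chosen purely for illustration.

```python
import math

def se_mean(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).

    Pass sigma when it is known, or the sample sd s when it is not.
    """
    return sd / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def proportion_condition_met(p, n, threshold=15):
    """Check the table's condition min(np, n(1 - p)) >= 15."""
    return min(n * p, n * (1 - p)) >= threshold

# Hypothetical inputs, chosen only to exercise the functions.
print(se_mean(sd=12.0, n=36))                    # 12 / 6
print(round(se_proportion(p=0.6, n=100), 4))     # sqrt(0.0024)
print(proportion_condition_met(p=0.6, n=100))    # min(60, 40) vs. 15
print(proportion_condition_met(p=0.05, n=100))   # min(5, 95) vs. 15
```

The last call illustrates why the condition matters: with p = 0.05 and n = 100 only about 5 successes are expected, too few for the normal shape in the table to be trusted.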

Barrels of marbles and beans

Three distributions : population, data, sampling

Here is a more leisurely view of sampling distributions. Suppose that in the middle of the classroom we have a big barrel of red and white marbles. A proportion p of this large population of marbles is red. If we actually knew exactly how many red and how many white marbles were in the barrel, we could make a small table displaying the number of red and white marbles in the barrel. Agresti and Franklin would call this table the population distribution. Beside that barrel is another large barrel full of bright green lima beans. The beans vary in size from large to small, but most are medium-sized. We are actually interested in the weights of the beans, so it is interesting to know that the average weight of the beans in this barrel is µ and the standard deviation of their weights is σ. A graph of the actual distribution of weights of the lima beans in the barrel would be an illustration of a population distribution.

Sampling distribution of the sample proportion

Next to the barrels is a small scoop. We begin by sampling from the barrel with the marbles. Scoop out a small but fixed number of marbles, n, and calculate the proportion of red marbles in your scoop, p̂. Make a small table displaying the number of red and the number of white marbles in your scoop. Agresti and Franklin would call this table the data distribution of the marbles in your sample. Another student takes a scoop of the same number n of marbles, and calculates the statistic p̂ for the marbles in her scoop. We might expect these two statistics to be similar, but probably not exactly equal. The question now is, if a large number of students took similar samples of size n from the barrel of marbles, and calculated the proportion p̂ of red marbles in each scoop, what would the collection of statistics p̂ look like? The answer to that question is called the sampling distribution of p̂. Since our samples are independent, the CLT says that, for sufficiently large n, the sampling distribution is approximately normal. It has mean p and standard deviation √(p(1 − p)/n).

Sampling distribution of the sample mean

Now turn to the barrel of lima beans. We want to study the weights of the beans in our samples, so we borrow a precision balance from the chemistry department. Scoop out a small but fixed number of lima beans, n, weigh each bean in your scoop, and calculate the average weight of the beans in your scoop, x̄. Make a strip chart (dot plot) of the weights of the beans in your sample. This strip chart is an illustration of the data distribution of the beans in your sample. Now the next student repeats the process, and calculates the average weight of the beans in the second scoop, x̄. We might expect these two statistics to be similar, but probably not exactly equal. The question now is, if a large number of students took similar samples of size n from the barrel of lima beans, and calculated the average weight of the beans x̄ in each scoop, what would the collection of statistics x̄ look like? The answer to that question is called the sampling distribution of x̄. Since our samples are independent, the CLT says that, for sufficiently large n, the sampling distribution is approximately normal, and in fact if the distribution of the population weights is actually normal, then the sampling distribution is also normal (not just approximately normal). In either case, the sampling distribution has mean µ and standard deviation σ/√n.

Standard Error

The standard deviation of a statistic is called a standard error, so in the sequel we will write

SEx̄ = σ/√n

or

SEp̂ = √(p(1 − p)/n).

Cherry Blossom Run, 2012

16,924 runners participated in the Cherry Blossom 10 Mile Run in 2012. Take the population to be all of the runners, and estimate their average time to complete the race and the average age of the participants by taking 1000 samples of 100 runners each and calculating the corresponding sample statistics. The results are illustrated in the population and sampling distributions shown below (OpenIntro Statistics, 2nd ed., pp. 159-164).

[Figure: four histograms (y-axis: Frequency). Population Distribution (time), x-axis: Time (Min); Sampling Distribution (time), x-axis: Sample mean; Population Distribution (age); Sampling Distribution (age), x-axis: Sample mean.]
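The resampling procedure for the race times can be sketched as follows. The real race data are not included in these notes, so the population below is SYNTHETIC: a shifted gamma distribution invented only so that the code runs. Only the population size (16,924), the number of samples (1000), and the sample size (100) come from the text. The sketch assumes NumPy.

```python
import numpy as np

rng = np.random.default_rng(2012)  # arbitrary seed

# SYNTHETIC stand-in for the 16,924 finishing times (minutes). The real
# race data are not reproduced here; this gamma shape is an assumption
# made only so the sketch runs.
population = 45.0 + rng.gamma(shape=9.0, scale=5.5, size=16_924)

# 1000 samples of 100 runners each, drawn without replacement.
sample_means = np.array([
    rng.choice(population, size=100, replace=False).mean()
    for _ in range(1000)
])

print(f"population mean = {population.mean():.2f}, "
      f"sd = {population.std():.2f}")
print(f"mean of sample means = {sample_means.mean():.2f}")
print(f"sd of sample means   = {sample_means.std(ddof=1):.2f}")
print(f"sigma / sqrt(100)    = {population.std() / 10:.2f}")
```

Each sample of 100 is well under 10% of the population, so treating the draws as independent is reasonable, and the standard deviation of the sample means should sit close to σ/√100 even though the synthetic population is skewed.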

Inference

Outline for one-variable inference for sample data (Peck, chapters 7–13):

The goal is to generalize from a sample to learn about a population.

• categorical variable

  - one proportion
    - confidence interval - one-sample z CI for a proportion
    - hypothesis testing - one-sample z HT for a proportion

  - difference of two proportions
    - confidence interval - two-sample z CI for a difference in proportions
    - hypothesis testing - two-sample z HT for a difference in proportions

• continuous variable

  - one mean
    - confidence interval - one-sample t CI for a mean
    - hypothesis testing - one-sample t HT for a mean

  - difference of paired means
    - confidence interval - paired t CI for a difference in means
    - hypothesis testing - paired t HT for a difference in means

  - difference of independent means
    - confidence interval - two-sample t CI for a difference in means
    - hypothesis testing - two-sample t HT for a difference in means

Exercises

We will attempt to solve some of the following exercises as a community project in class today. Finish these solutions as homework exercises, write them up carefully and clearly, and hand them in at the beginning of class next Friday.

Homework 8a – sampling distributions

Exercises from Chapter 8: 8.2 (histograms), 8.3 (imports), 8.15 (sampling distribution), 8.16 (normal), 8.26 (sampling distribution)

Homework 8b – sampling distributions

Exercises from Chapter 8: 8.27 (sampling distribution), 8.29 (polling), 8.30 (credit card), 8.50 (sample size), 8.52 (hurricane)
