A new sampling method: stratified sampling In stratified sampling, we conduct SRS in each stratum...

A new sampling method: stratified sampling

• In stratified sampling, we conduct SRS in each stratum

• Outline
  – Definition and motivation
  – Statistical inference (theory of stratified sampling)
  – Advantages of stratified sampling
  – Sample size calculation

Stratified sampling: definition and motivation

• A motivating example: average number of words in the saved messages of people in this room

• What is stratified sampling?
  – Stratify: make layers
  – Strata: subpopulations
    • Strata do not overlap
    • Each sampling unit belongs to exactly one stratum
    • Strata constitute the whole population

Why do we use stratified sampling?

• Protection against obtaining a really bad sample. Example:
  – Population size is N = 500 (250 women and 250 men)
  – SRS of size n = 50
  – It is possible to obtain a sample with no men or only a few men
  – Pr(15 or fewer men in an SRS) = 0.003
  – Pr(20 or fewer men in an SRS) = 0.10

• In stratified sampling, we can sample 25 men and 25 women
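
The two probabilities in this example can be checked directly, because the number of men in an SRS drawn without replacement follows a hypergeometric distribution. A minimal sketch using only the Python standard library (the function name `prob_at_most` is an illustrative choice, not from the slides):

```python
from math import comb

def prob_at_most(k, n_men=250, n_women=250, n=50):
    """P(number of men in an SRS of size n is <= k), hypergeometric."""
    total = comb(n_men + n_women, n)  # number of possible samples
    favourable = sum(comb(n_men, x) * comb(n_women, n - x) for x in range(k + 1))
    return favourable / total

p15 = prob_at_most(15)
p20 = prob_at_most(20)
print(f"Pr(15 or fewer men) = {p15:.4f}")
print(f"Pr(20 or fewer men) = {p20:.4f}")
```

Under stratified sampling with 25 men and 25 women, both probabilities are exactly zero by design.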

Why do we use stratified sampling?

• Stratified sampling allows us to compare subgroups

• Convenient, reduces cost, easy to sample

• More precise. See the following example

Total number of farm acres (3078 counties)

• SRS of 300 counties from the Census of Agriculture
  – Gives an estimate of the total and its standard error

• Stratified sampling: sample about 10% of the counties in each stratum (region)

Total number of farm acres (3078 counties)

Estimate: Standard error:

Theory of stratified sampling

Notation for Stratification: Population

Notation for Stratification: Sample

Stratified sampling: estimation

Statistical Properties: Bias and Variance

Variance Estimates for stratified samples

Confidence intervals for stratified samples

Some books use the t distribution with n − H degrees of freedom
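
The estimation steps can be sketched in a few lines of Python. This sketch assumes the standard formulas: the stratified mean is the sum of (N_h/N) times each stratum sample mean, its estimated variance is the sum of (1 − n_h/N_h)(N_h/N)² s_h²/n_h, and the interval uses a normal critical value. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def stratified_mean(strata, z=1.96):
    """strata: list of (N_h, sample_values) pairs, one per stratum.
    Returns (estimate, standard error, (lower, upper))."""
    N = sum(N_h for N_h, _ in strata)
    est = 0.0
    var = 0.0
    for N_h, y in strata:
        n_h = len(y)
        est += (N_h / N) * mean(y)
        # finite population correction * squared stratum share * s_h^2 / n_h
        var += (1 - n_h / N_h) * (N_h / N) ** 2 * variance(y) / n_h
    se = sqrt(var)
    return est, se, (est - z * se, est + z * se)

# Hypothetical data: two strata with N_1 = 1600 and N_2 = 400
strata = [(1600, [10, 12, 11, 13]), (400, [20, 22, 19, 23])]
est, se, ci = stratified_mean(strata)
print(est, se, ci)
```

With samples this small, replacing `z` by a t critical value with n − H degrees of freedom would be the more cautious choice.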

Sampling probabilities and weights

In a population with 1600 men and 400 women, suppose the stratified sample design specifies sampling 200 men and 200 women.

• Each man in the sample has weight 8 and each woman has weight 2

• Each woman in the sample represents herself and 1 other woman not selected

• Each man represents himself and 7 other men not in the sample
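
The weights in this example come from dividing each stratum size by its sample size. A small sketch of that arithmetic (the variable names are illustrative):

```python
# Stratum sizes N_h and sample sizes n_h from the example
population = {"men": 1600, "women": 400}
sample = {"men": 200, "women": 200}

# Sampling weight = N_h / n_h, the inverse of the sampling probability n_h / N_h
weights = {h: population[h] / sample[h] for h in population}

# Summing the weights over all sampled units recovers the population size N
total_weight = sum(weights[h] * sample[h] for h in population)
print(weights, total_weight)
```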

Sampling probabilities and weights

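• The sampling probability for the jth unit in the hth stratum is $\pi_{hj} = n_h / N_h$

• Sampling weight: $w_{hj} = 1/\pi_{hj} = N_h / n_h$

• The sum of the sampling weights over all sampled units is N, since $\sum_{h=1}^{H} n_h \cdot (N_h / n_h) = N$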

Sampling probabilities and weights

Sampling probabilities and weights: example

Sampling probabilities and weights in proportional allocation

• In proportional allocation, the number of sampled units in each stratum is proportional to the size of the stratum, i.e., $n_h / N_h = n / N$ for every stratum h

• Every unit in the sample has the same weight and represents the same number of units in the population. The sample is called self-weighting

Sampling probabilities and weights in proportional allocation

Sampling probability for all units is about 10%

All the weights are the same: 10

An example of stratified sampling

Observed data

Spreadsheet for calculations in the example

Stratified sampling for proportions

Allocating observations to strata

• In the theoretical derivation and examples of stratified sampling, we assumed that someone had already designed the survey.

• Survey design is the most important part of using a survey in research
  – If we use a badly designed survey, there is no way that we can get the correct result

• The problem of allocating observations to strata concerns how one should determine the sample size (or relative sample size) of each stratum.

Proportional Allocation

• In proportional allocation
  – The number of sampled units in each stratum is proportional to the size of the stratum
  – The probability of selection is the same (= n/N) for all strata
  – Every unit in the sample has the same weight (= N/n) and represents the same number of units in the population
  – The sample is a self-weighting sample
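
A minimal sketch of proportional allocation, reusing the stratum sizes 1600 and 400 from the weights example and assuming a hypothetical total sample size of n = 400. The function name is illustrative. Note that rounding can make the allocated sizes sum to slightly more or less than n in general.

```python
def proportional_allocation(stratum_sizes, n):
    """Return n_h = n * N_h / N for each stratum, rounded to whole units."""
    N = sum(stratum_sizes)
    return [round(n * N_h / N) for N_h in stratum_sizes]

stratum_sizes = [1600, 400]          # N_h
alloc = proportional_allocation(stratum_sizes, n=400)

# Every unit gets the same weight N_h / n_h = N / n (self-weighting sample)
weights = [N_h / n_h for N_h, n_h in zip(stratum_sizes, alloc)]
print(alloc, weights)
```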

Stratified sampling (with proportional allocation) vs SRS

• What is the benefit of using stratified sampling (with proportional allocation)?

• Under what conditions is stratified sampling (with proportional allocation) better than SRS?

• To compare the two sampling methods, we need to compare the between-strata and within-strata variances

Analysis of Variance (ANOVA) for the population

Stratified sampling (with proportional allocation) vs SRS

Stratified sampling (with proportional allocation) vs SRS

Stratified sampling (with proportional allocation) vs SRS

• Stratified sampling with proportional allocation rarely gives a larger variance than SRS when the stratum sizes are large.

• The more unequal the stratum means, the more precision we gain by using stratified sampling with proportional allocation
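
Both statements follow from the population ANOVA decomposition. The block below is a sketch of the standard textbook argument for the estimated population total; the symbols SSW (within-strata sum of squares), SSB (between-strata sum of squares), $\bar{y}_{hU}$ (stratum mean) and $\bar{y}_U$ (population mean) are assumed notation and may differ from the original slides.

```latex
\begin{align*}
\mathrm{SSTO} &= \sum_{h=1}^{H}\sum_{j=1}^{N_h} (y_{hj} - \bar{y}_U)^2
   = \mathrm{SSW} + \mathrm{SSB}, \\
\mathrm{SSW} &= \sum_{h=1}^{H} (N_h - 1) S_h^2, \qquad
\mathrm{SSB} = \sum_{h=1}^{H} N_h (\bar{y}_{hU} - \bar{y}_U)^2, \\
V_{\mathrm{prop}}(\hat{t}_{str}) &= \left(1 - \frac{n}{N}\right) \frac{N}{n}
   \sum_{h=1}^{H} N_h S_h^2, \\
V_{\mathrm{SRS}}(\hat{t}) &= V_{\mathrm{prop}}(\hat{t}_{str})
   + \left(1 - \frac{n}{N}\right) \frac{N}{n(N-1)}
   \left[ N \cdot \mathrm{SSB} - \sum_{h=1}^{H} (N - N_h) S_h^2 \right].
\end{align*}
```

So proportional allocation has the larger variance only when $\mathrm{SSB} < \sum_{h} (1 - N_h/N) S_h^2$, that is, when the stratum means are nearly equal relative to the within-stratum variability.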

Optimal Allocation

• Stratified sampling with proportional allocation is easy to conduct

• It is more precise than SRS in most situations

• But it is not necessarily the most efficient stratified sampling design

• This is especially true when the variances vary substantially from stratum to stratum

Optimal allocation

• The goal of optimal allocation is to gain the most information for the least cost.

• We can assume that the total cost is fixed. Given that, we want to minimize the variance

• Different types of cost
  – Total cost: $C$
  – Overhead cost such as maintaining an office: $c_0$
  – The cost of taking an observation in stratum h: $c_h$

Optimal allocation

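Want to minimize

$$\mathrm{Var}[\bar{y}_{str}] = \sum_{h=1}^{H} \left(1 - \frac{n_h}{N_h}\right) \frac{N_h^2}{N^2} \frac{S_h^2}{n_h} = \sum_{h=1}^{H} \frac{N_h^2}{N^2} \frac{S_h^2}{n_h} - \sum_{h=1}^{H} \frac{N_h}{N^2} S_h^2$$

subject to

$$C = c_0 + \sum_{h=1}^{H} c_h n_h$$

Recall that

$$\bar{y}_{str} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{y}_h$$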

Optimal allocation

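• Introducing a Lagrange multiplier λ, we need to minimize

$$L(n_1, \ldots, n_H, \lambda) = \mathrm{Var}[\bar{y}_{str}] + \lambda \left( c_0 + \sum_{h=1}^{H} c_h n_h - C \right)$$

• Take the partial derivative with respect to each $n_h$ and set it to zero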

Optimal allocation

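$$\frac{\partial L}{\partial n_h} = -\frac{N_h^2}{N^2} \frac{S_h^2}{n_h^2} + \lambda c_h = 0$$

$$\Rightarrow \quad n_h = \frac{1}{N\sqrt{\lambda}} \cdot \frac{N_h S_h}{\sqrt{c_h}}$$

Summing over the strata gives $n = \sum_{h=1}^{H} n_h = \frac{1}{N\sqrt{\lambda}} \sum_{l=1}^{H} N_l S_l / \sqrt{c_l}$, so

$$n_h = n \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l / \sqrt{c_l}}$$

where n is the total sample size, which is unknown.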

Optimal allocation

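• We need to find the value of n

• Recall that the total cost C is fixed, i.e.,

$$C = c_0 + \sum_{h=1}^{H} c_h n_h = c_0 + n \cdot \frac{\sum_{h=1}^{H} N_h S_h \sqrt{c_h}}{\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}}$$

$$\Rightarrow \quad n = (C - c_0) \cdot \frac{\sum_{h=1}^{H} N_h S_h / \sqrt{c_h}}{\sum_{h=1}^{H} N_h S_h \sqrt{c_h}}$$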

Optimal allocation

• Combining the results, we have

$$n_h = (C - c_0) \cdot \frac{N_h S_h / \sqrt{c_h}}{\sum_{l=1}^{H} N_l S_l \sqrt{c_l}}$$
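
The combined result, n_h = (C − c_0)(N_h S_h/√c_h) / Σ_l N_l S_l √c_l, can be sketched as a short Python function. The function name and all input values below are hypothetical. In practice the S_h are unknown and replaced by estimates, the n_h must be rounded, and any n_h that exceeds N_h has to be capped.

```python
from math import sqrt

def optimal_allocation(N, S, c, C, c0=0.0):
    """Cost-constrained optimal allocation.
    N, S, c: per-stratum sizes, standard deviations, and unit costs.
    C: total budget; c0: overhead cost. Returns unrounded n_h."""
    denom = sum(N_l * S_l * sqrt(c_l) for N_l, S_l, c_l in zip(N, S, c))
    return [(C - c0) * (N_h * S_h / sqrt(c_h)) / denom
            for N_h, S_h, c_h in zip(N, S, c)]

# Hypothetical inputs: two strata
N = [1000, 500]
S = [10.0, 40.0]
c = [1.0, 4.0]
n_h = optimal_allocation(N, S, c, C=1100.0, c0=100.0)

# The allocation spends exactly the budget: c0 + sum(c_h * n_h) == C
cost = 100.0 + sum(c_l * n_l for c_l, n_l in zip(c, n_h))
print(n_h, cost)
```

The allocation samples more heavily in strata that are large, variable, or cheap to sample.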

Optimal allocation: two special situations

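$$c_1 = c_2 = \cdots = c_H = c^*$$

In this situation,

$$C = c_0 + n c^* \;\Rightarrow\; n = \frac{C - c_0}{c^*}, \quad \text{and} \quad n_h = n \cdot \frac{N_h S_h}{\sum_{l=1}^{H} N_l S_l}$$

This is called Neyman's allocation.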
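
With equal unit costs the allocation n_h = n N_h S_h / Σ_l N_l S_l depends only on stratum sizes and standard deviations. The sketch below compares the variance of the stratified mean under this allocation and under proportional allocation for the same total n. The function names and numbers are hypothetical.

```python
def neyman_allocation(N, S, n):
    """n_h = n * N_h * S_h / sum_l(N_l * S_l), unrounded."""
    total = sum(N_h * S_h for N_h, S_h in zip(N, S))
    return [n * N_h * S_h / total for N_h, S_h in zip(N, S)]

def var_stratified_mean(N, S, n_h):
    """Var[y_bar_str] = sum (1 - n_h/N_h) (N_h/N)^2 S_h^2 / n_h."""
    N_tot = sum(N)
    return sum((1 - a / N_h) * (N_h / N_tot) ** 2 * S_h ** 2 / a
               for N_h, S_h, a in zip(N, S, n_h))

# Hypothetical population: the smaller stratum is much more variable
N = [1000, 500]
S = [10.0, 40.0]
n = 150

neyman = neyman_allocation(N, S, n)
proportional = [n * N_h / sum(N) for N_h in N]

v_neyman = var_stratified_mean(N, S, neyman)
v_prop = var_stratified_mean(N, S, proportional)
print(neyman, proportional, v_neyman, v_prop)
```

In this made-up case the more variable stratum receives the larger share of the sample, and the variance is smaller than under proportional allocation.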

An example

Optimal allocation: two special situations

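$$c_1 = c_2 = \cdots = c_H = c^* \quad \text{and} \quad S_1 = S_2 = \cdots = S_H = S^*$$

In this situation,

$$n_h \propto \frac{N_h S_h}{\sqrt{c_h}} \propto N_h \;\Rightarrow\; n_h = n \cdot \frac{N_h}{N}$$

This is proportional allocation.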

Optimal allocation for fixed variance (v)

• One may want to minimize cost for fixed variance

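• Mathematically, we want to minimize

$$C = c_0 + \sum_{h=1}^{H} c_h n_h \quad \text{subject to} \quad \mathrm{Var}[\bar{y}_{str}] = v$$

• One can use a Lagrange multiplier to show that

$$n_h \propto \frac{N_h S_h}{\sqrt{c_h}}$$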

Some practical issues

• Stratified sampling often gives higher precision than SRS

• But how to define strata?

• Stratification is most efficient when stratum means differ widely

Define strata

• Try to find some variables closely related to y
  – E.g., for farm income, use the size of a farm as a stratification variable
  – For estimating total business expenditures on advertising, stratify by number of employees or by the type of product

• Get information from experts, old data, preliminary data, etc.

Effects of unknown strata sizes and variances

• Unknown strata sizes and variances cause bias

• One can use a pilot study to obtain good estimates of strata sizes and variances

Summary

• Stratified sampling almost always gives higher precision than SRS

• Stratification adds complexity to a survey, e.g., when strata sizes and variances are unknown

• In many situations, the potential gains from stratification are large enough to justify the effort of stratifying the population and the expense of conducting pilot studies

Poststratification

• Suppose a sampling frame lists all households in an area

• You would like to estimate the average amount spent on food in a month

• One desirable stratification variable is household size
  – Large households are expected to have higher food bills

• The distribution of household size is known (from U.S. census data)

An example of poststratification

The distribution of household size from the U.S. census

An example of poststratification

• The sampling frame does not include information on household size, so we cannot draw a stratified sample based on household size

• We take an SRS and record
  – The amount spent on food
  – The household size

• If n (of the SRS) is large enough, we expect about 26% one-person households, about 31% two-person households, and so on

An example of poststratification

• We can use the methods of stratified sampling to estimate the average amount spent on food for each category of household sizes

• After the observations are taken, we can form a “stratified” estimate of the population mean
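
A minimal sketch of the poststratified estimate of the mean: group the SRS observations by household size after sampling, then weight each group mean by its known population share. The 26% and 31% shares are the figures quoted for one- and two-person households. Collapsing all larger households into a single "3+" group with the remaining 43%, the function name, and the spending figures are assumptions made purely for illustration.

```python
from statistics import mean

def poststratified_mean(sample, pop_shares):
    """sample: list of (poststratum_label, y) pairs from an SRS.
    pop_shares: dict mapping each label to its known population share N_h/N."""
    by_stratum = {}
    for label, y in sample:
        by_stratum.setdefault(label, []).append(y)
    # Raises KeyError if a poststratum has no sampled units
    return sum(share * mean(by_stratum[label])
               for label, share in pop_shares.items())

# Known shares (the "3+" group is an illustrative simplification)
pop_shares = {"1": 0.26, "2": 0.31, "3+": 0.43}

# Hypothetical SRS: (household size group, monthly food spending)
sample = [("1", 200), ("1", 240), ("2", 380), ("2", 420), ("3+", 600), ("3+", 640)]

est = poststratified_mean(sample, pop_shares)
print(est)
```

The estimate differs from the plain sample mean whenever the sample shares of the groups differ from the population shares.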

An example of poststratification

An example of poststratification

• Discuss the example

An example of poststratification

• Poststratification can be dangerous

• You can obtain arbitrarily small variances if you choose the strata after seeing the data

• Poststratification is most often used to correct for the effects of differential nonresponse in the poststrata (chapter 8)

A new sampling method

• Motivating example

• Want to study the average amount of water used per person

• How would you design a survey?