Sampling and Survey

54
Outline Introduction Basic Concepts Sampling methods Sampling and Survey April 22, 2009 Sampling and Survey

Transcript of Sampling and Survey

Page 1: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Sampling and Survey

April 22, 2009

Sampling and Survey

Page 2: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

1 IntroductionSampling in life

2 Basic ConceptsSurvey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

3 Sampling methodsSimple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Sampling and Survey

Page 3: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Sampling in life

Sampling-Induction

Sampling food and drink

Sampling a new product to be introduced or already in themarket.

Your example of preference.

Sampling and Survey

Page 4: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Sampling in life

When observed instances are identical or almost so, only a feware needed to be observed.

Repeated observations are for confirmation.

When items indeed vary, drawing conclusion based on one orfew samples may be risky.

Repeated observations are then required to make stronginferences.

Sampling and Survey

Page 5: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Sampling in life

Sampling is applied to

It lets finding out characteristics and events in:

Human’s

Plant populations

Physical objects

Animals’s

Etc.

Sampling and Survey

Page 6: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

Meeting the objectives

In order to meet easily the survey’s objectives, the following are tobe considered

• Subject matter issues.

Population to be surveyedStatistics to be obtainedData to be collectedTime periodsAccuracyAnalysisReports’ design and date of delivery.

• Operational issues.

Methods for obtaining the dataSurvey data processingMonitoring operational performanceTimetables for survey operations

Sampling and Survey

Page 7: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

• Administrative issues.

Project approvalsFinanceRecruiting, Training, Supervision, remuneration, transportation,Office space, Equipment, supplies, communications, relation withpublic, informants, staff, etc.

• Sampling issues

Sample frameSample size and its allocationDomains of studyMethods for estimating the survey results and their sample andnonsample errors.

Sampling and Survey

Page 8: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

Element is an object in which a measurement is taken.

Population is a collection of elements about which we wishto make an inference

Sampling units are non-overlapping collections of elementsfrom the population that cover the entire pop.

Frame is a list of sampling units.

Sample is a collection of sampling units drawn from a frameor frames.

Sampling and Survey

Page 9: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

The goal is estimate population parameters (θ)from the samplesuch as

Mean

Total

Proportions

Etc.

To determine the sampling method and the sample size

Every item contains a certain amount of information.

The quantity of information a sample has depends on thenumber of items in it and the variability.

θ estimates θ in the population, then

error of estimation =| θ − θ |< B (the bound)

and, the certanty

P[| θ − θ |< B] = (1− α)

Sampling and Survey

Page 10: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

The common value for B = 2σθHence, the method of sampling is chosen, based on thefollowing: Choose the sampling method with the highestcertainty and the lowest error of estimation, together with theminimum cost.

Sampling and Survey

Page 11: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

1 Simple random sample. Any sample of size n has the samechance of being selected.Property. It will contain as much information from thepopulation as any other sample survey design if thatpopulation were homogeneous.

2 Stratified random sample. Nonetheless, suppose that popconsists of items clearly identified as belonging to any of anumber of groups (i.e. people belonging to either group oflow income or high income), estimation from that pop will beremarkably more accurate if the sample comes from all of thegroups.This is achieved by knowing the presence of auxiliaryvariables, that define the groups or strata.

Sampling and Survey

Page 12: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

3 Cluster Sampling. Usually used in urban areas. This methodconsists of simple random selection of groups. Such as cityblocks, families, or apartment buildings. Once a given group isselected, all the items in the group are sampled.

4 Systematic Sample. Sometimes a list of items is available.This method consists on randomly selecting one item close tothe beginning of the list and proceed the selection of thesample every tenth or fifteenth item thereafter.

Sampling and Survey

Page 13: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

1 Errors of non-observation.

Sampling. The data observed in a sample will not preciselymirror the data in the pop regardless how careful themeasurements are done.. The most common error is theSample Bias.Coverage. The sampling frame does not match up perfectlywith the target pop.Non-response. The inability to contact the sampled item, theinability of an interviewed to come up with the answer, or justbecause the interviewed refuses to answer.

Sampling and Survey

Page 14: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Survey DesignSamplingHow to select the sampleProbability Sampling methodsSources of errors in sampling

2 Error of observation.

Enumerators. Have a direct and dramatic effect on the way ainterviewed responds to a question. Reading withoutappropriate emphasis or intonation.Respondents. Inability to answer correctly.Instrument. The inability to clarify the measurement units aquestion refers to, inches or centimeter. Clarifying terms:employment or unemployment rates.Method of data collection. Either direct interview, by phonecalls, or by mail, now by websites.

Sampling and Survey

Page 15: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Simple Random Sample

Sampling and Survey

Page 16: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Definition. Simple Random Sample is a sampling proceduresuch that a sample size n is drawn from a pop size N, in away that any sample of the same size has the same chance ofbeing selected.

How to draw a simple random sample. UsingR − Software load and install Sampling package. srswor(n,N)function selects the sample for you.

Estimation of a population mean, total, and proportion

Parameter Mean Variance Bound

Mean µ = y =

nXi=1

yi

nV (y) = s2

n(1− n

N) 2

qV (y)

Total τ = Ny =

N

nXi=1

yi

nV (Ny) = N2 s2

n(1− n

N) 2

qV (Ny)

Proportion p = y =

nXi=1

yi

nV (p) = pq

n−1(1− n

N) 2

qV (p)

Sampling and Survey

Page 17: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The following code does the sample selection and compute the CIfor the mean of the Height for you. The data set used is the oneyou had accessed before in a file posted in the LISA− website, forthis course.

Sampling and Survey

Page 18: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Assuming n = 10 and N = 57.

n=10; N=57;sam=srswor(n,N);srsexample=getdata(classsur,sam);meanHeight=mean(classsur$Height,na.rm=TRUE)varHeight=var(classsur$Height,na.rm=TRUE)meanHeight.sample=mean(srsexample$Heightna.rm=TRUE)varHeight.sample=var(srsexample$Height,na.rm=TRUE)var.simpless.Height=(varHeight.sample/n)*(1-n/N)B=2*sqrt(var.simpless.Height)Estimation=data.frame(meanHeight.sample,var.simpless.Height,meanHeight.sample-B,meanHeight, meanHeight.sample+B)colna=c("Height sample Mean","Height sample Var","LCL","RealMean","UCL")colnames(Estimation)=colna

Sampling and Survey

Page 19: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

This is the output for the code above. (it is a 95% CI),

yHeight s2Height LCL µHeight UCL

66.20 0.98 64.22 67.02 68.18

Sampling and Survey

Page 20: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

60 62 64 66 68 70

6466

6870

72

CI for estimating the population mean of the Height

The number of times the CI covers the real value is 96 out of 100Seq

Hei

ght M

ean

Figure: CI for the µHeight

Sampling and Survey

Page 21: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Sample size to estimate µ with a bound on the error ofestimation B is given by

n =Nσ2

(N − 1)B2

4 + σ2

Sample size to estimate τ with a bound on error B,

n =Nσ2

(N − 1) B2

4N2 + σ2

Sample size to estimate p (proportion) with a bound on errorB,

n =Npq

(N − 1)B2

4 + pq

Sampling and Survey

Page 22: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example for the mean. Assume that our data set is the frame ofour population. We want to estimate the mean of the Height.From a previous study, we know that the standard deviation of thepopulation’s Height is about 4.5. We also required an error ofestimation B=2 inches. What is the needed sample size?Soln. We have to compute the sample size. Remember thatN=57. Thus

n =57 ∗ 4.52

(57− 1) 22

4 + 4.52= 15.14 ' 16

Sampling and Survey

Page 23: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example on proportion. Now we want to estimate the proportionof women in this class. We do not know anything about women¨ !!!. We also required an error of estimation B=0.05. What is

the sample size needed get our estimation.Soln.

n =57(0.5)(0.5)

(57− 1) 0.052

4 + (0.5)(0.5)= 50

This large value of the sample size is due to (1) We do not knowmuch about the proportion, and p=0.5 is the value that maximizesthe variance in the pop, and (2) because the population size isreally small.

Sampling and Survey

Page 24: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Stratified Random Sample

Sampling and Survey

Page 25: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Definition. It is obtained by separating the populationelements into homogenous groups, called strata.

How to draw a stratified. Using R − software, the stratafunction in the Sampling package, does the selection for you.

Estimation of a population mean, total, and proportion

Parameter Mean Variance Bound

Mean yst = 1N

LXi=1

Ni yi V (yst ) = 1N2

LXi=1

N2i (1−

ni

Ni

)(s2i

ni

) 2q

V (yst )

Total Nyst =LX

i=1

Ni yi V (Nyst ) =LX

i=1

N2i (1−

ni

Ni

)(s2i

ni

) 2q

V (Nyst )

Proportion pst = 1N

LXi=1

Ni pi V (pst ) = 1N2

LXi=1

N2i (1−

ni

Ni

)(pq

ni − 1) 2

qV (pst )

Sampling and Survey

Page 26: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The following example is again based on our data set. Now wedefine Gender as the variable to stratify. This variable has twostrata. We also want to estimate the mean of the Height. Using R,we get the following output.

Sampling and Survey

Page 27: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Table: Stratified random sample R-output

Height Weight StudyHrs SleepHrs Job Textpay Gender ID unit Stratum nh Nh

65 155 10 7 1 150 2 5 1 6.00 23.00

67 120 7 7 1 180 2 11 1 6.00 23.00

64 106 13 6 2 200 2 21 1 6.00 23.00

67 na 15 7 2 198 2 33 1 6.00 23.00

na an 14 4 2 260 2 45 1 6.00 23.00

68 122 13 6 1 250 2 55 1 6.00 23.00

72 160 9 7 2 111 1 4 2 6.00 34.00

70 160 8 6 2 245 1 17 2 6.00 34.00

67 147 10 5 1 220 1 31 2 6.00 34.00

75 160 9 9 1 200 1 40 2 6.00 34.00

67 153 15 9 2 90 1 44 2 6.00 34.00

58 155 16 6 1 200 1 57 2 6.00 34.00

Sampling and Survey

Page 28: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

In R, by the function syvdesign we analyzed the data and alongwith syvmean function, the following result is obtained,

mean s2Heightst

LCI Real Mean UCI

Height 67.92 0.79 66.15 67.02 69.70

It is clear that this method gives a more precise mean estimation.It is because comparing the variances of both methods, this oneabove is smaller. Then a narrower CI.

Sampling and Survey

Page 29: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The sample size is then computed

The sample size required to estimate µ or τ with a bound Bon the error of estimation is

n =

L∑i=1

N2i σ

2i /wi

N2D +L∑

i=1

Niσ2i

,

where D = B2

4 for µ or D = B2

4N2 for τ , and wi is the fractionof observation allocated to stratum i . Thus,

wi = ni/n

Sampling and Survey

Page 30: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The sample size required to estimate a proportion with abound B on the error of estimation is

n =

L∑i=1

N2i piqi/wi

N2 B2

4 +L∑

i=1

Nipiqi

Again,wi = ni/n

Sampling and Survey

Page 31: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example for estimating the mean. Going back to our data set. Wewant to estimate the mean under the strata scheme. We have twostrata defined by Gender variable. We barely know that on eachstrata the standard deviation of the Height is approximately: 4 and3. We know that N = 57 = NGender=1(= 23) + NGender=2(= 34).We want a sample equally weighted across strata. Determine thesample size, with an error of estimation=2 inches.From our basic statistic courses, we know that the standarddeviation is approximately one fourth of the range. Hence,

D = 22/4 = 1

n =232(4)2/(1/2) + 342(3)2/(1/2)

5721 + (23(4)2 + 34(3)2= 9.6 ' 10

[Note: compare this sample size with the one obtained by Simple Random Sample]

Sampling and Survey

Page 32: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

AllocationThere are two most important ways to allocate the sample intostrata:

Considering the cost, which is the general allocation

ni = n

Niσi/

√ci

L∑i=1

Niσi/√

ci

Neyman allocation, making ci = c ,∀ i=1, . . . L

ni = n

Niσi

L∑i=1

Niσi

Sampling and Survey

Page 33: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example. Continuation of the last example. Allocate thecalculated sample size into those strata, assuming that the realstandard deviation are those stated in the problem. Assuming also,equal costs.

n1 = 10

(23 ∗ 4

23 ∗ 4 + 34 ∗ 3

)= 4.7 ' 5

n2 = 10

(34 ∗ 3

23 ∗ 4 + 34 ∗ 3

)= 5.2 ' 5

Sampling and Survey

Page 34: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Systematic Sample

Sampling and Survey

Page 35: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Definition. A sample is obtained by randomly selecting onelement from the first k 6 N

n elements and every kth elementthereafter.

Pros and Cons.

It is easier to perform in the field and therefore is less subjectto selection errors.Estimates can be more preciseIf the population is grouped, the sample will containinformation among those groups.The sample is spread uniformly over the populationWithout knowing the population, it could induce some of thesampling bias.To get a more accurate estimation of the estimator’s variancemultiple systematic samples are required.

Sampling and Survey

Page 36: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Estimation (From only one systematic sample). Either amean, a total, or a proportion estimation is required, the formulasare the same as those for Simple Random Sample.How to draw a systematic sample. Using R, the functionUPsystematic draws the sample for you. It is necessary to define avector of probabilities. Since this method can be generalized to aunequal sampling probability. For this case, that probability is thesame for every single item.

Sampling and Survey

Page 37: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Exercise. Select a sample of size n = 10, systematically. Using R,we get the following table. It is only part of the table.

ID unit Gender Height Weight

2 2 2 71 1587 7 2 66 125

13 13 2 61 11519 19 2 64 na25 25 2 67 13030 30 2 63 12336 36 2 65 11842 42 1 73 17547 47 1 75 22553 53 2 68 175

It is clear that this induced sample bias. Do you notice it?

Sampling and Survey

Page 38: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

In the case of repeated systematic samples, we clearly can estimatethe variance of ysy through ANOVA idea. In the following context.Assume the following table,

Cluster Sample NumberMean

number 1 2 . . . ns

1(initials) y11 y12 . . . y1ns y1

2 y21 y22 . . . y2ns y2...

......

......

...c yc1 yc2 . . . ycns yk

y

V (µ) =(N − n)

N

∑ci=1(yi − y)2

c(c − 1)Note:

V (ysy ) =σ2

n(1 + (n − 1)ρ)

Sampling and Survey

Page 39: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

How to draw multiple systematic samples?

One Sys Multiple Sys

N=960 N=960n=60 n=60

k = 96060 = 16 10k = 10 ∗ 16 = 160 = k ′

c=1 w. ns = n c=6 w.ns = 101 in k 10 in k’

Sampling and Survey

Page 40: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Exercise. Based on the sample we got in the exercise above, usingR, we got the following result.The sample mean of the Height and the standard error of thismean are given by,

Mean SE

Height 67.30 1.29

And

LCI Real mean UCI

64.72 67.02 69.88

This CI is wide. It is because this sampling method induced asample bias, it is also increasing the variability on the estimation.

Sampling and Survey

Page 41: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Textbook example. The QC section uses systematic sampling toestimate the average amount of fill in 12-ounces cans. The datawas obtained in a 1 in 15 sys. sample in one day. Estimate µ andplace a bound on the error of estimation.

Amount of fill (in ounces)

12.00 11.91 11.87 12.05 11.75 11.8511.97 11.98 12.01 11.87 11.93 11.9812.01 12.03 11.98 11.91 11.95 11.8712.03 11.98 11.87 11.93 11.97 12.0512.01 12.00 11.90 11.94 11.93 12.0211.80 11.83 11.88 11.89 12.05 12.04

This is clearly a one systematic sample.

Sampling and Survey

Page 42: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Soln. To apply the analysis for a multiple systematic sampling, weassume that we have c = 6 clusters and ns = 6.

The sample mean is (which is the pop mean estimator):

y =12.0 + 11.97 + 12.01 + . . .+ 12.02 + 12.04

36= 11.95

The sample variance, given that we can compute 6 differentmeans which are:

11.91 11.96 11.96 11.97 11.97 11.92,

is ∑6i=1(yi − y)2

(6− 1)= 0.0007985.

Sampling and Survey

Page 43: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Now, just using the formula for the variance of the multiplesystematic sampling, and under the assumption that N is unknownand large (N−n

N = 1), we get,

V (µ) =0.0007985

6= 0.000133

The bound is:2√

0.000133 = 0.023

The 95% CI for the pop mean is:

(11.92, 11.97)

Sampling and Survey

Page 44: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Definition. A cluster sample is a probability sample in whicheach sampling unit is a ”collection” of elements. Thosecollections could be,

City blocksLand plotsForest regionsetc.It reduces the cost of obtaining observations by reducing thedistances among elements

How to draw a sample. Take advantage of R, using clusterfunction in Sampling package. It is needed to have in thedata set a clustering variable. Then a Simple Random Sampleon the clusters is carried out.

Sampling and Survey

Page 45: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example. This is a new data set. For this data set select a sampleof 15 states. The columns on this data set are (From US Bureu of the Census,

1995):

Column Description

1 State US states

2 ExpPP90 and 92 Expenditure (dls) per student

3 ExpPC90 and 92 Expenditure (dls) per capita

4 TeaSal90 and 92 Average teacher salary (thousand dls)

5 Comp1 % of residents over 25, w. complete High School, 1990

6 Dropout % of people 16-19 of age dropped out HS, 1990

7 Region 1:NE, 2:Midwest, 3: S, 4:W

8 Pop Population in millions, 1992

9 Enroll Students enrolled, 1990 (thousands)

10 Teachers Teachers, 1990 (thousands)

Sampling and Survey

Page 46: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Soln. Using R, we got the following result.

STATE

1 Alaska2 California3 Colorado4 Connecticut5 DC6 Florida7 Indiana8 Iowa9 Louisiana

10 N Dakota11 New Mexico12 Ohio13 Oklahoma14 Pennsylvania15 Texas

Sampling and Survey

Page 47: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

As it could be expected, estimations’ formulae become messier.Expressions for estimating the mean and total of a population, are

µ = y =

∑ni=1 yi∑ni=1 mi

, and τ = My

It is needed to define some other terms,

Term Description

N Number of clusters in the popn Number of clusters selected

mi Number of elements in cluster im Average cluster size for the sampleM Number of elements in the pop

M = M/N Average cluster size in popyi Total of all observations in cluster i

Sampling and Survey

Page 48: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The estimated variance of µ and τ ,

V (y) =(N − n)

NnM2s2r , and V (τ) = M2V (y); s2

r =

n∑i=1

(yi − ymi )2

n − 1

or for estimating τ without M dependence,

Nyt =N

n

n∑i=1

yi ; V (τ) = N2 (N − n)

Nns2t ; s2

t =

∑ni=1(yi − yt)2

n − 1

M can be estimated by m if M is unknown, for estimating µ.

Sampling and Survey

Page 49: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Exercise. For the sample selected above, estimate the averageteachers’ salary for the US, 1992. Place a bound of error ofestimation.Using R, we get the following 95% CI for estimating the populationmean.

y sr LCI Mean (pop) UCI

Teachers’ mean 33.99 1.66 30.67 34.14 37.31

R estimated M by m.Since we know the values for N and M, thus, doing thecomputation in Excel, the following results are obtained.

y sr LCI Mean (pop) UCI

Teachers’ mean salary 33.99 2.37 29.25 34.14 38.74

Sampling and Survey

Page 50: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The sample size determination is done by the following formulas.

To estimate µ with a bound B on the error,

n =Nσ2

r

ND + σ2r

Where D = (BM)2/4. σ2r is estimated by s2

r .

To estimate τ with a bound B on the error,

n =Nσ2

r

ND + σ2r

Where D = B2/(4N2). σ2r is estimated by s2

r .

Sampling and Survey

Page 51: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

The extension of these formulas to estimate proportions, isstraightforward.

Sampling and Survey

Page 52: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

Example of the analysis in multinomial type data setShould smoking be banned from the workplace?A Time/Yankelovich poll of 800 adult americans carried out onApril 6-7, 1994 (Time april 18, 1994)

Nonsmokers Smokers

Banned 44% 8%Special Areas 52% 80%No restriction 3% 11%

Total 100%(600) 100%(200)

1 The true difference between the proportions choosing”banned” The prop choosing banned are independent of eachother. Thus the 95% CI for the difference estimate is :

(.44− .08)± 2

√.44x .56

600+.08x .92

200

0.36± 0.06 = 30%, 42%

Sampling and Survey

Page 53: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

2 The true difference between the proportions of nonsmokerschoosing ”banned” and ”especial areas”. This is amultinomial type data, the estimate of the diff is,

(.52− .44)± 2

√.44x .56

600+.52x .48

600+ 2

.52x .44

600

0.08± 0.08 = 0, 16%

You may ask why these formulas are used.

Sampling and Survey

Page 54: Sampling and Survey

OutlineIntroduction

Basic ConceptsSampling methods

Simple Random SampleStratified Random SampleSystematic SamplingCluster sampling

In a multinomial type data set, regardless the number of classesyou have, for proportions we have the following expressions

pi =n∑

j=1

yij/n for all i = 1, . . . , k(classes)

V (pi ) =pi (1− pi )

nfor all i = 1, . . . , k(classes)

Cov(pi , pj) = −pipj

nfor all (i 6= j) = 1, . . . , k(classes)

AndV (pi − pj) = V (pi ) + V (pj)− 2Cov(pi , pj)

Sampling and Survey