Survey and Sampling Methods
Session 9
9-1 Sampling Methods
• Introduction
• Nonprobability Sampling and Bias
• Stratified Random Sampling
• Cluster Sampling
• Systematic Sampling
• Nonresponse
• Summary and Review of Terms
9-2 Nonprobability Sampling and Bias
• Sampling methods that do not use samples with known probabilities of selection are known as nonprobability sampling methods.
• In nonprobability sampling methods, there is no objective way of evaluating how far the estimate may be from the population parameter.
• Frame: a list of people or things of interest from which a random sample can be chosen.
9-3 Stratified Random Sampling
In stratified random sampling, we assume that the population of $N$ units may be divided into $m$ groups, with $N_i$ units in group $i$, $i = 1, 2, \dots, m$. The $m$ strata are nonoverlapping, and together they make up the total population: $N_1 + N_2 + \dots + N_m = N$.
[Figure: Population Distribution ($N_i$ units in each of groups 1 to 7) and Sample Distribution ($n_i$ units sampled from each group)]
In proportional allocation, the relative frequencies in the sample ($n_i/n$) are the same as those in the population ($N_i/N$).
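Proportional allocation can be computed by setting each $n_i$ to $n N_i/N$ and rounding. This is a minimal sketch. The helper name `proportional_allocation` is hypothetical, and plain rounding is an assumption: when $n N_i/N$ is not a whole number, the rounded sizes may not sum exactly to $n$. The stratum sizes below are those of Example 9-1.

```python
def proportional_allocation(stratum_sizes, n):
    """Hypothetical helper: allocate a total sample of n so that n_i / n = N_i / N."""
    N = sum(stratum_sizes)
    # Plain rounding (an assumption of this sketch): the rounded n_i may not add
    # up to exactly n when n * N_i / N is not a whole number.
    return [round(n * N_i / N) for N_i in stratum_sizes]


# Stratum sizes N_i from Example 9-1 (N = 500) with a total sample of n = 100.
sizes = [100, 100, 150, 50, 50, 50]
print(proportional_allocation(sizes, 100))  # [20, 20, 30, 10, 10, 10]
```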
Relationship Between the Population and a Stratified Random Sample
True weight of stratum $i$: $W_i = N_i/N$
Sampling fraction in stratum $i$: $f_i = n_i/N_i$
True mean of the population: $\mu$
True mean in stratum $i$: $\mu_i$
True variance of the population: $\sigma^2$
True variance of stratum $i$: $\sigma_i^2$
Sample mean in stratum $i$: $\bar{X}_i$
Sample variance in stratum $i$: $s_i^2$
The stratified estimator of the population mean:
$$\bar{X}_{st} = \sum_{i=1}^{m} W_i \bar{X}_i$$
Properties of the Stratified Estimator of the Sample Mean
1. If the estimator of the mean in each stratum, $\bar{X}_i$, is unbiased, then the stratified estimator of the mean, $\bar{X}_{st}$, is an unbiased estimator of the population mean, $\mu$.
2. If the samples in the different strata are drawn independently of each other, then the variance of the stratified estimator of the population mean, $\bar{X}_{st}$, is given by:
$$V(\bar{X}_{st}) = \sum_{i=1}^{m} W_i^2 V(\bar{X}_i)$$
3. If sampling in all strata is random, then the variance of $\bar{X}_{st}$ is further equal to:
$$V(\bar{X}_{st}) = \sum_{i=1}^{m} W_i^2 \frac{\sigma_i^2}{n_i}(1 - f_i)$$
When the sampling fractions, $f_i$, are small and may be ignored, we have:
$$V(\bar{X}_{st}) = \sum_{i=1}^{m} W_i^2 \frac{\sigma_i^2}{n_i}$$
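The estimator and properties 2 and 3 translate directly into code. This sketch uses hypothetical helper names (`stratified_mean`, `stratified_variance`). The two-stratum inputs are illustrative values, not data from the slides.

```python
def stratified_mean(weights, stratum_means):
    """Hypothetical helper: X_st = sum of W_i * X_i."""
    return sum(w * m for w, m in zip(weights, stratum_means))


def stratified_variance(weights, variances, sample_sizes, fractions=None):
    """Hypothetical helper for property 3: sum W_i^2 * sigma_i^2 / n_i * (1 - f_i).

    With fractions=None the factors (1 - f_i) are dropped, which is the
    version used when the sampling fractions are small.
    """
    if fractions is None:
        fractions = [0.0] * len(weights)
    return sum(
        w ** 2 * var / n * (1 - f)
        for w, var, n, f in zip(weights, variances, sample_sizes, fractions)
    )


# Illustrative two-stratum input (not data from the slides).
W = [0.5, 0.5]
sigma2 = [4.0, 16.0]
n = [10, 10]
print(stratified_variance(W, sigma2, n))              # sampling fractions ignored
print(stratified_variance(W, sigma2, n, [0.1, 0.1]))  # with f_i = 0.1
```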
Properties of the Stratified Estimator of the Sample Mean (continued)
4. If the sample allocation is proportional, $n_i/n = N_i/N$ for all $i$, then every stratum has the same sampling fraction $f = n/N$, and
$$V(\bar{X}_{st}) = \frac{1 - f}{n} \sum_{i=1}^{m} W_i \sigma_i^2$$
which reduces to
$$V(\bar{X}_{st}) = \frac{1}{n} \sum_{i=1}^{m} W_i \sigma_i^2$$
when the sampling fraction is small. In addition, if the population variances in all strata are equal to $\sigma^2$, then
$$V(\bar{X}_{st}) = \frac{\sigma^2}{n}$$
when the sampling fraction is small.
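Property 4 follows from property 3 in one substitution step. The short derivation below assumes proportional allocation, so that $n_i = nW_i$ and every $f_i$ equals $f = n/N$:

```latex
% Proportional allocation: n_i = n W_i, hence f_i = n_i / N_i = n / N = f for all i.
V(\bar{X}_{st}) = \sum_{i=1}^{m} W_i^2 \frac{\sigma_i^2}{n W_i}(1 - f)
                = \frac{1 - f}{n} \sum_{i=1}^{m} W_i \sigma_i^2
% If \sigma_i^2 = \sigma^2 in every stratum, use \sum_{i=1}^{m} W_i = 1:
V(\bar{X}_{st}) = \frac{1 - f}{n} \, \sigma^2 \sum_{i=1}^{m} W_i
                = \frac{(1 - f)\,\sigma^2}{n} \approx \frac{\sigma^2}{n} \quad (\text{small } f)
```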
When the Population Variance is Unknown
An unbiased estimator of the population variance of stratum $i$, $\sigma_i^2$, is:
$$S_i^2 = \frac{\sum_{\text{data in } i} (X - \bar{X}_i)^2}{n_i - 1}$$
If sampling in each stratum is random:
$$s^2(\bar{X}_{st}) = \sum_{i=1}^{m} W_i^2 \frac{S_i^2}{n_i}(1 - f_i)$$
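Here is a minimal sketch of the estimated variance, computed from the raw observations in each stratum. `statistics.variance` uses the $n_i - 1$ divisor, which matches $S_i^2$ above. The function name is hypothetical, and the sample values in the usage line are illustrative only.

```python
import statistics


def estimated_var_stratified_mean(samples, stratum_sizes):
    """Hypothetical helper: s^2(X_st) = sum W_i^2 * S_i^2 / n_i * (1 - f_i).

    samples: one list of observations per stratum.
    stratum_sizes: population sizes N_i of the strata.
    """
    N = sum(stratum_sizes)
    total = 0.0
    for data, N_i in zip(samples, stratum_sizes):
        n_i = len(data)
        W_i = N_i / N                     # true stratum weight
        f_i = n_i / N_i                   # sampling fraction
        S2_i = statistics.variance(data)  # divisor n_i - 1: unbiased for sigma_i^2
        total += W_i ** 2 * S2_i / n_i * (1 - f_i)
    return total


# Illustrative values only (not from the slides): two strata of 30 units each.
print(estimated_var_stratified_mean([[1, 2, 3], [4, 6, 8]], [30, 30]))
```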
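Confidence Interval for the Population Mean in Stratified Sampling
A $(1 - \alpha)100\%$ confidence interval for the population mean, $\mu$, using stratified sampling:
$$\bar{x}_{st} \pm z_{\alpha/2} \, s(\bar{X}_{st})$$
When the samples within the strata are small, the $t$ distribution with the effective degrees of freedom below may be used in place of $z$.
The effective degrees of freedom:
$$\text{Effective df} = \frac{\left( \sum_{i=1}^{m} N_i (N_i - n_i) \, s_i^2 / n_i \right)^2}{\sum_{i=1}^{m} \left[ N_i (N_i - n_i) / n_i \right]^2 s_i^4 / (n_i - 1)}$$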
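The effective degrees of freedom can be computed using $g_i = N_i(N_i - n_i)/n_i$ as shorthand. The symbol $g_i$ is introduced here for readability; it does not appear on the slide. Below is a sketch with a hypothetical helper name.

```python
def effective_df(stratum_sizes, sample_sizes, sample_variances):
    """Hypothetical helper: effective degrees of freedom for a stratified t interval.

    df = (sum g_i s_i^2)^2 / sum(g_i^2 s_i^4 / (n_i - 1)),
    where g_i = N_i (N_i - n_i) / n_i.
    """
    g = [N_i * (N_i - n_i) / n_i for N_i, n_i in zip(stratum_sizes, sample_sizes)]
    numerator = sum(g_i * s2 for g_i, s2 in zip(g, sample_variances)) ** 2
    denominator = sum(
        g_i ** 2 * s2 ** 2 / (n_i - 1)
        for g_i, s2, n_i in zip(g, sample_variances, sample_sizes)
    )
    return numerator / denominator


# Sanity check: a single stratum with n_1 = 10 gives n_1 - 1 degrees of freedom.
print(effective_df([100], [10], [4.0]))  # 9.0
```

As a check, the result always lies between the smallest $n_i - 1$ and the sum of all the $n_i - 1$.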
Example 9-1
The table gives the population number of firms ($N_i$), the true weights ($W_i$), the sample sizes ($n_i$), and each stratum's share of the total sample ($n_i/n$):
Group                              N_i   W_i    n_i   n_i/n
1. Diversified service companies   100   0.20   20    0.20
2. Commercial banking companies    100   0.20   20    0.20
3. Financial service companies     150   0.30   30    0.30
4. Retailing companies             50    0.10   10    0.10
5. Transportation companies        50    0.10   10    0.10
6. Utilities                       50    0.10   10    0.10
Total                              500          100
The sampling fraction is $f_i = n_i/N_i = 0.20$ in every stratum, so the allocation is proportional.
Stratum  Mean x̄_i   Variance s_i^2  n_i  W_i  W_i x̄_i   W_i^2 s_i^2 (1 - f_i)/n_i
1        52.7       97650           20   0.2  10.54     156.240
2        112.6      64300           20   0.2  22.52     102.880
3        85.6       76990           30   0.3  25.68     184.776
4        12.6       18320           10   0.1  1.26      14.656
5        8.9        9037            10   0.1  0.89      7.230
6        52.3       83500           10   0.1  5.23      66.800
Total                                         66.12     532.582
Estimated mean: $\bar{x}_{st} = \sum W_i \bar{x}_i = 66.12$
Estimated standard error of the mean: $s(\bar{X}_{st}) = \sqrt{532.582} = 23.08$
95% Confidence Interval:
$$\bar{x}_{st} \pm z_{\alpha/2} \, s(\bar{X}_{st}) = 66.12 \pm 1.96(23.08) = 66.12 \pm 45.24 = [20.88, \ 111.36]$$
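Stratified Sampling for the Population Proportion
Stratified estimator of the population proportion, $p$:
$$\hat{P}_{st} = \sum_{i=1}^{m} W_i \hat{P}_i$$
The approximate variance of $\hat{P}_{st}$:
$$V(\hat{P}_{st}) = \sum_{i=1}^{m} W_i^2 \frac{P_i Q_i}{n_i}$$
When the finite-population correction factors, $f_i$, must be considered:
$$V(\hat{P}_{st}) = \frac{1}{N^2} \sum_{i=1}^{m} N_i^2 \, \frac{N_i - n_i}{N_i - 1} \cdot \frac{P_i Q_i}{n_i}$$
When proportional allocation is used:
$$V(\hat{P}_{st}) = \frac{1 - f}{n} \sum_{i=1}^{m} W_i P_i Q_i$$
Here $P_i$ is the proportion in stratum $i$, and $Q_i = 1 - P_i$.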
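The following sketch implements the two variance formulas, using hypothetical helper names. The formulas use the true stratum proportions $P_i$; substituting the sample proportions $p_i$, as is usual in practice, gives estimates of the variance. The inputs below are illustrative only.

```python
def var_stratified_proportion_fpc(stratum_sizes, sample_sizes, proportions):
    """Hypothetical helper: (1/N^2) * sum N_i^2 * (N_i - n_i)/(N_i - 1) * P_i Q_i / n_i."""
    N = sum(stratum_sizes)
    total = 0.0
    for N_i, n_i, P_i in zip(stratum_sizes, sample_sizes, proportions):
        Q_i = 1 - P_i
        total += N_i ** 2 * (N_i - n_i) / (N_i - 1) * P_i * Q_i / n_i
    return total / N ** 2


def var_stratified_proportion_approx(weights, sample_sizes, proportions):
    """Hypothetical helper, sampling fractions ignored: sum W_i^2 * P_i Q_i / n_i."""
    return sum(w ** 2 * P * (1 - P) / n for w, n, P in zip(weights, sample_sizes, proportions))


# Illustrative inputs only: with very large strata the two versions nearly agree.
sizes, samples, props = [10**6, 10**6], [100, 100], [0.3, 0.6]
print(var_stratified_proportion_fpc(sizes, samples, props))
print(var_stratified_proportion_approx([0.5, 0.5], samples, props))
```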
Stratified Sampling for the Population Proportion: An Example
Group            W_i   n_i   n_i/n   Number interested   W_i p_i   W_i p_i q_i / n
Metropolitan     0.65  130   0.65    28                  0.14      0.0005756
Nonmetropolitan  0.35  70    0.35    18                  0.09      0.0003099
Total                  200                               0.23      0.0008855
Estimated proportion: $\hat{p}_{st} = 0.23$
Estimated standard error: $s(\hat{P}_{st}) = \sqrt{0.0008855} = 0.0297574$
90% Confidence Interval:
$$\hat{p}_{st} \pm z_{\alpha/2} \, s(\hat{P}_{st}) = 0.23 \pm 1.645(0.0297) = 0.23 \pm 0.049 = [0.181, \ 0.279]$$
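The example can be recomputed from the counts. This sketch uses each stratum's own $p_i$ in $W_i^2 p_i q_i / n_i$, which under proportional allocation equals $W_i p_i q_i / n$. The population size $N$ is not given, so the finite-population correction is ignored; that is an assumption. The slide's last column matches $W_i(0.23)(0.77)/n$, that is, the overall estimate 0.23 substituted in each stratum. With the stratum proportions, the standard error is about 0.0297 and the 90% interval is still [0.181, 0.279].

```python
import math

# Example data: stratum weights W_i, sample sizes n_i, and counts "interested".
W = [0.65, 0.35]          # metropolitan, nonmetropolitan
n = [130, 70]
interested = [28, 18]

p = [x / ni for x, ni in zip(interested, n)]  # stratum proportions p_i
p_st = sum(w * pi for w, pi in zip(W, p))     # stratified estimate
# Approximate variance, finite-population corrections ignored:
# V(P_st) = sum W_i^2 p_i q_i / n_i
var_st = sum(w ** 2 * pi * (1 - pi) / ni for w, pi, ni in zip(W, p, n))
se_st = math.sqrt(var_st)

z = 1.645  # 90% confidence
print(f"p_st={p_st:.2f}  se={se_st:.4f}  CI=[{p_st - z * se_st:.3f}, {p_st + z * se_st:.3f}]")
# p_st=0.23  se=0.0297  CI=[0.181, 0.279]
```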
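Rules for Constructing Strata
1. Preferably no more than 6 strata.
2. Choose the stratum boundaries so that each stratum covers approximately the same increase in Cum √f(x). Cum √f(x) is the cumulative sum of the square roots of the frequencies of X, the variable of interest.
Age     Frequency f(x)   √f(x)   Cum √f(x)   Stratum
20-25   1                1       1           1
26-30   16               4       5           1
31-35   25               5       10          2
36-40   4                2       12          3
41-45   9                3       15          3
With three strata, each stratum covers an increase of 5 in Cum √f(x): ages 20-30, 31-35, and 36-45.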
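This is a simplified sketch of the cum √f rule, which is often attributed to Dalenius and Hodges. It accumulates √f(x) over the ordered classes and cuts the running total into equal shares. The function name and the small tolerance for totals that land exactly on a boundary are assumptions of this sketch.

```python
import math


def cum_sqrt_f_strata(freqs, num_strata):
    """Hypothetical helper: assign ordered classes to strata by the cum sqrt(f) rule.

    freqs: frequencies f(x) of the ordered classes of X.
    Each stratum covers an equal share, total / num_strata, of cum sqrt(f);
    a class goes to the stratum whose share contains its running cum sqrt(f).
    """
    total = sum(math.sqrt(f) for f in freqs)
    step = total / num_strata
    cum = 0.0
    labels = []
    for f in freqs:
        cum += math.sqrt(f)
        # The tolerance keeps a running total that lands exactly on a boundary
        # (e.g. 5.0 with step 5.0) in the lower stratum.
        stratum = math.ceil(cum / step - 1e-9)
        labels.append(min(max(stratum, 1), num_strata))
    return labels


ages = ["20-25", "26-30", "31-35", "36-40", "41-45"]
freqs = [1, 16, 25, 4, 9]
print(list(zip(ages, cum_sqrt_f_strata(freqs, 3))))
# [('20-25', 1), ('26-30', 1), ('31-35', 2), ('36-40', 3), ('41-45', 3)]
```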
Optimum Allocation
For optimum allocation of effort in stratified random sampling, minimize the cost for a given variance, or minimize the variance for a given cost.
Total Cost = Fixed Cost + Variable Cost:
$$C = C_0 + \sum_{i=1}^{m} C_i n_i$$
Optimum allocation:
$$\frac{n_i}{n} = \frac{W_i \sigma_i / \sqrt{C_i}}{\sum_{j=1}^{m} W_j \sigma_j / \sqrt{C_j}}$$
If the cost per unit sampled is the same for all strata ($C_i = c$):
Neyman allocation:
$$\frac{n_i}{n} = \frac{W_i \sigma_i}{\sum_{j=1}^{m} W_j \sigma_j}$$
Optimum Allocation: An Example
i   W_i   s_i   C_i   W_i s_i   W_i s_i/√C_i   Optimum allocation   Neyman allocation
1   0.4   1     4     0.4       0.200          0.329                0.235
2   0.5   2     9     1.0       0.333          0.548                0.588
3   0.1   3     16    0.3       0.075          0.123                0.176
Sum                   1.7       0.608
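Both allocation rules reduce to normalizing a per-stratum score. The following is a sketch with hypothetical helper names, checked against the example table.

```python
import math


def optimum_allocation(weights, sds, costs):
    """Hypothetical helper: n_i / n proportional to W_i * sigma_i / sqrt(C_i)."""
    scores = [w * s / math.sqrt(c) for w, s, c in zip(weights, sds, costs)]
    total = sum(scores)
    return [x / total for x in scores]


def neyman_allocation(weights, sds):
    """Hypothetical helper for equal unit costs: n_i / n proportional to W_i * sigma_i."""
    scores = [w * s for w, s in zip(weights, sds)]
    total = sum(scores)
    return [x / total for x in scores]


W, s, C = [0.4, 0.5, 0.1], [1, 2, 3], [4, 9, 16]
print([round(x, 3) for x in optimum_allocation(W, s, C)])  # [0.329, 0.548, 0.123]
print([round(x, 3) for x in neyman_allocation(W, s)])      # [0.235, 0.588, 0.176]
```

With equal costs in every stratum the square-root factor cancels, so `optimum_allocation` returns the Neyman shares.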
9-4 Cluster Sampling
[Figure: Population Distribution and Sample Distribution over groups 1 to 7, contrasting stratified and cluster sampling]
In stratified sampling, a random sample ($n_i$) is chosen from each segment of the population ($N_i$).
In cluster sampling, observations are drawn from $m$ out of $M$ areas or clusters of the population.
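Cluster Sampling: Estimating the Population Mean
Cluster sampling estimator of $\mu$:
$$\bar{X}_{cl} = \frac{\sum_{i=1}^{m} n_i \bar{X}_i}{\sum_{i=1}^{m} n_i}$$
Estimator of the variance of the sample mean:
$$s^2(\bar{X}_{cl}) = \frac{M - m}{M m \bar{n}^2} \cdot \frac{\sum_{i=1}^{m} n_i^2 (\bar{X}_i - \bar{X}_{cl})^2}{m - 1}$$
where
$$\bar{n} = \frac{\sum_{i=1}^{m} n_i}{m}$$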
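Cluster Sampling: Estimating the Population Proportion
Cluster sampling estimator of $p$:
$$\hat{P}_{cl} = \frac{\sum_{i=1}^{m} n_i \hat{P}_i}{\sum_{i=1}^{m} n_i}$$
Estimator of the variance of the sample proportion:
$$s^2(\hat{P}_{cl}) = \frac{M - m}{M m \bar{n}^2} \cdot \frac{\sum_{i=1}^{m} n_i^2 (\hat{P}_i - \hat{P}_{cl})^2}{m - 1}$$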
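The mean and proportion formulas have the same form, so one helper can serve both. Pass cluster means to estimate $\mu$, or cluster sample proportions to estimate $p$. This is a sketch with a hypothetical function name; the test inputs are illustrative only.

```python
def cluster_estimate(cluster_values, cluster_sizes, M):
    """Hypothetical helper: cluster-sampling estimate and its estimated variance.

    cluster_values: mean (or proportion) observed in each of the m sampled clusters.
    cluster_sizes: n_i, the number of units observed in each sampled cluster.
    M: total number of clusters in the population.
    """
    m = len(cluster_values)
    n_bar = sum(cluster_sizes) / m
    # Size-weighted average of the cluster values.
    estimate = sum(n * x for n, x in zip(cluster_sizes, cluster_values)) / sum(cluster_sizes)
    ss = sum(n ** 2 * (x - estimate) ** 2 for n, x in zip(cluster_sizes, cluster_values))
    variance = (M - m) / (M * m * n_bar ** 2) * ss / (m - 1)
    return estimate, variance


# Illustrative inputs only: two sampled clusters of 10 units out of M = 10.
print(cluster_estimate([0.2, 0.4], [10, 10], M=10))
```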
Cluster Sampling: Example 9-2
x_i  n_i  n_i x_i  x_i - x̄_cl    (x_i - x̄_cl)^2    Term of s^2(X̄_cl)*
21   8    168      -0.8333       0.694             0.00118
22   8    176      0.1667        0.028             0.00005
11   9    99       -10.8333      117.361           0.25269
34   10   340      12.1667       148.028           0.39348
28   7    196      6.1667        38.028            0.04953
25   8    200      3.1667        10.028            0.01706
18   10   180      -3.8333       14.694            0.03906
24   12   288      2.1667        4.694             0.01797
19   11   209      -2.8333       8.028             0.02582
20   6    120      -1.8333       3.361             0.00322
30   8    240      8.1667        66.694            0.11346
26   9    234      4.1667        17.361            0.03738
12   9    108      -9.8333       96.694            0.20819
17   8    136      -4.8333       23.361            0.03974
13   10   130      -8.8333       78.028            0.20741
29   8    232      7.1667        51.361            0.08738
24   8    192      2.1667        4.694             0.00799
26   10   260      4.1667        17.361            0.04615
18   10   180      -3.8333       14.694            0.03906
22   11   242      0.1667        0.028             0.00009
Sum  180  3930                                     1.58691
* Each term is $\frac{M - m}{M m \bar{n}^2} \cdot \frac{n_i^2 (x_i - \bar{x}_{cl})^2}{m - 1}$; summed over the $m = 20$ sampled clusters, the terms give $s^2(\bar{X}_{cl})$.
$\bar{x}_{cl} = 3930/180 = 21.83$, $s^2(\bar{X}_{cl}) = 1.58691$
95% Confidence Interval:
$$\bar{x}_{cl} \pm z_{\alpha/2} \, s(\bar{X}_{cl}) = 21.83 \pm 1.96\sqrt{1.587} = 21.83 \pm 2.47 = [19.36, \ 24.30]$$
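Example 9-2 can be recomputed from the table. The number of clusters in the population, $M$, does not appear on the slide. The last column of the table is consistent with $M = 110$, which this sketch assumes.

```python
import math

# Example 9-2: cluster means x_i and cluster sizes n_i for m = 20 sampled clusters.
x = [21, 22, 11, 34, 28, 25, 18, 24, 19, 20, 30, 26, 12, 17, 13, 29, 24, 26, 18, 22]
n = [8, 8, 9, 10, 7, 8, 10, 12, 11, 6, 8, 9, 9, 8, 10, 8, 8, 10, 10, 11]
M = 110  # ASSUMPTION: total clusters, not on the slide; inferred from the table's last column

m = len(x)
n_bar = sum(n) / m
x_cl = sum(ni * xi for ni, xi in zip(n, x)) / sum(n)
ss = sum(ni ** 2 * (xi - x_cl) ** 2 for ni, xi in zip(n, x))
var_cl = (M - m) / (M * m * n_bar ** 2) * ss / (m - 1)

half_width = 1.96 * math.sqrt(var_cl)  # 95% confidence
print(f"x_cl={x_cl:.2f}  s2={var_cl:.5f}  CI=[{x_cl - half_width:.2f}, {x_cl + half_width:.2f}]")
# x_cl=21.83  s2=1.58691  CI=[19.36, 24.30]
```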
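9-5 Systematic Sampling
Randomly select an element out of the first $k$ elements in the population, and then select every $k$th unit afterwards until we have a sample of $n$ elements.
Systematic sampling estimator of $\mu$:
$$\bar{X}_{sy} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Estimator of the variance of the sample mean:
$$s^2(\bar{X}_{sy}) = \frac{N - n}{N n} S^2$$
When the mean is constant within each stratum of $k$ elements but different between strata:
$$s^2(\bar{X}_{sy}) = \frac{N - n}{N n} \cdot \frac{\sum_{i=1}^{n-1} (X_i - X_{i+k})^2}{2(n - 1)}$$
When the population is linearly increasing or decreasing with respect to the variable of interest:
$$s^2(\bar{X}_{sy}) = \frac{N - n}{N n} \cdot \frac{\sum_{i=1}^{n-2} (X_i - 2X_{i+k} + X_{i+2k})^2}{6(n - 2)}$$
Here $X_i$, $X_{i+k}$, and $X_{i+2k}$ are successive elements of the systematic sample, and $S^2$ is the sample variance.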
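The sketch below covers the selection procedure and the three variance estimators, using hypothetical helper names. Consecutive elements of the returned sample play the roles of $X_i$, $X_{i+k}$, and $X_{i+2k}$. The population of 1 to 100 is illustrative only.

```python
import random
import statistics


def systematic_sample(population, k, rng=random):
    """Hypothetical helper: random start among the first k elements, then every kth."""
    start = rng.randrange(k)  # index 0 .. k-1
    # If N is not a multiple of k, the sample size depends on the start.
    return population[start::k]


def var_mean_simple(sample, N):
    """s^2(X_sy) = (N - n) / (N n) * S^2."""
    n = len(sample)
    return (N - n) / (N * n) * statistics.variance(sample)


def var_mean_successive(sample, N):
    """Mean constant within each stratum of k elements: successive differences."""
    n = len(sample)
    ss = sum((a - b) ** 2 for a, b in zip(sample, sample[1:]))
    return (N - n) / (N * n) * ss / (2 * (n - 1))


def var_mean_second_diff(sample, N):
    """Linear trend: second differences X_i - 2 X_{i+k} + X_{i+2k}."""
    n = len(sample)
    ss = sum((a - 2 * b + c) ** 2 for a, b, c in zip(sample, sample[1:], sample[2:]))
    return (N - n) / (N * n) * ss / (6 * (n - 2))


population = list(range(1, 101))  # illustrative linear population, N = 100
sample = systematic_sample(population, k=10, rng=random.Random(1))
for estimator in (var_mean_simple, var_mean_successive, var_mean_second_diff):
    print(estimator.__name__, estimator(sample, len(population)))
```

On this perfectly linear population with $k = 10$, the results do not depend on the random start. The second-difference estimator returns 0, the successive-difference estimator 4.5, and the $S^2$ form 82.5. This is why the second-difference form is the one given for linear trends.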
Systematic Sampling: Example 9-3
$$\bar{x}_{sy} = \frac{\sum_{i=1}^{n} x_i}{n} = 0.5$$
$$s^2(\bar{X}_{sy}) = \frac{N - n}{N n} S^2 = \frac{2100 - 100}{(2100)(100)}(0.36) = 0.0034$$
A 95% confidence interval for the average price change for all stocks:
$$\bar{x}_{sy} \pm z_{\alpha/2} \, s(\bar{X}_{sy}) = 0.5 \pm 1.96\sqrt{0.0034} = 0.5 \pm 0.114 = [0.386, \ 0.614]$$
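Here is the Example 9-3 interval in code. The slide rounds $s^2(\bar{X}_{sy})$ to 0.0034 before taking the square root, which gives ±0.114 and [0.386, 0.614]. Carrying full precision gives ±0.115 and [0.385, 0.615].

```python
import math

N, n = 2100, 100      # population and sample sizes from Example 9-3
x_sy, S2 = 0.5, 0.36  # sample mean and sample variance

var_sy = (N - n) / (N * n) * S2
half_width = 1.96 * math.sqrt(var_sy)  # 95% confidence
print(f"s2={var_sy:.4f}  CI=[{x_sy - half_width:.3f}, {x_sy + half_width:.3f}]")
# s2=0.0034  CI=[0.385, 0.615]
```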
9-6 Nonresponse
Systematic nonresponse can bias estimates. Ways to reduce it include:
• Callbacks of nonrespondents.
• Offers of monetary rewards for nonrespondents.
• A randomized-response mechanism.