Sampling Racial and Ethnic Minorities (sph.unc.edu/files/2013/07/kalsbeek_slides_2000.pdf)
1
Sampling Racial and
Ethnic Minorities
William D. Kalsbeek
Director, Survey Research Unit
Professor, Department of Biostatistics
University of North Carolina
June 14, 2000
Copyright 2000, William Kalsbeek
2
Acknowledgements
Gayle Shimokura
– For significant contributions to this presentation
through her meticulous background research.
CDC/National Center for Health Statistics
(Contract No. UR6/CCU417428-01)
– For funding support for this presentation
– UNC-CH’s Center for Health Statistics Research
– http://www.sph.unc.edu/chsr
3
Race/Ethnic Minorities
(% of Population: March 2000 CPS)
Hispanics (11.7 %)
– Settled (95%)
– Mobile (5 %)
African-American (12.8 %)
– Settled (99.9%)
– Mobile (0.1%)
Asian-American (4.0%)
Native-American (0.9%)
4
Overview
Some basics on probability sampling
Problems in sampling rare population
subgroups*
A review of some existing remedies*
* Note that a reference list is available
5
Context: Sampling Race/Ethnic Minorities
As the population subgroup of interest in a specially targeted
study (targeted sampling)
As a key subgroup in a general population study (oversampling)
[Figure: the general population with the ethnic minority as a subset; labels: Targeted, With Oversampling]
6
Probability vs. Nonprobability Sampling?
Probability sampling:
– Random sampling methods used
– Each member of the target population has a
known, nonzero selection probability
Nonprobability sampling in exceptional
circumstances
– Judgment used
– Requires models to analyze
Probability sampling is generally preferred
7
Sampling Frames and Linkage
Sampling Frame = List(s) used to select a
probability sample
EXAMPLE: List of patients to sample
health care users
Usefulness of a frame is tied to:
– The linkage that exists between entries on the
list and the population being sampled
8
Sample Weights
A number for each member of the sample
– Reflecting the inverse of the selection
probability for the sample member
May be adjusted for sample imbalance due
to:
– Nonresponse
– Incomplete frame coverage
– Other selection problems
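A minimal sketch of the weighting idea on this slide, assuming hypothetical selection probabilities, hypothetical response indicators, and a single nonresponse weighting class (none of these values come from the presentation). Base weights are the inverse of the selection probabilities, and the adjustment rescales respondent weights so they carry the weight of the whole selected sample:

```python
# Hypothetical selection probabilities and response indicators (illustration only).
selection_prob = [0.01, 0.01, 0.05, 0.05, 0.05]
responded = [True, False, True, True, False]

# Base weight = inverse of the selection probability.
base_weights = [1.0 / p for p in selection_prob]

# Simple nonresponse adjustment within one weighting class (an assumption):
# respondent weights are inflated so they sum to the full sample's base weights.
total_all = sum(base_weights)
total_resp = sum(w for w, r in zip(base_weights, responded) if r)
adjustment = total_all / total_resp

final_weights = [w * adjustment if r else 0.0
                 for w, r in zip(base_weights, responded)]
print(final_weights)
```

In practice the adjustment would usually be computed separately within several weighting classes; one class is used here only to keep the sketch short.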
9
What are the Statistical Goals of
Probability Sampling?
Validity
– The ability to produce estimates without bias
tied to sampling
– Achieved if all population members have some
known chance to be chosen in the sample
Efficiency
– Tied to precision of estimates
– Achieved if the right sampling “tools” are used
– Greater efficiency costs more (cost-efficiency)
10
What Selection Tools Might be Used
to Sample Race/Ethnic Minorities?
Stratified sampling
– Separate sampling within each of a number of
population groupings (strata)
Screening for the targeted minority group
– Identify subgroup members in initial sample
of the full population
11
Stratified Sampling:
Population divided into H subgroups
called strata
Separate probability sample in each stratum
Combine estimates from each stratum to
produce the estimate for the whole
population
Vs. “Stratified Analysis”
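Combining stratum estimates into a population-wide estimate can be sketched as follows. The strata, population counts, and sampled values are hypothetical, and simple random sampling within each stratum is assumed:

```python
# Hypothetical strata: population size N and sampled values (illustration only).
strata = {
    "high_concentration": {"N": 2000, "sample": [4.0, 6.0, 5.0]},
    "low_concentration": {"N": 8000, "sample": [1.0, 3.0]},
}

N_total = sum(s["N"] for s in strata.values())

# Stratified mean: population-share-weighted average of the stratum sample means.
stratified_mean = sum(
    (s["N"] / N_total) * (sum(s["sample"]) / len(s["sample"]))
    for s in strata.values()
)
print(stratified_mean)
```

Each stratum contributes in proportion to its population share, not its sample share, which is what keeps the estimate valid when strata are sampled at different rates.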
12
Stratified Sampling Used When:
Wish to improve the efficiency of
population-wide estimates
AND/OR
Wish to control the sample size available for estimates
of important population subgroups
– Isolatable to some degree by the strata
13
Stratum Allocation Options:
$C_h$ = Average cost of adding another respondent
to the sample in the h-th stratum
$f_h = n_h / N_h$ = Sampling rate in h-th stratum
$S_h$ = Standard deviation of all members of the h-th
stratum (measures intra-stratum variation)
14
Stratum Allocation Options
Option             Description                              Analysis Priority to Estimates for:
Proportionate      Same sampling rates                      Overall population
                   ($f_h = f$ in all strata)
Optimum            Most cost-efficient sampling rates       Overall population
                   ($f_h \propto S_h / \sqrt{C_h}$)
Balanced           Equal sample sizes                       Population subgroups*
                   ($n_h = n/H$)
Disproportionate   To "oversample" important subgroups      Key population subgroups*
                   ($f_h$ higher in subgroup strata)
* Definable by the strata
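The allocation options can be compared numerically. The sketch below assumes a fixed total sample size and hypothetical stratum inputs; the function name `allocate` and all numbers are illustrative. The optimum rule uses the standard result that sampling rates are proportional to the stratum standard deviation divided by the square root of the per-respondent cost:

```python
import math

def allocate(n, N, S, C):
    """Return stratum sample sizes under three allocation options.

    n: total sample size; N, S, C: per-stratum population sizes,
    standard deviations, and per-respondent costs (hypothetical inputs).
    """
    H = len(N)
    proportionate = [n * Nh / sum(N) for Nh in N]   # same rate in all strata
    balanced = [n / H for _ in N]                   # equal sample sizes
    # Optimum: rate proportional to S_h / sqrt(C_h), so the stratum sample
    # size is proportional to N_h * S_h / sqrt(C_h).
    score = [Nh * Sh / math.sqrt(Ch) for Nh, Sh, Ch in zip(N, S, C)]
    optimum = [n * s / sum(score) for s in score]
    return {"proportionate": proportionate, "balanced": balanced, "optimum": optimum}

result = allocate(n=1000, N=[9000, 1000], S=[1.0, 2.0], C=[4.0, 1.0])
print(result)
```

Scaling the optimum allocation to a fixed total sample size is a simplification; a fixed-budget version would scale to total cost instead.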
15
Screening for a Targeted
Population Subgroup
Sampling in two “phases”
– Goal is to locate members of the population subgroup
– Usually done by telephone or face-to-face in general
population surveys
Process:
– Select an initial sample
– Administer a relatively short interview
• To determine membership in the targeted subgroup
– Retain all target subgroup members and (perhaps) a random
portion of the rest
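The screening workload implied by this process can be approximated with simple arithmetic. The helper `screeners_needed` and the response rates are hypothetical; the 11.7% prevalence is the Hispanic share reported earlier from the March 2000 CPS:

```python
import math

def screeners_needed(target_n, prevalence, screener_response=1.0, main_response=1.0):
    """Expected number of initial sample members to screen (hypothetical helper)."""
    yield_per_screened = prevalence * screener_response * main_response
    return math.ceil(target_n / yield_per_screened)

# Target of 500 completed interviews, with assumed response rates of
# 80% for the screener and 90% for the main interview.
print(screeners_needed(500, 0.117, 0.8, 0.9))
```

The lower the prevalence of the targeted subgroup, the larger the initial sample must be, which is why screening alone becomes costly for rare groups.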
16
What May Lead to Problems in
Sampling Race/Ethnic
Minorities?
Incomplete Frame(s)
– A sizable portion of the population not linked to
entries on the list(s) used for sampling
Rarity
– They usually comprise a relatively small
percentage of the target population
17
What May Lead to Problems in
Sampling Race/Ethnic
Minorities?
Mobility
– Some of them move around a lot, thus creating
a more dynamic than static linkage between the
frame and sampled population
Dispersion
– They are somewhat scattered geographically
– May have some pockets with relatively high
concentrations
19
Some Remedies
Targeted Sampling
– Multiple Frame Methods
– Linkage Exploitation Methods
• Network/multiplicity sampling
• Snowball sampling
• Adaptive cluster sampling
– Time and Space Sampling
Oversampling
– Disproportionate Stratified Sampling with
Screening
20
Multiple Frame Methods:
Selection Approaches
Premise:
– Frame options taken alone may be inadequate or too
costly to use,
BUT
– Choosing the sample jointly from multiple frames may:
• Produce better coverage of the targeted population and
• Be more cost-effective
Dual-Frame Designs --- Two frames
21
Multiple Frames
[Figure: Frames A, B, and C]
22
Multiple Frame Methods: EXAMPLE
Sampling Native Americans
Two frames:
– List of tribal rolls
• Less complete
• Less expensive to locate NAs
– Area household frame from:
• List of residential dwellings in a sample of block
groups (neighborhoods)
• More complete
• More expensive because of the need to screen
– Most cost-effective mix = ?
23
Multiple Frame Methods:
Estimation Approaches
Work by Hartley (1962), Choudry (1989),
and Skinner and Rao (1996)
Special Requirements:
– Identify/eliminate overlap prior to sampling
OR
– Require knowledge of membership in
intersection groups for analysis adjustments
24
Multiple Frame Methods:
Estimation Approaches
Eliminate frame duplication; treat as a
stratified sample
OR
Select with duplication present and either:
– Combine estimates for intersection groups
OR
– Determine frame membership for sample
respondents and weight accordingly
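The second option (select with duplication present, then weight by frame membership) can be sketched with a Hartley-type dual-frame estimator. Respondents found only in one frame keep their full weight, while respondents in the overlap are down-weighted by a mixing factor so they are not counted twice. The data layout, the function name, and the value theta = 0.5 are assumptions for illustration:

```python
def dual_frame_total(sample_a, sample_b, theta=0.5):
    """Hartley-type dual-frame estimate of a population total.

    Each sample is a list of (y, weight, in_overlap) tuples, where weight is
    the inverse selection probability within that frame and in_overlap flags
    respondents who belong to both frames. theta is the mixing weight.
    """
    total = 0.0
    for y, w, in_overlap in sample_a:
        total += (theta if in_overlap else 1.0) * w * y
    for y, w, in_overlap in sample_b:
        total += ((1.0 - theta) if in_overlap else 1.0) * w * y
    return total

# Hypothetical data: frame A is a list frame, frame B an area household frame.
frame_a = [(1.0, 10.0, True), (1.0, 10.0, True)]
frame_b = [(1.0, 50.0, True), (1.0, 50.0, False)]
estimate = dual_frame_total(frame_a, frame_b, theta=0.5)
print(estimate)
```

Choosing theta to minimize variance is part of the estimation literature cited on the previous slide and is not attempted here.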
25
Multiple Frame Methods: Implications
for Sampling Race/Ethnic Minorities
Advantages:
– Improved sample coverage over using a single list
– Potential cost savings if cost of frame use differs
among frames
Disadvantages:
– Higher design/selection/analysis complexity
relative to single frame use
– Challenge in finding the most cost-effective mix of
sample sizes for frames
26
Linkage Exploitation Methods:
Selection Approaches
Premise:
– Population members with a rare attribute can
often identify others with the same attribute
Various adaptations:
– Based in the notion of multiplicity in frames
– Differ according to how multiplicity is utilized
27
Multiplicity
[Figure: linkage between frame listings and population members]
28
Linkage Exploitation Methods:
Various Adaptations
Network/multiplicity sampling
– Network --- social/spatial/organizational
linkage among members of the targeted
subgroup
– EXAMPLES: relatives, friends, co-workers,
cohabitants, organization co-members, etc.
– Linkages may be:
• Asymmetric
• Complex
• EXAMPLE: friends
29
Linkage Exploitation Methods:
Various Adaptations
Network/multiplicity sampling
– Sampling Process:
• Choose an initial sample of targeted subgroup
• Sample members interviewed and asked to nominate
other members of their network who are members of
the targeted subgroup
• Interview those nominated and have them nominate
others in like manner
• Selection probability directly tied to size of network
30
Linkage Exploitation Methods:
Various Adaptations
Snowball sampling
– Network sampling but with multiple phases of
nomination
– Snowballing may be best used to construct
frames to sample rare populations
• Continue waves of nomination until list expansion
ceases
31
Linkage Exploitation Methods:
Various Adaptations
Adaptive cluster sampling
– Exploits the tendency for members of some
targeted subgroups to cluster together
• Original motivation from ecology and geology
– Sampling Process:
• Select a random sample of the population
• Where one identifies members of the targeted
subgroup, sample others in the “neighborhood”
32
Linkage Exploitation Methods:
EXAMPLE
Snowballing: sampling frame of prenatal
care providers
Study of recent female immigrants from
Central and South America
Process:
– Contact OB-GYNs in private practices and
public clinics
– Those providing prenatal care to immigrants
nominate others doing the same
– Continue iteratively until no new providers
are discovered
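The iterative frame-building process above can be sketched as a traversal of a nomination graph that stops when a wave adds no new providers. The provider names and the nomination data are hypothetical:

```python
from collections import deque

def snowball_frame(seeds, nominations):
    """Build a frame by following nominations until no new entries appear.

    nominations maps each provider to the providers it names (hypothetical data).
    """
    frame = set(seeds)
    queue = deque(seeds)
    while queue:
        provider = queue.popleft()
        for named in nominations.get(provider, []):
            if named not in frame:      # only new providers extend the frame
                frame.add(named)
                queue.append(named)
    return frame

nominations = {
    "clinic_1": ["clinic_2", "practice_3"],
    "clinic_2": ["clinic_1"],
    "practice_3": ["practice_4"],
    "practice_5": ["clinic_1"],   # never nominated, so unreachable from clinic_1
}
frame = snowball_frame(["clinic_1"], nominations)
print(sorted(frame))
```

The unreachable provider illustrates a coverage risk: anyone whom no one nominates never enters the frame, whatever the number of waves.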
33
Linkage Exploitation Methods:
Estimation
Major contributors: Sirken (network),
Goodman (snowball), and Thompson
(adaptive)
Approaches:
– Weighted multiplicity estimation (Sirken)
– Rao-Blackwellization to improve estimator
efficiency (Thompson)
Special requirements:
– Network membership information
– Multiplicity counts
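Weighted multiplicity estimation can be sketched as follows: each reported member of the target group is counted as one divided by their multiplicity, so that people reachable through many frame units are not over-counted. The data layout and all values are hypothetical:

```python
def multiplicity_total(reports):
    """Multiplicity-weighted estimate of the number of target-group members.

    reports: list of (respondent_weight, [multiplicity of each person reported]),
    where a person's multiplicity is the number of frame units that could
    have reported them (hypothetical data layout).
    """
    return sum(w * sum(1.0 / m for m in mults) for w, mults in reports)

# Hypothetical sample: three respondents, each with weight 100.
reports = [
    (100.0, [1, 2]),   # reports one person linked to 1 unit and one linked to 2
    (100.0, []),       # reports nobody in the target group
    (100.0, [4]),      # reports one person linked to 4 units
]
estimate = multiplicity_total(reports)
print(estimate)
```

This is why the slide lists multiplicity counts as a special requirement: without them the division cannot be carried out.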
34
Linkage Exploitation Methods:
Implications for Sampling
Race/Ethnic Minorities
Advantages:
– Greater operational efficiency in locating members
of the target population
• Find a “hotspot;” then sample “nearby”
Disadvantages:
– Difficult to determine selection probabilities for
weights
• Asymmetric linkages (A nominates B, but not vice versa)
• Valid probability samples?
35
Time and Space Sampling:
Selection Approach
Premise:
– Portions of ethnic subpopulations are relatively
mobile (e.g., migrant farm workers, homeless)
– Sampling a “chunk” of time
– Linkage between members of the target
subgroup and the frame is dynamic over time
– Those moving more frequently have greater
chance of selection
– Sample space and time to address this potential
for bias
36
Time and Space Sampling:
EXAMPLE
Sampling migrant seasonal farm workers
Process:
– Spatial dimension: sample migrant housing
locations
• On farms
• In other residential housing areas
– Time dimension: sample time periods during
the data collection period
• Three consecutive days
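A minimal sketch of sampling in both dimensions, assuming a frame made of every combination of housing location and three-day period, and a simple random sample from that frame. The location names, period labels, sample size, and seed are hypothetical:

```python
import itertools
import random

locations = ["farm_A", "farm_B", "camp_C", "housing_D"]          # hypothetical sites
periods = ["week1_Mon-Wed", "week1_Thu-Sat", "week2_Mon-Wed"]    # three-day blocks

# The frame is every (location, period) combination.
frame = list(itertools.product(locations, periods))

rng = random.Random(2000)          # fixed seed so the draw is reproducible
sample = rng.sample(frame, k=4)    # simple random sample without replacement
selection_prob = len(sample) / len(frame)
print(sample, selection_prob)
```

A worker present at several sampled location-periods has more than one chance of selection, which is the multiplicity that the estimation step has to account for.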
37
Time and Space Sampling:
Estimation
Contributors: Kalsbeek (1988); Kalton (1991)
Approaches:
– Multiplicity estimators similar to those used in
network samples
Special Requirements:
– Need multiplicity count for each sample member?
– Sampling scheme compromise needed between:
• Statistical precision of estimates
• Operational effectiveness
38
Time and Space Sampling:
Implications for Sampling
Race/Ethnic Minorities
Advantages:
– Deals with the fluidity of frame-population
linkage in mobile populations
– Provides a framework for finding a cost-
efficient solution
Disadvantages:
– Added complexity to selection, data gathering,
and analysis of sample
39
Disproportionate Stratified
Sampling with Screening:
Selection Approach
Premise:
– Concentrations of the targeted subgroup vary
in the population
– Sample strata with higher concentrations more
heavily
– Result: larger sample size for the target
subgroup relative to a proportionate sample
41
DSS with Screening:
EXAMPLE
Oversampling African-Americans
A simple process:
– Stratify the population
• By relatively high and low concentrations of African-
Americans
• High concentration areas in the South and large cities
– Sample with relatively higher rates in the high
concentration stratum
42
DSS with Screening: Estimation
Approaches:
– Weighted estimate to account for sample
disproportionality
– Effect of variable weights is to lower precision
of some population estimates
Special Requirements:
– Establishing the most cost-efficient overall and
stratum-specific sampling rates
43
DSS with Screening:
Implications for Sampling
Race/Ethnic Minorities
Advantages:
– Increased sample size for the targeted
subgroups
• There are also target subgroup non-members in the
(oversampled) high-concentration strata
Disadvantages:
– Loss in precision on overall population
estimates
44
A Two-Stratum Model for
Effects of Oversampling
Setting:
– Oversampling a minority group
• 10% of the population
– Two sampling strata:
• One with higher % minority (to oversample)
• One with lower % minority (to undersample)
– Two alternative sets of strata:
• Nearly Pure --- each stratum consists almost entirely of
members or of non-members
• Less Pure --- each stratum consists mostly of members
or of non-members
45
Nearly Pure Strata
[Figure: target population split into an oversampled stratum and an undersampled stratum]
46
Less Pure Strata
[Figure: target population split into an undersampled stratum and an oversampled stratum]
47
A Two-Stratum Model for
Effects of Oversampling
Assumptions:
– Simple random sampling in each stratum
– Stratum unit variances are equal
– Other minor simplifying conditions
48
A Two-Stratum Model for
Effects of Oversampling
Sample Sizes (Relative to Proportionate):
– Minority_Nom = Nominal Sample Size for Minority
• Observed increase in size of minority sample
• Due to oversampling of the predominantly minority stratum
– Minority_Eff = Effective Sample Size for Minority
• Adjusted size of minority sample
• Considering the (downward) effect of variable sample
weights on statistical quality of estimates
– Overall_Eff = Effective Size of Overall Sample
• Adjusted size of overall sample
• Considering the (downward) effect of variable sample
weights on statistical quality of estimates
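The three quantities can be illustrated with a small calculation. This is one plausible reading of the two-stratum model, not necessarily the formulas behind the presentation: it assumes a fixed total sample size, simple random sampling in each stratum, equal unit variances, and the Kish approximation for effective sample size (squared sum of weights over sum of squared weights). The stratum share (10% of the population) and the purity levels (95% and 60% minority in the oversampled stratum) are hypothetical:

```python
def oversampling_effects(k, P1, W1=0.10, P=0.10):
    """Sample sizes relative to proportionate allocation (illustrative model).

    k: ratio of sampling rates (oversampled stratum / undersampled stratum)
    P1: minority proportion in the oversampled stratum
    W1: population share of the oversampled stratum
    P: minority proportion in the whole population
    """
    W2 = 1.0 - W1
    P2 = (P - W1 * P1) / W2          # minority proportion in the other stratum
    # Shares of a fixed total sample falling in each stratum.
    n1 = k * W1 / (k * W1 + W2)
    n2 = W2 / (k * W1 + W2)

    def kish_effective(c1, c2):
        # (sum of weights)^2 / (sum of squared weights), with weights 1/k and 1.
        return (c1 / k + c2) ** 2 / (c1 / k ** 2 + c2)

    m1, m2 = n1 * P1, n2 * P2        # minority sample shares by stratum
    return {
        "Minority_Nom": (m1 + m2) / P,
        "Minority_Eff": kish_effective(m1, m2) / P,
        "Overall_Eff": kish_effective(n1, n2),
    }

for label, P1 in [("nearly pure", 0.95), ("less pure", 0.60)]:
    for k in (1, 5, 21):
        print(label, k, oversampling_effects(k, P1))
```

Under these assumptions all three ratios equal 1 when k = 1 (proportionate allocation), and the gap between nominal and effective minority sample size widens as the strata become less pure.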
49
Effects of Oversampling:
Nearly Pure Strata
[Figure: Effects of Oversampling on Sample Sizes, Nearly Pure Strata. Horizontal axis: Degree of Oversampling (Relative Sampling Rates), 1 to 21. Vertical axis: Sample Sizes Relative to Prop. Alloc., 0.0 to 7.0. Series: Overall_Eff, Minority_Nom, Minority_Eff.]
50
Effects of Oversampling:
Less Pure Strata
[Figure: Effects of Oversampling on Sample Size, Less Pure Strata. Horizontal axis: Degree of Oversampling (Relative Sampling Rates), 1 to 21. Vertical axis: Sample Sizes Relative to Prop. Alloc., 0.0 to 7.0. Series: Overall_Eff, Minority_Nom, Minority_Eff.]
51
Summary
Sampling rare ethnic groups is possible
BUT
Accomplishing it effectively is likely to be:
– Complex (dealing with multiplicity, dealing
with multiple frames, resolving statistical-
operational dilemmas)
– Costly (screening, stratification)
– Adverse effect on overall population estimates
(if oversampling done)
– Loss of sampling validity? (snowball sampling)
52
A Case-Study in Oversampling
Blacks and Mexican-Americans:
The Third National Health and Nutrition
Examination Survey (NHANES III)
53
Cluster Sampling:
Random selection applied to one or more
levels of a population hierarchy
“Sampling Stage” = Level of hierarchy at
which sampling is done
Jargon:
– PSU = Primary Sampling Unit is what is
sampled in the first selection stage
– SSU = Secondary Sampling Unit is what is
sampled in the second stage
54
Population Hierarchies:
[Figure: population hierarchy down to the population member]
(Based on Vital and Health Statistics, Series 2, Number 113)
55
Population Hierarchies:
EXAMPLE: African-American residents of the
US non-institutionalized household population
Resident > Household > Block Group >
Census Tract > Minor Civil Division >
County > State > US
56
NHANES III Overview
National health survey
U.S. civilian noninstitutionalized population
Stratified multi-stage sample design
Detailed profile and predictors of health status
Data gathering timeline:
– 1988-94
Data collected by:
– Face-to-face interviews in the home
– Detailed examination at mobile sites
57
NHANES III Target Population
U.S. residents
– Two months and older
– Including those living in Alaska and Hawaii
Civilians only
– Excludes housing on military bases
Noninstitutionalized population only
– Excludes some residents of hospitals, nursing homes,
prisons, and other comparable institutions
Eligibility determined as of the time of interview
58
NHANES III in General
Key minority domains:
– Black (non-Hispanic)
– Mexican American
– Children: 2 months – 5 years
– The Elderly: > 60 years
60
Stratification to Oversample
Key Minority Domains
Applied at:
The PSU level:
– Race/ethnicity or income indicator
The “segment” level:
– Density of Mexican-Americans
The household level:
– Race/ethnicity
The (sample) person level:
– Age
61
Oversampling of
Key Minority Domains
Implementation accomplished by:
– Disproportionate allocation favoring key
minority domains
– Using a weighted measure of size:
$Mos^{*}_{\alpha} = \sum_{j} \pi_j \, Mos_{\alpha j}$
$\pi_j$ = overall sampling probability for the j-th among all
cells of the cross-classification by the race/ethnicity
and age categories that define the key minority domains
$Mos_{\alpha j}$ = Measure of size for the same cross-classification
within the ($\alpha$-th) cluster
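The weighted measure of size can be illustrated numerically, reading it as the sum over domain cells of the cell's overall sampling probability times the cell's measure of size within the cluster. The cell names, probabilities, and counts below are hypothetical and are not NHANES III values:

```python
# Hypothetical overall sampling probabilities for three domain cells.
pi = {"black_60plus": 0.004, "mexam_60plus": 0.008, "other_20_59": 0.001}

# Hypothetical measures of size (population counts) for the same cells
# within two clusters.
clusters = {
    "cluster_1": {"black_60plus": 500, "mexam_60plus": 100, "other_20_59": 4000},
    "cluster_2": {"black_60plus": 50, "mexam_60plus": 900, "other_20_59": 4000},
}

# Weighted measure of size: sum over cells of probability times cell size.
weighted_mos = {
    name: sum(pi[cell] * count for cell, count in cells.items())
    for name, cells in clusters.items()
}
print(weighted_mos)
```

Clusters with more people in heavily sampled cells receive a larger weighted size, so selection proportional to this measure tilts the sample toward them.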
62
Stratification to Oversample
Key Minority Domains in NHANES III
Domain                 Approximate Population %    Approximate Sample %
Black, non-Hispanic    12                          31
Mexican American       5                           26
2 months – 5 years     7                           11
60+ years              21                          32
63
Stratification to Oversample
Key Minority Domains in NHANES III
Oversampling implies more widely variable
selection probabilities and sample weights
Effect of variable weights is to increase variances
of estimates
One model: Increased variance by a factor of,
$Deff = 1 + \frac{n-1}{n} \cdot \frac{s^2}{\bar{w}^2}$
$s^2$ = Variance of weights among sample respondents
$\bar{w}$ = Mean of weights among sample respondents
64
Stratification to Oversample Key
Minority Domains in NHANES III
EXAMPLE:
– Effect of variable sample weights on total population
estimates using data from the MEC-examined NHANES III
sample
$n = 23{,}561$
$\bar{w} = 9{,}397.04$
$s^2 = 12{,}405.33^2$
$Deff = 2.743$
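The example figures can be checked directly. The sketch reads the extracted values as a sample size of 23,561, a mean weight of 9,397.04, and a standard deviation of the weights of 12,405.33, and applies the design-effect model of one plus (n - 1)/n times the squared coefficient of variation of the weights. The effective sample size line is an added illustration, not a figure from the slides:

```python
n = 23561
mean_w = 9397.04
sd_w = 12405.33          # so the variance of the weights is sd_w ** 2

# Deff = 1 + ((n - 1) / n) * (s^2 / mean^2)
deff = 1 + ((n - 1) / n) * (sd_w ** 2 / mean_w ** 2)

# Illustration only: nominal sample size deflated by the design effect.
effective_n = n / deff
print(round(deff, 3), round(effective_n))
```

Under this model the variable weights alone nearly triple the variance of total population estimates relative to an equally weighted sample of the same size.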