ACS Public Use Microdata Samples of 2005 and 2006 –
How to Use the Replicate Weights
B. Dale Garrett and Michael Starsinic, U.S. Census Bureau
AAPOR Conference, New Orleans, May 16, 2008
Public Data
• The American Community Survey (ACS) produces an annual Public Use Microdata Sample (PUMS) file.
• You can download these files for free.
• Write your own program to tally and analyze data.
Key Points
• PUMS data users want to know the reliability of an estimate.
• This paper explains how to use PUMS replicate weights to estimate standard errors.
Outline
• the American Community Survey (ACS)
• the Public Use Microdata Sample (PUMS)
– sample design
– confidentiality
– weights
– standard errors
– issues with standard errors
The American Community Survey
• The 2005 ACS
– Sample of 250,000 housing units per month.
– Every county represented in the fifty states, District of Columbia and Puerto Rico.
– Collects population and housing characteristics.
• The 2006 ACS was similar but added
– A sample of both institutional and noninstitutional Group Quarters (GQ) population.
– GQ sample size was 16,000 persons per month.
PUMS Sample Design
• PUMS is a subsample of ACS
– Sort the ACS interviews on geography, mode of interview, types of housing units, demographics
– Sample size:
• one percent of the total housing units (HUs) and household (HH) persons in 2005 and 2006.
• one percent of total GQ persons in 2006.
– Systematic sampling at the state and PUMA level.
PUMA Definition
• PUMA - Public Use Microdata Area
– Designed for public release of information by local state officials.
– Large enough to achieve disclosure avoidance.
• An area of 100,000 population or more as of the 2000 Census.
PUMS Protects Confidentiality
• PUMS does not reveal:
– Names of persons.
– Address.
– Detailed type of group quarters.
– Geographic data below the PUMA level.
• The respondent’s identity is protected.
– Top-coding of age, income and other variables.
– Data swapping
– Synthetic data
– Perturbation of data
[Figure: Rural PUMAs in KY]
[Figure: PUMAs in Baltimore Co., MD]
PUMS Weighting
• The PUMS initial weight was equal to the ACS final weight times the sampling interval.
• The 2006 PUMS file was ratio-estimated to ACS
– persons in households by sex by PUMA
– housing units by vacant/occupied by PUMA
– persons in group quarters by institutional/noninstitutional by state
How to Program an Estimate – Counts, Aggregates, Ratios, Medians
• Totals (counts)
– Sum the PUMS weights (for the characteristic).
• Aggregates
– Sum the product of the PUMS weight times the value.
• Ratios
– Form the total or aggregate for the numerator.
– Sum the PUMS weights for the characteristic in the denominator.
– Divide.
• Medians
– Use weighted distributions.
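Totals, aggregates and ratios can be sketched in a few lines of Python. This is a minimal illustration, not Census Bureau code: the toy records, the field names (`w`, `employed`, `income`) and the helper functions are hypothetical stand-ins for a PUMS extract and its weight variable.

```python
# Hypothetical toy records; "w" stands in for the PUMS weight variable.
records = [
    {"w": 100, "employed": True, "income": 30000},
    {"w": 150, "employed": False, "income": 0},
    {"w": 250, "employed": True, "income": 50000},
]


def total(records, weight, has_characteristic):
    """Total (count): sum the weights of records with the characteristic."""
    return sum(r[weight] for r in records if has_characteristic(r))


def aggregate(records, weight, value, has_characteristic):
    """Aggregate: sum weight times value over records with the characteristic."""
    return sum(r[weight] * r[value] for r in records if has_characteristic(r))


employed = total(records, "w", lambda r: r["employed"])
everyone = total(records, "w", lambda r: True)
employed_income = aggregate(records, "w", "income", lambda r: r["employed"])

# Ratios: form the numerator and the denominator, then divide.
employment_rate = employed / everyone
mean_income_of_employed = employed_income / employed

print(employed, everyone, employed_income)
print(employment_rate, mean_income_of_employed)
```

Passing the weight field by name keeps the same functions usable later with each replicate weight.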
ACS Standard Errors
• The ACS uses the successive difference model of replicate weights to estimate standard errors.
• The successive difference model of Kirk Wolter was developed for ACS by Robert Fay and George Train.
http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf
Two Methods for PUMS Standard Errors
• Design factor method
– Design factors are factors to multiply times the standard error of a simple random sample.
– Easier to use than the replicate weights.
• Replicate weight method
– Generally, you get a more accurate standard error estimate by using the replicate weights.
– Somewhat more work than design factors.
http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf
Three Steps to Standard Errors Using Replicate Weights
• Write a program to derive an estimate using the PUMS weight.
• Run the program 80 more times using each of the 80 replicate weights.
• Use the PUMS estimate and the 80 replicate estimates in the Standard Error formula.
ACS PUMS Replicate Weight Formula for a Standard Error
$$SE(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} (X_r - X)^2}$$
• where:
– X is the estimate formed from the PUMS weight
– X_r is the estimate formed from the rth replicate weight.
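The three steps and the formula can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' program: the field names `w0` (full weight) and `w1` to `w80` (replicate weights) are hypothetical, and the 1.5 / 0.5 replicate factors are made up so the arithmetic is easy to check by hand.

```python
import math


def replicate_se(full_estimate, replicate_estimates):
    """SE = sqrt(4/80 * sum over r of (X_r - X)^2), for 80 replicates."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    sum_sq = sum((x_r - full_estimate) ** 2 for x_r in replicate_estimates)
    return math.sqrt(4.0 / 80.0 * sum_sq)


def weighted_total(records, weight):
    """A weighted count using the named weight field."""
    return sum(r[weight] for r in records)


# Hypothetical records: full weight "w0", replicate weights "w1".."w80".
# The 1.5 / 0.5 factors are invented for this example only.
records = []
for i, base in enumerate([100, 200, 300, 400]):
    rec = {"w0": base}
    for r in range(1, 81):
        rec[f"w{r}"] = base * (1.5 if (i + r) % 2 == 0 else 0.5)
    records.append(rec)

# Step 1: the estimate from the full weight.
x = weighted_total(records, "w0")
# Step 2: the same estimate from each of the 80 replicate weights.
x_reps = [weighted_total(records, f"w{r}") for r in range(1, 81)]
# Step 3: plug both into the standard error formula.
se = replicate_se(x, x_reps)
print(x, se)
```

With these toy weights the estimate is 1,000, every replicate estimate is 900 or 1,100, and the standard error works out to 200 (up to floating-point rounding).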
Standard Errors of Differences
• There are two estimates, A and B.
• You want to use a Z-test to see if the difference (A – B) is significant.
• The Z-test requires the standard error of the difference.
For Independent Estimates
• SE_{A-B} – the standard error of (A – B)
• SE_A – the standard error of estimate A
• SE_B – the standard error of estimate B
$$SE_{A-B} = \sqrt{SE_A^2 + SE_B^2}$$
Use the standard errors of the two estimates to estimate the standard error of the difference.
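As a small worked sketch of the independent case, the function names and the input numbers below are made up for illustration. The Z statistic is the difference divided by its standard error, to be compared with the critical value for whatever confidence level you choose.

```python
import math


def se_of_difference(se_a, se_b):
    """Standard error of (A - B) when A and B are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def z_statistic(a, b, se_a, se_b):
    """Z statistic for testing whether (A - B) differs from zero."""
    return (a - b) / se_of_difference(se_a, se_b)


# Made-up estimates and standard errors, for illustration only.
se_diff = se_of_difference(3.0, 4.0)
z = z_statistic(120.0, 100.0, 3.0, 4.0)
print(se_diff, z)
```

Here the standard error of the difference is 5 and the Z statistic is 4.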
For Correlated Estimates
• Directly use the replicate weights to calculate the standard error of the difference.
– Let X = (A - B) = the difference
– Let X_r = (A_r – B_r)
• for the 80 replicate differences X_1 … X_80
• Use the replicate weight formula (seen earlier).
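A hedged sketch of the correlated case: form the difference for the full weight and for each replicate, then apply the replicate weight formula to the differences. The function name and the replicate estimates are invented for illustration; they are built so that A and B move together.

```python
import math


def replicate_se_of_difference(a, b, a_reps, b_reps):
    """Replicate-weight standard error of X = A - B, using X_r = A_r - B_r."""
    if len(a_reps) != 80 or len(b_reps) != 80:
        raise ValueError("expected 80 replicate estimates for each of A and B")
    x = a - b
    x_reps = [a_r - b_r for a_r, b_r in zip(a_reps, b_reps)]
    return math.sqrt(4.0 / 80.0 * sum((x_r - x) ** 2 for x_r in x_reps))


# Made-up replicate estimates in which A and B rise and fall together.
a, b = 1000.0, 600.0
a_reps = [1010.0 if r % 2 == 0 else 990.0 for r in range(1, 81)]
b_reps = [606.0 if r % 2 == 0 else 594.0 for r in range(1, 81)]

se_direct = replicate_se_of_difference(a, b, a_reps, b_reps)
print(se_direct)
```

In this toy case the direct standard error is 8. The individual standard errors are 20 and 12, so treating A and B as independent would give about 23.3 and overstate the uncertainty of the difference.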
Replicate Weight Issues
• Estimate is zero, standard error is not zero.
– Cannot use replicate weights to estimate the standard error.
– See the PUMS Accuracy document for a formula.
• The replicate standard error is zero, estimate is not zero.
– Zero means that if you reselected the sample the answer would be the same.
– Acceptable if estimate controlled in the weighting.
– Not acceptable if the estimate is a median. Often a direct median gives a zero standard error.
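The zero standard error for a direct median can be reproduced with a toy example. This is an illustrative sketch only: the five incomes and weights, the simplified `direct_median` helper and the 1.5 / 0.5 replicate factors are all assumptions made for the demonstration. Because every replicate puts the 50% point on the same record, all 80 replicate medians equal the full-sample median.

```python
import math


def direct_median(values, weights):
    """Simplified direct median: value of the record at which the
    cumulative weight first exceeds 50% of the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running > half:
            return value
    raise ValueError("no records with positive weight")


# Hypothetical incomes and full-sample weights.
incomes = [18000, 33000, 41000, 49000, 62000]
base_weights = [18, 22, 20, 15, 25]

full_median = direct_median(incomes, base_weights)

# Invented replicate weights: each record's weight is scaled by 1.5 or 0.5.
replicate_medians = []
for r in range(1, 81):
    w_r = [w * (1.5 if (i + r) % 2 == 0 else 0.5)
           for i, w in enumerate(base_weights)]
    replicate_medians.append(direct_median(incomes, w_r))

sum_sq = sum((m - full_median) ** 2 for m in replicate_medians)
se = math.sqrt(4.0 / 80.0 * sum_sq)
print(full_median, se)
```

The weights change in every replicate, yet the median never leaves the 41,000 record, so the formula returns zero even though the estimate is clearly subject to sampling error.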
Standard Error Options for Medians
• Direct median with replicate weights may give a zero standard error, which understates the sampling variability of the estimate.
• Categorical median with replicate weights will give a more stable standard error, but still some zero standard errors.
• Design factor method
– Start with either the direct or categorical median, use design factors for the standard error.
Conclusion
• Replicate weights for ACS PUMS are:
– Available for 2005 PUMS and later.
– Easy to use for most estimates.
– Few issues
• For medians
– Replicate weight standard errors may be zeros.
– To avoid the zeros use the design factor method.
References
• US Census Bureau: Accuracy of the Data (2006) for ACS is found at:
– http://www.census.gov/acs/www/Downloads/ACS/accuracy2006.pdf
• US Census Bureau: PUMS Accuracy of the Data (2006) is found at:
– http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.
• US Census Bureau: Design and Methodology: American Community Survey, Technical Paper 67, May 2006,
– http://www.census.gov/acs/www/Downloads/tp67.pdf
• Fay & Train, Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties, 1995
– http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf
Contact Information
• For questions about this presentation, or for an example program to generate standard errors, contact me at B.Dale.Garrett@census.gov
Views expressed in this paper are those of the authors and not necessarily those of the U.S. Census Bureau.
How to Derive an Estimate – Direct Medians
• The direct median is the weighted sample median or the distributional median.
• Sum the weights for the characteristic total.
• Sort the file on the value of interest.
• Sum the weights until the 50% point.
• The direct median is the value of the record which crosses the 50% point.
• Or a point between the values of two records that divide the file into two exact halves.
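The steps above can be sketched as a short function. This is a minimal sketch, not Census Bureau code: the function name is hypothetical, and taking the midpoint when two records split the file into exact halves is one assumed reading of "a point between the values". The sample data are the five records of this deck's direct median example, with the percent of total used as the weight.

```python
def direct_median(value_weight_pairs):
    """Weighted sample median of (value, weight) pairs."""
    pairs = sorted(value_weight_pairs)          # sort on the value of interest
    total = sum(weight for _, weight in pairs)  # characteristic total
    if total <= 0:
        raise ValueError("need at least one record with positive weight")
    half = total / 2.0
    running = 0.0
    for i, (value, weight) in enumerate(pairs):
        running += weight                       # sum weights toward 50%
        if running > half:
            return value                        # record that crosses 50%
        if running == half:
            # Two records divide the file into exact halves: take the
            # midpoint of their values (an assumption of this sketch).
            return (value + pairs[i + 1][0]) / 2.0
    raise ValueError("unreachable when weights are non-negative")


five_records = [(18000, 18), (33000, 22), (41000, 20), (49000, 15), (62000, 25)]
median_income = direct_median(five_records)
tie_example = direct_median([(10, 1), (20, 1)])
print(median_income, tie_example)
```

The cumulative percentages are 18, 40 and 60, so the third record crosses the 50% point and the direct median is 41,000.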
How to Derive an Estimate – Categorical Medians
• Categorical or interpolated medians.
– Used for published ACS statistics in Factfinder.
• Categorical medians are interpolations:
– A weighted distribution of the characteristic.
– Each bin or row is assigned a range of values.
– Uses linear interpolation for most variables.
Direct Median Example Based on 5 Records
Record #   Percent of Total   Income from record   Direct median
1          18%                18,000
2          22%                33,000
3          20%                41,000               41,000
4          15%                49,000
5          25%                62,000
Direct and Categorical Medians Example Based on 5 Records
Income Range        Record #   Percent of Total   Income from record   Direct median   Categorical median
-59,000 to 20,000   1          18%                18,000
20,000 to 40,000    2          22%                33,000
40,000 to 60,000    3          20%                41,000               41,000          45,700
                    4          15%                49,000
60,000 +            5          25%                62,000
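The categorical median in this example can be reproduced with linear interpolation inside the bin that contains the 50% point. This is a sketch under assumptions: the function name is hypothetical, and the bins are simply the four income ranges of the example, with records 3 and 4 combined into the 40,000 to 60,000 bin (20% + 15% = 35%). The bins and interpolation rules used for published ACS statistics may differ.

```python
def categorical_median(bins):
    """Interpolated median from (lower, upper, weight) bins in ascending
    order. `upper` may be None for an open-ended top bin."""
    total = sum(weight for _, _, weight in bins)
    if total <= 0:
        raise ValueError("need positive total weight")
    half = total / 2.0
    below = 0.0  # cumulative weight of the bins below the current one
    for lower, upper, weight in bins:
        if weight > 0 and below + weight >= half:
            if upper is None:
                raise ValueError("median falls in an open-ended bin")
            # Linear interpolation within the bin holding the 50% point.
            return lower + (half - below) / weight * (upper - lower)
        below += weight
    raise ValueError("unreachable when weights are non-negative")


# Weighted distribution from the five-record example (percent of total).
income_bins = [
    (-59000, 20000, 18),
    (20000, 40000, 22),
    (40000, 60000, 35),   # records 3 and 4: 20% + 15%
    (60000, None, 25),
]
median_income = categorical_median(income_bins)
print(round(median_income, -2))
```

Forty percent of the weight lies below 40,000, so the median sits 10/35 of the way through the 40,000 to 60,000 bin: 40,000 + (10/35) × 20,000 ≈ 45,714, which rounds to the 45,700 shown in the table.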