ACS Public Use Microdata Samples of 2005 and 2006 –

How to Use the Replicate Weights

B. Dale Garrett and Michael Starsinic

U.S. Census Bureau

AAPOR Conference, New Orleans

May 16, 2008

Public Data

• The American Community Survey (ACS) produces an annual Public Use Microdata Sample (PUMS) file.

• You can download these files for free.

• Write your own program to tally and analyze data.

Key Points

• PUMS data users want to know the reliability of an estimate.

• This paper explains how to use PUMS replicate weights to estimate standard errors.

Outline

• the American Community Survey (ACS)

• the Public Use Microdata Sample (PUMS)

– sample design

– confidentiality

– weights

– standard errors

– issues with standard errors

The American Community Survey

• The 2005 ACS

– Sample of 250,000 housing units per month.

– Every county represented in the fifty states, District of Columbia and Puerto Rico.

– Collects population and housing characteristics.

• The 2006 ACS was similar but added

– A sample of both institutional and noninstitutional Group Quarters (GQ) population.

– GQ sample size was 16,000 persons per month.

PUMS Sample Design

• PUMS is a subsample of ACS

– Sort the ACS interviews on geography, mode of interview, types of housing units, demographics

– Sample size:

• one percent of the total housing units (HUs) and household (HH) persons in 2005 and 2006.

• one percent of total GQ persons in 2006.

– Systematic sampling at the state and PUMA level.
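A minimal Python sketch of systematic sampling from a sorted list, assuming a fixed sampling interval and a random start. The function name, the interval of 2.5, and the toy frame are illustrative assumptions; the actual PUMS selection procedure is not spelled out in these slides.

```python
import random

def systematic_sample(sorted_records, interval, rng):
    """Select every `interval`-th record from an already sorted list.

    `interval` may be fractional but must be at least 1; `rng` is a
    random.Random instance used to pick the random start.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    start = rng.random() * interval          # random start in [0, interval)
    selected = []
    k = 0
    while start + k * interval < len(sorted_records):
        selected.append(sorted_records[int(start + k * interval)])
        k += 1
    return selected

# Hypothetical frame: 1,000 interviews already sorted on the sort keys.
frame = list(range(1000))
sample = systematic_sample(frame, interval=2.5, rng=random.Random(2008))
print(len(sample))
```

Sorting before selection spreads the sample across the sort variables, which is why the sort step comes first.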

PUMA Definition

• PUMA - Public Use Microdata Area

– Designed for public release of information by local state officials.

– Large enough to achieve disclosure avoidance.

• An area of 100,000 population or more as of the 2000 Census.

PUMS Protects Confidentiality

• PUMS does not reveal:

– Names of persons.

– Address.

– Detailed type of group quarters.

– Geographic data below the PUMA level.

• The respondent’s identity is protected.

– Top-coding of age, income and other variables.

– Data swapping.

– Synthetic data.

– Perturbation of data.

[Figure: Rural PUMAs in KY]

[Figure: PUMAs in Baltimore Co., MD]

PUMS Weighting

• The PUMS initial weight was equal to the ACS final weight times the sampling interval.

• The 2006 PUMS file was ratio-estimated to ACS

– persons in households by sex by PUMA

– housing units by vacant/occupied by PUMA

– persons in group quarters by institutional/noninstitutional by state
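Ratio estimation of this kind can be sketched in Python as follows. This is an illustration only, assuming one adjustment cell per record and a single pass. The cell labels, control totals, and function name are hypothetical, and the production weighting is likely more involved.

```python
from collections import defaultdict

def ratio_adjust(records, control_totals):
    """Scale weights so each cell's weighted total matches its control total.

    `records` is a list of dicts with hypothetical keys "cell" and "weight";
    `control_totals` maps each cell to the total it should sum to.
    """
    weighted = defaultdict(float)
    for rec in records:
        weighted[rec["cell"]] += rec["weight"]
    adjusted = []
    for rec in records:
        factor = control_totals[rec["cell"]] / weighted[rec["cell"]]
        adjusted.append({**rec, "weight": rec["weight"] * factor})
    return adjusted

# Hypothetical initial weights and control totals for two cells.
records = [
    {"cell": "PUMA 00100, male", "weight": 100.0},
    {"cell": "PUMA 00100, male", "weight": 150.0},
    {"cell": "PUMA 00100, female", "weight": 200.0},
]
controls = {"PUMA 00100, male": 300.0, "PUMA 00100, female": 190.0}
adjusted = ratio_adjust(records, controls)
print([round(rec["weight"], 1) for rec in adjusted])
```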

How to Program an Estimate – Counts, Aggregates, Ratios, Medians

• Totals (counts)

– Sum the PUMS weights (for the characteristic).

• Aggregates

– Sum the product of the PUMS weight times the value of the characteristic.

• Ratios

– Form the total or aggregate for the numerator.

– Sum the PUMS weights for the characteristic in the denominator.

– Divide.

• Medians

– Use weighted distributions.
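The totals, aggregates, and ratios above can be sketched in Python. This is a minimal illustration on three made-up person records. The weight name PWGTP and age name AGEP follow the PUMS file layout as best understood here and should be treated as assumptions; the "income" field and the helper names are hypothetical.

```python
def weighted_total(records, weight_key, has_characteristic):
    """Total (count): sum the weights of records with the characteristic."""
    return sum(rec[weight_key] for rec in records if has_characteristic(rec))

def weighted_aggregate(records, weight_key, value_key, has_characteristic):
    """Aggregate: sum of weight times value over records with the characteristic."""
    return sum(rec[weight_key] * rec[value_key]
               for rec in records if has_characteristic(rec))

def everyone(rec):
    return True

def aged_65_plus(rec):
    return rec["AGEP"] >= 65

# Made-up records for illustration only.
people = [
    {"PWGTP": 100, "AGEP": 30, "income": 20000},
    {"PWGTP": 150, "AGEP": 70, "income": 10000},
    {"PWGTP": 50, "AGEP": 45, "income": 40000},
]

population = weighted_total(people, "PWGTP", everyone)
count_65_plus = weighted_total(people, "PWGTP", aged_65_plus)
aggregate_income = weighted_aggregate(people, "PWGTP", "income", everyone)

# Ratios: numerator total or aggregate divided by a denominator total.
share_65_plus = count_65_plus / population
mean_income = aggregate_income / population
print(count_65_plus, aggregate_income, share_65_plus, round(mean_income, 2))
```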

ACS Standard Errors

• The ACS uses the successive difference model of replicate weights to estimate standard errors.

• The successive difference model of Kirk Wolter was developed for ACS by Robert Fay and George Train.

http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf

Two Methods for PUMS Standard Errors

• Design factor method

– Design factors are factors to multiply times the standard error of a simple random sample.

– Easier to use than the replicate weights.

• Replicate weight method

– Generally, you get a more accurate standard error estimate by using the replicate weights.

– Somewhat more work than design factors.

http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.pdf

Three Steps to Standard Errors Using Replicate Weights

• Write a program to derive an estimate using the PUMS weight.

• Run the program 80 more times using each of the 80 replicate weights.

• Use the PUMS estimate and the 80 replicate estimates in the Standard Error formula.
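The first two steps can be sketched in Python as one estimate function run 81 times, once per weight column. This is a sketch under stated assumptions: the replicate weight columns are assumed to be named PWGTP1 through PWGTP80 alongside the full weight PWGTP, and the "has_characteristic" flag and toy file are hypothetical. The results feed the standard error formula in the third step.

```python
def weighted_count(records, weight_key):
    """Weighted count of records flagged as having the characteristic."""
    return sum(rec[weight_key] for rec in records if rec["has_characteristic"])

def full_and_replicate_estimates(records, base_weight="PWGTP", n_replicates=80):
    """Return the PUMS estimate and the list of replicate estimates."""
    full = weighted_count(records, base_weight)                    # step 1
    replicates = [weighted_count(records, f"{base_weight}{r}")     # step 2
                  for r in range(1, n_replicates + 1)]
    return full, replicates

# Hypothetical toy file: three records with made-up weights.
toy_file = []
for flag in (True, False, True):
    rec = {"has_characteristic": flag, "PWGTP": 100}
    for r in range(1, 81):
        rec[f"PWGTP{r}"] = 100 + r
    toy_file.append(rec)

x, x_reps = full_and_replicate_estimates(toy_file)
print(x, len(x_reps))
```

Passing the weight column name as a parameter keeps the estimate logic identical across all 81 runs, so only the weights differ.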

ACS PUMS Replicate Weight Formula for a Standard Error

$$SE(X) = \sqrt{\frac{4}{80} \sum_{r=1}^{80} (X_r - X)^2}$$

• where:

– X is the estimate formed from the PUMS weight.

– Xr is the estimate formed from the rth replicate weight.
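A minimal Python sketch of the standard error calculation, assuming the form given in the ACS PUMS accuracy documentation: the square root of 4/80 times the sum of squared differences between each replicate estimate and the PUMS estimate. The function name and the example numbers are hypothetical.

```python
import math

def replicate_standard_error(x, replicate_estimates):
    """Standard error from the PUMS estimate `x` and 80 replicate estimates."""
    if len(replicate_estimates) != 80:
        raise ValueError("expected 80 replicate estimates")
    sum_sq = sum((xr - x) ** 2 for xr in replicate_estimates)
    return math.sqrt(4.0 / 80.0 * sum_sq)

# Made-up example: every replicate estimate is 10 above the PUMS estimate.
se = replicate_standard_error(200.0, [210.0] * 80)
print(se)
```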

Standard Errors of Differences

• There are two estimates, A and B.

• You want to use a Z-test to see if the difference (A – B) is significant.

• The Z-test requires the standard error of the difference.

For Independent Estimates

• SE(A-B) – the standard error of (A – B)

• SE(A) – the standard error of estimate A

• SE(B) – the standard error of estimate B

$$SE_{(A-B)} = \sqrt{SE_A^2 + SE_B^2}$$

Use the standard errors of the two estimates to estimate the standard error of the difference.
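For independent estimates, the standard error of the difference is the square root of the sum of the two squared standard errors. A short Python sketch with made-up numbers follows; the function names are hypothetical.

```python
import math

def se_difference_independent(se_a, se_b):
    """Standard error of (A - B) when A and B are independent."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

def z_statistic(a, b, se_diff):
    """Z statistic for testing whether the difference (A - B) is significant."""
    return (a - b) / se_diff

# Made-up estimates and standard errors.
se_diff = se_difference_independent(3.0, 4.0)
z = z_statistic(110.0, 100.0, se_diff)
print(se_diff, z)
```

The absolute value of Z is then compared against the critical value for the chosen confidence level.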

For Correlated Estimates

• Directly use the replicate weights to calculate the standard error of the difference.

– Let X = (A - B) = the difference

– Let Xr = (Ar – Br)

• for the 80 replicate differences X1 … X80

• Use the replicate weight formula (seen earlier).
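A Python sketch of the correlated case, assuming the replicate formula takes the 4/80 form from the ACS PUMS accuracy documentation. The difference is formed within each replicate before the formula is applied. The function name and the numbers are hypothetical.

```python
import math

def se_difference_correlated(a, b, a_reps, b_reps):
    """Standard error of (A - B) computed directly from replicate differences."""
    if len(a_reps) != 80 or len(b_reps) != 80:
        raise ValueError("expected 80 replicate estimates for each of A and B")
    x = a - b                                               # X = A - B
    x_reps = [ar - br for ar, br in zip(a_reps, b_reps)]    # Xr = Ar - Br
    return math.sqrt(4.0 / 80.0 * sum((xr - x) ** 2 for xr in x_reps))

# Made-up example: A and B move together across replicates.
se_corr = se_difference_correlated(100.0, 60.0, [110.0] * 80, [65.0] * 80)
print(se_corr)
```

In this made-up case the individual standard errors would be 20 and 10, so the independent-estimates formula would give about 22.4, while the direct calculation gives 10 because the two estimates move in the same direction.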

Replicate Weight Issues

• Estimate is zero, standard error is not zero.

– Cannot use replicate weights to estimate the standard error.

– See the PUMS Accuracy document for a formula.

• The replicate standard error is zero, estimate is not zero.

– Zero means that if you reselected the sample the answer would be the same.

– Acceptable if the estimate is controlled in the weighting.

– Not acceptable if the estimate is a median. Often a direct median gives a zero standard error.

Standard Error Options for Medians

• Direct median with replicate weights may give a zero standard error. This is not good.

• Categorical median with replicate weights will give a more stable standard error, but still some zero standard errors.

• Design factor method – Start with either the direct or categorical median, use design factors for the standard error.

Conclusion

• Replicate weights for ACS PUMS are:

– Available for 2005 PUMS and later.

– Easy to use for most estimates.

– Subject to few issues.

• For medians

– Replicate weight standard errors may be zeros.

– To avoid the zeros use the design factor method.

References

• US Census Bureau: Accuracy of the Data (2006) for ACS is found at:

– http://www.census.gov/acs/www/Downloads/ACS/accuracy2006.pdf

• US Census Bureau: PUMS Accuracy of the Data (2006) is found at:

– http://acsweb2.acs.census.gov/acs/www/Downloads/2006/AccuracyPUMS.

• US Census Bureau: Design and Methodology: American Community Survey, Technical Paper 67, May 2006,

– http://www.census.gov/acs/www/Downloads/tp67.pdf

• Fay & Train, Aspects of Survey and Model-Based Postcensal Estimation of Income and Poverty Characteristics for States and Counties, 1995

– http://www.census.gov/hhes/www/saipe/asapaper/FayTrain95.pdf

Contact Information

• For questions about this presentation, or for an example program to generate standard errors, contact me at B.Dale.Garrett@census.gov

Views expressed in this paper are those of the authors and not necessarily those of the U.S. Census Bureau.

How to Derive an Estimate – Direct Medians

• The direct median is the weighted sample median or the distributional median.

• Sum the weights for the characteristic total.

• Sort the file on the value of interest.

• Sum the weights until the 50% point.

• The direct median is the value of the record which crosses the 50% point.

• Or a point between the values of two records that divide the file into two exact halves.
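The direct median steps can be sketched in Python as below, using the five-record example from these slides (percent of total as the weight). Taking the midpoint when two records split the file into exact halves is an assumption, since the slides only say "a point between" the two values; the function name is hypothetical.

```python
def direct_median(records):
    """Weighted sample median of (value, weight) pairs."""
    ordered = sorted(records, key=lambda rec: rec[0])   # sort on the value
    total = sum(weight for _, weight in ordered)        # characteristic total
    half = total / 2.0
    running = 0.0
    for i, (value, weight) in enumerate(ordered):
        running += weight
        if running > half:
            return value                                # record crossing 50%
        if running == half and i + 1 < len(ordered):
            # Exact halves: midpoint of the two adjacent values (assumption).
            return (value + ordered[i + 1][0]) / 2.0
    raise ValueError("no records with positive weight")

# (income, percent of total) for the five example records.
five_records = [(18000, 18), (33000, 22), (41000, 20), (49000, 15), (62000, 25)]
median_income = direct_median(five_records)
print(median_income)
```

Because the result is always tied to record values, small changes in the weights often leave it unchanged, which is consistent with the zero replicate standard errors noted for direct medians.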

How to Derive an Estimate – Categorical Medians

• Categorical or interpolated medians.

– Used for published ACS statistics in Factfinder.

• Categorical medians are interpolations:

– A weighted distribution of the characteristic.

– Each bin or row is assigned a range of values.

– Uses linear interpolation for most variables.
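A minimal Python sketch of a categorical median with linear interpolation, using the income bins and percentages from the five-record example in these slides. The function name and the handling of the open-ended top bin are assumptions; the interpolation used for published ACS statistics may differ in detail.

```python
def categorical_median(bins):
    """Interpolated median from (lower, upper, weight) bins sorted by lower bound.

    An open-ended bin is given with upper=None and cannot be interpolated.
    """
    total = sum(weight for _, _, weight in bins)
    half = total / 2.0
    cumulative = 0.0
    for lower, upper, weight in bins:
        if weight > 0 and cumulative + weight >= half:
            if upper is None:
                raise ValueError("median falls in an open-ended bin")
            fraction = (half - cumulative) / weight     # share of the bin needed
            return lower + fraction * (upper - lower)
        cumulative += weight
    raise ValueError("no bins with positive weight")

# Bins from the example: records 3 and 4 (20% + 15%) share the 40,000 to 60,000 bin.
income_bins = [
    (-59000, 20000, 18),
    (20000, 40000, 22),
    (40000, 60000, 35),
    (60000, None, 25),
]
median_income = categorical_median(income_bins)
print(round(median_income, -2))
```

Here 40% of the weight lies below 40,000, so the median sits 10/35 of the way into the 40,000 to 60,000 bin, about 45,714, which appears to match the 45,700 shown in the example after rounding.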

Direct Median Example Based on 5 Records

Record #   Percent of Total   Income from record   Direct median
1          18%                18,000
2          22%                33,000
3          20%                41,000               41,000
4          15%                49,000
5          25%                62,000

Direct and Categorical Medians Example Based on 5 Records

Income              Record #   Percent of Total   Income from record   Direct median   Categorical median
-59,000 to 20,000   1          18%                18,000
20,000 to 40,000    2          22%                33,000
40,000 to 60,000    3          20%                41,000               41,000          45,700
                    4          15%                49,000
60,000 +            5          25%                62,000
