Can We Trust Data Users to Consider Data Quality?
Presented at the 2008 European Conference on Quality in Official Statistics
Background
The American Community Survey (ACS) is an innovative approach for collecting and publishing demographic, social, economic, and housing data
National sample of about 3 million addresses each year
Background
Combining ACS samples over time permits publication for the smallest geographic areas
Combining ACS samples over space permits publication for shorter time periods
ACS Data Release Schedule
Year of Data Release
Type of Estimate   2006   2007   2008        2009        2010        2011
5-year             NA     NA     NA          NA          2005-2009   2006-2010
3-year             NA     NA     2005-2007   2006-2008   2007-2009   2008-2010
1-year             2005   2006   2007        2008        2009        2010
NA: Not Available
Dissemination Options
5-year estimates released for all geographic areas to produce data similar to census sample data
1-year and 3-year estimates released only for a subset of these geographic areas
ACS Data Users
Technically advanced users have the experience and can usually be trusted to consider quality
Novice users who lack this experience may not understand or take quality into account
Consequences
Release of data that are not perceived as credible leads to loss of trust in the integrity of the survey in general
“ACS strikes again! It’s hard to believe that the Census Bureau expects users to accept these numbers.”
ACS Dissemination Philosophy
Release as many data as possible to as many areas as possible while being certain that confidentiality is retained
Produce accompanying information on sampling error and educational materials for users
Methods
1-year estimates are only published for geographic areas with a minimum population of 65,000
Products reflect the use of a table-based data release rule and the availability of detailed and collapsed tables
Example of a Collapsed Table
Example - Margins of Error
Example - Confidence Intervals
Example - Statistical Testing
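The three examples above (margins of error, confidence intervals, statistical testing) can be sketched in a few lines. This is a minimal illustration, not the Census Bureau's own tooling: it assumes that the published margin of error is at the 90 percent confidence level (z = 1.645) and that the two estimates being compared are independent. The function names and the sample figures are hypothetical.

```python
import math

# Assumption: the published margin of error (MOE) is at the 90 percent level.
Z_90 = 1.645


def standard_error(moe, z=Z_90):
    """Recover the standard error from a published margin of error."""
    return moe / z


def confidence_interval(estimate, moe):
    """Return (lower, upper) bounds: the estimate plus or minus its MOE."""
    return (estimate - moe, estimate + moe)


def is_significantly_different(est1, moe1, est2, moe2, z=Z_90):
    """Test whether two independent estimates differ at the level implied by z."""
    se1 = standard_error(moe1, z)
    se2 = standard_error(moe2, z)
    z_score = (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)
    return abs(z_score) > z


# Hypothetical figures, for illustration only.
low, high = confidence_interval(1000, 200)
differs = is_significantly_different(1000, 100, 1500, 100)
```

A user who checks only whether two intervals overlap can reach a different conclusion than the test gives, which is one reason the testing step is worth showing separately.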
Other Educational Materials
Website includes numerous documents describing survey methods and survey quality
Separate ACS web page on Quality Measures
Review of 2006 ACS Data
Summary of the total estimates produced from the ACS sample
Reliability of published estimates
Effectiveness of publication thresholds and data release rules
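Estimate Size – 2006 ACS
Number of published (blue) and not published (grey) estimates (in millions), by estimate size
[Bar chart; the pairing of values with categories and series is not recoverable from the transcript]
Chart values, in transcript order: 5.1, 15.2, 9.1, 27.5, 11.9, 9, 18.3, 35.1, 7.5, 3.7, 10.1, 4.4, 3.8, 8.3
Categories: Estimates of Zero; Less than 0.5%; 0.5 - 1.0%; 1.0 - 5.0%; 5.0 - 10.0%; 10.0 - 20.0%; Greater than 20%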
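Reliability of 2006 ACS Estimates
Number of published estimates (in millions) with CVs of less than 30% (blue) and CVs of 30% or greater (red), by estimate size
[Bar chart; the pairing of values with categories and series is not recoverable from the transcript]
Chart values, in transcript order: 2.8, 3.7, 20.5, 11.0, 8.6, 18.0, 5.1, 12.4, 5.4, 7.0, 0.9, 0.5, 0.4
Categories: Estimates of Zero; Less than 0.5%; 0.5 - 1.0%; 1.0 - 5.0%; 5.0 - 10.0%; 10.0 - 20.0%; Greater than 20%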
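The reliability chart classifies estimates by coefficient of variation (CV), using 30% as the cutoff. The sketch below shows one way a user might compute a CV from a published estimate and margin of error. It assumes a 90 percent margin of error (z = 1.645); the function names are hypothetical, and treating the CV of a zero estimate as undefined is a choice made for this illustration.

```python
# Assumption: the published margin of error (MOE) is at the 90 percent level.
Z_90 = 1.645


def coefficient_of_variation(estimate, moe, z=Z_90):
    """Return the CV in percent, or None when the estimate is zero."""
    if estimate == 0:
        return None  # the ratio is undefined for an estimate of zero
    standard_error = moe / z
    return 100.0 * standard_error / abs(estimate)


def meets_reliability_cutoff(estimate, moe, cutoff=30.0):
    """True when the CV is defined and below the cutoff (30% on the slide)."""
    cv = coefficient_of_variation(estimate, moe)
    return cv is not None and cv < cutoff
```

For a fixed margin of error, the CV grows as the estimate shrinks, so small estimates are the most likely to fall at or above the cutoff.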
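Effectiveness of Thresholds and Release Rules – 2006 ACS
Number of estimates (in millions) with CVs of 30% or greater with release rules (red) or without release rules (grey) given varying publication thresholds
[Bar chart; the pairing of values with thresholds and series is not recoverable from the transcript]
Chart values, in transcript order: 51.8, 18.6, 11.6, 3.5, 31.7, 20.1, 9.0, 2.0, 6.0, 94.7
Publication thresholds: 65,000; 125,000; 250,000; 500,000; 1 Million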
Effect of Threshold Changes on Scope of Publication
Number of geographic areas receiving 1-year estimates given varying publication thresholds
Publication Threshold   Geographic Areas
65,000                  6,502
125,000                 3,830
250,000                 1,533
500,000                 1,023
1 Million               292
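The count of areas at each threshold is a filter on area population. The sketch below illustrates the mechanics only: the population list is made up for the example and is not ACS data, and the function names are hypothetical. An area is treated as qualifying when its population is at least the threshold, matching the "minimum population" wording used for the 65,000 rule.

```python
# Thresholds considered on the slide.
THRESHOLDS = [65_000, 125_000, 250_000, 500_000, 1_000_000]


def areas_published(populations, threshold):
    """Count areas whose population meets or exceeds the threshold."""
    return sum(1 for p in populations if p >= threshold)


def scope_by_threshold(populations, thresholds=THRESHOLDS):
    """Map each threshold to the number of areas that would receive estimates."""
    return {t: areas_published(populations, t) for t in thresholds}


# Made-up populations for illustration only (not ACS figures).
example_populations = [40_000, 65_000, 130_000, 600_000, 2_000_000]
scope = scope_by_threshold(example_populations)
```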
Conclusions
Continued release of 1-year estimates based on 65,000 threshold and use of data release rule
Expansion of educational materials for users with emphasis on quality
New Initiatives
Survey of users to obtain feedback on measures of sampling error
Development of on-line calculator
Testing of alternative visual display of ACS data
New Initiatives
Data user guides for targeted audiences
On-line tutorial
Training materials and train-the-trainer sessions