INFO 4470/ILRLE 4470 Visualization Tools and Data Quality
John M. Abowd and Lars Vilhuber
March 16, 2011
(c) John M. Abowd and Lars Vilhuber 2011, all rights reserved
Outline
• Data quality issues
• Coverage
• Known biases
• Missing data
• Extended example using the Quarterly Workforce Indicators
Coverage
• What is the target (theoretical) population?
• Does the statistical frame fully cover this population?
• What are the known deficiencies in the frame?
• Can these be addressed with other data?
Biases and Sampling Error
• Bias: the statistical model does not estimate the quantity of interest in expected value
• Sampling error: the statistical model has imprecision in estimating the quantity of interest
• Non-sampling error: variation in the statistical model introduced by editing, imputation, non-random non-response, etc.
What are Missing Data?
• Random or systematic missing items or entities in a statistical sample or frame
• One of the most ubiquitous data quality problems known
• Cannot be ignored, or rather
How to code missing data?
• How about giving a zero, “0”, for missing?
• Note with a distinctive value such as:
  – -9 for non-responses
  – -8 for invalid responses
  – Or even “.”
• Statistical software like SAS, Stata and SPSS can then recognize those values as missing and treat them according to your instructions
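A minimal sketch of why a distinctive code is safer than zero, assuming the -9, -8, and “.” codes from the bullets above. The function names and the sample values are hypothetical, and plain Python stands in for the statistical packages named on the slide.

```python
import math

# Sentinel codes taken from the slide's examples; the labels are illustrative.
MISSING_CODES = {-9: "non-response", -8: "invalid response"}


def recode_missing(values, codes=MISSING_CODES):
    """Replace sentinel codes and the "." marker with NaN."""
    cleaned = []
    for v in values:
        if v == "." or v in codes:
            cleaned.append(math.nan)
        else:
            cleaned.append(float(v))
    return cleaned


def mean_observed(values):
    """Mean over the non-missing entries only."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed)


# Made-up responses; note that 0 is a legitimate observed value here.
raw = [12, -9, 0, 7, -8, "."]
clean = recode_missing(raw)

# Treating the numeric codes as data drags the mean toward the sentinels.
numeric_raw = [v for v in raw if v != "."]
naive_mean = sum(numeric_raw) / len(numeric_raw)
print(naive_mean, mean_observed(clean))
```

Because the true zero survives the recoding while -9 and -8 do not, the two means differ, which is the reason for not using “0” as the missing code.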
How to Handle Missing Data
• It depends on the type of analysis:
  – Simple summary tabulations
  – Descriptive stats, including mean and standard deviation
  – Correlation, relation between two or more variables
  – Model to estimate the effects of characteristics
Missing vs Non-missing
"There is no difference between cases with missing versus observed data"
“Inference and Missing Data” Rubin (1976)
• MCAR – Missing Completely At Random
  – Rigorous and hard to achieve
  – Sometimes by design
• MAR – Missing At Random
  – Less rigorous
  – Cannot be tested without the data which are missing
Assume MAR?
• YES – then mechanism by which data are missing is IGNORABLE
• NO – then mechanism is not ignorable, requires more involved procedures
MAR - YES
Delete
• Listwise
• Pairwise
Impute
• Assign Mean
• Hot Deck
• Maximum Likelihood
• Multiple Imputation
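The two simplest options in the lists above, listwise deletion and mean assignment, can be sketched as follows. This is an illustration under the assumption that missing fields are already coded as NaN; the helper names and the four sample records are made up.

```python
import math


def listwise_delete(rows):
    """Keep only the records that have no missing field."""
    return [r for r in rows if not any(math.isnan(x) for x in r)]


def mean_impute(rows):
    """Replace each missing field with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        col_means.append(sum(observed) / len(observed))
    return [
        tuple(col_means[j] if math.isnan(x) else x for j, x in enumerate(r))
        for r in rows
    ]


nan = math.nan
records = [(1.0, 10.0), (2.0, nan), (nan, 30.0), (4.0, 40.0)]

complete = listwise_delete(records)  # drops every record with any gap
filled = mean_impute(records)        # keeps all records, fills gaps with column means
print(len(complete), filled)
```

Listwise deletion discards half of these records, while mean assignment keeps them all but shrinks the variance of each filled column, which is one motivation for the later items in the list.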
Visualization of Missing Data in the Quarterly Workforce Indicators
Data Sources (Quarterly Only)
• Quarterly Workforce Indicators (Census Bureau)
• Quarterly Census of Employment and Wages (BLS)
• Business Employment Dynamics (BLS)
• Job Openings and Labor Turnover Survey
• Adjustments to JOLTS by Davis, Faberman, Haltiwanger, and Rucker (2010)
Quarterly Workforce Indicators I
• Flows are based on longitudinally linked (by employer and employee) Unemployment Insurance Wage Records
• Beginning-of-quarter employed if wage record with earnings > $1.00 in quarters t-1 and t (B)
• End-of-quarter employed if wage record with earnings > $1.00 in quarters t and t+1 (E)
• Accession if wage record in t but not t-1 (A)
• Separation if wage record in t but not t+1 (S)
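The four definitions above can be sketched for a single job (one worker at one employer). This is an illustration only: quarters are plain integers, the job records are invented, and applying the $1.00 threshold to accessions and separations as well is an assumption, since the slide states it only for B and E.

```python
def has_record(earnings, t, threshold=1.00):
    """True if the job has a wage record with earnings above the threshold in quarter t."""
    return earnings.get(t, 0.0) > threshold


def job_status(earnings, t):
    """Classify one job in quarter t using the B, E, A, S definitions."""
    prev, cur, nxt = (has_record(earnings, q) for q in (t - 1, t, t + 1))
    return {
        "B": prev and cur,      # beginning-of-quarter employment
        "E": cur and nxt,       # end-of-quarter employment
        "A": cur and not prev,  # accession
        "S": cur and not nxt,   # separation
    }


# Hypothetical jobs keyed by (worker, employer); values map quarter -> earnings.
jobs = {
    ("w1", "f1"): {1: 5000.0, 2: 5200.0, 3: 5100.0},
    ("w2", "f1"): {2: 3000.0, 3: 3100.0},
    ("w3", "f2"): {1: 4000.0, 2: 1500.0},
    ("w4", "f2"): {2: 800.0},
}

t = 2
totals = {key: 0 for key in "BEAS"}
for earnings in jobs.values():
    for key, flag in job_status(earnings, t).items():
        totals[key] += int(flag)
print(totals)
```

Every job with a record in quarter t is counted in exactly one of B or A and in exactly one of E or S, so the totals satisfy B + A = E + S, which is a useful consistency check.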
Quarterly Workforce Indicators II
• Job creation if establishment has positive employment change from beginning to end of quarter (JC)
• Job destruction if establishment has negative employment change from beginning to end of quarter (JD), always stated as absolute value of change
• Demographic (age x gender), geographic (county, CBSA, WIB), NAICS (sector, sub-sector, industry group), and ownership (All, private)
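A short sketch of the job creation and job destruction definitions above, computed from establishment-level beginning- and end-of-quarter employment. The establishment names and counts are hypothetical.

```python
def job_flows(b, e):
    """Return (JC, JD) for one establishment from beginning (b) and end (e) employment."""
    change = e - b
    jc = max(change, 0)   # job creation: positive employment change
    jd = max(-change, 0)  # job destruction: absolute value of a negative change
    return jc, jd


# Hypothetical establishments: name -> (B, E)
establishments = {"est1": (10, 14), "est2": (20, 15), "est3": (8, 8)}

total_jc = sum(job_flows(b, e)[0] for b, e in establishments.values())
total_jd = sum(job_flows(b, e)[1] for b, e in establishments.values())
total_b = sum(b for b, _ in establishments.values())
total_e = sum(e for _, e in establishments.values())
print(total_jc, total_jd, total_e - total_b)
```

Net employment change equals JC minus JD, so the aggregate flows can be checked against the change in the stock.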
Quarterly Census of Employment and Wages
• Stocks of employment measured as of the 12th day of the month for each month in the quarter for each establishment (no job level data)
• BLS uses month-3 employment to measure changes
• Beginning of quarter employment is month-3 employment from quarter t-1
• End of quarter employment is month-3 employment for quarter t
• Geographic (county, MSA), NAICS (sector, sub-sector, industry group), and ownership (all, public, private)
Business Employment Dynamics
• Gross job gains (job creations) is the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if positive
• Gross job losses (job destructions) is the absolute value of the change in employment at an establishment between month-3 in quarter t-1 and month-3 in quarter t, if negative
• Limited geography (state), NAICS (sector) detail
Job Openings and Labor Turnover Survey
• Monthly survey of continuing establishments
• Accessions measured as all new employment over the course of the month (summed over the three months in the quarter here)
• Separations measured as quits, layoffs, discharges, and other over the course of the month (summed over the three months of the quarter here)
• Limited geography (national) and NAICS (sector) detail
QWI Coverage of the Private Workforce
[Figure: QWI coverage of the private workforce, shown for Sub-period 1 and Sub-period 2]
Worker Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
$$\mathrm{WRR}_{agskt} = \frac{A_{agskt} + S_{agskt}}{\left(B_{agskt} + E_{agskt}\right)/2}$$
Job Reallocation Rate
• This rate is available in the QWI for 8 age groups, both genders, NAICS sector, state and quarter from 1990:Q1 to 2008:Q4 (more detail is available than we used)
$$\mathrm{JRR}_{agskt} = \frac{JC_{agskt} + JD_{agskt}}{\left(B_{agskt} + E_{agskt}\right)/2}$$
Excess Reallocation Rate
• ERR = WRR – JRR
• The excess reallocation rate measures the extent to which gross worker flows exceed the minimum required to service the gross job flows
• This has been very difficult to estimate nationally because there were no data collected on a consistent basis for all the component flows
• QWI solved that problem
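As a worked example, the three rates can be computed for a single cell from its stocks and flows. The function name and the input numbers are made up; the rates follow the definitions above, with average employment (B + E)/2 as the common denominator.

```python
def reallocation_rates(b, e, a, s, jc, jd):
    """Return (WRR, JRR, ERR) for one cell."""
    avg_emp = (b + e) / 2
    wrr = (a + s) / avg_emp    # worker reallocation rate
    jrr = (jc + jd) / avg_emp  # job reallocation rate
    return wrr, jrr, wrr - jrr  # ERR = WRR - JRR


# Hypothetical cell: employment grows by 4, via 20 accessions and 16 separations,
# and via 9 jobs created and 5 jobs destroyed.
wrr, jrr, err = reallocation_rates(b=100, e=104, a=20, s=16, jc=9, jd=5)
print(round(wrr, 4), round(jrr, 4), round(err, 4))
```

In this cell 36 worker moves accompany 14 job moves, so the excess (churning) component is 22 moves relative to average employment of 102.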
Statistical Methodology
• Divide the analysis into two periods
  – 1993:Q1-2001:Q4 (early period, many states are completely missing, 10 states complete)
  – 1999:Q1-2008:Q4 (later period, 37 states are complete)
• For each sub-period use a multiple imputation model to complete the missing data
• For the overlap period, use a ramped weight to compute the average implicate combining the two periods
• Use the standard multiple imputation formulae to combine implicates
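The slide does not give the shape of the ramped weight. The sketch below assumes a linear ramp over the twelve overlap quarters (1999:Q1 to 2001:Q4), with the weight on the later sub-period rising from 1/13 to 12/13. Both the linear form and the function names are assumptions made for illustration.

```python
def quarter_index(year, q):
    """Map (year, quarter) to a running integer index."""
    return year * 4 + (q - 1)


OVERLAP_START = quarter_index(1999, 1)
OVERLAP_END = quarter_index(2001, 4)


def late_weight(year, q):
    """Weight on the later sub-period's estimate (ASSUMED linear ramp)."""
    t = quarter_index(year, q)
    if t < OVERLAP_START:
        return 0.0
    if t > OVERLAP_END:
        return 1.0
    n = OVERLAP_END - OVERLAP_START + 1  # 12 overlap quarters
    return (t - OVERLAP_START + 1) / (n + 1)


def blend(early, late, year, q):
    """Combine the two sub-period estimates for one quarter."""
    w = late_weight(year, q)
    if w == 0.0:
        return early  # only the early sub-period covers this quarter
    if w == 1.0:
        return late   # only the later sub-period covers this quarter
    return (1 - w) * early + w * late


# Hypothetical sub-period estimates of a rate for 2000:Q2.
combined = blend(early=0.40, late=0.50, year=2000, q=2)
print(late_weight(2000, 2), combined)
```

A ramp of this kind avoids a visible break in the series at the point where one sub-period model hands over to the other.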
National Estimates
• The combining formula for producing the national WRR is shown below (similar formulae apply to other rates)

$$\mathrm{WRR}_{ag+kt} = \frac{1}{M}\sum_{\ell=1}^{M}\left[\frac{\sum_{\forall s}\left(\frac{B^{(\ell)}_{agskt}+E^{(\ell)}_{agskt}}{2}\right)\mathrm{WRR}^{(\ell)}_{agskt}}{\sum_{\forall v}\left(\frac{B^{(\ell)}_{agvkt}+E^{(\ell)}_{agvkt}}{2}\right)}\right]$$
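A small numerical sketch of the national combining formula: within each implicate, state rates are averaged with weights equal to average employment (B + E)/2, and the resulting national rates are then averaged over the M implicates. Reading the subscript s as the state index and ℓ as the implicate index is my interpretation of the notation; the two-state, two-implicate data are invented.

```python
def national_rate(cells):
    """Employment-weighted average rate within one implicate.

    cells: list of (B, E, rate) tuples, one per state.
    """
    weights = [(b + e) / 2 for b, e, _ in cells]
    weighted_sum = sum(w * rate for w, (_, _, rate) in zip(weights, cells))
    return weighted_sum / sum(weights)


def average_over_implicates(implicates):
    """Simple mean of the per-implicate national rates."""
    return sum(national_rate(cells) for cells in implicates) / len(implicates)


# Hypothetical data: two states, two implicates (M = 2).
implicates = [
    [(100, 100, 0.30), (300, 300, 0.50)],
    [(100, 100, 0.34), (300, 300, 0.50)],
]
per_implicate = [national_rate(cells) for cells in implicates]
national = average_over_implicates(implicates)
print(per_implicate, national)
```

The larger state dominates the national rate because it carries three times the employment weight.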
Implicate Combining Formulae I
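$$V^{(\ell)}\left[\widehat{\mathrm{WRR}}_{ag+kt}\right] = \frac{1}{49}\,\frac{\sum_{\forall s}\left(\frac{B^{(\ell)}_{agskt}+E^{(\ell)}_{agskt}}{2}\right)\left(\mathrm{WRR}^{(\ell)}_{agskt}-\widehat{\mathrm{WRR}}^{(\ell)}_{ag+kt}\right)^{2}}{\sum_{\forall v}\left(\frac{B^{(\ell)}_{agvkt}+E^{(\ell)}_{agvkt}}{2}\right)}$$

$$B\left[\mathrm{WRR}_{ag+kt}\right] = \frac{1}{M-1}\sum_{\ell=1}^{M}\left(\widehat{\mathrm{WRR}}^{(\ell)}_{ag+kt}-\mathrm{WRR}_{ag+kt}\right)^{2}$$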
Implicate Combining Formulae II
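$$T\left[\mathrm{WRR}_{ag+kt}\right] = \frac{1}{M}\sum_{\ell=1}^{M}V^{(\ell)}\left[\widehat{\mathrm{WRR}}_{ag+kt}\right] + \frac{M+1}{M}\,B\left[\mathrm{WRR}_{ag+kt}\right]$$

$$MR\left[\mathrm{WRR}_{ag+kt}\right] = \frac{B\left[\mathrm{WRR}_{ag+kt}\right]}{T\left[\mathrm{WRR}_{ag+kt}\right]}$$

$$df\left[\mathrm{WRR}_{ag+kt}\right] = (M-1)\left(1+\frac{1}{M+1}\,\frac{\frac{1}{M}\sum_{\ell=1}^{M}V^{(\ell)}\left[\widehat{\mathrm{WRR}}_{ag+kt}\right]}{B\left[\mathrm{WRR}_{ag+kt}\right]}\right)^{2}$$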
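A minimal sketch of the combining step, assuming the per-implicate national estimates and their within-implicate variances have already been computed. It covers the point estimate, the between-implicate variance B, the total variance T, and the missingness ratio MR; the degrees-of-freedom formula is left out of the sketch. The function name and the four sample implicates are hypothetical.

```python
def combine_implicates(estimates, within_variances):
    """Combine M implicate-level estimates.

    Returns (point estimate, between variance B, total variance T, missingness ratio MR).
    """
    m = len(estimates)
    point = sum(estimates) / m                                   # average over implicates
    within = sum(within_variances) / m                           # average within-implicate variance
    between = sum((q - point) ** 2 for q in estimates) / (m - 1)
    total = within + (m + 1) / m * between                       # T = mean(V) + ((M+1)/M) * B
    missingness_ratio = between / total                          # MR = B / T
    return point, between, total, missingness_ratio


# Hypothetical inputs: M = 4 implicates of a national rate.
estimates = [0.44, 0.46, 0.45, 0.47]
within_variances = [0.0004, 0.0004, 0.0004, 0.0004]

point, between, total, mr = combine_implicates(estimates, within_variances)
print(round(point, 4), round(total, 6), round(mr, 4))
```

A missingness ratio near zero indicates that the implicates agree closely, so little of the total variance comes from the imputation of missing data.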
Results
• Comparison of WRR between QWI and JOLTS
• Comparison of JRR between QWI and BED
• Demographic detail
• Industry detail
WRR: QWI v. JOLTS
WRR: QWI v. Adjusted JOLTS
JRR: QWI v. BED
JRR WRR ERR by Gender
ERR (Churning) by Age Group
Job Creation Rate
Job Destruction Rate
Job Reallocation Rate
Accession Rate
Separation Rate
Worker Reallocation Rate