SIPP IMPUTATION SCHEME AND DISCUSSION ITEMS
Presenters:
Nat McKee - Branch Chief
Census Bureau
Demographic Surveys Division (DSD)
Income Surveys Programming Branch (SIPP)
301-763-5244

Zelda McBride - Supervisor
Census Bureau
Demographic Surveys Division (DSD)
Income Surveys Programming Branch (SIPP)
301-763-2942

ASA/SRM SIPP WORKING GROUP MEETING
September 16, 2008
OVERVIEW OF IMPUTATION
TYPES OF MISSING DATA
• Item Non-Response
  – such as refusals, blanks, don’t know, incompatible answers
  – Handled via hot deck imputation
• Unit Non-Response
  – such as person-level non-interviews or insufficient partial interviews
  – Handled via Type Z and/or hot deck imputation
HOT DECK OVERVIEW
File is sorted geographically, so allocated data are likely to come from a geographically proximate case
Replace missing data items with reported data from another similar person/household
EDITING STEPS
• Before Pass 1 – cold (initial) values are in the decks; missing data are not yet imputed
• Pass 1 – cold values are replaced by the live hot data, but editing is not saved
• Pass 2 – the last values updated in Pass 1 are the starting values for the edit pass
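The three stages above can be sketched for a single hot deck cell. This is a minimal illustration under stated assumptions, not the production edit code: `None` marks a missing item, and the function names `warm_deck` and `edit_pass` are hypothetical.

```python
from typing import Optional


def warm_deck(values: list[Optional[int]], cold_value: int) -> int:
    """Pass 1: let each reported value overwrite the deck; save nothing."""
    deck = cold_value
    for value in values:
        if value is not None:
            deck = value
    return deck  # last reported value seen in the file


def edit_pass(values: list[Optional[int]], start_value: int) -> list[int]:
    """Pass 2: start from the Pass 1 deck value and save the edited items."""
    deck = start_value
    edited = []
    for value in values:
        if value is None:
            edited.append(deck)   # impute from the current deck value
        else:
            deck = value          # reported data refreshes the deck
            edited.append(value)
    return edited


values = [None, 5, None, 2]            # hypothetical file order
start = warm_deck(values, cold_value=1)
edited = edit_pass(values, start)
without_pass_1 = edit_pass(values, 1)  # what a single pass would produce
```

Under these assumptions the first case in the file receives a live value carried over from Pass 1 rather than the cold value, which appears to be the purpose of running two passes.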
GENDER X AGE CATEGORIES: INITIAL VALUES
What did you have for lunch today?
1-Hamburger 2-Yogurt 3-Salad 4-Chicken 5-Roast Beef 6-Other

               Male   Female
1. Under 30      1      3
2. 30 - 64       1      3
3. 65+           3      3
VALUES AFTER PASS 1 BEFORE EDITING
Respondents: Nat, Tracy, Zelda, Jeff, Martha
Responses: 5, 2, 4, R, R

        M      F
1.      1      3
2.      5      2 <- 4
3.      3      3
COUNTERS FOR DONOR USAGE

        M      F
1       0      0
2       1      1
3       0      0
IMPUTING FOR MISSING DATA
• Process sequentially by unit for each section: demographics, household characteristics, labor force, assets, general income, health insurance and program participation
• If data are not missing, the reported value replaces the hot deck value
• If data are missing, the case takes the last hot deck value and the counter is incremented
• Repeating the same edit program/imputation will give the same results each time (i.e., rerun with no changes: same donors, same results)
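The rules above can be traced on the lunch example from the earlier slides. The sketch below is illustrative only: the function name `hot_deck`, the record layout, the reading of "R" as a missing response, and the assignment of all five respondents to the 30 - 64 row (with Nat and Jeff male, the others female) are assumptions made so the example runs.

```python
from collections import defaultdict


def hot_deck(records, cold_deck):
    """Sequential hot deck over records in file order.

    records: list of (name, cell, value); value is None when missing.
    cold_deck: dict mapping each cell to its initial (cold) value.
    Returns (edited values by name, final deck, donor-usage counters).
    """
    deck = dict(cold_deck)        # copy, so a rerun starts from the same state
    counters = defaultdict(int)
    edited = {}
    for name, cell, value in records:
        if value is not None:
            deck[cell] = value            # reported data replaces the deck value
            edited[name] = value
        else:
            edited[name] = deck[cell]     # take the last hot deck value
            counters[cell] += 1           # and increment that cell's counter
    return edited, deck, dict(counters)


# Cold values by (age row, sex), as read from the initial-values slide.
cold = {(1, "M"): 1, (1, "F"): 3, (2, "M"): 1,
        (2, "F"): 3, (3, "M"): 3, (3, "F"): 3}

# Assumed cells for the five respondents; None stands for "R".
records = [
    ("Nat", (2, "M"), 5),
    ("Tracy", (2, "F"), 2),
    ("Zelda", (2, "F"), 4),
    ("Jeff", (2, "M"), None),
    ("Martha", (2, "F"), None),
]

edited, deck, counters = hot_deck(records, cold)
rerun = hot_deck(records, cold)   # same input, same donors, same results
```

With these assumed cells, the final deck holds 5 and 4 in row 2 and each row 2 counter equals 1, which agrees with the values and counters shown on the slides. Because the procedure has no random component and depends only on file order, the rerun returns identical output.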
IMPUTATION MATRICES
• Each matrix is defined with stratifying parameters relevant to the item
• Sex, race, and age (with categories) are used frequently in matrices
• Other specialized, relevant variables are used too; for example, when imputing class of worker, a recode of industries is used in the matrix
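One way to picture a matrix is as a lookup keyed by the stratifying variables chosen for an item. The sketch below is hypothetical throughout: the field names, the age breaks (borrowed from the lunch example), and the choice of stratifiers for each item are assumptions, not the actual SIPP matrix definitions.

```python
def age_category(age: int) -> int:
    """Collapse age into illustrative categories: under 30, 30 - 64, 65+."""
    if age < 30:
        return 1
    if age < 65:
        return 2
    return 3


def cell_key(person: dict, stratifiers: tuple) -> tuple:
    """Build the matrix cell for a person from the item's stratifiers."""
    return tuple(person[name] for name in stratifiers)


person = {"sex": "F", "race": 1, "age": 42, "industry_recode": 7}
person["age_cat"] = age_category(person["age"])

# A general-purpose matrix versus an item-specific one (both hypothetical).
general_cell = cell_key(person, ("sex", "race", "age_cat"))
class_of_worker_cell = cell_key(person, ("sex", "industry_recode"))
```

Each item can carry its own stratifier tuple, so changing how an item is stratified means changing one definition rather than the imputation logic.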
USING PREVIOUS WAVE DATA
Wave 2+ edits sometimes use previous wave data as a parameter in the hot deck
• Advantage – more consistency wave to wave
• Disadvantage – a particular donor has the potential to influence every wave
ALLOCATION FLAGS
0 – no imputation (initialized)
1 – hot deck imputation
2 – set to cold value
3 – logical (derived)
4 – used previous wave data
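For analysts working with the flags, the codes above map naturally onto an enumeration. This is a sketch only: the member names and the helper `was_imputed` are hypothetical, and only the numeric codes come from the slide.

```python
from enum import IntEnum


class AllocationFlag(IntEnum):
    """Allocation flag codes as listed on the slide; names are illustrative."""
    NOT_IMPUTED = 0
    HOT_DECK = 1
    COLD_VALUE = 2
    LOGICAL = 3
    PREVIOUS_WAVE = 4


def was_imputed(flag: int) -> bool:
    """True when the item was allocated by any method (codes 1 through 4)."""
    return AllocationFlag(flag) is not AllocationFlag.NOT_IMPUTED


flags = [0, 1, 1, 4, 2, 0]    # hypothetical flag values for one item
imputed_share = sum(was_imputed(f) for f in flags) / len(flags)
```

Passing a code outside 0 to 4 raises `ValueError`, which makes unexpected flag values visible instead of silently counting them.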
TYPE Z NONINTERVIEW
Type Z Noninterview = Noninterviewed Person Within Interviewed Household:
EPPINTVW (Wave 3)                         Frequency   Percent
-------------------------------------------------------------
1=Noninterview in all 4 months                14254     12.34
1=Interview (Self)                            44912     38.89
2=Interview (Proxy)                           29844     25.84
3=Non-Interview - Type Z                       3042      2.63
4=Non-Interview - Pseudo Type Z                1039      0.90
5=Children under 15 during ref period         22404     19.40
TYPE Z IMPUTATION
Type Z Imputation = Hierarchical sorting and merging operation that matches Type Z noninterviews with respondents based on demographic characteristics available for both.
• Imputes entire record from single donor.
ELIGIBILITY FOR TYPE Z IMPUTATION
• Type Z noninterview
• Wave 1, or for Wave 2+ no previous wave info available
Type Z Eligibility
TYPZIMP (Wave 3)     Frequency   Percent
----------------------------------------
Not Eligible              2964     72.63
Eligible                  1117     27.37
ELIGIBILITY FOR TYPE Z DONORS
• Interview or sufficient partial interview
  – sufficient partial = reached first asset question (completed Demographics, Labor Force Recipiency, General Income Recipiency, and Asset Intro.)
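The eligibility rules for Type Z recipients and for donors can be written as two predicates. This is a rough sketch: the `Person` fields and the status strings are hypothetical stand-ins, not the actual SIPP variables.

```python
from dataclasses import dataclass


@dataclass
class Person:
    interview_status: str          # e.g. "interview", "sufficient_partial", "type_z"
    wave: int
    has_previous_wave_info: bool


def eligible_for_type_z_imputation(p: Person) -> bool:
    """Type Z noninterview in Wave 1, or in Wave 2+ with no previous wave info."""
    return p.interview_status == "type_z" and (
        p.wave == 1 or not p.has_previous_wave_info
    )


def eligible_as_donor(p: Person) -> bool:
    """Completed interview, or a partial that reached the first asset question."""
    return p.interview_status in {"interview", "sufficient_partial"}


new_type_z = Person("type_z", wave=3, has_previous_wave_info=False)
returning_type_z = Person("type_z", wave=3, has_previous_wave_info=True)
respondent = Person("sufficient_partial", wave=3, has_previous_wave_info=True)
```

Under this reading, a Wave 3 Type Z person with previous wave information falls in the "Not Eligible" group.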
TYPE Z PROCESS
• determine if person is type Z or donor, create separate files for type Z and donors
• create 4 levels of match keys for each person on both files
  – match keys are based on rotation group plus various demographic variables: age, race, sex, veteran status, marital status, relationship to reference person, educational attainment, parental status, spouse’s interview status
  – Level 1 keys are the most restrictive, level 4 are the least (designed to always find a match)
• sort both files by match keys
• match files
• select best match for each type Z case:
  – level 1 match = best, level 4 = worst
• transfer data from donor record to type Z record for matched cases
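The matching steps can be sketched as follows. This is an illustration under assumptions, not the production procedure: which variables enter each level is invented here, a dictionary lookup stands in for the sort-and-merge, and the first donor found for a key is used.

```python
# Hypothetical key definitions: level 1 is most restrictive, level 4 least.
LEVEL_KEYS = {
    1: ("rotation_group", "sex", "age_group", "race", "marital_status", "education"),
    2: ("rotation_group", "sex", "age_group", "race"),
    3: ("rotation_group", "sex", "age_group"),
    4: ("rotation_group",),
}


def match_key(person: dict, level: int) -> tuple:
    return tuple(person[name] for name in LEVEL_KEYS[level])


def match_type_z(type_z: list, donors: list) -> dict:
    """Return {type Z id: (donor id, level)} using the best available level."""
    matches = {}
    for level in sorted(LEVEL_KEYS):
        index = {}
        for donor in donors:
            # Keep the first donor seen for each key at this level.
            index.setdefault(match_key(donor, level), donor["id"])
        for person in type_z:
            if person["id"] in matches:
                continue                      # already matched at a better level
            donor_id = index.get(match_key(person, level))
            if donor_id is not None:
                matches[person["id"]] = (donor_id, level)
    return matches


donors = [
    {"id": "d1", "rotation_group": 1, "sex": "F", "age_group": 2,
     "race": 1, "marital_status": "married", "education": 3},
    {"id": "d2", "rotation_group": 1, "sex": "M", "age_group": 3,
     "race": 2, "marital_status": "single", "education": 1},
]
type_z = [
    {"id": "z1", "rotation_group": 1, "sex": "F", "age_group": 2,
     "race": 1, "marital_status": "married", "education": 3},
    {"id": "z2", "rotation_group": 1, "sex": "M", "age_group": 1,
     "race": 1, "marital_status": "single", "education": 2},
]
matches = match_type_z(type_z, donors)
```

Here `z1` agrees with `d1` on every level 1 variable, while `z2` matches only at level 4, on rotation group alone. The transfer step would then copy the matched donor's record onto the type Z record while keeping the type Z person's own identifiers.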
LITTLE TYPE Z
• Used in labor force edit to get job and labor force data from a donor
QUESTIONS?
DISCUSSION ISSUES ON HOW TO IMPROVE CURRENT IMPUTATIONS
1. What do we gain by doing type Z imputations vs. hot deck imputations? What are the trade-offs?
2. What is the threshold (or how should a threshold be determined) for identifying hot-deck overuse for a particular donor/cell? Does this need to be adjusted as the sample size changes (as in the case of a sample cut)?
3. What is the threshold (or how should a threshold be determined) for determining cold-deck overuse?
4. How do we determine optimum size for a particular hot deck? Is there a relationship between the number of cells in a hot deck matrix and the number of cases in the universe?
5. Currently, we do not distinguish between reported data and imputed data in the stratifying variables for particular hot decks. Do we need to be concerned about this?
6. Is there an objective, simple way to choose stratifying variables in a hot deck?
7. What methods/criteria should be used to determine quality of imputations?