Chapter 2: File Contents - ASPE


Grantee welfare outcomes files should include all administrative and survey data used in analysis and reporting. To assist grantees with identifying and reporting specific data elements in these files, the Work Group developed a set of recommendations for commonly reported administrative data items, examples of grantee-specific administrative data items, and a list of some auxiliary data elements grantees may choose to report in their files.

The Group’s recommendations and guidelines for administrative and survey data were developed to provide grantees with a model for reporting and labeling their data. However, for some grantees that have already produced data files for reporting and analysis, these recommendations and guidelines may require an additional level of effort to implement.

The following chapter provides further information on the Group’s recommendations regarding administrative and survey data file contents and variable labeling. Ideally, grantees should follow these recommendations; however, where this proves to be a significant additional burden or is impossible to implement within existing data agreements, grantees may choose to adopt a different approach. If a different approach is selected, it is incumbent on grantees to clearly document this approach.

A. Administrative Data Files

Grantee administrative data files should include all data elements reported in their administrative data reports. At a minimum, the data contained in grantees’ administrative data files should be sufficient for researchers to replicate results in the administrative data reports (see Tip 2.1). Additionally, case-level data should be able to be linked to data in other administrative or survey data files.

Generally speaking, grantee administrative data fall into one of two categories:

• Common data elements analyzed by most grantees in their administrative data reports. The Group identified 18 such elements (see Figure 2.1).

• Grantee-specific data elements or data elements that were included in individual grantees’ research, but not found across all grantees (see Figure 2.2).

Tip 2.1 contains a checklist identifying minimum contents of administrative data files.

Common data elements are listed in Figure 2.1.

Grantee-specific data elements are listed in Figure 2.2.

Auxiliary data elements are listed in Figure 2.3.


In addition to examples of the common and grantee-specific data elements, the Work Group also identified a brief list of auxiliary data on the characteristics of former recipients and other welfare outcomes that would be useful for further research (see Figure 2.3).

Grantees are not required to include auxiliary data in their files; however, if possible, reporting such additional elements is recommended.

Tip 2.1: Checklist for Producing Administrative Data Files

At a minimum, grantees’ administrative data files should include the data necessary to replicate results in their administrative data reports. These data may include:

• Common and grantee-specific data elements.

• New variables constructed for analysis and included in administrative data reports.

• The raw data used to construct variables for analysis, except where several raw data elements were used to construct binary flags for benefit receipt during a given month or quarter.

• Variables that identify subgroups or other characteristics used in report cross-tabulations.

• Record identifiers that allow case-level data to be related or linked to survey or administrative data contained in other files.

The Work Group also recommends that grantees include common and auxiliary data elements that were not included in grantees’ analysis and reporting, if this inclusion does not present a significant additional burden.


Common Data Elements

The Work Group identified 18 administrative data items that were reported by most grantees in their welfare outcomes research. For these items, the Work Group has developed a uniform set of definitions and reporting guidelines to ensure that these data are reported in a standardized manner. Figure 2.1 provides a brief overview of these items; Appendix B provides a more detailed description and the recommended reporting requirements for these items that may be used when constructing the file.

Although much of Figure 2.1 and Appendix B refer to the “exit” month, this term should be understood as the “base” month or quarter in which selection into the study population occurred. For example, the selection event could be the month or quarter in which an individual left TANF or applied for or was diverted from TANF assistance.

While it is preferred that grantees report all items and use the recommended standardized approach, alternative approaches may be used where it is impossible to follow these recommendations or they present a significant burden to grantees. Where another approach is used, grantees should provide a description of the alternative in their documentation. Grantees may also omit some common data elements if including them would present an additional burden. For example, a grantee may report only one or two of the three common data elements related to Medicaid enrollment.

Figure 2.1 presents common data element definitions and reporting guidelines.

Appendix B contains additional detail in support of these common data items.


Figure 2.1: Recommended “Common” Data Elements

Each entry lists the File Field, followed by its Description, Reported Value, Minimum Time Period to Be Reported, and Variable Name(s).

Quarterly Earnings
  Description: Earnings information extracted from quarterly Unemployment Insurance data.
  Reported Value: Dollar value for total earnings during a given calendar quarter.
  Minimum Time Period to Be Reported: Four quarters before and four quarters after the selection event.
  Variable Name(s): Variable name prefix: EARN
    Examples for leavers studies:
    Prior to TANF “exit”: EARNBQ01 through EARNBQ99
    Quarter of TANF “exit”: EARNAQ00
    After TANF “exit”: EARNAQ01 through EARNAQ99

TANF Benefit Receipt or Authorization
  Description: A binary variable that indicates whether a casehead received or was authorized to receive a TANF payment during a given month.
  Reported Value:
    0 = Casehead did not receive or was not authorized to receive a TANF payment in a given month
    1 = Casehead received or was authorized to receive a TANF payment in a given month
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: TNFR
    Examples for leavers studies:
    Prior to TANF “exit”: TNFRBM01 through TNFRBM99
    Month of TANF “exit”: TNFRAM00
    After TANF “exit”: TNFRAM01 through TNFRAM99

TANF Benefit Amount
  Description: The TANF benefit amount paid to a casehead during a given month.
  Reported Value: Dollar value of benefit.
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: TNFB
    Examples for leavers studies:
    Prior to TANF “exit”: TNFBBM01 through TNFBBM99
    Month of TANF “exit”: TNFBAM00
    After TANF “exit”: TNFBAM01 through TNFBAM99

Selection Event Date
  Description: The date used for selection into the study cohort. For example, for leavers studies, this would be the month/year in which a casehead last received benefits, or the first month/year in which a casehead did not receive benefits. For applicant studies, this would be the month/year in which an individual applied for TANF assistance.
  Reported Value: MMYYYY (MM = Numeric Month, YYYY = Numeric Year)
  Minimum Time Period to Be Reported: N/A
  Variable Name(s):
    Leavers Studies: EXITDATE
    Applicant/Diversion Studies: APPLDATE

Casehead Medicaid Enrollment
  Description: A binary variable noting whether the casehead was enrolled in the Medicaid program during a given month.
  Reported Value:
    0 = Casehead was not enrolled in the Medicaid program during a given month
    1 = Casehead was enrolled in the Medicaid program during a given month
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: MEDR
    Examples for leavers studies:
    Prior to TANF “exit”: MEDRBM01 through MEDRBM99
    Month of TANF “exit”: MEDRAM00
    After TANF “exit”: MEDRAM01 through MEDRAM99

Medicaid Enrollment for Any Member of the Former Case Unit
  Description: A binary variable that indicates whether any member of the former case unit was enrolled in the Medicaid program during a given month.
  Reported Value:
    0 = No member of the case unit was enrolled in the Medicaid program during a given month
    1 = At least one other member of the case unit was enrolled in the Medicaid program during a given month (e.g., casehead, child, or other adult)
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: MEDT
    Examples for leavers studies:
    Prior to TANF “exit”: MEDTBM01 through MEDTBM99
    Month of TANF “exit”: MEDTAM00
    After TANF “exit”: MEDTAM01 through MEDTAM99

Medicaid Enrollment for Any Child Who Was a Member of the Former Case Unit
  Description: A binary variable that indicates whether a child who was a member of the former case unit was enrolled in the Medicaid program during a given month.
  Reported Value:
    0 = No child was enrolled in the Medicaid program during a given month
    1 = At least one child was enrolled in the Medicaid program during a given month
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: MEDC
    Examples for leavers studies:
    Prior to TANF “exit”: MEDCBM01 through MEDCBM99
    Month of TANF “exit”: MEDCAM00
    After TANF “exit”: MEDCAM01 through MEDCAM99

Food Stamp Receipt
  Description: A binary variable that indicates whether the casehead received or was authorized to receive food stamps during a given month.
  Reported Value:
    0 = Casehead did not receive or was not authorized to receive food stamps during a given month
    1 = Casehead received or was authorized to receive food stamps during a given month
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event.
  Variable Name(s): Variable name prefix: FOOD
    Examples for leavers studies:
    Prior to TANF “exit”: FOODBM01 through FOODBM99
    Month of TANF “exit”: FOODAM00
    After TANF “exit”: FOODAM01 through FOODAM99

TANF Case Composition
  Description: Grantees should report at least two of the following:
    1. Number of certified adults on TANF case.
    2. Number of certified children on TANF case.
    3. Total number of certified individuals on TANF case.
  Reported Value: Number of certified adults, children, or total number of individuals on TANF case.
  Minimum Time Period to Be Reported: Twelve months before and twelve months after the selection event. For example: the month of exit, and other months up to twelve months before and after exit (where data are available).
  Variable Name(s):
    Variable name prefix for adults: ADLT
    Variable name prefix for children: CHLD
    Variable name prefix for total number: TOTC
    Examples for leavers studies:
    Prior to TANF “exit”: e.g., ADLTBM01, CHLDBM01, TOTCBM01
    Month of TANF “exit”: e.g., ADLTAM00, CHLDAM00, TOTCAM00
    After TANF “exit”: e.g., ADLTAM01, CHLDAM01, TOTCAM01

Casehead Date of Birth
  Description: The month and year in which the casehead was born.
  Reported Value: MMYYYY (MM = Numeric Month, YYYY = Numeric Year)
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CASE_AGE

Casehead Gender
  Description: Casehead gender.
  Reported Value: 1 = Male; 2 = Female
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CASE_SEX

Case Type
  Description: The case unit’s designation as a single, two, or no parent household.
  Reported Value:
    1 = Single parent household
    2 = Two parent household
    3 = No parent on case/child only case
    4 = Other
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CASETYPE

Race/Ethnicity
  Description: Casehead race/ethnicity.
  Reported Value: Grantees may use state-specific definitions and codes.
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CASERACE

Age of Oldest Child
  Description: Age of oldest child on case at time of selection into the study.
  Reported Value: Age at TANF “exit” or other selection event.
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CHLDAGE2

Age of Youngest Child
  Description: Age of youngest child on case at time of selection into the study.
  Reported Value: Age at TANF “exit” or other selection event.
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): CHLDAGE1

Geographic Identifier
  Description: County in which casehead resided at time of selection into the study.
  Reported Value: State/County FIPS Code
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): COUNTYCD

Survey Sample
  Description: A binary variable that indicates whether a case was included in a survey research cohort.
  Reported Value:
    0 = Not included in survey sample
    1 = Included in survey sample
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): SURVFLAG

Marital Status
  Description: Casehead marital status at time of selection into the study.
  Reported Value: Grantees may use state-specific definitions and codes.
  Minimum Time Period to Be Reported: N/A
  Variable Name(s): MARISTAT

Note: In all cases, the term “exit” month should be understood to mean the critical selection event that marks entry into the study population, whether that be a study population of leavers, divertees, applicants, or entrants to the TANF program.
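The naming pattern used for the time-varying items in Figure 2.1 can be sketched in code. The example below is only an illustration, assuming the pattern implied by the figure's examples: a four-letter prefix, then B (before) or A (selection period and after), then M (month) or Q (quarter), then a two-digit period number. The function names period_variable and minimum_window are hypothetical and are not part of the Work Group's recommendations.

```python
def period_variable(prefix: str, offset: int, unit: str = "M") -> str:
    """Build a name such as TNFRBM03 (3 months before) or EARNAQ00 (selection quarter).

    offset is the number of periods relative to the selection event:
    negative = before, 0 = the selection period itself, positive = after.
    """
    if len(prefix) != 4 or not prefix.isalpha():
        raise ValueError("prefix must be four letters, e.g. 'EARN'")
    if unit not in ("M", "Q"):
        raise ValueError("unit must be 'M' (month) or 'Q' (quarter)")
    if not -99 <= offset <= 99:
        raise ValueError("offset must fit in two digits")
    # 'B' marks periods before the selection event; 'A' marks the
    # selection period (00) and every period after it.
    direction = "B" if offset < 0 else "A"
    return f"{prefix.upper()}{direction}{unit}{abs(offset):02d}"


def minimum_window(prefix: str, unit: str = "M") -> list:
    """Names covering the minimum period in Figure 2.1:
    twelve months (or four quarters) on each side of the selection event."""
    span = 12 if unit == "M" else 4
    return [period_variable(prefix, k, unit) for k in range(-span, span + 1)]


if __name__ == "__main__":
    print(minimum_window("EARN", "Q"))
```

Generating the names from one function, rather than typing them by hand, is one way to avoid inconsistent names across a file.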


Grantee-Specific and Auxiliary Data Elements

Each welfare outcomes grantee has selected additional data for analysis and reporting that extend beyond those identified in the list of common elements. These data should also be included in grantee administrative data files, and are considered grantee-specific data elements.

The nature of grantee-specific data requires each grantee to define the data being reported, including its values, the time period(s) covered, and the assigned variable names. When completing this task, grantees should follow the Work Group’s recommended approach (see Chapter 1). To assist grantees in applying this approach, the Group identified a list of grantee-specific data elements to serve as examples for applying these standards. A summary of these examples is provided in Figure 2.2.

The Work Group also identified a brief list of example auxiliary data items that would be helpful for grantees to include in their files. While most grantees did not use these data in their analysis and reports, these additional data elements represent an opportunity to improve future welfare outcomes research. Figure 2.3 provides a brief overview of these items and examples of such items. Grantees are encouraged to report these auxiliary data items in their administrative data files, to the extent possible.

Figure 2.2 summarizes examples of reporting standards for grantee-specific data.

Figure 2.3 provides an overview and examples of auxiliary data items.


Figure 2.2: Examples of “Grantee-specific” Data Elements

Each entry lists the File Field, followed by its Description, Reported Value, Recommended Time Period to Be Reported, and Variable Name(s).

Number of Jobs Held
  Description: Number of jobs held by an individual in a particular quarter as indicated by the number of EINs listed in Unemployment Insurance wage records.
  Reported Value: Number
  Recommended Time Period to Be Reported: Four quarters before and four quarters after selection event.
  Variable Name(s): Variable name prefix: JOBS
    Examples for leavers studies:
    Prior to TANF “exit”: JOBSBQ01 through JOBSBQ99
    Quarter of TANF “exit”: JOBSAQ00
    After TANF “exit”: JOBSAQ01 through JOBSAQ99

Reported Wages
  Description: Information on casehead wages obtained from a source other than UI wage data. For example, wage information that is available on State TANF or food stamp files.
  Reported Value: Dollar value
  Recommended Time Period to Be Reported: Twelve months before and twelve months after selection event.
  Variable Name(s): Variable name prefix: WAGE
    Examples for leavers studies:
    Prior to TANF “exit”: WAGEBM01 through WAGEBM99
    Month of TANF “exit”: WAGEAM00
    After TANF “exit”: WAGEAM01 through WAGEAM99

Length TANF Spell Before Selection Event
  Description: The number of consecutive months the casehead received TANF assistance immediately prior to selection event. For TANF leavers studies, this should be the number of consecutive months immediately prior to exit. For diversion/applicant studies, this should be the number of consecutive months a casehead received assistance for the last episode. If there is no past history of TANF benefit receipt, the value would be 0.
  Reported Value: Number of months
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): SPELLLENT

Childcare Subsidy
  Description: The childcare subsidy amount received by a casehead in a given month.
  Reported Value: Dollar value
  Recommended Time Period to Be Reported: Twelve months before and twelve months after selection event.
  Variable Name(s): Variable name prefix: CSUB
    Examples for leavers studies:
    Prior to TANF “exit”: CSUBBM01 through CSUBBM99
    Month of TANF “exit”: CSUBAM00
    After TANF “exit”: CSUBAM01 through CSUBAM99

Reason for Case Closure
  Description: A state-specific notation for the reason a casehead’s TANF case was closed at time of exit.
  Reported Value: Individual state codes should be used and clearly documented.
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): CCLOSURE

Note: In all cases, the term “exit” month should be understood to mean the critical selection event that marks entry into the study population, whether that be a study population of leavers, divertees, applicants, or entrants to the TANF program.
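As an illustration of how a constructed grantee-specific item relates to the common elements, the sketch below derives a spell length for a leavers study from the monthly TANF receipt flags (prefix TNFR, Figure 2.1). It is a minimal example under stated assumptions, not the Work Group's method: the function name is hypothetical, a case is represented as a plain dictionary, and any value other than 1 (including a reserved negative missing-data code) is assumed to end the spell. A count built this way is also limited by how many pre-exit months the file actually contains.

```python
def spell_length_before_event(case: dict, prefix: str = "TNFR",
                              max_months: int = 99) -> int:
    """Count consecutive months of receipt immediately before the selection event.

    Walks backward from month 01 before the event (e.g. TNFRBM01, TNFRBM02, ...)
    and stops at the first month whose flag is not 1 or is absent from the record.
    """
    months = 0
    for k in range(1, max_months + 1):
        if case.get(f"{prefix}BM{k:02d}") != 1:
            break
        months += 1
    return months


if __name__ == "__main__":
    # Hypothetical case: receipt in the two months before exit, a gap in month 3.
    example = {"TNFRBM01": 1, "TNFRBM02": 1, "TNFRBM03": 0, "TNFRBM04": 1}
    print(spell_length_before_event(example))
```

If the raw flags are included in the file alongside the constructed variable, as Tip 2.1 recommends, other researchers can check or redefine the spell measure.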


Figure 2.3: Examples of Auxiliary Administrative Data Elements and How They May Be Defined

Each entry lists the File Field, followed by its Description, Reported Value, Recommended Time Period to Be Reported, and Variable Name(s).

Urban/Rural
  Description: A binary variable noting whether the casehead resided in an urban or rural area at time of the selection event. File documentation should clearly note how urban and rural areas have been defined.
  Reported Value:
    0 = Resided in rural area
    1 = Resided in urban area
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): URBRURAL

Geographic Identifier
  Description: A geographic identifying variable at the smallest possible unit (e.g., zip code or census tract) noting casehead location at time of selection event.
  Reported Value: Preferred codes include: Zip Code or Census tract
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): LOCATION

Number of Children Under 6 Years of Age
  Description: Number of children on cash assistance case who were under the age of 6 at selection event.
  Reported Value: Number
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): CHUNDER6

Age of Youngest Child Not on Case
  Description: Age of youngest child not on cash assistance case at time of selection event.
  Reported Value: Number
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): YOUNGEST

Language Barrier
  Description: A binary variable that indicates whether the casehead spoke English or had a language barrier at time of selection event.
  Reported Value:
    0 = No language barrier
    1 = Experienced language barrier
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): LANGUAGE

Educational Attainment
  Description: A state-specific notation as to the highest level of education a casehead had achieved at time of selection event.
  Reported Value: Individual state codes should be used and clearly documented.
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): EDUCATION

Child Support Receipt
  Description: A binary variable noting whether casehead received child support payments during the month of the selection event.
  Reported Value:
    0 = Did not receive child support payments
    1 = Received child support payments
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): CHLDSUPP

Subsidized or Public Housing
  Description: A binary variable noting whether a casehead received a housing subsidy or lived in publicly funded housing at the selection event.
  Reported Value:
    0 = Did not receive housing subsidy or live in publicly funded housing
    1 = Received housing subsidy or lived in publicly funded housing
  Recommended Time Period to Be Reported: N/A
  Variable Name(s): PHOUSING

Note: In all cases, the term “exit” month should be understood to mean the critical selection event that marks entry into the study population, whether that be a study population of leavers, divertees, applicants, or entrants to the TANF program.


Coding Missing Administrative Data

The Work Group agreed that it is important that researchers using the welfare outcomes data files understand and correctly analyze missing data. For example, flags for missing data listed as “blank” or “0” in administrative data files mean very different things across data elements and, as a result, may be easily misunderstood or misinterpreted by outside users in their analyses.

To mitigate possible interpretation problems caused by missing data, the Work Group recommended a standard approach to labeling missing administrative data in grantee files. An overview of this approach is provided in Figure 2.4. The Work Group recommends that grantees use negative integers from “-1” through “-5” to flag missing data. While it is impossible to identify all of the circumstances that may result in missing values, the Group did identify some common occurrences and assigned the reserved codes “-1” through “-3” to these situations.

Where recommended codes are insufficient to describe certain types of missing data, grantees should construct additional codes using values of “-4” or “-5” and provide clear documentation for how these codes were applied.

Tip 2.2 provides additional information as to how recommended codes may be applied to missing Unemployment Insurance wage data.

Examples of how these codes may be applied to the recommended common data elements are also provided in Appendix B.

Tip 2.2 and Appendix B provide examples of how codes for missing administrative data may be applied.

Figure 2.4 provides definitions and examples of recommended codes for missing administrative data.


Figure 2.4: Reserved Codes for Missing Administrative Data

Each entry lists the Reserved Code and its Description, followed by the Definition.

-1  Match File Unavailable for Cohort or Subgroup
  Definition: Data are missing for a whole cohort or subgroup because the administrative data file was unavailable for data matches (e.g., data are missing due to time lags or other reasons).

-2  No Match Attempted for Individual Case
  Definition: Data are missing for certain cases because the individual TANF case was excluded from the data match (e.g., necessary identifying information such as Social Security Number in the TANF record is missing).

-3  Data Were Missing in Linked File
  Definition: While the case itself appears in the match file, the specific data element was not reported for a particular case (e.g., there was incomplete information or missing values for a particular case).


Tip 2.2: Applying the Reserved Codes for Missing Administrative Data

Using UI wage data as an example, the recommended missing administrative data codes may be applied as follows:

• “No Match File Available” or “-1”

Grantees should use a “-1” to label missing UI data when information for certain cases is absent because the match file was unavailable for data matches with a subset of cases. For example, UI data are missing for a whole cohort or subgroup because of time lags between the file becoming available and including the data in the analyses, or because the UI file does not exist for some other reason.

• “No Match in Data File” or “-2”

Grantees should use a “-2” to label missing administrative data when information for certain cases is absent because the individual TANF case was excluded from the data match. For example, a match between the TANF records and UI data could not occur for a particular case where the TANF data are missing a Social Security Number.

• “Data Were Not Reported” or “-3”

Grantees should use a “-3” to label missing administrative data in situations where the case appears in the match file, but the specific data element was not reported for a particular case. That is, there was incomplete information or missing data, such as evidence of benefit receipt but missing data on the benefit amount for a particular month. However, it is important to note that in UI earnings records, “missing” earnings information is generally assumed to mean 0 earnings, not missing data. Accordingly, for the UI example, a “-3” missing code would probably not be used.
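The decision logic in Tip 2.2 can be written out as a short sketch. The example below assumes the three reserved codes from Figure 2.4 and the UI convention described above, in which an absent earnings record is read as 0 earnings rather than as missing. The class, function, and parameter names are hypothetical, and grantee-defined codes “-4” and “-5” are recognized by is_missing but not assigned here.

```python
from enum import IntEnum
from typing import Optional, Union


class MissingCode(IntEnum):
    """Reserved codes from Figure 2.4 (member names are illustrative)."""
    MATCH_FILE_UNAVAILABLE = -1   # whole cohort or subgroup could not be matched
    NO_MATCH_ATTEMPTED = -2       # case excluded from the match, e.g. no SSN
    MISSING_IN_LINKED_FILE = -3   # case matched, but the element was not reported


def code_ui_earnings(match_file_available: bool,
                     ssn: Optional[str],
                     reported_earnings: Optional[float]) -> Union[int, float]:
    """Return quarterly earnings or a reserved missing-data code for one case."""
    if not match_file_available:
        return MissingCode.MATCH_FILE_UNAVAILABLE
    if not ssn:
        return MissingCode.NO_MATCH_ATTEMPTED
    if reported_earnings is None:
        # For UI data, no reported earnings is generally read as 0 earnings,
        # so -3 is not assigned in this example.
        return 0.0
    return float(reported_earnings)


def is_missing(value: Union[int, float]) -> bool:
    """True for any integer code from -5 through -1, including grantee-defined -4 and -5."""
    return -5 <= value <= -1 and value == int(value)


if __name__ == "__main__":
    print(int(code_ui_earnings(True, None, None)))   # no SSN on the TANF record
    print(code_ui_earnings(True, "000-00-0000", 1250.5))
```

Analysts using such a file would typically filter with a check like is_missing before computing means or totals, so that reserved codes are not treated as dollar amounts.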


B. Survey Data Files

Grantees should produce data files for survey data collected as a part of their welfare outcomes grant, including all data collected by a specific questionnaire as well as some basic survey administration variables that researchers will need to correctly interpret the data. The following sections provide more specific information on the recommended contents for grantee survey data files.

Survey Sample Cases to Be Included in Grantee Files

The Work Group recommends that grantee survey data files include all cases, or records, that were sampled for a specific survey. That is, both survey respondents and non-respondents should be included in the survey data file. A binary “respondent flag” should be included to identify the two groups (see also Figure 2.5). Including all sampled cases will allow researchers to accurately examine differences between survey respondents and non-respondents.
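To show why keeping non-respondents in the file matters, the sketch below compares respondents and non-respondents on one administrative variable. It is an illustration only: it assumes each sampled case is a dictionary carrying the respondent flag under the name RESPOND (Figure 2.5) and that the administrative variable has already been linked to the survey record. The function name and the sample values are hypothetical, and negative values are skipped on the assumption that they are reserved missing-data codes.

```python
from statistics import mean


def mean_by_respondent_flag(records: list, variable: str) -> dict:
    """Unweighted mean of `variable` for non-respondents (0) and respondents (1)."""
    groups = {0: [], 1: []}
    for rec in records:
        value = rec[variable]
        if value >= 0:  # skip reserved negative missing-data codes
            groups[rec["RESPOND"]].append(value)
    return {flag: (mean(vals) if vals else None) for flag, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical sampled cases with linked first-quarter post-exit earnings.
    sample = [
        {"RESPOND": 1, "EARNAQ01": 1000},
        {"RESPOND": 1, "EARNAQ01": 3000},
        {"RESPOND": 0, "EARNAQ01": 500},
        {"RESPOND": 0, "EARNAQ01": -1},
    ]
    print(mean_by_respondent_flag(sample, "EARNAQ01"))
```

A comparison like this is only possible when every sampled case, not just completed interviews, appears in the file.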

Survey Data Items to Be Reported by Grantees

The Work Group recommends that grantees follow the basic premise that all survey data collected should be included in grantee survey data files (see Tip 2.3). However, the Group also noted several important exceptions to this premise:

• Verbatim, or “open-ended,” question responses that have not been coded do not need to be included. Coded verbatim responses should be included in the file.

• Grantees may substitute constructed variables for raw survey data in cases where the same survey question was administered several times to different survey sample subgroups. For example, the same income question administered to different subgroups at different points in the survey may be aggregated into a single measure or variable.

• Obvious personal identifiers such as name, address, phone number, social security number, and case identification number should not be included in the file.
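Two of the exceptions above lend themselves to a brief sketch: combining one question asked of different subgroups into a single constructed variable, and dropping obvious personal identifiers before release. The example is a minimal illustration, assuming each survey record is a dictionary; the column names (NAME, SSN, Q10A, and so on) and both function names are hypothetical and would differ from file to file.

```python
# Hypothetical column names for obvious personal identifiers.
PERSONAL_IDENTIFIERS = {"NAME", "ADDRESS", "PHONE", "SSN", "CASE_ID"}


def combine_subgroup_items(record: dict, source_items: list):
    """Return the first non-missing answer among versions of the same question
    that were asked of different sample subgroups (None if none was answered)."""
    for item in source_items:
        if record.get(item) is not None:
            return record[item]
    return None


def strip_identifiers(record: dict, identifiers=PERSONAL_IDENTIFIERS) -> dict:
    """Copy a record without the listed identifier columns (case-insensitive)."""
    return {k: v for k, v in record.items() if k.upper() not in identifiers}


if __name__ == "__main__":
    raw = {"MASTERID": "A001", "NAME": "Jane Doe", "ssn": "000-00-0000",
           "Q10A": None, "Q10B": 1200}
    released = strip_identifiers(raw)
    released["INCOME"] = combine_subgroup_items(raw, ["Q10A", "Q10B"])
    print(released)
```

The unique record identifier (MASTERID in Figure 2.5) is kept so that the released survey record can still be linked to administrative data.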

Similar to administrative data files, grantees should also include variables constructed for survey data analysis and reporting. Supporting information as to how these variables were created and their corresponding values and labels should be clearly presented in the survey data file documentation.

Figure 2.5 provides instructions for including a “respondent flag” in survey data files.

A checklist for producing survey data files is provided in Tip 2.3.


In some cases, grantees may feel that data from a particular survey question or bank of questions are “bad” or “unreliable” based on their subsequent analysis. These data should still be included in the welfare outcomes survey data file, and grantee concerns regarding limitations should be clearly described in the survey data file’s documentation.

Tip 2.3: Checklist for Producing Survey Data Files

What goes into a survey data file?

• All survey data collected by the questionnaire.

• All sampled cases, not just those with whom a survey interview was conducted.

• All variables constructed for analyses that were included in grantee survey data reports; raw data used to construct variables should also be included in the file and corresponding documentation should indicate how variables were constructed.

• Survey data weights.

• Survey administration variables that describe the survey process (see Figure 2.5).

• Flags that indicate where values have been imputed.

• A unique record identifier for each case. This identifier should be constructed using the approach outlined in Chapter 1 and should allow survey data to be linked to administrative data included in other welfare outcomes files.

What does not need to go into a survey data file?

• Verbatim or open-ended survey question responses that have not been coded.

• Obvious personal identifiers, such as name, address, phone number, social security number, and case identification number.

• Data collected by computer-assisted interviewing systems for sample management and data edit checks (e.g., interviewer call attempt message fields, responses from verification questions, “soft edits”).

• Survey data collected during questionnaire pretesting, unless these data are included in grantee survey data analysis and reports.


Survey Administration Items to be Reported by Grantees

To correctly interpret survey data, researchers must understand the nature of the survey implementation process. The Work Group recommends that grantees report eight survey administration variables in the survey data files. A complete description of these items is provided in Figure 2.5. Grantees should not report survey administration items that do not apply to their survey effort (e.g., pretest flag or sample weights).

Figure 2.5 gives a complete description of the eight survey administration variables for survey data files.


Figure 2.5: Recommended Survey Administration Items

File Field: Unique Identifier
Description: Each casehead included in the survey data file should be assigned a unique identifier that allows case-level data to be linked to other files.
Reported Value: Identifier constructed using recommended approach outlined in Chapter 1.
Variable Name: MASTERID

File Field: Final Survey Disposition
Description: The final survey disposition for a particular case should be indicated.
Reported Value: Grantee-specific codes may be used. These codes should correspond with the response rate calculation methodology outlined in the survey file’s documentation.
Variable Name: DISPOSIT

File Field: Respondent Flag
Description: A binary flag noting cases that are included as “completed interviews” in the numerator of the reported response rate.
Reported Value:
0 = Not included in response rate numerator as completed interview
1 = Included in response rate numerator as completed interview
Variable Name: RESPOND

File Field: Sample Weight
Description: Weights calculated to account for survey sample design effects.
Reported Value: Sample weight value
Variable Name: SRWEIGHT (If file has multiple weights, variable names should be noted as: SRWEIGT1; SRWEIGT2; etc.)

File Field: Interview Date
Description: The date on which a survey interview was completed.
Reported Value: MMDDYYYY
MM = Numeric month
DD = Numeric day
YYYY = Numeric year
Variable Name: RESPDATE

File Field: Number of Telephone Attempts
Description: The number of telephone interviewer attempts to reach a sample member for a survey interview. Survey data file documentation should clearly note how “telephone interview attempt” was defined.
Reported Value: Number
Variable Name: ATTEMPTS

File Field: Interview Mode
Description: A flag indicating by which mode a survey interview was conducted.
Reported Value:
1 = CATI
2 = CAPI
3 = Phone with pencil and paper
4 = In-person with pencil and paper
5 = In-person with cellular telephone
6 = Mail survey
Variable Name: INTVMODE

File Field: Pretest Sample Flag
Description: In cases where grantees include data from a survey pretest in their final analysis, a binary flag should be included that notes which cases are from the pretest and which cases are from the final survey fielding.
Reported Value:
0 = Non-pretest
1 = Pretest data
Variable Name: PRETEST
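The coding rules in Figure 2.5 lend themselves to a simple automated check before a file is released. The Python sketch below is illustrative only: the helper names parse_respdate and check_record are hypothetical, a record is assumed to be a dictionary keyed by the recommended variable names, and only a few of the eight items are checked.

```python
from datetime import date, datetime

# Interview mode codes as listed in Figure 2.5.
INTVMODE_LABELS = {
    1: "CATI",
    2: "CAPI",
    3: "Phone with pencil and paper",
    4: "In-person with pencil and paper",
    5: "In-person with cellular telephone",
    6: "Mail survey",
}


def parse_respdate(value):
    """Parse an MMDDYYYY string into a date; raise ValueError if malformed."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError("RESPDATE must be eight digits (MMDDYYYY)")
    return datetime.strptime(value, "%m%d%Y").date()


def check_record(record):
    """Return a list of problems found in one survey record (hypothetical helper)."""
    problems = []
    if not record.get("MASTERID"):
        problems.append("MASTERID is missing")
    if record.get("RESPOND") not in (0, 1):
        problems.append("RESPOND must be 0 or 1")
    # Items that do not apply to a survey effort may be absent, so these
    # two checks run only when the variable is present.
    if "INTVMODE" in record and record["INTVMODE"] not in INTVMODE_LABELS:
        problems.append("INTVMODE must be 1-6")
    if "RESPDATE" in record:
        try:
            parse_respdate(record["RESPDATE"])
        except ValueError:
            problems.append("RESPDATE is not a valid MMDDYYYY date")
    return problems
```

An empty list from check_record means the checked items follow the recommended coding. Grantee-specific DISPOSIT codes would need their own check based on the file's documentation.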


Protecting Respondent Confidentiality in Survey Data Files

Work Group members felt that even with proposed plans for researcher-restricted access to survey data files (see Chapter 5), some grantees may feel the need to provide additional confidentiality protections in their data by excluding certain data elements. In these cases, the Group recommends that grantees consider “top coding” or collapsing categories (e.g., age), as an alternative to variable deletion. However, if grantees must delete variables to meet confidentiality requirements, the deletions should be clearly noted in their file documentation.

• Top Coding

While Work Group members did not feel that standards for top coding specific variables were necessary, they did recommend a general standard of having no more than 5% of high- or low-range values top coded. This strategy eliminates “extreme” high or low values from the data file.

If grantees choose to top code, these data should be included in the survey data file and the file’s documentation should clearly note where and when top coding has been applied.
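One way to read the 5% guideline is sketched below in Python. The cut-off rule, the function name top_code, and the accompanying flag list are assumptions made for illustration; the Work Group did not prescribe a specific method. The sketch handles only the high end of the range; low-range values could be treated symmetrically.

```python
import math


def top_code(values, max_share=0.05):
    """Cap the highest values so that at most max_share of them are recoded.

    Returns (coded_values, flags), where flags[i] is 1 if values[i] was
    top coded and 0 otherwise. Hypothetical helper, not a prescribed method.
    """
    n = len(values)
    k = math.floor(max_share * n)  # maximum number of values that may be recoded
    if k == 0:
        # Too few cases to recode any value without exceeding the share.
        return list(values), [0] * n
    # The cap is the largest value left untouched once the top k are set aside.
    cap = sorted(values)[n - k - 1]
    coded = [min(v, cap) for v in values]
    flags = [1 if v > cap else 0 for v in values]
    return coded, flags
```

Because ties at the cap are left unchanged, the number of recoded values never exceeds the allowed share. The flag list is one way to record where top coding was applied so it can be described in the file documentation.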

• Collapsing Categories

Collapsing values for continuous survey data variables into categories may reduce the possibility of identifying individual respondents. For example, children’s ages may be collapsed into categories such as 0-2 years, 3-5 years, etc.

Similar to top coding, if grantees choose to collapse categories, these data should be included in the survey data file and the file’s documentation should clearly note where and when categories have been collapsed.
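Collapsing a continuous variable into categories can be sketched in Python as follows. The first two age bands (0-2 and 3-5 years) come from the example above; the remaining bands, the constant AGE_BANDS, and the function name collapse_age are illustrative assumptions that a grantee would replace with its own documented categories.

```python
# The first two bands follow the example in the text; the last two are
# assumed here purely for illustration.
AGE_BANDS = [(0, 2), (3, 5), (6, 12), (13, 17)]


def collapse_age(age_years, bands=AGE_BANDS):
    """Return the label of the band containing a whole-year age."""
    for low, high in bands:
        if low <= age_years <= high:
            return f"{low}-{high} years"
    # Ages outside every band are reported rather than silently dropped.
    raise ValueError(f"age {age_years} does not fall in any band")
```

For example, collapse_age(4) returns "3-5 years". The band definitions themselves are what the file documentation would need to record.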

Recommended Strategies for Reporting Special Types of Survey-related Data

Some grantee survey analysis efforts have employed advanced statistical techniques to account for sample design and survey non-response. Where these techniques have been applied, it is essential that grantees clearly document these strategies and include values or other data elements that will allow future researchers to replicate reported survey results.

Chapter 5 provides background information on confidentiality and restricted file access.


The Group noted that the two most popular techniques employed by grantees were imputation and survey sample weights. Specific recommendations for reporting on these strategies were developed and are described in the following paragraphs.

• Imputed Data

Some grantees may choose to impute certain survey data values. In cases where imputed values are reported, an additional data element or “flag” variable should be created that indicates which records contain imputed data. The flag should be coded as follows:

0 = Non-imputed data
1 = Imputed data

A separate flag variable should be constructed for each survey data element that has been imputed. For example, if survey data indicating monthly household income and the number of jobs held since TANF exit includes imputed values, two flag variables should be created – imputed income and imputed number of jobs – that indicate which values have been imputed.

Grantee survey data file documentation should include a detailed account of how imputation techniques were applied.
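The one-flag-per-variable rule can be illustrated with a short Python sketch. The variable names HHINCOME and NUMJOBS, the IMP_ prefix, and the function impute_with_flags are hypothetical. Mean substitution appears only as a stand-in; the Work Group did not recommend a particular imputation technique.

```python
def impute_with_flags(records, fields):
    """Fill missing (None) values and add one 0/1 flag variable per field.

    Mean substitution is a placeholder for whatever imputation technique
    a grantee actually applies and documents.
    """
    out = [dict(r) for r in records]  # leave the input records unchanged
    for field in fields:
        observed = [r[field] for r in records if r[field] is not None]
        if not observed:
            raise ValueError(f"no observed values available for {field}")
        fill = sum(observed) / len(observed)
        for r in out:
            imputed = r[field] is None
            r["IMP_" + field] = 1 if imputed else 0  # 0 = non-imputed, 1 = imputed
            if imputed:
                r[field] = fill
    return out
```

Calling impute_with_flags(records, ["HHINCOME", "NUMJOBS"]) yields two separate flag variables, so a researcher can tell which variable on a given record was imputed.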

• Survey Weights

All survey data weights should be included in grantee data files and file documentation should include a detailed description of how these weights were calculated. Where grantees have calculated more than one survey weight for a particular record, each weight should be included in the file as a separate variable (see Figure 2.5).

Figure 2.5 shows guidelines for reporting survey administration items, including sample weights.
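Storing each weight as its own variable lets a researcher choose which weight to apply when replicating an estimate. The Python sketch below is an illustration only: the outcome variable EMPLOYED and the function weighted_mean are hypothetical, while SRWEIGHT, SRWEIGT1 and SRWEIGT2 follow the naming in Figure 2.5.

```python
def weighted_mean(records, variable, weight="SRWEIGHT"):
    """Weighted mean of one variable, using the named weight variable."""
    total_weight = sum(r[weight] for r in records)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(r[variable] * r[weight] for r in records) / total_weight
```

With multiple weights on the file, the same estimate can be recomputed under each one, for example weighted_mean(records, "EMPLOYED", weight="SRWEIGT2").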


Coding Missing Survey Data

Similar to the reserved codes for missing administrative data, the Group recommends that grantees use a set of reserved codes when labeling missing survey data. For missing survey data, grantees should use negative numbers between the values of six and nine. The recommended reserved codes are presented in Figure 2.6. Grantees should construct additional missing codes where necessary and provide clear documentation for how these codes were applied.

Figure 2.6 shows reserved codes for missing survey data.


Figure 2.6: Reserved Codes for Missing Survey Data

Reserved Code: -6
Description: Invalid Missing
Definition: Data are missing because a survey question was inadvertently not administered to a respondent. For example, a survey skip pattern was violated or incorrectly applied.

Reserved Code: -7
Description: Don’t Know
Definition: Data are missing because a respondent answered a question with a “don’t know” response.

Reserved Code: -8
Description: Refused
Definition: Data are missing because a respondent refused to answer a question.

Reserved Code: -9
Description: Valid Missing
Definition: Data are missing because survey question was not administered to respondent. For example, the question was correctly skipped.
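Researchers using a file coded this way need to set the reserved codes aside before computing statistics. A minimal Python sketch follows. The function name split_missing is hypothetical, and the sketch assumes the variable cannot legitimately take the values -6 through -9.

```python
from collections import Counter

# Reserved codes from Figure 2.6. Grantee-specific codes could be added,
# provided they are documented.
RESERVED_CODES = {
    -6: "Invalid Missing",
    -7: "Don't Know",
    -8: "Refused",
    -9: "Valid Missing",
}


def split_missing(values):
    """Separate usable values from reserved missing codes.

    Returns (valid_values, reasons), where reasons counts each kind of
    missing data found.
    """
    valid = [v for v in values if v not in RESERVED_CODES]
    reasons = Counter(RESERVED_CODES[v] for v in values if v in RESERVED_CODES)
    return valid, reasons
```

The counts by reason distinguish questions that were correctly skipped from refusals or "don't know" responses when item non-response is reported.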