Linking survey data with administrative employment data: The case of the ALWA survey Utilizing Administrative Data: Technical, Statistical and Research Issues Washington D.C. October 27, 2011 Manfred Antoni Institute for Employment Research (IAB)


Linking survey data with administrative employment data: The case of the ALWA survey

Utilizing Administrative Data: Technical, Statistical and Research Issues

Washington D.C.

October 27, 2011

Manfred Antoni
Institute for Employment Research (IAB)


Outline

Administrative data sources of the IAB and data access

ALWA survey data

The process of record linkage

Outlook


Origin of administrative IAB data

Sources of data of the German Federal Employment Agency (BA):

Social security notifications by employers
Data from internal processes of the local employment agencies and the BA

Data are merged and prepared for scientific purposes by the Institute for Employment Research (IAB)

Access to data sets by external researchers through the Research Data Centre (FDZ) of the BA, located at the IAB in Nuremberg, Germany


Ways of access to IAB data via the FDZ

Off-Site Use On-Site Use

CD or Download

(Scientific Use File)

Remote Execution Guest Stay

Factually anonymous Weakly anonymous (= confidential, restricted)

IAB CD


Access via RDC-in-RDC approach

Technical implementation: Citrix thin client solution

[Diagram: access sites (Nuremberg, Düsseldorf, Bremen, Berlin, Dresden; Ann Arbor, MI) and connection scheme with the labels: thin client, ICA connection using a Citrix Access Gateway, connection to management server, management server, terminal server with RDC data]


Administrative labor market data: individuals

Day-to-day information on histories of dependent employment and registered unemployment since 1975 (over 80% of the German labor force)

Longitudinal information on earnings and benefit receipt

Since 2000: detailed information on the participation in active labor market policy measures and job seeking

Detailed social security benefit information since 2005 (household-level)


Administrative labor market data: establishments

Daily establishment-level employment and wage bill from 1975 to the present

Establishment History Panel (BHP) includes yearly measures on:

economic sector
qualification and age structure
wage distribution inside the firm
worker flows for different subgroups of employees
founding and closing of firms

Administrative establishment-level data can be combined with worker-level data and with the IAB Establishment Panel survey


Administrative data sources of the IAB and data access

ALWA survey data

The process of record linkage

Outlook


Work and Learning in a Changing World (ALWA) (I)

Retrospective interviews (CATI) with > 10,000 German residents (cf. Antoni et al., 2010)

Birth cohorts 1956-1988 (aged 18-52 at time of interview in 2007/08)

Monthly longitudinal information on, e.g.

educational,
employment,
residential,
partnership, child-rearing and parental leave histories,
military and alternative services, etc.

Aided recall techniques reduce recall error during interviews (cf. Drasch and Matthes, forthcoming)


Work and Learning in a Changing World (ALWA) (II)

Detailed cross-sectional information on, e.g.

place and date of birth, immigrant background, language skills,
family background,
cultural capital,
religiousness,
importance of different domains of life,
self-reported measures of cognitive skills and personality traits etc.

Literacy and numeracy tests (PAPI, 30 min. per domain) for a sub-sample

Data access: Scientific Use File provided by the FDZ (cf. http://fdz.iab.de/en/FDZ_Individual_Data/ALWA.aspx)


Administrative data sources of the IAB and data access

ALWA survey data

The process of record linkage

Outlook


Consent to record linkage in Germany

Survey institute or interviewers are legally bound to inform respondents about the nature of information that is going to be matched.

The interviewer has to ask for approval to that procedure explicitly.

The respondent has to be able to make an informed decision.

During the ALWA survey, consent was given by 9,531 (92%) of the respondents.


Identifiers of respondents

ALWA lacks unique identifiers for a direct link to administrative records.

Identifiers for matching:

first and last name
gender
day, month and year of birth
postal code, place name, street name and house number

Sources of identifiers:

Field information (infas Institute for Applied Social Sciences)
Personal register data (IAB department IT Services and Information Management)

Reference year for both sources: 2007

Extensive standardization of identifiers


Subsequent steps of the linkage process

Exact matching

Records from both data sources are directly matched based on all available fields.
Even the smallest variations in spelling lead to a rejection of a potential match.

Probabilistic record linkage

Comparison was done with the software Merge ToolBox (MTB, v0.7) (cf. Schnell, Bachteler, and Reiher, 2005)
Parameters for Jaro-Winkler metric according to prior experience with IAB data (cf. Bachteler, 2008)
Blocking on postal codes (about 1100 blocks)
Classification of record pairs into links, possible links and non-links (cf. Fellegi and Sunter, 1969)

Manual matching

Independent manual review and classification of possible links by two staff members
Supervision and final decision in case of contradicting classifications by third person
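The probabilistic step can be illustrated with a minimal Python sketch. It is not the Merge ToolBox implementation: the record layout (`id`, `first`, `last`, `plz`), the example records, the averaging of two name similarities and the thresholds 0.80 and 0.95 are all assumptions made for illustration. In particular, thresholding an average similarity is a simplified stand-in for the Fellegi-Sunter weights, which are based on likelihood ratios. The prefix scaling factor 0.1 and the prefix length 4 are the values commonly used for Jaro-Winkler, not the parameters chosen for ALWA.

```python
from collections import defaultdict


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity of two strings, between 0 and 1."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity with a bonus for a shared prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def classify(score: float, lower: float, upper: float) -> str:
    """Three-way classification with two (hypothetical) thresholds."""
    if score >= upper:
        return "link"
    if score >= lower:
        return "possible link"
    return "non-link"


def link_within_blocks(survey, admin, lower=0.80, upper=0.95):
    """Compare only record pairs that share a postal code (blocking)."""
    blocks = defaultdict(list)
    for rec in admin:
        blocks[rec["plz"]].append(rec)
    results = []
    for s in survey:
        for a in blocks.get(s["plz"], []):
            score = (jaro_winkler(s["first"], a["first"])
                     + jaro_winkler(s["last"], a["last"])) / 2
            results.append((s["id"], a["id"], round(score, 3),
                            classify(score, lower, upper)))
    return results


# Hypothetical, already standardized example records.
survey = [{"id": "s1", "first": "MANFRED", "last": "MEIER", "plz": "90478"}]
admin = [
    {"id": "a1", "first": "MANFRED", "last": "MEYER", "plz": "90478"},
    {"id": "a2", "first": "ANNA", "last": "SCHMIDT", "plz": "90478"},
    {"id": "a3", "first": "MANFRED", "last": "MEIER", "plz": "10115"},
]
results = link_within_blocks(survey, admin)
for row in results:
    print(row)
```

Record `a3` is never compared with `s1` because it lies in a different postal code block. This shows both the benefit of blocking (fewer comparisons) and its cost (a true match with a wrong or outdated postal code is missed). Pairs classified as possible links would go on to the manual review step.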


Number of observations over all stages of linkage

                                                     N     N/Nc     N/Nr
CATI respondents (Nr)                            10404              100%
Consenting CATI respondents (Nc)                  9531     100%   91.61%
Exact matches                                     5035   52.83%   48.39%
Exact and probabilistic matches (Jaro-Winkler)    7919   83.09%   76.11%
Exact, probabilistic and manual matches           8243   86.49%   79.23%
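The shares in this table follow directly from the counts. As a small check, the Python sketch below recomputes them. The variable and function names are illustrative; only the counts come from the slide.

```python
N_RESPONDENTS = 10404  # CATI respondents (Nr)
N_CONSENTING = 9531    # consenting CATI respondents (Nc)

# Cumulative number of linked records after each stage.
stages = {
    "Exact matches": 5035,
    "Exact and probabilistic matches (Jaro-Winkler)": 7919,
    "Exact, probabilistic and manual matches": 8243,
}


def share(n: int, base: int) -> float:
    """Percentage of n in base, rounded to two decimals."""
    return round(100 * n / base, 2)


consent_rate = share(N_CONSENTING, N_RESPONDENTS)
rates = {
    label: (share(n, N_CONSENTING), share(n, N_RESPONDENTS))
    for label, n in stages.items()
}

print(f"Consent rate: {consent_rate}%")
for label, (of_consenters, of_respondents) in rates.items():
    print(f"{label}: {of_consenters}% of Nc, {of_respondents}% of Nr")
```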


Data release and outlook

Official data product of the FDZ: "ALWA survey data linked to administrative data of the IAB" (ALWA-ADIAB)

Access via on-site use and subsequent remote execution at the FDZ starting this week (cf. http://fdz.iab.de/en/FDZ_Individual_Data/ALWA.aspx)

Planned methodological analyses based on ALWA-ADIAB:

Evaluation of success of aided recall techniques in reducing recall error
Validation of educational histories in administrative data and evaluation of existing correction procedures
Analysis of social desirability bias in respondent behavior

Planned substantive research based on ALWA-ADIAB:

Monetary returns to cognitive skills in the German labor market


Future developments at the FDZ

Combination of administrative records (and surveys) with

National Educational Panel Study (NEPS): ALWA respondents willing to participate in further rounds were included in the NEPS. Non-consenters were asked for consent again.
⇒ Analysis of consent and linkage success in a longitudinal perspective
Patent data
Data of the Deutsche Bundesbank (Foreign Direct Investments, Annual Financial Statements)
Commercial business data (Bureau van Dijk)
Health data
Social security data from other countries
Multinationals


Thank you for your attention

Manfred Antoni
Institute for Employment Research (IAB)
Contact: [email protected]


Consent and linkage success rate by subgroups, expressed as percentages I

                               consenters   exact     exact+probabil.   all
                                            matches   matches           matches
Total                          92.1         53.1      80.1              86.4

18-24                          93.9         59.9      84.1              90.0
25-34                          91.6         58.3      82.9              90.7
35-44                          92.0         50.4      79.0              85.5
45-52                          91.5         48.8      77.1              82.7
                               (0.014)      (0.000)   (0.000)           (0.000)
Female                         92.1         52.5      80.2              87.1
Male                           92.2         53.7      80.0              85.7
                               (0.879)      (0.243)   (0.867)           (0.050)
German nationality             92.1         53.0      80.0              86.2
Foreign nationality            90.9         55.3      85.4              95.5
                               (0.489)      (0.532)   (0.057)           (0.000)
Native language not German     92.1         65.8      84.2              93.8
Native language German         92.1         52.6      80.0              86.2
                               (0.991)      (0.000)   (0.068)           (0.000)
Born in West Germany           91.7         52.3      79.8              85.8
Born in East Germany           93.8         56.5      81.6              89.1
                               (0.003)      (0.002)   (0.088)           (0.000)
No training                    93.3         57.9      82.2              87.1
Training + lower secondary     92.0         58.9      85.9              92.4
Training + intermediate        91.8         57.6      83.7              90.1
Training + upper secondary     93.1         53.5      82.1              88.9
Master craftsman               92.5         48.0      73.1              79.0
Higher Education               91.0         41.8      71.8              79.0
                               (0.106)      (0.000)   (0.000)           (0.000)

N                              9790         9024      9024              9024

Note: ALWA, own calculations, p-values of Pearson χ2-test in parentheses. Percentages in columns related to linkage success are based on consenters.


Consent and linkage success rate by subgroups, expressed as percentages II

                                consenters   exact     exact+probabil.   all
                                             matches   matches           matches
Total                           92.1         53.1      80.1              86.4

Self employed                   89.2         42.4      64.8              72.6
Freelancer                      93.8         52.9      82.6              87.4
In dependent employment         92.1         56.3      86.8              92.9
Civil servant                   93.3         13.1      20.9              25.7
Unemployed                      89.3         68.4      86.9              94.2
In formal education             94.7         56.6      80.0              84.9
Other activity                  92.3         47.5      73.5              82.4
                                (0.000)      (0.000)   (0.000)           (0.000)
No partner in household         92.3         58.3      83.3              88.6
Partner in household            92.0         50.0      78.2              85.1
                                (0.641)      (0.000)   (0.000)           (0.000)
No children in household        92.2         56.3      81.6              87.3
Children in household (dummy)   92.0         49.9      78.7              85.5
                                (0.684)      (0.000)   (0.001)           (0.015)
Personal net income <500EUR     93.2         54.7      80.8              86.5
500-999EUR                      92.7         59.4      85.6              92.5
1000-1499EUR                    92.6         56.4      82.3              90.4
1500-1999EUR                    92.8         57.2      83.0              88.9
2000-2999EUR                    92.9         46.1      75.6              81.3
more than 3000EUR               92.5         40.2      69.2              74.4
Income refused                  63.6         47.3      80.0              86.7
                                (0.000)      (0.000)   (0.000)           (0.000)

N                               9790         9024      9024              9024

Note: ALWA, own calculations, p-values of Pearson χ2-test in parentheses. Percentages in columns related to linkage success are based on consenters.


Possible determinants of consent

Respondent characteristics: gender, age, qualification, foreign nationality, native language German, cognitive abilities, employment status, income, data protection concerns (measured as refusal to provide income information), household composition

Interviewer characteristics: gender, age, qualification, number of previous ALWA interviews, consent rate in previous interviews, survey experience before ALWA, age and schooling level relative to that of respondent

Interview situation: elapsed duration of interview, share of refused answers (to sensitive questions), share of answers with recall problems, consent to further wave or paper-and-pencil tests, weekday, disturbances, comprehension problems during interview


Method of empirical analysis: Probit regression

Binary dependent variables:

consent to data linkage by the respondent (N=9790)
successful exact match of survey and administrative record (N=9024)
successful exact or probabilistic match of survey and administrative record (N=9024)

Results presented as odds ratios
Cluster-robust standard errors on the basis of 210 interviewers in consent models
Exclusion of observations with missing values in control variables


Determinants of consent I: respondent

                                      without interactions   with interactions

Foreign nationality                   0.872    (0.101)       0.869    (0.101)
Born in East Germany                  1.192*** (0.074)       1.188*** (0.075)
Age (dummies, ref.: aged 18-24)
  25-34                               0.846*   (0.079)       0.832*   (0.079)
  35-44                               0.909    (0.086)       0.847    (0.087)
  45-52                               0.879    (0.089)       0.794**  (0.092)
Qualification (dummies, ref.: no vocational training)
  Training + lower secondary          1.045    (0.101)       1.017    (0.100)
  Training + intermediate             0.999    (0.082)       0.984    (0.084)
  Training + upper secondary          1.114    (0.104)       1.137    (0.108)
  Master craftsman                    1.065    (0.128)       1.057    (0.128)
  Higher Education                    1.015    (0.088)       1.033    (0.090)
Employment status (dummies, ref.: unemployed)
  Self employed                       1.136    (0.128)       1.135    (0.129)
  Freelancer                          1.503*** (0.214)       1.497*** (0.215)
  In dependent employment             1.316*** (0.113)       1.320*** (0.114)
  Civil servant                       1.482*** (0.196)       1.493*** (0.199)
  In formal education                 1.474*** (0.163)       1.485*** (0.165)
  Other activity                      1.327*** (0.141)       1.336*** (0.142)
Personal net income (dummies, ref.: 1500-1999 EUR)
  Personal net income below 500EUR    1.010    (0.078)       1.009    (0.079)
  500-999EUR                          0.936    (0.070)       0.935    (0.069)
  1000-1499EUR                        0.961    (0.064)       0.963    (0.064)
  2000-2999EUR                        1.002    (0.075)       1.003    (0.075)
  more than 3000EUR                   1.064    (0.085)       1.065    (0.085)
  Income refused                      0.524*** (0.053)       0.526*** (0.053)


Determinants of consent II: interviewer

                                            without interactions   with interactions

Int: male                                   0.886**  (0.053)       0.885**  (0.052)
Experience as interviewer (years)           1.078*** (0.031)       1.077*** (0.031)
Age of interviewer (dummies, ref.: aged < 25)
  Int: aged 25-34                           1.118    (0.122)       1.113    (0.113)
  Int: aged 35-44                           1.131    (0.138)       1.192    (0.165)
  Int: aged 45-54                           1.211**  (0.117)       1.317**  (0.160)
  Int: aged 55 and more                     1.207    (0.145)       1.394**  (0.213)
Qualification of interviewer (dummies, ref.: no vocational training)
  Int: training, below upp. secondary       1.017    (0.108)       1.074    (0.124)
  Int: training, upper secondary            1.062    (0.111)       1.062    (0.114)
  Int: higher education                     0.889    (0.082)       0.892    (0.083)
  Int: education unknown                    0.937    (0.107)       0.952    (0.120)
No. of previous ALWA interviews (dummies, ref.: 0-25)
  No. of previous interviews 26-50          0.849**  (0.054)       0.849**  (0.055)
  No. of previous interviews 51-100         0.870**  (0.059)       0.868**  (0.056)
  No. of previous interviews above 100      0.859**  (0.057)       0.860**  (0.056)
Int: consent rate in previous interviews    1.164    (0.164)       1.151    (0.162)
Age relative to that of respondent (dummies, ref.: aged the same +/-10 years)
  Int. at least 10 years younger                                   1.033    (0.082)
  Int. at least 10 years older                                     0.866*   (0.068)
Schooling level relative to that of respondent (dummies, ref.: lower schooling level)
  Same schooling level                                             1.084    (0.131)
  Higher schooling than respondent                                 1.128    (0.157)
  Unknown relation of schooling levels                             1.080    (0.206)


Determinants of consent III: interview situation

                                          without interactions   with interactions

Interview on weekend                      1.056    (0.062)       1.054    (0.061)
Share of refused answers as percentage    0.745*** (0.056)       0.745*** (0.056)
Share of 'don't know' as percentage       0.950*** (0.014)       0.950*** (0.014)
Duration before consent quest. (m)        0.998    (0.002)       0.998    (0.002)
Disturbance during int.                   0.998    (0.076)       0.998    (0.076)
Comprehension problems during int.        1.001    (0.080)       1.002    (0.080)
Other problems during int.                0.821*** (0.055)       0.818*** (0.055)
Consent to follow-up survey               1.916*** (0.132)       1.914*** (0.131)
Consent to cognitive tests                1.510*** (0.075)       1.511*** (0.074)
Constant                                  1.706**  (0.438)       1.603    (0.498)

Wald-statistic (χ2) [p-value]             1295 [0.000]           1416 [0.000]
pseudo R2                                 0.114                  0.115

Notes: ALWA, own calculations; 9790 observations; 210 interviewers; cluster-robust standard errors in parentheses; ***, **, * denote significance at 1%, 5%, 10%.


Determinants of deterministic and probabilistic linkage success

                                    deterministic        determ.+probabilistic

Male                                1.057*   (0.035)     1.041    (0.040)
Foreign nationality                 0.819*   (0.084)     1.082    (0.136)
Native language German              0.736*** (0.064)     1.028    (0.105)
Self employed                       0.629*** (0.051)     0.576*** (0.054)
Freelancer                          0.739*** (0.069)     0.913    (0.103)
In dependent employment             0.797*** (0.055)     1.091    (0.091)
Civil servant                       0.260*** (0.028)     0.176*** (0.020)
In formal education                 0.678*** (0.054)     0.683*** (0.066)
Other activity                      0.632*** (0.050)     0.636*** (0.059)
Personal net income below 500EUR    0.921    (0.050)     0.986    (0.065)
500-999EUR                          1.018    (0.053)     1.096    (0.070)
1000-1499EUR                        0.929    (0.045)     0.907    (0.054)
2000-2999EUR                        0.871*** (0.044)     0.980    (0.060)
more than 3000EUR                   0.819*** (0.049)     0.849**  (0.058)
Income refused                      0.800**  (0.088)     0.938    (0.121)
Constant                            2.370*** (0.293)     3.265*** (0.479)

Wald-statistic (χ2) [p-value]       578 [0.000]          1017 [0.000]
pseudo R2                           0.046                0.113

Notes: ALWA, own calculations, 9024 observations, exponentiated coefficients. ***, **, * denote significance at 1%, 5%, 10%. Reference category: respondent aged 18-24, no training, unemployed, net household income of 1500-1999 EUR.


Sensitivity analyses

Results robust in the following alternative specifications:

Inclusion of survey weights
Random effects based on the interviewers
Bivariate probit regression of both equations with correlated error terms
Four-variate probit regression with additional equations for consent to follow-up interviews and paper-and-pencil tests respectively


Conclusions

...for data users:

High consent and linkage rates ⇒ high number of observations, increased research potential
Personal net income and qualification not relevant for consent ⇒ no selectivity in some important variables
Youngest age group over-represented in the linked data ⇒ additional weighting variable might be necessary


Conclusions

...for survey design and field administration:

Privacy concerns contribute to lack of consent ⇒ advance letter and questionnaire wording should create trust
Female and more experienced interviewers are more successful in achieving consent ⇒ careful selection and assignment of interviewers
High number of interviews per interviewer in one study may have an adverse impact on consent rates ⇒ better field supervision, limit on interviews per interviewer

...for record linkage practice:

Linkage rate increases from 53% to 83% by probabilistic record linkage
Overall selectivity of linked data set is slightly decreased
⇒ additional step of probabilistic linkage successfully increases research potential


Question of consent

To keep the interview hereafter as short as possible, we would like to include data in our analyses of the survey that are held by the Institute for Employment Research of the Federal Employment Agency in Nuremberg.
This includes, for instance, information on previous periods of employment and unemployment and on the participation in measures of active labor market policy.
To this end, I kindly ask you for your permission to merge these data with the survey data.
It is guaranteed that all rules of data protection are strictly applied whenever this information is analysed.
It goes without saying that your consent is voluntary. You can withdraw it at any time.
Is that ok?


Standardization

Extensive standardization of identifiers before records are compared:

minimizing variation in spelling of names, places and street names,
filling in missing information in postal codes or place names,
deleting blanks and special characters,
standardizing or deleting abbreviations,
deleting suffixes of house numbers.

All steps are applied consistently to both address files.
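Some of these steps can be sketched in Python. The concrete rules below are assumptions for illustration, not the rules used for ALWA: upper-casing, transliterating umlauts, expanding a trailing "STR." to "STRASSE", and keeping only the leading digits of a house number. The function names are hypothetical as well. Filling in missing postal codes or place names would need a reference directory and is left out.

```python
import re

# Transliteration of upper-case umlauts; str.upper() already maps "ß" to "SS".
UMLAUTS = str.maketrans({"Ä": "AE", "Ö": "OE", "Ü": "UE"})


def standardize_name(name: str) -> str:
    """Upper-case, transliterate umlauts, drop blanks and special characters."""
    s = name.strip().upper().translate(UMLAUTS)
    # Limitation of this sketch: other accented letters are dropped as well.
    return re.sub(r"[^A-Z]", "", s)


def standardize_street(street: str) -> str:
    """Like standardize_name, but also expands a trailing 'STR' or 'STR.'."""
    s = street.strip().upper().translate(UMLAUTS)
    s = re.sub(r"STR\.?$", "STRASSE", s)  # standardize the abbreviation
    return re.sub(r"[^A-Z0-9]", "", s)    # delete blanks and special characters


def standardize_house_number(number: str) -> str:
    """Keep the leading digits only, e.g. '12a' becomes '12'."""
    match = re.match(r"\s*(\d+)", number)
    return match.group(1) if match else ""


# The same functions must be applied to both address files.
print(standardize_name("Müller-Lüdenscheidt "))
print(standardize_street("Regensburger Str."))
print(standardize_house_number("104 b"))
```

Applying identical functions to the survey file and the register file matters more than the exact rules: a transformation used on only one side would introduce new spelling differences instead of removing them.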


Determinants of consent I: respondent

                                      without interactions   with interactions

Male                                  0.991    (0.048)       0.997    (0.049)
Foreign nationality                   0.872    (0.101)       0.869    (0.101)
Native language German                0.834    (0.101)       0.835    (0.101)
Born in East Germany                  1.192*** (0.074)       1.188*** (0.075)
Age (dummies, ref.: aged 18-24)
  25-34                               0.846*   (0.079)       0.832*   (0.079)
  35-44                               0.909    (0.086)       0.847    (0.087)
  45-52                               0.879    (0.089)       0.794**  (0.092)
Qualification (dummies, ref.: no vocational training)
  Training + lower secondary          1.045    (0.101)       1.017    (0.100)
  Training + intermediate             0.999    (0.082)       0.984    (0.084)
  Training + upper secondary          1.114    (0.104)       1.137    (0.108)
  Master craftsman                    1.065    (0.128)       1.057    (0.128)
  Higher Education                    1.015    (0.088)       1.033    (0.090)
Prose literacy score                  0.969    (0.022)       0.971    (0.023)
Document literacy score               0.969    (0.019)       0.972    (0.019)
Numeracy score                        0.965    (0.022)       0.966    (0.022)
High-cultural activity                0.927*** (0.018)       0.929*** (0.018)
Employment status (dummies, ref.: unemployed)
  Self employed                       1.136    (0.128)       1.135    (0.129)
  Freelancer                          1.503*** (0.214)       1.497*** (0.215)
  In dependent employment             1.316*** (0.113)       1.320*** (0.114)
  Civil servant                       1.482*** (0.196)       1.493*** (0.199)
  In formal education                 1.474*** (0.163)       1.485*** (0.165)
  Other activity                      1.327*** (0.141)       1.336*** (0.142)
Partner in household                  1.084    (0.054)       1.087*   (0.053)
Children in household (dummy)         0.991    (0.055)       0.987    (0.055)
Personal net income (dummies, ref.: 1500-1999 EUR)
  Personal net income below 500EUR    1.010    (0.078)       1.009    (0.079)
  500-999EUR                          0.936    (0.070)       0.935    (0.069)
  1000-1499EUR                        0.961    (0.064)       0.963    (0.064)
  2000-2999EUR                        1.002    (0.075)       1.003    (0.075)
  more than 3000EUR                   1.064    (0.085)       1.065    (0.085)
  Income refused                      0.524*** (0.053)       0.526*** (0.053)


Determinants of consent, separate probit regressions for female and male respondents

                                            female respondents   male respondents

Int: male                                   0.895*   (0.059)     0.875*   (0.065)
Int: aged 25-34                             1.164    (0.124)     1.070    (0.148)
Int: aged 35-44                             1.313*   (0.192)     1.106    (0.200)
Int: aged 45-54                             1.447**  (0.222)     1.213    (0.192)
Int: aged 55 and more                       1.862*** (0.331)     1.102    (0.222)
Int. at least 10 years younger              1.029    (0.125)     1.053    (0.110)
Int. at least 10 years older                0.754*** (0.072)     0.971    (0.096)
Int: training, below upp. secondary         1.065    (0.153)     1.093    (0.178)
Int: training, upper secondary              1.080    (0.110)     1.040    (0.159)
Int: higher education                       0.907    (0.085)     0.888    (0.106)
Int: education unknown                      0.964    (0.103)     0.941    (0.163)
Same schooling level                        1.155    (0.212)     1.027    (0.116)
Higher schooling than respondent            1.209    (0.233)     1.060    (0.160)
Unknown relation of schooling levels        0.981    (0.216)     1.214    (0.282)
Experience as interviewer (years)           1.089*** (0.035)     1.068*   (0.041)
No. of previous interviews 26-50            0.893    (0.083)     0.812*** (0.064)
No. of previous interviews 51-100           0.821*** (0.061)     0.905    (0.079)
No. of previous interviews above 100        0.876*   (0.063)     0.840*   (0.078)
Int: consent rate in previous interviews    1.139    (0.184)     1.192    (0.222)
Constant                                    2.057**  (0.724)     1.645    (0.642)

pseudo R2                                   0.150                0.104
Observations                                4920                 4870


Loss of observations due to missing values

Total                                       10,177
Male                                             0
Age class                                       59
Foreign nationality                              0
Native language German                           0
Born in East Germany                             0
Educational level                                0
Prose literacy score                           267
Document literacy score                        267
Numeracy score                                 267
High-cultural activity                          23
Labor market status at interview (alt.)          0
Personal net income in EUR                      29
Partner in household                             7
Children in household (dummy)                    0
Int: male                                        0
Age class of interviewer                         0
Int: age relative to respondent                  0
Education of interviewer                         0
Experience as interviewer (years)                0
Int: N of previous interviews                    0
Int: consent rate in previous interviews         0
Interview on weekend                             0
Share of refused answers as percentage           0
Share of 'don't know' as percentage              0
Duration before consent quest. (m)               0
Disturbance during int.                          0
Comprehension problems during int.               0
Other problems during int.                       0
Consent to follow-up survey                      8
Consent to cognitive tests                       0

Note: ALWA, own calculations.


Mean characteristics of interviewer by consent status, t-tests of difference

                                            No consent   Consent   Diff.       t
Int: male                                   0.624        0.556     −0.068***   (−3.664)
Int: aged up to 24                          0.183        0.119     −0.063***   (−5.125)
Int: aged 25-34                             0.180        0.172     −0.008      (−0.569)
Int: aged 35-44                             0.209        0.199     −0.010      (−0.634)
Int: aged 45-54                             0.299        0.364     0.065***    (3.624)
Int: aged 55 and more                       0.130        0.145     0.016       (1.195)
Int: no training                            0.150        0.171     0.021       (1.497)
Int: training, below upp. secondary         0.140        0.162     0.022       (1.585)
Int: training, upper secondary              0.149        0.165     0.016       (1.125)
Int: higher education                       0.354        0.333     −0.021      (−1.191)
Int: education unknown                      0.207        0.170     −0.037***   (−2.636)
Experience as interviewer (years)           1.684        1.828     0.144***    (3.898)
No. of previous interviews 0-25             0.325        0.353     0.028       (1.567)
No. of previous interviews 26-50            0.171        0.164     −0.007      (−0.470)
No. of previous interviews 51-100           0.234        0.219     −0.015      (−0.965)
No. of previous interviews >100             0.269        0.263     −0.007      (−0.394)
Int: consent rate in previous interviews    0.888        0.904     0.016***    (2.813)

Notes: ALWA, own unweighted calculations. 9790 observations. ***, **, * denote significant difference at 1%, 5%, 10%.


Characteristics of interview situation by consent status, t-tests of difference

                                          No consent   Consent   Diff.       t
Interview on weekend                      0.194        0.196     0.001       (0.095)
Share of refused answers as percentage    0.175        0.040     −0.135***   (−14.595)
Share of 'don't know' as percentage       0.937        0.598     −0.339***   (−7.377)
Duration before consent quest. (m)        26.026       25.461    −0.565      (−1.109)
Disturbance during int.                   0.071        0.058     −0.013      (−1.471)
Comprehension problems during int.        0.060        0.050     −0.010      (−1.178)
Other problems during int.                0.124        0.081     −0.043***   (−4.168)
Consent to follow-up survey               0.778        0.949     0.171***    (18.873)
Consent to cognitive tests                0.332        0.585     0.254***    (13.776)

Notes: ALWA, own unweighted calculations. 9790 observations. ***, **, * denote significant difference at 1%, 5%, 10%.


Data access: easy, quick and cheap

Easy:

Non-technical project proposal
Approval by RDC (off-site use) or Federal Ministry (on-site use)
Support during the application process and afterwards

Quick:

Time until user/institution receives contract: 2 to 4 weeks

Cheap:

Data access is free of charge
Financial support for researchers


Easy, Quick and Cheap

Ways of access: On-Site; Off-Site (Remote Execution, Scientific Use File)

SIAB: ✓ ✓ ✓ (planned)
Establishment History Panel: ✓ ✓
Establishment Panel: (✓) ✓ ✓ (only for 2007)
Linked-Employer-Employee-Data: ✓ ✓
PASS: ✓* ✓* ✓
ALWA: ✓* ✓* ✓

* combined data


[Diagram: degrees of anonymisation of microdata. Complete microdata become confidential microdata by deleting direct identifiers; an anonymisation method yields de-facto anonymised microdata; a stronger anonymisation method yields fully anonymised microdata. Axes: availability and degree of analysis potential. Labels: on-site, off-site, available at FDZ.]