Working with EU-SILC: data files, variables and data management
Practical computing session I – Part 1
Heike Wirth, GESIS – Leibniz Institut für Sozialwissenschaften
DwB Training Course on EU-SILC, February 13-15, 2013
Romanian Social Data Archive at the Department of Sociology, University of Bucharest, Romania


Overview

• EU-SILC datasets
• EU-SILC variables
• Differences between the data collected and the anonymised User Database (UDB)
• Hands on
  • Transform the CSV file into an SPSS/Stata system file
  • Number of households/persons in the file

EU-SILC Data

• Four separate files
  • Households (= 1 observation per household)
    • Register data (D)
    • Household data (H)
  • Individuals (= 1 observation per person)
    • Register data (R)
    • Personal data (P)
• Since cross-sectional and longitudinal data are provided separately => 8 files

EU-SILC Data

For example:

• UDB_c10D_ver 2010-1 from 01-03-12.csv
• UDB_c10H_ver 2010-1 from 01-03-12.csv
• UDB_c10R_ver 2010-1 from 01-03-12.csv
• UDB_c10P_ver 2010-1 from 01-03-12.csv

• _c = cross-sectional; _l = longitudinal
• 10 = year of the survey = 2010
• D = Household Register File
• H = Household Data File
• R = Personal Register File
• P = Personal Data File
• 2010-1 = version of the data (e.g. 1st version of the 2010 data)
• csv = type of data (= comma-separated values)
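The naming scheme above can be decoded mechanically. The following is a minimal Python sketch; the regular expression is inferred only from the four example names on this slide, so other releases may need a different pattern, and the function name `parse_udb_name` is hypothetical.

```python
import re

# Pattern inferred from the example file names on the slide (an assumption,
# not an official specification of the UDB naming scheme).
UDB_NAME = re.compile(
    r"^UDB_(?P<kind>[cl])(?P<year>\d{2})(?P<file>[DHRP])"
    r"_ver (?P<version>\d{4}-\d+) from (?P<date>\d{2}-\d{2}-\d{2})\.csv$"
)

KIND = {"c": "cross-sectional", "l": "longitudinal"}
FILE = {
    "D": "Household Register File",
    "H": "Household Data File",
    "R": "Personal Register File",
    "P": "Personal Data File",
}


def parse_udb_name(name):
    """Split a UDB file name into the parts described on the slide."""
    match = UDB_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised UDB file name: {name!r}")
    return {
        "kind": KIND[match["kind"]],
        "year": 2000 + int(match["year"]),  # two-digit survey year, e.g. 10 -> 2010
        "file": FILE[match["file"]],
        "version": match["version"],
    }


info = parse_udb_name("UDB_c10D_ver 2010-1 from 01-03-12.csv")
print(info)
```

A name that does not follow the pattern raises `ValueError`, which makes it easier to notice a release with a different naming convention.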

EU-SILC Data

• Household Register File (D)
  • one record for every household, including information regarding sample units, household weights, etc.
  • e.g. UDB_c10D_ver 2010-2: N = 225 972 households
• Household Data File (H)
  • one record for every household, including household data
  • e.g. UDB_c10H_ver 2010-2: N = 225 972 households
• Personal Register File (R)
  • one record for every person currently living in the household or temporarily absent
  • e.g. UDB_c10R_ver 2010-2: N = 576 531 persons
• Personal Data File (P)
  • reference population: members of the household aged 16 and over
  • e.g. UDB_c10P_ver 2010-2: N = 476 705 persons
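Because the D- and H-files both hold one record per household, their record counts should agree, in total and by country. The sketch below shows one way to check this with the Python standard library. The CSV text is invented toy data, and the column names `DB020`/`HB020` (country) and `DB030`/`HB030` (household ID) are assumptions that should be checked against the guidelines.

```python
import csv
import io
from collections import Counter

# Invented toy stand-ins for the D- and H-files. With real data, pass an
# opened file instead, e.g. open(path, newline="").
d_csv = "DB020,DB030\nAT,1\nAT,2\nBG,1\n"
h_csv = "HB020,HB030\nAT,1\nAT,2\nBG,1\n"


def count_by_country(handle, country_col):
    """Count records per country; country_col is an assumed column name."""
    return Counter(row[country_col] for row in csv.DictReader(handle))


d_counts = count_by_country(io.StringIO(d_csv), "DB020")
h_counts = count_by_country(io.StringIO(h_csv), "HB020")

print("households in D:", sum(d_counts.values()), dict(d_counts))
print("households in H:", sum(h_counts.values()), dict(h_counts))
print("same counts in both files:", d_counts == h_counts)
```

The same function applies to the R- and P-files, where the counts are expected to differ because the P-file covers only household members aged 16 and over.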


Domains & Areas - Households

Source: Guidelines_Doc65_2010.pdf, p.73


Domains & Areas - Persons

Source: Guidelines_Doc65_2010.pdf, p.73

EU-SILC Variables

• Variable names in EU-SILC are composed of 3 parts:
  • 1st character refers to the dataset (D; H; R; P)
  • 2nd character refers to the domain
  • 3 digits represent a sequential number
• e.g. PE040 = Highest ISCED level attained
• Most important piece of data documentation:
  • Guideline ‘Description of Target Variables’
  • refers to variables delivered by the NSIs to EUROSTAT
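The three-part naming rule can be expressed as a small pattern. This is a Python sketch under the assumption that the domain is always a single upper-case letter; the function name `split_variable_name` is hypothetical.

```python
import re

# Dataset letter, domain letter, three-digit sequential number.
# Assumption: the domain is always one upper-case letter.
VAR_NAME = re.compile(r"^(?P<dataset>[DHRP])(?P<domain>[A-Z])(?P<number>\d{3})$")


def split_variable_name(name):
    """Return (dataset, domain, sequential number) for a variable name."""
    match = VAR_NAME.match(name)
    if match is None:
        raise ValueError(f"not an EU-SILC style variable name: {name!r}")
    return match["dataset"], match["domain"], int(match["number"])


parts = split_variable_name("PE040")
print(parts)  # ('P', 'E', 40)
```

Here `PE040` splits into the Personal Data File (P), domain E and sequential number 40.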


Guidelines – Target Variables (collected)


Guidelines – Target Variables (collected)


Guidelines – Target Variables (derived)

(...)

Different variable names but same labels


Check HH020 & HH021 (using flag-variables)

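Cross-tabulation of HH020_F Flag (rows) by HH021_F Flag (columns):

HH020_F Flag \ HH021_F Flag                  -5     -1      1  Total
-5 m.v. of HH020 because HH021 is used        0      1  17245  17246
-1 missing                                    1      0      0      1
1 filled                                   1499      0   1500   2999
Total                                      1500      1  18745  20246

HH021_F Flag column labels: -5 = m.v. of HH020 because HH021 is still used; -1 = missing; 1 = filled.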
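The marginal totals of this cross-tabulation can be re-computed from the cell counts. The Python sketch below uses only the counts shown on the slide. With real data, the (HH020_F, HH021_F) pairs would come from the H-file records instead of a hand-typed dictionary.

```python
from collections import Counter

# Cell counts copied from the slide: (HH020_F, HH021_F) -> number of households.
# Cells with a count of 0 are left out.
cells = {
    (-5, -1): 1,
    (-5, 1): 17245,
    (-1, -5): 1,
    (1, -5): 1499,
    (1, 1): 1500,
}

row_totals = Counter()  # totals per HH020_F value
col_totals = Counter()  # totals per HH021_F value
for (hh020_f, hh021_f), n in cells.items():
    row_totals[hh020_f] += n
    col_totals[hh021_f] += n

total = sum(cells.values())
print("HH020_F totals:", dict(row_totals))
print("HH021_F totals:", dict(col_totals))
print("all households:", total)
```

The results match the "Total" row and column of the table, which is a quick check that the cells were read correctly.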

Additional important information

• Differences between the data collected (as described in the guidelines) and the anonymised User Database
  • All income variables are in € (EURO)
  • Variables removed
  • Top/bottom coding
  • Variables added
• In addition: country-specific rules

Anonymised User Database – Variables added

• Names of variables added
  • 1st character refers to the file (D; H; R; P)
  • 2nd character ‘X’
  • 3 digits represent a sequential number
• e.g.
  • HX040: Household size
  • HX060: Household type
  • HX080: Poverty indicator
  • (….)
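Since added variables always carry an ‘X’ as second character, they can be picked out of a file header automatically. The following Python sketch assumes the header is available as a list of column names. The example list uses only variable names that appear in these slides and is not a real file header.

```python
import re

# File letter, then 'X', then three digits, as described on the slide.
ADDED = re.compile(r"^[DHRP]X\d{3}$")


def added_variables(columns):
    """Return the column names that follow the 'variables added' rule."""
    return [name for name in columns if ADDED.match(name)]


# Illustrative header only; a real H-file contains many more columns.
header = ["HY010", "HY020", "HY023", "HX040", "HX060", "HX080", "HS120"]
added = added_variables(header)
print(added)  # ['HX040', 'HX060', 'HX080']
```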


Anonymised User Database – Variables added

Hands on – Exercise 1

• Step 1: Open the 4 SPSS and/or Stata system files
• Step 2: Check the data
  • How many households are included in the data (H- & D-File)?
    • total
    • by country
  • How many persons are included in the data (P- & R-File)?
    • total (any differences between the P- & R-File?)
    • by country
• There are 15 countries in the training files. Fill in the table (next slide)
  • What are the main differences across countries?
  • Are there differences in the % of unemployed depending on whether you use RB210 or PL031, and why?
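The exercise itself is meant for SPSS or Stata. For illustration, the comparison of RB210 and PL031 could be sketched in Python as follows. The value lists are invented toy data, the results are unweighted, and the codes used for "unemployed" (2 in RB210, 5 in PL031) are assumptions that should be checked against the ‘Description of Target Variables’.

```python
# Assumed codes for "unemployed"; verify them in the guidelines before use.
RB210_UNEMPLOYED = 2
PL031_UNEMPLOYED = 5


def percent_with_code(values, code):
    """Unweighted share (in %) of non-missing values equal to code."""
    valid = [v for v in values if v is not None]
    return 100.0 * sum(1 for v in valid if v == code) / len(valid)


# Invented toy values: RB210 as found in the R-file, PL031 as found in the P-file.
rb210 = [1, 1, 2, 3, 4, 4, 4, 1]
pl031 = [1, 2, 5, 7, 6]

rb210_share = percent_with_code(rb210, RB210_UNEMPLOYED)
pl031_share = percent_with_code(pl031, PL031_UNEMPLOYED)
print(f"RB210: {rb210_share:.1f}% unemployed")
print(f"PL031: {pl031_share:.1f}% unemployed")
```

The two shares are computed over different sets of persons, so it is worth comparing the denominators when interpreting any difference between them.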

Exercise 1.3: Fill in the table

Columns:
HY010 = Total hhld gross income
HY023 = Total disposable hhld income before social transfers
HY020 = Total disposable household income
HX040 = Average hhld size
HS120 = % of hhld with difficulties or great difficulties to make ends meet
DB100 = % of households living in densely populated area
RB210 = Basic activity status: % of unemployed
PL031 = Self-defined current economic status: % of unemployed

Country
AT Oesterreich
BG Bulgaria
CY Cyprus
CZ Czech Republic
DK Danmark
EE Estonia
ES Espana
FI Suomi
FR France
GR Ellada
HU Hungary
LT Lithuania
MT Malta
PL Poland

Mean
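Most cells of the table are per-country averages of a household variable. A minimal Python sketch of such a grouped mean is shown below. The household records are invented, the country column `HB020` is an assumed name, and the household weights are ignored, so the results are unweighted.

```python
from collections import defaultdict

# Invented toy household records; HB020 (country) is an assumed column name.
households = [
    {"HB020": "AT", "HY020": 30000.0, "HX040": 2},
    {"HB020": "AT", "HY020": 40000.0, "HX040": 3},
    {"HB020": "BG", "HY020": 6000.0, "HX040": 2},
]


def mean_by_country(rows, value_col, country_col="HB020"):
    """Unweighted mean of value_col for each country."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[country_col]] += row[value_col]
        counts[row[country_col]] += 1
    return {country: sums[country] / counts[country] for country in sums}


income_means = mean_by_country(households, "HY020")
size_means = mean_by_country(households, "HX040")
print(income_means)  # {'AT': 35000.0, 'BG': 6000.0}
print(size_means)    # {'AT': 2.5, 'BG': 2.0}
```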

Exercise 1.3: Fill in the table

Columns (numbers use "." as thousands separator and "," as decimal separator):
HY010 = Total hhld gross income
HY023 = Total disposable hhld income before social transfers
HY020 = Total disposable household income
HX040 = Average hhld size
HS120 = % of hhld with difficulties or great difficulties to make ends meet
DB100 = % of households living in densely populated area
RB210 = Basic activity status: % of unemployed
PL031 = Self-defined current economic status: % of unemployed

Country               HY010   HY023   HY020   HX040   HS120   DB100   RB210   PL031
AT Oesterreich       50.377  24.768  36.627     2,2   12,5%   36,2%    2,8%    3,7%
BG Bulgaria           6.466   4.074   5.869     2,4   68,3%   44,4%    9,2%    9,4%
CY Cyprus            39.736  25.781  35.040     2,8   47,5%   55,6%    3,3%    4,1%
CZ Czech Republic    14.252   8.216  12.271     2,3   26,4%   32,5%    3,5%    4,3%
DK Danmark           75.904  36.981  49.636     2,5    6,0%   32,7%    3,1%    3,9%
EE Estonia           12.466   7.530  10.735     2,5   22,9%   34,3%    8,0%    9,2%
ES Espana            29.468  17.985  25.251     2,6   29,9%   48,8%    9,2%   11,1%
FI Suomi             55.202  30.691  41.284     2,3    5,5%   23,5%    5,0%    6,0%
FR France            48.291  25.807  39.117     2,3   18,3%   46,6%    4,5%    5,5%
GR Ellada            28.565  14.162  21.930     2,5   53,7%   37,9%    5,1%    6,1%
HU Hungary            9.865   4.439   7.773     2,4   54,6%   31,5%    5,2%    6,0%
LT Lithuania          9.627   5.223   8.240     2,4   40,4%   43,5%    8,0%    8,9%
MT Malta             23.704  14.322  20.199     2,7   43,3%   90,0%    2,8%    3,3%
PL Poland            11.543   6.030   8.991     2,6   37,7%   40,9%    4,8%    5,8%
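The figures in this table use a decimal comma. The Python sketch below converts such cells to numbers, assuming that "." is a thousands separator and "," the decimal separator. This reading is inferred from the values themselves (e.g. a household size of 2,2), not stated in the source. The function name `parse_table_number` is hypothetical.

```python
def parse_table_number(text):
    """Convert cells such as '50.377', '2,2' or '12,5%' to a float.

    Assumes '.' is a thousands separator and ',' the decimal separator.
    """
    cleaned = text.strip().rstrip("%").replace(".", "").replace(",", ".")
    return float(cleaned)


# Row for Austria, copied from the table; the first two tokens are the label.
row = "AT Oesterreich 50.377 24.768 36.627 2,2 12,5% 36,2% 2,8% 3,7%"
values = [parse_table_number(cell) for cell in row.split()[2:]]
print(values)
```

Splitting on whitespace works for this row because the label has exactly two tokens; a label such as "CZ Czech Republic" would need three tokens skipped.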