European Commission (Aleksandra.Bujnowska@ec.europa.eu)

Commission européenne, 2920 Luxembourg, LUXEMBOURG - Tel. +352 43011 - Office: BECH A2/164 - Tel. direct line +352 4301-30037 - Fax +352 4301-34149 - http://ec.europa.eu/eurostat

MAIN POINTS OF FEEDBACK:

More links to FRIBS – to be included in the next version of the article

General update of the article (in particular the part on public use files, item 3.6)

Better description of the scope of the article (rules at the EU level vs. rules at the national level)

Corrections for text inconsistencies (e.g. disclosure risk in microdata files, transmission of scientific use files to researchers)

Reformulation of definition of confidential data

Improvement of the section on anonymisation

Addition of the sections with hyperlinks (Eurostat (internal) and external)

MICRODATA SERVICE FOR RESEARCHERS

Note to the reviewer(s): several hyperlinks included in this draft, especially those linking to other SE articles of the manual, are not yet activated; they merely display an as-if link to show how the article will be interconnected once it is integrated into Statistics Explained.

Foreword

This article describes the legal conditions at EU level for releasing and accessing record-level data, and the methods used to prepare microdata files for researchers. The article focuses on European Statistical System (ESS) microdata released by Eurostat.

As part of the European Business Statistics manual, the article refers to data on businesses, but the conditions of access to microdata at EU level are the same for social and business data.

Table of contents

1. European statistical system and European statistics

2. Confidential data versus microdata

3. Access to ESS microdata released by Eurostat

3.1. Data collections available as microdata files for scientific purposes

3.2. Criteria for eligible research entities and research proposals

3.3. Secure use files: access in the safe centre in Eurostat (Luxembourg)

3.3.1. ESS secure use files

3.3.2. Output checking


3.3.3. Remote access

3.3.4. Decentralised And Remote Access (DARA) to ESS secure use files

3.4. Scientific use files (SUFs)

3.4.1. Preparation of scientific use files

3.4.2. ESS scientific use files

3.5. ESS business microdata released by Eurostat

3.6. Public use files (PUFs)

3.7. Modes of access available in the ESS

4. How ESS microdata are prepared by Eurostat

5. Anonymisation

6. Organisation of access to microdata in the ESS

7. Conclusions and future perspectives

8. See also

9. Further Eurostat information

10. External links

11. Contact

1. European statistical system and European statistics

This chapter focuses on access to European Statistical System (ESS) microdata released by Eurostat. However, the main concepts and principles apply to any data set made available by statistical offices for research purposes.

According to the Regulation (EC) No 223/2009 on European statistics:

The European Statistical System is the partnership between Eurostat and the national statistical institutes (NSIs) and other national authorities (ONAs) responsible in each Member State for the development, production and dissemination of European statistics. NSIs and ONAs are often (as in this chapter) called national statistical authorities (NSAs). The list of NSAs is available on the Eurostat website [1].

European statistics are statistics necessary for the performance of the activities of the Community. They are determined in the European statistical programmes [2].

European statistics are transmitted to Eurostat in accordance with the subject-matter regulations. If NSAs transmit the data in the form of microdata, and if this is agreed, Eurostat can grant access to these data for scientific purposes.

1 Path: Eurostat website/About Eurostat/Our partners/European statistical system.

2 More information about European statistical programmes: http://ec.europa.eu/eurostat/web/european-statistical-system/overview.


2. Confidential data versus microdata

Confidential data means data which reveal the contribution of individual statistical units (individual persons, households or business entities). One of the fundamental principles of the European Statistical System (ESS) is the obligation of the national statistical authorities (NSAs) and Eurostat to protect confidential data. Confidential data cannot be published.

The methods used to identify and protect confidential data in the different forms of statistical output are called statistical disclosure control (SDC) methods. SDC methods establish the criteria needed to judge whether figures in an output (e.g. cells in a table, records in a microdata file, regression coefficients in a model) are confidential or not. Figures that are not confidential are safe to publish. Confidential figures need to be treated in an appropriate way, for instance suppressed, rounded or aggregated. SDC methods offer a wide choice of techniques that aim at optimal protection of confidential figures without too much loss of information. A separate chapter of the EBS manual provides an overview of the SDC methods applied to the different types of statistical output.
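To make the idea of an SDC criterion concrete, the following Python sketch applies a simple minimum-frequency (threshold) rule to a frequency table: a cell is treated as confidential when fewer than a given number of units contribute to it. The threshold of 3, the variable names (`nace`, `region`) and the function `flag_confidential_cells` are hypothetical choices for illustration; actual rules and thresholds are set per domain and described in the SDC chapter.

```python
from collections import Counter


def flag_confidential_cells(records, keys, min_count=3):
    """Apply a minimum-frequency (threshold) rule to a frequency table.

    records: one dict per statistical unit
    keys: the variables whose combination defines a table cell
    min_count: hypothetical threshold; real thresholds are set per domain
    """
    # Count how many statistical units contribute to each cell
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    # A cell is confidential when fewer than min_count units contribute to it
    return {cell: {"count": n, "confidential": n < min_count}
            for cell, n in counts.items()}


enterprises = [
    {"nace": "C10", "region": "LU00"},
    {"nace": "C10", "region": "LU00"},
    {"nace": "C10", "region": "LU00"},
    {"nace": "J62", "region": "LU00"},
]
table = flag_confidential_cells(enterprises, ("nace", "region"))
for cell, info in sorted(table.items()):
    print(cell, info)
```

Cells flagged in this way would then be suppressed, rounded or aggregated, as described above.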

Microdata consist of sets of records (lines in the file) containing information on individual persons, households or business entities, i.e. statistical units. Each record (line) represents information about respondents and/or statistical units.

Records are easily identifiable when they contain unique direct identifiers such as names, addresses, social security numbers or ID numbers. These confidential records with direct identifiers are only available to the statistical institutes under strict confidentiality protocols. Microdata with direct identifiers (especially with a unique ID number) are increasingly important for the production of official statistics, as they allow data collected from different sources to be linked, thus fostering the use of e.g. administrative sources and the derivation of further results from data already collected. Direct identifiers also allow the creation of longitudinal files that follow individuals over time.
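A minimal sketch of what a unique ID makes possible: linking a survey to an administrative register, and following the same unit across years. The field names (`unit_id`, `turnover`, `employees`) and both functions are hypothetical; real record linkage typically also has to deal with missing or inconsistent IDs.

```python
def link_by_id(survey, register, id_field="unit_id"):
    """Link survey records to register records sharing the same unique ID."""
    register_by_id = {rec[id_field]: rec for rec in register}
    linked = []
    for rec in survey:
        match = register_by_id.get(rec[id_field])
        if match is not None:
            # Combine both sources into one record; register values win on name clashes
            linked.append({**rec, **match})
    return linked


def build_panel(waves, variable, id_field="unit_id"):
    """Follow each unit over time: {unit_id: [(year, value), ...]}."""
    panel = {}
    for year in sorted(waves):
        for rec in waves[year]:
            panel.setdefault(rec[id_field], []).append((year, rec[variable]))
    return panel


survey = [{"unit_id": "A1", "turnover": 120}, {"unit_id": "B2", "turnover": 80}]
register = [{"unit_id": "A1", "employees": 45}]
linked = link_by_id(survey, register)

waves = {2016: [{"unit_id": "A1", "employees": 45}],
         2015: [{"unit_id": "A1", "employees": 40}]}
panel = build_panel(waves, "employees")
print(linked, panel)
```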

Microdata without direct identifiers are still confidential, because a combination of rare characteristics may lead to the identification of unique statistical units (see below). For the research community these microdata are invaluable, as only they allow in-depth analysis of relationships in the data, e.g. causalities, dependencies and convergences.
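The risk described above can be illustrated by counting combinations of indirect identifiers. The sketch below assumes, for illustration only, that NUTS 3 region, size class and NACE section are the indirect identifiers, and lists the records whose combination occurs only once in the file (sample uniques). Being unique in the file does not prove that the unit is unique in the population, but such records are natural first candidates for closer inspection. All names and values are hypothetical.

```python
from collections import Counter


def sample_uniques(records, quasi_identifiers):
    """Return records whose combination of indirect identifiers is unique in the file."""
    def key(rec):
        return tuple(rec[q] for q in quasi_identifiers)

    frequencies = Counter(key(rec) for rec in records)
    return [rec for rec in records if frequencies[key(rec)] == 1]


# No names or ID numbers: only indirect identifiers and one value per record
microdata = [
    {"row": 1, "nuts3": "BE100", "size": "10-49", "nace": "C", "turnover": 2.1},
    {"row": 2, "nuts3": "BE100", "size": "10-49", "nace": "C", "turnover": 1.8},
    {"row": 3, "nuts3": "BE100", "size": "250+", "nace": "C", "turnover": 950.0},
]
risky = sample_uniques(microdata, ("nuts3", "size", "nace"))
print([rec["row"] for rec in risky])
```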

Conditions of access to microdata are normally set out in legal acts. In the ESS, access to microdata is limited to statistical analysis for scientific purposes, and the precise conditions are laid down in a Commission Regulation. National access systems, governed by the national statistical institutes, co-exist in parallel. Chapter 3 describes the conditions of access to ESS microdata released by Eurostat.

3. Access to ESS microdata released by Eurostat

3.1 Data collections available as microdata files for scientific purposes

As stated above, access to confidential microdata for scientific purposes at European level may be considered for those data sets for which Eurostat receives data at the individual (micro) level. In 2016 Eurostat granted access to microdata for a wide range of data collections, four of which are business data collections (marked BS):

1. Structure of Earnings Survey (SES), BS

2. Community Innovation Survey (CIS), BS

3. Continuing Vocational Training Survey (CVTS), BS

4. Micro-Moments Dataset (MMD) - Linked micro-aggregated data on ICT usage, innovation and economic performance in enterprises, BS

5. European Community Household Panel (ECHP),

6. European Union Statistics on Income and Living Conditions (EU-SILC),

7. Labour Force Survey (LFS),

8. Adult Education Survey (AES),

9. European Road Freight Transport Survey (ERFT),

10. European Health Interview Survey (EHIS),

11. Community Statistics on Information Society (CSIS),

12. Household Budget Survey (HBS).

The updated list and descriptions of available microdata collections can be found on the Eurostat website.

3.2 Criteria for eligible research entities and research proposals

The legal basis for access to ESS microdata is Commission Regulation (EU) No 557/2013 on access to confidential data for scientific purposes [3]. The Regulation defines the criteria for eligible research entities and research proposals. It also describes how the microdata are to be made available to researchers (modes of access).

Researchers interested in access to microdata must follow a two-step procedure [4]:

Step 1 – recognition by Eurostat of the organisation where the researcher is affiliated as a research entity;

Step 2 – submission to Eurostat of the research proposal describing the scientific project and justifying the need for access to confidential data.

Step 1 Recognition as a research entity

The recognition of research entities (step 1) aims at identifying those organisations (or specific departments of the organisations) that are carrying out research and can be entrusted with confidential data. The following criteria must be fulfilled by the applying entities:

The purpose (mission, statute) of the entity shall refer to research;

The entity must demonstrate an established record of quality research, e.g. by presenting a list of scientific publications and research projects; the results of its research have to be made public;

The entity must be independent in formulating scientific conclusions;

3 http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2013:164:0016:0019:EN:PDF

4 See more information on microdata access procedures at the Eurostat dedicated website: http://ec.europa.eu/eurostat/web/microdata/overview


The entity must have adequate security safeguards.

Eurostat evaluates the application. Upon a positive assessment, the head of the recognised research entity signs a commitment that the researchers belonging to the entity will use the microdata according to the agreed terms and will protect them.

Eurostat publishes the list of recognised research entities on its website [5]. In 2016 the list comprised about 600 entities located inside and outside the EU.

Step 2 Submission of research proposal

To obtain access to microdata, researchers affiliated with a recognised research entity submit a research proposal to Eurostat.

In the research proposal researchers describe:

the research project for which the microdata are to be used;

the data and variables to be used;

statistical methods to be applied on the data;

why access to microdata is necessary for the project;

how the results of research will be published;

how the security of the data will be ensured.

To be considered eligible, the research proposal must specify in sufficient detail the scientific purpose of the research, justify the need to use microdata and present the expected outcomes of the research. The results of the research must be made public. Each researcher named in the research proposal as a potential user of the microdata signs an individual confidentiality declaration, committing to respect the specific terms of use of confidential data.

Eurostat consults on the research proposal internally with the managers in charge of the requested data and with the national statistical authorities that provided the data. If an NSA refuses access, that country's data are removed from the microdata file.

When the research proposal is accepted the data are made available to the researchers. Researchers may access the data for the period specified in the research proposal. If requested, new releases of the approved microdata sets are sent to researchers in the course of the project (maximum 5 years).

At the end of the access period, researchers send the resulting publications to Eurostat and destroy the confidential data they received [6].

ESS microdata are available to researchers in two formats:

o Secure use files

o Scientific use files

5 http://ec.europa.eu/eurostat/documents/203647/771732/Recognised-research-entities.pdf

6 Eurostat is currently developing a database with all publications issued using ESS microdata.

3.3 Secure use files: access in the safe centre in Eurostat (Luxembourg)

Secure use files contain data on individual statistical units. Normally only direct identifiers are removed and data are cleaned but not further anonymised.

The identification of statistical units (companies or persons) is still possible, so secure use files are considered confidential. Respondents can be identified through a combination of basic characteristics (variables); e.g. a company with more than 1 000 employees in a given NUTS 3 region can easily be recognised even if its name is not available.

3.3.1 ESS secure use files

Access to ESS secure use files is only possible in the safe centre at Eurostat. In the safe centre researchers may analyse the data, but nothing can be taken out of the room. Researchers are isolated from the "rest of the world": they cannot use the internet, download data, etc. They can only work in a dedicated room equipped with a standalone PC.

The following ESS data collections are available as secure use files in Eurostat safe centre:

Structure of Earnings Survey (SES),

Community Innovation Survey (CIS),

Micro-Moments Dataset (MMD) - Linked micro-aggregated data on ICT usage, innovation and economic performance in enterprises.

3.3.2. Output checking

At the end of their work in the safe centre, researchers place the results of their research in the output folder. The researcher has to make sure that the output is safe. In addition, the Eurostat data manager checks the results for confidentiality. This is called output checking: its aim is to verify that the results do not contain confidential data. Safe output is sent to the researchers by e-mail.

The general rules for output checking can be found in the Guidelines for Output Checking. The Guidelines differentiate between safe (e.g. regression coefficients) and unsafe (e.g. tables) output and propose relevant techniques to check if results are confidential or not.
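As an illustration of the kind of rule applied when checking tabular output, the sketch below combines a threshold rule with an (n, k) dominance rule for one cell of a magnitude table. It is not taken from the Guidelines for Output Checking; the function name and the parameters (at least 10 units, largest unit at most 50% of the cell total) are assumptions chosen for the example.

```python
def cell_is_safe(contributions, min_units=10, n=1, k=50.0):
    """Check one cell of a magnitude table (e.g. total turnover) before release.

    Threshold rule: at least min_units units must contribute to the cell.
    (n, k) dominance rule: the n largest contributors may not account for
    more than k per cent of the cell total.
    The default parameters are illustrative assumptions, not official values.
    """
    values = sorted((abs(v) for v in contributions), reverse=True)
    if len(values) < min_units:
        return False  # too few contributors
    total = sum(values)
    if total == 0:
        return True  # all contributions are zero, so no unit dominates
    top_share = 100.0 * sum(values[:n]) / total
    return top_share <= k


print(cell_is_safe([5] * 12))          # many similar units
print(cell_is_safe([100] + [1] * 11))  # one unit dominates the cell
print(cell_is_safe([5, 5, 5]))         # too few units
```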

Rules for output checking are also specific to the characteristics and sensitivity of the domain. For example, specific safe centre rules exist at ESS level for the Community Innovation Survey (CIS) [7]. These rules were established by the representatives of the NSAs and include the requirements and criteria for safe output produced from the CIS secure use files.

3.3.3. Remote access

Some statistical institutes in Europe offer remote access to national secure use files [8]. This allows researchers to work on the secure use files without travelling to a safe centre. The key principle of remote access is that the secure use files remain in a controlled environment in one place while the researcher connects from elsewhere. Specific tools (e.g. biometric ones) exist to check the researcher's identity remotely. The remote connection enables a researcher to run statistical packages or programs on a server in a distant location. There are two basic types of remote access: "real" remote access, where the researcher can see the microdata and work directly on the files, and remote execution, where the researcher cannot see the data but submits code and routines that the system runs on the data. The remote execution system checks both the submitted code (unauthorised tasks are blocked) and the output ("confidentiality on the fly" [9]).

7 http://ec.europa.eu/eurostat/documents/203647/203701/Note-CIS-researcher-Eurostat-SAFE-Centre.pdf

8 There are also various collaborative initiatives involving several countries. The Nordic countries have agreed that, in the case of a research project calling for joint Nordic microdata, they can pool their data into the remote access system of one of the Nordic countries (see: http://nordman.network/).

Eurostat intends to provide remote access to ESS microdata. The first step in this direction is the DARA project (see below).
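The two checks performed by a remote execution system can be sketched as follows. This is a hypothetical, highly simplified model, not the design of any actual system: a job is a small dictionary naming an operation, only operations on an allow-list are run, and any result based on fewer than `min_count` units is suppressed before it is returned.

```python
ALLOWED_OPERATIONS = {"count_by", "mean_by"}  # hypothetical allow-list


def run_remote_job(job, microdata, min_count=5):
    """Run a researcher's job on data the researcher never sees.

    job: {"operation": ..., "group_by": ..., "variable": ...}  (hypothetical format)
    """
    operation = job["operation"]
    # Input check: unauthorised tasks are blocked before the data are touched
    if operation not in ALLOWED_OPERATIONS:
        raise PermissionError(f"operation {operation!r} is not authorised")

    groups = {}
    for rec in microdata:
        groups.setdefault(rec[job["group_by"]], []).append(rec)

    output = {}
    for group, members in groups.items():
        # Output check: results based on too few units are suppressed on the fly
        if len(members) < min_count:
            output[group] = None
        elif operation == "count_by":
            output[group] = len(members)
        else:
            values = [m[job["variable"]] for m in members]
            output[group] = sum(values) / len(values)
    return output


data = [{"region": "A", "wage": 10}] * 6 + [{"region": "B", "wage": 99}] * 2
print(run_remote_job({"operation": "mean_by", "group_by": "region",
                      "variable": "wage"}, data))
```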

3.3.4. Decentralised And Remote Access (DARA) to ESS secure use files

Eurostat aims to establish a system allowing eligible researchers (selected according to the procedure described in section 3.2) to work on the secure use files via accredited safe centres in national statistical institutes. This system is currently under development.

3.4 Scientific use files (SUFs)

"Scientific use files means confidential data for scientific purposes to which methods of statistical disclosure control have been applied to reduce (to an appropriate level and in accordance with current best practice) the risk of identification of the statistical unit" (definition from Commission Regulation (EU) No 557/2013). Compared to secure use files, scientific use files (SUFs) are further anonymised: not only are direct identifiers removed, but certain categories of variables are also grouped together, rounded, swapped or suppressed. The identification of statistical units in SUFs is still possible but less probable. A statistical unit in an SUF may be identified when it has some "rare" characteristics (e.g. it is a very big company) [10].

Since statistical units remain identifiable, SUFs are considered confidential and can be accessed only by authorised researchers (procedure described in section 3.2). Unlike secure use files, SUFs can be used outside Eurostat's secure environment [11]. Eurostat currently sends them on CDs/DVDs. The data may be used on the premises of the research entity. The CDs/DVDs need to be stored in a locked compartment, and any intermediate results containing confidential data have to be accessed on a password-protected computer.

Table 1 Identification risk levels in the different types of microdata

Data | Risk level | How the respondents can be identified | With what level of precision can respondents be identified?
Microdata for statistical purposes | Extremely high | By direct identifiers or by a combination of indirect identifiers (characteristics such as NUTS level, size class, NACE category) | "This is a record referring to company X."
Secure use files | High | By a combination of indirect identifiers (characteristics such as NUTS level, size class, NACE category) | Probability of identification is much smaller than with microdata for statistical purposes: "This is a record that probably refers to company X."
Scientific use files | Low (reduced) | By a combination of indirect identifiers (characteristics such as NUTS level, size class, NACE category), but only units with rare characteristics can be identified | Probability of identification is much smaller than with secure use files: "This is a record that may refer to company X."
Public use files | Eliminated | NA | NA

9 See more: Methodology for the Automatic Confidentialisation of Statistical Outputs from Remote Servers at the Australian Bureau of Statistics, Gwenda Thompson, Stephen Broadfoot and Daniel Elazar, October 2013.

10 For both scientific and secure use files, identification occurs when the user has some knowledge about the real statistical unit; for instance, the user knows where the unit is located, how big it is and what its main activities are.

11 Specific conditions of use apply.

3.4.1. Preparation of scientific use files

Scientific use files must be prepared in such a way that identification is more difficult for the user, while the data still have research value. The following basic SDC methods are applied to make the identification of respondents more difficult [12]:

Removal of direct identifiers

Recoding: providing information at a more general level, e.g. NUTS 2 instead of NUTS 3, size classes instead of precise employment figures, etc.

Micro-aggregation

Record swapping

Rounding

(Local) suppression
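Several of the methods listed above can be sketched in a few lines of Python. The helper names are hypothetical; the size classes follow the usual EU enterprise size bands, the rounding base of 5 and the group size k = 3 are illustrative assumptions, and the micro-aggregation shown is the simple univariate variant (sort, group consecutive values, replace each by its group mean). Record swapping and local suppression are not shown.

```python
import math

# Example enterprise size classes by number of persons employed (illustration only)
SIZE_CLASSES = [(0, 9, "0-9"), (10, 49, "10-49"), (50, 249, "50-249"), (250, None, "250+")]


def nuts3_to_nuts2(code):
    """Recoding: NUTS codes are hierarchical, so dropping the last character gives NUTS 2."""
    return code[:4]


def size_class(employees):
    """Recoding: replace the exact employment figure by a size class."""
    for low, high, label in SIZE_CLASSES:
        if employees >= low and (high is None or employees <= high):
            return label
    raise ValueError(f"invalid employment figure: {employees}")


def round_to_base(value, base=5):
    """Rounding: round a non-negative value to the nearest multiple of base (ties go up)."""
    return base * math.floor(value / base + 0.5)


def microaggregate(values, k=3):
    """Univariate micro-aggregation: sort the values, form groups of k consecutive
    values and replace each value by the mean of its group."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = max(len(values) // k, 1)
    result = [0.0] * len(values)
    for g in range(n_groups):
        start = g * k
        # The last group absorbs the remainder, so (given at least k values) no group is smaller than k
        end = len(values) if g == n_groups - 1 else start + k
        members = order[start:end]
        group_mean = sum(values[i] for i in members) / len(members)
        for i in members:
            result[i] = group_mean
    return result


print(nuts3_to_nuts2("BE100"), size_class(1200), round_to_base(23))
print(microaggregate([10, 12, 11, 50, 52, 51], k=3))
```

Each method is applied gradually and its effect on risk and quality checked, as described in the next paragraph.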

Scientific use files are produced by applying SDC methods gradually, checking the actual disclosure risk and the quality of the data at each step. The more anonymised the files are, the less detailed and the less interesting for researchers they become.

The application of SDC methods to the microdata should continue until the right balance is found between disclosure risk (the probability of identifying a respondent) and the quality of the data. The "rightness" of the balance depends on many conditions and involves expert judgement, so it helps to establish framework criteria at the beginning of the process.

12 These SDC techniques are described in detail in the chapter on SDC.

Table 2 Examples of framework criteria for scientific use files

Criteria for | Example
Disclosure risk in the scientific use file | There shall be at least X records with the same characteristics (defined by a combination of variables) in the SUF
Quality of the scientific use file | A given indicator produced from the SUF shall be close to the same indicator produced from the original data
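The two framework criteria in Table 2 can be written as simple checks. In this sketch the number X becomes the hypothetical parameter `min_records`, the indicator is the mean of one variable, and "close" is read as a relative difference of at most 1%; these are assumptions for illustration, since the appropriate criteria depend on the data set and involve expert judgement.

```python
from collections import Counter
from statistics import mean


def meets_risk_criterion(records, keys, min_records):
    """Disclosure risk: every combination of the key variables occurs at least min_records times."""
    counts = Counter(tuple(rec[k] for k in keys) for rec in records)
    return min(counts.values()) >= min_records


def meets_quality_criterion(original, protected, indicator=mean, tolerance=0.01):
    """Quality: the indicator from the SUF is within a relative tolerance of the original one."""
    reference = indicator(original)
    return abs(indicator(protected) - reference) <= tolerance * abs(reference)


suf = [
    {"nuts2": "BE10", "size": "10-49", "wage": 11.0},
    {"nuts2": "BE10", "size": "10-49", "wage": 11.0},
    {"nuts2": "BE10", "size": "10-49", "wage": 11.0},
    {"nuts2": "BE21", "size": "50-249", "wage": 51.0},
    {"nuts2": "BE21", "size": "50-249", "wage": 51.0},
    {"nuts2": "BE21", "size": "50-249", "wage": 51.0},
]
original_wages = [10, 12, 11, 50, 52, 51]

risk_ok = meets_risk_criterion(suf, ("nuts2", "size"), min_records=3)
quality_ok = meets_quality_criterion(original_wages, [rec["wage"] for rec in suf])
print(risk_ok, quality_ok)
```

If either check fails, further SDC methods (or a different parameter choice) would be tried and the checks repeated.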

It is important to document the various steps of microdata protection well and to describe the reasoning behind each decision. This not only makes the process transparent, but also allows it to be reproduced on other releases of the data or on other countries' data.

3.4.2. ESS scientific use files

Most of the ESS microdata sets are available as scientific use files. All social surveys are available in this format. In addition, the following ESS business data collections are available as scientific use files released by Eurostat:

Structure of Earnings Survey (SES),

Community Innovation Survey (CIS),

Continuing Vocational Training Survey (CVTS).

The preparation of scientific use files is much more difficult for business data than for social data, because enterprises are easier to identify, even when their direct identifiers (name, address, business register number) are not provided. This holds in particular for large enterprises.

3.5 ESS business microdata released by Eurostat

As mentioned above, ESS business microdata are available on site in the safe centre at Eurostat (secure use files) and as scientific use files, depending on the data collection. Secure use files have fewer users because of the cost of travelling to the safe centre in Luxembourg. All microdata files for researchers are provided free of charge [13].

Table 3 ESS business microdata available for scientific purposes and number of research proposals submitted, July 2013 – July 2016

Data collection | Secure use files | Scientific use files
Structure of Earnings Survey (SES) | 2 | 35
Community Innovation Survey (CIS) | 16 | 37
Continuing Vocational Training Survey (CVTS) | - | 8
Micro-Moments Dataset (MMD, available since 2015) | 3 | -

13 Access to microdata released by Eurostat used to be subject to fees. Since 2011 access has been free of charge, following the decision of the Dissemination Working Group. This decision was motivated by the fact that the complex cost-recovery procedures applied by Eurostat to charge for access were inefficient. In addition, these charging procedures were slowing down the application process.

Not all EU countries participate in the release of ESS business microdata [14]. In some countries the law forbids the release of data on individual businesses.

In case of scientific use files, some countries opt out for technical reasons related to the applied SDC method or because they consider the research value of the scientific use file insufficient.

3.6 Public use files (PUFs)

In 2014 Eurostat launched a project to develop a methodology for ESS public use files for EU-SILC and EU-LFS. Seven NSIs worked on the methodology and developed the actual public use files, which are now available on the CROS portal (Platform for Collaboration in Research and Methodology for Official Statistics). In the course of the PUF project it became clear that producing PUFs that are both safe and rich in information would be very difficult. These first versions of the ESS PUFs are therefore mainly intended for educational and testing purposes. There are no plans to develop PUFs for European business data [15].

3.7 Modes of access available in the ESS

Various other modes of access exist in the ESS countries. The source data also vary, from traditional questionnaires to administrative registers and publicly available sources. The graph below shows the current situation regarding the different modes of access to the available data types.

Graph 1 Modes of access to the available microdata types (in bold: modes of access provided by Eurostat)

[Graph: secure use files: on-site access, remote access, remote execution, DARA; scientific use files: data transmitted to researchers on CDs or DVDs or over the internet; public use files: data available on the website with or without subscription.]

14 Details on countries participating in the different microdata releases can be found at http://ec.europa.eu/eurostat/documents/203647/771732/Datasets-availability-table.pdf.

15 However, some national statistical institutes (e.g. the Finnish one) offer access to public use files for business data.

4. How ESS microdata are prepared by Eurostat

To release ESS microdata, the NSAs need to agree on the mode of access (secure use files or scientific use files) and on the appropriate protection methods to be applied to the data. This process is outlined in the "Guidelines for the assessment of research entities, research proposals and access facilities". It covers both the SDC methods applied to produce scientific use files and the rules for output checking of secure use files. The process can be launched only for surveys for which the NSAs transmit microdata to Eurostat.

The process consists of the following stages:

(1) The domain specific ESS working group (e.g. CIS WG):

a. analyses the need for and context of the release of confidential data for scientific purposes;

b. identifies researchers’ needs regarding the level of detail of the datasets;

c. prioritises the importance of variables for researchers’ interest;

d. documents the most relevant types of analysis in the context of the survey;

e. proposes the mode of release (secure use files or scientific use files);

(2) The domain-specific working group formally notifies the ESS Working Group on Methodology (WGM) of its decision on the release of confidential data for scientific purposes.

(3) After an analysis of the disclosure risk, Eurostat, assisted by the Expert Group on SDC, proposes protection methods.

(4) The protection method is cross-validated by the domain specific Working Group against the initial context and objectives and by the WGM with regard to disclosure risks.

(5) The national statistical authorities providing the confidential data notify Eurostat of their approval of the protection method and of the inclusion of their data in the release.


(6) The list of the research datasets and possible modes of access is published on the Eurostat website [16].

5. Anonymisation

The term "anonymisation" is often used as a synonym for microdata protection in general. Anonymisation may refer to the overall "de-confidentialisation" process or to its specific stages. The different stages of microdata protection are:

1. De-identification or pseudonymisation: the process of removing direct identifiers (like name, ID and address) from the confidential data and replacing them with pseudo codes [17].

2. Partial anonymisation: application of the set of SDC methods on (de-identified) microdata in order to reduce the risk of identification of the statistical unit. Scientific use files are the result of partial anonymisation.

3. Complete anonymisation: application of SDC methods that completely eliminate the risk of identification of the statistical unit (directly or indirectly). Public use files contain completely anonymised records.
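Stage 1 can be illustrated with keyed hashing, one common way of producing pseudo codes; the text does not prescribe a particular technique. In this sketch (all field names hypothetical) the direct identifiers are dropped and replaced by an HMAC-SHA256 code derived from the register ID and a secret key kept by the statistical institute. Because the same ID always gives the same code, records can still be linked across files, but without the key nobody can recompute the code from a known ID. The result is only de-identified: it is still confidential, since the remaining variables may allow indirect identification.

```python
import hashlib
import hmac

# Hypothetical field names for the direct identifiers in a business record
DIRECT_IDENTIFIERS = ("name", "address", "register_id")


def pseudonymise(record, secret_key, id_field="register_id"):
    """De-identification: drop direct identifiers and add a keyed pseudo code.

    The same ID and key always give the same code, so records stay linkable,
    but the code cannot be recomputed from a known ID without the secret key.
    """
    digest = hmac.new(secret_key, record[id_field].encode("utf-8"), hashlib.sha256)
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudo_id"] = digest.hexdigest()[:16]
    return cleaned


key = b"kept-secret-inside-the-NSI"  # illustrative only; use a securely stored random key
survey_2015 = pseudonymise({"name": "X Ltd", "address": "Main St 1",
                            "register_id": "LU123", "employees": 40}, key)
survey_2016 = pseudonymise({"name": "X Ltd", "address": "Main St 1",
                            "register_id": "LU123", "employees": 45}, key)
print(survey_2015)
```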

The picture below presents the process of preparation of the different types of microdata files and other sorts of outputs.

16 The full list of available microdata collections, reference years and countries participating is available here: http://ec.europa.eu/eurostat/documents/203647/771732/Datasets-availability-table.pdf.

17 In some countries, anonymisation is equated with de-identification only.


Graph 2: Process of preparation of the different types of statistical outputs. [Stages shown: de-identification (removal of direct identifiers); partial anonymisation (SDC methods for microdata); development of a secure environment and of output-checking rules; complete anonymisation (SDC methods for microdata); SDC methods for tabular data; data linking via direct identifiers. Data and outputs shown: raw microdata (with direct identifiers); microdata for statistical purposes (without direct identifiers); microdata for internal NSI purposes; linked data; secure use files (used in an NSI-controlled environment, on-site or remote access); scientific use files (used outside the NSI-controlled environment, off-site access); public use files; official statistics (tables, aggregates); statistics produced by researchers (models, tables, aggregates), subject to output checking and SDC.]

6. Organisation of access to microdata in the ESS

In parallel with the access to ESS microdata granted by Eurostat, most NSAs in Europe offer access to their national microdata. NSAs hold microdata at country level and decide individually which datasets are available for scientific purposes and under which conditions. An overview of the microdata access systems in the EU, prepared in 2015 within the "Data without Boundaries" project¹⁸, is available at http://www.dwbproject.org/access/accreditation_db.html

Access to microdata is not always considered core business by NSAs, and in some countries researchers pay for the services related to the provision of access.

Many EU countries collaborate with data archives that provide additional services to researchers, such as metadata preparation, user support, training and information sessions. In some countries, data archives release microdata, usually scientific use files, on behalf of NSAs. In the ESS, data archives have become an important partner, adding value for the research community with regard to accessing European statistics.

18 Microdata access systems and conditions in the ESS countries may have changed since this overview was prepared. Eurostat intends to update it in 2017.


7. Conclusions and future perspectives

The microdata access systems of the European Statistical System are constantly evolving, offering access to a growing number of datasets through different access modes.

With regard to access to European business statistics microdata for research purposes, Eurostat offers an array of access modalities covering data collections for which microdata is available at European level.

At the same time, Eurostat is expanding its services by developing decentralised access to secure use files via safe centres in Member States. Another area of work in progress is a remote execution system, which allows researchers to submit code and routines remotely to be run on the microdata, so that researchers never access the microdata themselves.
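The principle behind remote execution can be illustrated with a minimal Python sketch. It is not a description of the system Eurostat is developing. It assumes a hypothetical `RemoteExecutionService` with three responsibilities:

- it keeps the confidential records internally;
- it runs a routine submitted by the researcher on those records;
- it releases only output that passes a simple minimum-frequency rule.

The threshold `MIN_CELL_COUNT = 10` and the `country` field are illustrative assumptions. Real output checking (see the general rules for output checking listed under Further Eurostat information) involves further rules, such as secondary suppression, and manual review. A real system would also restrict what a submitted routine is allowed to return; this toy does not.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Table = Dict[str, int]
CheckedTable = Dict[str, Optional[int]]

# Hypothetical minimum-frequency threshold; actual rules are set by the data provider.
MIN_CELL_COUNT = 10


class RemoteExecutionService:
    """Toy model of remote execution: the records never leave the service."""

    def __init__(self, confidential_records: List[Record]) -> None:
        self._records = confidential_records  # kept server-side, never returned

    def submit(self, job: Callable[[List[Record]], Table]) -> CheckedTable:
        """Run the researcher's routine on the microdata, then check its output."""
        raw_output = job(self._records)
        return self.check_output(raw_output)

    @staticmethod
    def check_output(table: Table) -> CheckedTable:
        """Primary suppression only: withhold any cell based on too few units."""
        return {cell: (count if count >= MIN_CELL_COUNT else None)
                for cell, count in table.items()}


# Researcher side: only the routine is sent, never the data.
def count_by_country(records: List[Record]) -> Table:
    return dict(Counter(r["country"] for r in records))


records = [{"country": "LU"}] * 3 + [{"country": "BE"}] * 12
service = RemoteExecutionService(records)
print(service.submit(count_by_country))  # {'LU': None, 'BE': 12}
```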

Other challenges ahead for Eurostat include processing, and providing access to, integrated data or big data. The legal issues related to such processing need to be resolved first.

8. See also

To be filled in at a later stage

9. Further Eurostat information

List of national statistical institutes (NSIs) and other national authorities (ONAs) responsible in each Member State for the development, production and dissemination of European statistics

http://ec.europa.eu/eurostat/documents/747709/753176/20170119_List_ONAs_MT/f8d89991-5f4f-4c38-9558-61bd594942e7

European statistical programmes (annual and multiannual) http://ec.europa.eu/eurostat/web/european-statistical-system/overview

Microdata access procedures, descriptions of available microdata collections

http://ec.europa.eu/eurostat/web/microdata/overview

Self-study material for the users of ESS microdata sets (including short SDC tutorial)

http://ec.europa.eu/eurostat/web/microdata/overview/self-study-material-for-microdata-users

General rules for output checking https://ec.europa.eu/eurostat/cros/system/files/dwb_standalone-document_output-checking-guidelines.pdf

Safe centre rules for Community Innovation Survey http://ec.europa.eu/eurostat/documents/203647/203701/Note-CIS-researcher-Eurostat-SAFE-Centre.pdf


Public use files for EU-SILC and EU-LFS – results of NSIs collaborative project

https://ec.europa.eu/eurostat/cros/content/puf-public-use-files_en

List of available microdata collections, reference years and countries participating

http://ec.europa.eu/eurostat/documents/203647/771732/Datasets-availability-table.pdf

10. External links

Information about remote access to pooled data of Nordic countries

http://nordman.network/

Overview of the microdata access systems in the EU countries prepared in 2015 by the "Data without Boundaries" project

http://www.dwbproject.org/access/accreditation_db.html

11. Contact

To be filled in at a later stage

