Subtitle “Promoting the usage of administrative data in Statistics Estonia by describing and harmonising metadata” Final grant report

  • Subtitle “Promoting the usage of administrative data in Statistics

    Estonia by describing and harmonising metadata”

    Final grant report

  • 2

    Table of Contents

    Executive Summary .......... 3

    List of acronyms .......... 5

    Introduction .......... 6

    1. Obtaining knowledge about best practices of administrative data and metadata management system from another Member State (study visit) .......... 7

    1.1. Summary and difficulties encountered .......... 11

    2. Analysing and compiling data about current agreements, data sources and data structure descriptions .......... 12

    2.1. Summary and difficulties encountered .......... 15

    3. Analysing the questionnaires and finding variables that could be replaced by administrative data .......... 15

    3.1. Summary and encountered difficulties .......... 26

    4. Mapping management processes of administrative data and metadata in Statistics Estonia .......... 28

    4.1. Summary and difficulties encountered .......... 36

    5. Creating vision document on how to give feedback to the data owners about data transmission deadlines and agreed data structures .......... 36

    5.1. Summary and encountered difficulties .......... 38

    6. Describing metadata for the data sources whose cooperation agreements are renewed in the metadata system .......... 39

    6.1. Summary and difficulties encountered .......... 41

    7. Renewing cooperation agreements made with data owners before the year 2010 .......... 42

    7.1. Summary and difficulties encountered .......... 45

    References .......... 45

  • 3

    Executive Summary

    According to Statistics Estonia’s strategy, our goal is to produce high quality statistics with as

    low administrative burden and as high efficiency as possible. In order to achieve this, we need

    to improve the use of administrative data and describe the related metadata in our metadata

    management system.

    At the moment, Statistics Estonia uses over 100 different administrative data sources (state

    registries) in the statistical production process. Managing, describing and improving the

    related information and metadata of those sources is a challenging and ongoing process.

    In this project we have described and standardised metadata for the data sources whose

    cooperation agreements needed updating. During the process we also had the chance to

    develop and strengthen the partnership with the data owners, which is the key element of

    using the data of administrative sources.

    Our project started with learning from Statistics Austria’s experience and we were able to

    analyse and work through all our administrative data management related information to start

    managing it more efficiently.

    Efficient data management is only possible if we have optimized management processes.

    During the project we were able to map the as-is and to-be processes of administrative data

    and metadata management.

    The volume of administrative data and metadata is growing fast, so it is now clear that we

    need to move towards more automated processes. For that reason we have created the vision

    document for developing the new information system Administrative Data Gate, which will

    allow us to send automated feedback and reminders to the data owners and also automate the

    data checking processes.

    The grant project enabled us to analyse our current questionnaires domain by domain and

    make suggestions to use additional administrative data sources to lower the response burden.

    This analysis was a new approach for us, because usually the statisticians are responsible for

    their own statistical activities. This time we analysed different questionnaires together centrally and

    had the opportunity to give the statisticians some new ideas about which sources to use and how to

    improve the usage of administrative data in our organisation.

  • 4

    Statistics Estonia is very grateful for being able to realise the activities in this grant project. It

    was really helpful that we could have our temporary employee who worked through a lot of

    information and we were able to start moving towards more automated processes of managing

    administrative data.

  • 5

    List of acronyms

    ATAO – Statistics Design Department

    BORA – The Beneficial Owners Register Act

    EAS – Enterprise Estonia

    ECAA – Estonian Civil Aviation Authority

    EMDE – Electronic Maritime Information System

    GSBPM – Generic Statistical Business Process Model

    MUIS – System of (Estonian) Museums

    RIHA – Administration system for the state information system

    SE – Statistics Estonia

    sDWH – Statistical Datawarehouse

    TÖR – Working register

  • 6

    Introduction

    The main objective of this grant project is to improve the use of administrative data sources in

    Statistics Estonia. According to Statistics Estonia’s strategy our goal is to produce high quality

    statistics with as low administrative burden and as high efficiency as possible.

    Producing high quality statistics is possible only when we have standardised metadata and

    efficient production processes. Harmonised metadata is usable across all statistical domains,

    which means if one data source is used in different statistical activities the metadata will be

    described only once. The described metadata will be available directly for the users and for the

    systems in the live production environment.

    Statistics Estonia has set the goal to reduce administrative and response burden for the

    respondents. This is possible only if we use more administrative sources and either stop using some

    of the questionnaires or prefill some values on the questionnaire to help the respondent to

    answer.

    Improving the use of administrative data for the statistical production has one key element –

    close cooperation and partnership with the data owners. At the moment Statistics Estonia uses

    over 100 different administrative sources and our goal is to build closer cooperation with the

    data owners in order to ensure efficient negotiations and high quality data delivery. One

    important part of the cooperation is having valid and up-to-date data delivery contracts – it is important

    that both sides of the contract know their responsibilities and that data owners know why the

    data is needed and that it is securely processed and stored in Statistics Estonia.

    We have planned several activities during the project that will help to re-use information,

    produce statistics more efficiently and reduce administrative and response burden.

    To perform the tasks of the grant project we have conducted weekly project team meetings,

    where we discuss and agree on the tasks of the upcoming week. We are using a web application

    (JIRA) for assigning and monitoring planned activities. All team members need to report

    weekly on the progress of their tasks and possible difficulties, to ensure the execution of the

    project on schedule.

  • 7

    An overview of the progress on the tasks of the grant project is given below. The overview is

    written task by task and the level of progress is also evaluated. For every task, the difficulties

    encountered and an overall summary are described briefly.

    1. Obtaining knowledge about best practices of administrative data

    and metadata management system from another Member State

    (study visit)

    In order to use best practices available in other European statistical offices, we planned to have

    a study visit at the beginning of the project. In order to choose our possible destinations, we

    gathered some information from our colleagues, who have attended different Eurostat working

    groups. We received information that Austria and Finland both have advanced systems for

    managing metadata and administrative data. Both countries also conduct a register-based

    population and household census.

    Statistics Austria was able to welcome us in October to share their knowledge and experience.

    As Austria is considered one of the leaders in the European Statistical System concerning the

    use of administrative data, we were happy to plan the 1.5-day agenda for the study visit.

    The study visit took place from 17 to 19 October and we had a very full agenda for 18

    and 19 October. Four people from Statistics Estonia attended the visit:

    • two Leading Methodologists from Statistics Design Department (responsible for

    negotiations with data owners, describing metadata of administrative data and preparing

    the contracts);

    • Developer from the Data Service Department (responsible for the data warehouse and

    developing the new IT system for administrative data management and automated

    controls);

    • Head of Data Description from the Statistics Design Department (responsible for the

    processes of managing administrative data and describing metadata in Statistics

    Estonia).

    The agenda of the study visit was full of very useful and interesting topics for us. The overview

    of the study visit by agenda topics is given below.

    • Coordination and guideline for administrative data

  • 8

    The first topic was an introduction of how Statistics Austria manages their administrative data

    and related information. Statistics Austria uses over 500 different data sources and over 50

    sources are used for the register based census. They do not have data delivery contracts with all

    the sources, because the management of the contracts would be too burdensome and their

    national statistical law says that Statistics Austria can have the data for free from the data

    owners.

    Statistics Austria currently has a separate metadata database for administrative data. It is an

    Access database that has been in use since 2008. The aim of this database is to give an overview of all

    administrative data available in Statistics Austria, to list the projects that use

    administrative data and to provide a search function for choosing from the available data. The database

    also stores information about external and internal contact persons and organisation details.

    Statistics Austria plans to integrate the current metadata database into their new centralised data

    and metadata management system, the Statistical Datawarehouse (sDWH). Then the metadata for

    the administrative data will be extended and also data structures, attributes, classification lists,

    quality indicators, data formats, statistical units, reference dates, key words and legal basis will

    be available.

    • Statistical Datawarehouse (sDWH) – Motivation

    Statistics Austria has developed the statistical datawarehouse to guarantee an internal, house-wide,

    easily accessible data/metadata platform. The sDWH project was established in 2014 and the technical

    solution was fixed in 2015. In April 2017 Statistics Austria started the implementation phase,

    proceeding role by role and department by department.

    Metadata is described in the sDWH and has to be defined before it is possible to incorporate a

    new dataset. This supports the house-wide harmonisation of concepts and other metadata.

    There are different roles for the sDWH users, which help to manage the workflow of data and

    metadata management. For example, the Administrator can define represented variables and data

    sets and load the data, but the Quality Manager has to approve or disapprove the represented

    variables and data sets.
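
    The role-based workflow described above can be sketched roughly as follows. This is only an illustration under our own assumptions: the class names, function names and status values are hypothetical and do not reflect the actual sDWH implementation, which was shown to us only as a demo.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    QUALITY_MANAGER = "quality_manager"


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class RepresentedVariable:
    """A represented variable waiting for review (hypothetical structure)."""
    name: str
    defined_by: str
    status: Status = Status.DRAFT


def define_variable(name: str, user: str, role: Role) -> RepresentedVariable:
    """Assumed rule: only the Administrator may define a represented variable."""
    if role is not Role.ADMINISTRATOR:
        raise PermissionError("only the Administrator can define variables")
    return RepresentedVariable(name=name, defined_by=user)


def review_variable(variable: RepresentedVariable, role: Role, approve: bool) -> RepresentedVariable:
    """Assumed rule: only the Quality Manager may approve or disapprove."""
    if role is not Role.QUALITY_MANAGER:
        raise PermissionError("only the Quality Manager can review variables")
    variable.status = Status.APPROVED if approve else Status.REJECTED
    return variable


def can_load_dataset(variables: list[RepresentedVariable]) -> bool:
    """A data set can be incorporated only if its metadata exists and is approved."""
    return bool(variables) and all(v.status is Status.APPROVED for v in variables)
```

    In this sketch a data set cannot be loaded until every one of its variables has been approved, which mirrors the principle that metadata has to be defined before a new data set is incorporated.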

    • Statistical Datawarehouse (sDWH) – Application (handling of metadata)

    Statistics Austria also gave a live demo of their sDWH. For us it was most impressive

    to see that data and metadata are stored in one application and that it is also possible to link

    different datasets in the system and visualise the results. In the sDWH all links are also

  • 9

    visualised, for example it can be seen which data set is used in which project. The system also

    shows possible joining options and the data descriptions at the variable level are only one click

    away.

    sDWH makes it possible to mark some variables and data sets as protected and then the in-house data

    owner has to provide the permission for using the data set. The process of asking and granting

    the permission is also part of the sDWH – all the permissions and explanations are stored in

    one system, the users do not have to send separate e-mails for that.

    • Register-based Census - an Overview

    The last traditional census in Austria was conducted in 2001 and cost 72 million euros. The

    register-based census in 2011 cost only 10 million euros.

    Statistics Austria has more than 50 data sources for the register-based census. In 2006 they also

    carried out a register-based census test, in which methods, data procedures and the use of registers were

    successfully tested.

    • Workflow of a Register-based Census

    As Statistics Estonia is also planning to conduct a register-based census in 2021, it was interesting

    to hear about the workflow of Statistics Austria’s register-based census team. They have 13 to 15

    persons permanently on the census team, and for the census in 2011 they also had about 20

    temporary team members responsible for different tasks.

    Every team member is responsible for capturing some of the data sources and they remind the

    data owner one month in advance about the need to deliver the data.
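
    The rule of reminding the data owner one month in advance can be sketched as a small date calculation. This is a minimal illustration under our own assumptions: the function names and the data set names in the usage example are hypothetical, and "one month" is interpreted here as one calendar month, clamped to the end of shorter months.

```python
import calendar
from datetime import date


def one_month_before(deadline: date) -> date:
    """Return the date one calendar month before the deadline.

    If the previous month is shorter, the day is clamped to its last day.
    """
    if deadline.month > 1:
        year, month = deadline.year, deadline.month - 1
    else:
        year, month = deadline.year - 1, 12
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(deadline.day, last_day))


def due_reminders(deadlines: dict[str, date], today: date) -> list[str]:
    """Names of data sets whose reminder date has been reached
    and whose delivery deadline has not yet passed."""
    return sorted(
        name
        for name, deadline in deadlines.items()
        if one_month_before(deadline) <= today <= deadline
    )
```

    For example, with hypothetical deadlines `{"dataset_a": date(2019, 3, 31), "dataset_b": date(2019, 6, 1)}` and `today = date(2019, 3, 1)`, only `dataset_a` would be due for a reminder.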

    Statistics Austria has a process documentation system for management reporting, timetables and

    production schedules.

    The process documentation is available in the ADAM/EVA database – for example, it holds the

    timetable/calendar for the execution of monthly, quarterly and yearly processes.

    • ADAM/EVA database and documentation (handling of metadata)

  • 10

    The data for the census is currently stored in the ADAM/EVA database, but the future plan is to

    incorporate the data into sDWH. The ADAM/EVA database is also used for metadata documentation.

    There is a search option for tables, variables, attributes and variable values.

    • Other projects based on ADAM/EVA database

    The ADAM/EVA database is also used for other projects in Statistics Austria, for example the labour

    force survey, national accounts, the rich frame for social statistics, monitoring of education-related

    employment, tracking of graduates and register-based labour market career.

    The rich frame is used for calibration/post-stratification, non-response analysis and substituting

    survey questions with administrative data.

    • Statistical Datawarehouse (sDWH) – Future plans

    Statistics Austria shared with us the future plans for the sDWH. They are planning to integrate

    all administrative data and metadata in the warehouse. Then they will have a fully integrated and

    harmonised metadata management system.

    Statistics Austria is also planning to create GeoWizard for the automatic creation of working maps

    for internal and external use, and all the necessary data, metadata and information will be stored

    in a standardised way in sDWH.

    For the visualisation of statistical information, Statistics Austria is currently

    evaluating the Tableau software. Visualisation is important internally for the heads of

    departments to create reports about data usage and availability. Externally it is planned to

    develop dashboards for disseminating statistics in a more user-friendly format.

    • Quality assessment for Register-based statistics / metadata of administrative data

    Statistics Austria has developed a three-stage quality evaluation system for the administrative

    data. First, the data quality is evaluated at the raw data phase, when the registers provide the data.

    The second evaluation takes place after the data have been combined and linked in the central

    database. After combinations and imputations the data is available in the

    final data pool and the quality of the data is evaluated a third time.
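
    The idea of evaluating the same data at three stages can be sketched as follows. This is only an illustration under our own assumptions: the stage labels, the function names and the choice of indicator are hypothetical. The share of non-missing values is used here as a simple stand-in, because the actual quality indicators of Statistics Austria are not described in this report.

```python
from typing import Optional

# One record maps variable names to values; None marks a missing value.
Record = dict[str, Optional[str]]

# Assumed labels for the three evaluation points described in the text.
STAGES = ("raw", "linked", "final")


def completeness(records: list[Record], variable: str) -> float:
    """Share of records in which the variable has a non-missing value."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(variable) is not None)
    return filled / len(records)


def evaluate_stages(data_by_stage: dict[str, list[Record]], variable: str) -> dict[str, float]:
    """Compute the same indicator at each stage so that the stages can be compared."""
    return {stage: completeness(data_by_stage[stage], variable) for stage in STAGES}
```

    Comparing the indicator across stages shows, for instance, how much of the missingness in the raw register data has been resolved by linking and imputation before the data reaches the final data pool.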

    • Census - Analysis of Residence

    Statistics Austria explained to us how they avoid overcoverage of residents. Under their system,

    a person who has only one record in the Central Persons Register has to confirm the

  • 11

    residence by answering the official letter. About 69 thousand letters were sent out last time to

    confirm the residence. If the residence is not confirmed by answering the official letter the

    person is a candidate for deletion. However, the local authorities have the opportunity to oppose

    the deletions by proving that the person is still a resident of their municipality.
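
    The decision rule described above can be sketched as a few simple conditions. This is a minimal illustration under our own assumptions: the field and function names are hypothetical, and the real procedure of Statistics Austria certainly involves more checks than shown here.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    register_records: int               # records found for the person (assumed field)
    letter_answered: bool = False       # the person confirmed residence by letter
    municipality_objects: bool = False  # the local authority proved residence


def needs_confirmation_letter(person: Person) -> bool:
    """A person with only one record is asked to confirm residence."""
    return person.register_records == 1


def is_deletion_candidate(person: Person) -> bool:
    """No answer to the official letter makes the person a candidate for deletion."""
    return needs_confirmation_letter(person) and not person.letter_answered


def is_removed(person: Person) -> bool:
    """The deletion stands only if the municipality does not oppose it."""
    return is_deletion_candidate(person) and not person.municipality_objects
```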

    To conduct the census successfully, Statistics Austria carries out an annual quality evaluation of the

    residence data, in which all sources and outputs are analysed and evaluated.

    • Business Register for Administrative Purposes and Beneficial Owner Register

    The last topic on the agenda was the introduction of two Austrian registers.

    Every entity taking part in the E-Government processes needs to be registered in one of the

    state registers. The business register combines different registers and is the basis for statistical

    registers.

    The automatic data transmission times differ: some registers transmit the data to the

    business register weekly, while other registers have an online connection and the data is always

    up to date.

    The Austrian Beneficial Owner Register Act (BORA) obliges legal entities to register their

    owners. This should equip financial supervisors with a tool to fight money laundering and

    terrorism financing.

    Owing to the business register (BR) for administrative purposes, Statistics Austria is an optimal partner to technically

    implement that register for the Austrian Ministry of Finance.

    The BO register is a great business case and the BORA explicitly allows Statistics Austria to

    use the data for statistical purposes.

    1.1. Summary and difficulties encountered

    In conclusion, the study visit was very successful for us: we had the opportunity to learn from

    Statistics Austria’s experiences and best practices. Although we are at a different stage of

    using and managing administrative data than Statistics Austria, we got new ideas about how to

    optimise the processes of documenting metadata of administrative data.

    Firstly, we were surprised to hear that Statistics Austria does not have formal written contracts

    with all the data owners. As our national statistical law also says that data from the

  • 12

    registries can be obtained for free for the purposes of official statistics, we are considering

    ways to make the data transmission agreements more flexible. Right now we

    mostly have written contracts with the data owners or, if the data transmission is done for

    piloting the data usage, we send a data request to get the data. We are currently developing

    a form of data transmission agreement that would describe the needed data structures and

    deadlines, but would be flexible and less burdensome to change and keep up to date.

    Secondly, we really appreciated the workflow management of sDWH. In our current metadata

    management system iMeta, the metadata can be described only by the metadata team members,

    and to get correct metadata we ask for input from the statistical departments via Excel forms.

    However, we are currently piloting a new metadata information system, Colectica, in which a

    workflow service is also integrated. In the future we want to implement a system similar to

    that of Statistics Austria, in which analysts can also enter the metadata, but an administrator

    from the metadata team needs to approve the metadata before it is published for use.

    Thirdly, after the visit we are convinced that in the future the metadata and administrative data

    should be integrated into one information system in order to use the data more efficiently in the

    statistical production process. The Data Service Department started piloting the data virtualisation

    tool Denodo, in which data catalogues can be created that integrate data and metadata into one

    system. Implementing this application would be most useful for the statistical departments,

    because then they would no longer have to link data and metadata themselves.

    The main difficulty in performing this task was finding a suitable time for the study visit for

    Statistics Estonia and Statistics Austria. It was in our interest to have the study visit at the

    beginning of the project to be able to use the gained knowledge in our further actions.

    When we planned and attended the study visit in October, we were not sure whether we would get

    approval for the grant project application from Eurostat. So there was a risk of not

    being reimbursed for the study visit.

    2. Analysing and compiling data about current agreements, data

    sources and data structure descriptions

    In order to be able to start with the tasks of renewing cooperation agreements made before 2010

    and analysing questionnaires to find variables that can be substituted with administrative data,

  • 13

    we started the process of analysing and compiling information about current agreements, data

    sources etc.

    As Statistics Estonia is currently struggling to manage the information related to administrative

    data, we started the process of systemising and visualising the information we needed to

    manage. It was the first task of our temporary staff.

    An introductory task for the new employee was to create an overview Excel table of the data

    that Statistics Estonia captures from administrative sources. The information in the table is

    presented by data sets of different data sources. Each data set contains information about the

    data structure, the transmission channel, the format, and the deadline for the data to be

    transmitted. In addition, a brief description of the contract or data request has been provided

    and also the purpose of using the data in Statistics Estonia. This task helped our new employee

    to understand and see what kind of data Statistics Estonia receives from different data sources.

    The basis for the overview table had already been created and consisted of a list of all the registries

    from which Statistics Estonia gets data. The first task was to add the information about

    the data structure, the transmission channel, the format, the deadlines for the data to be

    transmitted, a brief description of the contract or data request and the purpose of using the data.

    All the necessary information was collected by searching through different documents and

    information systems. The information was stored in the document management system, the metadata

    management system, shared computer folders, an Outlook mailbox and JIRA tasks. The

    information had not been systematically stored or managed, which made it difficult for the

    new employee to find and compile all the necessary information. The stored contracts and data

    requests had not always been marked correctly as valid or invalid, so the hardest part of the task

    was to establish which of the contracts and data requests were still valid. We have new annexes

    for every dataset we capture from the data owners, and a new annex very often, but not always,

    invalidates the former annex. So it was challenging to go through all the annexes and find the

    currently valid ones.
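
    If each annex carried an explicit reference to the annex it invalidates, finding the currently valid ones could be sketched as follows. This is only an illustration under our own assumptions: the `Annex` structure and its `supersedes` field are hypothetical, since in practice this information had to be read from the documents themselves.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Annex:
    annex_id: str
    dataset: str
    signed: date
    # Id of the annex that this one invalidates, if any (assumed field).
    supersedes: Optional[str] = None


def currently_valid(annexes: list[Annex]) -> list[Annex]:
    """Return the annexes that no other annex has invalidated, oldest first.

    A newer annex does not automatically invalidate an older one:
    only an explicit `supersedes` link does.
    """
    superseded = {a.supersedes for a in annexes if a.supersedes is not None}
    valid = [a for a in annexes if a.annex_id not in superseded]
    return sorted(valid, key=lambda a: a.signed)
```

    In this sketch an older annex stays valid next to a newer one unless the newer annex explicitly names it, which matches the "very often, but not always" situation described above.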

    In Statistics Estonia, the web platform Confluence is used to manage internal information

    and to make it accessible to other colleagues. Every team has its own space or page in

    Confluence, and different overviews and guidelines can be stored and shared that way. We

    decided that the overview table of different data sources also had to be visualised better, and that

    was the next task for the new employee.

    A summary table of contracts and requests for administrative metadata was compiled in

    Confluence. The overview is under the Metadata team page, where the sub-page for administrative

  • 14

    data was created. The table contains a list of institutions and their registries with which Statistics

    Estonia has a contract or from which data is obtained through data requests. In the case of a

    contract, the dates of signing and completion of the contract are given. In addition, each

    data source has information about the contact persons of the institution to whom it is possible

    to turn with data transfer issues. Compiling the summary table gave an overview of

    which existing contracts were signed before 2010 and should be updated.

    The previous task with the Excel table helped to get started with this task. The list of institutions

    and their registries was taken from the Excel table and added to the Confluence table. Our

    employee started collecting information about contracts and data requests from local disks and

    the document management system Livelink. As in the previous task, the most important

    part of this task was to establish which contracts and data requests were still valid, and also

    which ones were the latest. Statistics Estonia keeps all the documents about each data source,

    even the ones that are no longer valid. The fact that the contracts and data requests were

    stored in different places and were not in order made this task time-consuming. All the

    information about the contracts and data requests had to be taken from the documents themselves, so the

    employee had to read through every contract or data request file she found in order to find the

    right information for the table.

    After compiling the overview table in Confluence, we decided that we needed sub-pages for

    every data source. The main reason was that Statistics Estonia captures many different

    data sets with different deadlines from one data source or registry. There can also be different

    contact persons for different data sets, and there are also different users in Statistics Estonia.

    So our new employee linked new sub-pages to the Confluence overview table, and these

    sub-pages give the users more detailed information about the data source. Each sub-page has the

    description of the captured data set, deadlines, contact person information, user information and

    a link to the metadata management system, where the metadata of the data source is stored. In the

    future we also plan to link the information about the data warehouse tables where the

    administrative data is stored and accessible to the analysts of Statistics Estonia. This

    would give any colleague in Statistics Estonia full information about each dataset that is

    available for use.

  • 15

    2.1. Summary and difficulties encountered

    Performing this task was crucial for getting a better overview of the administrative data related

    information Statistics Estonia needs to manage. It also gave our temporary employee the

    knowledge about the data available for use that she needed to analyse the questionnaires.

    The main difficulties were already described above, but it is important to highlight the large amount

    of information that our temporary employee had to work through and systemise. It was quite

    time-consuming, because different documents have been stored in different places for historical

    reasons and all this information had to be compiled to visualise the existing situation.

    Completing this task is a big step forward for Statistics Estonia, because now we understand

    our needs for a management system for administrative data related information.

    3. Analysing the questionnaires and finding variables that could

    be replaced by administrative data

    Statistics Estonia has 127 different questionnaires that the respondents have to fill out in order

    for statistics to be produced. Our aim is to reduce the administrative and response burden by improving

    the use of administrative sources. Although Statistics Estonia already uses about a hundred

    different data sources, we were still convinced that there are variables on the questionnaires that

    can be replaced by administrative data. We already have about 35 questionnaires where we

    prefill some variables for the respondents in order to make answering more convenient and

    less time-consuming. When the grant application was written, we chose some of the domains to

    be analysed during the grant project, and our temporary employee started the process as soon as

    she had an overview of our questionnaires and the available administrative data.

    Between October 2018 and March 2019 we found new data sources for our

    agricultural statistics domain, which is a really important domain in Estonia, so we decided

    to include the domain in our grant project and analyse it more thoroughly.

    The two domains that we had finished analysing by the submission of the intermediate report

    are culture and agriculture.

    The first step of the analysis was to get the overview of the questionnaires and collected variables

    of the culture and agriculture domains of statistical activities, and also to get the overview of

    the administrative data in use.

  • 16

    The second step was to compare the questionnaire variables with the administrative data already
    in use and to look for possible new sources that could replace questionnaire variables. To store
    the new information and to give a better overview of which questionnaire variables can be
    replaced with administrative data, an Excel table was created. The table contains the
    questionnaire code, the specific number of the statistical activity, the name of the statistical
    activity and the individual questions in the questionnaire, each with a suggestion for
    replacing it with administrative data.
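    The structure of such an overview table can be sketched in a few lines of Python. This is only
    an illustration of the columns described above, not the actual Excel file: the class name, the
    field names, the questionnaire codes ("Q-MUSEUM", "Q-MOVIE") and the activity numbers are
    hypothetical, while the example sources are taken from the culture domain discussed in this
    report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementSuggestion:
    """One row of the overview table (all field names are illustrative)."""
    questionnaire_code: str   # hypothetical code, e.g. "Q-MUSEUM"
    activity_number: str      # hypothetical number of the statistical activity
    activity_name: str        # name of the statistical activity
    question: str             # questionnaire variable under review
    suggested_source: str     # administrative source proposed as a replacement


def group_by_questionnaire(rows):
    """Return {questionnaire_code: [rows...]}, keeping the original row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row.questionnaire_code].append(row)
    return dict(grouped)


rows = [
    ReplacementSuggestion("Q-MUSEUM", "A-001", "Museum", "Number of museals", "MUIS"),
    ReplacementSuggestion("Q-MUSEUM", "A-001", "Museum", "Number of employees", "TÖR"),
    ReplacementSuggestion("Q-MOVIE", "A-002", "Movie", "Movie type, name, duration",
                          "Estonian Film Database"),
]

overview = group_by_questionnaire(rows)
for code, suggestions in overview.items():
    sources = ", ".join(s.suggested_source for s in suggestions)
    print(f"{code}: {len(suggestions)} variable(s) -> {sources}")
```

    Grouping by questionnaire code shows at a glance how many variables of each questionnaire have
    a candidate administrative source.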

    In Estonia, the state-level administration system for state information systems is called
    RIHA. Every state information system must be registered in RIHA. RIHA is thus the catalogue of
    the state’s information systems: it records which data are collected and processed and in which
    information systems, which services (including X-Road services) are provided, and who is using
    them.

    X-Road is the backbone of e-Estonia: it is the data exchange layer that allows various public

    and private sector e-service information systems to link up and function in harmony. X-Road

    has developed into a tool that can write to multiple information systems, transmit large datasets

    and perform searches across several information systems simultaneously. Today, X-Road is

    implemented in Finland, Kyrgyzstan, Namibia, Faroe Islands, Iceland, Ukraine and other

    countries. (e-Estonia, 2019)

    The next logical step in finding new data sources was to search RIHA. If the owner of an
    information system has registered it and entered all the necessary information in RIHA, it is a
    very good source of information for Statistics Estonia. Unfortunately, a fairly large part of
    the information in RIHA is currently outdated, because it has to be updated manually by the
    data owners. There are development plans that will hopefully resolve this problem, so that
    keeping the information in RIHA up to date can be automated in the future.

    Additionally, we searched the Internet for data that are already public and could be collected
    by web-combing (web scraping) or other methods.

    The third step was to propose replacing variables collected by questionnaires with
    administrative data. This step included face-to-face meetings with colleagues who work in the
    fields of culture and agriculture in Statistics Estonia.

    The last step was to plan future activities based on the meetings held with the analysts of
    cultural and agricultural statistics. In some cases we also managed to hold negotiations and
    meetings with the data owners to agree on new data deliveries.

    In the domain of culture we have 6 different questionnaires, which are divided into the
    following statistical activities: Movie, Museum, Music, Radio and Television. We made proposals
    to substitute some variables with new data sources for five different questionnaires. Our
    proposals and the results are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: Data about all Estonian movies (movie type, name, duration) from the Estonian
       Film Database.
       Outcome: This suggestion was accepted and the next step is to negotiate with the manager of
       the Estonian Film Database.

    2. Suggestion: The number of museals (museum objects) in each Estonian museum from the
       Information System of (Estonian) Museums (MUIS).
    3. Suggestion: The number of employees in Estonian museums from the Working register (TÖR).
       Outcome (suggestions 2 and 3): These suggestions were accepted, but there is a plan to
       rearrange some parts of the museums’ questionnaires, so there is as yet no full overview of
       what kind of data will be needed after the questionnaires are redesigned.

    4. Suggestion: Music event names, types, number of concerts, number of tickets sold, ticket
       sales revenue and number of visitors from sites that officially sell tickets online in
       Estonia (for example Piletimaailm and Piletilevi).
       Outcome: This suggestion was accepted partly, because multiple sites sell tickets online.
       In addition to these online sellers, there are non-official sellers, and concert tickets can
       also be bought on site. There is therefore no accurate overview of how many people visited a
       concert or how large the ticket sales revenue was. However, we have signed a contract with
       one of the sellers, Piletimaailm, and will receive the first dataset soon. Our analysts can
       then pilot the usability of the data. Information about music event names, types and the
       number of concerts can be found on the site http://kultuur.info.

    5. Suggestion: The number of employees with their job titles in radio broadcasting stations
       from the Working register (TÖR).
       Outcome: This suggestion was accepted, and as we are already capturing data from the Working
       register, the analysts only have to take the data into use.

    6. Suggestion: The number of employees with their job titles in television broadcasting
       stations from the Working register (TÖR).
       Outcome: This suggestion was accepted, and as we are already capturing data from the Working
       register, the analysts only have to take the data into use.

    In the domain of agricultural statistics we have 14 different questionnaires, which are divided
    into the following statistical activities: Sown area of field crops, Purchase of livestock and
    poultry, Livestock farming and meat production, Quarterly statistics of livestock farming,
    Purchase and use of milk, Economic accounts for agriculture, Farm Structure Survey,
    Agricultural products, Yields, Crop farming, Cereals, Dairy products, Organic farming, Supply
    balance sheets of agricultural products and Agricultural products. Agricultural statistics is
    one of the most important statistical domains in Estonia and in Europe, but collecting data by
    questionnaires has always been burdensome for respondents in this field. That is why we decided
    to include agriculture as one of the domains in our grant project. We started analysing the
    domain in the fall, and our initial analysis showed that there are still some data sources that
    Statistics Estonia is not capturing and using for agricultural statistics.

    The Veterinary and Food Board was the first data owner we approached, and as a first step we
    asked them to send us some data sets for piloting the data usage. The data sets covered
    slaughtered animals, the production of honey and the number of pigs slaughtered at home. Our
    analysts piloted the usability of the data, and we compiled our data needs in order to start
    negotiations with the Veterinary and Food Board.

    Statistics Estonia’s data need was broad: we wanted to capture several data sets with
    different delivery deadlines, and the work involved different analysts on our side and
    different departments on the Veterinary and Food Board side. To keep the discussions effective,
    we held several meetings to agree on the composition of the different data sets and on the
    data delivery deadlines.

    We managed to agree on all the data sets, and we now receive monthly and yearly data sets about
    slaughtered animals. The monthly data set was immediately used for prefilling the
    questionnaires. We also now receive yearly data sets about the production of honey and the
    number of pigs slaughtered at home.

    We also had meetings with two other data owners, the Estonian Land Board and the Agricultural
    Board. Both sources are already in use in Statistics Estonia, but our data needs have widened
    and the composition of the data in those registries has changed, so we need to work on new
    agreements and on getting access to the available data.

    Our proposals for agricultural statistics and the outcomes are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: The number of slaughtered animals and the weight of edible/inedible meat from
       the Veterinary and Food Board.
       Outcome: This suggestion was accepted and the questionnaires are prefilled with the data
       from the monthly data set.

    2. Suggestion: Information about honey production in Estonia from the Veterinary and Food
       Board.
       Outcome: This suggestion was accepted and we have received the yearly data set for 2018,
       which was used for prefilling the questionnaire. The quality of the data is very good, and
       next year the data will no longer be asked for in the questionnaire; the statistics of honey
       production will be based on administrative data only.

    3. Suggestion: The number of pigs slaughtered at home from the Veterinary and Food Board.
       Outcome: This suggestion was accepted and we have already received the yearly data set for
       2018, which was used as an additional data source for validating questionnaire data. In the
       future the data will be used to substitute the collected variables.

    4. Suggestion: The number of people employed in agriculture, with their job titles, from the
       Working register (TÖR).
       Outcome: This suggestion was accepted, but needs further methodological analysis. The data
       from the Working register are captured monthly, so if the analysis shows that the data are
       compatible, they can be used for prefilling the questionnaires.

    5. Suggestion: The prices of land from the Estonian Land Board, according to a new methodology.
       Outcome: Negotiations with the Estonian Land Board to receive the land price data are still
       ongoing. They have promised to carry out a spatial analysis that takes into account the land
       use data from the Estonian Agricultural Registers and Information Board. We are now waiting
       for the new spatial analysis by the Estonian Land Board to see whether it is sufficient for
       our data needs.

    6. Suggestion: Organic farming data from the Agricultural Board.
       Outcome: The negotiations are still ongoing. The Agricultural Board is a very important data
       source for organic farming statistics. Its information system is in development and we have
       had several meetings to explain Statistics Estonia’s expanding data needs. We need more
       detailed data about organic farming and we are negotiating to have our data needs considered
       in the new information system.

    7. Suggestion: The number of fur animals, the number of animals slaughtered for fur, the number
       of skins sold, etc. from the Veterinary and Food Board.
       Outcome: We recently learned that the Estonian Veterinary and Food Board will start
       collecting information about fur animals. The negotiations are currently at the stage of
       getting to know the composition of the data and the possibilities for getting access to it.

    Obligations:

    REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE

    COUNCIL (number of bovine animals, pigs, sheep, goats and poultry slaughtered in

    slaughterhouses)

    REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE

    COUNCIL (carcass weight of bovine animals, pigs, sheep, goats and poultry slaughtered in

    slaughterhouses)

    REGULATION (EC) No 138/2004 OF THE EUROPEAN PARLIAMENT AND OF THE

    COUNCIL (Production account: Other animal products: others)

    REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE

    COUNCIL (slaughtering carried out other than in slaughterhouses: pigs)

    ESS Agreement on statistics of agricultural land prices and rents

    COUNCIL REGULATION (EC) No 834/2007 of 28 June 2007 on organic production and

    labelling of organic products and repealing Regulation (EEC) No 2092/91 and Commission

    Regulation (EC) No 889/2008 of 5 September 2008 laying down detailed rules for the

    implementation of Council Regulation (EC) No 834/2007 on organic production and

    labelling of organic products with regard to organic production, labelling and control

    REGULATION (EC) No 138/2004 OF THE EUROPEAN PARLIAMENT AND OF THE

    COUNCIL (Production account: Other animal products: others)

    In the domain of accommodation statistics we have 2 different questionnaires, which are
    divided into the following statistical activities: Tourism and Accommodation activities. We
    made proposals to substitute some variables with new data sources for only one questionnaire,
    because our Tourism questionnaire consists only of personal questions that cannot be replaced
    by administrative data. Our proposals and the results are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: The number of beds in accommodation facilities from Enterprise Estonia (EAS).
    2. Suggestion: Wheelchair access in accommodation facilities from Enterprise Estonia (EAS).
       Outcome (suggestions 1 and 2): The next step for us was to check the definition of “the
       number of beds” that is used in the EAS database: is it the total number of beds, or the
       number of beds that had been used? Another important step was to find out how EAS manages
       its database. The main question is whether enterprises themselves add information to the
       database voluntarily.

    Obligations:

    REGULATION (EU) No 692/2011 OF THE EUROPEAN PARLIAMENT AND OF

    THE COUNCIL of 6 July 2011 concerning European statistics on tourism and repealing

    Council Directive 95/57/EC

    Commission Implementing Regulation (EU) No 1051/2011 of 20 October 2011

    implementing Regulation (EU) No 692/2011 of the European Parliament and of the

    Council concerning European statistics on tourism, as regards the structure of the quality

    reports and the transmission of the data

    In the domain of energy statistics we have 4 different questionnaires, which are divided into
    the following statistical activities: Electric power stations; Energy; Energy production, sales
    and fuel consumption; Consumption of fuel and energy. We made proposals to substitute some
    variables with new data sources for only one questionnaire, “Energy”. Our proposals and the
    results are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: Data on electricity produced, purchased and sold in Estonia from Elering.
       Outcome: Statistics Estonia is already receiving some data from Elering. Our next step is to
       check whether Elering can give us the necessary data monthly.

    2. Suggestion: Data on the fuel used for freight transport from the Estonian Road
       Administration.
       Outcome: SE is already using some of the data from the Estonian Road Administration. The
       next step is to check whether we could also use the data from the yearly vehicle inspections
       (“car reviews”). That would enable us to find out the fuel usage of freight transport.

    Obligation:

    Regulation (EC) No 1099/2008 of the European Parliament and of the Council of 22

    October 2008 on energy statistics

    In the domain of transportation statistics we have 23 different questionnaires, which are
    divided into the following statistical activities: Gas pipelines, Freight transport through
    ports, Freight transport on the road, Ships in the harbor, Ship traffic, Ship-based economic
    and social indicators, Ship registers, Marine accidents, Shipping-unloading, Air traffic,
    Flight accidents, Traffic Register, Road transport, Sea transportation, International travel
    through ports, Railway and rolling stock, Rail transport, Inland waterway transport, Vehicle
    registration, Tram-troll, Tram and trolley transport, Aircraft Register, Air transport. We made
    proposals to substitute some variables with new data sources for 3 questionnaires. Our
    proposals and the results are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: The number of air passengers and the goods and mail transported by air from the
       Tallinn Airport website "Air Traffic Review".
       Outcome: Our next step is to check whether Tallinn Airport is willing to give us microdata
       about the passengers, goods and mail.

    2. Suggestion: The number of civil aircraft from the Estonian Civil Aviation Authority’s (ECAA)
       website.
       Outcome: Our next step is to find out how the website is updated and by whom, and how to
       ensure that the website has relevant data.

    3. Suggestion: Data about trucks (total weight, number of axles, type of bodywork, type of
       engine) from the Estonian Road Administration.
       Outcome: SE is already using some of the data from the Estonian Road Administration. The
       next step is to check whether we could also use the data from the yearly vehicle inspections
       (“car reviews”).

    Obligations:

    Regulation (EU) No 70/2012 of the European Parliament and of the Council of 18

    January 2012 on statistical returns in respect of the carriage of goods by road

    Commission Regulation (EU) No 202/2010 of 10 March 2010 amending Regulation

    (EC) No 6/2003 concerning the dissemination of statistics on the carriage of goods by

    road

    Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council

    Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No

    91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with

    respect to the establishment of NST 2007 as the unique classification for transported

    goods in certain transport modes

    Commission Regulation (EC) No 833/2007 of 16 July 2007 ending the transitional

    period provided for in Council Regulation (EC) No 1172/98 on statistical returns in

    respect of the carriage of goods by road

    Commission Regulation (EC) No 642/2004 of 6 April 2004 on precision requirements

    for data collected in accordance with Council Regulation (EC) No 1172/98 on statistical

    returns in respect of the carriage of goods by road

    Commission Regulation (EC) No 6/2003 of 30 December 2002 concerning the

    dissemination of statistics on the carriage of goods by road

    Commission Regulation (EC) No 2163/2001 of 7 November 2001 concerning the

    technical arrangements for data transmission for statistics on the carriage of goods by

    road

    Commission Regulation (EU) No 520/2010 of 16 June 2010 amending Regulation (EC)

    No 831/2002 concerning access to confidential data for scientific purposes as regards

    the available surveys and statistical data sources

    Directive 2009/42/EC of the European Parliament and of the Council of 6 May 2009 on

    statistical returns in respect of carriage of goods and passengers by sea (Recast)

    Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council

    Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No

    91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with

    respect to the establishment of NST 2007 as the unique classification for transported

    goods in certain transport modes

    2010/216/: Commission Decision of 14 April 2010 amending Directive 2009/42/EC of

    the European Parliament and of the Council on statistical returns in respect of carriage

    of goods and passengers by sea

    Commission delegated decision of 3 February 2012 amending Directive 2009/42/EC of

    the European Parliament and of the Council on statistical returns in respect of carriage

    of goods and passengers by sea

    Regulation (EC) No 437/2003 of the European Parliament and of the Council of 27

    February 2003 on statistical returns in respect of the carriage of passengers, freight and

    mail by air

    Commission Regulation (EC) No 158/2007 of 16 February 2007 amending Commission

    Regulation (EC) No 1358/2003 as regards the list of Community airports

    UNECE, ITF and Eurostat Common Questionnaire for Transport Statistics Gentlemen's

    Agreement

    Commission Regulation (EC) No 546/2005 of 8 April 2005 adapting Regulation (EC)

    No 437/2003 of the European Parliament and of the Council as regards the allocation of

    reporting-country codes and amending Commission Regulation (EC) No 1358/2003 as

    regards the updating of the list of Community airports

    Commission Regulation (EC) No 1358/2003 of 31 July 2003 implementing Regulation

    (EC) No 437/2003 of the European Parliament and of the Council on statistical returns

    in respect of the carriage of passengers, freight and mail by air and amending Annexes

    I and II thereto

    In the domain of IT, research and development statistics we have 5 different questionnaires,
    which are divided into the following statistical activities: IT in the company, IT in the
    household, Business Innovation Survey, Research and development, Research and Development (in
    the company). We made proposals to substitute some variables with new data sources for 2
    questionnaires. Our proposals and the results are compiled in the table below.

    Suggestions and outcomes:

    1. Suggestion: The number of employees in the research and development field with their
       scientific field, age and gender from the Working Register (TÖR).
       Outcome: This suggestion was accepted partly. The information that TÖR holds about employees
       in the research and development field does not match the definitions used in the specific
       questionnaires. However, TÖR can be used for checking the data collected by questionnaire.

    2. Suggestion: The number of Information and Communication Technology specialists in a company
       from the Working Register (TÖR).
       Outcome: This suggestion was accepted partly. Initially, TÖR could be used for checking the
       data collected by questionnaire, and if the quality of TÖR improves, we might be able to use
       it fully.

    Obligations:

    Regulation (EC) No 808/2004 of the European Parliament and of the Council of 21

    April 2004 concerning Community statistics on the information society

    Commission Regulation (EC) No 753/2004 of 22 April 2004 implementing Decision

    No 1608/2003/EC of the European Parliament and of the Council as regards statistics

    on science and technology

    Commission Implementing Regulation (EU) No 995/2012 of 26 October 2012 laying

    down detailed rules for the implementation of Decision No 1608/2003/EC of the

    European Parliament and of the Council concerning the production and development of

    Community statistics on science and technology

    3.1. Summary and difficulties encountered

    Completing this task was very challenging for us, because our temporary employee had to work
    through a large amount of information. However, we managed to analyse the questionnaires and
    available data sources of culture, agriculture, accommodation, transportation, and IT, research
    and development statistics, and we now have an overview of the step-by-step process that is
    needed to find new sources or new use cases for the administrative data already in use. Some of
    our proposals were easily applicable, but others need further analysis by the statistical
    domain experts.

    In the field of culture we had six proposals. Proposals 2 and 3 are waiting for the redesign
    of the Museum questionnaire, and the redesign process will not be finished before 2021.

    Regarding proposal 1, to use data from the Estonian Film Database, we have already started the
    negotiation process and drawn up a draft cooperation agreement. Hopefully it will be signed
    this year and we can start using the data next year.

    Proposal 4 is already partly in production. We are currently receiving data from Piletimaailm,
    but this company is not the only seller of tickets for cultural events in Estonia. To obtain
    more complete data, we have started negotiations with another company, Piletilevi. However,
    negotiations with private sector companies are time-consuming and we are not sure when we will
    be able to receive data from Piletilevi.

    Proposals 5 and 6 are about using data from the Working register (TÖR). As the Working register
    is quite a new register in Estonia, the data are still rather incomplete as regards job titles.
    However, we expect the completeness to improve by the end of this year, and it will then be
    possible to use the data across all statistical domains.
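    One simple way to follow how the completeness of job titles develops is to compute the share of
    register records in which the job title is filled in. The sketch below is only an illustration
    under assumptions: the record layout and the field name "job_title" are hypothetical and do not
    describe the actual structure of the Working register, and the sample records are invented.

```python
def job_title_completeness(records):
    """Share of records whose job title is filled in (0.0 for an empty input)."""
    if not records:
        return 0.0
    # A title counts as filled only if it is present and not blank.
    filled = sum(1 for r in records if (r.get("job_title") or "").strip())
    return filled / len(records)


# Invented sample records; "job_title" is an assumed field name.
sample = [
    {"employee_id": 1, "job_title": "Curator"},
    {"employee_id": 2, "job_title": ""},
    {"employee_id": 3, "job_title": None},
    {"employee_id": 4, "job_title": "Sound engineer"},
]

share = job_title_completeness(sample)
print(f"Job title completeness: {share:.0%}")
```

    Computing the same indicator for each monthly delivery would show whether the completeness is
    actually improving before the data are taken into use in a statistical domain.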

    In the field of agriculture we had seven proposals. Proposals 1, 2 and 3 are already in
    production. Proposal 4 was also about using the Working register, and it has to wait for better
    data completeness and for analysis by the statistical domain experts.

    We have already received the first dataset from the Estonian Land Board according to the new
    methodology, but its usability has to be analysed further, and we may still need to process the
    data more before they can be used directly in our statistical production process.

    Proposal 6, to receive further information on organic farming, is still at the draft agreement
    stage. We have compiled our data needs and explained them to the Agricultural Board, but as
    their information system is still in development, we have not yet been able to receive the data
    or sign the new agreement. Hopefully we will be able to sign the agreement and get the first
    datasets at the beginning of 2020.

    Proposal 7 is not in production yet, because we have not received confirmation from the
    Veterinary and Food Board that they have data about fur animals. Our next step is to arrange a
    meeting with the data owner and clarify our data needs.

    In the field of accommodation statistics we had two proposals to start using data from
    Enterprise Estonia. Our next step is to find out how reliable the information in this database
    is. According to our current information, the enterprises enter the information there
    themselves on a voluntary basis, which means that the data may not be very complete.

    In the field of energy statistics we also had two proposals. Proposal 1 is about using monthly
    data from Elering. Currently we receive data from Elering once a year, and since Elering is a
    private sector company, the negotiations for more frequent data capturing will take time. We
    have planned a meeting with them to discuss whether it would be possible to start capturing
    monthly data in an automated way, for example using X-Road.

    Proposal 2 is about using more data from the Estonian Road Administration. We are negotiating
    to renew our data delivery agreement and to automate data capturing from the Estonian Road
    Administration. However, the negotiations are taking some time, because the information systems
    of the Estonian Road Administration are being developed. We are finalising the draft agreement
    with our data needs and we hope to renew the agreement during next year.

    In the field of transport statistics we had three proposals. Proposal 1 involves getting
    microdata from Tallinn Airport. Unfortunately, their first answer was negative, because they
    consider giving microdata to third parties a security risk. At the moment it is still unclear
    whether we will be able to justify our data needs legally and show that our data protection
    rules ensure that it is safe to send data to Statistics Estonia.

    Proposal 2 was about using data from the Estonian Civil Aviation Authority’s website. Our next
    step is to find out how the updating of the website is organised. For that we have to contact
    the authority responsible for the website. Hopefully we will get some answers by the end of
    this year and can then decide whether the proposal can be realised in the production process.

    Proposal 3 was also about using additional data from the Estonian Road Administration, and it
    will have to wait until the negotiations and the renewal of the data delivery agreement have
    been finished.

    In the field of IT, research and development we had two proposals, both about using data from
    the Working register. We will wait until the end of this year to analyse the completeness and
    quality of the register data and can then decide how different statistical domains can use
    the data in their statistical activities.

    The main difficulty in performing this task was going through a huge amount of information
    while trying to find new solutions and sources for the questionnaire-based statistics.
    Statistics Estonia is aiming to use more administrative data, and analysing questionnaires
    domain by domain is an innovative approach for us that had not been tried before because of a
    lack of human resources.

    4. Mapping management processes of administrative data and metadata in Statistics Estonia

    Statistics Estonia’s goal is to produce high quality statistics as efficiently as possible. Efficient

    production is possible if we improve and widen our administrative data use. Wider use of

    administrative data also reduces administrative and response burden. Statistics Estonia is

    already using over one hundred administrative sources. However, it has become challenging to

    manage all the information related to administrative data sources, for example information

    about cooperation agreements, deadlines, process phases etc.

    During the project we have started to analyse and map the processes of managing administrative

    data and metadata in Statistics Estonia. The first task was to map the “as is” process. Below is

    the result of the mapping of “as is” processes.

  • Figure 1. As-is process of managing administrative data and metadata in Statistics Estonia

  • This process map covers the process of using and managing administrative data from the first

    phase where the data need is identified to the actual usage of the data in statistical production.

    The process map involves five different departments of Statistics Estonia and the process goes

    through the GSBPM phases Specify Needs, Design, Build, Collect, Process and Metadata

    Management/Quality Management.

    The Statistics Design Department (ATAO) has the central role in this process map. The
    department was created in 2017 and has since had the central role in managing administrative
    data and metadata. Metadata management had been centralised in the Methodology Department of
    Statistics Estonia since 2004, and managing and capturing administrative data was formerly the
    responsibility of the Data Warehouse Department. However, as Statistics Estonia has started
    using administrative data more and aims to create and develop closer partnerships with the data
    owners, it was decided to centralise the management of metadata and administrative data in the
    Metadata team of the Statistics Design Department.

    The process map above describes the processes after the creation of the Statistics Design
    Department. We are working on optimising the processes of managing administrative data, which
    means that we want to provide the data for statistical production more efficiently and in a
    more standardised way.

    Below is the result of mapping the “to be” processes. For better understanding, we split the
    administrative data management process into parts. Our aim is to simplify the usage and
    analysis of

    administrative data for the statistical departments and also to shorten the time of getting access

    to new data. Figure 2 shows the process of managing new or changed data need. Metadata team

    has the central role in this process and the process goes through the Specify Needs (1) and

    Design (2) phase of GSBPM. After getting input from Analysts, the Design phase is carried out

    by the Methodologists in Metadata team. The Design phase for administrative data includes

    defining the variables that need to be captured from the data source, compiling information for

    the data request or contract and preparing the data requests and contracts. In this phase, most of

    the communication and negotiation with data owners takes place. The administrative data

    manager’s role in this phase is similar to that of an intermediary or a “translator” – it is important

    to define the data needs as clearly as possible.

    Administrative data management in the Design phase includes describing metadata for

    administrative data centrally, in cooperation with the owners of registers and statistical domain

    departments.

    The wide use of administrative data in SE has produced a lot of information related to data

    sources: for example, information about cooperation agreements, data requests, data delivery

    deadlines, data structures, formats, additional information about data, communication with data

    owners, process phases, etc.

    The deadlines for data transmission in SE are currently managed and visualised in the web

    application JIRA. JIRA makes it possible to monitor data deliveries, data loading,

    processing, etc. There are different tasks for every data delivery, and every task and subtask

    can be assigned to a different person. Whenever problems or obstacles arise in some process

    phase, the questions and answers are added in JIRA as comments. This gives an

    overview of the workflow related to the specific dataset.

  • Figure 2. To-be process of agreements with data owners and managing administrative data and metadata

  • Figure 3 shows the data capturing process, which ends with making the data available to

    analysts. This process goes through the Build (3) and Collect (4) phases of GSBPM.

    The Build and Collect phases for administrative data are the responsibility of the Data Service

    Department. In these phases, the role of administrative data managers is to pre-process the data

    and make them available to the NSI’s in-house applications. These procedures ensure that there

    are no duplicate data and that the data are ready for statistical analysis.

    Administrative data are captured through different channels:

    1) encrypted .csv or .xls(x) files by e-mail, FTP or cloud services;

    2) X-Road services that are divided into:

    • pull services – the data owner has developed an X-Road service the content of which is

    suitable for SE. The data are pulled to SE through the X-Road service.

    • push services to xGate – the data are pushed to SE through our xGate service. This is

    the preferred channel for data capture, because SE validates the received data against XSD, and

    the data delivery process is controlled by SE.

    When administrative data have been captured through different channels, the loading processes

    begin. The first step is loading the data to the Initial Observation Registry (IOR). When the data

    are sent as .csv or .xls(x) files, they are loaded to the Oracle database as they arrive. Loading

    and processing data sent as files is time-consuming for us, because there are constant problems

    with deviations from the agreed data structures and with wrong data formats.

    When data are captured by X-Road pull services, the XML file is parsed to the IOR by Oracle

    tools. When data are captured by xGate, the file is parsed and validated against the XSD file

    generated in the iMeta system. After loading the data to IOR, it is possible to give the first

    feedback about the received data. The captured data cannot be loaded if the formats are incorrect

    or variables are missing.
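
    The loadability check described above can be illustrated with a minimal Python sketch. It is

    not SE's actual loading procedure: the agreed structure, the variable names and the function

    names are hypothetical, and a real check would read the structure from iMeta.

```python
import csv
import io
from datetime import datetime

# Hypothetical agreed structure: variable name -> expected format.
AGREED_STRUCTURE = {
    "unit_id": "integer",
    "period": "date",       # ISO format YYYY-MM-DD is assumed here
    "turnover": "decimal",
}


def _matches(value: str, expected: str) -> bool:
    """Return True if the text value can be read in the expected format."""
    try:
        if expected == "integer":
            int(value)
        elif expected == "decimal":
            float(value)
        elif expected == "date":
            datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def check_delivery(csv_text: str, structure: dict) -> list:
    """List the problems that would make a delivered CSV file unloadable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing variable: {name}" for name in structure if name not in header]
    if problems:
        return problems  # formats cannot be checked without the full structure
    for row_number, row in enumerate(reader, start=1):
        for name, expected in structure.items():
            value = row.get(name) or ""  # short rows yield None, treated as empty
            if not _matches(value, expected):
                problems.append(f"row {row_number}: {name}={value!r} is not a valid {expected}")
    return problems


delivery = "unit_id,period,turnover\n101,2019-03-31,1500.50\n102,31.03.2019,abc\n"
for problem in check_delivery(delivery, AGREED_STRUCTURE):
    print(problem)
```

    An empty result means the file passes this basic check; any listed problem could be used as the

    first feedback to the data owner.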

    The next step is the Data Staging Area (DSA), where data structure checks and conversions to

    correct formats take place. These checks and conversions are done according to the metadata

    descriptions in iMeta. It is also possible to develop more contextual checks, but for this, the

    input for the rules is needed from statistical domain departments. After DSA, it is possible to

    automatically generate a quality report about the delivered dataset.
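
    As an illustration of metadata-driven conversion, the following Python sketch converts

    delivered text values to the formats given in a metadata description. The METADATA dictionary

    and the function names are assumptions made for this example; they do not reflect the actual

    iMeta structures or the DSA implementation.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical metadata description: variable -> (target type, source format hint).
METADATA = {
    "period": ("date", "%d.%m.%Y"),   # date pattern used by the data owner
    "turnover": ("decimal", ","),     # decimal separator used by the data owner
    "unit_id": ("integer", None),
}


def convert_value(raw: str, target: str, hint):
    """Convert one delivered text value to the format described in the metadata."""
    text = raw.strip()
    if target == "date":
        return datetime.strptime(text, hint).date()
    if target == "decimal":
        return Decimal(text.replace(hint, "."))
    if target == "integer":
        return int(text)
    raise ValueError(f"unknown target type: {target}")


def stage_record(record: dict, metadata: dict) -> dict:
    """Convert every variable of a record; variables without metadata are rejected."""
    unknown = set(record) - set(metadata)
    if unknown:
        raise KeyError(f"variables not described in metadata: {sorted(unknown)}")
    return {name: convert_value(value, *metadata[name]) for name, value in record.items()}


staged = stage_record(
    {"period": "31.03.2019", "turnover": "1500,50", "unit_id": " 101 "},
    METADATA,
)
print(staged)
```

    Keeping the conversion rules in the metadata rather than in the code means that a change in a

    delivery format only requires updating the description.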

    The last step is to make the data available for users, which means that the data are loaded to

    the Final Observation Registry (FOR) and are pseudonymised if they include personal data. The

    process of pseudonymisation involves removing personal identification numbers, names and

    contact details from the data. PINs are replaced with unique identifiers that allow the data to

    be joined. The unique in-house identifiers are not derived from PINs, which means that

    it is not possible to convert the unique identifiers mathematically to PINs.
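
    The principle of identifiers that are not derived from the PIN can be sketched in Python as

    follows. This is only an illustration under stated assumptions: the class, the field names

    and the in-memory lookup table are hypothetical, and SE's actual pseudonymisation procedure

    is not described in this report at this level of detail.

```python
import secrets


class Pseudonymiser:
    """Replace personal identification numbers with random in-house identifiers."""

    def __init__(self) -> None:
        # The lookup table is the only link between PINs and identifiers,
        # so in practice it would have to be stored securely and separately.
        self._pin_to_id = {}
        self._issued = set()

    def identifier_for(self, pin: str) -> str:
        """Return the same random identifier every time the same PIN is seen."""
        if pin not in self._pin_to_id:
            # Drawn at random, not computed from the PIN, so it cannot be
            # converted back mathematically.
            new_id = secrets.token_hex(8)
            while new_id in self._issued:
                new_id = secrets.token_hex(8)
            self._issued.add(new_id)
            self._pin_to_id[pin] = new_id
        return self._pin_to_id[pin]

    def pseudonymise(self, record: dict, pin_field: str, direct_identifiers: list) -> dict:
        """Drop direct identifiers and swap the PIN for the in-house identifier."""
        cleaned = {
            key: value
            for key, value in record.items()
            if key != pin_field and key not in direct_identifiers
        }
        cleaned["person_id"] = self.identifier_for(record[pin_field])
        return cleaned


pseudonymiser = Pseudonymiser()
income = pseudonymiser.pseudonymise(
    {"pin": "PIN-0001", "name": "Example Person", "income": 1200}, "pin", ["name"]
)
benefit = pseudonymiser.pseudonymise(
    {"pin": "PIN-0001", "name": "Example Person", "benefit": 300}, "pin", ["name"]
)
# The two records can still be joined on the in-house identifier.
print(income["person_id"] == benefit["person_id"])
```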

    The data are stored and versioned in Oracle databases, which are available to the statistical

    domain departments through SAS or R.

  • Figure 3. To-be process of data capturing and making it available to users

  • 4.1. Summary and difficulties encountered

    The process of managing administrative data already changed after the creation of the Statistics

    Design Department. However, our goal is to redesign the processes further, so that administrative

    data are provided to the statistical departments more efficiently and in a standardised way.

    The main difficulty in mapping the current process was that many departments are involved

    in it. This also makes optimising the processes challenging, because every step has to be

    analysed thoroughly in order to find ways to simplify the process and shorten the time spent

    on the different steps.

    To better understand how to make our administrative data management more efficient, it was

    very helpful to read the document “Good practices in accessing, using and contributing to the

    management of administrative data” (Eurostat, 2018). The main advantage of this document is

    that it compiles the experiences of different NSIs. It is reassuring to know that other

    statistical offices are on the same path and that we are all moving towards better partnerships

    and administrative data management processes. The document also indicates which countries

    we could learn from and ask for guidance.

    The to-be processes were mapped in as much detail as possible. That enables us to monitor the

    processes and make changes, if necessary.

    Our next step is to describe each process step and document what is done, how and by whom

    in every stage of the process. The goal is to create written instructions that make the

    workflow smoother and help new team members learn what to do.

    5. Creating vision document on how to give feedback to the data

    owners about data transmission deadlines and agreed data

    structures

    One part of optimising and standardising the management of information related to administrative

    data is the automation of different notifications and feedback.

    Currently we send reminder e-mails before data transmission deadlines manually, and only to

    those data owners who tend to forget their data deliveries.

    At the moment Statistics Estonia does not have an information system for automated data

    structure checks and for monitoring data transmission deadlines of administrative sources. We

    are in the process of working out a vision document on how to give feedback to the data

    owners about data transmission deadlines and agreed data structures.

    We have analysed what type of information we need to manage in the information system – this

    includes the deadlines of data deliveries, related contacts and contract information and also the

    information about data structures, formats and metadata.

    We have also analysed different information systems that are already in use in our statistical

    production process and there are some information systems that could be developed further to

    provide some of the functionality needed for managing this information and sending out

    automatic notifications.

    If compliance with the agreed data structures and metadata were checked automatically,

    we could also generate a quality report for sending feedback to the data owners.

    The analysis of our current information systems showed that we would need to develop a new

    information system to enable automated checks and feedback.

    SE has created a vision document for developing a new information system, the Administrative

    Data Gate. It will help automate administrative data management in the Design, Build and

    Collect process phases.

    The main functionalities of the Administrative Data Gate are:

    • Monitoring data deliveries and sending automated feedback and reminders to data

    holders.

    • Reading metadata from SE’s metadata management system and checking delivered data

    against the agreed structures and content.

    • Functionality to convert data to formats or structures needed by statistical domain

    departments.

    • Logging and monitoring every procedure that is carried out

    with a specific dataset.

    • Dashboard with main operations visible for users.
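
    The first functionality, monitoring data deliveries and sending reminders, could work along the

    lines of the following Python sketch. It is an assumption-based illustration, not the design of

    the Administrative Data Gate: the Delivery class, the seven-day reminder window and the

    message wording are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Delivery:
    dataset: str
    owner_email: str
    deadline: date
    received: bool = False


def deliveries_to_remind(deliveries, today, days_before=7):
    """Pick deliveries not yet received whose deadline is near or already passed."""
    limit = today + timedelta(days=days_before)
    return [d for d in deliveries if not d.received and d.deadline <= limit]


def reminder_text(delivery, today):
    """Compose a reminder or an overdue notice for one delivery."""
    if delivery.deadline < today:
        status = f"was due on {delivery.deadline.isoformat()} and has not arrived"
    else:
        status = f"is due on {delivery.deadline.isoformat()}"
    return f"To {delivery.owner_email}: the dataset '{delivery.dataset}' {status}."


today = date(2019, 5, 1)
planned = [
    Delivery("Dataset A", "owner-a@example.org", date(2019, 5, 5)),
    Delivery("Dataset B", "owner-b@example.org", date(2019, 6, 30)),
    Delivery("Dataset C", "owner-c@example.org", date(2019, 4, 20)),
]
for delivery in deliveries_to_remind(planned, today):
    print(reminder_text(delivery, today))
```

    With such a rule, reminders would go to every data owner with an approaching or missed

    deadline, not only to those known to forget their deliveries.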

    The Administrative Data Gate would become the single channel through which all

    administrative data pass, as shown in Figure 4. The input data can come in different

    formats (csv, txt, xls, ods, xml, json) or through different channels (X-Road push/pull services,

    e-mails), but all the data are guided through the Administrative Data Gate, where automated data

    checking and corrections are done.

    After the data checking, a quality feedback report is generated and sent to the data owner.

    The content of the quality feedback report has not been decided yet, but it will definitely contain

    information about compliance with the agreed data structures and data formats.

    Figure 4. Dataflow through Administrative Data Gate
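
    Since the content of the quality feedback report is still open, the following Python sketch

    only shows one possible shape for the two parts that are certain, structure compliance and

    format compliance. The function name, its parameters and the report layout are assumptions

    made for illustration.

```python
def build_feedback_report(dataset, agreed_variables, delivered_variables,
                          format_errors, row_count):
    """Render a plain-text report on structure and format compliance.

    format_errors maps a variable name to the number of values in the wrong format.
    """
    missing = [v for v in agreed_variables if v not in delivered_variables]
    unexpected = [v for v in delivered_variables if v not in agreed_variables]
    wrong_format = {v: n for v, n in format_errors.items() if n > 0}

    lines = [
        f"Quality feedback report for dataset: {dataset}",
        f"Rows received: {row_count}",
        "Structure: " + ("compliant" if not (missing or unexpected) else "NOT compliant"),
    ]
    lines += [f"  missing variable: {v}" for v in missing]
    lines += [f"  unexpected variable: {v}" for v in unexpected]
    lines.append("Formats: " + ("compliant" if not wrong_format else "NOT compliant"))
    lines += [
        f"  {v}: {n} value(s) in the wrong format"
        for v, n in sorted(wrong_format.items())
    ]
    return "\n".join(lines)


print(build_feedback_report(
    dataset="Example dataset",
    agreed_variables=["unit_id", "period", "turnover"],
    delivered_variables=["unit_id", "period", "comment"],
    format_errors={"unit_id": 0, "period": 3},
    row_count=250,
))
```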

    5.1. Summary and encountered difficulties

    We have analysed our needs and have an overview of the functionality that is needed to manage

    information related to administrative data efficiently and to run automated checks on

    delivered data sets.

    However, it has been difficult to decide whether we need to develop a new information system

    to provide the needed functionality or whether some of our existing applications can be developed

    further to meet the needs. The analysis showed that we need to develop a new information system.

    Now the challenge is to find financial and human resources to start the development process of

    the Administrative Data Gate. Statistics Estonia has already applied for financial support from

    the SF funds, but the feedback for the application has not arrived yet. So the timeline for the

    development process is still unknown.

    6. Describing metadata for the data sources whose cooperation

    agreements are renewed in the metadata system

    Statistics Estonia uses about one hundred different administrative data sources in its

    statistical production process. Describing and harmonising the metadata for administrative data

    is time-consuming, because there are several metaobjects in our metadata management

    system iMeta that have to be defined in order to fully document the captured data.

    We are in the process of describing all the metadata for received administrative data, but during

    this grant project we will concentrate on describing and standardising the metadata of those

    data sources whose cooperation agreements were signed before 2010.

    We have made preparations for renewing the data delivery agreements and some of the metadata

    is already described in our metadata management system.

    The metadata description process also involves the data owners and analysts from the statistical

    departments. The steps for describing the metadata for administrative data are as follows:

    • analysing already received data and adding variable descriptions, classifications and

    code lists to our metadata management system;

    • describing the rest of metadata related to the first sub-task according to Neuchâtel

    terminology model (conceptual variables, statistical characteristics, statistical unit types);

    • cooperating with the leaders of the statistical activities to describe and harmonize

    metadata efficiently;

    • describing metadata in the metadata system for additional data needs and giving the

    input for cooperation agreements renewal process.

    The Neuchâtel terminology model (Neuchâtel Group, 2004) has been used for describing the

    variables in our metadata management system. In this model, the variables are described on

    three levels – conceptual variable, statistical characteristic (object variable) and contextual

    variable. Statistical unit type is an entity for which information is sought and for which statistics

    are ultimately compiled. Statistical characteristic is a characteristic of a statistical unit type.

    Conceptual variable (concept) provides a general description of the meaning of the statistical

    characteristic without explicit reference to any particular statistical unit type. Contextual

    variable describes the variable in the context of a statistical activity. Contextual variables can

    be defined as register variables or cube variables.
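
    The relationships between these metadata objects can be sketched as Python data classes. This

    is a simplified reading of the definitions above, not the iMeta data model: the class layout,

    the attribute names and the example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticalUnitType:
    """Entity for which information is sought and statistics are compiled."""
    name: str


@dataclass(frozen=True)
class ConceptualVariable:
    """General meaning, without reference to any particular unit type."""
    name: str
    definition: str


@dataclass(frozen=True)
class StatisticalCharacteristic:
    """A conceptual variable applied to one statistical unit type."""
    concept: ConceptualVariable
    unit_type: StatisticalUnitType

    @property
    def name(self) -> str:
        return f"{self.concept.name} of {self.unit_type.name}"


@dataclass(frozen=True)
class ContextualVariable:
    """A statistical characteristic in the context of a statistical activity."""
    characteristic: StatisticalCharacteristic
    statistical_activity: str
    kind: str            # "register" or "cube"
    name_in_source: str

    def __post_init__(self) -> None:
        if self.kind not in ("register", "cube"):
            raise ValueError("kind must be 'register' or 'cube'")


# Hypothetical example values, used only to show how the levels link together.
turnover = ConceptualVariable("Turnover", "Example definition for illustration")
enterprise = StatisticalUnitType("Enterprise")
characteristic = StatisticalCharacteristic(turnover, enterprise)
register_variable = ContextualVariable(
    characteristic, "Example statistical activity", "register", "TURNOVER_EUR"
)
print(characteristic.name)  # Turnover of Enterprise
```

    One conceptual variable can be reused by several statistical characteristics, and one

    characteristic by several contextual variables, which is what makes harmonisation across data

    sources possible.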

    Our goal was to describe and harmonise all the metadata of those administrative data sources

    whose cooperation agreements were signed before 2010.

    We therefore started by describing and harmonising all the necessary metadata objects for:

    Estonian Tax and Customs Board

    National Institute for Health Development

    Estonian Land Board

    Agricultural Board

    Agricultural Research Center

    The Estonian Tax and Customs Board is a very important data source for us. They are the

    owners of several state registers, and SE captures 80 different datasets from them every year.

    The frequency of data capture varies from once a day to once a year. For this source we had to

    describe and harmonise 483 different contextual variables and also all the corresponding

    metadata objects. Quite a few variables had to be clarified with the data owners,

    because the forms of tax and customs declarations are constantly changing and, for the

    contextual description of metadata, we had to be sure that we understood each variable thoroughly.

    The National Institute for Health Development is the source for death and birth statistics for

    SE. From this source we capture 147 different variables. The content of those variables was

    quite clear for us and it was not too troublesome to describe them in our metadata management

    system. Unfortunately, we found out that the National Institute for Health Development is starting

    major developments in its information systems in order to unite different smaller registers

    into one big register. That means we have to be ready for changes in data content and revise

    our metadata descriptions once the development has taken place.

    The Estonian Land Board has always been a good cooperation partner for Statistics Estonia. It

    is the owner of the Address Data System, which enables all the registers to exchange address data

    in a harmonised way. For this source we had to describe 162 variables and the corresponding

    metadata objects. As we started to prepare the new data delivery contract and review our current

    data needs, we also discovered that, due to some changes in Estonian legislation, since the

    start of 2019 the Estonian Land Board no longer collects some of the variables that are needed

    in our statistical production process. That means our analysts have to change the methodology of

    their statistical activities.

    For agricultural statistics, one of the most important sources is the Agricultural Board. At the

    moment, we capture data once a year, but after signing the new data delivery agreement we

    would like to start capturing data twice a year. We described 88 variables and the corresponding

    metadata objects for the Agricultural Board; some of those variables remain in draft version until

    our negotiation process to renew the agreement is finalised. However, we have had several very

    useful meetings with the source and were also able to incorporate the available data more

    efficiently in our statistical production process.

    The Agricultural Research Center is also an important source for agricultural statistics. We hope

    to start receiving twenty data sets and 63 variables from that source. As the negotiations

    for the new data delivery agreement are still in progress, the metadata description is also in draft

    version. We are ready to change or supplement our current metadata descriptions when the

    agreement is finalised.

    6.1. Summary and difficulties encountered

    The main difficulty of performing this task is understanding the conceptual meaning of the data

    correctly. For standardising and harmonising metadata of administrative data and documenting

    it in our metadata management system iMeta, we needed to involve the data owners and also

    the data users from our statistical departments.

    There are two sources, Agricultural Board and Agricultural Research Center, whose metadata

    descriptions are partly in draft version. That means that we have done all necessary preparations

    for describing them, but they are not published in our metadata management system yet. Once

    the negotiations to renew the data delivery agreements are finalised, we can also

    publish the metadata descriptions.

    So although the data descriptions are done and managed centrally in Statistics Estonia, there

    are still other parties to the process, whose knowledge had to be considered. This means that

    the process is time-consuming and that meetings have to be held to agree on data

    definitions.

    7. Renewing cooperation agreements made with data owners

    before the year 2010

    During the grant project we plan to renew the cooperation agreements that are in force and

    were signed before 2010. This is important because before 2010 Statistics Estonia used a different

    contract format, which did not specify, for example, the structure of the delivered data. We are now

    moving towards automated data capturing and controlling systems, so it is really important to

    agree on specific data structures, formats and metadata.

    Our analysis of data delivery contracts showed that we need to renew our contracts with five

    different institutions. Almost all of those institutions own several registers from which

    Statistics Estonia captures different data sets.

    We have started preparing new agreements with:

    Estonian Tax and Customs Board

    National Institute for Health Development

    Estonian Land Board

    Agricultural Board

    Agricultural Research Center

    It is important to use the new contract format, in which both the main part of the contract and

    the annex for the detailed data composition are updated. The main part of the new contract format consists

    of:

    1. General information (details of the parties and the purpose of the contract);

    2. List of contract’s documents (annexes to the agreement are mentioned if any);

    3. Object of the contract (content of the contract, explanation of the concept “data” and the

    method of transmission);

    4. Rights and obligations for the parties (a list of rights and obligations that all parties need

    to follow);

    5. Confidentiality (the confidentiality obligation for the parties is stated);

    6. Contract performance obligations (states that data transmission is

    at no cost, but the costs of performance of the contract shall be borne by each party from

    its budget);

    7. Force majeure (a list of situations which obstruct the continuation or lawful existence

    of a contract between the parties);

    8. Modification, completion and termination of the contract (information about

    the rules for modification, completion and termination of the contract for all parties);

    9. Resolving disputes (how the disputes arising from performance of the contract shall be

    resolved);

    10. Other terms;

    11. Contact information.

    New annex(es) include the composition of the data at the variable level and contact persons for

    the transmission of data.

    The renewal process included describing metadata for the captured datasets, because in the

    annexes we always define the data composition at a detailed level.

    The Estonian Tax and Customs Board is a major data source for us. The data delivery agreement

    with them had been in force since 2007. Since then, SE’s data needs have grown and quite a few

    changes have taken place in the registers of the Tax and Customs Board. It was absolutely essential

    to renew the cooperation agreement. We started the preparations by mapping the

    actual data needs of SE. For every data set we had meetings with the analysts who need the data

    and specified the data content. From those meetings we gathered questions and information that

    needed to be negotiated with the data source.

    Some of the negotiations with the Tax and Customs Board took place via e-mails and phone

    calls. However, it is always more efficient to have the necessary persons around one table to

    agree on something.

    The data content negotiations needed the involvement of the subject matter experts from both

    sides. As Statistics Estonia is using 80 different data sets from Tax and Customs Board, we had

    to arrange several meetings to specify the data content.

    There were also separate meetings with the lawyers of both parties. Statistics Estonia has worked

    out a standard data delivery agreement. However, the Tax and Customs Board has its own

    standard agreements for data exchange. So, it was necessary to address legal issues and work

    out an agreement that suits both institutions. The legal negotiations were also successful and

    we managed to sign the new data delivery agreement with the Tax and Customs Board in May

    2019.

    The National Institute for Health Development is an important source for population and social

    statistics. With that institution we have two separate data delivery agreements – one for each

    register. Our goal is to have only one agreement with the National Institute for Health

    Development that covers the birth and death data. At the mo