SE OF ADMINISTRATIVE AND ACCOUNTS DATA IN BUSINESS STATISTICS 2011... · 1 essnet use of...

ESSNET

USE OF ADMINISTRATIVE AND ACCOUNTS DATA IN BUSINESS STATISTICS

WP4

TIMELINESS OF ADMINISTRATIVE SOURCES FOR MONTHLY AND QUARTERLY ESTIMATES

STS-ESTIMATES BASED ON ADMIN DATA: DEALING WITH REVISIONS

(SGA 2011: DELIVERABLE 4.4)

Ciro Baldi a, Donatella Tuzi, Francesca Ceccato, Silvia Pacini, Epp Karus, Pieter Vlag

a Corresponding author: Senior Researcher

ISTAT – Italian NSI

Directorate for Short-Term Economic Statistics

Wages and Labour Input Business Statistics Division

Via Tuscolana, 1788 00173 Rome

Contents

1. Introduction
2. Using admin data for STS-estimates
2.1 The general system of admin data based STS-estimates
2.2 Incompleteness of admin data: two situations
2.3 The large enterprise survey
2.4 STS-estimates and active enterprises
3. Admin data based STS-estimates and revisions
3.1 Components for revisions
3.2 The complete sequence of revisions
4. Revision strategy
4.1 Introduction
4.2 Factors influencing the revision policy
4.3 Updating of admin data
4.4 Updating of the survey data
4.5 Updating of the SBR (and population changes)
4.6 Benchmarking
4.7 The second component: publication strategy and output obligations
4.8 Examples and considerations
5. A complete sequence of revisions: an example from Finland
6. A structural way to analyse revisions: an example from Italy
6.1 Introduction
6.2 General outline: 5 steps
6.3 Step 1: context information – example Italian employment data
6.4 Step 2: revision measures – example for Italian employment data
6.5 Step 2a: graphical analysis of problematic domains
6.6 Step 2b: decomposition of the revision error into a survey part and an admin data part
6.7 Step 3: further analysis
6.8 Cause and effect report: a synthetic description of the main causes of revisions
6.9 Generalisation and general remarks
7. Conclusions
Acknowledgements
Appendix 1. An application of revision analysis to the Estonian turnover estimates on retail trade
Appendix 2. Summary statistics on revisions
Appendix 3. Contribution of admin data and survey data to revisions
Appendix 4. SAS code for the calculation of the summary statistics on revisions and graphs

Summary

Dealing with revisions is an important part of any short-term statistical (STS) process. This is

especially the case when STS-estimates are based on a combination of a survey for the largest

enterprises and administrative data (admin data) for medium and small enterprises. In such a

system, admin data are structurally incomplete when the first estimates have to be made and this

incompleteness leads to revisions when replacing these first preliminary estimates with later final

estimates based on a complete set of admin data. As National Statistical Institutes (NSIs) have little control over the

completeness of the admin data, it is important that revision analysis is carried out in order to

monitor this effect. However, our analyses show that updates to the information from the large

enterprises survey and corrections in the determination of the active enterprise population between

the first preliminary and last final estimate are also important sources of revisions.

Based on these observations, this paper describes three aspects of revision analysis. The first is

revision analysis as a tool to understand the characteristics of the preliminary estimates. The

second is revision analysis to suggest areas for improvement, either in a development stage or

during current production. The third is to support NSIs when setting up revision policies for an

STS-production process based on a combination of a large enterprise survey and admin data.

Keywords: Revisions, Administrative Data, Revision Policy, Revision Analysis, Short-Term Statistics, Quality

1. Introduction

Short-term statistics (STS) have to provide early signals of the dynamics of the economy. For this reason

they are needed as quickly as possible and are usually released when information is still partial or

subject to change. Subsequently, revised estimates are released to incorporate newly available

information to improve the quality of the indicators. Since, by definition, later estimates are

considered more accurate, the revision is a primary quality indicator of the preliminary estimates,

for the producer as well as for the user. The reliability of the preliminary estimates can be easily

assessed by looking at the magnitude and characteristics of revisions. Large and/or systematic

revisions may be interpreted as a signal of bad performance of the early estimates and thus damage

their credibility.
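
As an illustration of how the magnitude and the systematic character of revisions might be summarised, the following minimal Python sketch computes the mean revision (a signed measure, which signals systematic over- or underestimation) and the mean absolute revision (a measure of size). The function name and the sample growth rates are hypothetical, and the formulae are the common textbook definitions, not necessarily those adopted in the appendices of this deliverable.

```python
from statistics import mean


def revision_summary(preliminary, final):
    """Return the mean revision and the mean absolute revision.

    A revision is defined here as final minus preliminary (an assumption;
    the opposite sign convention is also in use).
    """
    if len(preliminary) != len(final) or not preliminary:
        raise ValueError("need two non-empty series of equal length")
    revisions = [f - p for p, f in zip(preliminary, final)]
    return {
        "mean_revision": mean(revisions),
        "mean_absolute_revision": mean(abs(r) for r in revisions),
    }


# Hypothetical year-on-year growth rates (%) for four reference periods.
prelim = [2.0, 1.5, -0.5, 3.0]
final = [2.5, 1.0, 0.0, 3.5]
summary = revision_summary(prelim, final)
```

A mean revision clearly different from zero would point to systematic revisions, while a large mean absolute revision combined with a mean revision near zero would point to large but unsystematic revisions.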

These considerations lead NSIs to design preliminary estimates in such a way that revisions are

minimised. As a consequence, the analysis and monitoring of revisions are essential both

in a development stage and in current production.

In a development stage, the comparison of simulated preliminary estimates and final estimates

provides useful information to improve and fine-tune the estimation methodology through the

detection of the main causes of discrepancies and through comparisons of variants of the

methodology. In current production, the constant monitoring of revisions allows detection as soon

as possible of problems that may have arisen in the estimation process, so that the necessary actions

may be taken.

The analysis of revisions is important in all short-term statistical processes, but even more in those

based on admin data, for two reasons. Firstly, the NSIs have limited control over the coverage

and completeness of the data, implying that statisticians may face occasional or structural drops in

preliminary data. The monitoring of revisions may be essential to identify these problems at an

early stage. Secondly, the admin data frequently cover the target population on a census basis, at

least for the final estimates. This means that the final estimate is not affected by sampling errors.

The revision may therefore be interpreted as the difference between an estimate and the “true” value,

assuming that the final estimate is accurate.

A second aspect in dealing with revisions is to set up an appropriate revision schedule of subsequent

releases of the indicators for the same reference period. In the case of admin data based processes,

this revision policy has to take into account the updating of multiple sources of information.

The aim of this deliverable is twofold. On the one hand, it documents the work of the ESSnet Admin

Data on the analysis of revisions. It proposes a way to analyse revisions systematically for STS-

estimates based on admin data for the small and medium sized enterprises and a survey of the

largest enterprises. On the other hand, it illustrates the main sources of revisions in STS-estimates in

an admin data based process and relates these sources of revisions to a revision strategy.

The document is organised as follows. Chapter 2 provides a summary of methodological and practical issues when producing STS-estimates with a combination of a survey of the largest enterprises and admin data for the medium-sized and small enterprises. Basically, this chapter summarises the results of deliverables 4.1, 4.2 and 4.3 of the ESSnet on Admin Data. Chapter 3 discusses the relationship between revisions and the three data sources, i.e. a survey of the largest enterprises, admin data for the other enterprises and the business register to determine the population frame. Chapter 4 links these revision sources to a revision policy. Chapter 5 presents a complete sequence of revisions, using an example from Finland. Practical examples of revision analysis are shown in chapter 6, which mainly deals with the Italian estimates of the number of employees based on Social Security data. Some additional examples using VAT-data from Estonia, Finland and the Netherlands are also included. Chapter 7 concludes.

Appendix 1 reports a second full application of the revision analysis, on Estonian VAT data on turnover in retail trade. Appendix 2 explains the revision indicators, together with their formulae. Appendix 3 describes a simple way to decompose the revisions into contributions due to the data sources. Finally, Appendix 4 reports some SAS code snippets that might be used to replicate the analysis shown here in other NSIs.

2. Using admin data for STS-estimates

2.1 The general system of admin data based STS-estimates

The general set-up when utilizing admin data for producing STS is that a combination of a survey

and admin data is used (see, e.g., Orchard et al. 2011; Karus, 2012; Kavaliauskiene, 2011; Lorenz,

2011; and Šličkutė-Šeštokienė, 2011). Since large enterprises often have a complex structure and

their impact on the estimates is large, correct surveyed observations from those large enterprises are

considered crucial for producing reliable STS figures. In the survey the large enterprises are

generally completely enumerated.

For the remaining small and medium enterprises, admin data (such as VAT data) are used instead of direct observations by the NSI. Only in some specific cases is a small sample of small and medium enterprises surveyed. In other words, the general system of admin data based STS-estimates consists of two parts: the use of a survey for the large enterprises; and the use of admin data, i.e. VAT data for turnover estimates and social security data for employment estimates.

2.2 Incompleteness of admin data: two situations

A drawback of using admin data for small and medium enterprises is that these admin data are still

incomplete when the monthly or quarterly STS-estimates have to be produced. This incompleteness

might be temporary (e.g. due to late response of enterprises) or structural (e.g. because enterprises

below a fixed income threshold may report at a different periodicity to the admin data holder).

Roughly speaking, two general situations can be distinguished:

I. By far the largest part of the admin data for period t is available in time (with the remainder of the data being

provided later).

II. No or very limited data for period t are available in time.

In both cases a common practice among NSIs is to enumerate the population of large enterprises

completely with a survey.

Situation I generally applies to regularly produced quarterly estimates: in continental Europe, commercial enterprises typically have to declare their VAT and Social Security data on a monthly or quarterly basis, and the deadline for reporting these data to the authorities is much earlier than the publication deadlines for quarterly turnover and employment estimates under the STS-regulation.

The second situation (situation II) mostly applies to monthly estimates, because some enterprises

declare per quarter, and some deadlines for monthly statistical publications are early, e.g. before the

deadline of reporting to the tax office.

Many NSIs consider that coverage of about 80% of the total turnover by the large enterprises survey

and available admin data is necessary, before reliable figures of turnover can be published in a

certain publication cell (see, e.g., Vlag, 2012).
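
The coverage rule of thumb above can be sketched as follows. This is a minimal illustration only: the function names, the choice of denominator (an estimate of total turnover in the cell, for instance taken from an earlier final estimate) and the figures are assumptions, and the 80 % threshold is the approximate value quoted in the text.

```python
def turnover_coverage(le_survey_turnover, available_admin_turnover, estimated_total_turnover):
    """Share of the estimated total turnover of a cell that is actually observed."""
    if estimated_total_turnover <= 0:
        raise ValueError("estimated total turnover must be positive")
    return (le_survey_turnover + available_admin_turnover) / estimated_total_turnover


def is_publishable(coverage, threshold=0.80):
    """Apply the (approximate) 80 % rule of thumb to one publication cell."""
    return coverage >= threshold


# Hypothetical cells: (LE-survey turnover, available admin turnover, estimated total).
cells = {
    "retail": (600.0, 250.0, 1000.0),
    "construction": (300.0, 200.0, 1000.0),
}
flags = {name: is_publishable(turnover_coverage(*values)) for name, values in cells.items()}
```

In this example the first cell reaches a coverage of 85 % and passes the rule, whereas the second reaches only 50 % and does not.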

In situation I, the natural wish of a statistician would be to complete the dataset to the whole

population. Theoretically, many ways of doing this are possible. In practice, various methods of

imputation are used. Several statistical production systems using an almost complete VAT-dataset

have been described in deliverable 4.1 of SGA-2011 of the ESSnet Admin Data (Maasing et al.,

2013). An important conclusion of Maasing et al. (2013) was that both level and growth rate

estimates for turnover can be produced using VAT if:

- VAT provides a good coverage of the population, where a good coverage is defined as 80 % or more of the estimated population covered by available VAT (Maasing et al., 2013);

- the data transfer from the tax office to the statistical institute is guaranteed; and

- the link with the Business Register is well established.

In situation II (no or few admin data available) it was concluded that the statistical survey cannot be directly substituted by admin data, because the available admin data are generally not representative of the target population and this selectivity cannot be determined beforehand. In this case several estimation methods are available:

- maintaining a small survey for the current period t and weighting this mini-survey with the help of admin data of the previous period. Hence, the admin data are used as auxiliary information for the estimates. This method is described in deliverable 4.2 of SGA-2011 of the ESSnet Admin Data (Kavaliauskiene et al., 2013);

- alternatively, using the admin data estimates of the previous month or quarter to check whether long-term trends and short-term movements are similar for the larger enterprises and smaller enterprises. Depending on the outcome, this information can be used to decide whether

  - a survey of the largest enterprises (LE-survey) only is sufficient for the (first) monthly estimates, knowing that the structural series based on a LE-survey and admin data become available at a later stage or for the quarters; or

  - the LE-survey should be combined with a separate estimate for the smallest enterprises based on extrapolation of the VAT-series.

These model-based estimation methods are described in deliverable 4.3 of SGA-2011 of the ESSnet Admin Data (Vlag et al., 2013).

[Figure 1: flow diagram. General problem: administrative data are not available at the time they are needed. Branch A, admin data almost complete: use of the incomplete dataset of the current period, with imputation of missing data (Deliverable 4.1, SGA-2011). Branch B, no or limited admin data: use of admin data of previous period(s), with a regression type estimation technique (Deliverable 4.2, SGA-2011) or benchmarking / nowcasting (Deliverable 4.3, SGA-2011).]

Figure 1. Scope of the timeliness problem, and relationship with deliverables.

2.3 The large enterprise survey

Deliverable 4.1 of the ESSnet Admin Data discusses how the coverage of the large

enterprise survey differs per country. It concluded that this coverage is in practice often based

on a balance between:

1. targets for administrative burden reduction and statistical production costs;

2. the link between the statistical business register and the (units of) VAT and social security

admin data; and

3. the impact on growth rates of definitional differences between the ‘administrative’ variables

and the ‘statistical’ variables required by the STS-regulation.

However, when defining the coverage of the large enterprise (LE) survey, it has to be kept in mind

that the maintenance of this survey provides some insurance against unexpected breaks in the

system (such as drops in admin data). As preliminary STS-estimates are generally designed in such

a way that revisions are minimised, Langford and Teneva (2012) developed a method to calculate

the impact of the ‘incompleteness’ factor on revisions, to determine the boundary between the LE-

survey and admin data parts in admin data based STS-estimates.

This method is based on calculating revisions between the first STS-estimates (incomplete admin

data) and final STS-estimates (complete admin data) by defining the boundary between the LE-

survey and the admin data parts at several thresholds. More specifically, revisions are calculated

when using admin data for enterprises with fewer than 20, 50, 100 and 200 persons employed,

respectively. These authors have argued that the boundary between the LE-survey and the admin

data parts should be set at the threshold at which revisions start to increase considerably. As a

consequence, it is recommended that coverage of the large enterprise survey differs per activity.

Langford and Teneva (2012) tested the method on VAT-data in the United Kingdom. The coverage

of VAT for the “first estimates” (3 months after the reporting periods) was about 60 % in terms of

turnover. The coverage of VAT in the final estimates is close to 100 %. Langford and Teneva

(2012) assumed that the data of the LE-survey were complete at the first estimates and remained

unchanged between the first estimates and the final estimates. The validity of this assumption will

be discussed in the next chapters of this paper.
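
The threshold comparison described above might be sketched as follows. The sketch assumes that a revision measure (here a mean absolute revision) has already been computed for each candidate boundary; the rule used to decide when revisions "increase considerably" (a jump of more than 50 % from one threshold to the next) and all figures are hypothetical and are not taken from Langford and Teneva (2012).

```python
def choose_boundary(revision_by_threshold, jump_ratio=1.5):
    """Return the largest size threshold reached before revisions jump.

    revision_by_threshold maps a size threshold (persons employed below which
    admin data replace the LE-survey) to the revision measure obtained with it.
    A jump is assumed when the measure exceeds jump_ratio times the measure
    at the previous, smaller threshold.
    """
    thresholds = sorted(revision_by_threshold)
    chosen = thresholds[0]
    for lower, upper in zip(thresholds, thresholds[1:]):
        if revision_by_threshold[upper] > jump_ratio * revision_by_threshold[lower]:
            break  # revisions increase considerably beyond `lower`
        chosen = upper
    return chosen


# Hypothetical mean absolute revisions (percentage points) for one activity.
mar = {20: 0.4, 50: 0.5, 100: 0.6, 200: 1.5}
boundary = choose_boundary(mar)
```

With these illustrative figures the measure rises only modestly up to 100 persons employed and then more than doubles, so the sketch places the boundary at 100. Running it separately per activity would yield activity-specific boundaries, in line with the recommendation above.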

2.4 STS-estimates and active enterprises

Deliverable 4.1 of the ESSnet AdminData (Maasing et al., 2013) extensively discusses a subtle but

relevant aspect for the STS estimates - uncertainty about which enterprises are active and which are

not during the reference period.

The fact that admin data for a reference period normally cover the population of active enterprises

in that period defines a distinctive feature of STS based on admin data. In other words, the admin

data provide a representation of the currently active population of enterprises, which may or may

not coincide with the active population of enterprises according to the Statistical Business Register

(SBR). In practice, this disparity causes several challenges.

The SBR is often used:

- as the sample frame for surveys, including the LE-survey in an admin data based STS-system, and/or

- to maintain consistency between the various business surveys of the NSI.

For these reasons, it is preferred that the admin data used for STS-estimates are linked to the SBR.

In practice, most countries link the admin data to a ‘frozen’ SBR for a certain period. A frozen SBR

defines the enterprise population characteristics as registered in the SBR at a certain date (for

example 31 December 20xx). However, linking the admin data to a frozen SBR is not

straightforward, even for a complete admin dataset (e.g. for annual SBS-statistics or final

STS-estimates). Reasons for complications are:

1. different enterprise units in the SBR and the admin data;

2. different registrations of mergers/split-offs in admin data and the SBR due to time-lags;

3. different registration of information in admin data than in the SBR due to maintenance

peculiarities;

4. (slightly) different population coverage.

The incompleteness of admin data is an important issue for STS. Due to time-lags between the SBR and the admin data source, late-reporting starting enterprises are missed in the first estimates as they are not yet included in the SBR. For the same reason, it is difficult to determine whether admin data are missing because a) the enterprise reports late or b) the enterprise has stopped. In the latter case no imputation is needed for the missing units. Hence, the so-called provisional target population is uncertain at the time of the preliminary estimates. This situation is sketched in Figure 2 below.

[Figure 2: diagram. Population frame (= business register) and admin data (i.e. VAT). Middle column, SBS: linking the admin data, with complications due to coverage, different units and mergers. Additional challenge for STS: time-lags. Estimation of the provisional active population, distinguishing units with missing VAT data that are still active, units with missing data that have stopped, and units present in the admin data but not in the business register.]

Figure 2. Schematic sketch of a) general challenges when linking admin data to the SBR (middle column) and b) specific challenges for STS when linking incomplete admin data to the SBR

In contrast to the first estimates, this ‘time-lag’ problem does not exist for the final estimates. All

admin data are available when the final estimates have to be produced and the final target

population, which consists of all active enterprises in period t, can be compiled by linking all

available admin data with the SBR (Baldi et al., 2012).

Analyses in several countries show that the uncertainty in the provisional target population is a

major source of revisions in STS-admin data estimates (Maasing et al., 2013).
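
The uncertainty about the provisional target population can be made concrete with a small sketch that splits units according to their presence in the frozen SBR and in the admin data received so far. The function, the group labels and the idea of an external list of enterprises known to have stopped are hypothetical illustrations, not a description of any NSI's production system.

```python
def classify_units(sbr_active, admin_reporters, known_stopped=()):
    """Split unit identifiers by presence in the frozen SBR and in the admin data."""
    sbr = set(sbr_active)
    admin = set(admin_reporters)
    stopped = set(known_stopped)
    missing = sbr - admin  # in the SBR, but no admin data received yet
    return {
        "linked": sbr & admin,
        "in_admin_not_in_sbr": admin - sbr,          # e.g. starters not yet registered
        "missing_known_stopped": missing & stopped,  # no imputation needed
        "missing_status_unknown": missing - stopped, # late reporter or stopped?
    }


groups = classify_units(
    sbr_active={"A", "B", "C", "D"},
    admin_reporters={"A", "B", "E"},
    known_stopped={"D"},
)
```

Only the last group requires a decision on imputation at the time of the preliminary estimate; at the final estimate, when all admin data have arrived, this group is expected to be resolved.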

3. Admin data based STS-estimates and revisions

3.1 Components for revisions

The previous chapter mentions three major sources for revisions in an admin data based STS-

system:

1. the estimation for the small and medium sized enterprises, because the admin data are

incomplete for the preliminary first estimates but complete for the final estimates;

2. the estimation for the large enterprises, as the data of the LE-survey might be updated between

the first and final estimates;

3. the estimation of the target population, i.e. the active enterprises, for the preliminary estimates.

Taking into account these three components, Roestel (2011) proposed to perform revision analyses

on the total estimate, i.e. in most cases the published results, sub-divided by revision analyses on:

a. both data sources, i.e.

a1. the admin data based estimate for the small and medium sized enterprises; and

a2. the LE-survey estimate for the large enterprises

b. (the uncertainty in) the active population at the time of the preliminary estimates.

The basic idea behind this sub-division is that (simulations of) revisions between the preliminary

estimates and the final estimates can be used for fine-tuning an admin data based STS-production

system. More specifically, these can be used to decide whether available resources should be

concentrated on optimising:

- the estimation for missing VAT-data in the preliminary estimates (or VAT-data analysis in general);

- the estimation for the LE-survey (i.e. dealing with missing survey data or revisions to these data), or the size of the LE-survey;

- the link between the admin data and the business register.

In a production setting, this sub-division can help to find the cause for an unusually large revision

more quickly. This is especially the case if the revision is not caused by a single unit, but by a more

general problem, such as:

- drops in responses in VAT or the LE-survey; or

- the relationship with periodicity and changes in the business cycle.

Note that these three components of revisions do not cover all possible causes. For example, revisions may also be caused by changes in the SBR (corrections of erroneous NACE-codes; incorporation of mergers/split-offs in the SBR), by revised VAT or social security declarations, or by the combination of different sources. Nevertheless, these three components are normally sufficient to find the underlying causes of revisions. Chapter 6 explains how the revisions of the separate components can be calculated.
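
To illustrate the idea of sub-dividing a revision by data source, the sketch below splits the relative revision of a total level estimate into an LE-survey part and an admin data part. It is a simplified additive decomposition under the assumption that the total is the sum of the two parts; the function name and figures are hypothetical, the population component is not separated out, and the method actually used in this deliverable may differ.

```python
def decompose_revision(prelim_le, prelim_admin, final_le, final_admin):
    """Split the relative revision of the total into survey and admin data parts.

    All parts are expressed relative to the preliminary total, so that
    survey_part + admin_part equals the total relative revision.
    """
    prelim_total = prelim_le + prelim_admin
    if prelim_total == 0:
        raise ValueError("preliminary total must be non-zero")
    final_total = final_le + final_admin
    return {
        "total": (final_total - prelim_total) / prelim_total,
        "survey_part": (final_le - prelim_le) / prelim_total,
        "admin_part": (final_admin - prelim_admin) / prelim_total,
    }


# Hypothetical turnover levels for one publication cell.
parts = decompose_revision(prelim_le=600.0, prelim_admin=380.0,
                           final_le=610.0, final_admin=400.0)
```

In this example two thirds of the upward revision stem from the admin data part and one third from updates to the LE-survey.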

3.2 The complete sequence of revisions

Depending on the timeliness of the admin data and the publication deadlines, the theoretically most

extensive sequence of admin data based STS-estimates for period t would consist of the following

estimates:

1. A first preliminary estimate for period t based on a LE-survey plus model-based estimation for

small and medium sized enterprises, because the admin data are not available yet (= situation II

in Chapter 2),

2. A second preliminary estimate for period t based on a LE-survey plus an estimation for small

and medium sized enterprises based on fairly complete admin data (= situation I in Chapter 2),

3. A series of estimates based on a LE-survey and admin data gradually becoming complete,

4. A final estimate for period t based on:

- a LE-survey (which is complete and completely analysed);

- the complete (and analysed) set of admin data for small and medium sized enterprises;

- a population frame derived from an ‘analysed and corrected’ SBR for this period.

In practice, such a complete sequence covering a sufficiently long period was found only for the retail trade in Finland. This sequence covers all estimates between the first output as required by the European STS-regulation (30 days after the end of the month) and a final estimate (225 days after the month). For other activities the output obligations for the first estimates fall later, so that the admin data are already available at that time. Consequently, in Finland no such complete sequence exists for these activities, whose sequences start with the second preliminary estimate.
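
A sequence of estimates for one reference period can be examined by comparing each release with the previous one and with the final one. The sketch below assumes that releases are stored by the number of days after the end of the reference period; the release days 30 and 225 are those mentioned above, whereas the intermediate days and all index values are hypothetical.

```python
def revision_path(vintages):
    """List, per release, the revision against the previous release and the gap to final.

    vintages maps days after the end of the reference period to the estimate
    released on that day; the latest release is treated as the final estimate.
    """
    days = sorted(vintages)
    final = vintages[days[-1]]
    path = []
    previous = None
    for day in days:
        estimate = vintages[day]
        path.append({
            "day": day,
            "estimate": estimate,
            "revision_vs_previous": None if previous is None else estimate - previous,
            "remaining_to_final": final - estimate,
        })
        previous = estimate
    return path


# Hypothetical index values for one month of retail trade turnover.
path = revision_path({30: 100, 75: 103, 120: 104, 225: 105})
```

In this illustration most of the total revision occurs between the first and the second release, which is the pattern one might expect when the first release is model-based and the second relies on fairly complete admin data.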

4. Revision strategy

4.1 Introduction

The design of a revision policy is an important part of the development of any short-term statistics process. Because short-term indicators have to provide early signals of the dynamics of the economy, they are needed as quickly as possible and are usually released when only partial information is available, and are thus subject to change when more information becomes available. In setting up a revision policy an NSI has to balance the need to release more accurate indicators when updated information is available against the costs of frequent releases, both for producers and for users. As Eurostat (2012) puts it:

“Revisions are a two-sided affair from the producer’s perspective as well. The new information they provide

is needed to describe economic developments more precisely, yet, frequent and/or major revisions can

damage the credibility of the statistical data. …Both, producers and users have extra work caused by

revisions. Producers have to develop revised and new data. Users have to update their databank and to adjust

their analysis.”

And NSIs have to “… find a balance between the demands for the best statistical information at all points in

time (which then suggests a continuous revision policy) and avoiding unnecessary changes in the data.

The basic principle is that significant information for politically or economically important data should be

incorporated as quickly as possible into published data in order to avoid a wrong assessment of the economic

development, whereas minor changes should first be collected before being implemented.”

While the general principles of revision policies are well described in the OECD/Eurostat (2008) and Eurostat (2012) documents, this paper presents some specific elements for admin data based STS estimates.

4.2 Factors influencing the revision policy

While the first release is constrained by deadlines set by European Regulations or national obligations, subsequent releases are planned according to a revision policy fixed by the NSI. The factors that influence such a policy may be grouped in two categories:

1. the updates of available data sources (input), due to revisions of:

1.1 admin data;

1.2 survey data;

1.3 SBR;

1.4 other information (e.g. benchmarking);

2. publication strategy and output obligations.

Note that the first factor corresponds with the three components which determine the quality of the

admin data based STS-estimate (see chapter 2).

4.3 Updating of admin data

VAT or Social Security data for the reference period might be either incomplete or missing
altogether at the time of the first estimate. Depending on the deadline of the first estimate and the
legislation that sets the reporting obligations for firms, the admin data may be missing altogether (or
available only to a very limited extent), as illustrated in situation II (see Vlag et al. 2013,
Kavaliauskiene et al., 2013, Teneva 2012, Orchard et al. 2011), or only partly incomplete, as
illustrated by the situation B cases (see Baldi et al. 2011, Vlag et al., 2013). For subsequent estimates
of the same period (revisions) the NSI may rely on more complete datasets, up to the point where the
data for the whole population covered by the admin data are available. In situation II, a subsequent
estimate can be based on incomplete admin data. Also in situation II, the data are eventually
completed as all late reporters become available.

During the development of the statistical process, the causes and timing of the completion and
stabilization of the admin data should be analysed in order to set up a proper estimation
methodology and revision policy. A plot of the cumulative number of reporters against the
transmission dates, or an analysis of the impact of value changes across successive deliveries of the
data, is a useful tool for deciding when to revise the estimate and how long a stretch of the series
should be revised. Such data analyses not only help to determine the revision strategy, but also help in

designing the transmission schedule of admin data from the data holder to the NSI. For instance,
Italy requests two transmissions of Social Security data: one 45 days after the end of the quarter,
which is the latest moment that respects the STS deadline once processing time is taken into account,
and a second one year later, when the data are complete and stabilized (Baldi et al. 2011). The
schedule of transmission of VAT data set up by Estonia for the turnover index, at this early stage of
the new system, involves four transmissions of data at monthly intervals (Karus 2012).
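
The completion analysis described above can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: the function name `cumulative_completeness`, the list of reporting delays and the chosen checkpoints are all hypothetical and are not taken from any of the national systems discussed here. It computes, for a set of candidate revision moments, the share of the eventual reporters whose data have already arrived.

```python
def cumulative_completeness(delays_in_days, checkpoints):
    """Share of eventual reporters received by each checkpoint.

    delays_in_days: for every unit in the final (complete) dataset, the
        number of days between the end of the reference period and the
        arrival of its data at the NSI.
    checkpoints: candidate estimation moments, in days after period end.
    """
    total = len(delays_in_days)
    if total == 0:
        raise ValueError("at least one reporter is required")
    return {
        c: sum(1 for d in delays_in_days if d <= c) / total
        for c in checkpoints
    }


# Hypothetical arrival delays for 15 units (illustrative values only).
delays = [18, 20, 22, 25, 25, 31, 40, 44, 52, 58, 75, 110, 140, 200, 380]
curve = cumulative_completeness(delays, checkpoints=[30, 45, 60, 90, 420])

for day, share in curve.items():
    print(f"t+{day:>3} days: {share:.0%} of reporters received")
```

Plotting such a curve for several reference periods shows where the curve flattens, which suggests when an additional transmission or revision brings little new information.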

In general, it can be stated that the majority of the late information will become available at two or
three specific moments. More precisely:

- most data of late reporters will be available at the end of the next period, i.e. when the estimates
for the next month or next quarter have to be produced;

- in the case of monthly output, missing data due to quarterly reporting become available after
the end of the quarter. More specifically, quarterly admin data become available 20-40 days
after the quarter, since the reporting deadlines for submitting VAT and/or social security data
for monthly and quarterly periods are in general 20-40 days after the reference period
(Vlag et al., 2013);

- the remaining missing information, from the annual VAT and employment reporters, becomes
available x months after the end of the year. When these missing data are received, the admin
dataset can be considered complete.

Besides completing the data, subsequent transmissions of admin data may contain changed
values for units already present in earlier transmissions. This may happen either because the
firm has amended previously declared data or because the administrative institution has adjusted
the data following its checking procedures. The timing of revisions to previously reported values
is, in general, more erratic than the timing of the completion of the missing admin data.
Therefore, analyses of the completion and stabilization of the admin data over time are extremely
important in the development and production phases, even though the most important moments of
the completion of the data can be estimated beforehand.

The consequences for developing a revision strategy are as follows:

- the first estimate is defined by output needs, i.e. European regulations or national requirements.
If an incomplete set of admin data is available, then:

  - an obvious timing for the first revision is the publication moment of the next period,
  because the majority of the late reporters are available then;

  - an obvious timing for another revision (in the case of monthly publication) is the publication
  moment of the quarterly results, because the data of the quarterly reporters are available
  then;

  - an obvious timing for the last revision is when the information from the yearly reporters
  becomes available.

The final estimate (on which the last revision is based) is considered the most accurate estimate and
therefore serves as the reference.

If no or very limited admin data are available at the time of the first estimates (situation II), an
obvious additional revision moment is the first publication with an incomplete admin data set
(situation I).

This basic revision scheme can, however, be refined on the basis of analyses of the completion and
stabilization of the data and of the other factors mentioned above that determine the revision
strategy, such as the large enterprise survey, updates of Business Register information, benchmarking
and output obligations.

4.4 Updating of the survey data

Since the general set-up when using admin data to produce STS is a combination of a
survey for large enterprises and admin data for small and medium-sized enterprises, updates in the
survey data (new or revised responses) play an important role in the size of revisions. However, in
contrast to admin data, the NSI has much more control over the timing of these updates because it
controls the data collection. This data source therefore has less impact on the revision strategy
than the availability of admin data.

When carrying out revision analysis, it is however important that the original survey data (used for

the first estimates and other estimates) remain stored. This recommendation seems obvious but, in

practice, the revision analyses presented in Chapter 6 of this deliverable were hampered by the fact

that some countries do not preserve the ‘original’ survey data. The ESSnet project has also observed

that – like the admin data part – the large enterprises survey is sometimes not complete when the

first estimates have to be made. Therefore, it is recommended to analyse the completion and
stabilization of the large enterprise survey when developing an admin data based STS system.
Improving the data collection and data treatment system for the few remaining surveyed
enterprises may considerably improve the quality of the output. Some examples will be given in
Chapter 6.

4.5 Updating of the SBR (and population changes)

The role of the SBR is crucial for the definition of the target population. Some countries involved in
the ESSnet use, for the current period t, the latest available frozen version of the SBR, a file that is
normally released yearly. Typically this file is released in the last part of the year
(October-November) and is used for the subsequent year, both for sampling the monthly to yearly
surveys and for defining the target population for that year. Another important point is whether the
frozen SBR file is time-referenced or not, that is, whether the file refers to a particular moment or
period of time (e.g. a year), indicating that the information represents well the population of
enterprises active at that moment or during that period. In Finland it refers to the previous year, and
in Italy, where the file is released at the beginning of the year, to two years earlier. Other countries,
like the Netherlands, use a current version of the SBR, i.e. the population for period t corresponds to
the list of enterprises in the SBR for period t.

Within admin data based statistical processes, the actual use of the (frozen version of the) SBR,
however, varies among countries. In Lithuania and Estonia, beyond being used to sample the survey
part, it establishes the population frame for the coming year. Only changes affecting big

enterprises (entries into the sector/births, exits/deaths or spurious demographic events such as those
implied by mergers or demergers of enterprises) are taken into consideration during the year. In
Estonia, in the current phase of testing, units identified as dead according to the administrative
sources are removed from the target population for the admin data part (Maasing 2012). On the other
hand, Italy and Germany use the SBR only as a starting point to define the target population of the
reference period, adding to it the units born during the year and removing those deemed dead (see
Baldi et al. 2011, Lorenz 2011). The characteristics of these approaches are discussed elsewhere
(De Waal et al. 2012, Maasing et al. 2012).

What matters here are the implications for the revision policy. Due to time lags, the
registration of starters, stoppers, mergers and split-offs is timed differently in the admin data
sources. This time-lag effect increases when an older frozen version of the SBR is used as population
frame. However, it also exists, although to a smaller extent, when a current version of the SBR is used.
In deliverable 4.1 of the ESSnet on AdminData (Vlag et al., 2013) and in Chapter 2.3 it is argued that
this time-lag effect leads to uncertainty in the determination of the provisional active population and
that this uncertainty is a major source of revisions.

The implication for the revision strategy is that the final revision should be based on a complete set

of admin data plus the final version of the SBR for this period. Ideally, it is recommended that the

intermediate revisions also correspond with updates of the SBR information.

4.6 Benchmarking

The availability of estimates at a lower temporal frequency (e.g. yearly or quarterly)
may make it necessary to revise the estimates at the higher frequency (quarterly or monthly) in order
to produce consistent estimates. For instance, the release of quarterly data on turnover in retail trade
is used in Estonia and the Netherlands to benchmark the monthly estimates. In some cases the
benchmark to annual estimates (e.g. SBS) involves an adjustment to a more appropriate population of
enterprises. In other countries, such as Italy, the STS indicators are not revised to achieve
consistency with the SBS indicators.

Whatever the practice of the country, it is recommended that, if benchmarking is applied, its timing
is aligned with the revision strategy. More specifically, when monthly results are benchmarked to
quarterly information, the benchmarking should preferably coincide with the incorporation of the
quarterly admin data into the published estimates. When the STS series are benchmarked to annual
information, the benchmarking should preferably coincide with the incorporation of the annual admin
data (and the version of the SBR) into the published STS estimates. Of course, this ideal situation
should be balanced against publication obligations (see Chapter 4.7).
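
To make the effect of benchmarking on revisions concrete, the sketch below applies simple pro-rata benchmarking: the three preliminary monthly values of a quarter are scaled by a common factor so that they add up to the quarterly benchmark. This is an assumption made for illustration only; the text does not state which benchmarking method Estonia or the Netherlands apply, and the function name `prorata_benchmark` and all figures are hypothetical.

```python
def prorata_benchmark(monthly, quarterly_total):
    """Scale three monthly values so that they sum to the quarterly benchmark."""
    if len(monthly) != 3:
        raise ValueError("expected exactly three monthly values")
    preliminary_total = sum(monthly)
    if preliminary_total == 0:
        raise ValueError("cannot scale monthly values that sum to zero")
    factor = quarterly_total / preliminary_total
    return [m * factor for m in monthly]


# Hypothetical preliminary monthly turnover and a later quarterly benchmark.
monthly_preliminary = [100.0, 110.0, 90.0]
quarterly_benchmark = 306.0

benchmarked = prorata_benchmark(monthly_preliminary, quarterly_benchmark)
# Revision of each monthly level caused purely by the benchmarking step.
revisions = [b - m for b, m in zip(benchmarked, monthly_preliminary)]

print([round(b, 2) for b in benchmarked])
print([round(r, 2) for r in revisions])
```

Because every month in the quarter is revised at once, a benchmarking step of this kind is easiest to communicate when it coincides with a scheduled revision. Pro-rata scaling can create a step between the last month of one quarter and the first month of the next, which is why movement-preserving methods such as Denton's are often preferred in practice.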

4.7 The second component: publication strategy and output obligations

Beyond the availability of more updated information, output obligation and publication strategy

also determine the revision policy.

Firstly, the actual needs of users matter. Setting up the schedule of revisions often implicitly raises
questions such as how frequent revisions should be in order to be useful for analysts and other
users, and what maximum delay is acceptable for the final estimate.

Secondly, the revision schedule may be influenced by the fact that some statistics are inputs for
other statistics (for instance National Accounts). In this case, the NSI may set up common revision
policies for indicators used for other statistics. See for instance ONS’s Revisions and Corrections
Policy (http://www.ons.gov.uk/ons/guide-method/revisions/revisions-and-corrections-policy/index.html).

Thirdly, the relationship with the admin data holder may influence the planning of revisions, since it

may only be feasible to ask for the data for one period a few times.

Finally, the workload necessary to manage (organise, process, store) multiple versions of micro-
and macro-data should not be underestimated. This remark seems obvious, but one of the most
important conclusions of this work was that, although several NSIs have implemented an admin data
based STS system, only a few countries have organised this system in such a way that complete
revision analyses can be carried out.

4.8 Examples and considerations

The following examples show how the revision policy is affected by the available information.

In Lithuania, the monthly indicators of income for the Retail trade are released only twice, at 27-28

days after the end of the reference month and at about 58 days. The second release incorporates late

respondents and revised values from the surveys and also the VAT data for the reference month,

which were not available for the first estimate (Table 1). In the same country, the income indicator

for Manufacturing is released three times: the first estimate is released at t+21 or t+22 days, based

only on survey data; the second at t+51 also incorporates the admin data for the reference month;

the third estimate, released at the end of the year/beginning of the following year incorporates

updated information (Table 2).

In Estonia, the new methodology for the estimates of monthly indicators on turnover contemplates

up to six estimates (Table 3). Considering the available information, three versions of preliminary

estimates are released: the first estimate is released 30 days after the reference month and is based

on the first version of the survey data and the first transmission of admin data (obtained at t+24

days). At 60 days a second version of the estimates is released, on the basis of survey updates and a

second admin data transmission (obtained at t+54 days). A third version of the estimates is available

90 days after the reference month, which uses only updated survey information because the VAT is

complete and checked after t+60 days. Further versions of the preliminary estimates are also

calculated as a consequence of benchmarking to quarterly estimates (implemented in February,

May, August and November), which are based on a larger survey.

Italy releases five estimates of the quarterly index on number of employees, the first at 60 days from

the end of the period and the others each quarter, up to one year later. The final estimate

incorporates the second (and last) transmission of Social Security data. The rationale for this

transmission schedule is that the first transmission is already almost complete and the data can be

considered definitive after one year. These facts, coupled with the lengthy and labour-intensive

processing needed to derive the requested indicators, advised against requesting further intermediate
transmissions. Both the release of a newer version of the SBR and the revision of the large enterprise
monthly survey, which affects all the months of the previous year, are incorporated once a year in the
May release (Table 4).

It is worth mentioning that the Lithuanian and Estonian revision policies are comprehensive
and well described but do not include a final revision that takes into account a new version of the
SBR as revised population frame. The inventory of practices of admin data based STS systems
revealed that uncertainty about the active population is a major source of revisions. Since the final
estimate is treated as the accurate one in revision analyses and revision policies, it is
recommended that in new systems the final estimate include updated information on the
population, i.e. from the business register.

Table 1 - Lithuania: revision policy in the monthly income estimates (retail trade)

Table 2 - Lithuania: revision policy in the monthly income estimates (manufacturing)

Table 3 - Estonia: revision policy in the monthly turnover estimates (retail trade)

Table 4 - Italy: revision policy in the quarterly employment estimates

5. A complete sequence of revisions: an example from Finland

The monthly retail trade survey of Statistics Finland is mainly based on VAT data. Only the largest
enterprises are surveyed, plus some smaller enterprises which are considered to be crucial for the
estimates. However, VAT data arrive too late for the first estimates. As a result, the first estimates
for month t are produced without VAT, using only a survey of large enterprises. The
consecutive turnover growth rates for month t are determined as follows:

- A first estimate is provided 30 days after the end of the month. It is based on unweighted
survey results of about 100 large enterprises. The estimate is made for the highest aggregates
only.

- A second estimate is provided 45 days after the end of the month. It is based on unweighted
survey results of about 280 large enterprises. This estimate is used for high and some lower
aggregates. The selected 280 enterprises cover about 70 % of total turnover.

- VAT is used for the estimates provided 75 days after the end of the month. At this stage all
aggregates are published. The estimation techniques are described by Maasing et al., 2013.

- The results are revised up to 225 days after the end of the month. The ‘225 days’ estimate is
considered final, because the VAT information and the “Business Register” information about the
population are complete.

The first two estimates are based on a simple model: “growth rate of the large enterprises = growth
rate of the entire target population”. In formulae:

\[
G_{t,t-1}^{SE+MLE} = C \cdot G_{t,t-1}^{MLE}, \qquad G_{t,t-1}^{MLE} = \frac{Y_{t}^{MLE}}{Y_{t-1}^{MLE}}
\]

where Y denotes turnover, G the growth rate between months t-1 and t, MLE the surveyed large
enterprises and SE+MLE the entire target population, and C is a constant factor (C = 1 if the two
growth rates are taken to be exactly equal).

The results of the estimates a) after 30 days, b) after 45 days, c) after 75 days and d) after 225 days
are shown in Figures 3 and 4. Figure 3 focuses on the revisions between the first two preliminary
estimates and the third (first admin data based) estimate for month t. Figure 4 focuses on the
revisions between the second preliminary estimate, the first (incomplete) admin data based
estimate and the final (complete) admin data based estimate.

Figure 3 shows that, for the retail trade, the t+30 and t+45 days estimates are quite similar, since
their revisions with respect to the t+75 days estimate are quite similar. In general the t+45 days
estimates are closer to the t+75 days estimate than the t+30 days estimates. This was expected, as
the sample is larger at t+45 days. No systematic bias was detected between the t+30 and t+45 days
estimates on the one hand and the t+75 days estimate on the other. Nor was a relationship
between growth and revision detected. These revision analyses therefore indicate that the t+30 and
t+45 days estimates are satisfactory, taking into account the small sample size. This demonstrates
that, for high aggregates, first estimates can be produced with a small survey and a simple model,
provided that the necessary assumptions of the latter (in this case: the short-term
movement of growth of large enterprises is correlated with the growth of all enterprises) are
fulfilled.
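
The simple model behind the first two estimates can be illustrated with a short sketch. It is not the production method of Statistics Finland; the function names, the optional factor `c` and all figures are hypothetical. It takes the unweighted turnover of the surveyed large enterprises in two consecutive months, derives their growth rate, and carries the previous total for the whole target population forward with that growth rate.

```python
def growth_rate(current, previous):
    """Growth rate expressed as the ratio of current to previous level."""
    if previous == 0:
        raise ValueError("previous level must be non-zero")
    return current / previous


def first_estimate_total(le_current, le_previous, total_previous, c=1.0):
    """Carry the previous total forward with the large-enterprise growth rate.

    le_current, le_previous: turnover of the same surveyed large enterprises
        in month t and month t-1 (unweighted, as in the simple model).
    total_previous: turnover of the entire target population in month t-1.
    c: constant factor; 1.0 means the two growth rates are assumed equal.
    """
    g_le = growth_rate(sum(le_current), sum(le_previous))
    return total_previous * c * g_le


# Hypothetical panel of three large enterprises.
le_previous = [50.0, 30.0, 20.0]
le_current = [52.0, 31.0, 20.0]
estimate = first_estimate_total(le_current, le_previous, total_previous=400.0)

print(round(estimate, 2))
```

The sketch makes the weakness of the approach visible: a single erroneous or missing response in the small panel changes the growth rate that is applied to the whole population.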

Figure 4 shows that the growth rates after 75 days (the first ones produced with VAT) are
systematically higher than the estimates after 225 days, i.e. they are biased upwards. This is caused
by uncertainty about the active population 75 days after the month, which stems from time lags in
the VAT and Business Register information about starting and stopping enterprises and leads to
uncertainty about which missing VAT data need to be imputed (Maasing et al., 2013). The results
also show large revisions in the summers of 2010 and 2011. Further analyses have revealed that
these large revisions are related to low survey response in the summer periods. This example
illustrates a major drawback of this estimation method: since a small dataset is grossed up to the
entire target population, errors and irregularities in this dataset may easily lead to errors in the
estimates. Hence, it is recommended to check the (few) available data thoroughly if they are used for
a temporary estimate of many small enterprises.

Figure 3 Growth rates derived from a) LE-estimate 30 days after the end of the month, b) LE-estimate 45
days after the end of the month based on a larger sample, c) LE + VAT estimate 75 days after the end of
the month using incomplete VAT-data. Revisions are shown in bars below.

Figure 4 Growth rates derived from a) LE-estimate 45 days after the end of the month, b) LE + VAT
estimate 75 days after the end of the month and c) a final estimate 225 days after the end of the month,
when the VAT-data and information about the population are complete.

These series provide useful information about revisions and the quality of the series. The results

generally agree with the observations of deliverables 4.1, 4.2 and 4.3 of the ESSnet Admin Data.

However, they also reveal additional issues (slight structural bias in t+45 day estimate; effect of

lower responses in summer) that were only detected afterwards. In the next chapter, the approach of

structural revision analyses and an accompanying revision sheet is presented, in order to detect

larger revisions and underlying causes at an earlier stage.

Variable            Domain         Vintage  No. of occurrences  Period       MAR   RMAR  MaxAR  MeAR  MR    % > 0  % < 0  MeR   SDR  Range  % Sign(Later)=Sign(Early)
Turnover            Retail trade   t+75     36                  Jan09-Dec11  0.7   0.2   1.4    0.7   -0.7  0.0    100.0  -0.7  0.3  1.3    94.4
Turnover            Construction   t+75     36                  Jan09-Dec11  1.5   0.1   6.3    1.5   -1.0  8.3    91.7   -1.2  1.6  9.4    94.4
Wages and salaries  Retail trade   t+45     36                  Jan09-Dec11  0.8   n.a.  1.6    0.8   -0.8  0.0    100.0  -0.8  0.3  1.6    n.a.
Wages and salaries  Whole economy  t+45     36                  Jan09-Dec11  0.8   n.a.  1.7    0.7   -0.8  0.0    100.0  -0.7  0.3  1.5    n.a.

Note: in the original layout the measures MAR to Range are grouped under the headings Size, Direction
and Variability; the last column is headed "Impact on growth rates direction".

Table 5 - Summary statistics on revisions - Finland
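
Summary measures of the kind reported in Table 5 can be computed from two vintages of the same growth-rate series. The sketch below is an illustration, not the code used for the table. It assumes the usual reading of the acronyms (MAR mean absolute revision, RMAR relative mean absolute revision, MaxAR maximum absolute revision, MeAR median absolute revision, MR mean revision, MeR median revision, SDR standard deviation of revisions), defines the revision as later minus earlier estimate, computes RMAR as the sum of absolute revisions divided by the sum of absolute later estimates, and uses the population standard deviation. These definitions and the input figures are assumptions.

```python
from statistics import mean, median, pstdev


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def revision_summary(early, later):
    """Summary measures of the revisions R = later - early for one domain."""
    if len(early) != len(later) or not early:
        raise ValueError("need two non-empty series of equal length")
    revisions = [l - e for e, l in zip(early, later)]
    absolute = [abs(r) for r in revisions]
    n = len(revisions)
    later_size = sum(abs(l) for l in later)
    return {
        "occurrences": n,
        "MAR": mean(absolute),
        "RMAR": sum(absolute) / later_size if later_size else None,
        "MaxAR": max(absolute),
        "MeAR": median(absolute),
        "MR": mean(revisions),
        "pct_positive": 100 * sum(r > 0 for r in revisions) / n,
        "pct_negative": 100 * sum(r < 0 for r in revisions) / n,
        "MeR": median(revisions),
        "SDR": pstdev(revisions),
        "Range": max(revisions) - min(revisions),
        "pct_same_sign": 100 * sum(sign(e) == sign(l)
                                   for e, l in zip(early, later)) / n,
    }


# Hypothetical growth rates (in %) for four periods, two vintages.
early = [2.0, -1.0, 3.5, 0.5]
later = [1.5, -1.5, 3.0, -0.5]
summary = revision_summary(early, later)

for name, value in summary.items():
    print(name, value)
```

A mean revision close in size to the mean absolute revision, combined with a share of negative revisions near 100 %, is the pattern that signals a systematic upward bias in the earlier vintage.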

6 A structural way to analyse revisions: an example from Italy

6.1 Introduction

A structured way to analyse revisions is proposed in this section. It takes into account the general

context as described in chapters 1 to 5. These analyses focus on:

1. the revision between the first and the last estimate, although the method can be used between

any two vintages;

2. the original series (not adjusted for calendar or seasonal factors);

3. the growth rate as the target parameter.

The framework presented here aims to provide a concrete example of structural revision analyses
along the lines sketched in the previous chapters. It is a continuation of the work of Roestel
(2011): although the example is worked out for social security data in Italy and VAT data
in Estonia, it essentially continues the attempt to formalise VAT analyses in the
Netherlands and Germany (Roestel, 2011). More specifically, it provides the tools to assess the size,
systematic nature and plausibility of revisions and to direct further efforts to understand the causes
of revisions, splitting the revision into the part due to admin data and the part due to survey data.
The proposed framework follows a top-down approach, as illustrated in Figure 5.

The aim of this example is not only to provide a concrete example of revision analyses and show

which factors in the statistical process may lead to revisions, but also to demonstrate which data and

time-series need to be stored to perform structural revision analyses. The importance of the latter

remark must not be under-estimated, because several admin data STS systems exist where there is

no opportunity to perform structural (automatic) revision analyses.

6.2 General outline: 5 steps

In the first step, general information on the target parameters and their estimation procedures is
summarised in a pre-defined form, in order to provide the context for the analysis of
revisions (step 1).

In the second step, synthetic measures of revisions are reported for all the publication domains (step

2). From this bird’s eye view of the aggregate revisions of all the publication domains, the analysis

drills down in two directions. The first direction (step 3) aims at deepening the analysis only on

domains that have shown problems in step 2 through:

a) a time-series representation of the revisions; and

b) decomposition, showing the contributions of admin and survey data.

The second direction aims at detecting, at the level of the estimation methodology, the critical issues
that may have influenced revisions (step 4), either with cross-domain analysis or, again, by drilling
down into specific domains. Being method-specific, the appropriate analysis for this second
direction can only be chosen in the specific context. In the final step (step 5) a cause and effect

report is prepared in order to summarise the most relevant issues which have emerged from the

previous analyses.
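
The decomposition in step 3 b) can be sketched as follows. This is one possible decomposition, chosen here for illustration and not necessarily the one used in the Italian application: the growth rate of the total (taken as relative change) is written as the sum of the contributions of the survey part and the admin part, and the revision of the growth rate is then the sum of the changes in these contributions between two vintages. Function names and figures are hypothetical.

```python
def growth_contributions(current, previous):
    """Contribution of each component to the growth rate of the total.

    current, previous: dicts mapping component name to level in period t
    and period t-1. The contributions sum to total_t / total_{t-1} - 1.
    """
    total_previous = sum(previous.values())
    if total_previous == 0:
        raise ValueError("previous total must be non-zero")
    return {k: (current[k] - previous[k]) / total_previous for k in current}


def decompose_growth_revision(early, later):
    """Split the growth-rate revision (later - early) by component."""
    c_early = growth_contributions(early["current"], early["previous"])
    c_later = growth_contributions(later["current"], later["previous"])
    return {k: c_later[k] - c_early[k] for k in c_later}


# Hypothetical levels for one domain in two vintages.
early = {"previous": {"survey": 200.0, "admin": 800.0},
         "current": {"survey": 204.0, "admin": 820.0}}
later = {"previous": {"survey": 200.0, "admin": 810.0},
         "current": {"survey": 205.0, "admin": 822.0}}

parts = decompose_growth_revision(early, later)
g_early = sum(growth_contributions(early["current"], early["previous"]).values())
g_later = sum(growth_contributions(later["current"], later["previous"]).values())

print({k: round(v, 5) for k, v in parts.items()})
print(round(g_later - g_early, 5))
```

In this invented example almost all of the downward revision comes from the admin part, which would direct the further analysis of step 4 towards the treatment of late or amended admin records.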

This framework is intended as a monitoring tool, for internal control and not for publication, which
can provide useful feedback for process management and for evaluating the estimates with a view
to future improvements.

In this context, the purpose of the analysis framework is threefold. Firstly, the analysis of revisions
can be useful in experimental contexts to evaluate a tentative methodology and, where relevant, to
compare variants. Secondly, even in current production contexts it can be a useful tool to monitor the
performance of the estimates, through the analysis of the characteristics and causes of revisions.
Thirdly, applying these analyses to subsequent deliveries of admin data referring to the same period
should help to evaluate the gains, in terms of revisions, from successive deliveries and to design a
revision policy.

The proposed analysis requires not only that all vintages of macro data are stored, but also, for the
analyses in steps 3 and 4, that the data situation of the different sources can be replicated at
different points in time. In other words, the database system should be able to track changes in
the micro data (value substitutions, availability of late units, changes in the SBR, etc.).
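
One way to meet this storage requirement is an append-only store in which no value is ever overwritten, so that the data situation at any past date can be rebuilt. The sketch below is a minimal illustration under that assumption; the class `VintageStore`, its methods and the example records are hypothetical and do not describe the database of any NSI.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    unit_id: str      # identifier of the enterprise
    period: str       # reference period, e.g. "2011Q1"
    received: date    # date on which the value reached the NSI
    value: float      # reported value of the target variable


class VintageStore:
    """Append-only store: amended values are added, never overwritten."""

    def __init__(self):
        self._records = []

    def add(self, record):
        self._records.append(record)

    def as_of(self, period, cutoff):
        """Latest known value per unit for a period, as seen on the cutoff date."""
        latest = {}
        for r in sorted(self._records, key=lambda rec: rec.received):
            if r.period == period and r.received <= cutoff:
                latest[r.unit_id] = r.value   # later deliveries replace earlier ones
        return latest


store = VintageStore()
store.add(Record("A", "2011Q1", date(2011, 5, 15), 100.0))
store.add(Record("B", "2011Q1", date(2011, 5, 15), 50.0))
store.add(Record("A", "2011Q1", date(2011, 9, 1), 105.0))   # amended value
store.add(Record("C", "2011Q1", date(2012, 4, 10), 20.0))   # late reporter

first_vintage = store.as_of("2011Q1", date(2011, 5, 31))
final_vintage = store.as_of("2011Q1", date(2012, 5, 31))
print(first_vintage)
print(final_vintage)
```

Comparing the two snapshots unit by unit separates the revision caused by late reporters (unit C) from the revision caused by amended values (unit A), which is the distinction the analyses in steps 3 and 4 rely on.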

Figure 5. The revisions analysis framework

Step 1 - Context information:
- Target parameters.
- Data sources.
- Estimation procedures.

Step 2 - Revision measures (first/last estimate): synthetic measures for publication domains

Step 3 - Analysis of problematic domains: graphical analysis; revision decomposition (survey/admin)

Step 4 - Further analysis: estimation methodology (country specific)

Step 5 - Cause and effect analysis:
- Legislation changes.
- Sudden (unexpected) drop in data depending on?
- External original source revision of microdata.
- Internal processing revision of microdata.
- Changes in methodology.
- Reclassification/errors in Nace code.
- Others.

Chapter 6.3 shows how this framework is used for quarterly estimates of the number of employees

based on Social Security in Italy (Baldi et al. 2012). An additional example on the use of VAT for

turnover estimates for the retail trade in Estonia is shown in Appendix 1. A further application is

found in Langford and Teneva (2012). Note that these examples are presented as case studies and

the framework can be adapted to a more general situation.

6.3 Step 1: context information – example Italian employment data

The purpose of the ‘context information’ section is to provide context information and metadata to

understand the analysis of revisions that follows.

This template and the corresponding analyses are designed to study the revisions per indicator

(same target variable, same source, same periodicity, same deadline). Therefore, a separate form for

each indicator should be compiled. The template is filled in with the Italian data in Table 6.

However, it is relatively straightforward to complete this form for admin data based statistics (VAT

or social security data) in other countries. In Annex 1 this example has been filled in for Estonia.

General Information

Indicator: Number of employees
Target domains: All divisions of the B-N (Nace Rev.2) aggregate
Periodicity: Quarterly
Number of routine revisions: 1 (q-4)
Deadline of the first estimate: 60 days
Release of the first estimate: 60 days
Release of the second estimate: 150 days
Release of the third estimate: 240 days
Release of the fourth estimate: 330 days
Release of the final estimate: 420 days

Sub-populations

Large enterprises: Enterprises with 500 employees or more
Share of LEs in terms of target variable (i.e. turnover or employment): 20% of total employment
Small and medium enterprises: Enterprises with fewer than 500 employees
Share of SMEs in terms of target variable (i.e. turnover or employment): 80% of total employment

First estimate

Large enterprises
Use of survey: Yes, census, 1400 enterprises
% of survey respondents on final data: 84.2% (2011 average value)
% of target variable on final data: 87.7% (2011 average value)
Use of admin data: No
% of data reporters on final data:
% of target variable on final data:
Estimator: Enumeration of available data + imputation of missing units

Page 28: SE OF ADMINISTRATIVE AND ACCOUNTS DATA IN BUSINESS STATISTICS 2011... · 1 essnet use of administrative and accounts data in business statistics wp4 timeliness of administrative sources

28

Small and medium enterprises

Use of survey No

% of survey respondents on final data

% of target variable on final data

Use of admin data Yes: direct use of almost complete data

% of data reporters on final data 98.15% (2007-2010 average value)

% of main variable on final data 98.14% (2007-2010 average value)

Estimator Enumeration of available data + imputation of

missing units.

Combined estimate between LEs and

SMEs

Sum of LEs and SMEs estimation results by domain

Notes In the imputation procedure, the list of active non

reporting units is predicted through adjacent

reporting. Employment values are imputed using

E(t)/E(t-1) calculated by domains and on panel

reporting units

Final estimate

Large enterprises

Use of survey Yes

% of survey respondents on final data 96.8% (2011 average value)

% of target variable on final data 98% (2011 average value)

Use of admin data No

% of survey respondents on final data

% of target variable on final data

Estimator Enumeration of available data (included late

respondents not included in the preliminary

estimate) + imputation of missing units on the basis

of a deterministic approach

Small and medium enterprises

Use of survey No

% of survey respondents on final data

% of target variable on final data

Use of admin data Yes

% of data reporters on final data 100%

% of main variable on final data 100%

Estimator Enumeration of available data

Combined estimate between LEs and

SMEs

Sum of LEs and SMEs estimation results by domain

Notes

Table 6. Context information, as filled in for quarterly employment estimates in Italy.

In relation to the framework, it is worth emphasising that the subsection on “sub-populations” should list each sub-population of enterprises for which there is a significant difference in data source (survey or admin data) or in methodology.

Routine revisions in this template are defined as all the regular revisions scheduled according to the

revision policy. Occasional revisions, due to a change in methodology or unexpected events, are not

mentioned in this part of the framework. If a country is still developing an STS-production system

which uses admin data, this template may be useful to develop a tentative revision policy.

All information relevant to understanding the methodology, especially with regard to the analysis of revisions, should be reported under the line ‘notes’.

The Italian example in Table 6 is largely self-explanatory; a summary with some further details follows. The Italian employment estimates are obtained with a mixed mode

approach where the population of enterprises with at least 500 employees is covered by a traditional

survey and the population of small and medium enterprises is covered with Social Security data

(Baldi et al. 2011). The large enterprises survey (LE-survey) is a monthly panel survey that follows

all the enterprises that had at least 500 employees in the base year regardless of whether their size

remained over the threshold since that year. In order to cover the population of enterprises not

represented by the survey, a complementary list of enterprises is built from the Social Security data

in the base year and maintained in every quarter by adding the firms that entered the population. In

every period, the estimate is produced by adding the estimated employment of the LE-survey to the

estimated employment derived from the admin data. The preliminary first estimates of the LE-

survey are obtained by adding imputed values of the non-reporters to the enumeration of the

reporters. Once a year the LE-survey data of the previous 12 months are revised, replacing the

imputed values with the reported values.

The estimates for the population of SMEs covered by the admin data, shown in the example, are obtained with an imputation procedure much like the one used in Germany for the turnover estimates based on VAT data. It is a current population methodology in which the STS-estimate for the target population of enterprises in month t is approximated by the values of the early reporters for that month, plus imputed values for the enterprises assumed to be active in that month but not reporting. The list of these missing units is formed using the following deterministic rules:

- In each of the first two months of a quarter, a unit is declared missing (that is, assumed to be active) if it has reported for both the month before and the month after the reference month.

- For the third month of the quarter, for which there is no information about the following month, the unit is assumed to be active if it has reported the month before.
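
The two rules above, together with the growth-rate imputation that the text goes on to formalise, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production procedure used for the Italian estimates: the function names `missing_units` and `impute`, the dictionary layout (`reports[month][unit] = employment`) and the toy figures are all hypothetical.

```python
def missing_units(reports, t, third_month_of_quarter):
    """Units assumed active but not reporting in month t (hypothetical helper).

    reports maps a month index to {unit_id: reported employment}.
    """
    early = set(reports.get(t, {}))
    before = set(reports.get(t - 1, {}))
    if third_month_of_quarter:
        # No information on month t+1 yet: reporting in t-1 is enough.
        candidates = before
    else:
        # The unit must have reported both the month before and the month after.
        candidates = before & set(reports.get(t + 1, {}))
    return candidates - early


def impute(reports, t, third_month_of_quarter):
    """Impute missing units with the growth rate t-1 -> t of the panel units."""
    panel = set(reports[t]) & set(reports[t - 1])  # reporters in both months
    growth = (sum(reports[t][j] for j in panel)
              / sum(reports[t - 1][j] for j in panel))
    return {i: reports[t - 1][i] * growth
            for i in missing_units(reports, t, third_month_of_quarter)}


# Toy data (invented): unit "C" reports in months 1 and 3 but not in month 2.
reports = {
    1: {"A": 100, "B": 50, "C": 20},
    2: {"A": 110, "B": 55},
    3: {"A": 110, "B": 55, "C": 22},
}
imputed = impute(reports, t=2, third_month_of_quarter=False)
preliminary_total = sum(reports[2].values()) + sum(imputed.values())
```

In this toy case the panel units grow by 10% between months 1 and 2, so unit "C" is imputed at roughly 22 and enters the preliminary total alongside the two early reporters.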

The employment value is imputed by applying the growth rate between month t-1 and month t (of the units present in both months) to the value of employment for the missing unit at time t-1. In formulae, the provisional population of units $P_t^{p}$ is composed of the population of early reporters $P_t^{er}$ and the population of units defined as missing reporters $P_t^{mr}$:

$$P_t^{p} = P_t^{er} \cup P_t^{mr} \qquad [1]$$

The above deterministic rules imply that a unit $i$ is defined to belong to the population of missing units for the month $t$:

$$i \in P_t^{mr} \quad \text{if} \quad i \in P_{t-1}^{er} \ \text{and} \ i \in P_{t+1}^{er}$$

when t is the first or second month of the quarter;

$$i \in P_t^{mr} \quad \text{if} \quad i \in P_{t-1}^{er}$$

when t is the third month of the quarter.

The value imputed for a missing unit is therefore:

$$\hat{y}_{it} = y_{i,t-1} \, \frac{\sum_{j \in P_t^{er} \cap P_{t-1}^{er}} y_{jt}}{\sum_{j \in P_t^{er} \cap P_{t-1}^{er}} y_{j,t-1}} \qquad [2]$$

6.4 Step 2: revision measures – example for Italian employment data

This step is the heart of the revision analysis, as it provides a set of summary statistics about the revisions for all the publication domains. The revisions are meant to be evaluated on the year-on-year growth rates.

The literature indicates a variety of summary measures (see for instance Di Fonzo, 2005; McKenzie and Gamba, 2008). The most commonly used are examined in the summary table presented with Section 6.5 (the table captioned ‘Summary statistics on revisions’). The summary measures are classified in four groups, depending on the kind of information they provide (size, direction, variability, impact on sign of growth rates). A detailed description of these summary statistics, together with their formulae, is provided in Appendix 2.

The aim of these revision measures is to provide two kinds of indication:

- a general idea of the size of the revisions, and a check on whether there is anything systematic in them;

- the identification of domains that need in-depth analysis. The analyst can define rules and thresholds to identify the problematic domains. One can also have a pair of thresholds for each indicator: a first to indicate mild problems and a second to indicate severe problems.

The table has been filled in with the results on the Italian estimates for 10 selected domains only. The most critical values of the summary statistics are highlighted in yellow.

The main points highlighted by the analysis are:

1. Overall the revisions are quite limited in size: in terms of Mean Absolute Revision they range between 0.1 and 0.3, with the exceptions of division 10 (0.4) and division 81 (0.8). In terms of Median Absolute Revision they do not exceed 0.3, with the only exception of division 81 (0.6).

2. The analysis of direction measures shows that the revisions are slightly but systematically positive, implying a small under-estimation in the first estimates.

3. On the basis of the Max Absolute Revision and the size of the domain, three cases are chosen for a more detailed analysis: division 10 (manufacture of food products), division 41 (construction of buildings) and division 81 (services to buildings and landscape activities).

The first conclusion from these results is that the first estimates tend to be too low, which is useful information when publishing the results.

More detailed analyses, relating this under-estimation to:

- the large enterprise survey;

- the estimation procedure; and/or

- the uncertainty in the provisional target population

are discussed in the following sections and are of course crucial to improving the system (or comparing systems across countries). The second conclusion from these results is that divisions 10, 41 and 81 are weak domains, which is again useful information when publishing the results.
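
The summary measures and the two-threshold rule described in this section can be illustrated with a short Python sketch. The definitions below are the ones commonly used in the revisions literature and are an assumption here: the exact formulae are those of Appendix 2, which may differ (for example in the choice between sample and population standard deviation). The function names, the threshold values and the toy growth rates are hypothetical.

```python
from statistics import mean, median, pstdev


def revision_summary(early, later):
    """Summary measures for revisions R_t = later_t - early_t.

    early / later are year-on-year growth rates (in %) for the same periods.
    """
    rev = [l - e for e, l in zip(early, later)]
    abs_rev = [abs(r) for r in rev]
    n = len(rev)
    return {
        # Size
        "MAR": mean(abs_rev),
        "MeAR": median(abs_rev),
        "MaxAR": max(abs_rev),
        # Direction
        "MR": mean(rev),
        "MeR": median(rev),
        "pct_positive": 100 * sum(r > 0 for r in rev) / n,
        "pct_negative": 100 * sum(r < 0 for r in rev) / n,
        # Variability (population standard deviation: an assumption)
        "SDR": pstdev(rev),
        "Range": max(rev) - min(rev),
        # Impact on the sign of the growth rate (zero counted as non-negative)
        "pct_same_sign": 100 * sum((e >= 0) == (l >= 0)
                                   for e, l in zip(early, later)) / n,
    }


def flag_domain(summary, mild=0.3, severe=0.6):
    """Classify a domain on its MAR with two hypothetical thresholds."""
    if summary["MAR"] >= severe:
        return "severe"
    if summary["MAR"] >= mild:
        return "mild"
    return "ok"


# Toy growth rates (invented): first and last estimate for four quarters.
early = [1.0, -0.5, 2.0, 0.1]
later = [1.2, -0.4, 2.5, -0.1]
summary = revision_summary(early, later)
status = flag_domain(summary)
```

Flagging on the Mean Absolute Revision alone is a simplification; in practice a rule could equally combine several measures, for instance the Max Absolute Revision and the size of the domain.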

6.5 Step 2a: graphical analysis of problematic domains

For each problematic domain selected from the previous analysis, a graph comparing the first and

the last estimate and related revisions is shown below (Figure 6). The aim of these graphical

analyses is to determine whether the weak domains have large revisions for all periods or only

certain periods. It is important to address this for two reasons – firstly for explaining published

results, and secondly in case high revisions occur regularly and are related to the business cycle

or seasonality, which would show that the chosen methodology is unsuitable for that particular

domain.

Column groups of the revision error: Size (MAR, RMAR, MaxAR, MeAR); Direction (MR, % revisions > 0, % revisions < 0, MeR); Variability (SDR, Range); Impact on growth rates direction (% Sign(Later) = Sign(Early)).

| Domain | Period | No. of occurrences (for which revisions are calculated) | Size of domain (target population): no. of employees | Survey coverage: target variable (% of target population) | MAR | RMAR | MaxAR | MeAR | MR | % revisions > 0 | % revisions < 0 | MeR | SDR | Range | % Sign(Later) = Sign(Early) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 2008:1-2010:3 | 11 | 307,084 | 15.3 | 0.4 | 0.3 | 1.5 | 0.1 | 0.4 | 100.0 | 0.0 | 0.1 | 0.6 | 1.5 | 90.9 |
| 15 | 2008:1-2010:3 | 11 | 122,478 | 4.6 | 0.2 | 0.0 | 0.4 | 0.1 | 0.1 | 81.8 | 18.2 | 0.1 | 0.2 | 0.6 | 100.0 |
| 25 | 2008:1-2010:3 | 11 | 510,910 | 2.3 | 0.1 | 0.0 | 0.3 | 0.1 | 0.1 | 81.8 | 18.2 | 0.1 | 0.1 | 0.5 | 100.0 |
| 28 | 2008:1-2010:3 | 11 | 426,635 | 15.4 | 0.2 | 0.1 | 0.5 | 0.2 | 0.2 | 90.9 | 9.1 | 0.2 | 0.1 | 0.5 | 90.9 |
| 30 | 2008:1-2010:3 | 11 | 92,197 | 53.8 | 0.3 | 0.1 | 1.0 | 0.2 | 0.1 | 63.6 | 36.4 | 0.1 | 0.4 | 1.3 | 100.0 |
| 41 | 2008:1-2010:3 | 11 | 469,929 | 1.8 | 0.2 | 0.0 | 0.9 | 0.1 | 0.1 | 63.6 | 36.4 | 0.0 | 0.3 | 1.0 | 100.0 |
| 47 | 2008:1-2010:3 | 11 | 1,033,858 | 22.7 | 0.2 | 0.1 | 0.6 | 0.2 | 0.2 | 90.9 | 9.1 | 0.2 | 0.2 | 0.6 | 90.9 |
| 64 | 2008:1-2010:3 | 11 | 388,278 | 75.3 | 0.2 | 0.2 | 0.6 | 0.3 | 0.2 | 90.9 | 9.1 | 0.3 | 0.2 | 0.7 | 100.0 |
| 71 | 2008:1-2010:3 | 11 | 78,519 | 11.9 | 0.2 | 0.1 | 0.6 | 0.2 | 0.2 | 90.9 | 9.1 | 0.2 | 0.2 | 0.6 | 90.9 |
| 81 | 2008:1-2010:3 | 11 | 426,375 | 19.8 | 0.8 | 0.4 | 2.2 | 0.6 | 0.8 | 100.0 | 0.0 | 0.6 | 0.6 | 1.9 | 90.9 |

Table 6. Summary statistics on revisions (application to Italian employment estimates)

Figure 6. Graphical analyses: preliminary estimate, last estimate and revisions (application to Italian

employment estimates).

The graphical analyses show that high revisions are concentrated at the end of 2009 for division 10,

while for division 41 the only noticeable revision (2010 Q2) is very small compared to the size of

the growth rate (-15%). Division 81 is different since it shows sizeable, positive, revisions in almost

every quarter, with an increase at the beginning of 2010.

6.6 Step 2b: decomposition of the revision error into a survey part and an admin data part

In this step, for the problematic domains, the revisions are broken down according to the

contribution due to the survey and the contribution due to admin data. The exact calculations behind

this decomposition are provided in Appendix 2. In this chapter only the results will be presented.
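
As an illustration only, one simple additive way of splitting a growth-rate revision between the two sources is sketched below in Python. It is an assumption made for this sketch and is not necessarily the calculation of Appendix 2: each source's contribution to the year-on-year growth rate is its change in level divided by the total level of the same quarter a year earlier, and the revision attributed to a source is the change in that contribution between the preliminary and the final estimate. Function names and figures are hypothetical.

```python
def growth_contributions(levels_t, levels_t4):
    """Contribution of each source to the year-on-year growth rate (in %).

    levels_t / levels_t4 map a source name to its level in quarter t and t-4.
    The contributions add up to the growth rate of the total.
    """
    total_t4 = sum(levels_t4.values())
    return {s: 100 * (levels_t[s] - levels_t4[s]) / total_t4 for s in levels_t}


def revision_by_source(prelim_t, prelim_t4, final_t, final_t4):
    """Revision of the growth rate attributed to each source (percentage points)."""
    early = growth_contributions(prelim_t, prelim_t4)
    later = growth_contributions(final_t, final_t4)
    return {s: later[s] - early[s] for s in early}


# Toy levels (invented): employment by source in quarter t and in quarter t-4.
base_t4 = {"survey": 200, "admin": 800}
prelim_t = {"survey": 200, "admin": 790}
final_t = {"survey": 204, "admin": 796}

by_source = revision_by_source(prelim_t, base_t4, final_t, base_t4)
total_revision = sum(by_source.values())
```

In the toy case the growth rate is revised upwards by one percentage point, of which 0.4 comes from the survey part and 0.6 from the admin data part; these are the quantities that bars and a line like those of Figure 7 would display.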

A simple way of presenting the revision and its components can be through graphs like those in

Figure 7. The red line in this graph is the total revision and the bars represent the contribution of

the two sources of data. The application to the three selected Italian divisions is shown.

Figure 7. Graphical analysis: decomposition of revisions due to admin and survey data.

Decomposing the revisions in this way shows that for division 10 in 2009 Q3 and 2009 Q4 the

revisions were substantially due to the LE-survey. The explanation is that a couple of influential

enterprises were missing for the preliminary estimates and imputed. Since they had a seasonal

pattern different from that of the respondent enterprises, the imputation procedure was unable to

predict their values accurately.

For the other two divisions, where LEs are much less relevant, the bulk of revisions are due to the

admin data. In division 81, in the last part of the time series, the contribution of the large enterprises

compensates a little for the contribution of the SMEs.

Figure 7 is important because it reveals that large revisions may be due to missing data in the LE-survey. Because the LE-survey data contribute significantly to the estimate, the data collection and imputation of the LE-survey also need to be sound in an admin data based STS-system. If this is not the case, large revisions in admin data based STS-estimates may arise which are not caused by deficiencies in the admin data.

Large revisions caused by the LE-survey were also detected in a new VAT-system for quarterly

turnover estimates in the Netherlands, in a VAT-based STS-system in Estonia (Appendix 1) and

were also described by Roestel (2012) when using VAT-data in Germany. Hence, there seems to be

a general problem of large revisions due to missing (or revised) data in the STS-survey, not just

with Italian employment data.

6.7 Step 3: further analysis

This step focuses on the main mechanisms that may have caused revisions. The analysis drills down into the most critical features of the adopted method, in order to identify and quantify the contribution of the most crucial steps of the estimation methodology to the revision. This step covers method-specific explorations and is therefore left to each NSI. For methodologies based on imputation procedures, it is recommended that the analysis addresses whether the revisions are due to the identification of units for imputation or to the applied method of imputation. Further insights may come from analysis aimed at understanding how the methodology works in sectors characterised by different dynamics or behaviours (seasonality, trend, business demography etc.).

In the example of Italian employment estimates, the table of summary measures of revisions has highlighted a slight but systematically positive revision error, so further analysis of the imputation procedure for the part covered by admin data is required. Since this procedure is composed of two parts, it is useful to see whether the errors are due to the imputation method or to the uncertainty in the active population at the time of the first estimates.

To clarify the results, using the same notation as in Appendices 1 and 2 and dropping the redundant subscript and superscripts, the preliminary estimate may be written as:

$$Y_t^{p} = \sum_{i \in P_t^{p}} \hat{y}_{it} \qquad [3]$$

where $\hat{y}_{it}$ coincides with the reported value $y_{it}$ for the early reporters. The preliminary estimate can thus be rewritten as:

$$Y_t^{p} = \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{mr}} \hat{y}_{it} \qquad [4]$$

Analogously, since the final population may be seen as composed of the early reporters $P_t^{er}$ and the late reporters $P_t^{lr}$ (because $P_t^{l} = P_t^{er} \cup P_t^{lr}$), the last estimate is defined as:

$$Y_t^{l} = \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{lr}} y_{it} \qquad [5]$$

The difference between the final estimate and the preliminary estimate (the revision error) on the subpopulation is thus:

$$Y_t^{l} - Y_t^{p} = \Big( \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{lr}} y_{it} \Big) - \Big( \sum_{i \in P_t^{er}} y_{it} + \sum_{i \in P_t^{mr}} \hat{y}_{it} \Big) = \sum_{i \in P_t^{lr}} y_{it} - \sum_{i \in P_t^{mr}} \hat{y}_{it} \qquad [6]$$

Since part of the population of assumed missing reporters is effectively constituted by late reporters, this can also be written as:

$$Y_t^{l} - Y_t^{p} = \sum_{i \in P_t^{lr}} y_{it} - \sum_{i \in P_t^{mr}} \hat{y}_{it} = \sum_{i \in P_t^{lr} \cap P_t^{mr}} (y_{it} - \hat{y}_{it}) + \sum_{i \in P_t^{lr} \setminus (P_t^{lr} \cap P_t^{mr})} y_{it} - \sum_{i \in P_t^{mr} \setminus (P_t^{lr} \cap P_t^{mr})} \hat{y}_{it} \qquad [7]$$

where the first term represents the imputation error on the population of units correctly assumed as active, the second term represents the under-estimation due to the population of units which were not identified as active (or were incorrectly defined as inactive) and the third term represents the over-estimation due to the population of units incorrectly defined as active.

Table 7 shows, for the selected domains, the terms described above as averages for the period 2008-2010. All values are expressed as a percentage share of the last estimate. Starting from the total preliminary estimate (column h) we find, as in the table of summary statistics on revisions, a slight downward bias. On average, this amounts to at most 0.2-0.3 percentage points, with the exception of division 81. Analysing the results in more detail, it can be seen that there are no significant differences (for the population of enterprises imputed at t and then reporting in the final population) between the imputed values and the reported values (columns b and c). This result suggests that the imputation method is, on average, not responsible for the error. The slight downward bias derives instead from the imbalance between the units which are incorrectly defined as active (column d) and the units which were incorrectly identified as inactive in the provisional population (column e): the under-estimation due to the latter exceeds the over-estimation due to the former.
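
The three terms described for equation [7] can be computed directly once the imputed values of the assumed missing reporters and the reported values of the late reporters are available. The Python sketch below is a minimal illustration; the function name `decompose_revision`, the dictionary layout and the toy figures are hypothetical.

```python
def decompose_revision(imputed, late_reported):
    """Split the revision Y^l - Y^p into the three terms described for [7].

    imputed:       {unit: imputed value} for the assumed missing reporters
    late_reported: {unit: reported value} for the late reporters
    Returns (imputation error, under-estimation, over-estimation term);
    the third element already carries its minus sign.
    """
    both = imputed.keys() & late_reported.keys()
    # Units correctly assumed active: reported minus imputed value.
    imputation_error = sum(late_reported[i] - imputed[i] for i in both)
    # Late reporters that were not identified as active.
    not_identified = sum(v for i, v in late_reported.items() if i not in both)
    # Units imputed as active that never reported.
    wrongly_active = sum(v for i, v in imputed.items() if i not in both)
    return imputation_error, not_identified, -wrongly_active


# Toy data (invented): "C" imputed and later reporting, "D" imputed but never
# reporting, "E" not imputed but reporting in the final data.
imputed = {"C": 22, "D": 10}
late_reported = {"C": 21, "E": 8}

terms = decompose_revision(imputed, late_reported)
revision = sum(late_reported.values()) - sum(imputed.values())
```

By construction the three terms add up to the total revision, which gives a simple consistency check when the decomposition is applied domain by domain.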

Table 7. Summary results on the Italian imputation method

Columns (b) to (d) refer to units with imputed missing values at t: (b) and (c) to those with later reporting, (d) to those without later reporting.

| Domain | Early reporters (a) | With later reporting: imputed values in the preliminary estimates (b) | With later reporting: reported values in the final data (c) | Without later reporting: imputed values in the preliminary estimates (d) | Units without imputed values in the preliminary estimates but reporting in the final data (e) | Imputed value f=(b+d) | Reported values g=(c+e) | Total estimated value h=(a+f) | Total reported value i=(a+g) |
|---|---|---|---|---|---|---|---|---|---|
| 10 | 98.2 | 1.2 | 1.2 | 0.4 | 0.7 | 1.6 | 1.8 | 99.8 | 100.0 |
| 15 | 98.4 | 1.0 | 1.0 | 0.5 | 0.7 | 1.5 | 1.6 | 99.8 | 100.0 |
| 25 | 98.6 | 0.9 | 0.9 | 0.3 | 0.4 | 1.3 | 1.4 | 99.9 | 100.0 |
| 28 | 98.5 | 1.0 | 1.0 | 0.3 | 0.5 | 1.3 | 1.5 | 99.8 | 100.0 |
| 30 | 98.1 | 1.2 | 1.2 | 0.5 | 0.7 | 1.7 | 1.9 | 99.8 | 100.0 |
| 41 | 97.7 | 1.4 | 1.4 | 0.8 | 1.0 | 2.2 | 2.3 | 99.9 | 100.0 |
| 47 | 98.0 | 1.2 | 1.2 | 0.6 | 0.8 | 1.8 | 2.0 | 99.8 | 100.0 |
| 64 | 98.2 | 1.2 | 1.2 | 0.3 | 0.6 | 1.5 | 1.8 | 99.7 | 100.0 |
| 71 | 98.3 | 1.1 | 1.1 | 0.3 | 0.6 | 1.5 | 1.7 | 99.8 | 100.0 |
| 81 | 96.3 | 1.8 | 1.8 | 0.7 | 2.0 | 2.5 | 3.7 | 98.8 | 100.0 |

The main conclusion from these analyses is that uncertainty in the active enterprise population causes systematic revisions. This has been found not only with Social Security data in Italy, but also with VAT data in Estonia, Finland and Germany (Vlag et al., 2013).

6.8 Cause and effect report: a synthetic description of the main causes of revisions

The main causes of revision that emerged from the previous analyses are summarised in a cause and effect report, using the following grid (Table 8). In the first section of the grid, the main characteristics of the revisions are described from a general, cross-domain perspective; for the Italian employment estimates this is a synthetic report of the main causes of revision that emerged from the previous analysis.

In the second part of the grid, for the most problematic domains (10, 41 and 81 in the Italian example) and for each point in time for which the revision is particularly significant, the possible causes are reported according to a pre-specified codification (see below).

Table 8. Causes of point in time errors. Application to the Italian employment estimates

GENERIC (CROSS DOMAINS)

- Slight general under-estimation.

- Under-estimation systematically characterises SMEs subpopulation.

- Revisions of survey data are in general low and sometimes compensate for SMEs revisions. But for some domains they appear to be significant.

- Almost all domains record high revisions in the first and second quarters of 2010, due to an administrative change.

| DOMAIN | POINT IN TIME ERRORS/GENERIC REVISIONS | CAUSES | DESCRIPTION |
|---|---|---|---|
| Division 10 | 2009q3 | SR | High revisions on LEs survey data. Sector characterised by seasonal activity. |
| | 2009q4 | SR | High revisions on LEs survey data. |
| | 2010q2 | LC | Severe change of the Social Security declaration form. |
| | Generic | | |
| Division 41 | 2010q2 | LC | Severe change of the Social Security declaration form |
| | Generic | | |
| Division 81 | 2009q1 | MP | General underestimation of the imputation method + |
| | 2009q4 | MP | General underestimation of the imputation method + |
| | 2010q1 | MR+LC | General underestimation of the imputation method + |
| | 2010q2 | MP+LC | Severe change of the Social Security declaration form |
| | 2010q3 | MP+LC | Severe change of the Social Security declaration |
| | Generic | | General underestimation of the method. Sector characterised by higher than average non-reporting rates and relevant business dynamics. |

Classification of possible causes of revisions

- Legislation changes (LC)

- Sudden (unexpected) drop in data depending on? (AD)

- External original source revision of microdata (MR)

- Internal processing revision of microdata (PE)

- Changes in methodology (MC)

- Reclassification/errors in Nace code (NC)

- Method performance (MP)

- Revision of Survey data (SR)

- Others (explain) (O)

Summarising, these revision analyses revealed that:

- the revisions are generally quite low, suggesting that the preliminary estimate is quite reliable. However, it appears that the first estimate is characterised by a small systematic under-estimation (due to the provisional estimate of the active population);

- regarding the estimation using admin data, the imputation for late reporters does not seem to be a problem, provided the units to be imputed are correctly identified. The imputation procedure adopted, based on the month-on-month growth rate of the reporters, works well overall in estimating the employment of the units that are correctly identified as late reporters;

- more important is the (in)ability of the first estimates to capture the true population dynamics. In practice, over-imputation for inactive units does not fully compensate for the under-imputation of new starters;

- the imputation methodology did not fully cope with the unexpected and considerable drop in data due to the legislation change which occurred in 2010.

6.9 Generalisation and general remarks

Two generalisations can be considered, depending on the available data in the individual countries.

- The first consists in analysing the average measures of revisions through time, to check whether they are influenced by the business cycle or by changes in information availability (for instance because of legislation changes or other administrative issues). A simple way to do this is to calculate the table of summary revisions for sub-periods. An alternative is to follow the revision measures on “sliding time windows”.

- A second useful generalisation is to study the revisions across vintages. This may highlight whether the estimates converge gradually to the final values or not (Röstel 2011). Moreover, it may provide insights into the impact of updates to the various sources (admin data, survey data, benchmarking, etc.).
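
Both generalisations lend themselves to very small computations. The Python sketch below follows the Mean Absolute Revision on sliding time windows and, for a single reference period, measures the distance of each vintage from the final value. It is an illustration under stated assumptions: the function names, the window length and the toy figures are hypothetical.

```python
def sliding_mar(revisions, window):
    """Mean Absolute Revision over each window of consecutive periods."""
    return [sum(abs(r) for r in revisions[i:i + window]) / window
            for i in range(len(revisions) - window + 1)]


def distance_to_final(vintages):
    """Absolute distance of each vintage from the last (final) one."""
    final = vintages[-1]
    return [abs(v - final) for v in vintages]


# Toy revisions (invented) for six consecutive quarters: the last three are
# larger, as could happen after an administrative change.
revisions = [0.1, 0.2, 0.1, 0.9, 1.1, 0.8]
windows = sliding_mar(revisions, window=4)

# Toy vintages (invented) of the growth rate for one quarter, first to final.
vintages = [1.0, 1.3, 1.4, 1.5]
distances = distance_to_final(vintages)
converges_gradually = all(a >= b for a, b in zip(distances, distances[1:]))
```

A rising sequence of window values signals that revisions are growing over time, while distances that shrink at every vintage indicate gradual convergence to the final value.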

A general comment is that we have noticed that only a few countries structurally carry out revision

analyses. Moreover, the necessary data to perform revision analyses are sometimes not available, or

are difficult to extract from existing systems. While the available admin data used for the different

releases are often stored, it was sometimes difficult to trace back afterwards which version of the

Business Register or which LE-survey data were used for the consecutive estimates. The latter may

suggest that the contributions of these factors to the quality of the admin based STS-estimates might

be under-estimated when developing systems.

As revision analyses provide essential information about the quality of the preliminary estimates

and the underlying causes (missing admin data, missing LE-survey data and/or inability to capture

population dynamics), when developing an admin data based STS-system it is recommended that

the database is constructed in such a way that structural revision analyses can be carried out.
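
What such a database needs to record can be sketched with a small Python data structure. This is a hypothetical illustration, not a description of any existing NSI system: the class names, the field names and the version labels are assumptions. The point is only that every release is stored together with the versions of the inputs it was based on, so that revisions can be computed and traced back afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """One published estimate and the inputs it was based on (hypothetical)."""
    domain: str
    period: str                     # reference period, e.g. "2010Q2"
    vintage: int                    # 1 = first estimate, 2 = second, ...
    estimate: float                 # e.g. year-on-year growth rate in %
    admin_data_version: str
    business_register_version: str
    le_survey_version: str


class VintageStore:
    """Keeps every release so that revisions can be computed afterwards."""

    def __init__(self):
        self._releases = {}

    def add(self, release):
        key = (release.domain, release.period, release.vintage)
        self._releases[key] = release

    def revision(self, domain, period, early=1, later=None):
        """Later estimate minus early estimate (latest vintage by default)."""
        vintages = sorted(v for (d, p, v) in self._releases
                          if d == domain and p == period)
        later = vintages[-1] if later is None else later
        return (self._releases[(domain, period, later)].estimate
                - self._releases[(domain, period, early)].estimate)


# Toy releases (invented values and version labels).
store = VintageStore()
store.add(Release("81", "2010Q2", 1, -2.0, "admin-v1", "register-v1", "survey-v1"))
store.add(Release("81", "2010Q2", 2, -1.2, "admin-v2", "register-v1", "survey-v2"))
revision = store.revision("81", "2010Q2")
```

Storing the input versions with each release addresses the difficulty noted above, namely tracing back which Business Register or LE-survey data were used for consecutive estimates.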

7 Conclusions

Analyses of current practices suggest that

- estimation methods for missing admin data;

- missing data in the large enterprises survey; and

- uncertainty of the active population when the estimates have to be made

are the most important components determining the quality of admin data based STS-estimates (see deliverable 4.1 of the ESSnet on AdminData). If preliminary estimates have to be made when no admin data are available, the factor ‘estimation methods for missing admin data’ is replaced by the quality of temporary model-based estimations.

In this case a complete sequence of STS-estimates for period t (and its accompanying revisions) should consist of:

- model-based estimates for small and medium sized enterprises, replaced by admin data estimates with a decreasing amount of missing admin data over time, and

- a survey-based estimate for the largest enterprises with a decreasing amount of missing survey data over time.

Within such a sequence, the size of revisions is the primary quality indicator of the accuracy of the

preliminary estimates. Moreover this indicator is available to the general public. Large and/or

systematic revisions may be interpreted as a signal of unreliability and undermine the credibility of

the preliminary estimates. Revision analyses are a powerful tool to increase the reliability of (admin

based) STS-estimates.

Revision analyses are highly recommended in the development stage of such a system, as they help

to evaluate the methodology, compare its variants and indicate the direction for improvements. If an

NSI is considering replacing a survey based approach with an admin data based or mixed approach,

the study of the revisions is essential to compare the old and the new methods. Another interesting

application for revision analyses is in the choice of the boundary between the survey part and the

admin data part.

Revision analyses are equally important during current production, as they can help discover sudden

events in both available admin (or survey) data or changes in the business cycle that are causing

larger revisions. Moreover, they can detect publication levels for which the preliminary estimates

are ‘strong’ and publication levels for which they are ‘weaker’. At the same time, revision analyses

reveal the strength of the estimates under normal or abnormal circumstances. The availability of

such information is very useful for fine-tuning the estimation methodology or explaining results to

the public.

In this context, it is remarkable that few countries perform structural revision analyses on their

admin data based STS series. In some cases, the necessary information cannot even be derived from

the database. Hence, one of the recommendations when developing or updating an admin data based

STS-system is that the database is constructed in such a way that structural revision analyses can be

carried out.

The proposed revision sheet might be a useful tool to analyse revisions structurally, because it

includes all necessary elements. Depending on the exact data situation in an individual country, the

practical use of such a sheet will differ per country.

Revision analyses are closely related to a revision policy. Generally the deadline for the first

estimates is determined by European Regulations or by national needs. The publication dates of

later (more final) estimates are often more flexible. As uncertainty in the provisional active

population is a major source for revisions and often leads to slightly biased (under- or

overestimated) growth rates in the first estimates, it is recommended that the final estimates are

made when both the admin data and survey data are complete, and the Statistical Business Register

has been finalised. Other revision points can be determined based on output obligations, or at times

when the missing or revised data become available.

Acknowledgements

This deliverable is the result of the work and discussions of the complete ESSnet WP4 group. The

authors would like to acknowledge the contributions and constructive comments of all the countries

participating in WP4 of the ESSnet Admin Data. The gratitude and appreciation of the authors are

sent to them all.

References

Baldi C., Congia M.C., Pacini S., Tuzi D. (2011). The STS-employment estimates in Italy based

on admin data. Deliverable of Work package 4.

Baldi C., Tuzi D. (2012). Analysis of revisions of admin data based short term statistics. Proposal

of a template and an application to Italian employment estimates. Interim report of Work package 4.

Baldi C., Ceccato F., Pacini S., Tuzi D. (2012). Imputation of employment admin data in Italy.

Interim report of the Work package 4.

De Waal, A.G., Vlag, P.A., Baldi, C. Tuzi, D. (2012), The use of administrative data for STS.

Situation I: Good coverage provided by administrative data. Milestone of Work package 4.

http://essnet.admindata.eu

Di Fonzo T. (2005). The OECD project on revisions analysis: First elements for discussion, paper

presented at the OECD STESEG Meeting, Paris, 27-28 June 2005

(http://www.oecd.org/dataoecd/55/17/35010765.pdf).

Eurostat (2012). ESS Guidelines on Revision Policy for Principal European Economic Indicators

(https://circabc.europa.eu/faces/jsp/extension/wai/navigation/container.jsp).

Kavaliauskiene D., (2011). Application of Ratio and GREG-Estimator to VAT for Monthly

Turnover Estimates. Deliverable of Work package 4.

Kavaliauskiene D, Slickute-Sestokiene M & Vlag P (2013), The use of regression estimators for

admin data based STS estimates, Deliverable 4.2 of ESSnetAdminData – SGA2011,

http://essnet.admindata.eu


Karus E. (2012). Revision analysis on Estonian Retail Trade Data. Presentation for Work package

4.

Kiema S., Remes T. (2012). Comparison of Imputation with Realisations. Presentation for Work

package 4.

Langford, A., and Teneva, M. (2012). Analysis of revisions of admin data based short term

statistics. Application to UK retail sales data and implications for the definition of the boundary

between survey and administrative data coverage. Internal report of the Work package 4 (upon

request).

Lorenz R. (2011). Current Results and Future Improvements in Respect of Estimates for Missing

Values in the VAT Registration. Deliverable of Work package 4.

Maasing E. (2012). Testing Imputations on Estonian Retail Trade Data. Presentation for work

package 4.

Maasing E, Remes T, Baldi C & Vlag P (2013), STS estimates based solely on admin data: final

results and recommendations, Deliverable 4.1 of ESSnetADminData – SGA2010,

http://essnet.admindata.eu

Mazzi G. L., Ruggeri Cannata, R. (2008). A Proposal for a Revisions Policy of Principal

European Economic Indicators (PEEIs), Contribution to the OECD/Eurostat Task Force on

Performing Revisions Analysis for Sub-Annual Economic Statistics

(http://www.oecd.org/dataoecd/44/39/40309491.pdf).

McKenzie R., Gamba M. (2008). Interpreting the results of Revision Analyses: Recommended

Summary Statistics. Contribution to the OECD/Eurostat Task Force on “Performing Revisions

Analysis for Sub-Annual Economic Statistics” (www.oecd.org/dataoecd/47/18/40315546.pdf).

OECD/Eurostat Task Force on Performing Revisions Analysis for Sub-Annual Economic

Statistics (2008). A basis for classifying reasons for revisions to short term statistics,

(http://www.oecd.org/dataoecd/44/37/40309451.pdf).

Orchard C., Langford A., Moore K. (2011). National practices of the use of administrative and

accounts data in UK short-term business statistics. Deliverable of Work package 4.

Röstel D. (2011). Attempts to improve methods of plausibility checks of combined turnover data in

German service statistics (STS). Deliverable of Work package 4 (SGA-2010).

Sirviö M. (2011a). Turnover Indices (incl. Retail Trade) and Value Added Tax (VAT) Data.

Deliverable of work package 4 (SGA-2010).

Sirviö M. (2011b). Industrial Production Index and Value Added Tax (VAT) Data. Deliverable of

Work package 4 (SGA-2010).

Slickute-Sestokiene M. (2011). Application of GREG Estimators for (Administrative Data Based)

Short Term Lithuanian Labour Statistics. Deliverable of Work package 4 (SGA-2010).

Teneva M. (2012). Use of VAT data in Monthly Business Survey. Presentation for Work package

4.

Toivanen E. (2012). Practical Experiences with Estimating Incomplete VAT Data. Presentation for

Work package 4.

Vlag P. (2012). Using Incomplete VAT-data for Turnover Estimates in Europe. Presentation for

Work package 4.


Vlag P., Ortega Azurduy S., Karus E. (2011a). The use of admin data for monthly and quarterly

estimates: common issues and challenges in Estonia, Finland, Germany, Italy, Lithuania, The

Netherlands and the United Kingdom. Deliverable of Work package 4 (SGA, 2010).

Vlag P., Ortega Azurduy S., Van Loon A., Scholtus S. (2011b). Monthly turnover estimates with

VAT: challenges in the Netherlands. Deliverable of Work package 4 (SGA, 2010).

Pieter Vlag, Reinier Bikker, Ton de Waal, Eetu Toivanen, Mila Teneva (2013), Extrapolating

admin data for early estimation: some findings and recommendations for the ESS, Deliverable 4.3

of ESSnetADminData – SGA2010, http://essnet.admindata.eu

Appendix 1. An application of revision analysis to the Estonian

turnover estimates on retail trade

This section describes the results of the application of revision analysis to the Estonian retail trade

turnover estimates (division 47 of the Nace Rev.2) (Karus 2012). The analysis refers to the year on

year growth rates of the target variable and covers the period from January to May 2012.

The following table A3.1 reviews main information on sources, revisions policy, estimation

method, coverage etc. on the considered domain.

The Estonian monthly estimates of retail trade turnover are based on a mixed-source approach, in which survey data are complemented by admin data on VAT. A census survey collects information on large enterprises (20+ persons employed), while data on medium enterprises (2-19 persons employed) are collected through a sample survey. VAT admin data are used as the direct source for the estimates of small enterprises (1 person employed) and as an auxiliary source for the imputation of missing units in the survey sources. Outliers from both survey and admin data are corrected with stratum averages. The estimates are based on a fixed population, which for 2012 was created in November 2011 from a frozen version of the SBR. Only the changes for large and important enterprises are taken into account when updating the population during the reference year. To take exits into account in the admin data portion, the activity status is predicted using information on VAT reporting: an enterprise is considered non-active if no VAT declaration was reported for the current and the previous month. This information is reinforced using an additional admin source: when the previous condition on VAT data holds, an enterprise is considered non-active if the social security declaration for the current and the previous month was not reported or was reported with salary=0.
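The activity rule described above can be sketched in Python. This is only an illustration: the names `MonthlyReports` and `is_non_active` are hypothetical, and checking the social security condition month by month is an assumed reading of the rule, not a description of the production system.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReports:
    """Hypothetical record of what one enterprise reported for one month."""
    vat_reported: bool
    social_security_reported: bool
    salary: float = 0.0


def _no_wages(month: MonthlyReports) -> bool:
    # Social security declaration missing, or reported with salary = 0.
    return (not month.social_security_reported) or month.salary == 0


def is_non_active(current: MonthlyReports, previous: MonthlyReports) -> bool:
    """Predict non-activity from VAT and social security reporting."""
    # VAT condition: no declaration in the current and the previous month.
    no_vat = (not current.vat_reported) and (not previous.vat_reported)
    # Reinforcement (assumed reading): both months also show no wages.
    return no_vat and _no_wages(current) and _no_wages(previous)
```

Under this reading, an enterprise that reports no VAT but still declares positive salaries is kept in the active population.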

Considering the available information, three versions of preliminary estimates are released (30 days, 60 days and 90 days after the reference month); further versions of the preliminary estimates are also calculated through benchmarking, when quarterly survey data are available (benchmarking is implemented in February, May, August and November). Benchmarking also eliminates differences in definitions between survey and admin data. For some months the number of revisions may reach six (see §3 for further details on the Estonian revision policy). Final estimates are released 1 year after the reference month.


This new method, based on the exploitation of the VAT data, has been in production only since

January 2012. For this reason only a few vintages of the estimates are available at the moment.


Table A3.1 - Context information. Estonian retail trade turnover estimates

General information

- Indicator: Turnover
- Target domains: NACE 47
- Periodicity: Monthly
- Number of routine revisions: First estimate + 3 revisions
- Deadline of the first estimate: T+15 days after the reference month for the survey; T+20 days after the reference month for VAT data
- Release of the first estimate: T+30 days (1 month) from the reference month
- Release of the second estimate: T+60 days (2 months) from the reference month
- Release of the third estimate: T+90 days (3 months) from the reference month
- February, May, August and November: Benchmarking with quarterly survey data
- Release of the final estimate: T+365 days (12 months) from the reference month

Subpopulations

- Large enterprises: Census (survey) for large enterprises, 20 and more persons employed
- Share of LEs in terms of target variable (turnover): 84.2% (January 2012 8th estimate in September 2012)
- Medium enterprises: Sample survey for medium enterprises, 2-9 persons employed and 10-19 persons employed
- Share of MEs in terms of target variable (turnover): 13.1% (January 2012 8th estimate in September 2012)
- Small enterprises: Admin data (VAT) for small enterprises, 1 person employed
- Share of SEs in terms of target variable (turnover): 2.7% (January 2012 8th estimate in September 2012)

First estimate

Large and medium enterprises

- Use of survey: Yes, census and sample survey. Sample size – 567 enterprises
- % of survey respondents on final data: 91.2% (August 2012 first estimate LME survey respondents/January 2012 8th LME survey respondents)
- % of target variable on final data: 98.1% (January 2012 first LME respondents turnover estimate/January 2012 8th LME respondents turnover estimate)
- Use of admin data: Yes. VAT for imputation of missing data.
- % of survey respondents on final data: 25 respondents
- % of target variable on final data: 0.9%
- Estimator: Census – enumeration of available data + imputation of missing data. Sample survey – enumeration of available data + imputation of missing data, weighting.

Small enterprises

- Use of survey: Yes. 7 outliers
- % of survey respondents on final data: 100% (January 2012 first SE outlier respondents/January 2012 8th SE outlier respondents)
- % of target variable on final data: 100% (January 2012 first SE outlier turnover/January 2012 8th SE outlier turnover)
- Use of admin data: Yes. Population 4331 (2012). Direct use VAT data, statistical turnover is calculated.
- % of data reporters on final data: 97.5% (January 2012 first SE VAT respondents/January 2012 2nd SE VAT respondents)
- % of main variable on final data: 88.8% (January 2012 first SE VAT turnover estimate/January 2012 8th SE VAT turnover estimate)
- Estimator: Enumeration of available data + imputation of missing data. Outliers (turnover exceeding average +/– 3x StdDev) imputed with average of the stratum.

Combined estimate between LMEs and SEs: Sum of large, medium and small enterprises estimation results by domain.

Final estimate

Large and medium enterprises

- Use of survey: Yes, census and sample survey. Sample size – 567 enterprises
- % of survey respondents on final data: 14.1% of survey and VAT respondents
- % of target variable on final data: 97.0% of total NACE 47 turnover
- Use of admin data: Yes. VAT for imputation of missing data.
- % of survey respondents on final data: 0.2% of survey and VAT respondents
- % of target variable on final data: 0.1% of total NACE 47 turnover
- Estimator: Census – enumeration of available data + imputation of missing data. Sample survey – enumeration of available data + imputation of missing data, weighting.

Small enterprises

- Use of survey: Yes. 7 outliers
- % of survey respondents on final data: 0.2% of survey and VAT respondents
- % of target variable on final data: 0.2% of total NACE 47 turnover
- Use of admin data: Yes. Population 4331 (2012). Direct use VAT data, statistical turnover is calculated.
- % of data reporters on final data: 61.4% of survey and VAT respondents
- % of main variable on final data: 2.5% of total NACE 47 turnover
- Estimator: Enumeration of available data + imputation of missing data. Outliers (turnover exceeding average +/– 3x StdDev) imputed with average of the stratum.

Combined estimate between LMEs and SEs: Sum of large, medium and small enterprises estimation results by domain.
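The outlier rule listed in Table A3.1 (turnover exceeding the stratum average by more than 3 standard deviations, imputed with the stratum average) might look roughly like the sketch below. The function name `impute_outliers` is hypothetical, and computing the average and standard deviation over all units of the stratum, outliers included, is an assumption; the source does not say how the production system computes them.

```python
from statistics import mean, stdev


def impute_outliers(turnover: list[float], k: float = 3.0) -> list[float]:
    """Replace values outside average +/- k * StdDev with the stratum average."""
    if len(turnover) < 2:
        # A standard deviation needs at least two units; nothing to flag.
        return list(turnover)
    avg = mean(turnover)   # stratum average (assumed to include outliers)
    sd = stdev(turnover)   # sample standard deviation of the stratum
    lower, upper = avg - k * sd, avg + k * sd
    return [avg if (x < lower or x > upper) else x for x in turnover]
```

With the sample standard deviation, a single value can exceed the 3 StdDev threshold only in strata with at least 11 units, so in very small strata this rule flags nothing.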

In order to get comparable revisions, the following analysis is based on the comparison of the first estimate with the fourth, which is not the last version. The fourth estimate includes, in addition to the information used in the first (see above for more details of the revision policy):

1. updated survey data;

2. updated admin data; and

3. at least two revisions due to benchmarking with quarterly data (for March, April and May

three revisions due to benchmarking).


The analysis focuses on division 47 and some of its sub-domains, as described in table A3.2.

Table A3.3. reports the summary statistics on revisions calculated as described above. The most

critical values of the summary statistics are highlighted in yellow.

Although the time-series is relatively short, some important points about the quality of the output

are apparent:

1) Revisions are quite small compared with the size of the growth rate: the RMAR takes the value

of 0.1% for all the considered domains. In terms of Mean Absolute Revision it ranges between

1 and 2.3%, with the maximum revisions recorded in sub-domain G47-4711-472-473 (3.3%).

2) The analysis of direction measures shows that revisions are systematically positive (i.e. the first

estimates are slightly under-estimates).

3) Looking at variability, the revisions in the domain G4711+472 are the most erratic.

An important difference is apparent between the sector with a large share of survey data

(G4711+472) and those with fewer survey data.


Table A3.2 - Description of the domains considered in the analysis

NACE code Description

G47 Retail trade, except of motor vehicles and motorcycles

G4711+472 Retail sale of food, beverages and tobacco

G47-4711-472-473 Retail sale of manufactured goods excl automotive fuel

G47-473 Retail sale excl. automotive fuel

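Table A3.3 - Summary statistics on revisions

Column groups: Size (MAR, RMAR, MaxAR, MeAR); Direction (MR, % > 0, % < 0, MeR); Variability (SDR, Range); Impact on growth rates' direction (% Sign(Later) = Sign(Early)).

| Domain | No. of occurrences | Period | Target population (no. of units) | Survey coverage (% of target variable of population) | MAR | RMAR | MaxAR | MeAR | MR | % > 0 | % < 0 | MeR | SDR | Range | % Sign(Later) = Sign(Early) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G47 | 5 | JAN2012-MAY2012 | 5566 | 97 | 1.2 | 0.1 | 2.4 | 1.5 | 1.1 | 80 | 20 | 1.5 | 1 | 2.6 | 100 |
| G4711+472 | 5 | JAN2012-MAY2012 | 277 | 94 | 1 | 0.1 | 2.5 | 1.1 | 0.6 | 80 | 20 | 0.4 | 1.3 | 3.5 | 100 |
| G47-4711-472-473 | 5 | JAN2012-MAY2012 | 609 | 64 | 2.3 | 0.1 | 3.3 | 2.4 | 2.3 | 100 | 0 | 2.4 | 1 | 2.4 | 100 |
| G47-473 | 5 | JAN2012-MAY2012 | 919 | 78 | 1.4 | 0.1 | 2.8 | 1.6 | 1.4 | 80 | 20 | 1.6 | 1.1 | 3 | 100 |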


The time series of the first and fourth estimates are represented in the following graphs, together

with revisions.

Figure A3.1 - Graphical analysis: first estimate, fourth estimate and revisions.

The graphs confirm a systematic under-estimation, in division 47 as well as its sub-domains.


In Figure A3.2 revisions are decomposed into the part covered in the preliminary estimate by the

admin data and the part covered by the survey. Looking at the four graphs as a whole, the survey

source appears as a predominant cause of revisions because the survey part has the largest share

compared to the total estimate. The contribution of the smallest enterprises (estimated using admin data) is proportional to their share of turnover. For instance, in division 47 (where admin data account for only 3% of the target variable (table A.3.2)), they account for just about half the revision of May 2012 and about one fourth in April 2012. In sector 4711-472-473, where admin

data account for about one third of the target variable, in almost every month they were the

predominant source of revisions. A remarkable observation is that the general under-estimation of

the first estimates comes both from the survey part as well as the admin data part.

Figure A3.2 - Graphical analysis: decomposition of revision due to admin and survey data.


Table A3.4 summarises the main causes of revisions. The first section describes in general terms the main characteristics of the revisions, while the second part explains the causes of the most significant revisions for the problematic subdomains 472, 4791 and 4779+4781+4782+4789+4799.

Table A3.4 - Causes of point in time errors.

Generic (cross domains)

- Slight general underestimation.
- Underestimation characterises both the survey and the admin data subpopulations; the exception is one month, March 2012.
- Revisions of data are in general low, for some activity groups even negligible. Survey and admin data revisions go in the same direction and do not offset each other.
- Higher revisions were observed during benchmarking with quarterly data, which may be considered as the elimination of definition differences.

| Domain | Point in time errors / generic revisions | Causes | Description |
|---|---|---|---|
| 472 | 2012_5 | SL, BQ | Late reporter of survey data, imputation of survey data was underestimated. Benchmarking with quarterly survey data |
| 472 | Generic | | Small subpopulation. Every reporter is important. |
| 4791 | 2012_1, 2012_2, 2012_3 | SR, BQ | Correction of quarterly survey microdata, benchmarking with quarterly survey data |
| 4791 | Generic | | Activity with highest revisions. First estimate is underestimated. |
| 4779+4781+4782+4789+4799 | 2012_2, 2012_3 | OT | Treatment of admin data outliers |
| 4779+4781+4782+4789+4799 | Generic | | Only activity with negative revisions. First estimate is overestimated. |

Classification of possible causes of revisions

- SR: Revision of survey data
- BQ: Benchmarking of data with quarterly survey data
- OT: Outlier treatment
- SL: Late reporter of survey data

Conclusions

The application of the revision analysis framework on the year-on-year growth rates (t,t-12) of the

Estonian monthly retail trade turnover estimates, revealed quality aspects of these estimates which

are important when explaining the estimates or improving the system:

1) total revisions are low, at least compared to the magnitude of the growth rates, suggesting that

the first estimate is quite reliable.

2) total revisions are positive: it appears that the first estimate is slightly underestimated;

3) the main revisions of the estimates from the survey source are due to late responses and

corrections of preliminary reported data, estimates from the admin data are revised by late

reporters and benchmarking with quarterly survey data.


Appendix 2. Summary statistics on revisions

The revision indicators reported in table 2 are described in this section, together with their formulae.

Before presenting the formulae, we define the general relationships. Let $Y_t^p$ and $Y_t^l$ be, respectively, the preliminary estimate and the last estimate of month t. The preliminary and last estimates of the y-on-y growth rate are then, respectively:

$$\dot{Y}_t^p = \frac{Y_t^p - Y_{t-12}^l}{Y_{t-12}^l}$$ [A1.1]

and

$$\dot{Y}_t^l = \frac{Y_t^l - Y_{t-12}^l}{Y_{t-12}^l}$$ [A1.2]

The revision of the y-o-y growth rate, $\dot{R}_t$, is thus defined as:

$$\dot{R}_t = \dot{Y}_t^l - \dot{Y}_t^p = \frac{Y_t^l - Y_t^p}{Y_{t-12}^l} = \frac{R_t}{Y_{t-12}^l}$$ [A1.3]

That is, the ratio of the level revision error $R_t = Y_t^l - Y_t^p$ to the last estimate of t-12. In what follows, $n$ refers to the number of periods t.

Size of revisions

Mean Absolute Revision

$$MAR = \frac{1}{n}\sum_{t=1}^{n} \left| \dot{Y}_t^l - \dot{Y}_t^p \right|$$ [A1.4]

gives a measure of the revision size. The use of absolute values avoids compensation effects between positive and negative revisions. It does not provide information on directional bias.

In order to get a measure that is normalised by the size of the estimate, the following indicator is used:

Relative Mean Absolute Revision

$$RMAR = \frac{\sum_{t=1}^{n} \left| \dot{Y}_t^l - \dot{Y}_t^p \right|}{\sum_{t=1}^{n} \left| \dot{Y}_t^l \right|}$$ [A1.5]
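The definitions [A1.1] to [A1.5] can be sketched in Python as follows. The function names are hypothetical and the numbers in the usage note are purely illustrative, not taken from any of the national data sets.

```python
def yoy_growth(level_t: float, last_level_t12: float) -> float:
    """[A1.1]/[A1.2]: growth relative to the last estimate of month t-12."""
    return (level_t - last_level_t12) / last_level_t12


def growth_revision(prelim_t: float, last_t: float, last_level_t12: float) -> float:
    """[A1.3]: level revision divided by the last estimate of month t-12."""
    return (last_t - prelim_t) / last_level_t12


def mar(prelim_growth: list[float], last_growth: list[float]) -> float:
    """[A1.4]: mean of the absolute growth-rate revisions."""
    n = len(last_growth)
    return sum(abs(l - p) for p, l in zip(prelim_growth, last_growth)) / n


def rmar(prelim_growth: list[float], last_growth: list[float]) -> float:
    """[A1.5]: absolute revisions relative to the size of the last growth rates."""
    total_abs_revision = sum(abs(l - p) for p, l in zip(prelim_growth, last_growth))
    return total_abs_revision / sum(abs(l) for l in last_growth)
```

For example, with a preliminary level of 103, a last level of 105 and a last estimate of 100 for month t-12, `growth_revision(103, 105, 100)` equals the difference between the two growth rates computed with `yoy_growth`.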


Similar to MAR is the Mean Squared Revision (MSR), which emphasises the largest revisions. A further measure of size is the median of revisions in absolute value (MeAR), which is not affected by extreme observations:

Median Absolute Revision

$$MeAR = Me\left( |\dot{R}_1|, |\dot{R}_2|, \ldots, |\dot{R}_n| \right)$$ [A1.6]

and the largest revision in absolute value (MaxAR), which immediately highlights the most extreme case:

Maximum Absolute Revision

$$MaxAR = \max\left( |\dot{R}_1|, |\dot{R}_2|, \ldots, |\dot{R}_n| \right)$$ [A1.7]

Direction

Mean Revision

$$MR = \frac{1}{n}\sum_{t=1}^{n} \left( \dot{Y}_t^l - \dot{Y}_t^p \right)$$ [A1.8]

gives an indication of the direction of revisions: if it is positive (negative), the preliminary estimate on average underestimates (overestimates) the last estimate. This measure does not give useful information on the size of revisions, because revisions of opposite sign compensate each other. Other simple measures that can supplement the mean revision are the % of positive revisions, the % of negative revisions and the % of revisions equal to zero. A further measure of the direction of revisions is the median revision (MeR), which is not affected by extreme revisions and reinforces the interpretation of the MR:

Median Revision

$$MeR = Me\left( \dot{R}_1, \dot{R}_2, \ldots, \dot{R}_n \right)$$ [A1.9]

Variability

Standard Deviation of Revisions

$$SDR = \sqrt{\frac{1}{n-1}\sum_{t=1}^{n} \left( \dot{R}_t - MR \right)^2}$$ [A1.10]

gives a measure of the spread of revisions around their mean value (MR), providing an indication of the volatility of revisions in a given time interval. It is affected by extreme values, so it is not a good measure of the dispersion of revisions when their distribution is asymmetric. Another measure of variability is the Range.

Range

$$Range = \max\left( \dot{R}_1, \dot{R}_2, \ldots, \dot{R}_n \right) - \min\left( \dot{R}_1, \dot{R}_2, \ldots, \dot{R}_n \right)$$ [A1.11]

Variants of the range can also be considered: Range90 is the interval containing 90% of the revisions, Range50, etc.
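As a rough illustration, the indicators [A1.6] to [A1.11] can be computed from a list of growth-rate revisions with the Python standard library. The function name `revision_summary` and the dictionary keys are hypothetical; `statistics.stdev` uses the n-1 denominator, as in [A1.10], and needs at least two revisions.

```python
from statistics import mean, median, stdev


def revision_summary(revisions: list[float]) -> dict[str, float]:
    """Size, direction and variability indicators for a series of revisions."""
    n = len(revisions)
    abs_rev = [abs(r) for r in revisions]
    return {
        "MeAR": median(abs_rev),                      # [A1.6]
        "MaxAR": max(abs_rev),                        # [A1.7]
        "MR": mean(revisions),                        # [A1.8]
        "pct_positive": 100 * sum(r > 0 for r in revisions) / n,
        "pct_negative": 100 * sum(r < 0 for r in revisions) / n,
        "MeR": median(revisions),                     # [A1.9]
        "SDR": stdev(revisions),                      # [A1.10]
        "Range": max(revisions) - min(revisions),     # [A1.11]
    }
```

Calling `revision_summary([2.0, -1.0, 1.0, 0.5, 2.5])` on these made-up revisions, for instance, gives a mean revision of 1.0 and a range of 3.5.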


Indicators of the skewness of revisions can also be considered, in order to get indications on the

shape of the revisions distribution around the median value.

Impact of revisions on sign of growth rates

It may be of interest to look at how often the last estimates have the opposite sign to the preliminary estimates. This issue may be important in periods of low economic growth, when growth rates lie around zero and revisions may change the sign of the growth rates. To measure this aspect, one can refer to the % of observations for which the final and the earlier estimates have the same sign.
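A minimal sketch of this indicator, assuming the early and later growth rates are held in two equally long lists, could be the following. The function name `pct_same_sign` is hypothetical, and treating a growth rate of exactly zero as a sign of its own is an assumption.

```python
def pct_same_sign(early: list[float], later: list[float]) -> float:
    """% of periods in which the early and later growth rates share a sign."""
    def sign(x: float) -> int:
        # Returns 1, 0 or -1; zero is treated as its own category (assumption).
        return (x > 0) - (x < 0)

    same = sum(sign(e) == sign(l) for e, l in zip(early, later))
    return 100 * same / len(later)
```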


Appendix 3. Contribution of admin data and survey data to revisions

The issue tackled here is how to measure the impact on the total revision of the change in the availability of admin data between the preliminary and the last estimate. Ideally, we would like to decompose the revision into a part due to admin data and a part due to survey data. When the final estimate is released, provided that the databases and the information systems allow the estimate to be reconstructed by combining different versions of the data, one can decompose the impact of updating the different sources of data.

In formulae, let us write the last estimate and the preliminary estimate very generally as:

$$Y_t^l = g(s^l, a^l)$$ [A2.1]

$$Y_t^p = g(s^p, a^p)$$ [A2.2]

that is, as a function of the survey (s) and admin (a) data available at the deadline of the last (l) and the preliminary (p) estimate, respectively. The function g depends on the specific estimator.

In order to isolate the contribution of each of the two data sources, one can build an estimate simulating the situation in which the survey data are preliminary and the admin data are final. We will refer to this kind of estimate as the counterfactual estimate:

$$Y_t^c = g(s^p, a^l)$$ [A2.3]

Using this counterfactual estimate, the revision of the level can be decomposed as:

$$Y_t^l - Y_t^p = (Y_t^l - Y_t^c) + (Y_t^c - Y_t^p)$$ [A2.4]

where the first term represents the revision due to the survey data and the second term the one due to the admin data (see footnote 1).

Dividing [A2.4] by $Y_{t-12}^l$ gives a decomposition of the revision of the growth rate:

$$\frac{Y_t^l - Y_t^p}{Y_{t-12}^l} = \frac{Y_t^l - Y_t^c}{Y_{t-12}^l} + \frac{Y_t^c - Y_t^p}{Y_{t-12}^l}$$ [A2.5]

The specific form of [A2.5] to be used in each situation depends on the estimator used by each country. The contribution of admin data to the estimate, in fact, depends on the role that they play in the estimator. For instance, in Italy, after the micro imputation, the admin data are simply enumerated to get the estimate for small and medium enterprises (Baldi et al. 2011, 2012). In the regression estimator of Lithuania, used for the small and medium enterprises, the admin data instead play the role of auxiliary variable (Kavaliauskiene 2011) (see footnote 2). In the UK tests the admin data are used after a correction (Orchard et al. 2011, Teneva 2012).

Footnote 1: Alternatively, the counterfactual estimate can be defined as one where the survey data are final and the admin data are preliminary. In this case [A2.3] becomes $Y_t^{ac} = g(s^l, a^p)$ and accordingly [A2.4] becomes $Y_t^l - Y_t^p = (Y_t^l - Y_t^{ac}) + (Y_t^{ac} - Y_t^p)$, where the first term represents the revision due to the admin data and the second term the one due to the survey data.

Footnote 2: Here the estimate of Lithuania is somewhat simplified: it is not considered that, in reality, the estimate for the smallest enterprises is obtained through the survey and an HT estimator.
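The decomposition in [A2.1] to [A2.5] can be sketched for a generic estimator g as below. The names `decompose_revision` and `sum_estimator` are hypothetical, and the simple enumeration estimator is only an illustrative stand-in for whatever estimator a country actually uses.

```python
from typing import Callable, Sequence

# g takes the available survey data and admin data and returns a level estimate.
Estimator = Callable[[Sequence[float], Sequence[float]], float]


def decompose_revision(
    g: Estimator,
    s_prelim: Sequence[float],
    a_prelim: Sequence[float],
    s_last: Sequence[float],
    a_last: Sequence[float],
    last_level_t12: float,
) -> tuple[float, float]:
    """Return (survey part, admin part) of the growth-rate revision, as in [A2.5]."""
    y_last = g(s_last, a_last)         # [A2.1]
    y_prelim = g(s_prelim, a_prelim)   # [A2.2]
    y_counter = g(s_prelim, a_last)    # [A2.3]: preliminary survey, final admin
    survey_part = (y_last - y_counter) / last_level_t12
    admin_part = (y_counter - y_prelim) / last_level_t12
    return survey_part, admin_part


def sum_estimator(survey: Sequence[float], admin: Sequence[float]) -> float:
    """Illustrative g: plain enumeration of both sources (an assumption)."""
    return sum(survey) + sum(admin)
```

By construction the two parts add up to the total growth-rate revision, whatever g is, because the counterfactual estimate cancels out in the sum.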


In the following, the specific formulation for the cases of quasi complete data and regression

estimator are presented. Since the source of data and the methodology used is often different for the

population of large and small and medium enterprises it is important to identify the two estimates

separately. We will refer to the largest enterprises with the subscript LE and to the small and

medium enterprises with the subscript SME.

Quasi complete data and mixed mode enumeration.

In this situation, as for the Istat employment indicators (Baldi et al. 2011, 2012) and the SE and Destatis turnover indicators (Lorenz 2011), the estimate of Y can be obtained as the sum of the estimates obtained from survey data for the large enterprises and from admin data for the small and medium enterprises:

$$Y_t = Y_{LE,t}^{s} + Y_{SME,t}^{a}$$ [A2.6]

The preliminary estimate and the last estimate are expressed accordingly as:

$$Y_t^p = Y_{LE,t}^{sp} + Y_{SME,t}^{ap}$$ [A2.7]

$$Y_t^l = Y_{LE,t}^{sl} + Y_{SME,t}^{al}$$ [A2.8]

The revision of the growth rate, $\dot{R}_t$, can be decomposed into the part due to the administrative source and the part due to the survey source. Dropping the subscripts LE and SME, which are implied by the superscripts s and a:

$$\dot{R}_t = \frac{(Y_t^{sl} - Y_t^{sp}) + (Y_t^{al} - Y_t^{ap})}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.9]

By rewriting it in the following form:

$$\dot{R}_t = \frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl} + Y_{t-12}^{al}} + \frac{Y_t^{al} - Y_t^{ap}}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.10]

the first term represents the contribution of the revision of the survey part to the growth rate, while the second represents the contribution of the admin data part. In this case the two terms also correspond to the contributions of large and small/medium enterprises to the total.

A further step is the representation in terms of the growth rates of the two components. Multiplying and dividing the previous expression by $Y_{t-12}^{sl}$ and concentrating on the part due to survey data, we can write:

$$\frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl} + Y_{t-12}^{al}} \cdot \frac{Y_{t-12}^{sl}}{Y_{t-12}^{sl}} = \frac{Y_t^{sl} - Y_t^{sp}}{Y_{t-12}^{sl}} \cdot \frac{Y_{t-12}^{sl}}{Y_{t-12}^{sl} + Y_{t-12}^{al}}$$ [A2.11]

that is, $w_{t-12}^{s}\,\dot{R}_t^{s}$, where $\dot{R}_t^{s} = (Y_t^{sl} - Y_t^{sp}) / Y_{t-12}^{sl}$ is the revision of the growth rate of the survey part and $w_{t-12}^{s} = Y_{t-12}^{sl} / (Y_{t-12}^{sl} + Y_{t-12}^{al})$ represents the weight of the survey part in the final estimate.

The same operation can be performed on the part of the estimate due to admin data, so that the total revision error can be written as:

$$\dot{R}_t = w_{t-12}^{s}\,\dot{R}_t^{s} + w_{t-12}^{a}\,\dot{R}_t^{a}$$ [A2.12]

This approach is valid for those cases where admin data are used as the target variable and no correction is required.
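A small numerical sketch of the weighted decomposition in [A2.12] follows. The function name and all figures are hypothetical; the weights are taken as the shares of each part in the last estimate of month t-12, as implied by [A2.11].

```python
def weighted_decomposition(
    s_last: float, s_prelim: float,
    a_last: float, a_prelim: float,
    s_last_t12: float, a_last_t12: float,
) -> tuple[float, float]:
    """Return the survey and admin contributions to the growth-rate revision."""
    total_t12 = s_last_t12 + a_last_t12
    w_s = s_last_t12 / total_t12           # weight of the survey part
    w_a = a_last_t12 / total_t12           # weight of the admin part
    r_s = (s_last - s_prelim) / s_last_t12 # growth-rate revision, survey part
    r_a = (a_last - a_prelim) / a_last_t12 # growth-rate revision, admin part
    return w_s * r_s, w_a * r_a
```

With a survey part of 800 and an admin part of 200 in month t-12, a survey revision from 810 to 820 and an admin revision from 204 to 210 contribute 0.010 and 0.006 respectively, which together equal the total revision (1030 - 1014) / 1000.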


Regression estimator

For a regression estimator, such as the one used by Statistics Lithuania to estimate income for retail trade (Kavaliauskiene 2011), the estimate of the target variable results from a mixed approach in which the estimate for small and medium enterprises (SME) is added to the figure for large enterprises (LE), that is:

$$Y_t = Y_{LE,t} + Y_{SME,t} \qquad \text{[A2.13]}$$

where the estimate for small and medium enterprises is obtained with a regression estimator (for simplicity we omit here the subscript SME and the index s that indicates the survey sample):

$$Y_t = \tilde{Y}_t + \hat{\beta}_t \, (X_t - \tilde{X}_t) \qquad \text{[A2.14]}$$

where

$$\tilde{Y}_t = \sum_{i \in s} d_{it} \, y_{it} \qquad \text{[A2.15]}$$

is the Horvitz-Thompson estimator of the target total and $d_{it}$ are the direct weights,

$$\tilde{X}_t = \sum_{i \in s} d_{it} \, x_{it} \qquad \text{[A2.16]}$$

is the Horvitz-Thompson estimator of the auxiliary total,

$$X_t = \sum_{i \in L} x_{it} \qquad \text{[A2.17]}$$

is the total of the auxiliary variable over the target population list $L$, and $\hat{\beta}_t$ is the weighted least squares estimator obtained by regressing $y_{it}$ on $x_{it}$ over the sample $s$. Note that the formula for $\hat{\beta}_t$ includes the direct weights and possibly a term for heteroskedasticity.

Columns 1 and 2 of Table A2.1 introduce the notation for the preliminary estimate and the last estimate; column 3 gives the counterfactual estimate discussed below.

Table A2.1. Regression estimator for the preliminary, last and counterfactual estimates for small enterprises

| Preliminary estimate | Last estimate | Counterfactual estimate |
|---|---|---|
| $Y_t^p = \tilde{Y}_t^p + \hat{\beta}_t^p (X_t^p - \tilde{X}_t^p)$ | $Y_t^l = \tilde{Y}_t^l + \hat{\beta}_t^l (X_t^l - \tilde{X}_t^l)$ | $Y_t^c = \tilde{Y}_t^p + \hat{\beta}_t^c (X_t^c - \tilde{X}_t^c)$ |
| $\tilde{Y}_t^p = \sum_{s^p} d_{it} \, y_{it}^p$ | $\tilde{Y}_t^l = \sum_{s^l} d_{it} \, y_{it}^l$ | $\tilde{Y}_t^p = \sum_{s^p} d_{it} \, y_{it}^p$ |
| $\tilde{X}_t^p = \sum_{s^p} d_{it} \, x_{it}^p$ | $\tilde{X}_t^l = \sum_{s^l} d_{it} \, x_{it}^l$ | $\tilde{X}_t^c = \sum_{s^p} d_{it} \, x_{it}^l$ |
| $X_t^p = \sum_{L_t^p} x_{it}^p$ | $X_t^l = \sum_{L_t^l} x_{it}^l$ | $X_t^c = \sum_{L_t^p} x_{it}^l$ |
| $\hat{\beta}_t^p$, obtained on $s^p$ by regressing $y_{it}^p$ on $x_{it}^p$ | $\hat{\beta}_t^l$, obtained on $s^l$ by regressing $y_{it}^l$ on $x_{it}^l$ | $\hat{\beta}_t^c$, obtained on $s^p$ by regressing $y_{it}^p$ on $x_{it}^l$ |
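As an illustration of [A2.14] to [A2.17], the following minimal Python sketch computes a regression estimate for a toy sample. The data, the function name `regression_estimate`, the optional variance term `v` and the choice of a no-intercept weighted least squares slope are assumptions made for the example; they are not the specification used by Statistics Lithuania.

```python
def regression_estimate(y, x, d, x_population, v=None):
    """Regression estimate of a total: HT(y) + beta * (X - HT(x)).

    y, x         : sample values of the target and the auxiliary variable
    d            : direct (design) weights of the sample units
    x_population : auxiliary values for every unit on the population list L
    v            : optional variance terms for heteroskedasticity (assumed form)
    """
    if v is None:
        v = [1.0] * len(y)
    y_ht = sum(di * yi for di, yi in zip(d, y))   # Horvitz-Thompson total, [A2.15]
    x_ht = sum(di * xi for di, xi in zip(d, x))   # Horvitz-Thompson total, [A2.16]
    x_tot = sum(x_population)                     # total over the list L, [A2.17]
    # Weighted least squares slope through the origin (an assumption).
    num = sum(di * xi * yi / vi for di, xi, yi, vi in zip(d, x, y, v))
    den = sum(di * xi * xi / vi for di, xi, vi in zip(d, x, v))
    beta = num / den
    return y_ht + beta * (x_tot - x_ht), beta     # [A2.14]


# Toy data: 3 sampled units out of a population list of 6 units.
y_s = [10.0, 20.0, 30.0]
x_s = [5.0, 10.0, 15.0]
d_s = [2.0, 2.0, 2.0]
x_L = [5.0, 10.0, 15.0, 6.0, 9.0, 18.0]

estimate, beta = regression_estimate(y_s, x_s, d_s, x_L)
print(f"beta = {beta:.3f}, regression estimate = {estimate:.1f}")
```

The correction term is positive here because the auxiliary total on the list exceeds its Horvitz-Thompson estimate from the sample.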


In Table A2.1 the superscripts p and l indicate, as usual, the information available for the preliminary and the last estimate. Thus $s^p$, $y_{it}^p$, $L_t^p$, $x_{it}^p$ and $\hat{\beta}_t^p$ indicate, respectively, the sample of respondents, the value of the target variable, the register list that identifies the target population, the auxiliary variable coming from the admin data, and the regression coefficient available at the preliminary estimate deadline. The terms $s^l$, $y_{it}^l$, $L_t^l$, $x_{it}^l$ and $\hat{\beta}_t^l$ have the analogous meaning for the final estimate.

In the case of Statistics Lithuania (SL), $x_{it}^p$, the admin data available for the preliminary estimate, are actually the VAT data referring to the previous month, while $x_{it}^l$ are the admin data referring to the current month. The notation may seem confusing, since the subscript t is used in both cases. However, we keep the subscript t to cover the more general case of a country that has current information from admin data available and chooses to use a regression estimator to adjust for definitional or other measurement-related issues.

To measure the contribution of the change in admin data to the revision, the counterfactual estimate can be built using all the information available at the time of the preliminary estimate, with the exception of the admin data (used as auxiliary variables), which have to refer to the time of the last estimate.

In practice, the regression coefficient $\hat{\beta}_t^c$ can be obtained by regressing $y_{it}^p$ on $x_{it}^l$ over the sample $s^p$, while the total $X_t^c$ is calculated by summing $x_{it}^l$ over the list defined by the register used for the preliminary estimate, $L^p$, and the Horvitz-Thompson estimate $\tilde{X}_t^c$ is calculated over the sample $s^p$.³
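The construction of the preliminary, last and counterfactual estimates can be sketched in Python as follows. This is only an illustration under stated assumptions: the unit identifiers, the toy values, the function names and the no-intercept weighted least squares slope are all hypothetical, and superscripts p and l are written as suffixes `_p` and `_l`.

```python
def wls_slope(y, x, d):
    """No-intercept weighted least squares slope (an assumed form of beta)."""
    num = sum(w * a * b for w, a, b in zip(d, x, y))
    den = sum(w * a * a for w, a in zip(d, x))
    return num / den


def regression_estimate(sample, y, x, d, frame):
    """HT(y) + beta * (X - HT(x)) for the units in `sample` and the list `frame`.

    y, x and d are dicts keyed by unit id; x must cover every unit in `frame`.
    """
    ys = [y[i] for i in sample]
    xs = [x[i] for i in sample]
    ds = [d[i] for i in sample]
    y_ht = sum(w * v for w, v in zip(ds, ys))
    x_ht = sum(w * v for w, v in zip(ds, xs))
    x_tot = sum(x[i] for i in frame)
    return y_ht + wls_slope(ys, xs, ds) * (x_tot - x_ht)


# Hypothetical toy data.
frame_p = ["a", "b", "c", "d"]          # register list at the preliminary estimate
frame_l = ["a", "b", "c", "d", "e"]     # updated register list at the last estimate
s_p = ["a", "b"]                        # respondents at the preliminary deadline
s_l = ["a", "b", "c"]                   # respondents at the last estimate
d_p = {"a": 2.0, "b": 2.0}
d_l = {"a": 5 / 3, "b": 5 / 3, "c": 5 / 3}
y_p = {"a": 10.0, "b": 22.0}
y_l = {"a": 10.0, "b": 21.0, "c": 33.0}
x_p = {"a": 5.0, "b": 10.0, "c": 14.0, "d": 7.0}            # earlier admin data
x_l = {"a": 5.5, "b": 10.5, "c": 15.0, "d": 8.0, "e": 4.0}  # current admin data

y_hat_p = regression_estimate(s_p, y_p, x_p, d_p, frame_p)
y_hat_l = regression_estimate(s_l, y_l, x_l, d_l, frame_l)
# Counterfactual: everything as at the preliminary estimate, except that the
# auxiliary variable is taken from the admin data of the last estimate.
y_hat_c = regression_estimate(s_p, y_p, x_l, d_p, frame_p)

print(f"preliminary={y_hat_p:.2f} last={y_hat_l:.2f} counterfactual={y_hat_c:.2f}")
```

Only the argument `x` changes between the preliminary and the counterfactual call, which is what isolates the effect of the admin data.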

Reintroducing the estimate of the largest enterprises, the revision of the level can thus be decomposed as follows:

$$
\begin{aligned}
Y_t^l - Y_t^p &= (Y_t^l - Y_t^c) + (Y_t^c - Y_t^p) \\
&= [(Y_{LE,t}^{ls} + Y_{SME,t}^{ls}) - (Y_{LE,t}^{ps} + Y_{SME,t}^{cs})] + [(Y_{LE,t}^{ps} + Y_{SME,t}^{cs}) - (Y_{LE,t}^{ps} + Y_{SME,t}^{ps})]
\end{aligned}
\qquad \text{[A2.18]}
$$

where the first term represents the revision due to the change in the survey part (respondents, modifications of the values $y$, and the business register list used as the population frame, if it is changed between the first and the final estimate⁴) and the second term is due to the change in the admin data part.

The second line of the formula also makes it easy to separate the contribution of the large enterprises estimate from that of the small and medium enterprises estimate.

3 An alternative formulation of the preliminary, last and counterfactual estimates can be obtained starting from the representation of the regression estimator as a weighted sum of the sample units, where the weights are the product of the direct weights and the g weights.

4 Following the logic behind the decomposition, one might also want to isolate the contribution due to the updating of the register, if it enters the final estimate.
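The decomposition in [A2.18] can be checked numerically with a few lines of Python. The figures and the function name `decompose_revision` below are made up for illustration; the only point of the sketch is that the two components add up to the total revision of the level.

```python
def decompose_revision(prelim, last, counterfactual):
    """Split the level revision (last - prelim) into two components.

    survey_part : last - counterfactual   (respondents, y values, register list)
    admin_part  : counterfactual - prelim (change in the admin data only)
    """
    survey_part = last - counterfactual
    admin_part = counterfactual - prelim
    return survey_part, admin_part


# Hypothetical figures: large enterprises (LE) plus small and medium (SME).
le = {"p": 500.0, "l": 510.0}
sme = {"p": 76.96, "l": 83.10, "c": 78.25}

y_p = le["p"] + sme["p"]
y_l = le["l"] + sme["l"]
# The admin data enter only the SME regression estimator, so the LE part of
# the counterfactual is kept at its preliminary value, as in [A2.18].
y_c = le["p"] + sme["c"]

survey_part, admin_part = decompose_revision(y_p, y_l, y_c)
print(f"revision={y_l - y_p:.2f} survey={survey_part:.2f} admin={admin_part:.2f}")
```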


Appendix 4. SAS code for the calculation of the summary statistics on revisions and graphs

This section provides the SAS (v9.2) code to calculate the table of summary measures and the

graphs of estimates and revisions.

The input data should have a cross-sectional/time series form as follows:

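| year | month | domain | v1 | v2 | … | vn |
|---|---|---|---|---|---|---|
| 2009 | 1 | RTD | 2.5 | 2.5 | … | 0.6 |
| 2009 | 2 | RTD | -2.2 | … | … | -1.1 |
| … | … | … | … | … | … | … |
| 2010 | 1 | RTD | | | | |
| 2010 | 2 | RTD | | | | |
| … | … | … | … | … | … | … |
| 2009 | 1 | CON | 4.3 | … | … | 2.5 |
| 2009 | 2 | CON | 2.1 | … | … | 2.0 |
| … | … | … | … | … | … | … |
| 2010 | 1 | CON | | | | |
| 2010 | 2 | CON | | | | |
| … | … | … | … | … | … | … |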

where, for each time occurrence (year, month) and each target domain (RTD = retail trade, CON = construction, …), the successive vintages of the year-on-year growth rates are stored in variables labelled here v1, v2, …, vn.
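For readers who want to prototype outside SAS, the layout can be mirrored with one record per (year, month, domain) occurrence. The Python sketch below is illustrative only: it keeps just the first and last vintages, uses `None` for a missing value, and the helper `add_revision` is a hypothetical name. It computes the revision of the last vintage with respect to the first and a flag for agreement in sign.

```python
# One record per (year, month, domain); v1 and vn are two vintages of the
# year-on-year growth rate. None plays the role of a missing value.
rows = [
    {"year": 2009, "month": 1, "domain": "RTD", "v1": 2.5, "vn": 0.6},
    {"year": 2009, "month": 2, "domain": "RTD", "v1": -2.2, "vn": -1.1},
    {"year": 2009, "month": 1, "domain": "CON", "v1": 4.3, "vn": 2.5},
    {"year": 2009, "month": 2, "domain": "CON", "v1": 2.1, "vn": 2.0},
    {"year": 2010, "month": 1, "domain": "CON", "v1": None, "vn": None},
]


def add_revision(row, vi="v1", vj="vn"):
    """Add the revision vj - vi and a same-sign flag; skip missing vintages."""
    early, late = row[vi], row[vj]
    if early is None or late is None:
        row["rev"] = None
        row["same_sign"] = None
    else:
        row["rev"] = late - early
        row["same_sign"] = early * late > 0
    return row


rows = [add_revision(r) for r in rows]
for r in rows:
    print(r["year"], r["month"], r["domain"], r["rev"], r["same_sign"])
```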

The code is divided into three steps: 1) manual settings, 2) data management, 3) calculation and output of statistics and graphs.

In the manual settings, besides the input and output paths, three macro parameters are set: the input dataset and the two vintages to compare.

The code also provides an example of conditional formatting of the cells of the table of summary measures, which highlights the values exceeding predefined thresholds. This can be useful when the table contains several domains, in order to identify the problematic ones quickly. In the example, values above 0.3 and up to 1 are coloured yellow and those exceeding 1 are coloured red. This conditional formatting is applied to the RMAR. It should be stressed that this is just an example: different rules and thresholds may be used according to the choices of the analyst.
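The threshold rule can be written compactly as a small function. The Python sketch below reproduces the example thresholds (0.3 and 1); the function name `rmar_colour` and the handling of a missing value are assumptions made for the illustration.

```python
def rmar_colour(value):
    """Background colour for an RMAR cell under the example thresholds."""
    if value is None:      # missing value: no colour is assigned (assumption)
        return None
    if value <= 0.3:       # up to and including 0.3
        return "white"
    if value <= 1:         # above 0.3, up to and including 1
        return "yellow"
    return "red"           # above 1


for rmar in (0.1, 0.3, 0.5, 1.0, 1.7):
    print(rmar, rmar_colour(rmar))
```

The boundary values 0.3 and 1 fall in the lower class, which matches ranges that are closed on the right.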

The code produces the following outputs:

1) Report_&vi.xls, which contains the report on the summary statistics on revisions based on the vintages chosen for the comparison;

2) Graph_&vi.html, which contains the graphical analysis of the year-on-year growth rates.

/*Step 1 – Manual Setting*/

libname libIO "\\PATH WHERE READ THE INPUT(directory)";
filename outpath "\\PATH WHERE SAVE THE REPORTS AND THE GRAPHS(directory)";

proc format;
  value cback
    low - 0.3 = 'white'
    0.3 <- 1  = 'yellow'
    1 <- high = 'red';
run;

*datain is the name of the input sas dataset;
%let datain=dati;

*vj is the name of the y-on-y growth rate chosen as base (i.e. the last estimate);
*vi is the name of the y-on-y growth rate chosen as comparison (i.e. the first estimate);
%let vj=vn;
%let vi=v1;

/*Step 2 – Data management */

%macro prepare;

  /* Build a SAS date (first day of the month) from year and month */
  data &datain (drop= year month);
    set libIO.&datain;
    date=mdy(month,1,year);
    format date monyy7.;
  run;

  proc sort data=&datain;
    by domain date;
  run;

  /* Revision (base minus comparison vintage), its absolute value and sign flags */
  data &datain;
    set &datain;
    if (&vi gt .) then do;
      rev&vi._flgt0_vt=0;
      rev&vi._fllt0_vt=0;
      rev&vi._flsign_vt=0;
      &vi._flsign_vt=0;
      rev&vi=(&vj-&vi);
      revabs&vi=abs(rev&vi);
      &vi.abs=abs(&vi);
      /* Flags for upward and downward revisions */
      if rev&vi gt 0 then rev&vi._flgt0_vt=1;
      if .<rev&vi<0 then rev&vi._fllt0_vt=1;
      /* Flag set to 1 when the two vintages have the same sign */
      if (&vj gt 0 and &vi gt 0) or (.<&vj<0 and .<&vi<0) then &vi._flsign_vt=1;
    end;
  run;

%mend prepare;

/*Step 3 – Macro to produce a report on summary statistics on revisions and a graph analysis */

%macro revision;

  *Calculation of summary statistics on revisions;
  proc means data=&datain nway noprint;
    where rev&vi ne .;
    class domain;
    var &vj &vi.abs rev&vi revabs&vi rev&vi._flgt0_vt
        rev&vi._fllt0_vt &vi._flsign_vt date;
    output out=summary&vi
      n(rev&vi)=nrev mean(revabs&vi)=MAR mean(&vi.abs)=MAP4
      max(revabs&vi)=MaxAR
      median(revabs&vi)=MeAR mean(rev&vi)=MR mean(rev&vi._flgt0_vt)=Perc_gt0
      mean(rev&vi._fllt0_vt)=Perc_lt0
      median(rev&vi)=MeR std(rev&vi)=sdr min(rev&vi)=minrev max(rev&vi)=maxrev
      range(rev&vi)=range mean(&vi._flsign_vt)=Perc_signeq min(date)=mindate
      max(date)=maxdate;
  run;

  /* Relative mean absolute revision, percentages and reference period */
  data summary&vi (drop=_TYPE_ _freq_);
    set summary&vi;
    RMAR=MAR/MAP4;
    Perc_gt0=Perc_gt0*100;
    Perc_lt0=perc_lt0*100;
    Perc_signeq=Perc_signeq*100;
    period=put(mindate,monyy7.)||'-'||put(maxdate,monyy7.);
  run;

  ods tagsets.excelxp path=outpath file="REPORT_&vi..xls"
    style=journal options(index='yes' embedded_titles='yes' embedded_footnotes='yes'
    sheet_interval='none');

  /*Report On Summary Measures Of Revisions*/
  proc report data=summary&vi nowd split='*';
    title 'Summary measures of revisions';
    col ('Domain' domain) ('N.*occurrences' nrev) ('Period' period)
        ('Size' MAR RMAR maxar mear)
        ('Direction' mr perc_gt0 perc_lt0 mer) ('Variability' sdr range)
        ('Impact on*growth rates*direction' perc_signeq);
    define domain /' ' style={tagattr="format:@"};
    define nrev /' ' display;
    define period /' ' display;
    define MAR/'MAR' display;
    /* Conditional background colour defined by the cback format in Step 1 */
    define RMAR/'RMAR' display style(column) ={background=cback.};
    define maxar/'MaxAR' display;
    define mear/'MeAR' display;
    define mr/'MR' display;
    define perc_gt0/ '% > 0' display;
    define perc_lt0/'% < 0' display;
    define mer/'MeR' display;
    define sdr/'SDR' display;
    define range/ 'Range' display;
    define perc_signeq/ '% Sign(Later)*=Sign(Early)' display;
    format MAR RMAR maxar mear mr perc_gt0 perc_lt0 mer sdr range perc_signeq 8.1;
  run;

  ods tagsets.excelxp close;

  *Graph Analysis;
  proc sort data=&datain;
    by domain date;
  run;

  ods html path=outpath file="GRAPH_&vi..html" style=journal;

  /* One plot per domain: the two vintages as lines, the revision as bars */
  proc sgplot data=&datain;
    title "Growth rates: base estimate (&vj), compare estimate (&vi) and revision (&vj minus &vi)";
    where &vi ne .;
    by domain;
    vline date/ response=&vi lineattrs=(color=red) legendlabel="&vi";
    vline date/response=&vj lineattrs=(color=blue) legendlabel="&vj";
    vbar date / response=rev&vi fillattrs=(color=lightgrey) barwidth=.4 legendlabel='Rev.';
    yaxis label=' ';
    xaxis label=" " grid interval=month fitpolicy=thin;
    format date monyy7.;
  run;

  ods html close;

%mend revision;

%prepare;

%revision;
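As a cross-check outside SAS, the summary measures computed in Step 3 can be reproduced on a small example. The Python sketch below is only illustrative: the function name `revision_summary` and the toy growth rates are assumptions, and the dictionary keys simply reuse the labels of the report. As in the SAS code, the RMAR divides the mean absolute revision by the mean absolute value of the comparison (early) vintage, and the standard deviation uses the n-1 denominator.

```python
from statistics import mean, median, stdev


def revision_summary(early, late):
    """Summary measures of the revisions late - early for one domain.

    Pairs in which either vintage is missing (None) are skipped.
    """
    pairs = [(e, l) for e, l in zip(early, late)
             if e is not None and l is not None]
    rev = [l - e for e, l in pairs]
    abs_rev = [abs(r) for r in rev]
    n = len(rev)
    mar = mean(abs_rev)
    return {
        "n": n,
        "MAR": mar,
        "RMAR": mar / mean([abs(e) for e, _ in pairs]),
        "MaxAR": max(abs_rev),
        "MeAR": median(abs_rev),
        "MR": mean(rev),
        "MeR": median(rev),
        "SDR": stdev(rev),
        "Range": max(rev) - min(rev),
        "Perc_gt0": 100 * sum(r > 0 for r in rev) / n,
        "Perc_lt0": 100 * sum(r < 0 for r in rev) / n,
        # Share of occurrences in which both vintages have the same sign.
        "Perc_signeq": 100 * sum(e * l > 0 for e, l in pairs) / n,
    }


# Toy vintages for one domain (hypothetical values).
first_estimate = [2.5, -2.2, 1.0, None]
last_estimate = [0.6, -1.1, -0.5, 3.0]

summary = revision_summary(first_estimate, last_estimate)
for name, value in summary.items():
    print(f"{name:12s} {value:8.2f}")
```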