Study of Editing and Imputation Practices at Statistics Finland Janika Konnu and Pauli Ollila...


Study of Editing and Imputation Practices at Statistics Finland

Janika Konnu and Pauli Ollila, Statistics Finland

Q2010: Editing session, Wednesday 5th of May, 11.00-12.30


Editing Project of Statistics Finland

5 May 2010, Janika Konnu, Pauli Ollila

INTERNAL E&I STUDY OF STATFI

EXTERNAL E&I STUDY

DEVELOPMENTAL WORK FOR THE NEEDS OF STATFI

INFORMATION AND EDUCATION

A two-year development project. Targets: to provide good E&I practices, help make the production of statistics more effective, improve quality, reduce workload and save costs.


Internal E&I Study of StatFi

Forms the basis for the work of the project.

Describes the current E&I situation at StatFi.

Reveals points where the developmental resources should be allocated in later phases of the project.


INTERNAL E&I STUDY OF STATFI

SURVEY OF E&I PRACTICES AT STATFI

DETAILED STUDIES OF E&I IN SOME STATISTICS

OTHER STUDIES (e.g. auditing reports)

Part 1: Pauli Ollila; Part 2: Janika Konnu


Survey of E&I Practices at StatFi

Conducted in January 2010 using a web questionnaire. Directed to all statistics of StatFi, providing information on all relevant statistics (exceptions: statistics that had been discontinued, were about to be discontinued, were in transition, etc.).

Equivalence = one response also covers one or more other statistics


SURVEY OF E&I PRACTICES AT STATFI

STATISTICS DEPARTMENT    RESPONSES   EQUIVALENCES   STATISTICS IN ALL
Population Statistics           34             17                  51
Social Statistics               18              0                  18
Prices and Wages                17              7                  24
Economic Statistics             20             11                  31
Business Trends                 20              4                  24
Business Structures             25             12                  37
ALL                            134             51                 185


Topics of E&I Survey

The survey tried to cover all important aspects of editing and imputation.

The question pattern was reviewed and tested by E&I and survey experts together with subject-matter staff.

The structure allowed open comments on every page, which proved to be a very valuable asset.


SURVEY OF E&I PRACTICES AT STATFI

SURVEYS, REGISTERS, SOURCE DATA

DATA COLLECTION METHODS

PRELIMINARY OPERATIONS

ERROR RECOGNITION PRACTICES

MISSING VALUE PRINCIPLES

ERROR CORRECTION AND IMPUTATION

REPORTING, DATA ARCHIVING


Analysing and Utilising the Results


SURVEY OF E&I PRACTICES AT STATFI

DATABASE OF PRACTICES IN STATISTICS

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS

MAKING “STATISTICS TYPES” BY COMMON PRACTICES

STUDYING E&I PROCESSES (strings of practices, descriptions)

PROVIDES A GOOD BASIS FOR THE DEVELOPMENTAL WORK OF THE EDITING PROJECT

VALUABLE INFORMATION FOR THE PLANS OF STATISTICS DEPARTMENTS AND OTHER PARTIES


Example 1: Work time spent on editing and imputation in statistics (number of statistics by share of work time, %)


STATISTICS DEPARTMENT   Missing    0-10   11-20   21-30   31-40   41-50   51-60   61-70   71-80     ALL
Population Statistics         2      23       7       1       4       4       1       0       9      51
Social Statistics             0       9       4       1       2       0       2       0       0      18
Prices and Wages              1      11       3       2       3       1       1       2       0      24
Economic Statistics           8       8       4       2       0       3       3       0       3      31
Business Trends               0      11       3       2       2       0       1       5       0      24
Business Structures           2      13       1       5       2       1       0       1      12      37
ALL                          13      75      22      13      13       9       8       8      24     185

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS


Example 2: Type of data in making statistics at Statistics Finland


STATISTICS DEPARTMENT     SUR   REG   SOU  SUR+REG  SUR+SOU  REG+SOU  SUR+REG+SOU   ALL
Population Statistics       0    12     7        4        4        9           15    51
Social Statistics           1     2     2       10        1        1            1    18
Prices and Wages            0     1     1        2       12        0            8    24
Economic Statistics         4     0     8        4        4        1           11    32
Business Trends             0     2     0       10        0        1            9    22
Business Structures         4     1     2        4        1        5           21    38
ALL                         9    18    20       34       22       17           65   185

SUR = survey, REG = register, SOU = source data; combined codes (e.g. SUR+REG) denote statistics that use several types of data.

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS


Example 3: Technical editing at the unit level

Statistics with unit-level processing; number of statistics using each practice. Column order: Pop.Stat. (44), Soc.Stat. (15), Pric. & Wages (22), Econ.Stat. (24), Busin.Trends (22), Busin.Struct. (32), ALL (159)

Unit-level examination with a computer: 19 10 17 23 18 29 116

Logical checks using a program or otherwise: 37 13 8 21 13 25 117

Defining non-valid variable values: 31 12 8 14 11 19 95

Listing extreme values of variables: 13 11 9 10 11 24 78

Comparing with previous or other values: 34 10 14 22 13 23 116

Ratio of values of two variables or different time points, other functions: 16 8 5 13 4 19 65

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS
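The unit-level checks listed above (non-valid values, logical checks between variables, comparison with previous values or ratios, and listing extreme values) can be illustrated with a minimal Python sketch. The record layout, the variable names and the ratio limits 0.5 and 2.0 are hypothetical assumptions chosen for illustration. The slides do not describe how StatFi implements these checks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical business-survey record, used only for illustration."""
    unit_id: str
    employees: Optional[int]
    wages: Optional[float]
    turnover: Optional[float]
    prev_turnover: Optional[float]  # value from the previous occasion


def unit_checks(rec: Record, ratio_low: float = 0.5, ratio_high: float = 2.0) -> List[str]:
    """Return the names of the checks that this record fails (assumed rules)."""
    failed = []
    # Non-valid value: a negative head count is never acceptable.
    if rec.employees is not None and rec.employees < 0:
        failed.append("invalid_employees")
    # Logical check between variables: wages paid imply at least one employee.
    if rec.wages is not None and rec.wages > 0 and rec.employees == 0:
        failed.append("wages_without_employees")
    # Ratio check against the previous occasion; skipped if the previous value is missing or zero.
    if rec.turnover is not None and rec.prev_turnover:
        ratio = rec.turnover / rec.prev_turnover
        if not ratio_low <= ratio <= ratio_high:
            failed.append("turnover_ratio")
    return failed


def list_extremes(records: List[Record], k: int = 3) -> List[str]:
    """List the k units with the largest reported turnover for manual review."""
    reported = [r for r in records if r.turnover is not None]
    reported.sort(key=lambda r: r.turnover, reverse=True)
    return [r.unit_id for r in reported[:k]]
```

Flagged records would then pass to one of the treatment types, such as contacting the respondent or correcting by reasoning.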


Example 4: Model editing at the unit level

Statistics with unit-level processing; number of statistics using each practice. Column order: Pop.Stat. (44), Soc.Stat. (15), Pric. & Wages (22), Econ.Stat. (24), Busin.Trends (22), Busin.Struct. (32), ALL (159)

Defining how certain each of the conflicting variables is to be correct (reliability weights, Fellegi-Holt minimum-change principle): 6 3 2 0 6 0 17

Comparing modelled and observed values: 0 1 4 8 1 1 15

Modelling the risk that variable values or observations are erroneous (e.g. selective editing): 1 1 1 0 0 0 3

Finding problematic values by defining the importance of the observation, or a so-called sensitivity function (which reveals the effect of the observation on the estimate): 0 5 12 0 7 6 30

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS
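The importance or sensitivity idea in the last row can be sketched with a simple local score of the kind commonly used in selective editing. The score is the weighted deviation of the observed value from an anticipated value, divided by an estimated total. Only units above a threshold are edited manually. This is a minimal sketch under stated assumptions. The score formula, the use of an anticipated value (for example the previous value), the threshold and all names are illustrative and are not taken from StatFi practice.

```python
from typing import Dict, List


def local_scores(
    observed: Dict[str, float],
    anticipated: Dict[str, float],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """score_i = w_i * |y_i - anticipated_i| / (weighted total of anticipated values)."""
    # Reference total built from anticipated values, so a single erroneous
    # observation cannot inflate the denominator.
    total = sum(weights[u] * anticipated[u] for u in anticipated)
    return {u: weights[u] * abs(observed[u] - anticipated[u]) / total for u in observed}


def select_for_review(scores: Dict[str, float], threshold: float) -> List[str]:
    """Units above the threshold, most influential first; the rest are accepted as they are."""
    above = [u for u, s in scores.items() if s > threshold]
    return sorted(above, key=lambda u: scores[u], reverse=True)
```

A small deviation in a heavily weighted unit can outrank a large deviation in a unit with little weight. This is the point of scoring by effect on the estimate rather than by the size of the error alone.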


Example 5: Macro editing

Statistics with unit-level processing; number of statistics using each practice. Column order: Pop.Stat. (44), Soc.Stat. (15), Pric. & Wages (22), Econ.Stat. (24), Busin.Trends (22), Busin.Struct. (32), ALL (159)

Studying distributions and cross-tabulations: 32 15 6 15 6 23 97

Information from calculating preliminary estimates (e.g. mean, total, correlation, deviation): 23 14 10 15 7 26 95

Controlling the joint effect of survey weights and exceptional values: 0 5 4 0 1 5 15

Comparing with estimates from previous occasion(s), valid limits for estimates (e.g. time series): 15 11 15 18 10 26 95

Using graphical methods: 8 8 5 13 7 15 56

Studying aggregated data: 25 6 19 19 17 28 114

Comparing with other available data: 28 10 8 18 7 27 98

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS
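Several of the macro-editing practices above can be combined in a short sketch: preliminary estimates, comparison with the previous occasion, and the joint effect of survey weights and exceptional values. The sketch computes weighted domain totals and flags domains whose estimate changed by more than an assumed 20 % since the previous occasion. It then lists the units that contribute most to a flagged total. The row layout, the domain names and the 20 % limit are hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each row: (unit_id, domain, survey_weight, value); the layout is assumed for illustration.
Row = Tuple[str, str, float, float]


def weighted_totals(rows: List[Row]) -> Dict[str, float]:
    """Preliminary estimate of the total in each domain."""
    totals: Dict[str, float] = defaultdict(float)
    for _, domain, weight, value in rows:
        totals[domain] += weight * value
    return dict(totals)


def flag_domains(current: Dict[str, float], previous: Dict[str, float],
                 max_rel_change: float = 0.2) -> List[str]:
    """Domains whose estimate changed by more than max_rel_change since the previous occasion."""
    flagged = []
    for domain, estimate in current.items():
        prev = previous.get(domain)
        # A missing or zero previous estimate cannot be compared, so the domain is flagged.
        if not prev or abs(estimate - prev) / abs(prev) > max_rel_change:
            flagged.append(domain)
    return sorted(flagged)


def top_contributors(rows: List[Row], domain: str, k: int = 1) -> List[str]:
    """Units with the largest weighted contribution (weight * value) in a domain."""
    in_domain = [r for r in rows if r[1] == domain]
    in_domain.sort(key=lambda r: r[2] * r[3], reverse=True)
    return [r[0] for r in in_domain[:k]]
```

Sorting units by weight times value shows whether a suspicious aggregate comes from an exceptional value, a large weight, or both.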


Example 6: Treatment types (not imputation)

Statistics with unit-level processing; number of statistics using each practice. Column order: Pop.Stat. (44), Soc.Stat. (15), Pric. & Wages (22), Econ.Stat. (24), Busin.Trends (22), Busin.Struct. (32), ALL (159)

Contacting the respondent and asking for the value, or taking it from the paper questionnaire of the postal enquiry: 27 5 17 20 16 30 115

Fetching the previous value (cold deck): 6 2 13 11 8 20 60

Getting the value from another observation or another source: 12 5 13 14 14 25 83

Deriving the true value by reasoning from the other information on the observation in question: 27 7 8 21 13 27 103

Correcting automatically with conditional program statements or with a list of erroneous values (e.g. ‘america’ = ‘United States’): 37 8 6 14 10 18 93

Correcting automatically based on risk functions (e.g. selective editing): 0 0 1 0 6 0 7

DISTRIBUTIONS OF PRACTICES AT VARIOUS LEVELS
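The two forms of automatic correction above can be sketched minimally: a list of known erroneous values and a conditional program rule. Only the ‘america’ = ‘United States’ pair comes from the slide. The extra list entry and the "reported in euros instead of thousands of euros" rule with its 10 % tolerance are hypothetical examples of a conditional correction.

```python
from typing import Dict, Optional

# Hypothetical correction list; only 'america' -> 'United States' comes from the slide.
COUNTRY_CORRECTIONS: Dict[str, str] = {
    "america": "United States",
    "usa": "United States",  # assumed additional entry
}


def correct_country(value: Optional[str]) -> Optional[str]:
    """List-based correction: replace known erroneous spellings, otherwise keep the value."""
    if value is None:
        return None
    cleaned = value.strip()
    return COUNTRY_CORRECTIONS.get(cleaned.lower(), cleaned)


def fix_thousands_error(value: float, previous: float, tolerance: float = 0.1) -> float:
    """Conditional rule (assumed): a value about 1000 times the previous one is taken as
    reported in euros instead of thousands of euros and is divided by 1000."""
    if previous > 0 and abs(value / (1000 * previous) - 1) <= tolerance:
        return value / 1000
    return value
```

Rules like these work without manual intervention. For that reason they are usually kept narrow, and the corrections they make are worth logging for later review.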


Example 1: Statistics with no unit-level processing


Collecting statistics draw on statistics and tabulations from several sources; after the information has been gathered, it is brought into the form the statistics require (6 statistics).

Strict processing statistics are based on one or more data sets (statistical data, external source data or registers), which are used strictly without changes to produce the statistics (9 statistics).

Calculation model statistics rely on existing, already edited data and/or tabulations or statistics, which are used to implement the mathematical or statistical calculation model that the statistics require (11 statistics).

MAKING “STATISTICS TYPES” BY COMMON PRACTICES


Example 2: Different ways of utilising statistics (i.e. estimates from other sources)


MAKING “STATISTICS TYPES” BY COMMON PRACTICES

Direct use of statistics: statistics (estimates) are fed straight into the process of making the statistics, or they first go through a standard treatment.

Additions and checks: statistics (estimates) are used for treating missing values and errors and/or for various checks.

Making expansion weights: statistics (estimates) and distributions are used to make weights that expand the results to the population level (e.g. calibration).

Index calculation

Account calculation

A part of calculating results: all purposes of using statistics (estimates) in calculating the results (excluding index and account calculation).
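Expansion weights built from known population figures can be illustrated with post-stratification, the simplest special case of calibration. Each respondent in group h receives the weight N_h / n_h, so weighted counts reproduce the known population counts. This is a minimal sketch. The groups and counts are hypothetical, and general calibration methods that adjust to several margins at once (e.g. raking) are not shown.

```python
from collections import Counter
from typing import Dict, List


def poststratified_weights(sample_groups: List[str], population_counts: Dict[str, int]) -> List[float]:
    """Weight for each respondent: N_h / n_h for its post-stratum h."""
    sample_counts = Counter(sample_groups)
    return [population_counts[h] / sample_counts[h] for h in sample_groups]


def expanded_total(values: List[float], weights: List[float]) -> float:
    """Expand sample values to the population level with the weights."""
    return sum(v * w for v, w in zip(values, weights))
```

For example, two respondents in a group of 1000 people each get the weight 500. A single respondent in a group of 1500 gets the weight 1500.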


Example 3: Types of data collection


MAKING “STATISTICS TYPES” BY COMMON PRACTICES

Only statistics with data collection; columns are statistics departments.

DATA COLLECTION TYPE                   Pop.Stat.  Soc.Stat.  Pri.Wag.  Econ.Stat.  Busin.Tren.  Busin.Struct.   ALL
Full Blaise-based data collection              0          7         0           2            1              0    10
Paper questionnaire collection only            0          1         0           0            0              0     1
Diary surveys                                  0          2         0           0            0              0     2
XCOLA-based data collection                    2          0         0           2            1              5    10
XCOLA and paper combination                    0          0         1           0            3              0     4
XCOLA and Excel combination                    0          0         8           2            3              2    15
Other web collection made in StatFi            0          0         1           0            6              0     7
Web collection via external server             5          1         3           4            1             19    33
Excel-based data delivery                      3          0         3           3            1              4    14
Other data delivery or transfer               10          0         2           3            3              0    18
ALL                                           20         11        18          16           19             30   114


Detailed interviews with statistics

Interviews with different types of statistics from the production and editing points of view

Informal discussions with 1-2 interviewers and 1-2 persons from the statistics in question

Reports finalised together with the interviewees and made available to everyone in StatFi


DETAILED STUDIES OF E&I IN SOME STATISTICS


DETAILED STUDIES OF E&I IN SOME STATISTICS

Most common methods for editing and imputation

Editing: deterministic checking rules; local checking; distributional checking; use of other sources or historical data

Imputation: manual; cold deck; average; hot deck; automatic imputation (checking lists)
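The imputation methods named above can be sketched minimally in Python. Cold deck takes the unit's value from an earlier occasion or another source. Average imputation fills gaps with the mean of the observed values. Hot deck borrows a value from a similar respondent in the current data. The imputation classes, the data layout and the random choice of donor are assumptions, because the slides do not say how donors are chosen at StatFi.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional


def cold_deck(unit: str, previous: Dict[str, float]) -> Optional[float]:
    """Cold deck: use the unit's own value from an earlier occasion or other source, if any."""
    return previous.get(unit)


def mean_impute(values: List[Optional[float]]) -> List[Optional[float]]:
    """Average imputation: fill gaps with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to average
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def hot_deck(values: List[Optional[float]], classes: List[str],
             rng: random.Random) -> List[Optional[float]]:
    """Random hot deck: fill each gap with a value drawn from a respondent in the same
    imputation class of the current data; gaps without any donor stay missing."""
    donors: Dict[str, List[float]] = defaultdict(list)
    for v, c in zip(values, classes):
        if v is not None:
            donors[c].append(v)
    return [
        rng.choice(donors[c]) if v is None and donors[c] else v
        for v, c in zip(values, classes)
    ]
```

Passing an explicit `random.Random` instance makes the donor draws reproducible, which helps when imputed data have to be regenerated or audited.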


DETAILED STUDIES OF E&I IN SOME STATISTICS

General impression of editing and imputation in StatFi

Usually the respondent is contacted again

Deduction is used whenever possible

Personnel have strong subject-matter knowledge and awareness of current events

Personnel are very interested in, and willing to work on, methodological improvements
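Deduction, mentioned above, is possible when an edit rule leaves only one admissible value. Here is a minimal sketch, assuming a balance edit in which reported components must sum to a reported total. If exactly one component is missing, the others determine it. The function name and data layout are hypothetical.

```python
from typing import List, Optional


def deduce_component(total: Optional[float],
                     components: List[Optional[float]]) -> List[Optional[float]]:
    """Fill the single missing component so that sum(components) == total."""
    missing = [i for i, c in enumerate(components) if c is None]
    if total is None or len(missing) != 1:
        return list(components)  # not deducible: leave for other treatment
    known = sum(c for c in components if c is not None)
    filled = list(components)
    filled[missing[0]] = total - known
    return filled
```

When two or more components are missing, the balance edit no longer fixes a unique value. Such cases fall back on contacting the respondent or on imputation.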
