Eurostat business process (data processing) & CVD October 2007.


Page 1

Eurostat business process (data processing) & CVD

October 2007

Page 2

Content

Shown today:
1. Eurostat business process (data processing) & CVD: main presentation
   1. Proposed business model
   2. Its correspondence to the CVD architecture

For reference:
1. List of sub-processes and sub-sub-processes
2. CVD modules and their relation to business sub-processes and sub-sub-processes
3. CVD modules: brief description, implementation modes and availability schedule

Page 3

Eurostat business process (data processing)

Legend:
• Process
• Sub-process (or sub-sub-process)
• Sub-process (or sub-sub-process) without a software development component

Page 4

Processes
1. Collect
2. Validate
3. Analyse
4. Disseminate
5. Manage meta-information

Notes:
1. Proposal for discussion
2. Pick and choose, mix and match
• There is no fixed execution order, although the numbering follows the typical order.

Page 5

CVD ARCHITECTURE

[Diagram elements: data files, EDAMIS, CVD MANAGER, TDS, statistical data and metadata Internet Portal, pre-treated data, validated data, processed data, reference environment data, three groups of BB, domain-specific software and DR / DL, MH, NUI, DL, ASSIST (user support).]

Page 6

Notes

• Each BB can be run in batch mode (within CVD) or in interactive mode (stand-alone)
• TDS, EDAMIS, NUI, MH and DL are (or will be) compulsory
• The BBs are a set of tools to mix and match
• Domain-specific software: procedures that are unique to a few statistical applications, for which the benefits of developing a generalised solution are considered nonexistent

Page 7

1. Collect
   1.1 Set up collection
   1.2 Run collection
   1.3 Load data
   1.4 Cooperate with providers
2. Validate
   2.1 Edit
   2.2 Detect & treat outliers
   2.3 Impute
   2.4 Derive new variables
   2.5 Integrate and load data
3. Analyse
   3.1 Acquire domain intelligence
   3.2 Produce statistics or indicators
   3.3 Check quality
   3.4 Interpret and explain
   3.5 Prepare tables for dissemination
4. Disseminate
   4.1 Produce products
   4.2 Manage customer queries
5. Manage meta-information

Page 8

[Diagram: the CVD architecture mapped to the business processes COLLECT, VALIDATE, ANALYSE, DISSEMINATE and MANAGE META-INFORMATION. Elements shown: data files, EDAMIS, CVD MANAGER, TDS, statistical data and metadata Internet Portal, pre-treated data, validated data, processed data, reference environment data, three groups of BB, domain-specific software and DR / DL, MH, NUI, DL, ASSIST (user support).]

Page 9

Summary of CVD modules & business process

Legend for the "+" marks in the table:
• Module especially designed for the sub-process
• Module designed for another sub-process, but usable for this sub-process as well if its functionalities are appropriate
• Modules that are used throughout many processes

Other uses may be possible in specific cases.

Page 10

Columns (CVD modules): TDS, CVD MANAGER, EDAMIS, Loader BB, Reader BB, Editing BB, Outliers BB, Imputation BB, Derivation BB, Economic indices BB, GSAST, Seasonal adj. BB, ANALYTICAL, Confidentiality BB, NUI, ASSIST, MH

Rows (business sub-processes); each "+" marks a module used for that sub-process:

1. COLLECT
1.1. Set up collection
1.2. Run collection
1.3. Load data +

2. VALIDATE
2.1. Edit + + + +
2.2. Detect and treat outliers + + + +
2.3. Impute + + + +
2.4. Derive + + +
2.5. Integrate and load data + +

3. ANALYSE
3.2. Produce statistics or indicators + + +
3.3. Check quality + + + + + + +
3.4. Interpret and explain + + + +
3.5. Prepare tables for dissemination + + + +

4. DISSEMINATE
4.1. Produce products
4.2. Manage customer queries

5. MANAGE META-INFORMATION

Page 11

END OF MAIN MODULE

Page 12

Navigation

• Most of the boxes and frames in the presentation contain links
• Names of the CVD modules and BBs are usually link-enabled
• So are the names of the processes

Page 13

START OF SUB-SUB-PROCESS LIST

Page 14

COLLECT: from data arrival to data ready for processing (raw data)

1.1 Set up collection
   1.1.1 Produce collection strategy and schedule
   1.1.2 Allocate collection responsibilities
   1.1.3 Configure collection systems
   1.1.4 Set up collection security
   1.1.5 Run collection test
   1.1.6 Train staff on collection
   1.1.7 Maintain provider information
1.2 Run collection
   1.2.1 Contact provider with pre-collection information
   1.2.2 Request data
   1.2.3 Collect data
   1.2.4 Follow up non-responses
   1.2.5 Monitor & report on collection
1.3 Load data
   1.3.1 Receive electronic data
   1.3.2 Pre-validate data
   1.3.3 Load data & metadata to data environments
1.4 Cooperate with providers
   1.4.1 Manage provider relationship
   1.4.2 Manage provider burden across surveys

Page 15

VALIDATE: from raw, collected data to validated data

2.1 Edit
   2.1.1 Resolve versioning
   2.1.2 Auto edit variables
   2.1.3 Manually edit variables
2.2 Detect and treat outliers
   2.2.1 Detect outliers
   2.2.2 Treat outliers
2.3 Impute
   2.3.1 Revise existing data
   2.3.2 Identify items for special treatment
   2.3.3 Run imputation
   2.3.4 Evaluate imputation results
2.4 Derive new variables
   2.4.1 Derive variables / indicators
2.5 Integrate & load data
   2.5.1 Prepare & load data
   2.5.2 Evaluate quality of incoming data
   2.5.3 Provide feedback to providers

Page 16

ANALYSE: from validated data to analysed data and tables

3.1 Acquire domain intelligence
   3.1.1 Collect external information
   3.1.2 Collect internal data & information
   3.1.3 Research data sources & methodology
   3.1.4 Manage domain knowledge
   3.1.5 Evaluate & synthesise knowledge
   3.1.6 Produce reports
3.2 Produce statistics & indicators
   3.2.1 Produce statistics & indicators
   3.2.2 Produce seasonal adjustment
   3.2.3 Prepare microdata files
   3.2.4 Produce quality measures for statistics
3.3 Check quality
   3.3.1 Check non-sampling errors
   3.3.2 Compare with previous periods
   3.3.3 Confront with other data sources
   3.3.4 Assess quality measures against quality standards
   3.3.5 Verify against expectations & intelligence
3.4 Interpret and explain
   3.4.1 Analyse time series dimension
   3.4.2 Carry out in-depth statistical analysis
   3.4.3 Identify story / commentary to the data
   3.4.4 Approve explanation and statistics
3.5 Prepare tables for dissemination
   3.5.1 Apply confidentiality rules
   3.5.2 Carry out edit and consistency checks
   3.5.3 Finalise tables
   3.5.4 Approve tables

Page 17

DISSEMINATE: from tables and analysis to customised disseminated products

4.1 Produce products
   4.1.1 Set up for production
   4.1.2 Transfer data from internal to external environment
   4.1.3 Media relations
   4.1.4 Other DG / NSI relations
   4.1.5 Lift embargo and release products
4.2 Manage customer queries
   4.2.1 Review and record customer query
   4.2.2 Allocate query
   4.2.3 Analyse and resolve query
   4.2.4 Get customer feedback

Page 18

MANAGE META-INFORMATION

5.1 Determine information and explanation
5.2 Produce information and explanation
5.3 Prepare metadata for repository
5.4 Load repositories
5.5 Appraise the long-term value of metadata

Page 19

END OF SUB-SUB-PROCESS LIST

START OF BB & SUB-PROCESS CROSS REFERENCE

Page 20

1.1 Set up collection

• EDAMIS

1.1.1 Produce collection strategy and schedule
1.1.2 Allocate collection responsibilities
1.1.3 Configure collection systems
1.1.4 Set up collection security
1.1.5 Run collection test
1.1.6 Train staff on collection
1.1.7 Maintain provider information

Page 21

1.2 Run collection

• EDAMIS

1.2.1 Contact provider with pre-collection information
1.2.2 Request data
1.2.3 Collect data
1.2.4 Follow up non-responses
1.2.5 Monitor & report on collection

Page 22

1.3 Load data

• EDAMIS
• Editing BB / EDAMIS
• Loader BB

1.3.1 Receive electronic data
1.3.2 Pre-validate data
1.3.3 Load data & metadata to data environments

Page 23

2.1 Edit

• EDAMIS

• Reader BB

• Editing BB

• GSAST

• Loader BB

2.1.1 Resolve versioning
2.1.2 Auto edit variables

Page 24

2.2 Detect and treat outliers

• Outliers BB
• Reader BB
• GSAST

• Derivation BB
• GSAST
• Loader BB

2.2.1 Detect outliers
2.2.2 Treat outliers

Page 25

2.3 Impute

• Imputation BB
• Reader BB
• Derivation BB
• GSAST
• Loader BB

2.3.1 Revise existing data
2.3.2 Identify items for special treatment
2.3.3 Run imputation
2.3.4 Evaluate imputation results

Page 26

2.4 Derive new variables

• Derivation BB
• Reader BB
• GSAST
• Loader BB

2.4.1 Derive variables / indicators

Page 27

2.5 Integrate and load data

• Loader BB

• GSAST

• Editing BB

2.5.1 Prepare & load data
2.5.2 Evaluate quality of incoming data

Page 28

3.2 Produce statistics or indicators

• Reader BB
• Derivation BB
• GSAST
• Economic indices BB

• Seasonal adjustment BB
• GSAST

• GSAST
• Derivation BB

• GSAST
• Derivation BB

• Loader BB

3.2.1 Produce statistics & indicators
3.2.2 Produce seasonal adjustment
3.2.3 Prepare microdata files
3.2.4 Produce quality measures for statistics

Page 29

3.3 Check quality

• Editing BB
• Derivation BB
• ANALYTICAL
• GSAST
• Reader BB
• Outliers BB
• Economic indices BB
• NUI

3.3.1 Check non-sampling errors
3.3.2 Compare with previous periods
3.3.3 Confront with other data sources
3.3.4 Assess quality measures against quality standards

Page 30

3.4 Interpret and explain

• Analytical BB

• GSAST

• Reader BB

• Seasonal adjustment BB

• Economic indices BB

• NUI

3.4.1 Analyse time series dimension
3.4.2 Carry out in-depth statistical analysis
3.4.3 Identify story / commentary to the data

Page 31

3.5 Prepare tables for dissemination

• Confidentiality BB
• Reader BB
• Editing BB
• GSAST
• Derivation BB
• Loader BB

3.5.1 Apply confidentiality rules
3.5.2 Carry out edit and consistency checks
3.5.3 Finalise tables

Page 32

4.1 Produce products

• NUI

4.1.1 Set up for production
4.1.2 Transfer data from internal to external environment
4.1.5 Lift embargo and release products

Page 33

4.2 Manage customer queries

• ASSIST

4.2.1 Review and record customer query
4.2.2 Allocate query
4.2.3 Analyse and resolve query
4.2.4 Get customer feedback

Page 34

5 Manage meta-information

• MH

5.3 Prepare metadata for repository
5.4 Load repositories

Page 35

END OF BB & SUB-PROCESS CROSS REFERENCE

START OF BB DESCRIPTION

Page 36

Target Data Storage (TDS)

• not a piece of software
• a single, unique database structure
• contains both statistical data and metadata (of all kinds)
• this uniqueness makes it possible to implement coherence rules for the data and metadata throughout the CVD processes
• the structure allows new types of metadata (defined by the way they are used) to be added

Page 37

CVD MANAGER

• To implement a workflow approach for the production process, based on a design of the particular production process
• To control and schedule the invocation of the CVD components within the various stages of the statistical production process
• At each stage of the production process, to interact with the human domain manager or with a software component in order to:
  – launch software
  – control output
  – request input
  – provide status reports on the whole process and its individual components
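The workflow role described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `Stage` / `WorkflowManager` design in which each stage launches a callable, checks its output and records a status; this is not the CVD MANAGER's actual interface, and a stage that requests input from the domain manager would simply be another callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    """One step of a production workflow (hypothetical design)."""
    name: str
    action: Callable[[dict], dict]                      # launches the software step
    check: Callable[[dict], bool] = lambda state: True  # controls the output


@dataclass
class WorkflowManager:
    stages: List[Stage]
    status: Dict[str, str] = field(default_factory=dict)

    def run(self, state: dict) -> dict:
        """Invoke the stages in order; stop at the first failed output check."""
        for stage in self.stages:
            self.status[stage.name] = "running"
            state = stage.action(state)
            if not stage.check(state):
                self.status[stage.name] = "failed"
                break  # later stages are not invoked
            self.status[stage.name] = "done"
        return state

    def report(self) -> str:
        """Status report on the whole process and its individual stages."""
        return "; ".join(f"{name}: {st}" for name, st in self.status.items())


# Usage: a two-stage workflow on a toy state dictionary.
manager = WorkflowManager([
    Stage("collect", lambda s: {**s, "raw": [1, 2, 3]}),
    Stage("validate", lambda s: {**s, "valid": [x for x in s["raw"] if x > 1]},
          check=lambda s: len(s["valid"]) > 0),
])
manager.run({})
print(manager.report())  # collect: done; validate: done
```

Stopping at the first failed check is one possible policy; a real workflow engine might instead pause and ask the domain manager whether to continue.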

Page 38

EDAMIS

• supports the transmission of statistical data from Member States to Eurostat
• ensures secure and well-monitored transmission of data through a single reception point
• delivery of data to production environments
• user access management
• links to structural metadata
• basic validation
• format conversion

Page 39

Loader BB

• loads data and reference metadata in update or replace mode, at the same time ensuring their coherence with the existing metadata
• its algorithm contains coherence rules
• can be used at any time during processing, for both data and reference metadata
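The difference between the two loading modes can be sketched as follows, assuming a hypothetical in-memory store keyed by (geo code, period) and a single illustrative coherence rule: every geo code must exist in the reference metadata. The real Loader BB's coherence rules and its storage (the TDS) are not described in this presentation.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # hypothetical key: (geo code, time period)


def load(store: Dict[Key, float], batch: Dict[Key, float],
         valid_geo: Set[str], mode: str = "update") -> Dict[Key, float]:
    """Return the store after loading `batch` in update or replace mode."""
    if mode not in ("update", "replace"):
        raise ValueError(f"unknown mode: {mode}")
    # Coherence rule: every geo code in the batch must exist in the metadata.
    unknown = sorted({geo for geo, _period in batch if geo not in valid_geo})
    if unknown:
        raise ValueError(f"codes not in metadata: {unknown}")
    if mode == "replace":
        return dict(batch)   # existing content is discarded
    merged = dict(store)     # update: keep the existing cells ...
    merged.update(batch)     # ... and overwrite those present in the batch
    return merged


store = {("BE", "2006"): 1.0, ("DE", "2006"): 2.0}
print(load(store, {("BE", "2006"): 1.5}, {"BE", "DE"}))
# {('BE', '2006'): 1.5, ('DE', '2006'): 2.0}
print(load(store, {("FR", "2006"): 3.0}, {"BE", "DE", "FR"}, mode="replace"))
# {('FR', '2006'): 3.0}
```

Rejecting the whole batch when one code is unknown keeps the store coherent; a production loader might instead load the valid rows and report the rejected ones.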

Page 40

Reader BB

• reads data and metadata (in various formats) and assembles them for further processing by other BBs
• can be used at any time during processing, for both data and metadata

Page 41

Editing BB

• executes editing rules, optionally with reference data (lookup tables)
• intra-cell, intra-record (horizontal) and inter-record (vertical) rules
• reports on rule execution
• allows interactive review of messages
• can be provided to Member States (MS) for editing at source
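The three rule types can be illustrated in Python. The record layout (`country`, `employed`, `male`, `female`), the rules themselves and the message format are hypothetical examples, not the Editing BB's rule language; the set of valid country codes plays the role of a lookup table.

```python
from typing import Any, Dict, List, Set

Record = Dict[str, Any]


def edit(records: List[Record], valid_countries: Set[str]) -> List[str]:
    """Apply illustrative editing rules and return one message per failure."""
    messages = []
    for i, r in enumerate(records):
        # Intra-cell rule using a lookup table of valid country codes.
        if r["country"] not in valid_countries:
            messages.append(f"record {i}: unknown country {r['country']}")
        # Intra-cell rule: a count cannot be negative.
        if r["employed"] < 0:
            messages.append(f"record {i}: negative employed")
        # Intra-record (horizontal) rule: total equals the sum of its parts.
        if r["employed"] != r["male"] + r["female"]:
            messages.append(f"record {i}: employed != male + female")
    # Inter-record (vertical) rule: each country appears only once.
    seen: Set[str] = set()
    for i, r in enumerate(records):
        if r["country"] in seen:
            messages.append(f"record {i}: duplicate country {r['country']}")
        seen.add(r["country"])
    return messages


rows = [
    {"country": "BE", "employed": 10, "male": 6, "female": 4},
    {"country": "BE", "employed": 5, "male": 3, "female": 3},
    {"country": "ZZ", "employed": -1, "male": 0, "female": 0},
]
for message in edit(rows, {"BE", "DE"}):
    print(message)
```

Returning messages rather than raising errors matches the idea of reporting on rule execution and reviewing the messages afterwards.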

Page 42

Outliers BB

• basic and statistical methods to identify outliers
• methods:
  – Hidiroglou-Berthelot and σ-gap
  – top and bottom (by number or percentiles) and conditions
• reports on the execution
• in future: multidimensional distance measures
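The Hidiroglou-Berthelot method compares each unit's ratio between two periods with the median ratio, centres it so that increases and decreases are symmetric, and scales it by unit size. Below is a minimal sketch of the usual formulation, assuming strictly positive values for the same units in both periods; the parameter defaults (U = 0.5, A = 0.05, C = 4) are common illustrative choices, not settings taken from the Outliers BB.

```python
import statistics
from typing import List


def hb_outliers(prev: List[float], curr: List[float], U: float = 0.5,
                A: float = 0.05, C: float = 4.0) -> List[int]:
    """Indices of units flagged by the Hidiroglou-Berthelot method.

    prev[i] and curr[i] are one unit's values at t-1 and t; all must be > 0.
    """
    ratios = [c / p for p, c in zip(prev, curr)]
    r_med = statistics.median(ratios)
    # Centre the ratios so that decreases and increases are treated symmetrically.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Size effect: the same relative change weighs more for larger units.
    e = [si * max(p, c) ** U for si, p, c in zip(s, prev, curr)]
    q1, e_med, q3 = statistics.quantiles(e, n=4)
    # A * |median| guards against near-zero spreads when most units barely move.
    d_low = max(e_med - q1, abs(A * e_med))
    d_high = max(q3 - e_med, abs(A * e_med))
    return [i for i, ei in enumerate(e)
            if ei < e_med - C * d_low or ei > e_med + C * d_high]


prev = [100, 120, 90, 110, 105, 95, 100, 115, 98, 102]
curr = [102, 121, 92, 112, 104, 97, 101, 117, 99, 400]
print(hb_outliers(prev, curr))  # [9]
```

Here only the last unit, which roughly quadruples between the two periods, falls outside the quartile-based bounds.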

Page 43

Imputation BB

• To be determined (t.b.d.); note: possibly based on the BANFF software, and any system chosen should be very similar to BANFF
• Implementation of various mathematical imputation methods
• Last BB to be developed
• Scope not yet established

Page 44

Derivation BB

• Derives new variables, optionally with reference data (lookup tables)
• intra-cell, intra-record (horizontal) and inter-record (vertical) derivations
• reports on execution
• allows interactive review of messages
• uses the same engine (a subset of it) as the Editing BB
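A sketch of the three derivation types, assuming hypothetical records with `country`, `employed` and `population` fields and a hypothetical country-to-region lookup table; it only illustrates the distinction between cell, record and cross-record derivations and is not the Derivation BB's engine.

```python
from collections import defaultdict
from typing import Dict, List


def derive(records: List[dict], region_of: Dict[str, str]) -> List[dict]:
    """Return copies of the records with derived variables added."""
    # Inter-record (vertical) derivation: total employment per region.
    region_total: Dict[str, float] = defaultdict(float)
    for r in records:
        region_total[region_of[r["country"]]] += r["employed"]
    out = []
    for r in records:
        new = dict(r)
        # Intra-cell derivation via a lookup table: country code -> region.
        new["region"] = region_of[r["country"]]
        # Intra-record (horizontal) derivation: employment rate.
        new["rate"] = r["employed"] / r["population"]
        # Uses the vertical result: the record's share of its regional total.
        new["share"] = r["employed"] / region_total[new["region"]]
        out.append(new)
    return out


rows = [
    {"country": "BE", "employed": 4.0, "population": 8.0},
    {"country": "NL", "employed": 6.0, "population": 10.0},
    {"country": "DE", "employed": 30.0, "population": 60.0},
]
regions = {"BE": "Benelux", "NL": "Benelux", "DE": "Other"}
for row in derive(rows, regions):
    print(row["country"], row["region"], row["rate"], row["share"])
```

The input records are copied, not modified, so the raw values stay available for re-running the derivation.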

Page 45

Economic indices BB

Calculates economic indices:

– Weighted arithmetic mean

– Weighted geometric mean

– Weighted harmonic mean

– Laspeyres

– Paasche

– Lowe

– Edgeworth

– Bowley

– Fisher

– Laspeyres (geometric)

– Paasche (geometric)

– Törnqvist-Theil

– Laspeyres (harmonic)

– Paasche (harmonic)

– Chain index

– EKS(-S)
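Four of the listed formulas can be written directly from prices p and quantities q in a base period 0 and a current period 1. This sketch implements the textbook definitions of the Laspeyres, Paasche, Fisher and Törnqvist(-Theil) price indices; it is an illustration, not the Economic indices BB's code.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def _value(prices: Vec, quantities: Vec) -> float:
    """Total value: sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0: Vec, p1: Vec, q0: Vec) -> float:
    # Base-period quantities as fixed weights.
    return _value(p1, q0) / _value(p0, q0)


def paasche(p0: Vec, p1: Vec, q1: Vec) -> float:
    # Current-period quantities as weights.
    return _value(p1, q1) / _value(p0, q1)


def fisher(p0: Vec, p1: Vec, q0: Vec, q1: Vec) -> float:
    # Geometric mean of the Laspeyres and Paasche indices.
    return math.sqrt(laspeyres(p0, p1, q0) * paasche(p0, p1, q1))


def tornqvist(p0: Vec, p1: Vec, q0: Vec, q1: Vec) -> float:
    # Weighted geometric mean of price relatives, weighted by the average
    # of the base- and current-period expenditure shares.
    v0, v1 = _value(p0, q0), _value(p1, q1)
    log_index = sum(
        (a * qa / v0 + b * qb / v1) / 2 * math.log(b / a)
        for a, b, qa, qb in zip(p0, p1, q0, q1)
    )
    return math.exp(log_index)


p0, p1 = [1.0, 2.0], [2.0, 2.0]
q0, q1 = [10.0, 5.0], [5.0, 5.0]
print(laspeyres(p0, p1, q0), paasche(p0, p1, q1))  # 1.5 1.3333333333333333
print(fisher(p0, p1, q0, q1), tornqvist(p0, p1, q0, q1))
```

In this two-item example the first price doubles while its quantity halves, which is why the base-weighted Laspeyres index exceeds the current-weighted Paasche index.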

Page 46

GSAST

• Generic system for treating microdata and for operations on micro- and macro-data from surveys
• Based on SAS Base, BI Server and Enterprise Guide functionalities
• Also for unique or unusual processing requirements

Page 47

Seasonal adjustment BB

Calculates seasonally adjusted time series.

• Based on the X-12 and TRAMO/SEATS methods

Page 48

ANALYTICAL

• various mathematical and visual analyses and reviews of the data
• visualisation through graphing of the data
• statistical analysis

(exact scope not yet determined; possibly through SAS)

Page 49

Confidentiality BB

• performs confidentiality verification of tables
• applies various masking techniques to ensure the confidentiality of published statistics
• Based on CBS μ-ARGUS and τ-ARGUS
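Verifying a table usually starts by identifying primary sensitive cells. The sketch below shows two common primary rules of the kind supported by τ-ARGUS, a minimum-frequency rule and an (n, k) dominance rule, with hypothetical default parameters. It does not perform secondary suppression or any of the masking techniques the real tools offer.

```python
from typing import Dict, List


def primary_sensitive(cells: Dict[str, List[float]], min_freq: int = 3,
                      n: int = 2, k: float = 85.0) -> Dict[str, str]:
    """Flag table cells that fail a primary confidentiality rule.

    cells maps a cell label to the values of its individual contributors.
    Returns {label: rule} for the cells to suppress (primary suppression only).
    """
    flagged = {}
    for label, contributions in cells.items():
        total = sum(contributions)
        top_n = sum(sorted(contributions, reverse=True)[:n])
        if len(contributions) < min_freq:
            flagged[label] = "frequency"      # too few contributors
        elif total > 0 and top_n > k / 100 * total:
            flagged[label] = "dominance"      # n largest exceed k% of the cell
    return flagged


table = {"A": [10.0, 20.0], "B": [90.0, 5.0, 3.0, 2.0], "C": [30.0, 30.0, 20.0, 20.0]}
print(primary_sensitive(table))  # {'A': 'frequency', 'B': 'dominance'}
```

Cell B has enough contributors, but its two largest account for 95% of the total, so publishing it would reveal too much about them.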

Page 50

NUI

To provide access to the statistical reference databases of Eurostat.

• single tool for all data and metadata
• based on the principles of graphical tools
• highly interactive operation
• metadata is presented to the user
• shows the relations between different types of metadata
• can be used inside Eurostat

Page 51

ASSIST

• User support tool
• Parallel to the e-mail system (with attachments)
• Service requests
• Request follow-up
• Searchable, central public knowledge database
• Decentralised help centres / persons
• Sub-systems by subject matter, geography or any other classification
• Access management (to the appropriate parts of the system, by administrative privileges or subject matter)

Page 52

MH – metadata handler (1 of 3)

System for handling all production aspects of classifications, associations and other statistical metadata.

• Updates to nomenclature codes
• Updates to the label values of nomenclature codes
• Updates to the relations between codes
• Updates to the label values of relations
• Export of classifications and relations to files
• Creation of aggregates from relationships
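Creating aggregates from relationships can be pictured as summing the values of detailed codes along a code-to-aggregate relation. A minimal sketch, assuming hypothetical codes and a relation given as (detailed code, aggregate code) pairs; it does not reflect how MH stores relations.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(data: Dict[str, float],
              relation: List[Tuple[str, str]]) -> Dict[str, float]:
    """Sum values of detailed codes into the aggregate codes they map to.

    relation holds (detailed code, aggregate code) pairs; a detailed code may
    belong to several aggregates (e.g. a national and a European total).
    """
    totals: Dict[str, float] = defaultdict(float)
    for detailed, aggregate_code in relation:
        if detailed in data:
            totals[aggregate_code] += data[detailed]
    return dict(totals)


data = {"BE1": 1.0, "BE2": 2.0, "DE1": 5.0}
relation = [("BE1", "BE"), ("BE2", "BE"), ("DE1", "DE"),
            ("BE1", "EU"), ("BE2", "EU"), ("DE1", "EU")]
print(aggregate(data, relation))  # {'BE': 3.0, 'DE': 5.0, 'EU': 8.0}
```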

Page 53

MH – metadata handler (2 of 3)

• Check relationship completeness
• Footnotes on labels
• Materialized views of classifications & relationships: allow a subset of a classification or a relation to be created by defining:
  – selection rules (wildcard expressions)
  – SQL statements (SQL Generation wizard)
• Dictionary
• Automatic creation and update of relationships:
  – creation through other existing relationships
  – update through successor/predecessor
• Multidimensional nomenclatures
• Simple keys or subkeys as codes
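Two of these features, wildcard selection rules and the relationship completeness check, can be sketched with Python's standard `fnmatch` module. The codes, labels and function names are hypothetical; MH's own wildcard syntax may differ from shell-style patterns.

```python
import fnmatch
from typing import Dict, List, Tuple


def materialize(classification: Dict[str, str], pattern: str) -> Dict[str, str]:
    """Subset of a classification (code -> label) selected by a wildcard rule."""
    return {code: label for code, label in classification.items()
            if fnmatch.fnmatchcase(code, pattern)}


def incomplete(classification: Dict[str, str],
               relation: List[Tuple[str, str]]) -> List[str]:
    """Codes of the classification that no (source, target) pair covers."""
    covered = {source for source, _target in relation}
    return sorted(code for code in classification if code not in covered)


codes = {"A01": "Crops", "A02": "Forestry", "B05": "Coal"}  # hypothetical
print(materialize(codes, "A*"))                         # {'A01': 'Crops', 'A02': 'Forestry'}
print(incomplete(codes, [("A01", "A"), ("A02", "A")]))  # ['B05']
```

`fnmatchcase` is used so that matching is case-sensitive on every platform, as codes usually are.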

Page 54

MH – metadata handler (3 of 3)

Allows management of:
• dataset trees
• composite and normal datasets (creation, update, etc.)
• visibility and accessibility flags on objects (datasets, dictionaries, classifications, etc.)
• a classification's default attribute
• transposition of datasets (microdata)
• implementation list (dictionary)
• access control lists
• methods
• confidentiality scripts
• attachments
• cell attachments (footnotes)
• presence table

Page 55

END OF BB DESCRIPTION