VTL (Validation and Transformation Language)
A new standard for data validation and processing
Marco Pellegrino, Eurostat
Acknowledgements: Bank of Italy, SDMX Technical Working Group, DDI Alliance, Bryan Fitzpatrick, Arofan Gregory, and others…
Eurostat
Background
Data validation, a critical issue for the E.S.S.
Eurostat and Member States: double work or "no work"?
Inefficiencies:
• Lack of coordination
• Lack of documentation
• Lack of formalisation of validation procedures and rules
• Low harmonisation of software solutions
Need for a comprehensive solution: a portfolio of actions in the framework of the ESS Vision 2020
Approach
SDMX originally focused on data collection and dissemination
Current trend: support more stages of the statistical production process
GSBPM (Generic Statistical Business Process Model)
Data Validation Process
Before/during transmission (“First Level”): covered by SDMX today
- Format check (SDMX-ML)
- Code check (SDMX DSD)
After transmission (“Second Level”): not yet covered by SDMX (SDMX-VTL)
- Detailed value check
- Mirror check
- …
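The difference between the two levels can be sketched in Python. This is only an illustration, not part of SDMX or VTL: the code list, the record layout (`reporter`, `partner`, `value`) and the 5% tolerance are assumptions made for the example. The first function checks codes against a code list (first level); the second compares the exports one country reports with the imports its partner reports for the same flow (a mirror check, second level).

```python
# Hypothetical code list standing in for the codes defined in a DSD.
VALID_COUNTRIES = {"IT", "FR", "DE"}


def code_check(records):
    """First level: return the records that use a country code outside the code list."""
    return [
        r for r in records
        if r["reporter"] not in VALID_COUNTRIES or r["partner"] not in VALID_COUNTRIES
    ]


def mirror_check(exports, imports, tolerance=0.05):
    """Second level: compare exports reported by A to B with imports reported by B from A.

    Returns (exporter, importer, export_value, import_value) for every flow whose
    relative discrepancy exceeds the (assumed) tolerance.
    """
    # Index imports by (origin, destination) so they line up with export keys.
    imported = {(r["partner"], r["reporter"]): r["value"] for r in imports}
    issues = []
    for r in exports:
        key = (r["reporter"], r["partner"])
        mirror = imported.get(key)
        if mirror is None:
            continue  # no mirror figure available, nothing to compare
        if abs(r["value"] - mirror) > tolerance * max(abs(r["value"]), abs(mirror)):
            issues.append((r["reporter"], r["partner"], r["value"], mirror))
    return issues
```

A format or code check needs only the message and its structure definition, whereas the mirror check needs data from two reporters, which is why it can only run after transmission.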
The VTL initiative
Main goals:
• Define and preserve validation rules (document and preserve the validation know-how)
• Exchange and share validation rules (with reporting institutions & other correspondents)
• Apply validation rules in the collection and production processes (aiming at an industrialized processing of statistical data)
At a later stage:
• Improve the VTL to support more complex algorithms for data compilation and estimation
What is VTL 1.0?
• A reference framework for the creation of rules for data validation and transformation
• It maps to a clear and generic information model
• It aligns with relevant statistical information standards such as SDMX and GSIM
SDMX
VTL: part 1 - part 2
EBNF (Extended Backus-Naur Form): technical notation
Main VTL features
• User orientation
• Integrated approach
• IT implementation independence
• Active role for processing
• Extensibility and customizability
• Language effectiveness
Proper governance is needed
The VTL Information Model
• VTL is a “stand-alone” specification
– It can be used with SDMX, DDI, or potentially anything else
– It can be used on its own
• Because different standards have different information models, VTL must establish its own information model
– Other information models can be mapped against it
– VTL uses GSIM as a basis
VTL Data Model
• Organizes Data Points into Data Sets
• Describes Data Structures using Structure Components
– Measures
– Attributes
– Identifiers
• Very similar to GSIM
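As a rough illustration of this data model, the sketch below represents a Data Structure as three groups of component names and a Data Set as a list of Data Points. The class and method names are hypothetical and are not taken from the VTL specification; the uniqueness rule for identifier values is an assumption about how identifiers behave.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructure:
    """Structure Components grouped by role (names are illustrative)."""
    identifiers: tuple
    measures: tuple
    attributes: tuple = ()


@dataclass
class DataSet:
    structure: DataStructure
    points: list  # each Data Point is a dict keyed by component name

    def key(self, point):
        """Identifier values that single out one Data Point."""
        return tuple(point[name] for name in self.structure.identifiers)

    def validate(self):
        """Check that every point has exactly the declared components
        and that no two points share the same identifier values."""
        expected = set(
            self.structure.identifiers
            + self.structure.measures
            + self.structure.attributes
        )
        seen = set()
        for point in self.points:
            if set(point) != expected:
                raise ValueError(f"components do not match structure: {point}")
            k = self.key(point)
            if k in seen:
                raise ValueError(f"duplicate identifier values: {k}")
            seen.add(k)
        return True
```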
Transformation Model
• Takes a set of Transformation Expressions and organizes them into a Transformation Scheme
• Each Expression has an Operand, an Operator, and a Result
– Operands can have Parameters
– Operators and Results are identified by the Expression when it is executed
– VTL specifies the Operators and the types of Parameters
• VTL uses the SDMX Transformation model
Transformations and Process models
Transformation model
• It exists in SDMX, but not in GSIM and DDI
• It allows defining calculations through mathematical expressions
• It does not allow cycles (the same structure as a spreadsheet)
Process model
• It exists in SDMX, GSIM, DDI and other standards (e.g. BPM)
• It allows defining calculations through a process
• It allows cycles (like a procedural programming language)
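The "no cycles" property of the transformation model can be illustrated with a small Python sketch. It is not how VTL or SDMX implement transformation schemes: the dictionary layout, the operator table and the function name are assumptions for the example. Each expression names an operator, two operands and a result; the scheme is evaluated in dependency order, as a spreadsheet would be, and a circular reference is rejected.

```python
import operator
from graphlib import CycleError, TopologicalSorter

# Illustrative operator table; real VTL operators are defined by the specification.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate_scheme(expressions, inputs):
    """Evaluate {result: (operator, operand_a, operand_b)} in dependency order.

    Operands are either input names or results of other expressions.
    Raises ValueError if the expressions depend on each other in a cycle.
    """
    # For each result, the other results it depends on.
    graph = {
        result: {name for name in operands if name in expressions}
        for result, (_, *operands) in expressions.items()
    }
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"transformation scheme contains a cycle: {exc.args[1]}") from exc

    values = dict(inputs)
    for result in order:
        op, a, b = expressions[result]
        values[result] = OPERATORS[op](values[a], values[b])
    return values
```

A process model, by contrast, would need loops or jumps, which a plain dependency ordering like this cannot express.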
Governance and Standards Alignment
• VTL will be maintained by the SDMX TWG
• Extensions will be considered for inclusion in future versions
• VTL has already produced some feedback to GSIM for its next version
• VTL can be mapped against SDMX
• VTL can be directly utilized by DDI in those places where computations are included
• VTL could be used in CSPA services where processing is performed
• As GSIM processing Rules
What's next?
• More operators and features + bug-fixing + fine-tuning = VTL 1.1
• Reuse of rules, structural validation?
• SDMX specifications (e.g. for exchanging VTL rules in SDMX messages, for storing rules and for requesting validation rules from web services) in progress
• Implementation tests with some pilot domains
• Integration within the ESS Validation Architecture (Validation project with national statistical institutes).
Conclusions
• A formal, unambiguous and standard language was needed for encoding validation rules so that these can be translated into specific data editing systems
• Use of generic software services provided within the ESS community is foreseen
• Great achievement, led by a task-force with experts from statistical institutes, central banks, international organisations and (a few) private experts
Thanks for your attention
VTL Grammar: A Simple Example
Is the total = 100?
check (ds1[keep (Country, Year, Percentage)][aggregate sum(Percentage)] = 100, imbalance(Percentage), all)
Steps
ds1[keep (Country, Year, Percentage)][aggregate sum(Percentage)]
check (ds1[keep (Country, Year, Percentage)][aggregate sum(Percentage)] = 100, imbalance(Percentage), all)
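To make the two steps concrete, here is a rough Python equivalent of the check. It is a sketch under assumptions, not a VTL implementation: it assumes that the aggregation sums Percentage over each (Country, Year) pair, that the imbalance is the sum minus 100, and that "all" means every group is returned rather than only the failing ones. The function name and the `Category` field in the usage lines are hypothetical.

```python
from collections import defaultdict


def check_total(rows, expected=100.0, tolerance=1e-9):
    """Sum Percentage per (Country, Year) and compare each total with `expected`."""
    # Step 1: keep Country, Year, Percentage and aggregate the sum.
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Country"], row["Year"])] += row["Percentage"]

    # Step 2: check each total, reporting the imbalance for all groups.
    return [
        {
            "Country": country,
            "Year": year,
            "total": total,
            "imbalance": total - expected,
            "ok": abs(total - expected) <= tolerance,
        }
        for (country, year), total in sorted(totals.items())
    ]


rows = [
    {"Country": "IT", "Year": 2014, "Category": "A", "Percentage": 60.0},
    {"Country": "IT", "Year": 2014, "Category": "B", "Percentage": 40.0},
    {"Country": "FR", "Year": 2014, "Category": "A", "Percentage": 50.0},
    {"Country": "FR", "Year": 2014, "Category": "B", "Percentage": 45.0},
]
report = check_total(rows)
```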
VTL Grammar: A Simple Example (cont.)
• We want to create a table (Dresult) which provides totals, combining the values for the US and the European Union:
Dresult := D1 + D2
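What `D1 + D2` does to the underlying data points can be sketched as follows. This is an illustration under assumptions rather than the specified VTL behaviour: it assumes both data sets share the same identifier components, that points are matched on identical identifier values, and that points without a match are dropped. The function name and the component names in the usage lines are hypothetical.

```python
def add_datasets(d1, d2, identifiers, measure):
    """Add the measure of matching data points; points are matched on identifier values."""
    # Index the second data set by its identifier values.
    index = {tuple(p[name] for name in identifiers): p[measure] for p in d2}
    result = []
    for point in d1:
        key = tuple(point[name] for name in identifiers)
        if key in index:  # assumption: unmatched points are left out
            row = dict(zip(identifiers, key))
            row[measure] = point[measure] + index[key]
            result.append(row)
    return result


d1 = [{"Year": 2013, "Value": 10}, {"Year": 2014, "Value": 12}]  # e.g. US figures
d2 = [{"Year": 2013, "Value": 20}, {"Year": 2014, "Value": 25}]  # e.g. EU figures
d_result = add_datasets(d1, d2, identifiers=("Year",), measure="Value")
```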