Combining Metadata Standards: Approaches and Benefits
Arofan Gregory
Open Data Foundation
Overview
• Recent events of interest
• The Standards: Comparison and Explanation
• Emerging Implementation Approaches
  – DDI and SDMX
  – SDMX and the Semantic Web Technologies
  – Classifications & Multiple Standards
• Ideas about Future Work
Recent Events of Interest
Note: Some of these events/implementations have been or will be described in detail in other papers; they are only mentioned here.
• Schloss Dagstuhl, Germany, November 2009 (DDI 3 Workshop)
  – SDMX 2.0 – DDI 3 field-level mapping work started
  – Topic: DDI and the Semantic Web???
Recent Events of Interest (2)
• Semantic Web and SDMX
  – ONS hosted a 2-day meeting in the UK, February 2009 (produced draft “SDMX-RDF”)
  – Banca d’Italia has a prototype project
  – New project launched at University of Tilburg in the Netherlands (RDF expression of OECD SDMX data)
• Australian Bureau of Statistics (ABS) starts looking at SDMX and DDI to support the data production lifecycle
  – Prototype implementations
  – Some other NSIs also very interested
Recent Events of Interest (3)
• Classifications and ISO/IEC 11179
  – Australia: Government agencies looking to exchange classifications with ABS from existing ISO/IEC 11179 system, using SDMX, DDI
  – Statistics Canada: Evaluation of IMDB (ISO/IEC 11179-based metadata repository) for use in coordination with Canadian RDC Network (based on DDI 3)
What Does This Mean?
• Not a complete list of events/implementations, but…
• Indicates the interest we are seeing in the combined use of standards!
  – These are not just experiments!
  – Organizations are looking at implementation in a serious way now
Characterizing the Standards
• SDMX:
  – Data structures and formats
  – Reference metadata structures and formats
  – Web-services architecture based on registry services
  – Content-oriented guidelines
• ISO/IEC 11179:
  – Model for managing concepts and data elements
  – Metadata registries and lifecycle
• ISO 19115:
  – Standard metadata model for geographies
  – Used by DDI as geographical model
Characterizing the Standards (2)
• Dublin Core:
  – Citation metadata
  – Widely used in the Semantic Web
  – Used natively by DDI for citations
• Semantic Web/“Linked Data”/RDF
  – See “Open Issues on the Semantic Web”
• DDI 3:
  – Will give more detail, as it is not as familiar to the METIS community…
Characterizing the Standards (3)
• DDI 1.*/2.* was a standard used by archives and data libraries
  – Based on a “codebook” model
  – Used by some NSIs, especially in the developing world because of the IHSN Metadata Management Toolkit
  – Used by the European network of data archives, CESSDA
  – Used by many data archives in North America
• Documentation of a single “Study” (survey)
  – Designed to help researchers find and use microdata
• DDI 3 is more ambitious – capture and use of metadata throughout the entire data lifecycle
DDI 3 Lifecycle Model
Notice: This is very like a high-level view of the METIS model!
Characterizing the Standards (4)
• DDI 3 provides machine-actionable metadata to support “metadata-driven” systems throughout the lifecycle
  – Focus is on upstream metadata capture and reuse
• Describes tabulation/aggregation of microdata
• Provides support for comparison across surveys, detailed geography, data processing, register data
• Aggregate “NCube” model aligned with SDMX
• No architecture/web services support (yet)
An Observation…
• It is easy to say that two standards are “aligned”
  – Many of these standards were intentionally aligned as they were developed
• It is much more difficult to understand how to use them in combination effectively…
Approaches and Benefits
• SDMX and DDI
  – DDI microdata production/SDMX aggregate dissemination
  – Using SDMX data in DDI-based systems (combining aggregates and microdata)
  – Combined SDMX/DDI supporting the entire data lifecycle
  – DDI register data reported to SDMX collection system
• SDMX and the Semantic Web
• Classifications and the Standards
[Diagram: Input data from surveys and registers passes through cleaning, editing, estimation, aggregation, etc. (DDI 3 metadata) to become dissemination data, published through a website/web service as SDMX-ML (data, metadata, structure).]
DDI – SDMX: Benefits
• The benefits of this approach are those found by using the standards generally
  – Supports “metadata-driven” system for data production throughout the lifecycle (DDI)
  – Metadata-rich dissemination format, preferred by data collectors (SDMX)
  – Shared tools; SDMX registry services, Web Services for discovery and use of aggregates
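The microdata-to-aggregate flow behind this approach can be sketched in a few lines. This is a minimal illustration, not DDI or SDMX tooling: the record fields (`region`, `sex`, `income`), the `aggregate` helper, the invented values, and the dotted key layout are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cleaned microdata records (one dict per respondent).
# In a DDI 3 driven system the variable names and codes would come
# from the DDI metadata; here they are hard-coded, invented values.
microdata = [
    {"region": "N", "sex": "F", "income": 30000},
    {"region": "N", "sex": "M", "income": 28000},
    {"region": "S", "sex": "F", "income": 35000},
    {"region": "N", "sex": "F", "income": 34000},
]

def aggregate(records, dimensions, measure):
    """Group records by the given dimensions and return the count and
    mean of the measure for each cell of the resulting cube."""
    cells = defaultdict(list)
    for rec in records:
        key = tuple(rec[d] for d in dimensions)
        cells[key].append(rec[measure])
    return {
        key: {"count": len(vals), "mean": sum(vals) / len(vals)}
        for key, vals in cells.items()
    }

cube = aggregate(microdata, dimensions=("region", "sex"), measure="income")

# Each cell key plays the role of a series key: one code per
# dimension, in a fixed order.
for key, obs in sorted(cube.items()):
    print(".".join(key), obs["count"], obs["mean"])
```

The point of the sketch is the division of labour: the record-level description belongs on the DDI side, while the resulting cube, keyed by dimension codes, is the kind of structure SDMX is designed to disseminate.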
SDMX – DDI: Integrating Aggregates and Microdata
• Scenario is common in some research
  – Economic data is often only available as aggregates
  – Challenge is to combine aggregates and other microdata
[Diagram: Aggregates from an SDMX web service pass through an SDMX-to-DDI 3 transform into a data archive/repository that also holds surveys and registers (DDI 3); processing then produces integrated data and metadata (DDI 3).]
SDMX – DDI: Benefits
• Allows for easy use of official statistics by researchers
  – Solves problems of combining aggregates and microdata
• Note: This does not involve dis-aggregation of published data
  – Structural transformation only, to allow DDI 3 systems to process aggregates easily
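A structural transformation of this kind might look like the following sketch. It assumes aggregates arrive as series keyed by dimension codes, with one observation per time period; the `flatten_series` helper, the column names, and the values are hypothetical and are not taken from either specification. No value is changed or dis-aggregated; each observation simply becomes one rectangular record.

```python
def flatten_series(dimension_names, series):
    """Turn {series_key: {period: value}} into a list of flat records.

    Only the structure changes: every observation becomes one row whose
    columns are the dimensions, the time period, and the value.
    """
    rows = []
    for key, observations in series.items():
        codes = key.split(".")
        if len(codes) != len(dimension_names):
            raise ValueError(f"series key {key!r} does not match dimensions")
        for period, value in sorted(observations.items()):
            row = dict(zip(dimension_names, codes))
            row["TIME_PERIOD"] = period
            row["OBS_VALUE"] = value
            rows.append(row)
    return rows

# Hypothetical aggregate data; the values are invented for illustration.
series = {
    "A.NL.GDP": {"2008": 100.0, "2009": 101.5},
    "A.IT.GDP": {"2008": 200.0},
}
rows = flatten_series(("FREQ", "REF_AREA", "INDICATOR"), series)
for row in rows:
    print(row)
```

Once the aggregates are in this row-per-observation shape, a system built around variables and records can describe and process them in the same way as any other rectangular file.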
DDI + SDMX: The Data Lifecycle
• Uses a metadata model capable of expression as either SDMX or DDI, depending
• Provides support for process management
  – Uses many features of SDMX (process model, structure sets, reporting taxonomies, etc.)
• Uses SDMX architecture/services model
  – Designed to allow incorporation of other standards
[Diagram: A process-management system (BPML) and an SDMX registry sit alongside data and metadata repositories/application databases. Surveys and registers (DDI 3) feed an input data store; a dissemination data store (SDMX) feeds web site/print/web services (SDMX, DDI, etc.). All registry interactions use SDMX. Interactions between systems are DDI or SDMX web services, as appropriate.]
SDMX + DDI: Benefits
• Leverages Web-Services technologies (registry, event triggers, etc.) for efficient automation, migration, flexibility
• Choice of tools is broad
  – Use the “best” format for any given task
• All the benefits of DDI-SDMX case
• Good support for process management as well as data management
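The role of registry event triggers in automation can be pictured as a publish/subscribe component. The sketch below is an in-memory toy, not the SDMX registry interface: `ToyRegistry`, `subscribe`, `register`, the dataflow identifiers, and the example.org URLs are all hypothetical names chosen for illustration.

```python
from collections import defaultdict

class ToyRegistry:
    """In-memory stand-in for a registry that notifies subscribers
    when a new data or metadata source is registered."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # dataflow id -> callbacks
        self._registrations = []               # (dataflow id, url) pairs

    def subscribe(self, dataflow_id, callback):
        self._subscribers[dataflow_id].append(callback)

    def register(self, dataflow_id, url):
        self._registrations.append((dataflow_id, url))
        # Event trigger: tell every interested system where the data is.
        for callback in self._subscribers[dataflow_id]:
            callback(dataflow_id, url)

received = []
registry = ToyRegistry()
registry.subscribe("LABOUR_FORCE", lambda flow, url: received.append((flow, url)))

# A production step registers its output; the subscriber is notified.
registry.register("LABOUR_FORCE", "https://example.org/data/lf-2010.xml")
# Nobody subscribed to this dataflow, so no notification is sent.
registry.register("CPI", "https://example.org/data/cpi-2010.xml")
print(received)
```

The design choice illustrated here is loose coupling: the producing step only talks to the registry, and downstream steps react to notifications rather than polling each other.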
SDMX and the Semantic Web Technologies
• Potentially applies to other standards as well (DDI, ISO/IEC 11179, etc.)
• Note that Semantic Web technologies only apply to dissemination
  – Not designed to support data production
• Terms:
  – “Raw data” in an SW context does not mean “raw data”
  – “Data” in an SW context means “anything that can be described using RDF”, not numeric data
Assumptions
• Creation of a harmonized statistical model based on proven models/standards, but expressed as RDF (“ontology” or “vocabulary” in SW terms)
• Implementation of an “SDMX-RDF” in standard SDMX dissemination packages
[Diagram: Internal (production environment): an SDMX-driven production system feeds a dissemination data store (SDMX), exposed through an SDMX web service (SDMX-ML). External (dissemination to Web): an “SDMX-RDF” transform produces SDMX-RDF for a triplestore, which answers SPARQL queries with RDF.]
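To make the transform step concrete, the sketch below turns one observation into RDF triples serialized as N-Triples lines. It does not assume that any agreed “SDMX-RDF” vocabulary exists: the `http://example.org/` namespace, the property names, the sample observation, and the `observation_to_ntriples` helper are placeholders invented for illustration.

```python
BASE = "http://example.org/stats/"  # placeholder namespace, not a real vocabulary

def observation_to_ntriples(obs_id, dimensions, value):
    """Return N-Triples lines describing one observation.

    Each dimension becomes a triple pointing at a code URI, and the
    observed value becomes a plain literal.
    """
    subject = f"<{BASE}obs/{obs_id}>"
    lines = []
    for name, code in dimensions.items():
        predicate = f"<{BASE}dimension/{name}>"
        obj = f"<{BASE}code/{name}/{code}>"
        lines.append(f"{subject} {predicate} {obj} .")
    lines.append(f'{subject} <{BASE}measure/obsValue> "{value}" .')
    return lines

# Invented observation: three dimension codes and one value.
triples = observation_to_ntriples(
    "o1",
    {"FREQ": "A", "REF_AREA": "NL", "TIME_PERIOD": "2009"},
    101.5,
)
print("\n".join(triples))
```

A single observation with three dimensions yields four triples, each repeating full URIs, which gives a feel for how quickly an RDF rendering of a large data set can grow.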
SDMX and the Semantic Web: Benefits
• Leverages the “Linked Data” phenomenon without requiring a deep understanding of RDF, etc.
• Uses existing standards/models and best practices to do “heavy lifting” (data production)
• Puts a lot of reliable, quality data into the “Linked Data Web”
  – Helps address issues of provenance
Warning
• RDF is verbose!
• 4.5 Megs of GESMES/TS = 45 Megs of “compact” SDMX-ML XML = 420 Megs of RDF triples
• This may encourage the on-demand production of RDF data from web services, rather than static files
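The expansion factors implied by these figures can be worked out directly. The snippet below only restates the sizes quoted above, reading “Megs” as megabytes (an assumption), and computes each format's size relative to GESMES/TS.

```python
# Sizes quoted on the slide, read here as megabytes (an assumption).
sizes_mb = {"GESMES/TS": 4.5, "SDMX-ML (compact)": 45, "RDF triples": 420}

baseline = sizes_mb["GESMES/TS"]
factors = {fmt: size / baseline for fmt, size in sizes_mb.items()}

for fmt, factor in factors.items():
    print(f"{fmt}: {factor:.1f}x the GESMES/TS size")
```

On these numbers, compact SDMX-ML is 10 times the size of the GESMES/TS message and the RDF rendering is roughly 93 times the size, or a little over 9 times the SDMX-ML file.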
Standards and Classifications
• Some maintainers of standard classifications are looking at expressing them in useful formats (SDMX, DDI)
  – This is an easy thing to do
  – It is very useful: promotes re-use, comparability, etc.
  – Could apply to Semantic Web RDF expressions as well as XML-based standards
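As a rough indication of how little is involved, the sketch below serializes a small classification to XML using Python's standard library. The element and attribute names (`Classification`, `Name`, `Item`, `code`), the `classification_to_xml` helper, and the sample classification are invented for illustration; a real exchange would follow the SDMX-ML or DDI schemas rather than this ad hoc layout.

```python
import xml.etree.ElementTree as ET

def classification_to_xml(class_id, name, items):
    """Serialize a flat classification to a small XML document.

    Element and attribute names are illustrative only; they are not
    taken from the SDMX-ML or DDI schemas.
    """
    root = ET.Element("Classification", id=class_id)
    ET.SubElement(root, "Name").text = name
    for code, label in items:
        item = ET.SubElement(root, "Item", code=code)
        item.text = label
    return ET.tostring(root, encoding="unicode")

# A tiny made-up classification, not an extract of any real standard.
xml_text = classification_to_xml(
    "CL_SIZE",
    "Enterprise size class",
    [("S", "Small"), ("M", "Medium"), ("L", "Large")],
)
print(xml_text)
```

Because the content is just identifiers, codes, and labels, the same list could be written out in several standard formats from one maintained source.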
Ideas for Future Work
• Endorse SDMX – DDI mappings now being produced
• Develop an “SDMX-RDF” (?) or…
• Develop a harmonized statistical model for expression in RDF (based on DDI, SDMX, ISO/IEC 11179) (?)
  – Encourage tools developers to implement it in standard dissemination packages
• Publish standard classifications in standard formats
Summary
• Combined use of standards is becoming a reality
• Proactive engagement with the Semantic Web world could provide benefits to all concerned parties, as well as users