DDI Background Wendy Thomas DDI Sprint 2015 - Dagstuhl W. Thomas - 2014 Copyright © DDI 2015 Published under Creative Commons Attribution-ShareAlike 3.0 Unported

Page 1:

W. Thomas - 2014

DDI Background

Wendy Thomas
DDI Sprint 2015 - Dagstuhl

Copyright © DDI 2015
Published under Creative Commons Attribution-ShareAlike 3.0 Unported

Page 2:

W. Thomas - 2014

Credits

• Slides marked “W. Thomas – 2014” were created by Wendy Thomas
• Except as noted, all other slides are from the slide decks created for the DDI training workshop at Dagstuhl, October 2014, and related events
  – Arofan Gregory, Wendy Thomas, Joachim Wackerow, Jon Johnson
• All slides are published under Creative Commons Attribution-ShareAlike 3.0 Unported
  – Available online at: http://creativecommons.org/licenses/by-sa/3.0/
  – This is a human-readable summary of the Legal Code at: http://creativecommons.org/licenses/by-sa/3.0/legalcode

Page 3:

W. Thomas - 2014

Environment

• Background of content metadata
  – Rapid advances in data/metadata storage and access since the 1960s
  – Rising expectations for access and openness
  – Shifting focus on metadata coverage
  – DDI development in light of these changes
• DDI history

Page 4:

W. Thomas - 2014

Time Period | Access Expectation | Applied Metadata | DDI
1960-1970 | Electronic data capture and dissemination | Print; Bibliographic records for machine readable files | Discussion begins regarding consistent formats for metadata
1980-1990 | Increased access to computing; internet; CD-ROMs | Data file description; Electronic format; Discovery and basic codebook metadata | NSF grant to develop DDI; focus on simple study; description of a data file
2000-2005 | Web access; web publication | Access systems; Generic access system | Publication of DDI 1.0; NESSTAR and local use tools; Publication of DDI through 2.1; DDI Alliance; IHSN Toolkit
2006-2010 | One-stop-shopping; Open Data | Preservation; Provenance | DDI-L [3.0, 3.1]; Expansion of IHSN Toolkit to over 80 countries
2011-2015 | "Semantic Statistics"; Linked Data; Big Data | Shift from top down to bottom up | DDI-L 3.2; Moving Forward Project

Page 5:

W. Thomas - 2014

Types of Metadata

• Bibliographic
  – Short description applicable to multiple object types for the purpose of discovery and access
• Access metadata
  – Additional details required to access specific content of an object (location of content within the object, required hardware/software, etc.)
• Content metadata
  – Detailed information on the content structure and source of a data file
• Production metadata
  – Metadata that prescribes and drives processes
  – Metadata that is not limited to a single data file

Page 6:

W. Thomas - 2014

Types of metadata

Bibliographic | Access Metadata | Content Metadata | Production Metadata
Dublin Core, MARC, DMARC, etc. | SPSS, SAS, R, Stata, PREMIS, etc. | EMI, ANZLIC, SDMX, DDI-C | BPL, DDI-L, GSBPM, GSLPM, GSIM

Page 7:

W. Thomas - 2014

What is DDI

• DDI describes provenance, processing, structure, and meaning of your data for intelligent use by human beings and machines

• DDI allows you to say what you plan to do, what you actually did, and what the analyst needs to know to make intelligent use of the data

Page 8:

DDI Background

• Concept of DDI and definition of needs grew out of the data archival community
• Established in 1995 as a grant-funded project initiated and organized by ICPSR
• Members:
  – Social Science Data Archives (US, Canada, Europe)
  – Statistical data producers (including the US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada, and Health Canada)
• February 2003 – Formation of DDI Alliance
  – Membership-based alliance
  – Formalized development procedures

Page 9:

DDI Alliance

• The Alliance is an unincorporated, self‐sustaining membership organization whose members have a voice in the development, promotion, and dissemination of DDI specifications.

• 40 member institutions including universities, archives, national statistical institutes, and international organizations.

Page 10:

DDI Alliance Structure

• DDI-L specifications are created by committees drawn from among the member organizations
  – Some outside experts are invited to attend
• The elected Executive Board governs the organization
  – They are elected by Member representatives
• The Scientific Board votes to approve all published work
  – One representative per member organization
• The Technical Committee (TC) creates the technical work products (XML schemas, UML models, documentation, etc.)
• Working Groups are short-term groups working on future DDI topical content (e.g., Active Data Management Plan)
• Web Site Maintenance Group

Page 11:

DDI-C and DDI-L

• DDI has 2 development lines
  – DDI Codebook (DDI-C)
  – DDI Lifecycle (DDI-L)
• Both lines will continue to be improved
  – DDI-C focusing just on single study codebook structures
  – DDI-L focusing on a more inclusive lifecycle model and support for machine actionability

Page 12:

Early DDI: Characteristics of DDI-C

• Focuses on the static object of a codebook
• Designed for limited uses
  – End user data discovery via the variable or high level study identification (bibliographic)
  – Only heavily structured content relates to information used to drive statistical analysis
• Coverage is focused on single study, single data file, simple survey and aggregate data files
• Variable contains majority of information (question, categories, data typing, physical storage information, statistics)
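To make the "variable contains the majority of information" point concrete, here is a minimal sketch in Python that reads one variable description and pulls out the question, categories, data type, storage position, and statistics. The fragment is a simplified, namespace-free illustration; element names such as var, labl, qstn, catgry, and sumStat follow my understanding of DDI Codebook and should be checked against the published schema, and all values are invented.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment modelled on a DDI Codebook variable.
# Element names are an assumption to verify against the schema; the values
# (SEX, "What is your sex?", 1500, ...) are invented for illustration.
SAMPLE = """
<var ID="V1" name="SEX">
  <labl>Sex of respondent</labl>
  <qstn><qstnLit>What is your sex?</qstnLit></qstn>
  <varFormat type="numeric"/>
  <location StartPos="12" width="1"/>
  <catgry><catValu>1</catValu><labl>Male</labl></catgry>
  <catgry><catValu>2</catValu><labl>Female</labl></catgry>
  <sumStat type="vald">1500</sumStat>
</var>
"""


def summarize_variable(xml_text):
    """Collect everything the single variable element carries."""
    var = ET.fromstring(xml_text.strip())
    return {
        "name": var.get("name"),
        # findtext("labl") only looks at direct children, so it returns the
        # variable label rather than a category label.
        "label": var.findtext("labl"),
        "question": var.findtext("qstn/qstnLit"),
        "data_type": var.find("varFormat").get("type"),
        "start_pos": int(var.find("location").get("StartPos")),
        "categories": {
            c.findtext("catValu"): c.findtext("labl")
            for c in var.findall("catgry")
        },
        "statistics": {s.get("type"): s.text for s in var.findall("sumStat")},
    }


if __name__ == "__main__":
    for key, value in summarize_variable(SAMPLE).items():
        print(f"{key}: {value}")
```

Because the question text and the categories live inside the variable, nothing here can be described until the variable exists, which is the limitation the next slide raises.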

Page 13:

Limitations of these Characteristics

• Treated as an “add on” to the data collection process
• Focus is on the data end product and end users (static)
• Limited tools for creation or exploitation
• The Variable must exist before metadata can be created
• Producers hesitant to take up DDI creation because it is a cost and does not support their development or collection process

Page 14:

DDI Lifecycle Model

Metadata Reuse

Page 15:

Information Captured in DDI

[Figure: diagram of the information captured in DDI. Labels: Concept; Universe; Representation; Geography; Measurement; Means of Capture; Processes; Instrument; Collection Event; Methods / Protocols; Variables; Data Relations; Storage Structure; Store (file level); Summary Statistics; Management; Agents; Comparison; Discovery]

Page 16:

W. Thomas - 2014

DDI-L Driving Principles

• Capture the full lifecycle of data
• Capture metadata at the point of creation
  – Increases accuracy
  – Prevents loss through neglect
• Reuse
  – Increases accuracy
  – Provides implicit comparability
• Management over time
  – Provides context
  – Ensures preservation

Page 17:

W. Thomas - 2014

Additional Issues

• Metadata is “touched” by multiple agents
  – Metadata needs to be accessible throughout the lifecycle
• Metadata can be an “input”, a process definition, and an “output”
  – Metadata needs to be captured and retained at all points in the development, production, and usage lifecycle
• Metadata can be descriptive or prescriptive; actionable or non-actionable
  – The role of the metadata dictates how it needs to be captured

Page 18:

Metadata Management Over Time

• Modern best practice is to manage metadata in centralized repositories for use throughout the organization

• Metadata management is the foundation of all other applications

• Metadata is not static. It is versioned, and it is enhanced to support new applications and use cases and to capture provenance at all levels.

Page 19:

Metadata Reuse

• Metadata is often not managed in such a way that it can be easily reused

• Reuse of metadata provides a high degree of consistency and comparability – elements of data quality

• This benefits data producers, managers, and users
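As an illustration of why reuse yields consistency and comparability, the sketch below stores a category scheme once and lets variables from different studies point to it by identifier. This is a hypothetical in-memory model, not the DDI-L schema: the class names, the registry, and the "cs-yesno" identifier are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryScheme:
    """A reusable list of (code, label) pairs, defined once."""
    scheme_id: str
    categories: tuple


@dataclass(frozen=True)
class Variable:
    """A variable that references a category scheme instead of copying it."""
    name: str
    study: str
    scheme_id: str


# Hypothetical central registry: one definition shared by every user.
REGISTRY = {
    "cs-yesno": CategoryScheme("cs-yesno", (("1", "Yes"), ("2", "No"))),
}


def labels_for(variable):
    """Resolve the shared scheme into a code -> label mapping."""
    return dict(REGISTRY[variable.scheme_id].categories)


def comparable(a, b):
    """Two variables are implicitly comparable when they reuse one scheme."""
    return a.scheme_id == b.scheme_id


wave1 = Variable("EMPLOYED", "Wave 1", "cs-yesno")
wave2 = Variable("EMPLOYED", "Wave 2", "cs-yesno")

if __name__ == "__main__":
    print(comparable(wave1, wave2))
    print(labels_for(wave2))
```

Because both waves resolve to the same stored definition, a correction to the scheme reaches every variable that uses it, and comparability can be checked by identifier rather than by comparing label text.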

Page 20:

Metadata-Driven Systems

• DDI supports the automation of many tasks which today are manual and resource-intensive

• This requires detailed and rich metadata in a machine-actionable form

• May require the re-design of workflows within an organization – put the metadata first!

• A major part of the “modernization” of production and dissemination systems

• Funding agencies are demanding more metadata to allow the reuse and replication of research data they pay for

Page 21:

W. Thomas - 2014

DDI RDF Products

• DDI held two workshops on “Semantic Statistics” at Schloss Dagstuhl over the past few years to address the expression of DDI in an RDF environment

• DDI has just completed the first review for the following RDF vocabularies:
  – DISCO – DDI-RDF Discovery Vocabulary
  – XKOS – Extended Knowledge Organization System
  – PHDD – Physical Data Description

• DDI 4 will be model based and implemented in both XML and RDF

Page 22:

DDI Moving Forward Project

• The development of the next version of DDI is a project with a limited lifespan

• The project has an Advisory Committee to oversee the organization of the work

• It has its own project management and working group structure

• All development work is being done in an open, transparent fashion

• The outputs (for review or publication) are overseen by the Technical Committee

Page 23:

STANDARDS USED WITH DDI

Page 24:

DDI and Other Standards

• DDI is used in combination with other standards in many different contexts. The following are two examples:
  – Data archives and libraries doing preservation, management, and dissemination
  – Statistical offices doing data production

• This is not a comprehensive list of scenarios, but is indicative of what takes place

Page 25:

Archival/Management Standards

• Dublin Core – discovery metadata
• ISO/IEC 11179 – Metadata registries
• ISO 19115 (and related series) – geographical systems
• OAIS/PREMIS – archival process
• METS – information wrapper for transport and storage
• GLBPM – process description for longitudinal data

Page 26:

Official Statistics

• Generic Statistical Business Process Model (GSBPM) – process model of statistical production

• Generic Statistical Information Model (GSIM) – information model for statistical production

• ISO 17369 SDMX – Statistical data and metadata exchange and dissemination

• Common Statistical Production Architecture (CSPA) – standard architecture for building re-usable statistical services

• Business Process Modelling Notation (BPMN) – generic standard for modelling and communicating business processes

Page 27:

RDF and the Web of Linked Data

• RDF and related standards are becoming increasingly important to the DDI Community

• There are a large number of RDF vocabularies which are important (dcterms, skos, foaf, prov, qb, etc.)

• In its own RDF vocabularies, DDI has made every effort to align with existing vocabularies in accordance with best practice
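As a small illustration of aligning with existing vocabularies, the sketch below describes a study with terms from dcterms and skos and serializes the statements as N-Triples using only the Python standard library. The example.org identifiers and the study title are invented; the two namespace IRIs are the published ones for dcterms and skos. This is not an excerpt of the DDI RDF vocabularies.

```python
# Published namespace IRIs for two of the vocabularies named above.
DCTERMS = "http://purl.org/dc/terms/"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical identifiers, invented for this example.
STUDY = "http://example.org/study/1"
CONCEPT = "http://example.org/concept/employment"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


TRIPLES = [
    (iri(STUDY), iri(DCTERMS + "title"), literal("Example Labour Survey")),
    (iri(STUDY), iri(DCTERMS + "subject"), iri(CONCEPT)),
    (iri(CONCEPT), iri(SKOS + "prefLabel"), literal("Employment")),
]

if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Reusing dcterms:title and skos:prefLabel instead of minting new properties means generic linked-data tools can read the title and concept label without knowing anything about DDI.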

Page 28:

DDI as a Good Citizen

• The DDI Alliance has made many efforts to reach out to other standards organizations
  – Usually the goal is alignment of the standards
• The DDI Alliance has also tried to stay current with best practice as a standard specification
  – Shift to model-driven approach
  – Use of W3C XML Schema
  – Support for RDF standards

Page 29:

Expanded Domain Usage

• Traditionally, DDI has been used by those in the Social, Behavioral, and Economic (SBE) sciences and by official statistical agencies.

• There are several “growth” domains:
  – Health research
  – Education
  – Environmental science

• Thus, even more existing standards are becoming of interest to the DDI Alliance

Page 30:

DDI TOOLS OVERVIEW

Page 31:

DDI Tools Outline

• Editing
• Data Collection
• Transformation
• Management
• Metadata/Data Dissemination

Page 32:

Editing Tools

• Rogatus Survey (DIPF) – WIP, Open Source, .NET, DDI 3.2
  – Being updated – available end of 2015
• Colectica – Commercial, .NET, DDI-C and DDI-L 3.2
  – Also, free Excel Plug-In
• DDI Editor (DDA) – Open Source, Java, DDI 3.1
• OM Survey Manager – Open Source, Java, DDI 3.1
  – Supports merging codes/categories created from transforming DDI-C
• GESIS tool for researcher documentation, variable harmonization (CharmStats) – WIP
• Nesstar Publisher (NSD) – Freeware, DDI Codebook
• Metadata Management Toolkit (IHSN) – freeware/open source, Java (based on Nesstar); editor and publications tools for websites/CD-ROM, DDI Codebook
• CED2AR (Cornell) – Open Source, Java, DDI Codebook

Page 33:

Data Collection

• Rogatus Survey (DIPF) – WIP, Open Source, .NET, DDI 3.2
  – Generates and documents surveys, manages fieldwork
  – Being updated – available end of 2015
• Michigan Questionnaire Documentation System (SRO)
  – Export DDI documentation from Blaise instruments
• BlaiseDoc (SRO) – WIP
  – Create documented Blaise instruments
• Colectica – DDI-C, DDI-L (3.2), Commercial, .NET
  – Reads and generates Blaise code, CASES, CSPro, and others
  – Document questionnaires manually
• DDI Editor (Danish Data Archive) – Open Source, Java, DDI 3.1
  – Document questionnaires manually
• R code for integrating REDCap questionnaires with DDI (Larry Hoyle) – DDI 3.1

Page 34:

Transformation Tools

• DDI XML Upgrade (Metadata Technology NA) – Open Source, DDI-C to DDI 3.1, command line transform (Java)
• Sledgehammer (Metadata Technology NA) – Free and paid versions, GUI or batch mode, mines metadata from stats packages and produces set-ups for other stats packages, SQL, other formats. All versions of DDI, SSS. (Java)

• Caelum (Metadata Technology NA) - command-line tool for XSLT reports from DDI metadata, freeware (Java)

• StatTransfer – commercial tool supporting DDI 3.1 extraction from stats packages and spreadsheets

• DDI with R Tool – creates R set-ups from DDI Codebook (will be extended for DDI Lifecycle) – Adrian Dusa

• r2ddi (SOEP) – creates R set-ups from DDI 3.1

Page 35:

Management Tools

• Colectica Repository – Commercial, .NET
• Rogatus Repository (DIPF) – WIP, Open Source, .NET, DDI 3.2
• Ariā (Metadata Technology North America) – Commercial, Java, various DDI versions. Classification management system.

Page 36:

Metadata/Data Dissemination Tools

• Colectica Web – Commercial, .NET, DDI 3.2
• Rogatus Repository (DIPF) – Open Source, .NET, DDI 3.2
• SuperCross (Space Time Research) – Commercial, online tabulation reading DDI 3.1
• Nesstar Server (NSD) – Online tabulation and search, DDI Codebook
• NADA Catalog (IHSN) – Open Source, Java, online catalog tool, DDI Codebook
• Questasy (CentreData) – community development, PHP, online system for documentation creation and dissemination
• DDI on Rails (SOEP) – Open Source, based on Ruby on Rails web framework

Page 37:

W. Thomas - 2014

RESOURCES

Page 38:

DDI Resources

• DDI Alliance Site
  – http://www.ddialliance.org
  – General link to all resources/news
  – Link to Sourceforge for standards distributions
  – Controlled Vocabularies
• Tools/Resources Page
  – http://tools.ddialliance.org
  – Best place for tools, slides, and resources
• Mailing Lists
  – www.icpsr.umich.edu/mailman/admin/
  – All of the lists starting with “DDI” are related to DDI topics
    • DDI Users (best place to link into the group)
    • List for each sub-committee (not all groups are active)

Page 39:

DDI Publications

• Best Practices Across the Data Life Cycle
  – www.ddialliance.org/resources/publications/working/bestpractices
• Use Cases
  – www.ddialliance.org/resources/publications/working/usecases
• IASSIST Quarterly
  – www.iassistdata.org/iq/
  – A special double feature focusing on various projects related to DDI 3 and its enhanced features
  – Articles related to DDI can be found in many issues of the IQ

Page 40:

Want to know more on DDI 4.0?

• Minutes and formal products from the Dagstuhl and EDDI sprints are found at the DDI site
  – http://www.ddialliance.org/ddi-moving-forward-process
• A new public wiki has been established to maintain and organize all working and final documents
  – https://ddi-alliance.atlassian.net/wiki/pages/viewpage.action?pageId=491703
  – Current development platform for content capture
  – http://lion.ddialliance.org/