Metadata Management and Tools

Metadata Management and Tools August 1, 2013 Data Curation Course


Transcript of Metadata Management and Tools

Page 1: Metadata Management and Tools

Metadata Management and Tools

August 1, 2013

Data Curation Course

Page 2: Metadata Management and Tools

Outline

• General information about metadata
• Metadata and the data life cycle
• DDI – a specification for documenting social, behavioral and economic data
• Exercises

Page 3: Metadata Management and Tools

Defining Metadata

• Metadata are commonly described as “data about data”

• Metadata serve as a “bridge” between the data producer and the data user

• Metadata bring data to life, helping the user interpret and understand the data

Page 4: Metadata Management and Tools

Simple Example

[Figure: examples labeled Bad, Better, Better, and Best (Rich, Structured)]

Page 5: Metadata Management and Tools

Importance of Metadata

• John MacInnes, Professor of Sociology, The University of Edinburgh, talks about the issues in using secondary data.*

• http://www.youtube.com/watch?v=xlQMVV7VJtA

* Video courtesy of MANTRA Research Data Management Training -- http://datalib.edina.ac.uk/mantra/

Page 6: Metadata Management and Tools

Concerns About Creating Metadata

• Concern: workload required to capture accurate, robust metadata
– Solution: incorporate metadata creation into the data development process – distribute the effort
• Concern: time and resources to create, manage, and maintain metadata
– Solution: include in grant budget and schedule
• Concern: readability / usability of metadata
– Solution: use a standardized metadata format
• Concern: discipline-specific information and ontologies
– Solution: ‘profile’ the standard to require specific information and use specific values

DataONE Education Module: Metadata. DataONE. Retrieved July 19, 2013

Page 7: Metadata Management and Tools

Metadata Types

• Types of metadata, by content:*
– Descriptive: Intellectual content and contextual information relevant to understanding and interpreting data
– Technical: Physical and digital features of a data resource
– Structural: Configuration of a resource, connections and relationships among parts, or among related resources

*Adapted from Jenn Riley, Seeing Standards: A Visualization of the Metadata Universe

Page 8: Metadata Management and Tools

Metadata and the Data Life Cycle

• Metadata-driven life cycle: Metadata are created, but also used and reused, at every stage of the data life cycle

• Ideally, metadata continue to accumulate to provide a complete record of the evolution of a dataset

Page 9: Metadata Management and Tools

Metadata and the Data Life Cycle

Rich metadata = smooth life cycle, high quality data

Page 10: Metadata Management and Tools

Structured Metadata

• Enhances the value and usability of metadata
• A consistent, predictable metadata structure enables
– More effective searches
– Automated management and processing
– Resource sharing
– Interoperability
• Standardization leads to greater efficiency
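The benefits above can be illustrated with a minimal Python sketch. It assumes a small set of hypothetical records whose field names (`title`, `country`, `year`) and values are made up for illustration; they do not come from any real catalog. Because every record uses the same predictable fields, exact-match search and automated counting take only a few lines.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only.
records = [
    {"title": "Household Survey 2010", "country": "Kenya", "year": 2010},
    {"title": "Labor Force Study", "country": "Peru", "year": 2012},
    {"title": "Household Survey 2012", "country": "Kenya", "year": 2012},
]


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


def facet_counts(records, field):
    """Count how many records carry each value of one field."""
    return Counter(r[field] for r in records if field in r)


kenya_2012 = search(records, country="Kenya", year=2012)
by_country = facet_counts(records, "country")
```

The same questions asked of free-text descriptions would require parsing prose; with a consistent structure they reduce to dictionary lookups.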

Page 12: Metadata Management and Tools

Standards

Cartoon courtesy of XKCD.com

Page 13: Metadata Management and Tools

What is DDI?

• A metadata standard of and for the community
• Two major development lines
– DDI Codebook
– DDI Lifecycle
• Metadata for both human and machine consumption
• Additional specifications:
– Controlled vocabularies
– RDF vocabularies for use with Linked Data

Page 14: Metadata Management and Tools

DDI Background and History

• Its development started in the mid-1990s, as a grant-funded effort initiated and organized by ICPSR, with international participation

• First version published in February 2000

Page 15: Metadata Management and Tools

Background and History Continued

• The DDI Alliance was formed in 2003 to support and develop the DDI standard

http://www.ddialliance.org/

• Ever-growing number of DDI users; large multinational projects
– CESSDA data portal (20 European data archives)
– International Household Survey Network – IHSN (developing countries from Africa, Asia, former Soviet Union, and more recently, Latin America)

Page 16: Metadata Management and Tools

DDI Members and Projects Worldwide

Page 17: Metadata Management and Tools

DDI Specification

• The first versions of DDI (1.0 through 2.1) were document- and codebook-centric

• Version 3.0 was published in April 2008 to document the data life cycle

Page 18: Metadata Management and Tools

RDF Vocabularies for Semantic Web

• DDI-RDF Discovery Vocabulary
o For publishing metadata about datasets into the Web of Linked Data
o Based on DDI Codebook and DDI Lifecycle
• XKOS
o RDF vocabulary for describing statistical classifications, which is an extension of the popular SKOS vocabulary

Publication expected in second half of 2013

Page 19: Metadata Management and Tools

DDI of the Future

• Robust and persistent data model (for the metadata), with extension possibilities, variety of technical expressions

• Complete data life cycle coverage
• Broadened focus for new research domains
• Simpler specification that is easier to understand and use, including better documentation

Page 20: Metadata Management and Tools

Benefits of DDI Approach

• Rich content (currently over 800 items)
• Metadata reuse across the life cycle
• Machine-actionability
• Data management and curation
• Support for longitudinal data and comparison

Page 21: Metadata Management and Tools

Metadata Reuse

Page 22: Metadata Management and Tools

DDI Alignment with Other Metadata Standards

• MARC: DDI-C, DDI-L
• Dublin Core: DDI-C, DDI-L
• SDMX (Statistical Data and Metadata Exchange): DDI-L
• ISO 11179 (Metadata Registries): DDI-L
• FGDC (Digital Geospatial Metadata): DDI-L
• ISO 19115 (Geographic Information Metadata): DDI-L
• PREMIS (Preservation Metadata), METS (Metadata Encoding and Transmission): under consideration
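Alignment with a simpler standard such as Dublin Core can be pictured as a crosswalk. The Python sketch below is illustrative only: the pairing of DDI Codebook element names with Dublin Core terms in `DDI_TO_DC` is an assumption made for this example, not an official mapping, and the sample record is invented.

```python
# Illustrative crosswalk (an assumption for this sketch, not an official mapping):
# DDI Codebook element name -> Dublin Core term.
DDI_TO_DC = {
    "titl": "title",
    "AuthEnty": "creator",
    "abstract": "description",
    "keyword": "subject",
    "geogCover": "coverage",
}


def to_dublin_core(ddi_record):
    """Map a dict of DDI element name -> list of values onto Dublin Core terms."""
    dc = {}
    for ddi_name, values in ddi_record.items():
        dc_name = DDI_TO_DC.get(ddi_name)
        if dc_name is None:
            continue  # no counterpart in this sketch, so the detail is dropped
        dc.setdefault(dc_name, []).extend(values)
    return dc


# Invented sample record for demonstration.
sample = {
    "titl": ["Example Study"],
    "keyword": ["health", "income"],
    "sampProc": ["Stratified random sample"],
}
dc_record = to_dublin_core(sample)
```

The dropped `sampProc` value shows the usual cost of mapping a rich standard onto a simpler one: detail without a counterpart does not survive the conversion.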

Page 23: Metadata Management and Tools

DDI-L or DDI-C?

• DDI-L
– Complex data (hierarchical, longitudinal, comparative)
– Metadata-driven survey design (building questionnaires)
– Multiple languages
– Detailed geographic information
– Metadata reuse across the data life cycle
– Reusable resources: question/concept/variable banks, registries of organizations and individuals, etc.

Page 24: Metadata Management and Tools

DDI-L or DDI-C?

• DDI-C
– Documentation of simple, survey-type data
– Catalog records, involving mainly study-level descriptions (most new features in DDI-L relate to documenting data at item/variable level)

• Both DDI-C and DDI-L may be used within the same organization

• ICPSR uses DDI-C but has translation to DDI-L for study-level records

Page 25: Metadata Management and Tools

DDI-C Structure and Contents

DDI-C main sections:

1. Document Description
Self-referencing information about the DDI instance at hand. Usually for internal use, not publicly displayed

2. Study Description
General information about the study. Input is usually the introductory part of a codebook, describing the study scope, methodology, topical/temporal coverage, etc. In DDI-C this section also includes data access and availability information

3. File Description
Describes physical characteristics of data file(s) – name, format, structure, dimensions

4. Data Description
Detailed description of each variable, including variable groups if applicable. Special subsection for documenting census-type aggregate data

5. Other (Study Related) Materials
References or contains materials used in the production of the study or useful in the analysis of the data

For complete content and Tag Library see http://www.ddialliance.org/Specification/DDI-Codebook/2.1/DTD/Documentation/DDI2-1-tree.html
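The main sections listed above can be sketched as an XML skeleton built with Python's standard library. This is a minimal sketch, not a valid DDI instance: it uses the DDI Codebook element names `codeBook`, `docDscr`, `stdyDscr`, `fileDscr`, `dataDscr`, and `otherMat`, but omits the XML namespace and most required content. The helper names `add_path` and `build_codebook` and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_path(parent, *tags):
    """Create nested child elements under parent and return the innermost one."""
    for tag in tags:
        parent = ET.SubElement(parent, tag)
    return parent


def build_codebook(title, file_name, variables):
    """Assemble a bare DDI-C-style tree; `variables` maps variable name -> label."""
    root = ET.Element("codeBook")

    # 1. Document Description: information about the DDI instance itself.
    add_path(root, "docDscr", "citation", "titlStmt", "titl").text = (
        "DDI record for " + title
    )
    # 2. Study Description: general information about the study.
    add_path(root, "stdyDscr", "citation", "titlStmt", "titl").text = title
    # 3. File Description: physical characteristics of the data file.
    add_path(root, "fileDscr", "fileTxt", "fileName").text = file_name
    # 4. Data Description: one <var> element per variable.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables.items():
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label
    # 5. Other (Study Related) Materials: left empty in this sketch.
    ET.SubElement(root, "otherMat")
    return root


# Sample values are invented for illustration.
codebook = build_codebook("Example Study", "example.dat", {"Q25": "Example label"})
xml_text = ET.tostring(codebook, encoding="unicode")
```

A real record would be checked against the published DDI Codebook schema before use; this skeleton only shows how the sections nest.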

Page 26: Metadata Management and Tools

Study-level DDI Elements at ICPSR

• Study ID (Number, DOI)
• Title, Alternate Title
• Author/Primary Investigator
• Bibliographic Citation
• Funding Information
• Abstract
• Keywords/Topic Classification
• Series Information
• Geographic Coverage
• Time Period Covered
• Time Method
• Date(s) of Collection
• Mode of Collection
• Universe
• Sampling
• Unit of Analysis
• Response Rates
• Weighting Information
• Data Type
• Extent of Processing
• Access Conditions/Restrictions
• Version History

Page 27: Metadata Management and Tools

Study-level DDI at ICPSR

• Leveraged in several ways

o Data discovery -- Forms basis of Solr/Lucene faceted search

o Repurposing -- Record is reused across ICPSR’s topical archive sites

o Interoperating -- Records shared with Data-PASS, ODESI, and CESSDA archives

o Study Overview -- Becomes PDF overview bundled with each download

Example: www.icpsr.umich.edu/icpsrweb/ICPSR/studies/30103

Page 28: Metadata Management and Tools

DDI at ICPSR: Study-level Metadata Editor

Page 29: Metadata Management and Tools

DDI at ICPSR: Study-level Metadata Editor

Page 30: Metadata Management and Tools

Variable-level DDI elements at ICPSR

• Variable name and ID
• Variable label
• Question text
• Descriptive variable text
• Category labels and values (responses)
• Category statistics (frequencies)
• Summary statistics
• Variable format
• Notes
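Several of the variable-level elements above map onto DDI Codebook XML elements such as `var`, `labl`, `qstn`/`qstnLit`, and `catgry` with `catValu` and `catStat`. The Python sketch below reads them from a small hand-written fragment. The fragment's contents are invented for illustration, the namespace is omitted, and `read_variable` is a hypothetical helper, not part of any ICPSR tool.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment; the label, question, and frequencies are invented.
SAMPLE = """
<var ID="V1" name="Q25">
  <labl>Example label</labl>
  <qstn><qstnLit>Example question text?</qstnLit></qstn>
  <catgry><catValu>1</catValu><labl>Yes</labl><catStat type="freq">60</catStat></catgry>
  <catgry><catValu>2</catValu><labl>No</labl><catStat type="freq">40</catStat></catgry>
</var>
"""


def read_variable(xml_text):
    """Extract name, label, question text, and categories from one <var> element."""
    var = ET.fromstring(xml_text)
    categories = []
    for cat in var.findall("catgry"):
        freq = cat.findtext("catStat[@type='freq']")
        categories.append({
            "value": cat.findtext("catValu"),
            "label": cat.findtext("labl"),
            "freq": int(freq) if freq is not None else None,
        })
    return {
        "name": var.get("name"),
        "label": var.findtext("labl"),  # direct child only, not category labels
        "question": var.findtext("qstn/qstnLit"),
        "categories": categories,
    }


variable = read_variable(SAMPLE)
```

Once variables are available in this form, searching across them or generating a codebook with frequencies becomes a matter of iterating over dictionaries.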

Page 31: Metadata Management and Tools

Variable-level DDI at ICPSR

• Variable-level DDI leveraged in several ways

o Search -- Permits search of variables within a dataset/series
o Search across ICPSR -- Serves as foundation for Social Science Variables Database
o Integration with online analysis
o Codebook with frequencies -- Enables generation of PDF documentation

• Example: http://www.icpsr.umich.edu/icpsrweb/ICPSR/ssvd/studies/30103/datasets/1/variables/Q25

Page 32: Metadata Management and Tools

Tools for generating DDI metadata

• Nesstar Publisher
– DDI-C, study, file, and variable level
• Colectica
– DDI-L configuration, study and variable level
– Both DDI-C and DDI-L compatible (import and export)
– Exports DDI and PDF, HTML, RTF documentation (no need to re-convert to presentation formats)
• Colectica for Excel

Page 33: Metadata Management and Tools

Tools continued

• XCONVERT (SDA Berkeley)
– DDI-C, variable level: converts SAS, SPSS, or Stata syntax into DDI-XML, without frequencies
• Stat/Transfer (v. 11)
– DDI-L, variable level: no frequencies
• MQDS tool
– Exports Blaise to DDI-L to create study documentation

Page 34: Metadata Management and Tools

Tools continued

• More DDI tools can be found here: http://www.ddialliance.org/resources/tools

Page 35: Metadata Management and Tools

Questions?