Data Publishing and Metadata Creation - IASSIST Home
Data Publishing and Metadata Creation
Nicole Quitzsch GESIS Leibniz-Institute for the Social Sciences
IASSIST 7 June 2012
• Data is difficult to manage after project funding ends
• No direct access to data
• No widely used method to identify datasets
• No widely used method to cite datasets
• No effective way to link between datasets and articles
• Datasets are not included in impact analysis
Where do we stand?
What can we do about it?
• safeguarding and accessibility of research data
• research data as legitimate, citable contribution to the scientific record
• linking of data and publications
Why? Data should be…
• visible and accessible
• permanently citable
• linked with published articles and books
Development of an infrastructure in cooperation with DataCite
What is da|ra?
• Since Feb. 2010: GESIS is a member of DataCite
• 2011-2013: Implementation of a registration portal for social and economic data, including upgrade of services
da|ra Metadata schema v2.2.1
Registry service and database, Upgrading
SLA Template
da|ra Services
• 5 publication agents
• almost 5,000 registered metadata sets
• 2,465 OECD metadata sets included
DOI Registry Process
[Figure: diagram with five numbered steps (1-5) between the user, the registry service and the DataCite Metadata Store. Labels in the diagram: {Metadata}, {DOI, Metadata}, {Metadata, DOI} and "status: OK".]
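As a rough illustration of the registry process, the following Python sketch uses in-memory stand-ins for the registry service and the DataCite Metadata Store. The class names, the placeholder DOI prefix `10.9999`, the suffix format and the exact order of the calls are assumptions made for illustration only; this is not the da|ra or DataCite API.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the DataCite Metadata Store."""
    records: dict = field(default_factory=dict)

    def register(self, doi: str, metadata: dict) -> str:
        # Keep the {DOI, Metadata} pair and acknowledge it.
        self.records[doi] = metadata
        return "OK"


@dataclass
class RegistryService:
    """Hypothetical stand-in for the registry service."""
    store: MetadataStore
    prefix: str = "10.9999"  # placeholder prefix, not a real one
    _counter: count = field(default_factory=lambda: count(1))

    def submit(self, metadata: dict) -> dict:
        # The user sends {Metadata}; the service assigns a DOI (assumed step).
        doi = f"{self.prefix}/example.{next(self._counter)}"
        # The service forwards {DOI, Metadata} to the store.
        status = self.store.register(doi, metadata)
        # The service reports {DOI, Metadata} and the status back to the user.
        return {"doi": doi, "metadata": metadata, "status": status}


store = MetadataStore()
service = RegistryService(store)
result = service.submit({"title": "Example survey 2012"})
print(result["doi"], result["status"])
```

In this sketch the user never talks to the store directly; the registry service is the only component that assigns identifiers and passes records on.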
Development of the da|ra Metadata Schema
da|ra Metadata Schema: Structured set of characteristics to describe social and economic research data
Schema version V 1.0: Based on the metadata schema of the GESIS Data Catalogue (DBK) and the DataCite Kernel version 1.0
Version 2.x: Developed by GESIS and ZBW
Social and economic research data: Specific requirements (1)
DataCite Schema: "lowest common denominator"
da|ra Schema:
• Extended according to DataCite metadata schema
• DataCite elements + specific elements
• Extensive description of research data
• Foundation for consistent citation of data
Social and economic research data: Specific requirements (2)
• Specific metadata elements: time dimension, temporal coverage, sampling
• Specific development tools: Controlled vocabularies, thesauri, classifications
Properties of the da|ra Metadata Schema (1)
36 elements:
• 8 descriptive mandatory elements
• +4 administrative elements
• 24 optional items
Properties of the da|ra Metadata Schema (2)
Principles:
• Fields with controlled content always have an additional field for free content
• Controlled vocabularies/syntax follow standards (DataCite vocabularies, DDI vocabularies, DCMI media types, ISO / DIN)
17.1 Geographic Coverage (controlled): Universe.areaControlled
Geographic units on which the study focuses, taken from a controlled vocabulary (geographic names authority list).
Vocabulary: ISO 3166-2/3, UN/LOCODE
17.2 Geographic Coverage (free): Universe.areafree
Geographic units on which the study focuses (free text).
Allows geographic units to be entered freely if they are not available in the controlled vocabulary, e.g. West Berlin.
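The pairing of a controlled field with a free field can be sketched in Python as follows. Only the two field names come from the schema excerpt above; the function name and the three-entry vocabulary are illustrative assumptions, and a real authority list would be loaded from ISO 3166 or UN/LOCODE.

```python
# Tiny illustrative subset of a controlled vocabulary (assumption).
CONTROLLED_AREAS = {"DE": "Germany", "FR": "France", "DE-BE": "Berlin"}


def geographic_coverage(controlled_codes, free_text=None):
    """Build both coverage fields, rejecting unknown controlled codes."""
    unknown = [c for c in controlled_codes if c not in CONTROLLED_AREAS]
    if unknown:
        raise ValueError(f"not in controlled vocabulary: {unknown}")
    return {
        "Universe.areaControlled": list(controlled_codes),
        # Free text carries units the vocabulary cannot express.
        "Universe.areafree": free_text,
    }


record = geographic_coverage(["DE"], free_text="West Berlin")
print(record)
```

Keeping the two fields separate means the controlled field stays machine-comparable while historical or informal units such as West Berlin are still recorded.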
Properties of the da|ra Metadata Schema (3)
Information in the schema documentation:
• Identifier of the elements
• Definitions of the elements
• Details of the obligation (mandatory or optional)
• Repeatability
• Vocabulary encoding schemes
• Syntax encoding schemes
• Data type
• Editing of fields
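One way to picture such a documentation entry is as a small record type. The Python sketch below is an assumption about how the listed properties could be held in code; the class and attribute names are hypothetical, and the obligation, repeatability and data type values shown for element 17.1 are placeholders, not values taken from the da|ra schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ElementSpec:
    """Hypothetical container for one documented schema element."""
    identifier: str                          # identifier of the element
    name: str
    definition: str                          # definition of the element
    obligation: str                          # placeholder values: "mandatory" / "optional"
    repeatable: bool                         # repeatability
    vocabulary_scheme: Optional[str] = None  # vocabulary encoding scheme
    syntax_scheme: Optional[str] = None      # syntax encoding scheme
    data_type: str = "string"                # placeholder default


area_controlled = ElementSpec(
    identifier="17.1",
    name="Universe.areaControlled",
    definition="Geographic units on which the study focuses.",
    obligation="optional",   # placeholder, not confirmed by the schema
    repeatable=True,         # placeholder, not confirmed by the schema
    vocabulary_scheme="ISO 3166-2/3, UN/LOCODE",
)
print(area_controlled.identifier, area_controlled.vocabulary_scheme)
```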
Properties of the da|ra Metadata Schema (4)
Goals:
• Ensure quality of metadata
• Interoperability
• Further development of mappings (DataCite, DDI, Dublin Core)
• Sustainability of the data: da|ra metadata records should be available for the Semantic Web
• Semantic Web: the "Understanding Web", in which information is linked at the level of meaning
da|ra-Metadata for the Semantic Web (1)
Prerequisite:
• Machine-interpretable information
• Uniqueness of a concept
• Integration of standard (authority) data for individuals, corporations and topics
da|ra-Metadata for the Semantic Web (2)
First Identification, then Linking
[Figure: research publications (e.g. DOIs, URNs) and research data (e.g. DOIs) are identified and then linked by relations labelled "is provider of", "belongs to" and "is author of".]
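The idea of identifying first and linking afterwards can be sketched with plain subject-predicate-object triples in Python. The relation names come from the slide; every identifier, the direction of each relation and the helper function are made-up assumptions for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# All identifiers below are placeholders, not real DOIs or persons.
triples = [
    ("person:example-author", "is author of", "doi:10.9999/example.article"),
    ("doi:10.9999/example.data", "belongs to", "doi:10.9999/example.article"),
    ("org:example-archive", "is provider of", "doi:10.9999/example.data"),
]


def linked_to(identifier):
    """Return (predicate, other identifier) for every triple touching identifier."""
    links = []
    for subject, predicate, obj in triples:
        if subject == identifier:
            links.append((predicate, obj))
        elif obj == identifier:
            links.append((predicate, subject))
    return links


print(linked_to("doi:10.9999/example.data"))
```

Because every node is a persistent identifier, a link can be followed without matching names or titles, which is why identification has to come before linking.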
Conclusion
• Establishing DOIs for research datasets is easy … if you have a service provider like da|ra.
• Managing the metadata and keeping track of versions is possible … if you invest in documentation systems and establish a policy.
• Time will tell … if researchers adopt data citation as a scientific principle.
Thank you for your attention!
Nicole Quitzsch
GESIS–Leibniz-Institute for the Social Sciences [email protected]