DataCite, DataONE, Dryad and UC3
William Michener, DataONE and University of New Mexico
John Kunze and Patricia Cruse, University of California Curation Center (UC3), California Digital Library and DataONE
Ryan Scherle, Dryad (National Evolutionary Synthesis Center) and DataONE
A Choice
If the scientific record is at risk
– Results can’t be reproduced
– Science fails, global catastrophe ensues
The choice: Better data publishing, sharing, and archiving
OR
Planetary destruction?
Roberto Rizzato
• engaging the scientist in the data curation process
• supporting the full data life cycle
• encouraging data stewardship and sharing
• promoting best practices
• engaging citizens
• developing domain-agnostic solutions
Providing universal access to data about life on earth and the environment that sustains it
1. Build on existing cyberinfrastructure
2. Create new cyberinfrastructure
3. Support new communities of practice
A Vision for Change: DataONE
University of California Curation Center, California Digital Library
DataONE Cyberinfrastructure
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services
Flexible, scalable, sustainable network
DataONE Wish List for Data Citation
• Precise identification of a dataset
– At level of version, file, table, cell, etc., or groups thereof
– So that readers can find and understand the data
• Credit to data producers and data publishers
– Vital incentive for data sharing and archiving
• A link from the traditional literature to the data
– Gives intellectual legitimacy to creation of data sets
• Research metrics for datasets
– Sponsors want publication and retention numbers
• Coordinated citation support for local data producers, regional archives, and global end-users
Identifier Requirements
• To accommodate a diverse set of member nodes that hold a wide variety of content, the DataONE system must adhere to the following principles:
– Agnosticism – DataONE supports all identifier schemes where the ID can be represented as a Unicode string.
– Opacity – DataONE does not attach any meaning or resolution protocol based on the identifier.
– Authority – The identifier first assigned by a member node is authoritative. Other identifiers may be assigned by other nodes for internal use.
Identifier Requirements
• To participate in the DataONE network, a node must be able to meet the following requirements:
– Uniqueness – Identifiers must be unique across the space of DataONE.
– Granularity – Every item must be assigned an identifier (metadata as well as data).
– Immutability – The object referenced by an identifier cannot change. If an object is modified, it must receive a new identifier.
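The uniqueness, granularity, and immutability requirements can be illustrated with a minimal sketch. The `IdentifierRegistry` class, its method names, and the use of a SHA-256 digest to detect changes are assumptions made for illustration; they are not part of any DataONE API. Identifiers are treated as opaque Unicode strings, in keeping with the agnosticism and opacity principles.

```python
import hashlib


class DuplicateIdentifierError(Exception):
    """Raised when an identifier is already taken (uniqueness)."""


class IdentifierRegistry:
    """Hypothetical in-memory registry, not a DataONE component."""

    def __init__(self) -> None:
        # Maps opaque identifier string -> digest of the registered object.
        self._digests: dict[str, str] = {}

    def register(self, identifier: str, content: bytes) -> None:
        # Opacity: the identifier is never parsed, only compared.
        if not isinstance(identifier, str) or not identifier:
            raise ValueError("identifier must be a non-empty string")
        # Uniqueness and immutability: an existing identifier can never
        # be reused, so modified content needs a new identifier.
        if identifier in self._digests:
            raise DuplicateIdentifierError(identifier)
        self._digests[identifier] = hashlib.sha256(content).hexdigest()

    def verify(self, identifier: str, content: bytes) -> bool:
        # True only if the content is byte-identical to what was registered.
        return self._digests[identifier] == hashlib.sha256(content).hexdigest()


registry = IdentifierRegistry()
# Granularity: data and metadata each receive their own identifier.
registry.register("doi:10.5061/dryad.20", b"data bytes")
registry.register("example-metadata-id-1", b"<metadata/>")  # made-up id
```

In this sketch a modified object would be registered under a fresh identifier, while the original identifier keeps pointing at the unchanged bytes.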
Think Big, Start Small
CDL leading 2 projects involving DataONE:
1. EZID for simple identifier management
– Creates ids, stores metadata and resolver target URLs
– Supports DataCite DOIs and lower-cost ids (ARKs, URLs)
– First customer is DataONE member, Dryad
2. Excel “add-in” project with MS Research
– Extend Excel to support data sharing, archiving, and access
– E.g., ability to export to data archive in a standard format with column headings drawn from a shared vocabulary
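The export idea in the Excel add-in item can be sketched roughly as follows. This is not the add-in itself: the `SHARED_VOCABULARY` mapping, its term names, and the `export_rows` function are hypothetical, and CSV stands in for whatever standard format an archive would accept.

```python
import csv
import io

# Hypothetical mapping from local column headings to shared-vocabulary terms.
SHARED_VOCABULARY = {
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "temp_c": "temperatureCelsius",
}


def export_rows(rows, vocabulary=SHARED_VOCABULARY):
    """Return CSV text whose header row uses shared-vocabulary terms."""
    if not rows:
        return ""
    local_headings = list(rows[0].keys())
    # Refuse to export columns that have no agreed term.
    unknown = [h for h in local_headings if h not in vocabulary]
    if unknown:
        raise KeyError(f"no shared term for columns: {unknown}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([vocabulary[h] for h in local_headings])
    for row in rows:
        writer.writerow([row[h] for h in local_headings])
    return buffer.getvalue()


print(export_rows([{"lat": 35.1, "lon": -106.6}]))
```

Rejecting unmapped columns, rather than passing them through, is one possible design choice; it keeps every exported heading interpretable by other users of the vocabulary.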
DataONE/DataCite Example
[Diagram: citation and identifier workflow among the following parties]
• Research scientist
• DataONE Member Node data archive (e.g., Dryad)
• DataONE Coordinating Node metadata catalog (e.g., UNM or UCSB)
• DataCite Member (e.g., CDL)
• (opt) CDL-hosted EZID id minting service (get unique id string)
• EZID resolver and registration service
• DOI resolver and TIB registration
Labeled steps in the diagram:
1. data + metadata
2. metadata + URL + id
3. citation + URL + id
4. save full citation
5. URL plus id
6. full citation
7. full citation
A Repository of Data Underlying Journal Articles
The Goal
• Store all data underlying publications in evolutionary biology, ecology, and related disciplines, at the time of publication.
[Figure: GenBank, TreeBASE, and Dryad, shown with a sample nucleotide sequence (ccaattggct gttcttcgat tctggcgagt)]
Identifiers and Versioning
• Each “data package” receives a DOI, which refers to the most recent version of the file.
• doi:10.5061/dryad.20
• When repository content is modified, a version indicator will be appended to the original DOI.
• doi:10.5061/dryad.20.2
• To specify a particular file within the data package, a slash is used.
• doi:10.5061/dryad.20.2/3
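The identifier layout described above can be sketched as a small parser. The `DryadDoi` type, the `parse_dryad_doi` function, and the assumption that package, version, and file components are all numeric are inferred from the three examples on this slide; this is not Dryad's own code.

```python
import re
from typing import NamedTuple, Optional


class DryadDoi(NamedTuple):
    package: str            # e.g. "20"
    version: Optional[int]  # None means "most recent version"
    file: Optional[int]     # None means the whole data package


# doi:10.5061/dryad.<package>[.<version>][/<file>]
_PATTERN = re.compile(r"doi:10\.5061/dryad\.(\d+)(?:\.(\d+))?(?:/(\d+))?")


def parse_dryad_doi(doi: str) -> DryadDoi:
    match = _PATTERN.fullmatch(doi)
    if match is None:
        raise ValueError(f"not a recognised Dryad DOI: {doi!r}")
    package, version, file = match.groups()
    return DryadDoi(
        package,
        int(version) if version is not None else None,
        int(file) if file is not None else None,
    )


for example in ("doi:10.5061/dryad.20",
                "doi:10.5061/dryad.20.2",
                "doi:10.5061/dryad.20.2/3"):
    print(example, "->", parse_dryad_doi(example))
```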
Identifiers and Versioning
• Metadata and particular formats of the files are not given “true” DOIs. They are reachable by appending a parameter to the DOI.
• doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=dc
• doi:10.5061/dryad.20.2/3.1?urlappend=%3fformat=xls
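As a small illustration of the parameter form shown in these examples, the sketch below appends a format selector to a DOI. The `with_format` helper is hypothetical; the only detail taken from the slide is the `?urlappend=%3fformat=<name>` suffix, where `%3f` is the percent-encoded `?`.

```python
from urllib.parse import unquote


def with_format(doi: str, fmt: str) -> str:
    """Append a format selector (e.g. "dc" or "xls") to a DOI string."""
    # Restrict to simple alphanumeric names so no further encoding is needed.
    if not fmt.isalnum():
        raise ValueError("format name must be alphanumeric")
    return f"{doi}?urlappend=%3fformat={fmt}"


selector = with_format("doi:10.5061/dryad.20.2/3.1", "dc")
print(selector)
# The appended value decodes to an ordinary query string.
print(unquote(selector.split("urlappend=", 1)[1]))
```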
Citation
• When using data from Dryad, please cite the original article.
– Sidlauskas, B. 2007. Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Evolution 61: 299–316.
• Additionally, please cite the Dryad data package. The citation should include the following elements:
– Author(s)
– The date on which the data was deposited
– The name of the data file, if applicable
– The title of the data package, which in Dryad is always "Data from: [Article name]"
– The name "Dryad Digital Repository"
– The data identifier
• For example:
– Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
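The citation elements listed above can be assembled mechanically. The following sketch reproduces the example citation; the `dryad_citation` function and its parameter names are hypothetical, and the position of the optional file name (after the package title) is an assumption, since the slide does not show an example that includes one.

```python
from typing import Optional


def dryad_citation(authors: str, year: int, article_title: str,
                   identifier: str, file_name: Optional[str] = None) -> str:
    """Build a data package citation from the elements listed on the slide."""
    parts = [
        f"{authors} {year}.",
        # In Dryad the package title is always "Data from: [Article name]".
        f"Data from: {article_title}.",
    ]
    if file_name:
        # Assumed placement: the slide only says "if applicable".
        parts.append(f"{file_name}.")
    parts.append("Dryad Digital Repository.")
    parts.append(identifier)
    return " ".join(parts)


print(dryad_citation(
    authors="Sidlauskas, B.",
    year=2007,
    article_title=("Testing for unequal rates of morphological "
                   "diversification in the absence of a detailed phylogeny: "
                   "a case study from characiform fishes"),
    identifier="doi:10.5061/dryad.20",
))
```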
Challenges/Questions
• Dealing with dynamic streaming data?
– How do versions enter into the identifier scheme?
• Resolving to human or machine-interpretable description of object?
• Need for a registry of name spaces?
• Can metadata standards support multiple globally unique identifiers?