Persistent Identifier Services and their Metadata by John Kunze

John Kunze, California Digital Library

Presenter
Presentation Notes
Persistent identifiers (Pids) provide machine-actionable links to data and metadata that are vital to APIs (application programming interfaces) for publishing and citation. APIs are essentially request/response patterns that use Pids to reference things and metadata to describe not only the things themselves, but also any actions requested or taken. As a result, metadata design and standardization are wedded to API design and enhancement. With Pids as nouns and metadata as adjectives and qualifiers, Pid services play a key role in API implementation. [ 30 mins ]

Decoding the title

persistent identifier services and their metadata
|| || ||
things, actions, and descriptors
|| || ||
nouns, verbs, and adjectives
||
preserving and serving scholarly communication around data

(Context: scholarly research data)

Obama

An identifier is not a string of characters

An identifier is an association between a string and a thing.
An association is an opinion asserted by an authority.
Example 1: http://allrecipes.com/recipe/sauteed-fiddleheads
Example 2: 4CF3-57AB-2481-651D-D53D-Q

http://dx.doi.org/10.5072/4CF3-57AB-2481-651D-D53D-Q
http://dx.doi.org/10.5240/4CF3-57AB-2481-651D-D53D-Q
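
The two URLs above carry the same character string under different DOI prefixes, which is one way to see that the string alone is not the identifier. A minimal Python sketch of that idea follows. The `Association` type and `parse_doi_url` helper are hypothetical names introduced for illustration, and treating the DOI prefix as a stand-in for the asserting authority is an assumption, not something the slides state.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class Association(NamedTuple):
    """Hypothetical model: an identifier as (authority, string), not a bare string."""
    authority: str  # assumption: the DOI prefix stands in for whoever asserts the link
    string: str     # the opaque character string


def parse_doi_url(url: str) -> Association:
    """Split a DOI URL path of the form /<prefix>/<suffix> into its two parts."""
    path = urlparse(url).path.lstrip("/")
    prefix, _, suffix = path.partition("/")
    return Association(authority=prefix, string=suffix)


a = parse_doi_url("http://dx.doi.org/10.5072/4CF3-57AB-2481-651D-D53D-Q")
b = parse_doi_url("http://dx.doi.org/10.5240/4CF3-57AB-2481-651D-D53D-Q")

same_string = a.string == b.string  # the character strings match
same_identifier = a == b            # the associations do not
```

Comparing only `string` would conflate the two; comparing the whole pair keeps them apart, which is the distinction the slide draws.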

Identifier schemes (v1)

• URL (Uniform Resource Locator)
  • the first time poor id management is blamed on syntax
• URN (Uniform Resource Name)
  • first attempt to correct poor id management with syntax
• Handle
  • second attempt to correct poor id management with syntax
• DOI (Digital Object Identifier)
  • third attempt to correct poor id management with syntax
• ARK (Archival Resource Key)
  • attempt to let id management be queryable (not yet realized)

Identifier schemes (v2)

• URL (Uniform Resource Locator)
  • world’s first actionable id, now underlying all other types
• URN (Uniform Resource Name)
  • open infrastructure, not fully realized globally
• Handle
  • closed infrastructure, fully realized globally
• DOI (Digital Object Identifier)
  • CrossRef enforces good id management, DataCite learning
• ARK (Archival Resource Key)
  • open infrastructure, realized locally and globally
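
The point that URLs now underlie all other identifier types can be illustrated by turning a scheme-prefixed identifier into an actionable URL. This is a rough Python sketch, not a complete resolver client. The `RESOLVERS` table and `to_url` function are hypothetical names, the resolver hosts are the commonly used public ones (assumed here, not taken from the slides), and the ARK value is made up for the example.

```python
# Assumed public resolver base URLs, one per scheme prefix.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/ark:/",
}


def to_url(identifier: str) -> str:
    """Return an actionable URL for a 'scheme:value' identifier."""
    scheme, sep, rest = identifier.partition(":")
    scheme = scheme.lower()
    if scheme in ("http", "https"):
        return identifier  # already a URL, nothing to do
    base = RESOLVERS.get(scheme)
    if not sep or base is None:
        raise ValueError(f"no known resolver for {identifier!r}")
    # Accept both 'ark:/12345/x' and 'ark:12345/x' style input.
    return base + rest.lstrip("/")


doi_url = to_url("doi:10.5072/4CF3-57AB-2481-651D-D53D-Q")
ark_url = to_url("ark:/12345/fk1234")  # hypothetical ARK for illustration
```

Whatever the scheme, the actionable form ends up being a URL served by some resolver, so persistence depends on how that resolver's records are managed rather than on the syntax.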

If DOIs won, why talk about non-DOIs?

• Cost
• Open access
• Changing nature of the DOI
• Flexibility

Types of identifier services

• Repository – parking the bits
• Data-aware dissemination
  • more than just returning parked bits
• Citation management for end user researchers
• Research tracking – measuring use and impact
• Identifier creation, management, and resolution

Many service tools, many APIs

Repository Tools
• arXiv *
• Dataverse *
• Fedora/Hydra
• DSpace *
• EPrints
• DataONE
• Merritt/Stash
• figshare
• Zenodo

Citation Management
• Mendeley
• Zotero

Metrics and Tracking
• Altmetric
• Impactstory
• Thomson Reuters Data Citation Index
• Elsevier Scopus

API concepts

Application Programming Interface (API)
• how software talks to a service
• unlike a Graphical User Interface (GUI)
• more like a Command Line Interface (CLI)

APIs and CLIs use language constructs
• Verbs, nouns, and qualifiers are “words”, and
• words form commands/requests/responses,
• which form scripts and programs.

APIs are metadata sentences

A command line interface powering an API interaction

$ sort mydata > sorted_data
$ grep Smith sorted_data
Smith, Sally 2014-04-01 406B
Wong, Frank 2013-11-28 334
$ wget --user=sam --no-check-certificate \
    "https://n2t.net/a/ezid/b?set cost 25.50"
status: ok
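
The last command above can be read as a small sentence: the URL path names a thing, and the query string says what to do to it. The Python sketch below builds and takes apart such a request without sending it. The `build_request` and `parse_request` helpers are hypothetical, and the reading of `set cost 25.50` as verb, element name, and value is an interpretation of the slide, not a documented EZID grammar.

```python
from urllib.parse import quote, unquote, urlsplit


def build_request(thing_url: str, verb: str, element: str, value: str) -> str:
    """Compose 'verb element value' and attach it to the thing's URL as a query."""
    sentence = f"{verb} {element} {value}"
    return thing_url + "?" + quote(sentence)  # spaces become %20


def parse_request(url: str) -> dict:
    """Recover the noun (thing URL) and the verb/element/value words."""
    parts = urlsplit(url)
    verb, element, value = unquote(parts.query).split(" ", 2)
    noun = f"{parts.scheme}://{parts.netloc}{parts.path}"
    return {"noun": noun, "verb": verb, "element": element, "value": value}


url = build_request("https://n2t.net/a/ezid/b", "set", "cost", "25.50")
words = parse_request(url)
```

Here the identifier plays the noun, `set` the verb, and `cost 25.50` the descriptor, matching the nouns, verbs, and adjectives framing used earlier in the talk.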

Problem: traditional standardization

• Change by committee is ugly, costly, and slow
• Example: Dublin Core, 15 cross-domain terms

European Parliament Technology - DG ITEC @ flickr

Presenter
Presentation Notes
Traditional metadata standards are controlled by panels of experts, e.g., FGDC, EML, Darwin Core.
Change by committee is ugly, costly, and slow.
Example: perhaps the most widely used cross-domain vocabulary is Dublin Core, with 15 cross-domain terms.
Agreed on in 5 years, with lots of local divergence.
“I love the 15, but my domain needs these 2 terms. How do we add them?” A: Make your own ontology!
Multiply by 200 domains and the result is 200 ontologies, 200 panels, 200 islands of non-interoperation.

The Metadata Universe

Jenn Riley, IU

An alternate metadata universe

• Vision: one dictionary, one namespace
• All research domains, any part of “metadata speech”
  • Names, values, units, relationships, ...
• Search for terms, comment on terms, add terms, edit your terms, API for automated access
• All terms with globally unique persistent identifiers
• Available at yamz.net (yet another metadata zoo)

YAMZ.net dictionary sociology

• Crowd-sourced evolving vernacular terms, stable canonical terms, and deprecated terms

• Use evolving terms depending on your risk tolerance

• Reputation-based (gaming-resistant) voting means strong terms rise, weak terms decline

Applying lessons learned from Wikipedia, the Internet-Draft/RFC process, and StackOverflow
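
To make the idea of reputation-based voting concrete, here is a small Python sketch in which each vote counts in proportion to the voter's reputation. It is purely illustrative: the `term_score` function, the user names, and the reputation numbers are all invented for this example, and nothing here is claimed to be the scoring rule YAMZ actually uses.

```python
def term_score(votes: dict, reputation: dict) -> float:
    """Reputation-weighted vote score in [-1, 1].

    votes:      user -> +1 (up) or -1 (down)
    reputation: user -> non-negative weight; unknown users count as 0
    """
    total = sum(reputation.get(user, 0.0) for user in votes)
    if total == 0:
        return 0.0  # no voter carries any weight yet
    weighted = sum(vote * reputation.get(user, 0.0) for user, vote in votes.items())
    return weighted / total


# Hypothetical data: one established contributor, two new accounts.
reputation = {"alice": 10.0, "bob": 1.0, "carol": 1.0}
votes = {"alice": +1, "bob": -1, "carol": -1}

raw_tally = sum(votes.values())        # simple head count
score = term_score(votes, reputation)  # weighted by reputation
```

With a plain head count the term is voted down two to one, while the weighted score stays positive because the single up-vote comes from the high-reputation account. That is the sense in which weighting makes a vote harder to game with throwaway accounts.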

Presenter
Presentation Notes
Something between crowd-sourcing and an exclusive club.
Learn from Wikipedia, Internet RFCs, StackOverflow, and the American Heritage Dictionary.

Summary

• Identifiers are not strings, but associations that break when things are not managed well

• People can forget names because we can google, but APIs need persistent names for automation at scale

• APIs are languages using metadata as “words”

• Future API building will focus on vocabulary building
  • For example, yamz.net

Thank you! [email protected]