1 CS 502: Computing Methods for Digital Libraries Lecture 4 Identifiers and Reference Links.

Post on 22-Dec-2015


1

CS 502: Computing Methods for Digital Libraries

Lecture 4

Identifiers and Reference Links

2

Desirable Properties of Identifiers

• Location independent name

• Globally unique

• Persistent across time

• Choice of human generated or automatic generation

• Fast resolution

• Decentralized administration

• Supported from standard user interfaces

3

Syntax of Handles

Syntax

<naming_authority>/<locally_unique_string>

or

hdl:<naming_authority>/<locally_unique_string>

Examples

10.1234/1995.02.12.16.42.21;9 (date-time stamp)

cornell.cs/cstr-94.45 (mnemonic name)

loc/a43v-8940cgr (random string)
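The handle syntax can be sketched as a small parser. This is a minimal illustration, not part of any Handle System client library: the function name `parse_handle` is hypothetical, and splitting on the first slash only is an assumption about how the locally unique string is delimited.

```python
def parse_handle(text):
    """Split a handle into (naming_authority, locally_unique_string)."""
    # The "hdl:" prefix is optional in the syntax shown on the slide.
    if text.lower().startswith("hdl:"):
        text = text[4:]
    # Assumption: only the first "/" separates the two parts.
    authority, sep, local = text.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not a handle: {text!r}")
    return authority, local


for example in ("10.1234/1995.02.12.16.42.21;9",
                "hdl:cornell.cs/cstr-94.45",
                "loc/a43v-8940cgr"):
    print(parse_handle(example))
```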

4

Examples of DOIs

10.156 / catalog-96

Publisher ID (before the slash): assigned by DOI Agency

Item ID (after the slash): assigned by Publisher

10.1048 / 872

10.1532 / PII

10.18698 / SICI

5

Elements of the Handle System

• Handle services:

global handle service

local handle services

caching services

• Clients:

client libraries

browser extension

WWW proxy servers

• Handle administration

• System utilities

6

Hierarchy of Naming Authorities

loc

    loc.cords

10

    10.1234

cornell

    cornell.temp

    cornell.cs

        cornell.cs.d
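The dotted names suggest that each naming authority sits beneath the authority named by its prefix. A minimal sketch of walking that hierarchy, assuming the dot is the only level separator (the function name `authority_chain` is hypothetical):

```python
def authority_chain(naming_authority):
    """Return the naming authority and all of its ancestors, top level first."""
    parts = naming_authority.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(authority_chain("cornell.cs.d"))
print(authority_chain("10.1234"))
```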

7

Handle Servers and Handle Service

• The Global Handle Service provides central coordination for all handle services.

• Each naming authority has a home handle service (which may be Global) where its handles are maintained.

• Each handle service may be implemented as several handle servers.

• A hashing algorithm determines the server used to store a given handle.
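The idea of hashing a handle to pick a server can be sketched as follows. The slide does not say which hashing algorithm is used, so MD5 and the modulo step are assumptions chosen only for illustration; the server names are hypothetical.

```python
import hashlib


def server_for(handle, servers):
    """Pick the server that stores a handle by hashing the handle string."""
    # Assumption: MD5 stands in for the unspecified hashing algorithm.
    digest = hashlib.md5(handle.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["server-0", "server-1", "server-2"]  # hypothetical server names
print(server_for("cnri.dlib/wya", servers))
```

Because the choice depends only on the handle, every client that knows the list of servers computes the same answer without consulting a directory.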

8

Handle Record for a Digital Object

Handle: cnri.dlib/arms-09

Adm: Admin Data

URL: http://www.cnri/xyz

RAP: merlin.dlib.orgNEWorb:#cornell[]norb

9

Address Rules

The Global Handle Service stores:

a record for each naming authority

a record for each local handle service

The record for each naming authority includes:

the home handle service for that naming authority

For each handle, the home handle service stores:

the handle record

10

Resolving a Handle Without Caches

Handle cnri.dlib/wya in Global G

Client to Global (G): ? cnri.dlib/wya ?

Global (G) to Client: handle data for cnri.dlib/wya

11

Resolving a Handle Without Caches

Handle cnri.dlib/wya in Home Service abc

Client to Global (G): ? cnri.dlib/wya ?

Global (G) to Client: pointer to abc

Client to abc (Home HS for cnri.dlib): ? cnri.dlib/wya ?

abc to Client: handle data for cnri.dlib/wya
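The two resolution cases can be sketched with in-memory tables. This is a simplified model, not the Handle System protocol: the dictionaries, the `resolve` function, the handle 10.1234/example, and the example.org URLs are all hypothetical stand-ins.

```python
GLOBAL = "G"

# Global service: one record per naming authority, naming its home service.
naming_authorities = {"cnri.dlib": "abc", "10.1234": GLOBAL}

# Each handle service stores the handle records for which it is the home.
handle_services = {
    GLOBAL: {"10.1234/example": {"URL": "http://example.org/item"}},
    "abc": {"cnri.dlib/wya": {"URL": "http://example.org/wya"}},
}


def resolve(handle):
    """Return (handle record, list of steps taken) without any caching."""
    authority = handle.split("/", 1)[0]
    steps = ["ask Global"]
    home = naming_authorities[authority]
    if home != GLOBAL:
        # Global answers with a pointer; the client then asks the home service.
        steps += [f"pointer to {home}", f"ask {home}"]
    return handle_services[home][handle], steps


print(resolve("10.1234/example"))
print(resolve("cnri.dlib/wya"))
```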

12

Caching Handle Service

Client Caching Server Handle Servers

Hash

Cache

Hash table
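A caching server can be pictured as a hash table placed in front of the handle servers. The sketch below is an assumption about the general pattern only: the class name is hypothetical, and cache expiry, which a real service would need, is left out.

```python
class CachingResolver:
    """Answer from a local hash table; fall back to the handle servers on a miss."""

    def __init__(self, resolve_upstream):
        self._resolve_upstream = resolve_upstream  # function: handle -> record
        self._cache = {}
        self.upstream_calls = 0

    def resolve(self, handle):
        if handle not in self._cache:
            self.upstream_calls += 1
            self._cache[handle] = self._resolve_upstream(handle)
        return self._cache[handle]


cache = CachingResolver(lambda handle: {"handle": handle})
cache.resolve("cnri.dlib/wya")
cache.resolve("cnri.dlib/wya")
print(cache.upstream_calls)
```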

13

Replication

All data is replicated at several sites for performance and reliability

Washington, DC

Los Angeles, CA

14

Applications of Identifiers

The challenges:

• Persistent, unique identifiers

• Eliminate broken links

• Control duplicates

Applications:

• On-line publication

• Registration

• Citation (reference links)

• Collection management

• Archives

15

DOIs and URNs in Action

User

Handle System

Publisher

DOI

16

Flexibility for Publisher

Warehouse

Database

Repository

Every publisher can have a different system.

DOI

DOI

DOI

17

Reorganization by Publisher

Database

Repositories

The publisher can create a new system.

DOI

DOI

DOI

18

Change of Publisher

Halfmoon

Millenium

User

DOI

Handle System
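The point of the publisher slides is that the DOI held by the user never changes; only the record behind it does. A minimal sketch, using a hypothetical DOI and hypothetical `.example` URLs for the two publishers named on the slide:

```python
# Hypothetical handle record: the DOI maps to the item's current location.
handle_system = {"10.1234/article-1": "http://halfmoon.example/article-1"}


def resolve(doi):
    return handle_system[doi]


citation = "10.1234/article-1"  # what the user has stored
before = resolve(citation)

# The item moves to another publisher: only the handle record is updated.
handle_system[citation] = "http://millenium.example/article-1"
after = resolve(citation)

print(before)
print(after)
```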

19

Citation

Handle System

Publisher

User 1

DOI

User 2

DOI

20

Catalogs and Indexes

User

Handle System

Publisher

Search System

DOI

21

Copyright Registration

Copyright Registry

User

Handle System

Halfmoon

DOI

22

Multiple Copies

Halfmoon Europe

User

DOI

Handle System

Halfmoon USA

23

Archives

User

Handle System

Archive

DOI

24

Reference Linking: The Problem

Generic

Given the information in a standard citation, how does one get to the thing to which the citation refers?

Specific

Given the information in a citation to a journal article, how does a user get from the citation to an appropriate copy of the article?

25

The General Model

Reference database

Location database

Content

Publisher

Client

Publisher places information in databases

26

The General Model

Reference database

Location database

Content

Publisher

Client

Client to Reference database: Citation

Reference database to Client: Identifiers

27

The General Model

Reference database

Location database

Content

Publisher

Client

Client to Location database: Identifier

Location database to Client: URLs

28

The General Model

Reference database

Location database

Content

Publisher

Client

Client to Content: URL

Content to Client: Content

29

The General Model

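Reference database

Location database

Content

Publisher

Client

Client to Reference database: Citation

Reference database to Client: Identifiers

Client to Location database: Identifier

Location database to Client: URLs

Client to Content: URL

Content to Client: Content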
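The three lookups in the general model can be sketched end to end. The databases here are plain dictionaries with invented entries (a hypothetical journal, identifier, and URL); the sketch only shows the order of the steps, citation to identifiers to URLs to content.

```python
# Hypothetical data that a publisher would have placed in the databases.
reference_db = {("J. Example Stud.", 12, 34): ["id-001"]}
location_db = {"id-001": ["http://publisher.example/id-001"]}
content_store = {"http://publisher.example/id-001": "full text of the article"}


def follow_citation(journal, volume, page):
    """Citation -> identifiers -> URLs -> content; None if any step fails."""
    identifiers = reference_db.get((journal, volume, page), [])
    for identifier in identifiers:
        for url in location_db.get(identifier, []):
            if url in content_store:
                return content_store[url]
    return None


print(follow_citation("J. Example Stud.", 12, 34))
```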

30

Target of Citations

• Work

• Expression

• Manifestation

• Item

IFLA model

Citations can refer to any specific creation but for journals usually refer to the work.

31

Identifiers

• Are identifiers necessary?

– Persistence

– Flexible targets

• Examples:

– PubMed ID, BibCode, DOI, etc.

32

How are Identifiers Obtained?

Often the client knows the citation, but not the identifier.

• In the general model identifiers are obtained by searching the reference database.

• In limited domains, identifiers can be calculated from metadata.

• The identifier may be embedded in the citation.
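Calculating an identifier from metadata can be sketched with a fixed-width key built from the citation fields. The layout below (year, journal abbreviation, volume, page, author initial, padded with dots) is loosely modeled on codes used in astronomy, but the exact field widths and the function name `make_key` are assumptions for illustration, not a specification.

```python
def make_key(year, journal, volume, page, first_author):
    """Build a fixed-width key from citation metadata (illustrative layout)."""
    journal_field = journal[:5].ljust(5, ".")      # left-aligned, dot padded
    volume_field = str(volume)[-4:].rjust(4, ".")  # right-aligned, dot padded
    page_field = str(page)[-4:].rjust(4, ".")
    initial = first_author[0].upper()
    return f"{year:04d}{journal_field}{volume_field}.{page_field}{initial}"


print(make_key(1992, "ApJ", 400, 1, "White"))
```

Because the key depends only on fields that appear in the citation, a client can compute it without querying a reference database.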

33

Reference Database Lookup

• Static: Reference links are established once for all time.

– Current model in journal publishing

– Not suitable for general user queries

• Dynamic: Reference links are established on demand.

– Provides link based on most recent information

– Success cannot be guaranteed

Quality of metadata in reference database(s) is crucial.

34

Metadata in Reference Database

• Existing schemes

– Considerable agreement on minimal elements

– Considerable differences in details and syntax

35

Minimal Metadata Elements for Journal Article

• Title of journal article

• Creator(s)

• Journal title

• Date of publication

• Enumeration (e.g., volume and issue)

• Location (e.g., page or article number)

• Type (e.g., "journal article")
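The minimal elements can be written down as a record type. The field names are hypothetical choices mapping one-to-one to the bullets above, and the sample values are invented; existing schemes differ in details and syntax.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class JournalArticleRecord:
    article_title: str
    creators: Tuple[str, ...]
    journal_title: str
    publication_date: str
    enumeration: str   # e.g., volume and issue
    location: str      # e.g., page or article number
    type: str = "journal article"


record = JournalArticleRecord(
    article_title="An Example Article",
    creators=("A. Author",),
    journal_title="Journal of Examples",
    publication_date="1999",
    enumeration="vol. 12, no. 3",
    location="p. 34",
)
print(record)
```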

36

Resolution of Identifier

• Choice of resolver (distributed resolution)

– Simple model: identifier determines resolver

• Selection from multiple copies (selective resolution)

– Performance criteria

– Economic and related criteria

– User requirements
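Selective resolution can be sketched as filtering and ranking the available copies. The latency and cost figures are invented, and choosing the lowest latency within a cost limit is just one possible reading of the criteria listed above.

```python
# Hypothetical copies of one item; the numbers are made up for illustration.
copies = [
    {"site": "Halfmoon Europe", "latency_ms": 40, "cost": 0.0},
    {"site": "Halfmoon USA", "latency_ms": 120, "cost": 0.0},
    {"site": "Archive", "latency_ms": 20, "cost": 5.0},
]


def select_copy(copies, max_cost=None):
    """Apply the economic criterion first, then pick the fastest copy."""
    candidates = [c for c in copies
                  if max_cost is None or c["cost"] <= max_cost]
    if not candidates:
        return None
    return min(candidates, key=lambda c: c["latency_ms"])


print(select_copy(copies, max_cost=0.0)["site"])
print(select_copy(copies)["site"])
```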

37

Interoperability

Several reference linking services under development:

PubMed

Astrophysics Data Center

DOI reference service

Los Alamos National Laboratory internal reference service

What levels of agreement and tools are needed for cross-linking?