1 CS 502: Computing Methods for Digital Libraries Lecture 4 Identifiers and Reference Links.




1

CS 502: Computing Methods for Digital Libraries

Lecture 4

Identifiers and Reference Links


2

Desirable Properties of Identifiers

• Location independent name

• Globally unique

• Persistent across time

• Choice of human-generated or automatically generated names

• Fast resolution

• Decentralized administration

• Supported by standard user interfaces


3

Syntax of Handles

Syntax: <naming_authority>/<locally_unique_string>

or: hdl:<naming_authority>/<locally_unique_string>

Examples:

10.1234/1995.02.12.16.42.21;9 (date-time stamp)

cornell.cs/cstr-94.45 (mnemonic name)

loc/a43v-8940cgr (random string)
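The syntax above can be parsed mechanically. The sketch below is illustrative only, not part of any Handle System library: `Handle` and `parse_handle` are hypothetical names. It accepts the optional `hdl:` prefix and splits on the first `/`, so the text before it is the naming authority and everything after it is the locally unique string.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    naming_authority: str
    local_string: str


def parse_handle(text: str) -> Handle:
    """Split '<naming_authority>/<locally_unique_string>', with optional 'hdl:' prefix."""
    if text.startswith("hdl:"):
        text = text[len("hdl:"):]
    # Split on the first '/' only; any later '/' stays in the local string.
    authority, sep, local = text.partition("/")
    if not sep or not authority or not local:
        raise ValueError(f"not a handle: {text!r}")
    return Handle(authority, local)


print(parse_handle("hdl:cornell.cs/cstr-94.45"))
# Handle(naming_authority='cornell.cs', local_string='cstr-94.45')
```

The same function accepts the date-time stamp and random-string examples, since the local string is opaque to the parser.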


4

Examples of DOIs

10.156 / catalog-96

Publisher ID (assigned by DOI Agency) / Item ID (assigned by Publisher)

10.1048 / 872

10.1532 / PII

10.18698 / SICI


5

Elements of the Handle System

• Handle services:

– global handle service

– local handle services

– caching services

• Clients:

– client libraries

– browser extension

– WWW proxy servers

• Handle administration

• System utilities


6

Hierarchy of Naming Authorities

• loc

– loc.cords

• 10

– 10.1234

• cornell

– cornell.temp

– cornell.cs

    – cornell.cs.d
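The hierarchy above can be read directly from the dotted names. The sketch below assumes, as the slide's names suggest, that each dot-separated prefix of a naming authority is its parent (so `10` encloses `10.1234`). `ancestors` is a hypothetical helper for illustration.

```python
from typing import List


def ancestors(naming_authority: str) -> List[str]:
    """Enclosing naming authorities, outermost first (assumes dot-separated levels)."""
    parts = naming_authority.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


for na in ["cornell.cs.d", "10.1234", "loc"]:
    print(na, "->", ancestors(na))
# cornell.cs.d -> ['cornell', 'cornell.cs']
# 10.1234 -> ['10']
# loc -> []
```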


7

Handle Servers and Handle Service

• The Global Handle Service provides central coordination for all handle services.

• Each naming authority has a home handle service (which may be Global) where its handles are maintained.

• Each handle service may be implemented as several handle servers.

• A hashing algorithm determines the server used to store a given handle.
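The last point can be illustrated with a small sketch. The slides do not specify the hashing algorithm, so this assumes an MD5 digest of the handle reduced modulo the number of servers. The server names are hypothetical. A stable digest from `hashlib` is used rather than Python's built-in `hash()`, because the built-in is randomized per process, while every client must compute the same server for the same handle.

```python
import hashlib
from typing import List


def choose_server(handle: str, servers: List[str]) -> str:
    """Pick the server that stores a handle (illustrative scheme, not the real protocol)."""
    digest = hashlib.md5(handle.encode("utf-8")).digest()
    # Interpret the first 4 bytes as an unsigned integer and reduce modulo N.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["hs1.example.org", "hs2.example.org", "hs3.example.org"]  # hypothetical
print(choose_server("cnri.dlib/wya", servers))
```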


8

Handle Record for a Digital Object

Handle: cnri.dlib/arms-09

Type   Data

Adm    Admin Data

URL    http://www.cnri/xyz

RAP    merlin.dlib.org

NEW    orb:#cornell[]norb
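A handle record is a list of typed values, so a client can ask for just the values of the type it understands (for example, only the URLs). The sketch below models that idea with hypothetical class names and uses only the Adm and URL entries from the record above. It is not the Handle System's actual wire format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HandleValue:
    value_type: str  # e.g. "Adm" or "URL"
    data: str


@dataclass
class HandleRecord:
    handle: str
    values: List[HandleValue] = field(default_factory=list)

    def data_of_type(self, value_type: str) -> List[str]:
        """All data items of one type; a record may hold several, e.g. two URLs."""
        return [v.data for v in self.values if v.value_type == value_type]


record = HandleRecord("cnri.dlib/arms-09", [
    HandleValue("Adm", "Admin Data"),
    HandleValue("URL", "http://www.cnri/xyz"),
])
print(record.data_of_type("URL"))  # ['http://www.cnri/xyz']
```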


9

Address Rules

The Global Handle Service stores:

– a record for each naming authority

– a record for each local handle service

The record for each naming authority includes:

– the home handle service for that naming authority

For each handle, the home handle service stores:

– the handle record


10

Resolving a Handle Without Caches

Handle cnri.dlib/wya in Global G

Client → Global (G): ? cnri.dlib/wya ?

Global (G) → Client: handle data for cnri.dlib/wya


11

Resolving a Handle Without Caches

Handle cnri.dlib/wya in Home Service abc

Client → Global (G): ? cnri.dlib/wya ?

Global (G) → Client: pointer to abc (home handle service for cnri.dlib)

Client → abc: ? cnri.dlib/wya ?

abc → Client: handle data for cnri.dlib/wya
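The two uncached resolution cases follow from the address rules: the Global Handle Service knows each naming authority's home service, and the home service holds the handle record. The sketch below simulates both cases in memory. Dictionary lookups stand in for network requests, and `resolve`, `home_of` and `stores` are hypothetical names. It returns the handle data together with the list of services queried.

```python
from typing import Dict, List, Tuple

GLOBAL = "G"


def resolve(handle: str,
            home_of: Dict[str, str],
            stores: Dict[str, Dict[str, str]]) -> Tuple[str, List[str]]:
    """Resolve a handle without caches; return (handle data, services queried).

    home_of: Global's record of each naming authority's home handle service.
    stores:  for each handle service, its handle records (handle -> data).
    """
    authority = handle.split("/", 1)[0]
    queried = [GLOBAL]              # step 1: client asks Global "? handle ?"
    home = home_of[authority]       # KeyError if the naming authority is unknown
    if home != GLOBAL:
        queried.append(home)        # step 2: Global's pointer sends the client to the home service
    return stores[home][handle], queried


# Slide 10: cnri.dlib is homed at Global.
print(resolve("cnri.dlib/wya", {"cnri.dlib": "G"},
              {"G": {"cnri.dlib/wya": "handle data"}}))
# ('handle data', ['G'])

# Slide 11: cnri.dlib is homed at service abc.
print(resolve("cnri.dlib/wya", {"cnri.dlib": "abc"},
              {"G": {}, "abc": {"cnri.dlib/wya": "handle data"}}))
# ('handle data', ['G', 'abc'])
```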


12

Caching Handle Service

[Diagram: Client → Caching Server → Handle Servers; labels: Hash, Cache, Hash table]
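A caching server answers repeated queries from its own table and forwards only misses to the handle servers. The sketch below shows that pattern with a Python dict, which is a hash table, in front of a hypothetical upstream resolver function. It keeps entries forever. A real cache would also need an expiry policy so that changed handle records eventually reach clients.

```python
from typing import Callable, Dict


class CachingResolver:
    """Answer repeat queries locally; forward cache misses upstream."""

    def __init__(self, resolve_upstream: Callable[[str], str]):
        self._upstream = resolve_upstream
        self._cache: Dict[str, str] = {}  # handle -> handle data
        self.misses = 0

    def resolve(self, handle: str) -> str:
        if handle not in self._cache:
            self.misses += 1
            self._cache[handle] = self._upstream(handle)
        return self._cache[handle]


cache = CachingResolver(lambda h: "data for " + h)  # hypothetical upstream
cache.resolve("cnri.dlib/wya")
cache.resolve("cnri.dlib/wya")
print(cache.misses)  # 1
```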


13

Replication

All data is replicated at several sites for performance and reliability.

Sites shown: Washington, DC; Los Angeles, CA
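One way replication supports reliability is that a client can fall back to another site when one is unreachable. The sketch below tries replicas in order and returns the first answer. The `query` callable is a hypothetical stand-in for a network request. Ordering the list by expected response time would also serve the performance goal.

```python
from typing import Callable, List, Sequence


def resolve_with_failover(handle: str,
                          replicas: Sequence[str],
                          query: Callable[[str, str], str]) -> str:
    """Try each replica site in order; return the first successful answer."""
    errors: List[str] = []
    for site in replicas:
        try:
            return query(site, handle)
        except ConnectionError as exc:
            errors.append(f"{site}: {exc}")
    raise ConnectionError("all replicas failed: " + "; ".join(errors))


def fake_query(site: str, handle: str) -> str:  # hypothetical: Washington is down
    if site == "Washington, DC":
        raise ConnectionError("unreachable")
    return f"{handle} from {site}"


print(resolve_with_failover("cnri.dlib/wya",
                            ["Washington, DC", "Los Angeles, CA"], fake_query))
# cnri.dlib/wya from Los Angeles, CA
```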


14

Applications of Identifiers

The challenges:

• Persistent, unique identifiers

• Eliminate broken links

• Control duplicates

Applications:

• On-line publication

• Registration

• Citation (reference links)

• Collection management

• Archives


15

DOIs and URNs in Action

[Diagram: User, DOI, Handle System, Publisher]


16

Flexibility for Publisher

[Diagram: three DOIs; Warehouse, Database, Repository]

Every publisher can have a different system.


17

Reorganization by Publisher

[Diagram: three DOIs; Database, Repositories]

The publisher can create a new system.


18

Change of Publisher

[Diagram: User, DOI, Handle System; publishers Halfmoon and Millenium]
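The change-of-publisher scenario works because citations carry the DOI, not the location. When the item moves from Halfmoon to Millenium, only the Handle System entry is updated. The sketch below uses a plain dict as a stand-in for the Handle System. The DOI comes from the earlier examples slide, while the URLs and the `change_location` name are hypothetical.

```python
from typing import Dict

# Hypothetical Handle System contents: DOI -> current location of the item.
handle_system: Dict[str, str] = {
    "10.1048/872": "http://halfmoon.example/journals/872",
}


def resolve(doi: str) -> str:
    return handle_system[doi]


def change_location(doi: str, new_url: str) -> None:
    """Repoint an existing DOI; the DOI string itself is never changed."""
    if doi not in handle_system:
        raise KeyError(f"unknown DOI: {doi}")
    handle_system[doi] = new_url


citation_doi = "10.1048/872"     # as printed in a reader's reference list
print(resolve(citation_doi))     # Halfmoon's copy
change_location(citation_doi, "http://millenium.example/titles/872")
print(resolve(citation_doi))     # the same DOI now reaches Millenium
```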


19

Citation

[Diagram: User 1, User 2, DOI, DOI, Handle System, Publisher]


20

Catalogs and Indexes

[Diagram: User, Search System, DOI, Handle System, Publisher]


21

Copyright Registration

[Diagram: User, DOI, Handle System, Copyright Registry, Halfmoon]


22

Multiple Copies

[Diagram: User, DOI, Handle System, Halfmoon USA, Halfmoon Europe]


23

Archives

[Diagram: User, DOI, Handle System, Archive]


24

Reference Linking: The Problem

Generic

Given the information in a standard citation, how does one get to the thing to which the citation refers?

Specific

Given the information in a citation to a journal article, how does a user get from the citation to an appropriate copy of the article?


25

The General Model

[Diagram: Publisher, Reference database, Location database, Content, Client]

Publisher places information in databases.


26

The General Model

Client: Citation → Identifiers (reference database)


27

The General Model

Client: Identifier → URLs (location database)


28

The General Model

Client: URL → Content


29

The General Model

Citation → Identifiers (reference database)

Identifier → URLs (location database)

URL → Content
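The general model is a two-stage lookup: citation to identifiers, then identifier to URLs, from which the client fetches the content. The sketch below chains two hypothetical in-memory tables. Using an exact citation string as the key is a simplifying assumption. A real reference database is searched on metadata.

```python
from typing import Dict, List

# Hypothetical contents the publisher has placed in the two databases.
reference_db: Dict[str, List[str]] = {     # citation -> identifiers
    "Smith 1999, J. Example 12:34": ["10.1048/872"],
}
location_db: Dict[str, List[str]] = {      # identifier -> URLs
    "10.1048/872": ["http://publisher.example/872"],
}


def link_citation(citation: str) -> List[str]:
    """Return candidate URLs for a citation; the client then fetches the content."""
    urls: List[str] = []
    for identifier in reference_db.get(citation, []):   # citation -> identifiers
        urls.extend(location_db.get(identifier, []))    # identifier -> URLs
    return urls


print(link_citation("Smith 1999, J. Example 12:34"))
# ['http://publisher.example/872']
```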


30

Target of Citations

• Work

• Expression

• Manifestation

• Item

IFLA model (FRBR: Functional Requirements for Bibliographic Records)

Citations can refer to any specific creation but for journals usually refer to the work.


31

Identifiers

• Are identifiers necessary?

– Persistence

– Flexible targets

• Examples:

– PubMed ID, BibCode, DOI, etc.


32

How are Identifiers Obtained?

Often the client knows the citation, but not the identifier.

• In the general model identifiers are obtained by searching the reference database.

• In limited domains, identifiers can be calculated from metadata.

• The identifier may be embedded in the citation.
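The BibCode is one example of an identifier calculated from metadata. The sketch below builds a 19-character code from year, journal abbreviation, volume, a one-character qualifier, first page, and the first author's initial. It pads the fields with dots, following the layout as commonly documented by the NASA Astrophysics Data System. The sketch is simplified and rejects inputs it cannot fit, rather than applying ADS's special rules (for example, for long page numbers). The function name and the example metadata are hypothetical.

```python
def bibcode(year: int, journal: str, volume: str, page: str,
            first_author: str, qualifier: str = ".") -> str:
    """Compute a BibCode-style identifier from citation metadata (simplified)."""
    # Assumed layout: YYYY JJJJJ VVVV M PPPP A, with '.' as padding.
    if (not (1000 <= year <= 9999) or len(journal) > 5 or len(volume) > 4
            or len(page) > 4 or len(qualifier) != 1 or not first_author):
        raise ValueError("metadata does not fit this simplified layout")
    return (f"{year}"
            + journal.ljust(5, ".")       # journal abbreviation, left-justified
            + volume.rjust(4, ".")        # volume, right-justified
            + qualifier                   # one-character qualifier; '.' when unused
            + page.rjust(4, ".")          # first page, right-justified
            + first_author[0].upper())    # initial of the first author's surname


print(bibcode(1995, "ApJ", "450", "123", "Smith"))  # 1995ApJ...450..123S
```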


33

Reference Database Lookup

• Static: Reference links are established once for all time.

– Current model in journal publishing

– Not suitable for general user queries

• Dynamic: Reference links are established on demand.

– Provides link based on most recent information

– Success cannot be guaranteed

Quality of metadata in reference database(s) is crucial.


34

Metadata in Reference Database

• Existing schemes

– Considerable agreement on minimal elements

– Considerable differences in details and syntax


35

Minimal Metadata Elements for Journal Article

• Title of journal article

• Creator(s)

• Journal title

• Date of publication

• Enumeration (e.g., volume and issue)

• Location (e.g., page or article number)

• Type (e.g., "journal article")
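Because citations and reference databases differ in details and syntax, a dynamic lookup has to normalize before matching. The sketch below records the minimal elements and matches on a normalized (journal title, year, volume, first page) key. That key choice is an assumption made for illustration. The class, function names and the identifier `10.1234/example` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str, str]


@dataclass(frozen=True)
class ArticleMetadata:
    title: str
    creators: Tuple[str, ...]
    journal_title: str
    date: str          # e.g. "1999-07"; only the year is used for matching
    volume: str        # enumeration
    issue: str
    first_page: str    # location
    item_type: str = "journal article"


def match_key(m: ArticleMetadata) -> Key:
    """Hypothetical key: normalized journal title, year, volume, first page."""
    journal = " ".join(m.journal_title.lower().replace(".", " ").split())
    return (journal, m.date[:4], m.volume.strip(), m.first_page.strip())


def lookup(citation: ArticleMetadata, reference_db: Dict[Key, str]) -> Optional[str]:
    return reference_db.get(match_key(citation))


published = ArticleMetadata("T", ("A",), "D-Lib Magazine", "1999-07", "5", "7", "1")
reference_db = {match_key(published): "10.1234/example"}
cited = ArticleMetadata("T", ("A",), "d-lib  magazine.", "1999", "5", "", "1")
print(lookup(cited, reference_db))  # 10.1234/example
```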


36

Resolution of Identifier

• Choice of resolver (distributed resolution)

– Simple model: identifier determines resolver

• Selection from multiple copies (selective resolution)

– Performance criteria

– Economic and related criteria

– User requirements
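Selective resolution can be sketched as filtering the available copies by user requirements and economic limits, then ranking the remainder by a performance measure. The fields and the "fastest acceptable copy" rule below are hypothetical choices for illustration. The copies reuse the Halfmoon USA and Halfmoon Europe example, with invented URLs and numbers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Copy:
    url: str
    latency_ms: float     # performance criterion
    cost: float           # economic criterion, e.g. a per-access fee
    user_licensed: bool   # user requirement: may this user access the copy?


def select_copy(copies: List[Copy], max_cost: float) -> Optional[Copy]:
    """Among acceptable copies, prefer the fastest; None if none qualify."""
    acceptable = [c for c in copies if c.user_licensed and c.cost <= max_cost]
    return min(acceptable, key=lambda c: c.latency_ms, default=None)


copies = [
    Copy("http://us.halfmoon.example/a", latency_ms=120, cost=0, user_licensed=True),
    Copy("http://eu.halfmoon.example/a", latency_ms=40, cost=0, user_licensed=True),
]
print(select_copy(copies, max_cost=0).url)  # http://eu.halfmoon.example/a
```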


37

Interoperability

Several reference linking services under development:

PubMed

Astrophysics Data Center

DOI reference service

Los Alamos National Laboratory internal reference service

What levels of agreement and tools are needed for cross-linking?