Digital Library Architecture: A Service-Based Approach

Sandra Payette
Department of Computer Science
Cornell University
payette@cs.cornell.edu

Mo i Rana, Norway
November 10, 1998

http://www2.cs.cornell.edu/payette/presentations/DL-architecture.ppt


Overview

• Why talk about DL architecture?

• Digital Libraries - the architectural perspective

• Review of service-based architecture

• NCSTRL - a working example

• Dienst - existing service-oriented architecture

• Cornell next generation (component-oriented)

• Conclusion


Why Talk about Digital Library Architecture?

• Web alone is not a digital library

• Commercial packages limited
– limited flexibility
– standards issues
– network-enabled applications, not DL architecture

• Must position for broader DL opportunities


Web by itself not a DL Architecture

• Documents - Files, CGI, MIME-Types

• Naming - URLs

• Document Servers - HTTP servers

• Resource Discovery - web crawlers

• Collections - web pages, ad-hoc

• Intellectual property (IP) - access control lists, passwords, ad-hoc


WWW Infrastructure Evolving

• Resource Description Framework (RDF)
– will allow rich metadata semantics for documents
– http://www.w3.org/RDF/

• Extensible Markup Language (XML)
– will allow highly structured documents and rich linking (relationship) capabilities
– http://www.w3.org/XML/

• Uniform Resource Names (URNs)
– will allow for persistent, globally unique identifiers


But still need Digital Library Architecture

• Richer document model - digital objects

• Persistent, unique naming - URNs

• Well-defined digital library services

• Better facilities for resource discovery

• Flexible definition of collections

• Management of distributed content & services

• Rights management for intellectual property


Digital Library Interoperability

[Figure: Nordic Digital Library and Cornell Digital Library]


Digital Library Architecture: Key Principles

• Open Architecture
– functionality partitioned into a set of well-defined services
– services accessible via well-defined protocol

• Modularization
– promotes interoperability
– scalable to different clientele (research library, informal web)

• Federation
– enable aggregations into logical collections

• Distribution
– of content (collections) and services
– of administration and management of DL


Component-Ware Digital Libraries

[Figure: components labeled Repository Services, Collection Services, Index Services, Name Service (persistent names), User Interface Gateway, and Digital Objects]


NCSTRL: A Working Example

120+ Institutions in US, Europe, and Asia

A Globally Distributed Digital Library


NCSTRL Participants: collections federated

• 120+ institutions
– Universities/labs - research reports
– European Research Consortium for Informatics and Mathematics (ERCIM)
– Los Alamos (Physics pre-prints, ACM)
– D-Lib Magazine

• 40+ independent servers


Federation of Collections


Documents in Distributed Repositories


Multi-Format Document Model


NCSTRL: Real-world testbed for ...

• modular system based on a standard open architecture

• study of hard, real-world problems: policy issues, quality of service, federation of publishers

• creation of a self-sustaining international federated digital collection


Dienst: NCSTRL technical base

• Implements a service-based architecture for distributed digital libraries

• Protocol and reference implementation

• Network of services

• WWW browser access

• Uniform search over distributed indexes

• Access to documents in distributed repositories

• Access to multi-formatted documents


Dienst: Service-Based Architecture

• Document model

• Naming service (CNRI’s Handle System)

• Repository service

• Indexer service

• Collection service

• User Interface service


Dienst Document Model

[Figure labels: Handle (URN); representations; decompositions; physical; logical; underlying formats: ASCII, TIFF, PostScript, metadata]


Dienst: Document Protocol

• Documents addressable through their URNs

• Document service requests
– get document metadata
– get document formats
– get document in format
– get document partition (page) in format
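The four request types above can be pictured as methods on a small in-memory service. This is only a sketch to make the list concrete: the class names, method names, and the sample URN are assumptions made for illustration and are not the actual Dienst protocol messages.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical in-memory document: metadata plus per-format page lists."""
    urn: str
    metadata: dict
    # Maps a format name (e.g. "ascii") to the list of pages in that format.
    formats: dict = field(default_factory=dict)


class DocumentService:
    """Toy repository answering the four request types from the slide."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.urn] = doc

    def get_metadata(self, urn):
        return self._docs[urn].metadata

    def get_formats(self, urn):
        return sorted(self._docs[urn].formats)

    def get_document(self, urn, fmt):
        # The whole document in one format: all pages joined together.
        return "\n".join(self._docs[urn].formats[fmt])

    def get_page(self, urn, fmt, page):
        # Pages are numbered from 1.
        return self._docs[urn].formats[fmt][page - 1]


service = DocumentService()
service.add(Document(
    urn="ncstrl.example/TR-0001",  # made-up identifier, not a real report
    metadata={"title": "An Example Report", "author": "A. Author"},
    formats={"ascii": ["page one text", "page two text"]},
))
```

Every request takes the document's URN as its first argument, which mirrors the point that documents are addressed by name rather than by location.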


Dienst 5.0 : Document Protocol

• More complex document model:
– versions
– hierarchical part specification
– binders (multi-part documents)

• “Structure” service request
– Reveal, in XML, full or collapsed structure of a document (e.g., chapters, sections, figures, etc.)
– Describe multiple views of a document (e.g., bibliography, content, thumbnails)
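To illustrate what a full versus collapsed structure might look like to a client, the sketch below parses a small XML description with Python's standard library. The XML vocabulary here (element names, the `name` and `title` attributes) is invented for the example; the slides do not show the actual Dienst 5.0 response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure response; element and attribute names are invented.
STRUCTURE_XML = """
<document urn="ncstrl.example/TR-0001">
  <view name="content">
    <chapter title="Introduction">
      <section title="Motivation"/>
      <section title="Scope"/>
    </chapter>
    <chapter title="Design">
      <section title="Services"/>
      <figure title="Architecture diagram"/>
    </chapter>
  </view>
  <view name="bibliography"/>
</document>
"""


def outline(element, depth=0, max_depth=None):
    """Return indented outline lines; max_depth yields a collapsed structure."""
    lines = []
    for child in element:
        lines.append("  " * depth + f"{child.tag}: {child.get('title')}")
        if max_depth is None or depth + 1 < max_depth:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


root = ET.fromstring(STRUCTURE_XML)
views = [v.get("name") for v in root.findall("view")]
content = root.find("view[@name='content']")
full = outline(content)                    # chapters with their sections and figures
collapsed = outline(content, max_depth=1)  # chapters only
```

A client could request the collapsed outline first and expand individual chapters on demand, which is one plausible reason for offering both forms.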


Dienst: Core Services

[Figure: a WWW browser, the Dienst User Interface, three Index services, and three Repository services. Message labels: send search request / receive unified hit list; send site-specific search request / receive hit list; send document request / receive MIME-typed document]
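The search flow in this diagram (one search request fanned out as site-specific requests, with the hit lists combined into a unified list) can be sketched as follows. The class names, the substring matching, and the site names are assumptions for illustration only; they do not reproduce Dienst's actual messages or ranking.

```python
class IndexService:
    """Toy index for one site: maps document names to searchable text."""

    def __init__(self, site, entries):
        self.site = site
        self.entries = entries

    def search(self, term):
        # Site-specific search: return (site, name) hits for matching entries.
        term = term.lower()
        return [(self.site, name)
                for name, text in sorted(self.entries.items())
                if term in text.lower()]


class UserInterfaceService:
    """Fans a query out to every index and merges the hit lists."""

    def __init__(self, indexes):
        self.indexes = indexes

    def search(self, term):
        unified = []
        for index in self.indexes:
            unified.extend(index.search(term))
        # Drop duplicates while keeping first-seen order.
        return list(dict.fromkeys(unified))


ui = UserInterfaceService([
    IndexService("site-a", {"a/1": "Digital library services",
                            "a/2": "Operating systems"}),
    IndexService("site-b", {"b/1": "Library federation protocols"}),
])
hits = ui.search("library")
```

The browser only ever talks to the user interface service; the indexes stay independent and can be added or removed without changing the client.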


Dienst Protocol: Building Gateways to Non-Conforming Sites

[Figure labels: FTP/HTTP “Repositories”; Standard Servers; User Interface; Gateway Server]


Dienst: Collection Service


Naming Service

• Documents identified by globally unique names

• Names are persistent, permanent

• Registered names resolve to specific location (URL)

[Figure: the persistent identifier (e.g., URN) cnri.dlib/april97-payette, made up of a naming authority (cnri.dlib) and an item name (april97-payette), resolves to the location (URL) http://www.somewebserver.org/somedirectory/somefile]
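The resolution step on this slide can be sketched as a lookup table from persistent names to current locations. This is a toy in-memory registry written for illustration; it is not the Handle System's API, and the second URL below is made up to show a relocation.

```python
class NameService:
    """Toy resolver: persistent name -> current location (URL)."""

    def __init__(self):
        self._locations = {}

    @staticmethod
    def split(name):
        # A name is "<naming authority>/<item name>"; split on the first slash.
        authority, _, item = name.partition("/")
        if not authority or not item:
            raise ValueError(f"malformed name: {name!r}")
        return authority, item

    def register(self, name, url):
        self.split(name)  # validate before storing
        self._locations[name] = url

    def resolve(self, name):
        return self._locations[name]


names = NameService()
names.register("cnri.dlib/april97-payette",
               "http://www.somewebserver.org/somedirectory/somefile")
# Moving the file changes only the registered location, not the name.
names.register("cnri.dlib/april97-payette",
               "http://www.otherserver.example/newdirectory/somefile")
```

Citations that use the name keep working after the move, because only the registry entry changed.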


Identifiers: Current Initiatives

• IETF Uniform Resource Names (URN)
– specification of URN framework
– requirements for resolution systems
– syntax definition

• Existing Systems
– CNRI’s Handle System (used by NCSTRL)
– OCLC PURLs
– DOI Initiative


Looking Ahead: Current Research at Cornell

• Digital Objects and Repository
– FEDORA
– Joint work in Interoperability with CNRI
– Access Management

• Resource Discovery
– STARTS (Cornell/Stanford collaboration)
– Intelligent Distributed Searching

• Collection Definition


Digital Object is...

recognizable by what it can do

getChapter, getPage

getTrack, getLabel

getSection, getArticle

getFrame, getLength


What the client sees vs. what the object is

[Figure labels: Structure; Mechanism; Content-Type Interfaces; Book; MARC]


[Figure: a FEDORA Digital Object with data streams DS1 (application/MARC) and DS2 (application/postscript); a Generic Disseminator offering ListContentTypes (Book, DublinCore); a Book Disseminator offering GetChapter, GetIndex, GetPage; a DublinCore Disseminator; example request Get(Book.getPage(1))]
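One way to read this diagram is that a digital object holds raw data streams and exposes them only through content-type disseminators. The sketch below follows that reading. It is not the FEDORA implementation: the class layout, the `attach` method, and the `getRecord` operation are assumptions made for illustration, while `getPage`, the content-type names, and the data stream MIME types come from the slide.

```python
class BookDisseminator:
    """Presents a list of page texts through book-like operations."""

    def __init__(self, pages):
        self.pages = pages

    def getPage(self, number):
        # Page numbers start at 1, matching Get(Book.getPage(1)).
        return self.pages[number - 1]


class DublinCoreDisseminator:
    """Presents descriptive metadata; getRecord is a made-up operation."""

    def __init__(self, record):
        self.record = record

    def getRecord(self):
        return dict(self.record)


class DigitalObject:
    """Toy object: typed data streams plus disseminators keyed by content type."""

    def __init__(self, datastreams):
        # Data stream id -> (MIME type, raw content).
        self.datastreams = datastreams
        self.disseminators = {}

    def attach(self, content_type, disseminator):
        self.disseminators[content_type] = disseminator

    def list_content_types(self):
        return sorted(self.disseminators)

    def get(self, content_type, operation, *args):
        # Dispatch a request to the named operation of one disseminator.
        return getattr(self.disseminators[content_type], operation)(*args)


obj = DigitalObject({
    "DS1": ("application/MARC", b"...marc record bytes..."),
    "DS2": ("application/postscript", b"%!PS ..."),
})
obj.attach("Book", BookDisseminator(["first page", "second page"]))
obj.attach("DublinCore", DublinCoreDisseminator({"title": "An Example Book"}))
```

A client first asks which content types the object supports, then calls operations of a type it understands, without needing to know how the underlying data streams are stored.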


FEDORA: Extensibility for Content Types

• Simple, familiar content types

• Complex, compound, dynamic content types


Resource Discovery

• Meta-Searching for Resource Discovery
– query multiple document sources
– choose best sources to evaluate a query
– evaluate the query at these sources
– merge the query results from these sources

• Stanford Protocol Proposal for Internet Retrieval and Search (STARTS)
– www-db.stanford.edu/~gravano/starts.html
– www.cs.cornell.edu/NCSTRL/STARTS/STARTShome.htm
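The meta-searching steps listed above (choose sources, evaluate the query there, merge the results) can be sketched with a very simple scoring scheme. This is not the STARTS protocol: the per-source summary, the term-count score, and all names below are assumptions chosen to keep the example short.

```python
class Source:
    """Toy searchable source holding a few documents."""

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs  # document id -> text

    def summary(self, term):
        # Content summary: how many documents mention the term.
        return sum(term in text.lower().split() for text in self.docs.values())

    def query(self, term):
        # Score each matching document by how often the term occurs.
        results = []
        for doc_id, text in self.docs.items():
            count = text.lower().split().count(term)
            if count:
                results.append((count, self.name, doc_id))
        return results


def meta_search(sources, term, max_sources=2):
    term = term.lower()
    # 1. Choose the sources whose summaries look most promising.
    ranked = sorted(sources, key=lambda s: s.summary(term), reverse=True)
    chosen = [s for s in ranked[:max_sources] if s.summary(term) > 0]
    # 2. Evaluate the query at the chosen sources only.
    results = [hit for s in chosen for hit in s.query(term)]
    # 3. Merge: highest score first, ties broken by source name and id.
    return sorted(results, key=lambda r: (-r[0], r[1], r[2]))


sources = [
    Source("alpha", {"a1": "metadata for digital libraries",
                     "a2": "metadata metadata everywhere"}),
    Source("beta", {"b1": "network routing"}),
    Source("gamma", {"g1": "rich metadata semantics"}),
]
merged = meta_search(sources, "metadata")
```

Raw term counts from different sources are only comparable here because every toy source scores the same way; in practice, merging results from sources with different ranking schemes is a hard part of the problem.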


Distributed Collection Service Definition and Access

[Figure: a User Interface, a Central Collection Server, and three Collection Query Routers; caption: intelligent routing based on regional conditions]


Conclusions: Design with an Eye Toward the Future

• Know limitations of ad-hoc web development and commercial packages

• Embrace a service-based approach
– modular designs increase flexibility, extensibility, plug-in/plug-out
– well-defined services with protocols to enable federation and interoperability
– can utilize various technologies or commercial software underneath the service layers

• Watch Web developments in XML and RDF


Further reading

• Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries. http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690

• Davis and Lagoze: NCSTRL: Design and Deployment of a Globally Distributed Digital Library. Draft of submission to IEEE Computer Special Issue on Digital Libraries, February 1999. http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc

• Payette: Persistent Identifiers, RLG DigiNews. http://www.rlg.org/preserv/diginews/diginews22.html

• Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA). http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html