
Digital Library Architecture: A Service-Based Approach

Sandra Payette

Department of Computer Science

Cornell University

payette@cs.cornell.edu

Mo i Rana, Norway

November 10, 1998

http://www2.cs.cornell.edu/payette/presentations/DL-architecture.ppt

Overview

• Why talk about DL architecture?

• Digital Libraries - the architectural perspective

• Review of service-based architecture

• NCSTRL - a working example

• Dienst - existing service-oriented architecture

• Cornell next generation (component-oriented)

• Conclusion

Why Talk about Digital Library Architecture?

• Web alone is not a digital library

• Commercial packages limited
  – limited flexibility
  – standards issues
  – network-enabled applications, not DL architecture

• Must position for broader DL opportunities

Web by itself not a DL Architecture

• Documents - Files, CGI, MIME-Types

• Naming - URLs

• Document Servers - HTTP servers

• Resource Discovery - web crawlers

• Collections - web pages, ad-hoc

• IP - Access Control List, passwords, ad-hoc

WWW Infrastructure Evolving

• Resource Description Framework (RDF)
  – will allow rich metadata semantics for documents
  – http://www.w3.org/RDF/

• Extensible Markup Language (XML)
  – will allow highly structured documents and rich linking (relationship) capabilities
  – http://www.w3.org/XML/

• Uniform Resource Names (URNs)
  – will allow for persistent, globally unique identifiers

But still need Digital Library Architecture

• Richer document model - digital objects

• Persistent, unique naming - URNs

• Well-defined digital library services

• Better facilities for resource discovery

• Flexible definition of collections

• Management of distributed content & services

• Rights management for intellectual property

Digital Library Interoperability

[Figure: Nordic Digital Library; Cornell Digital Library]

Digital Library Architecture: Key Principles

• Open Architecture
  – functionality partitioned into set of well-defined services
  – services accessible via well-defined protocol

• Modularization
  – promotes interoperability
  – scalable to different clientele (research library, informal web)

• Federation
  – enable aggregations into logical collections

• Distribution
  – of content (collections) and services
  – of administration and management of DL

Component-Ware Digital Libraries

[Figure: Repository Services; Collection Services; Index Services; Name Service (persistent names); User Interface Gateway; Digital Objects]

NCSTRL: A Working Example

120+ Institutions in US, Europe, and Asia

A Globally Distributed Digital Library

NCSTRL Participants: collections federated

• 120+ institutions
  – Universities/labs - research reports
  – European Research Consortium for Informatics and Mathematics (ERCIM)
  – Los Alamos (Physics pre-prints, ACM)
  – D-Lib Magazine

• 40+ independent servers

NCSTRL: Real-world testbed for ...

• modular system based on a standard open architecture

• study of hard, real-world problems: policy issues, quality of service, federation of publishers

• creation of a self-sustaining international federated digital collection

[Figure labels: Federation of Collections; Documents in Distributed Repositories; Multi-Format Document Model]

Dienst: NCSTRL technical base

• Implements a service-based architecture for distributed digital libraries

• Protocol and reference implementation

• Network of services

• WWW browser access

• Uniform search over distributed indexes

• Access to documents in distributed repositories

• Access to multi-formatted documents

Dienst: Service-Based Architecture

• Document model

• Naming service (CNRI’s Handle System)

• Repository service

• Indexer service

• Collection service

• User Interface service

Dienst Document Model

[Figure: Handle (URN); decompositions (physical, logical); representations; underlying formats (ASCII, TIFF, PostScript, metadata)]

Dienst: Document Protocol

• Documents addressable through their URNs

• Document service requests
  – get document metadata
  – get document formats
  – get document in format
  – get document partition (page) in format
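The four document requests can be pictured as URLs built from a document's handle. The sketch below is illustrative only: the path layout `/Dienst/<service>/<version>/<verb>/<handle>` is inferred from the Dienst URL listed under further reading, while the verb names (`Metadata`, `Formats`, `Document`, `Page`), the `Repository` service label, and the helper functions are hypothetical and not taken from the Dienst specification.

```python
from urllib.parse import quote

# Host taken from the Dienst URL in the further-reading slide.
BASE = "http://ncstrl.cs.cornell.edu"


def dienst_url(service, version, verb, handle, *args):
    """Build a Dienst-style request URL (path layout is an assumption)."""
    parts = ["Dienst", service, version, verb, handle, *args]
    # Keep "/" unescaped so the handle's authority/item separator survives.
    return BASE + "/" + "/".join(quote(str(p), safe="/") for p in parts)


# Hypothetical verbs, one per request type named on the slide.
def metadata_request(handle):
    return dienst_url("Repository", "1.0", "Metadata", handle)


def formats_request(handle):
    return dienst_url("Repository", "1.0", "Formats", handle)


def document_request(handle, fmt):
    return dienst_url("Repository", "1.0", "Document", handle, fmt)


def page_request(handle, page, fmt):
    return dienst_url("Repository", "1.0", "Page", handle, page, fmt)


if __name__ == "__main__":
    h = "ncstrl.cornell/TR98-1690"
    for url in (metadata_request(h), formats_request(h),
                document_request(h, "postscript"), page_request(h, 3, "tiff")):
        print(url)
```

Because every request is keyed on the handle rather than on a file location, the same document can be asked for in any of its formats, or page by page, without the client knowing how the repository stores it.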

Dienst 5.0: Document Protocol

• More complex document model:
  – versions
  – hierarchical part specification
  – binders (multi-part documents)

• “Structure” service request
  – Reveal, in XML, full or collapsed structure of a document
    • e.g., chapters, sections, figures, etc.
  – Describe multiple views of a document
    • e.g., bibliography, content, thumbnails
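To make the "full or collapsed structure" idea concrete, here is a small sketch of a client reading a structure response. The XML vocabulary (`structure`, `view`, `chapter`, `section`, `figure`), the handle, and the `outline` helper are all invented for illustration; the slides do not show the actual Dienst 5.0 response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure response; element and attribute names are assumptions.
SAMPLE = """<structure handle="example.authority/doc-1">
  <view name="content">
    <chapter title="Introduction">
      <section title="Motivation"/>
      <figure title="Figure 1"/>
    </chapter>
    <chapter title="Architecture"/>
  </view>
  <view name="thumbnails"/>
</structure>"""


def outline(element, depth=0, max_depth=None):
    """Return an indented outline of the children of `element`.

    max_depth=None gives the full structure; a small max_depth gives a
    collapsed one that stops after that many levels.
    """
    lines = []
    for child in element:
        label = child.get("title", child.get("name", ""))
        lines.append("  " * depth + f"{child.tag}: {label}")
        if max_depth is None or depth + 1 < max_depth:
            lines.extend(outline(child, depth + 1, max_depth))
    return lines


root = ET.fromstring(SAMPLE)

if __name__ == "__main__":
    print("\n".join(outline(root)))               # full structure
    print("\n".join(outline(root, max_depth=1)))  # collapsed: views only
```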

Dienst: Core Services

[Figure: WWW browser, Dienst User Interface, three Index services, three Repositories. Messages shown:]

• send search request, receive unified hit list

• send site-specific search request, receive hit list

• send document request, receive MIME-typed document (labelled on two links)
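The search path in this picture, where the user interface service sends a site-specific request to each index and hands back one unified hit list, can be sketched as follows. This is a toy in-memory model under stated assumptions: the `IndexServer` class, its substring matching, the handles, and the de-duplication rule are illustrative and are not Dienst's actual behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    handle: str
    title: str


class IndexServer:
    """Stand-in for one site's index service (hypothetical)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # maps handle -> title

    def search(self, term):
        term = term.lower()
        return [Hit(h, t) for h, t in self.records.items() if term in t.lower()]


def unified_search(indexes, term):
    """Query every index and merge the per-site hit lists into one list."""
    seen = set()
    unified = []
    for index in indexes:
        for hit in index.search(term):
            if hit.handle not in seen:  # a document may be indexed at two sites
                seen.add(hit.handle)
                unified.append(hit)
    return unified


# Hypothetical sites and handles.
site_a = IndexServer("site-a", {"a.site/tr-1": "Digital Library Protocols",
                                "a.site/tr-2": "Compiler Design"})
site_b = IndexServer("site-b", {"b.site/tr-9": "Library Federation",
                                "a.site/tr-1": "Digital Library Protocols"})

if __name__ == "__main__":
    for hit in unified_search([site_a, site_b], "library"):
        print(hit.handle, "-", hit.title)
```

The browser only ever sees the unified list; which index produced which hit is a concern of the user interface service.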

Dienst Protocol: Building Gateways to non-Conforming Sites

[Figure: FTP/HTTP “Repositories”; Standard Servers; User Interface Gateway Server]

Dienst: Collection Service

Naming Service

• Documents identified by globally unique names

• Names are persistent, permanent

• Registered names resolve to specific location (URL)

[Figure: the persistent identifier (e.g., URN) cnri.dlib/april97-payette is made up of a naming authority (cnri.dlib) and an item name (april97-payette); it resolves to a location (URL), http://www.somewebserver.org/somedirectory/somefile]
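A minimal sketch of the idea, using the example name and URL from this slide: the name splits into naming authority and item name, and a registry maps the name to its current location. The `NameService` class and `split_handle` helper are hypothetical stand-ins for illustration and do not use the real Handle System API.

```python
def split_handle(handle):
    """Split 'authority/item' into (naming authority, item name)."""
    authority, sep, item = handle.partition("/")
    if not sep or not authority or not item:
        raise ValueError(f"not a valid name: {handle!r}")
    return authority, item


class NameService:
    """Toy resolver: persistent name -> current location (URL)."""

    def __init__(self):
        self._locations = {}

    def register(self, handle, url):
        split_handle(handle)  # validate the name before storing it
        self._locations[handle] = url

    def resolve(self, handle):
        return self._locations[handle]


if __name__ == "__main__":
    names = NameService()
    names.register("cnri.dlib/april97-payette",
                   "http://www.somewebserver.org/somedirectory/somefile")
    print(split_handle("cnri.dlib/april97-payette"))
    print(names.resolve("cnri.dlib/april97-payette"))
```

If the file moves, only the registered location changes; the name that citations and indexes hold stays the same, which is the point of persistence.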

Identifiers: Current Initiatives

• IETF Uniform Resource Names (URN)
  – specification of URN framework
  – requirements for resolution systems
  – syntax definition

• Existing Systems
  – CNRI’s Handle System (used by NCSTRL)
  – OCLC PURLs
  – DOI Initiative

Looking Ahead: Current Research at Cornell

• Digital Objects and Repository
  – FEDORA
  – Joint work in Interoperability with CNRI
  – Access Management

• Resource Discovery
  – STARTS (Cornell/Stanford collaboration)
  – Intelligent Distributed Searching

• Collection Definition

Digital Object is...

recognizable by what it can do

[Figure: Content-Type Interfaces with operations getChapter, getPage, getTrack, getLabel, getSection, getArticle, getFrame, getLength; other labels: Structure, Mechanism, Book, MARC]

What the client sees vs. What the object is

[Figure: FEDORA Digital Object]

• Data streams: DS1 (application/MARC), DS2 (application/postscript)

• Generic Disseminator: ListContentTypes (Book, DublinCore)

• Book Disseminator: GetChapter, GetIndex, GetPage

• Dublin Core Disseminator

• Example request: Get(Book.getPage(1))
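The split between what the client sees (content-type operations) and what the object is (typed data streams) can be sketched in a few lines. This is an assumption-laden toy, not the FEDORA interface definition: the class names mirror the figure labels, but the method signatures, the `get_title` operation, and the in-memory stream contents are invented for illustration.

```python
class BookDisseminator:
    """Book content-type interface backed by a page-oriented data stream."""

    def __init__(self, stream_id):
        self.stream_id = stream_id

    def get_page(self, streams, number):
        _mime, pages = streams[self.stream_id]
        return pages[number - 1]  # pages are numbered from 1


class DublinCoreDisseminator:
    """Descriptive-metadata interface derived from the MARC data stream."""

    def __init__(self, stream_id):
        self.stream_id = stream_id

    def get_title(self, streams):  # hypothetical operation
        _mime, record = streams[self.stream_id]
        return record["title"]


class DigitalObject:
    """Holds data streams; clients reach them only through disseminators."""

    def __init__(self, streams):
        self._streams = streams        # stream id -> (MIME type, content)
        self._disseminators = {}       # content-type name -> disseminator

    def attach(self, content_type, disseminator):
        self._disseminators[content_type] = disseminator

    def list_content_types(self):
        return sorted(self._disseminators)

    def get(self, content_type, operation, *args):
        disseminator = self._disseminators[content_type]
        return getattr(disseminator, operation)(self._streams, *args)


# Stream contents are placeholders standing in for real MARC and PostScript.
obj = DigitalObject({
    "DS1": ("application/MARC", {"title": "A Sample Book"}),
    "DS2": ("application/postscript", ["page one", "page two"]),
})
obj.attach("Book", BookDisseminator("DS2"))
obj.attach("DublinCore", DublinCoreDisseminator("DS1"))

if __name__ == "__main__":
    print(obj.list_content_types())
    print(obj.get("Book", "get_page", 1))
```

A client that knows the Book interface can page through any object that offers it, whatever formats sit underneath; adding a new content type means attaching another disseminator, not changing the object.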

FEDORA: Extensibility for Content Types

• Simple, familiar content types

• Complex, compound, dynamic content types

Resource Discovery

• Meta-Searching for Resource Discovery
  – query multiple document sources
  – choose best sources to evaluate a query
  – evaluate the query at these sources
  – merge the query results from these sources

• Stanford Protocol Proposal for Internet Retrieval and Search (STARTS)
  – www-db.stanford.edu/~gravano/starts.html
  – www.cs.cornell.edu/NCSTRL/STARTS/STARTShome.htm
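The meta-searching steps (choose the best sources, evaluate the query there, merge the results) can be sketched as below. This is not an implementation of STARTS: the `Source` class, the term-count scoring, and the "documents matching any term" summary are simple stand-ins chosen for illustration, and all source names and documents are made up.

```python
class Source:
    """Toy document source with a crude content summary (hypothetical)."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # doc id -> text

    def summary_score(self, terms):
        """How promising is this source? Count docs matching any term."""
        return sum(
            1 for text in self.documents.values()
            if any(t in text.lower().split() for t in terms)
        )

    def evaluate(self, terms):
        """Score each document by total occurrences of the query terms."""
        results = []
        for doc_id, text in self.documents.items():
            words = text.lower().split()
            score = sum(words.count(t) for t in terms)
            if score > 0:
                results.append((score, doc_id))
        return results


def meta_search(sources, query, max_sources=2):
    terms = query.lower().split()
    # 1. Choose the best sources for this query.
    ranked = sorted(sources, key=lambda s: s.summary_score(terms), reverse=True)
    chosen = [s for s in ranked[:max_sources] if s.summary_score(terms) > 0]
    # 2. Evaluate the query at the chosen sources only.
    merged = []
    for source in chosen:
        for score, doc_id in source.evaluate(terms):
            merged.append((score, source.name, doc_id))
    # 3. Merge: highest score first, ties broken by source then doc id.
    merged.sort(key=lambda r: (-r[0], r[1], r[2]))
    return merged


sources = [
    Source("A", {"a1": "digital library architecture", "a2": "digital objects"}),
    Source("B", {"b1": "cooking recipes"}),
    Source("C", {"c1": "library services for digital library users"}),
]

if __name__ == "__main__":
    for row in meta_search(sources, "digital library"):
        print(row)
```

Merging raw scores from different sources is only fair here because every toy source scores the same way; real sources rank differently, which is one reason a shared protocol for exchanging source metadata and result statistics is useful.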

Distributed Collection Service Definition and Access

[Figure: Central Collection Server; three Collection Query Routers; User Interface; intelligent routing based on regional conditions]

Conclusions: Design with an Eye Toward the Future

• Know limitations of ad-hoc web development and commercial packages

• Embrace a service-based approach
  – modular designs increase flexibility, extensibility, plug-in/plug-out
  – well-defined services with protocols to enable federation and interoperability
  – can utilize various technologies or commercial software underneath the service layers

• Watch Web developments in XML and RDF

Further reading

• Lagoze and Payette: An Infrastructure for Open-Architecture Digital Libraries. http://ncstrl.cs.cornell.edu/Dienst/UI/1.0/Display/ncstrl.cornell/TR98-1690

• Davis and Lagoze: NCSTRL: Design and Deployment of a Globally Distributed Digital Library. Draft of submission to IEEE Computer Special Issue on Digital Libraries, February 1999. http://www2.cs.cornell.edu/lagoze/papers/NCSTRL-IEEE3.doc

• Payette: Persistent Identifiers, RLG DigiNews. http://www.rlg.org/preserv/diginews/diginews22.html

• Payette and Lagoze: Flexible and Extensible Digital Object and Repository Architecture (FEDORA). http://www2.cs.cornell.edu/NCSTRL/CDLRG/FEDORA.html