Dienst Distributed Networked Publishing
Carl Lagoze, Digital Library Scientist
Cornell University
2
Cornell Digital Library Research Group (CDLRG)
• Research and Development of Component-Ware Digital Library Infrastructure
• Developed out of DARPA-funded Computer Science Technical Reports Projects (CS-TR)
3
Component-Ware Digital Libraries
• Service-based infrastructure
  – Interface (protocol) of each service
  – Interactions between services
  – Aggregations into logical collections and libraries
• Layered approach accommodates requirements of varying clientele
  – research libraries - high-integrity, quality of service, security
  – informal collections - e.g., web
4
CDLRG Research Projects
• FEDORA
• Distributed Searching and Resource Discovery
• Digital Library Collection Definition
• Metadata (Dublin Core and Warwick Framework)
• Networked Computer Science Technical Reports Project (www.ncstrl.org)
5
What is NCSTRL?
A Vehicle and Testbed for Digital Library Interoperability
A Vehicle for Exploring Policy and Organization
A Production Digital Collection
6
A Production Digital Collection
• A growing collection of CS research reports
• A service relied on by users and publishers
• Motivates solving hard, real-world problems: IPR, quality of service, federation of publishers
7
A Testbed for Technology
• Create a modular system based on a standard open architecture
• Provide a testbed for demonstrating and testing new digital library components
• Work with a variety of researchers: DLI, ERCIM, Los Alamos
8
A Vehicle for Exploring Policy and Organization
• Creating a self-sustaining international federated digital collection
• Extending the domain and scope while maintaining a coherent collection
• Policy issues: charging, IPR, liability, technical quality, relationship to other DL organizations
9
Origins of NCSTRL
• DARPA-funded CS-TR Project
  – CNRI, Berkeley, CMU, Cornell, MIT, Stanford
• NSF-funded WATERS Project
  – Old Dominion, SUNY Buffalo, Virginia, Virginia Tech
• Other CS Tech Reports Efforts
  – Harvest, UCSTRI, NZDL
10
NCSTRL Project Participants
• NCSTRL Steering Committee
• NCSTRL Working Group
• Cornell Digital Library Research Group
• The Collection
11
NCSTRL Steering Committee
• Responsible for policy direction, oversight
• How to extend interoperability efforts into the broader community
12
NCSTRL Working Group
• Responsible for operational oversight of the current system
• Membership from CS-TR and WATERS projects
13
Cornell Digital Library Research Group
• Responsible for day-to-day support and maintenance of existing system
• Clearing house for technical collaborations
• Evolution and Research Directions
14
Contributing Institutions
105 Institutions in US, Europe, and Asia
15
Dienst
• is a protocol and reference implementation of a distributed digital library service
• where a network of services provides
• World Wide Web browser access,
• uniform search over distributed indexes,
• and multi-formatted documents.
16
Dienst document model
[Figure: a document, identified by its Document Handle (URN), with metadata, representations (ASCII, TIFF, PostScript), and decompositions (physical, logical).]
17
Exposing the Model through the Protocol
• Documents addressable through their URNs
• Document service requests
  – get document metadata
  – get document formats
  – get document in format
  – get document partition (page) in format
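The four document requests above can be read as operations on the document model. The following is a minimal sketch, assuming an in-memory repository; the class names, method names, and the sample handle are hypothetical illustrations and do not reproduce the actual Dienst protocol verbs or wire format.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical in-memory stand-in for one document in a repository."""
    handle: str      # the document's URN
    metadata: dict   # descriptive metadata
    # format name -> list of pages (each entry is one page in that format)
    formats: dict = field(default_factory=dict)


class Repository:
    """Toy repository answering the four document requests listed above."""

    def __init__(self):
        self._docs = {}

    def add(self, doc):
        self._docs[doc.handle] = doc

    def get_metadata(self, handle):
        return self._docs[handle].metadata

    def get_formats(self, handle):
        return sorted(self._docs[handle].formats)

    def get_document(self, handle, fmt):
        # Whole document: every page in the requested format, in order.
        return list(self._docs[handle].formats[fmt])

    def get_page(self, handle, fmt, page):
        # Pages are numbered from 1, as a reader would count them.
        return self._docs[handle].formats[fmt][page - 1]


repo = Repository()
repo.add(Document(
    handle="example.org/TR-0001",  # made-up handle, for illustration only
    metadata={"title": "An Example Report", "author": "A. Author"},
    formats={
        "ascii": ["page one text", "page two text"],
        "tiff": [b"<tiff page 1>", b"<tiff page 2>"],
    },
))
```

Every request takes the document handle as its first argument, which mirrors the point that documents are addressable through their URNs.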
18
Dienst Services
[Figure: Dienst services. A WWW browser sends a search request to the Dienst User Interface and receives a unified hit list. The User Interface sends a site-specific search request to each Index and receives a hit list. Document requests go to a Repository, which returns a MIME-typed document.]
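The search flow in this slide (one request fanned out to several indexes, then merged into a unified hit list) can be sketched as follows. This is an illustrative sketch only: the index records, the substring matching rule, and the function names are assumptions, and real indexes would be contacted over the network rather than held in memory.

```python
from concurrent.futures import ThreadPoolExecutor


def search_index(index, query):
    """Site-specific search: return (handle, title) hits from one index."""
    return [(handle, title) for handle, title in index["records"]
            if query.lower() in title.lower()]


def unified_search(indexes, query):
    """Send the query to every index in parallel and merge the hit lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(indexes))) as pool:
        hit_lists = list(pool.map(lambda ix: search_index(ix, query), indexes))
    merged = {}
    for hits in hit_lists:
        for handle, title in hits:
            merged.setdefault(handle, title)  # keep one entry per handle
    return sorted(merged.items())


# Made-up sample data: two sites, one document indexed at both.
indexes = [
    {"site": "site-a", "records": [("a/TR-1", "Distributed Search"),
                                   ("a/TR-2", "Metadata Formats")]},
    {"site": "site-b", "records": [("b/TR-9", "Search Interfaces"),
                                   ("a/TR-1", "Distributed Search")]},
]
hits = unified_search(indexes, "search")
```

Merging by handle is one possible design choice; it relies on each document having a single, globally unique identifier.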
19
Exposing the Services through the Protocol
• All protocol requests are service-specific,
• so the functionality of any service can be accessed by another service or a new service.
20
Gateways to non-Conforming Sites
[Figure labels: FTP/HTTP “Repositories”, Standard Servers, User Interface, Gateway Server]
21
Use by External Services
[Figure labels: User Interface, Search Engine (Z39.50)]
22
Publishing Using Dienst: Retrospective Conversion
• Scanning of legacy documents
  – Cornell
  – MIT
  – Stanford
• Conversion to common formats
  – GIFs
  – thumbnails
  – PostScript
23
Publishing with Dienst: Digital Originals
• PostScript as lingua franca
  – “thanks Microsoft”
• Form submission
  – author-generated descriptive metadata
• Clerical clearing-house
• Automatic format conversion
24
Collection Definition in Digital Libraries
• Multiple levels of selection
  – authors “publish”
  – repositories have submission policies
  – search engines index
  – objects in search engines aggregated into collections
  – user interface gateways provide access to multiple collections
• What is “in” a digital library is defined by what can be found using its resource discovery tools
25
Defining the Collection - Collection Service
26
Regional Structure
central collection server
27
Connectivity Regions and Collection Views
28
Improvements to the Protocol - Dienst 5
• Incremental enhancement to existing interoperability framework
• Improved document model
  – versions
  – hierarchical part specification
  – binders (multi-part documents)
• Implementation currently under development
29
Dienst 5 Document Structure
• Structure Request
  – Reveal, in XML, full or collapsed structure of a document
    • e.g., chapters, sections, figures, etc.
  – Describe multiple views of a document
    • e.g., bibliography, content, thumbnails
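To make "full or collapsed structure" concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The XML element and attribute names (`document`, `view`, `chapter`, `section`, `figure`, `n`, `name`) and the `collapse` helper are invented for illustration; the slides do not give the Dienst 5 schema, so none of this should be read as the actual response format.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure description; all names here are invented.
STRUCTURE_XML = """
<document handle="example.org/TR-0001">
  <view name="content">
    <chapter n="1">
      <section n="1.1"/>
      <section n="1.2"/>
    </chapter>
    <chapter n="2">
      <section n="2.1"/>
      <figure n="2-1"/>
    </chapter>
  </view>
  <view name="thumbnails"/>
</document>
""".strip()


def collapse(element, depth):
    """Copy `element`, keeping only `depth` levels of descendants."""
    copy = ET.Element(element.tag, dict(element.attrib))
    if depth > 0:
        for child in element:
            copy.append(collapse(child, depth - 1))
    return copy


full = ET.fromstring(STRUCTURE_XML)
collapsed = collapse(full, 2)  # document -> views -> chapters, no sections
view_names = [view.get("name") for view in full.findall("view")]
```

In this sketch the full structure lists every section and figure, while the collapsed copy stops at the chapter level; a client could ask for the collapsed form first and expand only the parts it needs.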
30
Dienst 5 Document Dissemination
• Disseminate Request
  – Access to component(s) described by Structure
  – e.g., disseminate chapter 2 page 5 in PostScript
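The example request above (chapter 2, page 5, in PostScript) amounts to resolving a hierarchical part specification against stored content. A minimal sketch, assuming a nested in-memory store; the `disseminate` function, its parameters, and the store layout are hypothetical and are not taken from the Dienst 5 specification.

```python
# Hypothetical store: format -> chapter number -> page number -> content.
STORE = {
    "postscript": {
        1: {1: "%!PS chapter 1, page 1"},
        2: {5: "%!PS chapter 2, page 5", 6: "%!PS chapter 2, page 6"},
    },
}


def disseminate(store, fmt, chapter, page=None):
    """Return one page, or every page of a chapter when `page` is omitted."""
    try:
        pages = store[fmt][chapter]
    except KeyError:
        raise LookupError(f"no chapter {chapter} in format {fmt!r}") from None
    if page is None:
        return [pages[n] for n in sorted(pages)]
    if page not in pages:
        raise LookupError(f"no page {page} in chapter {chapter}")
    return pages[page]


result = disseminate(STORE, "postscript", chapter=2, page=5)
```

Leaving `page` out returns the whole chapter, which illustrates how one request shape can address components at different levels of the hierarchy.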
31
Supporting Multiple Collections
• NCSTRL is currently a single collection
• Other users of Dienst protocol
  – European gray literature, thesis, and dissertation collections
  – NASA space science
  – Mediterranean environment data and software
  – Los Alamos Pre-prints
• Expanding the technology to multiple collections through regions
32
Lessons Learned and Work to be Done
• Intellectual property
• Quality
  – quality of collection (reviewing)
  – quality of metadata
  – quality of service
• Resisting information entropy
• Richer “documents”
• Archiving and Preservation