Digital Libraries: Basic Concepts and a Reference Model
description
Transcript of Digital Libraries: Basic Concepts and a Reference Model
Digital Libraries:Basic Concepts anda Reference Model
FUB Summer School 2010Bozen-Bolzano, 20-24 September 2010
Vittore CasarosaISTI-CNR, Pisa and
Free University of Bozen-Bolzano, Italy
The traditional role of libraries
Mediators between information and users Selection
– Definition of collections Acquisition
– Physical objects Description
– Catalogs Access
– Shelves Preservation
– Controlled enviromentFUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 2
Libraries: some figures
Volumes (in millions)
Journals– From 10.000 in 1950 to 150.000 in 2002
Alexandria principle beginning to fadeFUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 3
Institution 1910 1995 2002 Library of Congress 1,8 23 26 Harvard Univ. 0,8 12,9 14,9 Yale Univ. 0,55 9,5 10,9 U Illinois (Urbana) 0,1 8,5 9,9 U California (Berkeley) 0,24 8,1 9,4
British Library 2 15 18 Cambridge Univ. 0,5 3,5 7 Oxford Univ. 0,8 4,8 6 Bibl. Nat. De France 3 11 12
Evolution of technology
Computer technology– CPU and integrated chips (from KHz to GHz)– Random Access Memories (RAM – from KB to GB)– External memories
• Tapes, hard disks, floppy disks• Memory sticks• CDs• DVDs• from MB to GB to TB to PB to EB
Communication technology (networks)– (Telephone) line speed– Point to point (leased lines)– Local Area Networks– Inter-networking (TCP/IP)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 4
The World Wide Web
Combination of computer technology and communication technology
It all started with the “hyperlink” Then came the “browser” (Mosaic) Then came the first wave Then came the “dot come, dot gone” Then came the second wave Finally came the Web and the “information explosion”
– An estimate of 500 to 1000 million hosts– An estimate of 30 to 50 billion pages on line
And now we have Web 2.0 (with Web 3.0 just around the corner)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 5
Definition of “Digital Library”
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 6
• Click to add text
DELOS
The Network of Excellence on Digital Libraries
Define and conduct a joint program of activities in order to integrate and coordinate the on-going research activities of the major European research teams in the field of digital libraries for the purpose of developing the next generation digital library technologies
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 7
DELOS visions
Digital libraries should enable any citizen to access all human knowledge anytime and anywhere,in a friendly, multi-modal, efficient, and effective way, by overcoming barriers of distance, language, and culture and by using multiple Internet-connected devices (about year 2000)
The potential exists for digital libraries to become the universal knowledge repositories and communication conduits for the future, a common vehicle by which everyone will access, discuss, evaluate, and enhance information of all forms (about year 2005)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 8
What is a Digital Library ?
A DL is the combination of “content and services” A DL is an “entity” providing the functionality to mediate
between information objects and information users in the context of distributed collections of information objects. This (external) functionality includes access, publish, delivery, preservation, personalization, etc.
A Digital Library is a “tool” at the centre of intellectual activity having no logical, conceptual, physical, temporal, or personal borders or barriers on information
A Digital Library is an “institution” in charge of providing at least the functionality of a traditional library in the context of distributed and networked collections of information objects
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 9
Digital Libraries in the Information Space
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 10
Structure of Data
Structure of User Behavior Digital
Libraries
Databases/IR
Web
Low
High
High
Digital Libraries Technologies
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 11
DLs
Data Management
Document Management
Information Retrieval
Artifi
cial In
tellig
ence
Hum
an C
om
pute
r in
tera
ctio
nLibrary Sciences
Museum sciences
Arch
ives
SociologyPsy
chol
ogy
Image P
roce
ssin
gW
eb
Conceptual Framework
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 12
Contents
Usage
Digital Library System
Management
Building a Digital Library
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 13
Research Directions in DLs
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 14
Reference Model for DLS
Foundations
ArchitecturesInformation
AccessAudio/Visual
SemanticInterop
Systems
UserInterfaces
Visualization Personalization
User
e-Health, e-Learning, e-Culture, etc.…
ApplicationsH
orizo
nta
l
Cura
tion
Pre
serv
atio
nR
eposito
ries
Evalu
atio
nM
eth
odolo
gie
sEvalu
atio
nIn
frastru
cture
s
Need for a Reference Model
A reference model is an abstract framework for understanding significant relationships among the entities of some environment, and for the development of consistent standards or specifications supporting that environment
A reference model is based on a small number of unifying concepts and may be used as a basis for education and explaining standards to a non-specialist
A reference model is not directly tied to any standards, technologies or other concrete implementation details, but it does seek to provide a common semantics that can be used unambiguously across and between different implementations
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 15
A Three-Entity Framework
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 16
The three Entities
Digital Library– An organization, which might be virtual, that comprehensively collects,
manages, and preserves for the long term rich digital content, and offers to its user communities specialized functionality on that content, of measurable quality and according to codified policies
Digital Library System– A software system that is based on a defined (possibly distributed)
architecture and provides all functionality required by a particular Digital Library. Users interact with a Digital Library through the corresponding Digital Library System
Digital Library Management System– A generic software system that provides the appropriate software
infrastructure both (i) to produce and administer a Digital Library System incorporating the suite of functionality considered foundational for Digital Libraries and (ii) to integrate additional software offering more refined, specialized, or advanced functionality
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 17
Different types of DLMS
Extensible Digital Library System – A complete Digital Library System that is fully operational with respect to basic/
foundational functionality required. It is based on an open software architecture, so that further software components can be incorporated on top of the ones already there with ease (DelosDLMS, GreenStone)
Digital Library System Warehouse – A collection of software components that encapsulate the core suite of DL
functionality and a set of tools that can be used to combine these components in a variety of ways (in Lego-like fashion) to create Digital Library Systems offering a tailored integration of functionalities. New software components can easily be incorporated into the Warehouse for subsequent combination with those already there (BRICKS, DILIGENT)
Digital Library System Generator – A highly parameterized software system that encapsulates templates covering a
broad range of functionalities, including a defined core suite of DL functionality as well as any advanced functionality that has been deemed appropriate to meet the needs of the specific application domain Through an initialization session, the appropriate parameters are set and configured; at the end of that session, an application is automatically generated, and this constitutes the Digital Library System ready for installation and deployment (MARIAN)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 18
DELOS DLMS
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 19
Containment of models
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 20
Actors in the Digital Library
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 21
Main roles of Actors (1)
DL End-Users – They exploit the DL functionality for providing, consuming, and managing the DL
Content as well as some of its other constituents They perceive the DL as a stateful entity that serves their functional needs The behaviour and output of the DL depend on its state at the time a particular part of its functionality is activated. DL end-users may be further partitioned into
• Content Creators• Content Consumers• Librarians (end user)
DL Designers (Digital Librarian) – They exploit their knowledge of the semantic of the application domain to define,
customize, and maintain the Digital Library so that it is aligned with the information and functional needs of its end-users To perform this task, they interact with the DLMS providing functional and content configuration parameters The values of these parameters, which can be modified during the DL lifetime, configure the specific DL perceived by the end-users because they determine the particular Digital Library System instance serving the Digital Library
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 22
Main roles of Actors (2)
DL System Administrators (System Librarian)– They select the software components necessary to create the Digital Library
System needed to serve the required DL (as specified by the DL Designer) and decide where and how to deploy them They interact with the DLMS by providing architectural configuration parameters, such as the selected software components, the hosting nodes, and the components allocation. The value of the architectural configuration parameters can be changed over the DL lifetime Any change of these parameters may result in the provision of different DL functionality and/or different quality
DL Application Developers – They develop the software components of DLMSs and DLSs, implementing the
necessary functionality
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 23
Hierarchy of Actors’ Views
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 24
Main concepts (1)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 25
Main concepts (2)
Content – The Content concept encompasses the data and information that the Digital Library handles
and makes available to its users Content is an umbrella concept used to aggregate all forms of information objects that a Digital Library collects, manages, and delivers It encompasses the diverse range of information objects, including such resources as objects, annotations, and metadata
User – The User concept covers the various actors (whether human or machine) entitled to interact
with Digital Libraries. Digital Libraries connect actors with information and support them in their ability to consume and make creative use of it to generate new information User is an umbrella concept including all notions related to the representation and management of actor entities within a Digital Library. It encompasses such elements as the rights that actors have within the system and the profiles of the actors with characteristics that personalize the system’s behavior or represent these actors in collaborations
Functionality – The Functionality concept encapsulates the services that a Digital Library offers to its
different users, whether classes of users or individual users While the general expectation is that DLs will be rich in capabilities and services, the bare minimum of functions would include such aspects as new information object registration, search, and browse Beyond that, the system seeks to manage the functions of the Digital Library to ensure that the functions reflect the particular needs of the digital library’s community of users and/or the specific requirements relating to the Content it contains
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 26
Main concepts (3)
Policy – The Policy concept represents the set (or sets) of conditions, rules, terms and regulations
governing interaction between the Digital Library and users, whether virtual or real. Examples of policies include acceptable user behaviour, digital rights management, privacy and confidentiality, charges to users, and collection delivery
Quality – The Quality concept represents the parameters that can be used to characterize and
evaluate the content and behavior of a Digital Library. Quality can be associated not only with each class of content or functionality but also with specific information objects or services Some of these parameters are objective in nature and can be automatically measured, whereas others are subjective in nature and can only be measured through user evaluations
Architecture – The Architecture concept refers to the Digital Library System entity and represents a
mapping of the functionality and content offered by a Digital Library onto hardware and software components There are two primary reasons for having Architecture as a core concept: (i) Digital Libraries are often assumed to be among the most complex and advanced forms of information systems; and (ii) interoperability across Digital Libraries is recognized as a substantial research challenge A clear architectural framework for the Digital Library System offers ammunition in addressing both these issues effectively
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 27
The main concepts in perspective
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 28
User
The Reference Model
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 29
Concept Maps
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 30
Digital Library Domains
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 31
The Resource Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 32
The Content Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 33
The User Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 34
The Functionality Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 35
Main functions
C32 Access ResourceC33 Discover
C34 BrowseC35 Search
C36 AcquireC37 Visualize
C38 Manage ResourceC39 CreateC40 SubmitC41 WithdrawC42 UpdateC43 ValidateC44 AnnotateC45 Manage Inform ObjectC64 Manage ActorC71 Manage FunctionC72 Manage PolicyC73 Manage Quality Parameter
C74 CollaborateC75 Exchange Information
C76 Converse
C77 Find Collaborator
C78 Author Collaboratively
C79 Manage DLC80 Manage Content
C85 Manage User
C90 Manage Functionality
C92 Manage Quality
C93 Manage Policy Domain
C94 Manage & Configure DLSC95 Manage DLS
C104 Configure DLS
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 36
Manage Information Object
C46 Disseminate– C47 Publish
C48 Author – C49 Compose
C50 Process– C51 Analyze
• C52 Linguistic Analysis• C53 Qualitative Analysis
– C54 Examine Preservation State• C55 Statistical Analysis • C56 Scientific Analysis• C57 Create Structured Representation• C58 Compare
– C59 Transform • C60 Physically Convert
– C61 Translate• C62 Convert to a Different Format• C63 Extract
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 37
Manage Actor
C65 Establish Actor – C66 Register
• C67 Sign Up
– C68 Login
C69 Personalise– C70 Apply Profile
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 38
Manage DL
C80 Manage Content – C81 Manage Collection
• C82 Import Collection• C83 Export Collection
– C84 Preserve C85 Manage User
– C86 Manage Membership – C87 Manage Group – C88 Manage Role – C89 Manage Actor Profile
C90 Manage Functionality – C91 Monitor Usage
C92 Manage Quality C93 Manage Policy Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 39
Manage & Configure DLS
C95 Manage DLS – C96 Create DLS – C97 Withdraw DLS – C98 Update DLS – C99 Manage Architecture
• C100 Manage Architectural Component• C101 Configure Architectural Componebnt• C102 Deploy Architectural Component• C103 Monitor Architectural Component
C104 Configure DLS – C105 Configure Resource Format – C106 Configure Content – C107 Configure User – C108 Configure Functionality – C109 Configure Policy – C110 Configure Quality
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 40
Access Resource
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 41
Manage Resource (1)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 42
Manage Resource (2)
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 43
Manage Actor
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 45
Collaborate
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 46
Manage DL
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 47
Manage & Configure DLS
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 48
The Policy domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 49
Categorization of Policies
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 50
The Quality Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 51
The Architecture Domain
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 52
The Digital Library Development Framework
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 53
DLS Reference architecture
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 54
DLS Concrete architecture
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 55
Conclusions
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 56
In addition (or related to) the technical aspects of Digital Libraries, there is one (recurring) question:
Will the Web become the ultimate Digital Library ?
Digital Libraries in the Information Space
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 57
Structure of Data
Structure of User Behavior Digital
Libraries
Databases/IR
Web
Low
High
High
Digital Libraries in the Information Space
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 58
Structure of Data
Structure of User Behavior
Digital Libraries
Databases
Web
Low
High
High
Info Retrieval
……
Wikis/blogsCMS/DAMS
Search engines
FUB 2010 59
Rank Provider Searches (000)Share of Searches
- All Search 9,247,187 100.0%
1 Google Search 6,016,689 65.1%
2 Yahoo! Search 1,272,764 13.8%
3MSN/Windows Live/Bing Search
1,198,345 13.0%
4 AOL Search 191,189 2.1%
5 Ask.com Search 184,448 2.0%
6My Web Search Search
112,265 1.2%
7 Comcast Search 48,585 0.5%
8 NexTag Search 25,392 0.3%
9WhitePages.com Network Search
25,370 0.3%
10Yellow Pages Search
19,086 0.2%
Top 10 U.S. Search Providers, Home & WorkMay 2010Source: Nielsen MegaView Search
Vittore Casarosa - Free University of Bozen-Bolzano
Traditional role of libraries
Mediators between information and users Selection
– Definition of collections Acquisition
– Physical objects Description
– Catalogs Access
– Shelves Preservation
– Controlled enviromentFUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 60
crawlers, spiders, bots, etc.
digital
Dublin Corethe Web
a) forever
whichever comes firstb) the next five years
Final question
FUB 2010 Vittore Casarosa - Free University of Bozen-Bolzano 61
“digital librarians”, “digital curators”, etc
???????
Digital Library Digital library
paper, pictures, audio, video
digital “surrogates”born digital objects
librarians, curators, etc
people
libraries, museums, archives, ....
institutions
clay tablets, papyrus, paper
contents
The evolution of “libraries”