Building a Digital Library with Fedora
International Conference on Developing Digital Institutional Repositories
Hong Kong, December 9, 2004
UVA Digital Library Assumptions:
• All media, all content types integrated into one collection
• A network that is built to be a part of a global network
• The global network will be built by libraries, governments and corporations
• Searching and browsing are equally important
UVA Digital Library Assumptions (cont.):
• We will provide tools to give access to and make use of our collections
• Any given resource can be presented in any number of contexts
• Increasingly, we will be faced with born-digital materials
• This is going to take a very long time …
The digital library as a web of content
Text Collections
[Diagram labels: Texts; Modern English Collection; Page Images]
Explicit Data Modeling for Serials
[Diagram labels: Cavalier Daily; Daily Issue; Daily Issue; Modern English Collection]
Collecting Scholarly Projects
[Diagram labels: Leaves of Grass; Text 1; Text 2; Modern English Collection; The Whitman Project]
Art and Architecture Data
[Diagram labels: Work Objects; Art and Architecture Collection; Image Objects]
Quantitative Data Collections
[Diagram labels: Dataset Description Objects; Quantitative Data Collection; Database Objects]
The Flexible Extensible Digital Object Repository Architecture (Fedora)
• Developed at Cornell under an NSF grant
• UVA Library re-interpreted the architecture and created the first practical implementation
• 3-year project funded in 2001 by the Andrew W. Mellon Foundation to create an open-source system
• Another 3 years of development funded by Mellon in 2004
Fedora is a set of web services that can provide a foundation for a variety of information management strategies.
– Supports client applications through SOAP or HTTP connections
– Provides back-end web services for content through behavior objects
– Provides both management and access APIs
– Provides a search index aimed at repository management
– Fine-grained policy enforcement
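To make the HTTP access route concrete, here is a minimal sketch of how a client might build a URL that asks the repository to run one behavior on one object. The URL layout (`/get/PID/behaviorDefinitionPID/method`), the host, the PIDs, and the method name are all illustrative assumptions, not taken from these slides; check the Fedora documentation for the syntax your release actually uses.

```python
from urllib.parse import quote, urlencode


def dissemination_url(base_url, pid, bdef_pid, method, params=None):
    """Build a GET URL asking the repository to run one behavior on one object.

    The path layout is an assumption made for illustration only.
    """
    # Keep ':' readable inside PIDs such as 'demo:1'; escape anything unsafe.
    path = "/".join(quote(part, safe=":") for part in (pid, bdef_pid, method))
    url = f"{base_url.rstrip('/')}/get/{path}"
    if params:
        # Optional method parameters travel as an ordinary query string.
        url += "?" + urlencode(params)
    return url


# Hypothetical host, PIDs, and method name.
print(dissemination_url(
    "http://localhost:8080/fedora", "demo:1", "demo:bdef-image", "getPreview"
))
```

A SOAP client would carry the same three pieces of information (object PID, behavior definition PID, method name) in a request message instead of a URL path.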
Users access data objects through behaviors.
[Diagram labels: Users; Applications; HTTP Server; PID; Behavior (x4); Behavior Definition; Behavior Mechanism; Process; Datastream (x3); File (x2)]
A data object is one unit of content.
• Persistent ID (PID): digital object identifier
• Disseminators (Default Disseminator, Your Extension, Your Extension): methods for disseminating “views” of content
• System Metadata: metadata about history and policies
• Datastreams (item, item, item): set of content or metadata items
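The parts of a data object can be modelled in a few lines. This is a sketch of the concept only, not Fedora's storage format or API: the class names, field names, and the sample PID are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """One content or metadata item held by a data object."""
    ds_id: str
    mime_type: str
    content: bytes


@dataclass
class DataObject:
    """One unit of content: identifier, metadata, items, and views."""
    pid: str                                                   # persistent ID
    system_metadata: Dict[str, str] = field(default_factory=dict)
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Each disseminator is a named way of producing a view of the content.
    disseminators: Dict[str, Callable[["DataObject"], bytes]] = field(
        default_factory=dict
    )

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def disseminate(self, name: str) -> bytes:
        # Look up the named disseminator and let it read this object's items.
        return self.disseminators[name](self)


# Hypothetical object with one datastream and a default view of it.
obj = DataObject(pid="demo:1", system_metadata={"state": "active"})
obj.add_datastream(Datastream("TEXT", "text/plain", b"Leaves of Grass"))
obj.disseminators["default"] = lambda o: o.datastreams["TEXT"].content
print(obj.disseminate("default"))
```

Callers only ever ask for a named view, so the datastreams behind it can change without affecting them.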
[Diagram: three object types and the contracts that link them]
Behavior Definition Object: Persistent ID (PID); Behavior Definition Metadata; System Metadata; Datastreams
Behavior Mechanism Object: Persistent ID (PID); Service Binding Metadata (WSDL); System Metadata; Datastreams; Web Service
Data Object: Persistent ID (PID); Disseminators; Datastreams; System Metadata
Link labels: behavior contract; behavior subscription; data contract
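One way to read the three objects and their contracts is as a pair of checks made before a data object can use a mechanism. The sketch below assumes the behavior contract means "the mechanism implements every method the definition names" and the data contract means "the data object supplies every datastream the mechanism needs". That reading, and every class, field, and PID in the code, is an assumption for illustration rather than Fedora's actual validation logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

Datastreams = Dict[str, bytes]


@dataclass
class BehaviorDefinition:
    pid: str
    methods: FrozenSet[str]            # abstract method names only


@dataclass
class BehaviorMechanism:
    pid: str
    bdef_pid: str                      # the definition this mechanism claims to fulfil
    required_datastreams: FrozenSet[str]
    implementations: Dict[str, Callable[[Datastreams], bytes]]


def check_contracts(bdef: BehaviorDefinition,
                    bmech: BehaviorMechanism,
                    datastreams: Datastreams) -> List[str]:
    """Return contract violations; an empty list means the binding is usable."""
    problems = []
    if bmech.bdef_pid != bdef.pid:
        problems.append(f"mechanism {bmech.pid} does not implement {bdef.pid}")
    # Behavior contract: every defined method needs an implementation.
    for method in sorted(bdef.methods - set(bmech.implementations)):
        problems.append(f"behavior contract: {method} is not implemented")
    # Data contract: every datastream the mechanism reads must exist.
    for ds_id in sorted(bmech.required_datastreams - set(datastreams)):
        problems.append(f"data contract: datastream {ds_id} is missing")
    return problems


# Hypothetical definition, mechanism, and data object content.
bdef = BehaviorDefinition("demo:bdef-text", frozenset({"getLabel", "getFullView"}))
bmech = BehaviorMechanism(
    "demo:bmech-tei", "demo:bdef-text", frozenset({"TEI"}),
    {"getLabel": lambda ds: ds["TEI"][:10]},
)
print(check_contracts(bdef, bmech, {"DC": b"<dc/>"}))
```

Here the check reports one violation of each kind, because `getFullView` has no implementation and the object carries no `TEI` datastream.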
Disseminators for Data Normalization
• Can deliver datastream content directly
• Can transform content into other sizes or formats for delivery
• Can be used to hide differences among objects of a given type
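As a small illustration of hiding differences among objects of a given type, the sketch below stores the same text two ways (plain text in one object, XML in another) and gives each object its own disseminator, so callers see one uniform `get_full_view` call. The PIDs, datastream IDs, and function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def full_view_from_plain(datastreams):
    # The content is already plain text; deliver it directly.
    return datastreams["TEXT"].decode("utf-8")


def full_view_from_xml(datastreams):
    # Transform on delivery: strip the markup and keep only the text.
    root = ET.fromstring(datastreams["XML"])
    return "".join(root.itertext()).strip()


# Two hypothetical objects of the same type with different internal formats.
objects = {
    "demo:plain": ({"TEXT": b"I celebrate myself"}, full_view_from_plain),
    "demo:xml": ({"XML": b"<text><p>I celebrate myself</p></text>"},
                 full_view_from_xml),
}


def get_full_view(pid):
    """Uniform entry point: the caller never sees the storage format."""
    datastreams, disseminator = objects[pid]
    return disseminator(datastreams)


for pid in objects:
    print(pid, "->", get_full_view(pid))
```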
Disseminators as User Interface
• Can deliver a “module” of user interface appropriate for the object
• Different user interfaces for different purposes or audiences
• Easy to add new types of collections by adding new modules of code
• The set of all behavior objects can be used as a database of code modules
• Can provide a way to collect the “look and feel” of scholarly projects in a formal way
Relationships Among Objects
• Relationship metadata datastream in the data object
• Describes adjacency relationships among objects
• RDF data of the form:
PID – typeOfRelationship – relatedObjectPID
• Can be used to assemble collections for such things as creating full-text search indexes
• Can build graphs of relationships to feed into a variety of user interfaces
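The triple form above is enough to assemble a collection by walking the relationship graph. The sketch below uses plain tuples rather than an RDF library; the relationship name `isMemberOf` and all PIDs are assumptions for illustration.

```python
from collections import defaultdict, deque

# (PID, typeOfRelationship, relatedObjectPID); all values are hypothetical.
triples = [
    ("demo:text1", "isMemberOf", "demo:whitman"),
    ("demo:text2", "isMemberOf", "demo:whitman"),
    ("demo:whitman", "isMemberOf", "demo:modern-english"),
    ("demo:issue1", "isMemberOf", "demo:cavalier-daily"),
]


def members_of(collection_pid, triples, relationship="isMemberOf"):
    """Return every object that belongs to the collection, directly or not."""
    # Invert the triples: for each parent, which objects point at it?
    children = defaultdict(list)
    for subject, predicate, related in triples:
        if predicate == relationship:
            children[related].append(subject)

    # Breadth-first walk so nested collections are followed too.
    found, seen, queue = [], {collection_pid}, deque([collection_pid])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


print(members_of("demo:modern-english", triples))
```

The resulting list of PIDs is what a full-text indexer would iterate over, and the same adjacency map could feed a browsing interface.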
The Resource Index
• Uses Resource Description Framework (RDF)
• The repository can be configured to index any combination of the following aspects of a digital object:
– System metadata properties
– Dublin Core metadata
– Metadata about datastreams and disseminations
– Relationship metadata
– Internal dependencies (e.g., between datastreams and disseminators)
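The configurable indexing described for the Resource Index can be pictured as a filter: only the enabled aspects of an object contribute triples to the index. This is a conceptual sketch, not Fedora's configuration format; the aspect keys, property names, and sample object are assumptions.

```python
# Hypothetical keys standing in for the five indexable aspects.
ASPECTS = ("system", "dublin_core", "datastreams", "relationships", "dependencies")


def index_triples(obj, enabled):
    """Emit (PID, property, value) triples for the enabled aspects only."""
    unknown = set(enabled) - set(ASPECTS)
    if unknown:
        raise ValueError(f"unknown aspects: {sorted(unknown)}")
    triples = []
    for aspect in ASPECTS:              # fixed order keeps output predictable
        if aspect in enabled:
            for prop, value in obj.get(aspect, []):
                triples.append((obj["pid"], prop, value))
    return triples


# Hypothetical object description.
sample = {
    "pid": "demo:1",
    "system": [("state", "active")],
    "dublin_core": [("title", "Leaves of Grass")],
    "relationships": [("isMemberOf", "demo:whitman")],
}

print(index_triples(sample, {"dublin_core", "relationships"}))
```

Indexing fewer aspects keeps the index small; indexing more lets queries combine, say, Dublin Core titles with relationship paths.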
[Diagram: search architecture. Interface labels: Integrated Search Interface (Rooms); Art and Arch. Search Interface; Text Search Interface; Finding Aids Search Interface. Source labels: Virgo Catalog; Digital Discovery Index; Electronic Journals and Databases. Each repository object is drawn with PID, Disseminators, System Metadata, Desc Metadata, and Admin Metadata, plus one of: GDMS File; TEI File; EAD File; Image Datastreams (x2); Finding Aids Index; Art and Architecture Index; Text Index.]
UVA First Implementation
Arch Demo
Disseminators
• Two default disseminators on every object
– Default access behaviors, i.e. getPreview, getFullView, getLabel, getDefaultContent
– Administrative and descriptive metadata behaviors
• Class-specific disseminators, i.e. image and text disseminators
• Search services to be provided using collection object disseminators
Text Collections: three models
• TEI transcriptions of texts plus page images
• TEI transcriptions only
• Page images only, but the text represented by a minimal TEI file
Future Fedora Development
• Improved infrastructure for workflow
• Support for building indexes for searching
• Infrastructure for building federations of Fedora repositories
• Enhance performance
• Support for preservation
• Begin organizing a Fedora development consortium
Fedora Project web site: http://www.fedora.info
UVA Digital Initiatives: http://www.lib.virginia.edu/digital