Digital Assets Repository 3.0
PASIG User Group Conference
Noha Adly, Bibliotheca Alexandrina
DAR 3.0
• DAR manages the full lifecycle of a digital asset: its creation, ingestion, metadata management, storage, dissemination, publishing and archival
• An eco-system of components for an integrated institutional repository.
DAR 3.0
• Modular design with integrated components
• Consolidation of assets
• Flexible content model for different types of digital objects based on current standards
• Integration with different sources of metadata, e.g. ILS, repositories, databases, …
• Repository-bound applications
• Preservation
Conceptual Overview
• Digital Assets Factory (DAF) – Flexible management for the digitization workflow
– Unified means of ingestion into the system
– Support both physical and born digital materials
• Digital Assets Metadata (DAM) manages the metadata even in an incomplete state.
• Digital Assets Publishing (DAP) components allow applications to synchronize objects and their metadata stored in their databases/indexes with the repository
• Digital Assets Keeper (DAK) manages access to the object files, versions and caching.
Conceptual Overview
• Collections/Sets:
– DAR manages one instance of the object
– Objects are consolidated into sets/collections
– An object can belong to different sets
– Objects are shared among applications
– Applications synch with repository getting latest updates of their objects
– Applications maintain different derivatives of same object
– Relies on RDF to define sets and relations between objects
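The set model above can be pictured as a small collection of subject-predicate-object triples. The following is a minimal sketch in plain Python, not DAR's actual RDF schema: the `isMemberOf` predicate and all object and set identifiers are assumptions made for illustration.

```python
# Hypothetical predicate and identifiers; the slides do not give DAR's RDF vocabulary.
IS_MEMBER_OF = "isMemberOf"

# One instance of each object, linked to any number of sets.
triples = {
    ("obj:book-1", IS_MEMBER_OF, "set:rare-books"),
    ("obj:book-1", IS_MEMBER_OF, "set:arabic-literature"),
    ("obj:photo-7", IS_MEMBER_OF, "set:rare-books"),
}


def members_of(set_id, store):
    """Return the objects that belong to a given set."""
    return sorted(s for (s, p, o) in store if p == IS_MEMBER_OF and o == set_id)


def sets_of(object_id, store):
    """Return every set that a single object instance belongs to."""
    return sorted(o for (s, p, o) in store if p == IS_MEMBER_OF and s == object_id)
```

Because membership is a relation rather than a copy, the same object can appear in several sets while the repository still holds one instance of it.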
Conceptual Overview
• Discovery layer
– Core files are kept online on spinning drives
– Simple derivatives for display
– Users can browse and search using simple viewers
– Provides full text search across the whole collection, based on the access rights granted to the user.
• Ingestion plugins
– Flexible integration with different sources of metadata
– Allow ingestion and synchronization with external sources
Digital Assets Factory (DAF)
• Full control over the digitization process workflow
• Configurable and flexible management tool for any digitization workflow
• Flexible workflow definition including
– Definition of sequence of phases
– Pre-phase and post-phase checks
– Redirects
• Special workflows are defined for different object types
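A workflow built from a sequence of phases, pre-phase and post-phase checks, and redirects could be modelled roughly as follows. This is an illustrative sketch only, not DAF's configuration format: the `Phase` class, the `redirect_on_failure` field, the phase names, and the `attempts` counter are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

Check = Callable[[dict], bool]


@dataclass
class Phase:
    name: str
    pre_checks: List[Check] = field(default_factory=list)
    post_checks: List[Check] = field(default_factory=list)
    # Hypothetical redirect: phase to jump back to when a post-phase check fails.
    redirect_on_failure: Optional[str] = None


def run_workflow(phases: List[Phase], job: dict, max_steps: int = 20) -> List[str]:
    """Walk the phases in order and return the names of the phases visited."""
    index = {p.name: i for i, p in enumerate(phases)}
    visited: List[str] = []
    i = 0
    # max_steps is a guard against endless redirect loops.
    while i < len(phases) and len(visited) < max_steps:
        phase = phases[i]
        if not all(check(job) for check in phase.pre_checks):
            raise RuntimeError(f"pre-phase check failed before {phase.name}")
        visited.append(phase.name)
        attempts = job.setdefault("attempts", {})
        attempts[phase.name] = attempts.get(phase.name, 0) + 1
        if all(check(job) for check in phase.post_checks):
            i += 1
        elif phase.redirect_on_failure is not None:
            i = index[phase.redirect_on_failure]
        else:
            raise RuntimeError(f"post-phase check failed after {phase.name}")
    return visited


# Example: quality assurance sends the job back to scanning once.
phases = [
    Phase("scan"),
    Phase("ocr"),
    Phase("qa",
          post_checks=[lambda job: job["attempts"]["scan"] >= 2],
          redirect_on_failure="scan"),
]
visited = run_workflow(phases, {})
```

A different object type would simply be given a different list of phases.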
Digital Assets Factory (DAF)
• Automated integrity checks at each step of the workflow.
• Automated ingestion into the repository and archiving.
• Integrates with external sources of metadata through plugins
• Integrates with enterprise tools and automated software used for digitization
• Compliant with OAIS
• Available for download at http://wiki.bibalex.org/DAFWiki
Metadata Management
• METS and MODS standards for recording metadata
• Fedora as a metadata registry
• Content Models (Hybrid)
– Photo (atomistic) / Album (aggregate)
– Book (compound) / Bibliographic (aggregate)
Triple Store and Handles
• Triple Store
– RDF relations between objects are stored in a triple store
– Currently using Mulgara
– Scalability issues
– Alternatives: 4Store? Integration with Fedora
• Handles
– Each object has a unique identifier (UUID)
– The UUID is used to generate a Handle
– A list of external identifiers is maintained
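One plausible way to derive a Handle from a UUID is to use the UUID as the suffix after a naming-authority prefix. The slides do not say how DAR does this, so the sketch below is an assumption throughout: the prefix `12345`, the record layout, and the sample external identifier are all hypothetical.

```python
import uuid

# Hypothetical Handle prefix; the naming authority DAR uses is not stated in the slides.
HANDLE_PREFIX = "12345"


def make_handle(object_uuid: uuid.UUID, prefix: str = HANDLE_PREFIX) -> str:
    """Build a Handle of the form prefix/suffix, using the UUID as the suffix."""
    return f"{prefix}/{object_uuid}"


object_uuid = uuid.uuid4()
record = {
    "uuid": str(object_uuid),
    "handle": make_handle(object_uuid),
    # External identifiers kept alongside the Handle (illustrative value).
    "external_ids": {"ils": "b1234567"},
}
```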
METS Store
• A METS skeleton is created for each object even if metadata is incomplete
• When metadata is complete, it is sent to Fedora and disseminated
• Accommodates digitizing objects before metadata is ready
• METS store can be used to reconstruct Fedora
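The idea of a skeleton that exists before the metadata is ready can be sketched with the standard library's XML tools. This is a simplified illustration, not DAR's METS profile: the output is not validated against the METS schema, and the completeness rule (a non-empty descriptive section) is an assumption.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)


def mets_skeleton(object_id: str) -> ET.Element:
    """Create a METS document whose sections are still empty."""
    root = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": object_id})
    ET.SubElement(root, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})  # descriptive metadata, empty for now
    ET.SubElement(root, f"{{{METS_NS}}}fileSec")                  # files produced by digitization
    ET.SubElement(root, f"{{{METS_NS}}}structMap")                # structure of the object
    return root


def is_complete(mets: ET.Element) -> bool:
    """Assumed rule: metadata is complete once the descriptive section has content."""
    dmd = mets.find(f"{{{METS_NS}}}dmdSec")
    return dmd is not None and len(dmd) > 0
```

Under this sketch, digitization can attach files to the skeleton right away, and a check such as `is_complete` decides when the record is ready to be sent on.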
Metadata Synchronization
– External sources
• Synchronization is based on XML templates
• Templates map the output of ILS or DB into MODS
• Templates can be easily created for different sources
– Metadata Tool
• No source of info to extract metadata
• Relies on human data entry (normal users)
• Generates human-friendly forms through configurable XML templates
• Offers type validation, controlled vocabulary, authority lists
• Metadata is synchronized with METS store
• Allows full text search (Solr) across items in sets/collections
• Represents objects in a hierarchy depicting sets/collections
• Supports simple workflow with designated roles e.g. editors, reviewers, etc.
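The template-based mapping from an external source into MODS, described for external sources above, might look like the sketch below. DAR's templates are XML; here a Python dictionary stands in for one. The MARC-style source field names and the choice of MODS elements are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical template: source field name -> path of nested MODS elements.
# MARC-style tags stand in for whatever the ILS or database exports.
TEMPLATE = {
    "245a": ["titleInfo", "title"],
    "100a": ["name", "namePart"],
    "260c": ["originInfo", "dateIssued"],
}


def to_mods(source_record: dict, template: dict = TEMPLATE) -> ET.Element:
    """Map the fields of a source record into a MODS element tree."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for source_field, path in template.items():
        value = source_record.get(source_field)
        if value is None:
            continue  # missing fields are skipped, so incomplete records still map
        node = mods
        for tag in path:
            node = ET.SubElement(node, f"{{{MODS_NS}}}{tag}")
        node.text = value
    return mods
```

Supporting a new source then amounts to writing another template rather than new code.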
Copyright and Access Module
• Access control policy for specific sets or objects
• Can define rights to certain operations (e.g. view, print, download, etc.) based on the application requesting access
• Can define exceptions to override rules (e.g. prevent a certain object from being displayed)
• Coordinate access to objects based on the number of licenses
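The combination of per-set rules, per-application operations, overriding exceptions, and license counts can be sketched as one small policy object. The class, its fields, and the identifiers are hypothetical; the slides do not describe how the module is implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class AccessPolicy:
    # (application, set_id) -> operations allowed on objects of that set
    rules: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    # (application, object_id, operation) combinations denied regardless of the rules
    exceptions: Set[Tuple[str, str, str]] = field(default_factory=set)
    # object_id -> number of concurrent licenses; unlisted objects are unlimited
    licenses: Dict[str, int] = field(default_factory=dict)
    in_use: Dict[str, int] = field(default_factory=dict)

    def is_allowed(self, app: str, set_id: str, object_id: str, operation: str) -> bool:
        """Exceptions override rules; otherwise the set-level rule decides."""
        if (app, object_id, operation) in self.exceptions:
            return False
        return operation in self.rules.get((app, set_id), set())

    def acquire(self, app: str, set_id: str, object_id: str, operation: str) -> bool:
        """Grant access only if it is permitted and a license is free."""
        if not self.is_allowed(app, set_id, object_id, operation):
            return False
        limit = self.licenses.get(object_id)
        used = self.in_use.get(object_id, 0)
        if limit is not None and used >= limit:
            return False
        self.in_use[object_id] = used + 1
        return True

    def release(self, object_id: str) -> None:
        """Give a license back when the user is done with the object."""
        self.in_use[object_id] = max(0, self.in_use.get(object_id, 0) - 1)
```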
Authentication and Authorization
• Single Sign On module
• Set management and ACLs
• LDAP integration and local users
Digital Assets Keeper
• Keep a working copy of the object online
• Maintain a unique copy of the object with persistent identifier
• Handle entries and external identifiers
• A storage abstraction layer isolates the repository from the storage implementation
• Manages different versions of items
• Manages caching and derivatives
• Load balancing among nodes
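A storage abstraction layer with versioning could be structured as in the sketch below: the repository talks to an interface, and the backend behind it can be swapped. The class names, method names, and key layout are assumptions, and an in-memory dictionary stands in for real storage.

```python
from abc import ABC, abstractmethod
from typing import Dict


class StorageBackend(ABC):
    """Interface the repository codes against; backends hide where the bytes live."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Toy backend for illustration; a real one would write to disk or an archive."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class Keeper:
    """Keeps every version of an object's file under its identifier."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend
        self._versions: Dict[str, int] = {}

    def store(self, object_id: str, data: bytes) -> int:
        """Store a new version and return its version number."""
        version = self._versions.get(object_id, 0) + 1
        self._backend.put(f"{object_id}/v{version}", data)
        self._versions[object_id] = version
        return version

    def fetch(self, object_id: str, version: int = 0) -> bytes:
        """Fetch a given version, or the latest one when version is 0."""
        version = version or self._versions[object_id]
        return self._backend.get(f"{object_id}/v{version}")
```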
Online Archive (OnA)
• Complete hardware and software solution for archival
• Provides reliable and scalable storage
• Based on commodity hardware with spinning hard drives
• Uses in-house developed software for data management
• Any AIP ingested is mirrored at least once
• Heavily relies on Checksums to ensure the integrity of the data
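Mirroring with checksum verification can be illustrated with the standard library. The slides do not say which checksum algorithm OnA uses, so SHA-256 here is an assumption, as are the function names and the file layout.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mirror_and_verify(source: Path, mirror_dir: Path) -> str:
    """Copy a file to a mirror location and confirm that both copies match."""
    expected = sha256_of(source)
    mirror_dir.mkdir(parents=True, exist_ok=True)
    copy = Path(shutil.copy2(source, mirror_dir / source.name))
    if sha256_of(copy) != expected:
        raise IOError(f"checksum mismatch for {copy}")
    return expected


# Example with a throwaway directory standing in for two storage nodes.
demo_dir = Path(tempfile.mkdtemp())
aip = demo_dir / "aip.bin"
aip.write_bytes(b"example archival package")
checksum = mirror_and_verify(aip, demo_dir / "mirror")
```

Storing the returned checksum next to the package lets later audits recompute it and detect silent corruption on either copy.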
Digital Assets Publishing (DAP)
• Different viewers and applications are built using the RESTful API
• Applications are highly integrated with repository; not separate silos: Repository-bound
• DAR manages one instance of each object
• Applications have access to slice of the data (Sets of Objects) based on their access rights
• Applications sync with DAR by querying the API for new or updated metadata and files
• Applications maintain different derivatives independently
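The synchronization pattern, in which an application periodically asks the repository for anything new or updated within its own sets, can be sketched without a network. `FakeRepository` is a stand-in for the RESTful API, whose real endpoints are not described in the slides; the modification counter and all names are assumptions.

```python
from typing import Dict, List


class FakeRepository:
    """In-memory stand-in for the repository API (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}
        self._clock = 0  # simple modification counter instead of timestamps

    def save(self, object_id: str, set_id: str, metadata: dict) -> None:
        self._clock += 1
        self._objects[object_id] = {
            "id": object_id, "set": set_id,
            "metadata": metadata, "modified": self._clock,
        }

    def changed_since(self, marker: int, allowed_sets: set) -> List[dict]:
        """Return objects modified after the marker, limited to the caller's sets."""
        changed = [o for o in self._objects.values()
                   if o["modified"] > marker and o["set"] in allowed_sets]
        return sorted(changed, key=lambda o: o["modified"])


class Application:
    """A repository-bound application that keeps its own local index."""

    def __init__(self, repo: FakeRepository, allowed_sets: set) -> None:
        self.repo = repo
        self.allowed_sets = set(allowed_sets)
        self.local_index: Dict[str, dict] = {}
        self.marker = 0  # last modification counter already processed

    def sync(self) -> int:
        """Pull new or updated objects and return how many were processed."""
        changes = self.repo.changed_since(self.marker, self.allowed_sets)
        for obj in changes:
            self.local_index[obj["id"]] = obj["metadata"]
            self.marker = obj["modified"]
        return len(changes)
```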
Discovery Layer
• Stores simple derivatives for all objects
• Users can browse and search all assets stored within using simple viewers.
• Provides full text search across the metadata and textual content, based on the access rights granted to the user.
• Full text search is built on Solr with support for 5 languages: Arabic, English, French, Spanish and Italian
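Restricting a Solr search to what a user may see is commonly done with a filter query. The sketch below builds the query string for a Solr select request using the standard `q`, `fq`, `hl`, `hl.fl` and `wt` parameters. The field names (`text_<language>`, `set_id`) are hypothetical, and the search text is passed through without escaping Solr's special characters.

```python
from urllib.parse import urlencode


def build_search_params(text: str, user_sets: list, language: str = "en") -> str:
    """Build a Solr query string restricted to the sets a user may access."""
    # Hypothetical schema: one text field per language and a set_id field.
    allowed = " OR ".join(f'"{s}"' for s in sorted(user_sets))
    params = [
        ("q", f"text_{language}:({text})"),   # search text is not escaped in this sketch
        ("fq", f"set_id:({allowed})"),        # filter query enforces access rights
        ("hl", "true"),                       # ask Solr for highlighted snippets
        ("hl.fl", f"text_{language}"),        # field to highlight
        ("wt", "json"),
    ]
    return urlencode(params)
```

The resulting string would be appended to the select handler URL of whichever Solr core holds the index.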
Current Status
• More than 430,000 objects including
– Books
– Photos
– Manuscripts
– Maps
– Documents
• Specialized viewers have been built to display items stored within the repository, such as books and photos.
• More viewers are still under development, e.g. a tiled image viewer and a manuscript viewer.
• Print on demand (POD) integration layer makes part of DAR available through the POD system.
• Several interfaces can also be built on top of this API to integrate DAR with other systems.
DAR Books
– Application built on top of DAR using the RESTful API
– Displays books stored in the repository (185,000)
– Faceted search, including content
• Morphological full text search (5 languages)
• Search results highlighting
• Embeddable book viewer that can be added to any webpage
• Whenever a book is added to or updated in DAR, it is automatically retrieved by DAR Books
DAR Books
• Annotation Tools
– Sticky notes
– Highlight and underline, colors
– More to come…
• Open Annotations, Annotea, etc.
• Web 2.0 Social Features:
– Rating and comments
– Create your own bookshelves
– Sharing and embedding
– Adding to other social sites: Facebook, Twitter,…
Text Highlighting
Text Underlining
Adding Sticky Notes
Future Work
• Enhance the Storage Layer: exploring iRODS, pair trees, etc.
• Extending the Copyright and Access module
• Explore the potential of triple stores
– Beyond defining sets and collections
– Scalability
• Migrating existing applications to be repository-bound
Thank You