DuraSpace: Digital Information All Ways, Always. Pretoria, South Africa, May 14th, 2009.

Page 1:

DuraSpace: Digital Information

All Ways, Always

Pretoria, South Africa

May 14th, 2009

Page 2:

DuraSpace, Inc.

• 501(c)(3) private, non-profit company

• 4-year project funded by Moore Foundation to become self-sustaining

• Continuing software development

• Moving towards community-based software development

• Establishing “solution communities” for the development of solution bundles

Page 3:

DuraSpace Products

• Fedora

• DSpace

• DuraCloud

• Akubra – storage plug-in module with transactional file system

• Mulgara – RDF indexing engine

• Topaz – core semantic knowledgebase components

Page 4:

Solution Communities

• Community group that creates and maintains the vision for a solution bundle in an area

• Gathers resources to create software for the solution

• Coordinates development with DuraSpace technical staff

• Smaller group that gets things done will emerge

Page 5:

Solution Areas

• Data Curation

• Open Access Publishing

• Integration Services

• Preservation and Archiving

• Small Archives

• Scholars’ Workbench

Page 6:

Other Possible Community Groups

• Other software development groups

• News and Publications Outreach group that works with our Communications Director

• Issue/advocacy groups that work on standards important to the community

Page 7:

The Flexible Extensible Digital Object Repository Architecture

• A set of abstractions that can be used to represent different kinds of data

• A repository management system

• A foundation for many information management applications

• Designed to make data “durable” over the long term

Page 8:

154 Current Known Users

• Broadcasting and media – 1

• Consortia – 7

• Corporations – 14

• Government agencies – 4

• IT-Related Institutions – 9

• Medical Centers and Libraries – 4

• Museums and Cultural Organizations – 5

• National Libraries and Archives – 16

• Professional Societies – 2

• Publishing – 4

• Research Groups and Projects – 17

• Semantic and Virtual Library Projects – 6

• University Libraries and Archives – 66

Page 9:

The Flexible Extensible Digital Object Repository Architecture

• A set of abstractions that can be used to represent different kinds of data

• A repository management system

• A foundation for many information management applications

• Designed to make data “durable” over the long term

• The key to using Fedora is in the data modeling

Page 10:

Making complex digital information “durable” is a very hard problem

• The existence and meaning of content need to be verifiable as technologies change

• A history of the changes to the encoding and state of content must be reliably provided

• A meaningful context for any unit of content may be one of many and must be sustained

• Complex resources will increasingly be dispersed across institutional boundaries.

Page 11:

The Fedora abstractions provide a durability framework.

• Content is “unitized” as information objects that combine data, metadata, policies, relationships and the history of the object.

• Complex digital resources are formally defined graphs of related objects.

• The public view of the content is presented as abstract behaviors.

• The web services orientation of Fedora provides the basis for repository federation.

Page 12:

Abstract Data Management (Fedora Commons Service Framework)

Preservation and Archiving Solutions

Data Curation Solutions

Scholars' Repository

Publishing Solutions

Tape Libraries

SAN

RAID array

Page 13:

A data object is one unit of content

Persistent ID

Reserved Datastreams: DC, RELS-EXT, AUDIT, POLICY

Custom Datastreams (any type, any number): 1, 2, … n
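
The data object on this slide can be sketched as a small data structure. This is only an illustration in Python, not Fedora's actual implementation or API: the class names, the `demo:1` identifier, and the `IMAGE` datastream are assumptions made up for the example; only the four reserved datastream IDs come from the slide.

```python
from dataclasses import dataclass, field

# Reserved datastream IDs named on the slide.
RESERVED_IDS = ("DC", "RELS-EXT", "AUDIT", "POLICY")


@dataclass
class Datastream:
    """One stream of content or metadata inside an object (hypothetical type)."""
    ds_id: str
    mime_type: str
    content: bytes = b""


@dataclass
class DataObject:
    """One unit of content: a persistent ID plus any number of datastreams."""
    pid: str
    datastreams: dict = field(default_factory=dict)

    def add(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def reserved(self) -> list:
        # Datastreams whose IDs are in the reserved set, in insertion order.
        return [i for i in self.datastreams if i in RESERVED_IDS]

    def custom(self) -> list:
        # Everything else: any type, any number.
        return [i for i in self.datastreams if i not in RESERVED_IDS]


# Illustrative object; the PID and datastream contents are made up.
obj = DataObject(pid="demo:1")
obj.add(Datastream("DC", "text/xml", b"<dc/>"))
obj.add(Datastream("RELS-EXT", "application/rdf+xml"))
obj.add(Datastream("IMAGE", "image/jpeg"))
```

Keeping reserved and custom datastreams in one mapping, and telling them apart only by ID, mirrors the slide's point that an object is a single unit regardless of how many streams it carries.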

Page 14:

Relationships Among Objects

• Describes adjacency relationships among objects

• RDF data of the form: PID – typeOfRelationship – relatedObjectPID

• Can be used to assemble aggregations of objects

• Can build graphs of relationships to feed into user interfaces
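
A minimal sketch of the PID – typeOfRelationship – relatedObjectPID form, using plain Python tuples rather than an RDF library. The PIDs and relationship names below are made-up examples, and the helper functions are hypothetical; the sketch only shows how such triples can be queried to assemble an aggregation and turned into a graph for a user interface.

```python
from collections import defaultdict

# Each triple is (subject PID, relationship type, related object PID).
# All values are illustrative, not taken from a real repository.
triples = [
    ("demo:page1", "isMemberOf", "demo:book1"),
    ("demo:page2", "isMemberOf", "demo:book1"),
    ("demo:book1", "isMemberOfCollection", "demo:collection1"),
]


def members_of(parent_pid, triples, relationship="isMemberOf"):
    """Assemble an aggregation: every object related to parent_pid."""
    return sorted(s for s, p, o in triples if p == relationship and o == parent_pid)


def adjacency(triples):
    """Build a graph: PID -> list of (relationship, related PID) edges."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return dict(graph)


book_pages = members_of("demo:book1", triples)
graph = adjacency(triples)
```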

Page 15:

Objects Representing Aggregations

• Creating parent objects for complex resources

• Representing explicit collections

• Representing implicit collections

• Creating digital surrogates for physical entities

Page 16:

The Rossetti Archive

[Diagram of related objects in the archive; legible node labels: Text, Artwork, Work, Images]

Page 17:

Optional Object Behaviors

• Data objects can have different views or transformations

• Sets of abstract behaviors that different kinds of objects can subscribe to

• Corresponding sets of services that specific objects can execute

• The business logic is hidden behind an abstraction
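
One way to picture these bullets: callers ask for an abstract behavior by name, and a registry picks the concrete service for that kind of object. The sketch below is an assumption-laden illustration, not Fedora's dissemination mechanism; the behavior name `getThumbnail`, the object kinds, and the `invoke` helper are all hypothetical.

```python
from typing import Callable, Dict

# Abstract behavior name -> {object kind -> concrete service}.
# Every name here is hypothetical and exists only for illustration.
SERVICES: Dict[str, Dict[str, Callable[[str], str]]] = {
    "getThumbnail": {
        "image": lambda pid: f"thumbnail of image {pid}",
        "book": lambda pid: f"thumbnail of cover page for {pid}",
    },
}


def invoke(behavior: str, kind: str, pid: str) -> str:
    """Run a behavior without the caller knowing which service implements it."""
    try:
        service = SERVICES[behavior][kind]
    except KeyError:
        raise LookupError(f"{kind!r} objects do not subscribe to {behavior!r}") from None
    return service(pid)


image_view = invoke("getThumbnail", "image", "demo:img1")
book_view = invoke("getThumbnail", "book", "demo:book1")
```

The caller uses the same behavior name for both objects; which service runs is decided behind the abstraction.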

Page 18:

Pid

System Meta

MODS

JP2000

Thumb, Screen, Master, Custom Size, Dublin Core, MODS, Citation

MODSFile

JPEG200File

ContentAccess

ContentManagement

Page 19:

Content Models

• Create classes of data objects

• Expressed as Cmodel objects

• A Cmodel object defines the number and types of data streams for objects of that class

• A Cmodel object binds to service objects to enable appropriate behaviors to be inherited by data objects
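
The idea of a content model as a class definition for data objects can be sketched as a conformance check. This is a simplified illustration under stated assumptions: the `ContentModel` type, the `conforms` function, the PIDs, and the datastream IDs are invented for the example and do not reproduce how Fedora stores or validates Cmodel objects.

```python
from dataclasses import dataclass


@dataclass
class ContentModel:
    """Hypothetical stand-in for a Cmodel object."""
    pid: str
    required: dict      # datastream ID -> expected MIME type
    behaviors: tuple    # behavior names that member objects inherit


def conforms(datastreams: dict, model: ContentModel) -> list:
    """Return a list of problems; an empty list means the object conforms."""
    problems = []
    for ds_id, mime in model.required.items():
        if ds_id not in datastreams:
            problems.append(f"missing datastream {ds_id}")
        elif datastreams[ds_id] != mime:
            problems.append(f"{ds_id} has type {datastreams[ds_id]}, expected {mime}")
    return problems


# Illustrative model and objects (datastream ID -> MIME type).
image_model = ContentModel(
    pid="demo:imageCModel",
    required={"MODS": "text/xml", "JP2": "image/jp2"},
    behaviors=("getThumbnail",),
)
good_object = {"MODS": "text/xml", "JP2": "image/jp2"}
bad_object = {"MODS": "text/xml"}

good_report = conforms(good_object, image_model)
bad_report = conforms(bad_object, image_model)
```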

Page 20:

Fedora Framework Service Integration

Fedora Repository Service

Simple JMS

Services: GSearch, OAI, Ingest, More…

Repository publishes events

Services listen and consume events or other messages
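
The publish/listen pattern on this slide can be sketched with a tiny in-process event bus. This stands in for the messaging layer purely for illustration; it is not JMS and not Fedora's messaging API. The `EventBus` class, the `ingest` topic, and the event fields are assumptions made up for the example.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe hub (hypothetical stand-in for a message broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a service callback for a topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Deliver the event to every handler subscribed to the topic.
        for handler in self._subscribers[topic]:
            handler(event)


bus = EventBus()
indexed, harvested = [], []

# Two independent services listen for the same repository event.
bus.subscribe("ingest", lambda e: indexed.append(e["pid"]))    # e.g. a search indexer
bus.subscribe("ingest", lambda e: harvested.append(e["pid"]))  # e.g. a harvesting provider

# The repository publishes one event; both services consume it.
bus.publish("ingest", {"pid": "demo:1"})
```

The repository does not need to know which services exist; adding another listener requires no change on the publishing side.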

Page 21:

Current Work… early seeds for DuraCloud concept

Shared Storage Abstraction

Plug-in 1, Plug-in 2, Plug-in …

Amazon, University, SAN/Fabric

Local Storage
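
A storage abstraction with plug-ins can be sketched as one interface with interchangeable back ends. The code below is a hedged illustration only: `StoragePlugin`, `InMemoryStore`, and `SharedStorage` are hypothetical names, the in-memory stores stand in for real providers, and nothing here reflects the actual Akubra or DuraCloud interfaces.

```python
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Interface every storage back end implements (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryStore(StoragePlugin):
    """Stand-in for a real back end such as local disk or a cloud provider."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]  # raises KeyError if the key is absent


class SharedStorage:
    """Writes to every plug-in; reads from the first plug-in that has the key."""

    def __init__(self, plugins):
        self.plugins = list(plugins)

    def put(self, key, data):
        for plugin in self.plugins:
            plugin.put(key, data)

    def get(self, key):
        for plugin in self.plugins:
            try:
                return plugin.get(key)
            except KeyError:
                continue
        raise KeyError(key)


local = InMemoryStore("local")
cloud = InMemoryStore("cloud")
storage = SharedStorage([local, cloud])
storage.put("demo:1/DC", b"<dc/>")
```

Because callers only see `SharedStorage`, a back end can be added or swapped without touching the code that stores and retrieves content.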

Page 22:

DuraCloud: Possible Evolution…

DuraCloud Instance

Chinese Menu of DuraCloud Services

- Group A: replicate, monitor, audit, migrate

- Group B: aggregate, relate, link

Shared Storage Abstraction

Plug-in 1, Plug-in 2, Plug-in …

Microsoft, Amazon, Google, Internet Archive, University IT, IntraCloud or local store

Page 23:

http://www.duraspace.org/

http://www.fedora-commons.org/