“Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Stephen Abrams, Patricia Cruse, John Kunze, David Loy. California Digital Library, University of California. OR 2009: The 4th International Conference on Open Repositories, Atlanta, May 18–21, 2009.


Page 1: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

“Where are We From, Where Are We Going?”

Permanent Objects, Disposable Systems

Stephen Abrams, Patricia Cruse, John Kunze, David Loy

California Digital Library, University of California

OR 2009: The 4th International Conference on Open Repositories Atlanta, May 18–21, 2009

Page 2: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

D'où venons nous, que sommes nous, où allons nous?

Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

Page 3: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Where are we from, what are we, where are we going?

Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

Page 4: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Where [is our stuff] from, what [is it], where are we going [with it]?

Paul Gauguin, 1897-98, Museum of Fine Arts Boston, 32.270

Page 5: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Where From? What? Where To?

Producer Archive Consumer

Page 6: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Where From? What? Where To?

Producer

Ingest

Archive

Data management / archival storage

Consumer

Access / preservation planning

Page 7: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Where From? What? Where To?

Producer

Ingest

Provenance

Archive

Data management / archival storage

Characterization

Consumer

Access / preservation planning

View paths

Page 8: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Repository Landscape

Increasing diversity in types and uses of content

Content arising from non-library contexts

Inevitable technological change

Page 9: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Design Goals

Devolve repository function into a set of independent, but interoperable, services

– Since each is small and self-contained, they are more easily developed and maintained

– Since the level of investment is lower, they are more easily replaced

Provide complex function through the flexible combination of atomistic services

Page 10: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Design Goals

Support interaction through procedural APIs, command line applications, and web interfaces

– Let content managers and curators interact with the services without requiring changes to existing work practices

Rather than force content to come to the services, push the services out to the content

– Easy deployment centrally or locally, either independently or in strategic combinations

Page 11: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Design Goals

Defer implementation decision making until needs and outcomes are clearly articulated

– Requirements are first stated as sets of values and strategies that promote those values

– Strategies are then embodied as abstract services, and, finally, instantiated in technical systems

Page 12: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Object-Centric Values and Strategies

Value | Justification | Strategy
Identity | To distinguish an object from others | Persistent naming, actionable resolution
Viability | To recover an object from its medium | Redundancy, heterogeneity, media refresh
Fixity | To ensure that an object is unchanged from its accepted state | Redundancy, error-correcting codes, message digests
Authenticity | To ensure that an object is what it purports to be | Provenance, cryptographically-secure signatures
Ontology | To understand an object’s significant nature | Syntactic, semantic, and pragmatic characterization
Visibility | To enable users to find objects of interest | Public discovery systems and registries, exposure for web harvest
Utility | To expose an object’s underlying content | Behavior-rich delivery
Portability | To facilitate content sharing and succession planning | Self-contained, self-documenting objects, packaging standards
Appraisement | To understand the consequences of time | Analysis and assessment
Timeliness | To know when a preservation value is threatened | Technology watch, stakeholder engagement
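The "message digests" strategy listed for Fixity can be illustrated with a minimal sketch. It assumes SHA-256 as the digest algorithm, and the helper names `file_digest` and `is_unchanged` are hypothetical; the slides do not name an algorithm or an API.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as stream:
        for chunk in iter(lambda: stream.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, accepted_digest: str) -> bool:
    """True if the file still matches the digest recorded at its accepted state."""
    return file_digest(path) == accepted_digest
```

Recording the digest at ingest and recomputing it on a schedule is one way to detect that an object has drifted from its accepted state; the redundancy strategy in the same row then supplies a good copy to repair from.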

Page 13: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Service-Centric Values and Strategies

Value | Justification | Strategy
Availability | To provide access at a time of user choosing | Redundancy, automated failover
Responsivity | To provide appropriate throughput | Redundancy, load-balancing
Security | To enforce appropriate use of services and content | Cryptographically-secure identity and role management
Interoperability | To facilitate creative reuse of content and services | Standard interfaces
Extensibility | To enable graceful evolution over time | Granularity, orthogonality, virtualization
Trustworthiness | To promote users’ sense of predictability and reliability | Transparency, audit, certification
Sustainability | To ensure ongoing access and use | Commodity components, institutional commitment, financial cost-recovery, professional development

Page 14: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Micro-Services

Interoperation

Publication Annotation

Application Ingest Index Search Transformation

Replication Identity Storage Fixity Replication

Interpretation

Catalog Characterization

Value

Service

Context

State

Curation

Preservation

“Lots of uses keeps stuff valuable”

“Lots of services keeps stuff useful”

“Lots of description keeps stuff meaningful”

“Lots of copies keeps stuff safe”

Sufficiency

Necessity

Page 15: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Design Process

What are the conceptual entities underlying the service?

What are their state properties?

What are their behaviors?

Page 16: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Storage Service

Storage service

– An aggregation of storage nodes

Storage node

– A particular configuration of object storage

Object

– An aggregation of files over time

Version

– A particular configuration of files at a point in time

File

– A formatted bit stream
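The five entities above can be sketched as a small data model. This is only an illustration of the containment relationships named on the slide; the attribute names, the in-memory representation, and the `add_version` helper are assumptions, not the service's actual design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class File:
    """A formatted bit stream."""
    name: str
    content: bytes


@dataclass(frozen=True)
class Version:
    """A particular configuration of files at a point in time."""
    files: tuple[File, ...]


@dataclass
class StoredObject:
    """An aggregation of files over time, held as an ordered list of versions."""
    identifier: str
    versions: list[Version] = field(default_factory=list)

    def add_version(self, files: tuple[File, ...]) -> int:
        """Append a new version and return its 1-based number."""
        self.versions.append(Version(files))
        return len(self.versions)


@dataclass
class StorageNode:
    """A particular configuration of object storage."""
    objects: dict[str, StoredObject] = field(default_factory=dict)


@dataclass
class StorageService:
    """An aggregation of storage nodes."""
    nodes: dict[str, StorageNode] = field(default_factory=dict)
```

Versions and files are frozen here because a version is a configuration fixed at a point in time; only the object grows, by gaining versions.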

Page 17: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Storage Service Methods

Help [idempotent, safe]

Get-state [idempotent, safe]

Get-node-state [idempotent, safe]

Get-object-state [idempotent, safe]

Get-object [idempotent, unsafe]

Get-version-state [idempotent, safe]

Get-version [idempotent, unsafe]

Get-file-state [idempotent, safe]

Get-file [idempotent, unsafe]

Add-version [non-idempotent, unsafe]
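One use of the idempotent/safe annotations is to let a client decide mechanically whether a failed call may be reissued. The sketch below copies the traits from the list above; the `MethodTraits` type, the `STORAGE_METHODS` table, and the `may_retry` rule are illustrative assumptions.

```python
from typing import NamedTuple


class MethodTraits(NamedTuple):
    idempotent: bool
    safe: bool


# Traits as annotated on the slide; the table structure itself is hypothetical.
STORAGE_METHODS = {
    "Help": MethodTraits(True, True),
    "Get-state": MethodTraits(True, True),
    "Get-node-state": MethodTraits(True, True),
    "Get-object-state": MethodTraits(True, True),
    "Get-object": MethodTraits(True, False),
    "Get-version-state": MethodTraits(True, True),
    "Get-version": MethodTraits(True, False),
    "Get-file-state": MethodTraits(True, True),
    "Get-file": MethodTraits(True, False),
    "Add-version": MethodTraits(False, False),
}


def may_retry(method: str) -> bool:
    """An idempotent call can be repeated after a timeout with the same outcome."""
    return STORAGE_METHODS[method].idempotent
```

Under this rule only Add-version needs extra care on failure, since repeating it could create a second version.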

Page 18: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Storage Service Interfaces

METHOD Get-File-State [idempotent, safe]

Node | Identifier | Mandatory | Node identifier
Object | Identifier | Mandatory | Object identifier
Version | Identifier | Optional | Version identifier
File | Identifier | Mandatory | File identifier
ResponseForm | Enum | Optional | Response form
RETURN | Response form | Mandatory | File state

Web interface:

GET /node/object/version/file?m=state HTTP/1.1
Accept: application/json

Command line:

% store –get node/object/version/file –m state –f JSON

Procedural API:

File.getState("node/object/version/file", Format.JSON);
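The mapping from the abstract Get-file-state parameters to the web form can be sketched as a small request builder. The function name `file_state_request` is hypothetical, and the handling of an omitted version (the segment is simply left out of the path) is an assumption; the slide shows only the fully specified form.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def file_state_request(node: str, obj: str, file: str,
                       version: Optional[str] = None,
                       media_type: str = "application/json"
                       ) -> tuple[str, dict[str, str]]:
    """Build the request target and headers for a Get-file-state call."""
    for name, value in (("node", node), ("object", obj), ("file", file)):
        if not value:
            raise ValueError(f"{name} identifier is mandatory")
    # Assumption: an omitted (optional) version is left out of the path.
    parts = [node, obj] + ([version] if version else []) + [file]
    path = "/" + "/".join(quote(part, safe="") for part in parts)
    return f"{path}?{urlencode({'m': 'state'})}", {"Accept": media_type}
```

Calling `file_state_request("node", "object", "file", "version")` yields the request target and Accept header shown in the web form of the interface.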

Page 19: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Storage Service Implementation

ca. 1989

– FTP

– POSIX

– SQL

ca. 2029?

– HTTP

– URI

– XML

Due to their inherent abstracting nature, protocols and interfaces last longer than systems

Page 20: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Storage Service Implementation

Using the file system as the controlling managerial abstraction, what is the thinnest smear of additional functionality that will make it an effective object store?

– Namaste

– CAN

– Pairtree

– Dflat

– ReDD

Page 21: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Name As Text (Namaste) Tags

Directory-level signature files extending Dublin Core Kernel metadata

– [ Tag h0 ] 0=name_version

– Who h1 1=who

– What h2 2=what

– When h3 3=when

– Where h4 4=where
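A Namaste tag is an ordinary file whose name carries the information, in the `digit=value` form listed above. The following sketch writes such files. The `TAGS` table follows the slide's numbering, but the helper names and the character filtering applied to the value are assumptions, not the rules of the Namaste specification.

```python
import re
from pathlib import Path

# Tag numbers as listed on the slide (0 is the name_version type tag).
TAGS = {"type": 0, "who": 1, "what": 2, "when": 3, "where": 4}


def namaste_filename(tag: str, value: str) -> str:
    """Form a 'digit=value' file name; the filtering rule is an assumption."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]+", "_", value.strip())
    return f"{TAGS[tag]}={cleaned}"


def write_tag(directory: Path, tag: str, value: str) -> Path:
    """Create the signature file, storing the unmodified value as its content."""
    path = directory / namaste_filename(tag, value)
    path.write_text(value + "\n", encoding="utf-8")
    return path
```

Because the tag is visible in a plain directory listing, a person or script can tell what kind of directory this is without opening any file.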

Page 22: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Content Access Node (CAN)

File system conventions (structure and reserved names) for an object store

can/
  0=can_0.2
  can-info.txt
  log/
  store/
    pairtree...

Page 23: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Pairtree

Use a bigram decomposition of an object’s identifier to determine its file system path

pairtree/
  0=pairtree_0.1
  pairtree-info.txt
  pairtree_root/
    id/
      en/
        ti/
          fi/
            er/
              dflat...
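The bigram decomposition can be sketched in a few lines. The function name `pairtree_path` is hypothetical, and the sketch deliberately omits the character-escaping step that the Pairtree specification applies to identifiers containing characters unsuitable for file names.

```python
from pathlib import PurePosixPath


def pairtree_path(identifier: str, root: str = "pairtree_root") -> PurePosixPath:
    """Split an identifier into two-character segments and join them as a path."""
    if not identifier:
        raise ValueError("identifier must not be empty")
    pairs = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return PurePosixPath(root, *pairs)


print(pairtree_path("identifier"))  # pairtree_root/id/en/ti/fi/er
```

An identifier of odd length ends in a one-character segment, so `abcde` maps to `pairtree_root/ab/cd/e`. The path is computed from the identifier alone, so no index or database is needed to locate an object.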

Page 24: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Dflat

A “digital flat” for object data and metadata

dflat/
  0=dflat_0.11
  dflat-info.txt
  v001/
    d-manifest.txt
    delta/
      redd...
  v003/
    f-manifest.txt
    full/
      data/
      metadata/
      enrichment/
      annotation/

Page 25: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Reverse Delta Directory (ReDD)

– File-level reverse delta compression

redd/
  0=redd_0.1
  add/
  delete.txt
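A file-level reverse delta can be sketched as follows. The sketch assumes that `add/` holds the files to restore and `delete.txt` lists the files to remove when stepping back from a newer version to the one before it. That reading of the layout, the function names, and the in-memory dictionaries standing in for directories are all assumptions.

```python
def reverse_delta(newer: dict[str, bytes],
                  older: dict[str, bytes]) -> tuple[dict[str, bytes], list[str]]:
    """Return (add, delete): what turns the newer version back into the older."""
    # Files the older version had that are missing or different in the newer one.
    add = {name: data for name, data in older.items() if newer.get(name) != data}
    # Files that exist only in the newer version.
    delete = sorted(name for name in newer if name not in older)
    return add, delete


def apply_reverse_delta(newer: dict[str, bytes],
                        add: dict[str, bytes],
                        delete: list[str]) -> dict[str, bytes]:
    """Reconstruct the older version from the newer one plus its reverse delta."""
    restored = {name: data for name, data in newer.items() if name not in delete}
    restored.update(add)
    return restored
```

Storing the latest version in full and earlier versions as reverse deltas keeps the most frequently requested version directly readable, at the cost of some work to rebuild older ones.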

Page 26: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Performance Scaling

Modern file systems, e.g. ZFS, exhibit good performance characteristics at reasonable scale

[Figure: three charts titled “Average Copy Time”, “Average MkDir Time”, and “Traverse Time”; axis units and plotted values are not recoverable]

2,272,000 files = 28.5 TB

127,058,820 files = 25.7 TB

Page 27: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Status

We are completing development of the foundational Storage and Identity services

– Identity is based on N2T (name-to-thing) and Noid systems

We are planning for the Ingest, Catalog, and Characterization services

– Characterization is based on JHOVE2

As these services become available they will be deployed centrally and locally on campuses

Page 28: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Summary

Safety through redundancy

Meaning through description

Utility through service

Value through use

Code to interfaces

Orthogonality, but interoperability

Composition, not addition

Bring services to content, not content to services

Page 29: “Where are We From, Where Are We Going?” Permanent Objects, Disposable Systems

Questions?

http://www.cdlib.org/inside/diglib/
