Jeff Rothenberg Digital Preservation Perspective

Jeff Rothenberg Future Perfect 3/26/2012 Rev: 2012-03-24 Jeff Rothenberg March 26, 2012 Digital Preservation in Perspective: How far have we come, and what's next? Color photo by Jeff Rothenberg


Transcript of Jeff Rothenberg Digital Preservation Perspective

Page 1: Jeff Rothenberg Digital Preservation Perspective

Jeff Rothenberg Future Perfect 3/26/2012 Rev: 2012-03-24

Jeff Rothenberg

March 26, 2012

Digital Preservation in Perspective: How far have we come, and what's next?

Color photo by Jeff Rothenberg

Page 2: Jeff Rothenberg Digital Preservation Perspective

A brief history of digital preservation

• Early statements of the problem
  – Jay Bolter, Margaret Hedstrom, David Bearman
  – Avra Michelson’s & my 1992 American Archivist paper
  – My 1995 Scientific American article
  – Into the Future film (CLIR, 1997; shown on PBS)
  – Tora Bikson’s & my 1999 report for the Dutch National Archives

• Gradual recognition of the problem
  – By librarians, archivists, modern museum curators
  – But without much technological depth of understanding in most cases
  – OAIS Preservation Planning assumed migration, though it admits problems

• Some experiments & demonstrations
  – U. Leeds & U. Mich: CEDARS & CAMiLEON projects; BBC Domesday Book
  – Dutch National Archives Testbed: migration & UVC “data archiving”
  – UCSD Supercomputing Center & NARA: formalisms (e-mail only)
  – Guggenheim “ErlKing” renewal project
  – Dutch Royal Library (KB): Dioscuri emulator & eDepot

• Few serious attempts at implementation
  – Most implementations essentially ignore long-term preservation

Page 3: Jeff Rothenberg Digital Preservation Perspective

Page 4: Jeff Rothenberg Digital Preservation Perspective

Page 5: Jeff Rothenberg Digital Preservation Perspective

Color photo by Jeff Rothenberg

Page 6: Jeff Rothenberg Digital Preservation Perspective

Outline

• What should we mean by digital preservation?

• Levels of awareness of the problem

• Distinctions across disciplines

• Responses

• Remaining challenges

Page 7: Jeff Rothenberg Digital Preservation Perspective

What should preservation mean?

“The goal of digital preservation is the accurate rendering of authenticated content over time.”

—ALA “medium” definition

Page 8: Jeff Rothenberg Digital Preservation Perspective

Preserve originals as well as “vernacular renditions”

The Canterbury Tales

Original

Whan that Aprill, with his shoures soote
The droghte of March hath perced to the roote

And specially from every shires ende
Of Engelond, to Caunterbury they wende,
The hooly blisful martir for to seke
That hem hath holpen, whan that they were seeke.

• Used by scholars for serious research
• Used to generate & evaluate vernacular renditions
• Accessed by non-scholars for aesthetic purposes (with help, e.g., see below)

Vernacular Rendition

When in April the sweet showers fall
That pierce March’s drought to the root and all

And specially from every shire’s end
Of England they to Canterbury went,
The holy blessed martyr there to seek
Who helped them when they lay so ill and weak

• Used by non-scholars for casual research
• May be used by scholars for research as well
• Not thought of as a preservation copy
• Not used as a source for later vernacular renditions

Page 9: Jeff Rothenberg Digital Preservation Perspective

A particular “view” of information may be crucial

[Table: number of launches at each level of O-ring damage (0 to 3), by launch temperature (53, 57, 58, 63, 66, 67, 68, 69, 70, 72, 73, 75, 76, 78, 79, 80, 81 °F)]

Example: Space Shuttle O-ring damage vs. temperature, prior to Challenger

Page 10: Jeff Rothenberg Digital Preservation Perspective

Revealing View of Space Shuttle O-ring Data

[Plot: level of O-ring damage (0 to 3) against temperature (30 to 85 °F)]

Extrapolation of damage curve to the 31 °F temperature forecast for Challenger’s launch on January 28, 1986.

Dots indicate temperature and O-ring damage for 24 successful launches prior to Challenger. Curve shows that increasing damage is related to cooler temperature.

Page 11: Jeff Rothenberg Digital Preservation Perspective

Furthermore, many digital artifacts are inherently digital

• They cannot be meaningfully represented as page images
  – Doing so loses essential aspects of their contents and/or behavior

• Examples include dynamic, active or interactive artifacts
  – Multimedia (e.g., web pages, CD-ROM publications, Ph.D. dissertations)
  – Dynamically generated (e.g., JavaScript, CGI, ASP or PHP web pages, Servlets)
  – Active presentation (e.g., animation, simulation, virtual reality)
  – Interactive (e.g., applets, interactive virtual reality, games)
  – Digital artwork

• Inherently digital artifacts are those whose perceptibility, meaning, or usability arise from and rely on their being encoded in digital form

Page 12: Jeff Rothenberg Digital Preservation Perspective

What you see is not what you get

V2.24 ERwin

  if %JoinPKPK(oldrows,newrows," <> "," or ") then
    select count(*) into numrows
      from %Child
      where %JoinFKPK(%Child,oldrows," = "," and");
    if (numrows > 0) then
      signal parent_updrstrct_err
    end if;
  end if;
  if %JoinPKPK(oldrows,newrows," <> "," or ") then
    update %Child
      set %JoinFKPK(%Child,newrows," = ",",")
      where %JoinFKPK(%Child,oldrows," = "," and");

Page 13: Jeff Rothenberg Digital Preservation Perspective

Render unto seer...

Page 14: Jeff Rothenberg Digital Preservation Perspective

In fact, every digital artifact is a program

• A program
  – Is a sequence of commands in some formal language
  – That is intended to be interpreted
  – By an interpreter that understands that language

• An interpreter
  – Is an active process
  – That knows how to perform commands
  – Specified in a given formal language

• Interpretation ultimately involves hardware
  – ASCII codes are rendered by a printer or display
  – More complex entities are interpreted by software (applications)
  – But all software is ultimately interpreted by hardware
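A minimal sketch of the point above, not taken from the talk: the same eight bytes yield different "content" depending on which interpreter reads them. The byte values and function names are illustrative assumptions.

```python
import struct

# Eight bytes with no inherent meaning: what they "are" depends on the interpreter.
bitstream = bytes([0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x21, 0x21, 0x21])


def as_ascii_text(data: bytes) -> str:
    """Interpret each byte as an ASCII character code."""
    return data.decode("ascii")


def as_unsigned_bytes(data: bytes) -> list:
    """Interpret each byte as a small unsigned integer."""
    return list(data)


def as_big_endian_u32(data: bytes) -> tuple:
    """Interpret the bytes as two 32-bit big-endian unsigned integers."""
    return struct.unpack(">2I", data)


text = as_ascii_text(bitstream)         # a short piece of text
samples = as_unsigned_bytes(bitstream)  # eight small numbers
numbers = as_big_endian_u32(bitstream)  # two large numbers

print(text)
print(samples)
print(numbers)
```

None of the three readings is more "correct" than the others until we know which interpreter the bitstream was written for, which is why the interpreter has to be preserved (or recreated) along with the bits.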

Page 15: Jeff Rothenberg Digital Preservation Perspective

Digital information promises to last better than analog

• A bitstream lasts forever
  – Producing exactly the same behavior, without loss (at least in principle)
  – So long as it can be interpreted correctly

• But interpreting a bitstream correctly requires software
  – And software must be run on hardware (a computer)
  – A computer is (ultimately) an analog device that does decay
  – And both hardware and software become obsolete, long before they decay

• Digital objects do not decay, fade, tear, crumble, dissolve, etc.
  – Their media may, but not the bits themselves
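The difference between intact bits and interpretable bits can be sketched with a fixity check. This is an illustration under assumptions, not a method described in the talk: the sample content and the `fixity` helper are hypothetical, and SHA-256 is simply one common choice of digest.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return a SHA-256 digest that identifies an exact sequence of bits."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical artifact content, recorded at ingest together with its digest.
original = b"%OLDFORMAT-1.0 some content written by a long-gone application"
stored_digest = fixity(original)

# Copying to fresh media reproduces the bits exactly, so the digest matches.
copy_on_new_media = bytes(original)
copy_is_intact = fixity(copy_on_new_media) == stored_digest

# A single flipped bit (media decay) is detected immediately.
damaged = bytearray(original)
damaged[0] ^= 0x01
damaged_is_intact = fixity(bytes(damaged)) == stored_digest

print(copy_is_intact, damaged_is_intact)
```

A matching digest only shows that the bits survived. It says nothing about whether any software that can interpret them still runs, which is the obsolescence problem the second bullet describes.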

Page 16: Jeff Rothenberg Digital Preservation Perspective

“Digital objects last forever — or five years, whichever comes first”

So the best we can say is...

Page 17: Jeff Rothenberg Digital Preservation Perspective

min ( ∞ , 5 )

“Digital objects last forever — or five years, whichever comes first”

So the best we can say is...

Page 18: Jeff Rothenberg Digital Preservation Perspective

Outline

• What should we mean by digital preservation?

• Levels of awareness of the problem

• Distinctions across disciplines

• Responses

• Remaining challenges

Page 19: Jeff Rothenberg Digital Preservation Perspective

Levels of awareness of the problem (by disciplines/institutions/individuals)

• Innocence

• Awakening

• Analysis

• Looking under the streetlamp

• Experimentation/Demonstration

• Where are we now?

Page 20: Jeff Rothenberg Digital Preservation Perspective

Innocence

• Why should digital artifacts be any different?
  – Preservation is preservation, isn’t it?

• Except for media obsolescence
  – Isn’t this just analogous to medieval monks copying manuscripts?

• Digital artifacts don’t decay or change
  – Isn’t this a dream come true for preservationists?

Page 21: Jeff Rothenberg Digital Preservation Perspective

Awakening

• Digital poses unique problems
  – Media obsolescence
  – Description (unique and complex attributes)
  – Cataloging (ephemeral reference, links)
  – Metadata (unique requirements)
  – Format/encoding (interpretation, conversion, corruption)
  – Future rendering (in the face of obsolete software and hardware)

• Digital preservation must be proactive
  – Over relatively short timeframes (5 years?)
  – Otherwise artifacts are likely to be irretrievably lost

Page 22: Jeff Rothenberg Digital Preservation Perspective

Analysis

• Digital artifacts
  – What are their essential characteristics for preservation?

• Authenticity
  – What does this mean for digital artifacts?

• Rendering
  – How can we guarantee proper (or any) rendering in the future?

• Preservation
  – What does (should) this mean for digital artifacts in various disciplines?

• Costs
  – What are the up-front and long-term costs of digital preservation?
  – How should these costs be paid and by whom?

Page 23: Jeff Rothenberg Digital Preservation Perspective

Looking under the streetlamp

• Metadata
  – Dublin Core, etc.
  – Depends on the nature of digital artifacts & technical preservation schemes

• Reference models
  – OAIS
  – Premature in the absence of viable technical preservation schemes

• Institutional process models
  – Premature in the absence of defined, viable technical preservation schemes
  – May tend to lock in approaches that are not viable

Page 24: Jeff Rothenberg Digital Preservation Perspective

The Open Archival Information System Reference Model (OAIS)

Page 25: Jeff Rothenberg Digital Preservation Perspective

Experimentation/Demonstration

• Dutch Archives Testbed
  – “Discovered” that migration is very hard (duh!)

• PLANETS, KEEP
  – Continuing to explore technically viable approaches

• BBC Domesday Book / CAMiLEON Project
  – Early warning of the need for timely, extreme action
  – Demonstrated the potential of hardware emulation

• Other emulation examples
  – Apple’s M68000 emulator for PowerPC
  – U. Warwick’s EDSAC emulator
  – Emory U’s MARBL collection
  – Guggenheim: Renewing the ErlKing
  – KB’s Dioscuri Emulator

Page 26: Jeff Rothenberg Digital Preservation Perspective

The BBC Domesday / CAMiLEON Project

Emulated at the University of Leeds, U.K. (2002)

Page 27: Jeff Rothenberg Digital Preservation Perspective

EDSAC: the first electronic digital computer

Page 28: Jeff Rothenberg Digital Preservation Perspective

Page 29: Jeff Rothenberg Digital Preservation Perspective

Page 30: Jeff Rothenberg Digital Preservation Perspective

Renewing the ErlKing

• An interactive mixed-media video experience
  – By Roberta Friedman and Grahame Weinbren
  – That overlays text and graphics on video content
  – And branches in response to user touchscreen input

• Highly innovative when created in 1982
  – Pushed the limits of affordable computers and video display
  – Included a custom-built “authoring” environment
  – Widely exhibited in major museums and other venues

Page 31: Jeff Rothenberg Digital Preservation Perspective

The ErlKing in the Guggenheim’s “Seeing Double” Show (March 18, 2004)

Page 32: Jeff Rothenberg Digital Preservation Perspective

KB’s Dioscuri Emulator

Running my 1982 Calendar/1 Program

Page 33: Jeff Rothenberg Digital Preservation Perspective

Where are we now?

• Somewhere between 4 and 5
  – Looking under the streetlamp
  – Experimentation/Demonstration

• Few end-to-end implementations
  – Except for page-image artifacts (e.g., LOCKSS, Portico)
  – And KB eDepot

Page 34: Jeff Rothenberg Digital Preservation Perspective

Outline

• What should we mean by digital preservation?

• Levels of awareness of the problem

• Distinctions across disciplines

• Responses

• Remaining challenges

Page 35: Jeff Rothenberg Digital Preservation Perspective

Responses

• Denial
  – What problem?

• Wishful thinking
  – Deus ex machina

• Misguided efforts (IMHO)
  – Digital garden paths

• Facing reality
  – What will it take?

• Where are we now?

Page 36: Jeff Rothenberg Digital Preservation Perspective

Denial

• Just save bits
  – And hope for the best (let our grandchildren worry about it)

• Expect commercial sector solutions
  – Microsoft, IBM, etc. will save us

• Popular formats will live forever or auto-migrate
  – (What the ancient Egyptians thought)

• Convergent formats like HTML and XML solve everything
  – But these are really just “scaffold” formats embedding others

Page 37: Jeff Rothenberg Digital Preservation Perspective

Preservation approaches

• Save and run obsolete hardware and software
  – In “computer museums”
  – To read documents by running the original programs that created them

• Rely on emulation of obsolete hardware to run saved software
  – Requires no migration or conversion (aside from media)
  – Saves originals in original form

• Rely on universal, formal description of logical formats
  – To allow interpreting those formats in the future
  – Thereby correctly rendering saved digital artifacts

• Rely on standards and migration
  – Expect new programs to read old documents in enduring standard forms
  – Convert documents from old standards to new ones as standards evolve
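The emulation approach listed above can be sketched with a toy example. Everything here is a hypothetical illustration, not any real emulator such as Dioscuri: the "obsolete machine" is an invented stack machine with five made-up opcodes, and the saved program is an invented byte string. The point is only that the saved bytes are never converted; a new program recreates the old machine's behavior.

```python
def emulate(program: bytes) -> list:
    """Run a saved program for a hypothetical obsolete stack machine.

    Invented opcodes: 0x00 HALT, 0x01 PUSH <byte>, 0x02 ADD, 0x03 MUL, 0x04 PRINT.
    """
    stack, output, pc = [], [], 0
    while pc < len(program):
        op = program[pc]
        if op == 0x00:                      # HALT: stop execution
            break
        elif op == 0x01:                    # PUSH: next byte is the operand
            stack.append(program[pc + 1])
            pc += 2
        elif op == 0x02:                    # ADD: pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == 0x03:                    # MUL: pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == 0x04:                    # PRINT: pop a value and emit it
            output.append(stack.pop())
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc}")
    return output


# The preserved original, kept byte-for-byte: PUSH 6, PUSH 7, MUL, PRINT, HALT.
saved_program = bytes([0x01, 6, 0x01, 7, 0x03, 0x04, 0x00])
result = emulate(saved_program)
print(result)
```

Only `emulate` would need to be rewritten when the host platform changes; every saved program continues to run unmodified, which is the contrast with migration, where each artifact is converted.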

Page 38: Jeff Rothenberg Digital Preservation Perspective

Wishful thinking

• Metadata is all we need
  – Describe formats, behavior, etc.

• Format migration
  – The game of “telephone”

• Formal encoding (UCSD/NARA-ERA)
  – Maybe someday

• Rely on future cryptography
  – Counterexample: Hieroglyphics

• Digitize to preserve
  – e.g., Shoah
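The "telephone" objection to format migration in the list above can be sketched as follows. This is a toy model under stated assumptions: the format names, the feature sets, and the sample document are all invented for illustration, and real migrations lose information in subtler ways than dropping whole fields.

```python
# Hypothetical formats and the document features each one can represent.
FORMAT_FEATURES = {
    "format_2002": {"text", "fonts", "macros"},
    "format_2010": {"text", "fonts", "revision_history"},
    "format_2020": {"text", "fonts", "macros", "revision_history"},
}


def migrate(document: dict, target_format: str) -> dict:
    """Convert a document, keeping only what the target format can express."""
    supported = FORMAT_FEATURES[target_format]
    return {name: value for name, value in document.items() if name in supported}


# Hypothetical original artifact with four kinds of content.
original = {
    "text": "Quarterly report",
    "fonts": "Garamond 11pt",
    "macros": "recalc()",
    "revision_history": "17 edits",
}

# Migrate step by step, as standards evolve; each step starts from the last copy.
document = original
for target in ["format_2002", "format_2010", "format_2020"]:
    document = migrate(document, target)

surviving = sorted(document)
print(surviving)
```

The final format can represent everything the original contained, yet the macros and revision history are gone, because each was dropped by an intermediate step and later steps only ever see the previous copy. Keeping the digital original avoids this, since new renditions can always be derived from it directly.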

Page 39: Jeff Rothenberg Digital Preservation Perspective

Misguided efforts (IMHO)

• Focus on short-term preservation
  – Urgent enough to preclude long-term focus (e.g., JSTOR?)

• Reject emulation without understanding it
  – Seems like smoke and mirrors

• LC, NARA-ERA
  – Full speed ahead and damn the technical realities

Page 40: Jeff Rothenberg Digital Preservation Perspective

Facing reality

• Technological issues
  – For “inherently digital” artifacts (which will become more prevalent)

• Defining/preserving “digital originals”
  – Retaining original rendering & behavior
  – Enabling repeated “vernacular extraction” of surrogates

• Comparative cost analyses
  – Informed by technological understanding
  – Looking at overall lifecycle costs

• Realistic process models
  – Based on technologically viable approaches

• Facing long-term issues (KB/IBM-NL eDepot)
  – Loss of metadata
  – Partial loss or corruption of archival information package indexes

Page 41: Jeff Rothenberg Digital Preservation Perspective

Current implementation efforts

• NARA’s ERA project
  – Ill-conceived: assumed a solution would magically appear

• KB may still be in the lead
  – eDepot designed to address long-term preservation
  – Using a two-pronged migration/emulation approach
  – Planets & KEEP projects continuing to explore longer-term issues

• LC still seems somewhat aimless
  – Lost half their NDIIPP funding after 2006 (some since restored)

• Most so-called “archiving” efforts ignore preservation
  – LOCKSS, Portico (journal archiving) offer no real preservation
  – Internet Archive seems based on wishful thinking

• BL proceeding rationally
  – Pursuing a broadly-based, intelligent strategy

Page 42: Jeff Rothenberg Digital Preservation Perspective

Where are we now?

• Somewhere between 2 and 4?
  – Misguided efforts
  – Facing reality

• Still at 1?
  – Denial

Page 43: Jeff Rothenberg Digital Preservation Perspective

Outline

• What should we mean by digital preservation?

• Levels of awareness of the problem

• Distinctions across disciplines

• Responses

• Remaining challenges

Page 44: Jeff Rothenberg Digital Preservation Perspective

Distinctions across contexts

• Disciplines: Libraries, Archives, Museums
  – Archives: preserve “record” value
  – Libraries: preserve[/contextualize] content/rendering
  – Museums: preserve/recreate/contextualize experience

• Institutions: National, Commercial, NGO
  – Commercial: film industry, petrochemical, pharma (core vs. ancillary assets)
  – Shoah Foundation (Spielberg): http://dornsife.usc.edu/vhi/preservation

• Individuals
  – Mostly not yet begun

Page 45: Jeff Rothenberg Digital Preservation Perspective

Outline

• What should we mean by digital preservation?

• Levels of awareness of the problem

• Distinctions across disciplines

• Responses

• Remaining challenges

Page 46: Jeff Rothenberg Digital Preservation Perspective

Remaining challenges

• Integrate true long-term perspective
  – Render “inherently digital” artifacts
  – Recognize the executability of all digital artifacts
  – Preserve digital originals and facilitate “vernacular renditions”

• Engage the Computer Science (ICT) field
  – Conference sessions, working groups, etc.

• Perform serious cost and process analyses
  – Based on viable technological approaches

• Try some small-scale “end-to-end” demonstrations
  – Long-term focus
  – Inherently digital artifacts
  – Preserve digital originals and produce “vernacular renditions”
  – Develop and test realistic process models
  – Instrument, measure, and evaluate:
    - Authenticity, quality, accessibility, usability, cost
    - Effort, scalability, reproducibility (of process)

Page 47: Jeff Rothenberg Digital Preservation Perspective

Expected cost & effectiveness comparisons

Approaches compared (table columns, left to right): archaeology, formalization, standards, viewers, migration, emulation

Key: H, M, L: High, Med, Low; +, -: Frequent, Rare

Cost:

Per-format (x 1000)

Per-platform (x 10)

Per-artifact (x 100,000,000)

Per-approach (x 1)

Cost row labels: Process at Ingest; Reverse-engineer; Convert over time; Obtain necessary S/W; Create H/W emulators; Create EVM or formalism; Access; Port to new platforms

Cost ratings (one row per line, in the order they appear; only the last row carries its label):

0, H/-, 0, 0, 0, H/-
0, H/-, H/-, H/+, H/+, 0
0, 0, 0, M/+, M/-, L/+
0, H, H, 0, 0, L
0, M/-, H/-, H/+, H/+, 0
H, M, L, L, L, L
0, 0, 0, 0, 0, H/-
Port to new platforms: 0, L/-, M/-, H/-, M/-, M/-

Effectiveness:

On each artifact: L, M, M, M, M, H
% of formats handled: L, L, L, M, L, H
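A minimal sketch of how the multipliers on this chart drive total cost. The multipliers (x 1, x 10, x 1000, x 100,000,000) come from the chart; the unit costs and the two cost profiles are hypothetical values in arbitrary units, chosen only to show the arithmetic, and are not estimates from the talk.

```python
# Multipliers from the chart: how many times each kind of cost is incurred.
MULTIPLIERS = {
    "per_approach": 1,
    "per_platform": 10,
    "per_format": 1_000,
    "per_artifact": 100_000_000,
}


def total_cost(unit_costs: dict) -> int:
    """Sum (unit cost x multiplier) over the cost levels supplied."""
    return sum(MULTIPLIERS[level] * cost for level, cost in unit_costs.items())


# Hypothetical profile A: large one-off costs, paid per approach and per platform.
few_large_costs = {"per_approach": 5_000_000, "per_platform": 1_000_000}

# Hypothetical profile B: small costs, but paid per format and per artifact.
many_small_costs = {"per_format": 10_000, "per_artifact": 1}

total_a = total_cost(few_large_costs)    # 5,000,000 x 1 + 1,000,000 x 10
total_b = total_cost(many_small_costs)   # 10,000 x 1000 + 1 x 100,000,000

print(total_a, total_b)
```

Under these assumed numbers a cost of one unit per artifact outweighs a million units per platform, which is why the level at which an approach incurs its costs matters more than the size of any single step.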

Page 48: Jeff Rothenberg Digital Preservation Perspective

References for Jeff Rothenberg

http://www.JeffRothenberg.org