Besser--Digital Longevity 9/2/00 (12/12/99) 1
Planning to Maximize Longevity of Digital Information
Howard Besser
UCLA School of Education & Information
http://www.gseis.ucla.edu/~howard
Planning to Maximize Longevity of Digital Info
- The Ecology Metaphor
- Why are you Managing this Information?
- Major Issues Facing Digital Projects
- The Short Life of Digital Info
- Important Planning Considerations
- Key Considerations for Imaging Projects
Why are you Managing this Information?
- Organizational mission & type
- Users
- Uses
Major Issues Facing Digital Projects
- Dangerous Changes in Intellectual Property Law
- Intellectual Access
- Storage
- Delivery
- Integration with other tools
- Interoperability
Serious Longevity Problems
- What we know from prior widespread digital file formats
- Images separating from their metadata
- Inaccessibility of software needed to view a work
- Inability to even decode the file format of a work
The Short Life of Digital Info: Digital Longevity Problems
- Disappearing Information
- The Viewing Problem
- The Scrambling Problem
- The Inter-relation Problem
- The Custodial Problem
- The Translation Problem
The Viewing Problem
Digital Info requires a whole infrastructure to view it
Each piece of that infrastructure is changing at an incredibly rapid rate
How can we ever hope to deal with all the permutations and combinations?
The Scrambling Problem
Dangers from:
- Compression to ease storage & delivery
- Container Architecture to enhance digital commerce
The Inter-relation Problem
- Info is increasingly inter-related to other Info
- How do we make our own Info persist when it points to and integrates with Info owned by others?
- What is the boundary of a set of information (or even of a digital object)?
The Custodial Problem
- In the past, much of survival was due to redundancy
- How do we decide what to save?
- Who should save it?
  – Mellon-funded E-Journal Archives
- How should they save it?
The Custodial Problem: How to save information?
- Methods for later access
  – Refreshing
  – Migration
  – Emulation
- Issues of authenticity and evidence
The Translation Problem
- Content translated into new delivery devices changes meaning
  – A photo vs. a painting
  – If Info is produced originally in digital form in one encoded format, will it be the same when translated into another format?
  – Behaviors
Pieces of the Solution (1/2)
- We need to insist upon clearly readable, standardized ways for digital objects to self-identify their formats
- We should discourage scrambling
- We need to better understand how information inter-relates to other Info, and what constitutes “boundaries” of Info objects
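One way to picture format self-identification is the leading "magic" bytes that several common formats already carry. The sketch below is illustrative only: the table `MAGIC_SIGNATURES` and the function `identify_format` are hypothetical names, and the table covers just a handful of well-known signatures.

```python
# Hypothetical lookup table: leading "magic" bytes -> format name.
# Only a few well-known signatures are listed; a real registry would be larger.
MAGIC_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"\xff\xd8\xff": "JPEG",
    b"II*\x00": "TIFF (little-endian)",
    b"MM\x00*": "TIFF (big-endian)",
    b"%PDF-": "PDF",
}


def identify_format(data: bytes) -> str:
    """Return the format whose signature the data starts with, or 'unknown'."""
    for signature, name in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return "unknown"
```

A file that does not announce its format this way falls through to "unknown", which is the situation the slide warns about.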
Pieces of the Solution (2/2)
- People and organizations wishing to make information persist need guidelines on how to go about doing it
- We need to better understand how translating from one storage or display format to another affects the meaning of a work
- We need to save the “behaviors” of a digital object, not just its “contents”
Conceptual Approaches to Digital Preservation
- Refreshing always necessary due to volatility of physical strata
  – Impact on evidential value
- Migration -- advantages & disadvantages
- Emulation -- advantages & disadvantages
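Refreshing, in the sense used here, means copying the same bits onto new media. A minimal sketch of one refresh step with a checksum comparison follows; the function names `sha256_of` and `refresh` are hypothetical, and the use of SHA-256 is an assumption made for illustration, not something the slides prescribe.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the bits are unchanged.

    Returns the digest so it can be recorded alongside the copy.
    """
    before = sha256_of(source)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh altered the bitstream: {source} -> {destination}")
    return after
```

Recording the returned digest with each copy is one way to support later claims about evidential value, since it shows the refreshed file matches what was stored before.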
Persistent IDs--the Problem
- Need to separate work ID from work location
- URNs probably won’t be ready until 2003
- Becomes a business process issue when one organization maintains the resource and another organization references it (i.e., licensed from vendors or managed by separate administrative structures)
More Persistent IDs--the Approach for today
- PURLs
- Handles
- HTTP redirects
- And worry about costs now and conversion costs when URNs become feasible
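The approaches listed above share one idea: references point at a stable ID, and a resolver maps that ID to wherever the work currently lives. The sketch below shows only that indirection, in memory; the class `PersistentIdResolver`, its method names, and the example.org URLs are hypothetical and do not reproduce any actual PURL or Handle software.

```python
class PersistentIdResolver:
    """Toy resolver: stable work IDs on one side, current locations on the other."""

    def __init__(self):
        self._locations = {}

    def register(self, work_id: str, url: str) -> None:
        """Assign a location to a new work ID."""
        if work_id in self._locations:
            raise ValueError(f"ID already registered: {work_id}")
        self._locations[work_id] = url

    def relocate(self, work_id: str, new_url: str) -> None:
        """Record that the work moved; the ID itself never changes."""
        if work_id not in self._locations:
            raise KeyError(work_id)
        self._locations[work_id] = new_url

    def resolve(self, work_id: str) -> str:
        """Return the current location, as a redirect target would be."""
        return self._locations[work_id]


resolver = PersistentIdResolver()
resolver.register("work/0001", "http://old-server.example.org/img/0001.tif")
resolver.relocate("work/0001", "http://new-server.example.org/masters/0001.tif")
```

Anyone who cited "work/0001" still reaches the file after the move. Keeping the table up to date is the business process issue: whoever maintains the resource must update the resolver whenever a location changes.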
Data Set Management: More issues with referencing IDs
- References for mirror sites
- References for back-up sites when main site is down or bottle-necked
- References for off-site copies and archival copies
Metadata can be the first line of defense
Can tell you
  – where the file is (if you can’t find the file)
  – where more info about the file is (if you have the file but most other metadata has become separated)
  – what the file format is
  – what the compression scheme is
  – what application program and version is needed for the file
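The items above can be pictured as a small record that travels with each file. The following sketch is one possible shape for such a record, serialized as JSON; the class `PreservationRecord`, its field names, and the choice of JSON are all assumptions made for illustration, not a published metadata schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PreservationRecord:
    """Hypothetical minimal record mirroring the questions on the slide."""
    location: str             # where the file is
    more_info: str            # where more info about the file is
    file_format: str          # what the file format is
    compression: str          # what the compression scheme is
    application: str          # what application program is needed
    application_version: str  # which version of that program


def to_sidecar(record: PreservationRecord) -> str:
    """Serialize the record as readable JSON text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_sidecar(text: str) -> PreservationRecord:
    """Rebuild the record from its JSON text."""
    return PreservationRecord(**json.loads(text))
```

Storing one copy of this text inside or beside the file and another in a catalog means either copy can lead back to the other if they become separated.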
Structural Metadata Issues
http://sunsite.berkeley.edu/moa2
Architecture: Separating Longevity and Delivery Servers
[Diagram labels: Berkeley Longevity Server; Berkeley Delivery Server; Other Delivery Server (x3); User (x4)]
Groups Working on the Big Problem
http://sunsite.Berkeley.EDU/Longevity/
- CPA Task Force
- Getty “Time & Bits” Conference & Follow-ups
- Emulation experiments in US and Europe
  – NEDLIB, CURL, Michigan
- Mellon-funded E-Journal Archive experiments
- Internet Archive
- Long Now
Time & Bits Participants
- Stewart Brand
- Howard Besser
- Brian Eno
- Danny Hillis
- Peter Lyman
- Brewster Kahle
- Kevin Kelly
- Jaron Lanier
- Doug Carlston
- John Heilemann
- Ben Davis
- Margaret MacLean
- Bruce Sterling
- Paul Saffo
Groups Working on Pieces of the Big Problem
http://sunsite.berkeley.edu/Longevity/
- Internet Archive
- Long Now
- Emulation experiments in US and Europe
  – NEDLIB, CURL, Michigan
Journal Archiving
- License, don’t own; may not even be able to obtain right to make archival copy
- Increasingly no paper back-up at all
- Usually we don’t have the important redundancy factor
- Stanford’s LOCKSS Project (Lots of Copies Keeps Stuff Safe) and its problems (http://lockss.stanford.edu)
Complexity of Rich Media
- Works often have artistic nature (including video games)
- Enormous number of elements can, at times, be very important to preserve (pacing, original artifact, elements used to construct the artifact)
- Too complex to save every one of these aspects for every type of material
- Importance of saving documentation
Important Planning Considerations
- File Formats
- Choosing Interoperable Systems
- Adhere to standards
- Vendors with large installed base
- Refreshing and/or Migration
Key Considerations for Imaging Projects
- Users' Needs
- Image Quality
- Intellectual Property
- Standards
- Topology
- Tools & Processes
Key Considerations for Imaging Projects (1 of 3)
- Users' Needs
  – Quality of Digital Surrogate
  – Interoperable desktop applications
- Image Quality
  – Archival
  – Current online delivery
Key Considerations for Imaging Projects (2 of 3)
- Intellectual Property
- Standards
  – Modular and Layered Architecture
  – Terminology
  – Technical imaging information
- Topology
Key Considerations for Imaging Projects (3 of 3)
- Tools & Processes
  – Scanners
  – Compression techniques
  – Linking files
  – Workflow
  – Interoperable desktop applications
Some nuts-and-bolts Planning Considerations
- Think about users (and potential users), uses, and type of material/collection
- Scan at the highest quality that does not exceed the needs of the likely potential users, uses, and material
- Do not let today’s delivery limitations influence your scanning file sizes; understand the difference between digital masters and derivative files used for delivery
- Many documents which appear to be bitonal actually are better represented with greyscale scans
- Include color bar and ruler in the scan
- Use objective measurements to determine scanner settings (do NOT attempt to make the image look good on your particular monitor or use image processing to color correct)
- Don’t use lossy compression
- Store in a common (standardized) file format
- Capture as much metadata as is reasonably possible (including metadata about the scanning process itself)
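The advice against lossy compression for masters can be illustrated with a small round trip. In this sketch the lossless path uses `zlib` from the Python standard library, while `toy_lossy` is a deliberately crude, hypothetical stand-in for a lossy codec (it merely quantizes 8-bit values) and does not represent any real image format.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; every original byte comes back."""
    return zlib.decompress(zlib.compress(data))


def toy_lossy(data: bytes) -> bytes:
    """Crude stand-in for lossy coding: round each 8-bit value down to a multiple of 16."""
    return bytes((value // 16) * 16 for value in data)


# Hypothetical "scan": a greyscale ramp covering all 256 levels.
master = bytes(range(256))

restored = lossless_round_trip(master)  # identical to the master
degraded = toy_lossy(master)            # the discarded detail cannot be recovered
```

The lossless copy matches the master exactly. The quantized copy keeps only 16 distinct grey levels, and nothing in it records which of the original 256 values each one came from.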
One Final Question: Who will collect the digital works of today that should become the Special Collections of tomorrow?
- web sites
- zines
- electronic journals
- listserv and email discussions
- drafts of works that later become famous
Howard Besser
UCLA School of Education & Information
http://sunsite.berkeley.edu/Longevity/
http://www.gseis.ucla.edu/~howard
http://sunsite.berkeley.edu/moa2
http://lockss.stanford.edu
http://www.longnow.com/10klibrary/TimeBitsDisc/
http://www.archive.org/
Planning to Maximize Longevity of Digital Information