Digital Preservation
Andrea Goethals and Wendy Gogel, Harvard University Library
NELA, 18 October 2010
Slide 2
Digital Preservation
1. Why digital preservation?
2. What's the problem?
3. What's being done?
4. What can you do?
5. Questions?
Slide 3
1. Why digital preservation?
Slide 4
Slide 5
Everything is digital
1957: first digital image
1969: ARPAnet
1971: first email sent
1972: first video game
1998: first digital theatrical release
Slide 6
Digital content may be historically significant.
after 12:00 noon January 20, 2001, the National Archives and Records Administration ("NARA") shall have sole legal custody of all Clinton-Gore Administration electronic mail records that are governed by the Presidential Records Act ("PRA"), 44 U.S.C. 2201
(Memorandum of Understanding between NARA and The Executive Office of the President, dated January 11, 2001, accessed Oct. 2010 at: http://www.archives.gov/presidential-libraries/laws/access/email-records-memo.html)
Slide 7
Digital content may be your favorite song, or your favorite movie.
Slide 8
Digital content may be the only version.
(Harvard Magazine, May/June 2009)
Slide 9
Digital content may be a work of art. Doug Aitken. (American,
born 1968). sleepwalkers. 2007. Six-channel video (color, sound),
seven monitors, 12:57 min. The Dunn Bequest. 2008 Doug Aitken.
Photo: Fred Charles.
Slide 10
Digital content may be important to scholarship.
Slide 11
Who cares?
Cultural resource institutions
  Museums, historical societies (MOMA's Matters in Media Arts)
  Libraries, archives, special collections
Academic institutions
Governments (National Library of New Zealand's NDHA, NARA's ERA)
The entertainment industry (AFI Digital Preservation Project)
Slide 12
Who cares? You and me, personally!
Slide 13
2. What's the problem?
Slide 14
Digital content is transient, fragile, and hidden.
(Image captions: 2400 B.C.E.; 1450 C.E.)
Slide 15
Digital content is transient
The average lifespan of a web site is between 44 and 100 days.
Captured April 8, 2009; visited October 13, 2010
Slide 16
Digital content is fragile
Digital things are amazingly easy to destroy:
  Bad people
  Software or hardware failure
  Human mistakes
The slip of a finger or an unnoticed consequence of a change can happen easily and is potentially catastrophic.
Help! Accidental deletion. "I accidentally deleted 62 images. Can you please recover them from backups?"
Slide 17
Digital content is hidden
Loss is not always apparent.
Is either of these corrupt?
Slide 18
Digital content is hidden
Loss is not always apparent.
Both are corrupt! Use helps, but it's not enough.
Slide 19
Even if it's safe, is it usable?
It's not enough to preserve the bits if the format of the bits is obsolete! WordStar? AppleWorks? Excel 1.0?
To use digital content, we depend on software that can understand the format.
Slide 20
The importance of format
Understanding formats is fundamental to preservation
ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74
6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00
0000007800000000004800480000 000002f40240ffeeffee03060252
0347052803fc0002000000480048 0000000002d80228000100000064
000000010003030300000001270f 0001000100000000000000000000
0000600800190190000000000000 0000000000000000000000000000
0000000000000000000000003842 494d03ed0a5265736f6c7574696f
6e0000000010008313a3000200...
Slide 21
The importance of format
Understanding formats is fundamental to preservation
(Same hex dump as on Slide 20, now labeled with the JPEG segments it encodes.)
SOI, APP0 (JFIF 1.2), APP13 (IPTC), APP2 (ICC), DQT, SOF0 (183x512), DRI, DHT, SOS, ECS0, RST0, ECS1, RST1, ECS2...
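To make the link between the raw bytes and the segment labels concrete, here is a minimal Python sketch that decodes the first 20 bytes of the hex dump shown on the slide. It assumes the standard JPEG/JFIF layout (a two-byte SOI marker followed by an APP0 segment). The function name `describe_jpeg_header` is illustrative only and does not come from the presentation or from any preservation tool.

```python
import struct

# First 20 bytes of the hex dump shown on the slide.
HEADER_HEX = "ffd8ffe000104a46494600010201008300830000"


def describe_jpeg_header(data: bytes) -> dict:
    """Decode the SOI marker and the JFIF APP0 segment that open a JPEG file."""
    if data[0:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker, so this is not a JPEG stream")
    if data[2:4] != b"\xff\xe0":
        raise ValueError("first segment is not APP0")
    # Segment length is a big-endian 16-bit integer.
    (segment_length,) = struct.unpack(">H", data[4:6])
    if data[6:11] != b"JFIF\x00":
        raise ValueError("APP0 segment does not carry a JFIF identifier")
    # Version (major, minor), density units, then X and Y density.
    major, minor, units, x_density, y_density = struct.unpack(">BBBHH", data[11:18])
    return {
        "segment_length": segment_length,
        "version": f"{major}.{minor}",
        "density_units": units,
        "x_density": x_density,
        "y_density": y_density,
    }


if __name__ == "__main__":
    print(describe_jpeg_header(bytes.fromhex(HEADER_HEX)))
```

For the slide's bytes this reports version 1.2, which matches the "JFIF 1.2" label. Without knowledge of the format, the same bytes are just an opaque string of hexadecimal digits.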
Slide 23
Using information content
Analog book (unmediated use): information content, language, symbols, HW (paper)
Digital book (technology-mediated use): information content, formats, bits, SW, HW
Slide 24
Formats are key to determining usability
Digital content: information content, formats, bits
Supporting technologies: SW, HW
Formats are the bridge between the content we want to preserve and supporting technologies.
Slide 25
Dependence on fleeting technology
We are dependent on technology to interpret digital content...
Technologies must understand the format of the content.
Technologies age and disappear!
Slide 26
3. What's being done?
Slide 27
Primary goals of digital preservation
1. Keep the bits safe
2. Keep the bits useful to people
Slide 28
1. Keep the bits safe
Infrastructure, processes, policies and professional staff to counter risks:
  High quality storage
  Redundancy (multiple copies, multiple locations)
  Media refreshing (replacing)
  Integrity monitoring (check for corruption)
  Security and access management
  Content recovery
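Integrity monitoring is often done by recording a checksum for each file and recomputing it later. The following is a minimal Python sketch of that idea, assuming SHA-256 digests and an in-memory manifest. The function names (`sha256_of`, `record_fixity`, `audit_fixity`) are hypothetical, and a production repository would store the manifest separately from the content it describes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_fixity(root: Path) -> dict[str, str]:
    """Build a manifest mapping each file under root to its digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit_fixity(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare current files to the manifest and report what is wrong."""
    problems = {}
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file():
            problems[name] = "missing"
        elif sha256_of(target) != expected:
            problems[name] = "changed"
    return problems
```

An empty result from `audit_fixity` means every recorded file is still present and bit-for-bit unchanged. It says nothing about whether the format is still usable.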
Slide 29
2. Keep the bits useful
Provide ways for people to find it
Provide ways to manage it
Keep records of history and significant events
Know what formats you have
Make sure there's technology to support the formats!
  Technology watch
  And if there isn't, intervene so that there is (migration, emulation, creation of viewing software)
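Knowing what formats you have usually starts with looking at a file's leading bytes rather than trusting its extension. The sketch below shows the idea in Python with a handful of well-known signatures. The table is deliberately tiny and the function names are hypothetical. Dedicated tools and format registries cover far more formats and also validate structure.

```python
from pathlib import Path

# A few widely documented file signatures ("magic numbers").
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container"),
]


def identify(data: bytes) -> str:
    """Guess a format from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: Path) -> str:
    """Read only the first 16 bytes of a file and identify its format."""
    with path.open("rb") as handle:
        return identify(handle.read(16))
```

A result of "unknown" is itself useful: it flags content that needs closer attention before its supporting technology disappears.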
Slide 30
Degrees of preservation
Passive preservation (aka bit-level preservation):
  Better understood and less costly
  Will not ensure long-term usability; ensures current and near-term usability
Active preservation (aka full preservation, aka logical preservation):
  More complex, challenging and costly
  Requires more expertise but better ensures very long-term usability
  Requires passive preservation
Slide 31
Degrees of preservation
Passive preservation (aka bit-level preservation) and active preservation (aka full preservation, aka logical preservation)
Activities shown: Store, Secure, Maintain, Prevent, Migrate, Re-engineer software, Emulate, Digital archaeology, Monitor, Restore, Add value
Slide 32
Strategic thinking
The least expensive and most effective preservation measure is to think about the future when digital content is created!
The content production matters!
It makes good sense to try to influence the content creation process.
Slide 33
Preservation lifecycle
Create or acquire digital content
Ingest into a preservation repository
Continuous cycle of: monitoring, planning, intervention
  Subject to collection management decisions
Transfer to next generation of the repository or to a different repository
A series of hand-offs over time
Slide 34
Ongoing commitment
Requires a continual, proactive program
You can't just stop and start
Time frames are MUCH shorter than for preservation of physical collections
Requires ongoing investment in both technology and staffing
Slide 35
Can't do it alone
More than any other library activity, preservation responsibility must be shared across institutions.
Even collectively, we do not have adequate resources or understanding.
Slide 36
Preservation community efforts
Collaborative organizations (NDSA, IIPC, OPF)
Collaborative projects (AIHT, TIPR)
Standards and metadata
  Technical metadata for still images, audio, documents
  METS (package for metadata and digital objects)
  PREMIS (preservation metadata)
  Preservable formats (PDF/A)
  Repository certification
Infrastructure
  Formats registry (UDFR, PRONOM)
  Repository software (Fedora, DAITSS, LOCKSS, etc.)
  Tools (JHOVE, FITS, etc.)
Slide 37
4. What can you do?
Slide 38
First steps
Inventory your content
  Identify where it is all kept: web locations, computer hard drive, removable media (CDs, etc.)
Select
  Decide what is worth keeping
  Given a choice, keep the highest quality version
  Is someone else already preserving it?
  Consider deleting content that's not needed
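For content kept on a hard drive, the inventory step can be partly automated. The Python sketch below walks a folder and records each file's relative path, size, modification time and extension, then renders the result as CSV. The column names and the function names (`build_inventory`, `inventory_csv`) are assumptions made for illustration, not a prescribed inventory format.

```python
import csv
import io
from datetime import datetime, timezone
from pathlib import Path

FIELDS = ["path", "bytes", "modified", "extension"]


def build_inventory(root: Path) -> list[dict]:
    """List every file under root with basic facts about it."""
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": info.st_size,
            "modified": modified.isoformat(timespec="seconds"),
            "extension": path.suffix.lower(),
        })
    return rows


def inventory_csv(rows: list[dict]) -> str:
    """Render inventory rows as CSV text that can be saved or printed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Sorting the resulting rows by size or extension is a quick way to spot duplicates and candidates for deletion during the selection step.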
Slide 39
Second steps
Organize your digital content
  Create a logical directory/folder structure for the content
  Give descriptive names to the files
  If possible, tag or embed with descriptions
Catalog your content
  Draft a summary description
  Keep your inventory and a summary description of the content and how you have it organized in a secure location
Slide 40
Third steps
Make multiple copies of your content
  Use formats that are amenable to long-term survival
  Use open formats when possible
  Store on durable media
  Store in multiple locations, preferably in different disaster zones
Use it!
  Periodically check that you can access the content
  Migrate to new media over time
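Making multiple copies is only helpful if each copy is verified. A minimal Python sketch, assuming the destinations are ordinary folders (for example an external drive and a synced off-site folder) and using a hypothetical `replicate` helper, could look like this:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Digest of a file's bytes (reads the whole file, fine for small files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def replicate(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise OSError(f"copy at {target} does not match the source")
        copies.append(target)
    return copies
```

Re-running the digest comparison on a schedule covers the "periodically check" step, and the same routine can be reused when migrating to new media.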
Slide 41
Fourth steps
Keep informed.
LC's website: http://www.digitalpreservation.gov/you/
Research, training and outreach (DCC, DPC, JISC, IIPC, NEDCC): http://www.nedcc.org/curriculum/lesson.introduction.php
Professional organizations (ALA, SAA)
Conference proceedings (iPRES, IS&T Archiving, DLF)
How to preserve your own digital materials (LC): http://www.digitalpreservation.gov/you/
10 basic characteristics of digital preservation repositories (CRL website): http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
Slide 42
Image Credits
First digital image: http://www.worldalmanac.com/blog/2007/05/the_first_digital_image.html
Pong: http://www.simondelliott.com/blog/2009/01/pong-is-more-than-just-a-game-its-a-way-of-life
1998: First theatrically released: http://en.wikipedia.org/wiki/The_Last_Broadcast
iPod ad: http://www.ipodhistory.com/ipod-advertising
Avatar: http://www.imdb.com/title/tt0499549
Cuneiform 2400 BC: http://en.wikipedia.org/wiki/Cuneiform_script
1450 Book of Hours in French and Latin: http://www.griffons.com/index.cfm?frm=details&piid=2811&cid=1&scid1=2&CFID=2459509&CFTOKEN=33670424
Server: http://regmedia.co.uk/2007/11/06/hp_mediasmart_server.jpg
Sleepwalkers at MOMA: http://www.moma.org/explore/collection/conservation/media_art
PRS data sets: http://www.prsgroup.com.ezp-prod1.hul.harvard.edu/prsgroup_shoppingcart/cdSub4.aspx
Corrupt images: http://old.hki.uni-koeln.de/people/herrmann/forschung/heydegger_archiving2008.ppt
New Yorker Cover, June 8 and 15, 2009 and October 18, 2010