Digital Preservation Andrea Goethals Wendy Gogel From Harvard University Library NELA 18 October 2010

  • Slide 1
  • Digital Preservation Andrea Goethals Wendy Gogel From Harvard University Library NELA 18 October 2010
  • Slide 2
  • Digital Preservation: 1. Why digital preservation? 2. What's the problem? 3. What's being done? 4. What can you do? 5. Questions?
  • Slide 3
  • 1. Why digital preservation?
  • Slide 4
  • Slide 5
  • Everything is digital: 1957, first digital image; 1969, ARPAnet; 1971, first email sent; 1972, first video game; 1998, first digital theatrical release.
  • Slide 6
  • Digital content may be historically significant. "after 12:00 noon January 20, 2001, the National Archives and Records Administration ("NARA") shall have sole legal custody of all Clinton-Gore Administration electronic mail records that are governed by the Presidential Records Act ("PRA"), 44 U.S.C. 2201" (Memorandum of Understanding between NARA and The Executive Office of the President, dated January 11, 2001; accessed Oct. 2010 at: http://www.archives.gov/presidential-libraries/laws/access/email-records-memo.html)
  • Slide 7
  • Digital content may be your favorite song, or your favorite movie.
  • Slide 8
  • Digital content may be the only version. Harvard Magazine, May/June 2009.
  • Slide 9
  • Digital content may be a work of art. Doug Aitken (American, born 1968). sleepwalkers. 2007. Six-channel video (color, sound), seven monitors, 12:57 min. The Dunn Bequest. © 2008 Doug Aitken. Photo: Fred Charles.
  • Slide 10
  • Digital content may be important to scholarship.
  • Slide 11
  • Who cares? Cultural resource institutions: museums, historical societies (MOMA's Matters in Media Arts); libraries, archives, special collections. Academic institutions. Governments (National Library of New Zealand's NDHA, NARA's ERA). The entertainment industry (AFI Digital Preservation Project).
  • Slide 12
  • Who cares? You and me, personally!
  • Slide 13
  • 2. What's the problem?
  • Slide 14
  • Digital content is: transient, fragile, hidden. (Images for comparison: cuneiform, 2400 B.C.E.; Book of Hours, 1450 C.E.)
  • Slide 15
  • Digital content is transient. The average lifespan of a web site is between 44 and 100 days. Captured April 8, 2009. Visited October 13, 2010.
  • Slide 16
  • Digital content is fragile. Digital things are amazingly easy to destroy: bad people; software or hardware failure; human mistakes. The slip of a finger or an unnoticed consequence of change can happen easily and is potentially catastrophic. Example: "Help! Accidental deletion. I accidentally deleted 62 images. Can you please recover them from backups?"
  • Slide 17
  • Digital content is hidden. Loss is not always apparent. Is either of these corrupt?
  • Slide 18
  • Digital content is hidden. Loss is not always apparent. Both are corrupt! Use helps, but it's not enough.
  • Slide 19
  • Even if it's safe, is it usable? It's not enough to preserve the bits if the format of the bits is obsolete! WordStar? AppleWorks? Excel 1.0? To use digital content, we depend on software that can understand the format.
  • Slide 20
  • The importance of format Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200...
  • Slide 21
  • The importance of format Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200... SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2...
  • Slide 22
  • The importance of format Understanding formats is fundamental to preservation ffd8ffe000104a46494600010201 008300830000ffed0fb050686f74 6f73686f7020332e30003842494d 03e90a5072696e7420496e666f00 0000007800000000004800480000 000002f40240ffeeffee03060252 0347052803fc0002000000480048 0000000002d80228000100000064 000000010003030300000001270f 0001000100000000000000000000 0000600800190190000000000000 0000000000000000000000000000 0000000000000000000000003842 494d03ed0a5265736f6c7574696f 6e0000000010008313a3000200... SOI APP0 JFIF 1.2 APP13 IPTC APP2 ICC DQT SOF0 183x512 DRI DHT SOS ECS0 RST0 ECS1 RST1 ECS2...
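The marker labels on the slide (SOI, APP0, APP13, ...) can be recovered from the hex dump by walking the JPEG segment structure. The following is a minimal sketch, not the presenters' tooling: the function names `list_markers` and `jfif_version` are made up for illustration, and the input is only the first 24 bytes of the dump shown on the slide.

```python
import struct

# Marker codes for the segment names shown on the slide (second byte after 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE2: "APP2", 0xED: "APP13",
    0xDB: "DQT", 0xC0: "SOF0", 0xDD: "DRI", 0xC4: "DHT", 0xDA: "SOS",
}


def list_markers(data: bytes) -> list:
    """Return (marker name, declared segment length) pairs found in `data`.

    Walking stops at SOS (entropy-coded data follows) or when the next
    segment lies beyond the bytes we were given.
    """
    markers = []
    pos = 0
    while pos + 2 <= len(data) and data[pos] == 0xFF:
        code = data[pos + 1]
        name = MARKER_NAMES.get(code, f"0x{code:02X}")
        pos += 2
        if code == 0xD8:  # SOI carries no length field
            markers.append((name, 0))
            continue
        if pos + 2 > len(data):
            break
        (length,) = struct.unpack(">H", data[pos:pos + 2])  # big-endian, includes itself
        markers.append((name, length))
        if code == 0xDA:
            break
        pos += length
    return markers


def jfif_version(data: bytes):
    """Return (major, minor) from a JFIF APP0 segment, or None if absent."""
    if data[2:4] == b"\xff\xe0" and data[6:11] == b"JFIF\x00":
        return data[11], data[12]
    return None


# First 24 bytes of the hex dump on the slide.
prefix = bytes.fromhex("ffd8ffe000104a46494600010201008300830000ffed0fb0")
print(list_markers(prefix))   # [('SOI', 0), ('APP0', 16), ('APP13', 4016)]
print(jfif_version(prefix))   # (1, 2)
```

Without knowledge of the format, the same bytes are just the opaque hex string on the slide; with it, they read as a JFIF 1.2 header followed by an APP13 segment.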
  • Slide 23
  • Using information content. Analog book (unmediated use): information content, symbols, language, HW (paper). Digital book (technology-mediated use): information content, bits, formats, SW, HW.
  • Slide 24
  • Formats are key to determining usability. (Diagram: information content, bits, formats, SW, HW; labels: digital content, supporting technologies.) Formats are the bridge between the content we want to preserve and supporting technologies.
  • Slide 25
  • Dependence on fleeting technology. We are dependent on technology to interpret digital content. Technologies must understand the format of the content. Technologies age and disappear!
  • Slide 26
  • 3. What's being done?
  • Slide 27
  • Primary goals of digital preservation: 1. Keep the bits safe. 2. Keep the bits useful to people.
  • Slide 28
  • 1. Keep the bits safe. Infrastructure, processes, policies and professional staff to counter risks: high-quality storage; redundancy (multiple copies, multiple locations); media refreshing (replacing); integrity monitoring (check for corruption); security and access management; content recovery.
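Integrity monitoring is commonly done by recording a checksum for each file and recomputing it later. The sketch below assumes SHA-256 and an in-memory manifest; the names `sha256_of` and `find_corrupt` are hypothetical, and a real repository would persist the manifest and schedule the checks.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt(manifest: dict, root: Path) -> list:
    """Return names whose file is missing or no longer matches its recorded checksum."""
    problems = []
    for name, recorded in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != recorded:
            problems.append(name)
    return sorted(problems)


# Demo in a throwaway directory: record a checksum, then flip one byte.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    image = root / "image.jpg"
    image.write_bytes(b"original bits")
    manifest = {"image.jpg": sha256_of(image)}
    print(find_corrupt(manifest, root))     # nothing has changed yet
    image.write_bytes(b"original bitz")     # simulate silent corruption
    print(find_corrupt(manifest, root))     # the changed file is reported
```

A checksum detects a change but cannot repair it, which is why the slide pairs integrity monitoring with redundancy and content recovery.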
  • Slide 29
  • 2. Keep the bits useful. Provide ways for people to find it. Provide ways to manage it. Keep records of history and significant events. Know what formats you have. Make sure there's technology to support the formats (technology watch). And if there's not, force there to be technology that supports the formats (migration, emulation, creation of viewing software).
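"Know what formats you have" usually starts with looking at the first bytes of each file rather than trusting its extension. The following toy sketch uses a small, hand-picked table of well-known signatures; the table and the function name `guess_format` are illustrative assumptions, and a production workflow would more likely rely on a dedicated identification tool backed by a format registry.

```python
# A few widely documented leading-byte signatures ("magic numbers").
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"PK\x03\x04", "ZIP container"),
]


def guess_format(header: bytes) -> str:
    """Guess a format from the first bytes of a file; 'unknown' if no match."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


print(guess_format(bytes.fromhex("ffd8ffe000104a464946")))  # JPEG
print(guess_format(b"%PDF-1.4\n"))                          # PDF
print(guess_format(b"plain text with no signature"))        # unknown
```

An "unknown" result is itself useful information: it flags content that may already depend on technology that is hard to find.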
  • Slide 30
  • Degrees of preservation. Passive preservation (aka bit-level preservation): better understood and less costly; will not ensure long-term usability, but ensures current and near-term usability. Active preservation (aka full preservation, aka logical preservation): more complex, challenging and costly; requires more expertise but better ensures very long-term usability; requires passive preservation.
  • Slide 31
  • Degrees of preservation. Passive preservation (aka bit-level preservation); active preservation (aka full preservation, aka logical preservation). Activities: store, secure, maintain, prevent, migrate, re-engineer software, emulate, digital archaeology, monitor, restore, add value.
  • Slide 32
  • Strategic thinking. The least expensive and most effective preservation measure is to think about the future when digital content is created! Content production matters! It makes good sense to try to influence the content creation process.
  • Slide 33
  • Preservation lifecycle. Create or acquire digital content. Ingest into a preservation repository. Continuous cycle of monitoring, planning and intervention, subject to collection management decisions. Transfer to the next generation of the repository or to a different repository. A series of hand-offs over time.
  • Slide 34
  • Ongoing commitment. Requires a continual, proactive program. You can't just stop and start. Time frames are MUCH shorter than for preservation of physical collections. Requires ongoing investment in both technology and staffing.
  • Slide 35
  • Can't do it alone. More than any other library activity, preservation responsibility must be shared across institutions. Even collectively, we do not have adequate resources or understanding.
  • Slide 36
  • Preservation community efforts. Collaborative organizations (NDSA, IIPC, OPF). Collaborative projects (AIHT, TIPR). Standards and metadata: technical metadata for still images, audio, documents; METS (package for metadata and digital objects); PREMIS (preservation metadata); preservable formats (PDF/A); repository certification. Infrastructure: formats registry (UDFR, PRONOM); repository software (Fedora, DAITSS, LOCKSS, etc.); tools (JHOVE, FITS, etc.).
  • Slide 37
  • 4. What can you do?
  • Slide 38
  • First steps. Inventory your content: identify where it is all kept (web locations, computer hard drive, removable media such as CDs). Select: decide what is worth keeping; given a choice, keep the highest-quality version; ask whether someone else is already preserving it; consider deleting content that's not needed.
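For content on a hard drive, the inventory step can be partly automated. This is a minimal sketch under stated assumptions: the function names `build_inventory` and `inventory_csv` and the three chosen columns (path, size, modification time) are illustrative choices, not something prescribed by the slides.

```python
import csv
import io
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(root: Path) -> list:
    """List every file under `root` with its relative path, size and modification time."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = Path(folder) / name
            info = path.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            rows.append({
                "path": path.relative_to(root).as_posix(),
                "bytes": info.st_size,
                "modified": modified.isoformat(timespec="seconds"),
            })
    return sorted(rows, key=lambda row: row["path"])


def inventory_csv(rows: list) -> str:
    """Render the inventory as CSV text that can be saved alongside the content."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["path", "bytes", "modified"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Demo on a throwaway folder standing in for a personal collection.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "photos").mkdir()
    (root / "photos" / "beach.jpg").write_bytes(b"\xff\xd8\xff")
    (root / "notes.txt").write_text("draft")
    print(inventory_csv(build_inventory(root)))
```

The resulting list is a starting point for the selection step: it shows what exists and how large it is, but deciding what is worth keeping remains a human judgment.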
  • Slide 39
  • Second steps. Organize your digital content: create a logical directory/folder structure for the content; give descriptive names to the files; if possible, tag the files or embed descriptions in them. Catalog your content: draft a summary description. Keep your inventory, along with a summary description of the content and of how you have organized it, in a secure location.
  • Slide 40
  • Third steps. Make multiple copies of your content. Use formats that are amenable to long-term survival; use open formats when possible. Store on durable media. Store in multiple locations, preferably in different disaster zones. Use it! Periodically check that you can access the content. Migrate to new media over time.
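Making a copy is only half the step; each copy should also be read back and compared with the original. The sketch below is one possible approach using the Python standard library; `copy_and_verify` is a hypothetical helper, and the destination folders stand in for separate drives or locations.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def copy_and_verify(source: Path, destinations: list) -> list:
    """Copy `source` into each destination folder and return the copies that match it."""
    verified = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also preserves timestamps where possible
        # shallow=False forces a byte-by-byte comparison instead of trusting size and mtime.
        if filecmp.cmp(source, copy, shallow=False):
            verified.append(copy)
    return verified


# Demo: two sub-folders of a temporary directory play the role of two storage locations.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = base / "family-photo.jpg"
    original.write_bytes(b"\xff\xd8\xff demo bytes")
    copies = copy_and_verify(original, [base / "external_drive", base / "offsite"])
    print([copy.relative_to(base).as_posix() for copy in copies])
```

Rerunning the comparison from time to time covers the "periodically check that you can access the content" advice, at least at the level of the bits.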
  • Slide 41
  • Fourth steps. Keep informed. LC's website: http://www.digitalpreservation.gov/you/ Research, training and outreach (DCC, DPC, JISC, IIPC, NEDCC): http://www.nedcc.org/curriculum/lesson.introduction.php Professional organizations (ALA, SAA). Conference proceedings (iPRES, IS&T Archiving, DLF). How to preserve your own digital materials (LC): http://www.digitalpreservation.gov/you/ 10 basic characteristics of digital preservation repositories (CRL website): http://www.crl.edu/archiving-preservation/digital-archives/metrics-assessing-and-certifying/core-re
  • Slide 42
  • Image Credits. First digital image: http://www.worldalmanac.com/blog/2007/05/the_first_digital_image.html Pong: http://www.simondelliott.com/blog/2009/01/pong-is-more-than-just-a-game-its-a-way-of-life 1998, first theatrical release: http://en.wikipedia.org/wiki/The_Last_Broadcast iPod ad: http://www.ipodhistory.com/ipod-advertising Avatar: http://www.imdb.com/title/tt0499549 Cuneiform, 2400 BC: http://en.wikipedia.org/wiki/Cuneiform_script 1450 Book of Hours in French and Latin: http://www.griffons.com/index.cfm?frm=details&piid=2811&cid=1&scid1=2&CFID=2459509&CFTOKEN=33670424 Server: http://regmedia.co.uk/2007/11/06/hp_mediasmart_server.jpg Sleepwalkers at MOMA: http://www.moma.org/explore/collection/conservation/media_art PRS data sets: http://www.prsgroup.com.ezp-prod1.hul.harvard.edu/prsgroup_shoppingcart/cdSub4.aspx Corrupt images: http://old.hki.uni-koeln.de/people/herrmann/forschung/heydegger_archiving2008.ppt New Yorker covers: June 8 and 15, 2009, and October 18, 2010.
  • Slide 43
  • Slide 44
  • 5. Questions?