Planning for Digital Preservation
Planning for Preservation
Digital preservation issues come up much faster than traditional preservation issues
Digital resources need on-going attention
Build a preservation strategy into your project from the start
Keep dealing with the short-term issues and you won’t ever need to face the long-term problem
Issues
The content of digital resources is only accessible with the aid of intermediary technologies
Digital resources are complex
Reliance on a specific combination of formats, software and hardware to operate correctly
I.T. develops rapidly, and resources can become obsolete very quickly
Three Key Areas
Content – the bits and bytes
Technologies: software systems; hardware; websites; access and delivery systems
Organisational
Planning for the Future
Short-term: Initial technology still current and actively supported - 0-5 years
Medium-term: Initial technology still in use and supported, but no longer used for new work - 5-10 years
Long-term: Initial technology no longer used or supported - 10+ years
….. In the Short Term
Making digital assets available
Website administration
Website updates
Software and operating system patches
Periodic backups
Periodic checks on master copies
…….. In the Medium Term
Keeping your existing digital outputs ‘up and running’
Upgrading operating systems and software
Upgrading hardware
Replacing hardware components
Refreshing master copies
Periodic backups
Periodic checks on master copies
…….. In the Longer Term
Overcoming technological obsolescence to preserve a usable digital resource
Introducing completely new software
Replacing entire hardware systems
Enhancing functionality
Periodic backups
Periodic checks on master copies
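"Periodic checks on master copies" appears in all three time frames. One common way to carry out such a check is to record a checksum for every master file and compare against that record later. The sketch below assumes SHA-256 and a simple in-memory manifest; the function names are illustrative and not part of any tool named in these slides.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

In use, the manifest would be built once when the master copies are created, stored separately from them, and passed to `check_manifest` at each periodic check. Any entry under "missing" or "changed" calls for restoring that file from a backup.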
During the Data Creation Phase
Importance of backups
Preferably more than one copy, on and off site
Appropriate frequency
More than one file format
Check your backups
But backup is not preservation!
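"Check your backups" can be partly automated. The following is a minimal sketch, assuming the backup is a plain directory tree that mirrors the working directory; `verify_backup` is a hypothetical helper name, and backups held in archive files or on tape would need a different approach.

```python
import filecmp
from pathlib import Path


def verify_backup(source: Path, backup: Path) -> dict[str, list[str]]:
    """Report source files that are absent from, or differ in, the backup."""
    missing, different = [], []
    for original in sorted(source.rglob("*")):
        if not original.is_file():
            continue
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            missing.append(relative.as_posix())
        elif not filecmp.cmp(original, copy, shallow=False):
            # shallow=False compares file contents, not just size and timestamp
            different.append(relative.as_posix())
    return {"missing": missing, "different": different}
```

An empty report only shows that the backup matches the current files. It says nothing about whether those files will still be usable in ten years, which is why backup is not preservation.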
What to Preserve?
Significant Characteristics
Very difficult to preserve everything (data, functionality and interaction) about a digital resource
Documented or commonly understood significant characteristics help simplify preservation action
Analogue……
Book - Significant: Words, paragraphs, chapters, author, publication date, …
Not Significant: Binding, print run, font, colour of paper, …
Newspapers - Significant: Words, paragraphs, headlines, size of type, date, page number of article, …
Not Significant: Size of page, spacing, text justification, colour of paper, …
Digital………
There is a shared understanding of what is important in a paper-based resource
Less agreement about what is important in a digital resource
Complicated to decide as software and formats support many options that are not knowingly used but have default settings
Questions to ask….
What are the significant characteristics of your digital outputs?
What are the digital objects that make up your resource?
What is the purpose of your digital resource?
Think about the problem in terms of content and purpose
Very difficult (if not impossible) to ensure your resource stays exactly the same in the future
What can change without adverse effects?
What changes must be limited, and by how much?
How can you check changes are acceptable?
Assessing the scale of the Preservation Task
Estimating volume and type:
Textual Documents
Still Images
Moving Images
Audio files
Numeric dataset
Database
Markup Documents (XML etc.)
CAD
GIS
Virtual reality
Website
Software executable
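A first estimate of volume and type can be produced by walking the project directory and tallying files by extension. This is a rough sketch only: the extension-to-category table is an illustrative assumption and would need extending to cover your own holdings, and extensions are not a reliable guide to actual file content.

```python
from collections import defaultdict
from pathlib import Path

# Illustrative mapping only; extend it to match your own holdings.
CATEGORY_BY_EXTENSION = {
    ".txt": "Textual documents",
    ".pdf": "Textual documents",
    ".tif": "Still images",
    ".jpg": "Still images",
    ".mpg": "Moving images",
    ".mov": "Moving images",
    ".wav": "Audio files",
    ".mp3": "Audio files",
    ".csv": "Numeric datasets",
    ".xml": "Markup documents",
}


def summarise_volume(root: Path) -> dict[str, dict[str, int]]:
    """Count files and total bytes per category under root."""
    totals = defaultdict(lambda: {"files": 0, "bytes": 0})
    for path in root.rglob("*"):
        if path.is_file():
            category = CATEGORY_BY_EXTENSION.get(path.suffix.lower(), "Unclassified")
            totals[category]["files"] += 1
            totals[category]["bytes"] += path.stat().st_size
    return dict(totals)
```

A large "Unclassified" total is itself useful information: it points to formats that have not yet been reviewed.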
Risk Assessment for file formats used
Review data types and file formats
Assess the risks associated with those file formats
Establish policy for dealing with them
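The three steps above (review, assess, establish policy) can be written down as a small scoring table. The criteria, the 0-3 scale and the policy wording below are assumptions made for illustration, not a published risk model; a real assessment would use criteria agreed within your organisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatProfile:
    """What the review step records about one file format (illustrative fields)."""
    name: str
    open_specification: bool
    widely_supported: bool
    lossy: bool


def risk_score(profile: FormatProfile) -> int:
    """Count risk factors: 0 is the lowest risk, 3 the highest."""
    return (
        int(not profile.open_specification)
        + int(not profile.widely_supported)
        + int(profile.lossy)
    )


def policy_for(score: int) -> str:
    """Map a risk score to a (hypothetical) policy decision."""
    if score == 0:
        return "keep as preservation format"
    if score == 1:
        return "keep, but review regularly"
    return "migrate to a lower-risk format"
```

Writing the policy as a function keeps decisions consistent across a collection and documents the reasoning for whoever inherits the resource.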
Preservation Metadata
Metadata needed to manage preservation of digital collections: technical; administrative
Not necessarily a “complete set” of preservation metadata elements
Possible sources:
OCLC/RLG Working Group; the Consultative Committee for Space Data Systems; CEDARS project; The UK National Archives (formerly the Public Record Office); Arts and Humanities Data Service; NEDLIB project; California Digital Library; Harvard University Library
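As a sketch of what a small, deliberately incomplete set of technical and administrative metadata might look like, the record below captures a handful of values that can be gathered automatically from a file. The field names are assumptions made for this example and are not taken from any of the schemes listed above.

```python
import hashlib
import mimetypes
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class PreservationRecord:
    # Technical metadata
    filename: str
    size_bytes: int
    mime_type: str
    sha256: str
    # Administrative metadata
    last_modified: str
    recorded_at: str


def describe(path: Path) -> PreservationRecord:
    """Collect basic preservation metadata for one file."""
    stat = path.stat()
    guessed, _ = mimetypes.guess_type(path.name)  # guess is based on the extension only
    return PreservationRecord(
        filename=path.name,
        size_bytes=stat.st_size,
        mime_type=guessed or "application/octet-stream",
        sha256=hashlib.sha256(path.read_bytes()).hexdigest(),
        last_modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```

`asdict(record)` turns a record into a plain dictionary that can be written out as JSON or CSV alongside the files it describes.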
File Structure
Create an overview of the file structure
Create a list of all files
Create a logical file strategy from the outset
Choose consistent filenames
Avoid re-using the same filename, even in separate folders.
Store files in a logical order, with system files and content files kept apart.
A summary of contents may be included with each file.
Keep a record of encryption keys – important for preservation.
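Two of the points above, listing all files and avoiding re-used filenames, are easy to check mechanically. A minimal sketch follows; the function names are illustrative, and filenames are compared case-insensitively on the assumption that the resource may one day move to a case-insensitive file system.

```python
from collections import defaultdict
from pathlib import Path


def list_files(root: Path) -> list[str]:
    """Return every file under root as a sorted list of relative paths."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


def reused_filenames(root: Path) -> dict[str, list[str]]:
    """Find filenames that occur in more than one folder."""
    by_name = defaultdict(list)
    for relative in list_files(root):
        by_name[Path(relative).name.lower()].append(relative)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

The output of `list_files` can be saved as the "list of all files", and `reused_filenames` should ideally return an empty dictionary.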
Preservation Strategies: Content
Migration: convert the data to work with new applications
Emulation: convert the data, application (and operating system) to work on new hardware
Technology preservation: Keep everything running
Virtual computing: create a standard ‘virtual’ runtime environment
Migration on demand: convert original format directly into up-to-date format
Theory ----- Practice
In practice, migration is the simplest and most common approach
Limitations of migration are:
Can be difficult to ensure accurate migration
Does not capture functionality, only (possibly partial) data
May need to be repeated frequently
Might lead to ‘mutation’ over time
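Because accurate migration is hard to guarantee, each migration step is best paired with a verification step. The sketch below uses a deliberately simple case, re-encoding a legacy text file as UTF-8, and checks that converting back reproduces the original bytes exactly. The function name and the default source encoding are assumptions for this example; most real migrations (images, video, databases) are not reversible like this and need format-specific checks.

```python
from pathlib import Path


def migrate_to_utf8(source: Path, target: Path, source_encoding: str = "latin-1") -> None:
    """Re-encode a legacy text file as UTF-8 and verify the result."""
    original_bytes = source.read_bytes()
    # Decoding raises UnicodeDecodeError if the bytes do not fit the stated encoding.
    text = original_bytes.decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))

    # Verification: converting the migrated file back must reproduce the original.
    round_trip = target.read_bytes().decode("utf-8").encode(source_encoding)
    if round_trip != original_bytes:
        raise ValueError(f"Migration of {source.name} altered the content")
```

Keeping the original file until the check has passed, and recording that the check was done, helps guard against the gradual ‘mutation’ mentioned above.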
Migrating to new standards – but which one?
"The good thing about standards is that there are so many to choose from" (A. Tanenbaum)
Quicktime 1.0 1992
MPEG-1 1992
Real Media 1995
MPEG-2 1996
RealVideo 1997
MPEG-4 1999
Quicktime 5.0 1999
Active Streaming Format 1999
DIVX 5.0 2002
The number of A/V “de-facto” standard formats has exploded in the past five years, and this does not cover the dozens of audio and video codec combinations!
Measuring Longevity of Standard
Who developed it? Microsoft, Moving Picture Experts Group, etc.
Has it received mainstream support? Can your hardware save data in that format?
What organisations are using it? Is it used in industry?
Is it widely accepted by the professional and amateur community?
Technology watch – check web sites, developer forums and newsgroups.
Has it been submitted as an ISO standard?
Measuring Longevity of Standard
Are there any legal actions to change the standard?
Is there a licensing fee?
What tools are available to create and manipulate the format?
Open source vs. proprietary
PRONOM – National Archives database of 250 software products, 550 file formats and 100 manufacturers
Can I execute these tools on my computer? Java, Windows-only, Mac-only
Choosing a Suitable Migration Path
What are the main features? Small file size, streaming support
Will it support your specialist needs? Subtitles, DRM, Internet delivery, etc.
Does it provide sufficient quality? Lossless vs. lossy compression.
Will it impose any restrictions on use? Can it actually be played by your target audience?
Is the standard stable or does it change frequently?
How will this affect your desire to use the format?
Migration problems
Have you encountered any problems when accessing these files in other applications?
Quirks (text not displaying, desynchronised audio/video, upside-down video playback).
Version incompatibilities
Migrating to other formats
Are there any other problems when exporting to other formats? E.g. lossless-to-lossless conversion, non-editable output
Document quirks & incompatibilities for later.
Updating Hardware
Hardware has changed dramatically in the last 3 years
Memory – DDR vs. SD-RAM
CPU – pin compatibility
Graphics cards – AGP 2x, 4x, 8x
Operating system – will Windows NT4/98 run on newer hardware?
Do you upgrade existing hardware or replace it with new equipment?
Updating Software
Software changes on a frequent basis
Four service packs available for Windows 2000.
Microsoft issues 3 patches per week on average.
Legal action forces changes to plugin handling.
In addition, there are an estimated 20 unpatched vulnerabilities in Internet Explorer alone (PivX Solutions).
Do you upgrade to a later operating system or continue to use an operating system & software with known security flaws?
Preserving Your Website: technical issues
Standards And Formats
Has the Web site been designed using open standards, which should help future-proofing?
Have proprietary formats been used (for which backwards compatibility may not be considered)?
Architecture & Implementation
Has the technical architecture of the Web site been documented?
Can you continue to use technical systems after funding has finished?
Preserving Your Website: content issues
Accuracy: Is the content of the Web site accurate today? Who will make changes, and how? Could the content of the Web site be misleading in the future?
Usability: Maintaining links – short, medium and long term
Legal: Is the Web site legal (accessibility; copyright; defamation; IPR; …)? Will the Web site be legal tomorrow, if new legislation is enacted? How will you know, and who will make necessary changes?
Maintaining a Website
Run a link check across the Web site. Fix broken internal links and as many external links as is reasonable. Document the link report.
Run HTML (and CSS) validation checks across the Web site. Fix as many invalid pages as is reasonable. Document the findings.
Run an accessibility check across the Web site. Fix as many inaccessible pages as is reasonable. Document the findings.
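The link check in the first step can be sketched with the standard library alone. The example below assumes the site's pages have already been gathered into a dictionary of URL to HTML (for instance from a local copy of the site); it reports internal links that point to no known page and ignores external links, which would need a network request to test. `LinkCollector` and `broken_internal_links` are names invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def broken_internal_links(pages: dict[str, str], base_url: str) -> dict[str, list[str]]:
    """Return, per page URL, the internal links that match no known page."""
    site = urlparse(base_url).netloc
    report = {}
    for url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        broken = []
        for href in collector.links:
            target = urljoin(url, href).split("#")[0]  # resolve relative links, drop fragments
            if urlparse(target).netloc == site and target not in pages:
                broken.append(href)
        if broken:
            report[url] = broken
    return report
```

Saving the returned report with a date is one simple way to "document the link report" each time the check is run.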
Maintaining a Website
Address technical areas:
Remove any backend scripts which are no longer needed
Remember that scripts, etc. are liable to go wrong.
Ensure that applications are configured to break gracefully and provide meaningful errors – tell users who to contact if they find an error
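A minimal sketch of "breaking gracefully": wrap each page handler so that an unexpected failure is logged in full for the maintainers, while the user sees a short message saying who to contact. The contact address, logger name and function name are hypothetical, and a real site would do this through its web framework's own error-handling hooks.

```python
import logging
from typing import Callable

logger = logging.getLogger("website")
CONTACT = "webmaster@example.org"  # hypothetical contact address


def handle_request(handler: Callable[[], str]) -> tuple[int, str]:
    """Run a page handler; on failure, log the details and return a helpful message."""
    try:
        return 200, handler()
    except Exception:
        # The full traceback goes to the log, not to the user.
        logger.exception("Unhandled error in %s", getattr(handler, "__name__", handler))
        return 500, (
            "Sorry, something went wrong while loading this page. "
            f"Please report the problem to {CONTACT}."
        )
```

Keeping internal error details out of the user-facing message avoids exposing information about the backend, while the log still holds what is needed to fix the fault.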
Procedures framework
From start to finish:
Creation and Management Manuals within Procedures Framework
Key File Format Conversion Guides
Digital Object Preservation Handbook: a ‘how-to’ guide
Options for Ensuring Preservation
Once a project is completed……………
Live (supported) system
Archived
Organisational Repository
‘Shelved’
Abandoned
Not Recommended……..
Abandoned
May be appropriate, probably isn’t, think about archiving the resource instead
‘Shelved’
Don’t - shelving a digital resource without active, on-going attention is highly likely to result in its loss
Media degradation
Software and hardware obsolescence
Loss of knowledge about the resource
Recommended……. But Think About
Live System
Importance of functionality/interface
Organisational buy-in: who is running the system, and what is their commitment to it?
What will happen if the system is shut down?
Is the digital resource completed or on-going?
Who Pays?
Recommended…… But Think About
Deposit in an Archive
Is the digital resource going to a trusted archive?
Are only some aspects of the resource being archived?
Will it be available for others to use?
Will the resource be updated in the future?
Costs?
Recommended…….. But think about
Establish a Repository
Business model and financial plan
Management and administrative processes
Policies and procedures
Systems and tools
Software and hardware
Resource curation
Metadata and documentation
Preservation management
Establishing Requirements
A pragmatic approach – workable and achievable
Preservation requirements
Establish common practices, procedures and use of standards
Investigate and establish hardware, systems, and tools requirements
Investigate and evaluate products
Business planning and costings
Developing the Architecture
The architecture must support:
The entire activity cycle including ingest, data management, storage, long term preservation, discovery, access and delivery
All necessary security aspects
Complex resources
Discovery and delivery options
Summary
Build in preservation right from the start
Document decisions/policies/procedures
Balance longevity with innovation
Be ruthless about what you must keep and what can be discarded
Think content and functionality
Planning
It’s a continuous process – not a one-off