“What I Learned This Summer”: A Week at SAA’s First Electronic Records Summer Camp Daniel...

“What I Learned This Summer”: A Week at SAA’s First Electronic Records Summer Camp
Daniel Linke
University Archivist and Curator of Public Policy Papers
December 14, 2007


Geisel Library at UCSD (Photo by Sara Muth)

University of California, San Diego

August 6-10, 2007

Yes, that Geisel

(Photo by Sara Muth)

Eleanor Roosevelt Campus (Photo by Sara Muth)

Our accommodations in the Asante dormitory (Photo by Sara Muth)

• My suitemates: Peter Johnson, Eric Paquette, and Dylan McDonald
• (Photo courtesy of Eric Paquette)

27 attendees from a variety of institutions (government, educational, and private repositories):

• UCSD, UC-Irvine, Harvard B. School, U. New Mexico, UT:Arlington, Occidental College, UWI:Madison
• AZ, CA, NC, and WA State Archives
• CIGNA, National Fire Protection Association, Ford, History Associates
• Sacramento Archives and Museum
• Marist Brothers of Canada

Terrace of the college commons where we took our meals (Photos by Sara Muth)

Fellow “campers”: Police Explorers Club (Photo by Sara Muth)

Our classroom was within the SDSC (Photo by Eric Paquette)

Our classroom (Photo by Chien-Yi Hou)

Some instructors standing at the back (Photo by Chien-Yi Hou)

SAA Summer School Instructors

• Mark Conrad (NARA): Preservation principles
• Mike Smorul (U Md): Preservation services
• Reagan Moore (SDSC): Data grids
• Arcot (Raja) Rajasekar (SDSC): Advanced data grids
• Richard Marciano (SDSC): Preservation applications
• Chien-Yi Hou (SDSC): Preservation applications

What the week consisted of (in format) (Photo by Chien-Yi Hou)

What the week consisted of in subjects covered

• Monday
  – Electronic Records 101 (Conrad)
  – Components of an Electronic Records Program (Conrad)
  – Infrastructure Independence (Moore)
  – mySRB Tutorial (Moore)
• Tuesday
  – Appraisal and Disposition (Conrad, Marciano, Chien-Yi)
  – Accessioning (Smorul, Marciano, Conrad)
• Wednesday
  – Arrangement (Marciano, Conrad, Moore)
  – Description (Marciano, Rajasekar, Chien-Yi, Moore)
• Thursday
  – Preservation (Moore, Smorul, Chien-Yi)
  – Access (Moore, Marciano)
• Friday
  – Scalability (Moore, Marciano)
  – Getting started (Conrad, Moore)

What are Electronic Records?

• Easy to Define
  – Any Record that Can Only be Accessed With a Computer
• Hard to Define
  – Many Records Don’t Have an Analog Equivalent
  – Often Difficult to Say Where the “Boundaries” of a Record Are

Where Do They Come From?

• Types of applications that can create electronic records
  – Word processing
  – Databases
  – Spreadsheets
  – Geographic Information Systems
  – E-mail
  – Any Computer Application Could Potentially be used to Create Electronic Records

Unique Qualities: Faster than Rabbits

• They Multiply!
• PERMANENT Federal Electronic Records
  – 1 to 5% of the Total Produced
  – Next 15 Years – 350 Petabytes Produced (Peta = 1000 TB)
  – Beyond the Current State of the Art
• Archivists can Identify the Wheat and Chaff
  – Resource Allocators are Taking Notice

Unique Qualities: Handle With Care

• They are Fragile!
  – Easily Deleted
  – Keeping the Contextual Information Linked to the Data is Difficult
    • Without this it is difficult to assert you have authentic records

Unique Qualities: Manipulation

• The Good: Organized or Used in Multiple Ways
  – Records can be more easily used.
    • Records that would be difficult to use in paper form can be used quite easily in electronic form.
• The Not So Good:
  – Records can be easily changed.

Unique Qualities: Native Habitat vs. Zoo

• Original Applications
  – Run Out of Room
  – Go Belly Up
• Moving the Records Out of Their Native Habitat can be Challenging
  – Where is the Boundary Between the Records and the Application?
  – How do You Maintain Essential Characteristics in a Zoo (aka Preservation Environment)?
  – The Formats Become Obsolete, Too!

COMPONENTS OF AN ELECTRONIC RECORDS PROGRAM

1. Policies and Mandates
2. Technical Infrastructure
3. Social Infrastructure

Technical Infrastructure

• Challenge: there are NO proven methods for the long-term retention of E/R in many formats
  – Ongoing Empirical Research: but theory does not Make it So!

Storage Resource Broker (SRB)

Infrastructure Independence

[Diagram labels: Records; Preservation Environment; Evolving Technology; External World]

Preservation environment middleware insulates records from changes in the external world

Infrastructure Independence

• Use data grids to preserve records independently of the choice of technology
• Management of archives properties
• Map technology components to preservation principles
  – Capabilities that support preservation requirements
• Construct preservation environment from components
  – Archival engineering perspective
• Use infrastructure independence to enable use of new technology
  – View that new technology is an opportunity instead of a challenge
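
The idea of middleware that insulates records from storage technology can be illustrated with a minimal Python sketch. It is not SRB or iRODS code: the class names (`StorageBackend`, `DiskLikeStore`, `TapeLikeStore`, `PreservationEnvironment`) and the in-memory dictionaries standing in for real storage are all assumptions made for illustration.

```python
from typing import Dict, Protocol, Set


class StorageBackend(Protocol):
    """Hypothetical interface: anything that can store and return bytes."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class DiskLikeStore:
    """Stand-in for one storage technology (kept in memory for the sketch)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


class TapeLikeStore(DiskLikeStore):
    """Stand-in for a newer technology that offers the same interface."""


class PreservationEnvironment:
    """Keeps logical record names stable while the backend changes."""

    def __init__(self, backend: StorageBackend) -> None:
        self._backend = backend
        self._names: Set[str] = set()

    def ingest(self, logical_name: str, data: bytes) -> None:
        self._backend.put(logical_name, data)
        self._names.add(logical_name)

    def retrieve(self, logical_name: str) -> bytes:
        return self._backend.get(logical_name)

    def migrate(self, new_backend: StorageBackend) -> None:
        """Copy every record to the new backend; logical names do not change."""
        for name in self._names:
            new_backend.put(name, self._backend.get(name))
        self._backend = new_backend


env = PreservationEnvironment(DiskLikeStore())
env.ingest("records/memo-001", b"memo text")
env.migrate(TapeLikeStore())  # technology changes underneath
print(env.retrieve("records/memo-001"))  # same logical name still works
```

Users of the environment only ever see logical names, so swapping the backend is a migration step and does not disturb access.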

Preservation Standards

• Architectural Model
  – OAIS, Reference Model for an Open Archival Information System
    • Representation information for each record
    • Submission / Archival / Dissemination Information Package (SIP / AIP / DIP)
  – Data grid - Storage Resource Broker (SRB), integrated Rule Oriented Data System (iRODS)
  – Digital Library - DSpace, Fedora
• Metadata
  – Dublin core
  – LCDRG, NARA Life Cycle Data Requirements Guide
  – PREMIS, Preservation Metadata Implementation Strategies
• Metadata organization
  – MPEG-21, ISO/IEC TR 21000-1: MPEG-21 Multimedia Framework
  – METS, Metadata Encoding and Transmission Standard
  – OAIS, Reference Model for an Open Archival Information System
• Submission / Harvesting
  – Producer Archive Interface (NASA)
  – OAI-PMH, Open Archives Initiative - Protocol for Metadata Harvesting
• Data format
  – pdf, xml, (330 formats retrievable on web crawls)
• Assessment criteria
  – RLG/NARA TRAC - Trustworthy Repositories Audit & Certification: Criteria and Checklist. http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf

Data Grid

Using a Data Grid – in Abstract

Ask for data
• User asks for data from the data grid

Data delivered
• The data is found and returned
• Where & how details are hidden

Using a Data Grid - Details

[Diagram labels: Storage Resource Broker Server (two servers); Metadata Catalog; DB; ux-brk14; ux-brk12; Oracle]

• User asks for data
• Data request goes to SRB Server
• Server looks up information in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned
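
The request flow on this slide can be sketched in a few lines of Python. This is a toy model, not the SRB API: `MetadataCatalog`, `GridServer`, the server names, and the logical path are hypothetical, and dictionaries stand in for the catalog database and for storage.

```python
from typing import Dict


class MetadataCatalog:
    """Toy stand-in for the metadata catalog: logical name -> holding server."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, logical_name: str, server_name: str) -> None:
        self._locations[logical_name] = server_name

    def locate(self, logical_name: str) -> str:
        return self._locations[logical_name]


class GridServer:
    """One server in the grid; it holds some data and can ask its peers."""

    def __init__(self, name: str, catalog: MetadataCatalog,
                 peers: Dict[str, "GridServer"]) -> None:
        self.name = name
        self._catalog = catalog
        self._peers = peers  # shared registry of all servers, keyed by name
        self._local: Dict[str, bytes] = {}
        peers[name] = self

    def store(self, logical_name: str, data: bytes) -> None:
        self._local[logical_name] = data
        self._catalog.register(logical_name, self.name)

    def request(self, logical_name: str) -> bytes:
        # Look up in the catalog which server holds the data.
        holder = self._catalog.locate(logical_name)
        if holder == self.name:
            return self._local[logical_name]
        # The first server asks the second; the caller never sees where or how.
        return self._peers[holder].request(logical_name)


catalog = MetadataCatalog()
peers: Dict[str, GridServer] = {}
first = GridServer("server-1", catalog, peers)
second = GridServer("server-2", catalog, peers)
second.store("/grid/report.txt", b"report contents")

# The user talks only to the first server.
print(first.request("/grid/report.txt"))
```

The user-facing call is the same whether the data sits on the contacted server or on a peer, which is the "where & how details are hidden" point from the abstract view.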

For more details, see:

Moore, Reagan, “Building Preservation Environments with Data Grid Technology”, American Archivist, vol. 69, no. 1, pp. 139-158, July 2006

Appraisal of ER: Get There Early

• Records Need to be Appraised:
  – Early in Their Lifecycle
    • Fragile
    • Ephemeral
  – In Their Native Habitat
    • Functionality

Technical Appraisal

• For Permanent Records Have to Conduct Technical Appraisal
  – Feasibility of Preserving the Records
  – Identify all of the Digital Objects
  – Essential Characteristics
• At Scale!

Bootcamp continued…

Appraise this !@#$

Disposition

In Action…

Arrangement

In Action…

Electronic Records "Summer Camp"

Tapping into Archival Knowledge

The Website

The Website, cont’d

Formulating Appraisal Rules

• Retrieve root webpage ‘http://water.usgs.gov/lookup/getgislist’
• For each entry:
  – Create a “matching entry” collection on the SRB
  – Add ‘entry description’ metadata to that collection
  – Create “Description” subcollection
    • Load web page
    • Load all “.gif” | “.jpg” | “.jpeg” files
    • Load all “.doc” files
    • Load metadata file
  – Create “ArcINFO” subcollection
    • Load all “.e00” | “.clr” | “.asc” | “.nit” | “.dlg” | “.txt” files
  – Create “Shape” subcollection
    • Load all “.shp” files
  – Create “SDTS” subcollection
    • Load all “.sdts” files
  – Create “Others” subcollection
    • Load “.tfw” | “.rdb” | “.clr” | “.asc” | “.prj” files
  – DECOMPRESS & LOAD “.zip” | “.gz” | “.tgz” | “.tar” | “.tar.gz” files
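
The extension-based part of these rules can be sketched as a small lookup in Python. The function and variable names are hypothetical, and two choices are assumptions that the slide does not settle: “.clr” and “.asc” appear under both “ArcINFO” and “Others”, so the sketch lets the first matching rule win, and files matching no rule return `None`. The web page and metadata file go to “Description” by role, not by extension, so they are not covered here.

```python
import os
from typing import Optional

# Rules in slide order; the first rule that matches wins (an assumption).
SUBCOLLECTION_RULES = [
    ("Description", (".gif", ".jpg", ".jpeg", ".doc")),
    ("ArcINFO", (".e00", ".clr", ".asc", ".nit", ".dlg", ".txt")),
    ("Shape", (".shp",)),
    ("SDTS", (".sdts",)),
    ("Others", (".tfw", ".rdb", ".clr", ".asc", ".prj")),
]

# Archives are decompressed first, then their members are classified.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tgz", ".tar")


def is_archive(filename: str) -> bool:
    """True if the file should be decompressed before loading."""
    return filename.lower().endswith(ARCHIVE_SUFFIXES)


def subcollection_for(filename: str) -> Optional[str]:
    """Return the subcollection name for a file, or None if no rule matches."""
    extension = os.path.splitext(filename.lower())[1]
    for name, extensions in SUBCOLLECTION_RULES:
        if extension in extensions:
            return name
    return None


for example in ("basin.shp", "cover.e00", "legend.gif", "grid.prj", "data.zip"):
    print(example, subcollection_for(example), is_archive(example))
```

Writing the rules as data keeps the appraisal decision visible to archivists and lets it be revised without touching the loading logic.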

E-FOIA Document Collections: Dept. of State

National Archives and Records Administration Transcontinental Persistent Archive Prototype

Federation of Five Independent Data Grids

[Diagram labels: U Md, SDSC, Georgia Tech, NARA I, NARA II, each with its own MCAT]

Extensible Environment, can federate with additional research and education sites. Each data grid uses different vendor products.

ACE – Basic Methodology

• Three-tiered Cryptographic Information.
• Each tier is periodically audited separately according to policies set by managers.

[Diagram labels: Integrity Token; Cryptographic Summary Information; Witness; links labeled k:1 and l:1]

Integrity Token
• 1 IT/object
• ~1KB

Cryptographic Summary Information
• 1 CSI/time window
• 1 CSI / (n) objects
• ~100MB/year

Witness
• 1 Witness/week
• ~2-3KB/year
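
A heavily simplified Python sketch of the three tiers follows. The slide does not say how ACE actually computes each value, so the sketch makes its own assumptions: SHA-256 as the hash, a summary formed by hashing the sorted tokens of a time window, and a witness formed by hashing a list of summaries. All function names are hypothetical.

```python
import hashlib
from typing import List


def digest(data: bytes) -> str:
    """SHA-256 hex digest; the choice of hash is an assumption."""
    return hashlib.sha256(data).hexdigest()


def integrity_token(object_bytes: bytes) -> str:
    """Tier 1: one token per object."""
    return digest(object_bytes)


def summary(tokens: List[str]) -> str:
    """Tier 2: one summary per time window, covering n object tokens."""
    return digest("".join(sorted(tokens)).encode("ascii"))


def witness(summaries: List[str]) -> str:
    """Tier 3: one small value covering many summaries (e.g. one per week)."""
    return digest("".join(summaries).encode("ascii"))


def audit_object(object_bytes: bytes, stored_token: str) -> bool:
    """Re-compute the token and compare it with the stored one."""
    return integrity_token(object_bytes) == stored_token


objects = [b"record one", b"record two", b"record three"]
tokens = [integrity_token(item) for item in objects]
window_summary = summary(tokens)
weekly_witness = witness([window_summary])

print(audit_object(objects[0], tokens[0]))         # unchanged object passes
print(audit_object(b"record one!", tokens[0]))     # altered object fails
print(witness([summary(tokens)]) == weekly_witness)  # tiers re-derive consistently
```

Each tier can be re-derived from the one below it, which is what lets the tiers be audited separately and on different schedules.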

(Photos by Sara Muth)

End of the day

Club Asante

Photos by Sara Muth (top) and Eric Paquette (right)

Commemorative Corkscrew

(Photo by Gary Spurr)

Acknowledgments

Slides with text are from the course instructors’ PowerPoint presentations: Conrad et al.

Photos as credited.

(Photo by Eric Paquette)