“What I Learned This Summer”:  A Week at SAA’s First Electronic Records Summer Camp


“What I Learned This Summer”: A Week at SAA’s First Electronic Records Summer Camp. Daniel Linke, University Archivist and Curator of Public Policy Papers. December 14, 2007. Geisel Library at UCSD (Photo by Sara Muth). University of California, San Diego, August 6-10, 2007.

Transcript of “What I Learned This Summer”:  A Week at SAA’s First Electronic Records Summer Camp

Page 1: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

“What I Learned This Summer”:
A Week at SAA’s First Electronic Records Summer Camp

Daniel Linke
University Archivist and Curator of Public Policy Papers

December 14, 2007

Page 2: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Geisel Library at UCSD (Photo by Sara Muth)

University of California, San Diego

August 6-10, 2007

Page 3: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Yes, that Geisel

(Photo by Sara Muth)

Page 4: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Eleanor Roosevelt Campus (Photo by Sara Muth)

Page 5: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Our accommodations in the Asante dormitory (Photo by Sara Muth)

Page 6: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

• My suitemates: Peter Johnson, Eric Paquette, and Dylan McDonald

(Photo courtesy of Eric Paquette)

Page 7: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

27 attendees from a variety of institutions (government, educational, and private repositories):

• UCSD, UC-Irvine, Harvard B. School, U. New Mexico, UT:Arlington, Occidental College, UWI:Madison
• AZ, CA, NC, and WA State Archives
• CIGNA, National Fire Protection Association, Ford, History Associates
• Sacramento Archives and Museum
• Marist Brothers of Canada

Page 8: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Terrace of the college commons where we took our meals (Photos by Sara Muth)

Page 9: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Fellow “campers”: Police Explorers Club (Photo by Sara Muth)

Page 10: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Our classroom was within the SDSC (Photo by Eric Paquette)

Page 11: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Our classroom (Photo by Chien-Yi Hou)

Page 12: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Some instructors standing at the back (Photo by Chien-Yi Hou)

Page 13: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

SAA Summer School Instructors

• Mark Conrad (NARA): Preservation principles
• Mike Smorul (U Md): Preservation services
• Reagan Moore (SDSC): Data grids
• Arcot (Raja) Rajasekar (SDSC): Advanced data grids
• Richard Marciano (SDSC): Preservation applications
• Chien-Yi Hou (SDSC): Preservation applications

Page 14: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

What the week consisted of (in format) (Photo by Chien-Yi Hou)

Page 15: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

What the week consisted of in subjects covered

• Monday
  – Electronic Records 101 (Conrad)
  – Components of an Electronic Records Program (Conrad)
  – Infrastructure Independence (Moore)
  – mySRB Tutorial (Moore)
• Tuesday
  – Appraisal and Disposition (Conrad, Marciano, Chien-Yi)
  – Accessioning (Smorul, Marciano, Conrad)
• Wednesday
  – Arrangement (Marciano, Conrad, Moore)
  – Description (Marciano, Rajasekar, Chien-Yi, Moore)
• Thursday
  – Preservation (Moore, Smorul, Chien-Yi)
  – Access (Moore, Marciano)
• Friday
  – Scalability (Moore, Marciano)
  – Getting started (Conrad, Moore)

Page 16: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

What are Electronic Records?

• Easy to Define
  – Any Record that Can Only be Accessed With a Computer
• Hard to Define
  – Many Records Don’t Have an Analog Equivalent
  – Often Difficult to Say Where the “Boundaries” of a Record Are

Page 17: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Where Do They Come From?

• Types of applications that can create electronic records
  – Word processing
  – Databases
  – Spreadsheets
  – Geographic Information Systems
  – E-mail
  – Any Computer Application Could Potentially be used to Create Electronic Records

Page 18: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Unique Qualities: Faster than Rabbits

• They Multiply!
• PERMANENT Federal Electronic Records
  – 1 to 5% of the Total Produced
  – Next 15 Years – 350 Petabytes Produced (Peta = 1000 TB)
  – Beyond the Current State of the Art
• Archivists can Identify the Wheat and Chaff
  – Resource Allocators are Taking Notice

Page 19: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Unique Qualities: Handle With Care

• They are Fragile!
  – Easily Deleted
  – Keeping the Contextual Information Linked to the Data is Difficult
    • Without this it is difficult to assert you have authentic records

Page 20: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Unique Qualities: Manipulation

• The Good: Organized or Used in Multiple Ways
  – Records can be more easily used.
    • Records that would be difficult to use in paper form can be used quite easily in electronic form.
• The Not So Good:
  – Records can be easily changed.

Page 21: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Unique Qualities: Native Habitat vs. Zoo

• Original Applications
  – Run Out of Room
  – Go Belly Up
• Moving the Records Out of Their Native Habitat can be Challenging
  – Where is the Boundary Between the Records and the Application?
  – How do You Maintain Essential Characteristics in a Zoo (aka Preservation Environment)?
  – The Formats Become Obsolete, Too!

Page 22: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

COMPONENTS OF AN ELECTRONIC RECORDS PROGRAM

1. Policies and Mandates
2. Technical Infrastructure
3. Social Infrastructure

Page 23: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Technical Infrastructure

• Challenge: there are NO proven methods for the long-term retention of E/R in many formats
  – Ongoing Empirical Research: but theory does not Make it So!

Page 24: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Storage Resource Broker (SRB)

Page 25: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp
Page 26: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp
Page 27: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Infrastructure Independence

[Diagram labels: Records; Preservation Environment; Evolving Technology; External World]

Preservation environment middleware insulates records from changes in the external world

Page 28: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Infrastructure Independence

• Use data grids to preserve records independently of the choice of technology
• Management of archives properties
• Map technology components to preservation principles
  – Capabilities that support preservation requirements
• Construct preservation environment from components
  – Archival engineering perspective
• Use infrastructure independence to enable use of new technology
  – View that new technology is an opportunity instead of a challenge
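
The idea of infrastructure independence can be illustrated with a minimal Python sketch. It is not SRB or iRODS code: the names `StorageBackend`, `InMemoryBackend`, and `PreservationEnvironment` are hypothetical, invented here only to show how a middleware layer might keep logical record names stable while the storage technology underneath is swapped.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface that any storage technology must satisfy."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Toy backend standing in for a disk array, tape silo, and so on."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class PreservationEnvironment:
    """Assumed middleware layer: users see logical names, never the backend."""

    def __init__(self, backend):
        self._backend = backend
        self._names = set()

    def ingest(self, logical_name, data):
        self._backend.put(logical_name, data)
        self._names.add(logical_name)

    def retrieve(self, logical_name):
        return self._backend.get(logical_name)

    def migrate(self, new_backend):
        # Copy every record to the new technology; logical names do not change.
        for name in sorted(self._names):
            new_backend.put(name, self._backend.get(name))
        self._backend = new_backend


env = PreservationEnvironment(InMemoryBackend())
env.ingest("reports/1999/annual.txt", b"annual report text")
env.migrate(InMemoryBackend())  # new technology arrives
print(env.retrieve("reports/1999/annual.txt"))
```

In this sketch a migration touches only the backend object, which is one way to read the slide's point that new technology becomes an opportunity rather than a challenge.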

Page 29: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Preservation Standards

• Architectural Model
  – OAIS, Reference Model for an Open Archival Information System
    • Representation information for each record
    • Submission / Archival / Dissemination Information Package (SIP / AIP / DIP)
  – Data grid - Storage Resource Broker (SRB), integrated Rule Oriented Data System (iRODS)
  – Digital Library - DSpace, Fedora
• Metadata
  – Dublin Core
  – LCDRG, NARA Life Cycle Data Requirements Guide
  – PREMIS, Preservation Metadata Implementation Strategies
• Metadata organization
  – MPEG-21, ISO/IEC TR 21000-1: MPEG-21 Multimedia Framework
  – METS, Metadata Encoding and Transmission Standard
  – OAIS, Reference Model for an Open Archival Information System
• Submission / Harvesting
  – Producer Archive Interface (NASA)
  – OAI-PMH, Open Archives Initiative - Protocol for Metadata Harvesting
• Data format
  – pdf, xml, (330 formats retrievable on web crawls)
• Assessment criteria
  – RLG/NARA TRAC - Trustworthy Repositories Audit & Certification: Criteria and Checklist. http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf

Page 30: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Data Grid

Using a Data Grid – in Abstract

[Diagram labels: Ask for data; Data delivered]

• User asks for data from the data grid
• The data is found and returned
• Where & how details are hidden

Page 31: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Using a Data Grid - Details

[Diagram labels: Storage Resource Broker Server (two instances); Metadata Catalog; DB; ux-brk14; ux-brk12; Oracle]

• User asks for data
• Data request goes to SRB Server
• Server looks up information in catalog
• Catalog tells which SRB server has data
• 1st server asks 2nd for data
• The data is found and returned
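
The request flow on this slide can be traced in a few lines of Python. This is a toy model, not the SRB API: `MetadataCatalog`, `GridServer`, and the server names `server-a` and `server-b` are all assumptions made for illustration.

```python
class MetadataCatalog:
    """Hypothetical stand-in for the catalog database: logical name -> server."""

    def __init__(self):
        self._locations = {}

    def register(self, logical_name, server_name):
        self._locations[logical_name] = server_name

    def locate(self, logical_name):
        return self._locations[logical_name]


class GridServer:
    """Toy server that answers requests for data it may not hold itself."""

    def __init__(self, name, catalog, peers):
        self.name = name
        self.catalog = catalog
        self.peers = peers          # shared dict: server name -> GridServer
        self.local_store = {}
        peers[name] = self

    def store(self, logical_name, data):
        self.local_store[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def request(self, logical_name):
        # 1. Look up which server holds the data.
        holder = self.catalog.locate(logical_name)
        # 2. Serve it locally, or ask the second server on the user's behalf.
        if holder == self.name:
            return self.local_store[logical_name]
        return self.peers[holder].local_store[logical_name]


catalog, peers = MetadataCatalog(), {}
first = GridServer("server-a", catalog, peers)
second = GridServer("server-b", catalog, peers)
second.store("maps/county.shp", b"shape data")

# The user talks only to the first server; where the data lives stays hidden.
print(first.request("maps/county.shp"))
```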

Page 32: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

For more details, see:

Moore, Reagan, “Building Preservation Environments with Data Grid Technology”, American Archivist, vol. 69, no. 1, pp. 139-158, July 2006

Page 33: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Appraisal of ER: Get There Early

• Records Need to be Appraised:
  – Early in Their Lifecycle
    • Fragile
    • Ephemeral
  – In Their Native Habitat
    • Functionality

Page 34: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Technical Appraisal

• For Permanent Records Have to Conduct Technical Appraisal
  – Feasibility of Preserving the Records
  – Identify all of the Digital Objects
  – Essential Characteristics
• At Scale!

Page 35: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Bootcamp continued…

[Slide captions: Appraise this !@#$; Disposition In Action…; Arrangement In Action…]

Page 36: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Electronic Records "Summer Camp"

Tapping into Archival Knowledge

Page 37: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

The Website

Page 38: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

The Website, cont’d

Page 39: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Formulating Appraisal Rules

• Retrieve root webpage ‘http://water.usgs.gov/lookup/getgislist’
• For each entry:
  – Create a “matching entry” collection on the SRB
  – Add ‘entry description’ metadata to that collection
  – Create “Description” subcollection
    • Load web page
    • Load all “.gif” | “.jpg” | “.jpeg” files
    • Load all “.doc” files
    • Load metadata file
  – Create “ArcINFO” subcollection
    • Load all “.e00” | “.clr” | “.asc” | “.nit” | “.dlg” | “.txt” files
  – Create “Shape” subcollection
    • Load all “.shp” files
  – Create “SDTS” subcollection
    • Load all “.sdts” files
  – Create “Others” subcollection
    • Load “.tfw” | “.rdb” | “.clr” | “.asc” | “.prj” files
  – DECOMPRESS & LOAD “.zip” | “.gz” | “.tgz” | “.tar” | “.tar.gz” files
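
The extension rules above lend themselves to a small routing function. The Python sketch below is one possible reading, not the code used in the course: the function name `classify` and the `"DECOMPRESS"` marker are hypothetical. The slide lists “.clr” and “.asc” under both ArcINFO and Others without saying which wins, so this sketch assumes the first matching rule applies. It also leaves out the web page and metadata file, whose file names the slide does not specify.

```python
from pathlib import PurePosixPath

# Subcollection rules in slide order; first match wins (an assumption).
RULES = [
    ("Description", {".gif", ".jpg", ".jpeg", ".doc"}),
    ("ArcINFO", {".e00", ".clr", ".asc", ".nit", ".dlg", ".txt"}),
    ("Shape", {".shp"}),
    ("SDTS", {".sdts"}),
    ("Others", {".tfw", ".rdb", ".clr", ".asc", ".prj"}),
]

# Archives are unpacked first, then their contents are classified in turn.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip", ".gz", ".tgz", ".tar")


def classify(filename):
    """Return the target subcollection, "DECOMPRESS", or None if no rule fits."""
    name = filename.lower()
    if name.endswith(ARCHIVE_SUFFIXES):
        return "DECOMPRESS"
    suffix = PurePosixPath(name).suffix
    for subcollection, extensions in RULES:
        if suffix in extensions:
            return subcollection
    return None


for example in ["roads.shp", "basin.e00", "cover.JPG", "bundle.tar.gz", "readme"]:
    print(example, "->", classify(example))
```

Keeping the rules in a plain list makes the appraisal decisions easy to review and change without touching the loading logic.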

Page 40: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

E-FOIA Document Collections: Dept. of State

Page 41: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

National Archives and Records Administration Transcontinental Persistent Archive Prototype

Federation of Five Independent Data Grids

[Diagram: five sites, each with its own MCAT: U Md, SDSC, Georgia Tech, NARA I, NARA II]

Extensible Environment, can federate with additional research and education sites. Each data grid uses different vendor products.

Page 42: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

ACE – Basic Methodology

• Three-tiered Cryptographic Information.
• Each tier is periodically audited separately according to policies set by managers.

Integrity Token
• 1 IT/object
• ~1KB

Cryptographic Summary Information
• 1 CSI/time window
• 1 CSI / (n) objects
• ~100MB/year

Witness
• 1 Witness/week
• ~2-3KB/year

[Diagram ratios between tiers: k:1, l:1]
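
A rough Python sketch can show why three tiers allow a small witness value to vouch for many objects. The slide does not describe how ACE combines values between tiers, so this example simply hashes concatenated digests with SHA-256; that choice, and every function name below, is an assumption made for illustration rather than the ACE implementation.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def integrity_token(obj: bytes) -> bytes:
    """Tier 1: one token per object."""
    return digest(obj)


def summarize(values) -> bytes:
    """Combine many digests into one (assumed scheme: hash the concatenation)."""
    return digest(b"".join(values))


def build_witness(objects_by_window) -> bytes:
    """Tier 2: one summary per time window. Tier 3: one witness over summaries."""
    summaries = [
        summarize([integrity_token(obj) for obj in window])
        for window in objects_by_window
    ]
    return summarize(summaries)


def audit(objects_by_window, witness: bytes) -> bool:
    """Recompute all three tiers and compare against the stored witness."""
    return build_witness(objects_by_window) == witness


windows = [[b"record 1", b"record 2"], [b"record 3"]]
witness = build_witness(windows)
print(audit(windows, witness))                                     # unchanged
print(audit([[b"record 1", b"altered"], [b"record 3"]], witness))  # tampered
```

Because the witness depends on every summary, and each summary on every token, a change to any single object surfaces at the top tier in this sketch.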

Page 43: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

(Photos by Sara Muth)

End of the day

Page 44: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Club Asante

Photos by Sara Muth (top) and Eric Paquette (right)

Page 45: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Commemorative Corkscrew

(Photo by Gary Spurr)

Page 46: “What I Learned This Summer”:   A Week at SAA’s First Electronic Records Summer Camp

Acknowledgments

Slides with text are from the course instructors’ PowerPoint presentations: Conrad et al.

Photos as credited.

(Photo by Eric Paquette)