Preserving the Smithsonian Institution’s Web Presence

Post on 16-Jan-2015

description

Presentation delivered by Lynda Schmitz Fuhrig, Electronic Archivist, and Jennifer Wright, Archivist, for the Smithsonian Institution Archives, at the Smithsonian Archives Fair on October 14, 2011 in Washington, DC. Although it first began capturing institutional websites in the late 1990s, the Smithsonian Institution Archives initiated a project in 2009 to capture the explosion of public websites and social media instances maintained by its many museums, research centers, and programs with the Heritrix crawler. This presentation reviews appraisal, accessioning, and capture issues in documenting the Smithsonian’s web presence in the early 21st Century.

Transcript of Preserving the Smithsonian Institution’s Web Presence

Lynda Schmitz Fuhrig and Jennifer Wright

Oct. 14, 2011

Preserving the Smithsonian Institution’s Web Presence

Smithsonian Institution Archives Fair

The Mission of SI Archives

Appraise, acquire, and preserve the records of the Institution

Offer a range of research and reference services

Establish policy and provide expert guidance on record keeping practices

Create and promote products and services that broaden understanding of the Smithsonian

Provide professional archival and conservation expertise

Smithsonian’s First Home Page, 1995

The Smithsonian Today

Website and Social Media Registry

A “record” is any official recorded information, regardless of medium or characteristics, created, received, and maintained by a Smithsonian museum, office, or employee

Websites and social media accounts must be managed as records

Registry allows staff from across the Smithsonian to add and update information about all of their websites and social media accounts
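As a rough illustration of what such a registry might track, here is a minimal sketch in Python. The slides do not describe the registry's actual schema or software, so the class names, fields, and the one-year staleness window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class RegistryEntry:
    """One website or social media account managed as a record (hypothetical fields)."""
    unit: str           # museum, office, or program responsible for it
    url: str
    kind: str           # "website" or "social_media"
    last_updated: date  # when staff last confirmed the entry


class Registry:
    """In-memory stand-in for a shared registry, keyed by URL."""

    def __init__(self):
        self._entries = {}

    def add_or_update(self, entry):
        # Adding an entry for an existing URL replaces the old information.
        self._entries[entry.url] = entry

    def stale(self, today, max_age_days=365):
        # URLs whose entries have not been confirmed within the assumed window.
        return sorted(
            e.url for e in self._entries.values()
            if (today - e.last_updated).days > max_age_days
        )
```

A periodic check like `stale` is one possible way to prompt staff to keep their entries current.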

Appraising Records

All records must be appraised to determine their ultimate disposition

Records appraised based on administrative, legal, historical, and research value

Records with long-term value are transferred to Archives

Appraising Traditional Websites

Websites are public face of Smithsonian

Significant historical and research value

Constantly changing

Crawl annually and before and after major redesigns

Work with webmasters to determine if crawls should be more or less frequent
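The crawl rule above (annually, plus before and after major redesigns) could be sketched as a small scheduling helper. This is only an illustration: the function name, the fixed annual crawl date, and the seven-day margin around a redesign are assumptions, not details from the presentation.

```python
from datetime import date, timedelta


def planned_crawls(year, redesign_dates, margin_days=7):
    """Return sorted crawl dates: one annual crawl plus one before and after each redesign."""
    crawls = {date(year, 1, 15)}  # hypothetical date for the annual crawl
    for launch in redesign_dates:
        crawls.add(launch - timedelta(days=margin_days))  # capture the old design
        crawls.add(launch + timedelta(days=margin_days))  # capture the new design
    return sorted(crawls)
```

Adjusting `margin_days`, or adding extra dates agreed with a webmaster, would cover sites that need more or less frequent crawls.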

Appraising Social Media Accounts

All social media accounts are used differently

Each account appraised individually based on content

Accounts containing significant original content will be fully captured each year

Accounts consisting mostly of links to other resources will be captured occasionally to document existence

Method and frequency of capture may depend on terms of service and ability to avoid capturing non-Smithsonian content
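The appraisal rules for social media accounts can be read as a simple decision procedure. The sketch below is one possible encoding. The slides do not quantify "significant original content", so the 0.5 threshold, the function name, and the returned labels are all assumptions.

```python
def capture_plan(original_content_share, tos_allows_capture=True, threshold=0.5):
    """Suggest a capture approach for one account.

    original_content_share: estimated fraction (0.0 to 1.0) of posts that are
    original content rather than links to other resources (an assumed measure).
    """
    if not tos_allows_capture:
        # Terms of service may limit the capture method; needs a human decision.
        return "review"
    if original_content_share >= threshold:
        return "full_annual"  # fully captured each year
    return "occasional"       # captured occasionally to document existence
```

In practice each account is appraised individually, so a helper like this would only record the outcome of that appraisal, not replace it.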

Past Web Archiving Procedures

• Files transferred from the Smithsonian’s IT office

• HTTrack web crawler

• Scripts used to create XHTML preservation files but very manual and time-consuming

Heritrix

• Archival web crawler

• Open source

• Java

• Developed by Internet Archive, National Library of Norway and National and University Library of Iceland

WARC

WARC – Web ARChive file format

International standard – ISO 28500:2009

Extension of the ARC format in use since 1996

Container format
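To show what "container format" means here, the following sketch writes and reads uncompressed WARC 1.0 records using only the Python standard library. It is a simplified illustration, not how Heritrix itself writes files: the function names are hypothetical, only a handful of header fields are emitted, and real WARC files are often gzip-compressed, which this parser does not handle.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Serialize one WARC 'resource' record (header lines, blank line, payload) as bytes."""
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"  # each record ends with two CRLFs


def iter_records(data):
    """Yield (headers, payload) for each record in concatenated, uncompressed WARC bytes."""
    pos = 0
    while pos < len(data):
        head_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        headers = dict(line.split(": ", 1) for line in lines[1:])  # skip version line
        start = head_end + 4
        length = int(headers["Content-Length"])
        yield headers, data[start:start + length]
        pos = start + length + 4  # skip the two CRLFs that close the record
```

Because every record carries its own headers and length, many captured resources can be concatenated into a single file and read back one at a time. For real crawl output, a dedicated WARC library is a safer choice than a hand-written parser.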

Crawling in Heritrix

STRI website in 1995

SIA Accession 05-032

Viewing a Crawl

More To Do

Social Media

Third-party issues

Privacy concerns

Different tools

Lessons Learned

In-house archiving takes time

No one-size-fits-all solution

Master site registry requires regular updating

Contacts and Resources

Lynda Schmitz Fuhrig
Digital Services Division
schmitzfuhrigl@si.edu

Jennifer Wright
Archives and Information Management Team
wrightjm@si.edu

Smithsonian Institution Archives website: http://siarchives.si.edu