Preserving the Smithsonian Institution’s Web Presence

Lynda Schmitz Fuhrig and Jennifer Wright, Oct. 14, 2011, Preserving the Smithsonian Institution’s Web Presence, Smithsonian Institution Archives Fair

description

Presentation delivered by Lynda Schmitz Fuhrig, Electronic Archivist, and Jennifer Wright, Archivist, for the Smithsonian Institution Archives, at the Smithsonian Archives Fair on October 14, 2011 in Washington, DC. Although it began capturing institutional websites in the late 1990s, the Smithsonian Institution Archives initiated a project in 2009 to use the Heritrix crawler to capture the explosion of public websites and social media instances maintained by the Smithsonian’s many museums, research centers, and programs. This presentation reviews appraisal, accessioning, and capture issues in documenting the Smithsonian’s web presence in the early 21st century.

Transcript of Preserving the Smithsonian Institution’s Web Presence

Page 1: Preserving the Smithsonian Institution’s Web Presence

Lynda Schmitz Fuhrig and Jennifer Wright

Oct. 14, 2011

Preserving the Smithsonian Institution’s Web Presence

Smithsonian Institution Archives Fair

Page 2: Preserving the Smithsonian Institution’s Web Presence

The Mission of SI Archives

Appraise, acquire, and preserve the records of the Institution

Offer a range of research and reference services

Establish policy and provide expert guidance on record keeping practices

Create and promote products and services that broaden understanding of the Smithsonian

Provide professional archival and conservation expertise

Page 3: Preserving the Smithsonian Institution’s Web Presence

Smithsonian’s First Home Page, 1995

Page 4: Preserving the Smithsonian Institution’s Web Presence

The Smithsonian Today

Page 5: Preserving the Smithsonian Institution’s Web Presence

Website and Social Media Registry

A “record” is any official recorded information, regardless of medium or characteristics, created, received, and maintained by a Smithsonian museum, office, or employee

Websites and social media accounts must be managed as records

Registry allows staff from across the Smithsonian to add and update information about all of their websites and social media accounts

Page 6: Preserving the Smithsonian Institution’s Web Presence

Appraising Records

All records must be appraised to determine their ultimate disposition

Records appraised based on administrative, legal, historical, and research value

Records with long-term value are transferred to Archives

Page 7: Preserving the Smithsonian Institution’s Web Presence

Appraising Traditional Websites

Websites are public face of Smithsonian

Significant historical and research value

Constantly changing

Crawl annually and before and after major redesigns

Work with webmasters to determine if crawls should be more or less frequent
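The crawl policy on this slide (one crawl a year, plus a crawl before and after each major redesign) can be read as a simple scheduling rule. The following is only a hypothetical sketch, not the Archives' actual tooling: the function name `crawl_dates`, the October 1 annual date, and the one-day offsets around a redesign are all assumptions made for illustration.

```python
from datetime import date, timedelta


def crawl_dates(year, redesigns=(), annual_day=(10, 1)):
    """Return sorted crawl dates for one year.

    Assumed rule: one annual crawl on `annual_day` (month, day), plus a
    crawl the day before and the day after each major redesign launch.
    """
    dates = {date(year, *annual_day)}
    for launch in redesigns:
        if launch.year == year:
            dates.add(launch - timedelta(days=1))  # capture the old design
            dates.add(launch + timedelta(days=1))  # capture the new design
    return sorted(dates)


if __name__ == "__main__":
    # Hypothetical redesign launch date, for illustration only.
    for d in crawl_dates(2011, redesigns=[date(2011, 3, 15)]):
        print(d.isoformat())
```

A webmaster's request for more or fewer crawls would, under these assumptions, amount to adding or removing dates from the returned list.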

Page 8: Preserving the Smithsonian Institution’s Web Presence

Appraising Social Media Accounts

All social media accounts are used differently

Each account appraised individually based on content

Accounts containing significant original content will be fully captured each year

Accounts consisting mostly of links to other resources will be captured occasionally to document existence

Method and frequency of capture may depend on terms of service and ability to avoid capturing non-Smithsonian content
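The appraisal outcomes listed above can be summarized as a small decision rule. This is a hypothetical sketch only: the slide describes individual appraisal by an archivist, not a numeric test, so the `Account` fields, the `capture_plan` function, and the 0.5 threshold are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    original_share: float    # assumed: fraction of posts with original content
    capture_permitted: bool  # assumed: terms of service allow capture


def capture_plan(account, threshold=0.5):
    """Map an appraised account to one of the capture outcomes on the slide."""
    if not account.capture_permitted:
        # The slide notes that method and frequency may depend on terms of
        # service; this outcome label is an assumption.
        return "review terms of service"
    if account.original_share >= threshold:
        return "full capture each year"
    return "occasional capture to document existence"


if __name__ == "__main__":
    # Hypothetical accounts, for illustration only.
    for acct in (Account("blog", 0.9, True), Account("link feed", 0.1, True)):
        print(acct.name, "->", capture_plan(acct))
```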

Page 9: Preserving the Smithsonian Institution’s Web Presence

Past Web Archiving Procedures

• Files transferred from the Smithsonian’s IT office

• HTTrack web crawler

• Scripts used to create XHTML preservation files but very manual and time-consuming

Page 10: Preserving the Smithsonian Institution’s Web Presence

Heritrix

• Archival web crawler

• Open source

• Java

• Developed by Internet Archive, National Library of Norway and National and University Library of Iceland

Page 11: Preserving the Smithsonian Institution’s Web Presence

WARC

WARC – Web ARChive file format

International standard – ISO 28500:2009

Extension of the ARC format in use since 1996

Container format
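To make "container format" concrete: a WARC file is a sequence of records, each with a version line, named header fields, a blank line, and a content block. The sketch below builds and re-reads a single WARC/1.0 `resource` record using only the Python standard library. It is an illustration, not Heritrix output: the helper names `build_warc_record` and `parse_warc_record` are hypothetical, and files written by a production crawler normally hold many records and are often compressed.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Serialize one WARC/1.0 'resource' record as bytes."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers)
    # A blank line ends the header; two CRLFs follow the content block.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


def parse_warc_record(record):
    """Split a single record into (version, header dict, content block)."""
    head, _, rest = record.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return lines[0], fields, rest[:length]


if __name__ == "__main__":
    rec = build_warc_record("http://siarchives.si.edu", b"<html>example</html>")
    version, fields, body = parse_warc_record(rec)
    print(version, fields["WARC-Type"], fields["WARC-Target-URI"], len(body))
```

Because each record carries its own `Content-Length`, records can be concatenated into one file and read back one at a time, which is what lets a single WARC file hold an entire crawl.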

Page 12: Preserving the Smithsonian Institution’s Web Presence

Crawling in Heritrix

Page 13: Preserving the Smithsonian Institution’s Web Presence
Page 14: Preserving the Smithsonian Institution’s Web Presence
Page 15: Preserving the Smithsonian Institution’s Web Presence

STRI website in 1995

SIA Accession 05-032

Page 16: Preserving the Smithsonian Institution’s Web Presence

Viewing a Crawl

Page 17: Preserving the Smithsonian Institution’s Web Presence

More To Do

Page 18: Preserving the Smithsonian Institution’s Web Presence

Social Media

Third-party issues

Privacy concerns

Different tools

Page 19: Preserving the Smithsonian Institution’s Web Presence

Lessons Learned

In-house archiving takes time

No one-size-fits-all solution

Master site registry requires regular updating

Page 20: Preserving the Smithsonian Institution’s Web Presence
Page 21: Preserving the Smithsonian Institution’s Web Presence

Contacts and Resources

Lynda Schmitz Fuhrig
Digital Services [email protected]

Jennifer Wright
Archives and Information Management [email protected]

Smithsonian Institution Archives website: http://siarchives.si.edu