How to Face the Challenges of Web Archiving? The Experiences of a Small Library on the Edge

How to Face the Challenges of Web Archiving?

The experiences of a small library on the edge.

Chloe Martin, Internet Memory

Catherine Ryan, National Library of Ireland

LIBER 2012 - 1

Page 2: How to Face the Challenges of Web Archiving? The Experiences of a Small Library on the Edge

Context: National Library of Ireland

• Beginnings: Established by the Dublin Science and Art Museum Act, 1877

• Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”.

• The Digital Record: Born Digital Programme established in 2010, covering web archiving.

• Web Archive Projects: 2 pilot projects in 2011

Context: Internet Memory

European Archive / Internet Memory Foundation

• Established in 2004 in Amsterdam (offices also in Paris)

• Mission: to preserve Web content as a new medium for current and future generations

• Actions: Awareness raising, partnerships, R&D

• Open Access Collections: UK National Archives & Parliament, PRONI, CERN and The National Library of Ireland

Internet Memory Research

• Spin-off of IM established in June 2011 in Paris

• Missions: to operate large-scale or selective crawls & develop new technologies (crawl, access, processing and extraction)

Web Archiving Project: Project Origins National Library of Ireland

Building a 21st Century Library:

– Born Digital

– Digitisation

– Single Integrated Catalogue

– Digital Repository

– OSCAIL, the Digital Library Programme

Web Archiving Project: Project Origins National Library of Ireland

Born Digital Materials:

• Natural progression for NLI’s strong political, cultural and historical collections

• How best to approach this in a time of unprecedented financial difficulty?

• Born Digital Programme established to examine requirements and produce a policy document for the next steps

Web Archiving Project: Project Origins National Library of Ireland

The Hand of History:

– Snap General Election

– Five Weeks

Web Archiving Project: Project Origins National Library of Ireland

Just do it

Web Archiving Project: Project Origins National Library of Ireland

Just do it

How?

Web Archiving Project: Project Origins National Library of Ireland

Collaborative Partnership:

Partner that suited our requirements and that had experience with others in the cultural sector

Requirements:

– Technical skills: NLI staff with these skills were working on other projects, so a partner had to provide them

– Leverage NLI’s own strong curatorial experience, esp. in politics

– Fast!

Web Archiving Project: Project Origins National Library of Ireland

Project phases:

– Project scoping and contract

– Site selection

– Permissions gathering

– QA (look and feel)

– Publication and promotion

Site Selection and Permissions: National Library of Ireland

Selection Criteria:

– Website presence

– Technical reasons

– Cut-off date

– Women candidates

Permissions:

– All sites contacted and provided with a brief

– Pressurised but necessary phase

Scope of projects: National Library of Ireland

General Election:

– Crawl: 200 snapshots

– Scope: 100 seeds

– Frequency: 2 times

– Date: Feb. 2011

Presidential Election:

– Crawl: 80 snapshots

– Scope: 70 seeds

– Frequency: 3 times

– Date: Oct-Nov. 2011

Crawl: Internet Memory

• Seeds Validation: URLs, Duplication, Redirection, External links, Dynamic websites

• Scope Parameters: Domain, host and path; Social Web content; Frequency; Robots.txt files exclusion; Politeness

• Specific incidents and technical changes on the fly: Modification of scope; Pending crawls; Adaptation of the politeness

• Improvement of second crawl
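
As an illustration of the seed validation and scope parameters listed above, here is a minimal Python sketch. It is not Internet Memory's crawler code: the function names (`normalize_seed`, `validate_seeds`, `in_scope`, `read_robots`) and the example.org URLs are hypothetical, and the slides do not say whether robots.txt rules were obeyed or overridden in these crawls. It only shows how duplicate seeds can be collapsed, how a host-and-path scope can be tested, and how exclusion rules and a crawl delay can be read with the Python standard library.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalize_seed(url: str) -> str:
    """Lower-case scheme and host, drop the fragment, default an empty path to '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def validate_seeds(urls):
    """Return (unique_seeds, rejected), keeping first-seen order.

    A seed is rejected when it is not an http(s) URL with a host.
    Seeds that normalize to the same URL are treated as duplicates.
    """
    seen = set()
    unique, rejected = [], []
    for raw in urls:
        parts = urlsplit(raw.strip())
        if parts.scheme.lower() not in ("http", "https") or not parts.netloc:
            rejected.append(raw)
            continue
        seed = normalize_seed(raw)
        if seed not in seen:
            seen.add(seed)
            unique.append(seed)
    return unique, rejected


def in_scope(url: str, seed: str) -> bool:
    """True when url is on the seed's host and its path starts with the seed's path.

    This is a plain string-prefix test, so it is only a rough host-and-path scope.
    """
    u, s = urlsplit(normalize_seed(url)), urlsplit(normalize_seed(seed))
    return u.netloc == s.netloc and u.path.startswith(s.path)


def read_robots(lines):
    """Parse robots.txt lines (already fetched elsewhere) into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser
```

With these helpers, `validate_seeds(["http://Example.org", "http://example.org/#top"])` keeps a single seed, and `read_robots(...).crawl_delay("*")` exposes a site's requested delay, which a crawler could use as one input to its politeness setting.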

Quality Assurance (QA): National Library of Ireland

• Manual QA

• Jira software

• IM – Technical QA

• NLI – ‘Look and Feel’ QA

• Multiple browsers

• Communication with site owners (building relationships and promotion)

Quality Assurance (QA): Internet Memory

• Why?

• How?

– Manual and visual method: homepage + 2

– Resolution of issues

• Temporal Coherence

Access: National Library of Ireland

• Available to the public

• Full text search

• IM website – search by keyword, URL

• NLI catalogue – keyword via widget developed by NLI IS team and IM

• Future – access through NLI’s own interfaces, issue of integrating results

Publication and Promotion: National Library of Ireland

• NLI social media initiative (Twitter and blog)

• Project participants

• Print media (esp. in area of technology)

• And IM!

• Usage figures have increased but real value more apparent in 5-10 years

Usage Statistics of Web Archive: National Library of Ireland

21/09/2011: Official launch of NLI Web archives (Tweets)

26/10/2011: Blog post on nli.ie/blog and article on thejournal.ie

25/11/2011: Article on irishtimes.com

20/01/2012: Article on irishtimes.com

17/03/2012: Post on soundofthearchives.wordpress.com

04/05/2012: Article on irisheconomy.ie

Advantages of Web Archiving: National Library of Ireland

Web archiving:

– New opportunities for delivery of materials to users

– Meet existing users’ expectations that content be online

– Reach new audiences

Advantages of Web Archiving: National Library of Ireland

Political web archives; Irish General Election:

– Researchers can compare online content pre- and post-election

– Facilitates research into how ‘online’ this election was

– Assess impact of technological developments in campaign communications

– Record of campaign information

Benefits of Working Together: National Library of Ireland

Pilot project for a long-term activity:

– Allowed us to enter a new collecting area despite lack of tech expertise

– Facilitated collection of important material that no one else was collecting

– Collect material quickly

– Leverage curatorial skills

– Gained new technical skills

Benefits of Working Together: Internet Memory

• To support the development of Web archiving initiatives

• To operate rapid deployment of Web archives

• To address new challenges in this area:

– Social media content

– QA

– Automation

Conclusion

General Election:

• 18,495,771 URLs

• 1.14 TB

• 10,405 ARCs

Presidential Election:

• 7,333,399 URLs

• 278.10 GB

• 2,513 ARCs
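
For a sense of scale, the totals above can be turned into per-file averages. The short Python sketch below does that arithmetic. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the slide does not specify, and the `CrawlTotals` name and its methods are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlTotals:
    """Totals for one crawl as reported on the slide (sizes assumed decimal)."""
    name: str
    urls: int
    size_bytes: float
    arc_files: int

    def urls_per_arc(self) -> float:
        """Average number of captured URLs stored in each ARC file."""
        return self.urls / self.arc_files

    def mb_per_arc(self) -> float:
        """Average ARC file size in megabytes (1 MB = 10^6 bytes)."""
        return self.size_bytes / self.arc_files / 1e6


# Figures copied from the slide; the byte conversion is the assumption noted above.
GENERAL = CrawlTotals("General Election", 18_495_771, 1.14e12, 10_405)
PRESIDENTIAL = CrawlTotals("Presidential Election", 7_333_399, 278.10e9, 2_513)


def summarize(totals: CrawlTotals) -> str:
    """One-line summary of the per-ARC averages, rounded to whole numbers."""
    return (f"{totals.name}: {totals.urls_per_arc():.0f} URLs per ARC, "
            f"{totals.mb_per_arc():.0f} MB per ARC")
```

Under those unit assumptions, both crawls average roughly 110 MB per ARC file, while the Presidential Election crawl packs more URLs into each file (about 2,900 against about 1,800).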

View the NLI collections at: http://www.nli.ie/en/udlist/digital-collections.aspx

View the Web archive blog entry at: http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/

View Internet Memory Collections at: http://collections.europarchive.org/

To be continued…

Questions?

Thanks for your attention!

Chloe Martin
Internet Memory
http://internetmemory.org
[email protected]
@InternetMemory

Catherine Ryan
National Library of Ireland
http://[email protected]
@NLIreland