Memento Update
CNI Task Force Meeting, Spring 2011

Memento
http://mementoweb.org/

Herbert Van de Sompel
Robert Sanderson
Michael L. Nelson

Giant Leaps Towards Seamless Navigation of the Web of the Past

Overview of Memento Framework

Deployment Progress

Memento and Data

Memento and Discovery

Memento and Branding

Alternative Web Archiving Strategies

Overview of Memento Framework

Memento wants to make it easy to access the Web of the Past.

Tate Online: Today

Select Date: March 16 2008

Tate Online: March 16 2008

From: National Archives

Tate Online: Today

Select Date: March 16 2008

Tate Online: March 16 2008

From: National Archives

Dynamic, Static

Memento achieves this by introducing a uniform version access capability to integrate the present and past Web.

Content Management Systems:

• Designed to be aware of all versions of a resource

• Self-contained

• Variety of proprietary version mechanisms

• Versions interlinked using proprietary mechanisms

• Dynamism is managed

World Wide Web:

• Designed to forget about prior versions of a resource

• Distributed

• Dynamism from a management perspective is ignored

There are resource versions on the Web:

• Content management systems

• Web archives

• Transactional archives

• Search engine caches

But the Web architecture has a hard time dealing with them:

• Cannot talk about a resource as it used to exist

• Cannot access a prior version knowing the current one

• Cannot access the current version knowing a prior one

Current approaches are ad hoc and localized

Memento:

• Regards the Web as a big Content Management System

• Introduces a uniform capability to access versions on the Web

• Does not build new archives but leverages all systems that host versions: Web archives, Content Management Systems, Software Version Systems, etc.

Memento’s version access approach:

• Is distributed: versions may exist on several servers

• Uses time as a global version indicator

• Is based on the primitives of the Web: resource, resource state, representation, content negotiation, link
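
As a rough illustration of using time as the version indicator in content negotiation, the sketch below builds the request header a client might send to ask for a resource as it existed around a given date. The header name Accept-Datetime follows the later RFC 7089 form of the protocol and the function name is hypothetical; neither is taken from these slides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when: datetime) -> dict:
    """Return request headers asking for the resource state around `when`.

    Assumption: the Accept-Datetime header name as used in RFC 7089.
    Pass a timezone-aware datetime; a naive one is read as local time.
    """
    # HTTP dates are expressed in GMT, so normalize to UTC first.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


headers = datetime_negotiation_headers(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers["Accept-Datetime"])
```

The same headers would be sent regardless of which server ends up holding the version, which is what makes the approach distributed.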

Original Resource and Versions

Bridge from Present to Past

Bridge from Past to Present

Memento Framework

Multiple Archives

Memento Client-Server Interaction
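
One way to picture the client side of this interaction is a client reading typed links from an HTTP Link header to find the Original Resource, a TimeMap, or a Memento. The sketch below is a simplified parser under stated assumptions: the relation names ("original", "timemap", "memento") follow RFC 7089, the URIs are made-up examples, and only double-quoted parameter values are handled.

```python
import re

# Matches <uri> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_links(header_value: str) -> list:
    """Parse a Link header value into dicts with 'uri', 'rel' (a set), and other params.

    Simplification: parameters with unquoted values are not supported.
    """
    links = []
    for uri, raw_params in LINK_RE.findall(header_value):
        params = dict(PARAM_RE.findall(raw_params))
        rels = set(params.pop("rel", "").split())
        links.append({"uri": uri, "rel": rels, **params})
    return links


def find_rel(links: list, rel: str) -> list:
    """Return the URIs of all links that carry the given relation type."""
    return [link["uri"] for link in links if rel in link["rel"]]


# Hypothetical response header; the archive host name is invented.
sample = (
    '<http://example.org/page>; rel="original", '
    '<http://archive.example.net/timemap/http://example.org/page>; rel="timemap", '
    '<http://archive.example.net/20080316/http://example.org/page>; '
    'rel="memento"; datetime="Sun, 16 Mar 2008 00:00:00 GMT"'
)
links = parse_links(sample)
print(find_rel(links, "original"))
```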

Deployment Progress

Significant progress has been made towards seamless navigation of the Web of the Past.

Standardization

• Standardization process started via the IETF

• Interest from IETF and W3C

• Encouraged by major Web architects, including: Tim Berners-Lee, Mark Nottingham, Michael Hausenblas

https://datatracker.ietf.org/doc/draft-vandesompel-memento/

Memento Clients

• Several client tools developed by us and others

• Add-ons for Firefox (operational) and Internet Explorer (experimental)

• Applications for Android (operational) and iPhone/iPad (in development)

• Paper in next issue of Code4Lib Journal

http://www.mementoweb.org/tools/

Memento Server Support (1)

• Memento-compliant Wayback software:

  • Used by Internet Archive

  • Available to Web archives, worldwide

• Please have your favorite Web Archive install this new version 1.6!

http://www.mementoweb.org/tools/

Memento Server Support (2)

• Plug-in for MediaWiki (operational)

• Used on W3C’s main wiki

• Please install it for your MediaWiki!

http://www.mementoweb.org/tools/

Memento Server Validator

• Server-side client:

  • Attempts to perform all Memento actions against a given URI

  • Reports success/failure of the interactions and warnings for optional aspects

• Kept up to date with IETF Internet Draft

http://www.mementoweb.org/tools/

Memento Proxy Support

• Several systems that host Mementos made Memento-compliant “by proxy”:

• All major Web Archives that do not yet run Memento-compliant Wayback software

• 3,000+ MediaWiki systems, including Wikipedia

• We want all of these to become natively Memento compliant!

Memento Website

• Ongoing effort to add materials that support understanding and adoption:

  • Introduction to Memento

  • How to recognize Mementos, TimeGates, Original Resources?

  • Guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.)

http://www.mementoweb.org/guide/

Funding

• 2007-2010: US $250K grant from Library of Congress

  • Approx. US $50K of this on Memento

• 2010-2011: US $1 Million follow-up grant from Library of Congress

• For: Specification, outreach, tool development, further research

Memento and Data

Memento Time Travel is really powerful.

Time-Series Data via HTTP follow-your-nose.

Memento Framework

Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png

Time Series for Humans

Data collected through HTTP Navigation

Time Travel across versions of a Picture of the Day

Thanks Christine!

[Figure: panels labeled Data, Process, and Reproducibility; axes labeled time and change]

But if we had static, discoverable snapshots of the data and the process…

Original Resource: http://dbpedia.org/resource/France

Time Series for Machines

Data collected through HTTP Navigation (paper at http://arxiv.org/abs/1003.3661)

Time Travel across versions of DBpedia

Memento and Discovery

Very few Web sites provide a “timegate” link.

Need additional mechanisms to support Discovery.

Batch discovery of Mementos: TimeMaps

A TimeMap minimally lists:

• URI and datetime of Mementos known to an archive

• URI of Original Resource

TimeMaps can be aggregated across systems that host Mementos
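
A minimal sketch of what aggregation could look like, assuming a TimeMap is modeled as the Original Resource URI plus a list of (URI, datetime) pairs. The class names, the archive URIs, and the "closest Memento" selection rule are illustrative assumptions, not part of the Memento specification text on these slides.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Memento:
    uri: str
    when: datetime


@dataclass
class TimeMap:
    original: str   # URI of the Original Resource
    mementos: list  # Memento entries known to one system


def aggregate(timemaps: list) -> TimeMap:
    """Merge TimeMaps from several systems into one, ordered by datetime."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("TimeMaps must describe the same Original Resource")
    # A set removes Mementos listed by more than one source.
    merged = sorted({m for tm in timemaps for m in tm.mementos}, key=lambda m: m.when)
    return TimeMap(originals.pop(), merged)


def closest(timemap: TimeMap, target: datetime) -> Memento:
    """Pick the Memento whose datetime is nearest to the target (one possible rule)."""
    return min(timemap.mementos, key=lambda m: abs(m.when - target))


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


# Hypothetical archives and URIs, for illustration only.
archive_a = TimeMap("http://example.org/page", [
    Memento("http://a.example.net/2007/page", utc(2007, 1, 10)),
    Memento("http://a.example.net/2009/page", utc(2009, 6, 1)),
])
archive_b = TimeMap("http://example.org/page", [
    Memento("http://b.example.net/2008/page", utc(2008, 3, 20)),
])
combined = aggregate([archive_a, archive_b])
print(closest(combined, utc(2008, 3, 16)).uri)
```

With the aggregated view, the best match for March 16 2008 comes from the second archive even though the first one holds more versions.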

Batch discovery of Mementos: Feed of TimeMaps

• A system that hosts Mementos exposes a feed (e.g. Atom) of TimeMaps so that applications can remain in sync with its evolving Memento collection:

  • One Atom entry per Original Resource for which the system hosts Mementos

  • The entry provides a “timemap” link to a TimeMap for the Original Resource

  • The datetime value of the entry's updated field changes when an additional Memento for the Original Resource becomes available (i.e. the TimeMap changes)

  • The ID of the entry is a tag URI based on the URI of the Original Resource

Will be proposed to IIPC
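
To make the shape of such a feed entry concrete, the fragment below assembles the three pieces the slide names: an ID, an updated datetime, and a “timemap” link. The exact tag URI layout, the function name, and the example URIs are assumptions for illustration, and elements that a complete Atom entry also needs (such as a title) are left out.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri, timemap_uri, updated, authority, tag_date):
    """Build a partial Atom entry pointing at the TimeMap of one Original Resource.

    Assumption: the tag URI is formed as tag:<authority>,<date>:<original URI>.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = f"tag:{authority},{tag_date}:{original_uri}"
    # `updated` should change whenever a new Memento is added to the TimeMap.
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM}}}link", {"rel": "timemap", "href": timemap_uri})
    return entry


# Hypothetical archive and resource URIs.
entry = timemap_entry(
    original_uri="http://example.org/page",
    timemap_uri="http://archive.example.net/timemap/http://example.org/page",
    updated="2011-04-04T12:00:00Z",
    authority="archive.example.net",
    tag_date="2011",
)
print(ET.tostring(entry, encoding="unicode"))
```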

Batch discovery of Mementos: robots.txt

• robots.txt file is used by Web servers to convey crawling policies

• Add a directive to support discovery of Mementos known to the server:

  • A pointer to a single Memento can suffice, as the robot can crawl on from there

  • Mementos allow for discovery of TimeMaps via HTTP links

  • e.g. jcdl.org hosts snapshot archives of prior JCDL conferences and adds the following to its robots.txt:

Memento: jcdl.org/archive/2002/index.html

Will be promoted via Internet Draft
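
A crawler that understands the proposed directive might extract the pointers as sketched below. The "Memento:" field is the proposal from this slide rather than an established robots.txt field, and the function name and the other lines of the sample file are hypothetical.

```python
def memento_pointers(robots_txt: str) -> list:
    """Collect the values of all 'Memento:' lines in a robots.txt body."""
    pointers = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URI at a '#' fragment).
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "memento":
            pointers.append(value.strip())
    return pointers


sample_robots = """\
User-agent: *
Disallow: /private/
Memento: jcdl.org/archive/2002/index.html
"""
print(memento_pointers(sample_robots))
```

Splitting on the first colon only keeps values that themselves contain colons, such as full http:// URIs, intact.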

Memento and Branding

Memento can recreate pages using resources from different archives.

This poses a branding challenge.

Current Branding Practice for Web Archives

Page and embedded resources from same Web Archive

Branding for page and embedded resources

Branding for Web Archives in Memento Mode

Will be researched

Page and embedded resources from various Web Archives

Page branding

No branding

No branding

Alternative Web Archiving Strategies

Crawl-based Archives host distinct observations.

Transactional Archives never miss an update.

Crawl-Based Web Archives

Observations

For example: Heritrix crawler for Internet Archive

Server-Side Transactional Web Archives

Change History

For example: TTApache, PageVault, Vignette Web Capture

Development of Transactional Web Archive Software

Submit:

• Java-Grizzly-Jersey submission interface application

• Berkeley DB metadata store

• FS store for body and headers

Capture:

• Apache connection filter module (mod_ta) captures URI, headers, body

• Module POSTs in real time to transactional archive’s Submit URI
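
The split between a metadata store and a body store can be pictured with a small in-memory stand-in. This is only a sketch of the idea, not the actual software: the class and method names are hypothetical, dictionaries replace Berkeley DB and the file system, headers are omitted, and skipping unchanged bodies is an assumed policy.

```python
import bisect
import hashlib
from datetime import datetime, timezone


class TransactionalStore:
    """Toy change-history store; submissions are assumed to arrive in time order."""

    def __init__(self):
        self.history = {}  # URI -> list of (datetime, digest): the metadata store
        self.bodies = {}   # digest -> bytes: stand-in for the body store

    def submit(self, uri: str, body: bytes, when: datetime) -> bool:
        """Record a served representation; return True if it is a new version."""
        digest = hashlib.sha256(body).hexdigest()
        versions = self.history.setdefault(uri, [])
        if versions and versions[-1][1] == digest:
            return False  # same body as the latest version, nothing new to record
        self.bodies[digest] = body
        versions.append((when, digest))
        return True

    def at(self, uri: str, when: datetime):
        """Return the body that was current at `when`, or None if none existed yet."""
        versions = self.history.get(uri, [])
        times = [t for t, _ in versions]
        i = bisect.bisect_right(times, when)
        return self.bodies[versions[i - 1][1]] if i else None


def utc(y, m, d):
    return datetime(y, m, d, tzinfo=timezone.utc)


store = TransactionalStore()
store.submit("http://example.org/page", b"version 1", utc(2011, 1, 1))
store.submit("http://example.org/page", b"version 1", utc(2011, 2, 1))  # unchanged
store.submit("http://example.org/page", b"version 2", utc(2011, 3, 1))
print(store.at("http://example.org/page", utc(2011, 2, 15)))
```

Because every served change is recorded when it happens, a lookup for any datetime can return the version that was current then, rather than the nearest crawl observation.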

Development of Transactional Web Archive Software

Development timeline:

• Ongoing development (LANL) and testing (ODU)

• Submit/Access finalized; development focus on collection management

• Expected release as open source, 3rd Quarter 2011

Access:

• Transactional archive natively supports Memento

• Immediate availability of archived content

• Export of WARC, e.g. for long-term archiving in other environment

Memento
http://mementoweb.org/

Herbert Van de Sompel
Robert Sanderson
Michael L. Nelson

Giant Leaps Towards Seamless Navigation of the Web of the Past