Who and What Links to the Internet Archive
Yasmin AlNoamany, Ahmed AlSum, Michele C. Weigle, Michael L. Nelson
Computer Science Department
Old Dominion University, Norfolk, VA
2 Access Patterns for Robots and Humans in Web Archives
Motivation
• No prior study has answered the following questions:
– What do web archive users look for in terms of the content language of requested pages?
– How do people reach web archives?
– Who links to web archives?
– How do sites link to web archives?
– Why do sites link to the past?
Dataset Sampling
English pages are the most requested, followed by pages in European languages
European languages represent 22% of the unarchived requested pages
Most languages self-link
82% of human sessions connect to the Wayback Machine via referrals
| Website | Percentage | Description |
|---|---|---|
| en.wikipedia.org | 12.9% | Wikipedia |
| archive.org | 11.9% | IA Home Page |
| reddit.com | 10.2% | Social News Web Site |
| google.TLD | 9.9% | Search Engine |
| info-poland.buffalo.edu | 1.5% | Polish Studies |
| de.wikipedia.org | 1.4% | Wikipedia |
| cracked.com | 1.2% | Humor Site |
| snopes.com | 1.1% | Urban Legends Reference Pages |
| facebook.com | 0.9% | Social Media |
| crochetpatterncentral.com | 0.9% | Crocheting Hobbies |
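
A referrer breakdown like the one above could be computed roughly as follows. This is a minimal sketch, not the authors' pipeline: it assumes each human session has already been reduced to a single referrer string (with `-` or an empty string meaning no referrer), and the function names `referral_rate` and `referrer_shares` are hypothetical. The slides do not say whether the percentages are relative to all sessions or only to referred sessions; the sketch uses referred sessions as the denominator.

```python
from collections import Counter
from urllib.parse import urlsplit


def _referrer_host(referrer):
    """Return the referrer's host name without a leading 'www.', or None."""
    if not referrer or referrer == "-":
        return None  # assumed convention: '-' or empty means no referrer
    host = urlsplit(referrer).hostname
    if host is None:
        return None
    return host[4:] if host.startswith("www.") else host


def referral_rate(referrers):
    """Percentage of sessions that arrived with a usable referrer."""
    if not referrers:
        return 0.0
    referred = sum(1 for r in referrers if _referrer_host(r) is not None)
    return 100.0 * referred / len(referrers)


def referrer_shares(referrers):
    """Map each referring host to its percentage of referred sessions."""
    hosts = [h for h in map(_referrer_host, referrers) if h is not None]
    if not hosts:
        return {}
    counts = Counter(hosts)
    # most_common() keeps the hosts ordered from most to least frequent
    return {host: 100.0 * n / len(hosts) for host, n in counts.most_common()}


if __name__ == "__main__":
    sample = [
        "http://en.wikipedia.org/wiki/A",
        "http://www.reddit.com/r/x",
        "-",
        "http://en.wikipedia.org/wiki/B",
        "",
    ]
    print(referral_rate(sample))
    print(referrer_shares(sample))
```

Grouping country-specific hosts into a single row such as google.TLD would need an extra normalization step that is not shown here.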
Most of the links (86%) are to mementos
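
One way to separate deep links to mementos from other links into the archive is to inspect the URL path. The sketch below assumes the usual Wayback Machine layout, where a memento lives under `/web/<timestamp>/<original URI>` and a listing of captures under `/web/*/<original URI>`. The function name `classify_archive_link` and the labels it returns are hypothetical and are not taken from the study.

```python
import re
from urllib.parse import urlsplit

# /web/<4-14 digit timestamp>[optional two-letter modifier + '_']/<URI-R>
MEMENTO_PATH = re.compile(r"^/web/(\d{4,14})(?:[a-z]{2}_)?/(.+)$")
# /web/*/<URI-R> lists the available captures of a page
TIMEMAP_PATH = re.compile(r"^/web/\*/(.+)$")


def classify_archive_link(url):
    """Return (label, original_uri) for a link found on a referring page.

    Labels (illustrative only): 'memento', 'timemap', 'other', 'not_archive'.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not (host == "archive.org" or host.endswith(".archive.org")):
        return ("not_archive", None)

    # The original URI's query string is parsed as the query of the
    # archive URL, so it is re-attached before matching.
    path = parts.path
    if parts.query:
        path += "?" + parts.query

    match = MEMENTO_PATH.match(path)
    if match:
        return ("memento", match.group(2))
    match = TIMEMAP_PATH.match(path)
    if match:
        return ("timemap", match.group(1))
    return ("other", None)


if __name__ == "__main__":
    for link in (
        "http://web.archive.org/web/20130101000000/http://example.com/page?x=1",
        "http://web.archive.org/web/*/http://example.com/",
        "http://archive.org/",
    ):
        print(link, classify_archive_link(link))
```

The extracted original URI (URI-R) is what would then be checked against the live web to see whether the page still exists.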
83% of the mementos linked from outside the archive correspond to pages that do not currently exist on the live web
Conclusions
• We analyzed the distribution of languages to gain insight into what users look for on the Wayback Machine:
– English is the most used language, followed by many European languages
– Languages link mainly to themselves and to English
• We analyzed the human referrers to discover where Wayback Machine users come from:
– 86% of the referrer web pages link deeply to mementos
– More than 82% of the links to these mementos point to the archive because their corresponding URI-Rs do not exist on the live web