Leetaru, Kalev: lightning talk, The GDELT Project


Datasets

• NEWS: Worldwide local news coverage in 100 languages (65 live translated) – online news preserved via Internet Archive
• TELEVISION: Collaboration with the Internet Archive to process more than 100 television stations across the US, updating daily
• ACADEMIC LITERATURE: 21 billion words covering 70 years (JSTOR/DTIC/CORE/CITESEER/IA)
• BOOKS: Collaboration with Internet Archive and HathiTrust to process 3.5 million books 1800-2015
• HUMAN RIGHTS: Half century of worldwide human rights reports
• IMAGERY: Large fraction of global news imagery processed via deep learning: objects/activities, OCR, logos, facial sentiment, geolocation

Preserving Online News

• World’s largest initiative to preserve online news – partnership with the Internet Archive

• Only program to focus on worldwide local news in local languages
• 1.5-2% of news articles disappear within 2 weeks
• 5% disappear within a month
• Up to 14% gone after 2 months – half with 404 and half ranging from sustained 500’s to domain removal (popular in some areas of the world)
• Of GDELT-relevant coverage, 140,000 articles published today will be gone in 2 months
• 14 million GDELT monitored articles disappeared over a 6 month period representing 2x the total output of the New York Times over the last half century
• Nepal 2015 Earthquake: preserving coverage of sudden-onset natural disasters requires “always on” preservation – GDELT preserved 667,000 articles – 225,000 non-English, with top being Nepali
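The failure categories on this slide (404s, sustained 500s, domain removal) can be tallied with a few lines of code. The sketch below is only an illustration, not GDELT's actual pipeline: the function names, the bucket labels, and the convention that `None` stands for a domain that no longer resolves are all assumptions made for this example. It also looks at a single re-check per URL, so it cannot tell a sustained 500 from a momentary one.

```python
from collections import Counter
from typing import Iterable, Optional

# Buckets counted as "disappeared" (labels are hypothetical, chosen for this sketch).
GONE_OUTCOMES = {"not_found", "server_error", "domain_removed"}


def classify_check(status: Optional[int]) -> str:
    """Map one URL re-check result to a coarse outcome bucket.

    `status` is the HTTP status code returned for the article URL, or
    None when the domain no longer resolves (an assumed convention).
    """
    if status is None:
        return "domain_removed"
    if status == 404:
        return "not_found"
    if 500 <= status <= 599:
        return "server_error"
    if 200 <= status <= 399:
        return "alive"
    return "other"  # e.g. 403 or 410; left out of the "gone" count here


def disappearance_summary(statuses: Iterable[Optional[int]]) -> dict:
    """Count outcomes and report the share of URLs that have disappeared."""
    counts = Counter(classify_check(s) for s in statuses)
    total = sum(counts.values())
    gone = sum(counts[name] for name in GONE_OUTCOMES)
    gone_pct = 100.0 * gone / total if total else 0.0
    return {
        "total": total,
        "gone": gone,
        "gone_pct": gone_pct,
        "by_outcome": dict(counts),
    }


# Made-up re-check results for ten article URLs, not real measurements.
sample = [200, 200, 404, 503, None, 200, 301, 200, 200, 200]
summary = disappearance_summary(sample)
print(summary["gone"], "of", summary["total"], "URLs gone")
```

Running the same tally at 2 weeks, 1 month, and 2 months after publication would give the kind of decay percentages quoted in the bullets above.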

[email protected]

http://kalevleetaru.com

http://gdeltproject.org/

http://blog.gdeltproject.org/