Leetaru, Kalev: The GDELT Project



Page 1: Leetaru, Kalev: The GDELT Project
Page 2: Leetaru, Kalev: The GDELT Project
Page 3: Leetaru, Kalev: The GDELT Project

Tweets per month (1% sample)

Active users per month (1% sample)

Page 4: Leetaru, Kalev: The GDELT Project
Page 5: Leetaru, Kalev: The GDELT Project

Datasets

• NEWS: Worldwide local news coverage in 100 languages (65 live translated) – online news preserved via Internet Archive

• TELEVISION: Collaboration with the Internet Archive to process more than 100 television stations across the US, updating daily

• ACADEMIC LITERATURE: 21 billion words covering 70 years (JSTOR/DTIC/CORE/CITESEER/IA)

• BOOKS: Collaboration with Internet Archive and HathiTrust to process 3.5 million books 1800-2015

• HUMAN RIGHTS: Half century of worldwide human rights reports

• IMAGERY: Large fraction of global news imagery processed via deep learning: objects/activities, OCR, logos, facial sentiment, geolocation

Page 6: Leetaru, Kalev: The GDELT Project
Page 7: Leetaru, Kalev: The GDELT Project
Page 8: Leetaru, Kalev: The GDELT Project

Preserving Online News

• World’s largest initiative to preserve online news

• Only program to focus on worldwide local news in local languages

• Partnership with Internet Archive’s NO404 program - prior to this IA’s news archiving was very limited, focused extensively on the Western world and major English-language sources

• Most web archiving efforts favor English-language and Western news outlets

• Working with IA to ensure preservation of mobile formats and enhanced preservation of embedded article imagery

Page 9: Leetaru, Kalev: The GDELT Project

Preserving Online News

• 1.5-2% of news articles disappear within 2 weeks

• 5% disappear within a month

• Up to 14% gone after 2 months – half with 404 and half ranging from sustained 500’s to domain removal (popular in some areas of the world)

• Of GDELT-relevant coverage, 140,000 articles published today will be gone in 2 months

• 14 million GDELT-monitored articles disappeared over a 6-month period, representing 2x the total output of the New York Times over the last half century

• Numbers vastly higher in some countries
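The loss figures above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming that the 140,000 lost articles correspond to the "up to 14%" two-month rate applied to a single day's GDELT-relevant output. That pairing is an inference, not something the slide states, and the function names are illustrative only.

```python
# Disappearance rates quoted on the slide (fraction of articles gone).
LOSS_RATES = {
    "2 weeks (low)": 0.015,
    "2 weeks (high)": 0.02,
    "1 month": 0.05,
    "2 months": 0.14,
}


def implied_daily_articles(lost_after_two_months: int, loss_rate: float) -> float:
    """Back out a daily article volume from a loss count and a loss rate.

    Assumes the loss count and the rate describe the same day's output.
    """
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return lost_after_two_months / loss_rate


def expected_losses(daily_articles: float, rates: dict) -> dict:
    """Expected number of one day's articles gone at each horizon."""
    return {horizon: daily_articles * rate for horizon, rate in rates.items()}


# 140,000 articles gone in 2 months at a 14% loss rate.
daily = implied_daily_articles(140_000, LOSS_RATES["2 months"])
losses = expected_losses(daily, LOSS_RATES)

print(f"Implied GDELT-relevant articles per day: {daily:,.0f}")
for horizon, count in losses.items():
    print(f"  gone after {horizon}: {count:,.0f}")
```

Under that assumption the slide's numbers are mutually consistent with roughly one million GDELT-relevant articles per day; a different pairing of count and rate would give a different volume.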

Page 10: Leetaru, Kalev: The GDELT Project

Preserving Online News

• Manual efforts like Archive-It don’t scale to sudden-onset events like natural disasters or terror attacks – need “always on” archiving. The majority of coverage appears in the first 72 hours and levels off within 14 days.

• Nepal 2015 earthquake: Yale + Columbia preserved 107 URLs with Archive-It.

• Nepal 2015 earthquake: GDELT captured over 667,000 articles about the earthquake and the country’s recovery over the following year, including 225,000 in languages other than English, with the top language being Nepali – capturing the local perspective

Page 11: Leetaru, Kalev: The GDELT Project

Global Event Database

Global Knowledge Graph

Page 12: Leetaru, Kalev: The GDELT Project

Greece

France

Germany

Italy

United Kingdom

Page 13: Leetaru, Kalev: The GDELT Project

Burundi - 12/13/2015

Instability

Tone

Media Attention

Topics

Page 14: Leetaru, Kalev: The GDELT Project
Page 15: Leetaru, Kalev: The GDELT Project
Page 16: Leetaru, Kalev: The GDELT Project

Physical Unrest

Anxiety

Positive/Negative: “Cautiously Optimistic” Trending

Page 17: Leetaru, Kalev: The GDELT Project
Page 18: Leetaru, Kalev: The GDELT Project

US Ebola News Coverage

Number of American television news broadcasts per week mentioning "ebola"

• March 2014 WHO announcement

• First American infections

• Eric Duncan arrives in Dallas

Average “tone” of English language media coverage of “ebola”

• Steady ascent towards more and more positive coverage as “Western medicine miracles to the rescue” theme dominates coverage

Page 19: Leetaru, Kalev: The GDELT Project
Page 20: Leetaru, Kalev: The GDELT Project
Page 21: Leetaru, Kalev: The GDELT Project
Page 22: Leetaru, Kalev: The GDELT Project

Carbon Capture & Sequestration

• English coverage of CCS 2010-2015

• 32,000 websites, 250,000 people, 140,000 organizations, 50,000 locations

• Green cluster (center): senior American policymakers

• Green cluster (lower): “cap and trade” politicians

• Red cluster (bottom): American lawmakers on Congressional energy committee or sponsoring energy-related legislation

• Purple cluster (top right): climate skeptics

• Yellow (upper left): Australian politicians

• Pink (upper center): British politicians

• Periphery of all clusters: journalists and financial analysts who feature prominently in coverage or who write much of the coverage – Karolin Schaps (Reuters) and Alex Morales (Bloomberg News London) are attached to British political cluster; Tom Friedman is attached to American political cluster

Page 23: Leetaru, Kalev: The GDELT Project
Page 24: Leetaru, Kalev: The GDELT Project

• Red: Actual Ukraine

• Green: Avg Turkey (2/19/1999-4/20/1999) and Lebanon (3/24/2007-5/23/2007)

• (r=0.49)
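The comparison on this slide (an actual Ukraine series against the average of two historical periods, summarized by r) can be illustrated with a short sketch. The slide does not say what quantity is plotted or how the series were aligned, so this assumes equal-length, day-aligned series and Pearson's correlation coefficient. The numbers are made-up placeholders, not GDELT data, and will not reproduce r=0.49.

```python
import math


def average_series(*series):
    """Element-wise mean of several equal-length series."""
    if len({len(s) for s in series}) != 1:
        raise ValueError("all series must have the same length")
    return [sum(values) / len(values) for values in zip(*series)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values standing in for the two historical periods
# and the actual series (hypothetical, for illustration only).
turkey = [1.0, 2.0, 4.0, 3.0, 5.0]
lebanon = [2.0, 2.0, 3.0, 5.0, 4.0]
ukraine = [1.5, 2.5, 3.0, 4.5, 4.0]

analog = average_series(turkey, lebanon)  # the "Avg Turkey and Lebanon" line
r = pearson_r(ukraine, analog)
print(f"r = {r:.2f}")
```

Averaging the historical periods before correlating mirrors the slide's "Avg Turkey and Lebanon" line; correlating against each period separately would be an equally reasonable alternative.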

Page 25: Leetaru, Kalev: The GDELT Project
Page 26: Leetaru, Kalev: The GDELT Project


http://kalevleetaru.com

http://gdeltproject.org/

http://blog.gdeltproject.org/