Keynote talk: Big Crisis Data, an Open Invitation

Page 1: Keynote talk: Big Crisis Data, an Open Invitation

BIG CRISIS DATA: An Open Invitation

CARLOS CASTILLO, @BigCrisisData

Manaus, Brazil, October 2015

Page 2: Keynote talk: Big Crisis Data, an Open Invitation

Big Crisis Data — Carlos Castillo 2

This talk is about ...

● Disasters and time-critical situations

– Natural, social, or technological hazards

– Mass convergence events

● Social media

– Particularly microtext

● Computing

– Applications of many fields including NLP, ML, IR

Page 3: Keynote talk: Big Crisis Data, an Open Invitation

http://www.youtube.com/watch?v=0UFsJhYBxzY

Page 4: Keynote talk: Big Crisis Data, an Open Invitation

An earthquake hits a Twitter user

http://xkcd.com/723/

● When an earthquake strikes, the first tweets are posted 20-30 seconds later

● Damaging seismic waves travel at 3-5 km/s, while network communications travel at close to light speed over fiber/copper, plus latency

● Beyond ~100 km, seismic waves may be overtaken by tweets about them
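
The ~100 km figure follows from simple arithmetic, sketched here in Python. The sketch assumes that network transmission time is negligible next to the 20-30 second posting delay (an assumption, not a claim from the slides), and `overtake_distance_km` is a hypothetical helper name.

```python
def overtake_distance_km(wave_speed_km_s: float, tweet_delay_s: float) -> float:
    """Distance the seismic wave has covered when the first tweet is posted.

    Assumption: the tweet then reaches readers almost instantly, so anyone
    farther away than this distance can read the tweet before the wave arrives.
    """
    return wave_speed_km_s * tweet_delay_s


# Ranges quoted on the slide: waves at 3-5 km/s, first tweets after 20-30 s.
for speed in (3.0, 5.0):
    for delay in (20.0, 30.0):
        distance = overtake_distance_km(speed, delay)
        print(f"{speed} km/s, {delay} s delay: {distance} km")
```

With the ranges quoted on the slide, the crossover distance falls between 60 km and 150 km, which is consistent with the rough 100 km figure.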

Page 5: Keynote talk: Big Crisis Data, an Open Invitation

January 2010

How/when did it start for me?

Page 6: Keynote talk: Big Crisis Data, an Open Invitation

Humanitarian Computing

At least 775 publications:

● Crisis Analysis (55)

● Crisis Management (309)

● Situational Awareness (67)

● Social Media (231)

● Mobile Phones (74)

● Crowdsourcing (116)

● Software and Tools (97)

● Human-Computer Interaction (28)  

● Natural Language Processing (33)  

● Trust and Security (33)

● Geographical Analysis (53)

Source: http://humanitariancomp.referata.com/

Page 7: Keynote talk: Big Crisis Data, an Open Invitation

Humanitarian Computing Topics

Page 8: Keynote talk: Big Crisis Data, an Open Invitation

Page 9: Keynote talk: Big Crisis Data, an Open Invitation

Page 10: Keynote talk: Big Crisis Data, an Open Invitation

Fertile grounds for applied research

✔ Problems of global significance

✔ Solved with labor-intensive methods

✔ Better solution provides a public good

✔ Large and noisy data sets available

✔ Engage volunteer communities

Page 11: Keynote talk: Big Crisis Data, an Open Invitation

Fertile grounds for applied research

✔ Problems of global significance

✔ Solved with labor-intensive methods

✔ Better solution provides a public good

✔ Large and noisy data sets available

✔ Engage volunteer communities

Relevance to practitioners?

Page 12: Keynote talk: Big Crisis Data, an Open Invitation

Recent collaborators

Patrick Meier

Sarah Vieweg – QCRI

Muhammad Imran – QCRI

Irina Temnikova – QCRI

Alexandra Olteanu – EPFL

Aditi Gupta – IIIT Delhi

“P.K.” Kumaraguru – IIIT Delhi

Fernando Diaz – Microsoft

Page 13: Keynote talk: Big Crisis Data, an Open Invitation

Outline

Volume

Vagueness

Visualization

Volunteering

Values

Page 14: Keynote talk: Big Crisis Data, an Open Invitation

Disaster Communications and Scale

Page 15: Keynote talk: Big Crisis Data, an Open Invitation

Crises and disasters

● Crises are unstable situations

– May or may not lead to a disaster

● Disasters are social phenomena

– Disruptions of routines

Page 16: Keynote talk: Big Crisis Data, an Open Invitation

Temporal and Spatial Dimensions

Page 17: Keynote talk: Big Crisis Data, an Open Invitation

Examples

Page 18: Keynote talk: Big Crisis Data, an Open Invitation

REEL LIFE OR REAL LIFE?

Page 19: Keynote talk: Big Crisis Data, an Open Invitation

REEL LIFE OR REAL LIFE?

Page 20: Keynote talk: Big Crisis Data, an Open Invitation

https://www.youtube.com/watch?v=MylI8HmgMBk

Page 21: Keynote talk: Big Crisis Data, an Open Invitation

In Real Life ...

● Some people panic, most people don't

● People gather information from familiar sources

● People quickly decide whether to flee, take cover, or take action

● People improvise complex rescue operations on the spot

Devon, UK, June 2014; London, UK, May 2015; San José Boquerón, Paraguay, Oct 2013

Page 22: Keynote talk: Big Crisis Data, an Open Invitation

Example Disaster-Related Messages

“OMG! The fire seems out of control: It’s running down the hills!”

Bush fire near Marseilles, France, in 2009 [Longueville et al. 2009]

“Red River at East Grand Forks is 48.70 feet, +20.7 feet of flood stage, -5.65 feet of 1997 crest. #flood09”

Red River Valley floods in 2009 [Starbird et al. 2010]

“My moms backyard in Hatteras. That dock is usually about 3 feet above water [photo]”

Hurricane Sandy 2012 [Leavitt and Clark 2014]

“Sirens going off now!! Take cover...be safe!”

Moore Tornado 2013 [Blanford et al. 2014]

“There is shooting at Utøya, my little sister is there and just called home!”

2011 attacks in Norway [Perng et al. 2013]

Page 23: Keynote talk: Big Crisis Data, an Open Invitation

Social media usage during disasters

● Interpersonal (horizontal)

– Stay in touch with family and friends

● Citizen sensing (bottom-up)

– Read/Write reports on ground situation

● Official communications (top-down)

– E.g. advice, warnings, or evacuation orders

Page 24: Keynote talk: Big Crisis Data, an Open Invitation

Scale: Tweets per Second

Page 25: Keynote talk: Big Crisis Data, an Open Invitation

Requirements

● Typical users

– Emergency response services

– Humanitarian relief agencies

– Journalists and the Public

● Underspecified requirements that vary over time

● Usually a combination of:

1) Capture the “Big Picture”

2) Obtain “Actionable Insights”

Page 26: Keynote talk: Big Crisis Data, an Open Invitation

Understanding, Classifying and Extracting

Page 27: Keynote talk: Big Crisis Data, an Open Invitation

Example

“Media must report about d alleged 20k RSS chaps off 2 #Nepal.here’s a pic coz d 1 @ShainaNC shared isn’t true.. ;)”

Page 28: Keynote talk: Big Crisis Data, an Open Invitation

Social media messages

● Social media is more like a transcript of a conversation than like text meant to stand on its own

– Awkward entry methods:

  ● Fragmented language and incomplete sentences

  ● Many typographic and grammatical errors

– Conversational:

  ● Little or no context (hard to comprehend in isolation)

  ● Code switching and borrowing

  ● Internet slang

Page 29: Keynote talk: Big Crisis Data, an Open Invitation

Slang

Page 30: Keynote talk: Big Crisis Data, an Open Invitation

Classification

[Diagram: filtered tweets are classified. Labels shown: Caution & Advice, Information Sources, Damage & Casualties, Donations, ...; Gov, Eyewitness, Media, NGO, Outsider, ...]

Page 31: Keynote talk: Big Crisis Data, an Open Invitation

Classification Axes

● By usefulness (application-dependent!)

– Not related, Related but useless, Useful

● By factual, subjective, or emotional content

● By information provided

● By information source

– Government, NGOs, media, eyewitnesses, etc.

● By humanitarian clusters

Page 32: Keynote talk: Big Crisis Data, an Open Invitation

Humanitarian Clusters

Page 33: Keynote talk: Big Crisis Data, an Open Invitation

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: What to Expect When the Unexpected Happens: Social Media Communications Across Crises. To appear in CSCW 2015.

Humanitarian Clusters (cont.)

Page 34: Keynote talk: Big Crisis Data, an Open Invitation

A large-scale study of crisis tweets

● Collect tweets from 26 disasters

● Classify according to:

– Informative / Not informative

– Information provided

– Information source

● Several iterations required to write the “right” instructions

Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: "What to Expect When the Unexpected Happens: Social Media Communications Across Crises" In CSCW 2015, 14-18 March in Vancouver, Canada. ACM Press.

Page 35: Keynote talk: Big Crisis Data, an Open Invitation

Information Provided in Crisis Tweets

N=26; Data available at http://crisislex.org/

Page 36: Keynote talk: Big Crisis Data, an Open Invitation

What do people tweet about?

● Affected individuals

– 20% on average (min. 5%, max. 57%)

– most prevalent in human-induced, focalized & instantaneous events

● Sympathy and emotional support

– 20% on average (min. 3%, max. 52%)

– most prevalent in instantaneous events

● Other useful information

– 32% on average (min. 7%, max. 59%)

– least prevalent in diffused events

Page 37: Keynote talk: Big Crisis Data, an Open Invitation

What do people tweet about? (cont.)

● Infrastructure and utilities

– 7% on average (min. 0%, max. 22%)

– most prevalent in diffused events, in particular floods

● Caution and advice

– 10% on average (min. 0%, max. 34%)

– least prevalent in instantaneous & human-induced events

● Donations and volunteering

– 10% on average (min. 0%, max. 44%)

– most prevalent in natural hazards

Page 38: Keynote talk: Big Crisis Data, an Open Invitation

Distribution over information sources

Page 39: Keynote talk: Big Crisis Data, an Open Invitation

Distribution over time

Page 40: Keynote talk: Big Crisis Data, an Open Invitation

Dataset

CrisisLexT26

www.crisislex.org

Page 41: Keynote talk: Big Crisis Data, an Open Invitation

Information Extraction

...

Classified tweets

@JimFreund: Apparently we have no choice. There is a tornado watch in effect tonight.

Page 42: Keynote talk: Big Crisis Data, an Open Invitation

Extraction

● #hashtags, @user mentions, URLs, etc.

– Regular expressions

– Text library from Twitter

● Temporal expressions

– Part-of-speech tagger + heuristics

– Natty library

● Supervised learning

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: Practical Extraction of Disaster-Relevant Information from Social Media. Social Web and Disaster Management (SWDM) workshop. Rio de Janeiro, Brazil, 2013.
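
The regular-expression route for hashtags, mentions, and URLs can be sketched as follows. This is a minimal illustration and not the patterns used in the cited work: the three expressions and the `extract_entities` helper are assumptions, and they are deliberately simplified (for example, they would also match an `@` inside an e-mail address or a `#` inside a URL fragment).

```python
import re

# Simplified patterns (assumptions for illustration only).
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")
URL_RE = re.compile(r"https?://\S+")


def extract_entities(text: str) -> dict:
    """Return the hashtags, user mentions, and URLs found in one message."""
    return {
        "hashtags": HASHTAG_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "urls": URL_RE.findall(text),
    }


# One of the example tweets shown later in the talk.
tweet = ("RT @twc_hurricane: Wind gusts over 60 mph are being reported at "
         "Central Park and JFK airport in #NYC this hour. #Sandy")
entities = extract_entities(tweet)
print(entities)
```

A dedicated tweet-parsing library handles the edge cases that these simple patterns miss, which is presumably why the slide lists one alongside plain regular expressions.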

Page 43: Keynote talk: Big Crisis Data, an Open Invitation

Labels for extraction

● Type-dependent instruction

● Ask evaluators to copy-paste a word/phrase from each tweet

Page 44: Keynote talk: Big Crisis Data, an Open Invitation

Learning: Conditional Random Fields

● Extends HMM to incorporate more possible dependencies

● Used extensively in NLP for part-of-speech tagging and information extraction

[Figure: graphical models of an HMM and a linear-chain CRF, each with hidden and observed nodes]
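
For reference, the standard textbook forms of the two models are given below. The notation is not from the slides and is an assumption for illustration: x is the observed token sequence, y the hidden label sequence, f_k are feature functions with weights λ_k, and Z(x) normalizes over all label sequences.

```latex
% HMM: generative model of the joint distribution
% (p(y_1 \mid y_0) denotes the initial-state distribution)
p(\mathbf{x}, \mathbf{y}) = \prod_{t=1}^{T} p(y_t \mid y_{t-1}) \, p(x_t \mid y_t)

% Linear-chain CRF: discriminative model of the conditional distribution
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)
```

Because each feature function may inspect the whole observed sequence, a CRF can use overlapping cues such as capitalization or neighboring words, which is one way to read "more possible dependencies".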

Page 45: Keynote talk: Big Crisis Data, an Open Invitation

Tool

● CMU ARK Twitter NLP

– Tokenization

– Feature extraction

– CRF learning

● Very easy to use

– simply change the training set (part-of-speech tags),

– then re-train

Page 46: Keynote talk: Big Crisis Data, an Open Invitation

Output examples

RT @weatherchannel: .@NYGovCuomo orders closing of NYC bridges. Only Staten Island bridges unaffected at this time. Bridges must close by 7pm. #Sandy #NYC

Wow what a mess #Sandy has made. Be sure to check on the elderly and homeless please! Thoughts and prayers to all affected

RT @twc_hurricane: Wind gusts over 60 mph are being reported at Central Park and JFK airport in #NYC this hour. #Sandy

RT @mitchellreports: Red Cross tells us grateful for Romney donation but prefer people send money or donate blood dont collect goods NOT best way to help #Sandy

Page 47: Keynote talk: Big Crisis Data, an Open Invitation

Extractor evaluation

Setting                                     Recall   Precision
Train 2/3 Joplin, Test 1/3 Joplin           78%      90%
Train 2/3 Sandy, Test 1/3 Sandy             41%      79%
Train Joplin, Test Sandy                    11%      78%
Train Joplin + 10% Sandy, Test 90% Sandy    21%      81%

● Precision is measured leniently: an extraction counts as correct if it has one word or more in common with what humans extracted
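
The lenient precision criterion can be sketched in a few lines of Python. The precision rule follows the slide; the recall definition used here (hits divided by the number of tweets where humans extracted something) is an assumption, as are the helper name `lenient_scores` and the toy phrases, which are hypothetical and not taken from the evaluation data.

```python
def lenient_scores(human: list, system: list) -> tuple:
    """Compute lenient precision and recall over per-tweet extractions.

    `human` and `system` hold one extracted phrase per tweet ("" if none).
    A system extraction is a hit when it shares at least one word with
    the human extraction for the same tweet.
    """
    def words(phrase: str) -> set:
        return set(phrase.lower().split())

    attempted = [(h, s) for h, s in zip(human, system) if s]
    hits = sum(1 for h, s in attempted if words(h) & words(s))
    labeled = sum(1 for h in human if h)

    precision = hits / len(attempted) if attempted else 0.0
    recall = hits / labeled if labeled else 0.0  # assumed definition
    return precision, recall


# Hypothetical toy data: four tweets, human vs. system extractions.
human = ["7pm", "Central Park and JFK airport", "blood", ""]
system = ["by 7pm", "Central Park", "", "money"]
precision, recall = lenient_scores(human, system)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this lenient rule a partial match such as "Central Park" counts in full, which helps explain why precision can stay high while recall drops sharply when the extractor is tested on a different disaster.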

Page 48: Keynote talk: Big Crisis Data, an Open Invitation

Donations matching

● Identify and match requests/offers for donations

– Money, clothing, food, shelter, volunteers, blood

● Method

– Classify

– Determine key aspects

– Extract key aspects

– Per-aspect matching

Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers. First Monday 19 (1), January 2014.

Page 49: Keynote talk: Big Crisis Data, an Open Invitation

Donations matching

Average precision = 0.21 (0.16 if only text similarity is used)
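
For readers unfamiliar with the metric, here is a minimal sketch of one common way to compute average precision for a single ranked list of candidate matches. Whether the cited study used exactly this variant is not stated on the slide; the function name and the toy ranking are assumptions, and the sketch normalizes by the number of relevant items found in the list.

```python
def average_precision(ranked_relevance: list) -> float:
    """Average of precision@k taken at each rank k that holds a relevant item.

    `ranked_relevance` lists, from best-ranked to worst, whether each
    candidate match (e.g. an offer retrieved for a request) is correct.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: correct matches at ranks 1 and 3.
print(average_precision([True, False, True, False]))
```

The metric rewards placing correct matches near the top of the ranking, so a value such as 0.21 indicates that correct request/offer pairs are often ranked well below the first positions.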

Page 50: Keynote talk: Big Crisis Data, an Open Invitation

Crisis maps from social media

Page 51: Keynote talk: Big Crisis Data, an Open Invitation

Page 52: Keynote talk: Big Crisis Data, an Open Invitation

Page 53: Keynote talk: Big Crisis Data, an Open Invitation

Patrick Meier, Social Innovation Director @ QCRI – http://irevolution.net/

“What can speed humanitarian response to tsunami-ravaged coasts? Expose human rights atrocities? Launch helicopters to rescue earthquake victims? Outwit corrupt regimes? A map.”

Page 54: Keynote talk: Big Crisis Data, an Open Invitation

Crisis mapping goes mainstream (2011)

Page 55: Keynote talk: Big Crisis Data, an Open Invitation

Page 56: Keynote talk: Big Crisis Data, an Open Invitation

Page 57: Keynote talk: Big Crisis Data, an Open Invitation

Page 58: Keynote talk: Big Crisis Data, an Open Invitation

Page 59: Keynote talk: Big Crisis Data, an Open Invitation

Page 60: Keynote talk: Big Crisis Data, an Open Invitation

Automatic Mapping (floods)

● Top: hydrological data

● Bottom: tweet density

● Broad match with affected areas

● Many biases towards places with higher density of smartphones

De Albuquerque, João Porto, Herfort, Benjamin, Brenning, Alexander, and Zipf, Alexander. 2015. A geographic approach for combining social media and authoritative data towards identifying useful information for disaster management. International Journal of Geographical Information Science, 29(4), 667–689.

Page 61: Keynote talk: Big Crisis Data, an Open Invitation

Automatic Mapping (Dengue)

Gomide, Janaina and Veloso, Adriano and Meira, Wagner and Almeida, Virgilio and Benevenuto, Fabricio and Ferraz, Fernanda and Teixeira, Mauro (2011) Dengue surveillance based on a computational model of spatio-temporal locality of Twitter. pp. 1-8. In: Proceedings of the ACM WebSci'11, June 14-17 2011, Koblenz, Germany.

● Top: official reports

● Bottom: tweets

Page 62: Keynote talk: Big Crisis Data, an Open Invitation

Current Approach

Hybrid real-time systems

MicroMappers

Manual processing: crowdsourcing

Automatic processing: machine learning
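
One way such a hybrid system can divide the work is to let the classifier keep the messages it is confident about and send the rest to human volunteers. The sketch below illustrates that idea only; the confidence threshold, the `Prediction` record, and the `route` function are assumptions and do not describe how MicroMappers or any specific system is implemented.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    text: str          # the message
    label: str         # label proposed by the automatic classifier
    confidence: float  # classifier confidence in [0, 1]


def route(predictions, threshold=0.8):
    """Split predictions into (accepted automatically, sent to the crowd)."""
    automatic, crowd = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            automatic.append(prediction)
        else:
            crowd.append(prediction)
    return automatic, crowd


# Hypothetical classifier output for two messages.
batch = [
    Prediction("Bridges must close by 7pm", "caution_and_advice", 0.93),
    Prediction("Wow what a mess", "other", 0.41),
]
automatic, crowd = route(batch)
print(len(automatic), "automatic,", len(crowd), "for volunteers")
```

Labels produced by volunteers for the low-confidence items can then be added to the training set, so the automatic side improves as the event unfolds.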

Page 63: Keynote talk: Big Crisis Data, an Open Invitation

http://newsbeatsocial.com/watch/0_s6xxcr3p

Page 64: Keynote talk: Big Crisis Data, an Open Invitation

Page 65: Keynote talk: Big Crisis Data, an Open Invitation

Page 66: Keynote talk: Big Crisis Data, an Open Invitation

https://www.youtube.com/watch?v=uKgE3yWJ0_I

Page 67: Keynote talk: Big Crisis Data, an Open Invitation

Volunteering and Values

Page 68: Keynote talk: Big Crisis Data, an Open Invitation

Volunteering is a constant

● Integral part of how communities react to disasters

● Organizational types:

– Existing – Extending – Expanding – Emerging

● Emergent organizations are a mixed blessing for existing ones

● New scenario: digital volunteering

– E.g. volunteer annotations, including crisis mapping

Page 69: Keynote talk: Big Crisis Data, an Open Invitation

Why do people volunteer?

Altruism is key, but it's one of many reasons

Page 70: Keynote talk: Big Crisis Data, an Open Invitation

Privacy and Ethics

● Protect the privacy of individuals

– ICRC Data Protection Guidelines

– UN Guidelines on Cyber Security

● Protect victims and responders during armed attacks

● Protect volunteers from distal exposure

● Protect citizen reporters from danger and retaliation

● Give back and share results and data

Page 71: Keynote talk: Big Crisis Data, an Open Invitation

“I'm dying, they are tweeting”

Digital Voyeurism

Page 72: Keynote talk: Big Crisis Data, an Open Invitation

CONCLUSIONS

Page 73: Keynote talk: Big Crisis Data, an Open Invitation

Computationally feasible

Supported by data

Useful

Good projects in this space

Page 74: Keynote talk: Big Crisis Data, an Open Invitation

Computationally feasible

Supported by data

Useful

Good projects in this space

Temptation! Danger!

Poorly planned projects :-(

AI-complete problems

Page 75: Keynote talk: Big Crisis Data, an Open Invitation

Interdisciplinary Research

● Like many things, it has Good, Bad, and Ugly aspects

● Good

– You learn a lot, and it's the only way of supporting claims of practical utility in applied research

● Bad

– Formal response organizations can be very difficult to engage with; relationships should be established between operations

● Ugly

– Working software and 24/7 support for a critical need now vs advanced proof-of-concept later

Page 76: Keynote talk: Big Crisis Data, an Open Invitation

Possibility of large impact by using computer science to support humanitarian work

= Applied computing at its best

Page 77: Keynote talk: Big Crisis Data, an Open Invitation

References

● Carlos Castillo: "Big Crisis Data." Cambridge University Press, 2016 (forthcoming).

● Muhammad Imran, Carlos Castillo, Fernando Diaz, Sarah Vieweg: "Processing Social Media Messages in Mass Emergency: A Survey." In ACM Computing Surveys, Volume 47, Issue 4, June 2015.

● Alexandra Olteanu, Sarah Vieweg and Carlos Castillo: "What to Expect When the Unexpected Happens: Social Media Communications Across Crises." In CSCW 2015, 14-18 March in Vancouver, Canada. ACM Press.

● Muhammad Imran, Ioanna Lykourentzou, Yannick Naudet and Carlos Castillo: "Engineering Crowdsourced Stream Processing Systems." Technical report, 2015.

● Hemant Purohit, Amit Sheth, Carlos Castillo, Patrick Meier, Fernando Diaz: "Emergency-Relief Coordination on Social Media: Automatically Matching Resource Requests and Offers." First Monday 19 (1), January 2014.

● Sarah Vieweg, Carlos Castillo and Muhammad Imran: "Integrating Social Media Communications into the Rapid Assessment of Sudden Onset Disasters." SocInfo 2014.

● Alexandra Olteanu, Carlos Castillo, Fernando Diaz and Sarah Vieweg: "CrisisLex: A Lexicon for Collecting and Filtering Microblogged Communications in Crises." In ICWSM. Ann Arbor, MI, USA. June 2014.

● Carlos Castillo, Marcelo Mendoza, Barbara Poblete: "Predicting Information Credibility in Time-Sensitive Social Media" (+Supplementary Material). In Internet Research, Vol. 23, Issue 5, Special issue on The Predictive Power of Social Media, pp. 560-588. October 2013.

● Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: "Practical Extraction of Disaster-Relevant Information from Social Media." Social Web and Disaster Management (SWDM) workshop. Rio de Janeiro, Brazil, 2013.

● Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz and Patrick Meier: "Extracting Information Nuggets from Disaster-Related Messages in Social Media." In ISCRAM. Baden-Baden, Germany, 2013. Best paper award.

Page 78: Keynote talk: Big Crisis Data, an Open Invitation

Thank you!

Follow @BigCrisisData