Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius. Qatar Computing Research Institute (QCRI), Doha, Qatar.

description

An emerging paradigm for the processing of data streams involves human and machine computation working together, allowing human intelligence to be applied to large-scale data. We apply this approach to the classification of crisis-related messages in microblog streams. We begin by describing the platform AIDR (Artificial Intelligence for Disaster Response), which collects human annotations over time to create and maintain automatic supervised classifiers for social media messages. Next, we study two significant challenges in its design: (1) identifying which elements must be labeled by humans, and (2) determining when to ask for such annotations. The first challenge is selecting the items to be labeled by crowdsourcing workers so as to maximize the productivity of their work. The second challenge is scheduling that work so as to reliably maintain high classification accuracy over time. We provide and validate answers to these challenges through extensive experimentation on real-world datasets.

Transcript of Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Page 1: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius

Qatar Computing Research Institute (QCRI), Doha, Qatar

Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Page 2: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

USEFUL INFORMATION ON TWITTER

[Chart: percentage of informative tweets by category. Caution and advice: a siren heard, tornado warning issued/lifted, tornado sighting/touchdown. Information source: photos as info. source, webpages as info. source, videos as info. source. Donations: money; equipment, shelter, volunteers, blood; other donations. Casualties & damage: people injured, people dead, damage.]

Ref: "Extracting Information Nuggets from Disaster-Related Messages in Social Media". Imran et al., ISCRAM 2013, Baden-Baden, Germany.

Page 3: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

SOCIAL MEDIA INFORMATION PROCESSING: OFFLINE APPROACH

[Diagram: offline pipeline along the disaster timeline, starting with data collection: (1) data collection, (2) human annotations on sample data, (3) machine training, (4) classification.]

Page 4: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

IMPACT AND RESPONSE TIMELINE

[Chart: disaster response today vs. target disaster response over the impact and response timeline. Source: Department of Community Safety, Queensland Govt. 2011 & UNOCHA.]

Target disaster response requires real-time processing.

Page 5: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

REAL-TIME SOCIAL MEDIA ANALYSIS

Key requirements:

• Real-time data collection, able to incorporate new data collection strategies
• Obtain human labels in real time; perform de-duplication
• Perform almost-online machine learning: continuous learning, learning as new labels arrive
• Perform real-time classification and scale with big disasters (Sandy: 15k posts/min)
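Taken together, these requirements point to a streaming loop that de-duplicates incoming posts, classifies them immediately, and keeps learning as new human labels arrive. Below is a minimal sketch of such a loop in Python, assuming scikit-learn's HashingVectorizer and SGDClassifier (which supports incremental partial_fit); the data structures and function are hypothetical placeholders, not AIDR's actual API.

```python
# Minimal sketch (not AIDR's implementation): de-duplicate, classify in real
# time, and update the model incrementally as new human labels arrive.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
classifier = SGDClassifier()          # supports incremental partial_fit
CLASSES = [0, 1]                      # e.g., not informative / informative
seen_texts = set()                    # naive exact-duplicate filter

def process_batch(tweets, new_labels):
    """tweets: list of {'text': ...}; new_labels: list of (text, label) pairs."""
    # 1) De-duplication (exact text match here; near-duplicates also matter).
    fresh = [t for t in tweets if t["text"] not in seen_texts]
    seen_texts.update(t["text"] for t in fresh)

    # 2) Continuous learning: fold in labels as soon as annotators produce them.
    if new_labels:
        X = vectorizer.transform([text for text, _ in new_labels])
        y = [label for _, label in new_labels]
        classifier.partial_fit(X, y, classes=CLASSES)

    # 3) Real-time classification of the new, unique tweets.
    if fresh and hasattr(classifier, "coef_"):
        X = vectorizer.transform([t["text"] for t in fresh])
        return list(zip(fresh, classifier.predict(X)))
    return []
```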

Page 6: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

SOCIAL MEDIA INFORMATION PROCESSING: ONLINE APPROACH (REAL-TIME)

[Diagram: online pipeline: (1) data collection, (2) human annotations, (3) machine training, (4) classification. During the first few hours, human annotation batches 1 through n arrive continuously, each triggering a new learning round (Learning-1 through Learning-n), while classification runs in real time.]

Page 7: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

http://aidr.qcri.org/

AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.

Three steps: (1) Collect, (2) Curate, (3) Classify.

Page 8: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

AIDR: FROM THE END-USER'S PERSPECTIVE

A two-step approach:

1. Collection: a collection is a set of filters
• Keywords, hashtags
• Geographical bounding box
• Languages
• Follow a specific set of users

2. Classifier(s): a classifier is a set of tags, e.g.
• Donation requests & offers
• Damage & casualties
• Eyewitness accounts
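As an illustration of these two objects, a collection and a classifier could be described as below; the field names and values are hypothetical and do not reflect AIDR's actual configuration schema.

```python
# Hypothetical illustration only; field names are not AIDR's actual schema.
collection = {
    "name": "oklahoma-tornado-2013",
    "keywords_and_hashtags": ["Oklahoma", "tornado"],
    "geo_bounding_box": [-103.0, 33.6, -94.4, 37.0],  # approx. lon/lat bounds of Oklahoma
    "languages": ["en"],
    "follow_users": [],
}

classifier = {
    "name": "crisis-tweet-tags",
    "tags": [
        "donation_requests_and_offers",
        "damage_and_casualties",
        "eyewitness_accounts",
    ],
}
```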


Page 9: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

REAL-TIME CLASSIFICATION IN AIDR

[Diagram: a collection feeds one or more classifiers (Classifier-1, Classifier-2, …) at rates of up to 30k tweets/min. Within each classifier, a learner consumes human labeling tasks to update a model, which assigns tags to incoming tweets.]

Page 10: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

HUMAN ANNOTATION: CHALLENGES

Crowdsourcing is a big research topic. We address two challenges here:

1. Labeling task selection
• Which tasks to pick?
• No duplicate tasks should be labeled
• Prioritize tasks that are likely to increase accuracy (see the sketch below)

2. Labeling task scheduling
• All-at-once labeling
• Gradual labeling
• Independent labeling

Crisis-specific labels are necessary:
• Contrasting vocabulary
• Differences in public concerns and affected infrastructure
• New labels should be collected for each new crisis

[Imran et al. 2013b]
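For task selection, one concrete way to prioritize tasks that are likely to increase accuracy is uncertainty sampling over a de-duplicated pool of unlabeled tweets. The sketch below is illustrative and assumes a scikit-learn style classifier exposing predict_proba; it is not necessarily the exact criterion used in AIDR.

```python
import numpy as np

def select_labeling_tasks(classifier, vectorizer, unlabeled_texts, k=50):
    """Pick the k tweets the current model is least certain about.

    Assumes `unlabeled_texts` has already been de-duplicated, and that
    `classifier` exposes predict_proba (e.g., logistic regression).
    """
    X = vectorizer.transform(unlabeled_texts)
    proba = classifier.predict_proba(X)
    uncertainty = 1.0 - proba.max(axis=1)     # low confidence => high priority
    top = np.argsort(-uncertainty)[:k]
    return [unlabeled_texts[i] for i in top]
```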

Page 11: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

DATASETS

1. Joplin-2011: 206,764 tweets collected using #joplin
2. Sandy-2012: 4,906,521 tweets collected using #sandy, "hurricane sandy", …
3. Oklahoma-2013: 2,742,588 tweets collected using "Oklahoma", "tornado", …

Page 12: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

DISASTER PHASES & # OF TWEETS

• Pre: preparedness phase
• Impact: the period in which the main effects are felt
• Post: response and recovery phase

[Figure: number of tweets per day in each dataset: Joplin (left), Sandy (center), Oklahoma (right).]

Page 13: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Are crisis-specific labels necessary?

Manual labeling (using CrowdFlower):

Dataset     Phase-S1   Phase-S2   Phase-S3   Phase-S4
Joplin      2,000      1,000      1,000      1,000
Sandy       2,000      1,000      1,000      1,000
Oklahoma    2,000      1,000      1,000      N/A

Classification accuracy (AUC) in various transfer scenarios:

Train     Test        AUC
Joplin    Sandy       0.52
Joplin    Oklahoma    0.56
Sandy     Oklahoma    0.53

* AUC 0.5 represents a random classifier.
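The transfer setting behind these numbers is "train on one crisis, test on another". A simple way to reproduce that kind of measurement is sketched below; the learner and features are assumptions for illustration, not the exact setup used in these experiments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def transfer_auc(train_texts, train_labels, test_texts, test_labels):
    """Train on one crisis and report AUC on another (0.5 = random classifier).

    Labels are assumed to be 0/1 (e.g., not informative / informative).
    """
    vectorizer = TfidfVectorizer(min_df=2)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(train_texts), train_labels)
    scores = model.predict_proba(vectorizer.transform(test_texts))[:, 1]
    return roc_auc_score(test_labels, scores)

# e.g., transfer_auc(joplin_texts, joplin_labels, sandy_texts, sandy_labels)
```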

Page 14: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Is de-duplication necessary?

Train phase   # Train   Test phase    # Test   AUC (without de-duplication)   AUC (with de-duplication)
S1 (pre)      1,500     S1 (pre)      500      0.78                           0.74
S1 (pre)      500       S1 (pre)      500      0.73                           0.72
S2 (impact)   500       S2 (impact)   500      0.80                           0.72
S3 (post)     500       S3 (post)     500      0.79                           0.73
S4 (post')    500       S4 (post')    500      0.70                           0.64

• 29-74% of tweets are re-tweets and 60-75% are near-duplicates
• Duplication causes an artificial increase in accuracy
• De-duplication is necessary to reduce classifier bias; otherwise the classifier learns from fewer concepts
• De-duplication is necessary to improve the workers' experience

[Rogstadius et al. 2011]
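A minimal de-duplication sketch along these lines: strip retweet prefixes, then drop near-duplicates whose word-shingle overlap exceeds a threshold. The 0.8 threshold and the shingle size are assumptions for illustration, not the parameters used in these experiments.

```python
import re

def word_shingles(text, n=3):
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def deduplicate(texts, threshold=0.8):
    """Keep one representative per group of retweets/near-duplicate tweets."""
    kept, kept_shingles = [], []
    for text in texts:
        text = re.sub(r"^RT @\w+:\s*", "", text)   # drop the retweet prefix
        s = word_shingles(text)
        is_near_dup = any(
            len(s & t) / len(s | t) >= threshold for t in kept_shingles
        )
        if not is_near_dup:
            kept.append(text)
            kept_shingles.append(s)
    return kept
```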

Page 15: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Which approach to follow, passive or active learning?

[Figure: passive vs. active learning results over phases S1-S4 for Joplin (left), Sandy (center), and Oklahoma (right).]

Page 16: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

• Are crisis-specific labels necessary? [YES]
• Is de-duplication necessary? [YES]
• Which approach to follow, passive or active learning? [Active learning]

Now we know WHICH tasks to select, but we still don't know WHEN to label them.

Page 17: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SCHEDULING

• All-at-once labeling: obtain 1,500 labels on S1 and use all of them for training
• Cumulative labeling: obtain 500 labels in each of S1, S2, and S3 and train on all labels available up to each phase
• Independent labeling: obtain 500 labels in each of S1, S2, and S3 and train only on the most recent labels, discarding older ones
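The three strategies differ only in which labels are used to (re)train the classifier at each phase. A compact sketch follows, where phase_label_batches is a hypothetical list of label batches, one per phase.

```python
def training_labels(phase_label_batches, current_phase, strategy):
    """Return the labels to train on at `current_phase` (0-indexed).

    phase_label_batches: one batch of (text, label) pairs per phase.
    strategy: "all_at_once" | "cumulative" | "independent"
    """
    if strategy == "all_at_once":
        return phase_label_batches[0]                 # only the initial S1 batch
    if strategy == "cumulative":
        batches = phase_label_batches[:current_phase + 1]
        return [pair for batch in batches for pair in batch]
    if strategy == "independent":
        return phase_label_batches[current_phase]     # most recent batch only
    raise ValueError(f"unknown strategy: {strategy}")
```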

 

Page 18: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SCHEDULING

Experiment: Which labeling strategy to follow?

[Figure: results of the three labeling strategies for the Informative, Informative (50%), and Donations classifiers on Joplin (left), Sandy (center), and Oklahoma (right).]

Page 19: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

CONCLUSION & FUTURE WORK

• Task selection
  • De-duplication is necessary
  • An active learning approach must be employed
• Task scheduling
  • All-at-once labeling for small-scale crises
  • Incremental labeling for medium-scale crises (needs tests)

Future work:
• Adaptive collection
• Post-processing/filtering
• More features and learning schemes

Page 20: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

http://aidr.qcri.org/

AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.

Thank  you!