Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius. Qatar Computing Research Institute (QCRI), Doha, Qatar.

description

An emerging paradigm for the processing of data streams involves human and machine computation working together, allowing human intelligence to be applied to large-scale data. We apply this approach to the classification of crisis-related messages in microblog streams. We begin by describing the platform AIDR (Artificial Intelligence for Disaster Response), which collects human annotations over time to create and maintain automatic supervised classifiers for social media messages. Next, we study two significant challenges in its design: (1) identifying which elements must be labeled by humans, and (2) determining when to ask for such annotations. The first challenge is selecting the items to be labeled by crowdsourcing workers so as to maximize the productivity of their work. The second challenge is scheduling that work so as to reliably maintain high classification accuracy over time. We provide and validate answers to these challenges through extensive experimentation on real-world datasets.

Transcript of Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Page 1: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Muhammad Imran, Carlos Castillo, Ji Lucas, Patrick Meier, Jakob Rogstadius

Qatar Computing Research Institute (QCRI), Doha, Qatar

Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

Page 2: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

USEFUL INFORMATION ON TWITTER

[Chart: percentage of informative tweets by category. Caution and advice: a siren heard, tornado warning issued/lifted, tornado sighting/touchdown. Information source: photos as info. source, webpages as info. source, videos as info. source. Donations: money; equipment, shelter, volunteers, blood; other donations. Casualties & damage: people injured, people dead, damage.]

Ref: "Extracting Information Nuggets from Disaster-Related Messages in Social Media". Imran et al., ISCRAM 2013, Baden-Baden, Germany.

Page 3: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

SOCIAL MEDIA INFORMATION PROCESSING: OFFLINE APPROACH

[Diagram: offline pipeline along the disaster timeline, starting with data collection: (1) data collection, (2) human annotations on sample data, (3) machine training, (4) classification.]

Page 4: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

IMPACT AND RESPONSE TIMELINE

[Chart: disaster response today vs. target disaster response over the impact and response timeline. Source: Department of Community Safety, Queensland Govt. 2011 & UNOCHA.]

Target disaster response requires real-time processing.

Page 5: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

REAL-TIME SOCIAL MEDIA ANALYSIS

Key requirements:

• Real-time data collection, able to incorporate new data collection strategies
• Obtain human labels in real time; perform de-duplication
• Perform almost-online machine learning: continuous learning, learning as new labels arrive
• Perform real-time classification and scale with big disasters (Sandy: 15k posts/min)
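Taken together, these requirements point to a streaming loop that de-duplicates incoming posts, classifies them immediately, and keeps learning as new human labels arrive. Below is a minimal sketch of such a loop in Python, assuming scikit-learn's HashingVectorizer and SGDClassifier (which supports incremental partial_fit); the data structures and function are hypothetical placeholders, not AIDR's actual API.

```python
# Minimal sketch (not AIDR's implementation): de-duplicate, classify in real
# time, and update the model incrementally as new human labels arrive.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
classifier = SGDClassifier()          # supports incremental partial_fit
CLASSES = [0, 1]                      # e.g., not informative / informative
seen_texts = set()                    # naive exact-duplicate filter

def process_batch(tweets, new_labels):
    """tweets: list of {'text': ...}; new_labels: list of (text, label) pairs."""
    # 1) De-duplication (exact text match here; near-duplicates also matter).
    fresh = [t for t in tweets if t["text"] not in seen_texts]
    seen_texts.update(t["text"] for t in fresh)

    # 2) Continuous learning: fold in labels as soon as annotators produce them.
    if new_labels:
        X = vectorizer.transform([text for text, _ in new_labels])
        y = [label for _, label in new_labels]
        classifier.partial_fit(X, y, classes=CLASSES)

    # 3) Real-time classification of the new, unique tweets.
    if fresh and hasattr(classifier, "coef_"):
        X = vectorizer.transform([t["text"] for t in fresh])
        return list(zip(fresh, classifier.predict(X)))
    return []
```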

Page 6: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

SOCIAL MEDIA INFORMATION PROCESSING: ONLINE APPROACH (REAL-TIME)

[Diagram: online pipeline: (1) data collection, (2) human annotations, (3) machine training, (4) classification. During the first few hours, human annotation batches 1 through n arrive continuously, each triggering a new learning round (Learning-1 through Learning-n), while classification runs in real time.]

Page 7: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

http://aidr.qcri.org/

AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.

Three steps: (1) Collect, (2) Curate, (3) Classify.

Page 8: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

AIDR: FROM THE END-USER'S PERSPECTIVE

A two-step approach:

1. Collection: a collection is a set of filters
• Keywords, hashtags
• Geographical bounding box
• Languages
• Follow a specific set of users

2. Classifier(s): a classifier is a set of tags, e.g.
• Donation requests & offers
• Damage & casualties
• Eyewitness accounts
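As an illustration of these two objects, a collection and a classifier could be described as below; the field names and values are hypothetical and do not reflect AIDR's actual configuration schema.

```python
# Hypothetical illustration only; field names are not AIDR's actual schema.
collection = {
    "name": "oklahoma-tornado-2013",
    "keywords_and_hashtags": ["Oklahoma", "tornado"],
    "geo_bounding_box": [-103.0, 33.6, -94.4, 37.0],  # approx. lon/lat bounds of Oklahoma
    "languages": ["en"],
    "follow_users": [],
}

classifier = {
    "name": "crisis-tweet-tags",
    "tags": [
        "donation_requests_and_offers",
        "damage_and_casualties",
        "eyewitness_accounts",
    ],
}
```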


Page 9: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

REAL-TIME CLASSIFICATION IN AIDR

[Diagram: a collection feeds one or more classifiers (Classifier-1, Classifier-2, …) at rates of up to 30k tweets/min. Within each classifier, a learner consumes human labeling tasks to update a model, which assigns tags to incoming tweets.]

Page 10: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

HUMAN ANNOTATION: CHALLENGES

Crowdsourcing is a big research topic. We address two challenges here:

1. Labeling task selection
• Which tasks to pick?
• No duplicate tasks should be labeled
• Prioritize tasks that are likely to increase accuracy (see the sketch below)

2. Labeling task scheduling
• All-at-once labeling
• Gradual labeling
• Independent labeling

Crisis-specific labels are necessary:
• Contrasting vocabulary
• Differences in public concerns and affected infrastructure
• New labels should be collected for each new crisis

[Imran et al. 2013b]
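For task selection, one concrete way to prioritize tasks that are likely to increase accuracy is uncertainty sampling over a de-duplicated pool of unlabeled tweets. The sketch below is illustrative and assumes a scikit-learn style classifier exposing predict_proba; it is not necessarily the exact criterion used in AIDR.

```python
import numpy as np

def select_labeling_tasks(classifier, vectorizer, unlabeled_texts, k=50):
    """Pick the k tweets the current model is least certain about.

    Assumes `unlabeled_texts` has already been de-duplicated, and that
    `classifier` exposes predict_proba (e.g., logistic regression).
    """
    X = vectorizer.transform(unlabeled_texts)
    proba = classifier.predict_proba(X)
    uncertainty = 1.0 - proba.max(axis=1)     # low confidence => high priority
    top = np.argsort(-uncertainty)[:k]
    return [unlabeled_texts[i] for i in top]
```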

Page 11: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

DATASETS

1. Joplin-2011: 206,764 tweets collected using #joplin
2. Sandy-2012: 4,906,521 tweets collected using #sandy, "hurricane sandy", …
3. Oklahoma-2013: 2,742,588 tweets collected using "Oklahoma", "tornado", …

Page 12: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

DISASTER PHASES & # OF TWEETS

• Pre: preparedness phase
• Impact: the period in which the main effects are felt
• Post: response and recovery phase

[Figure: number of tweets per day in each dataset: Joplin (left), Sandy (center), Oklahoma (right).]

Page 13: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Are crisis-specific labels necessary?

Manual labeling (using CrowdFlower):

Dataset     Phase-S1   Phase-S2   Phase-S3   Phase-S4
Joplin      2,000      1,000      1,000      1,000
Sandy       2,000      1,000      1,000      1,000
Oklahoma    2,000      1,000      1,000      N/A

Classification accuracy (AUC) in various transfer scenarios:

Train     Test        AUC
Joplin    Sandy       0.52
Joplin    Oklahoma    0.56
Sandy     Oklahoma    0.53

* AUC 0.5 represents a random classifier.
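The transfer setting behind these numbers is "train on one crisis, test on another". A simple way to reproduce that kind of measurement is sketched below; the learner and features are assumptions for illustration, not the exact setup used in these experiments.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def transfer_auc(train_texts, train_labels, test_texts, test_labels):
    """Train on one crisis and report AUC on another (0.5 = random classifier).

    Labels are assumed to be 0/1 (e.g., not informative / informative).
    """
    vectorizer = TfidfVectorizer(min_df=2)
    model = LogisticRegression(max_iter=1000)
    model.fit(vectorizer.fit_transform(train_texts), train_labels)
    scores = model.predict_proba(vectorizer.transform(test_texts))[:, 1]
    return roc_auc_score(test_labels, scores)

# e.g., transfer_auc(joplin_texts, joplin_labels, sandy_texts, sandy_labels)
```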

Page 14: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Is de-duplication necessary?

Train phase   # Train   Test phase    # Test   AUC (without de-duplication)   AUC (with de-duplication)
S1 (pre)      1,500     S1 (pre)      500      0.78                           0.74
S1 (pre)      500       S1 (pre)      500      0.73                           0.72
S2 (impact)   500       S2 (impact)   500      0.80                           0.72
S3 (post)     500       S3 (post)     500      0.79                           0.73
S4 (post')    500       S4 (post')    500      0.70                           0.64

• 29-74% of tweets are re-tweets and 60-75% are near-duplicates
• Duplication causes an artificial increase in accuracy
• De-duplication is necessary to reduce classifier bias; otherwise the classifier learns from fewer concepts
• De-duplication is necessary to improve the workers' experience

[Rogstadius et al. 2011]
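A minimal de-duplication sketch along these lines: strip retweet prefixes, then drop near-duplicates whose word-shingle overlap exceeds a threshold. The 0.8 threshold and the shingle size are assumptions for illustration, not the parameters used in these experiments.

```python
import re

def word_shingles(text, n=3):
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}

def deduplicate(texts, threshold=0.8):
    """Keep one representative per group of retweets/near-duplicate tweets."""
    kept, kept_shingles = [], []
    for text in texts:
        text = re.sub(r"^RT @\w+:\s*", "", text)   # drop the retweet prefix
        s = word_shingles(text)
        is_near_dup = any(
            len(s & t) / len(s | t) >= threshold for t in kept_shingles
        )
        if not is_near_dup:
            kept.append(text)
            kept_shingles.append(s)
    return kept
```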

Page 15: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

Experiment: Which approach to follow, passive or active learning?

[Figure: passive vs. active learning results over phases S1-S4 for Joplin (left), Sandy (center), and Oklahoma (right).]

Page 16: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SELECTION

• Are crisis-specific labels necessary? [YES]
• Is de-duplication necessary? [YES]
• Which approach to follow, passive or active learning? [Active learning]

Now we know WHICH tasks to select, but we still don't know WHEN to label them.

Page 17: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SCHEDULING

• All-at-once labeling: obtain 1,500 labels on S1 and use all of them for training
• Cumulative labeling: obtain 500 labels in each of S1, S2, and S3 and train on all labels available up to each phase
• Independent labeling: obtain 500 labels in each of S1, S2, and S3 and train only on the most recent labels, discarding older ones
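The three strategies differ only in which labels are used to (re)train the classifier at each phase. A compact sketch follows, where phase_label_batches is a hypothetical list of label batches, one per phase.

```python
def training_labels(phase_label_batches, current_phase, strategy):
    """Return the labels to train on at `current_phase` (0-indexed).

    phase_label_batches: one batch of (text, label) pairs per phase.
    strategy: "all_at_once" | "cumulative" | "independent"
    """
    if strategy == "all_at_once":
        return phase_label_batches[0]                 # only the initial S1 batch
    if strategy == "cumulative":
        batches = phase_label_batches[:current_phase + 1]
        return [pair for batch in batches for pair in batch]
    if strategy == "independent":
        return phase_label_batches[current_phase]     # most recent batch only
    raise ValueError(f"unknown strategy: {strategy}")
```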

 

Page 18: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

LABELING TASK SCHEDULING

Experiment: Which labeling strategy to follow?

[Figure: results of the three labeling strategies for the Informative, Informative (50%), and Donations classifiers on Joplin (left), Sandy (center), and Oklahoma (right).]

Page 19: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

CONCLUSION & FUTURE WORK

• Task selection
  • De-duplication is necessary
  • An active learning approach must be employed
• Task scheduling
  • All-at-once labeling for small-scale crises
  • Incremental labeling for medium-scale crises (needs tests)

Future work:
• Adaptive collection
• Post-processing/filtering
• More features and learning schemes

Page 20: Coordinating Human and Machine Intelligence to Classify Microblog Communications in Crises

http://aidr.qcri.org/

AIDR (Artificial Intelligence for Disaster Response) is a free, open-source, and easy-to-use platform to automatically filter and classify relevant tweets posted during humanitarian crises.

Thank  you!