Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations
![Page 1: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/1.jpg)
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations
Alfredo Cobo [email protected]
Denis Parra [email protected]
Jaime Navón [email protected]
Pontificia Universidad Católica de Chile, Departamento de Ciencia de la Computación
Av. Vicuña Mackenna 4860, Macul, Santiago, Chile
![Page 2: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/2.jpg)
I (… and some other people in this room)
… come from Chile
Picture from http://www.quadrodemedalhas.com/images/mapas/mapa-chile.jpg
http://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Chile_in_South_America_(-mini_map_-rivers).svg/409px-Chile_in_South_America_(-mini_map_-rivers).svg.png
![Page 3: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/3.jpg)
Chile, well-known for its..
• Copper (Top Producer)
"Top 5 Copper Producers" by Plazak - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Top_5_Copper_Producers.png#/media/File:Top_5_Copper_Producers.png
https://www.google.com/url?sa=i&rct=j&q=&esrc=s&source=images&cd=&cad=rja&uact=8&ved=0CAYQjB0&url=http%3A%2F%2Fcommons.wikimedia.org%2Fwiki%2FFile%3ANative_Copper_(mineral).jpg&ei=L31ZVbOsL4r1UrbRgKAB&bvm=bv.93564037,d.d24&psig=AFQjCNHr2zm5m4Jmim7AgkCwwSb0b5mGUA&ust=1432014509629311
![Page 4: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/4.jpg)
Chile, well-known for its..
• Wine (Price + quality)
"Fiesta de Vendimia" by LuxoDresden - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Fiesta_de_Vendimia.JPG#/media/File:Fiesta_de_Vendimia.JPG
![Page 5: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/5.jpg)
If you start typing in Google…
9 out of 10 disasters …
![Page 6: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/6.jpg)
If you start typing in Google…
9 out of 10 disasters … prefer Chile
![Page 7: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/7.jpg)
… and for Natural Disasters :(
• Largest earthquake ever recorded: Valdivia, Chile, May 22, 1960 (magnitude 9.5)
• We usually have one large earthquake every 30 years (around magnitude 8)
• The last one was in 2010, close to Concepción, but it also affected Santiago (the capital)
![Page 8: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/8.jpg)
… so, at PUC Chile
• We created CIGIDEN, the “National Research Center for the Integrated Administration of Natural Disasters”
![Page 9: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/9.jpg)
CIGIDEN’s Goal in this project
• Help citizens stay informed during natural disasters by using social media.
• Build a mobile application (Carlos Molina)
• Automatically filter relevant messages from those not related to earthquakes (Alfredo Cobo) to feed the application
![Page 10: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/10.jpg)
Our Task: Building a Twitter classifier - Filter tweets related to natural disasters from those that are not.
![Page 11: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/11.jpg)
Related Work
• Manual Classification: Vieweg et al. (2010), Imran et al. (2013), Mendoza et al. (2010)
• Data Post-processing: Mendoza et al. (2010), Castillo et al. (2011) (Information Credibility on Twitter)
• Feature Generation: Gimpel et al. (2011), Koloumpis et al. (2011), Liu et al. (2012), Wu et al. (2011), Lee et al. (2014) (Not necessarily for natural disasters)
• Tools for Disaster Management: Hiltz et al. (2013), Power et al. (2013), Caragea et al. (2011), Abel et al. (2012), Middleton et al. (2014), Morstatter et al. (2013), Imran et al. (2014)
![Page 12: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/12.jpg)
Why would building this classifier be a contribution?
• Building and validating a ground truth for classifying tweets in Spanish.
• Building the classifier and dealing with:
  • Class imbalance
  • Number of latent dimensions (feature generation using LDA)
![Page 13: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/13.jpg)
Workflow of Activities
Chile’s Earthquake 2010
Castillo et al. (2010)
Our ground truth
Non-relevant messages
Realistic dataset
Sampling, Cleaning & filtering
Classifiers
- Feature selection (LDA)
- Class Imbalance
10% - 80%
![Page 14: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/14.jpg)
Building the ground truth
• Random sample of 5,000 tweets from the Castillo et al. (2010) dataset, originally used to study credibility around Chile’s 2010 earthquake.
• Dates: from February 27th until March 2nd (spanning 4 days in 2010)
• We kept only Spanish messages and removed messages that were too similar to each other (Levenshtein distance): 2,187 messages left
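The near-duplicate removal step can be sketched as follows. This is a minimal illustration, not the authors' code: the slides do not state the similarity threshold, so `max_ratio` (edit distance divided by the longer message length) and the lowercase normalization are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def drop_near_duplicates(messages, max_ratio=0.2):
    """Keep a message only if it is not too close to one already kept.

    max_ratio is a hypothetical threshold, not a value from the slides.
    """
    kept = []        # original messages that survive
    kept_norm = []   # their normalized forms, used for comparison
    for msg in messages:
        norm = msg.lower().strip()
        is_duplicate = any(
            levenshtein(norm, other) <= max_ratio * max(len(norm), len(other), 1)
            for other in kept_norm
        )
        if not is_duplicate:
            kept.append(msg)
            kept_norm.append(norm)
    return kept


tweets = [
    "Terremoto en Chile",
    "terremoto en chile!",
    "Se necesita agua en Concepción",
]
unique_tweets = drop_near_duplicates(tweets)
```

The pairwise comparison is quadratic in the number of messages, which is acceptable for a sample of 5,000 tweets but would need blocking or hashing for a live stream.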
![Page 15: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/15.jpg)
Validating the ground truth
• Fleiss' kappa: κ = 0.645, p < .001
• Intraclass correlation, ICC(2,1): ICC = 0.646, p < .001
• On the Landis and Koch (1977) scale, κ between 0.61 and 0.80 indicates substantial agreement
• Relevant messages were labeled based on the Imran et al. (2013) classification:
  • Caution/Warning
  • Casualties and Damage
  • People (missing, found, etc.)
  • Information source
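For reference, Fleiss' kappa as reported above can be computed from a table of per-message category counts. The sketch below is a generic implementation of the standard formula; the number of annotators and the toy counts are assumptions, since the slides do not give them.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = number of raters who
    assigned item i to category j. Every item needs the same rater count."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")

    # Observed agreement per item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance, from the overall category proportions.
    p_categories = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in p_categories)

    return (p_bar - p_expected) / (1 - p_expected)


# Toy data (hypothetical): 4 tweets, 3 annotators,
# columns = [relevant, not relevant].
toy_counts = [[3, 0], [2, 1], [0, 3], [1, 2]]
toy_kappa = fleiss_kappa(toy_counts)
```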
![Page 16: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/16.jpg)
Workflow of Activities
Chile’s Earthquake 2010
Castillo et al. (2010)
Our ground truth
Non-relevant messages
Realistic dataset
Sampling, Cleaning & filtering
Classifiers
- Feature selection (LDA)
- Class Imbalance
![Page 17: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/17.jpg)
Classification Problem
Features
• User Network: Followers, Followees, User mentions
• Content (4,766 unique words): Hashtags, Words
Class Imbalance
• The ground truth is not a realistic representation of Twitter
• We added “noise”: introduced tweets not relevant to the event (20% - 80%)
• Sampled non-relevant tweets from 5 months.
• Removed all tweets posted during days of seismic activity
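The noise-injection procedure above can be sketched as follows. This is an illustration under stated assumptions: "noise proportion" is read here as the fraction of non-relevant tweets in the final dataset, and the record layout (`text`, `day`) and the function name `add_noise` are hypothetical.

```python
import random
from datetime import date


def add_noise(relevant, candidates, noise_proportion, seismic_days, seed=0):
    """Mix ground-truth tweets (label 1) with sampled non-relevant tweets
    (label 0) so that noise makes up noise_proportion of the result."""
    if not 0 <= noise_proportion < 1:
        raise ValueError("noise_proportion must be in [0, 1)")

    # Exclude candidates posted on days with seismic activity.
    pool = [t for t in candidates if t["day"] not in seismic_days]

    # Solve n_noise / (n_noise + n_relevant) = noise_proportion for n_noise.
    n_noise = round(noise_proportion * len(relevant) / (1 - noise_proportion))
    if n_noise > len(pool):
        raise ValueError("not enough non-relevant tweets to reach this proportion")

    rng = random.Random(seed)
    noise = rng.sample(pool, n_noise)
    dataset = [(t["text"], 1) for t in relevant] + [(t["text"], 0) for t in noise]
    rng.shuffle(dataset)
    return dataset


# Toy data (hypothetical).
relevant = [{"text": f"terremoto {i}", "day": date(2010, 2, 27)} for i in range(4)]
candidates = [{"text": f"otro tema {i}", "day": date(2010, 6, 1 + i)} for i in range(18)]
candidates += [{"text": f"réplica {i}", "day": date(2010, 7, 14)} for i in range(2)]
noisy = add_noise(relevant, candidates, 0.6, seismic_days={date(2010, 7, 14)})
```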
![Page 18: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/18.jpg)
Classification Results

| Model | Precision | Recall | F1 score | Accuracy | AUC | Dimensions | Noise Proportion |
|---|---|---|---|---|---|---|---|
| Baseline | 0.625 | 0.545 | 0.53 | 0.5 | 0.568 | - | 0 |
| Bernoulli NB | 0.831 | 0.226 | 0.355 | 0.594 | 0.605 | 2000 | 0 |
| Logistic Regression | 0.827 | 0.641 | 0.722 | 0.756 | 0.834 | 2000 | 0.6 |
| Linear SVM | 0.687 | 0.677 | 0.682 | 0.687 | 0.719 | 1000 | 0.6 |
| Random Forest | 0.807 | 0.673 | 0.734 | 0.758 | 0.844 | 1000 | 0.8 |
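As a reminder of how the reported metrics relate to each other, the sketch below computes precision, recall, F1 and accuracy from binary predictions. The function names and the toy labels are illustrative assumptions; the only figures taken from the results are the Logistic Regression precision and recall, whose harmonic mean reproduces the reported F1 score.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(y_true, y_pred):
    """Precision, recall, F1 and accuracy for binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
        "accuracy": (tp + tn) / len(pairs),
    }


# Logistic Regression row: precision 0.827, recall 0.641.
logreg_f1 = round(f1_score(0.827, 0.641), 3)

# Toy predictions (hypothetical).
toy_metrics = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
```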
![Page 19: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/19.jpg)
Analysis ~ LDA Dimensions and Noise
![Page 20: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/20.jpg)
Analysis ~ LDA Dimensions and Noise
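A sweep over the number of LDA dimensions could look like the sketch below. It assumes scikit-learn and is not the authors' implementation: the toy corpus, the topic counts (2 and 4 instead of the 1000 to 2000 dimensions in the results) and the choice of Logistic Regression as the final step are all illustrative.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_pipeline(n_topics, seed=0):
    """Bag of words -> LDA topic proportions -> linear classifier."""
    return make_pipeline(
        CountVectorizer(lowercase=True),
        LatentDirichletAllocation(n_components=n_topics, random_state=seed),
        LogisticRegression(max_iter=1000),
    )


# Toy corpus (hypothetical): 1 = relevant to the earthquake, 0 = noise.
texts = [
    "terremoto fuerte en concepción",
    "alerta de tsunami en la costa",
    "se busca persona desaparecida tras el terremoto",
    "daños en edificios de santiago",
    "hoy juega la selección de fútbol",
    "nueva película en el cine",
    "receta de empanadas de pino",
    "oferta de zapatos en el mall",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for n_topics in (2, 4):
    model = build_pipeline(n_topics).fit(texts, labels)
    # model[:-1] is the feature part of the pipeline (vectorizer + LDA).
    topic_features = model[:-1].transform(texts)
    predictions = model.predict(texts)
    print(n_topics, topic_features.shape)
```

In a real experiment the evaluation would use held-out data (for example cross-validation) and repeat the sweep for each noise proportion.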
![Page 21: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/21.jpg)
Conclusions & Future Work
• We built and validated a ground truth of tweets in Spanish relevant to disasters
• We implemented a classifier and analyzed its performance across several algorithms while dealing with the class imbalance problem
• Future Work: Move the application from prototype to production, test online scalability
![Page 22: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/22.jpg)
That’s all folks!
• Thanks and questions to corresponding author Alfredo Cobo: [email protected] or Denis Parra: [email protected]
![Page 23: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/23.jpg)
Chile, small country, but well-known for its..
• Length (4,300 Km)
~ 4,300 Km ~8,000 Km
![Page 24: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/24.jpg)
Model Features
• Newman et al. (2007) • Biro et al. (2008) • Wei et al. (2006) • Wang et al. (2012) • Han (2005)
Features Corpora Features Followers Hashtags Friends Words
User mentions
![Page 25: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/25.jpg)
Results
• Amatriain et al. (2013)
![Page 26: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/26.jpg)
Architecture
![Page 27: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/27.jpg)
Plots of bootstrap Agreement Day 1 Agreement Day 2
Agreement Day 4 Agreement Day 3
![Page 28: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/28.jpg)
Word Frequencies
![Page 29: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/29.jpg)
Just “Terremoto”: AUC
![Page 30: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/30.jpg)
Related Work
![Page 31: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/31.jpg)
Manual classification
• Vieweg et al. (2010) • Imran et al. (2013)
![Page 32: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/32.jpg)
Post Processing
• Castillo et al. (2011) • Mendoza et al. (2010)
![Page 33: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/33.jpg)
Feature Generation Approaches
• Gimpel et al. (2011) • Koloumpis et al. (2011) • Liu et al. (2012) • Wu et al. (2011) • Lee et al. (2014)
![Page 34: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/34.jpg)
Tools For Disaster Management
• Hiltz et al. (2013) • Power et al. (2013) • Caragea et al. (2011) • Abel et al. (2012) • Middleton et al. (2014) • Morstatter et al. (2013) • Imran et al. (2014)
![Page 35: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/35.jpg)
Building the ground truth
• Mendoza et al. (2010)
• Imran et al. (2013)
![Page 36: Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural Disaster Situations](https://reader034.fdocuments.net/reader034/viewer/2022042716/55b55a73bb61ebc45d8b4688/html5/thumbnails/36.jpg)
Algorithms and evaluation procedure
• Castillo et al. (2011) • Fawcett et al. (2004) • Manning et al. (2008) • Wen et al. (2014)