Classifying Microblogs for Disasters
Sarvnaz Karimi Jessie Yin Cecile Paris
Social media plays an important role during disasters
CSIRO: positive impact | Classifying Microblogs for Disasters | Sarvnaz Karimi
• Realtime, popular, free
• Accessible
• Available
During disasters people share useful information
• lyttelton tunnel had reopened last night #eqnz
Or ask for help or information
• Kindercare in Fendalton, Christchurch - all okay? We are trying to get through with no luck. #eqnz
• Need help. Any donors of medicines for diarrhea cases in Baganga, Davao Oriental pls? #reliefPH #PabloPH pls tweet @KarloPuerto
Or even offer help
• I hv final yr medstudents in parade rd addington! They cn help. Bruce n boys #eqnz
And sometimes not so useful
• Someone just wondered aloud if the #eqnz was just another sign from God that he doesn't want The Hobbit to get made. #maybe?
Challenges of Working with Twitter Data
• In fact, tweets are often useless babble
• Tweets are very short (at most 140 characters)
• People often use informal language
• Even serious messages are often abbreviated to fit the length limit
Finding useful content can become looking for a needle in a haystack!
I hv final yr medstudents in parade rd addington! They cn help. Bruce n boys #eqnz
How can we filter a massive volume of Twitter messages to identify high-value tweets related to natural or man-made disasters, or even to specific types of disaster?
Keyword search to find disaster-related tweets
• Many false positives, because keywords such as “fire”, or even “earthquake”, are ambiguous or have multiple senses
She’s a natural disaster: a tsunami in her eyes an earthquake in her chest a hurricane flooding her mind she’s a traveling catastrophe
In a pool of over 5700 tweets retrieved using keyword search, we had over 50% false positives.
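A minimal sketch of why plain keyword search over-matches. The `keyword_match` helper and its whole-word matching rule are assumptions made for illustration; the talk does not describe how the retrieval was implemented. The keyword list is the one given on the Twitter Data slide.

```python
import re

# Keywords listed in the talk; the matching logic is an assumed, simplified stand-in.
KEYWORDS = {"fire", "flooding", "storm", "tornado",
            "hurricane", "cyclone", "earthquake", "accident"}

def keyword_match(tweet: str) -> bool:
    """Return True if any keyword appears as a whole word in the tweet."""
    tokens = set(re.findall(r"[a-z]+", tweet.lower()))
    return bool(tokens & KEYWORDS)

# Two tweets quoted in the slides; neither reports a disaster.
poem = ("She's a natural disaster: a tsunami in her eyes an earthquake "
        "in her chest a hurricane flooding her mind")
scenery = "Lake Macquarie is big & beautiful"

# The poem is retrieved although it is figurative: a false positive.
poem_retrieved = keyword_match(poem)
scenery_retrieved = keyword_match(scenery)
```

The poem matches three keywords at once, which is the kind of ambiguity a classifier has to resolve after retrieval.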
Our work: Classify Twitter Stream for Disasters
• Classify tweets as Disaster or Non-disaster (binary classification)
• Classify tweets into disaster types (multi-class classification problem):
– Earthquake
– Storm (hurricane, tornado, cyclone)
– Fire
– Flooding
– Other (e.g. civil disorder, traffic accident)
Related Studies
• Tweet classification:
– Papers that used classifiers for categories such as news and junk, or opinion and private messages.
– Papers that relied heavily on hashtags.
– Adding context to short tweets by aggregating those that share the same hashtags, or by adding URL contents.
• Twitter during disasters:
– Qualitative analysis of tweets published during a specific event to study microblogger behaviour.
– One of the most cited works is by Sakaki et al. (2010), who built an earthquake classifier to alert people. Their classifier was based on tweet length, the position of the query term (earthquake or shaking) in the tweet, n-grams, and the context of the query terms.
We do not focus on specific incidents, and do not assume the hashtags are known. We study different types of disasters, not just one.
Twitter Data
• Sampled a total of 6,500 tweets published over two years, from December 2010 to November 2012
• Data was gathered using keyword search (fire, flooding, storm, tornado, hurricane, cyclone, earthquake, and accident).
• Retweets were excluded
• Several disasters were covered, including the Christchurch earthquake in New Zealand (2011), Cyclone Yasi in Queensland (2011), the Queensland floods (2010-2011), bushfires in Victoria (2011), and Hurricane Sandy in the US (2012).
Annotations
• Two-stage annotation
• Annotations were crowd-sourced using Crowdflower.
• Annotators were asked:
1. Is this tweet talking about a disaster? (Yes or No);
2. What type of disaster is it talking about? (multiple choice)
• Each tweet was annotated by three annotators
• 5,747 tweets had full agreement among the annotators
• 2,850 tweets were identified as disaster-related and 2,897 as non-disaster
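The agreement filter can be sketched as follows. The `unanimous_label` helper and the toy annotations are hypothetical; only the three-annotator setup and the reported totals come from the slides.

```python
from collections import Counter

def unanimous_label(labels):
    """Return the shared label if all annotators agree, else None."""
    return labels[0] if len(set(labels)) == 1 else None

# Hypothetical annotations: tweet id -> answers from three annotators.
annotations = {
    "t1": ["yes", "yes", "yes"],
    "t2": ["yes", "no", "yes"],   # disagreement, so this tweet is dropped
    "t3": ["no", "no", "no"],
}

agreed = {}
for tweet_id, labels in annotations.items():
    label = unanimous_label(labels)
    if label is not None:
        agreed[tweet_id] = label
counts = Counter(agreed.values())

# Consistency check on the figures reported in the talk.
disaster, non_disaster = 2850, 2897
total = disaster + non_disaster          # 5747 tweets with full agreement
non_disaster_share = non_disaster / total
```

The non-disaster share works out to just over one half, in line with the false-positive rate quoted for keyword search.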
Classifiers
• SVM Classifier
• Multinomial Naive Bayes Classifier
• We report only SVM results; Naive Bayes consistently performed worse in all the experiments.
C. Chang and C. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology
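To make the Naive Bayes baseline concrete, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier over unigram counts with add-one smoothing. It is an illustration only: the class name, the smoothing choice, and the toy training data are assumptions, and it is not the implementation used in the study. The SVM results in the talk rely on LIBSVM, which is not reproduced here.

```python
import math
from collections import Counter, defaultdict

class MultinomialNB:
    """Tiny multinomial Naive Bayes with add-one smoothing over unigram counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:  # ignore words never seen in training
                    count = self.word_counts[label][w]
                    score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training data (not from the study's dataset).
train_docs = [
    "earthquake damage in city centre".split(),
    "flooding closes main road".split(),
    "my exam was a disaster".split(),
    "storm of emotions tonight".split(),
]
train_labels = ["disaster", "disaster", "non-disaster", "non-disaster"]

model = MultinomialNB().fit(train_docs, train_labels)
prediction = model.predict("road damage after earthquake".split())
```

On this toy data the query shares three words with the disaster class and none with the other, so the disaster class scores higher.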
Classification Features
Specific Features:
• N-grams
• Hashtags
• Mentions
Generic Features:
• Mention count
• Hashtag count
• Links
• Tweet length
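The two feature groups can be sketched with a small extractor. The regular expressions, the `extract_features` name, and the choice to represent "Links" as a count and tweet length in characters are assumptions; the slides name the features but do not define them precisely.

```python
import re

def extract_features(tweet: str) -> dict:
    """Split a tweet into incident-specific and generic features (assumed definitions)."""
    hashtags = re.findall(r"#\w+", tweet)
    mentions = re.findall(r"@\w+", tweet)
    links = re.findall(r"https?://\S+", tweet)
    # Words for n-grams: drop links, hashtags and mentions, then lowercase.
    text = re.sub(r"https?://\S+|[#@]\w+", " ", tweet).lower()
    words = re.findall(r"[a-z0-9']+", text)
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return {
        "specific": {"unigrams": words, "bigrams": bigrams,
                     "hashtags": hashtags, "mentions": mentions},
        "generic": {"mention_count": len(mentions),
                    "hashtag_count": len(hashtags),
                    "link_count": len(links),
                    "tweet_length": len(tweet)},
    }

# A tweet quoted earlier in the slides.
features = extract_features(
    "Need help. Any donors of medicines for diarrhea cases in Baganga, "
    "Davao Oriental pls? #reliefPH #PabloPH pls tweet @KarloPuerto"
)
```

The specific features carry the actual hashtag and mention strings, which tie a model to known events, while the generic features keep only counts and lengths.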
What is the effect of using incident-specific rather than generic features on classification accuracy? What are the best features to use for disaster classifiers?
Evaluation: Cross-validation vs. Time-Split
• K-fold cross-validation (e.g., 10-fold) is used in most similar studies (Sriram et al., 2010, Takemura and Tajima, 2012, Vosecky et al., 2012)
Problem:
• It overlooks the time dependency in microblog data and trains on future evidence, such as hashtags and disaster names
Alternative:
• Time-split evaluation: Sort the data by time, use the latest chunk for testing and the rest for training.
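A time-split can be sketched in a few lines. The `time_split` helper, the 20% test fraction, and the sample records are assumptions for illustration; the slides do not state the chunk size used.

```python
from datetime import datetime

def time_split(records, test_fraction=0.2):
    """Sort records by timestamp and hold out the most recent fraction for testing."""
    ordered = sorted(records, key=lambda r: r["time"])
    n_test = max(1, round(len(ordered) * test_fraction))
    cut = len(ordered) - n_test
    return ordered[:cut], ordered[cut:]

# Hypothetical records inside the December 2010 to November 2012 collection window.
records = [
    {"id": 3, "time": datetime(2012, 10, 30)},
    {"id": 1, "time": datetime(2011, 2, 22)},
    {"id": 2, "time": datetime(2011, 2, 3)},
    {"id": 4, "time": datetime(2010, 12, 28)},
    {"id": 5, "time": datetime(2012, 6, 1)},
]
train, test = time_split(records, test_fraction=0.2)
train_ids = [r["id"] for r in train]
test_ids = [r["id"] for r in test]
```

Unlike shuffled k-fold cross-validation, every training record here predates every test record, so hashtags or disaster names coined later cannot leak into training.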
Disaster or Non-Disaster
Disaster-Type Classification
What features worked
• When training data is small, counts are better features.
– Disaster-related tweets had 1.2 hashtags on average, versus 0.4 for non-disaster tweets
• When our knowledge of an event is limited, hashtags or mentions are not so useful.
• In our experiments, classification accuracy with bigram features was lower than with unigram features.
Generic Features vs. Event-specific Features
• We need to learn the patterns that imply a type of natural or man-made disaster:
A massive cloud of smoke can be seen in south-west Lake Macquarie from the Wyee bushfire #nswfires #wyeefire @NewcastleHerald
Same location, no disaster:
Lake Macquarie is big & beautiful http://lockerz.com/s/257143427
Can we cross-train for disaster types?
Application:
- Compensate for disaster types with little training data.
- Reduce ambiguity
(Figure labels: training, testing)
Cross-Disaster Classification
(Figure legend: generic feature, specific feature)
How well do our classifiers generalise to previously unseen disaster types?
• We used under-sampling to create training and testing data
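One common form of under-sampling is sketched below. Balancing every class down to the size of the smallest one, the `under_sample` name, and the fixed seed are assumptions; the slides say only that under-sampling was used.

```python
import random
from collections import defaultdict

def under_sample(examples, seed=0):
    """Randomly drop examples from larger classes until every class matches the smallest."""
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced

# Hypothetical imbalanced data: 5 "fire" tweets versus 2 "earthquake" tweets.
examples = [(f"fire tweet {i}", "fire") for i in range(5)] + \
           [(f"quake tweet {i}", "earthquake") for i in range(2)]
balanced = under_sample(examples)
fire_kept = sum(1 for _, label in balanced if label == "fire")
quake_kept = sum(1 for _, label in balanced if label == "earthquake")
```

Balancing the classes this way keeps a frequent disaster type from dominating accuracy when training on one type and testing on another.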
Can we cross-train for disaster types?
• Yes! Our results showed promise, especially for fire.
• “Language of disaster”
• Using generic features was more effective.
What’s Next
Events are often associated with a location
1. Better classifiers: We can use the presence of location information as a feature to strengthen our classifiers.
2. Help act on the information: Once we know a tweet is talking about a disaster, we can extract information on locations. This could help emergency responders with resource allocation.
• We have already established that traditional Named Entity Recognisers are able to identify locations in tweets with high accuracy*. Now we need to pinpoint them on the map!
* J. Lingad, S. Karimi, and J. Yin. Location Extraction from Disaster-Related Microblogs. Proceedings of the 22nd International Conference on World Wide Web Companion, 2013.