From Chirps to Whistles - Discovering Event-specific Informative Content from Twitter

Debanjan Mahata, John R. Talburt
Department of Information Science, University of Arkansas at Little Rock, Little Rock, USA

Vivek Kumar Singh
Department of Computer Science, South Asian University, New Delhi, India


Real-life Events

“In #Sochi, the Dutch are dominating the overall Olympic medal count (Reuters)”

“New post: Sochi Was For Suckers - Laugh Studios/ #lol #funny #rofl #funnypic #fail #wtf.”

“Thanks for the memories Sochi! I've had the time of my life #Sochi2014 #sochiselfie”

“Cooked my first low-fat meal today, officially on a diet #sochi.”


Twitter Content for Real-life Events

Intriguing Questions

• Which are the event-specific informative tweets and how to identify them?

• Who are the users producing a large amount of event-specific informative content on Twitter?

• Which are the best hashtags and URLs to follow that will lead to high quality event-specific information?

• Which are the hashtags and text units suitable for indexing for efficient retrieval of event-specific information?

• Can we possibly devise a method that answers the above questions simultaneously?

Potential Applications

• Event Monitoring and Analysis
• Event Information Retrieval
• Opinion and Review Mining
• Recommender Systems
• Event Management and Marketing
• Social Media Data Integration
• Digital Journalism
• Many More


Volume and Velocity

Veracity

New post: Sochi Was For Suckers - Laugh Studios/ #lol #funny #rofl #funnypic #fail #wtf

Informal Text


Searching the Long Tail

Sampling Bias

Sparse Link Structure Between Content in Social Media

Lack of Evaluation Datasets

Problem Statement

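Given an event $E_i$ and a time-ordered stream of $n$ tweets $M_{E_i} = \{m_1, m_2, \ldots, m_n\}$ related to the event and posted in time period $T_{E_i}$, the problem is to find ranked sets of:

• Tweets: $M_{E_i} = \{m_1 \ldots m_i \succeq m_j \ldots m_n \mid i < j\}$
• Hashtags: $H_{E_i} = \{h_1 \ldots h_i \succeq h_j \ldots h_p \mid i < j\}$
• Text Units: $W_{E_i} = \{w_1 \ldots w_i \succeq w_j \ldots w_r \mid i < j\}$
• URLs: $L_{E_i} = \{l_1 \ldots l_i \succeq l_j \ldots l_t \mid i < j\}$
• Users: $U_{E_i} = \{u_1 \ldots u_i \succeq u_j \ldots u_s \mid i < j\}$

Each set is ordered by decreasing event-specific informativeness, so item $i$ ranks at or above item $j$ whenever $i < j$.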
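The ranked sets in the problem statement can be illustrated with a minimal Python sketch. It assumes that some scoring step has already assigned each item an informativeness score; the function name `rank_by_informativeness` and the scores below are hypothetical and do not come from the slides.

```python
from typing import Dict, List


def rank_by_informativeness(scores: Dict[str, float]) -> List[str]:
    """Return items in decreasing order of score.

    Ties are broken alphabetically so the output is deterministic.
    """
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical scores for three hashtags (illustrative values only).
hashtag_scores = {"#sydneysiege": 0.91, "#lol": 0.05, "#MartinPlace": 0.62}
ranked_hashtags = rank_by_informativeness(hashtag_scores)
print(ranked_hashtags)
```

The same function would apply unchanged to tweets, text units, URLs, and users, since each ranked set only needs a score per item.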

4.3 million tweets

5 events

Event Reference Preparation

• Parts-of-Speech Tagging
• Special Character Detection
• Data Cleansing
• Duplicate Detection
• Stop Word Detection and Elimination
• Slang Word Extraction
• Feeling Word Extraction
• Tokenization
• Stemming
• Tweet Meta-Data
  • Expanded URLs
  • User Information
  • Verification
  • Favorite Count
  • Retweet Count
  • User Mentions
• Entity Extraction
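A few of the preparation steps listed above (tokenization, stop word elimination, duplicate detection) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline used for the slides: the stop word list is a tiny placeholder, duplicates are detected by simple text normalization, and the function names are hypothetical. Stemming, tagging, and entity extraction are left out.

```python
import re
from typing import Iterable, List

# Tiny illustrative stop word list (an assumption, not the list used by the authors).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for"}

# Matches URLs first, then hashtags, mentions, and plain word tokens.
TOKEN_RE = re.compile(r"https?://\S+|[#@]?\w+")


def tokenize(text: str) -> List[str]:
    """Split a tweet into lowercase tokens, keeping hashtags, mentions, and URLs whole."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def remove_stop_words(tokens: Iterable[str]) -> List[str]:
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def normalize_for_dedup(text: str) -> str:
    """Strip a leading 'RT @user:' prefix, collapse whitespace, and lowercase."""
    text = re.sub(r"^RT\s+@\w+:\s*", "", text.strip())
    return re.sub(r"\s+", " ", text).lower()


def drop_duplicates(tweets: Iterable[str]) -> List[str]:
    """Keep the first tweet for each normalized text."""
    seen = set()
    unique = []
    for tweet in tweets:
        key = normalize_for_dedup(tweet)
        if key not in seen:
            seen.add(key)
            unique.append(tweet)
    return unique


sample = [
    "Police confirm 3 hostages escape #sydneysiege",
    "RT @FoxNews: Police confirm 3 hostages escape  #sydneysiege",
    "Thanks for the memories Sochi!",
]
unique_tweets = drop_duplicates(sample)
clean_tokens = remove_stop_words(tokenize(sample[2]))
```

Treating a retweet as a duplicate of its original is a design choice made here for illustration; a pipeline that uses retweet counts as a signal would record the count before dropping the copy.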

Tweet Features

No. of Unigram Tokens, No. of Stop Words, No. of Slang Words, No. of Feeling Words, No. of Hashtags, Has URL, Is Verified, No. of User Mentions, Length of Post, No. of Unique Characters, No. of Special Characters, Favorite Count, Retweet Count, Formality, No. of Nouns, No. of Adjectives, No. of Verbs, No. of Adverbs.
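Several of the surface features listed above can be computed directly from the tweet text and its metadata. The sketch below is a hedged illustration: the `Tweet` class, the feature names, and the small stop word and slang lists are assumptions made for this example. Part-of-speech counts, feeling words, and the formality score are omitted because they need external resources.

```python
import re
from dataclasses import dataclass
from typing import Dict

# Illustrative word lists (assumptions, not the lexicons used by the authors).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "to", "and", "for"}
SLANG_WORDS = {"lol", "rofl", "wtf", "omg"}

TOKEN_RE = re.compile(r"https?://\S+|[#@]?\w+")
URL_PREFIXES = ("http://", "https://")


@dataclass
class Tweet:
    """Hypothetical container for a tweet and the metadata fields used as features."""
    text: str
    is_verified: bool = False
    favorite_count: int = 0
    retweet_count: int = 0


def extract_features(tweet: Tweet) -> Dict[str, int]:
    """Compute a subset of the tweet features named on the slide."""
    tokens = TOKEN_RE.findall(tweet.text)
    # Words for lexicon lookups: drop mentions and URLs, strip '#' from hashtags.
    words = [
        token.lstrip("#").lower()
        for token in tokens
        if not token.startswith("@") and not token.startswith(URL_PREFIXES)
    ]
    return {
        "n_unigram_tokens": len(tokens),
        "n_stop_words": sum(word in STOP_WORDS for word in words),
        "n_slang_words": sum(word in SLANG_WORDS for word in words),
        "n_hashtags": sum(token.startswith("#") for token in tokens),
        "has_url": int(any(token.startswith(URL_PREFIXES) for token in tokens)),
        "is_verified": int(tweet.is_verified),
        "n_user_mentions": sum(token.startswith("@") for token in tokens),
        "length_of_post": len(tweet.text),
        "n_unique_characters": len(set(tweet.text)),
        "n_special_characters": sum(
            not ch.isalnum() and not ch.isspace() for ch in tweet.text
        ),
        "favorite_count": tweet.favorite_count,
        "retweet_count": tweet.retweet_count,
    }


example = Tweet(
    "New post: Sochi Was For Suckers - Laugh Studios/ "
    "#lol #funny #rofl #funnypic #fail #wtf"
)
features = extract_features(example)
```

For the example tweet, which the slides use as an uninformative post, the hashtag and slang counts are high and there is no URL, which is the kind of pattern a classifier could pick up on.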

Logistic Regression Model Performance

                      Precision   Recall   F-1 Score
Non-informative (0)   0.70        0.49     0.57
Informative (1)       0.78        0.90     0.84
Avg/Total             0.76        0.77     0.75

Accuracy = 76.64%
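For readers who want to see how per-class figures like these are derived, the following sketch computes precision, recall, and F-1 from confusion-matrix counts. The counts in the example are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the counts behind the table.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F-1 for one class from raw counts.

    tp: items correctly assigned to the class
    fp: items wrongly assigned to the class
    fn: items of the class that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical counts for the "informative" class (not taken from the slides).
precision, recall, f1 = precision_recall_f1(tp=90, fp=25, fn=10)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

F-1 is the harmonic mean of precision and recall, so a class with low recall, such as the non-informative class in the table, gets a low F-1 even when its precision is reasonable.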

Olteanu, Alexandra, et al. "CrisisLex: A lexicon for collecting and filtering microblogged communications in crises." In Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM '14). No. EPFL-CONF-203561. 2014.

Event Related Content Analysis

28000 annotated tweets

26 Events

Related and Informative – “#Media Large wildfire in N. Colorado prompts Evacuation: Crews are battling a fast-Moving wildfire #Politics #News”

Related but not Informative – “RT @LarimerSheriff: #HighParkFire update”

Not Related – “#Intern #US #TATTOO #Wisconsin #Ohio #NC #PA #Florida #Colorado #Iowa #Nevada #Virginia #NV #mlb Travel Destinations;”

Event Related Content Analysis

3.8 million tweets

3 events


Process using



• SeenRank (

• TextRank (Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing order into texts." Association for Computational Linguistics, 2004.)

• LexRank (Erkan, Günes, and Dragomir R. Radev. "LexRank: graph-based lexical centrality as salience in text summarization." Journal of Artificial Intelligence Research (2004): 457-479.)

• RTRank

• Centroid (Becker, Hila, Mor Naaman, and Luis Gravano. "Selecting Quality Twitter Content for Events." ICWSM 11 (2011).)

• Logistic Regression
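Of the techniques listed above, the centroid approach is the simplest to illustrate: each tweet is scored by its similarity to the centroid of all tweets for the event. The sketch below assumes raw term-frequency vectors, whitespace tokenization, and cosine similarity; the cited work may use a different weighting such as tf-idf, so this is an approximation of the idea rather than a reimplementation.

```python
import math
from collections import Counter
from typing import List, Tuple


def term_vector(text: str) -> Counter:
    """Bag-of-words term frequencies from whitespace-separated lowercase tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_centroid(tweets: List[str]) -> List[Tuple[str, float]]:
    """Rank tweets by cosine similarity to the summed vector of all tweets.

    The sum is proportional to the mean vector, and cosine similarity
    ignores scale, so summing gives the same ranking as averaging.
    """
    vectors = [term_vector(tweet) for tweet in tweets]
    centroid: Counter = Counter()
    for vector in vectors:
        centroid.update(vector)
    scored = [(tweet, cosine(vector, centroid)) for tweet, vector in zip(tweets, vectors)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


event_tweets = [
    "police confirm hostages escape sydney cafe",
    "hostages held in sydney cafe police say",
    "cooked my first low-fat meal today",
]
ranked = rank_by_centroid(event_tweets)
```

In this toy example the off-topic tweet shares no vocabulary with the other two, so it falls furthest from the centroid and ranks last.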


Evaluation Metrics




$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{rel_i}{\log(i + 1)}$

$\text{Precision at } n = \frac{\text{Number of relevant references at } n}{n}$

Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.

Järvelin, Kalervo, and Jaana Kekäläinen. "Cumulated gain-based evaluation of IR techniques." ACM Transactions on Information Systems (TOIS) 20.4 (2002): 422-446.
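The two evaluation metrics can be sketched in a few lines of Python. The exact gain and discount used on the slides did not survive extraction, so this example assumes the common form in which the graded relevance at rank i is divided by log2(i + 1), and it assumes the ideal ordering is obtained by sorting the same relevance judgements. Function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at(relevances: Sequence[float], p: int) -> float:
    """Discounted cumulative gain over the top p results (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:p], start=1)
    )


def ndcg_at(relevances: Sequence[float], p: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at(sorted(relevances, reverse=True), p)
    return dcg_at(relevances, p) / ideal if ideal > 0 else 0.0


def precision_at(is_relevant: Sequence[bool], n: int) -> float:
    """Fraction of the top n results that are relevant."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sum(1 for flag in is_relevant[:n] if flag) / n


# Graded relevance judgements for a ranked list (hypothetical values).
perfect_order = ndcg_at([3, 2, 1], p=3)
reversed_order = ndcg_at([1, 2, 3], p=3)
top4_precision = precision_at([True, False, True, True], n=4)
```

Because NDCG is a ratio of two DCG values, the choice of logarithm base cancels out; only the raw DCG values depend on it.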

NDCG Curves for Millions March NYC

NDCG Curves for Sydney Siege Crisis

NDCG Values for Millions March NYC

Technique             @ (ten rank cutoffs; labels missing)
[name missing]        0.979  0.975  0.966  0.966  0.957  0.936  0.951  0.960  0.967  0.989
LexRank               0.859  0.807  0.830  0.813  0.822  0.825  0.834  0.878  0.922  0.944
RTRank                0.744  0.752  0.749  0.765  0.792  0.822  0.861  0.870  0.884  0.922
Logistic Regression   0.729  0.753  0.757  0.752  0.757  0.776  0.792  0.839  0.878  0.915
SeenRank              0.595  0.652  0.708  0.733  0.745  0.759  0.801  0.828  0.859  0.884
Centroid              0.519  0.560  0.623  0.658  0.690  0.727  0.747  0.788  0.835  0.857
TextRank              0.333  0.383  0.418  0.468  0.499  0.564  0.633  0.681  0.729  0.782

Precision Values for Millions March NYC

Event Name: Sydney Siege Crisis

Top 10 Event-specific Informative Hashtags: #sydneysiege, #SydneySiege, #Sydneysiege, #MartinPlace, #9News, #SydneyHostageCrisis, #Sydney, #Lindt, #ISIS, #SYDNEYSIEGE

Top 10 Event-specific Informative Text Units: police, sydney, reporter, lindt, isis, nsw, commissioner, australia, catherine, martin

Top 5 Event-specific Informative URLs


Top 5 Event-specific Informative Tweets

1. RT @faithcnn: Hostage taker in Sydney cafe has demanded 2 things: ISIS flag & phone call with Australia PM Tony Abbott #SydneySiege

2. Aussie grand mufti & Imam Council condemn #Sydneysiege hostage capture - LIVE UPDATES http://t.c...

3. RT @PatDollard: #SydneySiege: Hostages Held By Jihadis In Australian Cafe - WATCH LIVE VIDEO COVERAGE #tcot #pjnet

4. RT @FoxNews: MORE: Police confirm 3 hostages escape Sydney cafe, unknown number remain inside #Sydneysiege

5. Watch #sydneysiege police conference live as hostages are still being held inside a central Sydney cafe #c4news

Sample Raw Results for Sydney Siege Crisis

Top Five Event-specific Informative Users, with Three Randomly Selected Tweet Excerpts

User 1
Total no. of event related tweets by the user: 41

1. RT @cnni: Hostage taker in Sydney cafe demands ISIS flag and call with Australian PM, Sky News reports. #sydneysiege

2. RT @DR_SHAHID: Hostage taker demands delivery of an #ISIS flag and a conversation with Prime Minister Tony Abbott

3. RT @SkyNewsBreak: Update - New South Wales police commissioner confirms five hostages have escaped from the Lindt cafe in Sydney #sydneysiege

User 2
Total no. of event related tweets by the user: 33

1. RT @smh: NSW Police Deputy Commissioner Catherine Burn will hold a press conference to update on the #SydneySiege at 6.30pm.

2. RT @Y7News: Helpful travel advice for commuters heading out of #Sydney’s CBD this evening - #sydneysiege

3. RT @hughwhitfeld: British PM David Cameron informed of #sydneysiege .. UK Foreign Office is in touch with Aus authorities

User 3
Total no. of event related tweets by the user: 32

1. RT @RT_com: #SYDNEY: Gunman tall man in late 40s, dressed in black – eyewitness #SydneySiege

2. RT @NewsAustralia: 2GB's Ray Hadley claims hostage takers in #SydneySiege "wants to speak to Prime Minister Abbott live on radio."

3. RT @BBCWorld: "Profoundly shocking" -Australia PM Tony Abbott delivers second #sydneysiege statement. MORE:

Future Directions

• Summarizing Event Content
• Identification of Insightful Opinionated Content
• Event Topic Modeling
• Event-specific Recommendations
• Distributed Processing of TwitterEventInfoGraph
• Ontology for Event Content in Social Media
• Many More

Additional Slides


Defining Events

An event $E_i$ is defined as a real-world occurrence with an associated time period $T_{E_i}$, running from $t^{start}_{E_i}$ to $t^{end}_{E_i}$, and a time-ordered stream of tweets $M_{E_i}$, of substantial volume, discussing the event and posted within $T_{E_i}$.



Becker, Hila, Mor Naaman, and Luis Gravano. "Beyond Trending Topics: Real-World Event Identification on Twitter." ICWSM 11 (2011): 438-441.

$M_{E_i} = \{m_1, m_2, \ldots, m_n\}$

Tweets are primarily composed of:

• Set of hashtags: $H_{E_i} = \{h_1, h_2, \ldots, h_p\}$
• Set of text units: $W_{E_i} = \{w_1, w_2, \ldots, w_r\}$
• Set of URLs: $L_{E_i} = \{l_1, l_2, \ldots, l_t\}$
• Set of users: $U_{E_i} = \{u_1, u_2, \ldots, u_s\}$
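The decomposition of a tweet into hashtags, text units, and URLs can be sketched with regular expressions. This is a simplified illustration: the patterns are assumptions, text units are taken to be lowercase word tokens, and the function returns user mentions found in the text, whereas the authoring user would come from tweet metadata rather than from the text itself.

```python
import re
from typing import Dict, List

# Simplified patterns (assumptions; real tweet parsing has more edge cases).
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def decompose_tweet(text: str) -> Dict[str, List[str]]:
    """Split a tweet into URLs, hashtags, user mentions, and remaining text units."""
    urls = URL_RE.findall(text)
    without_urls = URL_RE.sub(" ", text)
    hashtags = HASHTAG_RE.findall(without_urls)
    mentions = MENTION_RE.findall(without_urls)
    # Remove hashtags and mentions before collecting plain word tokens.
    remainder = MENTION_RE.sub(" ", HASHTAG_RE.sub(" ", without_urls))
    text_units = [word.lower() for word in WORD_RE.findall(remainder)]
    return {
        "hashtags": hashtags,
        "text_units": text_units,
        "urls": urls,
        "user_mentions": mentions,
    }


parts = decompose_tweet(
    "RT @FoxNews: MORE: Police confirm 3 hostages escape Sydney cafe, "
    "unknown number remain inside #Sydneysiege"
)
```

URLs are removed first so that a fragment such as `#section` inside a link is not mistaken for a hashtag.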
