Using Twitter Data to Predict Flu Outbreak
Son Doan, Division of Biomedical Informatics, University of California San Diego
BigData@UCSD workshop
Nov 25, 2013
Seasonal influenza and influenza-like illness
• Seasonal influenza is a major public health concern:
– 3-5 million cases of severe illness
– 250,000 to 500,000 deaths worldwide each year
• The main syndrome of seasonal influenza is called influenza-like illness (ILI)
• During the peak of a major influenza outbreak, more cases of ILI are observed
→ Monitoring ILI can help predict flu outbreaks
Traditional system to monitor ILI: ILINet
• ILINet: CDC’s U.S. Outpatient ILI Surveillance Network
– consists of >3,000 outpatient healthcare providers
– covers all 50 US states and other areas
– reports more than 30 million patient visits each year
• ILINet monitors influenza through the ILI rate
– The ILI rate is the percentage of patients with ILI among all patients
– The average national baseline ILI rate for 2013 is 2.0%
Source: http://www.cdc.gov/flu/weekly/index.htm
Let’s revisit the process
[Diagram: Patients 1 to n each visit a healthcare provider, and each provider checks whether the visit is an ILI case.]
ILINet gathers the data and then calculates the ILI rate
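The process above can be sketched in a few lines. This is a minimal illustration, not ILINet's actual pipeline: the `Visit` record and its field names are hypothetical, and the ILI check uses the case definition quoted elsewhere in this deck (fever of 100°F or greater and cough and/or sore throat).

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """One outpatient visit; all field names are hypothetical."""
    temperature_f: float
    cough: bool
    sore_throat: bool


def is_ili(visit: Visit) -> bool:
    # Case definition from the deck: fever of 100°F or greater
    # and cough and/or sore throat.
    return visit.temperature_f >= 100.0 and (visit.cough or visit.sore_throat)


def ili_rate(visits: list) -> float:
    """Percentage of visits that meet the ILI definition."""
    if not visits:
        raise ValueError("no visits reported")
    return 100.0 * sum(is_ili(v) for v in visits) / len(visits)


# Made-up visits, for illustration only.
visits = [
    Visit(101.2, True, False),   # fever + cough -> ILI
    Visit(98.6, True, True),     # no fever -> not ILI
    Visit(100.0, False, True),   # fever + sore throat -> ILI
    Visit(102.0, False, False),  # fever only -> not ILI
]
print(ili_rate(visits))  # prints 50.0
```

Two of the four made-up visits meet the definition, so the sketch reports an ILI rate of 50%.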
ILINet issue
ILINet needs 1-2 weeks to gather and process data
Can we leverage other data sources to predict ILI rate faster?
Nowadays, users tend to look for information on the Internet
[Diagram: Users 1 to n each run searches on the Internet.]
… or tweet about their personal health conditions
[Diagram: Users 1 to n each post tweets on the Internet.]
Estimate ILI rate using user-generated data
• Models
– Linear model [1]: ILI rate = (ILI-related data) × α + error
– Logistic regression [2]: logit(ILI rate) = logit(ILI-related data) × α + error
• Key point: How to identify ILI-related data?
• Hint: ILI is defined as fever (temperature of 100°F [37.8°C] or greater) and cough and/or sore throat
[1] Polgreen et al., “Using internet searches for influenza surveillance,” Clinical Infectious Diseases, 2008, 47(11):1443-8.
[2] Ginsberg et al., “Detecting influenza epidemics using search engine query data,” Nature, 2009, 457(7232):1012-4.
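As a rough illustration of the logit model, the sketch below fits logit(ILI rate) against logit(ILI-related data) by ordinary least squares with NumPy. It is not the authors' or Google's implementation. The intercept term is an added assumption (the slide's formula shows only the coefficient α), the function names are hypothetical, and the weekly values are made up.

```python
import numpy as np


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def fit_logit_model(signal, ili):
    """Least-squares fit of logit(ili) = intercept + alpha * logit(signal).

    The intercept is an assumption; the slide shows only alpha.
    """
    design = np.column_stack([np.ones(len(signal)), logit(signal)])
    coef, *_ = np.linalg.lstsq(design, logit(ili), rcond=None)
    return coef  # [intercept, alpha]


def predict(coef, signal):
    """Estimated ILI rate (as a fraction) for new signal values."""
    return inv_logit(coef[0] + coef[1] * logit(signal))


# Made-up weekly values, expressed as fractions rather than percentages.
signal = np.array([0.001, 0.002, 0.004, 0.003, 0.0015])  # ILI-related share of tweets or queries
ili = np.array([0.010, 0.019, 0.037, 0.028, 0.015])      # ILI rate from surveillance
coef = fit_logit_model(signal, ili)
print(coef, predict(coef, np.array([0.0025])))
```

Working on the log-odds scale keeps every prediction strictly between 0 and 1, which a plain linear model does not guarantee.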
Google Flu Trends (GFT) estimates based on flu-related queries are highly correlated with the ILI rate
Source: http://www.google.org/flutrends/about/how.html
Reporting lag of about 1 day
GFT is good, however…
• Researchers cannot access the original data
• GFT does not disclose its search queries
Source: Ginsberg et al., Nature 457, 1012-1014 (19 February 2009)
Sources: Google Flu Trends (www.google.org/flutrends); CDC; Flu Near You
Twitter corpus
Timeline: 36 weeks for the US 2009 influenza season (Aug 30, 2009 to May 8, 2010)
Name            Total
Tweets          587,290,394
Unique users    23,571,765
URL             136,034,309
Hash Tags       96,399,587
Thanks to Brendan O’Connor (CMU) and Twitter Inc.
Related work
[Diagram: Twitter corpus → ILI-related tweets]
Keywords used in related work:
Culotta [3]     Signorini [4]    Chew [5]
flu             swine            h1n1
cough           flu              swine flu
headache        influenza        swineflu
sore throat
[3] A. Culotta, “Detecting influenza epidemics by analyzing Twitter messages,” arXiv:1007.4748v1.
[4] A. Signorini, A. M. Segre, and P. M. Polgreen, “The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic,” PLoS ONE, vol. 6, no. 5, p. e19467, 2011.
[5] C. Chew and G. Eysenbach, “Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak,” PLoS ONE, vol. 5, no. 11, p. e14118, 2010.
Our approach: two-step filtering
Twitter corpus → Filter 1 → Respiratory syndrome-related tweets → Filter 2 → Semantic filtered tweets
• Filter 1: Knowledge-based approach
– Respiratory syndrome only
– Respiratory syndrome + “flu”
– Respiratory syndrome + “flu” - URL
• Filter 2: Semantic level
– Negation
– Emoticon
– HashTags
– Humor
– Geo
Correlation to ILI rate (CDC data)
Method                                                         Pearson corr. with ILI rate
Google Flu Trends                                              0.9912
Related work: Culotta [3]                                      0.9485
Filter 1: Respiratory syndrome + “flu” - URL                   0.9752
Filter 1+2: Negation + Emoticon + HashTags + Humor + Geo       0.9846
[Figure: correlation to ILI rate (CDC data), axis in %]
S. Doan, L. Ohno-Machado, N. Collier, “Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses,” Proc. of the 2nd IEEE HISB 2012, pp. 62-71, 2012.
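For reference, a Pearson correlation like the ones reported above can be computed as follows. This is a generic sketch, not the authors' evaluation code: it assumes one value per week for both series, the variable names are hypothetical, and the numbers are made up.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    xc = x - x.mean()  # center both series
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Made-up weekly series, for illustration only.
weekly_ili_rate = [1.2, 1.8, 3.5, 5.9, 4.1, 2.6]          # CDC ILI rate, in %
weekly_filtered_tweets = [310, 450, 920, 1500, 1010, 700]  # tweets kept by the filters
print(round(pearson(weekly_filtered_tweets, weekly_ili_rate), 4))
```

The coefficient is unchanged by rescaling either series, so raw tweet counts and percentages can be compared directly.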
Big Data challenge
Is sampling data enough?
Twitter: 140 million active users, 340 million tweets/day
Twitter API sampling rate is small (1-5% of data)
Filtered tweets: 0.2% of samples
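A back-of-the-envelope calculation shows how little data survives sampling and filtering. The sketch below assumes that the 0.2% figure applies to the sampled tweets and that it carries over to the current daily volume. Both are assumptions for illustration, and the function name is hypothetical.

```python
TWEETS_PER_DAY = 340_000_000  # daily volume from the slide
SAMPLE_RATES = (0.01, 0.05)   # API sample of 1-5%
FILTERED_SHARE = 0.002        # 0.2% of sampled tweets pass the filters


def daily_filtered(tweets_per_day, sample_rate, filtered_share):
    """Tweets per day that remain after sampling and filtering."""
    return tweets_per_day * sample_rate * filtered_share


for rate in SAMPLE_RATES:
    sampled = TWEETS_PER_DAY * rate
    kept = daily_filtered(TWEETS_PER_DAY, rate, FILTERED_SHARE)
    print(f"{rate:.0%} sample: {sampled:,.0f} sampled, {kept:,.0f} filtered per day")
```

Under these assumptions, roughly 6,800 to 34,000 filtered tweets remain per day out of 340 million.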
DIZIE: system for syndromic surveillance using Twitter
Syndromic surveillance for gastrointestinal, respiratory, neurological, dermatological, haemorrhagic, and musculoskeletal syndromes from tweets in 40 world cities.
Use cases
• DIZIE was integrated into BioCaster, our news media biosurveillance system
• DIZIE was used by the European Centre for Disease Prevention and Control (ECDC) to track syndromes during the London 2012 Summer Olympics
Potential applications using Twitter in public health
• Mental health analysis
• Tobacco surveillance
• Medication use in social media
Acknowledgements
• Nigel Collier, European Bioinformatics Institute
• Mike Conway, UCSD
• Lucila Ohno-Machado, UCSD
Data sources for influenza surveillance
• Data provided by physicians and laboratories
• Over-the-counter drug sales
• School absentee records
• Health-related phone calls
• Internet-based data:
– News media
– Mailing lists
– Social media
Extract respiratory syndrome keywords
achy chest               cold symptom       respiratory failure
apnea                    cough              runny nose
asthma                   dyspnea            short of breath
asthmatic                dyspnoea           shortness of breath
blocked nose             gasping for air    sinusitis
breathing difficulties   lung sounds        sore throat
breathing trouble        pneumonia          stop breathing
bronchitis               rales              stuffy nose
…                        …                  …
We have a total of 37 keywords
Knowledge-based approach
• Respiratory syndrome only: tweets containing syndrome keywords
– Example: Barber just coughed on me in the chair.
• Respiratory syndrome + “flu”: tweets containing syndrome keywords and “flu”
– Example: I got flu n coughed a lot.
• Respiratory syndrome + “flu” - URL: tweets containing syndrome keywords and “flu”, remove links
– Example: 7-year-old boy dies of flu, pneumonia <URL>
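The three knowledge-based levels can be sketched with regular expressions. This is an illustration under stated assumptions, not the authors' code: the keyword list is only a subset of the 37 keywords, keywords are matched as word prefixes so that “coughed” matches “cough”, “remove links” is read as dropping tweets that contain a URL, and all function names are hypothetical.

```python
import re

# Subset of the deck's respiratory syndrome keywords (the full list has 37).
KEYWORDS = ["cough", "sore throat", "runny nose", "pneumonia",
            "bronchitis", "short of breath"]

# Assumption: prefix match at a word boundary, so "coughed" matches "cough".
KEYWORD_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)
FLU_RE = re.compile(r"\bflu\b", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)


def has_syndrome(tweet: str) -> bool:
    """Level 1: respiratory syndrome only."""
    return KEYWORD_RE.search(tweet) is not None


def syndrome_and_flu(tweet: str) -> bool:
    """Level 2: respiratory syndrome + "flu"."""
    return has_syndrome(tweet) and FLU_RE.search(tweet) is not None


def syndrome_and_flu_no_url(tweet: str) -> bool:
    """Level 3: respiratory syndrome + "flu" - URL.

    Assumption: a tweet containing a link is dropped entirely.
    """
    return syndrome_and_flu(tweet) and URL_RE.search(tweet) is None


for text in [
    "Barber just coughed on me in the chair.",
    "I got flu n coughed a lot.",
    "7-year-old boy dies of flu, pneumonia http://example.com",
]:
    print(has_syndrome(text), syndrome_and_flu(text),
          syndrome_and_flu_no_url(text), text)
```

Each level is strictly narrower than the one before it, which matches the order in which the slide lists them.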
Semantic-level filtering
• Negation: remove negation in tweets
– Example: I don’t have flu
• Emoticon: remove tweets containing smiley emoticons, e.g., :-), :D
– Example: Glad to hear that you’re beating the flu. :-) Hope you don’t get the nasty cough that everyone’s getting this year
• HashTags: keep tweets containing the keyword “flu”
– Example: Still coughing smh #swineflu #h1n1
• Humor: remove humor features in tweets, e.g., “haha”, “hihi”, “***cough … cough***”
– Example: Hm Im kinda wanting to go to NYC really soon ***cough … cough*** @Ctmomofsix =)
• Geo: tweets from geographical locations (e.g., US)
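One possible reading of the five semantic filters is sketched below. Several choices are assumptions rather than the authors' rules: tweets with a negation cue, smiley, or humor marker are dropped outright, “flu” may appear either as a word or inside a hashtag, and the tweet is a dict with hypothetical `text` and `country` fields. The cue lists are short illustrative samples.

```python
import re

# Illustrative cue lists; the deck does not give the full lists.
NEGATION_RE = re.compile(
    r"\b(?:no|not|never|don't|didn't|doesn't|haven't)\b", re.IGNORECASE)
SMILEYS = (":-)", ":)", ":D", "=)")
HUMOR_MARKERS = ("haha", "hihi", "***cough")
HASHTAG_RE = re.compile(r"#(\w+)")
FLU_WORD_RE = re.compile(r"\bflu\b", re.IGNORECASE)


def mentions_flu(text: str) -> bool:
    """True if "flu" appears as a word or inside a hashtag (e.g. #swineflu)."""
    if FLU_WORD_RE.search(text):
        return True
    return any("flu" in tag.lower() for tag in HASHTAG_RE.findall(text))


def passes_filter2(tweet: dict, country: str = "US") -> bool:
    """Apply negation, emoticon, humor, hashtag and geo checks in turn."""
    text = tweet["text"].replace("\u2019", "'")  # normalize curly apostrophes
    lowered = text.lower()
    if NEGATION_RE.search(text):                  # Negation
        return False
    if any(s in text for s in SMILEYS):           # Emoticon
        return False
    if any(m in lowered for m in HUMOR_MARKERS):  # Humor
        return False
    if not mentions_flu(text):                    # HashTags / keyword "flu"
        return False
    return tweet.get("country") == country        # Geo


sample = {"text": "Still coughing smh #swineflu #h1n1", "country": "US"}
print(passes_filter2(sample))
```

Substring matching on smileys and humor markers is deliberately crude. A production filter would need a fuller lexicon and better tokenization.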
Semantic-level filtered tweets
• Influenza confirmation: I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr
• Influenza symptoms: My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1
• Flu shots: I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia
• Self protection: Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1
• Medication: Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote