Using Twitter Data to Predict Flu Outbreak

Using Twitter Data to Predict Flu Outbreak. Son Doan, Division of Biomedical Informatics, University of California San Diego. BigData@UCSD workshop, Nov 25, 2013


Transcript of Using Twitter Data to Predict Flu Outbreak

Page 1: Using Twitter Data to Predict Flu Outbreak

Using Twitter Data to Predict Flu Outbreak

Son Doan, Division of Biomedical Informatics, University of California San Diego

BigData@UCSD workshop

Nov 25, 2013

Page 2: Using Twitter Data to Predict Flu Outbreak

Seasonal influenza and influenza-like illness

• Seasonal influenza is a major public health concern:
  – 3-5 million cases of severe illness
  – 250,000 to 500,000 deaths worldwide each year

• The main syndrome of seasonal influenza is called influenza-like illness (ILI)

• During the peak of a major influenza outbreak, more cases of ILI are observed

→ Monitoring ILI can help predict flu outbreaks

Page 3: Using Twitter Data to Predict Flu Outbreak

Traditional system to monitor ILI: ILINet

• ILINet: CDC’s U.S. Outpatient ILI Surveillance Network
  – consists of >3,000 outpatient healthcare providers
  – covers all 50 US states and other areas
  – reports more than 30 million patient visits each year

• ILINet monitors influenza through the ILI rate
  – The ILI rate is the percentage of patients with ILI among all patients
  – The average national baseline ILI rate for 2013 is 2.0%
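The ILI rate defined on this slide can be sketched in a few lines. This is only an illustration: the function name `ili_rate` and the visit counts are made up, and only the 2.0% baseline comes from the slide.

```python
# Baseline quoted on the slide (average national baseline ILI rate for 2013).
NATIONAL_BASELINE_2013 = 2.0  # percent


def ili_rate(ili_visits: int, total_visits: int) -> float:
    """Return the percentage of patient visits that were for ILI."""
    if total_visits <= 0:
        raise ValueError("total_visits must be positive")
    if not 0 <= ili_visits <= total_visits:
        raise ValueError("ili_visits must be between 0 and total_visits")
    return 100.0 * ili_visits / total_visits


# Hypothetical weekly counts, not real ILINet data.
weekly_rate = ili_rate(ili_visits=450, total_visits=15_000)
above_baseline = weekly_rate > NATIONAL_BASELINE_2013
print(f"ILI rate: {weekly_rate:.1f}% (above baseline: {above_baseline})")
```

A week whose rate exceeds the baseline is the kind of signal that surveillance looks for.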

Page 4: Using Twitter Data to Predict Flu Outbreak

Source: http://www.cdc.gov/flu/weekly/index.htm

Page 5: Using Twitter Data to Predict Flu Outbreak

Let’s revisit the process

Patient 1, Patient 2, …, Patient n → visits → Healthcare provider → Check if ILI

ILINet gathers the data and then calculates the ILI rate

Page 6: Using Twitter Data to Predict Flu Outbreak

ILINet issue

ILINet needs 1-2 weeks to gather and process data

Can we leverage other data sources to predict ILI rate faster?

Page 7: Using Twitter Data to Predict Flu Outbreak

Nowadays, users tend to search for information on the Internet

User 1, User 2, …, User n → searches → Internet

Page 8: Using Twitter Data to Predict Flu Outbreak

… or tweet about their personal health conditions

User 1, User 2, …, User n → tweets → Internet

Page 9: Using Twitter Data to Predict Flu Outbreak

Estimate ILI rate using user-generated data

• Models
  – Linear model [1]: ILI rate = (ILI-related data) × α + error
  – Logistic regression [2]: logit(ILI rate) = logit(ILI-related data) × α + error

• Key point: How to identify ILI-related data?
• Hint: ILI is defined as fever (temperature of 100°F [37.8°C] or greater) and cough and/or sore throat

[1] Polgreen et al., “Using internet searches for influenza surveillance”, Clinical Infectious Diseases, 2008, 47(11):1443-8.
[2] Ginsberg et al., “Detecting influenza epidemics using search engine query data”, Nature, 2009 Feb 19;457(7232):1012-4.
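The two models above can be sketched with ordinary least squares on a single predictor. This is a minimal illustration under stated assumptions: the weekly numbers are made up, the helper names (`fit_line`, `predict_linear`, `predict_logit`) are hypothetical, and an intercept term is added even though the slide's formulas show only α.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z: float) -> float:
    """Inverse of logit: maps a real number back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up weekly values: fraction of ILI-related messages and observed ILI rate.
ili_related_fraction = [0.0010, 0.0015, 0.0022, 0.0030, 0.0041]
observed_ili_rate = [0.012, 0.017, 0.024, 0.031, 0.040]

# Linear model: ILI rate = intercept + alpha * (ILI-related data)
lin_b0, lin_alpha = fit_line(ili_related_fraction, observed_ili_rate)

# Logit model: logit(ILI rate) = intercept + alpha * logit(ILI-related data)
log_b0, log_alpha = fit_line(
    [logit(q) for q in ili_related_fraction],
    [logit(p) for p in observed_ili_rate],
)


def predict_linear(q: float) -> float:
    return lin_b0 + lin_alpha * q


def predict_logit(q: float) -> float:
    return inv_logit(log_b0 + log_alpha * logit(q))


print(predict_linear(0.0025), predict_logit(0.0025))
```

One practical difference: the logit form always returns a rate between 0 and 1, while the linear form can drift outside that range when extrapolating.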

Page 10: Using Twitter Data to Predict Flu Outbreak

Google Flu Trends (GFT) estimates based on flu-related queries are highly correlated with the ILI rate

Source: http://www.google.org/flutrends/about/how.html

Reporting lag of about 1 day

Page 11: Using Twitter Data to Predict Flu Outbreak

GFT is good, however…

• Researchers cannot access the original data
• GFT does not disclose the search queries

Source: Ginsberg et al., Nature 457, 1012-1014 (19 February 2009)

Page 12: Using Twitter Data to Predict Flu Outbreak

Sources: Google Flu Trends (www.google.org/flutrends); CDC; Flu Near You

Page 13: Using Twitter Data to Predict Flu Outbreak

Twitter corpus

Timeline: 36 weeks for the US 2009 influenza season (Aug 30, 2009 to May 8, 2010)

Name            Total
Tweets          587,290,394
Unique users    23,571,765
URL             136,034,309
Hash Tags       96,399,587

Thanks to Brendan O’Connor (CMU) and Twitter Inc.

[Chart: axis from 5 mil to 25 mil]

Page 14: Using Twitter Data to Predict Flu Outbreak

Related work

Twitter corpus → ILI-related tweets

Culotta [3]    Signorini [4]    Chew [5]
flu            swine            h1n1
cough          flu              swine flu
headache       influenza        swineflu
sore throat

[3] A. Culotta, “Detecting influenza epidemics by analyzing twitter messages,” arXiv:1007.4748v1
[4] A. Signorini, A. M. Segre, and P. M. Polgreen, “The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic,” PLoS ONE, vol. 6, no. 5, p. e19467, 2011.
[5] C. Chew and G. Eysenbach, “Pandemics in the Age of Twitter: Content Analysis of Tweets during the 2009 H1N1 Outbreak,” PLoS ONE, vol. 5, no. 11, p. e14118, 2010.

Page 15: Using Twitter Data to Predict Flu Outbreak

Our approach: two-step filtering

Twitter corpus → Filter 1 → Respiratory syndrome-related tweets → Filter 2 → Semantic filtered tweets

Filter 1: Knowledge-based approach
• Respiratory syndrome only
• Respiratory syndrome + “flu”
• Respiratory syndrome + “flu” - URL

Filter 2: Semantic level
• Negation
• Emoticon
• HashTags
• Humor
• Geo

Page 16: Using Twitter Data to Predict Flu Outbreak

Correlation to ILI rate (CDC data)

Method                                                      Pearson corr with ILI rate
Google Flu Trends                                           0.9912
Related work: Culotta [3]                                   0.9485
Filter 1: Respiratory syndrome + “flu” - URL                0.9752
Filter 1+2: Negation + Emoticon + HashTags + Humor + Geo    0.9846
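For reference, the Pearson correlation reported in this table can be computed as in the sketch below. The two weekly series are made-up placeholders, not the data behind the numbers above, and the function name `pearson` is chosen here for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Note: a constant series has zero spread and would divide by zero here.
    return cov / (spread_x * spread_y)


# Hypothetical weekly series: filtered tweet counts and CDC ILI rate (%).
weekly_filtered_tweets = [120, 180, 260, 410, 390, 250, 140]
weekly_ili_rate = [1.4, 1.9, 2.6, 4.1, 3.8, 2.7, 1.6]

print(f"Pearson r = {pearson(weekly_filtered_tweets, weekly_ili_rate):.4f}")
```

A value close to 1 means the tweet-based signal rises and falls with the ILI rate. It says nothing about how well the scale of the signal matches.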

Page 17: Using Twitter Data to Predict Flu Outbreak

Correlation to ILI rate (CDC data)

S. Doan, L. Ohno-Machado, N. Collier, "Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses", Proc. of the 2nd IEEE HISB 2012, pp. 62-71, 2012.

Page 18: Using Twitter Data to Predict Flu Outbreak

Big Data challenge

Is sampling data enough?

Twitter: 140 million active users, 340 million tweets/day

Twitter API sampling rate is small (1-5% data)

Filtered tweets: 0.2% of samples
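A back-of-the-envelope calculation shows what these figures imply. It assumes that the 0.2% filtered share applies to the sampled stream and that the 340 million tweets/day figure holds for the period of interest. Both are simplifications made for this sketch.

```python
TWEETS_PER_DAY = 340_000_000   # from the slide
SAMPLE_PERCENTS = (1, 5)       # API sampling rate range from the slide
FILTERED_PER_THOUSAND = 2      # 0.2% of sampled tweets survive filtering


def filtered_per_day(sample_percent: int) -> int:
    """Estimated number of filtered tweets per day at a given sampling rate."""
    sampled = TWEETS_PER_DAY * sample_percent // 100
    return sampled * FILTERED_PER_THOUSAND // 1000


for percent in SAMPLE_PERCENTS:
    print(f"{percent}% sample -> about {filtered_per_day(percent):,} filtered tweets/day")
```

Under these assumptions, only a few thousand to a few tens of thousands of usable tweets remain per day, which is why the question of whether sampling is enough matters.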

Page 19: Using Twitter Data to Predict Flu Outbreak

DIZIE: system for syndromic surveillance using Twitter

Syndromic surveillance for gastrointestinal, respiratory, neurological, dermatological, haemorrhagic, and musculoskeletal syndromes from tweets in 40 world cities.

Page 20: Using Twitter Data to Predict Flu Outbreak

Use cases

• DIZIE was integrated into BioCaster, our news media biosurveillance system

• DIZIE was used by the European Centre for Disease Prevention and Control (ECDC) to track syndromes during the London 2012 Summer Olympics

Page 21: Using Twitter Data to Predict Flu Outbreak

Potential applications using Twitter in public health

• Mental health analysis

• Tobacco surveillance

• Medication use in social media

Page 22: Using Twitter Data to Predict Flu Outbreak

Acknowledgements

• Nigel Collier, European Bioinformatics Institute
• Mike Conway, UCSD
• Lucila Ohno-Machado, UCSD

Page 23: Using Twitter Data to Predict Flu Outbreak
Page 24: Using Twitter Data to Predict Flu Outbreak

Data sources for influenza surveillance

• Data provided by physicians and laboratories
• Over-the-counter drug sales
• School absentee records
• Health-related phone calls
• Internet-based data:
  – News media
  – Mailing lists
  – Social media

Page 25: Using Twitter Data to Predict Flu Outbreak

Extract respiratory syndrome keywords

achy chest                cold symptom       respiratory failure
apnea                     cough              runny nose
asthma                    dyspnea            short of breath
asthmatic                 dyspnoea           shortness of breath
blocked nose              gasping for air    sinusitis
breathing difficulties    lung sounds        sore throat
breathing trouble         pneumonia          stop breathing
bronchitis                rales              stuffy nose
…                         …                  …

We have a total of 37 keywords

Page 26: Using Twitter Data to Predict Flu Outbreak

Knowledge-based approach

• Respiratory syndrome only: tweets containing syndrome keywords
  – Example: Barber just coughed on me in the chair.

• Respiratory syndrome + “flu”: tweets containing syndrome keywords and “flu”
  – Example: I got flu n coughed a lot.

• Respiratory syndrome + “flu” - URL: tweets containing syndrome keywords and “flu”, remove links
  – Example: 7-year-old boy dies of flu,pneumonia < URL>
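The three Filter 1 levels can be sketched as simple string checks. This is an illustration only, not the authors' implementation. It uses a small subset of the 37 keywords, matches keywords as substrings so that "coughed" counts as "cough", and reads "remove links" as dropping tweets that contain a link. All of these are assumptions.

```python
import re

# Small illustrative subset of the respiratory syndrome keywords.
SYNDROME_KEYWORDS = ("cough", "sore throat", "pneumonia", "runny nose", "shortness of breath")

# Assumed link patterns, including the "< URL>" placeholder used in the slide example.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+|<\s*URL\s*>", re.IGNORECASE)


def has_syndrome(text: str) -> bool:
    lowered = text.lower()
    return any(keyword in lowered for keyword in SYNDROME_KEYWORDS)


def filter1(tweets, require_flu=True, drop_urls=True):
    """Keep tweets that pass the selected Filter 1 level."""
    kept = []
    for text in tweets:
        if not has_syndrome(text):
            continue
        if require_flu and "flu" not in text.lower():
            continue
        if drop_urls and URL_PATTERN.search(text):
            continue
        kept.append(text)
    return kept


examples = [
    "Barber just coughed on me in the chair.",
    "I got flu n coughed a lot.",
    "7-year-old boy dies of flu,pneumonia < URL>",
]

print(filter1(examples, require_flu=False, drop_urls=False))  # syndrome only
print(filter1(examples, drop_urls=False))                     # syndrome + "flu"
print(filter1(examples))                                      # syndrome + "flu" - URL
```

Each level is stricter than the one before, so the three calls return three, two, and one of the example tweets respectively.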

Page 27: Using Twitter Data to Predict Flu Outbreak

Semantic-level filtering

• Negation: remove negation in tweets
  – Example: I don’t have flu

• Emoticon: remove tweets containing smiley emoticons, e.g., :-), :D
  – Example: Glad to hear that you’re beating the flu. :-) Hope you don’t get the nasty cough that everyone’s getting this year

• HashTags: keep tweets containing keyword “flu”
  – Example: Still coughing smh #swineflu #h1n1

• Humor: remove humor features in tweets, e.g., “haha”, “hihi”, “***cough … cough***”
  – Example: Hm Im kinda wanting to go to NYC really soon ***cough … cough*** @Ctmomofsix =)

• Geo: tweets from geographical locations (e.g., US)
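A rough sketch of how some of these semantic rules could be applied as regular-expression checks follows. The patterns are guesses made for illustration, not the rules used in the original system. The `Tweet` class and its `country` field are hypothetical, and the HashTags rule is left out because the slide does not specify it precisely.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    country: str  # hypothetical field; the slides do not say how location is stored


# Assumed patterns: a negation word followed by "flu" (at most one word in between),
# a few common smiley shapes, and the humor markers listed on the slide.
NEGATION = re.compile(r"\b(?:don't|dont|do not|not|no|never)\s+(?:\w+\s+)?flu\b")
EMOTICON = re.compile(r"[:;=]-?[)D]")
HUMOR = re.compile(r"haha|hihi|\*{3}.*?\*{3}")


def passes_filter2(tweet: Tweet, country: str = "US") -> bool:
    """Return True if the tweet survives the geo, negation, emoticon and humor checks."""
    text = tweet.text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    if tweet.country != country:
        return False
    if NEGATION.search(text):
        return False
    if EMOTICON.search(tweet.text):  # original case, so that ":D" is still recognized
        return False
    if HUMOR.search(text):
        return False
    return True


samples = [
    Tweet("I don\u2019t have flu", "US"),
    Tweet("Glad to hear that you\u2019re beating the flu. :-)", "US"),
    Tweet("Hm Im kinda wanting to go to NYC really soon ***cough \u2026 cough***", "US"),
    Tweet("I got flu n coughed a lot.", "US"),
    Tweet("I got flu n coughed a lot.", "UK"),
]
kept = [t.text for t in samples if passes_filter2(t)]
print(kept)
```

Pattern-based rules like these are easy to inspect but miss many phrasings, so they are a starting point rather than a complete filter.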

Page 28: Using Twitter Data to Predict Flu Outbreak

Semantic-level filtered tweets

• Influenza confirmation: I got flu n coughed a lot. Now my voice is like monster’s voice. Rrr

• Influenza symptoms: My day: flu-like symptoms (headache, body aches, cough, chills, 100.9 fever). Swine flu not ruled out. #H1N1

• Flu shots: I’m still getting flu shots, nothing is worth flu turning into bronchitis into pneumonia

• Self protection: Cover your mouth if coughing, use a tissue, wash your hands often & get a flu shot - protect and defend your community from #H1N1

• Medication: Wondering why I didn’t take the flu shot, laying in bed with cough drops, medicine, and the remote