Mobile Network Big Data for Modeling Spread of Infec8ous...

24
Mobile Network Big Data for Modeling Spread of Infec8ous Disease LIRNEasia Big Data for Development Team

Transcript of Mobile Network Big Data for Modeling Spread of Infec8ous...

Page 1: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

MobileNetworkBigDataforModelingSpreadofInfec8ous

Disease

LIRNEasiaBigDataforDevelopmentTeam

Page 2: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Dengue•  WHOes8mates50-100millioninfec8onsoccureveryyear•  Endemicinover100countries•  Avector-bornediseasecausedbyanRNAvirusoffamily

Flaviviridae;genusFlavivirus(Huhtamo,Eilietal.,2008)•  Mainvectors(Monath,1994)

– Aedesaegyp8– Aedesalbopictus

•  Fourserotypeshavebeeniden8fied(DENV1-4)

Dengvaxia,avaccinefordengue,developedbySanofiPasteuriscurrentlyundergoingclinicaltrialsinmul<plecountries(Sirisena&Noordeen,2016)

2

Page 3: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

DengueinSriLanka

•  All4serotypeshavebeenobservedinSriLanka

•  42,194casesreportedin2016sofar;likelytoexceeedthe44,000reportedin2012

–  51.19%reportedfromWesternProvince

–  IncreaseofDENV-2duringJune,2016outbreak

Page 4: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Roleofhumanmobilityindenguepropaga8on

•  Themosquitohasalifespanof2-4weeksandarangeofaround100-800m(Muir&Kay1998;Honórioetal.,2003)

•  Dengueisspreadbeyondthenaturalrangeofthemosquitobythemovementofinfectedhosts

Knowledgeofhumanmobilitypaaernscanshedlightondenguepropaga8onandthelevelofdiseaseincidenceinaregion

Page 5: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

DengueinJaffnaCeasefireagreementmayhavecontributedtodengueoutbreaksinJaffnaader2002

Propaga8onofdengueinSriLankaovertheyears(ImageSource:EpidemiologyUnit,M.ofH.(2009).CURRENTSITUATION&EPIDEMIOLOGYOFDENGUEINSRILANKA.InMediaSeminarforDengueWeek.Retrievedfromhap://www.healthedu.gov.lk/web/images/pdf/msp/current_situa8on_epidemiology_of_dengue.pdf)

Page 6: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

MobileNetworkBigData(MNBD)usedintheresearch

•  Mul8plemobileoperatorsinSriLankahaveprovidedfourdifferenttypesofmeta-data–  CallDetailRecords(CDRs)

•  Recordsofcalls•  SMS•  Internetaccess

–  Air8merechargerecords•  DatasetsdonotincludeanyPersonallyIden8fiableInforma8on

–  Allphonenumbersarepseudonymized–  LIRNEasiadoesnotmaintainanymappingsofiden8fierstooriginalphone

numbers•  Cover50-60%ofusers;veryhighcoverageinWestern(wherethecapital

cityislocated)&Northern(mostaffectedbycivilconflict)provinces,basedoncorrela8onwithcensusdata

Page 7: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

InsightsalreadyusedinurbanplanninginColombo

46.9%ofColomboCity’sday8mepopula8oncomesfromthesurroundingregions

Page 8: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

MNBDformodelingspreadofdisease

•  AmyWesolowskihaspioneereduseofMNBDfordiseasemodeling

•  Studyon2009MalariaoutbreakinKenya(Wesolowskiet.al,2012)usesMNBDtoiden8fyregionswithhighMalariaoutbreakrisk

•  Travelnetworkisderivedbycalcula8ngtheaveragemonthlytripsbyasubscribertodifferentregions

Travelnetworkbetweennodes(Regions).Edgesweightedbyvolumeoftraffic.Wesolowskietal.2012

Page 9: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

UsingMNBDtopredictDengueEpidemics

•  ModeldevelopedbyWesolowskietal.(2015)modelonemergenceofdengueepidemicsinPakistancorrespondstoactualdenguepropaga8on

•  Exis8ngento-epidemiologicalmodelwasusedtopredictthenumberofdenguecaseswithinanadministra8veunitknownasatehsil

–  ModelwasfiaedtotheendemicregionofKarachistar8ngatDay1,toes8matedailyinfectednumberofpeopleover8mebasedonreportedcases

•  Usedes8matelikelihoodofdengueimporta8onfromKarachitootherregionsonagivenday

–  Modelwasfiaed(inreverse)tonaïveregionstoes8matethe8meforintroduc8onofdenguefromoutsidesolelywithreportedcases

•  Usedtovalidatees8matesofdengueintroduc8onbasedonhumanmobilitypaaerns

•  MobilitymodelsdevelopedfromCDRandgravitymodelswerecomparedtodeterminewhichmodelwasmoreaccurate

Page 10: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

ConclusionsfromPakistanstudy•  CDRMobilitymodelcanpredictthe

8meofintroduc8onofdenguemoreaccuratelythanthediffusionmodel

•  CDRpredictedthefirstinfectedcasescomingtoLahoresome8mebeforetheactualoutbreak

–  ThisdelayintheoutbreakinLahoreisaaributedtotheimmunologyofthepopula8oninLahore

–  Therewasapreviousoutbreakin2011•  Ontheotherhand,theoutbreak

occurredimmediatelyadertheintroduc8oninMingora

–  Mingorapopula8onisimmunologicallynaive

The8meline of introduc8onofDengue toA) Lahore,B) Mingora according to the Pakistan study(Wesolowskiet.al,2015)

Page 11: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Dataneededtopredictdengueoutbreaks

•  Mobilityisoneamongmanyfactorsthataffectdiseasepropaga8on–  Weather,seasonality–  Vectorpopula8ondynamics–  Incuba8onperiodsofthedisease–  Vectorcontrolmechanisms–  Asymptoma8ctransmission–  Immunityofthepopula8ontodifferentserotypesof

dengue

Page 12: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Howdowemodelthesedifferentparameters?

•  Differentapproachesarepossible–  Machinelearningtechniques–  Sta8s8calmodellingtechniques

•  Needtoderivemathema8calmodelswithgoodassump8ons

•  Needtounderstandhowdifferentparametersbehavetoapplymachinelearningmodels

•  Epidemiologicalandentomologicalexper8seareessen8al

Es8ma8ngvectortohuman(ƛV->h)andhumantovector(ƛh->V)dengueincidenceraterequiresknowledgeonthepopula8ons(Ih,Sh,Nh,Iv,Sv,Nh)andbi8ngrate(a)andtransmissionprobabili8es(ΦV->h,Φh->V)(Wesolowskiet.al.,2015)

Page 13: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

SriLanka:Mul8-disciplinary,mul8-stakeholderresearcheffort

•  EpidemiologyUnit,MinistryofHealth–  Casedata,groundtruthdata–  LIRNEasia&UofMoratuwalackexper8seon

epidemiologyandentomologyneededtounderstandthediseasedynamics

–  Keycollaborators:Dr.HasithaA.Tissera,Dr.AzharGhouse

•  UniversityofMoratuwa–  Providesexper8seon

•  Sta8s8cs&Machinelearning•  Computa8onalmodeling

–  Keycollaborators:Dr.ShehanPerera

Page 14: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Workdonesofar•  Dataonweeklynewdenguecasesobtainedandpreprocessedfor2013

and2014•  DatafromSriLanka’sweathersta8ons(total414)beinganalyzed

–  Obtainedrainfalldatafrom112sta8onsfor2013•  70sta8onshaddatafortheen8reyear•  24sta8onshadmissingdataforlessthan2months(filledbyimputa8on)•  18sta8onshadmissingdataformorethan2months(discarded)

–  Temperaturedatafrom23sta8ons•  Onesta8onwasmissing6monthsofdata(discarded)•  Othermissingvalueswereimputed

•  Spa8alboundariesofcelltowers,MOHdivisions,andGNdivisionsdiffered

–  Wecouldnotmappopula8ontoMOHdivisionormobilitytoMOHdivisionbecauseofthesedifferentboundaries

–  AnundergraduategroupfromUniversityofMoratuwawasabletosolvetheissueofoverlappingspa8alboundariesusingVoronoitessella8onanddevelopedtheirownalgorithmtoderivemobility

Page 15: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Studentworkindetail•  Voronoitessella8onwasemployedtoes8matecelltowercoverageand

MOHdivisionalboundaries•  Thefrac8onofcoverageoverlappinganMOHdivisionwasusedtoobtain

frac8onalvaluesformobilityfromeachcellcoverage•  StudentswerelaterabletoobtaintheMOHboundariesandthe

popula8onswithassistancefromtheEpidemiologyUnit•  Mul8plesta8s8calmodelsaswellasdifferentmachinelearning

techniquesweretriedoutonthedata

fij - fractional coverage of a cell tower j on MOH area i

Page 16: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Meta-popula8onmodelforColomboCity

•  Asta8s8calmodelproposedfordenguebySarzynska,Udiani&Zhang,2013inastudydoneforPeruwasadaptedforSriLanka,butusingparametervaluesfromPeru

•  Resultswerenotencouraging(RMSE-70.29)

•  Weareworkingones8ma8ngparametersforSriLankausingmachinelearningtechniques

Predicted

Actual

Page 17: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Machinelearning:From1tomanyMOHdivisions

•  ForasingleMOHdivision,only52datapointsforayear–  Only70%(36datapoints)ledfortraining

•  Duetosmalltrainingset,modelstrainedareoverfiaedandperformbadlywhenpredic8ngunseendata

•  TrainingforasingleMOHdivisionwasdonetoiden8fysuitabilityofdifferenttechniques–  Finalmodelwillincorporatespa8alaspectsandpredictfortheen8re

country•  SincetrainingforasingleMOHwasnotyieldinggoodresults,

6similarMOHareaswereconsideredfortraining–  Providesmoredatapointsfortraining–  Providesanideaonmodelbehaviourunderdifferentcondi8ons

Page 18: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Mul8pleapproaches• RandomForestmethodusedinPakistan(Rehmanetal.,2016)triedinDehiwalaMOHDivision

– Lowerror,butdoesnotpredicttrendyet• NeuralnetworkshavebeentriedinendemiccountriessuchasSingapore(Aburasetal.,2010);wastriedforKoaeMOHDivision

– SriLankalacksdataforalongenoughperiod;nineyearsofdatainSEAsiacases

Page 19: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

XGBoostforDehiwalaMOH•  XGBoostisshortfor“ExtremeGradientBoos8ng”.Canbeconsideredas

anensembledecisiontreealgorithm•  RMSEof8.61adercleaningthedata•  Temperature,weatherdataextrapolatedfor2014duetounavailability

ofrealdata•  Topfeatures

–  temperaturelagfor2weeks

–  caseslagfor2weeks

–  temperaturelaggedfor1week

•  Ourpredic8onsarege{ngclosertotheactualcurve

Page 20: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

DoesMobilityimproveourmodel?•  Appliedthenaivemobilitymodeltoneuralnetworks•  SmallimprovementinRMSEvaluefrom9.62to8.75•  Improvementinthepredic8oncurve•  Wordofcau8on:Thesearepreliminaryresults;Insomeitera8onswedid

notseesignificantimprovements•  However,highcorrela8onwithmobilitywasconsistent

Observedhighercorrela8onformobilitythanrainfallduringourpreliminaryanalysis-  Analysedwith15variables-  Correla8onofrainfall:0.10497-  Correla8onwithmobility:0.14514

Page 21: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Whatdothepreliminaryresultssay?

•  ThereisnotmuchdifferenceinRMSEvaluesbetweenthemachinelearningtechniquesaaemptedsofar

–  Notenoughofadifferencetodiscardanytechniqueoutright•  EventhoughRMSEvalueisgood,modelswithoutmobilitydonotpredict

thepeaks,troughsandthegeneraltrendofdengueincidence•  Mobilityhasadefinitecorrela8onwithdengueincidenceandimproved

RMSEaswellasthepredic8oncurve–  Roomforimprovementintwoaspectswhenconsideringmobility

•  Es8ma8ngthe8mespentbythetravellingpopula8oninanMOHarea•  Methodologyoffusingmobilitytoourmodels

•  Wehaven’tfiguredoutthefinerdetailsofincorpora8ngspa8alproper8estothemodel,inwhichcase,introducingmul8pleMOHdivisionsmightactuallyincreasetheerror

•  Qualityofweatherdata,presenceofmissingvaluescanalsoaffectmodelaccuracy

Page 22: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

Generalizeddiseasepropaga8onpredic8onmodel

•  Ifourmodelcanincorporatetheessen8alfactors(includinghumanmobility)thataffectspa8otemporaldiseasepropaga8onandaccuratelypredictfordengue,wecangeneralizeitforotherdiseases

•  CanusethesamemodelforZikaandpredictanoutbreak–  Canexecutevectorcontrolmechanisms,awarenesscampaignspre-

emp8vely•  Mobilitywillplayanevenlargerroleinnon-vectorborne

diseases–  Ifwecancomeupwithanaccuratemodelforvectorbornediseases,

modellingothertypesofinfec8ousdiseaseswillbeeasier

Page 23: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

References1.  Huhtamo, Eili et al. “Molecular Epidemiology of Dengue Virus Strains from Finnish Travelers.” Emerging Infectious

Diseases 14.1 (2008): 80–83. PMC. Web. 10 Oct. 2016. 2.  Monath, T. P. (1994). Dengue: the risk to developed and developing countries. Proceedings of the National Academy of

Sciences, 91(7), 2395-2400. 3.  Centre for Dengue Research, U. of S. J. (2016). Dengue Virus Serotypes of Current Epidemic. Retrieved October 6, 2016,

from https://www.facebook.com/206869486156896/photos/a.206973839479794.1073741828.206869486156896/598635470313627/?type=3&theater

4.  Brockmann, D. (2010). Human Mobility and Spatial Disease Dynamics. Reviews of Nonlinear Dynamics and Complexity, 2, 1–24. http://doi.org/10.1002/9783527628001.ch1

5.  Brockmann, D., Hufnagel, L., & Geisel, T. (2006). The scaling laws of human travel. Nature, 439(7075), 462–465. http://doi.org/10.1038/nature04292

6.  Wesolowski, A., Eagle, N., Tatem, A. J., Smith, D. L., Noor, A. M., Snow, R. W., & Buckee, C. O. (2012). Quantifying the impact of human mobility on malaria. Science (New York, N.Y.), 338(6104), 267–70. http://doi.org/10.1126/science.1223467

7.  Wesolowski, A., Buckee, C. O., Bengtsson, L., Wetter, E., Lu, X., & Tatem, A. J. (2014). Commentary: Containing the Ebola Outbreak – the Potential and Challenge of Mobile Network Data. PLoS Currents Outbreaks, 1–17. http://doi.org/10.1371/currents.outbreaks.0177e7fcf52217b8b634376e2f3efc5e.Funding

8.  Wesolowski, A., Qureshi, T., Boni, M. F., Sundsøy, P. R., Johansson, M. A., Rasheed, S. B., … Buckee, C. O. (2015). Impact of human mobility on the emergence of dengue epidemics in Pakistan. Proceedings of the National Academy of Sciences, 112(38), 11887–11892. http://doi.org/10.1073/pnas.1504964112

9.  Sarzynska, M., Udiani, O., & Zhang, N. (2013). A study of gravity-linked metapopulation models for the spatial spread of dengue fever. arXiv Preprint arXiv:1308.4589, 2008, 1–32. Retrieved from http://arxiv.org/abs/1308.4589

10.  Rehman, N. A., Kalyanaraman, S., Ahmad, T., Pervaiz, F., Saif, U., & Subramanian, L. (2016). Fine-grained dengue forecasting using telephone triage services. Science Advances, 2(7), 1–10. http://doi.org/10.1126/sciadv.1501215

Page 24: Mobile Network Big Data for Modeling Spread of Infec8ous ...lirneasia.net/wp-content/uploads/2016/10/Samarajiva_PandemicSym… · Dengue • WHO es8mates 50-100 million infec8ons

References11.  Aburas, H. M., Cetiner, B. G., & Sari, M. (2010). Dengue confirmed-cases prediction: A neural network model. Expert

Systems with Applications, 37(6), 4256–4260. http://doi.org/10.1016/j.eswa.2009.11.077 12.  Husin, N. A., Salim, N., & Ahmad, A. R. (2008). Modeling of dengue outbreak prediction in Malaysia: A comparison of

neural network and nonlinear regression model. Proceedings - International Symposium on Information Technology 2008, ITSim, 4, 6–9. http://doi.org/10.1109/ITSIM.2008.4632022

13.  Rachata, N., Charoenkwan, P., Yooyativong, T., Chamnongthai, K., Lursinsap, C., & Higuchi, K. (2008). Automatic prediction system of dengue haemorrhagic-fever outbreak risk by using entropy and artificial neural network. 2008 International Symposium on Communications and Information Technologies, ISCIT 2008, (Iscit), 210–214. http://doi.org/10.1109/ISCIT.2008.4700184

14.  WHO Scientific Group on Arthropod-Borne and Rodent-Borne Viral Diseases. (1985). Arthropod-borne and rodent-borne viral diseases : report of a WHO scientific group [meeting held in Geneva from 28 February to 4 March 1983].

15.  Vitarana, T., Jayakuru, W., & Withane, N. (1997). Historical account of Dengue Haemorrhagic Fever in Sri Lanka. 16.  Yang, H. M., Macoris, M. L. G., Galvani, K. C., Andrighetti, M. T. M., & Wanderley, D. M. V. (2009). Assessing the effects

of temperature on the population of Aedes aegypti, the vector of dengue. Epidemiology and Infection, 137(8), 1188–1202. http://doi.org/10.1017/S0950268809002040

17.  P. H. D. Kusumawathie, & R. R. M. L. R. Siyambalagoda. (2005). Distribution and breeding sites of potential dengue vectors in Kandy and Nuwara Eliya districts of Sri Lanka. The Ceyion Journal of Medical Science .