A Data Intensive High Performance Simulation & Visualization Framework for Disease Surveillance Arif...


A Data Intensive High Performance Simulation & Visualization Framework for Disease Surveillance

Arif Ghafoor, David Ebert, Madiha Sahar

Ross Maciejewski, Shehzad Afzal, Farrukh Arslan

Acknowledgement: Project Partially Funded by Cyber Center

Objective and Goals

Objective: To address infectious disease surveillance challenges and develop a collaborative capability for all stakeholders to monitor and manage infectious disease outbreaks in large cities.

Approach: Develop a high performance computing (HPC) framework employing robust and novel infectious disease epidemiology models with real-time inference and pre/exercise planning capabilities.

Objective and Goals

Real-time data analysis capabilities, providing a model for infrastructure development in which lessons learned can be used to develop best-practice models

A comparative assessment of disease modeling techniques, focusing on the tradeoff between the level of granularity used in creating a model and the model's efficacy

Novel visual analytics paradigms integrating decision support and resource allocation tools with live streaming data and disease simulation scenarios


Conceptual View of Proposed Infectious Disease Surveillance Framework

Tasks:

Task A: Data Intensive Multi-Resolution Simulation Modeling

Task B: High Performance Simulation Modeling on Hadoop

Task A: Initial Research Results

Challenge: The notion of context is important for syndromic surveillance. For a syndromic data set we need:
• Contextual attributes
• Behavioral attributes

We have proposed an HPC data mining framework for contextual and behavioral attributes using a Syndrome Ontology (Assumption: domain knowledge is available).

Currently pursuing system implementation using WEKA: Machine Learning & Data Mining in Java (http://www.cs.waikato.ac.nz/ml/weka/index.html).

Task A: Data Intensive Multi-Resolution Simulation Modeling (initial results)


• Proposed an HPC framework for mining contextual (e.g., spatio-temporal) and behavioral attributes using a Syndrome Ontology.
• Domain knowledge is available through a domain ontology.


Ontological Syndromic and Climate Classifiers

Exploration of decision trees spanning distributed multiple domains, representing semantic knowledge at the temporal, spatial, and socio-economic levels.
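As a rough illustration of how a decision-tree learner decides which attribute (temporal, spatial, or socio-economic) to split on, the sketch below computes ID3-style information gain. This is a generic textbook technique, not necessarily the classifier used in this project; the attribute names (`season`, `district`, `income`) and the `syndrome` labels in the toy records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` after splitting `records` on `attribute`."""
    base = entropy([r[target] for r in records])
    # Group the target labels by the value of the candidate splitting attribute.
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(part) / len(records) * entropy(part)
                    for part in partitions.values())
    return base - remainder


def best_split(records, attributes, target):
    """Attribute with the highest information gain (the ID3 split choice)."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


if __name__ == "__main__":
    # Hypothetical toy records mixing temporal, spatial and socio-economic attributes.
    records = [
        {"season": "winter", "district": "A", "income": "low", "syndrome": "ILI"},
        {"season": "winter", "district": "B", "income": "high", "syndrome": "ILI"},
        {"season": "summer", "district": "A", "income": "low", "syndrome": "GI"},
        {"season": "summer", "district": "B", "income": "high", "syndrome": "GI"},
    ]
    attrs = ["season", "district", "income"]
    for a in attrs:
        print(a, information_gain(records, a, "syndrome"))
    print("best split:", best_split(records, attrs, "syndrome"))
```

In the toy data, `season` separates the two syndrome classes perfectly, so it receives the maximum gain and would become the root split.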


Patient ID | Date | Age | Gender | Location | Chief Complaints
9398 | 1/10/11 | 20 | Female | Kot Begum | Flu
10816 | 1/14/11 | 24 | Male | Faisal Park | Chills
1491 | 1/27/11 | 28 | Male | Bhamman | Bodyaches
16237 | 2/1/11 | 20 | Female | Chah Miran | Anxiety

CoCo Classifier
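The sample records above split naturally into contextual attributes (date, location, age, gender) and a behavioral attribute (chief complaint). The sketch below parses the four rows from the table and maps each complaint to a syndrome class. The `SYNDROME_ONTOLOGY` dictionary is a hypothetical stand-in for the project's Syndrome Ontology, whose actual categories are not given, and the M/D/YY date format is an assumption (consistent with the "1/27/11" row).

```python
from dataclasses import dataclass
from datetime import date, datetime

# Hypothetical stand-in for the Syndrome Ontology: chief complaint -> syndrome class.
SYNDROME_ONTOLOGY = {
    "flu": "influenza-like illness",
    "chills": "influenza-like illness",
    "bodyaches": "influenza-like illness",
    "anxiety": "neurological/psychiatric",
}


@dataclass
class SyndromicRecord:
    patient_id: int
    # Contextual (spatio-temporal and demographic) attributes.
    visit_date: date
    age: int
    gender: str
    location: str
    # Behavioral attribute.
    chief_complaint: str

    def contextual(self):
        """Contextual attributes as a dictionary."""
        return {"date": self.visit_date, "age": self.age,
                "gender": self.gender, "location": self.location}

    def syndrome(self):
        """Map the behavioral attribute onto the (hypothetical) ontology."""
        return SYNDROME_ONTOLOGY.get(self.chief_complaint.lower(), "unclassified")


def parse_row(row):
    """Parse one table row; dates are assumed to be M/D/YY."""
    pid, d, age, gender, location, complaint = row
    visit = datetime.strptime(d, "%m/%d/%y").date()
    return SyndromicRecord(int(pid), visit, int(age), gender, location, complaint)


# Sample rows from the table above.
ROWS = [
    ("9398", "1/10/11", "20", "Female", "Kot Begum", "Flu"),
    ("10816", "1/14/11", "24", "Male", "Faisal Park", "Chills"),
    ("1491", "1/27/11", "28", "Male", "Bhamman", "Bodyaches"),
    ("16237", "2/1/11", "20", "Female", "Chah Miran", "Anxiety"),
]

if __name__ == "__main__":
    for rec in map(parse_row, ROWS):
        print(rec.patient_id, rec.contextual()["location"], rec.syndrome())
```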


Epidemic Spread Visualization


Developing a Novel Statistical Heterogeneous Agent-Based SIR Model

• Adding age-based and gender-based classification
• Demographic impacts on spread rate (socioeconomic classification)
• Capturing seasonal trends of disease spread
• Effect of decision making considering preventive measures (inoculation of the population, healthcare resource allocation)
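Two of the items above, seasonal trends and inoculation, can be illustrated with a minimal homogeneous SIR sketch. The sinusoidal forcing β(t) = β0(1 + a·cos(2πt/365)) and the choice to vaccinate a fixed fraction of susceptibles before the outbreak (moving them from S to R) are common modeling assumptions, not details taken from the slides. Here `beta0` is assumed to be an effective daily transmission rate (contact rate times per-contact probability), and all parameter values are hypothetical.

```python
import math


def seasonal_beta(t, beta0, amplitude, period=365.0, peak_day=0.0):
    """Transmission rate with assumed sinusoidal seasonal forcing, peaking at peak_day."""
    return beta0 * (1.0 + amplitude * math.cos(2.0 * math.pi * (t - peak_day) / period))


def simulate_sir(n, i0, beta0, d, days, amplitude=0.0, vaccinated_fraction=0.0, dt=1.0):
    """Euler-stepped homogeneous SIR with seasonal forcing and up-front inoculation.

    A fraction of the initially susceptible population is vaccinated (moved from
    S directly to R) before the outbreak starts. Returns [(t, S, I, R), ...].
    """
    upsilon = 1.0 / d  # recovery rate, 1/d
    s = (n - i0) * (1.0 - vaccinated_fraction)
    r = (n - i0) * vaccinated_fraction
    i = float(i0)
    history = [(0.0, s, i, r)]
    for k in range(int(days / dt)):
        t = k * dt
        beta = seasonal_beta(t, beta0, amplitude)
        # Cap flows so no compartment goes negative on a coarse time step.
        new_infections = min(beta * s * i / n * dt, s)
        new_recoveries = min(upsilon * i * dt, i)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((t + dt, s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected individuals."""
    return max(row[2] for row in history)


if __name__ == "__main__":
    base = simulate_sir(n=100_000, i0=10, beta0=0.3, d=5, days=365, amplitude=0.2)
    vacc = simulate_sir(n=100_000, i0=10, beta0=0.3, d=5, days=365, amplitude=0.2,
                        vaccinated_fraction=0.4)
    print("peak infected, no inoculation:", round(peak_infected(base)))
    print("peak infected, 40% inoculated:", round(peak_infected(vacc)))
```

Comparing the two runs is the kind of "what-if" analysis the preventive-measures item refers to; a real study would calibrate the forcing and coverage to surveillance data.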

Components of Proposed HPC Hadoop Platform

Figure: Architecture diagram of the proposed HPC Hadoop platform. Recoverable component labels:
• Cloud Platform as a Service + Support Services (Storage, DB, Security, Aggregation)
• Hadoop HPC Hardware
• Virtualization Cloud Infrastructure Management
• Multi-Tenant, Deployment & Cloud Cluster Management
• Service Development Platform
• Hadoop HPC Infrastructure
• Data Filtering, Anonymization and Ingestion
• Statistical Data Analytics for Health Forecast
• Real-Time On-Line Data Mining for Syndromic Surveillance
• ID Spread Simulation
• Visual Analytics Environment for Web-Based Real-Time HPC Distributed Services
• ID Databank
• Real-time networked data streams
• Decision Support Sub-System
Component legend: To be developed; Open-source; Partially developed
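The "Data Filtering, Anonymization and Ingestion" component feeds Hadoop-based analytics. The sketch below shows one possible shape for that step: pseudonymize patient IDs with a salted SHA-256 hash, coarsen age into 10-year bands, then count records per (location, complaint) using the map, shuffle, and reduce phases of the MapReduce model. The salt value, the choice of fields, and the counting key are hypothetical. The shuffle is simulated with an in-memory sort; on a cluster, mapper and reducer logic of this shape could run under Hadoop MapReduce, for example via Hadoop Streaming.

```python
import hashlib
from itertools import groupby
from operator import itemgetter

SALT = b"replace-with-a-secret-salt"  # hypothetical; keep real salts out of source code


def anonymize(record):
    """Pseudonymize the patient ID and coarsen age to a 10-year band."""
    pid, visit_date, age, gender, location, complaint = record
    token = hashlib.sha256(SALT + pid.encode()).hexdigest()[:16]
    low = (int(age) // 10) * 10
    return (token, visit_date, f"{low}-{low + 9}", gender, location, complaint)


def mapper(record):
    """Emit ((location, complaint), 1) for one anonymized record."""
    _, _, _, _, location, complaint = record
    yield (location, complaint.lower()), 1


def reducer(key, values):
    """Sum the counts emitted for one key."""
    yield key, sum(values)


def run_local_mapreduce(records):
    """In-process map -> shuffle/sort -> reduce, mimicking the MapReduce data flow."""
    mapped = [kv for rec in records for kv in mapper(anonymize(rec))]
    mapped.sort(key=itemgetter(0))  # shuffle phase: bring equal keys together
    results = {}
    for key, group in groupby(mapped, key=itemgetter(0)):
        for k, total in reducer(key, (v for _, v in group)):
            results[k] = total
    return results


if __name__ == "__main__":
    rows = [
        ("9398", "1/10/11", "20", "Female", "Kot Begum", "Flu"),
        ("10816", "1/14/11", "24", "Male", "Faisal Park", "Chills"),
    ]
    print(run_local_mapreduce(rows))
```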

Task B: High Performance Simulation Modeling on Hadoop (in progress)

Objective: Development of agent-based and multi-granularity homogeneous mixing models for HPC-based simulation.

Task B: High Performance Simulation Modeling on Hadoop

Development of Agent-Based SIR Model for Heterogeneous Networks

Simulation-Based Disease Spread Behavior

Analysis of Decision Making for Preventive Measures

SIR IN HETEROGENEOUS NETWORKS

Each node can be in one of three states: Susceptible, Infected, or Recovered (S, I, R).

Once infected, a node can transmit infection to each neighboring susceptible node with probability β.

Infected nodes stay infected for a duration d; the recovery rate of infected nodes, υ, is therefore 1/d.

The susceptibility of an individual may vary depending upon the number of infected neighbors.

Within a group interaction:

Figure: State diagram of SIR Model

β: probability of getting disease during a contact

d: duration of infection

υ: recovery rate (1/d)

N: Total Population
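Under the assumptions in the bullets above (each infected neighbor transmits independently with probability β; infection lasts d), two standard relations follow. The first gives the per-contact-step infection probability of a susceptible node with k infected neighbors, which is why susceptibility grows with the number of infected neighbors. The second is the textbook homogeneous-mixing SIR system within a single group. It introduces c, an assumed average contact rate per unit time, so that β keeps its meaning as a per-contact probability. These are offered as a sketch and are not necessarily the authors' own formulation.

```latex
P(\text{infection} \mid k \text{ infected neighbors}) = 1 - (1 - \beta)^{k},
\qquad \upsilon = \frac{1}{d}

\frac{dS}{dt} = -\beta c \, \frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta c \, \frac{S I}{N} - \upsilon I, \qquad
\frac{dR}{dt} = \upsilon I, \qquad
S + I + R = N
```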

SOCIAL NETWORK MODELING FOR PREDICTION & MANAGEMENT OF EPIDEMICS

Development of an agent-based social networking model to simulate infectious disease spread

The population is divided into groups by age, gender, occupation, and location; individuals tend to interact preferentially within their own group, a phenomenon known as assortative mixing.

The distribution of contacts plays a key role in determining the onset of the expansion phase of an epidemic.
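A minimal agent-based sketch of these ideas: build a random contact network in which same-group pairs connect with higher probability than cross-group pairs (a simple stochastic-block construction standing in for assortative mixing), then run a discrete-time SIR process on it. Each infected node independently infects each susceptible neighbor with probability β per step, so a node with k infected neighbors is infected with probability 1 - (1 - β)^k, and infection lasts d steps. The group labels, connection probabilities, and population size are hypothetical; this is not the project's NetLogo model.

```python
import random

S, I, R = "S", "I", "R"


def assortative_contact_graph(group_of, p_within, p_between, rng):
    """Random contact graph; same-group pairs connect with probability p_within.

    group_of: list mapping node index -> group label (e.g. an age/location class).
    Returns an adjacency list (list of sets).
    """
    n = len(group_of)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            p = p_within if group_of[u] == group_of[v] else p_between
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def _count(state):
    return (state.count(S), state.count(I), state.count(R))


def run_network_sir(adj, initially_infected, beta, d, rng, max_steps=1000):
    """Discrete-time agent-based SIR on a contact network.

    Returns per-step (S, I, R) counts, starting with the initial state.
    """
    n = len(adj)
    state = [S] * n
    days_left = [0] * n
    for u in initially_infected:
        state[u], days_left[u] = I, d
    counts = [_count(state)]
    for _ in range(max_steps):
        if I not in state:
            break
        # Independent transmission attempts along every infected-susceptible edge.
        newly_infected = set()
        for u in range(n):
            if state[u] == I:
                for v in adj[u]:
                    if state[v] == S and rng.random() < beta:
                        newly_infected.add(v)
        # Advance the infection clock of nodes that were already infected.
        for u in range(n):
            if state[u] == I:
                days_left[u] -= 1
                if days_left[u] == 0:
                    state[u] = R
        for v in newly_infected:
            state[v], days_left[v] = I, d
        counts.append(_count(state))
    return counts


if __name__ == "__main__":
    rng = random.Random(42)
    groups = [i % 4 for i in range(400)]  # four hypothetical demographic groups
    adj = assortative_contact_graph(groups, p_within=0.05, p_between=0.005, rng=rng)
    history = run_network_sir(adj, initially_infected=[0], beta=0.1, d=5, rng=rng)
    print("steps:", len(history) - 1, "final (S, I, R):", history[-1])
```

Raising `p_between` relative to `p_within` mixes the groups more evenly, which is one way to explore how the contact distribution affects when the epidemic enters its expansion phase.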

Population Classification Attributes

HETEROGENEOUS GRAPH MODEL FOR MULTI-GROUP POPULATION INTERACTION
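A multi-group population can also be modeled at coarser granularity as one set of S, I, R compartments per group, coupled by a contact matrix: dS_i/dt = -β S_i Σ_j C_ij I_j / N_j, where C_ij is the mean number of daily contacts a member of group i has with group j. This is the standard structured-population SIR, offered here as a sketch; the group sizes and contact matrix in the example are hypothetical, and the matrix is chosen to be reciprocal (C_ij N_i = C_ji N_j).

```python
def multigroup_sir(N, I0, C, beta, d, days, dt=0.1):
    """Euler integration of an SIR model with one compartment set per group.

    N, I0: population and initial number infected per group.
    C[i][j]: assumed mean daily contacts of a group-i member with group j.
    beta: per-contact transmission probability; d: infectious duration (days).
    Returns the final (S, I, R) lists, one entry per group.
    """
    g = len(N)
    upsilon = 1.0 / d  # recovery rate
    S = [N[k] - I0[k] for k in range(g)]
    I = [float(x) for x in I0]
    R = [0.0] * g
    for _ in range(int(days / dt)):
        # Force of infection on group i from the prevalence in every group j.
        lam = [beta * sum(C[i][j] * I[j] / N[j] for j in range(g)) for i in range(g)]
        new_inf = [min(lam[i] * S[i] * dt, S[i]) for i in range(g)]
        new_rec = [upsilon * I[i] * dt for i in range(g)]
        for i in range(g):
            S[i] -= new_inf[i]
            I[i] += new_inf[i] - new_rec[i]
            R[i] += new_rec[i]
    return S, I, R


if __name__ == "__main__":
    N = [30_000, 70_000]            # hypothetical group sizes (e.g. children, adults)
    C = [[12.0, 7.0], [3.0, 6.0]]   # hypothetical; 7 * 30000 == 3 * 70000 (reciprocal)
    S, I, R = multigroup_sir(N, I0=[10, 0], C=C, beta=0.03, d=5, days=200)
    for k in range(len(N)):
        print(f"group {k}: attack rate {R[k] / N[k]:.2%}")
```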

CURRENT STATUS

Development of heterogeneous models and evaluation of their fidelity; simulation in NetLogo

Simulation Objectives
• Effect of demographic properties
• Effect of weather on epidemic disease spread and seasonal trends
• Effect of pharmaceutical and other decision measures on epidemic spread

Summary and Status

Proposed an HPC-based data mining framework for contextual and behavioral attributes using a Syndrome Ontology (Assumption: domain knowledge is available). Currently pursuing system implementation using WEKA: Machine Learning & Data Mining in Java.

Development of an agent-based SIR model of heterogeneous populations for HPC-based simulation of large cities (in progress).

Proposal (in preparation): Gates Foundation Grand Challenges Explorations for Global Health

Potential collaboration with MSR