Stochastic Hybrid Systems Modeling & Middleware-enabled … · 2015-10-01 · Stochastic Hybrid...

Stochastic Hybrid Systems Modeling &

Middleware-enabled DDDAS for Next-

generation US Air Force Systems

FA9550-13-1-0227

Acknowledgments: Dr. Frederica Darema

Aniruddha Gokhale Associate Professor, Dept of EECS &

Institute for Software Integrated Systems

Vanderbilt University, Nashville, TN, USA

Email: [email protected]

PI Meeting, Dec 1-3, 2014

IBM T. J. Watson Center, Yorktown Heights, NY

Role of Systems Software in DDDAS

• DDDAS provides symbiotic feedback

control between the application

instrumentation and its simulation to

steer a system on its trajectory

• Systems software enables dynamic

resource provisioning for model

learning, model execution and

dynamic instrumentation among

other things

[Figure: Ship-wide QoS Doctrine & Readiness Display. Labels: TBMD Application, AAW Application, Control Algorithm, Control Vars., Local Middleware, Global Middleware, Requested QoS, Measured QoS. Managed resources: network latency & bandwidth, workload & replicas, CPU & memory, connections & priority bands.]

Applications Modeling & Systems Software (AMASS)

AMASS targets DDDAS Applications Modeling & Systems Software

areas

Leverages prior work on

instrumentation, statistical

algorithms, & adaptive

software infrastructure for

Naval combat systems

• Prof. Aniruddha Gokhale (PI), Prof. Xenofon

Koutsoukos and Prof. Douglas Schmidt (Co-PIs)

• Students

• Faruk Caglar

• Shashank Shekhar

• Shweta Khare

• Michael Walker

• Violetta Vylegzhanina

• Anirban Bhattacharjee

• Hamzah Abdelaziz

• Kyoungho An

• Other collaborators

• Dr. Sumant Tambe (RTI)

• Dr. Abhishek Dubey, Dr. William Otte, Dr. Nilabja Roy (All VU)


Our Team

• Prior projects with AFRL

• We have worked with Steven Drager and William

McKeever on Software Producibility projects

• Aniruddha Gokhale did the summer faculty program

in 2009

• Some publications jointly authored with AFRL

program managers

• Doug Schmidt has served on scientific

advisory board


Team Experience working with Air Force


AMASS Focus Areas

• Applications modeling using stochastic hybrid modeling for model building & anytime algorithms for incremental model refinement

• Systems software comprising dynamic resource management, deployment & configuration, & online model updates to support distributed & real-time model execution & control

[Figure: DDDAS Modeling & Infrastructure Architecture. Labels: Operationalized DoD System (traditional control loop; sense, measurement, steer); Collection of Running DoD System Models; Decision Support System; DDDAS Online Model Learning & Model Generation Environment; Model Repository (runtime query & retrieval); DDDAS Systems Software (Dynamic Resource Management; Model Update & Deployment and Configuration; Model Fidelity; online updates; (re)instrument); Heterogeneous Multi-layered Resource Platforms for Model Executions (Data Center).]

Team Interaction

• Weekly meeting

• Redmine-based project management

• Meeting notes on project wiki page

• Git version control for software and publications

• Project started in Sept 2013

• Year 1 Accomplishments

• Cloud-focused resource management for satisfying

DDDAS applications QoS (e.g., required for DDDAS

simulations, model learning), and real-time stream

processing

• Demonstrated one simple end-to-end scenario that uses

stochastic models and simulations, and resource

management using lightweight virtualization

• Publications, a doctoral student PhD proposal defense

• Year 2 Plans

• Mobile device-based instrumentation, real-time stream processing, resource management across a spectrum of resources, model learning

• Collaborate with DDDAS teams for application use cases

Summary of Contributions

Summary of Publications (1/2)

Journal

• Shashank Shekhar, Hamza Abdelaziz, Michael Walker, Faruk Caglar, Aniruddha

Gokhale, and Xenofon Koutsoukos, “A Simulation as a Service Cloud Middleware,”

The Springer Journal of Annals of Telecommunications, 2014 (in submission).

• Faruk Caglar and Aniruddha Gokhale, “iOverbook: Intelligent Resource-Overbooking

to Support Soft Real-time Applications in the Cloud,” 7th International Conference on

Cloud Computing (IEEECloud), Alaska, USA, June 27, 2014 (invited to International

Journal of Cloud Computing)

Book Chapters

• Shashank Shekhar, Shweta Khare, Faruk Caglar, Aniruddha Gokhale, Douglas

Schmidt, and Xenofon Koutsoukos, “Middleware-enabled DDDAS,” Book Chapter in

Springer, 2014 (in submission).

Panel

• Aniruddha Gokhale, “Systems Software Challenges for InfoSymbiotics

Systems/DDDAS,” SuperComputing 2014 panel on InfoSymbiotic Systems/DDDAS,

New Orleans, LA, Nov 2014

Summary of Publications (2/2) Conference & Workshop Publications

• Faruk Caglar, Shashank Shekhar, and Aniruddha Gokhale. “iPlace: An Intelligent

and Tunable Power- and Performance-Aware Virtual Machine Placement Technique

for Cloud-based Real-time Applications,” 17th IEEE Symposium on

Object/Component/Service-oriented Real-time Distributed Computing (ISORC),

Reno, Nevada, USA, June 10, 2014

• Faruk Caglar and Aniruddha Gokhale, “iOverbook: Intelligent Resource-Overbooking

to Support Soft Real-time Applications in the Cloud,” 7th International Conference on

Cloud Computing (IEEECloud), Alaska, USA, June 27, 2014

• Faruk Caglar, Shashank Shekhar, and Aniruddha Gokhale. “iTune: Engineering the

Performance of Xen Hypervisor via Autonomous and Dynamic Scheduler

Reconfiguration,” (in submission)

• Faruk Caglar, Shashank Shekhar and Aniruddha Gokhale, “Towards a Performance

Interference-aware Virtual Machine Placement Strategy for Supporting Soft Real-

time Applications in the Cloud,” 3rd International Workshop on Real-time and

Distributed Computing in Emerging Applications (REACTION 2014), Rome, Italy,

Dec 2, 2014. (to appear)

• Shweta Khare, Kyoungho An, Aniruddha Gokhale and Sumant Tambe, “Functional

Reactive Stream Processing for Data-centric Publish/Subscribe,” Submitted to

IPDPS 2015, Hyderabad, India.

• DDDAS application simulations (and model learning

algorithms) require resources to execute


Area 1: Cloud-based Resource Mgmt

[Figure: a resource pool (e.g., data center) and its model, and a DDDAS application and its model, each pair linked by "instrument" and "control" arrows.]

• Simulations can execute in the cloud

• Applications have different QoS requirements

• Need resource management in the cloud data center

• Apply DDDAS principles to the cloud data center

• To achieve this vision, we need to instrument a data

center and obtain resource utilization information


Cloud Data Center Instrumentation: How to?

[Figure: Google data center info from 2011 and a model of the Google data center, and a DDDAS application and its model, each pair linked by "instrument" and "control" arrows.]

• We used a pre-instrumented trace log from Google

• Solved various resource management problems

• We leveraged cluster trace made available by Google for a period of 29 days in May 2011.

• Data is available for more than 12,000 host machines

• Data comprises machine events, machine attributes, jobs, tasks, constraints, and resource usage details.

• Resource usage data contains about 1.2 billion rows


Data from an Instrumented Data Center

[Figure: a model of the Google data center is built from the Google data center trace (May 2011) using machine learning techniques.]

Cloud Data Center Architecture

• Management and orchestration of the cloud environment

• Delivery of cloud-based applications and services

• Virtual machine management on top of host machines

Focus so far is only on the compute resources; storage and I/O to be considered later

Challenge 1: Autonomous and Dynamic Scheduler Reconfiguration

• The virtualization layer includes a scheduling mechanism to share the physical CPU

• The scheduling mechanism is usually configured through certain parameters in the hypervisor

• Performance of an application running in the VM is directly impacted by the configuration

• Finding the optimum scheduling configuration is required

Solution to Challenge 1: iTune

iTune : An Intelligent and Autonomous Self-tuning

Middleware to Optimize the Scheduler Parameters of the

Virtualization Mechanism

• Method is applicable to all scheduling environments

• Specifically, we focus on Xen hypervisor

• Tunes the parameters of the default scheduler in the Xen

hypervisor, which is a credit-based CPU scheduler

• iTune retunes Xen’s credit scheduler parameters as the workload on the host machine changes

• Empirical results indicated that (1) CPU utilization, (2) CPU overbooking ratio, and (3) VM count are strong features for workload clustering.
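The workload-clustering idea can be illustrated with a minimal sketch. The slides do not name the clustering algorithm iTune uses, so plain k-means is an assumption here, and the sample values, the scaling of VM count, and the function name `kmeans` are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each sample to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Hypothetical host samples: (cpu_utilization, cpu_overbooking_ratio, vm_count / 10)
samples = [
    (0.20, 1.0, 0.2), (0.25, 1.1, 0.3), (0.22, 1.0, 0.2),   # lightly loaded
    (0.85, 2.0, 0.8), (0.90, 2.2, 0.9), (0.88, 2.1, 0.8),   # heavily loaded
]
centroids, labels = kmeans(samples, k=2)
```

In a setup like this, each cluster would map to one scheduler configuration, so a host would only be retuned when its workload moves to a different cluster.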

Challenge 2: Accommodating Multiple Tasks using Resource Overbooking

• Overbooking helps to increase energy efficiency and resource utilization.

• Common practice to make the business model more profitable (e.g., airlines, hotels, cell phone operators)

• How to systematically identify effective overbooking ratios?

Solution to Challenge 2: iOverbook

iOverbook : Intelligent Resource-Overbooking to Support

Soft Real-time Applications in the Cloud

Machine learning approach to making systematic and

online determination of overbooking ratios.

Utilizes historic data of tasks and host machines in the

cloud

Extracts their resource usage patterns

Predicts future resource usage and expected mean

performance of host machines.

Used cluster trace log released by Google.
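A rough sketch of how an overbooking ratio could be derived from usage history follows. It is not the iOverbook algorithm: the slides describe a machine learning predictor, whereas this example swaps in an exponentially weighted moving average, and the function names, the 0.8 target utilization, and the cap of 4.0 are illustrative assumptions.

```python
def predict_usage_fraction(history, alpha=0.5):
    """EWMA of used/reserved fractions (a simple stand-in for a learned predictor)."""
    estimate = history[0]
    for observed in history[1:]:
        estimate = alpha * observed + (1 - alpha) * estimate
    return estimate

def overbooking_ratio(history, target_utilization=0.8, max_ratio=4.0):
    """Ratio of reserved capacity to physical capacity that a host could accept.

    If tasks are predicted to use only a fraction u of what they reserve,
    reservations of up to target_utilization / u times the physical capacity
    keep expected real usage at the target. The result is clamped to [1, max_ratio].
    """
    predicted = predict_usage_fraction(history)
    if predicted <= 0:
        return max_ratio
    return min(max_ratio, max(1.0, target_utilization / predicted))

# Hypothetical history: tasks on this host used about 40% of their reservations.
ratio = overbooking_ratio([0.4, 0.4, 0.4])
```

With the hypothetical history above, the host could accept reservations of roughly twice its physical CPU capacity while keeping expected utilization near 80%.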

Challenge 3: Power- and Performance-aware VM Placement

• Virtual machines are migrated in the data center to tolerate faults, balance workload, eliminate hotspots, etc.

• Power and performance tradeoffs are critical concerns faced by cloud service providers (CSPs)

• How to find the aptly suited host machine for power- and performance-aware VM placement?

Challenge 4: Performance Interference Effects on App Performance

• Analyzing the performance anomalies

• Cloud systems are multi-tenant

• CSPs overbook physical system resources

• Resource overbooking and noisy neighbors can lead to performance interference and anomalies among VMs

• How to predict the performance interference and the faults that may occur before a VM placement decision is made?

Solutions to Challenges 3 & 4

iPlace: An Intelligent and Tunable Power- and Performance-aware Virtual Machine Placement Middleware

• The goal of iPlace is to find an aptly suited host machine by carefully considering the energy efficiency of the data center and the performance requirements of soft real-time applications.

• Placement decision is based on power changes and performance effects on the applications

hALT: harmonious Art of Living Together

• Performance Interference-aware Virtual Machine Placement Strategy for Supporting Soft Real-time Applications in the Cloud

• hALT extracts the best VM collocation patterns by utilizing features such as CPU, memory usage, and performance.

• hALT assumes that CSPs overbook their underlying cloud infrastructure to save energy costs.
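A tunable power/performance trade-off of the kind described for iPlace might be sketched as a weighted cost over candidate hosts. This is only an illustration under stated assumptions: the `Host` fields, the normalization, and the single `power_weight` knob are hypothetical, and the per-host estimates would come from learned models that are not shown here.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    power_increase_watts: float  # estimated extra power if the VM lands here
    predicted_slowdown: float    # estimated relative slowdown (0.0 = none)

def choose_host(hosts, power_weight=0.5):
    """Pick the host with the lowest weighted, normalized power/performance cost."""
    max_power = max(h.power_increase_watts for h in hosts) or 1.0
    max_slowdown = max(h.predicted_slowdown for h in hosts) or 1.0

    def cost(host):
        return (power_weight * host.power_increase_watts / max_power
                + (1 - power_weight) * host.predicted_slowdown / max_slowdown)

    return min(hosts, key=cost)

# Hypothetical candidates: A is cheap on power but crowded, B is idle but costly.
candidates = [
    Host("A", power_increase_watts=10.0, predicted_slowdown=0.30),
    Host("B", power_increase_watts=40.0, predicted_slowdown=0.05),
]
energy_first = choose_host(candidates, power_weight=0.9)
latency_first = choose_host(candidates, power_weight=0.1)
```

Moving `power_weight` toward 1 favors consolidation on already-busy hosts, while moving it toward 0 favors hosts where interference with soft real-time applications is predicted to be low.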

Problem

• Instrumented data must be processed on-the-fly

• Must handle dynamic changes in incoming sources of

streams

• Should be able to use archived data (history)

Solution

• We are using real-time stream processing

• Combining the power of real-time publish/subscribe

(i.e., sources/sinks of info) with reactive programming

• Achieve scale-out (pub/sub) and scale-up (reactive)

• Concretely, we combined the Data Distribution Service

(DDS) with .NET Reactive extensions (Rx.NET)
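The reactive style of composing operators over a pushed stream can be sketched as follows. The project combines DDS with Rx.NET; this example uses neither and is a stdlib-only Python stand-in, so the `Stream` class, its operators, and the sample readings are all hypothetical.

```python
class Stream:
    """Minimal push-based stream: operators return new streams."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, item):
        for callback in self._subscribers:
            callback(item)

    def map(self, fn):
        out = Stream()
        self.subscribe(lambda item: out.publish(fn(item)))
        return out

    def filter(self, predicate):
        out = Stream()
        self.subscribe(lambda item: out.publish(item) if predicate(item) else None)
        return out

    def window_mean(self, size):
        """Emit the mean of every `size` consecutive items (tumbling window)."""
        out, buffer = Stream(), []

        def on_item(item):
            buffer.append(item)
            if len(buffer) == size:
                out.publish(sum(buffer) / size)
                buffer.clear()

        self.subscribe(on_item)
        return out

# Hypothetical pipeline: flag windows whose mean CPU reading exceeds 0.8.
readings = Stream()
alerts = []
(readings.map(lambda sample: sample["cpu"])
         .window_mean(3)
         .filter(lambda mean: mean > 0.8)
         .subscribe(alerts.append))

for cpu in [0.5, 0.6, 0.7, 0.9, 0.95, 0.85]:
    readings.publish({"host": "h1", "cpu": cpu})
```

In the architecture described above, the publish side would be played by pub/sub topics (scale-out across sources and sinks), and the operator chain by the reactive library (scale-up within a node).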


Area 2: Stream Processing for Model Learning

[Figure: SIMaaS Cloud Middleware. A SIMaaS Manager (SM) with a Container Manager (CM), Result Aggregator (RA), and Performance Monitor (PM) manages a simulation cloud made up of a host cluster of Docker hosts (Docker Host 1 ... Docker Host n), each running multiple simulation containers.]

Area 3: Simulation-as-a-Service

• Middleware to support “Simulation-as-a-Service” for users to

host their simulations (e.g., DDDAS application simulations)

• Stochastic Physics model of heating of a building – large

number of parallel simulations are executed

• Resource management using Docker containers

• Virtual machines were deemed too heavyweight
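The pattern of running many independent stochastic simulations and aggregating their results can be sketched as below. This is a toy under loud assumptions: the room-temperature model, all of its constants, and the function names are hypothetical, and a thread pool stands in for the Docker containers that the middleware actually schedules.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor

def simulate_room(seed, steps=600, dt=1.0):
    """One Euler-Maruyama run of a hypothetical stochastic room-heating model."""
    rng = random.Random(seed)          # per-run seed keeps runs reproducible
    temp, outside = 15.0, 5.0          # degrees Celsius (assumed values)
    loss_rate, heater_gain, noise = 0.002, 0.05, 0.02
    for _ in range(steps):
        drift = loss_rate * (outside - temp) + heater_gain
        temp += drift * dt + noise * rng.gauss(0.0, 1.0) * dt ** 0.5
    return temp

def run_batch(num_runs, workers=4):
    """Run independent simulations in parallel and aggregate the final temperatures."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(simulate_room, range(num_runs)))
    return statistics.mean(results), statistics.stdev(results)

mean_temp, spread = run_batch(50)
```

Here `run_batch` plays the role of the result aggregator: each run is independent, so the batch can be spread over however many workers (or containers) are available.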

Emerging Context for DDDAS

• No longer a single system that

needs to be steered but rather

need to steer multiple systems

simultaneously

• Requires trade-offs

• Deal with uncertainty

• Large-scale Big Data and Large-

scale Big Computation

• Adaptive traffic lights, street lights

• Multiple interconnected

systems (systems of

systems)

• Emergence of Internet of

Things (IoT) (and variants)

• Multiple phases needing DDDAS systems s/w

• Feedback and adaptation among the different

phases and artifacts of system software


New Responsibilities for DDDAS Systems S/W

[Figure: DDDAS model simulation, with systems software responsibilities around it: dynamic adaptation in sensing, dynamic discovery of info sources, scalable and real-time stream processing, and provisioning of resources for learning and simulation.]

• This is the vision we started with


Lessons Learned & Our Needs

[Figure: a resource pool (e.g., data center) and its model, and a DDDAS application and its model, each pair linked by "instrument" and "control" arrows.]

• But we don’t have real DDDAS applications; rather, we use emulated applications from the Google trace

• We need to use the DDDAS community’s models as our workloads

• Understand how our solutions will work with these real applications; develop new solutions

• DDDAS Applications Community

• Utilize the application simulation models and execute them

on our cloud to create a realistic scenario of workloads

• Spoken to multiple DDDAS Applications researchers for their

applications (Yuri, Richard, Alok, Eric); more synergies

sought

• Other synergies are possible, e.g., in model learning

• DDDAS Systems Community

• Combine our work with security, parallel processing

• Spoken to systems researchers (Salim, Sanjay, Vaidy) and

utilizing mobile test bed (Shuvra)

• Industry and Govt agencies

• e.g., IBM’s work in events, stream processing, IoT

• AFRL’s work in live DBMS (communicated with Alex and

Erik)

Collaboration Opportunities

• Use DDDAS application case studies for resource

management

• Instrumentation using mobile devices

• Executing simulation models across range of

devices – not just a cloud data center

• Real-time stream processing with reactive

extensions

• Dynamic resource management

• Dynamic offloading from mobile devices

• Just-in-time resource provisioning

• Model learning

• Stochastic hybrid systems modeling

• Submitted DURIP and NSF/AFOSR proposals

Ongoing and Future Focus

• IPDPS Workshop

• IPDPS 2015 organizers have agreed to provide us

a ½ day workshop slot on the last day of

conference

• Suggest ideas, get community ideas on workshop

theme

• Participation from the community

• GPCE 2015 Conference with SPLASH

• Generative programming conference

• Gokhale is serving as program chair

• If you have ideas, please submit (due date in June,

conference in Oct in Pittsburgh, PA)

• Looking for volunteers to serve on program

committee

Upcoming Events of Interest

• Modeling and Systems Software project for

DDDAS

• Completed one year and 2 months

• Initial focus

• resource management in the cloud data center using

emulated workloads and pre-instrumented data centers

• Initial ideas on real-time stream processing for model

learning

• End-to-end scenario

• Focus for years 2 and 3

• Involve mobile entities, instrumentation, model learning

• Collaborate with DDDAS community for real workloads,

testbeds, etc


Concluding Remarks


•Thank You

•Questions
