Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Transcript of Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Aniruddha Gokhale [email protected]
Asst. Professor of EECS, Vanderbilt
University, Nashville, TN
Swapna Gokhale [email protected]
Asst. Professor of CSE, University of
Connecticut, Storrs, CT
Presented at NSWC
Dahlgren, VA
April 13, 2005
2
Focus: Distributed Performance-Sensitive Software (DPSS) Systems
Military/civilian distributed performance-sensitive software systems
• Network-centric & larger-scale "systems of systems"
• Stringent simultaneous QoS demands, e.g., dependability, security, scalability, throughput
• Dynamic context
3
Context: Trends in DPSS Development
Historically developed using low-level APIs
Increasing use of middleware technologies
Standards-based COTS middleware helps to:
– Control end-to-end resources & QoS
– Leverage hardware & software technology advances
– Evolve to new environments & requirements
Middleware helps capture & codify commonalities across applications in different domains by providing reusable & configurable patterns-based building blocks
Examples: CORBA, .NET, J2EE, ICE, MQSeries
Patterns: Gang of Four, POSA 1, 2 & 3
Key Observation
• DPSS systems are composed of patterns-based building blocks
• Observed quality of service depends on the right composition
4
Talk Outline
1. Motivation
2. Use of Performance Analysis Methods for System Design
3. Use of Performance Analysis Methods for Improving Cyber Trust
4. Planned Future Work
5. Concluding Remarks
5
Problem 1: Variability in Middleware
Although middleware provides reusable building blocks that capture commonalities, these blocks and their compositions incur variabilities that impact performance in significant ways.
Per-building-block variability
– Incurred due to variations in implementations & configurations of a patterns-based building block
– E.g., the single-threaded versus thread-pool-based reactor implementation dimension, which crosscuts the event demultiplexing strategy (e.g., select, poll, WaitForMultipleObjects)
Compositional variability
– Incurred due to variations in the compositions of these building blocks
– Need to address compatibility in the compositions and individual configurations
– Dictated by needs of the domain
– E.g., Leader/Followers makes no sense with a single-threaded Reactor
[Figure: Reactor variability dimensions: event demultiplexing strategy (select, poll, WaitForMultipleObjects, Qt, Tk) crosscut by event handling strategy (single threaded, thread pool)]
6
Solution Approach: Applying Performance Analytical Models for DPSS Design
Build and validate performance models for invariant parts of middleware building blocks
Weaving of variability concerns manifested in a building block into the performance models
Compose and validate performance models of building blocks mirroring the anticipated software design of DPSS systems
Estimate end-to-end performance of composed system
Iterate until design meets performance requirements
Applying design-time performance analysis techniques to estimate the impact of variability in middleware-based DPSS systems
[Figure: variability concerns are woven into the invariant model of each pattern to produce refined pattern models; the refined models, together with workload inputs, are composed into a performance model of the composed system]
7
Problem 2: Benign/Intentional Disruptions
Terrorist threats and/or malicious users can bring down a cyber infrastructure
Normal failures (hardware and software) could also disrupt the cyber infrastructure
Existing disruption detection techniques use low-level trace data that is agnostic about the application
Application-specific disruption detection is expensive
Need a reusable middleware-based solution
8
Solution Approach: Design-Time Performance Analysis for Disruption Detection
• Identify the service profile, which consists of the modes of operation of the service that uses the building block.
• Estimate the likelihood or occurrence probabilities of each mode of operation.
• Estimate the values of the input parameters for each mode of operation.
• Obtain the values of the performance metrics for each mode of operation by solving the SRN model.
• Compute the expected estimates of the performance metrics as the weighted sum of the per-mode performance metrics, with weights given by the occurrence probabilities of each mode (see the formula below).
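The final step is a standard expected-value computation. As a sketch: if the service profile has modes k = 1, …, K with occurrence probabilities p_k, and a performance metric evaluates to M_k when the SRN is solved with mode k's input parameters, the expected estimate is

```latex
\bar{M} \;=\; \sum_{k=1}^{K} p_k \, M_k , \qquad \text{where } \sum_{k=1}^{K} p_k = 1 .
```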
9
Algorithm: Design-Time Performance Analysis for Disruption Detection
1. Compute performance metrics for each observation window.
2. Summarize the performance metrics over several past observation windows using an exponential moving average.
3. The approximate weight of each window is determined by the smoothing constant.
4. Compute an anomaly score for each performance metric using the chi-square test.
5. Use a Bayesian network to correlate the anomaly scores computed from each performance metric into an overall anomaly score for the building block as a whole.
6. Correlate the anomaly scores of different building blocks, possibly residing at different layers, to obtain the anomaly score of the service.
7. Hierarchical correlation of anomaly scores reduces false positives.
A sketch of steps 2–5 follows.
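A minimal sketch of steps 2–5, assuming an illustrative smoothing constant of 0.2 and a weighted-sum stand-in for the Bayesian-network correlation; MetricMonitor, block_score, and all values here are assumptions for illustration, not the authors' implementation:

```python
# Sketch of steps 2-5: EWMA summary of past windows plus a chi-square-style
# anomaly score per metric. The Bayesian-network correlation of step 5 is
# approximated here by a simple weighted sum.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricMonitor:
    alpha: float = 0.2              # smoothing constant (assumed value)
    ewma: Optional[float] = None    # EWMA over past observation windows

    def update(self, observed: float) -> float:
        """Fold in one observation window; return its anomaly score."""
        if self.ewma is None:       # first window: no baseline yet
            self.ewma = observed
            return 0.0
        expected = self.ewma
        # One-cell chi-square statistic: grows as the window deviates
        # from the value predicted by the moving average.
        score = (observed - expected) ** 2 / max(expected, 1e-9)
        # Recent windows get weight alpha; older ones decay geometrically.
        self.ewma = self.alpha * observed + (1 - self.alpha) * expected
        return score

def block_score(scores: dict, weights: dict) -> float:
    """Stand-in for step 5's Bayesian-network correlation."""
    return sum(weights[m] * s for m, s in scores.items())

throughput_monitor = MetricMonitor()
for window in [0.40, 0.41, 0.39, 0.90]:     # last window is anomalous
    print(throughput_monitor.update(window))
```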
10
Algorithm: Runtime Requirements
Context
Low-level data logging is application agnostic
Application-level logging is expensive to implement and cannot access low-level logs
Needs
Operational data collection at multiple layers of a DPSS system
Provide data logging as a reusable middleware feature
QoS-driven, configurable, and selective data logging, e.g., based on throughput, queue delays, event losses
Collected data corresponds to the QoS policy, e.g., number of events, their priorities, lost events, missed deadlines (an illustrative policy sketch follows)
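As a hedged illustration only, such a policy might be expressed to the middleware as a small configuration object; every field name below is invented for this sketch rather than taken from an existing API:

```python
# Hypothetical configuration for QoS-driven, selective middleware logging.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LoggingPolicy:
    layers: List[str] = field(                 # where to collect data
        default_factory=lambda: ["middleware", "transport", "os"])
    trigger_metrics: List[str] = field(        # QoS metrics that drive logging
        default_factory=lambda: ["throughput", "queue_delay", "event_loss"])
    record_fields: List[str] = field(          # what each record captures
        default_factory=lambda: ["event_count", "priority",
                                 "lost_events", "missed_deadlines"])
    window_seconds: float = 1.0                # one observation window

policy = LoggingPolicy(trigger_metrics=["throughput", "event_loss"])
```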
11
Case Study: The Reactor Pattern
The Reactor architectural pattern allows event-driven applications to demultiplex & dispatch service requests that are delivered to an application from one or more clients.
• Many networked applications are developed as event-driven programs
• Common sources of events in these applications include activity on an IPC stream for I/O operations, POSIX signals, Windows handle signaling, & timer expirations
• The Reactor pattern decouples the detection, demultiplexing, & dispatching of events from the handling of events
• Participants include the Reactor, event handles, the event demultiplexer, and abstract & concrete event handlers
[UML class diagram: the Reactor (handle_events(), register_handler(), remove_handler()) owns a handle set and dispatches to Event Handlers (handle_event(), get_handle()); Concrete Event Handlers A & B specialize the abstract Event Handler; the Reactor uses a Synchronous Event Demuxer (select()), which notifies it of events on handles]
12
Reactor Dynamics
Registration Phase
– Event handlers register themselves with the Reactor for an event type (e.g., input, output, timeout, exception event types)
– The Reactor returns a handle it maintains, which it uses to associate an event type with the registered handler
Snapshot Phase
– The main program delegates its thread of control to the Reactor, which in turn takes a snapshot of the system to determine which events are enabled in that snapshot
– For each enabled event, the corresponding event handler is invoked, which services the event
– When all events in a snapshot are handled, the Reactor proceeds to the next snapshot
(A Python sketch of these phases follows the diagram below.)
[Sequence diagram: the Main Program calls register_handler() on the Reactor, which calls get_handle() on the Concrete Event Handler; the Main Program then calls handle_events(), the Reactor calls select() on the Synchronous Event Demultiplexer, and for each enabled handle it dispatches handle_event() to the handler, which services the event]
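A minimal sketch of these dynamics in Python, using the standard selectors module as the synchronous event demultiplexer. The pattern itself is language neutral (e.g., ACE's C++ Reactor), so this is an illustrative rendering under assumed class names, not a reference implementation:

```python
# Single-threaded Reactor sketch: registration phase, then repeated
# snapshots in which select() reports enabled handles and each
# corresponding handler's handle_event() is dispatched.
import selectors
import socket

class EchoHandler:
    """Concrete event handler: services input events by echoing data."""
    def __init__(self, sock: socket.socket):
        self.sock = sock
    def get_handle(self) -> socket.socket:
        return self.sock
    def handle_event(self) -> None:
        data = self.sock.recv(4096)     # service the input event
        if data:
            self.sock.sendall(data)

class Reactor:
    def __init__(self):
        # Synchronous event demultiplexer (select()/poll() underneath).
        self.demuxer = selectors.DefaultSelector()
    def register_handler(self, handler, event_type=selectors.EVENT_READ):
        # Associate the handler's handle with an event type.
        self.demuxer.register(handler.get_handle(), event_type, handler)
    def remove_handler(self, handler):
        self.demuxer.unregister(handler.get_handle())
    def handle_events(self, timeout=None):
        # One snapshot: demultiplex enabled handles, dispatch their handlers.
        for key, _mask in self.demuxer.select(timeout):
            key.data.handle_event()

# Usage: the main program delegates its thread of control to the Reactor.
#   reactor = Reactor()
#   reactor.register_handler(EchoHandler(connected_socket))
#   while True:
#       reactor.handle_events()
```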
13
Case Study: Virtual Router
Virtual router is used to scale virtual private networks
Differentiated services for different VPNs – security is a key requirement along with scalability and dependability
Illustrates demultiplexing and dispatching semantics of the Reactor pattern
[Figure: VPN topology. Customer edge (CE) routers connect to provider edge (PE) routers, each hosting multiple virtual routers (VRs); Level 1 service providers connect through Level 2 backbones (Backbone 1 and Backbone 2); three VPNs (VPN1, VPN2, VPN3) share the infrastructure. Inset: a virtual router with a firewall, multiple tunnels to customer edges or virtual routers, and multiple tunnels to the backbone or virtual routers]
14
Characteristics of the Reactor Performance Model
[Figure: single-threaded Reactor performance model. Incoming events of types one and two arrive from the network with Poisson rates λ1 and λ2 into queues of capacity N1 and N2; a select-based event demultiplexer dispatches them to event handlers with exponential service rates μ1 and μ2]
• Single-threaded, select-based Reactor implementation.
• The Reactor accepts two types of input events, with one event handler registered with the Reactor for each event type.
• Each event type has a separate queue to hold incoming events; buffer capacity is N1 for events of type one and N2 for events of type two.
• Event arrivals are Poisson for type one and type two events, with rates λ1 and λ2.
• Event service times are exponential for type one and type two events, with rates μ1 and μ2.
• Within a snapshot, events of type one are serviced with higher priority than events of type two: when event handles for both types are enabled, the type-one handle is serviced before the type-two handle (a simulation sketch of this model follows).
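To make the model's semantics concrete, here is a hedged Monte Carlo sketch of this system (snapshot discipline, finite buffers, type-one priority). It is a rough cross-check one could run against the SRN results, not the authors' solver; the parameter values in the final line are the case-study values assumed later in the talk:

```python
# Discrete-event sketch of the single-threaded Reactor model: Poisson
# arrivals, buffers of capacity n1/n2, exponential service, and snapshot
# semantics (at most one event per enabled handle per snapshot, type one
# serviced before type two).
import random

def simulate(lam1, lam2, mu1, mu2, n1, n2, horizon=200_000.0, seed=1):
    rng = random.Random(seed)
    t, q = 0.0, [0, 0]                       # clock, queued events per type
    next_arr = [rng.expovariate(lam1), rng.expovariate(lam2)]
    served, lost = [0, 0], [0, 0]

    def advance(until):
        # Deliver all arrivals up to `until`, dropping on full buffers.
        nonlocal t
        for i, (lam, cap) in enumerate([(lam1, n1), (lam2, n2)]):
            while next_arr[i] <= until:
                if q[i] < cap:
                    q[i] += 1
                else:
                    lost[i] += 1
                next_arr[i] += rng.expovariate(lam)
        t = until

    while t < horizon:
        enabled = [i for i in (0, 1) if q[i] > 0]   # snapshot of handles
        if not enabled:
            advance(min(next_arr))                  # idle until next arrival
            continue
        for i in enabled:
            q[i] -= 1            # Sn_i: move event from buffer into service
        for i in enabled:        # Sr1 before Sr2: type-one priority
            advance(t + rng.expovariate(mu1 if i == 0 else mu2))
            served[i] += 1

    return {"throughput": [s / t for s in served],
            "loss_prob": [l / (l + s) for l, s in zip(lost, served)]}

print(simulate(lam1=0.4, lam2=0.4, mu1=2.0, mu2=2.0, n1=5, n2=5))
```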
15
Performance Metrics for the Reactor
• Throughput: number of events that can be processed; of interest to applications such as telecommunications call processing.
• Queue length: queuing in the event handler queues; informs appropriate scheduling policies for applications with real-time requirements.
• Total number of events: total number of events in the system; informs scheduling decisions and the resource provisioning required to sustain system demands.
• Probability of event loss: events discarded due to lack of buffer space; of interest to safety-critical systems and for setting levels of resource provisioning.
• Response time: time taken to service an incoming event; real-time systems require bounded response time.
16
Using Stochastic Reward Nets (SRNs) for Performance Analysis
• Stochastic Reward Nets (SRNs) are an extension of Generalized Stochastic Petri Nets (GSPNs), which in turn extend Petri nets.
• SRNs extend the modeling power of GSPNs by allowing: guard functions, marking-dependent arc multiplicities, general transition probabilities, and reward rates at the net level.
• They allow model specification at a level closer to intuition.
• They are solved using tools such as SPNP (Stochastic Petri Net Package).
17
Modeling the Reactor using SRN
Part A:
• Models arrivals, queuing, and prioritized service of events.
• Transitions A1 and A2: event arrivals.
• Places B1 and B2: buffers/queues.
• Places S1 and S2: service of the events.
• Transitions Sr1 and Sr2: service completions.
• Inhibitor arc from place B1 to transition A1 with multiplicity N1 (similarly B2, A2, N2): prevents firing of transition A1 when there are N1 tokens in place B1.
• Inhibitor arc from place S1 to transition Sr2: offers prioritized service to events of type one over type two by preventing firing of Sr2 while there is a token in S1.
[SRN figure, parts (a) and (b): places B1, B2, S1, S2, StSnpSht, SnpShtInProg; timed transitions A1, A2, Sr1, Sr2; immediate transitions Sn1, Sn2, T_SrvSnpSht, T_EndSnpSht; inhibitor arcs from B1 and B2 with multiplicities N1 and N2, and from S1 to Sr2]
18
Reactor SRN: Taking a Snapshot
Part B:
• Models the process of taking successive snapshots.
• Sn1 enabled: token in StSnpSht & tokens in B1 & no token in S1.
• Sn2 enabled: token in StSnpSht & tokens in B2 & no token in S2.
• T_SrvSnpSht enabled: token in S1 and/or S2.
• T_EndSnpSht enabled: no token in either S1 or S2.
• Sn1 and Sn2 have the same priority.
• T_SrvSnpSht has lower priority than Sn1 and Sn2.
19
Reactor SRN: Initial Marking
Initial marking:
• StSnpSht = 1, B1 = 2, B2 = 2, S1 = 0, S2 = 0
• Transitions enabled: Sn1 and Sn2
• Sn1 fires.
20
Reactor SRN: Firing a Transition (1/6)
Upon firing of Sn1:
• StSnpSht = 1, B1 = 1, B2 = 2, S1 = 1, S2 = 0
• Transitions enabled: Sr1, Sn2, T_SrvSnpSht
• Sn2 and T_SrvSnpSht are immediate transitions and must fire before Sr1.
• T_SrvSnpSht has lower priority than Sn2.
• Sn2 fires.
21
Reactor SRN: Firing a Transition (2/6)
Upon firing of Sn2:
• StSnpSht = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1
• Transitions enabled: Sr1, T_SrvSnpSht
• T_SrvSnpSht is an immediate transition and must fire before Sr1.
• T_SrvSnpSht fires.
22
Reactor SRN: Firing a Transition (3/6)
Upon firing of T_SrvSnpSht:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1
• Transitions enabled: Sr1
• Snapshot in progress.
• Sr1 fires.
23
Reactor SRN: Firing a Transition (4/6)
Upon firing of Sr1:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 1
• Transitions enabled: Sr2
• Snapshot in progress.
• Sr2 fires.
24
Reactor SRN: Firing a Transition (5/6)
Upon firing of Sr2:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0
• Transitions enabled: T_EndSnpSht
• End of snapshot.
• T_EndSnpSht fires.
25
Reactor SRN: Firing a Transition (6/6)
Upon firing of T_EndSnpSht:
• StSnpSht = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0
• Transitions enabled: Sn1 and Sn2
• Back to the start-of-snapshot state, and the cycle repeats.
26
Reactor SRN: Performance Measures
Reward rate assignments to compute performance measures
• Throughput: rate of firing of transition Sr1 (Sr2).
• Queue length: number of tokens in place B1 (B2).
• Total number of events: sum of the tokens in places B1 & S1 (B2 & S2).
• Probability of event loss: probability that the number of tokens in place B1 equals N1 (B2 equals N2).
• Response time: can be obtained using the tagged-customer approach.
• The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics (a generic sketch of reward-rate computation follows).
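Mechanically, a reward rate assignment reduces to: solve the SRN's underlying CTMC for the steady-state probabilities π (π Q = 0, Σ π = 1), then compute the expected reward Σ_m π(m) r(m) over markings m. A generic numpy sketch follows; the three-state generator is a toy stand-in chosen to keep the example small, not the Reactor SRN's actual state space:

```python
# Steady-state reward computation for a small CTMC: solve pi Q = 0 with
# sum(pi) = 1, then weight per-marking reward rates by pi. The generator
# models a toy buffer with arrivals at 0.4/sec and service at 2/sec.
import numpy as np

Q = np.array([[-0.4,  0.4,  0.0],    # 0 queued -> 1 queued (arrival)
              [ 2.0, -2.4,  0.4],    # completion back / second arrival
              [ 0.0,  2.0, -2.0]])   # completion back to 1 queued

# Stack the balance equations with the normalization constraint.
A = np.vstack([Q.T, np.ones(len(Q))])
b = np.zeros(len(Q) + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

reward_queue_len = np.array([0.0, 1.0, 2.0])   # tokens queued per marking
print("E[queue length] =", pi @ reward_queue_len)
# A throughput reward would instead assign mu to markings where service
# is in progress, i.e., the rate of firing of Sr1.
```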
27
Designing DPSS Systems using SRNs
Initial Step
• Obtain performance measures for individual patterns-based building blocks
Iterative Algorithm
• Compose building blocks vertically and horizontally to form a DPSS system
• Determine performance measures for specified workloads and service times
• Alter the configurations until DPSS performance meets specifications (a sketch of this loop follows below)
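A hedged sketch of that iterative loop, reusing the simulate() function from the Reactor model sketch earlier; the candidate configurations and the loss-probability requirement are illustrative assumptions:

```python
# Iterate over candidate configurations until the estimated performance
# meets the (assumed) specification.
requirement = {"max_loss_prob": 0.001}
candidates = [dict(n1=1, n2=1), dict(n1=5, n2=5), dict(n1=10, n2=10)]

for cfg in candidates:                         # alter the configurations...
    est = simulate(lam1=0.4, lam2=0.4, mu1=2.0, mu2=2.0, **cfg)
    if max(est["loss_prob"]) <= requirement["max_loss_prob"]:
        print("accepted configuration:", cfg)  # ...until specs are met
        break
else:
    print("no candidate met the requirement; revise the design")
```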
28
VR SRN: Performance Model
• The VR provides VPN service to two organizations.
• Each organization has a customer edge router (CE) connected to the VR.
• Employees of each organization issue connection set-up and tear-down requests; employees are classified into two categories: technical & administrative.
• Differentiated level of service: technical employees receive prioritized service over administrative employees.
• The Reactor pattern could be used to (de)multiplex these events: requests from technical employees constitute event type one (λ1, μ1, N1); requests from administrative employees constitute event type two (λ2, μ2, N2).
• The SRN model of the Reactor could be used to obtain estimates of the performance metrics.
29
VR SRN: Performance Estimates
• The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics.
• Parameter values: λ1 = 0.4/sec, λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec.
• Two cases: N1 = N2 = 1, and N1 = N2 = 5.
Observations:
• Probability of event loss is higher when the buffer space is 1.
• Total number of events of type two is higher than of type one.
• Events of type two stay in the system longer than events of type one.
• This may degrade the response time of event requests from administrative employees compared to requests from technical employees.
Perf. metric     N1=N2=1 (#1)   N1=N2=1 (#2)   N1=N2=5 (#1)   N1=N2=5 (#2)
Throughput       0.37/s         0.37/s         0.40/s         0.40/s
Queue length     0.065          0.065          0.12           0.12
Total events     0.25           0.27           0.32           0.35
Loss probab.     0.065          0.065          0.00026        0.00026
30
VR SRN: Sensitivity Analysis
• Analyze the sensitivity of the performance metrics to variations in the input parameter values.
• Vary λ1 from 0.5/sec to 2.0/sec.
• Values of the other parameters: λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5.
• Compute the performance measures for each of the input values (a sweep sketch follows the plot below).
Observations:
• Throughput of event requests from technical employees increases, but the rate of increase declines.
• Throughput of event requests from administrative employees remains unchanged.
[Plot: throughput (0 to 1.6) versus Lambda1 (0.4 to 2.0/sec)]
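The same sweep can be reproduced, approximately, with the simulate() sketch from the Reactor model slide; the λ1 grid below mirrors the plot's x-axis, and λ2 = 0.4/sec is the assumed base rate:

```python
# Sensitivity sweep over lambda1, holding the other parameters fixed.
for lam1 in [0.4, 0.5, 0.66, 0.8, 1.0, 1.33, 2.0]:
    est = simulate(lam1=lam1, lam2=0.4, mu1=2.0, mu2=2.0, n1=5, n2=5)
    print(f"lambda1={lam1:<5}  tech={est['throughput'][0]:.2f}/s"
          f"  admin={est['throughput'][1]:.2f}/s")
```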
31
VR SRN: Expected Behavior
• The VPN service has two modes of operation: normal & inclement.
• Normal mode: on a daily basis, some employees have negotiated telecommute plans and use the VPN for remote access.
• Inclement mode: hazardous driving conditions due to bad weather may keep people at home, producing a large number of telecommuters and an increase in connection set-up and tear-down requests.
• Modes of operation can be defined at a finer level of granularity, such as a few hours, rather than a day.
32
VR SRN: Expected Behavior
• Normal mode: λ1 = 0.4/sec, λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5; occurrence probability 0.9.
• Inclement mode: λ1 = 1/sec, λ2 = 1/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5; occurrence probability 0.1.
Perf. metric (event #1)   Normal   Inclement   Average
Throughput                0.40/s   0.90/s      0.4510/s
Queue length              0.12     1.86        0.2940
Loss probab.              0.009    0.21        0.0291
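The Average column is the mode-weighted expectation from the formula given earlier, with weights 0.9 and 0.1. A quick check against the table's event #1 values:

```python
# Mode-weighted expected metrics for event #1.
p_normal, p_inclement = 0.9, 0.1
metrics = {"throughput":   (0.40, 0.90),
           "queue_length": (0.12, 1.86),
           "loss_prob":    (0.009, 0.21)}
for name, (normal, inclement) in metrics.items():
    print(name, p_normal * normal + p_inclement * inclement)
# -> 0.45, 0.294, 0.0291 (the table's 0.4510 presumably comes from more
#    precise per-mode throughput values than the rounded 0.40 and 0.90)
```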
33
VR SRN: Disruption Detection
• Obtain an anomaly score for the Reactor based on each of the performance metrics for each event type.
• Correlate the anomaly scores based on each event type to obtain an overall anomaly score for the Reactor. Such a score is computed for the Reactor used at each CE to demultiplex events from the two groups within a single organization.
• An anomaly score is likewise computed for the Reactor in the VR, which demultiplexes events from the two organizations.
• Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #1 to determine service disruptions for organization #1.
• Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #2 to determine service disruptions for organization #2.
• The source of a disruption may be identified by correlating the scores at the various layers.
34
Future Collaborative Research
Performance analysis methodology (UConn – S. Gokhale)
– Develop and validate performance models for invariant characteristics of building blocks.
– Compose and validate performance models for common building block compositions.
– Develop model decomposition and solution strategies to alleviate the state-space explosion problem.
Model-driven generative methodology (Vanderbilt – A. Gokhale)
– Manually developing performance models of each block with its variations is cumbersome.
– Compositions of building blocks cannot be made in an ad hoc, arbitrary manner.
– Model-driven generative tools use visual modeling languages and model interpreters to automate tedious tasks and provide "correct-by-construction" development.
Aspect-oriented methodology (Univ. of Alabama at Birmingham – J. Gray)
– Variability in building blocks and compositions is a primary candidate for separating the concern as an aspect.
– Aspect weaving technology can be used to refine and enhance the models by weaving the variability concerns into the performance models.
35
Concluding Remarks
DPSS systems are becoming increasingly complex
Increasing use of middleware technologies
Middleware resolves many challenges of DPSS development but also incurs many variability challenges due to its flexibility
Need to estimate performance early in development lifecycle
Goal is to use model-based performance analysis, model-driven generative techniques and aspect weaving to build middleware stacks whose performance can be estimated at design-time
Performance analysis can also be used to improve cyber trustworthiness
www.cse.uconn.edu/~ssg (Swapna Gokhale)
www.dre.vanderbilt.edu/~gokhale (Aniruddha Gokhale)
www.gray-area.org (Jeff Gray)
Questions?
EXTRAS
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Aniruddha Gokhale [email protected]
Asst. Professor of EECS, Vanderbilt
University, Nashville, TN
Jeffrey Gray [email protected] Asst. Professor of CIS, University of
Alabama, Birmingham, AL
Swapna Gokhale [email protected]
Asst. Professor of CSE, University of
Connecticut, Storrs, CT
Presented at NSWC
Dahlgren, VA
April 13, 2005
39
Solution: A New Approach to DPSS Design
Build and validate performance models for invariant parts of middleware building blocks
Weaving of variability concerns manifested in a building block into the performance models
Compose and validate performance models of building blocks mirroring the anticipated software design of DPSS systems
Estimate end-to-end performance of composed system
Iterate until design meets performance requirements
Applying design-time performance analysis techniques to estimate the impact of variability in middleware-based DPSS systems
Submissions to Sigmetrics 2005, HPDC 2005, Globecom 2005, IAWS 2005
Planned submissions to SRDS 2005, ISSRE 2005