Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Transcript of Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Aniruddha Gokhale [email protected]
Asst. Professor of EECS, Vanderbilt
University, Nashville, TN
Swapna Gokhale [email protected]
Asst. Professor of CSE, University of
Connecticut, Storrs, CT
Presented at NSWC
Dahlgren, VA
April 13, 2005
2
Focus: Distributed Performance-Sensitive Software (DPSS) Systems
Military/civilian distributed performance-sensitive software systems
• Network-centric & larger-scale "systems of systems"
• Stringent simultaneous QoS demands, e.g., dependability, security, scalability, throughput
• Dynamic context
3
Context: Trends in DPSS Development
Historically developed using low-level APIs
Increasing use of middleware technologies
Standards-based COTS middleware helps to:
– Control end-to-end resources & QoS
– Leverage hardware & software technology advances
– Evolve to new environments & requirements
Middleware helps capture & codify commonalities across applications in different domains by providing reusable & configurable patterns-based building blocks
Examples: CORBA, .NET, J2EE, ICE, MQSeries
Patterns: Gang of Four, POSA 1, 2 & 3
Key Observation
• DPSS systems are composed of patterns-based building blocks
• Observed quality of service depends on the right composition
4
Talk Outline
1. Motivation
2. Use of Performance Analysis Methods for System Design
3. Use of Performance Analysis Methods for Improving Cyber Trust
4. Planned Future Work
5. Concluding Remarks
5
Problem 1: Variability in Middleware
Although middleware provides reusable building blocks that capture commonalities, these blocks and their compositions incur variabilities that impact performance in significant ways.
Per-building-block variability
– Incurred due to variations in implementations & configurations of a patterns-based building block
– E.g., the single-threaded versus thread-pool-based reactor implementation dimension, which crosscuts the event demultiplexing strategy (e.g., select, poll, WaitForMultipleObjects)
Compositional variability
– Incurred due to variations in the compositions of these building blocks
– Need to address compatibility in the compositions and individual configurations
– Dictated by needs of the domain
– E.g., Leader/Followers makes no sense with a single-threaded Reactor
[Figure: Reactor variability dimensions: event demultiplexing strategy (select, poll, WaitForMultipleObjects, Qt, Tk) crosscut by event handling strategy (single threaded, thread pool)]
6
Solution Approach: Applying Performance Analytical Models for DPSS Design
Build and validate performance models for invariant parts of middleware building blocks
Weaving of variability concerns manifested in a building block into the performance models
Compose and validate performance models of building blocks mirroring the anticipated software design of DPSS systems
Estimate end-to-end performance of composed system
Iterate until design meets performance requirements
Applying design-time performance analysis techniques to estimate the impact of variability in middleware-based DPSS systems
[Figure: variability concerns are woven into the invariant model of each pattern to produce refined pattern models; the refined models, together with workload inputs, are composed into a performance model of the composed system]
7
Problem 2: Benign/Intentional Disruptions
Terrorist threats and/or malicious users can bring down a cyber infrastructure
Normal failures (hardware and software) could also disrupt the cyber infrastructure
Existing disruption detection techniques use low-level trace data that is agnostic about the application
Application-specific disruption detection is expensive
Need a reusable middleware-based solution
8
Solution Approach: Design-Time Performance Analysis for Disruption Detection
• Identify the service profile, which consists of the modes of operation of the service that uses the building block.
• Estimate the likelihood or occurrence probabilities of each mode of operation.
• Estimate the values of the input parameters for each mode of operation.
• Obtain the values of the performance metrics for each mode of operation by solving the SRN model.
• Compute the expected estimates of the performance metrics as the weighted sum of the per-mode performance metrics, with weights given by the occurrence probabilities of each mode (see the formula below).
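The final step is a standard expected-value computation. As a sketch: if the service profile has modes k = 1, …, K with occurrence probabilities p_k, and a performance metric evaluates to M_k when the SRN is solved with mode k's input parameters, the expected estimate is

```latex
\bar{M} \;=\; \sum_{k=1}^{K} p_k \, M_k , \qquad \text{where } \sum_{k=1}^{K} p_k = 1 .
```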
9
Algorithm: Design-Time Performance Analysis for Disruption Detection
1. Compute performance metrics for each observation window.
2. Summarize the performance metrics over several past observation windows using an exponential moving average.
3. The approximate weight of each window is determined by the smoothing constant.
4. Compute an anomaly score for each performance metric using the chi-square test.
5. Use a Bayesian network to correlate the anomaly scores computed from each performance metric into an overall anomaly score for the building block as a whole.
6. Correlate the anomaly scores of different building blocks, possibly residing at different layers, to obtain the anomaly score of the service.
7. Hierarchical correlation of anomaly scores reduces false positives.
A sketch of steps 2–5 follows.
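A minimal sketch of steps 2–5, assuming an illustrative smoothing constant of 0.2 and a weighted-sum stand-in for the Bayesian-network correlation; MetricMonitor, block_score, and all values here are assumptions for illustration, not the authors' implementation:

```python
# Sketch of steps 2-5: EWMA summary of past windows plus a chi-square-style
# anomaly score per metric. The Bayesian-network correlation of step 5 is
# approximated here by a simple weighted sum.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MetricMonitor:
    alpha: float = 0.2              # smoothing constant (assumed value)
    ewma: Optional[float] = None    # EWMA over past observation windows

    def update(self, observed: float) -> float:
        """Fold in one observation window; return its anomaly score."""
        if self.ewma is None:       # first window: no baseline yet
            self.ewma = observed
            return 0.0
        expected = self.ewma
        # One-cell chi-square statistic: grows as the window deviates
        # from the value predicted by the moving average.
        score = (observed - expected) ** 2 / max(expected, 1e-9)
        # Recent windows get weight alpha; older ones decay geometrically.
        self.ewma = self.alpha * observed + (1 - self.alpha) * expected
        return score

def block_score(scores: dict, weights: dict) -> float:
    """Stand-in for step 5's Bayesian-network correlation."""
    return sum(weights[m] * s for m, s in scores.items())

throughput_monitor = MetricMonitor()
for window in [0.40, 0.41, 0.39, 0.90]:     # last window is anomalous
    print(throughput_monitor.update(window))
```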
10
Algorithm: Runtime Requirements
Context
Low-level data logging is application agnostic
Application-level logging is expensive to implement and cannot access low-level logs
Needs
Operational data collection at multiple layers of a DPSS system
Provide data logging as a reusable middleware feature
QoS-driven, configurable, and selective data logging, e.g., based on throughput, queue delays, event losses
Collected data corresponds to the QoS policy, e.g., number of events, their priorities, lost events, missed deadlines (an illustrative policy sketch follows)
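As a hedged illustration only, such a policy might be expressed to the middleware as a small configuration object; every field name below is invented for this sketch rather than taken from an existing API:

```python
# Hypothetical configuration for QoS-driven, selective middleware logging.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LoggingPolicy:
    layers: List[str] = field(                 # where to collect data
        default_factory=lambda: ["middleware", "transport", "os"])
    trigger_metrics: List[str] = field(        # QoS metrics that drive logging
        default_factory=lambda: ["throughput", "queue_delay", "event_loss"])
    record_fields: List[str] = field(          # what each record captures
        default_factory=lambda: ["event_count", "priority",
                                 "lost_events", "missed_deadlines"])
    window_seconds: float = 1.0                # one observation window

policy = LoggingPolicy(trigger_metrics=["throughput", "event_loss"])
```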
11
Case Study: The Reactor Pattern
The Reactor architectural pattern allows event-driven applications to demultiplex & dispatch service requests that are delivered to an application from one or more clients.
• Many networked applications are developed as event-driven programs
• Common sources of events in these applications include activity on an IPC stream for I/O operations, POSIX signals, Windows handle signaling, & timer expirations
• The Reactor pattern decouples the detection, demultiplexing, & dispatching of events from the handling of events
• Participants include the Reactor, event handles, the event demultiplexer, and abstract & concrete event handlers
[UML class diagram: the Reactor (handle_events(), register_handler(), remove_handler()) owns a handle set and dispatches to Event Handlers (handle_event(), get_handle()); Concrete Event Handlers A & B specialize the abstract Event Handler; the Reactor uses a Synchronous Event Demuxer (select()), which notifies it of events on handles]
12
Reactor Dynamics
Registration Phase
– Event handlers register themselves with the Reactor for an event type (e.g., input, output, timeout, exception event types)
– The Reactor returns a handle it maintains, which it uses to associate an event type with the registered handler
Snapshot Phase
– The main program delegates its thread of control to the Reactor, which in turn takes a snapshot of the system to determine which events are enabled in that snapshot
– For each enabled event, the corresponding event handler is invoked, which services the event
– When all events in a snapshot are handled, the Reactor proceeds to the next snapshot
(A Python sketch of these phases follows the diagram below.)
[Sequence diagram: the Main Program calls register_handler() on the Reactor, which calls get_handle() on the Concrete Event Handler; the Main Program then calls handle_events(), the Reactor calls select() on the Synchronous Event Demultiplexer, and for each enabled handle it dispatches handle_event() to the handler, which services the event]
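A minimal sketch of these dynamics in Python, using the standard selectors module as the synchronous event demultiplexer. The pattern itself is language neutral (e.g., ACE's C++ Reactor), so this is an illustrative rendering under assumed class names, not a reference implementation:

```python
# Single-threaded Reactor sketch: registration phase, then repeated
# snapshots in which select() reports enabled handles and each
# corresponding handler's handle_event() is dispatched.
import selectors
import socket

class EchoHandler:
    """Concrete event handler: services input events by echoing data."""
    def __init__(self, sock: socket.socket):
        self.sock = sock
    def get_handle(self) -> socket.socket:
        return self.sock
    def handle_event(self) -> None:
        data = self.sock.recv(4096)     # service the input event
        if data:
            self.sock.sendall(data)

class Reactor:
    def __init__(self):
        # Synchronous event demultiplexer (select()/poll() underneath).
        self.demuxer = selectors.DefaultSelector()
    def register_handler(self, handler, event_type=selectors.EVENT_READ):
        # Associate the handler's handle with an event type.
        self.demuxer.register(handler.get_handle(), event_type, handler)
    def remove_handler(self, handler):
        self.demuxer.unregister(handler.get_handle())
    def handle_events(self, timeout=None):
        # One snapshot: demultiplex enabled handles, dispatch their handlers.
        for key, _mask in self.demuxer.select(timeout):
            key.data.handle_event()

# Usage: the main program delegates its thread of control to the Reactor.
#   reactor = Reactor()
#   reactor.register_handler(EchoHandler(connected_socket))
#   while True:
#       reactor.handle_events()
```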
13
Case Study: Virtual Router
Virtual router is used to scale virtual private networks
Differentiated services for different VPNs – security is a key requirement along with scalability and dependability
Illustrates demultiplexing and dispatching semantics of the Reactor pattern
[Figure: VPN topology. Customer edge (CE) routers connect to provider edge (PE) routers, each hosting multiple virtual routers (VRs); Level 1 service providers connect through Level 2 backbones (Backbone 1 and Backbone 2); three VPNs (VPN1, VPN2, VPN3) share the infrastructure. Inset: a virtual router with a firewall, multiple tunnels to customer edges or virtual routers, and multiple tunnels to the backbone or virtual routers]
14
Characteristics of the Reactor Performance Model
[Figure: single-threaded Reactor performance model. Incoming events of types one and two arrive from the network with Poisson rates λ1 and λ2 into queues of capacity N1 and N2; a select-based event demultiplexer dispatches them to event handlers with exponential service rates μ1 and μ2]
• Single-threaded, select-based Reactor implementation.
• The Reactor accepts two types of input events, with one event handler registered with the Reactor for each event type.
• Each event type has a separate queue to hold incoming events; buffer capacity is N1 for events of type one and N2 for events of type two.
• Event arrivals are Poisson for type one and type two events, with rates λ1 and λ2.
• Event service times are exponential for type one and type two events, with rates μ1 and μ2.
• Within a snapshot, events of type one are serviced with higher priority than events of type two: when event handles for both types are enabled, the type-one handle is serviced before the type-two handle (a simulation sketch of this model follows).
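To make the model's semantics concrete, here is a hedged Monte Carlo sketch of this system (snapshot discipline, finite buffers, type-one priority). It is a rough cross-check one could run against the SRN results, not the authors' solver; the parameter values in the final line are the case-study values assumed later in the talk:

```python
# Discrete-event sketch of the single-threaded Reactor model: Poisson
# arrivals, buffers of capacity n1/n2, exponential service, and snapshot
# semantics (at most one event per enabled handle per snapshot, type one
# serviced before type two).
import random

def simulate(lam1, lam2, mu1, mu2, n1, n2, horizon=200_000.0, seed=1):
    rng = random.Random(seed)
    t, q = 0.0, [0, 0]                       # clock, queued events per type
    next_arr = [rng.expovariate(lam1), rng.expovariate(lam2)]
    served, lost = [0, 0], [0, 0]

    def advance(until):
        # Deliver all arrivals up to `until`, dropping on full buffers.
        nonlocal t
        for i, (lam, cap) in enumerate([(lam1, n1), (lam2, n2)]):
            while next_arr[i] <= until:
                if q[i] < cap:
                    q[i] += 1
                else:
                    lost[i] += 1
                next_arr[i] += rng.expovariate(lam)
        t = until

    while t < horizon:
        enabled = [i for i in (0, 1) if q[i] > 0]   # snapshot of handles
        if not enabled:
            advance(min(next_arr))                  # idle until next arrival
            continue
        for i in enabled:
            q[i] -= 1            # Sn_i: move event from buffer into service
        for i in enabled:        # Sr1 before Sr2: type-one priority
            advance(t + rng.expovariate(mu1 if i == 0 else mu2))
            served[i] += 1

    return {"throughput": [s / t for s in served],
            "loss_prob": [l / (l + s) for l, s in zip(lost, served)]}

print(simulate(lam1=0.4, lam2=0.4, mu1=2.0, mu2=2.0, n1=5, n2=5))
```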
15
Performance Metrics for the Reactor
• Throughput: number of events that can be processed; of interest to applications such as telecommunications call processing.
• Queue length: queuing in the event handler queues; informs appropriate scheduling policies for applications with real-time requirements.
• Total number of events: total number of events in the system; informs scheduling decisions and the resource provisioning required to sustain system demands.
• Probability of event loss: events discarded due to lack of buffer space; of interest to safety-critical systems and for setting levels of resource provisioning.
• Response time: time taken to service an incoming event; real-time systems require bounded response time.
16
Using Stochastic Reward Nets (SRNs) for Performance Analysis
• Stochastic Reward Nets (SRNs) are an extension of Generalized Stochastic Petri Nets (GSPNs), which in turn extend Petri nets.
• SRNs extend the modeling power of GSPNs by allowing: guard functions, marking-dependent arc multiplicities, general transition probabilities, and reward rates at the net level.
• They allow model specification at a level closer to intuition.
• They are solved using tools such as SPNP (Stochastic Petri Net Package).
17
Modeling the Reactor using SRN
Part A:
• Models arrivals, queuing, and prioritized service of events.
• Transitions A1 and A2: event arrivals.
• Places B1 and B2: buffers/queues.
• Places S1 and S2: service of the events.
• Transitions Sr1 and Sr2: service completions.
• Inhibitor arc from place B1 to transition A1 with multiplicity N1 (similarly B2, A2, N2): prevents firing of transition A1 when there are N1 tokens in place B1.
• Inhibitor arc from place S1 to transition Sr2: offers prioritized service to events of type one over type two by preventing firing of Sr2 while there is a token in S1.
[SRN figure, parts (a) and (b): places B1, B2, S1, S2, StSnpSht, SnpShtInProg; timed transitions A1, A2, Sr1, Sr2; immediate transitions Sn1, Sn2, T_SrvSnpSht, T_EndSnpSht; inhibitor arcs from B1 and B2 with multiplicities N1 and N2, and from S1 to Sr2]
18
Reactor SRN: Taking a Snapshot
Part B:
• Models the process of taking successive snapshots.
• Sn1 enabled: token in StSnpSht & tokens in B1 & no token in S1.
• Sn2 enabled: token in StSnpSht & tokens in B2 & no token in S2.
• T_SrvSnpSht enabled: token in S1 and/or S2.
• T_EndSnpSht enabled: no token in either S1 or S2.
• Sn1 and Sn2 have the same priority.
• T_SrvSnpSht has lower priority than Sn1 and Sn2.
19
Reactor SRN: Initial Marking
Initial marking:
• StSnpSht = 1, B1 = 2, B2 = 2, S1 = 0, S2 = 0
• Transitions enabled: Sn1 and Sn2
• Sn1 fires.
20
Reactor SRN: Firing a Transition (1/6)
Upon firing of Sn1:
• StSnpSht = 1, B1 = 1, B2 = 2, S1 = 1, S2 = 0
• Transitions enabled: Sr1, Sn2, T_SrvSnpSht
• Sn2 and T_SrvSnpSht are immediate transitions and must fire before Sr1.
• T_SrvSnpSht has lower priority than Sn2.
• Sn2 fires.
21
Reactor SRN: Firing a Transition (2/6)
Upon firing of Sn2:
• StSnpSht = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1
• Transitions enabled: Sr1, T_SrvSnpSht
• T_SrvSnpSht is an immediate transition and must fire before Sr1.
• T_SrvSnpSht fires.
22
Reactor SRN: Firing a Transition (3/6)
Upon firing of T_SrvSnpSht:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1
• Transitions enabled: Sr1
• Snapshot in progress.
• Sr1 fires.
23
Reactor SRN: Firing a Transition (4/6)
Upon firing of Sr1:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 1
• Transitions enabled: Sr2
• Snapshot in progress.
• Sr2 fires.
24
Reactor SRN: Firing a Transition (5/6)
Upon firing of Sr2:
• SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0
• Transitions enabled: T_EndSnpSht
• End of snapshot.
• T_EndSnpSht fires.
25
Reactor SRN: Firing a Transition (6/6)
Upon firing of T_EndSnpSht:
• StSnpSht = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0
• Transitions enabled: Sn1 and Sn2
• Back to the start-of-snapshot state, and the cycle repeats.
26
Reactor SRN: Performance Measures
Reward rate assignments to compute performance measures
• Throughput: rate of firing of transition Sr1 (Sr2).
• Queue length: number of tokens in place B1 (B2).
• Total number of events: sum of the tokens in places B1 & S1 (B2 & S2).
• Probability of event loss: probability that the number of tokens in place B1 equals N1 (B2 equals N2).
• Response time: can be obtained using the tagged-customer approach.
• The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics (a generic sketch of reward-rate computation follows).
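Mechanically, a reward rate assignment reduces to: solve the SRN's underlying CTMC for the steady-state probabilities π (π Q = 0, Σ π = 1), then compute the expected reward Σ_m π(m) r(m) over markings m. A generic numpy sketch follows; the three-state generator is a toy stand-in chosen to keep the example small, not the Reactor SRN's actual state space:

```python
# Steady-state reward computation for a small CTMC: solve pi Q = 0 with
# sum(pi) = 1, then weight per-marking reward rates by pi. The generator
# models a toy buffer with arrivals at 0.4/sec and service at 2/sec.
import numpy as np

Q = np.array([[-0.4,  0.4,  0.0],    # 0 queued -> 1 queued (arrival)
              [ 2.0, -2.4,  0.4],    # completion back / second arrival
              [ 0.0,  2.0, -2.0]])   # completion back to 1 queued

# Stack the balance equations with the normalization constraint.
A = np.vstack([Q.T, np.ones(len(Q))])
b = np.zeros(len(Q) + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

reward_queue_len = np.array([0.0, 1.0, 2.0])   # tokens queued per marking
print("E[queue length] =", pi @ reward_queue_len)
# A throughput reward would instead assign mu to markings where service
# is in progress, i.e., the rate of firing of Sr1.
```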
27
Designing DPSS Systems using SRNs
Initial Step
• Obtain performance measures for individual patterns-based building blocks
Iterative Algorithm
• Compose building blocks vertically and horizontally to form a DPSS system
• Determine performance measures for specified workloads and service times
• Alter the configurations until DPSS performance meets specifications (a sketch of this loop follows below)
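A hedged sketch of that iterative loop, reusing the simulate() function from the Reactor model sketch earlier; the candidate configurations and the loss-probability requirement are illustrative assumptions:

```python
# Iterate over candidate configurations until the estimated performance
# meets the (assumed) specification.
requirement = {"max_loss_prob": 0.001}
candidates = [dict(n1=1, n2=1), dict(n1=5, n2=5), dict(n1=10, n2=10)]

for cfg in candidates:                         # alter the configurations...
    est = simulate(lam1=0.4, lam2=0.4, mu1=2.0, mu2=2.0, **cfg)
    if max(est["loss_prob"]) <= requirement["max_loss_prob"]:
        print("accepted configuration:", cfg)  # ...until specs are met
        break
else:
    print("no candidate met the requirement; revise the design")
```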
28
VR SRN: Performance Model
• The VR provides VPN service to two organizations.
• Each organization has a customer edge router (CE) connected to the VR.
• Employees of each organization issue connection set-up and tear-down requests; employees are classified into two categories: technical & administrative.
• Differentiated level of service: technical employees receive prioritized service over administrative employees.
• The Reactor pattern could be used to (de)multiplex these events: requests from technical employees constitute event type one (λ1, μ1, N1); requests from administrative employees constitute event type two (λ2, μ2, N2).
• The SRN model of the Reactor could be used to obtain estimates of the performance metrics.
29
VR SRN: Performance Estimates
• The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics.
• Parameter values: λ1 = 0.4/sec, λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec.
• Two cases: N1 = N2 = 1, and N1 = N2 = 5.
Observations:
• Probability of event loss is higher when the buffer space is 1.
• Total number of events of type two is higher than of type one.
• Events of type two stay in the system longer than events of type one.
• This may degrade the response time of event requests from administrative employees compared to requests from technical employees.
Perf. metric     N1=N2=1 (#1)   N1=N2=1 (#2)   N1=N2=5 (#1)   N1=N2=5 (#2)
Throughput       0.37/s         0.37/s         0.40/s         0.40/s
Queue length     0.065          0.065          0.12           0.12
Total events     0.25           0.27           0.32           0.35
Loss probab.     0.065          0.065          0.00026        0.00026
30
VR SRN: Sensitivity Analysis
• Analyze the sensitivity of the performance metrics to variations in the input parameter values.
• Vary λ1 from 0.5/sec to 2.0/sec.
• Values of the other parameters: λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5.
• Compute the performance measures for each of the input values (a sweep sketch follows the plot below).
Observations:
• Throughput of event requests from technical employees increases, but the rate of increase declines.
• Throughput of event requests from administrative employees remains unchanged.
[Plot: throughput (0 to 1.6) versus Lambda1 (0.4 to 2.0/sec)]
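The same sweep can be reproduced, approximately, with the simulate() sketch from the Reactor model slide; the λ1 grid below mirrors the plot's x-axis, and λ2 = 0.4/sec is the assumed base rate:

```python
# Sensitivity sweep over lambda1, holding the other parameters fixed.
for lam1 in [0.4, 0.5, 0.66, 0.8, 1.0, 1.33, 2.0]:
    est = simulate(lam1=lam1, lam2=0.4, mu1=2.0, mu2=2.0, n1=5, n2=5)
    print(f"lambda1={lam1:<5}  tech={est['throughput'][0]:.2f}/s"
          f"  admin={est['throughput'][1]:.2f}/s")
```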
31
VR SRN: Expected Behavior
• The VPN service has two modes of operation: normal & inclement.
• Normal mode: on a daily basis, some employees have negotiated telecommute plans and use the VPN for remote access.
• Inclement mode: hazardous driving conditions due to bad weather may keep people at home, producing a large number of telecommuters and an increase in connection set-up and tear-down requests.
• Modes of operation can be defined at a finer level of granularity, such as a few hours, rather than a day.
32
VR SRN: Expected Behavior
• Normal mode: λ1 = 0.4/sec, λ2 = 0.4/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5; occurrence probability 0.9.
• Inclement mode: λ1 = 1/sec, λ2 = 1/sec, μ1 = 2/sec, μ2 = 2/sec, N1 = N2 = 5; occurrence probability 0.1.
Perf. metric (event #1)   Normal   Inclement   Average
Throughput                0.40/s   0.90/s      0.4510/s
Queue length              0.12     1.86        0.2940
Loss probab.              0.009    0.21        0.0291
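The Average column is the mode-weighted expectation from the formula given earlier, with weights 0.9 and 0.1. A quick check against the table's event #1 values:

```python
# Mode-weighted expected metrics for event #1.
p_normal, p_inclement = 0.9, 0.1
metrics = {"throughput":   (0.40, 0.90),
           "queue_length": (0.12, 1.86),
           "loss_prob":    (0.009, 0.21)}
for name, (normal, inclement) in metrics.items():
    print(name, p_normal * normal + p_inclement * inclement)
# -> 0.45, 0.294, 0.0291 (the table's 0.4510 presumably comes from more
#    precise per-mode throughput values than the rounded 0.40 and 0.90)
```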
33
VR SRN: Disruption Detection
• Obtain an anomaly score for the Reactor based on each of the performance metrics for each event type.
• Correlate the anomaly scores based on each event type to obtain an overall anomaly score for the Reactor. Such a score is computed for the Reactor used at each CE to demultiplex events from the two groups within a single organization.
• An anomaly score is likewise computed for the Reactor in the VR, which demultiplexes events from the two organizations.
• Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #1 to determine service disruptions for organization #1.
• Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #2 to determine service disruptions for organization #2.
• The source of a disruption may be identified by correlating the scores at the various layers.
34
Future Collaborative Research
Performance analysis methodology (UConn – S. Gokhale)
– Develop and validate performance models for invariant characteristics of building blocks.
– Compose and validate performance models for common building block compositions.
– Develop model decomposition and solution strategies to alleviate the state-space explosion problem.
Model-driven generative methodology (Vanderbilt – A. Gokhale)
– Manually developing performance models of each block with its variations is cumbersome.
– Compositions of building blocks cannot be made in an ad hoc, arbitrary manner.
– Model-driven generative tools use visual modeling languages and model interpreters to automate tedious tasks and provide "correct-by-construction" development.
Aspect-oriented methodology (Univ. of Alabama at Birmingham – J. Gray)
– Variability in building blocks and compositions is a primary candidate for separating the concern as an aspect.
– Aspect weaving technology can be used to refine and enhance the models by weaving the variability concerns into the performance models.
35
Concluding Remarks
DPSS systems are becoming increasingly complex
Increasing use of middleware technologies
Middleware resolves many challenges of DPSS development but also incurs many variability challenges due to its flexibility
Need to estimate performance early in development lifecycle
Goal is to use model-based performance analysis, model-driven generative techniques and aspect weaving to build middleware stacks whose performance can be estimated at design-time
Performance analysis can also be used to improve cyber trustworthiness
www.cse.uconn.edu/~ssg (Swapna Gokhale)
www.dre.vanderbilt.edu/~gokhale (Aniruddha Gokhale)
www.gray-area.org (Jeff Gray)
Questions?
EXTRAS
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Aniruddha Gokhale [email protected]
Asst. Professor of EECS, Vanderbilt
University, Nashville, TN
Jeffrey Gray [email protected] Asst. Professor of CIS, University of
Alabama, Birmingham, AL
Swapna Gokhale [email protected]
Asst. Professor of CSE, University of
Connecticut, Storrs, CT
Presented at NSWC
Dahlgren, VA
April 13, 2005
39
Solution: A New Approach to DPSS Design
Build and validate performance models for invariant parts of middleware building blocks
Weaving of variability concerns manifested in a building block into the performance models
Compose and validate performance models of building blocks mirroring the anticipated software design of DPSS systems
Estimate end-to-end performance of composed system
Iterate until design meets performance requirements
Applying design-time performance analysis techniques to estimate the impact of variability in middleware-based DPSS systems
Submissions to Sigmetrics 2005, HPDC 2005, Globecom 2005, IAWS 2005
Planned submissions to SRDS 2005, ISSRE 2005