
Performance Monitoring of Diverse Computer Systems

Joseph M. Lancaster, Roger D. Chamberlain

Dept. of Computer Science and Engineering, Washington University in St. Louis

{lancaster, roger}@wustl.edu

Research supported by NSF grant CNS-0720667

HPEC 2008, September 25, 2008


– Run correctly: do not deadlock; meet hard real-time deadlines
– Run fast: high throughput / low latency; low rate of soft-deadline misses

Infrastructure should help us debug when the application runs incorrectly or slowly.


Diverse systems are increasingly common in HPEC, e.g. Mercury, XtremeData, DRC, Nallatech, ClearSpeed.

[Diagram: a CMP with two cores alongside an FPGA containing a µP and logic]


An application deployed using all four components:

[Diagram: two dual-core CMPs, a GPU, and an FPGA composing one system]

[Diagram: candidate components: an FPGA with core logic, an eight-core Cell processor, a 256-core GPU, and two quad-core CMPs]


– Large performance gains realized
– Power efficient compared to a CMP alone

However:
– Requires knowledge of the individual architectures/languages
– Components operate independently: a distributed system with separate memories and clocks


Tool support for these systems is insufficient:
– Many architectures lack tools for monitoring and validation
– Tools for different architectures are not integrated
– Ad hoc solutions

Solution: runtime performance monitoring and validation for diverse systems!


Outline:
– Introduction
– Runtime performance monitoring
– Frame monitoring
– User guidance


Stream programming is a natural fit for diverse HPEC systems:
– Dataflow model, composed of blocks and edges (see the sketch below)
– Blocks compute concurrently
– Data flows along the edges
– Languages: StreamIt, Streams-C, X

[Diagram: example dataflow graph with blocks A, B, C, D]
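To make the model concrete, here is a minimal C++ sketch (plain threads and queues, not Auto-Pipe/X, StreamIt, or Streams-C syntax) of three blocks connected by edges: each block computes concurrently in its own thread, and data flows along the queue edges.

    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>

    // An unbounded FIFO queue standing in for an edge between blocks.
    template <typename T>
    class Edge {
    public:
        void push(T v) {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(v));
            cv_.notify_one();
        }
        T pop() {
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            T v = std::move(q_.front());
            q_.pop();
            return v;
        }
    private:
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
    };

    int main() {
        Edge<int> ab, bc;                       // edges A->B and B->C
        std::thread A([&] {                     // block A: source
            for (int i = 1; i <= 10; ++i) ab.push(i);
            ab.push(-1);                        // sentinel: end of stream
        });
        std::thread B([&] {                     // block B: doubles each item
            for (;;) {
                int v = ab.pop();
                bc.push(v < 0 ? v : 2 * v);
                if (v < 0) break;
            }
        });
        std::thread C([&] {                     // block C: sink
            for (;;) {
                int v = bc.pop();
                if (v < 0) break;
                std::cout << v << '\n';
            }
        });
        A.join(); B.join(); C.join();
    }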

[Diagram: blocks A–D of the dataflow graph mapped onto a GPU, an FPGA, and the two cores of a CMP]

[Diagram: an alternative mapping of blocks A–D onto the CMP cores, the GPU, and the FPGA]


Programming model  | Strategy                              | Tools / Environments
Shared memory      | Execution profiling                   | gprof, Valgrind, PAPI
Message passing    | Execution profiling, message logging  | TAU, mpiP, PARAVER
Stream programming | Simulation                            | StreamIt [MIT], StreamC [Stanford], Streams-C [LANL], Auto-Pipe [WUSTL]


Limitations for diverse systems:
– No universal program counter (PC) or architecture
– No shared memory
– Different clocks
– Communication latency and bandwidth constraints


Simulation is a useful first step, but:
– Models can abstract away system details
– Too slow for large datasets
– HPEC applications are growing in complexity

We need to monitor the deployed, running application:
– Measure the actual performance of the system
– Validate performance on large, real-world datasets


– Report more than just aggregate statistics: capture rare events
– Quantify measurement impact where possible: overhead due to sampling, communication, etc.
– Measure runtime performance efficiently: low overhead, high accuracy
– Validate performance on real datasets
– Increase developer productivity


– Monitor edges / queues
– Find bottlenecks in the app: do they change over time? computation or communication?
– Measure latency between two points


[Diagram: dataflow graph with numbered monitoring points 1–6]


Interconnects are a precious resource:
– Monitoring uses the same interconnects as the application
– Stay below the bandwidth constraint
– Keep perturbation low

[Diagram: a Monitor Server collecting from a CPU agent beside the application code on the CMP and an FPGA agent beside the application logic on the FPGA]


– Understand measurement perturbation
– Dedicate compute resources when possible
– Aggressively reduce the amount of performance meta-data stored and transmitted
– Utilize compression in both the time resolution and the fidelity of data values (sketched below)
– Use knowledge from the user to specify their performance expectations / measurements
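A hedged sketch of the compression idea (the function names and the 8-bit choice are illustrative, not from the talk): quantizing each sample reduces fidelity, and averaging fixed windows of samples reduces time resolution, so n raw samples shrink to n/w bytes on the interconnect.

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Fidelity reduction: quantize a sample in [lo, hi] to 8 bits.
    uint8_t quantize(double v, double lo, double hi) {
        double t = std::clamp((v - lo) / (hi - lo), 0.0, 1.0);
        return static_cast<uint8_t>(t * 255.0);
    }

    // Time-resolution reduction: average each window of w samples,
    // then quantize the window mean.
    std::vector<uint8_t> compress(const std::vector<double>& samples,
                                  std::size_t w, double lo, double hi) {
        std::vector<uint8_t> out;
        for (std::size_t i = 0; i + w <= samples.size(); i += w) {
            double sum = 0;
            for (std::size_t j = i; j < i + w; ++j) sum += samples[j];
            out.push_back(quantize(sum / w, lo, hi));
        }
        return out;
    }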


– Use a CMP core as the monitor server: monitor other cores for performance information, process data from agents (e.g. FPGA, GPU), and combine hardware and software information into a global view
– Use logical clocks to synchronize events (sketched below)
– Dedicate unused FPGA area to monitoring
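The talk does not say which logical-clock scheme is used; a Lamport clock is the classic choice and suffices to order events from agents that share no physical clock. A minimal sketch:

    #include <algorithm>
    #include <cstdint>

    struct LamportClock {
        uint64_t time = 0;
        uint64_t local_event() { return ++time; }   // tick on a local event
        uint64_t on_send()     { return ++time; }   // stamp an outgoing message
        uint64_t on_receive(uint64_t msg_time) {    // merge clocks on receipt
            time = std::max(time, msg_time) + 1;
            return time;
        }
    };
    // The monitor server can then order events from different agents
    // consistently by (timestamp, agent id).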


Outline:
– Introduction
– Runtime performance monitoring
– Frame monitoring
– User guidance


A frame summarizes performance over a period of the execution:
– Maintains some temporal information
– Captures system performance anomalies

[Animation: successive frames 1 through 9 accumulating along a time axis]


– Each frame reports one performance metric
– Frame size can be dynamic: a dynamic bandwidth budget; low-variance data / application phases; trading temporal granularity for lower perturbation (see the sketch below)
– Frames from different agents will likely be unsynchronized and of different sizes
– The monitor server presents the user with a consistent global view of performance
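A minimal sketch of what a frame might look like in software (the field names are assumptions, not from the talk): samples of one metric are folded into a min/max/mean summary over a fixed period, and an agent can lengthen the period when recent frames show low variance.

    #include <algorithm>
    #include <cstdint>
    #include <limits>

    struct Frame {
        uint64_t start = 0, period = 1000;   // time span the frame covers
        double   min = std::numeric_limits<double>::infinity();
        double   max = -std::numeric_limits<double>::infinity();
        double   sum = 0;
        uint64_t count = 0;

        // Fold in one sample; returns false once t is past this frame,
        // signaling the agent to emit the frame and start the next one.
        bool add(uint64_t t, double v) {
            if (t >= start + period) return false;
            min = std::min(min, v);
            max = std::max(max, v);
            sum += v;
            ++count;
            return true;
        }
        double mean() const { return count ? sum / count : 0.0; }
    };
    // Dynamic sizing: if max - min has been small for several frames, grow
    // `period` (coarser time resolution, less perturbation); shrink it
    // again when the metric becomes noisy.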


Outline:
– Introduction
– Runtime performance monitoring
– Frame monitoring
– User guidance


Why user guidance? Related work: Performance Assertions for Mobile Devices [Lencevicius ’06] validates user performance assertions on a multi-threaded embedded CPU. Our system enables validation of performance expectations across diverse architectures.


1. Measurement
– The user specifies a set of “taps” for an agent; taps can be on an edge or an input queue
– The agent then records events on each tap

Supported measurements for a tap (one possible implementation is sketched below):
– Average value + standard deviation
– Min or max value
– Histogram of values
– Outliers (based on a parameter)

Basic arithmetic and logical operators on taps:
– Arithmetic: add, subtract, multiply, divide
– Logic: and, or, not
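One possible software realization of a tap (hedged: the talk gives no implementation, and the class shape, bin count, and outlier rule here are assumptions). Welford's algorithm yields the average and standard deviation in constant space per event; a fixed-bin histogram and a threshold test against the running mean cover the remaining measurements.

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <cstdint>

    class Tap {
    public:
        Tap(double bin_lo, double bin_hi, double outlier_cut)
            : lo_(bin_lo), hi_(bin_hi), cut_(outlier_cut) {}

        void record(double v) {
            ++n_;
            double d = v - mean_;
            mean_ += d / n_;
            m2_ += d * (v - mean_);            // Welford update
            min_ = std::min(min_, v);          // min/max measurement
            max_ = std::max(max_, v);
            int b = static_cast<int>((v - lo_) / (hi_ - lo_) * kBins);
            ++hist_[std::clamp(b, 0, kBins - 1)];
            if (std::abs(v - mean_) > cut_) ++outliers_;   // crude outlier test
        }
        double mean()   const { return mean_; }
        double stddev() const { return n_ > 1 ? std::sqrt(m2_ / (n_ - 1)) : 0.0; }
        uint64_t outliers() const { return outliers_; }

    private:
        static constexpr int kBins = 16;
        double lo_, hi_, cut_;
        uint64_t n_ = 0, outliers_ = 0;
        double mean_ = 0, m2_ = 0;
        double min_ =  1e300, max_ = -1e300;
        std::array<uint64_t, kBins> hist_{};
    };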


The measurement context lets the user successively refine what is measured:
– What is the throughput of block A?
– What is the throughput of block A when it is not data starved?
– What is the throughput of block A when it is not starved for data and there is no downstream congestion? (sketched below)

[Diagram: block A instrumented with a measurement context that reports to the runtime monitor]
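A hedged sketch of how an agent might implement the refined measurement (the per-cycle sampling interface and the names are illustrative): the throughput counter advances only while the context's predicates hold, so upstream starvation and downstream congestion are excluded from the measurement.

    #include <cstdint>

    struct ContextualThroughput {
        uint64_t items = 0, active = 0;

        // Called once per cycle (or sample tick) by the agent.
        void sample(bool produced, bool input_ready, bool output_blocked) {
            if (!input_ready || output_blocked) return;  // outside the context
            ++active;
            if (produced) ++items;
        }
        // Items per active cycle, i.e. the throughput of A itself.
        double throughput() const {
            return active ? static_cast<double>(items) / active : 0.0;
        }
    };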


1. Measurement
– A set of “taps” for the agent to count, histogram, or perform simple logical operations on
– Taps can be on an edge or an input queue

2. Performance assertion
– The user describes their performance expectations of an application as assertions
– The runtime monitor validates these assertions by collecting measurements and evaluating logical expressions
– Arithmetic operators: +, -, *, /
– Logical operators: and, or, not
– Annotations: t (event timestamp), L (queue length)


Throughput: “at least 100 A.Input events will be produced in any period of 1001 time units”
    t(A.Input[i+100]) – t(A.Input[i]) ≤ 1001

Latency: “A.Output is generated no more than 125 time units after A.Input”
    t(A.Output[i]) – t(A.Input[i]) ≤ 125

Queue bound: “A.InQueue never exceeds 100 elements”
    L(A.InQueue[i]) ≤ 100
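A sketch of how the runtime monitor could evaluate the throughput assertion above (the ring-buffer approach is an assumption, not necessarily the talk's implementation): keep the last 100 input timestamps and compare each new timestamp against the one 100 events earlier; the queue bound is a single comparison per observation.

    #include <cstdint>
    #include <vector>

    class ThroughputAssertion {
    public:
        // k = 100 events, bound = 1001 time units for the example above.
        ThroughputAssertion(uint64_t k, uint64_t bound)
            : k_(k), bound_(bound), ring_(k) {}

        // Call on each A.Input event; returns false on a violation.
        bool on_event(uint64_t t) {
            bool ok = true;
            if (n_ >= k_) {
                uint64_t t_old = ring_[n_ % k_];   // t(A.Input[i])
                ok = (t - t_old) <= bound_;        // t(A.Input[i+100]) - t(A.Input[i]) <= 1001
            }
            ring_[n_ % k_] = t;
            ++n_;
            return ok;
        }
    private:
        uint64_t k_, bound_, n_ = 0;
        std::vector<uint64_t> ring_;
    };

    // Queue bound: L(A.InQueue[i]) <= 100, checked per observation.
    inline bool queue_bound_ok(uint64_t len) { return len <= 100; }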


Runtime measurements:
– Query CMP/GPU performance counters
– Custom FPGA counters

Local assertions:
– Can be evaluated within a single agent
– No need for communication with other agents or the system monitor

Global assertions:
– Require aggregating results from more than one agent on different compute resources


– Some assertions impose prohibitive memory requirements: either disallow these or warn the user of the large monitoring impact
– Other assertions are compute intensive
– A few are both!
– Fortunately, much can be gained from simple queries, e.g. input queue lengths over time


– FPGA agent mostly operational: monitoring only, no user assertions yet
– Initial target application is BLAST biosequence analysis on a CPU + FPGA hardware platform [Jacob, et al. TRETS ’08]
– Next target application is computational finance on CPU + GPU + FPGA, where measured performance is significantly worse than the models predict


Runtime performance monitoring enables:
– More efficient development
– Better testing for real-time systems

Future work:
– Support correctness assertions
– Investigate ways to best present results to the developer
