NAVO MSRC PET Program Towards More Meaningful Machine Comparisons Dr. Allan Snavely PMaC (Performance Modeling & Characterization) Group Leader www.sdsc.edu/PMaC SDSC

Page 1

NAVO MSRC PET Program

Towards More Meaningful Machine Comparisons

Dr. Allan Snavely

PMaC (Performance Modeling & Characterization) Group Leader

www.sdsc.edu/PMaC

SDSC

Page 2

PMaC Mission

• To bring scientific rigor to the art of performance prediction

– for procurement

– for architectural tradeoffs

– for guiding applications to the best-suited machine

– for performance tuning

Page 3

PMaC Mission

• To bridge the gap between benchmarks and cycle-accurate simulation

– Benchmarks have dubious relevancy to real apps, particularly on future machines

– Cycle-accurate simulations take too long

Page 4

Projects

• MAPS (Memory Access Patterns)

– memory subsystem & interconnect signatures

• MetaSim

– an on-the-fly simulator for playing “what if?” (4 orders of magnitude faster than cycle-accurate simulation)

• Pseudocode Cache Simulator

• Scientific Application Loop Set

• Terascale Application Information

• IDC HPC List

Page 5

People

• Dr. Allan Snavely, Group Leader

– Dr. Laura Carrington, Xiaofeng Gao (MAPS)

– Dr. Stuart Johnson (Pseudocode simulator)

– Dr. Larry Carter (senior technical advisor)

– Dr. Wayne Pfeiffer (Scientific Application Loop Set)

– Nicole Wolter (Paraver/Dimemas)

– Dr. Bob Leary (resident mathematician)

Page 6

What’s wrong with benchmarks?

• May anti-correlate to actual performance [1]

[1] John L. Gustafson and Rajat Todi, “Conventional Benchmarks as a Sample of the Performance Spectrum,” Ames Laboratory, USDOE

Page 7

PMaC Methods

• Performance modeling via separation of concerns

– Machine signatures

– Application profiles

– Convolution methods

Page 8

[Figure: Memory Bandwidth vs. Size for Loads on Blue Horizon. x-axis: Size (W), log scale from 1000 to 10000000; y-axis: Bandwidth (MB/s), 0 to 7000; series: 1, 2, 3, and 4 streams.]

Page 9

[Figure: Memory Bandwidth vs. Size for Loads on Blue Horizon, annotated with the memory hierarchy. x-axis: Size (W), log scale from 1000 to 10000000; y-axis: Bandwidth (MB/s), 0 to 7000; series: 1, 2, 3, and 4 streams.]

Annotations on the plot:

• L1: 8192 word, 128 way, 16 block

• L2: 1048576 word, 4 way, 16 block

• TLB: 131072 word, 4KB pages, 2 way
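One way to read the annotated plot is to ask which level of the hierarchy a given working set still fits in. The sketch below assumes (this is an interpretation, not something the slide states) that each bandwidth plateau ends where the working set outgrows an annotated capacity. The capacities come from the slide; the function name `smallest_level_holding` and the level labels are hypothetical.

```python
# Capacities in words, taken from the slide annotations for Blue Horizon.
L1_WORDS = 8192
TLB_REACH_WORDS = 131072
L2_WORDS = 1048576


def smallest_level_holding(size_words: int) -> str:
    """Return the smallest annotated level whose capacity covers the working set.

    Assumption: a working set larger than every annotated capacity is served
    from main memory.
    """
    if size_words <= L1_WORDS:
        return "L1"
    if size_words <= TLB_REACH_WORDS:
        return "L2 (within TLB reach)"
    if size_words <= L2_WORDS:
        return "L2 (beyond TLB reach)"
    return "main memory"


if __name__ == "__main__":
    # The x-axis tick values from the plot.
    for size in (1000, 10000, 100000, 1000000, 10000000):
        print(size, smallest_level_holding(size))
```

Running it over the x-axis tick values gives a quick guess at which region of the curve each tick falls in.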

Page 10

[Figure: Memory Bandwidth vs. Size for Loads on BH, T3E, SX-4. x-axis: Size (W), log scale from 1000 to 10000000; y-axis: Bandwidth (MB/s), 0 to 8000; series: BH 1 stream, T3E 1 stream, SX-4.]

Page 11

MAPS

• Useful in its own right for more meaningful machine comparisons at a glance

• Work going forward to port to Compaq TCS1, SX-5, T90, SV1, MTA, Sun HPC 10K, Origin, others?

• Provides input to MetaSim (next)

Page 12

Meta-Sim

A meta-simulator tool

Page 13

Meta-Sim

• Takes 2 inputs

– a program

– a description of a machine

• Consumes instrumented trace data “on-the-fly”

– 100-fold slowdown (as opposed to 1M-fold!)

• Performs an automated predictive convolution

Page 14

Meta-Sim

• Models caches and TLB

– any number of levels

– arbitrary sizes, line lengths, associativities

• Does accounting on the Basic Block level

• Looks for memory access patterns
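To make "any number of levels, arbitrary sizes, line lengths, associativities" concrete, here is a minimal sketch of a multi-level set-associative cache model driven by an address trace. It is not MetaSim's implementation: the `Cache` class, the `simulate` function, and the LRU replacement policy are assumptions chosen for illustration, and all sizes are in words.

```python
from collections import OrderedDict


class Cache:
    """One set-associative cache level with LRU replacement (sizes in words)."""

    def __init__(self, size_words, line_words, ways):
        self.line_words = line_words
        self.ways = ways
        self.num_sets = size_words // (line_words * ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, word_addr):
        """Look up one word address; return True on a hit."""
        line = word_addr // self.line_words
        lines = self.sets[line % self.num_sets]
        if line in lines:
            lines.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[line] = True
        return False


def simulate(levels, addresses):
    """Send each address down the hierarchy until some level hits."""
    for addr in addresses:
        for cache in levels:
            if cache.access(addr):
                break


if __name__ == "__main__":
    # Geometry borrowed from the Blue Horizon annotations earlier in the talk.
    l1 = Cache(size_words=8192, line_words=16, ways=128)
    l2 = Cache(size_words=1048576, line_words=16, ways=4)
    # Two unit-stride passes over 16384 words: twice the L1 capacity.
    trace = list(range(16384)) * 2
    simulate([l1, l2], trace)
    print("L1 hits/misses:", l1.hits, l1.misses)
    print("L2 hits/misses:", l2.hits, l2.misses)
```

Because `levels` is just a list, adding a TLB model or a third cache level is a matter of appending another object with an `access` method. Per-basic-block accounting would need the trace to carry a block identifier alongside each address.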

Page 15

A (simplistic) Convolution

$$\mathrm{MFLOPS} = \sum_{i=1}^{n} \mathrm{Wt.\,BB}_i \times \mathrm{Rate\,BB}_i \times \mathrm{Intensity\,BB}_i$$

Wt. BB_i = % of total memory references

Rate BB_i = sustained rate of memory references

Intensity BB_i = ratio of floating point ops to memory ops
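The convolution on this slide is a weighted sum over basic blocks, which can be sketched in a few lines. The `BasicBlock` type, the `predicted_mflops` function, and the sample numbers below are hypothetical. The sketch also assumes the weight is expressed as a fraction (0 to 1) rather than a percentage, and the rate in millions of memory references per second, so that the result comes out in MFLOPS.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    weight: float     # fraction of total memory references (0..1)
    rate: float       # sustained memory-reference rate, M references/s
    intensity: float  # floating-point ops per memory op


def predicted_mflops(blocks):
    """Sum weight * rate * intensity over all basic blocks."""
    return sum(bb.weight * bb.rate * bb.intensity for bb in blocks)


if __name__ == "__main__":
    # Made-up profile: one cache-friendly block and one memory-bound block.
    profile = [
        BasicBlock(weight=0.6, rate=400.0, intensity=0.5),
        BasicBlock(weight=0.4, rate=50.0, intensity=1.0),
    ]
    print("predicted MFLOPS:", predicted_mflops(profile))
```

In this framing the weights and intensities come from the application profile and the rates come from the machine signature, which is the separation of concerns the talk describes.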

Page 16

How to determine rate of memory access for BB?

• sum = sum + a(k)*b(colidx(k))

• Even if only 33% of memory references in a BB fall out to main memory (MM), they may slow down the whole BB to the speed of MM accesses

• Why?
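One plausible reading of the question: the loop above issues three loads per iteration (`a(k)`, `colidx(k)`, and the indirect `b(colidx(k))`), so the indirect load alone would account for about a third of the references. If references are serviced one after another with no overlap (an assumption), the time per reference adds up, and the effective rate is a harmonic mean dominated by the slowest accesses. The rates below are hypothetical, chosen only to show the effect.

```python
def effective_rate(mix):
    """Time-weighted (harmonic) mean rate for a mix of (fraction, rate) pairs.

    Assumption: references are serviced one after another with no overlap,
    so the average time per reference is the sum of fraction / rate.
    """
    time_per_reference = sum(fraction / rate for fraction, rate in mix)
    return 1.0 / time_per_reference


if __name__ == "__main__":
    cache_rate = 1000.0  # hypothetical, M references/s
    memory_rate = 50.0   # hypothetical, M references/s
    mix = [(0.67, cache_rate), (0.33, memory_rate)]
    naive = sum(fraction * rate for fraction, rate in mix)
    print("arithmetic mean of rates:", naive)
    print("time-weighted rate:", effective_rate(mix))
```

With these numbers the time-weighted rate lands within a small factor of the main-memory rate, far below what an arithmetic average of the rates would suggest.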

Page 17

Results

[Figure: NAS FP Kernels. Bar chart of Predicted MFLOPS vs. Observed MFLOPS (y-axis: MFLOPS, 0 to 300) for cg s, cg w, cg a, ft s, ft w, mg s, mg w.]

Page 18

Results

[Figure: Error for NAS FP Kernels. y-axis: % error, -2.00% to 5.00%; x-axis: cg s, cg w, cg a, ft s, ft w, mg s, mg w.]

Page 19

Occam’s Razor

• Only add complexity if required to explain observed phenomena

• Observation: this approach is just as accurate as SMTSIM (Tullsen, Snavely, et al.) but 4 orders of magnitude faster!

Page 20

[Figure: Conventional Benchmarks as a Sample of the Performance Spectrum. x-axis: log MW, 1000 to 10000000; y-axis: log MW/s, 1 to 1000; curves: Random Loads, Random Stores; labeled points: FT S, FT W, CG S, CG W, CG A, MG S, MG W; annotations: 90% L1, 80% L1.]

Page 21

Apps Results

[Figure: NAS Apps. Bar chart of Predicted MFLOPS vs. Observed MFLOPS (y-axis: MFLOPS, 0 to 600) for bt s, lu s, lu w, sp s, sp w.]

Page 22

Apps Results

[Figure: Error for NAS Apps. y-axis: % error, -10.00% to 30.00%; x-axis: bt s, lu s, lu w, sp s, sp w.]

Page 23

[Figure: Apps as a Sample of the Performance Spectrum (?). x-axis: log MW, 1000 to 10000000; y-axis: log MW/s, 1 to 1000; curves: Random Loads, Random Stores; labeled points: BT S, LU S/W, SP S, SP W; annotation: 90% L1.]

Page 24

Work going forward

• Development of probes à la MAPS for floating point and integer functional unit issue, logical operations, I/O

• Increase sophistication of convolutions as required to fit observed facts

• Big goal: a robust set of metrics and methods for performance modeling and characterization

Page 25

PMaC Thanks Our Sponsors

• Now includes DOE SciDAC award (SUPREME)

• Support from HPC Users Forum

• DoD HPC Modernization was the first to fund us, and their vision made this work possible