Discovering and Exploiting Program Phases
Timothy Sherwood, Erez Perelman, Greg Hamerly, Suleyman Sair, Brad Calder
CSE 231 Presentation by Justin Ma
400 Million Instructions
[Figure: a SPEC2000 benchmark compiled by a new compiler and run on a simulator of a new, not-yet-existing processor]
400 Million Instructions
• Suppose you have a time budget…
• Less than half a second of execution time
• What would you simulate?
– Beginning?
– Middle?
– End?
400 Million Instructions
[Figure: execution behavior of gzip and gcc]
Programs exhibit diverse modes of behavior
400 Million Instructions
• Suppose you have a time budget…
• Less than half a second of execution time
• What would you simulate?
– Beginning?
– Middle?
– End?
– Samples of different modes of behavior
Program Phases
• Observation: programs exhibit various modes of periodic behavior
• These modes are program phases
• Challenge: Extract these automatically
Phase Basics
• Intervals – slices of execution time
• Phases – intervals with similar behavior
[Figure: IPC versus time (instruction count)]
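A minimal Python sketch of cutting an execution into fixed-size intervals, assuming a hypothetical trace format of (block_id, instruction_count) pairs in execution order. The function name and trace format are illustrative only, not taken from the authors' tools.

```python
from typing import List, Tuple


def split_into_intervals(trace: List[Tuple[str, int]], interval_size: int) -> List[List[str]]:
    """Group a dynamic basic-block trace into consecutive intervals.

    A new interval starts once the current one has executed at least
    `interval_size` instructions, so an interval can overshoot slightly
    when a block straddles the boundary.
    """
    intervals: List[List[str]] = []
    current: List[str] = []
    executed = 0
    for block_id, n_instrs in trace:
        current.append(block_id)
        executed += n_instrs
        if executed >= interval_size:
            intervals.append(current)
            current, executed = [], 0
    if current:  # keep the trailing partial interval
        intervals.append(current)
    return intervals


# Hypothetical trace: (block id, instructions in that block)
trace = [("B1", 3), ("B2", 2), ("B4", 1), ("B1", 3), ("B3", 4), ("B4", 1)]
print(split_into_intervals(trace, interval_size=6))
# [['B1', 'B2', 'B4'], ['B1', 'B3'], ['B4']]
```

Ending an interval only at a block boundary keeps each block's execution whole, at the cost of intervals that are not exactly `interval_size` long.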
Defining “Similar Behavior”
• Metric for comparing intervals?
– Cache misses?
– IPC?
– Branch misprediction rates?
• Problem: performance alone is too architecture-dependent
Defining “Similar Behavior”
• Code path traversal
– Directly affects time-varying behavior
– Same code executed, similar performance
– Architecture-independent
• Metrics for code path traversal
– Frequency of branches
– Frequency of function calls
– Frequency of basic block executions
Basic Block Vector
[Figure: control-flow graph of four basic blocks, with B1 at the top, B2 and B3 side by side below it, and B4 at the bottom]
A basic block vector (BBV) counts how often each basic block executes during an interval. Counts for B1 B2 B3 B4 as execution proceeds:
Time t: 0 0 0 0 → 1 1 0 1 → 2 1 1 2
Time t + 1: 0 0 0 0 → 1 1 0 1 → 2 2 0 2
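Comparing the final vectors at time t (2 1 1 2) and time t + 1 (2 2 0 2), only B2 and B3 differ:
$d_{\text{Manhattan}} = |1 - 2| + |1 - 0| = 2$
$d_{\text{Euclidean}} = \sqrt{(1 - 2)^2 + (1 - 0)^2} = \sqrt{2}$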
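The BBV construction and both distances can be sketched in Python as below. The control-flow edges (B1 branching to B2 or B3, both rejoining at B4) are inferred from the counts on the slides, and the helper names are hypothetical. Like the slides, this version uses raw execution counts.

```python
import math
from collections import Counter
from typing import List, Sequence

BLOCKS = ["B1", "B2", "B3", "B4"]  # the four basic blocks in the example


def basic_block_vector(interval: Sequence[str], blocks: Sequence[str] = BLOCKS) -> List[int]:
    """Count how many times each basic block executes during one interval."""
    counts = Counter(interval)
    return [counts[b] for b in blocks]


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Assumed paths matching the slide counts:
# interval t takes B1->B2->B4 once and B1->B3->B4 once;
# interval t + 1 takes B1->B2->B4 twice.
bbv_t = basic_block_vector(["B1", "B2", "B4", "B1", "B3", "B4"])   # [2, 1, 1, 2]
bbv_t1 = basic_block_vector(["B1", "B2", "B4", "B1", "B2", "B4"])  # [2, 2, 0, 2]
print(manhattan(bbv_t, bbv_t1))  # 2
print(euclidean(bbv_t, bbv_t1))  # 1.4142135623730951, i.e. sqrt(2)
```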
Basic Block Similarity Matrix
[Figures: basic block similarity matrices for gzip and gcc]
BBV similarity between intervals reflects performance similarity
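A similarity matrix like the ones plotted for gzip and gcc can be computed by comparing every pair of interval BBVs. This Python sketch assumes each BBV is first normalized to sum to 1, so intervals of different lengths stay comparable, and uses Manhattan distance; both choices are assumptions here. An entry (i, j) near 0 means intervals i and j executed a similar mix of code.

```python
from typing import List, Sequence


def normalize(bbv: Sequence[float]) -> List[float]:
    """Scale a BBV so its entries sum to 1 (an assumed preprocessing step)."""
    total = sum(bbv)
    return [x / total for x in bbv] if total else [0.0] * len(bbv)


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity_matrix(bbvs: Sequence[Sequence[float]]) -> List[List[float]]:
    """Entry (i, j) is the Manhattan distance between normalized BBVs i and j."""
    vecs = [normalize(b) for b in bbvs]
    return [[manhattan(u, v) for v in vecs] for u in vecs]


# Hypothetical BBVs for three intervals; interval 2 repeats interval 0's
# code mix at twice the length.
m = similarity_matrix([[2, 1, 1, 2], [2, 2, 0, 2], [4, 2, 2, 4]])
print(round(m[0][1], 3), round(m[0][2], 3))  # 0.333 0.0
```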
Automatic Phase Classification
• Classify intervals into phases
– We do not know a priori which BBVs correspond to particular phases
• k-means clustering
– Iterative clustering algorithm
– Dimension reduction: random linear projection
– Try different values of k; use the Bayesian Information Criterion (BIC) to choose the best
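The classification step can be sketched in Python: a random linear projection to shrink the BBVs, then plain Lloyd's k-means. The uniform [-1, 1] projection entries, the seeds, and the function names are assumptions for illustration; the authors' tool may differ in initialization and projection details. Running it for several values of k and scoring each result with BIC would complete the procedure on the slide.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_projection(vectors: Sequence[Sequence[float]], dims: int, seed: int = 0) -> List[Vector]:
    """Multiply every vector by one shared random matrix (entries uniform in [-1, 1])."""
    rng = random.Random(seed)
    n_in = len(vectors[0])
    matrix = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n_in)]
    return [[sum(v[i] * matrix[i][d] for i in range(n_in)) for d in range(dims)]
            for v in vectors]


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    """Lloyd's algorithm: assign points to the nearest centroid, then move each
    centroid to the mean of its points, until the assignment stops changing."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 4-block BBVs forming two obvious groups.
bbvs = [[9, 1, 0, 0], [8, 2, 0, 0], [9, 0, 1, 0],
        [0, 0, 9, 1], [0, 1, 8, 1], [0, 0, 9, 2]]
projected = random_projection(bbvs, dims=2)
labels, centroids = kmeans(projected, k=2)
print(labels)
```

Random projection keeps the clustering cost independent of the number of distinct basic blocks, which can be very large for real programs.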
Automatic Phase Classification
Clustering accurately distinguishes phases automatically
SimPoint
• Simulate large programs on a budget
• Perform detailed simulation on representative code snippets
– From each phase, choose the interval closest to the phase's centroid (10 million instructions)
• Extrapolate whole-program performance
– Weight each representative by the frequency of its phase
• Simulate 400 million instructions total
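The selection and extrapolation steps can be sketched in Python, assuming equal-length intervals and cluster labels and centroids from a clustering step such as k-means. The sketch combines per-phase CPI rather than IPC: with equal-length intervals, whole-program CPI is the plain average of per-interval CPI, so weighting each representative's CPI by its phase's share of intervals estimates it, whereas averaging IPC the same way would be biased. All names and numbers are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def pick_simulation_points(points: Sequence[Sequence[float]], labels: Sequence[int],
                           centroids: Sequence[Sequence[float]]) -> Dict[int, int]:
    """For each phase, return the index of the interval closest to its centroid."""
    reps: Dict[int, int] = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps[c] = min(members, key=lambda i: sq_dist(points[i], centroid))
    return reps


def phase_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Fraction of all intervals that fall in each phase."""
    counts = Counter(labels)
    return {c: n / len(labels) for c, n in counts.items()}


def estimate_cpi(rep_cpi: Dict[int, float], weights: Dict[int, float]) -> float:
    """Phase-weighted average of each representative interval's simulated CPI."""
    return sum(weights[c] * cpi for c, cpi in rep_cpi.items())


# Hypothetical 1-D interval features, labels, and centroids from a clustering step.
points = [[0.0], [1.0], [4.0], [10.0], [11.0], [13.0]]
labels = [0, 0, 0, 1, 1, 1]
centroids = [[5 / 3], [34 / 3]]
reps = pick_simulation_points(points, labels, centroids)
print(reps)  # {0: 1, 1: 4}
weights = phase_weights(labels)
# Pretend detailed simulation of intervals 1 and 4 produced these CPIs.
cpi = estimate_cpi({0: 0.5, 1: 2.0}, weights)
print(cpi, 1 / cpi)  # 1.25 0.8
```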
SimPoint
Accurate estimate despite the instruction budget
Why SimPoint Succeeds
• Program behavior varies over time
• SimPoint intelligently chooses which intervals to simulate
• Regularity within program phases allows accurate extrapolation
Online Classification
• Detect phases while the program is running
• Applications
– Thread scheduling
– Power management
– Predicting future phases
• Challenges
– Single pass over the input
– Limited storage
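One way to meet the single-pass, limited-storage constraints is sketched below in Python: hash each executed branch into a small accumulator table weighted by instruction count, and at each interval boundary compare the normalized signature with a bounded table of past phase signatures. The hash (PC modulo table size), the table sizes, and the distance threshold are all assumptions; this is not presented as the authors' hardware design.

```python
from typing import List


class OnlinePhaseClassifier:
    """Hypothetical single-pass phase classifier with fixed-size state."""

    def __init__(self, buckets: int = 32, max_phases: int = 8, threshold: float = 0.5):
        self.buckets = buckets
        self.max_phases = max_phases
        self.threshold = threshold  # max Manhattan distance to reuse a phase ID
        self.acc = [0] * buckets    # accumulator for the current interval
        self.signatures: List[List[float]] = []  # one stored signature per phase

    def record(self, branch_pc: int, n_instrs: int) -> None:
        """Credit n_instrs instructions to the bucket that branch_pc hashes to."""
        self.acc[branch_pc % self.buckets] += n_instrs

    def end_interval(self) -> int:
        """Classify the interval that just finished and return its phase ID."""
        total = sum(self.acc)
        sig = [x / total for x in self.acc] if total else [0.0] * self.buckets
        self.acc = [0] * self.buckets
        best, best_dist = -1, float("inf")
        for pid, old in enumerate(self.signatures):
            dist = sum(abs(a - b) for a, b in zip(sig, old))
            if dist < best_dist:
                best, best_dist = pid, dist
        if best >= 0 and best_dist <= self.threshold:
            return best                      # close enough to a known phase
        if len(self.signatures) < self.max_phases:
            self.signatures.append(sig)      # start a new phase
            return len(self.signatures) - 1
        return best                          # table full: reuse the nearest phase


clf = OnlinePhaseClassifier(buckets=8, max_phases=4, threshold=0.2)
intervals = [[(0x10, 5), (0x14, 5)], [(0x10, 5), (0x14, 5)],
             [(0x21, 10)], [(0x10, 5), (0x14, 6)]]
phases = []
for interval in intervals:
    for pc, n in interval:
        clf.record(pc, n)
    phases.append(clf.end_interval())
print(phases)  # [0, 0, 1, 0]
```

Storage stays fixed at `buckets` counters plus `max_phases` signatures no matter how long the program runs.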
Online Classification
High variance in metrics across the full trace
Low variance within the discovered phases shows that online classification succeeds in finding phases
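The variance comparison can be made concrete with the coefficient of variation (standard deviation divided by mean) of a metric such as IPC, computed over the whole trace and then within each phase. Using this particular statistic is an assumption here, and the IPC values and phase IDs below are invented for illustration.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    return statistics.pstdev(values) / mean if mean else 0.0


def per_phase_cov(metric: Sequence[float], phases: Sequence[int]) -> Dict[int, float]:
    """Coefficient of variation of the metric among the intervals of each phase."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for value, phase in zip(metric, phases):
        groups[phase].append(value)
    return {phase: coefficient_of_variation(vals) for phase, vals in groups.items()}


# Invented per-interval IPC values and online phase IDs.
ipc = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
phases = [0, 0, 0, 1, 1, 1]
whole = coefficient_of_variation(ipc)
within = per_phase_cov(ipc, phases)
print(round(whole, 2), {p: round(c, 2) for p, c in within.items()})
# Each per-phase value is well below the whole-trace value.
```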
Conclusions
• Phases are a vital abstraction
– Performance varies greatly within a program
– Attributable to different modes of behavior
• Phases can be discovered automatically
– Offline: k-means clustering
– Online
• Code path characterization
– Strong correlation with actual performance
– SimPoint exploits this with great success
Outline
• Introduction (motivation)
• Basics (definitions, BBV, basic block similarity matrix)
• Offline Phase Classification
– SimPoint
• Online Phase Classification
• Conclusions
Limitations of Clustering
Bayesian Information Criterion
• Fit to Gaussians
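Fitting the clusters to Gaussians gives a likelihood that BIC trades off against model size. The Python sketch below uses a simplified form, assuming every cluster is a spherical Gaussian with one shared variance estimated by maximum likelihood: the score is the log-likelihood minus (p/2) log R, where R is the number of intervals and p counts the mixing weights, centroids, and variance. This is an illustrative formulation, not necessarily the exact score SimPoint computes.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bic_spherical(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """BIC of a clustering under a shared-variance spherical Gaussian model."""
    R, M = len(points), len(points[0])
    counts = Counter(labels)
    k = len(counts)  # number of non-empty clusters
    # Centroid of each cluster = mean of its member points.
    sums: Dict[int, List[float]] = {c: [0.0] * M for c in counts}
    for p, lab in zip(points, labels):
        for d in range(M):
            sums[lab][d] += p[d]
    centroids = {c: [s / counts[c] for s in sums[c]] for c in counts}
    sse = sum(sum((x - m) ** 2 for x, m in zip(p, centroids[lab]))
              for p, lab in zip(points, labels))
    var = max(sse / (R * M), 1e-12)  # MLE of the shared variance, floored to avoid log(0)
    log_lik = (-R * M / 2 * math.log(2 * math.pi * var)
               - sse / (2 * var)
               + sum(n * math.log(n / R) for n in counts.values()))  # mixing weights
    n_params = (k - 1) + k * M + 1  # mixing weights, centroids, variance
    return log_lik - n_params / 2 * math.log(R)


pts = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
one = bic_spherical(pts, [0] * 8)
two = bic_spherical(pts, [0, 0, 0, 0, 1, 1, 1, 1])
print(two > one)  # True: two clusters fit far better than one
```

Scoring the k-means result for each candidate k this way and keeping the highest score is one simple selection rule.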
Self-Modifying Code
Program Phases
Learning Phases