1
A Case for Vertical Profiling
Peter Sweeney, Michael Hind
IBM Thomas J. Watson Research Center
Matthias Hauswirth, Amer Diwan
University of Colorado at Boulder
2
Finding Causes of Performance Phenomena
(Figure: software layers that can contribute to a performance phenomenon)
C program: Application, Operating System, Hardware
Java / .NET program: Application, Framework, Java Library, Virtual Machine, Native Library, Operating System, Hardware
3
Warehouse Transactions
Methodology
• Benchmark: SPECjbb2000
• Virtual machine: JikesRVM
(Figure: timeline of the run: initialization, then 1 thread executing 120,000 transactions, 50 transactions per time slice)
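The methodology numbers can be cross-checked with a little arithmetic. The sketch below is illustrative only: it treats 50 transactions per time slice as exact (the benchmark backup slide says "ca. 50") and assumes the 10 ms time slice stated there; the class and method names are hypothetical.

```java
/** Back-of-the-envelope arithmetic for the warehouse-thread run (hypothetical helper). */
final class TransactionBudget {
    private TransactionBudget() {}

    /** Time slices needed for the given transactions, rounding a partial last slice up. */
    static long timeSlices(long transactions, long transactionsPerSlice) {
        return (transactions + transactionsPerSlice - 1) / transactionsPerSlice;
    }

    /** Steady-state duration in milliseconds, assuming every slice lasts sliceMillis. */
    static long steadyStateMillis(long transactions, long transactionsPerSlice, long sliceMillis) {
        return timeSlices(transactions, transactionsPerSlice) * sliceMillis;
    }

    public static void main(String[] args) {
        long slices = timeSlices(120_000, 50);             // 2,400 slices
        long millis = steadyStateMillis(120_000, 50, 10);  // 24,000 ms
        System.out.println(slices + " slices, about " + millis / 1000 + " s");
    }
}
```

The result, about 24 seconds, agrees with the steady-state phase duration (ca. 24 sec) reported for the benchmark.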
4
Expected Performance of Warehouse Thread
(Figure: expected IPC of the warehouse thread; y-axis: Inst/Cyc; x-axis: cycles, from 9,792 million to 39,816 million)
5
Observed Performance of Warehouse Thread
(Figure: observed IPC of the warehouse thread; y-axis: Inst/Cyc, 0 to 0.622; x-axis: cycles, from 9,792 million to 39,816 million)
6
Investigation: Why This Difference?
• Correlate IPC with more than 100 other hardware performance metrics
  – No significant overall correlation
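One way to carry out this step is to compute a Pearson correlation coefficient between the IPC series and each other metric, with one value per time slice. The slides do not say which statistic was used, so this is an assumption, and the helper names are hypothetical. A single whole-run coefficient can also hide effects confined to a few slices, which is one plausible reading of "no significant overall correlation".

```java
import java.util.Map;

/** Pearson correlation between per-time-slice metric series (one value per slice). */
final class MetricCorrelation {
    private MetricCorrelation() {}

    /** Returns r in [-1, 1], or NaN if either series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series with >= 2 samples");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            return Double.NaN; // correlation is undefined for a constant series
        }
        return cov / Math.sqrt(varX * varY);
    }

    /** Name of the metric whose |r| with IPC is largest, or null if none is defined. */
    static String strongestCorrelate(double[] ipc, Map<String, double[]> metrics) {
        String best = null;
        double bestAbs = -1;
        for (Map.Entry<String, double[]> e : metrics.entrySet()) {
            double r = Math.abs(pearson(ipc, e.getValue()));
            if (!Double.isNaN(r) && r > bestAbs) {
                bestAbs = r;
                best = e.getKey();
            }
        }
        return best;
    }
}
```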
7
Investigation: Correlate with GC Activity
(Figure: Inst/Cyc, 0 to 0.622, over cycles 9,792 million to 39,816 million, shown against GC activity)
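A simple way to test the GC hypothesis is to split time slices by their position relative to GC and compare average IPC. The sketch below is hypothetical: it assumes each slice's start and end cycles, its IPC, and the GC start cycles are available, and the window length `windowCycles` is a free parameter that the slides do not specify.

```java
import java.util.List;

/** Hypothetical sketch: compare IPC in slices just before a GC with all other slices. */
final class PreGcWindow {
    /** One time slice of the warehouse thread: [startCycle, endCycle) and its IPC. */
    record Slice(long startCycle, long endCycle, double ipc) {}

    private PreGcWindow() {}

    /** True if the slice ends at most windowCycles before some GC start. */
    static boolean isPreGc(Slice s, List<Long> gcStartCycles, long windowCycles) {
        for (long gc : gcStartCycles) {
            if (s.endCycle() <= gc && gc - s.endCycle() <= windowCycles) {
                return true;
            }
        }
        return false;
    }

    /** Returns {mean IPC of pre-GC slices, mean IPC of all other slices}; NaN if a group is empty. */
    static double[] meanIpc(List<Slice> slices, List<Long> gcStartCycles, long windowCycles) {
        double preSum = 0, otherSum = 0;
        int preN = 0, otherN = 0;
        for (Slice s : slices) {
            if (isPreGc(s, gcStartCycles, windowCycles)) {
                preSum += s.ipc();
                preN++;
            } else {
                otherSum += s.ipc();
                otherN++;
            }
        }
        return new double[] {
            preN == 0 ? Double.NaN : preSum / preN,
            otherN == 0 ? Double.NaN : otherSum / otherN
        };
    }
}
```

The means are unweighted; weighting each slice by its cycle count would track aggregate IPC more closely when slices differ in length.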
8
Phenomenon: Pre-GC Dip
(Figure: Inst/Cyc, 0 to 0.622, over cycles 9,792 million to 39,816 million, highlighting the dip in IPC before GC)
9
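Phenomenon: Pre-GC Dip, Correlate with OS-Level Metric
(Figure: Inst/Cyc, 0 to 0.622, and EEOff/Cyc, 0 to 0.219, over cycles 9,792 million to 39,816 million; during the dip, Inst/Cyc falls by about 6% while EEOff/Cyc rises by about 300%)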
10
Phenomenon: Pre-GC Dip, Next Steps
• We have not found the root cause yet…
• Need metrics from different levels:
  – Allocation
  – Synchronization
  – System calls
  – Interrupts
11
Observed Performance
(Figure: Inst/Cyc, 0 to 0.622, over cycles 9,792 million to 39,816 million)
12
Phenomenon: Continuous Increase
(Figure: Inst/Cyc, 0 to 0.622, over cycles 9,792 million to 39,816 million, highlighting the gradual increase in IPC over the run)
13
Phenomenon: Continuous Increase, Correlate with HW-Level Metric
(Figure: Inst/Cyc, 0 to 0.622, and LsuFlush/Cyc, 0 to 0.037, over cycles 9,792 million to 39,816 million)
14
Phenomenon: Continuous Increase, Correlate with VM-Level Metric

| Metric | Non-Opt | AOS (start) | AOS (end) | Opt |
|---|---|---|---|---|
| IPC | 0.3479 | 0.4091 | 0.4890 | 0.5082 |
| LsuFlush/Cyc | 0.0533 | 0.0250 | 0.0017 | 0.0007 |
15
Phenomenon: Continuous Increase, Next Steps
• We have not verified the root cause yet…
• Need metrics from different levels:
  – Recompilation activity
  – Time spent executing non-optimized vs. optimized code
16
Vertical Profiling
• Gather data about multiple levels: Application, Framework, Java Library, Virtual Machine, Native Library, Operating System, Hardware
(Figure labels: Pre-GC Dip, Continuous increase)
17
Vertical Event Trace
(Figure: events plotted over time (x-axis) against semantic level (y-axis), from Hardware, OS, Native libs, VM, Java libs and Framework up to Application)
18
Challenges & Possible Approaches
• Huge difference in event frequencies
  – E.g. 7 GCs, but 20 billion instructions completed
  – Idea: Count high-frequency events, trace low-frequency events
• Large number of possible metrics
  – Trace everything: impossible to anticipate, too expensive
  – Write many specialized profilers: error-prone, large effort
  – Idea: Generate profilers from a specification
• Overhead
  – E.g. tracing every memory access is very expensive
  – Idea: Provide tunable profiling parameters to minimize overhead
• Perturbation
  – E.g. instrumenting every memory access perturbs the hardware performance monitors (HPMs)
  – Idea: Use separate runs for interfering metrics
• Separate traces
  – E.g. handling non-determinism
  – Idea: Combine traces, using intervals to summarize them
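The first idea, counting high-frequency events and tracing low-frequency ones, can be sketched as follows. This is an illustration under stated assumptions, not the authors' tracer: the class and method names are hypothetical, and a real implementation would read instruction counts from hardware counters rather than receive them through a method call.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "count high-frequency events, trace low-frequency events":
 * frequent events only bump a counter, while rare events are appended to a trace
 * together with a snapshot of that counter.
 */
final class HybridProfiler {
    /** A traced low-frequency event, stamped with the counter value when it occurred. */
    record TracedEvent(String name, long instructionsSoFar) {}

    private long instructionsCompleted;                        // counted only
    private final List<TracedEvent> trace = new ArrayList<>(); // traced individually

    /** High-frequency path: accumulate, never record individually. */
    void onInstructions(long n) {
        instructionsCompleted += n;
    }

    /** Low-frequency path: record the event and the current counter value. */
    void onGcStart() {
        trace.add(new TracedEvent("GC start", instructionsCompleted));
    }

    long instructions() {
        return instructionsCompleted;
    }

    List<TracedEvent> trace() {
        return List.copyOf(trace);
    }
}
```

Stamping each rare event with the current counter value is what lets the two streams be related afterwards, for example by asking how many instructions completed between two GCs.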
19
Architecture
(Figure: architecture diagram. Inputs: Specification (what) and Parameters (how). Components: Generator, Instrumenters, Tracer, Trace Reader, Trace Analyzer, Visualizer. Data passed between them: Instrumentations (event creations, counter updates), Event Stream, Interval Stream, Aggregated Profiles.)
20
Vertical Profiling Specification: What to Profile

```
specification IPC_And_BytesAllocated {
  hardware counter long Cyc;
  hardware counter long Inst;
  software counter long BytesAllocated;

  event ThreadSwitch {
    int fromThread;
    int toThread;
    long cyc = Cyc;
    long inst = Inst;
    long bytesAllocated = BytesAllocated;
  }

  interval TimeSlice {
    starts with ThreadSwitch;
    ends with ThreadSwitch where end.fromThread == start.toThread;
    double ipc = (end.inst - start.inst) / (end.cyc - start.cyc);
    long bytesAllocated = end.bytesAllocated - start.bytesAllocated;
  }
}
```

(Slide callouts: Counters, Events, Event Attributes, Intervals, Interval Metrics)
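To make the semantics of this specification concrete, the Java sketch below does by hand what a generated trace analyzer might do: it pairs ThreadSwitch events into TimeSlice intervals and computes the two interval metrics. It is not output from the authors' generator. It assumes a single event stream (for example, one processor) and monotonically increasing counters, and it casts to `double` because, evaluated with Java's `long` arithmetic, the `ipc` expression would truncate any IPC below 1 to 0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hand-written analogue of the IPC_And_BytesAllocated specification (hypothetical). */
final class TimeSliceAnalyzer {
    /** Event ThreadSwitch, with the counter values captured when it fired. */
    record ThreadSwitch(int fromThread, int toThread, long cyc, long inst, long bytesAllocated) {}

    /** Interval TimeSlice: the period during which `thread` ran. */
    record TimeSlice(int thread, double ipc, long bytesAllocated) {}

    private TimeSliceAnalyzer() {}

    static List<TimeSlice> analyze(List<ThreadSwitch> events) {
        Map<Integer, ThreadSwitch> open = new HashMap<>(); // start events keyed by start.toThread
        List<TimeSlice> slices = new ArrayList<>();
        for (ThreadSwitch e : events) {
            // "ends with ThreadSwitch where end.fromThread == start.toThread"
            ThreadSwitch start = open.remove(e.fromThread());
            if (start != null) {
                long cycles = e.cyc() - start.cyc();
                double ipc = cycles == 0 ? Double.NaN : (double) (e.inst() - start.inst()) / cycles;
                slices.add(new TimeSlice(start.toThread(), ipc,
                        e.bytesAllocated() - start.bytesAllocated()));
            }
            // "starts with ThreadSwitch": each switch opens a slice for the incoming thread
            open.put(e.toThread(), e);
        }
        return slices;
    }
}
```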
21
Status
• Profiling
  – Hardware Performance Monitors [VM’04]
  – Software Performance Monitors
  – Specification-driven (early prototype)
• Visualization & Analysis
  – IBM Performance Explorer
22
Future Work
• Evaluate utility
  – Find root causes of phenomena
• Evaluate perturbation
  – Intra-level perturbation (e.g. HPM → HPM)
  – Inter-level perturbation (e.g. lock tracing → HPM)
• Semi-automate the investigative process
  – Statistics / Machine learning
23
Related Work
• Trace Analyzer
  – [Perl 92] Performance Assertion Checking
  – [Perl et al. 98] Continuous Monitoring
• Software Performance Counters
  – [Microsoft] Windows Management Instrumentation
• HPM and JikesRVM
  – [Sweeney et al. 04] Using Hardware Performance Monitors to Understand the Behavior of Java Applications
24
Questions?
25
EXTRAS
26
Profiling HPMs: Infrastructure
• VM: JikesRVM 2.3.0.1+ HPM Facility
• C Library: AIX 5.x pmapi Library
• OS: AIX 5.x pmsvc Kernel Extension
• Hardware: Power4 Performance Monitors
27
Profiling HPMs: Samples
• A sample represents a time slice:
  – Start and end time (in time-base or “decrementer” ticks)
  – 8 event counts
  – Processor id
  – Java thread id
  – Preempted or yielding
  – Java method ending the sample
(Figure: samples laid out over time on VP (CPU) 1 and VP (CPU) 2; 10 ms scale marker)
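The sample layout above maps naturally onto a small Java record. The sketch is hypothetical: the field names are invented, and which of the 8 counter slots hold completed instructions and cycles depends on how the counters are configured, so the caller passes those slot indices in.

```java
/**
 * Hypothetical in-memory form of one HPM sample (one time slice),
 * mirroring the fields listed on the slide.
 */
record HpmSample(
        long startTicks,      // start time, in time-base ("decrementer") ticks
        long endTicks,        // end time, same units
        long[] eventCounts,   // the 8 hardware event counts
        int processorId,
        int javaThreadId,
        boolean preempted,    // true = preempted, false = yielded
        String endingMethod) { // Java method ending the sample

    HpmSample {
        if (eventCounts.length != 8) {
            throw new IllegalArgumentException("expected 8 event counts");
        }
        eventCounts = eventCounts.clone(); // defensive copy of the caller's array
    }

    long durationTicks() {
        return endTicks - startTicks;
    }

    /** IPC for this slice, given the counter slots holding instructions and cycles. */
    double ipc(int instSlot, int cycSlot) {
        return (double) eventCounts[instSlot] / eventCounts[cycSlot];
    }
}
```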
28
Profiling HPMs: Benchmark
• SPEC JBB
• Modified to execute a given number of transactions (120,000)
• Startup phase (ca. 8 sec)
  – 1 main thread
• Steady-state phase (ca. 24 sec)
  – N warehouse threads
• Configurations
  – {1,2,3,4} warehouses on {1,2,3,4} processors
• Steady-state behavior
  – Ca. 50 transactions per 10 ms time slice
29
Performance Explorer
• Visualizer for JikesRVM hardware performance counter traces
• Built-in information about all Power4 performance events
• Support for creating computed metrics (e.g. Inst/Cyc, given the Cyc and Inst counter values)
• Multiple visualizations, such as time charts and scatter plots (for correlating metrics)
30
Performance Explorer: Power4 Event Information
31
Performance Explorer: Creation of Computed Metrics
32
Performance Explorer: Overview of Java Threads
33
Performance Explorer: Time Chart
34
Performance Explorer: Scatter Plot
35
Phenomenon: Pre-GC Dip in IPC, Other Correlated Metrics

| Metric | Normal | Dip | Change |
|---|---|---|---|
| IPC | 0.4924 | 0.46095 | -6.4% |
| EeOff/Cyc | 0.01965 | 0.0785 | +300% |
| HvCyc/Cyc | 0.02387 | 0.12489 | +423% |
| GrpDispBlkSbCyc/Cyc | 0.00595 | 0.02577 | +333% |
| LsuSrqSyncCyc/Cyc | 0.00612 | 0.017 | +178% |
| StcxFail/StcxPassFail | 0.00086 | 0.00395 | +362% |
| LsuLrqFullCyc/Cyc | 0.00077 | 0.00271 | +250% |
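The last column is the relative difference between the Dip and Normal values. A minimal Java check, recomputing three rows from the displayed values (the class name is hypothetical):

```java
/** Relative change between the "Normal" and "Dip" values of the table above. */
final class RelativeChange {
    private RelativeChange() {}

    /** Percentage change from baseline to observed: 100 * (observed - baseline) / baseline. */
    static double percentChange(double baseline, double observed) {
        return 100.0 * (observed - baseline) / baseline;
    }

    public static void main(String[] args) {
        System.out.printf("IPC:       %+.1f%%%n", percentChange(0.4924, 0.46095));  // about -6.4%
        System.out.printf("EeOff/Cyc: %+.1f%%%n", percentChange(0.01965, 0.0785));  // about +299.5%
        System.out.printf("HvCyc/Cyc: %+.1f%%%n", percentChange(0.02387, 0.12489)); // about +423.2%
    }
}
```

Recomputing the remaining rows the same way lands within a few percentage points of the table (for example, about +359% rather than +362% for StcxFail/StcxPassFail), presumably because the published percentages were computed from unrounded measurements.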
36
Vertical Profiling Matrix
(Figure: matrix with two axes. Instrument: Hardware, Machine code, Byte code, Source code. Observe: Hardware, OS, Native libs, VM, Java libs, Framework, Application.)
37
Vertical Profiling Matrix
• Two “vertical” dimensions
  – What we observe
  – What we instrument
• We may observe higher-level behavior by instrumenting a lower level, or vice versa
  – Instrument HW, observe OS time
  – Instrument byte code, observe branch misses
38
Vertical Profiling Specification: How to Profile

| Parameter | Possible values |
|---|---|
| Buffer size | 100000, 1000000, 10000000, … |
| Buffer type | Java byte[], Java int[], native |
| Buffer ownership | Global, Processor, Thread |
| Buffer access synchronization | None, Lock-free, Locked |
| Buffer access | Java, Magic |
| Buffer overflow handling | Flush, Disable, Ignore |
| Buffer flushing | Explicit, Seg fault, Each thread switch |
| Buffer flush target | File, Socket, C routine |
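If a profiler generator consumed these parameters, a typed configuration object would let invalid values be rejected early. The Java encoding below is a hypothetical sketch: the enum and record names are invented, and only a positive buffer size is checked because the slides do not state which combinations of values are valid.

```java
/** Hypothetical typed encoding of the "how to profile" parameters in the table above. */
final class ProfilingParameters {
    enum BufferType { JAVA_BYTE_ARRAY, JAVA_INT_ARRAY, NATIVE }
    enum BufferOwnership { GLOBAL, PROCESSOR, THREAD }
    enum AccessSynchronization { NONE, LOCK_FREE, LOCKED }
    enum BufferAccess { JAVA, MAGIC }
    enum OverflowHandling { FLUSH, DISABLE, IGNORE }
    enum FlushTrigger { EXPLICIT, SEG_FAULT, EACH_THREAD_SWITCH }
    enum FlushTarget { FILE, SOCKET, C_ROUTINE }

    /** One point in the parameter space; each component matches one table row. */
    record Config(long bufferSize, BufferType type, BufferOwnership ownership,
                  AccessSynchronization sync, BufferAccess access,
                  OverflowHandling overflow, FlushTrigger flushing, FlushTarget target) {
        Config {
            if (bufferSize <= 0) {
                throw new IllegalArgumentException("buffer size must be positive");
            }
        }
    }

    private ProfilingParameters() {}
}
```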