ISA-Independent Workload Characterization and Implications for Specialized Architectures
Yakun Sophia Shao and David Brooks Harvard University
{shao,dbrooks}@eecs.harvard.edu
Specialized architectures are decoupled from legacy ISAs.
Spectrum of Specialization: General-Purpose CPU → GPU → Fixed-Function ASIC
• General-Purpose CPU: low efficiency, high programmability, tied to a specific ISA
• Fixed-Function ASIC: high efficiency, low programmability, no ISA
Specialization requires workload intrinsic characteristics.
Specialized architecture is tailored to applications.
• e.g., special data paths, memory access patterns
I want to design specialized architectures for applications. Where should I start?
You need to first understand their characteristics.
Specialization requires workload intrinsic characteristics.
Yeah, good point! What should I do to understand those characteristics?
How about I run the program and collect performance-counter stats?
Hmmm… it’s what you used to do for CPU designs, but is what you get the true program characteristic?
Performance-Counter Based Workload Characterization
• Metrics
  – IPC
  – Cache miss rates
  – Branch misprediction rates
  – …
• Microarchitecture-dependent
  – What if there is a bigger cache or a better branch predictor?
  – Not program-intrinsic characteristics
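To illustrate why a counter-based metric is microarchitecture-dependent, here is a minimal Python sketch. The fully associative LRU cache model, the function name `lru_miss_rate`, the 64-byte line size, and the toy address trace are all assumptions made for illustration; none of them come from the slides. The same trace is replayed against two cache sizes:

```python
from collections import OrderedDict

def lru_miss_rate(addresses, num_lines, line_bytes=64):
    """Miss rate of a toy fully associative LRU cache (hypothetical model)."""
    cache = OrderedDict()  # cache-line index -> True, ordered oldest to newest
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        if line in cache:
            cache.move_to_end(line)        # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > num_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses / len(addresses)

# Assumed toy trace: sweep over 8 cache lines, repeated 10 times.
trace = [i * 64 for i in range(8)] * 10

small_cache = lru_miss_rate(trace, num_lines=4)  # every access misses
large_cache = lru_miss_rate(trace, num_lines=8)  # only the 8 cold misses
print(small_cache, large_cache)  # prints: 1.0 0.1
```

The program is identical in both runs, yet the miss rate changes by a factor of ten, so the miss rate describes the program and the cache together rather than the program alone.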
Specialization requires workload intrinsic characteristics.
Oh, I also heard about microarchitecture-independent workload characterization. We can perform the profiling analysis using just the instruction trace.
Hmmm… that removes the microarchitecture dependency, but it is still tied to a specific ISA.
Specialization requires workload intrinsic characteristics.
“Tied to a specific ISA”? Will that be a problem?
Yes, for specialized architectures!
ISA impacts program behaviors.
Stack Overhead
• Limited Registers
• Additional Load/Store
Complex Operations
• Memory Operands
• Vector Operations
Calling Conventions
Specialization requires workload intrinsic characteristics.
I see. So is there a way to get ISA-independent
program characteristics?
That’s a good question. I found a paper in ISPASS this year which seems to
answer this question. Let’s take a look!
Paper Summary
Goal:
• An analysis tool to characterize workloads’ ISA-independent characteristics for specialized architectures
Methods:
• Leverage the compiler’s intermediate representation (IR)
• Categorize characteristics into compute, memory, and control
Takeaways:
• ISA-dependent characterization is misleading for specialization.
• ISA-independent characterization allows designers to quickly identify opportunities for specialization.
Tool Overview
[Figure: Program → IR Trace (ISA-Independent) and x86 Trace (ISA-Dependent) → Characterization for Specialized Architecture (Compute, Memory, Control) → Design of Specialized Architecture]
Program Representations
[Figure: Program → ILDJIT → IR Trace → LLVM → x86 Trace]
Program Representations
• SPEC CPU2000
Program Representations
ILDJIT
• A modular compilation framework
• Performs machine-independent classical optimizations at the IR level
• Uses LLVM’s back end to
  – Do machine-dependent optimizations
  – Generate machine code
Campanoni et al., A Highly Flexible, Parallel Virtual Machine: Design and Experience of ILDJIT, Software: Practice and Experience, 2010
Program Representations
ILDJIT IR
• High-level IR
• Machine-, ISA-, and system-library-independent
• Features:
  – 80 instructions
  – Unlimited registers
  – Only loads/stores access memory
  – No vector operations
  – Parameters are passed by variables
Program Representations
x86 Trace
• Used for ISA-dependent analysis
• Semantically equivalent to the IR code
• Collected with Pin instrumentation
ISA-Independent Workload Characteristics
Compute
• Opcode Diversity
• Static Instructions (I-MEM)
Memory
• Memory Footprint (D-MEM)
• Global Address Entropy
• Local Address Entropy
Control
• Branch Instruction Counts
• Branch Entropy
Compute::Static Instructions
So if you use the x86 trace instead of the IR trace…
I will think those stack operations are part of the “hot code”.
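As a rough illustration of how stack operations can inflate the apparent "hot code", the Python sketch below counts how many distinct static instructions are needed to cover a given share of the dynamic instruction stream. The 90% threshold, the function name `hot_static_instructions`, and both toy traces (including the `push`/`pop` entries standing in for ISA-induced stack traffic) are assumptions for illustration, not data or definitions from the slides.

```python
from collections import Counter

def hot_static_instructions(trace, coverage=0.9):
    """Number of distinct static instructions covering `coverage` of the trace."""
    counts = Counter(trace)
    total = sum(counts.values())
    covered = 0
    # Walk static instructions from most to least frequently executed.
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= coverage * total:
            return n
    return 0  # empty trace

# Assumed toy traces: 4 cold setup instructions, then a loop run 30 times.
setup = ["init0", "init1", "init2", "init3"]
ir_loop = ["load", "add", "store", "branch"]
x86_loop = ["push", "load", "add", "store", "pop", "branch"]  # extra stack ops

ir_trace = setup + ir_loop * 30
x86_trace = setup + x86_loop * 30

print(hot_static_instructions(ir_trace))   # prints: 4
print(hot_static_instructions(x86_trace))  # prints: 6
```

In this toy setting the x86-style trace reports six hot static instructions instead of four, purely because of the added stack operations.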
Memory::Entropy
Entropy: a measure of randomness

$$\text{Entropy} = -\sum_{i=1}^{N} p(x_i)\,\log_2 p(x_i)$$

Case 1: X is always a constant.

$$p(X) = 1, \qquad \log_2 p(X) = 0, \qquad \text{Entropy} = 0$$

Case 2: N possible outcomes of X occur with equal probability.

$$p(X) = \frac{1}{N}, \qquad \log_2 p(X) = \log_2 N^{-1}$$

$$\text{Entropy} = -N \cdot \frac{1}{N} \cdot \log_2 N^{-1} = \log_2 N$$
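The two cases can be checked numerically. This is a minimal Python sketch; the function name `entropy` and the sample lists are illustrative assumptions. Probabilities are estimated from observed frequencies.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    # p * log2(1/p) is the same term as -p * log2(p) in the formula.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

constant = [7] * 10          # Case 1: X is always the same value
uniform = list(range(8))     # Case 2: N = 8 equally likely outcomes

print(entropy(constant))  # prints: 0.0
print(entropy(uniform))   # prints: 3.0, i.e. log2(8)
```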
Memory::Global Address Entropy
Temporal Locality
Address Stream A (less temporal locality): 0000, 0001, 0010, 0011 → Entropy = 2
Address Stream B (more temporal locality) → Entropy = 0
Yen, Draper, and Hill. Notary: Hardware Techniques to Enhance Signatures. MICRO 08
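The two entropy values can be reproduced with a short Python sketch. Stream A uses the four addresses shown on the slide. The contents of stream B did not survive in this transcript, so the repeated single address below is an assumption chosen to match the stated entropy of 0. The function name `global_address_entropy` is also illustrative.

```python
import math
from collections import Counter

def global_address_entropy(addresses):
    """Entropy (in bits) over whole addresses in the stream."""
    counts = Counter(addresses)
    total = len(addresses)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

stream_a = [0b0000, 0b0001, 0b0010, 0b0011]  # four distinct addresses
stream_b = [0b0000, 0b0000, 0b0000, 0b0000]  # assumed: one address reused

print(global_address_entropy(stream_a))  # prints: 2.0
print(global_address_entropy(stream_b))  # prints: 0.0
```

Lower global entropy means the stream keeps returning to the same addresses, which is the sense in which it indicates more temporal locality.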
Memory::Global Address Entropy
Temporal Locality
So if you use the x86 trace instead of the IR trace…
I will get a wrong locality estimate for workloads!
Memory::Local Address Entropy
Spatial Locality
Address Stream A (less spatial locality): 0000, 0100, 1000, 1100
Address Stream B (more spatial locality): 0000, 0001, 0010, 0011
[Figure: Local Entropy versus # of Bits Skipped (0 to 4) for streams A and B]
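A minimal Python sketch of the local-entropy curve for the two streams follows. It assumes that "bits skipped" means dropping that many low-order address bits before computing entropy; the transcript does not state this explicitly, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy (in bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def local_entropy_curve(addresses, max_skip):
    """Entropy after dropping k low-order bits, for k = 0..max_skip (assumed definition)."""
    return [entropy([a >> k for a in addresses]) for k in range(max_skip + 1)]

stream_a = [0b0000, 0b0100, 0b1000, 0b1100]  # addresses far apart
stream_b = [0b0000, 0b0001, 0b0010, 0b0011]  # neighboring addresses

curve_a = local_entropy_curve(stream_a, max_skip=4)
curve_b = local_entropy_curve(stream_b, max_skip=4)
print(curve_a)  # prints: [2.0, 2.0, 2.0, 1.0, 0.0]
print(curve_b)  # prints: [2.0, 1.0, 0.0, 0.0, 0.0]
```

Stream B's entropy collapses after skipping only two bits because its addresses fall in the same small region, while stream A needs all four bits skipped. A faster drop corresponds to more spatial locality.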
Memory::Local Address Entropy
Spatial Locality
So if you use the x86 trace instead of the IR trace…
I will think the program has more spatial locality than it really has.
Yokota et al., Introducing Entropies for Representing Program Behavior and Branch Predictor Performance, 07
Control::Branch Entropy
So if you use the x86 trace instead of the IR trace…
I won’t get much wrong for control.
ISA-Independent Workload Characteristics
Is there a way to compare these characteristics across workloads?
Yes, a Kiviat plot!
Workload Characterization
Conclusions
• We demonstrate that ISA-dependent analysis can be misleading for specialized architectures.
• We present an analysis tool to characterize ISA-independent characteristics for specialization.
• We show that our tool provides opportunities for designers to compare workloads’ characteristics.