SAGE: Self-Tuning Approximation for Graphics Engines
Mehrzad Samadi¹, Janghaeng Lee¹, D. Anoushe Jamshidi¹, Amir Hormati², and Scott Mahlke¹
¹University of Michigan, ²Google Inc.
December 2013
Compilers Creating Custom Processors, University of Michigan, Electrical Engineering and Computer Science
Approximate Computing
• Different domains:
– Machine Learning
– Image Processing
– Video Processing
– Physical Simulation
– …
• Less work: higher performance, lower power consumption
[Figure: example output at quality levels 100%, 95%, 90%, and 85%]
Ubiquitous Graphics Processing Units
• Wide range of devices: supercomputers, servers, desktops, cell phones
• Mostly regular applications
• Works on large data sets
Good opportunity for automatic approximation
SAGE Framework
• Simplify or skip processing
• Computationally expensive
• Lowest impact on the output quality
[Figure: trade-off between output error and speedup; labels: 10% output error, 2~3x speedup]
• Self-Tuning Approximation on Graphics Engines
• Write the program once
• Automatic approximation
• Self-tuning dynamically
Overview
SAGE Framework
[Figure: framework block diagram; labels: Input Program, Static Compiler, Approximation Methods, Approximate Kernels, Tuning Parameters, Runtime System]
Static Compilation
[Figure: static compilation flow. Inputs: Evaluation Metric, Target Output Quality (TOQ), CUDA Code. Approximation methods: Atomic Operation, Data Packing, Thread Fusion. Outputs: Approximate Kernels, Tuning Parameters]
Runtime System
[Figure: runtime timeline across CPU and GPU with Preprocessing, Tuning, Execution, and Calibration phases; Quality and Speedup are plotted over Time against the TOQ, with time points T, T + C, and T + 2C marked after Tuning]
Approximation Methods
Atomic Operations

    // Compute histogram of colors in an image
    __global__ void histogram(int n, int* colors, int* bucket)
    {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        int nThreads = gridDim.x * blockDim.x;
        // Grid-stride loop: this thread handles elements tid, tid + nThreads, ...
        for (int i = tid; i < n; i += nThreads) {
            int c = colors[i];
            // Threads that update the same bucket are serialized
            atomicAdd(&bucket[c], 1);
        }
    }

• Atomic operations update a memory location so that the read-modify-write appears indivisible to all other threads
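The way the histogram kernel divides work among threads can be checked on the CPU. The following is a minimal host-side sketch in plain C++, not SAGE code: `histogram_host` and its parameters are hypothetical names, and the sequential loop over `tid` only emulates the kernel's indexing. On the GPU the increment is the `atomicAdd`, because threads run concurrently.

```cpp
#include <vector>

// Host-side emulation of the kernel's grid-stride loop (no GPU needed).
// nThreads plays the role of gridDim.x * blockDim.x; logical thread `tid`
// handles elements tid, tid + nThreads, tid + 2 * nThreads, ...
// Assumes every color value is in the range [0, nBuckets).
std::vector<int> histogram_host(const std::vector<int>& colors,
                                int nBuckets, int nThreads) {
    std::vector<int> bucket(nBuckets, 0);
    const int n = static_cast<int>(colors.size());
    for (int tid = 0; tid < nThreads; ++tid) {
        for (int i = tid; i < n; i += nThreads) {
            // On the GPU this would be atomicAdd(&bucket[colors[i]], 1)
            bucket[colors[i]] += 1;
        }
    }
    return bucket;
}
```

Every element is visited exactly once regardless of `nThreads`, so the result equals a plain sequential histogram.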
Atomic Add
[Figure: slowdown (y-axis, 0 to 14) versus conflicts per warp (x-axis, 0 to 31)]
Atomic Operation Tuning
[Figure: iterations of the histogram kernel's loop executed by threads T0, T32, T64, T96, T128, and T160]
• SAGE skips one iteration per thread
• To improve performance, it drops the iteration with the maximum number of conflicts
• With threads T0 through T160, this drops 50% of iterations
• With fewer threads (T0, T32, T64), each thread runs more iterations, so the drop rate goes down to 25%
Dropping One Iteration
[Figure: conflict detection (CD 0 to CD 3) runs for each iteration and tracks the maximum conflict; the conflict counts shown for iterations 0 to 3 are 2, 8, 17, and 12. A second panel shows the loop after optimization.]
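The selection rule on these slides (skip the one iteration with the most conflicts) can be sketched in a few lines. This is an illustrative host-side sketch, not the SAGE implementation: `iteration_to_drop` and `drop_rate` are hypothetical helper names, and the per-iteration conflict counts are assumed to be already available from conflict detection.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the iteration with the most conflicts
// (the first such iteration on ties). Expects a non-empty vector.
std::size_t iteration_to_drop(const std::vector<int>& conflicts) {
    std::size_t worst = 0;
    for (std::size_t it = 1; it < conflicts.size(); ++it) {
        if (conflicts[it] > conflicts[worst]) worst = it;
    }
    return worst;
}

// Fraction of the work skipped when each thread drops one of its iterations.
double drop_rate(int iterations_per_thread) {
    return 1.0 / iterations_per_thread;
}
```

With the conflict counts 2, 8, 17, and 12 from the slide, iteration 2 is the one dropped. Dropping one iteration out of two gives the 50% drop rate, and one out of four gives 25%.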
Data Packing
[Figure: animation of threads accessing memory; labels: Threads, Memory]
Quantization
• Preprocessing finds min and max of the input sets and packs the data
• During execution, each thread unpacks the data and transforms the quantization level to data by applying a linear transformation
[Figure: the range from Min to Max divided into quantization levels 0 to 7, with the example code 101]
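The two quantization steps can be sketched on the host as follows. This is a minimal sketch under stated assumptions, not SAGE's code: `Packed`, `quantize`, and `dequantize` are hypothetical names, the input is assumed non-empty, at most 8 bits per level are supported, and each level is stored in its own byte (packing several levels into one machine word is omitted for brevity).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Packed {
    float min_val;                     // minimum found during preprocessing
    float max_val;                     // maximum found during preprocessing
    int bits;                          // bits per level (3 bits gives levels 0..7)
    std::vector<std::uint8_t> levels;  // one quantization level per input value
};

// Preprocessing: find min and max, then map each value to its nearest level.
Packed quantize(const std::vector<float>& data, int bits) {
    Packed p;
    p.bits = bits;
    p.min_val = *std::min_element(data.begin(), data.end());
    p.max_val = *std::max_element(data.begin(), data.end());
    const int max_level = (1 << bits) - 1;
    const float range = p.max_val - p.min_val;
    for (float x : data) {
        const float t = (range > 0.0f) ? (x - p.min_val) / range : 0.0f;
        p.levels.push_back(static_cast<std::uint8_t>(std::lround(t * max_level)));
    }
    return p;
}

// Execution: linear transformation from a quantization level back to data.
float dequantize(const Packed& p, std::size_t i) {
    const int max_level = (1 << p.bits) - 1;
    return p.min_val + p.levels[i] * (p.max_val - p.min_val) / max_level;
}
```

With 3 bits and an input range of 0 to 7, the value 5 maps to level 5 (binary 101), and any reconstructed value is within half a level step of the original.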
Thread Fusion
[Figure: threads read from memory and write to memory; the outputs of adjacent threads are approximately equal (≈)]
[Figure: threads T0 and T1, each performing computation followed by output writing]
[Figure: Block 0 and Block 1, each with threads T0 to T255, performing computation and output writing]
[Figure: after fusing threads, Block 0 and Block 1 each have threads T0 to T127]
• Reducing the number of threads per block results in poor resource utilization
Block Fusion
[Figure: Block 0 and Block 1 fused into a single block with threads T0, T1, ..., T254, T255]
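The idea behind thread fusion, as far as the slides show it, is that neighboring threads produce nearly equal outputs, so one computation can serve several output elements. The sketch below emulates this on the host. It is an assumption-laden illustration rather than SAGE's generated code: `run_fused` is a hypothetical name, and using the first element of each group as the representative is an assumption the slides do not confirm.

```cpp
#include <cstddef>
#include <vector>

// Fused version: one logical thread computes f once for the first element of
// each group and writes that value to `fuse` adjacent outputs.
// fuse == 1 reproduces the accurate version (one computation per output).
std::vector<float> run_fused(const std::vector<float>& in, std::size_t fuse,
                             float (*f)(float)) {
    if (fuse == 0) fuse = 1;  // guard against a non-advancing loop
    std::vector<float> out(in.size());
    for (std::size_t base = 0; base < in.size(); base += fuse) {
        const float v = f(in[base]);               // computation, done once
        for (std::size_t j = base; j < base + fuse && j < in.size(); ++j) {
            out[j] = v;                            // output writing, done per element
        }
    }
    return out;
}
```

Halving the thread count this way is what shrinks each block from 256 to 128 threads in the slides, which is why fusing whole blocks back to 256 threads is needed to keep the hardware busy.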
Runtime
How to Compute Output Quality?
[Figure: the outputs of the accurate version and the approximate version are fed to the evaluation metric]
• High overhead
• Tuning should find a good enough approximate version as fast as possible
• Greedy algorithm
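Computing output quality means running both versions and comparing their outputs, which is where the overhead comes from. The slides do not say which evaluation metric is used, so the sketch below assumes one plausible choice, mean relative error, purely for illustration. `output_quality` and `meets_toq` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed metric: quality (%) = 100 * (1 - mean relative error), clamped at 0.
// Both vectors hold the outputs of the same invocation.
double output_quality(const std::vector<double>& accurate,
                      const std::vector<double>& approx) {
    const std::size_t n = std::min(accurate.size(), approx.size());
    if (n == 0) return 100.0;
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        // Small floor on the denominator avoids division by zero
        const double denom = std::max(std::fabs(accurate[i]), 1e-12);
        total += std::fabs(accurate[i] - approx[i]) / denom;
    }
    return std::max(0.0, 100.0 * (1.0 - total / static_cast<double>(n)));
}

// True when the approximate output satisfies the target output quality (TOQ).
bool meets_toq(const std::vector<double>& accurate,
               const std::vector<double>& approx, double toq) {
    return output_quality(accurate, approx) >= toq;
}
```

Because every check needs an accurate run as the reference, checking quality on every invocation would cancel the speedup, which motivates a fast, greedy search during tuning.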
Tuning
• K(x,y): x is the tuning parameter of the first optimization, y is the tuning parameter of the second optimization
• TOQ = 90%
• K(0,0): quality = 100%, speedup = 1x
• K(1,0): quality = 94%, speedup = 1.15x; K(0,1): quality = 96%, speedup = 1.5x
• K(1,1): quality = 94%, speedup = 2.5x; K(0,2): quality = 95%, speedup = 1.6x
• K(2,1): quality = 88%, speedup = 3x; K(1,2): quality = 87%, speedup = 2.8x
• Tuning path: K(0,0), K(0,1), K(1,1)
• Final choice: K(1,1), because both K(2,1) and K(1,2) fall below the TOQ
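The greedy walk through the tuning tree can be sketched as below. This is a hedged illustration, not SAGE's tuner: `greedy_tune`, `Measurement`, `Config`, and `slide_example` are hypothetical names, measurements are looked up in a table instead of being obtained by running kernels, and the rule "move to the child with the highest speedup among those that meet the TOQ" is an assumption consistent with the slide's example. The table reuses the slide's numbers, with the node-to-number pairing read off the slide layout.

```cpp
#include <map>
#include <utility>

struct Measurement {
    double quality;  // output quality in percent
    double speedup;  // speedup over the accurate kernel
};
using Config = std::pair<int, int>;  // (x, y) in K(x,y)

// Starting at K(0,0), repeatedly step to the faster child that still meets
// the TOQ. Stops when no measured child satisfies the TOQ.
Config greedy_tune(const std::map<Config, Measurement>& measured, double toq) {
    Config current(0, 0);
    for (;;) {
        const Config children[2] = {
            Config(current.first + 1, current.second),
            Config(current.first, current.second + 1)};
        bool moved = false;
        Config best = current;
        double best_speedup = 0.0;
        for (const Config& c : children) {
            auto it = measured.find(c);
            if (it == measured.end()) continue;        // not measured
            if (it->second.quality < toq) continue;    // violates the TOQ
            if (!moved || it->second.speedup > best_speedup) {
                best = c;
                best_speedup = it->second.speedup;
                moved = true;
            }
        }
        if (!moved) return current;
        current = best;
    }
}

// Measurements taken from the slide's example.
std::map<Config, Measurement> slide_example() {
    std::map<Config, Measurement> m;
    m[Config(0, 0)] = Measurement{100.0, 1.0};
    m[Config(1, 0)] = Measurement{94.0, 1.15};
    m[Config(0, 1)] = Measurement{96.0, 1.5};
    m[Config(1, 1)] = Measurement{94.0, 2.5};
    m[Config(0, 2)] = Measurement{95.0, 1.6};
    m[Config(2, 1)] = Measurement{88.0, 3.0};
    m[Config(1, 2)] = Measurement{87.0, 2.8};
    return m;
}
```

Calling `greedy_tune(slide_example(), 90.0)` follows K(0,0), K(0,1), K(1,1) and stops there. Only two kernels are evaluated per step, which keeps the number of costly quality checks small.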
Evaluation
Experimental Setup
• Backend of Cetus compiler
• GPU
– NVIDIA GTX 560
– 2GB GDDR5
• CPU
– Intel Core i7
• Benchmarks
– Image processing
– Machine Learning
K-Means
[Figure: output quality (axis 80 to 100) and accumulative speedup (axis 0 to 2) plotted over x-axis values 1 to 116, with the TOQ marked]
Confidence
[Figure: confidence (y-axis, 40 to 100) versus number of calibration points (x-axis, 0 to 400) for CI = 99%, CI = 95%, and CI = 90%; the value 93 is marked]
• After checking 50 samples, we will be 93% confident that 95% of the outputs satisfy the TOQ threshold
$$\text{posterior} \propto \text{prior} \times \text{likelihood}$$
• Posterior: Beta; prior: Uniform; likelihood: Binomial
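The 93% figure can be reproduced from the Beta, Uniform, and Binomial model under one added assumption that the slide does not state: all 50 checked samples satisfy the TOQ. A uniform prior is Beta(1, 1), so n passing samples out of n give a Beta(n + 1, 1) posterior, and the probability that the pass rate exceeds q is 1 - q^(n+1). The function name below is hypothetical.

```cpp
#include <cmath>

// Confidence that more than a fraction q of outputs satisfy the TOQ,
// given that all n_pass checked samples passed (assumption) and a
// uniform Beta(1, 1) prior. Posterior is Beta(n_pass + 1, 1), whose
// CDF at q is q^(n_pass + 1).
double confidence_all_pass(int n_pass, double q) {
    return 1.0 - std::pow(q, n_pass + 1);
}
```

For 50 samples and q = 0.95 this evaluates to about 0.927, which matches the slide's 93%. Checking more calibration points raises the confidence, at the price of more calibration overhead.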
Calibration Overhead
[Figure: calibration overhead (%, y-axis 0 to 25) versus calibration interval (x-axis 0 to 200) for Gaussian and K-Means]
Performance
[Figure: two bar charts of speedup (axis 1 to 4), one for TOQ = 95% and one for TOQ = 90%, comparing SAGE with loop perforation on K-Means, NB, Histogram, SVM, Fuzzy, MeanShift, Binarization, Dynamic, Mean Filter, and Gaussian, plus the geometric mean; one bar exceeds the axis and is labeled 6.4]
Conclusion
• Automatic approximation is possible
• SAGE automatically generates approximate kernels with different parameters
• Runtime system uses tuning parameters to control the output quality during execution
• 2.5x speedup with less than 10% quality loss compared to the accurate execution
Backup Slides
What Does Calibration Miss?
[Figure: percent difference in error (y-axis, 0% to 100%) versus calibration interval (x-axis, 10 to 100) for the inputs flower(250), grandma(800), deadline(900), foreman(300), harbour(300), ice(240), akiyo(300), carphone(380), crew(300), and galleon(350)]
SAGE Can Control The Output Quality
[Figure: speedup (y-axis, 1 to 7) versus output quality (%, x-axis, 90 to 100) for Naïve Bayes, Fuzzy Kmeans, and Mean Filter]
Distribution of Errors
[Figure: percentage of elements (y-axis, 0% to 100%) versus error (x-axis, 0% to 100%) for Histogram, Kmeans, Naïve Bayes, Fuzzy Kmeans, SVM, Dynamic, Mean Filter, Gaussian, Binarization, and Meanshift]