Cherry Picking: Exploiting Process Variations in the Dark...
Transcript of Cherry Picking: Exploiting Process Variations in the Dark...
![Page 1: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/1.jpg)
Cherry Picking: Exploiting Process Variations in the Dark Silicon Era
Siddharth GargUniversity of Waterloo
Co-authors:Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu
![Page 2: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/2.jpg)
Dark Silicon Challenge
1
0
0.2
0.4
0.6
0.8
1
1.2
0
5
10
15
20
25
30
35
45 32 22 16 11 8
# Tr
ansi
sto
rs
Po
we
r/D
ark
Silic
on
Technology Node
Power
Dark Silicon
# Transistors80% dark silicon at the 8 nm
technology node[Esmaeilzadeh et al., ISCA’11]
![Page 3: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/3.jpg)
Heterogeneous Cores
• How to best utilize dark silicon for performance enhancement?
– Heterogeneity
Dark Silicon Architectures
2
Accelerators
Homogeneous Cores?
![Page 4: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/4.jpg)
• Inability to precisely manufacture transistors
– Chip-to-chip variations
– Within-chip variations
Process Variations
3
[Source: Friedberg et al., ISQED’05]
Increasing proportion of within-chip variations
![Page 5: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/5.jpg)
Process Variation Impact
4
1.7X deviation in leakage power
Intel 80-core Teraflop
[Dighe et al., JSSC’11]
30% deviation in frequencyKey idea: exploit heterogeneity that arises from
the impact of process variations
![Page 6: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/6.jpg)
• Best 1 of N statistics
– Provision chip with N identical cores and cherry-pick core with highest frequency
Motivation: Best 1 of N
5
Core Frequency
Co
un
t
Best 1 of 2Mean = 1.1Best 1 of 4Mean = 1.2
1-s increase in frequency with core doubling
![Page 7: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/7.jpg)
• 30% reduction in average leakage power
• 2X reduction in worst-case leakage power
Best 1 of N for Leakage
6
Leakage Power Dissipation
Count
Potential yield loss due to thermal runaway
Best 1-of-1
Best 1-of-4
Best 1-of-2
![Page 8: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/8.jpg)
• BubbleWrap [Karpuzcu et al.,MICRO’09]
– Use redundant cores to increase lifetime
– Cores run in Turbo mode till they “pop”
• Dark silicon architectures
– Heterogeneous cores [Esmaeilzadeh et al.,ISCA’11]
– Accelerators [Venkatesh et al.,MICRO’11]
• Statistical Element Selection
– Increasing immunity of analog circuits to process variations [Keskin et al.,CICC’10]
• Process variation aware scheduling
– ILP based solution for multi-programmed apps [Teodorescu et al.,ISCA’08]
Related Work
7
![Page 9: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/9.jpg)
• Generate die map of process variations
Variability Modeling
8
Distance
Co
rre
lati
on
Co
eff
icie
nt
(r)
Single Gaussian random variable to model impact of process variations at each location
Spatial correlations modeled using an exponentially decaying
function of distance
Slow
Fast
[Zhiong et al., TCAD’07]
![Page 10: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/10.jpg)
• Each core has Ncp identical critical paths– Core frequency limited by slowest
critical path– Critical path delay inversely
proportional to process parameter
Frequency and Leakage
9
Critical Paths (CP)
• Leakage is summed over all Ncore grid points
– Exponential dependence on process parameters
![Page 11: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/11.jpg)
• Wide range of power and frequency values• One “technology beating” core
– Likelihood increases with more % dark silicon
Cherry Picking for Single Threads
10
Core Power Dissipation
Co
re F
req
ue
ncy
Pareto Optimal Cores
Technology BeatingCore
Only 4 Pareto optimal cores in the original design without spare cores
![Page 12: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/12.jpg)
• Maximize performance within a P Watt budget
– Performance measured as the sum of frequencies of cores that are selected
Cherry Picking: Multi-program Workloads
11
P Watt Bin
Instance of the knapsack problem Pseudo-polynomial time solution
![Page 13: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/13.jpg)
Cherry Picking: Multi-threaded Wkloads
12
• Common execution template for a number of parallel benchmarks
– Sequential phase followed by barrier based synchronization of parallel threads
• Optimal mapping of threads to cores such that:
– Performance is maximized within a power budget
![Page 14: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/14.jpg)
• Goal: analytical + accurate performance model that is amenable to optimization
• Execution time limited by sequential thread and slowest parallel thread
– Surprisingly accurate, although grossly simplified
Performance Model
13
Execution time
Amount of sequential work
Frequency of sequential core
Amount of parallel work
Number of parallel threads Slowest parallel
core frequency
![Page 15: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/15.jpg)
• When core 1 frequency is lower than frequency of other cores, lower execution time with increasing frequency
• When core 1 frequency is higher than frequency of other cores, fixed execution time with increasing frequency
Validation
14
![Page 16: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/16.jpg)
• Assume that:
– Seq. thread executes on core i
– Slowest parallel thread executes on core j
– Q is a set of M-1 other cores:
• Execution time:
Optimal Mapping
15
Seq.
Par. 1 Par. 2 Par. M
Core i
Core j
![Page 17: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/17.jpg)
• For some <i,j> combinations, there might not exist M-1 faster cores that meet the power budget
– Frequency scaling can be used to meet power constraints at expense of performance
– Frequency of all parallel cores scaled to the same frequency fpar such that:
– Sufficient to only look at M-1 lowest leakage cores
Frequency Scaling
16
![Page 18: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/18.jpg)
• All experimental results based on the Sniper x86 multi-core simulator
– Interval core model, cycle-accurate cache, network and memory models
• Parsec and SPLASH benchmarks with M=16
– Blackscholes
– FFT
– Radix
– Fluidanimate
– Swaptions
Experimental Set-up
17
![Page 19: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/19.jpg)
• 4.7% average error and 7.2% RMS error
Performance Model Validation
18
Simulated Execution Time
Pre
dic
ted
Exe
cuti
on
Tim
e
Under-prediction because increasednetwork latencies are not accounted for
50% Dark Silicon(red)
33% Dark Silicon(green)
![Page 20: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/20.jpg)
• Averaged over 10 Monte Carlo experiments for each benchmark and each architecture
Performance Improvements
19
30% 22%
![Page 21: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/21.jpg)
Insight
20
![Page 22: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/22.jpg)
• Cherry picking proposes to pick the best subset of cores in a homogeneous dark silicon chip
– Power budget is met
– Performance is maximized
– Exploits process variations to create heterogeneity
• Next generation dark silicon architectures might consist of a mix of architectural and process variation driven heterogeneity
– Replica accelerators
Discussion
21
![Page 23: Cherry Picking: Exploiting Process Variations in the Dark ...public.gi.ucsc.edu/~yatisht/files/DATE13_cherrypick.pdf · Bharathwaj Raghunathan, Yatish Turakhia and Diana Marculescu.](https://reader033.fdocuments.net/reader033/viewer/2022050214/5f607bedb0e783244e4340fa/html5/thumbnails/23.jpg)
• HaDeS: Architectural Synthesis for Heterogeneous Dark Silicon Chip Multi-Processors, DAC’13
– More sophisticated analytical performance models
– Varying degrees of parallelism
– Architectural heterogeneity
Upcoming
22