Continuous Runahead: Transparent Hardware Acceleration for Memory Intensive Workloads
Milad Hashemi, Onur Mutlu, Yale N. Patt
UT Austin/Google, ETH Zürich, UT Austin
October 19th, 2016

Continuous Runahead Outline
• Overview of Runahead
• Runahead Limitations
• Continuous Runahead Dependence Chains
• Continuous Runahead Engine
• Continuous Runahead Evaluation
• Conclusions

Runahead Execution Overview
• Runahead dynamically expands the instruction window when the pipeline is stalled [Mutlu et al., 2003]
  • The core checkpoints architectural state
  • The result of the memory operation that caused the stall is marked as poisoned in the physical register file
  • The core continues to fetch and execute instructions
  • Operations are discarded instead of retired
  • The goal is to generate new independent cache misses

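The poisoning rule in the overview can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the hardware mechanism: the `Instr` record, the `runahead_loads` function, and the example registers are hypothetical names introduced here. The model marks the destination of the stalling load as poisoned, propagates poison through dependent instructions, and reports the loads whose addresses are still valid, since those are the ones that could generate new independent cache misses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str       # "LD" for a load, anything else for a non-memory operation
    srcs: tuple   # names of the source registers
    dst: str      # name of the destination register


def runahead_loads(instrs, stalled_dst):
    """Return indices of loads that do not depend on the stalling load.

    instrs: instructions younger than the stalling load, in program order.
    stalled_dst: destination register of the load that caused the stall.
    """
    poisoned = {stalled_dst}
    issued = []
    for i, ins in enumerate(instrs):
        if any(s in poisoned for s in ins.srcs):
            # The result depends on data that has not arrived yet.
            poisoned.add(ins.dst)
        else:
            # A valid result overwrites whatever the register held before.
            poisoned.discard(ins.dst)
            if ins.op == "LD":
                # Independent load: its address is known during runahead.
                issued.append(i)
    return issued


# Hypothetical example: the stalling load writes R5.
example = [
    Instr("ALU", ("R5",), "R9"),  # depends on the stalling load
    Instr("LD", ("R9",), "R8"),   # address is poisoned, cannot issue
    Instr("ALU", ("R2",), "R3"),  # independent of the stall
    Instr("LD", ("R3",), "R7"),   # independent load
]
issued = runahead_loads(example, "R5")
print(issued)
```

Only the last load is reported, because the second instruction's address register is derived from the missing data.
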
Traditional Runahead Accuracy
[Figure: request accuracy (0% to 100%) of Runahead, GHB, Stream, and Markov+Stream]
Runahead is 95% accurate

Traditional Runahead Prefetch Coverage
[Figure: prefetch coverage; y-axis: % Independent Cache Misses (0% to 100%)]
Runahead has only 13% prefetch coverage

Traditional Runahead Performance Gain
[Figure: % IPC improvement over no-prefetching baseline (0% to 350%); series: Runahead Performance Gain, Oracle Performance Gain]
Runahead has a 12% performance gain
Runahead Oracle has an 85% performance gain

Traditional Runahead Interval Length
[Figure: cycles per runahead interval (0 to 140); series: 128 ROB, 256 ROB, 512 ROB, 1024 ROB]
Runahead intervals are short
Low performance gain

Continuous Runahead Challenges
• Which instructions to use during Continuous Runahead?
  • Dynamically target the dependence chains that lead to critical cache misses
• What hardware to use for Continuous Runahead?
• How long should chains pre-execute for?

Dependence Chains
LD [R6] -> R8
ADD R9, R1 -> R6
ADD R4, R5 -> R9
LD [R3] -> R5
Cache Miss

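A dependence chain like the one on this slide can be extracted with a backward dataflow walk. The sketch below is a software illustration, not the hardware implementation. It assumes program order runs from `LD [R3] -> R5` to `LD [R6] -> R8`, which is the order the register dependences imply. The `dependence_chain` function, the tuple layout, and the unrelated `MUL` instruction are hypothetical additions for illustration.

```python
def dependence_chain(window, miss_index):
    """Walk backward from a load and keep only the instructions it depends on.

    window: list of (text, source_registers, destination_register) tuples
            in program order.
    miss_index: position of the load whose address chain is wanted.
    Returns (chain in program order, sorted live-in registers).
    """
    needed = set(window[miss_index][1])  # registers still to be produced
    chain = [miss_index]
    for i in range(miss_index - 1, -1, -1):
        _, srcs, dst = window[i]
        if dst in needed:
            # This instruction produces a needed value: keep it and
            # start looking for the producers of its own sources.
            needed.discard(dst)
            needed.update(srcs)
            chain.append(i)
    chain.reverse()
    return [window[i][0] for i in chain], sorted(needed)


window = [
    ("LD [R3] -> R5", ("R3",), "R5"),
    ("ADD R4, R5 -> R9", ("R4", "R5"), "R9"),
    ("MUL R2, R2 -> R7", ("R2",), "R7"),  # hypothetical, not on the slide
    ("ADD R9, R1 -> R6", ("R9", "R1"), "R6"),
    ("LD [R6] -> R8", ("R6",), "R8"),
]
chain, live_ins = dependence_chain(window, 4)
print(chain)
print(live_ins)
```

The unrelated `MUL` is filtered out, and the registers left in `live_ins` are the values the chain needs from outside the window.
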
Dependence Chain Selection Policies
Experiment with 3 policies to determine the best policy to use for Continuous Runahead:
• PC-Based Policy
  • Use the dependence chain that has caused the most misses for the PC that is blocking retirement
• Maximum Misses Policy
  • Use a dependence chain from the PC that has generated the most misses for the application
• Stall Policy
  • Use a dependence chain from the PC that has caused the most full-window stalls for the application

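The three policies differ only in which counter they consult. The sketch below restates them as selection functions over per-PC statistics. All names, data layouts, and numbers here are hypothetical and chosen for illustration; the slides do not describe how the counters are stored.

```python
def pc_based_policy(chain_misses_by_pc, blocking_pc):
    """Pick the chain with the most misses for the PC blocking retirement."""
    chains = chain_misses_by_pc[blocking_pc]  # chain id -> miss count
    return blocking_pc, max(chains, key=chains.get)


def maximum_misses_policy(misses_by_pc):
    """Pick the PC that has generated the most misses in the application."""
    return max(misses_by_pc, key=misses_by_pc.get)


def stall_policy(stalls_by_pc):
    """Pick the PC that has caused the most full-window stalls."""
    return max(stalls_by_pc, key=stalls_by_pc.get)


# Hypothetical statistics for two load PCs.
misses_by_pc = {0x400: 120, 0x480: 300}
stalls_by_pc = {0x400: 90, 0x480: 40}
chain_misses_by_pc = {
    0x400: {"chain_a": 100, "chain_b": 20},
    0x480: {"chain_c": 300},
}

print(pc_based_policy(chain_misses_by_pc, 0x400))
print(hex(maximum_misses_policy(misses_by_pc)))
print(hex(stall_policy(stalls_by_pc)))
```

In this made-up data the PC with the most misses is not the PC with the most stalls, which is the situation where the Maximum Misses and Stall policies choose different chains.
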
Dependence Chain Selection Policies
[Figure: % IPC improvement (-20 to 100); series: Runahead Buffer, PC-Policy, Maximum-Misses Policy, Stall Policy]

Why does Stall Policy Work?
[Figure: number of PCs (0 to 1000); series: 90% of Stalls, All Stalls, All Misses]
19 PCs cover 90% of all stalls

Constrained Dependence Chain Storage
[Figure: normalized performance (0 to 1.2); series: 1 Chain, 2 Chains, 4 Chains, 8 Chains, 16 Chains, 32 Chains]
Storing 1 chain provides 95% of the performance

Continuous Runahead Chain Generation
Maintain two structures:
• 32-entry cache of PCs to track the operations that cause the pipeline to frequently stall
• The last dependence chain for the PC that has caused the most full-window stalls
At every full window stall:
• Increment the counter of the PC that caused the stall
• Generate a dependence chain for the PC that has caused the most stalls

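The bookkeeping on this slide can be sketched as a small counter table. This is an illustration under stated assumptions, not the hardware design: the `StallTracker` class is a hypothetical name, and the replacement rule for a full table (evict the entry with the lowest count) is an assumption, since the slide does not say how entries are replaced.

```python
class StallTracker:
    """Toy model of a small cache of PCs with full-window stall counters."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.counts = {}  # PC -> number of full-window stalls it caused

    def record_stall(self, pc):
        """Count one full-window stall and return the PC with the most stalls."""
        if pc not in self.counts and len(self.counts) >= self.capacity:
            # Assumed replacement rule: drop the least frequently stalling PC.
            victim = min(self.counts, key=self.counts.get)
            del self.counts[victim]
        self.counts[pc] = self.counts.get(pc, 0) + 1
        # A dependence chain would be generated for this PC.
        return max(self.counts, key=self.counts.get)


tracker = StallTracker()
for stalling_pc in [0x10, 0x20, 0x10, 0x30, 0x10]:
    hottest = tracker.record_stall(stalling_pc)
print(hex(hottest), tracker.counts[hottest])
```

Each call corresponds to one full-window stall; the returned PC is the one whose dependence chain would be regenerated.
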
Runahead for Longer Intervals
[Diagram: Core 0 to Core 3, one LLC slice per core, the Continuous Runahead Engine (CRE), DRAM Channel 0, and DRAM Channel 1]

CRE Microarchitecture
• No Front-End
• No Register Renaming Hardware
• 32 Physical Registers
• 2-Wide
• No Floating Point or Vector Pipeline
• 4 kB Data Cache

Dependence Chain Generation
Chain in the core (core physical registers):
ADD P7 + 1 -> P1
SHIFT P1 -> P9
ADD P9 + P1 -> P3
SHIFT P3 -> P2
LD [P2] -> P8
Remapped chain (CRE physical registers):
ADD E5 + 1 -> E3
SHIFT E3 -> E4
ADD E4 + E3 -> E2
SHIFT E2 -> E1
LD [E1] -> E0
MAP E3 -> E5
[Animated diagram, cycles 0 to 5: a Register Remapping Table with columns EAX, EBX, ECX and rows Core Physical Register, CRE Physical Register, and First CRE Physical Register. Search List contents by step: P2; P3; P9, P1; P1; P7. Resulting mapping: P8 -> E0, P2 -> E1, P3 -> E2, P1 -> E3, P9 -> E4, P7 -> E5]

Dependence Chain Generation
ADD E5 + 1 -> E3
SHIFT E3 -> E4
ADD E4 + E3 -> E2
SHIFT E2 -> E1
MEM_LD [E1] -> E0

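The renaming from core physical registers (P) to CRE physical registers (E) can be sketched as follows. This is an illustrative reconstruction, not the hardware algorithm: the `remap_chain` function and the tuple layout are hypothetical. It assumes registers are numbered in the order they are first met while walking the chain from the load backward, and it numbers sources right to left, a choice made only so that the sketch reproduces the register names on the slides. It does not model the MAP operation or the Register Remapping Table.

```python
def remap_chain(chain):
    """Rename core physical registers to CRE registers E0, E1, ...

    chain: list of (op, sources, destination) in program order.
           Sources are register names (str) or immediates (int).
    Returns (remapped chain in program order, core-to-CRE mapping).
    """
    mapping = {}

    def cre_reg(core_reg):
        # Allocate the next free CRE register on first use.
        if core_reg not in mapping:
            mapping[core_reg] = f"E{len(mapping)}"
        return mapping[core_reg]

    remapped = []
    for op, srcs, dst in reversed(chain):  # start at the load, walk backward
        new_dst = cre_reg(dst)
        new_srcs = list(srcs)
        for k in reversed(range(len(srcs))):  # right to left (assumption)
            if isinstance(srcs[k], str):
                new_srcs[k] = cre_reg(srcs[k])
        remapped.append((op, tuple(new_srcs), new_dst))
    remapped.reverse()
    return remapped, mapping


core_chain = [
    ("ADD", ("P7", 1), "P1"),
    ("SHIFT", ("P1",), "P9"),
    ("ADD", ("P9", "P1"), "P3"),
    ("SHIFT", ("P3",), "P2"),
    ("LD", ("P2",), "P8"),
]
cre_chain, mapping = remap_chain(core_chain)
for op, srcs, dst in cre_chain:
    print(op, srcs, "->", dst)
```

Because the chain needs only six registers after renaming, it fits comfortably in a small register file such as the 32-entry one listed for the CRE.
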
Interval Length
[Figure: Continuous Runahead Request Accuracy and GMean Performance Gain (0% to 100%) versus Update Interval (Instructions Retired): 1k, 5k, 10k, 25k, 50k, 100k, 250k, 500k, 1M, 2M]

System Configuration
• Single-Core/Quad-Core
  • 4-wide Issue
  • 256 Entry Reorder Buffer
  • 92 Entry Reservation Station
• Caches
  • 32 KB 8-Way Set Associative L1 I/D-Cache
  • 1 MB 8-Way Set Associative Shared Last Level Cache per Core
• Non-Uniform Memory Access Latency DDR3 System
  • 256-Entry Memory Queue
  • Batch Scheduling
• Prefetchers
  • Stream, Global History Buffer
  • Feedback Directed Prefetching: Dynamic Degree 1-32
• CRE Compute
  • 2-wide issue
  • 1 Continuous Runahead issue context with a 32-entry buffer and 32-entry physical register file
  • 4 kB Data Cache

Single-Core Performance
[Figure: % IPC improvement over no-prefetching baseline (0 to 120); series: Runahead Buffer, Continuous Runahead]
21% single-core performance increase over prior state of the art

Single-Core Performance + Prefetching
[Figure: % IPC improvement over no-prefetching baseline (0 to 140); series: Runahead Buffer, Continuous Runahead, Stream PF, GHB PF, Continuous Runahead + Stream, Continuous Runahead + GHB]
Increases performance over and in conjunction with prefetching

Independent Miss Coverage
[Figure: % Independent Cache Misses Prefetched (0% to 100%)]
70% prefetch coverage

Bandwidth Overhead
[Figure: normalized bandwidth (0 to 2.5); series: Continuous Runahead, Stream PF, GHB PF]
Low bandwidth overhead

Multi-Core Performance
[Figure: % weighted speedup improvement (0 to 60) for workloads H1 to H10 and GMean; series: Continuous Runahead, Stream PF, GHB PF]
43% weighted speedup increase

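For readers unfamiliar with the metric, the sketch below computes weighted speedup using its usual definition: the sum, over all co-running applications, of each application's IPC when sharing the system divided by its IPC when running alone. The slides do not spell out the formula, so treating this as the exact metric behind the figure is an assumption, and every IPC value below is made up for illustration.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-application IPC ratios (shared system over running alone)."""
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def percent_improvement(new, base):
    """Relative change of a metric over a baseline, in percent."""
    return 100.0 * (new - base) / base


# Hypothetical IPCs for a four-application workload.
ipc_alone = [2.0, 1.0, 1.0, 2.0]
ipc_baseline = [1.0, 0.5, 0.5, 1.0]     # shared system, no prefetching
ipc_with_cre = [1.5, 0.75, 0.5, 1.0]    # shared system, made-up numbers

ws_base = weighted_speedup(ipc_baseline, ipc_alone)
ws_cre = weighted_speedup(ipc_with_cre, ipc_alone)
print(ws_base, ws_cre, percent_improvement(ws_cre, ws_base))
```

Normalizing each application by its stand-alone IPC keeps a single high-IPC application from dominating the result.
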
Multi-Core Performance + Prefetching
[Figure: % weighted speedup improvement (0 to 90) for workloads H1 to H10 and GMean; series: Continuous Runahead, Stream PF, GHB PF, Continuous Runahead + Stream, Continuous Runahead + GHB]
13% weighted speedup gain over GHB prefetching

Multi-Core Energy Evaluation
[Figure: energy normalized to no-prefetching baseline (0 to 1) for workloads H1 to H10 and Mean; series: Continuous Runahead, Stream PF, GHB PF, Continuous Runahead + Stream, Continuous Runahead + GHB]
22% energy reduction

Conclusions
• Runahead prefetch coverage is limited by the duration of each runahead interval
• To remove this constraint, we introduce the notion of Continuous Runahead
• We can dynamically identify the most critical LLC misses to target with Continuous Runahead by tracking the operations that cause the pipeline to frequently stall
• We migrate these dependence chains to the CRE where they are executed continuously in a loop

Conclusions
• Continuous Runahead greatly increases prefetch coverage
• Increases single-core performance by 34.4%
• Increases multi-core performance by 43.3%
• Synergistic with various types of prefetching