Variable Length Delta Prefetcher 1
Efficiently Prefetching Complex Address Patterns
Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian
University of Utah
Chris Wilkerson, Zeshan Chishti, Seth Pugsley
*Intel Labs
Prefetchers
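Confirmation-Based Prefetchers
• Issue predictions only after observing a few deltas
• High accuracy
• Short streams lose out: they can end before the pattern is confirmed
Immediate Prefetchers
• Aggressive
• Low accuracy
• Waste DRAM bandwidth and cache capacity
Trade-off: accurate (confirmation-based) vs. fast (immediate)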
Spatial Correlation
• Learn access (delta) patterns
• Apply the patterns when similar conditions recur, e.g., the same PC, physical address, or delta pattern
Delta Patterns
• Regular delta patterns, e.g., (+1, +1, +1)…, (+2, +2, +2, +2)…
• Irregular delta patterns, e.g., (+1, +2, +3)…
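A minimal C sketch of the regular/irregular distinction, assuming cache-line addresses are plain integers: compute the delta sequence of an access stream, then call the pattern "regular" when every delta is identical, as in the (+1, +1, +1) and (+2, +2, +2, +2) examples. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Write the n - 1 deltas between consecutive cache-line addresses into
   deltas[] (caller provides room) and return how many were written. */
size_t compute_deltas(const long *lines, size_t n, long *deltas) {
    if (n < 2)
        return 0;
    for (size_t i = 1; i < n; i++)
        deltas[i - 1] = lines[i] - lines[i - 1];
    return n - 1;
}

/* "Regular" in the slide's sense: every delta equals the first one. */
bool is_regular(const long *deltas, size_t m) {
    for (size_t i = 1; i < m; i++)
        if (deltas[i] != deltas[0])
            return false;
    return true;
}
```

For lines 0, 1, 2, 3 the deltas are (+1, +1, +1), a regular pattern; for 0, 1, 3, 6 they are (+1, +2, +3), an irregular one.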
Long Repeatable Streams of Irregular Deltas
Page Num: 479218, Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, …
Delta patterns for milc
Long Repeatable Streams of Irregular Deltas
Deltas: 1, 9, -8, 1, 8, 1, -8, 1, 1, 7, -1, -5, …
Cache lines: A+1, A+10, A+2, A+3, A+11, A+12, A+4, A+5, A+6, A+13, A+12, A+7, …
Stream 1: A+1, A+2, A+3, A+4, A+5, A+6, A+7
Stream 2: A+10, A+11, A+12, A+13
Confirmation-Based Prefetchers
• Stride prefetcher coverage: 5/11
• SandBox prefetcher coverage: 9/11
Neither is perfectly timely!
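The cache-line list follows from the deltas by a running sum. A small C sketch, assuming the deltas are applied starting from line A (offset 0), reproduces the interleaved offsets A+1, A+10, A+2, … listed above; `replay_deltas` is a hypothetical helper.

```c
#include <stddef.h>

/* Apply each delta in turn, starting at `start`, and record every resulting
   cache-line offset (relative to A) in out[]. */
void replay_deltas(long start, const long *deltas, size_t n, long *out) {
    long line = start;
    for (size_t i = 0; i < n; i++) {
        line += deltas[i];
        out[i] = line;
    }
}
```

With the twelve deltas above and start = 0, `out` holds 1, 10, 2, 3, 11, 12, 4, 5, 6, 13, 12, 7: two streams (A+1 to A+7 and A+10 to A+13) interleaved, which is why no single constant stride describes the sequence.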
Variable Length Delta Prefetcher
[Figure: Structure of VLDP. Labels: Core 1 to Core 8; Last Level $; $ Access; Per-Page Delta History Tables; Delta Prediction Tables; Offset Prediction Tables; Predicted Delta/Offset]
Delta History Table: Tracks Deltas Within a Page
for (i = 0; i < BIGNUM; i++) {
    a[i] = b[i] + c[i];  /* three interleaved streams: a[], b[], c[] */
}
a, b, and c can each belong to a different page, so deltas between pages are meaningless.
Delta = Current Address - Last Address
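A sketch of per-page delta computation in C. The page geometry is an assumption not stated on the slides (4 KB pages and 64 B cache lines, so 64 lines per page); the sign convention matches the milc example, where A+1 followed by A+10 gives a delta of +9. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry (not given on the slides): 4 KB pages, 64 B lines. */
#define PAGE_SHIFT 12
#define LINE_SHIFT 6
#define LINES_PER_PAGE (1u << (PAGE_SHIFT - LINE_SHIFT))

static uint64_t page_num(uint64_t paddr) { return paddr >> PAGE_SHIFT; }

/* Cache-line offset within the page, 0 .. LINES_PER_PAGE - 1. */
static int line_offset(uint64_t paddr) {
    return (int)((paddr >> LINE_SHIFT) & (LINES_PER_PAGE - 1));
}

/* Delta = current offset - last offset, defined only for two accesses to the
   same page; returns false for a cross-page pair, which is not recorded. */
bool in_page_delta(uint64_t last, uint64_t cur, int *delta) {
    if (page_num(last) != page_num(cur))
        return false;
    *delta = line_offset(cur) - line_offset(last);
    return true;
}
```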
Delta History Table
Entry fields: Page Num. | Last Add. | Last 4 Deltas | Last Predictor | Num. Times Used | Last Four Prefetched Offsets
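One way to lay out a DHT entry in C, with one field per column above plus a hypothetical `num_deltas` count. The field types are illustrative assumptions (the slides give only the 222 B total for 16 entries), and reading Last Predictor as the number of the DPT that made the last prediction is an interpretation, not something the slide states.

```c
#include <stdint.h>
#include <string.h>

#define DHT_DELTAS 4      /* Last 4 Deltas */
#define DHT_PREFETCHES 4  /* Last Four Prefetched Offsets */

typedef struct {
    uint64_t page_num;                   /* Page Num. */
    int8_t   last_offset;                /* Last Add., as a line offset in the page */
    int8_t   deltas[DHT_DELTAS];         /* Last 4 Deltas, oldest first */
    uint8_t  num_deltas;                 /* hypothetical: valid entries in deltas[] */
    uint8_t  last_predictor;             /* assumed: DPT (1..3) that predicted last */
    uint8_t  num_used;                   /* Num. Times Used */
    int8_t   prefetched[DHT_PREFETCHES]; /* Last Four Prefetched Offsets */
} dht_entry;

/* Append the newest delta, discarding the oldest once four are held. */
void dht_push_delta(dht_entry *e, int8_t delta) {
    if (e->num_deltas == DHT_DELTAS) {
        memmove(e->deltas, e->deltas + 1, (DHT_DELTAS - 1) * sizeof e->deltas[0]);
        e->num_deltas--;
    }
    e->deltas[e->num_deltas++] = delta;
}
```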
Delta Prediction Tables
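DPT 1 entry: Delta (8 b) | Pred. (8 b) | Accuracy (2 b)
DPT 3 entry: 3 Deltas (8 b each) | Pred. (8 b) | Accuracy (2 b)
64 rows per table
Each table is checked for a match on the recent deltas; a MUX outputs the predicted delta from the matching table with the highest priority (t = 3 highest, t = 1 lowest).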
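The priority lookup can be sketched in C as follows. Assumptions: each table is searched linearly, table t matches on the last t deltas, and the accuracy counter is stored but not consulted when selecting a prediction; none of these details is fixed by the slide, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DPT_ROWS 64  /* 64 rows per table */
#define NUM_DPT  3   /* tables t = 1, 2, 3 */

typedef struct {
    bool    valid;
    int8_t  key[NUM_DPT]; /* the last t deltas (oldest first); t = table number */
    int8_t  pred;         /* predicted next delta */
    uint8_t acc;          /* 2-bit accuracy counter */
} dpt_row;

typedef struct { dpt_row rows[DPT_ROWS]; } dpt_table;

/* hist holds n recent deltas, oldest first. Tables are tried from t = 3
   (highest priority) down to t = 1, like the MUX on the slide; the first
   matching row supplies the prediction and *which records its table. */
bool dpt_predict(const dpt_table tables[NUM_DPT], const int8_t *hist, int n,
                 int8_t *pred, int *which) {
    for (int t = NUM_DPT; t >= 1; t--) {
        if (n < t)
            continue;
        const int8_t *tail = hist + (n - t);
        for (int r = 0; r < DPT_ROWS; r++) {
            const dpt_row *row = &tables[t - 1].rows[r];
            bool match = row->valid;
            for (int k = 0; match && k < t; k++)
                match = (row->key[k] == tail[k]);
            if (match) {
                *pred = row->pred;
                *which = t;
                return true;
            }
        }
    }
    return false;
}
```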
Offset Prediction Table
Entry: First Page Offset (7 b) | Pred. Offset (7 b) | Accuracy (2 b)
The OPT is used only to predict the second access to a page.
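A sketch of the OPT in C, assuming it is indexed directly by the page's first line offset and trained with the same increment/decrement rule the slides give for the DPTs. `OPT_MIN_ACC` and the initial accuracy of 1 are hypothetical choices, not values from the slides.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPT_ENTRIES 64
#define OPT_MIN_ACC 1   /* hypothetical threshold for issuing a prefetch */

typedef struct { bool valid; uint8_t pred; uint8_t acc; } opt_entry;
typedef struct { opt_entry e[OPT_ENTRIES]; } opt_table;

/* First access to a page: predict the second access right away, with no
   confirmation. Returns true (and sets *pred) if a prefetch should issue. */
bool opt_predict(const opt_table *t, uint8_t first_off, uint8_t *pred) {
    const opt_entry *x = &t->e[first_off % OPT_ENTRIES];
    if (!x->valid || x->acc < OPT_MIN_ACC)
        return false;
    *pred = x->pred;
    return true;
}

/* Second access to a page: train the entry for the page's first offset. */
void opt_train(opt_table *t, uint8_t first_off, uint8_t second_off) {
    opt_entry *x = &t->e[first_off % OPT_ENTRIES];
    if (x->valid && x->pred == second_off) {
        if (x->acc < 3)
            x->acc++;                 /* 2-bit saturating counter */
    } else if (x->valid && x->acc > 0) {
        x->acc--;                     /* wrong: lose confidence first */
    } else {
        x->valid = true;              /* empty, or wrong at accuracy 0: replace */
        x->pred = second_off;
        x->acc = 1;
    }
}
```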
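Need for Multiple Tables
Repeating delta pattern: (1, 2, 3, 5, 2, 4)…
Table 1 (Delta → Pred.): 1 → 2, 2 → 3, 3 → 5, 5 → 2
Table 2 (Deltas → Pred.): (1, 2) → 3, (2, 3) → 5, (3, 5) → 2, (5, 2) → 4
Table 1 is only 50% accurate after delta 2, because 2 is followed by 3 in one place and by 4 in another; Table 2's two-delta history tells the two cases apart.
The search for a delta-pattern match starts from the rightmost (longest-history) table.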
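The 50% figure can be reproduced with a small C simulation. Assumptions (a reading of the update slides, not spelled out here): deltas are small enough to index arrays directly, a new entry starts with accuracy 1, a correct prediction increments the 2-bit counter, and a wrong one decrements it, replacing the prediction only when the counter is already 0. After two warm-up periods, it counts how often each table correctly predicts the delta that follows a 2.

```c
#include <stdbool.h>

#define MAXD 8  /* deltas in this example are small non-negative values */

typedef struct { bool valid; int pred; int acc; } pred_entry;

/* Train one entry on the delta that actually occurred; returns true if the
   entry already held a correct prediction. */
static bool train(pred_entry *e, int actual) {
    if (!e->valid) {
        *e = (pred_entry){ true, actual, 1 };
        return false;
    }
    if (e->pred == actual) {
        if (e->acc < 3)
            e->acc++;
        return true;
    }
    if (e->acc > 0)
        e->acc--;
    else
        e->pred = actual;
    return false;
}

/* Replay `periods` copies of (1, 2, 3, 5, 2, 4); in the last period, count
   predictions made right after a delta of 2 and how many each table got right. */
void accuracy_after_2(int periods, int *t1_hits, int *t2_hits, int *total) {
    static const int pat[6] = {1, 2, 3, 5, 2, 4};
    pred_entry t1[MAXD] = {0};        /* key: last delta */
    pred_entry t2[MAXD][MAXD] = {0};  /* key: last two deltas */
    int prev2 = -1, prev1 = -1;
    *t1_hits = *t2_hits = *total = 0;
    for (int p = 0; p < periods; p++) {
        for (int i = 0; i < 6; i++) {
            int d = pat[i];
            bool ok1 = false, ok2 = false;
            if (prev1 >= 0)
                ok1 = train(&t1[prev1], d);
            if (prev2 >= 0)
                ok2 = train(&t2[prev2][prev1], d);
            if (p == periods - 1 && prev1 == 2) {
                (*total)++;
                *t1_hits += ok1;
                *t2_hits += ok2;
            }
            prev2 = prev1;
            prev1 = d;
        }
    }
}
```

With three periods, two predictions follow a 2 in the last period: Table 1 gets one of them right (50%), Table 2 gets both.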
Looking Farther than One Delta Ahead
Repeating delta pattern: 1, 2, 3, 1, 2, 3…
Table 1 (Delta → Pred.): 1 → 2, 2 → 3, 3 → 1
Table 2 (Deltas → Pred.): (1, 2) → 3, (2, 3) → 1, (3, 1) → 2
Use a recursive lookup to look farther than one delta ahead: the current delta yields the degree-1 prediction, which is then looked up in turn to yield the degree-2 prediction.
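A C sketch of the recursive lookup, using only Tables 1 and 2 stored as small directly indexed arrays where 0 marks an empty entry (both simplifications). Each predicted delta is appended to a speculative history and looked up again. Following the configuration slide, only the multi-delta table is used beyond degree 1. `predict_degree` and `toy_dpts` are hypothetical names.

```c
#include <stdbool.h>

#define MAXD 8  /* toy example: deltas are 1 .. MAXD - 1, 0 means "no entry" */

typedef struct {
    int t1[MAXD];        /* Table 1: last delta -> predicted delta */
    int t2[MAXD][MAXD];  /* Table 2: (older, newer) -> predicted delta */
} toy_dpts;

/* Prefer the longer history; Table 1 is consulted only when allowed. */
static int predict(const toy_dpts *t, int older, int newer, bool allow_t1) {
    if (older > 0 && t->t2[older][newer] != 0)
        return t->t2[older][newer];
    return allow_t1 ? t->t1[newer] : 0;
}

/* Fill out[] with up to `degree` predicted deltas by feeding each prediction
   back in as the newest delta; returns how many were produced. */
int predict_degree(const toy_dpts *t, int older, int newer, int degree, int *out) {
    int n = 0;
    while (n < degree) {
        int next = predict(t, older, newer, n == 0);
        if (next == 0)
            break;               /* no further pattern: stop early */
        out[n++] = next;
        older = newer;
        newer = next;
    }
    return n;
}
```

For the 1, 2, 3 pattern with current history (1, 2), the degree-1 prediction is 3, degree 2 is 1, and degree 3 is 2.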
Case Study: Streaming Workloads
Repeating delta pattern: 1, 1, 1, 1, 1…
Table 1 (Delta → Pred.): 1 → 1
Table 2: empty
Table 1 already predicts this pattern correctly, so no longer history is needed. Patterns learned from one page are applied to another.
Updating the Delta History Tables
On each LLC access:
• If the page is present, add the new delta to its entry.
• If the page is not present, replace an entry, evicting a Not Recently Used (NRU) one.
Entry fields: Page Num. | Last Add. | Last 4 Deltas | Last Predictor | Num. Used | Last 4 Prefetches
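The update steps above can be sketched in C as follows. Assumptions: a 16-entry table (from the configuration slide), each entry keeps only its newest delta to stay short, and NRU is modeled with one recently-used bit per entry that is cleared for every entry when all are set. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DHT_ENTRIES 16  /* 16-entry Delta History Table */

typedef struct {
    bool     valid;
    bool     recent;       /* NRU bit: set when the entry is used */
    uint64_t page;
    int      last_offset;  /* line offset of the last access to this page */
    int      last_delta;   /* newest delta only, to keep the sketch short */
} dht_slot;

typedef struct { dht_slot s[DHT_ENTRIES]; } dht;

/* Victim choice: a free slot, else the first not-recently-used slot; if every
   slot is marked recent, clear all marks and take slot 0. */
static int dht_victim(dht *d) {
    for (int i = 0; i < DHT_ENTRIES; i++)
        if (!d->s[i].valid)
            return i;
    for (int i = 0; i < DHT_ENTRIES; i++)
        if (!d->s[i].recent)
            return i;
    for (int i = 0; i < DHT_ENTRIES; i++)
        d->s[i].recent = false;
    return 0;
}

/* One LLC access. Returns true if the page was already tracked. */
bool dht_access(dht *d, uint64_t page, int offset) {
    for (int i = 0; i < DHT_ENTRIES; i++) {
        dht_slot *e = &d->s[i];
        if (e->valid && e->page == page) {   /* page present: add delta */
            e->last_delta = offset - e->last_offset;
            e->last_offset = offset;
            e->recent = true;
            return true;
        }
    }
    int v = dht_victim(d);                   /* page absent: replace */
    d->s[v] = (dht_slot){ .valid = true, .recent = true, .page = page,
                          .last_offset = offset, .last_delta = 0 };
    return false;
}
```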
Updating the Prediction Tables
DHT entry: Page Num. | Last Add. | Last 3 Deltas (B, C, D) | Last Predictor; latest delta: E
Table 3 (Deltas → Pred.): (B, C, D) → E?
Table 2 (Deltas → Pred.): (C, D) → E?
Table 1 (Delta → Pred.): D → F?
Can the current state predict the latest delta?
• If the prediction is correct, increment its accuracy.
• If the prediction is wrong, decrement its accuracy; if the accuracy is 0, update the prediction and promote it.
• If the prediction is missing, seed Table 1 with the prediction.
Populating the Prediction Tables
Table 1 (Delta → Pred.): 1 → A
Table 2 (Deltas → Pred.): (1, 1) → B
Table 3 (Deltas → Pred.): (1, 1, 1) → C
• Pattern missing: seed Table 1.
• Table 1 wrong: populate Table 2.
• Table 2 wrong: populate Table 3.
Each table evicts with NRU. If a table mispredicts, a longer delta history might be needed.
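A C sketch of these training rules, under one reading of the two update slides: a missing prediction seeds Table 1; a correct one increments the 2-bit accuracy counter; a wrong one decrements it, and once the counter is already 0 the entry is updated and the longer pattern is promoted into the next table. Seeding new entries at accuracy 1 and filling the first free row (instead of modeling NRU) are simplifying assumptions; all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_DPT 3
#define ROWS 64

typedef struct { bool valid; int key[NUM_DPT]; int pred; int acc; } row_t;
typedef struct { row_t r[NUM_DPT][ROWS]; } dpt_set;

/* Row of table t (1..3) whose key equals the last t deltas of h[0..n-1]. */
static row_t *find(dpt_set *d, int t, const int *h, int n) {
    if (n < t)
        return NULL;
    for (int i = 0; i < ROWS; i++) {
        row_t *x = &d->r[t - 1][i];
        if (x->valid && memcmp(x->key, h + n - t, (size_t)t * sizeof *h) == 0)
            return x;
    }
    return NULL;
}

/* Install (last t deltas -> actual) in table t: reuse a matching row, else
   the first free row, else row 0 (a stand-in for NRU replacement). */
static void seed(dpt_set *d, int t, const int *h, int n, int actual) {
    if (n < t)
        return;
    row_t *x = find(d, t, h, n);
    for (int i = 0; x == NULL && i < ROWS; i++)
        if (!d->r[t - 1][i].valid)
            x = &d->r[t - 1][i];
    if (x == NULL)
        x = &d->r[t - 1][0];
    x->valid = true;
    memcpy(x->key, h + n - t, (size_t)t * sizeof *h);
    x->pred = actual;
    x->acc = 1;
}

/* h[0..n-1]: deltas before the latest one (oldest first); last_pred: the
   table (1..3) that predicted the latest delta, or 0 if none did. */
void dpt_update(dpt_set *d, const int *h, int n, int last_pred, int actual) {
    row_t *x = last_pred ? find(d, last_pred, h, n) : NULL;
    if (x == NULL) {                   /* prediction missing: seed Table 1 */
        seed(d, 1, h, n, actual);
        return;
    }
    if (x->pred == actual) {           /* correct: increment accuracy */
        if (x->acc < 3)
            x->acc++;
        return;
    }
    if (x->acc > 0) {                  /* wrong: decrement accuracy */
        x->acc--;
        return;
    }
    x->pred = actual;                  /* accuracy already 0: update ... */
    if (last_pred < NUM_DPT)           /* ... and promote a longer history */
        seed(d, last_pred + 1, h, n, actual);
}
```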
Evaluation Methodology
• Simics + USIMM
• 8 RISC cores, UltraSPARC III ISA
• 3.2 GHz, 4-wide OoO, 128-entry ROB
• 32 KB L1 I and D caches, 4 cycles
• 8 MB shared L2 cache (1 MB per core), 10 cycles
• DRAM: 2 channels, 2 ranks per channel, 8 banks per rank; 800 MHz DDR3
• Workloads: SPEC 2006, NPB, and CloudSuite
• Mix1: milc, astar, lbm, libq; Mix2: xalancbmk, lbm, zeusmp, milc
VLDP Configuration
• Per-core VLDP
• 1 Offset Prediction Table, 64 entries
• 3 Delta Prediction Tables, 64 entries each
• 16-entry Delta History Table
• Only Delta Prediction Tables 2 and 3 contribute to multi-degree prefetching
Storage per core:
Offset Prediction Table: 128 B
Delta History Table: 222 B
Delta Prediction Tables (all 3): 648 B
Total: 998 B
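The storage figures can be checked with a little arithmetic, written here as C. Entry widths come from the table slides (OPT: 7 + 7 + 2 bits; DPT t: t deltas of 8 bits, an 8-bit prediction, and a 2-bit accuracy counter). Those widths alone give 624 B for the three DPTs; the remaining 24 B matches one extra bit per DPT row, assumed here to be the NRU bit. The DHT size is taken from the slide as given.

```c
/* Per-core VLDP storage. Widths are from the slides except NRU_BITS,
   an assumption that reconciles the DPT total with 648 B. */
enum {
    ROWS = 64,
    OFFSET_BITS = 7,
    DELTA_BITS = 8,
    ACC_BITS = 2,
    NRU_BITS = 1,
    DHT_BYTES = 222  /* given on the slide, no per-field breakdown */
};

int opt_bytes(void) {
    return ROWS * (OFFSET_BITS + OFFSET_BITS + ACC_BITS) / 8;
}

/* DPT t holds t deltas, a predicted delta, an accuracy counter, an NRU bit. */
int dpt_bytes(int num_tables) {
    int bits = 0;
    for (int t = 1; t <= num_tables; t++)
        bits += ROWS * (t * DELTA_BITS + DELTA_BITS + ACC_BITS + NRU_BITS);
    return bits / 8;
}

int vldp_bytes(void) {
    return opt_bytes() + DHT_BYTES + dpt_bytes(3);
}
```

`opt_bytes()` gives 128, `dpt_bytes(3)` gives 648, and `vldp_bytes()` gives 998, matching the configuration above.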
Performance Improvement (vs. Non-PC Prefetchers)
VLDP is 6% better than AMPM, 9% better than SBP, and 17% better than FDP.
[Figure: speedup (y-axis, 0.8 to 2.0) of FDP, SBP, AMPM, and VLDP on CG, IS, LU, MG, SP, Classification, Cloud9, Astar, Lbm, Libq, Mcf, Milc, Omnet, Soplex, Xalanc, Zeus, Mix1, Mix2, and GM]
Performance Improvement (vs. PC-Based Prefetchers)
VLDP is 7.1% better than GHB and 7.6% better than SMS.
[Figure: speedup (y-axis, 0.8 to 2.0) of SMS, GHB_PC_DC, and VLDP on CG, IS, LU, MG, SP, Classification, Cloud9, Astar, Lbm, Libq, Mcf, Milc, Omnet, Soplex, Xalanc, Zeus, Mix1, Mix2, and GM]
Coverage
FDP 16%, SMS 55%, SBP 40%, GHB 33%, AMPM 49%, VLDP 61%
[Figure: coverage (y-axis, 0% to 120%) of FDP, SMS, SBP, GHB_PC_DC, AMPM, and VLDP on NPB, CloudSuite, Spec2006, Spec2006-Mix, and GM]
Sensitivity to table size
[Figure: speedup (y-axis, 0.98 to 1.03) for configurations 32Page, 16Page, and 8Page, each with 8T, 16T, 32T, and 64T (32Page_8T through 8Page_64T)]
2% increase in performance when DPT size is increased
Sensitivity to the Number of Delta Prediction Tables
With 3 DPTs, performance improves by only a modest 1%, but efficiency improves because DRAM requests drop by 3%.
[Figure: speedup and DRAM accesses (y-axis, 1 to 1.5) for 1DPT_NoOPT, 1DPT+OPT, 2DPT+OPT, 3DPT+OPT, and 4DPT+OPT]
Conclusions
• OPT issues predictions without confirmation
• DPT recognizes irregular delta patterns
• Long delta patterns provide high accuracy
• Less than 1 KB of overhead per core
• 6% better performance than AMPM, the strongest competing prefetcher evaluated
Thank You