A Hardware-based Cache Pollution Filtering Mechanism for Aggressive Prefetches
Xiaotong Zhuang, College of Computing
Hsien-Hsin Sean Lee, School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332
ICPP, Kaohsiung, Taiwan, 2003

Agenda
- Introduction
- Motivation
- The Prefetch Pollution Filter
- Experimental Results
- Conclusion

Introduction

Data Prefetching
- Why data prefetching?
  - Speed gap between CPU and main memory
  - Initial data references still miss
  - Performance suffers if there are not enough independent instructions to mask the latency
- Prefetching techniques
  - Hardware-based
  - Software-based
- Design trend
  - Memory bandwidth increases, allowing more aggressive prefetching
  - L1 caches are getting smaller to expedite accesses
- When prefetching becomes "too aggressive"
  - Severe pollution
  - Performance overkill

Cache Pollution
- Sources of pollution
  - No prefetching scheme guarantees 100% accuracy
  - HW-based prefetching can cause a lot of pollution
  - Stride-based prefetching can easily become ineffective for pointer-based applications
- Outcomes of pollution
  - Evict useful data
  - Compete for available resources
    - Limited cache capacity
    - Cache ports
    - Bus bandwidth between components of the memory hierarchy
  - Degrade performance

Related Work
- Prefetch buffer [Chen et al. '91] [Chen & Baer '95]
  - Separate normal and prefetched data, accessed in parallel
  - Small, fully associative, in the critical path
- Evict-me [Wang et al. '02]
  - Reuse distance check; mark data that are unused or whose reuse distance is too long
  - Evict-me data have higher priority to be cast out
- Dead cache line detection [Lai, Fide & Falsafi '01]
  - Detect dead blocks and replace them with useful prefetches
  - Prevent useful data from being evicted
- Prefetch taxonomy [Srinivasan et al. '99]
  - More detailed classification of prefetches
  - Proposed a "static filter": profiling-based pollution filtering

Our Contribution
- Characterization of prefetch effectiveness
- Propose and evaluate two hardware prefetch pollution filtering mechanisms
  - Per-Address (PA) based
  - Program Counter (PC) based
- Quantify our technique through simulation

Motivation

Prefetch Classification
- Comprehensive classification is not desirable due to its implementation complexity in hardware
- Good (effective) prefetches: those referenced in the cache before they are evicted
- Bad (ineffective) prefetches: those never referenced during their lifetime in the cache
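
The good/bad definition above can be illustrated with a small trace replay. This is only a sketch under stated assumptions, not the authors' simulator: the tiny direct-mapped cache, the `(kind, line_address)` event format, and the choice to count unreferenced lines still resident at the end of the trace as bad are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Line:
    tag: int
    prefetched: bool   # line was brought in by a prefetch
    referenced: bool   # line was touched by a demand access after the fill


def classify_prefetches(events, num_sets=4):
    """Replay (kind, line_address) events on a direct-mapped cache.

    kind is "prefetch" or "demand". Returns (good, bad) prefetch counts.
    """
    sets = [None] * num_sets
    good = bad = 0

    def retire(line):
        # Classify a prefetched line at the moment it leaves the cache.
        nonlocal good, bad
        if line is not None and line.prefetched:
            if line.referenced:
                good += 1
            else:
                bad += 1

    for kind, addr in events:
        index, tag = addr % num_sets, addr // num_sets
        line = sets[index]
        hit = line is not None and line.tag == tag
        if kind == "demand":
            if hit:
                line.referenced = True
            else:
                retire(line)
                sets[index] = Line(tag, prefetched=False, referenced=True)
        elif not hit:
            # A prefetch that misses fills the cache and may evict a line.
            retire(line)
            sets[index] = Line(tag, prefetched=True, referenced=False)

    # Assumption: lines still resident at the end are classified as-is.
    for line in sets:
        retire(line)
    return good, bad


trace = [("prefetch", 0), ("demand", 0), ("prefetch", 1),
         ("demand", 5), ("prefetch", 2)]
print(classify_prefetches(trace))  # (1, 2)
```

In this trace the prefetch of line 0 is good because a demand access hits it, while the prefetches of lines 1 and 2 are bad: line 1 is evicted by the demand access to line 5 before being used, and line 2 is never referenced.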

Prefetch Effectiveness
- 11 benchmarks; HW prefetching (NSP, SDP) and SW prefetching
- More than 52% of prefetches are bad!
[Figure: bar chart of normalized number of prefetches (y-axis, 0 to 1); series: good prefetch, bad prefetch]

The Prefetch Pollution Filter

Cache Pollution Filter
[Figure: block diagram. Components: OOO core, LD/ST queue, L1 cache, L2 cache, hardware prefetcher, SW prefetches, prefetch queue, and the prefetch pollution filter. The filter is a history table (an array of 2-bit counters) accessed through a hash. Arrow labels: "lookup", "Issue Prefetch", "Update", and "Ld/st inst. includ. SW prefetches". Each L1 cache line holds TAG, DATA, a Prefetch Indication Bit (PIB), and a Reference Indication Bit (RIB).]
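
One way to read the diagram is that the filter is consulted before a prefetch is issued and is trained with the PIB/RIB outcome of prefetched lines. The sketch below models only the array of 2-bit counters. The slides do not state the threshold, the initial counter value, the hash, or the exact update moment, so `threshold=2`, `initial=3`, the modulo hash, and the method names `allow` and `update` are all assumptions made for illustration.

```python
class HistoryTable:
    """Array of 2-bit saturating counters indexed by a hashed key."""

    def __init__(self, entries=4096, threshold=2, initial=3):
        self.entries = entries
        self.threshold = threshold          # assumed decision threshold
        self.counters = [initial] * entries  # assumed to start permissive

    def _index(self, key):
        # Placeholder hash: the slides only label this step "Hash".
        return key % self.entries

    def allow(self, key):
        """Lookup: issue the prefetch only if its counter is high enough."""
        return self.counters[self._index(key)] >= self.threshold

    def update(self, key, was_referenced):
        """Train with the outcome of a prefetched line (PIB set).

        was_referenced plays the role of the RIB: True for a good
        prefetch, False for a bad one.
        """
        i = self._index(key)
        if was_referenced:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


table = HistoryTable()
key = 0x1234
table.update(key, was_referenced=False)
table.update(key, was_referenced=False)
print(table.allow(key))  # False
```

With these assumed parameters, two bad outcomes in a row are enough to start filtering a key, and one good outcome re-enables it. How a filtered key gets a chance to recover is not covered by this sketch.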

Prefetch Pollution Filters
- PA-based
  - Per-Address-based: tracks the cache line addresses issued by each prefetch operation
  - Can distinguish different prefetch addresses issued by the same instruction
  - Needs a larger history table to reduce aliasing
- PC-based
  - Tracks the program counter that triggers a prefetch
    - SW prefetch: PC of the prefetch instruction
    - HW prefetch: PC of the memory instruction that triggers the prefetch
  - Less aliasing, tolerates a smaller history table, less precise
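
The difference between the two schemes comes down to which value is hashed into the history table. The following sketch assumes a simple modulo hash, 4-byte instructions, and a made-up load PC and address stream; the line size and table size are taken from the default configuration listed in the slides.

```python
LINE_BYTES = 32        # L1 line size listed in the default configuration
TABLE_ENTRIES = 4096   # history table entries listed in the default configuration


def pa_index(prefetch_addr: int) -> int:
    """PA-based key: the cache line address being prefetched."""
    return (prefetch_addr // LINE_BYTES) % TABLE_ENTRIES


def pc_index(trigger_pc: int) -> int:
    """PC-based key: the instruction that triggered the prefetch.

    Assumes 4-byte aligned instructions, so the two low bits are dropped.
    """
    return (trigger_pc >> 2) % TABLE_ENTRIES


# Hypothetical example: one load at PC 0x400100 triggers three prefetches.
trigger_pc = 0x400100
prefetch_addrs = [0x10000, 0x10020, 0x10040]

pa_entries = {pa_index(a) for a in prefetch_addrs}
pc_entries = {pc_index(trigger_pc) for _ in prefetch_addrs}
print(len(pa_entries), len(pc_entries))  # 3 1
```

The PA-based key spreads the three prefetches over three counters, so each address is judged separately but more table entries are consumed. The PC-based key folds them into a single counter, which needs fewer entries but cannot tell the three addresses apart.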

Experimental Results

Simulation Configuration (Default)
- Processor
  - Target frequency: 2 GHz
  - Issue/retire width: 8 per cycle
  - Reorder buffer: 128 entries
  - Load/store queue: 64 entries
  - Branch predictor: bimodal with 2048 entries
  - BTB size: 4096 sets, assoc=4
- Memory
  - Latency: 150 core cycles
  - Bus: 64 bytes wide
- Caches
  - L1 I/D: 8K, 32-byte line, DM, 1 cycle
  - L1 D ports: 3
  - L2 I/D: 512K, 32-byte line, 4-way, 15-cycle delay
  - L2 I/D ports: 1
- Prefetcher
  - Queue length: 64 entries
- Pollution filter
  - History table: 1KB, 4K entries
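
The history table size follows from the 2-bit counters shown in the filter diagram. A quick check of the arithmetic, assuming the table stores nothing but the counters (no tags or valid bits):

```python
ENTRIES = 4 * 1024       # 4K entries, from the configuration
BITS_PER_COUNTER = 2     # 2-bit counters, from the filter diagram

total_bytes = ENTRIES * BITS_PER_COUNTER // 8
print(total_bytes)  # 1024
```

So 4K two-bit counters occupy 1 KB, matching the listed history table size.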

Benchmarks and Miss Rates

| Benchmark | Input data set | L1 miss rate | L2 miss rate |
|---|---|---|---|
| bh | 2048 bodies | 0.0464 | 0.0026 |
| em3d | 100 nodes, 10 arity, 10K iter | 0.2161 | 0.0001 |
| perimeter | 12 levels | 0.0478 | 0.2709 |
| ijpeg | penguin.ppm | 0.0565 | 0.0235 |
| fpppp | natoms.in | 0.0807 | 0.0003 |
| gcc | cp-decl.i | 0.0551 | 0.0221 |
| wave5 | wave5.in | 0.1387 | 0.0209 |
| gap | ref.in | 0.0409 | 0.2247 |
| gzip | input.graphic | 0.0597 | 0.3176 |
| mcf | inp.in | 0.0648 | 0.2426 |

Prefetch Reduction Comparison (Default Model)
- Normalized to the number of good prefetches without filtering
- Loss of bad prefetches: 97% (PA), 98% (PC)
- Loss of good prefetches: 51% (PA), 48% (PC)
- Traffic reduction: 75% (PA), 74% (PC)
[Figure: bar chart of normalized number of prefetches (y-axis, 0 to 3.5); series: bad (no filtering), bad (PA), bad (PC), good (no filtering), good (PA), good (PC)]

IPC Comparison (Default Model)
- Increase: 8.2% (PA), 9.1% (PC)
[Figure: bar chart of IPC (y-axis, 0 to 3.5); series: no filtering, PA-based, PC-based]

Prefetch Reduction Comparison (32KB)
- Loss of bad prefetches: 91% (PA), 92% (PC)
- Loss of good prefetches: 35% (PA), 27% (PC)
- Traffic reduction: 52% (PA), 47% (PC)
[Figure: bar chart (y-axis, 0 to 1.6); series: bad (no filtering), bad (PA), bad (PC), good (no filtering), good (PA), good (PC)]

IPC Comparison (32K Cache Model)
- Increase: 7.0% (PA), 8.1% (PC)
[Figure: bar chart of IPC (y-axis, 0 to 4); series: no filtering, PA-based, PC-based]

IPC for Different History Table Sizes
- IPC jumps by 6% when the table grows from 2K to 4K entries; the change is under 1% before and after
[Figure: bar chart of IPC (y-axis, 0 to 3.5); history table sizes: 1K, 2K, 4K, 8K, 16K]

Bad/Good Prefetch Ratio for Different # of L1 Ports
- 6% drop from 3-port to 4-port, 2% drop from 4-port to 5-port
[Figure: bar chart of bad/good prefetch ratio (y-axis, 0 to 1); series: 3-port, 4-port, 5-port]

IPC for Different # of L1 Ports
- 4% speedup from 3-port to 4-port, <1% speedup from 4-port to 5-port
[Figure: bar chart of IPC (y-axis, 0 to 3.5); series: 3-port, 4-port, 5-port]

Bad/Good Prefetch Ratio w/ Prefetch Buffer
- The prefetch buffer is on the critical path and very small
- The prefetch buffer does not reduce traffic, and good prefetches have a short lifetime in it
[Figure: bar chart of bad/good prefetch ratio (y-axis, 0 to 1.6); series: PA-based (no prefbuf), PA-based (prefbuf), PC-based (no prefbuf), PC-based (prefbuf)]

IPC Comparison w/ Prefetch Buffer
- IPC loss: 9% (PA), 10% (PC)
[Figure: bar chart of IPC (y-axis, 0 to 4); series: PA-based (no prefbuf), PA-based (prefbuf), PC-based (no prefbuf), PC-based (prefbuf)]

Conclusion
- Overly aggressive prefetching is overkill
- Many prefetches are ineffective
  - Cannot remove SW-induced prefetches without source code
  - Have to live with HW-induced prefetches
  - Need dynamic HW-based prefetch filtering schemes
- We propose (1) Per-Address-based and (2) Program-Counter-based filters that can
  - Filter out ~98% of bad prefetches for an 8KB L1
  - Filter out ~92% of bad prefetches for a 32KB L1
  - Retain most good prefetches: ~50% (8K L1), ~70% (32K L1)
- Improvement
  - Traffic reduced by ~75% (8K L1), ~50% (32K L1)
  - Overall IPC improved by 7% to 9%
- History table size can be reasonably small
- Improvements decrease when more cache ports are added
- IPC drops by 9-10% with a dedicated prefetch buffer for aggressive prefetching

That's All Folks!
Thanks Archbeer!

Bad/Good Prefetch Ratio Comparison (Default Model)
- Reduction: 70% (PA), 91% (PC)
[Figure: bar chart of bad/good prefetch ratio (y-axis, 0 to 3.5); series: no filtering, PA-based, PC-based]

Bad/Good Prefetch Ratio Comparison (32KB)
- Reduction: 75% (PA), 93% (PC)
[Figure: bar chart of bad/good prefetch ratio (y-axis, 0 to 1.6); series: no filtering, PA-based, PC-based]