PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches
Yuejian Xie, Gabriel H. Loh
Georgia Institute of Technology
Presented by: Yingying Tian
36th ACM/IEEE International Symposium on Computer Architecture (ISCA ’09)
Managing Shared Caches
Last-level caches (LLCs) are shared by all cores in chip multiprocessors (CMPs).
Multiple cores compete for the limited LLC capacity.
[Figure: Core0 and Core1, each with private L1I and L1D caches, share a last-level cache (LLC) that holds both Core0’s data and Core1’s data.]
As a sharing-oblivious cache management policy, LRU leads to poor performance and poor fairness.
Previous work tried to allocate LLC resources fairly via:
  Capacity management: way-partitioning (UCP)
  Dead-time management: LRU insertion (TADIP)
PIPP: do both capacity management and dead-time management, and do them better, at the same time!
Outline
Background and Motivation
Previous Work
PIPP
Evaluation
Conclusion
UCP (Utility-based Cache Partitioning)
[Figure: the ways of a cache set divided between Core0 and Core1]
Core 0 gets 5 ways
Core 1 gets 3 ways
*Some materials are taken from original presentation slides.
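To make the idea of utility-based way allocation concrete, here is a minimal sketch. It is a simplified greedy allocation, not UCP's actual lookahead algorithm, and the per-way hit counters are made-up values chosen so that the result matches the 5/3 split above. The function name `partition_ways` and the layout of `hit_counters` are hypothetical.

```python
def partition_ways(hit_counters, total_ways):
    """Greedy sketch: give out ways one at a time to the core that gains most.

    hit_counters[c][k] is an assumed count of hits core c would see in its
    (k+1)-th most recently used way if it had the whole cache to itself.
    """
    n = len(hit_counters)
    alloc = [1] * n  # assumption: every core keeps at least one way
    for _ in range(total_ways - n):
        # Marginal utility of one more way for each core.
        gains = [
            hit_counters[c][alloc[c]] if alloc[c] < len(hit_counters[c]) else 0
            for c in range(n)
        ]
        winner = max(range(n), key=lambda c: gains[c])
        alloc[winner] += 1
    return alloc


# Hypothetical counters for an 8-way set shared by two cores.
counters = [
    [50, 40, 30, 20, 10, 2, 1, 0],  # Core 0 keeps benefiting from extra ways
    [60, 25, 15, 1, 0, 0, 0, 0],    # Core 1 saturates after about 3 ways
]
print(partition_ways(counters, 8))
```

With these assumed counters the sketch gives Core 0 five ways and Core 1 three ways.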
DIP (Dynamic Insertion Policy)
[Figure: MRU insertion. The incoming block enters at the MRU end of the MRU-to-LRU stack.]
If the block is never reused, it occupies one cache block for a long time with no benefit!
[Figure: LRU insertion. The incoming block enters at the LRU end of the MRU-to-LRU stack.]
Useless block: evicted at the next eviction.
Useful block: moved to the MRU position.
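The effect of the insertion position can be seen on a single set. The sketch below compares MRU insertion with LRU insertion on an assumed 4-way set and an assumed cyclic trace over five blocks. It isolates only the insertion decision; DIP's dynamic choice between policies is not modeled, and the name `simulate` is hypothetical.

```python
def simulate(trace, ways, insert_at_lru):
    """Count hits for one cache set kept as a recency stack (index 0 = MRU)."""
    stack = []
    hits = 0
    for block in trace:
        if block in stack:
            hits += 1
            stack.remove(block)
            stack.insert(0, block)      # promotion: move to MRU on a hit
        else:
            if len(stack) == ways:
                stack.pop()             # eviction: drop the LRU block
            if insert_at_lru:
                stack.append(block)     # LRU insertion
            else:
                stack.insert(0, block)  # MRU insertion
    return hits


# Assumed workload: 10 passes over 5 blocks, one more block than the set holds.
trace = list(range(5)) * 10
print("MRU insertion hits:", simulate(trace, 4, insert_at_lru=False))
print("LRU insertion hits:", simulate(trace, 4, insert_at_lru=True))
```

On this trace MRU insertion thrashes and never hits, while LRU insertion keeps three of the five blocks resident and hits on them in every pass after the first.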
Cache Replacement Policy
Eviction: Which block should be replaced when a cache miss occurs?
  LRU block
Insertion: Where in the corresponding set should an incoming block be inserted?
  MRU insertion (default LRU replacement policy)
  LRU insertion (dead-on-arrival blocks)
Promotion: If a block is re-referenced, how should its position be adjusted?
  Move to MRU position
PIPP: Promotion/Insertion Pseudo-Partitioning
Insertion:
  Target partitioning: $\Pi = \{\pi_1, \pi_2, \ldots, \pi_n\}$ with $\sum_{i=1}^{n} \pi_i = w$ ($w$ is the associativity of the cache).
  On insertion, core $i$ inserts its incoming block at position $\pi_i$. (The $\pi_i$ are computed dynamically via UCP monitors or other means.)
Promotion:
  On a hit, a block moves one step toward the MRU position with probability $P$ and keeps its position with probability $1 - P$.
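[Figure: a set ordered from MRU to LRU. A new block is inserted at position 3 (its core’s target allocation), a hit promotes a block one step toward MRU, and the block at the LRU end is the next to evict.]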
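The insertion and promotion rules can be sketched as a small class for a single set. This is an illustration under assumptions, not the authors' implementation: positions are counted from the LRU end with the LRU block at position 1, eviction always takes the LRU block, and the class name `PIPPSet`, its parameters, and the value 0.75 used in the usage lines are hypothetical.

```python
import random


class PIPPSet:
    """One cache set as a priority stack; index 0 is the LRU end (assumption)."""

    def __init__(self, ways, targets, p_promote, rng=None):
        assert sum(targets.values()) == ways  # the pi_i must add up to w
        self.ways = ways
        self.targets = targets                # core id -> pi_i
        self.p_promote = p_promote            # promotion probability P
        self.rng = rng or random.Random(0)
        self.stack = []                       # (core, tag) entries, LRU first

    def access(self, core, tag):
        """Return True on a hit and False on a miss."""
        entry = (core, tag)
        if entry in self.stack:
            i = self.stack.index(entry)
            # Promotion: one step toward MRU with probability P.
            if i < len(self.stack) - 1 and self.rng.random() < self.p_promote:
                self.stack[i], self.stack[i + 1] = self.stack[i + 1], self.stack[i]
            return True
        if len(self.stack) == self.ways:
            self.stack.pop(0)                 # eviction from the LRU end
        # Insertion at position pi_i, counted from the LRU end (1 = LRU).
        pos = min(self.targets[core] - 1, len(self.stack))
        self.stack.insert(pos, entry)
        return False


# Usage with an arbitrary promotion probability.
cache_set = PIPPSet(ways=8, targets={0: 5, 1: 3}, p_promote=0.75)
for tag in range(8):
    cache_set.access(0, tag)
cache_set.access(1, "x")
print(cache_set.stack)
```

Because a core with a larger target inserts closer to the MRU end, its blocks survive more evictions before reaching the LRU end, which is how the insertion position steers occupancy without enforcing a hard partition.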
PIPP Example
Core0 quota: 5 blocks; Core1 quota: 3 blocks
Numbered blocks are Core0’s; lettered blocks are Core1’s.
Set (MRU to LRU): 1 A 2 3 4 B 5 C
Request: D (miss; Core1’s quota = 3)
PIPP Example
Set (MRU to LRU): 1 A 2 3 4 D B 5
Request: 6 (miss; Core0’s quota = 5)
PIPP Example
Set (MRU to LRU): 1 A 2 6 3 4 D B
Request: 7 (miss; Core0’s quota = 5)
PIPP Example
Set (MRU to LRU): 1 A 2 7 6 3 4 D
Request: D (hit)
PIPP Example
Set (MRU to LRU): 1 A 2 7 6 3 D 4
Request: E (miss; Core1’s quota = 3)
PIPP Example
Set (MRU to LRU): 1 A 2 7 6 E 3 D
Request: 2 (hit)
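The six requests of the example (D, 6, 7, D, E, 2) can be replayed in a few lines. This is a sketch under assumptions the slides do not state explicitly: positions are counted from the LRU end with the LRU block at position 1, every hit promotes (P = 1), the set is always full, and the starting order `1 A 2 3 4 B 5 C` is chosen so that the intermediate state `1 A 2 6 3 4 D B` from the example is reproduced. The names `QUOTA`, `owner`, and `access` are hypothetical.

```python
QUOTA = {"core0": 5, "core1": 3}  # target allocations from the example


def owner(block):
    # Convention of the example: numbered blocks are Core0's, lettered are Core1's.
    return "core0" if block.isdigit() else "core1"


def access(stack, block):
    """stack is ordered MRU -> LRU and assumed full; returns the new stack."""
    stack = list(stack)
    if block in stack:
        # Hit: promote one step toward MRU (assumes P = 1).
        i = stack.index(block)
        if i > 0:
            stack[i - 1], stack[i] = stack[i], stack[i - 1]
        return stack
    stack.pop()                              # miss: evict the LRU block
    pos_from_lru = QUOTA[owner(block)]       # insert at position pi_i, 1 = LRU
    stack.insert(len(stack) - (pos_from_lru - 1), block)
    return stack


state = "1 A 2 3 4 B 5 C".split()            # assumed starting order
for request in ["D", "6", "7", "D", "E", "2"]:
    state = access(state, request)
    print(request, "->", " ".join(state))
```

After the last request, block 2 has moved one step toward MRU and the set reads `1 2 A 7 6 E 3 D` under these assumptions.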
Pseudo-Partition Benefit
Core0 quota: 5 blocks; Core1 quota: 3 blocks
[Figure: strict partition. Each core has its own stack (MRU0 to LRU0 for Core0, MRU1 to LRU1 for Core1), and a new block is placed within its own core’s partition.]
Pseudo-Partition Benefit
Core0 quota: 5 blocks; Core1 quota: 3 blocks
[Figure: pseudo partition. Both cores’ blocks share a single MRU-to-LRU stack, and a new block is inserted into that shared stack.]
Methodology
SimpleScalar simulator for x86 Intel Core 2 processor
32KB, 8-way, 3-cycle L1I and L1D for each core
A shared 4MB, 16-way, 11-cycle LLC
Multi-programmed workloads from SPEC CPU benchmarks (2-core and 4-core workloads)
500M instructions warmup, 250M instructions simulation
Evaluation
2-Core Weighted Speedup
[Figure: weighted speedup of 2-core workloads, grouped into TADIP-friendly and UCP-friendly workloads]
PIPP outperforms LRU by 19.0%, UCP by 10.6%, TADIP by 10.1%
4-Core Weighted Speedup
[Figure: weighted speedup of 4-core workloads, grouped into TADIP-friendly and UCP-friendly workloads]
PIPP outperforms LRU by 21.9%, UCP by 12.1%, TADIP by 17.5%
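For readers unfamiliar with the metric, weighted speedup is commonly defined as the sum, over all cores, of each program's IPC when sharing the cache divided by its IPC when running alone. The sketch below assumes that standard definition; the function name `weighted_speedup` and the IPC values are hypothetical and are not results from the paper.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-core IPC ratios (shared run over stand-alone run)."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one shared and one stand-alone IPC per core")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


# Made-up IPC values for a 2-core workload.
print(weighted_speedup([0.5, 1.5], [1.0, 2.0]))
```

A value equal to the number of cores would mean that no program is slowed down by sharing the cache.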
Occupancy Control
For most workloads, the partitioning deviation is within 1.0 of the target allocation, similar to UCP.
Conclusion
Novel proposal on Insertion and Promotion
A single unified mechanism provides both capacity and dead time management
Outperforms prior UCP and TADIP
Thank you!
Questions?