PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches Yuejian Xie, Gabriel H....

Post on 21-Jan-2016


PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches

Yuejian Xie, Gabriel H. Loh

Georgia Institute of Technology

Presented by: Yingying Tian

36th ACM/IEEE International Symposium on Computer Architecture (ISCA '09)

Manage Shared Caches

Last Level Caches (LLCs) are shared by all cores in Chip Multi-Processors (CMPs).

Multiple cores compete for the limited LLC capacity.

[Figure: Core0 and Core1, each with private L1I and L1D caches, share a Last Level Cache (LLC) that holds both Core0's data and Core1's data.]

As a sharing-oblivious cache management policy, LRU leads to poor performance and fairness.

Previous works tried to allocate LLC resources fairly via:

Capacity management: way-partitioning (UCP)

Dead-time management: LRU insertion (TADIP)

PIPP: do both capacity and dead-time management better, at the same time!

Outline

Background and Motivation

Previous Work

PIPP

Evaluation

Conclusion

UCP (Utility-based Cache Partitioning)

[Figure: the ways of a shared cache set split between Core1 and Core0.]

Core 0 gets 5 ways

Core 1 gets 3 ways

*Some materials are taken from original presentation slides.
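
The way split above (5 ways and 3 ways) comes from comparing how much each core benefits from one more way. The following is a minimal sketch of that idea, not UCP's actual allocation algorithm: it uses a plain greedy loop, and the function name `partition_ways` and the per-way hit counts are hypothetical values invented for illustration.

```python
def partition_ways(hits_per_way, total_ways):
    """Greedily hand out cache ways by marginal utility (simplified sketch).

    hits_per_way[c][k] is an assumed count of the extra hits core c would
    gain from its (k+1)-th way, as a utility monitor might report it.
    """
    num_cores = len(hits_per_way)
    alloc = [1] * num_cores  # assumption: every core keeps at least one way

    def gain(core):
        # Extra hits this core would get from one more way.
        k = alloc[core]
        return hits_per_way[core][k] if k < len(hits_per_way[core]) else 0

    for _ in range(total_ways - num_cores):
        best = max(range(num_cores), key=gain)
        alloc[best] += 1
    return alloc


# Hypothetical monitor data for two cores sharing an 8-way set.
core0_hits = [100, 80, 60, 50, 40, 5, 2, 1]
core1_hits = [90, 70, 30, 10, 5, 2, 1, 0]
allocation = partition_ways([core0_hits, core1_hits], total_ways=8)
```

With these made-up counts, Core 0 keeps gaining from extra ways longer than Core 1 does, so it ends up with the larger share.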

DIP (Dynamic Insertion Policy)

[Figure: recency stack from MRU to LRU; an incoming block is inserted at the MRU position.]

A block that is never reused occupies one cache block for a long time with no benefit!

[Figure: recency stack from MRU to LRU; an incoming block is inserted at the LRU position.]

Useless block: evicted at the next eviction

Useful block: moved to the MRU position
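
The effect of the insertion position can be seen on a small thrashing access pattern. The sketch below compares plain MRU insertion with static LRU insertion for a single set. It does not model DIP's dynamic choice between policies. The function name `count_hits` and the trace are assumptions made for illustration.

```python
def count_hits(trace, ways, insert_at_mru):
    """Count hits for one cache set kept as a recency stack.

    stack[0] is the MRU position and stack[-1] is the LRU position.
    """
    stack, hits = [], 0
    for block in trace:
        if block in stack:
            hits += 1
            stack.remove(block)
            stack.insert(0, block)   # promotion: a hit block moves to MRU
            continue
        if len(stack) == ways:
            stack.pop()              # eviction: drop the block at LRU
        if insert_at_mru:
            stack.insert(0, block)   # classic LRU policy: insert at MRU
        else:
            stack.append(block)      # LRU insertion: insert at LRU
    return hits


# Hypothetical trace: loop over 5 blocks in a 4-way set, 10 times.
trace = [0, 1, 2, 3, 4] * 10
mru_hits = count_hits(trace, ways=4, insert_at_mru=True)
lru_hits = count_hits(trace, ways=4, insert_at_mru=False)
```

On this trace MRU insertion evicts every block just before it is reused, so it never hits, while LRU insertion keeps part of the working set resident and hits on it in every later pass.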

Cache Replacement Policy

Eviction: Which block should be replaced when a cache miss occurs? The LRU block.

Insertion: Where in the corresponding set should an incoming block be inserted? MRU insertion (the default in the LRU replacement policy) or LRU insertion (for dead-on-arrival blocks).

Promotion: If a block is re-referenced, how should its position be adjusted? Move it to the MRU position.

PIPP: Promotion/Insertion Pseudo-Partitioning

Insertion: The target partitioning is Π = {π_1, π_2, ..., π_n} with ∑ π_i = w, where w is the associativity of the cache. On insertion, core i inserts its incoming block at position π_i. (The target partitioning is computed dynamically via UCP monitors or other means.)

Promotion: On a hit, the block moves one step toward the MRU position with probability P and stays where it is with probability 1-P.

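[Figure: priority stack from MRU to LRU. A new block is inserted at position 3 (its core's target allocation), a block that hits is promoted toward MRU, and the block at the LRU end is the one to evict.]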
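
The insertion and promotion rules can be sketched for a single cache set as follows. This is an illustrative model under stated assumptions, not the authors' implementation: positions are counted from the LRU end (position 1 is the LRU block), the victim is always the LRU block, and the class name `PippSet`, the method `access`, and the default value of `p_prom` are all hypothetical.

```python
import random


class PippSet:
    """One cache set under a PIPP-style policy (illustrative sketch).

    self.stack[0] is the LRU end (position 1); self.stack[-1] is the MRU end.
    """

    def __init__(self, ways, quotas, p_prom=0.75, rng=None):
        assert sum(quotas.values()) == ways  # the targets must add up to w
        self.ways = ways
        self.quotas = quotas                 # core id -> target allocation
        self.p_prom = p_prom                 # promotion probability (assumed)
        self.rng = rng or random.Random(0)
        self.stack = []                      # (core, tag) pairs, LRU first

    def access(self, core, tag):
        """Return True on a hit and False on a miss."""
        entry = (core, tag)
        if entry in self.stack:
            i = self.stack.index(entry)
            # Promotion: one step toward MRU with probability p_prom.
            if i < len(self.stack) - 1 and self.rng.random() < self.p_prom:
                self.stack[i], self.stack[i + 1] = (
                    self.stack[i + 1], self.stack[i])
            return True
        if len(self.stack) == self.ways:
            self.stack.pop(0)                # evict the block at the LRU end
        # Insertion: place the block at the core's target position,
        # counted from the LRU end (clamped while the set is filling).
        pos = min(self.quotas[core] - 1, len(self.stack))
        self.stack.insert(pos, entry)
        return False
```

For example, with quotas `{0: 5, 1: 3}` in an 8-way set, a miss from core 1 evicts the LRU block and places the new block third from the LRU end, while a miss from core 0 places its block fifth from the LRU end. A core with a larger target therefore inserts closer to MRU, and its blocks survive more evictions before they reach the LRU end.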

PIPP Example

Core0 quota: 5 blocks

Core1 quota: 3 blocks

[Figure sequence: one cache set drawn from MRU to LRU, with each block marked as Core0's block or Core1's block. In this example Core0's blocks are numbered and Core1's blocks are lettered.]

Step 1. Request: D (Core1's quota = 3). Blocks in the set: 1, A, 2, 3, 4, 5, B, C.

Step 2. Request: 6 (Core0's quota = 5). Blocks in the set: 1, A, 2, 3, 4, 5, D, B.

Step 3. Request: 7 (Core0's quota = 5). Blocks in the set: 1, A, 2, 6, 3, 4, D, B.

Step 4. Request: D. Blocks in the set: 1, A, 2, 7, 6, 3, 4, D.

Step 5. Request: E (Core1's quota = 3). Blocks in the set: 1, A, 2, 7, 6, 3, 4, D.

Step 6. Request: 2. Blocks in the set: 1, A, 2, 7, 6, 3, D, E.

Pseudo-Partition Benefit

Core0 quota: 5 blocks

Core1 quota: 3 blocks

[Figure, strict partition: each core has its own recency stack (MRU0 to LRU0 and MRU1 to LRU1), shown with a request and the new block it brings in.]

[Figure, pseudo partition: Core0's and Core1's blocks share a single stack from MRU to LRU, shown with a request and the new block it brings in.]

Methodology

SimpleScalar simulator for x86

Intel Core 2 processor

32KB, 8-way, 3-cycle L1I and L1D for each core

A shared 4MB, 16-way, 11-cycle LLC

Multi-programmed workloads from SPEC CPU benchmarks (2-core and 4-core workloads)

500M instructions of warmup, 250M instructions of simulation

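Evaluation

2-Core Weighted Speedup

[Figure: weighted speedup for the 2-core workloads, with TADIP-friendly and UCP-friendly workloads marked.]

PIPP outperforms LRU by 19.0%, UCP by 10.6%, TADIP by 10.1%

4-Core Weighted Speedup

[Figure: weighted speedup for the 4-core workloads, with TADIP-friendly and UCP-friendly workloads marked.]

PIPP outperforms LRU by 21.9%, UCP by 12.1%, TADIP by 17.5%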
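
The slides do not define weighted speedup. Assuming the usual definition (each program's IPC when sharing the cache, divided by its IPC when running alone, summed over all programs), the metric and a relative improvement can be computed as in the sketch below. The function names and the IPC values are hypothetical and are not results from the paper.

```python
def weighted_speedup(ipc_shared, ipc_alone):
    """Sum of per-program IPC in the shared run, normalized to the solo run."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one solo IPC per co-running program")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


def relative_improvement(new, baseline):
    """Fractional gain of `new` over `baseline` (0.19 means 19%)."""
    return (new - baseline) / baseline


# Made-up IPC numbers for a 2-core workload under two policies.
solo = [2.0, 1.0]
ws_baseline = weighted_speedup([1.0, 0.5], solo)
ws_candidate = weighted_speedup([1.2, 0.6], solo)
gain = relative_improvement(ws_candidate, ws_baseline)
```

Under this definition a value equal to the number of cores would mean that no program is slowed down by sharing the cache.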

Occupancy Control

For most workloads, the partitioning deviation is within 1.0 of the target allocation, similar to UCP.

Conclusion

Novel proposal on insertion and promotion

A single unified mechanism provides both capacity and dead-time management

Outperforms prior work (UCP and TADIP)

Thank you!

Questions?