University of Toronto Department of Electrical and Computer Engineering Jason Zebchuk and Andreas...

University of Toronto, Department of Electrical and Computer Engineering. Jason Zebchuk and Andreas Moshovos, {zebchuk,moshovos}@eecg.toronto.edu. Workshop on Complexity-Effective Design - June 2006. RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup


Page 1: University of Toronto Department of Electrical and Computer Engineering Jason Zebchuk and Andreas Moshovos June 2006.

University of Toronto
Department of Electrical and Computer Engineering

Jason Zebchuk and Andreas Moshovos

{zebchuk,moshovos}@eecg.toronto.edu

Workshop on Complexity-Effective Design - June 2006

RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup

Page 2:

June 18, 2006 Zebchuk © 2 RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup

Need for Energy Efficient L2 Lookups

[Figure: CPU with I$ and D$, backed by an L2 cache split into L2 DATA and L2 TAGS arrays]

Locate blocks in higher-level caches more efficiently

Conventional tags are getting larger
Technology, microarchitectural and application trends
Larger caches use more energy

Demonstrate lookup energy reductions of up to 82%
Up to 38% on average across SPEC

Page 3:

Dual-Grain Tracking

Memory as a collection of REGIONS

Memory as a collection of blocks

Region: 2^n-sized, aligned memory area

Similar concept already used by various structures: TLB, page table
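
As a minimal sketch of what "2^n-sized, aligned" implies, the Region a block belongs to can be found by dropping the low n bits of its address. The function names and the choice of n = 13 (8KB Regions) are assumptions made for illustration, not details taken from this slide.

```python
REGION_BITS = 13  # assumed n: 2^13 = 8KB Regions


def split_address(addr: int, region_bits: int = REGION_BITS) -> tuple[int, int]:
    """Split an address into (region_tag, offset) for aligned 2^n-sized Regions."""
    region_tag = addr >> region_bits              # high bits name the Region
    offset = addr & ((1 << region_bits) - 1)      # low n bits locate data inside it
    return region_tag, offset


def same_region(a: int, b: int, region_bits: int = REGION_BITS) -> bool:
    """Two addresses share a Region exactly when their high bits match."""
    return (a >> region_bits) == (b >> region_bits)
```

Because Regions are aligned, no comparison against a base address is needed; a shift is enough, which is the same property a TLB relies on for pages.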

Page 4:

Program Behavior / Motivation

Few active Regions
"Bursty" access
Mostly gone before accessed again

RegionTracker:
Identify First Misses
Track block location for Few Regions

[Figure: Region access behavior, with panels labeled "In principle" and "In practice" and the annotation fragment "And before ... is touched again"]

How can this reduce energy?

Page 5:

RegionTracker: Low Power Lookups

[Figure: two copies of the cache diagram (CPU, I$, D$, L2 DATA, L2 TAGS)]

Frequent case: Few Active Regions
Macroscopically Transient

RegionTracker:
Dynamically identify newly touched Regions
Track block location using a compact structure

Page 6:

RegionTracker Organization

CRH for First Miss Detection: 5% of tags

CBV for Tracking blocks within 128 regions: 17.5%

128 x 8kB regions = 1MB tracked (at most 25% of a 4MB L2)

[Figure: cache diagram (CPU, I$, D$, L2 DATA, L2 TAGS)]
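
The coverage figure on this slide can be checked with a few lines of arithmetic. This is only a worked restatement of the slide's numbers; the helper name is made up for illustration.

```python
KB = 1024
MB = 1024 * KB


def max_tracked_bytes(num_regions: int, region_bytes: int) -> int:
    """Upper bound on the memory covered by the tracked Regions."""
    return num_regions * region_bytes


tracked = max_tracked_bytes(128, 8 * KB)   # 128 Regions of 8KB each
fraction_of_l2 = tracked / (4 * MB)        # relative to a 4MB L2

print(tracked // MB, "MB tracked,", f"{fraction_of_l2:.0%} of the L2")
```

The bound is "at most" 25% because a tracked Region does not need to have all of its blocks resident in the L2.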

Page 7:

Which Regions are Cached?

If we had as many counters as Regions:
Block Allocation: counter[region]++
Block Eviction: counter[region]--
Region cached only if counter[region] is non-zero

Not practical: e.g., 8KB Regions and 4GB memory require 512K counters

[Figure: address split into Region Tag and offset; the Region Tag selects a counter]
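
The idealized scheme on this slide can be sketched as follows. The class name is hypothetical, and a dictionary stands in for the one-counter-per-Region table so the example stays small; real hardware would need the full table, which is the point of the "not practical" remark.

```python
REGION_BITS = 13  # 8KB Regions, as in the slide's example


class RegionCounters:
    """One counter per Region (hypothetical name), stored sparsely in a dict."""

    def __init__(self, region_bits: int = REGION_BITS):
        self.region_bits = region_bits
        self.counter = {}

    def allocate(self, addr: int) -> None:
        region = addr >> self.region_bits
        self.counter[region] = self.counter.get(region, 0) + 1

    def evict(self, addr: int) -> None:
        # Assumes every eviction follows an earlier allocation of the same block.
        region = addr >> self.region_bits
        self.counter[region] -= 1

    def region_cached(self, addr: int) -> bool:
        return self.counter.get(addr >> self.region_bits, 0) != 0


# A dense table needs one counter per Region of physical memory:
counters_needed = (4 << 30) >> REGION_BITS  # 4GB / 8KB
```

With 4GB of memory and 8KB Regions, `counters_needed` works out to 2^19, the 512K counters quoted on the slide.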

Page 8:

Which Regions are Cached?

[Figure: Cached Region Hash (CRH): the address is split into Region Tag and offset; hash() of the Region Tag selects a counter]

Imprecise: records a superset of currently cached Regions
False positives: lost opportunity, correctness preserved
Small: e.g., 512-4k entries for a 2MB or 4MB cache

First Miss: full location information for ALL BLOCKS
No need for temporal locality
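
A rough sketch of the hashed variant is shown below. The slides do not say which hash() is used, so the modulo on the Region tag is an assumption, as are the class and method names.

```python
class CachedRegionHash:
    """Hashed counters that over-approximate the set of cached Regions."""

    def __init__(self, entries: int = 1024, region_bits: int = 13):
        self.entries = entries
        self.region_bits = region_bits
        self.counters = [0] * entries

    def _index(self, addr: int) -> int:
        # Assumed hash: low bits of the Region tag. The real hash() is not given.
        return (addr >> self.region_bits) % self.entries

    def allocate(self, addr: int) -> None:
        self.counters[self._index(addr)] += 1

    def evict(self, addr: int) -> None:
        self.counters[self._index(addr)] -= 1

    def is_first_miss(self, addr: int) -> bool:
        # A zero counter proves that no block of this Region is cached.
        # A non-zero counter may belong to an aliasing Region (false positive).
        return self.counters[self._index(addr)] == 0
```

Aliasing can only make a Region look cached when it is not, never the reverse, which is why false positives cost an opportunity but not correctness. On a first miss every block of the Region is known to be absent, so complete location information is available without any earlier access to that Region.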

Page 9:

CBV: Tracking Blocks within Regions

[Figure: address split into Region Tag, block and offset; each CBV entry holds a Region Tag plus block info for Block #0 through Block #63, with widths labeled 4 and 256]

Which data way is the block cached at?

Parallel lookup of Region Tag and block info
Experiments with 64- and 128-entry, 8-way set-associative CBV

Page 10:

Conventional Solution

Tag Hierarchy
Requires locality: temporal, and spatial only as long as L2 block size > L1 block size
Latency limited
Not very energy efficient

RegionTracker is better

[Figure: cache diagram (CPU, I$, D$, L2 DATA, L2 TAGS)]

Page 11:

Tag Hierarchy

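[Figure: address split into Set Tag, Set, Block Tag and Offset; each entry holds a Set Tag plus Tag #0 through Tag #7 feeding eight comparators, with widths labeled 23 and 184]

Each access reads/writes 23 bytes (184 bits)
Sequential comparison of Set Tag AND Block Tag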

Page 12:

Complexity Tradeoffs

Tag Hierarchy
Read/Write 184 bits
Complex wiring to transfer 184 bits
Updated on every Tag Hierarchy miss

RegionTracker
Read/Write 4 bits
Only 4 bits transferred from tag array
Updated on L2 misses only
Flexible implementation (vertical/horizontal partitioning)
No modification to conventional cache policies/structures
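
The gap between the two designs can be put into numbers with a short calculation. The split of 184 bits into eight 23-bit tags is an assumption read off the Tag Hierarchy figure (Tag #0 through Tag #7); the slides only state the totals.

```python
TAGS_PER_SET = 8    # Tag #0 .. Tag #7 in the Tag Hierarchy figure
BITS_PER_TAG = 23   # assumed per-tag width, so that 8 tags total 184 bits

tag_hierarchy_bits = TAGS_PER_SET * BITS_PER_TAG   # bits moved per access
region_tracker_bits = 4                            # block info for one block

ratio = tag_hierarchy_bits / region_tracker_bits

print(f"{tag_hierarchy_bits} bits ({tag_hierarchy_bits // 8} bytes) "
      f"vs {region_tracker_bits} bits: {ratio:.0f}x fewer bits moved")
```

This counts only the bits read or written per access; it says nothing about the energy per bit, which the talk estimates separately with CACTI.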

Page 13:

Methodology

Processor: deeply pipelined, 128-entry window, 8-way superscalar, 32kB L1 instruction and data caches

SPEC CPU 2000 with reference inputs

10 billion committed instruction samples, taken after 100B

Used CACTI to estimate energy requirements

Page 14:

Energy Savings w/ 4MB L2

Average reduction of 38%
Up to 82% reduction (gzip)
Robust performance, significant power savings for most programs

[Figure: bar chart of % L2 lookup energy saved (higher is better; axis from -20% to 100%) for applu, lucas, art, twolf, mgrid, fma3d, gzip, vortex, vpr and the average, with one bar per CRH/CBV configuration: 1k/64, 4k/64, 1k/128, 4k/128]

Page 15:

Tag Hierarchy Savings

[Figure: % L2 lookup energy reduction (axis from -40% to 10%) versus L2 cache size (2MB, 4MB, 8MB, 16MB) for tag hierarchies with 16, 32, 64 and 128 sets]

Only 2 configurations actually save power!
Similar fraction of requests served by RegionTracker
RegionTracker much better!

Page 16:

RegionTracker Summary

Coarse-grain tracking to capture first misses
Dual-grain tracking to track blocks

Service many L2 requests
Reduce L2 lookup energy

Does not require temporal locality
Can exploit spatial locality much better than a tag hierarchy

Significantly reduces L2 lookup power with minimal additional complexity

Page 17:

RegionTracker: Using Dual-Grain Tracking for Energy Efficient Cache Lookup

Jason Zebchuk and Andreas Moshovos
{zebchuk, moshovos}@eecg.toronto.edu

University of Toronto
Department of Electrical and Computer Engineering