
DIEF: An Accurate Interference Feedback Mechanism for Chip Multiprocessor Memory Systems

Magnus Jahre†, Marius Grannaes†‡ and Lasse Natvig†

† Norwegian University of Science and Technology
‡ Energy Micro


Chip Multiprocessor Resources

• Hardware-controlled, shared resources
– Interconnect bandwidth
– Shared cache capacity
– Memory bus bandwidth
– Memory capacity is allocated by the operating system

Interference can occur in all shared units

[Figure: CPU 1 to CPU 4, each with a private D-Cache and I-Cache (Private Memory System), connected through the Interconnect to the Shared Cache, Memory Controller, Memory Bus and Main Memory (Shared Memory System).]

Current CMP implementations do not take interference into account


Why Control Resource Allocation?

Provide predictable performance

Support OS scheduler assumptions

Cloud: Fulfill Service Level Agreement


Resource Allocation Tasks

Measurement

Allocation (Policy)

Enforcement (Mechanism)

Focus of this work


Resource Allocation Baselines

Baseline = Interference-free configuration

Quantify performance impact from interference

Private Mode and Shared Mode


Multi-Programmed Baseline

• All processes in a workload run concurrently

• Static and equal partitioning of all shared resources

[Figure: Multiprogrammed Baseline. Shared Cache: 50% Program A, 50% Program B. Memory Bus: 50% Program A, 50% Program B.]


Single Program Baseline

• The process is run alone in one core

• All other cores are idle

• Exclusive access to all shared resources

[Figure: Single Program Baseline. Program A runs alone with 100% of the Shared Cache and 100% of the Memory Bus; Program B, in its own run, likewise gets 100% of both.]


Baseline Weaknesses

• Multiprogrammed Baseline
– Only accounts for interference in partitioned resources
– Static and equal division of DRAM bandwidth does not give equal latency
– Complex relationship between resource allocation and performance

• Single Program Baseline
– Does not exist in shared mode

Dynamic Interference Estimation Framework (DIEF)


Outline

• Introduction

• Dynamic Interference Estimation Framework
– Shared Cache
– Memory Bus
– On-chip interconnect

• Results

• Summary


Interference Estimation

Full-System Interference Estimation: Aggregate interference from different units

Common unit of measure: Average Latency (Clock Cycles)

DIEF: General, component-based framework


Interference Definition

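[Figure: latency bars relating Shared Mode Latency, Private Mode Latency, Interference, the Private Mode Latency Estimate, the Private Mode Latency Measurement and the Estimate Error.]

Interference = Shared Mode Latency - Private Mode Latency

Estimate Error = Private Mode Latency Estimate - Private Mode Latency Measurement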
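The quantities named on this slide can be tied together in a few lines of code. This is a minimal sketch, assuming that interference is the gap between shared mode and private mode latency and that the private mode estimate is obtained by subtracting the interference attributed to each shared unit from the measured shared mode latency. The function names, the unit names and all numbers are hypothetical and do not come from the talk.

```python
def interference(shared_latency, private_latency):
    """Extra average latency (clock cycles) caused by sharing the memory system."""
    return shared_latency - private_latency


def private_latency_estimate(shared_latency, unit_interference):
    """Assumed aggregation: subtract the interference estimated for every shared unit."""
    return shared_latency - sum(unit_interference.values())


def estimate_error(estimate, measurement):
    """Signed error of the estimate in clock cycles (sign convention is an assumption)."""
    return estimate - measurement


def relative_error(estimate, measurement):
    """Signed error as a fraction of the measured private mode latency."""
    return (estimate - measurement) / measurement


# Hypothetical average latencies in clock cycles.
shared_latency = 180.0
per_unit = {"interconnect": 10.0, "shared_cache": 25.0, "memory_bus": 45.0}
measured_private = 96.0  # would come from a separate private mode run

estimate = private_latency_estimate(shared_latency, per_unit)
print(estimate)                                    # 100.0
print(estimate_error(estimate, measured_private))  # 4.0
```

Using average latency in clock cycles for every unit is what makes the simple sum over units possible.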


Shared Cache Interference

[Figure: CPU 0 and CPU 1 issue cache accesses (blocks A, B, M, N) to the Shared Cache; the Auxiliary Tag Directories track the same accesses.]


Shared Cache Interference

[Figure: next step of the same example. An access to block C is inserted in the Shared Cache and the Auxiliary Tag Directories, evicting a block.]

Eviction may not be interference


Shared Cache Interference

[Figure: final step of the same example. An access hits in the Auxiliary Tag Directory (Hit) but misses in the Shared Cache (Miss).]

Interference cost = miss penalty
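The auxiliary tag directory idea can be sketched as follows: an access counts as an interference miss when it misses in the shared cache but would have hit had the CPU owned the cache alone. This is a toy model under stated assumptions, not the hardware from the talk: a single fully associative LRU set, one auxiliary tag directory per CPU with the same associativity as the shared cache, and hypothetical names (`LRUSet`, `count_interference_misses`).

```python
from collections import OrderedDict


class LRUSet:
    """One fully associative set of tags with LRU replacement (toy model)."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()

    def access(self, tag):
        """Return True on a hit; update LRU order and insert the tag on a miss."""
        hit = tag in self.tags
        if hit:
            self.tags.move_to_end(tag)
        else:
            if len(self.tags) >= self.ways:
                self.tags.popitem(last=False)  # evict the least recently used tag
            self.tags[tag] = True
        return hit


def count_interference_misses(accesses, ways, num_cpus):
    """accesses: list of (cpu, block). Returns interference misses per CPU."""
    shared = LRUSet(ways)
    atds = [LRUSet(ways) for _ in range(num_cpus)]  # assumed: one ATD per CPU
    misses = [0] * num_cpus
    for cpu, block in accesses:
        shared_hit = shared.access((cpu, block))  # separate address spaces assumed
        private_hit = atds[cpu].access(block)     # what a private cache would do
        if private_hit and not shared_hit:
            misses[cpu] += 1
    return misses


# CPU 1 evicts A and B, so CPU 0's second access to A is an interference miss.
trace_interference = [(0, "A"), (0, "B"), (1, "M"), (1, "N"), (0, "A")]
print(count_interference_misses(trace_interference, ways=2, num_cpus=2))  # [1, 0]

# CPU 0 would have evicted A by itself, so the final miss is not interference.
trace_own_eviction = [(0, "A"), (0, "B"), (0, "C"), (1, "M"), (0, "A")]
print(count_interference_misses(trace_own_eviction, ways=2, num_cpus=2))  # [0, 0]
```

The second trace illustrates the point that an eviction may not be interference: the block is missing from the auxiliary tag directory as well, so the miss would also have occurred in private mode. Each interference miss is then charged the miss penalty.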


Bus Interference Requirements

• Out-of-order memory bus scheduling
• Cache misses and cache hits that occur only in shared mode
• Shared cache writebacks

Computing private latency based on shared mode queue contents is difficult

Emulate private scheduling in the shared mode


[Figure: Memory Latency Estimation Buffer. The Shared Bus Queue holds requests A to E; the buffer records their Arrival Order and emulated Execution Order with a Head Pointer. Open Page Emulation Registers (values 15 and 32 shown for Bank 0 and Bank 1) and a Latency Lookup Table provide the per-request latencies.]

Bank/Page Mapping: A → (0,15), B → (0,19), C → (0,15), D → (1,32)

Estimated Queue Latency = 120 + 40 + 40 = 200
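Emulating private scheduling can be sketched in software as follows. This is a sketch under several assumptions that the slide does not confirm: the scheduler is modelled as first-ready, first-come-first-served (the oldest request that hits an open page goes first), 40 cycles is taken to be the latency of an open-page hit and 120 cycles the latency of a page conflict, and bank 1 starts with page 32 open. The function and constant names are hypothetical.

```python
ROW_HIT_LATENCY = 40        # assumed: request targets the page that is already open
ROW_CONFLICT_LATENCY = 120  # assumed: a different page must be opened first


def emulate_private_schedule(requests, open_pages):
    """Emulate out-of-order bus scheduling for one CPU's requests.

    requests:   list of (name, bank, page) in arrival order
    open_pages: dict bank -> open page (the open page emulation registers)
    Returns a dict name -> (queue_latency, service_latency) in clock cycles.
    """
    open_pages = dict(open_pages)  # work on a copy of the register state
    pending = list(requests)
    elapsed = 0
    result = {}
    while pending:
        # First-ready: oldest request hitting an open page, otherwise the oldest.
        chosen = next(
            (r for r in pending if open_pages.get(r[1]) == r[2]), pending[0]
        )
        pending.remove(chosen)
        name, bank, page = chosen
        hit = open_pages.get(bank) == page
        service = ROW_HIT_LATENCY if hit else ROW_CONFLICT_LATENCY
        result[name] = (elapsed, service)  # time spent queued, then time in service
        elapsed += service
        open_pages[bank] = page            # the served page stays open
    return result


# Bank/page mapping from the slide; the initial register state is an assumption.
requests = [("A", 0, 15), ("B", 0, 19), ("C", 0, 15), ("D", 1, 32)]
schedule = emulate_private_schedule(requests, {1: 32})
print(schedule["B"])  # (200, 120)
```

With these assumed values request B waits 200 cycles, the same total as the 120 + 40 + 40 sum on the slide. The slide does not state which request that estimate belongs to, and the order in which it serves the requests may differ from this sketch.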


Interconnect Interference

[Figure: CPU 0 and CPU 1 send requests (A to F) through the interconnect to L2 Bank 0 and L2 Bank 1; Interference Counters record the delay.]

CPU 1 delays CPU 0
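One way to picture the interference counters is a counter per CPU that accumulates the cycles its requests spend waiting while another CPU's request occupies the path to the same L2 bank. The sketch below is a toy model and not the mechanism from the talk: it assumes one link per bank, in-order service per bank and a fixed transfer time, and every name and number in it is hypothetical.

```python
def interconnect_interference(requests, num_cpus, transfer_cycles):
    """Count cycles each CPU's requests wait behind other CPUs' transfers.

    requests: list of (arrival_cycle, cpu, bank)
    Returns one interference counter (clock cycles) per CPU.
    """
    counters = [0] * num_cpus
    busy = {}  # bank -> list of (start, end, cpu) transfers, in service order
    for arrival, cpu, bank in sorted(requests):
        transfers = busy.setdefault(bank, [])
        # The request starts when it has arrived and the link to the bank is free.
        start = max(arrival, transfers[-1][1]) if transfers else arrival
        for t_start, t_end, owner in transfers:
            # Cycles of this earlier transfer that fall inside the waiting window.
            overlap = min(t_end, start) - max(t_start, arrival)
            if owner != cpu and overlap > 0:
                counters[cpu] += overlap  # waiting behind a different CPU
        transfers.append((start, start + transfer_cycles, cpu))
    return counters


# CPU 1 occupies the link to bank 0 first, so CPU 0's request to bank 0 is delayed.
example = [(0, 1, 0), (1, 0, 0), (2, 0, 1)]
print(interconnect_interference(example, num_cpus=2, transfer_cycles=4))  # [3, 0]
```

Waiting behind the CPU's own earlier request is not counted, because that delay would also occur in private mode.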


Outline

• Introduction

• Dynamic Interference Estimation Framework
– Shared Cache
– Memory Bus
– On-chip interconnect

• Results

• Summary


Relative Estimation Errors

[Figure: Average Relative Error (y-axis, ticks from -4 % to 8 %) for the 1C, 2C and 4C configurations of the 4-, 8- and 16-core systems, shown separately for the Crossbar and Ring interconnects.]


RMS Error Breakdown

[Figure: Sum of Average Per-Benchmark Per-Unit RMS Error in clock cycles (y-axis, 0 to 100) for the 1C, 2C and 4C configurations of the 4-, 8- and 16-core systems, shown separately for the Crossbar and Ring interconnects. Legend: Bus Queue, Bus Service, Interconnect Request Queue.]

Remaining units contribute less than 2 clock cycles
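The two error metrics used in these result slides can be computed as shown here. This is a generic sketch of average relative error and root-mean-square (RMS) error; how the talk aggregates them per benchmark and per unit is not reproduced, and the sample values are hypothetical.

```python
import math


def average_relative_error(estimates, measurements):
    """Mean of signed relative errors; positive and negative errors can cancel."""
    errors = [(e - m) / m for e, m in zip(estimates, measurements)]
    return sum(errors) / len(errors)


def rms_error(estimates, measurements):
    """Root-mean-square error in the unit of the inputs (here clock cycles)."""
    squared = [(e - m) ** 2 for e, m in zip(estimates, measurements)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical private mode latency estimates and measurements (clock cycles).
estimates = [103.0, 97.0, 100.0]
measurements = [100.0, 100.0, 100.0]

print(average_relative_error(estimates, measurements))  # 0.0
print(round(rms_error(estimates, measurements), 3))     # 2.449
```

The sample shows why both views are useful: an over-estimate and an under-estimate cancel in the average relative error, while the RMS error still exposes them.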


Auxiliary Tag Directory Accuracy

[Figure: Relative Miss Estimate Error (y-axis, ticks from -2 % to 2 %) for the 1C, 2C and 4C configurations of the 4-, 8- and 16-core systems, shown separately for the Crossbar and Ring interconnects.]


Outline

• Introduction

• Dynamic Interference Estimation Framework
– Shared Cache
– Memory Bus
– On-chip interconnect

• Results

• Summary


Summary

• Memory system interference causes unpredictable performance

• DIEF provides
– Accurate private mode latency estimates
– Accurate shared mode latency measurements

• Future opportunities
– Guiding dynamic optimizations
– Guiding OS scheduling decisions
– Debugging and optimization


Thank you!

Visit our website: http://research.idi.ntnu.no/multicore/

Questions?


Experiment Methodology

• M5 simulator
– Extended with crossbar and ring on-chip interconnect models
– DDR2 memory bus model

• Randomly generated workloads of SPEC2000 benchmarks
– 40 4-core workloads
– 20 8-core workloads
– 10 16-core workloads
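Random workload generation of this kind might look like the sketch below. The workload counts come from the slide; everything else is an assumption: the list is the full SPEC CPU2000 suite (the subset actually used is not stated), benchmarks are drawn without repetition within a workload, and the seed and function name are hypothetical.

```python
import random

# SPEC CPU2000 integer and floating point benchmarks (full suite, assumed pool).
SPEC2000 = [
    "gzip", "vpr", "gcc", "mcf", "crafty", "parser", "eon", "perlbmk",
    "gap", "vortex", "bzip2", "twolf",
    "wupwise", "swim", "mgrid", "applu", "mesa", "galgel", "art", "equake",
    "facerec", "ammp", "lucas", "fma3d", "sixtrack", "apsi",
]


def generate_workloads(count, cores, rng):
    """Draw `count` workloads with one distinct benchmark per core."""
    return [rng.sample(SPEC2000, cores) for _ in range(count)]


rng = random.Random(2011)  # hypothetical seed, fixed so the set is reproducible
workloads = {
    cores: generate_workloads(count, cores, rng)
    for cores, count in [(4, 40), (8, 20), (16, 10)]
}

for cores, sets in workloads.items():
    print(cores, "cores:", len(sets), "workloads, first:", sets[0])
```

Fixing the seed keeps the workload set stable across simulator runs, which makes private mode and shared mode results directly comparable.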