Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3...

165
Fairness via Source Throttling: A configurable and high-performance fairness substrate for multi-core memory systems Eiman Ebrahimi * Chang Joo Lee* Onur Mutlu Yale N. Patt * * HPS Research Group The University of Texas at Austin ‡ Computer Architecture Laboratory Carnegie Mellon University Wednesday, March 17, 2010

Transcript of Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3...

Page 1: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Fairness via Source Throttling:

A configurable and high-performance fairness substrate for multi-core memory systems

Eiman Ebrahimi*

Chang Joo Lee*

Onur Mutlu‡

Yale N. Patt*

* HPS Research Group

The University of Texas at Austin

‡ Computer Architecture Laboratory

Carnegie Mellon University

Wednesday, March 17, 2010

Page 2: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N...

2

Wednesday, March 17, 2010

Page 3: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 4: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 5: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 6: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 7: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 8: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Background and Problem

Core 0 Core 1 Core 2 Core N

Shared Cache

Memory Controller

DRAMBank 0

DRAMBank 1

DRAM Bank 2

... DRAMBank K

...

Shared MemoryResources

Chip BoundaryOn-chipOff-chip

2

Wednesday, March 17, 2010

Page 9: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Applications slow down due to interference from memory requests of other applications

3

Background and Problem

Wednesday, March 17, 2010

Page 10: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)

•Applications slow down due to interference from memory requests of other applications

3

Background and Problem

Wednesday, March 17, 2010

Page 11: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)

•Applications slow down due to interference from memory requests of other applications

3

Background and Problem

SharedTi

•Slowdown of application i = Ti

Alone

Wednesday, March 17, 2010

Page 12: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• A memory system is fair if slowdowns of same-priority applications are equal(MICRO ’06, MICRO ’07, ISCA ’08)

•Applications slow down due to interference from memory requests of other applications

3

Background and Problem

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i(MICRO ’07)

•Unfairness =

Wednesday, March 17, 2010

Page 13: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

4

Background and Problem

Wednesday, March 17, 2010

Page 14: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Magnitude of each application’s slowdown depends on concurrently running applications’ memory behavior

4

Background and Problem

Wednesday, March 17, 2010

Page 15: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

4

Background and Problem

Wednesday, March 17, 2010

Page 16: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

01234567

lbm omnet apsi vortexSl

owdo

wn

4

Background and Problem

Wednesday, March 17, 2010

Page 17: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

• Large disparities in slowdowns are unacceptable

01234567

lbm omnet apsi vortexSl

owdo

wn

4

Background and Problem

Wednesday, March 17, 2010

Page 18: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

• Large disparities in slowdowns are unacceptable• Low system performance

01234567

lbm omnet apsi vortexSl

owdo

wn

4

Background and Problem

Wednesday, March 17, 2010

Page 19: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

• Large disparities in slowdowns are unacceptable• Low system performance • Vulnerability to denial of service attacks

01234567

lbm omnet apsi vortexSl

owdo

wn

4

Background and Problem

Wednesday, March 17, 2010

Page 20: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

zeus art

Slow

dow

n• Magnitude of each application’s slowdown depends on

concurrently running applications’ memory behavior

• Large disparities in slowdowns are unacceptable• Low system performance • Vulnerability to denial of service attacks• Difficult for system software to enforce priorities

01234567

lbm omnet apsi vortexSl

owdo

wn

4

Background and Problem

Wednesday, March 17, 2010

Page 21: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Background and Problem

•Motivation for Source Throttling

• Fairness via Source Throttling (FST)

•Evaluation

•Conclusion

5

Outline

Wednesday, March 17, 2010

Page 22: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

6

Prior Approaches

Wednesday, March 17, 2010

Page 23: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Primarily manage inter-application interference in only one particular resource

• Shared Cache, Memory Controller, Interconnect, etc.

6

Prior Approaches

Wednesday, March 17, 2010

Page 24: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Primarily manage inter-application interference in only one particular resource

• Shared Cache, Memory Controller, Interconnect, etc.

• Combining techniques for the different resources can result in negative interaction

6

Prior Approaches

Wednesday, March 17, 2010

Page 25: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Primarily manage inter-application interference in only one particular resource

• Shared Cache, Memory Controller, Interconnect, etc.

• Combining techniques for the different resources can result in negative interaction

• Approaches that coordinate interaction among techniques for different resources require complex implementations

6

Prior Approaches

Wednesday, March 17, 2010

Page 26: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Primarily manage inter-application interference in only one particular resource

• Shared Cache, Memory Controller, Interconnect, etc.

• Combining techniques for the different resources can result in negative interaction

• Approaches that coordinate interaction among techniques for different resources require complex implementations

Our Goal: Enable fair sharing ofthe entire memory system by dynamically detecting and controlling interference in a coordinated manner

6

Prior Approaches

Wednesday, March 17, 2010

Page 27: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

7

Our Approach

Wednesday, March 17, 2010

Page 28: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Manage inter-application interference at the cores, not at the shared resources

7

Our Approach

Wednesday, March 17, 2010

Page 29: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Manage inter-application interference at the cores, not at the shared resources

• Dynamically estimate unfairness in the memory system

7

Our Approach

Wednesday, March 17, 2010

Page 30: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Manage inter-application interference at the cores, not at the shared resources

• Dynamically estimate unfairness in the memory system

• If unfairness > system-software-specified target thenthrottle down core causing unfairness & throttle up core that was unfairly treated

7

Our Approach

Wednesday, March 17, 2010

Page 31: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Wednesday, March 17, 2010

Page 32: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A:B:

Unmanaged Interference

A:

B:Fair Source Throttling

Wednesday, March 17, 2010

Page 33: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Oldest ⎧||⎩

Shared MemoryResources

A:B:

Unmanaged Interference

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 34: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Oldest ⎧||⎩

Shared MemoryResources

A: Compute

ComputeB:Unmanaged Interference

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 35: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Oldest ⎧||⎩

Shared MemoryResources

A: Compute

ComputeB:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 36: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute

ComputeB:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 37: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

ComputeB:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 38: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

ComputeB:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall time

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 39: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resourcesB:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall time

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 40: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall time

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 41: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A:

B:Fair Source Throttling

queue of requests to shared resources

Wednesday, March 17, 2010

Page 42: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A:

B:Fair Source Throttling

queue of requests to shared resources

Intensive application A generates many requests and causes long stall times for less intensive application B

Wednesday, March 17, 2010

Page 43: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A: Compute

ComputeB:Fair Source Throttling

Request Generation OrderA1, A2, A3, A4, B1

queue of requests to shared resources

Intensive application A generates many requests and causes long stall times for less intensive application B

Wednesday, March 17, 2010

Page 44: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A: Compute

ComputeB:Fair Source Throttling

Request Generation OrderA1, A2, A3, A4, B1

queue of requests to shared resources

Intensive application A generates many requests and causes long stall times for less intensive application B

Wednesday, March 17, 2010

Page 45: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A: Compute

ComputeB:Fair Source Throttling

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 46: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

⎧||⎩

Shared MemoryResources

A: Compute

ComputeB:Fair Source Throttling

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 47: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute

ComputeB:Fair Source Throttling

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 48: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1

Compute Stall wait.B:Fair Source Throttling

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 49: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1

Compute Stall wait. Stall on B1B:Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 50: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1

Compute Stall wait. Stall on B1B:

Core B’s stall time

Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Wednesday, March 17, 2010

Page 51: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2

Compute Stall wait. Stall on B1B:

Core B’s stall time

Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Stall on A4Stall on A3

Wednesday, March 17, 2010

Page 52: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2

Compute Stall wait. Stall on B1B:

Core A’s stall time

Core B’s stall time

Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Oldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Stall on A4Stall on A3

Extra Cycles Core A

Wednesday, March 17, 2010

Page 53: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2

Compute Stall wait. Stall on B1B:

Core A’s stall time

Core B’s stall time

Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Saved Cycles Core BOldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Stall on A4Stall on A3

Extra Cycles Core A

Wednesday, March 17, 2010

Page 54: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

A4B1

A1

A2

A3

Oldest ⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2 Stall on A3 Stall on A4

Compute Stall waiting for shared resources Stall on B1B:

Request Generation Order: A1, A2, A3, A4, B1

Unmanaged Interference

Core A’s stall timeCore B’s stall time

A4

B1

A1

A2

A3

⎧||⎩

Shared MemoryResources

A: Compute Stall on A1 Stall on A2

Compute Stall wait. Stall on B1B:

Dynamically detect application A’s interference for application B and throttle down application A

Core A’s stall time

Core B’s stall time

Fair Source Throttling

Stall wait.

Request Generation OrderA1, B1, A2, A3, A4

queue of requests to shared resources

queue of requests to shared resources

Saved Cycles Core BOldest

Intensive application A generates many requests and causes long stall times for less intensive application B

Throttled Requests

Stall on A4Stall on A3

Extra Cycles Core A

Wednesday, March 17, 2010

Page 55: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Background and Problem

•Motivation for Source Throttling

• Fairness via Source Throttling (FST)

•Evaluation

•Conclusion

9

Outline

Wednesday, March 17, 2010

Page 56: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

10

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 57: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Runtime Unfairness Evaluation• Dynamically estimates the unfairness in the

memory system

10

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 58: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Runtime Unfairness Evaluation• Dynamically estimates the unfairness in the

memory system

•Dynamic Request Throttling• Adjusts how aggressively each core makes

requests to the shared resources

10

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 59: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

FST

TimeInterval 1 Interval 2 Interval 3

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 60: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

FST

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 61: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness

FST

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

Runtime UnfairnessEvaluation

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 62: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)

FST

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

Runtime UnfairnessEvaluation

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 63: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

FST

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

Runtime UnfairnessEvaluation

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 64: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

FSTUnfairness Estimate

App-slowestApp-interfering

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

DynamicRequest Throttling

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 65: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) {

FSTUnfairness Estimate

App-slowestApp-interfering

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

DynamicRequest Throttling

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 66: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering

FSTUnfairness Estimate

App-slowestApp-interfering

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

DynamicRequest Throttling

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 67: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

| ⎨ | ⎧⎩Slowdown Estimation

TimeInterval 1 Interval 2 Interval 3

DynamicRequest Throttling

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 68: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

12

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 69: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

13

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i•Unfairness =

Estimating System Unfairness

Wednesday, March 17, 2010

Page 70: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•How can be estimated in shared mode?TiAlone

13

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i•Unfairness =

Estimating System Unfairness

Wednesday, March 17, 2010

Page 71: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•How can be estimated in shared mode?TiAlone

13

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i•Unfairness =

TiExcess

• is the number of extra cycles it takes application i to execute due to interference

Estimating System Unfairness

Wednesday, March 17, 2010

Page 72: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•How can be estimated in shared mode?TiAlone

13

SharedTi

•Slowdown of application i = Ti

Alone

Max{Slowdown i} over all applications i

Min{Slowdown i} over all applications i•Unfairness =

TiExcess

• is the number of extra cycles it takes application i to execute due to interference

TiShared

=TiAlone

- TiExcess

Estimating System Unfairness

Wednesday, March 17, 2010

Page 73: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

14

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 74: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

14

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 75: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:

14

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 76: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache

14

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 77: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank

14

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 78: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers

14

Bank 2

Tracking Inter-Core Interference

Row

Wednesday, March 17, 2010

Page 79: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers

14

Bank 2

Tracking Inter-Core Interference

Row

Wednesday, March 17, 2010

Page 80: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers

FST hardware

14

Bank 2

Tracking Inter-Core Interference

Row

Wednesday, March 17, 2010

Page 81: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

Three interference sources:1. Shared Cache2. DRAM bus and bank3. DRAM row-buffers

FST hardware

14

Bank 2

Tracking Inter-Core Interference

Row

Wednesday, March 17, 2010

Page 82: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Core 0 Core 1

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 83: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Bank 2Bank 0 Bank 1 Bank 7...

Core 0 Core 1

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 84: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row Aqueue of requests to bank 2

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 85: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row Aqueue of requests to bank 2

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 86: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row Aqueue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 87: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row A

Interference per corebit vector

0 0Core # 0 1

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 88: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row A

Shadow Row Address Register(SRAR) Core 0 :

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 89: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row A

Shadow Row Address Register(SRAR) Core 0 :

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 90: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row Hit

Row A

Shadow Row Address Register(SRAR) Core 0 :

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 91: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row Hit

Row A

Row BShadow Row Address Register(SRAR) Core 0 :

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 92: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row ConflictRow A

Row BShadow Row Address Register(SRAR) Core 0 :

Row A

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 93: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row B

Row A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row ConflictRow A

Row BRow B

Shadow Row Address Register(SRAR) Core 0 :

Row A

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 94: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row BRow A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row A

Row Hit

Row BRow B

Shadow Row Address Register(SRAR) Core 0 :

Row A

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 95: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row BRow A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row ARow Conflict

Row B

Row BShadow Row Address Register(SRAR) Core 0 :

Row A

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 96: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row BRow A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row ARow Conflict

Row B

Row BShadow Row Address Register(SRAR) Core 0 :

Row A

Interference inducedrow conflict

Interference per corebit vector

0 0Core # 0 1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 97: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Row BRow A

Bank 2

Row B

Bank 0 Bank 1 Bank 7...

Core 0 Core 1

Row ARow Conflict

Row B

Row BShadow Row Address Register(SRAR) Core 0 :

Row A

Interference inducedrow conflict

Interference per corebit vector

0Core # 0 1

1

Shadow Row Address Register(SRAR) Core 1 :

queue of requests to bank 2

FST additions

Row Buffer:

15

Tracking DRAM Row-Buffer Interference

Wednesday, March 17, 2010

Page 98: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

FST hardwareCore 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 99: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

0

Excess Cycles Counters per core

FST hardwareCore 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 100: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

0

Excess Cycles Counters per core

TCycle Count

FST hardwareCore 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 101: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

0

Excess Cycles Counters per core

1

TCycle Count

FST hardwareCore 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 102: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

0

Excess Cycles Counters per core

1

TCycle Count

FST hardwareCore 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 103: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

Excess Cycles Counters per core

1

Cycle Count T+2

2FST hardware

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 104: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

Excess Cycles Counters per core

1

Cycle Count T+2

2FST hardware

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 105: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

Excess Cycles Counters per core

1

Cycle Count T+2

2FST hardware

1

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 106: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

Excess Cycles Counters per core

1

Cycle Count T+2

2FST hardware

1

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 107: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0 0 0

Interference per corebit vector

Core # 0 1 2 3

0

0

0

Excess Cycles Counters per core

1

Cycle Count T+2

2FST hardware

1

T+3

3

1

Core 0 Core 1 Core 2 Core 3

Bank 0 Bank 1 Bank 2 Bank 7...

Memory Controller

Shared Cache

16

TiExcess

|⎨|⎧

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 108: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

17

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 109: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 110: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• To identify App-interfering, for each core i

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 111: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 112: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

Interference per corebit vector

Core # 0 1 2 3

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 0

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 113: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 00123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 114: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 115: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 116: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 2,1

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 117: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 2,1

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

1

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 118: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 2,1

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

1core 2

interfered with

core 1

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 119: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

1core 2

interfered with

core 1

Cnt 2,1++

0123

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 120: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

1core 2

interfered with

core 1

Cnt 2,1++

0123

App-slowest = 2

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 121: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Cnt 3Cnt 2Cnt 1Cnt 00

0 0 0 -

Interference per corebit vector

Core # 0 1 2 3-

Cnt 1,0

Cnt 2,0

Cnt 3,0

Excess Cycles Counters per core

• To identify App-interfering, for each core i• FST separately tracks interference

caused by each core j ( j ! i )

0 0 - 0

0 - 0 0

- 0 0 0

|⎨|⎧ ⎩

|⎨|⎧

Interfered with core

Interfering core

Cnt 0,1

-

Cnt 3,1

Cnt 0,2

Cnt 1,2

-

Cnt 3,2

Cnt 0,3

Cnt 1,3

Cnt 2,3

-

1core 2

interfered with

core 1

Cnt 2,1++

0123

Row with largest count determines App-interfering

App-slowest = 2

18

Tracking Inter-Core Interference

Wednesday, March 17, 2010

Page 122: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Runtime UnfairnessEvaluation

DynamicRequest Throttling

1- Estimating system unfairness 2- Find app. with the highest slowdown (App-slowest)3- Find app. causing most interference for App-slowest (App-interfering)

if (Unfairness Estimate >Target) { 1-Throttle down App-interfering 2-Throttle up App-slowest}

FSTUnfairness Estimate

App-slowestApp-interfering

19

Fairness via Source Throttling (FST)

Wednesday, March 17, 2010

Page 123: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 124: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Goal: Adjust how aggressively each core makes requests to the shared resources

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 125: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Goal: Adjust how aggressively each core makes requests to the shared resources

•Mechanisms:• Miss Status Holding Register (MSHR) quota

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 126: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Goal: Adjust how aggressively each core makes requests to the shared resources

•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests

accessing shared resources from each application

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 127: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Goal: Adjust how aggressively each core makes requests to the shared resources

•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests

accessing shared resources from each application

• Request injection frequency

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 128: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Goal: Adjust how aggressively each core makes requests to the shared resources

•Mechanisms:• Miss Status Holding Register (MSHR) quota• Controls the number of concurrent requests

accessing shared resources from each application

• Request injection frequency• Controls how often memory requests are issued

to the last level cache from the MSHRs

20

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 129: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Throttling level assigned to each core determines both MSHR quota and request injection rate

21

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 130: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Throttling level assigned to each core determines both MSHR quota and request injection rate

Throttling level MSHR quota Request Injection Rate

100% 128 Every cycle50% 64 Every other cycle25% 32 Once every 4 cycles10% 12 Once every 10 cycles5% 6 Once every 20 cycles4% 5 Once every 25 cycles3% 3 Once every 30 cycles2% 2 Once every 50 cycles

Total # ofMSHRs: 128

21

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 131: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Throttling level assigned to each core determines both MSHR quota and request injection rate

Throttling level MSHR quota Request Injection Rate

100% 128 Every cycle50% 64 Every other cycle25% 32 Once every 4 cycles10% 12 Once every 10 cycles5% 6 Once every 20 cycles4% 5 Once every 25 cycles3% 3 Once every 30 cycles2% 2 Once every 50 cycles

Total # ofMSHRs: 128

21

Dynamic Request Throttling

Wednesday, March 17, 2010

Page 132: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Time

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 3Interval i

Interval i + 1Interval i + 2

Core 0 Core 2

System software fairness goal: 1.4

22

FST at Work

Wednesday, March 17, 2010

Page 133: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%Interval i

Interval i + 1Interval i + 2

Core 0 Core 2

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 134: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%Interval i

Interval i + 1Interval i + 2

Core 0 Core 2

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 135: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%Interval i

Interval i + 1Interval i + 2

3

Core 2

Core 0

Core 0 Core 2

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 136: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%Interval i

Interval i + 1Interval i + 2

3

Core 2

Core 0

Core 0 Core 2

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 137: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%Interval i

Interval i + 1Interval i + 2

3

Core 2

Core 0

Core 0 Core 2Throttle down Throttle up

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 138: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i Interval i+1

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%

Interval iInterval i + 1Interval i + 2

3

Core 2

Core 0

Core 0 Core 2Throttle down Throttle up

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 139: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i Interval i+1

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%

Interval iInterval i + 1Interval i + 2

Core 0 Core 2

2.5

Core 2

Core 1

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 140: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i Interval i+1

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%

Interval iInterval i + 1Interval i + 2

Core 0 Core 2

2.5

Core 2

Core 1

Throttle down Throttle up

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 141: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

TimeInterval i Interval i+1 Interval i+2

Runtime UnfairnessEvaluation Dynamic

Request Throttling

FSTUnfairness Estimate

App-slowest

App-interfering

Throttling Levels

Core 0 Core 1 Core 350% 100% 10% 100%25% 100% 25% 100%25% 50% 50% 100%

Interval iInterval i + 1Interval i + 2

Core 0 Core 2

2.5

Core 2

Core 1

Throttle down Throttle up

System software fairness goal: 1.4

22

FST at Work

Slowdown Estimation

| ⎨ | ⎧⎩

Wednesday, March 17, 2010

Page 142: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

23

System Software Support

Wednesday, March 17, 2010

Page 143: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software

23

System Software Support

Wednesday, March 17, 2010

Page 144: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

23

System Software Support

Wednesday, March 17, 2010

Page 145: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

• Estimated Max Slowdown > Target Max Slowdown

23

System Software Support

Wednesday, March 17, 2010

Page 146: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

• Estimated Max Slowdown > Target Max Slowdown

• Estimated Slowdown(i) > Target Slowdown(i)

23

System Software Support

Wednesday, March 17, 2010

Page 147: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

• Estimated Max Slowdown > Target Max Slowdown

• Estimated Slowdown(i) > Target Slowdown(i)

•Support for thread priorities

23

System Software Support

Wednesday, March 17, 2010

Page 148: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Different fairness objectives can be configured by system software• Estimated Unfairness > Target Unfairness

• Estimated Max Slowdown > Target Max Slowdown

• Estimated Slowdown(i) > Target Slowdown(i)

•Support for thread priorities• Weighted Slowdown(i) =

Estimated Slowdown(i) x Weight(i)

23

System Software Support

Wednesday, March 17, 2010

Page 149: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Total storage cost required for 4 cores is ! 12KB

• FST does not require any structures or logic that are on the processor’s critical path

24

Hardware Cost

Wednesday, March 17, 2010

Page 150: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

•Background and Problem

•Motivation for Source Throttling

• Fairness via Source Throttling (FST)

•Evaluation

•Conclusion

25

Outline

Wednesday, March 17, 2010

Page 151: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• x86 cycle accurate simulator

• Baseline processor configuration• Per-core• 4-wide issue, out-of-order, 256 entry ROB

• Shared (4-core system)• 128 MSHRs • 2 MB, 16-way L2 cache

• Main Memory• DDR3 1333 MHz• Latency of 15ns per command (tRP, tRCD, CL)• 8B wide core to memory bus

26

Evaluation Methodology

Wednesday, March 17, 2010

Page 152: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 153: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 154: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 155: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 156: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 157: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

27

System Unfairness Results

Wednesday, March 17, 2010

Page 158: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

44.4%

27

System Unfairness Results

Wednesday, March 17, 2010

Page 159: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

1

2

3

4

5

6

7

Syst

em U

nfai

rnes

s

No FairnessFair Cache Capacity (VPC)Parallelism-Aware Batch Scheduling+VPCFairness via Source Throttling (FST)

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

gmea

n

44.4%

36%

27

System Unfairness Results

Wednesday, March 17, 2010

Page 160: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0.4

0.8

1.2

1.6

2

Syst

em P

erf.

Nor

mal

ized

to

No

Fair

ness Fair Cache Capacity (VPC)

Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)

gmea

n

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

28

System Performance Results

Wednesday, March 17, 2010

Page 161: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0.4

0.8

1.2

1.6

2

Syst

em P

erf.

Nor

mal

ized

to

No

Fair

ness Fair Cache Capacity (VPC)

Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)

gmea

n

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

28

System Performance Results

Wednesday, March 17, 2010

Page 162: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0.4

0.8

1.2

1.6

2

Syst

em P

erf.

Nor

mal

ized

to

No

Fair

ness Fair Cache Capacity (VPC)

Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)

gmea

n

25.6%

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

28

System Performance Results

Wednesday, March 17, 2010

Page 163: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

0

0.4

0.8

1.2

1.6

2

Syst

em P

erf.

Nor

mal

ized

to

No

Fair

ness Fair Cache Capacity (VPC)

Parallelism-Aware Batch Scheduling + VPCFairness via Source Throttling (FST)

gmea

n

25.6%

14%

grom+a

rt+ast

ar+h

264

art+

games

+Gem

s+h2

64

lbm+o

mnet+

apsi+

vorte

x

art+

leslie

+gam

es+g

rom

art+

astar

+lesli

e+cr

afty

art+

milc+v

ortex

+calc

ulix

lucas+

ammp+

xalan

c+gro

m

lbm+G

ems+

astar

+mes

a

mgrid+

parse

r+so

plex+

perlb

gcc0

6+xa

lanc+

lbm+c

actu

s

28

System Performance Results

Wednesday, March 17, 2010

Page 164: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

• Fairness via Source Throttling (FST) is a new fair and high-performance shared resource management approach for CMPs

• Dynamically monitors unfairness and throttles down sources of interfering memory requests

• Eliminates the need for and complexity of multiple per-resource fairness techniques

• Improves both system fairness and performance

• Incorporates thread weights and enables different fairness objectives

29

Conclusion

Wednesday, March 17, 2010

Page 165: Fairness via Source Throttlingusers.ece.cmu.edu/~omutlu/pub/ebrahimi_asplos10_talk.pdfB1 A1 A2 A3 Oldest Shared Memory Resources A: Compute Stall on A1 Stall on A2 Stall on A3 Stall

Fairness via Source Throttling:

A configurable and high-performance fairness substrate for multi-core memory systems

Eiman Ebrahimi*

Chang Joo Lee*

Onur Mutlu‡

Yale N. Patt*

* HPS Research Group

The University of Texas at Austin

‡ Computer Architecture Laboratory

Carnegie Mellon University

Wednesday, March 17, 2010