A Systematic Methodology to Develop Resilient Cache Coherence Protocols

Post on 23-Feb-2016



Konstantinos Aisopos (Princeton, MIT), Li-Shiuan Peh (MIT)

Motivation

• CMP era is here…

• Enabled by aggressive transistor scaling: shrinking transistor dimensions => unreliable silicon (10K-100K FITs, frequency of errors: months) [1,2]

[Figure: CMP built from cores (C) and routers (R); tile detail labeled P, P$, S$, NIC]

[1] R. Baumann (TI), IEEE Design & Test of Computers, vol. 22 (3), 2005
[2] J. Graham (MoSys), EE Times, 2002

• Goal: resilient cache coherence protocol

[Figure: request and data messages in flight between routers (R) and a sharer (S)]

Loss of a single coherence message: deadlock

Outline

• Motivation
• Methodology
  – Walkthrough: a resilient transaction
  – Defining resilience properties
  – Enforcing resilience properties
• Evaluation
  – Overhead
  – Performance
• Conclusions

Walkthrough Example: transaction => resilient transaction

[Figure: message sequence among initiator R (S => SM => M), sharers S1 and S2 (S => I), and the directory (S{R,S1,S2} => BM => M{R}); messages: request (M), forwarded request (M), ack, ack, unblock]

1. Initiator sends request to the directory
2. Directory forwards request to the sharers
3. Sharers invalidate their copy and acknowledge
4. Request completes and initiator sends unblock to the dir
5. Dir updates sharing vector and may now process succeeding requests

[Figure: initiator R (S => SM) sends request (M) to the dir; the request is lost and R sends request (M) again]

1. Initiator sends request to the directory
2. Request is lost
3. Initiator resends request after a timeout
4. Directory forwards request to the sharers (…transaction continues identically as before)
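The timeout-and-resend behaviour in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `Initiator`, the cycle-based `tick` method, and the single `on_response` event (standing in for the acks that complete the request) are all assumptions made for the example.

```python
class Initiator:
    """Toy requestor: stays in transient state SM and resends on timeout."""

    def __init__(self, timeout_cycles):
        self.state = "S"
        self.timeout_cycles = timeout_cycles   # hypothetical timeout length
        self.timer = 0
        self.sent = []                         # log of messages put on the network

    def issue(self):
        """Start a write transaction: S => SM and send the request."""
        self.state = "SM"
        self._send()

    def _send(self):
        self.sent.append("request (M)")
        self.timer = 0                         # restart the timeout

    def tick(self):
        """Advance one cycle; resend if the transaction is still pending at timeout."""
        if self.state != "SM":
            return
        self.timer += 1
        if self.timer >= self.timeout_cycles:
            self._send()

    def on_response(self):
        """The request completed (simplified to one event): SM => M."""
        self.state = "M"
```

The resend is only possible because the initiator is still in a transient state (SM) when the timer expires; once it reaches M, `tick` does nothing.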

[Figure: message sequence among initiator R (in SM), the directory (S{R,S1,S2} => BM), and sharers S1 and S2 (S => I); a resent request (M) reaches the directory while it is already in BM, and then reaches the sharers a second time]

1. Initiator resends its request
2. Directory forwards the request to sharers (again)
3. Sharers acknowledge (again) (…transaction completes identically as before)

To tolerate a duplicate request, a node must:
(1) transit to the same state
(2) generate the same messages
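A small Python sketch of a directory that tolerates a duplicate request in this way is shown here. It is an illustration under stated assumptions: the class `Directory`, its `pending` field, and the helper `sharer_on_request_m` are hypothetical names, and only request (M) and unblock are modeled.

```python
class Directory:
    """Toy directory for one cache block; only request (M) and unblock are modeled."""

    def __init__(self, sharers):
        self.state = "S"
        self.sharers = set(sharers)
        self.pending = None            # requestor being served while in BM

    def on_request_m(self, requestor):
        """Return the (destination, message) pairs the directory sends."""
        if self.state == "S":
            self.state = "BM"
            self.pending = requestor
        if self.state == "BM" and requestor == self.pending:
            # First copy or duplicate: stay in BM and regenerate the same forwards.
            targets = sorted(self.sharers - {requestor})
            return [(t, "request (M)") for t in targets]
        return []                      # other requestors wait for unblock (not modeled)

    def on_unblock(self, requestor):
        """Transaction finished: update the sharing vector and leave BM."""
        if self.state == "BM" and requestor == self.pending:
            self.state = "M"
            self.sharers = {requestor}
            self.pending = None


def sharer_on_request_m(state):
    """A sharer in S, or already in I after the first copy, ends in I and acks."""
    assert state in ("S", "I")
    return "I", "ack"
```

Handling the original and the duplicate through the same code path is what makes the second delivery harmless: the state stays BM and the forwarded messages are identical.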


Defining the Resilience Properties

[Figure: a request from requestor R passes through a chain of nodes and a response returns to R]

• Message loss => transaction suspended
• The requestor regenerates its request after a timeout
• On the regenerated request, each node along the transaction repeats:
  - same state transition
  - same outgoing messages

[Figure: FSM fragments for the three properties: the initiator going from stable through transient states back to stable between its request and the last message; msgA arriving in states X and Y; state A sending msgA and msgB]

Property 1: initiator remains transient throughout the transaction
Property 2: replicate msgs roll back to the same earlier state
Property 3: retain information to regenerate msgs


Enforcing Property 1

Property 1: the initiator remains transient throughout a transaction to be able to resend lost messages

[Figure: initiator FSM, stable => (request) => transient … transient => (last message) => stable]

Counter-example: after the response, the initiator sends unblock to the dir and transits to a stable state. If unblock is lost, the initiator cannot resend it.

Enforcement:
- detect every outgoing message that transits the initiator to a stable state
- replace the stable state with a transient state, and wait for done
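The two enforcement steps can be read as a mechanical rewrite of the initiator's state machine. The Python sketch below illustrates that reading; it is not the authors' tool. The FSM encoding (a dict from `(state, event)` to `(next_state, outgoing_message)`), the `timeout` and `done` event names, and the `d` suffix for the new waiting states (borrowed from the Md/Ed/Sd/Id naming in the overhead table) are assumptions. It also assumes at most one message-sending transition ends in each stable state.

```python
def enforce_property_1(fsm, stable_states):
    """Return a copy of `fsm` in which no message-sending transition ends stable.

    `fsm` maps (state, event) -> (next_state, outgoing_message or None).
    """
    out = {}
    for (state, event), (nxt, msg) in fsm.items():
        if msg is not None and nxt in stable_states:
            waiting = nxt + "d"                          # e.g. "Md" = M, waiting done
            out[(state, event)] = (waiting, msg)         # detour through a transient state
            out[(waiting, "timeout")] = (waiting, msg)   # resend the last message
            out[(waiting, "done")] = (nxt, None)         # receiver confirmed: go stable
        else:
            out[(state, event)] = (nxt, msg)
    return out


# Hypothetical two-transition initiator: S => SM on a store, SM => M after the acks.
base_fsm = {
    ("S", "store"): ("SM", "request (M)"),
    ("SM", "acks"): ("M", "unblock"),
}
resilient_fsm = enforce_property_1(base_fsm, stable_states={"M", "E", "S", "I"})
```

In `resilient_fsm` the unblock transition now ends in `Md`, where a timeout regenerates unblock, which is the behaviour the counter-example lacked.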

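Enforcing Property 2

Property 2: replicate messages roll back to the earlier state the original message transitioned to

[Figure: two FSM branches, one in which msgA leads to T1 and one in which msgA leads to T2, later merge into a common state TM; a replicate msgA arriving in TM raises the question "T1 or T2?"]

Enforcement: disassociate branches after the merging point (TM becomes TM1 and TM2)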

[Figure: Property 2 example: requestor R (I => M) sends request (M) to the directory, dir( ), and receives unique data]

Enforcing Property 3

Property 3: retain info to regenerate every outgoing message, in case a replicate request is received

[Figure: a sharer in M receives request (M) from R via the dir and sends the unique data; it enters a transient state and retains the unique data until it receives invalidate permission, then transits to I and sends an invalidate ack]
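One way to picture Property 3 is a sharer that keeps its only copy of the block until it is told the copy is no longer needed. The Python sketch below is an assumption-laden illustration: the class `Sharer`, the use of `Ip` (I, waiting permission) as the transient state name, and the message tuples are hypothetical and are not taken from the authors' protocol tables.

```python
class Sharer:
    """Toy owner of a unique (modified) block that must survive replicate requests."""

    def __init__(self, data):
        self.state = "M"
        self.data = data

    def on_request_m(self):
        """Original or replicate request: reply with the data still held."""
        if self.state in ("M", "Ip"):
            self.state = "Ip"              # logically invalid, but data is retained
            return ("data", self.data)
        return None                        # no retained copy (not modeled further)

    def on_invalidate_permission(self):
        """Requestor has the data: drop the copy and acknowledge (also on duplicates)."""
        if self.state in ("Ip", "I"):
            self.state = "I"
            self.data = None
            return "invalidate ack"
        return None
```

If the sharer discarded the block on the first request, a lost data message could not be regenerated; holding it through `Ip` is the extra step in the handshake that the backup slides attribute to Property 2 (invalidation HS).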


Evaluation: Overhead

Directory-based protocol (static directory node, MESI): 9 to 17 states (4 to 5 bits)

            base states      resilient states
stable      Modified         Md (M, waiting done)
            Exclusive        Ed (E, waiting done)
            Shared           Sd (S, waiting done)
            Invalid          Id (I, waiting done)
transient   IM (I => M)      Sp (S, waiting permission)
            IS (I => S)      Ip (I, waiting permission)
            SM (S => M)      Ma (M, waiting ack)
            ISI (IS => I)    Sa (S, waiting ack)
            MI (M => I)

Broadcast-based protocol (AMD Hammer, MOESI): 12 to 22 states (4 to 5 bits)

            base states      resilient states
stable      Modified         Md (M, waiting done)
            Owned            Ed (E, waiting done)
            Exclusive        Sd (S, waiting done)
            Shared           Id (I, waiting done)
            Invalid          MId (MI, waiting done)
transient   IM (I => M)      Sp (S, waiting permission)
            IS (I => S)      Ip (I, waiting permission)
            SM (S => M)      Ma (M, waiting ack)
            SE (S => E)      Ea (E, waiting ack)
            SS (S => S)      Sa (S, waiting ack)
            OM (O => M)
            WB req

(The two columns are independent lists; rows do not pair a base state with a resilient state.)

No state was introduced into the critical path of serving a request

Miss Status Holding Register (MSHR), 4-32 entries

  fields: PC, address, requestor, flags, state

  added per entry:
    state                1 bit
    timer                13 bits (0 to 2^13)
    response bitvector   64 bits
    transID              6 bits
    total                11 bytes

Total storage overhead: < 0.5 KB / core (worst-case: 2 KB / core) (*)
(*) assuming a 64-node CMP with in-order cores
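The bit counts on the overhead slides can be cross-checked with a little arithmetic. The Python sketch below assumes that the four listed widths (state, timer, response bitvector, transID) are the per-entry additions that make up the 11 bytes; the names `ADDED_BITS`, `state_bits`, and `mshr_overhead_bytes` are made up for the example. It does not attempt to derive the 2 KB worst-case figure.

```python
def state_bits(num_states):
    """Bits needed to encode `num_states` distinct states."""
    return (num_states - 1).bit_length()


# Per-entry additions as listed on the slide (assumed to be the full list).
ADDED_BITS = {"state": 1, "timer": 13, "response bitvector": 64, "transID": 6}

bits_per_entry = sum(ADDED_BITS.values())      # 1 + 13 + 64 + 6
bytes_per_entry = -(-bits_per_entry // 8)      # round up to whole bytes


def mshr_overhead_bytes(entries):
    """Added MSHR storage for a core with `entries` MSHR entries."""
    return entries * bytes_per_entry
```

Under these assumptions an entry grows by 84 bits, which rounds up to the 11 bytes on the slide, and a 32-entry MSHR grows by 352 bytes, below the quoted 0.5 KB per core. The state counts agree as well: 9 and 12 states fit in 4 bits, 17 and 22 states need 5.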

Evaluation: Performance

Simulator: Wisconsin Multifacet GEMS

System Configuration
  Processors   in-order SPARC cores
  L1 Caches    64KB/node, 3 cycles, 4-way, 64Byte blk
  L2 Caches    1MB/node, 6 cycles
  Memory       4 controllers * 1GB, 160 cycles

Network-on-Chip
  Topology     8x8 mesh
  Channels     64-bit
  VNets        5
  Routing      XY

[Figure: directory protocol. Runtime overhead (%) vs. non-resilient baseline, per benchmark (SPLASH: fft, fmm, lu, radix, water-nsq, water-sp; PARSEC: blackscholes, canneal, fluidanimate, swaptions, x264; AVERAGE), for no faults, 1 fault / 1msec, 1 fault / 100μsec, and 1 fault / 10μsec; y-axis 0% to 12%, lower is better. Labeled values: 7.4%, 11%, 1.4%, 1.8%, 1.1%, 3.5%]

[Figure: broadcast protocol. Runtime overhead (%) vs. non-resilient baseline, per benchmark (SPLASH: fft, fmm, lu, radix, water-nsq, water-sp; PARSEC: blackscholes, canneal, fluidanimate, swaptions, x264; AVERAGE), for no faults, 1 fault / 1msec, 1 fault / 100μsec, and 1 fault / 10μsec; y-axis 0% to 30%. Labeled values: 2.4%, 5.1%, 0.5%, 20.4%, 51%, 56%]


Conclusions

We have presented a generic methodology:
• coherence protocol -> resilient coherence protocol, by enforcing 3 properties
• minimal hardware overhead (<2KB / node)
• small performance overhead
  – directory-based protocol: 1.4% (1 fault / msec)
  – broadcast-based protocol: 2.4% (1 fault / msec)

Thank You!

Questions?

BACKUP SLIDES

Why performance overhead?

• transactions last longer => a request may have to wait for outstanding conflicting requests to complete
• data remain in caches for longer (3-way handshake) => cache replacement duration increases
• more messages are injected in the NoC => network traffic increases => average NoC latency increases

Transaction Duration

[Figure: transaction duration (cycles, 0 to 350) per benchmark (fft, fmm, lu, radix, water-nsq, water-sp, blackscholes, canneal, fluidanimate, swaptions, x264), with B/R bar pairs for L2-served and L1-served transactions, stacked into baseline transaction, Property 1 (done message), and Property 2 (invalidation HS). Labeled values: +12%, +18%]

B: baseline protocol, no faults
R: resilient protocol, 1 fault/10μsec
L1: transaction served by sharer's L1
L2: transaction served by directory (L2)

[Figure: the transaction duration chart again, with labeled values 11% and 24%]

large working sets, shared data => high number of requests (high traffic)
(!) retransmissions saturate network

Network Traffic

[Figure: link utilization (%, 0% to 70%) vs. request rate (total requests / cycle, 0.0 to 0.8), shown for the most congested link and for the average over all links. Series: baseline protocol, no faults; resilient protocol, no faults; resilient protocol, 1 fault/10μsec; each with an extrapolated curve]

Enforcing the Resilience Properties

P2: a single message type transits to a unique state in every FSM branch

Case 2: identical messages in same branch

[Figure: msgA from sender X moves the node to T (count = 1) and msgA from sender Y moves it to T (count = 2); example: a requestor in SM counts acks (SM + acks = 0, 1, 2) before reaching M]

[Figure: with one bit per sender instead of a count, msgA from X gives T [XYZ=100] and msgA from Y gives T [XYZ=110]; a duplicate msgA from X leaves the node in T [XYZ=100]]
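The difference between counting identical messages and recording who sent them can be made concrete in Python. The sketch below is illustrative only: `CountingCollector` and `BitvectorCollector` are hypothetical names, and the sender list X, Y, Z mirrors the `[XYZ=...]` labels in the figure.

```python
class CountingCollector:
    """Counts identical messages; a duplicate is indistinguishable from a new one."""

    def __init__(self, expected):
        self.expected = expected
        self.count = 0

    def on_msg(self, sender):
        self.count += 1
        return self.count >= self.expected      # True when considered complete


class BitvectorCollector:
    """Keeps one bit per sender, so a duplicate maps back to the same state."""

    def __init__(self, senders):
        self.index = {s: i for i, s in enumerate(senders)}
        self.bits = 0

    def on_msg(self, sender):
        self.bits |= 1 << self.index[sender]    # setting a set bit changes nothing
        return self.bits == (1 << len(self.index)) - 1

    def as_string(self):
        """Render the vector with the first sender leftmost, e.g. '100' for X only."""
        n = len(self.index)
        return "".join("1" if (self.bits >> i) & 1 else "0" for i in range(n))
```

With two acks expected, a duplicate from X makes the counter report completion after only one distinct sender has replied, whereas the bitvector stays at `100` until Y actually responds. This is consistent with the 64-bit response bitvector listed among the MSHR additions for a 64-node CMP.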

[Figure: 8x8 mesh with nodes numbered 0 to 63; nodes 18, 20, and 26 are drawn separately from the grid]