A Systematic Methodology to Develop Resilient Cache Coherence Protocols

A Systematic Methodology to Develop Resilient Cache Coherence Protocols Konstantinos Aisopos (Princeton, MIT) Li-Shiuan Peh (MIT)


Page 1: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

A Systematic Methodology to Develop Resilient Cache Coherence Protocols

Konstantinos Aisopos (Princeton, MIT)
Li-Shiuan Peh (MIT)

Page 2: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Motivation

• CMP era is here…

• Enabled by aggressive transistor scaling: shrinking transistor dimensions => unreliable silicon (10K-100K FITs, frequency of errors: months) [1,2]

[Figure: CMP with cores (C) connected by a network of routers (R); one tile is expanded to show P, P$, S$ and NIC]

[1] R. Baumann (TI), IEEE Design & Test of Computers, vol. 22 (3), 2005
[2] J. Graham (MoSys), EE Times, 2002

Page 3: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Motivation

• CMP era is here…
• Enabled by aggressive transistor scaling: shrinking transistor dimensions => unreliable silicon (10K-100K FITs, frequency of errors: months)
• Goal: resilient cache coherence protocol

[Figure: the same CMP, with a data request travelling between routers (R). Loss of a single coherence message => deadlock]

Page 4: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Outline

• Motivation
• Methodology
  – Walkthrough: a resilient transaction
  – Defining resilience properties
  – Enforcing resilience properties
• Evaluation
  – Overhead
  – Performance
• Conclusions

Page 5: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: transaction

[Figure: initiator R, sharers S1 and S2, and the directory (dir). R moves S -> SM -> M, the sharers move S -> I, and the directory moves S{R,S1,S2} -> BM -> M{R}. Messages: request (M), ack, unblock]

1. initiator sends request to the directory
2. directory forwards request to the sharers
3. sharers invalidate their copy and acknowledge
4. request completes and initiator sends unblock to the dir
5. dir updates sharing vector and may now process succeeding requests

Page 6: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: R, in state SM, sends request (M) to dir; the first request (M) is lost and a second request (M) reaches the directory]

1. initiator sends request to the directory
2. request is lost
3. initiator resends request after a timeout
4. directory forwards request to the sharers (…transaction continues identically as before)
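The timeout-and-resend step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `send_with_retry` and `make_lossy_directory` are hypothetical, the network is modelled as a function that returns `None` when a message is lost, and the retry limit is arbitrary. None of this is taken from the slides.

```python
def send_with_retry(deliver, request, max_attempts=10):
    """Resend `request` until `deliver` returns a response.

    `deliver` models the network plus the initiator's timer: it returns the
    response, or None when the message was lost and the timer expired.
    """
    for attempt in range(1, max_attempts + 1):
        response = deliver(request)
        if response is not None:
            return response, attempt
    raise TimeoutError("no response after %d attempts" % max_attempts)


def make_lossy_directory(losses):
    """Hypothetical directory whose first `losses` incoming requests are lost."""
    dropped = {"count": 0}

    def deliver(request):
        if dropped["count"] < losses:
            dropped["count"] += 1
            return None  # request lost: the initiator times out
        return ("forwarded", request)

    return deliver


response, attempts = send_with_retry(make_lossy_directory(1), "request (M)")
print(response, attempts)
```

With one lost request, the second attempt reaches the directory and the transaction proceeds as in the fault-free case.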

Page 7: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: R (SM), sharers S1 and S2, and dir (S{R,S1,S2} -> BM). Messages shown: request (M) from R to dir, request (M) forwarded to the sharers, ack]

1. initiator resends its request

Page 8: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: the directory, already in BM, receives the duplicate request (M): which state should it move to? Messages shown: request (M), request (S), ack, unblock. Directory states shown: S{R,S1,S2}, BM, BS]

1. initiator resends its request

tolerate a duplicate request:
(1) transit to same state
(2) generate the same messages

Page 9: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: R (SM) resends request (M); dir (S{R,S1,S2} -> BM) forwards request (M) to sharers S1 and S2 again; acks are shown]

1. initiator resends its request
2. directory forwards the request to sharers (again)

Page 10: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: a sharer in S receives request (M), moves to I and replies with ack; on the duplicate request (M) it stays in I and replies with ack again]

tolerate a duplicate request:
(1) transit to same state
(2) generate the same messages
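The two conditions for tolerating a duplicate request can be illustrated with a toy transition table for a sharer. This is a minimal sketch, not the protocol from the talk: the table `SHARER_FSM` and the function `sharer_step` are hypothetical names, and only the single S/I case from the walkthrough is modelled.

```python
# (state, incoming message) -> (next state, reply)
SHARER_FSM = {
    ("S", "request (M)"): ("I", "ack"),
    # Duplicate request: the sharer is already in I, the state the original
    # request moved it to, so it stays there and regenerates the same ack.
    ("I", "request (M)"): ("I", "ack"),
}


def sharer_step(state, message):
    """Apply one incoming message and return (next state, reply)."""
    return SHARER_FSM[(state, message)]


state = "S"
replies = []
for msg in ["request (M)", "request (M)"]:  # original, then duplicate
    state, reply = sharer_step(state, msg)
    replies.append(reply)

print(state, replies)
```

Both deliveries end in the same state and produce the same outgoing message, which is what conditions (1) and (2) ask for.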

Page 11: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Walkthrough Example: resilient transaction

[Figure: R (SM) resends request (M); dir forwards it to S1 and S2 again; both sharers ack again and R moves to M]

1. initiator resends its request
2. directory forwards the request to sharers (again)
3. sharers acknowledge (again) (…transaction completes identically as before)


Page 13: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Defining the Resilience Properties

[Figure: requestor R sends a request, a chain of messages follows through other nodes, and a response returns to R]

• message loss => transaction suspended
• the requestor regenerates its request after timeout
• each node that receives a replicate message performs:
  – same state transition
  – same outgoing messages

Page 14: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Defining the Resilience Properties

[Figure: the initiator R moves stable -> transient -> … -> transient -> stable between its request and the last message; an intermediate node receives msgA (from X or from Y), enters state A and sends msgB]

Property 1: initiator remains transient throughout the transaction
Property 2: replicate msgs roll-back to same earlier state
Property 3: retain information to regenerate msgs


Page 16: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing Property 1

Property 1: the initiator remains transient throughout a transaction to be able to resend lost messages

[Figure: initiator timeline: stable -> (request) -> transient -> … -> transient -> (last message) -> stable]

Page 17: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing Property 1

Property 1: the initiator remains transient throughout a transaction to be able to resend lost messages

counter-example:
[Figure: the initiator sends its request, receives the response, sends unblock to dir and moves to stable. Once it is stable, the initiator cannot resend unblock.]

Enforcement:
- detect every outgoing message that transits the initiator to stable state
- replace the stable with a transient state, and wait for done

[Figure: after sending unblock the initiator stays transient until done arrives, then moves to stable]
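The enforcement rule for Property 1 reads like a mechanical rewrite of the initiator's state machine, which can be sketched as follows. The FSM encoding, the function name `enforce_property_1`, the event names `store`, `last ack` and `timeout`, and the explicit resend-on-timeout transition are assumptions made for illustration. The "d" suffix follows the "waiting done" state names listed on the overhead slide.

```python
def enforce_property_1(fsm, stable):
    """Return a copy of `fsm` in which no message-sending transition lands
    directly in a stable state.

    `fsm` maps (state, event) -> (next state, outgoing message or None).
    """
    out = {}
    for (state, event), (nxt, msg) in fsm.items():
        if msg is not None and nxt in stable:
            waiting = nxt + "d"                         # e.g. M -> Md
            out[(state, event)] = (waiting, msg)
            out[(waiting, "timeout")] = (waiting, msg)  # resend if lost
            out[(waiting, "done")] = (nxt, None)        # receipt confirmed
        else:
            out[(state, event)] = (nxt, msg)
    return out


# Hypothetical two-step initiator: S -> SM on a store, SM -> M on the last ack.
base = {
    ("S", "store"): ("SM", "request (M)"),
    ("SM", "last ack"): ("M", "unblock"),
}
resilient = enforce_property_1(base, stable={"M", "E", "S", "I"})
for key in sorted(resilient):
    print(key, "->", resilient[key])
```

Only the transition that sent `unblock` while entering M is rewritten; the initiator now waits in Md and can regenerate `unblock` until `done` arrives.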

Page 18: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing Property 2

Property 2: replicate messages roll-back to the earlier state the original message transitioned to

[Figure: msgA moves the node to state A; a later replicate msgA returns it to A]

Page 19: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

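Enforcing Property 2

Property 2: replicate messages roll-back to the earlier state the original message transitioned to

[Figure, before: starting from S, msgA leads to T1 in one branch and to T2 in another, and both branches later merge into TM. A replicate msgA received at TM: roll back to T1 or T2?]

[Figure, after: disassociate branches after merging point, so TM becomes TM1 and TM2]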
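The ambiguity at a merging point, and how disassociating the branches removes it, can be checked with a small script. This is a sketch under assumptions: each branch is encoded as a list of (message, state) steps, `rollback_targets` is a hypothetical helper, and `msgB` stands in for whatever message the elided part of the figure carries between T1/T2 and TM.

```python
def rollback_targets(branches, msg):
    """Map each state at or after `msg` to the states `msg` originally led to.

    `branches` is a list of paths; each path is a list of (message, state)
    steps. A state with more than one target cannot roll back unambiguously.
    """
    targets = {}
    for path in branches:
        landed = None
        for m, state in path:
            if m == msg:
                landed = state
            if landed is not None:
                targets.setdefault(state, set()).add(landed)
    return targets


# Both branches merge into TM: a replicate msgA at TM is ambiguous.
merged = [[("msgA", "T1"), ("msgB", "TM")],
          [("msgA", "T2"), ("msgB", "TM")]]
# Branches kept apart after the merging point: TM1 and TM2.
split = [[("msgA", "T1"), ("msgB", "TM1")],
         [("msgA", "T2"), ("msgB", "TM2")]]

print(rollback_targets(merged, "msgA"))
print(rollback_targets(split, "msgA"))
```

In the merged machine TM maps to both T1 and T2, while in the split machine TM1 and TM2 each map to exactly one earlier state.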

Page 20: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing Property 3

Property 3: retain info to regenerate every outgoing message, in case a replicate request is received

[Figure: a Sharer in M holds unique data. On request (M) from R, forwarded by dir, it sends the unique data and moves M -> I]

Page 21: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing Property 3

Property 3: retain info to regenerate every outgoing message, in case a replicate request is received

[Figure: on request (M) from R, forwarded by dir, the Sharer moves M -> TI instead of M -> I. TI retains the unique data until an invalidate permission arrives; the Sharer then moves to I and replies with an invalidate ack]
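A sharer that retains its unique data in TI can be sketched as a small class. The class name `Sharer`, the message strings and the reply tuples are hypothetical, and regenerating the invalidate ack when the permission is received a second time is an assumption in the spirit of Property 2, not something shown on the slide.

```python
class Sharer:
    """Hypothetical sharer that starts in M holding the only copy of a block."""

    def __init__(self, data):
        self.state = "M"
        self.data = data

    def receive(self, msg):
        if msg == "request (M)" and self.state in ("M", "TI"):
            # Keep the data in TI so a replicate request gets the same reply.
            self.state = "TI"
            return ("data", self.data)
        if msg == "invalidate permission" and self.state in ("TI", "I"):
            # The requestor confirmed it has the data: safe to drop the copy.
            self.state = "I"
            self.data = None
            return ("invalidate ack", None)
        return None


s = Sharer(data=42)
first = s.receive("request (M)")
again = s.receive("request (M)")          # replicate request
done = s.receive("invalidate permission")
print(first, again, done, s.state)
```

The replicate request receives exactly the same data reply as the original, and the copy is discarded only after the invalidate permission arrives.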


Page 23: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Evaluation: Overhead

directory-based protocol (static directory node, MESI): 9 to 17 states (4 to 5 bits)

base states
  stable:    Modified, Exclusive, Shared, Invalid
  transient: IM (I->M), IS (I->S), SM (S->M), ISI (IS->I), MI (M->I)
resilient states
  Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done)
  Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Sa (S, waiting ack)

broadcast-based protocol (AMD Hammer, MOESI): 12 to 22 states (4 to 5 bits)

base states
  stable:    Modified, Owned, Exclusive, Shared, Invalid
  transient: IM (I->M), IS (I->S), SM (S->M), SE (S->E), SS (S->S), OM (O->M), WB req
resilient states
  Md (M, waiting done), Ed (E, waiting done), Sd (S, waiting done), Id (I, waiting done), MId (MI, waiting done)
  Sp (S, waiting permission), Ip (I, waiting permission), Ma (M, waiting ack), Ea (E, waiting ack), Sa (S, waiting ack)

No state was introduced into the critical path of serving a request
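The "4 to 5 bits" figures follow from the state counts, assuming a plain binary encoding in which every state gets a distinct code. The helper `state_bits` below is a hypothetical name used only for this check.

```python
def state_bits(num_states):
    """Bits needed to give each of `num_states` states a distinct binary code."""
    return max(1, (num_states - 1).bit_length())


# (base states, states after adding the resilient ones), from the slide.
PROTOCOLS = {
    "directory (MESI)": (9, 17),
    "broadcast (MOESI)": (12, 22),
}

for name, (base, resilient) in PROTOCOLS.items():
    print(name, ":", state_bits(base), "->", state_bits(resilient), "bits")
```

Both protocols cross the 16-state boundary, so each needs one extra state bit per cache line.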

Page 24: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Evaluation: Overhead

Miss Status Holding Register (MSHR): 4-32 entries

existing fields: PC, address, requestor, flags, state
added fields (per entry):
  state               1 bit
  timer (0 to 2^13)   13 bits
  response bitvector  64 bits
  transID             6 bits
  total               11 bytes

total storage overhead: < 0.5 KB / core (worst-case: 2 KB / core) (*)

(*) assuming a 64-node CMP with in-order cores
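The per-entry figure can be rechecked from the field widths. The sketch below assumes the four widths are the fields added to each MSHR entry and that an entry is rounded up to whole bytes; the function names are hypothetical. The 2 KB worst-case figure on the slide is not rederived here.

```python
# Added bits per MSHR entry, as listed on the slide.
ADDED_FIELD_BITS = {
    "state": 1,
    "timer": 13,
    "response bitvector": 64,
    "transID": 6,
}


def entry_overhead_bytes(field_bits):
    """Added storage per MSHR entry, rounded up to whole bytes."""
    total_bits = sum(field_bits.values())
    return (total_bits + 7) // 8


def mshr_overhead_bytes(entries, field_bits=ADDED_FIELD_BITS):
    """Added storage for an MSHR with the given number of entries."""
    return entries * entry_overhead_bytes(field_bits)


for entries in (4, 32):
    print(entries, "entries:", mshr_overhead_bytes(entries), "bytes")
```

The fields sum to 84 bits, which rounds up to 11 bytes per entry; a 32-entry MSHR then adds 352 bytes, consistent with the "< 0.5 KB / core" figure.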

Page 25: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Evaluation: Performance

Simulator: Wisconsin Multifacet GEMS

Network-on-Chip
  Topology    8x8 mesh
  Channels    64-bit
  VNets       5
  Routing     XY

System Configuration
  Processors  in-order SPARC cores
  L1 Caches   64KB/node, 3 cycles
  L2 Caches   1MB/node, 6 cycles
  Caches      4-way, 64-byte blocks
  Memory      4 controllers * 1GB, 160 cycles

Page 26: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Evaluation: Performance
metric: runtime overhead vs. non-resilient baseline

[Figure: directory protocol. Bar chart of runtime overhead (%), lower is better, y-axis 0% to 12%, with one bar per fault rate: no faults, 1 fault / 1 msec, 1 fault / 100 μsec, 1 fault / 10 μsec. Benchmarks: SPLASH (fft, fmm, lu, radix, water-nsq, water-sp), PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), and AVERAGE. Labeled values: 7.4%, 11%, 1.4%, 1.8%, 1.1%, 3.5%]

Page 27: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Evaluation: Performance
metric: runtime overhead vs. non-resilient baseline

[Figure: broadcast protocol. Bar chart of runtime overhead (%), y-axis 0% to 30%, with one bar per fault rate: no faults, 1 fault / 1 msec, 1 fault / 100 μsec, 1 fault / 10 μsec. Benchmarks: SPLASH (fft, fmm, lu, radix, water-nsq, water-sp), PARSEC (blackscholes, canneal, fluidanimate, swaptions, x264), and AVERAGE. Labeled values: 2.4%, 5.1%, 0.5%, 20.4%, 51%, 56%]


Page 29: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Conclusions

We have presented a generic methodology:
• coherence protocol -> resilient coherence protocol …by enforcing 3 properties
• minimal hardware overhead (<2KB / node)
• small performance overhead
  – directory-based protocol: 1.4% (1 fault / msec)
  – broadcast-based protocol: 2.4% (1 fault / msec)

Page 30: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Thank You!

Questions?

Page 31: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

BACKUP SLIDES

Page 32: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Why performance overhead?

• transactions last longer => a request may have to wait for outstanding conflicting requests to complete
• data remain in caches for longer (3-way handshake) => cache replacement duration increases
• more messages are injected in the NoC => network traffic increases => average NoC latency increases

Page 33: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Transaction Duration

[Figure: stacked bar chart of transaction duration (cycles), y-axis 0 to 350, per benchmark (fft, fmm, lu, radix, water-nsq, water-sp, blackscholes, canneal, fluidanimate, swaptions, x264), with B and R bars for L2 and for L1. Stack components: baseline transaction, Property 1 (done message), Property 2 (invalidation HS). Labeled values: +12%, +18%]

B: baseline protocol, no faults
R: resilient protocol, 1 fault / 10 μsec
L1: transaction served by sharer's L1
L2: transaction served by directory (L2)

Page 34: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Transaction Duration

[Figure: the same stacked bar chart of transaction duration (cycles), y-axis 0 to 350, per benchmark, with B and R bars for L2 and for L1. Stack components: baseline transaction, Property 1 (done message), Property 2 (invalidation HS). Labeled values: 11%, 24%]

B: baseline protocol, no faults
R: resilient protocol, 1 fault / 10 μsec
L1: transaction served by sharer's L1
L2: transaction served by directory (L2)

large working sets, shared data => high number of requests (high traffic)
(!) retransmissions saturate network

Page 35: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Network Traffic

[Figure: link utilization (%), y-axis 0% to 70%, versus request rate (total requests / cycle), x-axis 0.0 to 0.8, shown for the most congested link and for the average over all links. Series, each measured and extrapolated: baseline protocol, no faults; resilient protocol, no faults; resilient protocol, 1 fault / 10 μsec]

Page 36: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing the Resilience Properties

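P2: A single message type transits to a unique state in every FSM branch

Case 2: identical messages in same branch

[Figure: msgA from X moves the node to T (count = 1); msgA from Y then moves it to T (count = 2)]

[Figure, example: R sends request (M) and waits in SM + acks = 0; each ack advances it to SM + acks = 1, then SM + acks = 2, and finally to M]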

Page 37: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing the Resilience Properties

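P2: A single message type transits to a unique state in every FSM branch

Case 2: identical messages in same branch

[Figure: with a counter, msgA from X gives T (count = 1) and msgA from Y gives T (count = 2). With a bitvector, msgA from X gives T [XYZ=100] and msgA from Y gives T [XYZ=110]]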

Page 38: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

Enforcing the Resilience Properties

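P2: A single message type transits to a unique state in every FSM branch

Case 2: identical messages in same branch

[Figure: with a counter, msgA from X gives T (count = 1) and msgA from Y gives T (count = 2). With a bitvector, msgA from X gives T [XYZ=100], and a duplicate msgA from X leaves the state at T [XYZ=100]]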
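The difference between counting identical messages and recording their senders can be shown with two small functions. This is an illustrative sketch: the function names are hypothetical and the three senders X, Y, Z are taken from the XYZ label in the figure.

```python
def track_with_counter(senders):
    """Count received messages; a duplicate is indistinguishable from a new one."""
    return len(senders)


def track_with_bitvector(senders, nodes=("X", "Y", "Z")):
    """Set one bit per sender; a duplicate re-sets a bit that is already set."""
    bits = {node: 0 for node in nodes}
    for sender in senders:
        bits[sender] = 1
    return "".join(str(bits[node]) for node in nodes)


print(track_with_counter(["X", "X"]))    # duplicate from X inflates the count
print(track_with_bitvector(["X", "X"]))  # duplicate from X: state unchanged
print(track_with_bitvector(["X", "Y"]))  # distinct senders: a new state
```

With the bitvector, a replicate msgA from X rolls back to the same state the original msgA reached, which a bare counter cannot guarantee.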

Page 39: A Systematic Methodology  to Develop Resilient  Cache Coherence Protocols

[Figure: 8x8 mesh with nodes numbered 0 to 63; nodes 18, 20 and 26 are marked]