EazyHTM: Eager-Lazy Hardware Transactional Memory

Saša Tomić, Cristian Perfumo, Chinmay Kulkarni, Adrià Armejach, Adrián Cristal, Osman Unsal, Tim Harris, Mateo Valero

Barcelona Supercomputing Center, UPC

BITS Pilani

Microsoft Research Cambridge

Why Transactional Memory?

• Lock-based parallel programming has problems

– Deadlocks, races, complexity, performance, …

• Transactional Memory (TM) to the rescue

– Optimistic concurrency control mechanism

– Easy to use

– Deadlock free

– Supports composability

– Protects data in critical sections

• Hardware-TM (HTM), Software-TM (STM) and hybrid


HTM terminology

• Atomic section/transaction: a group of instructions that appear to take effect instantaneously

• Version management (where speculative values are stored):

– in place, logging the original value, or

– buffered in private storage and published on commit

• Conflict: one TX writes a location that another TX reads

– Detection: the action of checking for conflicts

– Resolution: the action taken to resolve a conflict

• Can be an abort, stalling the execution, …
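
The two version-management options above can be sketched in software. This is only an illustrative model, not how any HTM stores speculative state: the class names `EagerVersioning` and `LazyVersioning` are hypothetical, and memory is assumed to be a plain dictionary whose values are never `None`.

```python
class EagerVersioning:
    """Write in place and log the original value (undo log)."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []  # (address, original value) pairs

    def write(self, addr, value):
        # Assumption: None marks "address was absent before the write".
        self.undo_log.append((addr, self.memory.get(addr)))
        self.memory[addr] = value  # speculative value lives in place

    def commit(self):
        self.undo_log.clear()  # values are already in place

    def abort(self):
        # Undo in reverse order so the oldest logged value wins.
        for addr, old in reversed(self.undo_log):
            if old is None:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.undo_log.clear()


class LazyVersioning:
    """Buffer writes in private storage and publish them on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value  # shared memory is untouched

    def read(self, addr):
        # A TX sees its own buffered writes first.
        return self.write_buffer.get(addr, self.memory.get(addr))

    def commit(self):
        self.memory.update(self.write_buffer)  # publish
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()  # just discard the buffer
```

In this model the eager variant does its work on abort (walking the log), while the lazy variant does its work on commit (publishing the buffer).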


Eager HTM

• A.k.a. pessimistic

• Writes in place; detects and resolves conflicts on every access

• LogTM [Moore, HPCA06], LogTM-SE [Yen, HPCA07]

• Fast commit, slow abort, limited concurrency

[Figure: timelines of TX 1, TX 2 and TX 3 with two reads (R), one write (W), a stall, and a fast commit]


Lazy HTM

• A.k.a. optimistic

• Writes are buffered; conflicts are detected and resolved on commit

• TCC [Hammond, ISCA04], Scalable-TCC [Chafi, HPCA07]

• Fast abort, complex commit, good concurrency

[Figure: timelines of TX 1, TX 2 and TX 3 with two reads (R), one write (W), and a complex commit (validate + write)]


The Motivation

Splitting conflict management

• An eager-lazy hardware-software TM exists (FlexTM [Shriraman, ISCA08]):

– Software begin, commit and abort

– Probabilistic (signature-based) conflict detection

• EazyHTM is the first pure-hardware eager-lazy TM

Conflict detection    Eager conflict resolution    Lazy conflict resolution
Eager                 LogTM                        EazyHTM
Lazy                  Impossible                   TCC, S-TCC

• EazyHTM: fast commit and good concurrency


Outline

• Motivation

• Contributions

• Hardware changes

• The Protocol

• Evaluation

• Conclusions


EazyHTM Contributions

• The best of two worlds

– Eager conflict detection: simple commit, with the exact list of conflicts known in advance

– Lazy conflict resolution: good concurrency

• Parallel commits of non-conflicting TXs

• Designed for CMPs (Chip-Multiprocessors)

– Exploits the proximity of cores

– Built as an upgrade of the MESI/MOESI protocol (easier verification)


Hardware changes

• Per core (CPU):

– Register file checkpoint

– Racers list – 1 bit per core

– Killers list – 1 bit per core

– Both lists track conflicts and are bit-vectors (32 bits for 32 cores)

• Private cache(s), in addition to the existing cache logic:

– SR – 1 bit per line

– SM – 1 bit per line

– SR and SM hold the read/write set

• Directory, in addition to the existing directory logic:

– TD – 1 bit per line: read-only optimization bit (details in the paper)

[Figure: block diagram of the cores, each with a CPU and private cache(s), and the directory]


Racers and killers list

• If a line is shared between two TXs:

– Read-Read

• No conflict

– Write-Read, Read-Write, Write-Write

• Writer adds reader TX into “racers” list

– “TXs that I have to abort” list, if I commit first

• Reader adds writer TX into “killers” list

– “TXs that can abort me” list, if they commit first

• We illustrate only the Write-after-Read (WAR) conflict
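
The rules above map naturally onto per-core bit-vectors. The sketch below is a software illustration only: `TxState` and `record_sharing` are hypothetical names, and the handling of Write-Write (each writer treats the other as both racer and killer) is an assumption that follows from applying the writer rule to both sides.

```python
class TxState:
    """Per-core conflict lists, one bit per core."""

    def __init__(self, core_id, num_cores=32):
        self.core_id = core_id
        self.num_cores = num_cores
        self.racers = 0   # bit i set: I must abort the TX on core i if I commit first
        self.killers = 0  # bit i set: the TX on core i can abort me if it commits first


def record_sharing(a, a_writes, b, b_writes):
    """Update the lists of two TXs that touch the same line.

    a_writes / b_writes say whether each TX writes the line.
    Read-Read sharing changes nothing.
    """
    for writer, other, writes in ((a, b, a_writes), (b, a, b_writes)):
        if writes:
            writer.racers |= 1 << other.core_id
            other.killers |= 1 << writer.core_id


# WAR example: TX on core 0 read the line, TX on core 2 then writes it.
tx0, tx2 = TxState(0), TxState(2)
record_sharing(tx0, False, tx2, True)
```

After the call, bit 0 is set in the racers list of `tx2` and bit 2 is set in the killers list of `tx0`; the other two lists stay empty.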


EazyHTM Protocol

Conflict Detection (1/2)

[Figure: TX 0 runs BTX, RD A, CTX; TX 2 runs BTX, WR A, CTX. Each TX has a racers list and a killers list; the directory holds the sharers of @A.]

• Step 1: txMark @A (replaces GETS/GETX)

• Step 2: ACK @A, 0 (no other sharers)


EazyHTM Protocol

Conflict Detection (2/2)

[Figure: TX 0 runs BTX, RD A, CTX; TX 2 runs BTX, WR A, CTX. Each TX has a racers list and a killers list; the directory holds the sharers of @A. Five numbered steps use the messages below.]

• Messages: txMark @A; txAccessor #2, @A; ACK @A, 1 (1 other sharer); Writer #2, @A; Reader #0, @A

• Potential conflict detected

• TX 2 remembers: abort TX#0 on commit

• TX 0 remembers: TX#2 can abort me
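
The detection steps on these two slides can be modelled roughly as follows. This is a hedged sketch, not the protocol specification: the class and method names (`Core`, `Directory`, `tx_mark`, `_tx_accessor`) are hypothetical, messages are modelled as direct method calls, the exact message ordering is an assumption, and Python sets stand in for the hardware bit-vectors.

```python
from collections import defaultdict


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.racers = set()     # TXs I must abort if I commit first
        self.killers = set()    # TXs that can abort me if they commit first
        self.read_set = set()
        self.write_set = set()


class Directory:
    def __init__(self, cores):
        self.cores = cores                  # core id -> Core
        self.sharers = defaultdict(set)     # line address -> core ids

    def tx_mark(self, requester_id, addr, is_write):
        """Model 'txMark @addr'; the return value models 'ACK @addr, n'."""
        requester = self.cores[requester_id]
        (requester.write_set if is_write else requester.read_set).add(addr)
        others = self.sharers[addr] - {requester_id}
        for other_id in sorted(others):
            # Models forwarding 'txAccessor #requester, @addr' to each sharer.
            self._tx_accessor(self.cores[other_id], requester, addr, is_write)
        self.sharers[addr].add(requester_id)
        return len(others)

    @staticmethod
    def _tx_accessor(sharer, requester, addr, requester_writes):
        # A writer records the other TX as a racer; the other TX
        # records the writer as a killer. Read-Read records nothing.
        if requester_writes:
            requester.racers.add(sharer.core_id)
            sharer.killers.add(requester.core_id)
        if addr in sharer.write_set:
            sharer.racers.add(requester.core_id)
            requester.killers.add(sharer.core_id)


# WAR scenario from the slides: TX 0 reads A, then TX 2 writes A.
cores = {0: Core(0), 2: Core(2)}
directory = Directory(cores)
first_ack = directory.tx_mark(0, "A", is_write=False)
second_ack = directory.tx_mark(2, "A", is_write=True)
```

In this model the first access sees no other sharers, while the second sees one and leaves TX 0 in the racers list of TX 2 and TX 2 in the killers list of TX 0. No TX is aborted at this point; only the lists are updated.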


EazyHTM Protocol

Conflict Resolution

[Figure: TX 0 runs BTX, RD A, CTX; TX 2 runs BTX, WR A, CTX. Each TX has a racers list and a killers list; the directory holds the sharers of @A.]

• TX#2 came to the commit point first: abort TX#0!

• Messages (steps 1 to 3): Abort from TX#2; Abort Ack from TX#0; WR @A (commit)
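
A minimal sketch of the lazy resolution step, under the assumption that a committing TX aborts every TX in its racers list, collects the acknowledgements, and only then publishes its writes. The names `Tx`, `on_abort` and `commit` are hypothetical, and the message exchange is modelled as synchronous calls.

```python
class Tx:
    def __init__(self, tx_id):
        self.tx_id = tx_id
        self.racers = set()       # TXs to abort when I commit
        self.killers = set()      # TXs that can abort me
        self.write_buffer = {}    # speculative writes
        self.status = "running"

    def on_abort(self, from_id):
        """Handle 'Abort from TX#from_id'; the return value models the Abort Ack."""
        if self.status == "running":
            self.status = "aborted"
            self.write_buffer.clear()
            self.racers.clear()
            self.killers.clear()
        return self.tx_id


def commit(tx, all_txs, memory):
    """Try to commit tx; returns True if its writes were published."""
    if tx.status != "running":
        return False  # already aborted by a killer that committed first
    # Step 1 and 2: abort every racer and collect the acknowledgements.
    acks = {all_txs[racer_id].on_abort(tx.tx_id) for racer_id in sorted(tx.racers)}
    if acks != tx.racers:
        return False  # cannot happen with synchronous calls; kept for clarity
    # Step 3: publish the buffered writes ('WR @A (commit)').
    memory.update(tx.write_buffer)
    tx.status = "committed"
    return True


# Scenario from the slide: TX 2 reaches the commit point first.
txs = {0: Tx(0), 2: Tx(2)}
txs[2].racers.add(0)
txs[0].killers.add(2)
txs[2].write_buffer["A"] = 7
memory = {"A": 0}
tx2_committed = commit(txs[2], txs, memory)
tx0_committed = commit(txs[0], txs, memory)
```

Because the list of conflicting TXs is known before the commit starts, the commit itself needs no validation phase in this model.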


EazyHTM Protocol

Disjoint data => parallel commit

[Figure: TX 0 runs BTX, WR A, CTX; TX 2 runs BTX, WR B, CTX. Each TX has a racers list and a killers list; the directory holds the sharers of @A and of @B.]

• TX#0 works with line @A; TX#2 works with line @B

• Step 1: txMark @A and txMark @B

• Step 2: ACK @A, 0 and ACK @B, 0 (0 other sharers)

• Step 3: WR @A (commit) and WR @B (commit)

• NO SERIALIZATION
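
The condition for a parallel commit can be stated as a small check over read and write sets. This is an illustrative sketch only; the dictionary layout and the function names `conflicts` and `can_commit_in_parallel` are assumptions made for the example.

```python
def conflicts(tx_a, tx_b):
    """True if the two TXs share a line that at least one of them writes."""
    return bool(
        tx_a["writes"] & (tx_b["reads"] | tx_b["writes"])
        or tx_b["writes"] & (tx_a["reads"] | tx_a["writes"])
    )


def can_commit_in_parallel(txs):
    """True if no pair of TXs conflicts, so no commit has to wait for another."""
    ids = sorted(txs)
    return all(
        not conflicts(txs[a], txs[b])
        for i, a in enumerate(ids)
        for b in ids[i + 1:]
    )


# Slide scenario: TX 0 writes only line A, TX 2 writes only line B.
disjoint = {
    0: {"reads": set(), "writes": {"A"}},
    2: {"reads": set(), "writes": {"B"}},
}
# WAR scenario from the earlier slides: TX 0 reads A, TX 2 writes A.
war = {
    0: {"reads": {"A"}, "writes": set()},
    2: {"reads": set(), "writes": {"A"}},
}
```

With the `disjoint` sets both racers lists stay empty, so neither commit sends abort messages or waits on the other; with the `war` sets one commit has to abort the other TX first.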


Implementation

• Implemented in M5, a full-system simulator (Alpha)

• Private L1 (32KB, 4-way, 64B CL, 2 cycles)

• Private L2 (512KB, 8-way, 64B CL, 10 cycles)

• Memory (with directory, 100 cycles)

• ICN (2D Mesh, 10 cycles per hop)


Evaluation

• Evaluated STAMP benchmarks

• Compared with Scalable-TCC-like HTM

– Same base simulator

– Implemented specialized directory protocol

• Compared with ideal lazy HTM (MESI based)

– magical conflict detection

– instant conflict resolution

– parallel write-back commit


Kmeans Low

• Small TXs (RS 15 CL; WS 5 CL)

• Low contention (10% aborts)

• Similar profile to “replacing locks with atomic”

• Near ideal performance

• K-means: groups N-dimensional space into K clusters

• Most of the SPLASH-2 suite has a similar profile

[Figure: Kmeans-Low, speedup vs. processors; series: Ideal, EazyHTM, STCC]


SSCA2

• Small TXs (RS 50 CL, WS 10 CL)

• Low contention (1.2% aborts)

• Near ideal performance

• Scalability affected by barriers, not by contention

• SSCA2: large directed graph operations

[Figure: SSCA2, speedup vs. processors; series: Ideal, EazyHTM, STCC]


Yada

• Large TXs (260 CL RS, 140 CL WS)

• Moderate contention (35% aborts)

• We can see good performance also for large TXs!

• Yada: Delaunay mesh refinement

[Figure: Yada, speedup vs. processors; series: Ideal, EazyHTM, STCC]


Intruder

• Medium TXs (53 CL RS, 20 CL WS)

• High contention (85% aborts)

• Very bad scalability for all HTMs

• Every transaction detects conflicts over and over again; the many conflict-detection messages slow down the execution

• Intruder: signature-based network intrusion detection system

[Figure: Intruder, speedup vs. processors; series: Ideal, EazyHTM, STCC]


Only high-conflict STAMP

• >50% abort rate only

• High-contention, high-core-count execution should be optimized

• Averages:

– Labyrinth

– Intruder

– Kmeans-Hi

• Results highly affected by Intruder

[Figure: High-conflict STAMP, speedup vs. processors; series: Ideal, EazyHTM, STCC]


Only low-conflict STAMP

• <50% abort rate only

• Low abort rate necessary for scaling

• Excludes:

– Labyrinth 8-32

– Intruder 16-32

– Kmeans-Hi 32

[Figure: Scaling STAMP, speedup vs. processors; series: Ideal, EazyHTM, STCC]


Conclusions

• Introduced EazyHTM, a new HTM implementation

– Eager conflict detection, lazy conflict resolution

– Fast: performs well for low-conflict parallel applications

– Minimal changes to directory protocols (easier verification)

– As scalable as a standard directory protocol

• EazyHTM mechanism could allow (future work):

– Simpler transaction prioritization

– Less wasted work

– Better performance optimization

– Power efficient TM mechanisms


Thank you!

Questions?
