Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy. TRANSACT 2014.


Page 1: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, Maurice Herlihy

TRANSACT 2014

Page 2: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Multicore Performance Scaling


Page 3: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Hardware Transactional Memory (HTM)

• Intel’s Haswell TSX: RTM & HLE
• IBM’s Blue Gene/Q, System z, and Power Architecture
• Low overhead (cache based)

Page 4: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Haswell RTM

if (_xbegin() == _XBEGIN_STARTED) {
    /* Speculate Execution, without any locks */
    _xend();
} else {
    /* Abort Handler */
}

• Read and Write Sets
• Abort on memory conflict

Page 5: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Haswell RTM

Transaction A                      Transaction B
_xbegin()                          _xbegin()
Read X   (Add to Read Set)         Write X  (Add to Write Set)
Write Y  (Add to Write Set)        Write Y  (Add to Write Set)
_xend()                            _xend()

COMMIT: Make the change to Y visible
ABORT: the other, conflicting transaction
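The read-set and write-set bookkeeping on this slide can be modeled in a few lines. This is only a software sketch of the idea, not how the hardware does it: the `Txn` class and the `conflicts` helper are hypothetical names, and real RTM tracks cache lines rather than named variables.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Toy transaction that only records which locations it touched."""
    name: str
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

    def read(self, loc):
        self.read_set.add(loc)

    def write(self, loc):
        self.write_set.add(loc)


def conflicts(a, b):
    """Locations where one transaction's write meets the other's read or write."""
    return (a.write_set & (b.read_set | b.write_set)) | (b.write_set & a.read_set)


# The two transactions from the slide.
t1 = Txn("A")
t1.read("X")
t1.write("Y")

t2 = Txn("B")
t2.write("X")
t2.write("Y")

print(sorted(conflicts(t1, t2)))  # ['X', 'Y']
```

Both X and Y are contended here, so only one of the two transactions can commit. Two transactions that only read X would produce an empty set and could both commit.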

Page 6: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Lock Elision

<HLE_Acquire_Prefix> Lock(L)
    Atomic region executed as a transaction or mutually exclusive on L
<HLE_Release_Prefix> Release(L)

• Execute optimistically, without any locks
• Track Read and Write Sets
• Abort on memory conflict: roll back, acquire lock

Page 7: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

[Figure. Source: Anand Tech]

Page 8: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Haswell RTM

• Best-effort
• Aborts: Overflow, Unsupported Instructions, Interrupts, Conflicts
• Small & Medium Transactions
• Needs software fallback

Page 9: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Overview

• Best-effort Hardware Transactional Memory

• Lazy SGL

• Bloom Filter SGL

Description

Correctness

Results


Page 10: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Single Global Lock HyTM (simple and common)

Try_SPEC:
    Wait until Lock is free
    Transactional_Read(Lock)
    If Lock is taken: ABORT
    Speculate critical section
    End speculation

On_ABORT:
    If try_lock(Lock):
        Critical section
        Release(Lock)
    Else: Try_SPEC

HW txn: Begin HW txn, Read L, …, End HW txn
SW txn: Begin SW txn, Acquire L, …, Release L, End SW txn (Does not abort!)
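The Try_SPEC / On_ABORT control flow can be exercised with a toy model. This is a sketch under stated assumptions, not the authors' implementation: `speculate` is a hypothetical stand-in for a hardware transaction that buffers writes, `Abort` stands in for any abort cause, and the model does not reproduce the hardware abort that a concurrent lock acquisition would trigger.

```python
import threading


class Abort(Exception):
    """Signals that the simulated hardware transaction must roll back."""


def speculate(memory, body):
    """Toy hardware transaction: writes reach `memory` only on commit."""
    buffer = dict(memory)
    try:
        body(buffer)
    except Abort:
        return False
    memory.update(buffer)
    return True


def eager_sgl(memory, lock, critical_section):
    """Run critical_section once; return "HW" or "SW" for the path that ran it."""
    while True:
        # Try_SPEC: wait until the lock is free, then read it first.
        while lock.locked():
            pass

        def body(buffer):
            if lock.locked():          # Transactional_Read(Lock)
                raise Abort
            critical_section(buffer)   # speculate the critical section

        if speculate(memory, body):
            return "HW"

        # On_ABORT: take the global lock if possible, else speculate again.
        if lock.acquire(blocking=False):
            try:
                critical_section(memory)
            finally:
                lock.release()
            return "SW"


def increment(mem):
    mem["counter"] += 1


memory = {"counter": 0}
lock = threading.Lock()
print(eager_sgl(memory, lock, increment), memory)  # HW {'counter': 1}
```

A critical section that aborts on its first, speculative attempt (for example because it overflows the cache) takes the "SW" path on the next step and runs under the lock.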

Page 11: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Single Global Lock HyTM (simple and common)

[Timeline figure. One SW txn: Begin SW txn, Acquire L, Release L, End SW txn. Four HW txns, (1) to (4), each: Begin HW txn, Read L, End HW txn, overlapping the SW txn in time. Legend: X = ABORT. Four X marks are shown, one per HW txn.]

Page 12: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

[Timeline figure, Thread 1 and Thread 2. Five hardware transactions, each Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L), and one software transaction, Acquire(L), CRITICAL SECTION (SW TXN), Release(L). The total is marked Execution Time 1.]

Page 13: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

[Timeline figure, Thread 1 and Thread 2. The same five hardware transactions, each Begin_HW_TXN (L), CRITICAL SECTION, End_HW_TXN (L), and one software transaction, Acquire(L), CRITICAL SECTION (SW TXN), Release(L), in a different schedule. Two totals are marked: Execution Time 1 and Execution Time 2.]

Page 14: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Lazy SGL

Try_SPEC:
    Speculate critical section
    Transactional_Read(Lock)
    If Lock is taken: ABORT
    End speculation

On_ABORT:
    If try_lock(Lock):
        Critical section
        Release(Lock)
    Else: Try_SPEC

HW txn: Begin HW txn, …, Read L, End HW txn
SW txn: Begin SW txn, Acquire L, …, Release L, End SW txn (Does not abort!)

Page 15: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Lazy SGL

[Timeline figure. One SW txn: Begin SW txn, Acquire L, Release L, End SW txn. Four HW txns, (1) to (4), each: Begin HW txn, Read L, End HW txn, with Read L now placed just before End HW txn. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]
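The difference between the two timelines comes down to when the HW txn reads L. A small model makes the rule explicit. It is a sketch only: `hw_outcome` is a hypothetical helper, the timestamps are made up to mimic the figure, and data conflicts are ignored.

```python
def hw_outcome(policy, hw_begin, hw_end, sw_acquire, sw_release):
    """Return "ABORT" or "COMMIT" for one HW txn, ignoring data conflicts.

    eager: L is read at the start, so any overlap with the interval in
           which the SW txn holds L aborts the HW txn.
    lazy:  L is read just before the end, so only a HW txn that ends
           while L is held aborts.
    """
    if policy == "eager":
        overlaps = hw_begin < sw_release and sw_acquire < hw_end
        return "ABORT" if overlaps else "COMMIT"
    if policy == "lazy":
        held_at_end = sw_acquire <= hw_end < sw_release
        return "ABORT" if held_at_end else "COMMIT"
    raise ValueError(f"unknown policy: {policy}")


# Illustrative schedule: the SW txn holds L during [10, 20).
SW_ACQUIRE, SW_RELEASE = 10, 20
HW_TXNS = [(5, 12), (12, 18), (14, 25), (16, 28)]  # (begin, end) of HW txns (1)-(4)

for policy in ("eager", "lazy"):
    results = [hw_outcome(policy, b, e, SW_ACQUIRE, SW_RELEASE) for b, e in HW_TXNS]
    print(policy, results)
```

With this schedule the eager policy aborts all four HW txns, while the lazy policy aborts only (1) and (2) and lets (3) and (4) commit, matching the X and COMMIT marks on the two timeline slides.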

Page 16: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory


Page 17: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Transactional Memory Correctness

[Figure: Transaction 1 (SW) and Transaction 2 (HW) on a time axis, each ending in COMMIT. Two possible orders are marked: Order T2 AFTER T1, and Order T2 BEFORE T1.]

Page 18: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 1: HW begins, SW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END

No check: X value: a, then b. Correct: a, Actual: b
With Check Lock before TXN_END: ABORT. Correct: a, Actual: a

Page 19: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 2: SW begins, HW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END

No check: Correct: a, Actual: b
With Check Lock before TXN_END: ABORT. Correct: a, Actual: a

Page 20: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 3: SW begins, HW begins, SW ends, HW ends

Thread 1 (SW): Acquire Lock, …, X = a, …, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END

X value: a, then b
Check Lock: COMMIT. Correct: b, Actual: b

Page 21: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 4: HW begins, SW begins, SW ends, HW ends

Thread 1 (SW): Acquire Lock, …, X = a, …, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END

Check Lock: COMMIT. Correct: b, Actual: b
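The four cases can be replayed mechanically. The sketch below is a simplified model, not the authors' proof: `final_x` is a hypothetical helper, the SW write X = a is assumed to happen right after Acquire Lock, and the HW write X = b is assumed to become visible only at TXN_END.

```python
CASES = {
    1: ["HW begins", "SW begins", "HW ends", "SW ends"],
    2: ["SW begins", "HW begins", "HW ends", "SW ends"],
    3: ["SW begins", "HW begins", "SW ends", "HW ends"],
    4: ["HW begins", "SW begins", "SW ends", "HW ends"],
}


def final_x(order, check_lock):
    """Replay one interleaving; return (final value of X, HW outcome)."""
    x = None
    lock_held = False
    outcome = None
    for event in order:
        if event == "SW begins":
            lock_held = True
            x = "a"                  # assumption: SW writes X right after Acquire Lock
        elif event == "SW ends":
            lock_held = False
        elif event == "HW ends":
            if check_lock and lock_held:
                outcome = "ABORT"    # buffered X = b is discarded
            else:
                x = "b"              # buffered X = b becomes visible
                outcome = "COMMIT"
    return x, outcome


for case, order in CASES.items():
    print(case, final_x(order, check_lock=False), final_x(order, check_lock=True))
```

In cases 1 and 2 the HW txn ends while the lock is held, so the check turns the incorrect final value b into the correct value a. In cases 3 and 4 the lock is already free at TXN_END and the HW txn commits b in both variants.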

Page 22: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Hardware Sandboxing

Initially X = 5; Y = 6

Thread 1 (SW): Acquire Lock, …, ++X, ++Y, …, Release Lock
Thread 2 (HW): TXN_BEGIN, Z = 1/(Y-X), TXN_END

If Thread 2 runs between ++X and ++Y it sees X = 6 and Y = 6, so Z = 1/0 !!!
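The divide-by-zero scenario can be reproduced in a toy model. The sketch assumes, as the slide title suggests, that a fault raised inside a hardware transaction turns into an abort instead of reaching the program. `run_sandboxed` is a hypothetical stand-in for that behavior, not a real HTM interface.

```python
def run_sandboxed(memory, body):
    """Toy hardware transaction: an arithmetic fault aborts it, nothing is written."""
    buffer = dict(memory)
    try:
        body(buffer)
    except ArithmeticError:
        return "ABORT"
    memory.update(buffer)
    return "COMMIT"


def compute_z(mem):
    mem["Z"] = 1 / (mem["Y"] - mem["X"])


memory = {"X": 5, "Y": 6}

memory["X"] += 1                              # SW txn: ++X done, ++Y still pending
mid_update = run_sandboxed(memory, compute_z)  # sees X == Y == 6

memory["Y"] += 1                              # SW txn: ++Y done
after_update = run_sandboxed(memory, compute_z)

print(mid_update, after_update, memory["Z"])  # ABORT COMMIT 1.0
```

The first attempt observes the inconsistent state and aborts without leaving a Z behind. The second attempt sees X = 6, Y = 7 and commits.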

Page 23: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Indirect Jumps

Initially X = 5; Y = 6

Thread 1 (SW): Acquire Lock, …, ++X, ++Y, …, Release Lock
Thread 2 (HW): _xbegin, if (X == Y) *p = garbage, p(), …, if (lock) abort, _xend

Indirect jump to garbage location

Page 24: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory


Page 25: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

[Figure: Speedup vs. Threads (1, 2, 4, 8), three panels: Ssca2 (small txns), Intruder (medium txns), Labyrinth (large txns). Series: TL2, SGL, HLE, E-SGL, L-SGL. An arrow marks the "Better" direction.]

Page 26: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Improved Lock Acquisition Rate

[Figure: % lock acquisitions vs. Threads (1, 2, 4, 8), four panels: Vacation Low (medium txns), Kmeans High (small txns), Intruder (medium txns), Labyrinth (large txns). Series: HLE, E-SGL, L-SGL. An arrow marks the "Better" direction.]

Page 27: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

No single thread overhead

[Figure: Slowdown relative to sequential for 1 thread. Benchmarks: bayes, genome, intruder, km_low, km_high, labyrinth, vacation_low, vacation_high, ssca2, yada. Series: TL2, SGL, HLE, E-SGL, L-SGL.]

Page 28: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory


Page 29: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Bloom Filters

• Efficient probabilistic data structure to compute fast set intersection

• Can admit false positives

• No false negatives

• Used in TM for Conflict Detection
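A minimal Bloom filter illustrates the properties listed above. The sizes and the hashing scheme are illustrative choices, not taken from the talk; the `BloomFilter` class is a sketch rather than a tuned implementation.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter stored as an integer bitmask."""

    def __init__(self, num_bits=64, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        """False means definitely absent; True may be a false positive."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def intersects(self, other):
        """True if the filters share a set bit; may be a false positive."""
        return (self.bits & other.bits) != 0


writes = BloomFilter()
writes.add("X")
reads = BloomFilter()
reads.add("X")
reads.add("Y")

print(writes.might_contain("X"), writes.intersects(reads))  # True True
```

Because every added item sets its bits, two sets with a common element always produce intersecting filters (no false negatives). Disjoint sets can still collide on a bit, which is the false-positive case.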


Page 30: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Lazy SGL

[Timeline figure, repeated from the Lazy SGL section. One SW txn: Begin SW txn, Acquire L, Release L, End SW txn. Four HW txns, (1) to (4), each: Begin HW txn, Read L, End HW txn. Legend: X = ABORT. Two HW txns are marked X and two are marked COMMIT.]

Page 31: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

BF SGL

[Timeline figure. One SW txn: Begin SW txn, Acquire L, Release L, End SW txn. HW txns (1) and (2): Begin HW txn, Check BF, End HW txn. HW txns (3) and (4): Begin HW txn, Read L, End HW txn, both marked COMMIT. Legend: X = ABORT.]

Page 32: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 1: HW begins, SW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, X = b, …, TXN_END

No check: X value: a, then b. Correct: a, Actual: b
Lazy SGL, Check Lock: ABORT. Correct: a, Actual: a
BF SGL, Check BF: If BFs intersect: ABORT. Else: COMMIT

Page 33: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Case 2: SW begins, HW begins, HW ends, SW ends

Thread 1 (SW): Acquire Lock, …, X = a, Release Lock
Thread 2 (HW): TXN_BEGIN, …, X = b, …, TXN_END

No check: Correct: a, Actual: b
Lazy SGL, Check Lock: ABORT. Correct: a, Actual: a
BF SGL, Check BF: If BFs intersect: ABORT. Else: COMMIT
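The Check BF step can be sketched as a decision function. This is a hedged reading of the slides, not the authors' code: `signature` and `hw_commit_decision` are hypothetical names, the filter uses a single CRC32 hash over 64 bits for brevity, and the model assumes the filters are only compared when the SW txn holds the lock at the end of the HW txn.

```python
import zlib

NUM_BITS = 64  # filter width, an illustrative choice


def signature(addresses):
    """Single-hash Bloom filter of a set of addresses, as an integer bitmask."""
    bits = 0
    for addr in addresses:
        bits |= 1 << (zlib.crc32(str(addr).encode()) % NUM_BITS)
    return bits


def hw_commit_decision(lock_held, hw_accesses, sw_accesses):
    """Decide the fate of a HW txn at the point where Lazy SGL would Check Lock."""
    if not lock_held:
        return "COMMIT"    # no SW txn in flight
    if signature(hw_accesses) & signature(sw_accesses):
        return "ABORT"     # filters intersect: possible conflict
    return "COMMIT"        # disjoint filters: no common address


# Cases 1 and 2 above: both transactions touch X while the lock is held.
print(hw_commit_decision(True, {"X"}, {"X"}))   # ABORT
print(hw_commit_decision(False, {"X"}, {"X"}))  # COMMIT
```

A HW txn whose accesses are disjoint from the SW txn's usually commits even while the lock is held, which is where BF SGL can avoid aborts that Lazy SGL takes. A hash collision can still force an unnecessary abort, but a real overlap is never missed.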

Page 34: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Conclusions

• HTMs are becoming more available

• Best-effort: need software fallback

• Eager SGL
    • simple and fast fallback
    • often preferred to more efficient solutions

Page 35: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Conclusions

• Lazy SGL
    • as simple as Eager SGL
    • more efficient

• Bloom Filter SGL
    • more accurate conflict detection
    • slower

• Can be implemented directly in hardware

Page 36: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory
Page 37: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory


Page 38: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Medium transactions

[Figure: Speedup vs. Threads (1, 2, 4, 8), four panels: Intruder, Vacation Low, Vacation High, Genome. Series: TL2, SGL, HLE, Hyswell.]

Page 39: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Small transactions

[Figure: Speedup vs. Threads (1, 2, 4, 8), three panels: Kmeans Low, Kmeans High, Ssca2. Series: TL2, SGL, HLE, Hyswell.]

Page 40: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Large transactions

[Figure: Speedup vs. Threads (1, 2, 4, 8), three panels: Bayes, Labyrinth, Yada. Series: TL2, SGL, HLE, Hyswell.]

Page 41: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

Speedup over sequential for 8 threads

[Figure: bar chart. Benchmarks: bayes, genome, intruder, kmeans low, kmeans high, labyrinth, ssca2, vacation low, vacation high, yada. Series: TL2, SGL, HLE, Hyswell.]

Page 42: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory

      Software    Hardware    First access    Outcome
(1)   Read(x)     Read(x)     n/a             Not a conflict
(2)   Read(x)     Write(x)    Software        Software transaction ordered before hardware transaction -> CORRECT
(3)   Read(x)     Write(x)    Hardware        Hardware abort
(4)   Write(x)    Read(x)     Software        Software transaction ordered before hardware transaction -> CORRECT
(5)   Write(x)    Read(x)     Hardware        Hardware abort
(6)   Write(x)    Write(x)    Software        Software transaction ordered before hardware transaction -> CORRECT
(7)   Write(x)    Write(x)    Hardware        Hardware abort
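The seven rows reduce to one rule, which can be written down and checked. This is a sketch of my reading of the table, with `outcome` as a hypothetical helper: "R" and "W" stand for Read(x) and Write(x), and `first` names the transaction that touched x first.

```python
def outcome(first, sw_op, hw_op):
    """Classify one software/hardware access pair on the same location x.

    first: "SW" or "HW", whichever transaction accessed x first
    sw_op, hw_op: "R" for Read(x), "W" for Write(x)
    """
    if sw_op == "R" and hw_op == "R":
        return "not a conflict"
    if first == "SW":
        # The hardware transaction observes or overwrites the software
        # result, so software is ordered before hardware.
        return "SW ordered before HW"
    # The hardware transaction touched x first; the later software access
    # conflicts with its read or write set, so the hardware aborts it.
    return "hardware abort"


ROWS = [
    ("SW", "R", "R"),  # (1)
    ("SW", "R", "W"),  # (2)
    ("HW", "R", "W"),  # (3)
    ("SW", "W", "R"),  # (4)
    ("HW", "W", "R"),  # (5)
    ("SW", "W", "W"),  # (6)
    ("HW", "W", "W"),  # (7)
]

for number, row in enumerate(ROWS, start=1):
    print(number, outcome(*row))
```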

Page 43: Improved Single Global Lock Fallback for Best-effort Hardware Transactional Memory