LogTM: Log-Based Transactional Memory

Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood

Presented by Colleen Lewis

Credits

Animations from the original LogTM HPCA presentation

Original graphs modified for readability

Big Picture

• Hardware transaction motivation
• Per-thread log
• Optimize commits (Hardware)

Design Decisions

• Version Management
– Eager: write in place
– Lazy: write on commit

• Conflict Detection
– Eager: detect at read/write time
– Lazy: detect at commit time

Transaction Logs

• Pointer to the beginning of the log
• Pointer to the end of the log
• Read and Write bits for each cache line

2/15/06 HPCA-12

Transaction Log Example

• Initial State
• LogBase = LogPointer
• TM count > 0

[Figure: data blocks at VA 00, 40, and C0 hold 12, 23, and 34, each with R W = 0 0. Log Base = 1000, Log Ptr = 1000, TM count = 1. The log region at 1000, 1040, and 1080 is empty.]

Transaction Log Example

• Store r2, (c0) /* r2 = 56 */
– Set W bit for block (c0)
– Store address (c0) and old data on the log
– Increment Log Ptr to 1048
– Update memory

[Figure: block C0 now holds 56 with R W = 0 1; blocks 00 and 40 are unchanged. The log entry at 1000 records address c0 and old data 34. Log Base = 1000, Log Ptr = 1048, TM count = 1.]
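
The four store steps above can be sketched in C. This is a minimal software model, not the hardware design: the `Thread` and `Block` types, the tiny four-block memory, and the 16-entry log capacity are all hypothetical. It assumes the addresses on the slide are hexadecimal, that blocks are 64 bytes, and that a log entry is an 8-byte address followed by the old block, which is consistent with Log Ptr moving from 1000 to 1048. It also assumes a block is logged only on its first write, using the W bit as the filter.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES     64                  /* assumed cache block size */
#define LOG_ENTRY_BYTES (8 + BLOCK_BYTES)   /* address + old block = 0x48 */
#define NUM_BLOCKS      4                   /* hypothetical memory: VA 00..C0 */

typedef struct {
    uint8_t data[BLOCK_BYTES];
    int r, w;                               /* per-block Read and Write bits */
} Block;

typedef struct {
    Block    mem[NUM_BLOCKS];               /* block i lives at VA i * 0x40 */
    uint8_t  log[16 * LOG_ENTRY_BYTES];     /* hypothetical log capacity */
    uint64_t log_base, log_ptr;             /* virtual addresses, e.g. 0x1000 */
    int      tm_count;
} Thread;

/* Store one byte inside a transaction, following the steps on the slide. */
void tx_store(Thread *t, uint64_t va, uint8_t value)
{
    assert(t->tm_count > 0);                /* only valid inside a transaction */
    assert(va / BLOCK_BYTES < NUM_BLOCKS);
    Block *b = &t->mem[va / BLOCK_BYTES];

    if (!b->w) {                            /* assumed: log on first write only */
        uint64_t used = t->log_ptr - t->log_base;
        assert(used + LOG_ENTRY_BYTES <= sizeof t->log);
        uint8_t *entry = t->log + used;
        uint64_t block_va = va & ~(uint64_t)(BLOCK_BYTES - 1);
        memcpy(entry, &block_va, 8);              /* address (c0) */
        memcpy(entry + 8, b->data, BLOCK_BYTES);  /* old data */
        t->log_ptr += LOG_ENTRY_BYTES;            /* 0x1000 -> 0x1048 */
        b->w = 1;                                 /* set W bit */
    }
    b->data[va % BLOCK_BYTES] = value;      /* update memory in place */
}
```

Because the new value goes straight into memory, the log only ever holds old values, which is what keeps commit cheap.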

Transaction Log Example

• Commit transaction
– Clear R & W for all blocks
– Reset Log Ptr to Log Base (1000)
– Clear TM count

[Figure: blocks 00, 40, and C0 hold 12, 23, and 56, each with R W = 0 0. Log Base = 1000, Log Ptr = 1000, TM count = 0.]
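
A minimal C sketch of the commit steps, assuming a hypothetical `TxState` record that holds only the bookkeeping shown in the figure. The point it illustrates is that commit moves no data: with eager version management the new values are already in place, so commit only discards the log and the read/write sets. The loop over R and W bits is a software stand-in; the names and the three-block size are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_BLOCKS 3   /* hypothetical: the three blocks in the figure */

typedef struct {
    int      r[NUM_BLOCKS], w[NUM_BLOCKS];  /* per-block R and W bits */
    uint64_t log_base, log_ptr;
    int      tm_count;
} TxState;

/* Commit: forget the undo log and the read/write sets. No block is copied. */
void tx_commit(TxState *t)
{
    assert(t->tm_count > 0);
    for (int i = 0; i < NUM_BLOCKS; i++)    /* clear R & W for all blocks */
        t->r[i] = t->w[i] = 0;
    t->log_ptr = t->log_base;               /* old log entries become dead */
    t->tm_count = 0;
}
```

The cost of this function does not depend on how many blocks the transaction wrote, apart from clearing the bits.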

Transaction Log Example

• Abort transaction
– Replay log entries to “undo” the transaction
– Reset Log Ptr to Log Base (1000)
– Clear R & W bits for all blocks
– Clear TM count

[Figure: block C0 is restored from 56 to its logged value 34; all R W bits are 0 0. Log Base = 1000, Log Ptr = 1000, TM count = 0.]
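
The abort steps can be sketched the same way. This model is an assumption-laden simplification: memory is an array of words rather than cache blocks, `log_len` stands in for the distance between Log Ptr and Log Base, and all type and function names are hypothetical. It walks the log from the newest entry to the oldest, so that if a location were ever logged more than once, the oldest value would be the one left in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LOG   8    /* hypothetical log capacity, in entries */
#define NUM_WORDS 4    /* hypothetical memory, one word per block */

typedef struct { size_t index; uint32_t old_value; } LogEntry;

typedef struct {
    uint32_t mem[NUM_WORDS];
    int      r[NUM_WORDS], w[NUM_WORDS];
    LogEntry log[MAX_LOG];
    size_t   log_len;        /* plays the role of Log Ptr - Log Base */
    int      tm_count;
} Tx;

void tx_begin(Tx *t) { t->log_len = 0; t->tm_count = 1; }

/* Eager store: log the old value on the first write, then write in place. */
void tx_store(Tx *t, size_t i, uint32_t v)
{
    assert(t->tm_count > 0 && i < NUM_WORDS);
    if (!t->w[i]) {
        assert(t->log_len < MAX_LOG);
        t->log[t->log_len++] = (LogEntry){ i, t->mem[i] };
        t->w[i] = 1;
    }
    t->mem[i] = v;
}

/* Abort: replay the log newest-first, then reset all bookkeeping. */
void tx_abort(Tx *t)
{
    while (t->log_len > 0) {
        LogEntry e = t->log[--t->log_len];
        t->mem[e.index] = e.old_value;      /* "undo" one store */
    }
    for (size_t i = 0; i < NUM_WORDS; i++)  /* clear R & W bits */
        t->r[i] = t->w[i] = 0;
    t->tm_count = 0;
}
```

Unlike commit, abort does work proportional to the number of logged entries, which is the trade-off the log-based design accepts.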

Conflict Detection

• Checked at every read/write
• Directory forwards read requests
• Directory can have “sticky” data
• Individual nodes responsible for detecting conflicts
• Needs
– Transaction mode bit
– Overflow bit

Conflict Detection (example)

• P0 store
– P0 sends get exclusive (GETX) request
– Directory responds with data (old)
– P0 executes store

[Figure: messages GETX and DATA between P0 and the directory. Directory: I [old] becomes M@P0 [old]. P0 (TM mode 1, Overflow 0): I (--) [none] becomes M (--) [old], then M (-W) [new]. P1 (TM mode 0, Overflow 0): I (--) [none].]

Conflict Detection (example)

• In-cache transaction conflict
– P1 sends get shared (GETS) request
– Directory forwards to P0
– P0 detects conflict and sends NACK

[Figure: messages GETS, Fwd_GETS, and NACK. Directory: M@P0 [old]. P0 (TM mode 1, Overflow 0): M (-W) [new]. P1 (TM mode 0, Overflow 0): I (--) [none].]
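
The check P0 performs on a forwarded request can be sketched as a small C function. The slide only shows the GETS-versus-W case; the rule for GETX below (conflict if either the R or the W bit is set) is the usual read/write-set rule and is stated here as an assumption. The type names and `RESP_ACK` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REQ_GETS, REQ_GETX } Request;    /* shared / exclusive */
typedef enum { RESP_ACK, RESP_NACK } Response;  /* RESP_ACK: hypothetical name */

typedef struct {
    bool r, w;        /* R and W bits of the cached block */
} CacheLine;

/* A remote read conflicts with a local transactional write; a remote write
 * conflicts with a local transactional read or write (assumed rule). */
bool conflicts(Request req, CacheLine line)
{
    if (req == REQ_GETS)
        return line.w;
    return line.r || line.w;
}

/* Decide the reply to a request the directory forwarded to this processor. */
Response on_forwarded_request(bool tm_mode, Request req, CacheLine line)
{
    assert(req == REQ_GETS || req == REQ_GETX);
    if (tm_mode && conflicts(req, line))
        return RESP_NACK;     /* requester has to retry later */
    return RESP_ACK;          /* no conflict: serve the request normally */
}
```

In the slide's scenario P0 is in TM mode and holds the block with the W bit set, so a forwarded GETS yields a NACK.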

Conflict Detection (example)

• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory

[Figure: messages PUTX, ACK, and DATA. Directory: M@P0 [old] becomes Msticky@P0 [new]. P0 (TM mode 1, Overflow becomes 1): M (-W) [new] becomes I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none].]

Conflict Detection (example)

• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK

[Figure: messages GETS, Fwd_GETS, and NACK. Directory: Msticky@P0 [new]. P0 (TM mode 1, Overflow 1): I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none].]

Conflict Detection (example)

• Commit
– P0 clears TM mode and Overflow bits

[Figure: Directory: Msticky@P0 [new]. P0 (TM mode becomes 0, Overflow becomes 0): I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none].]

Conflict Detection (example)

• Lazy cleanup
– P1 sends GETS request
– Directory forwards request to P0
– P0 detects no conflict, sends CLEAN
– Directory sends Data to P1

[Figure: messages GETS, Fwd_GETS, CLEAN, and DATA. Directory: Msticky@P0 [new] becomes S(P1) [new]. P0 (TM mode 0, Overflow 0): I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none] becomes S (--) [new].]
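
The directory side of the overflow, out-of-cache conflict, and lazy cleanup slides can be modeled as a tiny state machine. This is a sketch under assumptions: one directory entry, at most one sharer, and hypothetical names throughout (`DirEntry`, `dir_on_putx`, `dir_on_gets_sticky`). The transition to `DIR_I` for a non-transactional writeback is the ordinary case and is not shown on the slides.

```c
#include <assert.h>

typedef enum { DIR_I, DIR_M, DIR_M_STICKY, DIR_S } DirState;
typedef enum { REPLY_NACK, REPLY_CLEAN } OwnerReply;

typedef struct {
    DirState state;
    int      owner;      /* meaningful in DIR_M and DIR_M_STICKY */
    int      sharer;     /* meaningful in DIR_S (one sharer, for brevity) */
} DirEntry;

/* PUTX from a transactional owner: memory receives the data, but the entry
 * stays pointed at the owner so later requests are still forwarded to it. */
void dir_on_putx(DirEntry *e, int from, int from_in_transaction)
{
    assert(e->state == DIR_M && e->owner == from);
    e->state = from_in_transaction ? DIR_M_STICKY : DIR_I;
}

/* GETS that finds a sticky entry: forward to the old owner and act on its
 * reply. Returns 1 if the requester received data, 0 if it was NACKed. */
int dir_on_gets_sticky(DirEntry *e, int requester, OwnerReply reply)
{
    assert(e->state == DIR_M_STICKY);
    if (reply == REPLY_NACK)
        return 0;                 /* entry unchanged; requester retries */
    e->state  = DIR_S;            /* lazy cleanup: sticky state is dropped */
    e->sharer = requester;
    return 1;
}
```

The cleanup is lazy in the sense that nothing happens to the sticky entry at commit time; it is only repaired when some later request touches the block.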

False Positives?

What if P0 has started a new transaction without cleaning the sticky data?

False Positive Example

• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory

[Figure: messages PUTX, ACK, and DATA. Directory: M@P0 [old] becomes Msticky@P0 [new]. P0 (TM mode 1, Overflow becomes 1): M (-W) [new] becomes I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none].]

False Positive Example

• Commit
– P0 clears TM mode and Overflow bits
• Start New Transaction
– P0 sets TM mode
– Eventually overflows
– Sets overflow bit

[Figure: Directory: Msticky@P0 [new]. P0: I (--) [none]; TM mode and Overflow are cleared at commit (1 to 0), then set again by the new transaction (0 to 1). P1 (TM mode 0, Overflow 0): I (--) [none].]

Conflict Detection (example)

• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK

[Figure: messages GETS, Fwd_GETS, and NACK. Directory: Msticky@P0 [new]. P0 (TM mode 1, Overflow 1): I (--) [none]. P1 (TM mode 0, Overflow 0): I (--) [none].]
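
Why the last NACK is a false positive can be shown with a small C model of the processor side. It assumes, as the slides suggest, that the processor keeps a single overflow bit and no record of which blocks it evicted; the `Cpu` type and function names are hypothetical. The reply function covers only requests for blocks that are no longer in the local cache.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { REPLY_NACK, REPLY_CLEAN } Reply;

typedef struct {
    bool tm_mode;
    bool overflow;      /* set once any transactional block is evicted */
} Cpu;

void tx_begin(Cpu *c)    { c->tm_mode = true; }
void tx_overflow(Cpu *c) { assert(c->tm_mode); c->overflow = true; }
void tx_commit(Cpu *c)   { c->tm_mode = false; c->overflow = false; }

/* Reply to a forwarded request for a block that is NOT in the local cache.
 * With one overflow bit the processor cannot tell which blocks it evicted,
 * so it has to assume a conflict whenever the bit is set. */
Reply on_forward_for_uncached_block(const Cpu *c)
{
    return (c->tm_mode && c->overflow) ? REPLY_NACK : REPLY_CLEAN;
}
```

If the first transaction evicts block A and commits, and a second transaction then overflows on some other block, a request for A is still NACKed even though the second transaction never touched A. The request succeeds once the second transaction commits.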

Conflict Resolution and Deadlock Avoidance

• Options
– Wait: risk deadlock?
– Abort: risk livelock?

• Current Behavior
– Wait
– Abort if waiting on a logically younger process

• Future Behavior?
– Software contention manager

Evaluation

• 32 SPARC processors
• Solaris 9 OS
• SIMICS: full system simulator
– Magic no-ops
• Tests
– Micro-benchmarks
– SPLASH suite

Microbenchmarks

High Contention / Short Transactions

Comparing:
• EXP: TTS (test-and-test-and-set) locks with exponential backoff
• MCS: software queue-based locks

BEGIN_TRANSACTION();
new_total = total.count + 1;
private_data[id].count++;
total.count = new_total;
COMMIT_TRANSACTION();
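
For comparison, the EXP baseline protects the same critical section with a lock. The slides do not show that lock's code, so the following is only a sketch of a test-and-test-and-set lock with exponential backoff using C11 atomics; the type names, the backoff unit, and the cap of 1024 iterations are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int held; } TtsLock;   /* 0 = free, 1 = held */

/* One attempt: test with a plain load first, then test-and-set. */
int tts_try_lock(TtsLock *l)
{
    if (atomic_load_explicit(&l->held, memory_order_relaxed) != 0)
        return 0;                              /* looks busy: skip the write */
    return atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

void tts_lock(TtsLock *l)
{
    unsigned delay = 1;                        /* hypothetical backoff unit */
    while (!tts_try_lock(l)) {
        for (volatile unsigned i = 0; i < delay; i++)
            ;                                  /* wait without touching the lock */
        if (delay < 1024)                      /* hypothetical cap */
            delay *= 2;                        /* exponential backoff */
    }
}

void tts_unlock(TtsLock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

typedef struct { long count; } Counter;

/* The slide's critical section, guarded by the lock instead of a transaction. */
void locked_increment(TtsLock *l, Counter *total, Counter *private_slot)
{
    tts_lock(l);
    long new_total = total->count + 1;
    private_slot->count++;
    total->count = new_total;
    tts_unlock(l);
}
```

Every thread updates `total`, so the critical section is short and heavily contended, which is the high contention, short transaction case this microbenchmark targets.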

[Figure: execution time (in millions of cycles) versus threads (on 32 processors), 0 to 35 threads, for EXP, MCS, and LogTM.]

SPLASH2 Benchmark Results

[Figure: bar chart over OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, and WATER. The y-axis is labeled "Execution Time (in millions of cycles)" with a scale of 0 to 2; two off-scale bars are labeled 4.18 and 2.68.]

Data presented as: (PARMACS locks execution time) / (LogTM execution time)

Modified version: (LogTM execution time) / (PARMACS locks execution time)
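
The relationship between the two ratios, and a percentage form derived from them, can be written out in C. The function names are hypothetical, and the link to the chart labels is an inference from the numbers rather than something the slides state: if the off-scale labels 4.18 and 2.68 are read as lock time over LogTM time, then 100 × (1 − 1/4.18) is about 76.1 and 100 × (1 − 1/2.68) is about 62.7, which matches two of the percentage labels on the speedup chart.

```c
#include <assert.h>

/* Ratio as originally presented: lock time divided by LogTM time. */
double speedup(double lock_time, double logtm_time)
{
    assert(logtm_time > 0.0);
    return lock_time / logtm_time;
}

/* Inverse ratio used in the modified version: LogTM time over lock time. */
double relative_time(double lock_time, double logtm_time)
{
    assert(lock_time > 0.0);
    return logtm_time / lock_time;
}

/* Percentage of execution time saved: 100 * (1 - LogTM time / lock time). */
double percent_time_saved(double lock_time, double logtm_time)
{
    return 100.0 * (1.0 - relative_time(lock_time, logtm_time));
}
```

For example, `percent_time_saved(4.18, 1.0)` treats the lock version as taking 4.18 time units for every 1 unit of LogTM time.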

SPLASH2 Benchmark Results

[Figure: "Speedup" bar chart over OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, and WATER, with a scale of 0 to 100. Bar labels as extracted: 62.7%, 24.6%, 10.9%, 18.6%, 18.3%, 4.3%, 76.1%.]

Conclusions

• Optimize commits
• Aborts handled by software
• Stall to avoid wasting work
• Allow sticky data because overflow is rare
• Good performance on microbenchmark
• False sharing has a big impact on LogTM

Questions?