Automatic Loop Parallelization using STM

Automatic Loop Parallelization using Software Transactional Memory Amr Abed


Automatic Loop Parallelization using Software Transactional Memory

Amr Abed

Outline

O Motivation

O Software Transactional Memory

O RingSTM

O STMlite

O Speculative Parallelization

O Fastpath

O SMTX (Spec-PS-DSWP)

Why Parallelization?

O Multi-cores
  O Only multi-threaded applications can use them

Why automatic?

O Parallel programs
  O Hard to implement

O Legacy code
  O Little knowledge of the original code

Why loops?

Most of a program's execution time is spent inside loops

Why STM?

O Lock-based
  O Deadlock
  O Priority inversion
  O Convoying

O Lock-free
  O Not easy to implement
  O CAS on multiple locations

O STM
  O Easy to implement, as in lock-based
  O Higher performance than lock-free

STM

O Each transaction performs an atomic task

O Transactions run concurrently

O To access shared memory
  O Write buffer
  O Undo log

O At end, validate reads
  O No conflict: commit
  O Conflict: abort and restart
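
The life cycle above (buffer writes, validate reads at the end, then commit or abort and restart) can be sketched in a few lines of C. This is a minimal single-threaded sketch, not any particular STM: the names `tx_t`, `tx_read`, `tx_write`, `tx_commit`, and `transfer` are hypothetical, validation is done by re-checking the values that were read (an assumption made for brevity), and the commit itself is not made atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOG 16

typedef struct { int *addr; int val; } entry_t;

typedef struct {
    entry_t reads[MAX_LOG];    /* locations read and the values seen */
    entry_t writes[MAX_LOG];   /* write buffer: pending <address, value> pairs */
    size_t n_reads, n_writes;
} tx_t;

void tx_begin(tx_t *tx) { tx->n_reads = tx->n_writes = 0; }

/* Writes go to the buffer, never directly to shared memory. */
void tx_write(tx_t *tx, int *addr, int val) {
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->writes[i].addr == addr) { tx->writes[i].val = val; return; }
    assert(tx->n_writes < MAX_LOG);
    tx->writes[tx->n_writes++] = (entry_t){ addr, val };
}

/* Reads see the transaction's own pending writes first, then memory. */
int tx_read(tx_t *tx, int *addr) {
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->writes[i].addr == addr) return tx->writes[i].val;
    assert(tx->n_reads < MAX_LOG);
    tx->reads[tx->n_reads++] = (entry_t){ addr, *addr };
    return *addr;
}

/* At end: validate reads; no conflict means commit, conflict means abort. */
bool tx_commit(tx_t *tx) {
    for (size_t i = 0; i < tx->n_reads; i++)
        if (*tx->reads[i].addr != tx->reads[i].val) return false;
    for (size_t i = 0; i < tx->n_writes; i++)
        *tx->writes[i].addr = tx->writes[i].val;
    return true;
}

/* Example atomic task: restart the whole transaction until it commits. */
void transfer(int *from, int *to, int amount) {
    tx_t tx;
    do {
        tx_begin(&tx);
        tx_write(&tx, from, tx_read(&tx, from) - amount);
        tx_write(&tx, to, tx_read(&tx, to) + amount);
    } while (!tx_commit(&tx));
}
```

Calling `transfer(&a, &b, 30)` leaves memory untouched until `tx_commit` succeeds. An undo-log design would instead write in place and keep the old values for rollback.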

RingSTM

Motivation

Orec-based STM
O Location-based metadata
O TX writing to W locations: O(W) CAS operations
O Committing R/W TX: O(R+W) overhead
O All validation done in CS

RingSTM
O TX-based metadata
O TX writing to W locations: no CAS operations
O Committing R/W TX: single CAS operation
O Bloom filters used for validation

Transaction
O Write Buffer
O Start Timestamp
O Write Filter
O Read Filter

The Ring

[Figure: a ring of entries with timestamps 39 to 46; each entry is marked Writing or Complete]

Ring Entry
O Commit Timestamp
O Write Filter
O Status

Bloom Filter

[Figure: a 16-bit filter, bit positions 15 down to 0, and a set of hash functions that maps each key to four bit positions]

O Initially: all bits are 0
O Inserting A: the hash functions give 1, 7, 10, 13; those bits are set
O Inserting B: the hash functions give 2, 7, 9, 12; those bits are set
O Searching for A: bits 1, 7, 10, 13 are all set, so A may be in the set
O Searching for C: the hash functions give 3, 7, 10, 12; bit 3 is clear, so C is not in the set
O Searching for D: the hash functions give 2, 7, 10, 12; all four bits are set although D was never inserted (false positive)
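
The insert and search steps can be sketched as a small C Bloom filter. This is only an illustration under stated assumptions: the filter is 64 bits wide rather than the 16 bits drawn on the slides, the four bit positions come from slices of one mixed 64-bit hash (the splitmix64 finalizer), and the names `bloom_t`, `bloom_mask`, `bloom_add`, `bloom_maybe_contains`, and `bloom_intersects` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t bloom_t;   /* 64-bit filter; the slides draw 16 bits */

/* splitmix64 finalizer, used here as a generic 64-bit mixing function */
uint64_t mix64(uint64_t x) {
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* The bits that four hash functions select for one key. */
bloom_t bloom_mask(uint64_t key) {
    uint64_t h = mix64(key);
    bloom_t m = 0;
    for (int i = 0; i < 4; i++)
        m |= 1ULL << ((h >> (6 * i)) & 63);  /* 6 hash bits pick 1 of 64 positions */
    return m;
}

/* Inserting: set every selected bit. */
void bloom_add(bloom_t *f, uint64_t key) { *f |= bloom_mask(key); }

/* Searching: false means definitely absent, true means possibly present. */
bool bloom_maybe_contains(bloom_t f, uint64_t key) {
    bloom_t m = bloom_mask(key);
    return (f & m) == m;
}

/* Two sets may share an element only if their filters share a bit. */
bool bloom_intersects(bloom_t a, bloom_t b) { return (a & b) != 0; }
```

A search can return true for a key that was never inserted, as with D on the slides, but it never returns false for a key that was inserted. The intersection test is the operation an STM can use to compare one transaction's read set against another's write set.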

New Transaction

[Figure: the ring, entries 39 to 46, each marked Writing or Complete, next to a new transaction holding a Write Buffer, Start Time 43, a Write Filter, and a Read Filter]

On Write

O Hash the address and add it to the write set (Write filter)
O Add <address, value> to the Write Buffer

On Read

O Hash the address and look it up in the Write filter
O Address is in the write set: get value from the Write Buffer
O Address is not in the write set: get value from memory
O Add address to the read set (Read filter)
O Check for conflicts against the ring using the Read Filter (in the figure, Start Time moves from 43 to 44)
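
The On Write and On Read steps can be sketched as follows. This is a simplified illustration, not RingSTM's actual code: `ring_tx`, `addr_bits`, `ring_write`, and `ring_read` are hypothetical names, the filters are 64 bits wide with two bits per address, and the conflict check against the ring that follows each read is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WBUF_MAX 32

typedef struct {
    uint64_t write_filter, read_filter;        /* Bloom filters over addresses */
    struct { int *addr; int val; } wbuf[WBUF_MAX];
    size_t n_writes;
    uint64_t start_time;
} ring_tx;

/* Hypothetical hash: two filter bits per address. */
uint64_t addr_bits(const int *addr) {
    uint64_t h = (uint64_t)(uintptr_t)addr * 0x9e3779b97f4a7c15ULL;
    return (1ULL << (h >> 58)) | (1ULL << ((h >> 52) & 63));
}

/* On write: add the address to the write filter, then buffer <address, value>. */
void ring_write(ring_tx *tx, int *addr, int val) {
    tx->write_filter |= addr_bits(addr);
    for (size_t i = 0; i < tx->n_writes; i++)
        if (tx->wbuf[i].addr == addr) { tx->wbuf[i].val = val; return; }
    assert(tx->n_writes < WBUF_MAX);
    tx->wbuf[tx->n_writes].addr = addr;
    tx->wbuf[tx->n_writes].val = val;
    tx->n_writes++;
}

/* On read: consult the write filter first; on a miss (or a false positive)
   read memory and record the address in the read filter. */
int ring_read(ring_tx *tx, int *addr) {
    uint64_t bits = addr_bits(addr);
    if ((tx->write_filter & bits) == bits)
        for (size_t i = 0; i < tx->n_writes; i++)
            if (tx->wbuf[i].addr == addr) return tx->wbuf[i].val;
    tx->read_filter |= bits;
    return *addr;
}
```

The filter test lets most reads skip the write-buffer search entirely; a false positive costs only a wasted search before falling through to memory.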

On Commit

O Add new entry (47 in the figure), and update index
O Check for conflicting writers using the Write Filter
O If no conflicts, write to memory: copy each address in the Write Buffer to Memory
O Set status to complete
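
The commit steps can be walked through in C with C11 atomics. This sketch makes several assumptions that the slides do not state: the names `ring_t`, `ring_entry`, `ring_validate`, and `ring_commit` are hypothetical, "check for conflicting writers" is read as waiting for older entries whose write filter overlaps ours, and fewer than `RING_SIZE` commits occur between a transaction's start and its commit. A concurrent implementation also has to guard against entries being observed before they are initialised, which is omitted here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8
enum { WRITING = 0, COMPLETE = 1 };

typedef struct {
    uint64_t commit_time;
    uint64_t write_filter;
    _Atomic int status;
} ring_entry;

typedef struct {
    ring_entry entries[RING_SIZE];
    _Atomic uint64_t index;        /* commit timestamp of the newest entry */
} ring_t;

/* No entry newer than `start` may have written something we read. */
bool ring_validate(ring_t *r, uint64_t start, uint64_t newest, uint64_t read_filter) {
    for (uint64_t t = start + 1; t <= newest; t++)
        if (r->entries[t % RING_SIZE].write_filter & read_filter)
            return false;
    return true;
}

bool ring_commit(ring_t *r, uint64_t start, uint64_t read_filter,
                 uint64_t write_filter, int *const addrs[], const int vals[], size_t n) {
    uint64_t seen;
    do {                           /* add new entry and update index: one CAS */
        seen = atomic_load(&r->index);
        if (!ring_validate(r, start, seen, read_filter))
            return false;          /* conflict: abort */
    } while (!atomic_compare_exchange_strong(&r->index, &seen, seen + 1));

    ring_entry *mine = &r->entries[(seen + 1) % RING_SIZE];
    mine->commit_time = seen + 1;
    mine->write_filter = write_filter;
    atomic_store(&mine->status, WRITING);

    /* Conflicting writers: wait for overlapping older entries to complete. */
    for (uint64_t t = start + 1; t <= seen; t++) {
        ring_entry *older = &r->entries[t % RING_SIZE];
        if (older->write_filter & write_filter)
            while (atomic_load(&older->status) != COMPLETE) { /* spin */ }
    }

    for (size_t i = 0; i < n; i++) /* write buffer to memory */
        *addrs[i] = vals[i];

    atomic_store(&mine->status, COMPLETE);
    return true;
}
```

The only contended operation is the compare-and-swap on `index`, which matches the "single CAS operation" entry in the comparison with orec-based STM.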

Results

STMlite

Motivation

Transaction
O Write Buffer
O Start version
O Write Signature
O Read Signature
O Commit version
O Abort?
O Commit?

TCM
O Commit log
O Pre-Commit log
O minSV log

Commit Log Entry
O Commit version
O Write Signature
O Transaction pointer

Pre-commit Log Entry
O Commit version
O Write Signature
O Read Signature
O Ready?
O Transaction pointer

New Transaction

[Figure: a new transaction (Write Buffer, Start version, Write Signature, Read Signature, Commit version, Abort?, Commit?) shown with the Global Clock and the Commit, Pre-Commit, and minSV logs]

Memory Access (R/W)

O Same as RingSTM, except that there is no eager validation on read

Pre-Commit

[Figure: a pre-commit log entry (Commit version, Write Signature, Read Signature, Ready = 1, Transaction pointer) shown with the Global Clock and the Commit log]

Commit

[Figure: each address in the Write Buffer is written to Memory]

Results

Speculative Parallelization

Loop parallelization

O Non-Speculative Parallelization
  O DOALL
  O DOACROSS
  O DSWP

O Speculative Parallelization
  O TLS
  O Spec-PS-DSWP
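
To make the non-speculative categories concrete, the sketch below contrasts a loop whose iterations are independent (a DOALL candidate) with one that carries a dependence from each iteration to the next (where DOACROSS or DSWP would apply). The functions `scale` and `prefix_sum` are hypothetical examples, shown in their sequential form only.

```c
#include <stddef.h>

/* DOALL candidate: iteration i touches only a[i], so iterations are
   independent and could run on different cores in any order. */
void scale(double *a, size_t n, double k) {
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}

/* Loop-carried dependence: iteration i reads a[i - 1], which iteration
   i - 1 has just written. DOACROSS would spread iterations over cores but
   synchronise on that value; DSWP would instead split the loop body into
   pipeline stages that run on different cores. */
void prefix_sum(double *a, size_t n) {
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```

Speculative techniques target loops where such a dependence exists in principle but rarely occurs at run time, so iterations are run in parallel and rolled back when it does.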

Fastpath

Speculative Pipelining

[Figure: timeline for Thread 1, Thread 2, and Thread 3 showing Slow Mode, Transition, and Fast Mode]

The Value Algorithm

O Slow mode: per-access instrumentation
O Consistency check
O Fast mode: un-instrumented speed
O No false conflicts
O Data forwarding

The Signature Algorithm

O Array of write signatures
O Update entry on each write
O Maintain read signature
O On transition, intersect sets (or even before)
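
The signature bookkeeping listed above can be sketched as follows. The layout is an assumption for illustration: one 64-bit write signature per thread in a global array, one bit per address, and threads with a lower index treated as the ones ahead in iteration order. The names `write_sigs`, `sig_bit`, `note_write`, `note_read`, and `transition_ok` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 8

uint64_t write_sigs[MAX_THREADS];   /* array of write signatures, one per thread */

/* Hypothetical hash: one signature bit per address. */
uint64_t sig_bit(const void *addr) {
    uint64_t h = (uint64_t)(uintptr_t)addr * 0x9e3779b97f4a7c15ULL;
    return 1ULL << (h >> 58);
}

/* Update the thread's entry on each write. */
void note_write(size_t tid, const void *addr) {
    write_sigs[tid] |= sig_bit(addr);
}

/* Maintain the read signature on each read. */
void note_read(uint64_t *read_sig, const void *addr) {
    *read_sig |= sig_bit(addr);
}

/* On transition: intersect the read signature with the write signatures of
   all threads ahead of this one; any shared bit is treated as a conflict. */
bool transition_ok(size_t tid, uint64_t read_sig) {
    for (size_t t = 0; t < tid && t < MAX_THREADS; t++)
        if (write_sigs[t] & read_sig)
            return false;
    return true;
}
```

Because signatures are lossy, two different addresses can map to the same bit and produce a false conflict, which is the trade-off against the Value algorithm's "no false conflicts". The same intersection can be run at any point before the transition to detect a conflict early.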

Results

[Figure: speedup using the Value algorithm]

[Figure: speedup using the Signature algorithm]

SMTX (Spec-PS-DSWP)

Example code

A: while(node) {
B:   node = node->next;
C:   res = work(node);
D:   write(res);
   }

[Figure: dependence graph over statements A, B, C, and D, with Control Dependency and Data Dependency edges]

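Speculation

   while(TRUE) {
B:   node = node->next;
C:   res = work(node);
D:   write(res);
   }

[Figure: dependence graph over A, B, C, and D after speculating on the loop condition, with Control Dependency and Data Dependency edges; a Parallel Stage is marked]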

Pipelining

O Stage 1 (Sequential), B: node = node->next;
O Stage 2 (Parallel), C: res = work(node);
O Stage 3 (Sequential), D: write(res);
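
The three-stage split can be written out as three functions connected by buffers. This is a sequential sketch of the decomposition only, with no threads and no speculation; `node_t`, `work`, and the `stage*` functions are hypothetical stand-ins, and stage 1 hands over each node before advancing, which differs slightly from the statement order on the slide.

```c
#include <stddef.h>

typedef struct node { int value; struct node *next; } node_t;

/* Hypothetical stand-in for the expensive per-node computation. */
int work(const node_t *n) { return n->value * n->value; }

/* Stage 1 (sequential): the pointer chase carries a dependence from one
   iteration to the next, so it stays on one core and only queues nodes. */
size_t stage1_traverse(node_t *head, node_t *queue[], size_t cap) {
    size_t n = 0;
    for (node_t *p = head; p != NULL && n < cap; p = p->next)
        queue[n++] = p;
    return n;
}

/* Stage 2 (parallel): each call to work() depends only on its own node,
   so the iterations of this loop could be spread over several cores. */
void stage2_work(node_t *const queue[], int res[], size_t n) {
    for (size_t i = 0; i < n; i++)
        res[i] = work(queue[i]);
}

/* Stage 3 (sequential): results are written out in iteration order. */
void stage3_write(const int res[], size_t n, int out[]) {
    for (size_t i = 0; i < n; i++)
        out[i] = res[i];
}
```

In a pipelined run the stages overlap: stage 1 is already producing node i + 1 while stage 2 workers process node i and stage 3 writes the result of node i - 1.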

Execution

[Figure: execution schedule on Core 0 to Core 5, rows numbered 0 to 5, showing B0 to B5, C0 to C4, D0 to D2, and the Try 0, Try 1, and Commit 0 steps]

Initialization

[Figure: the Main/Commit process with its Virtual Address Space and Page table; labels: Copy on Write]

MTX creation

[Figure: the Main/Commit process, Worker 1, and Worker 2, each with a Virtual Address Space and Page tables; labels: Copy on Write, Communication Channel, Commit]

Memory access

[Figure: the same three processes; labels: Private (on each worker), Copy on Write, Communication Channel, Commit]

Commit

[Figure: the same three processes; labels: Private (on each worker), Communication Channel, Commit, Copy on Write]

Rollback

[Figure: the same three processes; labels: Private (on each worker), Copy on Write, Communication Channel]
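
The figures rely on process-level copy-on-write: a worker's writes land in private pages and reach the main process only through an explicit commit. The sketch below illustrates that idea with POSIX `fork()` and `pipe()`. It is not the SMTX runtime's API; `shared_value` and `speculate` are hypothetical names, and a single `int` stands in for the worker's whole write set.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_value = 1;   /* lives in the main (commit) process */

/* Run one worker that speculatively writes new_value. If do_commit is
   nonzero the main process applies the forwarded value; otherwise the
   worker's private copy is simply discarded (rollback).
   Returns 0 on success, -1 on error. */
int speculate(int new_value, int do_commit) {
    int ch[2];                        /* communication channel */
    if (pipe(ch) != 0) return -1;

    pid_t pid = fork();               /* worker gets a copy-on-write view */
    if (pid < 0) { close(ch[0]); close(ch[1]); return -1; }

    if (pid == 0) {                   /* worker process */
        close(ch[0]);
        shared_value = new_value;     /* touches a private page only */
        ssize_t w = write(ch[1], &shared_value, sizeof shared_value);
        _exit(w == (ssize_t)sizeof shared_value ? 0 : 1);
    }

    close(ch[1]);                     /* main/commit process */
    int received = 0;
    ssize_t got = read(ch[0], &received, sizeof received);
    close(ch[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (got != (ssize_t)sizeof received) return -1;

    if (do_commit)
        shared_value = received;      /* commit: apply the forwarded write */
    return 0;
}
```

After `speculate(42, 0)` the main process still sees `shared_value == 1`, because the worker's write never left its private page; after `speculate(42, 1)` it sees 42. Rollback therefore costs nothing beyond discarding the worker's pages.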

Results

[Figure: Speedup (0 to 5) versus Number of threads (1 to 8) for SMTX and TX]

Thank You