TM performance: seeing the whole picture or Looking back over the first 500 papers


Page 1

TM performance: seeing the whole picture

or

Looking back over the first 500 papers

Tim Harris (MSR Cambridge)

Page 2

Page 3

How might we compare TM systems?

Where might TM be most useful?

Page 4

Extending Dan’s GC analogy

Concurrent GC algorithm (run GC in small steps in amongst mutators)

A: “Here’s a way to reduce the pause times...”

B: “Here’s a way to support pinned objects...”

C: “Here’s a way to improve the throughput (total app runtime)...”

Page 5

Min mutator utilization

[Chart: min fraction of interval running mutator (vertical axis, 0.0 to 0.9) against time interval / ms (horizontal axis, 0 to 12), for Algorithm A and Algorithm B.]
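As a rough illustration of what one point on a minimum mutator utilization (MMU) curve means, the sketch below computes the worst-case fraction of any window of a given length that is left to the mutator. The pause trace, the run length and the function names are all hypothetical; nothing here is taken from the talk's data.

```python
def gc_time_in_window(pauses, start, end):
    """Total pause time overlapping the window [start, end)."""
    return sum(max(0.0, min(e, end) - max(s, start)) for s, e in pauses)


def mmu(pauses, run_length, window):
    """Minimum mutator utilization for one window length.

    pauses: non-overlapping (start, end) GC pauses inside [0, run_length].
    """
    if not 0 < window <= run_length:
        raise ValueError("window must be in (0, run_length]")
    # GC time inside a sliding window is piecewise linear in the window
    # position, so its maximum is reached when a window edge lines up with
    # a pause boundary (or at either end of the run).
    candidates = {0.0, run_length - window}
    for s, e in pauses:
        candidates.update((s, e, s - window, e - window))
    worst = 0.0
    for start in candidates:
        start = min(max(start, 0.0), run_length - window)
        worst = max(worst, gc_time_in_window(pauses, start, start + window))
    return 1.0 - worst / window


# Hypothetical trace: three pauses (in ms) during a 20 ms run.
trace = [(2.0, 3.0), (8.0, 8.5), (9.0, 10.0)]
for w in (1.0, 2.0, 4.0, 12.0):
    print(f"window {w:4.1f} ms -> MMU {mmu(trace, 20.0, w):.3f}")
```

Sweeping the window length and plotting the result gives a curve of the kind shown on this slide: short windows can be swallowed entirely by one pause, longer windows average the pauses out.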

Page 6

Five dimensions to TM behavior

Sequential overhead

Scalability (to longer transactions)

Scalability (to more cores)

Tx-supported operations

Semantics

Page 7

Scaling to large transactions

[Chart: normalized execution time (vertical axis, 0.0 to 5.0) against Tx size (horizontal axis, 0 to 10), for Algorithm A and Algorithm B.]

1.0 = optimized sequential code (no tx, no locks)

Page 8

Scaling: n*1-core copies

[Chart: normalized execution time (vertical axis, 0 to 6) against #cores (horizontal axis, 0 to 10), for Algorithm A and Algorithm B.]

1.0 = optimized sequential code (no tx, no locks)

Page 9

Scaling: 1*n-core copy

[Chart: speedup over sequential (vertical axis, 0 to 2.5) against #cores (horizontal axis, 0 to 10), for Algorithm A and Algorithm B.]

1.0 = optimized sequential code (no tx, no locks)

Page 10

How might we compare TM systems?

Where might TM be most useful?

Page 11

Application model #1

Sequential Parallelizable

f = fraction of original program that is parallelizable

Page 12

Application model #1

Sequential

Parallel

Parallel

Parallel

...

f = fraction of original program that is parallelizable
n = num parallel threads

Page 13

Application model #1

Sequential

Parallel, transactional

Parallel, transactional

Parallel, transactional

...

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
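The transcript keeps the definitions of f, n and x but not a formula. A minimal sketch, assuming the model is Amdahl's law with the parallel part made x times slower by transactions: normalized time = (1 - f) + f*x/n. The function name `model1_speedup` and this exact form are assumptions, not something stated on the slide.

```python
def model1_speedup(f, n, x=1.0):
    """Speedup over sequential code under an assumed Amdahl-style model.

    f: fraction of the original program that is parallelizable (0..1)
    n: number of parallel threads
    x: straight-line transactional slow-down (1.0 = no overhead)
    """
    if not 0.0 <= f <= 1.0 or n < 1 or x < 1.0:
        raise ValueError("expected 0 <= f <= 1, n >= 1, x >= 1")
    # The sequential part runs unchanged; the parallel part is x times
    # slower but is shared across n threads.
    normalized_time = (1.0 - f) + f * x / n
    return 1.0 / normalized_time


for x in (1.0, 1.21, 2.0):
    print(f"f=0.95, n=16, x={x}: {model1_speedup(0.95, 16, x):.2f}x")
```

With x = 1.0 this reduces to the usual Amdahl's law, so the sketch can be checked against the familiar limits: f = 0 gives no speedup and f = 1 gives n.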

Page 14

Conflict model

f = fraction of original program that is parallelizable
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)

1 2 3 4 5 6

Fixed number of alternatives, execute different alternatives in parallel

Execute conflicting operations in series
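One reading of this conflict model, offered as an assumption: each of the n threads picks one of k alternatives uniformly at random, threads that picked the same alternative run one after another, and c is the expected size of the largest such group. The Monte Carlo sketch below estimates c under that reading; the function name, trial count and seed are illustrative choices.

```python
import random
from collections import Counter


def mean_attempts(threads, alternatives, trials=10000, seed=0):
    """Estimate c as the expected size of the largest conflicting group."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        picks = Counter(rng.randrange(alternatives) for _ in range(threads))
        # Threads that picked the same alternative execute in series, so a
        # round lasts as long as the most popular alternative needs.
        total += max(picks.values())
    return total / trials


for k in (1024, 256, 64, 16):
    print(f"{k:5d} alternatives -> c ~ {mean_attempts(16, k):.2f}")
```

Under this reading the estimates for 16 threads come out close to the c values that the later slides pair with 1..1024, 1..256, 1..64 and 1..16, which is why it was chosen here; the talk itself may have derived c differently.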

Page 15

n=16, c=1.0, vary f, vary x

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 16

n=16, c=1.0

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

8x on 16 threads => 95% parallelizable
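The annotation can be checked with a short calculation, assuming plain Amdahl's law (x = 1, c = 1): solving 1 / ((1 - f) + f/n) = S for f gives f = (1 - 1/S) * n / (n - 1). The helper name below is hypothetical.

```python
def required_parallel_fraction(target_speedup, n):
    """Smallest f that reaches target_speedup on n threads, with x = c = 1.

    Obtained by solving 1 / ((1 - f) + f / n) = target_speedup for f.
    """
    if n < 2 or not 1.0 <= target_speedup <= n:
        raise ValueError("need n >= 2 and 1 <= target_speedup <= n")
    return (1.0 - 1.0 / target_speedup) * n / (n - 1)


f_min = required_parallel_fraction(8.0, 16)
print(f"8x on 16 threads needs f >= {f_min:.1%}")
```

The exact threshold is 14/15, a little over 93%. The slide's 95% is presumably the first row of the chart's grid that clears it, though that reading of the rounding is an assumption.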

Page 17

n=16, c=1.0

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Straight-line slow-down bites quickly
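To put a number on "quickly", the sketch below inverts the same assumed model, speedup = 1 / ((1 - f) + f*x/n), to find the largest slow-down x that still reaches a target speedup. The function name and the 8x target are illustrative assumptions.

```python
def max_slowdown(target_speedup, f, n):
    """Largest straight-line slow-down x that still reaches target_speedup.

    Assumes speedup = 1 / ((1 - f) + f * x / n) with no conflicts (c = 1).
    A result below 1.0 means the target is out of reach even at x = 1.
    """
    if not 0.0 < f <= 1.0 or n < 1 or target_speedup <= 0.0:
        raise ValueError("expected 0 < f <= 1, n >= 1, target_speedup > 0")
    return n * (1.0 / target_speedup - (1.0 - f)) / f


for f in (0.95, 0.98, 1.00):
    print(f"f={f:.2f}: 8x on 16 threads tolerates x <= {max_slowdown(8.0, f, 16):.2f}")
```

Under these assumptions a program that is 95% parallelizable loses its 8x speedup once transactional code is roughly 1.26 times slower than the sequential code, and even a fully parallel program tolerates only a 2x slow-down.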

Page 18

n=16, c=1.1 (1..1024)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 19

n=16, c=1.4 (1..256)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 20

n=16, c=2.0 (1..64)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 21

n=16, c=3.1 (1..16)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

If Amdahl and overheads don’t get you then conflicts still can...
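A sketch of how conflicts might enter the same model, assuming every retry repeats the full transactional work so that the parallel part costs f*x*c spread over n threads. This form is an assumption; the transcript does not preserve the formula behind the charts.

```python
def speedup(f, n, x=1.0, c=1.0):
    """Assumed model: parallel work costs f * x * c, shared by n threads."""
    if not 0.0 <= f <= 1.0 or n < 1 or x < 1.0 or c < 1.0:
        raise ValueError("expected 0 <= f <= 1, n >= 1, x >= 1, c >= 1")
    return 1.0 / ((1.0 - f) + f * x * c / n)


# c values quoted on the preceding slides for n = 16.
for c in (1.0, 1.1, 1.4, 2.0, 3.1):
    print(f"f=1.00, x=1.21, c={c}: {speedup(1.0, 16, 1.21, c):.2f}x")
```

Setting f = 1.00 and keeping x small isolates the third effect: even with no sequential portion and little overhead, the achievable speedup under this model is capped at n / (x*c).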

Page 22

n=16, c=1.0, scaling of large tx

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

[Inset chart: x*f (vertical axis, 0.0 to 10.0) against x*f (horizontal axis, 0.0 to 4.0).]

Page 23

n=16, c=1.0, x*(f+(f^1.25)/4)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

[Inset chart: x*(f+(f^1.25)/4) (vertical axis, 0.0 to 10.0) against x*f (horizontal axis, 0.0 to 4.0).]

Page 24

n=16, c=1.0, x*(f+(f^2)/4)

[Chart: f (parallel proportion, 75% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

[Inset chart: x*(f+(f^2)/4) (vertical axis, 0.0 to 10.0) against x*f (horizontal axis, 0.0 to 4.0).]

Page 25

Application model #2: 100% parallel

Tx Non-tx

Tx Non-tx

Tx Non-tx

...

t = fraction of original program that is transactional
n = num parallel threads
x = straight-line transactional slow-down
c = mean number of attempts per transaction (1 => no conflicts)
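A minimal sketch of this second model, assuming that all n threads run fully in parallel and that each unit of original work costs (1 - t) outside transactions plus t*x*c inside them. Both the formula and the name `model2_speedup` are assumptions made for illustration.

```python
def model2_speedup(t, n, x=1.0, c=1.0):
    """Speedup over sequential code for a 100% parallel program.

    t: fraction of the original program that is transactional (0..1)
    n: number of parallel threads
    x: straight-line transactional slow-down
    c: mean number of attempts per transaction (1 => no conflicts)
    """
    if not 0.0 <= t <= 1.0 or n < 1 or x < 1.0 or c < 1.0:
        raise ValueError("expected 0 <= t <= 1, n >= 1, x >= 1, c >= 1")
    # Non-transactional work is unchanged; transactional work is x times
    # slower and is attempted c times on average.
    per_thread_cost = (1.0 - t) + t * x * c
    return n / per_thread_cost


for t in (0.0, 0.1, 0.5, 1.0):
    print(f"t={t:.1f}, n=16, x=2.0: {model2_speedup(t, 16, 2.0):.2f}x")
```

Unlike model #1 there is no sequential portion, so the only things holding the speedup below n are the transactional fraction, the slow-down and the retries.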

Page 26

Workloads (ASPLOS ’10)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21). Workloads marked: Labyrinth, Genome, JBBAtomic, Vacation, MaxFlow.]

Page 27

Workloads (ASPLOS ’10)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21). Workloads marked: Labyrinth, Genome, JBBAtomic, Vacation, MaxFlow.]

Page 28

n=16, c=1.0 (no conflicts)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 29

n=16, c=1.0 (no conflicts)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Overheads rapidly reduce the amount that transactions can be used
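The point can be illustrated by asking how large t may be before a target speedup is lost. The sketch assumes the 100% parallel model with speedup = n / ((1 - t) + t*x*c); that formula, the function name and the 8x target are assumptions made for illustration.

```python
def max_tx_fraction(target_speedup, n, x, c=1.0):
    """Largest transactional fraction t that still reaches target_speedup.

    Assumes speedup = n / ((1 - t) + t * x * c); the result is capped at 1.0.
    """
    if not 0.0 < target_speedup <= n or x < 1.0 or c < 1.0:
        raise ValueError("expected 0 < target_speedup <= n, x >= 1, c >= 1")
    overhead = x * c - 1.0  # extra cost per unit of transactional work
    if overhead == 0.0:
        return 1.0  # transactions cost nothing extra, so any t is fine
    slack = n / target_speedup - 1.0  # how much extra cost we can absorb
    return min(1.0, slack / overhead)


# Sample a few points along the chart's x axis (powers of 1.21).
for k in (1, 4, 9):
    x = 1.21 ** k
    print(f"x={x:.2f}: 8x on 16 threads allows t <= {max_tx_fraction(8.0, 16, x):.0%}")
```

Under these assumptions the tolerable transactional fraction falls from 100% at x = 1.21 to under a quarter at the right-hand edge of the chart.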

Page 30

n=16, c=1.1 (1..1024)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 31

n=16, c=1.4 (1..256)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 32

n=16, c=2.0 (1..64)

[Chart: t (transactional proportion, 0% to 100%) against x (straight-line transactional slow-down, 1 to about 5.56 in multiplicative steps of 1.21).]

Page 33

Conclusions

• Bad things come in threes...
– Amdahl’s law
– Sequential overhead
– Conflicts

• When developing TM systems we need to be careful about tradeoffs between these

• There’s a risk of “chasing around the TM design space”
– Sequential overhead
– Scaling without conflicts
– Scaling with conflicts