Predicting Parallel Performance
INTEL CONFIDENTIAL
Introduction to Parallel Programming – Part 10
Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.
Review & Objectives
Previously: Design and implementation of a task decomposition solution
At the end of this part you should be able to:
– Define speedup and efficiency
– Use Amdahl’s Law to predict maximum speedup
Speedup
Speedup is the ratio between sequential execution time and parallel execution time
For example, if the sequential program executes in 6 seconds and the parallel program executes in 2 seconds, the speedup is 3X
[Figure: typical speedup curve; x-axis: Cores, y-axis: Speedup]
Efficiency
Efficiency
– A measure of core utilization
– Speedup divided by the number of cores
Example
– Program achieves speedup of 3 on 4 cores
– Efficiency is 3 / 4 = 75%
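The two definitions can be written down directly. The following is a minimal sketch; the function names `speedup` and `efficiency` are chosen here for illustration and do not come from the slides.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential execution time to parallel execution time."""
    if parallel_time <= 0:
        raise ValueError("parallel_time must be positive")
    return sequential_time / parallel_time


def efficiency(speedup_value: float, cores: int) -> float:
    """Speedup divided by the number of cores (1.0 means full utilization)."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return speedup_value / cores


s = speedup(6.0, 2.0)      # the 6-second versus 2-second example
e = efficiency(3.0, 4)     # a speedup of 3 on 4 cores
print(f"speedup = {s:.1f}X, efficiency = {e:.0%}")
```

The two calls reproduce the slide examples: a speedup of 3X and an efficiency of 75%.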
[Figure: typical efficiency curve; x-axis: Cores, y-axis: Efficiency]
Speedup Example
Painting a picket fence
– 30 minutes of preparation (serial)
– One minute to paint a single picket
– 30 minutes of cleanup (serial)
Thus, painting 300 pickets takes 30 + 300 + 30 = 360 minutes (serial time)
Computing Speedup
Number of painters   Time (minutes)        Speedup
1                    30 + 300 + 30 = 360   1.0X
2                    30 + 150 + 30 = 210   1.7X
10                   30 + 30 + 30 = 90     4.0X
100                  30 + 3 + 30 = 63      5.7X
Infinite             30 + 0 + 30 = 60      6.0X
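The rows of this table follow from one small formula. As a sketch (the constant and function names are invented here, and the painting work is assumed to divide evenly among painters, as the slide implies):

```python
PREP_MINUTES = 30        # serial preparation
CLEANUP_MINUTES = 30     # serial cleanup
PICKETS = 300
MINUTES_PER_PICKET = 1


def fence_time(painters: int) -> float:
    """Total minutes when the painting work is split evenly among painters."""
    painting = PICKETS * MINUTES_PER_PICKET / painters
    return PREP_MINUTES + painting + CLEANUP_MINUTES


serial_time = fence_time(1)
for painters in (1, 2, 10, 100):
    t = fence_time(painters)
    print(f"{painters:>4} painters: {t:6.1f} min, speedup {serial_time / t:.1f}X")

# With unlimited painters only the 60 serial minutes remain.
limit = serial_time / (PREP_MINUTES + CLEANUP_MINUTES)
print(f"upper bound on speedup: {limit:.1f}X")
```

However many painters are added, the 60 minutes of serial work cap the speedup at 6X.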
Efficiency Example
Number of painters   Time (minutes)        Speedup   Efficiency
1                    30 + 300 + 30 = 360   1.0X      100%
2                    30 + 150 + 30 = 210   1.7X      86%
10                   30 + 30 + 30 = 90     4.0X      40%
100                  30 + 3 + 30 = 63      5.7X      5.7%
Infinite             30 + 0 + 30 = 60      6.0X      very low
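The trend in the efficiency column can be checked with a short derivation. As a sketch, write p for the number of painters (a symbol introduced here, not on the slide) and assume the 300 minutes of painting divide evenly:

```latex
\begin{align*}
T(p) &= 30 + \frac{300}{p} + 30 = 60 + \frac{300}{p} \\
\text{Speedup}(p) &= \frac{T(1)}{T(p)} = \frac{360}{60 + 300/p} \\
\text{Efficiency}(p) &= \frac{\text{Speedup}(p)}{p} = \frac{360}{60p + 300} = \frac{6}{p + 5}
\end{align*}
```

For p = 10 this gives 6/15 = 40%, and for p = 100 it gives 6/105 ≈ 5.7%. As p grows without bound the efficiency tends to 0, which matches the "very low" entry for an infinite number of painters.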
Idea Behind Amdahl’s Law
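[Figure: execution time (y-axis) versus number of cores (x-axis). For every core count the sequential portion s is unchanged, while the parallel portion shrinks from 1-s on 1 core to (1-s)/2, (1-s)/3, (1-s)/4, and (1-s)/5 on 2 to 5 cores.]
s: portion of computation that will be performed sequentially
1-s: portion of computation that will be executed in parallel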
Derivation of Amdahl’s Law
Speedup is the ratio of execution time on 1 core to execution time on p cores
Execution time on 1 core is s + (1-s)
Execution time on p cores is at least s + (1-s)/p
Writing ψ for the speedup:
\[ \psi \le \frac{s + (1-s)}{s + (1-s)/p} = \frac{1}{s + (1-s)/p} \]
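Amdahl’s Law is easy to evaluate numerically. The sketch below uses a function name, `amdahl_speedup`, invented for this example. It takes the serial fraction from the picket-fence example, where 60 of the 360 serial minutes cannot be parallelized.

```python
def amdahl_speedup(s: float, p: int) -> float:
    """Upper bound on speedup for serial fraction s on p cores."""
    if not 0.0 <= s <= 1.0:
        raise ValueError("s must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (s + (1.0 - s) / p)


s = 60 / 360   # serial fraction of the picket-fence example
for p in (1, 2, 10, 100):
    print(f"p = {p:>3}: speedup <= {amdahl_speedup(s, p):.1f}X")
```

The bound reproduces the speedup column of the painter table, since that example has no overhead and the work divides evenly.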
Amdahl’s Law Is Too Optimistic
Amdahl’s Law ignores parallel processing overhead
Examples of this overhead include time spent creating and terminating threads
Parallel processing overhead is usually an increasing function of the number of cores (threads)
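To illustrate the effect of overhead, the sketch below adds an overhead term to the denominator of Amdahl’s Law. The linear model `cost_per_core * p` and the values of `s` and `c` are assumptions made purely for illustration. Real overhead depends on the program and the threading runtime.

```python
def speedup_with_overhead(s: float, p: int, cost_per_core: float) -> float:
    """Speedup when an overhead of cost_per_core * p is added to Amdahl's
    denominator. All times are fractions of the single-core execution time.
    The linear overhead model is an assumption, not a measured result."""
    overhead = cost_per_core * p if p > 1 else 0.0
    return 1.0 / (s + (1.0 - s) / p + overhead)


s, c = 0.1, 0.005   # hypothetical serial fraction and per-core overhead
best_p = max(range(1, 65), key=lambda p: speedup_with_overhead(s, p, c))
for p in (1, 2, 4, 8, 16, 32, 64):
    print(f"p = {p:>2}: {speedup_with_overhead(s, p, c):.2f}X")
print(f"best core count under this model: {best_p}")
```

Under this model the speedup peaks at a finite core count and then declines, whereas the bound from Amdahl’s Law keeps rising toward 1/s.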
Graph with Parallel Overhead Added
[Figure: execution time versus number of cores, with parallel overhead added; parallel overhead increases with the number of cores]
Other Optimistic Assumptions
Amdahl’s Law assumes that the computation divides evenly among the cores
In reality, the amount of work does not divide evenly among the cores
Core waiting time is another form of overhead
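A small sketch makes the waiting-time overhead concrete. The per-core work values are hypothetical and `parallel_profile` is a name invented for this example. The key point is that a parallel phase finishes only when its most heavily loaded core finishes.

```python
def parallel_profile(work_per_core: list[float]) -> tuple[float, list[float]]:
    """Return (elapsed time, waiting time per core) for one parallel phase.
    The phase ends only when the most heavily loaded core finishes."""
    elapsed = max(work_per_core)
    waiting = [elapsed - w for w in work_per_core]
    return elapsed, waiting


work = [4.0, 6.0, 5.0, 9.0]          # hypothetical seconds of work per core
elapsed, waiting = parallel_profile(work)
ideal = sum(work) / len(work)        # elapsed time if the work divided evenly
print(f"elapsed = {elapsed}, ideal = {ideal}, waiting = {waiting}")
```

Here the phase takes 9 seconds instead of the ideal 6, and three of the four cores spend part of that time idle.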
[Figure: timeline of tasks on each core; legend: task started, task completed, working time, waiting time]
Graph with Workload Imbalance Added
[Figure: execution time versus number of cores, with time lost due to workload imbalance added]
Illustration of the Amdahl Effect
[Figure: speedup versus number of cores for problem sizes n = 1,000, n = 10,000, and n = 100,000, compared with linear speedup]
Using Amdahl’s Law
Program executes in 5 seconds
Profile reveals 80% of time spent in function alpha, which we can execute in parallel
What would be maximum speedup on 2 cores?
Amdahl’s Law with s = 0.2 and p = 2 bounds the speedup ψ:
\[ \psi \le \frac{1}{0.2 + (1 - 0.2)/2} = \frac{1}{0.6} \approx 1.67 \]
New execution time ≥ 5 sec / 1.67 ≈ 3 seconds
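The same bound also shows how little additional cores would help here. As a worked check (the limit below is standard Amdahl’s Law reasoning, not something shown on the slide), let the number of cores p grow without bound while the serial fraction stays at s = 0.2:

```latex
\[
\lim_{p \to \infty} \frac{1}{s + (1 - s)/p} = \frac{1}{s} = \frac{1}{0.2} = 5
\]
```

So as long as only function alpha runs in parallel, no number of cores can bring this program below 5 sec / 5 = 1 second.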
Superlinear Speedup
According to our general speedup formula, the maximum speedup a program can achieve on p cores is p
Superlinear speedup is the situation where speedup is greater than the number of cores used
It means the computational rate of the cores is faster when the parallel program is executing
Superlinear speedup usually occurs because the parallel program has a higher cache hit rate: each core works on a smaller share of the data, which is more likely to fit in its cache
References
Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
More General Speedup Formula
ψ(n,p): Speedup for problem of size n on p cores
σ(n): Time spent in sequential portion of code for problem of size n
φ(n): Time spent in parallelizable portion of code for problem of size n
κ(n,p): Parallel overhead
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]
Amdahl’s Law: Maximum Speedup
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]
– The overhead term κ(n,p) is set to 0
– The term φ(n)/p assumes parallel work divides perfectly among available cores
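To see how this connects to the earlier form of Amdahl’s Law, here is a short derivation. It assumes the notation of the Quinn reference: σ(n) for sequential time, φ(n) for parallelizable time, κ(n,p) for parallel overhead, ψ(n,p) for speedup, and s for the sequential fraction of the single-core execution time.

```latex
\begin{align*}
\psi(n,p) &\le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)}
  \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p}
  && \text{since } \kappa(n,p) \ge 0 \\
s &= \frac{\sigma(n)}{\sigma(n) + \varphi(n)}, \qquad
1 - s = \frac{\varphi(n)}{\sigma(n) + \varphi(n)} \\
\psi &\le \frac{1}{s + (1 - s)/p}
\end{align*}
```

Dividing the numerator and denominator of the middle expression by σ(n) + φ(n) gives the last line.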
The Amdahl Effect
\[ \psi(n,p) \le \frac{\sigma(n) + \varphi(n)}{\sigma(n) + \varphi(n)/p + \kappa(n,p)} \]
As n → ∞, the φ(n) and φ(n)/p terms dominate
Speedup is an increasing function of problem size
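The Amdahl effect can be observed numerically with the general speedup bound. In the sketch below, the cost functions `sigma`, `phi`, and `kappa` are hypothetical, chosen only so that the parallelizable time grows faster with n than the sequential time and the overhead. They do not describe any particular program.

```python
import math


def sigma(n: int) -> float:
    """Hypothetical sequential time for problem size n."""
    return 1000 + n


def phi(n: int) -> float:
    """Hypothetical parallelizable time; grows faster than sigma."""
    return n * n / 100


def kappa(n: int, p: int) -> float:
    """Hypothetical parallel overhead; zero on a single core."""
    return 0.0 if p == 1 else 500 * math.log2(p) + n / 10


def speedup_bound(n: int, p: int) -> float:
    """(sigma + phi) / (sigma + phi/p + kappa), the general speedup bound."""
    return (sigma(n) + phi(n)) / (sigma(n) + phi(n) / p + kappa(n, p))


for n in (1_000, 10_000, 100_000):
    row = ", ".join(f"p={p}: {speedup_bound(n, p):5.2f}" for p in (1, 2, 4, 8))
    print(f"n = {n:>7,}: {row}")
```

With these assumed cost functions, the bound for a fixed number of cores moves closer to linear speedup as n grows.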