(Short) Introduction to Parallel Computing CS 6560: Operating Systems Design.
2
Why Parallel Computing? Performance!
Many applications require serious performance. Examples:
Structural biology
Chemical dynamics
Pharmaceutical design
Weather forecasting
Human genome
Ocean modeling
3
Processor Performance: Need Parallelism!
[Figure: processor performance chart; annotation: 2-3 GHz]
4
Case Study 1: Simulating Ocean Currents
Model as two-dimensional grids
Discretize in space and time
finer spatial and temporal resolution => greater accuracy
Many different computations per time step
set up and solve equations
Concurrency across and within grid computations
[Figure: (a) Cross sections; (b) Spatial discretization of a cross section]
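To make "concurrency within a grid computation" concrete, here is a minimal sketch of one sweep over a two-dimensional grid. The slides do not say which solver the ocean simulation uses; the four-neighbor averaging below is an assumed stand-in for a generic nearest-neighbor update, and `relax_step` and `make_grid` are hypothetical names.

```python
def make_grid(n, hot=1.0):
    """Build an n x n grid of zeros whose top row is held at `hot` (illustrative boundary)."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [hot] * n
    return grid


def relax_step(grid):
    """One sweep: every interior point becomes the mean of its four neighbors.

    All reads come from the old grid and all writes go to a new grid, so the
    updates within one sweep are independent of each other and could be
    divided among processors.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new
```

A finer grid (larger `n`) means more points per sweep, which is where the extra accuracy and the extra work both come from.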
5
Case Study 2: Simulating Galaxy Evolution
Simulate interactions of many stars evolving over time
Computing forces is expensive
$O(n^2)$ brute force approach
Hierarchical methods take advantage of force law: $G m_1 m_2 / r^2$
[Figure: star on which forces are being computed; star too close to approximate; small group far enough away to approximate to center of mass; large group far enough away to approximate]
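The brute-force approach and the center-of-mass approximation can be sketched as follows. This is an illustration under stated assumptions, not the simulation's actual code: bodies are 2D tuples `(mass, x, y)`, `G` is set to 1 in arbitrary units, and the function names are hypothetical.

```python
import math

G = 1.0  # gravitational constant in arbitrary units (assumption for illustration)


def force_on(target, others):
    """Sum the 2D force G*m1*m2/r^2 on `target` from every body in `others`."""
    m1, x1, y1 = target
    fx = fy = 0.0
    for m2, x2, y2 in others:
        dx, dy = x2 - x1, y2 - y1
        r2 = dx * dx + dy * dy
        r = math.sqrt(r2)
        f = G * m1 * m2 / r2   # magnitude from the force law
        fx += f * dx / r       # project onto the x and y axes
        fy += f * dy / r
    return fx, fy


def all_forces(bodies):
    """Brute force: each of n bodies interacts with the other n-1, so O(n^2) work."""
    return [force_on(b, bodies[:i] + bodies[i + 1:])
            for i, b in enumerate(bodies)]


def center_of_mass(group):
    """Replace a group of bodies by one body at its center of mass."""
    total = sum(m for m, _, _ in group)
    cx = sum(m * x for m, x, _ in group) / total
    cy = sum(m * y for m, _, y in group) / total
    return (total, cx, cy)
```

For a group that is far from the target star, `force_on(star, [center_of_mass(group)])` is close to `force_on(star, group)` while costing one interaction instead of one per group member. Hierarchical methods exploit exactly that saving.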
6
Case Study 2: Barnes-Hut
Many time steps, plenty of concurrency across stars
Locality Goal
Particles close together in space should be on same processor
Difficulties: Non-uniform, dynamically changing
[Figure: Spatial Domain; Quad-tree]
7
Case Study 3: Rendering by Ray Tracing
Goal is to produce image from representation of real world
Shoot rays into scene through pixels in projection plane
Result is color for pixel
Rays shot through pixels in projection plane are called primary rays
Reflect and refract when they hit objects
Recursive process generates ray tree per primary ray
Tradeoffs between execution time and image quality
[Figure: viewpoint; projection plane; 3D scene; ray from viewpoint to upper right corner pixel; dynamically generated ray]
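The time-versus-quality tradeoff shows up in the size of the ray tree. The sketch below counts the rays generated for one primary ray under a worst-case assumption that is not stated in the slides: every ray hits an object and spawns exactly one reflected and one refracted ray, until a recursion limit `max_depth` (a hypothetical parameter) is reached.

```python
def ray_tree_size(max_depth):
    """Count the rays in the tree rooted at one primary ray.

    Assumes every ray hits a surface and spawns one reflected and one
    refracted ray, down to `max_depth` levels of recursion.
    """
    def trace(depth):
        if depth == max_depth:
            return 1  # this ray is traced but spawns no children
        # this ray, plus the subtree of its reflected ray and of its refracted ray
        return 1 + trace(depth + 1) + trace(depth + 1)

    return trace(0)
```

Under this assumption the count is 2^(max_depth + 1) - 1, so each extra level of recursion roughly doubles the work per pixel. In a real scene many rays miss everything, which is one reason the cost per pixel varies so much.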
8
Partitioning
Need dynamic assignment
Use contiguous blocks to exploit spatial coherence among neighboring rays, plus tiles for task stealing
A block, the unit of assignment
A tile, the unit of decomposition and stealing
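The block-and-tile scheme can be sketched with one work queue per processor. This is a simplified, sequential illustration rather than the implementation behind the slides: a "block" here is a contiguous run of tiles in row-major order (a stand-in for a true 2D block), the victim-selection rule is an assumption, and all function names are hypothetical.

```python
from collections import deque


def make_tiles(width, height, tile):
    """Split a width x height image into tile x tile squares.

    Each tile is (x0, y0, x1, y1) with exclusive upper bounds; tiles on the
    right and bottom edges may be smaller.
    """
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in range(0, height, tile)
            for x in range(0, width, tile)]


def assign_blocks(tiles, num_procs):
    """Initial assignment: each processor gets a contiguous run of tiles."""
    per = -(-len(tiles) // num_procs)  # ceiling division
    return [deque(tiles[p * per:(p + 1) * per]) for p in range(num_procs)]


def next_tile(queues, me):
    """Dynamic assignment: take own work first, otherwise steal a tile."""
    if queues[me]:
        return queues[me].popleft()   # front of own queue keeps neighboring rays together
    victim = max(range(len(queues)), key=lambda p: len(queues[p]))
    if queues[victim]:
        return queues[victim].pop()   # steal from the back of the fullest queue
    return None                       # no work left anywhere
```

Stealing from the opposite end of the victim's queue leaves the victim working on the tiles nearest to the ones it has just finished, which preserves some of the spatial coherence that the contiguous blocks were chosen for.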
9
Sample Speedups
Speedups on NUMA multiprocessor
Speedup = (best) time on 1 processor / time on multiple processors
10
Ideal/Linear Speedup? Amdahl’s Law
If a fraction s of a computation is not parallelizable, then the best achievable speedup is
$$S = \frac{1}{s}$$
[Figure: Speedup for computations with fraction s of sequential work. x-axis: Number of processors (1 to 100); y-axis: Speedup (0 to 100); one curve each for s = 0, 0.01, 0.025, 0.05, 0.1, 0.2]
With N processors:
$$S = \frac{1}{s + (1 - s)/N}$$
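As a quick numeric check, Amdahl's Law in its usual form, S(N) = 1 / (s + (1 - s) / N) for sequential fraction s and N processors, can be evaluated directly. The function names below are illustrative, not from the slides.

```python
def amdahl_speedup(s, n):
    """Speedup on n processors when a fraction s of the work is sequential."""
    return 1.0 / (s + (1.0 - s) / n)


def max_speedup(s):
    """Limit of amdahl_speedup(s, n) as n grows without bound: 1/s."""
    return float("inf") if s == 0 else 1.0 / s


if __name__ == "__main__":
    # Tabulate the same sequential fractions as the plot's curves.
    for s in (0, 0.01, 0.025, 0.05, 0.1, 0.2):
        row = [round(amdahl_speedup(s, n), 1) for n in (1, 10, 100)]
        print(f"s={s}: N=1,10,100 -> {row}, limit {max_speedup(s)}")
```

Only s = 0 gives linear speedup. With s = 0.1, for example, the speedup can never exceed 10 no matter how many processors are added.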
11
Pictorial Depiction of Amdahl’s Law
[Figure: bar diagram along a Time axis, split into parallelizable work and sequential work; labels: 1, p, 1]
12
But Goal is not just Performance
At some point, we’re willing to trade some performance for:
Ease of programming
High portability
Low cost
Ease of programming & high portability
Parallel programming for the masses
Leverage new or faster hardware asap
Low cost
High-end parallel machines are expensive resources
13
Parallel Applications
Scientific computing not the only class of parallel applications
Examples of non-scientific parallel applications:
Data mining
Real-time rendering
Distributed servers
Today, programmers are encouraged to find parallelism in all sorts of software