Advanced Topics
Chapter 17
Parallelism
CS250 -- Part V
Dr. Rajesh Subramanyan, 2005
Topics
•Introduction
•Characterizations of Parallelism
•Microscopic vs. Macroscopic
•Symmetric vs. Asymmetric
•Fine-Grain vs. Coarse-Grain
•Explicit vs. Implicit
•Parallel Architectures
•SISD
•SIMD
Topics
•MIMD
•Symmetric Multiprocessors
•Communication, Coordination, Contention
•Performance of Multiprocessors
•Consequence for Programmers
•Locks and Mutual Exclusion
•Programming Explicit and Implicit Parallel Computers
•Redundant Parallel Architectures
•Distributed and Cluster Computers
Introduction
•Previous parts covered
−processors
−memory
−I/O
•Now
−advanced topics
−use of parallelism to increase speed
−parallel hardware, taxonomy, limitations etc.
•Parallel and pipelined architectures
−the two fundamental techniques for increasing speed
Characterizations of Parallelism
•Parallel and non-parallel are the extremes
•The grey region in between describes the amount and type of parallelism
−microscopic vs. macroscopic
−symmetric vs. asymmetric
−fine-grain vs. coarse-grain
−explicit vs. implicit
Microscopic vs. Macroscopic
•Microscopic
−parallelism in a computer remains hidden inside subcomponents
−parallel hardware within a specific component
•Macroscopic
−parallelism is a basic premise around which the system is designed
Examples
•Microscopic
−ALU, registers, physical memory, parallel bus architecture
•Macroscopic
−use parallelism across multiple large-scale components of a computer system, e.g. multiple identical processors or multiple dissimilar processors
Symmetric vs. Asymmetric
•Symmetric
−replication of identical elements
−e.g. dual processors
•Asymmetric
−multiple elements that function at the same time but differ from each other
−e.g. PC with graphics coprocessor and math coprocessor
Fine-Grain vs. Coarse-Grain
•Fine-grain
−parallelism at the level of individual instructions or individual data elements
−e.g. graphics processor that uses 16 hardware units to update 16 bytes of an image at the same time
•Coarse-grain
−parallelism at the level of programs or large blocks of data
−e.g. dual processor, using one processor each to print and to compose an email simultaneously
Explicit vs. Implicit
•Implicit
−hardware handles parallelism automatically without requiring a programmer to initiate or control parallel execution
•Explicit
−programmer must control each parallel unit
Parallel Architectures
•Parallel architectures: parallelism is one of the central features around which the entire system is designed
•Flynn's classification
Name   Meaning
SISD   Single Instruction Single Data stream
SIMD   Single Instruction Multiple Data streams
MIMD   Multiple Instructions Multiple Data streams
Terminology used to characterize computers according to the amount and type of parallelism. Note that hybrids that span multiple types also exist.
SISD
•Single Instruction Single Data stream
•Also called sequential or uniprocessor architecture
•Follows the Von Neumann architecture
•Does not support macroscopic parallelism, but can use parallelism internally
SIMD
•Single Instruction stream, Multiple Data streams
•Each instruction specifies a single operation applied to many data items simultaneously
•Vector, array processors
    for i from 1 to N {
        V[i] ← V[i] × Q;
    }
Algorithm for vector normalization in a sequential computer.
•Same in a SIMD computer
    V ← V × Q
•Graphics processors
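The contrast between the sequential loop and the single vector operation can be sketched in Python. This is only an illustration of the programmer's view: plain Python does not issue SIMD instructions, so the hypothetical helper `vector_scale` merely stands in for one `V ← V × Q` instruction. The function names and the sample values of `V` and `Q` are assumptions, not part of the slides.

```python
def scale_sequential(v, q):
    """Sequential (SISD-style) version: one multiply per loop iteration."""
    result = list(v)
    for i in range(len(result)):
        result[i] = result[i] * q
    return result


def vector_scale(v, q):
    """Stand-in for a single SIMD instruction V <- V x Q.

    On real SIMD hardware the elements would be multiplied at the same
    time; here the comprehension only models one operation that is
    applied to the whole vector.
    """
    return [x * q for x in v]


# Sample data chosen only for illustration.
V = [1.0, 2.0, 4.0, 8.0]
Q = 0.5
print(scale_sequential(V, Q))
print(vector_scale(V, Q))
```

Both functions compute the same result; the difference on SIMD hardware is that the second form maps to one instruction rather than N loop iterations.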
MIMD
•Multiple Instruction streams Multiple Data streams
•Each processor performs independent computations at the same time
•Examples of MIMD
−symmetric multiprocessors (SMP)
−asymmetric multiprocessors (AMP)
−math and graphics coprocessors
−I/O processors
−CDC peripheral processors
Symmetric Multiprocessors
[Figure: processors P1, P2, ..., Pi, Pi+1, Pi+2, ..., PN connected to main memory (various modules) and devices]
Symmetric multiprocessor with N identical processors. Each processor has access to memory and I/O devices.
Performance of Multiprocessors
•Speedup = execution time on a single processor / execution time required on a multiprocessor
[Figure: plot of speedup versus number of processors; surviving diagram labels: CPU, I/O devices, PP1 to PP10]
Illustration of the ideal and typical performance of a multiprocessor as the number of processors is increased. Values on the y-axis list the relative speedup compared to a single processor.
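As a worked instance of the speedup ratio, the short sketch below divides a single-processor time by a multiprocessor time. The timings are hypothetical values chosen only for illustration, and the function name `speedup` is an assumption.

```python
def speedup(time_single, time_multi):
    """Speedup = time on a single processor / time on a multiprocessor."""
    if time_multi <= 0:
        raise ValueError("execution time must be positive")
    return time_single / time_multi


# Hypothetical measurements in seconds, not taken from the slides.
t_single = 120.0
t_four_processors = 40.0

s = speedup(t_single, t_four_processors)
print(f"speedup on 4 processors: {s:.1f} (ideal would be 4.0)")
```

With these made-up numbers the four-processor machine is three times as fast, which falls short of the ideal factor of four.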
Performance of Multiprocessors
•Reasons why multiprocessor architectures have not fulfilled the promise of scalable, high-performance computing
−OS bottlenecks
−memory contention
−I/O bottlenecks
When used for general-purpose computing, a multiprocessor does not perform well. In some cases, added overhead means performance decreases as more processors are added.
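One way to see how overhead can make performance fall as processors are added is a toy cost model. The model below is an assumption made purely for illustration and does not come from the slides: the work divides evenly across N processors, and each extra processor adds a fixed coordination cost. The names `modeled_time`, `modeled_speedup`, `work`, and `overhead_per_extra` are hypothetical.

```python
def modeled_time(n, work=100.0, overhead_per_extra=2.0):
    """Toy model: perfectly divided work plus a cost per extra processor."""
    return work / n + overhead_per_extra * (n - 1)


def modeled_speedup(n, work=100.0, overhead_per_extra=2.0):
    """Speedup of n processors relative to one, under the toy model."""
    single = modeled_time(1, work, overhead_per_extra)
    return single / modeled_time(n, work, overhead_per_extra)


for n in (1, 2, 4, 8, 16):
    print(n, round(modeled_speedup(n), 2))
```

Under these assumed parameters the modeled speedup rises up to 8 processors and then drops at 16, because the overhead term grows faster than the divided work shrinks.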
Consequence for Programmers
•Locks and mutual exclusion
•Programming explicit and implicit parallel computers
•Programming symmetric and asymmetric multiprocessors
Locks and Mutual Exclusion
•Shared variables in multiprocessors face the following problem.
−What happens if two processors attempt to increment the same variable at the same time?
−The variable may be incremented only once instead of the correct two times.
•One solution: locks
Locks and Mutual Exclusion
    load   x, R5
    incr   R5
    store  R5, x
An example sequence of machine instructions used to increment a variable in memory. In most architectures, increment entails a load and store.
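The lost increment can be reproduced deterministically by simulating two processors that each run the load, incr, store sequence. The sketch below is a simulation written for illustration, not real multiprocessor code; the function `run`, the `schedule` list, and the per-processor register model are assumptions.

```python
def run(schedule, x=0):
    """Execute load/incr/store for two processors in the given order.

    `schedule` is a list of processor ids (0 or 1); each appearance lets
    that processor execute its next instruction.
    """
    memory = {"x": x}
    registers = [0, 0]                 # one private R5 per processor
    program = ["load", "incr", "store"]
    pc = [0, 0]                        # next instruction per processor
    for p in schedule:
        op = program[pc[p]]
        if op == "load":
            registers[p] = memory["x"]     # load x, R5
        elif op == "incr":
            registers[p] += 1              # incr R5
        else:
            memory["x"] = registers[p]     # store R5, x
        pc[p] += 1
    return memory["x"]


one_after_other = [0, 0, 0, 1, 1, 1]   # processor 0 finishes first
interleaved = [0, 1, 0, 1, 0, 1]       # both load before either stores

print(run(one_after_other))   # both increments take effect
print(run(interleaved))       # one increment is lost
```

In the interleaved schedule both processors load the old value before either stores, so the second store overwrites the first with the same result.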
Locks
    lock     17
    load     x, R5
    incr     R5
    store    R5, x
    release  17
Illustration of the instructions used to guarantee exclusive access to a variable. A separate lock is assigned to each shared item.
•Only one copy of a lock exists, so only one processor can gain access to the lock at a time.
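The same lock, update, release pattern can be sketched with software threads. This example uses Python's standard `threading.Lock` as a stand-in for the hardware lock; threads here model the processors only loosely, and the names `counter`, `counter_lock`, `increment_many`, and `run_threads` are assumptions.

```python
import threading

counter = 0
counter_lock = threading.Lock()    # plays the role of "lock 17"


def increment_many(times):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:         # acquire the lock
            counter += 1           # load, incr, store on the shared item
        # the lock is released on leaving the with-block


def run_threads(n_threads=4, times=10000):
    """Start n_threads workers and return the final counter value."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=increment_many, args=(times,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(run_threads())
```

Because every update happens while the lock is held, the final count equals the number of threads times the number of increments per thread.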
Locks
•Problems
−programmers forget to apply locks to shared variables
−errors due to simultaneous access depend on timing and are hard to detect
−locking can reduce performance
−locking adds overhead
Programming Explicit and Implicit Parallel Computers
From a programmer's point of view, a system that uses explicit parallelism is significantly more complex to program than a system that uses implicit parallelism.
Redundant Parallel Architectures
•Use of parallel hardware to improve reliability and prevent failure
•Multiple copies of a hardware unit that operate in parallel to perform an operation (same data and same operation on all units)
•What happens if redundant copies of hardware disagree?
−votes?
−error message.
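The two responses to disagreement, voting and reporting an error, can be combined in a small sketch. This is one possible policy assumed for illustration (strict majority wins, otherwise an error), not a description of any particular hardware; the function name `vote` is hypothetical.

```python
from collections import Counter


def vote(outputs):
    """Return the value produced by a strict majority of redundant units.

    Raises ValueError (the "error message" case) when no value is
    produced by more than half of the units.
    """
    if not outputs:
        raise ValueError("no outputs to compare")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 > len(outputs):
        return value
    raise ValueError(f"redundant units disagree: {outputs}")


# Three hypothetical units; one produces a faulty result.
print(vote([42, 42, 41]))
```

With three units, a single faulty copy is outvoted; if all three differ, the sketch falls back to raising an error.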
Distributed and Cluster Computers
•Tightly coupled
−parallel hardware units are located inside the same computer system
•Loosely coupled
−multiple computer systems that are interconnected by a communication mechanism
−e.g. distributed architectures, cluster computers
Distributed and Cluster Computers
•Distributed architecture
−set of computers connected by a computer network or the Internet
•Cluster computer
−set of independent computers connected by a high-speed computer network that are all dedicated to solving one problem at a time
−also called network cluster
Summary
•Parallelism is one of the fundamental optimization techniques used to increase hardware performance
•Explicit parallelism gives a programmer control over the use of parallel facilities; implicit parallelism handles parallelism automatically
•SIMD architecture allows an instruction to operate on an array of values
−vector processors
−graphics processors
Summary
•MIMD employs multiple, independent processors that operate simultaneously and can each execute a separate program.
−symmetric multiprocessors
−asymmetric multiprocessors
•Speedup
−theoretically, a machine with N processors should perform N times as fast as a single processor
−in practice, speedup does not increase linearly with the number of processors due to memory contention and communication overhead.
•Alternatives to SIMD and MIMD are redundant, distributed, and cluster architectures