Anshul Kumar, CSE IITD
CSL718 : Multiprocessors
Interconnection Mechanisms
Performance Models
20th April, 2006

Connecting Processors and Memories
• Shared Buses
• Interconnection Networks
  – Static Networks
  – Dynamic Networks
[Figure: processors (P) and memory modules (M) attached to an Interconnection Network; one diagram is labeled Global Interconnection Network.]

Shared Bus
Each processor sees this picture: alternating periods of processing and bus access.
bus utilization $\rho = \dfrac{\text{bus transaction time}}{\text{processing time} + \text{bus transaction time}}$
prob of a processor using the bus = $\rho$
prob of a processor not using the bus = $1 - \rho$
prob of none of the n processors using the bus = $(1 - \rho)^n$
prob of at least one processor using the bus = $1 - (1 - \rho)^n$
achieved BW on a relative scale = $1 - (1 - \rho)^n$
required BW = $n\rho$
available BW = 1
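The shared-bus estimate can be checked numerically. The sketch below assumes the symbol $\rho$ for the probability that one processor is using the bus, and evaluates the achieved bandwidth $1 - (1 - \rho)^n$ next to the required bandwidth $n\rho$. The function name `bus_bandwidth` is only illustrative.

```python
def bus_bandwidth(rho: float, n: int) -> float:
    """Achieved bus bandwidth on a relative scale (available BW = 1).

    rho: probability that a single processor is using the bus
    n:   number of processors sharing the bus
    """
    # Probability that at least one of the n processors uses the bus.
    return 1.0 - (1.0 - rho) ** n


if __name__ == "__main__":
    rho = 0.3  # illustrative request probability
    for n in (2, 3, 4):
        required = n * rho
        achieved = bus_bandwidth(rho, n)
        print(f"n={n}: required={required:.3f} achieved={achieved:.3f}")
```

The achieved value never exceeds 1, while the required value grows linearly with n, which is the saturation effect described above.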
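
Effect of re-submitted requests
Markov graph with two states, A (prob = $q_A$) and W (prob = $q_W$); W is the state in which a rejected request waits to be re-submitted.
Transition probabilities:
A → A : $1 - \rho + \rho P_A$
A → W : $\rho\,(1 - P_A)$
W → A : $P_A$
W → W : $1 - P_A$
Steady state:
$q_A = \dfrac{P_A}{P_A + \rho\,(1 - P_A)}, \qquad q_W = \dfrac{\rho\,(1 - P_A)}{P_A + \rho\,(1 - P_A)}$
actual request rate $\rho_a = \rho\,q_A + q_W = \dfrac{\rho}{\rho + P_A\,(1 - \rho)}$
$BW = 1 - (1 - \rho_a)^n$, also $BW = n\,\rho_a\,P_A$
$P_A = \dfrac{1 - (1 - \rho_a)^n}{n\,\rho_a}$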
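The actual request rate and the acceptance probability depend on each other, so one way to evaluate them is a fixed-point iteration. The sketch below assumes the relations $\rho_a = \rho / (\rho + P_A(1-\rho))$ and $P_A = (1 - (1-\rho_a)^n)/(n\rho_a)$, with $\rho$ the request probability without re-submission. The function name, tolerance and iteration cap are illustrative choices, not part of the slides.

```python
def resubmission_model(rho: float, n: int, tol: float = 1e-12, max_iter: int = 10_000):
    """Iterate the shared-bus re-submission model to a fixed point.

    Returns (rho_a, p_a, bw): actual request rate, acceptance
    probability, and achieved bandwidth on a relative scale.
    """
    if not 0.0 < rho <= 1.0:
        raise ValueError("rho must be in (0, 1]")
    rho_a = rho  # start from the rate without re-submission
    for _ in range(max_iter):
        # Acceptance probability at the current actual request rate.
        p_a = (1.0 - (1.0 - rho_a) ** n) / (n * rho_a)
        # Revised request rate from the two-state (A, W) Markov graph.
        new_rho_a = rho / (rho + p_a * (1.0 - rho))
        converged = abs(new_rho_a - rho_a) < tol
        rho_a = new_rho_a
        if converged:
            break
    bw = 1.0 - (1.0 - rho_a) ** n
    p_a = bw / (n * rho_a)
    return rho_a, p_a, bw


if __name__ == "__main__":
    for n in (2, 3, 4):
        rho_a, p_a, bw = resubmission_model(0.5, n)
        print(f"n={n}: rho_a={rho_a:.4f} P_A={p_a:.4f} BW={bw:.4f}")
```

For n = 2 the fixed point can be solved by hand ($\rho_a^2 - 4\rho_a + 2 = 0$ when $\rho = 0.5$), which gives a simple check on the iteration.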
[Figure: "Shared Bus : BW per proc". x-axis: BW required (req probability), 0 to 1; y-axis: BW achieved; two sets of curves for n = 2, 3, 4.]
[Figure: "Shared Bus : utilization". x-axis: req probability, 0 to 1; y-axis: utilization; two sets of curves for n = 2, 3, 4.]

Waiting time
if a request is rejected i times and accepted on the (i+1)th attempt, waiting time $= i\,T_{bus}$
probability of this $= (1 - P_A)^i\,P_A$
Expected value of waiting time:
$T_w = \sum_{i=0}^{\infty} i\,T_{bus}\,(1 - P_A)^i\,P_A = T_{bus}\,P_A \sum_{i=0}^{\infty} i\,(1 - P_A)^i = T_{bus}\,\dfrac{1 - P_A}{P_A}$
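The expected waiting time is the mean of a geometric number of rejections. The sketch below compares a truncated version of the sum $\sum_i i\,T_{bus}(1-P_A)^i P_A$ with the closed form $T_{bus}(1-P_A)/P_A$. The function names and the truncation length are assumptions made for illustration.

```python
def expected_wait_series(p_a: float, t_bus: float = 1.0, terms: int = 10_000) -> float:
    """Truncated sum over i rejections followed by one acceptance."""
    return sum(i * t_bus * (1.0 - p_a) ** i * p_a for i in range(terms))


def expected_wait_closed(p_a: float, t_bus: float = 1.0) -> float:
    """Closed form of the same expectation: T_bus * (1 - P_A) / P_A."""
    return t_bus * (1.0 - p_a) / p_a


if __name__ == "__main__":
    for p_a in (0.25, 0.5, 0.9):
        print(p_a, expected_wait_series(p_a), expected_wait_closed(p_a))
```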

Switched Networks
BUS
• Shared media
• Lower cost
• Lower throughput
• Scalability poor
Switched Network
• Switched paths
• Higher cost
• Higher throughput
• Scalability better

Interconnection Networks
• Topology : who is connected to whom
• Direct / Indirect : where is switching done
• Static / Dynamic : when is switching done
• Circuit switching / packet switching : how are connections established
• Store & forward / worm hole routing : how is the path determined
• Centralized / distributed : how is switching controlled
• Synchronous/asynchronous : mode of operation

Direct and Indirect Networks
[Figure: DIRECT network: each node combines processor, memory and switch (P, M, S), and nodes are joined to one another by links. INDIRECT network: each node holds a processor and memory (P, M) and is joined by a link to a separate SWITCH.]

Static and Dynamic Networks
• Static Networks
  – fixed point to point connections
  – usually direct
  – each node pair may not have a direct connection
  – routing through nodes
• Dynamic Networks
  – connections established as per need
  – usually indirect
  – path can be established between any pair of nodes
  – routing through switches

Static Network Topologies
Linear
Star
2D-Mesh
Tree
Non-uniform connectivity

Static Network Topologies - contd.
Ring
Fully Connected
Torus
Uniform connectivity

Illiac IV Mesh Network
[Figure: 3 × 3 mesh with nodes 0 to 8 arranged in rows (0 1 2 / 3 4 5 / 6 7 8), redrawn as a ring of nodes 0 to 8 with chords.]
neighbors of node r : (r ± 1) mod 9 and (r ± 3) mod 9
Chordal Ring
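The neighbor rule can be written directly as modular arithmetic. The sketch below assumes the rule is (r ± 1) mod N and (r ± side) mod N for a side × side mesh with N = side², which reduces to ±1 and ±3 mod 9 for the 9-node example. The function name `illiac_neighbors` is illustrative.

```python
def illiac_neighbors(r: int, side: int = 3) -> list[int]:
    """Neighbors of node r in an Illiac-style mesh with side*side nodes."""
    n = side * side
    # +/-1 gives the ring links, +/-side gives the chords (column links).
    return sorted({(r + 1) % n, (r - 1) % n, (r + side) % n, (r - side) % n})


if __name__ == "__main__":
    for node in range(9):
        print(node, illiac_neighbors(node))
```

Because every node uses the same offsets, each node has the same degree, and the structure is equivalent to a ring with chords of length `side`.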

Dynamic Networks
k × k cross-bar switch: building block for multi-stage dynamic networks
2 × 2 switch: simplest cross-bar
2 × 2 switch settings: straight, exchange, upper broadcast, lower broadcast

Baseline Network
[Figure: baseline network with 8 inputs labeled 000 to 111 and 8 outputs labeled 000 to 111.]
blocking can occur

Switching Mechanism
• Circuit Switching (connection oriented communication)
  – A circuit is established between the source and the destination
• Packet Switching (connectionless communication)
  – Information is divided into packets and each packet is sent independently from node to node
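
Routing in Networks
[Figure: a message (header + payload/data) moving from the incoming to the outgoing link of a node over time, under store & forward routing and under worm hole routing; the transfer times H/BW and l/BW are marked.]
store & forward routing: latency $= n\left(\dfrac{H}{BW} + \dfrac{l}{BW}\right)$
worm hole routing: latency $= n\,\dfrac{H}{BW} + \dfrac{l}{BW}$
where H is the header length, l is the payload length, and n is the number of hops.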
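A small calculation makes the difference between the two routing schemes concrete. The sketch below assumes the standard forms: store and forward pays the full message time (H + l)/BW at every hop, while worm hole routing pays only the header time H/BW per hop plus one payload time l/BW. Parameter names and the sample values are illustrative.

```python
def store_and_forward_latency(n_hops: int, header: float, payload: float, bw: float) -> float:
    """Whole message (header + payload) is received before being forwarded."""
    return n_hops * (header + payload) / bw


def wormhole_latency(n_hops: int, header: float, payload: float, bw: float) -> float:
    """Only the header is delayed per hop; the payload streams behind it."""
    return n_hops * header / bw + payload / bw


if __name__ == "__main__":
    # Illustrative values: 4 hops, 8-byte header, 120-byte payload, 8 bytes/cycle.
    print(store_and_forward_latency(4, 8, 120, 8))  # 64.0 cycles
    print(wormhole_latency(4, 8, 120, 8))           # 19.0 cycles
```

The gap widens with the number of hops, because only the store and forward term multiplies the payload by n.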

Routing in presence of congestion
• Worm hole routing
  – When message header is blocked, many links get blocked with the message
• Solution: cut-through routing
  – When message header is blocked, tail is allowed to move, compressing the message into a single node

Routing Options
• Deterministic routing: always same path followed
• Adaptive routing: best path selected to minimize congestion
• Source based routing: message specifies path to destination
• Destination based routing: message specifies only destination address

Some Performance Parameters
[Figure: timeline for sender and receiver. Sender: overhead, then Tx time = bytes/BW. Receiver: after the time of flight, Tx time = bytes/BW, then overhead. Transport latency spans the time of flight plus the Tx time; total latency also includes the sender and receiver overheads.]
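The quantities named on this slide combine additively. The sketch below assumes the usual decomposition, in which transport latency is time of flight plus transmission time (bytes/BW), and total latency adds the sender and receiver overheads. Function and parameter names are illustrative.

```python
def transport_latency(time_of_flight: float, nbytes: float, bw: float) -> float:
    """Time of flight plus transmission time (bytes / bandwidth)."""
    return time_of_flight + nbytes / bw


def total_latency(sender_overhead: float, time_of_flight: float,
                  nbytes: float, bw: float, receiver_overhead: float) -> float:
    """Transport latency plus the overheads at both ends."""
    return sender_overhead + transport_latency(time_of_flight, nbytes, bw) + receiver_overhead


if __name__ == "__main__":
    # Illustrative numbers in microseconds and bytes per microsecond.
    print(transport_latency(0.5, 1000, 1000.0))
    print(total_latency(1.0, 0.5, 1000, 1000.0, 1.0))
```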

Other Parameters
• Throughput Bandwidth (no credit for header)
• Bisection bandwidth = BW across a bisection
• Node degree
• Network Diameter
• Cost
• Fault Tolerance

Multidimensional Grid/Mesh
k-ary n-cube
Size $= k \times k \times \dots \times k$ (n times) $= k^n$
Diameter $= (k-1)\,n$ without end around connections
Diameter $= k\,n/2$ with end around connections
for (Binary) Hypercube : k = 2
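Size and diameter of a k-ary n-cube follow directly from k and n. The sketch below assumes size $k^n$, diameter $(k-1)n$ without end around connections, and $n\lfloor k/2 \rfloor$ with them; the floor is an assumption added here so that odd k gives an integer, and it equals $kn/2$ for even k. Function names are illustrative.

```python
def cube_size(k: int, n: int) -> int:
    """Number of nodes in a k-ary n-cube."""
    return k ** n


def cube_diameter(k: int, n: int, wraparound: bool) -> int:
    """Largest hop count between any two nodes.

    Without end around connections each dimension may need k - 1 hops;
    with them, at most floor(k / 2) hops per dimension are needed.
    """
    per_dimension = k // 2 if wraparound else k - 1
    return n * per_dimension


if __name__ == "__main__":
    print(cube_size(4, 3), cube_diameter(4, 3, False), cube_diameter(4, 3, True))
    print(cube_size(2, 5), cube_diameter(2, 5, False))  # binary hypercube
```

For the binary hypercube (k = 2) both expressions give a diameter of n.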

Grid/Mesh Performance - 1
Message arrival rate $\lambda = n\,k_d\,r$
$n$ is the number of dimensions
$k_d$ is the av. no. of hops along one dimension
$r$ is the prob of a message req in a cycle

Grid/Mesh Performance - 2
Service rate $\mu = \dfrac{2n}{T_s}$
Server occupancy $\rho = \dfrac{\lambda}{\mu} = \dfrac{r\,k_d\,T_s}{2}$
Probability of request along a link $p = \dfrac{\lambda}{2n}$
Grid/Mesh Performance - 3Grid/Mesh Performance - 3Grid/Mesh Performance - 3Grid/Mesh Performance - 3
k-ary n-cube
sw
w
Tpp
T
D
T
)1(2)1(2
)(1
model queueopen 1//M use
node aat time waiting
B

Switch Performance
k × m cross-bar switch
Let r = prob of request at an input port during one service cycle T.
Here it is assumed that each message (or packet) requires the same service time.
prob of simultaneous requests on i ports: $q(i) = \binom{k}{i}\,r^i\,(1 - r)^{k-i}$
expected no. of requests accepted out of i requests: $E(i) = m\left[1 - \left(\dfrac{m-1}{m}\right)^{i}\right]$
where m = num of output ports and $1 - \left(\dfrac{m-1}{m}\right)^{i}$ = fraction of address patterns including a specific output port

Switch Performance – contd.
Expected BW (on relative scale)
$= \sum_{i=0}^{k} E(i)\,q(i)$
$= \sum_{i=0}^{k} m\left[1 - \left(\dfrac{m-1}{m}\right)^{i}\right]\binom{k}{i}\,r^i\,(1 - r)^{k-i}$
$= m\sum_{i=0}^{k}\binom{k}{i}\,r^i\,(1 - r)^{k-i} \;-\; m\sum_{i=0}^{k}\binom{k}{i}\left(\dfrac{(m-1)\,r}{m}\right)^{i}(1 - r)^{k-i}$
$= m - m\left(1 - r + \dfrac{(m-1)\,r}{m}\right)^{k}$
$= m - m\left(1 - \dfrac{r}{m}\right)^{k}$
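The closed form for the cross-bar bandwidth can be checked against the defining sum. The sketch below assumes $q(i) = \binom{k}{i} r^i (1-r)^{k-i}$ and $E(i) = m[1 - ((m-1)/m)^i]$, and compares $\sum_i E(i)\,q(i)$ with $m - m(1 - r/m)^k$. Function names and sample values are illustrative.

```python
from math import comb


def crossbar_bw_sum(k: int, m: int, r: float) -> float:
    """Expected BW of a k x m cross-bar as the sum over i simultaneous requests."""
    total = 0.0
    for i in range(k + 1):
        q_i = comb(k, i) * r ** i * (1.0 - r) ** (k - i)  # prob of i requests
        e_i = m * (1.0 - ((m - 1) / m) ** i)              # expected accepted
        total += e_i * q_i
    return total


def crossbar_bw_closed(k: int, m: int, r: float) -> float:
    """Closed form obtained with the binomial theorem: m - m (1 - r/m)^k."""
    return m - m * (1.0 - r / m) ** k


if __name__ == "__main__":
    for k, m, r in ((2, 2, 0.5), (8, 8, 0.5), (16, 4, 0.2)):
        print(k, m, r, crossbar_bw_sum(k, m, r), crossbar_bw_closed(k, m, r))
```

The two values agree to rounding error, and both stay below the requested bandwidth k·r and the number of output ports m.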

Switch Performance – contd.
Requested bandwidth = Expected BW (if there were no output port conflicts) $= k\,r$
Expected BW (because of output port conflicts) $= m - m\left(1 - \dfrac{r}{m}\right)^{k}$
this is less than $k\,r$ as well as $m$ (assuming that $r < 1$)
prob of acceptance of requests $P_A = \dfrac{BW}{k\,r}$
We now consider the effect of request re-submission due to conflicts.
We need to compute the revised request rate due to re-submission and also compute delays because of waiting.

Effect of re-submitted requests
actual request rate $r'$ (using Markov graph with states A and W):
$r' = r\,q_A + q_W = \dfrac{r}{r + P_A\,(1 - r)}$
$BW = m - m\left(1 - \dfrac{r'}{m}\right)^{k}$
$P_A = \dfrac{BW}{k\,r'}$
waiting time $T_w = \dfrac{(1 - P_A)\,T}{P_A}$
$T$ = cycle time $= \dfrac{H + l}{\text{BW of link}}$
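For the cross-bar, the revised request rate $r'$ and the acceptance probability $P_A$ can again be found by iterating until they stop changing. The sketch below assumes $r' = r/(r + P_A(1-r))$, $BW = m - m(1 - r'/m)^k$, $P_A = BW/(k r')$ and $T_w = T(1-P_A)/P_A$. The function name, default cycle time and convergence settings are illustrative.

```python
def crossbar_resubmission(r: float, k: int, m: int, t_cycle: float = 1.0,
                          tol: float = 1e-12, max_iter: int = 10_000):
    """Fixed-point solution of the k x m cross-bar model with re-submission.

    Returns (r_actual, p_a, bw, t_w).
    """
    if not 0.0 < r <= 1.0:
        raise ValueError("r must be in (0, 1]")
    r_actual = r  # initial guess: no re-submitted requests
    for _ in range(max_iter):
        bw = m - m * (1.0 - r_actual / m) ** k
        p_a = bw / (k * r_actual)
        new_r = r / (r + p_a * (1.0 - r))
        converged = abs(new_r - r_actual) < tol
        r_actual = new_r
        if converged:
            break
    bw = m - m * (1.0 - r_actual / m) ** k
    p_a = bw / (k * r_actual)
    t_w = t_cycle * (1.0 - p_a) / p_a  # mean wait caused by rejections
    return r_actual, p_a, bw, t_w


if __name__ == "__main__":
    print(crossbar_resubmission(0.5, 2, 2))
    print(crossbar_resubmission(0.5, 8, 8))
```

For k = m = 2 and r = 0.5 the fixed point satisfies $r'^2 - 8r' + 4 = 0$, which gives a hand-checkable value for the iteration.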

Effect of buffering
There are two possibilities
• Buffering before switching (k buffers, one at each input port)
• Buffering after switching (m buffers, one at each output port)

Switch with input buffers
Rate of messages at input and output of each queue is the same in steady state: r per cycle.
Service time includes delays due to conflicts, calculated as earlier: $\dfrac{(1 - P_A)\,T}{P_A}$. This has an exponential distribution (recall the analysis for a shared bus).
M/M/1 open queue model can be used to calculate queuing delay. Details are omitted.

Switch with output buffers
Here we assume that all the messages destined for the same output are queued in the same buffer, in some order. That is, there are no rejections and no re-submissions.
For each queue,
Messages arriving per service cycle = $\rho = \dfrac{k\,r}{m}$
Prob of a request coming from one of the k sources = $p = \dfrac{r}{m}$
Apply the $M_B/D/1$ model for finding queuing delay $T_w = \dfrac{(\rho - p)\,T}{2\,(1 - \rho)}$
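The output-buffer case reduces to plugging two ratios into one formula. The sketch below assumes occupancy $\rho = kr/m$, per-source probability $p = r/m$, and the $M_B/D/1$ waiting time $T_w = (\rho - p)\,T / (2(1-\rho))$, valid only while $\rho < 1$. The function name and the stability check are illustrative additions.

```python
def output_buffer_wait(k: int, m: int, r: float, t_service: float = 1.0) -> float:
    """Queuing delay at one output buffer of a k x m switch (M_B/D/1 form)."""
    rho = k * r / m  # messages arriving at this queue per service cycle
    p = r / m        # prob that one given source sends to this output
    if not 0.0 <= rho < 1.0:
        raise ValueError("queue is unstable unless k*r/m < 1")
    return (rho - p) * t_service / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    for r in (0.2, 0.5, 0.8):
        print(r, output_buffer_wait(4, 4, r))
```

With a single source (k = 1) the delay is zero, since one source can never contend with itself for the output.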