Interconnection Network
Transcript of lecture slides: Interconnection Network
SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong ([email protected])
Interconnection Network
Jinkyu Jeong ([email protected])
Computer Systems Laboratory
Sungkyunkwan University
http://csl.skku.edu
Topics
• Taxonomy
• Metrics
• Topologies
• Characteristics
  – Cost
  – Performance
Interconnection Networks
• Carry data between processors and to memory
• Components
  – Switches
  – Interfaces
  – Links (wires, fiber)
• Classifications
  – Static networks
    • Point-to-point communication links among processing nodes
    • A.k.a. direct networks
  – Dynamic networks
    • Built from switches and communication links
    • A.k.a. indirect networks
Static and Dynamic Networks
Figure: a static (direct) network and a dynamic (indirect) network
Dynamic Network Switches
• Map a fixed number of inputs to outputs
• Number of ports
  – The degree of the switch
• Switch cost
  – Grows as the square of the switch degree
  – Packaging cost grows linearly with the number of pins
Network Interfaces
• Link processors (or nodes) to the interconnect
• Functions
  – Packetizing communication data
  – Computing routing information
  – Buffering incoming/outgoing data
• Network interface connection
  – I/O bus: e.g., Peripheral Component Interconnect Express (PCIe)
  – Memory bus: e.g., AMD HyperTransport, Intel QuickPath
• Network performance
  – Depends on the relative speeds of the I/O and memory buses
Example: Intel QuickPath Interconnect
Network Topologies
• A variety of network topologies exist
• Topologies trade off performance against cost
• Commercial machines often implement hybrids of multiple topologies
  – Due to packaging, cost, and available components
Metrics for Interconnection Networks
• Degree
  – # of links per node
• Diameter
  – Longest distance between any two nodes in the network
  – Worst-case communication latency
• Bisection width
  – Minimum # of wire cuts needed to divide the network into two equal parts
• Cost
  – # of links and switches
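These metrics can be checked by brute force on small examples. The sketch below is my own illustration (not from the slides): it builds the adjacency list of an 8-node ring and computes its diameter with breadth-first search.

```python
from collections import deque

def ring(p):
    """Adjacency list of a p-node ring (each node links to its two neighbors)."""
    return {i: [(i - 1) % p, (i + 1) % p] for i in range(p)}

def diameter(adj):
    """Longest shortest-path distance over all node pairs (BFS from every node)."""
    longest = 0
    for src in adj:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        longest = max(longest, max(dist.values()))
    return longest

adj = ring(8)
print(len(adj[0]))    # degree: 2
print(diameter(adj))  # diameter: 4 (= p/2 for a ring)
```

The same BFS helper works for any of the topologies that follow, given their adjacency lists.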
Network Topologies: Buses
• All processors access a common bus for exchanging data
• Used in the simplest and earliest parallel machines
  – Ex. Sun Enterprise servers, Intel Pentium
• Advantages
  – Distance between any two nodes is O(1)
  – Provides a convenient broadcast medium
• Disadvantage
  – Bus bandwidth is a performance bottleneck

Figure: processors P … P attached to a single shared bus
Network Topologies: Buses
Figure: Interconnects in the Intel Pentium Pro Quad — P-Pro modules (CPU, bus interface, 256-KB L2 cache, interrupt controller) share a P-Pro bus (64-bit data, 36-bit address, 66 MHz); a memory controller and MIU drive 1-, 2-, or 4-way interleaved DRAM, and two PCI bridges connect PCI buses and PCI I/O cards. The figure contrasts bus-based interconnects without local caches and with local caches.
Network Topologies: Crossbars
A crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner.
Network Topologies: Crossbars
• Cost
  – O(p²) for p processors (and memory banks)
• Difficult to scale for large values of p
• Ex. Sun Ultra HPC 10000 and the Fujitsu VPP500
Multistage Networks
• Buses
  – Excellent cost scalability
  – Poor performance scalability
• Crossbars
  – Excellent performance scalability
  – Poor cost scalability
• Multistage interconnects
  – A compromise between these extremes
Multistage Networks
The schematic of a typical multistage interconnection network.
Multistage Omega Network
• Organization
  – log p stages
  – p inputs and outputs
• At each stage, input i is connected to output j (a perfect shuffle):
  – j = 2i, for 0 ≤ i ≤ p/2 − 1
  – j = 2i + 1 − p, for p/2 ≤ i ≤ p − 1
Multistage Omega Network
Each stage of the Omega network implements a perfect shuffle as follows:
A perfect shuffle interconnection for eight inputs and outputs
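The shuffle amounts to a left rotation of the input's log₂(p)-bit label. A minimal sketch (the function name `shuffle` is my own), assuming p is a power of two:

```python
def shuffle(i, p):
    """Perfect-shuffle output for input line i among p lines:
    j = 2i for i < p/2, else j = 2i + 1 - p
    (equivalently, left-rotate the log2(p)-bit representation of i)."""
    return 2 * i if i < p // 2 else 2 * i + 1 - p

print([shuffle(i, 8) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
```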
Multistage Omega Network
• The perfect shuffle patterns are connected using 2×2 switches.
• The switches operate in two modes
  – Pass-through
  – Cross-over
Multistage Omega Network
A complete omega network connecting eight inputs and eight outputs.
Cost: p/2 × log p switches → O(p log p)
Multistage Omega Network – Routing
• s is the source processor in binary representation
• d is the destination processor in binary representation
• At each stage
  – If the most significant bits of s and d are the same
    • Pass-through
  – Otherwise
    • Cross-over
  – Strip the most significant bits
• Repeat for each of the log p switching stages
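This destination-tag rule can be sketched in a few lines of Python (my own illustration; `omega_route` is a hypothetical helper name). It compares the bits of s and d from the most significant bit down, one per stage:

```python
def omega_route(s, d, p):
    """Per-stage switch settings for routing s -> d in a p-node Omega network."""
    stages = p.bit_length() - 1          # log2(p) stages, assuming p is a power of two
    settings = []
    for k in range(stages - 1, -1, -1):  # most significant bit first
        same = ((s >> k) & 1) == ((d >> k) & 1)
        settings.append("pass-through" if same else "cross-over")
    return settings

print(omega_route(0b001, 0b100, 8))
# ['cross-over', 'pass-through', 'cross-over']
```

Running it on the slides' 001 → 100 example reproduces the three switch settings.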
Multistage Omega Network – Routing
• Example: 001 → 100
  1. Stage 1: 0 != 1 → cross-over
  2. Stage 2: 0 == 0 → pass-through
  3. Stage 3: 1 != 0 → cross-over
Blocking in Omega Network
One of the messages (010 to 111 or 110 to 100) is blocked at link AB
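One way to see the contention (my own sketch; the wire-numbering convention is an assumption): with destination-tag routing, after stage i a message sits on the wire whose high bits are the low n−i bits of s and whose low bits are the first i bits of d. Computing these positions for both messages shows them claiming the same stage-1 output wire:

```python
def omega_path(s, d, n):
    """Wire positions a message occupies after each of the n = log2(p) stages."""
    rows = [s]
    for i in range(1, n + 1):
        low_s = s & ((1 << (n - i)) - 1)            # the n-i low-order bits of s
        rows.append((low_s << i) | (d >> (n - i)))  # first i bits of d on the right
    return rows

a = omega_path(0b010, 0b111, 3)
b = omega_path(0b110, 0b100, 3)
print(a)  # [2, 5, 3, 7]
print(b)  # [6, 5, 2, 4]
# Both messages occupy wire 5 after stage 1, so one of them is blocked there.
```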
Completely Connected Network
• Each processor is connected to every other processor
• Cost
  – # of links is p(p − 1)/2 = O(p²)
• Performance scales very well
• Hardware complexity is not realizable for large values of p
• Static counterpart of crossbars
Star Connected Network
• Every node is connected only to a common node at the center
• Distance between any pair of nodes
  – O(1)
  – But the central node becomes a bottleneck
• Static counterpart of buses
Linear Array & Ring
• Linear array
  – Each node has two neighbors, one to its left and one to its right
• Ring (or 1-D torus)
  – Formed when the nodes at either end are connected (a wrap-around link)
Meshes and k-dimensional Meshes
• Mesh
  – Generalization of the linear array to 2D
  – 4 neighbors (north, south, east, and west)
• k-dimensional mesh
  – 2k neighbors

Figure: a 2D mesh, a 2D torus, and a 3D mesh
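In the wraparound (torus) case, the shortest path along each axis can go in either direction. A small sketch of the resulting distance on a k × k 2-D torus (my own illustration, not from the slides):

```python
def torus_distance(a, b, k):
    """Shortest hop count between nodes a and b on a k x k 2-D torus."""
    (ax, ay), (bx, by) = a, b
    dx = min(abs(ax - bx), k - abs(ax - bx))  # wraparound link may be shorter
    dy = min(abs(ay - by), k - abs(ay - by))
    return dx + dy

print(torus_distance((0, 0), (3, 3), 4))  # 2: one wraparound hop per axis
```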
Hypercubes
Figure: hypercubes of dimension 0D, 1D, 2D, 3D, and 4D
Hypercubes
• Distance between any two nodes is at most log p
• Each node has log p neighbors
• Distance between two nodes
  – # of bit positions at which the two node labels differ
Tree-Based Networks
Figure: a static tree network and a dynamic tree network
Tree-Based Networks
• Distance between two nodes is at most 2 log p
• Easy to lay out as planar graphs
  – E.g., H-trees
• Root can become a bottleneck
  – Links closer to the root carry more traffic than those at lower levels
• Solution: fat tree
  – Fattens the links as we go up the tree

Figure: H-tree layout
Fat Tree
A fat tree network of 16 processing nodes.
Evaluating Interconnection Networks
• Diameter
  – Longest distance between two nodes
  – Measures the worst-case latency of possible communications
• Bisection width
  – Minimum # of wire cuts to divide the network into two equal parts
  – Measures the # of concurrent communications
• Cost
  – # of links or switches
  – Ability to lay out the network
  – Length of wires

Figure: two concurrent communications vs. four concurrent communications
Static Interconnection Networks
Network | Diameter | Bisection Width | Cost (# of links)
Completely-connected | 1 | p²/4 | p(p − 1)/2
Star | 2 | 1 | p − 1
Complete binary tree | 2 log((p + 1)/2) | 1 | p − 1
Linear array | p − 1 | 1 | p − 1
2-D mesh, no wraparound | 2(√p − 1) | √p | 2(p − √p)
2-D wraparound mesh | 2⌊√p/2⌋ | 2√p | 2p
Hypercube | log p | p/2 | (p log p)/2
Wraparound k-ary d-cube | d⌊k/2⌋ | 2k^(d−1) | dp
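As a sanity check (my own addition), plugging p = 16 into a few rows using the standard formulas from the referenced textbook gives concrete numbers:

```python
import math

p = 16
log_p = int(math.log2(p))

# (diameter, bisection width, # of links) for selected networks at p = 16
rows = {
    "completely-connected": (1, p * p // 4, p * (p - 1) // 2),
    "star": (2, 1, p - 1),
    "linear array": (p - 1, 1, p - 1),
    "hypercube": (log_p, p // 2, p * log_p // 2),
}
for name, (diam, bisection, links) in rows.items():
    print(f"{name:21s} diameter={diam:2d} bisection={bisection:2d} links={links:3d}")
```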
Dynamic Interconnection Networks
Network | Diameter | Bisection Width | Cost (# of switches)
Crossbar | 1 | p | p²
Omega network | log p | p/2 | (p/2) log p
Dynamic tree | 2 log p | 1 | p − 1
Summary
• Interconnection network
  – Performance (latency, bandwidth); cost (# links, # switches)
  – Used to be important, then became less so; likely to be important again for multi-core processors
• Topologies
  – Low-dimension networks
    • Bus, ring, mesh, torus – embed naturally into 2D/3D
    • Direct networks (nodes are connected directly)
  – Logarithmic networks (multistage networks)
    • More switches between nodes (nodes are connected indirectly)
  – High-dimension networks
    • Hypercube (binary n-cube) – theoretically good characteristics
    • But node degree grows with the dimension (log p) – impractical to wire in the real world
References
• Chapters 2.4.2–2.4.4 in "Introduction to Parallel Computing" by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Addison Wesley, 2003
• COMP422: Parallel Computing by Prof. John Mellor-Crummey at Rice Univ.