Interconnection Network



Page 1: Interconnection Network

SSE3054: Multicore Systems, Spring 2017, Jinkyu Jeong ([email protected])

Interconnection Network

Jinkyu Jeong ([email protected]), Computer Systems Laboratory

Sungkyunkwan University, http://csl.skku.edu

Page 2: Interconnection Network

Topics

• Taxonomy
• Metrics
• Topologies
• Characteristics
  – Cost
  – Performance

Page 3: Interconnection Network

Interconnection Networks

• Carry data between processors and to memory
• Components
  – Switches
  – Interfaces
  – Links (wires, fiber)
• Classifications
  – Static networks
    • Point-to-point communication links among processing nodes
    • A.k.a. direct networks
  – Dynamic networks
    • Built from switches and communication links
    • A.k.a. indirect networks

Page 4: Interconnection Network

Static and Dynamic Networks

A static (direct) network and a dynamic (indirect) network

Page 5: Interconnection Network

Dynamic Network Switches

• Map a fixed number of inputs to outputs
• Number of ports
  – Degree of the switch
• Switch cost
  – Grows as the square of the switch degree
  – Packaging cost grows linearly with the number of pins

Page 6: Interconnection Network

Network Interfaces

• Link processors (or nodes) to the interconnect
• Functions
  – Packetizing communication data
  – Computing routing information
  – Buffering incoming/outgoing data
• Network interface connection
  – I/O bus: Peripheral Component Interconnect Express (PCIe)
  – Memory bus: e.g., AMD HyperTransport, Intel QuickPath
• Network performance
  – Depends on the relative speeds of the I/O and memory buses

Page 7: Interconnection Network

Example: Intel QuickPath Interconnect

Page 8: Interconnection Network

Network Topologies

• A variety of network topologies exist
• Topologies trade off performance against cost
• Commercial machines often implement hybrids of multiple topologies
  – Due to packaging, cost, and available components

Page 9: Interconnection Network

Metrics for Interconnection Networks

• Degree
  – # of links per node
• Diameter
  – Longest distance between any two nodes in the network
  – Worst-case communication latency
• Bisection width
  – Minimum # of wire cuts needed to divide the network into two equal parts
• Cost
  – # of links and switches
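As a quick illustration (not from the slides), the diameter metric can be computed for a concrete topology by breadth-first search. The sketch below measures a p-node ring, whose diameter is floor(p/2); function names are illustrative.

```python
def ring_edges(p):
    # a ring: each node linked to its right neighbor, with a wrap-around link
    return [(i, (i + 1) % p) for i in range(p)]

def diameter(p, edges):
    # longest shortest-path distance over all node pairs (BFS from every node)
    adj = {u: [] for u in range(p)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for src in range(p):
        dist = {src: 0}
        frontier = [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        best = max(best, max(dist.values()))
    return best
```

For example, an 8-node ring has diameter 4, and every node has degree 2.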

Page 10: Interconnection Network

Network Topologies: Buses

• All processors access a common bus to exchange data
• Used in the simplest and earliest parallel machines
  – Ex. Sun Enterprise servers, Intel Pentium
• Advantages
  – Distance between any two nodes is O(1)
  – Provides a convenient broadcast medium
• Disadvantage
  – Bus bandwidth is a performance bottleneck

Processors P, P, ..., P attached to a shared bus

Page 11: Interconnection Network

Network Topologies: Buses

Bus-based interconnects, without and with local caches. Example: the Intel Pentium Pro Quad, in which P-Pro modules (CPU, bus interface, 256-KB L2 cache, interrupt controller) share a P-Pro bus (64-bit data, 36-bit address, 66 MHz) with a memory controller/MIU driving 1-, 2-, or 4-way interleaved DRAM, and with PCI bridges connecting to PCI buses and I/O cards.

Page 12: Interconnection Network

Network Topologies: Crossbars

A crossbar network uses a p × m grid of switches to connect p inputs to m outputs in a non-blocking manner.

Page 13: Interconnection Network

Network Topologies: Crossbars

• Cost
  – O(p^2) for p processors (and memory banks)
• Difficult to scale for large values of p
• Ex. Sun Ultra HPC 10000 and the Fujitsu VPP500

Page 14: Interconnection Network

Multistage Networks

• Buses
  – Excellent cost scalability
  – Poor performance scalability
• Crossbars
  – Excellent performance scalability
  – Poor cost scalability
• Multistage interconnects
  – A compromise between these extremes

Page 15: Interconnection Network

Multistage Networks

The schematic of a typical multistage interconnection network.

Page 16: Interconnection Network

Multistage Omega Network

• Organization
  – log p stages
  – p inputs and outputs
• At each stage, input i is connected to output j by the perfect shuffle:
  – j = 2i, for 0 <= i <= p/2 - 1
  – j = 2i + 1 - p, for p/2 <= i <= p - 1
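This stage connection (the perfect shuffle, described on the next slide) can be sketched in Python, assuming p is a power of two; the function name is illustrative:

```python
def perfect_shuffle(i, p):
    # output j for input i: j = 2i for i < p/2, j = 2i + 1 - p otherwise,
    # which is a one-bit left rotation of i's log2(p)-bit label
    n = p.bit_length() - 1            # number of address bits
    return ((i << 1) | (i >> (n - 1))) & (p - 1)
```

For p = 8 this maps inputs 0..7 to outputs 0, 2, 4, 6, 1, 3, 5, 7.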

Page 17: Interconnection Network

Multistage Omega Network

Each stage of the Omega network implements a perfect shuffle as follows:

A perfect shuffle interconnection for eight inputs and outputs

Page 18: Interconnection Network

Multistage Omega Network

• The perfect shuffle patterns are connected using 2 × 2 switches
• The switches operate in two modes
  – Pass-through
  – Cross-over

Page 19: Interconnection Network

Multistage Omega Network

A complete omega network connecting eight inputs and eight outputs.

Cost: (p/2) × log p switches → O(p log p)

Page 20: Interconnection Network

Multistage Omega Network – Routing

• s is the source processor in binary representation
• d is the destination processor in binary representation
• At each stage
  – If the most significant bits of s and d are the same
    • Pass-through
  – Otherwise
    • Cross-over
  – Strip the most significant bit from both s and d
• Repeat for each of the log p switching stages
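The routing rule above can be sketched as follows (illustrative names; s and d are node numbers, stages = log2 p):

```python
def omega_route(s, d, stages):
    # per-stage switch settings for a message from source s to destination d
    settings = []
    for k in range(stages):
        bit = stages - 1 - k          # compare the most significant bit first
        if (s >> bit) & 1 == (d >> bit) & 1:
            settings.append("pass-through")
        else:
            settings.append("cross-over")
    return settings
```

omega_route(0b001, 0b100, 3) yields cross-over, pass-through, cross-over, matching the worked example on the next slide.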

Page 21: Interconnection Network

Multistage Omega Network – Routing

• Example: routing from 001 to 100
  1. Stage 1: 0 != 1 → cross-over
  2. Stage 2: 0 == 0 → pass-through
  3. Stage 3: 1 != 0 → cross-over

Page 22: Interconnection Network

Blocking in Omega Network

One of the messages (010 to 111 or 110 to 100) is blocked at link AB
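One way to see why such conflicts arise is to compute the wire each message occupies after every stage: under the bit-comparison routing above, after stage k a message sits on the wire labeled by its remaining low source bits followed by its first k destination bits. This is a derived sketch, not from the slides; names are illustrative.

```python
def omega_wire(s, d, k, n):
    # wire occupied after stage k (1-based) for a message s -> d in an
    # omega network with n = log2(p) stages: label = (low n-k bits of s)
    # followed by (top k bits of d)
    low = s & ((1 << (n - k)) - 1)
    return (low << k) | (d >> (n - k))

def conflict(m1, m2, n):
    # True if two (source, destination) messages contend for the same wire
    return any(omega_wire(*m1, k, n) == omega_wire(*m2, k, n)
               for k in range(1, n))
```

For the slide's example, conflict((0b010, 0b111), (0b110, 0b100), 3) is True: both messages need the same wire after the first stage.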

Page 23: Interconnection Network

Completely Connected Network

• Each processor is connected to every other processor
• Cost
  – # of links is O(p^2)
• Performance scales very well
• Hardware complexity is not realizable for large values of p
• The static counterpart of crossbars

Page 24: Interconnection Network

Star Connected Network

• Every node is connected only to a common node at the center
• Distance between any pair of nodes is O(1)
  – But the central node becomes a bottleneck
• The static counterpart of buses

Page 25: Interconnection Network

Linear Array & Ring

• Linear array
  – Each node has two neighbors, one to its left and one to its right
• Ring (or 1-D torus)
  – A linear array whose end nodes are connected by a wrap-around link

Page 26: Interconnection Network

Meshes and k-dimensional Meshes

• Mesh
  – Generalization of the linear array to 2D
  – 4 neighbors (north, south, east, and west)
• k-dimensional mesh
  – 2k neighbors per node

2D mesh, 2D torus, and 3D mesh
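The neighbor rule for a 2-D torus can be sketched directly (an illustrative helper; dropping the modulo wrap-around turns it into a plain mesh with boundary checks):

```python
def torus_neighbors(node, rows, cols):
    # the 4 neighbors (north, south, west, east), with wrap-around links
    x, y = node
    return [((x - 1) % rows, y), ((x + 1) % rows, y),
            (x, (y - 1) % cols), (x, (y + 1) % cols)]
```

A corner node such as (0, 0) on a 4 × 4 torus still has 4 neighbors, two of them reached via wrap-around links.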

Page 27: Interconnection Network

Hypercubes

0D, 1D, 2D, 3D, and 4D hypercubes

Page 28: Interconnection Network

Hypercubes

• Distance between any two nodes is at most log p
• Each node has log p neighbors
• Distance between two nodes
  – # of bit positions at which their labels differ
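Because node labels differ in exactly the bit positions a message must traverse, the distance is the Hamming distance of the labels; a one-line sketch:

```python
def hypercube_distance(a, b):
    # number of bit positions at which labels a and b differ (Hamming distance)
    return bin(a ^ b).count("1")
```

E.g., nodes 000 and 111 of a 3-D hypercube are 3 hops apart.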

Page 29: Interconnection Network

Tree-Based Networks

A static tree network and a dynamic tree network

Page 30: Interconnection Network

Tree-Based Networks

• Distance between any two nodes is at most 2 log p
• Easy to lay out as planar graphs
  – E.g., H-trees
• The root can become a bottleneck
  – Links closer to the root carry more traffic than those at lower levels
• Solution: fat tree
  – Fattens the links as we go up the tree

Page 31: Interconnection Network

Fat Tree

A fat tree network of 16 processing nodes.

Page 32: Interconnection Network

Evaluating Interconnection Networks

• Diameter
  – Longest distance between any two nodes
  – Measures the worst-case latency of a communication
• Bisection width
  – Minimum # of wire cuts needed to divide the network into two equal parts
  – Measures the # of possible concurrent communications
• Cost
  – # of links or switches
  – Ability to lay out the network
  – Length of wires

A network supporting two concurrent communications vs. one supporting four

Page 33: Interconnection Network

Static Interconnection Networks

Network                  | Diameter           | Bisection Width | Cost (# of links)
Completely-connected     | 1                  | p^2/4           | p(p-1)/2
Star                     | 2                  | 1               | p-1
Complete binary tree     | 2 log((p+1)/2)     | 1               | p-1
Linear array             | p-1                | 1               | p-1
2-D mesh, no wraparound  | 2(sqrt(p)-1)       | sqrt(p)         | 2(p-sqrt(p))
2-D wraparound mesh      | 2*floor(sqrt(p)/2) | 2*sqrt(p)       | 2p
Hypercube                | log p              | p/2             | (p log p)/2
Wraparound k-ary d-cube  | d*floor(k/2)       | 2k^(d-1)        | dp
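The hypercube row can be spot-checked for a small p by building the graph and measuring it directly (a sketch; for p = 8 the table predicts diameter log p = 3 and (p log p)/2 = 12 links):

```python
def hypercube_edges(p):
    # nodes 0 .. p-1; one link per pair of labels differing in a single bit
    n = p.bit_length() - 1
    return [(u, u | (1 << b)) for u in range(p) for b in range(n)
            if not u & (1 << b)]

def bfs_diameter(p, edges):
    # longest shortest-path distance over all node pairs
    adj = {u: [] for u in range(p)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    best = 0
    for src in range(p):
        dist = {src: 0}
        frontier = [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        best = max(best, max(dist.values()))
    return best
```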

Page 34: Interconnection Network

Dynamic Interconnection Networks

Network       | Diameter | Bisection Width | Cost (# of switches)
Crossbar      | 1        | p               | p^2
Omega Network | log p    | p/2             | (p/2) log p
Dynamic Tree  | 2 log p  | 1               | p-1

Page 35: Interconnection Network

Summary

• Interconnection network
  – Performance (latency, bandwidth), cost (# links, # switches)
  – Used to be important, then became less important
  – Likely to be important again for multi-core processors
• Topologies
  – Low-dimension networks
    • Bus, ring, mesh, torus – embed naturally into 2D/3D
    • Direct networks (nodes are connected directly)
  – Logarithmic networks (multi-stage networks)
    • More switches between nodes (nodes are connected indirectly)
  – High-dimension networks
    • Hypercube (binary n-cube) – theoretically good characteristics
    • But the degree of each node grows with the network size (log p) – impractical in the real world

Page 36: Interconnection Network

References

• Chapters 2.4.2–2.4.4 in "Introduction to Parallel Computing" by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar, Addison Wesley, 2003
• COMP422: Parallel Computing by Prof. John Mellor-Crummey at Rice Univ.