Program and Network Properties

Transcript of Program and Network Properties

  • 7/24/2019 Program and Network Properties

    1/94

    Chapter 2: Program and Network Properties

    Conditions of parallelism

    Program partitioning and scheduling

    Program flow mechanisms

    System interconnect architectures

    EENG-630 - Chapter 2

  • 7/24/2019 Program and Network Properties

    2/94

    Conditions of Parallelism

    The exploitation of parallelism in computing requires understanding the basic theory associated with it.

    Progress is needed in several areas:

    o computation models for parallel computing

    o interprocessor communication in parallel architectures

    o integration of parallel systems into general environments

  • 7/24/2019 Program and Network Properties

    3/94

    Data dependences

    The ordering relationship between statements is indicated by the data dependence.

    Flow dependence

    Antidependence

    Output dependence

    I/O dependence

    Unknown dependence

  • 7/24/2019 Program and Network Properties

    4/94

    Data Dependence - 1

    Flow dependence: S1 precedes S2, and at least one output of S1 is input to S2.

    Antidependence: S1 precedes S2, and the output of S2 overlaps the input to S1.

    Output dependence: S1 and S2 write to the same output variable.

    I/O dependence: two I/O statements (read/write) reference the same variable, and/or the same file.

  • 7/24/2019 Program and Network Properties

    5/94

    Data Dependence - 2

    Unknown dependence:

    o The subscript of a variable is itself subscripted.

    o The subscript does not contain the loop index variable.

    o A variable appears more than once with subscripts having different coefficients of the loop variable (that is, different functions of the loop variable).

    o The subscript is nonlinear in the loop index variable.

    Parallel execution of program segments which do not have total data independence can produce non-deterministic results.
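
The flow, anti, and output dependence definitions can be checked mechanically when each statement is described by the variables it reads and writes. The following is a minimal sketch under that assumption; the function name and the four sample statements are hypothetical, and the check ignores intervening writes, I/O dependence, and unknown (subscript-related) dependence.

```python
def classify_dependences(s1, s2):
    """Return the data dependences from S1 to S2, assuming S1 precedes S2.

    Each statement is a (reads, writes) pair of sets of variable names.
    """
    reads1, writes1 = s1
    reads2, writes2 = s2
    found = []
    if writes1 & reads2:
        found.append("flow")    # an output of S1 is an input to S2
    if reads1 & writes2:
        found.append("anti")    # the output of S2 overlaps the input to S1
    if writes1 & writes2:
        found.append("output")  # both statements write the same variable
    return found


# Hypothetical statements: S1: a = b + c, S2: d = a * 2, S3: b = 7, S4: a = 0
S1 = ({"b", "c"}, {"a"})
S2 = ({"a"}, {"d"})
S3 = (set(), {"b"})
S4 = (set(), {"a"})

print(classify_dependences(S1, S2))
print(classify_dependences(S1, S3))
print(classify_dependences(S1, S4))
```

An empty result means none of these three dependences was detected between the two statements.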

  • 7/24/2019 Program and Network Properties

    6/94

    Data dependence example

    S1: Load R1, A

    S2: Add R2, R1

    S3: Move R1, R3

    S4: Store B, R1

    [Figure: dependence graph among S1, S2, S3, and S4]

  • 7/24/2019 Program and Network Properties

    7/94

    I/O dependence example

    S1: Read (4), A(I)

    S2: Rewind (4)

    S3: Write (4), B(I)

    S4: Rewind (4)

    [Figure: I/O dependence between S1 and S3]

  • 7/24/2019 Program and Network Properties

    8/94

    Control dependence

    The order of execution of statements cannot be determined before run time

    o Conditional branches

    o Successive operations of a looping procedure

  • 7/24/2019 Program and Network Properties

    9/94

    Control dependence examples

    Do 20 I = 1, N

    A(I) = C(I)

    IF (A(I) .LT. 0) A(I) = 1

    20 Continue

    Do 10 I = 1, N

    IF (A(I-1) .EQ. 0) A(I) = 0

    10 Continue

  • 7/24/2019 Program and Network Properties

    10/94

    Resource dependence

    Concerned with the conflicts in using shared resources

    o Integer units

    o Floating-point units

    o Registers

    o Memory areas

    o ALU

    o Workplace storage

  • 7/24/2019 Program and Network Properties

    11/94

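    Bernstein's conditions

    Set of conditions for two processes to execute in parallel, where $I_i$ and $O_i$ are the input and output sets of process $P_i$

    $I_1 \cap O_2 = \emptyset$

    $I_2 \cap O_1 = \emptyset$

    $O_1 \cap O_2 = \emptyset$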

  • 7/24/2019 Program and Network Properties

    12/94

    Bernstein's Conditions - 2

    In terms of data dependencies, Bernstein's conditions imply that two processes can execute in parallel if they are flow-independent, anti-independent, and output-independent.

    The parallelism relation || is commutative (Pi || Pj implies Pj || Pi), but not transitive (Pi || Pj and Pj || Pk does not imply Pi || Pk).

    Therefore, || is not an equivalence relation.

    Intersection of the input sets is allowed.

  • 7/24/2019 Program and Network Properties

    13/94

    Utilizing Bernstein's conditions

    P1: C = D × E

    P2: M = G + C

    P3: A = B + C

    P4: C = L + M

    P5: F = G / E

    [Figure: dependence graph among P1, P2, P3, P4, and P5]
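
Bernstein's conditions reduce to three set intersections, so the five statements on this slide can be tested pairwise. The sketch below is illustrative only: it assumes the statements read as P1: C = D × E, P2: M = G + C, P3: A = B + C, P4: C = L + M, and P5: F = G / E, and the function and variable names are hypothetical.

```python
from itertools import combinations


def bernstein_parallel(p, q):
    """True if processes p and q satisfy all three of Bernstein's conditions."""
    in_p, out_p = p
    in_q, out_q = q
    return not (in_p & out_q) and not (in_q & out_p) and not (out_p & out_q)


# (input set, output set) for each statement, as read from the slide
processes = {
    "P1": ({"D", "E"}, {"C"}),  # C = D x E
    "P2": ({"G", "C"}, {"M"}),  # M = G + C
    "P3": ({"B", "C"}, {"A"}),  # A = B + C
    "P4": ({"L", "M"}, {"C"}),  # C = L + M
    "P5": ({"G", "E"}, {"F"}),  # F = G / E
}

parallel_pairs = [
    (a, b)
    for a, b in combinations(processes, 2)
    if bernstein_parallel(processes[a], processes[b])
]
print(parallel_pairs)
```

Under these assumptions P2 || P5 and P5 || P1 both hold while P2 || P1 does not, which illustrates why the relation is not transitive.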

  • 7/24/2019 Program and Network Properties

    14/94

    Utilizing Bernstein's conditions

  • 7/24/2019 Program and Network Properties

    15/94

    Hardware parallelism

    A function of cost and performance tradeoffs

    Displays the resource utilization patterns of simultaneously executable operations

    Denote the number of instruction issues per machine cycle: k-issue processor

    A multiprocessor system with n k-issue processors should be able to handle a maximum number of nk threads of instructions simultaneously

  • 7/24/2019 Program and Network Properties

    16/94

    Software parallelism

    Defined by the control and data dependence of programs

    A function of algorithm, programming style, and compiler organization

    The program flow graph displays the patterns of simultaneously executable operations

  • 7/24/2019 Program and Network Properties

    17/94

    Mismatch between software and hardware parallelism - 1

    [Figure: maximum software parallelism (L = load, X/+/- = arithmetic), with operations scheduled over cycles 1 to 3]

  • 7/24/2019 Program and Network Properties

    18/94

    Mismatch between software and hardware parallelism - 2

    [Figure: the same problem, but considering the parallelism on a two-issue superscalar processor, shown cycle by cycle]

  • 7/24/2019 Program and Network Properties

    19/94

    Mismatch between software and hardware parallelism - 3

    [Figure: the same problem, with two single-issue processors, shown cycle by cycle; marked operations are inserted for synchronization]

  • 7/24/2019 Program and Network Properties

    20/94

    Software parallelism

    Control parallelism - allows two or more operations to be performed concurrently

    o Pipelining, multiple functional units

    Data parallelism - almost the same operation is performed over many data elements by many processors concurrently

    o Code is easier to write and debug

  • 7/24/2019 Program and Network Properties

    21/94

    Types of Software Parallelism

    Control parallelism - two or more operations can be performed simultaneously. This can be detected by a compiler, or a programmer can explicitly indicate control parallelism by using special language constructs or dividing a program into multiple processes.

    Data parallelism - multiple data elements have the same operations applied to them at the same time. This offers the highest potential for concurrency (in SIMD and MIMD modes). Synchronization in SIMD machines is handled by hardware.

  • 7/24/2019 Program and Network Properties

    22/94

    Solving the Mismatch Problems

    Develop compilation support

    Redesign hardware for more efficient exploitation by compilers

    Use large register files and sustained instruction pipelining.

    Have the compiler fill the branch and load delay slots in code generated for RISC processors.

  • 7/24/2019 Program and Network Properties

    23/94

    The Role of Compilers

    Compilers are used to exploit hardware features to improve performance.

    Interaction between compiler and architecture design is a necessity in modern computer development.

    It is not necessarily the case that more software parallelism will improve performance in conventional scalar processors.

    The hardware and compiler should be designed at the same time.

  • 7/24/2019 Program and Network Properties

    24/94

    Program Partitioning & Scheduling

    The size of the parts or pieces of a program that can be considered for parallel execution can vary.

    The sizes are roughly classified using the term granule size, or simply granularity.

    The simplest measure, for example, is the number of instructions in a program part.

    Grain sizes are usually described as fine, medium or coarse, depending on the level of parallelism involved.

  • 7/24/2019 Program and Network Properties

    25/94

    Latency

    Latency is the time required for communication between different subsystems in a computer.

    Memory latency, for example, is the time required by a processor to access memory.

    Synchronization latency is the time required for two processes to synchronize their execution.

    Computational granularity and communication latency are closely related.

  • 7/24/2019 Program and Network Properties

    26/94

    Levels of Parallelism

    Jobs or programs

    Subprograms, job steps or related parts of a program

    Procedures, subroutines, tasks, or coroutines

    Non-recursive loops or unfolded iterations

    Instructions or statements

    cycles (sequential program on one processor)

    o 741 cycles (8 processors) - speedup 1.16

    o 446 cycles (4 processors) - speedup 1.94

  • 7/24/2019 Program and Network Properties

    41/94

    Static multiprocessor scheduling

    Grain packing may not be optimal

    Dynamic multiprocessor scheduling is an NP-hard problem

    Node duplication is a static scheme for multiprocessor scheduling

  • 7/24/2019 Program and Network Properties

    42/94

    Node duplication

    Duplicate some nodes to eliminate idle time and reduce communication delays

    Grain packing and node duplication are often used jointly to determine the best grain size and corresponding schedule

  • 7/24/2019 Program and Network Properties

    43/94

    Schedule without node duplication

    [Figure: program graph and the resulting schedule on processors P1 and P2, without node duplication]

  • 7/24/2019 Program and Network Properties

    44/94

    Schedule with node duplication

    [Figure: program graph with duplicated nodes and the resulting schedule on processors P1 and P2]

  • 7/24/2019 Program and Network Properties

    45/94

    Grain determination and scheduling optimization

    Step 1: Construct a fine-grain program graph

    Step 2: Schedule the fine-grain computation

    Step 3: Grain packing to produce coarse grains

    Step 4: Generate a parallel schedule based on the packed graph

  • 7/24/2019 Program and Network Properties

    46/94

    Program Flow Mechanisms

    Conventional machines use a control flow mechanism in which the order of program execution is explicitly stated in user programs.

    Dataflow machines are those in which instructions can be executed by determining operand availability.

    Reduction machines trigger an instruction's execution based on the demand for its results.

  • 7/24/2019 Program and Network Properties

    47/94

    Control Flow vs. Data Flow

    Control flow machines use shared memory for instructions and data. Since variables are updated by many instructions, there may be side effects on other instructions. These side effects frequently prevent parallel processing. Single processor systems are inherently sequential.

    Instructions in dataflow machines are unordered and can be executed as soon as their operands are available; data is held in the instructions themselves. Data tokens are passed from an instruction to its dependents to trigger execution.
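
The data-driven firing rule can be illustrated with a small software simulation. This is only a sketch under simplifying assumptions: tokens are kept in one dictionary rather than in the instructions themselves, there is no tag matching, and the three-instruction program and all names are hypothetical.

```python
import operator


def run_dataflow(instructions, initial_tokens):
    """Fire each instruction as soon as all of its operand tokens are available.

    instructions: {result_name: (function, [operand_names])}
    initial_tokens: {name: value} for the program inputs
    Returns the final token store and the order in which instructions fired.
    """
    tokens = dict(initial_tokens)
    pending = dict(instructions)
    fired = []
    while pending:
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in tokens for op in operands)]
        if not ready:
            raise ValueError("no instruction is enabled; operands are missing")
        for name in ready:  # every ready instruction could fire in parallel
            func, operands = pending.pop(name)
            tokens[name] = func(*(tokens[op] for op in operands))
            fired.append(name)
    return tokens, fired


# Hypothetical program, deliberately listed out of order: z = (a + b) * (a - b)
program = {
    "z": (operator.mul, ["s", "d"]),
    "s": (operator.add, ["a", "b"]),
    "d": (operator.sub, ["a", "b"]),
}
tokens, order = run_dataflow(program, {"a": 5, "b": 3})
print(tokens["z"], order)
```

The listing order of the instructions has no effect on the result; only operand availability decides when each one fires.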

  • 7/24/2019 Program and Network Properties

    48/94

    Data Flow Features

    No need for

    o shared memory

    o program counter

    o control sequencer

    Special mechanisms are required to

    o detect data availability

    o match data tokens with instructions needing them

    o enable chain reaction of asynchronous instruction execution

  • 7/24/2019 Program and Network Properties

    49/94

    A Dataflow Architecture - 1

    The Arvind machine (MIT) has N PEs and an N-by-N interconnection network.

    Each PE has a token-matching mechanism that dispatches only instructions with data tokens available.

    Each datum is tagged with

    o address of instruction to which it belongs

    o context in which the instruction is being executed

    Tagged tokens enter the PE through a local path (pipelined), and can also be communicated to other PEs through the routing network.

  • 7/24/2019 Program and Network Properties

    50/94

    A Dataflow Architecture - 2

    Instruction address(es) effectively replace the program counter in a control flow machine.

    Context identifier effectively replaces the frame base register in a control flow machine.

    Since the dataflow machine matches the data tags from one instruction with successors, synchronized instruction execution is implicit.

  • 7/24/2019 Program and Network Properties

    51/94

    A Dataflow Architecture - 3

    An I-structure in each PE is provided to eliminate excessive copying of data structures.

    Each word of the I-structure has a two-bit tag indicating whether the value is empty, full, or has pending read requests.

    This is a retreat from the pure dataflow approach.

    Example 2. shows a control flow and dataflow comparison.

    Special compiler technology is needed for dataflow machines.

  • 7/24/2019 Program and Network Properties

    52/94

  • 7/24/2019 Program and Network Properties

    53/94

    Demand-Driven Mechanisms

    Data-driven machines select instructions for execution based on the availability of their operands; this is essentially a bottom-up approach.

    Demand-driven machines take a top-down approach, attempting to execute the instruction (a demander) that yields the final result. This triggers the execution of instructions that yield its operands, and so forth.

    The demand-driven approach matches naturally with functional programming languages (e.g. LISP and SCHEME).
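
The top-down, demand-driven order can be sketched in software as lazy evaluation. The class below is a hypothetical illustration, not a model of any particular reduction machine: an expression is evaluated only when its result is demanded, and that demand propagates to its operands.

```python
class Demand:
    """A node that computes its value only when that value is demanded."""

    def __init__(self, func, *operands):
        self.func = func
        self.operands = operands
        self.evaluated = False
        self.value = None

    def get(self):
        if not self.evaluated:
            # Demanding this result first triggers demands for its operands.
            args = [op.get() if isinstance(op, Demand) else op
                    for op in self.operands]
            self.value = self.func(*args)
            self.evaluated = True
        return self.value


# Hypothetical expression: result = (2 + 3) * 4; `unused` is never demanded.
total = Demand(lambda x, y: x + y, 2, 3)
unused = Demand(lambda x: x / 0, 1)
result = Demand(lambda x, y: x * y, total, 4)
print(result.get())
```

Because nothing demands `unused`, its division by zero never executes; a data-driven machine would have fired it as soon as its operand was available.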

  • 7/24/2019 Program and Network Properties

    54/94

    Reduction Machine Models

    String-reduction model:

    o each demander gets a separate copy of the expression string to evaluate

    o each reduction step has an operator and embedded reference to demand the corresponding operands

    o each operator is suspended while arguments are evaluated

    Graph-reduction model:

    o expression graph reduced by evaluation of branches or subgraphs, possibly in parallel, with demanders given pointers to results of reductions.

    o based on sharing of pointers to arguments; traversal and reversal of pointers continues until constant arguments are encountered.

  • 7/24/2019 Program and Network Properties

    55/94

    System Interconnect Architectures

    Direct networks for static connections

    Indirect networks for dynamic connections

    Networks are used for

    o internal connections in a centralized system among processors, memory modules, and I/O disk arrays

    o distributed networking of multicomputer nodes

  • 7/24/2019 Program and Network Properties

    56/94

    Goals and Analysis

    The goals of an interconnection network are to provide

    o low latency

    o high data transfer rate

    o wide communication bandwidth

    Analysis includes

    o latency

    o bisection bandwidth

    o data-routing functions

    o scalability of parallel architecture

  • 7/24/2019 Program and Network Properties

    57/94

    Network Properties and Routing

    Static networks: point-to-point direct connections that will not change during program execution

    Dynamic networks:

    o switched channels dynamically configured to match user program communication demands

    o include buses, crossbar switches, and multistage networks

    Both network types are also used for inter-PE data routing in SIMD computers

  • 7/24/2019 Program and Network Properties

    58/94

    Network Parameters

    Network size: The number of nodes in the graph used to represent the network

    Node degree d: The number of edges incident to a node; the sum of in-degree and out-degree

    Network diameter D: The maximum shortest path between any two nodes
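
For small networks, node degree and diameter can be computed directly from the graph. A minimal sketch, assuming an undirected, connected network stored as a dictionary that maps each node to its set of neighbors; the helper names and the two sample topologies are illustrative.

```python
from collections import deque


def node_degrees(adj):
    """Degree of each node in an undirected network {node: set(neighbors)}."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def diameter(adj):
    """Maximum over all node pairs of the shortest path length, in links."""
    longest = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        longest = max(longest, max(dist.values()))
    return longest


def linear_array(n):
    return {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}


def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


print(len(ring(8)), diameter(linear_array(8)), diameter(ring(8)))
```

The network size is simply the number of keys in the dictionary.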

  • 7/24/2019 Program and Network Properties

    59/94

    Network Parameters (cont.)

    Bisection width:

    o Channel bisection width b: The minimum number of edges along the cut that divides the network in two equal halves

    o Each channel has w bit wires

    o Wire bisection width: $B = bw$; B is the wiring density of the network. It provides a good indicator of the maximum communication bandwidth along the bisection of the network
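
The channel bisection width of a very small network can be found by brute force, trying every split into two equal halves. This is a sketch for illustration only, since the search is exponential in the number of nodes; the function names and the edge-list representation are assumptions.

```python
from itertools import combinations


def channel_bisection_width(nodes, edges):
    """Smallest number of edges cut by any split into two equal halves."""
    nodes = list(nodes)
    half = len(nodes) // 2
    best = len(edges)
    for group in combinations(nodes, half):
        side = set(group)
        cut = sum(1 for u, v in edges if (u in side) != (v in side))
        best = min(best, cut)
    return best


def wire_bisection_width(b, w):
    """Wire bisection width B = b * w for channels that are w bits wide."""
    return b * w


ring_edges = [(i, (i + 1) % 8) for i in range(8)]
array_edges = [(i, i + 1) for i in range(7)]
print(channel_bisection_width(range(8), ring_edges))
print(channel_bisection_width(range(8), array_edges))
```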

  • 7/24/2019 Program and Network Properties

    60/94

    Terminology - 1

    Network usually represented by a graph with a finite number of nodes linked by directed or undirected edges.

    Number of nodes in graph = network size.

    Number of edges (links or channels) incident on a node = node degree d (also note in and out degrees when edges are directed). Node degree reflects number of I/O ports associated with a node, and should ideally be small and constant.

    Diameter D of a network is the maximum shortest path between any two nodes, measured by the number of links traversed; this should be as small as possible (from a communication point of view).

  • 7/24/2019 Program and Network Properties

    61/94

    Terminology - 2

    Channel bisection width b = minimum number of edges cut to split a network into two parts each having the same number of nodes. Since each channel has w bit wires, the wire bisection width $B = bw$. Bisection width provides a good indication of maximum communication bandwidth along the bisection of a network, and all other cross sections should be bounded by the bisection width.

    Wire (or channel) length = length (e.g. weight) of edges between nodes.

    Network is symmetric if the topology is the same looking from any node; these are easier to implement or to program.

    Other useful characterizing properties: homogeneous nodes? buffered channels? nodes are switches?

  • 7/24/2019 Program and Network Properties

    62/94

    Data Routing Functions

    Shifting

    Rotating

    Permutation (one to one)

    Broadcast (one to all)

    Multicast (many to many)

    Personalized broadcast (one to many)

    Shuffle

    Exchange

    Etc.

  • 7/24/2019 Program and Network Properties

    63/94

    Permutations

    For n objects there are n! permutations by which the n objects can be reordered. The set of all permutations forms a permutation group with respect to a composition operation. Cycle notation can be used to specify a permutation operation.

    Permutation (a, b, c)(d, e) means: a->b, b->c, c->a, d->e and e->d in a circular fashion. The cycle (a, b, c) has a period of 3, and the cycle (d, e) has a period of 2. Combined, the permutation will have a period equal to 2 × 3 = 6.
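
Cycle notation and the period of a permutation can be checked with a few lines of code. The sketch below uses hypothetical helper names; it computes the period as the least common multiple of the cycle lengths, which equals the product 2 × 3 here because the two lengths share no common factor.

```python
from functools import reduce
from math import gcd


def from_cycles(cycles):
    """Build the mapping item -> image from a list of cycles."""
    mapping = {}
    for cycle in cycles:
        for i, item in enumerate(cycle):
            mapping[item] = cycle[(i + 1) % len(cycle)]
    return mapping


def period(cycles):
    """Least common multiple of the cycle lengths."""
    return reduce(lambda a, b: a * b // gcd(a, b), (len(c) for c in cycles), 1)


def apply_times(mapping, item, times):
    for _ in range(times):
        item = mapping[item]
    return item


cycles = [("a", "b", "c"), ("d", "e")]
pi = from_cycles(cycles)
print(pi, period(cycles))
```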

  • 7/24/2019 Program and Network Properties

    64/94

    Permutations (cont.)

    Can be implemented using crossbar switches, multistage networks or with shifting or broadcast operations.

    Permutation capability is an indication of a network's data routing capabilities

  • 7/24/2019 Program and Network Properties

    65/94

    Perfect Shuffle and Exchange

    Stone suggested the special permutation that shuffles entries according to the mapping of the k-bit binary number a b ... k to b c ... k a (that is, shifting 1 bit to the left and wrapping it around to the least significant bit position).

    The inverse perfect shuffle reverses the effect of the perfect shuffle.

  • 7/24/2019 Program and Network Properties

    66/94

    Perfect Shuffle

    Special permutation function

    $n = 2^k$ objects; each object representation requires k bits

    Perfect shuffle maps x to y where:

    o $x = (x_{k-1}, \ldots, x_1, x_0)$

    o $y = (x_{k-2}, \ldots, x_1, x_0, x_{k-1})$
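
On k-bit addresses the perfect shuffle is a one-bit left rotation, and its inverse is a one-bit right rotation. A minimal sketch with illustrative function names:

```python
def perfect_shuffle(x, k):
    """Rotate the k-bit address x left by one bit."""
    msb = (x >> (k - 1)) & 1
    return ((x << 1) & ((1 << k) - 1)) | msb


def inverse_perfect_shuffle(x, k):
    """Rotate the k-bit address x right by one bit."""
    lsb = x & 1
    return (x >> 1) | (lsb << (k - 1))


shuffled = [perfect_shuffle(x, 3) for x in range(8)]
print(shuffled)
```

For k = 3 the addresses 000 and 111 map to themselves, since rotating them changes nothing.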

  • 7/24/2019 Program and Network Properties

    67/94

    Exchange

    $n = 2^k$ objects; each object representation requires k bits

    The exchange maps x to y where:

    o $x = (x_{k-1}, \ldots, x_1, x_0)$

    o $y = (x_{k-1}, \ldots, x_1, x_0')$, where $x_0'$ is the complement of $x_0$

    Hypercube routing functions are exchanges

  • 7/24/2019 Program and Network Properties

    68/94

    Hypercube Routing Functions

    If the vertices of an n-dimensional cube are labeled with n-bit numbers so that only one bit differs between each pair of adjacent vertices, then n routing functions are defined by the bits in the node (vertex) address.

    For example, with a 3-dimensional cube, we can easily identify routing functions that exchange data between nodes with addresses that differ in the least significant, most significant, or middle bit.
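
Each hypercube routing function complements one address bit, which is a single XOR. The sketch below uses hypothetical names; `cube_route(x, i)` stands for the routing function on bit i, and `hops` relies on the fact that two nodes are as many links apart as their addresses have differing bits.

```python
def exchange(x):
    """Complement the least significant bit of the address."""
    return x ^ 1


def cube_route(x, i):
    """Routing function for dimension i: complement bit i of address x."""
    return x ^ (1 << i)


def hops(x, y):
    """Number of links on a shortest path between nodes x and y."""
    return bin(x ^ y).count("1")


print([cube_route(x, 0) for x in range(8)])  # least significant bit
print([cube_route(x, 1) for x in range(8)])  # middle bit
print([cube_route(x, 2) for x in range(8)])  # most significant bit
```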

  • 7/24/2019 Program and Network Properties

    69/94

    Broadcast and Multicast

    Broadcast: One-to-all mapping

    Multicast: One subset to another subset

    Personalized broadcast: Personalized messages to only selected receivers

  • 7/24/2019 Program and Network Properties

    70/94

    Network Performance

    Functionality

    Network latency

    Bandwidth

    Hardware complexity

    Scalability

  • 7/24/2019 Program and Network Properties

    71/94

    Factors Affecting Performance

    Functionality - how the network supports data routing, interrupt handling, synchronization, request/message combining, and coherence

    Network latency - worst-case time for a unit message to be transferred

    Bandwidth - maximum data rate

    Hardware complexity - implementation costs for wire, logic, switches, connectors, etc.

    Scalability - how easily does the scheme adapt to an increasing number of processors, memories, etc.?

  • 7/24/2019 Program and Network Properties

    72/94

    Static Networks

    Linear Array

    Ring and Chordal Ring

    Barrel Shifter

    Tree and Star

    Fat Tree

    Mesh and Torus

  • 7/24/2019 Program and Network Properties

    73/94

    Static Networks - Linear Array

    n nodes connected by n - 1 links (not a bus); segments between different pairs of nodes can be used in parallel.

    Internal nodes have degree 2; end nodes have degree 1.

    Diameter = n - 1

    Bisection = 1

    For small n, this is economical, but for large n, it is obviously inappropriate.

  • 7/24/2019 Program and Network Properties

    74/94

    Static Networks - Ring, Chordal Ring

    Like a linear array, but the two end nodes are connected by an nth link; the ring can be uni- or bi-directional. Diameter is $\lfloor n/2 \rfloor$ for a bidirectional ring, or n for a unidirectional ring.

    By adding additional links (e.g. chords in a circle), the node degree is increased, and we obtain a chordal ring. This reduces the network diameter.

    In the limit, we obtain a fully-connected network, with a node degree of n - 1 and a diameter of 1.

  • 7/24/2019 Program and Network Properties

    75/94

    Static Networks - Barrel Shifter

    Like a ring, but with additional links between all pairs of nodes that have a distance equal to a power of 2.

    With a network of size $N = 2^n$, each node has degree $d = 2n - 1$, and the network has diameter $D = n/2$.

    Barrel shifter connectivity is greater than any chordal ring of lower node degree.

    Barrel shifter is much less complex than a fully-interconnected network.

  • 7/24/2019 Program and Network Properties

    76/94

    Static Networks - Tree and Star

    A k-level completely balanced binary tree will have $N = 2^k - 1$ nodes, with maximum node degree of 3 and network diameter of $2(k - 1)$.

    The balanced binary tree is scalable, since it has a constant maximum node degree.

    A star is a two-level tree with a node degree $d = N - 1$ and a constant diameter of 2.

  • 7/24/2019 Program and Network Properties

    77/94

    Static Networks - Fat Tree

    A fat tree is a tree in which the number of edges between nodes increases closer to the root (similar to the way the thickness of limbs increases in a real tree as we get closer to the root).

    The edges represent communication channels (wires), and since communication traffic increases as the root is approached, it seems logical to increase the number of channels there.

  • 7/24/2019 Program and Network Properties

    78/94

    Static Networks - Mesh and Torus

    Pure mesh - $N = n^k$ nodes with links between each adjacent pair of nodes in a row or column (or higher degree). This is not a symmetric network; interior node degree $d = 2k$, diameter $= k(n - 1)$.

    Illiac mesh (used in Illiac IV computer) - wraparound is allowed, thus reducing the network diameter to about half that of the equivalent pure mesh.

    A torus has ring connections in each dimension, and is symmetric. An n × n binary torus has node degree of 4 and a diameter of $2\lfloor n/2 \rfloor$.
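
The mesh and torus diameters can be confirmed by enumerating node coordinates. A minimal sketch, assuming n nodes per dimension in k dimensions and that a shortest path moves independently along each dimension (with wraparound in the torus); the function names are illustrative, and the Illiac mesh is not modeled.

```python
from itertools import product


def mesh_distance(a, b):
    """Shortest path length between two nodes of a pure mesh."""
    return sum(abs(x - y) for x, y in zip(a, b))


def torus_distance(a, b, n):
    """Shortest path length when every dimension is closed into a ring of n nodes."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y in zip(a, b))


def diameters(n, k):
    """Return (mesh diameter, torus diameter) for n nodes per dimension, k dimensions."""
    nodes = list(product(range(n), repeat=k))
    mesh = max(mesh_distance(a, b) for a in nodes for b in nodes)
    torus = max(torus_distance(a, b, n) for a in nodes for b in nodes)
    return mesh, torus


print(diameters(4, 2))
print(diameters(5, 2))
```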

  • 7/24/2019 Program and Network Properties

    79/94

    Static Networks - Systolic Array

    A systolic array is an arrangement of processing elements and communication links designed specifically to match the computation and communication requirements of a specific algorithm (or class of algorithms).

    This specialized character may yield better performance than more generalized structures, but also makes them more expensive, and more difficult to program.

  • 7/24/2019 Program and Network Properties

    80/94

    Static Networks - Hypercubes

    A binary n-cube architecture with $N = 2^n$ nodes spanning along n dimensions, with two nodes per dimension.

    The hypercube scalability is poor, and packaging is difficult for higher-dimensional hypercubes.

  • 7/24/2019 Program and Network Properties

    81/94

    Static Networks - Cube-connected Cycles

    k-cube connected cycles (CCC) can be created from a k-cube by replacing each vertex of the k-dimensional hypercube by a ring of k nodes.

    A k-cube can be transformed to a k-CCC with $k \cdot 2^k$ nodes.

    The major advantage of a CCC is that each node has a constant degree (but longer latency) compared with the corresponding k-cube. In that respect, it is more scalable than the hypercube architecture.
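
The node count and constant degree of a CCC can be verified by constructing one. In this sketch each node is a pair (hypercube vertex, position in that vertex's ring), and the node at position i carries the cube link for dimension i; this labeling is one common convention, assumed here for illustration.

```python
def cube_connected_cycles(k):
    """Adjacency sets of a k-CCC, with nodes labeled (vertex, position)."""
    adj = {}
    for vertex in range(2 ** k):
        for pos in range(k):
            adj[(vertex, pos)] = {
                (vertex, (pos - 1) % k),     # ring neighbor
                (vertex, (pos + 1) % k),     # ring neighbor
                (vertex ^ (1 << pos), pos),  # cube link along dimension pos
            }
    return adj


ccc3 = cube_connected_cycles(3)
print(len(ccc3), {len(neighbors) for neighbors in ccc3.values()})
```

For k of 3 or more every node has degree 3, whereas the node degree of the k-cube itself grows with k.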

  • 7/24/2019 Program and Network Properties

    82/94

    Static Networks - k-ary n-Cubes

    Rings, meshes, tori, binary n-cubes, and Omega networks (to be seen) are topologically isomorphic to a family of k-ary n-cube networks.

    n is the dimension of the cube, and k is the radix, or number of nodes in each dimension.

    The number of nodes in the network, N, is $k^n$.

    Folding (alternating nodes between connections) can be used to avoid the long end-around delays in the traditional implementation.

  • 7/24/2019 Program and Network Properties

    83/94

    Static Networks - k-ary n-Cubes

    The cost of k-ary n-cubes is dominated by the amount of wire, not the number of switches.

    With constant wire bisection, low-dimensional networks with wider channels provide lower latency, less contention, and higher hot-spot throughput than higher-dimensional networks with narrower channels.

  • 7/24/2019 Program and Network Properties

    84/94

    Network Throughput

    Network throughput - number of messages a network can handle in a unit time interval.

    One way to estimate is to calculate the maximum number of messages that can be present in a network at any instant (its capacity); throughput usually is some fraction of its capacity.

    A hot spot is a pair of nodes that accounts for a disproportionately large portion of the total network traffic (possibly causing congestion).

    Hot spot throughput is the maximum rate at which messages can be sent between two specific nodes.

  • 7/24/2019 Program and Network Properties

    85/94

    Minimizing Latency

    Latency is minimized when the network radix k and dimension n are chosen so as to make the components of latency due to distance (number of hops) and the message aspect ratio L / W (message length L divided by the channel width W) approximately equal.

    This occurs at a very low dimension. For up to 1024 nodes, the best dimension (in this respect) is 2.

  • 7/24/2019 Program and Network Properties

    86/94

    Dynamic Connection Networks

    Dynamic connection networks can implement all communication patterns based on program demands.

    In increasing order of cost and performance, these include

    o bus systems

    o multistage interconnection networks

    o crossbar switch networks

    Price can be attributed to the cost of wires, switches, arbiters, and connectors.

    Performance is indicated by network bandwidth, data transfer rate, network latency, and communication patterns supported.

  • 7/24/2019 Program and Network Properties

    87/94

    Dynamic Networks - Bus Systems

    A bus system (contention bus, time-sharing bus) has

    o a collection of wires and connectors

    o multiple modules (processors, memories, peripherals, etc.) which connect to the wires

    o data transactions between pairs of modules

    Bus supports only one transaction at a time.

    Bus arbitration logic must deal with conflicting requests.

    Lowest cost and bandwidth of all dynamic schemes.

    Many bus standards are available.

  • 7/24/2019 Program and Network Properties

    88/94

    Dynamic Networks - Switch Modules

    An a × b switch module has a inputs and b outputs. A binary switch has a = b = 2.

    It is not necessary for a = b, but usually $a = b = 2^k$, for some integer k.

    In general, any input can be connected to one or more of the outputs. However, multiple inputs may not be connected to the same output.

    When only one-to-one mappings are allowed, the switch is called a crossbar switch.

  • 7/24/2019 Program and Network Properties

    89/94

    Multistage Networks

    In general, any multistage network is comprised of a collection of a × b switch modules and fixed network modules. The a × b switch modules are used to provide variable permutation or other reordering of the inputs, which are then further reordered by the fixed network modules.

    A generic multistage network consists of a sequence alternating dynamic switches (with relatively small values for a and b) with static networks (with larger numbers of inputs and outputs). The static networks are used to implement interstage connections (ISC).

  • 7/24/2019 Program and Network Properties

    90/94

    Omega Network

    A 2 × 2 switch can be configured for

    o Straight-through

    o Crossover

    o Upper broadcast (upper input to both outputs)

    o Lower broadcast (lower input to both outputs)

    o (No output is a somewhat vacuous possibility as well)

    With four stages of eight 2 × 2 switches, and a static perfect shuffle for each of the four ISCs, a 16 by 16 Omega network can be constructed (but not all permutations are possible).

    In general, an n-input Omega network requires $\log_2 n$ stages of 2 × 2 switches and n / 2 switch modules per stage.
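
The claim that not all permutations are possible can be explored with a small simulation. The slide does not name a routing scheme, so the sketch below assumes destination-tag routing: before each stage the line position goes through a perfect shuffle, and the 2 × 2 switch then sends the message to its upper or lower output according to the next destination bit, most significant bit first. Function names are hypothetical, and broadcast settings are not modeled.

```python
def omega_route(src, dst, k):
    """Line positions occupied after each of the k stages of a 2**k-input network."""
    mask = (1 << k) - 1
    pos, path = src, []
    for stage in range(k):
        pos = ((pos << 1) & mask) | (pos >> (k - 1))  # perfect shuffle ISC
        bit = (dst >> (k - 1 - stage)) & 1            # destination bit, MSB first
        pos = (pos & ~1) | bit                        # 0 = upper output, 1 = lower output
        path.append(pos)
    return path


def is_blocked(permutation, k):
    """True if two messages of the permutation need the same switch output."""
    paths = [omega_route(s, d, k) for s, d in enumerate(permutation)]
    return any(len({p[stage] for p in paths}) < len(paths) for stage in range(k))


print(omega_route(5, 2, 3))
print(is_blocked(list(range(8)), 3))
print(is_blocked([0, 2, 4, 6, 1, 3, 5, 7], 3))
```

In this model every single message reaches its destination, but some permutations make two messages compete for the same switch output and therefore cannot be routed in one pass.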

  • 7/24/2019 Program and Network Properties

    91/94

  • 7/24/2019 Program and Network Properties

    92/94

    Baseline Network

    A baseline network can be shown to be topologically equivalent to other networks (including Omega), and has a simple recursive generation procedure.

    Stage k (k = 0, 1, ...) is an m × m switch block (where $m = N / 2^k$) composed entirely of 2 × 2 switch blocks, each having two configurations: straight through and crossover.

  • 7/24/2019 Program and Network Properties

    93/94

    4 × 4 Baseline Network

  • 7/24/2019 Program and Network Properties

    94/94

    Crossbar Networks

    An m × n crossbar network can be used to provide a constant latency connection between devices; it can be thought of as a single stage switch.

    Different types of devices can be connected, yielding different constraints on which switches can be enabled.

    o With m processors and n memories, one processor may be able to generate requests for multiple memories in sequence; thus several switches might be set in the same row.

    o For m × m interprocessor communication, each PE is connected to both an input and an output of the crossbar; only one switch in each row and column can be turned on simultaneously. Additional control processors are used to manage the crossbar itself.