CSE 412/CS 454/MATH 486 Parallel Numerical Algorithms

5. Interprocessor Communication

Prof. Michael T. Heath
Department of Computer Science
University of Illinois at Urbana-Champaign

Copyright © 2004, Michael T. Heath


Message Routing

If message is sent between processors that are not directly connected, then it must be routed through intermediate processors

Message routing algorithms can be minimal or nonminimal, static or dynamic, deterministic or randomized, and circuit switched or packet switched

Most regular network topologies admit relatively simple routing schemes that are static, deterministic, and minimal

Example: Routing in Mesh

In 2-D mesh, message is forwarded along row (or column) of sending node until column (or row) of destination node is reached, then forwarded along destination column (or row) until destination node is reached

In 3-D mesh, forwarding takes place similarly along each dimension until destination node is reached
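The dimension-ordered rule can be sketched in Python as follows. This is a minimal illustration, assuming nodes are named by integer coordinate tuples and that coordinate 0 is corrected first; the function name `mesh_route` is hypothetical.

```python
def mesh_route(src, dst):
    """Return the list of nodes visited by a dimension-ordered route.

    src and dst are coordinate tuples of equal length (2-D, 3-D, ...).
    Coordinate 0 is corrected fully, then coordinate 1, and so on,
    moving one hop per step.
    """
    cur = list(src)
    path = [tuple(cur)]
    for dim in range(len(cur)):
        step = 1 if dst[dim] > cur[dim] else -1
        while cur[dim] != dst[dim]:
            cur[dim] += step              # one hop along this dimension
            path.append(tuple(cur))
    return path


# 2-D example: move along the first coordinate, then the second.
print(mesh_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The number of hops equals the sum of the coordinate differences, so the route is minimal.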

Example: Routing in Hypercube

In hypercube, if current node number differs from that of destination node in $k$th bit, then message is forwarded to adjacent node with opposite value in $k$th bit

Message reaches destination node in $h$ steps, where $h$ is number of bit positions in which source and destination node numbers differ, which is at most $\log_2 p$

[Figure: 3-cube with nodes labeled 000 through 111]
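A minimal sketch of this bit-correcting rule, assuming node numbers are plain integers and that differing bits are corrected from lowest to highest (any order would do); `hypercube_route` is a hypothetical name.

```python
def hypercube_route(src, dst):
    """Return the nodes visited when routing from src to dst in a hypercube.

    Each step flips one bit in which the current node differs from dst,
    lowest bit first, i.e., moves to the neighbor across that dimension.
    """
    path = [src]
    cur = src
    bit = 0
    while cur != dst:
        if (cur ^ dst) & (1 << bit):
            cur ^= 1 << bit        # forward across dimension `bit`
            path.append(cur)
        bit += 1
    return path


print(hypercube_route(0b000, 0b101))  # [0, 1, 5]
```

The number of hops is the number of 1 bits in `src ^ dst`, matching the step count $h$ above.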

Message Routing

There is often considerable freedom in choosing routing scheme

In 2-D or 3-D mesh, one can take respective dimensions in any order

In hypercube, bits that differ between source and destination nodes can be “corrected” in any order

Thus, there are often multiple possible paths for any given message, and this freedom can sometimes be exploited for improved performance or fault tolerance
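To get a feel for how much freedom there is, one can count minimal paths. A short sketch, assuming hop counts per dimension are given as nonnegative integers (function names are hypothetical). Note that the mesh count includes every minimal path, including ones that interleave dimensions rather than finishing one dimension at a time.

```python
from math import comb, factorial


def mesh_minimal_paths(hops):
    """Count minimal paths in a mesh, given hop counts per dimension.

    A minimal path is any interleaving of the required moves, so the
    count is the multinomial coefficient (sum(hops))! / prod(h!).
    """
    remaining = sum(hops)
    count = 1
    for h in hops:
        count *= comb(remaining, h)   # choose which steps use this dimension
        remaining -= h
    return count


def hypercube_minimal_paths(src, dst):
    """The h differing bits can be corrected in any of h! orders."""
    return factorial(bin(src ^ dst).count("1"))


print(mesh_minimal_paths((2, 1)))      # 3
print(hypercube_minimal_paths(0, 7))   # 6
```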

Cut-Through Routing

Early distributed-memory multicomputers used store-and-forward routing: at each node along path from source to destination, entire message is received and stored before being forwarded

Modern communication networks use cut-through (or wormhole) routing, in which message is broken into smaller segments that are pipelined through network

Each node on path forwards each segment as soon as it is received, which improves performance and reduces buffer space requirements
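The benefit of pipelining can be illustrated with a simplified timing model. This model is an assumption for illustration, not one stated on the slide: sending $m$ words across one link costs $t_s + t_w m$, and a message of $L$ words crosses $h$ links. With store-and-forward, each hop carries the whole message; with cut-through modeled as $k$ equal pipelined segments, the last segment arrives after $h + k - 1$ segment times. Function names are hypothetical.

```python
def store_and_forward_time(h, L, ts, tw):
    """Whole message is received at each of h hops before being forwarded."""
    return h * (ts + tw * L)


def pipelined_time(h, L, ts, tw, k):
    """Message split into k equal segments pipelined over h hops.

    The first segment needs h segment times to reach the destination;
    each of the other k - 1 segments arrives one segment time later.
    """
    segment_time = ts + tw * L / k
    return (h + k - 1) * segment_time


print(store_and_forward_time(3, 8, 1.0, 0.5))   # 15.0
print(pipelined_time(3, 8, 1.0, 0.5, 4))        # 12.0
```

With $k = 1$ the two models coincide; as $L$ grows, the pipelined time depends much less on $h$, which is the sense in which cut-through routing makes distance less important.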

Store-and-Forward vs Cut-Through

[Figure: message timelines across processors P0 through P3 versus time, for store-and-forward routing and for cut-through routing]

Cut-Through Routing

In effect, cut-through routing establishes virtual circuit between source and destination nodes

Care must be taken in designing routing algorithm to avoid potential deadlock when multiple messages contend for same link

Cut-through routing makes network distance less important for individual messages, so matching problem topology to network topology is less crucial

Aggregate bandwidth constraints still necessitate some attention to locality, however

Communication Concurrency

We have thus far considered only point-to-point communication, in which one pair of processors communicate with each other

If many processors communicate simultaneously, overall performance is affected by degree of concurrency supported by communication system

It may or may not be possible for processor to send and receive on same link simultaneously; to send on one link and receive on another link simultaneously; or to send and/or receive on multiple links simultaneously

Communication Concurrency

We can usually allow for these distinctions by appropriately defining what we mean by “step” of given communication pattern

Effect is to multiply overall cost by constant factor in network whose degree does not vary with number of processors (e.g., mesh or torus)

Corresponding factor may grow with number of processors in network having variable degree (e.g., hypercube, whose degree is $\log_2 p$)

Collective Communication

Collective communication involves multiple nodes simultaneously

Examples occurring frequently include broadcast (one-to-all), reduction (all-to-one), multinode broadcast (all-to-all), scatter/gather (one-to-all/all-to-one), total exchange (personalized all-to-all), scan or prefix, circular shift, and barrier

Broadcast

In broadcast, source node communicates single message to $p - 1$ other nodes

Source node could send $p - 1$ separate messages serially, one to each of other nodes

Efficiency can be improved by exploiting parallelism and fact that messages often need to be routed through intermediate nodes anyway

Generic Broadcast Algorithm

1. If source $\neq$ me, receive message

2. Send message to each of my direct neighbors who have not already received it
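The generic algorithm can be simulated in synchronous steps. A minimal sketch, assuming the network is given as an adjacency dictionary and that a node can send to all of its neighbors within one step (an all-port assumption); `broadcast_steps` is a hypothetical name.

```python
def broadcast_steps(neighbors, source):
    """Simulate the generic broadcast algorithm step by step.

    neighbors maps each node to a list of adjacent nodes. Returns a dict
    giving the step at which each node first receives the message
    (the source has it at step 0).
    """
    received = {source: 0}
    frontier = [source]            # nodes that received the message last step
    step = 0
    while frontier:
        step += 1
        next_frontier = []
        for node in frontier:
            for nbr in neighbors[node]:
                if nbr not in received:   # send only to neighbors still lacking it
                    received[nbr] = step
                    next_frontier.append(nbr)
        frontier = next_frontier
    return received


ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
print(max(broadcast_steps(ring, 0).values()))   # 4
```

The recorded steps define a spanning tree rooted at the source, and the largest step number is that tree's height.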

Broadcast in Mesh or Torus

[Figure: broadcast in 1-D mesh, 1-D torus (ring), and 2-D mesh, with nodes labeled by step number]

Broadcast in Hypercube

[Figure: broadcast in 2-cube, 3-cube, and 4-cube]
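One common hypercube broadcast schedule, sketched under the assumption that dimensions are used in increasing order (the figure may order them differently): node labels are taken relative to the source by exclusive-or, and at step $k$ every node whose relative label is below $2^k$ forwards the message across dimension $k$. The function name is hypothetical.

```python
def hypercube_broadcast_schedule(d, source=0):
    """Return, for each of the d steps, the (sender, receiver) pairs.

    Labels are made relative to the source by XOR, so the schedule for
    source 0 is simply translated to any other source.
    """
    schedule = []
    for k in range(d):
        pairs = []
        for rel in range(1 << k):             # relative labels that already have it
            sender = rel ^ source
            receiver = (rel | (1 << k)) ^ source
            pairs.append((sender, receiver))
        schedule.append(pairs)
    return schedule


for step, pairs in enumerate(hypercube_broadcast_schedule(3)):
    print(step, pairs)
# 0 [(0, 1)]
# 1 [(0, 2), (1, 3)]
# 2 [(0, 4), (1, 5), (2, 6), (3, 7)]
```

The number of nodes holding the message doubles at each step, so $\log_2 p$ steps suffice.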

Cost of Broadcast

Broadcast algorithm generates spanning tree for given network, with source node as root

Height of spanning tree determines total number of steps required

Cost of broadcast for message of length $L$ is

1-D mesh: $(p - 1)(t_s + t_w L)$

2-D mesh: $2(\sqrt{p} - 1)(t_s + t_w L)$

Hypercube: $\log_2 p \, (t_s + t_w L)$

where $t_s$ is startup time per message and $t_w$ is transfer time per word
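These costs can be evaluated directly. A small sketch, assuming each step sends one message of $L$ words at cost $t_s + t_w L$, that the mesh source sits at a corner (giving tree heights $p - 1$ and $2(\sqrt{p} - 1)$), and that $p$ is a perfect square for the 2-D mesh and a power of two for the hypercube; the function name and topology strings are hypothetical.

```python
import math


def broadcast_cost(topology, p, L, ts, tw):
    """Spanning-tree broadcast cost: tree height times cost of one message."""
    if topology == "1d_mesh":
        height = p - 1
    elif topology == "2d_mesh":
        side = math.isqrt(p)
        if side * side != p:
            raise ValueError("2-D mesh is assumed to be square")
        height = 2 * (side - 1)
    elif topology == "hypercube":
        d = p.bit_length() - 1
        if 1 << d != p:
            raise ValueError("hypercube size must be a power of two")
        height = d                     # log2(p)
    else:
        raise ValueError(f"unknown topology: {topology!r}")
    return height * (ts + tw * L)


for topo in ("1d_mesh", "2d_mesh", "hypercube"):
    print(topo, broadcast_cost(topo, 64, 10, 1.0, 1.0))
# 1d_mesh 693.0
# 2d_mesh 154.0
# hypercube 66.0
```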

Enhanced Broadcast

For long message, for which bandwidth dominates latency, network bandwidth may be better exploited by breaking message into pieces and either pipelining pieces along single spanning tree, or sending each piece along different spanning tree with same root

In hypercube with $2^d$ nodes, with any given node as root, there are $d$ edge-disjoint spanning trees, all of which can potentially be exploited simultaneously in broadcast
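For the first option, pipelining along a single spanning tree, a rough cost model shows the tradeoff in the number of pieces. The sketch assumes a 1-D mesh with the source at one end and the message split into $k$ equal pieces, each hop costing $t_s + t_w L/k$; it is an illustrative model, not one given on the slide, and the names are hypothetical.

```python
def pipelined_chain_broadcast(p, L, ts, tw, k):
    """Broadcast L words along a 1-D mesh of p nodes in k pipelined pieces.

    The first piece needs p - 1 hops to reach the far end, and each of
    the remaining k - 1 pieces arrives one hop time behind it.
    """
    return (p - 1 + k - 1) * (ts + tw * L / k)


def best_piece_count(p, L, ts, tw, kmax):
    """Try k = 1..kmax and return the cheapest number of pieces."""
    return min(range(1, kmax + 1),
               key=lambda k: pipelined_chain_broadcast(p, L, ts, tw, k))


print(pipelined_chain_broadcast(9, 64, 1.0, 1.0, 1))   # 520.0 (unpipelined)
print(best_piece_count(9, 64, 1.0, 1.0, 64))           # 21
```

In this model the best $k$ is near $\sqrt{(p-2)\,t_w L / t_s}$, so long messages favor many pieces, while startup cost limits how finely the message should be split.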

Reduction

In reduction, data from all $p$ nodes are combined by applying specified associative operation (e.g., sum, product, max, min, logical OR, logical AND) to produce overall result

As with broadcast, reduction uses spanning tree for given network, but data flow is in opposite direction, from leaves to root

Incoming results are combined with receiving node’s value before forwarding to its parent

Final result ends up at root node; if it is also needed by other nodes, final result can be broadcast

Generic Reduction Algorithm

1. Receive message from each of my children in spanning tree, if any

2. Combine received values with my own using specified associative operation

3. Send result to my parent, if any
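A sequential Python sketch of the three steps, assuming the spanning tree is given as a dictionary of children (the name `tree_reduce` is hypothetical). The recursion mirrors only the data dependences; on a real machine independent subtrees proceed in parallel. Because values are combined in tree order rather than node order, the sketch assumes the operation is commutative as well as associative, as sum, max, and the other examples are.

```python
import operator


def tree_reduce(children, values, node, op):
    """Result that `node` sends to its parent in the generic algorithm.

    children maps each node to its children in the spanning tree (leaves
    may be absent); values maps each node to its local value. Called on
    the root, it returns the final result.
    """
    result = values[node]
    for child in children.get(node, []):
        # step 1: receive from each child; step 2: combine with own value
        result = op(result, tree_reduce(children, values, child, op))
    return result


tree = {0: [1, 2], 1: [3, 4], 2: [5]}
vals = {i: i + 1 for i in range(6)}
print(tree_reduce(tree, vals, 0, operator.add))  # 21
print(tree_reduce(tree, vals, 0, max))           # 6
```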

Reduction in Mesh

[Figure: reduction in 1-D mesh, 1-D torus (ring), and 2-D mesh, with nodes labeled by step number]

Reduction in Hypercube

[Figure: reduction in 2-cube, 3-cube, and 4-cube]
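A sketch of a hypercube reduction to node 0, assuming $p$ is a power of two and that dimensions are processed from lowest to highest (one valid choice; the figure may use another order). With this order each partial result covers a contiguous block of node numbers, so the sketch is correct for any associative operation, commutative or not. The name is hypothetical.

```python
def hypercube_reduce(values, op):
    """Reduce values (one per node) to node 0 of a hypercube.

    At step k, each node whose low k + 1 bits are all zero receives the
    partial result of its neighbor across dimension k and combines it on
    the right; afterwards partial[node] covers nodes node through
    node + 2**(k + 1) - 1.
    """
    p = len(values)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    partial = list(values)
    for k in range(d):
        stride = 1 << k
        for node in range(0, p, 2 * stride):        # receivers at this step
            partial[node] = op(partial[node], partial[node + stride])
    return partial[0]


print(hypercube_reduce(list("abcdefgh"), lambda a, b: a + b))  # abcdefgh
```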

Cost of Reduction

Reduction algorithm uses same spanning tree as broadcast, but messages flow in reverse direction

Height of spanning tree determines total number of steps required

Cost of reduction for message of length $L$ is

1-D mesh: $(p - 1)(t_s + (t_w + t_c) L)$

2-D mesh: $2(\sqrt{p} - 1)(t_s + (t_w + t_c) L)$

Hypercube: $\log_2 p \, (t_s + (t_w + t_c) L)$

where $t_c$ is cost per word of associative reduction operation
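The reduction costs can be evaluated directly. A sketch, assuming each step sends $L$ words at cost $t_s + t_w L$ and combines them at cost $t_c L$, a corner root for the meshes, and $p$ a perfect square (2-D mesh) or power of two (hypercube), which the function does not check; the name and topology strings are hypothetical.

```python
import math


def reduction_cost(topology, p, L, ts, tw, tc):
    """Spanning-tree reduction: height steps, each sending and combining L words.

    Assumes p is a perfect square for "2d_mesh" and a power of two for
    "hypercube"; these conditions are not checked here.
    """
    heights = {
        "1d_mesh": p - 1,
        "2d_mesh": 2 * (math.isqrt(p) - 1),
        "hypercube": p.bit_length() - 1,     # log2(p) when p is a power of two
    }
    return heights[topology] * (ts + (tw + tc) * L)


print(reduction_cost("hypercube", 64, 10, 1.0, 1.0, 0.5))  # 96.0
```

Compared with broadcast, only the per-word term changes, from $t_w$ to $t_w + t_c$.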

Multinode Broadcast

In multinode broadcast, each node sends message to all other nodes

This all-to-all operation is logically equivalent to $p$ broadcasts, one from each node, and could be implemented that way

Efficiency can often be improved by overlapping separate broadcasts

Total cost of multinode broadcast depends strongly on degree of overlap supported by target system

Multinode broadcast need be no more costly than standard broadcast if aggressive overlapping of communication is supported

Multinode Broadcast in Torus

In 1-D torus, broadcast can be initiated from each node simultaneously in same direction around ring

After $p - 1$ steps, each node has received data from all other nodes, so multinode broadcast is complete, at same cost as standard broadcast in 1-D mesh ($p - 1$ steps, each transferring message of original length)

In 2-D torus, ring algorithm can be applied first in each row, then in each column (or vice versa)

There are $2(\sqrt{p} - 1)$ steps for square 2-D torus

Messages for second phase are larger by factor of $\sqrt{p}$, so total amount of data transferred is still proportional to $p$
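A simulation of the ring algorithm and its row-then-column extension, as a sketch assuming one item per node and a square torus. The function names are hypothetical, and the simulation runs the rows (and then the columns) one after another, whereas on a real torus they proceed concurrently.

```python
def ring_allgather(data):
    """Multinode broadcast on a 1-D torus; data[i] starts on node i.

    In each of p - 1 steps, every node passes the item it received most
    recently to its right-hand neighbor, so all p broadcasts move around
    the ring at once. Returns, for each node, the list of all items.
    """
    p = len(data)
    known = [{i: data[i]} for i in range(p)]
    forwarding = list(range(p))      # origin of the item each node sends next
    for _ in range(p - 1):
        # node i receives what node i - 1 was forwarding
        forwarding = [forwarding[(i - 1) % p] for i in range(p)]
        for i, origin in enumerate(forwarding):
            known[i][origin] = data[origin]
    return [[k[j] for j in range(p)] for k in known]


def torus2d_allgather(grid):
    """Ring algorithm in each row, then in each column, of a square 2-D torus.

    grid[r][c] is the item on node (r, c). In the column phase each
    message is a whole row, i.e., sqrt(p) times longer than before.
    """
    n = len(grid)
    rows = [ring_allgather(row) for row in grid]   # rows[r][c] == full row r
    cols = [ring_allgather([rows[r][c] for r in range(n)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(n)]


print(ring_allgather(["a", "b", "c"]))   # [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
```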

Multinode Broadcast in Hypercube

In hypercube with $2^d$ nodes, multinode broadcast can be implemented by successive pairwise exchanges in each of $d$ dimensions, with messages concatenated at each stage

There are $\log_2 p$ steps for hypercube, but growth in message sizes means that total communication volume is still proportional to $p$
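A sketch of the pairwise-exchange scheme, assuming $p$ is a power of two and one item per node (the name is hypothetical). It also counts how many items node 0 receives, to make the volume argument concrete.

```python
def hypercube_allgather(data):
    """Multinode broadcast by pairwise exchange across each dimension.

    data[i] starts on node i (p must be a power of two). At step k every
    node swaps everything it holds with its neighbor across dimension k
    and concatenates, so message size doubles at each step. Returns the
    gathered lists and the number of items node 0 received in total.
    """
    p = len(data)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    held = [{i: data[i]} for i in range(p)]
    received_by_0 = 0
    for k in range(d):
        received_by_0 += len(held[1 << k])       # node 0's partner at step k
        held = [{**held[i], **held[i ^ (1 << k)]} for i in range(p)]
    return [[h[j] for j in range(p)] for h in held], received_by_0


lists, count = hypercube_allgather(list("abcdefgh"))
print(lists[5], count)   # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'] 7
```

Node 0 receives $1 + 2 + \cdots + p/2 = p - 1$ items over the $\log_2 p$ steps, so each node's communication volume is proportional to $p$, as with the ring algorithm, even though far fewer steps are needed.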

Reduction via Multinode Broadcast

If instead of concatenating messages they are combined using specified associative operation, multinode broadcast can be used to implement reduction

Since all nodes receive final result, this approach avoids root node having to broadcast it after reduction, thereby saving factor of up to two in cost if result is needed by all nodes
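A sketch of this idea on a hypercube, assuming $p$ is a power of two and one value per node (the name is hypothetical). It is the pairwise-exchange multinode broadcast with concatenation replaced by combination.

```python
def hypercube_allreduce(values, op):
    """Reduction via multinode broadcast on a hypercube (p a power of two).

    At step k each node exchanges its partial result with its neighbor
    across dimension k and combines the two, placing the lower-numbered
    side first so any associative op is handled correctly. Messages do
    not grow, and every node ends with the full result.
    """
    p = len(values)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    partial = list(values)
    for k in range(d):
        new = []
        for i in range(p):
            partner = i ^ (1 << k)
            lo, hi = min(i, partner), max(i, partner)
            new.append(op(partial[lo], partial[hi]))
        partial = new
    return partial


print(hypercube_allreduce([1, 2, 3, 4], lambda a, b: a + b))   # [10, 10, 10, 10]
```

With startup time $t_s$, per-word transfer time $t_w$, and per-word combining cost $t_c$, this takes $\log_2 p$ steps of cost $t_s + (t_w + t_c)L$, versus $\log_2 p\,(2t_s + (2t_w + t_c)L)$ for reduction followed by broadcast, which is where the factor of up to two comes from.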

Personalized Collective Communication

In broadcast or multinode broadcast, given node sends same message to all other nodes

In analogous personalized versions, distinct message is sent to each other node

Scatter: analogous to broadcast, but root sends distinct message to each other node

Gather: analogous to reduction, but data received by root are concatenated rather than combined using associative operation

Total exchange: analogous to multinode broadcast, but each node exchanges distinct message with each other node

Personalized Collective Communication

Scatter uses spanning tree algorithm similar to standard broadcast, but multiple messages are transmitted together at each stage

Root node sends messages to each of its children containing data for entire subtree of which that child is root

Each child retains its own portion of data and forwards appropriate subsets of remainder to each of its children

Eventually every node receives its distinct message
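A sketch of scatter from node 0 along the hypercube spanning tree, assuming $p$ is a power of two and that the root works from the highest dimension down (one common choice). The name is hypothetical; each node's dictionary shows which destination blocks it holds.

```python
def hypercube_scatter(blocks):
    """Scatter from node 0 of a hypercube; blocks[j] is meant for node j.

    Working from the highest dimension down, every node holding data
    keeps the blocks for its own subtree and sends its neighbor across
    dimension k the blocks destined for that neighbor's subtree, so the
    message size halves at each step.
    """
    p = len(blocks)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    held = {0: dict(enumerate(blocks))}          # node -> {destination: block}
    for k in reversed(range(d)):
        for node in list(held):                  # nodes that hold data so far
            child = node | (1 << k)
            outgoing = {j: b for j, b in held[node].items() if j & (1 << k)}
            for j in outgoing:
                del held[node][j]                # retain only own subtree
            held[child] = outgoing               # one message per child
    return {node: held[node] for node in sorted(held)}


print(hypercube_scatter(["b0", "b1", "b2", "b3"]))
# {0: {0: 'b0'}, 1: {1: 'b1'}, 2: {2: 'b2'}, 3: {3: 'b3'}}
```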

Personalized Collective Communication

Gather uses algorithms similar to reduction, except data are concatenated at each stage rather than combined using associative operation

Total exchange uses algorithm similar to multinode broadcast, except broadcasts are replaced by scatter operations, which are overlapped as in multinode broadcast
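One standard way to overlap the scatters on a hypercube is sketched below, as an assumption rather than as the lecture's exact scheme: each block is routed by correcting its destination bits one dimension at a time, as in hypercube routing, with all nodes exchanging across the same dimension at each step. The name is hypothetical.

```python
def hypercube_total_exchange(send):
    """Total exchange on a hypercube; send[i][j] is node i's block for node j.

    At step k each node forwards across dimension k every block whose
    destination differs from the node's own number in bit k. All p
    scatters thus proceed at once, and each step moves p/2 blocks per
    node. Returns recv with recv[j][i] equal to send[i][j].
    """
    p = len(send)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    # held[n] lists (source, destination, block) triples currently at node n
    held = [[(i, j, send[i][j]) for j in range(p)] for i in range(p)]
    for k in range(d):
        bit = 1 << k
        new = [[] for _ in range(p)]
        for n in range(p):
            for src, dst, block in held[n]:
                target = n if (dst & bit) == (n & bit) else n ^ bit
                new[target].append((src, dst, block))
        held = new
    recv = [[None] * p for _ in range(p)]
    for n in range(p):
        for src, _dst, block in held[n]:    # after d steps, _dst == n
            recv[n][src] = block
    return recv


send = [[f"{i}->{j}" for j in range(4)] for i in range(4)]
print(hypercube_total_exchange(send)[2])   # ['0->2', '1->2', '2->2', '3->2']
```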

Scan or Prefix

In scan or prefix operation, data values $x_0, x_1, \ldots, x_{p-1}$ are given, one per node, along with specified associative operation $\oplus$

Sequence of partial results $s_0, s_1, \ldots, s_{p-1}$ is to be computed, where $s_k = x_0 \oplus x_1 \oplus \cdots \oplus x_k$ and $s_k$ is to reside on node $k$, $k = 0, \ldots, p - 1$

Scan or Prefix

Scan operation can be implemented by algorithms similar to those for multinode broadcast, except that intermediate results received by each node are selectively combined, depending on sending node’s numbering, before forwarding
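A sketch of one such algorithm on a hypercube, assuming $p$ is a power of two (the name is hypothetical). Each node keeps two values: its prefix so far and the combination over its current subcube. At step $k$ it exchanges the subcube value with its neighbor across dimension $k$, and folds it into its prefix only if the neighbor is lower-numbered. This is the selective combining described above, and it works for any associative operation.

```python
def hypercube_scan(values, op):
    """Inclusive scan: node i ends with values[0] op ... op values[i]."""
    p = len(values)
    d = p.bit_length() - 1
    if 1 << d != p:
        raise ValueError("number of nodes must be a power of two")
    prefix = list(values)      # partial result destined to stay on each node
    total = list(values)       # combination over the node's current subcube
    for k in range(d):
        new_prefix, new_total = [], []
        for i in range(p):
            partner = i ^ (1 << k)
            if partner < i:    # partner's subcube precedes mine: fold it in
                new_prefix.append(op(total[partner], prefix[i]))
                new_total.append(op(total[partner], total[i]))
            else:              # partner's subcube follows mine
                new_prefix.append(prefix[i])
                new_total.append(op(total[i], total[partner]))
        prefix, total = new_prefix, new_total
    return prefix


print(hypercube_scan([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b))
# [1, 3, 6, 10, 15, 21, 28, 36]
```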

Circular Shift

In circular $k$-shift, with $0 < k < p$, node $i$ sends data to node $(i + k) \bmod p$

Such operations arise in some finite difference and matrix computations and string matching problems

Circular shift can be implemented quite naturally in ring network

Implementing circular shift in other networks can be considerably more complicated, but basically it involves embedding ring or series of rings in given network
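On a ring, a circular $k$-shift can be done as $k$ synchronous neighbor-to-neighbor steps, or as $p - k$ steps in the opposite direction, whichever is fewer. A sketch under that assumption (the name is hypothetical):

```python
def ring_circular_shift(data, k):
    """Circular k-shift on a ring: the item on node i ends up on node (i + k) mod p."""
    p = len(data)
    if not 0 < k < p:
        raise ValueError("shift must satisfy 0 < k < p")
    held = list(data)
    # go the shorter way around: k steps right, or p - k steps left
    steps, direction = (k, 1) if k <= p - k else (p - k, -1)
    for _ in range(steps):
        # every node passes its item one hop in the chosen direction
        held = [held[(i - direction) % p] for i in range(p)]
    return held


print(ring_circular_shift(["a", "b", "c", "d", "e"], 2))   # ['d', 'e', 'a', 'b', 'c']
```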

Barrier

Barrier: synchronization mechanism in which all processors must reach barrier before any processor is allowed to proceed beyond it

Implementation of barrier depends on underlying memory architecture and network

In distributed-memory systems, barrier is usually implemented by message passing, using algorithms similar to those for all-to-all communication

In shared-memory systems, barrier is usually implemented using test-and-set, semaphore, or other mechanism for enforcing mutual exclusion
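For the message-passing case, one well-known scheme is the dissemination barrier of Hensgen, Finkel, and Manber (1988): in round $r$, each node $i$ signals node $(i + 2^r) \bmod p$. The sketch below only simulates which arrivals each node knows about (the name is hypothetical); it shows that $\lceil \log_2 p \rceil$ rounds suffice, after which every node knows that all $p$ nodes have arrived and may proceed.

```python
def dissemination_barrier_rounds(p):
    """Rounds until every node knows all p nodes have reached the barrier.

    In round r, node i sends a signal to node (i + 2**r) % p; since a
    node sends in round r only after its round r - 1 signal arrived, each
    signal implicitly carries everything the sender has learned so far.
    """
    knows = [{i} for i in range(p)]          # arrivals each node is aware of
    rounds = 0
    while any(len(s) < p for s in knows):
        dist = 1 << rounds
        # node i hears from node i - dist in this round
        knows = [knows[i] | knows[(i - dist) % p] for i in range(p)]
        rounds += 1
    return rounds


print([dissemination_barrier_rounds(p) for p in (1, 2, 5, 8, 9)])   # [0, 1, 3, 3, 4]
```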
