
On the Parallel Implementation of Goldberg's Maximum Flow Algorithm*

Richard J. Anderson†   João C. Setubal‡

Department of Computer Science & Engineering, University of Washington

    Seattle, WA 98195

Abstract

We describe an efficient parallel implementation of Goldberg's maximum flow algorithm for a shared-memory multiprocessor. Our main technical innovation is a method that allows a global relabeling heuristic to be executed concurrently with the main algorithm; this heuristic is essential for good performance in practice. We present performance results from a Sequent Symmetry for a variety of input distributions. We achieve speed-ups of up to 8.8 with 16 processors, relative to the parallel program with 1 processor (5.8 when compared to our best sequential program). We consider these speed-ups very good and we provide evidence that hardware effects and insufficient parallelism in certain inputs are the main obstacles to achieving better performance.

1 Introduction

The general research area addressed in this paper is the implementation of parallel algorithms.

*This work was supported by NSF Presidential Young Investigator Award CCR-8657562, NSF CER grant CCR-861966, NSF/DARPA grant CCR-8907960, and Brazilian Agency FAPESP grant 87/1385-7.

†E-mail address: anderson@cs.washington.edu.

‡On leave from State University of Campinas, Brazil. E-mail address: setubal@cs.washington.edu.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

SPAA 92-6/92/CA
©1992 ACM 0-89791-484-8/92/0006/0168 $1.50

Our current research is focused on the implementation of combinatorial algorithms on shared-memory machines, and in this paper we present a parallel implementation of Goldberg's maximum flow algorithm [Gol87, GT88]. The problem of computing a maximum flow in a network is a fundamental combinatorial problem, with many applications in the fields of transportation planning and operations research [Law76, PS82]. We concentrate on Goldberg's algorithm for two reasons: computational experiments [DM89, AS92] have shown that it is the fastest sequential maximum flow algorithm in practice; and it has a structure amenable to parallel implementation. Those experiments and others have also shown that Goldberg's algorithm performs best when a global relabeling heuristic is incorporated into the algorithm. Running times have been observed to decrease by two orders of magnitude on moderate-sized graphs (4K vertices) when this heuristic is in place. Therefore, any parallel implementation of Goldberg's algorithm must also include this heuristic. However, the natural approach of incorporating the heuristic into a parallel algorithm gives rise to errors, due to subtle interactions between the global relabeling and the main operations of the algorithm. One of our main results is a correct method for incorporating concurrent global relabeling into a parallel implementation.

Another contribution that we present is a data structure that allows some control of parallel granularity. In many combinatorial algorithms it is the case that towards the end of execution only a few time-consuming tasks remain, and these can severely degrade parallel


performance. This has been reported, for instance, for a parallel algorithm for the assignment problem [BMPT91]. It is also the case for Goldberg's algorithm. To alleviate this problem, in our implementation processors send tasks to, and receive tasks from, a global workpile. This data structure is able to dynamically change the size of tasks assigned to processors depending on the amount of work available.

We have performed detailed measurements of the implementation's performance on a wide range of input distributions. Our input graphs have generally been graphs with semi-regular structure and random edge capacities. We have achieved what we consider to be respectable performance on a Sequent Symmetry: on one class of graphs, a speed-up relative to the one-processor version of the parallel implementation of 8.8 on 16 processors, and a speed-up of 5.8 relative to our best sequential implementation. The difference in performance between the one-processor parallel implementation and the sequential implementation is primarily the cost of locking. We have looked at the sources of performance degradation and concluded that a combination of hardware effects, like bus contention, and lack of parallelism in some inputs are the major sources of slowdown. Lack of parallelism is a factor when a significant part of the execution is devoted to handling the few remaining tasks alluded to in the previous paragraph, and the size of these tasks cannot be further decreased.

2 Goldberg's algorithm

The formal definition of the maximum flow problem is (after [Tar83]): The input is a directed graph G = (V, E), |V| = n, |E| = m, with distinguished vertices s (the source) and t (the sink), and a positive capacity c(v, w) associated with every edge (v, w) ∈ E. A flow on the graph is a real-valued function f on each edge such that (1) for all (v, w) ∈ E, f(v, w) ≤ c(v, w) (when f(v, w) = c(v, w) we say that the edge is saturated); (2) f(v, w) = -f(w, v) (antisymmetry constraint); and (3) Σ_w f(v, w) = 0 for every vertex v except the source and the sink (flow conservation constraint). The value of the flow is Σ_v f(s, v), the net flow out of the source. The desired output is a flow function which maximizes the value of the flow.

Goldberg's algorithm [Gol87, GT88] operates

over the residual graph G_f = (V, E_f): it has the same vertex set, but only edges (v, w) such that f(v, w) < c(v, w) are considered (the residual edges).¹ The basic idea in the algorithm is to introduce as much flow as possible at the source, and to gradually push it towards the sink. The algorithm allows the conservation constraint to be violated during execution: vertices can have more flow coming into them than going out. This means that at any point during execution some vertices will have excess flow; these vertices are called active. Initially the source sends all the flow it can to its neighboring vertices: they will be the first active vertices. The basic local operation of the algorithm is the push operation: an active vertex is selected for discharge and its excess flow is pushed to neighboring vertices in G_f (thus possibly activating other vertices). If the edge is saturated by the push, we say that it is a saturating push. Otherwise it is a non-saturating push. The decision as to which vertices to push the excess flow to is based on labels associated with each vertex. These labels are lower-bound estimates on the distance (in number of edges) that flow would have to cover going from the vertex to the sink. We denote the label of vertex v by d(v). The sink has fixed label 0 and the source fixed label n; other vertices can be assumed initially to have label 0. Active vertex v pushes flow to w iff (v, w) ∈ E_f and d(v) = d(w) + 1. In other words, active vertices try to send flow through (what appear to be) shortest paths to the sink. An active vertex v may find itself unable to push flow because all labels of neighboring vertices are greater than or equal to v's. In this case a relabel operation applies: v scans its neighbors and changes its label to be one higher than the minimum among the neighbors' labels.

¹Note that whenever there is flow from v to w there will be a residual edge from w to v.


Some of the flow initially pushed by the source may not be able to reach the sink; this excess flow is pushed back to the source. To achieve this the labels of the vertices that have been disconnected from the sink should now reflect the estimated shortest-path distance from each of them to the source. When there are no more active vertices, the algorithm has computed a maximum flow and stops.
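For concreteness, the following C sketch shows one way the discharge of an active vertex might look in the sequential FIFO variant just described. It is only an illustration of the push and relabel rules: the adjacency-list layout (Edge, adj, deg), the excess and d arrays, and the enqueue helper are our own assumptions, not the data structures of the implementation described in this paper.

    typedef struct Edge { int to, rev; long cap; } Edge;  /* rev: index of reverse edge in adj[to] */

    extern Edge *adj[];     /* adj[v]: edges leaving v (residual capacities stored in cap) */
    extern int   deg[];     /* deg[v]: number of edges leaving v                           */
    extern long  excess[];  /* current excess flow at each vertex                          */
    extern int   d[];       /* distance labels                                             */
    extern int   n, source, sink;
    extern void  enqueue(int v);   /* append a newly activated vertex to the FIFO queue    */

    /* Discharge active vertex v: push along admissible residual edges,
     * and relabel v when no admissible edge is left but excess remains.  */
    void discharge(int v)
    {
        while (excess[v] > 0) {
            int mind = 2 * n;                 /* smallest label among residual neighbors   */
            for (int i = 0; i < deg[v] && excess[v] > 0; i++) {
                Edge *e = &adj[v][i];
                if (e->cap <= 0)
                    continue;                 /* not a residual edge                       */
                if (d[v] == d[e->to] + 1) {   /* admissible edge: push                     */
                    long delta = excess[v] < e->cap ? excess[v] : e->cap;
                    e->cap -= delta;
                    adj[e->to][e->rev].cap += delta;   /* reverse edge gains capacity      */
                    excess[v] -= delta;
                    if (excess[e->to] == 0 && e->to != source && e->to != sink)
                        enqueue(e->to);                /* neighbor just became active      */
                    excess[e->to] += delta;
                } else if (d[e->to] < mind) {
                    mind = d[e->to];
                }
            }
            if (excess[v] > 0)
                d[v] = mind + 1;              /* relabel: one above minimum neighbor label */
        }
    }

A FIFO driver would simply dequeue an active vertex, call discharge on it, and repeat until the queue is empty.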

There is considerable flexibility in an implementation of Goldberg's algorithm; in particular, different selection rules may be used in choosing the order in which active vertices are processed. Goldberg's algorithm runs in polynomial time for any ordering of active vertex processing. If the active vertices are processed in a queue (FIFO) ordering, then the running time is O(n^3); if the active vertex of highest label is processed first, the running time is O(n^2 m^{1/2}) [CM89]. Both of these bounds are tight, although the graphs that force the given running times are very contrived. These worst case bounds improve to O(nm log(n^2/m)) when dynamic trees are used [GT88]. In all of these cases the dominant term in the running time is due to the number of non-saturating pushes.

There have been a number of studies of the

performance of sequential implementations of Goldberg's algorithm [DM89, AS92]. The results of these studies differ from the theoretical predictions, generally observing much better performance than the worst case bounds. For example, in [AS92] running times range from O(n^{1.3}) to O(n^{1.9}) for various classes of sparse graphs, for both the queue and highest-label-first implementations. The observations also suggest that dynamic trees would not give the substantial performance improvement that they give to the worst case analysis.

The approach to obtaining an efficient

parallel implementation of Goldberg's algorithm depends upon the target machine. If a very large number of processors is available (such as on a Connection Machine), then a natural implementation is to simultaneously perform push/relabel operations from all vertices, both active and inactive. However, if only a small number of processors is available, as was our case, then it is better to only process the active vertices. Since the algorithm remains correct for any ordering of

operations, the operations can proceed in an asynchronous manner, provided some mechanism is used to prevent conflicts between the individual operations. The general approach we adopted in our implementation was to maintain a global work queue which keeps track of the active vertices. A processor removes vertices from the queue, applies push operations to them, possibly relabels them, and puts any newly activated vertices back in the queue. More details will be given in the following sections.
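In rough C-style form, each processor's main loop might then look as follows; the routines get_active, put_active, and discharge_parallel, and the termination test, are illustrative assumptions rather than the actual interface of our program.

    extern long excess[];
    extern int  get_active(int *v);        /* assumed: take an active vertex, 0 if none  */
    extern void put_active(int v);         /* assumed: return a vertex to the work queue */
    extern int  all_queues_empty(void);    /* assumed global termination test            */
    extern void discharge_parallel(int v); /* pushes/relabels v under per-vertex locks;
                                              newly activated vertices are enqueued      */

    void worker(void)
    {
        int v;
        while (!all_queues_empty()) {
            if (!get_active(&v))
                continue;                  /* no work right now: retry until more arrives */
            discharge_parallel(v);
            if (excess[v] > 0)
                put_active(v);             /* v is still active: put it back              */
        }
    }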

As described above, our parallel algorithm is essentially a parallel implementation of the queue variant of Goldberg's algorithm, and the speed-ups we present are computed with respect to the sequential queue implementation. We note that our measurements [AS92] showed that the sequential queue implementation was more robust than highest-label-first, meaning that its performance was either better than or competitive with that of highest-label-first, depending on the particular class of graphs on which they were tested.

We finally note that a natural question is

whether other algorithms could have given a better parallel implementation. Among sequential algorithms, an implementation of Dinic's [Din70] was the only other close to Goldberg's in performance in our experiments [AS92], and in all cases but one it was at least twice as slow. In addition, a parallelization of Dinic's algorithm seems more complicated and unintuitive, requiring a lot of synchronization among processors. In spite of these observations we did attempt to parallelize this algorithm, but the speed-ups we obtained were very poor.

3 Concurrent Global Relabeling

As explained above, vertices' labels are updated during the execution of Goldberg's algorithm by the relabel operation. Even though these updates are sufficient for the algorithm to compute the correct result, it has been found that performing occasional global relabelings vastly improves the performance of the algorithm. This heuristic was


suggested by Goldberg and Tarjan [Gol87, GT88] and is done by performing a backwards breadth-first search (BFS) on G_f from the sink and from the source. The labels are updated to be the exact distance in number of edges from each vertex to the sink or, when a vertex cannot reach the sink, the exact distance to the source plus n. We call this heuristic periodic global relabeling.

A different heuristic, called gap-relabel and

proposed by Derigs and Meier [DM89], does not rely on BFS, but simply detects when vertices can no longer reach the sink and updates their labels to n. The new labels obtained are not as accurate as those given by global relabeling, but the heuristic is less costly. Sequential experiments that we have done indicate that in spite of its simplicity, a sequential implementation using gap-relabel is not as efficient as one using periodic global relabeling. In particular, gap-relabel does not help in returning excess flow to the source.

Given the improvements in performance that

result when these heuristics are used, any parallel implementation must include one or the other to achieve good performance. The only other parallel implementation of Goldberg's algorithm that uses such heuristic relabeling and that we are aware of [AG91] is a synchronous implementation running on a Connection Machine, and they execute the heuristic relabeling while suspending the other operations. However, in an asynchronous implementation using a small number of processors like ours it is highly desirable to allow heuristic relabeling to be done concurrently with push/relabel operations. In addition, we believe that concurrent periodic global relabeling should be preferred over gap-relabel if its cost can be kept at roughly the same levels as in the sequential implementation. The question then is: how can periodic global relabeling be incorporated into a parallel implementation?

Given the simplicity of both the push/relabel

operations and the manner in which global relabeling is done, one is tempted to simply plug the heuristic into a parallel implementation, taking only the obvious precaution of preventing simultaneous label updates. However, this approach may violate one essential invariant of

Goldberg's algorithm: given any vertex v it must be the case that d(v) ≤ d(w) + 1 for all edges (v, w) ∈ E_f. This is the valid labeling condition. Invalid labelings can cause flow to be routed in the wrong direction, resulting in the algorithm stopping with a non-maximum flow. Next we describe how to incorporate concurrent global relabeling while keeping the labeling valid.

We see each application of global relabeling as

a wave that sweeps through the graph. Waves are numbered consecutively from zero. In addition to its label, each vertex also has a wave number, which reflects the number of the wave that most recently updated it. The conditions for the push operation are augmented to allow pushes only between vertices that have the same wave number. We also ensure that in any label update the label is never decreased. Finally, both the relabel operation and global relabeling must lock the vertex to be updated, and the push operation must lock both endpoints of the edge through which flow is being pushed. Deadlock is avoided by having the push operation acquire both locks only when both are free. Denoting by Proc_i a procedure as executed by processor i, we get the following pseudo-code (compare to figure 1 in [GT88]):

Push_i(v, w)
Applicability: Processor i holds the locks for both v and w, v is active, (v, w) ∈ E_f, d(v) = d(w) + 1, and wave(v) = wave(w).
Action: Push as much flow to w as (v, w) affords, and update v's and w's excesses.

Relabel_i(v)
Applicability: Processor i holds the lock for v, v is active, and if (v, w) ∈ E_f then d(v) ≤ d(w).
Action: newd ← min{d(w) + 1 | (v, w) ∈ E_f}; if newd > d(v) then d(v) ← newd.

In addition to Push and Relabel, a processor

can also invoke the operation GlobalRelabel on a vertex v when a global relabeling wave reaches v (more details on global relabeling are given in section 4.2). We assume there are global variables CurrentWave and CurrentLevel which keep track of the current wave number and the current level in the BFS tree, respectively.


GlobalRelabel_i(v)
Applicability: Processor i holds the lock for v, wave(v) < CurrentWave.
Action: if d(v) < CurrentLevel then d(v) ← CurrentLevel;
wave(v) ← CurrentWave.
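As an illustration of how these applicability conditions might be checked in practice, the C sketch below attempts a single push using POSIX mutexes, taking both endpoint locks only when both are immediately available, which is one way to realize the deadlock-avoidance rule stated above. The lock array, the wave and d arrays, and the helpers residual_cap and apply_push are assumptions made for the sketch, not the paper's actual code.

    #include <pthread.h>

    extern pthread_mutex_t vlock[];        /* one lock per vertex (assumed)                */
    extern int  d[], wave[];               /* distance labels and wave numbers             */
    extern long excess[];
    extern long residual_cap(int e);       /* assumed helper: residual capacity of edge e  */
    extern void apply_push(int e, long delta);  /* assumed: update flow and both excesses  */

    /* Attempt a single push from v to w across edge e; returns 1 if a push happened.
     * Both endpoint locks are taken only if both are free, otherwise we back off.  */
    int try_push(int v, int w, int e)
    {
        if (pthread_mutex_trylock(&vlock[v]) != 0)
            return 0;                                  /* v is busy: give up for now      */
        if (pthread_mutex_trylock(&vlock[w]) != 0) {
            pthread_mutex_unlock(&vlock[v]);           /* take both locks or neither      */
            return 0;
        }
        int pushed = 0;
        if (excess[v] > 0 && residual_cap(e) > 0 &&
            d[v] == d[w] + 1 &&                        /* admissibility condition         */
            wave[v] == wave[w]) {                      /* same global relabeling wave     */
            long delta = excess[v] < residual_cap(e) ? excess[v] : residual_cap(e);
            apply_push(e, delta);
            pushed = 1;
        }
        pthread_mutex_unlock(&vlock[w]);
        pthread_mutex_unlock(&vlock[v]);
        return pushed;
    }

If either trylock fails the processor simply moves on to other work and retries later, so a busy neighbor never blocks a processor.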

With these modifications we can prove that the algorithm is correct.

Theorem 1 The parallel algorithm incorporating global relabeling as described above correctly computes a maximum flow.

The proof is based upon Goldberg's original proof [Gol87]. The main modification is in establishing that the labeling remains valid, which is given by the following lemma:

Lemma 1 With the method described above, for all edges e = (v, w) ∈ E_f such that both v and w belong to the same global relabeling wave, it is the case that d(v) ≤ d(w) + 1 throughout the algorithm's execution.

Proof: First we note that initially the

labeling is valid. To simplify the proof we assume that only one processor is performing the breadth-first search for the current global relabeling wave, while the others are concurrently doing push/relabel operations.

Case 1: Push operation. Consider edge (v, w) as in the statement of the lemma. Since a push from v to w can only happen if v and w belong to the same wave, and both v and w are locked while the push is being done, the labeling remains valid.

Case 2: Relabel operation. If vertex v is being relabeled, it is locked, so no other processor will change v's label while v's neighbors are scanned. Suppose v's new label is d(w) + 1. Between the time v checks w's label and the time that v actually is relabeled, w's label may have changed. But w's label can only increase; therefore v's label is still valid. No change in the flow of edge (v, w) can happen, since v is locked.

Case 3: Global relabeling. Assume the current wave is number k and that it is relabeling vertices at level l + 1 (meaning that these vertices are at least l + 1 edges away from the sink; an analogous argument applies to vertices that can only reach the source). Consider vertex v on this level and the actual label d(v) that v has when it is reached by wave k.

a: d(v) > l + 1. This means that v's label was raised by a local relabel operation; therefore, by case (2) above, its label is valid.

b: d(v) = l + 1. Suppose there exists an edge (v, w) such that w has also been reached by wave k and d(w) < l + 1. Then it must be the case that d(w) was set by a relabel operation, so by case (2) above the labeling between v and w is valid.

c: d(v) < l + 1. In this case the wave itself raises d(v) to l + 1, and the argument of case (b) then applies. ∎

4 Implementation details

4.1 The queue for active vertices

The queue for active vertices is actually divided in two: one part shared, and the other local to each processor. The local queue is further subdivided into an inqueue and an outqueue. A processor takes a vertex from its inqueue and discharges it. When the inqueue is empty, a processor gets a new batch of b vertices from the shared queue and stores them in its local inqueue. As these vertices are discharged, newly activated vertices are placed in the outqueue, which also has size b. When the outqueue gets full, the processor places its entire contents in the shared queue. A processor becomes idle


if it exhausts its inqueue and both the shared queue and its outqueue are empty. It is activated again as soon as the shared queue receives new vertices.

The granularity control previously alluded to

is achieved through dynamic adjustment of the value of the variable b, taking into account the number of idle processors. This adjustment is done as follows:

• initially b = 16 (the maximum value).

• b is halved if at least 2 or 15% of the processors are idle.

• b doubles if a_p + a_v/b > 1.5p, where a_p is the number of active processors, a_v is the number of available vertices in the shared queue, and p is the total number of processors.

• the rules above are checked every 200 discharge operations.

The idea of having different rules to increase and to decrease b is to prevent too much oscillation in the value of b.
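Read literally, the adjustment rules above might be coded roughly as follows; the counter variables and the exact way idle and active processors are counted are our assumptions for the sketch.

    #define B_MAX 16
    #define B_MIN 1

    extern int b;                   /* current batch size                                  */
    extern int num_procs;           /* p: total number of processors                       */
    extern int idle_procs;          /* processors currently waiting for work               */
    extern int active_procs;        /* a_p: processors currently discharging vertices      */
    extern int shared_queue_size;   /* a_v: active vertices available in the shared queue  */

    /* Called once every 200 discharge operations (illustrative sketch of the rules). */
    void adjust_batch_size(void)
    {
        if (idle_procs >= 2 || idle_procs * 100 >= 15 * num_procs) {
            if (b > B_MIN)
                b /= 2;             /* processors are starving: hand out smaller batches   */
        } else if (active_procs + (double)shared_queue_size / b > 1.5 * num_procs) {
            if (b < B_MAX)
                b *= 2;             /* plenty of queued work: hand out larger batches      */
        }
    }

Halving on a small amount of idleness but doubling only when there is ample queued work gives the asymmetry mentioned above, which damps oscillation in b.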

4.2 Global Relabeling

Given the correct way to perform global relabeling concurrently with push/relabel operations, there remains the question of how to divide all these tasks among the processors. Two possibilities are apparent: (1) whenever the time comes for global relabeling, assign only one processor to do it; (2) make all processors take part in all operations. The advantage of (1) is that a simple queue suffices as data structure, since there is never contention for it. This, in fact, was our initial approach. However, it has the significant drawback of limiting the global relabeling's share of the total parallel work to 1/p (where p is the number of processors being used). Profiles of sequential executions indicated that global relabeling can be responsible for as much as 40% of the work, so clearly for any p > 2, a parallel implementation based on approach (1) might be doing less global relabeling than necessary. We therefore adopted approach (2), as described next.

Similarly to the sequential implementation, global relabeling is applied periodically to the graph. When global relabeling is active, after every discharge a processor tries to retrieve up to q vertices from the BFS queue (q is a parameter of the implementation, and for the results described here we used the value 4). The immediate predecessors of these vertices are then processed and possibly placed in the BFS queue. The key to obtaining maximum benefit from global relabeling at the lowest cost comes from the fact that it is done concurrently with other operations and because a processor performs this BFS operation only if the BFS queue is not currently being used by any other processor; otherwise it goes on to perform another discharge operation.

The frequency with which global relabeling is

applied is based upon the total number of discharge operations performed by all processors. Our experience with sequential implementations has shown that, for a given input graph, there seems to be an optimum frequency for global relabeling, in the sense that it represents the best trade-off between obtaining better labels and not spending too much time to get them. In the sequential case, a frequency that does a good job on most inputs is every 2n discharge operations, and that is the frequency we used in the parallel implementation. Due to the way we implemented global relabeling, the number of discharge operations carried out while global relabeling is in progress increases as the number of processors increases. This means that beyond a certain number of processors global relabeling will be done continuously, since more than 2n discharge operations will have been completed before the wave terminates. With the parameter settings that we used, we estimate this threshold to be around 30 processors.
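One plausible reading of this scheme in C is sketched below: after each discharge a processor tries to grab the BFS queue with a trylock and, if it succeeds, advances the current wave by up to q vertices. All of the names (the bfs_lock mutex, bfs_pop/bfs_push, pred, global_relabel) are assumptions made for the sketch; only the trylock discipline and the value q = 4 come from the text.

    #include <pthread.h>

    #define Q_STEP 4                    /* q: vertices taken per visit (value used in the paper) */

    extern pthread_mutex_t bfs_lock;    /* guards the BFS queue of the current wave (assumed)    */
    extern int  CurrentWave;
    extern int  wave[];
    extern int  bfs_pop(int *v);        /* assumed: take the next vertex of the current level    */
    extern void bfs_push(int w);        /* assumed: append w to the next BFS level               */
    extern int  pred_count(int v);      /* assumed: number of residual predecessors of v         */
    extern int  pred(int v, int i);     /* assumed: i-th residual predecessor of v               */
    extern void global_relabel(int w);  /* the GlobalRelabel operation; locks w and rechecks
                                           its applicability conditions under the lock           */

    /* Called after every discharge while a wave is active.  If another processor
     * already holds the BFS queue we simply return and do more discharges.       */
    void bfs_step(void)
    {
        if (pthread_mutex_trylock(&bfs_lock) != 0)
            return;                                   /* queue busy: skip this turn        */
        for (int k = 0; k < Q_STEP; k++) {
            int v;
            if (!bfs_pop(&v))
                break;                                /* level drained or wave finished    */
            for (int i = 0; i < pred_count(v); i++) { /* scan residual predecessors of v   */
                int w = pred(v, i);
                if (wave[w] < CurrentWave) {          /* not yet reached by this wave      */
                    global_relabel(w);
                    bfs_push(w);
                }
            }
        }
        pthread_mutex_unlock(&bfs_lock);
    }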

5 Experimental Results

5.1 Machine

The experiments were conducted on a Sequent Symmetry S81 with 20 Intel 16 MHz 80386 processors and 32 megabytes of memory, running DYNIX 3.0. Each processor has a 64 Kbyte


cache memory. The program was written in C using the Parallel Programming Library provided with Sequent systems, which allows the forking of processes, one per processor.

5.2 Test data and methodology

We tested the implementation on many kinds of graphs, and we present the results for three different input classes, as follows:

Random level graphs: These are rectangular

grids of vertices, where every vertex in a row has 3 edges to randomly chosen vertices in the following row. The source and the sink are external to the grid; the source has edges to all vertices in the top row, and all vertices in the bottom row have edges to the sink. Edge capacities are integers drawn randomly and uniformly from [1, 10^4]. Two sub-classes were considered: wide, in which there are more columns than rows, and long, vice-versa; below we use the acronyms rlgw and rlgl to designate these classes, respectively. Results in this paper are for instances with n ≈ 2^14 vertices.
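As an illustration, a random level graph of this kind could be generated along the lines of the C sketch below. The vertex numbering, the add_edge interface, and the capacity used on the source and sink edges (the text does not specify one; a large value is assumed here) are all our own assumptions.

    #include <stdlib.h>

    extern void add_edge(int from, int to, int cap);   /* assumed graph-building helper */

    #define BIG_CAP 1000000000   /* capacity on source/sink edges (assumed, not given in the text) */

    /* Generate a random level graph with r rows and c columns of grid vertices,
     * numbered row-major 0..r*c-1, plus a source (r*c) and a sink (r*c+1).       */
    void gen_random_level_graph(int r, int c)
    {
        int n = r * c, source = n, sink = n + 1;
        for (int j = 0; j < c; j++)
            add_edge(source, j, BIG_CAP);              /* source to every top-row vertex   */
        for (int i = 0; i < r - 1; i++)
            for (int j = 0; j < c; j++)
                for (int k = 0; k < 3; k++)            /* 3 edges to random vertices below */
                    add_edge(i * c + j, (i + 1) * c + rand() % c,
                             1 + rand() % 10000);      /* capacity uniform in [1, 10^4]    */
        for (int j = 0; j < c; j++)
            add_edge((r - 1) * c + j, sink, BIG_CAP);  /* bottom row to sink               */
    }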

Rmf graphs: these were described in [GG88] and are made of l_1 square grids of vertices (frames), each having l_2 × l_2 vertices, connected to each other in sequence. The source is in a corner of the first frame, and the sink is in a corner of the last frame. Each vertex is connected to its grid neighbors within the frame, and to one vertex randomly chosen from the next frame. There are about 6n edges in the network. Edge capacities within a frame are 10^4 × l_2 × l_2. Capacities for edges between frames are chosen uniformly from the range [1, 10^4]. Also in this case we considered wide (few and large frames) and long (many and small frames) varieties; below we use the acronyms rmfw and rmfl to designate these classes, respectively. Results in this paper are for instances with n = 2^14 vertices.

Acyclic dense graphs: these are complete

directed acyclic graphs. Edge capacities are in the range [1, 10^6]; below we use the acronym ad to designate this class. Results in this paper are for instances with n = 500 vertices.

The methodology in preparing the experiments was as follows:

• for every input class, 15 instances were generated using different seeds for the pseudo-random generator. For each instance 3 runs were conducted. The running times and other statistics that we report below are averages of these 45 runs, for each input class. The standard deviation for each set of 15 instances was computed for each average and generally found to be within 15% of the mean (in all cases it was under 25%).

• Runs on the same set of inputs were done with the number of processors ranging from 1 to 16.

• Reported running times for both sequential and parallel implementations do not include input and initialization time.

5.3 Times and speed-ups

We define speed-up as the running time of our best sequential program divided by the parallel running time. Relative speed-up is defined using the running time of the parallel program with 1 processor as the numerator. We denote the number of processors by p.

Table 1 gives the sequential and parallel running times for 16 processors, as well as the speed-ups and relative speed-ups obtained.

Figure 1 shows how relative speed-up changes with the number of processors, for input classes rmfl and rmfw. The curves for classes rlgl, rlgw, and ad were similar to that for rmfl.

5.4 Analysis of results

In this section we concentrate on relative speed-up and try to answer the question: why isn't it optimal? We provide evidence that lack of parallelism and hardware effects are the major factors. Another potential source of slowdown is lock contention, but this does not appear to be a problem in our implementation.

Lack of parallelism results whenever the number of active vertices is less than the number of processors in use. If this is the case for a significant part of the execution then the resulting speed-up will be poor.


Figure 1: Relative speed-up for classes rmfl (solid) and rmfw (dashed).

Table 1: Average running times (in seconds) and speed-ups obtained for each input class, with columns input class, n, m, sequential time, parallel time, speed-up, and relative speed-up. Parallel times were obtained with 16 processors.

We can have an idea of how much parallelism is available by looking at the queue contents during the sequential algorithm's execution. Let us define a phase as the period during which the algorithm discharges all vertices that were activated in the previous phase. In the first phase the active vertices are the source's neighbors. The number of active vertices at the beginning of a phase can be seen as the maximum available parallelism. In general we found that available parallelism decreases drastically after a certain point in the execution. Depending on the input class, this can result in a large number of phases with just a few vertices, meaning that the remaining flow that has to be pushed is flowing through just a few paths. To illustrate this, in table 2 we present a profile of the phases for all input classes tested. There we see that in classes rlgw, rlgl, and rmfl there is a large percentage of phases with many vertices, and these were the classes where we got the best speed-ups. In contrast, the class where speed-up was poorest (rmfw) has a large percentage of phases where the number of available vertices is less than 50.
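A phase profile of this kind could be gathered with a small amount of instrumentation on the sequential queue program, for example along the lines of the C sketch below; the counters and the queue_size helper are our own instrumentation, not part of the original code.

    #define MAX_PHASES 100000

    extern int queue_size(void);        /* assumed: number of vertices currently queued */

    static int phase_size[MAX_PHASES];  /* active vertices at the start of each phase   */
    static int num_phases = 0;
    static int left_in_phase = 0;       /* discharges remaining in the current phase    */

    /* Call once immediately before each discharge of the sequential FIFO program. */
    int phase_tick(void)
    {
        if (left_in_phase == 0) {                       /* previous phase fully discharged */
            left_in_phase = queue_size();               /* everything queued now was       */
            phase_size[num_phases++] = left_in_phase;   /* activated during the last phase */
        }
        left_in_phase--;
        return num_phases - 1;                          /* phase number of this discharge  */
    }

Comparing phase_size[i] against the number of processors then indicates how much of the execution had fewer active vertices than processors.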

Hardware effects is a broad term that encompasses phenomena like bus contention, cache flushes due to false sharing, and other memory hierarchy effects typical of a shared-memory machine.


The minimum-cost flow problem can be solved by techniques similar to those used for maximum flow, but it generally requires substantially more computation, which in turn gives a greater opportunity for parallelization.

References

[AG91] F. Alizadeh and A. V. Goldberg. Experiments with the Push-Relabel Method for the Maximum Flow Problem on a Connection Machine. Paper presented at the DIMACS Implementation Challenge Workshop, 1991.

[AS92] R. J. Anderson and J. C. Setubal. Goldberg's Algorithm for Maximum Flow in Perspective: a Computational Study. Submitted for inclusion in the DIMACS Implementation Challenge Workshop Proceedings, 1992.

[BMPT91] E. Balas, D. Miller, J. Pekny, and P. Toth. A Parallel Shortest Augmenting Path Algorithm for the Assignment Problem. JACM, 38(4):985-1004, 1991.

[CM89] J. Cheriyan and S. N. Maheshwari. Analysis of Preflow Push Algorithms for Maximum Network Flow. SIAM Journal on Computing, 18(6):1057-1086, 1989.

[DM89] U. Derigs and W. Meier. Implementing Goldberg's Max-Flow Algorithm: a Computational Investigation. ZOR - Methods and Models of Operations Research, 33:383-403, 1989.

[Din70] E. A. Dinic. Algorithm for Solution of a Problem of Maximum Flow in a Network with Power Estimation. Soviet Math. Dokl., 11:1277-1280, 1970.

[Gol87] A. V. Goldberg. Efficient Graph Algorithms for Sequential and Parallel Computers. Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, Mass., Jan. 1987.

[GT88] A. V. Goldberg and R. E. Tarjan. A New Approach to the Maximum-Flow Problem. Journal of the ACM, 35(4):921-940, 1988.

[GG88] D. Goldfarb and M. Grigoriadis. A Computational Comparison of the Dinic and Network Simplex Methods for Maximum Flow. Annals of Operations Research, 13:83-123, 1988.

[Law76] E. L. Lawler. Combinatorial Optimization: Networks and Matroids. Holt, Rinehart, and Winston, New York, 1976.

[PS82] C. H. Papadimitriou and K. Steiglitz. Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, Englewood Cliffs, N.J., 1982.

[Tar83] R. E. Tarjan. Data Structures and Network Algorithms. SIAM, Philadelphia, Pennsylvania, 1983.
