DISCRETE OPTIMIZATION - people.math.gatech.edu


DISCRETE OPTIMIZATION

Notes prepared for:

MATH 2602 Linear and Discrete Mathematics

Fall 1999

Table of Contents

1. Introduction to Some Discrete Problems
   1.1 Illustrations
   1.2 Overview
2. Formulations
3. Solutions
   3.1 Branch-and-Bound
      3.1.1 Branch-and-Bound Solution to the Knapsack Problem
      3.1.2 Branch-and-Bound Solution to the Traveling Salesman Problem
   3.2 Direct Approaches
      3.2.1 1-Machine Scheduling
      3.2.2 Vertex Covering on Trees
      3.2.3 Matching
4. Chinese Postman Problem
5. Approximation Procedures
   5.1 Bin-Packing
   5.2 Vertex Cover
   5.3 Traveling Salesman Problem
6. Final Comments
7. Exercises
8. References


DISCRETE OPTIMIZATION

Earlier, we were introduced to the rudiments of linear programming (LP), a methodology that to this day remains an exceptionally powerful tool in the problem-solving arsenal of mathematicians, engineers, systems planners, and other analysts. With the ever-increasing power of modern-day computers, commercial LP software is now routinely available for the solution of truly large linear optimization models. Unfortunately, however, there are many problems that arise in practice, and hence are of substantial importance, but that are simply not amenable to modeling and solution as linear programs. This state of affairs suggests, of course, that analysts have to enhance their array of tools in order to deal effectively with these problems, many of which are exceptionally difficult. In this document, we examine a small sample of the sorts of problems that reside beyond the realm of routine LP-solvability, and we look at some of the strategies that are possible in dealing with them.

1 Introduction to Some Discrete Problems

We begin by considering a few real-world examples. For the sake of brevity, the presentation of some of these may seem overly simplistic and/or contrived; however, in each case there are many manifestations of the problem setting described that occur every day in the world of practice. As a consequence, it is essential that effective strategies for their solution be found.

1.1 Illustrations

Example 1. Within a large geographical region there are a number of localities where television transmitters can be located. The cost of locating transmitters can be substantial and varies by locale. Find a placement of transmitters such that all customers in the region are served at minimum total location cost. □

Example 2. During the Second World War, some allied air squadrons utilized planes requiring two pilots. However, not all pairings of pilots were admissible due to differences in language and/or technical expertise. During combat runs, the aim was to have as many planes as possible in the air at once. How could this be easily guaranteed? □

Example 3. A variety of devices can be loaded in a designated compartment of a space vehicle. All of these devices are functional, but none is absolutely indispensable to the mission. Unfortunately, the capacity of the compartment is limited, and so not all of the items can be loaded. If NASA has assigned a “score” to each item reflecting its respective value to the mission, find a selection of items that maximizes total score/value and that does not exceed the capacity restriction of the designated vehicle compartment. □

Example 4. Suppose the irregularly shaped layout of unit squares shown in Figure 1 is to be tiled using precut slabs of fine marble. These slabs are cut in 1 × 2 unit-square configurations. Now, it is extremely expensive to cut these into two 1 × 1 squares, so it is important to try to cover the region shown perfectly using only 1 × 2 slabs. Can this be done? □

Example 5. A new subdivision has been built in the northern suburbs of Atlanta. Pictorially, this subdivision can be captured by the graph shown in Figure 2. Edges represent streets and vertices depict intersections in the new neighborhood. The good citizens in the subdivision have contracted with a private refuse collection firm to pick up trash on a weekly basis. A single vehicle will enter the subdivision at the point marked x and depart at the point marked y. What is the least total distance route through the neighborhood, assuming that the vehicle must travel down each street at least once? □


Figure 1: Region to be tiled

Figure 2: Graph depicting layout of subdivision

Example 6. A large number of single-operation jobs are to be processed on a machine. Each job requires a certain processing time, and at most one job at a time can be processed on the machine; once started, the processing of a job cannot be interrupted. Now, when a given job is finished, the following job cannot be assumed to simply begin instantaneously, since there is a nonnegligible amount of changeover time involved in removing the predecessor job, cleaning the machine, and setting it up for processing the successor work. Moreover, this changeover time is a function of sequence, i.e., the changeover between jobs a and b may differ from that between a and c, from that between b and a, etc. The aim is to process all of the jobs so as to complete the entire batch in the least amount of time possible. □

1.2 Overview

What should be clear in these examples is that admissible solutions in each case involve discrete outcomes. That is, the decision variables are not allowed to be continuous because they quantify phenomena that are inherently integral. It should also be obvious that this stipulation is far from casual. For example, it is easy to agree that per Example 1, it would make no sense to have an outcome that assigned a fraction of a television transmitter to a portion of a locale; from Example 5, it would be nonsensical to consider routes that required the refuse vehicle to simultaneously travel down two (or more) different streets in the subdivision. Relaxing the integrality requirements for the sorts of problems described above is not just sloppy modeling but rather is tantamount to simply missing their fundamental intent; discreteness is at the core of these problems, not the periphery.

Unfortunately, this inherent integrality that lies at the heart of most discrete problems gives rise to what is most likely a permanent condition of intransigence regarding the existence of efficient solution procedures. In fact, this condition is well studied in the context of complexity theory. The latter subject is certainly beyond the scope of this presentation, but it is worth pointing out that the vast majority of interesting and practical discrete problems are known to reside in the notorious complexity class of NP-Complete problems. Members of this class (which consists of hundreds of problems) share a common bond in that none is known to be solvable in any “efficient” sense, but if one were to be so resolved, then so would every other member of the class. Similarly, if there is ever a proof that a member of this difficult class cannot be efficiently solved, then this negative condition must hold for the other members as well. While certainly circumstantial, being in the class NP-Complete is strong evidence that a fast solution is not likely.

Of course, the indictment of being an NP-Complete problem does not remove our need to look for some way to deal with it. In what follows, we examine, in an introductory manner, a few results.


2 Formulations

If one is faced with problems like those posed above and knows little else, a good starting point in any analysis is to create a mathematical model. Suppose we give this a try for a few of the earlier cases.

Assignment of pilots (Example 2). First, let us portray this problem in a rather standard pictorial fashion. Assume the pilots are identified with vertices in a graph, and let vertices be connected by edges reflecting the compatibility of a given pair of pilots. Obviously, the greatest number of planes that can fly corresponds to the maximum number of admissible pairings. But this corresponds to a largest subset of edges (from the graph representing the problem) having the property that no two edges from the subset are incident to a common vertex. Now, letting a pilot-pair/edge be given by (i, j), where i and j denote the respective pilots/vertices, let us define decision variables xi,j where xi,j = 1 if pilots i and j are matched and 0 otherwise. If the graph representing a given instance of this pilot-pairing problem is given by G = (V, E), then a correct mathematical model for determining a maximum number of these independent edges is the following:

(PM)  max  ∑_{(i,j)∈E} xi,j

      s.t.  ∑_{(v,i)∈Ev} xv,i ≤ 1  for every v ∈ V

            xi,j = 0 or 1  for all edges (i, j) ∈ E.

Note that we have used Ev to denote the set of edges in G that are incident to vertex v.

Discussion: Clearly, the objective function in PM captures the number of pilot-pairings, i.e., it simply counts edges. On the other hand, the constraints model the requirement that every pilot can be paired with at most one other pilot, the choices for which are depicted by the edges incident to the respective vertex in the graph. Obviously, we must require the variables xi,j to be 0–1, since a pairing is either created or it is not.

This problem is referred to as a matching problem. Employing the graph-theoretic perspective, we would seek a subset of edges that constitutes a matching in the corresponding graph or, alternately, an independent subset of edges. Of course, rather than matching or pairing pilots per Example 2, we might seek to match freshman roommates for a university housing office, assign pairs of jobs to be simultaneously processed on two identical machines, or seek an assignment of job applicants to a set of positions. The problem settings may differ, but the underlying matching model PM does not. □
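To make the model PM concrete, the sketch below (not part of the original notes) exhaustively enumerates edge subsets of a tiny, invented compatibility graph and returns a largest independent subset of edges; the four pilots and their admissible pairings are hypothetical data for illustration only, and the approach is practical only for very small graphs.

```python
from itertools import combinations

def max_matching(vertices, edges):
    """Exhaustively find a largest independent subset of edges (a maximum
    matching), mirroring model PM. Feasible only for tiny graphs."""
    for k in range(len(edges), 0, -1):
        for subset in combinations(edges, k):
            endpoints = [v for e in subset for v in e]
            if len(endpoints) == len(set(endpoints)):  # no shared vertex
                return list(subset)
    return []

# Hypothetical pilot-compatibility graph: 4 pilots, edges = admissible pairs.
pairs = max_matching([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
print(pairs)  # → [(1, 2), (3, 4)]
```

Here pilots 1–2 and 3–4 fly together, so two planes can be in the air at once.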

Packing the spacecraft (Example 3). Let an item to be packed on the spacecraft be denoted by j and the variable that decides whether or not it gets packed by xj. Obviously, xj is restricted to take on a value of either 1 (packed) or 0 (not packed). Each item has a value vj, and each utilizes an amount aj of the craft’s overall total capacity of b. It should be easy to see that if we can solve the following model, we will be able to provide NASA with an optimal packing of its items:

(PK)  max  v1x1 + v2x2 + · · · + vnxn

      s.t.  a1x1 + a2x2 + · · · + anxn ≤ b

            xj = 0 or 1  for j = 1, 2, . . . , n.

Discussion: This problem is referred to generally as a knapsack problem. The name derives from an obvious notion: a hiker has to select from a group of items, all of which may be suitable for his trip, a subset that has greatest value while not exceeding the capacity (weight, space, etc.) of his knapsack. Naturally, the “knapsack” may be a compartment on a spacecraft as in our illustration, but it may also be a budget that limits the amount invested in various stocks, or perhaps a time window within which certain projects have to be completed. □
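As a sanity check on model PK, the following sketch (my own illustration, not from the notes) solves a tiny instance by complete enumeration of all 2^n binary vectors; the scores, space requirements, and capacity are invented for the example.

```python
from itertools import product

def knapsack(values, weights, capacity):
    """Enumerate every 0-1 assignment of model PK and keep the best feasible
    one. Exponential in n, so useful only as a check on small instances."""
    best_value, best_x = 0, [0] * len(values)
    for x in product([0, 1], repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            value = sum(v * xi for v, xi in zip(values, x))
            if value > best_value:
                best_value, best_x = value, list(x)
    return best_value, best_x

# Hypothetical instance: scores v_j, space requirements a_j, capacity b.
print(knapsack([10, 13, 7], [4, 5, 3], 8))  # → (20, [0, 1, 1])
```

Items 2 and 3 fill the capacity of 8 exactly and achieve the greatest total score, 20; note how quickly this enumeration would become hopeless as n grows.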

Single-processor scheduling (Example 6). This is a problem frequently encountered in the broad realm of production planning. Accordingly, let us assume that the machine upon which the jobs are to be processed starts in an initial state from which setup is incurred prior to processing the first job. Upon conclusion of the final job in the processing sequence, the machine is cleaned and returned to its initial state. Now, if the aim is to process all of the jobs and to do so in minimum overall time, it is evident that for any permutation of the jobs the total processing time is a constant (which needs no optimizing, of course) but that the effect of changeover/setup effort on overall completion time is directly influenced by the sequential arrangement of the jobs on the machine. Had there been no changeover/setup involved, the problem would hold no interest, since every possible permutation of the jobs produces the same sum; with changeover times introduced, however, the problem is quite complicated, involving, for an n-job instance, a search over as many as n! possibilities, all of which are admissible.


Let us create a “dummy” job, denoted by 0, that represents the initial/final state of the machine. This is a modeling trick that, while not absolutely necessary, will help us a bit in building (and interpreting) our subsequent formulation. Now, let variables xi,j be defined so that if job i is sequenced immediately prior to job j, then the respective xi,j is set to 1; otherwise, it takes on value 0. Then, denoting by ci,j the changeover time for processing the pair (i, j), we create the following model:

(PS)  min  ∑_{i=0}^{n} ∑_{j=0}^{n} ci,j xi,j

      s.t.  ∑_{i=0}^{n} xi,j = 1  for j = 0, 1, . . . , n

            ∑_{j=0}^{n} xi,j = 1  for i = 0, 1, . . . , n

            ∑_{i∈S} ∑_{j∉S} xi,j ≥ 1  for all nonempty S ⊂ {0, 1, . . . , n}

            xi,j = 0 or 1  for 0 ≤ i, j ≤ n.

This formulation is trickier than the first two; let us see whether it is even valid. Obviously, if the pair (i, j) is processed as intended, then its changeover time ci,j is picked up in the objective function, since the respective xi,j is 1. Now, the first set of constraints does nothing more than assure that a given job is preceded directly by exactly one other job, and similarly, the second set guarantees that each is succeeded by exactly one job. The third set of constraints is more interesting (but complicates matters substantially). Let us try to see what these constraints are doing by imagining what could happen if they were not included. If only the first two sets of equality constraints were in force, then it should be easy to see that outcomes (i.e., ones satisfying the constraints) could occur involving “subcycles” of jobs that, in turn, make no sense in the context of the production schedule. So, what the third set of constraints does is assure that all jobs are processed before the machine has to be returned to its initial state. In fact, by defining the initial and final state to be a “job,” what we are really seeking with PS is a single cycle that includes each job (including the dummy job) once and only once. In this regard, the third set of constraints enforces the requirement that for any possible subset of jobs (that could otherwise form a subcycle), at least one of these jobs must be sequenced/processed immediately prior to some other job that is not in the subset.

Discussion: It turns out that the discrete model that we have employed for this scheduling problem is a celebrated one. To recognize this, suppose we interpret jobs to be cities and the changeover times ci,j to be intercity distances. Then any tour or cycle that visited every city exactly once and did so with least total distance would produce the same, albeit disguised, outcome that was described within the context of production scheduling. In fact, sticking with this “tour of cities” interpretation, by treating PS we would be solving a traveling salesman problem, one of the most famous (and difficult) in all of discrete optimization. In its purest form, the latter seeks a least-weight spanning cycle in a weighted complete graph. In our practical setting involving scheduling, we have simply borrowed this interpretation by letting vertices be jobs, using edges to capture changeovers, and specifying edge weights to be the changeover values ci,j. One interpretation seeks a tour for a salesman, the other a production schedule. In either case, the model PS produces a correct solution. □
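The job/city equivalence can be checked directly on a tiny instance. The sketch below (my own, not part of the notes) solves the problem behind PS by trying every tour that starts and ends at point 0, where point 0 plays the role of the dummy job and the matrix entries are a hypothetical set of changeover times/distances.

```python
from itertools import permutations

def tsp(c):
    """Solve a tiny TSP (equivalently, the 1-machine changeover problem
    modeled by PS) by checking all (n-1)! tours that begin and end at
    city 0. c[i][j] is the distance/changeover from i to j."""
    n = len(c)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm + (0,)
        cost = sum(c[tour[k]][tour[k + 1]] for k in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

# Hypothetical 4-point changeover/distance matrix (not symmetric).
c = [[0, 1, 9, 9],
     [9, 0, 1, 9],
     [9, 9, 0, 1],
     [1, 9, 9, 0]]
print(tsp(c))  # → (4, (0, 1, 2, 3, 0))
```

Even this brute-force check makes the n! growth mentioned above tangible: at n = 15 the loop would already visit over 87 billion tours.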

With modeling in general, a given problem can often be formulated in several ways. For example, researchers and analysts have formulated TSPs in many different ways for years. Some formulations are tailor-made for specific cases while others are more robust, suited for general settings. In any event, modeling can be a very useful exercise in its own right, and interested readers are encouraged to try their hand accordingly at the other illustrations given earlier.

3 Solutions

Creating a mathematical model of a problem is a good start. In fact, when dealing with linear programming formulations, this accomplishment often (though not always) brings the analyst very close to actually resolving a problem, i.e., once formulated correctly, the resultant linear program is usually routinely solved. Unfortunately, with most discrete problems, this is not the case.

Consider the knapsack model given by PK. For an n-variable instance, there are obviously 2^n possible 0–1 n-tuples to be considered. Of course, many of these may be inadmissible, but removing these cases (even if we knew ahead of time what they were) does not provide much relief; indeed, eliminating half of these solutions still leaves 2^(n−1) possibilities. For n even reasonably big (e.g., 100), this reduction is not terribly meaningful.

But “wait,” you say. The solution spaces for admissible linear programs are infinite (so long as at least two distinct feasible solutions exist); for discrete problems, they are finite, or at least countable. The fact is, however, that for linear programs we have some very powerful structural properties at work that essentially convert the search from an infinite space to a finite one by allowing us to examine only the extreme points of the feasible region of a given LP. We know exactly how to identify these extreme points algebraically, and for most practical instances the resultant search among them is quite efficient. Sadly, we have no similar machinery for most discrete problems, and herein lies a key difficulty. For the vast majority of interesting discrete optimization problems, we are usually forced to employ some form of (at best) implicit enumeration over the respective solution spaces of the problems. As the simple illustration with PK points out, this could be a very long search indeed. In fact, if the reader needs more convincing, imagine even a partial enumeration of the number of possibilities for a small instance of the traveling salesman problem . . . say, one involving only 100 cities.

Unfortunately, these discrete problems insist on arising often in practice, and somehow they do get treated, either by exact (optimal) procedures or perhaps with heuristic (approximate) methods. In the following section, we look at one of the primary general-purpose optimization methods for tackling discrete problems; a couple of our prior examples are used to demonstrate its application.

3.1 Branch-and-Bound

For the most part, general-purpose strategies for dealing with hard discrete optimization problems involve some approach that is enumerative in nature. Naturally, this is a bit scary, since we saw previously that the size of the solution spaces for popular problems can be enormous (or greater). Obviously, a trivial algorithm for such a problem is to look at all solutions and pick the best; however, enumerating over these spaces might take time that outlasts the life of the analyst, even with the fastest machine known available to facilitate the search. On the other hand, we may be able to hasten these searches by looking only at portions of the full space, and even then, portions that possess reasonable chances of yielding the optimal. This is the basic idea in the partial enumeration strategy popularly referred to as branch-and-bound.

The basic concept of a branch-and-bound approach is particularly simple: we attempt to create a proof that a solution is optimal based on successive separations of the solution space of the problem. The “branching” piece of the approach refers to the separation process, whereas “bounding” occurs by assessing values on the regions resulting from the separation, values that specify limits (i.e., upper and lower bounds) on the best achievable solution value of any member in a given region. In some sense, branch-and-bound is simply a “divide and conquer” strategy. Above all, it is really an educated search of a problem’s solution space that is facilitated by defining the original problem over more and more restricted versions of the original space. Obviously, this is a very general description, and indeed, specific algorithmic manifestations of the so-called branch-and-bound notion can vary substantially from problem to problem, as well as among various approaches for the same problem.

Let us be a little more formal. If our original problem is denoted by P, then let the set of admissible solutions to P be given by S. Then at any stage in the search, the set S can be defined by a (possibly empty) subset S0 of already considered solutions and a list of others given by S1, S2, . . . , Sq that collectively exhaust S. That is,

S = S0 ∪ S1 ∪ . . . ∪ Sq.

Each Sk characterizes a candidate problem, say Pk, and accordingly, our procedure essentially involves a search that is guided by maintaining and updating a list of candidate problems; for some candidate problem Pk, we need only evaluate whether it must be examined further in identifying an optimum for P. If the answer is no, then we would remove Pk from the candidate problem list and set S0 ← S0 ∪ Sk. This is called fathoming, i.e., we need look no deeper into the subspace represented by Sk. On the other hand, if the answer is yes, then we would enumerate Sk further by replacing Pk with new candidates that are derived from Pk by adding further restrictions to Sk. We would then select another candidate problem from our active list and repeat; if no candidates remain that need to be explored further (if they have all been fathomed), then we are done, and we should have a solution and a certificate (proof) of its optimality.

Any “branching” that occurs in branch-and-bound is identified with the candidate problem creation step; sometimes these candidates are referred to as subproblems. But what about the “bounding” part of the strategy? In fact, this is how we decide whether or not a candidate problem is fathomed; the notion is, in concept, easy. If Pk is a candidate problem and if the optimal solution value for Pk is denoted by z∗(Pk), then we say that the value b(Pk) is a lower bound on z∗(Pk) if z∗(Pk) ≥ b(Pk). When the inequality is reversed, b(Pk) is an upper bound.

So, when examining whether or not to further restrict and hence search deeper in some Sk, we know that this would not be necessary if:

• an optimal solution were known for Pk,
• we knew that Pk had no feasible solutions, or
• z∗(Pk) ≥ bu, where bu is the value of some current, best known solution found in the search at this point. The latter is referred to as an incumbent solution.

If any of these three conditions holds, then Pk is fathomed. If the first condition is applicable, then a new incumbent may have been found, and if all candidates get fathomed by the second, we would know that P itself had no admissible solutions.

Of course, for a given problem, we have to decide how to create candidate subproblems (i.e., how to branch), and we certainly need to be able to evaluate valid and reliable bounds on these candidates. In general, we want bounds that are “tight” in the sense that for a candidate Pk, we would like to have b(Pk) be as close to z∗(Pk) as possible. Clearly, we cannot guarantee a level of strength that will always hold, but if we are reasonably clever and attentive to special structures that exist in our problem, we can often do quite well. Still, these are matters that are highly problem-dependent and, in some cases, even quite data-dependent. Below, we demonstrate a simple application of branch-and-bound using the knapsack problem.
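The candidate-list scheme just described can be written down generically. The minimization skeleton below is my own sketch, not pseudocode from the notes; the callback names (lower_bound, branch, is_complete, value) are hypothetical stand-ins for the problem-specific pieces, and the toy use that follows minimizes a linear function of binary variables with a trivially computed bound.

```python
def branch_and_bound(root, lower_bound, branch, is_complete, value):
    """Generic minimization skeleton of the candidate-list search described
    in the text. A candidate P is fathomed when its bound cannot beat the
    incumbent; otherwise it is replaced by further-restricted candidates."""
    incumbent, best = None, float("inf")
    candidates = [root]                     # active list of subproblems
    while candidates:
        P = candidates.pop()
        if lower_bound(P) >= best:
            continue                        # fathomed: cannot beat incumbent
        if is_complete(P):                  # P pins down a single solution
            if value(P) < best:
                incumbent, best = P, value(P)   # new incumbent
        else:
            candidates.extend(branch(P))    # branch: restrict S_k further
    return incumbent, best

# Toy use: minimize c.x over binary x; a "candidate" is a fixed 0-1 prefix.
c = [3, -5, 2, -1]
n = len(c)

def lb(P):  # fixed part plus the best the still-free variables could add
    return (sum(c[i] * P[i] for i in range(len(P)))
            + sum(min(0, v) for v in c[len(P):]))

sol, z = branch_and_bound((), lb,
                          lambda P: [P + (0,), P + (1,)],
                          lambda P: len(P) == n,
                          lambda P: sum(c[i] * P[i] for i in range(n)))
print(sol, z)  # → (0, 1, 0, 1) -6
```

The bound lb never overestimates what a completion of the prefix can achieve, which is exactly the lower-bound property b(Pk) ≤ z∗(Pk) required above.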

3.1.1 A Branch-and-Bound Solution to the Knapsack Problem

Recall PK given in general form earlier, and consider the following small instance:

min 2x1 + 26x2 + 45x3 + 60x4 − 100x5

s.t. x1 + x2 + x3 + x4 − 4x5 ≥ 0

     xi ∈ {0, 1} for i = 1, . . . , 5

To motivate a specific implementation of the general branch-and-bound scheme sketched out above, let us make a simple observation. Clearly, the troublesome aspect of a knapsack problem is the restriction on the variables to be either 0 or 1. Now, if for each variable xi we were to replace this requirement with the continuous inequality 0 ≤ xi ≤ 1, we would end up with an instance of a linear programming problem, and we know that the latter are easily solved. Suppose that we solved this linear programming relaxation and the outcome happened to be integral, i.e., all of the decision variables take on values at their lower or upper bounds of 0 or 1. Trivially, this outcome must be optimal for the “real” instance; indeed, this would really mean that the original imposition of the 0–1 restrictions was redundant; they really weren’t needed. Of course, this outcome is highly unlikely in general, but the point is that by solving the relaxation we know that the corresponding continuous solution value can never be any worse than the optimal value of the integer problem. That is, the optimal solution value found over a relaxation of the original solution space can never be worse than that found with regard to the original space. This simple attribute produces a ready-made source of bounds.

So, if we solve the LP relaxation of the instance above (called P1), the solution x∗ = (1, 0, 0, 0, 1/4) is produced with value −23, and although it is infeasible because x5 = 1/4, we know that an optimal solution to the integer problem has value no better than this. Now, suppose we consider the set of all feasible solutions to the original instance and partition them on the following basis. If some variable x∗j in our LP relaxation has fractional value f, then clearly the collection of all solutions S is preserved by partitioning S into S1 and S2, induced by enforcing the constraints xj ≤ ⌊f⌋ and xj ≥ ⌈f⌉. For our instance, we select x5 and form the constraints x5 ≤ 0 and x5 ≥ 1, which define two new candidate problems. Adding the first of these inequalities to the original LP relaxation and solving (P2) produces a continuous solution of (0, 0, 0, 0, 0) having value 0. The solution is all integer, so it is feasible for the original problem, but it is premature to claim that it is optimal, since there remains a sizeable portion of the solution space within which a better solution might be found. In order to test this possibility, we restore the original relaxation, add the second constraint, and solve the resultant LP instance (P3), whereupon we observe a solution given by (1, 1, 1, 1, 1) having value 33. This solution is also feasible (i.e., integral), but even had it not been, we know that any further exploration is meaningless because the value of this solution is worse than that of our current incumbent, which has value 0. We thus have our proof of optimality; all solutions are accounted for by the stated partition, so we are safe in proclaiming that the solution given by (0, 0, 0, 0, 0) must be optimal. Figure 3 summarizes the computation.


Figure 3: Branch-and-bound tree summarizing knapsack solution

Discussion: In the application above, the first candidate problem in the branch-and-bound was simply the LP relaxation of the original instance. The decision to pursue further exploration of this candidate was based upon the LP relaxation solution that produced at least one fractional-valued variable. Branching on the only fractional-valued variable in this case produced two new candidates, and the bounds on both of these were produced by solving the corresponding relaxations for each. Although not relevant in this particular case, had one of the two new candidates produced another fractional-valued outcome with a value strictly less than the value of the current incumbent, then we would not have fathomed that subproblem but rather would have pursued some further restriction based upon the indicated round-up/round-down separation rule.
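Since the worked instance has only five binary variables, the branch-and-bound conclusion is easy to double-check by brute force; the verification sketch below is my own and simply enumerates all 32 binary vectors against the instance as stated above.

```python
from itertools import product

# Exhaustive check of the worked instance:
#   min 2x1 + 26x2 + 45x3 + 60x4 - 100x5
#   s.t. x1 + x2 + x3 + x4 - 4x5 >= 0,  xi in {0, 1}
best = min(
    (2*x[0] + 26*x[1] + 45*x[2] + 60*x[3] - 100*x[4], x)
    for x in product([0, 1], repeat=5)
    if x[0] + x[1] + x[2] + x[3] - 4*x[4] >= 0
)
print(best)  # → (0, (0, 0, 0, 0, 0))
```

Note that setting x5 = 1 forces x1 = x2 = x3 = x4 = 1 (value 33), so the all-zero vector with value 0 is indeed optimal, agreeing with the branch-and-bound certificate.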

In any practical application of this strategy, there are computational details that have to be addressed. Certainly, one might ponder various fractional-valued variable selection strategies for branching. The example used above was uninteresting on this count because there was no choice. Still, various rules of thumb might be investigated: one rule might pick a variable that is closest to one of the two integer bound values, while another might select one on some comparative basis, such as the contribution of its weight in the objective function or how much “space” in the knapsack it consumes (its aj value). Just the same, any of these rules would be heuristic in nature, meaning that they possess no certainty of always performing well.

Note that if the variables in a problem are restricted to take on only values of 0 or 1, then the “round-up, round-down” partitioning rule is equivalent to simply fixing a fractional-valued variable (upon which the branching is based) at either 0 or 1, i.e., if one branch adds the constraint xj ≥ 1, but xj ≤ 1 because the problem is binary, then obviously the only way to satisfy both of these is for xj = 1; similarly, for the branch on xj ≤ 0, given nonnegativity requirements, it must be that xj = 0. However, this is an outcome reflecting the binary problem; if our problem simply requires variables to be, for example, nonnegative integers, the inequalities indicated above are routinely applied as indicated in the illustration.

There may also be merit in looking at ad hoc ways to strengthen the bounds that are obtained from the LP relaxations. Various computational studies have addressed this. Finally, it is worth pointing out that in applying the strategy proposed above, the solution of a candidate subproblem (after the initial one) is not obtained “from scratch.” That is, if Pk1 and Pk2 are the two subproblems created from the parent problem Pk, then the LP solution to the latter is usually easily modified to produce the respective solutions to the two subproblems. □

3.1.2 A Branch-and-Bound Solution to the Traveling Salesman Problem

In this section, we provide what should be a more interesting demonstration of the basic branch-and-bound strategy by applying it to the solution of the traveling salesman problem (TSP). While the essential notions remain intact, this application should reveal enough technical detail so that the reader is able to appreciate the approach more fully.

Recall that the TSP seeks a least-distance tour through a set of n cities, visiting each city exactly once. We assume the tour begins and ends at the same city. Recall as well that the formulation PS given earlier is a correct model of the general problem. Now, in theory, we could approach the TSP in the same way that we dealt with the knapsack problem. That is, we could replace the 0-1 requirement on the variables, now given by xi,j, by continuous constraints of the form 0 ≤ xi,j ≤ 1 for all 1 ≤ i, j ≤ n. Then, we are left with an LP which we could try to solve as a source of bounds, branching as before on any fractional-valued variables.


Unfortunately, this is not a viable strategy at all. First, for even small instances, fractional-valued variables would simply occur entirely too often, which would in turn require entirely too much branching, leading to severe, in fact crippling, computational demands. However, there is another problem that is a little more subtle. As we related previously, the key constraints in the PS formulation are the subtour constraints:

∑_{i∈S} ∑_{j∉S} xi,j ≥ 1 for S ⊂ {1, 2, . . . , n}.

These inequalities derive their name because they prohibit the formation of premature subcycles. Unfortunately, there are on the order of 2^n of these constraints, and so for even moderate values of n it is not entirely clear that we could even write all of the constraints down in a reasonable amount of time. Obviously, another approach has to be sought.

Suppose we were to simply remove the subtour constraints from our model. This would leave us with the relaxation below, which we will denote by PA:

(PA) min ∑_{i=1}^{n} ∑_{j=1}^{n} ci,j xi,j

s.t. ∑_{i=1}^{n} xi,j = 1 for j = 1, 2, . . . , n

∑_{j=1}^{n} xi,j = 1 for i = 1, 2, . . . , n

xi,j = 0 or 1 for 1 ≤ i, j ≤ n.

But PA is the formulation of the well-known assignment problem. The name originates from the application most commonly ascribed to it: n people are to be assigned to n jobs; each person is assigned to exactly one job and each job has exactly one person assigned to fill it. The cost of placing person i in job j is ci,j, and the aim is to produce a least-cost assignment of people to jobs. Obviously, this assignment interpretation can be easily modified to involve a matching of pairs of cities in the traveling salesman itinerary, i.e., city i "assigned" to directly precede city j, etc.

Why is this relevant? Well, it turns out that PA has a very important (and somewhat rare) property that allows us to replace the 0-1 restriction on the variables xi,j by continuous ones of the form 0 ≤ xi,j ≤ 1, and then solve the resultant linear program with an outcome that will be all integral, i.e., in any solution, the variables will automatically take on values of either 0 or 1! This means that exceptionally large instances of PA can be easily solved, using the substantial advantage that being a linear program affords us, while obtaining "free of charge" a solution to an inherently discrete problem. Now, every admissible solution to PS is also feasible for PA, but not the converse, which means that solutions to PA provide a natural source of lower bounds on the TSP solution. This should now begin to shape an idea for a branch-and-bound application for PS.

Suppose that given an instance of PS, we first create its assignment relaxation, PA. Solving the latter will produce n variables xi,j with value 1, each of which corresponds to an arc (i, j). If these n arcs form a single cycle, then our assignment solution has also solved the corresponding TSP instance. Far more likely, however, is that these n arcs will form two or more subcycles, an outcome that is not admissible for PS. But we would expect this to happen, since the constraints that would have prohibited this subcycle formation were exactly those that were removed in order to create our relaxation. Now, just as for the knapsack problem, where we selected and then enforced one of the violated constraints (reflected by a fractional-valued variable), here we can invoke the same rationale by picking one of the subcycles and identifying the subtour constraint that prohibits it. If the subcycle contains, say, k cities, then in any admissible TSP solution to be derived from the candidate assignment problem, at least one of these k must be directed to a city different from the other k−1 in the subcycle. Hence, we can branch by creating k new subproblems, each derived from the candidate PA. Letting S be the set of cities in the subcycle and taking each in turn, where a specific one is denoted by t, define the assignment relaxation PAt, which is identical to PA except that the variables xt,j are set to 0 for j ∈ S \ {t}. That is, if city t does not directly precede some other city in S, then it must directly precede a city not in S, which, of course, breaks the subcycle that existed in the solution to the parent relaxation PA. These PAt become new candidates.

Following is a more precise statement of a branch-and-bound algorithm for the TSP.

Algorithm for the Traveling Salesman Problem

Step 1: Let the initial incumbent (dummy) value be ∞ and solve the original assignment relaxation. If the outcome is a tour, stop with the optimal TSP solution. If there are subtours, go to Step 2.

Step 2: Find an assignment relaxation solution having smallest value. Select a smallest subtour. If there are q cities in the subtour, create q candidate assignment problems by invoking the relevant subtour elimination constraint as indicated above. Solve each corresponding assignment problem and then go to Step 3.

Step 3: If any of the newly created candidate assignment problems solves without subtours, pick the best and compare its value with the current incumbent solution value; if better, a new incumbent is produced. Assignment solutions with subtours whose values are less than the incumbent value become new candidates. Select one with smallest value. If this is not less than the incumbent value, then the optimal solution has been found. Otherwise, go to Step 2. 2
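The scheme above can be sketched in Python. This is our own illustrative code, not part of the notes: the assignment relaxation is solved by brute force over permutations (adequate only for tiny instances; real implementations use a fast assignment algorithm warm-started from the parent), and candidates are explored in a simple last-in order rather than strictly best-first as in Step 2.

```python
import itertools, math

def solve_assignment(cost, banned):
    """Brute-force assignment relaxation: a min-cost permutation
    (city i -> successor p[i]) avoiding banned arcs and self-loops."""
    n = len(cost)
    best, best_p = math.inf, None
    for p in itertools.permutations(range(n)):
        if any(p[i] == i or (i, p[i]) in banned for i in range(n)):
            continue
        v = sum(cost[i][p[i]] for i in range(n))
        if v < best:
            best, best_p = v, p
    return best, best_p

def subtours(p):
    """Decompose the successor array p into its cycles."""
    seen, cycles = set(), []
    for s in range(len(p)):
        if s in seen:
            continue
        cyc, i = [], s
        while i not in seen:
            seen.add(i); cyc.append(i); i = p[i]
        cycles.append(cyc)
    return cycles

def tsp_branch_and_bound(cost):
    incumbent, inc_tour = math.inf, None
    candidates = [frozenset()]            # each candidate = a set of banned arcs
    while candidates:
        banned = candidates.pop()
        value, p = solve_assignment(cost, banned)
        if p is None or value >= incumbent:
            continue                      # infeasible, or fathomed by the bound
        cycles = subtours(p)
        if len(cycles) == 1:
            incumbent, inc_tour = value, p    # assignment solution is a tour
        else:
            S = min(cycles, key=len)      # branch on a smallest subtour
            for t in S:
                extra = {(t, j) for j in S if j != t}
                candidates.append(banned | extra)
    return incumbent, inc_tour
```

On a small hypothetical 4-city instance such as cost = [[0,1,10,10],[1,0,1,10],[10,1,0,1],[10,10,1,0]], the initial relaxation yields two 2-city subcycles, and the branching resolves them to the optimal tour of length 13.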

The easiest way to proceed at this point is to simply work through an example. In this regard, consider a 6-city instance where the ci,j values are given by the array shown in Figure 4, i.e., the travel time between cities i and j is identified with the value in cell (i, j). Obviously, we cannot travel between a city and itself directly, so the variables identified with diagonal elements in C are prohibited, which is enforced by assigning a large weight to the respective cells.

Figure 4: Intercity cost data for TSP instance

Now, letting PA1 be the assignment relaxation of the original TSP instance, its solution produces the following: x1,2 = x2,3 = x3,1 = x4,6 = x5,4 = x6,5 = 1 and all other xi,j = 0. The value of this solution is 15 (the sum of the six ci,j values corresponding to the variables having value 1). Since subtours are formed, the indicated assignment solution is not feasible for the corresponding TSP instance; however, its value is a lower bound on the optimal value of the latter.

Let us select (arbitrarily) the subtour involving cities 1, 2, and 3. Accordingly, we will create three new candidate assignment problems, which themselves are relaxations of three restricted TSP instances, each derived from the original candidate PA1. Let these be PA2, PA3, and PA4, where below we indicate the specific restrictions identified with each:

PA2: x1,2 = x1,3 = 0
PA3: x2,1 = x2,3 = 0
PA4: x3,1 = x3,2 = 0.

For example, in the solution of PA2, while city 1 may again be part of a subcycle, it will at least not include cities 2 and 3; that is, the earlier subcycle has been eliminated from any deeper search for TSP solutions within the space corresponding to PA2. Enforcement of this vis-à-vis a solution of the assignment problem is accomplished by setting the respective ci,j to some very large value, i.e., c1,2 = c1,3 = ∞.

Solving PA2 produces an assignment solution that is also a tour and has value 24. Solving PA3 also produces a tour, one with value 20, which is better, and it becomes the new incumbent. Now, the solution to PA4 produces a pair of subtours and has value 19. Although not a tour, the bound of 19 indicates the possibility of ultimately locating a better TSP solution by further exploration in this portion of the overall space. Picking the smaller of the subtours (hence fewer descendant subproblems) creates new candidates PA5 and PA6 as indicated in Figure 5. The restrictions added to PA4 would be x1,2 = 0 for PA5 and x2,1 = 0 for PA6. Solving each of these produces assignment solutions that are tours, but neither is an improvement on the present incumbent that was found in PA3. Every candidate problem/branch is now fathomed, and we may safely conclude that the tour 1-2-6-5-4-3-1 is optimal for the original TSP instance. The entire computation is summarized by the tree in Figure 5. Observe that assignment subproblem solutions are depicted by graphs next to the respective nodes in the tree.

It is worth pointing out a computational subtlety in the application just demonstrated. The bound value of 19 produced by solving PA4 might be enough to fathom the corresponding branch even though its value is strictly less than the value of the incumbent at that time (20). If the solution to PA4 had been unique, and assuming the data for our instance is integer, then any ultimate tour derived from deeper exploration within PA4 would have to add at least 1 to the bound value. But this would then be equal to the incumbent value, and hence deeper exploration at this point could not lead to a strict improvement. Of course, for real instances involving hundreds of cities, there may be many assignment solutions to a given PAt, at least one of which is an actual tour. Rather than wasting time trying to test whether this is the case, it is most likely better, computationally speaking, to simply continue the branching process and let the fathoming decisions occur naturally.

Discussion: From a computational perspective, the subproblems formed in our approach above are very easy to solve. That is, exceptionally large assignment problems can be treated without substantial effort. As suggested in the previous example, when a new assignment problem is formed from some parent candidate, there is no fresh restart; the existing solution to the parent is easily modified.

Unlike the case with the knapsack solution, the branching strategy we proposed for this TSP application does not create subproblems that explicitly partition the space of the parent from which they are derived. However, this is not so important, because the most crucial aspect of branching holds: no solution to the parent candidate is lost when the subproblems are created. Here, it just means that some solutions can be found in more than one subproblem's solution space. In other approaches to the TSP (of which there are many), branching rules are available that do, in fact, create separations that form a true partition; however, this does not necessarily imply any computational savings. Rather, the latter is often realized by stronger bounding rules.

On a final note, we should point out a theoretical attribute that was alluded to previously. The claim was made that the linear relaxation of any assignment problem solves with integer values, i.e., 0 or 1. The reason this happens is that the constraints of the assignment model PA have a very special structure. In particular, if viewed in compact form, the constraint matrix would of course consist of only 0's and 1's, but more importantly, if one were to compute the determinant of every square submatrix of this constraint matrix, we would always obtain values of 0, +1, or -1. Matrices having this property are referred to as totally unimodular, and what this means is that the extreme points of a linear program with such a constraint matrix are always integral (assuming that the right-hand-side values of the constraints are integer). Assignment problems are among an elite few discrete models that exhibit this total unimodularity property, and it is precisely what allows the linear relaxation of PA to solve with only values of 0 and 1. 2
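The total unimodularity claim is easy to check by brute force on a small instance. The following sketch (ours, for illustration) builds the constraint matrix of a 2 × 2 assignment problem and verifies that every square submatrix has determinant 0, +1, or -1; the incidence matrix of a 3-cycle fails the same test.

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion (fine for tiny matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j+1:] for row in M[1:]])
               for j in range(n))

# Constraint matrix of the 2x2 assignment problem: columns are
# x11, x12, x21, x22; rows are the two row-sum and two column-sum constraints.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

def totally_unimodular(A):
    """Check every square submatrix's determinant against {-1, 0, 1}."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[r][c] for c in cols] for r in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True
```

By contrast, the edge-vertex incidence matrix of a triangle, [[1,1,0],[0,1,1],[1,0,1]], has determinant 2 and so is not totally unimodular, which is consistent with the fractional behavior of odd cycles seen later in the matching discussion.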


Figure 5: Branch-and-bound tree summarizing TSP solution

3.2 Direct Approaches

Branch-and-bound remains a mainline, general-purpose strategy for dealing with hard discrete optimization problems. Other approaches such as cutting-plane methods and dynamic programming are used, but the basic divide-and-conquer tactic of branch-and-bound persists (often in combination with the other methods) as a viable tool. While particularly simple in concept, focused applications of the strategy can get reasonably complicated at the level of problem-specific details. By its inherently enumerative nature, branch-and-bound can also exact a heavy computational burden, especially when dealing with large (read: practical) instances. Nonetheless, it remains, in some manifestation, one of the best alternatives in the optimization toolkit.

On the other hand, there are numerous settings where one need not resort to branch-and-bound; indeed, where it would be embarrassing to do so (such cases have occurred). Sometimes, for example, there is enough special structure in a problem, either indigenous to the problem itself (e.g., assignment problems) or because we are presented with some special, restricted cases, to allow an approach that is direct in the sense that we can solve the problem in some nonenumerative, constructive fashion. These are typically the best sorts of strategies because they consume far less effort. Unfortunately, they don't arise that often. In this section, we examine, through some examples, the sorts of outcomes that reflect the essence of these approaches.

3.2.1 1-Machine Scheduling

While the vast majority of problems in the context of scheduling are hard and thus are solid candidates for enumerative approaches like branch-and-bound, there are some important cases where we can do much better.

Problem 1: Suppose we have a set of n jobs, each requiring one operation on a machine; we have only one machine and it can process at most one job at a time. Each job j has a duration time tj. The completion time of a job is denoted by cj, and the aim is to find a sequence of the n jobs that exhibits a minimum sum of completion times, ∑j cj.

Solution: Obviously, the completion time of a job is a function of where it is positioned in a given n-job sequence, so the objective is to find, among the n! possibilities, a sequence that minimizes the stated total completion time measure. Now, we certainly could write a mathematical model for this problem, but the exercise would be tedious. Put more bluntly, it would be misguided. Moreover, there is no need to ponder branch-and-bound possibilities. Indeed, all that is required to solve this problem is to arrange the n jobs in nondecreasing order of their processing times. For example, if 10 jobs have tj values given by (5, 7, 3, 8, 2, 3, 9, 2, 1, 6) for jobs 1, 2, . . . , 10 respectively, one solution would be to process in the order (9, 5, 8, 3, 6, 1, 10, 2, 4, 7). Obviously, the effort in applying this algorithm is only as much as a fast sorting algorithm. 2
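Using the 10-job data above, the rule amounts to a single stable sort (a sketch of ours, not from the notes):

```python
# Processing times t_j for jobs 1..10 from the example above.
t = {1: 5, 2: 7, 3: 3, 4: 8, 5: 2, 6: 3, 7: 9, 8: 2, 9: 1, 10: 6}

# Shortest-processing-time order: sort jobs by nondecreasing t_j
# (Python's sort is stable, so ties are broken by job number).
order = sorted(t, key=t.get)   # -> [9, 5, 8, 3, 6, 1, 10, 2, 4, 7]

def total_completion_time(order, t):
    clock, total = 0, 0
    for j in order:
        clock += t[j]          # job j completes at time c_j = clock
        total += clock
    return total
```

This reproduces the processing order (9, 5, 8, 3, 6, 1, 10, 2, 4, 7) stated in the text.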

Problem 2: Suppose we modify Problem 1 by introducing for each job a due-date, denoted by dj. If the lateness of a job, Lj, is given by cj − dj, then what job sequence minimizes total lateness?

Solution: A hasty examination of this problem might suggest that an ordering on job due-dates would resolve the matter. However, it turns out that the problem is solved by the same sequence that solves the total completion time question of Problem 1. In fact, this is easy to see, since ∑j Lj = ∑j (cj − dj) = ∑j cj − ∑j dj. But the sum of all due-dates is constant, and so minimizing total lateness is equivalent to minimizing total completion time; the solution follows by the same nondecreasing duration-time ordering. 2

Discussion: The field of combinatorial scheduling theory (from which problems like the ones above are drawn) is rich in interesting discrete problems. Unfortunately, the vast majority of these problems are hard and must be attacked by enumerative procedures, popular among which is indeed branch-and-bound. It also turns out to be a field full of interesting but frustrating attributes, involving what appear to be harmless alterations to otherwise easy problems that actually produce exceptionally hard cases. For example, the tardiness Tj of a job is the amount by which its completion time exceeds its due-date, i.e., Tj = max(0, cj − dj). Then if we modify Problem 2 above and seek a sequence of the n jobs that minimizes ∑j Tj, one would find that no known ordering "trick" akin to those for the other two problems exists. In fact, minimizing total tardiness is an exceptionally hard problem with no known solution short of enumerative-based ones. Interestingly, if rather than minimizing ∑j Tj we had sought to minimize the maximum tardiness, given by maxj Tj, then ordering on job due-dates (earliest first) provides the answer. 2
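The earliest-due-date rule for minimizing maximum tardiness is just as easy to code; the job data below are hypothetical, chosen only for illustration:

```python
t = {1: 4, 2: 2, 3: 3}        # processing times (hypothetical data)
d = {1: 9, 2: 4, 3: 6}        # due-dates (hypothetical data)

order = sorted(t, key=d.get)  # earliest-due-date (EDD) order

def max_tardiness(order, t, d):
    clock, worst = 0, 0
    for j in order:
        clock += t[j]
        worst = max(worst, max(0, clock - d[j]))  # T_j = max(0, c_j - d_j)
    return worst
```

For this data the EDD order (2, 3, 1) completes every job on time (maximum tardiness 0), whereas processing in the order (1, 2, 3) yields maximum tardiness 3.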

3.2.2 Vertex Covering on Trees

If G = (V, E) is a graph, then a vertex cover in G is a subset C ⊂ V such that every edge in G is incident to at least one vertex in C. Naturally, the optimization question of interest is to find a smallest vertex cover in G. For arbitrary graphs, this is a hard problem, and no direct algorithm is likely to exist. On the other hand, if our graphs are trees (connected, acyclic graphs), the situation changes.

Problem 3: Let T be a tree. Find a minimum vertex cover in T.

Solution: We claim the following strategy will always work (i.e., for any tree):

Begin with C = ∅. Find any degree-1 vertex in the tree and identify the unique vertex adjacent to it, say v. Set C ← C ∪ {v} and remove v from the tree along with all of its incident edges. Repeat this process on the resulting subtree and continue until all edges have been eliminated. 2
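A direct transcription of this leaf-peeling strategy might look as follows (our own sketch; the tree is supplied as a list of edges):

```python
def tree_vertex_cover(edges):
    """Minimum vertex cover of a tree given as a list of edges.
    Repeatedly find a degree-1 vertex, put its unique neighbor v into
    the cover, and delete v together with all edges incident to it."""
    edges = set(map(frozenset, edges))
    cover = set()
    while edges:
        # Degree of each vertex in the remaining forest.
        deg = {}
        for e in edges:
            for u in e:
                deg[u] = deg.get(u, 0) + 1
        leaf = next(u for u, k in deg.items() if k == 1)
        (v,) = next(e for e in edges if leaf in e) - {leaf}  # unique neighbor
        cover.add(v)
        edges = {e for e in edges if v not in e}             # remove v's edges
    return cover
```

On the path 1-2-3-4-5, for instance, the procedure returns the cover {2, 4}, which is clearly minimum.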

An illustration of our proposed solution to this problem is provided in Figure 6. For ease, we simply indicate the degree-1 vertices that are identified iteratively (as edges are removed per the actual statement of the algorithm) and the vertices v. The latter are darkened.

Figure 6: Vertex cover on trees

Clearly, the strategy is fast. But is it correct? How would we go about establishing the matter one way or the other? Certainly, an argument for its "incorrectness" would follow if we could exhibit an instance where, upon a correct application of the procedure, an outcome was produced that was known to be dominated by a vertex cover of smaller size, i.e., if we could produce a counterexample. Fortunately, however, it turns out that the proposed algorithm is optimal in that it will always produce a smallest vertex cover on any tree. Following, we give a simple, constructive argument establishing this correctness.

Recall that a matching in a graph G = (V, E) is a subset M ⊂ E such that no two edges in M are incident to the same vertex. The notion of a matching was introduced earlier in the context of the pilot pairing problem (Example 2). In any event, matchings and vertex covers in graphs are very closely related. Specifically, if M and C are arbitrary matchings and covers in some G, then it is always the case that |M| ≤ |C|. This follows since the edges in M are independent, and hence at least one vertex incident to each edge in M must be in any cover (otherwise the respective edge is not covered). This relationship holds for any matching and cover, which certainly includes a maximum matching and a minimum cover, M∗ and C∗ respectively.

Now, let us examine the simple algorithm that was proposed. Each time a vertex v is identified, a single incident edge {v, k} is also identified (k is the relevant degree-1 vertex). When v is removed, so too are all of the edges incident to it; when the process is repeated, other vertices and edges are similarly identified, and upon conclusion we will have produced a set C that is certainly a cover, but we will also have produced a set of edges that are independent and hence constitute a matching. Moreover, the matching is exactly the same size as the resultant cover C. If this matching is denoted by M, then certainly it cannot be larger than a maximum matching in the graph, and of course our cover can be no smaller than a minimum one, which leads to the following:

|M| ≤ |M∗| ≤ |C∗| ≤ |C|.

But it then follows that |M| = |M∗| = |C∗| = |C|, and the solution produced by our algorithm is optimal.

Discussion: Probably the most interesting notion presented above is the short proof that our simple algorithm works; often, we are not so fortunate. Still, the lesson should not be lost regarding how we accomplished this for the stated problem and its solution algorithm. If we have a pair of problems that can be related in some consistent way, such as by demonstrating a relationship where any solution value to one is bounded in some direction by any solution value to the other, then we have what is often referred to as a weak duality relationship. Students who have studied fundamental primal-dual relationships in linear programming will be familiar with this notion. In any event, given a weak duality property between a pair of problems, a ready-made argument for establishing the validity of an algorithm for solving one of them is, once a solution is in hand, to show how to produce a solution to its dual problem that has the same value. By the weak duality relationship, we know that our solution must be optimal and that our algorithm must be correct.

Finally, it is important to note that while the inequality relationship above regarding covers and matchings holds for arbitrary graphs, if our structure is a tree, then the size of the optimal cover is always the same as that of an optimal matching. In fact, this is true for any graph that is free of odd cycles (trees have no cycles at all). Such graphs are referred to as bipartite. On the other hand, the inequality can be strict for arbitrary graphs, as a cycle on three vertices makes clear. Here, a maximum matching consists of a single edge, while every minimum vertex cover requires two vertices. 2

3.2.3 Matching

We will conclude Section 3.2 with a brief description of what is often considered to be one of the major "success stories" in discrete optimization. This accolade generally follows because, for the most part, the outcomes in this field are fairly gloomy from a computational viewpoint; as indicated in the introduction, most discrete optimization problems are formally difficult. Accordingly, the only general-purpose strategies are ones that consume prohibitive amounts of time. Conversely, those problems that are solved by so-called fast algorithms tend to involve special cases or otherwise simple problems in general. The case of finding maximum matchings in arbitrary graphs is one of the best-known departures from this convention. In this final section, we will briefly sketch a rather famous procedure for solving the problem on any graph.

Consider the graph in Figure 7a. Now, let us select an arbitrary matching in the graph, such as the one depicted by the bold edges in 7b. Obviously, if every vertex in the graph had been incident to an edge in this matching then, trivially, the latter would be optimal. Given that this is not the case, let us see if there exists a matching in the graph that is larger than the one shown in bold.

Figure 7: Matchings


Identify any vertex in the graph of Figure 7b that is not saturated by an edge in the current matching, i.e., is not incident to any matching edge. Now, starting with this vertex, create a path of edges that alternates between ones in the current matching and ones that are not. This is referred to as an alternating path. If such a path exists between a pair of distinct unsaturated vertices, then it is called an augmenting path, where the name derives from the easy observation that we can improve our old matching by replacing it with one that has a larger number of edges, i.e., we can "augment" our previous matching. To see this, simply consider the edges on the augmenting path and replace the old matching edges with the previously unmatched edges in the path. Since an augmenting path has an odd number of edges, this switch results in a new matching that is one edge larger than the previous matching. To illustrate, we see that there is an augmenting path between the vertices labeled x and y in Figure 7b, and upon effecting the aforementioned switch, the new matching shown in part (c) results. We would then repeat this process on the new matching, continuing until we reach a matching where no more augmentation is possible (the matching in 7c is augmentable via a path between x and y). Figure 8 exhibits a second example.

Now, it turns out that this simple process is really all that one needs. In fact, the graph theorist Claude Berge proved that a matching in a graph G is optimal if and only if there exists no augmenting path in the graph. For example, the matching in Figure 8c is optimal for the graph shown.


Figure 8: Matching updates

So why is this easy notion particularly special? All we do is start with some arbitrary matching in a graph and look for an augmenting path which, if found, improves our matching and which, if nonexistent, tells us by Berge's theorem that we are done. The answer lies in the fact that, while not so evident with small illustrations, this augmenting path search requires substantial attention to details; otherwise, it can be very inefficient. A look at the graph in Figure 9 provides a glimpse of the issue. If one starts the search for an augmenting path, it will quickly be found that some of our alternating paths "fold back on themselves." This occurs in the presence of cycles in a graph that have an odd number of edges. That is, the first and last unsaturated vertices are the same, and trying to effectively monitor and/or prohibit these paths can pose severe computational problems. Unfortunately, a full appreciation of this phenomenon is beyond the scope of our treatment here. Nonetheless, it was not until the seminal results of the mathematician Jack Edmonds (1965) that an efficient search scheme among alternating paths in general was found and made formal.


Figure 9: Matching in arbitrary graphs

Discussion: The computational issues regarding the handling of alternating path searches arise only for arbitrary graphs. If we are trying to find a maximum matching in a graph that has no odd cycles, that is, for the special case of bipartite graphs, then we will not encounter difficulties, since none of the troublesome alternating paths can occur.
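In the bipartite case, the augmenting-path search sketched above can be written in a few lines; the following is our own illustrative code (not the notes'), growing a matching one augmenting path at a time via depth-first search:

```python
def max_bipartite_matching(adj):
    """adj maps each left vertex to its list of right neighbors.
    Returns a dict matching each matched right vertex to a left vertex."""
    match = {}                       # right vertex -> left vertex

    def augment(u, visited):
        # Look for an alternating path from left vertex u that ends
        # at an unmatched right vertex (i.e., an augmenting path).
        for w in adj[u]:
            if w in visited:
                continue
            visited.add(w)
            if w not in match or augment(match[w], visited):
                match[w] = u         # flip the edges along the path
                return True
        return False

    for u in adj:
        augment(u, set())
    return match
```

Since the graph is bipartite, no alternating path can fold back on itself, so this simple search is enough; it is precisely the general-graph case that required Edmonds' more careful scheme.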

But what about just solving the matching formulation PM? Accordingly, suppose we create the explicit formulation for the graph in Figure 10a. We would have:

max x1,2 + x2,3 + x3,4 + x4,1

s.t. x1,2 + x4,1 ≤ 1
x1,2 + x2,3 ≤ 1
x2,3 + x3,4 ≤ 1
x3,4 + x4,1 ≤ 1
x1,2, x2,3, x3,4, x4,1 = 0 or 1.

Now, suppose we relax the 0-1 restriction on the variables by replacing each with 0 ≤ xi,j ≤ 1 and solve the resultant linear program. If we do this, we would produce one of two solutions: x1,2 = x3,4 = 1, x2,3 = x4,1 = 0, or x1,2 = x3,4 = 0, x2,3 = x4,1 = 1. Both are obviously matchings consisting of two edges, which, of course, is maximum. This could be a good sign.

Indeed, if solving the linear programming relaxation of PM works, the prior concern regarding alternating path searches could be rendered moot. To get a little more confidence before any claims are made, however, let us consider another graph, say the one in part (b) of Figure 10. Formulating the linear relaxation of the corresponding PM model for the 3-cycle shown yields:


max x1,2 + x2,3 + x3,1

s.t. x1,2 + x3,1 ≤ 1
x1,2 + x2,3 ≤ 1
x2,3 + x3,1 ≤ 1
0 ≤ x1,2, x2,3, x3,1 ≤ 1.

Figure 10: Illustrations for PM formulation

Upon solving the LP relaxation, we obtain x1,2 = x2,3 = x3,1 = 1/2 with a value of 3/2, which, although a legitimate LP solution, is nonsense relative to matchings. In fact, if one were to repeat these LP calculations on the corresponding formulation for any arbitrary graph, there is a very good chance that fractional-valued variables would result; if they do, then this has to occur in the presence of odd cycles (although the converse is not necessarily true, i.e., odd cycles do not necessarily induce fractional-valued variables). As an aside, if fractions do result, they are always 1/2. Put more formally, the variables in every extreme-point solution to the linear relaxation of PM (for any graph) take on values 0, 1, or 1/2.

But suppose we had some constraints that would disallow these fractional-valued variables from occurring while at the same time not eliminating any of the other, legitimate 0-1 valued outcomes. For example, suppose we add to the LP relaxation above the single constraint x1,2 + x2,3 + x3,1 ≤ 1. In doing this, and upon solving the LP, it will be found that an optimal solution results with exactly one of the three variables (any one) taking on the value 1 while the other two are 0. This produces an outcome that is relevant as a maximum matching in the given graph. But this is only a single example; have we done anything that generalizes?
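Both claims can be checked by brute force on this tiny example. Since, by the half-integrality property just stated, the extreme points have coordinates in {0, 1/2, 1}, enumerating that grid (our own sketch) recovers the fractional optimum 3/2 and shows that the added constraint reduces it to 1:

```python
from itertools import product
from fractions import Fraction

half = Fraction(1, 2)
candidates = list(product([Fraction(0), half, Fraction(1)], repeat=3))

def best(with_odd_set_cut):
    """Max of x12 + x23 + x31 over grid points feasible for the
    3-cycle relaxation, optionally adding x12 + x23 + x31 <= 1."""
    feasible = []
    for x12, x23, x31 in candidates:
        # degree constraints of the 3-cycle relaxation
        ok = (x12 + x31 <= 1) and (x12 + x23 <= 1) and (x23 + x31 <= 1)
        if with_odd_set_cut:
            ok = ok and (x12 + x23 + x31 <= 1)
        if ok:
            feasible.append(x12 + x23 + x31)
    return max(feasible)
```

Without the extra constraint the maximum is 3/2, attained at the all-halves point; with it, the maximum drops to 1, matching the true maximum matching.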

In fact, what we have just done above is apply what amounts to a complete resolution of the odd-cycle "problem" vis-à-vis the solution to the LP relaxation of PM. That is, for any graph G, we form the linear relaxation of its relevant PM and to this add the following constraints:

∑_{(i,j)∈ES} xi,j ≤ ⌊(|S| − 1)/2⌋ for all odd-cardinality vertex sets S ⊆ V.

Note that we use ES to denote the set of edges of G induced by the odd vertex set S. Clearly, the constraint in our 3-cycle example follows by letting S = V = {1, 2, 3}. In fact, it is true that adding these so-called odd-set constraints will always produce an LP that solves with integer values that are 0 or 1, i.e., the new constraints simply cut off the unwanted, fractional extreme points without affecting any of the others.

But does this really mean that we have actually found a way to get around the intimated complication of finding maximum matchings in graphs that have lots of odd cycles? Unfortunately, the answer is no, and the reason should by now be evident. While it is indeed true that the odd-set constraints will "work," this turns out to be of theoretical importance but less than helpful from a practical perspective. The "glitch" is that there are simply entirely too many of these constraints, i.e., there are exponentially many odd sets S from which such constraints could be formed. Of course, many of these might not be relevant, but the problem is that we cannot tell ahead of time which to use and which to disregard. 2

4 Chinese Postman Problem

In this section, we will describe and demonstrate a general solution for one of the more famous problems in discrete optimization. In fact, a variant of the problem was actually given at the outset by Example 5. Below, we state the fundamental version:

Given a graph G = (V, E) where edges are assigned nonnegative integer weights, find a least weight traversal in G that begins at some vertex, say v, includes each edge in G at least once, and returns to v.

This is referred to as the Chinese Postman’s Problem, the name arising since it seems to have been first posed by the mathematician Kwan and because it evokes a sense of the sort of tour that a postman might pursue by moving up and down the streets of a neighborhood. The first algorithms proposed for solving the basic problem, while clever, were not efficient. In 1965, and as part of his work on the matching problem, Edmonds provided a general solution for the Postman’s problem that is formally efficient and


that will work for any undirected, connected graph (with nonnegative edge weights). Not surprisingly, Edmonds’ approach relies heavily on his results dealing with matchings. Following, we state the procedure:

Algorithm for the Chinese Postman’s Problem
Step 0: Let G be the input graph and denote the set of odd-degree vertices in G by VO.
Step 1: Determine a shortest path in G (and the respective lengths) between all pairs of vertices in VO.
Step 2: Form a complete graph on vertex set VO with edges weighted by the respective shortest path lengths from Step 1. Find a least weight perfect matching in K|VO| and duplicate the implied edges in G. Call this supergraph G′.
Step 3: Produce the existing Eulerian traversal in G′. □
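Steps 0 through 2 can be sketched compactly in code. The following Python sketch is illustrative only: the small example graph is our own, and the brute-force pairing routine stands in for Edmonds’ polynomial-time matching algorithm, so it is sensible only when VO is small.

```python
def shortest_paths(n, edges):
    """All-pairs shortest path lengths via Floyd-Warshall (Step 1)."""
    INF = float("inf")
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    return d

def min_weight_pairing(verts, d):
    """Brute-force minimum weight perfect matching on an even-size list
    of vertices (a stand-in for Edmonds' algorithm)."""
    if not verts:
        return 0
    v, rest = verts[0], verts[1:]
    return min(d[v][rest[i]] + min_weight_pairing(rest[:i] + rest[i + 1:], d)
               for i in range(len(rest)))

def postman_cost(n, edges):
    """Total tour cost: every edge once, plus the cheapest duplication
    that pairs up the odd-degree vertices (Steps 0-2)."""
    deg = [0] * n
    for u, v, w in edges:
        deg[u] += 1
        deg[v] += 1
    odd = [v for v in range(n) if deg[v] % 2 == 1]
    return sum(w for _, _, w in edges) + \
        min_weight_pairing(odd, shortest_paths(n, edges))

# A 4-cycle plus one diagonal, all edges of weight 1: vertices 0 and 2
# have odd degree, the cheapest fix duplicates the diagonal (0,2),
# so the postman tour costs 5 + 1 = 6.
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (0, 2, 1)]
assert postman_cost(4, edges) == 6
```

The duplication cost returned by the matching is exactly the price of the backtracking discussed below.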

Recall that an Eulerian traversal in any connected graph is a walk that begins and ends at the same vertex and that includes each edge in the graph exactly once. In 1736, Euler established that any connected graph with even degree at every vertex admits such a traversal; indeed, such graphs are referred to as Eulerian. In any event, if the input graph G for our Chinese Postman’s instance is already Eulerian, then we are done of course; the existing Eulerian traversal most surely is a least weight postman traversal since each edge is included exactly once. On the other hand, if it is not, then what the algorithm above is actually doing is finding a least weight duplication of edges on our input graph that makes it (in the form of G′) Eulerian. Obviously, in terms of a traversal in the real graph G, a duplicate edge (as part of G′) implies backtracking on the real edge of G.
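Producing the Eulerian traversal required in Step 3 is itself easy; Hierholzer’s classical stack-based construction finds one in time linear in the number of edges. A minimal sketch follows (the edge-list multigraph representation and the sample 4-cycle are our own choices; the graph is assumed connected with all degrees even):

```python
from collections import defaultdict

def eulerian_circuit(edges):
    """Hierholzer's algorithm. `edges` is a list of undirected pairs;
    duplicated edges may appear as repeated entries."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    stack, circuit = [edges[0][0]], []
    while stack:
        v = stack[-1]
        if adj[v]:
            w = adj[v].pop()   # consume one copy of edge (v, w)...
            adj[w].remove(v)   # ...and its reverse copy
            stack.append(w)
        else:
            circuit.append(stack.pop())
    return circuit

walk = eulerian_circuit([(0, 1), (1, 2), (2, 3), (3, 0)])
assert walk[0] == walk[-1] and len(walk) == 5  # 4 edges, each used once
```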

On a final interpretive note, Step 2 above refers to a perfect matching. This is simply a matching that saturates every vertex. Obviously, arbitrary graphs need not necessarily have perfect matchings; however, all complete graphs on an even number of vertices do. Here, we want a perfect matching of minimum total weight, and this is where Edmonds’ seminal work comes into play.

We will conclude with a demonstration of the algorithm, but we will do so while at the same time showing how to solve the problem of Example 5. The latter was given in Section 1 and certainly looks almost like our classic Postman problem with the exception that the refuse pickup vehicle begins and ends at different points. Is this difference fatal or is there some


way to deal with it? Fortunately, the answer is yes. The trick is to simply add an artificial vertex d to the graph representing the neighborhood shown in Figure 2 and connect it to the specified vertices x and y. Then assign some very large weight to these two artificial edges, i.e., large enough so that they would never be included in any shortest paths. Now, if this new graph is denoted by G, we simply apply the algorithm above to G and find the Eulerian traversal (called for in Step 3) in the resultant G′. If this traversal is denoted by the vertex sequence {d, x, . . . , y, d}, then simply remove vertex d and its incident edges and we are left with the desired traversal between x and y in the graph depicting the neighborhood. Below, we demonstrate the algorithm explicitly.

The graph in Figure 11a is the prior structure (Figure 2) with edges weighted and dummy vertex d added as shown. Now, we have VO = {x, 2, 6, 7} and the respective shortest paths and their values relative to all six pairs of vertices in VO are given in Figure 11b. A minimum weight matching pairs the shortest paths between x and 2 and between 6 and 7. The corresponding edges are duplicated and G′ results as indicated in Figure 11c. An Eulerian traversal in G′ is given by the vertex sequence

{d, x, 1, 2, 1, x, 2, 3, x, 6, 5, 3, 7, 2, 4, 7, 5, y, 7, y, 6, y, d}.

Removing vertex d and its incident edges leaves the desired traversal as required by the problem statement in Example 5. □


Figure 11: Chinese Postman’s Example

Discussion: To novices, the Chinese Postman’s Problem is often considered more difficult than the TSP. Although not important per se, this reaction usually arises from the sense that a salesman has to visit every city but has many options (we assume a complete graph) and hence some flexibility relative to his/her choices, whereas the postman has to adhere, in a fairly rigid way, to the structure of the graph and, accordingly, the chances of getting “painted into a corner” are greater. In any event, what the postman is really trying to do is minimize the amount of backtracking that is necessary, assuming that the graph is not Eulerian (which would require no backtracking). Cast in this context, it is remarkable that a total solution is known. Unfortunately, revealing the full beauty of Edmonds’ work is beyond the scope of this document.

It should not go unmentioned that in applying Edmonds’ solution, a good


shortest path algorithm is needed in Step 1; fortunately this is not a problem since a host of options are available. Also, we need some way to produce the Eulerian traversal in G′. Again, this requires an algorithm, but the task is a pretty simple one. Finally, and related to the shortest path issue, readers may wonder why the input specification for the Chinese Postman’s Problem called for nonnegative edge weights. The answer is a bit subtle in that if we were to allow arbitrary edge weights, which of course includes negative values, then we cast the entire problem into the feared NP-Complete category that has been mentioned before. Indeed, we know of no way to find (efficiently) shortest paths in graphs where negative cycles exist, i.e., cycles having total edge weight less than zero.

We conclude by pointing out that there is also a version of the Chinese Postman’s Problem for graphs that are directed. On the surface, this would seem to make the problem substantially more difficult. Happily, this is not so, and the relevant procedure, while a little more involved, is essentially the same as for the undirected case (c.f., Parker (1995)). On the other hand, if our input graph is mixed, with some directed and some undirected edges, then the resulting Postman’s Problem is very hard; it is in fact NP-Complete.

5 Approximation Procedures

Sometimes we are simply faced with a problem that is sufficiently complicated, seems free of any special structure, and typically involves instances that are beyond our available computational resources. While we might relate all of this to our boss and even achieve some measure of sympathy, especially if he/she understands a little complexity theory, it is the case that the problem does not go away, it remains important, and some sort of “solution” has to be produced accordingly.

Under these circumstances, which occur fairly routinely in practice, we simply have to be less ambitious. No longer will we be able to employ procedures that guarantee optimal solutions; rather, we will have to settle for approximate outcomes produced by heuristics. These are approaches that are guided by various rules-of-thumb, sometimes very clever rules, and often ones that actually perform quite well. The difficulty is that we cannot assure that they will always perform well. In fact, sometimes a generally reliable heuristic can produce solutions that are substantially distant from an optimal solution. Given such a state of affairs, it would be comforting to be able to at least bound the damage that a heuristic could cause. That is, if for some problem P we denote the optimal solution value of P (relative to any instance) as z∗P and the value produced by a heuristic A as z∗A, then a finite bound on the ratio z∗A/z∗P would suggest a worst-case performance guarantee on the behavior of the heuristic A. If P is a minimization problem, then a bound of 2 would suggest that, in the worst case, the heuristically generated solution could never be more than twice as bad as the optimal (never off by more than 100% . . . ever); a bound of 5/4 misses by no more than 25%, etc. Naturally, the aim is to have a worst-case bound that is as close to 1 as possible.

On a positive note, heuristics are usually very fast; obviously, there would be little use for a procedure that could not guarantee optimality but consumed effort only marginally different from one that could. Following, we take up three generally hard problems, two of which have been introduced earlier in this document. For each we will describe a popular heuristic and then look briefly at some issues that its particular application raises.

5.1 Bin-Packing

The classic bin-packing problem seeks an assignment or packing of a finite set of “chips,” each with some positive weight wj, into the fewest number of “bins,” each having some finite capacity C. This is a very difficult problem and, in that regard, one that has attracted much attention with respect to the use of heuristics. One of the best is the so-called first-fit, decreasing weight heuristic, sometimes indicated as FFD:
Create a list L of chips arranged in nonincreasing weight-order. Select chips from L in this order, placing a given selection in the first available bin into which it will fit. □

For example, consider a list of 10 chips which, for ease, we will simply identify by their weight order and that have already been placed in L as {44, 24, 24, 22, 21, 17, 8, 8, 6, 6}. If the bins have capacity C = 60, then it is easy to apply FFD and the three-bin packing from Figure 12 results. Obviously, this packing is optimal since we know that at least three bins must be used in this instance, i.e., at least ⌈(∑j wj)/C⌉ bins are required for any instance.


Figure 12: FFD when C = 60.

Suppose that we reapply FFD to this same instance but with one minor alteration: increase C to 61. Interestingly, a correct application of the heuristic produces the packing shown in Figure 13, and that requires four bins! Anomalously, we have increased the bin capacity for packing the same chips and yet we require more bins to do so. Apparently, heuristics can not only perform badly (i.e., miss the optimal), but they can be devious as well.

Figure 13: FFD when C = 61.
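The anomaly is easy to reproduce. Below is a minimal FFD sketch in Python (the list-of-lists representation of bins is our own choice); run on the same ten chips, it packs three bins at capacity 60 and four at capacity 61, just as Figures 12 and 13 show:

```python
def first_fit_decreasing(weights, capacity):
    """Pack chips in nonincreasing weight order, each into the first
    existing bin that still has room."""
    bins = []
    for w in sorted(weights, reverse=True):
        for b in bins:
            if sum(b) + w <= capacity:
                b.append(w)
                break
        else:                      # no existing bin fits: open a new one
            bins.append([w])
    return bins

chips = [44, 24, 24, 22, 21, 17, 8, 8, 6, 6]
assert len(first_fit_decreasing(chips, 60)) == 3  # the packing of Figure 12
assert len(first_fit_decreasing(chips, 61)) == 4  # larger bins, more of them!
```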


Of course, the counterintuitive outcome above needs to be placed in perspective. First, students should not miss the point here; to be sure, if the ten chips can be packed into three bins of capacity C, then they most certainly can be packed into three bins of any capacity C′ ≥ C. At issue is that the heuristic rule-of-thumb was simply “tricked” in that it forced us to act in a hasty fashion by placing a chip in a position that ultimately penalized us at a later point. The heuristic was not sensitive enough to recognize this (none would be) and, once a chip placement was fixed, there is no recourse. Looking for ways to “undo” bad heuristic decisions is a very risky undertaking because this can lead to computational demands that rival an optimization algorithm which, if employed, would have avoided the anomalies in the first place.

But how does this FFD heuristic behave in practice? Perhaps “trick” outcomes like the one demonstrated above are rare and, even so, as instances grow very large, maybe their effect is mitigated. Indeed, this tends to be the case. In fact, it is known that the FFD heuristic for bin-packing actually performs quite well. Generally, as instances of bin-packing grow large (several hundred chips), it has been shown that one will never use more than 11/9 of the optimal number of bins, i.e., no more than about 22% beyond the minimum.

Before we move on, we should deal with what might be construed as an inconsistency between the 11/9 statement above and the example just given. Regarding the latter, we produced an instance where FFD yielded an outcome requiring 4 bins whereas the optimal solution is to use only 3. Should we be concerned that this produces a ratio of 4/3, which obviously is greater than 11/9? Not at all. Observe that our claim earlier was made regarding very large instances; certainly the one upon which our anomaly was based is very small. In fact, the 11/9 ratio referred to for FFD is an asymptotic result. The formal performance bound on FFD has an additional term, but as instances grow, the effect of this term tends to zero.

5.2 Vertex Cover

Recall that for a graph G = (V, E), a minimum vertex cover is a smallest subset C ⊆ V having the property that every edge in E is incident to at least one vertex in C. As we indicated earlier, finding a minimum vertex cover in arbitrary graphs is hard, and so if we do not have the luxury of dealing with any known special cases such as bipartite graphs, then the problem is


a candidate for the application of a heuristic procedure. Here’s one:
Find a vertex of largest degree and put this vertex in a cover. Remove the vertex and all of its incident edges; repeat on the resulting graph, continuing until all edges are removed, i.e., covered. □
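This largest-degree rule can be sketched directly. In the Python sketch below, the edge-list representation and the tie-breaking (first vertex encountered among those of largest degree) are our own assumptions; the heuristic as stated leaves ties unspecified:

```python
from collections import Counter

def greedy_vertex_cover(edges):
    """Repeatedly pick a vertex of largest remaining degree, add it to
    the cover, and delete all edges it touches."""
    remaining = list(edges)
    cover = []
    while remaining:
        deg = Counter()
        for u, v in remaining:
            deg[u] += 1
            deg[v] += 1
        pick = max(deg, key=deg.get)      # a vertex of largest degree
        cover.append(pick)
        remaining = [(u, v) for u, v in remaining if pick not in (u, v)]
    return cover

# On a star, the center alone covers every edge.
assert greedy_vertex_cover([(0, 1), (0, 2), (0, 3)]) == [0]
```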

We have not been too formal in the statement of the procedure, but its application is simple and should be clear. Nonetheless, the mentality of the scheme is fairly rational: vertices of high degree are likely to be in optimal solutions because they cover the greatest number of edges, at least in a local sense. For example, if we apply this strategy to the graph in Figure 14, a cover consisting of the vertices in bold results, which happens to be optimal. Note that rather than actually removing edges and evaluating new vertex degrees (as the algorithm would indeed require), we have simply numbered the selected vertices as they would have been selected iteratively.

Figure 14: Vertex cover application

On the other hand, if we try our hand on the instance given in Figure 15, a perfectly correct application of the heuristic would produce the cover consisting of the a and c vertices. Obviously, a much smaller cover would be the one consisting only of the b vertices. In fact, our heuristic has not done very well. Somewhat discouragingly, it is possible to create instances that will force this heuristic to perform as badly as we want; that is, ones that produce solutions arbitrarily distant from the optimal one. In our worst-case bound parlance, we would say that this heuristic is not bounded by any constant.


Figure 15: Bad vertex cover application

5.3 Traveling Salesman Problem

When it was indicated earlier that the traveling salesman problem was one of the most celebrated, indeed one of the most studied problems in discrete optimization, the claim was made not only from the perspective of the application of optimization procedures. The literature on the TSP is in fact flooded with heuristic approaches and their concomitant experimental and theoretical evaluations. In this concluding section we will examine a couple of interesting notions regarding the latter.

First of all, if one is given a proposed TSP heuristic, even a good one, it is generally pretty easy to trick it. This is a cynical view of course, and indeed, most clever heuristics are pretty reliable in that they will perform quite well on “everyday” instances (whatever these are). On the other hand, one is never certain when an instance that is nonstandard will be presented; if the heuristic is fool-proof then this is of no consequence, but otherwise its application could be disastrous.

If we were to evaluate TSP heuristics on the basis of their worst-case performance bounds, the one at the top of the list (i.e., smallest known constant bound) is a strategy that looks remarkably similar to our solution for the Chinese Postman’s problem. Due to Nicos Christofides (1976), the


procedure can be stated as follows:

Christofides’ Heuristic for the TSP
The input graph is given by G = (V, E) = Kn where edges are weighted with nonnegative integers.
Step 1: Find a minimum weight spanning tree in G. Let the tree be denoted by T.
Step 2: Let the odd-degree vertices in the tree of Step 1 be denoted by VO and find a minimum weight perfect matching in the subgraph of Kn induced by VO. Let this matching be denoted by M ⊆ E.
Step 3: The graph formed as M ∪ T is Eulerian. Produce the corresponding (Eulerian) traversal and, interpreting it as a vertex sequence, form a TSP tour by beginning at the initial vertex and proceeding in order, “shortcutting” past duplicated vertices until the starting vertex is reached again. □

An illustration of the Christofides heuristic is given in Figure 16. We take as our input graph K6 with edge weights given by C as shown, e.g., w5,2 = 1, w3,2 = 2, etc. The three steps are applied with the outcomes shown; the computations/constructions should be evident.

Now, if we denote the tour value produced by this procedure as z∗C and the optimal tour length by z∗, then it is known that

z∗C/z∗ ≤ 3/2.

However, if one were to pick a totally random set of edge weights for some n-city instance, it is entirely conceivable that the Christofides algorithm would be much worse than 50% greater than the optimal solution value. How could this be, in light of the 3/2 claim just made?


Figure 16: Application of Christofides’ TSP heuristic

The answer is that the heuristic described, and that is indeed the best known in terms of a performance guarantee, must be applied to instances where the triangle-inequality holds on edge weights in order that the validity of the bound is established. This metric requires that edge weights wi,j satisfy the following relationship:

wi,j ≤ wi,k + wk,j for all triples of vertices (i, j, k) in V.
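Checking the metric condition on a given weight matrix is straightforward; a sketch in Python (the function name and the two small sample matrices are our own):

```python
from itertools import permutations

def obeys_triangle_inequality(w):
    """w is a symmetric n-by-n weight matrix with zero diagonal; check
    w[i][j] <= w[i][k] + w[k][j] over all triples of distinct vertices."""
    n = len(w)
    return all(w[i][j] <= w[i][k] + w[k][j]
               for i, j, k in permutations(range(n), 3))

assert obeys_triangle_inequality([[0, 1, 2], [1, 0, 1], [2, 1, 0]])
# The weight-5 edge exceeds the length-2 detour through the middle vertex:
assert not obeys_triangle_inequality([[0, 1, 5], [1, 0, 1], [5, 1, 0]])
```

Note that weights drawn only from {1, 2} always satisfy the condition, which is why Exercise 14 later suggests them.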

Obviously, we can apply the procedure to any instance, but without the stated edge weights we can no longer guarantee the bound. What the triangle-inequality establishes is something of a safeguard against catastrophic behaviour. Conversely, if completely arbitrary edge weights are allowed, then we can always produce a graph and a set of weights that will force any heuristic to be painted into the proverbial corner where the price to be paid is to incur an unavoidably large edge weight, which in turn makes our heuristically generated solution as bad as we want.


This condition can be made even more stark. To date, there is no known TSP heuristic applicable to instances having arbitrary edge weights that even has a finite worst-case performance guarantee. What’s more, the prospects are not at all bright that this state of affairs will ever change (c.f., Parker and Rardin (1988)).

On a final note, it is interesting that there is no known heuristic that bests the Christofides procedure in terms of a constant bound; the 3/2 ratio remains the best known. However, there is also no proof suggesting that such a procedure does not exist. Trying to resolve this issue would make a nice homework problem indeed.

6 Final Comments

The purpose of this document is to provide an elementary introduction to the subject of discrete optimization. Accordingly, we have included “bits and pieces” of a variety of topics that are representative of the field. In this regard, we have only scratched the surface of what is a subject rich in extremely challenging problems, worthy of study from a fundamental perspective and most certainly from that of a decision-maker facing complex problems that arise in business and industry on a daily basis.

Students seeking to learn more about the basic methodological subject matter in the field can expect to find relevant coursework in operations research, computer science, industrial and systems engineering, and related disciplines in the mathematical sciences. To be successful, in fact to even appreciate the myriad nuances and subtleties that permeate the subject, a student needs to be comfortable with abstraction and must understand the need to apply sophisticated computational tools in order to resolve what are often overwhelmingly cumbersome problems.

Perhaps most important, and an attribute that is relevant in mathematical optimization in general but especially so in the discrete setting, students must be prepared to exercise maturity in both recognizing and dealing with discrete optimization problems. Some of the most interesting (and difficult) outcomes have been motivated by real-world settings in areas such as manufacturing, production, telecommunications, and logistics. Stripped of their “practical” context, many of these problems generate deep theoretical interest in their own right; couched in their real-world settings, they create


challenges that, if treated effectively, often yield substantial payoffs.

7 Exercises

Following, we give some exercises; some of these are routine applications of algorithms while others are more thought-provoking.

1. Construct mathematical models for Examples 1, 4, and 5.

2. The vertex coloring problem on graphs asks for the smallest number of colors that can be assigned to the vertices such that no pair of adjacent vertices have the same color. Write a mathematical model for solving this problem.

3. Solve the following problem by branch-and-bound:

max 6x1 + 3x2

s.t. 3x1 + 2x2 ≤ 18
2x1 + 6x2 ≤ 15
x1 ≤ 4
x1, x2 nonnegative integers.

4. Using the branch-and-bound procedure demonstrated, solve the 6-city TSP having cost matrix below:

5. It was suggested that when solving assignment relaxations, one should not waste time trying to test if a complete tour exists when multiple assignment solutions exist. Give a formal argument that provides credibility to this claim.

6. A TSP instance is said to be symmetric if ci,j = cj,i for all pairs (i, j), i ≠ j. Otherwise, it is asymmetric. Now, when applying the branch-and-bound procedure provided in Section 3.1.2, would you expect the same computational demands regardless of which case was being solved? Explain.


7. Suppose we have a bottleneck TSP where our aim is to minimize the longest single edge weight in a tour. Show that an optimal solution to our conventional, min-sum version of the TSP need not also be optimal under this min-max criterion. Argue that the two problems are equivalent in difficulty.

8. A matching M in a graph G is said to be maximal if there is no larger matching in G that properly contains M. Now, consider the following heuristic for the vertex covering problem on a graph G: find any arbitrary maximal matching in G and let the vertices defining the edges in this matching constitute a cover. First, does the procedure always produce an admissible vertex cover? Second, establish a performance bound on its application.

9. Solve the maximum matching problem on the graph given below; start with the matching indicated in bold.

10. Given a weighted, complete graph Kn, finding a minimum weight spanning tree in Kn is easy. Suppose now that we ask for a degree-constrained minimum weight spanning tree, i.e., for some fixed k ≤ n − 1, find a minimum weight spanning tree having no vertex of degree in excess of k. First, give an argument that there is likely to be no fast, “direct approach” to this problem and then devise a branch-and-bound approach to solve it.

11. Consider the weighted graph below and solve the Chinese Postman’s Problem accordingly.


12. Prove that a correct application of the Chinese Postman’s algorithm will never force the postman to backtrack down a street more than once . . . or give a counterexample to the claim.

13. Apply the FFD algorithm for bin-packing to the 10-chip instance with chip weights as follows and for bins of capacity 11: {7, 2, 5, 3, 6, 5, 3, 1, 9, 3}.

14. Apply the Christofides heuristic to a TSP instance (of your choosing); randomly assign 1’s and 2’s to the edges in order to satisfy the triangle-inequality.

15. Create an instance of a TSP that makes the Christofides procedure perform close to its performance bound of 3/2. Make sure your instance satisfies the triangle-inequality.

16. A dominating set in a graph is a subset of vertices D with the property that every vertex in the graph is adjacent to at least one vertex in D. Describe how (if at all) this problem is related to the vertex cover problem. Is it easier than the latter?

17. In its abstract form, the Chinese Postman’s Problem looks to add edges, in some least cost fashion, to a non-Eulerian graph in order to make it Eulerian. Consider the opposite notion: given some non-Eulerian graph, remove the least costly number of edges in order to leave a subgraph that is Eulerian. What is your sense of the relative “difficulty” of this version, as compared to the standard one that was treated earlier? Explain.

18. Suppose an instance of the bin-packing problem has bins of capacity C and chips having weight wj > ⌈C/3⌉. Describe a “direct method” (in the sense described in Section 3.2) for solving the instance.

8 References

We list here some general references that interested students might find beneficial, especially in examining details of specific results that were mentioned throughout, i.e., matching problems, the Chinese Postman’s problem, etc. Beyond this, these references, while fairly advanced, can still serve to acquaint even beginners with the richness of this field; and of course, students who do want to dig deeper will be easily guided by the extensive reference lists contained in each.

Ahuja, R. K., Magnanti, T. L., and Orlin, J. B. (1993) Network Flows, Prentice-Hall, Englewood Cliffs, NJ.


Lawler, E. L. (1976) Combinatorial Optimization: Networks and Matroids, Holt, Rinehart and Winston, New York.

Lawler, E. L., Lenstra, J. K., Rinnooy Kan, A. H. G., and Shmoys, D. B. (eds) (1985) The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization, John Wiley, Chichester.

Nemhauser, G. L. and Wolsey, L. (1988) Integer and Combinatorial Optimization, John Wiley, New York.

Papadimitriou, C. H. and Steiglitz, K. (1982) Combinatorial Optimization: Algorithms and Complexity, Prentice-Hall, Englewood Cliffs, NJ.

Parker, R. G. (1995) Deterministic Scheduling Theory, Chapman & Hall, London.

Parker, R. G. and Rardin, R. L. (1988) Discrete Optimization, Academic Press, New York.
