BoxRouter: A New Global Router Based on Box Expansion and...

14
2130 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007 BoxRouter: A New Global Router Based on Box Expansion and Progressive ILP Minsik Cho and David Z. Pan, Senior Member, IEEE Abstract—In this paper, we propose a new global router, BoxRouter, powered by the concept of box expansion, progressive integer linear programming (PILP), and adaptive maze routing (AMR). BoxRouter first uses a simple PreRouting strategy to predict and capture the most congested region with high fidelity as compared to the final routing. Based on progressive box expansion initiated from the most congested region, BoxRouting is performed with PILP and AMR. Our PILP is shown to be much more efficient than the traditional ILP in terms of speed and quality, and the AMR based on multisource multitarget with bridge model is effective in minimizing the congestion and wirelength. It is followed by an effective PostRouting step, which reroutes without rip-up to enhance the routing solution further and obtain smooth tradeoff between wirelength and routability. Our experimental results show that the BoxRouter significantly outperforms the state-of-the-art published global routers, e.g., 91% better routabil- ity than Labyrinth (with 14% less wirelength and 3.3× speedup), 79% better routability than Chi-dispersion router (with similar wirelength and 2× speedup), and 4.2% less wirelength and 16× speedup than a multicommodity flow-based router (with simi- lar routability). Additional enhancement in box expansion and PostRouting further improves the result with similar wirelength but much better routability than the latest work in global routing. Index Terms—Congestion, global routing, integer linear pro- gramming (ILP), physical design, rectilinear minimum Steiner tree (RMST), routability. I. I NTRODUCTION R OUTING is a key stage for very large scale integration (VLSI) physical design. Aggressive technology scaling has led to much smaller/faster devices but more resistive inter- connects and larger coupling capacitance. Since routing directly determines interconnects (wirelength, routability/congestion, and so on) and the overall VLSI system performance [6]–[8], it plays a critical role in the deep-submicrometer- design closure. For nanometer interconnects, the manufactura- bility and variability issues, such as antenna effect, copper chemical–mechanical polishing (CMP), subwavelength print- ability, and yield loss due to random defects, are becoming a growing concern and showing to directly be impacted by wire embedding [8]–[15]. Thus, routing also plays a major role in terms of the manufacturing closure. Manuscript received September 28, 2006; revised January 2, 2007. This work was supported in part by SRC under Contracts 2005-TJ-1321 and 2005-TJ- 1362, by IBM under a Faculty Award, by Fujitsu, by Sun, and by Intel through equipment donations. This paper was recommended by Associate Editor C. J. Alpert. The authors are with the Department of Electrical and Computer En- gineering, University of Texas, Austin, TX 78712 USA (e-mail: thyeros@ cerc.utexas.edu; [email protected]). Digital Object Identifier 10.1109/TCAD.2007.907003 In general, routing consists of two steps: global and detailed routings. While detailed routing finalizes the exact design-rule compatible (DRC) pin-to-pin connections, the global routing, as its name implies, is the routing stage that plans the ap- proximate routing path of each net to reduce the complexity of routing task and guide the detailed router [16]. Thus, it has significant impact on the overall wirelength, routability, and timing [16], [17]. Furthermore, it is also the key stage for optimizing the wire-density distribution to improve the overall manufacturability (e.g., less post-CMP topography variation, less copper erosion/dishing, and less optical interference for better printability [10]–[13]) and yield (e.g., smaller critical area [14], [15]). The importance of global routing in VLSI design flow has led to many works in predicting and estimating routing con- gestion and designing global routers. Probability-based con- gestion prediction for global routing is studied in [18]–[21], and global-router-based congestion estimation is researched in [22] and [23] for early wirelength estimation. Within the scope of over-the-cell global-routing model [2], Burstein and Pelavin [24] proposed a hierarchical approach to speed up integer-programming formulation for global routing, and Kastner et al. [25] proposed a pattern-based global routing. Hadsell and Madden [2] presented Chi-dispersion router based on linear cost function as well as predicted congestion map and showed better results than the study in [25]. The multi- commodity flow-based global router by Albrecht [3] showed good results and was used in industry but at the expense of computational effort. Fast global router [4], [26], [27] can feed more accurate interconnect information (such as wirelength and congestion) back to placement or other early physical synthesis engines for better design convergence and tighter integration. In this paper, a new routability-driven global router, called the BoxRouter, is proposed. BoxRouter first performs a very fast yet effective PreRouting to identify the most congested regions or boxes. Then, it progressively expands the routing box and performs routing within each expanded box (BoxRout- ing) until the entire circuit is covered, i.e., all the wires are routed. Efficient progressive integer linear programming (PILP) is formulated with adaptive maze routing (AMR), and effective PostRouting follows BoxRouting for further enhancement. The major contributions of this paper include the following propo- sitions and observation. 1) We propose a new ILP formulation, which is significantly faster and more scalable than the traditional formulation in [3] and [28], which makes it practical to apply ILP to solve VLSI routing. 0278-0070/$25.00 © 2007 IEEE

Transcript of BoxRouter: A New Global Router Based on Box Expansion and...

Page 1: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2130 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

BoxRouter: A New Global Router Based onBox Expansion and Progressive ILP

Minsik Cho and David Z. Pan, Senior Member, IEEE

Abstract—In this paper, we propose a new global router,BoxRouter, powered by the concept of box expansion, progressiveinteger linear programming (PILP), and adaptive maze routing(AMR). BoxRouter first uses a simple PreRouting strategy topredict and capture the most congested region with high fidelity ascompared to the final routing. Based on progressive box expansioninitiated from the most congested region, BoxRouting is performedwith PILP and AMR. Our PILP is shown to be much moreefficient than the traditional ILP in terms of speed and quality,and the AMR based on multisource multitarget with bridge modelis effective in minimizing the congestion and wirelength. It isfollowed by an effective PostRouting step, which reroutes withoutrip-up to enhance the routing solution further and obtain smoothtradeoff between wirelength and routability. Our experimentalresults show that the BoxRouter significantly outperforms thestate-of-the-art published global routers, e.g., 91% better routabil-ity than Labyrinth (with 14% less wirelength and 3.3× speedup),79% better routability than Chi-dispersion router (with similarwirelength and 2× speedup), and 4.2% less wirelength and 16×speedup than a multicommodity flow-based router (with simi-lar routability). Additional enhancement in box expansion andPostRouting further improves the result with similar wirelengthbut much better routability than the latest work in global routing.

Index Terms—Congestion, global routing, integer linear pro-gramming (ILP), physical design, rectilinear minimum Steinertree (RMST), routability.

I. INTRODUCTION

ROUTING is a key stage for very large scale integration(VLSI) physical design. Aggressive technology scaling

has led to much smaller/faster devices but more resistive inter-connects and larger coupling capacitance. Since routing directlydetermines interconnects (wirelength, routability/congestion,and so on) and the overall VLSI system performance[6]–[8], it plays a critical role in the deep-submicrometer-design closure. For nanometer interconnects, the manufactura-bility and variability issues, such as antenna effect, copperchemical–mechanical polishing (CMP), subwavelength print-ability, and yield loss due to random defects, are becoming agrowing concern and showing to directly be impacted by wireembedding [8]–[15]. Thus, routing also plays a major role interms of the manufacturing closure.

Manuscript received September 28, 2006; revised January 2, 2007. This workwas supported in part by SRC under Contracts 2005-TJ-1321 and 2005-TJ-1362, by IBM under a Faculty Award, by Fujitsu, by Sun, and by Intel throughequipment donations. This paper was recommended by Associate EditorC. J. Alpert.

The authors are with the Department of Electrical and Computer En-gineering, University of Texas, Austin, TX 78712 USA (e-mail: [email protected]; [email protected]).

Digital Object Identifier 10.1109/TCAD.2007.907003

In general, routing consists of two steps: global and detailedroutings. While detailed routing finalizes the exact design-rulecompatible (DRC) pin-to-pin connections, the global routing,as its name implies, is the routing stage that plans the ap-proximate routing path of each net to reduce the complexityof routing task and guide the detailed router [16]. Thus, ithas significant impact on the overall wirelength, routability,and timing [16], [17]. Furthermore, it is also the key stage foroptimizing the wire-density distribution to improve the overallmanufacturability (e.g., less post-CMP topography variation,less copper erosion/dishing, and less optical interference forbetter printability [10]–[13]) and yield (e.g., smaller criticalarea [14], [15]).

The importance of global routing in VLSI design flow hasled to many works in predicting and estimating routing con-gestion and designing global routers. Probability-based con-gestion prediction for global routing is studied in [18]–[21],and global-router-based congestion estimation is researchedin [22] and [23] for early wirelength estimation. Withinthe scope of over-the-cell global-routing model [2], Bursteinand Pelavin [24] proposed a hierarchical approach to speedup integer-programming formulation for global routing, andKastner et al. [25] proposed a pattern-based global routing.Hadsell and Madden [2] presented Chi-dispersion router basedon linear cost function as well as predicted congestion mapand showed better results than the study in [25]. The multi-commodity flow-based global router by Albrecht [3] showedgood results and was used in industry but at the expense ofcomputational effort. Fast global router [4], [26], [27] can feedmore accurate interconnect information (such as wirelength andcongestion) back to placement or other early physical synthesisengines for better design convergence and tighter integration.

In this paper, a new routability-driven global router, calledthe BoxRouter, is proposed. BoxRouter first performs a veryfast yet effective PreRouting to identify the most congestedregions or boxes. Then, it progressively expands the routingbox and performs routing within each expanded box (BoxRout-ing) until the entire circuit is covered, i.e., all the wires arerouted. Efficient progressive integer linear programming (PILP)is formulated with adaptive maze routing (AMR), and effectivePostRouting follows BoxRouting for further enhancement. Themajor contributions of this paper include the following propo-sitions and observation.

1) We propose a new ILP formulation, which is significantlyfaster and more scalable than the traditional formulationin [3] and [28], which makes it practical to apply ILP tosolve VLSI routing.

0278-0070/$25.00 © 2007 IEEE

Page 2: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2131

TABLE INOTATIONS IN THIS PAPER

2) We observe that a simple PreRouting step can capture theoverall congestion and improve runtime.

3) We propose the key BoxRouting idea, which efficientlyutilizes limited routing capacities based on box expansioninitiated from the most congested region estimated byPreRouting. BoxRouting is efficient in terms of routabil-ity, as the wires in the more congested region are routedbefore those in the less congested region.

4) We propose an efficient PILP for BoxRouting. In ourPILP, only wires between two successive boxes are routedwith L-shape patterns. Thus, even with ILP, our runtimeis still much faster than the existing global routers [2],[3], [25].

5) We propose an AMR based on multisource multitargetwith bridge (MMB) model. Our AMR uses differentrouting strategies inside and outside the box such thatroutability can be maximized with minimum wirelengthincrease.

6) We propose an effective PostRouting step, which rerouteswires from the most congested region without rip-up.It is more efficient than the conventional rip-up androute (R&R). It also provides smooth tradeoff betweenwirelength and routability with only a simple parameter.

BoxRouter achieves much better results on the standardISPD98 IBM benchmarks than the results in [1]–[5], thuspushes the state-of-the-art considerably. Due to the fundamentalimportance of global routing in routability, timing, and manu-facturability [29], we believe that it shall have many applica-tions/implications for nanometer designs. Preliminary work ofthe BoxRouter is presented in [30].

The rest of the paper is organized as follows. In Section II,the preliminaries are described. Previous works are surveyedin Section III. Comparison and evaluation of ILP formulationsare presented in Section IV. In Section V, the BoxRouter isproposed. Experimental results are discussed in Section VI.Section VII concludes this paper with future work.

II. PRELIMINARIES

A. Notations

Table I lists the notations used throughout this paper.

B. Global-Routing Model

The global-routing problem can be modeled as a grid graphG(V,E), where each vertex vi represents a rectangular regionof the chip, so-called a global-routing cell (G-cell), and an edgeeij represents the boundary between vi and vj with a givenmaximum routing capacity mij . All the pins are assumed tobe at the center of the corresponding G-cell. Fig. 1 shows howthe chip can be abstracted into a grid graph where mAB = 3. Aglobal routing is to find paths that connect the pins inside theG-cells through G(V,E) for every net.

Fig. 1. Real circuit with netlists can be dissected into multiple grids whichcan be mapped into a graph for global routing with routing capacity on an edge.(a) Real circuit with G-cells. (b) Grid graph for routing.

C. Global-Routing Metrics

The key task of the global router is to maximize the routabil-ity for a successful detailed routing [27]. In addition, wire-length, runtime, and timing are other important metrics for theglobal router.

1) Routability is usually the most important metric forglobal routing. It can be estimated by the number ofoverflows, which indicates that routing demand exceedsthe available routing capacity [25], [27]. In Fig. 1, thenumber of overflow between vA and vB is one, as thereare four routed nets, but mAB = 3. Formal definition ofoverflow can be found in [25].

2) Wirelength is an important metric for placement, aswell as routing. But, it is less important as comparedto routability, as most wires are routed with shortestdistances, thus the total wirelength is, in general, not toofar away from optimum for a reasonable global-routingsolution [27]. However, there can be a huge difference interms of routability between two different global-routingsolutions of similar wirelength.

3) Runtime is also an important consideration, as global-routing links placement and detailed routing. A fastglobal router can feed proper interconnection informationto higher level design flow for better design conver-gence [26].

4) Other objectives such as timing and manufacturabilityare also significant objectives. Since the focus of thispaper is on the core global-routing techniques, they arenot explicitly considered in this paper. However, ourframework can be extended to handle them in the future.

III. PREVIOUS WORKS

A. Congestion Prediction and Estimation

Fast and accurate congestion prediction and estimation areessential techniques for reducing congestion in multiple stagesof physical synthesis. For example, during placement, the cellscan be inflated or the white space can be allocated in thecongested region to reduce the congestion and enhance theroutability [31], [32].

Recent study in congestion prediction includes a numberof probabilistic approaches. Lou et al. [18] decompose a netinto multiple two-pin wires then compute the probabilisticcongestion for each G-cell based on the chance of havingthe two-pin wire routed in the G-cell. Whereas all possibledetour-free paths are assumed with the same probability in [18],

Page 3: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2132 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

Westra et al. [20] only consider the simple L/Z-shape rout-ing based on the observation that one- or two-bend nets aredominant in the real designs. By empirically extracting theoccurrence of L- and Z-shape routings from multiple realindustrial designs, different probability weights are assigned toL- and Z-shape routings. In [23], it is shown that fast global-routing-based congestion estimation can be more accurate thanprobabilistic congestion prediction, as probabilistic approachhighly depends on tools or designs. However, global-routing-based congestion estimation is not exact either. A recent paper[26] claims that congestion estimation can be different fromglobal-routing result, unless the same techniques and optimiza-tion parameters are applied in both congestion estimation andglobal routing.

B. Global Router

A typical global router decomposes every net into a set oftwo-pin wires by building minimum spanning tree or Steinertree, then routes them by maze-routing algorithm, followedby R&R technique for further improvement. In [1] and [25],Kastner et al. propose a simple pattern-based routing ratherthan maze routing for fast runtime without incurring significantrouting-quality degradation. Hadsell and Madden [2] take ad-vantage of predicted congestion map to guide the global routerand show considerable routing-quality improvement over [1].The congestion-aware Steiner tree in [4], [5], and [26] reducesthe runtime by increasing the number of nets routed by simpleand fast pattern routing and, thus, less relying on expensivemaze routing.

Whereas the previous global routers [2], [25], [26] are mainlybased on pattern routing, maze routing, or shortest path algo-rithm, Albrecht [3] formulates the global routing as a multi-commodity problem, which can be solved by an approximationalgorithm for fractional flow with randomized rounding. First,it repeats building a rectilinear minimum Steiner tree (RMST)using maze routing in net-by-net manner. After all nets arerouted in RMST, a set of G-cells above the given congestionthreshold is selected, and all the nets on any of those G-cellsare routed again by building new RMST. Two key advantagesof such approach are that congestion can be evenly distributedover the chip and a small set of nets, which are penalized by ex-treme detour, will be discouraged. Overall, this algorithm showsgood congestion reduction but at a cost of high computationaloverhead.

IV. PRACTICAL ILP FOR GLOBAL ROUTING

The ILP technique has been believed to be unacceptablyslow for global routing in VLSI design despite that it findsthe global optimum for a given instance of problem. In thissection, we propose a new ILP formulation for global routing,which is inherently different from the one in [3] and [28] anddiscuss pros/cons of each formulation. In this paper, to avoidany confusion, we call the traditional ILP as T-ILP and our newILP as N-ILP. Both T-ILP and N-ILP are routability driven, butthey adopt different formulations, which make a big differencein performance and scalability.

Before the main discussion, we describe Fig. 2 for a clearexplanation in the following subsections. Fig. 2(a) shows twounrouted nets a and b, which are further decomposed into wires(see Section V-A): net a has three wires (wa1, wa2, and wa3),and net b has one wire (wb1). For each wire, we can enumerateall the possible routing paths, but for simplicity, we show onlythe paths in the minimum length and with minimum vias, asshown in Fig. 2(b). Each possible routing path is called arouting candidate of the given wire. In this example, we assumethat the routing capacity is two for the all the edges (r12 = 2,r25 = 2, and so forth), thus both Fig. 2(c) and (d) are routablesolutions.

A. Traditional ILP

T-ILP minimizes the maximum congestion over all the edges.Fig. 3 is a T-ILP formulation of Fig. 2(b), where a variable Cis set to be larger than any congestion on any edge (i.e., theupper bound). The routing result after solving Fig. 3 is notFig. 2(d) but Fig. 2(c), as Fig. 2(d) has the maximum conges-tion of 1.0 on e45, whereas Fig. 2(c) has the maximum con-gestion of 0.5.

Let E be the set of edges in the grid (indexed by e) and letN be the set of all feasible routing candidates. Furthermore, letL(e) be the set of routing candidates crossing edge e. Supposethat xijk is a binary variable set to one if the kth routingcandidate of wire j of net i is chosen. Then, Fig. 4 shows ageneral formulation of T-ILP. Note that the number of routingcandidates must be kept small (L-shape or L/Z-shape path)due to practical limitations (e.g., memory). The advantages ofT-ILP formulation include the following: 1) As it minimizesthe maximum congestion (min–max formulation), it essentiallytries to achieve more uniform congestion distribution. 2) Thesolution of T-ILP formulation always includes one routingcandidate for each unrouted wire. Thus, it completes routingby itself and does not need any additional step, unless there isany overcongested edge.

Meanwhile, the drawbacks of T-ILP formulation include thefollowing.

1) When C in Fig. 4 is larger than any me (the maximumrouting capacity of the edge e), the number of overcon-gested edges will explode. It considers not the overallcongestion but the maximum congestion. Therefore, aslong as the congestion is smaller than C, it is possible tohave many overcongested edges.

2) All the overcongested edges should be taken care of tomeet congestion constraint (otherwise, it is unroutable bydetailed router) by postprocessing steps, such as R&R.

3) The T-ILP cannot be efficiently solved with branch-and-bound or branch-and-cut algorithms. This will beexplained in Section IV-C.

B. New-ILP

Our proposed N-ILP maximizes the weighted summation ofthe number of routed wires under the routing-capacity con-straint. Fig. 5 is an N-ILP formulation of Fig. 2(b), where eachrouting candidate is weighted by its length in the objective.

Page 4: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2133

Fig. 2. Example of an ILP for global routing with two possible routing solutions. Two routing solutions in (c) and (d) are valid with respect to the given routingcapacities but different in terms of congestion distribution. The one in (c) achieves more uniform congestion distribution. T-ILP prefers routing (c) to routing (d),while N-ILP has no preference. (a) Decomposed net a, b. (b) Routing candidates. (c) Possible routing A. (d) Possible routing B.

Fig. 3. T-ILP formulation for the example of Fig. 2(b).

Fig. 4. General T-ILP formulation.

The result from Fig. 5 can be either Fig. 2(c) or (d), asN-ILP does not care about the maximum congestion, as longas there is no overflow. Fig. 6 shows the general formulation ofN-ILP where aijk is the weight of the routing candidate xijk,and the other notations are the same as in Fig. 4. Again, thenumber of routing candidates should be kept small (L-shape orL/Z-shape path). The advantages of N-ILP formulation includethe following.

1) As each candidate xijk can have a different weight, otherdesign objectives like timing can easily be incorporated.

Fig. 5. N-ILP formulation for the example of Fig. 2(b).

Fig. 6. General N-ILP formulation.

2) Due to the hard constraint on routing capacity, the solu-tion from N-ILP does not cause any overcongestion onany edge.

3) The N-ILP can be efficiently solved with branch-and-bound or branch-and-cut algorithm. This will be ex-plained in Section IV-C.

However, the drawbacks of N-ILP formulation include thefollowing.

1) The N-ILP may produce a biased routing solution interms of congestion uniformness. For example, if there

Page 5: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2134 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

are two valid solutions with different congestion distribu-tions, it may choose any of both, depending on the solverregardless of congestion uniformness (see Fig. 2).

2) Differently from T-ILP, it may not complete the routing.If an overcongested edge appears, it will give up routingsome wires with smaller weight not to violate the hardrouting-capacity constraint. Thus, N-ILP requires an ad-ditional step for complete routing.

C. T-ILP Versus N-ILP

Based on the discussions in Sections IV-A and B, we com-pare both ILP formulations in two aspects: routability andruntime.1) Routability: As aforementioned, both T-ILP and N-ILP

maximize the routability but in different manners: T-ILP min-imizes the maximum congestion, but N-ILP maximizes thenumber of routed wires under the routing-capacity constraint.This difference becomes highly distinct, depending on whetherthe design is undercongested or overcongested.

1) For undercongested designs, it is easy for T-ILP andN-ILP to satisfy the routing constraint. Therefore, T-ILPmay be superior to N-ILP, as it can make more uniformcongestion distribution, which improves manufacturabil-ity and crosstalk noise.

2) For overcongested designs, T-ILP may unnecessarilycause a lot of overflows, as it only cares about themaximum congestion. However, N-ILP itself does notcause any overcongested edges by leaving some wiresunrouted. The overflows from T-ILP and the unroutedwires from N-ILP need to be picked up by the followingmaze routing.

Since, modern VLSI designs are highly congested, in gen-eral, the advantage of T-ILP is quite trivial.2) Runtime: For a given ILP solver, different ILP formula-

tions may have different runtime complexity. An ILP problem isfirst solved as LP, and then a branch-based algorithm is appliedto any fractional variable to find the integral optimal solution.We find that, for the most widely used ILP-solving algorithms,branch-and-bound and branch-and-cut [33], [34], the N-ILPformulation can be solved much more efficiently than the T-ILPformulation for the same routing problem.

For demonstration purposes, we prepare various routingproblems with different problem sizes (in terms of the numberof variables) and then formulate them into both T-ILP andN-ILP. Fig. 7 shows the normalized runtime of each T-ILPand N-ILP formulation under typical computing environments(see Section VI) with GNU LP Kit (GLPK) 4.8 with all speedupoptions turned on. Note that we obtain a very similar trend forvarious algorithms such as branch-and-bound and branch-and-cut with different cutting planes [34], [35]. It is clear that N-ILPis significantly faster than T-ILP and that such speedup becomesmore significant for larger problem size, e.g., over 1100 timesfor some large cases. There are two theoretical explanationswhy N-ILP can be solved much faster than T-ILP.

1) Since N-ILP is similar to a binary knapsack formulation,the solution after LP is a near-feasible solution withalmost all variables nonfractional [33], [36]. However,

Fig. 7. Runtimes of T-ILP and N-ILP are compared. It shows that N-ILP ismuch faster and more scalable for larger problems than T-ILP.

Fig. 8. Basic concept of the BoxRouter. (a) Motivation for the BoxRouting.(b) Strategies of the BoxRouting.

due to the min–max nature of the objective function,the variables in T-ILP have more incentive to remainfractional after LP as opposed to their counterparts inN-ILP. Consequently, the LP solution of T-ILP is muchmore fractional than that of N-ILP, resulting in morebranches during branch-and-bound or branch-and-cut.

2) The branch-and-bound or branch-and-cut technique ter-minates in shorter time if more nodes can be fathomed[33]. Unfortunately, the min–max nature of the objectivefunction in T-ILP results in many near-optimal solutions.Therefore, the corresponding nodes cannot be fathomedefficiently, and the branch tree grows needlessly.

3) Summary: As discussed in Sections IV-C1 and C2,N-ILP is significantly faster than T-ILP, and the solution qualityfrom N-ILP is similar to that from N-ILP for the overcongesteddesign. Thus, N-ILP is expected to work better for the modernVLSI designs. Our proposed N-ILP is adopted in the BoxRouterin Section V, in a progressive manner with the box-expansionconcept.

V. BOXROUTER

In this section, we present our new global router,BoxRouter, which is based on congestion-initiated box expan-sion. BoxRouter progressively expands a box which initiallycovers the most congested region only but, finally, covers thewhole circuit. After every expansion, a circuit is divided intotwo sections: inside the box and outside the box. BoxRouteruses different routing strategies for each section to maximizeroutability and minimize wirelength. Consider Fig. 8(a) where

Page 6: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2135

Fig. 9. BoxRouter consists of three main steps: PreRouting, BoxRouting, andPostRouting. BoxRouting can be further composed of PILP and AMR.

two wires (a and b) are inside the box, whereas the other wires(c and d) are not inside the box. The routing capacity inside thebox is more precious to a and b than c and d for two reasons:1) If a and b are not routed within the box, the wirelength willincrease due to detour; and 2) c and d may have another viablerouting path outside the box, which does not waste the routingcapacity inside the box.

Therefore, the BoxRouter first routes as many wires insidethe box as possible with N-ILP in Section IV-B, maximallyutilizing the routing capacity inside the box. Then, for thewires which cannot be routed by N-ILP within the box (dueto insufficient routing capacities), BoxRouter detours them byAMR with the following two strategies:

1) Inside the box, use the routing capacities as much aspossible (greedily), as the wires inside the box havepriority over those outside the box.

2) Outside the box, use the routing capacities conserva-tively, as the wires outside the box may need them laterfor their viable routing paths.

These two strategies keep the wire density of the circuit,as shown in Fig. 8(b), and make the wires detour the morecongested region to maximize the routability with minimumwirelength overhead.

The overall flow of the BoxRouter is in Fig. 9, which willbe explained in detail in the rest of this section. Section V-Adescribes the preprocessing for the BoxRouter. Section V-Billustrates the PreRouting for congestion estimation and routingspeedup. Section V-C explains the BoxRouting, which is themain idea of the BoxRouter, which includes PILP, AMR, andbox expansion. Finally, Section V-D shows how PostRoutingimproves wirelength and routability further while controllingthe tradeoff between them.

A. Steiner Tree and Net Decomposition

A net can be decomposed into two-pin wires with RMST, asshown in Fig. 10. In the BoxRouter, Flute [37] and GeoSteiner

Fig. 10. Net can be decomposed into two-pin wires with RMST construction.(a) Hypergraph for a net. (b) Wires after decomposition.

Fig. 11. Congestion estimations after PreRouting and BoxRouting are com-pared. It shows that simple PreRouting can effectively capture overall conges-tion, as well as the most congested region. (a) Congestion after PreRouting.(b) Congestion after the BoxRouting.

[38], [39] are tested for Steiner-tree construction, but Flute isfinally adopted due to its small computational overhead. Notethat different Steiner-tree algorithms, such as a timing-drivenor congestion-driven Steiner-tree algorithm, can also be used inthe BoxRouter. A special wire which does not need a bend iscalled a flat wire [18]. For example, wires a–e, e–d, e–f , andb–f in Fig. 10(b) are flat wires, while wire f–c requires at leastone bend to be routed. Each wire from a net becomes a singlerouting object. However, the net is finally routed only if all thewires from a net are routed. Routing each wire from a singlenet separately may have the downside of loosing informationon other wires, resulting in suboptimal routing. This issue isaddressed in the AMR in Section V-C2.

B. PreRouting and Initial Box

PreRouting simply routes as many flat wires as possiblevia the shortest path without creating any overflow, as inAlgorithm 1. As the bulk of nets are destined to be routedin simple patterns (L-shape or Z-shape) [20], [23], [25], Pre-Routing can improve the runtime without degrading the finalsolution. More importantly, if enough wires can be routed byPreRouting, the global congestion can be captured with rea-sonable accuracy. According to our experiments for the testedbenchmarks, about 60% of the final wirelength on average canbe routed with tiny computational overhead by PreRouting.Fig. 11 shows two congestion maps, one after PreRouting andthe other one after BoxRouting, where more congested area isbrighter. It shows that congestion hotspots in Fig. 11(b) canbe predicted from Fig. 11(a) by PreRouting. A box whichencompasses the four G-cells in the most congested area willbe created, as shown in Fig. 12(a), as a starting point of

Page 7: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2136 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

Fig. 12. BoxRouting example. (a) Initial box is created on the hotspot, which is estimated by PreRouting. (b) Box i with wires which will be routed by theBoxRouting. (c) Wires within Box i will be routed by PILP. (d) Unrouted wire b after PILP is routed by AMR. (e) Box i is expanded, and more wires are enclosedby Box i + 1. (f) BoxRouting is performed with Box i + 1.

BoxRouting. Note that if there are two most congested areas,then the one closer to the center of the circuit is selected.

Algorithm 1 PreRoutingInput: A list of wires W

1: Sort each w in W by length in ascending order2: for each w in W do3: if w is flat then4: Make w routed5: OF = the number of updated overflows6: if OF > 0 then7: Make w unrouted8: end if9: end if10: end for

C. BoxRouting

In this section, the BoxRouting will be explained withFig. 12. BoxRouting consists of three steps: PILP routing,AMR, and box expansion, as shown in Fig. 9. These three stepsare repeated until the expanded box covers the whole circuit.Each of these steps are explained in the following sections.1) PILP Routing: We show in Section IV that N-ILP is more

efficient than T-ILP for modern typically overcongested VLSIdesign. Therefore, we use N-ILP formulation in PILP and fur-ther extend it by combining it with the box-expansion concept.

Assuming a box is expanded from the most congested region,as shown in Fig. 12(a), consider Fig. 12(b), where wires withinthe box after the ith expansion (box i) are shown with squares(b, f , and h) and the other wires are shown with circles.

Fig. 13. PILP formulation of Fig. 12(c).

The already-routed wires by either PreRouting or previousBoxRouting are simply shown as solid lines. Note that some flatwires like f , i, and k could remain unrouted until the BoxRout-ing if PreRouting gives up in routing them due to any overflowor new Steiner points introduced by AMR (explained later inthis section) convert a nonflat wire into a flat wire. For efficientrouting, as aforementioned in the beginning of this section, onlywires within the box will be routed by PILP and AMR.

In Fig. 12(c), the wires within the box are shown with G-cells(vA, vB , vC , and vD), and the corresponding PILP formulationfor maximum routability is shown in Fig. 13. To minimize thenumber of vias, two L-shape routing candidates (xb1, xb2 andxh1, xh2) are considered for each wire in our PILP formulation,but only one routing candidate (xf1 and xf2 = 0) is consideredfor flat wires. General PILP formulation is shown in Fig. 14,where ce is the available routing capacity on edge e (seeTable I), Wbox is a set of unrouted wires within the current box,and Wflat is a set of flat wires.

Page 8: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2137

Fig. 14. General PILP formulation.

Differently from the hierarchical ILP [24], our PILP progres-sively routes a part of the circuit, which is covered by eachexpanding box. This box expansion limits the problem size suchthat PILP, which is NP-hard, can be solved efficiently. Threeadvantages of our PILP can be summarized as follows.

1) Basic formulation is the same as N-ILP of Section IV-B,inheriting its advantages in runtime and scalability.

2) Even though the last box can cover the whole circuit, thePILP size remains tractable, as N-ILP is performed on thewires between two successive boxes like between Box iand Box i + 1 in Fig. 12(e).

3) As shown in Fig. 12(e), the newer box always contains theolder box. Consequently, the solution from the older PILPis reflected in the newer PILP formulation, providingsmooth transition between two successive problems forhigh-quality solution.

Due to the limited routing capacity of each edge, some wiresmay not be routed with the aforementioned PILP. xi1 + xi2 ≤ 1in Fig. 14 relaxes the routing constraint such that some wiresmay not be routed if the overflow occurs. For example, assum-ing that mBD = mCD = 2 and xh1 = 1, the wire b cannot berouted with ILP (xb1 = xb2 = 0), as two prerouted wires oneCD and one prerouted wire with the wire h(xh1 = 1) on eBD

consume all the routing capacities. For this case, the wire b isrouted by AMR, as shown in Fig. 12(d), with the routing costfrom Algorithm 3.

Algorithm 2 BoxRoutingInput: A list of wires W in box B

1: Solve progressive ILP with W2: for each w in W do3: if w is unrouted then4: Perform adaptive maze routing for w5: end if6: end for

2) Adaptive Maze Routing (AMR): Algorithm 3 returns aunit cost as long as eXY is inside the box and still has availablerouting capacity (lines 2 and 3). Otherwise, it returns a cost in-versely proportional to the available routing capacities (line 1).This cost function makes maze routing adaptively find the bestrouting path such that the shortest path inside the box is forwirelength minimization, but the most idle path outside the boxis for routability maximization. Note that the resource outsidethe box should be used conservatively, as the wires outsidethe current box may need them later. If too big a detour isrequired to avoid small overflows, AMR may return a path withoverflows for the least overall cost.

Fig. 15. Efficient multisource multitarget maze-routing examples. More effi-cient alternative paths are found by considering multiple sources and targets.(a) By finding shorter path x−y. (b) By sharing routed path x−y.

Fig. 16. MMB maze-routing model.

Algorithm 3 Adaptive Maze Routing Cost for BoxRoutingInput: G-cell Vx, Vy , Box B

1: Cost C = mxy − cxy

2: if exy is inside B and cxy > 0 then3: C = 14: end if5: C

For the maze-router implementation, we propose an MMBmaze-routing model for higher efficiency, as shown in Fig. 15.Consider the example in Fig. 15(a), where the source G-cell Sand the target G-cell T are to be routed, and the congestion isrepresented as a shaded region. To avoid congestion, a simplemaze routing can easily find the routing path, path2, instead ofpath1. However, as the goal is to make S and T electrically con-nected, we can achieve electrical connection as well as shorterwirelength, by alternatively routing x and y, as shown as path3.The other example in Fig. 15(b) shows the case where the rout-ing between b and c is detoured due to congestion. In this case,even though path1 is the shortest path between S and T withoutany congestion issue, the path S–x–y–T shown as path2–path3is the better routing path, because it shares and utilizes the exist-ing routed path path3, resulting in the shorter total wirelength.

Aware of the aforementioned cases, the proposed MMB-based maze routing in Fig. 16 is implemented for AMR. Thebasic idea behind MMB is to make the maze router honor theexisting partial routed paths of the net for shorter wirelengthand less congestion. In detail, the proposed model is based onthree different groups of G-cells, as shown in Fig. 16.

1) Source group: A group of G-cells which are electricallyconnected to the source G-cell S.

2) Target group: A group of G-cells which are electricallyconnected to the target G-cell T .

Page 9: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2138 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

3) Bridge group: Multiple groups of G-cells on the partialrouting paths which are connected to neither the source Snor the target T .

Note that identifying each group of G-cells can be done withany graph-traversal algorithm with trivial computational over-head. There can be multiple bridge groups in case many routedpaths (from PreRouting or previous AMR) are not connectedwith each other.

Flooding of the maze routing is started from all the G-cellsin the source group and is terminated when any G-cell in thetarget group with the minimal cost is discovered. The floodingwithin a bridge group is free by treating one bridge group as asingle virtual G-cell to encourage the utilization of the existingrouted paths for shorter total wirelength (details on AMR is inAlgorithm 4).

Algorithm 4 Adaptive Maze RoutingInput: Source s and target t of net N with box B

1: Find source group Gs of s2: Find target group Gt of t3: Find all bridge groups Gb1, Gb2, . . . of N4: A priority queue Q = φ5: for each G-cell Vx in Gs do6: Cost Tx of Vx = 07: Enqueue Vx into Q8: end for9: Best target G-cell Vb = φ, Tb = ∞10: while Q is not empty do11: dequeue a G-cell Vx from Q12: if Tx ≥ Tb then13: break14: end if15: for each adjacent G-cell Vy of Vx do16: Tn = Algorithm 3 (Vx, Vy, B)17: Ty = Tx + Tn

18: if Vy ∈ Gt and Ty < Tb then19: Vb = Vy , Tb = Ty

20: else if Vy ∈ Gbi then21: for each G-cell Vz in Gbi do22: Tz = Ty

23: Enqueue Vz into Q24: end for25: else26: Enqueue Vy into Q27: end if28: end for29: end while30: P = Backtrace the best path from Vb to any G-cell

of Gs

31: P

It should be noted that MMB-based maze routing maychange the initial Steiner-tree structure according to the con-gestion updated during routing, and this may negatively affectthe runtime as the maze router needs to search larger space forthe optimal routing path. This runtime issue can be mitigatedif the congestion-driven Steiner-tree algorithm is adopted. Note

that a simple wirelength-driven Steiner-tree algorithm is usedin this paper.3) Box Expansion: After all the wires inside the box are

routed, either by PILP or AMR, the box i will be expandedto box i + 1, and new wires (c, d, and k) are encompassed bybox i + 1, as shown in Fig. 12(e). The result after applying theBoxRouting (AMR after PILP), again, is shown in Fig. 12(f).The amount of increment during the box expansion signifi-cantly affects the routing solution. As the box grows larger forevery expansion with bigger increment, the runtime increasesexponentially due to larger PILP problem size (more wires areadded into the formulation due to larger expansion). But, thesmaller overflow can be obtained as the routing is performedmore globally. There can be several heuristics to determinethe increment, such as constant increment size or dynamicincrement size, but it is required to keep PILP problem sizemanageable (more discussion is presented in Section VI). Afterall wires are routed (the box becomes big enough to coverthe whole circuit), PostRouting of Section V-D will follow theBoxRouting. Each wire in the box is optimally routed by PILP,but the global optimality is not guaranteed as the box expands.

To a certain extent, the BoxRouting mimics the diffusioneffect which was originally proposed for placement migrationin [31]. By each BoxRouting step, all the wires in the morecongested region (within the box) are routed first by PILP, thenby AMR. This makes the wires outside the box detour thebox, if necessary. Such box expansion and congestion spreadingdiffuses wires in a progressive and systematic manner.

Our box expansion can be initiated from multiple regions,in case there are several congestion hotspots. This may lead tobetter congestion distribution, as well as improved runtime. Asthe key idea behind the box expansion is to diffuse wires frommore congested regions to less congested regions, intuitivelymultiple box expansion have advantages. More importantly,multiple box expansion can be effectively performed on mul-tiprocessor/distributed computing environment due to two rea-sons: 1) Most commercial ILP solvers support such computingenvironments and 2) each PILP can be solved independently, aslong as boxes are not overlapped. However, several implemen-tation issues, such as where to begin (how to define congestionhotspot) and when to stop, should be addressed with well-tunedheuristics.

D. PostRouting (Reroute Without Rip-Up)

As AMR in the BoxRouting uses conservative strategy out-side the box, as in Algorithm 3 (finding the most idle routingpath outside the box), it may create unnecessary detour andoverflow. Thus, PostRouting simply reroutes wires to removeunnecessary overhead with the box expansion initiated from themost congested region, as done in the BoxRouting. In detail, awire in the more congested region will be rerouted first andsuch a rerouted wire can release the routing capacity, as it mayfind the better routing path. Then, the surrounding wires can bererouted with the released routing capacity, potentially reducingwirelength and overflow again. This chain reaction propagatesfrom the most congested region to the less congested regionsalong the box expansion. Consider the example in Fig. 17 where

Page 10: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2139

Fig. 17. Example of PostRouting is shown. In (a), a routing capacity R is notutilized by the BoxRouting, as AMR finds less congested path. If R remainsunused after the BoxRouting is finished, it may be the reason for the suboptimalrouting path for a wire x. Thus, wire x can be rerouted by utilizing R, whichshortens a wire y with the released routing capacity from wire x, as well as in(b). (a) Routing before PostRouting. (b) Routing after PostRouting.

two wires x and y are routed around the G-cell Vz . Before thePostRouting (thus, during the BoxRouting), the wire x detoursVz during AMR in Section V-C2 due to high congestion in Vz

in spite of the available routing capacity R. However, if R isstill available after the BoxRouting is finished, then there isno reason to leave R available at a cost of longer wirelength.Wire x can be rerouted through R to minimize the wirelengthwithout causing any overflow. After wire x is rerouted, the wirey, which is detoured during the BoxRouting due to wire x, canbe rerouted, as well as using the routing capacity released afterwire x is rerouted, thus reducing again the wirelength.

Algorithm 5 Maze Routing Cost for PostRoutingInput: G-cell Vx, Vy , Param K

1: Cost C = K2: if cxy > 0 then

3: C = 14: end if5: C

AMR of Section V-C2 is used again for PostRouting butwith a different routing cost function in Algorithm 5, wherea user-defined parameter K is introduced. The parameter Kcontrols the tradeoff between the wirelength and routability(overflow) by setting the cost of each overflow as K. Thus,higher K will discourage overflow at a cost of wirelengthincrease (more detours), but lower K will suppress detourat a cost of overflows. The effectiveness of parameter K isdiscussed in Section VI.

Our PostRouting is more efficient than the widely used R&R,as the PostRouting makes a wire voluntarily release a rout-ing capacity (this happens only when the solution improves)during its rerouting, while R&R deprives it from a wire inthe congested region without guaranteeing any improvement.Although R&R and PostRouting target for less congestion, theapproaches are different. While R&R rips up the already routedwires to directly secure routing capacity, PostRouting makesmore routing capacity available indirectly by shortening thewirelength of each wire (wirelength is linearly proportionalto the number of routing capacities in use). PostRouting isguaranteed to find the equal or better routing path for the givenobjective function, as the current routing path can always befound as the worst case routing path. Thus, the routing qualitycan be improved gradually by repeating PostRouting.

TABLE IIISPD98 IBM BENCHMARKS FOR GLOBAL ROUTING

TABLE IIIPREROUTING IN THE BOXROUTER FOR ISPD98 IBM BENCHMARKS

Fig. 18. Overflow and runtime change by box increment for ibm04.

VI. EXPERIMENTAL RESULTS

We implement the BoxRouter in C++. All the experimentsare performed on a 2.8-GHz Pentium-4 Linux machine with2-GB RAM. Flute [37] with high-accuracy option is usedfor RMST, and GLPK 4.8 [34] is used as an ILP solver.We use ISPD98 IBM benchmarks [1] for this paper. Table IIsummarizes each ISPD98 IBM benchmark circuit and its cor-responding grid-graph model. The lower bound wirelength ofeach circuit is computed by the most accurate GeoSteiner 3.1[37], [39].

Table III shows the routing-completion percentage after thePreRouting. On average, 61% of the lower bound wirelengthcan be routed after the PreRouting, which is enough to capturethe overall congestion, as well as the most congested region.

Page 11: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2140 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

Fig. 19. Routability and wirelength tradeoff by parameter K. (a) ibm02.(b) ibm10.

Furthermore, over 61% routing completion, even before themain routing phase, will improve the runtime.

Fig. 18 shows the overflow and runtime by the amount of abox increment (see Section V-C) for one benchmark. It clearlyshows that, with larger box increment, the overflow decreases,but the runtime increases exponentially. While the wirelengthvaries only by 0.11%, the overflow decreases by 30%, but theruntime increases by 500%. It indicates that, with larger boxincrement during the box expansion of the BoxRouting, thesolution quality can be improved at a cost of runtime.

The effectiveness of parameter K (see Section V-D) is shownin Fig. 19. It shows that, with larger K, the overflow decreasesexponentially, but the wirelength increases logarithmically. Weconstantly find that the overflow saturates faster than wire-length, and the best tradeoff occurs between K = 10 andK = 15 for all the tested benchmarks. Fig. 20 also showsthat the runtime is independent of parameter K. The runtimevariations (stdev/average) of ibm02 and ibm10 are only 0.7%and 0.5%, respectively, while K varies from 1 to 30.

Table IV shows the routing results by the BoxRouter withK = 10 and K = 15, the best tradeoff found in Fig. 19. Itshows that the BoxRouter has, on average, 3.4% and 3.7%wirelength overhead (regarding the lower bound wirelength) forK = 10 and K = 15, respectively, and provides high-qualitysolutions for larger circuits with smaller overflows.

Fig. 20. Runtime change by parameter K. (a) ibm02. (b) ibm10.

TABLE IVRESULTS FROM THE BOXROUTER FOR ISPD98 IBM BENCHMARKS

TABLE VIMPROVEMENTS ON THE RANDOMLY INITIATED

BOX EXPANSION (K = 15)

Table V compares the congestion-initiated box expansionwith the random-initiated box expansion when K = 15. Itshows that the box expansion initiated from the most congestedregion can improve the number of overflow by 33.1% on

Page 12: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2141

TABLE VICOMPARISON WITH THE LABYRINTH 1.1 [25] AND THE FENGSHUI 5.1 (Chi DISPERSION) [2] FOR ISPD98 IBM BENCHMARKS

TABLE VIICOMPARISON WITH THE MULTICOMMODITY FLOW-BASED

ROUTER [3] FOR ISPD98 IBM BENCHMARKS

average, proving that it is more effective than the randomlyinitiated one in terms of congestion.

For thorough comparison, we download two available globalrouters, Labyrinth 1.1 [1], [25] and Fengshui 5.1 (which hasthe newest implementation of the Chi-dispersion router) [2],[40], and implement multicommodity flow-based global router[3] in C++ (the binary is not available from the author).Note that we use the same routine for RMST, congestionestimation, and maze routing for fair comparison in themulticommodity flow-based global-router implementation. Al-though the results of the Labyrinth and the Fengshui are re-ported in [2], we reproduce the results due to the recent updatein the benchmarks [1].

Table VI shows the experimental results and the comparisonfor the Labyrinth and the Fengshui, and Table VII for the mul-ticommodity flow-based router. As there is a tradeoff betweenthe wirelength and routability, we choose the parameter K ofthe BoxRouter of Table VI with wirelength constraint suchthat the wirelength from the BoxRouter is as small as orsmaller than those from the Labyrinth and the Fengshuifor fair comparison. Regarding Table VII, we first care-fully choose the parameters of the multicommodity flow-based router for each benchmark such that the best resultsare yielded within 25 phases (the maximum phase in [3]),then simulate ibm01, ibm02, ibm04, and ibm07 (circuits withnonzero overflow in Table VI) again for the BoxRouter withoutany constraint.

Fig. 21. Pie chart for average CPU-time breakdown.

As shown in Table VI, the BoxRouter outperforms theLabyrinth and the Fengshui by a wide margin. In terms ofwirelength and overflow, the BoxRouter can reduce the wire-length by 14.3%, the overflow by 91.7% as compared with theLabyrinth, and improve the overflow by 79% with similar wire-length (actually, 0.8% better) as compared with the Fengshui.In addition, the BoxRouter is 3.3× and 2.0× faster than theLabyrinth and the Fengshui, respectively. The multicommodityflow-based router and the BoxRouter show very comparableoverflows, as shown in Table VII. However, the BoxRouteris, on average, 15.7× up to 29× faster and produces 4.2%shorter wirelength on average than the multicommodity flow-based router. It implies that the BoxRouter can provide high-quality global-routing solution with significantly less designturn-around time.

Fig. 21 shows a pie chart for CPU-time breakdown averagedfrom all the benchmark circuits. PreRouting takes a negligibleamount of total CPU time (1.4%), while routing over 60%of wires. BoxRouting, which may be considered the slowestpart of the BoxRouter due to ILP, takes about 25%, while thePostRouting takes over 57%. This proves that the proposedPILP is significantly fast while providing high-quality solution.On the other hand, PostRouting which is mainly the mazerouting is the current bottleneck in runtime for the BoxRouter.

So far, all the results from the BoxRouter are with constantsize of box increment and single PostRouting. However, it ispossible to alter the size of box increment dynamically andrepeat the PostRouting for further improvement. Instead ofhaving a constant size of box increment, we fix the maxi-mum number of wires for each PILP, which can be found byempirically testing the ILP solver. Consequently, the box is

Page 13: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

2142 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 26, NO. 12, DECEMBER 2007

TABLE VIIICOMPARISON WITH THE DPROUTER [5] AND THE FASTROUTE 2.0 [4] FOR ISPD98 IBM BENCHMARKS

kept expanded until it covers the given maximum number ofwires. We call the improved BoxRouter as BoxRouter+, asshown in Table VIII, and compare the BoxRouter+ with thelatest global routers, DpRouter [5] and FastRoute 2.0 [4]. Inaddition, it shows the number of PILP instances with 10 000maximum wires and the number of PostRouting repetitions.Overall, the BoxRouter+ shows significantly better routabilitythan DpRouter and FastRoute 2.0 with comparable wirelengthand completes the most number of circuits. However, theBoxRouter+ is slower than DpRouter and FastRoute 2.0. Themain bottleneck in BoxRouter+ depends on the difficulty ofeach circuit. If a circuit is relatively easy, which requiresonly one or two iterations of PostRouting, the main runtimebottleneck is solving larger ILP instance. For harder circuits,it requires multiple iterations of PostRouting, which makesit the bottleneck for runtime. However, considering that thereal bottleneck in VLSI routing flow is detailed routing, betterroutability can compensate the runtime overhead as it can resultin significant speedup in detailed routing.

VII. CONCLUSION

In order to cope with the increasing impact of interconnecton system performance, we present an efficient global router,BoxRouter, to maximize the routability with minimum wire-length. Experimental results show that the BoxRouter outper-forms the state-of-the-art publicly available global routers interms of wirelength, routability, and runtime. We believe thatfurther improvement can be achieved with multiple box ex-pansions, faster ILP solver, and so on. Current implementationof the BoxRouter is available at www.cerc.utexas.edu/utda. Weplan to address timing, crosstalk, and manufacturability issueson top of the BoxRouter framework.

ACKNOWLEDGMENT

The authors would like to thank Prof. P. Madden fromSUNY, Binghamton, and Dr. C. Albrecht from CadenceBerkeley Laboratory for their helpful discussions, I. M. Iyoobof the Department of Operation Research, University of Texas,Austin, for the comparison of ILP formulations, and K. Yuanand K. Lu of UTDA Laboratory, University of Texas, Austin,for proofreading.

REFERENCES

[1] Labyrinth. Available: http://www.ece.ucsb.edu/~kastner/labyrinth/[2] R. T. Hadsell and P. H. Madden, “Improved global routing through con-

gestion estimation,” in Proc. Des. Autom. Conf., Jun. 2003, pp. 28–31.[3] C. Albrecht, “Global routing by new approximation algorithms for mul-

ticommodity flow,” IEEE Trans. Comput.-Aided Design Integr. CircuitsSyst., vol. 20, no. 5, pp. 622–632, May 2001.

[4] M. Pan and C. Chu, “FastRoute 2.0: A high-quality and efficient globalrouter,” in Proc. Asia South Pac. Des. Autom. Conf., 2007, pp. 250–255.

[5] Z. Cao, T. Jing, J. Xiong, Y. Hu, L. He, and X. Hong, “DpRouter: A fastand accurate dynamic-pattern-based global routing algorithm,” in Proc.Asia South Pac. Des. Autom. Conf., 2007, pp. 256–261.

[6] J. Cong, “Challenges and opportunities for design innovations in nano-meter technologies,” SRC Design Science Concept Papers, 1997.

[7] R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh, “An exact algorithmfor coupling-free routing,” in Proc. Int. Symp. Phys. Des., Apr. 2001,pp. 10–15.

[8] D. Wu, J. Hu, and R. Mahapatra, “Coupling aware timing optimizationand antenna avoidance in layer assignment,” in Proc. Int. Symp. Phys.Des., Apr. 2005, pp. 20–27.

[9] G. Xu, L. Huang, D. Z. Pan, and D. F. Wong, “Redundant-via enhancedmaze routing for yield improvement,” in Proc. Asia South Pac. Des.Autom. Conf., Jan. 2005, pp. 1148–1151.

[10] L. Huang and M. D. F. Wong, “Optical proximity correction(OPC)-friendly maze routing,” in Proc. Des. Autom. Conf., Jun. 2004,pp. 186–191.

[11] J. Mitra, P. Yu, and D. Z. Pan, “RADAR: RET-aware detailed routingusing fast lithography simulations,” in Proc. Des. Autom. Conf., Jun. 2005,pp. 369–372.

[12] R. Tian, D. F. Wong, and R. Boone, “Model-based dummy featureplacement for oxide chemical–mechanical polishing manufacturability,”IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 20, no. 7,pp. 902–910, Jul. 2001.

[13] T. E. Gbondo-Tugbawa, “Chip-scale modeling of pattern dependencies incopper chemical mechanical polishing process,” Ph.D. dissertation, MIT,Cambridge, 2002.

[14] S. Y. Kuo, “YOR: A yield-optimizing routing algorithm by minimiz-ing critical areas and vias,” IEEE Trans. Comput.-Aided Design Integr.Circuits Syst., vol. 12, no. 9, pp. 1303–1311, Sep. 1993.

[15] A. Venkataraman, H. Chen, and I. Koren, “Yield enhanced routing forhigh-performance VLSI designs,” in Proc. SPIE Microelectron. Manuf.Yield, Reliab. Failure Anal., Oct. 1997, pp. 51–60.

[16] J. Hu and S. Sapatnekar, “A survey on multi-net global routing for inte-grated circuits,” Integration, VLSI J., vol. 31, no. 1, pp. 1–49, Nov. 2001.

[17] J. Hu and S. Sapatnekar, “A timing-constrained algorithm for simultane-ous global routing of multiple nets,” in Proc. Int. Conf. Comput.-AidedDes., 2000, pp. 99–103.

[18] J. Lou, S. Krishnamoorthy, and H. S. Sheng, “Estimating routing conges-tion using probabilistic analysis,” in Proc. Int. Symp. Phys. Des., 2001,pp. 112–117.

[19] A. B. Kahng and X. Xu, “Accurate pseudo-constructive wirelengthand congestion estimation,” in Proc. Syst.-Lev. Interconnect Predict.,Apr. 2003, pp. 61–68.

[20] J. Westra, C. Bartels, and P. Groeneveld, “Probabilistic congestion predic-tion,” in Proc. Int. Symp. Phys. Des., Apr. 2004, pp. 204–209.

[21] C. Sham and E. F. Y. Young, “Congestion prediction in early stages,” inProc. Syst.-Lev. Interconnect Predict., Apr. 2005, pp. 91–98.

Page 14: BoxRouter: A New Global Router Based on Box Expansion and ...daehyun/teaching/2015_EE582/papers/r-box.pdf · BoxRouter: A New Global Router Based on Box Expansion and Progressive

CHO AND PAN: BOXROUTER: NEW GLOBAL ROUTER BASED ON BOX EXPANSION AND PROGRESSIVE ILP 2143

[22] M. Wang and M. Sarrafzadeh, “Modeling and minimization of routingcongestion,” in Proc. Des. Autom. Conf., 2000, pp. 185–190.

[23] J. Westra, C. Bartels, and P. Groeneveld, “Is probabilistic conges-tion estimation worthwhile?” in Proc. Syst.-Lev. Interconnect Predict.,Apr. 2005, pp. 99–106.

[24] M. Burstein and R. Pelavin, “Hierarchical wire routing,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 2, no. 4, pp. 223–234,Oct. 1983.

[25] R. Kastner, E. Bozorgzadeh, and M. Sarrafzadeh, “Pattern routing: Useand theory for increasing predictability and avoiding coupling,” IEEETrans. Comput.-Aided Design Integr. Circuits Syst., vol. 21, no. 7,pp. 777–790, Jul. 2002.

[26] M. Pan and C. Chu, “FastRoute: A step to integrate global routing intoplacement,” in Proc. Int. Conf. Comput.-Aided Des., 2006, pp. 464–471.

[27] J. Westra, P. Groeneveld, T. Yan, and P. H. Madden, “Global routing:Metrics, benchmarks, and tools,” in Proc. IEEE DATC Electron. Des.Process, Apr. 2005.

[28] X. Yang, R. Kastner, and M. Sarrafzadeh, “Congestion reduction duringplacement based on integer programming,” in Proc. Int. Conf. Comput.-Aided Des., 2001, pp. 573–576.

[29] M. Cho, H. Xiang, R. Puri, and D. Z. Pan, “Wire density driven globalrouting for CMP variation and timing,” in Proc. Int. Conf. Comput.-AidedDes., Nov. 2006, pp. 487–492.

[30] M. Cho and D. Z. Pan, “BoxRouter: A new global router based on boxexpansion and progressive ILP,” in Proc. Des. Autom. Conf., Jul. 2006,pp. 373–378.

[31] H. Ren, D. Z. Pan, C. Alpert, and P. Villarrubia, “Diffusion based place-ment migration,” in Proc. Des. Autom. Conf., Jun. 2005, pp. 515–520.

[32] X. Yang, B.-K. Choi, and M. Sarrafzadeh, “Routability-driven whitespace allocation for fixed-die standard-cell placement,” IEEE Trans.Comput.-Aided Design Integr. Circuits Syst., vol. 22, no. 4, pp. 410–419,Apr. 2003.

[33] L. A. Wolsey, Integer Programming. Hoboken, NJ: Wiley, 1998.[34] GNU Linear Programming Kit. Available: http://www.gnu.org/

software/glpk/glpk.html/[35] E. Balas, S. Ceria, G. Cornuejols, and N. Natraj, “Gomory cuts revisited,”

Oper. Res. Lett., vol. 19, no. 1, pp. 1–9, Jul. 1996.[36] A. Osyczka, Multicriterion Optimization in Engineering—With Fortran

Programs. New York: Ellis Horwood, 1984.[37] C. C. N. Chu, “FLUTE: Fast lookup table based wirelength estimation

technique,” in Proc. Int. Conf. Comput.-Aided Des., 2004, pp. 696–701.[38] D. Warme, “Spanning trees in hypergraphs with applications to

Steiner trees,” Ph.D. dissertation, Comput. Sci. Dept., Univ. Virginia,Charlottesville, Apr. 1998.

[39] GeoSteiner. Available: http://www.diku.dk/geosteiner/[40] BLAC CAD Group. Available: http://vlsicad.cs.binghamton.edu/

Minsik Cho received the B.S. degree in electricalengineering from Seoul National University, Seoul,Korea, in 1999 and the M.S. degree in electricaland computer engineering from the University ofWisconsin, Madison, in 2004, respectively. He iscurrently working toward the Ph.D. degree in theDepartment of Electrical and Computer Engineering,University of Texas, Austin.

He was with Intel during the summer of 2005, andthe IBM T. J. Watson Research during the summerof 2006 and 2007. His current research interests are

focused on manufacturability-driven routing algorithm and variation-tolerantvery large scale integration physical design.

Mr. Cho was the recipient of the Korean Information Technology Scholarshipin 2002, the Best Paper Award Nominations at Asia and South Pacific DesignAutomation Conference (ASPDAC): 2006 and Design Automation Conference(DAC): 2006, the ISPD 2007 Routing Contest Awards (2nd place in 3D and 3rdplace in 2D), and the IBM Ph.D. Scholarship in 2007.

David Z. Pan (S’97–M’00–SM’06) received thePh.D. degree in computer science from theUniversity of California, Los Angeles (UCLA),Los Angeles, in 2000.

From 2000 to 2003, he was a Research StaffMember with the IBM T. J. Watson Research Cen-ter. He is currently an Assistant Professor with theDepartment of Electrical and Computer Engineering,University of Texas, Austin. His research interestsinclude nanometer physical design, design for manu-facturing, low power, vertical-integration design and

technology, and computer-aided design (CAD) for emerging technologies. Heis with the Design Technology Working Group of the International TechnologyRoadmap for Semiconductors. He has published over 65 technical papers andis the holder of five U.S. patents.

Dr. Pan is an Associate Editor for the IEEE TRANSACTIONS ON

COMPUTER-AIDED DESIGN (TCAD), the IEEE TRANSACTIONS ON VERY

LARGE SCALE INTEGRATION (VLSI) SYSTEMS, the IEEE TRANSACTIONS

ON CIRCUITS AND SYSTEMS (TCAS-II), and the IEEE CAS SocietyNewsletter. He has served in the technical program committees of majorVLSI/CAD conferences, including International Conference on Computer-Aided Design (ICCAD), International Symposium on Physical Design (ISPD),Design Automation and Test in Europe (DATE), Asia and South PacificDesign Automation Conference (ASPDAC), IEEE International Symposiumon Quality Electronic Design (ISQED), International Symposium on CircuitsAnd Systems (ISCAS), SLIP, Great Lake Symposium on Very Large ScaleIntegration (GLSVLSI), and ICICDT. He is the Chair of the IEEE CANDEWorkshop of 2007, the Program Committee Chair of ISPD 2007, the CADTrack Cochair of ISCAS 2006 and 2007, and the Design of Reliable Circuitsand Systems Track Chair of ISQED 2007. He is a member of the Association forComputing Machinery (ACM)/Special Interest Group on Design Automation(SIGDA) Technical Committee on Physical Design. He was the recipient ofthe Best Paper in Session Award from SRC Techcon 1998, the IBM ResearchFellowship in 1999, the Dimitris Chorafas Foundation Award in 2000, the SRCInventor Recognition Award in 2000, the Outstanding Ph.D. Award from UCLAin 2001, the IBM Research Bravo Award in 2003, the IBM Faculty Award from2004 to 2006, the ACM/SIGDA Outstanding New Faculty Award in 2005, theBest Paper Award Nominations at DAC 2006 and ASPDAC 2006, ISPD 2007Routing Contest Awards (second place in 3D and third place in 2D), and theNSF CAREER Award in 2007.