93541014

VLSI Design Planning with Power Integrity and I/O Constraints. Student: Chao-Hung Lu. Advisors: Dr. Chien-Nan Liu and Dr. Hung-Ming Chen


    VLSI Design Planning with Power Integrity and I/O Constraints

  • VLSI Design Planning with Power Integrity and I/O Constraints

    by

    Chao-Hung Lu

    A dissertation submitted in partial satisfaction of the

    requirements for the degree of

    Doctor of Philosophy

    in Electrical Engineering

    in the GRADUATE DIVISION

    of the NATIONAL CENTRAL UNIVERSITY

    Taiwan, Republic of China

    Advisors: Professor Chien-Nan Liu and Professor Hung-Ming Chen

    January 2010



  • Abstract

    In modern VLSI designs, manufacturing issues have complicated the design of both chips and packages. Moreover, driven by market requirements, modern circuits have higher functionality, lower supply voltages and more I/Os, all of which increase the complexity of chip design. In this dissertation, we present I/O planning and floorplanning methods to address these problems. They can not only mitigate the power supply noise in the core, but also take package design and stacking IC design into account.

    For simultaneous switching noise, our method adopts a two-stage technique: floorplanning followed by decoupling capacitance (decap) insertion. During floorplanning, area and noise are both evaluated to obtain a noise-driven floorplan. Then, a noise-driven decap planning approach inserts minimal decaps into the floorplan. For IR-drop and package issues, we adopt a finger/pad assignment method. Our finger/pad assignment is a two-step method: we first solve the package design problem, and then minimize IR-drop by swapping finger/pad locations. In addition, since stacking ICs are promising for the development of high-performance ICs, we propose a partition approach that minimizes the number of 3D-vias and balances the I/O count of each tier in stacking ICs. Finally, we perform floorplanning to show the importance of the aspect ratio in stacking ICs.


  • I

    Contents

    Chapter 1 Introduction .... 1

    1.1 Trends in VLSI .... 1

    1.2 Stacking IC Advantage and Technology .... 2

    1.3 Power Integrity Impacts in Chip Design .... 6

    1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs .... 7

    1.5 Dissertation Organization .... 8

    Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture .... 9

    2.1 Overview of Decap Insertion .... 9

    2.2 Power Delivery and Signal Integrity Issues .... 12

    2.2.1 Power Delivery Model and Noise Estimation .... 12

    2.2.2 Decap Budgeting in Area-Array Architecture .... 14

    2.2.3 Problem Formulation .... 15

    2.3 Minimal Decap Allocation in Power-Supply-Noise-Aware Floorplanning .... 15

    2.3.1 O-Tree-Based Power-Supply-Noise-Aware Floorplanning .... 16

    2.3.2 Feasible Region for Decap Allocation .... 22

    2.3.3 Identification of Space Priority for Decap Insertion .... 24

    2.3.4 Decap Compensation for Voltage Drop in Power Network .... 25

    2.4 Experimental Results .... 28

    Chapter 3 Package Routability and IR-Drop-Aware Finger/Pad Assignment .... 32

    3.1 Overview of Package Design Methods .... 32

    3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning .... 36

    3.2.1 Architecture and Routing of BGA Package in 2-D IC .... 37

  • II

    3.2.2 Architecture and Influence of BGA Package in Stacking ICs .... 38

    3.2.3 The Impact of Finger/Pad Locations on Wire Congestion .... 38

    3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation .... 39

    3.2.5 Problem Formulation .... 41

    3.3 Congestion-Driven Finger/Pad Assignment with IR-Drop Improvement .... 42

    3.3.1 Congestion-Driven Finger/Pad Assignment .... 42

    3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement .... 48

    3.4 Experimental Results .... 51

    Chapter 4 Design Planning with 3D-Via Optimization in Stacking IC .... 55

    4.1 Overview of Our Partition Method .... 55

    4.2 Stacking IC Models and Design Flow .... 57

    4.2.1 3D-Via and Stacking IC Models .... 58

    4.2.2 Design Flow of Alternative Stacking IC .... 58

    4.2.3 The Impact of I/O Location in Alternative Stacking IC .... 61

    4.2.4 Problem Formulation .... 62

    4.3 I/Os and Modules Planning with Minimal 3D-Via Number in Alternative Stacking ICs .... 63

    4.3.1 Global Planning for I/Os and Modules .... 63

    4.3.2 I/O Allocation by Congestion-Driven Planning and Iterative Refinement .... 65

    4.4 Experimental Results .... 67

    Chapter 5 Concluding Remarks and Future Works .... 71

    References .... 74

  • III

    List of Figures

    Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology .... 3

    Figure 1.2 The architectures of stacking ICs .... 4

    Figure 1.3 Sub-classification of Wafer stacking ICs .... 5

    Figure 1.4 The example of Ldi/dt noise effect .... 7

    Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result .... 8

    Figure 2.1 (A) Flowchart of our proposed method (B) The illustration of our method .... 11

    Figure 2.2 Power delivery model in the area-array architecture .... 13

    Figure 2.3 (A) The current consumption profile of module. (B) Simulation result by HSPICE .... 14

    Figure 2.4 An O-tree example .... 17

    Figure 2.5 The difference between the original O-tree and our approach .... 18

    Figure 2.6 The new Delete operation .... 19

    Figure 2.7 The relation between the area and the rotary module .... 21

    Figure 2.8 The new Insert operation .... 21

    Figure 2.9 The power-supply noise-driven floorplan algorithm .... 22

    Figure 2.10 The partition method for a decap .... 24

    Figure 2.11 The space relation between decap locations and area .... 25

    Figure 2.12 The NDP_MAI flow chart .... 26

    Figure 2.13 Circuit analysis for the power network .... 27

    Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs .... 33

    Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method .... 35

    Figure 3.3 The architecture of the two-layer ball grid array package .... 38

    Figure 3.4 The relationship between the density and the finger/pad locations .... 40

    Figure 3.5 The simulation results of IR-drop .... 40

    Figure 3.6 The analysis model for IR-drop .... 41

    Figure 3.7 The pseudo code of the IFA method .... 44

    Figure 3.8 (A) The IFA assignment result. (B) The routing result .... 45

    Figure 3.9 The pseudo code of the DFA method .... 46

    Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method .... 47

    Figure 3.11 The comparison of IFA and DFA .... 48

    Figure 3.12 The pseudo code of our finger/pad exchange method .... 51

  • IV

    Figure 3.13 The routing results of Circuit 2. (A) Random (B) IFA (C) DFA .... 54

    Figure 4.1 The alternative stacking architecture .... 56

    Figure 4.2 The flowchart of stacking ICs .... 59

    Figure 4.3 The area effect of aspect-ratio on stacking IC .... 61

    Figure 4.4 The effect of I/O number in the 2-D chip design .... 62

    Figure 4.5 The effect of I/O number in alternative stacking IC .... 62

    Figure 4.6 The relation between the via number and I/O locations .... 64

    Figure 4.7 The connection graph of the circuit .... 64

    Figure 4.8 The pseudo code of CPIR .... 66

    Figure 4.9 The example of the CPIR method .... 67

    Figure 4.10 The detailed description of the I/O Planning Method .... 68

    Figure 4.11 The experimental result of the floorplan .... 70

    Figure 5.1 Heat dissipation of Chips [60] .... 73

    Figure 5.2 The flowchart of the future co-design tool .... 73

  • V

    List of Tables

    Table 1.1 ITRS predictions for circuit performance in 2009 [1] .... 1

    Table 1.2 A comparison of all classified stacking ICs .... 6

    Table 2.1 A comparison of six floorplan representations .... 16

    Table 2.2 The peak noise after floorplanning, the decap insertion and run time .... 29

    Table 2.3 The comparison table of our decap computation and [17] .... 30

    Table 2.4 Experimental results for some MCNC benchmarks with various approaches for comparison .... 31

    Table 3.1 The experimental data of test circuits .... 52

    Table 3.2 The maximum density, the total wire length and the maximum IR-drop in our test circuits .... 53

    Table 3.3 The improved ratio of IR-drop and bonding wires .... 54

    Table 4.1 The experimental result of our I/O planning methods targeting stacking architecture with 3, 4, 5 and 8 tiers .... 69

  • - 1 -

    Chapter 1 Introduction

    1.1 Trends in VLSI

    Current trends in chip design are to integrate multiple functions into a single chip while simultaneously improving its size, power, performance and cost. To accomplish this goal, semiconductor manufacturing companies continually increase the number of transistors per square inch on integrated circuits and improve their manufacturing technology. The International Technology Roadmap for Semiconductors (ITRS) provides predictions for Very Large Scale Integration (VLSI) growth [1], as Table 1.1 illustrates.

    Table 1.1 ITRS predictions for circuit performance in 2009 [1]

    Year                             2009   2010   2011   2012   2013   2014
    ASIC M1 pitch (nm)                 54     45     38     32     27     24
    Vdd, high-performance (V)         1.0   0.97   0.93   0.87   0.84   0.81
    Vdd, low operating (V)           0.95   0.95   0.85   0.85    0.8    0.8
    Transistors/chip (millions)       773    773   1546   1546   3092   3092
    On-chip clock (GHz)              5.45   5.84   6.32   6.81   7.34   7.91
    Power consumption (W)             143    146    161    158    149    152

    Modern chip design technology is continually advancing to meet these ITRS predictions. As VLSI technology enters the nanometer era, the resistance of interconnect wires increases greatly, which seriously degrades chip performance when advanced process technologies are used to fabricate integrated circuits (ICs). Besides resistance, power integrity must also be addressed, because the trend in VLSI is to reduce supply voltages. Lower supply voltages help reduce power dissipation, but they also decrease the noise margin of devices.

  • - 2 -

    Noise that exceeds this reduced margin can sometimes cause erroneous chip behavior, which seriously degrades chip performance. As a result, the power integrity problem has become one of the major factors affecting chip yield. In System-on-Chip (SoC) designs, chips contain more functions and are expected to deliver much better performance. At the same time, I/O counts continue to increase, which adds routing complexity to the package design. These problems are the topics discussed in this dissertation.

    1.2 Stacking IC Advantage and Technology

    Stacking ICs are promising for the development of high-density, high-performance ICs. Stacking technology stacks one die (chip or wafer) over another. Transistors can be fabricated on different tiers, and vertical interconnection shrinks both the total wire length and the chip size, as shown in Figure 1.1. The benefits of stacking ICs include improvements in density, noise, power, performance, and functionality.

    (1) Density: In stacking ICs, transistors can be stacked and the package size can be reduced. Compared with 2-D standard cells, stacked cells offer a 30% improvement in area [2]. Density therefore increases when a 2-D IC is converted to a stacking IC, since circuit components can be placed on top of or underneath each other. As a result, stacking enables higher-density and higher-speed circuits.

    (2) Noise: Shorter wires have lower wire-to-wire capacitance, resulting in less noise coupling between signal lines. Shorter global wires with fewer repeaters should also have less noise and less jitter, providing better signal integrity. Since stacking ICs greatly reduce the length of global wires, they greatly improve noise immunity.

  • - 3 -

    (3) Power: Shorter wires decrease the load capacitance, the resistance and the number of buffers needed. Since interconnect wires and their supporting repeaters consume a significant portion of total dynamic power, the reduced average interconnect length in stacking ICs can reduce total power consumption. Compared with 2-D ICs, stacking ICs reduce the wire length and can cut total dynamic power by more than 10% [3].

    (4) Performance: Shorter wires decrease the time required to deliver a signal, so stacking ICs can improve performance.

    (5) Functionality: Stacked integration allows the combination of dissimilar

    technologies (memory, RF, analog, logic) to create hybrid circuits.
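    The power argument in item (3) can be illustrated with the textbook CMOS switching-power estimate P = alpha * C * Vdd^2 * f. The sketch below is only an illustration under stated assumptions, not the analysis of [3]: gate capacitance is taken as unchanged by stacking, wire capacitance is assumed to scale linearly with wire length, and the capacitance values and the wire-length scale factor are hypothetical.

```python
# Illustrative sketch only: textbook CMOS switching power P = alpha * C * Vdd^2 * f.
# Assumptions (hypothetical, not from [3]): gate capacitance is unchanged by
# stacking, and wire capacitance scales linearly with wire length.


def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """Dynamic (switching) power in watts."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


def power_saving_ratio(c_gate: float, c_wire_2d: float, wire_scale_3d: float) -> float:
    """Fraction of dynamic power saved when wires shrink to wire_scale_3d of their 2-D length."""
    c_2d = c_gate + c_wire_2d
    c_3d = c_gate + c_wire_2d * wire_scale_3d
    # alpha, Vdd and f are identical in both cases, so they cancel in the ratio.
    p_2d = dynamic_power(1.0, c_2d, 1.0, 1.0)
    p_3d = dynamic_power(1.0, c_3d, 1.0, 1.0)
    return 1.0 - p_3d / p_2d


if __name__ == "__main__":
    # Hypothetical case: equal gate and wire capacitance, wires 20% shorter.
    print(f"{power_saving_ratio(1.0, 1.0, 0.8):.2f}")  # prints 0.10
```

    In this hypothetical case the estimate gives a 10% saving; the actual saving depends on how much of the load is wire capacitance in a given design.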

    Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology.

    Current research refers to stacking ICs as three-dimensional (3-D) ICs [4] or System in Package (SiP) [5]. Stacking ICs can be classified into four types: (1) package stacking; (2) chip stacking; (3) wafer stacking; and (4) device stacking. Figure 1.2 shows the differences among these types. In the package stacking approach, each chip is packaged before stacking, as Figure 1.2(A) illustrates. Chip stacking ICs [6] stack dies before packaging, as Figure 1.2(B) shows.

  • - 4 -

    Wafer stacking fabrication ([7],[8]) stacks the wafers before dicing, as Figure 1.2(C) shows. A wafer stacking IC is smaller than a chip stacking IC. Device stacking ICs, whose architecture is shown in Figure 1.2(D), offer better size and performance than wafer stacking ICs.

    Figure 1.2 The architectures of stacking ICs: (A) Package Stacking; (B) Chip Stacking; (C) Wafer Stacking; (D) Device Stacking.

  • - 5 -

    Because manufacturing technologies for device stacking ICs [9] are relatively new, device stacking ICs cannot yet be manufactured by semiconductor manufacturing companies. Therefore, wafer stacking is the mainstream of modern stacking IC manufacturing.

    In [10], wafer stacking is subdivided into two types: (1) chip-to-wafer and (2) wafer-to-wafer. Figure 1.3 shows the differences between these approaches. In the chip-to-wafer manufacturing process, defective dies are removed before stacking [7], and this removal step increases yield. The disadvantages of chip-to-wafer stacking are that the lower tier is larger in area than the higher tiers and that the die-to-die spacing is wider than in wafer-to-wafer ICs. In wafer-to-wafer ICs, the wafers must be aligned before dicing [8]. The disadvantage of wafer-to-wafer stacking is that a defective die is still used in the stack even if it has been identified, as shown in Figure 1.3(B). Table 1.2 provides a comparison of all classified stacking ICs.

    Figure 1.3 Sub-classification of wafer stacking ICs. (A) Chip-to-Wafer. (B) Wafer-to-Wafer.

  • - 6 -

    Table 1.2 A comparison of all classified stacking ICs.

                   Package     Chip        Wafer stacking                  Device
                   stacking    stacking    Chip-to-wafer   Wafer-to-wafer  stacking
    Yield          Highest     High        Normal          Low             Lowest
    Size           Largest     Large       Normal          Small           Smallest
    Performance    Lowest      Low         Normal          Normal          Highest

    1.3 Power Integrity Impacts in Chip Design

    Broadly, integrity issues in chip design can be categorized into signal integrity problems and power integrity problems. This dissertation focuses on the power integrity problem caused by power supply noise, such as ΔI (delta-I, Ldi/dt) noise and IR-drop noise. Ldi/dt noise is also called simultaneous switching noise (SSN) or ΔI (delta-I) noise. Figure 1.4 shows that Ldi/dt noise is a voltage fluctuation phenomenon. Figure 1.4(A) shows the architecture of the circuit, which consists of two sub-circuits, A and B. If A and B start switching at different times, i.e., $t_A \neq t_B$, the voltage can remain at a high level, as in Figure 1.4(B). Otherwise, the voltage becomes unstable for a short period and drops below the required VDD level, as in Figure 1.4(C). Inserting decoupling capacitances (decaps) to stabilize the supply voltage against such fluctuation is a popular method. This dissertation develops a decap insertion method to solve the SSN problem in chip designs.
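    The effect described above can be reproduced numerically. The following sketch is a minimal illustration, not the noise model used in this dissertation: it assumes each sub-circuit draws a symmetric triangular current pulse through a single shared supply inductance L, and the inductance, pulse height and rise time are hypothetical values. The noise is estimated as L * di/dt with forward differences.

```python
# Illustrative sketch only: sub-circuits A and B each draw a triangular current
# pulse through one shared supply inductance L; noise is estimated as L * di/dt.
# All parameter values are hypothetical.


def triangle_pulse(t: float, start: float, rise: float, peak: float) -> float:
    """Current (A) at time t of a symmetric triangular pulse that begins at `start`."""
    dt = t - start
    if dt < 0.0 or dt > 2.0 * rise:
        return 0.0
    if dt <= rise:
        return peak * dt / rise                # rising edge
    return peak * (2.0 * rise - dt) / rise     # falling edge


def peak_ldi_dt_noise(t_a: float, t_b: float, inductance: float = 1e-9,
                      rise: float = 1e-10, peak: float = 0.1,
                      step: float = 1e-12, horizon: float = 1e-9) -> float:
    """Peak |L * di/dt| (V) of the summed current, using forward differences."""
    n = int(horizon / step)
    total = [triangle_pulse(k * step, t_a, rise, peak) +
             triangle_pulse(k * step, t_b, rise, peak) for k in range(n + 1)]
    return max(abs(inductance * (total[k + 1] - total[k]) / step) for k in range(n))


if __name__ == "__main__":
    simultaneous = peak_ldi_dt_noise(0.0, 0.0)   # t_A == t_B
    staggered = peak_ldi_dt_noise(0.0, 3e-10)    # pulses do not overlap
    print(f"simultaneous: {simultaneous:.2f} V, staggered: {staggered:.2f} V")
```

    With these values, simultaneous switching (t_A = t_B) roughly doubles the peak L * di/dt noise compared with staggered, non-overlapping switching.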

    IR-drop is the voltage drop that occurs when current flows through the non-zero resistance of the supply lines. As IC design moves into the nanometer regime, the resistance of the connecting wires can cause a voltage drop that violates the lower bound of the supply voltage constraint. In this dissertation, we modify the location of each power pad to reduce the resistance of the connecting wires and thus suppress IR-drop. We use a real chip design and commercial tools to support this idea.

  • - 7 -
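    The relation between pad location and IR-drop can be sketched with Ohm's law, V = I * R. This minimal example assumes, hypothetically, that the supply-line resistance is proportional to the Manhattan distance between a power pad and a module; the per-micron resistance, the current and the coordinates are illustrative values only, and a real IR-drop analysis solves the whole power mesh.

```python
# Illustrative sketch only: IR-drop V = I * R, with the supply-line resistance
# assumed proportional to the Manhattan distance from a power pad to a module.
# Resistance per micron, current and coordinates are hypothetical.

from typing import Iterable, Tuple

Point = Tuple[float, float]  # (x, y) in micrometres


def ir_drop(pad: Point, module: Point, current_a: float, ohm_per_um: float) -> float:
    """Voltage drop (V) along a Manhattan route from pad to module."""
    length_um = abs(pad[0] - module[0]) + abs(pad[1] - module[1])
    return current_a * ohm_per_um * length_um


def best_pad(candidates: Iterable[Point], module: Point, current_a: float,
             ohm_per_um: float) -> Tuple[Point, float]:
    """Candidate pad location with the smallest IR-drop, and that drop."""
    return min(((p, ir_drop(p, module, current_a, ohm_per_um)) for p in candidates),
               key=lambda item: item[1])


if __name__ == "__main__":
    pad, drop = best_pad([(0.0, 0.0), (0.0, 500.0)], (500.0, 500.0), 0.05, 0.002)
    print(pad, round(drop, 3))  # prints (0.0, 500.0) 0.05
```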

    Figure 1.4 The example of Ldi/dt noise effect. (A) The circuit architecture. (B) The voltage measured when $t_A \neq t_B$. (C) The voltage measured when $t_A = t_B$.

    1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs

    In VLSI designs, I/O pad locations not only impact power integrity but also affect the package routing and the core area. This section describes these effects in detail. A good I/O pad assignment can help reduce the length of bonding wires in stacking ICs. The bonding-wire architecture of a stacking IC differs from that of a 2-D IC, as shown in Figure 1.5. If the stacking factor is ignored (as shown in Figure 1.5(A)), chip performance worsens due to longer bonding wires. Figure 1.5(B) shows the ideal wire-bonding result. To achieve this target, the finger/pad planning method must consider the stacking factor.

    Besides bonding wires, I/O pads affect the number of 3D-vias. Most previous studies do not consider the physical impact of 3D-vias, which critically affects the area, latency, performance and cost of stacking ICs. Therefore, place-and-route algorithms for stacking ICs should plan 3D-vias effectively.
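    To make the 3D-via cost concrete, the sketch below counts vias for a given tier assignment. It relies on a simplifying assumption of ours, not a model taken from this dissertation: a net needs one 3D-via for each tier boundary it crosses, i.e., the difference between the highest and lowest tier among its pins. I/O pads are included as pins whose tier is fixed; all module and pad names are hypothetical.

```python
# Illustrative sketch only. Simplifying assumption (ours, not the dissertation's
# model): a net needs one 3D-via per tier boundary it crosses, i.e.
# (highest tier - lowest tier) among its pins. I/O pads are treated as pins
# with a fixed tier. Module and pad names are hypothetical.

from typing import Dict, List


def count_3d_vias(tier_of: Dict[str, int], nets: List[List[str]]) -> int:
    """Total 3D-vias over all nets for a given tier assignment."""
    total = 0
    for net in nets:
        tiers = [tier_of[pin] for pin in net]
        total += max(tiers) - min(tiers)
    return total


if __name__ == "__main__":
    nets = [["io", "m1"], ["io", "m2"], ["m1", "m3"], ["m2", "m3", "io"]]
    spread_out = {"io": 0, "m1": 0, "m2": 1, "m3": 2}
    compact = {"io": 0, "m1": 1, "m2": 0, "m3": 1}
    print(count_3d_vias(spread_out, nets), count_3d_vias(compact, nets))  # prints 5 2
```

    The two assignments connect the same nets but need different numbers of vias, which is why module and I/O placement should account for 3D-vias.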

  • - 8 -

    Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result.

    1.5 Dissertation Organization

    The goal of this dissertation is to provide a set of EDA solutions that address power integrity and I/O constraint issues. We first develop a power model to calculate the decap required to solve the delta-I problem, and increase the usage of available space in the floorplan to reduce the area overhead caused by decap insertion. Chapter 2 describes this step in detail. We then develop a planning method that determines the order of I/O pads while considering package routability. This planning method not only reduces the total wire length in the package but also suppresses the IR-drop noise of the core. Chapter 3 describes this method in detail. After the I/O order is determined, the next step is to compute the locations of the I/Os. This study proposes a system partition approach that minimizes the number of 3D-vias and balances the number of I/Os on each tier, and modifies a traditional floorplanning method to optimize the I/O and module locations. The stacking IC design can then be simplified into several 2-D IC designs. Chapter 4 describes this process in detail. Finally, Chapter 5 concludes this dissertation and provides directions for future research.

  • - 9 -

    Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture

    As VLSI technology enters the nanometer era, supply voltages continue to drop in order to reduce power dissipation, which makes power integrity problems even worse. Employing decoupling capacitances (decaps) at the floorplan stage is a common approach to alleviating supply noise problems. Previous works overestimate the decap budget and do not fully utilize the empty space of the floorplan, even though a floorplan usually has a lot of available space that can be used to insert decaps without increasing the floorplan area. Therefore, the work presented in this chapter develops a better model to calculate the decap required to solve the power supply noise problem, and increases the usage of available space in the floorplan to reduce the area overhead caused by decap insertion. The experimental results are encouraging: compared with previous approaches, our methodology reduces the decap budget by 38% on average on the MCNC benchmarks while still meeting the power supply noise requirements. The final floorplans with decaps are also smaller than those of previous works.

    2.1 Overview of Decap Insertion

    Many researchers have proposed various approaches to solving this problem at every design stage. The power/ground (P/G) network [11]-[13] is an important factor in the supply noise problem, and a better P/G network can greatly reduce power supply noise with minimal penalty cost. Besides sizing the power lines [14], employing decoupling capacitances (decaps) is a common approach to reducing supply noise.

  • - 10 -

    Traditionally, decap insertion is performed after routing in the physical design flow. This approach spends area on unnecessary decaps and lowers the efficiency of the decap budget. Therefore, more and more works propose inserting decaps before routing. [15] proposed a two-step decap insertion method to improve power supply noise at the placement level, consisting of a prediction step and a correction step. The prediction step estimates the required decap pessimistically. Although the decap size can be adjusted in the correction step, a smaller area overhead can be achieved if decap insertion is considered at an earlier stage. In [16] and [17], the authors proposed decap insertion methods at the floorplan level to reduce supply noise. Unfortunately, these previous works often overestimate the decap budget. They assume that the decap must fully supply the maximum current of the module, which our observations show to be too pessimistic. Besides the decap budget computation, previous works do not fully use the available floorplan space. A floorplan usually has a lot of available space that can be used to insert decaps without increasing the floorplan area.
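    The available space mentioned above can be quantified simply. The following sketch assumes non-overlapping rectangular modules given as (x, y, width, height) tuples with hypothetical values; it computes the bounding-box area of a floorplan and the part of that area not covered by modules, i.e., the whitespace that could host decaps without enlarging the floorplan. It ignores whether the whitespace lies near the noisy modules, which a real decap planner must also consider.

```python
# Illustrative sketch only: whitespace of a floorplan, assuming non-overlapping
# axis-aligned modules given as (x, y, width, height). Values are hypothetical.

from typing import List, Tuple

Rect = Tuple[float, float, float, float]


def whitespace(modules: List[Rect]) -> Tuple[float, float]:
    """Return (bounding-box area, area inside the bounding box not covered by modules)."""
    x_min = min(x for x, _, _, _ in modules)
    y_min = min(y for _, y, _, _ in modules)
    x_max = max(x + w for x, _, w, _ in modules)
    y_max = max(y + h for _, y, _, h in modules)
    chip_area = (x_max - x_min) * (y_max - y_min)
    used_area = sum(w * h for _, _, w, h in modules)
    return chip_area, chip_area - used_area


if __name__ == "__main__":
    floorplan = [(0, 0, 4, 2), (0, 2, 2, 2), (4, 0, 2, 4)]
    print(whitespace(floorplan))  # prints (24, 4)
```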

    To build high-performance, high-pin-count ICs, the area-array architecture is often used. In this architecture, the signal bumps are uniformly distributed over the chip. Therefore, the resistance from the core I/O to the signal bumps can be greatly reduced, and a larger number of I/Os can be accommodated. In [18], the authors used this architecture in floorplanning to improve power supply noise. Because of these advantages, more and more chips adopt the area-array architecture to mitigate power supply noise and the limited pin count. However, without decap insertion, the resulting floorplans in the area-array architecture still suffer from supply noise violations.

    The purpose of this work is to develop a better model for calculating the decap required to solve the power supply noise problem, and to use the available space in the floorplan wisely to reduce the area overhead caused by decap insertion.

  • - 11 -

    Figure 2.1 (A) Flowchart of our proposed method (B) The illustration of our method

    Based on the area-array architecture, we propose a two-step approach that combines a noise-driven floorplanning algorithm with a decap insertion approach to suppress power supply noise at the floorplan level, as Figure 2.1 illustrates. First, we use a noise-driven floorplanning algorithm to reduce the potential noise. This work adopts the O-tree representation, which captures stronger adjacency relations between modules, as the engine for supply-noise-driven floorplanning, and modifies its primary operations, Delete and Insert, within the proposed framework. Second, we use a Noise-driven Decap Planning with Minimum Area Insertion (NDP_MAI) approach to insert minimal decaps into the resulting noise-aware floorplan, legalizing both blocks and decaps.

  • - 12 -

    Note that this approach can compute the required decap size for a real design and then provide the optimal location for each decap during floorplanning. After routing, the method in [19] can be used to refine our result and further reduce the power supply noise.

    The rest of this chapter is organized as follows. Section 2.2 describes the

    floorplan design with power supply noise consideration, the new noise estimation

    method, decap budget computation, and problem formulation. Section 2.3 presents the

    floorplanning algorithm and the decap insertion approach for power supply noise

    avoidance. Section 2.4 shows the experimental results.

    2.2 Power Delivery and Signal Integrity Issues

    IR-drop and Ldi/dt (delta-I) noise are the main contributors to noise-margin violations, and this work focuses primarily on Ldi/dt noise. In this section, we describe the power delivery model and the noise estimation model used in this work, and formulate our problem.

    2.2.1 Power Delivery Model and Noise Estimation

    In this work, power distribution is based on the area-array architecture. The area-array architecture is a mesh structure in which the VDD and GND bumps are uniformly distributed across the die, with signal bumps interspersed at fixed locations, as illustrated in Figure 2.2(A). Because the resistance from the I/O to the connected block is substantially decreased, power delivery is better than in the peripheral-I/O architecture. As a result, many high-performance chips adopt the area-array architecture.

    In the area-array architecture, the current that a VDD bump supplies to each module depends on the distance from the bump to the module. The four neighboring VDD bumps of a module (top-right, bottom-right, top-left and bottom-left) supply the main current, as Figure 2.2(B) shows, so we compute the noise from these VDD bumps only.

  • - 13 -

    Figure 2.2 Power delivery model in the area-array architecture.

    Since there are many paths for current delivery to the target module, we only consider the shortest and second-shortest paths to simplify noise computation, as Figure 2.2(C) shows. The main reason is that currents follow the least-impedance paths when flowing from VDD to the target module. [17] shows that, compared with SPICE simulation, the error of this method is within 10%. This computation method is fast and its error can be kept within a tolerable range, so we use it to compute the power supply noise at the floorplan level.

    Kirchhoff's voltage law gives the noise at each module k:

    $$V_k = \sum_{P \in T_k} \left( i_P R_P + L_P \frac{d i_P}{dt} \right) \tag{2.1}$$

    where $V_k$ denotes the power supply noise at module k, $P_j$ denotes the path from the power bump to node j, $P_{jk}$ denotes the path from node j to node k, $T_k$ denotes the union of the shortest and second-shortest paths to module k, $R_P$ and $L_P$ denote the resistance and inductance of path P, and $i_P$ is the current flowing along path P.
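    A minimal Python sketch of evaluating EQ(2.1) for one module, assuming the path set $T_k$ has already been extracted and that each path's resistance, inductance, current $i_P$, and current slope $di_P/dt$ are known at the instant of interest. `PathSegment` and `module_noise` are hypothetical names; finding the shortest and second-shortest paths is not shown.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PathSegment:
    """One path P in T_k (a bump-to-node path P_j or a node-to-module path P_jk)."""
    resistance: float  # R_P in ohms
    inductance: float  # L_P in henries
    current: float     # i_P in amperes
    di_dt: float       # di_P/dt in amperes per second (assumed known)


def module_noise(paths: Iterable[PathSegment]) -> float:
    """EQ(2.1): sum of resistive (i_P * R_P) and inductive (L_P * di_P/dt) drops."""
    return sum(p.current * p.resistance + p.inductance * p.di_dt for p in paths)


if __name__ == "__main__":
    t_k = [PathSegment(0.1, 1e-9, 0.2, 1e7), PathSegment(0.2, 2e-9, 0.1, 5e6)]
    print(f"V_k = {module_noise(t_k):.3f} V")
```

    Because each path contributes an independent term, the per-module cost is linear in the number of paths kept, which is what makes restricting $T_k$ to two paths per bump cheap enough for use inside a floorplanner.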


    2.2.2 Decap Budgeting in Area-Array Architecture

    In [16] and [17], the authors assumed that the decap must supply the full maximum current of the module, as shown by the white region in Figure 2.16(A). Under this assumption, the decap budget may be overestimated. In fact, the VDD pin continuously provides current while the chip is operating, as shown by the grey region in Figure 2.16(A). Therefore, the required decap size can be significantly reduced.

    Figure 2.16 (A) The current consumption profile of module. (B) Simulation result by HSPICE.

    The required decap size can be obtained from the difference between the maximum current ($I_{\max}$) and the target current limit ($I_{\mathrm{gen}}$) of each module. Let $I_{\mathrm{gen},k}$ ($k = 1, 2, \ldots, M$) be the target current limit of module k and $I_{\max,k}$ its maximum switching current. Let $C_k$ be the required decap for module k and $Q_k$ the amount of electric charge that $C_k$ must supply. Based on the triangle model shown in Figure 2.16, $Q_k$ is obtained as

    $$Q_k = \int_{t_s}^{t_f} I_{\max,k}(t)\,dt - \int_{t_s}^{t_f} I_{\mathrm{gen},k}(t)\,dt \tag{2.2}$$

    where $t_s$ and $t_f$ are the start and finish times of the interval in which the target module is operating. The charge can be converted into the required capacitance and the silicon area of the capacitor as follows:

    $$C_k = \frac{Q_k}{V_{\mathrm{noise}}} \tag{2.3}$$

    $$S_k = \frac{C_k}{C_{ox}} \tag{2.4}$$

    where $V_{\mathrm{noise}}$ is the voltage noise constraint, $C_k$ is the decap budget, and $S_k$ is the silicon area of $C_k$. $C_{ox}$ is the capacitance per unit area of a MOS capacitor, $C_{ox} = \varepsilon_{ox}/t_{ox}$, where $\varepsilon_{ox}$ is the permittivity of SiO2 and $t_{ox}$ is the oxide thickness.
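    The following sketch evaluates EQ(2.2)-(2.4) under stated assumptions: the switching current is a triangle that rises from 0 to $I_{\max,k}$ in `t_rise` and falls back to 0 in `t_fall`, $I_{\mathrm{gen},k}$ is a constant level, and the decap supplies only the charge above that level (that is, the integration interval of EQ(2.2) is taken as the span where the switching current exceeds the limit). The relative permittivity 3.9 for SiO2 is the usual textbook value; all function names are hypothetical.

```python
EPS0 = 8.854e-12    # vacuum permittivity, F/m
EPS_R_SIO2 = 3.9    # relative permittivity of SiO2


def excess_charge(i_max: float, i_gen: float, t_rise: float, t_fall: float) -> float:
    """Charge (C) the decap must supply: the part of a triangular current
    pulse (0 -> i_max in t_rise, i_max -> 0 in t_fall) that lies above i_gen."""
    if i_gen >= i_max:
        return 0.0
    height = i_max - i_gen
    # The region above i_gen is a triangle similar to the whole pulse,
    # scaled by height / i_max in both time and current.
    base = (t_rise + t_fall) * height / i_max
    return 0.5 * base * height


def decap_budget(q: float, v_noise: float) -> float:
    """EQ(2.3): C_k = Q_k / V_noise (farads)."""
    return q / v_noise


def decap_area(c: float, t_ox: float) -> float:
    """EQ(2.4): S_k = C_k / C_ox with C_ox = eps_ox / t_ox (square metres)."""
    c_ox = EPS0 * EPS_R_SIO2 / t_ox
    return c / c_ox


if __name__ == "__main__":
    q = excess_charge(i_max=0.02, i_gen=0.015, t_rise=0.3e-9, t_fall=0.5e-9)
    c = decap_budget(q, v_noise=0.04)
    print(f"Q = {q:.3e} C, C = {c:.3e} F, S = {decap_area(c, 5e-9):.3e} m^2")
```

    Setting `i_gen` to zero recovers the full triangle, i.e. the assumption of [16] and [17] that the decap alone supplies the whole pulse; any positive supply limit shrinks both the charge and the silicon area quadratically in $I_{\max,k} - I_{\mathrm{gen},k}$.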

    We use SPICE to verify the accuracy of EQ(2.2) and compare the result with [17]. The simulation uses a 0.25 µm technology with a supply voltage of 2.5 V and a power supply noise tolerance of 0.04 V. The method of [17] yields a required decap of 112 pF, whereas EQ(2.2) yields a decap budget of 96 pF. Figure 2.16(B) shows the simulation result. The proposed method therefore requires about 14% less decap than [17].

    2.2.3 Problem Formulation

    The goal of this work is to solve the power supply noise problem at the floorplan level with minimum decap. In other words, during floorplanning we suppress power noise by placing the modules at suitable locations and reserving minimal empty area for decaps. The problem can be formulated as follows:

    Given a set of modules $B_1, B_2, \ldots, B_m$, the current consumption $I_{\max,k}$ and $I_{\mathrm{gen},k}$ of each block $B_k$ ($1 \le k \le m$), a set of power bumps $P_1, P_2, \ldots, P_n$, and the noise constraint $V_{\mathrm{noise},k}$ of each module, find a feasible solution such that each module $B_k$ obtains its required decap budget size $DBS_k$ with the minimum area penalty when $DBS_k$ is inserted. At the same time, the voltage noise $V_k$ of each module must be smaller than the noise constraint $V_{\mathrm{noise},k}$.

    2.3 Minimal Decap Allocation in Power Supply Noise Aware Floorplanning

    To solve the power supply noise problem, this work develops a two-step methodology to suppress and reduce noise at the floorplan level, as Figure 2.14 illustrates. Since placing two high-current-consumption modules close together seriously increases noise, we first propose a noise-driven floorplan algorithm that places the high-current-consumption modules intelligently, spreading them evenly across the chip. This brings two benefits: (1) the peak noise is reduced; (2) the decap budget is distributed evenly across the chip. The empty room after floorplanning is small and scattered; if many decaps are inserted into one particular region, the floorplan area may increase because the empty room there does not have enough space for them. We therefore propose a Noise-driven Decap Planning with Minimum Area Insertion (NDP_MAI) approach to reduce the power supply noise and area overhead after floorplanning. We first briefly introduce the O-tree representation and the two new operations, Delete and Insert, and then discuss the feasible region of the decap budget.

    2.3.1 O-Tree Based Power Supply Noise Aware Floorplanning

    To obtain a better result for noise-driven floorplan, a suitable and controllable

    floorplan representation is needed. Table 2.5 compares six floorplan representations.

    Table 2.5 A comparison of six floorplan representations.

    | Floorplan Representation | Adjacent Relation | Solution Space | Delete | Insert |
    |---|---|---|---|---|
    | SP [20] | Not good | $O((m!)^2)$ | - | - |
    | B*-tree [21] | Good | $O(m!\,2^{2m-2}/m^{1.5})$ | $O(1)$ | $O(m)$ |
    | O-tree [22] | Best | $O(m!\,2^{2m-2}/m^{1.5})$ | $O(m)$ | $O(m)$ |
    | TCG [23] | Good | $O((m!)^2)$ | $O(m^2)$ | $O(m^2)$ |
    | CBL [24] | Good | $O(m!\,2^{3m-3}/m^{1.5})$ | $O(m)$ | $O(m)$ |
    | DBL [16] | Good | $O(m!\,2^{m-1})$ | $O(m)$ | $O(m)$ |


    We choose O-tree as our representation mainly because adjacency relations can be obtained directly from it, which makes it easy to keep high-current-consumption modules at a distance from each other.

    The O-tree is composed of a vertical tree and a horizontal tree, shown in Figure 2.17(B) and (D). Their representations, shown in Figure 2.17(C) and (E), record the tree type, the parent-child (paternity) structure of the tree as a bit string, and the permutation of modules. If a module touches another module horizontally (vertically), as modules H and L do in Figure 2.17, this is easily observed in the horizontal (vertical) representation. With other representations, the adjacency relation of each module is more difficult to find.
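    To make the encoding concrete, the sketch below decodes a representation such as (0011000111, HLIJK) into parent relations. It assumes the usual O-tree convention: the bit string records a depth-first walk from a virtual root, where 0 descends to the next module of the permutation and 1 returns to the parent. In a horizontal O-tree a child is placed immediately to the right of its parent, which is why horizontal adjacency can be read off directly. `decode_otree` and the `ROOT` label are hypothetical.

```python
from typing import Dict


def decode_otree(bits: str, perm: str) -> Dict[str, str]:
    """Map each module to its parent for an O-tree encoded as (bits, perm).

    Assumed encoding: depth-first walk from a virtual root where '0' means
    "descend to the next module of perm" and '1' means "return to the parent".
    """
    parent: Dict[str, str] = {}
    stack = ["ROOT"]  # path from the virtual root to the current node
    pos = 0           # next unused position in perm
    for b in bits:
        if b == "0":
            if pos == len(perm):
                raise ValueError("more '0's than modules")
            module = perm[pos]
            pos += 1
            parent[module] = stack[-1]
            stack.append(module)
        elif b == "1":
            if len(stack) == 1:
                raise ValueError("'1' with no node to leave")
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {b!r}")
    if len(stack) != 1 or pos != len(perm):
        raise ValueError("bit string and permutation do not match")
    return parent


if __name__ == "__main__":
    print(decode_otree("0011000111", "HLIJK"))  # horizontal tree of Figure 2.19
    print(decode_otree("0010101101", "HIJKL"))  # vertical tree of Figure 2.19
```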

    Figure 2.17 An O-tree example. (A) Floorplan result. (B) Vertical tree. (C) Vertical tree representation. (D) Horizontal tree. (E) Horizontal tree representation.

    Figure 2.18(A) shows the original O-tree operations. If module J is deleted, the Delete operation generates a Left-Down (LD)-packing floorplan in which two high-current modules (I and K) end up adjacent. Such regions consume more power than other regions and need more decap to reduce power supply noise. A similar situation occurs for the Insert operation, because the original operation considers only area and wire length. Hence, the original O-tree operations cannot control which blocks become neighbors, and new transformation operations are needed to avoid placing high-current-consumption modules at adjacent locations. We propose two new transformation operations:

    Delete: The original operation deletes only the selected module. The new operation deletes the selected module together with its top-right modules.

    Insert: The original operation considers only the area factor. The new operation inserts the module at a low-noise location while minimizing the area increase.

    We use an example to explain the new transformation operations. To delete module J in Figure 2.18(B), the selected module is J and its only top-right module is K, so modules J and K must be deleted together. The top-right modules must be deleted because the floorplan must remain an LD-packing result; if they are not deleted at the same time, a high-current-consumption module may end up at a neighboring location.

    Figure 2.18 The difference between the original O-tree and our approach.


    The new Delete operation consists of several steps. We first choose a module to be deleted from the permutations $\pi_h$ and $\pi_v$ of the horizontal and vertical O-trees, and then select all modules from that module onward in each permutation, obtaining two block sets $S_h$ and $S_v$. The intersection of $S_h$ and $S_v$ gives the candidate list of modules to delete. The final step of the Delete operation is to delete these modules from both the vertical and the horizontal O-trees. Figure 2.19 illustrates the operation with the horizontal and vertical representations of Figure 2.18. The horizontal representation is (0011000111, HLIJK) and the vertical representation is (0010101101, HIJKL). In this case, module J is to be deleted. Therefore, $S_h$ contains modules J and K, and $S_v$ contains modules J, K and L. The deletion candidate list is obtained from the intersection $\{J, K\} = \{J, K\} \cap \{J, K, L\}$. Finally, modules J and K are deleted from both representations: the horizontal representation becomes (001101, HLI), and the vertical representation becomes (001101, HIL). The time complexity of the new Delete operation is O(m), where m denotes the number of modules.
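    The new Delete operation can be sketched on the string encoding as follows, reproducing the example above. It assumes the same depth-first bit-string convention; removing a module drops its matching 0/1 pair, so any surviving children move up to the grandparent. The helper names are hypothetical, and with hash sets the work is linear in the number of modules, in line with the O(m) bound stated above.

```python
from typing import Dict, Set, Tuple

Tree = Tuple[str, str]  # (bit string, permutation of module names)


def node_spans(bits: str, perm: str) -> Dict[str, Tuple[int, int]]:
    """Index of each module's '0' (enter) bit and its matching '1' (leave) bit."""
    spans: Dict[str, Tuple[int, int]] = {}
    stack, pos = [], 0
    for i, b in enumerate(bits):
        if b == "0":
            stack.append((perm[pos], i))
            pos += 1
        else:
            name, start = stack.pop()
            spans[name] = (start, i)
    return spans


def remove_modules(tree: Tree, victims: Set[str]) -> Tree:
    """Drop the victims' 0/1 pairs; their children move up to the grandparent."""
    bits, perm = tree
    spans = node_spans(bits, perm)
    drop = {i for m in victims for i in spans[m]}
    new_bits = "".join(b for i, b in enumerate(bits) if i not in drop)
    new_perm = "".join(m for m in perm if m not in victims)
    return new_bits, new_perm


def new_delete(h_tree: Tree, v_tree: Tree, target: str):
    """Delete target plus S_h & S_v (modules from target onward in both orders)."""
    h_perm, v_perm = h_tree[1], v_tree[1]
    s_h = set(h_perm[h_perm.index(target):])
    s_v = set(v_perm[v_perm.index(target):])
    victims = s_h & s_v
    return remove_modules(h_tree, victims), remove_modules(v_tree, victims), victims


if __name__ == "__main__":
    h, v, deleted = new_delete(("0011000111", "HLIJK"), ("0010101101", "HIJKL"), "J")
    print(h, v, sorted(deleted))
```

    The deleted modules form the candidate list that the new Insert operation later places back into the floorplan one at a time.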

    Figure 2.19 The new Delete operation

    The new Insert operation consists of three parts: (1) find all possible locations; (2) compute their costs; (3) choose the optimal location. When a module is inserted into a floorplan, there are many locations to choose from, and the first step is to find these candidate locations in the LD-packing floorplan. The possible insertion locations are at lower-left corners of the floorplan result, as shown in Figure 2.21. Every candidate location has a different cost, so the next step is to compute the cost of each candidate location with the following cost function:

    $$C_A = D_1\,(A_{\mathrm{new}} - A_{\mathrm{ori}}) + D_2\,(I_a + I_b + I_c - I_{th}) \tag{2.5}$$

    where $C_A$ denotes the cost when module A is inserted at this location, $D_1$ and $D_2$ are weights, $A_{\mathrm{new}}$ is the area of the floorplan after the module is inserted, $A_{\mathrm{ori}}$ is the original area, $I_a$ ($I_b$, $I_c$) denotes the current consumption of module a (b, c), with a being the inserted module A and b and c its left and down adjacent modules, and $I_{th}$ denotes the threshold value for local current consumption. $D_2$ is set to a large value to penalize high local current consumption. The cost of every candidate location must be computed twice, because it differs depending on whether the module is inserted directly or rotated, as Figure 2.20 shows. EQ(2.5) considers only area and power consumption, but it can be extended with other objectives such as wire length. Finally, the module is inserted at the minimum-cost location.

    The following example illustrates the new Insert operation. In Figure 2.21(A), the initial floorplan consists of modules I, H and L, and modules J and K are the candidates to be inserted. Four triangles denote the candidate locations in the floorplan. In Figure 2.21(B), we compute the cost of inserting module J at each candidate location, and module J is then inserted at the minimum-cost location, which in this case is the corner between modules H and L, as Figure 2.21(C) shows. Because the candidate list is not yet empty, the Insert operation is applied again, as shown in Figure 2.21(D).

    Note that EQ(2.5) considers only the left and down adjacent modules when calculating the cost. There are two reasons: (1) because the Insert operation must maintain an LD-packing floorplan, only the left and down neighbors of A need to be considered; (2) the Delete operation deletes all the top-right modules of A, so if every module is inserted at a lower-left corner, there are no modules on its top and right sides. The time complexity of the new Insert operation is O(m²), where m denotes the number of modules.
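    The cost evaluation of EQ(2.5) can be sketched as follows. The floorplan area after insertion is assumed to be computed elsewhere by LD-packing, each candidate corner appears twice (as-is and rotated), and a missing left or down neighbor contributes zero current. The default weights $D_1 = 1$ and $D_2 = 1000$ are hypothetical values chosen only so that $D_2$ dominates, as the text requires; the class and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    location: str      # label of a lower-left corner (hypothetical)
    rotated: bool      # True if the module is inserted rotated by 90 degrees
    area_after: float  # floorplan area after insertion (assumed precomputed)
    i_left: float      # current of the left neighbour, 0 if none
    i_down: float      # current of the down neighbour, 0 if none


def insertion_cost(c: Candidate, i_module: float, area_orig: float,
                   i_threshold: float, d1: float = 1.0, d2: float = 1000.0) -> float:
    """EQ(2.5): D1 * (A_new - A_ori) + D2 * (I_a + I_b + I_c - I_th)."""
    area_term = c.area_after - area_orig
    current_term = i_module + c.i_left + c.i_down - i_threshold
    return d1 * area_term + d2 * current_term


def best_candidate(cands: Iterable[Candidate], i_module: float, area_orig: float,
                   i_threshold: float, d1: float = 1.0, d2: float = 1000.0) -> Candidate:
    """Return the minimum-cost corner/orientation for the module being inserted."""
    return min(cands, key=lambda c: insertion_cost(c, i_module, area_orig,
                                                   i_threshold, d1, d2))


if __name__ == "__main__":
    cands = [Candidate("corner-1", False, 110.0, 0.5, 0.5),
             Candidate("corner-2", True, 120.0, 0.1, 0.0)]
    print(best_candidate(cands, i_module=1.0, area_orig=100.0, i_threshold=1.2))
```

    In the usage example, the second corner wins even though it grows the floorplan more, because its neighbors draw little current; this is the trade-off the large $D_2$ weight is meant to enforce.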

    Figure 2.20 The relation between the area and the rotary module.

    Figure 2.21 The new Insert operation.

    Based on the above operations and the simulated annealing (SA) algorithm [25], we propose a power-supply noise-driven floorplan algorithm, shown in Figure 2.22. We first take an initial floorplan and set the initial values of the two annealing temperature parameters (line 2). The new Delete operation is used to delete modules (lines 4-6), and then the new Insert operation with EQ(2.5) is used to reinsert them (lines 7-11). After the Insert operations, a new LD-packing floorplan is obtained. The cost difference ΔC between the new floorplan and the current floorplan is computed (line 12). If ΔC is smaller than zero, a better floorplan has been found and it becomes the new solution (lines 13-14). Otherwise, the new floorplan replaces the current one with probability exp(−ΔC/T), where T is the current temperature (lines 15-17). Finally, the temperature is lowered (line 18).
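    The annealing loop of Figure 2.22 follows the standard simulated-annealing pattern sketched below, with Metropolis acceptance of uphill moves. Assumptions: `perturb` stands in for the new Delete followed by repeated Insert (lines 4-11), cooling is geometric, optionally several moves are tried per temperature, and the best solution seen is kept; these choices and all parameter values are illustrative, not taken from this dissertation. The toy usage arranges module currents in a row and penalizes adjacent high-current pairs, mimicking the goal of separating high-current modules.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def anneal(initial: S,
           cost: Callable[[S], float],
           perturb: Callable[[S, random.Random], S],
           t0: float = 100.0, t_final: float = 0.1, alpha: float = 0.95,
           moves_per_t: int = 1, seed: int = 0) -> S:
    """Generic simulated annealing with Metropolis acceptance."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t0
    while t > t_final:
        for _ in range(moves_per_t):
            candidate = perturb(current, rng)  # stands in for Delete + Insert
            delta = cost(candidate) - current_cost
            # Accept improvements; accept worse moves with prob. exp(-delta / t).
            if delta < 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling (assumed schedule)
    return best


# Toy usage: a row of module currents; adjacent high-current pairs are costly.
Row = Tuple[int, ...]


def row_cost(row: Row) -> float:
    return sum(a * b for a, b in zip(row, row[1:]))


def move_one(row: Row, rng: random.Random) -> Row:
    items = list(row)
    item = items.pop(rng.randrange(len(items)))        # "delete" one module
    items.insert(rng.randrange(len(items) + 1), item)  # "insert" it elsewhere
    return tuple(items)


if __name__ == "__main__":
    best = anneal((5, 5, 5, 1, 1, 1, 1), row_cost, move_one, moves_per_t=200)
    print(best, row_cost(best))
```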

    Figure 2.22 The power-supply noise-driven floorplan algorithm

    Power-supply Noise-driven Floorplan Algorithm
    Input: A compacted floorplan F with the current consumption of each block
    Output: A compacted floorplan F in which high-current-consumption blocks are placed apart
    1. Begin
    2.   Initial floorplan; Temperature; Final_Temperature;
    3.   while Temperature > Final_Temperature
    4.     Randomly choose one block Bx from F;
    5.     Use the Delete operation to delete Bx;
    6.     Add all deleted blocks to a candidate list;
    7.     while candidate list is not NULL
    8.       Choose one block By from the candidate list;
    9.       Use EQ(2.5) and the Insert operation to insert By;
    10.      Remove By from the candidate list;
    11.    end_while
    12.    ΔC = Cost(New_Floorplan) − Cost(Floorplan);
    13.    if ΔC < 0
    14.      Floorplan = New_Floorplan;
    15.    else if Random(0,1) < exp(−ΔC/Temperature)
    16.      Floorplan = New_Floorplan;
    17.    end_if
    18.    Cooling(Temperature);
    19.  end_while

    2.3.2 Feasible Region for Decap Allocation

    After obtaining a floorplan, we calculate the required decap size for each module. According to [20], [21], [22], [23], [24] and [16], the empty room after floorplanning is small and scattered. If a large decap is inserted into the floorplan as a single block, the floorplan area may increase because no single empty region has enough space for it. Besides the area factor, the charge/discharge time of the capacitance must be considered: the charge time is substantially reduced when the required decap is formed from several smaller decaps. Our method therefore splits the required decap into four smaller decaps, which minimizes the increase in floorplan area and reduces the charge/discharge time of the decap. Note that the sizes of the smaller decaps are not

    the same. Each smaller decap is sized according to the Manhattan distance between the module's VDD location and the corresponding power bump. Let $D_x$ denote the Manhattan distance from the module's VDD location to power bump $P_x$, and let $\gamma_x$ denote the current contribution ratio obtainable from $P_x$. The current contribution ratio is computed as follows:

    $$\gamma_x = \frac{\prod_{a \neq x} D_a}{\sum_{b=1}^{4} \prod_{a \neq b} D_a} = \frac{1/D_x}{\sum_{b=1}^{4} 1/D_b} \tag{2.6}$$

    where $a \neq x$ indicates that $P_a$ and $P_x$ are different power bumps connected to the module. According to EQ(2.6), $\gamma_x$ is inversely proportional to $D_x$, as the current divider rule requires. A simple example helps to explain EQ(2.6). Figure 2.23(A) shows a floorplan result in which module D needs decap to supply its current consumption. We first use EQ(2.2)-(2.4) to compute the optimal decap size and partition the decap into four smaller decaps. Each smaller decap is given a feasible region that extends from the location of its power bump to the VDD source. We then use EQ(2.6) to compute the current ratio for each power bump, as Figure 2.23(B) shows. Under these constraints, the smaller decaps can be inserted into the chip and the charge time of the capacitance is substantially reduced.
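    EQ(2.6) and the split of a decap into four parts can be sketched as below, assuming the required decap $C_k$ has already been computed from EQ(2.2)-(2.4) and that each part is sized in proportion to its bump's contribution ratio. The product form of EQ(2.6) is used directly, which also avoids division by zero when a single bump lies exactly at the module's VDD location. The function names are hypothetical.

```python
from math import prod
from typing import List, Sequence


def contribution_ratios(distances: Sequence[float]) -> List[float]:
    """EQ(2.6): each bump's share of the module current, inversely
    proportional to its Manhattan distance D_x (current divider rule)."""
    terms = [prod(d for a, d in enumerate(distances) if a != x)
             for x in range(len(distances))]
    total = sum(terms)
    if total == 0:
        raise ValueError("more than one bump at zero distance")
    return [t / total for t in terms]


def split_decap(c_required: float, distances: Sequence[float]) -> List[float]:
    """Split the required decap into one part per bump (assumed proportional)."""
    return [c_required * r for r in contribution_ratios(distances)]


if __name__ == "__main__":
    d = [1.0, 1.0, 2.0, 4.0]            # distances to the four neighbouring bumps
    print(contribution_ratios(d))       # nearer bumps get larger shares
    print(split_decap(11e-12, d))
```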

    In modern chip design, decaps are inserted into the empty space after detailed routing. In practice, a decap cannot reduce the power noise of a high-current-consumption module if it is far from that module. To use the charge of each decap effectively, the feasible region of each small decap is the rectangle spanned by its power bump and the module's VDD pin, as shown in Figure 2.23(A).

    Figure 2.23 The partition method for a decap.
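    The feasible-region rule can be expressed as a simple rectangle test. This sketch assumes axis-aligned rectangles and treats a candidate space as usable for a small decap only if it lies entirely inside the rectangle spanned by that decap's power bump and the module's VDD pin; `Rect`, `feasible_region`, and `fits_in_region` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    x1: float  # left
    y1: float  # bottom
    x2: float  # right
    y2: float  # top


def feasible_region(bump: Point, vdd_pin: Point) -> Rect:
    """Rectangle spanned by a power bump and the module's VDD pin."""
    (bx, by), (vx, vy) = bump, vdd_pin
    return Rect(min(bx, vx), min(by, vy), max(bx, vx), max(by, vy))


def fits_in_region(space: Rect, region: Rect) -> bool:
    """True if the candidate space lies entirely inside the feasible region."""
    return (region.x1 <= space.x1 and region.y1 <= space.y1
            and space.x2 <= region.x2 and space.y2 <= region.y2)


if __name__ == "__main__":
    region = feasible_region(bump=(10.0, 0.0), vdd_pin=(0.0, 5.0))
    print(region, fits_in_region(Rect(2.0, 1.0, 4.0, 3.0), region))
```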

    2.3.3 Identification of Space Priority for Decap Insertion

    After floorplanning, the chip has some exploitable space for decap insertion. If

    these spaces can be fully utilized, the cost of the chip might not increase even when

    the decap is inserted into the chip. This section discusses the effect of placing the

    decap in each different space. Furthermore, we propose a Noise-driven Decap

    Planning with Minimum Area Insertion approach that simultaneously considers the

    area cost and the noise effect.

    A floorplan always has one or more horizontal (vertical) longest paths, whose length equals the width (height) of the floorplan. As shown in Figure 2.24, modules H and L form a horizontal longest path. Changing these modules directly changes the floorplan area, so inserting a decap into the channel space between them would increase the area significantly. Such channel space is called extensible space. A channel space that borders the empty room, i.e., space not occupied by any module, is called empty space; inserting a decap there does not change the location of any module. All remaining channel spaces, other than extensible and empty space, are called available space; inserting a decap there keeps the floorplan area fixed, and only some modules shift slightly. In Figure 2.24, the horizontal longest path is H-L and the vertical longest path is H-I. Inserting a decap into the extensible space between H and I would increase the area by 285 µm², whereas inserting it into the empty-space corner between H and I affects neither the area nor the topology. Hence, the cost is lowest when the decap is inserted into empty space. Inserting the decap into the available space between L and J does not increase the area but changes the topology. Therefore, three types of space are candidate locations for the decap: Available Space, Extensible Space and Empty Space. Their priorities are defined as Empty Space > Available Space > Extensible Space, and the minimum-cost space is selected for decap insertion. Figure 2.25 illustrates the NDP_MAI approach.
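    The space priority Empty Space > Available Space > Extensible Space can be sketched as a selection rule. Assumptions beyond the text: each empty or available space has a known capacity, extensible space can always be widened (at an area penalty), and ties within a priority level are broken by choosing the smallest space that fits. The names and the tie-breaking rule are hypothetical, and the noise-related checks of the full NDP_MAI flow are omitted.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Lower value = higher priority: Empty > Available > Extensible.
PRIORITY = {"empty": 0, "available": 1, "extensible": 2}


@dataclass(frozen=True)
class Space:
    name: str
    kind: str        # "empty", "available" or "extensible"
    capacity: float  # decap area the space can hold without growing


def pick_space(spaces: Sequence[Space], required_area: float) -> Optional[Space]:
    """Choose the highest-priority space that can take the decap."""
    def usable(s: Space) -> bool:
        # Extensible space can always be widened, at the cost of chip area.
        return s.kind == "extensible" or s.capacity >= required_area

    usable_spaces = [s for s in spaces if usable(s)]
    if not usable_spaces:
        return None
    # Within one priority level, prefer the smallest space that fits.
    return min(usable_spaces, key=lambda s: (PRIORITY[s.kind], s.capacity))


if __name__ == "__main__":
    spaces = [Space("H-I corner", "empty", 5.0),
              Space("L-J channel", "available", 20.0),
              Space("H-I channel", "extensible", 0.0)]
    for need in (3.0, 10.0, 50.0):
        print(need, pick_space(spaces, need).name)
```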

    Figure 2.24 The space relation between decap locations and area.

    2.3.4 Decap Compensation for Voltage Drop in the Power Network

    EQ(2.2)-(2.4) assume that each decap is placed around its target module, but the NDP_MAI approach does not consider this factor when placing the decap. If the decap is not located near its target, the power supply noise may violate the given constraint, because part of the charge supplied by the decap is lost in the wire resistance. To address this problem, we use a simple compensation computation as follows:

    V ........................................................................................(2.7)

    where the terms denote, respectively, the required decap of module k after compensation, the supply voltage V, the distance from the space to the connection point, and the wire capacitance per unit length. Although EQ(2.7) compensates for the power consumed by the wire, the power network is another important factor that affects the power supplied by the decap. Figure 2.26(A) shows the power network after decap insertion. We can use the superposition theorem to analyze the circuit, as Figure 2.26(B) and (C) show. The discharge current from the decap is distributed among different modules because they all share the same power network. If the decap budget computation does not consider this factor, the power supply noise constraint may be violated.

    Figure 2.25 The NDP_MAI flow chart.

    Figure 2.26 Circuit analysis for the power network.

    To solve this problem, we need a more accurate compensation equation than EQ(2.7). According to the current divider rule,

    V ........................................................................(2.8)

    where the first resistance term is the total resistance on the side of module k and the second is the total resistance on the other sides (Figure 2.26(B)). EQ(2.8) compensates the required decap more accurately. We use SPICE to verify the accuracy of the two compensation equations. In our experiment (the circuit in Figure 2.26), all modules have the same resistance, and module B is expected to obtain 2.75 mA from the decap. If EQ(2.7) is used to correct the decap, module B obtains only 2.66 mA, which is insufficient. If EQ(2.8) is used to adjust the decap, module B obtains 2.78 mA. This experiment shows that the module obtains sufficient current from the decap when EQ(2.2)-(2.4) and EQ(2.8) are used to compute the required decap.
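    As a first-order illustration of the current-divider reasoning behind EQ(2.8), assume the decap's discharge current splits between just two parallel branches: one toward module k with total resistance `r_module_side` and one toward the other modules with `r_other_side`. Module k then receives the fraction $R_{\mathrm{other}}/(R_k + R_{\mathrm{other}})$, and the decap must be enlarged by the inverse of that fraction. This is a generic sketch of the principle, not a reproduction of EQ(2.8); the names are hypothetical.

```python
def branch_fraction(r_module_side: float, r_other_side: float) -> float:
    """Fraction of the decap discharge current that flows toward module k
    when it splits between two parallel branches (current divider rule)."""
    return r_other_side / (r_module_side + r_other_side)


def scaled_decap(c_required: float, r_module_side: float, r_other_side: float) -> float:
    """Enlarge a decap so that module k still receives its required share."""
    return c_required / branch_fraction(r_module_side, r_other_side)


if __name__ == "__main__":
    # Equal resistances on both sides: only half the charge reaches module k.
    print(branch_fraction(1.0, 1.0), scaled_decap(10e-12, 1.0, 1.0))
```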

    2.4 Experimental Results

    We implemented the power-supply noise-driven floorplan algorithm, the NDP_MAI approach, and the approach in [17] in C++ on an AMD 3200 computer with 1 GB of memory. Tables 2.6 and 2.7 compare the run time, peak noise, and decap budget with [17]. The purpose of this work is to analyze the effect of decap insertion at the floorplan level. For a fair comparison, note that the original cost function of [17] is a weighted sum of an area term and a wire-length term; we set the wire-length weight to zero because wire length is ignored in our cost function. Five MCNC benchmark circuits, apte, hp, xerox, ami33 and ami49, are used to evaluate the proposed methodology. Since the MCNC benchmarks include no noise constraint, the noise constraint is set at 0.13 V and 0.25 V. In our experiments, the operation times tw0 and tw1 of the switching current waveform are set to 0.3 and 0.8. The power supply voltage is 1.2 V, the distance between two adjacent VDD bumps is 1000 µm, and the pitch of the power supply mesh is 333.3 µm. We use the method of [17] to generate the current consumption information of each module. The current of module k is $A_k \cdot D$, where $A_k$ is the area of module k and $D$ is the worst-case current density; the remaining current parameter of each module is assigned a random value between 1.05 and 2.

    Table 2.6 compares the peak noise of our method and [17] after noise-aware floorplanning (noise-driven) and after post-floorplanning decap insertion (post), together with the run time. The experimental results show that our floorplan method obtains better results than [17], mainly because the high-current-consumption modules are placed apart when EQ(2.1) is used to compute the peak noise. The run time of our floorplan method is slightly higher than that of [17], for two main reasons: (1) the time complexity of the original Insert operation is O(m), whereas that of the new Insert operation is O(m²), so the new operations are more expensive than the original O-tree operations; (2) in [17], the authors use a sequence-pair-based floorplanner, which changes the floorplan by modifying the list order, and each such change is expected to be cheaper than an original O-tree operation. In the post-floorplanning decap insertion, all results meet the given constraint of 0.13 V, and both run times are very short.

    Table 2.6 The peak noise after floorplanning, the decap insertion and run time.

    | Circuit | Our Method: Peak Noise (V), noise-driven | Our Method: Peak Noise (V), post | Our Method: Run Time (s) | [17]: Peak Noise (V), noise-driven | [17]: Peak Noise (V), post | [17]: Run Time (s) |
    |---|---|---|---|---|---|---|
    | apte | 2.05 | 0.13 | 2 | 2.05 | 0.11 | |


    Simulations have been run for xerox and ami33 with HSPICE. The peak noise before

    decap insertion is 1.63V for xerox, 0.29V for ami33. After applying [17]'s method for

    decap insertion, the peak noise is 0.06V for xerox, 0.11V for ami33. If we use our

    proposed method, the peak noise is 0.04V for xerox, 0.06V for ami33. These results

    are close to our results in Table 2.7.

    Table 2.7 The comparison table of our decap computation and [17].

    | Circuit | Our Decap Budget (nF) | Our Peak Noise (V) | [17] Decap Budget (nF) | [17] Peak Noise (V) | Decrease Ratio |
    |---|---|---|---|---|---|
    | apte | 14.77 | 0.13 | 22.12 | 0.11 | 33% |
    | hp | 3.6 | 0.13 | 5.01 | 0.11 | 28% |
    | xerox | 6.53 | 0.13 | 9.55 | 0.10 | 31% |
    | ami33 | 0.17 | 0.12 | 0.36 | 0.09 | 52% |
    | ami49 | 7.85 | 0.12 | 14.54 | 0.10 | 46% |
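    The Decrease Ratio column of Table 2.7 is consistent with (decap budget of [17] minus ours) divided by the decap budget of [17], truncated to a whole percent. The short check below recomputes it from the table's own numbers, which gives 33.2%, 28.1%, 31.6%, 52.8% and 46.0% before truncation; the names are hypothetical.

```python
# (our decap budget, decap budget of [17]) in nF, copied from Table 2.7
BUDGETS = {
    "apte": (14.77, 22.12),
    "hp": (3.6, 5.01),
    "xerox": (6.53, 9.55),
    "ami33": (0.17, 0.36),
    "ami49": (7.85, 14.54),
}


def decrease_ratio(ours: float, ref: float) -> float:
    """Relative decap saving of our method with respect to [17]."""
    return (ref - ours) / ref


if __name__ == "__main__":
    for name, (ours, ref) in BUDGETS.items():
        print(f"{name}: {100 * decrease_ratio(ours, ref):.1f}%")
```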

    Table 2.8 compares the area overhead of the proposed methodology with that of [17]. In the third column, we use the method of [17] entirely to compute the incremental area. In the fourth column, we adopt the method of [17] for floorplanning and decap computation and our method for decap insertion. Comparing the third and fourth columns shows that our decap insertion method alone outperforms the method in [17]; the main reason is that our method splits each decap into four smaller decaps to minimize the increase in floorplan area. The last column shows the incremental area of our complete methodology. Compared with the numbers reported in previous papers, the proposed floorplanning framework creates better initial floorplans to work on, followed by the effective NDP_MAI approach to inserting enough decaps.


    Table 2.8 Experimental results for some MCNC benchmarks with various approaches for comparison.

    | Circuit | Modules | [17] Increased Area (µm²) | [17]'s Decap + Our Insertion Increased Area (µm²) | Our Methodology Increased Area (µm²) |
    |---|---|---|---|---|
    | apte | 9 | 356832 | 292618 | 19036 |
    | hp | 11 | 76608 | 68184 | 157640 |
    | xerox | 10 | 152061 | 121648 | 75006 |
    | ami33 | 33 | 8228 | 5082 | 3824 |
    | ami49 | 49 | 266616 | 201486 | 154990 |


    Chapter 3 Package Routability- and IR-Drop-Aware Finger/Pad Assignment

    The trend in VLSI is to integrate more and more electronic devices into a single chip, while performance requirements become ever more stringent. To achieve this, finger/pad counts keep increasing, so package and chip design becomes increasingly complex. Because of the growing complexity of the design interactions between the chip and the package, it is essential to consider them at the same time. In addition, the reduced supply voltages in modern chip designs are tightening the noise margin. IR-drop has become an important design issue: it is an unavoidable loss whenever the circuit draws energy from a power source.

    To handle core and package problems simultaneously, co-design of the core and package is a widely adopted solution, particularly because the finger/pad locations significantly affect both the IR-drop of the core and the package routing. In this chapter, we develop chip-package co-design techniques that determine the locations of the fingers/pads for package routability and signal integrity in 2-D and stacking IC designs. Our finger/pad assignment is a two-step method: we first solve the wire congestion problem in package routing and then minimize the IR-drop violation and the length of the bonding wires. The experimental results are encouraging. Compared with the randomly optimized method, our approaches reduce, on average, the maximum package density by 42% and 68% and the IR-drop by 10.61% and 4.58% for the two technologies, respectively, and the bonding wire length for stacking ICs by 15.66%.

    3.1 Overview of Package Design Methods

    In the traditional design methodology, the core and package of a chip are designed separately, as shown in Figure 3.1(A) and (C). Core designers assume that package problems will not affect the performance of the chip. However, the performance, complexity, and noise of the package critically affect the chip [26]. In new chip design paradigms such as the stacking IC [5], the package design largely determines the final quality of the chip. Therefore, a high-quality package design is needed for modern chips.

    Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs.

    As VLSI technology enters the nanometer era, chips contain more functions and are expected to deliver much better performance. At the same time, finger/pad counts continue to increase, which adds to the routing complexity of the package design. In early package technologies such as the Dual In-line Package (DIP) and the Pin Grid Array (PGA), the number of available fingers/pads was small. The Ball Grid Array (BGA) is a popular technology for modern package design because it can handle high finger/pad counts for connection to the Printed Circuit Board (PCB). The package design flow can be divided into several parts, as shown in Figure 3.1(A). The major problem in package design is routing. Many researchers [27][28][29][30] have proposed approaches to solving the routing problem in package design. Using finger/pad assignment to improve package routing is another alternative. In

    [31][32][33], the authors proposed numerous assignment algorithms to improve the

    routing problem. Because these methods can only handle a small finger count (


    separately in the modern chip design, as shown in Figure 3.2(B). This practice leads to over-design. Package designers usually use a finger planning method to improve package routing, and core designers use a noise-driven I/O planning method to improve the IR-drop of the core. To build a functionally correct chip, we usually over-design the chip to mitigate routability-related (noise-related) issues in the finger planning (I/O planning) step. This over-design has two disadvantages: a longer design cycle and a higher design cost. If we perform chip-package co-design to account simultaneously for the interdependent influence of IR-drop and package routing across the die and the package, these disadvantages can be eliminated.

    Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method.

    We extend the method of [37] to stacking ICs. For a 2-D IC, we develop a two-step approach that simultaneously improves the package congestion and the IR-drop of the core at the finger/pad assignment step. When this approach is applied to a stacking IC, the bonding-wire length can be improved together with the package congestion and IR-drop. The approach consists of a congestion-driven assignment step and a finger/pad exchange step, as shown in Figure 3.1(B). Our contributions are summarized as follows.

    We present a finger/pad assignment method to minimize the maximum wire congestion, and propose a finger/pad exchange method to improve the IR-drop of the core in a stacking IC design. The assignment result is guaranteed to admit a legal routing solution.

    We propose an efficient estimation method to analyze the wire congestion before routing. This method does not need to analyze the whole substrate; it directly finds the most congested region.

    We develop a co-design methodology that simultaneously addresses the package and core problems in (stacking) ICs, which can greatly shorten the design cycle time.

    The rest of this chapter is organized as follows. Section 3.2 describes the package architecture in 2-D and stacking ICs, analyzes how finger/pad locations affect congestion and IR-drop, and formulates the problem. Section 3.3 presents two congestion-driven assignment methods and one finger/pad exchange method to address the package and IR-drop problems. Section 3.4 shows the experimental results.

    3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning

    To deliver large volumes of data in modern chip designs, finger/pad counts continue to increase, which greatly raises the complexity of package routing. In addition, IR-drop seriously affects chip performance. The finger/pad assignment affects not only the package routing but also the IR-drop of the core, and this study focuses primarily on these two problems. We first introduce our package model, then describe the sources of the package routing and IR-drop problems, and finally formulate the target problem.

    3.2.1 Architecture and Routing of BGA Package in 2-D IC

    Modern package technology allows multiple layers to be used for package routing. Our package model has two routing layers: the die sits on the top layer (Layer 1) of the substrate, and the bump balls are on the bottom layer (Layer 2). The fingers, which relay signals from the pads to the package substrate, are placed along a closed rectangle on Layer 1. Pads can be connected to the fingers by wire-bond or flip-chip [1] technology. Because wire-bond packages are cheaper than flip-chip packages, our package model uses wire bonding to connect the die to the package substrate. The detailed architecture is shown in Figure 3.3: Figure 3.3(A) shows the vertical view and Figure 3.3(B) the profile. Bump balls, which connect to the printed circuit board, are uniformly distributed on Layer 2. Each net between a finger and a bump ball is routed within the package substrate on Layer 1 and Layer 2, and a via connects a wire on Layer 1 to a wire on Layer 2, as shown in Figure 3.3(B). In addition, we partition the package area into four parts and solve the package problems for each part individually. We also assume that the finger order and the pad order are the same.

    Because the via count affects the performance and area of the package, our package routing allows at most one via per net. In addition, vias may only be placed at candidate locations around the bump balls, with at most one via among any four adjacent bump balls. In [28], the authors proposed a global routing method that plans the via locations and net paths so that the routing result satisfies the monotonic property: each net from a finger to its bump ball crosses every horizontal grid line only once. Consequently, detours do not occur and the wire length is reduced. We adopt the idea of [28] to plan the via locations and routing paths for the same purposes.

    Figure 3.3 The architecture of the two-layer ball grid array package. (A) The vertical view. (B) The profile.
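    The two routing constraints above (monotonicity and at most one via per net) can be checked mechanically. The sketch below is only an illustration under a hypothetical route representation that is not taken from [28]: a route is a list of grid points (x, y, layer) from the finger to the bump ball, and a layer change marks a via. A route crosses every horizontal grid line at most once exactly when its y-coordinate never reverses direction.

```python
from typing import List, Tuple

# Hypothetical route representation: grid points (x, y, layer) listed from the
# finger to the bump ball; layer is 1 or 2. A layer change marks a via.
Point = Tuple[int, int, int]


def is_monotonic(route: List[Point]) -> bool:
    """True if y never reverses direction, i.e. the route crosses every
    horizontal grid line at most once (no detours)."""
    ys = [y for _, y, _ in route]
    steps = [b - a for a, b in zip(ys, ys[1:]) if b != a]  # ignore flat moves
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


def via_count(route: List[Point]) -> int:
    """Number of layer changes along the route (one via per change)."""
    return sum(1 for p, q in zip(route, route[1:]) if p[2] != q[2])


def is_legal_route(route: List[Point]) -> bool:
    """Monotonic and using at most one via, as the package model requires."""
    return is_monotonic(route) and via_count(route) <= 1


print(is_legal_route([(3, 4, 1), (3, 2, 1), (3, 2, 2), (5, 0, 2)]))  # True
print(is_legal_route([(0, 4, 1), (0, 1, 1), (0, 3, 1)]))             # False
```

    The second route goes down from y=4 to y=1 and then back up to y=3, so it crosses the lines y=2 and y=3 twice and is rejected.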

    3.2.2 Architecture and Influence of BGA Package in Stacking ICs

    The bonding-wire architecture of a stacking IC differs from that of a 2-D IC, as shown in Figure 1.5. If the stacking effect is ignored (Figure 1.5(A)), chip performance degrades because the bonding wires become longer, and their resistance and inductance increase with wire length. In addition, bonding-wire yield decreases as the distance between a finger and its connected pad grows. Figure 1.5(B) shows the optimal finger/pad planning result. To achieve it, the finger/pad planning method must take the stacking factor into account.

    3.2.3 The Impact of Finger/Pad Locations on Wire Congestion

    In our package architecture, the vias are evenly distributed on the substrate. We define the density as the number of wires passing between two adjacent vias. A high density indicates that too many wires pass through a narrow gap, which is likely to cause design-rule violations. It is therefore essential to control the density. The relationship among the density, the via locations, and the routing method is detailed in [28]; this work focuses on the relationship between the density and the finger/pad locations.
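    The density definition above can be sketched for a single horizontal via row. This is a minimal illustration under our own assumptions: each wire is represented only by the x-coordinate at which it crosses the row, and wires crossing outside the outermost vias are not counted, since they do not pass between two adjacent vias.

```python
from bisect import bisect_left
from typing import List


def gap_densities(via_xs: List[float], wire_xs: List[float]) -> List[int]:
    """Count the wires that cross one via row strictly between each pair of
    adjacent vias. Returns len(via_xs) - 1 counts, left to right."""
    vias = sorted(via_xs)
    counts = [0] * max(len(vias) - 1, 0)
    for w in wire_xs:
        i = bisect_left(vias, w)          # vias[:i] < w <= vias[i:]
        if 0 < i < len(vias) and w != vias[i]:
            counts[i - 1] += 1            # w lies in the gap (vias[i-1], vias[i])
    return counts


def max_density(via_xs: List[float], wire_xs: List[float]) -> int:
    """The density to be minimized: the busiest gap on the row."""
    return max(gap_densities(via_xs, wire_xs), default=0)


print(gap_densities([0, 2, 4, 6], [0.5, 1, 1.5, 3, 5.5]))  # [3, 1, 1]
print(max_density([0, 2, 4, 6], [0.5, 1, 1.5, 3, 5.5]))    # 3
```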

    A good finger/pad assignment can reduce the density of the package routing. We use an example to illustrate the relationship between the density and the finger/pad assignment. To isolate the effect of the assignment, the via locations and the routing method are fixed in this example. In Figure 3.4(A), a random method generates the finger order 10,1,2,3,11,6,9,4,5,8,7,0. In Figure 3.4(B), a congestion-driven assignment method generates a new finger order, 10,11,1,2,6,3,4,9,5,7,8,0. Comparing Figure 3.4(B) with Figure 3.4(A), the maximum density is reduced by 50% merely by changing the finger order.

    3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation

    IR-drop is the unavoidable voltage drop across the resistance of the power delivery network when the circuit draws current from the power pads. IR-drop is more severe in wire-bond packages than in flip-chip packages, mainly because the distance from a power pad to a module is shorter in a flip-chip package. Moreover, in the nanometer regime, the resistance of the connecting wires causes an increasingly large voltage drop. If the power pads cannot supply enough current, the voltage may fall below the lower-bound constraint. In this work, we change the location of each power pad to reduce the resistance of the connecting wires, which in turn reduces IR-drop. We verified this concept on a real chip design using commercial tools; the simulation result is shown in Figure 3.5. Comparing Figure 3.5(B) with Figure 3.5(A), IR-drop is significantly reduced merely by changing the pad locations.


    Figure 3.4 The relationship between the density and the finger/pad locations.

    Figure 3.5 The simulation results of IR-drop.

    To minimize the design cycle time, we need an efficient model for analyzing IR-drop. Such analysis is usually performed after floorplanning and placement [38][39], and its results have been shown to be close to those of SPICE simulation. In [40], the authors proposed an analytical model for use before floorplanning. Since the finger/pad assignment problem is solved before floorplanning, we adopt the model in [40] to obtain the IR-drop map. Because this model is applied before the core is planned, its accuracy is limited. The power grid model of [40] is shown in Figure 3.6(A), and Figure 3.6(B) shows the model of a grid node. The authors assume that the power consumption is the same at all locations, and propose the following equation to calculate the IR-drop of each point.

    $$\frac{V_{IR}(x-1,y)-V_{IR}(x,y)}{R_x}+\frac{V_{IR}(x+1,y)-V_{IR}(x,y)}{R_x}+\frac{V_{IR}(x,y-1)-V_{IR}(x,y)}{R_y}+\frac{V_{IR}(x,y+1)-V_{IR}(x,y)}{R_y}=J\,\Delta x\,\Delta y \qquad (3.1)$$

    where V_IR(x, y) is the voltage at point (x, y), J is the current density, and R_x and R_y are the resistances in the x and y directions. According to EQ(3.1), we can exchange power pad locations to minimize Δx and Δy and thereby improve IR-drop.

    Figure 3.6 The analysis model for IR-drop.
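    To make the nodal equation concrete, the sketch below solves it directly by Gauss-Seidel iteration on a small uniform grid. It is not the analytical model of [40]; it is a brute-force numerical solve of the same Kirchhoff current balance, the kind of computation that becomes too slow at full-chip scale. We read the right-hand side of EQ(3.1) as the current drawn at each node, named i_node here; this parameter, the grid size, resistances, and pad positions are all hypothetical illustrative values. The usage compares four clustered pads with four spread pads that supply the same total current.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]


def solve_grid(n: int, pads: Set[Node], vdd: float = 1.0, rx: float = 0.1,
               ry: float = 0.1, i_node: float = 0.01,
               tol: float = 1e-12, max_sweeps: int = 100_000) -> List[List[float]]:
    """Gauss-Seidel solution of the nodal equation (3.1) on an n x n grid.

    Every non-pad node draws the same current i_node (uniform power assumption);
    pad nodes are held at vdd. Returns v[y][x].
    """
    if not pads:
        raise ValueError("at least one power pad is required")
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_sweeps):
        largest_change = 0.0
        for y in range(n):
            for x in range(n):
                if (x, y) in pads:
                    continue
                g_sum = 0.0   # total conductance to existing neighbours
                inflow = 0.0  # sum of V_neighbour / R
                for dx, dy, r in ((-1, 0, rx), (1, 0, rx), (0, -1, ry), (0, 1, ry)):
                    xx, yy = x + dx, y + dy
                    if 0 <= xx < n and 0 <= yy < n:
                        g_sum += 1.0 / r
                        inflow += v[yy][xx] / r
                # KCL: sum((V_nb - V) / R) = i_node  =>  V = (inflow - i_node) / g_sum
                new_v = (inflow - i_node) / g_sum
                largest_change = max(largest_change, abs(new_v - v[y][x]))
                v[y][x] = new_v
        if largest_change < tol:
            break
    return v


def worst_drop(v: List[List[float]], vdd: float = 1.0) -> float:
    """Largest IR-drop on the grid relative to the supply voltage."""
    return vdd - min(min(row) for row in v)


clustered = {(0, 0), (1, 0), (0, 1), (1, 1)}   # four pads in one corner
spread = {(3, 0), (0, 3), (6, 3), (3, 6)}      # four pads at edge midpoints
print(worst_drop(solve_grid(7, clustered)) > worst_drop(solve_grid(7, spread)))  # True
```

    Moving the same number of pads from one corner to the edge midpoints shortens the current paths to the farthest nodes, which is the effect that pad exchange exploits.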

    3.2.5 Problem Formulation

    We have described the relationships among the wire congestion, IR-drop, and finger/pad locations in 2-D and stacking ICs. In modern chips, finger/pad counts continue to increase while supply voltages continue to decrease, so issues related to wire congestion on the substrate, IR-drop of the core, and the bonding wires of stacking ICs are becoming increasingly serious. The goal of this work is to assign nets to regular finger/pad locations so as to mitigate these issues. In other words, we decrease the density, the voltage drop of the core, and the length of the bonding wires by reassigning the nets to finger/pad locations. The problem can be formulated as follows:

    Input: The finger/pad locations F1, F2, ..., ordered from left to right; the set of net names N1, N2, ..., together with the type of each net; and the bump-ball locations Bb,x,y (e.g., B1,1,1, B2,1,2, ...), where x and y denote the coordinates of the bump ball and b denotes the net name. The total net count and the total finger/pad count are also given. In addition, the number of tiers and the number of pads on each tier must be specified.

    Output: An assignment of each net Nb to a finger/pad location Fa.

    Objective: Minimize the maximum density and the voltage drop of the core, and reduce the length of the bonding wires, based on a pre-floorplan model.

    3.3 Congestion-Driven Finger/Pad Assignment with IR-Drop Improvement

    To solve the density and IR-drop problems, we propose a two-step methodology at the finger/pad planning level, as illustrated in Figure 3.1(B). We first propose two congestion-driven finger/pad assignment methods to reduce the package density; the idea is to calculate the ideal density and derive a suitable finger/pad order and locations. We then present a finger/pad exchange approach to reduce IR-drop. This exchange approach simultaneously considers the density, IR-drop, and bonding wires.

    3.3.1 Congestion-driven Finger/Pad Assignment

    Monotonic routing [28] is a package routing method that provides high-quality routing results. This work adopts this routing principle to verify the effect of the assignment method. Based on the monotonic property, [28] proposed a via assignment rule. For each finger Fa, let the target bump ball be Bb,x,y, the net name Nb, and the connected via Vb, with coordinates (Vb,x, Vb,y). Consider any two nets Nb1 and Nb2 connected to finger/pad locations Fa1 and Fa2 and to vias at (Vb1,x, Vb1,y) and (Vb2,x, Vb2,y). If Vb1,x < Vb2,x and Vb1,y = Vb2,y, then a1 must be smaller than a2. In other words, on each horizontal line, the vias appear in the same left-to-right order as their fingers. For example, in Figure 3.4(B), the finger locations from left to right are F1, F2, ..., F12, and the finger order restricted to the nets on y=2 is _,11,_,_,6,_,_,9,_,_,_,_, which matches the via order 11,6,9 on y=2. If the via order conforms to this rule, a legal monotonic routing is guaranteed to exist in the package. In this work, we assume that each connected via is fixed at the bottom-left corner of its bump ball, and we use the routing method of [28] to show the effectiveness of the finger/pad assignment. To improve the maximum density, a better finger/pad assignment method is needed. We propose two congestion-driven finger/pad assignment approaches: Intuitive-Insertion-Based Finger/Pad Assignment and Density-Interval-Based Finger/Pad Assignment.
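    The via assignment rule of [28] reduces to an ordering check per horizontal line. The sketch below is a minimal illustration, assuming that the via x-coordinate of each net equals the x index of its bump ball (consistent with vias fixed at the bottom-left corner); the function name and data layout are hypothetical. It uses the y=2 nets 11, 6, and 9 with the finger order of Figure 3.4(B).

```python
from typing import Dict, List, Optional, Tuple


def satisfies_via_order_rule(order: List[Optional[int]],
                             vias: Dict[int, Tuple[int, int]]) -> bool:
    """Check the rule of [28] for the nets listed in `vias`.

    order: net on each finger F1, F2, ... (None = finger not considered here).
    vias:  net -> (via x, via y).
    On every horizontal line (equal via y), a via further left must belong to a
    net on a finger further left.
    """
    finger = {net: a for a, net in enumerate(order, start=1) if net is not None}
    rows: Dict[int, List[Tuple[int, int]]] = {}
    for net, (vx, vy) in vias.items():
        rows.setdefault(vy, []).append((vx, finger[net]))
    for row in rows.values():
        row.sort()                              # left to right by via x
        fingers = [a for _, a in row]
        if any(f1 >= f2 for f1, f2 in zip(fingers, fingers[1:])):
            return False
    return True


vias_y2 = {11: (1, 2), 6: (2, 2), 9: (3, 2)}    # via order 11, 6, 9 on y=2
ok = [None, 11, None, None, 6, None, None, 9, None, None, None, None]
bad = [None, 11, None, None, 9, None, None, 6, None, None, None, None]
print(satisfies_via_order_rule(ok, vias_y2))   # True
print(satisfies_via_order_rule(bad, vias_y2))  # False
```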

    Intuitive-Insertion-Based Finger/Pad Assignment (IFA)

    This method relies on ordered insertion to avoid violating the monotonic rule. The pseudo code is shown in Figure 3.7. In the IFA method, the first step is to find the unrouted horizontal lines (line 1). For each horizontal line, we calculate the number of bump balls (line 2). For the first horizontal line (y=n, where n is the highest horizontal line), the net of each bump ball Bi,x,y is directly assigned to Fx (lines 3-5). For the other horizontal lines (y=n-1 down to 0), the net of the first bump ball Bi,1,y is inserted into F1, and the net of each bump ball with x=2 to m-1 is inserted immediately before the finger location that holds the net at the same x-coordinate on the previously processed (next higher) horizontal line (lines 7-11). The net of the last bump ball is inserted after the last occupied finger location (line 13). Each insertion shifts the nets to its right by one finger location. The time complexity of IFA is O(n^2).

    Figure 3.7 The pseudo code of the IFA method.

    We use an example to explain the IFA flow. In this example, the locations of the bump balls and nets are as shown in Figure 3.8. The IFA process is illustrated in Figure 3.8(A), and the routing result is shown in Figure 3.8(B). In Figure 3.8(A), because nets 11, 6, and 9 lie on the highest horizontal line (y=2), step 1 assigns these three nets to finger locations F1, F2, and F3. Step 2 inserts nets 1, 3, 5, and 8 (y=1) into suitable locations. Net 1 is on the first bump ball of its line (Bi,1,y), so we assign net 1 to F1 and shift the other nets one finger location to the right. For net 3, the bump ball location is B3,2,1, and the net on Bi,2,1+1 is net 6; therefore, net 3 is inserted before net 6. Net 5 obtains its location in the same way. Net 8 is inserted into the last location because it is the last net on this line. Step 3 repeats step 2 to insert the remaining nets. The final finger order is 10,1,11,2,3,6,4,5,9,7,8,0. The routing result is shown in Figure 3.8(B), and the density is 2. Compared with Figure 3.4(A), the maximum density decreases by 50%.
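    The pseudo code of Figure 3.7 is not reproduced in this text, so the sketch below reconstructs the IFA insertion rule from the worked example above; it is an illustration, not the author's implementation. The nets of lines y=2 and y=1 follow the example; the left-to-right order 10, 2, 4, 7, 0 on line y=0 is our assumption, chosen to be consistent with the stated final order. The data layout rows[y] is hypothetical.

```python
from typing import Dict, List


def ifa(rows: Dict[int, List[int]]) -> List[int]:
    """Intuitive-insertion-based finger/pad assignment (sketch).

    rows[y] lists the nets of horizontal line y ordered by bump-ball x, so
    rows[y][x - 1] is the net on bump ball B_{.,x,y}. Assumes the next higher
    line has at least m-1 bump balls when line y has m.
    Returns the finger order F1, F2, ...
    """
    ys = sorted(rows, reverse=True)            # highest line first
    order = list(rows[ys[0]])                  # step 1: net at x goes to Fx
    for upper, y in zip(ys, ys[1:]):
        nets, above = rows[y], rows[upper]
        order.insert(0, nets[0])               # x = 1: insert at F1
        for x in range(2, len(nets)):          # x = 2 .. m-1
            anchor = above[x - 1]              # net on the same x, next higher line
            order.insert(order.index(anchor), nets[x - 1])  # just before it
        if len(nets) > 1:
            order.append(nets[-1])             # x = m: last finger location
    return order


rows = {2: [11, 6, 9], 1: [1, 3, 5, 8], 0: [10, 2, 4, 7, 0]}
print(ifa(rows))  # [10, 1, 11, 2, 3, 6, 4, 5, 9, 7, 8, 0]
```

    Each insertion uses a list search and a list insert, both linear in the number of nets, which is consistent with the O(n^2) complexity stated for IFA.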


    Figure 3.8 (A) The IFA assignment result. (B) The routing result.

    Density-Interval-Based Finger/Pad Assignment (DFA)

    IFA produces satisfactory routing results for a two-level BGA package. For a BGA package with three or more levels, however, the result is imperfect because the insertion step of IFA considers only two horizontal lines at a time. We propose another method, Density-Interval-Based Finger/Pad Assignment (DFA), to solve this problem.

    The pseudo code is shown in Figure 3.9. We first determine a processing priority based on the coordinates of all the horizontal lines, where n is the total number of horizontal lines (line 1). For each horizontal line, we calculate the number of bump balls (line 2). Then, the density interval (DI) is computed (line 3) as DI = (Total Non-allocated Net - Used Via Number) / (Total Via Number + 1), where "Total Non-allocated Net" denotes the number of nets not yet connected to a via, "Total Via Number" denotes the number of vias on the horizontal line, "Used Via Number" denotes the number of vias used on the horizontal line, and "(Total Via Number + 1)" is the number of segments on this horizontal line. For each bump ball (Bi,x,y, 1 ≤ x ≤ m), we calculate the empty number (EN), i.e., the number of empty finger slots to skip, and insert the net into the (EN + 1)th empty finger location (lines 4-7). The time complexity of DFA is O(n). With this method, the nets of each horizontal line are evenly distributed over the finger/pad locations, so the routing paths of all nets are spread evenly over the whole substrate.

    Figure 3.9 The pseudo code of the DFA method.

    We use the same example to show the effectiveness of DFA. The DFA process is illustrated in Figure 3.10. Because nets 11, 6, and 9 lie on the highest horizontal line (y=2), the first step is to decide the finger locations of these three nets. According to the input information, the bump balls of these three nets are B11,1,2, B6,2,2, and B9,3,2. The Total Non-allocated Net is 12, the Total Via Number is 4, and the Used Via Number is 3, so DI = (12-3)/(4+1) = 1.8. For net 11, EN = 1; therefore, net 11 is inserted into F2, skipping the one empty slot F1. For net 6, EN = 3; because F2 is occupied, F1, F3, and F4 are the first three unassigned slots, and net 6 is assigned to the (3+1)th unassigned slot, F5. Using the same method, all the nets can be inserted into suitable locations. The final net order is 10,11,1,2,6,3,4,9,5,7,8,0, as shown in Figure 3.10(C), and the routing result is shown in Figure 3.4(B).
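    The DFA flow can be sketched as follows. This is an illustration under stated assumptions, not the pseudo code of Figure 3.9: we take EN = floor(x * DI) for the x-th bump ball of a line, which reproduces EN = 1 for net 11 and EN = 3 for net 6 in the example. The Total Via Number of 4 for line y=2 comes from the example; the value 4 for line y=1 is our assumption (it reproduces the stated final order), and the value for y=0 does not matter because no non-allocated nets remain there. The nets on y=0 use the same assumed order as in the IFA sketch, and via_count is a hypothetical input name.

```python
from typing import Dict, List, Optional


def dfa(rows: Dict[int, List[int]],
        via_count: Dict[int, int]) -> List[Optional[int]]:
    """Density-interval-based finger/pad assignment (sketch).

    rows[y]: nets on horizontal line y ordered by bump-ball x.
    via_count[y]: "Total Via Number" of line y (hypothetical input).
    """
    total = sum(len(nets) for nets in rows.values())
    slots: List[Optional[int]] = [None] * total   # finger locations F1..F_total
    non_allocated = total
    for y in sorted(rows, reverse=True):            # highest line first
        nets = rows[y]
        used = len(nets)                            # "Used Via Number"
        num, den = non_allocated - used, via_count[y] + 1   # DI = num / den
        for x, net in enumerate(nets, start=1):
            en = (x * num) // den                   # EN = floor(x * DI), exact
            empty = [i for i, s in enumerate(slots) if s is None]
            slots[empty[min(en, len(empty) - 1)]] = net   # (EN+1)-th empty slot
        non_allocated -= used
    return slots  # every slot is filled once all lines are processed


rows = {2: [11, 6, 9], 1: [1, 3, 5, 8], 0: [10, 2, 4, 7, 0]}
print(dfa(rows, {2: 4, 1: 4, 0: 5}))  # [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]
```

    Integer floor division keeps EN exact. For readability the sketch rescans the empty slots for every net, so it runs in quadratic time rather than the linear time stated for DFA.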

    Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method.

    DFA obtains a better finger order when the finger count and the number of bump levels are large. We use another example to show that DFA outperforms IFA. The nets and bump ball locations are shown in Figure 3.11. If IFA is used to plan these nets, the finger order is 13,7,3,1,14,8,4,2,15,9,5,16,10,6,17,11,18,12,19,20, and the density of the package routing is 6, as shown in Figure 3.11(A). If DFA is used instead, the finger order is 13,7,3,14,1,4,8,15,9,5,2,16,10,17,6,11,18,12,19,20, and the density is 5, as shown in Figure 3.11(B).


    Figure 3.11 The comparison of IFA and DFA. (A) The IFA routing result. (B) The DFA routing result.

    3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement

    After obtaining an initial net order on the finger/pad locations, we can exchange nets in this order to improve the IR-drop of the core. Directly using EQ(3.1) to calculate IR-drop would make the analysis very time-consuming, because real chip designs contain a very large number of analysis points and power pads. An efficient method for quickly estimating IR-drop is therefore needed. In this dissertation, we use the variation of Δx and Δy in EQ(3.1) as an indicator of IR-drop improvement when the location of a power pad is exchanged. Because this computation ignores the density, it alone could lead to high-density routing in the package. We therefore propose a method that improves IR-drop while simultaneously suppressing the density.

    Section 3.3.1 introduced the monotonic order. If our exchange method ignored this principle, a monotonic routing might no longer exist in the package. To maintain this property, we add a range constraint to our exchange method. The key idea of the range constraint is to map the monotonic order of the vias [28] onto the finger locations. Consider three bump balls Bb1,x1,y1, Bb2,x2,y2, and Bb3,x3,y3 connected to fingers Fa1, Fa2, and Fa3. If x1 < x2 < x3 and y1 = y2 = y3, then Fa1 must lie to the left of Fa2, and Fa3 must lie to the right of Fa2. We use an example to explain how the constraint is formulated. In Figure 3.4(B), net 6 is assigned to F5, and its neighbors on the same horizontal line, nets 11 and 9, are on F2 and F8; hence the exchange range of net 6 is from F3 to F7. Without this range limit, a higher cost would have to be paid to find suitable connected vias to build the monotonic routing.
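    The range constraint can be computed directly from the current finger order. The sketch below is a minimal illustration with hypothetical names: the range of a net lies strictly between the fingers of its left and right neighbors on the same horizontal line, or extends to the first or last finger when no such neighbor exists. With the Figure 3.4(B) order it reproduces the range F3 to F7 for net 6.

```python
from typing import Dict, List, Tuple


def exchange_range(order: List[int], rows: Dict[int, List[int]],
                   net: int) -> Tuple[int, int]:
    """Inclusive 1-based finger range [lo, hi] in which `net` can be placed
    without passing its left or right neighbour on the same horizontal line."""
    finger = {n: a for a, n in enumerate(order, start=1)}
    for nets in rows.values():
        if net in nets:
            k = nets.index(net)
            lo = finger[nets[k - 1]] + 1 if k > 0 else 1
            hi = finger[nets[k + 1]] - 1 if k + 1 < len(nets) else len(order)
            return lo, hi
    raise KeyError(f"net {net} is not on any horizontal line")


order = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]   # Figure 3.4(B)
print(exchange_range(order, {2: [11, 6, 9]}, 6))  # (3, 7)
```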

    While the finger/pads are being exchanged, the package density must be controlled at the same time, and we propose a control method for this purpose. After the congestion-driven assignment step, the initial order of the nets on the finger/pad locations is determined. We record the locations of the nets whose bump balls lie on the highest horizontal line; this record is needed because our package routing follows the monotonic rule. The density of a higher horizontal line is greater than that of a lower horizontal line, so we monitor only the density of the highest horizontal line. If x nets are recorded, the nets are divided into x+1 sections, Sc, 1 ≤ c ≤ x+1. For each section, we record its interval number Ic, 1 ≤ c ≤ x+1. When nets are exchanged, the interval numbers change; the new values are denoted I'c, 1 ≤ c ≤ x+1. The increased density (ID) can then be computed as follows:

    $$ID = \max_{1 \le c \le x+1} \left( I'_c - I_c \right) \qquad (3.2)$$

    A larger ID indicates a larger increase in the package density.
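    EQ(3.2) can be illustrated with a short sketch. Here we interpret the interval number Ic as the number of nets in section Sc, where the sections are delimited by the recorded nets of the highest horizontal line; this interpretation and the function names are our assumptions. The sketch also assumes that an exchange keeps the recorded nets in their left-to-right order, which the range constraint guarantees, so the sections before and after the exchange correspond one to one.

```python
from typing import Iterable, List


def interval_numbers(order: List[int], top_nets: Iterable[int]) -> List[int]:
    """Nets per section: the recorded top-line nets split the finger order
    into x + 1 sections S1..S(x+1), read from left to right."""
    top = set(top_nets)
    counts: List[int] = []
    current = 0
    for net in order:
        if net in top:
            counts.append(current)   # close the section left of this net
            current = 0
        else:
            current += 1
    counts.append(current)           # last section
    return counts


def increased_density(before: List[int], after: List[int],
                      top_nets: Iterable[int]) -> int:
    """ID = max over c of (I'c - Ic), EQ(3.2)."""
    top = list(top_nets)
    old = interval_numbers(before, top)
    new = interval_numbers(after, top)
    return max(n - o for n, o in zip(new, old))


before = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]
after = [10, 11, 1, 6, 2, 3, 4, 9, 5, 7, 8, 0]   # exchange F4 and F5
print(interval_numbers(before, [11, 6, 9]))       # [1, 2, 2, 4]
print(increased_density(before, after, [11, 6, 9]))  # 1
```

    Moving net 6 one finger to the left shrinks the second section and grows the third, so ID = 1; exchanging two nets inside the same section leaves every interval number unchanged and gives ID = 0.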

    The impact of bonding wires should be considered in the finger/pad exchange method when there are more than two tiers. We propose a method to improve the bonding wires; the idea is to plan the pads of different tiers at equal spacing. Based on the tier number, we assign a unique parameter UPd to each tier d. This parameter has one bit per tier, and each bit denotes one tier. For example, if the tier number is 3, the parameters of Tier 1 to Tier 3 are "001", "010", and "100". Every finger has one bonding wire connecting it to a pad. If that pad is on Tier d, which has the unique parameter UPd, we set the parameter of the finger to UPd. The set of finger locations, F1,...F, ar