
VLSI Design Planning with Power Integrity and I/O Constraints

by

Chao-Hung Lu

A dissertation submitted in partial satisfaction of the

requirements for the degree of

Doctor of Philosophy

in Electrical Engineering

in the GRADUATE DIVISION

of the NATIONAL CENTRAL UNIVERSITY

Taiwan, Republic of China

Professor Chien-Nan Liu and Hung-Ming Chen

January 2010


Abstract

In modern VLSI designs, manufacturing issues have complicated the design of chips as well as packages. Moreover, driven by market requirements, modern circuits have higher functionality, lower supply voltages and more I/Os. These conditions increase the complexity of chip design. In this dissertation, we present I/O planning and floorplanning methods to address these problems. They can not only be applied to mitigate power supply noise in the core, but can also take package design and stacking IC design into account.

For simultaneous switching noise, our method adopts a two-stage technique: floorplanning followed by decoupling capacitance (decap) insertion. During floorplanning, area and noise are evaluated together to obtain a noise-driven floorplan. We then use a noise-driven decap planning approach to insert minimal decaps into the floorplan. For IR-drop and package issues, we adopt a two-step finger/pad assignment method: we first solve the package design problem, and then minimize IR-drop by exchanging finger/pad locations. In addition, since stacking ICs are promising for the development of high-performance ICs, we propose a partition approach that minimizes the number of 3D-vias and balances the number of I/Os on each tier of a stacking IC. Finally, we perform floorplanning to show the importance of the aspect-ratio factor in stacking ICs.

Contents

Chapter 1 Introduction
    1.1 Trends in VLSI
    1.2 Stacking IC Advantage and Technology
    1.3 Power Integrity Impacts in Chip Design
    1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs
    1.5 Dissertation Organization

Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture
    2.1 Overview of Decap Insertion
    2.2 Power Delivery and Signal Integrity Issues
        2.2.1 Power Delivery Model and Noise Estimation
        2.2.2 Decap Budgeting in Area-Array Architecture
        2.2.3 Problem Formulation
    2.3 Minimal Decap Allocation in Power Supply Noise Aware Floorplanning
        2.3.1 O-Tree Based Power Supply Noise Aware Floorplanning
        2.3.2 Feasible Region for Decap Allocation
        2.3.3 Identification of Space Priority for Decap Insertion
        2.3.4 Decap Compensation for Voltage Drop in Power Network
    2.4 Experimental Results

Chapter 3 Package Routability and IR-Drop Aware Finger/Pad Assignment
    3.1 Overview of Package Design Methods
    3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning
        3.2.1 Architecture and Routing of BGA Package in 2-D IC
        3.2.2 Architecture and Influence of BGA Package in Stacking ICs
        3.2.3 The Impact of Finger/Pad Locations on Wire Congestion
        3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation
    3.3 Congestion-driven Finger/Pad Assignment with IR-Drop Improvement
        3.3.1 Congestion-driven Finger/Pad Assignment
        3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement

Chapter 4 Design Planning with 3D-Via Optimization in Stacking IC
    4.1 Overview of Our Partition Method
    4.2 Stacking IC Models and Design Flow
        4.2.1 3D-Via and Stacking IC Models
        4.2.2 Design Flow of Alternative Stacking IC
        4.2.3 The Impact of I/O Location in Alternative Stacking IC
    4.3 I/Os and Modules Planning with Minimal 3D-Via Number in Alternative Stacking ICs
        4.3.1 Global Planning for I/Os and Modules
        4.3.2 I/O Allocation by Congestion-driven Planning and Iterative Refinement

Chapter 5 Concluding Remarks and Future Works

Reference

List of Figures

Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology.
Figure 1.2 The architectures of stacking ICs.
Figure 1.3 Sub-classification of wafer stacking ICs.
Figure 1.4 The example of Ldi/dt noise effect.
Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result.
Figure 2.1 (A) Flowchart of our proposed method. (B) The illustration of our method.
Figure 2.2 Power delivery model in the area-array architecture.
Figure 2.3 (A) The current consumption profile of module. (B) Simulation result by HSPICE.
Figure 2.4 An O-tree example.
Figure 2.5 The difference between the original O-tree and our approach.
Figure 2.6 The new Delete operation.
Figure 2.7 The relation between the area and the rotary module.
Figure 2.8 The new Insert operation.
Figure 2.9 The power-supply noise-driven floorplan algorithm.
Figure 2.10 The partition method for a decap.
Figure 2.11 The space relation between decap locations and area.
Figure 2.12 The NDP_MAI flow chart.
Figure 2.13 Circuit analysis for the power network.
Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs.
Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method.
Figure 3.3 The architecture of the two-layer ball grid array package.
Figure 3.4 The relationship between the density and the finger/pad locations.
Figure 3.5 The simulation results of IR-drop.
Figure 3.6 The analysis model for IR-drop.
Figure 3.7 The pseudo code of the IFA method.
Figure 3.8 (A) The IFA assignment result. (B) The routing result.
Figure 3.9 The pseudo code of the DFA method.
Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method.
Figure 3.11 The comparison of IFA and DFA.
Figure 3.12 The pseudo code of our finger/pad exchange method.
Figure 3.13 The routing results of Circuit 2. (A) Random (B) IFA (C) DFA
Figure 4.1 The alternative stacking architecture.
Figure 4.2 The flowchart of stacking ICs.
Figure 4.3 The area effect of aspect-ratio on stacking IC.
Figure 4.4 The effect of I/O number in the 2-D chip design.
Figure 4.5 The effect of I/O number in alternative stacking IC.
Figure 4.6 The relation between the via number and I/O locations.
Figure 4.7 The connection graph of the circuit.
Figure 4.8 The pseudo code of CPIR.
Figure 4.9 The example of the CPIR method.
Figure 4.10 The detailed description of the I/O Planning Method.
Figure 4.11 The experimental result of the floorplan.
Figure 5.1 Heat dissipation of chips [60].
Figure 5.2 The flowchart of the future co-design tool.

List of Tables

Table 1.1 ITRS predictions for circuit performance in 2009 [1].
Table 1.2 A comparison of all classified stacking ICs.
Table 2.1 A comparison of six floorplan representations.
Table 2.2 The peak noise after floorplanning, the decap insertion and run time.
Table 2.3 The comparison table of our decap computation and [17].
Table 2.4 Experimental results for some MCNC benchmarks with various approaches for comparison.
Table 3.1 The experimental data of test circuits.
Table 3.2 The maximum density, the total wire length and the maximum IR-drop in our test circuits.
Table 3.3 The improved ratio of IR-drop and bonding wires.
Table 4.1 The experimental result of our I/O planning methods targeting stacking architecture with 3, 4, 5 and 8 tiers.

Chapter 1 Introduction

1.1 Trends in VLSI

Current trends in chip design integrate multiple functions into a single chip while simultaneously improving its size, power, performance and cost. To accomplish this goal, semiconductor manufacturing companies are continually increasing the number of transistors per square inch on integrated circuits and improving their manufacturing technology. The International Technology Roadmap for Semiconductors (ITRS) provides a prediction for Very Large Scale Integration (VLSI) growth [1], as Table 1.1 illustrates.

Table 1.1 ITRS predictions for circuit performance in 2009 [1]

Year                           2009   2010   2011   2012   2013   2014
ASIC M1 Pitch (nm)               54     45     38     32     27     24
Vdd (High-performance)          1.0   0.97   0.93   0.87   0.84   0.81
Vdd (Low Operating Power)      0.95   0.95   0.85   0.85    0.8    0.8
Transistors/Chip (millions)     773    773   1546   1546   3092   3092
On-chip Clock (GHz)            5.45   5.84   6.32   6.81   7.34   7.91
Power Consumption (W)           143    146    161    158    149    152

Modern chip design technology is continually advancing to meet these ITRS predictions. As VLSI technology enters the nanometer era, the resistance of interconnect wires increases greatly. This resistance seriously degrades chip performance when integrated circuits (ICs) are manufactured in advanced technology nodes. Besides resistance, the power integrity issue must be solved because the trend in VLSI is to reduce supply voltages. A lower supply voltage helps reduce power dissipation, but it also decreases the noise margin of devices. Noise that exceeds this margin can cause erroneous chip functions, which seriously reduces chip performance. As a result, the power integrity problem has become one of the major factors affecting chip yield. In System-on-Chip (SoC) designs, chips contain more functions and are expected to deliver much better performance. At the same time, I/O counts are continually increasing, which adds routing complexity to the package design. These problems are the topic of this dissertation.

1.2 Stacking IC Advantage and Technology

Stacking ICs are promising for the development of high-density, high-performance ICs. Stacking technology stacks one die (chip, wafer) over another die (chip, wafer). Transistors can be fabricated on different tiers, and vertical interconnection shrinks the total wire length and the chip size, as shown in Figure 1.1. The benefits of stacking ICs include improvements in density, noise, power, performance, and functionality.

(1) Density: In stacking ICs, transistors can be stacked and the package size can be reduced. Compared with 2-D standard cells, stacked cells are reported to offer a 30% area improvement [2]. Density increases when a 2-D IC is converted to a stacking IC because circuit components can be placed on top of, or underneath, each other. Therefore, stacking ICs enable higher-density and higher-speed circuits.

(2) Noise: Shorter wires have lower wire-to-wire capacitance, resulting in less noise coupling between signal lines. Shorter global wires with fewer repeaters should also have less noise and less jitter, providing better signal integrity. Since stacking ICs can greatly reduce the length of global wires, they greatly improve noise immunity.

(3) Power: Shorter wires decrease the load capacitance, the resistance and the number of buffers needed. Since interconnect wires and their supporting repeaters consume a significant portion of total dynamic power, the reduced average interconnect length in stacking ICs can reduce the total power consumption. Compared with a 2-D IC, a stacking IC can reduce the wire length and cut total dynamic power by more than 10% [3].

(4) Performance: Shorter wires decrease the time required to deliver a signal, meaning that stacking ICs can improve performance.

(5) Functionality: Stacked integration allows the combination of dissimilar technologies (memory, RF, analog, logic) to create hybrid circuits.

Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology.

Current research refers to stacking ICs as three-dimensional (3-D) ICs [4] or System in Package (SiP) [5]. Stacking ICs can be classified into four types: (1) package stacking; (2) chip stacking; (3) wafer stacking; and (4) device stacking. Figure 1.2 shows the differences between these types. In the package stacking approach, the chip is packaged before stacking, as Figure 1.2(A) illustrates. Chip stacking ICs [6] stack dies before packaging, as Figure 1.2(B) shows. Wafer stacking fabrication ([7], [8]) stacks the wafers before cutting, as Figure 1.2(C) shows. A wafer stacking IC is smaller than a chip stacking IC. Device stacking ICs, whose architecture is shown in Figure 1.2(D), are better than wafer stacking ICs in both size and performance. However, because manufacturing technologies for device stacking ICs [9] are relatively new, device stacking ICs cannot yet be manufactured by semiconductor manufacturing companies. Therefore, the mainstream of modern stacking IC manufacturing is wafer stacking.

Figure 1.2 The architectures of stacking ICs: (A) Package Stacking; (B) Chip Stacking; (C) Wafer Stacking; (D) Device Stacking.

In [10], wafer stacking is subdivided into two types: (1) chip-to-wafer, and (2) wafer-to-wafer. Figure 1.3 shows the differences between these approaches. In the chip-to-wafer manufacturing process, defective dies are removed before stacking [7], which increases the yield. The disadvantages of chip-to-wafer stacking are that the lower tier is larger in area than the higher tiers and that the die-to-die spacing is wider than in wafer-to-wafer ICs. In wafer-to-wafer ICs, the wafers must be aligned before cutting [8]. The disadvantage of wafer-to-wafer stacking is that a defective die is still used in the stack even after it has been identified, as shown in Figure 1.3(B). Table 1.2 provides a comparison of all classified stacking ICs.

Figure 1.3 Sub-classification of wafer stacking ICs. (A) Chip-to-Wafer (B) Wafer-to-Wafer

Table 1.2 A comparison of all classified stacking ICs.

              Package     Chip        Wafer Stacking                     Device
              Stacking    Stacking    Chip-to-Wafer    Wafer-to-Wafer    Stacking
Yield         Highest     High        Normal           Low               Lowest
Size          Largest     Large       Normal           Small             Smallest
Performance   Lowest      Low         Normal           Normal            Highest

1.3 Power Integrity Impacts in Chip Design

Basically, integrity issues can be categorized into signal integrity problems and power integrity problems. This dissertation focuses on the power integrity problems caused by power supply noise, namely ΔI (delta-I, Ldi/dt) noise and IR-drop noise. Ldi/dt noise is also called SSN (Simultaneous Switching Noise) or ΔI (delta-I) noise. Figure 1.4 shows that Ldi/dt noise is a voltage fluctuation phenomenon. Figure 1.4(A) shows the architecture of the circuit, which consists of two sub-circuits, A and B. If the switching activities of A and B start at different times, i.e., $t_A \neq t_B$, the voltage remains at a high level, as in Figure 1.4(B). Otherwise, the voltage becomes unstable for a short period and drops below the VDD constraint, as in Figure 1.4(C). Inserting decaps is a popular way to stabilize this voltage fluctuation. This dissertation develops a decap insertion method to solve the SSN problem in chip designs.
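As a rough numerical illustration of the switching-time effect described above, the following Python sketch compares the peak L·di/dt noise when two sub-circuits switch at different times and when they switch together. The linear current ramps, the shared 1 nH supply inductance, and every numeric value are assumptions chosen for illustration; they are not taken from Figure 1.4.

```python
def peak_ldidt_noise(inductance_h, ramps):
    """Peak L*di/dt noise (V) for current ramps sharing one supply inductance.

    ramps: list of (start_s, duration_s, delta_i_a); each ramp is assumed
    to rise linearly by delta_i_a over duration_s (illustrative model).
    """
    # The total current slope only changes where a ramp starts or ends.
    events = sorted({t for s, d, _ in ramps for t in (s, s + d)})
    peak = 0.0
    for left, right in zip(events, events[1:]):
        mid = (left + right) / 2.0
        # Sum the slopes of all ramps that are active in this interval.
        slope = sum(di / d for s, d, di in ramps if s <= mid < s + d)
        peak = max(peak, inductance_h * slope)
    return peak


L_SUPPLY = 1e-9  # hypothetical 1 nH supply inductance

# Sub-circuits A and B each draw an extra 0.1 A within 1 ns (assumed values).
staggered = [(0.0, 1e-9, 0.1), (2e-9, 1e-9, 0.1)]     # start times differ
simultaneous = [(0.0, 1e-9, 0.1), (0.0, 1e-9, 0.1)]   # start times equal

noise_staggered = peak_ldidt_noise(L_SUPPLY, staggered)        # about 0.1 V
noise_simultaneous = peak_ldidt_noise(L_SUPPLY, simultaneous)  # about 0.2 V
```

Under these assumed values, simultaneous switching doubles the peak noise, which is the situation that decap insertion is meant to relieve.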

IR-drop is the voltage drop that occurs when current flows through the non-zero resistance of the supply lines. As IC design moves into the nanometer regime, the resistance of the connection wires can incur a voltage drop that violates the lower bound of the supply voltage constraint. In this dissertation, we modify the location of each power pad to reduce the resistance of the connection wires and thus suppress IR-drop. We use a real chip design and commercial tools to support this idea.

Figure 1.4 The example of Ldi/dt noise effect. (A) The circuit architecture. (B) The voltage measured when $t_A \neq t_B$. (C) The voltage measured when $t_A = t_B$.
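The IR-drop argument in this section can be made concrete with a small Python sketch. It assumes a single uniform supply wire whose resistance is the sheet resistance multiplied by length over width; the sheet resistance, wire dimensions and current are hypothetical values, not measurements from the chip design used in this dissertation.

```python
def wire_resistance(sheet_res_ohm_per_sq, length_um, width_um):
    """Resistance (ohm) of a uniform wire: R = R_sheet * (length / width)."""
    return sheet_res_ohm_per_sq * length_um / width_um


def ir_drop(current_a, sheet_res_ohm_per_sq, length_um, width_um):
    """Voltage drop (V) across the wire for a given current: V = I * R."""
    return current_a * wire_resistance(sheet_res_ohm_per_sq, length_um, width_um)


# Hypothetical module drawing 50 mA through a 10 um wide supply wire.
drop_far_pad = ir_drop(0.05, 0.05, 2000.0, 10.0)   # pad 2000 um away, about 0.5 V
drop_near_pad = ir_drop(0.05, 0.05, 500.0, 10.0)   # pad 500 um away, about 0.125 V
```

In this simplified model the drop scales linearly with the pad-to-module wire length, which is why moving a power pad closer to a heavily loaded region reduces IR-drop.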

1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs

In VLSI designs, the I/O pad locations not only impact power integrity, but also affect package routing and the core area. This section describes these effects in detail. A good I/O pad assignment can help reduce the length of bonding wires in stacking ICs. The bonding-wire architecture of a stacking IC differs from that of a 2-D IC, as shown in Figure 1.5. If the stacking factor is ignored, as in Figure 1.5(A), chip performance degrades because of the longer bonding wires. Figure 1.5(B) shows the ideal wire bonding result. To achieve this target, the finger/pad planning method must consider the stacking factor.

Besides bonding wires, I/O pads affect the number of 3D-vias. Most previous studies do not consider the physical impact of 3D-vias, which critically affect the area, latency, performance and cost of stacking ICs. Therefore, place-and-route algorithms for stacking ICs should plan 3D-vias effectively.

Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result.

1.5 Dissertation Organization

The goal of this dissertation is to provide a set of solutions that consider power integrity and I/O constraint issues in the EDA field. We first develop a power model that calculates the decap required to solve the delta-I problem and increases the usage of available space in the floorplan, reducing the area overhead caused by decap insertion. Chapter 2 describes this step in detail. We then develop a planning method that determines the order of I/O pads while considering package routability. This planning method not only reduces the total wire length in the package but also suppresses IR-drop noise in the core. Chapter 3 describes this method in detail. After the I/O order is determined, the next step is to compute the locations of the I/Os. We propose a system partition approach that minimizes the number of 3D-vias and balances the number of I/Os on each tier, and we modify a traditional floorplanning method to optimize the I/O and module locations. The stacking IC design can then be simplified into several 2-D IC designs. Chapter 4 describes this process in detail. Finally, Chapter 5 concludes this dissertation and provides directions for future research.

Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture

As VLSI technology enters the nanometer era, supply voltages continue to drop in order to reduce power dissipation, but this makes power integrity problems even worse. Employing decoupling capacitances (decaps) at the floorplan stage is a common approach to alleviating supply noise problems. Previous studies overestimate the decap budget and do not fully utilize the empty space of the floorplan, even though a floorplan usually has a lot of available space that can hold decaps without increasing the floorplan area. Therefore, the work presented in this chapter develops a better model for calculating the decap required to solve the power supply noise problem, and increases the usage of available floorplan space to reduce the area overhead caused by decap insertion. The experimental results are encouraging. Compared with previous approaches, our methodology reduces the decap budget by 38% on average for the MCNC benchmarks while still meeting the power supply noise requirements. The final floorplans with decaps are also smaller than the results of previous works.

2.1 Overview of Decap Insertion

Many researchers have proposed approaches to solving the power supply noise problem at every design stage. The power/ground (P/G) network [11]-[13] is an important factor in the supply noise problem; a better P/G network can greatly reduce power supply noise at minimal cost. Besides sizing the power lines [14], employing decoupling capacitances (decaps) is a common approach to reducing supply noise. Traditionally, decap insertion is performed after routing in the physical design flow. This wastes a large amount of decap area and lowers the efficiency of the decap budget. Therefore, more and more studies propose inserting decaps before routing. The authors of [15] proposed a two-step decap insertion method to improve power supply noise at the placement level. The method consists of a prediction step and a correction step. The prediction step estimates the required decap pessimistically. Although the decap size can be adjusted in the correction step, a smaller area overhead can be achieved if decap insertion is considered at an earlier stage. In [16] and [17], the authors proposed decap insertion methods at the floorplan level to reduce supply noise. Unfortunately, these studies often overestimate the decap budget. They assume that the decap must fully supply the maximum current of the module, which is too pessimistic in our observation. Besides the decap budget computation, previous works do not fully use the available floorplan space. A floorplan usually has a lot of available space that can hold decaps without increasing the floorplan area.

The area-array architecture is often used to build high-performance, high-pin-count ICs. In this architecture, the signal bumps are uniformly distributed over the chip. Therefore, the resistance from the core I/O to the signal bumps is greatly reduced and a larger number of I/Os can be accommodated. In [18], the authors used this architecture in floorplanning to improve power supply noise. Because of these advantages, more and more chips adopt the area-array architecture to mitigate power supply noise and the limited pin count. However, without decap insertion, the resulting floorplans still suffer from supply noise violations even with the area-array architecture.

The purpose of this work is to develop a better model for calculating the decap required to solve the power supply noise problem and to make wise use of the available space in the floorplan to reduce the area overhead caused by decap insertion. Based on the area-array architecture, we propose a two-step approach that combines a noise-driven floorplanning algorithm with a decap insertion approach to suppress power supply noise at the floorplan level, as Figure 2.1 illustrates. First, we use a noise-driven floorplanning algorithm to reduce the possible noise. This work adopts an O-tree representation with a stronger adjacent-module relation as the engine for supply-noise-driven floorplanning, and modifies its primary operations, Delete and Insert, to fit the proposed framework. Second, we use a Noise-driven Decap Planning with Minimum Area Insertion (NDP_MAI) approach to insert minimal decaps into the noise-guided floorplan, with legalization of blocks and decaps. Note that this approach can compute the required decap size for a real design and then provide the optimal location for each decap in floorplanning. After routing, the method in [19] can be used to refine our result and further reduce power supply noise.

Figure 2.1 (A) Flowchart of our proposed method (B) The illustration of our method

The rest of this chapter is organized as follows. Section 2.2 describes the

floorplan design with power supply noise consideration, the new noise estimation

method, decap budget computation, and problem formulation. Section 2.3 presents the

floorplanning algorithm and the decap insertion approach for power supply noise

avoidance. Section 2.4 shows the experimental results.

2.2 Power Delivery and Signal Integrity Issues

IR-drop and Ldi/dt (delta-I) noise are the main contributors to the noise margin issue, and this work focuses primarily on Ldi/dt noise. In this section, we describe the power delivery model and the noise estimation model used in this work, and formulate our problem.

2.2.1 Power Delivery Model and Noise Estimation

In this work, the power source distribution is based on the area-array architecture. The area-array architecture is a mesh structure in which the VDD and GND bumps are uniformly distributed across the die, with signal bumps at fixed interspersed locations, as illustrated in Figure 2.2(A). Because the resistance from the I/O to the connected block is substantially decreased, power delivery is better than in the peripheral-I/O architecture. As a result, many high-performance chips adopt the area-array architecture.

In the area-array architecture, a VDD bump supplies current to all modules in direct proportion to the distance from the bump to the module. The four VDD bumps neighboring a module (top-right, bottom-right, top-left and bottom-left) supply the main current, as Figure 2.2(B) shows, so we compute the noise from these VDD bumps only. Although many paths exist for delivering current to the target module, we consider only the shortest and second-shortest paths to simplify the noise computation, as Figure 2.2(C) shows. The main reason is that currents follow the least-impedance paths when flowing from VDD to the target module. Compared with SPICE simulation, the error of this method is within 10%, as shown in [17]. Because the computation is fast and the error stays within a tolerable range, we use this method to compute power supply noise at the floorplan level.

Figure 2.2 Power delivery model in the area-array architecture.

Kirchhoff's voltage law can be used to express the noise at each module:

$$V_{noise}^{(k)} = \sum_{P_j \in T_k} \left( i_j R_{P_{jk}} + L_{P_{jk}} \frac{di_j}{dt} \right) \qquad (2.1)$$

where $V_{noise}^{(k)}$ denotes the power supply noise at module $k$, $P_j$ denotes the path from the power bump to node $j$, $P_{jk}$ denotes the path from node $j$ to node $k$, $T_k$ denotes the union of the shortest paths and the second-shortest paths, $R_{P_{jk}}$ denotes the resistance of $P_{jk}$, $L_{P_{jk}}$ denotes the inductance of $P_{jk}$, and $i_j$ is the current flowing along path $P_j$.
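A minimal Python sketch of this path-based noise estimate is given below. It assumes that the noise at a module is the sum, over the considered paths, of a resistive term (current times path resistance) and an inductive term (path inductance times the rate of change of the current). The `PathSegment` class, its field names, and all numeric values are hypothetical and serve only to illustrate the summation.

```python
from dataclasses import dataclass


@dataclass
class PathSegment:
    """One considered current path to the module (hypothetical container)."""
    current_a: float          # current flowing along the path
    didt_a_per_s: float       # rate of change of that current
    resistance_ohm: float     # resistance of the path segment
    inductance_h: float       # inductance of the path segment


def module_noise(paths):
    """Sum the resistive (i*R) and inductive (L*di/dt) contributions in volts."""
    return sum(
        p.current_a * p.resistance_ohm + p.inductance_h * p.didt_a_per_s
        for p in paths
    )


# Two assumed paths, e.g. a shortest and a second-shortest path to one module.
paths_to_module = [
    PathSegment(current_a=0.05, didt_a_per_s=1e7, resistance_ohm=0.2, inductance_h=1e-9),
    PathSegment(current_a=0.03, didt_a_per_s=5e6, resistance_ohm=0.5, inductance_h=2e-9),
]

noise_v = module_noise(paths_to_module)   # about 0.045 V with these values
noise_ok = noise_v < 0.05                 # compare against an assumed 50 mV limit
```

Restricting the list to the shortest and second-shortest paths keeps the evaluation cheap enough to call repeatedly inside a floorplanning loop.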

- 14 -

2.2.2 DecapBudgetinginAreaArrayArchitecture

In [16] and [17], the authors assumed that the decap should fully supply the

maximum current of the module, as shown in the white region in Figure 2.16(A). In

this environment, the decap budget is possibly over-estimated. Actually, the VDD pin

continuously provides current when the chip is operating, as the grey region in Figure

2.16(A). Therefore, the required decap size can be significantly reduced.

Figure 2.16 (A) The current consumption profile of module. (B) Simulation result by HSPICE.

The required decap size can be obtained from the difference between the maximum current ($I_{max}$) and the target current limit ($I_{gen}$) of each module. Assume the target current limit of module k is defined as $I_{gen}^{k}$, k=1,2,...,M, and the maximum switching current of module k is $I_{max}^{k}$. Let $C^{k}$ be the required decap for circuit k and $Q^{k}$ be the amount of electric charge for $C^{k}$. $Q^{k}$ can then be obtained by the following equation, based on the triangle model shown in Figure 2.16.

$$Q^{k} = \int_{t_s}^{t_f} I_{max}^{k}(t)\,dt - \int_{t_s}^{t_f} I_{gen}^{k}(t)\,dt \qquad (2.2)$$

where $t_s$ is the start time and $t_f$ is the finish time of the interval in which the target module is in operational mode. The charge can be converted to the silicon area needed to fabricate the capacitance as follows:

$$C^{k} = \frac{Q^{k}}{V_{lim}} \qquad (2.3)$$

$$S^{k} = \frac{C^{k}}{C_{ox}} \qquad (2.4)$$

where $V_{lim}$ is the noise constraint of the voltage, $C^{k}$ is the decap budget and $S^{k}$ is the silicon area of $C^{k}$. $C_{ox}$ is the unit area capacitance of a MOS capacitor, $C_{ox} = \epsilon_{ox}/t_{ox}$, where $\epsilon_{ox}$ is the permittivity of SiO2 and $t_{ox}$ is the oxide thickness.

We use SPICE to verify the accuracy of EQ(2.2) and compare the result with [17]. The simulation uses a 0.25µm technology. The supply voltage is set at 2.5V and the power supply noise tolerance is set at 0.04V. Computing the required decap with the method of [17] gives 112pF, whereas EQ(2.2) gives a decap budget of 96pF. Figure 2.16(B) shows the simulation result. The proposed method requires less decap than [17].
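As a rough illustration of the budgeting steps in EQ(2.2)-(2.4), the following Python sketch integrates sampled current waveforms and converts the resulting charge into a decap budget and a silicon area. It assumes that EQ(2.2) is the difference of the two current integrals; the function names, the triangular pulse, the constant target current and the unit-area capacitance are hypothetical values chosen for illustration, not data from the experiment above.

```python
def integrate(times, values):
    """Trapezoidal integral of sampled values over the given time points."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def decap_budget(times, i_max, i_gen, v_lim, c_ox):
    """Return (charge, decap, area) following the assumed form of EQ(2.2)-(2.4)."""
    charge = integrate(times, i_max) - integrate(times, i_gen)  # EQ(2.2)
    decap = charge / v_lim                                      # EQ(2.3)
    area = decap / c_ox                                         # EQ(2.4)
    return charge, decap, area


# Hypothetical triangular switching pulse: 1 A peak over 1 ns,
# with a constant 0.2 A supplied through the VDD pin.
times = [0.0, 0.5e-9, 1.0e-9]      # seconds
i_max = [0.0, 1.0, 0.0]            # amperes
i_gen = [0.2, 0.2, 0.2]            # amperes
q, c, s = decap_budget(times, i_max, i_gen, v_lim=0.04, c_ox=5e-3)
print(q, c, s)  # charge in C, decap in F, area in m^2 (c_ox given in F/m^2)
```

Sampled waveforms are used here so that the same helper works for any current profile, not only the triangle model.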

2.2.3 Problem Formulation

The goal of this work is to use the minimum decap to solve the power supply noise problem at the floorplan level. In other words, we suppress power noise by placing each module at a suitable location and reserve a minimal decap area to avoid possible power noise during floorplanning. The problem can be formulated as follows:

Given a set of modules, B1,B2,...,Bm, the current consumption, $I_{max}^{k}$ and $I_{gen}^{k}$, of each block Bk, 1 ≤ k ≤ m, a set of power bumps, P1,P2,...,Pm, and the noise constraint $V_{lim}$ for each module, find a feasible solution such that each module Bk obtains a required decap budget size DBSk with minimum penalty area when DBSk is inserted. At the same time, the voltage noise $V_{noise}^{k}$ of module Bk must be smaller than the noise constraint $V_{lim}$.

2.3 Minimal Decap Allocation in Power-Supply-Noise-Aware Floorplanning

To solve the power supply noise problem, this work develops a two-step methodology to suppress and reduce noise at the floorplan level, as Figure 2.14 illustrates. Since placing two high-current-consumption modules close together seriously increases noise, we first propose a noise-driven floorplan algorithm that places these modules intelligently. The goal of our floorplanner is to spread the high-power-consumption blocks evenly across the chip. This brings two benefits: (1) the peak noise is reduced; (2) the decap budget is distributed evenly across the chip. The empty room after floorplanning is small and scattered. If many decaps are inserted into one particular region, the area of the floorplan may increase because the empty room does not have enough space for them. We therefore propose a Noise-driven Decap Planning with Minimum Area Insertion (NDP_MAI) approach to reduce the power supply noise and the area overhead after floorplanning. We briefly introduce the O-tree representation and the new operations it needs, Delete and Insert, and then discuss the feasible region of the decap budget.

2.3.1 O-Tree-Based Power-Supply-Noise-Aware Floorplanning

To obtain a better result for noise-driven floorplan, a suitable and controllable

floorplan representation is needed. Table 2.5 compares six floorplan representations.

Table 2.5 A comparison of six floorplan representations.

| Floorplan Representation | Adjacent Relation | Solution Space | Operation: Delete | Operation: Insert |
|---|---|---|---|---|
| SP [20] | Not Good | O((m!)^2) | - | - |
| B*-tree [21] | Good | O(m! 2^(2m-2) / m^1.5) | O(1) | O(m) |
| O-tree [22] | Best | O(m! 2^(2m-2) / m^1.5) | O(m) | O(m) |
| TCG [23] | Good | O((m!)^2) | O(m^2) | O(m^2) |
| CBL [24] | Good | O(m! 2^(3m-3) / m^1.5) | O(m) | O(m) |
| DBL [16] | Good | O(m! 2^(m-1)) | O(m) | O(m) |

We choose the O-tree as our representation mainly because the adjacency relations can be obtained directly from it, so high-current-consumption modules can be placed at a distance from each other.

The O-tree is composed of a horizontal tree and a vertical tree, as shown in Figure 2.17 (B) and (D). The horizontal (vertical) tree is stored as a compact encoding, as shown in Figure 2.17 (C) and (E), that records the tree type, the parent-child structure of the tree (as a bit string), and the permutation of modules. If a module touches another module horizontally (vertically), such as modules H and L in Figure 2.17, the contact can be read directly from the horizontal (vertical) representation. With other representations, the adjacency relation of each module is more difficult to find.

Figure 2.17 An O-tree example. (A) Floorplan result. (B) Vertical tree. (C) Vertical tree representation. (D) Horizontal tree. (E) Horizontal tree representation.

Figure 2.18(A) shows the original O-tree operations. If module J is deleted, the original Delete operation generates a Left-Down (LD)-packing floorplan in which two high-current modules (I and K) become adjacent. Such a region consumes more power than other regions and needs more decap to reduce the power supply noise. A similar situation occurs for the Insert operation because the original operation considers only the area and the wire length. As described above, the original O-tree operations cannot control which blocks become neighbors. Therefore, new transformation operations are necessary to avoid placing high-current-consumption modules at adjacent locations. We propose two new transformation operations:

Delete: The original operation deletes the selected module only. The new operation deletes the selected module together with the top-right modules of the selected module.

Insert: The original operation considers the area factor only. The new operation inserts the module into a low-noise location while minimizing the area increase.

We use an example to explain the new transformation operations. To delete module J in Figure 2.18(B), the selected module is module J and its only top-right module is module K, so modules J and K must be deleted together. The top-right modules must be deleted because the floorplan must remain an LD-packing result. If they are not deleted at the same time, high-current-consumption modules may end up at neighboring locations.

Figure 2.18 The difference between the original O-tree and our approach.


The new Delete operation consists of several steps. We first choose a to-be-deleted module $B_x$ from the module permutation $\pi$ of the horizontal and vertical O-trees, and then select $B_x$ and all modules after it in each $\pi$. This yields two block sets, $S_h$ and $S_v$. We then find the intersection of $S_h$ and $S_v$ to obtain the candidate list of modules to delete. The final step is to delete the modules in this intersection from the vertical and horizontal O-trees. Figure 2.19 illustrates the operation with the horizontal and vertical representations of Figure 2.18. The horizontal representation is (0011000111,HLIJK) and the vertical representation is (0010101101,HIJKL). In this case, module J must be deleted. Therefore, the block set $S_h$ includes modules J and K, and the block set $S_v$ includes modules J, K and L. The deletion candidate list, JK, is obtained from the intersection JK ∩ JKL. Finally, modules J and K are deleted from both representations. The horizontal representation changes to (001101,HLI), and the vertical representation changes to (001101,HIL). The time complexity of the new Delete operation is O(m), where m denotes the number of modules.

Figure 2.19 The new Delete operation
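The Delete example above can be reproduced with a short Python sketch. It assumes that each O-tree is stored as a (bit string, permutation) pair in which the i-th "0" opens the i-th module of the permutation and the matching "1" closes it, and that module names are single characters. The helper names are hypothetical, and the sketch only handles the case shown in the text, where no deleted module has a surviving child.

```python
def node_spans(bits):
    """Return [open_index, close_index] for each node in permutation order."""
    spans, stack = [], []
    for pos, bit in enumerate(bits):
        if bit == "0":
            stack.append(len(spans))      # a new node is opened
            spans.append([pos, None])
        else:
            spans[stack.pop()][1] = pos   # the innermost open node is closed
    return spans


def remove_modules(bits, perm, doomed):
    """Drop the open/close bits and the labels of every module in doomed."""
    drop = set()
    for name, (open_pos, close_pos) in zip(perm, node_spans(bits)):
        if name in doomed:
            drop.update((open_pos, close_pos))
    new_bits = "".join(b for i, b in enumerate(bits) if i not in drop)
    new_perm = "".join(n for n in perm if n not in doomed)
    return new_bits, new_perm


def new_delete(horizontal, vertical, module):
    """Delete module plus the modules that follow it in both permutations."""
    (h_bits, h_perm), (v_bits, v_perm) = horizontal, vertical
    after_h = set(h_perm[h_perm.index(module):])
    after_v = set(v_perm[v_perm.index(module):])
    doomed = after_h & after_v            # deletion candidate list
    return (remove_modules(h_bits, h_perm, doomed),
            remove_modules(v_bits, v_perm, doomed),
            doomed)


h_new, v_new, deleted = new_delete(("0011000111", "HLIJK"),
                                   ("0010101101", "HIJKL"), "J")
print(h_new, v_new, sorted(deleted))
```

Each step is a single pass over the encoding, which is in line with the O(m) bound stated above.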

The new Insert operation consists of three parts: (1) find all possible locations; (2) compute their costs; (3) choose the optimal location. A module can be inserted at many locations in a floorplan, so the first step is to find these candidate locations in an LD-packing floorplan. The possible insertion locations are the lower-left corners of the floorplan result, as shown in Figure 2.21. Every possible insertion location has a different cost, so the next step is to compute the cost of each candidate location. The cost function can be represented as follows:

$$C_{a} = D_1 (A_{new} - A_{org}) + D_2 (I_a + I_b + I_c - I_{th}) \qquad (2.5)$$

where $C_a$ denotes the cost when module a is inserted at this location, $D_1$ and $D_2$ are the weights, $A_{new}$ is the area of the floorplan after the module is inserted, $A_{org}$ is the original area, $I_{a(b,c)}$ denotes the current consumption of module a (b, c), where b and c are the left and down adjacent modules of a, and $I_{th}$ denotes the threshold value for local current consumption. $D_2$ is set at a large value to penalize high local current consumption. The cost of every candidate location must be computed twice because the cost differs depending on whether the module is inserted directly or rotated, as Figure 2.20 shows. Note that EQ(2.5) considers the area and the power consumption only; the cost computation can be extended to other objectives, such as wire length. Finally, the module is inserted at the minimal-cost location. The following illustration explains the new Insert operation. In Figure 2.21(A), the initial floorplan consists of modules I, H and L, and modules J and K are the candidates for insertion. Four triangles mark the candidate locations in the floorplan. In Figure 2.21(B), we compute the cost of inserting module J at each candidate location. Module J is then inserted at the minimal-cost location, which in this case is the corner between module H and module L, as Figure 2.21(C) shows. Because the candidate list is not empty, the Insert operation is applied again, as shown in Figure 2.21(D). Note that EQ(2.5) considers only the left and down adjacent modules when calculating the cost. The main reasons are: (1) because the Insert operation must maintain an LD-packing floorplan, it only considers the modules to the left of and below the inserted module; (2) the Delete operation deletes all the top-right modules of the selected module, so if all modules are inserted at lower-left corners only, there are no modules on the top and right sides. The time complexity of the new Insert operation is $O(m^2)$, where m denotes the number of modules.

Figure 2.20 The relation between the area and the rotary module.

Figure 2.21 The new Insert operation.
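The three parts of the Insert operation can be sketched as follows. The cost function below is only an assumed reading of EQ(2.5), a weighted area increase plus a weighted excess of local current over the threshold; the candidate records, field names, weights and numbers are hypothetical. Each candidate is evaluated twice, once as-is and once rotated, as described above.

```python
def insertion_cost(area_new, area_org, i_a, i_b, i_c, i_th, d1, d2):
    """Assumed form of EQ(2.5): area increase plus local current excess."""
    return d1 * (area_new - area_org) + d2 * (i_a + i_b + i_c - i_th)


def best_insertion(candidates, area_org, i_a, i_th, d1=1.0, d2=1000.0):
    """Return (cost, location name, rotated) of the cheapest candidate.

    Each candidate is a dict with the floorplan area after a direct insert
    ('area'), after a rotated insert ('area_rotated'), and the current of
    the left and down neighbours ('i_left', 'i_down').
    """
    best = None
    for cand in candidates:
        for rotated in (False, True):
            area_new = cand["area_rotated"] if rotated else cand["area"]
            cost = insertion_cost(area_new, area_org, i_a,
                                  cand["i_left"], cand["i_down"],
                                  i_th, d1, d2)
            if best is None or cost < best[0]:
                best = (cost, cand["name"], rotated)
    return best


# Hypothetical candidate corners for a module drawing 0.4 A.
candidates = [
    {"name": "X", "area": 110.0, "area_rotated": 105.0,
     "i_left": 0.5, "i_down": 0.1},
    {"name": "Y", "area": 100.0, "area_rotated": 100.0,
     "i_left": 0.9, "i_down": 0.8},
]
best = best_insertion(candidates, area_org=100.0, i_a=0.4, i_th=1.0)
print(best)
```

In this made-up case location Y adds no area but sits next to two high-current modules, so the large weight on the current term steers the module to location X.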

Based on the previous operations and the Simulated Annealing (SA) algorithm [25], we propose a power-supply noise-driven floorplan algorithm, as illustrated in Figure 2.22. We first provide an initial floorplan and set the initial values of the two annealing temperature parameters (line 2). The new Delete operation is used to delete modules (lines 4-6). Then, the new Insert operation and EQ(2.5) are used to insert modules (lines 7-11). After the Insert operation, a new LD-packing floorplan is obtained. The cost difference ΔC between the new floorplan and the original floorplan is computed (line 12). If ΔC is smaller than zero, a better floorplan has been obtained and we adopt it as the new solution (line 14). If not, we randomly decide whether the original floorplan is replaced by the new floorplan (lines 15-17). Finally, the temperature is cooled (line 18).
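The annealing loop described above can be sketched in Python as follows. The sketch uses the standard Metropolis acceptance test and a geometric cooling schedule, both of which are assumptions, since the text does not state the acceptance probability or the cooling rule explicitly. The `perturb` callback stands in for the Delete-then-Insert step, and the toy cost used in the example is hypothetical.

```python
import math
import random


def anneal(floorplan, cost, perturb, temperature, final_temperature,
           cooling=0.9, rng=None):
    """Simulated annealing skeleton following the line structure in the text."""
    rng = rng or random.Random(0)
    current = floorplan
    while temperature > final_temperature:
        candidate = perturb(current, rng)            # Delete + Insert step
        delta = cost(candidate) - cost(current)      # cost difference
        if delta < 0:
            current = candidate                      # always keep improvements
        elif rng.random() < math.exp(-delta / temperature):
            current = candidate                      # sometimes accept worse
        temperature *= cooling                       # geometric cooling
    return current


# Toy stand-in: the "floorplan" is an integer and every move improves it.
improved = anneal(10, cost=abs, perturb=lambda x, rng: max(x - 1, 0),
                  temperature=100.0, final_temperature=1.0)

# At a very low temperature, a move that always worsens the cost is rejected.
frozen = anneal(0, cost=abs, perturb=lambda x, rng: x + 1,
                temperature=1e-3, final_temperature=1e-4)
print(improved, frozen)
```

Passing the random generator explicitly keeps a run reproducible, which helps when comparing floorplans produced with different cost weights.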

Power-supply Noise-driven Floorplan Algorithm
Input : A compacted floorplan, F, with the current consumption for each block
Output : A compacted floorplan, F, in which adjacent high current consumption blocks are placed apart
1. Begin
2. Initial floorplan; Temperature; Final_Temperature;
3. while Temperature > Final_Temperature
4.   Randomly choose one block Bx from F;
5.   Using Delete operation to delete Bx;
6.   All deleted blocks added into a candidate list;
7.   while candidate list is not NULL
8.     Choose one block By from candidate list;
9.     Using EQ(2.5) and Insert operation to insert By;
10.    Delete By candidate information;
11.  end_while
12.  ΔC = Cost(New_Floorplan) - Cost(Floorplan);
13.  if ΔC < 0
14.    Floorplan = New_Floorplan;
15.  else if Random(0,1) < exp(-ΔC / Temperature)
16.    Floorplan = New_Floorplan;
17.  end_if
18.  Cooling(Temperature);
19. end_while

Figure 2.22 The power-supply noise-driven floorplan algorithm

2.3.2 Feasible Region for Decap Allocation

After obtaining a floorplan, we calculate the required decap size for each module. According to [20], [21], [22], [23], [24] and [16], the empty room after floorplanning is small and scattered. If a large decap is inserted into a floorplan, the area of the floorplan may increase because a single empty room does not have enough space for the decap. Besides the area factor, the charge/discharge time of the capacitance must be considered. The charge time is substantially reduced when several smaller decaps form the required decap. Our method cuts the required decap into four smaller decaps to minimally increase the floorplan area and reduce the

charge/discharge time of the decap. Note that the sizes of the smaller decaps are not the same. The distribution is based on the Manhattan distance from the VDD source to the power bump, and $(P_x, D_x)$ denotes the connection relation. $P_x$ denotes the Manhattan distance from the module's VDD location to the power bump x and $D_x$ denotes the obtainable current contribution ratio from the power bump x. The current contribution ratio can be computed as follows:

$$D_x = \frac{\prod_{a \neq x} P_a}{\sum_{b} \prod_{a \neq b} P_a} \qquad (2.6)$$

where $a \neq x$ denotes that a and x are different power bumps. According to EQ(2.6), $D_x$ is inversely proportional to $P_x$, which follows the current divider rule. A simple

example helps to explain EQ(2.6). Figure 2.23(A) shows a result of the floorplan.

Module D needs decap to supply the current consumption. We first use EQ (2.2)-(2.4)

to compute the optimal decap sizing. The decap is partitioned into four smaller decaps.

Each smaller decap is given a feasible region that ranges from the location of the

power bump to the VDD source. We then use EQ(2.6) to compute the current ratio for

each power bump, as Figure 2.23(B) shows. Based on these constraints, a smaller

decap can be inserted into the chip and the charge time of the capacitance can be

substantially decreased.
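The current divider behaviour behind the contribution ratio can be illustrated with a small Python sketch. It assumes that the ratio of each power bump is its inverse Manhattan distance normalized over all bumps, and that the decap is split in proportion to these ratios; the function names, the distances and the decap value are hypothetical.

```python
def contribution_ratios(distances):
    """Current-divider ratios: each bump weighs in by its inverse distance."""
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def split_decap(total_decap, distances):
    """Split one decap budget into one smaller decap per power bump."""
    return [total_decap * r for r in contribution_ratios(distances)]


# Hypothetical Manhattan distances from a module's VDD pin to four bumps.
distances = [1.0, 2.0, 4.0, 4.0]
ratios = contribution_ratios(distances)
parts = split_decap(96.0, distances)   # 96 is an arbitrary budget
print(ratios, parts)
```

The nearest bump receives the largest share, and the shares always sum to the original budget.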

In modern chip design, the decap is inserted in the empty space after detailed routing. In reality, a decap cannot reduce the power noise of a high-current-consumption module if it is far from the module. To utilize the energy of each decap effectively, the rectangular region from the power bump to the VDD pin is taken as the feasible region for each small decap, as shown in Figure 2.23(A).

Figure 2.23 The partition method for a decap.

2.3.3 Identification of Space Priority for Decap Insertion

After floorplanning, the chip has some exploitable space for decap insertion. If

these spaces can be fully utilized, the cost of the chip might not increase even when

the decap is inserted into the chip. This section discusses the effect of placing the

decap in each different space. Furthermore, we propose a Noise-driven Decap

Planning with Minimum Area Insertion approach that simultaneously considers the

area cost and the noise effect.

A floorplan result always has one or more horizontal (vertical) longest paths, which determine the maximum width (height) of the floorplan. As shown in Figure 2.24, modules H and L compose a horizontal longest path. Moving these modules directly changes the area. Therefore, if a decap is inserted in the channel space between these modules, the area increases significantly. This channel space is called the extensible space. A channel space that borders an empty room, which is not occupied by any module, is called the empty space. If a decap is inserted in this space, the location of each module does not change. The remaining channel spaces, other than the extensible and empty spaces, are called the available space. If a decap is inserted in this space, the area of the floorplan stays fixed and the locations of some modules shift only slightly. In Figure 2.24, the horizontal longest path is H→L and the vertical longest path is H→I. If a decap is inserted into the extensible space between H and I, the area increases by 285µm². If a decap is inserted into the empty space at the corner between H and I, neither the area nor the topology is affected. Hence, the cost is lowest when the decap is inserted in the empty space. If the decap is inserted into the available space between L and J, the area does not increase but the topology changes. Therefore, three types of spaces are candidate locations for the decap: Available Space, Extensible Space and Empty Space. Their priorities are defined as follows: Empty Space > Available Space > Extensible Space. The minimum-cost space is selected for the decap insertion. Figure 2.25 illustrates the NDP_MAI approach.

Figure 2.24 The space relation between decap locations and area.
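The priority rule for the three space types can be sketched as a simple ordering. In the Python sketch below, the type names follow the text, while the candidate tuples and the use of the area increase as a tie-breaker are assumptions made for illustration.

```python
from enum import IntEnum


class SpaceType(IntEnum):
    """Lower value means higher priority for decap insertion."""
    EMPTY = 0
    AVAILABLE = 1
    EXTENSIBLE = 2


def pick_space(candidates):
    """Pick the candidate with the best priority, then the least area increase.

    Each candidate is a (label, space_type, area_increase) tuple.
    """
    return min(candidates, key=lambda c: (c[1], c[2]))


# Candidates modelled on the Figure 2.24 discussion (area increase in µm²).
candidates = [
    ("channel between H and I", SpaceType.EXTENSIBLE, 285.0),
    ("channel between L and J", SpaceType.AVAILABLE, 0.0),
    ("corner between H and I", SpaceType.EMPTY, 0.0),
]
chosen = pick_space(candidates)
print(chosen[0])
```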

2.3.4 Decap Compensation for Voltage Drop in the Power Network

When EQ(2.2)-EQ(2.4) are used to calculate the required decap for each module, the decap is assumed to be placed next to the target module. The NDP_MAI approach does not consider this factor when placing the decap. If the decap is not located near the target, the power supply noise may violate the given constraint because part of the power supplied by the decap is consumed by the wire resistance. To address this problem, we use a simple compensation as follows:

V ........................................................................................(2.7)


Figure 2.25 The NDP_MAI flow chart.

where is the required decap of module k after the compensation, V is the

supply voltage, is the distance from the space to the connection point, and is the

wire capacitance per unit length. Although EQ(2.7) compensates for the power consumed by the wire, the power network is another important factor that affects the power supplied by the decap. Figure 2.26(A) shows the power network after the decap insertion. We can use the superposition theorem to analyze the circuit, as Figure 2.26(B) and (C) show. The discharge current from the decap is shared among different modules because they all depend on the same power network. If the decap budget computation does not consider this factor, the power supply noise constraint may be violated.

Figure 2.26 Circuit analysis for the power network.

To solve the above problem, we need a more accurate compensation equation than EQ(2.7). According to the current divider rule,

V ........................................................................(2.8)

where is the total resistance on the side of the module k and is the total

resistance of the other sides (Figure 2.26(B)). EQ(2.8) compensates the required decap more accurately. We use SPICE to verify the accuracy of these two compensation equations. In our experiment (the circuit in Figure 2.26), the modules have the same resistance. We expect module B to obtain 2.75mA from the decap. If we use EQ(2.7) to correct the decap, module B obtains only 2.66mA, which is insufficient. If we use EQ(2.8) to adjust the decap, it obtains 2.78mA from the decap. This experiment shows that the module can obtain sufficient current from the decap when we use EQ(2.2)-(2.4) and EQ(2.8) to compute the required decap.

2.4 Experimental Results

We implemented the power-supply noise-driven floorplan algorithm, the NDP_MAI approach, and the approach in [17] in C++ on an AMD 3200 computer with 1GB of memory. Tables 2.6 and 2.7 compare the run time, peak noise, and decap budget with [17]. The purpose of this work is to analyze the effect of effective decap insertion at the floorplan level. To obtain an equivalent comparison, the original

cost function of [17] is " A A W W ", where " W W " denotes the

wire length cost. We set as zero because the wire length is ignored in our cost

function. Five MCNC benchmark circuits (apte, hp, xerox, ami33 and ami49) are used to test the performance of the proposed methodology. Since the MCNC benchmarks include no noise constraint, the noise constraint is set at 0.13V and 0.25V. In our experiments, the operation times tw0 and tw1 of the switching current waveform are set to 0.3 and 0.8. The power supply voltage is 1.2V, the distance between two adjacent VDD bumps is 1000µm, and the power supply mesh is 333.3µm. We use the method of [17] to generate the current consumption information of each module.

The for module k is A D , where A is the area of module k, and D is the

worst case current density. is assigned as a random value to be 1.05 ~2 .

Table 2.6 compares our method with [17] in terms of the peak noise after noise-aware floorplanning (noise-driven) and after post-floorplanning decap insertion (post). Experimental results show that our floorplan method obtains better results than [17]. The main reason is that the high-current-consumption modules are placed apart when we use EQ(2.1) to compute the peak noise. The run time of our floorplan method is slightly higher than that of the method in [17]. The main reasons are: (1) the time complexity of the original Insert operation is O(m) and that of the new Insert operation is O(m²), so the new operations are more expensive than the original O-tree operations; (2) in [17], the authors use a sequence-pair-based floorplanner to plan blocks. The sequence pair method modifies the list order to change the floorplan result, so the time complexity of each change should be lower than that of the original O-tree method. In the post-floorplanning decap insertion, all results conform to the given constraint, 0.13V, and both run times are very short.

Table 2.6 The peak noise after floorplanning, the decap insertion and run time.

| Circuit | Our Method: Peak Noise (V) (noise-driven) | Our Method: Peak Noise (V) (post) | Our Method: Run Time (s) (noise-driven) | [17]: Peak Noise (V) (noise-driven) | [17]: Peak Noise (V) (post) | [17]: Run Time (s) (noise-driven) |
|---|---|---|---|---|---|---|
| apte | 2.05 | 0.13 | 2 | 2.05 | 0.11 | |

Simulations have been run for xerox and ami33 with HSPICE. The peak noise before

decap insertion is 1.63V for xerox, 0.29V for ami33. After applying [17]'s method for

decap insertion, the peak noise is 0.06V for xerox, 0.11V for ami33. If we use our

proposed method, the peak noise is 0.04V for xerox, 0.06V for ami33. These results

are close to our results in Table 2.7.

Table 2.7 The comparison table of our decap computation and [17].

| Circuit | Our Decap Budget (nF) | Our Peak Noise (V) | [17] Decap Budget (nF) | [17] Peak Noise (V) | Decrease Ratio |
|---|---|---|---|---|---|
| apte | 14.77 | 0.13 | 22.12 | 0.11 | 33% |
| hp | 3.6 | 0.13 | 5.01 | 0.11 | 28% |
| xerox | 6.53 | 0.13 | 9.55 | 0.10 | 31% |
| ami33 | 0.17 | 0.12 | 0.36 | 0.09 | 52% |
| ami49 | 7.85 | 0.12 | 14.54 | 0.10 | 46% |
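The Decrease Ratio column can be checked directly from the two decap budget columns. The short Python sketch below assumes that the ratio is the relative reduction with respect to the budget of [17] and that the percentages are truncated rather than rounded; both are inferences from the numbers, not statements from the text.

```python
# (our decap budget, decap budget of [17]) in nF, copied from Table 2.7.
budgets = {
    "apte": (14.77, 22.12),
    "hp": (3.6, 5.01),
    "xerox": (6.53, 9.55),
    "ami33": (0.17, 0.36),
    "ami49": (7.85, 14.54),
}


def decrease_ratio(ours, reference):
    """Relative decap reduction versus the reference budget, in percent."""
    return (reference - ours) / reference * 100.0


ratios = {name: int(decrease_ratio(ours, ref))   # int() truncates
          for name, (ours, ref) in budgets.items()}
print(ratios)
```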

Table 2.8 compares the area information of the proposed methodology with [17]. In the third column, the incremental area is computed entirely with the method of [17]. In the fourth column, we adopt the floorplan and decap computation of [17] but use our decap insertion method. Comparing the third and fourth columns shows that our proposed floorplanning framework alone has outperformed the method in [17]. According to this result, our decap insertion method is better than that of [17]. The main reason is that our method cuts each decap into four smaller decaps to minimally increase the floorplan area. The incremental area of our proposed method is shown in the last column. Compared to the numbers reported in previous papers, the proposed floorplanning framework creates better initial floorplans to work on, followed by the effective NDP_MAI approach to insert enough decaps.

Table 2.8 Experimental results for some MCNC benchmarks with various approaches for comparison.

| Circuit | Modules | [17] Increased Area (µm²) | [17]'s Decap + Our Insertion Increased Area (µm²) | Our Methodology Increased Area (µm²) |
|---|---|---|---|---|
| apte | 9 | 356832 | 292618 | 19036 |
| hp | 11 | 76608 | 68184 | 157640 |
| xerox | 10 | 152061 | 121648 | 75006 |
| ami33 | 33 | 8228 | 5082 | 3824 |
| ami49 | 49 | 266616 | 201486 | 154990 |

Chapter 3 Package Routability- and IR-Drop-Aware Finger/Pad Assignment

The trend in VLSI is to integrate more and more electronic devices into a single chip, while performance requirements keep getting more demanding. To achieve this objective, finger/pad counts are continuously increasing, so package and chip design become more and more complex. Due to the increasing complexity of the design interactions between the chip and the package, it is essential to consider them at the same time. In addition, the reduced supply voltages in modern chip design are tightening the noise margin. IR-drop is an important design issue, and it is an unavoidable loss when the circuit obtains energy from a power source.

To handle core and package problems simultaneously, co-design of the core and the package is a widely adopted solution, particularly because the finger/pad locations significantly affect both the IR-drop of the core and the package routing. In this chapter, we develop chip-package co-design techniques to determine the locations of the fingers/pads with package routability and signal integrity in mind, for 2-D and stacking IC design. Our finger/pad assignment is a two-step method: we first solve the wire congestion problem in package routing and then try to minimize the IR-drop violation and the length of the bonding wires. The experimental results are encouraging. Compared with the randomly optimized method, on average, our approaches reduce the maximum package density by 42% and 68% for the two technologies, the IR-drop by 10.61% and 4.58%, and the bonding wire length for stacking IC by 15.66%.

3.1 Overview of Package Design Methods

In the traditional design methodology, the core and package of a chip are designed separately, as shown in Figure 3.1(A) and (C). Core designers assume that package problems will not affect the performance of the chip. However, the performance, complexity, and noise of the package critically affect the chip [26]. In new chip design paradigms such as stacking IC [5], the package design determines the final quality of the chip. Therefore, a high-quality package design is needed for a modern chip design.

Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs.

As VLSI technology enters the nanometer era, chips contain more functions and

are expected to have much better performance. At the same time, finger/pad counts

are continually increased. This adds up to more routing complexity in the package

design. In early package technologies, the number of available finger/pads was small,

such as Dual In-line Package (DIP) or Pin Grid Array (PGA). The Ball Grid Array

(BGA) is a popular package technology for modern package design because it can

handle high finger/pad counts to connect to the Printed Circuit Board (PCB). The


package design flow can be divided into several parts, as shown in Figure 3.1(A). The

major problem in package design is routing. Many researchers [27][28][29][30] have

proposed various approaches to solving the routing problem in package design. Using

finger/pad assignments to improve the package routing is another alternative. In

[31][32][33], the authors proposed numerous assignment algorithms to improve the

routing problem. Because these methods can only handle a small finger count (


separately in the modern chip design, as shown in Figure 3.2(B). This practice leads to over-design. Package designers usually use a finger planning method to improve package routing, and core designers use a noise-driven I/O planning method to improve the IR-drop of the core. To build a functionally correct chip, designers usually over-design the chip to mitigate routability-related (noise-related) issues in the finger planning (I/O planning) step. This over-design brings two disadvantages: a longer cycle time and a higher cost for the chip design. If we perform chip-package co-design to simultaneously account for the interdependent influence of IR-drop and package routing across the die and chip, these disadvantages can be easily eliminated.

Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method.

We enhance the method of [37] so that it can be applied to stacking ICs. We develop a two-step approach that simultaneously improves the package congestion and the IR-drop of the core at the finger/pad assignment step for a 2-D IC. When this approach is used for a stacking IC, the length of the bonding wires, the package congestion and the IR-drop can be improved simultaneously. The approach consists of one congestion-driven assignment and one finger/pad exchange step, as shown in Figure 3.1(B). Our contributions are summarized as follows.

• We present a finger/pad assignment method to minimize the maximum wire congestion, and propose a finger/pad exchange method to improve the IR-drop of the core in a stacking IC design. The assignment result is guaranteed to lead to a legal routing solution.

• We propose an efficient estimation to analyze the wire congestion before routing. This method does not need to analyze the whole substrate, and it can directly find the most congested region.

• We develop a co-design methodology to simultaneously address the problems of the package and the core in (stacking) ICs. The cycle time for the chip design can be greatly shortened.

The rest of this chapter is organized as follows. Section 3.2 describes the package

architecture in 2-D and stacking ICs, finger/pad assignment design with congestion

and IR-drop consideration, and the problem formulation. Section 3.3 presents two

congestion-driven assignment methods and one finger/pad exchange method to

improve package problems and IR-drop. Section 3.4 shows the experimental results.

3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning

To deliver large amounts of data in modern chip designs, finger/pad counts are continually increasing and the complexity of package routing rises accordingly. In addition, the IR-drop issue seriously impacts the performance of the chip. The finger/pad locations affect not only the package routing but also the IR-drop of the core. This study focuses primarily on these problems. We first introduce our package model and then describe the sources of the package routing and IR-drop problems. Finally, we formulate the target problem of this work.

3.2.1 Architecture and Routing of BGA Package in 2-D IC

Modern package technology allows multiple layers to be used for package routing. Our package model has two routing layers, with the die on the top layer of the substrate and the bump balls on the bottom layer of the substrate. The fingers, which relay signals from the pads to the package substrate, are placed along a closed rectangle on Layer 1. The pads can be connected to the fingers by wire-bond or flip-chip [1] technologies. Because wire-bond packages are cheaper than flip-chip packages, we adopt wire bonding to connect the die and the package substrate in our package model. The detailed architecture is shown in Figure 3.3: Figure 3.3(A) shows the vertical view and (B) shows the profile. Bump balls, which are connected to the printed circuit board, are uniformly distributed on Layer 2. The net between a finger and a bump ball is routed within the package substrate on Layer 1 and Layer 2. A via connects a wire on Layer 1 to a wire on Layer 2, as shown in Figure 3.3(B). In addition, we partition the package area into four parts and solve the package problems for each part individually. We also assume that the finger order and the pad order are the same.

Because the via count affects the performance and the area of the package, we

limit each net to at most one via in our package routing. In

addition, the candidate locations for the vias are around the bump balls, and the number of

vias between four adjacent bump balls is at most one. In [28], the authors proposed a

global routing method to plan the via locations and the net paths such that the routing result

satisfies the monotonic characteristic: the net

from the finger to the bump ball intersects every horizontal grid line only once.

Therefore, detour routing does not occur and the wire length is reduced. We

adopt the idea of [28] to plan the via location and the routing path for the same

purposes.

Figure 3.3 The architecture of the two-layer ball grid array package. (A) The vertical view. (B) The profile.

3.2.2 Architecture and Influence of BGA Package in Stacking ICs

Compared with 2-D ICs, stacking ICs have a different bonding-wire architecture,

as shown in Figure 1.5. If the stacking effect is ignored (as shown in Figure

1.5(A)), the chip performance worsens because the bonding wires are longer

and their resistance and inductance increase with the wire length. In

addition, the bonding wire yield is lower if the distance between the finger and

the connected pad is longer. Figure 1.5(B) shows the optimal result for the finger/pad

planning. To achieve this target, we need to consider the stacking factor in the

finger/pad planning method.

3.2.3 The Impact of Finger/Pad Locations on Wire Congestion

The vias are evenly distributed on the substrate in our package architecture. We

define the density as the wire count between two adjacent vias. If the

density is high, too many wires pass through a narrow region,

and a design rule violation is likely to occur. To alleviate this problem, it

is essential to develop a good method to control the density. The relationship between

the density, via location and routing method is detailed in [28]. This work focuses on

the relationship between the density and the finger/pad locations.

A good finger/pad assignment can help to reduce the density of the package

routing. We use an example to explain the relationship between the density and the

finger/pad assignments. To display the importance of the finger/pad assignments, the

via locations and routing method are fixed in the example. In Figure 3.4(A), we use a

random method to generate the finger order, 10,1,2,3,11,6,9,4,5,8,7,0. In Figure

3.4(B), a congestion-driven assignment method is used to generate a new finger order,

10,11,1,2,6,3,4,9,5,7,8,0. Comparing Figure 3.4(B) with (A), the maximum density is

reduced by 50% merely by changing the finger order.

3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation

IR-drop is the unavoidable voltage loss that occurs when the circuit draws

current from the power pads through resistive wires. The IR-drop problem is worse in a

wire-bond package than in a flip-chip package. The

main reason is that the distance from the power pad to the module in a flip-chip

package is shorter than in a wire-bond package. However, as we move into the

nanometer regime, the resistance of the connection wires consumes more of the supply

energy. If the power pads cannot supply enough energy, the voltage drop might violate

the lower-bound constraint. In this work, we modify the location of each power

pad to reduce the resistance of the connection wires and thereby improve

IR-drop. We use a real chip design and commercial tools to verify

this concept. The simulated result is shown in Figure 3.5. Comparing Figure 3.5(B)

with (A), IR-drop is greatly improved just by changing the pad locations.

Figure 3.4 The relationship between the density and the finger/pad locations.

Figure 3.5 The simulation results of IR-drop.

To minimize the cycle time of the chip design, we need a good and efficient

model to analyze IR-drop. This is usually done after floorplanning and placement

[38][39], and the results are shown to be close to the results from SPICE simulation.

In [40], the authors proposed an analytical model for use before floorplanning. Since the

finger/pad assignment problem is resolved before floorplanning, we adopt the model

in [40] to obtain the IR-drop map. Because this model is applied before the

core is planned, its accuracy is limited. The power grid model of [40] is shown in

Figure 3.6(A). Figure 3.6(B) is a node model for the grid. The authors assume that the

power consumption is the same at all locations, and propose the following

equation to calculate the IR-drop of each point.

$$\frac{V_{IR}(x+\Delta x, y) - V_{IR}(x, y)}{R_x} + \frac{V_{IR}(x-\Delta x, y) - V_{IR}(x, y)}{R_x} + \frac{V_{IR}(x, y+\Delta y) - V_{IR}(x, y)}{R_y} + \frac{V_{IR}(x, y-\Delta y) - V_{IR}(x, y)}{R_y} = J \, \Delta x \, \Delta y \tag{3.1}$$

where V_IR(x, y) is the voltage of a point (x, y), J is the current density, and R_x

and R_y are the resistances in the x and y directions. According to EQ(3.1), we can

exchange power pad locations to minimize Δx and Δy to improve IR-drop.
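As a rough illustration of how pad locations change the IR-drop map on a uniform resistive grid, the sketch below relaxes the node equation with Gauss-Seidel sweeps. It is not the analytical model of [40]: the grid size, the per-node current `i_node` (standing in for J times the cell area), the unit resistances, and the function names are all assumptions chosen for the example.

```python
def solve_ir_drop(rows, cols, pads, i_node=1.0, r_x=1.0, r_y=1.0, sweeps=3000):
    """Return the IR-drop at every node of a rows x cols resistive grid.

    pads   : set of (row, col) nodes tied to the supply (zero drop).
    i_node : current drawn at each non-pad node (assumed uniform).
    r_x/r_y: resistance of one horizontal / vertical grid segment.
    The grid is assumed to contain at least two nodes.
    """
    pads = set(pads)
    drop = [[0.0] * cols for _ in range(rows)]
    neighbours = ((0, -1, r_x), (0, 1, r_x), (-1, 0, r_y), (1, 0, r_y))
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                if (r, c) in pads:
                    continue  # pad nodes stay at zero drop
                num, den = i_node, 0.0
                for dr, dc, res in neighbours:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        num += drop[rr][cc] / res
                        den += 1.0 / res
                # Kirchhoff's current law solved for this node's drop
                drop[r][c] = num / den
    return drop


def max_drop(drop):
    """Worst-case IR-drop over the whole grid."""
    return max(max(row) for row in drop)


# Two power pads clustered at one corner versus spread to opposite edges.
clustered = max_drop(solve_ir_drop(7, 7, {(0, 0), (0, 1)}))
spread = max_drop(solve_ir_drop(7, 7, {(0, 3), (6, 3)}))
print(spread < clustered)
```

In this toy setting the spread placement gives the smaller worst-case drop, which matches the intuition that shortening the distance from each point to its nearest power pad improves IR-drop.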

Figure 3.6 The analysis model for IR-drop.

3.2.5 Problem Formulation

We have detailed the relationships between the wire congestion, IR-drop and

finger/pad locations in 2-D and stacking ICs. In modern chips, the finger/pad counts

keep increasing and the supply voltages keep decreasing. Issues

related to the wire congestion on the substrate, the IR-drop of the core and the bonding wires of

stacking ICs are becoming more and more serious. The goal of this work is to plan

nets on regular finger/pad locations to mitigate these issues. In other words, we

decrease the density, the voltage drop of the core, and the length of the bonding wires by

reassigning the finger/pad locations. The problem can be formulated as follows:

Input : The locations of the fingers/pads, F1, F2, ..., ordered from left to right; the

set of the net names, N1, N2, ..., and the type of each net; and the locations of the

bump balls, B1,1,1, B2,1,2, ..., where each bump ball Bb,x,y is identified by its net name b

and its coordinates (x, y). The total net count and the total finger/pad count are also

given. In addition, we must set the tier number and the pad

number for each tier.

Output : The assignment of each net Nb to a finger/pad location Fa, where b ranges over

all nets and a ranges over all finger/pad locations.

Objective : Minimize the maximum density and the voltage drop of the core, and

reduce the length of the bonding wires based on a pre-floorplan model.

3.3 Congestion-driven Finger/Pad Assignment with IR-Drop Improvement

To solve the density and IR-drop problems, we propose a two-step methodology

at the finger/pad planning level, as Figure 3.1(B) illustrates. We first propose two

congestion-driven finger/pad assignment methods to improve the package density; the

idea is to calculate the ideal density and compute a suitable finger/pad order and

locations. We then present a finger/pad exchanging approach to reduce IR-drop. This

exchange approach will simultaneously consider the density, IR-drop and bonding

wires.

3.3.1 Congestion-driven Finger/Pad Assignment

Monotonic routing is a routing method for package design [28] that can provide a

high-quality routing result. This work adopts this routing principle to verify the effect

of the assignment method. Based on the monotonic characteristic, [28] proposed a via

assignment rule. For each finger Fa, the target bump ball is Bb,x,y, the net name is Nb,

and the connected via is Vb. The coordinates of Vb are (Vb,x,Vb,y). We randomly

choose two nets Nb1 and Nb2. The connected finger/pad locations are Fa1 and Fa2 and

the connected via locations are (Vb1,x,Vb1,y) and (Vb2,x,Vb2,y). If Vb1,x < Vb2,x and Vb1,y

= Vb2,y, then a1 must be smaller than a2. In other words, vias on the same horizontal line

appear in the same left-to-right order as their fingers. An example can help to explain the rule. In

Figure 3.4(A), the finger locations from the left to the right are F1, F2, ... F12, and the

finger order is _,11,_,_,6,_,_,9,_,_,_,_. The via order in y=2 is 11,6,9. If the via order

conforms to this rule, a legal monotonic routing certainly exists in this package. In

this work, we assume that the connected via is fixed at the bottom-left corner of the

bump ball and use the routing method from [28] to show the effectiveness of the

finger/pad assignment. To improve the maximum density, a better finger/pad

assignment method is needed. Here we propose two congestion-driven finger/pad

assignment approaches: Intuitive-Insertion-Based Finger/Pad Assignment and

Density-Interval-Based Finger/Pad Assignment.
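The via-ordering rule above can be checked mechanically. The following is a minimal sketch, assuming each horizontal line is given as the left-to-right list of net names on its bump balls. The function name and the `ROWS` layout are illustrative: the lines y=2 and y=1 follow the nets named in the text, while the ordering on y=0 is inferred from the finger orders quoted for Figure 3.4 and is therefore an assumption.

```python
def is_monotonic_legal(finger_order, rows):
    """Check that, on every horizontal line, the bump balls read left to
    right are connected to fingers that also run left to right."""
    position = {net: idx for idx, net in enumerate(finger_order)}
    for row in rows:
        slots = [position[net] for net in row]
        # each finger index must be strictly larger than the previous one
        if any(a >= b for a, b in zip(slots, slots[1:])):
            return False
    return True


# Assumed bump-ball layout, highest horizontal line first.
ROWS = [[11, 6, 9], [1, 3, 5, 8], [10, 2, 4, 7, 0]]

random_order = [10, 1, 2, 3, 11, 6, 9, 4, 5, 8, 7, 0]   # Figure 3.4(A)
swapped = [10, 1, 2, 3, 6, 11, 9, 4, 5, 8, 7, 0]        # nets 11 and 6 swapped

print(is_monotonic_legal(random_order, ROWS))  # True
print(is_monotonic_legal(swapped, ROWS))       # False
```

Even the random order of Figure 3.4(A) satisfies the rule; the rule guarantees that a monotonic routing exists, but says nothing about how dense that routing is.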

Intuitive-Insertion-Based Finger/Pad Assignment (IFA)

This method relies on insertion to avoid violating the monotonic

rule. The pseudo code is shown in Figure 3.7. In the IFA method, the first step is to

find the unrouted horizontal lines (line 1). For each horizontal line, we calculate the

number of bump balls (line 2). For the first horizontal line (y=n, where n is the highest

horizontal line), the net name of each bump ball Bi,x,y is directly assigned to Fx (lines

3-5). For the other horizontal lines (y=n-1 to 0), the net name of the first bump ball Bi,1,y

is assigned to F1 and the net names of the bump balls (x=2 to m-1) are assigned at Fb-1, where

Fb denotes the (x-1)th bump ball location in the y-1 horizontal line (lines 7-11). The

net name of the last bump ball is directly inserted into the last finger location (line 13).

The time complexity of IFA is O(n^2).

Figure 3.7 The pseudo code of the IFA method.

We can use an example to explain the IFA flow. In this example, the locations of

the bump balls and nets are the same as in Figure 3.8. An illustration of the IFA is

shown in Figure 3.8(A) and the routing result is shown in Figure 3.8(B). In Figure

3.8(A), because nets 11, 6, and 9 are set at the highest horizontal line (y=2), step 1

assigns these three nets into finger locations F1, F2, and F3. Step 2 inserts nets 1, 3, 5,

and 8 (y=1) into suitable locations. Net 1 is set at Bi,1,y; we assign net 1 into F1 and the

other nets on the finger move to the next finger location. For net 3, the bump ball

location is B3,2,1. The net name on Bi,2,1+1 is "Net 6". Therefore, net 3 is inserted before

net 6. Net 5 uses the same method to obtain a suitable location. Net 8 is inserted into

the last location because it is the last net on this line. Step 3 repeats step 2 to insert

remaining nets. The final finger order is 10,1,11,2,3,6,4,5,9,7,8,0. The routing result is

shown in Figure 3.4(B) and the density is 2. Compared with Figure 3.4(A),

the maximum density has decreased by 50%.
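The insertion steps of this example can be sketched in a few lines of Python. This is one reading of the walkthrough, not the pseudo code of Figure 3.7: it assumes that a middle bump ball at column x is inserted immediately before the net whose bump ball sits at the same column on the line above, and that the line above has at least one fewer ball than the current line. The function name `ifa_assign` and the ordering of the y=0 line are assumptions (the latter inferred from the final finger order).

```python
def ifa_assign(rows):
    """Build a finger order by insertion.

    rows: net names of the bump balls on each horizontal line,
          highest line first, each line listed left to right.
    """
    order = list(rows[0])                 # highest line: assign directly
    for upper, row in zip(rows, rows[1:]):
        order.insert(0, row[0])           # first ball takes the first finger
        for x in range(1, len(row) - 1):
            # insert before the net at the same column on the line above
            order.insert(order.index(upper[x]), row[x])
        if len(row) > 1:
            order.append(row[-1])         # last ball takes the last finger
    return order


rows = [[11, 6, 9], [1, 3, 5, 8], [10, 2, 4, 7, 0]]
print(ifa_assign(rows))
# [10, 1, 11, 2, 3, 6, 4, 5, 9, 7, 8, 0]
```

Each insertion scans the current order once, which is consistent with the quadratic time complexity stated for IFA.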

Figure 3.8 (A) The IFA assignment result. (B) The routing result.

Density-Interval-Based Finger/Pad Assignment (DFA)

If IFA is applied to a two-level BGA package, the routing result is satisfactory. If IFA

is applied to a BGA package with three or more levels, the result is imperfect because

the insertion method of IFA only considers two horizontal lines. We propose another

method, Density-interval-based Finger/pad Assignment (DFA), to solve this problem.

The pseudo code is shown in Figure 3.9. We first determine a processing priority

based on the coordinates of all the horizontal lines, where n is the total number of

horizontal lines (line 1). For each horizontal line, we calculate the number of bump

balls (line 2). Then, the density interval (DI) is computed (line 3), where "Total

Non-allocated Net" denotes the number of nets not yet connected to a via, "Total Via

Number" denotes the number of vias on the horizontal line, and "Used Via Number"

denotes the number of vias used on the horizontal line. "(Total Via Number + 1)" denotes the

number of segments in this horizontal line. For each bump ball (Bi,x,y, 1 ≤ x ≤ m), we calculate

the empty number (EN) and insert the net name into the (EN + 1)th location (lines 4-7),

where EN denotes the number of empty finger slots to skip. The time complexity of DFA

is O(n). With this method, the net names are evenly

distributed over the finger/pad locations for each horizontal line, so the routing paths of all

nets are spread evenly over the whole substrate.

Figure 3.9 The pseudo code of the DFA method.

We use the same example to show the effectiveness of DFA. An illustration of

DFA is shown in Figure 3.10. Because nets 11, 6, and 9 are set at the highest

horizontal line (y=2), the first step is to decide on the finger locations of these three

nets. According to the input information, the bump balls of these three nets are B11,1,2,

B6,2,2 and B9,3,2. The Total Non-allocated Net is 12, Total Via Number is 4 and Used

Via Number is 3, so DI = (12-3)/(4+1) = 1.8. For net 11, the computed EN is 1. Therefore,

net 11 is inserted into F2 because F1 is an empty slot. For net 6, the computed EN is 3.

Because F2 is occupied, F1, F3, and F4 are the first three unassigned spaces, and net 6 is assigned to

the (3+1)th unassigned space, F5. Using the same method, all of the nets can be

inserted into suitable locations. The final order of the nets is 10,11,1,2,6,3,4,9,5,7,8,0,

as shown in Figure 3.10(C), and the routing result is shown in Figure 3.4(B).

Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method.

DFA can obtain a better finger order when the finger number and the number of bump levels

are large. We use another example to show that DFA is better than IFA. The nets and

bump ball locations are shown in Figure 3.11. If IFA is used to plan these nets, the

finger order is 13,7,3,1,14,8,4,2,15,9,5,16,10,6,17,11,18,12,19,20, and the density of

package routing is 6, as shown in Figure 3.11(A). If we adopt DFA to plan, the finger

order of nets is 13,7,3,14,1,4,8,15,9,5,2,16,10,17,6,11,18,12,19,20, and the density is

5, as shown in Figure 3.11(B).

Figure 3.11 The comparison of IFA and DFA. (A) The IFA routing result. (B) The DFA routing result.

3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement

After obtaining an initial net order for the finger/pad locations, we can exchange nets in this

order to improve the IR-drop of the core. If we directly use EQ(3.1) to calculate IR-drop,

the analysis time for the chip is very long, because the numbers of analysis points

and power pads are very large in chip designs. To address this problem, an

efficient method for quickly analyzing IR-drop is needed. In this dissertation, we

use the variation of Δx and Δy as the indicator of the IR-drop improvement

when the location of a power pad is exchanged. Used alone, this method would cause

high-density routing in the package because the density problem is ignored in

this computation. We therefore propose a method to improve IR-drop while

simultaneously suppressing the density.

Section 3.3.1 introduced the monotonic order. If our exchange method ignores

this principle, a monotonic routing may no longer exist in the package. To maintain

this property, we add a range constraint to our exchange method. The key idea of the

range constraint is to map the monotonic order of the vias [28] onto the finger

locations. We choose three bump balls Bb1,x1,y1, Bb2,x2,y2 and Bb3,x3,y3, whose

connected fingers are Fa1, Fa2 and Fa3. If x1 < x2 < x3 and y1 = y2 = y3, Fa1

must lie to the left of Fa2 and Fa3 must lie to the right of Fa2. We use an

example to explain how we formulate the constraint. In Figure 3.4(B), net 6 is

assigned at F5, and the exchange range of net 6 is between F3 and F7. If the exchange

range is not limited, we must pay a higher cost to find a suitable connected via

to build the monotonic routing.
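The range constraint in this example can be derived directly from the finger order and the bump-ball layout. The sketch below is a hypothetical helper, not the dissertation's implementation: it bounds a net by the fingers of its left and right neighbours on the same horizontal line, and it assumes that a net without a neighbour on one side is bounded by the first or last finger. The y=0 line of `ROWS` is inferred from the finger orders quoted for Figure 3.4.

```python
def exchange_range(net, finger_order, rows):
    """Return (low, high), the 1-based finger indices between which `net`
    may move without breaking the left-to-right order on its own line."""
    position = {n: i + 1 for i, n in enumerate(finger_order)}
    for row in rows:
        if net in row:
            k = row.index(net)
            # stay strictly right of the left neighbour's finger
            low = position[row[k - 1]] + 1 if k > 0 else 1
            # stay strictly left of the right neighbour's finger
            high = (position[row[k + 1]] - 1 if k + 1 < len(row)
                    else len(finger_order))
            return low, high
    raise ValueError(f"net {net} is not on any horizontal line")


ROWS = [[11, 6, 9], [1, 3, 5, 8], [10, 2, 4, 7, 0]]
ORDER_B = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]   # Figure 3.4(B)

print(exchange_range(6, ORDER_B, ROWS))   # (3, 7)
```

For net 6 the neighbours on y=2 are net 11 at F2 and net 9 at F8, which reproduces the range F3 to F7 given in the text.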

When the fingers/pads are exchanged, the package density needs to be controlled

at the same time. We propose a control method. After the congestion-driven

assignment step, the initial order of the nets on the finger/pad locations is determined.

The locations of the bump balls planned at the highest horizontal line should be recorded.

This recording is needed because the monotonic rule is used in our

package routing: the density of a higher horizontal line is higher than the density of

a lower horizontal line. Therefore, we only monitor the density in the highest

horizontal line. If the recorded number is x, the nets can be divided into x+1 sections,

S_c, 0 < c ≤ x+1. For each section, we record the interval number I_c, 1 ≤ c ≤ x+1.

When the nets are exchanged, the interval numbers change.

The new numbers are called I'_c, 1 ≤ c ≤ x+1. Therefore, the increased density (ID)

can be computed as follows:

$$ID = \max_{1 \le c \le x+1} \left( I'_c - I_c \right) \tag{3.2}$$

The package density is inversely proportional to the value of ID.

The impact of bonding wires should be considered in the finger/pad exchange

method when there are more than two tiers. We propose a method to improve the bonding

wires; the idea is to distribute the pads of different tiers equidistantly. According to the tier

number, we make a unique parameter UPd for each tier d. This

unique parameter has as many bits as there are tiers; one bit denotes one tier. We can use an example to

explain the method for making this parameter. If the tier number is 3, the parameters

from Tier 1 to Tier 3 are "001", "010" and "100". Every finger has one bonding wire

to connect to the pad. The pad is set at Tier d, and Tier d has one unique

parameter, UPd. Therefore, we set the parameter of the fingers that connect to Tier d

as UPd. The set of finger locations, F1,...F, ar
