VLSI Design Planning with Power Integrity and
I/O Constraints
by
Chao-Hung Lu
A dissertation submitted in partial satisfaction of the
requirements for the degree of
Doctor of Philosophy
in Electrical Engineering
in the GRADUATE DIVISION
of the NATIONAL CENTRAL UNIVERSITY
Taiwan, Republic of China
Professor Chien-Nan Liu and Hung-Ming Chen
January 2010
Abstract
In modern VLSI designs, manufacturing issues have complicated the design of chips as well as packages. Moreover, driven by market requirements, modern circuits have higher functionality, lower supply voltages and more I/Os. These conditions increase the complexity of chip design. In this dissertation, we present I/O planning and floorplanning methods to address these problems. They can not only be applied to mitigate power supply noise in the core, but can also take package design and stacking IC design into account.
For simultaneous switching noise, our method adopts a two-stage technique: floorplanning followed by decoupling capacitance (decap) insertion. In the floorplanning stage, area and noise are evaluated together to find a noise-driven floorplan. Then, we use a noise-driven decap planning approach to insert minimal decaps into the floorplan. For the IR-drop and package issues, we adopt a two-step finger/pad assignment method: we first solve the package design problem, and then try to minimize IR-drop by exchanging finger/pad locations. In addition, since stacking ICs are promising for the development of high-performance ICs, we propose a partition approach that minimizes the number of 3D-vias and balances the number of I/Os on each tier of a stacking IC. Finally, we perform floorplanning to show the importance of the aspect-ratio factor in stacking ICs.
Contents
Chapter 1 Introduction
1.1 Trends in VLSI
1.2 Stacking IC Advantage and Technology
1.3 Power Integrity Impacts in Chip Design
1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs
1.5 Dissertation Organization
Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture
2.1 Overview of Decap Insertion
2.2 Power Delivery and Signal Integrity Issues
2.2.1 Power Delivery Model and Noise Estimation
2.2.2 Decap Budgeting in Area-Array Architecture
2.2.3 Problem Formulation
2.3 Minimal Decap Allocation in Power-Supply-Noise-Aware Floorplanning
2.3.1 O-Tree-Based Power-Supply-Noise-Aware Floorplanning
2.3.2 Feasible Region for Decap Allocation
2.3.3 Identification of Space Priority for Decap Insertion
2.3.4 Decap Compensation for Voltage Drop in Power Network
2.4 Experimental Results
Chapter 3 Package Routability and IR-Drop-Aware Finger/Pad Assignment
3.1 Overview of Package Design Methods
3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning
3.2.1 Architecture and Routing of BGA Package in 2-D IC
3.2.2 Architecture and Influence of BGA Package in Stacking ICs
3.2.3 The Impact of Finger/Pad Locations on Wire Congestion
3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation
3.2.5 Problem Formulation
3.3 Congestion-driven Finger/Pad Assignment with IR-Drop Improvement
3.3.1 Congestion-driven Finger/Pad Assignment
3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement
3.4 Experimental Results
Chapter 4 Design Planning with 3D-Via Optimization in Stacking IC
4.1 Overview of Our Partition Method
4.2 Stacking IC Models and Design Flow
4.2.1 3D-Via and Stacking IC Models
4.2.2 Design Flow of Alternative Stacking IC
4.2.3 The Impact of I/O Location in Alternative Stacking IC
4.2.4 Problem Formulation
4.3 I/Os and Modules Planning with Minimal 3D-Via Number in Alternative Stacking ICs
4.3.1 Global Planning for I/Os and Modules
4.3.2 I/O Allocation by Congestion-driven Planning and Iterative Refinement
4.4 Experimental Results
Chapter 5 Concluding Remarks and Future Works
Reference
List of Figures
Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology.
Figure 1.2 The architectures of stacking ICs
Figure 1.3 Sub-classification of Wafer stacking ICs.
Figure 1.4 The example of Ldi/dt noise effect.
Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result.
Figure 2.1 (A) Flowchart of our proposed method (B) The illustration of our method
Figure 2.2 Power delivery model in the area-array architecture
Figure 2.3 (A) The current consumption profile of module. (B) Simulation result by HSPICE.
Figure 2.4 An O-tree example.
Figure 2.5 The difference between the original O-tree and our approach
Figure 2.6 The new Delete operation
Figure 2.7 The relation between the area and the rotary module
Figure 2.8 The new Insert operation
Figure 2.9 The power-supply noise-driven floorplan algorithm
Figure 2.10 The partition method for a decap.
Figure 2.11 The space relation between decap locations and area
Figure 2.12 The NDP_MAI flow chart.
Figure 2.13 Circuit analysis for the power network
Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs.
Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method.
Figure 3.3 The architecture of the two-layer ball grid array package.
Figure 3.4 The relationship between the density and the finger/pad locations.
Figure 3.5 The simulation results of IR-drop.
Figure 3.6 The analysis model for IR-drop.
Figure 3.7 The pseudo code of the IFA method.
Figure 3.8 (A) The IFA assignment result. (B) The routing result.
Figure 3.9 The pseudo code of the DFA method.
Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method.
Figure 3.11 The comparison of IFA and DFA.
Figure 3.12 The pseudo code of our finger/pad exchange method.
Figure 3.13 The routing results of Circuit 2. (A) Random (B) IFA (C) DFA
Figure 4.1 The alternative stacking architecture.
Figure 4.2 The flowchart of stacking ICs.
Figure 4.3 The area effect of aspect-ratio on stacking IC.
Figure 4.4 The effect of I/O number in the 2-D chip design.
Figure 4.5 The effect of I/O number in alternative stacking IC.
Figure 4.6 The relation between the via number and I/O locations.
Figure 4.7 The connection graph of the circuit
Figure 4.8 The pseudo code of CPIR.
Figure 4.9 The example of the CPIR method.
Figure 4.10 The detailed description of the I/O Planning Method.
Figure 4.11 The experimental result of the floorplan.
Figure 5.1 Heat dissipation of Chips [60]
Figure 5.2 The flowchart of the future co-design tool
List of Tables
Table 1.1 ITRS predictions for circuit performance in 2009 [1]
Table 1.2 A comparison of all classified stacking ICs
Table 2.1 A comparison of six floorplan representations
Table 2.2 The peak noise after floorplanning, the decap insertion and run time
Table 2.3 The comparison table of our decap computation and [17]
Table 2.4 Experimental results for some MCNC benchmarks with various approaches for comparison
Table 3.1 The experimental data of test circuits
Table 3.2 The maximum density, the total wire length and the maximum IR-drop in our test circuits
Table 3.3 The improved ratio of IR-drop and bonding wires
Table 4.1 The experimental result of our I/O planning methods targeting stacking architecture with 3, 4, 5 and 8 tiers
Chapter 1 Introduction
1.1 Trends in VLSI
Current trends in chip design are to integrate multiple functions into a single chip while simultaneously improving its size, power, performance and cost. To accomplish this goal, semiconductor manufacturing companies are continually increasing the number of transistors per square inch on integrated circuits and improving their manufacturing technology. The International Technology Roadmap for Semiconductors (ITRS) provides a prediction for Very Large Scale Integration (VLSI) growth [1], as Table 1.1 illustrates.
Table 1.1 ITRS predictions for circuit performance in 2009 [1]
Year                              2009   2010   2011   2012   2013   2014
ASIC M1 Pitch (nm)                54     45     38     32     27     24
Vdd (High-performance) (V)        1.0    0.97   0.93   0.87   0.84   0.81
Vdd (Low Operating Power) (V)     0.95   0.95   0.85   0.85   0.8    0.8
Transistors/Chip (millions)       773    773    1546   1546   3092   3092
On-chip Clock (GHz)               5.45   5.84   6.32   6.81   7.34   7.91
Power Consumption (W)             143    146    161    158    149    152
Modern chip design technology is continually advancing to meet these ITRS predictions. As VLSI technology enters the nanometer era, the resistance of interconnect wires is increasing greatly, and this resistance seriously degrades chip performance when ICs (Integrated Circuits) are manufactured in advanced technologies. Besides wire resistance, the power integrity issue must be addressed in chip design because the trend in VLSI is to reduce supply voltages. A lower supply voltage helps reduce power dissipation, but also decreases the noise margin of devices. Noise that violates the margin can cause erroneous chip functions and seriously reduce chip performance. As a result, the power integrity problem has become one of the major factors affecting chip yield. In System-on-Chip (SoC) designs, chips contain more functions and are expected to deliver much better performance. At the same time, I/O counts are continually increasing, which adds routing complexity to the package design. These problems are the topics of this dissertation.
1.2 Stacking IC Advantage and Technology
Stacking ICs are promising for the development of high-density, high-performance ICs. Stacking technology stacks one die (chip, wafer) over another die (chip, wafer). Transistors can be fabricated on different tiers, and the total wire length and chip size shrink thanks to vertical interconnections, as shown in Figure 1.1. The benefits of stacking ICs include improvements in density, noise, power, performance, and functionality.
(1) Density: In stacking ICs, transistors can be stacked and the package size can be reduced. Compared with 2-D standard cells, stacking cells offer a 30% improvement in area [2]. Density therefore increases when a 2-D IC is converted to a stacking IC, since circuit components can be placed on top of, or underneath, each other. As a result, higher-density and higher-speed circuits can be created with stacking ICs.
(2) Noise: Shorter wires have lower wire-to-wire capacitance, resulting in less noise coupling between signal lines. Shorter global wires with fewer repeaters should also have less noise and less jitter, providing better signal integrity. Since stacking ICs can greatly reduce the length of global wires, they greatly improve noise immunity.
(3) Power: The shorter wires will decrease the load capacitance, resistance and
the number of buffers needed. Since interconnect wires with their supporting repeaters
consume a significant portion of total dynamic power, the reduced average
interconnect length in stacking IC can reduce the total power consumption. Compared
with 2-D IC, stacking IC can reduce the wire length and significantly reduce total
dynamic power by more than 10% [3].
(4) Performance: Shorter wires decrease the time required to deliver a signal,
meaning that stacking IC can improve performance.
(5) Functionality: Stacked integration allows the combination of dissimilar
technologies (memory, RF, analog, logic) to create hybrid circuits.
Figure 1.1 The total wire length and size of a chip can be shrunk by the stacking technology.
Current research refers to stacking ICs as three-dimensional (3-D) ICs [4] or System in Package (SiP) [5]. Stacking ICs can be classified into four types: (1) package stacking; (2) chip stacking; (3) wafer stacking; and (4) device stacking. Figure 1.2 shows the differences between the types. In the package stacking approach, each chip is packaged before stacking, as Figure 1.2(A) illustrates. Chip stacking ICs [6] stack dies before packaging, as Figure 1.2(B) shows. Wafer stacking fabrication ([7],[8]) stacks the wafers before cutting, as Figure 1.2(C) shows. A wafer stacking IC is smaller than a chip stacking IC. Device stacking ICs, whose architecture is shown in Figure 1.2(D), are smaller and perform better than wafer stacking ICs. Because the manufacturing technologies for device stacking ICs [9] are relatively new, device stacking ICs cannot yet be manufactured by semiconductor manufacturing companies. Therefore, the mainstream of modern stacking IC manufacturing is wafer stacking.
Figure 1.2 The architectures of stacking ICs: (A) Package Stacking; (B) Chip Stacking; (C) Wafer Stacking; (D) Device Stacking.
In [10], wafer stacking is subdivided into two types: (1) chip-to-wafer, and (2) wafer-to-wafer. Figure 1.3 shows the differences between these approaches. In the chip-to-wafer manufacturing process, defective dies are removed before stacking [7], and this removal step increases the yield. The disadvantages of chip-to-wafer stacking are that the lower tier must be larger than the higher tiers and that the die-to-die spacing is wider than in wafer-to-wafer ICs. In wafer-to-wafer ICs, the wafers must be aligned before cutting [8]. The disadvantage of wafer-to-wafer stacking is that a defective die is still used in the stack even when it has been identified, as shown in Figure 1.3(B). Table 1.2 provides a comparison of all classified stacking ICs.
Figure 1.3 Sub-classification of Wafer stacking ICs. (A) Chip-to-Wafer (B) Wafer-to-Wafer
Table 1.2 A comparison of all classified stacking ICs.
               Package     Chip        Wafer Stacking                     Device
               Stacking    Stacking    Chip-to-Wafer    Wafer-to-Wafer    Stacking
Yield          Highest     High        Normal           Low               Lowest
Size           Largest     Large       Normal           Small             Smallest
Performance    Lowest      Low         Normal           Normal            Highest
1.3 Power Integrity Impacts in Chip Design
Integrity issues in chip design can basically be categorized into signal integrity problems and power integrity problems. This dissertation focuses on the power integrity problem caused by power supply noise, namely ΔI (delta-I, Ldi/dt) noise and IR-drop noise. Ldi/dt noise is also called SSN (Simultaneous Switching Noise) or ΔI (delta-I) noise. Figure 1.4 shows that Ldi/dt noise is a voltage fluctuation phenomenon. Figure 1.4(A) shows the architecture of the circuit, which consists of two sub-circuits, A and B. If the switching activities of A and B start at different times, i.e., $t_A \neq t_B$, the voltage can remain at a high level, as in Figure 1.4(B). If they start at the same time, the voltage fluctuates for a short period and drops below the VDD constraint, as in Figure 1.4(C). Inserting decaps to stabilize the voltage is a popular remedy. This dissertation develops a decap insertion method to solve the SSN problem in chip designs.
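The effect of the switching start times can be illustrated with a small numerical sketch. It is not taken from the dissertation: it assumes that each sub-circuit draws a triangular current pulse through one shared supply inductance, and all names and values (`triangle`, `peak_ldidt_noise`, 1 nH, 0.1 A, 1 ns edges) are hypothetical.

```python
def triangle(t, start, rise, fall, peak):
    """Triangular current pulse (assumed shape): ramps up to `peak`, then back to 0."""
    x = t - start
    if 0.0 <= x < rise:
        return peak * x / rise
    if rise <= x < rise + fall:
        return peak * (1.0 - (x - rise) / fall)
    return 0.0


def peak_ldidt_noise(start_a, start_b, inductance=1e-9, rise=1e-9,
                     fall=1e-9, peak=0.1, dt=1e-11, t_end=10e-9):
    """Worst-case |L * di/dt| of the summed current of sub-circuits A and B."""
    def total(t):
        return (triangle(t, start_a, rise, fall, peak)
                + triangle(t, start_b, rise, fall, peak))

    steps = int(round(t_end / dt))
    worst = 0.0
    prev = total(0.0)
    for k in range(1, steps + 1):
        cur = total(k * dt)
        # Finite-difference estimate of L * di/dt over one time step.
        worst = max(worst, abs(inductance * (cur - prev) / dt))
        prev = cur
    return worst


simultaneous = peak_ldidt_noise(start_a=0.0, start_b=0.0)   # t_A == t_B
staggered = peak_ldidt_noise(start_a=0.0, start_b=4e-9)     # t_A != t_B
print(f"simultaneous: {simultaneous:.3f} V, staggered: {staggered:.3f} V")
```

Under these assumptions the simultaneous case doubles the current slope, and therefore the Ldi/dt noise, compared with the staggered case in which the two pulses do not overlap.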
Figure 1.4 The example of Ldi/dt noise effect. (A) The circuit architecture. (B) The voltage measured when $t_A \neq t_B$. (C) The voltage measured when $t_A = t_B$.
IR-drop is the voltage drop that occurs when current flows through the non-zero resistance of the supply lines. As IC design moves into the nanometer regime, the resistance of the connection wires can incur a voltage drop that exceeds the lower boundary constraint. In this dissertation, we modify the location of each power pad to reduce the resistance of the connection wires and suppress the IR-drop. We use a real chip design and commercial tools to support this idea.
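Why the power pad location matters for IR-drop can be seen on a toy model. The sketch below is an illustration only, not the analysis model of the dissertation: it assumes a one-dimensional supply rail with equal segment resistances, one current sink per tap and a single pad, and the function name `worst_ir_drop` and all values are hypothetical.

```python
def worst_ir_drop(currents, r_seg, pad):
    """Worst voltage drop on a 1-D rail fed by a single pad at tap index `pad`.

    currents: current drawn at each tap (A); r_seg: resistance of one rail
    segment between adjacent taps (ohm).
    """
    n = len(currents)
    worst = 0.0

    # Walk right from the pad: the segment (i, i+1) carries the current
    # of every tap to its right.
    drop = 0.0
    for i in range(pad, n - 1):
        drop += sum(currents[i + 1:]) * r_seg
        worst = max(worst, drop)

    # Walk left from the pad: the segment (i-1, i) carries the current
    # of every tap to its left.
    drop = 0.0
    for i in range(pad, 0, -1):
        drop += sum(currents[:i]) * r_seg
        worst = max(worst, drop)

    return worst


taps = [0.01] * 5  # five taps drawing 10 mA each (hypothetical)
print(worst_ir_drop(taps, r_seg=1.0, pad=0))  # pad at the end of the rail
print(worst_ir_drop(taps, r_seg=1.0, pad=2))  # pad at the centre of the rail
```

In this toy rail, moving the pad from the end to the centre shortens the longest resistive path and lowers the worst-case drop, which is the intuition behind exchanging pad locations.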
1.4 Impacts of I/O Pad Location in 2-D and Stacking ICs
In VLSI designs, the I/O pad locations not only impact the power integrity, but also affect the package routing and the core area. This section describes these effects in detail. A good I/O pad assignment can help reduce the length of bonding wires in stacking ICs. The bonding-wire architecture of a stacking IC differs from that of a 2-D IC, as shown in Figure 1.5. If the stacking factor is ignored, as in Figure 1.5(A), the chip performance is degraded by the longer bonding wires. Figure 1.5(B) shows the ideal wire bonding result. To achieve this target, the stacking factor must be considered in the finger/pad planning method.
Besides the bonding wires, I/O pads also affect the number of 3D-Vias. Most previous studies do not consider the physical impacts of 3D-Vias, which critically affect the area, latency, performance and cost of stacking ICs. Therefore, place and route algorithms for stacking ICs should plan 3D-Vias effectively.
Figure 1.5 The finger/pad planned results. (A) Bad result. (B) Good result.
1.5 Dissertation Organization
The goal of this dissertation is to provide a set of solutions to the power integrity and I/O constraint issues in the EDA field. We first develop a power model to calculate the decap required to solve the delta-I problem, and increase the usage of available space in the floorplan to reduce the area overhead caused by decap insertion. Chapter 2 provides a detailed description of this step. We then develop a planning method that determines the order of I/O pads with package routability taken into account. This planning method not only reduces the total wire length in the package but also suppresses the IR-drop noise of the core. Chapter 3 provides a detailed description of this method. After the I/O order is determined, the next step is to compute the location of the I/Os. This study proposes a system partition approach that minimizes the number of 3D-vias and balances the number of I/Os on each tier, and modifies a traditional floorplan method to optimize the I/O and module locations. The stacking IC design can then be simplified into several 2-D IC designs. Chapter 4 describes this process in detail. Finally, Chapter 5 concludes this dissertation and provides directions for future research.
Chapter 2 Effective Decap Insertion in Area-Array I/O Architecture
As VLSI technology enters the nanometer era, supply voltages continue to drop in order to reduce power dissipation, which makes power integrity problems even worse. Employing decoupling capacitances (decaps) at the floorplan stage is a common approach to alleviating supply noise problems. Previous studies overestimate the decap budget and do not fully utilize the empty space of the floorplan. A floorplan usually has a lot of available space that can be used to insert decaps without increasing the floorplan area. Therefore, the work presented in this chapter develops a better model to calculate the decap required to solve the power supply noise problem, and increases the usage of available space in the floorplan to reduce the area overhead caused by decap insertion. The experimental results of this work are encouraging. Compared with previous approaches, our methodology reduces the decap budget by 38% on average for the MCNC benchmarks while still meeting the power supply noise requirements. The final floorplans with decaps are also smaller than the results of previous works.
2.1 Overview of Decap Insertion
Many researchers have proposed approaches to solving the supply noise problem at every design stage. The power/ground (P/G) network [11]-[13] is an important factor in the supply noise problem. Power supply noise can be greatly improved by a better P/G network with minimal penalty cost. Besides sizing the power lines [14], employing decoupling capacitances (decaps) is a common approach to reducing supply noise. Traditionally, the decap insertion process is performed after routing in the physical design flow. This late insertion wastes area on unnecessary decaps and lowers the efficiency of the decap budget. Therefore, more and more studies propose to insert decaps before routing. The authors of [15] proposed a two-step decap insertion method to improve power supply noise at the placement level. This method consists of a prediction step and a correction step. The prediction step estimates the required decap pessimistically. Although the decap size can be adjusted in the correction step, a smaller area overhead can be achieved if decap insertion is considered at an earlier stage. In [16] and [17], the authors proposed decap insertion methods at the floorplan level to reduce supply noise. Unfortunately, these previous works often overestimate the decap budget. They assume that the decap must fully supply the maximum current of the module, which is too pessimistic in our observation. Besides the decap budget computation, previous works do not fully use the available floorplan space. A floorplan usually has a lot of available space that can be used to insert decaps without increasing the floorplan area.
To build a high-performance, high-pin-count IC, the area-array architecture is often used. In this architecture, the signal bumps are uniformly distributed over the chip. Therefore, the resistance from the core I/Os to the signal bumps is greatly reduced and a larger number of I/Os can be accommodated. In [18], the authors used this architecture in floorplanning to improve the power supply noise. Because of these advantages, more and more chips adopt the area-array architecture to alleviate the power supply noise and limited pin count problems. However, without decap insertion, the resulting floorplans still suffer from supply noise violations even with the area-array architecture.
The purpose of this work is to develop a better model for calculating the decap required to solve the power supply noise problem and to wisely use the available space in the floorplan to reduce the area overhead caused by decap insertion. Based on the area-array architecture, we propose a two-step approach that includes a noise-driven floorplanning algorithm and a decap insertion approach to suppress power supply noise at the floorplan level, as Figure 2.14 illustrates. First, we use a noise-driven floorplan algorithm to reduce the possible noise. This work adopts an O-tree representation with a stronger adjacent-module relation as the engine for supply-noise-driven floorplanning, and modifies the primary operations Delete and Insert in the proposed framework. Second, we use a Noise-driven Decap Planning with Minimum Area Insertion (NDP_MAI) approach to insert minimal decaps into the noise-guided floorplan, followed by legalization of the blocks and decaps. Note that this approach can compute the required decap size for a real design and then provide the optimal location for each decap during floorplanning. After routing, the method in [19] can be used to rectify our result, further improving the power supply noise.
Figure 2.14 (A) Flowchart of our proposed method (B) The illustration of our method
The rest of this chapter is organized as follows. Section 2.2 describes the
floorplan design with power supply noise consideration, the new noise estimation
method, decap budget computation, and problem formulation. Section 2.3 presents the
floorplanning algorithm and the decap insertion approach for power supply noise
avoidance. Section 2.4 shows the experimental results.
2.2 Power Delivery and Signal Integrity Issues
IR-drop and Ldi/dt (delta-I) noise are the main contributors in the noise margin
issue, and this work focuses primarily on the Ldi/dt noise. In this section, we describe
our power delivery model and noise estimation model used in this work and formulate
our problem.
2.2.1 Power Delivery Model and Noise Estimation
In this work, the power source distribution is based on the area-array architecture. The area-array architecture is a mesh structure in which the VDD and GND bumps are uniformly distributed across the die, with signal bumps at fixed interspersed locations, as illustrated in Figure 2.15(A). Because the resistance from the I/O to the connection block is substantially decreased, power delivery is better than in the peripheral-I/O architecture. As a result, many high-performance chips adopt the area-array architecture.
In the area-array architecture, a VDD bump supplies current to all modules in a proportion determined by the distance from the bump to the module. The four neighboring VDD bumps (right-top, right-down, left-top and left-down) of a module supply the main current, as Figure 2.15(B) shows, so we compute the noise from these VDD bumps only. Since many paths exist for delivering current to the target module, we consider only the shortest and second shortest paths to simplify the noise computation, as Figure 2.15(C) shows. The main reason is that currents follow the least-impedance paths when flowing from the VDD bumps to the target module. Compared with SPICE simulation, the error of this method is within 10%, as shown in [17]. The computation is fast and the error stays within a tolerable range, so we use this method to compute the power supply noise at the floorplan level.
Figure 2.15 Power delivery model in the area-array architecture.
Kirchhoff's voltage law can be used to express the noise at each module:
$$V_{noise}^{(k)} = \sum_{P_j \in T_k} \left( i_{P_j} R_{P_{jk}} + L_{P_{jk}} \frac{d i_{P_j}}{dt} \right) \qquad (2.1)$$
where $V_{noise}^{(k)}$ denotes the power supply noise at module k, $P_j$ denotes the path from the power bump to node j, $P_{jk}$ denotes the path from node j to node k, $T_k$ denotes the union of the shortest paths and the second shortest paths, $R_{P_{jk}}$ denotes the resistance of $P_{jk}$, $L_{P_{jk}}$ denotes the inductance of $P_{jk}$, and $i_{P_j}$ is the current flowing along path $P_j$.
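As a rough illustration of how Eq. (2.1) is evaluated, the sketch below sums the resistive and inductive terms over a given set of delivery paths. It assumes the shortest and second shortest paths have already been extracted; the `Path` record, the function `supply_noise` and the numerical values are hypothetical and not taken from the dissertation.

```python
from typing import NamedTuple, Sequence


class Path(NamedTuple):
    """One current delivery path to the target module (hypothetical record)."""
    current: float      # current flowing along the path (A)
    di_dt: float        # slope of that current (A/s)
    resistance: float   # path resistance (ohm)
    inductance: float   # path inductance (H)


def supply_noise(paths: Sequence[Path]) -> float:
    """Sum of i*R + L*di/dt over the paths considered for one module."""
    return sum(p.current * p.resistance + p.inductance * p.di_dt for p in paths)


# Two illustrative paths, e.g. a shortest and a second shortest path.
paths_to_module = [
    Path(current=0.05, di_dt=1e7, resistance=0.2, inductance=1e-9),
    Path(current=0.03, di_dt=5e6, resistance=0.5, inductance=2e-9),
]
print(f"estimated noise: {supply_noise(paths_to_module):.3f} V")
```

Restricting the sum to a handful of paths per module keeps the estimate cheap enough to be called repeatedly inside a floorplanning loop.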
2.2.2 Decap Budgeting in Area-Array Architecture
In [16] and [17], the authors assumed that the decap should fully supply the maximum current of the module, shown as the white region in Figure 2.16(A). Under this assumption, the decap budget is likely to be over-estimated. In practice, the VDD pin continuously provides current while the chip is operating, shown as the grey region in Figure 2.16(A). Therefore, the required decap size can be significantly reduced.
Figure 2.16 (A) The current consumption profile of module. (B) Simulation result by HSPICE.
The required decap size can be obtained from the difference between the maximum current ($I_{max}$) and the target current limit ($I_{gen}$) of each module. Assume that the target current limit of module k is $I_{gen}^{(k)}$, k = 1, 2, ..., M, and that the maximum switching current of module k is $I_{max}^{(k)}$. Let $C^{(k)}$ be the required decap for module k and $Q^{(k)}$ be the amount of electric charge stored in $C^{(k)}$. $Q^{(k)}$ can then be obtained from the following equation, based on the triangle model shown in Figure 2.16:
$$Q^{(k)} = \int_{t_s}^{t_f} I_{max}^{(k)}(t)\,dt - \int_{t_s}^{t_f} I_{gen}^{(k)}(t)\,dt \qquad (2.2)$$
where $t_s$ is the start time and $t_f$ is the finish time of the period in which the target module is in operational mode. The charge can be converted to the silicon area needed to fabricate the capacitance as follows:
$$C^{(k)} = \frac{Q^{(k)}}{V_{noise}^{(lim)}} \qquad (2.3)$$
$$S^{(k)} = \frac{C^{(k)}}{C_{ox}} \qquad (2.4)$$
where $V_{noise}^{(lim)}$ is the noise constraint of the voltage, $C^{(k)}$ is the decap budget and $S^{(k)}$ is the silicon area of $C^{(k)}$. $C_{ox}$ is the unit-area capacitance of a MOS capacitor, $C_{ox} = \epsilon_{ox}/t_{ox}$, where $\epsilon_{ox}$ is the permittivity of SiO2 and $t_{ox}$ is the oxide thickness.
We use SPICE to verify the accuracy of EQ(2.2) and compare the result with [17]. The
experiment uses a 0.25 µm technology. The supply voltage is set at 2.5V and the power
supply noise tolerance level is set at 0.04V. The method of [17] yields a required
decap of 112pF, whereas EQ(2.2) yields a decap budget of 96pF. Figure 2.16(B) shows
the simulation result. The proposed method therefore requires less decap than [17].
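As a rough illustration of the budgeting chain from charge to capacitance to silicon area, the following Python sketch integrates sampled current waveforms with the trapezoidal rule. The waveform samples, the noise limit and the unit-area capacitance are hypothetical values, not data from the experiment above.

```python
def decap_budget(i_max, i_gen, dt, v_noise_limit, c_ox):
    """Return (charge, capacitance, area) for one module.

    i_max, i_gen  : current samples in amperes, uniformly spaced by dt seconds
    v_noise_limit : allowed voltage noise in volts
    c_ox          : unit-area capacitance in farads per square meter
    """
    def trapz(samples):
        # Trapezoidal integration over uniformly spaced samples.
        return sum((a + b) * dt / 2.0 for a, b in zip(samples, samples[1:]))

    charge = trapz(i_max) - trapz(i_gen)     # charge the decap must supply
    capacitance = charge / v_noise_limit     # decap budget
    area = capacitance / c_ox                # silicon area of the decap
    return charge, capacitance, area


# Triangular switching current peaking at 10 mA over 1 ns (assumed values),
# with the VDD pin assumed to supply a constant 2 mA during the same window.
i_max = [0.0, 0.005, 0.010, 0.005, 0.0]
i_gen = [0.002] * 5
charge, cap, area = decap_budget(i_max, i_gen, dt=0.25e-9,
                                 v_noise_limit=0.04, c_ox=5e-3)
```

Subtracting the current that the VDD pin keeps supplying is what makes this budget smaller than one that assumes the decap alone must deliver the whole maximum current.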
2.2.3 Problem Formulation
The goal of this work is to use the minimum decap to solve the power supply noise
problem at the floorplan level. In other words, during floorplanning we suppress power
noise by controlling where each module is placed, and we reserve a minimal decap area
to avoid possible power noise. The problem can be formulated as follows:
Given a set of modules B1, B2, ..., Bm, the current consumption $I_{max}^{(k)}$ and $I_{gen}^{(k)}$ of
each block Bk, 1 ≤ k ≤ m, a set of power bumps P1, P2, ..., Pm, and the noise
constraint $V_{noise}^{lim}$ for each module, find a feasible solution such that each module Bk
obtains its required decap budget size DBSk with the minimum area penalty when DBSk
is inserted. At the same time, the voltage noise $V_{noise}^{(k)}$ of each module must be smaller
than the noise constraint $V_{noise}^{lim}$.
2.3 Minimal Decap Allocation in Power-Supply-Noise-Aware Floorplanning
To solve the power supply noise problem, this work develops a two-step
methodology to suppress and reduce noise at the floorplan level, as Figure 2.14
illustrates. Since placing two high-current modules close together seriously
increases noise, we first propose a noise-driven floorplan algorithm that places the
high-current modules intelligently. The goal of our floorplanner is to spread the
high-power blocks evenly over the chip. This brings two benefits: (1) the peak noise
is reduced; (2) the decap budget is distributed evenly over the chip. The empty room
left after floorplanning is small and scattered. If many decaps are inserted into one
particular region, the area of the floorplan may increase because the empty room
there does not have enough space for the decaps. We therefore propose a Noise-driven
Decap Planning with Minimum Area Insertion (NDP_MAI) approach to reduce the power
supply noise and the area overhead after floorplanning. In the following, we briefly
introduce the O-tree representation and the new operations it needs, Delete and
Insert, and then discuss the feasible region of the decap budget.
2.3.1 O-Tree-Based Power-Supply-Noise-Aware Floorplanning
To obtain a better result for noise-driven floorplan, a suitable and controllable
floorplan representation is needed. Table 2.5 compares six floorplan representations.
Table 2.5 A comparison of six floorplan representations.
Floorplan Representation | Adjacent Relation | Solution Space | Delete Operation | Insert Operation
SP [20] | Not Good | O((m!)^2) | - | -
B*-tree [21] | Good | O(m! 2^(2m-2) / m^1.5) | O(1) | O(m)
O-tree [22] | Best | O(m! 2^(2m-2) / m^1.5) | O(m) | O(m)
TCG [23] | Good | O((m!)^2) | O(m^2) | O(m^2)
CBL [24] | Good | O(m! 2^(3m-3) / m^1.5) | O(m) | O(m)
DBL [16] | Good | O(m! 2^(m-1)) | O(m) | O(m)
We choose the O-tree as our representation mainly because the adjacency
relations can be obtained directly from it, so high-current modules can be kept at
a distance from each other.
The O-tree is composed of a horizontal tree and a vertical tree, as shown in
Figure 2.17 (B) and (D). The horizontal (vertical) tree is encoded by a pair (T, π)
together with its tree type, as shown in Figure 2.17 (C) and (E), where T denotes the
paternity (parent-child structure) of the tree and π denotes the permutation of modules.
If a module touches another module horizontally (vertically), such as modules H
and L in Figure 2.17, this can be easily observed in the horizontal (vertical)
representation. With other representations, the adjacency relation of each module is
more difficult to find.
Figure 2.17 An O-tree example. (A) Floorplan result. (B) Vertical tree. (C) Vertical tree representation. (D) Horizontal tree. (E) Horizontal tree representation.
Figure 2.18(A) shows the original O-tree operations. If module J is deleted, the
original Delete operation generates a Left-Down (LD)-packing floorplan. As a result,
two high-current modules (I and K) are placed next to each other. Such a region
consumes more power than other regions and needs more decap to reduce its power
supply noise. A similar situation occurs for the original Insert operation because
it considers only the area and the wire length. In short, the original O-tree
operations cannot control which blocks become neighbors, so new transformation
operations are necessary. These operations help avoid placing high-current modules
at adjacent locations. We propose two new transformation operations:
• Delete: The original operation deletes the selected module only. The new
operation deletes the selected module together with the modules to its
top-right.
• Insert: The original operation considers the area factor only. The new
operation inserts the module into a low-noise location while minimizing
the area increase.
We use an example to explain the new transformation operations. To delete module J in
Figure 2.18(B), the selected module is J and its only top-right module is K, so
modules J and K must be deleted together. The top-right modules must be deleted
because the floorplan must remain an LD-packing result. If the top-right modules are
not deleted at the same time, a high-current module may end up at a neighboring
location.
Figure 2.18 The difference between the original O-tree and our approach.
The new Delete operation consists of several steps. We first locate the
to-be-deleted module Bx in the permutation π of the horizontal and of the vertical
O-tree, and then select Bx and all modules after it in each π. This gives two block
sets, one from the horizontal tree and one from the vertical tree. The intersection
of the two sets is the candidate list of modules to delete. The final step is to
delete these modules from both the vertical and the horizontal O-trees. Figure 2.19
shows the horizontal and the vertical representations of Figure 2.18 as an example.
The horizontal representation is (0011000111, HLIJK) and the vertical representation
is (0010101101, HIJKL). In this case, module J must be deleted. Therefore, the
horizontal block set contains modules J and K, and the vertical block set contains
modules J, K and L. The deletion candidate list, JK, is their intersection:
JK ∩ JKL = JK. Finally, modules J and K are deleted from both representations. The
horizontal representation becomes (001101, HLI), and the vertical representation
becomes (001101, HIL). The time complexity of the new Delete operation is O(m), where
m denotes the number of modules.
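The worked example can be replayed with a short Python sketch. It assumes the bit string is a depth-first encoding in which 0 descends to a new module and 1 returns to the parent, and that deleting a module simply removes its 0/1 pair (which would re-attach any surviving children to the deleted module's parent). The function names are hypothetical and module labels are single characters.

```python
def node_spans(bits, perm):
    """Map each module label to the positions of its opening 0 and closing 1."""
    labels = iter(perm)
    stack, spans = [], {}
    for pos, bit in enumerate(bits):
        if bit == "0":
            stack.append((next(labels), pos))
        else:
            label, start = stack.pop()
            spans[label] = (start, pos)
    return spans


def remove_modules(bits, perm, doomed):
    """Drop the 0/1 pair and the label of every module in `doomed`."""
    spans = node_spans(bits, perm)
    drop = {pos for label in doomed for pos in spans[label]}
    new_bits = "".join(b for pos, b in enumerate(bits) if pos not in drop)
    new_perm = "".join(label for label in perm if label not in doomed)
    return new_bits, new_perm


def new_delete(horizontal, vertical, target):
    """Delete `target` plus every module that follows it in both permutations."""
    h_bits, h_perm = horizontal
    v_bits, v_perm = vertical
    h_set = set(h_perm[h_perm.index(target):])
    v_set = set(v_perm[v_perm.index(target):])
    doomed = h_set & v_set
    return (remove_modules(h_bits, h_perm, doomed),
            remove_modules(v_bits, v_perm, doomed),
            doomed)


new_h, new_v, deleted = new_delete(("0011000111", "HLIJK"),
                                   ("0010101101", "HIJKL"), "J")
```

Each step is a single pass over the bit string and the permutation, which is in line with the O(m) bound stated for the operation.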
Figure 2.19 The new Delete operation
The new Insert operation consists of three parts: (1) find all possible locations; (2)
compute their costs; (3) choose the optimal location. A module can be inserted at many
locations in a floorplan, so the first step is to enumerate the candidate locations
in the LD-packing floorplan. The possible insertion locations are the lower-left
corners of the floorplan result, as shown in Figure 2.21. Every candidate location
has a different cost, so the next step is to compute the cost of each candidate
location. The cost function can be represented as follows:
$$C_A = D_1 (A_{new} - A_{ori}) + D_2 (I_a + I_b + I_c - I_{th}) \qquad (2.5)$$
where $C_A$ denotes the cost when module A is inserted at this location, $D_1$ and $D_2$
are the weights, $A_{new}$ is the area of the floorplan after the module is inserted,
$A_{ori}$ is the original area, $I_a$ ($I_b$, $I_c$) denotes the current consumption of module
a (b, c), and $I_{th}$ denotes the threshold value for local current consumption. $D_2$ is set
to a large value to penalize high local current consumption. The cost of every
candidate location must be computed twice, since it differs depending on whether the
module is inserted directly or rotated, as Figure 2.20 shows. Note that EQ(2.5)
considers the area and the power consumption only; the cost computation can be
extended with other objectives, such as wire length. Finally, the module is inserted
at the minimal-cost location. The following illustration explains the new Insert
operation. In Figure 2.21(A), the initial floorplan consists of modules I, H
and L, and modules J and K are the candidates to insert. The four triangles denote
the candidate locations in the floorplan. In Figure 2.21(B), we compute the cost of
inserting module J at each candidate location. Module J is then inserted at the
minimal-cost location, which in this case is the corner between module H and module
L, as Figure 2.21(C) shows. Because the candidate list is not yet empty, the Insert
operation is applied again, as shown in Figure 2.21(D).
Note that EQ(2.5) considers only the left and down adjacent modules when calculating
the cost. The main reasons are: (1) because the Insert operation must maintain an
LD-packing floorplan, it only considers the modules to the left of and below A;
(2) the Delete operation deletes all the top-right modules of A, so if all modules
are inserted at lower-left corners only, there are no modules on the top and right
sides. The time complexity of the new Insert operation is O(m^2), where m denotes the
number of modules.
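The selection step of the Insert operation can be sketched in Python as follows. The sketch assumes one plausible reading of the insertion cost: the area increase weighted by D1 plus the amount by which the local current (inserted module, left neighbor, lower neighbor) exceeds a threshold, weighted by a large D2. The candidate tuples, weights and current values are hypothetical.

```python
def insert_cost(area_new, area_ori, i_a, i_left, i_down, i_th, d1=1.0, d2=1000.0):
    """Area increase plus a weighted local-current term (assumed form of the cost)."""
    return d1 * (area_new - area_ori) + d2 * (i_a + i_left + i_down - i_th)


def best_location(candidates, area_ori, i_a, i_th):
    """Pick the cheapest candidate.

    candidates: list of (name, rotated, area_new, i_left, i_down); each corner
    appears twice, once for the direct and once for the rotated orientation.
    """
    return min(
        candidates,
        key=lambda c: insert_cost(c[2], area_ori, i_a, c[3], c[4], i_th),
    )


candidates = [
    ("corner H-L", False, 100.0, 0.05, 0.10),
    ("corner H-L", True, 104.0, 0.05, 0.10),
    ("corner next to I", False, 100.0, 0.40, 0.00),
]
choice = best_location(candidates, area_ori=100.0, i_a=0.30, i_th=0.50)
```

With a large D2, a corner next to a high-current neighbor is rejected even when it causes no area increase, which is the behavior the operation is designed to produce.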
Figure 2.20 The relation between the area and the rotary module.
Figure 2.21 The new Insert operation.
Based on these operations and the SA (Simulated Annealing) [25]
algorithm, we propose a power-supply noise-driven floorplan algorithm, as illustrated
in Figure 2.22. We first provide an initial floorplan and set the initial values of
the two annealing temperature parameters (line 2). The new Delete operation is used
to delete modules (lines 4-6). Then, the new Insert operation and EQ(2.5) are used to
insert modules (lines 7-11). After the Insert operation, a new LD-packing floorplan
is obtained. The cost difference ΔC between the new floorplan and the original
floorplan is computed (line 12). If ΔC is smaller than zero, a better floorplan has
been found and we adopt it as our new solution (line 14). If not, we decide randomly
whether the original floorplan is replaced by the new floorplan (lines 15-17).
Finally, the temperature is cooled (line 18).
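The annealing loop can be sketched generically in Python. This is only an illustration under stated assumptions: a swap of two positions stands in for the Delete and Insert operations, the toy cost penalizes high-current modules that sit next to each other in a row, worse moves are accepted with the usual Metropolis probability exp(-ΔC/T), and the geometric cooling factor and all function names are hypothetical.

```python
import math
import random


def anneal(initial, neighbor, cost, temperature, final_temperature,
           cooling=0.95, seed=0):
    """Generic simulated annealing loop that also remembers the best state seen."""
    rng = random.Random(seed)
    current = best = initial
    while temperature > final_temperature:
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with probability exp(-delta/T).
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if cost(current) < cost(best):
            best = current
        temperature *= cooling
    return best


def adjacency_cost(order):
    """Toy cost: products of neighboring currents, large when big currents touch."""
    return sum(a * b for a, b in zip(order, order[1:]))


def swap_two(order, rng):
    """Toy move standing in for Delete followed by Insert."""
    i, j = rng.sample(range(len(order)), 2)
    swapped = list(order)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return tuple(swapped)


start = (9, 8, 7, 1, 1, 1)
result = anneal(start, swap_two, adjacency_cost,
                temperature=100.0, final_temperature=0.1)
```

Tracking the best state separately is a common safeguard, since the current state may drift to a worse solution while the temperature is still high.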
Power-supply Noise-driven Floorplan Algorithm
Input : A compacted floorplan, F, with the current consumption for each block
Output : A compacted floorplan, F, in which high current consumption blocks are placed apart
1. Begin
2. Initial floorplan; Temperature; Final_Temperature;
3. while Temperature > Final_Temperature
4.   Randomly choose one block Bx from F;
5.   Use the Delete operation to delete Bx;
6.   Add all deleted blocks into a candidate list;
7.   while candidate list is not NULL
8.     Choose one block By from the candidate list;
9.     Use EQ(2.5) and the Insert operation to insert By;
10.    Delete By from the candidate list;
11.  end_while
12.  ΔC = Cost(New_Floorplan) - Cost(Floorplan);
13.  if ΔC < 0
14.    Floorplan = New_Floorplan;
15.  else if Random(0,1) < exp(-ΔC/T)
16.    Floorplan = New_Floorplan;
17.  end_if
18.  Cooling(Temperature);
19. end_while
Figure 2.22 The power-supply noise-driven floorplan algorithm
2.3.2 Feasible Region for Decap Allocation
After obtaining a floorplan, we calculate the required decap size for each
module. According to [20], [21], [22], [23], [24] and [16], the empty room after
floorplanning is small and scattered. If a large decap is inserted into a floorplan,
the area of the floorplan may increase because no single empty room has enough space
for the decap. Besides the area factor, the charge/discharge time of the capacitance
must be considered. The charge time is substantially reduced when several smaller
decaps form the required decap. Our method therefore cuts the required decap into
four smaller decaps, which keeps the floorplan area increase minimal and reduces the
charge/discharge time of the decap. Note that the smaller decaps do not all have the
same size. The distribution is based on the Manhattan distance from the VDD source to
the power bump, and ($P_x$, $D_x$) denotes the connection relation. $P_x$ denotes the
Manhattan distance from the module's VDD location to the power bump x and $D_x$
denotes the obtainable current contribution ratio from the power bump x. The
computational equation of the current contribution ratio can be written as follows:
DP , P , , P , , ,
P P , P , , P , , ,............................................................(2.6)
where (a ≠ x) denotes that a and x are different power bumps, and $P_{a,(a \neq x)}$ denotes
that the power bump sources of $P_a$ and $P_x$ are different. According to EQ(2.6), $D_x$ is
inversely proportional to $P_x$, which follows the current divider theorem. A simple
example helps to explain EQ(2.6). Figure 2.23(A) shows a floorplan result.
Module D needs a decap to supply its current consumption. We first use EQ(2.2)-(2.4)
to compute the optimal decap size. The decap is partitioned into four smaller decaps.
Each smaller decap is given a feasible region that ranges from the location of the
power bump to the VDD source. We then use EQ(2.6) to compute the current ratio for
each power bump, as Figure 2.23(B) shows. Based on these constraints, the smaller
decaps can be inserted into the chip and the charge time of the capacitance can be
substantially decreased.
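The idea of a distance-based split can be illustrated with a small Python sketch. It uses the textbook current divider under the assumption that the resistance of each path is proportional to its Manhattan distance, so each bump's share is proportional to the inverse of its distance; this may differ in form from EQ(2.6). Splitting the decap in proportion to these shares, the distances, and the 96 pF total are assumptions made only for illustration.

```python
def current_shares(distances):
    """Inverse-distance shares that sum to one (parallel-branch current divider)."""
    conductances = [1.0 / d for d in distances]
    total = sum(conductances)
    return [g / total for g in conductances]


def split_decap(total_decap, distances):
    """Divide a decap budget among the bumps in proportion to their shares."""
    return [total_decap * share for share in current_shares(distances)]


# Manhattan distances from the module's VDD location to four bumps (made up).
distances = [100.0, 200.0, 200.0, 400.0]
shares = current_shares(distances)
parts = split_decap(96e-12, distances)  # four smaller decaps, in farads
```

The nearest bump receives the largest share, so the largest of the four smaller decaps ends up in the feasible region closest to the module.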
In modern chip design, the decap is inserted in the empty space after detailed
routing. In reality, a decap cannot improve the power noise of a high-current module
if it is far from the module. To utilize the energy of each decap effectively, the
rectangular region spanned by the power bump and the VDD pin is the feasible region
for each small decap, as shown in Figure 2.23(A).
Figure 2.23 The partition method for a decap.
2.3.3 Identification of Space Priority for Decap Insertion
After floorplanning, the chip has some exploitable space for decap insertion. If
these spaces are fully utilized, the cost of the chip might not increase even when
decaps are inserted. This section discusses the effect of placing a decap in each
kind of space. Furthermore, we propose a Noise-driven Decap Planning with Minimum
Area Insertion approach that simultaneously considers the area cost and the noise
effect.
A floorplan result always has one or more horizontal (vertical) longest
paths. Such a path determines the maximum width (height) of the floorplan. As shown
in Figure 2.24, modules H and L compose a horizontal longest path. Changing these
modules directly modifies the area. Therefore, if a decap is inserted in the channel
space between these modules, the area increases significantly. This channel space is
called the extensible space. A channel space that borders an empty room not occupied
by any module is called the empty space. If a decap is inserted in this space, the
location of each module does not change. The remaining channel spaces, which are
neither extensible nor empty spaces, are called the available space. If a decap is
inserted in this space, the area of the floorplan stays fixed and the locations of
some modules shift only slightly. In Figure 2.24, the horizontal longest path is
H → L and the vertical longest path is H → I. If a decap is inserted into the
extensible space between H and I, the area increases by 285 µm². If a decap is
inserted into the empty space at the corner between H and I, neither the area nor the
topology is affected. Hence, the cost is lowest when the decap is inserted in the
empty space. If the decap is inserted into the available space between L and J, the
area does not increase but the topology changes. Therefore, three types of spaces are
candidate locations for the decap: Available Space, Extensible Space and Empty
Space. Their priorities are defined as follows: Empty Space > Available Space >
Extensible Space. The minimum-cost space is selected for the decap insertion.
Figure 2.25 illustrates the NDP_MAI approach.
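The priority rule can be sketched in Python. The sketch assumes that an empty or available space is usable only if its free area can hold the decap, while an extensible space can always be used at the price of enlarging the floorplan; the names, areas and tuple layout are hypothetical.

```python
from enum import IntEnum


class Space(IntEnum):
    """Lower value means higher priority for decap insertion."""
    EMPTY = 0
    AVAILABLE = 1
    EXTENSIBLE = 2


def pick_space(candidates, decap_area):
    """Return the highest-priority space that can take the decap.

    candidates: list of (space_type, free_area, name).
    """
    usable = [
        c for c in candidates
        if c[0] == Space.EXTENSIBLE or c[1] >= decap_area
    ]
    # Sort by priority first, then prefer the roomier space within a priority class.
    return min(usable, key=lambda c: (c[0], -c[1]))


spaces = [
    (Space.EXTENSIBLE, 0.0, "channel between H and I"),
    (Space.EMPTY, 50.0, "corner between H and I"),
    (Space.AVAILABLE, 200.0, "channel between L and J"),
]
small = pick_space(spaces, decap_area=40.0)
medium = pick_space(spaces, decap_area=100.0)
large = pick_space(spaces, decap_area=500.0)
```

Only when neither an empty nor an available space is large enough does the choice fall back to the extensible space, which is the only case that increases the floorplan area.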
Figure 2.24 The space relation between decap locations and area.
2.3.4 Decap Compensation for Voltage Drop in Power Network
EQ(2.2)-EQ(2.4) calculate the required decap of each module under the assumption that
the decap is placed next to the target. The NDP_MAI approach does not consider
this factor when placing the decap. If the decap is not near the target,
the power supply noise may violate the given constraint because part of the power
supplied by the decap is consumed by the wire resistance. To mitigate this
problem, we use a simple compensation as follows:
V ........................................................................................(2.7)
Figure 2.25 The NDP_MAI flow chart.
where is the required decap of module k after the compensation, V is the
supply voltage, is the distance from the space to the connection point, and is the
wire capacitance per unit length. Although EQ(2.7) compensates for the power
consumed by the wire, the power network is another important factor that affects the
power supplied by the decap. Figure 2.26(A) shows the power network after the decap
insertion. We can use the superposition theorem to analyze the circuit, as Figure
2.26(B)(C) shows. The discharge current from the decap is shared among different
modules because they all depend on the same power network. If the decap budget
computation does not consider this factor, the power supply noise constraint
may be violated.
Figure 2.26 Circuit analysis for the power network.
To solve the above problem, we need a more accurate compensation equation than
EQ(2.7). According to the current divider theorem,
V ........................................................................(2.8)
where is the total resistance on the side of the module k and is the total
resistance of the other sides (Figure 2.26(B)). EQ(2.8) compensates the required
decap more accurately. We use SPICE to verify the accuracy of these two compensation
equations. In our experiment (the circuit in Figure 2.26), the modules have the
same resistance. We expect module B to obtain 2.75mA from the decap. If we use
EQ(2.7) to correct the decap, module B obtains only 2.66mA, which is insufficient.
If we use EQ(2.8) to adjust the decap, module B obtains 2.78mA from the decap. This
experiment shows that the module can obtain sufficient current from the decap when
we use EQ(2.2)-(2.4) and EQ(2.8) to compute the required decap.
2.4 Experimental Results
We implemented the power-supply noise-driven floorplan algorithm, the
NDP_MAI approach, and the approach in [17] in C++ on an AMD 3200
computer with 1 GB of memory. Tables 2.6 and 2.7 compare the run time, peak noise, and
decap budget with [17]. The purpose of this work is to analyze the effect of
decap insertion at the floorplan level. The original cost function of [17] combines
an area term and a wire length term. To obtain a fair comparison, we set the weight
of the wire length term to zero because the wire length is ignored in our cost
function. Five MCNC benchmark circuits, apte, hp, xerox, ami33 and ami49, are used
to test the performance of the proposed methodology. Since the MCNC benchmarks
include no noise constraint, the noise constraint is set at 0.13V and 0.25V. In our
experiments, the operation times tw0 and tw1 of the switching current waveform are set
to 0.3 and 0.8. The power supply voltage is 1.2V, the distance between two
adjacent VDD bumps is 1000 µm, and the power supply mesh is 333.3 µm. We
use the method of [17] to generate the current consumption information of each module.
The for module k is A D , where A is the area of module k, and D is the
worst case current density. is assigned as a random value to be 1.05 ~2 .
Table 2.6 compares our method with [17] in terms of the peak noise after noise-aware
floorplanning (noise-driven) and after post-floorplanning decap insertion (post).
The experimental results show that our floorplan method obtains better results than
[17]. The main reason is that the high-current modules are placed apart, which
lowers the peak noise computed by EQ(2.1). The run time of our floorplan method is
slightly higher than that of the method in [17]. The main reasons are: (1) the time
complexity of the original Insert operation is O(m) while that of the new Insert
operation is O(m^2), so the new operations are more expensive than the original
O-tree operations; (2) in [17], the authors use a sequence-pair-based floorplanner to
plan blocks. The sequence pair method changes the floorplan by modifying the list
order, so the time complexity of each change should be lower than that of the
original O-tree method. In the post-floorplanning decap insertion, all results
conform to the given constraint, 0.13V, and both run times are very short.
Table 2.6 The peak noise after floorplanning, the decap insertion and run time.
Circuit | Our Method: Peak Noise (V) (noise-driven) | Our Method: Peak Noise (V) (post) | Our Method: Run Time (s) (noise-driven) | [17]: Peak Noise (V) (noise-driven) | [17]: Peak Noise (V) (post) | [17]: Run Time (s) (noise-driven)
apte | 2.05 | 0.13 | 2 | 2.05 | 0.11 |
Simulations have been run for xerox and ami33 with HSPICE. The peak noise before
decap insertion is 1.63V for xerox, 0.29V for ami33. After applying [17]'s method for
decap insertion, the peak noise is 0.06V for xerox, 0.11V for ami33. If we use our
proposed method, the peak noise is 0.04V for xerox, 0.06V for ami33. These results
are close to our results in Table 2.7.
Table 2.7 The comparison table of our decap computation and [17].
Circuit | Our Decap Budget (nF) | Our Peak Noise (V) | [17] Decap Budget (nF) | [17] Peak Noise (V) | Decrease Ratio
apte | 14.77 | 0.13 | 22.12 | 0.11 | 33%
hp | 3.6 | 0.13 | 5.01 | 0.11 | 28%
xerox | 6.53 | 0.13 | 9.55 | 0.10 | 31%
ami33 | 0.17 | 0.12 | 0.36 | 0.09 | 52%
ami49 | 7.85 | 0.12 | 14.54 | 0.10 | 46%
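The decrease ratio column can be checked with a few lines of Python. The sketch assumes the ratio is defined as the reduction in decap budget relative to the budget of [17] and that the percentages are truncated to whole numbers; both are inferences from the tabulated values.

```python
# (our decap budget, decap budget of [17]) in nF, copied from Table 2.7.
budgets_nf = {
    "apte": (14.77, 22.12),
    "hp": (3.6, 5.01),
    "xerox": (6.53, 9.55),
    "ami33": (0.17, 0.36),
    "ami49": (7.85, 14.54),
}


def decrease_ratio(ours, reference):
    """Fraction of the reference budget saved by our budget."""
    return (reference - ours) / reference


percent = {
    name: int(100 * decrease_ratio(ours, ref))  # truncate to a whole percent
    for name, (ours, ref) in budgets_nf.items()
}
```

Under these assumptions the computed percentages match the last column of the table.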
Table 2.8 compares the area overhead of the proposed methodology with that of [17].
In the third column, the incremental area is computed entirely with the method of
[17]. In the fourth column, the floorplan and decap computation of [17] are combined
with our decap insertion method. Comparing the third and fourth columns shows that
replacing only the insertion step of [17] with ours already reduces the incremental
area; in this respect, our decap insertion method is better than that of [17]. The
main reason is that our method cuts each decap into four smaller decaps to keep the
floorplan area increase minimal. The incremental area of our proposed method is
shown in the last column. Compared with the numbers reported in previous papers, the
proposed floorplanning framework creates better initial floorplans to work on,
followed by the effective NDP_MAI approach to insert enough decaps.
Table 2.8 Experimental results for some MCNC benchmarks with various approaches for comparison.
Circuit | Modules | [17] Increased Area (µm²) | [17]'s Decap + Our Insertion Increased Area (µm²) | Our Methodology Increased Area (µm²)
apte | 9 | 356832 | 292618 | 19036
hp | 11 | 76608 | 68184 | 157640
xerox | 10 | 152061 | 121648 | 75006
ami33 | 33 | 8228 | 5082 | 3824
ami49 | 49 | 266616 | 201486 | 154990
Chapter 3 Package Routability- and IR-Drop-Aware Finger/Pad Assignment
The trend in VLSI is to integrate more and more electronic devices into a single
chip, and the performance requirements are becoming more stringent. To meet them,
the finger/pad counts keep increasing, so the package and chip designs become more
and more complex. Because of the increasingly complex design interactions between
the chip and the package, it is essential to consider them at the same time. In
addition, the reduced supply voltages in modern chip design tighten the noise
margin. IR-drop is an important design issue, and it is an unavoidable loss when the
circuit draws energy from a power source.
In order to handle core and package problems simultaneously, co-design of core
and package is a widely adopted solution, particularly because the finger/pad
locations significantly affect the IR-drop of the core and the package routing. In
this chapter, we develop chip-package co-design techniques that determine the
locations of the fingers/pads with package routability and signal integrity in mind,
for both 2-D and stacking IC designs. Our finger/pad assignment is a two-step method:
we first solve the wire congestion problem in package routing and then try to
minimize the IR-drop violation and the length of the bonding wires. The experimental
results are encouraging. Compared with the randomly optimized method, on average, our
approaches reduce the maximum package density by 42% and 68% for the two
technologies, the IR-drop by 10.61% and 4.58%, and the bonding wire length for
stacking ICs by 15.66%.
3.1 Overview of Package Design Methods
In the traditional design methodology, the core and package of a chip are
designed separately, as shown in Figure 3.1(A) and (C). Core designers assume that
package problems will not affect the performance of the chip. However, the
performance, complexity, and noise of the package critically affect the chip [26]. In
new chip design paradigms such as stacking ICs [5], the package design decisively
determines the final quality of the chip. Therefore, a high-quality package design is
needed for a modern chip design.
Figure 3.1 (A) The flowchart for package designs. (B) Our Co-design Methodology. (C) The flowchart for IC physical designs.
As VLSI technology enters the nanometer era, chips contain more functions and
are expected to have much better performance. At the same time, finger/pad counts
keep increasing, which adds routing complexity to the package design. Early package
technologies, such as the Dual In-line Package (DIP) or the Pin Grid Array (PGA),
offered only a small number of fingers/pads. The Ball Grid Array (BGA) is a popular
technology for modern package design because it can handle high finger/pad counts
when connecting to the Printed Circuit Board (PCB). The package design flow can be
divided into several parts, as shown in Figure 3.1(A). The
major problem in package design is routing. Many researchers [27][28][29][30] have
proposed various approaches to solving the routing problem in package design. Using
finger/pad assignments to improve the package routing is another alternative. In
[31][32][33], the authors proposed numerous assignment algorithms to improve the
routing problem. Because these methods can only handle a small finger count (
separately in the modern chip design, as shown in Figure 3.2(B). This practice leads
to over-design. Package designers usually use a finger planning method to improve
package routing, and core designers use a noise-driven I/O planning method to improve
the IR-drop of the core. To build a functionally correct chip, the chip is usually
over-designed to mitigate routability-related issues in the finger planning step and
noise-related issues in the I/O planning step. This over-design brings two
disadvantages: a longer cycle time and a higher cost for the chip design. If we
perform chip-package co-design to simultaneously account for the interdependent
influence of IR-drop and package routing across the die and the package, these
disadvantages can be eliminated.
Figure 3.2 (A) The cycle time for the traditional design method. (B) The cycle time for the co-design method.
We extend the method of [37] to stacking ICs. We develop a two-step
approach that simultaneously improves the package congestion and the IR-drop of the
core at the finger/pad assignment step for a 2-D IC. When this approach is used for a
stacking IC, the length of the bonding wires, the package congestion, and the IR-drop
can be improved simultaneously. The approach consists of a congestion-driven
assignment and a finger/pad exchange step, as shown in Figure 3.1(B). Our
contributions are summarized as follows.
• We present a finger/pad assignment method to minimize the maximum wire
congestion, and propose a finger/pad exchange method to improve the IR-drop
of the core in a stacking IC design. The assignment result is guaranteed to
lead to a legal routing solution.
• We propose an efficient estimation method to analyze the wire congestion
before routing. This method does not need to analyze the whole substrate, and
it can directly find the most congested region.
• We develop a co-design methodology to simultaneously improve the
problems of the package and the core in (stacking) ICs. The cycle time of the
chip design can be greatly shortened.
The rest of this chapter is organized as follows. Section 3.2 describes the package
architecture in 2-D and stacking ICs, finger/pad assignment design with congestion
and IR-drop consideration, and the problem formulation. Section 3.3 presents two
congestion-driven assignment methods and one finger/pad exchange method to
improve package problems and IR-drop. Section 3.4 shows the experimental results.
3.2 Congestion and IR-Drop Violation Minimization in Finger/Pad Planning
To deliver large amounts of data in modern chip designs, finger/pad counts keep
increasing and the complexity of package routing rises greatly. In addition, the
IR-drop issue seriously impacts the performance of the chip. The finger/pad
locations affect not only the package routing but also the IR-drop of the core. This
study focuses primarily on these problems. We first introduce our package model, and
then describe the sources of the package routing and IR-drop problems. Finally, we
formulate the target problem of this work.
3.2.1 Architecture and Routing of BGA Package in 2-D IC
With modern package technology, multiple layers can be used for
package routing. In our package model, there are two routing layers: the die sits on
the top layer of the substrate, and the bump balls are on the bottom layer of the
substrate. The fingers, which relay signals from the pads to the package substrate,
are placed in a closed rectangular ring on Layer 1. The pads can be connected to the
fingers by wire-bond or flip-chip [1] technologies. Because wire-bond packages are
cheaper than flip-chip packages, we adopt wire bonding to connect the die and the
package substrate in our package model. The detailed architecture is shown in Figure
3.3. Figure 3.3(A) shows the vertical view and (B) is the profile. Bump balls, which
are connected to the printed circuit board, are uniformly distributed on Layer 2. The
net between a finger and a bump ball is routed within the package substrate on
Layer 1 and Layer 2. A via connects a wire on Layer 1 to a wire on Layer 2, as shown
in Figure 3.3(B). In addition, we partition the package area into four parts and
solve the package problem of each part individually. We also assume that the finger
order and the pad order are the same.
Because the via count affects the performance and the area of the package, we
allow at most one via per net in our package routing. In addition, the candidate
via locations are around the bump balls, and at most one via may be placed among
any four adjacent bump balls. In [28], the authors proposed a global routing
method that plans the via locations and the net paths so that the routing result
complies with the monotonic characteristic: the net from the finger to the bump
ball intersects every horizontal grid line only once. Therefore, no detour routing
occurs and the wire length is reduced. We adopt the idea of [28] to plan the via
locations and the routing paths for the same purposes.
Figure 3.3 The architecture of the two-layer ball grid array package. (A) The vertical view. (B) The profile.
3.2.2 Architecture and Influence of BGA Package in Stacking ICs
Compared with a 2-D IC, a stacking IC has a different bonding-wire architecture,
as shown in Figure 1.5. If the stacking effect is ignored (as shown in Figure
1.5(A)), the chip performance worsens because the bonding wires are longer, and
their resistance and inductance increase with the wire length. In addition, the
bonding wire yield is lower when the distance between a finger and its connected
pad is longer. Figure 1.5(B) shows the optimal result for the finger/pad
planning. To achieve this target, we need to consider the stacking factor in the
finger/pad planning method.
3.2.3 The Impact of Finger/Pad Locations on Wire Congestion
The vias are evenly distributed on the substrate in our package architecture. We
define the density as the number of wires passing between two adjacent vias. If the
density is high, too many wires pass through a narrow gap, and a design-rule
violation is likely to occur. To alleviate this problem, a good method for
controlling the density is essential. The relationship between the density, the
via locations and the routing method is detailed in [28]. This work focuses on
the relationship between the density and the finger/pad locations.
A good finger/pad assignment helps to reduce the density of the package
routing. We use an example to explain the relationship between the density and the
finger/pad assignment. To isolate the effect of the finger/pad assignment, the
via locations and the routing method are fixed in the example. In Figure 3.4(A), a
random method generates the finger order 10,1,2,3,11,6,9,4,5,8,7,0. In Figure
3.4(B), a congestion-driven assignment method generates a new finger order,
10,11,1,2,6,3,4,9,5,7,8,0. Comparing Figure 3.4(B) with (A), the maximum density
is reduced by 50% merely by changing the finger order.
3.2.4 The Impact of Finger/Pad Locations on IR-Drop Violation
IR-drop is the unavoidable voltage loss across the resistance of the power
delivery network when the circuit draws current from the power pads. The IR-drop
problem is worse in a wire-bond package than in a flip-chip package. The main
reason is that the distance from the power pad to the module is shorter in a
flip-chip package than in a wire-bond package. As designs move into the
nanometer regime, the resistance of the connection wires consumes part of the
supply energy. If the power pads cannot supply enough energy, the voltage drop
might exceed the allowed limit. In this work, we modify the location of each
power pad to reduce the resistance of the connection wires and thereby reduce
IR-drop. We use a real chip design and commercial tools to verify this concept.
The simulated result is shown in Figure 3.5. Comparing Figure 3.5(B) with (A),
IR-drop is greatly improved just by changing the pad locations.
Figure 3.4 The relationship between the density and the finger/pad locations.
Figure 3.5 The simulation results of IR-drop.
To shorten the chip design cycle, we need an accurate and efficient model for
analyzing IR-drop. This analysis is usually performed after floorplanning and
placement [38][39], and its results have been shown to be close to those of SPICE
simulation. In [40], the authors proposed an analytical model for use before
floorplanning. Since the
finger/pad assignment problem is resolved before floorplanning, we adopt the model
in [40] to obtain the IR-drop map. Because this model is applied before the core
is planned, it is not very accurate. The power grid model of [40] is shown in
Figure 3.6(A), and Figure 3.6(B) shows the node model for the grid. The authors
assume that the power consumption is the same at every location and propose the
following equation to calculate the IR-drop at each point.
\[
\frac{V_{IR}(x,y) - V_{IR}(x-\Delta x,\,y)}{R_x}
+ \frac{V_{IR}(x,y) - V_{IR}(x+\Delta x,\,y)}{R_x}
+ \frac{V_{IR}(x,y) - V_{IR}(x,\,y-\Delta y)}{R_y}
+ \frac{V_{IR}(x,y) - V_{IR}(x,\,y+\Delta y)}{R_y}
= -J\,\Delta x\,\Delta y \tag{3.1}
\]
where VIR(x, y) is the voltage of the point (x, y), J is the current density, and
Rx and Ry are the resistances in the x and y directions. According to EQ(3.1), we
can exchange power pad locations to minimize Δx and Δy and thus improve IR-drop.
Figure 3.6 The analysis model for IR-drop.
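To make the node balance of EQ(3.1) concrete, the following minimal sketch
relaxes it on a small uniform grid. It is an illustration under stated
assumptions, not the model of [40] itself: the unknown is taken as the drop
relative to the pad voltage, pads are treated as ideal (zero drop), nodes on the
grid boundary simply have fewer neighbours, and the function names, default
parameter values and the Gauss-Seidel iteration are all choices made here.

```python
def solve_ir_drop(nx, ny, pads, j=1.0, rx=1.0, ry=1.0, dx=1.0, dy=1.0, sweeps=2000):
    """Gauss-Seidel relaxation of the node balance on an nx-by-ny grid.

    Returns drop[x][y], the voltage drop relative to the (assumed ideal) pads.
    At least one pad is required, otherwise the system has no fixed reference.
    """
    pad_set = set(pads)
    load = j * dx * dy                      # current drawn at every node
    drop = [[0.0] * ny for _ in range(nx)]
    for _ in range(sweeps):
        for x in range(nx):
            for y in range(ny):
                if (x, y) in pad_set:
                    continue                # pads stay at zero drop
                weighted, conductance = 0.0, 0.0
                for px, py, r in ((x - 1, y, rx), (x + 1, y, rx),
                                  (x, y - 1, ry), (x, y + 1, ry)):
                    if 0 <= px < nx and 0 <= py < ny:   # skip neighbours off the grid
                        weighted += drop[px][py] / r
                        conductance += 1.0 / r
                # load current plus conductance-weighted neighbour drops
                drop[x][y] = (load + weighted) / conductance
    return drop


def max_drop(drop):
    """Worst-case drop over the whole grid."""
    return max(max(column) for column in drop)


# One pad in the centre versus one pad in a corner of a 3-by-3 grid.
centre = max_drop(solve_ir_drop(3, 3, pads=[(1, 1)]))
corner = max_drop(solve_ir_drop(3, 3, pads=[(0, 0)]))
print(centre, corner)
```

In this toy case the centre pad gives the smaller worst-case drop, which matches
the observation that shortening the distance between pads and load points
improves IR-drop.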
3.2.5 Problem Formulation
We have detailed the relationships between the wire congestion, IR-drop and
finger/pad locations in 2-D and stacking ICs. In modern chips, the finger/pad
counts keep increasing and the supply voltages keep decreasing. Issues related to
the wire congestion on the substrate, the IR-drop of the core and the bonding
wires of stacking ICs are becoming more and more serious. The goal of this work is to plan
nets on regular finger/pad locations to alleviate these issues. In other words, we
reduce the density, the voltage drop of the core, and the length of the bonding
wires by reassigning the finger/pad locations. The problem can be formulated as
follows:
Input : The locations of the fingers/pads, F1, F2, ..., numbered from the left to
the right up to the total finger/pad count; the set of net names, N1, N2, ..., up
to the total net count, and the type of each net; and the locations of the bump
balls, B1,1,1, B2,1,2, ..., written Bb,x,y, where x and y denote the coordinates
of the bump ball and b denotes the net name. In addition, we must set the tier
number and the pad number for each tier.
Output : The assignment of each net Nb, with b from 1 to the total net count, to a
finger/pad location Fa, with a from 1 to the total finger/pad count.
Objective : Minimize the maximum density and the voltage drop of the core, and
reduce the length of the bonding wires, based on a pre-floorplan model.
3.3 Congestion-driven Finger/Pad Assignment with IR-Drop Improvement
To solve the density and IR-drop problems, we propose a two-step methodology
at the finger/pad planning level, as Figure 3.1(B) illustrates. We first propose two
congestion-driven finger/pad assignment methods to improve the package density; the
idea is to calculate the ideal density and compute a suitable finger/pad order and
locations. We then present a finger/pad exchanging approach to reduce IR-drop. This
exchange approach will simultaneously consider the density, IR-drop and bonding
wires.
3.3.1 Congestion-driven Finger/Pad Assignment
Monotonic routing is a package routing method [28] that provides high-quality
routing results. This work adopts this routing principle to verify the effect
of the assignment method. Based on the monotonic characteristic, [28] proposed a
via assignment rule. For each finger Fa, the target bump ball is Bb,x,y, the net
name is Nb, and the connected via is Vb, with coordinates (Vb,x,Vb,y). Take any
two nets Nb1 and Nb2, connected to finger/pad locations Fa1 and Fa2 and to vias at
(Vb1,x,Vb1,y) and (Vb2,x,Vb2,y). If Vb1,x < Vb2,x and Vb1,y = Vb2,y, then a1 must
be smaller than a2. In other words, the vias on a horizontal line appear in the
same order as their nets appear in the finger order. An example helps to explain
the rule. In Figure 3.4(B), the finger locations from the left to the right are
F1, F2, ..., F12, and the finger order is _,11,_,_,6,_,_,9,_,_,_,_. The via order
in y=2 is 11,6,9. If the via order conforms to this rule, a legal monotonic
routing is guaranteed to exist in this package. In this work, we assume that the
connected via is fixed at the bottom-left corner of the bump ball and use the
routing method from [28] to show the effectiveness of the finger/pad assignment.
To reduce the maximum density, a better finger/pad assignment method is needed.
Here we propose two congestion-driven finger/pad assignment approaches:
Intuitive-Insertion-Based Finger/Pad Assignment and
Density-Interval-Based Finger/Pad Assignment.
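The via assignment rule can be checked mechanically. The sketch below is only an
illustration: the function name and the list-based data layout are assumptions
made here, while the finger orders and the via rows (11,6,9 on y=2 and 1,3,5,8
on y=1) are the ones quoted in this chapter for the example of Figure 3.4.

```python
def follows_via_order(finger_order, via_rows):
    """Return True if, on every horizontal line, the nets appear in
    finger_order in the same left-to-right order as their vias."""
    position = {net: index for index, net in enumerate(finger_order)}
    for row in via_rows:
        slots = [position[net] for net in row]
        # finger positions must strictly increase along the row
        if any(a >= b for a, b in zip(slots, slots[1:])):
            return False
    return True


via_rows = [[11, 6, 9], [1, 3, 5, 8]]               # y=2 and y=1, left to right
order_a = [10, 1, 2, 3, 11, 6, 9, 4, 5, 8, 7, 0]    # random order
order_b = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]    # congestion-driven order
swapped = [10, 11, 1, 2, 9, 3, 4, 6, 5, 7, 8, 0]    # hypothetical: nets 6 and 9 swapped

print(follows_via_order(order_a, via_rows))
print(follows_via_order(order_b, via_rows))
print(follows_via_order(swapped, via_rows))
```

Both quoted finger orders satisfy the rule, so they differ only in density; the
hypothetical swapped order violates it because net 9 would sit to the left of
net 6 although its via lies to the right.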
Intuitive-Insertion-Based Finger/Pad Assignment (IFA)
This method relies on insertion to avoid violating the monotonic rule. The pseudo
code is shown in Figure 3.7. In the IFA method, the first step is to find the
unrouted horizontal lines (line 1). For each horizontal line, we calculate the
number of bump balls (line 2). For the first horizontal line (y=n, where n is the
highest horizontal line), the net name of each bump ball Bi,x,y is directly
assigned to Fx (lines 3-5). For the other horizontal lines (y=n-1 to 0), the net
name of the first bump ball Bi,1,y is assigned to F1, and the net name of each
middle bump ball (x=2 to m-1) is inserted at Fb-1, that is, immediately before
Fb, where Fb is the finger location of the net whose bump ball is at the same x
position on the previously processed line y+1 (lines 7-11). The net name of the
last bump ball is directly inserted into the last finger location (line 13).
The time complexity for IFA is O(n^2).
Figure 3.7 The pseudo code of the IFA method.
We use an example to explain the IFA flow. In this example, the locations of
the bump balls and nets are the same as in Figure 3.8. An illustration of IFA is
shown in Figure 3.8(A) and the routing result is shown in Figure 3.8(B). In Figure
3.8(A), because nets 11, 6, and 9 are on the highest horizontal line (y=2), step 1
assigns these three nets to finger locations F1, F2, and F3. Step 2 inserts nets
1, 3, 5, and 8 (y=1) into suitable locations. Net 1 is at Bi,1,y, so we assign net
1 to F1 and the nets already on the fingers move to the next finger location. For
net 3, the bump ball location is B3,2,1. The net on Bi,2,1+1 is net 6. Therefore,
net 3 is inserted before net 6. Net 5 is placed in the same way. Net 8 is inserted
into the last location because it is the last net on this line. Step 3 repeats
step 2 to insert the remaining nets. The final finger order is
10,1,11,2,3,6,4,5,9,7,8,0. The routing result is shown in Figure 3.8(B) and the
density is 2. Compared with Figure 3.4(A), the maximum density has decreased by
50%.
Figure 3.8 (A) The IFA assignment result. (B) The routing result.
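The insertion steps of the worked IFA example can be sketched as follows. This is
a reading of the example, not a transcription of Figure 3.7: the function name,
the list representation, and the fallback for a middle ball with no ball above it
are assumptions made here. The bottom row 10,2,4,7,0 is also an assumption,
inferred from the nets left over in the final finger order.

```python
def ifa(rows):
    """rows: bump-ball rows from the highest horizontal line down, each a
    left-to-right list of net names. Returns the finger order."""
    fingers = list(rows[0])                 # highest line maps straight onto F1..Fm
    for above, row in zip(rows, rows[1:]):
        for x, net in enumerate(row):
            if x == 0:
                fingers.insert(0, net)      # first ball takes F1, others shift right
            elif x == len(row) - 1 or x >= len(above):
                fingers.append(net)         # last ball (or no ball above): last finger
            else:
                # insert immediately before the net at the same x on the line above
                fingers.insert(fingers.index(above[x]), net)
    return fingers


rows = [[11, 6, 9],            # y=2
        [1, 3, 5, 8],          # y=1
        [10, 2, 4, 7, 0]]      # y=0 (assumed row contents)
print(ifa(rows))   # [10, 1, 11, 2, 3, 6, 4, 5, 9, 7, 8, 0]
```

With these rows the sketch reproduces the final finger order of the example.
Each `insert` and `index` call is linear in the number of fingers, which is
consistent with the quadratic bound stated for IFA.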
Density-Interval-Based Finger/Pad Assignment (DFA)
If IFA is applied to a two-level BGA package, the routing result is satisfactory.
If IFA is applied to a BGA package with three or more levels, the result is
imperfect because the insertion method of IFA considers only two horizontal lines
at a time. We propose another method, Density-Interval-Based Finger/Pad
Assignment (DFA), to solve this problem. The pseudo code is shown in Figure 3.9.
We first determine a processing priority based on the coordinates of all the
horizontal lines, where n is the total number of horizontal lines (line 1). For
each horizontal line, we calculate the number of bump balls (line 2). Then, the
density interval (DI) is computed (line 3), where "Total
Non-allocated Net" denotes the number of nets not yet connected to a via, "Total
Via Number" denotes the number of vias on the horizontal line, and "Used Via
Number" denotes the number of vias used on the horizontal line. "(Total Via
Number + 1)" denotes the number of segments on this horizontal line. For each
bump ball (Bi,x,y, 1 ≤ x ≤ m), we calculate the empty number (EN) and insert the
net name into the (EN + 1)th empty location (lines 4-7), where EN denotes the
number of empty finger slots to skip. The time complexity for DFA is O(n). With
this method, the nets of each horizontal line are spread evenly over the
finger/pad locations, so the routing paths of all nets are distributed evenly
over the whole substrate.
Figure 3.9 The pseudo code of the DFA method.
We use the same example to show the effectiveness of DFA. An illustration of
DFA is shown in Figure 3.10. Because nets 11, 6, and 9 are on the highest
horizontal line (y=2), the first step is to decide the finger locations of these
three nets. According to the input information, the bump balls of these three
nets are B11,1,2, B6,2,2 and B9,3,2. The Total Non-allocated Net is 12, the Total
Via Number is 4 and the Used Via Number is 3, so DI = (12-3)/(4+1) = 1.8. For net
11, the EN derived from DI = 1.8 is 1. Therefore, net 11 is inserted into F2,
because F1 is left as an empty slot. For net 6, the EN derived from DI = 1.8 is 3.
Because F2 is occupied, the first three unassigned spaces are F1, F3, and F4, and
net 6 is assigned to
the (3+1)th unassigned space, F5. Using the same method, all of the nets can be
inserted into suitable locations. The final order of the nets is 10,11,1,2,6,3,4,9,5,7,8,0,
as shown in Figure 3.10(C), and the routing result is shown in Figure 3.4(B).
Figure 3.10 The illustration of the Density-Interval-Based Finger/Pad Assignment method.
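The DFA example can be replayed with a short sketch. Several parts are
assumptions made here rather than facts taken from Figure 3.9: DI is written as
(Total Non-allocated Net - Used Via Number) / (Total Via Number + 1), as the
worked numbers (12-3)/(4+1) suggest; the empty number is taken as
EN = floor(x * DI) for the x-th bump ball on the line, a rule chosen because it
reproduces EN = 1 and EN = 3 above; the Total Via Number of lines y=1 and y=0
(4 and 5) and the bottom row 10,2,4,7,0 are likewise assumed.

```python
import math


def dfa(rows, total_fingers):
    """rows: (nets left to right, total via number) per horizontal line,
    highest line first. Returns the finger order as a list."""
    fingers = [None] * total_fingers
    unallocated = total_fingers              # assumes one net per finger
    for nets, total_vias in rows:
        di = (unallocated - len(nets)) / (total_vias + 1)    # density interval
        for x, net in enumerate(nets, start=1):
            en = math.floor(x * di)          # ASSUMED rule for the empty number
            empty = [i for i, slot in enumerate(fingers) if slot is None]
            fingers[empty[en]] = net         # (EN + 1)th empty finger location
        unallocated -= len(nets)
    return fingers


rows = [([11, 6, 9], 4),           # y=2: values given in the example
        ([1, 3, 5, 8], 4),         # y=1: via count assumed
        ([10, 2, 4, 7, 0], 5)]     # y=0: row contents and via count assumed
print(dfa(rows, 12))   # [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]
```

Under these assumptions the sketch reproduces the final DFA order of the example.
It rebuilds the list of empty slots for every net for clarity, so it does not
attempt to match the O(n) bound stated for DFA.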
DFA can obtain a better finger order when the finger count and the number of
bump levels are large. We use another example to show that DFA is better than
IFA. The nets and bump ball locations are shown in Figure 3.11. If IFA is used to
plan these nets, the finger order is
13,7,3,1,14,8,4,2,15,9,5,16,10,6,17,11,18,12,19,20, and the density of the
package routing is 6, as shown in Figure 3.11(A). If DFA is used instead, the
finger order is 13,7,3,14,1,4,8,15,9,5,2,16,10,17,6,11,18,12,19,20, and the
density is 5, as shown in Figure 3.11(B).
Figure 3.11 The comparison of IFA and DFA. (A) The IFA routing result. (B) The DFA routing result.
3.3.2 Finger/Pad Exchange of 2-D and Stacking ICs for IR-Drop and Bonding Wire Improvement
After obtaining an initial net order for the finger/pad locations, we can exchange
nets in this order to improve the IR-drop of the core. If we directly use EQ(3.1)
to calculate IR-drop, the analysis time for the chip is very long. The main reason
is that every chip design contains a very large number of analysis points and
power pads. To address this problem, an efficient method for quickly analyzing
IR-drop is needed. In this dissertation, we
compute the variation of Δx and Δy as an indicator of the IR-drop improvement
when the location of a power pad is exchanged. Used alone, this measure can cause
high-density routing in the package design because it ignores the density
problem. Here we propose a method that improves IR-drop while simultaneously
suppressing the density.
Section 3.3.1 introduced the monotonic order. If our exchange method ignores
this principle, a monotonic routing may no longer exist in the package. To
maintain this property, we add a range constraint to our exchange method. The key
idea of the range constraint is to map the monotonic order of the vias [28] onto
the finger locations. We choose three bump balls Bb1,x1,y1, Bb2,x2,y2 and
Bb3,x3,y3, whose connected fingers are Fa1, Fa2 and Fa3. If x1 < x2 < x3 and
y1 = y2 = y3, Fa1 must lie to the left of Fa2 and Fa3 must lie to the right of
Fa2. We use an example to explain how we formulate the constraint. In Figure
3.4(B), net 6 is assigned to F5, and the exchange range of net 6 is between F3 and
F7. If the exchange range were not limited, we would have to pay a higher cost to
find a suitable connected via to build the monotonic routing.
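The range constraint for a single net can be sketched as below. The function name
and the data layout are assumptions made here; the sketch only looks at the net's
own horizontal line, so a real exchange would also have to respect the range of
the net it swaps with.

```python
def exchange_range(finger_order, row_nets, net):
    """Return the (lowest, highest) 1-based finger index that `net` may occupy
    without passing a neighbour on its own horizontal line."""
    position = {n: i + 1 for i, n in enumerate(finger_order)}   # F1..Fn
    k = row_nets.index(net)
    # stay to the right of the left neighbour's finger, if there is one
    low = position[row_nets[k - 1]] + 1 if k > 0 else 1
    # stay to the left of the right neighbour's finger, if there is one
    high = (position[row_nets[k + 1]] - 1
            if k + 1 < len(row_nets) else len(finger_order))
    return low, high


order_b = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]   # finger order of Figure 3.4(B)
print(exchange_range(order_b, [11, 6, 9], 6))      # (3, 7)
```

For net 6 the neighbours on y=2 are net 11 at F2 and net 9 at F8, which gives the
range F3 to F7 quoted in the example.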
While finger/pads are being exchanged, the package density must be controlled
at the same time. We propose a control method. After the congestion-driven
assignment step, the initial order of the nets on the finger/pad locations is
determined. The bump balls that are planned on the highest horizontal line are
recorded. This recording is needed because the monotonic rule is used in our
package routing: the density of a higher horizontal line is higher than the
density of a lower horizontal line. Therefore, we only monitor the density on the
highest horizontal line. If the recorded number is x, the nets can be divided into
x+1 sections, Sc, 0 < c ≤ x+1. For each section, we record the interval number
Ic, 1 ≤ c ≤ x+1. When the nets are exchanged, the interval numbers change. The
new numbers are called I'c, 1 ≤ c ≤ x+1. Therefore, the increased density (ID)
can be computed as follows:
\[
ID = \max_{1 \le c \le x+1} \left( I'_c - I_c \right) \tag{3.2}
\]
The package density is inversely proportional to the value of ID.
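A minimal sketch of the bookkeeping behind EQ(3.2) follows. It rests on two
assumptions made here: the interval number of a section is taken to be the number
of nets lying between two consecutive recorded nets in the finger order, and ID is
taken as the largest increase of an interval number after the exchange. The
function names and the exchanged order are hypothetical.

```python
def interval_numbers(finger_order, top_row_nets):
    """Count the nets in each section delimited by the recorded top-row nets."""
    recorded = set(top_row_nets)
    counts = [0]
    for net in finger_order:
        if net in recorded:
            counts.append(0)        # a recorded net closes one section, opens the next
        else:
            counts[-1] += 1
    return counts


def increased_density(before, after, top_row_nets):
    """Largest growth of any section between two finger orders (assumed sign)."""
    old = interval_numbers(before, top_row_nets)
    new = interval_numbers(after, top_row_nets)
    return max(n - o for o, n in zip(old, new))


top_row = [11, 6, 9]                                # x = 3 recorded nets, 4 sections
before = [10, 11, 1, 2, 6, 3, 4, 9, 5, 7, 8, 0]     # order of Figure 3.4(B)
after = [10, 11, 1, 2, 3, 6, 4, 9, 5, 7, 8, 0]      # hypothetical: nets 6 and 3 exchanged
print(interval_numbers(before, top_row))            # [1, 2, 2, 4]
print(increased_density(before, after, top_row))    # 1
```

The hypothetical exchange moves net 3 into the section between nets 11 and 6, so
that section grows by one net while its neighbour shrinks by one.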
The impact of bonding wires should be considered in the finger/pad exchange
method when there are more than two tiers. We propose a method to improve the
bonding wires; the idea is to space the pads of different tiers evenly. According
to the tier number, we create a unique parameter for each tier, UPd, where d runs
from 1 to the tier number. This unique parameter has as many bits as there are
tiers, and each bit denotes one tier. We use an example to explain how this
parameter is built. If the tier number is 3, the parameters from Tier 1 to Tier 3
are "001", "010" and "100". Every finger has one bonding wire that connects it to
a pad. The pad is on Tier d, and Tier d has one unique parameter, UPd. Therefore,
we set the parameter of the fingers that connect to Tier d
as UPd. The set of finger locations, F1,...F, ar