Variability-Driven Formulation for Simultaneous Gate Sizing and Post-Silicon Tunability
Allocation
Vishal Khandelwal and Ankur Srivastava
Department of Electrical and Computer Engineering
University of Maryland, College Park
http://www.ece.umd.edu/~vishalk
Introduction
Process variations cause a significant spread in design performance in sub-90nm technologies
– This spread impacts yield and reliability
It is necessary to explicitly consider the impact of process variations on design parameters
Several statistical analysis and optimization techniques have been proposed to improve timing/power yields
Handling Process Variations
Process variations are handled in two ways:
Design-Time Optimization
– Statistical Gate Sizing [Davoodi, DAC’06][Sapatnekar, DAC’05][Zhou, ICCAD’05]
– Statistical Buffer Insertion [He, ISPD’06][Davoodi, ICCD’05][Wong, ICCAD’05][Khandelwal, ICCAD’03]
Post-Fabrication Tunability
– Post-Silicon Tunable Clock-Tree Buffers [Chen, ICCAD’05][Mahoney, ISSC’05][Takahashi, 2003][Tam, JSSC’00]
– Adaptive Body-Biasing [Kim, ISLPED’03][Orshansky, ICCAD’06]
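Traditional Gate Sizing
Gate size: $s_i$
Minimize area or power, subject to:
– meeting a delay constraint at the output
– size constraints
$$
\begin{aligned}
\text{Minimize} \quad & \text{Area, Power, \ldots} \\
\text{subject to} \quad & t_i + d_i(s) \le t_j \\
& t_n \le T_{cons} \\
& t_0 = 0 \\
& s_{min} \le s_i \le s_{max}
\end{aligned}
$$
[Fishburn, Dunlop 1985] [Sapatnekar, 1993]
[Figure: timing graph with nodes 0 to n, showing arrival times $t_i$ and $t_j$, gate delay $d_i$, and the constraint $T_{cons}$]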
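Traditional Gate Sizing
Posynomial gate delay expression [Fishburn, Dunlop 1985] [Sapatnekar, 1993]:
$$
d_i = a_0 + a_i \, \frac{\sum_{j \in FO(i)} a_{ij} s_j}{s_i}
$$
Gate sizing formulation:
$$
\begin{aligned}
\text{Minimize} \quad & \text{Area, Power, \ldots} \\
\text{subject to} \quad & t_i + d_i(s) \le t_j \\
& t_n \le T_{cons} \\
& t_0 = 0 \\
& s_{min} \le s_i \le s_{max}
\end{aligned}
$$
Convex formulation, obtained with the substitution $s_i = e^{x_i}$, which turns the gate delay into $d_i(x)$:
$$
\begin{aligned}
\text{Minimize} \quad & \text{Area, Power, \ldots} \\
\text{subject to} \quad & t_i + a_0 + a_i \sum_{j \in FO(i)} a_{ij} e^{x_j - x_i} \le t_j \\
& t_n \le T_{cons} \\
& t_0 = 0 \\
& s_{min} \le e^{x_i} \le s_{max}
\end{aligned}
$$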
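The effect of the substitution $s_i = e^{x_i}$ can be checked numerically. The sketch below is illustrative only: the coefficient values, the two-fanout gate, and the helper names (`delay_in_s`, `delay_in_x`, `midpoint_gap`) are assumptions, not taken from the slides. It evaluates the posynomial delay in both variable sets and tests midpoint convexity on one pair of points.

```python
import math

def delay_in_s(s_i, s_fanout, a0=1.0, a_i=2.0, a_ij=(0.5, 0.5)):
    """Posynomial delay d_i = a0 + a_i * sum_j(a_ij * s_j) / s_i (assumed coefficients)."""
    load = sum(a * s for a, s in zip(a_ij, s_fanout))
    return a0 + a_i * load / s_i

def delay_in_x(x_i, x_fanout, a0=1.0, a_i=2.0, a_ij=(0.5, 0.5)):
    """Same delay after s = exp(x): a sum of exponentials of affine functions of x."""
    return a0 + a_i * sum(a * math.exp(x_j - x_i) for a, x_j in zip(a_ij, x_fanout))

def midpoint_gap(f, p, q):
    """f(midpoint of p and q) minus the average of f(p) and f(q); <= 0 for a convex f."""
    mid = tuple((u + v) / 2.0 for u, v in zip(p, q))
    return f(mid) - 0.5 * (f(p) + f(q))

# Points are (size of gate i, size of fanout gate 1, size of fanout gate 2).
in_s = lambda v: delay_in_s(v[0], v[1:])
in_x = lambda v: delay_in_x(v[0], v[1:])

p_s, q_s = (1.0, 1.0, 1.0), (2.0, 4.0, 4.0)
p_x = tuple(math.log(v) for v in p_s)
q_x = tuple(math.log(v) for v in q_s)

gap_s = midpoint_gap(in_s, p_s, q_s)  # positive: the delay is not convex in s
gap_x = midpoint_gap(in_x, p_x, q_x)  # non-positive: consistent with convexity in x
```

A single pair of points cannot prove convexity; it only shows that the $s$-form fails the midpoint test where the $x$-form passes it.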
Effects of Process Variations
$\xi = \{L_{eff}, T_{ox}, \ldots\}$: set of random variables with arbitrary distributions
[Figure: transistor cross-section showing $T_{ox}$ and $L_{eff}$ between the n+ regions]
Delay of each gate becomes a random variable
$$
d_i(s, \xi) = a_0(\xi) + a_i(\xi) \, \frac{\sum_{j \in Fanout(i)} a_{ij} s_j}{s_i}
$$
Statistical Gate Sizing [Davoodi, DAC’06][Sapatnekar, DAC’05][Zhou, ICCAD’05]
Post-Silicon Tunable (PST) Clock Tree Buffers
[Figure: clock tree with buffers B1 to B7 driving flip-flops FF1 to FF8]
Tunable clock buffers can introduce extra slack into critical paths after fabrication
Design overhead: area, clock-tree power
[Chen, ICCAD’05][Mahoney, ISSC’05][Takahashi, 2003][Tam, JSSC’00]
Post-Silicon Tunable Clock Tree Buffers
Let $D_{ij}$ be the delay of the longest path between flip-flops i and j
$$
T_i + D_{ij} \le T_{clk} + T_j - T_{set}
$$
Consider flip-flops 2 and 7: tune buffers to change clock skew
[Figure: clock tree with buffers B1 to B7 driving flip-flops FF1 to FF8]
$$
T_2 + (T_1^{Buf} + T_2^{Buf} + T_4^{Buf}) + D_{27} \le T_7 + (T_1^{Buf} + T_3^{Buf} + T_7^{Buf}) + T_{clk} - T_{set}
$$
$$
0 \le T_i^{Buf} \le Max_i^{Buf}
$$
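The flip-flop 2 to flip-flop 7 constraint can be evaluated directly. The sketch below is illustrative: it assumes FF2 is clocked through B1, B2, B4 and FF7 through B1, B3, B7, and every number (delays in ns, the 0.2 tuning range) and function name is hypothetical.

```python
def clock_arrival(base_arrival, buffers_on_path, buf_delay):
    """Clock arrival at a flip-flop: base arrival plus the tuned delay of each buffer on its path."""
    return base_arrival + sum(buf_delay[b] for b in buffers_on_path)

def setup_slack(t_launch, t_capture, d_ij, t_clk, t_set):
    """Slack of T_i + D_ij <= T_clk + T_j - T_set; non-negative means the constraint holds."""
    return (t_clk + t_capture - t_set) - (t_launch + d_ij)

def within_range(buf_delay, max_range):
    """Check 0 <= T_i^Buf <= Max_i^Buf for every buffer."""
    return all(0.0 <= buf_delay[b] <= max_range[b] for b in buf_delay)

# Assumed clock paths and hypothetical values (ns).
PATH_FF2 = ("B1", "B2", "B4")
PATH_FF7 = ("B1", "B3", "B7")
MAX_RANGE = {b: 0.2 for b in ("B1", "B2", "B3", "B4", "B7")}
D_27, T_CLK, T_SET = 1.05, 1.0, 0.05

def slack_ff2_to_ff7(buf_delay):
    launch = clock_arrival(0.0, PATH_FF2, buf_delay)
    capture = clock_arrival(0.0, PATH_FF7, buf_delay)
    return setup_slack(launch, capture, D_27, T_CLK, T_SET)

untuned = {b: 0.0 for b in MAX_RANGE}
tuned = dict(untuned, B7=0.15)        # delay the capture clock only
shared_only = dict(untuned, B1=0.1)   # B1 sits on both paths, so its delay cancels

slack_before = slack_ff2_to_ff7(untuned)   # negative: setup violation
slack_after = slack_ff2_to_ff7(tuned)      # positive: violation removed by tuning B7
```

Tuning the shared buffer B1 leaves the slack unchanged because it appears on both sides of the inequality; only buffers on one of the two clock paths alter the skew.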
Optimization Objective: Tunability Cost
Metric to capture the overhead due to PST buffers in the design:
– Silicon area
– Clock-tree power
$$
TunabilityCost = \sum_{i \in PST\ Buffers} Max_i^{Buf}
$$
Optimization Objective: Binning Yield Loss
$$
BinningYieldLoss = \int_{T_{cons}}^{\infty} Loss(t) \, f_T(t) \, dt
$$
[V. Zolotov, DAC’04]
Convex loss function $Q(\cdot)$:
$$
BYL = \int_{T_{cons}}^{\infty} Q(t - T_{cons}) \, f_T(t) \, dt
$$
[Figure: loss and the delay distribution $f_T(t)$ plotted against delay $t$, with $T_{cons}$ marked]
[D. Blaauw, GLSVLSI’05]
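When only samples of the circuit delay are available, the binning yield loss integral can be approximated by a sample average. The sketch below is an assumption-laden illustration: the quadratic choice of Q, the Gaussian delay samples, and the function names are not from the slides, and `timing_yield_loss` assumes the term means the fraction of samples that miss the constraint.

```python
import random

def quadratic_loss(excess):
    """An assumed convex loss Q: penalty grows with the square of the delay excess."""
    return excess * excess

def binning_yield_loss(delay_samples, t_cons, loss=quadratic_loss):
    """Sample-average estimate of the integral of Q(t - T_cons) f_T(t) over t > T_cons."""
    total = 0.0
    for t in delay_samples:
        if t > t_cons:
            total += loss(t - t_cons)
    return total / len(delay_samples)

def timing_yield_loss(delay_samples, t_cons):
    """Fraction of samples whose delay exceeds T_cons (assumed definition)."""
    return sum(1 for t in delay_samples if t > t_cons) / len(delay_samples)

# Hypothetical delay distribution: mean 1.0 ns, standard deviation 0.05 ns.
rng = random.Random(0)
delays = [rng.gauss(1.0, 0.05) for _ in range(10_000)]
byl = binning_yield_loss(delays, t_cons=1.05)
tyl = timing_yield_loss(delays, t_cons=1.05)
```

Unlike the plain timing yield loss, the BYL estimate distinguishes a chip that misses the constraint by a little from one that misses it by a lot.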
Problem Statement
Given a sequential design with a synthesized PST clock-tree (known buffer locations), perform simultaneous:
– Statistical gate sizing
– PST buffer tuning range determination
Such that Binning Yield Loss and Tunability Cost are minimized
[Figure: PST clock tree with buffers B1 to B7 and flip-flops FF1 to FF8, and the timing graph with gate delay $d_i$, nodes 0 to n, and the constraint $T_{cons}$]
Two-Stage Formulation
Gate sizes: $x$; tuning buffer ranges: $r$
1. Deterministic constraints: meeting the timing requirement assuming no variations, $\xi_0 = \{l_{eff}^0, t_{ox}^0, \ldots\}$
2. Capturing variability in the objective
First stage:
$$
\begin{aligned}
\text{Minimize} \quad & \left( BYL(x, r) + TunabilityCost(r) + GateSizes \right) \\
\text{subject to} \quad & T_i + D_{ij} \le T_{clk} + T_j - T_{set} && \forall\, FlipFlops\ (i, j) \\
& t_p + d_q(x, \xi_0) \le t_q && \forall\, p \in fanin(q),\ FlipFlops\ (i, j) \\
& t_q \le D_{ij} && \forall\, q \in fanin(FlipFlop\ j) \\
& x_{min} \le x \le x_{max} \\
& 0 \le r \le Max^{Buf}
\end{aligned}
$$
Second Stage Formulation
$$
V(x, r, \xi) =
\begin{cases}
Q(D_{ij}(x, r, \xi) - T_{cons}) & D_{ij} > T_{cons} \\
0 & \text{otherwise}
\end{cases}
$$
$$
BYL(x, r) = \int_{T_{cons}}^{\infty} Q(D_{ij}(x, r, \xi) - T_{cons}) \, f_T(t) \, dt = \int v \, f_V(v) \, dv = E[V(x, r, \xi)]
$$
[Figure: loss Q against delay with $T_{cons}$ marked, and the distributions $f_T(t)$ and $f_V(v)$]
Second stage, given a solution to the first-stage problem and a variability sample, $(x_0, r_0, \xi)$:
$$
\begin{aligned}
v(x_0, r_0, \xi) = \text{Minimize} \quad & \sum_{FF(i, j)} Q(T_{ij}^{viol}) \\
\text{subject to} \quad & T_i + \sum_{k \in C_i} T_k^{Buf} + D_{ij}(x_0, \xi) \le T_{clk} + T_j + \sum_{k \in C_j} T_k^{Buf} - T_{set} + T_{ij}^{viol} && \forall\, FlipFlop\ (i, j) \\
& t_p + d_q(x_0, \xi) \le t_q && \forall\, p \in fanin(q),\ FlipFlop\ (i, j) \\
& t_q \le D_{ij}(x_0, \xi) && \forall\, q \in fanin(FlipFlop\ j) \\
& T_{ij}^{viol} \ge 0 && \forall\, FlipFlop\ (i, j) \\
& 0 \le T^{Buf} \le r_0 && \forall\, PST\ Buffer
\end{aligned}
$$
No Statistical Timing Analysis scheme exists to estimate the timing distribution of a circuit given gate sizes and tuning buffer ranges
Each sample of variability requires a different amount of tuning for maximum timing yield
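The point that each variability sample needs its own amount of tuning can be seen in a drastically simplified case: a single launch/capture flip-flop pair with one tunable buffer on the capture clock. This is a sketch under those assumptions, not the full second-stage program; the closed-form recourse, the quadratic Q, and all numbers and names are hypothetical.

```python
def second_stage(delay, r, t_clk=1.0, t_set=0.05, skew=0.0):
    """Best tuning and remaining violation for one sampled path delay.

    needed is the extra capture-clock delay that would just satisfy
    T_i + D_ij <= T_clk + T_j + T_buf - T_set, with skew = T_j - T_i.
    The buffer can supply at most r; what it cannot cover is the violation.
    """
    needed = delay + t_set - t_clk - skew
    tuning = min(r, max(0.0, needed))
    violation = max(0.0, needed - r)
    return tuning, violation

def sample_average_byl(delay_samples, r, loss=lambda v: v * v):
    """Average of Q(violation) over the samples: an estimate of BYL for this tuning range."""
    return sum(loss(second_stage(d, r)[1]) for d in delay_samples) / len(delay_samples)

# Three hypothetical samples of the same path under different variability.
samples = [0.90, 1.00, 1.10]
per_sample = [second_stage(d, r=0.1) for d in samples]
byl_no_tuning = sample_average_byl(samples, r=0.0)
byl_with_tuning = sample_average_byl(samples, r=0.1)
```

Here the fast sample needs no tuning, the nominal one needs about 0.05, and the slow one saturates the range and still violates, which is why the loss has to be evaluated sample by sample.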
Convex Problem
First-stage constraints are convex
First-stage objective is convex if BYL(x, r) is convex
$$
BYL(x, r) = E[V(x, r, \xi)] = \int v \, f_V(v) \, dv = \int V(x, r, \xi) \, f(\xi) \, d\xi
$$
Need to show that each sample $V(x, r, \xi_0)$ is convex
From the second-stage formulation one can show that $V(x, r, \xi_0)$ is convex
THEOREM: The proposed two-stage stochastic programming formulation is convex
PROOF: Detailed proof omitted for brevity
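The step from per-sample convexity to convexity of BYL can be written out as follows. This is the standard argument that an expectation of convex functions is convex, sketched here with the shorthand $z = (x, r)$ and $\lambda \in [0, 1]$ (both introduced only for this illustration); it is not the omitted proof from the slides.

```latex
\begin{aligned}
BYL(\lambda z_1 + (1 - \lambda) z_2)
  &= \int V(\lambda z_1 + (1 - \lambda) z_2, \xi) \, f(\xi) \, d\xi \\
  &\le \int \left[ \lambda V(z_1, \xi) + (1 - \lambda) V(z_2, \xi) \right] f(\xi) \, d\xi \\
  &= \lambda \, BYL(z_1) + (1 - \lambda) \, BYL(z_2)
\end{aligned}
```

The inequality uses convexity of $V(\cdot, \xi)$ for every fixed sample $\xi$ together with $f(\xi) \ge 0$, so no assumption on the shape of the distribution is needed.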
Kelley’s Cutting Plane Algorithm
Iteratively solve the first- and second-stage formulations
Given a solution to the first-stage formulation, we use the method of finite differences to generate a lower bound to BYL from the second-stage formulation
$$
BYL(x, r) \ge \alpha_k + \langle \beta_k, (x, r) \rangle
$$
where $\alpha_k$ and $\beta_k$ are the intercept and gradient of the cut at iteration $k$
Add this constraint to the first-stage formulation at each iteration
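The iteration can be sketched on a one-dimensional stand-in problem. Everything below is an illustrative assumption: the objective `toy_objective` merely imitates "loss plus tunability cost" as a function of a single tuning range, the master problem is solved by enumerating cut intersections (workable only in one dimension), and none of the names come from the slides or from MOSEK.

```python
def finite_difference(f, x, h=1e-6):
    """Central finite-difference estimate of the slope of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def solve_master(cuts, lo, hi):
    """Minimize max_k(intercept_k + slope_k * x) over [lo, hi].

    In one dimension the minimizer is an interval end or a point where two cuts cross.
    """
    candidates = [lo, hi]
    for i, (b1, m1) in enumerate(cuts):
        for b2, m2 in cuts[i + 1:]:
            if abs(m1 - m2) > 1e-12:
                x = (b2 - b1) / (m1 - m2)
                if lo <= x <= hi:
                    candidates.append(x)
    model = lambda x: max(b + m * x for b, m in cuts)
    best = min(candidates, key=model)
    return best, model(best)

def kelley(f, lo, hi, tol=1e-6, max_iter=100):
    """Kelley's cutting-plane method for a convex f on [lo, hi]."""
    x = lo
    cuts = []
    best_x, best_val = x, f(x)
    for _ in range(max_iter):
        fx = f(x)
        if fx < best_val:
            best_x, best_val = x, fx
        slope = finite_difference(f, x)
        cuts.append((fx - slope * x, slope))   # cut: f(y) >= fx + slope * (y - x)
        x, lower = solve_master(cuts, lo, hi)  # next trial point and lower bound
        if best_val - lower <= tol:
            break
    return best_x, best_val, len(cuts)

def toy_objective(r):
    """Hypothetical convex objective: quadratic loss term plus linear tunability cost."""
    return (r - 3.0) ** 2 + 0.5 * r

r_star, value, n_cuts = kelley(toy_objective, 0.0, 10.0)
```

Each cut is a linear lower bound built from one evaluation and one finite-difference slope, and the gap between the best value found and the master problem's optimum serves as the stopping test.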
Shortest-Path Constraints
$$
T_i + D_{ij}^{short} \ge T_j + T_{hold} \qquad \forall\, FlipFlop\ (i, j)
$$
Inherently non-convex in nature
Approximate gate delay using a linear approximation (lower bound)
$$
D_{ij}^{short} = \sum_{m \in p} d_m^{lin} \qquad \forall\, gates\ m\ on\ path\ p
$$
$$
d_m^{lin} = a_m^0 + a_m^1 x_m + \sum_{n \in fanout(m)} b_n x_n
$$
The two-stage stochastic programming formulation can be modified to consider shortest-path constraints
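One way to obtain a linear lower bound of this shape is the tangent plane of the convex delay in the log-size variables. The sketch below assumes that choice, a simplified delay model with unit load coefficients, and hypothetical coefficient values and names; the slides do not say how their linear approximation was fitted.

```python
import math

def delay(x_m, x_fanout, a0=1.0, a1=2.0):
    """Convex delay of gate m in log-sizes: a0 + a1 * sum_n exp(x_n - x_m) (assumed model)."""
    return a0 + a1 * sum(math.exp(x_n - x_m) for x_n in x_fanout)

def tangent_coefficients(x_m0, x_fanout0, a0=1.0, a1=2.0):
    """Coefficients (a_lin0, a_lin1, b) of d_lin = a_lin0 + a_lin1 * x_m + sum_n b_n * x_n.

    Each exponential term t_n * exp(u - u0), with u = x_n - x_m, is replaced by
    its tangent t_n * (1 + u - u0) at the expansion point.
    """
    terms = [a1 * math.exp(x_n - x_m0) for x_n in x_fanout0]
    a_lin0 = a0 + sum(t * (1.0 - (x_n - x_m0)) for t, x_n in zip(terms, x_fanout0))
    a_lin1 = -sum(terms)
    return a_lin0, a_lin1, terms

def linear_delay(x_m, x_fanout, coeffs):
    a_lin0, a_lin1, b = coeffs
    return a_lin0 + a_lin1 * x_m + sum(b_n * x_n for b_n, x_n in zip(b, x_fanout))

# Expand around a hypothetical operating point with two fanout gates.
coeffs = tangent_coefficients(0.0, [0.5, -0.2])
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
worst_gap = min(
    delay(xm, [x1, x2]) - linear_delay(xm, [x1, x2], coeffs)
    for xm in grid for x1 in grid for x2 in grid
)  # stays non-negative: the tangent never overestimates the delay
```

Because the tangent of a convex function lies below it everywhere, summing these linear delays along a path underestimates the path delay, which is the conservative direction for a hold (shortest-path) check.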
Experimental Results
Implemented the framework in SIS using MOSEK to solve the convex formulation
Used CAPO to place the netlist to get spatially correlated gate delays
Assumed 15% Vth variation in the 90nm technology node [Predictive Technology Model]
Synthesized the PST clock-tree using the technique proposed in [Chen et al., ICCAD’05]
[Figure: placed gates i and j at coordinates $(x_i, y_i)$ and $(x_j, y_j)$]
Experimental Results
Experimental comparison on ISCAS benchmarks:
[Chen]:
– Nominal gate sizing
– PST clock-tree generation using [Chen et al., ICCAD’05]
Sensitivity:
– Retain PST clock-tree location and range
– Sensitivity-driven statistical gate sizing algorithm: size the gate with maximum yield gain greedily (iterative)
– Similar in spirit to [Zhou ICCAD’05, Zolotov DAC’05]
Stochastic:
– Retain PST clock-tree buffer locations
– Proposed simultaneous gate sizing and post-silicon tunability allocation algorithm
BYL, Area and Tuning Range Comparison
[Figure: three bar charts comparing [Chen], Sensitivity and Stochastic on benchmarks s344, s382, s400, s526 and s635: Binning Yield Loss, Area (Logic Gates), and Tuning Range]
Timing Yield Loss Comparison
[Figure: bar chart of Timing Yield Loss for [Chen], Sensitivity and Stochastic on benchmarks s344, s382, s400, s526 and s635]
Average Timing Yield Loss
[Chen]   Sensitivity   Stochastic
0.22     0.19          0.03
Runtime Comparison
[Figure: bar chart of Runtime for Sensitivity and Stochastic on benchmarks s344, s382, s400, s526 and s635]
Number of Iterations
Technique     s344   s382   s400   s526   s635
Sensitivity   24     40     18     15     109
Stochastic    7      19     13     14     7
Summary and Future Work
Variability-driven framework for simultaneous gate sizing and post-silicon tunability allocation to minimize binning-yield loss and tunability cost
Efficient stochastic programming based scheme to solve the formulation
No assumptions about parameter distributions or their correlations
Need to develop a statistical timing analysis scheme that can consider the effect of post-silicon tunability
Thank You!