Assessing Costs of Variability, Reliability and Resilience
Andrew B. Kahng
UCSD CSE and ECE Departments
[email protected] http://vlsicad.ucsd.edu
UCSD VLSI CAD Laboratory 2NSF Variability Expedition / DFG SPP 1500 Colloquium, 141113
Design Capability Gap, Value Scaling Gap
• “Available density” ideally grows at 2x/node
• = a typical view of “Moore’s Law”
• Even so, “realized density” grows at 1.6x/node
• Power, performance, area resources spent on guardband, reliability, etc.
• Designers obtain only part of Moore’s Law scaling benefits
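The compounding of the 2x vs. 1.6x per-node figures can be checked with a few lines of arithmetic. This is only a sketch: the function name and the assumption that both rates stay constant across nodes are illustrative, not from the talk.

```python
def density_gap(nodes, available=2.0, realized=1.6):
    """Ratio of available to realized density after `nodes` technology nodes.

    Assumes (for illustration only) that both growth rates are constant per node.
    """
    if nodes < 0:
        raise ValueError("nodes must be non-negative")
    return (available / realized) ** nodes
```

Under this assumption the gap is 1.25x after one node and keeps compounding with each further node.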
Challenge: Variability + Reliability
• Variability + Reliability = challenges to design closure for a competitive IC product
• Design costs from margins; “0-1 benefits”
• Resilience = system product’s ability to mitigate variability and reliability phenomena
• Error detection and repair mechanisms
• Alternative guardbanding mechanisms for different system abstractions: stochastic, approximate, …
• Costs and benefits often less well-defined
[Figure: sources of variability and reliability loss. Labels: defocus/dose variation, misalignment, temperature variation, non-rectangular shapes, line-end shortening, crosstalk, IR-drop, imperfect regulators, non-uniform CD, erosion/dishing in CMP, electromigration, hot-carrier injection, NBTI, alpha-particle, line edge roughness, mask CD error, wafer flatness, lens aberration, flare]
“Cost of Variability and Reliability”
[Figure: design quality (e.g., frequency) vs. technology node; signoff with larger guardbands; regions labeled “Guardbands” and “Lost benefits”]
Standard vague picture: increased guardband ⇒ lost benefits of technology = no ROI
Quantified Cost of Guardband [ISQED08]
Can we quantify cost of guardband?
Idea (2007-2008): study design benefit of reduced guardband
N.B.: going to the next node gives 20% speed, 20% power benefit ⇒ 10% is half a node!
E.g., 50% guardband reduction looks like:
[Figure: parameter range from Param_best (-100%) through 0% to Param_worst (100%)]
Expected impacts of guardband reduction:
Delay reduction
Easier optimization
Smaller gate size
Smaller area (A)
Fewer defects
Less cost
Shorter wires
Y_r = e^{-A d}   (d: defect density)
N_{dies} = \frac{\pi r^2}{A} - \frac{2 \pi r}{\sqrt{2A}}   (r: wafer radius)
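To make the area-to-yield link concrete, here is a minimal sketch that assumes a Poisson random-defect yield model, Y = exp(-A·d), and the common gross-die estimate N = πr²/A − 2πr/√(2A). Both formulas, the function names, and the units (mm, mm², defects per mm²) are assumptions for illustration, not the talk's own code.

```python
import math

def gross_dies(wafer_radius_mm, die_area_mm2):
    """Gross dies per wafer: pi*r^2/A - 2*pi*r/sqrt(2*A) (assumed estimate)."""
    r, a = wafer_radius_mm, die_area_mm2
    return math.pi * r * r / a - 2.0 * math.pi * r / math.sqrt(2.0 * a)

def random_defect_yield(die_area_mm2, defect_density_per_mm2):
    """Poisson yield model Y = exp(-A*d) (assumed model)."""
    return math.exp(-die_area_mm2 * defect_density_per_mm2)

def good_dies(wafer_radius_mm, die_area_mm2, defect_density_per_mm2):
    """Expected good dies per wafer = gross dies times random-defect yield."""
    return gross_dies(wafer_radius_mm, die_area_mm2) * random_defect_yield(
        die_area_mm2, defect_density_per_mm2
    )
```

In this model a smaller die area helps twice: more dies fit on the wafer, and each die is less likely to contain a defect.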
Design Outcomes from Guardband Reduction
• 40% guardband reduction
• Area: 13% reduction
• Dynamic power: 13% reduction
• Leakage power: 19% reduction
• Wirelength: 12% reduction
• Tool runtime (S,P&R): 28% reduction
• #Timing viols.: 100% reduction
• #Good dies per wafer (w/o process enhancement): 4% increase
• Raw die per wafer
• Parametric yield
• 40nm sweetspot: 20% guardband reduction
• Quantified impact of guardband ⇒ insight into cost of guardband!
• Can we then answer: What is cost of {variability, reliability, resilience}?
[Figure: experiments with industry chip implementation flow. Inputs: RTL design (AES, JPEG, SOC1), technology (90nm, 65nm, 45nm), cell library guardband reduction, RC guardband reduction. Steps: synthesis, placement, clock tree synthesis, routing. Output: analyze outcomes (area, wirelength, runtime, #violations, yield)]
My Group: Reduced Margin = Reduced Cost
• Pessimism removal with more accurate margins
• Explicit tradeoffs across various types of margin, e.g., 1 mV = 5 MHz
• Co-optimization across engineering scopes, chip implementation phases; includes “cross-layer”, adaptivity / resilience, …
[Figure labels: design time; margin (ps, nm, mV, …); product quality (power, area, fmax, Iddq, …); model and analysis accuracy (rms, %, σ)]
Reducing Cost Measuring Cost
• Measuring Cost of X is difficult! (which is why we’re here …)
• Reliability margins are intertwined with other margins
• Tough to isolate specific costs of variability / reliability / resilience, especially in any design-agnostic way
• Toward Assessing Cost of … (work at UCSD)
• … Variability
• Reducing (phantom) margins: BEOL corners, FF timing model
• … Reliability
• “cost of EM guardband”
• AVS-BTI-EM: cost of wrong signoff conditions
• Non-default routing rules: cost of naïve enforcement of reliability margins
• Assessment of EM margin considering lifetime (throughput and performance)
• … Resilience
• “MinRazor”: tradeoff of resilience mechanism cost vs. margin cost
• “PVS”: process-aware voltage scaling (design-independent, tunable monitors)
Our Usual Playing Field: SOC Implementation
[Figure: SOC implementation flow. Inputs: RTL designs, technology (90nm, 65nm, 45nm, 28nm), cell library guardband reduction, RC guardband reduction. Steps: synthesis, placement, clock tree synthesis, routing. Outcomes: area, wirelength, runtime, #viols, yield. Stage labels: P&R stage optimization, signoff, runtime optimization. Study labels: BEOL corners, FF model, AVS-BTI-EM signoff, naive EM compliance, EM-overdrive, MinRazor]
Another View of the (Reliability) Playing Field
[Figure: dependency graph among reliability factors. Nodes: activity factor (α), Jrms, temperature, wire width, wire spacing, lifetime (MTTF), driver size, supply voltage, timing slack, |Vthp|, |Vthn|, frequency, slew rate, load/fanout, gate length, junction resistance. Edges are labeled by mechanism (EM, TDDB, NBTI, HCI, or general) and are either inverse relations (if A increases then B decreases) or direct relations (if A increases then B increases). Legend: tunable at design or runtime; tunable at design; models and technology parameters (not tunable)]
I. Assessing Costs of Variability
“Phantom margins”: (1) BEOL, (2) FF model pessimism
Pessimism in Conventional BEOL Corners
• Conventional BEOL corners (CBC)
• Skew all layers in the same direction to guardband for variability
• Too pessimistic! Impossible to have worst-case on all layers
• Pessimism in CBC creates “false” timing-critical paths
• Fixing “false” paths degrades design quality
• Slow down design turnaround time
[Figure: cross-section of metal layers M1, M2, M3 showing width W2, spacing S2, thicknesses T1-T3, dielectric heights H1-H3, inter-layer dielectric and inter-metal dielectric]
Corner    ∆W       ∆T       ∆H
Typical   typical  typical  typical
Cbest     min      min      max
Cworst    max      max      min
RCbest    max      max      max
RCworst   min      min      min
[ICCD14]
A New Timing Signoff Flow
Conventional signoff
1. Routed design
2. Timing analysis using conventional BEOL corners (CBC)
3. violation = 0? If no, ECO using CBC and return to step 2; if yes, done
Our work
1. Routed design
2. Classify timing critical paths into GTBC and GCBC
3. GTBC: timing analysis using TBC; violation = 0? If no, ECO using TBC and repeat
4. GCBC: timing analysis using CBC; violation = 0? If no, ECO using CBC and repeat
5. Done
Pessimism in Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path p_j is “safe” when the delay evaluated at a given CBC is larger than nominal delay + 3σ_j
d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC}), \quad \Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}
• A small α_j implies there is a large pessimism
[Figure: delay distribution comparing 3σ_j with d_j(Y_CBC) - d_j(Y_typ); the difference is marked “Large pessimism”]
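The pessimism metric on this slide, α_j = 3σ_j / (d_j(Y_CBC) − d_j(Y_typ)), is simple enough to sketch directly. The function name, argument names, and the choice of a single delay unit for all inputs are illustrative assumptions.

```python
def pessimism_ratio(sigma, delay_cbc, delay_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)).

    All arguments are in the same delay unit (e.g., ps). A small return
    value means the corner delay far exceeds the 3-sigma statistical
    variation, i.e., large pessimism.
    """
    delta = delay_cbc - delay_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta

def is_safe(sigma, delay_cbc, delay_typ):
    """The slide's safety assumption: d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return delay_cbc >= delay_typ + 3.0 * sigma
```

For a path with σ = 10, typical delay 1000 and corner delay 1060 (same unit), the ratio is 0.5: the corner guardbands twice the 3σ variation.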
Wiring Structure in Timing-Critical Paths
• Wires on critical paths are routed on many layers
• Similar wiring structure is an outcome of design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: cumulative probability vs. max. wirelength ratio across all layers (%); marked point at 60%, 0.92]
92% of paths have < 60% of wirelength on any single layer
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic! Most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in .itf with α = 0.5
[Figure: scatter plot with axes ∆d_j(Y_rcw)/d_j(Y_typ) × 100% and 3σ_j/d_j(Y_typ) × 100%]
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
Scaling Factor α vs. Delay Variation @ Cw, RCw
• Paths with small Δd_rcw and Δd_cw have large α
• E.g., there are α_j > 0.6 when ((Δd_rcw < 3%) AND (Δd_cw < 3%))
• Identify paths for tightened BEOL corners based on Δd_rcw and Δd_cw
[Figure: α plotted against ∆d(Y_cw)/d(Y_typ) and ∆d(Y_rcw)/d(Y_typ)]
A Practical Filter for TBC-Amenable Paths
Gtbc = paths which can be safely signed off using tightened corners:
(path with ∆d_cw larger than A_cw) OR (path with ∆d_rcw larger than A_rcw)
[Figure: paths plotted by ∆d(Y_cw)/d(Y_typ) and ∆d(Y_rcw)/d(Y_typ), with thresholds A_cw and A_rcw]
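The filter on this slide is a two-threshold test per path. Below is a minimal sketch of that classification; the data layout (a dict mapping a path name to its normalized delay variations at the Cw and RCw corners) and all names are hypothetical.

```python
def tbc_amenable(delta_cw, delta_rcw, a_cw, a_rcw):
    """A path goes to Gtbc when either corner's delay variation exceeds its threshold."""
    return delta_cw > a_cw or delta_rcw > a_rcw

def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) lists of path names.

    `paths` maps name -> (delta_cw, delta_rcw), each normalized to the
    typical-corner delay. Paths that fail the filter stay on conventional corners.
    """
    g_tbc, g_cbc = [], []
    for name, (delta_cw, delta_rcw) in paths.items():
        if tbc_amenable(delta_cw, delta_rcw, a_cw, a_rcw):
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc
```

Raising the thresholds moves more paths back to conventional corners, which is the safe direction when the 3σ estimate for a path is uncertain.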
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• #Timing violations reduced by 24% to 100% [Moore’s Law: 1% / week!]
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. #paths which use TBC
[Figure: three bar charts of WNS (ns), TNS (ns), and #timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
Flexible FF Timing Margin Recovery
• Setup time, hold time and clock-to-q (c2q) delay of FF ⇒ NOT fixed values
• Flexible FF timing model considering operating (function/test) modes ⇒ Reduce pessimism in timing analysis ⇒ Reassessment of costs of variation
• Sequential LP: setup-c2q optimization + hold-c2q optimization
• Objective: Find the best setup/hold time/c2q for each FF
[Figure: c2q-setup-hold surface; setup-hold-c2q fixed model (single setup, hold, c2q values) vs. setup-hold-c2q flexible model (setup, hold, c2q1 … c2qn)]
[ISQED14]
Improved Timing Signoff Flow
[Figure: flow. Netlist (and SPEF, if routed); extract path timing information; LP formulation with flexible flip-flop timing model; solve sequential LP (STA_FTmax, STA_FTmin); solution; annotate new timing model for each flip-flop; timing signoff with annotated timing]
Takeaways
• Fix timing violations “for free”
• 48ps average improvement of slack over 5 designs in a foundry 65nm technology
Next steps
• Study in advanced nodes
• Better exploitation of disjoint cycles/modes
• More accurate modeling of setup-hold-c2q tradeoff
• Circuit optimization exploiting FF timing model flexibility
Takeaways on Variability
• Phantom margins leave O(node) value on the table ⇒ recovering this is essential “equivalent scaling”
• Two examples: BEOL corners, FF timing model
• NOTE: To assess costs/benefits of new methods, need correct starting point!
• Conventional BEOL corners are VERY pessimistic!
• Bottleneck for wire-dominated, high-performance circuits
• Revised signoff flow + tightened BEOL corners reduces WNS, TNS and #timing violations
• Signoff methodology change under way at sponsor company
• Relaxed timing closure ⇒ shortened design cycle, better PPA
II. Assessing Costs of Reliability
(1) cost of suboptimal AVS-BTI-EM signoff;
(2) cost of naïve EM rule enforcement;
(3) available lifetime throughput and performance benefit from scheduling of multi-cores
Reliability Margin vs. Adaptive Voltage Scaling
• Interaction between reliability margins and AVS mechanism
• BTI aging ⇒ higher |ΔVth| ⇒ lower fmax ⇒ AVS used to compensate performance degradation
• Higher voltage worsens EM on wires
[Figure: circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target frequency]
[Figure: signoff loop of BTI + EM. Labels: design implementation; derated libraries (Vlib, VBTI); VDD (AVS); stress on wires; BTI loop; EM loop]
[DATE13,SLIP14]
Derated Library Characterization and AVS (BTI Loop)
• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: VBTI ⇒ |Vt|. Step 2: Vlib ⇒ derated library. Step 3: circuit implementation and signoff ⇒ circuit ⇒ BTI degradation and AVS ⇒ Vfinal?]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
• What is the impact of EM for different signoff corners?
Energy vs. Area Across Different Signoffs
“Knee” point for area vs. lifetime energy
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) vs. implementation # (1-8); annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as [formula not recoverable from the transcript]
Power Penalty of Fixing EM with AVS
[Figure: core power (mW) and P/G power (mW) vs. implementation # (1-8); annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases with elevated voltage
• P/G power increases due to both elevated voltage, PDN degradation
• Tradeoff with guardband investment at design signoff
EM Impact on AVS Scheduling
• AVS affects EM lifetime penalty
• We empirically sweep AVS voltage step size to obtain the impact
• 5 step sizes: S1 – S5 = {8, 10, 15, 18, 20} mV
[Figure: VDD vs. year for step sizes S1-S5; MTTF (year) for S1-S5 on DMA, #3; annotation: 1.2 years MTTF penalty]
Smarter NDRs in CTS (EM Cost Reduction)
• NDRs apply wider wire widths (= costs of EM) and spacing to address EM and parasitic and delay variation for clock tree
• However, a wire does NOT need to be wide if it has a small number of downstream sinks
• Accurate assessment of EM margin should include clock tree topologies (e.g., #downstream sinks)
• Fewer downstream sinks (== less current) at leaf-side in a clock tree
[Figure: clock tree from driver to sinks, with segments driving 4 buffers, 2 buffers, and 1 buffer]
[DAC13]
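The idea that wire width should follow downstream sink count can be sketched as a per-segment width choice. This is not the [DAC13] algorithm: it assumes, purely for illustration, that segment current is proportional to the number of downstream sinks and that the EM current limit grows linearly with wire width. All names and parameter values are hypothetical.

```python
import math

def min_width_multiple(n_sinks, current_per_sink_ma, limit_per_min_width_ma):
    """Smallest integer multiple of minimum wire width that meets the EM limit.

    Assumes current = n_sinks * current_per_sink_ma, and that a wire of
    M * min width may carry M * limit_per_min_width_ma.
    """
    if n_sinks < 0:
        raise ValueError("n_sinks must be non-negative")
    current = n_sinks * current_per_sink_ma
    return max(1, math.ceil(current / limit_per_min_width_ma))

def taper_widths(sink_counts, current_per_sink_ma, limit_per_min_width_ma):
    """Width multiple for each segment, listed from the root toward the leaves."""
    return [
        min_width_multiple(n, current_per_sink_ma, limit_per_min_width_ma)
        for n in sink_counts
    ]
```

With 0.1 mA per sink and a 0.5 mA limit per minimum width, segments with 16, 8, 4, 2, and 1 downstream sinks need progressively narrower wires, whereas a fixed NDR would pay the root-segment width everywhere.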
Vicious Cycle vs. Virtuous Cycle
• Excessive margin ⇒ Not just “design overhead”
• Vicious cycle vs. virtuous cycle
• Fixed NDR (wider wires): larger cap., more/larger buffers, more power, more EM viol.
• Smart NDR (tapering): smaller cap., fewer/smaller buffers, less power, fewer EM viol.
• Less-naïve compliance with EM rules ⇒ reduce design overhead, and avoid vicious cycle
[Figure: clock tree from driver to sinks with segments labeled # downstream = 16 and # downstream = 2]
Smart Routing NDRs: Clock Power Reduction
• 9.2% wire capacitance, 4.9% clock switching power reduction
• Still, satisfy skew, max transition limits and EM limit
• NDR {M}W{N}S = M × minimum width, N × minimum spacing
[Figure: capacitance and clock power reduction (%), wire cap. and clock switching power, for default 4W5S and for default 2W4S]
[Figure: proportions of NDRs (1W8S, 2W7S, 3W6S, 4W5S), 0% to 100%, for designs aes, eth, jpeg_enc, mc, mpeg2, tv80s, usbf, conmax, dma]
Reliability-Constrained foverdrive Selection
• Reliability and system lifetime guarantees are key design considerations for multicore processors in advanced nodes
• Task scheduling determines use of cores across operating modes
• Overdrive (turbo) mode can meet performance and throughput requirements, but incurs faster MTTF degradation
• Two potential failures: throughput and performance
• Can violate “acceptable throughput” for tasks: cores fail before all assigned tasks are completed
• Can violate minimum “acceptable performance” for tasks: cores operate only at lower frequencies than needed
• “EMOD”: solves a new Maximum-Value Reliability-Constrained Overdrive Frequencies (MVRCOF) optimization (offline) problem
• When all cores not simultaneously active, adjust task scheduling on a subset of active cores for balanced wearout
• Guarantee prescribed levels of performance and lifetime throughput
• Overdrive frequencies = optimization variables; user experience = objective
[ISQED14]
Comparison vs. Previous Works
Work          Type
Reiss12       NRC, NLG, NPG
Karpuzcu09    RC, NLG, NPG
Mihic04       RC, LG (Dynamic power management), NPG
Rosing07      RC, LG (Dynamic power management), NPG
Rong06        RC, LG (Dynamic power management), NPG
Coskun09      RC, LG (Dynamic thermal management), NPG
Srinivasan04  RC, LG (Dynamic reliability management), NPG
Karl08        RC, LG (Dynamic reliability management), NPG
Our Work      RC, LG (Dynamic reliability management), PG
(N)RC = (Non-) Reliability Constrained
(N)LG = (No) Lifetime Guarantee
(N)PG = (No) Performance Guarantee
Optimal (Discretized) Solution Flow
• For each core
• For each combination in which the core is active
• Choose discrete values of overdrive frequencies within a range
• Perform power and temperature simulations ⇒ one-time LUT creation
• Example:
• If a system has 3 cores (Core A, B, C), the number of active cores can be 1, 2 or 3
• Core A is active
• One (out of three) combinations when 1; two (out of three) combinations when 2; one (out of one) combination when 3
• Use exhaustive search based on LUT to find optimal overdrive frequencies that maximize the value of the objective function
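The combination counts in the 3-core example, and the shape of an exhaustive search over a discretized frequency grid, can be sketched as follows. The `objective` and `feasible` callbacks stand in for the LUT-based power, temperature, and reliability checks; they and all other names are hypothetical placeholders, not the talk's formulation.

```python
from itertools import combinations, product

def combos_with_core(cores, core, k):
    """All k-core active sets drawn from `cores` that include `core`."""
    return [c for c in combinations(cores, k) if core in c]

def exhaustive_search(freq_grid, n_cores, objective, feasible):
    """Try every assignment of grid frequencies to cores; keep the best feasible one.

    `objective(freqs)` returns a value to maximize; `feasible(freqs)` returns
    True when the (hypothetical) reliability constraints are met.
    Returns (best_assignment, best_value), or (None, -inf) if nothing is feasible.
    """
    best, best_value = None, float("-inf")
    for freqs in product(freq_grid, repeat=n_cores):
        if not feasible(freqs):
            continue
        value = objective(freqs)
        if value > best_value:
            best, best_value = freqs, value
    return best, best_value
```

For cores A, B, C, `combos_with_core` reproduces the slide's counts for Core A: one set of size 1, two of size 2, one of size 3. The search cost grows as (grid size)^(#cores), which is why a heuristic is attractive for larger systems.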
Heuristic Flow
• We maximize the overdrive frequency (f_OD,m) in the order of the set of active cores for which the product of weights (w_nom,m, w_OD,m) and execution times (E_nom,m, E_OD,m) is maximum
• Example:
• If a system has 3 cores, the number of active cores can be 1, 2 or 3
• If [ordering of the weight × execution-time products; symbols lost in extraction], we maximize the overdrive frequencies in that order
• Empirically, finds large improvements in objective function value
[Objective function expression; symbols lost in extraction]
Testcases
• Testcases are described by
• m: #active cores
• E_nom,m, E_OD,m: nominal and overdrive execution times
• w_nom,m, w_OD,m: nominal and overdrive user-defined weights
• Eight testcases in total
• Format is #cores-Testcase#
• Seven have optimal solutions
• One does not have feasible solution
• Example
Name  m  Enom,m (Kh)  EOD,m (Kh)  wnom,m  wOD,m
4-I   1  1            3           0.5     0.5
      2  2            5           0.3     0.7
      3  3            8           0.2     0.8
      4  2            5           0.4     0.6
Optimal, Heuristic vs. RC-LG (Baseline)
[Figure: objective function value for testcases 4-I, 4-II, 4-III, 4-IV, 4-V, 6-I, 8-I under optimal, heuristic, and baseline; annotations: -3.3%, -17.4%, -12%, -9%]
Optimal solution improves objective function value by up to 17.4%
Takeaways on Reliability
• Signoff methodology can have huge impact
• Example: Chicken-and-egg loops among BTI, EM, and signoff corner selection in AVS-enabled systems
• AVS = new dimension in reliability vs. design cost (power/area) tradeoff space
• Naïve enforcement of reliability rules can be costly
• Post-IC implementation, reliability awareness at scheduler-level improves lifetime “user experience” and guaranteed performance
• Basic challenges remain:
• (i) reliability modeling and calibration
• (ii) measuring reliability cost ({PPA}) with, without reliability margins
• (iii) many “don’t turn over rocks” barriers to reliability cost reduction
III. Assessing Costs of Resilience
(1) “MinRazor”
How to Minimize Cost of Resilience?
• Additional circuits ⇒ area and power penalties
• Recovery from errors ⇒ throughput degradation
• Large hold margin ⇒ short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
• “MinRazor”: Minimum-Cost Resilient Design Implementation
                  Razor         Razor-Lite   TIMBER
Power penalty     30% [Das08]   ~0% [Kim13]  100% [Choudhury09]
Area penalty      182% [Kim13]  33% [Kim13]  255% [Chen13]
#recovery cycles  5 [Wan09]     11 [Kim13]   0 [Choudhury09]
[GLSVLSI14]
Tradeoff: Resilience Cost vs. Datapath Cost
• Tradeoff: #Razor FFs (resilience cost) vs. power/area of fanin circuits
[Figure: energy (mJ) vs. #Razor FFs (300, 100, 50, 0), showing total energy, energy of non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints ⇒ allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary #endpoints ⇒ compare area/power penalty
Candidate Sensitivity Functions
[Five candidate sensitivity functions, numbered 1-5; expressions lost in extraction]
Notation: p = negative slack endpoint; c = cells within fanin cone; Numcri = number of negative slack cells
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight ⇒ adjust clock latencies ⇒ contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
• Edge weight W_pq is computed from the setup slack of path p-q, the toggle rate of path p-q, and a weighting factor β [expression lost in extraction]
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after adjustment every edge on the cycle has weight W’ = average weight on cycle]
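Step 2 of SkewOpt can be illustrated on a toy sequential graph. The sketch below enumerates simple cycles by brute force, picks the one with minimum total weight, and reports the average weight W’ that each edge would have after balancing. It is only workable for very small graphs, the edge weights are made-up numbers, and the latency adjustment and cycle contraction steps are not modeled.

```python
def simple_cycles(graph):
    """Yield each simple cycle once, as a vertex list starting at its smallest vertex.

    `graph` maps vertex -> {successor: edge_weight}.
    """
    def dfs(start, node, path, seen):
        for nxt in graph.get(node, {}):
            if nxt == start:
                yield list(path)
            elif nxt > start and nxt not in seen:
                seen.add(nxt)
                path.append(nxt)
                yield from dfs(start, nxt, path, seen)
                path.pop()
                seen.remove(nxt)

    for start in sorted(graph):
        yield from dfs(start, start, [start], {start})

def cycle_weight(graph, cycle):
    """Total weight of the edges along a cycle given as a vertex list."""
    return sum(graph[u][v] for u, v in zip(cycle, cycle[1:] + cycle[:1]))

def min_weight_cycle(graph):
    """Return (cycle, average_edge_weight) for the minimum-total-weight cycle.

    Returns (None, None) if the graph has no cycle.
    """
    best = min(simple_cycles(graph), key=lambda c: cycle_weight(graph, c), default=None)
    if best is None:
        return None, None
    return best, cycle_weight(graph, best) / len(best)
```

A production flow would use a polynomial-time cycle algorithm rather than enumeration; the brute-force version is kept here only because it is easy to verify by hand.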
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flowchart. Initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; OR-tree insertion; SkewOpt: activity aware clock skew optimization; check “Energy < min energy?” and save current solution]
Benefit of Low-Cost Resilience
• Proposed method (CO) minimizes cost of resilience in terms of energy
• Reference flows
• Pure-margin (PM): conventional method w/ only margin insertion ⇒ cost of pure margin insertion = up to 21% energy overhead
• Brute-force (BF): use error-tolerant FFs for timing-critical endpoints ⇒ cost of resilience w/ poor design method = up to 10% energy overhead
• Cost increases with larger process variation
• Small/medium/large margin = 1σ/2σ/3σ for SS corner
• Technology: foundry 28nm
[Figure: energy (mJ) of PM, BF, CO at small, medium, and large margin for designs MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs ⇒ proposed optimization leads to further reduced resilience cost in the context of AVS strategy
• Technology: foundry 28nm
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
Optimization of TIMBER-Based Designs
• TIMBER FFs use time borrowing to mask timing errors
• Additional constraints to select endpoints as TIMBER FFs
(1) No loop of TIMBER FFs
(2) No chained TIMBER FFs with more than two stages (assume two error-detection intervals)
• Require additional timing slacks on fanout paths to mitigate timing errors
• As compared to the solution of the proposed flow (CO): cost of pure margin insertion = 23% energy overhead; cost of resilience w/ poor design method = 7% energy overhead
• Design: ARM M0; technology: foundry 40nm; ED interval = 10% of clock period
[Figure: energy (mJ) of PM, BF, CO, split into energy w/o resilience and energy penalty of additional circuits]
Recent: Iterative Opt for Conventional Designs
• Cost of resilience = area/power overheads, design difficulties … ⇒ Can we achieve similar benefits without resilient circuits, but following the same spirit of optimization for resilient designs?
• Optimization flow
I. Relax timing constraints on all paths to be original clock period + relaxed margin
II. Calculate sensitivity function of each endpoint with respect to original clock period (SF = sum of |slack * power| of negative-slack cells in the fanin cone)
III. Based on SF (sorted in increasing order), select top 10% endpoints to recover to original clock period (i.e., perform timing optimization with updated SDC file)
IV. Iterate Steps II and III 10 times
• Design: ARM M0 at foundry 40nm (clock period = 6ns, relaxed margin = 300ps)
• Optimization shows 16% power reduction
• All power values are reported at clock period = 6ns
Iteration  Power (mW)  Endpoints w/ violation  Area (um^2)
1          2.145       452                     131340
2          2.669       339                     131089
3          2.703       264                     130563
4          2.371       215                     130486
5          2.329       139                     130880
6          2.373       89                      131415
7          2.446       31                      130712
8          2.452       0                       131011
PM         2.934       0                       131319
[Figure: total power (mW) vs. #endpoints with timing violation for Opt and PM]
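Steps II and III of the iterative flow can be sketched as below. The data layout (each endpoint mapped to a list of (slack, power) pairs for the cells in its fanin cone) and the function names are hypothetical; the timing optimization itself, which a P&R tool would perform with the updated SDC file, is not modeled.

```python
import math

def sensitivity(cells):
    """SF = sum of |slack * power| over negative-slack cells in a fanin cone.

    `cells` is a list of (slack, power) pairs; cells with slack >= 0 are ignored.
    """
    return sum(abs(slack * power) for slack, power in cells if slack < 0)

def select_endpoints(fanin_cones, fraction=0.10):
    """Endpoints to recover to the original clock period in this iteration.

    Sorts endpoints by SF in increasing order, as on the slide, and returns
    the first `fraction` of them (at least one).
    """
    ranked = sorted(fanin_cones, key=lambda ep: sensitivity(fanin_cones[ep]))
    count = max(1, math.ceil(fraction * len(ranked)))
    return ranked[:count]
```

As a cross-check of the reported figure, the table's final iteration gives (2.934 − 2.452) / 2.934 ≈ 16.4% lower power than PM, consistent with the stated 16% reduction.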
A Different Slack Distribution
• Design optimized with the new iterative optimization flow has more balanced slack distribution
• More timing paths with small slacks ⇒ exploit additional timing slacks for power reduction
• Reopens the question: How to best trade timing slacks for power reduction in IC implementation / performance closure
[Figure: slack distributions of the optimized design and the PM design]
Takeaways on Resilience
• “Cost of resilience” strongly depends on ability to mix resilient and non-resilient circuits
• Up to 21% and 10% energy overheads respectively for cost of margin insertion and resilience (with poor design method)
• Careful reduction of resilience cost can improve resilient design value proposition in the AVS context
• Yet again: hard to obtain correct starting point for benefit/cost assessment!
• Basic challenges remain:
• (i) measuring cost of resilience at software level
• (ii) unpredictable dependencies on design, implementation and operating scenarios
• (iii) missing formulations of resilience as “optimizable objectives” for design tools
In Closing …
• ‘Ground-up’ (crawl before walk, walk before run) approach is still stuck at ground level (in my group)
• Basics of techno models, reliability models, margins, signoff criteria, implementation flows, design testcases, workload models, narrow windows of opportunity, …, optimization problem statements, … still way too fuzzy for our tastes (!)
• How should we assess the cost of {reliability, resilience}?
• Is it even possible in a general, non-artifactual way?
• Can we taxonomize and avoid pitfalls seen in previous works?
• Targets for next / new research?
• Missing theorems? Missing links? Missing infrastructure? Missing models and data? Missing problem statements?
= to be identified during this colloquium !?!?
THANK YOU !