TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing...

17
Lecture 13 Advanced Technologies on Digital Circuits Performance Metrics for Digital Circuits Impact of Advanced Technologies Reading: multiple research articles (reference list at the end of this lecture) B. Nikolic, EE 241 course materials, UC Berkeley 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 1.E+08 1.E+09 1.E+10 1970 1980 1990 2000 2010 1.E+03 1.E+04 1.E+05 1.E+06 1.E+07 1.E+08 1.E+09 1.E+10 4004 8080 286 486 TM Pentium® II CPU Pentium® 4 CPU Itanium® 2 CPU 2x increase every 2 years Penryn QC Single Core Dual Core Quad Core 6 Core 8 Core # of Transistors per Die

Transcript of TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing...

Page 1: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Lecture 13• Advanced Technologies on Digital Circuits

– Performance Metrics for Digital Circuits– Impact of Advanced Technologies

Reading: • multiple research articles (reference list at the end of this

lecture)• B. Nikolic, EE 241 course materials, UC Berkeley

1.E+03

1.E+04

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

1970 1980 1990 2000 20101.E+03

1.E+04

1.E+05

1.E+06

1.E+07

1.E+08

1.E+09

1.E+10

4004

8080

286

486TM

Pentium® II CPU

Pentium® 4 CPU

Itanium® 2 CPU

2x increase every

2 years

PenrynQC

Single CoreDual Core

Quad Core

6 Core8 Core#

of T

rans

isto

rs p

er D

ie

Page 2: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Complementary Logic Gates

11/17/2013 2Nuo Xu EE 290D, Fall 2013

• τd is reduced by increasing IEFF and reducing load capacitance C.

EFF

DDpLHpHLd I

CVtt22

1Intrinsic Delay Load capacitance includes:

• Cdiff• (effective) fanout cap.s CGG Cfringe

• Cwire

Other Combinational Logic GatesInverter Chains

Page 3: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Energy & Delay of Digital Circuits

11/17/2013 3Nuo Xu EE 290D, Fall 2013

• tdelay = LD · F · CVDD/(2IEFF)• Edyn + Eleak = α · LD · F · CVDD

2 + LD · F · IOFFVDDtdelay

• = α · LD · F · CVDD2 [ 1 + (LD·F /2α) / (IEFF/IOFF) ]

• LD : Logic Depth• F : effective Fanout

per stage• α : Activity Factor

approximated model for the data path:(this may not be the latency-optimized design,

but the energy-optimized one.)

Latch Latch

Courtesy of E. Alon (UC Berkeley)

Page 4: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Activity Factor Calculation

11/17/2013 4Nuo Xu EE 290D, Fall 2013

Energy Delivery Path during 0→1 at Output:

Energy consumed during N cycles:

Example: A 2-input NOR Gate Truth Table

Courtesy of B. Nikolic (UC Berkeley)

Page 5: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Short-Circuit Energy

11/17/2013 5Nuo Xu EE 290D, Fall 2013

0.10 0.15 0.20 0.25 0.30 0.35 0.4010-17

10-16

10-15

Ener

gy p

er O

pera

tion

(J)

Threshold Voltage (V)

Esc Etot

22 nm TriGate TransistorsVDD = 0.7 V

GND

VDDS

S

DDVIN VOUT

CMOS inverter: Impact of VTH on CMOS

short-circuit energy

• Short circuit energy can be mitigated by adjusting |VT,P| + |VT,N| > VDDH.J.M. Veendrick, JSSC (1984)

Page 6: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Energy vs. Delay Plots

11/17/2013 6Nuo Xu EE 290D, Fall 2013

L. Wei, TED (2011)

• tdelay = LD · F · CVDD/(2IEFF)

• Edyn + Eleak = α · LD · F · CVDD2 +

LD · F · IOFFVDDtdelay

• Obviously, there exists the inherent trade-off between energy and delay (1 / performance) in logic circuits:

• Optimizations needed for: VDD ION, IOFF or ION/IOFF ? Transistor sizing Logic structure (LD, α, F) System architecture

Energy

Delay

Page 7: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Optimal VDD and ION/IOFFTotal energy of a logic chain:

The Edyn/Eleak under optimum ION/IOFF:

H. Kam, TED (2012)(for a MOSFET)

• Pick VDD, VTH to minimize energy for given performance (1/delay)

– Assuming work function (VTH) can be freely tuned

• Result: optimal ION/IOFF LD · F / α

11/17/2013 7Nuo Xu EE 290D, Fall 2013

Derive E vs. VDD, at a fixed tdelay

Derive E vs. VDD, to find the minimum E:

2 ∙ ∙ ∙ 2 ∙ ∙∙ ∙

⁄ 0

→ , m ∙ 24∙

• Note that the optimal VDD occurs at MOSFET sub-threshold region.

B.H. Calhoun, JSSC (2005)

Page 8: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Application Oriented Energy Optimization

11/17/2013 Nuo Xu EE 290D, Fall 2013

H. Fuketa, IEDM (2012)

Page 9: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Energy vs. Delay Optimizations: To Probe Further

11/17/2013 9Nuo Xu EE 290D, Fall 2013

• Downsizing transistors ( CL)• Reducing clock frequency ( fclk) lowers the performance

• Reducing activity factor requires logic restructuring

• Increasing threshold ( VTH)• Lowering the supply voltage ( VDD) Lowers the performance

• Improving electrostatics + mobility• Reducing parasites

Designs Technology

L. Wei, TED (2011)

Pareto Curve of minimum E vs. D

Page 10: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Power Gating Techniques

11/17/2013 10Nuo Xu EE 290D, Fall 2013

B. Calhoun, JSSC (2004)

Dynamic VTH Tuning Sleep Transistors

Dual VDD

Y. Shimazaki, JSSC (2004)

Page 11: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Micro-Architectural Design

11/17/2013 11Nuo Xu EE 290D, Fall 2013

Perf. fclk, E/op ~ 0.5 unit

AB

Parallelism

A

B

A

B

Reference

Pipeline

Perf. fclk, E/op ~ 0.5 unit

Perf. fclk, E/op ~ 1 unit

• Parallelism allow transistors operate at slower speed while maintains the same throughput per frequency.

Energy

Perf.

High VDDLow VDD

w/ parallelism

Page 12: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Digital Circuit Design Trade-offs

11/17/2013 12Nuo Xu EE 290D, Fall 2013

S. Borkar, VLSI-T short course (2007)

Page 13: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Impact of Technology Scaling

11/17/2013 13Nuo Xu EE 290D, Fall 2013

Source: Intel

In a power-limited technology:• Parasites reduction seems

the most important among all solutions (i.e. SS↓, µ↑)

• VDD scaling will be helpful only for devices with good electrostatics.

A. Khakifirooz, IEDM (2008)

Page 14: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Impact of Variability

11/17/2013 14Nuo Xu EE 290D, Fall 2013

• Device variability hurts in two ways1. Reduces effective IEFF (delay set by worst-case)2. Increase effective IOFF (leaky devices dominate) Forces increase in nominal ION/IOFF and VDDmin …

200p 400p 600p 800p 1n

1

2

3

w/ RW

HP Devices

Ener

gy p

er S

witc

h (1

0-16 J

)

Delay (Sec)

N-MOSFET

W/Leff=50/28nm

= 0.01 F = 4 LD = 30

LP DevicesSimu.

UniformDoping

Energy vs. Delay plots under RDF Logic VDDmin vs. logic depth

H. Fuketa, IEDM (2012)

Page 15: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Advancement ofThin-Body MOSFETs

11/17/2013 15Nuo Xu EE 290D, Fall 2013

K. Kuhn, IEDM (2012)

N. Planes, VLSI (2012)

• Benefits of thin-body MOSFETs: Improved ION/IOFF from

higher mobility and better electrostatics

Improved variability• However, parasites

reduction are still challenging. SOI substrate will be

extremely helpful in this regime

UTBB FD-SOI @ 28 nm

TriGate @ 22 nm

Page 16: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

Bulk vs. SOI FinFETs: Inverter Benchmark

11/17/2013 16Nuo Xu EE 290D, Fall 2013

T. Chiarella, SSE (2010)

• SOI FinFET fits better for low-power corner while bulk FinFET for the high-performance one.

• A balanced P/N design will benefit for bulk FinFET further.

Page 17: TM - University of California, Berkeleyee290d/fa13/LectureNotes/Lecture13.… · • Downsizing transistors ... Y. Shimazaki, JSSC (2004) Micro-Architectural Design 11/17/2013 Nuo

References1. H.J.M. Veendrick, “Short-Circuit Dissipation of Static CMOS Circuitry and Its Impact on the Design

of Buffer Circuits,” IEEE Journal of Solid-State Circuits, vol.19, no.4, pp. 468-473, 1984.2. L. Wei et al., “Technology Assessment Methodology for Complementary Logic Applications Based

on Energy-Delay Optimization,” IEEE Trans. Electron Devices, vol.58, no.8, pp.2430-2439, 2011.3. H. Kam et al., “Design Requirements for Steeply Switching Logic Devices,” IEEE Trans. Electron

Devices, vol.59, no.2, pp.326-334, 2012.4. B.H. Calhoun et al., “Modeling and Sizing for Minimum Energy Operation in Subthreshold

Circuits,” IEEE Journal of Solid-State Circuits, vol.40, no.9, pp.1778-1786, 2005.5. H. Fuketa et al., “Device-Circuit Interactions in Extremely Low Voltage CMOS Designs,” IEEE

IEDM Tech. Dig., pp. 559-562, 2011.6. B.H. Calhoun et al., “A Leakage Reduction Methodology for Distributed MTCMOS,” ,” IEEE

Journal of Solid-State Circuits, vol.39, no.5, pp. 818-826, 2004.7. Y. Shimazaki et al., “A Shared-Well Dual-Supply-Voltage 64-bit ALU,” IEEE Journal of Solid-State

Circuits, vol.39, no.3, pp. 494-500, 2004.8. S. Borkar, “Power/Performance Management & Challenges,” VLSI Tech. Symp., Short Course,

2007.9. A. Khakifirooz et al., “The Future of High-Performance CMOS: Trends and Requirements,” IEEE

IEDM Tech. Dig., pp. 30-33, 2008.10. N. Planes et al., “28nm FDSOI Technology Platform for High-Speed Low-Voltage Digital

Applications,” Symp. VLSI Tech. Dig., pp. 133-134, 2012.11. K. Kuhn et al., “The Ultimate CMOS Device and Beyond,” IEEE IEDM Tech. Dig., pp. 171-174,

2012.12. T. Chiarella et al., “Benchmarking SOI and Bulk FinFET Alternatives for Planar CMOS Scaling

Succession,” Solid-State Electronics, vol.54, pp.855-860, 2010.