Device-Circuit Co-design approaches for Multi-gate FET Technologies

DEVICE-CIRCUIT CO-DESIGN APPROACHES

FOR MULTI-GATE FET TECHNOLOGIES

AJAY NUGGEHALLI BHOJ

A DISSERTATION

PRESENTED TO THE FACULTY

OF PRINCETON UNIVERSITY

IN CANDIDACY FOR THE DEGREE

OF DOCTOR OF PHILOSOPHY

RECOMMENDED FOR ACCEPTANCE

BY THE DEPARTMENT OF

ELECTRICAL ENGINEERING

ADVISER: PROFESSOR NIRAJ K. JHA

APRIL 2013

© Copyright by Ajay Nuggehalli Bhoj, 2013.

All rights reserved.

Abstract

Planar CMOS technology has reached its scaling limits at the 22nm node, where it is increasingly difficult to design high-performance low-power devices with good yield in the presence of global and local process variations. Multi-gate FET technology is the best alternative that can extend scaling to the sub-10nm technology nodes with minimum additional processing costs. However, owing to the non-planar nature of multi-gate devices, several challenges in process technology, CAD/layout design, and testing need to be addressed to enable design portability from planar to multi-gate FET chips.

This thesis strives to bridge the device-circuit co-design gap that has severely limited predictive modeling of circuits using emerging multi-gate/FinFET devices during early stages of process technology development. First, the traditional notion of leveraging independent-gate devices for power reduction is challenged by contrasting logic gates having symmetric gate-workfunction shorted/independent-gate FinFETs with logic gates having asymmetric gate-workfunction shorted-gate FinFETs, in a high-performance process. The superiority of asymmetric gate-workfunction devices is demonstrated by comparing leakage-delay trends, and the downsides of logic gates employing a mix of shorted and independent-gate devices are brought out from a testing/fault modeling perspective.

Next, efficient methodologies are developed for unifying the layout and process simulation worlds, in order to breach the ‘many-device TCAD barrier’ that has limited the applicability of 3D-TCAD modeling for over a decade. Here, important bottlenecks for layout to 3D circuit structure generation, such as the time and memory complexity of 3D process simulation, are identified. To bypass these bottlenecks, a radically new layout-/process-/device-independent approach based on automated structure synthesis is proposed and evaluated for accuracy and scalability, using SRAM bitcell structures with 32/22nm process assumptions.

With the 3D-TCAD structure generation issue addressed, several hitherto intractable problems, such as true 3D parasitic capacitance extraction for generic multi-gate circuit layouts in sub-32nm technology nodes, enter the realm of possibility. Here, the need for transport analysis based capacitance extraction is explained by highlighting the difference between field solver based extractions and TCAD based extractions on sub-32nm IBM SOI SRAMs. Thereafter, the combination of structure synthesis and transport analysis based extraction is validated with hardware data from two companion 6T SRAM arrays fabricated in an IBM 32nm SOI HKMG process. Next, a multi-gate version of the structure synthesizer is used to predict and analyze key parasitic capacitance trends in 6T multi-gate SRAMs at the 22/14/10nm technology nodes.

Finally, this thesis delineates a path to enable multi-gate layout/process/circuit co-design, using a unified 3D/mixed-mode 2D-TCAD methodology for systematically designing and evaluating different 6T FinFET SRAM bitcell topologies in a 22nm SOI process. Here, the role of parasitic capacitances, in particular their dependence on fin/gate pitch, is examined in detail, and the need to evaluate multi-gate bitcells based on dynamic behavior, rather than DC metric targets, is highlighted.

Acknowledgments

Firstly, I’d like to express my heartfelt gratitude to my research adviser Prof. Niraj Jha. Over the years, he has been a tremendous source of inspiration and support. I thank him for his timely guidance and insights that have played a key role in shaping this thesis. I am also deeply indebted to him for being very patient with me over several paper iterations, and for the encouragement to expand my horizons into new areas. I have had so much to learn from his impeccable writing skills, mental discipline, as well as his balanced approach to professional life.

I have also been very lucky to have Dr. Rajiv Joshi from the IBM T.J. Watson Research Center as my internship mentor and research collaborator during my stints at IBM Research, Yorktown Heights, NY and IBM SRDC, Bengaluru. Dr. Joshi was instrumental in providing key ideas and material support in the projects that I worked on, and his astute focus and perspectives on emerging technologies have proved invaluable in the course of my research at Princeton. I’d also like to acknowledge several IBMers who have helped shape my work through interesting conversations and valuable inputs/feedback: Koushik Das, Sudhir Gowda, Jeff Burns, Jeff Johnson, Steve Furkay, Abe Elfadel, David Katcoff, Chung-Hsun Lin, Dieter Wendel, Keunwoo Kim, Matt Ziegler, Phil Oldiges, Pong-Fei Lu, Bob Wong, Ruchir Puri, Werner Rausch, Aditya Bansal, and Yue Tan. Many thanks to Murali Kota, Samarth Agarwal, Mohit Bajaj, Rajan Pandey, Ninad Sathaye, Sreekumar Kuriyedath, and Arvind Ajoy, for guiding me during my internship at IBM SRDC, Bengaluru.

I am very grateful to Prof. Naveen Verma and Dr. Koushik Das for taking time to read this thesis in detail, and for providing pointers to help improve it. I’d also like to thank Prof. Verma and Prof. Jha for the opportunity to work on an SRAM chip tape-out early on that helped me appreciate the challenges faced by circuit designers, as well as their support for timely CAD tool maintenance.

Owing to the cross-disciplinary nature of my work, I owe a lot to the graduate-level courses taught by Prof. Sharad Malik, Prof. Niraj Jha, Prof. Li-Shuan Peh, Prof. Naveen Verma, Prof. James Sturm, Prof. Claire Gmachl, Prof. Steve Chou, and Prof. Mansour Shayegan. Many thanks to Wali Akande, Wenzhe Cao, Tracy Tsai, Jiun-Yun Li, Qiang Liu, Yenting Chiu, and Yixing Liang, for helping me out in courses in the early days.

The bulk of the work in this thesis was enabled by Princeton’s Terascale Infrastructure for Groundbreaking Research in Science and Engineering (TIGRESS) HPC clusters. I am indebted to Dennis McRitchie, Bill Wichser, Bob Knight, and Curtis Hillegas for the critical software and CAD installation support that they provided during the course of my research. I also thank all the EE department staff, Sarah McGovern, Lori Bailey, Stacy Weber, and Roelie Abdi, for helping me out on innumerable occasions.

I am very thankful to the past and present members of my research group who have made my stay memorable: Wei Zhang, Amit Kumar, Prateek Mishra, Niket Agarwal, Muzaffer Simsir, Chun-Yi Lee, Chunxiao Li, Jun-Wei Chuah, Ting-Jung Lin, Mohammed Shoaib, Meng Zhang, Sourindra Chaudhuri, Aoxiang Tang, Chia-Chun Lin, Yang Yang, Xianmin Chen, and Amlan Chakrabarti, for many entertaining conversations. Special thanks to Mohammed Shoaib, Tushar Krishna, Prakash Prabhu, Arnab Sinha, Easwaran Raman, Aravindan Vijayaraghavan, Aditya Bhaskara, Anirudh Badam, Arun Raman, Divjyot Sethi, and Rajsekar Manokaran for wonderful times at Princeton.

Finally, I am deeply grateful to my parents and my brother for their unwavering love and encouragement throughout the many years of my education. I’d also like to acknowledge the antharyamin for expressing itself at the toughest of times and helping me stay focussed on my graduate work.

To my parents.

Contents

Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii

Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v

List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii

List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

1 Introduction 1

1.1 The move to multi-gate transistors . . . . . . . . . . . . . . . . . . . . 1

1.2 Dissertation contributions . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.1 Design and test of FinFET logic circuits . . . . . . . . . . . . . 6

1.2.2 Efficient algorithms for 3D-TCAD modeling of emerging devices

and circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2.3 Transport analysis based 3D-TCAD parasitic capacitance extraction

in emerging technologies . . . . . . . . . . . . . . . . . . . . . 8

1.2.4 Parasitics-aware design of FinFET SRAMs . . . . . . . . . . . 9

1.3 Dissertation structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Background 12

2.1 Device modeling with Technology CAD . . . . . . . . . . . . . . . . . 12

2.1.1 Process simulation . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.1.2 Device simulation . . . . . . . . . . . . . . . . . . . . . . . . . 15

2.1.3 Device transport and physical models in TCAD . . . . . . . . 16

2.2 Compact models for multi-gate FETs . . . . . . . . . . . . . . . . . . . 21

2.3 A generic multi-gate device fabrication flow . . . . . . . . . . . . . . 24

2.4 Multi-gate FET adoption challenges . . . . . . . . . . . . . . . . . . . 27

3 Design and Test of FinFET Logic Circuits 30

3.1 Design of logic gates and flip-flops in high-performance FinFET

technology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.1.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.1.3 Symmetric-ΦG and asymmetric-ΦG FinFET devices . . . . . . 33

3.1.4 Symmetric-ΦG and asymmetric-ΦG FinFET logic gates . . . . 47

3.1.5 Symmetric-ΦG and asymmetric-ΦG FinFET latches and flip-

flops . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.1.6 Section summary . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.2 Fault models for logic circuits in the multi-gate era . . . . . . . . . . . 66

3.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

3.2.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

3.2.3 FinFET logic gates . . . . . . . . . . . . . . . . . . . . . . . . . 69

3.2.4 Modeling defects in FinFET logic gates . . . . . . . . . . . . . 71

3.2.5 Section summary . . . . . . . . . . . . . . . . . . . . . . . . . . 88

4 Efficient Algorithms for 3D-TCAD Modeling of Emerging Devices and

Circuits 91

4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

4.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

4.3 Structure synthesis methodologies . . . . . . . . . . . . . . . . . . . . 97

4.3.1 Key ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

4.3.2 Building blocks of the algorithm . . . . . . . . . . . . . . . . . 102

4.3.3 Implementation strategies . . . . . . . . . . . . . . . . . . . . . 114

4.4 Structure synthesis case studies . . . . . . . . . . . . . . . . . . . . . . 116

4.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122

4.6 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

5 Transport analysis based 3D-TCAD Parasitic Capacitance Extraction in

Emerging Technologies 125

5.1 The need for transport analysis based parasitic capacitance extraction 126

5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

5.1.2 Transport analysis based capacitance extraction . . . . . . . . 127

5.1.3 Methodology and results . . . . . . . . . . . . . . . . . . . . . 130

5.1.4 Section summary . . . . . . . . . . . . . . . . . . . . . . . . . . 134

5.2 Hardware-assisted predictive capacitance extraction in 32nm SOI 6T

SRAMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

5.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

5.2.2 Methodology and results . . . . . . . . . . . . . . . . . . . . . 136

5.2.3 Section summary . . . . . . . . . . . . . . . . . . . . . . . . . . 139

5.3 Transport analysis based parasitic capacitance extraction in emerging

multi-gate devices and circuits . . . . . . . . . . . . . . . . . . . . . 142

5.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

5.3.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

5.3.3 Multi-gate device-level parasitics . . . . . . . . . . . . . . . . . 144

5.3.4 Multi-gate circuit-level parasitics . . . . . . . . . . . . . . . . . 150

5.3.5 Multi-gate parasitics vs. device transport . . . . . . . . . . . . 162

5.3.6 Section summary . . . . . . . . . . . . . . . . . . . . . . . . . . 169

6 Parasitics-aware Design of Symmetric and Asymmetric Gate-workfunction

FinFET SRAMs 171

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

6.2 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

6.3 Simulation setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

6.3.1 DC metrics of 6T FinFET SRAMs . . . . . . . . . . . . . . . . . 175

6.3.2 Transport analysis based 3D-TCAD extraction of FinFET

SRAM parasitic capacitances . . . . . . . . . . . . . . . . . . . 178

6.3.3 Modeling dynamic behavior of FinFET SRAM bitcells . . . . . 179

6.4 Design of 6T FinFET SRAMs . . . . . . . . . . . . . . . . . . . . . . . . 180

6.4.1 6T FinFET SRAM topologies . . . . . . . . . . . . . . . . . . . 180

6.4.2 6T FinFET SRAM DC metrics . . . . . . . . . . . . . . . . . . . 186

6.4.3 6T FinFET SRAM parasitic capacitances . . . . . . . . . . . . . 194

6.4.4 Transient behavior of 6T FinFET SRAMs . . . . . . . . . . . . 199

6.5 Chapter summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

7 Conclusion 207

7.1 Dissertation summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 208

7.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

A FinE3D framework 219

A.1 FinE3D Sentaurus TCAD decks . . . . . . . . . . . . . . . . . . . . . . 219

Bibliography 223

List of Tables

3.1 FinFET device parameters . . . . . . . . . . . . . . . . . . . . . . . . . 36

3.2 Standard cell FinFET INV characteristics, VLOW = −0.2V, VHIGH = 1.2V . . 53

3.3 Standard cell FinFET NAND2 characteristics . . . . . . . . . . . . . . 53

3.4 TG latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET,

T2 = SG(1P1N) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

3.5 HS latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET,

N1/N3/N7 = SG(2N), I5 = SG(2P1N) . . . . . . . . . . . . . . . . . . . 60

3.6 Hold static noise margins, xPyN = x-fin p-FinFET, y-fin n-FinFET . . 62

3.7 ON-state current for individual FinFET devices . . . . . . . . . . . . . 69

3.8 Metrics of SG/LP-mode FinFET INV/NAND gates . . . . . . . . . . 71

3.9 Detected and undetected faults in SG- and LP-mode FinFET NAND

gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.10 Shorting source and drain of an n-/p-FinFET in SG/LP-mode

INV/NAND gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.1 Process feature rulebook examples . . . . . . . . . . . . . . . . . . . . 109

4.2 Resource usage: Process simulation vs. structure synthesis . . . . . . 118

5.1 Bulk and SOI FinFET device parameters . . . . . . . . . . . . . . . . . 145

6.1 22nm SOI FinFET device parameters . . . . . . . . . . . . . . . . . . . 174

6.2 6T FinFET SRAM device configurations . . . . . . . . . . . . . . . . . 182

List of Figures

1.1 The family of multiple-gate transistors [1] . . . . . . . . . . . . . . . . 3

1.2 FinFET types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Electrostatic integrity for different transistor configurations [1] . . . . 5

2.1 TCAD to SPICE model generation . . . . . . . . . . . . . . . . . . . . 13

2.2 The Sentaurus TCAD ecosystem . . . . . . . . . . . . . . . . . . . . . 13

2.3 Generic process simulation steps . . . . . . . . . . . . . . . . . . . . . 15

2.4 Transport models used in device simulation [2] . . . . . . . . . . . . . 17

2.5 I-V comparison between Sentaurus Device and Spice3-UFDG compact

model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.6 Generic multi-gate device fabrication flow . . . . . . . . . . . . . . . . 24

2.7 Outline of a gate-first process . . . . . . . . . . . . . . . . . . . . . . . 26

2.8 Outline of a gate-last process . . . . . . . . . . . . . . . . . . . . . . . 26

3.1 FinE simulation framework for double-gate circuit design space

exploration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.2 SG-/IG-mode 3D FinFET structures simulated in Sentaurus TCAD . 34

3.3 Two-dimensional (X-Y) cross-section of an n-FinFET simulated in

Sentaurus TCAD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4 Symm-ΦG FinFET symbols: (a) SG-mode n-type, (b) IG-mode n-

type, (c) SG-mode p-type, and (d) IG-mode p-type . . . . . . . . . . . 36

3.5 Electrostatic potential and electron density distributions within the

fin region of an SG-mode n-FinFET for on-state (VGFS = VGBS = 1V,

VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions . . . 37

3.6 Electrostatic potential and electron density distributions within the

fin region of an IG-mode n-FinFET for on-state (VGFS = 1V, VGBS =

−0.2V, VDS = 1V), and off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V)

conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

3.7 IDS vs. VGFS for an IG-mode n-FinFET, VDS = 1V, VGBS varying from

0V to −0.3V. IOFF = IDS(VGFS = 0V) varies by 120× . . . . . . . . . . . 40

3.8 Asymm-ΦG FinFET symbols: (a) a-SG-mode n-type, and (b) a-SG-mode

p-type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

3.9 Electrostatic potential and electron density distributions within the

fin region of an a-SG-mode n-FinFET for on-state (VGFS = VGBS = 1V,

VDS = 1V), and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions . . 41

3.10 Energy band diagrams for (a) a-SG-mode n-FinFET, off-state (VGFS =

VGBS = 0V, VDS = 1V), and (b) IG-mode n-FinFET, off-state (VGFS =

0V, VGBS = −0.2V, VDS = 1V) . . . . . . . . . . . . . . . . . . . . . . . . 42

3.11 IDS vs. VGFS for an a-SG-mode n-FinFET (VDS = 1V), with corresponding

curves for SG-mode and IG-mode n-FinFETs . . . . . . . . . . . . . . . 43

3.12 IDS vs. VGFS for an a-SG-mode p-FinFET (|VDS| = 1V), with corresponding

curves for SG-mode and IG-mode p-FinFETs . . . . . . . . . . . . . . . 43

3.13 ION characteristics vs. variations in LG, TSI, and LUN . . . . . . . . . . 45

3.14 IOFF characteristics vs. variations in LG, TSI, and LUN . . . . . . . . . . 46

3.15 ILEAK distribution for a-SG-/SG-/IG-mode n-FinFETs under gate

workfunction fluctuations, σΦG = 50meV . . . . . . . . . . . . . . . . . 47

3.16 IDS vs. VGFS for an n-FinFET at different temperatures . . . . . . . . . 47

3.17 IOFF vs. temperature for an a-SG-mode n-FinFET with corresponding

curves for SG-mode and IG-mode n-FinFETs . . . . . . . . . . . . . . . 48

3.18 Fractional error in IDS vs. VGFS for 2D/3D device simulations . . . . . 49

3.19 INV gates: (a) SG, (b) LP, (c) IGn, and (d) IGp . . . . . . . . . . . . . . 50

3.20 INV layouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

3.21 NAND2 gates: (a) SG, (b) LP, and (c) MT . . . . . . . . . . . . . . . . . 51

3.22 NAND2 gates: (a) IG, (b) IG2, (c) XT, and (d) XT2 . . . . . . . . . . . 51

3.23 NAND2 layouts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.24 Asymm-ΦG SG-mode FinFET gates: (a) a-SG-INV, (b) a-SG-NAND2,

and (c) a-SG-NAND2S . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

3.25 Leakage-delay spectrum for FinFET INV configurations . . . . . . . . 53

3.26 Leakage-delay spectrum for FinFET NAND2 configurations . . . . . 54

3.27 SG-NAND2 transient characteristics. Input rise time has been increased

to 50ps from 10ps to improve visibility. . . . . . . . . . . . . . . . . . 55

3.28 XT2-NAND2 transient characteristics. Input rise time has been increased

to 50ps from 10ps to improve visibility. . . . . . . . . . . . . . . . . . 56

3.29 Leakage-delay spectrum for asymm-ΦG FinFET logic gates . . . . . . 57

3.30 Average leakage (ILEAK) vs. temperature for FinFET INV and

NAND2 standard cells . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.31 FinFET latch templates . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

3.32 TG flip-flop (TGF) template . . . . . . . . . . . . . . . . . . . . . . . . 59

3.33 HS flip-flop (HSF) template . . . . . . . . . . . . . . . . . . . . . . . . 59

3.34 Transient simulations of TGF1 and HSF1 . . . . . . . . . . . . . . . . . 62

3.35 Average ILEAK for FinFET latches . . . . . . . . . . . . . . . . . . . . . 63

3.36 Average ILEAK for FinFET flip-flops . . . . . . . . . . . . . . . . . . . . 63

3.37 Average propagation delay for FinFET latches . . . . . . . . . . . . . 64

3.38 Average propagation delay for FinFET flip-flops . . . . . . . . . . . . 64

3.39 Setup time for FinFET flip-flops . . . . . . . . . . . . . . . . . . . . . . 65

3.40 (a) SG-mode INV, (b) LP-mode INV, (c) SG-mode NAND, and (d)

LP-mode NAND. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

3.41 Leakage and delay characteristics under different back-gate bias

voltages for (a) LP-mode INV, and (b) LP-mode NAND. . . . . . . . . 70

3.42 (a) Regime I: Opens on shared back-gate bias lines for many LP-

mode INV gates, and (b) Regime II/III: Opens on individual back-

gate bias lines for an LP-mode INV gate . . . . . . . . . . . . . . . . . 74

3.43 Leakage and delay variation with different p-FinFET back-gate bias

voltages for (a) LP-mode INV, and (b) LP-mode NAND. . . . . . . . . 76

3.44 Leakage and delay variation under different n-FinFET back-gate

bias voltages for (a) LP-mode INV, and (b) LP-mode NAND. . . . . . 78

3.45 Leakage and delay variation with different p-FinFET back-gate bias

voltages for (a) SG-mode INV, and (b) SG-mode NAND. . . . . . . . 80

3.46 Leakage and delay variation with different n-FinFET back-gate bias

voltages for (a) SG-mode INV, and (b) SG-mode NAND. . . . . . . . 81

3.47 Effect of cutting a subset of fins in an LP-mode NAND gate p-

FinFET with four fins on (a) delay, and (b) leakage. . . . . . . . . . . . 82

3.48 Pulse characterization setup for (a) SG-mode INV, and (b) SG-mode

NAND . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

3.49 Interplay between CD,BG and CFG,BG with (a) LUN variation, TSI =

10nm, and (b) TSI variation, LUN = 16nm . . . . . . . . . . . . . . . . . 84

3.50 Transient pulse behavior of SG-mode INV in Regime II with (a) n-

FinFET back-gate cut, and (b) p-FinFET back-gate cut . . . . . . . . . 85

3.51 Transient pulse behavior of LP-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut


3.52 Transient pulse behavior of SG-mode INV having n-FinFET back-

gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime

III. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

3.53 Transient pulse behavior of SG-mode INV having p-FinFET back-

gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime

III. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

3.54 Transient pulse behavior of LP-mode INV in Regime III with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut


3.55 Transient pulse behavior of SG-mode NAND in Regime III having

(a) n-FinFET back-gate cut at A, (b) n-FinFET back-gate cut at B, and

(c) p-FinFET back-gate cut at A. . . . . . . . . . . . . . . . . . . . . . . 89

4.1 Technology-circuit co-design gap . . . . . . . . . . . . . . . . . . . . . 92

4.2 TCAD flow for the 130nm node and higher . . . . . . . . . . . . . . . 92

4.3 TCAD flow for 90nm-32nm technology nodes . . . . . . . . . . . . . . 93

4.4 The ultimate wishlist for 3D-TCAD assisted process/device

development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

4.5 TCAD modeling quadrants . . . . . . . . . . . . . . . . . . . . . . . . 97

4.6 (a) Modeling ambiguity with manual inputs, and (b) difficulty of

iterative optimization with human elements in the TCAD flow . . . . 99

4.7 3D-TCAD structure generation for layouts: (a) traditional approach,

and (b) proposed approach . . . . . . . . . . . . . . . . . . . . . . . . 100

4.8 Delineation of process zones . . . . . . . . . . . . . . . . . . . . . . . 104

4.9 Construction of device-layout database (DLD) . . . . . . . . . . . . . 106

4.10 Pre-synthesis transformations on PA-GA zones . . . . . . . . . . . . . 107

4.11 Process feature rulebook (PFRB) generation . . . . . . . . . . . . . . . 108

4.12 Layout analyzer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4.13 Layout annotation for a 1×1 6T FinFET SRAM bitcell . . . . . . . . . 111

4.14 Generation of lithography-effects database (LED) . . . . . . . . . . . 112

4.15 Architecture of the structure synthesizer . . . . . . . . . . . . . . . . 113

4.16 FEOL structure synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 114

4.17 BEOL structure synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 115

4.18 Integrated structure synthesis . . . . . . . . . . . . . . . . . . . . . . . 116

4.19 Structure formation during a planar 6T SRAM process simulation:

(a) trench device isolation, (b) formation of gate stack, (c)

source/drain formation with spacers, (d) contact and via formation,

and (e) final structure with doping . . . . . . . . . . . . . . . . . . . . 117

4.20 (a) Synthesized planar 6T SRAM structure, and (b) CBL extraction

error percentage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

4.21 Process-simulated versus synthesized 6T SRAM cells: (a) hold static

noise margin (HSNM), and (b) read static noise margin (RSNM) . . . 119

4.22 Synthesized 6T FinFET SRAM bitcell configurations . . . . . . . . . . 120

4.23 3×3 6T FinFET SRAM bitcell structure with mesh . . . . . . . . . . . 121

4.24 Synthesized FinFET ring oscillator configurations . . . . . . . . . . . 122

4.25 6T FinFET SRAM: Synthesis time (in sec.) versus number of FinFETs 122

4.26 FinFET ring oscillator: Synthesis time (in sec.) versus number of

FinFETs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

4.27 Logic synthesis flows are the circuit-world analogs of Fig. 4.7(b) . . . 124

5.1 Cross-sectional view of a metal wire running over an active

semiconductor region with two arbitrary doping profiles . . . . . . . . 129

5.2 Comparison between FS and TCAD extracted capacitance CAB under

different conditions, ω/2π = 1MHz . . . . . . . . . . . . . . . . . . . . 129

5.3 32nm planar SOI Type I and II SRAM structures . . . . . . . . . . . . 131

5.4 Type I computed BEOL capacitances (TCAD vs. FS) . . . . . . . . . . 132

5.5 Type II computed BEOL capacitances (TCAD vs. FS) . . . . . . . . . . 132

5.6 Performance difference in Type I & II cells during read operations . . 133

5.7 Type I and Type II Read stability (TCAD vs. FS) . . . . . . . . . . . . 134

5.8 Thin-cell 6T SRAM array SEM top view showing HKMG n-/p-FETs . 135

5.9 Measured intra-wafer CBL for (a) 6T1, and (b) 6T2 . . . . . . . . . . . 136

5.10 Measured inter-wafer CBL for (a) 6T1, and (b) 6T2 . . . . . . . . . . . 136

5.11 Synthesized (FEOL+BEOL) structure for the 6T1 SRAM bitcell . . . . 137

5.12 Effect of variation in BEOL parameters (subject to intra-wafer

tolerances) on CBL and CWL for 6T1 . . . . . . . . . . . . . . . . . . . . 138

5.13 (a) Measured vs. simulated CGS-VGS data for the nMOS capacitor

structure in Fig. 5.13(b) with width 1µm×2 fingers, (b) Multi-finger

(FEOL+BEOL) nMOS capacitor structure . . . . . . . . . . . . . . . . 139

5.14 CBL variation with p-well dose for (a) 6T1, and (b) 6T2 . . . . . . . . 140

5.15 (a) P-well dose distribution computed from measured 6T1 CBL

distribution [Fig. 5.10(a)] and the characteristic curve of 6T1 [Fig.

5.14(a)], and (b) measured vs. predicted distribution for 6T2. The

characteristic curve of 6T2 [Fig. 5.14(b)] along with the computed

p-well dose distribution [Fig. 5.15(a)] is used to compute the 6T2

CBL distribution. BEOL variation is not considered. . . . . . . . . . . 141

5.16 (a) Bulk FinFET, and (b) SOI FinFET . . . . . . . . . . . . . . . . . . . 145

5.17 Bulk FinFET ‘gate-last’ process simulation steps . . . . . . . . . . . . 146

5.18 Dependence of CDRAIN,TOT and CGATE,TOT on LG and HGATE . . . . . . 147

5.19 Dependence of CDRAIN,TOT and CGATE,TOT on LSP and TSI . . . . . . . . 148

5.20 Dependence of CDRAIN,TOT and CGATE,TOT on HFIN and HELEV . . . . . 149

5.21 Dependence of CDRAIN,TOT and CGATE,TOT on NCH and LDL . . . . . . . 149

5.22 3D-TCAD based capacitance extraction for generic multi-gate circuit

layouts: (a) traditional approach using brute-force process simulation,

and (b) our flow which leverages the automated structure

synthesis approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152

5.23 Multi-fin FinFET (a) bulk, and (b) SOI structures. Dielectric regions

are not shown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

5.24 Dependence of CDRAIN,TOT and CGATE,TOT on FP . . . . . . . . . . . . 154

5.25 Bulk FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and

(b) FEOL only. Dielectric regions are not shown . . . . . . . . . . . . 155

5.26 SOI FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b)

FEOL only. Dielectric regions are not shown . . . . . . . . . . . . . . 156

5.27 CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. FP, GP = 90nm . . . . . . . . 157

5.28 FEOL components of capacitance in the 6T SRAM (111) configuration,

FP = 50nm, GP = 90nm . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

5.29 CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. GP, FP = 50nm . . . . . . . . 158









5.34 CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM

(FEOL+BEOL) configurations . . . . . . . . . . . . . . . . . . . . . . . 161

5.35 CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM FEOL

configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

5.36 BEOL metal stack from the 22nm 6T SRAM (111) bitcell (a) without

lithography effects, and (b) with lithography effects. Dielectric regions

are not shown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

5.37 CBL,TOT, CWL,TOT, and CNL,TOT error percentages for (a) bulk 6T

SRAM (111) configuration, FP = 50nm, and (b) varying FP for the

22nm bulk 6T SRAM (111) configuration. GP = 90nm . . . . . . . . . . 163

5.38 Vanilla mixed-mode setup (VMM) . . . . . . . . . . . . . . . . . . . . 164

5.39 Mixed-mode setup with FS-extracted BEOL capacitances (FSMM) . . 165

5.40 Mixed-mode setup with corrected 3D-TCAD capacitances (3D-

TCADMM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

5.41 Write operations for a 6T FinFET SRAM (111) bitcell using the setups

described in Figs. 5.38, 5.39, and 5.40 . . . . . . . . . . . . . . . . . . . 167

5.42 Minimum write pulse width (TW) vs. cell sigma . . . . . . . . . . . . . 167

5.43 (a) SG-NAND2, and (b) LP-NAND2 FinFET configurations . . . . . . 168

5.44 Propagation delays of (a) SG-NAND2, and (b) LP-NAND2 configurations

with different physical models. (DD = drift-diffusion formalism,

HD = hydrodynamic formalism, PC = 3D-TCAD-extracted

parasitic capacitance corrections added) . . . . . . . . . . . . . . . . . 170

6.1 (a) Two-dimensional SOI n-FinFET cross section, and (b) 3D SOI n-

FinFET structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

6.2 Setup for (a) DC hold metrics, and (b) DC read/write metrics . . . . 176

6.3 N-curve for the DC read condition . . . . . . . . . . . . . . . . . . . . 177

6.4 N-curve for the DC write condition . . . . . . . . . . . . . . . . . . . . 178

6.5 Hybrid mixed-mode device simulation methodology for simulating

SRAM read/write operations . . . . . . . . . . . . . . . . . . . . . . . 180

6.6 V(135) bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric re-



6.7 PGFB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric re-


6.8 PGFB-PUWG bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielec-

tric regions are not shown . . . . . . . . . . . . . . . . . . . . . . . . . 183

6.9 PGFB-SPU bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric

regions are not shown . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

6.10 RBB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions

are not shown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

6.11 Bitcell areas normalized to FP×GP . . . . . . . . . . . . . . . . . . . . 185

6.12 V SC read margins vs. VDD: (a) RV NM, (b) RINM, and (c) RPNM . . . . 186

6.13 IGC read margins vs. VDD: (a) RV NM, (b) RINM, and (c) RPNM . . . . 187

6.14 MSC read margins vs. VDD: (a) RV NM, (b) RINM, and (c) RPNM . . . . 188

6.15 V SC write margins vs. VDD: (a) WTV , (b) WT I, and (c) WT P . . . . . . 189

6.16 IGC write margins vs. VDD: (a) WTV , (b) WT I, and (c) WT P . . . . . . 190

6.17 MSC write margins vs. VDD: (a) WTV , (b) WT I, and (c) WT P . . . . . . 191

6.18 IREAD vs. VDD: (a) V SC, (b) IGC, and (c) MSC . . . . . . . . . . . . . . . 192

6.19 ILEAK vs. VDD: (a) V SC, (b) IGC, and (c) MSC . . . . . . . . . . . . . . . 193

6.20 Breakup of V (111) (FEOL+BEOL) BL and WL capacitances . . . . . . 194



6.23 Breakup of PGFB (FEOL+BEOL) BL and WL capacitances . . . . . . . 195

6.24 Breakup of PGFB-SPU (FEOL+BEOL) BL and WL capacitances . . . . 196

6.25 Breakup of PGFB-PUWG (FEOL+BEOL) BL and WL capacitances . . 196

6.26 Breakup of RBB (FEOL+BEOL) BL and WL capacitances . . . . . . . . 196

6.27 (FEOL+BEOL) BL and WL capacitances in V SC bitcells . . . . . . . . . 197

6.28 (FEOL+BEOL) capacitances vs. FP for (a) CBL, and (b) CWL (GP = 90nm) . 197

6.29 IGC bitcell capacitances vs. V (111): (a) (FEOL+BEOL), and (b) FEOL 198


6.30 TR vs. VDD: (a) V SC, (b) IGC, and (c) MSC . . . . . . . . . . . . . . . . . 199

6.31 TR vs. bitcell σ: (a) V SC, (b) IGC, and (c) MSC, VDD = 1V . . . . . . . . 200

6.32 TW vs. VDD: (a) V SC, (b) IGC, and (c) MSC . . . . . . . . . . . . . . . . 202

6.33 TW vs. bitcell σ: (a) V SC, (b) IGC, and (c) MSC, VDD = 1V . . . . . . . . 203

6.34 TR vs. array configuration: (a) V SC, (b) IGC, and (c) MSC . . . . . . . . 204

6.35 TW vs. array configuration: (a) V SC, (b) IGC, and (c) MSC . . . . . . . 205

6.36 (a) TR, and (b) TW vs. VDD for V (111), across different FP . . . . . . . 206

7.1 Scaling behavior: Direct versus iterative linear solvers . . . . . . . . . 212

7.2 Key question: Can the solution of a large structure be approximated

using individual pre-solved device states? . . . . . . . . . . . . . . . . 214

7.3 Transient mixed-mode 2D device simulation runtimes for FinFET

NAND gates, with and without cache-restore of device states . . . . 215

7.4 Generation of device-state database . . . . . . . . . . . . . . . . . . . 215

7.5 State retrieval and extrapolation using the BGB algorithm . . . . . . . 216

7.6 Updating state in the solver loop . . . . . . . . . . . . . . . . . . . . . 217

A.1 A sample process simulation deck . . . . . . . . . . . . . . . . . . . . 220

A.2 Pre-synthesis transformations . . . . . . . . . . . . . . . . . . . . . . . 220

A.3 Layout annotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

A.4 FEOL structure synthesis . . . . . . . . . . . . . . . . . . . . . . . . . . 221

A.5 BEOL and integrated structure synthesis . . . . . . . . . . . . . . . . 222

A.6 Mesh refinement, capacitance extraction, and post-processing . . . . 222


Chapter 1

Introduction

For the last three decades, CMOS technology has provided consistent scaling and

enabled the implementation of high-density, high-speed, and low-power VLSI sys-

tems. In continuing the march towards denser circuitry, however, it has become

apparent that scaling the classical “bulk” MOSFET below the 22nm node is not

practical on account of poor electrostatic behavior [3], [4]. This has triggered re-

search into silicon-on-insulator (SOI) structures like partially-depleted SOI and

fully-depleted SOI (FD-SOI) with better short-channel effects (SCEs), greater per-

formance, and lower power consumption [5], [6], [7].

1.1 The move to multi-gate transistors

Over the past decade, transistor structures have evolved a step further from planar,

classical, single-gate FETs to 3D multi-gate FETs whose behavior can only be fully

explained by advanced carrier transport phenomena. Multi-gate FET technology is

slated to replace FD-SOI and other scaled planar bulk technologies in this decade

according to the International Technology Roadmap for Semiconductors (ITRS)

[8]. Indeed, Intel and TSMC have announced their switch to such devices at the


upcoming technology nodes, and other semiconductor companies are expected to

follow suit.

Multi-gate FETs can be classified into (i) double-gate structures, e.g., DELTA

FET [9], silicon-on-nothing FET [10], multiple independent-gate FET [11], and Fin-

FET [12], (ii) triple-gate structures, e.g., trigate FET [13], Π-gate FET [14], and Ω-

gate FET [15], and (iii) surround-gate structures, e.g., cylindrical FET [16], [17],

multi-bridge channel FET [18], planar “gate-all-around” FET [19], twin-silicon-

nanowire FET [20], and nano-beam stacked channel FET [21]. Fig. 1.1 summarizes

the multi-gate family described above.

From the fabrication perspective, the most likely candidate for widespread

adoption amongst the above is the FinFET [22–26]. Over the past few years, con-

siderable research has been directed towards issues dealing with improving and

economically integrating FinFET technology into the conventional CMOS process

[22, 25, 27–30]. Also, analysis and design of FinFET devices and digital circuits

[31–62], analog/RF circuits [63–70], and SRAMs [71–78] have been very active ar-

eas of research in recent years.

The FinFET device structure consists of a silicon fin surrounded by shorted or

independent gates on either side of the fin, typically on an SOI substrate. The dis-

tance between the center points of consecutive fins is referred to as the fin pitch,

while the distance between the center points of consecutive gate conductors is re-

ferred to as the gate pitch. In the shorted-gate (SG) mode of operation, the two

gates are biased together to turn on the device, providing maximum gate drive

[Fig. 1.2(a)]. In the independent-gate (IG) mode of operation [Fig. 1.2(b)], the

two gates are electrically independent. The back-gate bias can be used to alter the

threshold voltage (Vth) of the front gate, thereby controlling the off-current (IOFF )

of the device [79]. IOFF in SG-mode devices is generally much higher than in corresponding IG-mode devices (with reverse-biased back-gates), and due to the fixed


Figure 1.1: The family of multiple-gate transistors [1]

Vth, it cannot be altered electrically. The Vth is typically controlled by directly setting

the gate workfunction. While IG-mode devices provide the advantage of electri-

cally controlling device Vth, and hence delay/leakage, they lead to a more com-

plicated transistor layout strategy. This is due to the fact that multi-fin IG-mode

FinFETs need larger spacing between the source and drain regions, as well as larger


Figure 1.2: FinFET types: (a) SG-mode FinFET, and (b) IG-mode FinFET [Figure labels: source, drain, gate (front gate and back gate in (b)), HFIN, LG, TSI, and the X/Y/Z axes]

fin pitch in order to land a contact to the back gate in comparison to corresponding

multi-fin SG-mode FinFETs that have compact layouts.

Overall, the move to multi-gate transistors presents the following possibilities:

• Multiple gates lead to better electrostatic integrity (EI), which is an important

measure of the short-channel behavior of the FET. SCEs are collectively a

set of undesirable effects [e.g., drain-induced barrier lowering (DIBL)] that

reduce the ability of the gate to control the channel potential as the proximity

between the source and drain regions decreases [1]. As shown in Fig. 1.3, the

extent of penetration of source/drain fields into the FET body is minimized

with multiple gates, and as a result, the roll-off in FET Vth on account of gate

length reduction is alleviated. Better EI also translates to steep subthreshold

slopes and lower sub-threshold leakage current.

• Owing to the small doping volumes involved with the active fin regions,

undoped-/intrinsic-body FinFETs are favorable from the perspective of manufacturability. This makes the device virtually free of the random dopant


Figure 1.3: Electrostatic integrity for different transistor configurations [1]

fluctuation effect, apart from unintentional source/drain dopants that can

diffuse into the channel region.

• Undoped-body FinFETs lead to decreased impurity scattering for charge carriers [1]. Also, thin fins reduce interface scattering owing to volume inversion, where the majority of the carriers are confined to the center of the fin due to

quantum confinement. Both of these effects coupled with channel strain can

greatly enhance channel mobility and, hence, drain current.

• With the advent of high-k metal-gate (HKMG) in scaled planar technologies,

physical gate-dielectric thickness can be high with a low effective oxide thick-

ness (EOT). Since gate leakage is exponentially dependent on physical gate-

dielectric thickness, gate leakage has been curtailed to a large extent with

HKMG. In FinFETs, HKMG gate stacks coupled with an intrinsic body lower the surface electric field considerably and, hence, reduce gate leakage

even further.
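The relation between physical dielectric thickness and EOT mentioned in the last item above can be made concrete with a short calculation. The sketch below uses the standard definition EOT = t_phys × (k_SiO2 / k_dielectric) with k_SiO2 = 3.9. The 3 nm film thickness and the relative permittivity of 20 are illustrative assumptions, not values taken from this thesis.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2 (standard reference value)


def eot_nm(t_phys_nm, k_dielectric):
    """Equivalent oxide thickness: the SiO2 thickness that gives the same
    gate capacitance per unit area as the physical dielectric layer."""
    return t_phys_nm * K_SIO2 / k_dielectric


# Assumed example: a 3 nm high-k film with relative permittivity 20.
t_phys = 3.0
k_highk = 20.0
print(f"EOT = {eot_nm(t_phys, k_highk):.3f} nm "
      f"for a physical thickness of {t_phys} nm")
```

A physically thick film can therefore present a sub-nanometer EOT, which is what allows gate leakage to be curtailed without giving up gate capacitance.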

1.2 Dissertation contributions

The major contribution of this thesis is toward the development of efficient

methodologies for unifying the circuit layout/process/device simulation worlds

for early-stage 3D-Technology CAD (3D-TCAD) based modeling of emerging


devices and circuits, in particular multi-gate devices such as FinFETs and Tri-gate

transistors. Specifically, the algorithms and frameworks developed herein will empower engineers to breach the decade-long ‘many-device TCAD barrier’ that has

severely limited the scope of traditional continuum TCAD methods. Other contri-

butions include the design and test of FinFET logic circuits in high-performance

processes and parasitics-aware design of FinFET SRAMs.

These contributions are described in more detail next.

1.2.1 Design and test of FinFET logic circuits

This work consists of two major themes encompassing design and test of FinFET

logic circuits. The first deals with the design of FinFET logic and sequential ele-

ments in a high-performance process. The second explores the possibility of de-

veloping fault models for FinFET logic circuits that employ a mix of SG and IG

FinFETs. A brief summary of the contributions of this work is:

• A head-to-head comparison between symmetric gate-workfunction (Symm-

ΦG) SG- and IG-mode FinFETs along with asymmetric gate-workfunction

(Asymm-ΦG) SG-mode FinFETs and logic/sequential elements employing

them in a 22nm SOI process using TCAD device simulations, where

– Asymm-ΦG FinFETs have better leakage and on-current behavior than

Symm-ΦG IG-mode FinFETs.

– Standard cell logic gates employing Asymm-ΦG FinFETs have better

leakage-delay characteristics than the best logic gates that can be formed

using a mix of Symm-ΦG SG- and IG-mode FinFETs.

– Latches and flip-flops employing a mix of Symm-ΦG/Asymm-ΦG SG-

mode FinFETs are able to optimize delay/setup-time in the best possible

manner.

• While CMOS fault models overlap considerably with fault models for FinFET

logic circuits, open defects on the back-gate of IG-mode devices/SG-mode

devices, which have accidentally been converted to IG-mode, do not have a

single fault model that is able to capture the observed characteristics.

– Logic gates employing IG-mode FinFETs exhibit a wide range of behav-

iors making it impossible to develop a single test protocol to detect de-

fects, owing to the non-injective, non-surjective mapping of logic gate

behaviors to traditional fault models.

1.2.2 Efficient algorithms for 3D-TCAD modeling of emerging

devices and circuits

In this work, efficient and accurate methodologies are developed for unifying

the layout and process simulation worlds, thereby expanding the horizon of predictive modeling for emerging devices beyond the ‘many-device TCAD barrier,’

which is a major showstopper at lower technology nodes. In particular, this work:

• Identifies important bottlenecks that plague modeling efforts in 3D-TCAD

structure generation.

– Presents a compelling case for a layout/process/device-independent

methodology for bypassing the 3D process simulation barrier.

• Proposes an innovative 3D-TCAD ‘structure synthesis’ methodology which

enables automated layout to 3D-TCAD structure generation, and is akin to

logic synthesis in the circuit design world.

– Outlines the necessary algorithms needed to accomplish layout/process

and technology node-independent structure synthesis with reasonable

time/memory complexity.

– Evaluates the efficacy of the approach by comparing process-simulated

and synthesized structures.

– Enables transport analysis based capacitance extraction, which is critical

for highly scaled devices and circuits.

1.2.3 Transport analysis based 3D-TCAD parasitic capacitance ex-

traction in emerging technologies

After addressing the 3D-TCAD structure generation issue, several hitherto in-

tractable problems in nanoscale device/circuit modeling enter the realm of feasi-

ble solutions. One such problem is parasitic capacitance extraction in nanoscale

circuits like SRAMs, eDRAMs, etc., where traditional segregated approaches to

modeling front-end-of-the-line (FEOL) and back-end-of-the-line (BEOL) capaci-

tances break down. Indeed, the latter is listed as an issue in the 2011 ITRS modeling

and simulation roadmap [80] in Section 3.5. In this work, we demonstrate:

• The need for transport analysis based capacitance extraction for nanoscale

circuits

– Via comprehensive evaluations of the ‘BEOL component’ of parasitic

capacitances obtained using field solvers and 3D-TCAD extraction in

sub-32nm SOI 6T SRAM structures.

– For multi-bitcell based extractions and quantifying the role of edge ef-

fects during extraction.

• Hardware validation of the structure synthesis methodology using bit line

capacitance data from several SRAM arrays in an experimental 32nm SOI

process.


• The need for FEOL circuit extraction in multi-gate SRAMs, using a multi-gate

version of the structure synthesizer at the 22/14/10nm nodes.

1.2.4 Parasitics-aware design of FinFET SRAMs

The biggest benefit to chip designs on account of moving to multi-gate devices will

be the massive density improvement in on-chip memories. This work consists of

two major threads:

• Design of FinFET SRAM bitcells using Symm-ΦG and Asymm-ΦG devices

considering DC metrics in a 22nm SOI process.

• A comprehensive evaluation of current and new 6T SRAM topologies from

the perspective of parasitic capacitances/transient analysis, which shows that

using DC targets alone can lead to sub-optimal bitcell choices.

1.3 Dissertation structure

The rest of this thesis is organized as follows. Chapter 2 provides a background on

the role of TCAD in the IC design ecosystem. Since the bulk of the work in this thesis

is based on multi-gate devices/circuits modeled in the state-of-the-art Sentaurus

TCAD tool suite [81], a brief survey of the transport/physical models used for de-

vice simulation is also presented. Thereafter, contemporary multi-gate compact

models, such as Spice3-UFDG and BSIM-CMG/IMG, are covered. Next, fabrica-

tion flows for high-k metal-gate FinFETs are briefly discussed. This is followed by

the challenges/trade-offs involved in adopting multi-gate FET technology.

Chapters 3, 4, 5, and 6 address several different problems that are relevant to

FinFET logic/memory circuit design and parasitic capacitance extraction. On ac-

count of the rapid increase in process complexity at each technology node, the dif-


ferentiation between high-performance/low-power processes is becoming increas-

ingly difficult to sustain, and leading-edge foundries are focussing on enabling

both ends of the spectrum in a single process [82]. In this context, Chapter 3 delves

into the design of ultra-low power FinFET logic and sequential circuits in a 22nm

SOI high-performance process, using Symm-ΦG and Asymm-ΦG FinFETs. Chap-

ter 3 also touches upon an important aspect of the FinFET chip design process that

was hitherto unexplored, which is testing of FinFET logic circuits. Here, a detailed

analysis of mapping defects to fault models under different regimes is presented.

Chapter 4 tackles the 3D process simulation barrier that has severely im-

peded progress in 3D-TCAD simulation, which is widely used for modeling

FinFET logic/memory circuits as well as other nano-scale devices in flash mem-

ories, eDRAMs, etc. An efficient layout/process/technology-node-independent

methodology is proposed to enable the synthesis of device-simulation-ready structures from generic input layouts, with the aid of technology process assumptions

and process-simulated-device databases. The scaling properties of the method as

well as comparisons with layout-process-simulated structures are also presented,

to highlight the practicality of the approach.

Chapter 5 delves into the pressing problem of accurate layout-aware para-

sitic capacitance extraction for FEOL and BEOL features in highly scaled CMOS

circuits, during early stages of technology development. It comprehensively

establishes the fact that reliance on segregated approaches with compact mod-

els/accelerated field solvers (which were used at higher technology nodes) is not

possible. Here, the scope of the work is broader than extraction of FinFET/multi-

gate circuit parasitics, as the problem becomes relevant from the planar CMOS

32nm node itself. The methodology proposed in Chapter 4 immediately enables

the correct physics-based approach, which is transport analysis based parasitic

capacitance extraction in a device simulator, on the 3D device-simulation-ready


structures obtained from the respective layouts. Hardware validation of the

unified parasitic capacitance extraction methodology is also provided in an ex-

perimental 32nm IBM SOI process. Thereafter, capacitance trends in bulk/SOI

FinFETs at the 22/14/10nm technology nodes are computed using a multi-gate

version of the structure synthesizer that is presented in Chapter 4.

Continuing along the lines of the parasitic capacitance extraction problem in

Chapter 5, Chapter 6 develops the entire technology-circuit co-design flow for

early-stage design of FinFET SRAMs that employ a combination of Symm-ΦG SG-

and IG-mode and Asymm-ΦG FinFETs. Leveraging 3D-TCAD parasitic capaci-

tance extraction and back-annotations of capacitances into mixed-mode 2D-TCAD

circuit simulations, several different topologies are examined from a transient anal-

ysis perspective, and the problems with DC target based classification of SRAM

bitcells are highlighted.

Finally, Chapter 7 concludes this thesis and presents directions for future

work. The abstractions provided in Chapter 4 are radically different from the

traditional approaches seen in the TCAD community and enable ‘intelligent de-

vice state caching,’ which is proposed in Chapter 7. Here, individual devices

can be pre-solved under various bias conditions and their solution vectors can

be sampled/cached. These solutions can later be reused for generating intel-

ligent solution guesses for other similar layout-synthesized structures, upon

which DC/transient simulations are being performed, thereby accelerating device

simulation by orders of magnitude.
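The idea of reusing cached solutions as initial guesses can be illustrated with a toy problem. The sketch below is not the algorithm proposed in Chapter 7: it replaces the device equations with a single scalar nonlinear equation (a diode in series with a resistor), and the names `solve_state` and `StateCache`, as well as all parameter values, are hypothetical. It only shows that a Newton solve started from a cached state at a nearby bias needs fewer iterations than one started from zero.

```python
import math

I_S = 1e-12    # assumed diode saturation current (A)
V_T = 0.02585  # thermal voltage near 300 K (V)
R_S = 1e3      # assumed series resistance (ohm)


def solve_state(v_bias, v_guess, tol=1e-12, max_iter=200):
    """Newton solve of (v_bias - v)/R_S = I_S*(exp(v/V_T) - 1) for v.

    Returns the solution and the number of iterations used."""
    v = v_guess
    for iteration in range(1, max_iter + 1):
        f = (v_bias - v) / R_S - I_S * (math.exp(v / V_T) - 1.0)
        df = -1.0 / R_S - I_S / V_T * math.exp(v / V_T)
        step = f / df
        v -= step
        if abs(step) < tol:
            return v, iteration
    raise RuntimeError("Newton iteration did not converge")


class StateCache:
    """Stores solved states keyed by bias; returns the nearest one as a guess."""

    def __init__(self):
        self._states = {}

    def store(self, v_bias, state):
        self._states[v_bias] = state

    def nearest(self, v_bias, default=0.0):
        if not self._states:
            return default
        key = min(self._states, key=lambda b: abs(b - v_bias))
        return self._states[key]


# Pre-solve a few bias points and cache their states.
cache = StateCache()
for bias in (0.2, 0.4, 0.6, 0.8, 1.0):
    state, _ = solve_state(bias, 0.0)
    cache.store(bias, state)

# Solve a new bias point with and without the cached guess.
target = 0.9
v_cold, iters_cold = solve_state(target, 0.0)
v_warm, iters_warm = solve_state(target, cache.nearest(target))
print("cold-start iterations:", iters_cold)
print("warm-start iterations:", iters_warm)
```

In a real device simulator the cached object would be a full solution vector over the mesh rather than a single voltage, and the retrieval step would need to account for differences between structures.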


Chapter 2

Background

2.1 Device modeling with Technology CAD

IC design is carried out at various levels of abstraction: architecture, logic, transis-

tor, etc. TCAD is used at the lowest level of the hierarchy and enables technology

development with fewest abstractions. It is predominantly physics-based and has

traditionally been the primary vehicle for predictive modeling of transistors and

other active devices, considered to be part of FEOL manufacturing. TCAD is also

used to explore newer device designs and extrapolate to the next technology node,

besides giving engineers a better understanding of the benefits and drawbacks of

any modifications to existing manufacturing processes, as well as the development

of compact/SPICE models.

Fig. 2.1 depicts the typical sequence of steps involved in computational device

research and development with TCAD, and Fig. 2.2 shows a sample state-of-the-

art toolchain from Synopsys [81]. Initially, process descriptions are crystallized

into concrete assumptions from trial process runs. The process recipe is fed to a

process simulator, which applies the recipe to yield-sensitive circuit layouts, e.g.,

SRAM/eDRAM bitcells, to generate a process-simulated device (PSD) structure.


Figure 2.1: TCAD to SPICE model generation [Flow blocks: process assumptions, process simulation, device simulation, device parameter extraction, SPICE model generation, test structure fabrication, hardware measurements]

Figure 2.2: The Sentaurus TCAD ecosystem

This structure is provided as an input to a device simulator that models electri-

cal/thermal transport behavior. This is followed by parameter extraction that is

useful for compact (SPICE) model development and verification, along with hard-

ware data from test structures in the process technology. Lithography simulation

(not shown), which is generally part of process simulation, assists in the formu-

lation of design rules and process-development-kits that are extensively used by


circuit designers. The two key steps in Fig. 2.1 are process and device simulation,

which have been used throughout the dissertation, and are described next.

2.1.1 Process simulation

The primary objective of process simulation is to accurately predict the physi-

cal/structural layers and geometry of devices at the end of a process run, as well

as the active dopant/stress distributions. As shown in Fig. 2.3, the input to pro-

cess simulation is a process flow guided by process assumptions and layout/layer

masks. The initial wafer/substrate is subject to a variety of process conditions,

each of which may involve steps like oxidation, diffusion, implantation, deposi-

tion, etching, etc. Lithography simulation is also performed to accurately capture

feature geometries.

Process simulation generally uses a finite-element or finite-volume mesh to

compute and store the device dopant and stress profiles. Every geometric change

in the simulation domain requires a new mesh that fits the new device boundaries,

in order to model the next series of process steps. The accuracy of the profiles

strongly depends on the choice of mesh nodes at any given time. The mesh should

be sufficiently dense to resolve all dopant and stress profiles, but not too dense, as

the computational cost increases rapidly with the number of mesh nodes. For ex-

ample, a typical deep-submicron planar CMOS process simulation may have more

than 100 mesh modifications. For each mesh change, data values on the new mesh

are obtained through interpolation. Balancing interpolation error and computa-

tional cost is the key to successful TCAD simulation.
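The trade-off between mesh density and interpolation error described above can be illustrated with a small experiment. The sketch below is a minimal illustration, assuming a one-dimensional uniform mesh, a Gaussian dopant profile in arbitrary units, and piecewise-linear interpolation; a production process simulator uses adaptive 2D/3D meshes and its own interpolation schemes. All function names are hypothetical.

```python
import math


def profile(x):
    """Assumed Gaussian dopant profile (arbitrary units), peaked at x = 0.5."""
    return math.exp(-((x - 0.5) / 0.05) ** 2)


def uniform_mesh(n_nodes):
    """Node positions of a uniform mesh on the interval [0, 1]."""
    return [i / (n_nodes - 1) for i in range(n_nodes)]


def interpolate(x, mesh, values):
    """Piecewise-linear interpolation of nodal values at position x."""
    h = mesh[1] - mesh[0]
    i = min(int(x / h), len(mesh) - 2)
    w = (x - mesh[i]) / h
    return (1.0 - w) * values[i] + w * values[i + 1]


def max_interp_error(n_old, n_new=1001):
    """Transfer the profile from an old mesh to a finer new mesh and
    return the worst-case deviation from the exact profile."""
    old = uniform_mesh(n_old)
    vals = [profile(x) for x in old]
    return max(abs(interpolate(x, old, vals) - profile(x))
               for x in uniform_mesh(n_new))


# Denser source meshes cost more nodes but lose less in the transfer.
for n in (11, 41, 161):
    print(f"{n:4d} nodes: max interpolation error = {max_interp_error(n):.4f}")
```

Since a process flow may involve more than 100 such mesh changes, errors of this kind can accumulate, which is why the mesh must resolve the steepest part of each profile.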

The complexity of physical models is a major factor that impacts process simu-

lation. Simplified physics minimizes computation time. With technology scaling,

however, the need for ever more accurate doping/stress profiles has increased and

complex physical models are added at each new generation. On account of the de-


Figure 2.3: Generic process simulation steps [Flow blocks: layout (post design-rule check); layout transformations, logical operations on layers; modified GDS layout/masks; substrate/initial mesh; process conditions 1 through N; process simulation module (oxidation/diffusion, lithography, implantation, deposition, and etching simulation); re-mesh for device simulation]

tailed physical modeling involved, process simulation is almost exclusively used

to fine-tune the development of individual devices. The limitations of process sim-

ulation spurred research directions presented in Chapter 4. The output of process

simulation, i.e., the PSD structure, is generally re-meshed for device simulation,

which is discussed next.

2.1.2 Device simulation

Device simulation is used to analyze the electrical and thermal behavior of the

PSD structure obtained from extensive process simulations. Its main elements are

PSD structure, material system parameters, circuit/contact boundary conditions,

list of physical effects to be captured, numerical constraints on the solver, carrier

transport model, and the modes of simulation, i.e., DC, AC or transient, with specific external biasing conditions. There are two types of device simulation: single-device and mixed-mode. Single device simulation is used to investigate transport

phenomena in a single device. Mixed-mode simulation is used to study the behav-

ior of small circuits constructed out of individual device instances and is generally

less rigorous in terms of physical models, owing to the increase in simulation com-

plexity. Next, we discuss the different transport models that are commonly used

in device simulation and have been adopted throughout this dissertation.

2.1.3 Device transport and physical models in TCAD

Fig. 2.4 shows the typical transport models used in modeling nanoscale semiconductor devices as well as the scope of traditional continuum TCAD device

simulation, which is based on semi-classical approaches with quantum correc-

tions. The starting point for the semi-classical approach to modeling transport

is the Boltzmann Transport Equation (BTE), which is essentially a statement of the

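conservation of particle probability flux in the six-dimensional phase space of position $\vec{r}$ and crystal momentum $\vec{k}$. The probability distribution function $f(\vec{r},\vec{k},t)$ is the probability of finding a carrier with crystal momentum $\vec{k}$ at position $\vec{r}$ at time $t$. The time evolution of $f$ enables the calculation of carrier density $n(\vec{r},t)$, current density $\vec{J}(\vec{r},t)$, and energy density $W(\vec{r},t)$ as,

\[ n(\vec{r},t) = \frac{1}{V}\sum_{\vec{k}} f(\vec{r},\vec{k},t) \tag{2.1} \]

\[ \vec{J}(\vec{r},t) = -\frac{q}{V}\sum_{\vec{k}} \vec{v}(\vec{k})\, f(\vec{r},\vec{k},t) \tag{2.2} \]

\[ W(\vec{r},t) = \frac{1}{V}\sum_{\vec{k}} E(\vec{k})\, f(\vec{r},\vec{k},t) \tag{2.3} \]

where $\vec{v}(\vec{k})$ and $E(\vec{k})$ are the carrier velocity and energy, respectively. The BTE can be derived from simple arguments to be

\[ \frac{\partial f(\vec{r},\vec{k},t)}{\partial t} + \vec{v}_g\cdot\vec{\nabla}_{\vec{r}} f(\vec{r},\vec{k},t) + \frac{\vec{F}}{\hbar}\cdot\vec{\nabla}_{\vec{k}} f(\vec{r},\vec{k},t) = \left.\frac{\partial f}{\partial t}\right|_{coll} + s(\vec{r},\vec{k},t) \tag{2.4} \]

where $\vec{v}_g = \frac{1}{\hbar}\vec{\nabla}_{\vec{k}} E(\vec{k})$ is the group velocity of the carriers, $\vec{F}$ is the net external force, $\left.\frac{\partial f}{\partial t}\right|_{coll}$ is the rate of change in $f$ on account of collisions and scattering, $s(\vec{r},\vec{k},t)$ represents the change due to generation-recombination processes, and $\hbar = \frac{h}{2\pi}$, where $h$ is Planck’s constant.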
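The moment sums in Eqs. (2.1)-(2.3) can be evaluated directly once a distribution function is given. The sketch below is only an illustration under stated assumptions: a one-dimensional grid of k states, a parabolic band, a Maxwellian distribution (optionally displaced in k to mimic drift), an effective mass of 0.26 electron masses, and a unit normalization volume. None of these choices come from the thesis.

```python
import math

HBAR = 1.054571817e-34            # reduced Planck constant (J s)
Q = 1.602176634e-19               # elementary charge (C)
M_EFF = 0.26 * 9.1093837015e-31   # assumed effective mass (kg)
KT = 1.380649e-23 * 300.0         # thermal energy at an assumed 300 K (J)
K_TH = math.sqrt(2.0 * M_EFF * KT) / HBAR  # thermal wavevector scale (1/m)


def energy(k):
    """Parabolic band E(k) = hbar^2 k^2 / (2 m), an assumption of this sketch."""
    return (HBAR * k) ** 2 / (2.0 * M_EFF)


def velocity(k):
    """Group velocity (1/hbar) dE/dk = hbar k / m for the parabolic band."""
    return HBAR * k / M_EFF


def moments(f, k_states, volume=1.0):
    """Carrier, current, and energy densities as sums over k states,
    following Eqs. (2.1), (2.2), and (2.3) respectively."""
    n = sum(f(k) for k in k_states) / volume
    j = -Q / volume * sum(velocity(k) * f(k) for k in k_states)
    w = sum(energy(k) * f(k) for k in k_states) / volume
    return n, j, w


# Symmetric grid of k states spanning +/- 10 thermal wavevectors.
k_states = [K_TH * 0.05 * (i - 200) for i in range(401)]
K_DRIFT = 0.5 * K_TH  # assumed displacement of the drifted distribution


def f_eq(k):
    """Equilibrium Maxwellian occupation (arbitrary prefactor)."""
    return 1e-3 * math.exp(-energy(k) / KT)


def f_drift(k):
    """Maxwellian displaced by K_DRIFT, mimicking carriers drifting in +x."""
    return 1e-3 * math.exp(-energy(k - K_DRIFT) / KT)


n_eq, j_eq, w_eq = moments(f_eq, k_states)
n_dr, j_dr, w_dr = moments(f_drift, k_states)
print("mean energy / kT at equilibrium:", w_eq / n_eq / KT)
print("current density under drift is negative:", j_dr < 0)
```

The symmetric distribution carries no net current, while the displaced one yields a current opposite to the carrier drift direction, as expected from the sign of Eq. (2.2).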

Figure 2.4: Transport models used in device simulation [2]

[Figure content, listed from approximate to exact in order of model improvement; the figure groups these into semi-classical and quantum approaches and marks the scope of TCAD:]

• Compact model: appropriate for circuit design

• Drift-diffusion equations: good for devices down to 0.5µm, includes µ(E)

• Hydrodynamic equations: velocity overshoot is accounted for properly

• Boltzmann transport equation: accurate up to classical limits

• Quantum hydrodynamics: hydrodynamic features + quantum corrections

• Quantum Monte-Carlo methods: accurate up to single particle description

• Quantum kinetic / Wigner equation: all classical features + quantum corrections

• Green’s function methods: includes correlations in both space and time domain

• Schrödinger equation: can be solved only for a few particles

In a modern device simulator, the typical equations that describe the motion

of charge carriers in a semiconductor device are the Poisson equation and carrier


continuity equations for electrons and holes, which are:

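\[ \vec{\nabla}\cdot(\varepsilon\vec{\nabla}\phi) = q\cdot(n + N_A^- - p - N_D^+) \tag{2.5} \]

\[ \vec{\nabla}\cdot\vec{J}_n = q\cdot\left(R + \frac{\partial n}{\partial t}\right) \tag{2.6} \]

\[ \vec{\nabla}\cdot\vec{J}_p = -q\cdot\left(R + \frac{\partial p}{\partial t}\right) \tag{2.7} \]

where $\phi$ is the electrostatic potential, $\varepsilon$ is the position-dependent dielectric permittivity, $n$ and $p$ are electron and hole concentrations, $N_A^-$ and $N_D^+$ are ionized acceptor and donor impurity concentrations, $\vec{J}_n$ and $\vec{J}_p$ are electron and hole current density vectors, and $R$ is the net generation-recombination rate.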
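To illustrate how Eq. (2.5) is discretized, the sketch below solves it in one dimension with finite differences. It is a minimal example, assuming uniform permittivity, fixed (Dirichlet) potentials at both ends, and a frozen charge distribution corresponding to a fully depleted p-type slab; a device simulator instead solves Poisson's equation self-consistently with the continuity equations on a 2D/3D mesh. The function name and all numerical values are illustrative assumptions.

```python
Q = 1.602176634e-19                 # elementary charge (C)
EPS_SI = 11.7 * 8.8541878128e-12    # silicon permittivity (F/m), assumed uniform


def solve_poisson_1d(rhs, length, phi_left, phi_right, eps=EPS_SI):
    """Solve eps * d2(phi)/dx2 = rhs on a uniform grid with Dirichlet ends.

    rhs holds q*(n + N_A^- - p - N_D^+) at the interior nodes, i.e. the
    right-hand side of Eq. (2.5). Returns phi at all nodes, ends included.
    """
    m = len(rhs)
    h = length / (m + 1)
    # Tridiagonal system: phi[i-1] - 2*phi[i] + phi[i+1] = h^2 * rhs[i] / eps
    d = [h * h * r / eps for r in rhs]
    d[0] -= phi_left
    d[-1] -= phi_right
    # Thomas algorithm: forward elimination, then back substitution.
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = d[0] / -2.0
    for i in range(1, m):
        denom = -2.0 - c_prime[i - 1]
        c_prime[i] = 1.0 / denom
        d_prime[i] = (d[i] - d_prime[i - 1]) / denom
    phi = [0.0] * m
    phi[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):
        phi[i] = d_prime[i] - c_prime[i] * phi[i + 1]
    return [phi_left] + phi + [phi_right]


N_A = 1e23        # assumed ionized acceptor density (m^-3), i.e. 1e17 cm^-3
LENGTH = 100e-9   # assumed slab width (m)
M = 99            # interior nodes, giving a 1 nm grid spacing

# Fully depleted slab: n = p = 0 and no donors, so only N_A^- contributes.
rhs = [Q * (0.0 + N_A - 0.0 - 0.0)] * M
phi = solve_poisson_1d(rhs, LENGTH, 0.0, 0.0)
print(f"potential at mid-slab: {phi[(M + 1) // 2]:.4f} V")
```

For this constant right-hand side the exact solution is a parabola, which the three-point stencil reproduces up to round-off, so the result can be checked against the closed form $-q N_A L^2 / (8\varepsilon)$ at mid-slab.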

Drift-diffusion model: The drift-diffusion formalism [2] is the simplest of trans-

port models, and is derived from the BTE under the relaxation-time approxima-

tion. It has been the workhorse of most device simulators up to the beginning of

the deep-submicron regime. The drift-diffusion current relations are derived un-

der the assumption that the carriers are in thermal equilibrium with the lattice, and

are:

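\[ \vec{J}_n = q\cdot\mu_n\cdot n\cdot\left[\vec{\nabla}\left(\frac{E_C}{q}-\phi\right) + \frac{k_B}{q}\cdot\frac{N_C}{n}\cdot\vec{\nabla}\left(\frac{n\cdot T_L}{N_C}\right)\right] \tag{2.8} \]

\[ \vec{J}_p = q\cdot\mu_p\cdot p\cdot\left[\vec{\nabla}\left(\frac{E_V}{q}-\phi\right) - \frac{k_B}{q}\cdot\frac{N_V}{p}\cdot\vec{\nabla}\left(\frac{p\cdot T_L}{N_V}\right)\right] \tag{2.9} \]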

where µn and µp are electron and hole mobilities, TL is the lattice temperature, EC

and EV are the position-dependent conduction and valence band edge energies,

NC and NV are the effective density of states at the conduction and valence band

edges, and kB is the Boltzmann constant.
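As a consistency check on Eqs. (2.8) and (2.9), the short derivation below sketches how they reduce to the familiar drift plus diffusion form. It assumes, for illustration only, a uniform lattice temperature $T_L$ and position-independent $E_C$, $E_V$, $N_C$, and $N_V$. The symbols $\vec{E} = -\vec{\nabla}\phi$ (electric field) and $D_n$, $D_p$ (diffusion coefficients) are introduced here and do not appear in the original text.

```latex
\begin{align*}
\vec{J}_n &= q\,\mu_n\,n\left[-\vec{\nabla}\phi
             + \frac{k_B T_L}{q}\,\frac{\vec{\nabla} n}{n}\right]
           = q\,\mu_n\,n\,\vec{E} + q\,D_n\,\vec{\nabla} n,
           & D_n &= \frac{\mu_n k_B T_L}{q}, \\
\vec{J}_p &= q\,\mu_p\,p\left[-\vec{\nabla}\phi
             - \frac{k_B T_L}{q}\,\frac{\vec{\nabla} p}{p}\right]
           = q\,\mu_p\,p\,\vec{E} - q\,D_p\,\vec{\nabla} p,
           & D_p &= \frac{\mu_p k_B T_L}{q}.
\end{align*}
```

The expressions for $D_n$ and $D_p$ are the Einstein relations, which follow here because the carriers are taken to be in thermal equilibrium with the lattice.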

Hydrodynamic model: At the next level, hydrodynamic/energy balance trans-

port formalisms [2] increase modeling complexity, as they are derived from higher

moments of the BTE, and can account for effects like velocity overshoot, etc. In


the hydrodynamic model, carrier temperatures (Tn and Tp for electrons and holes,

respectively) are assumed to be different from the lattice temperature (TL). Here,

the current densities are:

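\[ \vec{J}_n = q\cdot\mu_n\cdot n\cdot\left[\vec{\nabla}\left(\frac{E_C}{q}-\phi\right) + \frac{k_B}{q}\cdot\frac{N_C}{n}\cdot\vec{\nabla}\left(\frac{n\cdot T_n}{N_C}\right)\right] \tag{2.10} \]

\[ \vec{J}_p = q\cdot\mu_p\cdot p\cdot\left[\vec{\nabla}\left(\frac{E_V}{q}-\phi\right) - \frac{k_B}{q}\cdot\frac{N_V}{p}\cdot\vec{\nabla}\left(\frac{p\cdot T_p}{N_V}\right)\right] \tag{2.11} \]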

In addition to Eqs. (2.5), (2.6), and (2.7), in the hydrodynamic model, the energy

balance equations state the conservation of average carrier energies. In terms of

the carrier temperatures Tn and Tp, they are:

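\[ \vec{\nabla}\cdot\vec{S}_n = \vec{\nabla}\left(\frac{E_C}{q}-\phi\right)\cdot\vec{J}_n - \frac{3\cdot k_B}{2}\cdot\left[\frac{\partial(n\cdot T_n)}{\partial t} + R\cdot T_n + n\cdot\left(\frac{T_n-T_L}{\tau_{\varepsilon,n}}\right)\right] \tag{2.12} \]

\[ \vec{\nabla}\cdot\vec{S}_p = \vec{\nabla}\left(\frac{E_V}{q}-\phi\right)\cdot\vec{J}_p - \frac{3\cdot k_B}{2}\cdot\left[\frac{\partial(p\cdot T_p)}{\partial t} + R\cdot T_p + p\cdot\left(\frac{T_p-T_L}{\tau_{\varepsilon,p}}\right)\right] \tag{2.13} \]

where $\tau_{\varepsilon,n}$ and $\tau_{\varepsilon,p}$ denote the electron and hole energy relaxation times, while $\vec{S}_n$ and $\vec{S}_p$ are the electron and hole energy fluxes computed as:

\[ \vec{S}_n = -\kappa_n\cdot\vec{\nabla}T_n - \frac{5}{2}\cdot\frac{k_B\cdot T_n}{q}\cdot\vec{J}_n \tag{2.14} \]

\[ \vec{S}_p = -\kappa_p\cdot\vec{\nabla}T_p + \frac{5}{2}\cdot\frac{k_B\cdot T_p}{q}\cdot\vec{J}_p \tag{2.15} \]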

Here, the thermal conductivities κn and κp are assumed to obey the Wiedemann-

Franz Law and are related to Tn and Tp by:

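\[ \kappa_n = \left(\frac{5}{2}+c_n\right)\cdot\frac{k_B^2}{q}\cdot T_n\cdot\mu_n\cdot n \tag{2.16} \]

\[ \kappa_p = \left(\frac{5}{2}+c_p\right)\cdot\frac{k_B^2}{q}\cdot T_p\cdot\mu_p\cdot p \tag{2.17} \]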


Different variations of the hydrodynamic model exist in the literature [2] and have

been implemented in commercial device simulators.

Lattice heating: In order to account for heating effects, the lattice heat flow

equation [2] can be solved, which is:

~∇ · ~SL = HG−ρL · cL ·∂TL

∂t(2.18)

where SL is the lattice heat flux defined as,

~SL =−κL ·~∇TL (2.19)

and ρL, cL, and κL are the mass density, specific heat, and thermal conductivity, re-

spectively. HG is the generated local heat density and is calculated from the trans-

port model. In the drift diffusion case, HG is defined as,

$$H_G = \vec{\nabla}\left(\frac{E_C}{q}-\phi\right)\cdot\vec{J}_n+\vec{\nabla}\left(\frac{E_V}{q}-\phi\right)\cdot\vec{J}_p \tag{2.20}$$

In the hydrodynamic case, HG can be defined in terms of the relaxation times:

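$$H_G = \frac{3\,k_B}{2}\left[n\cdot\left(\frac{T_n-T_L}{\tau_{\varepsilon,n}}\right)+p\cdot\left(\frac{T_p-T_L}{\tau_{\varepsilon,p}}\right)\right] \tag{2.21}$$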
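Eq. (2.21) can be evaluated pointwise once the carrier temperatures are known. The sketch below is a direct transcription in SI units; the function name and the sample values are assumptions chosen only to show the order of magnitude.

```python
K_B = 1.380649e-23  # Boltzmann constant (J/K)


def heat_generation_hd(n, p, t_n, t_p, t_l, tau_n, tau_p):
    """Local heat density H_G (W/m^3) from Eq. (2.21).

    n, p         : electron / hole concentrations (m^-3)
    t_n, t_p     : electron / hole temperatures (K)
    t_l          : lattice temperature (K)
    tau_n, tau_p : energy relaxation times (s)
    """
    electron_term = n * (t_n - t_l) / tau_n
    hole_term = p * (t_p - t_l) / tau_p
    return 1.5 * K_B * (electron_term + hole_term)


# Hypothetical operating point: hot electrons, negligible hole population
h_g = heat_generation_hd(n=1e24, p=0.0, t_n=1000.0, t_p=300.0,
                         t_l=300.0, tau_n=0.3e-12, tau_p=0.3e-12)
print(f"H_G = {h_g:.3e} W/m^3")
```

When both carrier populations are in equilibrium with the lattice (Tn = Tp = TL), the expression returns zero, as expected for a relaxation-based heat source.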

Density gradient quantization model: While modeling nanoscale devices like

multi-gate FETs, it is essential to account for the effect of structural and electrical

quantum confinement on Vth. In semi-classical transport approaches, quantum effects are typically included as a potential-like correction (Λn and Λp for electrons and holes, respectively) to the quasi-Fermi-level-based calculations for carrier concentrations. In the case of electrons, under the Boltzmann approximation:

$$n = n_i\cdot\exp\left(\frac{E_{F,n}-E_i-\Lambda_n}{k_B\cdot T_n}\right) \tag{2.22}$$

where Λn is related to the density-gradient [2] as

$$\Lambda_n = -\gamma\,\frac{\hbar^2}{6\,m_n}\cdot\frac{\nabla^2\left(\sqrt{n}\right)}{\sqrt{n}} \tag{2.23}$$

Here, ni is the intrinsic carrier concentration, EF,n and Ei are the electron quasi-Fermi and intrinsic energy levels, mn is the electron effective mass, and γ is a fitting

factor.
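To make Eq. (2.23) concrete, the following sketch evaluates Λn on a uniform one-dimensional grid with a central second difference. It is not how a device simulator discretizes the model. It assumes that the h in Eq. (2.23) is the reduced Planck constant (the usual convention for density-gradient models), that only interior nodes are evaluated, and that the effective mass of 0.26 m0 is a placeholder value.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant (J*s)
M0 = 9.1093837015e-31    # free-electron mass (kg)


def dg_potential(n, dx, m_eff, gamma=1.0):
    """Density-gradient correction Lambda_n (J) on the interior grid nodes.

    n     : list of electron concentrations on a uniform 1D grid (m^-3)
    dx    : grid spacing (m)
    m_eff : electron effective mass (kg)
    gamma : fitting factor of Eq. (2.23)
    """
    root = [math.sqrt(v) for v in n]
    prefactor = -gamma * HBAR ** 2 / (6.0 * m_eff)
    result = []
    for i in range(1, len(n) - 1):
        # Central difference approximation of d^2(sqrt(n))/dx^2
        laplacian = (root[i - 1] - 2.0 * root[i] + root[i + 1]) / (dx * dx)
        result.append(prefactor * laplacian / root[i])
    return result


# Hypothetical exponential profile n(x) ~ exp(2*a*x); Lambda_n is then constant
a = 1e8          # inverse decay length (1/m), illustrative
dx = 1e-10       # 0.1 nm grid spacing
profile = [1e20 * math.exp(2.0 * a * i * dx) for i in range(50)]
lam = dg_potential(profile, dx, m_eff=0.26 * M0)
print(f"Lambda_n = {lam[0] / 1.602176634e-19 * 1e3:.3f} meV")
```

For an exponential profile the ratio ∇²(√n)/√n equals a², so the numerical result can be checked against −γħ²a²/(6mn).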

Overall, in order to simulate multi-gate devices accurately using a commercial

device simulator like Sentaurus Device [83], quantum hydrodynamic models are

the best option in terms of the tradeoff between simulation accuracy and computation time.

2.2 Compact models for multi-gate FETs

While device simulators are reasonably accurate for a given transport model

framework, they are very slow for large-scale circuit simulation. Here, compact

models serve as a crucial link between process technology and circuit simula-

tion, by leveraging inputs from TCAD simulation and hardware data. Several

challenges exist in developing reliable scalable compact models for multi-gate

devices that can capture all physical regimes of device operation. Two popular

flavors of compact models for FinFETs/multi-gate FETs are available in the literature, namely, Spice3-UFDG from the University of Florida, Gainesville [42] and

BSIM-CMG/IMG from the University of California, Berkeley [41]. They are briefly

discussed below.

Spice3-UFDG model: Spice3-UFDG is a process/physics based model that relies on charge-based modeling of generic double-gate MOSFETs [42]. It is an extension of the UFSOI/FD model [84], and incorporates a compact iterative Poisson-Schrödinger solver with the primary assumption of a fully-depleted silicon body

under weak inversion. The model physically accounts for the charge coupling

between front and back gates, and computes the quantum mechanical carrier dis-

tribution throughout the body/channel regions in weak as well as strong inversion

regions. Since Spice3-UFDG is well calibrated with hardware data, we compared

Sentaurus TCAD device simulations of FinFET devices with Spice3-UFDG using

identical physical parameters. A sample case with 30nm gate length, 15nm fin

thickness, 1.2nm gate dielectric thickness, and 75nm fin height is shown in Fig.

2.5, where the two can be seen to be in good agreement for a wide range of bias

voltages. While Spice3-UFDG works well for single FET simulations, it faces con-

vergence issues for larger circuit-level simulations, thereby making it unattractive.

Figure 2.5: I-V comparison between Sentaurus Device and Spice3-UFDG compact model [axes: IDS (A/µm, log scale) vs. VGS (V) at VDS = 1V; curves: TCAD, Spice3-UFDG]

BSIM-CMG/IMG model: Unlike Spice3-UFDG, BSIM-CMG/IMG are surface

potential based compact models, i.e., all terminal currents, charges, and capacitances are derived from the surface potentials calculated in the device. Owing

to the simplicity of the surface-potential formalism, the BSIM-CMG drain current

model for SG-mode FinFETs can be expressed as

$$I_{DS} = 2\cdot\mu\cdot\frac{W_{eff}}{L_{eff}}\cdot\left[G(\phi_S)-G(\phi_D)\right] \tag{2.24}$$

where Weff is the effective electrical FET width, Leff is the effective FET gate length,

and the function G(φ) is given by

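$$G(\phi) = \frac{Q_{INV}^2}{2\,C_{OX}}+2\cdot\frac{k_B T}{q}\cdot Q_{INV}-\frac{k_B T}{q}\cdot\left[5\cdot\frac{\varepsilon_{Si}\,k_B T}{q\,T_{SI}}+Q_{BULK}\right]\cdot\ln\left[5\cdot\frac{\varepsilon_{Si}\,k_B T}{q\,T_{SI}}+Q_{BULK}+Q_{INV}\right] \tag{2.25}$$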

Here, QINV and QBULK are the inversion and bulk depletion charges (which are

functions of φ), TSI is the fin thickness, and COX is the gate capacitance.
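The structure of Eqs. (2.24) and (2.25) can be sketched as follows. This is not the BSIM-CMG implementation: the sketch skips the surface-potential solution entirely and takes the source-end and drain-end inversion charges as given inputs. It also assumes charge magnitudes per unit area, a single QBULK value shared by both channel ends, and a silicon relative permittivity of 11.7. All numerical inputs are hypothetical.

```python
import math

K_B = 1.380649e-23                 # Boltzmann constant (J/K)
Q = 1.602176634e-19                # elementary charge (C)
EPS_SI = 11.7 * 8.8541878128e-12   # silicon permittivity (F/m), assumed value


def g_func(q_inv, q_bulk, c_ox, t_si, temp_k=300.0):
    """G of Eq. (2.25) evaluated from the charges at one end of the channel.

    q_inv, q_bulk : inversion / bulk depletion charge magnitudes (C/m^2)
    c_ox          : gate capacitance per unit area (F/m^2)
    t_si          : fin thickness (m)
    """
    v_t = K_B * temp_k / Q                       # thermal voltage k_B*T/q
    a = 5.0 * EPS_SI * v_t / t_si + q_bulk       # bracketed term of Eq. (2.25)
    # The logarithm's units cancel only in the difference G(S) - G(D).
    return (q_inv ** 2 / (2.0 * c_ox)
            + 2.0 * v_t * q_inv
            - v_t * a * math.log(a + q_inv))


def drain_current(mu, w_eff, l_eff, q_inv_s, q_inv_d, q_bulk, c_ox, t_si,
                  temp_k=300.0):
    """I_DS of Eq. (2.24) from source-end and drain-end inversion charges."""
    g_s = g_func(q_inv_s, q_bulk, c_ox, t_si, temp_k)
    g_d = g_func(q_inv_d, q_bulk, c_ox, t_si, temp_k)
    return 2.0 * mu * (w_eff / l_eff) * (g_s - g_d)


# Hypothetical bias point for a single fin
i_ds = drain_current(mu=0.03, w_eff=100e-9, l_eff=25e-9,
                     q_inv_s=1e-2, q_inv_d=1e-3, q_bulk=0.0,
                     c_ox=3.45e-2, t_si=10e-9)
print(f"I_DS = {i_ds:.3e} A")
```

Because G increases monotonically with QINV, the current is positive whenever the source-end inversion charge exceeds the drain-end charge and vanishes when the two are equal.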

A significant advantage of the BSIM model is that it can correctly predict drain

current in fully-depleted as well as partially-depleted body regimes. It also cap-

tures many relevant effects seen in HKMG FinFETs such as quantum confinement,

velocity overshoot, gate-induced drain leakage, etc.

The BSIM-IMG model is a separate model for IG-mode FinFETs, where the front

and back gates are biased differently. BSIM-IMG reuses several of the models pro-

vided in BSIM-CMG, and accurately accounts for the Vth dependence of the front

gate with respect to an applied reverse bias at the back gate. However, a signifi-

cant disadvantage of the model is its inability to correctly compute drain current

when the back channel is also inverted, as well as cases where the voltage differ-

ence between the front and back gates is smaller than the difference between the

gate-workfunctions of the front and back gates. While BSIM-CMG is relatively sta-

ble from a convergence perspective, BSIM-IMG has poor convergence properties

when simulating large circuits.

2.3 A generic multi-gate device fabrication flow

Vertical multi-gate transistors such as FinFETs, Ω-FETs, and Tri-gate FETs have

self-aligned gates, which is a major advantage from a fabrication perspective. The

major fabrication steps for FEOL processing of generic SOI FinFETs are shown in

Fig. 2.6 [1].

• The process starts with (A), which is an SOI wafer having a predefined sili-

con over insulator thickness that determines the fin height. With the aid of

either direct lithography or spacer lithography, fins of a critical dimension

are defined, followed by plasma etching to get to (B).

Figure 2.6: Generic multi-gate device fabrication flow [panels (A)-(H)]

• Thereafter, oxidation and H2 annealing are used to smoothen the sidewall

surfaces. This is followed by the growth of the gate dielectric and deposition

of the metal gate to get to (C). For undoped body FinFETs, this is a critical

step, where the workfunction of the metal-gate/interface capping layers di-

rectly determines the Vth of the FinFET.

• Since the gate stack is deposited unevenly over the fin topography, it is essen-

tial to planarize and flatten the gate surface to get to (D), in order to enable

subsequent gate processing steps. Next, the gate patterning and gate etching

steps are performed to define the gate length of the FinFET, as in (E). Here,

the gate etching process needs to be highly selective to avoid damage to the

silicon fin.

• Next, large-angle, low-energy tilt implants are used to conformally dope the

source/drain regions and avoid migration of dopants into the undoped fin to

reach (F). This is followed by a sequence of steps to enable selective growth of

epitaxial source/drain regions without shorting them to the gate. To enable

the latter, source/drain offset nitride spacers are formed along the sidewalls

of the gate and the fin to reach (G).

• Finally, the fin spacers are removed and the extended source/drain regions

are subject to selective epitaxial growth as in (H), where epitaxy serves the

dual purpose of reducing parasitic resistances as well as reducing the number

of contacts needed to connect to multi-fin FinFET source/drain regions by

shorting them.

The flow described above is a gate-first process which involves gate definition

prior to source/drain implantation. For high-k metal-gate transistors, there are

predominantly two approaches that are possible, namely, gate-first and gate-last,

which are broadly applicable to multi-gate FETs as well. In a dual metal-gate-first

process (Fig. 2.7), the high-k gate dielectric is formed followed by metal-gate-1

(MG1) deposition, as in (A). Thereafter, MG1 is patterned and metal-gate-2 (MG2)

is deposited, as in (B). This is followed by MG2 patterning and gate etching to

reach (C). Finally, source/drain implantation is performed followed by contacts

to the FETs. In a dual metal-gate-last or replacement gate process (Fig. 2.8), the

gate dielectric is formed and patterned along with a dummy polysilicon gate, as

in (A). This serves as a mask for source/drain implantation and deposition of in-

terlayer dielectrics (ILD) along with ILD polish, as in (B). Next, the sacrificial gate

is removed and MG1 is deposited and patterned, as in (C). Thereafter, MG2 is de-

posited/patterned and contacts to the FETs are formed, as in (D).

Figure 2.7: Outline of a gate-first process [panels (A)-(D); legend: silicon, gate dielectric, STI, metal-gate 1, metal-gate 2, contacts]

Figure 2.8: Outline of a gate-last process [panels (A)-(D); legend: silicon, gate dielectric, STI, metal-gate 1, metal-gate 2, contacts, polysilicon]

Over the years, it has become increasingly clear that the gate-last approach is

more favorable, and this has been adopted in process simulations in Chapters 5

and 6. The gate-last process induces strain effects on the FETs which can be sig-

nificant and greatly improve performance. Also, getting low Vth pFET devices in

a gate-first process is very difficult on account of thermal issues, which cause Vth

drift. However, despite the advantages, the gate-last process places constraints on

layout density, as it requires a chemical mechanical polishing (CMP) step at the

very end. This can significantly increase layout area with respect to an identical

topology implemented in a gate-first process.

2.4 Multi-gate FET adoption challenges

Several challenges need to be addressed to enable a smooth transition from planar,

single-gate FET technology to multi-gate FET technology. They can be classified

into process/device and circuit-design-specific issues.

Process/device issues:

• Lithography: Patterning vertical fins with dimensions many times smaller

than the wavelength of light (typically 193nm) at tight fin/gate pitches is

highly non-trivial (e.g., process steps, such as the removal of spacer mate-

rial around a fin without eroding it, require extreme precision). While spacer

lithography [1] is the preferred choice for fin patterning, for other layers, it is

unclear whether extreme-UV (EUV) or double-patterning will persist at the

lower technology nodes in terms of meeting yield constraints.

• Wafer scale fin height uniformity: In the case of bulk FinFETs/processes that

rely on some form of CMP, fin height tolerances are difficult to control. This

translates to a design problem as the electrical width of a FinFET is directly

proportional to the fin height.

• Multi-gate parasitic resistances: Decreasing source/drain series resistances

to the channel as well as gate conductor resistance with fin/gate pitch scaling

is a major challenge. Here, fin aspect ratios and fin/gate pitches play an

important role in determining if the source/drain regions can be conformally

doped to yield very low resistances using low-energy tilt implants.

• Multi-gate parasitic capacitances: In order to enable robust bar contacts to

the FETs, nearly all future multi-gate technologies will rely on extended

source/drain epitaxy to short sources/drains of parallel fins. However, this

dramatically increases gate to source/drain parasitic capacitance and needs

to be addressed.

• Tuning the gate-workfunction: Since undoped-body FinFETs are likely to be

the first choice from a manufacturability perspective, process technologies

that permit broad tunability in n-/p-FinFET gate-workfunctions in an inde-

pendent manner (for obtaining high-Vth and low-Vth devices) are being re-

searched in the industry.

Circuit design issues:

• Width quantization: Multi-gate devices impose FET electrical width quanti-

zation, which is a design limitation for SRAM/analog/RF circuit designers.

• Circuit parasitics: Extraction of FEOL parasitics corresponding to generic

multi-gate circuit layouts is a major problem.

• Process variations: Although there is a significant improvement with re-

spect to planar devices [85, 86], sources of variation, such as fin/gate line-

edge roughness, grain-orientation dependent gate workfunction variation,

fin thickness/height variation, etc., are expected to affect performance in

multi-gate circuits. Hence, it is essential to develop methodologies to model

multi-gate circuits accurately in the presence of such variations to maximize

yield at design time, for any given process recipe.
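The width quantization issue listed above can be illustrated with a short calculation. The sketch assumes that each fin contributes an electrical width of 2·HFIN (two sidewalls), with an optional top-surface term for tri-gate devices. This per-fin width expression, the function name, and the 50nm fin height are assumptions for illustration only.

```python
import math


def fins_for_width(w_target_nm, h_fin_nm, t_si_nm=0.0):
    """Return (number of fins, realized electrical width in nm).

    Assumes a per-fin width of 2*H_FIN, plus T_SI when the top surface
    of the fin also conducts (tri-gate case).
    """
    w_fin = 2.0 * h_fin_nm + t_si_nm
    n_fins = max(1, math.ceil(w_target_nm / w_fin))
    return n_fins, n_fins * w_fin


# A planar design asking for 250 nm must round up to a whole number of fins
n_fins, w_real = fins_for_width(w_target_nm=250, h_fin_nm=50)
print(n_fins, w_real)
```

A designer who wants 250nm of width must therefore accept 300nm with these values, which is the kind of granularity that constrains SRAM and analog sizing.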

Chapter 3

Design and Test of FinFET Logic Circuits

In this chapter, we focus on two aspects of FinFET circuit design. The first sec-

tion deals with the design of low-power logic gates and sequential elements in a

high-performance FinFET process technology. The second section delves into the

development of fault models for FinFET logic circuits.

3.1 Design of logic gates and flip-flops in high-performance FinFET technology

In this section, we delve into the design of ultralow power logic gates and se-

quential elements in a high-performance FinFET process technology, where the

leakage-delay tradeoff is an important consideration.

3.1.1 Introduction

Owing to the rapid increase in technology/process complexity at the lower

technology nodes, leading edge foundries are focusing on enabling both high-

performance and low-power devices in a single process [82]. This is very relevant

in the context of emerging multi-gate devices like FinFETs, where logic/sequential

circuit design tradeoffs have not been explored for high-performance processes.

In the transition from planar CMOS to FinFET standard cell design for ultra-low-

leakage, high-performance circuits, important questions that demand attention

are:

• From the perspective of process and layout complexity, is it profitable to use

IG-mode FinFETs at all? What is the best way to mix SG-/IG-mode FinFETs

in order to reduce leakage current in a standard cell?

• How do SG-/IG-mode FinFETs fare in terms of leakage under temperature

variations and what are the tradeoffs offered by different topologies for these

scenarios?

• Are there alternatives to using back-gate biased IG-mode FinFETs for leakage

reduction in a high-performance technology?

In addition to addressing the above, the major contributions in the current section

are as follows [60, 87]:

• We evaluate Symm-ΦG and Asymm-ΦG FinFET devices head-to-head in a

high-performance process using 3D device simulations in Sentaurus TCAD

[81].

• We examine the effect of physical device parameters on on-current (ION) and

off-current (IOFF ), and gate-workfunction fluctuations (which are likely to

be the largest sources of Vth variation [88–91]) on FinFET leakage via quasi-

Monte Carlo 3D device simulations.

• We comprehensively probe the design space of Symm-ΦG and Asymm-ΦG

FinFET logic gates and flip-flops along various electrical characteristic dimensions (leakage, delay) and layout complexity/area by suitably mixing

SG-/a-SG-/IG-mode FinFETs, using mixed-mode 2D device simulations.

• For the first time, we also demonstrate that the most versatile Symm-ΦG

topologies fail to approach the leakage-delay trade-offs enjoyed by logic ele-

ments based on Asymm-ΦG SG-mode FinFETs. This suggests that it is more

practical to use Asymm-ΦG FinFETs for ultra-low-leakage designs in a high-

performance FinFET technology rather than integrate Symm-ΦG IG-mode

FinFETs, which have high area/process overheads and introduce additional

CAD/layout design/testing costs.

The rest of this section is organized as follows. In Section 3.1.2, we review re-

lated work. In Section 3.1.3, we evaluate key metrics of Symm-ΦG and Asymm-ΦG

FinFETs via 3D/2D device transport simulations. Thereafter, we employ mixed-

mode 2D device simulations in subsequent sections, owing to the rapid increase

in computational complexity/time of 3D device simulations. In Section 3.1.4, we

characterize various plausible Symm-ΦG and Asymm-ΦG FinFET inverter (INV)

and two-input NAND (NAND2) logic gates in detail to determine the most ver-

satile configurations with respect to electrical characteristics. In Section 3.1.5, we

examine tradeoffs in designing basic latch and flip-flop topologies using various

combinations of Symm-ΦG SG-/IG-mode and Asymm-ΦG SG-mode FinFETs, us-

ing insights from Sections 3.1.3 and 3.1.4. Finally, Section 3.1.6 presents the section

summary.

3.1.2 Related work

Circuit design based on low-leakage multi-gate FETs/FinFETs has garnered signif-

icant attention owing to the explosive increase in leakage power consumption in

planar FETs at lower technology nodes, over the past decade. Low-power multi-

gate circuit design has been explored from a device-circuit viewpoint in [92, 93].

In [94–98], logic styles leveraging the SG and IG modes of FinFET operation have

been investigated. FinFET latches and flip-flops have been studied in [99], [100].

Owing to its small dimensions, a FinFET is likely to suffer from the effects of pro-

cess and temperature variations. In [101], engineering the workfunction of the

gate material is shown to be effective in controlling Vth under variations and sen-

sitivity of device electrical parameters to fluctuations in gate length, fin thickness,

and gate dielectric thickness is also analyzed. In [88–91], gate-workfunction vari-

ation is shown to be the most important contributor to variation in Vth for metal-

gate FinFETs. FinFETs with asymmetric gate workfunctions in the form of n+/p+

polysilicon gates have been engineered and investigated in [102], [103]. Overall,

with respect to planar devices, FinFETs are expected to fare well from a variability

perspective [85, 86].

Since multi-gate adoption is likely to be driven by performance/area benefits,

in this work, we comprehensively characterize Symm-ΦG and Asymm-ΦG FinFETs

in a high-performance process. We also investigate various possible configurations

of logic gates and flip-flops employing such FinFETs through mixed-mode device

simulation (taking into account the effect of temperature), from a digital circuit

designer’s perspective. Preliminary results dealing with the latter were presented

in [60].

3.1.3 Symmetric-ΦG and asymmetric-ΦG FinFET devices

In this section, we evaluate Symm-ΦG and Asymm-ΦG FinFETs head-to-head in a

high-performance process. Owing to the absence of a suitable platform for multi-

gate circuit design exploration, we use FinE3D, an extension of FinE [104] (Fig.

3.1), which integrates double-gate compact models, like Spice3-UFDG [42], BSIM-

CMG/IMG [41], and a device simulator, like Sentaurus TCAD [81], into a single

framework. We utilized the SG-/IG-mode FinFET device structures shown in Figs.

3.2(a)/3.2(b) for 3D device transport simulations in Sentaurus Device [83]. Also,

Figure 3.1: FinE simulation framework for double-gate circuit design space exploration [blocks: MATLAB GUI, parameter extraction module, Spice3-UFDG compact model, quasi-MC process variation module, LTSpice netlist extraction, Sentaurus TCAD mixed-mode device simulation, MATLAB postprocessing]

Figure 3.2: SG-/IG-mode 3D FinFET structures simulated in Sentaurus TCAD: (a) SG-mode FinFET structure, (b) IG-mode FinFET structure

Figure 3.3: Two-dimensional (X-Y) cross-section of an n-FinFET simulated in Sentaurus TCAD

a two-dimensional (X-Y) cross-section of the device structures in Figs. 3.2(a) and 3.2(b), as shown in Fig. 3.3, was employed for mixed-mode device-circuit simulations. In Table 3.1, the parameters for a typical n-/p-FinFET device are listed,

where LGF , LGB, TOXF , TOXB, TSI , HFIN , HGF , HGB, LSPF , LSPB, LUN , NBODY , ΦGF , ΦGB,

NSD, VDD are the physical front- and back-gate lengths, front- and back-gate effec-

tive oxide thicknesses, fin thickness, fin height, front- and back-gate thicknesses,

front- and back-gate spacer thicknesses, gate-drain/source underlap, body dop-

ing, front- and back-gate workfunctions, source/drain doping, and the operating

voltage, respectively.

The fin body thickness is chosen to be small enough in comparison to the gate

length, in order to ensure that the gate has excellent control over the channel [1].

The channel region in the fin is typically undoped, owing to the small dimensions

of the device. The heavily doped extended raised source/drain regions (HCON ×

LCON) aid in forming contacts to the device. They lead into the source/drain

regions in the fin where the dopant concentration gradually decreases progress-

ing towards the relatively undoped body region, causing either an overlap (LOV )

or an underlap (LUN). The Vth of FinFETs is typically tuned by directly adjust-

ing the workfunction of the gate material [105]. The workfunctions for n-FinFET

Table 3.1: FinFET device parameters

Parameter                    Value
LGF, LGB (nm)                25
Effective TOXF, TOXB (nm)    1
TSI (nm)                     10
HFIN (nm)                    50
HGF, HGB (nm)                20
LSPF, LSPB (nm)              20
LUN (nm)                     10
NBODY (cm^-3)                10^15
ΦGF, ΦGB (eV)                ΦGn: 4.4, ΦGp: 4.8
NSD (cm^-3)                  10^20
VDD (V)                      1
VHIGH (V)                    1.2
VLOW (V)                     −0.2

Figure 3.4: Symm-ΦG FinFET symbols: (a) SG-mode n-type, (b) IG-mode n-type, (c) SG-mode p-type, and (d) IG-mode p-type [n-type: ΦGF = ΦGB = 4.4eV; p-type: ΦGF = ΦGB = 4.8eV]

(ΦGF =ΦGB =ΦGn = 4.4eV ) and p-FinFET (ΦGF =ΦGB =ΦGp = 4.8eV ) devices were

chosen corresponding to high-performance logic requirements [1] and yield low-

Vth devices, whose symbols are shown in Fig. 3.4.

ION and IOFF characteristics

We revisit the physics of SG- and IG-mode FinFET devices, to better appreciate

the limitations of Symm-ΦG devices and the advantages of Asymm-ΦG FinFETs.

Accounting for temperature effects, we performed hydrodynamic mixed-mode 3D

device simulations on carefully-defined meshes (for excellent convergence) and

invoked the density gradient model for incorporating quantum effects in a thin fin.

We ignored the effects of gate tunneling currents owing to the undoped fin, and

used an effective oxide thickness that can easily be realized using thicker high-k

dielectrics to suppress gate leakage.

Figure 3.5: Electrostatic potential and electron density distributions within the fin region of an SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions: (a) ON-state electrostatic potential, (b) OFF-state electrostatic potential, (c) ON-state electron density, (d) OFF-state electron density

Figs. 3.5(a) and 3.5(b) show the electrostatic potential in the fin region (X-Y

plane) of an SG-mode n-FinFET under on-state (VGFS = VGBS = 1V,VDS = 1V ) and

off-state (VGFS =VGBS = 0V,VDS = 1V ) conditions, respectively. In the on-state, both

gates contribute to band-bending such that inverted regions [Fig. 3.5(c)] form be-

side both gates (and move toward the fin center as TSI decreases, due to increased

quantum confinement), leading to high drain current. In the off-state, the fin cen-

ter is most susceptible to leakage [Fig. 3.5(d)], as the potential barrier height for

electrons is higher for paths closer to either gate.

In Figs. 3.6(a)-3.6(d), the electrostatic potential and electron density in an IG-

mode n-FinFET is shown with VGBS = −0.2V . The bias on the back-gate causes an

inverted region to form predominantly near the front gate, which contributes to

the drain current in the on-state [Figs. 3.6(a), 3.6(c)], and leads to leakage paths

beside the front gate in the off-state [Figs. 3.6(b), 3.6(d)]. The peak electron density

in the off-state (which is tunable using VGBS) is over an order of magnitude smaller

in the IG mode in comparison to the SG mode, indicating that IG-mode FinFETs

have lower leakage.

Fig. 3.7 shows the dependence of drain current (IDS) on front-gate voltage (VGFS)

for an IG-mode n-FinFET with VDS = 1V and back-gate voltage (VGBS) varying from

0V to −0.3V . This suggests that IG-mode FinFETs (with a strong reverse bias on

the back-gate) can reduce leakage by up to two orders of magnitude in FinFET

standard cells in high-performance processes.

Next, we introduce Asymm-ΦG FinFETs and demonstrate that they possess

steep subthreshold characteristics that can be employed in the design of ultra-

low-leakage logic circuits in high-performance process technologies, thus reduc-

ing the need for Symm-ΦG IG-mode FinFET based back-gate biasing schemes.

Asymm-ΦG FinFETs can be formed by adjusting workfunctions on each side of

the SG-mode FinFET using selective implantation of a suitable dopant for the gate

Figure 3.6: Electrostatic potential and electron density distributions within the fin region of an IG-mode n-FinFET for on-state (VGFS = 1V, VGBS = −0.2V, VDS = 1V), and off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V) conditions

stack. This has been demonstrated for n+/p+ polysilicon gates using large-angle

tilt implants [102], [103]. If the choice of front/back-gate workfunctions is identi-

cal to that of high-performance n-/p-FinFET metal-gate workfunctions, as shown

Figure 3.7: IDS vs. VGFS for an IG-mode n-FinFET, VDS = 1V, VGBS varying from 0V to −0.3V. IOFF = IDS(VGFS = 0V) varies by 120×

Figure 3.8: Asymm-ΦG FinFET symbols: (a) a-SG-mode n-type, and (b) a-SG-mode p-type [both: ΦGF = 4.8eV, ΦGB = 4.4eV]

in Fig. 3.8, it would be favorable from a fabrication perspective. All Asymm-ΦG

FinFETs, n- or p-channel, would have both workfunctions on either side of the

fin, without the need for complicating the process with a third gate workfunction

exclusively for high-Vth devices, and high-performance SG-mode Symm-ΦG n-/p-

FinFETs would be fabricated along with them using the same gate workfunctions.

In Fig. 3.8, both n-FinFETs and p-FinFETs have 4.4eV /4.8eV workfunctions, with

the source/drain doping determining the type of majority charge carrier conduc-

tion during the on-state. Since the gates of Asymm-ΦG FinFETs are shorted, they

are also referred to as ‘a-SG-mode’ FinFETs.

Figure 3.9: Electrostatic potential and electron density distributions within the fin region of an a-SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V), and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions [front gate: ΦG = 4.8eV; back gate: ΦG = 4.4eV]

From Fig. 3.9(a), we see that during the on-state (VGFS = VGBS = 1V,VDS = 1V ),

the electrostatic potential distribution in an a-SG-mode n-FinFET approaches that

of a Symm-ΦG SG-mode n-FinFET [Fig. 3.5(a)], resulting in a reasonably high drain

current. This is also indicated by volume inversion in the fin [Fig. 3.9(c)]. In the off-

state [Figs. 3.9(b), 3.9(d)], the energy bands bend strongly near the front-gate side

(as ΦGF = 4.8eV ), thereby raising the barrier for electrons. The electrostatic poten-

tial/electron density distributions are qualitatively identical to those observed in

the Symm-ΦG IG-mode FinFETs in the off-state in Figs. 3.6(b) and 3.6(d), respec-

tively.

Figure 3.10: Energy band diagrams for (a) a-SG-mode n-FinFET, off-state (VGFS = VGBS = 0V, VDS = 1V), and (b) IG-mode n-FinFET, off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V) [axes: band energy (eV) vs. X (µm), source to drain; curves: at back-gate, at front-gate, at fin center]

From Figs. 3.10(a) and 3.10(b), we can see that the amount of band-bending near

the front gate is stronger for a-SG-mode FinFETs in comparison to the back gate in

Symm-ΦG IG-mode FinFETs (VGBS =−0.2V ), whereby the leakage current of a-SG-

mode devices is lower. Therefore, Asymm-ΦG FinFETs combine the advantages

offered by Symm-ΦG SG- and IG-mode FinFETs, with SG-mode-like ION and IG-

mode-like IOFF . Fig. 3.11 quantifies the above, showing that Symm-ΦG SG-mode

(IG-mode) n-FinFETs have 415× (15×) higher leakage current compared to a-SG-

mode devices at 300K. Similarly, Fig. 3.12 shows that Symm-ΦG SG-mode (IG-

mode) p-FinFETs have 175× (5×) higher leakage than a-SG-mode p-FinFETs.
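The leakage ratios quoted above are easier to compare on a logarithmic scale. The sketch below converts each ratio to decades and, under the assumption that subthreshold current scales as 10^(−Vth/SS), to a roughly equivalent Vth shift. The subthreshold swing of 70mV/decade is a hypothetical value chosen for illustration and is not reported in the text.

```python
import math

# Leakage ratios relative to a-SG-mode devices, as quoted in the text
LEAKAGE_RATIOS = {
    "n-FinFET, SG-mode": 415.0,
    "n-FinFET, IG-mode": 15.0,
    "p-FinFET, SG-mode": 175.0,
    "p-FinFET, IG-mode": 5.0,
}


def decades(ratio):
    """Number of orders of magnitude spanned by a leakage ratio."""
    return math.log10(ratio)


def equivalent_vth_shift_mv(ratio, swing_mv_per_decade):
    """Vth shift (mV) giving the same ratio for an assumed subthreshold swing."""
    return swing_mv_per_decade * math.log10(ratio)


ASSUMED_SWING = 70.0  # mV/decade, hypothetical
for device, ratio in LEAKAGE_RATIOS.items():
    shift = equivalent_vth_shift_mv(ratio, ASSUMED_SWING)
    print(f"{device}: {decades(ratio):.2f} decades, ~{shift:.0f} mV")
```

Read this way, the a-SG-mode advantage over SG-mode devices is a little over two and a half decades for n-FinFETs and a little over two decades for p-FinFETs.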

Figure 3.11: IDS vs. VGFS for an a-SG-mode n-FinFET (VDS = 1V), with corresponding curves for SG-mode and IG-mode n-FinFETs [axes: IDS (A, log scale) vs. VGFS (V); curves: a-SG-mode (VGBS = VGFS), IG-mode (VGBS = −0.2V), SG-mode (VGBS = VGFS); plot annotations: 15X, 415X, 68% ION, 26% ION (ION reduction w.r.t. SG-mode)]

Figure 3.12: IDS vs. VGFS for an a-SG-mode p-FinFET (|VDS| = 1V), with corresponding curves for SG-mode and IG-mode p-FinFETs [axes: IDS (A, log scale) vs. VGFS (V); curves: SG-mode (VGBS = VGFS), IG-mode (VGBS = 0.2V), a-SG-mode (VGBS = VGFS); plot annotations: 175X, 5X, 41% ION, 88% ION]

Effect of device parameter variations

We also investigated the effect of parametric variations in LG, TSI , and LUN on

ION and IOFF . Fig. 3.13(a) shows that in SG-/a-SG-mode and IG-mode FinFETs,

ION decreases almost linearly with an increase in LG. ION increases linearly with an

increase in TSI in Fig. 3.13(b), with higher slopes for SG-/IG-mode FETs in compar-

ison to a-SG-mode FETs. Fig. 3.13(c) shows that ION in SG-mode FETs is very sen-

sitive to reduction in LUN , followed by IG-mode FETs, while a-SG-mode FETs are

relatively immune to changes in LUN . IOFF , on the other hand, is greatly affected by

all three parameters. Fig. 3.14(a) shows that IOFF in SG-/IG-mode devices has an

exp(k/LG) dependence, while a-SG-mode FETs show a stronger exp(k1/LG + k2LG)

dependence, where k, k1, and k2 are constants. Fig. 3.14(b) shows that IOFF has an

exp(k1/LG + k2LG) dependence in all cases, with different values for k1 and k2 for

each device. IOFF appears to roughly have an exp(k1·LUN^2 + k2·LUN) dependence on

LUN in all cases in Fig. 3.14(c).
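A functional form such as exp(k/LG) can be checked by a linear least-squares fit of ln(IOFF) against 1/LG. The sketch below shows the procedure for the single-constant form only. The data points are synthetic, generated from made-up constants, and are not the simulation results reported here.

```python
import math


def fit_exp_inverse(lg_values, ioff_values):
    """Least-squares fit of ln(IOFF) = c + k / LG; returns (k, c)."""
    xs = [1.0 / lg for lg in lg_values]
    ys = [math.log(i) for i in ioff_values]
    count = len(xs)
    x_mean = sum(xs) / count
    y_mean = sum(ys) / count
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = y_mean - k * x_mean
    return k, c


# Synthetic IOFF values from hypothetical constants (LG in nm)
lg_nm = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0]
k_true, c_true = 200.0, -30.0
ioff = [math.exp(c_true + k_true / lg) for lg in lg_nm]

k_fit, c_fit = fit_exp_inverse(lg_nm, ioff)
print(f"k = {k_fit:.3f} nm, c = {c_fit:.3f}")
```

The two-constant form exp(k1/LG + k2·LG) would need a second regressor (LG itself) in the same least-squares setup.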

Effect of gate-workfunction fluctuations

Since metal-gate FET Vths are linearly dependent on the gate workfunction, we

studied the effect of workfunction fluctuations on IOFF (or ILEAK) in n-FinFETs.

In [88], gate-workfunction variation is shown to be the major cause of Vth variation,

in comparison to LG and TSI , which have minor contributions. Using a quasi-Monte

Carlo (QMC) sample generator based on Sobol’s sequence [106], we performed

QMC 3D device simulations, varying ΦG for SG-/a-SG-/IG-mode n-FinFETs with

σΦG = 50meV , and limited the total sample count to 100 samples on account of the

prohibitively large runtimes for 3D device simulation. While conventional Monte

Carlo methods suffer from the sample clustering problem, QMC methods based

on low discrepancy sequences [107] sample the design space uniformly, leading to

much quicker convergence with fewer samples. In Fig. 3.15, the ILEAK distribu-

Figure 3.13: ION characteristics vs. variations in LG, TSI, and LUN: (a) ION vs. LG, (b) ION vs. TSI, (c) ION vs. LUN [axes: ION (A) vs. parameter (µm); curves: SG-mode, a-SG-mode, IG-mode]

tions are shown, where a-SG-mode devices have lower/comparable spreads with

respect to SG-/IG-mode FinFETs. The above investigation into parameteric de-

pendencies with respect to LG, TSI , and LUN , and variation analysis based on gate-

workfunction fluctuations suggests that a-SG-mode FinFETs are likely to be very

robust to process variations.
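The sampling step can be sketched as follows. This is a simplified stand-in rather than the generator used in our flow: it replaces Sobol's sequence [106] with the one-dimensional base-2 van der Corput sequence, and maps the low-discrepancy points to Gaussian workfunction values through the inverse normal CDF. The nominal workfunction of 4.5 eV and all function names are hypothetical placeholders; only σΦG = 50 meV and the sample count of 100 come from the text.

```python
from statistics import NormalDist, mean, pstdev

def van_der_corput(index, base=2):
    """Radical inverse of a positive integer; the result lies strictly in (0, 1)."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def workfunction_samples(phi_nominal_ev, sigma_ev=0.050, count=100):
    """Map low-discrepancy points to Gaussian gate-workfunction samples (eV)."""
    dist = NormalDist(mu=phi_nominal_ev, sigma=sigma_ev)
    # Start at index 1 so that the point 0 (where inv_cdf is undefined) is skipped.
    return [dist.inv_cdf(van_der_corput(i)) for i in range(1, count + 1)]

PHI_NOMINAL_EV = 4.5  # hypothetical nominal workfunction, for illustration only
samples = workfunction_samples(PHI_NOMINAL_EV)
print(f"{len(samples)} samples, mean = {mean(samples):.4f} eV, "
      f"sigma = {pstdev(samples):.4f} eV")
```

Each sample would then parameterize one 3D device simulation; a multi-parameter study would need a genuinely multi-dimensional sequence instead of this one-dimensional stand-in.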

Effect of temperature on leakage

Figs. 3.16(a) and 3.16(b) capture the variation in IOFF for SG-mode and IG-mode

FinFETs with temperature varying between 280K and 400K. IG-mode FinFETs reg-

ister a change of 200× in IOFF , while SG-mode FinFETs display a change of 70×.

Figure 3.14: IOFF characteristics vs. variations in LG, TSI, and LUN: (a) IOFF vs. LG, (b) IOFF vs. TSI, (c) IOFF vs. LUN. [Plots not reproduced; x-axes in µm; y-axes LG·log10(IOFF/1nA), TSI·log10(IOFF/1nA), and log10(IOFF/1nA), respectively.]

This suggests that the ILEAK advantage in topologies having a mix of SG- and IG-

mode FinFETs would lessen relative to those having only SG-mode FinFETs, with

an increase in temperature. Fig. 3.17 shows that even with a 100K increase in tem-

perature, a-SG-mode devices have two (one) orders of magnitude lower IOFF than

Symm-ΦG SG-mode (IG-mode) FinFETs. However, the IOFF advantage of IG-mode

and a-SG-mode over SG-mode reduces by ∼ 2× (from 18×→ 8×) and ∼ 6× (from

640×→ 104×), respectively.
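The ratios quoted above follow directly from the gaps annotated in Fig. 3.17. A small Python check of that arithmetic is given below; the function and variable names are our own illustrative choices.

```python
def advantage_reduction(ratio_low_t, ratio_high_t):
    """Factor by which an IOFF advantage shrinks between two temperatures."""
    return ratio_low_t / ratio_high_t

ig_vs_sg = advantage_reduction(18.0, 8.0)       # IG-mode advantage over SG-mode
asg_vs_sg = advantage_reduction(640.0, 104.0)   # a-SG-mode advantage over SG-mode
asg_vs_ig_hot = 104.0 / 8.0                     # a-SG-mode vs. IG-mode at high temperature

print(f"IG-mode advantage shrinks by {ig_vs_sg:.2f}x")
print(f"a-SG-mode advantage shrinks by {asg_vs_sg:.2f}x")
print(f"a-SG-mode IOFF is {asg_vs_ig_hot:.0f}x below IG-mode at high temperature")
```

The last figure is consistent with the one order of magnitude advantage over IG-mode FinFETs stated in the text.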

Figure 3.15: ILEAK distribution for a-SG-/SG-/IG-mode n-FinFETs under gate-workfunction fluctuations, σΦG = 50meV

Figure 3.16: IDS vs. VGFS for an n-FinFET at different temperatures: (a) SG-mode, (b) IG-mode (VGBS = −0.2V)

3.1.4 Symmetric-ΦG and asymmetric-ΦG FinFET logic gates

A significant problem with logic circuits implemented in high-performance pro-

cess technologies is the relatively high leakage current that is concomitant with

the high on-state current. Hence, circuit topologies with low-leakage that do not

compromise on performance constitute the optimal design points. In this section, we explore the design space of Symm-ΦG FinFET INV and NAND2 gates in detail to determine the most versatile topologies that can arise by mixing Symm-ΦG SG- and IG-mode FinFETs.

Figure 3.17: IOFF vs. temperature for an a-SG-mode n-FinFET with corresponding curves for SG-mode and IG-mode (VGBS = −0.2V) n-FinFETs. [Plot not reproduced; x-axis temperature (K), 280 to 400; y-axis IOFF (A), log scale; annotated gaps 640X, 104X, 18X, and 8X.]

3D versus 2D device simulation

Owing to the prohibitively high computational costs involved in single-FET 3D transport simulations, mixed-mode 3D device simulations for FinFET circuits are intractable in practical timeframes. Also, transient simulations, which are necessary to capture logic element delays, are extremely cumbersome to perform via 3D device simulation. For these reasons, device simulations on a 2D structure (corresponding to a slice of the 3D FinFET device) are used hereafter. Since 2D simulations do not fully capture all physical effects (e.g., corner effect [1]) on carrier transport, we computed the fractional error in the drain current, (IDS,2D − IDS,3D)/IDS,2D, vs. VGFS from 2D/3D device simulations [Fig. 3.18]. In general, 2D device simulation overpredicts IOFF and underestimates ION with respect to corresponding 3D simulations. Also, a-SG-mode devices have relatively larger differences between 2D and 3D simulations in the sub-threshold regime, in comparison to SG-/IG-mode devices. Overall, the IOFF and ION predictions differ only marginally across all FETs (within 25% for IOFF and 12.6% for ION), suggesting that reasonably accurate comparisons can be made with mixed-mode 2D device circuit simulations.
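The error metric above can be written down in a few lines. The sketch below only illustrates the sign convention: a positive value means the 2D simulation overpredicts the current, a negative value means it underestimates it. The drain currents are hypothetical numbers chosen for illustration and are not taken from Fig. 3.18.

```python
def fractional_error(ids_2d, ids_3d):
    """Return (IDS,2D - IDS,3D) / IDS,2D for one bias point."""
    if ids_2d == 0:
        raise ValueError("IDS,2D must be non-zero")
    return (ids_2d - ids_3d) / ids_2d

# Hypothetical drain currents (A) at an off-state and an on-state bias point.
off_error = fractional_error(ids_2d=4.0e-9, ids_3d=3.0e-9)
on_error = fractional_error(ids_2d=8.0e-5, ids_3d=9.0e-5)

print(f"off-state error: {off_error:+.1%}")  # positive: 2D overpredicts IOFF
print(f"on-state error:  {on_error:+.1%}")   # negative: 2D underestimates ION
```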

Figure 3.18: Fractional error in IDS vs. VGFS for 2D/3D device simulations. [Plot not reproduced; x-axis VGFS (V), 0 to 1; y-axis (IDS,2D − IDS,3D)/IDS,2D; curves for SG-, IG-, and a-SG-mode; annotated values 25%, 4.4%, 19%, −12.6%, and −4.5%.]

Symm-ΦG and Asymm-ΦG logic gates

Fig. 3.19 shows four possible INV configurations with SG-/IG-mode FinFETs: SG-, low-power (LP-) [94], IGn-, and IGp-INV. The SG-INV configuration has only

SG-mode n-/p-FinFETs with a highly compact layout, as shown in Fig. 3.20(a).

In the LP-INV configuration [Fig. 3.19(b)], the back-gate of PA (NA) in the pull-

up (pull-down) network is biased to VHIGH (VLOW ), necessitating a complex layout

[Fig. 3.20(b)] with 36% larger area than size X2 SG-INV, while IGn-INV [Fig. 3.20(c)]

and IGp-INV [Fig. 3.20(d)] occupy the same area as LP-INV, owing to the multi-

fin IG-mode FinFET back-gate contacts [108]. Amongst NAND2 gates [Figs. 3.21,

3.22], while SG-NAND2 has the most compact layout [Fig. 3.23(a)], LP-NAND2

[Fig. 3.23(b)] occupies 27% more area than size X2 SG-NAND2, with a staggered

pull-up network of parallel FinFETs, and shared back-gate contacts for the se-

ries pull-down FinFETs. Mixed-terminal (MT-) NAND2 [109] is identical to LP-

NAND2 in area, with NB in SG mode [Fig. 3.21(c)]. IG- and IG2-NAND2 combine

the parallel FinFETs of the pull-up network into a single p-FinFET, whereby the

layout area is the same as SG-NAND2. XT-NAND2 is a variant of MT-NAND2,

with both FinFETs of the pull-down network in SG mode and identical layout

area (not shown). XT2-NAND2 is also a variant of MT-NAND2, with both par-

allel FinFETs of the pull-up network in SG mode, which enables a compact layout

[Fig. 3.23(c)] with the same area as SG-NAND2.

Figure 3.19: INV gates: (a) SG, (b) LP, (c) IGn, and (d) IGp

Figure 3.20: INV layouts: (a) SG (size X2), (b) LP (size X1), (c) IGn (size X1), and (d) IGp (size X1)

Figure 3.21: NAND2 gates: (a) SG, (b) LP, and (c) MT

Figure 3.22: NAND2 gates: (a) IG, (b) IG2, (c) XT, and (d) XT2

Figs. 3.24(a) and 3.24(b) show Asymm-ΦG SG-mode FinFET INV and NAND2

gates, respectively. (Note that any Symm-ΦG FinFET logic gate schematic/layout

can be converted to the corresponding Asymm-ΦG version by replacing the de-

vices, with no layout overheads). For generalized pull-up and pull-down net-

works, it is possible to mix Asymm-ΦG FinFETs for leakage reduction with Symm-

ΦG FinFETs for speed. This strategy was applied to the NAND2 gate to yield the

NAND2S gate shown in Fig. 3.24(c).

Leakage-delay characteristics: Symm-ΦG logic

In Table 3.2 and Fig. 3.25, the leakage-delay characteristics of the Symm-ΦG FinFET

INV standard cells are shown. The leakage current, ILEAK , is an average over all

input vectors and delay, tp, is the fanout-of-four (FO4) delay. All comparisons

below are drawn with respect to SG-INV (size X2), as it is the largest single finger

SG-INV that can be accommodated for the chosen standard cell height.

Figure 3.23: NAND2 layouts: (a) SG (size X2), (b) LP (size X1), and (c) XT2 (size X1)

Figure 3.24: Asymm-ΦG SG-mode FinFET gates: (a) a-SG-INV, (b) a-SG-NAND2, and (c) a-SG-NAND2S

In Fig. 3.25, VHIGH and VLOW are varied (if permitted by the topology), in order to

sweep the design space. SG-INV (size X2) has the smallest delay tp = 3.31 ps, with

the largest average ILEAK [SG-INV (size X1) was found to have tp = 5.75 ps]. LP-INV

shows over an order of magnitude reduction in mean ILEAK with a 267% (111%)

increase in tp with respect to SG-INV size X2 (size X1). From Fig. 3.25, it is clear that

the dominant factor affecting tp, for the current choice of ΦGn and ΦGp, is VHIGH .

For IGp-INV, lowering VHIGH increases tp and only marginally reduces average

ILEAK . For LP-INV, varying VLOW (with VHIGH = 1.2V ) presents a lower slope on the

leakage-delay plot in comparison to varying VHIGH (with VLOW = −0.2V ), which

reaffirms the above. IGn-INV appears to provide the best leakage-delay tradeoff, with up to an order of magnitude reduction in average ILEAK at the cost of a 66% increase in tp with respect to SG-INV (size X2) and marginally better tp than SG-INV (size X1).

Table 3.2: Standard cell FinFET INV characteristics, VLOW = −0.2V, VHIGH = 1.2V

Topology             SG      LP      IGn     IGp
Area (w.r.t. SG)     1       1.36    1.36    1.36
Avg. ILEAK (nA)      2.51    0.12    0.33    2.31
tp (ps)              3.31    12.15   5.55    9.66

Figure 3.25: Leakage-delay spectrum for FinFET INV configurations
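The LP-INV figures quoted above can be reproduced from Table 3.2 and the SG-INV (size X1) delay given in the text. The short Python check below is only a worked example of that arithmetic; the helper name is our own.

```python
def percent_increase(new, reference):
    """Relative increase of `new` over `reference`, in percent."""
    return 100.0 * (new - reference) / reference

TP_SG_X2_PS = 3.31    # SG-INV (size X2), Table 3.2
TP_SG_X1_PS = 5.75    # SG-INV (size X1), quoted in the text
TP_LP_PS = 12.15      # LP-INV, Table 3.2
ILEAK_SG_NA = 2.51    # SG-INV average leakage, Table 3.2
ILEAK_LP_NA = 0.12    # LP-INV average leakage, Table 3.2

lp_vs_x2 = percent_increase(TP_LP_PS, TP_SG_X2_PS)
lp_vs_x1 = percent_increase(TP_LP_PS, TP_SG_X1_PS)
lp_leakage_gain = ILEAK_SG_NA / ILEAK_LP_NA

print(f"LP-INV delay increase vs. SG-INV X2: {lp_vs_x2:.0f}%")
print(f"LP-INV delay increase vs. SG-INV X1: {lp_vs_x1:.0f}%")
print(f"LP-INV leakage reduction: {lp_leakage_gain:.1f}x")
```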

Table 3.3: Standard cell FinFET NAND2 characteristics

Topology              SG      LP      MT      IG      IG2     XT      XT2
Area (w.r.t. SG)      1       1.27    1.27    1       1       1.27    1
Avg. ILEAK (nA)       2.76    0.15    1.05    2.76    1.16    2.72    1.16
tp(Toggle A) (ps)     5.47    22.60   20.80   8.77    11.40   17.50   8.04
tp(Toggle B) (ps)     5.07    22.82   19.66   8.56    10.26   18.17   7.01
tp(Toggle AB) (ps)    4.41    15.33   13.66   4.41    6.85    10.50   6.85

Figure 3.26: Leakage-delay spectrum for FinFET NAND2 configurations

In Table 3.3 and Fig. 3.26, the leakage-delay spectrum for the various FinFET NAND2 gates is shown. All comparisons below are drawn with respect to SG-NAND2 (size X2), as it is the largest SG-NAND2 that can be accommodated in the

chosen standard cell height. In Fig. 3.26, LP-NAND2 (VLOW =−0.2V,VHIGH = 1.2V )

shows over an order of magnitude reduction in mean cell leakage with around

4× higher tp in comparison to SG-NAND2. As with the INV cases, varying VHIGH

presents a steep slope in the leakage-delay plot for our choice of ΦGp and ΦGn,

suggesting that pull-up FinFETs should be in SG mode. This is also seen for XT-

NAND2 and MT-NAND2 gates, where varying VHIGH only increases delay and

does not decrease the average ILEAK . IG-NAND2 does not gain in average ILEAK

in spite of combining the parallel pull-up FinFETs into a single p-FinFET. Instead,

the rising delay, tpLH , degrades, which increases tp. IG2-NAND2 has a larger tp

compared to IG-NAND2 over the entire spectrum of VLOW variation due to higher

falling delay, tpHL (owing to a slower pull-down stack). However, decreasing VLOW

enables over 50% reduction in average ILEAK . XT2-NAND2 presents a similar

tradeoff in average ILEAK reduction, with the benefit of lower tpLH , owing to a fast,

parallel SG-mode pull-up. Overall, XT2-NAND2 lies closest to SG-NAND2 in the

leakage-delay spectrum, offering the best way to leverage back-gate biasing to re-

duce average ILEAK , without a significant degradation in delay.
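One way to read Table 3.3 is to ask which topologies are not outperformed in both leakage and delay by some other topology. The sketch below computes that non-dominated (Pareto) set from the average ILEAK and tp(Toggle A) columns only. This is an illustrative reading and not the procedure used in our analysis: it ignores area and the Toggle B and Toggle AB delays, and it uses only the nominal bias point of the table.

```python
# topology: (average ILEAK in nA, tp for Toggle A in ps), values from Table 3.3
NAND2_CELLS = {
    "SG": (2.76, 5.47), "LP": (0.15, 22.60), "MT": (1.05, 20.80),
    "IG": (2.76, 8.77), "IG2": (1.16, 11.40), "XT": (2.72, 17.50),
    "XT2": (1.16, 8.04),
}

def dominates(a, b):
    """True if a is no worse than b in both metrics and differs from b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def pareto_front(cells):
    """Names of the cells that no other cell dominates, in sorted order."""
    return sorted(
        name for name, point in cells.items()
        if not any(dominates(other, point) for other in cells.values())
    )

front = pareto_front(NAND2_CELLS)
print("Non-dominated NAND2 topologies:", front)
```

Under these assumptions XT2-NAND2 remains on the front next to SG-NAND2, which agrees with the observation that it lies closest to SG-NAND2 in the leakage-delay spectrum.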

Figure 3.27: SG-NAND2 transient characteristics: (a) VA, VB, VOUT, and VINT vs. time for Toggle A and Toggle B; (b) VGFS of FinFET NA vs. time for Toggle A (B=1) and Toggle B (A=1). Input rise time has been increased to 50ps from 10ps to improve visibility.

We see from Table 3.3 that unlike traditional planar bulk CMOS NAND2 gates,

tp(Toggle A) ≥ tp(Toggle B) for many of the FinFET logic styles [e.g., Figs. 3.27(a)

and 3.28(a)]. This is dependent on the input slew rate, intermediate node capaci-

tance (CINT )/node voltage (VINT ) of the pull-down stack, output load capacitance

(COUT ) and modes of FinFET operation in the logic gate. In Figs. 3.27(a) and 3.27(b),

the transient behavior of SG-NAND2 is shown, with VGFS across FinFET NA ris-

ing slightly faster for the Toggle B condition in comparison to Toggle A. Hence,

tp(Toggle A) > tp(Toggle B). This is exacerbated in XT2-NAND2 [Figs. 3.28(a) and

3.28(b)] as VINT does not rise to VDD when VOUT =VA =VDD, owing to the IG-mode

FinFET NA, which loses gate drive very quickly when VINT increases, and VGFS is

non-zero in the DC condition. The latter, along with the fact that CINT ≪ COUT (CINT

mainly consists of source/drain-body depletion capacitances, which are negligible

in FinFETs) helps VGFS develop very quickly across NA in the Toggle B condition in

comparison to Toggle A [Fig. 3.28(b)].
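The observation that tp(Toggle A) ≥ tp(Toggle B) holds for many, but not all, of the logic styles can be checked directly against Table 3.3. The snippet below is a simple tabulation in Python; the variable names are our own.

```python
# topology: (tp Toggle A, tp Toggle B) in ps, values from Table 3.3
TP_PS = {
    "SG": (5.47, 5.07), "LP": (22.60, 22.82), "MT": (20.80, 19.66),
    "IG": (8.77, 8.56), "IG2": (11.40, 10.26), "XT": (17.50, 18.17),
    "XT2": (8.04, 7.01),
}

slower_on_a = [name for name, (tp_a, tp_b) in TP_PS.items() if tp_a >= tp_b]
slower_on_b = [name for name, (tp_a, tp_b) in TP_PS.items() if tp_a < tp_b]

print("tp(Toggle A) >= tp(Toggle B):", slower_on_a)
print("tp(Toggle A) <  tp(Toggle B):", slower_on_b)
```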

Figure 3.28: XT2-NAND2 transient characteristics: (a) VA, VB, VOUT, and VINT vs. time for Toggle A and Toggle B; (b) VGFS (V) of FinFET NA vs. time for Toggle A (B=1) and Toggle B (A=1). Input rise time has been increased to 50ps from 10ps to improve visibility.

From the above analysis, introducing a single IG-mode n-FinFET in the pull-

down series stack with only SG-mode p-FinFETs in the pull-up network, as with

XT2-NAND2, appears to be the best method to leverage the leakage-delay tradeoff

using back-gate biasing in high-performance Symm-ΦG FinFET standard cells.

Leakage-delay characteristics: Asymm-ΦG logic

Fig. 3.29 shows the leakage-delay characteristics of the Asymm-ΦG gates com-

pared to their corresponding Symm-ΦG SG-mode counterparts as well as IGn-INV

and XT2-NAND2 gates, which were the best Symm-ΦG gates. a-SG-INV gates

are 60% slower than their SG-INV counterparts, with average leakage that is 238×

lower, while a-SG-NAND2 gates are 65% slower than SG-NAND2 gates, with 235×

lower leakage. (a-SG-NOR2, a-SG-XOR2, a-SG-XNOR2) gates had (234×, 206×,

234×) lower average leakage compared to (SG-NOR2, SG-XOR2, SG-XNOR2) with

(34%, 20%, 10%) higher delay, respectively. The NAND2S gate, which introduces a

Symm-ΦG SG-mode n-FinFET to reduce delay, has SG-NAND2-like leakage for the

‘10’ vector, thereby increasing overall average ILEAK . From Fig. 3.29, it is also clear

that the best mixed SG-/IG-mode configurations like IGn-INV and XT2-NAND2 are not as well placed as their a-SG-mode counterparts in the leakage-delay spectrum.

Figure 3.29: Leakage-delay spectrum for Asymm-ΦG FinFET logic gates. [Plot not reproduced; x-axis average ILEAK (A), log scale; y-axis average FO4 delay (s); points for SG-INV, SG-NAND2, SG-NOR2, SG-XOR2, SG-XNOR2, IGn-INV, XT2-NAND2, a-SG-INV, a-SG-NAND2, a-SG-NAND2S, a-SG-NOR2, a-SG-XOR2, and a-SG-XNOR2.]

Effect of temperature on leakage

From Figs. 3.16(a) and 3.16(b), we can see that IG-mode FinFETs have a larger

fractional change in leakage current with increasing temperature. This is reflected

in logic gates as well [Fig. 3.30(a)], where the average ILEAK gap between SG- and

LP-INV decreases from 35× at 280K to 12.5× at 400K.

Fig. 3.30(b) reiterates the above observation, where the average ILEAK fractional

gap between SG-NAND2 and LP-NAND2 decreases from 22× at 280K to 13× at

400K. IG2-, MT- and XT2-NAND2 show similar trends with a 2.7× to 2.5× reduc-

tion in the average ILEAK gap. a-SG-mode devices, which display excellent leakage

behavior with an increase in temperature [Fig. 3.17], translate their benefits to a-SG-mode logic gates as well, with an order of magnitude lower average ILEAK in (a-SG-INV, a-SG-NAND2) with respect to (SG-INV, SG-NAND2) and (IGn-INV, XT2-NAND2) gates (not shown).

Figure 3.30: Average leakage (ILEAK) vs. temperature for FinFET INV and NAND2 standard cells: (a) Symm-ΦG INV, (b) Symm-ΦG NAND2. [Plots not reproduced; x-axis temperature (K), 280 to 400; y-axis average ILEAK (A), log scale.]

3.1.5 Symmetric-ΦG and asymmetric-ΦG FinFET latches and flip-flops

Next, we investigate simple latches and flip-flops that leverage combinations of

Symm-ΦG and Asymm-ΦG FinFETs, using insights from earlier sections. We mod-

ified four template configurations, namely, the brute-force transmission gate [TGL,

Fig. 3.31(a)] and half-swing clocked FinFET latches [HSL, Fig. 3.31(b)], and the

corresponding flip-flops [TGF, Fig. 3.32; HSF, Fig. 3.33], in order to demonstrate

the importance of choosing the appropriate kinds of FinFETs to optimize leakage,

propagation delay, and setup time.

Tables 3.4 and 3.5 show the various possible cases of interest for TGL, TGF, HSL,

and HSF using SG-, a-SG-, and IG-mode FinFETs along with their fin counts. TGL1

and TGF1 have only SG-mode FinFETs, which necessitates a larger I1 inverter in

order to overcome I3 and force the data into the cross-coupled inverter configuration. TGL2 and TGF2 employ a-SG-mode FinFETs to implement a weaker I3 inverter, hence permitting a smaller I1 inverter. By replacing I1/I2 with a-SG-mode FinFETs as well, TGL3 and TGF3 push the limits of operation. TGL4 and TGF4 use IG-mode FinFETs (with n-FinFET back-gate tied to ground and p-FinFET back-gate tied to VDD) to weaken I3.

Figure 3.31: FinFET latch templates: (a) TG latch (TGL), (b) HS latch (HSL)

Figure 3.32: TG flip-flop (TGF) template

Figure 3.33: HS flip-flop (HSF) template

Table 3.4: TG latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, T2 = SG(1P1N)

Case                  I1/I2        I3      T1      I4      I5      I6
TGL1                  SG           SG      SG      -       -       -
  fins                4P2N/2P1N    1P1N    2P2N    -       -       -
TGL2                  SG           a-SG    SG      -       -       -
TGL3                  a-SG         a-SG    SG      -       -       -
TGL4                  SG           IG      SG      -       -       -
  fins (TGL2-TGL4)    2P1N/2P1N    1P1N    1P1N    -       -       -
TGF1                  SG           SG      SG      SG      SG      SG
  fins                4P2N/2P1N    1P1N    2P2N    1P1N    1P1N    2P1N
TGF2                  SG           a-SG    SG      SG      a-SG    SG
TGF3                  a-SG         a-SG    SG      a-SG    a-SG    a-SG
TGF4                  SG           IG      SG      IG      IG      SG
  fins (TGF2-TGF4)    2P1N/2P1N    1P1N    1P1N    1P1N    1P1N    2P1N

Table 3.5: HS latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, N1/N3/N7 = SG(2N), I5 = SG(2P1N)

Case                  I1/I2        N2/N4    I3/I4        N5/N6
HSL1                  SG           SG       -            -
HSL2                  SG           a-SG     -            -
HSL3                  a-SG         SG       -            -
HSL4                  a-SG         a-SG     -            -
HSL5                  IG           SG       -            -
HSL6                  IG           a-SG     -            -
  fins (HSL1-HSL6)    1P1N/1P1N    2N/2N    -            -
HSF1                  SG           SG       SG           SG
HSF2                  SG           a-SG     SG           a-SG
HSF3                  a-SG         a-SG     a-SG         a-SG
HSF4                  a-SG         SG       a-SG         SG
HSF5                  IG           SG       IG           SG
HSF6                  IG           a-SG     IG           a-SG
  fins (HSF1-HSF6)    1P1N/1P1N    2N/2N    1P1N/1P1N    2N/2N

For the HS latches and flip-flops, HSL1 and HSF1 constitute the base cases with

only SG-mode FinFETs. A half-swing clock is employed, which toggles between 0

and VDD/2, thereby reducing dynamic clock power dissipation considerably. How-

ever, the switched clock load capacitance doubles, as N1-N7 are sized-up to two

fins to be able to flip the cross-coupled inverters. Therefore, the effective clock

power dissipation is halved with respect to TG configurations using T1/T2 gates

with single-fin FinFETs. (HSL2, HSL3, HSL4) and (HSF2, HSF3, HSF4) introduce a-

SG-mode FinFETs at all possible locations except N1, N3, and N7, which are driven

by the half-swing clock. HSL5 and HSF5 use IG-mode FinFETs (with n-FinFET

back gate tied to ground and p-FinFET back gate tied to VDD) for I1/I2 and I3/I4.

This carries over to HSL6 and HSF6 as well, however, N2/N4/N5/N6 are a-SG-

mode FinFETs. With respect to layout area, all versions of TGL occupy the same

area with standard cell height consisting of four fins for the p-FinFETs and two

fins for the n-FinFETs. The same is true for all versions of TGF, HSL, and HSF. Both

TGFs and HSFs are negative edge-triggered, as shown in Figs. 3.34(a) and 3.34(b).
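The clock-power argument for the half-swing configurations can be made explicit with a small calculation. The sketch below assumes that dynamic clock power scales as C·Vswing²·f, i.e., that the half-swing clock is driven from a supply equal to its swing, and that the clock frequency is unchanged; these are simplifying assumptions of the illustration, and the function name is hypothetical.

```python
def relative_clock_power(cap_ratio, swing_ratio):
    """Clock power relative to a full-swing reference, assuming P ~ C * Vswing**2 * f."""
    return cap_ratio * swing_ratio ** 2

# Half-swing clock (0 to VDD/2) with unchanged clock load: one quarter of the power.
swing_only = relative_clock_power(cap_ratio=1.0, swing_ratio=0.5)

# Clocked FinFETs sized up to two fins: the switched clock capacitance doubles.
half_swing_two_fins = relative_clock_power(cap_ratio=2.0, swing_ratio=0.5)

print(f"half swing, same load:    {swing_only:.2f} of the TG clock power")
print(f"half swing, doubled load: {half_swing_two_fins:.2f} of the TG clock power")
```

Under these assumptions, the doubled load turns the fourfold saving from the reduced swing into the twofold saving stated above.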

For TGL and TGF configurations, when the clock is high, data value D is forced

into I2/I3 through T1, while T2 is off and I4/I5 are in the hold mode. When the

clock goes low, T1 shuts off and T2 forces the value at the output of I2 into I4/I5

for TGF. In HSL (HSF) configurations, when both clock and D are high, QB (INB)

is pulled low, forcing Q (IN) high. For HSF, when the clock goes low, N7 is active,

and depending on the polarity of IN and INB, Q is pulled either low or high.

Table 3.6 shows the hold static noise margins of the cross-coupled inverter pairs

used in Tables 3.4 and 3.5. a-SG (1P1N) outperforms the rest of the configurations,

including IG (1P1N) suggesting that a-SG-mode FinFETs are ideal for keeper in-

verters in latches/flip-flops as well.

Table 3.6: Hold static noise margins, xPyN = x-fin p-FinFET, y-fin n-FinFET

INV1              INV2              SNM (mV)
(SG, 2P1N)        (SG, 1P1N)        310
(SG, 1P1N)        (SG, 1P1N)        315
(SG, 2P1N)        (IG, 1P1N)        325
(a-SG, 2P1N)      (SG, 1P1N)        320
(a-SG, 2P1N)      (a-SG, 1P1N)      375
(a-SG, 1P1N)      (a-SG, 1P1N)      400
(IG, 1P1N)        (IG, 1P1N)        375

Figure 3.34: Transient simulations of TGF1 and HSF1: (a) TGF1, (b) HSF1. [Plots not reproduced; x-axis time (s), y-axis voltage (V); traces CLK, D, IN, INB, Q, and QB.]

Quasistationary/DC simulations were used to measure average leakage over all possible legal combinations of input/output vectors and internal states. From Fig. 3.35, TGL3, which employs a-SG-mode FinFETs (except for T1), can be seen to have over 10× lower leakage than TGL1. Similarly, HSL4, with mostly a-SG-mode FinFETs, has nearly 3× lower leakage compared to HSL1. From Fig. 3.36, TGF3 and HSF3 can be seen to follow similar trends. The introduction of IG-mode FinFETs results in a marginal reduction in average leakage in (TGL4, TGF4), (HSL5, HSF5), and (HSL6, HSF6).

Propagation delay was averaged for 1→ 0 and 0→ 1 transitions, assuming an

output load of four size-X1 SG-INVs for both latches and flip-flops. From Fig. 3.37,

TGL3 can be seen to have nearly 2× larger delay compared to TGL1 owing to the

weaker a-SG-mode FinFETs. Similar observations hold good for (HSL1, HSL2) and

(HSL3-HSL6). However, from Fig. 3.38, TGF3 and TGF1 can be seen to have almost identical delays. This is due to the fact that forcing data into I4/I5 in TGF1, when the clock is low, is harder due to the stronger SG-mode keeper FinFETs, thereby increasing the CLK→Q delay. The poor leakage-delay behavior for TGF4 suggests that IG-mode FinFETs are best suited for the weaker inverters (I3/I5) and should not be used for I2/I4 in TGF configurations. For (HSF3-HSF6), the introduction of IG/a-SG-mode FinFETs results in a roughly 30% increase in average propagation delay with respect to (HSF1, HSF2).

Figure 3.35: Average ILEAK for FinFET latches. [Plot not reproduced; y-axis average ILEAK (A); entries TGL1-TGL4 and HSL1-HSL6.]

Figure 3.36: Average ILEAK for FinFET flip-flops. [Plot not reproduced; y-axis average ILEAK (A); entries TGF1-TGF4 and HSF1-HSF6.]

Figure 3.37: Average propagation delay for FinFET latches. [Plot not reproduced; y-axis average propagation delay (s); entries TGL1-TGL4 and HSL1-HSL6.]

Figure 3.38: Average propagation delay for FinFET flip-flops. [Plot not reproduced; y-axis average propagation delay (s); entries TGF1-TGF4 and HSF1-HSF6.]

The maximum of the setup periods of legal 0→ 1 and 1→ 0 output transitions

(for corresponding input transitions before the clock edge) is reported as the setup

time for flip-flops in Fig. 3.39. TGF4 has the smallest setup time owing to the IG-

mode FinFETs, which weaken I3. TGF1 has a comparatively low setup time for

an all-SG-mode FinFET configuration, owing to the large I1 data-forcing inverter.

(TGF2, TGF3) and (HSF3, HSF4) have considerably larger setup times as they em-

ploy weaker a-SG-mode FinFETs. Similar trends were observed for latches as well.

In summary, (TGF3, HSF3), which were implemented using a combination of

a-SG-mode and SG-mode FinFETs, have the best tradeoffs in leakage, delay, and

setup time for (TG, HS) flip-flop configurations.

Figure 3.39: Setup time for FinFET flip-flops. [Plot not reproduced; y-axis setup time (s); entries TGF1-TGF4 and HSF1-HSF6.]

3.1.6 Section summary

In this section, we evaluated Symm-ΦG SG-/IG-mode FinFETs and Asymm-ΦG

SG-mode FinFETs head-to-head in a high-performance process. We also investi-

gated the design space of logic gates, latches, and flip-flops employing them in

various possible configurations, which resulted in the following key insights:

• Asymm-ΦG SG-mode FinFETs in a high-performance process provide very

steep subthreshold slopes, ultra-low off-currents, and reasonably high on-

currents in comparison to corresponding Symm-ΦG SG-/IG-mode FinFETs,

and maintain their advantage at high temperature. This suggests that they

could be widely used (in combination with Symm-ΦG SG-mode FinFETs

when necessary) in off-critical paths, with the same layout as Symm-ΦG

SG-mode devices and without the routing and process related problems of

integrating IG-mode back-gate biased devices.

• While it is possible to trade off leakage vs. delay using IG-mode FinFETs,

indiscriminate use of back-gate biasing could impact area, performance, and

leakage as IG-mode devices need extra area to land back-gate contacts and

have degraded subthreshold slopes. In this regard, using a single IG-mode

device at the top of a series stack is sufficient to reduce leakage considerably

without too much degradation in delay.

3.2 Fault models for logic circuits in the multi-gate era

In this section, we delve into the problem of developing fault models for FinFET

logic circuits.

3.2.1 Introduction

Fault modeling [110] for planar single-gate CMOS is an extensively researched

area, and comprehensive fault models have been established at various abstraction

levels. Bridging [111], stuck-at [112], delay [113] and stuck-open [114] faults are

among the most widely used fault models for CMOS. However, on account of the

double-gate configuration, it is unclear if CMOS fault models can comprehensively

model defects in FinFET circuits. Here, the important questions that need to be

investigated are: (i) how do FinFET logic gates behave in the presence of defects

like opens and shorts, and (ii) are CMOS fault models adequate for covering all

defects in FinFET logic gates?

To the best of our knowledge, this is the first work on fault modeling for Fin-

FET circuits that considers defects in both SG- and IG-mode FinFETs, including

open defects on the back gate, which are unique to IG-mode FinFETs. The main

contributions of this section can be summarized as follows:

• We model opens (cuts) and shorts in FinFET gates with SG- and IG-mode

devices using mixed-mode device simulation in Synopsys Sentaurus TCAD

[81], driven through FinE [104], a double-gate circuit design environment.

• In the case of a floating back-gate node due to an open defect, we show that a

combination of models is needed to account for the observed leakage-delay

trends of the logic gates, taking into account the dominant capacitances that

couple to the back gate.

• In the regime dominated by stray coupling capacitances, pulse-broadening

(pulse-shrinking) occurs for a wide range of back-gate voltages in n-FinFET

(p-FinFET) back-gate cuts.

• While pulse-shrinking occurs for a majority of cases in the regime domi-

nated by strong coupling capacitances to the front gate/drain/source re-

gions, pulse-broadening and stuck-at conditions may also be manifested de-

pending on the logic gate topology.

The rest of the section is organized as follows. We present related fault modeling

work in Section 3.2.2. In Section 3.2.3, we discuss the FinFET inverter (INV) and

NAND gates from a testing perspective. We demonstrate the need for new FinFET

fault models and explore them in Section 3.2.4. Finally, Section 3.2.5 presents the

section summary.

3.2.2 Related work

Fault modeling is the process of developing models of physical defects at higher

levels of abstraction. For CMOS circuits, it is estimated that around 80% of physical

defects can be detected using the stuck-at fault model [115]. With scaling of

technology, testing for bridging and delay faults becomes critical. Shrinking ge-

ometries lead to greater chances of bridging between disjoint conductive regions,

owing to mask and lithographic imperfections. Process variations greatly affect

the Vth spread of FETs, which directly affects drain current, leading to delay faults

on nodes with considerable capacitive loads. For specific input combinations, a

bridging fault causes a connection between supply and ground, and leads to an

abrupt change in the supply current in the steady state. This behavior can be de-

tected by monitoring the supply current through IDDQ test [116], possibly using an

array of on-chip current sensors [117].

Currently, there are no comprehensive fault models for FinFET circuits.

Vazquez et al. [118] showed that the hold time for stuck-open faults decreases

dramatically on account of increased sub-threshold leakage and gate leakage in

nanoscale FETs, including FinFETs. However, the study characterizes only stuck-

open faults in SG-mode FinFET circuits and defects in IG-mode FinFETs were not

considered. While bridging, stuck-at, stuck-open, and delay faults cover most of

the defects in CMOS gates, it is unclear if they comprehensively map defects in

FinFET logic gates as well, which is the focus of the current investigation. The

back-gate bias plays an important role in determining the device Vth of IG-mode

FinFETs, which is a significant departure from planar single-gate CMOS, as it can

lead to scenarios such as open defects on the back gate, with the intended signal

at the front gate. Therefore, it is expected that fault models for IG-mode FinFETs

(and logic gates employing them) cannot be oblivious to device/layout parasitics,

unlike most fault models for planar CMOS. Preliminary results capturing the

effect of defects manifested as cuts on the back gate were presented in [119, 120],

indicating a formidable challenge towards the development of a reliable fault

model. In the subsequent sections, we deal with different flavors of FinFET logic

gates and show that a hybrid combination of models is essential to capture the

effect of back-gate cuts.

3.2.3 FinFET logic gates

We used the FinE simulation framework (Fig. 3.1) to perform all the experiments

in this work, using device parameters specified in Table 3.1. The on-state currents

for SG- and IG-mode n-/p-FinFETs are presented in Table 3.7. While we cover SG-

/LP-mode FinFET INV and NAND2 gates in this work, the methodology outlined

in later sections is applicable to other mixed configurations discussed in Section

3.1.

From the perspective of testing, the observable metrics of interest are delay and

static leakage power consumption. To obtain the gate delay (tgate), the low-to-high

transition delay (tpLH) and high-to-low transition delay (tpHL) were measured from

the 50% transition of the input to 50% transition of the output, and tgate was set to

max(tpLH , tpHL). For transient simulations, the rise and fall times of the input signal

were set to 10ps, and each logic gate had a fanout of four SG-mode INVs. Since

leakage power consumption is input vector dependent, we report the maximum

leakage observed in each configuration.
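The delay measurement described above can be sketched in a few lines of Python. The snippet assumes piecewise-linear waveforms given as time/voltage samples and locates the 50% crossings by linear interpolation; the function names and the synthetic waveforms are illustrative assumptions and do not correspond to simulator output.

```python
def crossing_time(times, volts, level):
    """First time a piecewise-linear waveform crosses `level` (linear interpolation)."""
    for i in range(len(times) - 1):
        v0, v1 = volts[i], volts[i + 1]
        if v0 != v1 and (v0 - level) * (v1 - level) <= 0:
            return times[i] + (level - v0) * (times[i + 1] - times[i]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")

def propagation_delay(t_in, v_in, t_out, v_out, vdd=1.0):
    """Delay from the 50% point of the input to the 50% point of the output."""
    half = 0.5 * vdd
    return crossing_time(t_out, v_out, half) - crossing_time(t_in, v_in, half)

# Synthetic waveforms (time in ps, voltage in V) with a 10 ps input transition.
tp_hl = propagation_delay([0, 10], [0.0, 1.0], [0, 5, 25], [1.0, 1.0, 0.0])
tp_lh = propagation_delay([0, 10], [1.0, 0.0], [0, 8, 32], [0.0, 0.0, 1.0])
t_gate = max(tp_lh, tp_hl)

print(f"tpHL = {tp_hl:.1f} ps, tpLH = {tp_lh:.1f} ps, tgate = {t_gate:.1f} ps")
```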

Table 3.7: ON-state current for individual FinFET devices

Configuration    n-FinFET ION (A)    p-FinFET ION (A)
SG-mode          7.31×10^−5          9.33×10^−5
IG-mode          2.24×10^−5          2.41×10^−5

Using SG- and IG-mode FinFETs, a variety of CMOS-style logic gates can be

constructed as discussed in Section 3.1. In Fig. 3.40, the schematics of SG- (shorted-

gate) and LP-mode (low power) INV and NAND gates are, respectively, shown.

SG-mode logic gates consist of pure SG-mode FinFETs and have no flexibility in

trading off leakage vs. delay. The LP-mode logic gates consist of IG-mode FinFETs, where the back-gate of the p-FinFETs (n-FinFETs) is connected to a positive (negative) voltage source, denoted by VHIGH (VLOW). LP-mode logic gates provide an opportunity for tuning the leakage-delay characteristic of the gate by adjusting the back-gate bias statically or dynamically.

Figure 3.40: (a) SG-mode INV, (b) LP-mode INV, (c) SG-mode NAND, and (d) LP-mode NAND.

Figure 3.41: Leakage and delay characteristics under different back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND. [Plots not reproduced; x-axis ∆V (V), 0 to 0.3; y-axes delay (s) and leakage (A).]

In Figs. 3.41(a) and 3.41(b), the trends of leakage and delay for an LP-mode INV

and NAND gate are, respectively, shown. The horizontal axis (∆V ) refers to the

increment (decrement) in the back-gate bias for p-FinFETs (n-FinFETs) in the LP-

mode. For these simulations, the back-gate bias voltages are calculated as follows

(note that VDD = 1V ):

VHIGH = 1+∆V and VLOW = 0−∆V


From Figs. 3.41(a) and 3.41(b), reverse biasing the back-gate (above the rail for p-

FinFET and below the rail for n-FinFET) increases the effective transistor threshold

voltages linearly, so that leakage decreases exponentially and delay increases

roughly linearly [94]. For the SG- and LP-mode INV and NAND gates, the maxi-

mum leakage was around 6× higher than the minimum.
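The exponential-leakage and roughly-linear-delay trend can be made explicit with a first-order sketch. This is an illustrative model under stated assumptions, not the model used in the simulations: the back-gate coupling factor γ, the subthreshold swing S (in V/decade), the load capacitance C_L, and the saturation index α are symbols introduced here for illustration only.

```latex
\begin{align}
V_{HIGH} &= V_{DD} + \Delta V, \qquad V_{LOW} = 0 - \Delta V \\
|V_{th}(\Delta V)| &\approx |V_{th}(0)| + \gamma\,\Delta V \\
I_{leak}(\Delta V) &\approx I_{leak}(0)\cdot 10^{-\gamma\,\Delta V / S} \\
t_{gate}(\Delta V) &\propto \frac{C_L V_{DD}}{I_{ON}}, \qquad
  I_{ON} \propto \bigl(V_{DD} - |V_{th}(\Delta V)|\bigr)^{\alpha} \\
t_{gate}(\Delta V) &\approx t_{gate}(0)\left[1 + \frac{\alpha\,\gamma\,\Delta V}{V_{DD} - |V_{th}(0)|}\right]
  \quad \text{for } \gamma\,\Delta V \ll V_{DD} - |V_{th}(0)|
\end{align}
```

Under these assumptions, a linear shift in threshold voltage appears in the exponent of the leakage expression but only in the first-order term of the delay expansion, which is consistent with the trends in Fig. 3.41.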

Table 3.8: Metrics of SG/LP-mode FinFET INV/NAND gates

Logic gate       Leakage (A)       Delay (s)
SG-mode INV      1.28 × 10^−9      7.96 × 10^−12
LP-mode INV      6.55 × 10^−11     26.11 × 10^−12
SG-mode NAND     1.29 × 10^−9      15.13 × 10^−12
LP-mode NAND     6.49 × 10^−11     53.82 × 10^−12

We also simulated fault-free SG-mode INV and NAND gates to compare their

leakage and delay values with respect to their LP-mode counterparts. The results

are presented in Table 3.8. For the LP-mode logic gates, nominal VHIGH and VLOW

are as shown in Table 3.1. From Table 3.8, SG-mode implementations result in

around 3× faster gates at the expense of an order of magnitude higher leakage.
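The trade-off quoted from Table 3.8 can be checked with a few lines of arithmetic. The sketch below only re-expresses the table entries as ratios; the dictionary and the helper `sg_vs_lp` are hypothetical names introduced for illustration.

```python
# (leakage in A, delay in s), copied from Table 3.8
table_3_8 = {
    "SG-mode INV": (1.28e-9, 7.96e-12),
    "LP-mode INV": (6.55e-11, 26.11e-12),
    "SG-mode NAND": (1.29e-9, 15.13e-12),
    "LP-mode NAND": (6.49e-11, 53.82e-12),
}


def sg_vs_lp(gate):
    """Return (delay ratio LP/SG, leakage ratio SG/LP) for 'INV' or 'NAND'."""
    sg_leak, sg_delay = table_3_8[f"SG-mode {gate}"]
    lp_leak, lp_delay = table_3_8[f"LP-mode {gate}"]
    return lp_delay / sg_delay, sg_leak / lp_leak


for gate in ("INV", "NAND"):
    speedup, leak_ratio = sg_vs_lp(gate)
    print(f"{gate}: SG-mode is {speedup:.2f}x faster and {leak_ratio:.1f}x leakier")
```

For both gates the delay ratio is a little above 3 and the leakage ratio is close to 20, in line with the "around 3× faster" and "order of magnitude higher leakage" summary.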

3.2.4 Modeling defects in FinFET logic gates

In this section, we examine the behavior of FinFET INV and NAND gates in the

presence of defects. In order to model defects, we inserted cuts on each wire in

the SG- and LP-mode INV and NAND gates and shorted each transistor’s source

and drain terminals. As FinFETs have fully-depleted body regions, they do not

generally suffer from the history effect seen during the test of partially-depleted

SOI FETs [121].

Table 3.9: Detected and undetected faults in SG- and LP-mode FinFET NAND gates

Test     Detected faults                                             Undetected faults
vector   stuck-at           stuck-on           stuck-open
11       A/0, B/0, out/1    pA, pB stuck-on    -                     -
01       A/1, out/0         nA stuck-on        pA stuck-open         pA open back-gate
11       A/0, B/0, out/1    pA, pB stuck-on    nA, nB stuck-open     nA, nB open back-gate
10       B/1, out/0         nB stuck-on        pB stuck-open         pB open back-gate

We applied test vectors that detect all faults in CMOS based INV and NAND

gates to the SG- and LP-mode FinFET INV and NAND gates, respectively. Table 3.9

shows the sequence of test vectors applied to both SG- and LP-mode FinFET NAND

gates in the first column and detected stuck-at, stuck-on, and stuck-open faults

in the second, third, and fourth columns, respectively. The last column

shows the faults that cannot be detected. In the table, the first (second) bit of the

test vector shows the value of input A (B) of the NAND gates shown in Fig. 3.40

for both SG and LP modes. pA, pB (nA, nB) refer to the p-FinFETs (n-FinFETs) fed

by the corresponding signal. A stuck-at 0 (1) fault assumes that the value of a wire

is fixed at 0 (1) and cannot be changed. A stuck-on fault assumes that a transistor

is always on, which corresponds to shorting the source and drain terminals of a

transistor. A stuck-open fault represents the case opposite to the stuck-on fault,

that is, a transistor is always off regardless of the applied gate voltage.

In order to detect stuck-at faults, a test vector assigns a value opposite to the

assumed stuck-at fault and ensures that the faulty value is observed at the output.

The stuck-at faults are assumed to be at the gate inputs and the output. The second

column in Table 3.9 shows that all stuck-at faults of FinFET NAND gates can be

detected using the CMOS stuck-at test set.
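The stuck-at column of Table 3.9 can be reproduced with a purely logical model of a two-input NAND gate. This is a minimal sketch, assuming single stuck-at faults on the inputs and the output only; the names `faulty_nand` and `detected_stuck_at` are hypothetical.

```python
def nand(a, b):
    """Fault-free two-input NAND."""
    return 1 - (a & b)


def faulty_nand(a, b, fault):
    """Evaluate the NAND with one stuck-at fault, e.g. ("A", 0) or ("out", 1)."""
    node, value = fault
    if node == "A":
        a = value
    elif node == "B":
        b = value
    out = nand(a, b)
    return value if node == "out" else out


# All single stuck-at faults on the gate inputs and the output.
FAULTS = [(node, value) for node in ("A", "B", "out") for value in (0, 1)]


def detected_stuck_at(vector):
    """List the stuck-at faults for which `vector` produces a wrong output."""
    a, b = vector
    good = nand(a, b)
    return [f"{n}/{v}" for n, v in FAULTS if faulty_nand(a, b, (n, v)) != good]


for vec in [(1, 1), (0, 1), (1, 0)]:
    print(vec, detected_stuck_at(vec))
```

The three distinct vectors 11, 01, and 10 together cover all six stuck-at faults, matching the second column of Table 3.9.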

A stuck-on fault causes a VDD-to-ground connection for a particular set of input

combinations. In this case, the static leakage current increases drastically. The third

column in Table 3.9 shows which test vectors can be used to detect this type of fault.

In the presence of the stuck-on faults, the leakage currents observed during the test

of SG- and LP-mode INV/NAND gates are shown in Table 3.10. The four to six

orders of magnitude increase in current, in comparison to the nominal leakage

shown in Table 3.8, enables detection of these defects using IDDQ testing.
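The size of the leakage increase can be quantified directly from Tables 3.8 and 3.10. The sketch below computes the increase in decades for each gate; the pass/fail rule `iddq_flags` and its two-decade margin are assumptions made for illustration, not a test limit taken from this work.

```python
import math

# Nominal leakage (A) from Table 3.8 and stuck-on leakage (A) from Table 3.10.
nominal = {
    "SG-mode INV": 1.28e-9,
    "LP-mode INV": 6.55e-11,
    "SG-mode NAND": 1.29e-9,
    "LP-mode NAND": 6.49e-11,
}
stuck_on = {
    "SG-mode INV": 9.33e-5,
    "LP-mode INV": 2.41e-5,
    "SG-mode NAND": 5.13e-5,
    "LP-mode NAND": 1.52e-5,
}


def orders_of_magnitude(gate):
    """Leakage increase caused by a stuck-on fault, in decades."""
    return math.log10(stuck_on[gate] / nominal[gate])


def iddq_flags(gate, measured_current, margin_decades=2.0):
    """Hypothetical IDDQ rule: flag currents far above the nominal leakage."""
    return measured_current > nominal[gate] * 10 ** margin_decades


for gate in nominal:
    print(f"{gate}: +{orders_of_magnitude(gate):.1f} decades,",
          "flagged" if iddq_flags(gate, stuck_on[gate]) else "missed")
```

All four gates show an increase between four and six decades, so even a generous margin above nominal leakage separates the faulty and fault-free cases.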


Table 3.10: Shorting source and drain of an n-/p-FinFET in SG/LP-mode INV/NAND gates

Logic gate       Maximum leakage (A)
SG-mode INV      9.33 × 10^−5
LP-mode INV      2.41 × 10^−5
SG-mode NAND     5.13 × 10^−5
LP-mode NAND     1.52 × 10^−5

Detection of a stuck-open fault requires application of a two-pattern test. The

first vector is for initialization and the second one results in the wrong output

value in the presence of the fault. The sequence of test vectors applied to a NAND

gate (shown in Table 3.9) inherently contains three sets of two-pattern tests. For

example, application of initialization vector 11 followed by test vector 01 detects

pA stuck-open. The fourth column in the table lists all FinFET stuck-open faults

that can be detected. Although it is possible to detect all faults related to the front

gates of the transistors in both SG- and LP-mode FinFET NAND gates, detection

of open faults on the back gates is non-trivial. (It should be noted that, while front

and back gates are physically equivalent in our structure, we refer to the gate,

which has been disconnected from the inputs/fixed bias voltage, as the back gate.)

When a cut on the back gate occurs, it should be treated as a floating node. De-

pending on the capacitances that couple to the back gate node and transitions that

occur across the coupling capacitances, the back-gate may float to the intended

original value, or vary drastically and dynamically. Since the back-gate bias af-

fects the Vth strongly [42], FinFETs display a range of behaviors, which needs to be

analyzed from a leakage-delay perspective.
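The two-pattern nature of stuck-open testing can be illustrated with a switch-level model of the NAND gate in which a floating output retains its previous value. This is a minimal sketch, assuming ideal charge retention and a single stuck-open transistor; it does not model back-gate opens, and the function names are hypothetical.

```python
def nand_switch_level(a, b, prev_out, open_fets=()):
    """Switch-level NAND; a floating output keeps its previous value."""
    def conducts(name, gate_on):
        # A stuck-open transistor never conducts, whatever its gate voltage.
        return gate_on and name not in open_fets

    pull_up = conducts("pA", a == 0) or conducts("pB", b == 0)
    pull_down = conducts("nA", a == 1) and conducts("nB", b == 1)
    if pull_up and not pull_down:
        return 1
    if pull_down and not pull_up:
        return 0
    return prev_out  # floating node retains its charge (None means unknown)


def first_detection(sequence, open_fet):
    """Index of the first vector whose output differs from the fault-free value."""
    out = None
    for i, (a, b) in enumerate(sequence):
        out = nand_switch_level(a, b, out, open_fets=(open_fet,))
        expected = 1 - (a & b)
        if out is not None and out != expected:
            return i
    return None


# Vector sequence from Table 3.9, written as (A, B).
SEQUENCE = [(1, 1), (0, 1), (1, 1), (1, 0)]
for fet in ("pA", "pB", "nA", "nB"):
    print(fet, "stuck-open first detected at vector index", first_detection(SEQUENCE, fet))
```

Each detection relies on the preceding vector to initialize the output to the opposite value, which is why the single sequence 11, 01, 11, 10 contains the three two-pattern tests mentioned above.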

The width quantization property of FinFETs necessitates the use of an integer

number of fins to implement a FET with a large electrical width. Therefore, a short

or a cut on some combinations of adjacent fins can lead to a partially-defective

transistor. The analysis presented below is based on the assumption that each


transistor in the SG- and LP-mode INV and NAND gates has one fin. An analysis

of cuts on a subset of fins is provided in Section 3.2.4.

Based on the above discussion, we categorize back-gate cut FinFET operation

into three regimes: back-gate node capacitance (CBG) dominated by coupling from

stray sources (CSTRAY,BG), coupling from the front gate (CFG,BG), and coupling due

to source/drain regions (CD,BG, CS,BG). Layout styles and choice of device parame-

ters greatly affect the predominant regime of operation.
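The three regimes can be related to a simple capacitive-divider picture of the floating back-gate node. The sketch below is an assumption-laden illustration: it treats the node as ideally floating, uses charge conservation to estimate the voltage shift, and classifies the regime with an arbitrary dominance ratio of 5. The capacitance values (in aF) are made up for the example and are not extracted from the devices studied here.

```python
def floating_node_shift(caps, dv):
    """Voltage shift of a floating node coupled through `caps`.

    caps: coupling capacitances to the node, keyed by source name.
    dv:   voltage change on each source (sources not listed are held fixed).
    """
    total = sum(caps.values())
    return sum(c * dv.get(name, 0.0) for name, c in caps.items()) / total


def dominant_regime(caps, ratio=5.0):
    """Classify the regime; `ratio` is an assumed dominance threshold."""
    stray, fg = caps["stray"], caps["fg"]
    sd = caps["d"] + caps["s"]
    if stray >= ratio * max(fg, sd):
        return "I"
    if fg >= ratio * max(stray, sd):
        return "II"
    if sd >= ratio * max(stray, fg):
        return "III"
    return "mixed"


# Illustrative capacitances in aF.
dense_layout = {"stray": 1000.0, "fg": 50.0, "d": 20.0, "s": 20.0}
sparse_layout = {"stray": 5.0, "fg": 100.0, "d": 5.0, "s": 5.0}

for caps in (dense_layout, sparse_layout):
    shift = floating_node_shift(caps, {"fg": 1.0})  # 1 V swing on the front gate
    print(dominant_regime(caps), round(shift, 3))
```

When the stray term dominates, a full-swing transition on the front gate barely moves the back-gate node, whereas in the front-gate-dominated case the node follows most of the swing.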

Regime I: CSTRAY,BG ≫ CFG,BG, CD,BG, CS,BG

In circuits with dense layouts, crowding of interconnect features around a back-

gate cut defect can increase the effective CSTRAY,BG, whereby the back-gate node

voltage is almost independent of voltage changes occurring at the front gate as

well as source/drain regions. Fig. 3.42(a) shows a possible scenario, with a large

wire capacitance contribution to CSTRAY,BG from a cut located at the VHIGH back-gate

bias line from the voltage generator, shared by many logic gates in the region.

Figure 3.42: (a) Regime I: Opens on shared back-gate bias lines for many LP-mode INV gates, and (b) Regime II/III: Opens on individual back-gate bias lines for an LP-mode INV gate

Effect of an open on the p-FinFET back-gate in LP-mode logic gates

We simulated LP-mode INV and NAND gates with open defects on the back gates

of the p-FinFETs. The back-gate biases, VHIGH and VLOW , for the defect-free cases

were set to their nominal values shown in Table 3.1. Since the back gate is floating

when open, it is necessary to characterize the logic gate for a range of possible

voltages, which may be manifested on the dynamic node. Assuming CSTRAY,BG ≫

CFG,BG, CD,BG, CS,BG, the voltage on the cut back gate, VBG = Vcut, was varied from

VLOW to VHIGH . In Fig. 3.40(d), for the LP-mode NAND gate, a wire cut at VHIGH

before the fanout leads to an open fault for both p-FinFETs. They have the same

variable back-gate bias, which is Vcut . While this turned out to be the worst-case

scenario, the leakage-delay characteristics were similar in the cases in which the

connection to only one p-FinFET back-gate is cut.

In Figs. 3.43(a) and 3.43(b), the variation in leakage and delay with respect to

Vcut is shown. A drastic increase in leakage occurs as Vcut decreases from its in-

tended bias of VHIGH . For the LP-mode NAND gate, leakage stays relatively con-

stant until Vcut approaches 0.8V . This is due to the fact that leakage of n-FinFETs

(leakage vector AB=10) dominates the maximum leakage up to this point. When

the back-gate bias of the p-FinFETs is less than 0.8V , leakage from p-FinFETs dom-

inates (leakage vector AB=11), resulting in an exponential increase in leakage with

decreasing Vcut .

Also, as Vcut decreases from its intended value of VHIGH , the logic gates switch

faster due to the fact that the p-FinFET has greater current drive, which reduces the

gate delay. However, below 0.6V , the high-to-low transition delay tpHL dominates

the maximum delay as most of the current through the pull-down network consists

of the p-FinFET leakage current, thereby limiting the current that discharges the

output capacitance. Beyond a certain point, the p-FinFET is always on, so that the

output is stuck at high and the logic gates fail to function correctly.

[Plots: (a) LP mode INV leakage and delay vs. Vcut on p-FinFET; (b) LP mode NAND leakage and delay vs. Vcut on p-FinFET. Horizontal axis: Vcut (V), −0.2 to 1.2. Vertical axes: Delay (s) and Leakage (A, logarithmic). Curves: Leakage, Delay, tpHL, tpLH.]

Figure 3.43: Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND.

Therefore, it is possible to conclude that a cut on the back gate of a p-FinFET in an LP-mode logic

gate corresponds to many fault models, depending on the observed voltage on the

cut. If Vcut is below 0.5V , the fault corresponds to a p-FinFET stuck-on and can be

detected using IDDQ testing. In the extreme case, the output is stuck-at 1. On the

other hand, because of coupling effects, if Vcut assumes values greater than 0.6V ,


then the logic gates switch faster, but have increased leakage power. This scenario

does not have a corresponding fault model in CMOS and is unique to FinFETs.

Effect of an open on the n-FinFET back-gate in LP-mode logic gates

To apply the methodology presented above to open defects on the back gates of

n-FinFETs in LP-mode INV and NAND, Vcut was varied from VLOW to VHIGH . We

inserted a cut on the back-gate wires of the top and bottom n-FinFETs in the pull-

down network and observed similar characteristics.

In Fig. 3.44, the variation of leakage and delay values with changing Vcut is

shown. Similar to the case with opens on p-FinFET back-gates, opens on n-FinFETs

can cause an exponential increase in leakage. However, delay is not affected until

n-FinFETs become severely forward-biased (i.e., Vth of the FET drops considerably),

which happens after 0.4V . In this region, although tpHL decreases, it is not the

dominating factor and the delay of low-to-high transition limits the overall delay

of the gate. In the extreme case, the n-FinFET is always on and the output is stuck-

at 0.

An open fault on an n-FinFET can be modeled using two fault models. If the

back-gate bias drifts toward 0.4V , then leakage increases. However, it might not

be as high as in the case of an SG-mode n-FinFET. Therefore, it is possible that an

IDDQ test may miss detecting this defect. On the other hand, when the n-FinFET is

severely forward-biased (Vcut > 0.4V ), delay and leakage increase substantially. It

is possible to detect this case using delay fault testing or IDDQ testing. The above

scenario is also unique to FinFETs.

[Plots: (a) LP mode INV leakage and delay vs. Vcut on n-FinFET; (b) LP mode NAND leakage and delay vs. Vcut on n-FinFET. Horizontal axis: Vcut (V), −0.2 to 1.2. Vertical axes: Delay (s) and Leakage (A, logarithmic). Curves: Leakage, Delay, tpHL, tpLH.]

Figure 3.44: Leakage and delay variation under different n-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND.

Effect of an open on the p/n-FinFET back-gate in SG-mode logic gates

Fault models for cuts on SG-mode gate connections require special attention as

SG- and LP-mode logic gates use different devices. An IG-mode FinFET in an LP-

mode logic gate has two independent gates. Therefore, a cut on the back-gate wire

corresponds to a change in this voltage. However, a cut on the gate connection of

an SG-mode FinFET turns it into an IG-mode FinFET with a floating

back gate, while all other FETs in the gate remain in SG mode.

Here, as with earlier cases, Vcut was swept between two extreme cases, namely,

VLOW and VHIGH , for a cut on a p-FinFET. In Fig. 3.45, the trends in leakage and


delay are shown. On decreasing Vcut , leakage increases. When the p-FinFET is

severely forward-biased, the leakage current approaches very high values, similar

to those of LP-mode logic gates. However, the difference between the LP- and

SG-mode logic gates lies in the delay characteristics. In SG-mode gates, the cut

typically increases delay in comparison to the fault-free case (for all the swept back-

gate biases in the INV and for a large fraction of the swept biases in the NAND

gate). In addition, for back-gate voltages spanning VLOW to VHIGH , the logic gate

remains functional. This result can be explained by the greater drive strength of

SG-mode FinFETs as compared to IG-mode FinFETs. While a back-gate cut turns

an SG-mode p-FinFET into an IG-mode p-FinFET, the remaining FinFETs in the

pull-up network compensate for the defect at the expense of increased delay and

leakage.

Simulations for cuts on n-FinFET back-gate connections for SG-mode logic

gates were performed using a similar setup and the resulting leakage-delay char-

acteristic is shown in Fig. 3.46. When the n-FinFETs are forward-biased, leakage

increases drastically and delay tends to decrease up to a certain point. To sum-

marize, cuts on back-gate connections of SG-mode FinFETs cause an increase in

leakage and delay in the worst case. Also, for back-gate voltages spanning VLOW to

VHIGH , the logic gates maintain functionality. This behavior is different from that

observed for LP-mode logic gates.

[Plots: (a) SG mode INV leakage and delay vs. Vcut on p-FinFET; (b) SG mode NAND leakage and delay vs. Vcut on p-FinFET. Horizontal axis: Vcut (V), −0.2 to 1.2. Vertical axes: Delay (s) and Leakage (A, logarithmic). Curves: Leakage, Delay, tpHL, tpLH, with fault-free delay and fault-free leakage shown for reference.]

Figure 3.45: Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND.

Effect of an open on a subset of fins

We increased the electrical width of all transistors in LP-mode INV and NAND

gates from one to four fins to analyze the behavior of partially-defective transistors

during testing. To model open defects on the back-gates of LP-mode gates, we

simulated all possible cases, namely, open defects on 1, 2, 3, and 4 fins at

a time. It must be noted that the case of open defects on all four fins

is expected to be equivalent to an open defect on the back-gate of a 1-fin FinFET.

In Fig. 3.47, the variation in delay and leakage is shown for an LP-mode NAND

gate when open defects are introduced on one of the p-FinFETs. As expected, when

all four fins are cut, the change in delay and leakage is similar to the one shown in

Fig. 3.43(b). However, if only one or two fins are cut, the gate remains functional

even if the value of Vcut assumes the extreme value of −0.2V. On the other hand,

the change in leakage with respect to Vcut follows the same trend irrespective of the

number of fins that have open faults.

[Plots: (a) SG mode INV leakage and delay vs. Vcut on n-FinFET; (b) SG mode NAND leakage and delay vs. Vcut on n-FinFET. Horizontal axis: Vcut (V), −0.2 to 1.2. Vertical axes: Delay (s) and Leakage (A, logarithmic). Curves: Leakage, Delay, tpHL, tpLH, with fault-free delay and fault-free leakage shown for reference.]

Figure 3.46: Leakage and delay variation with different n-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND.

[Plots: (a) LP mode NAND delay vs. Vcut for cuts on multiple fins of one p-FinFET; (b) LP mode NAND leakage vs. Vcut for cuts on multiple fins of one p-FinFET. Horizontal axis: Vcut (V), −0.2 to 1.2. Vertical axes: Delay (s) and Leakage (A, logarithmic). Curves: cut 1, 2, 3, and 4 out of 4 fins.]

Figure 3.47: Effect of cutting a subset of fins in an LP-mode NAND gate p-FinFET with four fins on (a) delay, and (b) leakage.

Although not shown in this section, we also simulated all possible combinations

of multiple fin open defects on n-/p-FinFET back gates of LP-mode INV and

NAND gates. The simulation results showed trends similar to those in Fig. 3.47. The

change in leakage with respect to Vcut follows the same trends irrespective of the

number of fins that have open defects. On the other hand, open defects on only

one or two fins do not negatively affect the propagation delay of the gates. This

implies that transistor sizing could be used to improve robustness against delay

faults caused by back-gate open defects within a standard cell.

From the perspective of delay fault testing, the majority of cases in Regime I result

in either pulse-broadening or pulse-shrinking of the output pulse with respect

to an input pulse for buffer-like configurations, i.e., a logic gate cascaded with an

SG-INV [Figs. 3.48(a) and 3.48(b)]. This is due to the fact that either tpHL or tpLH in-

creases dramatically, thereby leading to slow-rising output edges due to p-FinFET

back-gate cuts (Figs. 3.43, 3.45) and slow-falling output edges due to n-FinFET

back-gate cuts (Figs. 3.44, 3.46) for a wide range of back-gate voltages. These can be

detected using three-/two-pattern delay tests described in [121]. While the above

analysis suggests that it is possible to model the defects in Regime I using a piece-

wise approach, i.e., using a breakpoint at Vcut = 0.5V , and so on, it should be noted

that the breakpoints are very dependent on the sizing of FETs in the gate, making

it impossible to generalize for arbitrary pull-up/pull-down networks.
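A piecewise mapping of this kind can be sketched as a small lookup with configurable breakpoints. The default breakpoints of 0.5 V and 0.6 V are the values quoted earlier for a p-FinFET back-gate cut in the LP-mode gates studied here; as noted above, they depend on FET sizing, so they are exposed as parameters. The function name and the returned labels are hypothetical.

```python
def classify_p_backgate_cut(v_cut, stuck_on_below=0.5, faster_above=0.6):
    """Map the floating back-gate voltage of an LP-mode p-FinFET to a fault class.

    The breakpoints are sizing-dependent and must be re-characterized
    for every pull-up/pull-down network.
    """
    if v_cut < stuck_on_below:
        return "p-FinFET stuck-on (IDDQ-detectable)"
    if v_cut > faster_above:
        return "faster switching with increased leakage (no CMOS fault model)"
    return "between breakpoints (not characterized in the text)"


for v in (0.2, 0.55, 0.9):
    print(v, "->", classify_p_backgate_cut(v))
```

Keeping the breakpoints as arguments makes the limitation explicit: the structure of the mapping carries over between gates, but the numbers do not.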

Figure 3.48: Pulse characterization setup for (a) SG-mode INV, and (b) SG-mode NAND

Regime II: CFG,BG ≫ CSTRAY,BG, CD,BG, CS,BG

This scenario can occur when the FET is engineered with sufficiently large

source/drain underlaps (LUN) to the gates (source/drain dopants do not diffuse

into the fin channel region), thereby decreasing CD,BG, CS,BG. A relatively sparse

layout with the absence of BEOL features can significantly reduce CSTRAY,BG depending

on the location of the cut [VHIGH in Fig. 3.42(b)], and CFG,BG can dominate

if the fin thickness (TSI) is small. Therefore, the cut back-gate voltage VBG is deter-

mined by the front gate. Figs. 3.49(a) and 3.49(b) show the conditions for LUN and

TSI under which Regime II dominates, for the chosen FinFET structure.


In order to model this regime, we increased the front to back-gate coupling

capacitance (using a thinner fin, TSI = 7nm, larger underlap, LUN = 16nm) and sim-

ulated a buffer configuration for delay fault testing, as shown for the SG-mode

INV in Fig. 3.48(a).

[Plots: (a) Capacitance (aF) vs. LUN (nm), curves CD,BG and CFG,BG, with Regime III and Regime II regions marked; (b) Capacitance (aF) vs. TSI (nm), curves CD,BG and CFG,BG, with the Regime II region marked.]

Figure 3.49: Interplay between CD,BG and CFG,BG with (a) LUN variation, TSI = 10nm, and (b) TSI variation, LUN = 16nm

Figs. 3.50(a) and 3.50(b) show the transient pulse behavior for an SG-mode INV

with a cut on the n-FinFET and p-FinFET back gate, respectively. Both instances

witness pulse-shrinking (with respect to the defect-free case) albeit on opposite

edges. In the n-FinFET cut back-gate case, on the rising edge of input node A, VBG

rises to an intermediate voltage (instead of VDD), thereby marginally increasing the

fall time for node OUT and rise time for node OUT 2. During the falling edge of

node A, VBG mimics VA and settles to the intended voltage. As a result, the falling

edge of node OUT 2 is not affected. With the p-FinFET cut back-gate case, the

falling edge of node OUT is unchanged, while the rising edge is sharper. This is

due to the fact that VBG is below the rail, helping improve the drive on the p-FinFET

and, hence, the falling edge of node OUT 2 occurs earlier. A smaller slew rate on

node A would cause greater pulse-shrinking as VBG is likely to be negative for a


longer period of time. This shows that input slew rate as well as front to back-gate

coupling are key factors affecting pulse-shrinking in this regime of operation.

For LP-mode INV gates, in the n-FinFET cut back-gate case [Fig. 3.51(a)], VBG

rises to an intermediate voltage on the rising edge of node A on account of which

the falling edge of node OUT occurs earlier than the defect-free case (with VBG =

VLOW ), and node OUT 2 rises earlier, resulting in pulse-broadening. In the p-FinFET

cut back-gate case [Fig. 3.51(b)], VBG remains close to zero, weakly turning on the p-

FinFET and node OUT fails to discharge completely for the given pulse width. As a

result, the rising edge of node OUT 2 is delayed, thereby causing the pulse to shrink

considerably. Hence, the behavior of SG- and LP-mode INV gates under n-FinFET

cut back-gate cases is opposite for the current configurations, while the p-FinFET

cut back-gate cases are similar with different degrees of pulse-shrinking. Similar

results were obtained for SG- and LP-mode NAND gates and are not shown here.

Pulse-shrinking due to a late rising edge can lead to setup time failures, while early

falling edges can lead to hold time failures. They are generally detected using two-

pattern delay tests [121].
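Pulse-broadening and pulse-shrinking can be quantified by comparing the output pulse width of the faulty buffer against the defect-free case. The sketch below is a minimal illustration, assuming a single positive pulse measured at the 50% level; the function names, the tolerance, and the waveforms are hypothetical and not taken from the simulations.

```python
def pulse_width(times, volts, level):
    """Width of the first positive pulse between its crossings of `level`."""
    rise = fall = None
    samples = list(zip(times, volts))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if rise is None and v0 < level <= v1:
            rise = t0 + (level - v0) * (t1 - t0) / (v1 - v0)
        elif rise is not None and v0 > level >= v1:
            fall = t0 + (level - v0) * (t1 - t0) / (v1 - v0)
            break
    if rise is None or fall is None:
        raise ValueError("no complete pulse found")
    return fall - rise


def classify_pulse(width, reference_width, tolerance):
    """Compare a measured pulse width against the defect-free reference."""
    if width > reference_width + tolerance:
        return "pulse-broadening"
    if width < reference_width - tolerance:
        return "pulse-shrinking"
    return "unchanged"


# Synthetic waveforms (time in ps, VDD = 1 V), for illustration only.
times = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
reference = pulse_width(times, [0, 0, 1, 1, 0, 0], 0.5)
late_rise = pulse_width(times, [0, 0, 0, 1, 0, 0], 0.5)
print(reference, late_rise, classify_pulse(late_rise, reference, tolerance=1.0))
```

In this example the late rising edge shortens the pulse, which corresponds to the setup-time failure mechanism described above.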

[Plots: (a) and (b) Voltage (V) vs. Time (s), traces VA, VOUT, VOUT2, and VBG.]

Figure 3.50: Transient pulse behavior of SG-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut

[Plots: (a) and (b) Voltage (V) vs. Time (s), traces VA, VOUT, VOUT2, and VBG.]

Figure 3.51: Transient pulse behavior of LP-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut

Regime III: CD,BG, CS,BG ≫ CSTRAY,BG, CFG,BG

In Fig. 3.42(b), if the FinFET is designed with small or no source/drain underlaps,

or even overlaps to the gates, CD,BG, CS,BG dominate, and CSTRAY,BG, CFG,BG have

little effect in determining VBG. Fig. 3.49(a) shows the interplay between CD,BG

and CFG,BG with LUN variation, which is difficult to control. With LUN = 10nm, the

device has relatively low CD,BG, while with LUN = 0nm, CD,BG is greater, moving

from Regime II to Regime III.

We used the same setup as Figs. 3.48(a) and 3.48(b) to simulate transient pulse

behavior. In Fig. 3.52(a), for the case of the n-FinFET back-gate cut, VBG rises dur-

ing the rising edge of node A and is driven below the rail on the falling edge, which

enables rapid charging of node OUT and discharging of node OUT 2, resulting in

marginal pulse-shrinking. With LUN = 0nm [Fig. 3.52(b)], pulse-shrinking is more

pronounced on account of larger CD,BG, and increased drive current of the FETs due

to lower LUN . Similar observations can be made in the p-FinFET back-gate cut case

in Figs. 3.53(a) and 3.53(b) with a major difference − since VBG is driven below the

rail, it strongly turns on the p-FinFET, thereby preventing node OUT from fully

discharging, which is also accompanied by increased leakage (not shown). This

shows that between Regimes II and III, SG-mode INVs exhibit varying degrees of

pulse-shrinking. In the case of an n-FinFET back-gate cut in LP-mode INV [Fig.

3.54(a)], there is no change in the behavior of the gate as VBG ≈ VLOW. However,

for a p-FinFET back-gate cut [Fig. 3.54(b)], VBG is driven negative, thereby turning

on the p-FinFET, resulting in logic failure, and node OUT 2 is stuck-at 0. Similar

observations hold for the SG-mode NAND as well [Fig. 3.55], although the degree

of pulse-shrinking was marginal for cuts in either FET in the pull-up/pull-down

network. In the LP-mode NAND cases (not shown), p-FinFET back-gate cuts result

in node OUT 2 being stuck-at 0 while n-FinFET back-gate cuts cause marginal

pulse-shrinking.

[Plots: (a) and (b) Voltage (V) vs. Time (s), traces VA, VOUT, VOUT2, and VBG.]

Figure 3.52: Transient pulse behavior of SG-mode INV having n-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III.

Finally, it should be noted that for LUN , TSI , and layout style combinations when

neither CD,BG, CFG,BG nor CSTRAY,BG dominates, it is difficult to generalize pulse

shrinking/broadening behavior, which is a limitation of the current approach.

[Plots: (a) and (b) Voltage (V) vs. Time (s), traces VA, VOUT, VOUT2, and VBG.]

Figure 3.53: Transient pulse behavior of SG-mode INV having p-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III.

[Plots: (a) and (b) Voltage (V) vs. Time (s), traces VA, VOUT, VOUT2, and VBG.]

Figure 3.54: Transient pulse behavior of LP-mode INV in Regime III with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut


[Plots: (a), (b), and (c) Voltage (V) vs. Time (s), traces VOUT, VOUT2, the switching input (VA or VB), and the corresponding cut back-gate node (VBG,A or VBG,B).]

Figure 3.55: Transient pulse behavior of SG-mode NAND in Regime III having (a) n-FinFET back-gate cut at A, (b) n-FinFET back-gate cut at B, and (c) p-FinFET back-gate cut at A.

As robust design methodologies for multi-gate devices mature, the need to develop

fault models for defects becomes increasingly important. In this section, we

showed that most opens and shorts in FinFET logic circuits map to established

fault models in planar CMOS. However, opens on the back gate with the intended

signal at the front gate cause delay and leakage problems, which are unique to FinFETs,

owing to the strong dependence of Vth on the back-gate bias, thereby compounding

the role of device/layout parasitics.

We broadly identified three regimes of operation, which affect testability. In

the regime dominated by stray capacitances, for a wide range of back-gate volt-

ages, the logic gates show pulse-broadening, which can be detected using three-

pattern delay tests. In the regime dominated by front to back-gate or source/drain

to back-gate coupling, back-gate cuts lead to pulse-shrinking in SG-mode gates (to

different degrees depending on the logic gate), primarily due to slow-rising output

edges for n-FinFET back-gate cuts and fast-falling output edges for p-FinFET

back-gate cuts. In LP-mode gates, however, in the regime dominated by front

to back-gate coupling, n-FinFET back-gate cuts cause pulse-broadening and p-

FinFET back-gate cuts lead to pulse-shrinking. In the regime dominated by back-

gate to source/drain coupling, n-FinFET back-gate cuts lead to marginal pulse-

shrinking while p-FinFET back-gate cuts cause stuck-at faults as the outputs are

permanently held high. The absence of a unified fault model for back-gate cuts

in IG-mode FinFETs poses a testing challenge, owing to the diversity of output

behaviors. However, SG-mode logic gates can be tested for back-gate isolation us-

ing a combination of pulse-broadening/shortening/IDDQ tests, depending on the

regime of operation, without logic failure when appropriately sized.


Chapter 4

Efficient Algorithms for 3D-TCAD Modeling of Emerging Devices and Circuits

4.1 Introduction

Hardware experiments with multi-gate devices and larger circuits (e.g., SRAMs,

eDRAMs, ring oscillators, etc.) entail very high cost and turnaround time. Thus,

efficient predictive process/device characterization methods for such circuits are

urgently needed. A lack of such methods poses a significant impediment to rapid

progress in this area and represents the technology-circuit co-design gap shown in

Fig. 4.1. Here, while continuum TCAD methods have paved the way for atomistic

TCAD simulation for individual devices, predicting circuit-level metrics in new

process technologies still remains a major challenge.

Figure 4.1: Technology-circuit co-design gap

Reliance on compact models is not possible since such models lag advances

in technology and often require inputs from detailed device simulations of test

structures in the early phases of technology development. Key methodologies

to enable immediate, accurate feedback to designers (e.g., optimization of para-

sitics in SRAM bitcell layouts, noise margin analysis, top-down optimization of

the back-end stack, etc.) have not yet been formulated. In Fig. 4.2, the TCAD

Figure 4.2: TCAD flow for the 130nm node and higher

flow for process/device development is shown for the 130nm node and higher

technology nodes. Here, individual devices were investigated in isolation, and92

thereafter, compact models were developed for circuit simulation. At the 90nm

node and below, owing to a variety of physical effects that need to be captured,

the TCAD flow proceeded along the lines of Fig. 4.3. Mixed-mode device-circuit

simulation became the cornerstone of early stage technology/process evaluations

of key circuit elements as compact model development remained a slow and cum-

bersome process. At the 32nm node and lower, owing to the plethora of layout-

dependent effects, 3D-TCAD contiguous/mixed-mode simulation grew more im-

perative [122–124].

Figure 4.3: TCAD flow for 90nm-32nm technology nodes

Though 3D-TCAD based exploration enables accurate predictive modeling at

lower technology nodes, it is beset with major challenges in manual process mod-

eling/simulation of large layouts, inability to adapt quickly to rapidly changing

process recipes using manual inputs on each occasion, computational complexity

of DC/AC and transient-state 3D device simulations, as well as a very high cost

for model setup for each circuit/layout under investigation (LUI). An LUI can consist of anywhere from two to several tens of devices. This highlights the need to develop a seamless set of methodologies integrated with 3D-TCAD eco-systems for resolving process, layout, and device-level issues quickly. Here, questions with broad application scope that need attention are:

1. How do we leverage 3D-TCAD to enhance technology-circuit co-design with

emerging devices such as FinFETs?

2. How do we efficiently obtain process-simulated 3D structures for LUIs with

several devices?

[Figure 4.4 (flowchart): a generic layout and process conditions pass through litho simulation/mask generation and 3D process simulation of the entire layout using detailed mechanisms/kinetics, O(days to weeks), to give a 3D device-simulation-ready structure. 3D device simulation then covers exhaustive characterization with simple physical models, O(months), and advanced transport phenomena with transient analysis under external excitation, O(months to years). I-V and C-V characteristics, thermal behavior, etc., are extracted; if the metrics do not look OK, the layout or process is tweaked and the loop repeats.]

Figure 4.4: The ultimate wishlist for 3D-TCAD assisted process/device development

Fig. 4.4 succinctly summarizes the above discussion and presents the ultimate wishlist for 3D-TCAD assisted hardware development, assuming nominal computational resources per case. With current structure generation methodologies, 3D process simulations of LUIs are feasible in a time-frame of days to several weeks, depending on the device type and size of the layout. 3D device simulation using simple physical models, followed by iterative process/layout optimization, takes on the order of months, and the use of advanced transport phenomena/transient analysis in the loop pushes the computation time to several months or years. Hence, the implementation flow presented in Fig. 4.4 is impractical and has not been attempted in industry or academia, an impasse referred to as the ‘many-device TCAD barrier’.

In this work, we develop efficient and accurate methodologies for unifying the layout and process simulation worlds, thereby expanding the horizon of predictive modeling for emerging devices beyond the ‘many-device TCAD barrier,’ which is a major showstopper at lower technology nodes [125]. For the first time,

we show that it is profitable to adopt an automated structure synthesis approach

for large-scale structure generation, rather than perform top-down process simu-

lations of LUIs. In doing so, we identify important bottlenecks that plague model-

ing efforts in 3D-TCAD structure generation, and outline innovative techniques to

overcome them. The proposed methodologies are inspired by the observation that

all regions of the device structure need not have process-simulation-level accuracy,

and that it is possible to amortize process simulations of simpler blocks by reusing

them to synthesize structures corresponding to larger layouts.

The rest of the chapter is organized as follows. In Section 4.2, we review re-

lated work. In Section 4.3, we describe the structure synthesis methodologies in

detail. We cover two case studies using the proposed methodologies in Section

4.4. Finally, we present a brief discussion in Section 4.5 and conclude in Section

4.6.

4.2 Related work

Over the past few years, 3D-TCAD based analysis of emerging devices such as FinFETs has garnered increased attention. It has guided the semiconductor industry in the march to miniaturize transistors by streamlining the design of test

processes/structures, resulting in dramatic cost savings. 3D-TCAD simulation

for obtaining modeling insight in emerging CMOS/Flash devices is demonstrated

in [122–124, 126–128]. Most TCAD flows consist of two major steps, i.e., process

simulation followed by device simulation. The primary objective of process sim-

ulation is to accurately predict the physical/structural layers and geometry of de-

vices at the end of a process run, as well as the active dopant/stress distributions.

Techniques to improve process simulation have been investigated in [129–132].

The complexity of physical models is a major factor that impacts process simu-

lation. Simplified physics minimizes computation time. With technology scaling,

however, the need for ever more accurate doping/stress profiles has increased and

complex physical models are added at each new generation. On account of the de-

tailed physical modeling involved, process simulation is almost exclusively used

to fine-tune the development of individual devices. Therefore, it is difficult to

generate accurate integrated representations of front-end-of-line (FEOL) and back-

end-of-line (BEOL) structures corresponding to LUIs in TCAD. This has resulted

in a segmented modeling approach for FEOL/BEOL components, which can be a

problem at lower technology nodes, e.g., in capacitance (parasitics) extraction for

highly scaled circuits, where a true 3D representation, consisting of active semi-

conductor devices, metals, and conformal dielectrics, is crucial to compute accu-

rate results through transport analysis based 3D-TCAD extraction [133]. Fig. 4.5

summarizes the different application domains of 3D-TCAD from the perspective

of process structure complexity and physical model complexity. TCAD has tra-

ditionally been used in quadrants III/IV and quadrants I/II have been relatively

inaccessible owing to the high computational costs of structure generation via pro-

cess simulation.

[Figure 4.5 panels, labeled I to IV: simple physical models with complex structures (e.g., parasitics extraction for 6T SRAM); simple physical models with simple structures (e.g., first-pass simulation of a simple FET); complex physical models with simple structures (e.g., detailed transport simulation of a simple FET); complex physical models with complex structures (e.g., detailed transport simulation for process-simulated 6T SRAM).]

Figure 4.5: TCAD modeling quadrants

The ideas proposed in the next section are radically new approaches for chipping away at the computing barriers shown in Fig. 4.4, which have been unavoidable in traditional 3D-TCAD approaches.

4.3 Structure synthesis methodologies

In this section, we outline methodologies for automated 3D structure synthesis

that can circumvent the bottlenecks posed by process simulation of large layouts.

4.3.1 Key ideas

Process/circuit engineers working with emerging devices (e.g., FinFETs) grapple with typical design questions, such as:

• What is the effect of modifying fin/gate pitch (for manufacturability) on Fin-

FET SRAM array performance?

• Given several different FinFET SRAM bitcell layouts, which one has the best

area vs. stability vs. performance (parasitics) vs. leakage tradeoff?

• What is the best way to design the back-end stack (metal/via heights and di-

electrics) to minimize RC-delay, while meeting material, fabrication, thermal,

and electro-migration constraints?

Accurate answers to the above can be obtained mainly from detailed pro-

cess/device simulations, as there are no higher-level compact models avail-

able for a new process. Here, the low-level manual input that is needed to

prepare 3D-TCAD decks poses a major barrier, as shown in Figs. 4.6(a) and

4.6(b). Maintaining consistency across layouts/processes/engineers is difficult

and modeling ambiguity can lead to widely differing end results. Also, iterative

process/technology/circuit co-optimization in a TCAD flow cannot be sustained

with a human element within, thereby necessitating massive automation on all

fronts.

The traditional approach to process simulation of M different layouts or optimization of the same layout using M variants is shown in Fig. 4.7(a). The process simulator requires time on the order of f(N) and memory on the order of g(N) per case, where N is the number of devices in the layout and f(·)/g(·) are generally polynomial functions of N. For M layouts/variants, the time complexity scales as O(M·f(N)) and memory complexity as O(max g(N)). From past experiments with planar and multi-gate devices, we have found that f(N) ∝ N^{2+ε} for small N, where ε grows as N increases. This is unacceptably slow for any kind of iterative process/layout-TCAD simulation based optimization that is much desired by engineers at lower technology nodes.

Figure 4.6: (a) Modeling ambiguity with manual inputs, and (b) difficulty of iterative optimization with human elements in the TCAD flow

In Fig. 4.7(b), we propose an innovative methodology to supplant the brute-force process simulator driven approach of Fig. 4.7(a). The process simulator is replaced by a “structure synthesizer,” a plugin into TCAD that can synthesize process-simulated structures using information from the layout, a formal set of process assumptions, and ready-to-use pre-processed device regions in the technology. The key idea behind the proposed method is to relieve the process simulator from the burden of simulating layouts with more than one device, which reduces complexity dramatically. Here, the time required scales as O(M·k(N)) and memory as O(max h(N)). With linear (or close to linear) time and memory complexity for k(·) and h(·), the flow in Fig. 4.7(b) would enable 3D-TCAD structure generation to comfortably scale beyond the ‘many-device TCAD barrier.’

[Figure 4.7 panels: (a) a general layout and process recipe feed the process simulator, which produces a processed structure with N devices; time f(N) and memory g(N) per layout; looping over M layouts gives time O(M·f(N)) and memory O(max g(N)); typically f(N) is O(N^2), so time is O(M·N^2). (b) a general layout, process assumptions, and the device database feed the structure synthesizer, which produces a processed structure with N devices; time k(N) and memory h(N) per layout; looping over M layouts gives time O(M·k(N)) and memory O(max h(N)).]

Figure 4.7: 3D-TCAD structure generation for layouts: (a) traditional approach, and (b) proposed approach
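
The scaling argument can be made concrete with a small Python sketch. It compares the total cost of M layouts under a superlinear per-layout cost f(N) = c·N^(2+ε) against a linear cost k(N) = c·N. The constant c, the value of ε, and the function names are illustrative assumptions rather than measured values from this work.

```python
def traditional_cost(m_layouts, n_devices, eps=0.1, c=1.0):
    """Total time for M layouts when each costs c * N**(2 + eps) (assumed form)."""
    return m_layouts * c * n_devices ** (2 + eps)


def synthesis_cost(m_layouts, n_devices, c=1.0):
    """Total time for M layouts when each costs c * N (assumed linear form)."""
    return m_layouts * c * n_devices


def speedup(m_layouts, n_devices, eps=0.1):
    """Ratio of the two totals; M cancels, so it depends only on N and eps."""
    return traditional_cost(m_layouts, n_devices, eps) / synthesis_cost(m_layouts, n_devices)


if __name__ == "__main__":
    for n in (2, 6, 24):
        print(n, round(speedup(10, n), 1))
```

Under these assumptions the advantage grows with the number of devices per layout, which is why the benefit is largest for exactly the LUIs that are hardest to process-simulate.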

To summarize, the automated structure synthesis approach opens up the fol-

lowing new avenues in modeling:

1. For the first time, it enables extensive iterative layout/process-TCAD simu-

lation based refinement in a practical time-frame, as generation of process-

simulated device (PSD) structures is not a bottleneck any more, thereby ad-

dressing the problem depicted in Fig. 4.6(b).

2. It enables quick evaluation of different LUIs in the same process, once process

assumptions are fixed.

3. It also enables quick evaluation of different process recipes on the same LUI,

thereby maintaining overall consistency across processes/layouts, while en-

hancing device-circuit co-design capabilities, and addressing the issues de-

picted in Fig. 4.6(a).

4. It permits independent analysis of FEOL and BEOL components of PSD

structures, as well as modeling of contiguous (FEOL + BEOL) PSD struc-

tures. This is critical to efficient design of experiments for variability and

reliability investigations.

To develop methodologies to realize the flow shown in Fig. 4.7(b), the following

features were targeted:

1. Layout independence: Since process simulators handle layouts right down

to individual masks, our approach is able to handle arbitrary layout features,

and is largely independent of the orientation of devices as well as the process-

development-kit (PDK). This ensures that any layout, irrespective of its un-

derlying nature, e.g., digital, analog or RF, can be easily imported (with the

aid of PDK layer-map files) and analyzed.

2. Process independence: Different process recipes yield different PSD struc-

tures. Hence, our approach incorporates sufficient layers of abstraction to

encapsulate the key features of the process, such as a process file enumer-

ating the material systems/dielectric layers/layer thicknesses used, etc., so

that certain elements controlled/marked by the designer can be ignored dur-

ing structure synthesis. This is critical for evaluating efficiency versus accu-

racy tradeoffs.

3. Technology-node/device independence: In order to perform design space

exploration of FEOL and BEOL components, we architected the synthesis

methodology to be as independent of the underlying devices as possible,

and provided abstractions to ensure that it is configurable at any technol-

ogy node. This has the added advantage of being able to migrate from

older technology nodes to newer ones easily, and perform a wide variety

of tests/optimizations (by simply swapping the technology setup files con-

sisting of device databases, process files, etc.) that are not possible using

the traditional approach. Device independence also implies that our ap-

proach can, in principle, be tailored to structure synthesis using any generic

underlying device in TCAD.

Next, we delve into the building blocks required to realize the flow in

Fig. 4.7(b).

4.3.2 Building blocks of the algorithm

Our core approach consists of the following steps:

• Process characterization (PC): (i) delineation of process zones, (ii) construc-

tion of the device-layout database (DLD) using pre-synthesis transforma-

tions, and (iii) process-feature rulebook (PFRB) generation.

• Layout characterization (LC): (i) layout analysis using the device-recognition-

rule database (DRD) and (ii) generation of lithography-effects database

(LED).

• Structure synthesis (SS): (i) FEOL only, (ii) BEOL only, and (iii) integrated

(FEOL + BEOL).

It should be noted that while PC is semi-manual with a one-time setup cost per

technology, LC and SS are fully automated. They are described next.

(PC) Delineation of process zones: This is the most critical step where, using a first-pass process simulation run, ‘process zones’ are allocated to reduce modeling complexity as much as possible, while preserving modeling accuracy where needed. While defining zones, the following terminology relates to doping/stress profiles: (i) process accurate (PA): if they are precise, (ii) process weakly accurate (PW): if they are moderately accurate, and (iii) process independent (PI): in the event that they are not accounted for or not needed. Similarly, the following terminology relates to physical geometries: (i) geometry accurate (GA): if they are precise and (ii) geometry weakly accurate (GW): if they are moderately accurate. PA doping profiles import the dopant locations and the exact profile obtained from a detailed process simulation. On the other hand, PW doping profiles can be analytic/formula-based with fitting constants, e.g., an approximate Gaussian profile with a characteristic decay length that corresponds to a profile obtained from process-simulated output.
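
As a minimal sketch of what a PW profile could look like, the Python below evaluates a one-dimensional Gaussian and picks its decay length so that it passes through one sample taken from a process-simulated profile. The functional form N(x) = N0·exp(-((x - x0)/L)^2), the single-point fit, and all names are assumptions made for illustration; the text does not prescribe a specific formula.

```python
import math


def gaussian_profile(x_nm, peak_conc, x0_nm, decay_nm):
    """Assumed PW form: N(x) = N0 * exp(-((x - x0) / decay)**2)."""
    return peak_conc * math.exp(-((x_nm - x0_nm) / decay_nm) ** 2)


def fit_decay_length(x_nm, conc, peak_conc, x0_nm):
    """Choose the decay length so the profile passes through one sampled point.

    Requires 0 < conc < peak_conc and x_nm != x0_nm.
    """
    return abs(x_nm - x0_nm) / math.sqrt(math.log(peak_conc / conc))
```

A real flow would more likely fit several sampled points at once; the single-point version only shows where the fitting constant enters.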

As shown in Fig. 4.8, which is a typical FEOL cross-section with metal-1 wiring,

the following zone classifications are found to be useful.

[Figure 4.8 labels: Zone A [PA-GA], Zone B [PA-GW], Zone C [PW-GA], Zone D [PW-GW], Zone E [PI-GA], Zone F [PI-GW].]

Figure 4.8: Delineation of process zones

1. PA-GA: This is used to capture major active device/FET regions, such as zone A, where modeling transport precisely is extremely important. PA-GA zones can also encompass regions around active FETs to capture the effects of stress, proximity, etc., if they are very critical to the parameters being modeled. PA-GA is primarily assigned to a small group of distinct FEOL regions, e.g., one instance of each type of device (low-Vth FET, high-Vth FET, and so on) in a process technology. It could also be used to designate different process corners of a device.

2. PA-GW: This is applicable to regions like zone B, where contacts to devices, etc., need to capture stresses/thermal behavior and can ignore rounding due to lithography. To the first order, we can expect that minor corner rounding in vias and contacts has little effect on parasitics, and ignoring it saves a lot of mesh nodes that are otherwise needed to capture circular/cylindrical shapes during boundary tessellation [134].

3. PW-GA: This is used to model regions like zone C, which are part of the active device layer and lie between devices, serving as shared source/drain regions. Locations like zone C mostly consist of heavily-doped regions without any major gradients in dopant concentrations, thereby allowing them to be modeled as PW.

4. PW-GW: This can model regions like zone D, which lie between active device layers/islands, where process and geometric accuracy can be sacrificed to some extent.

5. PI-GA: While modeling BEOL metals, such as zone E, it is essential to capture exact corner-rounding characteristics for moderately large metal chunks. Hence, geometric input from lithography simulations is needed. Depending on device simulation requirements, process profiles can be completely ignored here, making it PI, else PW can also be used.

6. PI-GW: In BEOL metal areas like zone F, apart from ignoring process simulation data or designating it to be PI, to the first order, minor shape variations in the vertical direction and in the horizontal plane (from lithography simulations) can be ignored, making it GW, thereby saving mesh nodes during boundary tessellation.

From Fig. 4.8, we see that zones can overlap with each other. Hence, a priority order needs to be specified to resolve overlaps: PA-GA > PA-GW > PW-GA > PW-GW > PI-GA > PI-GW, where the zone with higher priority replaces regions in zones with lower priority, should they intersect during structure synthesis. Typically, different processes have different kinds of features that require zone assignment. At the end of this step, a lookup table of assignments, which designates features obtained from process simulation and geometric interactions, is compiled and utilized in the steps that follow.
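
The overlap rule can be sketched in a few lines of Python. The priority list is taken from the text; representing zones as sets of (x, y) grid cells and the function name `resolve_overlaps` are simplifying assumptions, since real zones are 3D regions inside the TCAD tool.

```python
# Priority order from the text: earlier entries win when zones overlap.
ZONE_PRIORITY = ["PA-GA", "PA-GW", "PW-GA", "PW-GW", "PI-GA", "PI-GW"]
RANK = {zone: i for i, zone in enumerate(ZONE_PRIORITY)}


def resolve_overlaps(zone_cells):
    """Map each grid cell to the highest-priority zone covering it.

    zone_cells: list of (zone_type, set_of_cells) pairs, where cells are
    hypothetical (x, y) grid coordinates standing in for real 3D regions.
    """
    owner = {}
    for zone, cells in zone_cells:
        for cell in cells:
            # Keep the zone with the smaller rank (higher priority).
            if cell not in owner or RANK[zone] < RANK[owner[cell]]:
                owner[cell] = zone
    return owner
```

The result does not depend on the order in which zones are supplied, which is the property a lookup table of assignments needs.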

[Figure 4.9 (flow): individual device layouts (nFET, pFET, etc.) and the process recipe feed the process simulator (time ~ f(1), memory ~ g(1)), looping over each distinct device type, device width (only for 3D), and process recipe; the resulting 2D or 3D nFET/pFET structures pass through pre-synthesis transformations into the device-layout database (DLD).]

Figure 4.9: Construction of device-layout database (DLD)

(PC) Construction of the device-layout database (DLD): The proposed synthesis

methodology is based on the key observation that it is necessary to preserve ex-

treme detail only in device regions where interesting phenomena occur. It is im-

portant to note that the definition of a ‘device’ in a PSD structure is broader than

just a single FET/active device. It can encompass an entire region consisting of

many FETs (e.g., matched transistors), which are regarded as a single repeating

unit in larger layouts (the finest granularity is a single FET). Therefore, as per Fig.

4.9, one instance of each possible PA-GA zone in the process technology is created

by passing the corresponding ‘device layouts’ through a process simulator. The

resulting PSD structures, which are either three- or two-dimensional, are absorbed

into the DLD after undergoing certain pre-synthesis transformations that are ex-

plained below.

Pre-synthesis transformations: These are applied to individual PSD structures,

and are defined on a technology node basis. The basic steps are shown in Fig. 4.10:

PA-GA PSD structures undergo device zoning, including domain trimming, and

any rotations, translations, and reflections needed to obtain a complete PA-GA zone

with the correct orientation, after which all contacts that are present are removed,

and the structure is checked into the DLD.
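
The sequence of transformations can be illustrated with a toy Python pipeline. Structures are represented as lists of tagged grid cells, which is an assumption made purely for illustration; the helper names, the 90-degree rotation, and the `contact` material tag are likewise hypothetical and do not correspond to any TCAD tool API.

```python
def extrude(cells_2d, width):
    """Turn 2D (x, y, material) cells into 3D cells by repeating along z."""
    return [(x, y, z, m) for (x, y, m) in cells_2d for z in range(width)]


def trim(cells, xmin, xmax, ymin, ymax):
    """Device zoning: keep only cells inside the PA-GA zone boundary."""
    return [c for c in cells if xmin <= c[0] <= xmax and ymin <= c[1] <= ymax]


def rotate90(cells):
    """Fix orientation: rotate 90 degrees about the z axis, (x, y) -> (-y, x)."""
    return [(-y, x, z, m) for (x, y, z, m) in cells]


def remove_contacts(cells, contact_material="contact"):
    """Drop contact cells so they can be recreated during synthesis."""
    return [c for c in cells if c[3] != contact_material]


def pre_synthesis(cells_2d, width, zone_box):
    """Extrude, trim, reorient, and strip contacts before DLD check-in."""
    cells = extrude(cells_2d, width)
    cells = trim(cells, *zone_box)
    cells = rotate90(cells)
    return remove_contacts(cells)
```

For example, `pre_synthesis([(0, 0, "si"), (1, 0, "contact"), (5, 0, "si")], 2, (0, 2, 0, 0))` keeps only the two silicon cells inside the zone box.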

[Figure 4.10 (flow): process-simulated structure; if 2D, extrude to required width; device zoning (trimming); fix orientation (rotate, translate, reflect); remove contacts; check into the DLD.]

Figure 4.10: Pre-synthesis transformations on PA-GA zones

Since the above process simulations involve a one-time cost with a single device

(with various orientations, Vth classes, process corners, etc.) per process recipe, the

time and memory complexity are greatly reduced, and it is possible to update the

DLD for each iteration of the process recipe on a practical timescale. The cached

PA-GA regions are indexed in the DLD and are amenable for ‘insertion’ into larger

structures. This is accomplished via simple geometrical operations and rules to

stitch doping/stress profiles/mesh entities in non-PA-GA zones during the struc-

ture synthesis steps, with the aid of a PFRB, which is described next.
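
The amortization idea behind the DLD can be sketched as a simple cache in Python. The class name, the key of (device type, width, recipe), and the `simulate` callback standing in for a single-device process simulation are all assumptions for illustration.

```python
class DeviceLayoutDatabase:
    """Cache of single-device structures keyed by (device type, width, recipe)."""

    def __init__(self, simulate):
        # `simulate` is a stand-in for one single-device process simulation.
        self._simulate = simulate
        self._cache = {}
        self.simulations_run = 0

    def get(self, device_type, width_nm, recipe_id):
        """Return the cached structure, simulating only on a cache miss."""
        key = (device_type, width_nm, recipe_id)
        if key not in self._cache:
            self._cache[key] = self._simulate(*key)
            self.simulations_run += 1
        return self._cache[key]
```

With such a cache, a layout that instantiates four identical nFETs and two identical pFETs triggers two simulations instead of six, and a recipe change invalidates only the entries that carry the old recipe identifier.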

[Figure 4.11 (flow): common-case test layouts and process conditions feed the process simulator, producing Structures 1 to 4; a slice generator cuts X-Y, Y-Z, and X-Z slices; global feature abstraction and rulebook generation produce the FEOL/BEOL rulebook, which lists rules (Rule #1, Rule #2, …) per global feature.]

Figure 4.11: Process feature rulebook (PFRB) generation

(PC) Process feature rulebook (PFRB) generation: While PA-GA zones are accounted for through the construction of the DLD, zones PW-GA and PW-GW are

captured using approximate process profiles generated from a rulebook. This stage

is reached when the process technology is reasonably mature and relatively few

changes are expected during fine-tuning of the process recipe. Fig. 4.11 shows

the procedure for generating the PFRB. Several test case layouts having a small

number (two to six) of devices undergo rigorous process simulation to generate

their respective structures, after which a slice generation macro is employed to cre-

ate slices at different locations of each structure and along various planes. Using

information aggregated from interfaces of regions around PA-GA zones, a global

feature rulebook is generated. The PFRB can be created for both FEOL and BEOL

components. It is used by the respective structure synthesizers (described later) to

intelligently assist in the reconstruction of shapes around PA-GA zones.

The PFRB rules encompass three areas, namely, geometry, feature profiles

(dopant/stress information), and meshing. Geometry rules mainly consist of

Boolean add/remove/merge operations. An example of a geometry rule to

produce conformal dielectrics around BEOL metal would be Rule #k1 in Ta-

ble 4.1, where ILD stands for inter-layer dielectrics. Feature profiles, such as

dopant/stress data, are generally analytical/formula-based with suitable fitting

constants to mimic certain process-simulated output profiles. For instance, a rule

to produce a Gaussian doping profile when a layout condition, ‘LC-m’, such as

the edge of a PA-GA zone, is triggered, would be Rule #k2 in Table 4.1. Meshing-

related rules guide the mesh density in intermediate regions between PA-GA zone

submeshes, providing input, such as the maximum and minimum (x,y,z) mesh

spacings. An example rule providing minimum mesh spacings in ‘region-m’

would be Rule #k3 in Table 4.1.

Table 4.1: Process feature rulebook examples

Rule #   Rule description
k1       ILD-layer-n ∩ Metal-layer-m = ILD-layer-n removed in regions where Metal-layer-m exists
k2       if(LC-m), then dopant-placement [loc(LC-m), profile(LC-m)], profile(LC-m) = Gaussian[N0, (x,y,z) = loc(LC-m), decay-length]
k3       if(region-m), then mesh-xmin = l1 nm, mesh-ymin = l2 nm, mesh-zmin = l3 nm

The above methodologies systematically characterize arbitrary processes and set the stage for the analysis of arbitrary layouts in the process, which is discussed next.
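
Before moving on, two of the rules in Table 4.1 can be read as simple operations, sketched here in Python. Regions are modeled as sets of grid cells and the mesh rule as a dictionary lookup; both representations, the function names, and the numeric spacings are assumptions for illustration only.

```python
def apply_rule_k1(ild_cells, metal_cells):
    """Rule k1: remove ILD wherever the metal layer exists (Boolean subtraction)."""
    return ild_cells - metal_cells


def apply_rule_k3(region, mesh_rules):
    """Rule k3: return minimum (x, y, z) mesh spacings in nm, or None if no rule fires."""
    return mesh_rules.get(region)


# Hypothetical rulebook entry: region name -> (l1, l2, l3) in nm.
MESH_RULES = {"region-m": (2.0, 2.0, 5.0)}
```

Rule k2 follows the same pattern, with the layout condition selecting where an analytic dopant profile is placed.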

(LC) Layout analysis: After characterizing the process, the layout to be inves-

tigated is annotated and passed through an automated layout analyzer, as shown

in Fig. 4.12. The layout analyzer is assisted by a device-recognition-rule database

(DRD) where the designer can specify arbitrary Boolean operations between layout

layers to recognize current and new devices. For instance, in Fig. 4.12, the intersection of POLY and ACTIVE regions is automatically indexed as a planar FET, while the intersection of POLY, ACTIVE, and FIN is indexed as a FinFET in the DLD. This step also extracts all planar geometrical information, including device locations, device types, POLY orientation, layout partitions, and doping/PA-GA submesh boundaries that are used by the structure synthesizer.
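
Device recognition by Boolean layer operations can be sketched as follows in Python. The two rules mirror the ones quoted for Fig. 4.12; rasterizing each layer into a set of (x, y) cells, ordering rules from most to least specific, and the name `recognize_devices` are assumptions made for this illustration.

```python
# Hypothetical rule table mirroring the two DRD rules quoted above,
# ordered from most specific to least specific.
DRD_RULES = [
    ("FinFET", ("ACTIVE", "POLY", "FIN")),
    ("planar FET", ("ACTIVE", "POLY")),
]


def recognize_devices(layers):
    """Classify each grid cell by the first (most specific) matching rule.

    layers: dict mapping layer name -> set of (x, y) cells covered by that layer.
    """
    found = {}
    for device, required in DRD_RULES:
        hit = set.intersection(*(layers.get(name, set()) for name in required))
        for cell in hit:
            found.setdefault(cell, device)  # earlier rules take precedence
    return found
```

Ordering the rules by specificity keeps a FinFET region from also being reported as a planar FET, since every FinFET cell satisfies the planar rule too.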

[Figure 4.12 (flow): a general layout with POLY, ACTIVE, and FIN layers enters the layout analyzer, which consults the device-recognition-rule database (DRD; Rule #1: ACTIVE ∩ POLY = planar FET, Rule #2: ACTIVE ∩ POLY ∩ FIN = FinFET) and the DLD, performs layout partitioning/segmentation, and outputs layout information: device types, device locations, device orientation, and doping and submesh boundaries.]

Figure 4.12: Layout analyzer

For the layout analyzer to achieve the above in a process/PDK-independent

manner, it is necessary for the designer to annotate the layout with additional

markers, either manually or through layout scripting languages, such as SKILL

[135]. The pre- and post-annotation stages for the case of a 1×1 6T FinFET SRAM

bitcell are shown in Fig. 4.13. The layers, which are added during layout annota-

tion, would correspond to FET Vth markers, FET process corners for the particular

instantiation of the FET, and contact markers for automated contact creation in the

final synthesized 3D structure.

[Figure 4.13: input layout and annotated layout of the bitcell; labels BL, BLB, WL, VDD, GND; layers POLY, FIN, ACTIVE, METAL-2, METAL-3.]

Figure 4.13: Layout annotation for a 1×1 6T FinFET SRAM bitcell

(LC) Generation of the lithography-effects database (LED): Typically, lithographic effects can be directly captured in the individual masks of the layout and used in process simulation. However, this greatly increases computational complexity, owing to the dense meshes required to tessellate curved surfaces, and the need to re-mesh with each process step to accurately capture stress/dopant behavior. In our framework, only PA-GA zones need to be lithography-accurate and obtained once from process simulation. Other FEOL and BEOL components, which need GA, are captured using corner rounding at locations specified by the LED. Fig. 4.14 summarizes the steps needed for LED generation. Input layouts undergo lithography simulation for each mask and post-lithography simulation features are approximated by rounding radii. This information is indexed in the LED for each layout at different locations and process layers.
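
One way to picture the LED is as a lookup table from (layout, layer, corner location) to a rounding radius. The Python sketch below is a hypothetical data structure along those lines; the class, its method names, and the default of a sharp corner when no entry exists are assumptions, not the actual database format.

```python
class LithoEffectsDatabase:
    """Rounding radii indexed by (layout, layer, corner location)."""

    def __init__(self):
        self._radii = {}

    def add(self, layout, layer, location, radius_nm):
        """Record the rounding radius fitted at one corner of one layer."""
        self._radii[(layout, layer, location)] = radius_nm

    def radius_at(self, layout, layer, location, default_nm=0.0):
        """Return the stored radius, or a sharp corner (0 nm) if none is indexed."""
        return self._radii.get((layout, layer, location), default_nm)

    def corners_for(self, layout, layer):
        """All (location, radius) pairs for one layout/layer, sorted by location."""
        return sorted((loc, r) for (lay, lyr, loc), r in self._radii.items()
                      if lay == layout and lyr == layer)
```

Falling back to a sharp corner corresponds to treating un-indexed features as GW, where minor shape variations are ignored.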

[Figure 4.14 (flow): input layout; lithography simulation; post-litho features; the litho analyzer approximates them by corner rounding/chamfers with radii of curvature; the FEOL/BEOL litho-effects database (LED) stores rounding radii by location and layer, e.g., R1 at (X1,Y1) on Active and R2 at (X2,Y2) on M1.]

Figure 4.14: Generation of lithography-effects database (LED)

(SS) Structure synthesis: This is the final stage in which a structure is stitched together using information gathered in PC and LC stages. Fig. 4.15 shows the architecture of the structure synthesizer, where the input layout is partitioned by the layout analyzer, followed by independent FEOL and BEOL structure synthesis. Here, the DLD plays a central role by supplying layout and PA-GA zone information to the structure synthesizers. The FEOL (BEOL) synthesizer uses the latter, along with FEOL (BEOL) assumptions and the FEOL (BEOL) PFRB, to create an intermediate structure. Depending on the desired accuracy, FEOL (BEOL) lithography effects are introduced, following which a final remeshing step generates the FEOL (BEOL) structure. It is also possible to generate (FEOL+BEOL) structures by combining the intermediate FEOL and BEOL structures in the integrated structure synthesizer.

Dealing with proximity effects: It is important to note that stress proximity ef-

fect as a function of inter-device distance, shallow trench isolation, etc., cannot be

captured accurately using the single-device process simulation approach shown in

Fig. 4.9, which is likely to lead to inaccurate structure synthesis. For simulations

where proximity effects are absolutely essential, two approaches can be taken:

• The PA-GA zone is extended to encompass the entire region of interest to

capture dopant/stress profiles with process-level accuracy. This would make

structure synthesis unattractive, if the region is very large.

• Instead of individual layouts in Fig. 4.9, layouts having a group of three devices undergo process simulation (with common-case inter-device distances assumed as per the technology design rules) to obtain intermediate structures, which undergo pre-synthesis transformations. Here, the device to be mapped to a PA-GA zone should be located at the center. In the device zoning step of Fig. 4.10, only the material within the PA-GA zone boundary around the center device is preserved, and the remaining structure is trimmed. Therefore, the resultant PA-GA zone captures the expected proximity effects on transport in the PA-GA zone without having to store the entire structure. During the structure synthesis phase, even though the stitching of stress/dopant profiles can appear to be non-physical at the PA-GA zone boundaries, proximity effects in important regions of the PA-GA zone (such as the channel, source/drain-body boundary, etc.) are preserved as the PA-GA zones were derived from similar test case process simulations.

[Figure 4.15 (block diagram): the layout analyzer and litho analyzer, together with the DLD, feed the FEOL, BEOL, and integrated structure synthesizers; the FEOL (BEOL) synthesizer takes FEOL (BEOL) process assumptions and the FEOL (BEOL) feature rulebook; each path applies litho effects insertion (from the FEOL/BEOL LED and litho settings) and a re-mesh step, yielding the FEOL structure, the BEOL structure, and the (FEOL + BEOL) structure.]

Figure 4.15: Architecture of the structure synthesizer

Next, we discuss implementation strategies for the structure synthesizers.

[Figure 4.16 (FEOL structure synthesizer, basic steps): active island (AI) generation; active FET merge operations using the DLD; output FEOL structure.]

Figure 4.16: FEOL structure synthesis

4.3.3 Implementation strategies

We implemented a basic FEOL synthesis algorithm, using the steps outlined in Fig.

4.16, for template 32nm bulk/SOI and 22nm SOI processes. Using inputs from the

FEOL PFRB and the layout active layer mask, we sequentially generate and merge

all active islands of the simulation domain. Thereafter, PA-GA zones or active FET

regions that were recognized from the layout are sequentially imported from the

DLD with the appropriate layout-designated widths and translated into another

simulation domain. Then, through a series of merge operations, the two domains

are merged to produce an FEOL region with geometrically accurate boundaries.

Doping/stress profiles in the PA-GA zones are stitched together with those in other

zones (whose profiles are prescribed by the FEOL PFRB), and introduced during

re-meshing of the FEOL structure.

Figure 4.17: BEOL structure synthesis (diagram labels: BEOL structure synthesizer, basic steps; litho effects insertion; inter-layer dielectrics (ILD) generation; incremental metal/via generation with merge operations; BEOL metal with conformal ILD)

BEOL synthesis and integrated structure synthesis occur along similar lines, as

shown in Figs. 4.17 and 4.18. During BEOL synthesis, individual metal layers are

created sequentially and merged to form a contiguous BEOL metal stack. Lithog-

raphy effects are introduced on a layer-by-layer basis via corner rounding from

the LED. Thereafter, the BEOL dielectric stack is generated from the BEOL pro-

cess assumptions and the BEOL PFRB. The metal stack is pushed into it through

a series of merge operations in order to generate a BEOL structure with metal and

conformal ILD. During integrated structure synthesis, the intermediate BEOL and

FEOL structures (generated by the respective synthesizers) are united with over-

lap resolution dictated by priorities specified in the FEOL/BEOL PFRB. In the next


section, we present three case studies to demonstrate the efficacy of the structure

synthesis approach.
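The priority-driven overlap resolution described above can be pictured with a toy example. The sketch below is only an illustration, not the Sentaurus Structure Editor flow used in this work: it assumes a hypothetical voxel representation in which each structure is a dictionary from grid cell to material name, and a hypothetical `PRIORITY` table standing in for the priorities that the FEOL/BEOL PFRB would specify.

```python
# Toy illustration of priority-based overlap resolution when uniting two
# structures. The voxel dictionaries and the PRIORITY table are hypothetical
# stand-ins for the FEOL/BEOL structures and the PFRB priorities.

PRIORITY = {"metal": 3, "silicon": 2, "oxide": 1}  # assumed ordering


def unite(feol, beol, priority=PRIORITY):
    """Merge two {cell: material} maps; on overlap keep the higher priority."""
    merged = dict(feol)
    for cell, material in beol.items():
        current = merged.get(cell)
        if current is None or priority[material] > priority[current]:
            merged[cell] = material
    return merged


feol = {(0, 0, 0): "silicon", (1, 0, 0): "oxide"}
beol = {(1, 0, 0): "metal", (0, 0, 0): "oxide", (1, 0, 1): "oxide"}
integrated = unite(feol, beol)
```

In this sketch, BEOL oxide never displaces FEOL silicon, while BEOL metal replaces FEOL oxide where the two overlap; a different priority table would yield a different integrated structure.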

Figure 4.18: Integrated structure synthesis (diagram labels: integrated structure synthesizer, basic steps; doping/sub-mesh placement; merge/resolution operations)

4.4 Structure synthesis case studies

The methodologies outlined in Section 4.3 have been implemented in a plugin tool

for the state-of-the-art Sentaurus TCAD tool suite [136], in order to leverage ad-

vanced process simulators like SProcess/TSuprem4 [137], as well as the Sentau-

rus Structure Editor [134] for structure synthesis. From a validation perspective,

we applied the structure synthesis approach to capacitance extraction experiments

(Appendix A), and the results have been very promising. In this section, we com-

pare capacitance extraction on synthesized structures, versus PSD structures in

Case 1, and characterize the scaling behavior of our implementation in Case 2.


Case 1: Capacitance extraction for 32nm bulk 6T SRAM – Process-simulated ver-

sus synthesized structures

To verify the efficacy of our approach, we simulated a planar 6T SRAM layout

with 30nm gate length and 112.5nm poly-to-poly pitch in a 32nm bulk process [138].

Fig. 4.19 shows the output at the intermediate steps involved in the gate-last pro-

cess consisting of trench device-isolation, formation of high-k dielectrics, polysil-

icon gate formation, source/drain formation with p-FET SiGe and n-FET raised

Si pockets, salicide formation, interlevel deposition/polish, removal of polysilicon

gates, dual metal-gate deposition, and finally, contact formation.

Figure 4.19: Structure formation during a planar 6T SRAM process simulation: (a) trench device isolation, (b) formation of gate stack, (c) source/drain formation with spacers, (d) contact and via formation, and (e) final structure with doping

Following the methodology outlined in Section 4.3, we assigned PA-GA zones,

performed process simulation on individual FETs, and constructed the DLD. Next,

Table 4.2: Resource usage: Process simulation vs. structure synthesis

Metric              Process simulation   Structure synthesis
Total CPU time      75 hrs               6 hrs (synthesis + meshing) + 11.5 hrs (DLD construction)
Memory              64 GB                12 GB (dominated by DLD construction)
Disk space          6 GB                 2 GB
Number of threads   8                    1 (synthesis) + 8 (DLD construction)

we generated the FEOL PFRB and used it for structure synthesis. The synthesized

structure is shown in Fig. 4.20(a), where doping/stress profiles are accurate only

in the FET regions, and moderately accurate in the bulk. Table 4.2 shows the re-

sources consumed for both cases. Brute-force process simulation on the LUI is con-

siderably slower than synthesis (which has a one-time cost of DLD construction).

Maximum memory usage and disk space required are also considerably lower.

This suggests that automated structure synthesis could be leveraged to prune the

design space quickly, and full 3D process simulation of layouts can be performed

only on the finalized candidates if necessary.
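The CPU-time entries of Table 4.2 can be turned into a rough amortization estimate. The sketch below assumes, purely for illustration, that every additional layout costs the same 75 hrs of process simulation or 6 hrs of synthesis as the 6T SRAM case, and that the 11.5 hrs of DLD construction is paid once and reused; neither assumption is verified beyond this single data point.

```python
# CPU-time figures from Table 4.2 (hours).
PROCESS_SIM_HRS = 75.0   # brute-force process simulation of the layout
SYNTHESIS_HRS = 6.0      # structure synthesis + meshing
DLD_HRS = 11.5           # one-time DLD construction


def speedup(num_layouts):
    """CPU-time ratio of process simulation to synthesis for num_layouts
    layouts, assuming identical per-layout cost and a shared DLD."""
    brute_force = PROCESS_SIM_HRS * num_layouts
    synthesized = DLD_HRS + SYNTHESIS_HRS * num_layouts
    return brute_force / synthesized


first_layout = speedup(1)    # 75 / 17.5, about 4.3x
ten_layouts = speedup(10)    # 750 / 71.5, about 10.5x
```

Under these assumptions the ratio approaches 75/6 = 12.5 as the DLD cost is amortized over more layouts.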

We performed transport analysis based 3D-TCAD capacitance extraction ex-

periments on both the process-simulated structure and the synthesized structure

at five different extraction frequencies. Fig. 4.20(b) shows that the percentage
error in the extracted bitline capacitance (CBL) between the two is negligible above
10 kHz and peaks at about 2% at 100 Hz. We also performed hold static
noise margin (HSNM) and read static noise margin (RSNM) experiments for the
process-simulated and synthesized structures, with VDD = 1 V. From Figs. 4.21(a)

and 4.21(b), it can be seen that the butterfly plots extracted from the structure-

synthesized 6T cell are nearly identical to the process-simulated 6T cell. This shows

that the proposed approach is reasonable and practical.

Figure 4.20: (a) Synthesized planar 6T SRAM structure, with the pull-up (pFET), pass-gate (nFET), and pull-down (nFET) devices labeled, and (b) CBL extraction error percentage versus Log10(extraction frequency)

Figure 4.21: Process-simulated versus synthesized 6T SRAM cells: (a) hold static noise margin (HSNM), and (b) read static noise margin (RSNM). Both panels plot VR (V) versus VL (V) for the process-simulated and the structure-synthesized 6T cell.

Case 2: Scaling behavior for 22nm SOI FinFETs

We leveraged the technology/device-independent nature of the proposed ap-

proach to investigate the scaling properties of our structure synthesizer imple-

mentations for 22nm SOI FinFET devices and circuits. We synthesized four dif-

ferent configurations of 6T FinFET SRAMs (Fig. 4.22) consisting of (1×1), (2×2),

(3×3) and (4×4) bitcells, directly from annotated layouts, such as the one shown in

Fig. 4.13. In Fig. 4.23, a fully meshed 3×3 structure with over 8×10^6 mesh nodes is

shown, which consumes 48 hours to extract capacitances between every net-to-net

pair in Sentaurus Device. Ring oscillators consisting of 5/11/17/23 stages were

also synthesized (Fig. 4.24). In each case, we measured the FEOL/BEOL/(FEOL +

BEOL) integration structure synthesis time as the time duration from layout anal-

ysis up to the beginning of the re-mesh operations, with a maximum memory al-

location of 32GB.

Figure 4.22: Synthesized 6T FinFET SRAM bitcell configurations: (a) 1×1 cell, (b) 2×2 cell, (c) 3×3 cell, and (d) 4×4 cell

Figure 4.23: 3×3 6T FinFET SRAM bitcell structure with mesh

From Fig. 4.25, we can see that (FEOL+BEOL) integration dominates the total

runtime as a considerable amount of computation is performed in overlap reso-

lution during integrated structure synthesis. Also, FEOL synthesis is faster than

BEOL synthesis. This is highly dependent on the BEOL features/metal density

and FEOL device complexity, and can vary dramatically from layout to layout.

This is corroborated by the scaling behavior of ring oscillator structures, as shown

in Fig. 4.26, where BEOL synthesis time is similar to FEOL synthesis time owing to

the fact that metal-2/metal-3 density is lower in the ring oscillator layouts. From

Figs. 4.25 and 4.26, for a reasonable number of FinFETs (#FinFETs), FEOL, BEOL

as well as (FEOL+BEOL) synthesis times can be seen to scale very well using the

proposed approach.
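Because Figs. 4.25 and 4.26 are log-log plots, the slope of each curve is the exponent k in the relation time ∝ (#FinFETs)^k. The sketch below shows one way such an exponent could be estimated by least squares. The data here are synthetic, generated from an assumed power law with k = 1.5 purely to exercise the function, and the FinFET counts assume six single-fin devices per bitcell; neither is taken from the measured curves.

```python
import math


def loglog_slope(counts, times):
    """Least-squares slope of log10(time) versus log10(count)."""
    xs = [math.log10(c) for c in counts]
    ys = [math.log10(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic example only: t = 2 * n**1.5 for 1x1 ... 4x4 arrays,
# assuming 6 FinFETs per bitcell (hypothetical).
counts = [6, 24, 54, 96]
times = [2.0 * n ** 1.5 for n in counts]
exponent = loglog_slope(counts, times)
```

An exponent close to 1 would indicate near-linear scaling of synthesis time with device count; larger values indicate super-linear growth.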

Figure 4.24: Synthesized FinFET ring oscillator configurations: (a) RO-5, (b) RO-11, (c) RO-17, and (d) RO-23

Figure 4.25: 6T FinFET SRAM: Synthesis time (in sec.) versus number of FinFETs. Axes: Log10(structure synthesis time) versus Log10(# FinFETs); curves: FEOL synthesis, BEOL synthesis, (FEOL+BEOL) integration, and total (FEOL+BEOL) synthesis; data points: (1×1), (2×2), (3×3), and (4×4).

4.5 Discussion

While the above case studies demonstrate the efficacy of the structure synthesis ap-

proach, it is important to note that we have solved only the first part of the prob-

lem, as generic device simulation experiments can still be very time-consuming.

Figure 4.26: FinFET ring oscillator: Synthesis time (in sec.) versus number of FinFETs. Axes: Log10(structure synthesis time) versus Log10(# FinFETs); curves: FEOL synthesis, BEOL synthesis, (FEOL + BEOL) integration, and total (FEOL + BEOL) synthesis; data points: RO-5, RO-11, RO-17, and RO-23.

(Capacitance extraction using 3D-TCAD device simulation is, in general, very fast

in comparison to DC/transient simulations.) However, the automated nature of

structure synthesis would help engineers utilize device simulators more efficiently,

as the work formerly done by an engineer (in setting up a 3D-TCAD deck for an

LUI) over a span of 3-4 weeks can be accomplished in a few minutes. Thus, struc-

ture synthesis makes the whole TCAD cycle more efficient by cutting down the

most time-consuming portion of the cycle, namely, manual re-coding of the 3D-

TCAD deck for a new LUI/process.

The current work also highlights the need to provide high-level abstrac-

tions/interfaces to TCAD engineers in order to maintain consistency across

technology nodes, layouts, and processes. Here, automated structure synthesis is

akin to logic synthesis in the circuit world (Fig. 4.27), where a gate-level netlist can

be derived from high-level synthesizable hardware description language (HDL)

code using a set of design libraries. The process of manually arriving at a gate-level

netlist from HDL code can be extremely cumbersome. In an analogous manner,

the absence of an automated structure synthesizer has been a major impediment

to TCAD engineers, and has been addressed for the first time in the current work.

Figure 4.27: Logic synthesis flows are the circuit-world analogs of Fig. 4.7(b) (diagram labels: RTL/HDL description; design libraries; logic synthesis; gate-level netlist)

4.6 Chapter summary

Analyzing and optimizing nanoscale devices and circuits using 3D-TCAD is

emerging as a necessity at lower technology nodes. Here, obtaining accurate 3D-

TCAD structures corresponding to LUIs via 3D process simulation is impractical,

as the latter is not amenable to iterative layout-TCAD optimization. In this work,

we proposed and validated an automated structure synthesis framework that

substantially reduces time and memory complexity during 3D-TCAD structure

generation. We circumvented the 3D process simulation barrier by preserving

accuracy, when needed, using individual process-simulated blocks and stitching

them together using layout information, technology assumptions, and PFRBs to

generate larger structures. Capacitance extraction experiments for comparing

structure synthesis with process-simulated layouts indicate that the methodology

is an excellent substitute for 3D process simulation in 3D-TCAD based analysis
of large LUIs, for which manual coding and process simulation runtime are

prohibitively expensive.


Chapter 5

Transport analysis based 3D-TCAD

Parasitic Capacitance Extraction in

Emerging Technologies

In this chapter, we focus on the problem of accurate parasitic capacitance extrac-

tion for circuits in highly-scaled CMOS technologies, which is listed as an issue

in the 2011 ITRS modeling and simulation roadmap [80] under Section 3.5. The

chapter is divided into three sections. Section 5.1 outlines the need for transport

analysis based 3D-TCAD parasitic capacitance extraction. Thereafter, Section 5.2

deals with hardware validation of the transport analysis approach in an experi-

mental 32nm SOI process. Finally, in Section 5.3, we explore parasitic capacitances

in emerging multi-gate devices at the 22/14/10nm technology nodes, using the

above approach.


5.1 The need for transport analysis based parasitic ca-

pacitance extraction

With technology scaling, extraction of layout-dependent parasitic capacitances is

becoming extremely important. In this section, we establish the need for a true

3D transport analysis based approach for highly scaled circuits, by leveraging the

structure synthesis methods from Chapter 4.

5.1.1 Introduction

Capacitance extraction is a key element of state-of-the-art industrial VLSI flows

affecting design timing, power, and stability of circuits. Current methods in dig-

ital/analog/RF design rely on field solvers [139] [140] [141] [142], which model

BEOL dielectrics and metal, and compact models which capture FEOL related

capacitances. While compact models can account for certain layout-dependent

effects, they are oblivious to the myriad possibilities in which back-end features

can interact with the active semiconductor device layer (as well as the numerous

shapes in which the active layer may be patterned). Field solvers treat FEOL sil-

icon at best as a material of uniform conductivity or as a lossy material, thereby

ignoring its nonlinear nature for arbitrary doping profiles.

Owing to the above, it is questionable whether the total capacitance predicted

at each node (as the sum of the FEOL and BEOL components) accurately captures

the actual capacitances seen in highly scaled circuits. Capacitance misprediction

is significant for compact yield-limiting circuits like large SRAM and eDRAM ar-

rays, where a minor difference in estimation of a few percent per bitcell coupled

with a large column height, can shift the failure point of operation. In this section,

we clarify the above for yield-critical single-/dual-port 6T SRAM bit cells in an ad-


vanced sub-32nm IBM SOI process through transport analysis based 3D-TCAD [81]

capacitance extraction.

5.1.2 Transport analysis based capacitance extraction

In general, for an $N$-terminal contacted device structure, the phasor terminal
voltages $V_k$, $k = 1, \ldots, N$, are related to the phasor terminal currents $I_k$, $k = 1, \ldots, N$,
through the $N \times N$ admittance matrix $Y$ such that $I = YV$, where $V = [V_1, \ldots, V_N]^T$
and $I = [I_1, \ldots, I_N]^T$. Elements of $Y$, $\tilde{Y}_{ab}$, are determined by individually exciting
each terminal with $V_b$ so that

$$\tilde{Y}_{ab} = \left.\frac{I_a}{V_b}\right|_{V_k = 0,\ k \neq b} \qquad (5.1)$$

The conductance matrix $G$ and capacitance matrix $C$ of the structure are obtained
from $G = \mathrm{Re}\,Y$ and $\omega C = \mathrm{Im}\,Y$, where $\omega$ is the excitation frequency. In the case of

highly scaled circuits, in order to determine the responses Ik accurately at each ω, it

is essential to treat the FEOL silicon as a semiconductor or solid-state plasma with

mobile carriers [143], i.e., obtain the solution of the coupled system of Poisson and

carrier continuity equations for slight perturbations around the DC bias point. At

each mesh node i of the device structure, the Poisson, electron, and hole continuity

equations can be recast into the form [144]:

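$$F_{\phi i}(\phi, n, p) = 0$$

$$F_{n i}(\phi, n, p) - \frac{\partial G_{n i}(n)}{\partial t} = 0$$

$$F_{p i}(\phi, n, p) - \frac{\partial G_{p i}(p)}{\partial t} = 0 \qquad (5.2)$$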

where F and G are nonlinear functions of φ,n, p, which represent matrices of po-

tential, electron, and hole density, respectively. The AC system of equations is


obtained by substituting $\zeta(t) = \zeta_0 + \zeta e^{j\omega t}$, with $\zeta = (\phi, n, p)$ and $\zeta_0$ as the steady-state
solution, in Eq. (5.2). Using the Taylor expansion with only linear terms
yields [144]:

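$$\sum_j \begin{bmatrix}
\dfrac{\partial F_{\phi i}}{\partial \phi_j} & \dfrac{\partial F_{\phi i}}{\partial n_j} & \dfrac{\partial F_{\phi i}}{\partial p_j} \\[2ex]
\dfrac{\partial F_{n i}}{\partial \phi_j} & \left(\dfrac{\partial F_{n i}}{\partial n_j} - j\omega \dfrac{\partial G_{n i}}{\partial n_j}\right) & \dfrac{\partial F_{n i}}{\partial p_j} \\[2ex]
\dfrac{\partial F_{p i}}{\partial \phi_j} & \dfrac{\partial F_{p i}}{\partial n_j} & \left(\dfrac{\partial F_{p i}}{\partial p_j} - j\omega \dfrac{\partial G_{p i}}{\partial p_j}\right)
\end{bmatrix}
\begin{bmatrix} \phi_j \\ n_j \\ p_j \end{bmatrix} = 0 \qquad (5.3)$$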

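With the appropriate boundary conditions, the global AC system is constructed
from Eq. (5.3) and solved to obtain the solution vectors $[\phi_j\ n_j\ p_j]$. The ac current
densities are computed as [83]:

$$\vec{J}_n = \sum_{\zeta = \phi, n, p} \left.\frac{\partial \vec{J}_n}{\partial \zeta}\right|_{DC} \cdot \zeta, \qquad \vec{J}_p = \sum_{\zeta = \phi, n, p} \left.\frac{\partial \vec{J}_p}{\partial \zeta}\right|_{DC} \cdot \zeta \qquad (5.4)$$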

Using the above, the total phasor terminal currents of the device are calculated and

admittance matrix Y is obtained from Eq. (5.1).
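Once Y is available, the split into conductance and capacitance is a purely element-wise operation: G is the real part of Y and C is the imaginary part divided by ω. The sketch below illustrates this with a hypothetical two-terminal element (1 fF in parallel with 1 µS at 1 MHz); the function name and the test values are illustrative assumptions and do not correspond to any simulator API.

```python
import math


def split_admittance(Y, omega):
    """Return (G, C) with G = Re(Y) and C = Im(Y) / omega, element-wise.

    Y is a list of rows of complex admittances in siemens, and omega is
    the angular excitation frequency in rad/s.
    """
    G = [[y.real for y in row] for row in Y]
    C = [[y.imag / omega for y in row] for row in Y]
    return G, C


# Hypothetical two-terminal element: 1 fF in parallel with 1 uS at 1 MHz.
omega = 2 * math.pi * 1e6
g, c = 1e-6, 1e-15
y = complex(g, omega * c)
Y = [[y, -y], [-y, y]]
G, C = split_admittance(Y, omega)
```

The diagonal entries of C recover the 1 fF self-capacitance, and the off-diagonal entries carry the opposite sign, as expected for a coupling term between the two terminals.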

The above methodology captures field-carrier interactions and is capable of ac-

counting for the inherent nonlinearity of the active semiconductor layer under all

bias conditions. This is quantified via a simple experiment using field solver (FS)

and transport analysis based TCAD capacitance extraction on a metal wire (contact

A) running over an active semiconductor region with contact B (cross-sections in

Figs. 5.1(a) and 5.1(b) having different doping profiles, with peak doping ND and

separation D). From Fig. 5.2(a), when D ≥ 0.3µm, FS and TCAD predictions for

high/low ND match closely. This reflects the sufficiency of FS based extractions at

higher technology nodes, with large separation between FEOL regions and BEOL

metal. However, as D decreases, FS overestimates capacitance considerably, even

at high ND. From Fig. 5.2(b), as expected, FS fails to track VAB changes and over-


estimates capacitance even at zero bias. The results in Figs. 5.2(a) and 5.2(b) are

very dependent on the doping profile in Figs. 5.1(a) and 5.1(b), respectively, and

can vary widely, indicating that FS based extraction will be inaccurate at highly

scaled technology nodes.
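A one-dimensional textbook estimate gives some intuition for why treating silicon as an ideal conductor overestimates capacitance at small D. The sketch below is not the 2D/3D simulation reported here: it assumes a parallel-plate stack of pure oxide of thickness D over uniformly doped silicon at flat band, and models the silicon response as a capacitance ε_Si/L_D in series with the dielectric, where L_D is the Debye length. All function names and the chosen doping values are illustrative assumptions.

```python
import math

Q = 1.602176634e-19       # elementary charge (C)
EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
KT_OVER_Q = 0.02585       # thermal voltage near 300 K (V)
EPS_OX, EPS_SI = 3.9, 11.7  # relative permittivities of SiO2 and Si


def debye_length(n_d_cm3):
    """Debye length (m) for uniform doping n_d_cm3 given in cm^-3."""
    n_d = n_d_cm3 * 1e6  # convert cm^-3 to m^-3
    return math.sqrt(EPS_SI * EPS0 * KT_OVER_Q / (Q * n_d))


def cap_ratio(d_m, n_d_cm3):
    """Flat-band capacitance including the silicon response, divided by
    the value obtained when silicon is treated as an ideal conductor."""
    c_ox = EPS_OX * EPS0 / d_m                    # per unit area
    c_si = EPS_SI * EPS0 / debye_length(n_d_cm3)  # per unit area
    c_series = c_ox * c_si / (c_ox + c_si)
    return c_series / c_ox


near_low_doping = cap_ratio(50e-9, 1e16)    # small D, low doping
far_low_doping = cap_ratio(300e-9, 1e16)    # large D, low doping
near_high_doping = cap_ratio(50e-9, 1e18)   # small D, higher doping
```

In this simplified picture the ratio moves toward 1 as D grows or as doping increases, which is qualitatively in line with the trend in Fig. 5.2(a); it does not capture bias dependence, nitride layers, or fringing fields.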

Figure 5.1: Cross-sectional view of a metal wire running over an active semiconductor region with two arbitrary doping profiles, (a) and (b). Labeled regions: contact A, contact B, oxide, nitride, silicon, separation D, and peak doping (ND); color scale: doping (cm^-3).

Figure 5.2: Comparison between FS and TCAD extracted capacitance CAB under different conditions, ω/2π = 1MHz: (a) CAB (F) versus D (µm), using the cross-section in Fig. 5.1(a), and (b) CAB (F) versus VAB (V) at D = 0.05µm, using the cross-section in Fig. 5.1(b). Legend entries include TCAD with ND = 1e16 cm^-3 and FS.

5.1.3 Methodology and results

In this work, for the first time, transport analysis based 3D-TCAD capacitance ex-

traction is performed on multi-cell SRAM blocks. A layout-independent, auto-

mated TCAD structure generator (described in Chapter 4) was used to generate

3D meshed structures using the Synopsys Sentaurus TCAD tool suite [81]. The

structure generator incorporated a single set of FEOL and BEOL feature assump-

tions for an advanced sub-32nm IBM SOI process. This was applied to single- and

dual-port 6T SRAM layouts to create individual features embedded in conformal

interlayer dielectrics for generating the BEOL structures [e.g., Fig. 5.3(a)]. Indi-

vidual FETs of each type were imported from 2D process simulations performed

in TSuprem4 [137], and were extruded, trimmed to the layout-specific shapes, and

placed at the appropriate locations in the FEOL structure with the correspond-

ing doping profiles [e.g., Fig. 5.3(b)]. Finally, the FEOL and BEOL structures

were merged to generate an accurate 3D representation of the bitcells with doping

profiles, contacts, and meshing. For each layout, two structures were generated:

(FEOL + BEOL) and (only FEOL). The difference in capacitances, i.e., (FEOL +

BEOL) capacitance − (FEOL) capacitance, represents the computed BEOL compo-

nent of the capacitance from ac transport analysis in TCAD. The BEOL component

was inserted into circuit schematics/netlists and used in SPICE simulations to in-

vestigate bitcell stability and performance.

Two types of SRAM cells, Type I [single-port 6T thin cell, Fig. 5.3(c)] and Type

II [dual-port 6T thin cell, Fig. 5.3(d)], were modeled using transport based analysis

(TCAD with zero bias conditions, ω/2π = 3 GHz) and FS. A series of multi-cell

blocks were generated: 1× 1, 2× 1, and 3× 1, where the bit lines (BL, BLB) are

shared across the cells. For each configuration, the bit (word) line capacitance

is computed as the total capacitance at the node for the entire structure divided

by the number of cells sharing the bit (word) line. From Fig. 5.4, we see that the

Figure 5.3: 32nm planar SOI Type I and II SRAM structures: (a) Type I 3×3 (BEOL), (b) Type I 3×1 (FEOL), (c) Type I 1×1, and (d) Type II 1×1

computed BEOL word line (WL) TCAD capacitances differ by (38%,34%,34%) in

comparison with the FS extracted capacitances for Type I cells for (1× 1, 2× 1,

3×1) configurations, which is considerable. Bit line capacitances differ on average

by (11%,8%,7%), respectively. Increasing the number of cells in the block refines

the capacitances, showing opposite trends for bit lines vs. word lines, thereby

highlighting the inadequacy of single cell simulations that suffer from edge effects

(in the absence of periodic boundary conditions which are difficult to use as they

degrade solver convergence).
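The bookkeeping behind these numbers is simple and can be sketched as follows: the BEOL component is the (FEOL + BEOL) capacitance minus the FEOL-only capacitance, the per-cell value is the total divided by the number of cells sharing the line, and the TCAD and FS results are compared as a percentage. The capacitance values below are hypothetical, and taking the FS value as the reference in the percentage is an assumption, since the sign convention is not spelled out in the text.

```python
def beol_component(c_full, c_feol):
    """BEOL capacitance: (FEOL + BEOL) structure minus FEOL-only structure."""
    return c_full - c_feol


def per_cell(c_total, cells_sharing_line):
    """Average line capacitance per cell for a multi-cell block."""
    return c_total / cells_sharing_line


def percent_difference(c_tcad, c_fs):
    """Difference of the TCAD value relative to the FS value, in percent."""
    return 100.0 * (c_tcad - c_fs) / c_fs


# Hypothetical bit line totals (farads) for a 3x1 block.
c_bl_total = beol_component(4.5e-16, 2.7e-16)
c_bl_cell = per_cell(c_bl_total, 3)
diff = percent_difference(c_bl_cell, 0.65e-16)
```

With these made-up inputs, the per-cell BEOL bit line capacitance is 0.6e-16 F and the TCAD value is about 7.7% below the assumed FS value.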


Figure 5.4: Type I computed BEOL capacitances (TCAD vs. FS)

Figure 5.5: Type II computed BEOL capacitances (TCAD vs. FS)

Simulating blocks of M× 1 (1×M) cells (M > 2), which share bit (word) lines,

improves the average bit (word) line capacitance estimates as M increases, as they

are more representative of the true environment around the bit (word) lines and

suppress the contribution of edge effects. From Fig. 5.5, we see that for Type II

cells, the difference between FS and TCAD extracted word line and bit line capac-

itances is on average (19%,6%,7%) and (7%,6%,9%), respectively, and not as high

as for Type I cells, showing that the assignment of bit and word lines to different

metal layers of the bit cell layout is critical.

The TCAD and FS predicted BEOL bit line capacitances were used in SPICE

netlists/simulations of the bitcells. Figs. 5.6(a) and 5.6(b) highlight the worst-case

Figure 5.6: Performance difference in Type I & II cells during read operations: (a) Type I, and (b) Type II

bit line delay (50% discharge) assuming a column height of 32 bitcells, versus the

normalized bitcell sigma, defined as the fractional change in the bitcell FET Vths
occurring in the worst-case combination for pull-up, pass-gate, and pull-down devices
during read/write operations. For Type I cells, the TCAD-predicted bit line capacitance
is consistently lower than the FS estimate, leading to unacceptably high

delay differences for the 1× 1 case. These differences decrease for the multi-cell

cases. In Type II cells, the predicted TCAD bit line capacitance is lower for the

1× 1 case and higher for the 3× 1 case, thereby causing positive as well as neg-

ative delay differences with respect to FS. Under typical non-zero DC operating

conditions, which are cumbersome to simulate via 3D-TCAD owing to the com-

putational cost/time, it is very likely that delay differences will be higher. It is

also important to note that the voltage/frequency-dependent contributions from
the FEOL compact models often fail to capture shape-specific BEOL-to-FEOL silicon
interactions in generic layouts.
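To see how a per-cell capacitance difference propagates to bit line delay, a first-order constant-current estimate is often enough: t ≈ C_BL · ΔV / I_read, with C_BL equal to the per-cell capacitance times the column height and ΔV equal to 50% of the supply. This is only a back-of-the-envelope sketch, not the SPICE analysis used in this work; the read current, supply voltage, and per-cell capacitances below are hypothetical.

```python
def bitline_delay(c_per_cell, column_height, vdd, i_read, swing_fraction=0.5):
    """First-order discharge time of a bit line at constant read current.

    c_per_cell lumps the per-cell bit line capacitance being compared (F),
    column_height is the number of bitcells on the column, and
    swing_fraction is the fraction of vdd the bit line must discharge.
    """
    c_bl = c_per_cell * column_height
    return c_bl * swing_fraction * vdd / i_read


# Hypothetical inputs: 32-cell column, 1.0 V supply, 20 uA read current.
t_fs = bitline_delay(0.65e-16, 32, 1.0, 20e-6)
t_tcad = bitline_delay(0.60e-16, 32, 1.0, 20e-6)
relative_delay = t_tcad / t_fs
```

In this linear model the relative delay difference equals the relative capacitance difference, so a few percent of misprediction per bitcell carries over directly to the column delay.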

Fig. 5.7 shows the effect of using FS and TCAD bit and word line capacitances

in SPICE simulations to compute cell read stability, which is defined as the maxi-

mum bitcell sigma variation that can be tolerated before read upsets occur, i.e., bit

line charge leakage into internal cell nodes is sufficient to flip the bitcell state dur-


ing a read operation. The normalized read stability for Type I and II cells is lower

for the FS 1× 1 case. This changes with the 3× 1 case, with the TCAD prediction

being higher for Type I and lower for Type II.

Figure 5.7: Type I and Type II Read stability (TCAD vs. FS)


In this work, a detailed comparison between transport analysis based 3D-TCAD

and FS based capacitance extraction for sub-32nm SRAM blocks of varying cell

counts and port configurations was performed. Simulation results from these

structures showed differences up to 11% (38%) in bit line (word line) capacitances,

which arise because transport in FEOL silicon is not properly accounted

for in the pure electrostatic approach. Also, the inadequacy of single cell TCAD

modeling (which leads to inaccurate performance and stability estimates) was

quantified from multi-cell extractions.


5.2 Hardware-assisted predictive capacitance extrac-

tion in 32nm SOI 6T SRAMs

In order to demonstrate the efficacy of structure synthesis combined with transport

analysis based extraction, we validated the methodologies with hardware data ob-

tained from a 32nm SOI process [145]. This is discussed next.

5.2.1 Introduction

Macros consisting of two flavors of thin-cell 6T SRAM arrays, namely 6T1 and 6T2,

were fabricated in an experimental IBM 32nm SOI HKMG technology (Fig. 5.8).

Figure 5.8: Thin-cell 6T SRAM array SEM top view showing HKMG n-/p-FETs

Using test structures for total bit line capacitance extraction, we obtained data

samples from 61 wafers at 10 locations per wafer. Intra-wafer measurements in-

dicated very little variation in CBL for both 6T1 and 6T2 (Fig. 5.9). However, the

inter-wafer CBL spread, shown in Fig. 5.10, was astonishingly high, with the max-

imum CBL being 56% higher than the minimum for 6T1. For 6T2, similar results

were obtained with a maximum to minimum spread of 47%. In order to pinpoint

Figure 5.9: Measured intra-wafer CBL for (a) 6T1, and (b) 6T2

the major source of the spread, it was essential to determine if it originated from

FEOL or BEOL processing.

5.2.2 Methodology and results

We performed iterative BEOL analysis (IBA) and iterative FEOL analysis (IFA)

using the structure synthesis approach. We started with IBA, where back-end

Figure 5.10: Measured inter-wafer CBL for (a) 6T1, and (b) 6T2

process assumptions and tolerances were obtained from scanning electron micro-


Figure 5.11: Synthesized (FEOL+BEOL) structure for the 6T1 SRAM bitcell

scope (SEM) snapshots and were utilized to generate several 6T1 BEOL instances

with varying metal, via, contact, and poly heights using the BEOL synthesizer.

These were combined with identical nominal FEOL instances to generate inte-

grated (FEOL+BEOL) instances, such as the one shown in Fig. 5.11, using the

(FEOL+BEOL) synthesizer. From Fig. 5.12, we see that inter-wafer variation in

BEOL parameters, which were subject to tight tolerances, failed to explain the large

spread in CBL.

Next, we moved to IFA. FEOL process assumptions were corroborated from

measured CGS-VGS data [Fig. 5.13(a)] and capacitance extraction simulations on

the multi-finger nMOS/pMOS capacitor test structures [Fig. 5.13(b)], which were

generated from their corresponding layouts.

On the FEOL side, since junction capacitance is the major contributor to CBL, we

examined process factors such as p-well dose, which affect junction capacitance.

Several FET process simulations were performed to obtain a variety of candidate

profiles by varying the p-well dose using the FEOL synthesizer. After synthesizing

Figure 5.12: Effect of variation in BEOL parameters (subject to intra-wafer tolerances) on CBL and CWL for 6T1. The figure tabulates the difference in CWL and the difference in CBL, with respect to the base case, for five BEOL configurations: 1) 7.5% decrease in poly height; 2) config 1) + 10% decrease in metal-2 height; 3) config 2) + 14% decrease in contact height; 4) config 3) + 21% decrease in via-1 height; 5) 33% decrease in metal-3 width. All tabulated differences lie between -4.22% and +0.52%.

integrated (FEOL+BEOL) instances, a characteristic curve of CBL vs. implanted p-

well dose [Fig. 5.14(a)] was constructed from transport analysis based extraction

in TCAD. From Fig. 5.14(a), we see that minor variations in the p-well dose are

sufficient to cause the CBL spread in Fig. 5.10. Thereafter, using hardware data

from 6T1, the p-well dose distribution of the process was computed [Fig. 5.15(a)],

which was likely to be the single largest contributor to the CBL spread.

To validate the above, we applied the p-well dose distribution to the compan-

ion 6T array in the same process, namely 6T2, and predicted its expected CBL dis-

tribution from its characteristic curve [Fig. 5.14(b)]. Fig. 5.15(b) shows that the

measured data for 6T2 corresponds very well with the predicted CBL distribution,

suggesting that FEOL bit line junction capacitance was indeed the source for CBL

variation. The above observations demonstrate that our methodology is very use-

ful for determining the sources of variation, and that it is versatile enough to pre-

dict distributions of new layouts already (to be) fabricated in a process, once the

process has been characterized.
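The two-step use of the characteristic curves can be sketched in a few lines: invert the 6T1 curve to map each measured CBL sample to a p-well dose, then push those doses through the 6T2 curve. The 6T1 cubic below is the fit that appears in the plot of Fig. 5.14(a), with x = Log10(normalized p-well dose) and y = Log10(normalized CBL). Everything else is an assumption made for illustration: the bisection routine, the 6T2 coefficients (not given in the text), and the sample values.

```python
def char_curve_6t1(x):
    """Cubic fit shown in Fig. 5.14(a); x = log10(normalized p-well dose)."""
    return 0.032 * x**3 + 0.021 * x**2 + 0.045 * x + 2.3


def char_curve_6t2(x):
    """Hypothetical stand-in; the 6T2 fit coefficients are not given."""
    return 0.030 * x**3 + 0.020 * x**2 + 0.040 * x + 2.2


def invert_monotonic(f, y, lo=-2.0, hi=0.5, tol=1e-10):
    """Solve f(x) = y by bisection, assuming f increases on [lo, hi].

    The 6T1 cubic qualifies: its derivative 0.096x^2 + 0.042x + 0.045
    has a negative discriminant and is therefore always positive.
    """
    if not f(lo) <= y <= f(hi):
        raise ValueError("y lies outside the fitted range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical log10(normalized CBL) samples standing in for 6T1 hardware data.
measured_6t1 = [2.10, 2.25, 2.30]
doses = [invert_monotonic(char_curve_6t1, y) for y in measured_6t1]
predicted_6t2 = [char_curve_6t2(x) for x in doses]
```

Applied to a full set of wafer measurements, the same mapping would turn the measured 6T1 CBL distribution into a dose distribution and then into a predicted 6T2 CBL distribution.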

Figure 5.13: (a) Measured vs. simulated CGS-VGS data for the nMOS capacitor structure in Fig. 5.13(b) with width 1µm × 2 fingers, (b) Multi-finger (FEOL+BEOL) nMOS capacitor structure


In this section, a hardware-assisted, unified 3D-TCAD based capacitance extrac-

tion methodology was validated using two companion 6T SRAM macros in an

IBM 32nm SOI HKMG process. It helped isolate the FEOL component, namely

junction capacitance as a dominant factor affecting total bit line capacitance varia-

tion across wafers, by using hardware data from one SRAM macro to compute the

Figure 5.14: CBL variation with p-well dose for (a) 6T1, and (b) 6T2. Axes: Log10(normalized CBL) versus Log10(normalized p-well dose); fitted curve shown in the plot: y = 0.032*x^3 + 0.021*x^2 + 0.045*x + 2.3.

p-well dose distribution of the process and subsequently applying it to the companion
6T SRAM macro, followed by a comparison with measured data corresponding

to the latter.

Figure 5.15: (a) P-well dose distribution computed from measured 6T1 CBL distribution [Fig. 5.10(a)] and the characteristic curve of 6T1 [Fig. 5.14(a)], and (b) measured vs. predicted distribution for 6T2. The characteristic curve of 6T2 [Fig. 5.14(b)] along with the computed p-well dose distribution [Fig. 5.15(a)] is used to compute the 6T2 CBL distribution. BEOL variation is not considered.

5.3 Transport analysis based parasitic capacitance ex-

traction in emerging multi-gate devices and cir-

cuits

In order to perform timing analysis for multi-gate circuits accurately and deter-

mine the most efficient operating point, it is essential to capture parasitic resis-

tances and capacitances accurately. With narrow design windows, overestima-

tion of parasitics results in excessive guardbands, which impacts performance and

voids the benefits of migrating to a new technology node. On the other hand, un-

derestimation of parasitics coupled with process variations causes timing failures

and yield loss. In this section, we delve into transport analysis based capacitance

extraction for layouts having multi-gate FETs at the 22/14/10nm nodes.

5.3.1 Introduction

The nonplanar nature of multi-gate FETs leads to width quantization, where tran-

sistor electrical widths are integer multiples of individual fin electrical widths that

are determined by the fin height and fin thickness. Width quantization and the

3D nature of active multi-gate FET regions pose problems for extraction of FEOL

parasitic capacitances in arbitrary structures (e.g., multi-fin, multiple-finger multi-

gate FETs), thereby making it cumbersome to develop a generic, unified compact

model that can precisely capture FEOL capacitances. The latter is due to the fact

that compact models are predominately based on 2D cross-section assumptions.

Calculation of FEOL capacitances is nontrivial on account of the gate-source/drain

fringe capacitances that are present along each fin in a multi-fin multi-gate FET.

Here, unlike planar FETs, as the electrical width increases, total fringe capacitance

is not amortized to a negligible value per unit width. With increased proximity

of BEOL metal/vias/contacts to FEOL features, the issue of (FEOL+BEOL) capac-

itance extraction in multi-gate circuits becomes significant, and the need to com-

prehensively account for FEOL-BEOL interactions arises [133].
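The width-quantization argument above can be illustrated with a minimal sketch. It assumes the commonly used tri-gate approximation in which each fin contributes an electrical width of 2·HFIN + TSI; this formula, the function names, and the per-fin fringe capacitance value are illustrative assumptions and are not taken from this work.

```python
import math

def fin_width_nm(h_fin_nm, t_si_nm):
    """Electrical width of one fin, assuming W_fin = 2*H_FIN + T_SI (tri-gate approximation)."""
    return 2.0 * h_fin_nm + t_si_nm

def fins_needed(target_width_nm, h_fin_nm, t_si_nm):
    """Smallest integer fin count whose total electrical width meets the target."""
    return math.ceil(target_width_nm / fin_width_nm(h_fin_nm, t_si_nm))

def fringe_cap_per_width(n_fins, c_fringe_per_fin_af, h_fin_nm, t_si_nm):
    """Fringe capacitance per unit electrical width (aF/nm) of an n-fin device."""
    total_width_nm = n_fins * fin_width_nm(h_fin_nm, t_si_nm)
    return n_fins * c_fringe_per_fin_af / total_width_nm

if __name__ == "__main__":
    # Nominal 22nm dimensions from Table 5.1: H_FIN = 40nm, T_SI = 10nm.
    print("width per fin (nm):", fin_width_nm(40, 10))
    print("fins for a 200nm target:", fins_needed(200, 40, 10))
    # The per-fin fringe value (10 aF) is a hypothetical placeholder.
    for n in (1, 2, 4, 8):
        print(n, "fins ->", fringe_cap_per_width(n, 10.0, 40, 10), "aF/nm")
```

In this simplified picture, the fringe capacitance per unit width is the same for every fin count, which is one way to see why it is not amortized as the electrical width grows.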

In this section, we address the above problems of FEOL/(FEOL+BEOL)

multi-gate capacitance extraction via a 3D-TCAD transport analysis based ap-

proach [146]. The rest of this section is organized as follows. In Section 5.3.2,

we discuss prior related work in multi-gate parasitics extraction. We examine

the sensitivity of device-level parasitic capacitances to various process param-

eters in candidate process-simulated bulk and SOI FinFETs at the 22/14/10nm

nodes in Section 5.3.3. In Section 5.3.4, we describe a unified 3D-TCAD flow for

the extraction of FEOL/(FEOL+BEOL) capacitances in generic multi-gate circuit

layouts by leveraging automated structure synthesis algorithms developed in

Chapter 4, thereby circumventing the complexity barriers posed by 3D process

simulations. Here, using the process-simulated devices from Section 5.3.3, we

compute circuit-level parasitic capacitances for 6T multi-gate SRAMs having dif-

ferent fin pitch, gate pitch, and fin count configurations, thereby providing critical

insight into bit line/word line/internal node FEOL/(FEOL+BEOL) capacitance

trends. In Section 5.3.5, we show that traditional segregated FEOL/BEOL mod-

eling approaches fail to accurately predict transient behavior, by back-annotating

3D-TCAD-extracted capacitances into mixed-mode write simulations of a 6T

FinFET SRAM bitcell. Here, we also examine the relative importance of accurately

modeling device transport versus parasitics, using propagation delay simulations

of FinFET NAND2 logic gates as an example. Finally, we conclude in Section 5.3.6.

5.3.2 Related work

With the advent of nonplanar multi-gate FETs, FEOL device design and optimiza-

tion have garnered significant attention from a parasitics perspective. Mitigation of

parasitic resistances/on-current (ION) enhancement via design and process modifications, such as elevated source/drain extensions, usage of stress liners, strained SOI, and doping profile optimization, has been explored in [147–153]. The effects

of fringe capacitances on device performance have been examined via 3D simu-

lations in [154], while an analysis of geometry-dependent parasitic capacitances

in multi-fin FinFETs is provided in [155]. RC-delay optimization (with fin pitch,

gate pitch, fin height, and fin thickness as parameters) in highly scaled multi-fin

FinFETs has been studied in [148, 156]. Experimental, aggressively-scaled FinFET

SRAM bitcells/arrays with tight fin/gate pitches and their design challenges have

been explored at the 32/22/10nm nodes in [73, 152, 157–160]. In comparison to

modeling parasitic resistances, capturing parasitic capacitances for arbitrary multi-

gate layouts remains a major challenge to date. This is addressed via a holistic

3D-TCAD flow in the current work. In the next section, we examine the sensitiv-

ity of device-level parasitic capacitances to physical parameters used in FET-level

process simulations.

5.3.3 Multi-gate device-level parasitics

In this section, we examine parasitic capacitances in single-fin FinFETs. We per-

formed 3D process simulations in Sentaurus Process [136] in order to generate bulk

and SOI FinFET structures (such as those shown in Fig. 5.16) at the 22/14/10nm

nodes, using the parameters shown in Table 5.1. The physical dimensions (and

their ranges shown in parentheses) were obtained from a combination of candi-

date device configurations that have either been investigated experimentally or

via device simulations in [72, 148, 151, 153, 156, 160–164].

Here, LG, effective TOX, HGATE, LSP, TSI, HFIN, HELEV, LDL, NCH, and NSD are the physical gate length, front/back-gate effective oxide thickness, gate height above fin, spacer thickness, fin width, fin height, source/drain elevation above

Figure 5.16: (a) Bulk FinFET, and (b) SOI FinFET. [Image labels: gate, raised source, raised drain, active fin, channel stop implant in bulk fin, buried oxide.]

Table 5.1: Bulk and SOI FinFET device parameters

Parameter             22nm          14nm          10nm
Bulk and SOI FinFETs
LG (nm)               24 [20−25]    14 [14−18]    10 [10−12]
Effective TOX (nm)    1.1           0.9           0.7
HGATE (nm)            40            40            40
LSP (nm)              8             8             8
TSI (nm)              10 [10−12]    8 [8−10]      6 [4−7]
HFIN (nm)             40 [24−50]    30 [14−40]    20 [10−30]
HELEV (nm)            20            20            20
LDL (nm)              1.5           1.5           1.5
NCH (cm^-3)           10^15         10^15         10^15
NSD (cm^-3)           10^20         10^20         10^20
Bulk FinFETs only
TSTI (nm)             80            80            80
NSTOP (cm^-3)         3×10^18       3×10^18       3×10^18
SOI FinFETs only
TBOX (nm)             240           240           240

fin, source/drain doping decay length, channel doping, and source/drain doping

concentrations, respectively. The above are common to both bulk and SOI FinFETs.

TSTI and NSTOP are the shallow-trench isolation (STI) depth and channel stop implant concentration, respectively, and are applicable to bulk FinFETs, while TBOX

is the buried oxide thickness in SOI FinFETs. In Fig. 5.17, we highlight the major

steps of the ‘gate-last’ process for bulk FinFETs [138], which involves fin defini-

tion, STI and high-k formation, followed by poly gate and spacer formation. Next,

source/drain epitaxy is performed, followed by poly gate removal and metal gate

deposition, and finally, the contact vias are formed. A similar gate-last process

is employed for SOI FinFETs as well. Thereafter, we perform capacitance extrac-

tion at zero-bias conditions, by importing the single-fin structures into Sentaurus

Device [136], in order to determine off-state capacitances.

Figure 5.17: Bulk FinFET ‘gate-last’ process simulation steps: fin formation, STI + high-k formation, poly gate formation, spacer formation, source/drain epitaxy, poly gate removal, metal gate deposition, and contact via formation.

We investigated the dependence of total drain capacitance (CDRAIN,TOT) and total gate capacitance (CGATE,TOT) on various physical parameters listed in Table 5.1, and the results are shown in Figs. 5.18, 5.19, 5.20, and 5.21. Each process parameter was perturbed around the nominal value, leading to different ranges for

Figure 5.18: Dependence of CDRAIN,TOT and CGATE,TOT on LG and HGATE. Panels: (a) CDRAIN,TOT (aF) vs. LG (µm), (b) CGATE,TOT (aF) vs. LG (µm), (c) CDRAIN,TOT (aF) vs. HGATE (µm), (d) CGATE,TOT (aF) vs. HGATE (µm). Curves: bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes.

different devices, depending on the technology node. From Figs. 5.18(a) and (b), Figs. 5.19(c) and (d), and Figs. 5.20(c) and (d), we observe that at all three technology nodes, CDRAIN,TOT and CGATE,TOT are relatively immune to changes in LG, TSI, and HELEV, with a 5-10% change in off-state capacitance for the chosen windows around the nominal values. From Figs. 5.18(c) and (d), and Figs. 5.20(a) and (b), we see that CDRAIN,TOT and CGATE,TOT scale linearly with HGATE and HFIN, suggesting that parallel-plate-like gate-drain/source-contact capacitances dominate. From Figs. 5.19(a) and (b), we see that CDRAIN,TOT and CGATE,TOT are extremely sensitive to spacer length LSP, where an 8nm increase in LSP is sufficient to halve the capacitances. In Figs. 5.21(a) and (b), NCH is seen to have negligible impact on CDRAIN,TOT and CGATE,TOT, which suggests that zero-bias depletion capacitances in the fin drain-body junction are negligible. From Figs. 5.21(c) and (d), CDRAIN,TOT

Figure 5.19: Dependence of CDRAIN,TOT and CGATE,TOT on LSP and TSI. Panels: (a) CDRAIN,TOT (aF) vs. LSP (µm), (b) CGATE,TOT (aF) vs. LSP (µm), (c) CDRAIN,TOT (aF) vs. TSI (µm), (d) CGATE,TOT (aF) vs. TSI (µm).

and CGATE,TOT are seen to be moderately affected by small variations in LDL (which

determines the device underlap/overlap).
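The linear dependence on HGATE and HFIN and the strong dependence on LSP noted above are what an idealized parallel-plate picture would predict. The sketch below is only that idealization: it assumes a single plate pair separated by the spacer, a placeholder relative permittivity, and no fringing fields, all of which the 3D-TCAD extraction handles properly. The function name and all numerical inputs are hypothetical.

```python
# Vacuum permittivity: 8.854e-12 F/m = 8.854e-3 aF/nm.
EPS0_AF_PER_NM = 8.854e-3

def parallel_plate_af(height_nm, width_nm, spacer_nm, eps_r=3.9):
    """Ideal parallel-plate capacitance in aF; eps_r is a placeholder value."""
    return eps_r * EPS0_AF_PER_NM * height_nm * width_nm / spacer_nm

if __name__ == "__main__":
    base = parallel_plate_af(height_nm=40, width_nm=10, spacer_nm=8)
    taller = parallel_plate_af(height_nm=80, width_nm=10, spacer_nm=8)
    thicker = parallel_plate_af(height_nm=40, width_nm=10, spacer_nm=16)
    print(f"base: {base:.2f} aF, 2x height: {taller:.2f} aF, 2x spacer: {thicker:.2f} aF")
```

In this model, capacitance is proportional to plate height and inversely proportional to spacer thickness, so doubling the height doubles the capacitance and doubling the spacer thickness halves it.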

In addition to the above, we found CDRAIN,TOT and CGATE,TOT to be weakly susceptible to minor variations in NSTOP, TSTI, TBOX, and the facet angle of the raised source/drain regions. From Figs. 5.18, 5.19, 5.20, and 5.21, we see that single-fin 22nm bulk FETs have only marginally higher CDRAIN,TOT and CGATE,TOT in comparison to their SOI counterparts, while at the 14nm/10nm nodes, the difference is negligibly small. Overall, CDRAIN,TOT decreases by 16% (44%) moving from the 22nm to 14nm FETs (14nm to 10nm FETs), while CGATE,TOT decreases by 18% (51%),

respectively. The above analysis (which is based on single-fin structures) suggests

that the maximum reduction in FEOL off-state capacitances would occur when

switching from the 14nm to 10nm devices, for the design points chosen in Table 5.1.
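Because the two node-to-node reductions quoted above apply in sequence, they compound multiplicatively rather than add. The short sketch below works this out for the reported percentages; the helper name is an assumption introduced for illustration.

```python
def cumulative_reduction(step_reductions):
    """Overall fractional reduction after applying successive fractional reductions."""
    remaining = 1.0
    for r in step_reductions:
        remaining *= (1.0 - r)
    return 1.0 - remaining

if __name__ == "__main__":
    # Step reductions reported in the text (22nm -> 14nm, then 14nm -> 10nm).
    c_drain = cumulative_reduction([0.16, 0.44])
    c_gate = cumulative_reduction([0.18, 0.51])
    print(f"CDRAIN,TOT, 22nm -> 10nm: {100 * c_drain:.1f}% lower")
    print(f"CGATE,TOT,  22nm -> 10nm: {100 * c_gate:.1f}% lower")
```

Under this arithmetic, the single-fin 10nm device retains a little under half of the 22nm off-state capacitance, with most of that reduction coming from the 14nm to 10nm step.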

Figure 5.20: Dependence of CDRAIN,TOT and CGATE,TOT on HFIN and HELEV. Panels: (a) CDRAIN,TOT (aF) vs. HFIN (µm), (b) CGATE,TOT (aF) vs. HFIN (µm), (c) CDRAIN,TOT (aF) vs. HELEV (µm), (d) CGATE,TOT (aF) vs. HELEV (µm). Curves: bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes.

Figure 5.21: Dependence of CDRAIN,TOT and CGATE,TOT on NCH and LDL. Panels: (a) CDRAIN,TOT (aF) vs. NCH (cm^-3), (b) CGATE,TOT (aF) vs. NCH (cm^-3), (c) CDRAIN,TOT (aF) vs. LDL (µm), (d) CGATE,TOT (aF) vs. LDL (µm). Curves: bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes.

In the next section, we compute parasitic capacitances in multi-fin FinFETs and 6T

SRAMs to explore circuit-level trends.

5.3.4 Multi-gate circuit-level parasitics

In this section, we explain the challenges involved in multi-gate circuit-level para-

sitics extraction and demonstrate a pragmatic 3D-TCAD based solution applied to

multi-fin multi-gate FETs and 6T multi-gate SRAMs.

Methodology

Estimation of parasitic capacitances in multi-gate circuits is beset with two

major problems: FEOL extraction for generic FET layout configurations and

(FEOL+BEOL) extraction for arbitrary metal layout configurations above the

FEOL active regions. Traditionally, the FEOL component of capacitance is com-

puted via TCAD simulations on 2D/3D geometries of select configurations (as

in Section 5.3.3), while the BEOL component is captured using accelerated field

solvers [139–141]. As FEOL devices shrink with each subsequent technology node,

BEOL metals/contacts/vias also get smaller, resulting in metal features that are

close to the active FET regions. Thus, it is difficult to accurately determine the total

capacitance at each node of the circuit using a segregated FEOL/BEOL approach,

as it would not comprehensively account for geometry-specific FEOL-BEOL inter-

actions [133]. The development of unified, layout-aware compact models that can

accurately compute FEOL capacitances in multi-gate circuits with widely varying

fin counts, fin/gate pitches in multi-finger FETs, FETs with shared source/drains

with different fin counts on either side, etc., is also a cumbersome problem.

The above present challenges for process-device-circuit co-design during the

early phases of technology development, where no straightforward methodology

exists to directly determine the effect of process modifications on circuit-level parasitics. Owing to the small absolute capacitances involved, modeling errors of the order of a few tens of aF per fin, which would be ignored in devices at higher

technology nodes, can easily translate to a large percentage difference in predicted

capacitances, and lead to erroneous timing estimates in multi-gate circuits.

Transport analysis based 3D-TCAD capacitance extraction offers a plausible

solution, as it can account for transport in FEOL device regions, and capture

FEOL-BEOL interactions accurately by treating active regions as semiconductors

in (FEOL+BEOL) structures. However, the traditional approach (as shown in

Fig. 5.22(a)) is plagued by the intractable time/memory complexity of 3D process

simulation of large layouts (for the generation of accurate geometries prior to

device simulation), which dramatically limits its scope.

In our work, we circumvented the 3D process simulation barrier by lever-

aging the automated structure synthesis methodology, whose multi-gate ver-

sion is outlined in Fig. 5.22(b). As before, the approach involves a one-time

process-simulation cost for the construction of a multi-gate FET database con-

sisting of n-/p-FinFETs at each technology node. Thereafter, with the aid of an

FEOL/(FEOL+BEOL) multi-gate structure synthesizer (which is equipped with a

layout analyzer/partitioner), the FEOL/(FEOL+BEOL) structure corresponding

to any input multi-gate circuit layout is synthesized automatically, using the

device database and FEOL/BEOL process assumptions. This enables a crucial

modeling trade-off, where process-level accuracy is preserved in regions in and

around active FETs, while providing very favorable time/memory scaling prop-

erties, thereby extending its reach beyond simple layouts. It also permits iterative

optimization for a large number of layouts in a practical timeframe. We harness

the setup in Fig. 5.22(b) to analyze multi-fin multi-gate FETs and 6T multi-gate

SRAMs in subsequent parts of this section.
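The cost structure of the synthesis-based flow can be sketched in a few lines. This is a toy illustration only, assuming a device database keyed by node, substrate, and polarity; the class and function names are hypothetical, the "process simulation" is a stub, and the actual structure synthesizer and layout partitioner are not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FetKey:
    node_nm: int
    substrate: str  # "bulk" or "soi"
    polarity: str   # "n" or "p"

@dataclass(frozen=True)
class FetInstance:
    key: FetKey
    x_nm: float
    y_nm: float

class DeviceDatabase:
    """Caches one process-simulated FET per device type (one-time cost)."""

    def __init__(self):
        self._cache = {}
        self.process_sim_calls = 0

    def _process_simulate(self, key):
        # Stub standing in for an expensive single-FET 3D process simulation.
        self.process_sim_calls += 1
        return f"fet_{key.node_nm}nm_{key.substrate}_{key.polarity}"

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._process_simulate(key)
        return self._cache[key]

def synthesize_structure(layout, db):
    """Place a pre-simulated FET at every instance location in the layout."""
    return [(inst.x_nm, inst.y_nm, db.get(inst.key)) for inst in layout]

if __name__ == "__main__":
    n_fet = FetKey(22, "soi", "n")
    p_fet = FetKey(22, "soi", "p")
    # Hypothetical 6T bitcell: four n-FinFETs (PG, PD) and two p-FinFETs (PU).
    layout = [FetInstance(n_fet, 50.0 * i, 0.0) for i in range(4)]
    layout += [FetInstance(p_fet, 50.0 * i, 90.0) for i in range(2)]
    structure = synthesize_structure(layout, DeviceDatabase())
    print(len(structure), "FETs placed")
```

In this sketch, the number of process simulations depends on the number of distinct device types rather than on the number of devices N in the layout.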

Figure 5.22: 3D-TCAD based capacitance extraction for generic multi-gate circuit layouts: (a) traditional approach using brute-force process simulation, and (b) our flow which leverages the automated structure synthesis approach. [Flow in (a): generic layout (N devices) → litho simulation, mask generation → 3D process simulation of the entire layout using detailed mechanisms/kinetics, with process conditions as input (complexity bottleneck; time: f(N), memory: g(N)) → process-simulated structure → transport analysis based 3D-TCAD capacitance extraction, with a loop to tweak layout/process parameters. Flow in (b): single FET layout and process recipe → process simulator → process-simulated FETs → device database; generic layout (N devices), device database, and FEOL/BEOL process assumptions → FEOL/(FEOL+BEOL) multi-gate structure synthesizer → FEOL structure or (FEOL+BEOL) structure → 3D-TCAD FEOL/(FEOL+BEOL) capacitance extraction (scalable to large layouts; time: k(N), memory: h(N)), with a loop to tweak layout/process.]

Parasitic capacitances in multi-fin multi-gate FETs

Owing to the width quantization property, multi-gate FETs with large electrical

widths need to have multiple fins. We synthesized multi-fin FinFETs using the

bulk and SOI FinFETs generated earlier at the 22/14/10nm nodes. They con-

sisted of four fins each, with shared raised source/drain epi-regions that are

via-contacted and connected using metal-1, as shown in Figs. 5.23(a) and (b). We

varied the fin pitch, FP, and computed the parasitic (FEOL+BEOL) capacitances

for each layout using the setup described in Fig. 5.22(b).

Figure 5.23: Multi-fin FinFET (a) bulk, and (b) SOI structures. Dielectric regions are not shown. [Image labels: gate, drain, source.]

From Fig. 5.24(a), we can see that the trends in CDRAIN,TOT are in stark contrast

to the single-fin results in Section 5.3.3. While moving from SOI to bulk FETs,

there is an 11.5%, 10.8%, and 8.8% increase in CDRAIN,TOT for the 22nm, 14nm, and 10nm nodes, respectively, which can be attributed to the shared drain-to-bulk fin capacitances in bulk FETs. However, in the case of CGATE,TOT [Fig. 5.24(b)], there is only a 2-4% increase from SOI to bulk FETs. An increase in FP from 40nm to 70nm results in a 20%, 31%, and 36% increase in CDRAIN,TOT for the 22nm, 14nm, and 10nm nodes, respectively, while CGATE,TOT increases by 16%, 26%, and 28%, respectively.

These results suggest that gate-to-epi-source/drain/metal-1 capacitances begin to

dominate as FP increases/as the technology node decreases, and they highlight

the need to model the entire (FEOL+BEOL) structure.

Figure 5.24: Dependence of CDRAIN,TOT and CGATE,TOT on FP. Panels: (a) CDRAIN,TOT (aF) vs. fin pitch, FP (nm), (b) CGATE,TOT (aF) vs. fin pitch, FP (nm).

Parasitic capacitances in 6T multi-gate SRAMs

Since SRAMs are among the densest circuits manufactured at any technology

node, we examine their multi-gate variants in detail. As with highly scaled planar

6T SRAMs, a delicate balance needs to be maintained between static/dynamic

readability and writeability metrics in multi-gate 6T SRAMs as well. Here, a key

challenge is to determine the bit line (CBL,TOT , CBLB,TOT ), word line (CWL,TOT ),

and internal node (CNL,TOT , CNR,TOT ) capacitances accurately, so that they can be

back-annotated into SPICE/mixed-mode TCAD simulations that capture dynamic

stability metrics. Figs. 5.25(a) and 5.26(a) show the synthesized 3D (FEOL+BEOL)

geometries for bulk and SOI FinFET SRAM bitcells, while Figs. 5.25(b) and 5.26(b)

show their corresponding FEOL counterparts. The bitcell names are based on

pull-up (PU), pass-gate (PG), and pull-down (PD) fin counts. Hence, Figs. 5.25

and 5.26 represent 6T SRAM (111) bulk and SOI configurations, respectively. From

Figs. 5.25(a) and 5.26(a), we can see that both bitcells are designed with metal-3

word lines (WLs), metal-2 bit lines (BL and BLB), metal-2 ground (GND) and sup-

ply (VDD), which is also the convention followed in all the bitcell configurations

that follow. For the FEOL structures in Figs. 5.25(b) and 5.26(b), all back-end

features, including contacts to epi-raised source/drain regions, are absent. The

connected gates are preserved to capture WL and internal node (NL and NR)

connectivity.

Figure 5.25: Bulk FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown. [Image labels: WL, NL, NR, GND, BL, BLB, VDD, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]

Varying fin pitch: We computed (FEOL+BEOL) capacitances corresponding to

6T FinFET SRAM (111) layouts for various fin pitches with gate pitch, GP = 90nm

(Fig. 5.27). From Fig. 5.27(a), we can see that as FP decreases, CBL,TOT increases.

Since metal-2 BLs are vertical to the cell in the chosen configurations, CBL,TOT is

highly susceptible to fin pitch modifications. The effect is very pronounced in all

the bulk (SOI) SRAMs, which witness a 31-36% (31-38%) increase in capacitance,

respectively. The plateau in CBL,TOT at FP = 50-60nm is due to the fact that metal-2

BL, BLB, VDD, and GND tracks are wider for FP = 60nm, 70nm, owing to the larger

Figure 5.26: SOI FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown. [Image labels: WL, NL, NR, GND, BL, BLB, VDD, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]

pitches. CWL,TOT is affected by trends at high and low FP. As FP increases, the

metal-3 WL gets longer and aggregates capacitances from bitcell features below it,

which increases CWL,TOT . When FP decreases beyond a certain point (FP = 50nm),

the capacitance between the WL gate and shared source/drain/metal-1 regions

in the neighborhood boosts CWL,TOT . Fig. 5.28 shows the FEOL capacitances for

FP = 50nm. For both SOI and bulk SRAMs, FEOL CBL,TOT , CWL,TOT , and CNL,TOT

increase by 40-60% from the 10nm to 22nm nodes. The ratio between FEOL and

(FEOL+BEOL) components across technology nodes at FP = 50nm for CBL,TOT ,

CWL,TOT , and CNL,TOT is 22-28%, 50-55%, and 76-82%, respectively. From the above

observations, we can see that FP needs to be chosen carefully to manage CBL,TOT ,

CWL,TOT , and CNL,TOT .

Figure 5.27: CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. FP, GP = 90nm. Panels: (a) CBL,TOT (aF), (b) CWL,TOT (aF), (c) CBL,WL (aF), and (d) CNL,TOT (aF), each vs. fin pitch, FP (nm).

Varying gate pitch: We also computed (FEOL+BEOL) capacitances corresponding to 6T FinFET SRAM (111) layouts for various gate pitches with FP = 50nm (Fig. 5.29). In Fig. 5.29(a), CBL,TOT can be seen to increase by 6-8% across the technology nodes, as GP increases from 70nm (80nm) to 100nm (110nm). This is on account of metal-2 BL parallel-plate-like capacitances to surrounding features, which

is dependent on the BL track length/bitcell height and, hence, on GP. As GP de-

creases, CWL,TOT trends upward in Fig. 5.29(b), implying that at tight gate pitches,

WL gate to BL, and WL gate to internal node coupling increases. This is also con-

firmed by Figs. 5.29(c) and (d), where CBL,WL increases by 11-12% and CNL,TOT in-

creases by 13-15% for the bulk and SOI cases, when moving from GP = 100nm

(110nm) to 70nm (80nm). The latter trend is not observed for the FP cases in Fig. 5.27.

In stark contrast with single-fin FEOL capacitance observations in Section 5.3.3,

from Figs. 5.27 and 5.29, we see that the maximum reduction in (FEOL+BEOL)

Figure 5.28: FEOL components of capacitance in the 6T SRAM (111) configuration, FP = 50nm, GP = 90nm. Bars: CBL,TOT, CWL,TOT, and CNL,TOT (aF) for bulk and SOI FinFETs at the 10nm, 14nm, and 22nm nodes.

Figure 5.29: CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. GP, FP = 50nm. Panels: (a) CBL,TOT (aF), (b) CWL,TOT (aF), (c) CBL,WL (aF), and (d) CNL,TOT (aF), each vs. gate pitch, GP (nm).

CBL,TOT and CWL,TOT occurs when moving from the 22nm node to 14nm node, and

is around 17% and 21%, respectively.

Varying fin count: An important aspect of SRAM bitcell design is the set of bitcell

β (βPD/PG and βPG/PU ) ratios that determine readability and writeability metrics,

and are set by the electrical widths of the PU, PG and PD FETs. Owing to the

width quantization property, the electrical widths of the PG/PD FinFETs can only

be increased in integer multiples of a single fin, on account of which it is necessary

to examine the impact of varying fin counts on bitcell capacitances. We synthesized

several different flavors of FinFET SRAMs, namely (112), (113), (122), and (123)

(shown in Figs. 5.30, 5.31, 5.32, and 5.33), using 22nm bulk and SOI FETs with FP =

40nm and GP = 90nm. Fig. 5.34(a) shows the variation of CBL,TOT , CWL,TOT , and

[Figure 5.30: 6T FinFET SRAM (112) configuration, panels (a) and (b). Image labels: GND, BL, BLB, VDD, WL, NL, NR, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]

CNL,TOT for the different configurations. CBL,TOT decreases by 25% while moving

from the (111) to (112) configuration, as the addition of a single fin to the PD FET

adds an extra fin pitch, which permits larger spacings between GND, BL, and VDD.

While the (113) configuration adds an extra PD fin, the PG fin remains unchanged,

on account of which reduction in CBL,TOT from (112) to (113) is not significant. From

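the above, we can see that a 33% (66%) increase in bitcell area, from the (111) to

[Figure 5.31: 6T FinFET SRAM (113) configuration, panels (a) and (b). Image labels: GND, BL, BLB, VDD, WL, NL, NR, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]

[Figure 5.32: 6T FinFET SRAM (122) configuration, panels (a) and (b). Image labels: GND, BL, BLB, VDD, WL, NL, NR, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]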

(112) or [(111) to (113)] configuration, can reduce CBL,TOT and increase the βPD/PG

ratio significantly. The (122) and (123) bitcells have higher CBL,TOT as the PG fin

count is higher. Since metal-3 WLs run across the breadth of the bitcell, CWL,TOT

generally increases as the PD/PG fin count increases. However, CWL,TOT decreases

from the (122) to (123) configuration, as the WL gate to internal node coupling

decreases, due to the additional fin pitch spacing between them. We also examined

the FEOL capacitance trends across the different bitcell configurations using SOI

[Figure 5.33: 6T FinFET SRAM (123) configuration, panels (a) and (b). Image labels: GND, BL, BLB, VDD, WL, NL, NR, pull-down, pull-up, pass-gate, n-FinFET, p-FinFET.]

Figure 5.34: CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM (FEOL+BEOL) configurations. Panels: (a) CBL,TOT (aF), (b) CWL,TOT (aF), and (c) CNL,TOT (aF) for the (111), (112), (113), (122), and (123) configurations, bulk and SOI 22nm.

FETs (Fig. 5.35). While the addition of PD fins does not significantly impact FEOL

CBL,TOT , the addition of a PG fin leads to a 73% increase with respect to (111).

CWL,TOT decreases while moving from the (111) to (112) configuration, as the WL

gate is located away from the internal nodes (Fig. 5.30). However, the addition of

a PG fin increases CWL,TOT by 65% with respect to (111).
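The (PU PG PD) naming convention maps directly to the bitcell β ratios discussed above. The following minimal sketch assumes that all fins are identical, so that each ratio of electrical widths reduces to a ratio of fin counts, and that fin counts are single digits; the function names are hypothetical and differences in n-/p-FinFET drive strength are ignored.

```python
from fractions import Fraction

def parse_bitcell(name):
    """Map a name such as '(112)' to (PU, PG, PD) fin counts (single digits assumed)."""
    pu, pg, pd = (int(d) for d in name.strip("()"))
    return pu, pg, pd

def beta_ratios(name):
    """Width ratios approximated by fin-count ratios (identical fins assumed)."""
    pu, pg, pd = parse_bitcell(name)
    return {"PD/PG": Fraction(pd, pg), "PG/PU": Fraction(pg, pu)}

if __name__ == "__main__":
    for cell in ("(111)", "(112)", "(113)", "(122)", "(123)"):
        ratios = beta_ratios(cell)
        print(cell, "PD/PG =", ratios["PD/PG"], "PG/PU =", ratios["PG/PU"])
```

Because fin counts are integers, only a discrete set of β ratios is reachable, which is why each candidate configuration has to be evaluated for its capacitance penalty individually.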

Figure 5.35: CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM FEOL configurations. Panels: (a) CBL,TOT (aF), (b) CWL,TOT (aF), and (c) CNL,TOT (aF) for the (111), (112), (113), (122), and (123) configurations, SOI 22nm.

Modeling lithographic effects: The structures synthesized above do not take into

account the lithographic rounding effects on printed features and, hence, our ear-

lier setups are likely to overestimate parasitic capacitances. In order to quantify the

latter, we performed simple experiments where feature rounding was introduced

into the BEOL metal layer by layer (with 10nm and 6nm radii of curvature for metal

and vias, respectively) during structure synthesis, to realistically model printed

via/metal shapes. Figs. 5.36(a) and (b) show the 22nm bulk 6T SRAM (111) BEOL

metal stack without and with lithographic corner rounding. From Figs. 5.37(a) and

(b), we can see that there is only a 3-4% overestimation error, which suggests that

lithographic effects can be ignored without too much loss in accuracy.
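The error metric used for this comparison follows the axis label of Fig. 5.37, i.e., the difference between the capacitance with and without lithographic rounding, normalized to the value with rounding. The sketch below evaluates it for made-up capacitance values; the function name and the numbers are illustrative assumptions only.

```python
def litho_error_percent(c_with_litho_af, c_without_litho_af):
    """Error = (C_with_litho - C_without_litho) / C_with_litho, in percent."""
    return 100.0 * (c_with_litho_af - c_without_litho_af) / c_with_litho_af

if __name__ == "__main__":
    # Hypothetical values: ignoring rounding overestimates the capacitance.
    error = litho_error_percent(c_with_litho_af=100.0, c_without_litho_af=103.5)
    print(f"error: {error:.1f}%")
```

A negative value indicates that the structure without rounding overestimates the capacitance, which is the sign convention seen in Fig. 5.37.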

5.3.5 Multi-gate parasitics vs. device transport

In this section, we highlight the need to back-annotate 3D-TCAD-extracted para-

sitic capacitances into mixed-mode transient simulations, and compare the relative

importance of modeling device transport versus device parasitics.

Figure 5.36: BEOL metal stack from the 22nm 6T SRAM (111) bitcell (a) without lithography effects, and (b) with lithography effects. Dielectric regions are not shown.

Figure 5.37: CBL,TOT, CWL,TOT, and CNL,TOT error percentages for (a) bulk 6T SRAM (111) configuration, FP = 50nm, and (b) varying FP for the 22nm bulk 6T SRAM (111) configuration. GP = 90nm. Vertical axis: Error = (C with litho − C without litho)/C with litho (%).

Figure 5.38: Vanilla mixed-mode setup (VMM). [Flow: mixed-mode setup → transient mixed-mode TCAD simulation.]

Back-annotating 3D-TCAD parasitics: Device-circuit co-design with FETs at emerging technology nodes is studied using mixed-mode simulations [81]. However, parasitic capacitances are generally ignored and rarely accounted for explicitly [72], under the assumption that they have negligible effects. Figs. 5.38, 5.39,

and 5.40 show three different TCAD setups that are relevant. Fig. 5.38 represents

the ‘vanilla’ mixed-mode (VMM) setup, where individual devices are connected

to form a circuit, and transient simulations are performed without any additional

parasitics data. With technology scaling, under the assumption that FEOL ca-

pacitances are captured correctly in the TCAD device cross-sections, additional

BEOL capacitances can be included explicitly, as shown in Fig. 5.39. They are

obtained from field solver (FS) based capacitance extractions on the relevant BEOL structures,

and the setup is referred to as ‘FS parasitics + mixed-mode’ (FSMM). In the third

setup (shown in Fig. 5.40), 3D-TCAD-extracted capacitances from (FEOL+BEOL)

structures are back-annotated into the mixed-mode setup. However, to avoid

double counting contributions already accounted for in the device cross-sections,

a capacitance extraction experiment is performed for the mixed-mode setup,

and the difference between the former and the latter is included explicitly. The

3D-TCAD based extraction/back-annotation strategy in Fig. 5.40 is enabled by the

methodologies discussed in Fig. 5.22(b). This setup is referred to as ‘3D-TCAD

parasitics + mixed-mode’ (3D-TCADMM).
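The double-counting correction described above amounts to a per-node-pair subtraction. The sketch below assumes that both extractions are available as dictionaries keyed by node pairs; the function name and the capacitance values are hypothetical placeholders, not extracted data.

```python
def capacitance_correction(c_3dtcad_af, c_mixed_mode_af):
    """Return dC = C(3D-TCAD, FEOL+BEOL) - C(mixed-mode setup) for every node pair."""
    delta = {}
    for pair, c_full in c_3dtcad_af.items():
        # Pairs missing from the mixed-mode extraction contribute nothing there.
        delta[pair] = c_full - c_mixed_mode_af.get(pair, 0.0)
    return delta

if __name__ == "__main__":
    # Placeholder values in aF, for illustration only.
    c_3dtcad = {("BL", "NL"): 12.0, ("BL", "WL"): 30.0, ("BL", "NR"): 4.0}
    c_mixed = {("BL", "NL"): 5.0, ("BL", "WL"): 0.0}
    for pair, dc in capacitance_correction(c_3dtcad, c_mixed).items():
        print(pair, "->", dc, "aF to back-annotate")
```

Only the differences are added to the mixed-mode netlist as explicit capacitors, so that contributions already present in the device cross-sections are not counted twice.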

Using the above, we performed mixed-mode bitcell write simulations for the

22nm SOI 6T FinFET SRAM (111) bitcell (FP = 40nm, GP = 90nm), assuming an array column height (row width) of 32 (256) bitcells, and VDD = 1V. From Fig. 5.41, we see that for the chosen WL pulse width of 150ps, the VMM and FSMM setups

Figure 5.39: Mixed-mode setup with FS-extracted BEOL capacitances (FSMM). [Flow: input layout → BEOL structure synthesis → FS capacitance extraction → BEOL capacitance contributions C(BL, NL), C(BL, WL), C(BL, NR), etc., added to the mixed-mode setup → transient mixed-mode TCAD simulation.]

Figure 5.40: Mixed-mode setup with corrected 3D-TCAD capacitances (3D-TCADMM). [Flow: input layout → 3D-TCAD structure synthesis → 3D-TCAD capacitance extraction; mixed-mode TCAD capacitance extraction; mixed-mode capacitance correction ∆C(BL, NL), ∆C(BL, WL), ∆C(BL, NR), etc., added to the mixed-mode setup → corrected transient mixed-mode TCAD simulation.]

yield writeable cells, albeit with a large difference in NL-NR cross-over point. The

3D-TCADMM setup predicts a write failure. This shows that VMM setups are unreliable for transient multi-gate circuit simulations. Also, the difference between

FSMM and 3D-TCADMM suggests that a segregated FEOL/BEOL modeling ap-

proach breaks down at such highly scaled technology nodes. This is due to the fact

that FEOL capacitances are not accurately captured from single-fin 2D/3D cross-

sections in the FSMM setup, e.g., SRAM bitcells with different fin/gate-pitches

would effectively have the same FEOL capacitance contribution. The 3D-TCADMM

setup is able to holistically capture FEOL capacitances for any multi-gate layout

configuration, and also account for FEOL-BEOL interactions accurately.

Typically, DC write metrics such as the writeability current (IW), which are extracted from the ‘N-curves’ [165], can ensure writeability at some arbitrary write pulse width. However, from an array performance and throughput perspective, it is essential to determine the minimum write-pulse width (TW), i.e., the shortest WL pulse width needed to unconditionally write into any bitcell in the array. Here, modeling dynamic behavior of the bitcell with accurate parasitics data is critical. We quantified the difference between FSMM and 3D-TCADMM by extracting TW at various cell sigmas, with σVt = 30mV. The results are shown in Fig. 5.42. The TW predicted by 3D-TCADMM is consistently higher than that predicted by FSMM, which implies that the net effects of the FEOL capacitance difference between the two and the FEOL-BEOL capacitance contributions captured by 3D-TCADMM are not negligible.
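The search for TW can be organized as a bisection over the WL pulse width. The following is only a minimal sketch under stated assumptions, not the flow used in this work: `write_succeeds` is a hypothetical callback standing in for one transient mixed-mode simulation of a bitcell sample, the search window and tolerance are illustrative, and write success is assumed to be monotonic in the pulse width (any width longer than a passing width also passes).

```python
def min_write_pulse_width(write_succeeds, lo_ps=0.0, hi_ps=300.0, tol_ps=1.0):
    """Bisect for the shortest WL pulse width (in ps) that still writes the cell.

    write_succeeds(width_ps) -> bool is a hypothetical stand-in for one
    transient simulation; it is assumed to be monotonic in width_ps.
    Returns None if even the longest width in the window fails.
    """
    if not write_succeeds(hi_ps):
        return None
    # Invariant: lo_ps fails (or is the window start), hi_ps succeeds.
    while hi_ps - lo_ps > tol_ps:
        mid = 0.5 * (lo_ps + hi_ps)
        if write_succeeds(mid):
            hi_ps = mid
        else:
            lo_ps = mid
    return hi_ps


# Toy pass/fail model (an assumption for illustration): the cell flips
# only when the pulse is at least 137 ps wide.
toy_cell = lambda width_ps: width_ps >= 137.0
print(min_write_pulse_width(toy_cell))
```

For an array-level estimate, the same search would be repeated for each sampled bitcell and the largest result retained, since TW must cover every cell.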

Modeling device transport vs. parasitics: Over the past decade, a considerable amount of research has been directed toward the inclusion of advanced transport phenomena into mainstream device simulators, in order to perform accurate mixed-mode device-circuit simulations. The latter is becoming increasingly common owing to the fact that compact model development for SPICE simulations lags developments in technology, and that it is cumbersome to re-target compact models to new processes. With increased FEOL/BEOL scaling, a key modeling question that needs attention is: what is the relative importance of advanced transport models versus accurate parasitics data, when modeling multi-gate circuits via mixed-mode TCAD transient simulations?

Figure 5.41: Write operations for a 6T FinFET SRAM (111) bitcell using the setups described in Figs. 5.38, 5.39, and 5.40 [plot: voltage (V) vs. time (s)]

Figure 5.42: Minimum write pulse width (TW) vs. cell sigma [plot: cell sigma vs. TW (ps)]

We addressed the above by studying propagation delay in two flavors of FinFET NAND2 logic gates, namely SG-NAND2 and LP-NAND2, whose (FEOL+BEOL) structures are shown in Fig. 5.43. SG-NAND2 consists of SG-mode FinFETs, while LP-NAND2 consists of IG-mode FinFETs, where front and back gates are electrically independent. The back gates of the p-FinFETs are tied to VHIGH = 1.2V, while the back gates of the n-FinFETs are connected to VLOW = −0.2V, with VDD = 1V. On account of the back-gate biases, LP-NAND2 has lower leakage and higher latency in comparison to SG-NAND2.

Figure 5.43: (a) SG-NAND2, and (b) LP-NAND2 FinFET configurations

We examined four transport model scenarios: (a) only the drift diffusion (DD)

transport formalism [2,166] is used, (b) DD is used along with back-annotated 3D-

TCAD parasitic capacitances (DD+PC), (c) only the hydrodynamic (HD) transport

formalism [2] is used, and (d) HD is used along with back-annotated 3D-TCAD

parasitic capacitances (HD+PC). We refer to F(A,B) as the error percentage in propagation delay between the scenarios A and B (A, B ∈ {DD, DD+PC, HD, HD+PC}) such that $F(A,B) = \frac{t_p(A) - t_p(B)}{t_p(B)} \times 100$, where $t_p$ is the average of the rise ($t_{pLH}$) and fall ($t_{pHL}$) delays. From Fig. 5.43, we can see that BEOL metal density in SG-NAND2 is not as high as in LP-NAND2. Hence, from Figs. 5.44(a) and (b), we see

that F(DD+PC,DD) and F(HD+PC,HD) are higher for LP-NAND2. For both configurations, it can be seen that |F(DD+PC,DD)| and |F(HD+PC,HD)| are smaller than, yet comparable to, |F(HD,DD)| and |F(HD+PC,DD+PC)|. This shows that

while transport models dominate, it is very important to capture parasitic capaci-

tances accurately, as in the absence of parasitics data, complicated transport mod-

els at the device level fail to provide precise predictions at the circuit level in a

mixed-mode TCAD setup. (The latter observation is also valid for SPICE simula-

tions where compact models are unlikely to capture FEOL parasitics accurately.)
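The error metric F(A,B) defined above is straightforward to tabulate once the rise and fall delays of each scenario are known. The sketch below only illustrates the arithmetic: the delay values are hypothetical placeholders, not results from Fig. 5.44, and the function names are chosen for this example.

```python
def average_delay(t_plh, t_phl):
    """t_p: average of the rise (t_pLH) and fall (t_pHL) propagation delays."""
    return 0.5 * (t_plh + t_phl)


def delay_error_percent(tp_a, tp_b):
    """F(A,B) = (t_p(A) - t_p(B)) / t_p(B) * 100."""
    return (tp_a - tp_b) / tp_b * 100.0


# Hypothetical (t_pLH, t_pHL) pairs in ps, for illustration only.
delays_ps = {
    "DD": (10.0, 10.0),
    "DD+PC": (12.0, 13.0),
    "HD": (8.0, 8.0),
    "HD+PC": (9.0, 11.0),
}
tp = {name: average_delay(*pair) for name, pair in delays_ps.items()}

# The four comparisons discussed in the text.
for a, b in [("DD+PC", "DD"), ("HD+PC", "HD"), ("HD", "DD"), ("HD+PC", "DD+PC")]:
    print(f"F({a},{b}) = {delay_error_percent(tp[a], tp[b]):+.1f}%")
```

A positive F(A,B) means scenario A predicts a longer delay than scenario B, so adding parasitic capacitances would be expected to give positive values for F(DD+PC,DD) and F(HD+PC,HD).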


In the move from planar to multi-gate FET technology, the need to optimize para-

sitics is likely to be the most important circuit-level design priority, as the ability to

predict and control parasitics will determine whether device-level improvements

in the on-current translate to tangible overall performance improvements. In

this regard, extraction of parasitic capacitances for multi-gate circuits at the 22nm

node and beyond is beset with major challenges in terms of FEOL/(FEOL+BEOL)

extraction for generic layouts. In this section, we established the fact that seg-

regated FEOL/BEOL modeling approaches for parasitic capacitance extraction

in multi-gate circuit layouts are insufficient and quantified the relative impor-

tance of modeling advanced transport phenomena versus incorporating parasitic

capacitances. In doing so, we developed a pragmatic 3D-TCAD flow based on

the automated structure synthesis approach, which can serve as a solution to the

FEOL/(FEOL+BEOL) capacitance extraction problem for small to reasonably large

multi-gate circuit layouts, and assist in the development/validation of compact

models during early phases of technology development. Using the 3D-TCAD

flow, we also provided critical insight into BL and WL capacitance scaling in 6T

multi-gate SRAMs along the 22/14/10nm technology nodes.

Figure 5.44: Propagation delays of (a) SG-NAND2, and (b) LP-NAND2 configurations with different physical models. (DD = drift-diffusion formalism, HD = hydrodynamic formalism, PC = 3D-TCAD-extracted parasitic capacitance corrections added) [plots: output voltage (V) vs. time (s), and error F(A,B) (%) for the four scenario pairs]

Chapter 6

Parasitics-aware Design of Symmetric and Asymmetric Gate-workfunction FinFET SRAMs

6.1 Introduction

Since SRAM bitcells are often the densest features patterned on a chip, a significant

amount of research has been directed towards the design and manufacturability of

multi-gate SRAMs [73,152,157–160] at emerging technology nodes. However, most

of these investigations have focused on enhancing/contrasting SRAM DC metric

targets. While DC metrics can be directly obtained using measurements on flycells,

inferring array-scale behavior at design time is proving to be exceedingly difficult.

In order to predict array-scale metrics through simulation, capturing transient be-

havior accurately is absolutely essential. To accomplish the latter, SRAM parasitic

capacitances need to be extracted accurately from the layout.

As mentioned in Chapter 5, width quantization in multi-gate devices poses

problems for extraction of FEOL parasitic capacitances in arbitrary SRAM layouts


in upcoming/emerging process technologies, where reliance on compact models

for FEOL parasitics is not possible. Also, due to the increased proximity of BEOL

metal/vias to FEOL active silicon regions, it is highly questionable whether tra-

ditional segregated FEOL/BEOL parasitics modeling approaches yield accurate

results.

In this chapter, we evaluate several Symm-ΦG and Asymm-ΦG 6T FinFET

SRAMs from the perspective of transient behavior, by back-annotating 3D-TCAD-

extracted (FEOL+BEOL) layout-specific parasitic capacitances into mixed-mode

transient device simulations, thereby addressing the shortcomings described

above [167].

The chapter is organized as follows. In Section 6.2, we discuss prior related

work in multi-gate parasitics extraction and SRAM design. Thereafter, we outline

our simulation setup in Section 6.3. In Section 6.4, we explore a variety of current

and new SRAM bitcell topologies using the setup described in Section 6.3, and

highlight the need for transient analysis based bitcell design space exploration.

Finally, we present our conclusions in Section 6.5.

6.2 Related work

Over the past decade, a considerable amount of research has been directed towards

SRAM bitcell design using FinFETs/Tri-gate devices. Experimental, aggressively-

scaled FinFET SRAM bitcells/arrays with tight fin/gate pitches and their design

challenges have been explored in [73, 152, 157–159]. The highest-density func-

tional multi-gate SRAM bitcell to-date [fabricated using extreme ultra violet (EUV)

lithography, with an area of 0.021µm2 and minimum contacted fin/gate pitch of

50nm] has been reported in [160]. On the modeling front, several investigations

into FinFET/Tri-gate SRAM topologies as well as DC metrics are available in the


literature [88, 168–171]. In [71, 72], SRAM topologies using IG-mode FinFETs, such

as pass-gate feedback (PGFB) and pull-up write gating (PUWG), have been pro-

posed to enhance SRAM read/write margins. A dynamic row-based back-gate bi-

asing (RBB) scheme to enhance performance and reduce leakage in 6T/8T FinFET

SRAMs has been proposed in [77]. While most of the above works focus mainly on

DC metrics, a detailed analysis of transient behavior and parasitics data is lacking

for the bitcells proposed, thereby rendering such explorations incomplete.

Optimization of multi-gate device-level parasitic resistances and capacitances

has received a lot of attention as well. Mitigation of parasitic resistances/on-

current (ION) enhancement via design and process modifications, such as elevated

source/drain extensions, usage of stress liners, strained SOI, and doping profile

optimization, have been explored in [147–153]. The effects of fringe capacitances

on device performance have been examined via 3D simulations in [154], while an

analysis of geometry-dependent parasitic capacitances in multi-fin FinFETs is pro-

vided in [155]. RC-delay optimization (with fin pitch, gate pitch, fin height, and

fin thickness as parameters) in highly scaled multi-fin FinFETs has been studied

in [148, 156].

6.3 Simulation setup

In this section, we briefly explain the simulation setup that was employed to eval-

uate the various FinFET SRAM bitcell architectures discussed in Section 6.4. We

performed process simulations in Sentaurus Process [136] in order to generate

SOI FinFET structures at the 22nm node with device parameters specified in Ta-

ble 6.1, which were obtained from a combination of candidate device configura-

tions that have either been investigated experimentally or via device simulations

in [72, 153, 156, 161, 162].

Table 6.1: 22nm SOI FinFET device parameters

Parameter             Value
LG (nm)               24
Effective TOX (nm)    1.1
HGATE (nm)            40
LSP (nm)              12
TSI (nm)              12
HFIN (nm)             40
HELEV (nm)            20
LDL (nm)              4
NCH (cm^-3)           10^15
NSD (cm^-3)           3×10^20
TBOX (nm)             240
FP (nm)               40
GP (nm)               90

Figure 6.1: (a) Two-dimensional SOI n-FinFET cross section, and (b) 3D SOI n-FinFET structure

Fig. 6.1(a) shows the two-dimensional cross-section of the 3D SOI n-FinFET structure [Fig. 6.1(b)] obtained from process simulation. Here, LG, Effective TOX, HGATE, LSP, TSI, HFIN, HELEV, LDL, NCH, NSD, TBOX, FP, and GP are the physical gate length, front/back-gate effective oxide thickness, gate height above fin, spacer thickness, fin width, fin height, source/drain elevation above fin, source/drain doping decay length, channel doping, source/drain doping concentration, buried oxide thickness, fin pitch, and gate pitch, respectively.

FinFETs with three different gate workfunctions (ΦG) are used in the configurations explored in Section 6.4. For Symm-ΦG high-performance n-FinFETs and p-FinFETs, the workfunctions are set to ΦGn = 4.4eV and ΦGp = 4.8eV, respectively. To obtain medium-Vth n-/p-FinFET devices, ΦGn = ΦGp = 4.6eV is used. For Asymm-ΦG devices, the front-gate workfunction is set to ΦGF = 4.4eV, while the back-gate workfunction is set to ΦGB = 4.8eV, with the source/drain doping type determining the majority carrier during on-state conduction. Symm-ΦG and Asymm-ΦG devices

have been evaluated in detail in Chapter 3. The results presented in Section 6.4

involve three major simulation setups that were deployed in the Sentaurus TCAD

tool suite [136], and are explained next.

6.3.1 DC metrics of 6T FinFET SRAMs

SRAM DC metrics cover stability, bitcell read current, and bitcell leakage. Sta-

bility encompasses the hold/read/write conditions, which are categorized as the

data retention margin, access disturb margin, and write margin. While several

definitions of each metric have been used in the literature, we adopt the method

of ‘N-curves’ prescribed in [165, 172]. Unlike traditional static voltage noise mar-

gin based setups, the N-curve method enables direct measurement of maximum

DC noise voltage as well as DC noise current that can be tolerated at the SRAM

internal nodes. Figs. 6.2(a) and (b) describe the N-curve measurement setup for

hold/read/write conditions. Here, a source monitor unit (SMU) is connected to

the internal storage node (NL) and measures the current supplied/drawn from

node NL (INL) when the node voltage (VNL) is swept from GND to VDD.

In the read condition, BL, BLB, and WL are held at VDD, while the SMU sweeps node NL. The characteristic read N-curves for two bitcells, 6T1 and 6T2, are shown in Fig. 6.3, and contain two zero-crossings $A_i$ and $B_i$. The read voltage noise margin (RVNM) is defined as $RVNM_i = V_{B_i} - V_{A_i}$. The read current noise margin (RINM) is defined as $RINM_i = \max(I_{NL}) : V_{A_i} < V_{NL} < V_{B_i}$. Since bitcells with unequal RVNMs/RINMs cannot be compared directly [173], a read power noise margin (RPNM) metric is needed:

$$RPNM_i = \int_{V_{A_i}}^{V_{B_i}} I_{NL} \, dV_{NL} \qquad (6.1)$$

Figure 6.2: Setup for (a) DC hold metrics, and (b) DC read/write metrics

Figure 6.3: N-curve for the DC read condition [plot: INL (A) vs. VNL (V) for bitcells 6T1 and 6T2, with zero-crossings A, B, C and the RVNM/RINM annotations]

Since bitcells with a larger RPNM are more stable during the read operation, 6T2 is better than 6T1 from a read margin perspective. The second half of the N-curve can also be used to evaluate DC write-ability (Fig. 6.4). Here, the write-trip voltage (WTV) is defined as $WTV_i = V_{C_i} - V_{B_i}$. The write-trip current (WTI), which is the maximum current required to write into the bitcell, is defined as $WTI_i = \max(|I_{NL}|) : V_{B_i} < V_{NL} < V_{C_i}$. As with earlier conditions, the write-trip power (WTP) is defined as

$$WTP_i = \int_{V_{B_i}}^{V_{C_i}} |I_{NL}| \, dV_{NL} \qquad (6.2)$$

Thus, the main criteria for 6T FinFET SRAM DC operation are minimizing WTP, while maximizing RPNM [173].
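The N-curve definitions above reduce to locating the zero-crossings A, B, and C of a sampled INL(VNL) sweep and integrating between them. The sketch below is one possible post-processing routine, written under the assumptions that the sweep contains at least three zero-crossings and that the current is positive between A and B; the cubic used as input is purely synthetic and does not represent any simulated bitcell.

```python
def zero_crossings(v, i):
    """Voltages where the sampled current is zero or changes sign (linear interpolation)."""
    roots = []
    for k in range(len(v) - 1):
        i0, i1 = i[k], i[k + 1]
        if i0 == 0.0:
            roots.append(v[k])
        elif i0 * i1 < 0.0:
            roots.append(v[k] - i0 * (v[k + 1] - v[k]) / (i1 - i0))
    if i[-1] == 0.0:
        roots.append(v[-1])
    return roots


def clipped_segment(v, i, lo, hi):
    """Samples strictly between lo and hi, closed by zero-current end points."""
    inner = [(x, y) for x, y in zip(v, i) if lo < x < hi]
    return [(lo, 0.0)] + inner + [(hi, 0.0)]


def trapezoid(points):
    """Trapezoidal integral of y dx over a list of (x, y) points."""
    return sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def n_curve_metrics(v, i):
    """Read/write metrics from the first three zero-crossings A < B < C."""
    a, b, c = zero_crossings(v, i)[:3]
    read = clipped_segment(v, i, a, b)
    write = [(x, abs(y)) for x, y in clipped_segment(v, i, b, c)]
    return {
        "RVNM": b - a,                      # V
        "RINM": max(y for _, y in read),    # A
        "RPNM": trapezoid(read),            # W, Eq. (6.1)
        "WTV": c - b,                       # V
        "WTI": max(y for _, y in write),    # A
        "WTP": trapezoid(write),            # W, Eq. (6.2)
    }


# Synthetic N-curve (not simulation data): a cubic with zeros at 0.2, 0.5, 0.9 V.
volts = [n / 1000 for n in range(1001)]
amps = [1e-3 * (x - 0.2) * (x - 0.5) * (x - 0.9) for x in volts]
print(n_curve_metrics(volts, amps))
```

Working on the clipped segments keeps the integration limits at the interpolated crossings rather than at the nearest sample, which matters when the sweep step is coarse.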

Figure 6.4: N-curve for the DC write condition [plot: INL (A) vs. VNL (V) for bitcells 6T1 and 6T2, with zero-crossings B, C and the WTV/WTI annotations]

6.3.2 Transport analysis based 3D-TCAD extraction of FinFET SRAM parasitic capacitances

With increased scaling, owing to the close proximity of FEOL/BEOL regions,

dense layouts, such as FinFET SRAM arrays, need to undergo transport analysis

based 3D-TCAD parasitic capacitance extraction [133]. Here, active silicon re-

gions need to be treated as semiconductors in (FEOL+BEOL) structures generated

through process simulation of the layout under consideration. However, process

simulation time/memory complexity scales very poorly as the number of devices

in the layout increases. This is a major bottleneck for any iterative flow employing

it. We circumvented the process simulation barrier by leveraging an automated

multi-gate structure synthesizer [Fig. 5.22(b)], that is described in Chapters 4 and

5. A planar FET implementation of the method, with hardware validation in a

32nm SOI CMOS process, is described in [145].


The structure synthesis methodology involves a one-time process-simulation

cost for the construction of a 22nm SOI FinFET database consisting of n-/p-

FinFETs. Thereafter, with the aid of the FEOL/(FEOL+BEOL) multi-gate struc-

ture synthesizer (which is equipped with a layout analyzer/partitioner), the

FEOL/(FEOL+BEOL) structure corresponding to any input FinFET circuit layout

is synthesized automatically, using the FinFET device database and FEOL/BEOL

process assumptions. Therefore, by preserving process-level accuracy in critical

regions of the FinFET structure, the flow in Fig. 5.22(b) provides very favorable

time/memory scaling properties and enables iterative optimization in a practical

timeframe.

6.3.3 Modeling dynamic behavior of FinFET SRAM bitcells

In order to model the transient behavior of FinFET SRAMs, we utilized the hybrid

mixed-mode setup shown in Fig. 6.5. We leveraged the framework in Fig. 5.22(b)

and performed transport analysis based 3D-TCAD capacitance extraction on the

(FEOL+BEOL) structure synthesized from each input SRAM layout. Thereafter,

the 3D-TCAD-extracted capacitances from the (FEOL+BEOL) structure were back-

annotated into the bitcell mixed-mode 2D-TCAD setup. In order to avoid double

counting contributions already accounted for in the device cross-sections, a ca-

pacitance extraction experiment was performed for the bitcell mixed-mode setup,

and the difference between the former and the latter was included explicitly as a

mixed-mode capacitance correction.
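The double-counting correction described above amounts to a per-net-pair subtraction, ΔC(a,b) = C_3D(a,b) − C_MM(a,b). The following sketch shows only that bookkeeping; the net names follow the figures, but the capacitance values are made-up placeholders and the helper names are assumptions of this example, not part of the TCAD tool flow.

```python
def pair(a, b):
    """Order-insensitive key for the capacitance between two nets."""
    return frozenset((a, b))


def capacitance_correction(c_3d, c_mm):
    """Return dC = C_3D - C_MM for every net pair found in the 3D extraction.

    Pairs that the mixed-mode setup does not see at all are treated as
    contributing 0 there, so their full 3D value is back-annotated.
    """
    return {nets: c - c_mm.get(nets, 0.0) for nets, c in c_3d.items()}


# Placeholder capacitances in aF, chosen only to illustrate the bookkeeping.
c_3d = {pair("BL", "NL"): 50.0, pair("BL", "WL"): 30.0, pair("BL", "NR"): 8.0}
c_mm = {pair("BL", "NL"): 12.0, pair("BL", "WL"): 5.0}

delta = capacitance_correction(c_3d, c_mm)
for nets, value in delta.items():
    print(sorted(nets), value)
```

Using an unordered key avoids treating C(BL, NL) and C(NL, BL) as different entries when the two extractions list nets in different orders.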

Since an SRAM bitcell transient simulation is only relevant in the context of an

array, the complete mixed-mode setup was dynamically generated for each array

configuration of the bitcell, i.e., depending on the row width and column height,

as shown in Fig. 6.5. The latter was used to compute the minimum read/write

pulse-widths under various conditions.


In the next section, we discuss several 6T FinFET SRAM bitcell topologies,

which were evaluated under various conditions with the setup described above.

Figure 6.5: Hybrid mixed-mode device simulation methodology for simulating SRAM read/write operations

6.4 Design of 6T FinFET SRAMs

In this section, we evaluate several different FinFET SRAM bitcells from a DC,

parasitics, and transient perspective, after a brief discussion on bitcell operation

for each topology.

6.4.1 6T FinFET SRAM topologies

6T FinFET SRAMs can be broadly classified into three different categories:


• The ‘vanilla shorted-gate configurations’ (VSCs) are direct extensions of planar SRAMs, using only Symm-ΦG SG-mode FinFETs. Owing to width quantization, the βPD/PG and βPG/PU ratios are restricted, and the use of a single ΦG significantly improves processing cost/yield (recall that PD, PG, and PU refer to the pull-down, pass-gate, and pull-up FETs, respectively). Hence, VSCs are attractive for fin/gate pitch scaling and are the easiest bitcells to manufacture, but require larger bitcell areas when pass-gate/pull-down fin counts are increased to improve bitcell β ratios. They are designated as V(NPU NPG NPD), where NPU/NPG/NPD are the fin counts for the PU/PG/PD FETs, respectively.

• The ‘independent-gate configurations’ (IGCs) are derived from VSCs by re-

placing one or more Symm-ΦG SG-mode FinFETs with Symm-ΦG IG-mode

FinFETs. Owing to the flexibility of back-gate bias based Vth modulation, IGC

bitcells can improve DC metrics without resorting to increasing PG/PD fin

counts for improved stability. However, they are harder to manufacture (due

to layout-specific IG-mode devices), and are unlikely to be very scalable.

• The ‘multiple-ΦG shorted-gate configurations’ (MSCs) leverage SG-mode

FinFETs with two or more gate workfunctions. The devices can be Symm-ΦG

or Asymm-ΦG, and either condition leads to increased processing complex-

ity for the gate-stack. Owing to the availability of multiple Vth’s, MSC bitcells

can improve DC/transient metrics with the same fin/gate pitch scaling

abilities as VSCs, without using multi-fin PG/PD devices. In the current

work, we restrict our investigations to bitcells having combinations of only

two distinct ΦG’s.
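The V(NPU NPG NPD) nomenclature can be decoded mechanically. The sketch below parses a VSC name into fin counts and reports fin-count ratios as a rough proxy for the β ratios; treating β as a pure fin-count ratio is an assumption of this example (it ignores, for instance, the n-/p-FinFET drive-strength difference in βPG/PU), and the function names are hypothetical.

```python
import re
from fractions import Fraction


def parse_vsc(name):
    """Decode a name such as 'V(135)' into single-digit PU/PG/PD fin counts."""
    match = re.fullmatch(r"V\((\d)(\d)(\d)\)", name)
    if match is None:
        raise ValueError(f"not a VSC bitcell name: {name!r}")
    n_pu, n_pg, n_pd = (int(d) for d in match.groups())
    return {"PU": n_pu, "PG": n_pg, "PD": n_pd}


def fin_count_ratios(fins):
    """Fin-count ratios used here as a stand-in for the beta ratios (assumption)."""
    return {
        "PD/PG": Fraction(fins["PD"], fins["PG"]),
        "PG/PU": Fraction(fins["PG"], fins["PU"]),
    }


for cell in ["V(111)", "V(112)", "V(113)", "V(122)", "V(123)", "V(135)"]:
    fins = parse_vsc(cell)
    ratios = fin_count_ratios(fins)
    print(cell, fins, {k: str(r) for k, r in ratios.items()})
```

Because fins are discrete, only a handful of ratios are reachable for small bitcells, which is the width-quantization restriction noted in the first bullet.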

In Table 6.2, the PU/PG/PD device configurations for VSC, IGC, and MSC bitcells are shown, along with the nomenclatures for each bitcell.

Table 6.2: 6T FinFET SRAM device configurations

Topology                                          NPU  PU ΦG (eV)      NPG   PG ΦG (eV)      NPD   PD ΦG (eV)

Vanilla configurations
V(111), V(112), V(113), V(122), V(123), V(135)    1    SG, 4.6         1→3   SG, 4.6         1→5   SG, 4.6

Independent-gate configurations
Pass-gate feedback, PGFB                          1    SG, 4.6         1     IG, 4.6         1     SG, 4.6
Pull-up write gating, PGFB-PUWG                   1    IG, 4.6         1     IG, 4.6         1     SG, 4.6
Split-pull-up, PGFB-SPU                           1    IG, 4.6         1     IG, 4.6         1     SG, 4.6
Row-based back-gate bias, RBB                     1    SG, 4.6         1     IG, 4.6         1     IG, 4.6

Multiple-ΦG configurations
A(111), A(112)                                    1    a-SG, 4.4/4.8   1     a-SG, 4.4/4.8   1→2   a-SG, 4.4/4.8
A(11)S                                            1    a-SG, 4.4/4.8   1     a-SG, 4.4/4.8   1     SG, 4.4
DPD-L                                             1    SG, 4.6         1     SG, 4.6         1     SG, 4.4
DPG-H                                             1    SG, 4.6         1     SG, 4.8         1     SG, 4.6

VSC bitcells: In Fig. 6.6, the (FEOL+BEOL) and FEOL structures for V(135) are shown [the structures for V(111), V(112), V(113), V(122), and V(123) are shown in Chapter 5]. They are based on traditional SRAM thin-cell layouts with PU p-FinFETs sandwiched between PG/PD n-FinFETs, and have metal-3 word lines (WL) and metal-2 bit lines/power lines (BL, BLB, VDD, GND), as shown in the perspective view in Fig. 5.26(a). The default fin pitch and gate pitch were set to FP = 40nm and GP = 90nm, respectively.

Figure 6.6: V(135) bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

IGC bitcells: In Figs. 6.7, 6.8, 6.9, and 6.10, the (FEOL+BEOL) and FEOL structures corresponding to the PGFB, PGFB with PUWG (PGFB-PUWG) [71, 72], pass-gate feedback split pull-up (PGFB-SPU), and RBB [77] configurations are shown, respectively, with FP = 40nm and GP = 90nm.

Figure 6.7: PGFB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

PGFB: Unlike the V(111) bitcell, the PGFB bitcell contains IG-mode PG devices whose back gates are connected to their respective storage nodes. While there is no area penalty in moving from V(111) to PGFB, the internal node gate conductors need to be extended, which alters the internal node capacitances and, hence, the dynamic write-ability of the bitcell.

Figure 6.8: PGFB-PUWG bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

PGFB-PUWG: This bitcell is derived from the PGFB bitcell, with the PU devices

in IG mode. The back gates of the latter are connected to WWL. During hold and

read conditions, WWL is de-asserted, and the bitcell is expected to behave like the

PGFB bitcell. During a write operation, WWL is asserted to selectively weaken the

PU devices, thereby improving write-ability. Also, the PGFB-PUWG bitcell incurs

extra area in the form of an additional fin pitch, in order to prevent a conflict in

wiring metal-3 WL and WWL nodes.

Figure 6.9: PGFB-SPU bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

PGFB-SPU: The PGFB-SPU bitcell is a variant of the PGFB bitcell, where the PU devices are in IG mode and their back gates are hardwired to VDD. This helps improve write-ability over the PGFB configuration without incurring the additional parasitic capacitances that arise from the extra word line in PGFB-PUWG.

RBB: In this configuration, the PG/PD devices are in IG mode and their back gates are connected to a metal-3 BIAS node that is shared across the entire row. The bitcell area increases by 33% (Fig. 6.11) as additional fin pitches are needed to contact the back gates of the PG/PD FETs. During the hold operation, the BIAS node is de-asserted (below the rail), thereby limiting bitcell leakage. During the read/write operations, the BIAS node is asserted along with WL, in order to improve dynamic read/write-ability.

Figure 6.10: RBB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

MSC bitcells are identical to the corresponding VSC bitcells from a layout perspective. They leverage either Asymm-ΦG or dual-Symm-ΦG n-/p-FinFETs to improve DC metrics without incurring the additional parasitic capacitances seen in IGC bitcells.

Figure 6.11: Bitcell areas normalized to FP×GP

Next, we take a closer look at DC metrics of the bitcells described above.

Figure 6.12: VSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM

6.4.2 6T FinFET SRAM DC metrics

We now examine the dependence of DC metrics on VDD to provide a high-level pic-

ture of read stability, write-ability, read current (IREAD), and bitcell leakage (ILEAK).

Read margins: In Figs. 6.12, 6.13, and 6.14, read margins for the VSC, IGC, and MSC bitcells are shown. Owing to the high βPD/PG ratio, V(113) has the highest RVNM. However, V(135) has the highest RINM, which degrades gracefully as VDD decreases. In terms of RPNM, V(113) is the best amongst VSC bitcells for VDD > 0.8V, while V(135) crosses over for VDD < 0.7V. A similar RPNM crossover can be observed between V(112) and V(123), at VDD = 0.9V. This shows that a high βPD/PG ratio does not necessarily translate to high RINM/RPNM. Also, at VDD = 1V, $RPNM_{\text{V(113)}} : RPNM_{\text{V(111)}} \approx 5.2$, which is a dramatic increase in stability with increased βPD/PG ratio.

Figure 6.13: IGC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM

Amongst the IGC bitcells, stability metrics for RBB were not computed as the N-curve based RPNM and WTP definitions are not directly applicable in the traditional sense, owing to the inherently dynamic nature of read and write operations (where BIAS overdrives the PG/PD n-FinFETs, when WL is triggered). While the PGFB-PUWG bitcell faces a similar dynamic condition during write, we characterized read/write metrics for VWWL = 0V. From Figs. 6.13(a) and (b), $RVNM_{\text{PGFB}} < RVNM_{\text{PGFB-PUWG}}$, while $RINM_{\text{PGFB}}$ crosses over $RINM_{\text{PGFB-PUWG}}$ at VDD = 0.7V. This hints at a potential $RPNM_{\text{PGFB}} > RPNM_{\text{PGFB-PUWG}}$ crossover for VDD > 1V (not shown). Similar behavior has been observed in [72]. PGFB fares better than V(111), as $RPNM_{\text{PGFB}} : RPNM_{\text{V(111)}} \approx 2.7$ (all ratios are reported at VDD = 1V). Among the IGC bitcells, PGFB-SPU has the lowest RVNM/RINM, resulting in poorer read stability with respect to PGFB, as $RPNM_{\text{PGFB}} : RPNM_{\text{PGFB-SPU}} \approx 1.6$.

Figure 6.14: MSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM

From Fig. 6.14(a), MSC bitcells can be seen to fare well in terms of RVNM. However, RINM is surprisingly poor for A(111), A(11)S, and DPD-L. This results in poor RPNM, as $RPNM_{\text{A(112)}} : RPNM_{\text{A(111)}} \approx 3.6$ and $RPNM_{\text{A(112)}} : RPNM_{\text{A(11)S}} \approx 7$. A(111) is slightly worse than its VSC counterpart V(111), as $RPNM_{\text{V(111)}} : RPNM_{\text{A(111)}} \approx 1.5$. It also fares poorly with respect to PGFB, where $RPNM_{\text{PGFB}} : RPNM_{\text{A(111)}} \approx 4.2$.

Figure 6.15: VSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP

Write margins: In Figs. 6.15, 6.16, and 6.17, write margins for the VSC, IGC, and MSC bitcells are shown. While all the VSC bitcells have reasonable WTVs, V(113) and V(112) have high WTIs. V(111) and V(122) have the lowest WTP and, hence, are the best VSC bitcells from a DC write-ability perspective. V(112), V(113), V(123), and V(135) have large WTPs, which suggests that dynamic write-ability will be a major concern for VSC bitcells. Here, it is important to note that DC read metrics are pessimistic, i.e., they seek unconditional read stability for arbitrarily large WL pulse widths. On the other hand, DC write metrics can be very optimistic, as WL pulse widths, in reality, are of finite duration, during which the bitcell needs to be written successfully. While the ability to guarantee write-ability with a WL pulse width of infinite duration is not very useful, the relative difficulty in writing to a bitcell can be gauged from the WTP.

Figure 6.16: IGC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP

Amongst the IGC bitcells, while PGFB-PUWG has the lowest WTV, it also has the highest WTI. PGFB-SPU has the lowest WTI and WTP, making it the best IGC bitcell from a write-ability perspective. It also corresponds to the other operating extreme of PGFB-PUWG, i.e., VWWL = VDD. PGFB-SPU is better in comparison to V(122) (which is the most write-able VSC bitcell), as $WTP_{\text{V(122)}} / WTP_{\text{PGFB-SPU}} \approx 1.7$.

Figure 6.17: MSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP

MSC bitcells have high WTVs and low WTIs (Fig. 6.17). A(111), A(11)S, and DPD-L have low WTPs, with A(111) being the best bitcell. In moving from A(111) to A(112), while RPNM improves by 3.6× (Fig. 6.14), write-ability degrades, as $WTP_{\text{A(112)}} / WTP_{\text{A(111)}} \approx 2.1$.

Read current: In Figs. 6.18(a), (b), and (c), the dependence of IREAD on VDD is

shown for each of the bitcells. At VDD = 1V , amongst the V SC bitcells, IREAD is

roughly proportional to the PG fin count (e.g., IREAD,V (122) : IREAD,V (111) ≈ 2). Also,

V (111)/V (112) bitcells have 3× larger IREAD compared to PGFB bitcells. However,

IREAD,RBB ≈ IREAD,V (111). Also, all MSC bitcells (with the exception of DPG-H) have approximately the same IREAD as V (111).

Figure 6.18: IREAD vs. VDD: (a) V SC, (b) IGC, and (c) MSC
[Panels plot IREAD (A, log scale) against VDD (V). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135). Legend (b): PGFB, PGFB-PUWG, PGFB-SPU, RBB.]

Bitcell leakage current: In Figs. 6.19(a), (b), and (c), the dependence of ILEAK on VDD is

shown. At VDD = 1V , while moving from V (111) to V (112) [V (113)], ILEAK increases

by 48% (97%). Amongst V SC bitcells, V (135) has the highest ILEAK that is nearly

4× ILEAK,V (111). Among the IGC bitcells, PGFB-PUWG has unacceptably high ILEAK

for the DC hold state configuration with VWWL = 0V . Hence, from an ILEAK per-

spective, the PGFB-PUWG bitcell will typically be restricted to VWWL = VDD − δ

(δ < 0.2V ). However, such a choice for VWWL would dramatically impact the read

stability, thereby making it unattractive. Amongst MSC bitcells, A(11)S and DPD-L have nearly two orders of magnitude higher ILEAK than A(111), owing to the presence of Symm-ΦG high-performance SG-mode FinFETs. In comparison to V (111), which is equipped with Symm-ΦG low-power FinFETs (ΦG = 4.6eV ), A(111) [A(112)] fares poorly with 4.7× (7×) higher ILEAK . However, vanilla-like topologies with Asymm-ΦG SG-mode FinFETs, such as A(111), are still an attractive option for low-leakage bitcells in high-performance dual gate-workfunction processes.

Figure 6.19: ILEAK vs. VDD: (a) V SC, (b) IGC, and (c) MSC
[Panels plot ILEAK (A, log scale) against VDD (V). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135). Legend (b): PGFB; PGFB-PUWG, VWWL = 0V ; PGFB-PUWG, VWWL = 1V ; PGFB-SPU; RBB.]

Next, we examine parasitic capacitances in the V SC, IGC, and MSC bitcells in

greater detail.

6.4.3 6T FinFET SRAM parasitic capacitances

Since the SRAM bitcell layout closely affects the parasitic BL and WL capacitances,

it is essential to understand the sources of coupling and their relative contributions

in order to perform any kind of layout exploration/optimization for good transient

behavior.

Figure 6.20: Breakup of V (111) (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 9%; CBL,VDD 35%; CBL,WL 27%; CBL,NR < 1%; CBL,GND 27%; CBL,BLB < 1%; CBL,WFRGND < 1%
(b) CWL: CWL,NL 21%; CWL,VDD 2%; CWL,BL 16%; CWL,NR 21%; CWL,GND 24%; CWL,BLB 16%; CWL,WFRGND < 1%

Figure 6.21: Breakup of V (123) (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 13%; CBL,VDD 19%; CBL,WL 50%; CBL,NR < 1%; CBL,GND 17%; CBL,BLB < 1%; CBL,WFRGND < 1%
(b) CWL: CWL,NL 22%; CWL,VDD 2%; CWL,BL 21%; CWL,NR 22%; CWL,GND 12%; CWL,BLB 20%; CWL,WFRGND < 1%

(FEOL+BEOL) capacitance break-up: In Figs. 6.20, 6.21, and 6.22, the V SC (FEOL+BEOL) BL capacitance (CBL) and WL capacitance (CWL) are decomposed into their components. Moving from V (111) → V (123) → V (135), CBL,WL comes to dominate, growing from 27% → 50% → 63%, while CBL,VDD roughly halves at each step, from 35% → 19% → 7%. On the CWL front, CWL,GND loses share, from 24% → 12% → 10%, while CWL,BL and CWL,BLB increase and level out, from 16% → 20% → 19%. This shows that reductions in CBL,WL and CWL,NL/CWL,NR are key priorities when the βPD/PG ratio is increased in V SC bitcells.

Figure 6.22: Breakup of V (135) (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 13%; CBL,VDD 7%; CBL,WL 63%; CBL,NR < 1%; CBL,GND 15%; CBL,BLB < 1%; CBL,WFRGND 1%
(b) CWL: CWL,NL 24%; CWL,VDD 3%; CWL,BL 19%; CWL,NR 24%; CWL,GND 10%; CWL,BLB 19%; CWL,WFRGND < 1%

Figure 6.23: Breakup of PGFB (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 17%; CBL,VDD 35%; CBL,WL 20%; CBL,NR < 1%; CBL,GND 27%; CBL,BLB < 1%; CBL,WFRGND < 1%
(b) CWL: CWL,NL 21%; CWL,VDD 3%; CWL,BL 13%; CWL,NR 21%; CWL,GND 28%; CWL,BLB 13%; CWL,WFRGND < 1%

In the PGFB bitcell (Fig. 6.23), CBL is dominated by CBL,VDD (35%), while CWL

is dominated by CWL,GND (28%). However, for the PGFB-SPU bitcell (Fig. 6.24),

the trend reverses. Here, CBL is dominated by CBL,GND (38%) and CWL is domi-

nated by CWL,VDD (32%). In the PGFB-PUWG scenario (Fig. 6.25), CBL,GND (37%) and

CWL,WWL (26%) dominate CBL and CWL, respectively. For the RBB bitcell (Fig. 6.26),

CBL mainly consists of CBL,WL (35%) and CBL,VDD (34%). However, CWL reduction is

very difficult in this case, as it consists of six nearly equal contributions.

Figure 6.24: Breakup of PGFB-SPU (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 13%; CBL,VDD 28%; CBL,WL 19%; CBL,NR < 1%; CBL,GND 38%; CBL,BLB < 1%; CBL,WFRGND < 1%
(b) CWL: CWL,NL 21%; CWL,VDD 32%; CWL,BL 11%; CWL,NR 21%; CWL,GND 3%; CWL,BLB 11%; CWL,WFRGND 1%

Figure 6.25: Breakup of PGFB-PUWG (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 13%; CBL,VDD 27%; CBL,WL 18%; CBL,NR < 1%; CBL,GND 37%; CBL,BLB < 1%; CBL,WFRGND < 1%; CBL,WWL 3%
(b) CWL: CWL,NL 18%; CWL,VDD 18%; CWL,BL 9%; CWL,NR 17%; CWL,GND 2%; CWL,BLB 9%; CWL,WFRGND 1%; CWL,WWL 26%

Figure 6.26: Breakup of RBB (FEOL+BEOL) BL and WL capacitances
(a) CBL: CBL,NL 6%; CBL,VDD 34%; CBL,WL 35%; CBL,NR < 1%; CBL,GND 6%; CBL,BLB < 1%; CBL,WFRGND < 1%; CBL,BIAS 18%
(b) CWL: CWL,NL 14%; CWL,VDD 2%; CWL,BL 16%; CWL,NR 15%; CWL,GND 17%; CWL,BLB 19%; CWL,WFRGND < 1%; CWL,BIAS 17%

V SC bitcell capacitances: In Fig. 6.27, the (FEOL+BEOL) capacitances for the V SC bitcells are shown. CBL decreases by 25% while moving from V (111) to V (112), as the addition of a single fin to the PD FET adds an extra fin pitch, which permits larger spacings between GND, BL, and VDD. While the V (113) configuration adds an extra PD fin, the PG fin count remains unchanged, on account of which the reduction in CBL from V (112) to V (113) is not significant. Thus, we see that a 33% (66%) increase in bitcell area, from the V (111) to V (112) [V (111) to V (113)] configuration, can reduce CBL and increase the βPD/PG ratio significantly. The V (122) and V (123) bitcells have higher CBL as the PG fin count is higher. Since metal-3 WLs run across the breadth of the bitcell, CWL generally increases as the PD/PG fin count increases. However, CWL decreases from the V (122) to V (123) configuration, as the WL gate to internal node coupling decreases, due to the additional FP spacing between them.

Figure 6.27: (FEOL+BEOL) BL and WL capacitances in V SC bitcells
[Panels plot (a) CBL (aF) and (b) CWL (aF) for the six V SC bitcells.]

Figure 6.28: (FEOL+BEOL) capacitances vs. FP for (a) CBL, and (b) CWL (GP= 90nm)
[Panels plot CBL (aF) and CWL (aF) against fin pitch, FP (nm), for V (111) and PGFB.]

Effect of FP: In Fig. 6.28(a), we can see that as FP decreases, CBL increases. CBL is

greatly affected by FP as the metal-2 BLs run vertical to the bitcell in thin-cell lay-

outs. V (111) and PGFB witness a 32% and 39% increase in CBL, respectively, in

moving from FP = 40nm to 70nm. The plateau in CBL at FP = 50-60nm is due to

the fact that metal-2 BL, BLB, VDD, and GND tracks are wider for FP = 60nm, 70nm,

owing to the larger pitches. CWL is affected by trends at high and low FP for both

bitcells. As FP increases, the metal-3 WL gets longer and aggregates capacitances

from bitcell features below it, which increases CWL. When FP decreases beyond

a certain point (FP = 50nm), the capacitance between the WL gate and shared el-

evated source/drain/metal-1 regions in the neighborhood boosts CWL. From the

above observations, we can see that there is an optimal FP where CBL and CWL are

minimized.

Figure 6.29: IGC bitcell capacitances vs. V (111): (a) (FEOL+BEOL), and (b) FEOL
[Each panel shows capacitance (aF) and % difference w.r.t. V (111) for CBL, CWL, and CNL. Legend: V (111), PGFB, PGFB-PUWG, PGFB-SPU, RBB.]

IGC vs. V (111): In Fig. 6.29, a comparison between IGC and V (111) bitcell capacitances is shown. The (FEOL+BEOL) CBL for IGC bitcells hovers around 0.95-1.05×CBL,V (111). However, in the (FEOL+BEOL) CWL cases, PGFB-PUWG (RBB) registers a 13% (31%) increase, and PGFB witnesses a 15% reduction. PGFB-SPU has nearly identical CBL/CWL values as V (111). This suggests that CWL increases considerably in IGC configurations that have extensive routing for back-gate connections. This is supported by Fig. 6.29(b), where RBB has 16% lower FEOL CWL than V (111) (owing to the four IG-mode FETs), which indicates that the increase in its (FEOL+BEOL) CWL originates in the BEOL routing.

Next, we examine the transient behaviors of the bitcells and contrast them with

inferences drawn earlier from DC metrics.

6.4.4 Transient behavior of 6T FinFET SRAMs

We captured the minimum read pulse width (TR) and minimum write pulse width

(TW ) for several array configurations, which were modeled using the setup shown

in Fig. 6.5.

Figure 6.30: TR vs. VDD: (a) V SC, (b) IGC, and (c) MSC
[Panels plot TR (s) against VDD (V). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]

Figure 6.31: TR vs. bitcell σ: (a) V SC, (b) IGC, and (c) MSC, VDD = 1V
[Panels plot TR (s) against bitcell σ. Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]

In Fig. 6.30, the dependence of TR on VDD is shown. The default array config-

uration consisted of 32 bitcells per column and 512 bitcells per row. Amongst the

V SC bitcells, V (135) has the largest TR, nearly 2.7× higher than that of V (111), de-

spite having the highest IREAD, owing to the highest WL RC-delay. Also, V (111)

has marginally lower TR in comparison to V (112) despite having larger CBL, owing

to lower WL RC-delay. At VDD = 1V , in moving from V (111) to V (113) (which has

the best RPNM), there is a 45% increase in TR, which is considerable. Among the

IGC bitcells, while PGFB has the lowest TR at VDD = 1V , RBB crosses over at lower

VDD. However, in comparison to V (111), TR,PGFB is 33% higher. In the MSC bitcell

category [Fig. 6.30(c)], DPG-H has the highest TR owing to the lowest IREAD, on account of the high-Vth PG n-FinFETs. A(111), A(11)S, and DPD-L have the lowest TR.

With the exception of DPG-H, for all the MSC bitcells, TR degrades gracefully as

VDD decreases. In moving from A(111) to A(112) (which has much higher RPNM),

TR increases by 14%.

The dependence of TR on the worst-case bitcell FET Vth skews, measured in

units of bitcell σ (where σΦG = 30meV or equivalently σVth = 30mV , VDD = 1V ), is

shown in Fig. 6.31. For the V SC bitcells, with the exception of V (135), TR increases

by 34-42%, in moving from the nominal to 6σ cases. Among the IGC bitcells, al-

though RBB has tolerable TR vs. VDD variation, TR increases dramatically for larger

bitcell σ. In comparison to V SC bitcells, the PGFB bitcells fare poorly as bitcell σ

increases. PGFB-PUWG, PGFB-SPU , and PGFB have 58%, 58%, and 84% higher

TR, respectively, in the 6σ cases. With the exception of DPG-H, all the MSC bitcells

degrade gracefully with increased bitcell σ, and witness a 33-35% increase in TR in

the 6σ case.

In Fig. 6.32, the dependence of TW on VDD is shown. For the V SC bitcells, TW

increases by 46-51% as VDD decreases. Amongst the IGC bitcells, PGFB, PGFB-

SPU , and PGFB-PUWG fare poorly as TW increases by 95%, 114%, and 109%, re-

spectively. On the other hand, RBB faces a 44% increase in TW , despite being

the hardest bitcell to write to at VDD = 1V . Among the MSC bitcells, DPG-H has

the highest TW with a poor VDD scaling trend. A(111) has the best TW , and is 2%

lower than V (111), at VDD = 1V . In moving from A(111) to A(112), TW increases

by 34%, whereby write delay dominates. In Fig. 6.33, the effect of bitcell σ on TW

is shown. V SC bitcells degrade gracefully as bitcell σ increases. However, among

IGC bitcells, PGFB and RBB face a steep rise in TW , implying that dynamic write-

ability is a major problem for these bitcells at high bitcell σ. PGFB-SPU has the

best TW on account of having the back-gate tied to VDD. Also, during a write operation, TW,PGFB−PUWG > TW,PGFB−SPU as WWL, which is asserted along with WL, faces a finite RC-delay before weakening the PU p-FinFETs. Amongst the MSC bitcells, A(111) has the best TW characteristic and the performance loss with respect to A(112) is relatively uniform as bitcell σ increases.

Figure 6.32: TW vs. VDD: (a) V SC, (b) IGC, and (c) MSC
[Panels plot TW (s) against VDD (V). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]

Several trends in access time [TACC = max(TR, TW )] can be inferred from

Figs. 6.30, 6.31, 6.32, and 6.33. V SC bitcells are limited by TW for all VDD. Also, for

any given bitcell σ, TACC = TW , implying that V SC bitcells are faced with the issue

of poor dynamic write-ability. Amongst IGC bitcells, at high VDD, TACC = TW for

PGFB and RBB, and TACC = TR for PGFB-PUWG and PGFB-SPU . However, at low

VDD, TACC = TR for all the IGC bitcells. Similarly, MSC bitcells are limited by TR at

low VDD and by TW at high VDD. The above observations also broadly apply when

the array configuration (designated by [column height, row width]) is modified (Figs. 6.34, 6.35). In general, TR and TW increase dramatically on increasing the row width from 128 to 512 bits (with the column height at 32 bitcells). Increasing the column height from 16 to 64 bitcells results in a marginal increase in TR and TW , which suggests that reducing WL RC-delay is as important as reducing CBL.

Figure 6.33: TW vs. bitcell σ: (a) V SC, (b) IGC, and (c) MSC, VDD = 1V
[Panels plot TW (s) against bitcell σ. Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]
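The sensitivity to row width noted above is what a first-order distributed-RC view of the word line would predict. The sketch below is not the mixed-mode TCAD flow used in this chapter; it is a minimal Elmore-delay model of an N-segment RC ladder, and the per-bitcell resistance and capacitance values (`R_CELL`, `C_CELL`) are hypothetical placeholders chosen only for illustration.

```python
def wordline_elmore_delay(n_cells, r_per_cell, c_per_cell):
    """Elmore delay (s) at the far end of an n-segment RC ladder.

    The capacitor at node k (1-based) is charged through the k upstream
    resistors, so the delay is r*c*(1 + 2 + ... + n) = r*c*n*(n + 1)/2.
    """
    return r_per_cell * c_per_cell * n_cells * (n_cells + 1) / 2.0


# Hypothetical per-bitcell word-line values, for illustration only.
R_CELL = 2.0        # ohms of WL wire per bitcell (assumed)
C_CELL = 200e-18    # farads of WL load per bitcell (assumed)

delays = {n: wordline_elmore_delay(n, R_CELL, C_CELL) for n in (128, 256, 512)}

# Quadrupling the row width raises the Elmore delay by roughly 16x,
# because the delay grows as n*(n + 1)/2.
ratio_512_vs_128 = delays[512] / delays[128]
```

Under this simplified model the WL delay grows roughly quadratically with row width, whereas the column height only enters through the BL loading, which is not modeled here.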

Effect of FP: The parasitic capacitances extracted during FP variation experiments

for V (111) in Section 6.4.3 were back-annotated into transient simulations for

TR/TW , and the results are shown in Fig. 6.36. At VDD = 1V , while moving from

FP = 40nm to (FP = 50nm, 60nm, 70nm), TR increases by (9%, 26%, 40%), respec-

tively, and TW increases by (15%, 36%, 56%), respectively. These results show that

FP has a very pronounced impact on the transient behavior of the bitcell, and the dependence of FEOL parasitic capacitances on FP needs to be accounted for accurately in compact models for circuit-level transient simulations.

Figure 6.34: TR vs. array configuration: (a) V SC, (b) IGC, and (c) MSC
[Panels plot TR (s) for the array configurations (16,512), (32,128), (32,256), (32,512), and (64,512). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]

Overall, from Figs. 6.12, 6.13, 6.14, 6.15, 6.16, 6.17 and Figs. 6.30, 6.31, 6.32,

6.33, we can see that the dynamic behavior of the bitcell is extremely important

to account for and analysis using DC metrics alone can lead to misleading conclu-

sions. For instance, V SC bitcells with multi-fin PG FETs suffer a penalty in terms

of increased word line RC-delay that dwarfs any savings from the improved DC

write-ability of the bitcell. E.g., while V (122) has the least WT P with βPG/PU ≈ 2, it

does not translate to the lowest TW , on account of higher parasitics.

Figure 6.35: TW vs. array configuration: (a) V SC, (b) IGC, and (c) MSC
[Panels plot TW (s) for the array configurations (16,512), (32,128), (32,256), (32,512), and (64,512). Legend (a): V (111), V (112), V (113), V (122), V (123), V (135).]

6.5 Chapter summary

Parasitic capacitances play a very critical role in determining SRAM behavior in

highly scaled technology nodes employing multi-gate devices like FinFETs. Prior

research in FinFET SRAM design has generally focused on the optimization of

DC metrics, and largely ignored the effect of parasitics as well as the dynamic

behavior of bitcells. In this chapter, we presented a unified 3D/mixed-mode 2D-

TCAD methodology that facilitates the extraction of FinFET SRAM parasitic capac-

itances using a transport analysis based approach as well as their subsequent back-

annotation into mixed-mode transient simulations, thereby delineating a path for

layout/technology/multi-gate circuit co-design.

Figure 6.36: (a) TR, and (b) TW vs. VDD for V (111), across different FP
[Panels plot TR (s) and TW (s) against VDD (V) for FP = 40nm, 50nm, 60nm, and 70nm.]

Results indicate that while bitcells having IG-mode devices (in single-ΦG processes) often have superior DC metrics with respect to shorted-gate configurations, their transient characteristics under VDD/Vth variation make them unattractive. Asymm-ΦG FinFET SRAM bitcells (in dual-ΦG processes), on the other hand, have competitive DC metrics and better dynamic write-ability.

Chapter 7

Conclusion

Planar transistor scaling in deep-submicron CMOS technology has approached its limits at the sub-22nm nodes, owing to very poor electrostatic integrity, which is manifested as degraded short-channel behavior and high leakage current. Owing to the latter, it is becoming increasingly difficult to design chips that meet the stringent high-performance, low-power specifications for products ranging from servers/data centers to cellphones/tablets, and yet maintain high yield. Multi-gate FETs overcome these problems due to tighter control of the channel potential by multiple gates wrapped around the body. However, several design and technological challenges remain towards full-scale adoption of multi-gate devices in all parts of the power-performance spectrum.

The primary contribution of this thesis is towards the unification of circuit layout/process/device simulation worlds for early-stage 3D modeling of circuits using emerging devices, such as FinFETs and other multi-gate FETs. This was done by proposing a novel structure synthesis methodology that circumvents the time and memory complexity barrier posed by 3D process simulation and obviates the need for repetitive process simulations via caching/re-use of process-accurate device structures. The latter contribution, which is unique to the industry, was leveraged throughout the dissertation to answer critical questions that arise during technology-circuit co-design efforts for multi-gate circuits.

7.1 Dissertation summary

In this dissertation, we first explored the design space for ultra-low-leakage Fin-

FET logic and sequential elements using Symm-ΦG and Asymm-ΦG FinFETs in

a high-performance process. Through comprehensive simulations of several Fin-

FET INV and NAND2 logic topologies, we demonstrated that introducing a single

Symm-ΦG IG-mode FinFET at the top of the pull-down stack yields the best trade-

offs between leakage and delay for logic styles that mix SG- and IG-mode FinFETs.

We also established the fact that logic gates using pure a-SG-mode FinFETs outper-

form the best topologies possible using a mix of SG-/IG-mode FinFETs, with the

advantage of having no routing overheads due to back-gate biasing and little or no

modifications to existing CAD tools/layouts. Hence, this work challenges the con-

ventional notion from most earlier works that back-gate biasing based IG-mode

devices are the best option for obtaining low-leakage FinFET circuits.

Next, we examined the problem of developing comprehensive fault models for

FinFET circuits. Here, the key question of whether CMOS fault models cover all

defects in FinFET logic circuits had to be addressed. Using exhaustive single de-

fect injection into mixed-mode device-circuit simulations of FinFET SG-/LP-INV

and NAND2 topologies, we confirmed that the majority of defects in SG-mode-like

circuits map to CMOS fault models, such as stuck-at/stuck-on/stuck-open. How-

ever, opens on the back gate have no corresponding fault model and the entire logic

gate had to be characterized from a leakage-delay perspective. Here, several kinds

of behaviors were observed. Depending on the regime of operation, in the pres-

ence of n-/p-FinFET back-gate cuts, the logic gate could behave normally or suffer

excessive leakage or suffer excessive delay/pulse-broadening/pulse-shortening,

making it very difficult to develop a single test protocol for detecting back-gate

cuts. The above explorations comprehensively establish the fact that integrating

IG-mode devices into a multi-gate CMOS process is a bad idea, and that asymmetric gate-workfunction devices are a better option for obtaining high-Vth devices without the hassles of additional gate-workfunctions/IG-mode routing overheads and testing issues.

Thereafter, drawing upon the shortcomings of performing mixed-mode 2D-

TCAD transient simulation for obtaining delay in Chapter 3, we switched to the

problem of determining parasitic capacitances in multi-gate circuit layouts in

Chapters 4 and 5. Here, the need for a true 3D extraction setup, which accounts for

silicon as a semiconductor, became apparent. In the process of solving the latter,

we encountered a much larger problem, which was the 3D process simulation

barrier. We developed a systematic automated structure synthesis methodology

to circumvent the process simulation barrier with the key insight that all portions

of the circuit structure under simulation need not have process-simulation-level

accuracy and that it is possible to amortize process simulations of smaller cir-

cuit blocks by re-using them to synthesize larger circuits. We also validated

the transport analysis based capacitance extraction approach (that requires a

process/device simulator) by comparing simulation results with hardware data

for two companion 6T SRAM arrays implemented in an IBM 32nm SOI process

that was calibrated into our TCAD setup. After establishing the validity of our

approach, we next provided critical insight into parasitic capacitances in multi-

gate circuits at the 22/14/10nm nodes using a multi-gate version of the structure

synthesizer. Overall, structure synthesis is the TCAD analog of logic synthesis

from the circuit world, and it obviates low-level layout-to-3D-TCAD coding work,

which engineers have grappled with for the past decade.

Finally, leveraging the structure synthesis approach, we analyzed several

topologies of FinFET SRAMs in a 22nm SOI process. By performing layout-

to-3D-TCAD capacitance extraction for each SRAM topology (having a mix of

SG-/IG-/a-SG-mode devices), and back-annotating the parasitic capacitances into

mixed-mode device-circuit transient simulations thereafter, we demonstrated that

dynamic read-ability/write-ability can vary dramatically with fin/gate pitch as

well as SRAM topology, and that design with DC targets can lead to sub-optimal

SRAM bitcells and arrays.

7.2 Future work

A significant limitation of the structure synthesis ideas proposed in Chapter 4 is

the exponential slowdown due to the device simulator for any 3D simulation that

does not involve zero bias capacitance extraction, e.g., 3D transient/DC simula-

tion. Here, we propose future work on enabling rapid convergence during generic

device simulation experiments on synthesized 3D structures, by leveraging the

constructs and abstractions proposed in Chapter 4.

Key insights

In device simulation, the solution of the carrier transport equations (depend-

ing on the model assumed, e.g., drift-diffusion, hydrodynamic, etc.) is obtained

from the PSD/synthesized structure subject to voltage/current boundary condi-

tions imposed by the external circuit. After time and space discretization, the com-

plete system of transport equations, which are typically nonlinear partial differen-

tial equations, is transformed into a set of nonlinear algebraic equations (F). Let K

denote the total number of mesh nodes in the structure, with R variables to solve

per node. Let us consider the drift-diffusion model for the sake of illustration. For

this model, the variables that need to be solved are electrostatic potential ψ, electron concentration n, and hole concentration p. Thus, R = 3. The solution vector x can be written as x = (ψ_1, n_1, p_1, ..., ψ_K, n_K, p_K)^T. The system of nonlinear equations can be cast in matrix form as:

F(x) = Ax − b = 0, A → 3K × 3K, b → 3K × 1 (7.1)

A direct solver (DS) [174], [175] attempts to solve this equation by directly inverting A to give x = A^{−1}b, which can be very cumbersome for large K. Alternatively, some form of decomposition, e.g., LU decomposition, can be used, where A is expressed as A = LU. Here, L and U are lower and upper triangular matrices, respectively. LY = b is solved first by forward substitution, followed by UX = Y, which is solved by back-substitution.
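The LU route described above can be sketched on a small dense system. This is a minimal illustration, assuming a Doolittle factorization without pivoting and a hand-picked 3×3 matrix; production device simulators use sparse, pivoted factorizations, and the function names here are hypothetical.

```python
def lu_decompose(a):
    """Doolittle factorization A = L U with unit-diagonal L (no pivoting)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # fill row i of U
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):  # fill column i of L
            partial = sum(lower[j][k] * upper[k][i] for k in range(i))
            lower[j][i] = (a[j][i] - partial) / upper[i][i]
    return lower, upper


def forward_substitute(lower, b):
    """Solve L y = b from the first row downwards."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(lower[i][k] * y[k] for k in range(i))) / lower[i][i])
    return y


def back_substitute(upper, y):
    """Solve U x = y from the last row upwards."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


def solve(a, b):
    lower, upper = lu_decompose(a)
    return back_substitute(upper, forward_substitute(lower, b))


# Illustrative system whose exact solution is x = (1, 2, 3).
A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [7.0, 19.0, 49.0]
x = solve(A, b)
```

Once L and U are available, additional right-hand sides can be solved with only the two substitution passes, which is the main practical attraction of factorization over explicit inversion.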

An iterative linear solver (ILS) [176] attempts to solve Eq. (7.1) without any explicit matrix inversion, by starting with an initial guess x_0 and expanding F as:

F(x_0 + δx_0) ≈ F(x_0) + F_x(x_0) δx_0 ≈ 0, where F_x = Jacobian of F w.r.t. x (7.2)

and then solving for the update vector δx_0 = x_1 − x_0, and so on, till certain convergence criteria are satisfied. This is the Newton-Raphson method, several variants of which exist in the literature [177–179]. Fig. 7.1 shows the memory scaling behavior of a DS and ILS with increasing number of mesh nodes for sample device simulations, using Sentaurus Device [83]. In terms of computation time, DS scales as O(K^2) whereas ILS scales as O(PK), where P is the number of iterations. This analysis shows that ILS is a more suitable solver for 3D device simulations.

An important criterion for ILS to succeed is that each successive iteration 'i' should reduce the magnitude of the update vector, |δx_i|. Therefore, convergence is dependent on how close the initial guess x_0 is to the actual solution and how accurately the Jacobian F_x is computed. When both are favorable, the convergence rate is quadratic.

Figure 7.1: Scaling behavior: Direct versus iterative linear solvers

The "closeness" criterion is key to solving certain hard, pathological instances where an initial guess x_0, which would normally have been considered to be close to the solution, causes successive iterations to oscillate due to the nonlinear behavior of the system and prevents convergence within the maximum iteration limit.
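The Newton-Raphson update of Eq. (7.2) and the shrinking-update criterion can be illustrated on a toy problem. The sketch below assumes a hand-picked two-variable nonlinear system with an analytical Jacobian (not a discretized transport equation); the function names and the starting guess are hypothetical.

```python
import math


def newton_2x2(f, jac, guess, tol=1e-12, max_iter=20):
    """Newton-Raphson on a 2-variable system; returns (x, y, update norms)."""
    x, y = guess
    update_norms = []
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        (a, b), (c, d) = jac(x, y)
        det = a * d - b * c
        # Solve J * (dx, dy) = -(f1, f2) by Cramer's rule.
        dx = (-f1 * d + b * f2) / det
        dy = (-a * f2 + c * f1) / det
        x, y = x + dx, y + dy
        update_norms.append(math.hypot(dx, dy))
        if update_norms[-1] < tol:
            return x, y, update_norms
    raise RuntimeError("no convergence within the iteration limit")


def toy_system(x, y):
    # F1 = x^2 + y^2 - 5, F2 = x*y - 2; one root is (x, y) = (2, 1).
    return x * x + y * y - 5.0, x * y - 2.0


def toy_jacobian(x, y):
    return (2.0 * x, 2.0 * y), (y, x)


x_sol, y_sol, norms = newton_2x2(toy_system, toy_jacobian, guess=(2.5, 0.5))
```

With a starting guess reasonably close to the root, the recorded update norms shrink at every iteration, which is the behavior the convergence criterion above asks for; a poor guess or an inaccurate Jacobian would break this pattern.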

While iterations on Eq. (7.2) may help reach a solution at a certain bias, it is

necessary to have a strategy to obtain a solution at any bias, as this is essential

for DC, AC, and transient simulations. Here, extrapolation from an earlier biasing

condition is generally used in most device simulators. In order to extrapolate, the

dependence on boundary conditions V is explicitly cast as:

F(x,V) = 0 (7.3)

Assuming that a solution x is available at a certain bias V, in order to ramp up to a new bias condition, a bias increment δV needs to be chosen. Advancing to the new bias V+δV, the new solution x+δx can be computed from the first-order expansion, in which the zeroth-order term F(x, V) vanishes because x already satisfies Eq. (7.3) at V:

F(x + δx, V + δV) ≈ F_x(x, V) δx + F_V(x, V) δV ≈ 0 (7.4)

whereby

F_x(x, V) δx = −F_V(x, V) δV (7.5)

Whereas F_x(x, V) is the Jacobian as before, F_V(x, V) needs to be evaluated from the

equations that have a dependence on V. By iterating on Eq. (7.5), the solution at

V+δV is determined. Thereafter, a new δV is chosen until the final biasing con-

dition is reached. Since each intermediate ramping step can potentially consume

a large number of iterations for a 3D device structure, performing simulation not

close to zero-bias conditions is very cumbersome. Also, extrapolation works well

only when δV is small. This is due to the fact that at arbitrary nonequilibrium con-

ditions V, only if the system of equations F behaves linearly around V does the

projected solution converge quadratically. Else, convergence is not obtained and a

smaller bias increment needs to be chosen. This imposes a restriction on δV, which

increases the total number of ramping steps needed. Hence, the total runtime increases. With the introduction of more complex physical models, the system of equations F gets harder to solve. E.g., for the hydrodynamic model, R = 6, and since the size of A scales as O(K^2 R^2), the runtime increases even more.
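The bias-ramping procedure above (predict with Eq. (7.5), correct with Newton iterations, shrink δV when convergence fails) can be sketched on a one-unknown toy circuit. The example assumes a diode in series with a resistor, with hypothetical parameter values `I_S`, `V_T`, and `R`; it only mimics the control flow and is not how any particular device simulator implements ramping.

```python
import math

# Hypothetical diode/resistor parameters, for illustration only.
I_S, V_T, R = 1e-14, 0.02585, 1e3


def residual(x, v):
    """F(x, V): diode current minus resistor current at the diode node."""
    return I_S * (math.exp(x / V_T) - 1.0) - (v - x) / R


def d_residual_dx(x):
    """F_x: derivative of the residual w.r.t. the unknown diode voltage x."""
    return (I_S / V_T) * math.exp(x / V_T) + 1.0 / R


D_RESIDUAL_DV = -1.0 / R  # F_V: derivative w.r.t. the applied bias V


def newton_at_bias(x_guess, v, tol=1e-12, max_iter=10):
    """Newton iterations at fixed bias; returns None if the limit is hit."""
    x = x_guess
    for _ in range(max_iter):
        dx = -residual(x, v) / d_residual_dx(x)
        x += dx
        if abs(dx) < tol:
            return x
    return None


def ramp_bias(v_target, dv_initial=0.25, dv_min=1e-4):
    """Ramp V from 0 to v_target, halving the increment on failure."""
    v, x, dv = 0.0, 0.0, dv_initial  # x = 0 solves F at V = 0 exactly
    accepted_steps = 0
    while v_target - v > 1e-12:
        step = min(dv, v_target - v)
        # Predictor from Eq. (7.5): F_x * dx = -F_V * dV.
        x_pred = x - (D_RESIDUAL_DV / d_residual_dx(x)) * step
        x_new = newton_at_bias(x_pred, v + step)
        if x_new is None:
            dv = step / 2.0  # retry with a smaller bias increment
            if dv < dv_min:
                raise RuntimeError("bias increment became too small")
            continue
        v, x = v + step, x_new
        accepted_steps += 1
    return x, accepted_steps


x_final, n_steps = ramp_bias(1.0)
```

The tight `max_iter` is deliberate: it makes the trade-off visible, since a large δV gives a poorer predictor and more Newton iterations, while a small δV converges quickly but needs more ramping steps.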

Proposed approach

We propose to replace the traditional extrapolate strategy with a hybrid cache-extrapolate-update approach to overcome the difficulties encountered in finding a solution for F, targeting both larger bias increments as well as complex physical models that have R ≥ 3. The key idea is illustrated in Fig. 7.2, which shows a synthesized 3D FinFET SRAM structure on the right. During the course of a simulation, the devices in the structure may traverse various states (e.g., in terms of ψ, n, p) repeatedly. Thus, the following questions arise. Why should the solver take the trouble of extrapolation for each global boundary condition V+δV = Σ? Is it possible to express Σ as a set of internal boundary/bias conditions ζ_1, ζ_2, ..., ζ_{N_Z}, where N_Z is the number of zones that the structure can be partitioned into? Thereafter, for the structure in zone j, 1 ≤ j ≤ N_Z, if a corresponding pre-solved device (should match in geometry, but need not be identical in doping distributions, etc., i.e., PW -GA is sufficient) exists, would it be possible to "restore" the cached device state at the boundary condition ζ_j? This would enable the solver to start with a solution that is very close to the actual solution for Σ, with extrapolation being performed only in the "uncached" zones corresponding to regions between FETs, etc. Also, extrapolation in uncached zones could be a heuristic transform (Ω) consisting of a combination of extrapolations from cached device states, which are in the vicinity, and extrapolation from the earlier bias condition. This strategy will reduce the number of iterations per bias ramp, irrespective of the value of R, permitting larger values for δV. Owing to the zoning requirement, this approach also merges well with the zone-based structure synthesis approach proposed earlier.

Figure 7.2: Key question: Can the solution of a large structure be approximated using individual pre-solved device states?
[Pre-solved device states at bias condition 1, bias condition 2, bias condition 3, ... are mapped onto the extended structure.]
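The cache-restore idea above can be sketched as a lookup keyed by device geometry and quantized zone bias. This is a minimal sketch under stated assumptions: the class and function names, the bias quantization step, and the use of the previous solution as a stand-in for extrapolation in uncached zones are all hypothetical, and device states are represented by opaque placeholders rather than real (ψ, n, p) fields.

```python
class DeviceStateCache:
    """Stores solved device states keyed by geometry and quantized bias."""

    def __init__(self, bias_resolution=0.05):
        self.bias_resolution = bias_resolution  # volts per bin (assumed)
        self._states = {}

    def _key(self, geometry_id, bias):
        # Nearby bias points fall into the same bin and share one entry.
        bins = tuple(round(v / self.bias_resolution) for v in bias)
        return geometry_id, bins

    def store(self, geometry_id, bias, state):
        self._states[self._key(geometry_id, bias)] = state

    def restore(self, geometry_id, bias):
        return self._states.get(self._key(geometry_id, bias))


def build_initial_guess(zones, cache, previous_solution):
    """Assemble a per-zone initial guess for the next bias point.

    zones maps zone name -> (geometry id or None, local bias tuple).
    Cached zones are restored; the rest fall back to the previous
    solution, which stands in for extrapolation in uncached zones.
    """
    guess, restored = {}, []
    for name, (geometry_id, local_bias) in zones.items():
        cached = None
        if geometry_id is not None:
            cached = cache.restore(geometry_id, local_bias)
        if cached is not None:
            guess[name] = cached
            restored.append(name)
        else:
            guess[name] = previous_solution[name]
    return guess, restored


cache = DeviceStateCache()
cache.store("nfet_1fin", (1.0, 0.0, 0.0), "state solved at (1.0, 0.0, 0.0)")

zones = {
    "pull_down_left": ("nfet_1fin", (1.01, 0.0, 0.0)),  # close to a cached bias
    "inter_device_gap": (None, ()),                     # no pre-solved device
}
previous = {"pull_down_left": "old state", "inter_device_gap": "old gap state"}
guess, restored = build_initial_guess(zones, cache, previous)
```

A fixed quantization step is the simplest possible matching rule; an automated scheme would presumably need a tolerance tied to how strongly each device state varies with bias.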

The above approach can be easily extended to mixed-mode simulation, where ζ_j denotes the boundary condition at the contacts of individual devices. We performed preliminary experiments with a trivial cache-restore strategy on seven different FinFET NAND gate topologies described in [60]. Fig. 7.3 shows the results. A runtime improvement of 2.6× to 5× is seen, which is promising. Since the cache and restore breakpoints were manually inserted in these cases, this procedure is not suitable for larger mixed-mode/3D simulations, which would need a systematic and automated approach.

Figure 7.3: Transient mixed-mode 2D device simulation runtimes for FinFET NAND gates, with and without cache-restore of device states. [Bar chart: mixed-mode simulation runtime (s), axis from 0 to 1500, without and with caching, for the SGNAND, IGNAND, MTNAND, IG2NAND, XT2NAND, XTNAND, and LPNAND gates.]

Figure 7.4: Generation of device-state database. [Flowchart, looped over each device: (1) 2D/3D process simulation of a single device (Tsuprem4, Sprocess, etc.), drawing on a database of process conditions; (2) if 2D, extrude and transform into 3D; (3) re-mesh the single-device structure and add/delete electrical/thermal contacts (Sentaurus Mesh, etc.); (4) detailed single-device 3D simulation with different physical/transport models, e.g., DD, hydrodynamic, etc., and DC, AC, and transient pulse characterization (Sentaurus Device, etc.), using material system parameters/properties and physical model coefficients; (5) sub-sample the 3D device and cache device snapshots at a small subset of device mesh points and at different bias conditions (V_i, I_k) of the structure; (6) intelligent device-state caching (DSC) algorithm that topologically encodes the device state, e.g., n(r), T(r); (7) device-state database (DSD).]


Figure 7.5: State retrieval and extrapolation using the BGB algorithm. [Block diagram: the extended 3D structure enters through the structure interface and is virtually segmented using the DLD, which associates devices with global mesh points; mesh points for BGB are intelligently selected from the 3D structure. The DSD is queried with (Dev_j, V_i), returns the cached state (n(r), T(r), etc.) and I_k, and spatially extrapolates to all mesh points at the current bias condition. The best-guess-at-bias (BGB) algorithm stitches a solution for the entire 3D structure from the individual device states and transforms it to DLD coordinates at the current bias, extrapolates at regions/points not present in the DSD, and generates a complete solution adaptable to the required structure on the fly, which is returned through the BGB interface.]

The building blocks of the proposed cache-extrapolate-update approach are shown in Figs. 7.4, 7.5, and 7.6. The first step will be to generate the device-state database (DSD). This will need to be performed for all the devices that are expected to be used repeatedly in a large number of simulations, so that the effort involved is amortized efficiently. These devices will be simulated under several different DC and transient boundary conditions to generate considerable device-state data. Thereafter, each device will be sub-sampled at select mesh locations and select bias conditions, using an intelligent device-state caching algorithm that encodes the device state without using up a prohibitively large amount of disk space. Fig. 7.5 shows a possible state retrieval/reconstruction approach based on a best-guess-at-bias (BGB) algorithm that will be developed. Fig. 7.6 shows the point of entry into the solver loop of the device simulator. While the 3D structure is being solved at zero-bias conditions (Fig. 7.6) as a separate thread, structural information can be passed to the retrieval step, where the structure is segmented/partitioned with the aid of the DLD into zones (these can be identical to the zones used in synthesis). Using the DLD and DSD, it will become possible to determine the zones that can be mapped to pre-solved devices, whereby the mesh associated with device j develops an association with the global structure mesh (this mapping is referred to as Γ_j). At this juncture, the method will also intelligently select restore points in the 3D structure.
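The caching and retrieval steps described above can be sketched as follows. This is only an illustration under simplifying assumptions: a 1D mesh, a fixed sub-sampling stride in place of the intelligent caching algorithm, nearest-bias lookup, and linear interpolation for the spatial reconstruction. The class `DeviceStateDatabase` and its methods are hypothetical names, not part of any TCAD tool.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Piecewise-linear interpolation, held flat outside the sampled range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


class DeviceStateDatabase:
    """Toy DSD: stores sub-sampled device states keyed by device and bias."""

    def __init__(self):
        self._states = {}  # device -> list of (bias, sample coords, sample values)

    def cache(self, device, bias, coords, values, stride=2):
        """Keep every `stride`-th mesh point (plus the last one) to save space."""
        idx = list(range(0, len(coords), stride))
        if idx[-1] != len(coords) - 1:
            idx.append(len(coords) - 1)
        entry = (tuple(bias), [coords[i] for i in idx], [values[i] for i in idx])
        self._states.setdefault(device, []).append(entry)

    def restore(self, device, bias, mesh):
        """Fetch the state cached at the closest bias and rebuild it on `mesh`."""
        def distance(entry):
            return sum((a - b) ** 2 for a, b in zip(entry[0], bias))

        _, xs, ys = min(self._states[device], key=distance)
        return [interp(x, xs, ys) for x in mesh]


# Hypothetical single-device data: a potential profile that scales with Vd.
mesh = [0.0, 0.25, 0.5, 0.75, 1.0]
dsd = DeviceStateDatabase()
for vd in (0.0, 0.5, 1.0):
    dsd.cache("nfet", (1.0, vd), mesh, [vd * x for x in mesh])

# Query at a bias (Vg, Vd) that was never cached.
restored = dsd.restore("nfet", (1.0, 0.6), mesh)
print(restored)
```

The query at Vd = 0.6 falls back to the state cached at Vd = 0.5. The solver would then correct the remaining difference during its own iterations.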

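Figure 7.6: Updating state in the solver loop. [Flowchart: load the extended 3D structure into the device simulator; solve at zero bias (structural information is passed out through the structure interface); increment/decrement the boundary condition (Σ); additional steps: pass through the BGB interface and update the state of the extended 3D structure; re-solve till the convergence criteria are satisfied; if the current bias equals the final bias, STOP, otherwise return to the increment/decrement step.]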

Depending on the nature of the “solve” task, the solver will increment/decrement the boundary condition Σ. The earlier device state and Σ will be passed to the BGB interface. The BGB algorithm will convert Σ into multiple zone boundary conditions ζ_j = Dev_j V_j, which will be passed to the DSD for state lookup. This operation can be multi-threaded, with lookup occurring in parallel for each zone. The DSD will retrieve the closest state corresponding to ζ_j and spatially extrapolate it to all mesh points of the device structure. Next, the BGB algorithm will collect all the data from the DSD and map/extrapolate them to the global structure mesh using each Γ_j. Thereafter, for zones that are not covered by the DSD, such as regions between FETs, etc., another extrapolation will be performed using the heuristic transform Ω, in order to stitch together a complete solution for the entire structure. This will serve as an initial guess for the solver, as shown in Fig. 7.6. Thereafter, the solver will assume control and iterate using Eq. (7.2) to reach convergence.
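The stitching step can be made concrete with a small sketch. It assumes a 1D global mesh addressed by index, represents each Γ_j as a list of global indices, and picks one possible form for Ω: a weighted blend of linear interpolation between the nearest cached points and the previous-bias solution. The function name `stitch_initial_guess` and the `weight` parameter are hypothetical. The text does not specify Ω beyond calling it a heuristic.

```python
def stitch_initial_guess(n_global, zone_states, gammas, previous, weight=0.5):
    """Build an initial guess on the global mesh.

    zone_states: zone name -> restored state values on that device's mesh
    gammas:      zone name -> global mesh index of each device mesh point
    previous:    solution on the global mesh at the earlier bias condition
    weight:      share given to cached neighbours in the uncached zones
    """
    guess = [None] * n_global

    # Map each restored device state onto the global mesh through its gamma.
    for zone, state in zone_states.items():
        for local_idx, global_idx in enumerate(gammas[zone]):
            guess[global_idx] = state[local_idx]

    cached = [i for i, value in enumerate(guess) if value is not None]

    # Heuristic transform for uncached points (an assumed form of Omega).
    for i in range(n_global):
        if guess[i] is not None:
            continue
        left = max((c for c in cached if c < i), default=None)
        right = min((c for c in cached if c > i), default=None)
        if left is None and right is None:
            neighbour = previous[i]
        elif left is None:
            neighbour = guess[right]
        elif right is None:
            neighbour = guess[left]
        else:
            t = (i - left) / (right - left)
            neighbour = (1 - t) * guess[left] + t * guess[right]
        guess[i] = weight * neighbour + (1 - weight) * previous[i]
    return guess


# Two cached zones at the ends of a 7-point mesh; points 2 to 4 are uncached.
zone_states = {"dev1": [0.0, 0.0], "dev2": [1.0, 1.0]}
gammas = {"dev1": [0, 1], "dev2": [5, 6]}
previous = [0.0] * 7

initial_guess = stitch_initial_guess(7, zone_states, gammas, previous)
print(initial_guess)
```

Cached points are copied unchanged, while the uncached points in between receive a value pulled halfway toward the previous solution. The resulting list would then be handed to the solver as its starting point.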

The additional steps introduced by the proposed approach are optional. If the solver is able to make large δV increments with few iterations to convergence, the above steps can be skipped and Eq. (7.5) can be used. Otherwise, whenever δV decreases or a large number of iterations is needed for convergence at a bias point, the BGB update can be invoked to provide a good initial guess. The proposed approach is expected to be most useful when the complexity of the physical models increases, e.g., when solving with quantum hydrodynamic models, where F can be quite complex and convergence can be elusive.
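The decision rule in the preceding paragraph can be written down in a few lines. The sketch below is an assumption-laden illustration: the function `next_initial_guess`, the `history` format, and the `iteration_limit` threshold are all hypothetical, and the two callables stand in for the standard extrapolation and the BGB update, respectively.

```python
def next_initial_guess(history, extrapolate, bgb_update, iteration_limit=10):
    """Choose how to produce the initial guess for the next bias point.

    history: list of (delta_v, iterations) for solved bias points, newest last.
    extrapolate: callable returning the standard extrapolated guess.
    bgb_update:  callable returning the guess stitched from cached states.
    """
    if history:
        last_dv, last_iters = history[-1]
        # The solver had to shrink its bias step relative to the one before.
        step_shrunk = len(history) >= 2 and last_dv < history[-2][0]
        # The last bias point needed many iterations to converge.
        slow = last_iters > iteration_limit
        if step_shrunk or slow:
            return bgb_update()
    return extrapolate()


# Three hypothetical solver histories.
easy = [(0.10, 4), (0.10, 5)]        # steady steps, quick convergence
shrinking = [(0.10, 5), (0.05, 6)]   # bias step was reduced
slow = [(0.10, 5), (0.10, 25)]       # convergence became expensive

labels = [
    next_initial_guess(h, lambda: "extrapolate", lambda: "bgb")
    for h in (easy, shrinking, slow)
]
print(labels)
```

Keeping the check this cheap means the additional steps cost nothing on bias points where plain extrapolation already works.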


Appendix A

FinE3D framework

A.1 FinE3D Sentaurus TCAD decks

The FinE3D structure synthesis framework was developed in Python and integrated into the Sentaurus Workbench. The setup consists of two kinds of decks:

• PrepareFETs: These constitute the PA-GA zone databases obtained from detailed process simulations followed by pre-synthesis transformations to make them amenable to structure synthesis, as shown in Figs. A.1 and A.2.

• GDS2Device: These decks perform structure synthesis for input layouts that are annotated, and are derived for a fixed layout-layer map file from Cadence Virtuoso, as shown in Fig. A.3. Thereafter, the layout is analyzed and the FEOL synthesizer (Fig. A.4) produces the FEOL structure using FEOL process assumptions and the device database. This is followed by BEOL and integrated structure synthesis that have their respective input files, as shown in Fig. A.5. After structure synthesis, selective mesh refinement is performed, followed by transport analysis based capacitance extraction in the device simulator, as shown in Fig. A.6. Thereafter, post-processing toolchains are appended to produce the requested outputs.


Figure A.1: A sample process simulation deck. [Deck stage: process simulation.]

Figure A.2: Pre-synthesis transformations. [Deck stage: pre-synthesis transformations.]

Figure A.3: Layout annotation. [Deck stages: layout layer map, annotated layouts, layout annotator/converter.]

Figure A.4: FEOL structure synthesis. [Deck stages: FEOL synthesizer, with PA-GA zone database, FEOL process assumptions, and FEOL flags as inputs.]

Figure A.5: BEOL and integrated structure synthesis. [Deck stages: BEOL synthesizer, with BEOL process assumptions and BEOL flags as inputs; integrated structure synthesizer, with integration flags as input.]

Figure A.6: Mesh refinement, capacitance extraction, and post-processing. [Deck stages: mesh refinements, capacitance extraction, post-processing.]


Bibliography

[1] J. P. Colinge, FinFETs and Other Multi-gate Transistors. Springer, New York, 2008.

[2] D. Vasileska, S. Goodnick, and G. Klimeck, Computational Electronics: Semiclassical and Quantum Device Modeling and Simulation. CRC Press, 2010.

[3] B. Agrawal, V. K. De, J. M. Pimbley, and J. D. Meindl, “Short channel models and scaling limits of SOI and bulk MOSFETs,” IEEE J. Solid-State Circuits, vol. 29, no. 2, pp. 122–125, Feb. 1994.

[4] T. Skotnicki, G. Merckel, and T. Pedron, “The voltage doping transformation: A new approach to the modeling of MOSFET short channel effects,” IEEE Electron Device Lett., vol. 9, no. 3, pp. 109–112, Mar. 1988.

[5] T. Sakurai, A. Matsuzawa, and T. Douseki, Fully Depleted SOI CMOS Circuits and Technology for Ultralow Power Applications. Springer, New York, 2006.

[6] J. P. Colinge, Silicon on Insulator Technology: Materials to VLSI. Springer, New York, 2004.

[7] K. Bernstein and N. Rohrer, SOI Circuit Design Concepts. Springer, New York, 2000.

[8] “2007 International Technology Roadmap for Semiconductors,” http://www.itrs.net/Links/2007ITRS/Home2007.htm.

[9] D. Hisamoto, T. Kaga, Y. Kawamoto, and E. Takeda, “A fully depleted lean channel transistor (DELTA),” in Proc. Int. Electron Devices Mtg., Dec. 1989, pp. 833–836.

[10] M. Jurczak et al., “Silicon on nothing (SON): An innovative process for advanced CMOS,” IEEE Trans. Electron Devices, vol. 47, no. 11, pp. 2179–2187, Nov. 2000.

[11] L. Mathew et al., “CMOS vertical multiple independent gate field effect transistor (MIGFET),” in Proc. Int. SOI Conf., Oct. 2004, pp. 187–189.

[12] K. Okano et al., “Process integration technology and device characteristics of CMOS FinFET on bulk silicon substrate with sub-10 nm fin width and 20 nm gate length,” in Proc. Int. Electron Devices Mtg., Dec. 2005, pp. 721–724.


[13] B. S. Doyle et al., “High performance fully depleted tri-gate CMOS transistors,” IEEE Electron Device Lett., vol. 24, no. 4, pp. 263–265, Apr. 2003.

[14] J. T. Park, J. P. Colinge, and C. H. Diaz, “Pi-gate SOI MOSFET,” IEEE Electron Device Lett., vol. 22, no. 8, pp. 405–406, 2001.

[15] F. L. Yang et al., “25 nm CMOS omega FETs,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 255–258.

[16] N. Singh et al., “High performance fully depleted silicon nanowire (diameter < 5nm) gate all around CMOS devices,” IEEE Electron Device Lett., vol. 27, no. 5, pp. 383–386, May 2006.

[17] S. Miyano, M. Hirose, and F. Masuoka, “Numerical analysis of a cylindrical thin pillar transistor (CYNTHIA),” IEEE Trans. Electron Devices, vol. 39, no. 8, pp. 1876–1881, Aug. 1992.

[18] S. Y. Lee et al., “Three dimensional MBCFET as an ultimate transistor,” IEEE Electron Device Lett., vol. 25, no. 4, pp. 217–219, Apr. 2004.

[19] J. P. Colinge, M. H. Gao, A. Romano, H. Maes, and C. Claeys, “Silicon on insulator gate all around device,” in Proc. Int. Electron Devices Mtg., Dec. 1990, pp. 595–598.

[20] D. Park, “3 dimensional GAA transistors: Twin silicon nanowire MOSFET and multi bridge channel MOSFET,” in Proc. Int. SOI Conf., Oct. 2006, pp. 131–134.

[21] T. Ernst et al., “Novel 3D integration process for highly scalable nano beam stacked channels GAA (NBG) CMOSFETs with HfO2/TiN gate stack,” in Proc. Int. Electron Devices Mtg., Dec. 2006, pp. 4–7.

[22] E. J. Nowak, I. Aller, T. Ludwig, K. Kim, R. V. Joshi, C.-T. Chuang, K. Bernstein, and R. Puri, “Turning silicon on its edge,” IEEE Circuits and Devices Magazine, vol. 20, no. 1, pp. 20–31, Jan.-Feb. 2004.

[23] X. Huang et al., “Sub-50 nm FinFET: PMOS,” in Proc. Int. Electron Devices Mtg., Dec. 1999, pp. 67–70.

[24] B. Yu et al., “FinFET scaling to 10nm gate length,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 251–254.

[25] J. Yang, P. M. Zeitzoff, and H. H. Tseng, “Highly manufacturable double-gate FinFET with gate source/drain underlap,” IEEE Trans. Electron Devices, vol. 54, no. 6, pp. 1464–1470, June 2007.

[26] T. Ludwig et al., “FinFET technology for future microprocessors,” in Proc. Int. SOI Conf., 2003, pp. 33–34.


[27] L. Mathew et al., “Multi-gated device architectures advances, advantages and challenges,” in Proc. Int. Conf. Integrated Circuit Design and Technology and Tutorial, 2004, pp. 97–98.

[28] Y. K. Choi et al., “FinFET process refinements for improved mobility and gate workfunction engineering,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 259–262.

[29] Z. B. Zhang et al., “An integratable dual metal gate/high-k CMOS solution for FDSOI and MuGFET technologies,” in Proc. Int. SOI Conf., 2005, pp. 157–158.

[30] M. Ieong et al., “High performance double-gate device technology challenges and opportunities,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2002, pp. 492–495.

[31] B. Majkusiak, T. Janik, and J. Walczak, “Semiconductor thickness effects in the double gate SOI MOSFET,” IEEE Trans. Electron Devices, vol. 45, no. 5, pp. 1127–1134, May 1998.

[32] C. W. Lee et al., “Device design guidelines for nanoscale MuGFETs,” Solid-State Electronics, vol. 51, no. 3, pp. 505–510, 2007.

[33] Y. K. Choi, T. J. King, and C. Hu, “Nanoscale CMOS spacer FinFET for the terabit era,” IEEE Electron Device Lett., vol. 23, no. 1, pp. 25–27, Jan. 2002.

[34] J. Kedzierski et al., “Extension and source/drain design for high performance FinFET devices,” IEEE Trans. Electron Devices, vol. 50, no. 4, pp. 952–958, Apr. 2003.

[35] H. Shang et al., “Investigation of FinFET devices for 32nm technologies and beyond,” in Proc. Int. Symp. VLSI Technology, June 2006, pp. 54–55.

[36] S. Xiong and J. Bokor, “Sensitivity of double gate and FinFET devices to process variations,” IEEE Trans. Electron Devices, vol. 50, no. 11, pp. 2255–2261, Nov. 2003.

[37] J. Kedzierski et al., “Metal gate FinFET and fully depleted SOI devices using total gate silicidation,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 247–250.

[38] P. Ranade et al., “Tunable workfunction molybdenum gate technology for FDSOI-CMOS,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 363–366.

[39] W. P. Maszara et al., “Transistors with dual workfunction metal gates by single full silicidation (FUSI) of polysilicon gates,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 367–370.


[40] D. Ha et al., “Molybdenum gate HfO2 CMOS FinFET technology,” in Proc. Int. Electron Devices Mtg., Dec. 2004, pp. 643–646.

[41] M. Dunga et al., “BSIM-MG: A versatile multi-gate FET model for mixed-signal design,” in Proc. Int. Symp. VLSI Technology, June 2007, pp. 60–61.

[42] J. Fossum et al., “A process-physics based compact model for nanoclassical CMOS device and circuit design,” Solid-State Electronics, vol. 48, pp. 919–926, June 2004.

[43] W. Zhang, J. Fossum, L. Mathew, and Y. Du, “Physical insights regarding design and performance of independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 52, no. 10, pp. 2198–2207, Oct. 2005.

[44] S. H. Kim, J. G. Fossum, and V. P. Trivedi, “Bulk inversion in FinFETs and implied insights on effective gate width,” IEEE Trans. Electron Devices, vol. 52, no. 9, pp. 1993–1997, Sept. 2005.

[45] V. Trivedi, J. G. Fossum, and M. M. Chowdhury, “Nanoscale FinFETs with gate source/drain underlap,” IEEE Trans. Electron Devices, vol. 52, no. 1, pp. 56–62, Jan. 2005.

[46] H.-K. Lim and J. G. Fossum, “Threshold voltage of thin film silicon on insulator MOSFETs,” IEEE Trans. Electron Devices, vol. 30, no. 10, pp. 1244–1251, Oct. 1983.

[47] M. M. Chowdhury and J. G. Fossum, “Physical insights on electron mobility in contemporary FinFETs,” vol. 27, pp. 482–485, June 2006.

[48] M. M. Chowdhury, V. P. Trivedi, J. G. Fossum, and L. Mathew, “Carrier mobility/transport in undoped-UTB DG FinFETs,” IEEE Trans. Electron Devices, vol. 54, pp. 1125–1131, May 2007.

[49] J. G. Fossum et al., “Pragmatic design of nanoscale multi-gate CMOS,” in Proc. Int. Electron Devices Mtg., Dec. 2004, pp. 613–616.

[50] M. Masahara et al., “Demonstration of asymmetric gate oxide thickness 4-terminal FinFETs,” in Proc. Int. SOI Conf., 2006, pp. 165–166.

[51] N. Collaert et al., “A functional 41-stage ring oscillator using scaled FinFET devices with 25-nm gate lengths and 10-nm fin widths applicable for the 45-nm CMOS node,” IEEE Electron Device Lett., vol. 25, pp. 568–570, Aug. 2004.

[52] A. Datta et al., “Modeling and circuit synthesis for independently controlled double gate FinFET devices,” IEEE Trans. Computer-Aided Design, vol. 26, no. 11, pp. 1957–1966, Nov. 2007.

[53] K. Endo et al., “A dynamical power-management demonstration using four-terminal separated-gate FinFETs,” IEEE Electron Device Lett., vol. 28, pp. 452–454, May 2007.


[54] D. Hackler, D. DeGregorio, and S. Parke, “Ultra-low-power, high-performance, dynamic-threshold digital circuits in the FlexFET independently-double-gated SOI CMOS technology,” in Proc. Int. SOI Conf., 2005, pp. 81–82.

[55] J. Gu, J. Keane, S. Sapatnekar, and C. H. Kim, “Statistical leakage estimation of double gate FinFET devices considering the width quantization property,” IEEE Trans. VLSI Systems, vol. 16, pp. 206–209, Feb. 2008.

[56] J. Ouyang and Y. Xie, “Power optimization for FinFET-based circuits using genetic algorithms,” in Proc. Int. SOI Conf., 2008, pp. 211–214.

[57] A. Kumar, B. A. Minch, and S. Tiwari, “Low voltage and performance tunable CMOS circuit design using independently driven double gate MOSFETs,” in Proc. Int. SOI Conf., 2004, pp. 119–121.

[58] M. H. Chiang, K. Kim, C. Tretz, and C. Chuang, “Novel high-density low-power logic circuit techniques using DG devices,” IEEE Trans. Electron Devices, vol. 52, pp. 2339–2342, Oct. 2005.

[59] A. Muttreja, N. Agarwal, and N. K. Jha, “CMOS logic design with independent-gate FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.

[60] A. N. Bhoj and N. K. Jha, “Design of ultra-low-leakage logic gates and flip-flops in high performance FinFET technology,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2011, pp. 1–8.

[61] S. A. Tawfik and V. Kursun, “Characterization of new static independent-gate-biased FinFET latches and flip-flops under process variations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 311–316.

[62] S. A. Tawfik and V. Kursun, “Low-power and compact sequential circuits with independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 60–70, Jan. 2008.

[63] G. Curatola and S. Nuttinck, “The role of volume inversion on the intrinsic RF performance of double-gate FinFETs,” IEEE Trans. Electron Devices, vol. 54, no. 1, pp. 141–150, Jan. 2007.

[64] S. Nuttinck, B. Parvais, G. Curatola, and A. Mercha, “Double-gate FinFETs as a CMOS technology downscaling option: An RF perspective,” IEEE Trans. Electron Devices, vol. 54, no. 2, pp. 279–283, Feb. 2007.

[65] V. Subramanian et al., “Planar bulk MOSFETs versus FinFETs: An analog/RF perspective,” IEEE Trans. Electron Devices, vol. 53, no. 12, pp. 3071–3079, Dec. 2006.

[66] P. Wambacq et al., “The potential of FinFETs for analog and RF circuit applications,” IEEE Trans. Circuits & Systems, vol. 54, pp. 2541–2551, Nov. 2007.


[67] A. Kranti and G. A. Armstrong, “Design and optimization of FinFETs for ultra-low-voltage analog applications,” IEEE Trans. Electron Devices, vol. 54, pp. 3308–3316, Dec. 2007.

[68] G. Pei and E. C. Kan, “Independently driven DG MOSFETs for mixed-signal circuits: Part I-quasi-static and nonquasi-static channel coupling,” IEEE Trans. Electron Devices, vol. 51, pp. 2086–2093, Dec. 2004.

[69] G. Pei and E. C. Kan, “Independently driven DG MOSFETs for mixed-signal circuits: Part II-applications on cross-coupled feedback and harmonics generation,” IEEE Trans. Electron Devices, vol. 51, pp. 2094–2101, Dec. 2004.

[70] M. Shrivastava et al., “A novel and robust approach for common mode feedback using IDDG FinFET,” IEEE Trans. Electron Devices, vol. 55, pp. 3274–3282, Nov. 2008.

[71] Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM design,” in Proc. Int. Symp. Low Power Electronics & Design, Aug. 2005, pp. 2–7.

[72] A. Carlson, Z. Guo, S. Balasubramanian, R. Zlatanovici, T. Liu, and B. Nikolic, “SRAM read/write margin enhancements using FinFETs,” IEEE Trans. VLSI Systems, vol. 18, no. 6, pp. 887–900, Sept. 2009.

[73] S. Inaba et al., “Direct evaluation of DC characteristic variability in FinFET SRAM cell for 32nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2007, pp. 487–490.

[74] A. Bansal, S. Mukhopadhyay, and K. Roy, “Device-optimization technique for robust and low-power FinFET SRAM design in nanoscale era,” IEEE Trans. Electron Devices, vol. 54, pp. 1409–1419, June 2007.

[75] Z. Liu, S. A. Tawfik, and V. Kursun, “Statistical data stability and leakage evaluation of FinFET SRAM cells with dynamic threshold voltage tuning under process parameter fluctuations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 305–310.

[76] B. K. Young, K. Yong-Bin, and F. Lombardi, “Low power 8T SRAM using 32nm independent gate FinFET technology,” in Proc. Int. SOI Conf., 2008, pp. 247–250.

[77] R. V. Joshi, K. Kim, R. Q. Williams, E. J. Nowak, and C.-T. Chuang, “A high-performance, low leakage, and stable SRAM row-based back-gate biasing scheme in FinFET technology,” in Proc. Int. Conf. VLSI Design, Jan. 2007, pp. 665–672.

[78] K. Itoh, M. Horiguchi, and H. Tanaka, Ultra-low Voltage Nanoscale Memories. Springer, New York, 2007.


[79] W. Zhang, J. G. Fossum, L. Mathew, and Y. Du, “Physical insights regarding design and performance of independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 52, pp. 2198–2206, Oct. 2005.

[80] “2011 International Technology Roadmap for Semiconductors, Modeling and Simulation,” http://www.itrs.net/Links/20011ITRS/2011Chapters/2011Modeling.pdf.

[81] Sentaurus TCAD manuals. http://www.synopsys.com.

[82] Fabless foundry model under stress: 20nm challenges. http://semimd.com/blog/2012/06/26/fabless-foundry-model-under-stress/.

[83] Sentaurus Device manuals. http://www.synopsys.com.

[84] UF-SOI user guide. http://www.soi.tec.ufl.edu/.

[85] Y. Li, C.-H. Hwang, and H.-W. Cheng, “Process-variation and random-dopants-induced threshold voltage fluctuations in nanoscale planar MOSFET and bulk FinFET devices,” Microelectronic Engineering, vol. 86, no. 3, pp. 277–282, Mar. 2009.

[86] A. Thean et al., “Performance and variability comparisons between multi-gate FETs and planar SOI transistors,” in Proc. Int. Electron Devices Mtg., Dec. 2006, pp. 1–4.

[87] A. N. Bhoj and N. K. Jha, “Design of logic gates and flip-flops in high-performance FinFET technology,” accepted for publication in IEEE Trans. VLSI Systems.

[88] H. Dadgour, K. Endo, V. De, and K. Banerjee, “Modeling and analysis of grain-orientation effects in emerging metal-gate devices and implications for SRAM reliability,” in Proc. Int. Electron Devices Mtg., 2008, pp. 705–708.

[89] H. Dadgour, K. Endo, V. De, and K. Banerjee, “Grain-orientation induced work function variation in nanoscale metal-gate transistors - part I: modeling, analysis and experimental validation,” IEEE Trans. Electron Devices, vol. 57, no. 10, pp. 2504–2513, Oct. 2010.

[90] H. Dadgour, K. Endo, V. De, and K. Banerjee, “Grain-orientation induced work function variation in nanoscale metal-gate transistors - part II: implications for process, device and circuit design,” IEEE Trans. Electron Devices, vol. 57, no. 10, pp. 2515–2525, Oct. 2010.

[91] S. Rasouli, K. Endo, J. Chen, N. Singh, and K. Banerjee, “Grain-orientation induced quantum confinement variation in FinFETs and multi-gate ultra-thin body CMOS devices and implications for digital design,” IEEE Trans. Electron Devices, vol. 58, no. 8, pp. 2282–2292, Aug. 2011.


[92] K. von Arnim et al., “A low power multi-gate FET CMOS technology with 13.9ps inverter delay,” in Proc. Int. Symp. VLSI Technology, 2007, pp. 106–107.

[93] C. Pacha et al., “Efficiency of low-power design techniques in multi-gate FET CMOS circuits,” in Proc. European Conf. Solid-State Circuits, 2007, pp. 111–114.

[94] A. Muttreja, N. Agarwal, and N. K. Jha, “CMOS logic design with independent gate FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.

[95] M. Rostami and K. Mohanram, “Dual-Vth independent-gate FinFETs for low power logic circuits,” IEEE Trans. Computer-Aided Design, vol. 30, no. 3, pp. 337–349, 2011.

[96] A. Datta, A. Goel, R. T. Cakici, H. Mahmoodi, D. Lakshmanan, and K. Roy, “Modeling and circuit synthesis for independently controlled double gate FinFET devices,” IEEE Trans. Computer-Aided Design, vol. 26, no. 11, pp. 1957–1966, Nov. 2007.

[97] M.-H. Chiang, K. Kim, C. Tretz, and C.-T. Chuang, “Novel high-density low-power logic circuit techniques using DG devices,” IEEE Electron Device Lett., vol. 52, no. 10, pp. 2339–2342, Oct. 2005.

[98] A. Kumar, B. A. Minch, and S. Tiwari, “Low voltage and performance tunable CMOS circuit design using independently driven double gate MOSFETs,” in Proc. Int. SOI Conf., Oct. 2004.

[99] S. A. Tawfik and V. Kursun, “Low-power and compact sequential circuits with independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 55, pp. 60–70, Jan. 2008.

[100] S. Tawfik and V. Kursun, “Characterization of new static independent-gate biased FinFET latches and flip-flops under process variations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 311–316.

[101] S. Xiong and J. Bokor, “Sensitivity of double-gate and FinFET devices to process variations,” IEEE Trans. Electron Devices, vol. 50, pp. 2255–2261, Nov. 2003.

[102] J. Kedzierski et al., “High-performance symmetric-gate and CMOS-compatible Vt asymmetric-gate FinFET devices,” in Proc. Int. Electron Devices Mtg., 2001, pp. 19.5.1–19.5.4.

[103] L. Mathew et al., “FinFET with isolated n+ and p+ gate regions strapped with metal and polysilicon,” in Proc. Int. SOI Conf., Nov. 2003, pp. 109–110.

[104] A. N. Bhoj and N. K. Jha, “Gated-diode FinFET DRAMs: Device and circuit design considerations,” ACM J. Emerging Technologies in Computing Systems, vol. 6, no. 4, pp. 12:1–12:32, 2010.


[105] D. Ha, H. Takeuchi, Y.-K. Choi, and T.-J. King, “Molybdenum gate technology for ultrathin-body MOSFETs and FinFETs,” IEEE Trans. Electron Devices, vol. 51, no. 12, pp. 1989–1996, Dec. 2004.

[106] A. Singhee and R. A. Rutenbar, “From finance to flip flops: A study of fast quasi-Monte Carlo methods from computational finance applied to statistical circuit analysis,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2007, pp. 685–692.

[107] I. M. Sobol, “The distribution of points in a cube and the approximation evaluation of integrals,” in USSR Comp. Math and Math. Phys., 1967, pp. 86–112.

[108] K. Anil, K. Henson, S. Biesemans, and N. Collaert, “Layout density analysis of FinFETs,” in Proc. European Solid-State Device Research Conf., Sept. 2003, pp. 139–142.

[109] M. Alioto, “Comparative evaluation of layout density in 3T, 4T, and MT FinFET standard cells,” IEEE Trans. VLSI Systems, vol. 19, no. 5, pp. 751–762, May 2011.

[110] R. L. Wadsack, “Fault modeling and logic simulation of CMOS and MOS integrated circuits,” Bell System Technical J., vol. 57, pp. 1449–1474, May 1978.

[111] T. Storey and W. Maly, “CMOS bridging fault detection,” in Proc. Int. Test Conf., Sept. 1990, pp. 842–851.

[112] E. J. McCluskey and C.-W. Tseng, “Stuck-fault tests vs. actual defects,” in Proc. Int. Test Conf., Oct. 2000, pp. 336–342.

[113] A. Pramanick and S. Reddy, “On the detection of delay faults,” in Proc. Int. Test Conf., Sept. 1988, pp. 845–856.

[114] J. Li, T. Chao-Wen, and E. J. McCluskey, “Testing for resistive opens and stuck opens,” in Proc. Int. Test Conf., Nov. 2001, pp. 1049–1058.

[115] F. J. Ferguson and J. P. Shen, “A CMOS fault extractor for inductive fault analysis,” IEEE Trans. Computer-Aided Design, vol. 7, no. 11, pp. 1181–1194, Nov. 1988.

[116] R. Rajsuman, “IDDQ testing for CMOS VLSI,” Proc. IEEE, vol. 88, no. 4, pp. 544–568, Apr. 2000.

[117] C.-L. Hsu, M.-H. Ho, and C.-F. Lin, “Novel built-in current-sensor-based testing scheme for CMOS integrated circuits,” IEEE Trans. Instrumentation and Measurement, vol. 58, no. 7, pp. 2196–2208, July 2009.

[118] J. Vazquez, V. Champac, C. Hawkins, and J. Segura, “Stuck-open fault leakage and testing in nanometer technologies,” in Proc. VLSI Test Symp., May 2009, pp. 315–320.


[119] M. O. Simsir, A. N. Bhoj, and N. K. Jha, “Fault modeling for FinFET circuits,” in Proc. Int. Symp. Nanoscale Architectures, June 2010, pp. 41–46.

[120] A. N. Bhoj, M. O. Simsir, and N. K. Jha, “Fault models for logic circuits in the multigate era,” IEEE Trans. Nanotechnology, vol. 11, no. 1, pp. 182–193, Jan. 2012.

[121] E. MacDonald and N. A. Touba, “Delay testing of SOI circuits: Challenges with the history effect,” in Proc. Int. Test Conf., Sept. 1999, pp. 269–275.

[122] A. Zaka et al., “Characterization and 3D TCAD simulation of NOR-type flash non-volatile memories with emphasis on corner effects,” Solid-State Electronics, vol. 63, no. 1, pp. 158–162, Sept. 2011.

[123] W. Wang, S. Chang, J. Huang, and S. Kuang, “3D TCAD simulations of strained Si CMOS devices with silicon-based alloy stressors and stressed CESL,” Solid-State Electronics, vol. 53, no. 8, pp. 880–887, Aug. 2009.

[124] G. Pei, J. Kedzierski, P. Oldiges, M. Ieong, and E. Kan, “FinFET design considerations based on 3D TCAD simulation and analytical modeling,” IEEE Trans. Electron Devices, vol. 49, no. 8, pp. 1411–1419, Aug. 2002.

[125] A. N. Bhoj, R. V. Joshi, and N. K. Jha, “Efficient methodologies for 3D-TCAD modeling of emerging devices and circuits,” accepted for publication in IEEE Trans. Computer-Aided Design.

[126] Z. Essa, P. Boulenc, C. Tavernier, F. Hirigoyen, A. Crocherie, J. Michelot, and D. Rideau, “3D TCAD simulation of advanced CMOS image sensors,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2011, pp. 187–190.

[127] M. Nawaz, W. Molzer, S. Decker, L. Giles, and T. Schulz, “On the device design assessment of multigate FETs (MuGFETs) using full process and device simulation in 3D TCAD,” IEEE Trans. Electron Devices, vol. 38, no. 12, pp. 1238–1251, Dec. 2007.

[128] L. Sponton, L. Bomholt, and W. Fichtner, “Analysis of process-geometry modulations through 3D TCAD,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2007, pp. 385–388.

[129] P. Fleischmann, R. Sabelka, A. Stach, R. Strasser, and S. Selberherr, “Grid generation for three-dimensional process and device simulation,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 1996, pp. 161–166.

[130] W. Wessner, J. Cervenka, C. Heitzinger, A. Hossinger, and S. Selberherr, “Anisotropic mesh refinement for the simulation of three-dimensional semiconductor manufacturing processes,” IEEE Trans. Computer-Aided Design, vol. 25, no. 10, pp. 2129–2139, Oct. 2006.


[131] K. K. H. Toh, A. R. Neureuther, and E. W. Scheckler, “Algorithms for simulation of three-dimensional etching,” IEEE Trans. Computer-Aided Design, vol. 13, no. 5, pp. 616–624, May 1994.

[132] Z. F. Zhou, Q. A. Huang, W. H. Li, and W. Lu, “A novel 3-D dynamic cellular automata model for photoresist-etching process simulation,” IEEE Trans. Computer-Aided Design, vol. 26, no. 1, pp. 100–114, Jan. 2007.

[133] A. N. Bhoj and R. V. Joshi, “Transport analysis based 3D TCAD capacitance extraction for sub-32nm SRAM structures,” IEEE Electron Device Lett., pp. 158–160, Feb. 2012.

[134] Sentaurus Structure Editor manuals. http://www.synopsys.com.

[135] Cadence SKILL scripting language. http://www.cadence.com.

[136] Sentaurus TCAD tool suite. http://www.synopsys.com.

[137] Sentaurus Process manuals. http://www.synopsys.com.

[138] Sentaurus TCAD Application examples and notes. http://solvnet.synopsys.com.

[139] K. Nabors, S. Kim, J. White, and S. Senturia, “Fast capacitance extraction of general three-dimensional structures,” IEEE Trans. Microwave Theory and Techniques, vol. 40, no. 7, pp. 1496–1506, July 1992.

[140] T. Lu, Z. Wang, and W. Yu, “Hierarchical block boundary-element method (HBBEM): A fast field solver for 3-D capacitance extraction,” IEEE Trans. Microwave Theory and Techniques, vol. 52, no. 1, pp. 10–19, Jan. 2004.

[141] W. Yu, Z. Wang, and J. Gu, “Fast capacitance extraction of actual 3-D VLSI interconnects using quasi-multiple medium accelerated BEM,” IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 1, pp. 109–119, Jan. 2003.

[142] A. Chin et al., “RF passive devices on Si with excellent performance close to ideal devices designed by electro-magnetic simulation,” in Proc. Int. Electron Devices Mtg., Dec. 2003, pp. 375–378.

[143] G. Wang, X. Qi, Z. Yu, and R. Dutton, “Device level modeling of metal-insulator-semiconductor interconnects,” IEEE Trans. Electron Devices, vol. 48, no. 8, pp. 1672–1682, Aug. 2001.

[144] S. Laux, “Techniques for small-signal analysis of semiconductor devices,” IEEE Trans. Electron Devices, vol. 32, no. 10, pp. 2028–2037, Oct. 1985.

[145] A. N. Bhoj, R. V. Joshi, S. Polonsky, R. Kanj, S. Saroop, Y. Tan, and N. K. Jha, “Hardware-assisted 3D TCAD for predictive capacitance extraction in 32nm SOI SRAMs,” in Proc. Int. Electron Devices Mtg., Dec. 2011, pp. 34.7.1–34.7.4.

[146] A. N. Bhoj, R. V. Joshi, and N. K. Jha, “3D-TCAD based parasitic capacitance extraction for emerging multigate devices and circuits,” accepted for publication in IEEE Trans. VLSI Systems.

[147] C. Wang et al., “FinFET resistance mitigation through design and process optimization,” in Proc. Int. Symp. VLSI Technology, Apr. 2009, pp. 127–128.

[148] C. H. Lin et al., “Non-planar device architecture for 15nm node: FinFET or Trigate?” in Proc. Int. SOI Conf., Oct. 2010, pp. 1–2.

[149] A. Kaneko et al., “Sidewall transfer process and selective gate sidewall spacer formation technology for sub-15nm FinFET with elevated source/drain extension,” in Proc. Int. Electron Devices Mtg., Dec. 2005, pp. 844–847.

[150] T. Kanemura et al., “Improvement of drive current in bulk-FinFET using full 3D process/device simulations,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2006, pp. 131–134.

[151] K. Maitra et al., “Aggressively scaled strained-silicon-on-insulator undoped-body high-κ/metal-gate nFinFETs for high-performance logic applications,” IEEE Electron Device Lett., pp. 713–715, June 2011.

[152] H. Kawasaki et al., “Challenges and solutions of FinFET integration in an SRAM cell and a logic circuit for 22nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2009, pp. 1–4.

[153] T. Yamashita et al., “Analysis of parasitic resistance in double gate FinFETs with different fin lengths,” in Proc. Int. SOI Conf., Oct. 2011, pp. 1–2.

[154] H. Zhao, Y. Yeo, S. Rustagi, and G. Samudra, “Analysis of the effects of fringing electric field on FinFET device performance and structural optimization using 3-D simulation,” IEEE Trans. Electron Devices, vol. 55, no. 5, pp. 1177–1184, May 2008.

[155] W. Wu and M. Chan, “Analysis of geometry-dependent parasitics in multifin double-gate FinFETs,” IEEE Trans. Electron Devices, vol. 54, no. 4, pp. 692–698, Apr. 2007.

[156] M. Guillorn et al., “FinFET performance advantage at 22nm: An AC perspective,” in Proc. Int. Symp. VLSI Technology, June 2008, pp. 12–13.

[157] H. Kawasaki et al., “Embedded bulk FinFET SRAM cell technology with planar FET peripheral circuit for hp32 nm node and beyond,” in Proc. Int. Symp. VLSI Technology, June 2006, pp. 70–71.

[158] ——, “Demonstration of highly scaled FinFET SRAM cells with high-k/metal gate and investigation of characteristic variability for the 32nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2008, pp. 1–4.

[159] V. Basker et al., “A 0.063 µm² FinFET SRAM cell demonstration with conventional lithography using a novel integration scheme with aggressively scaled fin and gate pitch,” in Proc. Int. Symp. VLSI Technology, June 2010, pp. 19–20.

[160] M. Guillorn et al., “A 0.021 µm² trigate SRAM cell with aggressively scaled gate and contact pitch,” in Proc. Int. Symp. VLSI Technology, June 2011, pp. 64–65.

[161] C. H. Lin et al., “Modeling of width-quantization-induced variations in logic FinFETs for 22nm and beyond,” in Proc. Int. Symp. VLSI Technology, June 2011, pp. 16–17.

[162] T. Yamashita et al., “Sub-25nm FinFET with advanced fin formation and short channel effect engineering,” in Proc. Int. Symp. VLSI Technology, June 2011, pp. 14–15.

[163] P. Oldiges et al., “Critical analysis of 14nm device options,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2011, pp. 5–8.

[164] J. B. Chang et al., “Scaling of SOI FinFETs down to fin width of 4nm for the 10nm technology node,” in Proc. Int. Symp. VLSI Technology, June 2011, pp. 12–13.

[165] C. Wann et al., “SRAM cell design for stability methodology,” in Proc. Int. Symp. VLSI Technology, Aug. 2005, pp. 21–22.

[166] Y. Taur and H. Ning, Fundamentals of Modern VLSI Devices. Cambridge, U.K.: Cambridge Univ. Press, 1998.

[167] A. N. Bhoj and N. K. Jha, “Parasitics-aware design of symmetric and asymmetric gate-workfunction FinFET SRAMs,” under review.

[168] S. Gangwal, S. Mukhopadhyay, and K. Roy, “Optimization for surface orientation for high-performance, low-power and robust FinFET SRAM,” in Proc. Custom Integrated Circuits Conf., Sept. 2006, pp. 433–436.

[169] S. A. Tawfik and V. Kursun, “Low power and stable FinFET SRAM with static independent gate bias for enhanced integration density,” in Proc. Int. Conf. Electronics, Circuits, and Systems, Dec. 2007, pp. 443–446.

[170] K. Endo, S.-I. Ouchi, Y. Ishikawa, Y. Liu, T. Matsukawa, K. Sakamoto, M. Masahara, J. Tsukada, K. Ishii, H. Yamauchi, and E. Suzuki, “Independent-gate four-terminal FinFET SRAM for drastic leakage current reduction,” in Proc. Int. Conf. Integrated Circuit Design and Technology and Tutorial, June 2008, pp. 63–66.

[171] S. A. Tawfik and V. Kursun, “Portfolio of FinFET memories: Innovative techniques for an emerging technology,” in Proc. Int. SOC Design Conf., Nov. 2008, pp. 101–104.

[172] A. Singhee and R. Rutenbar, Extreme Statistics in Nanoscale Memory Design. Springer, New York, 2010.

[173] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, “Read stability and write-ability analysis of SRAM cells for nanometer technologies,” IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2577–2588, Nov. 2006.

[174] O. Schenk, K. Gartner, W. Fichtner, and A. Stricker, “PARDISO: A high-performance serial and parallel sparse linear solver in semiconductor device simulation,” Future Generation Computer Systems, vol. 18, no. 1, pp. 69–78, Jan. 2001.

[175] M. Joshi, G. Karypis, A. Gupta, and F. Gustavson, “PSPASES: Scalable parallel direct solver for sparse systems,” in Proc. SIAM Conf. Parallel Processing Scientific Computing, 1999.

[176] Y. Saad, Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.

[177] R. E. Bank and D. J. Rose, “Global approximate Newton methods,” Numerische Mathematik, vol. 37, no. 2, pp. 279–295, 1981.

[178] P. Deuflhard, “Global inexact Newton methods for very large scale nonlinear problems,” IMPACT of Computing in Science and Engineering, vol. 3, no. 4, pp. 366–393, Dec. 1991.

[179] R. E. Bank and D. J. Rose, “Parameter selection for Newton-like methods applicable to nonlinear partial differential equations,” SIAM J. Numerical Analysis, vol. 17, no. 6, pp. 806–822, Dec. 1980.
