DEVICE-CIRCUIT CO-DESIGN APPROACHES
FOR MULTI-GATE FET TECHNOLOGIES
AJAY NUGGEHALLI BHOJ
A DISSERTATION
PRESENTED TO THE FACULTY
OF PRINCETON UNIVERSITY
IN CANDIDACY FOR THE DEGREE
OF DOCTOR OF PHILOSOPHY
RECOMMENDED FOR ACCEPTANCE
BY THE DEPARTMENT OF
ELECTRICAL ENGINEERING
ADVISER: PROFESSOR NIRAJ K. JHA
APRIL 2013
© Copyright by Ajay Nuggehalli Bhoj, 2013.
All rights reserved.
Abstract
Planar CMOS technology has reached its scaling limits at the 22nm node, where it
is increasingly difficult to design high-performance low-power devices with good
yield in the presence of global and local process variations. Multi-gate FET technology
is the best alternative that can extend scaling to the sub-10nm technology nodes
with minimum additional processing costs. However, owing to the non-planar nature
of multi-gate devices, several challenges in process technology, CAD/layout
design, and testing need to be addressed to enable design portability from planar
to multi-gate FET chips.
This thesis strives to bridge the device-circuit co-design gap that has severely
limited predictive modeling of circuits using emerging multi-gate/FinFET devices
during early stages of process technology development. First, the traditional
notion of leveraging independent-gate devices for power reduction is challenged
by contrasting logic gates built from symmetric gate-workfunction
shorted-/independent-gate FinFETs with logic gates built from asymmetric
gate-workfunction shorted-gate FinFETs in a high-performance process. The
superiority of asymmetric gate-workfunction devices is demonstrated by comparing
leakage-delay trends, and the downsides of logic gates that employ a mix
of shorted- and independent-gate devices are brought out from a testing/fault
modeling perspective.
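As a rough illustration of why independent-gate back-gate bias has traditionally been treated as a leakage knob, the caption of Fig. 3.7 reports a 120× change in IOFF for an IG-mode n-FinFET as VGBS is swept from 0V to −0.3V. If we assume, for this estimate only, that IOFF depends exponentially on VGBS over that range, the implied back-gate sensitivity follows from the caption's numbers. The symbol S_BG is introduced here for illustration and is not a quantity defined in the thesis.

$$
S_{BG} \approx \frac{\Delta V_{GBS}}{\log_{10}\!\left(\dfrac{I_{OFF}(V_{GBS} = 0\,\mathrm{V})}{I_{OFF}(V_{GBS} = -0.3\,\mathrm{V})}\right)} = \frac{0.3\,\mathrm{V}}{\log_{10} 120} \approx \frac{0.3\,\mathrm{V}}{2.08} \approx 144\,\mathrm{mV/decade}
$$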
Next, efficient methodologies are developed for unifying the layout and process
simulation worlds, in order to breach the ‘many-device TCAD barrier’ that
has limited the applicability of 3D-TCAD modeling for over a decade. Here, important
bottlenecks for layout-to-3D circuit structure generation, such as the time
and memory complexity of 3D process simulation, are identified. To bypass these
bottlenecks, a radically new layout-/process-/device-independent approach based on
automated structure synthesis is proposed and evaluated for accuracy and
scalability, using SRAM bitcell structures with 32/22nm process assumptions.
With the 3D-TCAD structure generation issue addressed, several hitherto intractable
problems, such as true 3D parasitic capacitance extraction for generic
multi-gate circuit layouts in sub-32nm technology nodes, come within reach. Here,
the need for transport analysis based capacitance extraction is explained by
highlighting the difference between field solver based extractions and
TCAD based extractions on sub-32nm IBM SOI SRAMs. Thereafter, the combination
of structure synthesis and transport analysis based extraction is validated with
hardware data from two companion 6T SRAM arrays fabricated in an IBM 32nm
SOI HKMG process. Next, a multi-gate version of the structure synthesizer is used
to predict and analyze key parasitic capacitance trends in 6T multi-gate SRAMs at
the 22/14/10nm technology nodes.
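The caption of Fig. 5.15 outlines how the hardware-assisted extraction carries a process parameter from one bitcell to its companion. The measured 6T1 CBL distribution is mapped back through the 6T1 CBL-vs-p-well-dose characteristic curve to obtain a dose distribution. That dose distribution is then pushed through the 6T2 curve to predict the 6T2 CBL distribution. The Python sketch below illustrates this two-step mapping under several assumptions. Both characteristic curves are strictly monotonic and are sampled at the same dose points. Piecewise-linear interpolation is adequate between samples. BEOL variation is ignored, as stated in Fig. 5.15. The function names (`interp`, `dose_from_cbl`, `predict_cbl`) and all numbers are hypothetical placeholders, not the thesis's data or tooling.

```python
from bisect import bisect_left


def interp(x, xs, ys):
    """Piecewise-linear y(x) through the points (xs[i], ys[i]).

    xs must be strictly increasing; x outside [xs[0], xs[-1]] is clamped
    to the end points (an illustrative assumption, not the thesis flow).
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def dose_from_cbl(cbl, doses, cbl_curve):
    """Invert a strictly monotonic characteristic curve CBL(dose)."""
    if cbl_curve[0] > cbl_curve[-1]:
        # Decreasing curve: reverse both arrays so the x-axis is increasing.
        return interp(cbl, cbl_curve[::-1], doses[::-1])
    return interp(cbl, cbl_curve, doses)


def predict_cbl(measured_cbl_ref, doses, curve_ref, curve_target):
    """Map measured reference-cell CBL samples to predicted target-cell CBL.

    Step 1: CBL_ref -> p-well dose (inverse of the reference curve).
    Step 2: dose -> CBL_target (forward through the target curve).
    """
    dose_samples = [dose_from_cbl(c, doses, curve_ref) for c in measured_cbl_ref]
    return [interp(d, doses, curve_target) for d in dose_samples]


# Hypothetical placeholder data (arbitrary dose units, CBL in fF).
doses = [1.0, 2.0, 3.0]
curve_6t1 = [10.0, 12.0, 14.0]
curve_6t2 = [20.0, 23.0, 26.0]
print(predict_cbl([11.0, 13.0], doses, curve_6t1, curve_6t2))  # [21.5, 24.5]
```

The mapping is applied sample by sample, so the predicted distribution is only as trustworthy as the monotonicity of the reference curve. A non-monotonic CBL-vs-dose curve would make the dose inversion ambiguous.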
Finally, this thesis delineates a path to enable multi-gate layout/process/circuit
co-design, using a unified 3D/mixed-mode 2D-TCAD methodology for systematically
designing and evaluating different 6T FinFET SRAM bitcell topologies in a
22nm SOI process. Here, the role of parasitic capacitances, including their
dependence on fin/gate pitch, is examined in detail, and the need to evaluate
multi-gate bitcells based on dynamic behavior, rather than DC metric targets, is
highlighted.
Acknowledgments
Firstly, I’d like to express my heartfelt gratitude to my research adviser Prof.
Niraj Jha. Over the years, he has been a tremendous source of inspiration and
support. I thank him for his timely guidance and insights that have played a key
role in shaping this thesis. I am also deeply indebted to him for being very patient
with me over several paper iterations, and for the encouragement to expand my
horizons into new areas. I have had so much to learn from his impeccable writing
skills, mental discipline, as well as his balanced approach to professional life.
I have also been very lucky to have Dr. Rajiv Joshi from the IBM T.J. Watson
Research Center as my internship mentor and research collaborator during my
stints at IBM Research, Yorktown Heights, NY and IBM SRDC, Bengaluru. Dr.
Joshi was instrumental in providing key ideas and material support in the projects
that I worked on, and his astute focus and perspectives on emerging technologies
have proved invaluable in the course of my research at Princeton. I’d also like to
acknowledge several IBMers who have helped shape my work through interesting
conversations and valuable inputs/feedback: Koushik Das, Sudhir Gowda, Jeff
Burns, Jeff Johnson, Steve Furkay, Abe Elfadel, David Katcoff, Chung-Hsun Lin,
Dieter Wendel, Keunwoo Kim, Matt Ziegler, Phil Oldiges, Pong-Fei Lu, Bob Wong,
Ruchir Puri, Werner Rausch, Aditya Bansal, and Yue Tan. Many thanks to Murali
Kota, Samarth Agarwal, Mohit Bajaj, Rajan Pandey, Ninad Sathaye, Sreekumar
Kuriyedath, and Arvind Ajoy, for guiding me during my internship at IBM SRDC,
Bengaluru.
I am very grateful to Prof. Naveen Verma and Dr. Koushik Das for taking time
to read this thesis in detail, and for providing pointers to help improve it. I’d also
like to thank Prof. Verma and Prof. Jha for the opportunity to work on an SRAM
chip tape-out early on that helped me appreciate the challenges faced by circuit
designers, as well as their support for timely CAD tool maintenance.
Owing to the cross-disciplinary nature of my work, I owe a lot to the graduate-level
courses taught by Prof. Sharad Malik, Prof. Niraj Jha, Prof. Li-Shiuan Peh,
Prof. Naveen Verma, Prof. James Sturm, Prof. Claire Gmachl, Prof. Steve Chou,
and Prof. Mansour Shayegan. Many thanks to Wali Akande, Wenzhe Cao, Tracy
Tsai, Jiun-Yun Li, Qiang Liu, Yenting Chiu, and Yixing Liang, for helping me out
in courses in the early days.
The bulk of the work in this thesis was enabled by Princeton’s Terascale Infrastructure
for Groundbreaking Research in Science and Engineering (TIGRESS) HPC
clusters. I am indebted to Dennis McRitchie, Bill Wichser, Bob Knight and Curtis
Hillegas for critical software and CAD installation support that they provided
during the course of my research. I also thank all the EE department staff, Sarah
McGovern, Lori Bailey, Stacy Weber, and Roelie Abdi for helping me out on
innumerable occasions.
I am very thankful to the past and present members of my research group who
have made my stay memorable: Wei Zhang, Amit Kumar, Prateek Mishra, Niket
Agarwal, Muzaffer Simsir, Chun-Yi Lee, Chunxiao Li, Jun-Wei Chuah, Ting-Jung
Lin, Mohammed Shoaib, Meng Zhang, Sourindra Chaudhuri, Aoxiang Tang,
Chia-Chun Lin, Yang Yang, Xianmin Chen, and Amlan Chakrabarti, for many
entertaining conversations. Special thanks to Mohammed Shoaib, Tushar Krishna,
Prakash Prabhu, Arnab Sinha, Easwaran Raman, Aravindan Vijayaraghavan,
Aditya Bhaskara, Anirudh Badam, Arun Raman, Divjyot Sethi, and Rajsekar
Manokaran for wonderful times at Princeton.
Finally, I am deeply grateful to my parents and my brother for their unwavering
love and encouragement throughout the many years of my education. I’d also like
to acknowledge the antharyamin for expressing itself at the toughest of times and
helping me stay focussed on my graduate work.
To my parents.
Contents
Abstract
Acknowledgments
List of Tables
List of Figures
1 Introduction
1.1 The move to multi-gate transistors
1.2 Dissertation contributions
1.2.1 Design and test of FinFET logic circuits
1.2.2 Efficient algorithms for 3D-TCAD modeling of emerging devices and circuits
1.2.3 Transport analysis based 3D-TCAD parasitic capacitance extraction in emerging technologies
1.2.4 Parasitics-aware design of FinFET SRAMs
1.3 Dissertation structure
2 Background
2.1 Device modeling with Technology CAD
2.1.1 Process simulation
2.1.2 Device simulation
2.1.3 Device transport and physical models in TCAD
2.2 Compact models for multi-gate FETs
2.3 A generic multi-gate device fabrication flow
2.4 Multi-gate FET adoption challenges
3 Design and Test of FinFET Logic Circuits
3.1 Design of logic gates and flip-flops in high-performance FinFET technology
3.1.1 Introduction
3.1.2 Related work
3.1.3 Symmetric-ΦG and asymmetric-ΦG FinFET devices
3.1.4 Symmetric-ΦG and asymmetric-ΦG FinFET logic gates
3.1.5 Symmetric-ΦG and asymmetric-ΦG FinFET latches and flip-flops
3.1.6 Section summary
3.2 Fault models for logic circuits in the multi-gate era
3.2.1 Introduction
3.2.2 Related work
3.2.3 FinFET logic gates
3.2.4 Modeling defects in FinFET logic gates
3.2.5 Section summary
4 Efficient Algorithms for 3D-TCAD Modeling of Emerging Devices and Circuits
4.1 Introduction
4.2 Related work
4.3 Structure synthesis methodologies
4.3.1 Key ideas
4.3.2 Building blocks of the algorithm
4.3.3 Implementation strategies
4.4 Structure synthesis case studies
4.5 Discussion
4.6 Chapter summary
5 Transport analysis based 3D-TCAD Parasitic Capacitance Extraction in Emerging Technologies
5.1 The need for transport analysis based parasitic capacitance extraction
5.1.1 Introduction
5.1.2 Transport analysis based capacitance extraction
5.1.3 Methodology and results
5.1.4 Section summary
5.2 Hardware-assisted predictive capacitance extraction in 32nm SOI 6T SRAMs
5.2.1 Introduction
5.2.2 Methodology and results
5.2.3 Section summary
5.3 Transport analysis based parasitic capacitance extraction in emerging multi-gate devices and circuits
5.3.1 Introduction
5.3.2 Related work
5.3.3 Multi-gate device-level parasitics
5.3.4 Multi-gate circuit-level parasitics
5.3.5 Multi-gate parasitics vs. device transport
5.3.6 Section summary
6 Parasitics-aware Design of Symmetric and Asymmetric Gate-workfunction FinFET SRAMs
6.1 Introduction
6.2 Related work
6.3 Simulation setup
6.3.1 DC metrics of 6T FinFET SRAMs
6.3.2 Transport analysis based 3D-TCAD extraction of FinFET SRAM parasitic capacitances
6.3.3 Modeling dynamic behavior of FinFET SRAM bitcells
6.4 Design of 6T FinFET SRAMs
6.4.1 6T FinFET SRAM topologies
6.4.2 6T FinFET SRAM DC metrics
6.4.3 6T FinFET SRAM parasitic capacitances
6.4.4 Transient behavior of 6T FinFET SRAMs
6.5 Chapter summary
7 Conclusion
7.1 Dissertation summary
7.2 Future work
A FinE3D framework
A.1 FinE3D Sentaurus TCAD decks
Bibliography
List of Tables
3.1 FinFET device parameters
3.2 Standard cell FinFET INV characteristics, VLOW = −0.2V, VHIGH = 1.2V
3.3 Standard cell FinFET NAND2 characteristics
3.4 TG latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, T2 = SG(1P1N)
3.5 HS latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, N1/N3/N7 = SG(2N), I5 = SG(2P1N)
3.6 Hold static noise margins, xPyN = x-fin p-FinFET, y-fin n-FinFET
3.7 ON-state current for individual FinFET devices
3.8 Metrics of SG/LP-mode FinFET INV/NAND gates
3.9 Detected and undetected faults in SG- and LP-mode FinFET NAND gates
3.10 Shorting source and drain of an n-/p-FinFET in SG/LP-mode INV/NAND gates
4.1 Process feature rulebook examples
4.2 Resource usage: Process simulation vs. structure synthesis
5.1 Bulk and SOI FinFET device parameters
6.1 22nm SOI FinFET device parameters
6.2 6T FinFET SRAM device configurations
List of Figures
1.1 The family of multiple-gate transistors [1]
1.2 FinFET types
1.3 Electrostatic integrity for different transistor configurations [1]
2.1 TCAD to SPICE model generation
2.2 The Sentaurus TCAD ecosystem
2.3 Generic process simulation steps
2.4 Transport models used in device simulation [2]
2.5 I-V comparison between Sentaurus Device and Spice3-UFDG compact model
2.6 Generic multi-gate device fabrication flow
2.7 Outline of a gate-first process
2.8 Outline of a gate-last process
3.1 FinE simulation framework for double-gate circuit design space exploration
3.2 SG-/IG-mode 3D FinFET structures simulated in Sentaurus TCAD
3.3 Two-dimensional (X-Y) cross-section of an n-FinFET simulated in Sentaurus TCAD
3.4 Symm-ΦG FinFET symbols: (a) SG-mode n-type, (b) IG-mode n-type, (c) SG-mode p-type, and (d) IG-mode p-type
3.5 Electrostatic potential and electron density distributions within the fin region of an SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions
3.6 Electrostatic potential and electron density distributions within the fin region of an IG-mode n-FinFET for on-state (VGFS = 1V, VGBS = −0.2V, VDS = 1V) and off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V) conditions
3.7 IDS vs. VGFS for an IG-mode n-FinFET, VDS = 1V, VGBS varying from 0V to −0.3V. IOFF = IDS(VGFS = 0V) varies by 120×
3.8 Asymm-ΦG FinFET symbols: (a) a-SG-mode n-type, and (b) a-SG-mode p-type
3.9 Electrostatic potential and electron density distributions within the fin region of an a-SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions
3.10 Energy band diagrams for (a) a-SG-mode n-FinFET, off-state (VGFS = VGBS = 0V, VDS = 1V), and (b) IG-mode n-FinFET, off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V)
3.11 IDS vs. VGFS for an a-SG-mode n-FinFET (VDS = 1V), with corresponding curves for SG-mode and IG-mode n-FinFETs
3.12 IDS vs. VGFS for an a-SG-mode p-FinFET (|VDS| = 1V), with corresponding curves for SG-mode and IG-mode p-FinFETs
3.13 ION characteristics vs. variations in LG, TSI, and LUN
3.14 IOFF characteristics vs. variations in LG, TSI, and LUN
3.15 ILEAK distribution for a-SG-/SG-/IG-mode n-FinFETs under gate workfunction fluctuations, σΦG = 50meV
3.16 IDS vs. VGFS for an n-FinFET at different temperatures
3.17 IOFF vs. temperature for an a-SG-mode n-FinFET with corresponding curves for SG-mode and IG-mode n-FinFETs
3.18 Fractional error in IDS vs. VGFS for 2D/3D device simulations
3.19 INV gates: (a) SG, (b) LP, (c) IGn, and (d) IGp
3.20 INV layouts
3.21 NAND2 gates: (a) SG, (b) LP, and (c) MT
3.22 NAND2 gates: (a) IG, (b) IG2, (c) XT, and (d) XT2
3.23 NAND2 layouts
3.24 Asymm-ΦG SG-mode FinFET gates: (a) a-SG-INV, (b) a-SG-NAND2, and (c) a-SG-NAND2S
3.25 Leakage-delay spectrum for FinFET INV configurations
3.26 Leakage-delay spectrum for FinFET NAND2 configurations
3.27 SG-NAND2 transient characteristics. Input rise time has been increased to 50ps from 10ps to improve visibility.
3.28 XT2-NAND2 transient characteristics. Input rise time has been increased to 50ps from 10ps to improve visibility.
3.29 Leakage-delay spectrum for asymm-ΦG FinFET logic gates
3.30 Average leakage (ILEAK) vs. temperature for FinFET INV and NAND2 standard cells
3.31 FinFET latch templates
3.32 TG flip-flop (TGF) template
3.33 HS flip-flop (HSF) template
3.34 Transient simulations of TGF1 and HSF1
3.35 Average ILEAK for FinFET latches
3.36 Average ILEAK for FinFET flip-flops
3.37 Average propagation delay for FinFET latches
3.38 Average propagation delay for FinFET flip-flops
3.39 Setup time for FinFET flip-flops
3.40 (a) SG-mode INV, (b) LP-mode INV, (c) SG-mode NAND, and (d) LP-mode NAND
3.41 Leakage and delay characteristics under different back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND
3.42 (a) Regime I: Opens on shared back-gate bias lines for many LP-mode INV gates, and (b) Regime II/III: Opens on individual back-gate bias lines for an LP-mode INV gate
3.43 Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND
3.44 Leakage and delay variation under different n-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND
3.45 Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND
3.46 Leakage and delay variation with different n-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND
3.47 Effect of cutting a subset of fins in an LP-mode NAND gate p-FinFET with four fins on (a) delay, and (b) leakage
3.48 Pulse characterization setup for (a) SG-mode INV, and (b) SG-mode NAND
3.49 Interplay between CD,BG and CFG,BG with (a) LUN variation, TSI = 10nm, and (b) TSI variation, LUN = 16nm
3.50 Transient pulse behavior of SG-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
3.51 Transient pulse behavior of LP-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
3.52 Transient pulse behavior of SG-mode INV having n-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III
3.53 Transient pulse behavior of SG-mode INV having p-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III
3.54 Transient pulse behavior of LP-mode INV in Regime III with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
3.55 Transient pulse behavior of SG-mode NAND in Regime III having (a) n-FinFET back-gate cut at A, (b) n-FinFET back-gate cut at B, and (c) p-FinFET back-gate cut at A
4.1 Technology-circuit co-design gap
4.2 TCAD flow for the 130nm node and higher
4.3 TCAD flow for 90nm-32nm technology nodes
4.4 The ultimate wishlist for 3D-TCAD assisted process/device development
4.5 TCAD modeling quadrants
4.6 (a) Modeling ambiguity with manual inputs, and (b) difficulty of iterative optimization with human elements in the TCAD flow
4.7 3D-TCAD structure generation for layouts: (a) traditional approach, and (b) proposed approach
4.8 Delineation of process zones
4.9 Construction of device-layout database (DLD)
4.10 Pre-synthesis transformations on PA-GA zones
4.11 Process feature rulebook (PFRB) generation
4.12 Layout analyzer
4.13 Layout annotation for a 1×1 6T FinFET SRAM bitcell
4.14 Generation of lithography-effects database (LED)
4.15 Architecture of the structure synthesizer
4.16 FEOL structure synthesis
4.17 BEOL structure synthesis
4.18 Integrated structure synthesis
4.19 Structure formation during a planar 6T SRAM process simulation: (a) trench device isolation, (b) formation of gate stack, (c) source/drain formation with spacers, (d) contact and via formation, and (e) final structure with doping
4.20 (a) Synthesized planar 6T SRAM structure, and (b) CBL extraction error percentage
4.21 Process-simulated versus synthesized 6T SRAM cells: (a) hold static noise margin (HSNM), and (b) read static noise margin (RSNM)
4.22 Synthesized 6T FinFET SRAM bitcell configurations
4.23 3×3 6T FinFET SRAM bitcell structure with mesh
4.24 Synthesized FinFET ring oscillator configurations
4.25 6T FinFET SRAM: Synthesis time (in sec.) versus number of FinFETs
4.26 FinFET ring oscillator: Synthesis time (in sec.) versus number of FinFETs
4.27 Logic synthesis flows are the circuit-world analogs of Fig. 4.7(b)
5.1 Cross-sectional view of a metal wire running over an active semiconductor region with two arbitrary doping profiles
5.2 Comparison between FS and TCAD extracted capacitance CAB under different conditions, ω/2π = 1MHz
5.3 32nm planar SOI Type I and II SRAM structures
5.4 Type I computed BEOL capacitances (TCAD vs. FS)
5.5 Type II computed BEOL capacitances (TCAD vs. FS)
5.6 Performance difference in Type I & II cells during read operations
5.7 Type I and Type II Read stability (TCAD vs. FS)
5.8 Thin-cell 6T SRAM array SEM top view showing HKMG n-/p-FETs
5.9 Measured intra-wafer CBL for (a) 6T1, and (b) 6T2
5.10 Measured inter-wafer CBL for (a) 6T1, and (b) 6T2
5.11 Synthesized (FEOL+BEOL) structure for the 6T1 SRAM bitcell
5.12 Effect of variation in BEOL parameters (subject to intra-wafer tolerances) on CBL and CWL for 6T1
5.13 (a) Measured vs. simulated CGS-VGS data for the nMOS capacitor structure in Fig. 5.13(b) with width 1µm×2 fingers, (b) Multi-finger (FEOL+BEOL) nMOS capacitor structure
5.14 CBL variation with p-well dose for (a) 6T1, and (b) 6T2
5.15 (a) P-well dose distribution computed from measured 6T1 CBL distribution [Fig. 5.10(a)] and the characteristic curve of 6T1 [Fig. 5.14(a)], and (b) measured vs. predicted distribution for 6T2. The characteristic curve of 6T2 [Fig. 5.14(b)] along with the computed p-well dose distribution [Fig. 5.15(a)] is used to compute the 6T2 CBL distribution. BEOL variation is not considered.
5.16 (a) Bulk FinFET, and (b) SOI FinFET
5.17 Bulk FinFET ‘gate-last’ process simulation steps
5.18 Dependence of CDRAIN,TOT and CGATE,TOT on LG and HGATE
5.19 Dependence of CDRAIN,TOT and CGATE,TOT on LSP and TSI
5.20 Dependence of CDRAIN,TOT and CGATE,TOT on HFIN and HELEV
5.21 Dependence of CDRAIN,TOT and CGATE,TOT on NCH and LDL
5.22 3D-TCAD based capacitance extraction for generic multi-gate circuit layouts: (a) traditional approach using brute-force process simulation, and (b) our flow which leverages the automated structure synthesis approach
5.23 Multi-fin FinFET (a) bulk, and (b) SOI structures. Dielectric regions are not shown
5.24 Dependence of CDRAIN,TOT and CGATE,TOT on FP
5.25 Bulk FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.26 SOI FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.27 CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. FP, GP = 90nm
5.28 FEOL components of capacitance in the 6T SRAM (111) configuration, FP = 50nm, GP = 90nm
5.29 CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. GP, FP = 50nm
5.30 SOI FinFET 6T SRAM (112) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.31 SOI FinFET 6T SRAM (113) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.32 SOI FinFET 6T SRAM (122) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.33 SOI FinFET 6T SRAM (123) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
5.34 CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM (FEOL+BEOL) configurations
5.35 CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM FEOL configurations
5.36 BEOL metal stack from the 22nm 6T SRAM (111) bitcell (a) without lithography effects, and (b) with lithography effects. Dielectric regions are not shown
5.37 CBL,TOT, CWL,TOT, and CNL,TOT error percentages for (a) bulk 6T SRAM (111) configuration, FP = 50nm, and (b) varying FP for the 22nm bulk 6T SRAM (111) configuration. GP = 90nm
5.38 Vanilla mixed-mode setup (VMM)
5.39 Mixed-mode setup with FS-extracted BEOL capacitances (FSMM)
5.40 Mixed-mode setup with corrected 3D-TCAD capacitances (3D-TCADMM)
5.41 Write operations for a 6T FinFET SRAM (111) bitcell using the setups described in Figs. 5.38, 5.39, and 5.40
5.42 Minimum write pulse width (TW) vs. cell sigma
5.43 (a) SG-NAND2, and (b) LP-NAND2 FinFET configurations
5.44 Propagation delays of (a) SG-NAND2, and (b) LP-NAND2 config-
urations with different physical models. (DD = Drift-diffusion for-
malism, HD = hydrodynamic formalism, PC = 3D-TCAD-extracted
parasitic capacitances corrections added) . . . . . . . . . . . . . . . . 170
6.1 (a) Two-dimensional SOI n-FinFET cross section, and (b) 3D SOI n-
FinFET structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.2 Setup for (a) DC hold metrics, and (b) DC read/write metrics . . . . 176
6.3 N-curve for the DC read condition . . . . . . . . . . . . . . . . . . . . 177
6.4 N-curve for the DC write condition . . . . . . . . . . . . . . . . . . . . 178
6.5 Hybrid mixed-mode device simulation methodology for simulating
SRAM read/write operations . . . . . . . . . . . . . . . . . . . . . . . 180
6.6 V(135) bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric re-
gions are not shown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
xxi
6.7 PGFB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
6.8 PGFB-PUWG bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
6.9 PGFB-SPU bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
6.10 RBB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown
6.11 Bitcell areas normalized to FP×GP
6.12 VSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM
6.13 IGC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM
6.14 MSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM
6.15 VSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP
6.16 IGC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP
6.17 MSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP
6.18 IREAD vs. VDD: (a) VSC, (b) IGC, and (c) MSC
6.19 ILEAK vs. VDD: (a) VSC, (b) IGC, and (c) MSC
6.20 Breakup of V(111) (FEOL+BEOL) BL and WL capacitances
6.21 Breakup of V(123) (FEOL+BEOL) BL and WL capacitances
6.22 Breakup of V(135) (FEOL+BEOL) BL and WL capacitances
6.23 Breakup of PGFB (FEOL+BEOL) BL and WL capacitances
6.24 Breakup of PGFB-SPU (FEOL+BEOL) BL and WL capacitances
6.25 Breakup of PGFB-PUWG (FEOL+BEOL) BL and WL capacitances
6.26 Breakup of RBB (FEOL+BEOL) BL and WL capacitances
6.27 (FEOL+BEOL) BL and WL capacitances in VSC bitcells
6.28 (FEOL+BEOL) capacitances vs. FP for (a) CBL, and (b) CWL (GP = 90nm)
6.29 IGC bitcell capacitances vs. V(111): (a) (FEOL+BEOL), and (b) FEOL
6.30 TR vs. VDD: (a) VSC, (b) IGC, and (c) MSC
6.31 TR vs. bitcell σ: (a) VSC, (b) IGC, and (c) MSC, VDD = 1V
6.32 TW vs. VDD: (a) VSC, (b) IGC, and (c) MSC
6.33 TW vs. bitcell σ: (a) VSC, (b) IGC, and (c) MSC, VDD = 1V
6.34 TR vs. array configuration: (a) VSC, (b) IGC, and (c) MSC
6.35 TW vs. array configuration: (a) VSC, (b) IGC, and (c) MSC
6.36 (a) TR, and (b) TW vs. VDD for V(111), across different FP
7.1 Scaling behavior: Direct versus iterative linear solvers
7.2 Key question: Can the solution of a large structure be approximated using individual pre-solved device states?
7.3 Transient mixed-mode 2D device simulation runtimes for FinFET NAND gates, with and without cache-restore of device states
7.4 Generation of device-state database
7.5 State retrieval and extrapolation using the BGB algorithm
7.6 Updating state in the solver loop
A.1 A sample process simulation deck
A.2 Pre-synthesis transformations
A.3 Layout annotation
A.4 FEOL structure synthesis
A.5 BEOL and integrated structure synthesis
A.6 Mesh refinement, capacitance extraction, and post-processing
Chapter 1
Introduction
For the last three decades, CMOS technology has provided consistent scaling and
enabled the implementation of high-density, high-speed, and low-power VLSI sys-
tems. In continuing the march towards denser circuitry, however, it has become
apparent that scaling the classical “bulk” MOSFET below the 22nm node is not
practical on account of poor electrostatic behavior [3], [4]. This has triggered research into silicon-on-insulator (SOI) structures like partially-depleted SOI and fully-depleted SOI (FD-SOI), which offer better control of short-channel effects (SCEs), greater performance, and lower power consumption [5], [6], [7].
1.1 The move to multi-gate transistors
Over the past decade, transistor structures have evolved a step further from planar,
classical, single-gate FETs to 3D multi-gate FETs whose behavior can only be fully
explained by advanced carrier transport phenomena. Multi-gate FET technology is
slated to replace FD-SOI and other scaled planar bulk technologies in this decade
according to the International Technology Roadmap for Semiconductors (ITRS)
[8]. Indeed, Intel and TSMC have announced their switch to such devices at the
upcoming technology nodes, and other semiconductor companies are expected to
follow suit.
Multi-gate FETs can be classified into (i) double-gate structures, e.g., DELTA
FET [9], silicon-on-nothing FET [10], multiple independent-gate FET [11], and Fin-
FET [12], (ii) triple-gate structures, e.g., trigate FET [13], Π-gate FET [14], and Ω-
gate FET [15], and (iii) surround-gate structures, e.g., cylindrical FET [16], [17],
multi-bridge channel FET [18], planar “gate-all-around” FET [19], twin-silicon-
nanowire FET [20], and nano-beam stacked channel FET [21]. Fig. 1.1 summarizes
the multi-gate family described above.
From the fabrication perspective, the most likely candidate for widespread
adoption amongst the above is the FinFET [22–26]. Over the past few years, con-
siderable research has been directed towards issues dealing with improving and
economically integrating FinFET technology into the conventional CMOS process
[22, 25, 27–30]. Also, analysis and design of FinFET devices and digital circuits
[31–62], analog/RF circuits [63–70], and SRAMs [71–78] have been very active ar-
eas of research in recent years.
The FinFET device structure consists of a silicon fin surrounded by shorted or
independent gates on either side of the fin, typically on an SOI substrate. The dis-
tance between the center points of consecutive fins is referred to as the fin pitch,
while the distance between the center points of consecutive gate conductors is re-
ferred to as the gate pitch. In the shorted-gate (SG) mode of operation, the two
gates are biased together to turn on the device, providing maximum gate drive
[Fig. 1.2(a)]. In the independent-gate (IG) mode of operation [Fig. 1.2(b)], the
two gates are electrically independent. The back-gate bias can be used to alter the
threshold voltage (Vth) of the front gate, thereby controlling the off-current (IOFF) of the device [79]. IOFF of SG-mode devices is generally much higher than that of corresponding IG-mode devices (with reverse-biased back gates), and, because Vth is fixed, it cannot be altered electrically. The Vth is typically set directly through the gate workfunction. While IG-mode devices provide the advantage of electrically controlling device Vth, and hence delay/leakage, they lead to a more complicated transistor layout strategy. This is because multi-fin IG-mode FinFETs need larger spacing between the source and drain regions, as well as a larger fin pitch in order to land a contact on the back gate, compared to corresponding multi-fin SG-mode FinFETs, which have compact layouts.
Figure 1.1: The family of multiple-gate transistors [1]
Figure 1.2: FinFET types: (a) SG-mode FinFET (labeled with gate, source, drain, fin height HFIN, gate length LG, and fin thickness TSI), and (b) IG-mode FinFET (separate front and back gates)
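To make the IG-mode argument concrete, the short sketch below assumes a linear back-gate coupling, Vth = Vth0 - r·VBG, and a fixed subthreshold swing S, so that IOFF scales as 10^(-Vth/S). The values of Vth0, r, S, and the prefactor I0 are hypothetical and are not taken from this work; real back-gate coupling depends on fin and oxide thicknesses.

```python
# Hypothetical IG-mode n-FinFET: linear back-gate coupling and a fixed
# subthreshold swing. None of these numbers come from measured devices.
VTH0 = 0.30   # front-gate Vth with the back gate at 0 V [V] (assumed)
R_BG = 0.30   # back-gate coupling factor |dVth/dVBG| (assumed)
S = 0.070     # subthreshold swing [V/decade] (assumed)
I0 = 1e-7     # drain current at VGS = Vth [A] (assumed)


def vth(vbg):
    """Front-gate threshold voltage; a negative (reverse) back-gate bias raises it."""
    return VTH0 - R_BG * vbg


def ioff(vbg):
    """Off-current at VGS = 0: the current falls one decade per S volts below Vth."""
    return I0 * 10 ** (-vth(vbg) / S)


for vbg in (0.0, -0.1, -0.2):
    print(f"VBG = {vbg:+.1f} V: Vth = {vth(vbg):.3f} V, IOFF = {ioff(vbg):.2e} A")
```

Under these assumed numbers, each -0.1 V of back-gate bias raises Vth by 30 mV and cuts IOFF by about 10^(0.03/0.07), roughly 2.7x; an SG-mode device has no such electrical knob once its workfunction is fixed.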
Overall, the move to multi-gate transistors presents the following possibilities:
• Multiple gates lead to better electrostatic integrity (EI), which is an important
measure of the short-channel behavior of the FET. SCEs are collectively a
set of undesirable effects [e.g., drain-induced barrier lowering (DIBL)] that
reduce the ability of the gate to control the channel potential as the proximity
between the source and drain regions decreases [1]. As shown in Fig. 1.3, the
extent of penetration of source/drain fields into the FET body is minimized
with multiple gates, and as a result, the roll-off in FET Vth on account of gate
length reduction is alleviated. Better EI also translates to steeper subthreshold
slopes and lower subthreshold leakage current.
• Owing to the small doping volumes involved with the active fin regions, undoped-/intrinsic-body FinFETs are favorable from the perspective of manufacturability. This makes the device virtually free of the random dopant fluctuation effect, apart from unintentional source/drain dopants that can diffuse into the channel region.
Figure 1.3: Electrostatic integrity for different transistor configurations [1]
• Undoped-body FinFETs lead to decreased impurity scattering for charge carriers [1]. Also, thin fins reduce interface scattering owing to volume inversion, in which the majority of the carriers are confined to the center of the fin due to quantum confinement. Both of these effects, coupled with channel strain, can greatly enhance channel mobility and, hence, drain current.
• With the advent of high-k metal-gate (HKMG) stacks in scaled planar technologies, the physical gate-dielectric thickness can be kept large while maintaining a low effective oxide thickness (EOT). Since gate tunneling leakage decreases exponentially with physical gate-dielectric thickness, gate leakage has been curtailed to a large extent with HKMG. In FinFETs, HKMG gate stacks coupled with an intrinsic body lower the surface electric field considerably and, hence, reduce gate leakage even further.
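The EOT argument above is series-capacitor arithmetic: a layer of physical thickness t and relative permittivity k contributes t·(3.9/k) of SiO2-equivalent thickness. The sketch below assumes a hypothetical stack of a 0.5 nm SiO2 interfacial layer and a 2.0 nm high-k film with k = 20; these values are illustrative, not this work's process.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2


def eot(layers):
    """Equivalent oxide thickness [nm] of dielectric layers in series.

    Each layer is (thickness_nm, relative_permittivity). Series capacitances
    add as 1/C = sum(t_i / eps_i), so each layer maps to t_i * K_SIO2 / k_i.
    """
    return sum(t * K_SIO2 / k for t, k in layers)


stack = [(0.5, 3.9), (2.0, 20.0)]  # hypothetical interfacial SiO2 + high-k film
physical = sum(t for t, _ in stack)
print(f"physical thickness = {physical:.2f} nm, EOT = {eot(stack):.2f} nm")
```

Here a 2.5 nm physical stack behaves electrostatically like 0.89 nm of SiO2, while tunneling leakage is set largely by the much larger physical thickness.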
1.2 Dissertation contributions
The major contribution of this thesis is the development of efficient methodologies for unifying the circuit layout/process/device simulation worlds for early-stage 3D-Technology CAD (3D-TCAD) based modeling of emerging devices and circuits, in particular multi-gate devices such as FinFETs and Tri-gate transistors. Specifically, the algorithms and frameworks developed herein will empower engineers to breach the decade-long ‘many-device TCAD barrier’ that has severely limited the scope of traditional continuum TCAD methods. Other contributions include the design and test of FinFET logic circuits in high-performance processes and parasitics-aware design of FinFET SRAMs.
These contributions are described in more detail next.
1.2.1 Design and test of FinFET logic circuits
This work consists of two major themes encompassing design and test of FinFET
logic circuits. The first deals with the design of FinFET logic and sequential ele-
ments in a high-performance process. The second explores the possibility of de-
veloping fault models for FinFET logic circuits that employ a mix of SG and IG
FinFETs. A brief summary of the contributions of this work is:
• A head-to-head comparison between symmetric gate-workfunction (Symm-
ΦG) SG- and IG-mode FinFETs along with asymmetric gate-workfunction
(Asymm-ΦG) SG-mode FinFETs and logic/sequential elements employing
them in a 22nm SOI process using TCAD device simulations, where
– Asymm-ΦG FinFETs have better leakage and on-current behavior than
Symm-ΦG IG-mode FinFETs.
– Standard cell logic gates employing Asymm-ΦG FinFETs have better
leakage-delay characteristics than the best logic gates that can be formed
using a mix of Symm-ΦG SG- and IG-mode FinFETs.
– Latches and flip-flops employing a mix of Symm-ΦG/Asymm-ΦG SG-mode FinFETs optimize delay/setup time in the best possible manner.
• While CMOS fault models overlap considerably with fault models for FinFET logic circuits, no single fault model is able to capture the observed characteristics of open defects on the back gate of IG-mode devices, or of SG-mode devices that have accidentally been converted to IG mode.
– Logic gates employing IG-mode FinFETs exhibit a wide range of behaviors, making it impossible to develop a single test protocol to detect defects, owing to the non-injective, non-surjective mapping of logic gate behaviors to traditional fault models.
1.2.2 Efficient algorithms for 3D-TCAD modeling of emerging devices and circuits
In this work, efficient and accurate methodologies are developed for unifying the layout and process simulation worlds, thereby expanding the horizon of predictive modeling for emerging devices beyond the ‘many-device TCAD barrier,’ which is a major showstopper at lower technology nodes. In particular, this work:
• Identifies important bottlenecks that plague modeling efforts in 3D-TCAD
structure generation.
– Presents a compelling case for a layout/process/device-independent
methodology for bypassing the 3D process simulation barrier.
• Proposes an innovative 3D-TCAD ‘structure synthesis’ methodology that enables automated layout-to-3D-TCAD structure generation and is akin to logic synthesis in the circuit design world.
– Outlines the algorithms needed to accomplish layout/process- and technology-node-independent structure synthesis with reasonable time/memory complexity.
– Evaluates the efficacy of the approach by comparing process-simulated
and synthesized structures.
– Enables transport analysis based capacitance extraction, which is critical
for highly scaled devices and circuits.
1.2.3 Transport analysis based 3D-TCAD parasitic capacitance extraction in emerging technologies
After addressing the 3D-TCAD structure generation issue, several hitherto in-
tractable problems in nanoscale device/circuit modeling enter the realm of feasi-
ble solutions. One such problem is parasitic capacitance extraction in nanoscale
circuits like SRAMs, eDRAMs, etc., where traditional segregated approaches to
modeling front-end-of-the-line (FEOL) and back-end-of-the-line (BEOL) capacitances break down. Indeed, this problem is listed as an issue in Section 3.5 of the 2011 ITRS modeling and simulation roadmap [80]. In this work, we demonstrate:
• The need for transport analysis based capacitance extraction for nanoscale
circuits
– Via comprehensive evaluations of the ‘BEOL component’ of parasitic
capacitances obtained using field solvers and 3D-TCAD extraction in
sub-32nm SOI 6T SRAM structures.
– For multi-bitcell based extractions and quantifying the role of edge ef-
fects during extraction.
• Hardware validation of the structure synthesis methodology using bit line
capacitance data from several SRAM arrays in an experimental 32nm SOI
process.
• The need for FEOL circuit extraction in multi-gate SRAMs, using a multi-gate
version of the structure synthesizer at the 22/14/10nm nodes.
1.2.4 Parasitics-aware design of FinFET SRAMs
The biggest benefit to chip designs on account of moving to multi-gate devices will
be the massive density improvement in on-chip memories. This work consists of
two major threads:
• Design of FinFET SRAM bitcells using Symm-ΦG and Asymm-ΦG devices
considering DC metrics in a 22nm SOI process.
• A comprehensive evaluation of current and new 6T SRAM topologies from the perspective of parasitic capacitances/transient analysis, which shows that using DC targets alone can lead to sub-optimal bitcell choices.
1.3 Dissertation structure
The rest of this thesis is organized as follows. Chapter 2 provides background on the role of TCAD in the IC design ecosystem. Since the bulk of the work in this thesis is based on multi-gate devices/circuits modeled in the state-of-the-art Sentaurus
TCAD tool suite [81], a brief survey of the transport/physical models used for de-
vice simulation is also presented. Thereafter, contemporary multi-gate compact
models, such as Spice3-UFDG and BSIM-CMG/IMG, are covered. Next, fabrica-
tion flows for high-k metal-gate FinFETs are briefly discussed. This is followed by
the challenges/trade-offs involved in adopting multi-gate FET technology.
Chapters 3, 4, 5, and 6 address several different problems that are relevant to
FinFET logic/memory circuit design and parasitic capacitance extraction. On account of the rapid increase in process complexity at each technology node, the differentiation between high-performance/low-power processes is becoming increasingly difficult to sustain, and leading-edge foundries are focusing on enabling
both ends of the spectrum in a single process [82]. In this context, Chapter 3 delves
into the design of ultra-low power FinFET logic and sequential circuits in a 22nm
SOI high-performance process, using Symm-ΦG and Asymm-ΦG FinFETs. Chap-
ter 3 also touches upon an important aspect of the FinFET chip design process that
was hitherto unexplored, which is testing of FinFET logic circuits. Here, a detailed
analysis of mapping defects to fault models under different regimes is presented.
Chapter 4 tackles the 3D process simulation barrier that has severely im-
peded progress in 3D-TCAD simulation, which is widely used for modeling
FinFET logic/memory circuits as well as other nano-scale devices in flash mem-
ories, eDRAMs, etc. An efficient layout/process/technology-node-independent
methodology is proposed to enable the synthesis of device-simulation-ready structures from generic input layouts, with the aid of technology process assumptions
and process-simulated-device databases. The scaling properties of the method as
well as comparisons with layout-process-simulated structures are also presented,
to highlight the practicality of the approach.
Chapter 5 delves into the pressing problem of accurate layout-aware para-
sitic capacitance extraction for FEOL and BEOL features in highly scaled CMOS
circuits, during early stages of technology development. It comprehensively establishes that reliance on segregated approaches with compact models/accelerated field solvers (which were used at older technology nodes) is no longer possible. Here, the scope of the work is broader than extraction of FinFET/multi-gate circuit parasitics, as the problem becomes relevant as early as the 32nm planar CMOS node. The methodology proposed in Chapter 4 immediately enables
the correct physics-based approach, which is transport analysis based parasitic
capacitance extraction in a device simulator, on the 3D device-simulation-ready
structures obtained from the respective layouts. Hardware validation of the
unified parasitic capacitance extraction methodology is also provided in an ex-
perimental 32nm IBM SOI process. Thereafter, capacitance trends in bulk/SOI
FinFETs at the 22/14/10nm technology nodes are computed using a multi-gate
version of the structure synthesizer that is presented in Chapter 4.
Continuing along the lines of the parasitic capacitance extraction problem in
Chapter 5, Chapter 6 develops the entire technology-circuit co-design flow for
early-stage design of FinFET SRAMs that employ a combination of Symm-ΦG SG-
and IG-mode and Asymm-ΦG FinFETs. Leveraging 3D-TCAD parasitic capaci-
tance extraction and back-annotations of capacitances into mixed-mode 2D-TCAD
circuit simulations, several different topologies are examined from a transient anal-
ysis perspective, and the problems with DC target based classification of SRAM
bitcells are highlighted.
Finally, Chapter 7 concludes this thesis and presents directions for future
work. The abstractions provided in Chapter 4 are radically different from the
traditional approaches seen in the TCAD community and enable ‘intelligent de-
vice state caching,’ which is proposed in Chapter 7. Here, individual devices
can be pre-solved under various bias conditions and their solution vectors can
be sampled/cached. These solutions can later be reused for generating intel-
ligent solution guesses for other similar layout-synthesized structures, upon
which DC/transient simulations are being performed, thereby accelerating device
simulation by orders of magnitude.
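The caching idea can be illustrated on a toy problem. The sketch below is not the method developed in Chapter 7; it assumes a single hypothetical "device" (a diode in series with a resistor) solved by Newton's method, and a hypothetical cache that returns the stored solution at the nearest pre-solved bias as the initial guess. The diode parameters and the nearest-neighbor lookup are illustrative assumptions.

```python
import math

# Hypothetical toy "device": a diode in series with a resistor, biased at vb.
I_S = 1e-14      # diode saturation current [A] (assumed)
V_T = 0.025852   # thermal voltage at 300 K [V]
R_S = 1e3        # series resistance [ohm] (assumed)


def residual(v, vb):
    """KCL at the internal node: diode current minus resistor current."""
    return I_S * (math.exp(v / V_T) - 1.0) - (vb - v) / R_S


def jacobian(v):
    return I_S / V_T * math.exp(v / V_T) + 1.0 / R_S


def newton(vb, v0, tol=1e-12, max_iter=100):
    """Solve residual(v, vb) = 0 from the initial guess v0; return (v, iterations)."""
    v = v0
    for iteration in range(1, max_iter + 1):
        dv = -residual(v, vb) / jacobian(v)
        v += dv
        if abs(dv) < tol:
            return v, iteration
    raise RuntimeError("Newton iteration did not converge")


class StateCache:
    """Hypothetical cache of pre-solved states, keyed by bias."""

    def __init__(self):
        self._states = {}

    def store(self, vb, v):
        self._states[vb] = v

    def guess(self, vb, default=0.0):
        """Return the cached solution at the nearest pre-solved bias."""
        if not self._states:
            return default
        nearest = min(self._states, key=lambda b: abs(b - vb))
        return self._states[nearest]


# Pre-solve a coarse set of biases from cold starts and cache the results.
cache = StateCache()
for vb in (0.5, 0.75, 0.95):
    cache.store(vb, newton(vb, v0=0.0)[0])

# Solve a new bias point both ways.
v_cold, it_cold = newton(1.0, v0=0.0)
v_warm, it_warm = newton(1.0, v0=cache.guess(1.0))
print(f"cold start: {it_cold} iterations; cache-restored start: {it_warm} iterations")
```

With these assumptions, the cold start needs many more Newton iterations than the cache-restored start, because the exponential diode characteristic makes Newton's method creep down from a poor initial guess by roughly one thermal voltage per step.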
Chapter 2
Background
2.1 Device modeling with Technology CAD
IC design is carried out at various levels of abstraction: architecture, logic, transistor, etc. TCAD is used at the lowest level of the hierarchy and enables technology development with the fewest abstractions. It is predominantly physics-based and has traditionally been the primary vehicle for predictive modeling of transistors and other active devices, considered to be part of FEOL manufacturing. TCAD is also used to explore newer device designs and extrapolate to the next technology node, to give engineers a better understanding of the benefits and drawbacks of any modifications to existing manufacturing processes, and to support the development of compact/SPICE models.
Fig. 2.1 depicts the typical sequence of steps involved in computational device
research and development with TCAD, and Fig. 2.2 shows a sample state-of-the-
art toolchain from Synopsys [81]. Initially, process descriptions are crystallized
into concrete assumptions from trial process runs. The process recipe is fed to a
process simulator, which applies the recipe to yield-sensitive circuit layouts, e.g.,
SRAM/eDRAM bitcells, to generate a process-simulated device (PSD) structure.
Figure 2.1: TCAD to SPICE model generation (blocks: process assumptions, process simulation, device simulation, device parameter extraction, SPICE model generation, test structure fabrication, and hardware measurements)
Figure 2.2: The Sentaurus TCAD ecosystem
This structure is provided as an input to a device simulator that models electri-
cal/thermal transport behavior. This is followed by parameter extraction that is
useful for compact (SPICE) model development and verification, along with hard-
ware data from test structures in the process technology. Lithography simulation
(not shown), which is generally part of process simulation, assists in the formulation of design rules and process design kits (PDKs) that are extensively used by circuit designers. The two key steps in Fig. 2.1 are process and device simulation, which have been used throughout the dissertation and are described next.
2.1.1 Process simulation
The primary objective of process simulation is to accurately predict the physi-
cal/structural layers and geometry of devices at the end of a process run, as well
as the active dopant/stress distributions. As shown in Fig. 2.3, the input to pro-
cess simulation is a process flow guided by process assumptions and layout/layer
masks. The initial wafer/substrate is subject to a variety of process conditions,
each of which may involve steps like oxidation, diffusion, implantation, deposi-
tion, etching, etc. Lithography simulation is also performed to accurately capture
feature geometries.
Process simulation generally uses a finite-element or finite-volume mesh to
compute and store the device dopant and stress profiles. Every geometric change
in the simulation domain requires a new mesh that fits the new device boundaries,
in order to model the next series of process steps. The accuracy of the profiles
strongly depends on the choice of mesh nodes at any given time. The mesh should
be sufficiently dense to resolve all dopant and stress profiles, but not too dense, as
the computational cost increases rapidly with the number of mesh nodes. For ex-
ample, a typical deep-submicron planar CMOS process simulation may have more
than 100 mesh modifications. For each mesh change, data values on the new mesh
are obtained through interpolation. Balancing interpolation error and computa-
tional cost is the key to successful TCAD simulation.
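A one-dimensional caricature of the mesh-transfer trade-off: a hypothetical Gaussian implant profile is stored on an "old" uniform mesh and moved onto a finer "new" mesh by piecewise-linear interpolation, as happens after each geometry change. The profile parameters, the uniform meshes, and the choice of linear interpolation are assumptions for illustration; commercial process simulators use their own meshing and interpolation schemes.

```python
import bisect
import math

# Hypothetical Gaussian implant profile; all parameters are assumptions.
N_PEAK = 1e20      # peak concentration [cm^-3]
R_P = 20.0         # projected range [nm]
DELTA_RP = 5.0     # straggle [nm]
LENGTH = 60.0      # simulated depth [nm]


def profile(x):
    return N_PEAK * math.exp(-((x - R_P) ** 2) / (2.0 * DELTA_RP ** 2))


def uniform_mesh(n_nodes):
    return [LENGTH * i / (n_nodes - 1) for i in range(n_nodes)]


def interpolate(xs, ys, x):
    """Piecewise-linear interpolation of nodal values ys (on sorted xs) at x."""
    j = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
    x0, x1 = xs[j - 1], xs[j]
    t = (x - x0) / (x1 - x0)
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def transfer_error(n_old, n_new=601):
    """Max error, relative to the peak, after moving the profile from an
    n_old-node mesh onto an n_new-node mesh by interpolation."""
    old = uniform_mesh(n_old)
    values = [profile(x) for x in old]
    new = uniform_mesh(n_new)
    return max(abs(interpolate(old, values, x) - profile(x)) for x in new) / N_PEAK


for n_old in (13, 25, 49, 97):
    print(f"{n_old:3d} old nodes: max transfer error = {transfer_error(n_old):.2e} of peak")
```

Once the old mesh resolves the straggle, each halving of its spacing cuts the transfer error by roughly 4x (linear interpolation is second-order accurate) while doubling the node count, which is the accuracy/cost balance described above.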
The complexity of physical models is a major factor that impacts process simu-
lation. Simplified physics minimizes computation time. With technology scaling,
however, the need for ever more accurate doping/stress profiles has increased and
complex physical models are added at each new generation. On account of the de-
14
Layout (post
design-rule check)
Layout transformations,
logical operations on layers
Substrate/initial mesh
Process condition 1
Oxidation/diffusion
simulation
Lithography simulation
Implantation simulation
Deposition simulation
Etching simulation
Process condition 2
Process condition N
Modified GDS layout/
masks
Re-mesh for
device simulation
Process simulation
module
Figure 2.3: Generic process simulation steps
tailed physical modeling involved, process simulation is almost exclusively used
to fine-tune the development of individual devices. The limitations of process sim-
ulation spurred research directions presented in Chapter 4. The output of process
simulation, i.e., the PSD structure, is generally re-meshed for device simulation,
which is discussed next.
2.1.2 Device simulation
Device simulation is used to analyze the electrical and thermal behavior of the
PSD structure obtained from extensive process simulations. Its main elements are the PSD structure, material system parameters, circuit/contact boundary conditions, the list of physical effects to be captured, numerical constraints on the solver, the carrier transport model, and the simulation mode, i.e., DC, AC, or transient, with specific external biasing conditions. There are two types of device simulation: single-device and mixed-mode. Single-device simulation is used to investigate transport
phenomena in a single device. Mixed-mode simulation is used to study the behav-
ior of small circuits constructed out of individual device instances and is generally
less rigorous in terms of physical models, owing to the increase in simulation com-
plexity. Next, we discuss the different transport models that are commonly used
in device simulation and have been adopted throughout this dissertation.
2.1.3 Device transport and physical models in TCAD
Fig. 2.4 shows the typical transport models used in modeling nanoscale semiconductor devices as well as the scope of traditional continuum TCAD device simulation, which is based on semi-classical approaches with quantum corrections. The starting point for the semi-classical approach to modeling transport is the Boltzmann Transport Equation (BTE), which is essentially a statement of the conservation of particle probability flux in the six-dimensional phase space of position $\vec{r}$ and crystal momentum $\vec{k}$. The probability distribution function $f(\vec{r},\vec{k},t)$ is the probability of finding a carrier with crystal momentum $\vec{k}$ at position $\vec{r}$ at time $t$. The time evolution of $f$ enables the calculation of carrier density $n(\vec{r},t)$, current density $\vec{J}(\vec{r},t)$, and energy density $W(\vec{r},t)$ as
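$$n(\vec{r},t) = \frac{1}{V}\sum_{\vec{k}} f(\vec{r},\vec{k},t) \tag{2.1}$$
$$\vec{J}(\vec{r},t) = -\frac{q}{V}\sum_{\vec{k}} \vec{v}(\vec{k})\, f(\vec{r},\vec{k},t) \tag{2.2}$$
$$W(\vec{r},t) = \frac{1}{V}\sum_{\vec{k}} E(\vec{k})\, f(\vec{r},\vec{k},t) \tag{2.3}$$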
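where $\vec{v}(\vec{k})$ and $E(\vec{k})$ are the carrier velocity and energy, respectively. The BTE can be derived from simple arguments to be
$$\frac{\partial f(\vec{r},\vec{k},t)}{\partial t} + \vec{v}_g\cdot\nabla_{\vec{r}} f(\vec{r},\vec{k},t) + \frac{\vec{F}}{\hbar}\cdot\nabla_{\vec{k}} f(\vec{r},\vec{k},t) = \left.\frac{\partial f}{\partial t}\right|_{\mathrm{coll}} + s(\vec{r},\vec{k},t) \tag{2.4}$$
where $\vec{v}_g = \frac{1}{\hbar}\nabla_{\vec{k}} E(\vec{k})$ is the group velocity of the carriers, $\vec{F}$ is the net external force, $\left.\partial f/\partial t\right|_{\mathrm{coll}}$ is the rate of change in $f$ on account of collisions and scattering, $s(\vec{r},\vec{k},t)$ represents the change due to generation-recombination processes, and $\hbar = h/2\pi$, where $h$ is Planck's constant.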
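To make the moment definitions (2.1) to (2.3) concrete, the sketch below evaluates them on a discrete 1D k-grid for a nondegenerate Maxwellian, optionally displaced by a drift wavevector $k_d$. Assumptions not taken from the text: a parabolic band $E = \hbar^2 k^2/2m^*$ with $m^* = 0.26\,m_0$, a 1D "volume" $V$ equal to a 1 µm length with periodic boundaries, and an arbitrary occupation prefactor. A symmetric distribution should carry no current and hold $k_BT/2$ of energy per carrier (1D equipartition), while the displaced one should carry $J = -q\,n\,\hbar k_d/m^*$.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant [J s]
Q = 1.602176634e-19           # elementary charge [C]
KB = 1.380649e-23             # Boltzmann constant [J/K]
M = 0.26 * 9.1093837015e-31   # assumed effective mass, 0.26 m0 [kg]
T = 300.0                     # carrier temperature [K]
L = 1e-6                      # assumed 1D "volume" (length) [m]
DK = 2 * math.pi / L          # k spacing for periodic boundary conditions
K = [i * DK for i in range(-600, 601)]  # spans about +/-12 thermal widths


def energy(k):
    return HBAR ** 2 * k ** 2 / (2 * M)  # parabolic band E(k)


def velocity(k):
    return HBAR * k / M  # (1/hbar) dE/dk for the parabolic band


def maxwellian(k_drift=0.0, occupancy=1e-3):
    """Nondegenerate occupation f(k), displaced by k_drift."""
    return [occupancy * math.exp(-energy(k - k_drift) / (KB * T)) for k in K]


def moments(f):
    """Carrier, current, and energy densities following Eqs. (2.1)-(2.3)."""
    n = sum(f) / L
    j = -Q / L * sum(velocity(k) * fk for k, fk in zip(K, f))
    w = sum(energy(k) * fk for k, fk in zip(K, f)) / L
    return n, j, w


n0, j0, w0 = moments(maxwellian())
k_d = 20 * DK
n1, j1, w1 = moments(maxwellian(k_d))
print(f"symmetric: J = {j0:.2e}, W/n = {w0 / n0 / (KB * T):.4f} kBT")
print(f"displaced: J = {j1:.3e}, -q n hbar k_d / m* = {-Q * n1 * HBAR * k_d / M:.3e}")
```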
Figure 2.4: Transport models used in device simulation [2]. The figure orders models from approximate to exact: compact model (appropriate for circuit design); drift-diffusion equations (good for devices down to 0.5µm, includes µ(E)); hydrodynamic equations (velocity overshoot is accounted for properly); Boltzmann transport equation (accurate up to classical limits); quantum hydrodynamics (hydrodynamic features + quantum corrections); quantum Monte-Carlo methods (accurate up to single-particle description); quantum kinetic/Wigner equation (all classical features + quantum corrections); Green's function methods (includes correlations in both space and time domain); Schrödinger equation (can be solved only for a few particles). The models are grouped into semi-classical and quantum approaches, and the TCAD scope is marked.
In a modern device simulator, the typical equations that describe the motion of charge carriers in a semiconductor device are the Poisson equation and the carrier continuity equations for electrons and holes, which are:
$$\nabla\cdot(\varepsilon\nabla\phi) = q\left(n + N_A^- - p - N_D^+\right) \tag{2.5}$$
$$\nabla\cdot\vec{J}_n = q\left(R + \frac{\partial n}{\partial t}\right) \tag{2.6}$$
$$\nabla\cdot\vec{J}_p = -q\left(R + \frac{\partial p}{\partial t}\right) \tag{2.7}$$
where $\phi$ is the electrostatic potential, $\varepsilon$ is the position-dependent dielectric permittivity, $n$ and $p$ are the electron and hole concentrations, $N_A^-$ and $N_D^+$ are the ionized acceptor and donor impurity concentrations, $\vec{J}_n$ and $\vec{J}_p$ are the electron and hole current density vectors, and $R$ is the net generation-recombination rate.
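A minimal illustration of Eq. (2.5): for a hypothetical uniformly doped, fully depleted 1D silicon slab ($n = p = N_D^+ = 0$, uniform $\varepsilon$, $\phi = 0$ at both faces), the Poisson equation reduces to $\phi'' = qN_A^-/\varepsilon$, whose exact solution is the parabola $\phi(x) = \frac{qN_A}{2\varepsilon}\,x(x - W)$. The sketch discretizes it with central differences and solves the tridiagonal system with the Thomas algorithm; the slab width, doping, and boundary conditions are assumptions chosen for illustration.

```python
EPS0 = 8.8541878128e-12       # vacuum permittivity [F/m]
EPS_SI = 11.7 * EPS0          # silicon permittivity [F/m]
Q = 1.602176634e-19           # elementary charge [C]


def thomas(a, b, c, d):
    """Solve a tridiagonal system a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def depleted_slab_potential(width, n_a, nodes=101):
    """phi(x) across a fully depleted slab with phi = 0 at both faces.

    With n = p = N_D+ = 0 and uniform eps, Eq. (2.5) reduces to
    phi'' = q * N_A / eps, discretized with second-order central differences.
    """
    h = width / (nodes - 1)
    m = nodes - 2                      # number of interior unknowns
    rhs = Q * n_a / EPS_SI * h * h     # boundary values are zero: no extra terms
    interior = thomas([1.0] * m, [-2.0] * m, [1.0] * m, [rhs] * m)
    xs = [i * h for i in range(nodes)]
    return xs, [0.0] + interior + [0.0]


W_SLAB, N_A = 10e-9, 1e24             # hypothetical: 10 nm slab, 1e18 cm^-3 acceptors
xs, phi = depleted_slab_potential(W_SLAB, N_A)
exact = [Q * N_A / (2 * EPS_SI) * x * (x - W_SLAB) for x in xs]
print(f"mid-slab potential: {phi[len(phi) // 2]:.4f} V")
print(f"max |numerical - analytic|: {max(abs(p - e) for p, e in zip(phi, exact)):.1e} V")
```

Because the central-difference stencil is exact for quadratics, the numerical and analytic potentials agree to round-off. In a real device simulator, $n$ and $p$ depend on $\phi$, so Eq. (2.5) becomes nonlinear and is solved together with Eqs. (2.6) and (2.7).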
Drift-diffusion model: The drift-diffusion formalism [2] is the simplest of trans-
port models, and is derived from the BTE under the relaxation-time approxima-
tion. It has been the workhorse of most device simulators up to the beginning of
the deep-submicron regime. The drift-diffusion current relations are derived un-
der the assumption that the carriers are in thermal equilibrium with the lattice, and
are:
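$$\vec{J}_n = q\,\mu_n\,n\left[\nabla\left(\frac{E_C}{q} - \phi\right) + \frac{k_B}{q}\,\frac{N_C}{n}\,\nabla\left(\frac{n\,T_L}{N_C}\right)\right] \tag{2.8}$$
$$\vec{J}_p = q\,\mu_p\,p\left[\nabla\left(\frac{E_V}{q} - \phi\right) - \frac{k_B}{q}\,\frac{N_V}{p}\,\nabla\left(\frac{p\,T_L}{N_V}\right)\right] \tag{2.9}$$
where $\mu_n$ and $\mu_p$ are the electron and hole mobilities, $T_L$ is the lattice temperature, $E_C$ and $E_V$ are the position-dependent conduction and valence band edge energies, $N_C$ and $N_V$ are the effective densities of states at the conduction and valence band edges, and $k_B$ is the Boltzmann constant.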
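With uniform $E_C$, $N_C$, and $T_L$, Eq. (2.8) reduces to the familiar drift-plus-diffusion form $\vec{J}_n = q\mu_n n\vec{E} + qD_n\nabla n$, with $\vec{E} = -\nabla\phi$ and $D_n = \mu_n k_B T_L/q$. The 1D sketch below checks that a Boltzmann-distributed electron density, $n \propto \exp(q\phi/k_BT_L)$, gives $J_n \approx 0$ because drift and diffusion cancel. The mobility, intrinsic density, and potential ramp are illustrative assumptions, and the naive midpoint discretization is only for demonstration; device simulators typically use an exponentially fitted (Scharfetter-Gummel) scheme instead.

```python
import math

Q = 1.602176634e-19   # elementary charge [C]
KB = 1.380649e-23     # Boltzmann constant [J/K]
T_L = 300.0           # lattice temperature [K]
MU_N = 0.1            # assumed electron mobility [m^2/(V s)]
N_I = 1e16            # approximate intrinsic density of Si at 300 K [m^-3]
V_T = KB * T_L / Q    # thermal voltage [V]


def electron_current_parts(x, phi, n):
    """Drift and diffusion components of J_n at cell midpoints.

    Assumes uniform E_C, N_C, and T_L so that Eq. (2.8) becomes
    J_n = q*mu_n*n*E + q*D_n*dn/dx with D_n = mu_n*V_T (Einstein relation).
    """
    drift, diffusion = [], []
    for i in range(len(x) - 1):
        h = x[i + 1] - x[i]
        n_mid = 0.5 * (n[i] + n[i + 1])
        e_field = -(phi[i + 1] - phi[i]) / h
        drift.append(Q * MU_N * n_mid * e_field)
        diffusion.append(Q * MU_N * V_T * (n[i + 1] - n[i]) / h)
    return drift, diffusion


# Equilibrium check: linear 0.2 V potential ramp over 20 nm, Boltzmann electrons.
x = [i * 1e-10 for i in range(201)]
phi = [0.2 * xi / x[-1] for xi in x]
n = [N_I * math.exp(p / V_T) for p in phi]
drift, diffusion = electron_current_parts(x, phi, n)
worst = max(abs(d + f) / abs(d) for d, f in zip(drift, diffusion))
print(f"largest |J_n| / |J_drift| in equilibrium: {worst:.1e}")
```

The residual left by this naive scheme is of order $(\Delta\phi/V_T)^2/12$ per cell, about $10^{-4}$ for the 1 mV steps used here, and grows quickly on coarser meshes.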
Hydrodynamic model: At the next level, hydrodynamic/energy balance trans-
port formalisms [2] increase modeling complexity, as they are derived from higher
moments of the BTE, and can account for effects like velocity overshoot, etc. In
the hydrodynamic model, carrier temperatures (Tn and Tp for electrons and holes,
respectively) are assumed to be different from the lattice temperature (TL). Here,
the current densities are:
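$$\vec{J}_n = q\,\mu_n\,n\left[\nabla\left(\frac{E_C}{q} - \phi\right) + \frac{k_B}{q}\,\frac{N_C}{n}\,\nabla\left(\frac{n\,T_n}{N_C}\right)\right] \tag{2.10}$$
$$\vec{J}_p = q\,\mu_p\,p\left[\nabla\left(\frac{E_V}{q} - \phi\right) - \frac{k_B}{q}\,\frac{N_V}{p}\,\nabla\left(\frac{p\,T_p}{N_V}\right)\right] \tag{2.11}$$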
In addition to Eqs. (2.5), (2.6), and (2.7), in the hydrodynamic model, the energy
balance equations state the conservation of average carrier energies. In terms of
the carrier temperatures Tn and Tp, they are:
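$$\nabla\cdot\vec{S}_n = \nabla\left(\frac{E_C}{q} - \phi\right)\cdot\vec{J}_n - \frac{3k_B}{2}\left[\frac{\partial(n\,T_n)}{\partial t} + R\,T_n + n\,\frac{T_n - T_L}{\tau_{\varepsilon,n}}\right] \tag{2.12}$$
$$\nabla\cdot\vec{S}_p = \nabla\left(\frac{E_V}{q} - \phi\right)\cdot\vec{J}_p - \frac{3k_B}{2}\left[\frac{\partial(p\,T_p)}{\partial t} + R\,T_p + p\,\frac{T_p - T_L}{\tau_{\varepsilon,p}}\right] \tag{2.13}$$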
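where $\tau_{\varepsilon,n}$ and $\tau_{\varepsilon,p}$ denote the electron and hole energy relaxation times, while $\vec{S}_n$ and $\vec{S}_p$ are the electron and hole energy fluxes, computed as:
$$\vec{S}_n = -\kappa_n\nabla T_n - \frac{5}{2}\,\frac{k_B T_n}{q}\,\vec{J}_n \tag{2.14}$$
$$\vec{S}_p = -\kappa_p\nabla T_p + \frac{5}{2}\,\frac{k_B T_p}{q}\,\vec{J}_p \tag{2.15}$$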
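Here, the thermal conductivities $\kappa_n$ and $\kappa_p$ are assumed to obey the Wiedemann-Franz law and are related to $T_n$ and $T_p$ by:
$$\kappa_n = \left(\frac{5}{2} + c_n\right)\frac{k_B^2}{q}\,T_n\,\mu_n\,n \tag{2.16}$$
$$\kappa_p = \left(\frac{5}{2} + c_p\right)\frac{k_B^2}{q}\,T_p\,\mu_p\,p \tag{2.17}$$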
Different variations of the hydrodynamic model exist in the literature [2] and have
been implemented in commercial device simulators.
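A quick consequence of Eq. (2.12): in a spatially uniform, steady-state electron gas ($\nabla\cdot\vec{S}_n = 0$, $\partial/\partial t = 0$, $R = 0$, uniform $E_C$ and $n$), Eq. (2.10) gives $J_n = q\mu_n nE$, and the energy balance reduces to $q\mu_n E^2 = \frac{3k_B}{2}(T_n - T_L)/\tau_{\varepsilon,n}$, i.e., $T_n = T_L + 2q\mu_n\tau_{\varepsilon,n}E^2/(3k_B)$. The sketch evaluates this for a few fields; the constant mobility of 1000 cm²/Vs and $\tau_{\varepsilon,n} = 0.3$ ps are assumptions (real mobility degrades at high field), so the numbers indicate trends only.

```python
Q = 1.602176634e-19   # elementary charge [C]
KB = 1.380649e-23     # Boltzmann constant [J/K]


def electron_temperature(e_field, mu_n, tau_e, t_lattice=300.0):
    """Homogeneous steady-state electron temperature from Eq. (2.12).

    Balances the Joule gain q*mu_n*E^2 per electron against the relaxation
    loss (3/2)*kB*(Tn - TL)/tau_e; mobility is held constant (an assumption).
    """
    return t_lattice + 2.0 * Q * mu_n * tau_e * e_field ** 2 / (3.0 * KB)


for e_field in (1e5, 1e6, 3e6):  # V/m, i.e., 1, 10, and 30 kV/cm
    tn = electron_temperature(e_field, mu_n=0.1, tau_e=0.3e-12)
    print(f"E = {e_field / 1e5:4.0f} kV/cm -> Tn = {tn:7.1f} K")
```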
Lattice heating: To account for lattice heating effects, the lattice heat flow equation [2] can be solved:
$$\nabla\cdot\vec{S}_L = H_G - \rho_L\,c_L\,\frac{\partial T_L}{\partial t} \tag{2.18}$$
where SL is the lattice heat flux, defined as
$$\vec{S}_L = -\kappa_L\,\nabla T_L \tag{2.19}$$
and ρL, cL, and κL are the mass density, specific heat, and thermal conductivity, respectively. HG is the generated local heat density and is calculated from the transport model. In the drift-diffusion case, HG is defined as
$$H_G = \nabla\left(\frac{E_C}{q}-\phi\right)\cdot\vec{J}_n + \nabla\left(\frac{E_V}{q}-\phi\right)\cdot\vec{J}_p \tag{2.20}$$
In the hydrodynamic case, HG can be defined in terms of the relaxation times:
$$H_G = \frac{3k_B}{2}\left[n\,\frac{T_n-T_L}{\tau_{\varepsilon,n}} + p\,\frac{T_p-T_L}{\tau_{\varepsilon,p}}\right] \tag{2.21}$$
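As an illustration of how Eqs. (2.18), (2.19), and (2.21) fit together, the sketch below computes the hydrodynamic heat source and advances the lattice temperature by one explicit time step in 1D. It is a toy discretization under stated assumptions (uniform grid, adiabatic ends, roughly bulk-silicon ρL·cL and κL, illustrative relaxation times), not the scheme used by a commercial simulator; thin silicon fins conduct heat considerably worse than bulk silicon.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def heat_generation_hd(n, p, T_n, T_p, T_L, tau_n, tau_p):
    """Eq. (2.21): heat density (W/m^3) released as hot carriers relax to T_L."""
    return 1.5 * K_B * (n * (T_n - T_L) / tau_n + p * (T_p - T_L) / tau_p)


def lattice_step(T_L, dx, dt, H_G, rho_c, kappa_L):
    """One explicit Euler step of Eq. (2.18) with S_L = -kappa_L dT_L/dx (Eq. 2.19).

    rho_c is rho_L * c_L. Both ends are adiabatic (zero heat flux), imposed with
    mirrored ghost cells. The scheme is stable only if
    dt <= rho_c * dx**2 / (2 * kappa_L).
    """
    padded = np.concatenate(([T_L[1]], T_L, [T_L[-2]]))
    laplacian = (padded[2:] - 2.0 * T_L + padded[:-2]) / dx**2
    # div(S_L) = -kappa_L * laplacian, hence rho_c dT_L/dt = H_G + kappa_L * laplacian
    return T_L + dt * (H_G + kappa_L * laplacian) / rho_c


# Illustrative inputs (assumed): uniform hot electrons at 600 K, cold holes
N = 51
T_L = np.full(N, 300.0)
H = heat_generation_hd(n=np.full(N, 1e25), p=np.full(N, 1e10),
                       T_n=np.full(N, 600.0), T_p=T_L, T_L=T_L,
                       tau_n=0.3e-12, tau_p=0.25e-12)
rho_c, kappa_L, dx, dt = 1.63e6, 148.0, 1e-9, 1e-15   # roughly bulk-Si values
assert dt <= rho_c * dx**2 / (2 * kappa_L)           # explicit-scheme stability
T_next = lattice_step(T_L, dx, dt, H, rho_c, kappa_L)
```

Because the heat source is uniform and the ends are adiabatic, the lattice warms uniformly by dt·HG/(ρL·cL); a localized HG (e.g., near the drain) would instead produce a hot spot that diffuses outward.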
Density gradient quantization model: When modeling nanoscale devices such as multi-gate FETs, it is essential to account for the effect of structural and electrical quantum confinement on Vth. In semi-classical transport approaches, quantum effects are typically included as a potential-like correction (Λn and Λp for electrons and holes, respectively) to the quasi-Fermi-level-based calculation of carrier concentrations. In the case of electrons, under the Boltzmann approximation:
$$n = n_i\,\exp\left(\frac{E_{F,n}-E_i-\Lambda_n}{k_B T_n}\right) \tag{2.22}$$
where Λn is given by the density-gradient model [2] as
$$\Lambda_n = -\frac{\gamma\,\hbar^2}{6\,m_n}\,\frac{\nabla^2\sqrt{n}}{\sqrt{n}} \tag{2.23}$$
Here, ni is the intrinsic carrier concentration, EF,n and Ei are the electron quasi-Fermi and intrinsic energy levels, ħ is the reduced Planck constant, mn is the electron effective mass, and γ is a fitting factor.
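A quick way to see what Eq. (2.23) does is to evaluate it for a profile with a known answer. Assuming n(x) = n0·sin²(πx/TSI) across the fin, √n has curvature −(π/TSI)²·√n, so Λn should be the constant γħ²(π/TSI)²/(6mn). The sketch below checks this numerically on a uniform grid; the function name is hypothetical, and γ = 1 and mn = 0.26m0 are placeholders, not calibrated values.

```python
import numpy as np

HBAR = 1.054571817e-34  # reduced Planck constant (J s)
M0 = 9.1093837015e-31   # free-electron mass (kg)


def dg_quantum_potential(x, n, m_n, gamma=1.0):
    """Eq. (2.23) in 1D on a uniform grid; returns Lambda_n in joules.

    The second derivative of sqrt(n) uses central differences. The two end
    points, where the stencil is incomplete, are returned as NaN.
    """
    dx = x[1] - x[0]
    s = np.sqrt(n)
    lam = np.full(n.shape, np.nan)
    curvature = (s[2:] - 2.0 * s[1:-1] + s[:-2]) / dx**2
    lam[1:-1] = -(gamma * HBAR**2 / (6.0 * m_n)) * curvature / s[1:-1]
    return lam


# Illustrative profile (assumed): electrons confined across a 10 nm fin
T_SI = 10e-9
x = np.linspace(0.0, T_SI, 201)
n = 1e25 * np.sin(np.pi * x / T_SI) ** 2
lam = dg_quantum_potential(x, n, m_n=0.26 * M0)   # m_n = 0.26 m0 is a placeholder
analytic = HBAR**2 * (np.pi / T_SI) ** 2 / (6.0 * 0.26 * M0)
```

Here Λn comes out positive (about 4.8 meV for these placeholder values), so by Eq. (2.22) it lowers the electron density relative to the classical result; thinner fins raise (π/TSI)² and hence the correction.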
Overall, to simulate multi-gate devices accurately using a commercial device simulator such as Sentaurus Device [83], quantum hydrodynamic models offer the best tradeoff between simulation accuracy and computation time.
2.2 Compact models for multi-gate FETs
While device simulators are reasonably accurate for a given transport model
framework, they are very slow for large-scale circuit simulation. Here, compact
models serve as a crucial link between process technology and circuit simula-
tion, by leveraging inputs from TCAD simulation and hardware data. Several
challenges exist in developing reliable, scalable compact models for multi-gate devices that can capture all physical regimes of device operation. Two popular flavors of compact models for FinFETs/multi-gate FETs are available in the literature, namely, Spice3-UFDG from the University of Florida, Gainesville [42], and BSIM-CMG/IMG from the University of California, Berkeley [41]. They are briefly discussed below.
Spice3-UFDG model: Spice3-UFDG is a process/physics-based model that relies on charge-based modeling of generic double-gate MOSFETs [42]. It is an extension of the UFSOI/FD model [84] and incorporates a compact iterative Poisson-Schrödinger solver, with the primary assumption of a fully-depleted silicon body under weak inversion. The model physically accounts for the charge coupling between front and back gates, and computes the quantum-mechanical carrier distribution throughout the body/channel regions in both weak and strong inversion. Since Spice3-UFDG is well calibrated with hardware data, we compared Sentaurus TCAD device simulations of FinFET devices with Spice3-UFDG using identical physical parameters. A sample case with 30nm gate length, 15nm fin thickness, 1.2nm gate dielectric thickness, and 75nm fin height is shown in Fig. 2.5, where the two can be seen to be in good agreement over a wide range of bias voltages. While Spice3-UFDG works well for single-FET simulations, it faces convergence issues in larger circuit-level simulations, which makes it unattractive for large circuits.
[Plot: IDS (A/µm, log scale) vs. VGS (V) at VDS = 1V; curves: TCAD, Spice3-UFDG]
Figure 2.5: I-V comparison between Sentaurus Device and the Spice3-UFDG compact model
BSIM-CMG/IMG model: Unlike Spice3-UFDG, BSIM-CMG/IMG are surface-potential-based compact models, i.e., all terminal currents, charges, and capacitances are derived from the surface potentials calculated in the device. Owing to the simplicity of the surface-potential formalism, the BSIM-CMG drain current model for SG-mode FinFETs can be expressed as
$$I_{DS} = 2\,\mu\,\frac{W_{eff}}{L_{eff}}\left[G(\phi_S)-G(\phi_D)\right] \tag{2.24}$$
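where Weff is the effective electrical FET width, Leff is the effective FET gate length, and the function G(φ) is given by
$$G(\phi) = \frac{Q_{INV}^2}{2C_{OX}} + 2\,\frac{k_BT}{q}\,Q_{INV} - \frac{k_BT}{q}\left[\frac{5\,\varepsilon_{Si}\,k_BT}{q\,T_{SI}} + Q_{BULK}\right]\ln\left[\frac{5\,\varepsilon_{Si}\,k_BT}{q\,T_{SI}} + Q_{BULK} + Q_{INV}\right] \tag{2.25}$$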
Here, QINV and QBULK are the inversion and bulk depletion charges (which are
functions of φ), TSI is the fin thickness, and COX is the gate capacitance.
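The structure of Eqs. (2.24) and (2.25) can be illustrated with a short sketch in which the inversion charges at the source and drain ends are supplied directly (in BSIM-CMG they follow from the surface-potential solution, which is not reproduced here). The function names and all numbers are hypothetical, and the sketch omits velocity saturation and other short-channel corrections, so absolute currents are not meaningful. It also assumes QBULK is equal at both ends (zero for an undoped fin by default), so the units inside the logarithm cancel in G(φS) − G(φD).

```python
import math

K_B = 1.380649e-23                   # Boltzmann constant (J/K)
Q = 1.602176634e-19                  # elementary charge (C)
EPS_SI = 11.7 * 8.8541878128e-12     # silicon permittivity (F/m)


def g_function(q_inv, q_bulk, c_ox, t_si, temp=300.0):
    """Eq. (2.25); charges are positive magnitudes per unit area (C/m^2)."""
    vt = K_B * temp / Q
    a = 5.0 * EPS_SI * vt / t_si
    return (q_inv**2 / (2.0 * c_ox) + 2.0 * vt * q_inv
            - vt * (a + q_bulk) * math.log(a + q_bulk + q_inv))


def drain_current(q_inv_s, q_inv_d, mu, w_eff, l_eff, c_ox, t_si, q_bulk=0.0, temp=300.0):
    """Eq. (2.24): I_DS = 2 mu (W_eff / L_eff) [G(phi_S) - G(phi_D)]."""
    g_s = g_function(q_inv_s, q_bulk, c_ox, t_si, temp)
    g_d = g_function(q_inv_d, q_bulk, c_ox, t_si, temp)
    return 2.0 * mu * (w_eff / l_eff) * (g_s - g_d)


# Hypothetical inputs: 1 nm EOT, two 50 nm sidewalls, 25 nm gate, 10 nm fin
c_ox = 3.9 * 8.8541878128e-12 / 1e-9     # F/m^2
args = dict(mu=0.03, w_eff=100e-9, l_eff=25e-9, c_ox=c_ox, t_si=10e-9)
i_sat = drain_current(0.024, 0.002, **args)   # strong inversion at source, pinched drain end
i_lin = drain_current(0.024, 0.020, **args)   # small source-drain charge difference
i_off = drain_current(0.010, 0.010, **args)   # equal charges: no current
```

Since dG/dQINV is positive for any non-negative charge, the current always flows from the end with more inversion charge to the end with less, and it vanishes when the two charges are equal.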
A significant advantage of the BSIM model is that it can correctly predict drain
current in fully-depleted as well as partially-depleted body regimes. It also cap-
tures many relevant effects seen in HKMG FinFETs such as quantum confinement,
velocity overshoot, gate-induced drain leakage, etc.
The BSIM-IMG model is a separate model for IG-mode FinFETs, where the front and back gates are biased differently. BSIM-IMG reuses several of the models provided in BSIM-CMG, and accurately accounts for the dependence of the front-gate Vth on an applied reverse bias at the back gate. However, a significant disadvantage of the model is its inability to correctly compute drain current when the back channel is also inverted, or when the voltage difference between the front and back gates is smaller than the difference between their gate workfunctions. While BSIM-CMG is relatively stable from a convergence perspective, BSIM-IMG has poor convergence properties when simulating large circuits.
2.3 A generic multi-gate device fabrication flow
Vertical multi-gate transistors such as FinFETs, Ω-FETs, and Tri-gate FETs have self-aligned gates, which is a major advantage from a fabrication perspective. The major fabrication steps for FEOL processing of generic SOI FinFETs are shown in Fig. 2.6 [1].
• The process starts with (A), an SOI wafer whose predefined silicon-film thickness determines the fin height. With the aid of either direct lithography or spacer lithography, fins of a critical dimension are defined, followed by plasma etching to get to (B).
[Figure 2.6 panels (A)-(H): device structure after successive process steps]
Figure 2.6: Generic multi-gate device fabrication flow
• Thereafter, oxidation and H2 annealing are used to smooth the sidewall surfaces. This is followed by the growth of the gate dielectric and deposition of the metal gate to get to (C). For undoped-body FinFETs, this is a critical step, where the workfunction of the metal-gate/interface capping layers directly determines the Vth of the FinFET.
• Since the gate stack is deposited unevenly over the fin topography, it is essen-
tial to planarize and flatten the gate surface to get to (D), in order to enable
subsequent gate processing steps. Next, the gate patterning and gate etching
steps are performed to define the gate length of the FinFET, as in (E). Here,
the gate etching process needs to be highly selective to avoid damage to the
silicon fin.
• Next, large-angle, low-energy tilt implants are used to conformally dope the source/drain regions while avoiding migration of dopants into the undoped fin, to reach (F). This is followed by a sequence of steps that enable selective growth of epitaxial source/drain regions without shorting them to the gate. To enable the latter, source/drain offset nitride spacers are formed along the sidewalls of the gate and the fin to reach (G).
• Finally, the fin spacers are removed and the extended source/drain regions
are subject to selective epitaxial growth as in (H), where epitaxy serves the
dual purpose of reducing parasitic resistances as well as reducing the number
of contacts needed to connect to multi-fin FinFET source/drain regions by
shorting them.
The flow described above is a gate-first process, in which the gate is defined prior to source/drain implantation. For high-k metal-gate transistors, two approaches predominate, namely, gate-first and gate-last, and both are broadly applicable to multi-gate FETs as well. In a dual metal-gate-first process (Fig. 2.7), the high-k gate dielectric is formed, followed by metal-gate-1 (MG1) deposition, as in (A). Thereafter, MG1 is patterned and metal-gate-2 (MG2) is deposited, as in (B). This is followed by MG2 patterning and gate etching to reach (C). Finally, source/drain implantation is performed, followed by contacts to the FETs. In a dual metal-gate-last or replacement-gate process (Fig. 2.8), the gate dielectric is formed and patterned along with a dummy polysilicon gate, as in (A). This serves as a mask for source/drain implantation and deposition of interlayer dielectrics (ILD), along with ILD polish, as in (B). Next, the sacrificial gate is removed and MG1 is deposited and patterned, as in (C). Thereafter, MG2 is deposited/patterned and contacts to the FETs are formed, as in (D).
[Figure 2.7 panels (A)-(D); legend: silicon, gate dielectric, STI, metal-gate 1, metal-gate 2, contacts]
Figure 2.7: Outline of a gate-first process
[Figure 2.8 panels (A)-(D); legend: silicon, gate dielectric, STI, metal-gate 1, metal-gate 2, contacts, polysilicon]
Figure 2.8: Outline of a gate-last process
Over the years, it has become increasingly clear that the gate-last approach is more favorable, and it has been adopted in the process simulations in Chapters 5 and 6. The gate-last process induces strain in the FETs, which can be significant and greatly improve performance. Also, obtaining low-Vth pFET devices in a gate-first process is very difficult on account of thermal issues, which cause Vth drift. However, despite these advantages, the gate-last process places constraints on layout density, as it requires a chemical mechanical polishing (CMP) step at the very end. This can significantly increase layout area with respect to an identical topology implemented in a gate-first process.
2.4 Multi-gate FET adoption challenges
Several challenges need to be addressed to enable a smooth transition from planar,
single-gate FET technology to multi-gate FET technology. They can be classified
into process/device and circuit-design-specific issues.
Process/device issues:
• Lithography: Patterning vertical fins with dimensions many times smaller than the wavelength of light (typically 193nm) at tight fin/gate pitches is highly non-trivial (e.g., process steps such as the removal of spacer material around a fin without eroding it require extreme precision). While spacer lithography [1] is the preferred choice for fin patterning, for other layers it is unclear whether extreme-UV (EUV) lithography or double patterning will prevail at lower technology nodes in terms of meeting yield constraints.
• Wafer-scale fin height uniformity: In the case of bulk FinFETs/processes that rely on some form of CMP, fin height tolerances are difficult to control. This translates to a design problem, as the electrical width of a FinFET is directly proportional to the fin height.
• Multi-gate parasitic resistances: Reducing the source/drain series resistance to the channel, as well as the gate conductor resistance, as fin/gate pitches scale is a major challenge. Here, fin aspect ratios and fin/gate pitches play an important role in determining whether the source/drain regions can be conformally doped to yield very low resistances using low-energy tilt implants.
• Multi-gate parasitic capacitances: In order to enable robust bar contacts to
the FETs, nearly all future multi-gate technologies will rely on extended
source/drain epitaxy to short sources/drains of parallel fins. However, this
dramatically increases gate to source/drain parasitic capacitance and needs
to be addressed.
• Tuning the gate-workfunction: Since undoped-body FinFETs are likely to be
the first choice from a manufacturability perspective, process technologies
that permit broad tunability in n-/p-FinFET gate-workfunctions in an inde-
pendent manner (for obtaining high-Vth and low-Vth devices) are being re-
searched in the industry.
Circuit design issues:
• Width quantization: Multi-gate devices impose FET electrical width quanti-
zation, which is a design limitation for SRAM/analog/RF circuit designers.
• Circuit parasitics: Extraction of FEOL parasitics corresponding to generic
multi-gate circuit layouts is a major problem.
• Process variations: Although multi-gate devices offer a significant improvement in variability with respect to planar devices [85, 86], sources of variation such as fin/gate line-edge roughness, grain-orientation-dependent gate-workfunction variation, and fin thickness/height variation are expected to affect performance in multi-gate circuits. Hence, it is essential to develop methodologies to model multi-gate circuits accurately in the presence of such variations, in order to maximize yield at design time for any given process recipe.
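Width quantization and fin-height sensitivity can both be seen in a few lines. The sketch below assumes that each fin contributes 2·HFIN + TSI of electrical width (TSI = 0 for a double-gate FinFET whose fin top is covered by a hard mask) and uses a hypothetical 50nm fin height; the helper fins_for_width is illustrative, not part of any design kit.

```python
import math


def fins_for_width(target_width_nm, h_fin_nm, t_si_nm=0.0):
    """Smallest fin count whose electrical width meets target_width_nm.

    Assumes each fin contributes 2*H_FIN + T_SI of width.
    Returns (n_fins, achieved_width_nm).
    """
    w_fin = 2.0 * h_fin_nm + t_si_nm
    n_fins = max(1, math.ceil(target_width_nm / w_fin))
    return n_fins, n_fins * w_fin


# Hypothetical 50 nm fin height: widths come only in 100 nm steps
print(fins_for_width(250, 50))      # (3, 300.0): 20% more width than requested

# Once the fin count is fixed at design time, fin-height tolerance
# scales the device width by the same factor (+/-10% here)
n_fixed, _ = fins_for_width(250, 50)
widths = [n_fixed * 2.0 * h for h in (45, 50, 55)]
print(widths)                       # [270.0, 300.0, 330.0]
```

A designer who needs 250nm of width must therefore accept 200nm or 300nm, which is the core difficulty for ratioed SRAM and analog circuits.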
Chapter 3
Design and Test of FinFET Logic Circuits
In this chapter, we focus on two aspects of FinFET circuit design. The first sec-
tion deals with the design of low-power logic gates and sequential elements in a
high-performance FinFET process technology. The second section delves into the
development of fault models for FinFET logic circuits.
3.1 Design of logic gates and flip-flops in high-performance FinFET technology
In this section, we explore the design of ultra-low-power logic gates and sequential elements in a high-performance FinFET process technology, where the leakage-delay tradeoff is an important consideration.
3.1.1 Introduction
Owing to the rapid increase in technology/process complexity at lower technology nodes, leading-edge foundries are focusing on enabling both high-performance and low-power devices in a single process [82]. This is very relevant in the context of emerging multi-gate devices like FinFETs, where logic/sequential circuit design tradeoffs have not been explored for high-performance processes. In the transition from planar CMOS to FinFET standard cell design for ultra-low-leakage, high-performance circuits, important questions that demand attention are:
• From the perspective of process and layout complexity, is it profitable to use
IG-mode FinFETs at all? What is the best way to mix SG-/IG-mode FinFETs
in order to reduce leakage current in a standard cell?
• How do SG-/IG-mode FinFETs fare in terms of leakage under temperature
variations and what are the tradeoffs offered by different topologies for these
scenarios?
• Are there alternatives to using back-gate biased IG-mode FinFETs for leakage
reduction in a high-performance technology?
In addition to addressing the above, the major contributions in the current section
are as follows [60, 87]:
• We evaluate Symm-ΦG and Asymm-ΦG FinFET devices head-to-head in a
high-performance process using 3D device simulations in Sentaurus TCAD
[81].
• We examine the effect of physical device parameters on on-current (ION) and
off-current (IOFF ), and gate-workfunction fluctuations (which are likely to
be the largest sources of Vth variation [88–91]) on FinFET leakage via quasi-
Monte Carlo 3D device simulations.
• We comprehensively probe the design space of Symm-ΦG and Asymm-ΦG
FinFET logic gates and flip-flops along various electrical characteristic di-31
mensions (leakage, delay) and layout complexity/area by suitably mixing
SG-/a-SG-/IG-mode FinFETs, using mixed-mode 2D device simulations.
• For the first time, we also demonstrate that the most versatile Symm-ΦG
topologies fail to approach the leakage-delay trade-offs enjoyed by logic ele-
ments based on Asymm-ΦG SG-mode FinFETs. This suggests that it is more
practical to use Asymm-ΦG FinFETs for ultra-low-leakage designs in a high-
performance FinFET technology rather than integrate Symm-ΦG IG-mode
FinFETs, which have high area/process overheads and introduce additional
CAD/layout design/testing costs.
The rest of this section is organized as follows. In Section 3.1.2, we review re-
lated work. In Section 3.1.3, we evaluate key metrics of Symm-ΦG and Asymm-ΦG
FinFETs via 3D/2D device transport simulations. Thereafter, we employ mixed-
mode 2D device simulations in subsequent sections, owing to the rapid increase
in computational complexity/time of 3D device simulations. In Section 3.1.4, we
characterize various plausible Symm-ΦG and Asymm-ΦG FinFET inverter (INV)
and two-input NAND (NAND2) logic gates in detail to determine the most ver-
satile configurations with respect to electrical characteristics. In Section 3.1.5, we
examine tradeoffs in designing basic latch and flip-flop topologies using various
combinations of Symm-ΦG SG-/IG-mode and Asymm-ΦG SG-mode FinFETs, us-
ing insights from Sections 3.1.3 and 3.1.4. Finally, Section 3.1.6 presents the section
summary.
3.1.2 Related work
Circuit design based on low-leakage multi-gate FETs/FinFETs has garnered significant attention over the past decade, owing to the explosive increase in leakage power consumption of planar FETs at lower technology nodes. Low-power multi-gate circuit design has been explored from a device-circuit viewpoint in [92, 93].
In [94–98], logic styles leveraging the SG and IG modes of FinFET operation have
been investigated. FinFET latches and flip-flops have been studied in [99], [100].
Owing to its small dimensions, a FinFET is likely to suffer from the effects of pro-
cess and temperature variations. In [101], engineering the workfunction of the
gate material is shown to be effective in controlling Vth under variations and sen-
sitivity of device electrical parameters to fluctuations in gate length, fin thickness,
and gate dielectric thickness is also analyzed. In [88–91], gate-workfunction vari-
ation is shown to be the most important contributor to variation in Vth for metal-
gate FinFETs. FinFETs with asymmetric gate workfunctions in the form of n+/p+
polysilicon gates have been engineered and investigated in [102], [103]. Overall,
with respect to planar devices, FinFETs are expected to fare well from a variability
perspective [85, 86].
Since multi-gate adoption is likely to be driven by performance/area benefits,
in this work, we comprehensively characterize Symm-ΦG and Asymm-ΦG FinFETs
in a high-performance process. We also investigate various possible configurations
of logic gates and flip-flops employing such FinFETs through mixed-mode device
simulation (taking into account the effect of temperature), from a digital circuit
designer’s perspective. Preliminary results dealing with the latter were presented
in [60].
3.1.3 Symmetric-ΦG and asymmetric-ΦG FinFET devices
In this section, we evaluate Symm-ΦG and Asymm-ΦG FinFETs head-to-head in a
high-performance process. Owing to the absence of a suitable platform for multi-
gate circuit design exploration, we use FinE3D, an extension of FinE [104] (Fig.
3.1), which integrates double-gate compact models, like Spice3-UFDG [42], BSIM-
CMG/IMG [41], and a device simulator, like Sentaurus TCAD [81], into a single
framework. We utilized the SG-/IG-mode FinFET device structures shown in Figs. 3.2(a)/3.2(b) for 3D device transport simulations in Sentaurus Device [83].

[Figure 3.1 block diagram: MATLAB GUI; compact model (Spice3-UFDG); Sentaurus TCAD mixed-mode device simulation; parameter extraction module; quasi-MC process variation module; LTSpice netlist extraction; MATLAB postprocessing]
Figure 3.1: FinE simulation framework for double-gate circuit design space exploration

[Figure 3.2 panels: (a) SG-mode FinFET structure (drain, source, gate); (b) IG-mode FinFET structure (drain, source, front gate, back gate)]
Figure 3.2: SG-/IG-mode 3D FinFET structures simulated in Sentaurus TCAD

Figure 3.3: Two-dimensional (X-Y) cross-section of an n-FinFET simulated in Sentaurus TCAD

Also, a two-dimensional (X-Y) cross-section of the device structures in Figs. 3.2(a) and 3.2(b), as shown in Fig. 3.3, was employed for mixed-mode device-circuit simulations. In Table 3.1, the parameters for a typical n-/p-FinFET device are listed,
where LGF , LGB, TOXF , TOXB, TSI , HFIN , HGF , HGB, LSPF , LSPB, LUN , NBODY , ΦGF , ΦGB,
NSD, VDD are the physical front- and back-gate lengths, front- and back-gate effec-
tive oxide thicknesses, fin thickness, fin height, front- and back-gate thicknesses,
front- and back-gate spacer thicknesses, gate-drain/source underlap, body dop-
ing, front- and back-gate workfunctions, source/drain doping, and the operating
voltage, respectively.
The fin body thickness is chosen to be sufficiently small relative to the gate length to ensure that the gate has excellent control over the channel [1]. The channel region in the fin is typically undoped, owing to the small dimensions of the device. The heavily doped, extended raised source/drain regions (HCON × LCON) aid in forming contacts to the device. They lead into the source/drain regions in the fin, where the dopant concentration gradually decreases toward the nominally undoped body region, resulting in either an overlap (LOV) or an underlap (LUN). The Vth of FinFETs is typically tuned by directly adjusting the workfunction of the gate material [105]. The workfunctions for n-FinFET (ΦGF = ΦGB = ΦGn = 4.4eV) and p-FinFET (ΦGF = ΦGB = ΦGp = 4.8eV) devices were chosen to meet high-performance logic requirements [1] and yield low-Vth devices, whose symbols are shown in Fig. 3.4.

Table 3.1: FinFET device parameters
PARAMETER                      VALUE
LGF, LGB (nm)                  25
Effective TOXF, TOXB (nm)      1
TSI (nm)                       10
HFIN (nm)                      50
HGF, HGB (nm)                  20
LSPF, LSPB (nm)                20
LUN (nm)                       10
NBODY (cm−3)                   10^15
ΦGF, ΦGB (eV)                  ΦGn: 4.4, ΦGp: 4.8
NSD (cm−3)                     10^20
VDD (V)                        1
VHIGH (V)                      1.2
VLOW (V)                       −0.2

[Figure 3.4 symbols (a)-(d); n-type: ΦGF = ΦGB = 4.4eV; p-type: ΦGF = ΦGB = 4.8eV]
Figure 3.4: Symm-ΦG FinFET symbols: (a) SG-mode n-type, (b) IG-mode n-type, (c) SG-mode p-type, and (d) IG-mode p-type
ION and IOFF characteristics
We revisit the physics of SG- and IG-mode FinFET devices to better appreciate the limitations of Symm-ΦG devices and the advantages of Asymm-ΦG FinFETs. Accounting for temperature effects, we performed hydrodynamic mixed-mode 3D device simulations on carefully defined meshes (to ensure good convergence) and invoked the density-gradient model to incorporate quantum effects in the thin fin.
We ignored the effects of gate tunneling currents owing to the undoped fin, and
used an effective oxide thickness that can easily be realized using thicker high-k
dielectrics to suppress gate leakage.
[Figure 3.5 panels: (a) ON-state electrostatic potential, (b) OFF-state electrostatic potential, (c) ON-state electron density, (d) OFF-state electron density; source, drain, front gate, and back gate labeled]
Figure 3.5: Electrostatic potential and electron density distributions within the fin region of an SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions
Figs. 3.5(a) and 3.5(b) show the electrostatic potential in the fin region (X-Y
plane) of an SG-mode n-FinFET under on-state (VGFS = VGBS = 1V,VDS = 1V ) and
off-state (VGFS =VGBS = 0V,VDS = 1V ) conditions, respectively. In the on-state, both
gates contribute to band-bending such that inverted regions [Fig. 3.5(c)] form be-
side both gates (and move toward the fin center as TSI decreases, due to increased
quantum confinement), leading to high drain current. In the off-state, the fin cen-
ter is most susceptible to leakage [Fig. 3.5(d)], as the potential barrier height for
electrons is higher for paths closer to either gate.
In Figs. 3.6(a)-3.6(d), the electrostatic potential and electron density in an IG-mode n-FinFET are shown with VGBS = −0.2V. The bias on the back gate causes an inverted region to form predominantly near the front gate, which contributes to the drain current in the on-state [Figs. 3.6(a), 3.6(c)], and leads to leakage paths beside the front gate in the off-state [Figs. 3.6(b), 3.6(d)]. The peak electron density in the off-state (which is tunable using VGBS) is over an order of magnitude smaller in the IG mode than in the SG mode, indicating that IG-mode FinFETs have lower leakage.
Fig. 3.7 shows the dependence of drain current (IDS) on front-gate voltage (VGFS) for an IG-mode n-FinFET with VDS = 1V and back-gate voltage (VGBS) varying from 0V to −0.3V. This suggests that IG-mode FinFETs (with a strong reverse bias on the back gate) can reduce leakage by up to two orders of magnitude in FinFET standard cells in high-performance processes.
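The "two orders of magnitude" figure can be checked against the 120× IOFF change quoted for Fig. 3.7. The sketch below also converts that ratio into an equivalent threshold-voltage shift under the assumption of an ideal exponential subthreshold region; the 70 mV/decade subthreshold swing is a hypothetical value, not one extracted from the simulations, and the helper names are illustrative.

```python
import math


def leakage_decades(ioff_ratio):
    """Orders of magnitude spanned by an I_OFF ratio."""
    return math.log10(ioff_ratio)


def equivalent_vth_shift_mv(ioff_ratio, ss_mv_per_decade):
    """Threshold shift giving the same I_OFF ratio, assuming an ideal
    exponential subthreshold region, I_OFF ~ 10**(-V_th / SS)."""
    return ss_mv_per_decade * math.log10(ioff_ratio)


ratio = 120.0                                           # I_OFF change, V_GBS = 0 V -> -0.3 V
print(round(leakage_decades(ratio), 2))                 # 2.08 decades
print(round(equivalent_vth_shift_mv(ratio, 70.0), 1))   # 145.5 (mV, hypothetical SS)
```

A steeper subthreshold swing would translate the same leakage reduction into a smaller equivalent Vth shift, so the second number should be read only as an order-of-magnitude illustration.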
[Figure 3.6 panels: (a) ON-state electrostatic potential, (b) OFF-state electrostatic potential, (c) ON-state electron density, (d) OFF-state electron density]
Figure 3.6: Electrostatic potential and electron density distributions within the fin region of an IG-mode n-FinFET for on-state (VGFS = 1V, VGBS = −0.2V, VDS = 1V) and off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V) conditions

[Plot: IDS vs. VGFS for several VGBS values]
Figure 3.7: IDS vs. VGFS for an IG-mode n-FinFET, VDS = 1V, VGBS varying from 0V to −0.3V. IOFF = IDS(VGFS = 0V) varies by 120×

[Figure 3.8 symbols: (a) and (b), each with ΦGF = 4.8eV and ΦGB = 4.4eV]
Figure 3.8: Asymm-ΦG FinFET symbols: (a) a-SG-mode n-type, and (b) a-SG-mode p-type

Next, we introduce Asymm-ΦG FinFETs and demonstrate that they possess steep subthreshold characteristics that can be employed in the design of ultra-low-leakage logic circuits in high-performance process technologies, thus reducing the need for back-gate biasing schemes based on Symm-ΦG IG-mode FinFETs. Asymm-ΦG FinFETs can be formed by adjusting the workfunctions on each side of the SG-mode FinFET using selective implantation of a suitable dopant into the gate stack. This has been demonstrated for n+/p+ polysilicon gates using large-angle tilt implants [102], [103]. If the front/back-gate workfunctions are chosen to be identical to the high-performance n-/p-FinFET metal-gate workfunctions, as shown in Fig. 3.8, this would be favorable from a fabrication perspective. All Asymm-ΦG
FinFETs, n- or p-channel, would use both workfunctions, one on each side of the fin, without the need to complicate the process with a third gate workfunction exclusively for high-Vth devices, and high-performance SG-mode Symm-ΦG n-/p-FinFETs would be fabricated alongside them using the same gate workfunctions. In Fig. 3.8, both n-FinFETs and p-FinFETs have 4.4eV/4.8eV workfunctions, with the source/drain doping determining the type of majority carrier that conducts in the on-state. Since the gates of Asymm-ΦG FinFETs are shorted, they are also referred to as ‘a-SG-mode’ FinFETs.
[Figure 3.9 panels: (a) ON-state electrostatic potential, (b) OFF-state electrostatic potential, (c) ON-state electron density, (d) OFF-state electron density; back gate ΦG = 4.4eV, front gate ΦG = 4.8eV]
Figure 3.9: Electrostatic potential and electron density distributions within the fin region of an a-SG-mode n-FinFET for on-state (VGFS = VGBS = 1V, VDS = 1V) and off-state (VGFS = VGBS = 0V, VDS = 1V) conditions
From Fig. 3.9(a), we see that during the on-state (VGFS = VGBS = 1V, VDS = 1V), the electrostatic potential distribution in an a-SG-mode n-FinFET approaches that of a Symm-ΦG SG-mode n-FinFET [Fig. 3.5(a)], resulting in a reasonably high drain current. This is also indicated by volume inversion in the fin [Fig. 3.9(c)]. In the off-state [Figs. 3.9(b), 3.9(d)], the energy bands bend strongly near the front-gate side (as ΦGF = 4.8eV), thereby raising the barrier for electrons. The electrostatic potential/electron density distributions are qualitatively identical to those observed in the off-state of Symm-ΦG IG-mode FinFETs in Figs. 3.6(b) and 3.6(d), respectively.
[Figure 3.10 panels (a) a-SG-mode and (b) IG-mode: band energy (eV) vs. X (µm) from source to drain, at the front gate, at the back gate, and at the fin center]
Figure 3.10: Energy band diagrams for (a) a-SG-mode n-FinFET, off-state (VGFS = VGBS = 0V, VDS = 1V), and (b) IG-mode n-FinFET, off-state (VGFS = 0V, VGBS = −0.2V, VDS = 1V)
From Figs. 3.10(a) and 3.10(b), we can see that the band bending near the front gate of a-SG-mode FinFETs is stronger than that near the back gate of Symm-ΦG IG-mode FinFETs (VGBS = −0.2V), so that the leakage current of a-SG-mode devices is lower. Therefore, Asymm-ΦG FinFETs combine the advantages offered by Symm-ΦG SG- and IG-mode FinFETs, with SG-mode-like ION and IG-mode-like IOFF. Fig. 3.11 quantifies the above, showing that Symm-ΦG SG-mode (IG-mode) n-FinFETs have 415× (15×) higher leakage current than a-SG-mode devices at 300K. Similarly, Fig. 3.12 shows that Symm-ΦG SG-mode (IG-mode) p-FinFETs have 175× (5×) higher leakage than a-SG-mode p-FinFETs.
[Plot: IDS (A, log scale) vs. VGFS (V); curves: a-SG-mode (VGBS = VGFS), IG-mode (VGBS = −0.2V), SG-mode (VGBS = VGFS); annotations: 15×, 415×, and 68%/26% ION reduction w.r.t. SG-mode]
Figure 3.11: IDS vs. VGFS for an a-SG-mode n-FinFET (VDS = 1V), with corresponding curves for SG-mode and IG-mode n-FinFETs
[Plot: IDS (A, log scale) vs. VGFS (V) from −1V to 0V; curves: SG-mode (VGBS = VGFS), IG-mode (VGBS = 0.2V), a-SG-mode (VGBS = VGFS); annotations: 175×, 5×, and 41%/88% ION reduction w.r.t. SG-mode]
Figure 3.12: IDS vs. VGFS for an a-SG-mode p-FinFET (|VDS| = 1V), with corresponding curves for SG-mode and IG-mode p-FinFETs
Effect of device parameter variations
We also investigated the effect of parametric variations in LG, TSI, and LUN on ION and IOFF. Fig. 3.13(a) shows that in SG-/a-SG-mode and IG-mode FinFETs, ION decreases almost linearly with an increase in LG. Fig. 3.13(b) shows that ION increases linearly with an increase in TSI, with higher slopes for SG-/IG-mode FETs than for a-SG-mode FETs. Fig. 3.13(c) shows that ION in SG-mode FETs is very sensitive to a reduction in LUN, followed by IG-mode FETs, while a-SG-mode FETs are relatively immune to changes in LUN. IOFF, on the other hand, is strongly affected by all three parameters. Fig. 3.14(a) shows that IOFF in SG-/IG-mode devices has an exp(k/LG) dependence, while a-SG-mode FETs show a stronger exp(k1/LG + k2·LG) dependence, where k, k1, and k2 are constants. Fig. 3.14(b) shows that IOFF has an exp(k1/TSI + k2·TSI) dependence in all cases, with different values of k1 and k2 for each device. IOFF appears to roughly follow an exp(k1·LUN² + k2·LUN) dependence on LUN in all cases, as seen in Fig. 3.14(c).
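The functional forms quoted above can be extracted from simulated (LG, IOFF) pairs by linear least squares, because log IOFF is linear in the unknown constants. The sketch below assumes an added constant offset for the unstated prefactor and uses synthetic, noise-free data generated from made-up constants purely to show that the fit recovers them; it does not reproduce Fig. 3.14, and the helper name is hypothetical.

```python
import numpy as np


def fit_inverse_plus_linear(L, log10_ioff):
    """Least-squares fit of log10(I_OFF) = k1 / L + k2 * L + c; returns (k1, k2, c).

    c absorbs the prefactor; constants fitted in log10 units equal the
    natural-log constants of exp(k1/L + k2*L) divided by ln(10).
    """
    A = np.column_stack([1.0 / L, L, np.ones_like(L)])
    coeffs, *_ = np.linalg.lstsq(A, log10_ioff, rcond=None)
    return tuple(coeffs)


# Synthetic, noise-free data from assumed constants (not data from Fig. 3.14)
L_g = np.linspace(20.0, 30.0, 11)              # gate length (nm)
y_lg = 40.0 / L_g - 0.05 * L_g - 1.0           # log10(I_OFF / 1 nA)
k1, k2, c = fit_inverse_plus_linear(L_g, y_lg)

# The underlap form exp(k1*L_UN^2 + k2*L_UN) is an ordinary quadratic in log space
L_un = np.linspace(4.0, 12.0, 9)               # underlap (nm)
y_un = -0.02 * L_un**2 + 0.5 * L_un - 1.5
q2, q1, q0 = np.polyfit(L_un, y_un, 2)         # highest power first
```

The same design matrix with 1/TSI and TSI columns covers the fin-thickness dependence; with real, noisy simulation data one would also inspect the residuals to judge whether the assumed form is adequate.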
Effect of gate-workfunction fluctuations
Since metal-gate FET Vths depend linearly on the gate workfunction, we
studied the effect of workfunction fluctuations on IOFF (or ILEAK) in n-FinFETs.
In [88], gate-workfunction variation is shown to be the major cause of Vth variation,
with LG and TSI variations making only minor contributions. Using a quasi-Monte
Carlo (QMC) sample generator based on Sobol's sequence [106], we performed
QMC 3D device simulations, varying ΦG for SG-/a-SG-/IG-mode n-FinFETs with
σΦG = 50 meV, and limited the total sample count to 100 on account of the
prohibitively long runtimes of 3D device simulation. While conventional Monte
Carlo methods suffer from sample clustering, QMC methods based
on low-discrepancy sequences [107] cover the design space more uniformly, leading to
much faster convergence with fewer samples. Fig. 3.15 shows the resulting ILEAK distributions:
a-SG-mode devices have lower or comparable spreads with
respect to SG-/IG-mode FinFETs. The above investigation into parametric
dependencies with respect to LG, TSI, and LUN, together with the variation analysis based on
gate-workfunction fluctuations, suggests that a-SG-mode FinFETs are likely to be very
robust to process variations.
Figure 3.13: ION characteristics vs. variations in LG, TSI, and LUN [(a) ION (A) vs. LG (µm), (b) ION (A) vs. TSI (µm), (c) ION (A) vs. LUN (µm); curves for SG-, a-SG-, and IG-mode devices]
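The low-discrepancy sampling step can be sketched for a single varied parameter. This is a minimal, standard-library-only sketch: in one dimension, Sobol's sequence coincides with the base-2 van der Corput sequence (up to the order in which points are generated), and each point is mapped through the inverse Gaussian CDF to a ΦG offset with σΦG = 50 meV. The helper names are hypothetical; a study that varies several parameters at once would need a true multi-dimensional Sobol generator (e.g., scipy.stats.qmc.Sobol).

```python
from statistics import NormalDist, fmean, pstdev

def van_der_corput(i, base=2):
    """i-th point of the van der Corput sequence: digits of i mirrored about the radix point."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def qmc_workfunction_offsets(n=100, sigma_ev=0.050):
    """n Gaussian gate-workfunction offsets (eV) drawn from a base-2 low-discrepancy sequence."""
    gauss = NormalDist(mu=0.0, sigma=sigma_ev)
    # Start at i = 1: point 0 is 0.0, which would map to -infinity under the inverse CDF.
    return [gauss.inv_cdf(van_der_corput(i)) for i in range(1, n + 1)]

offsets = qmc_workfunction_offsets()
print(f"{len(offsets)} samples: mean = {fmean(offsets) * 1e3:.2f} meV, "
      f"std = {pstdev(offsets) * 1e3:.1f} meV")
```

In a simulation flow, each offset would be added to the nominal ΦG of the device before launching one 3D device simulation per sample.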
Effect of temperature on leakage
Figs. 3.16(a) and 3.16(b) capture the variation in IOFF for SG-mode and IG-mode
FinFETs as temperature varies between 280K and 400K. IG-mode FinFETs
register a 200× change in IOFF, while SG-mode FinFETs display a 70× change.
Figure 3.14: IOFF characteristics vs. variations in LG, TSI, and LUN [(a) IOFF vs. LG, (b) IOFF vs. TSI, (c) IOFF vs. LUN; y-axis log10(IOFF/1nA), curves for SG-, a-SG-, and IG-mode devices]
This suggests that the ILEAK advantage of topologies having a mix of SG- and IG-mode
FinFETs, relative to those having only SG-mode FinFETs, would shrink as
temperature increases. Fig. 3.17 shows that even with a 100K increase in
temperature, a-SG-mode devices have two (one) orders of magnitude lower IOFF than
Symm-ΦG SG-mode (IG-mode) FinFETs. However, the IOFF advantage of IG-mode
and a-SG-mode devices over SG-mode devices shrinks by ∼2× (from 18× to 8×) and ∼6×
(from 640× to 104×), respectively.
Figure 3.15: ILEAK distribution for a-SG-/SG-/IG-mode n-FinFETs under gate-workfunction fluctuations, σΦG = 50 meV
Figure 3.16: IDS vs. VGFS for an n-FinFET at different temperatures: (a) SG-mode, (b) IG-mode (VGBS = −0.2V)
3.1.4 Symmetric-ΦG and asymmetric-ΦG FinFET logic gates
A significant problem with logic circuits implemented in high-performance
process technologies is the relatively high leakage current that accompanies
the high on-state current. Hence, low-leakage circuit topologies that do not
compromise performance constitute the optimal design points. In this section,
we explore the design space of Symm-ΦG FinFET INV and NAND2 gates in detail
to determine the most versatile topologies that can arise by mixing Symm-ΦG SG-
and IG-mode FinFETs.
Figure 3.17: IOFF vs. temperature for an a-SG-mode n-FinFET with corresponding curves for SG-mode and IG-mode (VGBS = −0.2V) n-FinFETs [IOFF (A) vs. temperature (K), 280K to 400K; annotated IOFF ratios: 640× and 104× (a-SG- vs. SG-mode), 18× and 8× (IG- vs. SG-mode)]
3D versus 2D device simulation
Owing to the prohibitively high computational cost of 3D transport simulation for even a
single FET, mixed-mode 3D device simulation of FinFET circuits is
intractable in practical timeframes. Transient simulations, which are necessary
to capture logic element delays, are also extremely cumbersome to perform via
3D device simulation. Hence, device simulations on a 2D structure
(corresponding to a slice of the 3D FinFET device) are used hereafter. Since 2D
simulations do not fully capture all physical effects on carrier transport (e.g., the
corner effect [1]), we computed the fractional error in drain current,
(IDS,2D − IDS,3D)/IDS,2D, vs. VGFS from 2D/3D device simulations [Fig. 3.18]. In general, 2D
device simulation overpredicts IOFF and underpredicts ION with respect to the
corresponding 3D simulations. Also, a-SG-mode devices show relatively larger
differences between 2D and 3D simulations in the subthreshold regime than
SG-/IG-mode devices. Overall, the IOFF and ION predictions differ only moderately
across all FETs (within 25% for IOFF and 12.6% for ION), suggesting that
reasonably accurate comparisons can be made with mixed-mode 2D device-circuit
simulations.
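The error metric above is normalized to the 2D current, so its sign directly shows which simulation predicts more current. A minimal sketch follows; the helper name and the example currents are hypothetical and were chosen only to reproduce error magnitudes of the size annotated in Fig. 3.18.

```python
import numpy as np

def fractional_error_2d_vs_3d(ids_2d, ids_3d):
    """(IDS,2D - IDS,3D) / IDS,2D, normalized to the 2D current as in Fig. 3.18.

    Positive values mean the 2D simulation over-predicts the 3D drain current.
    """
    ids_2d = np.asarray(ids_2d, dtype=float)
    ids_3d = np.asarray(ids_3d, dtype=float)
    return (ids_2d - ids_3d) / ids_2d

# Hypothetical currents (A) at VGFS = 0 V (off-state) and VGFS = 1 V (on-state).
err = fractional_error_2d_vs_3d([1.0e-9, 1.0e-4], [0.75e-9, 1.126e-4])
print([f"{100 * e:+.1f}%" for e in err])
```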
Figure 3.18: Fractional error in IDS vs. VGFS for 2D/3D device simulations [(IDS,2D − IDS,3D)/IDS,2D vs. VGFS (V) for SG-, IG-, and a-SG-mode devices; annotated errors: 25%, 19%, 4.4%, −4.5%, and −12.6%]
Symm-ΦG and Asymm-ΦG logic gates
Fig. 3.19 shows four possible INV configurations with SG-/IG-mode FinFETs: SG-,
low-power (LP-) [94], IGn-, and IGp-INV. The SG-INV configuration has only
SG-mode n-/p-FinFETs and a highly compact layout, as shown in Fig. 3.20(a).
In the LP-INV configuration [Fig. 3.19(b)], the back-gate of PA (NA) in the pull-up
(pull-down) network is biased to VHIGH (VLOW), necessitating a complex layout
[Fig. 3.20(b)] with 36% larger area than size X2 SG-INV. IGn-INV [Fig. 3.20(c)]
and IGp-INV [Fig. 3.20(d)] occupy the same area as LP-INV, owing to the multi-fin
IG-mode FinFET back-gate contacts [108]. Amongst NAND2 gates [Figs. 3.21,
3.22], SG-NAND2 has the most compact layout [Fig. 3.23(a)], while LP-NAND2
[Fig. 3.23(b)] occupies 27% more area than size X2 SG-NAND2, with a staggered
pull-up network of parallel FinFETs and shared back-gate contacts for the series
pull-down FinFETs. Mixed-terminal (MT-) NAND2 [109] is identical to LP-NAND2
in area, with NB in SG mode [Fig. 3.21(c)]. IG- and IG2-NAND2 combine
the parallel FinFETs of the pull-up network into a single p-FinFET, so that their
layout area is the same as that of SG-NAND2. XT-NAND2 is a variant of MT-NAND2
with both FinFETs of the pull-down network in SG mode and identical layout
area (not shown). XT2-NAND2 is also a variant of MT-NAND2, with both parallel
FinFETs of the pull-up network in SG mode, which enables a compact layout
[Fig. 3.23(c)] with the same area as SG-NAND2.
Figure 3.19: INV gates: (a) SG, (b) LP, (c) IGn, and (d) IGp
Figure 3.20: INV layouts: (a) SG (size X2), (b) LP (size X1), (c) IGn (size X1), (d) IGp (size X1)
Figure 3.21: NAND2 gates: (a) SG, (b) LP, and (c) MT
Figure 3.22: NAND2 gates: (a) IG, (b) IG2, (c) XT, and (d) XT2
Figs. 3.24(a) and 3.24(b) show Asymm-ΦG SG-mode FinFET INV and NAND2
gates, respectively. (Note that any Symm-ΦG FinFET logic gate schematic/layout
can be converted to the corresponding Asymm-ΦG version by replacing the
devices, with no layout overhead.) For generalized pull-up and pull-down
networks, it is possible to mix Asymm-ΦG FinFETs for leakage reduction with
Symm-ΦG FinFETs for speed. This strategy was applied to the NAND2 gate to yield the
NAND2S gate shown in Fig. 3.24(c).
Leakage-delay characteristics: Symm-ΦG logic
In Table 3.2 and Fig. 3.25, the leakage-delay characteristics of the Symm-ΦG FinFET
INV standard cells are shown. The leakage current, ILEAK, is averaged over all
input vectors, and the delay, tp, is the fanout-of-four (FO4) delay. All comparisons
below are drawn with respect to SG-INV (size X2), as it is the largest single-finger
SG-INV that can be accommodated within the chosen standard cell height.
Figure 3.23: NAND2 layouts: (a) SG (size X2), (b) LP (size X1), (c) XT2 (size X1)
Figure 3.24: Asymm-ΦG SG-mode FinFET gates: (a) a-SG-INV, (b) a-SG-NAND2, and (c) a-SG-NAND2S
In Fig. 3.25, VHIGH and VLOW are varied (where the topology permits) in order to
sweep the design space. SG-INV (size X2) has the smallest delay, tp = 3.31 ps, and
the largest average ILEAK [SG-INV (size X1) was found to have tp = 5.75 ps]. LP-INV
shows over an order of magnitude reduction in mean ILEAK with a 267% (111%)
increase in tp with respect to SG-INV size X2 (size X1). From Fig. 3.25, it is clear that,
for the current choice of ΦGn and ΦGp, the dominant factor affecting tp is VHIGH.
For IGp-INV, lowering VHIGH increases tp and only marginally reduces average
ILEAK. For LP-INV, varying VLOW (with VHIGH = 1.2V) presents a lower slope on the
leakage-delay plot than varying VHIGH (with VLOW = −0.2V), which
reaffirms the above. IGn-INV appears to provide the best leakage-delay tradeoff,
with up to an order of magnitude reduction in average ILEAK at the cost of a 66%
increase in tp with respect to SG-INV (size X2), and marginally better tp than
SG-INV (size X1).
Table 3.2: Standard cell FinFET INV characteristics, VLOW = −0.2V, VHIGH = 1.2V
Topology            SG      LP      IGn     IGp
Area (w.r.t. SG)    1       1.36    1.36    1.36
Avg. ILEAK (nA)     2.51    0.12    0.33    2.31
tp (ps)             3.31    12.15   5.55    9.66
Figure 3.25: Leakage-delay spectrum for FinFET INV configurations
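The LP-INV percentages quoted above follow directly from Table 3.2 and the 5.75 ps size-X1 delay. The short check below recomputes them; only the table values come from the text, and the dictionary layout and helper name are illustrative.

```python
# Table 3.2 entries (VLOW = -0.2 V, VHIGH = 1.2 V): (average ILEAK in nA, FO4 tp in ps).
INV_TABLE = {
    "SG": (2.51, 3.31),
    "LP": (0.12, 12.15),
    "IGn": (0.33, 5.55),
    "IGp": (2.31, 9.66),
}
SG_X1_TP_PS = 5.75  # SG-INV (size X1) delay quoted in the text

def pct_increase(new, ref):
    """Percentage increase of `new` over `ref`."""
    return 100.0 * (new - ref) / ref

sg_leak, sg_tp = INV_TABLE["SG"]
lp_leak, lp_tp = INV_TABLE["LP"]
print(f"LP-INV leakage reduction vs. SG-INV (X2): {sg_leak / lp_leak:.1f}x")
print(f"LP-INV tp increase vs. SG-INV (X2): {pct_increase(lp_tp, sg_tp):.0f}%")
print(f"LP-INV tp increase vs. SG-INV (X1): {pct_increase(lp_tp, SG_X1_TP_PS):.0f}%")
```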
Table 3.3: Standard cell FinFET NAND2 characteristics
Topology             SG      LP      MT      IG      IG2     XT      XT2
Area (w.r.t. SG)     1       1.27    1.27    1       1       1.27    1
Avg. ILEAK (nA)      2.76    0.15    1.05    2.76    1.16    2.72    1.16
tp(Toggle A) (ps)    5.47    22.60   20.80   8.77    11.40   17.50   8.04
tp(Toggle B) (ps)    5.07    22.82   19.66   8.56    10.26   18.17   7.01
tp(Toggle AB) (ps)   4.41    15.33   13.66   4.41    6.85    10.50   6.85
In Table 3.3 and Fig. 3.26, the leakage-delay spectrum for the various FinFET
NAND2 gates is shown. All comparisons below are drawn with respect to
SG-NAND2 (size X2), as it is the largest SG-NAND2 that can be accommodated in the
chosen standard cell height. In Fig. 3.26, LP-NAND2 (VLOW = −0.2V, VHIGH = 1.2V)
shows over an order of magnitude reduction in mean cell leakage with around
4× higher tp in comparison to SG-NAND2. As with the INV cases, varying VHIGH
produces a steep slope in the leakage-delay plot for our choice of ΦGp and ΦGn,
suggesting that pull-up FinFETs should be in SG mode. This is also seen for
XT-NAND2 and MT-NAND2 gates, where varying VHIGH only increases delay and
does not decrease the average ILEAK. IG-NAND2 does not gain in average ILEAK
in spite of combining the parallel pull-up FinFETs into a single p-FinFET. Instead,
its rising delay, tpLH, degrades, which increases tp. IG2-NAND2 has a larger tp
than IG-NAND2 over the entire spectrum of VLOW variation due to a higher
falling delay, tpHL (owing to a slower pull-down stack). However, decreasing VLOW
enables over 50% reduction in average ILEAK. XT2-NAND2 presents a similar
tradeoff in average ILEAK reduction, with the benefit of lower tpLH owing to a fast,
parallel SG-mode pull-up. Overall, XT2-NAND2 lies closest to SG-NAND2 in the
leakage-delay spectrum, offering the best way to leverage back-gate biasing to
reduce average ILEAK without a significant degradation in delay.
Figure 3.26: Leakage-delay spectrum for FinFET NAND2 configurations
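One way to make "closest to SG-NAND2 in the leakage-delay spectrum" concrete is to ask which topologies are Pareto-optimal, that is, not beaten in both leakage and delay by another topology. The sketch below applies this to the nominal-bias points of Table 3.3 only (Fig. 3.26 additionally sweeps VHIGH/VLOW). Using the worst of the three toggle delays as each gate's delay is an assumption, chosen to mirror the max(tpLH, tpHL) convention used later in this chapter.

```python
# Table 3.3 values: (average ILEAK in nA, worst-case tp over the three toggles in ps).
NAND2 = {
    "SG": (2.76, max(5.47, 5.07, 4.41)),
    "LP": (0.15, max(22.60, 22.82, 15.33)),
    "MT": (1.05, max(20.80, 19.66, 13.66)),
    "IG": (2.76, max(8.77, 8.56, 4.41)),
    "IG2": (1.16, max(11.40, 10.26, 6.85)),
    "XT": (2.72, max(17.50, 18.17, 10.50)),
    "XT2": (1.16, max(8.04, 7.01, 6.85)),
}

def dominates(a, b):
    """True if point a is no worse than b in every metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Names of the points that no other point dominates."""
    return sorted(
        name for name, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != name)
    )

print(pareto_front(NAND2))
```

At nominal bias, IG-, IG2-, and XT-NAND2 are dominated (by SG- and XT2-NAND2), while XT2-NAND2 is the non-dominated point nearest to SG-NAND2 in delay.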
Figure 3.27: SG-NAND2 transient characteristics [(a) voltage (V) vs. time (s) for VA, VB, and for VOUT and VINT under Toggle A and Toggle B; (b) VGFS of FinFET NA vs. time for Toggle A (B = 1) and Toggle B (A = 1)]. Input rise time has been increased to 50ps from 10ps to improve visibility.
We see from Table 3.3 that, unlike in traditional planar bulk CMOS NAND2 gates,
tp(Toggle A) ≥ tp(Toggle B) for many of the FinFET logic styles [e.g., Figs. 3.27(a)
and 3.28(a)]. This ordering depends on the input slew rate, the intermediate node
capacitance (CINT) and voltage (VINT) of the pull-down stack, the output load capacitance
(COUT), and the modes of FinFET operation in the logic gate. Figs. 3.27(a) and 3.27(b)
show the transient behavior of SG-NAND2: VGFS across FinFET NA rises
slightly faster for the Toggle B condition than for Toggle A. Hence,
tp(Toggle A) > tp(Toggle B). This effect is exacerbated in XT2-NAND2 [Figs. 3.28(a) and
3.28(b)], as VINT does not rise to VDD when VOUT = VA = VDD, owing to the IG-mode
FinFET NA, which loses gate drive very quickly as VINT increases; hence VGFS is
non-zero in the DC condition. This non-zero VGFS, along with the fact that CINT ≪ COUT (CINT
mainly consists of source/drain-body depletion capacitances, which are negligible
in FinFETs), helps VGFS develop very quickly across NA in the Toggle B condition in
comparison to Toggle A [Fig. 3.28(b)].
Figure 3.28: XT2-NAND2 transient characteristics [(a) voltage (V) vs. time (s) for VA, VB, and for VOUT and VINT under Toggle A and Toggle B; (b) VGFS (V) of FinFET NA vs. time for Toggle A (B = 1) and Toggle B (A = 1)]. Input rise time has been increased to 50ps from 10ps to improve visibility.
From the above analysis, introducing a single IG-mode n-FinFET in the pull-
down series stack with only SG-mode p-FinFETs in the pull-up network, as with
XT2-NAND2, appears to be the best method to leverage the leakage-delay tradeoff
using back-gate biasing in high-performance Symm-ΦG FinFET standard cells.
Leakage-delay characteristics: Asymm-ΦG logic
Fig. 3.29 shows the leakage-delay characteristics of the Asymm-ΦG gates compared
to their corresponding Symm-ΦG SG-mode counterparts, as well as to IGn-INV
and XT2-NAND2, which were the best Symm-ΦG gates. a-SG-INV gates
are 60% slower than their SG-INV counterparts, with 238× lower average leakage,
while a-SG-NAND2 gates are 65% slower than SG-NAND2 gates, with 235×
lower leakage. a-SG-NOR2, a-SG-XOR2, and a-SG-XNOR2 gates have 234×, 206×,
and 234× lower average leakage than SG-NOR2, SG-XOR2, and SG-XNOR2, with
34%, 20%, and 10% higher delay, respectively. The NAND2S gate, which introduces a
Symm-ΦG SG-mode n-FinFET to reduce delay, has SG-NAND2-like leakage for the
‘10’ vector, which increases its overall average ILEAK. From Fig. 3.29, it is also clear
that the best mixed SG-/IG-mode configurations, such as IGn-INV and XT2-NAND2,
are not as well placed as their a-SG-mode counterparts in the leakage-delay
spectrum.
Figure 3.29: Leakage-delay spectrum for asymm-ΦG FinFET logic gates [average FO4 delay (s) vs. average ILEAK (A) for SG- and a-SG-mode INV, NAND2, NOR2, XOR2, and XNOR2, a-SG-NAND2S, IGn-INV, and XT2-NAND2]
Effect of temperature on leakage
From Figs. 3.16(a) and 3.16(b), we can see that IG-mode FinFETs have a larger
fractional change in leakage current with increasing temperature. This is reflected
in logic gates as well [Fig. 3.30(a)], where the average ILEAK gap between SG- and
LP-INV decreases from 35× at 280K to 12.5× at 400K.
Fig. 3.30(b) reiterates the above observation: the average ILEAK gap
between SG-NAND2 and LP-NAND2 decreases from 22× at 280K to 13× at
400K. IG2-, MT-, and XT2-NAND2 show similar trends, with the average ILEAK gap
shrinking from 2.7× to 2.5×. a-SG-mode devices, which display excellent leakage
behavior with increasing temperature [Fig. 3.17], carry their benefits over to
a-SG-mode logic gates as well, with an order of magnitude lower average ILEAK in
(a-SG-INV, a-SG-NAND2) with respect to (SG-INV, SG-NAND2) and (IGn-INV,
XT2-NAND2) gates (not shown).
Figure 3.30: Average leakage (ILEAK) vs. temperature for FinFET INV and NAND2 standard cells [(a) Symm-ΦG INV: SG, LP, IGn, IGp; (b) Symm-ΦG NAND2: SG, LP, MT, IG, IG2, XT, XT2; temperature 280K to 400K]
3.1.5 Symmetric-ΦG and asymmetric-ΦG FinFET latches and flip-flops
Next, we investigate simple latches and flip-flops that leverage combinations of
Symm-ΦG and Asymm-ΦG FinFETs, using insights from earlier sections. We mod-
ified four template configurations, namely, the brute-force transmission gate [TGL,
Fig. 3.31(a)] and half-swing clocked FinFET latches [HSL, Fig. 3.31(b)], and the
corresponding flip-flops [TGF, Fig. 3.32; HSF, Fig. 3.33], in order to demonstrate
the importance of choosing the appropriate kinds of FinFETs to optimize leakage,
propagation delay, and setup time.
Tables 3.4 and 3.5 show the various cases of interest for TGL, TGF, HSL,
and HSF using SG-, a-SG-, and IG-mode FinFETs, along with their fin counts. TGL1
and TGF1 have only SG-mode FinFETs, which necessitates a larger I1 inverter in
order to overcome I3 and force the data into the cross-coupled inverter configuration.
TGL2 and TGF2 employ a-SG-mode FinFETs to implement a weaker I3 inverter,
which permits a smaller I1 inverter. By also implementing I1/I2 with a-SG-mode
FinFETs, TGL3 and TGF3 push the limits of operation. TGL4 and TGF4 use
IG-mode FinFETs (with the n-FinFET back-gate tied to ground and the p-FinFET back-gate
tied to VDD) to weaken I3.
Figure 3.31: FinFET latch templates: (a) TG latch (TGL), (b) HS latch (HSL)
Figure 3.32: TG flip-flop (TGF) template
Figure 3.33: HS flip-flop (HSF) template
Table 3.4: TG latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, T2 = SG(1P1N)
Case             I1/I2        I3      T1      I4      I5      I6
TGL1             SG           SG      SG      -       -       -
  Fins (TGL1)    4P2N/2P1N    1P1N    2P2N    -       -       -
TGL2             SG           a-SG    SG      -       -       -
TGL3             a-SG         a-SG    SG      -       -       -
TGL4             SG           IG      SG      -       -       -
  Fins (TGL2-4)  2P1N/2P1N    1P1N    1P1N    -       -       -
TGF1             SG           SG      SG      SG      SG      SG
  Fins (TGF1)    4P2N/2P1N    1P1N    2P2N    1P1N    1P1N    2P1N
TGF2             SG           a-SG    SG      SG      a-SG    SG
TGF3             a-SG         a-SG    SG      a-SG    a-SG    a-SG
TGF4             SG           IG      SG      IG      IG      SG
  Fins (TGF2-4)  2P1N/2P1N    1P1N    1P1N    1P1N    1P1N    2P1N
Table 3.5: HS latch and flip-flop cases, xPyN = x-fin p-FinFET, y-fin n-FinFET, N1/N3/N7 = SG(2N), I5 = SG(2P1N)
Case             I1/I2        N2/N4    I3/I4        N5/N6
HSL1             SG           SG       -            -
HSL2             SG           a-SG     -            -
HSL3             a-SG         SG       -            -
HSL4             a-SG         a-SG     -            -
HSL5             IG           SG       -            -
HSL6             IG           a-SG     -            -
  Fins (HSL1-6)  1P1N/1P1N    2N/2N    -            -
HSF1             SG           SG       SG           SG
HSF2             SG           a-SG     SG           a-SG
HSF3             a-SG         a-SG     a-SG         a-SG
HSF4             a-SG         SG       a-SG         SG
HSF5             IG           SG       IG           SG
HSF6             IG           a-SG     IG           a-SG
  Fins (HSF1-6)  1P1N/1P1N    2N/2N    1P1N/1P1N    2N/2N
For the HS latches and flip-flops, HSL1 and HSF1 constitute the base cases with
only SG-mode FinFETs. A half-swing clock, which toggles between 0
and VDD/2, is employed. Since dynamic power scales with CV², halving the clock swing
alone would reduce clock power by 4×. However, the switched clock load capacitance
doubles, as N1-N7 are sized up to two fins to be able to flip the cross-coupled
inverters. Therefore, the effective clock power dissipation is halved with respect to
TG configurations using T1/T2 gates with single-fin FinFETs. (HSL2, HSL3, HSL4) and
(HSF2, HSF3, HSF4) introduce a-SG-mode FinFETs at all possible locations except N1,
N3, and N7, which are driven by the half-swing clock. HSL5 and HSF5 use IG-mode
FinFETs (with the n-FinFET back gate tied to ground and the p-FinFET back gate tied to
VDD) for I1/I2 and I3/I4. HSL6 and HSF6 do the same, but additionally use a-SG-mode
FinFETs for N2/N4/N5/N6. With respect to layout area, all versions of TGL occupy the same
area, with a standard cell height of four fins for the p-FinFETs and two
fins for the n-FinFETs. The same is true for all versions of TGF, HSL, and HSF. Both
TGFs and HSFs are negative edge-triggered, as shown in Figs. 3.34(a) and 3.34(b).
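The half-swing clock-power argument above is a CV² estimate. The sketch below reproduces it under the stated assumption that the clock driver runs from a rail equal to the clock swing; the helper name and the normalization to a full-swing, single-fin TG reference are illustrative.

```python
def relative_clock_power(cap_scale, swing_scale):
    """Dynamic clock power relative to a full-swing reference, using P ~ C * V**2 * f.

    Assumes the clock driver is supplied from a rail equal to the clock swing; a
    half-swing clock derived from a full VDD supply would not get the full V**2 benefit.
    """
    return cap_scale * swing_scale ** 2

tg = relative_clock_power(cap_scale=1.0, swing_scale=1.0)  # single-fin T1/T2, full swing
hs = relative_clock_power(cap_scale=2.0, swing_scale=0.5)  # two-fin clocked nFETs, 0 to VDD/2
print(f"HS/TG clock power ratio: {hs / tg:.2f}")
```

The 4× gain from the reduced swing is partly cancelled by the 2× larger switched capacitance, which leaves the net 2× saving stated in the text.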
For TGL and TGF configurations, when the clock is high, data value D is forced
into I2/I3 through T1, while T2 is off and I4/I5 are in the hold mode. When the
clock goes low, T1 shuts off and, in TGF, T2 forces the value at the output of I2 into
I4/I5. In HSL (HSF) configurations, when both clock and D are high, QB (INB)
is pulled low, forcing Q (IN) high. For HSF, when the clock goes low, N7 is active
and, depending on the polarity of IN and INB, Q is pulled either low or high.
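The TGF operation described above is a negative edge-triggered master-slave scheme. The following behavioral sketch (a hypothetical logic-level model, not a circuit simulation) captures only that sequencing: the master stands for the I2/I3 loop and the slave for the I4/I5 loop. Signal inversions through the inverters are folded into the stored values, and transistor strengths, and hence setup/hold effects, are not modeled.

```python
class NegEdgeTGFlipFlop:
    """Behavioral model of the TGF sequencing: master loop I2/I3, slave loop I4/I5."""

    def __init__(self):
        self.master = 0
        self.slave = 0

    def step(self, clk, d):
        if clk:
            self.master = d           # clock high: T1 passes D into I2/I3, T2 is off
        else:
            self.slave = self.master  # clock low: T1 off, T2 copies the master into I4/I5
        return self.slave             # Q follows the slave loop

ff = NegEdgeTGFlipFlop()
trace = [(1, 1), (0, 1), (0, 0), (1, 0), (0, 1)]  # (CLK, D) samples
q = [ff.step(clk, d) for clk, d in trace]
print(q)
```

Q changes only after the clock falls, and the captured value is the D sampled while the clock was high. For example, the final D = 1 arriving with CLK low is ignored.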
Table 3.6 shows the hold static noise margins of the cross-coupled inverter pairs
used in Tables 3.4 and 3.5. The (a-SG, 1P1N)/(a-SG, 1P1N) pair outperforms the rest of the
configurations (400 mV), including the (IG, 1P1N)/(IG, 1P1N) pair (375 mV), suggesting that
a-SG-mode FinFETs are ideal for keeper inverters in latches/flip-flops as well.
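The text does not state how the hold SNMs in Table 3.6 were extracted. A common approach is Seevinck's rotated-axis method: plot one inverter's VTC against the other's mirrored VTC (the butterfly curve), rotate the axes by 45°, and take the largest diagonal separation in each lobe divided by √2. The smaller lobe sets the SNM. The sketch below applies that method to a hypothetical tanh-shaped VTC; it is not a FinFET model and does not reproduce the table values.

```python
from functools import partial

import numpy as np

def tanh_vtc(vin, vdd=1.0, gain=10.0):
    """Hypothetical smooth inverter VTC switching at VDD/2 (not a FinFET model)."""
    return 0.5 * vdd * (1.0 - np.tanh(gain * (vin - 0.5 * vdd)))

def hold_snm(vtc1, vtc2, vdd=1.0, n=4001):
    """Hold SNM of a cross-coupled inverter pair from its butterfly curves.

    Uses the 45-degree rotated-axis construction: the side of the largest square
    in a lobe equals the largest separation of the two curves along the (1, 1)
    diagonal divided by sqrt(2). The smaller of the two lobes sets the SNM.
    """
    t = np.linspace(0.0, vdd, n)
    xa, ya = t, vtc1(t)          # curve A: INV1 output vs. its input
    xb, yb = vtc2(t), t          # curve B: INV2 VTC mirrored about y = x
    r2 = np.sqrt(2.0)
    ua, va = (xa - ya) / r2, (xa + ya) / r2
    ub, vb = (xb - yb) / r2, (xb + yb) / r2
    ub, vb = ub[::-1], vb[::-1]  # ub falls with t; np.interp needs rising x values
    u = np.linspace(max(ua[0], ub[0]), min(ua[-1], ub[-1]), n)
    sep = np.interp(u, ua, va) - np.interp(u, ub, vb)
    return min(sep.max(), (-sep).max()) / r2

inv_gain10 = partial(tanh_vtc, gain=10.0)
inv_gain50 = partial(tanh_vtc, gain=50.0)
print(f"SNM (gain 10): {hold_snm(inv_gain10, inv_gain10) * 1e3:.0f} mV")
print(f"SNM (gain 50): {hold_snm(inv_gain50, inv_gain50) * 1e3:.0f} mV")
```

With measured VTCs, vtc1 and vtc2 would be replaced by interpolants of simulated DC sweeps for the INV1/INV2 pairs listed in Table 3.6.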
Table 3.6: Hold static noise margins, xPyN = x-fin p-FinFET, y-fin n-FinFET
INV1            INV2            SNM (mV)
(SG, 2P1N)      (SG, 1P1N)      310
(SG, 1P1N)      (SG, 1P1N)      315
(SG, 2P1N)      (IG, 1P1N)      325
(a-SG, 2P1N)    (SG, 1P1N)      320
(a-SG, 2P1N)    (a-SG, 1P1N)    375
(a-SG, 1P1N)    (a-SG, 1P1N)    400
(IG, 1P1N)      (IG, 1P1N)      375
Figure 3.34: Transient simulations of TGF1 and HSF1 [voltage (V) vs. time (s) for CLK, D, IN, INB, Q, and QB: (a) TGF1, (b) HSF1]
Quasistationary/DC simulations were used to measure average leakage over
all possible legal combinations of input/output vectors and internal states. From
Fig. 3.35, TGL3, which employs a-SG-mode FinFETs (except for T1), can be seen to
have over 10× lower leakage than TGL1. Similarly, HSL4, with mostly a-SG-mode
FinFETs, has nearly 3× lower leakage than HSL1. From Fig. 3.36, TGF3 and
HSF3 can be seen to follow similar trends. The introduction of IG-mode FinFETs
results in only a marginal reduction in average leakage in (TGL4, TGF4), (HSL5, HSF5),
and (HSL6, HSF6).
Propagation delay was averaged over 1→0 and 0→1 transitions, assuming an
output load of four size-X1 SG-INVs for both latches and flip-flops. From Fig. 3.37,
TGL3 can be seen to have nearly 2× larger delay than TGL1, owing to the
weaker a-SG-mode FinFETs. Similar observations hold for (HSL1, HSL2) versus
(HSL3-HSL6). However, from Fig. 3.38, TGF3 and TGF1 can be seen to have almost
identical delays. This is because, in TGF1, forcing data into I4/I5 when
the clock is low is harder due to the stronger SG-mode keeper FinFETs, which
increases the CLK→Q delay. The poor leakage-delay behavior of TGF4 suggests
that IG-mode FinFETs are best suited for the weaker inverters (I3/I5) and should
not be used for I2/I4 in TGF configurations. For (HSF3-HSF6), the introduction
of IG/a-SG-mode FinFETs results in a roughly 30% increase in average propagation
delay with respect to (HSF1, HSF2).
Figure 3.35: Average ILEAK for FinFET latches [average ILEAK (A) for TGL1-TGL4 and HSL1-HSL6]
Figure 3.36: Average ILEAK for FinFET flip-flops [average ILEAK (A) for TGF1-TGF4 and HSF1-HSF6]
Figure 3.37: Average propagation delay for FinFET latches [average propagation delay (s) for TGL1-TGL4 and HSL1-HSL6]
Figure 3.38: Average propagation delay for FinFET flip-flops [average propagation delay (s) for TGF1-TGF4 and HSF1-HSF6]
The larger of the setup times for legal 0→1 and 1→0 output transitions
(for the corresponding input transitions before the clock edge) is reported as the
flip-flop setup time in Fig. 3.39. TGF4 has the smallest setup time owing to the
IG-mode FinFETs, which weaken I3. TGF1 has a comparatively low setup time for
an all-SG-mode FinFET configuration, owing to the large I1 data-forcing inverter.
(TGF2, TGF3) and (HSF3, HSF4) have considerably larger setup times, as they
employ weaker a-SG-mode FinFETs. Similar trends were observed for latches as well.
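The setup-time definition above (the worse of the 0→1 and 1→0 cases) lends itself to a bisection search over the data-to-clock offset. In the sketch below, captures_ok is a hypothetical callback standing in for one transient simulation per trial offset, and the 12 ps and 18 ps thresholds are made up. The text does not specify the pass/fail criterion or the search procedure actually used.

```python
def setup_time_bisect(captures_ok, t_lo, t_hi, tol=1e-13):
    """Smallest data-to-clock lead time (s) at which the flip-flop still captures correctly.

    captures_ok(t) is a hypothetical callback, e.g. one transient simulation with the
    data edge placed t seconds before the active clock edge, returning True if Q
    settles to the new value. It must fail at t_lo, pass at t_hi, and be monotone.
    """
    if captures_ok(t_lo) or not captures_ok(t_hi):
        raise ValueError("need a failing t_lo and a passing t_hi")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if captures_ok(mid):
            t_hi = mid
        else:
            t_lo = mid
    return t_hi

def flop_setup_time(captures_rise, captures_fall, t_lo=0.0, t_hi=100e-12):
    """Report the larger of the 0->1 and 1->0 setup times."""
    return max(setup_time_bisect(captures_rise, t_lo, t_hi),
               setup_time_bisect(captures_fall, t_lo, t_hi))

# Stand-in pass/fail checks with made-up thresholds of 12 ps (0->1) and 18 ps (1->0).
tsu = flop_setup_time(lambda t: t >= 12e-12, lambda t: t >= 18e-12)
print(f"setup time = {tsu * 1e12:.2f} ps")
```

Each bisection step halves the search interval, so resolving a 100 ps window to 0.1 ps takes about ten simulations per transition type.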
In summary, (TGF3, HSF3), which were implemented using a combination of
a-SG-mode and SG-mode FinFETs, have the best tradeoffs in leakage, delay, and
setup time for (TG, HS) flip-flop configurations.
Figure 3.39: Setup time for FinFET flip-flops [setup time (s) for TGF1-TGF4 and HSF1-HSF6]
3.1.6 Section summary
In this section, we evaluated Symm-ΦG SG-/IG-mode FinFETs and Asymm-ΦG
SG-mode FinFETs head-to-head in a high-performance process. We also investi-
gated the design space of logic gates, latches, and flip-flops employing them in
various possible configurations, which resulted in the following key insights:
• Asymm-ΦG SG-mode FinFETs in a high-performance process provide very
steep subthreshold slopes, ultra-low off-currents, and reasonably high
on-currents in comparison to corresponding Symm-ΦG SG-/IG-mode FinFETs,
and maintain their advantage at high temperature. This suggests that they
could be widely used (in combination with Symm-ΦG SG-mode FinFETs
when necessary) in off-critical paths, with the same layout as Symm-ΦG
SG-mode devices and without the routing and process-related problems of
integrating IG-mode back-gate biased devices.
• While it is possible to trade off leakage vs. delay using IG-mode FinFETs,
indiscriminate use of back-gate biasing could impact area, performance, and
leakage, as IG-mode devices need extra area to land back-gate contacts and
have degraded subthreshold slopes. In this regard, using a single IG-mode
device at the top of a series stack is sufficient to reduce leakage considerably
without much degradation in delay.
3.2 Fault models for logic circuits in the multi-gate era
In this section, we delve into the problem of developing fault models for FinFET
logic circuits.
3.2.1 Introduction
Fault modeling [110] for planar single-gate CMOS is an extensively researched
area, and comprehensive fault models have been established at various abstraction
levels. Bridging [111], stuck-at [112], delay [113] and stuck-open [114] faults are
among the most widely used fault models for CMOS. However, on account of the
double-gate configuration, it is unclear if CMOS fault models can comprehensively
model defects in FinFET circuits. Here, the important questions that need to be
investigated are: (i) how do FinFET logic gates behave in the presence of defects
like opens and shorts, and (ii) are CMOS fault models adequate for covering all
defects in FinFET logic gates.
To the best of our knowledge, this is the first work on fault modeling for
FinFET circuits that considers defects in both SG- and IG-mode FinFETs, including
open defects on the back gate, which are unique to IG-mode FinFETs. The main
contributions of this section can be summarized as follows:
• We model opens (cuts) and shorts in FinFET gates with SG- and IG-mode
devices using mixed-mode device simulation in Synopsys Sentaurus TCAD
[81] within FinE [104], a double-gate circuit design environment.
• In the case of a floating back-gate node due to an open defect, we show that a
combination of models is needed to account for the observed leakage-delay
trends of the logic gates, taking into account the dominant capacitances that
couple to the back gate.
• In the regime dominated by stray coupling capacitances, pulse-broadening
(pulse-shrinking) occurs for a wide range of back-gate voltages in n-FinFET
(p-FinFET) back-gate cuts.
• While pulse-shrinking occurs for a majority of cases in the regime domi-
nated by strong coupling capacitances to the front gate/drain/source re-
gions, pulse-broadening and stuck-at conditions may also be manifested de-
pending on the logic gate topology.
The rest of the section is organized as follows. We present related fault modeling
work in Section 3.2.2. In Section 3.2.3, we discuss the FinFET inverter (INV) and
NAND gates from a testing perspective. We demonstrate the need for new FinFET
fault models and explore them in Section 3.2.4. Finally, Section 3.2.5 presents the
section summary.
3.2.2 Related work
Fault modeling is the process of developing models of physical defects at higher
levels of abstraction. For CMOS circuits, it is estimated that around 80% of
physical defects can be detected using the stuck-at fault model [115]. With technology
scaling, testing for bridging and delay faults becomes critical. Shrinking
geometries lead to greater chances of bridging between disjoint conductive regions,
owing to mask and lithographic imperfections. Process variations greatly affect
the Vth spread of FETs, which directly affects drain current, leading to delay faults
on nodes with considerable capacitive loads. For specific input combinations, a
bridging fault creates a conducting path between supply and ground, which causes
an abrupt increase in the steady-state supply current. This behavior can be
detected by monitoring the supply current through IDDQ testing [116], possibly using an
array of on-chip current sensors [117].
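The IDDQ idea above reduces to comparing each vector's quiescent supply current against a pass/fail limit. As a minimal sketch, the limit below is set at a multiple of the median current over all vectors. Both that rule and the readings are illustrative assumptions: production IDDQ limits are set per design, process corner, and temperature, and the multiple must exceed the normal vector-to-vector leakage spread of the gate.

```python
from statistics import median

def iddq_outliers(iddq_by_vector, ratio=10.0):
    """Return test vectors whose quiescent supply current exceeds ratio x the median.

    The median-relative limit is an illustrative assumption, not a standard threshold.
    """
    limit = ratio * median(iddq_by_vector.values())
    return sorted(vec for vec, i in iddq_by_vector.items() if i > limit)

# Hypothetical per-vector IDDQ readings (A) for a NAND2 with a bridging defect.
readings = {"00": 1.1e-9, "01": 1.3e-9, "10": 2.5e-6, "11": 1.2e-9}
print(iddq_outliers(readings))
```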
Currently, there are no comprehensive fault models for FinFET circuits.
Vazquez et al. [118] showed that the hold time for stuck-open faults decreases
dramatically on account of increased sub-threshold leakage and gate leakage in
nanoscale FETs, including FinFETs. However, that study characterizes only stuck-open
faults in SG-mode FinFET circuits; defects in IG-mode FinFETs were not
considered. While bridging, stuck-at, stuck-open, and delay faults cover most of
the defects in CMOS gates, it is unclear whether they comprehensively map defects in
FinFET logic gates as well, which is the focus of the current investigation. The
back-gate bias plays an important role in determining the Vth of IG-mode
FinFETs, which is a significant departure from planar single-gate CMOS: it gives rise
to new defect scenarios, such as an open defect on the back gate while the intended
signal is still applied to the front gate. Therefore, unlike most fault models for planar
CMOS, fault models for IG-mode FinFETs (and logic gates employing them) are expected
to require awareness of device/layout parasitics. Preliminary results capturing the
effect of defects manifested as cuts on the back gate were presented in [119, 120],
indicating a formidable challenge towards the development of a reliable fault
model. In the subsequent sections, we deal with different flavors of FinFET logic
gates and show that a hybrid combination of models is essential to capture the
effect of back-gate cuts.
3.2.3 FinFET logic gates
We used the FinE simulation framework (Fig. 3.1) to perform all the experiments
in this work, using device parameters specified in Table 3.1. The on-state currents
for SG- and IG-mode n-/p-FinFETs are presented in Table 3.7. While we cover
SG-/LP-mode FinFET INV and NAND2 gates in this work, the methodology outlined
in later sections is applicable to the other mixed configurations discussed in Section
3.1.
From the perspective of testing, the observable metrics of interest are delay and
static leakage power consumption. To obtain the gate delay (tgate), the low-to-high
transition delay (tpLH) and high-to-low transition delay (tpHL) were measured from
the 50% transition of the input to the 50% transition of the output, and tgate was set to
max(tpLH, tpHL). For transient simulations, the rise and fall times of the input signal
were set to 10ps, and each logic gate had a fanout of four SG-mode INVs. Since
leakage power consumption is input-vector dependent, we report the maximum
leakage observed in each configuration.
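The tgate definition above can be sketched as a waveform post-processing step. This is a minimal sketch that assumes sampled transient waveforms held in NumPy arrays and uses linear interpolation between samples. The helper names are hypothetical, not part of FinE or Sentaurus, and the ramps are synthetic stand-ins for simulated waveforms.

```python
import numpy as np

def crossing_time(t, v, level, rising, t_start=0.0):
    """First time at or after t_start that v crosses `level` in the given direction."""
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    above = v >= level
    edges = (~above[:-1] & above[1:]) if rising else (above[:-1] & ~above[1:])
    idx = np.flatnonzero(edges & (t[1:] >= t_start))
    if idx.size == 0:
        raise ValueError("no crossing found")
    k = idx[0]
    # Linear interpolation between the two samples that bracket the crossing.
    return t[k] + (level - v[k]) * (t[k + 1] - t[k]) / (v[k + 1] - v[k])

def prop_delay(t, vin, vout, vdd, input_rising):
    """50%-to-50% delay of an inverting gate (the output moves opposite to the input)."""
    half = 0.5 * vdd
    t_in = crossing_time(t, vin, half, rising=input_rising)
    t_out = crossing_time(t, vout, half, rising=not input_rising, t_start=t_in)
    return t_out - t_in

def ramp(t, t0, width):
    """0-to-1 linear ramp starting at t0 (synthetic test waveform)."""
    return np.clip((t - t0) / width, 0.0, 1.0)

# Synthetic waveforms with VDD = 1 V and 10 ps input edges, not simulation data.
t = np.linspace(0.0, 100e-12, 1001)
tp_hl = prop_delay(t, ramp(t, 10e-12, 10e-12), 1.0 - ramp(t, 18e-12, 12e-12), 1.0, True)
tp_lh = prop_delay(t, 1.0 - ramp(t, 10e-12, 10e-12), ramp(t, 16e-12, 12e-12), 1.0, False)
t_gate = max(tp_lh, tp_hl)
print(f"tpHL = {tp_hl * 1e12:.2f} ps, tpLH = {tp_lh * 1e12:.2f} ps, tgate = {t_gate * 1e12:.2f} ps")
```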
Table 3.7: ON-state current for individual FinFET devices
Configuration    n-FinFET ION (A)    p-FinFET ION (A)
SG-mode          7.31×10⁻⁵           9.33×10⁻⁵
IG-mode          2.24×10⁻⁵           2.41×10⁻⁵
Using SG- and IG-mode FinFETs, a variety of CMOS-style logic gates can be
constructed, as discussed in Section 3.1. Fig. 3.40 shows the schematics of SG-mode
(shorted-gate) and LP-mode (low-power) INV and NAND gates.
SG-mode logic gates consist purely of SG-mode FinFETs and offer no flexibility in
trading off leakage vs. delay. LP-mode logic gates consist of IG-mode FinFETs,
where the back-gate of the p-FinFETs (n-FinFETs) is connected to a positive
(negative) voltage source, denoted by VHIGH (VLOW). LP-mode logic gates provide
an opportunity to tune the leakage-delay characteristic of the gate by adjusting
the back-gate bias statically or dynamically.
Figure 3.40: (a) SG-mode INV, (b) LP-mode INV, (c) SG-mode NAND, and (d) LP-mode NAND.
Figure 3.41: Leakage and delay characteristics under different back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND [leakage (A) and delay (s) vs. ∆V (V), for ∆V from 0 to 0.3V]
In Figs. 3.41(a) and 3.41(b), the trends of leakage and delay for an LP-mode INV
and NAND gate are, respectively, shown. The horizontal axis (∆V ) refers to the
increment (decrement) in the back-gate bias for p-FinFETs (n-FinFETs) in the LP-
mode. For these simulations, the back-gate bias voltages are calculated as follows
(note that VDD = 1V ):
VHIGH = 1+∆V and VLOW = 0−∆V
70
From Figs. 3.41(a) and 3.41(b), reverse biasing the back gate (above the rail for the p-FinFET and below the rail for the n-FinFET) increases the effective transistor threshold voltages linearly, so that leakage decreases exponentially and delay increases roughly linearly [94]. For the SG- and LP-mode INV and NAND gates, the maximum leakage was around 6× higher than the minimum.
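The linear-threshold, exponential-leakage relationship cited above can be illustrated with a first-order subthreshold model. The sketch below is an illustrative assumption, not the simulation model used in this work: it takes a constant back-gate coupling factor gamma (the change in front-gate Vth per volt of back-gate bias) and a subthreshold swing S, and the values of both in the example are hypothetical.

```python
import math


def vth_shift(delta_v, gamma):
    """Increase in front-gate Vth (V) for a reverse back-gate bias step delta_v (V).

    gamma is an assumed, constant back-gate coupling factor (dVth/dVBG).
    """
    return gamma * delta_v


def leakage_ratio(delta_v, gamma, swing):
    """I_off(delta_v) / I_off(0) for a subthreshold swing `swing` (V/decade)."""
    return 10.0 ** (-vth_shift(delta_v, gamma) / swing)


def delta_v_for_reduction(factor, gamma, swing):
    """Back-gate bias step needed to reduce leakage by `factor` (e.g. 6.0)."""
    return swing * math.log10(factor) / gamma


if __name__ == "__main__":
    GAMMA, SWING = 0.2, 0.080  # hypothetical: 0.2 V/V coupling, 80 mV/decade
    for dv in (0.0, 0.1, 0.2, 0.3):
        print(f"dV = {dv:.1f} V -> leakage x {leakage_ratio(dv, GAMMA, SWING):.3f}")
    print(f"6x reduction needs dV ~ {delta_v_for_reduction(6.0, GAMMA, SWING):.3f} V")
```

This model captures only the leakage trend; the roughly linear delay increase would require a drive-current model, which is outside the scope of this sketch.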
Table 3.8: Metrics of SG/LP-mode FinFET INV/NAND gates
Logic gate       Leakage (A)       Delay (s)
SG-mode INV      1.28 × 10⁻⁹       7.96 × 10⁻¹²
LP-mode INV      6.55 × 10⁻¹¹      26.11 × 10⁻¹²
SG-mode NAND     1.29 × 10⁻⁹       15.13 × 10⁻¹²
LP-mode NAND     6.49 × 10⁻¹¹      53.82 × 10⁻¹²
We also simulated fault-free SG-mode INV and NAND gates to compare their
leakage and delay values with respect to their LP-mode counterparts. The results
are presented in Table 3.8. For the LP-mode logic gates, nominal VHIGH and VLOW
are as shown in Table 3.1. From Table 3.8, SG-mode implementations result in
around 3× faster gates at the expense of an order of magnitude higher leakage.
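The speed and leakage ratios quoted above can be checked directly against Table 3.8. The short sketch below simply transcribes the table values and computes the SG-to-LP ratios; the dictionary layout and the function name are illustrative choices.

```python
# Values transcribed from Table 3.8: (leakage in A, delay in s).
TABLE_3_8 = {
    "INV": {"SG": (1.28e-9, 7.96e-12), "LP": (6.55e-11, 26.11e-12)},
    "NAND": {"SG": (1.29e-9, 15.13e-12), "LP": (6.49e-11, 53.82e-12)},
}


def sg_vs_lp(gate):
    """Return (speedup of SG over LP, leakage penalty of SG over LP)."""
    sg_leak, sg_delay = TABLE_3_8[gate]["SG"]
    lp_leak, lp_delay = TABLE_3_8[gate]["LP"]
    return lp_delay / sg_delay, sg_leak / lp_leak


if __name__ == "__main__":
    for gate in TABLE_3_8:
        speedup, leak_penalty = sg_vs_lp(gate)
        print(f"{gate}: SG is {speedup:.1f}x faster and {leak_penalty:.1f}x leakier")
```

The ratios come out at roughly 3.3× (INV) and 3.6× (NAND) in delay and about 20× in leakage, consistent with the "3× faster" and "order of magnitude" statements.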
3.2.4 Modeling defects in FinFET logic gates
In this section, we examine the behavior of FinFET INV and NAND gates in the
presence of defects. In order to model defects, we inserted cuts on each wire in
the SG- and LP-mode INV and NAND gates and shorted each transistor’s source
and drain terminals. As FinFETs have fully-depleted body regions, they do not
generally suffer from the history effect seen during the test of partially-depleted
SOI FETs [121].
We applied test vectors that detect all faults in CMOS INV and NAND gates to the SG- and LP-mode FinFET INV and NAND gates, respectively. Table 3.9 lists the sequence of test vectors applied to both SG- and LP-mode FinFET NAND gates in the first column, and the detected stuck-at, stuck-on, and stuck-open faults in the second, third, and fourth columns, respectively. The last column lists the faults that cannot be detected. In the table, the first (second) bit of the test vector gives the value of input A (B) of the NAND gates shown in Fig. 3.40 for both SG and LP modes. pA, pB (nA, nB) refer to the p-FinFETs (n-FinFETs) fed by the corresponding signal. A stuck-at 0 (1) fault assumes that the value of a wire is fixed at 0 (1) and cannot be changed. A stuck-on fault assumes that a transistor is always on, which corresponds to shorting its source and drain terminals. A stuck-open fault represents the opposite case: the transistor is always off regardless of the applied gate voltage.
Table 3.9: Detected and undetected faults in SG- and LP-mode FinFET NAND gates
Test vector   Detected stuck-at   Detected stuck-on   Detected stuck-open   Undetected faults
11            A/0, B/0, out/1     pA, pB stuck-on     -                     -
01            A/1, out/0          nA stuck-on         pA stuck-open         pA open back-gate
11            A/0, B/0, out/1     pA, pB stuck-on     nA, nB stuck-open     nA, nB open back-gate
10            B/1, out/0          nB stuck-on         pB stuck-open         pB open back-gate
In order to detect stuck-at faults, a test vector assigns a value opposite to the
assumed stuck-at fault and ensures that the faulty value is observed at the output.
The stuck-at faults are assumed to be at the gate inputs and the output. The second
column in Table 3.9 shows that all stuck-at faults of FinFET NAND gates can be
detected using the CMOS stuck-at test set.
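The stuck-at column of Table 3.9 can be reproduced with a gate-level fault simulation of a 2-input NAND. The sketch below is an illustrative assumption rather than the tool used in this work: it injects each single stuck-at fault on A, B, or out and reports the faults whose effect reaches the output for a given vector.

```python
def nand(a, b):
    """Fault-free 2-input NAND on 0/1 values."""
    return 1 - (a & b)


def stuck_at_detected(vector):
    """Single stuck-at faults on A, B, or out that vector 'AB' propagates to out."""
    a, b = int(vector[0]), int(vector[1])
    good = nand(a, b)
    detected = []
    for line in ("A", "B", "out"):
        for value in (0, 1):
            fa = value if line == "A" else a
            fb = value if line == "B" else b
            faulty = value if line == "out" else nand(fa, fb)
            if faulty != good:
                detected.append(f"{line}/{value}")
    return detected


if __name__ == "__main__":
    sequence = ["11", "01", "11", "10"]  # test sequence of Table 3.9
    covered = set()
    for vec in sequence:
        found = stuck_at_detected(vec)
        covered.update(found)
        print(vec, found)
    print("all six stuck-at faults covered:", len(covered) == 6)
```

Because stuck-at faults act only on the logic values of A, B, and out, the same result holds for SG- and LP-mode gates; the FinFET-specific behavior appears only in the transistor-level faults.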
A stuck-on fault causes a VDD-to-ground connection for a particular set of input
combinations. In this case, the static leakage current increases drastically. The third
column in Table 3.9 shows which test vectors can be used to detect this type of fault.
In the presence of the stuck-on faults, the leakage currents observed during the test
of SG- and LP-mode INV/NAND gates are shown in Table 3.10. The four to six
orders of magnitude increase in current, in comparison to the nominal leakage
shown in Table 3.8, enables detection of these defects using IDDQ testing.
Table 3.10: Shorting source and drain of an n-/p-FinFET in SG/LP-mode INV/NAND gates
Logic gate       Maximum leakage (A)
SG-mode INV      9.33 × 10⁻⁵
LP-mode INV      2.41 × 10⁻⁵
SG-mode NAND     5.13 × 10⁻⁵
LP-mode NAND     1.52 × 10⁻⁵
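The IDDQ detection margin can be expressed in decades by comparing Table 3.10 with Table 3.8. The sketch below uses the tabulated values directly; the screening threshold in iddq_fails is a hypothetical parameter, not a value from this work.

```python
import math

# Nominal leakage (Table 3.8) and maximum leakage with a source-drain short
# (Table 3.10), both in A.
NOMINAL = {"SG INV": 1.28e-9, "LP INV": 6.55e-11, "SG NAND": 1.29e-9, "LP NAND": 6.49e-11}
SHORTED = {"SG INV": 9.33e-5, "LP INV": 2.41e-5, "SG NAND": 5.13e-5, "LP NAND": 1.52e-5}


def iddq_margin_decades(gate):
    """Orders of magnitude between defective and nominal quiescent current."""
    return math.log10(SHORTED[gate] / NOMINAL[gate])


def iddq_fails(measured, nominal, threshold_decades=2.0):
    """Hypothetical screen: flag currents more than threshold_decades above nominal."""
    return math.log10(measured / nominal) > threshold_decades


if __name__ == "__main__":
    for gate in NOMINAL:
        print(f"{gate}: {iddq_margin_decades(gate):.2f} decades")
```

The margins fall between roughly 4.6 (SG-mode NAND) and 5.6 (LP-mode INV) decades, so even a conservative screening threshold separates defective from defect-free gates by a wide margin.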
Detection of a stuck-open fault requires a two-pattern test: the first vector initializes the output, and the second produces the wrong output value in the presence of the fault. The sequence of test vectors applied to a NAND gate (shown in Table 3.9) inherently contains three two-pattern tests. For example, applying initialization vector 11 followed by test vector 01 detects pA stuck-open. The fourth column in the table lists all FinFET stuck-open faults that can be detected. Although all faults related to the front gates of the transistors in both SG- and LP-mode FinFET NAND gates can be detected, detection of open faults on the back gates is non-trivial. (Note that, while the front and back gates are physically equivalent in our structure, we refer to the gate that has been disconnected from the inputs/fixed bias voltage as the back gate.)
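The stuck-on and stuck-open columns of Table 3.9 can be reproduced with a switch-level model of the NAND gate, in which each FinFET is an ideal switch controlled by its front gate, a conducting VDD-to-ground path is flagged as an IDDQ failure, and a floating output retains its previous value. This is a hypothetical sketch, not the simulation setup of this work; it ignores the back gate entirely, which is precisely why the back-gate opens in the last column of Table 3.9 cannot be represented in it.

```python
SEQUENCE = ["11", "01", "11", "10"]     # (A, B) test vectors from Table 3.9
TRANSISTORS = ["pA", "pB", "nA", "nB"]  # pull-up pA || pB; pull-down nA in series with nB


def conducts(name, a, b, fault):
    """True if transistor `name` conducts for inputs (a, b) under `fault`."""
    if fault == (name, "stuck-on"):
        return True
    if fault == (name, "stuck-open"):
        return False
    gate = a if name.endswith("A") else b
    # p-FinFETs conduct for a 0 on the front gate, n-FinFETs for a 1.
    return gate == 0 if name.startswith("p") else gate == 1


def evaluate(a, b, prev, fault=None):
    """Switch-level NAND2 evaluation; returns (output, iddq_violation)."""
    pull_up = conducts("pA", a, b, fault) or conducts("pB", a, b, fault)
    pull_down = conducts("nA", a, b, fault) and conducts("nB", a, b, fault)
    if pull_up and pull_down:
        return None, True        # VDD-to-ground path: large quiescent current
    if pull_up:
        return 1, False
    if pull_down:
        return 0, False
    return prev, False           # floating output keeps its stored charge


def first_detection(fault):
    """(vector index, vector, method) at which `fault` is first detected, or None."""
    good_prev = bad_prev = None  # output state unknown before the first vector
    for i, vec in enumerate(SEQUENCE):
        a, b = int(vec[0]), int(vec[1])
        good, _ = evaluate(a, b, good_prev)
        bad, iddq = evaluate(a, b, bad_prev, fault)
        if iddq:
            return i, vec, "IDDQ"
        if good is not None and bad is not None and bad != good:
            return i, vec, "logic"
        good_prev, bad_prev = good, bad
    return None


if __name__ == "__main__":
    for name in TRANSISTORS:
        for kind in ("stuck-on", "stuck-open"):
            print(f"{name} {kind}: {first_detection((name, kind))}")
```

The stuck-open faults are caught only through charge retention on the floating output, which is why each one needs the preceding vector as an initialization pattern.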
When a cut on the back gate occurs, the back gate should be treated as a floating node. Depending on the capacitances that couple to the back-gate node and the transitions that occur across these coupling capacitances, the back gate may float to the intended original value, or vary drastically and dynamically. Since the back-gate bias strongly affects Vth [42], FinFETs display a range of behaviors, which need to be analyzed from a leakage-delay perspective.
The width quantization property of FinFETs necessitates the use of an integer number of fins to implement a FET with a large electrical width. Therefore, a short or a cut on some combinations of adjacent fins can lead to a partially-defective transistor. The analysis presented below assumes that each transistor in the SG- and LP-mode INV and NAND gates has one fin. An analysis of cuts on a subset of fins is provided later in Section 3.2.4.
Based on the above discussion, we categorize the operation of FinFETs with a cut back gate into three regimes, according to whether the back-gate node capacitance (CBG) is dominated by coupling from stray sources (CSTRAY,BG), coupling from the front gate (CFG,BG), or coupling from the source/drain regions (CD,BG, CS,BG). Layout styles and the choice of device parameters greatly affect which regime of operation predominates.
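To first order, the voltage on a floating back gate follows from charge conservation: a voltage step on a coupled node moves VBG by that node's share of the total back-gate capacitance. The sketch below is a capacitive-divider approximation, not the 3D-TCAD analysis used in this work; it assumes that the stray capacitance couples to a quiet (non-switching) node and uses a hypothetical dominance factor to label the regime. It returns None when no coupling group dominates.

```python
from dataclasses import dataclass


@dataclass
class BackGateCaps:
    """Capacitances coupling to a floating back gate (any consistent unit, e.g. aF)."""
    c_stray: float  # CSTRAY,BG, assumed to couple to a quiet node
    c_fg: float     # CFG,BG, front gate to back gate
    c_d: float      # CD,BG, drain to back gate
    c_s: float      # CS,BG, source to back gate

    @property
    def total(self):
        return self.c_stray + self.c_fg + self.c_d + self.c_s


def delta_vbg(caps, dv_fg=0.0, dv_d=0.0, dv_s=0.0):
    """Change in floating back-gate voltage for steps on the coupled nodes (V).

    Charge conservation on the isolated node: sum(C_i * (dVbg - dV_i)) = 0.
    """
    return (caps.c_fg * dv_fg + caps.c_d * dv_d + caps.c_s * dv_s) / caps.total


def regime(caps, dominance=3.0):
    """Return 'I', 'II' or 'III' if one coupling group dominates, else None."""
    groups = {"I": caps.c_stray, "II": caps.c_fg, "III": caps.c_d + caps.c_s}
    label, largest = max(groups.items(), key=lambda kv: kv[1])
    others = [value for key, value in groups.items() if key != label]
    return label if all(largest >= dominance * value for value in others) else None


if __name__ == "__main__":
    dense = BackGateCaps(c_stray=900.0, c_fg=60.0, c_d=20.0, c_s=20.0)  # hypothetical aF
    print(regime(dense), delta_vbg(dense, dv_fg=1.0))
```

In the hypothetical dense-layout example, a 1 V front-gate step moves the floating back gate by only 0.06 V, which is the Regime I behavior described next.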
Regime I: $C_{\mathrm{STRAY,BG}} \gg C_{\mathrm{FG,BG}}, C_{\mathrm{D,BG}}, C_{\mathrm{S,BG}}$
In circuits with dense layouts, crowding of interconnect features around a back-gate cut defect can increase the effective CSTRAY,BG, so that the back-gate node voltage is almost independent of voltage changes at the front gate and the source/drain regions. Fig. 3.42(a) shows a possible scenario: a cut on the VHIGH back-gate bias line from the voltage generator, shared by many logic gates in the region, leaves a large wire capacitance contributing to CSTRAY,BG.
Figure 3.42: (a) Regime I: Opens on shared back-gate bias lines for many LP-mode INV gates, and (b) Regime II/III: Opens on individual back-gate bias lines for an LP-mode INV gate
Effect of an open on the p-FinFET back-gate in LP-mode logic gates
We simulated LP-mode INV and NAND gates with open defects on the back gates of the p-FinFETs. The back-gate biases, VHIGH and VLOW, for the defect-free cases were set to their nominal values shown in Table 3.1. Since the back gate floats when open, the logic gate must be characterized over the range of voltages that may appear on this dynamic node. Assuming $C_{\mathrm{STRAY,BG}} \gg C_{\mathrm{FG,BG}}, C_{\mathrm{D,BG}}, C_{\mathrm{S,BG}}$, the voltage on the cut back gate, VBG = Vcut, was varied from VLOW to VHIGH. In the LP-mode NAND gate of Fig. 3.40(d), a wire cut on VHIGH before the fanout leads to an open fault for both p-FinFETs, which then share the same variable back-gate bias, Vcut. While this turned out to be the worst-case scenario, the leakage-delay characteristics were similar when the connection to only one p-FinFET back gate is cut.
In Figs. 3.43(a) and 3.43(b), the variation in leakage and delay with respect to
Vcut is shown. A drastic increase in leakage occurs as Vcut decreases from its in-
tended bias of VHIGH . For the LP-mode NAND gate, leakage stays relatively con-
stant until Vcut approaches 0.8V . This is due to the fact that leakage of n-FinFETs
(leakage vector AB=10) dominates the maximum leakage up to this point. When
the back-gate bias of the p-FinFETs is less than 0.8V , leakage from p-FinFETs dom-
inates (leakage vector AB=11), resulting in an exponential increase in leakage with
decreasing Vcut .
Also, as Vcut decreases from its intended value of VHIGH, the logic gates switch faster, because the p-FinFET has greater current drive, which reduces the gate delay. However, below 0.6V, the high-to-low transition delay tpHL dominates the maximum delay, as most of the current through the pull-down network consists of p-FinFET leakage current, which limits the current that discharges the output capacitance. Beyond a certain point, the p-FinFET is always on, so that the output is stuck at high and the logic gates fail to function correctly. We therefore conclude that a cut on the back gate of a p-FinFET in an LP-mode logic gate can correspond to many fault models, depending on the voltage observed on the cut. If Vcut is below 0.5V, the fault corresponds to a p-FinFET stuck-on and can be detected using IDDQ testing. In the extreme case, the output is stuck-at 1. On the other hand, if coupling effects drive Vcut to values greater than 0.6V, the logic gates switch faster, but have increased leakage power. This scenario has no corresponding fault model in CMOS and is unique to FinFETs.
[Plot: delay (s) and leakage (A, log scale) vs. Vcut (V), −0.2 to 1.2 V, with tpHL and tpLH marked; panels (a) LP-mode INV and (b) LP-mode NAND leakage and delay vs. Vcut on p-FinFET.]
Figure 3.43: Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND.
Effect of an open on the n-FinFET back-gate in LP-mode logic gates
To apply the methodology presented above to open defects on the back gates of
n-FinFETs in LP-mode INV and NAND, Vcut was varied from VLOW to VHIGH . We
inserted a cut on the back-gate wires of the top and bottom n-FinFETs in the pull-
down network and observed similar characteristics.
In Fig. 3.44, the variation of leakage and delay values with changing Vcut is
shown. Similar to the case with opens on p-FinFET back-gates, opens on n-FinFETs
can cause an exponential increase in leakage. However, delay is not affected until
n-FinFETs become severely forward-biased (i.e., Vth of the FET drops considerably),
which happens after 0.4V . In this region, although tpHL decreases, it is not the
dominating factor and the delay of low-to-high transition limits the overall delay
of the gate. In the extreme case, the n-FinFET is always on and the output is stuck-
at 0.
An open fault on an n-FinFET can be modeled using two fault models. If the
back-gate bias drifts toward 0.4V , then leakage increases. However, it might not
be as high as in the case of an SG-mode n-FinFET. Therefore, it is possible that an
IDDQ test may miss detecting this defect. On the other hand, when the n-FinFET is
severely forward-biased (Vcut > 0.4V ), delay and leakage increase substantially. It
is possible to detect this case using delay fault testing or IDDQ testing. The above
scenario is also unique to FinFETs.
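The piecewise fault mapping described for LP-mode gates can be summarized as a lookup on the observed cut voltage. The sketch below hard-codes the breakpoints quoted in the text for 1-fin LP-mode INV/NAND gates (0.5V and 0.6V for p-FinFET cuts, 0.4V for n-FinFET cuts) as default parameters; as noted later for Regime I, such breakpoints depend on transistor sizing and should not be treated as general. The function name and the labels it returns are hypothetical.

```python
def lp_backgate_cut_fault(fet, v_cut, p_stuck_on_below=0.5, p_fast_above=0.6,
                          n_severe_above=0.4):
    """Candidate fault model for a cut back gate floating at v_cut (V).

    Defaults are the breakpoints quoted for 1-fin LP-mode INV/NAND gates;
    they depend on transistor sizing. The region close to the intended
    bias, where the gate is effectively fault-free, is not modeled.
    Returns (behavior, suggested test).
    """
    if fet == "p":
        if v_cut < p_stuck_on_below:
            return "p-FinFET stuck-on", "IDDQ (output stuck-at-1 in the extreme)"
        if v_cut > p_fast_above:
            return "faster but leakier", "no planar CMOS fault model"
        return "unclassified", "between the quoted breakpoints"
    if fet == "n":
        if v_cut > n_severe_above:
            return "severe forward bias", "delay-fault or IDDQ test (stuck-at-0 in the extreme)"
        return "elevated leakage", "IDDQ test, which may miss it"
    raise ValueError(f"fet must be 'p' or 'n', got {fet!r}")


if __name__ == "__main__":
    for fet, v in (("p", 0.3), ("p", 0.55), ("p", 0.9), ("n", 0.2), ("n", 0.6)):
        print(fet, v, lp_backgate_cut_fault(fet, v))
```

Keeping the breakpoints as parameters rather than constants reflects the sizing dependence: a different pull-up/pull-down network would need its own characterization sweep.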
Effect of an open on the p/n-FinFET back-gate in SG-mode logic gates
Fault models for cuts on SG-mode gate connections require special attention, as SG- and LP-mode logic gates use different devices. An IG-mode FinFET in an LP-mode logic gate has two independent gates, so a cut on its back-gate wire only changes the back-gate voltage. In contrast, a cut on the gate connection of an SG-mode FinFET turns that FinFET into an IG-mode FinFET with a floating back gate, while all other FETs remain in SG mode.
[Plot: delay (s) and leakage (A, log scale) vs. Vcut (V), −0.2 to 1.2 V, with tpHL and tpLH marked; panels (a) LP-mode INV and (b) LP-mode NAND leakage and delay vs. Vcut on n-FinFET.]
Figure 3.44: Leakage and delay variation under different n-FinFET back-gate bias voltages for (a) LP-mode INV, and (b) LP-mode NAND.
Here, as in the earlier cases, Vcut was swept between two extremes, VLOW and VHIGH, for a cut on a p-FinFET. Fig. 3.45 shows the trends in leakage and delay. As Vcut decreases, leakage increases. When the p-FinFET is severely forward-biased, the leakage current approaches very high values, similar to those of LP-mode logic gates. However, LP- and SG-mode logic gates differ in their delay characteristics. In SG-mode gates, the cut typically increases delay relative to the fault-free case (for all swept back-gate biases in the INV and for a large fraction of the swept biases in the NAND gate). In addition, for back-gate voltages spanning VLOW to VHIGH, the logic gate remains functional. This can be explained by the greater drive strength of SG-mode FinFETs compared to IG-mode FinFETs: while a back-gate cut turns an SG-mode p-FinFET into an IG-mode p-FinFET, the remaining FinFETs in the pull-up network compensate for the defect at the expense of increased delay and leakage.
Simulations for cuts on n-FinFET back-gate connections for SG-mode logic
gates were performed using a similar setup and the resulting leakage-delay char-
acteristic is shown in Fig. 3.46. When the n-FinFETs are forward-biased, leakage
increases drastically and delay tends to decrease up to a certain point. To sum-
marize, cuts on back-gate connections of SG-mode FinFETs cause an increase in
leakage and delay in the worst case. Also, for back-gate voltages spanning VLOW to
VHIGH , the logic gates maintain functionality. This behavior is different from that
observed for LP-mode logic gates.
[Plot: delay (s) and leakage (A, log scale) vs. Vcut (V), −0.2 to 1.2 V, with tpHL, tpLH, fault-free delay, and fault-free leakage marked; panels (a) SG-mode INV and (b) SG-mode NAND leakage and delay vs. Vcut on p-FinFET.]
Figure 3.45: Leakage and delay variation with different p-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND.
Effect of an open on a subset of fins
We increased the electrical width of all transistors in LP-mode INV and NAND gates from one to four fins to analyze the behavior of partially-defective transistors during testing. To model open defects on the back gates of LP-mode gates, we simulated all possible combinations, i.e., open defects on 1, 2, 3, and 4 fins at a time. Note that open defects on all four fins are expected to be equivalent to an open defect on the back gate of a 1-fin FinFET.
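The fin-level defect space can be enumerated mechanically before simulation. The sketch below (hypothetical helper names) lists every non-empty subset of cut fins for a 4-fin transistor, together with a Vcut sweep grid covering the −0.2V to 1.2V range of the plots; the 0.1V step is an assumed choice. If all fins are assumed electrically identical, only the number of cut fins matters, which reduces the 15 subsets to the four cases plotted in Fig. 3.47.

```python
from itertools import combinations


def fin_cut_cases(n_fins=4):
    """Every non-empty subset of fins whose back-gate connection is cut."""
    fins = range(n_fins)
    return [subset for k in range(1, n_fins + 1) for subset in combinations(fins, k)]


def cases_by_cut_count(n_fins=4):
    """Number of subsets for each count of cut fins."""
    counts = {}
    for subset in fin_cut_cases(n_fins):
        counts[len(subset)] = counts.get(len(subset), 0) + 1
    return counts


def vcut_sweep(v_start=-0.2, v_stop=1.2, step=0.1):
    """Evenly spaced Vcut points (V); the default step is an assumed choice."""
    n_steps = round((v_stop - v_start) / step)
    return [round(v_start + i * step, 10) for i in range(n_steps + 1)]


if __name__ == "__main__":
    cases = fin_cut_cases(4)
    print(len(cases), "subsets;", cases_by_cut_count(4))
    print(len(cases) * len(vcut_sweep()), "simulations without the symmetry reduction")
```

The symmetry assumption ignores layout position; if fins at the edge of a multi-fin device see different parasitics, all 15 subsets would have to be simulated.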
Fig. 3.47 shows the variation in delay and leakage for an LP-mode NAND gate when open defects are introduced on one of the p-FinFETs. As expected, when all four fins are cut, the change in delay and leakage is similar to that shown in Fig. 3.43(b). However, if only one or two fins are cut, the gate remains functional even when Vcut assumes the extreme value of −0.2V. On the other hand, the change in leakage with respect to Vcut follows the same trend irrespective of the number of fins with open defects.
[Plot: delay (s) and leakage (A, log scale) vs. Vcut (V), −0.2 to 1.2 V, with tpHL, tpLH, fault-free delay, and fault-free leakage marked; panels (a) SG-mode INV and (b) SG-mode NAND leakage and delay vs. Vcut on n-FinFET.]
Figure 3.46: Leakage and delay variation with different n-FinFET back-gate bias voltages for (a) SG-mode INV, and (b) SG-mode NAND.
Although not shown in this section, we also simulated all possible combinations of multiple-fin open defects on the n-/p-FinFET back gates of LP-mode INV and NAND gates. The results showed trends similar to those in Fig. 3.47: the change in leakage with respect to Vcut follows the same trend irrespective of the number of fins with open defects, whereas open defects on only one or two fins do not degrade the propagation delay of the gates. This implies that transistor sizing could be used to improve robustness against delay faults caused by back-gate open defects within a standard cell.
[Plot: (a) LP-mode NAND delay (s) vs. Vcut (V) and (b) LP-mode NAND leakage (A, log scale) vs. Vcut (V), −0.2 to 1.2 V, for cuts on 1, 2, 3, and 4 out of 4 fins of one p-FinFET.]
Figure 3.47: Effect of cutting a subset of fins in an LP-mode NAND gate p-FinFET with four fins on (a) delay, and (b) leakage.
From the perspective of delay fault testing, the majority of the cases in Regime I result in either pulse-broadening or pulse-shrinking of the output pulse with respect to an input pulse for buffer-like configurations, i.e., a logic gate cascaded with an SG-mode INV [Figs. 3.48(a) and 3.48(b)]. This is because either tpHL or tpLH increases dramatically, leading to slow-rising output edges for p-FinFET back-gate cuts (Figs. 3.43, 3.45) and slow-falling output edges for n-FinFET back-gate cuts (Figs. 3.44, 3.46) over a wide range of back-gate voltages. These can be detected using the three-/two-pattern delay tests described in [121]. While the above analysis suggests that the defects in Regime I could be modeled piecewise, e.g., using a breakpoint at Vcut = 0.5V, the breakpoints depend strongly on the sizing of the FETs in the gate, making it impossible to generalize to arbitrary pull-up/pull-down networks.
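Whether a slow edge broadens or shrinks the output pulse of such a buffer follows from simple edge-delay bookkeeping. The sketch below assumes a positive pulse at input A and ignores slews and incomplete output swings; for a negative input pulse the roles of the two delays swap. The function names and delay values are illustrative.

```python
def buffer_pulse_width(w_in, tphl_gate, tplh_gate, tphl_inv, tplh_inv):
    """Width of the OUT2 pulse for a positive pulse of width w_in at input A.

    OUT falls tphl_gate after A rises and rises tplh_gate after A falls;
    the SG-mode INV then inverts OUT into OUT2. Edge-delay model only:
    slews and incomplete output swings are ignored.
    """
    rise_of_out2 = tphl_gate + tplh_inv   # delay from A rising to OUT2 rising
    fall_of_out2 = tplh_gate + tphl_inv   # delay from A falling to OUT2 falling
    return w_in + fall_of_out2 - rise_of_out2


def classify_pulse(w_in, w_out, tolerance=0.0):
    """Label the output pulse relative to the input pulse."""
    if w_out > w_in + tolerance:
        return "broadening"
    if w_out < w_in - tolerance:
        return "shrinking"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical delays in ps: a slow rising edge at OUT (large tplh_gate)
    # versus a slow falling edge at OUT (large tphl_gate).
    for label, tphl, tplh in (("fault-free", 10, 10), ("slow OUT rise", 10, 30),
                              ("slow OUT fall", 30, 10)):
        w = buffer_pulse_width(100, tphl, tplh, 8, 8)
        print(label, w, classify_pulse(100, w))
```

Under this bookkeeping, a slow-rising edge at OUT (the p-FinFET cut case) broadens a positive input pulse, while a slow-falling edge (the n-FinFET cut case) shrinks it.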
Figure 3.48: Pulse characterization setup for (a) SG-mode INV, and (b) SG-mode NAND
Regime II: $C_{\mathrm{FG,BG}} \gg C_{\mathrm{STRAY,BG}}, C_{\mathrm{D,BG}}, C_{\mathrm{S,BG}}$
This scenario can occur when the FET is engineered with sufficiently large source/drain underlaps (LUN) to the gates (i.e., source/drain dopants do not diffuse into the fin channel region), thereby decreasing CD,BG and CS,BG. A relatively sparse layout, lacking BEOL features near the cut, can significantly reduce CSTRAY,BG, depending on the location of the cut [VHIGH in Fig. 3.42(b)], and CFG,BG can dominate if the fin thickness (TSI) is small. The cut back-gate voltage VBG is then determined by the front gate. Figs. 3.49(a) and 3.49(b) show the conditions on LUN and TSI under which Regime II dominates for the chosen FinFET structure.
To model this regime, we increased the front to back-gate coupling capacitance (using a thinner fin, TSI = 7nm, and a larger underlap, LUN = 16nm) and simulated a buffer configuration for delay fault testing, as shown for the SG-mode INV in Fig. 3.48(a).
[Plot: capacitance (aF) of CD,BG and CFG,BG vs. (a) LUN (nm), 0 to 10 nm, with Regime II and Regime III regions marked, and (b) TSI (nm), 6 to 14 nm, with the Regime II region marked.]
Figure 3.49: Interplay between CD,BG and CFG,BG with (a) LUN variation, TSI = 10nm, and (b) TSI variation, LUN = 16nm
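The statement that thinner fins increase CFG,BG is consistent with a one-dimensional series model, in which the front oxide, the fully depleted fin body, and the back oxide act as three capacitors in series. The sketch below is an illustrative approximation with hypothetical dimensions (EOT, gate length, fin height); it ignores fringing and the underlap regions, so its absolute values should not be compared with the TCAD-extracted capacitances in Fig. 3.49, only the direction of the trend.

```python
EPS0 = 8.854e-12       # vacuum permittivity (F/m)
EPS_SI = 11.7 * EPS0   # silicon permittivity
EPS_OX = 3.9 * EPS0    # SiO2 permittivity; a high-k stack is represented by its EOT


def c_fg_bg(t_si, eot, gate_length, fin_height):
    """Front-to-back gate capacitance of one fin (F), 1D series estimate.

    Front oxide, fully depleted silicon body, and back oxide in series;
    fringing fields and source/drain underlap regions are ignored.
    """
    c_ox = EPS_OX / eot          # per unit area (F/m^2)
    c_si = EPS_SI / t_si         # per unit area (F/m^2)
    c_per_area = 1.0 / (2.0 / c_ox + 1.0 / c_si)
    return c_per_area * gate_length * fin_height


if __name__ == "__main__":
    # Hypothetical geometry: EOT = 1 nm, gate length = 20 nm, fin height = 30 nm.
    for t_si_nm in (7, 10, 14):
        c = c_fg_bg(t_si_nm * 1e-9, eot=1e-9, gate_length=20e-9, fin_height=30e-9)
        print(f"TSI = {t_si_nm:2d} nm: CFG,BG ~ {c * 1e18:.2f} aF")
```

Because the silicon term 1/C_si grows linearly with TSI, thinning the fin raises the series capacitance, which is the mechanism by which a thin fin pushes the device toward Regime II.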
Figs. 3.50(a) and 3.50(b) show the transient pulse behavior of an SG-mode INV with a cut on the n-FinFET and p-FinFET back gate, respectively. Both cases exhibit pulse-shrinking (with respect to the defect-free case), albeit on opposite edges. In the n-FinFET cut back-gate case, on the rising edge of input node A, VBG rises only to an intermediate voltage (instead of VDD), marginally increasing the fall time of node OUT and the rise time of node OUT2. During the falling edge of node A, VBG tracks VA and settles to the intended voltage. As a result, the falling edge of node OUT2 is not affected. In the p-FinFET cut back-gate case, the falling edge of node OUT is unchanged, while the rising edge is sharper. This is because VBG is below the rail, which improves the drive of the p-FinFET; hence, the falling edge of node OUT2 occurs earlier. A smaller slew rate on node A would cause greater pulse-shrinking, as VBG is likely to remain negative for a longer period of time. This shows that the input slew rate and the front to back-gate coupling are key factors affecting pulse-shrinking in this regime of operation.
For LP-mode INV gates, in the n-FinFET cut back-gate case [Fig. 3.51(a)], VBG rises to an intermediate voltage on the rising edge of node A, so the falling edge of node OUT occurs earlier than in the defect-free case (with VBG = VLOW), and node OUT2 rises earlier, resulting in pulse-broadening. In the p-FinFET cut back-gate case [Fig. 3.51(b)], VBG remains close to zero, weakly turning on the p-FinFET, and node OUT fails to discharge completely within the given pulse width. As a result, the rising edge of node OUT2 is delayed, causing the pulse to shrink considerably. Hence, SG- and LP-mode INV gates behave in opposite ways under n-FinFET back-gate cuts for the current configurations, while under p-FinFET back-gate cuts both show pulse-shrinking, to different degrees. Similar results were obtained for SG- and LP-mode NAND gates and are not shown here. Pulse-shrinking due to a late rising edge can lead to setup time failures, while early falling edges can lead to hold time failures. Such failures are generally detected using two-pattern delay tests [121].
[Plot: voltage (V) vs. time (s), 0 to 180 ps, traces VA, VOUT, VOUT2, and VBG; panels (a) and (b).]
Figure 3.50: Transient pulse behavior of SG-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
[Plot: voltage (V) vs. time (s), 0 to 250 ps in (a) and 0 to 180 ps in (b), traces VA, VOUT, VOUT2, and VBG.]
Figure 3.51: Transient pulse behavior of LP-mode INV in Regime II with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
Regime III: $C_{\mathrm{D,BG}}, C_{\mathrm{S,BG}} \gg C_{\mathrm{STRAY,BG}}, C_{\mathrm{FG,BG}}$
In the scenario of Fig. 3.42(b), if the FinFET is designed with small or no source/drain underlaps, or even with overlaps to the gates, CD,BG and CS,BG dominate, and CSTRAY,BG and CFG,BG have little effect in determining VBG. Fig. 3.49(a) shows the interplay between CD,BG and CFG,BG as LUN, which is difficult to control, varies. With LUN = 10nm, the device has relatively low CD,BG, while with LUN = 0nm, CD,BG is greater, moving the device from Regime II to Regime III.
We used the same setup as in Figs. 3.48(a) and 3.48(b) to simulate transient pulse behavior. In Fig. 3.52(a), for the n-FinFET back-gate cut, VBG rises during the rising edge of node A and is driven below the rail on the falling edge, which enables rapid charging of node OUT and discharging of node OUT2, resulting in marginal pulse-shrinking. With LUN = 0nm [Fig. 3.52(b)], pulse-shrinking is more pronounced on account of the larger CD,BG and the increased drive current of the FETs due to the lower LUN. Similar observations can be made for the p-FinFET back-gate cut in Figs. 3.53(a) and 3.53(b), with one major difference: since VBG is driven below the rail, it strongly turns on the p-FinFET, thereby preventing node OUT from fully discharging, which is also accompanied by increased leakage (not shown). Thus, between Regimes II and III, SG-mode INVs exhibit varying degrees of pulse-shrinking. In the case of an n-FinFET back-gate cut in the LP-mode INV [Fig. 3.54(a)], there is no change in the behavior of the gate, as VBG ≈ VLOW. However, for a p-FinFET back-gate cut [Fig. 3.54(b)], VBG is driven negative, turning on the p-FinFET and resulting in logic failure: node OUT2 is stuck-at 0. Similar observations hold for the SG-mode NAND as well [Fig. 3.55], although the degree of pulse-shrinking was marginal for cuts in either FET in the pull-up/pull-down network. In the LP-mode NAND cases (not shown), p-FinFET back-gate cuts result in node OUT2 being stuck-at 0, while n-FinFET back-gate cuts cause marginal pulse-shrinking.
[Plot: voltage (V) vs. time (s), 0 to 180 ps, traces VA, VOUT, VOUT2, and VBG; panels (a) and (b).]
Figure 3.52: Transient pulse behavior of SG-mode INV having n-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III.
Finally, it should be noted that for combinations of LUN, TSI, and layout style in which none of CD,BG, CFG,BG, and CSTRAY,BG dominates, it is difficult to generalize the pulse-shrinking/broadening behavior, which is a limitation of the current approach.
[Plot: voltage (V) vs. time (s), 0 to 180 ps, traces VA, VOUT, VOUT2, and VBG; panels (a) and (b).]
Figure 3.53: Transient pulse behavior of SG-mode INV having p-FinFET back-gate cuts with (a) LUN = 10nm, Regime II, and (b) LUN = 0nm, Regime III.
[Plot: voltage (V) vs. time (s), 0 to 180 ps, traces VA, VOUT, VOUT2, and VBG; panels (a) and (b).]
Figure 3.54: Transient pulse behavior of LP-mode INV in Regime III with (a) n-FinFET back-gate cut, and (b) p-FinFET back-gate cut
[Plot: voltage (V) vs. time (s), 0 to 180 ps; traces VOUT, VOUT2, the input (VA or VB), and the corresponding cut back-gate voltage (VBG,A or VBG,B); panels (a) to (c).]
Figure 3.55: Transient pulse behavior of SG-mode NAND in Regime III having (a) n-FinFET back-gate cut at A, (b) n-FinFET back-gate cut at B, and (c) p-FinFET back-gate cut at A.
3.2.5 Section summary
As robust design methodologies for multi-gate devices mature, developing fault models for defects becomes increasingly important. In this section, we showed that most opens and shorts in FinFET logic circuits map to established fault models in planar CMOS. However, opens on the back gate, with the intended signal at the front gate, cause delay and leakage problems that are unique to FinFETs, owing to the strong dependence of Vth on the back-gate bias, which compounds the role of device/layout parasitics.
We broadly identified three regimes of operation that affect testability. In the regime dominated by stray capacitances, the logic gates show pulse-broadening over a wide range of back-gate voltages, which can be detected using three-pattern delay tests. In the regimes dominated by front to back-gate or source/drain to back-gate coupling, back-gate cuts lead to pulse-shrinking in SG-mode gates (to different degrees depending on the logic gate), primarily due to slow-rising output edges for n-FinFET back-gate cuts and fast-falling output edges for p-FinFET back-gate cuts. In LP-mode gates, however, in the regime dominated by front to back-gate coupling, n-FinFET back-gate cuts cause pulse-broadening and p-FinFET back-gate cuts lead to pulse-shrinking. In the regime dominated by back-gate to source/drain coupling, n-FinFET back-gate cuts lead to marginal pulse-shrinking, while p-FinFET back-gate cuts cause stuck-at faults, as the outputs are permanently held high. The absence of a unified fault model for back-gate cuts in IG-mode FinFETs poses a testing challenge, owing to the diversity of output behaviors. However, SG-mode logic gates can be tested for back-gate isolation using a combination of pulse-broadening/pulse-shrinking/IDDQ tests, depending on the regime of operation, without logic failure when appropriately sized.
Chapter 4
Efficient Algorithms for 3D-TCAD Modeling of Emerging Devices and Circuits
4.1 Introduction
Hardware experiments with multi-gate devices and larger circuits (e.g., SRAMs, eDRAMs, and ring oscillators) entail very high cost and turnaround time. Thus, efficient predictive process/device characterization methods for such circuits are urgently needed. The lack of such methods is a significant impediment to rapid progress in this area and represents the technology-circuit co-design gap shown in Fig. 4.1. While continuum TCAD methods have paved the way for atomistic TCAD simulation of individual devices, predicting circuit-level metrics in new process technologies remains a major challenge.
Reliance on compact models is not possible, since such models lag advances in technology and often require inputs from detailed device simulations of test structures in the early phases of technology development. Key methodologies to enable immediate, accurate feedback to designers (e.g., optimization of parasitics in SRAM bitcell layouts, noise margin analysis, and top-down optimization of the back-end stack) have not yet been formulated.
Figure 4.1: Technology-circuit co-design gap
Fig. 4.2 shows the TCAD flow for process/device development at the 130nm and higher technology nodes. Here, individual devices were investigated in isolation, and compact models were subsequently developed for circuit simulation. At the 90nm node and below, owing to a variety of physical effects that need to be captured, the TCAD flow proceeded along the lines of Fig. 4.3. Mixed-mode device-circuit simulation became the cornerstone of early-stage technology/process evaluations of key circuit elements, as compact model development remained a slow and cumbersome process. At the 32nm node and below, owing to the plethora of layout-dependent effects, 3D-TCAD contiguous/mixed-mode simulation became increasingly imperative [122–124].
Figure 4.2: TCAD flow for the 130nm node and higher
Figure 4.3: TCAD flow for 90nm-32nm technology nodes
Though 3D-TCAD based exploration enables accurate predictive modeling at lower technology nodes, it is beset with major challenges: manual process modeling/simulation of large layouts, the inability to adapt quickly to rapidly changing process recipes when manual inputs are needed on each occasion, the computational complexity of DC/AC and transient 3D device simulations, and a very high model setup cost for each circuit/layout under investigation (LUI). An LUI can consist of anywhere from two to several tens of devices. This highlights the need to develop a seamless set of methodologies, integrated with 3D-TCAD ecosystems, for resolving process, layout, and device-level issues quickly. Broadly applicable questions that need attention here are:
1. How do we leverage 3D-TCAD to enhance technology-circuit co-design with
emerging devices such as FinFETs?
2. How do we efficiently obtain process-simulated 3D structures for LUIs with
several devices?
[Figure 4.4 flowchart: process conditions and a generic layout feed lithography simulation/mask generation and 3D process simulation of the entire layout using detailed mechanisms/kinetics (O(days to weeks)), producing a 3D device-simulation-ready structure. 3D device simulation then performs exhaustive characterization with simple physical models (O(months)) or advanced transport phenomena and transient analysis with external excitation (O(months to years)) to extract I-V and C-V characteristics, thermal behavior, etc. If the metrics do not look OK, the layout or process is tweaked and the loop repeats.]
Figure 4.4: The ultimate wishlist for 3D-TCAD assisted process/device development
Fig. 4.4 succinctly summarizes the above discussion and presents the ultimate wishlist for 3D-TCAD assisted hardware development, assuming nominal computational resources per case. With current structure generation methodologies, 3D process simulation of an LUI takes days to several weeks, depending on the device type and the size of the layout. 3D device simulation using simple physical models, followed by iterative process/layout optimization, takes on the order of months, and including advanced transport phenomena/transient analysis in the loop pushes the computation time to several months or even years. Hence, the flow presented in Fig. 4.4 is impractical and has not been attempted in industry or academia. We refer to this obstacle as the ‘many-device TCAD barrier’.
In this work, we develop efficient and accurate methodologies for unifying the layout and process simulation worlds, thereby expanding the horizon of predictive modeling for emerging devices beyond the ‘many-device TCAD barrier,’
which is a major showstopper at lower technology nodes [125]. For the first time,
we show that it is profitable to adopt an automated structure synthesis approach
for large-scale structure generation, rather than perform top-down process simu-
lations of LUIs. In doing so, we identify important bottlenecks that plague model-
ing efforts in 3D-TCAD structure generation, and outline innovative techniques to
overcome them. The proposed methodologies are inspired by the observation that
all regions of the device structure need not have process-simulation-level accuracy,
and that it is possible to amortize process simulations of simpler blocks by reusing
them to synthesize structures corresponding to larger layouts.
The rest of the chapter is organized as follows. In Section 4.2, we review re-
lated work. In Section 4.3, we describe the structure synthesis methodologies in
detail. We cover two case studies using the proposed methodologies in Section
4.4. Finally, we present a brief discussion in Section 4.5 and conclude in Section
4.6.
4.2 Related work
Over the past few years, 3D-TCAD based analysis of emerging devices, such as FinFETs, has garnered increased attention. It has guided the semiconductor industry in the march to miniaturize transistors by streamlining the design of test
processes/structures, resulting in dramatic cost savings. 3D-TCAD simulation
for obtaining modeling insight in emerging CMOS/Flash devices is demonstrated
in [122–124, 126–128]. Most TCAD flows consist of two major steps, i.e., process
simulation followed by device simulation. The primary objective of process sim-
ulation is to accurately predict the physical/structural layers and geometry of de-
vices at the end of a process run, as well as the active dopant/stress distributions.
Techniques to improve process simulation have been investigated in [129–132].
The complexity of physical models is a major factor that impacts process simu-
lation. Simplified physics minimizes computation time. With technology scaling,
however, the need for ever more accurate doping/stress profiles has increased and
complex physical models are added at each new generation. On account of the de-
tailed physical modeling involved, process simulation is almost exclusively used
to fine-tune the development of individual devices. Therefore, it is difficult to
generate accurate integrated representations of front-end-of-line (FEOL) and back-
end-of-line (BEOL) structures corresponding to LUIs in TCAD. This has resulted
in a segmented modeling approach for FEOL/BEOL components, which can be a
problem at lower technology nodes, e.g., in capacitance (parasitics) extraction for
highly scaled circuits, where a true 3D representation, consisting of active semi-
conductor devices, metals, and conformal dielectrics, is crucial to compute accu-
rate results through transport analysis based 3D-TCAD extraction [133]. Fig. 4.5
summarizes the different application domains of 3D-TCAD from the perspective
of process structure complexity and physical model complexity. TCAD has traditionally been used in quadrants III/IV (simple structures), whereas quadrants I/II (complex structures) have remained relatively inaccessible owing to the high computational cost of structure generation via process simulation.
Figure 4.5 quadrants (process structure complexity vs. physical model complexity):
I. Simple physical models, complex structures (e.g., parasitics extraction for 6T SRAM)
II. Complex physical models, complex structures (e.g., detailed transport simulation for a process-simulated 6T SRAM)
III. Simple physical models, simple structures (e.g., first-pass simulation of a simple FET)
IV. Complex physical models, simple structures (e.g., detailed transport simulation of a simple FET)
Figure 4.5: TCAD modeling quadrants
The ideas proposed in the next section are radically new approaches for chip-
ping away the computing barriers shown in Fig. 4.4 that have been unavoidable
in traditional 3D-TCAD approaches.
4.3 Structure synthesis methodologies
In this section, we outline methodologies for automated 3D structure synthesis
that can circumvent the bottlenecks posed by process simulation of large layouts.
4.3.1 Key ideas
Process/circuit engineers at emerging nodes (e.g., FinFET technologies) grapple with typical design questions, such as:
• What is the effect of modifying fin/gate pitch (for manufacturability) on FinFET SRAM array performance?
• Given several different FinFET SRAM bitcell layouts, which one has the best area vs. stability vs. performance (parasitics) vs. leakage tradeoff?
• What is the best way to design the back-end stack (metal/via heights and dielectrics) to minimize RC delay, while meeting material, fabrication, thermal, and electromigration constraints?
Accurate answers to the above questions can mainly be obtained from detailed process/device simulations, as no higher-level compact models are available for a new process. Here, the low-level manual input needed to prepare 3D-TCAD decks poses a major barrier, as shown in Figs. 4.6(a) and 4.6(b). Maintaining consistency across layouts/processes/engineers is difficult, and modeling ambiguity can lead to widely differing end results. Also, iterative process/technology/circuit co-optimization in a TCAD flow cannot be sustained with a human in the loop, which necessitates extensive automation on all fronts.
The traditional approach to process simulation of $M$ different layouts, or optimization of the same layout using $M$ variants, is shown in Fig. 4.7(a). The process simulator requires time on the order of $f(N)$ and memory on the order of $g(N)$ per case, where $N$ is the number of devices in the layout and $f(\cdot)$/$g(\cdot)$ are generally polynomial functions of $N$. For $M$ layouts/variants, the time complexity scales as $O(M f(N))$ and the memory complexity as $O(\max g(N))$, since the cases are run one after another. From past experiments with planar and multi-gate devices, we have found that $f(N) \propto N^{2+\epsilon}$ for small $N$, where $\epsilon$ grows as $N$ increases. This is unacceptably slow for any kind of iterative process/layout-TCAD simulation based optimization, which engineers at lower technology nodes greatly desire.
In Fig. 4.7(b), we propose a methodology to supplant the brute-force, process-simulator-driven approach of Fig. 4.7(a). The process simulator is replaced by a “structure synthesizer,” a TCAD plugin that synthesizes process-simulated structures using information from the layout, a formal set of process assumptions, and ready-to-use pre-processed device regions in the technology. The key idea behind the proposed method is to relieve the process simulator of the burden of simulating layouts with more than one device, which reduces complexity dramatically. Here, the time required scales as $O(M k(N))$ and the memory as $O(\max h(N))$. With linear (or close to linear) time and memory complexity for $k(\cdot)$ and $h(\cdot)$, the flow in Fig. 4.7(b) would enable 3D-TCAD structure generation to comfortably scale beyond the ‘many-device TCAD barrier.’
Figure 4.6: (a) Modeling ambiguity with manual inputs, and (b) difficulty of iterative optimization with human elements in the TCAD flow
[Figure 4.7 block diagrams: (a) a general layout with $N$ devices and a process recipe enter the process simulator (time $f(N)$, memory $g(N)$); looping over $M$ layouts costs time $O(M f(N))$ and memory $O(\max g(N))$, i.e., $O(MN^2)$ for the typical $f(N) = O(N^2)$. (b) A general layout, process assumptions, and a device database enter the structure synthesizer (time $k(N)$, memory $h(N)$); looping over $M$ layouts costs time $O(M k(N))$ and memory $O(\max h(N))$.]
Figure 4.7: 3D-TCAD structure generation for layouts: (a) traditional approach, and (b) proposed approach
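To make the complexity argument concrete, the sketch below compares the two flows of Fig. 4.7 under a purely hypothetical cost model: $f(N) = c_f N^{2+\epsilon}$ with a fixed $\epsilon$, a linear $k(N) = c_k N$, and a one-time database construction cost `t_dld` for the synthesis flow. All constants (`c_f`, `c_k`, `eps`, `t_dld`) are illustrative assumptions, not measured values; only the trend with $N$ is meaningful.

```python
"""Hypothetical cost model contrasting the flows of Fig. 4.7(a) and 4.7(b)."""


def process_sim_time(m_layouts: int, n_devices: int,
                     c_f: float = 1.0, eps: float = 0.1) -> float:
    """Brute-force flow: M runs, each costing f(N) = c_f * N**(2 + eps)."""
    return m_layouts * c_f * n_devices ** (2 + eps)


def synthesis_time(m_layouts: int, n_devices: int,
                   c_k: float = 1.0, t_dld: float = 50.0) -> float:
    """Synthesis flow: one-time database cost plus M runs of linear k(N) = c_k * N."""
    return t_dld + m_layouts * c_k * n_devices


if __name__ == "__main__":
    m = 10  # number of layouts/variants explored
    for n in (2, 6, 20, 100):
        ratio = process_sim_time(m, n) / synthesis_time(m, n)
        print(f"N = {n:3d} devices: brute-force / synthesis time = {ratio:7.1f}")
```

With these placeholder constants, synthesis does not pay off for two-device layouts, where the one-time cost dominates, but the ratio grows superlinearly with $N$ once that cost is amortized.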
To summarize, the automated structure synthesis approach opens up the following new avenues in modeling:
1. For the first time, it enables extensive iterative layout/process-TCAD simulation based refinement in a practical time-frame, as generation of process-simulated device (PSD) structures is no longer a bottleneck, thereby addressing the problem depicted in Fig. 4.6(b).
2. It enables quick evaluation of different LUIs in the same process, once process
assumptions are fixed.
3. It also enables quick evaluation of different process recipes on the same LUI,
thereby maintaining overall consistency across processes/layouts, while en-
hancing device-circuit co-design capabilities, and addressing the issues de-
picted in Fig. 4.6(a).
4. It permits independent analysis of FEOL and BEOL components of PSD
structures, as well as modeling of contiguous (FEOL + BEOL) PSD struc-
tures. This is critical to efficient design of experiments for variability and
reliability investigations.
To develop methodologies to realize the flow shown in Fig. 4.7(b), the following
features were targeted:
1. Layout independence: Since process simulators handle layouts right down
to individual masks, our approach is able to handle arbitrary layout features,
and is largely independent of the orientation of devices as well as the process-
development-kit (PDK). This ensures that any layout, irrespective of its un-
derlying nature, e.g., digital, analog or RF, can be easily imported (with the
aid of PDK layer-map files) and analyzed.
2. Process independence: Different process recipes yield different PSD struc-
tures. Hence, our approach incorporates sufficient layers of abstraction to
encapsulate the key features of the process, such as a process file enumer-
ating the material systems/dielectric layers/layer thicknesses used, etc., so
that certain elements controlled/marked by the designer can be ignored dur-
ing structure synthesis. This is critical for evaluating efficiency versus accu-
racy tradeoffs.
3. Technology-node/device independence: In order to perform design space
exploration of FEOL and BEOL components, we architected the synthesis
methodology to be as independent of the underlying devices as possible,
and provided abstractions to ensure that it is configurable at any technol-
ogy node. This has the added advantage of being able to migrate from
older technology nodes to newer ones easily, and perform a wide variety
of tests/optimizations (by simply swapping the technology setup files con-
sisting of device databases, process files, etc.) that are not possible using
the traditional approach. Device independence also implies that our ap-
proach can, in principle, be tailored to structure synthesis using any generic
underlying device in TCAD.
Next, we delve into the building blocks required to realize the flow in
Fig. 4.7(b).
4.3.2 Building blocks of the algorithm
Our core approach consists of the following steps:
• Process characterization (PC): (i) delineation of process zones, (ii) construc-
tion of the device-layout database (DLD) using pre-synthesis transforma-
tions, and (iii) process-feature rulebook (PFRB) generation.
• Layout characterization (LC): (i) layout analysis using the device-recognition-
rule database (DRD) and (ii) generation of lithography-effects database
(LED).
• Structure synthesis (SS): (i) FEOL only, (ii) BEOL only, and (iii) integrated
(FEOL + BEOL).
It should be noted that while PC is semi-manual with a one-time setup cost per
technology, LC and SS are fully automated. They are described next.
(PC) Delineation of process zones: This is the most critical step. Using a first-pass process simulation run, ‘process zones’ are allocated to reduce modeling complexity as much as possible, while preserving modeling accuracy where needed. While defining zones, the following terminology relates to doping/stress profiles: (i) process accurate (PA), if they are precise, (ii) process weakly accurate (PW), if they are moderately accurate, and (iii) process independent (PI), if they are not accounted for or not needed. Similarly, the following terminology relates to physical geometries: (i) geometry accurate (GA), if they are precise, and (ii) geometry weakly accurate (GW), if they are moderately accurate. PA doping profiles import the exact dopant locations and profile obtained from a detailed process simulation. On the other hand, PW doping profiles can be analytic/formula-based with fitting constants, e.g., an approximate Gaussian profile with a characteristic decay length that matches a profile obtained from process-simulated output.
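A PW profile of this kind can be sketched as follows, assuming (hypothetically) the form $N(d) = N_0 \exp[-(d/L)^2]$ with peak concentration $N_0$ and decay length $L$. Taking logarithms makes the model linear in $d^2$, so ordinary least squares on samples taken from a process-simulated (PA) profile recovers the two fitting constants. The function names are illustrative, not part of any TCAD tool.

```python
import math
from typing import Sequence, Tuple


def gaussian_profile(d_nm: float, n0: float, decay_nm: float) -> float:
    """PW-style analytic profile N(d) = N0 * exp(-(d / L)**2), d measured from the peak."""
    return n0 * math.exp(-(d_nm / decay_nm) ** 2)


def fit_gaussian(depths_nm: Sequence[float], conc: Sequence[float]) -> Tuple[float, float]:
    """Fit (N0, L) by least squares on ln N = ln N0 - d**2 / L**2.

    Assumes strictly positive concentrations from a profile that decays away
    from its peak at d = 0 (so the fitted slope is negative).
    """
    xs = [d * d for d in depths_nm]
    ys = [math.log(c) for c in conc]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals -1 / L**2
    intercept = y_mean - slope * x_mean  # equals ln N0
    return math.exp(intercept), math.sqrt(-1.0 / slope)
```

In practice, the samples would come from a slice through a process-simulated structure; generating them with `gaussian_profile` itself is a quick way to check that the fit round-trips.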
As shown in Fig. 4.8, which depicts a typical FEOL cross-section with metal-1 wiring, the following zone classifications are useful.
Figure 4.8: Delineation of process zones (zones A to F are labeled PA-GA, PA-GW, PW-GA, PW-GW, PI-GA, and PI-GW, respectively)
1. PA-GA: This is used to capture major active device/FET regions, such as zone A, where modeling transport precisely is extremely important. PA-GA zones can also encompass regions around active FETs to capture the effects of stress, proximity, etc., if they are critical to the parameters being modeled. PA-GA is primarily assigned to a small group of distinct FEOL regions, e.g., one instance of each type of device (low-Vth FET, high-Vth FET, and so on) in a process technology. It can also be used to designate different process corners of a device.
2. PA-GW: This is applicable to regions like zone B, where contacts to devices, etc., need to capture stresses/thermal behavior but can ignore rounding due to lithography. To first order, minor corner rounding in vias and contacts can be expected to have little effect on parasitics, and ignoring it saves many mesh nodes that would otherwise be needed to capture circular/cylindrical shapes during boundary tessellation [134].
3. PW-GA: This is used to model regions like zone C, which are part of the active device layer and lie between devices, serving as shared source/drain regions. Locations like zone C mostly consist of heavily-doped regions without any major gradients in dopant concentration, which allows them to be modeled as PW.
4. PW-GW: This can model regions like zone D, which lie between active device layers/islands, where process and geometric accuracy can be sacrificed to some extent.
5. PI-GA: While modeling BEOL metals, such as zone E, it is essential to capture exact corner-rounding characteristics for moderately large metal chunks. Hence, geometric input from lithography simulations is needed. Depending on device simulation requirements, process profiles can be completely ignored here, making the zone PI; otherwise, PW can be used.
6. PI-GW: In BEOL metal areas like zone F, apart from ignoring process simulation data (i.e., designating it PI), minor shape variations in the vertical direction and in the horizontal plane (from lithography simulations) can, to first order, be ignored, making the zone GW and thereby saving mesh nodes during boundary tessellation.
From Fig. 4.8, we see that zones can overlap with each other. Hence, a priority order needs to be specified to resolve overlaps: PA-GA > PA-GW > PW-GA > PW-GW > PI-GA > PI-GW, where the zone with higher priority replaces regions in zones with lower priority, should they intersect during structure synthesis. Typically, different processes have different kinds of features that require zone assignment. At the end of this step, a lookup table of assignments, which designates features obtained from process simulation and geometric interactions, is compiled and used in the steps that follow.
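The priority rule can be illustrated with a minimal 2D sketch, assuming (for illustration only) that each zone is an axis-aligned box on a uniform grid of cells. Each cell keeps the highest-priority label that covers it, so the result does not depend on the order in which zones are listed. The function `resolve_zones` and the box representation are hypothetical.

```python
from typing import List, Optional, Tuple

# Priority order from the text; a lower index means a higher priority.
PRIORITY = ["PA-GA", "PA-GW", "PW-GA", "PW-GW", "PI-GA", "PI-GW"]
RANK = {label: i for i, label in enumerate(PRIORITY)}

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in grid cells, end-exclusive


def resolve_zones(zones: List[Tuple[str, Box]], width: int,
                  height: int) -> List[List[Optional[str]]]:
    """Label each grid cell with the highest-priority zone covering it (None if none)."""
    grid: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    for label, (x0, y0, x1, y1) in zones:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                current = grid[y][x]
                if current is None or RANK[label] < RANK[current]:
                    grid[y][x] = label
    return grid
```

For example, a PA-GA box placed inside a PW-GW box wins in the overlapping cells, while the remaining PW-GW cells are left untouched.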
[Figure 4.9 flow: looping over each distinct device type, device width (3D only), and process recipe, individual device layouts (nFET, pFET, etc.) and the process recipe enter the process simulator (time ~f(1), memory ~g(1)); the resulting 2D or 3D nFET/pFET structures pass through pre-synthesis transformations into the device-layout database (DLD).]
Figure 4.9: Construction of device-layout database (DLD)
(PC) Construction of the device-layout database (DLD): The proposed synthesis
methodology is based on the key observation that it is necessary to preserve ex-
treme detail only in device regions where interesting phenomena occur. It is im-
portant to note that the definition of a ‘device’ in a PSD structure is broader than
just a single FET/active device. It can encompass an entire region consisting of
many FETs (e.g., matched transistors), which are regarded as a single repeating
unit in larger layouts (the finest granularity is a single FET). Therefore, as per Fig.
4.9, one instance of each possible PA-GA zone in the process technology is created
by passing the corresponding ‘device layouts’ through a process simulator. The
resulting PSD structures, which are either three- or two-dimensional, are absorbed
into the DLD after undergoing certain pre-synthesis transformations that are ex-
plained below.
Pre-synthesis transformations: These are applied to individual PSD structures,
and are defined on a technology node basis. The basic steps are shown in Fig. 4.10:
PA-GA PSD structures undergo device zoning, including domain trimming, and
any rotations, translations, and reflections needed to obtain a complete PA-GA zone
with the correct orientation, after which all contacts that are present are removed,
and the structure is checked into the DLD.
[Figure 4.10 flow: a process-simulated structure is checked for dimensionality, and 2D structures are extruded to the required width; the 3D structure then undergoes device zoning (trimming), orientation fixing (rotate, translate, reflect), and contact removal before being sent to the DLD.]
Figure 4.10: Pre-synthesis transformations on PA-GA zones
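The pre-synthesis transformations can be sketched on a toy 2D structure in which each region is an axis-aligned box tagged with its material. This is a hypothetical simplification, since real PSD structures carry meshes and field data. Trimming clips regions to the PA-GA zone boundary, orientation is fixed by quarter-turn rotation, optional mirroring, and translation, and contact regions are then dropped. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Zone = Tuple[float, float, float, float]  # PA-GA zone boundary (x0, y0, x1, y1), nm


@dataclass(frozen=True)
class Box:
    """Hypothetical region of a 2D PSD structure: a material on an axis-aligned box."""
    material: str
    x0: float
    y0: float
    x1: float
    y1: float


def trim(b: Box, zone: Zone) -> Optional[Box]:
    """Device zoning: keep only the part of b inside the PA-GA zone (None if outside)."""
    x0, y0 = max(b.x0, zone[0]), max(b.y0, zone[1])
    x1, y1 = min(b.x1, zone[2]), min(b.y1, zone[3])
    if x0 >= x1 or y0 >= y1:
        return None
    return Box(b.material, x0, y0, x1, y1)


def orient(b: Box, quarter_turns: int = 0, mirror_x: bool = False,
           dx: float = 0.0, dy: float = 0.0) -> Box:
    """Rotate CCW by quarter_turns * 90 deg about the origin, optionally mirror
    (x -> -x), then translate by (dx, dy)."""
    pts = []
    for x, y in ((b.x0, b.y0), (b.x1, b.y1)):  # opposite corners stay opposite
        for _ in range(quarter_turns % 4):
            x, y = -y, x
        if mirror_x:
            x = -x
        pts.append((x + dx, y + dy))
    (ax, ay), (bx, by) = pts
    return Box(b.material, min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def pre_synthesis(regions: List[Box], zone: Zone, quarter_turns: int = 0,
                  mirror_x: bool = False, dx: float = 0.0, dy: float = 0.0,
                  contacts: FrozenSet[str] = frozenset({"Contact"})) -> List[Box]:
    """Trim, then fix orientation, then remove contacts, in the order of Fig. 4.10."""
    out = []
    for r in regions:
        t = trim(r, zone)
        if t is None:
            continue
        t = orient(t, quarter_turns, mirror_x, dx, dy)
        if t.material not in contacts:
            out.append(t)
    return out
```

The same trimming step also covers the proximity-effect variant discussed later, where only the zone around the center device of a three-device structure is kept.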
Since the above process simulations involve a one-time cost with a single device (with various orientations, Vth classes, process corners, etc.) per process recipe, the time and memory complexity are greatly reduced, and it is possible to update the DLD for each iteration of the process recipe on a practical timescale. The cached PA-GA regions are indexed in the DLD and are amenable to ‘insertion’ into larger structures. This is accomplished via simple geometrical operations and rules that stitch doping/stress profiles/mesh entities in non-PA-GA zones during the structure synthesis steps, with the aid of a PFRB, which is described next.
[Figure 4.11 flow: common-case test layouts and process conditions enter the process simulator, yielding Structures 1 to 4; a slice generator cuts X-Y, Y-Z, and X-Z slices, which feed global feature abstraction and rulebook generation to produce the FEOL/BEOL rulebook (Global feature 1: Rule #1, Rule #2, ...; Global feature 2: Rule #1, Rule #2, ...).]
Figure 4.11: Process feature rulebook (PFRB) generation
(PC) Process feature rulebook (PFRB) generation: While PA-GA zones are accounted for through the construction of the DLD, zones PW-GA and PW-GW are captured using approximate process profiles generated from a rulebook. This stage is reached when the process technology is reasonably mature and relatively few changes are expected during fine-tuning of the process recipe. Fig. 4.11 shows the procedure for generating the PFRB. Several test case layouts, each with a small number (two to six) of devices, undergo rigorous process simulation to generate their respective structures, after which a slice generation macro creates slices at different locations of each structure and along various planes. Using information aggregated from the interfaces of regions around PA-GA zones, a global feature rulebook is generated. The PFRB can be created for both FEOL and BEOL components. It is used by the respective structure synthesizers (described later) to assist in the reconstruction of shapes around PA-GA zones.
The PFRB rules encompass three areas, namely, geometry, feature profiles (dopant/stress information), and meshing. Geometry rules mainly consist of Boolean add/remove/merge operations. An example of a geometry rule to produce conformal dielectrics around BEOL metal is Rule #k1 in Table 4.1, where ILD stands for inter-layer dielectrics. Feature profiles, such as dopant/stress data, are generally analytical/formula-based with suitable fitting constants to mimic certain process-simulated output profiles. For instance, a rule that produces a Gaussian doping profile when a layout condition, ‘LC-m’, such as the edge of a PA-GA zone, is triggered is Rule #k2 in Table 4.1. Meshing-related rules guide the mesh density in intermediate regions between PA-GA zone submeshes, providing input such as the maximum and minimum (x,y,z) mesh spacings. An example rule providing minimum mesh spacings in ‘region-m’ is Rule #k3 in Table 4.1. The above methodologies systematically characterize arbitrary processes and set the stage for the analysis of arbitrary layouts in the process, which is discussed next.

Table 4.1: Process feature rulebook examples
Rule #   Rule description
k1       ILD-layer-n ⋂ Metal-layer-m = ILD-layer-n removed in regions where Metal-layer-m exists
k2       if (LC-m), then dopant-placement[loc(LC-m), profile(LC-m)],
         profile(LC-m) = Gaussian[N0, (x,y,z) = loc(LC-m), decay-length]
k3       if (region-m), then mesh-xmin = l1 nm, mesh-ymin = l2 nm, mesh-zmin = l3 nm
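The three rule types of Table 4.1 can be sketched as plain functions. This is a hypothetical, heavily simplified encoding: Rule #k1 is reduced to 1D interval subtraction along one lateral axis, Rule #k2 sums Gaussians centered at the locations where the layout condition fired, and Rule #k3 is a lookup of minimum mesh spacings, with placeholder values standing in for l1, l2, and l3.

```python
import math
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) along one lateral axis, in nm


def rule_k1_geometry(ild: List[Interval], metal: List[Interval]) -> List[Interval]:
    """Rule k1 (1D sketch): ILD-layer-n with Metal-layer-m removed wherever the metal exists."""
    pieces = list(ild)
    for m0, m1 in metal:
        remaining = []
        for p0, p1 in pieces:
            if m1 <= p0 or m0 >= p1:  # no overlap: keep the ILD piece untouched
                remaining.append((p0, p1))
                continue
            if p0 < m0:               # ILD to the left of the metal survives
                remaining.append((p0, m0))
            if m1 < p1:               # ILD to the right of the metal survives
                remaining.append((m1, p1))
        pieces = remaining
    return pieces


def rule_k2_profile(x_nm: float, triggered_locs: Sequence[float],
                    n0: float, decay_nm: float) -> float:
    """Rule k2: a Gaussian dopant profile placed at every location where LC-m fired."""
    return sum(n0 * math.exp(-((x_nm - loc) / decay_nm) ** 2) for loc in triggered_locs)


# Rule k3: minimum (x, y, z) mesh spacings per region; the values are placeholders.
MESH_RULES: Dict[str, Tuple[float, float, float]] = {"region-m": (1.0, 1.0, 2.0)}


def rule_k3_mesh(region: str, default: Tuple[float, float, float] = (5.0, 5.0, 5.0)
                 ) -> Tuple[float, float, float]:
    """Return (mesh-xmin, mesh-ymin, mesh-zmin) for a region, or a default."""
    return MESH_RULES.get(region, default)
```

In a real rulebook, the geometry rule would operate on 3D solids through the structure editor's Boolean operations; the 1D version only shows the set-difference logic.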
(LC) Layout analysis: After the process is characterized, the layout to be investigated is annotated and passed through an automated layout analyzer, as shown in Fig. 4.12. The layout analyzer is assisted by a device-recognition-rule database (DRD), where the designer can specify arbitrary Boolean operations between layout layers to recognize current and new devices. For instance, in Fig. 4.12, the intersection of POLY and ACTIVE regions is automatically indexed as a planar FET, while the intersection of POLY, ACTIVE, and FIN is indexed as a FinFET in the DLD. This step also extracts all planar geometrical information, including device locations, device types, POLY orientation, layout partitions, and doping/PA-GA submesh boundaries, which the structure synthesizer uses.
[Figure 4.12 block diagram: a general layout (POLY, ACTIVE, and FIN layers) enters the layout analyzer, which applies the DRD rules (Rule #1: ACTIVE ∩ POLY = planar FET; Rule #2: ACTIVE ∩ POLY ∩ FIN = FinFET) and performs layout partitioning/segmentation, passing layout information (device types, device locations, device orientation, and doping and submesh boundaries) to the DLD.]
Figure 4.12: Layout analyzer
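The DRD rules can be sketched with rectangles, assuming (hypothetically) that every layer is a list of axis-aligned rectangles. Because every ACTIVE ∩ POLY ∩ FIN region is also an ACTIVE ∩ POLY region, the sketch evaluates the more specific FinFET rule first and lets a region be claimed only once. This ordering is a design choice of the sketch, not something stated in the text.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# Hypothetical DRD, most specific rule first.
DRD: List[Tuple[str, Tuple[str, ...]]] = [
    ("FinFET", ("ACTIVE", "POLY", "FIN")),
    ("planar FET", ("ACTIVE", "POLY")),
]


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Boolean AND of two rectangles (None if they do not overlap)."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def and_layers(a: List[Rect], b: List[Rect]) -> List[Rect]:
    """Boolean AND of two layers given as lists of rectangles."""
    out = []
    for ra in a:
        for rb in b:
            r = intersect(ra, rb)
            if r is not None:
                out.append(r)
    return out


def recognize(layout: Dict[str, List[Rect]]) -> List[Tuple[str, Rect]]:
    """Apply DRD rules in order; regions already claimed by an earlier rule are skipped."""
    devices: List[Tuple[str, Rect]] = []
    for device_type, layers in DRD:
        regions = layout.get(layers[0], [])
        for name in layers[1:]:
            regions = and_layers(regions, layout.get(name, []))
        for r in regions:
            if all(intersect(r, claimed) is None for _, claimed in devices):
                devices.append((device_type, r))
    return devices
```

For a layout with one planar transistor and one single-fin transistor, the sketch reports the fin/gate overlap as a FinFET and the other gate/active overlap as a planar FET.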
For the layout analyzer to achieve the above in a process/PDK-independent
manner, it is necessary for the designer to annotate the layout with additional
markers, either manually or through layout scripting languages, such as SKILL
[135]. The pre- and post-annotation stages for the case of a 1×1 6T FinFET SRAM
bitcell are shown in Fig. 4.13. The layers added during layout annotation correspond to FET Vth markers, FET process corners for the particular instantiation of each FET, and contact markers for automated contact creation in the final synthesized 3D structure.
Figure 4.13: Layout annotation for a 1×1 6T FinFET SRAM bitcell (input layout vs. annotated layout; layers POLY, FIN, ACTIVE, METAL-2, and METAL-3; nets BL, BLB, WL, VDD, and GND)
(LC) Generation of the lithography-effects database (LED): Typically, lithographic effects can be directly captured in the individual masks of the layout and used in process simulation. However, this greatly increases computational complexity, owing to the dense meshes required to tessellate curved surfaces and the need to re-mesh after each process step to accurately capture stress/dopant behavior. In our framework, only PA-GA zones need to be lithography-accurate, and they are obtained once from process simulation. Other FEOL and BEOL components that need GA are captured using corner rounding at locations specified by the LED. Fig. 4.14 summarizes the steps needed for LED generation. Input layouts undergo lithography simulation for each mask, and the post-lithography features are approximated by rounding radii. This information is indexed in the LED for each layout at different locations and process layers.
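Corner rounding from the LED can be sketched as follows, assuming (hypothetically) that the LED maps a (layer, corner location) pair to a rounding radius and that each rounded corner is replaced by a short polyline approximating a quarter circle of that radius. The entries in `LED` are placeholders, not extracted data, and `rounded_rect` is an illustrative helper. Corners without an LED entry stay sharp.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Hypothetical LED: (layer, corner location) -> rounding radius in nm.
LED: Dict[Tuple[str, Point], float] = {
    ("Active", (0.0, 0.0)): 5.0,
    ("M1", (100.0, 40.0)): 8.0,
}


def rounded_rect(layer: str, x0: float, y0: float, x1: float, y1: float,
                 led: Dict[Tuple[str, Point], float], segments: int = 4) -> List[Point]:
    """Counterclockwise vertices of a rectangle whose corners are rounded with
    radii looked up in the LED; corners without an entry are left sharp."""
    # (corner, direction from the corner towards the arc centre, arc start angle)
    corners = [((x0, y0), (1, 1), 180.0), ((x1, y0), (-1, 1), 270.0),
               ((x1, y1), (-1, -1), 0.0), ((x0, y1), (1, -1), 90.0)]
    pts: List[Point] = []
    for (cx, cy), (sx, sy), start_deg in corners:
        r = led.get((layer, (cx, cy)), 0.0)
        if r <= 0.0:
            pts.append((cx, cy))
            continue
        ox, oy = cx + sx * r, cy + sy * r  # arc centre, inset from the corner
        for i in range(segments + 1):
            a = math.radians(start_deg + 90.0 * i / segments)
            pts.append((ox + r * math.cos(a), oy + r * math.sin(a)))
    return pts
```

Each rounded corner adds `segments` vertices, which is why the framework restricts rounding to locations where GA is actually required.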
[Figure 4.14 flow: an input layout undergoes lithography simulation; a litho analyzer approximates the post-litho features by corner rounding/chamfers and records their radii of curvature in the FEOL/BEOL lithography-effects database (LED), indexed by layer, location, and rounding radius, e.g., Active: R1 at (X1, Y1); M1: R2 at (X2, Y2).]
Figure 4.14: Generation of lithography-effects database (LED)
(SS) Structure synthesis: This is the final stage, in which a structure is stitched together using information gathered in the PC and LC stages. Fig. 4.15 shows the architecture of the structure synthesizer, where the input layout is partitioned by the layout analyzer, followed by independent FEOL and BEOL structure synthesis. Here, the DLD plays a central role by supplying layout and PA-GA zone information to the structure synthesizers. The FEOL (BEOL) synthesizer uses this information, along with the FEOL (BEOL) process assumptions and the FEOL (BEOL) PFRB, to create an intermediate structure. Depending on the desired accuracy, FEOL (BEOL) lithography effects are introduced, following which a final remeshing step generates the FEOL (BEOL) structure. It is also possible to generate (FEOL + BEOL) structures by combining the intermediate FEOL and BEOL structures in the integrated structure synthesizer.
Dealing with proximity effects: It is important to note that stress proximity effects, as a function of inter-device distance, shallow trench isolation, etc., cannot be captured accurately using the single-device process simulation approach shown in Fig. 4.9, which is likely to lead to inaccurate structure synthesis. For simulations where proximity effects are essential, two approaches can be taken:
• The PA-GA zone is extended to encompass the entire region of interest to capture dopant/stress profiles with process-level accuracy. This would make structure synthesis unattractive if the region is very large.
• Instead of the individual device layouts of Fig. 4.9, layouts having a group of three devices undergo process simulation (assuming common-case inter-device distances as per the technology design rules) to obtain intermediate structures, which then undergo pre-synthesis transformations. Here, the device to be mapped to a PA-GA zone should be located at the center. In the device zoning step of Fig. 4.10, only the material within the PA-GA zone boundary around the center device is preserved, and the remaining structure is trimmed. Therefore, the resulting PA-GA zone captures the expected proximity effects on transport without having to store the entire structure. During the structure synthesis phase, even though the stitched stress/dopant profiles can appear non-physical at the PA-GA zone boundaries, proximity effects in important regions of the PA-GA zone (such as the channel, source/drain-body boundary, etc.) are preserved, as the PA-GA zones were derived from similar test case process simulations.
[Figure 4.15 block diagram: the layout analyzer and litho analyzer feed the DLD and the FEOL/BEOL LEDs. The FEOL structure synthesizer combines DLD data with the FEOL process assumptions and FEOL feature rulebook, followed by FEOL litho-effects insertion (per the litho settings) and re-meshing, to produce the FEOL structure; the BEOL path is analogous. The integrated structure synthesizer combines both intermediate structures, followed by litho-effects insertion and re-meshing, to produce the (FEOL + BEOL) structure.]
Figure 4.15: Architecture of the structure synthesizer
Next, we discuss implementation strategies for the structure synthesizers.
[Figure 4.16 flow: active island (AI) generation, followed by active FET merge operations that combine the active islands with PA-GA zones from the DLD to produce the FEOL structure.]
Figure 4.16: FEOL structure synthesis
4.3.3 Implementation strategies
We implemented a basic FEOL synthesis algorithm, using the steps outlined in Fig.
4.16, for template 32nm bulk/SOI and 22nm SOI processes. Using inputs from the
FEOL PFRB and the layout active layer mask, we sequentially generate and merge
all active islands of the simulation domain. Thereafter, PA-GA zones or active FET
regions that were recognized from the layout are sequentially imported from the
DLD with the appropriate layout-designated widths and translated into another
simulation domain. Then, through a series of merge operations, the two domains
are merged to produce an FEOL region with geometrically accurate boundaries.
Doping/stress profiles in the PA-GA zones are stitched together with those in other
zones (whose profiles are prescribed by the FEOL PFRB), and introduced during
re-meshing of the FEOL structure.
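The profile-stitching step can be sketched in 1D, assuming (hypothetically) that each imported PA-GA zone carries sampled concentrations in its own local coordinates plus the translation that places it in the simulation domain. Mesh points inside a zone take the interpolated process-accurate values, and all other points take the analytic profile prescribed by the PFRB. Zones are assumed not to overlap here; in general, the zone priority order would decide. `PAGAZone` and `stitch` are illustrative names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class PAGAZone:
    """Hypothetical cached DLD entry: sampled profile in local coordinates plus placement."""
    offset_nm: float             # translation into the simulation domain
    local_x_nm: Sequence[float]  # sorted sample positions from process simulation
    conc: Sequence[float]        # process-accurate concentrations at those positions


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation; x is assumed to lie in [xs[0], xs[-1]]."""
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def stitch(mesh_x: Sequence[float], zones: List[PAGAZone],
           pfrb_profile: Callable[[float], float]) -> List[float]:
    """PA-GA data inside each zone; the PFRB analytic profile everywhere else."""
    out = []
    for x in mesh_x:
        for z in zones:
            local = x - z.offset_nm
            if z.local_x_nm[0] <= local <= z.local_x_nm[-1]:
                out.append(interp(local, z.local_x_nm, z.conc))
                break
        else:  # no zone covers this mesh point
            out.append(pfrb_profile(x))
    return out
```

A real implementation would work on 3D meshes and hand the result to the re-meshing step, but the inside/outside decision per mesh point is the same.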
[Figure 4.17 flow: incremental metal/via generation with merge operations, lithography-effects insertion, and inter-layer dielectric (ILD) generation, yielding BEOL metal with conformal ILD.]
Figure 4.17: BEOL structure synthesis
BEOL synthesis and integrated structure synthesis proceed along similar lines, as shown in Figs. 4.17 and 4.18. During BEOL synthesis, individual metal layers are created sequentially and merged to form a contiguous BEOL metal stack. Lithography effects are introduced on a layer-by-layer basis via corner rounding from the LED. Thereafter, the BEOL dielectric stack is generated from the BEOL process assumptions and the BEOL PFRB. The metal stack is pushed into it through a series of merge operations to generate a BEOL structure with metal and conformal ILD. During integrated structure synthesis, the intermediate BEOL and FEOL structures (generated by the respective synthesizers) are united, with overlap resolution dictated by the priorities specified in the FEOL/BEOL PFRB. In the next section, we present two case studies to demonstrate the efficacy of the structure synthesis approach.
[Figure 4.18 flow: doping/sub-mesh placement followed by merge/resolution operations.]
Figure 4.18: Integrated structure synthesis
4.4 Structure synthesis case studies
The methodologies outlined in Section 4.3 have been implemented in a plugin tool for the state-of-the-art Sentaurus TCAD tool suite [136], in order to leverage advanced process simulators like SProcess/TSuprem4 [137], as well as the Sentaurus Structure Editor [134] for structure synthesis. For validation, we applied the structure synthesis approach to capacitance extraction experiments (Appendix A), and the results have been very promising. In this section, we compare capacitance extraction on synthesized structures versus PSD structures in Case 1, and characterize the scaling behavior of our implementation in Case 2.
Case 1: Capacitance extraction for 32nm bulk 6T SRAM – Process-simulated ver-
sus synthesized structures
To verify the efficacy of our approach, we simulated a planar 6T SRAM layout
with 30nm gate length and 112.5nm poly-to-poly pitch in a 32nm bulk process [138].
Fig. 4.19 shows the output at the intermediate steps involved in the gate-last pro-
cess consisting of trench device-isolation, formation of high-k dielectrics, polysil-
icon gate formation, source/drain formation with p-FET SiGe and n-FET raised
Si pockets, salicide formation, interlevel deposition/polish, removal of polysilicon
gates, dual metal-gate deposition, and finally, contact formation.
Figure 4.19: Structure formation during a planar 6T SRAM process simulation: (a) trench device isolation, (b) formation of gate stack, (c) source/drain formation with spacers, (d) contact and via formation, and (e) final structure with doping
Table 4.2: Resource usage: Process simulation vs. structure synthesis
Metric              Process simulation    Structure synthesis
Total CPU time      75 hrs                6 hrs (synthesis + meshing) + 11.5 hrs (DLD construction)
Memory              64 GB                 12 GB (dominated by DLD construction)
Disk space          6 GB                  2 GB
Number of threads   8                     1 (synthesis) + 8 (DLD construction)

Following the methodology outlined in Section 4.3, we assigned PA-GA zones,
performed process simulation on individual FETs, and constructed the DLD. Next,
we generated the FEOL PFRB and used it for structure synthesis. The synthesized
structure is shown in Fig. 4.20(a), where doping/stress profiles are accurate only
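in the FET regions, and moderately accurate in the bulk. Table 4.2 shows the
resources consumed in both cases. Brute-force process simulation on the LUI is
considerably slower than synthesis: even including the one-time cost of DLD
construction, synthesis requires 17.5 hours of total CPU time (6 hrs for synthesis
and meshing plus 11.5 hrs for DLD construction), versus 75 hours for process
simulation. Maximum memory usage (12 GB vs. 64 GB) and disk space (2 GB vs. 6 GB)
are also considerably lower.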
This suggests that automated structure synthesis could be leveraged to prune the
design space quickly, and full 3D process simulation of layouts can be performed
only on the finalized candidates if necessary.
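The resource comparison in Table 4.2 can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it transcribes the table's values and, as an explicit assumption not stated in the table, treats DLD construction as a cost that can be shared across several layouts in the same process while the 6 hrs of synthesis and meshing recur for every layout.

```python
# Values transcribed from Table 4.2 (CPU hours, GB).
PROCESS_SIM = {"cpu_hours": 75.0, "memory_gb": 64.0, "disk_gb": 6.0}
SYNTH_PER_LAYOUT_HOURS = 6.0   # synthesis + meshing
DLD_CONSTRUCTION_HOURS = 11.5  # one-time cost of DLD construction
SYNTHESIS = {
    "cpu_hours": SYNTH_PER_LAYOUT_HOURS + DLD_CONSTRUCTION_HOURS,
    "memory_gb": 12.0,
    "disk_gb": 2.0,
}


def reduction_factors(baseline, candidate):
    """Ratio baseline/candidate for each metric (>1 means the candidate is cheaper)."""
    return {key: baseline[key] / candidate[key] for key in baseline}


def amortized_cpu_hours(num_layouts):
    """Per-layout synthesis CPU time if one DLD is reused across num_layouts layouts.

    Hypothetical assumption: every layout needs the same 6 hrs of synthesis + meshing.
    """
    if num_layouts < 1:
        raise ValueError("num_layouts must be >= 1")
    return SYNTH_PER_LAYOUT_HOURS + DLD_CONSTRUCTION_HOURS / num_layouts


if __name__ == "__main__":
    for metric, factor in reduction_factors(PROCESS_SIM, SYNTHESIS).items():
        print(f"{metric}: {factor:.2f}x lower with structure synthesis")
    for k in (1, 5, 20):
        print(f"{k} layout(s): {amortized_cpu_hours(k):.2f} CPU hours per layout")
```

For a single layout, the table's figures correspond to roughly a 4.3x reduction in CPU time, 5.3x in memory, and 3x in disk space; under the reuse assumption, the per-layout CPU time approaches the 6 hrs of synthesis and meshing as more layouts share one DLD.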
We performed transport analysis based 3D-TCAD capacitance extraction ex-
periments on both the process-simulated structure and the synthesized structure
at five different extraction frequencies. Fig. 4.20(b) shows that the error
percentage in bitline capacitance (CBL) extraction between the two is negligible
above 10 kHz and reaches its maximum of around 2% at 100 Hz. We also performed hold
static noise margin (HSNM) and read static noise margin (RSNM) experiments for the
process-simulated and synthesized structures, with VDD = 1 V. From Figs. 4.21(a)
and 4.21(b), it can be seen that the butterfly plots extracted from the
structure-synthesized 6T cell are nearly identical to those of the process-simulated
6T cell. This shows that the proposed approach is a reasonable and practical
substitute for full process simulation in this case.
Figure 4.20: (a) Synthesized planar 6T SRAM structure (pull-up pFET, pass-gate nFET, and pull-down nFET regions labeled), and (b) CBL extraction error percentage [error in bitline capacitance (CBL) in % versus Log10(extraction frequency)]
Figure 4.21: Process-simulated versus synthesized 6T SRAM cells: (a) hold static noise margin (HSNM), and (b) read static noise margin (RSNM) [butterfly plots of VR (V) versus VL (V), each comparing the process-simulated and structure-synthesized 6T cells]
Case 2: Scaling behavior for 22nm SOI FinFETs
We leveraged the technology/device-independent nature of the proposed ap-
proach to investigate the scaling properties of our structure synthesizer imple-
mentations for 22nm SOI FinFET devices and circuits. We synthesized four dif-
ferent configurations of 6T FinFET SRAMs (Fig. 4.22) consisting of (1×1), (2×2),
(3×3) and (4×4) bitcells, directly from annotated layouts, such as the one shown in
Fig. 4.13. In Fig. 4.23, a fully meshed 3×3 structure with over 8×10^6 mesh nodes is
shown, which takes 48 hours to extract capacitances between every net-to-net
pair in Sentaurus Device. Ring oscillators consisting of 5/11/17/23 stages were
also synthesized (Fig. 4.24). In each case, we measured the FEOL/BEOL/(FEOL +
BEOL) integration structure synthesis time as the time duration from layout anal-
ysis up to the beginning of the re-mesh operations, with a maximum memory al-
location of 32GB.
Figure 4.22: Synthesized 6T FinFET SRAM bitcell configurations: (a) 1×1 cell, (b) 2×2 cell, (c) 3×3 cell, and (d) 4×4 cell
Figure 4.23: 3×3 6T FinFET SRAM bitcell structure with mesh
From Fig. 4.25, we can see that (FEOL+BEOL) integration dominates the total
runtime as a considerable amount of computation is performed in overlap reso-
lution during integrated structure synthesis. Also, FEOL synthesis is faster than
BEOL synthesis. This is highly dependent on the BEOL features/metal density
and FEOL device complexity, and can vary dramatically from layout to layout.
This is corroborated by the scaling behavior of ring oscillator structures, as shown
in Fig. 4.26, where BEOL synthesis time is similar to FEOL synthesis time owing to
the fact that metal-2/metal-3 density is lower in the ring oscillator layouts. From
Figs. 4.25 and 4.26, for a reasonable number of FinFETs (#FinFETs), FEOL, BEOL
as well as (FEOL+BEOL) synthesis times can be seen to scale very well using the
proposed approach.
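One way to quantify how well synthesis time scales is to fit a straight line to log-log data such as that in Figs. 4.25 and 4.26: the slope is an empirical scaling exponent (about 1 for linear scaling, about 2 for quadratic). The sketch below is a minimal least-squares fit, not the analysis used in this work. The FinFET counts assume six FinFETs per 6T bitcell, and the timing values in the usage are hypothetical placeholders, not the measured data behind the figures.

```python
import math


def scaling_exponent(num_devices, times_s):
    """Least-squares slope of log10(time) versus log10(#devices).

    A slope near 1 indicates roughly linear scaling; a slope near 2, quadratic.
    """
    if len(num_devices) != len(times_s) or len(num_devices) < 2:
        raise ValueError("need at least two (count, time) pairs of equal length")
    xs = [math.log10(n) for n in num_devices]
    ys = [math.log10(t) for t in times_s]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # 1x1, 2x2, 3x3, 4x4 bitcell arrays, assuming 6 FinFETs per bitcell.
    finfets = [6 * k * k for k in (1, 2, 3, 4)]
    # Hypothetical synthesis times (s), NOT the data plotted in Fig. 4.25.
    times = [120.0, 410.0, 900.0, 1500.0]
    print(f"empirical scaling exponent: {scaling_exponent(finfets, times):.2f}")
```

Fitting each curve (FEOL, BEOL, integration, total) separately would show which stage dominates the growth in runtime as the layout size increases.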
Figure 4.24: Synthesized FinFET ring oscillator configurations: (a) RO-5, (b) RO-11, (c) RO-17, and (d) RO-23
Figure 4.25: 6T FinFET SRAM: Synthesis time (in sec.) versus number of FinFETs [Log10(structure synthesis time) versus Log10(# FinFETs) for FEOL synthesis, BEOL synthesis, (FEOL+BEOL) integration, and total (FEOL+BEOL) synthesis; data points correspond to the (1×1), (2×2), (3×3), and (4×4) configurations]
4.5 Discussion
While the above case studies demonstrate the efficacy of the structure synthesis ap-
proach, it is important to note that we have solved only the first part of the prob-
lem, as generic device simulation experiments can still be very time-consuming.
Figure 4.26: FinFET ring oscillator: Synthesis time (in sec.) versus number of FinFETs [Log10(structure synthesis time) versus Log10(# FinFETs) for FEOL synthesis, BEOL synthesis, (FEOL + BEOL) integration, and total (FEOL + BEOL) synthesis; data points correspond to RO-5, RO-11, RO-17, and RO-23]
(Capacitance extraction using 3D-TCAD device simulation is, in general, very fast
in comparison to DC/transient simulations.) However, the automated nature of
structure synthesis would help engineers utilize device simulators more efficiently,
as the work formerly done by an engineer (in setting up a 3D-TCAD deck for an
LUI) over a span of 3-4 weeks can be accomplished in a few minutes. Thus, struc-
ture synthesis makes the whole TCAD cycle more efficient by cutting down the
most time-consuming portion of the cycle, namely, manual re-coding of the 3D-
TCAD deck for a new LUI/process.
The current work also highlights the need to provide high-level abstrac-
tions/interfaces to TCAD engineers in order to maintain consistency across
technology nodes, layouts, and processes. Here, automated structure synthesis is
akin to logic synthesis in the circuit world (Fig. 4.27), where a gate-level netlist can
be derived from high-level synthesizable hardware description language (HDL)
code using a set of design libraries. The process of manually arriving at a gate-level
netlist from HDL code can be extremely cumbersome. In an analogous manner,
the absence of an automated structure synthesizer has been a major impediment
to TCAD engineers, and has been addressed for the first time in the current work.
Figure 4.27: Logic synthesis flows are the circuit-world analogs of Fig. 4.7(b) [an RTL/HDL description and design libraries feed logic synthesis, which produces a gate-level netlist]
4.6 Chapter summary
Analyzing and optimizing nanoscale devices and circuits using 3D-TCAD is
emerging as a necessity at lower technology nodes. Here, obtaining accurate 3D-
TCAD structures corresponding to LUIs via 3D process simulation is impractical,
as the latter is not amenable to iterative layout-TCAD optimization. In this work,
we proposed and validated an automated structure synthesis framework that
substantially reduces time and memory complexity during 3D-TCAD structure
generation. We circumvented the 3D process simulation barrier by preserving
accuracy, when needed, using individual process-simulated blocks and stitching
them together using layout information, technology assumptions, and PFRBs to
generate larger structures. Capacitance extraction experiments for comparing
structure synthesis with process-simulated layouts indicate that the methodology
is an excellent substitute for 3D process simulation in 3D-TCAD based analysis
of large LUIs, for which manual coding and process simulation runtime are
prohibitively expensive.
Chapter 5
Transport analysis based 3D-TCAD Parasitic Capacitance Extraction in Emerging Technologies
In this chapter, we focus on the problem of accurate parasitic capacitance extrac-
tion for circuits in highly-scaled CMOS technologies, which is listed as an issue
in the 2011 ITRS modeling and simulation roadmap [80] under Section 3.5. The
chapter is divided into three sections. Section 5.1 outlines the need for transport
analysis based 3D-TCAD parasitic capacitance extraction. Thereafter, Section 5.2
deals with hardware validation of the transport analysis approach in an experi-
mental 32nm SOI process. Finally, in Section 5.3, we explore parasitic capacitances
in emerging multi-gate devices at the 22/14/10nm technology nodes, using the
above approach.
5.1 The need for transport analysis based parasitic capacitance extraction
With technology scaling, extraction of layout-dependent parasitic capacitances is
becoming extremely important. In this section, we establish the need for a true
3D transport analysis based approach for highly scaled circuits, by leveraging the
structure synthesis methods from Chapter 4.
5.1.1 Introduction
Capacitance extraction is a key element of state-of-the-art industrial VLSI flows
affecting design timing, power, and stability of circuits. Current methods in dig-
ital/analog/RF design rely on field solvers [139] [140] [141] [142], which model
BEOL dielectrics and metal, and compact models which capture FEOL related
capacitances. While compact models can account for certain layout-dependent
effects, they are oblivious to the myriad possibilities in which back-end features
can interact with the active semiconductor device layer (as well as the numerous
shapes in which the active layer may be patterned). Field solvers treat FEOL sil-
icon at best as a material of uniform conductivity or as a lossy material, thereby
ignoring its nonlinear nature for arbitrary doping profiles.
Owing to the above, it is questionable whether the total capacitance predicted
at each node (as the sum of the FEOL and BEOL components) accurately captures
the actual capacitances seen in highly scaled circuits. Capacitance misprediction
is significant for compact yield-limiting circuits like large SRAM and eDRAM
arrays, where an estimation error of only a few percent per bitcell, compounded
over a large column height, can shift the failure point of operation. In this
section, we clarify the above for yield-critical single-/dual-port 6T SRAM bit
cells in an advanced sub-32nm IBM SOI process through transport analysis based
3D-TCAD [81] capacitance extraction.
5.1.2 Transport analysis based capacitance extraction
In general, for an N-terminal contacted device structure, the phasor terminal
voltages $V_k$, $k = 1, \ldots, N$, are related to the phasor terminal currents
$I_k$, $k = 1, \ldots, N$, through the $N \times N$ admittance matrix $Y$ such that
$I = YV$, where $V = [V_1, \ldots, V_N]^T$ and $I = [I_1, \ldots, I_N]^T$. The
elements of $Y$, $\tilde{Y}_{ab}$, are determined by individually exciting each
terminal $b$ with $V_b$ so that

$$\tilde{Y}_{ab} = \left.\frac{I_a}{V_b}\right|_{V_k = 0,\ k \neq b} \qquad (5.1)$$
The conductance matrix $G$ and capacitance matrix $C$ of the structure are obtained
from $G = \mathrm{Re}(Y)$ and $\omega C = \mathrm{Im}(Y)$, where $\omega$ is the angular excitation frequency. In the case of
highly scaled circuits, in order to determine the responses Ik accurately at each ω, it
is essential to treat the FEOL silicon as a semiconductor or solid-state plasma with
mobile carriers [143], i.e., obtain the solution of the coupled system of Poisson and
carrier continuity equations for slight perturbations around the DC bias point. At
each mesh node i of the device structure, the Poisson, electron, and hole continuity
equations can be recast into the form [144]:
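$$\begin{aligned}
F_{\phi_i}(\phi, n, p) &= 0 \\
F_{n_i}(\phi, n, p) - \frac{\partial G_{n_i}(n)}{\partial t} &= 0 \\
F_{p_i}(\phi, n, p) - \frac{\partial G_{p_i}(p)}{\partial t} &= 0
\end{aligned} \qquad (5.2)$$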
where F and G are nonlinear functions of φ, n, p, which represent matrices of
potential, electron, and hole density, respectively. The AC system of equations is
obtained by substituting $\zeta(t) = \zeta_0 + \tilde{\zeta} e^{j\omega t}$ in Eq. (5.2),
with $\zeta = (\phi, n, p)$, $\zeta_0$ the steady-state solution, and $\tilde{\zeta}$ the
small-signal amplitude. Using the Taylor expansion with only linear terms
yields [144]:
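$$\sum_j
\begin{bmatrix}
\dfrac{\partial F_{\phi_i}}{\partial \phi_j} & \dfrac{\partial F_{\phi_i}}{\partial n_j} & \dfrac{\partial F_{\phi_i}}{\partial p_j} \\
\dfrac{\partial F_{n_i}}{\partial \phi_j} & \dfrac{\partial F_{n_i}}{\partial n_j} - j\omega \dfrac{\partial G_{n_i}}{\partial n_j} & \dfrac{\partial F_{n_i}}{\partial p_j} \\
\dfrac{\partial F_{p_i}}{\partial \phi_j} & \dfrac{\partial F_{p_i}}{\partial n_j} & \dfrac{\partial F_{p_i}}{\partial p_j} - j\omega \dfrac{\partial G_{p_i}}{\partial p_j}
\end{bmatrix}
\begin{bmatrix} \tilde{\phi}_j \\ \tilde{n}_j \\ \tilde{p}_j \end{bmatrix} = 0 \qquad (5.3)$$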
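With the appropriate boundary conditions, the global AC system is constructed
from Eq. (5.3) and solved to obtain the solution vectors $[\tilde{\phi}_j\ \tilde{n}_j\ \tilde{p}_j]$. The ac current
densities are computed as [83]:

$$\tilde{\vec{J}}_n = \sum_{\zeta = \phi, n, p} \left.\frac{\partial \vec{J}_n}{\partial \zeta}\right|_{DC} \cdot \tilde{\zeta}, \qquad \tilde{\vec{J}}_p = \sum_{\zeta = \phi, n, p} \left.\frac{\partial \vec{J}_p}{\partial \zeta}\right|_{DC} \cdot \tilde{\zeta} \qquad (5.4)$$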
Using the above, the total phasor terminal currents of the device are calculated and
admittance matrix Y is obtained from Eq. (5.1).
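Eq. (5.1), together with $G = \mathrm{Re}(Y)$ and $\omega C = \mathrm{Im}(Y)$, amounts to a simple post-processing loop around an AC solver. The sketch below assumes a hypothetical callback `solve_ac` that maps phasor terminal voltages to phasor terminal currents (in practice, the 3D-TCAD small-signal solve of Eqs. (5.3)–(5.4)); purely so the code runs, the callback is replaced by a lumped two-terminal RC admittance, which is not a model of any structure in this work.

```python
import math


def extract_admittance(solve_ac, num_terminals, v_exc=1.0):
    """Build Y column by column per Eq. (5.1).

    Terminal b is driven with phasor voltage v_exc while all others are held
    at 0 V; column b of Y is then I / v_exc.
    """
    y = [[0j] * num_terminals for _ in range(num_terminals)]
    for b in range(num_terminals):
        v = [0j] * num_terminals
        v[b] = complex(v_exc)
        currents = solve_ac(v)  # hypothetical AC solver: phasor V -> phasor I
        for a in range(num_terminals):
            y[a][b] = currents[a] / v_exc
    return y


def split_g_c(y, freq_hz):
    """G = Re(Y) and C = Im(Y) / omega, with omega = 2*pi*f."""
    omega = 2.0 * math.pi * freq_hz
    g = [[entry.real for entry in row] for row in y]
    c = [[entry.imag / omega for entry in row] for row in y]
    return g, c


def make_rc_solver(g_ab, c_ab, freq_hz):
    """Toy stand-in for TCAD: conductance g_ab in parallel with c_ab between terminals A and B."""
    y_ab = complex(g_ab, 2.0 * math.pi * freq_hz * c_ab)

    def solve_ac(v):
        i_a = y_ab * (v[0] - v[1])  # current flowing into terminal A
        return [i_a, -i_a]

    return solve_ac


if __name__ == "__main__":
    f = 1e6
    y = extract_admittance(make_rc_solver(1e-9, 1e-16, f), num_terminals=2)
    g, c = split_g_c(y, f)
    # Off-diagonal entries of C carry the negative of the A-B coupling capacitance.
    print(f"C_AB = {-c[0][1]:.3e} F, G_AB = {-g[0][1]:.3e} S")
```

Driving one terminal at a time is what makes each column of Y independent; with N terminals, N small-signal solves are needed per frequency.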
The above methodology captures field-carrier interactions and is capable of ac-
counting for the inherent nonlinearity of the active semiconductor layer under all
bias conditions. This is quantified via a simple experiment using field solver (FS)
and transport analysis based TCAD capacitance extraction on a metal wire (contact
A) running over an active semiconductor region with contact B (cross-sections in
Figs. 5.1(a) and 5.1(b) having different doping profiles, with peak doping ND and
separation D). From Fig. 5.2(a), when D ≥ 0.3µm, FS and TCAD predictions for
high/low ND match closely. This reflects the sufficiency of FS based extractions at
higher technology nodes, with large separation between FEOL regions and BEOL
metal. However, as D decreases, FS overestimates capacitance considerably, even
at high ND. From Fig. 5.2(b), as expected, FS fails to track VAB changes and
overestimates capacitance even at zero bias. The results in Figs. 5.2(a) and 5.2(b)
are very dependent on the doping profiles in Figs. 5.1(a) and 5.1(b), respectively,
and can vary widely, indicating that FS based extraction will be inaccurate at
highly scaled technology nodes.
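A rough intuition for why a field solver overestimates CAB can be obtained from a one-dimensional approximation. This is a textbook simplification, not the transport analysis used here: if the silicon under the wire is depleted, the depletion capacitance appears in series with the dielectric capacitance, whereas a field solver that treats silicon as a conductor sees only the dielectric. The sketch assumes a uniform oxide of thickness D (ignoring the nitride and all fringing fields) and a fixed, hypothetical band bending `psi_s` in the silicon.

```python
import math

Q = 1.602176634e-19          # elementary charge (C)
EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)
EPS_OX = 3.9 * EPS0          # SiO2 permittivity (assumed uniform dielectric)
EPS_SI = 11.7 * EPS0         # silicon permittivity


def fs_overestimate_ratio(d_um, nd_cm3, psi_s=0.3):
    """Ratio C_conductor / C_series for a 1D stack (per unit area).

    C_conductor: silicon treated as a perfect conductor (dielectric only).
    C_series:    dielectric in series with a depletion-approximation layer
                 of width W = sqrt(2 * eps_si * psi_s / (q * N_D)).
    psi_s is an assumed band bending (V), not a value from the text.
    """
    c_ox = EPS_OX / (d_um * 1e-6)
    nd_m3 = nd_cm3 * 1e6
    w_dep = math.sqrt(2.0 * EPS_SI * psi_s / (Q * nd_m3))
    c_dep = EPS_SI / w_dep
    c_series = 1.0 / (1.0 / c_ox + 1.0 / c_dep)
    return c_ox / c_series  # equals 1 + c_ox / c_dep


if __name__ == "__main__":
    for nd in (1e16, 1e18, 1e20):
        ratios = [fs_overestimate_ratio(d, nd) for d in (0.3, 0.1, 0.05)]
        print(f"N_D = {nd:.0e} cm^-3:", ", ".join(f"{r:.3f}" for r in ratios))
```

In this crude model, the overestimate grows as D shrinks (larger dielectric capacitance) and as ND falls (wider depletion), which is qualitatively in line with Fig. 5.2(a). Because `psi_s` is fixed, the model cannot capture the bias dependence of Fig. 5.2(b), and actual magnitudes require the full transport analysis.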
Figure 5.1: Cross-sectional view of a metal wire running over an active semiconductor region with two arbitrary doping profiles, (a) and (b) [each panel shows contact A (the metal wire) above silicon with contact B, separated by distance D through oxide and nitride; the silicon has peak doping ND; doping scale from −5.0e18 to 1e20 cm−3]
Figure 5.2: Comparison between FS and TCAD extracted capacitance CAB under different conditions, ω/2π = 1MHz: (a) CAB (F) versus D (µm) using the cross-section in Fig. 5.1(a), for TCAD with ND = 1e16, 5e17, 1e18, 1e19, and 1e20 cm−3 and for FS; (b) CAB (F) versus VAB (V) at D = 0.05µm using the cross-section in Fig. 5.1(b), for TCAD with ND = 1e16, 1e18, and 1e20 cm−3 and for FS
5.1.3 Methodology and results
In this work, for the first time, transport analysis based 3D-TCAD capacitance ex-
traction is performed on multi-cell SRAM blocks. A layout-independent, auto-
mated TCAD structure generator (described in Chapter 4) was used to generate
3D meshed structures using the Synopsys Sentaurus TCAD tool suite [81]. The
structure generator incorporated a single set of FEOL and BEOL feature assump-
tions for an advanced sub-32nm IBM SOI process. This was applied to single- and
dual-port 6T SRAM layouts to create individual features embedded in conformal
interlayer dielectrics for generating the BEOL structures [e.g., Fig. 5.3(a)]. Indi-
vidual FETs of each type were imported from 2D process simulations performed
in Tsuprem4 [137], and were extruded, trimmed to the layout-specific shapes, and
placed at the appropriate locations in the FEOL structure with the correspond-
ing doping profiles [e.g., Fig. 5.3(b)]. Finally, the FEOL and BEOL structures
were merged to generate an accurate 3D representation of the bitcells with doping
profiles, contacts, and meshing. For each layout, two structures were generated:
(FEOL + BEOL) and (only FEOL). The difference in capacitances, i.e., (FEOL +
BEOL) capacitance − (FEOL) capacitance, represents the computed BEOL compo-
nent of the capacitance from ac transport analysis in TCAD. The BEOL component
was inserted into circuit schematics/netlists and used in SPICE simulations to in-
vestigate bitcell stability and performance.
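The BEOL component described above is a per-net-pair subtraction of two extracted capacitance matrices. A minimal sketch, assuming each extraction has been exported as a dictionary keyed by (net, net) pairs with values in farads; the export format, net names, and values in the usage are hypothetical.

```python
def beol_component(c_feol_beol, c_feol_only):
    """C_BEOL = C_(FEOL+BEOL) - C_FEOL for every net pair present in both extractions."""
    mismatch = set(c_feol_beol) ^ set(c_feol_only)
    if mismatch:
        raise KeyError(f"net pairs missing from one extraction: {sorted(mismatch)}")
    # Works for either sign convention of the off-diagonal entries, as long as
    # both extractions use the same one.
    return {pair: c_feol_beol[pair] - c_feol_only[pair] for pair in c_feol_beol}


if __name__ == "__main__":
    # Hypothetical values (F), for illustration only.
    full = {("BL", "GND"): 0.30e-15, ("BL", "WL"): 0.05e-15}
    feol = {("BL", "GND"): 0.18e-15, ("BL", "WL"): 0.01e-15}
    for pair, c in beol_component(full, feol).items():
        print(pair, f"{c:.3e} F")
```

Raising on a mismatched pair, rather than treating it as zero, is a deliberate choice: a net pair that exists in only one structure would otherwise be silently attributed entirely to the BEOL.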
Two types of SRAM cells, Type I [single-port 6T thin cell, Fig. 5.3(c)] and Type
II [dual-port 6T thin cell, Fig. 5.3(d)], were modeled using transport based analysis
(TCAD with zero bias conditions, ω/2π = 3 GHz) and FS. A series of multi-cell
blocks were generated: 1×1, 2×1, and 3×1, where the bit lines (BL, BLB) are
shared across the cells. For each configuration, the bit (word) line capacitance
is computed as the total capacitance at the node for the entire structure divided
by the number of cells sharing the bit (word) line. From Fig. 5.4, we see that the
computed BEOL word line (WL) TCAD capacitances differ by (38%, 34%, 34%) in
comparison with the FS extracted capacitances for Type I cells for the (1×1, 2×1,
3×1) configurations, which is considerable. Bit line capacitances differ on average
by (11%, 8%, 7%), respectively. Increasing the number of cells in the block refines
the capacitances, showing opposite trends for bit lines vs. word lines. This
highlights the inadequacy of single-cell simulations, which suffer from edge effects
(in the absence of periodic boundary conditions, which are difficult to use as they
degrade solver convergence).

Figure 5.3: 32nm planar SOI Type I and II SRAM structures: (a) Type I 3×3 (BEOL), (b) Type I 3×1 (FEOL), (c) Type I 1×1, and (d) Type II 1×1
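The normalization and comparison used above reduce to two formulas: the per-cell line capacitance is the total node capacitance divided by the number of cells sharing the line, and the TCAD-vs-FS difference is expressed relative to the FS value. The sketch below assumes that reference and a signed convention (the text reports magnitudes only); the block totals in the usage are hypothetical.

```python
def per_cell_line_cap(total_node_cap, cells_sharing):
    """Average bit (word) line capacitance per cell in an M x 1 (1 x M) block."""
    if cells_sharing < 1:
        raise ValueError("cells_sharing must be >= 1")
    return total_node_cap / cells_sharing


def percent_difference(c_tcad, c_fs):
    """Signed difference of the TCAD value relative to the FS value, in percent."""
    return 100.0 * (c_tcad - c_fs) / c_fs


if __name__ == "__main__":
    # Hypothetical bit line totals (F) as (TCAD, FS) for 1x1, 2x1 and 3x1 blocks.
    blocks = {1: (0.21e-15, 0.24e-15), 2: (0.42e-15, 0.46e-15), 3: (0.63e-15, 0.68e-15)}
    for m, (tcad_total, fs_total) in blocks.items():
        tcad = per_cell_line_cap(tcad_total, m)
        fs = per_cell_line_cap(fs_total, m)
        print(f"{m}x1: TCAD {tcad:.3e} F, FS {fs:.3e} F, diff {percent_difference(tcad, fs):+.1f}%")
```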
Figure 5.4: Type I computed BEOL capacitances (TCAD vs. FS)
Figure 5.5: Type II computed BEOL capacitances (TCAD vs. FS)
Simulating blocks of M×1 (1×M) cells (M > 2), which share bit (word) lines,
improves the average bit (word) line capacitance estimates as M increases, as they
are more representative of the true environment around the bit (word) lines and
suppress the contribution of edge effects. From Fig. 5.5, we see that for Type II
cells, the difference between FS and TCAD extracted word line and bit line capac-
itances is on average (19%,6%,7%) and (7%,6%,9%), respectively, and not as high
as for Type I cells, showing that the assignment of bit and word lines to different
metal layers of the bit cell layout is critical.
The TCAD and FS predicted BEOL bit line capacitances were used in SPICE
netlists/simulations of the bitcells. Figs. 5.6(a) and 5.6(b) highlight the worst-case
bit line delay (50% discharge), assuming a column height of 32 bitcells, versus the
normalized bitcell sigma, defined as the fractional change in the bitcell FET Vths
occurring in the worst-case combination for pull-up, pass-gate, and pull-down
devices during read/write operations. For Type I cells, the TCAD predicted bit line
capacitance is consistently lower than the FS estimate, leading to unacceptably high
delay differences for the 1×1 case. These differences decrease for the multi-cell
cases. In Type II cells, the predicted TCAD bit line capacitance is lower for the
1×1 case and higher for the 3×1 case, thereby causing positive as well as negative
delay differences with respect to FS. Under typical non-zero DC operating
conditions, which are cumbersome to simulate via 3D-TCAD owing to the
computational cost/time, it is very likely that the delay differences will be higher.
It is also important to note that the voltage/frequency-dependent contributions from
FEOL compact models often fail to capture shape-specific BEOL-to-FEOL silicon
interactions in generic layouts.

Figure 5.6: Performance difference in Type I & II cells during read operations: (a) Type I, and (b) Type II
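To see why bit line capacitance errors translate directly into delay errors, consider a first-order estimate. This is a simplification, not the SPICE analysis behind Fig. 5.6: if the accessed cell sinks a roughly constant read current I_read, the time to discharge a column of 32 cells to 50% of VDD is t ≈ C_BL,column · (VDD/2) / I_read, so the relative delay difference equals the relative capacitance difference. The per-cell capacitances, VDD, and read current in the usage are hypothetical.

```python
def bitline_delay_50(c_bl_per_cell, column_height, vdd, i_read):
    """First-order 50% discharge time of a bit line, assuming a constant read current."""
    c_column = c_bl_per_cell * column_height
    return c_column * (0.5 * vdd) / i_read


def delay_difference_percent(c_tcad, c_fs, column_height, vdd, i_read):
    """Relative delay difference (TCAD vs FS); equals the capacitance difference here."""
    t_tcad = bitline_delay_50(c_tcad, column_height, vdd, i_read)
    t_fs = bitline_delay_50(c_fs, column_height, vdd, i_read)
    return 100.0 * (t_tcad - t_fs) / t_fs


if __name__ == "__main__":
    # Hypothetical: 0.20 fF/cell (TCAD) vs 0.22 fF/cell (FS), VDD = 1 V, I_read = 20 uA.
    t = bitline_delay_50(0.20e-15, 32, 1.0, 20e-6)
    d = delay_difference_percent(0.20e-15, 0.22e-15, 32, 1.0, 20e-6)
    print(f"t_50 = {t * 1e12:.1f} ps, delay difference = {d:+.1f}%")
```

In the actual bitcell, the read current depends on the FET Vth shifts swept in Fig. 5.6 and on the bit line voltage itself, which is why the SPICE results show delay differences that vary with normalized bitcell sigma.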
Fig. 5.7 shows the effect of using FS and TCAD bit and word line capacitances
in SPICE simulations to compute cell read stability, which is defined as the
maximum bitcell sigma variation that can be tolerated before read upsets occur,
i.e., before bit line charge leaking into the internal cell nodes is sufficient to flip
the bitcell state during a read operation. The normalized read stability for Type I
and II cells is lower for the FS 1×1 case. This changes with the 3×1 case, with the
TCAD prediction being higher for Type I and lower for Type II.
Figure 5.7: Type I and Type II Read stability (TCAD vs. FS)
5.1.4 Section summary
In this work, a detailed comparison between transport analysis based 3D-TCAD
and FS based capacitance extraction for sub-32nm SRAM blocks of varying cell
counts and port configurations was performed. Simulation results from these
structures showed differences of up to 11% (38%) in bit line (word line) capacitances,
which arise because transport in FEOL silicon is not properly accounted
for in the pure electrostatic approach. Also, the inadequacy of single cell TCAD
modeling (which leads to inaccurate performance and stability estimates) was
quantified from multi-cell extractions.
5.2 Hardware-assisted predictive capacitance extraction in 32nm SOI 6T SRAMs
In order to demonstrate the efficacy of structure synthesis combined with transport
analysis based extraction, we validated the methodologies with hardware data ob-
tained from a 32nm SOI process [145]. This is discussed next.
5.2.1 Introduction
Macros consisting of two flavors of thin-cell 6T SRAM arrays, namely 6T1 and 6T2,
were fabricated in an experimental IBM 32nm SOI HKMG technology (Fig. 5.8).
Figure 5.8: Thin-cell 6T SRAM array SEM top view showing HKMG n-/p-FETs
Using test structures for total bit line capacitance extraction, we obtained data
samples from 61 wafers at 10 locations per wafer. Intra-wafer measurements in-
dicated very little variation in CBL for both 6T1 and 6T2 (Fig. 5.9). However, the
inter-wafer CBL spread, shown in Fig. 5.10, was astonishingly high, with the max-
imum CBL being 56% higher than the minimum for 6T1. For 6T2, similar results
were obtained with a maximum to minimum spread of 47%. In order to pinpoint
the major source of the spread, it was essential to determine whether it originated
from FEOL or BEOL processing.

Figure 5.9: Measured intra-wafer CBL for (a) 6T1, and (b) 6T2
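Separating intra-wafer from inter-wafer variation, as in Figs. 5.9 and 5.10, can be sketched as follows. The sketch assumes measurements grouped by wafer. Since the text does not state whether the 56% (47%) spread is computed over individual sites or over wafer means, the sketch reports the max-to-min spread of wafer means as one reasonable choice. The data in the usage are hypothetical.

```python
import statistics


def wafer_variation(cbl_by_wafer):
    """Summarize intra- vs inter-wafer variation of CBL.

    cbl_by_wafer: dict wafer_id -> list of CBL measurements (one per site).
    Returns (mean intra-wafer coefficient of variation,
             inter-wafer max-to-min spread of wafer means in percent).
    """
    intra_cv = statistics.mean(
        statistics.pstdev(sites) / statistics.mean(sites)
        for sites in cbl_by_wafer.values()
    )
    wafer_means = [statistics.mean(sites) for sites in cbl_by_wafer.values()]
    lo, hi = min(wafer_means), max(wafer_means)
    return intra_cv, 100.0 * (hi - lo) / lo


if __name__ == "__main__":
    # Hypothetical normalized CBL data: tight within each wafer, wide across wafers.
    data = {
        "W01": [1.00, 1.01, 0.99],
        "W02": [1.20, 1.21, 1.19],
        "W03": [1.50, 1.49, 1.51],
    }
    cv, spread = wafer_variation(data)
    print(f"mean intra-wafer CV = {cv:.3%}, inter-wafer spread = {spread:.1f}%")
```

A small intra-wafer coefficient of variation combined with a large spread of wafer means is the signature described in the text: the dominant variation acts wafer-to-wafer, which points to a process step applied per wafer rather than to local effects.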
5.2.2 Methodology and results
We performed iterative BEOL analysis (IBA) and iterative FEOL analysis (IFA)
using the structure synthesis approach. We started with IBA, where back-end
process assumptions and tolerances were obtained from scanning electron
microscope (SEM) snapshots and were utilized to generate several 6T1 BEOL
instances with varying metal, via, contact, and poly heights using the BEOL
synthesizer. These were combined with identical nominal FEOL instances to
generate integrated (FEOL+BEOL) instances, such as the one shown in Fig. 5.11,
using the (FEOL+BEOL) synthesizer. From Fig. 5.12, we see that inter-wafer
variation in BEOL parameters, which were subject to tight tolerances, changed CBL
and CWL by only a few percent and therefore failed to explain the large spread in
CBL.

Figure 5.10: Measured inter-wafer CBL for (a) 6T1, and (b) 6T2

Figure 5.11: Synthesized (FEOL+BEOL) structure for the 6T1 SRAM bitcell
Next, we moved to IFA. FEOL process assumptions were corroborated from
measured CGS-VGS data [Fig. 5.13(a)] and capacitance extraction simulations on
the multi-finger nMOS/pMOS capacitor test structures [Fig. 5.13(b)], which were
generated from their corresponding layouts.
Figure 5.12: Effect of variation in BEOL parameters (subject to intra-wafer tolerances) on CBL and CWL for 6T1. BEOL configurations (w.r.t. base case): 1) 7.5% decrease in poly height; 2) config 1) + 10% decrease in metal-2 height; 3) config 2) + 14% decrease in contact height; 4) config 3) + 21% decrease in via-1 height; 5) 33% decrease in metal-3 width. The resulting differences in CBL and CWL range from −4.22% to +0.52%.

On the FEOL side, since junction capacitance is the major contributor to CBL, we
examined process factors, such as p-well dose, that affect junction capacitance.
Several FET process simulations were performed to obtain a variety of candidate
profiles by varying the p-well dose using the FEOL synthesizer. After synthesizing
integrated (FEOL+BEOL) instances, a characteristic curve of CBL vs. implanted
p-well dose [Fig. 5.14(a)] was constructed from transport analysis based extraction
in TCAD. From Fig. 5.14(a), we see that minor variations in the p-well dose are
sufficient to cause the CBL spread in Fig. 5.10. Thereafter, using hardware data
from 6T1, we computed the p-well dose distribution of the process [Fig. 5.15(a)];
p-well dose variation was likely to be the single largest contributor to the CBL
spread.
To validate the above, we applied the p-well dose distribution to the compan-
ion 6T array in the same process, namely 6T2, and predicted its expected CBL dis-
tribution from its characteristic curve [Fig. 5.14(b)]. Fig. 5.15(b) shows that the
measured data for 6T2 corresponds very well with the predicted CBL distribution,
suggesting that FEOL bit line junction capacitance was indeed the source of CBL
variation. These observations demonstrate that our methodology is very useful for
determining the sources of variation, and that it is versatile enough to predict the
CBL distributions of other layouts in a process (already fabricated or yet to be
fabricated), once the process has been characterized.
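The two-step procedure of Figs. 5.14 and 5.15 can be sketched as follows: invert the 6T1 characteristic curve to turn measured CBL samples into p-well dose samples, then push those doses through the 6T2 curve. The 6T1 curve uses the cubic fit reported in Fig. 5.14(a), y = 0.032x^3 + 0.021x^2 + 0.045x + 2.3, with x = log10(normalized p-well dose) and y = log10(normalized CBL). Its derivative 0.096x^2 + 0.042x + 0.045 has a negative discriminant, so the curve is strictly increasing and can be inverted by bisection. The 6T2 fit is not given in the text, so the usage passes a hypothetical placeholder curve, and the measured samples are also hypothetical.

```python
import math


def curve_6t1(x):
    """Fitted 6T1 characteristic curve from Fig. 5.14(a): log10 dose -> log10 CBL."""
    return 0.032 * x**3 + 0.021 * x**2 + 0.045 * x + 2.3


def invert_monotonic(curve, y, lo=-2.0, hi=0.5, tol=1e-12):
    """Bisection for x in [lo, hi] with curve(x) = y; curve must be increasing."""
    if not curve(lo) <= y <= curve(hi):
        raise ValueError("value lies outside the characterized dose range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if curve(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def doses_from_cbl(cbl_samples, curve=curve_6t1):
    """Map measured normalized CBL samples to normalized p-well dose samples."""
    return [10 ** invert_monotonic(curve, math.log10(c)) for c in cbl_samples]


def predict_cbl(dose_samples, curve):
    """Map p-well dose samples through another macro's characteristic curve."""
    return [10 ** curve(math.log10(d)) for d in dose_samples]


if __name__ == "__main__":
    measured_6t1 = [150.0, 180.0, 200.0, 210.0]  # hypothetical normalized CBL

    def curve_6t2(x):
        # Hypothetical stand-in for the (unreported) fit of Fig. 5.14(b).
        return curve_6t1(x) - 0.05

    doses = doses_from_cbl(measured_6t1)
    print([f"{c:.1f}" for c in predict_cbl(doses, curve_6t2)])
```

The range check in `invert_monotonic` matters: dose values outside the x-range of Fig. 5.14(a) would require extrapolating the cubic fit beyond the simulated candidates.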
Figure 5.13: (a) Measured vs. simulated CGS-VGS data for the nMOS capacitor structure in Fig. 5.13(b) with width 1µm × 2 fingers, and (b) multi-finger (FEOL+BEOL) nMOS capacitor structure
5.2.3 Section summary
In this section, a hardware-assisted, unified 3D-TCAD based capacitance extraction
methodology was validated using two companion 6T SRAM macros in an IBM 32nm
SOI HKMG process. It helped isolate an FEOL component, namely junction
capacitance, as a dominant factor affecting total bit line capacitance variation
across wafers. This was done by using hardware data from one SRAM macro to
compute the p-well dose distribution of the process, subsequently applying it to
the companion 6T SRAM macro, and comparing the result with measured data for
the latter.

Figure 5.14: CBL variation with p-well dose for (a) 6T1, and (b) 6T2 [Log10(normalized CBL) versus Log10(normalized p-well dose); the fitted characteristic curve in (a) is y = 0.032x^3 + 0.021x^2 + 0.045x + 2.3]
Figure 5.15: (a) P-well dose distribution computed from the measured 6T1 CBL distribution [Fig. 5.10(a)] and the characteristic curve of 6T1 [Fig. 5.14(a)], and (b) measured vs. predicted distribution for 6T2. The characteristic curve of 6T2 [Fig. 5.14(b)] along with the computed p-well dose distribution [Fig. 5.15(a)] is used to compute the 6T2 CBL distribution. BEOL variation is not considered.
5.3 Transport analysis based parasitic capacitance extraction in emerging multi-gate devices and circuits
In order to perform timing analysis for multi-gate circuits accurately and determine
the most efficient operating point, it is essential to capture parasitic resistances
and capacitances accurately. With narrow design windows, overestimation of
parasitics results in excessive guardbands, which degrade performance and can
negate the benefits of migrating to a new technology node. On the other hand,
underestimation of parasitics, coupled with process variations, causes timing
failures and yield loss. In this section, we delve into transport analysis based
capacitance extraction for layouts having multi-gate FETs at the 22/14/10nm nodes.
5.3.1 Introduction
The nonplanar nature of multi-gate FETs leads to width quantization, where tran-
sistor electrical widths are integer multiples of individual fin electrical widths that
are determined by the fin height and fin thickness. Width quantization and the
3D nature of active multi-gate FET regions pose problems for extraction of FEOL
parasitic capacitances in arbitrary structures (e.g., multi-fin, multiple-finger multi-
gate FETs), thereby making it cumbersome to develop a generic, unified compact
model that can precisely capture FEOL capacitances. The latter is due to the fact
that compact models are predominantly based on 2D cross-section assumptions.
Calculation of FEOL capacitances is nontrivial on account of the gate-source/drain
fringe capacitances that are present along each fin in a multi-fin multi-gate FET.
Here, unlike in planar FETs, total fringe capacitance is not amortized to a negligible
value per unit width as the electrical width increases. With increased proximity
of BEOL metal/vias/contacts to FEOL features, (FEOL+BEOL) capacitance extraction
in multi-gate circuits becomes a significant issue, and FEOL-BEOL interactions must
be accounted for comprehensively [133].
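Width quantization can be illustrated with the nominal 22nm parameters of Table 5.1 (HFIN = 40 nm, TSI = 10 nm). The sketch assumes a tri-gate-like fin whose electrical width is 2·HFIN + TSI (both sidewalls plus the top); for a double-gate fin with an ungated top, the top term would be dropped. These conduction assumptions are for illustration only and are not taken from the text.

```python
def fin_electrical_width_nm(h_fin_nm, t_si_nm, top_gate=True):
    """Electrical width of one fin: two sidewalls, plus the top surface if gated."""
    return 2.0 * h_fin_nm + (t_si_nm if top_gate else 0.0)


def available_widths_nm(h_fin_nm, t_si_nm, max_fins, top_gate=True):
    """Transistor widths are integer multiples of the single-fin width."""
    w_fin = fin_electrical_width_nm(h_fin_nm, t_si_nm, top_gate)
    return [n * w_fin for n in range(1, max_fins + 1)]


def fins_for_target_width(target_w_nm, h_fin_nm, t_si_nm, top_gate=True):
    """Closest fin count to a planar-style target width (at least one fin)."""
    w_fin = fin_electrical_width_nm(h_fin_nm, t_si_nm, top_gate)
    return max(1, round(target_w_nm / w_fin))


if __name__ == "__main__":
    # Nominal 22nm values from Table 5.1: HFIN = 40 nm, TSI = 10 nm (tri-gate assumption).
    print(available_widths_nm(40, 10, max_fins=4))
    n = fins_for_target_width(200, 40, 10)
    print(n, "fins ->", n * fin_electrical_width_nm(40, 10), "nm")
```

Under these assumptions, a 200 nm target width cannot be realized exactly; the designer must choose between 180 nm (two fins) and 270 nm (three fins), and each added fin also brings its own gate-source/drain fringe capacitance.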
In this section, we address the above problems of FEOL/(FEOL+BEOL)
multi-gate capacitance extraction via a 3D-TCAD transport analysis based ap-
proach [146]. The rest of this section is organized as follows. In Section 5.3.2,
we discuss prior related work in multi-gate parasitics extraction. We examine
the sensitivity of device-level parasitic capacitances to various process param-
eters in candidate process-simulated bulk and SOI FinFETs at the 22/14/10nm
nodes in Section 5.3.3. In Section 5.3.4, we describe a unified 3D-TCAD flow for
the extraction of FEOL/(FEOL+BEOL) capacitances in generic multi-gate circuit
layouts by leveraging automated structure synthesis algorithms developed in
Chapter 4, thereby circumventing the complexity barriers posed by 3D process
simulations. Here, using the process-simulated devices from Section 5.3.3, we
compute circuit-level parasitic capacitances for 6T multi-gate SRAMs having dif-
ferent fin pitch, gate pitch, and fin count configurations, thereby providing critical
insight into bit line/word line/internal node FEOL/(FEOL+BEOL) capacitance
trends. In Section 5.3.5, we show that traditional segregated FEOL/BEOL mod-
eling approaches fail to accurately predict transient behavior, by back-annotating
3D-TCAD-extracted capacitances into mixed-mode write simulations of a 6T
FinFET SRAM bitcell. Here, we also examine the relative importance of accurately
modeling device transport versus parasitics, using propagation delay simulations
of FinFET NAND2 logic gates as an example. Finally, we conclude in Section 5.3.6.
5.3.2 Related work
With the advent of nonplanar multi-gate FETs, FEOL device design and optimization
have garnered significant attention from a parasitics perspective. Mitigation of
parasitic resistances/on-current (ION) enhancement via design and process
modifications, such as elevated source/drain extensions, usage of stress liners,
strained SOI, and doping profile optimization, has been explored in [147–153]. The effects
of fringe capacitances on device performance have been examined via 3D simu-
lations in [154], while an analysis of geometry-dependent parasitic capacitances
in multi-fin FinFETs is provided in [155]. RC-delay optimization (with fin pitch,
gate pitch, fin height, and fin thickness as parameters) in highly scaled multi-fin
FinFETs has been studied in [148, 156]. Experimental, aggressively-scaled FinFET
SRAM bitcells/arrays with tight fin/gate pitches and their design challenges have
been explored at the 32/22/10nm nodes in [73, 152, 157–160]. In comparison to
modeling parasitic resistances, capturing parasitic capacitances for arbitrary
multi-gate layouts remains a major challenge to date. This is addressed via a holistic
3D-TCAD flow in the current work. In the next section, we examine the sensitiv-
ity of device-level parasitic capacitances to physical parameters used in FET-level
process simulations.
5.3.3 Multi-gate device-level parasitics
In this section, we examine parasitic capacitances in single-fin FinFETs. We per-
formed 3D process simulations in Sentaurus Process [136] in order to generate bulk
and SOI FinFET structures (such as those shown in Fig. 5.16) at the 22/14/10nm
nodes, using the parameters shown in Table 5.1. The physical dimensions (and
their ranges, shown in brackets in Table 5.1) were obtained from a combination of
candidate device configurations that have been investigated either experimentally
or via device simulations in [72, 148, 151, 153, 156, 160–164].
In Table 5.1, LG, effective TOX, HGATE, LSP, TSI, HFIN, HELEV, LDL, NCH, and NSD are
the physical gate length, front/back-gate effective oxide thickness, gate height
above fin, spacer thickness, fin width, fin height, source/drain elevation above
fin, source/drain doping decay length, channel doping, and source/drain doping
concentrations, respectively. The above are common to both bulk and SOI FinFETs.
TSTI and NSTOP are the shallow-trench isolation (STI) depth and channel stop implant
concentration, respectively, and are applicable to bulk FinFETs, while TBOX
is the buried oxide thickness in SOI FinFETs.

Figure 5.16: (a) Bulk FinFET, and (b) SOI FinFET [labeled regions: gate, raised source/drain, active fin, channel stop implant in the bulk fin, buried oxide]

Table 5.1: Bulk and SOI FinFET device parameters (nominal value, with the explored range in brackets)

Parameter               22nm          14nm          10nm
Bulk and SOI FinFETs
  LG (nm)               24 [20–25]    14 [14–18]    10 [10–12]
  Effective TOX (nm)    1.1           0.9           0.7
  HGATE (nm)            40            40            40
  LSP (nm)              8             8             8
  TSI (nm)              10 [10–12]    8 [8–10]      6 [4–7]
  HFIN (nm)             40 [24–50]    30 [14–40]    20 [10–30]
  HELEV (nm)            20            20            20
  LDL (nm)              1.5           1.5           1.5
  NCH (cm^-3)           10^15         10^15         10^15
  NSD (cm^-3)           10^20         10^20         10^20
Bulk FinFETs only
  TSTI (nm)             80            80            80
  NSTOP (cm^-3)         3×10^18       3×10^18       3×10^18
SOI FinFETs only
  TBOX (nm)             240           240           240

In Fig. 5.17, we highlight the major
steps of the ‘gate-last’ process for bulk FinFETs [138], which involves fin defini-
tion, STI and high-k formation, followed by poly gate and spacer formation. Next,
source/drain epitaxy is performed, followed by poly gate removal and metal gate
deposition, and finally, the contact vias are formed. A similar gate-last process
is employed for SOI FinFETs as well. Thereafter, we perform capacitance extrac-
tion at zero-bias conditions, by importing the single-fin structures into Sentaurus
Device [136], in order to determine off-state capacitances.
Figure 5.17: Bulk FinFET ‘gate-last’ process simulation steps [fin formation → STI + high-k formation → poly gate formation → spacer formation → source/drain epitaxy → poly gate removal → metal gate deposition → contact via formation]
We investigated the dependence of the total drain capacitance (CDRAIN,TOT) and
total gate capacitance (CGATE,TOT) on the various physical parameters listed in
Table 5.1; the results are shown in Figs. 5.18, 5.19, 5.20, and 5.21. Each process
parameter was perturbed around its nominal value, leading to different ranges for
different devices, depending on the technology node. From Figs. 5.18(a) and (b),
Figs. 5.19(c) and (d), and Figs. 5.20(c) and (d), we observe that at all three
technology nodes, CDRAIN,TOT and CGATE,TOT are relatively immune to changes in LG,
TSI, and HELEV, with a 5-10% change in off-state capacitance for the chosen windows
around the nominal values. From Figs. 5.18(c) and (d), and Figs. 5.20(a) and (b),
we see that CDRAIN,TOT and CGATE,TOT scale linearly with HGATE and HFIN, suggesting
that parallel-plate-like gate-drain/source-contact capacitances dominate. From
Figs. 5.19(a) and (b), we see that CDRAIN,TOT and CGATE,TOT are extremely sensitive
to the spacer length LSP, where an 8nm increase in LSP is sufficient to halve the
capacitances. In Figs. 5.21(a) and (b), NCH is seen to have a negligible impact on
CDRAIN,TOT and CGATE,TOT, which suggests that zero-bias depletion capacitances in
the fin drain-body junction are negligible. From Figs. 5.21(c) and (d), CDRAIN,TOT
and CGATE,TOT are seen to be moderately affected by small variations in LDL (which
determines the device underlap/overlap).

Figure 5.18: Dependence of CDRAIN,TOT and CGATE,TOT on LG and HGATE [panels (a), (b): CDRAIN,TOT and CGATE,TOT (aF) vs. LG (µm); panels (c), (d): vs. HGATE (µm); curves for bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes]

Figure 5.19: Dependence of CDRAIN,TOT and CGATE,TOT on LSP and TSI [panels (a), (b): CDRAIN,TOT and CGATE,TOT (aF) vs. LSP (µm); panels (c), (d): vs. TSI (µm); same device set as Fig. 5.18]
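The one-parameter-at-a-time perturbation described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the scripts used in this work: it only enumerates device configurations around the 22nm nominal point of Table 5.1, sweeping the three parameters that carry bracketed ranges there. Process simulation and capacitance extraction of each configuration are not modeled.

```python
from typing import Dict, List, Tuple

# Nominal 22nm values from Table 5.1 (nm). Only LG, TSI, and HFIN have
# bracketed ranges in the table, so only those three are swept here.
NOMINAL_22NM: Dict[str, float] = {
    "LG": 24.0, "TSI": 10.0, "HFIN": 40.0,
    "HGATE": 40.0, "LSP": 8.0, "HELEV": 20.0, "LDL": 1.5,
}
RANGES_22NM: Dict[str, Tuple[float, float]] = {
    "LG": (20.0, 25.0), "TSI": (10.0, 12.0), "HFIN": (24.0, 50.0),
}


def linspace(lo: float, hi: float, n: int) -> List[float]:
    """n evenly spaced points from lo to hi, both ends included."""
    if n < 2:
        raise ValueError("need at least two points")
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def one_at_a_time_sweep(nominal: Dict[str, float],
                        ranges: Dict[str, Tuple[float, float]],
                        points_per_param: int = 5) -> List[Tuple[str, Dict[str, float]]]:
    """Return (swept_parameter, device_config) pairs.

    Each config moves exactly one parameter across its window while every
    other parameter stays at its nominal value.
    """
    configs = []
    for name, (lo, hi) in ranges.items():
        for value in linspace(lo, hi, points_per_param):
            cfg = dict(nominal)
            cfg[name] = value
            configs.append((name, cfg))
    return configs


if __name__ == "__main__":
    sweep = one_at_a_time_sweep(NOMINAL_22NM, RANGES_22NM)
    print(len(sweep), "device configurations")  # 3 parameters x 5 points
    for name, cfg in sweep[:5]:
        print(name, cfg[name])
```

Holding all other parameters at nominal keeps each curve attributable to a single knob, which matches how Figs. 5.18 to 5.21 are organized; it does not capture interactions between parameters.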
In addition to the above, we found CDRAIN,TOT and CGATE,TOT to be only weakly
susceptible to minor variations in NSTOP, TSTI, TBOX, and the facet angle of the
raised source/drain regions. From Figs. 5.18, 5.19, 5.20, and 5.21, we see that
single-fin 22nm bulk FETs have only marginally higher CDRAIN,TOT and CGATE,TOT than
their SOI counterparts, while at the 14nm/10nm nodes, the difference is negligible.
Overall, CDRAIN,TOT decreases by 16% (44%) when moving from the 22nm to 14nm FETs
(14nm to 10nm FETs), while CGATE,TOT decreases by 18% (51%), respectively. The above
analysis (which is based on single-fin structures) suggests that the maximum
reduction in FEOL off-state capacitances would occur when switching from the 14nm
to 10nm devices, for the design points chosen in Table 5.1.
Figure 5.20: Dependence of CDRAIN,TOT and CGATE,TOT on HFIN and HELEV [panels (a), (b): CDRAIN,TOT and CGATE,TOT (aF) vs. HFIN (µm); panels (c), (d): vs. HELEV (µm); curves for bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes]

Figure 5.21: Dependence of CDRAIN,TOT and CGATE,TOT on NCH and LDL [panels (a), (b): CDRAIN,TOT and CGATE,TOT (aF) vs. NCH (10^15 to 10^18 cm^-3); panels (c), (d): vs. LDL (µm); same device set as Fig. 5.20]
In the next section, we compute parasitic capacitances in multi-fin FinFETs and 6T
SRAMs to explore circuit-level trends.
5.3.4 Multi-gate circuit-level parasitics
In this section, we explain the challenges involved in multi-gate circuit-level para-
sitics extraction and demonstrate a pragmatic 3D-TCAD based solution applied to
multi-fin multi-gate FETs and 6T multi-gate SRAMs.
Methodology
Estimation of parasitic capacitances in multi-gate circuits is beset with two
major problems: FEOL extraction for generic FET layout configurations and
(FEOL+BEOL) extraction for arbitrary metal layout configurations above the
FEOL active regions. Traditionally, the FEOL component of capacitance is com-
puted via TCAD simulations on 2D/3D geometries of select configurations (as
in Section 5.3.3), while the BEOL component is captured using accelerated field
solvers [139–141]. As FEOL devices shrink with each subsequent technology node,
BEOL metals/contacts/vias also get smaller, resulting in metal features that are
close to the active FET regions. Thus, it is difficult to accurately determine the total
capacitance at each node of the circuit using a segregated FEOL/BEOL approach,
as it would not comprehensively account for geometry-specific FEOL-BEOL inter-
actions [133]. The development of unified, layout-aware compact models that can
accurately compute FEOL capacitances in multi-gate circuits with widely varying
fin counts, fin/gate pitches in multi-finger FETs, FETs with shared source/drains
with different fin counts on either side, etc., is also a cumbersome problem.
The above present challenges for process-device-circuit co-design during the
early phases of technology development, where no straightforward methodology
exists to directly determine the effect of process modifications on circuit-level
parasitics. Owing to the small absolute capacitances involved, modeling errors on
the order of a few tens of aF per fin, which would be negligible in devices at older
(larger) technology nodes, can easily translate into a large percentage difference
in predicted capacitances, and lead to erroneous timing estimates in multi-gate circuits.
Transport analysis based 3D-TCAD capacitance extraction offers a plausible
solution, as it can account for transport in FEOL device regions, and capture
FEOL-BEOL interactions accurately by treating active regions as semiconductors
in (FEOL+BEOL) structures. However, the traditional approach (as shown in
Fig. 5.22(a)) is plagued by the intractable time/memory complexity of 3D process
simulation of large layouts (for the generation of accurate geometries prior to
device simulation), which dramatically limits its scope.
In our work, we circumvented the 3D process simulation barrier by lever-
aging the automated structure synthesis methodology, whose multi-gate ver-
sion is outlined in Fig. 5.22(b). As before, the approach involves a one-time
process-simulation cost for the construction of a multi-gate FET database con-
sisting of n-/p-FinFETs at each technology node. Thereafter, with the aid of an
FEOL/(FEOL+BEOL) multi-gate structure synthesizer (which is equipped with a
layout analyzer/partitioner), the FEOL/(FEOL+BEOL) structure corresponding
to any input multi-gate circuit layout is synthesized automatically, using the
device database and FEOL/BEOL process assumptions. This enables a crucial
modeling trade-off, where process-level accuracy is preserved in regions in and
around active FETs, while providing very favorable time/memory scaling prop-
erties, thereby extending its reach beyond simple layouts. It also permits iterative
optimization for a large number of layouts in a practical timeframe. We harness
the setup in Fig. 5.22(b) to analyze multi-fin multi-gate FETs and 6T multi-gate
SRAMs in subsequent parts of this section.
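The scaling argument above rests on paying the process-simulation cost once per device type and reusing the result for every FET instance in a layout. A minimal Python sketch of that caching idea follows. The names FetInstance, DeviceDatabase, and synthesize are hypothetical. `simulate` is only a stand-in for the real one-time 3D process simulation, and BEOL synthesis, layout partitioning, and geometry merging are deliberately omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FetInstance:
    """One FET in the input layout (hypothetical, minimal description)."""
    name: str
    polarity: str  # "n" or "p"
    node: str      # technology node label, e.g. "22nm"
    x: float       # placement origin in the layout (nm)
    y: float


@dataclass
class DeviceDatabase:
    """Caches one process-simulated device per (node, polarity) key."""
    calls: int = 0
    _cache: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def simulate(self, node: str, polarity: str) -> str:
        # Stand-in for the expensive one-time 3D process simulation;
        # it only counts invocations and returns a label.
        self.calls += 1
        return f"{polarity}-FinFET@{node}"

    def get(self, node: str, polarity: str) -> str:
        key = (node, polarity)
        if key not in self._cache:
            self._cache[key] = self.simulate(node, polarity)
        return self._cache[key]


def synthesize(layout: List[FetInstance],
               db: DeviceDatabase) -> List[Tuple[str, str, float, float]]:
    """Place a cached device structure at every FET location (one lookup per FET)."""
    return [(fet.name, db.get(fet.node, fet.polarity), fet.x, fet.y) for fet in layout]


if __name__ == "__main__":
    layout = [FetInstance(f"M{i}", "n" if i % 2 else "p", "22nm", 50.0 * i, 0.0)
              for i in range(1000)]
    db = DeviceDatabase()
    placed = synthesize(layout, db)
    print(len(placed), db.calls)  # 1000 2
```

A 1000-FET layout triggers only two "process simulations", one per polarity, which illustrates the k(N)/h(N) versus f(N)/g(N) contrast in Fig. 5.22 in its simplest form.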
Figure 5.22: 3D-TCAD based capacitance extraction for generic multi-gate circuit layouts: (a) traditional approach using brute-force process simulation, and (b) our flow, which leverages the automated structure synthesis approach [(a): generic layout (N devices) → litho simulation and mask generation → 3D process simulation of the entire layout using detailed mechanisms/kinetics under the given process conditions → process-simulated structure → transport-analysis-based 3D-TCAD capacitance extraction, with layout/process parameters tweaked iteratively; complexity bottleneck: time f(N), memory g(N). (b): single-FET layout + process recipe → process simulator → process-simulated FETs stored in a device database; generic layout (N devices) + device database + FEOL/BEOL process assumptions → FEOL/(FEOL+BEOL) multi-gate structure synthesizer → FEOL and (FEOL+BEOL) structures → transport-analysis-based 3D-TCAD FEOL/(FEOL+BEOL) capacitance extraction; time k(N), memory h(N); scalable to large layouts]
Parasitic capacitances in multi-fin multi-gate FETs
Owing to the width quantization property, multi-gate FETs with large electrical
widths need to have multiple fins. We synthesized multi-fin FinFETs using the
bulk and SOI FinFETs generated earlier at the 22/14/10nm nodes. They con-
sisted of four fins each, with shared raised source/drain epi-regions that are
via-contacted and connected using metal-1, as shown in Figs. 5.23(a) and (b). We
varied the fin pitch, FP, and computed the parasitic (FEOL+BEOL) capacitances
for each layout using the setup described in Fig. 5.22(b).
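Width quantization can be made concrete with a short calculation. The sketch below assumes the common tri-gate approximation that one fin contributes an electrical width of about 2·HFIN + TSI (two sidewalls plus the top surface). This formula is an assumption for illustration, not the width definition used in this work. With the nominal 22nm values of Table 5.1 (HFIN = 40nm, TSI = 10nm), each fin contributes roughly 90nm, so a hypothetical 300nm target width must be rounded up to four fins.

```python
import math


def fin_width_nm(h_fin: float, t_si: float) -> float:
    """Approximate electrical width of one fin: two sidewalls plus the top.

    Assumption: W_fin ~= 2*HFIN + TSI (tri-gate approximation). For a
    double-gate fin with a thick top hard mask, 2*HFIN would be used instead.
    """
    return 2.0 * h_fin + t_si


def fins_needed(target_width_nm: float, h_fin: float, t_si: float) -> int:
    """Smallest integer fin count whose total width meets the target."""
    return math.ceil(target_width_nm / fin_width_nm(h_fin, t_si))


def quantized_width_nm(n_fins: int, h_fin: float, t_si: float) -> float:
    """Electrical width actually delivered by n_fins identical fins."""
    return n_fins * fin_width_nm(h_fin, t_si)


if __name__ == "__main__":
    # Nominal 22nm values from Table 5.1: HFIN = 40nm, TSI = 10nm.
    n = fins_needed(300.0, 40.0, 10.0)
    print(n, quantized_width_nm(n, 40.0, 10.0))  # 4 360.0
```

The rounding up is the quantization: the delivered width (360nm in this example) overshoots the 300nm target, and widths between integer fin counts are simply unavailable.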
Figure 5.23: Multi-fin FinFET (a) bulk, and (b) SOI structures [labeled terminals: gate, drain, source]. Dielectric regions are not shown
From Fig. 5.24(a), we can see that the trends in CDRAIN,TOT are in stark contrast
to the single-fin results in Section 5.3.3. When moving from SOI to bulk FETs,
there is an 11.5%, 10.8%, and 8.8% increase in CDRAIN,TOT for the 22nm, 14nm, and
10nm nodes, respectively, which can be attributed to the shared drain-to-bulk fin
capacitances in bulk FETs. However, in the case of CGATE,TOT [Fig. 5.24(b)], there is
only a 2-4% increase from SOI to bulk FETs. An increase in FP from 40nm to 70nm
results in a 20%, 31%, and 36% increase in CDRAIN,TOT for the 22nm, 14nm, and 10nm
nodes, respectively, while CGATE,TOT increases by 16%, 26%, and 28%, respectively.
These results suggest that gate-to-epi-source/drain/metal-1 capacitances begin to
dominate as FP increases and as the technology node shrinks, and they highlight
the need to model the entire (FEOL+BEOL) structure.
Figure 5.24: Dependence of CDRAIN,TOT and CGATE,TOT on FP [(a) CDRAIN,TOT (aF) and (b) CGATE,TOT (aF) vs. fin pitch FP (40-70nm) for four-fin bulk and SOI FinFETs at the 22nm, 14nm, and 10nm nodes]
Parasitic capacitances in 6T multi-gate SRAMs
Since SRAMs are among the densest circuits manufactured at any technology
node, we examine their multi-gate variants in detail. As with highly scaled planar
6T SRAMs, a delicate balance needs to be maintained between static/dynamic
readability and writeability metrics in multi-gate 6T SRAMs as well. Here, a key
challenge is to accurately determine the bit line (CBL,TOT, CBLB,TOT), word line
(CWL,TOT), and internal node (CNL,TOT, CNR,TOT) capacitances, so that they can be
back-annotated into SPICE/mixed-mode TCAD simulations that capture dynamic
stability metrics. Figs. 5.25(a) and 5.26(a) show the synthesized 3D (FEOL+BEOL)
geometries for bulk and SOI FinFET SRAM bitcells, while Figs. 5.25(b) and 5.26(b)
show their corresponding FEOL counterparts. The bitcell names are based on the
pull-up (PU), pass-gate (PG), and pull-down (PD) fin counts. Hence, Figs. 5.25
and 5.26 represent the 6T SRAM (111) bulk and SOI configurations, respectively. From
Figs. 5.25(a) and 5.26(a), we can see that both bitcells are designed with metal-3
word lines (WLs), metal-2 bit lines (BL and BLB), and metal-2 ground (GND) and
supply (VDD) lines, which is also the convention followed in all the bitcell
configurations that follow. For the FEOL structures in Figs. 5.25(b) and 5.26(b), all
back-end features, including contacts to the epi-raised source/drain regions, are
absent. The connected gates are preserved to capture WL and internal node (NL and NR)
connectivity.
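In the discussion that follows, a total node capacitance such as CBL,TOT and a single coupling term such as CBL,WL play different roles. A natural reading, assumed here rather than stated by the extraction flow, is that the total is the sum of all coupling capacitances from one net to every other net, including ground. The short Python sketch below illustrates that bookkeeping. The coupling values in its usage lines are made up and are not extracted data.

```python
from typing import Dict, Tuple


def total_capacitance(net: str, coupling: Dict[Tuple[str, str], float]) -> float:
    """Sum every coupling capacitance that touches `net`.

    `coupling` stores each unordered net pair exactly once, in either order;
    a pair such as ("GND", "BL") represents the capacitance to ground.
    """
    total = 0.0
    for (a, b), c in coupling.items():
        if a == b:
            raise ValueError(f"self-pair {a!r} is not a coupling capacitance")
        if net in (a, b):
            total += c
    return total


if __name__ == "__main__":
    # Made-up values in aF, for illustration only.
    coupling = {("BL", "WL"): 30.0, ("BL", "NL"): 10.0,
                ("GND", "BL"): 60.0, ("WL", "NL"): 20.0}
    print(total_capacitance("BL", coupling))  # 100.0
    print(coupling[("BL", "WL")])             # 30.0: one term, analogous to CBL,WL
```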
Figure 5.25: Bulk FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only [labeled: WL, BL, BLB, VDD, GND, internal nodes NL and NR; pull-down and pass-gate n-FinFETs, pull-up p-FinFETs]. Dielectric regions are not shown

Figure 5.26: SOI FinFET 6T SRAM (111) configuration (a) (FEOL+BEOL), and (b) FEOL only [same labels as Fig. 5.25]. Dielectric regions are not shown

Varying fin pitch: We computed (FEOL+BEOL) capacitances corresponding to
6T FinFET SRAM (111) layouts for various fin pitches with gate pitch GP = 90nm
(Fig. 5.27). From Fig. 5.27(a), we can see that as FP decreases, CBL,TOT increases.
Since the metal-2 BLs run vertically across the cell in the chosen configurations,
CBL,TOT is highly susceptible to fin pitch modifications. The effect is very
pronounced in all the bulk (SOI) SRAMs, which see a 31-36% (31-38%) increase in
capacitance over the simulated FP range. The plateau in CBL,TOT at FP = 50-60nm is
due to the fact that the metal-2 BL, BLB, VDD, and GND tracks are wider for
FP = 60nm and 70nm, owing to the larger pitches. CWL,TOT is affected by trends at
both high and low FP. As FP increases, the metal-3 WL gets longer and aggregates
capacitances from bitcell features below it, which increases CWL,TOT. When FP
decreases beyond a certain point (FP = 50nm), the capacitance between the WL gate
and the neighboring shared source/drain/metal-1 regions boosts CWL,TOT. Fig. 5.28
shows the FEOL capacitances for FP = 50nm. For both SOI and bulk SRAMs, FEOL
CBL,TOT, CWL,TOT, and CNL,TOT increase by 40-60% from the 10nm to 22nm nodes. The
ratio of the FEOL to the (FEOL+BEOL) component across technology nodes at
FP = 50nm for CBL,TOT, CWL,TOT, and CNL,TOT is 22-28%, 50-55%, and 76-82%,
respectively. From the above observations, we can see that FP needs to be chosen
carefully to manage CBL,TOT, CWL,TOT, and CNL,TOT.
Figure 5.27: CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. FP, GP = 90nm [panels (a)-(d): capacitance (aF) vs. fin pitch FP (40-70nm) for bulk and SOI bitcells at the 22nm, 14nm, and 10nm nodes]

Figure 5.28: FEOL components of capacitance in the 6T SRAM (111) configuration, FP = 50nm, GP = 90nm [two panels (bulk and SOI) showing FEOL CBL,TOT, CWL,TOT, and CNL,TOT (aF) at the 10nm, 14nm, and 22nm nodes]

Varying gate pitch: We also computed (FEOL+BEOL) capacitances corresponding
to 6T FinFET SRAM (111) layouts for various gate pitches with FP = 50nm
(Fig. 5.29). In Fig. 5.29(a), CBL,TOT can be seen to increase by 6-8% across the
technology nodes as GP increases from 70nm (80nm) to 100nm (110nm). This is on
account of the metal-2 BL parallel-plate-like capacitances to surrounding features,
which depend on the BL track length/bitcell height and, hence, on GP. As GP
decreases, CWL,TOT trends upward in Fig. 5.29(b), implying that at tight gate pitches,
the WL gate to BL and WL gate to internal node coupling increases. This is also
confirmed by Figs. 5.29(c) and (d), where CBL,WL increases by 11-12% and CNL,TOT
increases by 13-15% for the bulk and SOI cases when moving from GP = 100nm
(110nm) to 70nm (80nm). The latter trend is not observed for the FP cases in Fig. 5.27.
In stark contrast with the single-fin FEOL capacitance observations in Section 5.3.3,
Figs. 5.27 and 5.29 show that the maximum reduction in (FEOL+BEOL) CBL,TOT and
CWL,TOT occurs when moving from the 22nm node to the 14nm node, and is around
17% and 21%, respectively.

Figure 5.29: CBL,TOT, CWL,TOT, CBL,WL, and CNL,TOT vs. GP, FP = 50nm [panels (a)-(d): capacitance (aF) vs. gate pitch GP (70-110nm) for bulk and SOI bitcells at the 22nm, 14nm, and 10nm nodes]
Varying fin count: An important aspect of SRAM bitcell design is the set of bitcell
β ratios (βPD/PG and βPG/PU) that determine the readability and writeability metrics;
these are set by the electrical widths of the PU, PG, and PD FETs. Owing to the
width quantization property, the electrical widths of the PG/PD FinFETs can only
be increased in integer multiples of a single fin, which makes it necessary to
examine the impact of varying fin counts on bitcell capacitances. We synthesized
several different flavors of FinFET SRAMs, namely (112), (113), (122), and (123)
(shown in Figs. 5.30, 5.31, 5.32, and 5.33), using 22nm bulk and SOI FETs with
FP = 40nm and GP = 90nm. Figs. 5.34(a), (b), and (c) show the variation of CBL,TOT,
CWL,TOT, and CNL,TOT, respectively, for the different configurations. CBL,TOT
decreases by 25% when moving from the (111) to (112) configuration, as the addition
of a single fin to the PD FET adds an extra fin pitch, which permits larger spacings
between GND, BL, and VDD. While the (113) configuration adds an extra PD fin, the
PG fin count remains unchanged, so the reduction in CBL,TOT from (112) to (113) is
not significant. From the above, we can see that a 33% (66%) increase in bitcell
area, from the (111) to (112) [(111) to (113)] configuration, can reduce CBL,TOT and
increase the βPD/PG ratio significantly. The (122) and (123) bitcells have higher
CBL,TOT as the PG fin count is higher. Since metal-3 WLs run across the breadth of
the bitcell, CWL,TOT generally increases as the PD/PG fin count increases. However,
CWL,TOT decreases from the (122) to (123) configuration, as the WL gate to internal
node coupling decreases, due to the additional fin pitch spacing between them.

We also examined the FEOL capacitance trends across the different bitcell
configurations using SOI FETs (Fig. 5.35). While the addition of PD fins does not
significantly impact FEOL CBL,TOT, the addition of a PG fin leads to a 73% increase
with respect to (111). FEOL CWL,TOT decreases when moving from the (111) to (112)
configuration, as the WL gate is located away from the internal nodes (Fig. 5.30).
However, the addition of a PG fin increases CWL,TOT by 65% with respect to (111).

Figure 5.30: SOI FinFET 6T SRAM (112) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

Figure 5.31: SOI FinFET 6T SRAM (113) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

Figure 5.32: SOI FinFET 6T SRAM (122) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

Figure 5.33: SOI FinFET 6T SRAM (123) configuration (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown

Figure 5.34: CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM (FEOL+BEOL) configurations [(a) CBL,TOT, (b) CWL,TOT, and (c) CNL,TOT (aF) for the (111), (112), (113), (122), and (123) bitcells; 22nm bulk and SOI]

Figure 5.35: CBL,TOT, CWL,TOT, and CNL,TOT vs. various (PU PG PD) SRAM FEOL configurations [(a) CBL,TOT, (b) CWL,TOT, and (c) CNL,TOT (aF) for the same five bitcells; 22nm SOI]
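Because the (PU PG PD) bitcell labels encode fin counts, first-order β ratios follow directly from them. The Python sketch below assumes that every fin contributes identical width and drive, so βPD/PG ≈ nPD/nPG and βPG/PU ≈ nPG/nPU. It ignores mobility, threshold-voltage, and n/p drive-strength differences, so it indicates trends only and does not reproduce simulated β values.

```python
from fractions import Fraction
from typing import Tuple


def beta_ratios(config: str) -> Tuple[Fraction, Fraction]:
    """First-order (beta_PD/PG, beta_PG/PU) from a '(PU PG PD)' fin-count label.

    Assumes every fin contributes identical width and drive, so each ratio
    reduces to a ratio of fin counts.
    """
    digits = config.strip("()")
    if len(digits) != 3 or not digits.isdigit():
        raise ValueError(f"expected a label like '(112)', got {config!r}")
    n_pu, n_pg, n_pd = (int(d) for d in digits)
    if n_pu == 0 or n_pg == 0:
        raise ValueError("PU and PG fin counts must be nonzero")
    return Fraction(n_pd, n_pg), Fraction(n_pg, n_pu)


if __name__ == "__main__":
    for cfg in ["(111)", "(112)", "(113)", "(122)", "(123)"]:
        b_pd_pg, b_pg_pu = beta_ratios(cfg)
        print(cfg, float(b_pd_pg), float(b_pg_pu))
```

Under this approximation, (112) and (113) raise βPD/PG to 2 and 3 while leaving βPG/PU at 1, whereas (122) and (123) instead double βPG/PU. This is consistent with the text's observation that the extra PG fin is what drives up CBL,TOT.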
Modeling lithographic effects: The structures synthesized above do not take into
account lithographic rounding of printed features and, hence, our earlier setups
are likely to overestimate parasitic capacitances. In order to quantify this
overestimation, we performed simple experiments where feature rounding was
introduced into the BEOL metal layers, layer by layer (with 10nm and 6nm radii of
curvature for metal and vias, respectively), during structure synthesis, to
realistically model printed via/metal shapes. Figs. 5.36(a) and (b) show the 22nm
bulk 6T SRAM (111) BEOL metal stack without and with lithographic corner rounding.
From Figs. 5.37(a) and (b), we can see that the overestimation error is only 3-4%,
which suggests that lithographic effects can be ignored without significant loss of accuracy.
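A simple geometric estimate shows why corner rounding changes the metal footprint only modestly. Replacing each square corner of a rectangle with a quarter circle of radius r removes r² − πr²/4 of area per corner, that is, (4 − π)r² in total. The Python sketch below applies this to a hypothetical 20nm × 100nm metal segment with the 10nm radius used above. It is a footprint proxy only and does not compute capacitance.

```python
import math


def rounded_rect_area(width: float, length: float, radius: float) -> float:
    """Area of a width x length rectangle with all four corners rounded to `radius`.

    Each corner loses r^2 - pi*r^2/4, so the total loss is (4 - pi) * r^2.
    """
    if radius < 0 or 2 * radius > min(width, length):
        raise ValueError("need 0 <= 2*radius <= min(width, length)")
    return width * length - (4.0 - math.pi) * radius ** 2


def area_loss_percent(width: float, length: float, radius: float) -> float:
    """Footprint lost to corner rounding, as a percentage of the drawn area."""
    drawn = width * length
    return 100.0 * (drawn - rounded_rect_area(width, length, radius)) / drawn


if __name__ == "__main__":
    # Hypothetical 20nm x 100nm metal segment, 10nm corner radius.
    print(round(area_loss_percent(20.0, 100.0, 10.0), 2))  # 4.29
```

Even this short, fully rounded segment loses only about 4% of its drawn area, and longer lines lose proportionally less, which is in line with the small overestimation reported above.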
5.3.5 Multi-gate parasitics vs. device transport
In this section, we highlight the need to back-annotate 3D-TCAD-extracted para-
sitic capacitances into mixed-mode transient simulations, and compare the relative
importance of modeling device transport versus device parasitics.
Figure 5.36: BEOL metal stack from the 22nm 6T SRAM (111) bitcell (a) without lithography effects, and (b) with lithography effects. Dielectric regions are not shown

Figure 5.37: CBL,TOT, CWL,TOT, and CNL,TOT error percentages for (a) the bulk 6T SRAM (111) configuration at the 10nm, 14nm, and 22nm nodes, FP = 50nm, and (b) the 22nm bulk 6T SRAM (111) configuration with FP = 40, 50, 60, and 70nm. GP = 90nm. Error = (CWITH LITHO − CWITHOUT LITHO)/CWITH LITHO (%)
Back-annotating 3D-TCAD parasitics: Device-circuit co-design with FETs at
emerging technology nodes is studied using mixed-mode simulations [81]. However,
parasitic capacitances are generally ignored and rarely accounted for explicitly
[72], under the assumption that they have negligible effects. Figs. 5.38, 5.39,
and 5.40 show three relevant TCAD setups. Fig. 5.38 represents the ‘vanilla’
mixed-mode (VMM) setup, where individual devices are connected to form a circuit,
and transient simulations are performed without any additional parasitics data.
With technology scaling, under the assumption that FEOL capacitances are captured
correctly in the TCAD device cross-sections, additional BEOL capacitances can be
included explicitly, as shown in Fig. 5.39. They are obtained from field-solver (FS)
based capacitance extractions on the relevant BEOL structures, and the setup is
referred to as ‘FS parasitics + mixed-mode’ (FSMM). In the third setup (shown in
Fig. 5.40), 3D-TCAD-extracted capacitances from (FEOL+BEOL) structures are
back-annotated into the mixed-mode setup. However, to avoid double counting
contributions already accounted for in the device cross-sections, a capacitance
extraction is also performed on the mixed-mode setup itself, and the difference
between the former and the latter is included explicitly. The 3D-TCAD based
extraction/back-annotation strategy in Fig. 5.40 is enabled by the methodology
outlined in Fig. 5.22(b). This setup is referred to as ‘3D-TCAD parasitics +
mixed-mode’ (3D-TCADMM).

Figure 5.38: Vanilla mixed-mode setup (VMM) [6T bitcell with nodes BL, BLB, WL, VDD, GND, NL, and NR, simulated directly in a transient mixed-mode TCAD simulation]
Using the above, we performed mixed-mode bitcell write simulations for the
22nm SOI 6T FinFET SRAM (111) bitcell (FP = 40nm, GP = 90nm), assuming an
array column height (row width) of 32 (256) bitcells, and VDD = 1V. From Fig. 5.41,
we see that for the chosen WL pulse width of 150ps, the VMM and FSMM setups
yield writeable cells, albeit with a large difference in the NL-NR cross-over point,
whereas the 3D-TCADMM setup predicts a write failure. This shows that VMM setups
are unreliable for transient multi-gate circuit simulations. Also, the difference
between FSMM and 3D-TCADMM suggests that a segregated FEOL/BEOL modeling
approach breaks down at such highly scaled technology nodes. This is because
FEOL capacitances are not accurately captured from single-fin 2D/3D cross-sections
in the FSMM setup; e.g., SRAM bitcells with different fin/gate pitches would
effectively have the same FEOL capacitance contribution. The 3D-TCADMM setup is
able to holistically capture FEOL capacitances for any multi-gate layout
configuration, and also account for FEOL-BEOL interactions accurately.

Figure 5.39: Mixed-mode setup with FS-extracted BEOL capacitances (FSMM) [input layout → BEOL structure synthesis → FS capacitance extraction → BEOL capacitance contributions, e.g., C(BL, NL), C(BL, WL), C(BL, NR), added to the mixed-mode setup → transient mixed-mode TCAD simulation]

Figure 5.40: Mixed-mode setup with corrected 3D-TCAD capacitances (3D-TCADMM) [input layout → 3D TCAD structure synthesis → 3D TCAD capacitance extraction; mixed-mode TCAD capacitance extraction of the mixed-mode setup; mixed-mode capacitance corrections, e.g., ΔC(BL, NL), ΔC(BL, WL), ΔC(BL, NR), added to the mixed-mode setup → corrected transient mixed-mode TCAD simulation]
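The correction step of the 3D-TCADMM setup amounts to a per-node-pair subtraction. The capacitance already present in the mixed-mode device cross-sections is subtracted from the full 3D-TCAD (FEOL+BEOL) value, and only the difference is attached as an explicit capacitor. The Python sketch below illustrates that bookkeeping and emits generic SPICE-style capacitor cards (`Cname node1 node2 value`, with the value in farads). The function names and the numbers in the usage lines are hypothetical, and the actual mixed-mode tool's input syntax is not modeled. Negative corrections are rejected rather than silently dropped, since they would indicate that the cross-sections overestimate a coupling term.

```python
from typing import Dict, FrozenSet, List

NodePair = FrozenSet[str]


def pair(a: str, b: str) -> NodePair:
    """Unordered node pair, so C(BL, NL) and C(NL, BL) share one key."""
    if a == b:
        raise ValueError("a coupling capacitance needs two distinct nodes")
    return frozenset((a, b))


def correction_caps(c_full: Dict[NodePair, float],
                    c_mixed: Dict[NodePair, float]) -> Dict[NodePair, float]:
    """Delta C = C(3D-TCAD, FEOL+BEOL) - C(mixed-mode setup), in aF, per node pair.

    Pairs absent from the mixed-mode extraction are treated as 0 aF there.
    """
    return {p: c - c_mixed.get(p, 0.0) for p, c in c_full.items()}


def spice_cards(delta: Dict[NodePair, float], tol_af: float = 0.01) -> List[str]:
    """One capacitor card per correction larger than tol_af (values converted aF -> F)."""
    cards: List[str] = []
    for p, dc in sorted(delta.items(), key=lambda kv: sorted(kv[0])):
        if dc < -tol_af:
            raise ValueError(f"negative correction {dc} aF for {sorted(p)}")
        if dc > tol_af:
            a, b = sorted(p)
            cards.append(f"Cpar{len(cards)} {a} {b} {dc * 1e-18:.4g}")
    return cards


if __name__ == "__main__":
    # Hypothetical aF values for illustration only (not extracted data).
    full = {pair("BL", "NL"): 12.0, pair("BL", "WL"): 30.0}
    mixed = {pair("BL", "NL"): 4.5, pair("BL", "WL"): 30.0}
    for card in spice_cards(correction_caps(full, mixed)):
        print(card)
```

In this toy case, only the BL-NL pair receives a correction (7.5 aF), because the BL-WL coupling is assumed to be fully represented in the device cross-sections already.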
Typically, DC write metrics such as the writeability current (IW), which is
extracted from the ‘N-curves’ [165], can ensure writeability at some arbitrary write
pulse width. However, from an array performance and throughput perspective,
it is essential to determine the minimum write-pulse width (TW), i.e., the shortest
WL pulse width needed to unconditionally write into any bitcell in the array. Here,
modeling the dynamic behavior of the bitcell with accurate parasitics data is critical.
We quantified the difference between FSMM and 3D-TCADMM by extracting TW at
various cell sigmas, with σVt = 30mV. The results are shown in Fig. 5.42. The TW
predicted by 3D-TCADMM is consistently higher than that predicted by FSMM, which
implies that the net effects of the FEOL capacitance difference between the two and
the FEOL-BEOL capacitance contributions captured by 3D-TCADMM are not negligible.
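Extracting TW is a search for the shortest passing pulse width. Assuming the write outcome is monotone in pulse width (a longer WL pulse never turns a successful write into a failure), bisection finds TW with a handful of transient simulations. In the Python sketch below, `writes_ok` is a hypothetical callback that would run one transient simulation at a given pulse width and report whether NL and NR crossed over. Here it is replaced by a toy threshold purely to exercise the search.

```python
from typing import Callable


def min_write_pulse_width(writes_ok: Callable[[float], bool],
                          t_lo: float, t_hi: float, tol: float = 0.5) -> float:
    """Bisection for the shortest WL pulse width (ps) that still writes the cell.

    Requires writes_ok(t_lo) is False and writes_ok(t_hi) is True, and assumes
    the outcome is monotone in pulse width.
    """
    if writes_ok(t_lo) or not writes_ok(t_hi):
        raise ValueError("need writes_ok(t_lo) False and writes_ok(t_hi) True")
    while t_hi - t_lo > tol:
        mid = 0.5 * (t_lo + t_hi)
        if writes_ok(mid):
            t_hi = mid  # mid passes: T_W is at or below mid
        else:
            t_lo = mid  # mid fails: T_W is above mid
    return t_hi  # shortest width known to pass; within tol of the true T_W


if __name__ == "__main__":
    # Toy stand-in for a transient simulation: writes succeed at >= 137 ps.
    t_w = min_write_pulse_width(lambda t: t >= 137.0, t_lo=50.0, t_hi=300.0)
    print(f"T_W ~= {t_w:.2f} ps")
```

Returning the upper bracket keeps the answer conservative: the reported width is one that was observed to pass, never one that was only interpolated.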
Figure 5.41: Write operations for a 6T FinFET SRAM (111) bitcell using the setups described in Figs. 5.38, 5.39, and 5.40 [voltage (V) vs. time (0-200ps): WL input, and NL/NR waveforms for the vanilla mixed-mode, FS parasitics + mixed-mode, and 3D-TCAD parasitics + mixed-mode setups]

Figure 5.42: Minimum write pulse width (TW) vs. cell sigma [TW (ps) vs. cell sigma for the FS parasitics + mixed-mode and 3D-TCAD parasitics + mixed-mode setups]

Modeling device transport vs. parasitics: Over the past decade, a considerable
amount of research has been directed toward the inclusion of advanced transport
phenomena into mainstream device simulators, in order to perform accurate
mixed-mode device-circuit simulations. The latter are becoming increasingly
common because compact model development for SPICE simulations lags behind
developments in technology, and it is cumbersome to re-target compact models to
new processes. With increased FEOL/BEOL scaling, a key modeling question that
needs attention is: what is the relative importance of advanced transport models
versus accurate parasitics data, when modeling multi-gate circuits via mixed-mode
TCAD transient simulations?
We addressed the above question by studying propagation delay in two flavors
of FinFET NAND2 logic gates, namely SG-NAND2 and LP-NAND2, whose
(FEOL+BEOL) structures are shown in Fig. 5.43. SG-NAND2 consists of
shorted-gate (SG)-mode FinFETs, while LP-NAND2 consists of independent-gate
(IG)-mode FinFETs, whose front and back gates are electrically independent. The
back gates of the p-FinFETs are tied to VHIGH = 1.2V, while the back gates of the
n-FinFETs are connected to VLOW = −0.2V, with VDD = 1V. On account of these
back-gate biases, LP-NAND2 has lower leakage and higher latency than SG-NAND2.
Figure 5.43: (a) SG-NAND2, and (b) LP-NAND2 FinFET configurations [labeled: inputs A and B, output OUT, VDD, GND, and the back-gate biases VHIGH and VLOW of the IG-mode devices]
We examined four transport model scenarios: (a) only the drift-diffusion (DD) transport formalism [2,166] is used, (b) DD is used along with back-annotated 3D-TCAD parasitic capacitances (DD+PC), (c) only the hydrodynamic (HD) transport formalism [2] is used, and (d) HD is used along with back-annotated 3D-TCAD parasitic capacitances (HD+PC). We refer to F(A,B) as the percentage error in propagation delay between scenarios A and B (A, B ∈ {DD, DD+PC, HD, HD+PC}), such that

$$F(A,B) = \frac{t_p(A) - t_p(B)}{t_p(B)} \times 100,$$

where tp is the average of the rise (tpLH) and fall (tpHL) delays. From Fig. 5.43, we can see that the BEOL metal density in SG-NAND2 is not as high as in LP-NAND2. Hence, from Figs. 5.44(a) and (b), we see that F(DD+PC,DD) and F(HD+PC,HD) are higher for LP-NAND2. For both configurations, |F(DD+PC,DD)| and |F(HD+PC,HD)| are smaller than, but comparable to, |F(HD,DD)| and |F(HD+PC,DD+PC)|. This shows that while transport models dominate, it is very important to capture parasitic capacitances accurately: in the absence of parasitics data, complicated transport models at the device level fail to provide precise predictions at the circuit level in a mixed-mode TCAD setup. (The latter observation is also valid for SPICE simulations, where compact models are unlikely to capture FEOL parasitics accurately.)
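As a minimal Python sketch of the delay-error metric defined above, the following computes tp as the mean of tpLH and tpHL and then F(A,B). The function names and the delay values in the usage example are hypothetical placeholders, not data from Fig. 5.44.

```python
def average_delay(tp_lh: float, tp_hl: float) -> float:
    """tp: mean of the rise (tpLH) and fall (tpHL) propagation delays."""
    return 0.5 * (tp_lh + tp_hl)


def delay_error_pct(tp_a: float, tp_b: float) -> float:
    """F(A, B) = (tp(A) - tp(B)) / tp(B) * 100, with scenario B as the reference."""
    if tp_b == 0.0:
        raise ValueError("reference delay tp(B) must be non-zero")
    return (tp_a - tp_b) / tp_b * 100.0


if __name__ == "__main__":
    # Hypothetical (tpLH, tpHL) pairs in ps for two scenarios; not thesis data.
    tp = {
        "DD": average_delay(10.0, 12.0),
        "DD+PC": average_delay(12.0, 13.0),
    }
    print(f"F(DD+PC, DD) = {delay_error_pct(tp['DD+PC'], tp['DD']):.2f} %")
    print(f"F(DD, DD+PC) = {delay_error_pct(tp['DD'], tp['DD+PC']):.2f} %")
```

Because F(A,B) normalizes by tp(B), it is not antisymmetric: with these placeholder delays, F(DD+PC, DD) ≈ 13.64% while F(DD, DD+PC) = −12.00%, so the reference scenario always needs to be stated.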
5.3.6 Section summary
In the move from planar to multi-gate FET technology, the need to optimize para-
sitics is likely to be the most important circuit-level design priority, as the ability to
predict and control parasitics will determine whether device-level improvements
in the on-current translate to tangible overall performance improvements. In
this regard, extraction of parasitic capacitances for multi-gate circuits at the 22nm
node and beyond is beset with major challenges in terms of FEOL/(FEOL+BEOL)
extraction for generic layouts. In this section, we established that segregated FEOL/BEOL modeling approaches for parasitic capacitance extraction in multi-gate circuit layouts are insufficient, and quantified the relative importance of modeling advanced transport phenomena versus incorporating parasitic
capacitances. In doing so, we developed a pragmatic 3D-TCAD flow based on
the automated structure synthesis approach, which can serve as a solution to the
FEOL/(FEOL+BEOL) capacitance extraction problem for small to reasonably large
multi-gate circuit layouts, and assist in the development/validation of compact
models during early phases of technology development. Using the 3D-TCAD
flow, we also provided critical insight into BL and WL capacitance scaling in 6T
multi-gate SRAMs along the 22/14/10nm technology nodes.
Figure 5.44: Propagation delays of (a) SG-NAND2, and (b) LP-NAND2 configurations with different physical models [each panel: voltage (V) vs. time (s) for input A and OUT under DD, DD+PC, HD, and HD+PC, and the delay error F(A,B) (%) for F(HD, DD), F(HD+PC, DD+PC), F(DD+PC, DD), and F(HD+PC, HD)]. (DD = drift-diffusion formalism, HD = hydrodynamic formalism, PC = 3D-TCAD-extracted parasitic capacitance corrections added)
Chapter 6
Parasitics-aware Design of Symmetric and Asymmetric Gate-workfunction FinFET SRAMs
6.1 Introduction
Since SRAM bitcells are often the densest features patterned on a chip, a significant
amount of research has been directed towards the design and manufacturability of
multi-gate SRAMs [73,152,157–160] at emerging technology nodes. However, most
of these investigations have focused on enhancing/contrasting SRAM DC metric
targets. While DC metrics can be directly obtained using measurements on flycells,
inferring array-scale behavior at design time is proving to be exceedingly difficult.
In order to predict array-scale metrics through simulation, capturing transient be-
havior accurately is absolutely essential. To accomplish the latter, SRAM parasitic
capacitances need to be extracted accurately from the layout.
As mentioned in Chapter 5, width quantization in multi-gate devices poses
problems for extraction of FEOL parasitic capacitances in arbitrary SRAM layouts
in upcoming/emerging process technologies, where reliance on compact models
for FEOL parasitics is not possible. Also, due to the increased proximity of BEOL
metal/vias to FEOL active silicon regions, it is highly questionable whether tra-
ditional segregated FEOL/BEOL parasitics modeling approaches yield accurate
results.
In this chapter, we evaluate several Symm-ΦG and Asymm-ΦG 6T FinFET
SRAMs from the perspective of transient behavior, by back-annotating 3D-TCAD-
extracted (FEOL+BEOL) layout-specific parasitic capacitances into mixed-mode
transient device simulations, thereby addressing the shortcomings described
above [167].
The chapter is organized as follows. In Section 6.2, we discuss prior related
work in multi-gate parasitics extraction and SRAM design. Thereafter, we outline
our simulation setup in Section 6.3. In Section 6.4, we explore a variety of current
and new SRAM bitcell topologies using the setup described in Section 6.3, and
highlight the need for transient analysis based bitcell design space exploration.
Finally, we present our conclusions in Section 6.5.
6.2 Related work
Over the past decade, a considerable amount of research has been directed towards
SRAM bitcell design using FinFETs/Tri-gate devices. Experimental, aggressively-
scaled FinFET SRAM bitcells/arrays with tight fin/gate pitches and their design
challenges have been explored in [73, 152, 157–159]. The highest-density functional multi-gate SRAM bitcell to date [fabricated using extreme ultraviolet (EUV) lithography, with an area of 0.021µm² and minimum contacted fin/gate pitch of 50nm] has been reported in [160]. On the modeling front, several investigations into FinFET/Tri-gate SRAM topologies as well as DC metrics are available in the literature [88, 168–171]. In [71, 72], SRAM topologies using IG-mode FinFETs, such as pass-gate feedback (PGFB) and pull-up write gating (PUWG), have been proposed to enhance SRAM read/write margins. A dynamic row-based back-gate biasing (RBB) scheme to enhance performance and reduce leakage in 6T/8T FinFET SRAMs has been proposed in [77]. While most of the above work focuses mainly on DC metrics, a detailed analysis of transient behavior and parasitics data is lacking for the proposed bitcells, thereby rendering such explorations incomplete.
Optimization of multi-gate device-level parasitic resistances and capacitances
has received a lot of attention as well. Mitigation of parasitic resistances/on-
current (ION) enhancement via design and process modifications, such as elevated
source/drain extensions, usage of stress liners, strained SOI, and doping profile
optimization, have been explored in [147–153]. The effects of fringe capacitances
on device performance have been examined via 3D simulations in [154], while an
analysis of geometry-dependent parasitic capacitances in multi-fin FinFETs is pro-
vided in [155]. RC-delay optimization (with fin pitch, gate pitch, fin height, and
fin thickness as parameters) in highly scaled multi-fin FinFETs has been studied
in [148, 156].
6.3 Simulation setup
In this section, we briefly explain the simulation setup that was employed to eval-
uate the various FinFET SRAM bitcell architectures discussed in Section 6.4. We
performed process simulations in Sentaurus Process [136] in order to generate
SOI FinFET structures at the 22nm node with device parameters specified in Ta-
ble 6.1, which were obtained from a combination of candidate device configura-
tions that have either been investigated experimentally or via device simulations
in [72, 153, 156, 161, 162].
Table 6.1: 22nm SOI FinFET device parameters
Parameter              Value
LG (nm)                24
Effective TOX (nm)     1.1
HGATE (nm)             40
LSP (nm)               12
TSI (nm)               12
HFIN (nm)              40
HELEV (nm)             20
LDL (nm)               4
NCH (cm−3)             10^15
NSD (cm−3)             3×10^20
TBOX (nm)              240
FP (nm)                40
GP (nm)                90
Figure 6.1: (a) Two-dimensional SOI n-FinFET cross section, and (b) 3D SOI n-FinFET structure [labeled dimensions: HGATE, TBOX, LG, HFIN, TSI, HELEV, LSP, TOX]
Fig. 6.1(a) shows the two-dimensional cross-section of the 3D SOI n-FinFET structure [Fig. 6.1(b)] obtained from process simulation. Here, LG, effective TOX, HGATE, LSP, TSI, HFIN, HELEV, LDL, NCH, NSD, TBOX, FP, and GP are the physical gate length, front/back-gate effective oxide thickness, gate height above fin, spacer thickness, fin width, fin height, source/drain elevation above fin, source/drain doping decay length, channel doping, source/drain doping concentration, buried oxide thickness, fin pitch, and gate pitch, respectively.
FinFETs with three different gate workfunctions (ΦG) are used in the configurations explored in Section 6.4. For Symm-ΦG high-performance n-FinFETs and p-FinFETs, the workfunction is set to ΦGn = 4.4 eV and ΦGp = 4.8 eV, respectively. To obtain medium-Vth n-/p-FinFET devices, ΦGn = ΦGp = 4.6 eV. For Asymm-ΦG devices, the front-gate workfunction is set to ΦGF = 4.4 eV, while the back-gate workfunction is set to ΦGB = 4.8 eV, with the source/drain doping type determining the majority carrier during on-state conduction. Symm-ΦG and Asymm-ΦG devices have been evaluated in detail in Chapter 3. The results presented in Section 6.4 involve three major simulation setups that were deployed in the Sentaurus TCAD tool suite [136], and are explained next.
6.3.1 DC metrics of 6T FinFET SRAMs
SRAM DC metrics cover stability, bitcell read current, and bitcell leakage. Sta-
bility encompasses the hold/read/write conditions, which are categorized as the
data retention margin, access disturb margin, and write margin. While several
definitions of each metric have been used in the literature, we adopt the method
of ‘N-curves’ prescribed in [165, 172]. Unlike traditional static voltage noise mar-
gin based setups, the N-curve method enables direct measurement of maximum
DC noise voltage as well as DC noise current that can be tolerated at the SRAM
internal nodes. Figs. 6.2(a) and (b) describe the N-curve measurement setup for
hold/read/write conditions. Here, a source monitor unit (SMU) is connected to
the internal storage node (NL) and measures the current supplied/drawn from
node NL (INL) when the node voltage (VNL) is swept from GND to VDD.
In the read condition, BL, BLB, and WL are held at VDD, while the SMU sweeps node NL. The characteristic read N-curves for two bitcells, 6T1 and 6T2, are shown in Fig. 6.3, and contain two zero-crossings Ai and Bi. The read voltage noise margin (RVNM) is defined as RVNMi = VBi − VAi. The read current noise margin (RINM) is defined as RINMi = max(INL) for VAi < VNL < VBi. Since bitcells with unequal RVNMs/RINMs cannot be compared directly [173], a read power noise margin (RPNM) metric is needed:

$$\mathrm{RPNM}_i = \int_{V_{A_i}}^{V_{B_i}} I_{NL}\, dV_{NL} \qquad (6.1)$$

Figure 6.2: Setup for (a) DC hold metrics, and (b) DC read/write metrics [6T bitcell with PG1/PG2, PU1/PU2, PD1/PD2, storage nodes NL/NR, WL, BL/BLB, and an SMU attached to NL]
Figure 6.3: N-curve for the DC read condition [INL (A) vs. VNL (V) for bitcells 6T1 and 6T2, with zero-crossings A1/A2, B1/B2, C1/C2 and the margins RVNM1/RVNM2 and RINM1/RINM2 marked]
Since bitcells with a larger RPNM are more stable during the read operation, 6T2 is better than 6T1 from a read-margin perspective. The second half of the N-curve can also be used to evaluate DC write-ability (Fig. 6.4). Here, the write-trip voltage (WTV) is defined as WTVi = VCi − VBi. The write-trip current (WTI), which is the maximum current required to write into the bitcell, is defined as WTIi = max(|INL|) for VBi < VNL < VCi. As with the read condition, the write-trip power (WTP) is defined as

$$\mathrm{WTP}_i = \int_{V_{B_i}}^{V_{C_i}} |I_{NL}|\, dV_{NL} \qquad (6.2)$$

Thus, the main criteria for 6T FinFET SRAM DC operation are minimizing WTP while maximizing RPNM [173].
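To make Eqs. (6.1) and (6.2) concrete, the following is a minimal Python sketch, assuming the N-curve is available as sampled (VNL, INL) pairs with VNL strictly increasing and at least three zero-crossings A < B < C. Crossings are located by linear interpolation and the integrals use the trapezoidal rule; these numerical choices, the function names, and the cubic test curve are illustrative assumptions, not the procedure of [165, 172].

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (VNL, INL)


def zero_crossings(v: Sequence[float], i: Sequence[float]) -> List[float]:
    """VNL values where INL changes sign, located by linear interpolation."""
    roots = []
    for k in range(len(v) - 1):
        i0, i1 = i[k], i[k + 1]
        if i0 == 0.0:
            roots.append(v[k])
        elif i0 * i1 < 0.0:
            roots.append(v[k] - i0 * (v[k + 1] - v[k]) / (i1 - i0))
    return roots


def segment(v: Sequence[float], i: Sequence[float], lo: float, hi: float) -> List[Point]:
    """Samples strictly between two crossings, padded with (lo, 0) and (hi, 0)."""
    inner = [(vk, ik) for vk, ik in zip(v, i) if lo < vk < hi]
    return [(lo, 0.0)] + inner + [(hi, 0.0)]


def trapezoid(points: List[Point], f=lambda x: x) -> float:
    """Trapezoidal integral of f(INL) dVNL over the given points."""
    return sum(
        0.5 * (f(p[1]) + f(q[1])) * (q[0] - p[0]) for p, q in zip(points, points[1:])
    )


def n_curve_metrics(v: Sequence[float], i: Sequence[float]) -> Dict[str, float]:
    roots = zero_crossings(v, i)
    if len(roots) < 3:
        raise ValueError("expected three zero-crossings A < B < C")
    a, b, c = roots[:3]
    read, write = segment(v, i, a, b), segment(v, i, b, c)
    return {
        "RVNM": b - a,                           # V
        "RINM": max(ik for _, ik in read),       # A
        "RPNM": trapezoid(read),                 # W, Eq. (6.1)
        "WTV": c - b,                            # V
        "WTI": max(abs(ik) for _, ik in write),  # A
        "WTP": trapezoid(write, abs),            # W, Eq. (6.2)
    }


if __name__ == "__main__":
    # Hypothetical cubic N-curve with crossings near 0.1, 0.5, and 0.9 V (not thesis data).
    v = [k / 999 for k in range(1000)]
    i = [(x - 0.1) * (x - 0.5) * (x - 0.9) for x in v]
    for name, value in n_curve_metrics(v, i).items():
        print(f"{name}: {value:.4g}")
```

Because the read lobe is positive and the write lobe negative (as in Figs. 6.3 and 6.4), the sketch integrates INL directly for RPNM and |INL| for WTP; for the symmetric cubic used here, both come out near 0.0064 in the curve's own units.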
Figure 6.4: N-curve for the DC write condition [INL (A) vs. VNL (V) for bitcells 6T1 and 6T2, with B1/B2, C1/C2, WTV1/WTV2, and WTI1/WTI2 marked]
6.3.2 Transport analysis based 3D-TCAD extraction of FinFET SRAM parasitic capacitances
With increased scaling, owing to the close proximity of FEOL/BEOL regions,
dense layouts, such as FinFET SRAM arrays, need to undergo transport analysis
based 3D-TCAD parasitic capacitance extraction [133]. Here, active silicon re-
gions need to be treated as semiconductors in (FEOL+BEOL) structures generated
through process simulation of the layout in consideration. However, process
simulation time/memory complexity scales very poorly as the number of devices
in the layout increases. This is a major bottleneck for any iterative flow employing
it. We circumvented the process simulation barrier by leveraging an automated
multi-gate structure synthesizer [Fig. 5.22(b)], that is described in Chapters 4 and
5. A planar FET implementation of the method, with hardware validation in a
32nm SOI CMOS process, is described in [145].
The structure synthesis methodology involves a one-time process-simulation
cost for the construction of a 22nm SOI FinFET database consisting of n-/p-
FinFETs. Thereafter, with the aid of the FEOL/(FEOL+BEOL) multi-gate struc-
ture synthesizer (which is equipped with a layout analyzer/partitioner), the
FEOL/(FEOL+BEOL) structure corresponding to any input FinFET circuit layout
is synthesized automatically, using the FinFET device database and FEOL/BEOL
process assumptions. Therefore, by preserving process-level accuracy in critical
regions of the FinFET structure, the flow in Fig. 5.22(b) provides very favorable
time/memory scaling properties and enables iterative optimization in a practical
timeframe.
6.3.3 Modeling dynamic behavior of FinFET SRAM bitcells
In order to model the transient behavior of FinFET SRAMs, we utilized the hybrid
mixed-mode setup shown in Fig. 6.5. We leveraged the framework in Fig. 5.22(b)
and performed transport analysis based 3D-TCAD capacitance extraction on the
(FEOL+BEOL) structure synthesized from each input SRAM layout. Thereafter,
the 3D-TCAD-extracted capacitances from the (FEOL+BEOL) structure were back-annotated into the bitcell mixed-mode 2D-TCAD setup. In order to avoid double counting contributions already accounted for in the device cross-sections, a capacitance extraction experiment was also performed on the bitcell mixed-mode setup itself, and the difference between the 3D-TCAD-extracted capacitances and those of the mixed-mode setup was included explicitly as a mixed-mode capacitance correction.
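The correction step above amounts to a per-node-pair subtraction. The following minimal Python sketch assumes both extractions are available as maps from unordered node pairs to capacitance; the helper names, node pairs, and fF values are hypothetical, and the sketch only reports a negative correction (which would mean the mixed-mode cross-sections already account for more coupling than the 3D extraction) rather than deciding how to treat it.

```python
from typing import Dict, FrozenSet

Pair = FrozenSet[str]


def pair(a: str, b: str) -> Pair:
    """Unordered node pair, so C(BL, WL) and C(WL, BL) share one key."""
    return frozenset((a, b))


def capacitance_correction(c_3d: Dict[Pair, float], c_mm: Dict[Pair, float]) -> Dict[Pair, float]:
    """Per node pair: C_corr = C_3D - C_mm; a pair missing from one map counts as 0."""
    return {k: c_3d.get(k, 0.0) - c_mm.get(k, 0.0) for k in set(c_3d) | set(c_mm)}


if __name__ == "__main__":
    # Hypothetical values in fF, for illustration only.
    c_3d = {pair("BL", "WL"): 0.030, pair("BL", "GND"): 0.025, pair("NL", "NR"): 0.010}
    c_mm = {pair("BL", "WL"): 0.012, pair("BL", "GND"): 0.020, pair("NL", "NR"): 0.011}
    corr = capacitance_correction(c_3d, c_mm)
    for nodes, dc in sorted(corr.items(), key=lambda kv: sorted(kv[0])):
        flag = "  (negative: check for over-counting)" if dc < 0 else ""
        print(f"{'-'.join(sorted(nodes))}: {dc:+.4f} fF{flag}")
```

Keying on unordered pairs keeps the correction symmetric, matching the fact that a coupling capacitance between two nodes has no direction.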
Since an SRAM bitcell transient simulation is only relevant in the context of an
array, the complete mixed-mode setup was dynamically generated for each array
configuration of the bitcell, i.e., depending on the row width and column height,
as shown in Fig. 6.5. This setup was used to compute the minimum read/write pulse widths under various conditions.
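The text does not spell out how the minimum pulse widths were searched for. One common approach, shown here as a hedged Python sketch, is bisection over the pulse width, assuming that read or write success is monotonic in pulse width; `succeeds` is a hypothetical stand-in for one array-level mixed-mode transient run followed by a pass/fail check.

```python
from typing import Callable


def min_pulse_width(
    succeeds: Callable[[float], bool], lo: float, hi: float, tol: float = 0.1
) -> float:
    """Smallest pulse width (units of lo/hi) for which succeeds() is True.

    Assumes monotonicity: if a pulse of width t works, any wider pulse works too.
    Requires succeeds(lo) is False and succeeds(hi) is True; the returned width
    passes and lies within tol of the true threshold.
    """
    if succeeds(lo) or not succeeds(hi):
        raise ValueError("need a failing lower bound and a passing upper bound")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if succeeds(mid):
            hi = mid  # mid passes: threshold is at or below mid
        else:
            lo = mid  # mid fails: threshold is above mid
    return hi


def toy_write(tw_ps: float) -> bool:
    """Hypothetical stand-in: a write 'succeeds' once TW reaches 37.3 ps."""
    return tw_ps >= 37.3


if __name__ == "__main__":
    print(f"minimum TW ~ {min_pulse_width(toy_write, 1.0, 300.0):.2f} ps")
```

Each bisection step halves the bracket, so the number of expensive transient simulations grows only logarithmically with the ratio of the initial bracket to the tolerance.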
In the next section, we discuss several 6T FinFET SRAM bitcell topologies, which were evaluated under various conditions with the setup described above.
Figure 6.5: Hybrid mixed-mode device simulation methodology for simulating SRAM read/write operations [flow: input layout → 3D TCAD structure synthesis (device database, process assumptions) → 3D TCAD parasitics extraction → mixed-mode setup of the SRAM bitcell with WL driver and BL/BLB]
6.4 Design of 6T FinFET SRAMs
In this section, we evaluate several different FinFET SRAM bitcells from a DC,
parasitics, and transient perspective, after a brief discussion on bitcell operation
for each topology.
6.4.1 6T FinFET SRAM topologies
6T FinFET SRAMs can be broadly classified into three different categories:
• The ‘vanilla shorted-gate configurations’ (VSCs) are direct extensions of planar SRAMs, using only Symm-ΦG SG-mode FinFETs. Owing to width quantization, the βPD/PG and βPG/PU ratios are restricted, while the use of a single ΦG significantly reduces processing cost and improves yield (recall that PD, PG, and PU refer to the pull-down, pass-gate, and pull-up FETs, respectively). Hence, VSCs are attractive for fin/gate pitch scaling and are the easiest bitcells to manufacture, but they require larger bitcell areas when pass-gate/pull-down fin counts are increased to improve bitcell β ratios. They are designated as V(NPU NPG NPD), where NPU/NPG/NPD are the fin counts for the PU/PG/PD FETs, respectively.
• The ‘independent-gate configurations’ (IGCs) are derived from VSCs by re-
placing one or more Symm-ΦG SG-mode FinFETs with Symm-ΦG IG-mode
FinFETs. Owing to the flexibility of back-gate bias based Vth modulation, IGC
bitcells can improve DC metrics without resorting to increasing PG/PD fin
counts for improved stability. However, they are harder to manufacture (due
to layout-specific IG-mode devices), and are unlikely to be very scalable.
• The ‘multiple-ΦG shorted-gate configurations’ (MSCs) leverage SG-mode
FinFETs with two or more gate workfunctions. The devices can be Symm-ΦG
or Asymm-ΦG, and either condition leads to increased processing complex-
ity for the gate-stack. Owing to the availability of multiple Vth’s, MSC bitcells
can improve DC/transient metrics with the same fin/gate pitch scaling
abilities as VSCs, without using multi-fin PG/PD devices. In the current
work, we restrict our investigations to bitcells having combinations of only
two distinct ΦG’s.
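To illustrate how width quantization restricts the β ratios of VSC bitcells, the following Python sketch decodes the V(NPU NPG NPD) naming and approximates βPD/PG as NPD/NPG. This assumes that the PD and PG n-FinFETs, which share the same SG mode and ΦG in VSC bitcells, deliver equal drive per fin; βPG/PU would additionally need the n-/p-FinFET per-fin drive ratio and is not modeled. The function names are hypothetical.

```python
import re
from fractions import Fraction
from typing import Tuple

# Matches names such as "V(135)" or "V (113)"; single-digit fin counts assumed.
_VSC_NAME = re.compile(r"V\s*\((\d)(\d)(\d)\)")


def parse_vsc(name: str) -> Tuple[int, int, int]:
    """'V(135)' -> (N_PU, N_PG, N_PD) = (1, 3, 5)."""
    m = _VSC_NAME.fullmatch(name.strip())
    if m is None:
        raise ValueError(f"not a V(NPU NPG NPD) name: {name!r}")
    n_pu, n_pg, n_pd = (int(g) for g in m.groups())
    return n_pu, n_pg, n_pd


def beta_pd_pg(name: str) -> Fraction:
    """Fin-count approximation of beta_PD/PG (equal per-fin drive assumed)."""
    _, n_pg, n_pd = parse_vsc(name)
    return Fraction(n_pd, n_pg)


if __name__ == "__main__":
    for cell in ["V(111)", "V(112)", "V(113)", "V(122)", "V(123)", "V(135)"]:
        print(cell, parse_vsc(cell), "beta_PD/PG ~", beta_pd_pg(cell))
```

For V(111), V(112), V(113), V(122), V(123), and V(135), this gives βPD/PG of 1, 2, 3, 1, 3/2, and 5/3: only a handful of discrete ratios, each tied to a specific fin count and hence to a specific bitcell area.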
In Table 6.2, the PU/PG/PD device configurations for VSC, IGC, and MSC bitcells are shown, along with the nomenclature for each bitcell.
Table 6.2: 6T FinFET SRAM device configurations
Topology | NPU | PU (mode, ΦG in eV) | NPG | PG (mode, ΦG in eV) | NPD | PD (mode, ΦG in eV)
Vanilla configurations
V(111), V(112), V(113), V(122), V(123), V(135) | 1 | SG, 4.6 | 1→3 | SG, 4.6 | 1→5 | SG, 4.6
Independent-gate configurations
Pass-gate feedback, PGFB | 1 | SG, 4.6 | 1 | IG, 4.6 | 1 | SG, 4.6
Pull-up write gating, PGFB-PUWG | 1 | IG, 4.6 | 1 | IG, 4.6 | 1 | SG, 4.6
Split-pull-up, PGFB-SPU | 1 | IG, 4.6 | 1 | IG, 4.6 | 1 | SG, 4.6
Row-based back-gate bias, RBB | 1 | SG, 4.6 | 1 | IG, 4.6 | 1 | IG, 4.6
Multiple-ΦG configurations
A(111), A(112) | 1 | a-SG, 4.4/4.8 | 1 | a-SG, 4.4/4.8 | 1→2 | a-SG, 4.4/4.8
A(11)S | 1 | a-SG, 4.4/4.8 | 1 | a-SG, 4.4/4.8 | 1 | SG, 4.4
DPD-L | 1 | SG, 4.6 | 1 | SG, 4.6 | 1 | SG, 4.4
DPG-H | 1 | SG, 4.6 | 1 | SG, 4.8 | 1 | SG, 4.6
VSC bitcells: In Fig. 6.6, the (FEOL+BEOL) and FEOL structures for V(135) are shown [the structures for V(111), V(112), V(113), V(122), and V(123) are shown in Chapter 5]. They are based on traditional SRAM thin-cell layouts with PU p-FinFETs sandwiched between PG/PD n-FinFETs, and have metal-3 word lines (WL) and metal-2 bit lines/power lines (BL, BLB, VDD, GND), as shown in the perspective view in Fig. 5.26(a). The default fin pitch and gate pitch were set to FP = 40nm and GP = 90nm, respectively.
Figure 6.6: V(135) bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown [labels include GND, BL, BLB, VDD, WL, NL, NR; pull-down, pass-gate, and pull-up n-/p-FinFETs]
IGC bitcells: In Figs. 6.7, 6.8, 6.9, and 6.10, the (FEOL+BEOL) and FEOL structures corresponding to the PGFB and pull-up write gating (PGFB-PUWG) configurations [71, 72], the pass-gate feedback split pull-up (PGFB-SPU) configuration, and the RBB configuration [77] are shown, respectively, with FP = 40nm and GP = 90nm.
Figure 6.7: PGFB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown [labels include WL, NL, NR, BL, BLB, VDD, GND; IG-mode devices marked]
PGFB: Unlike the V(111) bitcell, the PGFB bitcell contains IG-mode PG devices whose back gates are connected to their respective storage nodes. While there is no area penalty in moving from V(111) to PGFB, the internal node gate conductors need to be extended, which alters the internal node capacitances and, hence, the dynamic write-ability of the bitcell.
Figure 6.8: PGFB-PUWG bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown [labels include WL, WWL, NL, NR, BL, BLB, VDD, GND; IG-mode devices marked]
PGFB-PUWG: This bitcell is derived from the PGFB bitcell, with the PU devices
in IG mode. The back gates of the latter are connected to WWL. During hold and
read conditions, WWL is de-asserted, and the bitcell is expected to behave like the
PGFB bitcell. During a write operation, WWL is asserted to selectively weaken the
PU devices, thereby improving write-ability. Also, the PGFB-PUWG bitcell incurs
extra area in the form of an additional fin pitch, in order to prevent a conflict in
wiring metal-3 WL and WWL nodes.
Figure 6.9: PGFB-SPU bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown [labels include WL, NL, NR, BL, BLB, VDD, GND; IG-mode devices marked]
PGFB-SPU: The PGFB-SPU bitcell is a variant of the PGFB bitcell in which the PU devices are in IG mode and their back gates are hardwired to VDD. This improves write-ability over the PGFB configuration without the additional parasitic capacitances that arise from the extra word line in PGFB-PUWG.
RBB: In this configuration, the PG/PD devices are in IG mode, with their back gates connected to a metal-3 BIAS node that is shared across the entire row. The bitcell area increases by 33% (Fig. 6.11) as additional fin pitches are needed to contact the back gates of the PG/PD FETs. During the hold operation, the BIAS node is de-asserted (below the rail), thereby limiting bitcell leakage. During the read/write operations, the BIAS node is asserted along with WL, in order to improve dynamic read/write-ability.
Figure 6.10: RBB bitcell: (a) (FEOL+BEOL), and (b) FEOL only. Dielectric regions are not shown [labels include WL, BIAS, NL, NR, BL, BLB, VDD, GND; IG-mode devices marked]
MSC bitcells are identical to the corresponding VSC bitcells from a layout perspective. They leverage either Asymm-ΦG or dual-Symm-ΦG n-/p-FinFETs to improve DC metrics without incurring the additional parasitic capacitances seen in IGC bitcells.
Figure 6.11: Bitcell areas normalized to FP×GP [bars for V(111)/A(111)/A(11)S/DPD-L/DPG-H, V(112)/A(112), V(113), V(122), V(123), V(135), PGFB, PGFB-PUWG, PGFB-SPU, and RBB]
Next, we take a closer look at DC metrics of the bitcells described above.
Figure 6.12: VSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM [bitcells: V(111), V(112), V(113), V(122), V(123), V(135)]
6.4.2 6T FinFET SRAM DC metrics
We now examine the dependence of DC metrics on VDD to provide a high-level pic-
ture of read stability, write-ability, read current (IREAD), and bitcell leakage (ILEAK).
Read margins: In Figs. 6.12, 6.13, and 6.14, read margins for the VSC, IGC, and MSC bitcells are shown. Owing to its high βPD/PG ratio, V(113) has the highest RVNM. However, V(135) has the highest RINM, which degrades gracefully as VDD decreases. In terms of RPNM, V(113) is the best amongst VSC bitcells for VDD > 0.8 V, while V(135) crosses over for VDD < 0.7 V. A similar RPNM crossover can be observed between V(112) and V(123) at VDD = 0.9 V. This shows that a high βPD/PG ratio does not necessarily translate to a high RINM/RPNM. Also, at VDD = 1 V, RPNM_V(113) : RPNM_V(111) ≈ 5.2, which is a dramatic increase in stability with increased βPD/PG ratio.
Figure 6.13: IGC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM [bitcells: PGFB, PGFB-PUWG, PGFB-SPU]
Amongst the IGC bitcells, stability metrics for RBB were not computed, as the N-curve based RPNM and WTP definitions are not directly applicable in the traditional sense, owing to the inherently dynamic nature of its read and write operations (where BIAS overdrives the PG/PD n-FinFETs when WL is triggered). While the PGFB-PUWG bitcell faces a similar dynamic condition during write, we characterized its read/write metrics for VWWL = 0 V. From Figs. 6.13(a) and (b), RVNM_PGFB < RVNM_PGFB-PUWG, while RINM_PGFB crosses over RINM_PGFB-PUWG at VDD = 0.7 V. This hints at a potential RPNM_PGFB > RPNM_PGFB-PUWG crossover for VDD > 1 V (not shown). Similar behavior has been observed in [72]. PGFB fares better than V(111), as RPNM_PGFB : RPNM_V(111) ≈ 2.7 (all ratios are reported at VDD = 1 V). Among the IGC bitcells, PGFB-SPU has the lowest RVNM/RINM, resulting in poorer read stability with respect to PGFB, as RPNM_PGFB : RPNM_PGFB-SPU ≈ 1.6.
Figure 6.14: MSC read margins vs. VDD: (a) RVNM, (b) RINM, and (c) RPNM [bitcells: A(112), A(111), A(11)S, DPD-L, DPG-H]
From Fig. 6.14(a), MSC bitcells can be seen to fare well in terms of RVNM. However, RINM is surprisingly poor for A(111), A(11)S, and DPD-L. This results in poor RPNM, as RPNM_A(112) : RPNM_A(111) ≈ 3.6 and RPNM_A(112) : RPNM_A(11)S ≈ 7. A(111) is slightly worse than its VSC counterpart V(111), as RPNM_V(111) : RPNM_A(111) ≈ 1.5. It also fares poorly with respect to PGFB, as RPNM_PGFB : RPNM_A(111) ≈ 4.2.
Figure 6.15: VSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP [bitcells: V(111), V(112), V(113), V(122), V(123), V(135)]
Write margins: In Figs. 6.15, 6.16, and 6.17, write margins for the VSC, IGC, and MSC bitcells are shown. While all the VSC bitcells have reasonable WTVs, V(113) and V(112) have high WTIs. V(111) and V(122) have the lowest WTP and, hence, are the best VSC bitcells from a DC write-ability perspective. V(112), V(113), V(123), and V(135) have large WTPs, which suggests that dynamic write-ability will be a major concern for VSC bitcells. Here, it is important to note that DC read metrics are pessimistic, i.e., they seek unconditional read stability for arbitrarily large WL pulse widths. On the other hand, DC write metrics can be very optimistic, as WL pulses in reality have finite durations during which the bitcell needs to be written successfully. While the ability to guarantee write-ability with a WL pulse of infinite duration is not very useful, the relative difficulty of writing to a bitcell can be gauged from its WTP.
Figure 6.16: IGC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP [bitcells: PGFB, PGFB-PUWG, PGFB-SPU]
Amongst the IGC bitcells, while PGFB-PUWG has the lowest WTV, it also has the highest WTI. PGFB-SPU has the lowest WTI and WTP, making it the best IGC bitcell from a write-ability perspective. It also corresponds to the other operating extreme of PGFB-PUWG, namely VWWL = VDD. PGFB-SPU is better than V(122) (which is the most write-able VSC bitcell), as WTP_V(122)/WTP_PGFB-SPU ≈ 1.7.
Figure 6.17: MSC write margins vs. VDD: (a) WTV, (b) WTI, and (c) WTP [bitcells: A(112), A(111), A(11)S, DPD-L, DPG-H]
MSC bitcells have high WTVs and low WTIs (Fig. 6.17). A(111), A(11)S, and DPD-L have low WTPs, with A(111) being the best bitcell. In moving from A(111) to A(112), while RPNM improves by 3.6× (Fig. 6.14), write-ability degrades, as WTP_A(112)/WTP_A(111) ≈ 2.1.
Read current: In Figs. 6.18(a), (b), and (c), the dependence of IREAD on VDD is shown for each of the bitcells. At VDD = 1 V, amongst the VSC bitcells, IREAD is roughly proportional to the PG fin count (e.g., IREAD,V(122) : IREAD,V(111) ≈ 2). Also, V(111)/V(112) bitcells have 3× larger IREAD than PGFB bitcells. However, IREAD,RBB ≈ IREAD,V(111). Also, all MSC bitcells (with the exception of DPG-H) have approximately the same IREAD as V(111).
Figure 6.18: IREAD vs. VDD (log scale): (a) VSC, (b) IGC, and (c) MSC
Bitcell leakage current: In Figs. 6.19(a), (b), and (c), the dependence of ILEAK on VDD is shown. At VDD = 1 V, moving from V(111) to V(112) [V(113)] increases ILEAK by 48% [97%]. Amongst VSC bitcells, V(135) has the highest ILEAK, nearly 4× ILEAK,V(111). Among the IGC bitcells, PGFB-PUWG has unacceptably high ILEAK in the DC hold state with VWWL = 0 V. Hence, from an ILEAK perspective, the PGFB-PUWG bitcell will typically be restricted to VWWL = VDD − δ (δ < 0.2 V). However, such a choice for VWWL would dramatically impact read stability, thereby making it unattractive. Amongst MSC bitcells, A(11)S and DPD-L have nearly two orders of magnitude higher ILEAK than A(111), owing to the presence of Symm-ΦG high-performance SG-mode FinFETs. In comparison to V(111), which is equipped with Symm-ΦG low-power FinFETs (ΦG = 4.6 eV), A(111) [A(112)] fares poorly with 4.7× [7×] higher ILEAK. However, vanilla-like topologies with Asymm-ΦG SG-mode FinFETs, such as A(111), are still an attractive option for low-leakage bitcells in high-performance dual gate-workfunction processes.
Figure 6.19: ILEAK vs. VDD (log scale): (a) VSC, (b) IGC (PGFB-PUWG shown for VWWL = 0 V and 1 V), and (c) MSC
Next, we examine parasitic capacitances in the VSC, IGC, and MSC bitcells in greater detail.
6.4.3 6T FinFET SRAM parasitic capacitances
Since the SRAM bitcell layout strongly affects the parasitic BL and WL capacitances, it is essential to understand the sources of coupling and their relative contributions before performing any layout exploration/optimization for good transient behavior.
Figure 6.20: Breakup of V(111) (FEOL+BEOL) BL and WL capacitances [pie charts for (a) CBL and (b) CWL; components: coupling to NL, NR, VDD, GND, WL (or BL), BLB, and WFRGND]
Figure 6.21: Breakup of V(123) (FEOL+BEOL) BL and WL capacitances [same components as Fig. 6.20]
(FEOL+BEOL) capacitance break-up: In Figs. 6.20, 6.21, and 6.22, the VSC (FEOL+BEOL) BL capacitance (CBL) and WL capacitance (CWL) are decomposed into their components. Moving from V(111) → V(123) → V(135), CBL,WL dominates, growing from 27% → 50% → 63%, while CBL,VDD shrinks by roughly half or more at each step, from 35% → 19% → 7%. On the CWL front, CWL,GND loses share, from 24% → 12% → 10%, while CWL,BL and CWL,BLB increase and level out, from 16% → 20% → 19%. This shows that reducing CBL,WL and CWL,NL/CWL,NR are key priorities when the βPD/PG ratio is increased in VSC bitcells.
Figure 6.22: Breakup of V(135) (FEOL+BEOL) BL and WL capacitances [same components as Fig. 6.20]
Figure 6.23: Breakup of PGFB (FEOL+BEOL) BL and WL capacitances [pie charts: (a) CBL split into CBL,NL, CBL,VDD, CBL,WL, CBL,NR, CBL,GND, CBL,BLB, and CBL,WFRGND; (b) CWL split into CWL,NL, CWL,VDD, CWL,BL, CWL,NR, CWL,GND, CWL,BLB, and CWL,WFRGND]
In the PGFB bitcell (Fig. 6.23), CBL is dominated by CBL,VDD (35%), while CWL is dominated by CWL,GND (28%). For the PGFB-SPU bitcell (Fig. 6.24), however, the trend reverses: CBL is dominated by CBL,GND (38%) and CWL by CWL,VDD (32%). In the PGFB-PUWG bitcell (Fig. 6.25), CBL,GND (37%) and CWL,WWL (26%) dominate CBL and CWL, respectively. For the RBB bitcell (Fig. 6.26), CBL mainly consists of CBL,WL (35%) and CBL,VDD (34%). Reducing CWL, however, is very difficult in this case, as it consists of six nearly equal contributions.
Figure 6.24: Breakup of PGFB-SPU (FEOL+BEOL) BL and WL capacitances [pie charts: (a) CBL split into CBL,NL, CBL,VDD, CBL,WL, CBL,NR, CBL,GND, CBL,BLB, and CBL,WFRGND; (b) CWL split into CWL,NL, CWL,VDD, CWL,BL, CWL,NR, CWL,GND, CWL,BLB, and CWL,WFRGND]
Figure 6.25: Breakup of PGFB-PUWG (FEOL+BEOL) BL and WL capacitances [pie charts: (a) CBL split into CBL,NL, CBL,VDD, CBL,WL, CBL,NR, CBL,GND, CBL,BLB, CBL,WFRGND, and CBL,WWL; (b) CWL split into CWL,NL, CWL,VDD, CWL,BL, CWL,NR, CWL,GND, CWL,BLB, CWL,WFRGND, and CWL,WWL]
Figure 6.26: Breakup of RBB (FEOL+BEOL) BL and WL capacitances [pie charts: (a) CBL split into CBL,NL, CBL,VDD, CBL,WL, CBL,NR, CBL,GND, CBL,BLB, CBL,WFRGND, and CBL,BIAS; (b) CWL split into CWL,NL, CWL,VDD, CWL,BL, CWL,NR, CWL,GND, CWL,BLB, CWL,WFRGND, and CWL,BIAS]
VSC bitcell capacitances: In Fig. 6.27, the (FEOL+BEOL) capacitances for the VSC bitcells are shown. CBL decreases by 25% in moving from V(111) to V(112), as adding a single fin to the PD FET adds an extra fin pitch, which permits larger spacings between the GND, BL, and VDD lines. While the V(113) configuration adds another PD fin, the PG fin count remains unchanged, so the reduction in CBL from V(112) to V(113) is not significant. Thus, a 33% (66%) increase in bitcell area, from the V(111) to the V(112) [V(111) to V(113)] configuration, can reduce CBL and increase the βPD/PG ratio significantly. The V(122) and V(123) bitcells have higher CBL, as their PG fin count is higher. Since the metal-3 WLs run across the breadth of the bitcell, CWL generally increases as the PD/PG fin count increases. However, CWL decreases from the V(122) to the V(123) configuration, as the coupling between the WL gate and the internal nodes decreases owing to the additional FP spacing between them.
Figure 6.27: (FEOL+BEOL) BL and WL capacitances in VSC bitcells [(a) CBL (aF) and (b) CWL (aF) for V(111), V(112), V(113), V(122), V(123), and V(135)]
Figure 6.28: (FEOL+BEOL) capacitances vs. FP for (a) CBL, and (b) CWL (GP = 90nm) [CBL and CWL (aF) vs. fin pitch FP (nm), 40-70nm, for V(111) and PGFB]
Effect of FP: Fig. 6.28(a) shows that CBL increases as FP decreases. CBL is strongly affected by FP, as the metal-2 BLs run vertically across the bitcell in thin-cell layouts. V(111) and PGFB witness a 32% and 39% increase in CBL, respectively, in moving from FP = 70nm to 40nm. The plateau in CBL at FP = 50-60nm is due to the fact that the metal-2 BL, BLB, VDD, and GND tracks are wider for FP = 60nm and 70nm, owing to the larger pitches. CWL is shaped by competing trends at high and low FP for both bitcells. As FP increases, the metal-3 WL gets longer and aggregates capacitance from the bitcell features below it, which increases CWL. When FP decreases beyond a certain point (FP = 50nm), the capacitance between the WL gate and the shared elevated source/drain/metal-1 regions in its neighborhood boosts CWL. Hence, there is an optimal FP at which CBL and CWL are minimized.
Figure 6.29: IGC bitcell capacitances vs. V(111): (a) (FEOL+BEOL), and (b) FEOL [capacitance (aF) of CBL, CWL, and CNL for V(111), PGFB, PGFB-PUWG, PGFB-SPU, and RBB, with % difference w.r.t. V(111)]
IGC vs. V(111): In Fig. 6.29, IGC and V(111) bitcell capacitances are compared. The (FEOL+BEOL) CBL of the IGC bitcells hovers around 0.95-1.05×CBL,V(111). For (FEOL+BEOL) CWL, however, PGFB-PUWG (RBB) registers a 13% (31%) increase, whereas PGFB shows a 15% reduction. PGFB-SPU has nearly the same CBL/CWL values as V(111). This suggests that CWL increases considerably in IGC configurations that have extensive routing for back-gate connections. That this increase stems from the back-gate routing is supported by Fig. 6.29(b), where RBB has 16% lower FEOL CWL than V(111) (owing to its four IG-mode FETs).
Next, we examine the transient behaviors of the bitcells and contrast them with
inferences drawn earlier from DC metrics.
6.4.4 Transient behavior of 6T FinFET SRAMs
We captured the minimum read pulse width (TR) and minimum write pulse width (TW) for several array configurations, which were modeled using the setup shown in Fig. 6.5.
Figure 6.30: TR vs. VDD: (a) VSC, (b) IGC, and (c) MSC [TR (s) vs. VDD (V) for (a) V(111), V(112), V(113), V(122), V(123), V(135); (b) PGFB, PGFB-PUWG, PGFB-SPU, RBB; (c) A(112), A(111), A(11)S, DPD-L, DPG-H]
Figure 6.31: TR vs. bitcell σ: (a) VSC, (b) IGC, and (c) MSC, VDD = 1V [TR (s) vs. bitcell σ (0-6) for the same bitcells as in Fig. 6.30]
In Fig. 6.30, the dependence of TR on VDD is shown. The default array configuration consisted of 32 bitcells per column and 512 bitcells per row. Amongst the VSC bitcells, V(135) has the largest TR, nearly 2.7× that of V(111), despite having the highest IREAD, owing to its highest WL RC-delay. Also, V(111) has marginally lower TR than V(112) despite having larger CBL, owing to its lower WL RC-delay. At VDD = 1V, moving from V(111) to V(113) (which has the best RPNM) incurs a considerable 45% increase in TR. Among the IGC bitcells, while PGFB has the lowest TR at VDD = 1V, RBB crosses over at lower VDD. However, in comparison to V(111), TR,PGFB is 33% higher. In the MSC bitcell category [Fig. 6.30(c)], DPG-H has the highest TR owing to the lowest IREAD, on account of its high-Vth PG n-FinFETs. A(111), A(11)S, and DPD-L have the lowest TR. With the exception of DPG-H, TR degrades gracefully as VDD decreases for all the MSC bitcells. Moving from A(111) to A(112) (which has much higher RPNM) increases TR by 14%.
The dependence of TR on the worst-case bitcell FET Vth skews, measured in units of bitcell σ (where σΦG = 30meV, or equivalently σVth = 30mV, at VDD = 1V), is shown in Fig. 6.31. For the VSC bitcells, with the exception of V(135), TR increases by 34-42% in moving from the nominal to the 6σ case. Among the IGC bitcells, although RBB has tolerable TR vs. VDD variation, its TR increases dramatically for larger bitcell σ. In comparison to VSC bitcells, the PGFB bitcells fare poorly as bitcell σ increases: PGFB-PUWG, PGFB-SPU, and PGFB have 58%, 58%, and 84% higher TR, respectively, in the 6σ case. With the exception of DPG-H, all the MSC bitcells degrade gracefully with increased bitcell σ, showing a 33-35% increase in TR in the 6σ case.
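With σVth = 30mV (equivalently σΦG = 30meV), the skew applied at n bitcell σ is simply n × 30mV, so the 6σ case corresponds to a 180mV shift. The sketch below makes that arithmetic explicit. The per-FET sign convention (+1 meaning a higher Vth) is an assumption for illustration; which bitcell FETs are skewed, and in which direction, to form the worst case is left to the caller and is not taken from the text.

```python
SIGMA_VTH_MV = 30.0  # sigma_Vth = 30 mV (equivalently sigma_PhiG = 30 meV), as stated in the text


def vth_skew_mv(n_sigma, sigma_mv=SIGMA_VTH_MV):
    """Magnitude of the Vth skew (in mV) applied at n_sigma bitcell sigma."""
    if n_sigma < 0:
        raise ValueError("n_sigma must be non-negative")
    return n_sigma * sigma_mv


def skewed_vth(nominal_vth_v, n_sigma, direction):
    """Apply the skew to one FET; direction=+1 raises Vth, -1 lowers it.

    The sign convention and the choice of which FETs to skew are assumptions
    left to the caller; they are not taken from the text.
    """
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    return nominal_vth_v + direction * vth_skew_mv(n_sigma) / 1000.0


print([vth_skew_mv(k) for k in range(7)])
# [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0]
```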
In Fig. 6.32, the dependence of TW on VDD is shown. For the VSC bitcells, TW increases by 46-51% as VDD decreases. Amongst the IGC bitcells, PGFB, PGFB-SPU, and PGFB-PUWG fare poorly, as TW increases by 95%, 114%, and 109%, respectively. On the other hand, RBB faces only a 44% increase in TW, despite being the hardest bitcell to write to at VDD = 1V. Among the MSC bitcells, DPG-H has the highest TW, with a poor VDD scaling trend. A(111) has the best TW, 2% lower than that of V(111) at VDD = 1V. In moving from A(111) to A(112), TW increases by 34%, so that write delay dominates. In Fig. 6.33, the effect of bitcell σ on TW is shown. VSC bitcells degrade gracefully as bitcell σ increases. However, among the IGC bitcells, PGFB and RBB face a steep rise in TW, implying that dynamic write-ability is a major problem for these bitcells at high bitcell σ. PGFB-SPU has the best TW on account of having its back-gate tied to VDD. Also, during a write operation, TW,PGFB−PUWG > TW,PGFB−SPU, as WWL, which is asserted along with WL, faces a finite RC-delay before weakening the PU p-FinFETs. Amongst the MSC bitcells, A(111) has the best TW characteristic, and its performance loss with respect to A(112) is relatively uniform as bitcell σ increases.
Figure 6.32: TW vs. VDD: (a) VSC, (b) IGC, and (c) MSC [TW (s) vs. VDD (V) for (a) V(111), V(112), V(113), V(122), V(123), V(135); (b) PGFB, PGFB-PUWG, PGFB-SPU, RBB; (c) A(112), A(111), A(11)S, DPD-L, DPG-H]
Several trends in access time [TACC = max(TR, TW)] can be inferred from Figs. 6.30, 6.31, 6.32, and 6.33. VSC bitcells are limited by TW for all VDD. Also, for any given bitcell σ, TACC = TW, implying that VSC bitcells suffer from poor dynamic write-ability. Amongst the IGC bitcells, at high VDD, TACC = TW for PGFB and RBB, and TACC = TR for PGFB-PUWG and PGFB-SPU. However, at low VDD, TACC = TR for all the IGC bitcells. Similarly, MSC bitcells are limited by TR at low VDD and by TW at high VDD. The above observations also broadly apply when the array configuration (designated by [column height, row width]) is modified (Figs. 6.34, 6.35). In general, TR and TW increase dramatically when the row width is increased from 128 to 512 bits (with the column height at 32 bitcells), whereas increasing the column height from 16 to 64 bitcells results in only a marginal increase in TR and TW. This suggests that reducing WL RC-delay is at least as important as reducing CBL.
Figure 6.33: TW vs. bitcell σ: (a) VSC, (b) IGC, and (c) MSC, VDD = 1V [TW (s) vs. bitcell σ for the same bitcells as in Fig. 6.32]
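Since TACC = max(TR, TW), classifying a bitcell as read- or write-limited at each operating point is a single comparison. The hypothetical helpers below label the limiting pulse width per sweep point; the sample numbers are invented to mimic the MSC trend described above (read-limited at low VDD, write-limited at high VDD) and are not read off Figs. 6.30-6.33.

```python
def access_time(t_read, t_write):
    """Return (T_ACC, limiter) with T_ACC = max(T_R, T_W); ties are reported as "TW"."""
    if t_read < 0 or t_write < 0:
        raise ValueError("pulse widths must be non-negative")
    if t_write >= t_read:
        return t_write, "TW"
    return t_read, "TR"


def limiting_pulse(sweep):
    """Map each sweep point (e.g., a VDD value) to the pulse width that sets T_ACC."""
    return {point: access_time(t_r, t_w)[1] for point, (t_r, t_w) in sweep.items()}


# Hypothetical (T_R, T_W) pairs in seconds, keyed by VDD in volts (invented numbers).
sweep = {0.6: (8.0e-10, 6.0e-10), 1.0: (4.0e-10, 5.0e-10)}
print(limiting_pulse(sweep))  # {0.6: 'TR', 1.0: 'TW'}
```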
Effect of FP: The parasitic capacitances extracted during the FP variation experiments for V(111) in Section 6.4.3 were back-annotated into transient simulations for TR/TW, and the results are shown in Fig. 6.36. At VDD = 1V, in moving from FP = 40nm to FP = 50nm, 60nm, and 70nm, TR increases by 9%, 26%, and 40%, respectively, and TW increases by 15%, 36%, and 56%, respectively. These results show that FP has a very pronounced impact on the transient behavior of the bitcell, and that the dependence of FEOL parasitic capacitances on FP must be accounted for accurately in compact models for circuit-level transient simulations.
Figure 6.34: TR vs. array configuration: (a) VSC, (b) IGC, and (c) MSC [TR (s) for array configurations (column height, row width) = (16,512), (32,128), (32,256), (32,512), and (64,512)]
Overall, Figs. 6.12, 6.13, 6.14, 6.15, 6.16, 6.17 and Figs. 6.30, 6.31, 6.32, 6.33 show that the dynamic behavior of the bitcell must be accounted for, and that analysis based on DC metrics alone can lead to misleading conclusions. For instance, VSC bitcells with multi-fin PG FETs suffer a penalty in the form of increased WL RC-delay that dwarfs any savings from the improved DC write-ability of the bitcell. E.g., while V(122) has the lowest WTP with βPG/PU ≈ 2, this does not translate into the lowest TW, on account of its higher parasitics.
Figure 6.35: TW vs. array configuration: (a) VSC, (b) IGC, and (c) MSC [TW (s) for array configurations (column height, row width) = (16,512), (32,128), (32,256), (32,512), and (64,512)]
6.5 Chapter summary
Parasitic capacitances play a critical role in determining SRAM behavior in highly scaled technology nodes employing multi-gate devices such as FinFETs. Prior research in FinFET SRAM design has generally focused on the optimization of DC metrics, largely ignoring the effect of parasitics as well as the dynamic behavior of bitcells. In this chapter, we presented a unified 3D/mixed-mode 2D-TCAD methodology that facilitates the extraction of FinFET SRAM parasitic capacitances using a transport-analysis-based approach, as well as their subsequent back-annotation into mixed-mode transient simulations, thereby delineating a path for layout/technology/multi-gate circuit co-design.
Figure 6.36: (a) TR, and (b) TW vs. VDD for V(111), across different FP [TR and TW (s) vs. VDD (V) for FP = 40nm, 50nm, 60nm, and 70nm]
Results indicate that while bitcells having IG-mode devices (in single-ΦG processes) often have superior DC metrics with respect to shorted-gate configurations, their transient characteristics under VDD/Vth variation make them unattractive. Asymm-ΦG FinFET SRAM bitcells (in dual-ΦG processes), on the other hand, have competitive DC metrics and better dynamic write-ability.
Chapter 7
Conclusion
Planar transistor scaling in deep-submicron CMOS technology has approached its limits at the sub-22nm nodes, owing to very poor electrostatic integrity, which manifests as degraded short-channel behavior and high leakage current. Consequently, it is becoming increasingly difficult to design chips that meet stringent high-performance, low-power specifications for products ranging from servers/data centers to cellphones/tablets while maintaining high yield. Multi-gate FETs overcome these problems through tighter control of the channel potential by multiple gates wrapped around the body. However, several design and technological challenges remain before multi-gate devices can be adopted across the full power-performance spectrum.
The primary contribution of this thesis is the unification of the circuit-layout, process-simulation, and device-simulation worlds for early-stage 3D modeling of circuits using emerging devices, such as FinFETs and other multi-gate FETs. This was achieved by proposing a novel structure synthesis methodology that circumvents the time and memory complexity barrier posed by 3D process simulation and obviates the need for repetitive process simulations via caching/re-use of process-accurate device structures. The latter contribution, which is unique in the industry, was leveraged throughout the dissertation to answer critical questions that arise during technology-circuit co-design efforts for multi-gate circuits.
7.1 Dissertation summary
In this dissertation, we first explored the design space for ultra-low-leakage FinFET logic and sequential elements using Symm-ΦG and Asymm-ΦG FinFETs in a high-performance process. Through comprehensive simulations of several FinFET INV and NAND2 logic topologies, we demonstrated that introducing a single Symm-ΦG IG-mode FinFET at the top of the pull-down stack yields the best trade-offs between leakage and delay for logic styles that mix SG- and IG-mode FinFETs. We also established that logic gates using pure a-SG-mode FinFETs outperform the best topologies possible using a mix of SG-/IG-mode FinFETs, with the advantages of having no routing overheads due to back-gate biasing and requiring little or no modification to existing CAD tools/layouts. Hence, this work challenges the conventional notion, held in most earlier works, that back-gate-biasing-based IG-mode devices are the best option for obtaining low-leakage FinFET circuits.
Next, we examined the problem of developing comprehensive fault models for FinFET circuits. Here, the key question was whether CMOS fault models cover all defects in FinFET logic circuits. Using exhaustive single-defect injection into mixed-mode device-circuit simulations of FinFET SG-/LP-INV and NAND2 topologies, we confirmed that the majority of defects in SG-mode-like circuits map to CMOS fault models, such as stuck-at/stuck-on/stuck-open. However, opens on the back gate have no corresponding fault model, and the entire logic gate had to be characterized from a leakage-delay perspective. Here, several kinds of behavior were observed. Depending on the regime of operation, in the presence of n-/p-FinFET back-gate cuts, the logic gate could behave normally, suffer excessive leakage, or suffer excessive delay/pulse-broadening/pulse-shortening, making it very difficult to develop a single test protocol for detecting back-gate cuts. Together, the above explorations establish that integrating IG-mode devices into a multi-gate CMOS process is undesirable, and that asymmetric gate-workfunction devices are a better option for obtaining high-Vth devices without the hassles of additional gate workfunctions, IG-mode routing overheads, and testing issues.
Thereafter, drawing upon the shortcomings of performing mixed-mode 2D-
TCAD transient simulation for obtaining delay in Chapter 3, we switched to the
problem of determining parasitic capacitances in multi-gate circuit layouts in
Chapters 4 and 5. Here, the need for a true 3D extraction setup, which accounts for
silicon as a semiconductor, became apparent. In the process of solving the latter,
we encountered a much larger problem, which was the 3D process simulation
barrier. We developed a systematic automated structure synthesis methodology
to circumvent the process simulation barrier with the key insight that all portions
of the circuit structure under simulation need not have process-simulation-level
accuracy and that it is possible to amortize process simulations of smaller cir-
cuit blocks by re-using them to synthesize larger circuits. We also validated
the transport analysis based capacitance extraction approach (that requires a
process/device simulator) by comparing simulation results with hardware data
for two companion 6T SRAM arrays implemented in an IBM 32nm SOI process
that was calibrated into our TCAD setup. After establishing the validity of our
approach, we next provided critical insight into parasitic capacitances in multi-
gate circuits at the 22/14/10nm nodes using a multi-gate version of the structure
synthesizer. Overall, structure synthesis is the TCAD analog of logic synthesis
from the circuit world, and it obviates low-level layout-to-3D-TCAD coding work,
which engineers have grappled with for the past decade.
Finally, leveraging the structure synthesis approach, we analyzed several topologies of FinFET SRAMs in a 22nm SOI process. By performing layout-to-3D-TCAD capacitance extraction for each SRAM topology (having a mix of SG-/IG-/a-SG-mode devices) and back-annotating the parasitic capacitances into mixed-mode device-circuit transient simulations, we demonstrated that dynamic read-ability/write-ability can vary dramatically with fin/gate pitch as well as SRAM topology, and that designing for DC targets alone can lead to sub-optimal SRAM bitcells and arrays.
7.2 Future work
A significant limitation of the structure synthesis ideas proposed in Chapter 4 is the exponential slowdown of the device simulator for any 3D simulation that does not involve zero-bias capacitance extraction, e.g., 3D transient/DC simulation. Here, we propose future work on enabling rapid convergence during generic device simulation experiments on synthesized 3D structures, by leveraging the constructs and abstractions proposed in Chapter 4.
Key insights
In device simulation, the solution of the carrier transport equations (depending on the model assumed, e.g., drift-diffusion, hydrodynamic, etc.) is obtained from the PSD/synthesized structure, subject to voltage/current boundary conditions imposed by the external circuit. After time and space discretization, the complete system of transport equations, which are typically nonlinear partial differential equations, is transformed into a set of nonlinear algebraic equations (F). Let K denote the total number of mesh nodes in the structure, with R variables to solve per node. Consider the drift-diffusion model for the sake of illustration. For this model, the variables that need to be solved are the electrostatic potential ψ, the electron concentration n, and the hole concentration p. Thus, R = 3. The solution vector x can be written as $x = (\psi_1, n_1, p_1, \ldots, \psi_K, n_K, p_K)^T$. The system of nonlinear equations can be cast in matrix form (with A and b, in general, depending on x, since the system is nonlinear) as:
$$F(x) = Ax - b = 0, \qquad A \in \mathbb{R}^{3K \times 3K}, \quad b \in \mathbb{R}^{3K \times 1} \tag{7.1}$$
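With R unknowns per node stored node by node, as in the solution vector above, unknown r of mesh node k sits at position R·k + r when both are counted from 0 (the text counts nodes from 1), and A has (RK)² entries. A small sketch of this indexing, assuming the interleaved ordering written above; device simulators may reorder unknowns internally, and the helper names are hypothetical.

```python
import numpy as np


def unknown_index(node, var, n_vars=3):
    """Position of unknown `var` at mesh node `node` (both 0-based) in the interleaved
    vector x = (psi_1, n_1, p_1, ..., psi_K, n_K, p_K)^T; var 0=psi, 1=n, 2=p for drift-diffusion."""
    if not 0 <= var < n_vars:
        raise ValueError("var out of range")
    return n_vars * node + var


def split_solution(x, n_vars=3):
    """Reshape the interleaved vector (length R*K) into a K x R array, one row per mesh node."""
    x = np.asarray(x)
    if x.size % n_vars:
        raise ValueError("len(x) must be a multiple of n_vars")
    return x.reshape(-1, n_vars)


K, R = 4, 3
x = np.arange(float(K * R))       # dummy values 0.0 ... 11.0
psi = split_solution(x, R)[:, 0]  # potentials at the K nodes
print(unknown_index(2, 1), psi)   # 7 [0. 3. 6. 9.]
print((K * R) ** 2)               # number of entries in A: 144
```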
A direct solver (DS) [174], [175] attempts to solve this equation by directly inverting A to give $x = A^{-1}b$, which can be very cumbersome for large K. Alternatively, some form of decomposition, e.g., LU decomposition, can be used, where A is expressed as A = LU. Here, L and U are lower and upper triangular matrices, respectively. $Ly = b$ is solved first by forward substitution, followed by $Ux = y$ by back-substitution.
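The two triangular solves can be sketched as follows. This is a minimal dense Doolittle factorization without pivoting, assuming A has nonzero pivots (e.g., a diagonally dominant test matrix). Production device simulators use sparse factorizations with pivoting and fill-reducing reordering, which this sketch does not attempt.

```python
import numpy as np


def lu_doolittle(A):
    """Factor A = L U with L unit lower triangular and U upper triangular (no pivoting)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):  # row i of U
            U[i, j] = A[i, j] - L[i, :i] @ U[:i, j]
        if U[i, i] == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        for j in range(i + 1, n):  # column i of L
            L[j, i] = (A[j, i] - L[j, :i] @ U[:i, i]) / U[i, i]
    return L, U


def forward_substitution(L, b):
    """Solve L y = b for lower-triangular L, top row first."""
    b = np.asarray(b, dtype=float)
    y = np.zeros_like(b)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y


def back_substitution(U, y):
    """Solve U x = y for upper-triangular U, bottom row first."""
    x = np.zeros_like(y)
    for i in range(len(y) - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x


A = np.array([[4.0, 3.0], [6.0, 3.0]])
b = np.array([10.0, 12.0])
L, U = lu_doolittle(A)
x = back_substitution(U, forward_substitution(L, b))
print(x)  # [1. 2.]
```

Once L and U are available, each additional right-hand side costs only the two triangular solves, which is the usual reason to factor rather than invert.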
An iterative linear solver (ILS) [176] attempts to solve Eq. (7.1) without any explicit matrix inversion, by starting with an initial guess $x_0$ and expanding F as:
$$F(x_0 + \delta x_0) \approx F(x_0) + F_x(x_0)\,\delta x_0 \approx 0, \qquad F_x = \frac{\partial F}{\partial x} \tag{7.2}$$
where $F_x$ is the Jacobian of F with respect to x. The linearized system $F_x(x_0)\,\delta x_0 = -F(x_0)$ is then solved for the update vector $\delta x_0 = x_1 - x_0$, and the process is repeated until certain convergence criteria are satisfied. This is the Newton-Raphson method, several variants of which exist in the literature [177–179]. Fig. 7.1 shows the memory scaling behavior of a DS and an ILS with an increasing number of mesh nodes for sample device simulations, using Sentaurus Device [83]. In terms of computation time, DS scales as $O(K^2)$, whereas ILS scales as $O(PK)$, where P is the number of iterations. Hence, ILS is the more suitable solver for 3D device simulations.
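A minimal sketch of the Newton-Raphson iteration of Eq. (7.2), on a toy two-unknown system rather than a device model. Each step solves Fx(xi)δxi = −F(xi); a dense NumPy solve stands in for whatever linear solver a simulator would use. The recorded update norms ‖δxi‖ shrink roughly quadratically once the iterate is near the root.

```python
import numpy as np


def newton_raphson(F, J, x0, tol=1e-12, max_iter=50):
    """Newton-Raphson per Eq. (7.2): solve J(x_i) dx_i = -F(x_i), set x_{i+1} = x_i + dx_i.

    Returns the converged x and the history of update norms |dx_i|.
    """
    x = np.array(x0, dtype=float)
    norms = []
    for _ in range(max_iter):
        dx = np.linalg.solve(J(x), -F(x))  # dense solve stands in for the linear solver
        x = x + dx
        norms.append(float(np.linalg.norm(dx)))
        if norms[-1] < tol:
            return x, norms
    raise RuntimeError("Newton-Raphson did not converge within max_iter")


def F_toy(v):
    """Toy system (not a device model): x^2 + y^2 = 4 and x = y."""
    return np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])


def J_toy(v):
    """Analytic Jacobian of F_toy."""
    return np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])


sol, norms = newton_raphson(F_toy, J_toy, [1.0, 2.0])
print(sol)    # both components approach sqrt(2), about 1.41421356
print(norms)  # update norms shrink roughly quadratically
```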
An important criterion for ILS to succeed is that each successive iteration $i$ should reduce the norm of the update vector, $\lVert \delta x_i \rVert$. Therefore, convergence depends on how close the initial guess $x_0$ is to the actual solution and how accurately the Jacobian $F_x$ is computed. When both are favorable, the convergence rate is quadratic.
Figure 7.1: Scaling behavior: Direct versus iterative linear solvers
The “closeness” criterion is key to solving certain hard, pathological instances
where an initial guess x0, which would normally have been considered to be close
to the solution, causes successive iterations to oscillate due to the nonlinear behav-
ior of the system and prevents convergence within the maximum iteration limit.
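The oscillation described above can be reproduced with a textbook scalar example (a stand-in, not a device equation). For f(x) = x³ − 2x + 2, whose only real root lies near x ≈ −1.769, Newton iterates started at x0 = 0 alternate between 0 and 1 indefinitely, whereas a start at x0 = −2 converges. Convergence thus depends on where the guess sits relative to the nonlinear behavior of f, not only on its distance from the root.

```python
def newton_1d(f, df, x0, n_steps):
    """Plain scalar Newton-Raphson; returns the iterates [x0, x1, ..., x_n]."""
    xs = [float(x0)]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs


def f(x):
    return x ** 3 - 2.0 * x + 2.0  # single real root near x = -1.769


def df(x):
    return 3.0 * x ** 2 - 2.0


print(newton_1d(f, df, 0.0, 4))       # [0.0, 1.0, 0.0, 1.0, 0.0]: a 2-cycle, no convergence
print(newton_1d(f, df, -2.0, 8)[-1])  # converges to the real root (about -1.7693)
```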
While iterations on Eq. (7.2) may help reach a solution at a certain bias, a strategy is needed to obtain a solution at any bias, as this is essential for DC, AC, and transient simulations. Here, most device simulators extrapolate from an earlier biasing condition. In order to extrapolate, the dependence on the boundary conditions V is made explicit:
$$F(x, V) = 0 \tag{7.3}$$
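Assuming that a solution x is available at a certain bias V, a bias increment δV must be chosen in order to ramp up to a new bias condition. Advancing to the new bias V + δV, the new solution x + δx can be computed from a first-order expansion, in which F(x, V) vanishes by Eq. (7.3):
$$F(x+\delta x, V+\delta V) \approx F(x,V) + F_x(x,V)\,\delta x + F_V(x,V)\,\delta V = F_x(x,V)\,\delta x + F_V(x,V)\,\delta V \approx 0 \tag{7.4}$$
whereby
$$F_x(x,V)\,\delta x = -F_V(x,V)\,\delta V \tag{7.5}$$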
Whereas $F_x(x,V)$ is the Jacobian as before, $F_V(x,V)$ needs to be evaluated from the equations that depend on V. By iterating on Eq. (7.5), the solution at V + δV is determined. Thereafter, a new δV is chosen, and the process repeats until the final biasing condition is reached. Since each intermediate ramping step can consume a large number of iterations for a 3D device structure, simulation far from zero-bias conditions is very cumbersome. Moreover, extrapolation works well only when δV is small: at an arbitrary nonequilibrium condition V, the projected solution converges quadratically only if the system of equations F behaves linearly around V. Otherwise, convergence is not obtained and a smaller bias increment must be chosen. This restriction on δV increases the total number of ramping steps needed, and hence the total runtime. With the introduction of more complex physical models, the system of equations F becomes harder to solve. E.g., for the hydrodynamic model, R = 6, and since the number of entries in A scales as $O(K^2R^2)$, the runtime increases even more.
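The ramping procedure of Eqs. (7.3)-(7.5) can be sketched as a predictor-corrector continuation loop: Eq. (7.5) gives the predicted δx for a bias step δV, Newton iterations on F(x, V + δV) = 0 correct it, and δV is halved whenever the corrector fails to converge. The growth factor after a successful step (1.5), the iteration cap, and the toy scalar system F(x, V) = x³ + x − V are assumptions made for illustration only.

```python
import numpy as np


def ramp_bias(F, Fx, FV, x0, v0, v_final, dv0, tol=1e-10, max_newton=8, dv_min=1e-6):
    """Predictor-corrector bias ramp from (x0, v0) up to v_final.

    Predictor: solve Fx(x, V) dx = -FV(x, V) dV (Eq. 7.5).
    Corrector: Newton iterations on F(x, V + dV) = 0.
    dV is halved when the corrector fails and grown by 1.5x after success (assumed policy).
    Returns the solution at v_final and the number of accepted bias steps.
    """
    if v_final < v0:
        raise ValueError("this sketch only ramps upward")
    x, v, dv = np.array(x0, dtype=float), float(v0), float(dv0)
    accepted = 0
    while v < v_final:
        v_new = v_final if v + dv >= v_final else v + dv
        step = v_new - v
        x_new = x + np.linalg.solve(Fx(x, v), -FV(x, v) * step)  # predictor
        converged = False
        for _ in range(max_newton):  # corrector
            dx = np.linalg.solve(Fx(x_new, v_new), -F(x_new, v_new))
            x_new = x_new + dx
            if np.linalg.norm(dx) < tol:
                converged = True
                break
        if converged:
            x, v, accepted = x_new, v_new, accepted + 1
            dv = 1.5 * step
        else:
            dv = 0.5 * step
            if dv < dv_min:
                raise RuntimeError("bias increment fell below dv_min")
    return x, accepted


# Toy scalar system (not a device model): F(x, V) = x^3 + x - V, whose solution at V = 10 is x = 2.
def F(x, v):
    return np.array([x[0] ** 3 + x[0] - v])


def Fx(x, v):
    return np.array([[3.0 * x[0] ** 2 + 1.0]])


def FV(x, v):
    return np.array([-1.0])


x_end, n_steps = ramp_bias(F, Fx, FV, x0=[0.0], v0=0.0, v_final=10.0, dv0=1.0)
print(x_end, n_steps)
```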
Proposed approach
We propose to replace the traditional extrapolate strategy with a hybrid cache-extrapolate-update approach to overcome the difficulties encountered in finding a solution for F, targeting both larger bias increments and complex physical models that have R ≥ 3. The key idea is illustrated in Fig. 7.2, which shows a synthesized 3D FinFET SRAM structure on the right. During the course of a simulation, the devices in the structure may repeatedly traverse various states (e.g., in terms of ψ, n, p). Thus, the following questions arise. Why should the solver take the trouble of extrapolating for each global boundary condition V + δV = Σ? Is it possible to express Σ as a set of internal boundary/bias conditions $\zeta_1, \zeta_2, \ldots, \zeta_{N_Z}$, where $N_Z$ is the number of zones that the structure can be partitioned into? Thereafter, for the structure in zone j, $1 \le j \le N_Z$, if a corresponding pre-solved device exists (it should match in geometry, but need not be identical in doping distributions, etc., i.e., PW-GA is sufficient), would it be possible to “restore” the cached device state at the boundary condition $\zeta_j$? This would enable the solver to start with a solution that is very close to the actual solution for Σ, with extrapolation performed only in the “uncached” zones corresponding to regions between FETs, etc. Also, extrapolation in uncached zones could be a heuristic transform (Ω) consisting of a combination of extrapolations from nearby cached device states and extrapolation from the earlier bias condition. This strategy would reduce the number of iterations per bias ramp, irrespective of the value of R, permitting larger values of δV. Owing to the zoning requirement, this approach also merges well with the zone-based structure synthesis approach proposed earlier.
Figure 7.2: Key question: Can the solution of a large structure be approximated using individual pre-solved device states? [pre-solved device states at bias condition 1, bias condition 2, bias condition 3, …; extended structure]
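One way to realize the per-zone “restore” step is a nearest-neighbor search over cached bias points, declaring a zone uncached when no stored state is close enough. The class and function names, the use of Euclidean distance in bias space, the (Vg, Vd) bias tuples, and the distance cutoff are all hypothetical choices for illustration; the text does not prescribe a lookup metric.

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class DeviceStateCache:
    """Hypothetical per-device cache: bias tuple -> solution array on that device's own mesh."""
    states: dict = field(default_factory=dict)

    def store(self, bias, solution):
        self.states[tuple(bias)] = np.asarray(solution, dtype=float)

    def nearest(self, bias, max_dist):
        """Return (cached_bias, solution) nearest to `bias` in Euclidean distance,
        or None if the cache is empty or nothing lies within max_dist."""
        if not self.states:
            return None
        best = min(self.states, key=lambda k: np.linalg.norm(np.subtract(k, bias)))
        if np.linalg.norm(np.subtract(best, bias)) > max_dist:
            return None
        return best, self.states[best]


def restore_zones(zone_conditions, caches, max_dist):
    """zone_conditions: {zone: (device_name, bias)}, i.e. zeta_j for each zone j.
    Returns {zone: restored solution, or None for an uncached zone to be extrapolated}."""
    restored = {}
    for zone, (device, bias) in zone_conditions.items():
        cache = caches.get(device)
        hit = cache.nearest(bias, max_dist) if cache is not None else None
        restored[zone] = None if hit is None else hit[1]
    return restored


# Usage with made-up data: one cached n-FinFET at two (Vg, Vd) points.
nfet = DeviceStateCache()
nfet.store((0.0, 0.0), [0.0, 0.0, 0.0])
nfet.store((1.0, 1.0), [0.9, 1.0, 1.1])
zones = {"z1": ("nfet", (0.95, 1.0)), "z2": ("spacer", (0.5, 0.5))}
print(restore_zones(zones, {"nfet": nfet}, max_dist=0.2))
```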
The above approach can be easily extended to mixed-mode simulation, where $\zeta_j$ denotes the boundary condition at the contacts of individual devices. We performed preliminary experiments with a trivial cache-restore strategy on seven different FinFET NAND gate topologies described in [60]. Fig. 7.3 shows the results. An improvement of 2.6× to 5× is seen, which is promising. Since the cache and restore breakpoints were manually inserted in these cases, this strategy is not suitable for larger mixed-mode/3D simulations, which would need a systematic and automated approach.
Figure 7.3: Transient mixed-mode 2D device simulation runtimes for FinFET NAND gates, with and without cache-restore of device states [mixed-mode simulation runtime (s), without vs. with caching, for SGNAND, IGNAND, MTNAND, IG2NAND, XT2NAND, XTNAND, and LPNAND]
Figure 7.4: Generation of device-state database [flowchart, looped over each device: 2D/3D process simulation of a single device (Tsuprem4, Sprocess, etc.) → if 2D, extrude and transform into 3D → re-mesh the single-device structure and add/delete electrical/thermal contacts (Sentaurus Mesh, etc.) → detailed single-device 3D simulation with different physical/transport models (e.g., DD, hydrodynamic) and DC, AC, and transient pulse characterization (Sentaurus Device, etc.) → sub-sample the 3D device/cache device snapshots at a small subset of device mesh points and different bias conditions (Vi, Ik) of the structure → intelligent device-state caching (DSC) algorithm that topologically encodes the device state, e.g., n(r), T(r), etc. → device-state database (DSD); inputs: database of process conditions, material system parameters/properties, physical model coefficients]
Figure 7.5: State retrieval and extrapolation using the BGB algorithm [flowchart: structure interface: virtual segmentation of the extended 3D structure using DLD device association with global mesh points, and intelligent selection of mesh points for BGB from the 3D structure; DSD lookup via the DLD with (Devj, Vi) returning n(r), T(r), etc., and Ik, spatially extrapolated to all mesh points at the current bias condition; BGB interface: the best-guess-at-bias (BGB) algorithm stitches the solution for the entire 3D structure from individual device states, transforms it to DLD coordinates at the current bias, extrapolates at regions/points not present in the DSD, and generates a complete solution adaptable to the required structure on the fly]
The building blocks of the proposed cache-extrapolate-update approach are shown in Figs. 7.4, 7.5, and 7.6. The first step will be to generate the device-state database (DSD). This will need to be performed for all the devices that are expected to be used repeatedly in a large number of simulations, to efficiently amortize the effort involved. These devices will be simulated under several different DC and transient boundary conditions to generate considerable device-state data. Thereafter, each device will be sub-sampled at select mesh locations and select bias conditions, using an intelligent device-state caching algorithm that encodes the device state without consuming a prohibitively large amount of disk space. Fig. 7.5 shows a possible state retrieval/reconstruction approach based on a best-guess-at-bias (BGB) algorithm that will be developed. Fig. 7.6 shows the point of entry into the solver loop of the device simulator. While the 3D structure is solved at zero-bias conditions (Fig. 7.6) in a separate thread, structural information can be passed to the retrieval step, where the structure is segmented/partitioned into zones with the aid of the DLD (these can be identical to the zones used in synthesis). Using the DLD and DSD, it will become possible to determine the zones that can be mapped to pre-solved devices, whereby the mesh associated with device j becomes associated with the global structure mesh (this mapping is referred to as $\Gamma_j$). At this juncture, the method will also intelligently select restore points in the 3D structure.
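The sub-sampling step can be illustrated in one dimension: keep the state only at every stride-th mesh point and reconstruct the full profile by linear interpolation. This is a hypothetical sketch. Real device meshes are 3D and unstructured, which would call for scattered-data interpolation instead, and `np.interp` simply holds the end values constant outside the sampled range rather than truly extrapolating.

```python
import numpy as np


def subsample_state(coords, values, stride):
    """Keep every `stride`-th mesh point (always including the last) to shrink the stored state."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    idx = np.arange(0, len(coords), stride)
    if idx[-1] != len(coords) - 1:
        idx = np.append(idx, len(coords) - 1)
    return coords[idx], values[idx]


def reconstruct_state(sample_coords, sample_values, mesh_coords):
    """Linear interpolation back onto the full 1D mesh (sample_coords must be increasing);
    np.interp holds the end values constant outside the sampled range."""
    return np.interp(mesh_coords, sample_coords, sample_values)


# Usage with a made-up, piecewise-linear potential profile on a 1D mesh of 101 points.
mesh = np.linspace(0.0, 1.0, 101)
psi = np.minimum(mesh, 0.5)  # hypothetical profile with a kink at x = 0.5
xs, ps = subsample_state(mesh, psi, stride=10)
psi_rec = reconstruct_state(xs, ps, mesh)
print(len(xs), np.max(np.abs(psi_rec - psi)))  # 11 stored points; the kink is a sample point, so error is ~0
```

The trade-off is storage against reconstruction error: profiles with sharp features between sample points (e.g., junctions) would need denser sampling there, which is what an intelligent caching algorithm would have to decide.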
Figure 7.6: Updating state in the solver loop [flowchart: load extended 3D structure into device simulator → solve at zero bias → increment/decrement boundary condition (Σ) → additional steps through the BGB and structure interfaces → update state of extended 3D structure → re-solve till convergence criteria satisfied → current bias = final bias? (No: increment/decrement again; Yes: STOP)]
Depending on the nature of the “solve” task, the solver will increment or decrement the boundary condition $\Sigma$. The earlier device state and $\Sigma$ will be passed to the BGB interface. The BGB algorithm will convert $\Sigma$ into a set of zone boundary conditions, $\zeta_j = \mathrm{Dev}_j V_j$, which will be passed to the DSD for state lookup. This operation can be multi-threaded, with the lookups for all zones occurring in parallel. The DSD will retrieve the stored state closest to $\zeta_j$ and spatially extrapolate it to all mesh points of the device structure. Next, the BGB algorithm will collect all the data from the DSD and map/extrapolate them to the global structure mesh using each $\Gamma_j$. Thereafter, for zones that are not covered by the DSD, such as regions between FETs, another extrapolation will be performed using the heuristic transform $\Omega$ to stitch together a complete solution for the entire structure. This will serve as the initial guess for the solver, as shown in Fig. 7.6. The solver will then assume control and iterate using Eq. (7.2) until convergence is reached.
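The BGB assembly step described above can be sketched in Python as follows, again as a hypothetical illustration on a 1D global mesh. Each zone contributes a state retrieved from the cache together with an integer index array standing in for the mapping $\Gamma_j$; global nodes covered by no zone are filled by linear interpolation, which is only a placeholder for the heuristic transform $\Omega$, whose form is not specified here. The per-zone lookups run concurrently through a thread pool, mirroring the multi-threaded lookup mentioned above. All function names are assumptions of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def lookup_all(lookup, zone_biases):
    """Perform one cache lookup per zone concurrently; results keep zone order.
    `lookup` is any callable mapping a zone bias to a retrieved state."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lookup, zone_biases))


def bgb_initial_guess(global_x, zone_states):
    """Stitch a best-guess-at-bias initial state on the global mesh.

    global_x    : sorted 1D coordinates of the global-structure mesh nodes
                  (a 1D stand-in for the 3D mesh).
    zone_states : list of (gamma_j, state_j) pairs. gamma_j is an integer
                  array playing the role of Gamma_j: entry i is the global
                  node that local node i of pre-solved device j maps to.
                  state_j is the state retrieved from the cache for zone j.

    Nodes covered by no zone (e.g. regions between FETs) are filled by
    linear interpolation between covered nodes, a simple placeholder for
    the heuristic transform Omega.
    """
    global_x = np.asarray(global_x, dtype=float)
    guess = np.full(global_x.shape, np.nan)
    for gamma_j, state_j in zone_states:
        # Map each zone's retrieved state onto the global mesh via Gamma_j.
        guess[np.asarray(gamma_j)] = np.asarray(state_j, dtype=float)
    covered = ~np.isnan(guess)
    if not covered.any():
        raise ValueError("no zone could be mapped to a pre-solved device")
    # Fill uncovered nodes from covered neighbours; outside the covered
    # range, np.interp holds the nearest covered value constant.
    guess[~covered] = np.interp(global_x[~covered], global_x[covered], guess[covered])
    return guess
```

For two zones covering nodes 0 to 2 (state 0) and 4 to 6 (state 1) of a seven-node mesh, the uncovered node 3 receives 0.5, halfway between its neighbours.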
It is important to note that the additional steps introduced by the proposed approach are optional. If the solver is able to make large $\delta V$ increments with few iterations to convergence, the above steps can be skipped and Eq. (7.5) can be used. Otherwise, whenever $\delta V$ decreases or a large number of iterations is needed for convergence at a bias point, the BGB update can be invoked to provide a good initial guess. The proposed approach will be particularly useful as the complexity of the physical models increases, e.g., when solving with quantum hydrodynamic models, where $F$ can be quite complex and convergence difficult to achieve.
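The decision of when to invoke the optional BGB update can be expressed as a small predicate. The sketch below is a hypothetical Python policy that follows the two triggers named above (a shrinking bias step $\delta V$, or slow convergence at the last bias point); the iteration threshold `max_easy_iters` is an arbitrary illustrative value, not one taken from this work.

```python
def should_invoke_bgb(dv, dv_prev, newton_iters, max_easy_iters=8):
    """Decide whether to request a BGB initial guess for the next bias point.

    Hypothetical policy: the BGB update is used only when the bias ramp is
    struggling, i.e. the step size dv has shrunk relative to the previous
    step, or the last bias point needed more than max_easy_iters Newton
    iterations. Otherwise the solver continues with its usual large steps.
    dv_prev is None for the first step of a ramp.
    """
    step_shrank = dv_prev is not None and abs(dv) < abs(dv_prev)
    slow_convergence = newton_iters > max_easy_iters
    return step_shrank or slow_convergence


if __name__ == "__main__":
    # (dv, previous dv, Newton iterations needed at the last bias point)
    history = [(0.1, None, 4), (0.1, 0.1, 5), (0.05, 0.1, 6), (0.05, 0.05, 15)]
    for dv, dv_prev, iters in history:
        print(dv, should_invoke_bgb(dv, dv_prev, iters))
```

Keeping the policy separate from the solver loop means the BGB path adds no cost on easy bias ramps, consistent with the steps being optional.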
Appendix A
FinE3D framework
A.1 FinE3D Sentaurus TCAD decks
The FinE3D structure synthesis framework was developed in Python and inte-
grated into the Sentaurus Workbench. The setup consists of two kinds of decks:
• PrepareFETs: These decks build the PA-GA zone databases from detailed process simulations, followed by pre-synthesis transformations that make the zones amenable to structure synthesis, as shown in Figs. A.1 and A.2.
• GDS2Device: These decks perform structure synthesis for annotated input layouts, which are derived using a fixed layout-layer map file from Cadence Virtuoso, as shown in Fig. A.3. The layout is then analyzed, and the FEOL synthesizer (Fig. A.4) produces the FEOL structure using the FEOL process assumptions and the device database. This is followed by BEOL and integrated structure synthesis, each with its own input files, as shown in Fig. A.5. After structure synthesis, selective mesh refinement is performed, followed by transport-analysis-based capacitance extraction in the device simulator, as shown in Fig. A.6. Finally, post-processing toolchains are appended to produce the requested outputs.
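Since FinE3D was written in Python, the ordering constraints between the deck stages described above can be illustrated with Python's standard-library `graphlib`. The sketch below is not FinE3D code: the stage names are hypothetical, and the dependency edges are one reading of the prose (in particular, BEOL synthesis is assumed to follow FEOL synthesis, and FEOL synthesis is assumed to need both the annotated layout and the PA-GA zone database). A topological sort then yields one valid execution order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stage names for the two deck families described above.
# Each key maps a stage to the set of stages whose outputs it consumes.
FINE3D_STAGES = {
    # PrepareFETs decks: build the PA-GA zone database.
    "process_simulation": set(),
    "presynthesis_transforms": {"process_simulation"},
    # GDS2Device decks: synthesize a structure for one annotated layout.
    "layout_annotation": set(),
    "feol_synthesis": {"layout_annotation", "presynthesis_transforms"},
    "beol_synthesis": {"feol_synthesis"},
    "integrated_synthesis": {"feol_synthesis", "beol_synthesis"},
    "mesh_refinement": {"integrated_synthesis"},
    "capacitance_extraction": {"mesh_refinement"},
    "post_processing": {"capacitance_extraction"},
}


def execution_order(stages=FINE3D_STAGES):
    """Return one dependency-respecting execution order for the stages.
    Raises graphlib.CycleError if the declared dependencies form a cycle."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    for step, stage in enumerate(execution_order(), start=1):
        print(step, stage)
```

Encoding the flow as a dependency graph rather than a fixed list lets independent stages, such as process simulation and layout annotation, be scheduled in either order or in parallel.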
Figure A.1: A sample process simulation deck
Figure A.2: Pre-synthesis transformations
Figure A.3: Layout annotation (blocks: layout layer map; annotated layouts; layout annotator/converter)
Figure A.4: FEOL structure synthesis (blocks: FEOL synthesizer; PA-GA zone database; FEOL process assumptions; FEOL flags)
Figure A.5: BEOL and integrated structure synthesis (blocks: BEOL synthesizer; BEOL process assumptions; BEOL flags; integrated structure synthesizer; integration flags)
Figure A.6: Mesh refinement, capacitance extraction, and post-processing (blocks: mesh refinements; transport analysis based capacitance extraction; post-processing)
Bibliography
[1] J. P. Colinge, FinFETs and Other Multi-gate Transistors. Springer, New York, 2008.
[2] D. Vasileska, S. Goodnick, and G. Klimeck, Computational Electronics: Semi-classical and Quantum Device Modeling and Simulation. CRC Press, 2010.
[3] B. Agrawal, V. K. De, J. M. Pimbley, and J. D. Meindl, “Short channel models and scaling limits of SOI and bulk MOSFETs,” IEEE J. Solid-State Circuits, vol. 29, no. 2, pp. 122–125, Feb. 1994.
[4] T. Skotnicki, G. Merckel, and T. Pedron, “The voltage doping transformation: A new approach to the modeling of MOSFET short channel effects,” IEEE Electron Device Lett., vol. 9, no. 3, pp. 109–112, Mar. 1988.
[5] T. Sakurai, A. Matsuzawa, and T. Douseki, Fully depleted SOI CMOS circuits and Technology for Ultralow Power Applications. Springer, New York, 2006.
[6] J. P. Colinge, Silicon on Insulator Technology: Materials to VLSI. Springer, New York, 2004.
[7] K. Bernstein and N. Rohrer, SOI Circuit Design Concepts. Springer, New York, 2000.
[8] “2007 International Technology Roadmap for Semiconductors,” http://www.itrs.net/Links/2007ITRS/Home2007.htm.
[9] D. Hisamoto, T. Kaga, Y. Kawamoto, and E. Takeda, “A fully depleted lean-channel transistor (DELTA),” in Proc. Int. Electron Devices Mtg., Dec. 1989, pp. 833–836.
[10] M. Jurczak et al., “Silicon on nothing (SON): An innovative process for advanced CMOS,” IEEE Trans. Electron Devices, vol. 47, no. 11, pp. 2179–2187, Nov. 2000.
[11] L. Mathew et al., “CMOS vertical multiple independent gate field effect transistor (MIGFET),” in Proc. Int. SOI Conf., Oct. 2004, pp. 187–189.
[12] K. Okano et al., “Process integration technology and device characteristics of CMOS FinFET on bulk silicon substrate with sub-10 nm fin width and 20 nm gate length,” in Proc. Int. Electron Devices Mtg., Dec. 2005, pp. 721–724.
[13] B. S. Doyle et al., “High performance fully depleted tri-gate CMOS transistors,” IEEE Electron Device Lett., vol. 24, no. 4, pp. 263–265, Apr. 2003.
[14] J. T. Park, J. P. Colinge, and C. H. Diaz, “Pi-gate SOI MOSFET,” IEEE Electron Device Lett., vol. 22, no. 8, pp. 405–406, 2001.
[15] F. L. Yang et al., “25 nm CMOS omega FETs,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 255–258.
[16] N. Singh et al., “High performance fully depleted silicon nanowire (diameter < 5 nm) gate all around CMOS devices,” IEEE Electron Device Lett., vol. 27, no. 5, pp. 383–386, May 2006.
[17] S. Miyano, M. Hirose, and F. Masuoka, “Numerical analysis of a cylindrical thin pillar transistor (CYNTHIA),” IEEE Trans. Electron Devices, vol. 39, no. 8, pp. 1876–1881, Aug. 1992.
[18] S. Y. Lee et al., “Three dimensional MBCFET as an ultimate transistor,” IEEE Electron Device Lett., vol. 25, no. 4, pp. 217–219, Apr. 2004.
[19] J. P. Colinge, M. H. Gao, A. Romano, H. Maes, and C. Claeys, “Silicon on insulator gate all around device,” in Proc. Int. Electron Devices Mtg., Dec. 1990, pp. 595–598.
[20] D. Park, “3 dimensional GAA transistors: Twin silicon nanowire MOSFET and multi bridge channel MOSFET,” in Proc. Int. SOI Conf., Oct. 2006, pp. 131–134.
[21] T. Ernst et al., “Novel 3D integration process for highly scalable nano beam stacked channels GAA (NBG) CMOSFETs with HfO2/TiN gate stack,” in Proc. Int. Electron Devices Mtg., Dec. 2006, pp. 4–7.
[22] E. J. Nowak, I. Aller, T. Ludwig, K. Kim, R. V. Joshi, C.-T. Chuang, K. Bernstein, and R. Puri, “Turning silicon on its edge,” IEEE Circuits and Devices Magazine, vol. 20, no. 1, pp. 20–31, Jan.-Feb. 2004.
[23] X. Huang et al., “Sub-50 nm FinFET: PMOS,” in Proc. Int. Electron Devices Mtg., Dec. 1999, pp. 67–70.
[24] B. Yu et al., “FinFET scaling to 10nm gate length,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 251–254.
[25] J. Yang, P. M. Zeitzoff, and H. H. Tseng, “Highly manufacturable double-gate FinFET with gate source/drain underlap,” IEEE Trans. Electron Devices, vol. 54, no. 6, pp. 1464–1470, June 2007.
[26] T. Ludwig et al., “FinFET technology for future microprocessors,” in Proc. Int. SOI Conf., 2003, pp. 33–34.
[27] L. Mathew et al., “Multi-gated device architectures advances, advantages and challenges,” in Proc. Int. Conf. Integrated Circuit Design and Technology and Tutorial, 2004, pp. 97–98.
[28] Y. K. Choi et al., “FinFET process refinements for improved mobility and gate workfunction engineering,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 259–262.
[29] Z. B. Zhang et al., “An integratable dual metal gate/high-k CMOS solution for FDSOI and MuGFET technologies,” in Proc. Int. SOI Conf., 2005, pp. 157–158.
[30] M. Ieong et al., “High performance double-gate device technology challenges and opportunities,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2002, pp. 492–495.
[31] B. Majkusiak, T. Janik, and J. Walczak, “Semiconductor thickness effects in the double gate SOI MOSFET,” IEEE Trans. Electron Devices, vol. 45, no. 5, pp. 1127–1134, May 1998.
[32] C. W. Lee et al., “Device design guidelines for nanoscale MuGFETs,” Solid-State Electronics, vol. 51, no. 3, pp. 505–510, 2007.
[33] Y. K. Choi, T. J. King, and C. Hu, “Nanoscale CMOS spacer FinFET for the terabit era,” IEEE Electron Device Lett., vol. 23, no. 1, pp. 25–27, Jan. 2002.
[34] J. Kedzierski et al., “Extension and source/drain design for high performance FinFET devices,” IEEE Trans. Electron Devices, vol. 50, no. 4, pp. 952–958, Apr. 2003.
[35] H. Shang et al., “Investigation of FinFET devices for 32nm technologies and beyond,” in Proc. Int. Symp. VLSI Technology, June 2006, pp. 54–55.
[36] S. Xiong and J. Bokor, “Sensitivity of double gate and FinFET devices to process variations,” IEEE Trans. Electron Devices, vol. 50, no. 11, pp. 2255–2261, Nov. 2003.
[37] J. Kedzierski et al., “Metal gate FinFET and fully depleted SOI devices using total gate silicidation,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 247–250.
[38] P. Ranade et al., “Tunable workfunction molybdenum gate technology for FDSOI-CMOS,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 363–366.
[39] W. P. Maszara et al., “Transistors with dual workfunction metal gates by single full silicidation (FUSI) of polysilicon gates,” in Proc. Int. Electron Devices Mtg., Dec. 2002, pp. 367–370.
[40] D. Ha et al., “Molybdenum gate HfO2 CMOS FinFET technology,” in Proc. Int. Electron Devices Mtg., Dec. 2004, pp. 643–646.
[41] M. Dunga et al., “BSIM-MG: A versatile multi-gate FET model for mixed-signal design,” in Proc. Int. Symp. VLSI Technology, June 2007, pp. 60–61.
[42] J. Fossum et al., “A process-physics based compact model for nanoclassical CMOS device and circuit design,” Solid-State Electronics, vol. 48, pp. 919–926, June 2004.
[43] W. Zhang, J. Fossum, L. Mathew, and Y. Du, “Physical insights regarding design and performance of independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 52, no. 10, pp. 2198–2207, Oct. 2005.
[44] S. H. Kim, J. G. Fossum, and V. P. Trivedi, “Bulk inversion in FinFETs and implied insights on effective gate width,” IEEE Trans. Electron Devices, vol. 52, no. 9, pp. 1993–1997, Sept. 2005.
[45] V. Trivedi, J. G. Fossum, and M. M. Chowdhury, “Nanoscale FinFETs with gate source/drain underlap,” IEEE Trans. Electron Devices, vol. 52, no. 1, pp. 56–62, Jan. 2005.
[46] H.-K. Lim and J. G. Fossum, “Threshold voltage of thin film silicon on insulator MOSFETs,” IEEE Trans. Electron Devices, vol. 30, no. 10, pp. 1244–1251, Oct. 1983.
[47] M. M. Chowdhury and J. G. Fossum, “Physical insights on electron mobility in contemporary FinFETs,” vol. 27, pp. 482–485, June 2006.
[48] M. M. Chowdhury, V. P. Trivedi, J. G. Fossum, and L. Mathew, “Carrier mobility/transport in undoped-UTB DG FinFETs,” IEEE Trans. Electron Devices, vol. 54, pp. 1125–1131, May 2007.
[49] J. G. Fossum et al., “Pragmatic design of nanoscale multi-gate CMOS,” in Proc. Int. Electron Devices Mtg., Dec. 2004, pp. 613–616.
[50] M. Masahara et al., “Demonstration of asymmetric gate oxide thickness 4-terminal FinFETs,” in Proc. Int. SOI Conf., 2006, pp. 165–166.
[51] N. Collaert et al., “A functional 41-stage ring oscillator using scaled FinFET devices with 25-nm gate lengths and 10-nm fin widths applicable for the 45-nm CMOS node,” IEEE Electron Device Lett., vol. 25, pp. 568–570, Aug. 2004.
[52] A. Datta et al., “Modeling and circuit synthesis for independently controlled double gate FinFET devices,” IEEE Trans. Computer-Aided Design, vol. 26, no. 11, pp. 1957–1966, Nov. 2007.
[53] K. Endo et al., “A dynamical power-management demonstration using four-terminal separated-gate FinFETs,” IEEE Electron Device Lett., vol. 28, pp. 452–454, May 2007.
[54] D. Hackler, D. DeGregorio, and S. Parke, “Ultra-low-power, high-performance, dynamic-threshold digital circuits in the FlexFET independently-double-gated SOI CMOS technology,” in Proc. Int. SOI Conf., 2005, pp. 81–82.
[55] J. Gu, J. Keane, S. Sapatnekar, and C. H. Kim, “Statistical leakage estimation of double gate FinFET devices considering the width quantization property,” IEEE Trans. VLSI Systems, vol. 16, pp. 206–209, Feb. 2008.
[56] J. Ouyang and Y. Xie, “Power optimization for FinFET-based circuits using genetic algorithms,” in Proc. Int. SOI Conf., 2008, pp. 211–214.
[57] A. Kumar, B. A. Minch, and S. Tiwari, “Low voltage and performance tunable CMOS circuit design using independently driven double gate MOSFETs,” in Proc. Int. SOI Conf., 2004, pp. 119–121.
[58] M. H. Chiang, K. Kim, C. Tretz, and C. Chuang, “Novel high-density low-power logic circuit techniques using DG devices,” IEEE Trans. Electron Devices, vol. 52, pp. 2339–2342, Oct. 2005.
[59] A. Muttreja, N. Agarwal, and N. K. Jha, “CMOS logic design with independent-gate FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.
[60] A. N. Bhoj and N. K. Jha, “Design of ultra-low-leakage logic gates and flip-flops in high performance FinFET technology,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2011, pp. 1–8.
[61] S. A. Tawfik and V. Kursun, “Characterization of new static independent-gate-biased FinFET latches and flip-flops under process variations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 311–316.
[62] ——, “Low-power and compact sequential circuits with independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 55, no. 1, pp. 60–70, Jan. 2008.
[63] G. Curatola and S. Nuttinck, “The role of volume inversion on the intrinsic RF performance of double-gate FinFETs,” IEEE Trans. Electron Devices, vol. 54, no. 1, pp. 141–150, Jan. 2007.
[64] S. Nuttinck, B. Parvais, G. Curatola, and A. Mercha, “Double-gate FinFETs as a CMOS technology downscaling option: An RF perspective,” IEEE Trans. Electron Devices, vol. 54, no. 2, pp. 279–283, Feb. 2007.
[65] V. Subramanian et al., “Planar bulk MOSFETs versus FinFETs: An analog/RF perspective,” IEEE Trans. Electron Devices, vol. 53, no. 12, pp. 3071–3079, Dec. 2006.
[66] P. Wambacq et al., “The potential of FinFETs for analog and RF circuit applications,” IEEE Trans. Circuits & Systems, vol. 54, pp. 2541–2551, Nov. 2007.
[67] A. Kranti and G. A. Armstrong, “Design and optimization of FinFETs for ultra-low-voltage analog applications,” IEEE Trans. Electron Devices, vol. 54, pp. 3308–3316, Dec. 2007.
[68] G. Pei and E. C. Kan, “Independently driven DG MOSFETs for mixed-signal circuits: Part I-quasi-static and nonquasi-static channel coupling,” IEEE Trans. Electron Devices, vol. 51, pp. 2086–2093, Dec. 2004.
[69] ——, “Independently driven DG MOSFETs for mixed-signal circuits: Part II-applications on cross-coupled feedback and harmonics generation,” IEEE Trans. Electron Devices, vol. 51, pp. 2094–2101, Dec. 2004.
[70] M. Shrivastava et al., “A novel and robust approach for common mode feedback using IDDG FinFET,” IEEE Trans. Electron Devices, vol. 55, pp. 3274–3282, Nov. 2008.
[71] Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM design,” in Proc. Int. Symp. Low Power Electronics & Design, Aug. 2005, pp. 2–7.
[72] A. Carlson, Z. Guo, S. Balasubramanian, R. Zlatanovici, T. Liu, and B. Nikolic, “SRAM read/write margin enhancements using FinFETs,” IEEE Trans. VLSI Systems, vol. 18, no. 6, pp. 887–900, Sept. 2009.
[73] S. Inaba et al., “Direct evaluation of DC characteristic variability in FinFET SRAM cell for 32nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2007, pp. 487–490.
[74] A. Bansal, S. Mukhopadhyay, and K. Roy, “Device-optimization technique for robust and low-power FinFET SRAM design in nanoscale era,” IEEE Trans. Electron Devices, vol. 54, pp. 1409–1419, June 2007.
[75] Z. Liu, S. A. Tawfik, and V. Kursun, “Statistical data stability and leakage evaluation of FinFET SRAM cells with dynamic threshold voltage tuning under process parameter fluctuations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 305–310.
[76] B. K. Young, K. Yong-Bin, and F. Lombardi, “Low power 8T SRAM using 32nm independent gate FinFET technology,” in Proc. Int. SOI Conf., 2008, pp. 247–250.
[77] R. V. Joshi, K. Kim, R. Q. Williams, E. J. Nowak, and C.-T. Chuang, “A high-performance, low leakage, and stable SRAM row-based back-gate biasing scheme in FinFET technology,” in Proc. Int. Conf. VLSI Design, Jan. 2007, pp. 665–672.
[78] K. Itoh, M. Horiguchi, and H. Tanaka, Ultra-low Voltage Nanoscale Memories. Springer, New York, 2007.
[79] W. Zhang, J. G. Fossum, L. Mathew, and Y. Du, “Physical insights regarding design and performance of independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 52, pp. 2198–2206, Oct. 2005.
[80] “2011 International Technology Roadmap for Semiconductors, Modeling and Simulation,” http://www.itrs.net/Links/20011ITRS/2011Chapters/2011Modeling.pdf.
[81] Sentaurus TCAD manuals. http://www.synopsys.com.
[82] Fabless foundry model under stress: 20nm challenges. http://semimd.com/blog/2012/06/26/fabless-foundry-model-under-stress/.
[83] Sentaurus Device manuals. http://www.synopsys.com.
[84] UF-SOI user guide. http://www.soi.tec.ufl.edu/.
[85] Y. Li, C.-H. Hwang, and H.-W. Cheng, “Process-variation and random-dopants-induced threshold voltage fluctuations in nanoscale planar MOSFET and bulk FinFET devices,” Microelectronic Engineering, vol. 86, no. 3, pp. 277–282, Mar. 2009.
[86] A. Thean et al., “Performance and variability comparisons between multi-gate FETs and planar SOI transistors,” in Proc. Int. Electron Devices Mtg., Dec. 2006, pp. 1–4.
[87] A. N. Bhoj and N. K. Jha, “Design of logic gates and flip-flops in high-performance FinFET technology,” accepted for publication in IEEE Trans. VLSI Systems.
[88] H. Dadgour, K. Endo, V. De, and K. Banerjee, “Modeling and analysis of grain-orientation effects in emerging metal-gate devices and implications for SRAM reliability,” in Proc. Int. Electron Devices Mtg., 2008, pp. 705–708.
[89] ——, “Grain-orientation induced work function variation in nanoscale metal-gate transistors - part I: modeling, analysis and experimental validation,” IEEE Trans. Electron Devices, vol. 57, no. 10, pp. 2504–2513, Oct. 2010.
[90] ——, “Grain-orientation induced work function variation in nanoscale metal-gate transistors - part II: implications for process, device and circuit design,” IEEE Trans. Electron Devices, vol. 57, no. 10, pp. 2515–2525, Oct. 2010.
[91] S. Rasouli, K. Endo, J. Chen, N. Singh, and K. Banerjee, “Grain-orientation induced quantum confinement variation in FinFETs and multi-gate ultra-thin body CMOS devices and implications for digital design,” IEEE Trans. Electron Devices, vol. 58, no. 8, pp. 2282–2292, Aug. 2011.
[92] K. von Arnim et al., “A low power multi-gate FET CMOS technology with 13.9ps inverter delay,” in Proc. Int. Symp. VLSI Technology, 2007, pp. 106–107.
[93] C. Pacha et al., “Efficiency of low-power design techniques in multi-gate FET CMOS circuits,” in Proc. European Conf. Solid-State Circuits, 2007, pp. 111–114.
[94] A. Muttreja, N. Agarwal, and N. K. Jha, “CMOS logic design with independent gate FinFETs,” in Proc. Int. Conf. Computer Design, Oct. 2007, pp. 560–567.
[95] M. Rostami and K. Mohanram, “Dual-Vth independent-gate FinFETs for low power logic circuits,” IEEE Trans. Computer-Aided Design, vol. 30, no. 3, pp. 337–349, 2011.
[96] A. Datta, A. Goel, R. T. Cakici, H. Mahmoodi, D. Lakshmanan, and K. Roy, “Modeling and circuit synthesis for independently controlled double gate FinFET devices,” IEEE Trans. Computer-Aided Design, vol. 26, no. 11, pp. 1957–1966, Nov. 2007.
[97] M.-H. Chiang, K. Kim, C. Tretz, and C.-T. Chuang, “Novel high-density low-power logic circuit techniques using DG devices,” IEEE Trans. Electron Devices, vol. 52, no. 10, pp. 2339–2342, Oct. 2005.
[98] A. Kumar, B. A. Minch, and S. Tiwari, “Low voltage and performance tunable CMOS circuit design using independently driven double gate MOSFETs,” in Proc. Int. SOI Conf., Oct. 2004.
[99] S. A. Tawfik and V. Kursun, “Low-power and compact sequential circuits with independent-gate FinFETs,” IEEE Trans. Electron Devices, vol. 55, pp. 60–70, Jan. 2008.
[100] S. Tawfik and V. Kursun, “Characterization of new static independent-gate-biased FinFET latches and flip-flops under process variations,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2008, pp. 311–316.
[101] S. Xiong and J. Bokor, “Sensitivity of double-gate and FinFET devices to process variations,” IEEE Trans. Electron Devices, vol. 50, pp. 2255–2261, Nov. 2003.
[102] J. Kedzierski et al., “High-performance symmetric-gate and CMOS-compatible Vt asymmetric-gate FinFET devices,” in Proc. Int. Electron Devices Mtg., 2001, pp. 19.5.1–19.5.4.
[103] L. Mathew et al., “FinFET with isolated n+ and p+ gate regions strapped with metal and polysilicon,” in Proc. Int. SOI Conf., Nov. 2003, pp. 109–110.
[104] A. N. Bhoj and N. K. Jha, “Gated-diode FinFET DRAMs: Device and circuit design considerations,” ACM J. Emerging Technologies in Computing Systems, vol. 6, no. 4, pp. 12:1–12:32, 2010.
[105] D. Ha, H. Takeuchi, Y.-K. Choi, and T.-J. King, “Molybdenum gate technology for ultrathin-body MOSFETs and FinFETs,” IEEE Trans. Electron Devices, vol. 51, no. 12, pp. 1989–1996, Dec. 2004.
[106] A. Singhee and R. A. Rutenbar, “From finance to flip flops: A study of fast quasi-Monte Carlo methods from computational finance applied to statistical circuit analysis,” in Proc. Int. Symp. Quality of Electronic Design, Mar. 2007, pp. 685–692.
[107] I. M. Sobol, “The distribution of points in a cube and the approximate evaluation of integrals,” in USSR Comp. Math and Math. Phys., 1967, pp. 86–112.
[108] K. Anil, K. Henson, S. Biesemans, and N. Collaert, “Layout density analysis of FinFETs,” in Proc. European Solid-State Device Research Conf., Sept. 2003, pp. 139–142.
[109] M. Alioto, “Comparative evaluation of layout density in 3T, 4T, and MT FinFET standard cells,” IEEE Trans. VLSI Systems, vol. 19, no. 5, pp. 751–762, May 2011.
[110] R. L. Wadsack, “Fault modeling and logic simulation of CMOS and MOS integrated circuits,” Bell System Technical J., vol. 57, pp. 1449–1474, May 1978.
[111] T. Storey and W. Maly, “CMOS bridging fault detection,” in Proc. Int. Test Conf., Sept. 1990, pp. 842–851.
[112] E. J. McCluskey and C.-W. Tseng, “Stuck-fault tests vs. actual defects,” in Proc. Int. Test Conf., Oct. 2000, pp. 336–342.
[113] A. Pramanick and S. Reddy, “On the detection of delay faults,” in Proc. Int. Test Conf., Sept. 1988, pp. 845–856.
[114] J. Li, T. Chao-Wen, and E. J. McCluskey, “Testing for resistive opens and stuck opens,” in Proc. Int. Test Conf., Nov. 2001, pp. 1049–1058.
[115] F. J. Ferguson and J. P. Shen, “A CMOS fault extractor for inductive fault analysis,” IEEE Trans. Computer-Aided Design, vol. 7, no. 11, pp. 1181–1194, Nov. 1988.
[116] R. Rajsuman, “IDDQ testing for CMOS VLSI,” Proc. IEEE, vol. 88, no. 4, pp. 544–568, Apr. 2000.
[117] C.-L. Hsu, M.-H. Ho, and C.-F. Lin, “Novel built-in current-sensor-based testing scheme for CMOS integrated circuits,” IEEE Trans. Instrumentation and Measurement, vol. 58, no. 7, pp. 2196–2208, July 2009.
[118] J. Vazquez, V. Champac, C. Hawkins, and J. Segura, “Stuck-open fault leakage and testing in nanometer technologies,” in Proc. VLSI Test Symp., May 2009, pp. 315–320.
[119] M. O. Simsir, A. N. Bhoj, and N. K. Jha, “Fault modeling for FinFET circuits,” in Proc. Int. Symp. Nanoscale Architectures, June 2010, pp. 41–46.
[120] A. N. Bhoj, M. O. Simsir, and N. K. Jha, “Fault models for logic circuits in the multigate era,” IEEE Trans. Nanotechnology, vol. 11, no. 1, pp. 182–193, Jan. 2012.
[121] E. MacDonald and N. A. Touba, “Delay testing of SOI circuits: Challenges with the history effect,” in Proc. Int. Test Conf., Sept. 1999, pp. 269–275.
[122] A. Zaka et al., “Characterization and 3D TCAD simulation of NOR-type flash non-volatile memories with emphasis on corner effects,” Solid-State Electronics, vol. 63, no. 1, pp. 158–162, Sept. 2011.
[123] W. Wang, S. Chang, J. Huang, and S. Kuang, “3D TCAD simulations of strained Si CMOS devices with silicon-based alloy stressors and stressed CESL,” Solid-State Electronics, vol. 53, no. 8, pp. 880–887, Aug. 2009.
[124] G. Pei, J. Kedzierski, P. Oldiges, M. Ieong, and E. Kan, “FinFET design considerations based on 3D TCAD simulation and analytical modeling,” IEEE Trans. Electron Devices, vol. 49, no. 8, pp. 1411–1419, Aug. 2002.
[125] A. N. Bhoj, R. V. Joshi, and N. K. Jha, “Efficient methodologies for 3D-TCAD modeling of emerging devices and circuits,” accepted for publication in IEEE Trans. Computer-Aided Design.
[126] Z. Essa, P. Boulenc, C. Tavernier, F. Hirigoyen, A. Crocherie, J. Michelot, and D. Rideau, “3D TCAD simulation of advanced CMOS image sensors,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2011, pp. 187–190.
[127] M. Nawaz, W. Molzer, S. Decker, L. Giles, and T. Schulz, “On the device design assessment of multigate FETs (MuGFETs) using full process and device simulation in 3D TCAD,” IEEE Trans. Electron Devices, vol. 38, no. 12, pp. 1238–1251, Dec. 2007.
[128] L. Sponton, L. Bomholt, and W. Fichtner, “Analysis of process-geometry modulations through 3D TCAD,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2007, pp. 385–388.
[129] P. Fleischmann, R. Sabelka, A. Stach, R. Strasser, and S. Selberherr, “Grid generation for three-dimensional process and device simulation,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 1996, pp. 161–166.
[130] W. Wessner, J. Cervenka, C. Heitzinger, A. Hossinger, and S. Selberherr, “Anisotropic mesh refinement for the simulation of three-dimensional semiconductor manufacturing processes,” IEEE Trans. Computer-Aided Design, vol. 25, no. 10, pp. 2129–2139, Oct. 2006.
[131] K. K. H. Toh, A. R. Neureuther, and E. W. Scheckler, “Algorithms for simulation of three-dimensional etching,” IEEE Trans. Computer-Aided Design, vol. 13, no. 5, pp. 616–624, May 1994.
[132] Z. F. Zhou, Q. A. Huang, W. H. Li, and W. Lu, “A novel 3-D dynamic cellular automata model for photoresist-etching process simulation,” IEEE Trans. Computer-Aided Design, vol. 26, no. 1, pp. 100–114, Jan. 2007.
[133] A. N. Bhoj and R. V. Joshi, “Transport analysis based 3D TCAD capacitance extraction for sub-32nm SRAM structures,” IEEE Electron Device Lett., pp. 158–160, Feb. 2012.
[134] Sentaurus Structure Editor manuals. http://www.synopsys.com.
[135] Cadence SKILL scripting language. http://www.cadence.com.
[136] Sentaurus TCAD tool suite. http://www.synopsys.com.
[137] Sentaurus Process manuals. http://www.synopsys.com.
[138] Sentaurus TCAD Application examples and notes. http://solvnet.synopsys.com.
[139] K. Nabors, S. Kim, J. White, and S. Senturia, “Fast capacitance extraction of general three-dimensional structures,” IEEE Trans. Microwave Theory and Techniques, vol. 40, no. 7, pp. 1496–1506, July 1992.
[140] T. Lu, Z. Wang, and W. Yu, “Hierarchical block boundary-element method (HBBEM): A fast field solver for 3-D capacitance extraction,” IEEE Trans. Microwave Theory and Techniques, vol. 52, no. 1, pp. 10–19, Jan. 2004.
[141] W. Yu, Z. Wang, and J. Gu, “Fast capacitance extraction of actual 3-D VLSI interconnects using quasi-multiple medium accelerated BEM,” IEEE Trans. Microwave Theory and Techniques, vol. 51, no. 1, pp. 109–119, Jan. 2003.
[142] A. Chin et al., “RF passive devices on Si with excellent performance close to ideal devices designed by electro-magnetic simulation,” in Proc. Int. Electron Devices Mtg., Dec. 2003, pp. 375–378.
[143] G. Wang, X. Qi, Z. Yu, and R. Dutton, “Device level modeling of metal-insulator-semiconductor interconnects,” IEEE Trans. Electron Devices, vol. 48, no. 8, pp. 1672–1682, Aug. 2001.
[144] S. Laux, “Techniques for small-signal analysis of semiconductor devices,” IEEE Trans. Electron Devices, vol. 32, no. 10, pp. 2028–2037, Oct. 1985.
[145] A. N. Bhoj, R. V. Joshi, S. Polonsky, R. Kanj, S. Saroop, Y. Tan, and N. K. Jha, “Hardware-assisted 3D TCAD for predictive capacitance extraction in 32nm SOI SRAMs,” in Proc. Int. Electron Devices Mtg., Dec. 2011, pp. 34.7.1–34.7.4.
[146] A. N. Bhoj, R. V. Joshi, and N. K. Jha, “3D-TCAD based parasitic capacitance extraction for emerging multigate devices and circuits,” accepted for publication in IEEE Trans. VLSI Systems.
[147] C. Wang et al., “FinFET resistance mitigation through design and process optimization,” in Proc. Int. Symp. VLSI Technology, Apr. 2009, pp. 127–128.
[148] C. H. Lin et al., “Non-planar device architecture for 15nm node: FinFET or Trigate?” in Proc. Int. SOI Conf., Oct. 2010, pp. 1–2.
[149] A. Kaneko et al., “Sidewall transfer process and selective gate sidewall spacer formation technology for sub-15nm FinFET with elevated source/drain extension,” in Proc. Int. Electron Devices Mtg., Dec. 2005, pp. 844–847.
[150] T. Kanemura et al., “Improvement of drive current in bulk-FinFET using full 3D process/device simulations,” in Proc. Int. Conf. Simulation of Semiconductor Processes and Devices, Sept. 2006, pp. 131–134.
[151] K. Maitra et al., “Aggressively scaled strained-silicon-on-insulator undoped-body high-κ/metal-gate nFinFETs for high-performance logic applications,” IEEE Electron Device Lett., pp. 713–715, June 2011.
[152] H. Kawasaki et al., “Challenges and solutions of FinFET integration in an SRAM cell and a logic circuit for 22nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2009, pp. 1–4.
[153] T. Yamashita et al., “Analysis of parasitic resistance in double gate FinFETs with different fin lengths,” in Proc. Int. SOI Conf., Oct. 2011, pp. 1–2.
[154] H. Zhao, Y. Yeo, S. Rustagi, and G. Samudra, “Analysis of the effects of fringing electric field on FinFET device performance and structural optimization using 3-D simulation,” IEEE Trans. Electron Devices, vol. 55, no. 5, pp. 1177–1184, May 2008.
[155] W. Wu and M. Chan, “Analysis of geometry-dependent parasitics in multifin double-gate FinFETs,” IEEE Trans. Electron Devices, vol. 54, no. 4, pp. 692–698, Apr. 2007.
[156] M. Guillorn et al., “FinFET performance advantage at 22nm: An AC perspective,” in Proc. Int. Symp. VLSI Technology, June 2008, pp. 12–13.
[157] H. Kawasaki et al., “Embedded bulk FinFET SRAM cell technology with planar FET peripheral circuit for hp32 nm node and beyond,” in Proc. Int. Symp. VLSI Technology, June 2006, pp. 70–71.
[158] ——, “Demonstration of highly scaled FinFET SRAM cells with high-k/metal gate and investigation of characteristic variability for the 32nm node and beyond,” in Proc. Int. Electron Devices Mtg., Dec. 2008, pp. 1–4.
[159] V. Basker et al., “A 0.063 µm2 FinFET SRAM cell demonstration with conven-tional lithography using a novel integration scheme with aggressively scaledfin and gate pitch,” in Proc. Int. Symp. VLSI Technology, June 2010, pp. 19–20.
[160] M. Guillorn et al., “A 0.021 µm2 trigate SRAM cell with aggressively scaledgate and contact pitch,” in Proc. Int. Symp. VLSI Technology, June 2011, pp.64–65.
[161] C. H. Lin et al., “Modeling of width-quantization-induced variations in logicFinFETs for 22nm and beyond,” in Proc. Int. Symp. VLSI Technology, June2011, pp. 16–17.
[162] T. Yamashita et al., “Sub-25nm FinFET with advanced fin formation andshort channel effect engineering,” in Proc. Int. Symp. VLSI Technology, June2011, pp. 14–15.
[163] P. Oldiges et al., “Critical analysis of 14nm device options,” in Proc. Int. Conf.Simulation of Semiconductor Processes and Devices, Sept. 2011, pp. 5–8.
[164] J. B. Chang et al., “Scaling of SOI FinFETs down to fin width of 4nm for the10nm technology node,” in Proc. Int. Symp. VLSI Technology, June 2011, pp.12–13.
[165] C. Wann et al., “SRAM cell design for stability methodology,” in Proc. Int.Symp. VLSI Technology, Aug. 2005, pp. 21–22.
[166] Y. Taur and H. Ning, Fundamentals of Modern VLSI Devices. Cambridge,U.K.: Cambridge Univ. Press, 1998.
[167] A. N. Bhoj and N. K. Jha, “Parasitics-aware design of symmetric and asym-metric gate-workfunction FinFET SRAMs,” under review.
[168] S. Gangwal, S. Mukopadhyay, and K. Roy, “Optimization for surface orien-tation for high-performance, low-power and robust FinFET SRAM,” in Proc.Custom Integrated Circuits Conf., Sept. 2006, pp. 433–436.
[169] S. A. Tawfik and V. Kursun, “Low power and stable FinFET SRAM with static independent gate bias for enhanced integration density,” in Proc. Int. Conf. Electronics, Circuits, and Systems, Dec. 2007, pp. 443–446.
[170] K. Endo, S.-I. Ouchi, Y. Ishikawa, Y. Liu, T. Matsukawa, K. Sakamoto, M. Masahara, J. Tsukada, K. Ishii, H. Yamauchi, and E. Suzuki, “Independent-gate four-terminal FinFET SRAM for drastic leakage current reduction,” in Proc. Int. Conf. Integrated Circuit Design and Technology and Tutorial, June 2008, pp. 63–66.
[171] S. A. Tawfik and V. Kursun, “Portfolio of FinFET memories: Innovative techniques for an emerging technology,” in Proc. Int. SOC Design Conf., Nov. 2008, pp. 101–104.
[172] A. Singhee and R. Rutenbar, Extreme Statistics in Nanoscale Memory Design. New York: Springer, 2010.
[173] E. Grossar, M. Stucchi, K. Maex, and W. Dehaene, “Read stability and write-ability analysis of SRAM cells for nanometer technologies,” IEEE J. Solid-State Circuits, vol. 41, no. 11, pp. 2577–2588, Nov. 2006.
[174] O. Schenk, K. Gärtner, W. Fichtner, and A. Stricker, “PARDISO: A high-performance serial and parallel sparse linear solver in semiconductor device simulation,” Future Generation Computer Systems, vol. 18, no. 1, pp. 69–78, Jan. 2001.
[175] M. Joshi, G. Karypis, A. Gupta, and F. Gustavson, “PSPASES: Scalable parallel direct solver for sparse systems,” in Proc. SIAM Conf. Parallel Processing for Scientific Computing, 1999.
[176] Y. Saad, Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.
[177] R. E. Bank and D. J. Rose, “Global approximate Newton methods,” Numerische Mathematik, vol. 37, no. 2, pp. 279–295, 1981.
[178] P. Deuflhard, “Global inexact Newton methods for very large scale nonlinear problems,” IMPACT of Computing in Science and Engineering, vol. 3, no. 4, pp. 366–393, Dec. 1991.
[179] R. E. Bank and D. J. Rose, “Parameter selection for Newton-like methods applicable to nonlinear partial differential equations,” SIAM J. Numerical Analysis, vol. 17, no. 6, pp. 806–822, Dec. 1980.