
ISSCC 2009 / February 9, 2009 / 10:35 AM

DIGEST OF TECHNICAL PAPERS, 2009 IEEE International Solid-State Circuits Conference / SESSION 1 / PLENARY / 1.3

1.3 The New Era of Scaling in an SoC World

Mark Bohr, Senior Fellow, Intel, Hillsboro, OR

1. Introduction

MOSFET scaling has served our industry well for more than three decades by providing significant improvements in transistor performance, power and cost-per-transistor. Along this path many barriers to continued scaling have been perceived as insurmountable and the end of scaling was often predicted. But perceived barriers are meant to be surmounted, circumvented or tunneled through, and the combined ingenuity of our industry has pushed transistor technology and microprocessor design well beyond what anyone thought possible a few decades ago. The scaling path that we have been on has not been a straight evolutionary path; it has taken some unexpected turns in direction. Our challenge in this new era of scaling is to recognize the coming revolutionary changes and opportunities and to prepare to utilize them.

2. Transistor Scaling

Classical MOSFET scaling was first described by Dennard in 1974 [1]. The combination of Moore’s Law and Dennard’s scaling methodology has provided our industry with many generations of smaller, faster transistors and higher-performance microprocessors (Figure 1.3.1). Classical MOSFET scaling techniques were followed successfully until around the 90nm generation, when gate-oxide scaling started to slow down due to increased gate leakage. The limitation posed by gate leakage became so severe that there was essentially no gate-oxide thickness scaling from the 90nm to the 65nm generation, and many companies converged on a SiO2 thickness close to 1.2nm for their high-performance logic process. When gate-oxide thickness can no longer be scaled, other key MOSFET parameters, such as supply voltage, cannot be scaled while still delivering improved transistor performance. Without new inventions, MOSFET scaling and Moore’s Law were threatened with the likelihood of coming to an end.
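The classical constant-field scaling rules described in [1] can be summarized numerically. The sketch below is illustrative only: the function name `dennard_factors` and the dictionary keys are hypothetical, and the factors are the textbook idealization of Dennard scaling, not data taken from this paper.

```python
def dennard_factors(kappa: float) -> dict:
    """Idealized constant-field scaling factors for a scaling factor kappa > 1.

    Each value is the multiplier applied to that quantity when device
    dimensions shrink by 1/kappa (textbook idealization, assumed here).
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return {
        "dimensions": 1 / kappa,         # gate length, width, oxide thickness
        "supply_voltage": 1 / kappa,     # keeps the electric field constant
        "doping": kappa,                 # substrate doping increases
        "capacitance": 1 / kappa,        # per-device gate capacitance
        "gate_delay": 1 / kappa,         # circuits get faster
        "power_per_circuit": 1 / kappa**2,
        "power_density": 1.0,            # power per unit area stays constant
    }


if __name__ == "__main__":
    # One generation of ~0.7x linear scaling corresponds to kappa ~ 1.43.
    for name, factor in dennard_factors(1 / 0.7).items():
        print(f"{name:>18}: x{factor:.2f}")
```

The point of the sketch is the dependency the text describes: if oxide thickness (a "dimension") stops scaling, the supply voltage cannot follow its 1/kappa rule either, and the constant-power-density result no longer holds.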

One of the first significant transistor innovations in the past decade was the introduction in 2003 of strained-silicon technology to enhance transistor performance for Intel’s 90nm microprocessors [2, 3]. The 65nm generation introduced in 2005 further improved these strain techniques to increase transistor performance, even though gate-oxide thickness stayed at roughly the same 1.2nm value to avoid increased leakage current [4]. Strained silicon is an example of a revolutionary technology that provided improved performance without following classical MOSFET scaling methods.

Although strained silicon provided valuable performance enhancements for the 90nm and 65nm generations, we could not ignore the need to scale gate-oxide thickness, and the need to reduce gate-oxide leakage, on future technologies. Intel’s 45nm logic technology was the first to introduce high-κ dielectric with metal-gate transistors for improved performance and reduced leakage [5, 6]. A hafnium-based dielectric replaced SiO2 to provide a gate oxide that was physically thicker, which reduced leakage, but had a thinner electrical equivalence, which improved transistor performance. Special metal-gate materials replaced polysilicon to help further reduce the electrical-oxide thickness to a value of 1.0nm. In addition to the performance and leakage benefits of the 45nm high-κ/metal-gate (HK+MG) transistors, the diminished electrical thickness also helped to reduce transistor Vt variability, an important factor in the ability of circuits to use scaled devices [7]. These 45nm HK+MG transistors provided an average increase of 30% in drive current compared to the previous 65nm generation (Figure 1.3.2). Alternatively, these transistors can provide more than a 5× reduction in subthreshold (Ioff) leakage. Gate-oxide leakage is reduced >25× for NMOS and >1000× for PMOS, and SRAM bitcell leakage is reduced >10×.
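The idea of a gate oxide that is physically thicker yet electrically thinner follows from the standard equivalent-oxide-thickness relation, EOT = t_phys × (κ_SiO2 / κ_high-κ). The sketch below is a minimal illustration under stated assumptions: the dielectric constant of 20 for the hafnium-based film is a hypothetical round number (the paper does not give one), the function names are invented for this example, and any interfacial layer is ignored.

```python
K_SIO2 = 3.9  # relative permittivity of SiO2 (standard textbook value)


def eot_nm(t_phys_nm: float, k_dielectric: float) -> float:
    """Equivalent SiO2 thickness of a dielectric film of given thickness and k."""
    return t_phys_nm * K_SIO2 / k_dielectric


def physical_thickness_nm(target_eot_nm: float, k_dielectric: float) -> float:
    """Physical film thickness that yields the target equivalent oxide thickness."""
    return target_eot_nm * k_dielectric / K_SIO2


if __name__ == "__main__":
    k_assumed = 20.0  # ASSUMPTION: illustrative value for a hafnium-based high-k film
    t = physical_thickness_nm(1.0, k_assumed)
    print(f"A 1.0nm EOT at k={k_assumed:g} allows a ~{t:.1f}nm physical film")
    print("compared with the ~1.2nm SiO2 used at the 90nm and 65nm generations")
```

Under that assumed κ, the film can be several times thicker than the 1.2nm SiO2 it replaces while presenting a smaller electrical thickness, which is the leakage-versus-performance trade the paragraph describes.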

Fast and dense interconnects are just as important to modern logic products as small and fast transistors. Unlike transistors, interconnects do not get faster as they scale, so our industry has addressed this problem by gradually increasing the number of interconnect layers. This has allowed some layers to use wider/thicker wires for fast signal propagation, while other layers have tight pitch for improved density. New interconnect materials have also been introduced to improve RC delay and current-carrying capacity. Copper replaced aluminum early in this decade as a means to increase conductivity and to improve electromigration resistance. A series of ever-lower-κ dielectrics have been introduced to reduce wire capacitance, which has both signal-speed and active-power-reduction benefits. The implementation of more interconnect layers and improved materials has enabled many generations of scaled interconnects, but interconnects will continue to be a problem that requires a combination of process, design, and architecture solutions to overcome.
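A first-order model shows why interconnects do not get faster as they scale. The sketch below assumes a deliberately simple wire: resistance R = ρL/(W·T) and capacitance dominated by sidewall coupling to two neighbours, C = 2εL·T/S. The function `wire_rc` and its normalized units are hypothetical, chosen only to expose the scaling trend.

```python
def wire_rc(length, width, thickness, spacing, rho=1.0, eps=1.0):
    """First-order RC product of a wire coupled to two neighbours (normalized units)."""
    resistance = rho * length / (width * thickness)
    capacitance = 2 * eps * length * thickness / spacing
    return resistance * capacitance


if __name__ == "__main__":
    s = 0.7  # one generation of linear scaling
    base = wire_rc(1.0, 1.0, 1.0, 1.0)
    # Local wire: its length shrinks along with its cross-section.
    local = wire_rc(s, s, s, s)
    # Global wire: cross-section shrinks but it still spans the same distance.
    global_ = wire_rc(1.0, s, s, s)
    print(f"local wire RC ratio:  {local / base:.2f}")    # unchanged
    print(f"global wire RC ratio: {global_ / base:.2f}")  # grows as 1/s^2
```

In this model a scaled local wire is no faster than before, and a fixed-length wire at minimum pitch gets roughly 2× slower per generation, which is consistent with reserving wider, thicker upper layers for global routing.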

Intel’s 45nm technology uses 9 layers of Cu interconnect, one more than the previous 65nm generation. A hierarchy of interconnect pitches is used in this technology, providing high-density local interconnects on lower layers, and high-speed global interconnects on upper layers. A special 7µm-thick M9 layer was introduced on this generation to provide low-resistance power routing to minimize voltage droops. Intel has been in high-volume production of this 45nm technology since November of 2007 and many products have been introduced for a wide range of performance and power applications, including single-core, dual-core, quad-core, and six-core microprocessors.

Intel’s 32nm logic technology uses second-generation high-κ + metal-gate transistors, fourth-generation strained silicon, nine copper interconnect layers with low-κ dielectrics and minimum pitches that are scaled to ~0.7× those of the 45nm generation [8]. Gate-oxide equivalent oxide thickness (EOT) is scaled to 0.9nm and transistor gate pitch is scaled to 112.5nm. Transistor performance is increased by more than 22% compared to 45nm transistors through a combination of increased drive current and reduced gate capacitance. As shown in Figure 1.3.3, transistor drive currents have continued to increase over the past several generations while scaling gate pitch and maintaining constant subthreshold leakage. The minimum metal layer pitches are 112.5nm and this process uses a hierarchical interconnect pitch system similar to that used in the 45nm generation. A 32nm 291Mbit SRAM test chip with more than 1.9 billion transistors and a cell size of 0.171µm² has been demonstrated in this technology [9].
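The figures quoted for the 32nm SRAM test chip can be cross-checked with simple arithmetic. The sketch below assumes six-transistor (6T) cells and treats 1Mbit as 10^6 bits; both are assumptions made for this illustration, and the variable names are hypothetical.

```python
BITS = 291e6               # 291Mbit array, ASSUMING 1Mbit = 1e6 bits
CELL_AREA_UM2 = 0.171      # cell size quoted in the text
TRANSISTORS_PER_CELL = 6   # ASSUMPTION: conventional 6T SRAM cell
LINEAR_SCALE = 0.7         # ~0.7x pitch scaling per generation

# Area occupied by the bit cells alone (1 mm^2 = 1e6 um^2).
array_area_mm2 = BITS * CELL_AREA_UM2 / 1e6

# Transistors inside the bit cells; the rest of the quoted total would be periphery.
cell_transistors = BITS * TRANSISTORS_PER_CELL

# A 0.7x linear shrink roughly halves the area of a scaled layout.
area_scale = LINEAR_SCALE ** 2

if __name__ == "__main__":
    print(f"bit-cell area:        {array_area_mm2:.1f} mm^2")
    print(f"bit-cell transistors: {cell_transistors / 1e9:.2f} billion")
    print(f"area scale factor:    {area_scale:.2f}")
```

Under these assumptions the cells alone account for about 1.75 billion transistors, which is consistent with a chip total of more than 1.9 billion once decoders, sense amplifiers and other periphery are included.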

The classical method of MOSFET scaling has served us well for more than 30 years. Unfortunately, the standard MOSFET transistor structure using SiO2 gate oxide and polysilicon gate electrode is no longer scalable. The new era of scaling is one where material and structure innovation are just as important as dimensional scaling.

3. Microprocessor Evolution

Just as transistors have evolved in more ways than simply having scaled dimensions, so have microprocessors evolved. The first microprocessor (Intel 4004) was introduced in 1971 and processed 4 bits of data per cycle. It contained 2300 transistors and operated at 100kHz. From that starting point microprocessor performance has increased rapidly following predictions derived from Moore’s Law. Technology scaling has enabled ever more complex designs to be built. Integrating multiple pipelined execution engines in microprocessors enabled micro-architectural innovations to exploit program-instruction parallelism and allow multiple instructions to be executed in each clock cycle. Speculation has been used to fetch and execute instructions based on predicting the program control flow. Out-of-order issue and register renaming techniques have been employed to remove stalls when instructions are not ready for execution. Memory-address width has increased to 64 bits, enabling access to large memories, while integrated caches have made the effective memory access faster. Multi-threading has been introduced to allow instructions from independent program threads to efficiently exploit the microprocessor’s parallel execution engines and execute simultaneously. Today’s microprocessors now integrate multiple processing cores onto a single chip.

Along with enabling micro-architecture advancements, technology scaling has produced faster transistors. Increasing clock frequency has been an important contributor to microprocessor performance gains. Clock systems have evolved from a simple buffer tree to global clock grids driven by a hierarchy of distributed buffers. PLLs for clock generation have been integrated on-chip to generate multiple clock-frequency domains. Active de-skewing schemes using delay lines have been used to minimize the skew across the die [10]. The combination of more complex micro-architectures and higher operating frequencies has led to power consumption becoming a primary design constraint. Clock systems have evolved to implement extensive clock gating to disable idle circuitry and save power. They also incorporate extensive tuning and debugging circuits, an example of which is the locate critical path (LCP) mode, which is based on programmable delay drivers activated in the local clock drivers and controlled by configuration registers that allow arrival times of local clocks to be adjusted post-silicon [11].

An example of the latest generation microprocessor is the 45nm Intel® Core™ i7 Processor, formerly known as Nehalem (Figure 1.3.4). This is a complex system on a chip with multiple functional units and multiple interfaces, including four cores, 8MB L2 cache, integrated memory controller, DDR3 I/O and QPI I/O [12]. There are 11 PLL circuits, 23 master DLL circuits and 5 digital thermal sensors located around the chip providing multiple clocking domains and local control. NMOS sleep transistors are used in the cache to shut off leakage in inactive sub-blocks. They provide a 5 to 10× cache leakage reduction during retention/standby [13]. Power-gate transistors are integrated on the chip to shut off both active and leakage power on cores that are idle (Figure 1.3.5). These on-die power gates are enabled by a 45nm process which uses a 7µm-thick top metal layer that provides very-low power-distribution impedance combined with ultra-low-leakage transistors with low on-resistance. Nehalem introduces a turbo mode where core frequency can be boosted in response to workload demand, thus dynamically delivering optimal performance and energy efficiency. An integrated power control unit incorporates real-time sensors for current/power, voltage and temperature and individually controls settings for each core (Figure 1.3.6). An adaptive frequency system adjusts clock frequency upwards during voltage supply spikes and reduces frequency during voltage supply droops, thus allowing the cores to operate at optimal performance and power without excessive guardbanding or adding excessive decoupling capacitance [14].
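The interaction between power gating and turbo mode can be illustrated with a toy model. This is not Nehalem's control algorithm: every name and number below (`Core`, `core_power_w`, `turbo_frequency_ghz`, the capacitance, leakage, voltage, frequency step and power budget) is hypothetical, and a gated core is idealized as drawing zero power. The only physics used is the standard dynamic-power relation P = C·V²·f plus a fixed leakage term.

```python
from dataclasses import dataclass


@dataclass
class Core:
    active: bool         # currently running a workload
    gated: bool = False  # power gate open: idealized as zero active and leakage power


def core_power_w(core, freq_ghz, volts, c_eff_nf=5.0, leak_w=3.0):
    """Toy per-core power: C*V^2*f dynamic power plus constant leakage (all values assumed)."""
    if core.gated:
        return 0.0
    dynamic = c_eff_nf * volts**2 * freq_ghz if core.active else 0.0  # nF * V^2 * GHz = W
    return dynamic + leak_w


def turbo_frequency_ghz(cores, budget_w, base_ghz=2.66, step_ghz=0.133,
                        max_steps=4, volts=1.1):
    """Raise frequency in fixed steps while the estimated chip power fits the budget."""
    freq = base_ghz
    for _ in range(max_steps):
        trial = freq + step_ghz
        if sum(core_power_w(c, trial, volts) for c in cores) > budget_w:
            break
        freq = trial
    return freq


if __name__ == "__main__":
    all_busy = [Core(active=True) for _ in range(4)]
    one_busy = [Core(active=True)] + [Core(active=False, gated=True) for _ in range(3)]
    print("4 active cores:", round(turbo_frequency_ghz(all_busy, budget_w=80.0), 3), "GHz")
    print("1 active core: ", round(turbo_frequency_ghz(one_busy, budget_w=80.0), 3), "GHz")
```

In this toy model, gating idle cores frees power headroom that the remaining active core can spend on a higher clock frequency, which is the qualitative behaviour the paragraph attributes to the combination of on-die power gates and an integrated power control unit.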

On-chip SRAM cache is a vital component of any microprocessor. Transistor dimensions and operating voltage need to scale to meet product performance and power requirements, but these scale factors tend to degrade SRAM cell operating margin. In the past, scaled gate-oxide thickness helped to minimize Vt variability on scaled transistors. The conversion to high-κ + metal-gate transistors provided this reduced variability benefit and helped to maintain stable SRAM operating margin. Going forward, circuit design techniques such as using different WL and cell voltages [15] will be needed to continue SRAM cell scaling.

Figure 1.3.7 compares today’s Nehalem PC platform to the Intel386™ processor PC platform from 1985. The 1985 platform used many separate chips, including a math co-processor, SRAM cache, cache controller and DRAM controller that are now all integrated on the single Nehalem chip. Nehalem not only greatly surpasses the Intel386™ processor in the expected parameters such as transistor count and clock frequency, but Nehalem also has dramatically higher I/O bandwidth and includes many sophisticated adaptive circuits to optimize both performance and power (Figure 1.3.8).

SoC products require a wider range of transistor types than typical high-performance CPU products. Process technologies optimized for low-power SoC products have transistors with a wider range of performance and leakage capabilities extending below 0.1nA/µm Ioff values [16, 17, 18]. A recent example is Intel’s 45nm SoC process [19] which uses the same basic high-κ + metal-gate transistors used for high-performance microprocessors, but adds low-leakage versions of these transistors optimized for very low-power products, and also provides high-voltage high-speed I/O transistors to meet the needs of a variety of I/O circuits (Figure 1.3.9). Several different device elements are added to this SoC process that are not normally included on CPU processes to meet the needs of analog circuits, including precision resistors, precision capacitors, high-Q inductors and varactors. 45nm CMOS transistors have fT and fMAX values of more than 300GHz, which makes them an attractive option for RF and mixed-signal circuits.

The old era of microprocessor scaling used smaller and faster transistors to build larger cores with higher frequency and higher power. The new era of microprocessor scaling makes greater use of energy efficiency, power management, parallelism, adaptive circuits and SoC features to provide products that are many-core, multi-core and multi-function.

4. Vision of the Future

As Moore’s Law continues, we expect to continue doubling transistor density every two years for the next several years. We will evolve from integrating multiple cores on a chip to integrating many cores [20]. Power consumption will continue to be a primary consideration. Microprocessors will integrate several different types of cores and functional units each optimized for a specific application. The processors will be highly adaptive, dynamically optimizing themselves for peak performance, power efficiency and idle power reduction. These terascale machines will target applications such as data mining, visual computing, recognition, and simulation of immersive environments.
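Doubling transistor density every two years is an exponential with a two-year doubling period. As a minimal sketch (the function name `density_multiplier` is hypothetical), the cumulative gain after a given number of years is:

```python
def density_multiplier(years: float, doubling_period_years: float = 2.0) -> float:
    """Relative transistor density after `years`, doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (2, 4, 6, 10):
        print(f"after {years:>2} years: x{density_multiplier(years):g}")
```

For example, five consecutive two-year generations compound to a 32× increase in density, which is why integrating "many" rather than "multiple" cores becomes plausible within a decade if the trend holds.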

From the earliest days of the integrated-circuit industry, continual improvements in lithographic techniques have enabled feature-size scaling and increased transistor counts. Scaling exposure wavelength has been a key part of the strategy for improving lithographic resolution, but as shown in Figure 1.3.10, wavelength has been scaling at a much slower rate than feature size, and today’s minimum features are much smaller than the 193nm wavelength used in exposure tools. Resolution-enhancement techniques, such as optical-proximity correction, phase-shift masks, and immersion lithography have been introduced to bring us to the 32nm generation. But even with these enhancements, layout restrictions, such as unidirectional features, gridded layout and restricted line + space combinations, have had to be gradually adopted. Double-patterning techniques and computational lithography [21] are options being investigated to continue scaling to 22nm and possibly 16nm generations before extreme ultraviolet (EUV) lithography could be ready to provide a significant wavelength reduction and resolution enhancement (Figure 1.3.11).
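The gap between exposure wavelength and feature size is commonly expressed with the Rayleigh resolution relation, half-pitch = k1·λ/NA, which the paper itself does not state. The sketch below uses that relation with assumed numerical apertures (0.93 for dry exposure, 1.35 for water immersion, both typical values rather than figures from the paper) and hypothetical function names.

```python
def half_pitch_nm(k1: float, wavelength_nm: float, numerical_aperture: float) -> float:
    """Rayleigh resolution: smallest printable half-pitch for a given process factor k1."""
    return k1 * wavelength_nm / numerical_aperture


def required_k1(half_pitch: float, wavelength_nm: float, numerical_aperture: float) -> float:
    """Process factor k1 needed to print a given half-pitch."""
    return half_pitch * numerical_aperture / wavelength_nm


if __name__ == "__main__":
    wavelength = 193.0            # nm, exposure wavelength quoted in the text
    na_dry, na_wet = 0.93, 1.35   # ASSUMPTION: typical dry and water-immersion NA values
    k1_limit = 0.25               # theoretical single-exposure limit for dense lines
    print(f"dry limit:       {half_pitch_nm(k1_limit, wavelength, na_dry):.1f}nm half-pitch")
    print(f"immersion limit: {half_pitch_nm(k1_limit, wavelength, na_wet):.1f}nm half-pitch")
    # The 112.5nm minimum pitch quoted for the 32nm generation is a 56.25nm half-pitch.
    print(f"k1 for 56.25nm half-pitch with immersion: {required_k1(56.25, wavelength, na_wet):.2f}")
```

As k1 approaches the 0.25 single-exposure limit, only highly regular patterns print reliably, which is one way to read the layout restrictions listed above and the interest in double patterning, which splits a dense pattern into two sparser exposures.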

Strained silicon, high-κ dielectrics and metal gates have been significant innovations that have allowed MOSFET density, performance and energy efficiency to show continued improvements past when traditional scaling techniques ran out of steam. These are not the last of innovative transistor options, and several other promising ideas are being explored by various research groups. Substrate engineering makes use of (110) wafers to improve p-channel mobility, but may not offer any advantage for n-channel devices [22, 23]. Multi-gate transistors such as FinFET, Tri-Gate and Gate-All-Around devices offer improved electrostatics and steeper subthreshold slopes, but may suffer from higher parasitic capacitance and parasitic resistance [24, 25, 26] (Figure 1.3.12). III-V channel materials such as InSb [27], InGaAs [28] and InAs [29] are showing promise for providing high switching speed at low operating voltage due to increased carrier mobility (Figure 1.3.13), but challenges remain before a practical CMOS solution will be ready.

3D chip stacking combined with through-silicon vias provides a high density of chip-to-chip interconnects along with the small form factor that is needed for mobile and handheld systems [30, 31] (Figure 1.3.14). The high density of chip-to-chip interconnects provided with this approach can help to improve bandwidth between CPU and cache memory. 3D stacking is also a way to integrate together chips of dissimilar process technology that may be impractical to implement on a single piece of silicon. The downsides to 3D chip stacking in this manner include the added process cost of through-silicon vias, the silicon area lost on the chip that has vias cut through it, and the challenges of delivering power and removing heat from the stack.

High-performance computing is increasingly limited by bandwidth between the CPU and main memory. Optical interconnects can address this bandwidth bottleneck if technologies can be developed that cost-effectively integrate photonics with silicon logic (Figure 1.3.15). Ideas are emerging that combine optical interconnect transceivers with silicon chips to provide high-bandwidth chip-to-chip interconnects [32]. Using optical interconnects for on-chip signaling may be further off in the future due to the difficulties with scaling optical transceivers and interconnects to the dimensions required.

978-1-4244-3457-2/09/$25.00 ©2009 IEEE


High-density memory beyond what SRAM can provide is needed for both low-power and high-performance product applications. In addition to traditional DRAM, eDRAM and Flash memory cell options, floating body cell [33, 34], phase-change memory [35] and seek-and-scan probe memory options all provide greater bit density than what 6T SRAM cells can provide. But integrating a novel memory process together with a logic process on a single wafer without compromising one or the other could be difficult.

If there is one lesson we should have learned over the past decades of scaling, it is that it is not sufficient to take smaller transistors as they become available to simply make more complex versions of the same system components. The real advantage comes when you begin to combine different components together in an integrated form, as we did during the evolution from the Intel386™ CPU PC platform to the Nehalem PC platform. This trend will continue in the future as more system components are integrated onto single chips or into single packages to reap benefits of increased performance, lower power and smaller form factors. System integration will use a combination of traditional 2D (SoC) integration and 3D chip stacking techniques. With either approach the real challenge we face is learning how to integrate a wider range of heterogeneous elements (Figure 1.3.16).

As we ponder the best paths to take in doing higher-level integration in the electronics world, we may consider examples provided by nature as organic systems evolved to higher lifeforms (Figure 1.3.17). Reptiles have an integrated system of computing, sensors and motion feedback that surpasses the most advanced autonomous vehicles. As we explore the neuron and the workings of the human brain, we observe some remarkable differences and similarities between organic and electronic systems [36]. Electronic systems hold a clear advantage in transistor and interconnect speed, by many orders of magnitude, but the human brain is amazingly power efficient for what it does, partly due to the low power of neuron activity and also due to the massively parallel nature of the brain (Figure 1.3.18). Future computing devices will have powerful computing capabilities that are coupled with sensors to make them aware of the user and environment. The combination will provide computers with the potential of recognition and understanding of movement, emotion, and speech.

5. Conclusion

The time has passed when traditional MOSFET scaling techniques were adequate to meet the needs of microprocessor products, but that has not meant the end of Moore’s Law nor the end of improvements in microprocessor performance and power. In the new era of device scaling, innovations in materials and device structure are just as important as dimensional scaling. The past trend of using smaller transistors to build larger microprocessor cores operating at higher frequency and consuming more power is also at an end. The new era of microprocessor scaling is a system-on-a-chip approach that combines a diverse set of components using adaptive circuits, integrated sensors, sophisticated power-management techniques, and increased parallelism to build products that are many-core, multi-core, and multi-function. Although many promising technologies and device options are in the research pipeline, we need to recognize that we are doing system integration, and the future challenge we face is learning how to integrate an ever wider range of heterogeneous elements.

Acknowledgments:
Sincere thanks to Bill Bowhill, Doris Burrill, Nasser Kurd, Ian Young and Kevin Zhang who contributed to preparing this paper.

References:
[1] R. Dennard, F. Gaensslen, H. Yu, V. Rideout, A. LeBlanc, “Design of Ion-Implanted MOSFET’s with Very Small Physical Dimensions,” IEEE J. Solid-State Circuits, vol. 9, no. 5, pp. 256-268, Oct. 1974.
[2] T. Ghani, et al., “A 90nm High Volume Manufacturing Logic Technology Featuring Novel 45nm gate Length Strained Silicon CMOS Transistors,” IEDM Tech. Dig., pp. 978-980, Dec. 2003.
[3] S. Thompson, et al., “A 90nm Logic Technology Featuring Strained-Silicon,” IEEE Trans. Electron Devices, vol. 51, no. 11, pp. 1790-1797, Nov. 2004.
[4] P. Bai, et al., “A 65nm Logic Technology Featuring 35nm Gate Lengths, Enhanced Channel Strain, 8 Cu Interconnect Layers, Low-κ ILD and 0.57µm² SRAM Cell,” IEDM Tech. Dig., pp. 657-660, Dec. 2004.
[5] K. Mistry, et al., “A 45nm Logic Technology with High-κ + Metal-Gate Transistors, Strained Silicon, 9 Cu Interconnect Layers, 193nm Dry Patterning, and 100% Pb-free Packaging,” IEDM Tech. Dig., pp. 247-250, Dec. 2007.
[6] C. Auth, et al., “45nm High-κ + Metal gate Strain-Enhanced Transistors,” Symp. VLSI Technology, pp. 128-129, June 2008.
[7] K. Kuhn, “Reducing Variation in Advanced Logic Technologies: Approaches to Process and Design for Manufacturability of Nanoscale CMOS,” IEDM Tech. Dig., pp. 471-474, Dec. 2007.
[8] S. Natarajan, et al., “A 32nm Logic Technology Featuring 2nd Generation High-κ + Metal Gate Transistors, Enhanced Channel Strain and 0.171µm² SRAM Cell Size in a 291Mb Array,” IEDM Tech. Dig., paper 27.9, Dec. 2008.
[9] Y. Wang, et al., “A 4.0GHz 291Mb Voltage-Scalable SRAM in 32nm High-κ Metal-Gate CMOS with Integrated Power Management,” ISSCC Dig. Tech. Papers, paper 27.1, Feb. 2009.
[10] S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, I. Young, “Clock Generation and Distribution for the First IA-64 Microprocessor,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 1545-1552, Nov. 2000.
[11] N. Sakran, M. Yuffe, M. Mehalel, J. Doweck, E. Knoll, A. Kovacs, “The Implementation of the 65nm Dual-Core 64b Merom Processor,” ISSCC Dig. Tech. Papers, pp. 106-107, Feb. 2007.
[12] R. Kumar and G. Hinton, “A Family of 45nm IA Processors,” ISSCC Dig. Tech. Papers, paper 3.2, Feb. 2009.
[13] K. Zhang, et al., “A SRAM Design on 65nm CMOS Technology with Integrated Leakage Reduction Scheme,” Symp. VLSI Circuits, pp. 294-295, June 2004.
[14] N. Kurd, J. Douglas, P. Mosalikanti, R. Kumar, “Next Generation Intel® Micro-architecture (Nehalem) Clocking Architecture,” Symp. VLSI Circuits, pp. 62-63, June 2008.
[15] K. Zhang, et al., “A 3GHz 70Mb SRAM in 65nm CMOS Technology with Integrated Column-Based Dynamic Power Supply,” ISSCC Dig. Tech. Papers, pp. 474-475, Feb. 2005.
[16] S. Fung, et al., “65nm CMOS High-Speed, General Purpose and Low Power Transistor Technology for High Volume Foundry Application,” Symp. VLSI Technology, pp. 92-93, June 2004.
[17] K. Utsumi, et al., “A 65nm Low Power CMOS Platform with 0.495µm² SRAM for Digital Processing and Mobile Applications,” Symp. VLSI Technology, pp. 216-217, June 2005.
[18] S. Ekbote, et al., “45nm Low-Power CMOS SoC Technology with Aggressive Reduction of Random Variation for SRAM and Analog Transistors,” Symp. VLSI Technology, pp. 160-161, June 2008.
[19] C. Jan, et al., “A 45nm Low Power System-On-Chip Technology with Dual Gate (Logic and I/O) High-κ/Metal Gate Strained Silicon Transistors,” IEDM Tech. Dig., paper 27.4, Dec. 2008.
[20] S. Vangal, et al., “An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS,” ISSCC Dig. Tech. Papers, pp. 98-99, Feb. 2007.
[21] Y. Borodovsky, W-H Cheng, R. Schenker, V. Singh, “Pixelated phase mask as novel lithography RET,” Proc. SPIE, vol. 6924-13, Feb. 2008.
[22] M. Yang, et al., “On the Integration of CMOS with Hybrid Crystal Orientations,” Symp. VLSI Technology, pp. 160-161, June 2004.
[23] P. Packan, et al., “High Performance Hi-κ + Metal Gate Strain Enhanced Transistors on (110) Silicon,” IEDM Tech. Dig., Dec. 2008.
[24] B. Doyle, et al., “Tri-Gate Fully-Depleted CMOS Transistors: Fabrication, Design and Layout,” Symp. VLSI Technology, pp. 133-134, June 2003.
[25] A. Thean, et al., “Performance and Variability Comparisons between Multi-Gate FETs and Planar SOI Transistors,” IEDM Tech. Dig., pp. 881-884, Dec. 2006.
[26] C. Kang, et al., “A Novel Electrode-Induced Strain Engineering for High Performance SOI FinFET utilizing Si (110) Channel for Both N and PMOSFETs,” IEDM Tech. Dig., pp. 885-888, Dec. 2006.
[27] M. Radosavljevic, et al., “High-Performance 40nm Gate Length InSb P-Channel Compressively Strained Quantum Well Field Effect Transistors for Low-Power (VCC=0.5V) Logic Applications,” IEDM Tech. Dig., Dec. 2008.
[28] M. Hudait, et al., “Heterogeneous Integration of Enhancement Mode In0.7Ga0.3As Quantum Well Transistor on Silicon Substrate using Thin (<2 um) Composite Buffer Architecture for High-Speed and Low-Voltage (0.5V) Logic Applications,” IEDM Tech. Dig., pp. 625-628, Dec. 2007.
[29] D. Kim and J. del Alamo, “Logic Performance of 40nm InAs HEMTs,” IEDM Tech. Dig., pp. 629-632, Dec. 2007.
[30] T. Fukushima, Y. Yamada, H. Kikuchi, M. Koyanagi, “New Three-Dimensional Integration Technology Using Self-Assembly Technique,” IEDM Tech. Dig., pp. 359-362, Dec. 2005.
[31] P. Ramm and A. Klumpp, “Through-Silicon Via Technologies for Extreme Miniaturized 3D Integrated Wireless Sensor Systems (e-CUBES),” IEEE Int. Interconnect Technology Conf., pp. 7-9, June 2008.
[32] I. Young, et al., “Optical I/O Technology in Tera-Scale Computing,” ISSCC Dig. Tech. Papers, paper 28.1, Feb. 2009.
[33] T. Shino, et al., “Floating Body RAM Technology and its Scalability to 32nm Node and Beyond,” IEDM Tech. Dig., pp. 569-572, Dec. 2006.
[34] I. Ban, U. Avci, D. Kencke, P. Chang, “A Scaled Floating Body Cell (FBC) memory with High-κ + Metal Gate on Thin-Silicon and Thin-BOX for 16nm Technology Node and Beyond,” Symp. VLSI Technology, pp. 92-93, June 2008.
[35] S. Lai and T. Lowrey, “OUM – A 180nm Nonvolatile Memory Cell Element Technology for Stand Alone and Embedded Applications,” IEDM Tech. Dig., pp. 803-806, Dec. 2001.
[36] J. Nolte, “The Human Brain, An Introduction to Its Functional Anatomy,” Mosby, fifth edition, 2002.


26 • 2009 IEEE International Solid-State Circuits Conference 978-1-4244-3457-2/09/$25.00 ©2009 IEEE

ISSCC 2009 / SESSION 1 / PLENARY / 1.3

Figure 1.3.1: CPU transistor count and feature size trend.

Figure 1.3.2: 45nm high-κ + metal gate transistor ION-IOFF characteristics.

Figure 1.3.3: Transistor drive current and gate pitch trend over multiple generations.

Figure 1.3.4: 45nm Core™ i7 processor (Nehalem).

Figure 1.3.5: Integrated power gates.

Figure 1.3.6: Nehalem power control unit.


Figure 1.3.7: PC platform comparison, 1985 vs. 2008.

Figure 1.3.8: 386™ vs. Nehalem processors.

Figure 1.3.9: 45nm SOC transistor characteristics.

Figure 1.3.10: Lithography trends.

Figure 1.3.11: Future lithography options.

Figure 1.3.12: Future transistor options.


Figure 1.3.13: III-V transistor options.

Figure 1.3.14: 3-D chip stacking using through-silicon vias.

Figure 1.3.15: Optical interconnects for high bandwidth chip-chip interconnects.

Figure 1.3.16: System integration directions.

Figure 1.3.17: Organic vs. electronic evolution.

Figure 1.3.18: Organic vs. electronic systems.
