Three-Dimensional Microelectronics Integration: Design, Analysis and Characterization
Three Dimensional Integration
Paul Franzon North Carolina State University
Raleigh, NC
919.515.7351
2 © 2013, Paul D. Franzon
Outline
3DIC Motivation
• Performance and Memory Bandwidth
• Power Efficiency
• Power per unit of cost
• Miniaturization
3DIC Manufacturing
• Bulk TSV formation
• Wafer and chip assembly flows
• Interposers
• Relative costs
3DIC Design
• Power and power efficiency
• Memories and memory interfaces
• Electrical modeling & design
• ESD protection
• The potential for logic partitioning
• Heterogeneous Computing
Design Support
• Thermal Design
• Test issues
• Potential Test Flows
Conclusions and Future perspectives
Memory Bandwidth
Most compute systems have rapidly growing memory bandwidth demands
• Multicore
• Mobile: 50 GBps and more
• Graphics, networking: can easily benefit from 1 TBps
• Networking needs high capacity as well
• Data-driven workloads: high cross-system bandwidth
Interposers and 3DIC can provide high memory bandwidth at better power efficiency than conventional packaging
Off-chip vs. ON-chip Scaling Trends
Source: Poulton, NVIDIA
Dark Silicon
Performance per unit power
• Systems increasingly limited by power consumption, not number of transistors
• “Dark Silicon”: most of the chip will be OFF to meet thermal limits
Server Power Costs
2% of the world's electrical power consumption; predicted to reach 30% of US power by 2030
Cost of power is a growing percentage of the cost of ownership
Source: IBM
Compute Cost Scaling beyond 7 nm CMOS
We’ve achieved exponential gains before without silicon scaling. We can again.
CMOS scaling assumed to end around 2020 at around the 7 nm node.
Cost reduction
3DIC processing adds 15%+ to wafer processing cost
Silicon interposers add 25%+ to cost of silicon before packaging
Thus seek potential cost reductions
• Heterogeneous integration
• Saving on high pin count packaging
• Reduced cooling overhead
• Yielded silicon cost savings
Yield decreases exponentially with chip area
• 22 x 22 mm chip: yield = 44%; say $62 per part
• 11 x 22 mm chip: yield = 64%; say $42 per pair of parts
• As long as the cost of 3D or 2.5D integration is less than $20, money is saved due to the increased yield
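The break-even argument on this slide can be checked numerically. The sketch below is illustrative only: the 44% yield and the dollar figures come from the slide, while the Poisson yield model, the function names, and the derived defect density are assumptions that the source does not state. Under that assumed model, halving the die area gives a yield of roughly 66%, close to the 64% quoted on the slide.

```python
import math

def poisson_yield(area_cm2, defects_per_cm2):
    """Yield under an assumed Poisson defect model: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * defects_per_cm2)

def defect_density(area_cm2, yield_fraction):
    """Invert the Poisson model to recover D0 from a known yield."""
    return -math.log(yield_fraction) / area_cm2

# Slide values: a 22 x 22 mm die yields 44%.
full_area_cm2 = 2.2 * 2.2
half_area_cm2 = 1.1 * 2.2
d0 = defect_density(full_area_cm2, 0.44)
half_die_yield = poisson_yield(half_area_cm2, d0)

# Slide cost estimates in USD.
cost_monolithic = 62.0
cost_pair_of_halves = 42.0
integration_budget = cost_monolithic - cost_pair_of_halves

print(f"assumed defect density: {d0:.3f} per cm^2")
print(f"half-die yield under the Poisson model: {half_die_yield:.1%}")
print(f"break-even integration cost: ${integration_budget:.0f}")
```

Any 3D or 2.5D integration cost below `integration_budget` leaves the split design cheaper than the monolithic die, which is the slide's point.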
Heterogeneous Integration
Digital logic in advanced node (e.g. 22 nm) on top of analog circuits in legacy node (e.g. 45 nm)
• Optimizes cost per transistor, since analog transistors scale poorly
• Reduces design cost
Mixed technologies
• E.g. InP or GaN on top of silicon: ultra high performance or high power
• Low power (low leakage) CMOS on top of high performance CMOS
• Silicon photonics; MEMS
Image sensors
• Stack image collection layer on top of information processing layer
• Lower cost; specialized sensors; more in-situ computation
Xilinx: 28 nm FPGA slices + 180 nm SERDES (IO)
3D Miniaturization
Cell phone cameras
• Height reduction through TSVs
Miniature sensors
• mm³ scale: implantable
• cm³ scale: food safety & agriculture
• TSVs provide a lot more interconnect than wire bonds
Figure labels: RF harvester/sensor + antenna; low-power mixed signal ASIC; low power non-volatile memory; secondary battery/ultra-capacitor; MEMS
3DIC with TSVs
Technology set:
Wafer Thinning
Underfill
“Commercial” TSV Options
Tezzaron: down to 1.2 µm features, tungsten
IBM, Samsung, Elpida, IMEC: 5 - 10 µm features, copper
CEA/LETI (ST-Micro and others): 5 - 10 µm features, copper
TSMC (& other) interposer: 10 µm features, 100 µm pitch, copper
Figure labels: Tezzaron; IMEC; AllVia 30 µm
Transistor/TSV Integration Options
Face-to-Face; Face-to-Back; Back-to-Back
Via-First / Via-Middle; Via-Last
Attachment technologies
Solder microbumps
• Today typically 40 µm pitch; tomorrow possibly 5 µm
Copper-copper
• At high temperature (> 400 °C)
• At room temperature (Ziptronix DBI)
• Typical 2 – 5 µm pitch
• Potential for 1 µm pitch
IBM
Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
Wafer to Wafer (W2W)
Flow (figure): Wafer 1, Wafer 2, Wafer 3; mount, thin, bump
Advantages: simpler; lower cost; higher via density; better alignment; thinner chips
Disadvantages: identical sized chips; accumulated yield loss
Accumulated yield at 90% per tier: 1 tier 90%; 2 tiers 81%; 3 tiers 73%; 4 tiers 66%
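The accumulated yield loss for wafer-to-wafer stacking follows from multiplying per-tier yields, since untested tiers are bonded blindly. The sketch below is a minimal illustration under the assumption that every tier has the same independent yield (90%, as the slide's first column implies); the function names and the 70% acceptance threshold are hypothetical. Note that 0.9 to the fourth power is about 65.6%.

```python
def stack_yield(per_tier_yield, tiers):
    """Compound yield of a blind (untested) stack of identical tiers."""
    return per_tier_yield ** tiers

def max_tiers(per_tier_yield, min_stack_yield):
    """Largest tier count whose compound yield still meets the threshold."""
    if not 0.0 < per_tier_yield < 1.0:
        raise ValueError("per_tier_yield must be strictly between 0 and 1")
    tiers = 0
    while stack_yield(per_tier_yield, tiers + 1) >= min_stack_yield:
        tiers += 1
    return tiers

for n in range(1, 5):
    print(f"{n} tier(s): {stack_yield(0.9, n):.1%}")

# Hypothetical acceptance threshold of 70% stack yield.
print("max tiers at 70% threshold:", max_tiers(0.9, 0.7))
```

Chip-to-wafer assembly avoids this compounding by placing only Known Good Die, at the price of serial pick and place.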
Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
ONE chip to wafer (or wafer stack) (face mounted)
Flow (figure): Wafer 1, Wafer 2; test; dice
Advantages: Known Good Die (no accumulated yield loss); different die sizes
Disadvantages: higher cost (serial pick and place); worse alignment; wafer die size largest; solder bump requires coarse TSVs in one layer; limited to stack of two
Chip to Wafer (C2W) vs. Wafer to Wafer (W2W)
Multiple chips to wafer (or wafer stack)
Flow (figure): Wafer 1, Wafer 2, Wafer 3; test; dice; mount; thin; temporary carrier attach/demount
Advantages: Known Good Die in multiple chips; thin TSV (little area loss to connections to solder bumps)
Disadvantages: highest cost (temporary carrier); still in research; limited to stack of two
Interposers and RDLs
Redistribution layer (RDL) = thick metal layer(s) added to wafers to customize the interface to the next chip in the 3D stack
Interposer = silicon or other carrier used to mount chips WITHIN the package
Examples:
• Modified legacy process: 50 – 200 µm; 1-2 µm thick metal
• Modified 65 nm or 90 nm back end of line process
Relative Manufacturing Costs
[Figure: bar chart of relative cost per item: DRAM chip; DRAM KGD test/chip; ASIC chip; ASIC KGD test/chip; W2W 3D steps/chip; C2W 3D steps/chip; assembled stack test; interposer/chip stack; 2,000 pin package ($10)]
Energy per Operation (64-bit words; various sources)
DDR3: 4.8 nJ/word
LPDDR2: 512 pJ/word
SERDES I/O: 1.9 nJ/word
Low swing I/O: 128 pJ/word
1 cm / high-loss interposer: 300 pJ/word
0.4 V / low-loss interposer: 45 pJ/word
On-chip/mm: 7 pJ/word
TSV I/O (ESD): 7 pJ/word
TSV I/O (secondary ESD): 2 pJ/word
Optimized DRAM core: 128 pJ/word
MIPS 64 core: 400 pJ/cycle
45 nm 0.8 V FPU: 38 pJ/op
11 nm 0.4 V core: 200 pJ/op
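To make the per-word energies on this slide easier to compare, they can be converted to pJ/bit and to interface power at a fixed bandwidth. The sketch below uses the slide's values for 64-bit words; the function names and the 128 GB/s target are illustrative assumptions, and the calculation counts only the listed transfer energy, ignoring everything else.

```python
WORD_BITS = 64  # the slide quotes energy per 64-bit word

# Transfer energies from the slide, in pJ per 64-bit word.
ENERGY_PJ_PER_WORD = {
    "DDR3": 4800.0,
    "SERDES I/O": 1900.0,
    "LPDDR2": 512.0,
    "1 cm high-loss interposer": 300.0,
    "Low swing I/O": 128.0,
    "0.4 V low-loss interposer": 45.0,
    "TSV I/O (ESD)": 7.0,
    "TSV I/O (secondary ESD)": 2.0,
}

def pj_per_bit(pj_per_word):
    """Convert energy per 64-bit word to energy per bit."""
    return pj_per_word / WORD_BITS

def interface_power_watts(pj_per_word, bandwidth_gbytes_per_s):
    """Power spent moving data at the given bandwidth (transfer energy only)."""
    words_per_s = bandwidth_gbytes_per_s * 1e9 * 8 / WORD_BITS
    return words_per_s * pj_per_word * 1e-12

TARGET_GBYTES_PER_S = 128  # assumed target bandwidth for illustration
for name, energy in ENERGY_PJ_PER_WORD.items():
    watts = interface_power_watts(energy, TARGET_GBYTES_PER_S)
    print(f"{name:28s} {pj_per_bit(energy):7.3f} pJ/bit {watts:8.3f} W")
```

At this assumed bandwidth the DDR3 figure works out to tens of watts, while the TSV figures stay well below one watt, which is the motivation for memory-on-logic stacking.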
Energy/Operation – 32 bit ops
[Figure: energy per operation (pJ, axis 0 to 60) versus process node (nm, axis 0 to 50) for multiply-accumulate, register file, 8kB L2 cache, 2MB L2 cache, 3 mm on-chip, TSV, interposer, and PCB]
Energy/Operation Ratio
[Figure: bar chart of pJ per 32-bit op (axis 0 to 60) for FPU-MAC, RF, 8kB cache, 2MB L2, PCB IO, interposer IO, 3 mm on-chip, and TSV; annotations: 3.6x, 1.15x, 745]
Detailed Comparison Simulation Study
pJ/bit
Memory on Logic
Conventional vs. TSV enabled
Interface width: N x 32 or N x 128 “wide I/O”
Less overhead; flexible bank access; less interface power; flexible architecture; short on-chip wires
Figure labels: NVIDIA; Processor; Mobile
Wide IO
Standard aimed mainly at mobile
First standard was “under-specified”, limiting interoperability
Figure labels: ST-Ericsson; ST, STM, LETI, Cadence; SOC
HBM and HMC 128 GBps
ST-Ericsson
Tezzaron “Dis-integrated RAM”
Mixed technology concept
• DRAM arrays in low-leakage DRAM technology (at node N)
• Peripheral circuits in high-performance logic process (at node N-1)
• Bit and word lines fed vertically at array edge
• No repair or test prior to assembly
• BIST and CAM based remapping in logic layer
Claimed results
• Reduced overall cost/bit
• Two metals only in DRAM tiers
• Effective ~ 60-70% fill factor (?)
• Faster timing on interfaces, down to 3 ns RAS-RAS cycle
Configuration
• 8 x 128-bit ports
• 90 nm DRAM on 130 nm logic
• Density: 1 Gb/layer of DRAM
• Burst access in page/port: 1 Gword/s (128 Gbps)
IMEC TSV Parasitics
Plas et al., ISSCC 2010
TSV: 40 fF
On-chip interconnect: ~ 70 – 300 fF/mm
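The two capacitance figures on this slide can be put on a common footing by asking how much on-chip wire carries the same load as one TSV. The sketch below assumes the 40 fF value is the capacitance of a single TSV (the slide does not label it explicitly) and uses a 1.0 V supply purely for illustration; the function names are hypothetical.

```python
TSV_CAP_FF = 40.0                   # assumed to be per-TSV capacitance
WIRE_CAP_FF_PER_MM = (70.0, 300.0)  # on-chip interconnect range from the slide

def equivalent_wire_length_mm(cap_ff, wire_ff_per_mm):
    """Length of on-chip wire with the same capacitance as the given load."""
    return cap_ff / wire_ff_per_mm

def switching_energy_fj(cap_ff, vdd_volts):
    """Energy drawn from the supply to charge the load once: C * V^2 (fF * V^2 = fJ)."""
    return cap_ff * vdd_volts ** 2

low_cap, high_cap = WIRE_CAP_FF_PER_MM
shortest = equivalent_wire_length_mm(TSV_CAP_FF, high_cap)
longest = equivalent_wire_length_mm(TSV_CAP_FF, low_cap)
print(f"one TSV loads like {shortest:.2f} to {longest:.2f} mm of on-chip wire")
print(f"energy per charge at an assumed 1.0 V: {switching_energy_fj(TSV_CAP_FF, 1.0):.0f} fJ")
```

Under these assumptions a TSV is electrically comparable to a fraction of a millimeter of on-chip wire, which is why vertical connections are attractive for wide, low-energy interfaces.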
ElectroStatic Discharge (ESD) Protection
There are NO published definitive studies as to what level of ESD protection is needed
Current “working” assumptions
• 3D integration through interposer: need full ESD protection (~ 1 pF); can distribute amongst tiers
• 3D integration through stacking in separate fabs: need machine model ESD protection only (~ 250 fF)
• 3D integration within fab: fab can specify (Tezzaron: antenna diode)
Logic Partitioning Approaches
1. Modular Partitioning: 30% improvement in power/performance
2. Cell level partitioning: 18% - 35% improvement in power/performance
3. Heterogeneous Integration: 30% improvement in power/performance
4. “Extreme” 3DIC: stacking > 4 chips to effect
Figure labels: HP CPU; Low Power CPU; Specialized RAM; General RAM; Interconnect; Modular Bus
Modular Partitioning
3D FFT Engine
• 60% energy per op savings in memory
• 9% energy per op savings in logic
• 25% more silicon than 2DIC
Thor Thorolfsson
Tezzaron 130 nm 3D SAR DSP
Complete Synthetic Aperture Radar processor
• 10.3 mW/GFLOPS
• 2 layer 3D logic
All flip-flops on bottom partition
• Removes need for 3D clock router
hMETIS partitioning used to drive 3D placement
Thor Thorolfsson
Figure labels: Logic only; Logic, clocks, flip-flops
Cell level partitioning
Relying on wire-length reduction alone is not enough
2D Design 0.13 µm Cell Placement split across 6.6 µm face-to-face bump structure
Fast thread transfer
Two heterogeneous cores
• Different clocks
• Nominally different process nodes
• PISA (MIPS-like) instruction set
3D-enabled bus provides fast thread transfer (FTT)
• < 50 CPU cycles to move a process from one core to another, or to swap processes
• Varies as controlled by a third, faster clock
Switch L1 cache connection at same time (CTT)
Figure labels: 2-issue CPU; 1-issue CPU
Comparison with running in the 2-issue CPU alone:
• 1-issue CPU alone: 28% energy/op savings; 39% performance reduction
• Two-CPU stack with FTT and CTT: 27% energy/op savings; 7% performance reduction
c/- Brandon Dwiel, Eric Rotenberg
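One way to weigh the energy savings against the performance loss reported here is an energy-delay product relative to the 2-issue baseline. The sketch below is an illustration, not the authors' metric: it assumes "performance reduction" means a drop in throughput, so that relative delay is 1 / (1 - reduction), and the function name is hypothetical.

```python
def relative_energy_delay(energy_savings, performance_reduction):
    """Energy-delay product relative to the baseline (baseline = 1.0)."""
    relative_energy = 1.0 - energy_savings
    # Assumption: performance is throughput, so delay scales as its inverse.
    relative_delay = 1.0 / (1.0 - performance_reduction)
    return relative_energy * relative_delay

# (energy/op savings, performance reduction) from the slide.
configs = {
    "1-issue CPU alone": (0.28, 0.39),
    "two-CPU stack with FTT and CTT": (0.27, 0.07),
}

for name, (savings, reduction) in configs.items():
    print(f"{name}: relative EDP = {relative_energy_delay(savings, reduction):.2f}")
```

Under this assumed metric the 1-issue core alone comes out worse than the baseline (above 1.0), while the stacked pair with fast thread transfer comes out better (below 1.0), keeping nearly all of the energy savings at a small performance cost.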
Specialized RAM as L2/L3 cache
Specialized Tezzaron DRAM as combined L2/L3 cache
• Capable of 3 ns RAS-RAS cycle
Option: performance; power
• 4 MB SRAM cache: 1x; 2.4 W
• 240 MB DRAM: 1.89x; 0.53 W
Brings 16 core system power down by 15%
Figure label: Specialized RAM
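The two cache options on this slide can be compared on performance per watt. The sketch below uses only the slide's relative performance and power figures; the function name is hypothetical, and the result covers the cache alone, not the 16 core system.

```python
def perf_per_watt(relative_performance, power_watts):
    """Relative performance delivered per watt of cache power."""
    return relative_performance / power_watts

sram = perf_per_watt(1.0, 2.4)     # 4 MB SRAM cache
dram = perf_per_watt(1.89, 0.53)   # 240 MB specialized DRAM

power_ratio = 2.4 / 0.53
efficiency_gain = dram / sram

print(f"cache power ratio (SRAM / DRAM): {power_ratio:.2f}x")
print(f"performance per watt gain: {efficiency_gain:.2f}x")
```

The gain is the product of the performance ratio and the power ratio, so both the larger capacity and the lower power of the stacked DRAM contribute.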
Plug and Play Interfaces
Self-configuring, self-testing, resilient, low-overhead, low-power interfaces that can communicate between different clock-domains
[Figure: “Face to Face” and “Face to Back” configurations. Distributed request modules #A, #B, #C … with local initiators connect through tri-state buffers (ENB) to TSV data channels #0-3 and TSV request channels #0-3, reaching local target #X (processor, cache/memory...)]
CAD Thermal Pathfinding
Early design and technology investigation
NCSU Pathfinding flow (figure blocks): ESL model; Power scoreboard; Physical & technology parameters; SystemC; Static & dynamic thermal; Power delivery; Partitioning; Visualization & model fitting; DOE; Off-module
Thermal Mitigation: DVFS
Two (V, F) points: (1.1 V, 1.66 GHz), (0.9 V, 1.36 GHz)
• As the L2 channel temperature on Tier B reaches 385 K, downscale the voltage and frequency
• As the channel temperature reaches 370 K, upscale the voltage and frequency
• Further decrease in temperature is due to the changing (decreasing) power profile
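The two-threshold policy on this slide is a hysteresis controller, which can be sketched in a few lines. The operating points and the 385 K / 370 K thresholds are the slide's; the class and function names, the temperature trace, and the dynamic power model (power proportional to V²·f) are illustrative assumptions rather than the study's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OperatingPoint:
    volts: float
    ghz: float

HIGH = OperatingPoint(1.1, 1.66)
LOW = OperatingPoint(0.9, 1.36)
T_DOWNSCALE_K = 385.0  # downscale when the channel heats to this temperature
T_UPSCALE_K = 370.0    # upscale again once it cools to this temperature

def next_point(current, temp_k):
    """Two-threshold hysteresis: the gap prevents rapid toggling."""
    if current == HIGH and temp_k >= T_DOWNSCALE_K:
        return LOW
    if current == LOW and temp_k <= T_UPSCALE_K:
        return HIGH
    return current

def relative_dynamic_power(point, reference=HIGH):
    """Assumed dynamic power model: P proportional to V^2 * f."""
    return (point.volts ** 2 * point.ghz) / (reference.volts ** 2 * reference.ghz)

# Hypothetical temperature trace in kelvin.
trace = [360.0, 380.0, 385.0, 378.0, 371.0, 370.0, 365.0]
point = HIGH
schedule = []
for temp in trace:
    point = next_point(point, temp)
    schedule.append(point)

print([("HIGH" if p == HIGH else "LOW") for p in schedule])
print(f"LOW point dynamic power: {relative_dynamic_power(LOW):.0%} of HIGH")
```

Keeping the two thresholds 15 K apart means the controller stays at the low point while the stack cools, rather than oscillating around a single trip temperature.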
Thermal Management
With conventional cooling technologies:
• Leverage low-power potential of 3DIC
• Use of thermal vias and power/ground system
• Dynamic in-situ thermal management
With liquid cooling:
• Potential reduction in cost of cooling, together with increase in performance
• Advanced liquid cooling demonstrated up to 3.5 kW/cm²
Test Issues
Want to minimize total cost of test and test-escape
Extremes:
• “Stack and Pray” = accumulated yield loss
• 100% test before assembly and after each assembly event = high test cost
Do we test through the TSV/microbump interface or around it?
• Testing a 10,000 microbump array is difficult and potentially very expensive
Accumulated yield at 90% per chip: 1 chip 90%; 2 chips 81%; 3 chips 73%; 4 chips 66%
Basic Test Flow
Assumptions:
• TSV/interposer yield high enough not to need redundancy
• Might or might not test partial stacks
Memories (flow): Wafer test; Burn-in; Repair; Sort & stack, or stack + test, stack + test; Complete stack
SOCs (flow): Wafer test to (close to) Known Good Die; Standard; 3D integration; Test & possibly memory BIST; Packaging and final test
Concluding Remarks
As off-chip bandwidth requirements and the hunger for low power expand:
• 3D packaging
• Interposers
• 3DIC
BUT
• Interposers are expensive and the potential for cost reduction is modest
• 3DIC is expensive, but there is potential for process learning AND great benefit harvesting, especially once D2W is solved
Thermal and power delivery will remain challenges and bottlenecks with technology advances
Test and resiliency are potentially interesting research vectors
Acknowledgements
Faculty: Rhett Davis, Michael B. Steer, Eric Rotenberg, James Tuck, Huiyang Zhou
Professionals: Steven Lipa, Eric Wyers
Current Students: Joonmu Hu, Brandon Dwiel, Zhou Wang, Marcus Tishibanqu, Elliott Forbes, Randy Wilkiansano, Joshua Ledford, Jong Beom Park
Past Students: Hua Hao, Samson Melamed, Peter Gadfort, Akalu Lentiro, Shivam Priyadarshi, Christopher Mineo, Julie Oh, Won Ha Choi, Ambirish Sule, Gary Charles, Thor Thorolfsson
Department of Electrical and Computer Engineering, NC State University
Bandwidth Density
High off-chip bandwidth density required to close off-chip performance gap
• Determined by technology; crosstalk and signaling rate limits
Upper limits (wires per 1 mm x BW/wire)
Technology: approximate limit
• High density laminate: ~ 50 Gbps/mm
• Silicon interposer: ~ 150 Gbps/mm
• 3DIC - microbumps: > 10 Tbps/mm²
• 3DIC - TSV: > 1 Tbps/mm²
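The density limits on this slide translate directly into how much chip edge or area a given memory bandwidth needs. The sketch below uses the slide's approximate limits; the function names and the 1 TB/s target are illustrative assumptions, and because the 3DIC entries are lower bounds on density, the areas computed for them are upper bounds.

```python
# Approximate limits from the slide.
EDGE_LIMIT_GBPS_PER_MM = {
    "High density laminate": 50.0,
    "Silicon interposer": 150.0,
}
AREA_LIMIT_TBPS_PER_MM2 = {
    "3DIC microbumps": 10.0,  # slide gives "> 10", so required area is an upper bound
    "3DIC TSV": 1.0,          # slide gives "> 1"
}

def edge_length_mm(target_gbytes_per_s, gbps_per_mm):
    """Chip edge needed to escape the target bandwidth at a per-mm limit."""
    return target_gbytes_per_s * 8 / gbps_per_mm

def area_mm2(target_gbytes_per_s, tbps_per_mm2):
    """Chip area needed to carry the target bandwidth at a per-mm^2 limit."""
    return target_gbytes_per_s * 8 / 1000 / tbps_per_mm2

TARGET_GBYTES_PER_S = 1000  # assumed 1 TB/s target for illustration
for name, limit in EDGE_LIMIT_GBPS_PER_MM.items():
    print(f"{name}: {edge_length_mm(TARGET_GBYTES_PER_S, limit):.1f} mm of edge")
for name, limit in AREA_LIMIT_TBPS_PER_MM2.items():
    print(f"{name}: at most {area_mm2(TARGET_GBYTES_PER_S, limit):.1f} mm^2")
```

At this assumed target, the laminate case needs far more edge than a 22 mm die offers on all four sides, while the vertical options need only a few square millimeters.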
Yield Improvement
Example: Xilinx multi-chip Virtex 7
Simplified Process Flow
1. Etch TSV holes in substrate (max. aspect ratio 10:1; hole depth < 10x hole radius)
2. Passivate side walls to isolate from bulk
3. Fill TSV with metal: copper plating or tungsten filling
4. Often the wafer is then attached to a carrier or another wafer before thinning
5. Back side grinding and etching to expose bottom of metal filled holes
6. Formation of backside microbumps
7. Wafer bonding and (sometimes) underfill distribution
Figure labels: Wafer thinning; TSV enabled 3D stack; Underfill
Substrate Alternatives
Side-by-side mounting (RAM and ASIC): silicon interposer or thin film multi-chip module
Top-to-bottom mounting (memory and ASIC):
• Conventional interposer, e.g. high density laminate
• TSV enabled silicon interposer
• Face-up silicon interposer: TSV enabled or no TSVs
Glass Interposers
Leveraging large panel (TV) processing for cost reduction
Still in development
Coarser TSV pitch and line widths than silicon
Detail
[Figure: energy per operation (pJ, axis 0 to 3) versus node (nm, axis 0 to 50) for FPU-MAC, RF, 8kB cache, and TSV]
Design Status
130 nm 2D Design in Fab
65 nm 3D design for May tapeout
Design and Verification CAD Flows
All major vendors have extensions to the current flows for 3D integration purposes
• Electrical parasitics, including TSVs
• Full stack verification
• Full stack test insertion (in part)
• TSV/package stress management (coming); likely approach: post assembly transistor SPICE models
Low/zero commercial availability
• Pathfinding tools
• Tools to support aggressive 3D integration
Heterogeneous Integration - Computing
Illustrates the value of leveraging the 3D state of the art
Figure labels: High Performance CPU; Low Power CPU; Specialized RAM; Specialized Interconnect
• 10% - 30% logic power savings
• 8x power savings in L2-L3
• 2x interconnect power savings
“Extreme 3D”
Aggressive exploitation of 3DIC and high density photonics
• High density, vertical connections
• Ability to build systems that continue high density beyond a small range of chips
• High density, multi-wavelength low-power photonics connectors