Sam Palermo
Analog & Mixed-Signal Center
Texas A&M University
ECEN720: High-Speed Links Circuits and Systems
Spring 2021
Lecture 14: Clock Distribution Techniques
Announcements
• Project Preliminary Report due today
• Exam 2 Apr 23
  • Posted on Canvas at 8AM and due at 5PM
  • Focuses on material from Lectures 7-14
  • Previous years' Exam 2s are posted on the website for reference
• Project Final Report due Apr 29
• Project Presentations May 4 (11AM-1:30PM)
Agenda
• Wire Scaling
• Clock Distribution
• CML2CMOS Converters
• Multi-Phase Generation
• Multi-Phase Calibration
Clock Distribution in Serial I/O Systems
• On-die global clock distribution is necessary in multi-channel embedded-clock and forwarded-clock serial link systems
[Figure panels: Forwarded Clock System; Embedded Clock System]
VLSI Interconnect (Wires)
[Bohr ISSCC 2009]
[Figure: interconnect in 45nm CMOS]
Wire Scaling
• Ideally, we scale everything by 0.7x when we move to a more advanced technology node for 2x density
• Scaling both wire width and height by 0.7x results in 2x wire resistance per unit length, which dramatically increases wire RC delay
  • To compensate for the resistance increase, wires get taller
• Cap grows at a smaller pace with scaling
  • Taller wires increase sidewall cap
  • Improved (low-k) dielectrics help reduce cap
[Figure panels: Node "N"; Node "N+1" (ideal scaling); Node "N+1" (actual scaling)]
[Ho]
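The 2x resistance figure follows from the wire cross-section, since resistance per unit length is inversely proportional to width times height. A minimal sketch of that arithmetic, assuming constant resistivity and uniform scaling (the function name and the "height kept" case are illustrative, not from the slide):

```python
def relative_wire_resistance(width_scale, height_scale):
    """Resistance per unit length relative to the previous node.

    Assumes R is proportional to 1 / (width * height), i.e. constant resistivity.
    """
    return 1.0 / (width_scale * height_scale)


ideal = relative_wire_resistance(0.7, 0.7)    # width and height both shrink by 0.7x
taller = relative_wire_resistance(0.7, 1.0)   # width shrinks, height kept (illustrative case)
density_gain = 1.0 / (0.7 * 0.7)              # area per feature scales by 0.7 * 0.7

print(f"ideal scaling: {ideal:.2f}x resistance, {density_gain:.2f}x density")
print(f"height kept:   {taller:.2f}x resistance")
```

Keeping the wire height recovers part of the resistance penalty, at the cost of the extra sidewall capacitance noted above.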
Wire Scaling - Delay
• Global on-chip wire RC delay becomes many (100+) gate delays (if driven w/ one lumped driver)
[Ho Proc. IEEE 2001]
[Figure labels: FO4 delay; 1cm wire]
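To see why a long wire driven from one end costs so many gate delays, the 50% delay of a distributed RC line can be estimated as roughly 0.38·R·C, which grows with the square of the length. The sketch below uses placeholder values for resistance, capacitance, and FO4 delay; they are assumptions chosen for illustration, not numbers from [Ho Proc. IEEE 2001].

```python
def distributed_rc_delay_s(r_ohm_per_mm, c_f_per_mm, length_mm, segments=1):
    """Approximate 50% delay of a distributed RC wire (about 0.38*R*C per segment).

    Repeater delay is ignored, so segments > 1 only shows the wire term.
    """
    seg_len = length_mm / segments
    per_segment = 0.38 * (r_ohm_per_mm * seg_len) * (c_f_per_mm * seg_len)
    return segments * per_segment


# Assumed, illustrative values (not from the slide)
R_PER_MM = 300.0     # ohm/mm
C_PER_MM = 0.2e-12   # F/mm
FO4_S = 15e-12       # FO4 inverter delay in seconds

one_driver = distributed_rc_delay_s(R_PER_MM, C_PER_MM, 10.0)
ten_segments = distributed_rc_delay_s(R_PER_MM, C_PER_MM, 10.0, segments=10)

print(f"1cm wire, one driver:  {one_driver / FO4_S:.0f} FO4")
print(f"1cm wire, 10 segments: {ten_segments / FO4_S:.0f} FO4 (wire term only)")
```

Because the delay is quadratic in length, splitting the wire into repeated segments reduces the wire term roughly in proportion to the number of segments.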
Limited Wire Bandwidth
• Global on-chip wire bandwidth is much worse than chip-to-chip channels
• RC-dominated on-chip wires vs. (R)LC-dominated off-chip wires
Cascaded Clock Buffers
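• Total output jitter in frequency domain can be obtained from per-stage JTFs and phase noise (PN)
[Kim ISSCC 2019]
[Figure: cascade of buffer stages ST1, ST2, ST3, ..., STN. Input jitter Tj_i passes through JTF1, JTF2, JTF3, ..., JTFN, stage phase noise PN1, PN2, PN3, ..., PNN is added along the chain, and the result is output jitter Tj_o]
Tj_o = (( … (((Tj_i * JTF1) + PN1) * JTF2 + PN2) * … ) * JTFN) + PNN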
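The nested expression above is a simple recurrence: each stage multiplies the incoming jitter by its JTF and then adds its own phase noise. A minimal sketch evaluated at a single frequency point follows. It assumes the quantities are power-like (a jitter PSD and a squared JTF magnitude) so that uncorrelated stage noise adds directly; the function name and the numbers are illustrative, not taken from [Kim ISSCC 2019].

```python
def cascade_jitter(tj_in, jtfs, pns):
    """Propagate jitter through N buffer stages: tj = tj * JTF_k + PN_k, stage by stage."""
    if len(jtfs) != len(pns):
        raise ValueError("need one JTF and one PN value per stage")
    tj = tj_in
    for jtf, pn in zip(jtfs, pns):
        tj = tj * jtf + pn
    return tj


# Illustrative three-stage chain with mild jitter amplification per stage
tj_out = cascade_jitter(1.0, jtfs=[1.1, 1.1, 1.1], pns=[0.1, 0.1, 0.1])
print(f"output jitter (arbitrary units): {tj_out:.3f}")
```

Noise injected early in the chain is amplified by every later stage, which is why per-stage gain above unity compounds quickly.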
Clock Buffer Jitter Transfer Function
[Kim ISSCC 2019]
[Figure: |JTF| (dB) vs. frequency (GHz) for three buffer chains: 14G, FO4, ST5; 14G, FO2, ST10; 28G, FO2, ST5. Annotation: jitter peaking near fclk]
[Figure: energy/bit (pJ/b) vs. data rate (Gb/s, 20 to 120) for quarter-rate and half-rate clocking]
• Significant jitter amplification at 28GHz in the clock distribution chain
• Motivates most 112Gb/s systems to use a quarter-rate clocking scheme with 14GHz clocks
• Quadrature phase spacing and duty cycle correction are necessary for uniform output eyes
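The 28GHz and 14GHz figures follow from the symbol rate. A short sketch of the arithmetic, assuming PAM4 signaling (2 bits per symbol) for the 112Gb/s case; the function and parameter names are illustrative:

```python
def clock_frequency_ghz(data_rate_gbps, bits_per_symbol, rate_divisor):
    """Clock frequency for a given clocking scheme.

    rate_divisor: 1 = full-rate, 2 = half-rate, 4 = quarter-rate.
    """
    symbol_rate_gbaud = data_rate_gbps / bits_per_symbol
    return symbol_rate_gbaud / rate_divisor


half_rate = clock_frequency_ghz(112, bits_per_symbol=2, rate_divisor=2)
quarter_rate = clock_frequency_ghz(112, bits_per_symbol=2, rate_divisor=4)
print(f"half-rate clock:    {half_rate:.0f} GHz")
print(f"quarter-rate clock: {quarter_rate:.0f} GHz")
```

Quarter-rate clocking halves the distributed clock frequency, but it needs four accurately spaced phases, hence the quadrature and duty-cycle correction.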
14GHz Quadrature Clock Distribution
[Kim ISSCC 2019]
[Figure: noise spectral density (dBc/Hz, -170 to -70) vs. offset frequency (Hz, 10^6 to 10^10). Curves: PLL; After IQ Gen; After DCC, QEC, CK Fan-Up and TX Driver; After 10MHz CDR Filter; After 3MHz CDR Filter]
• PLL output phase noise is multiplied by clock distribution JTF
• CDR filter (high-pass) is applied to get the untracked effective TX jitter
[Figure: clock path block diagram. Blocks: 14GHz LC-PLL; CK Distribution; Buffer; IQ Gen; DCC; QEC; 4:1 Pulse Gen; Driver Output Stage; Regulator (VCC_HV, VCC_Analog); DCD/QED detector; FSM with coarse/fine control]
208 fs rms w/ 3MHz CDR BW
185 fs rms w/ 10MHz CDR BW
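The untracked jitter numbers come from integrating the phase noise that passes the CDR's high-pass response. The sketch below shows one way to do that integration. It assumes a first-order high-pass CDR model and a made-up phase noise profile, so its output is not expected to reproduce the 208 fs and 185 fs values; all names and values are hypothetical.

```python
import math


def example_phase_noise_dbc(f_hz):
    """Hypothetical SSB phase noise: -100 dBc/Hz at 1 MHz, -20 dB/decade, -150 dBc/Hz floor."""
    return max(-100.0 - 20.0 * math.log10(f_hz / 1e6), -150.0)


def untracked_rms_jitter_fs(f_clk_hz, cdr_bw_hz, f_lo=1e4, f_hi=1e10, points=2000):
    """Integrate high-pass-filtered phase noise and convert to rms jitter in fs."""
    freqs = [f_lo * (f_hi / f_lo) ** (i / (points - 1)) for i in range(points)]

    def filtered(f):
        ratio_sq = (f / cdr_bw_hz) ** 2
        high_pass = ratio_sq / (1.0 + ratio_sq)   # |H(f)|^2 of a first-order high-pass
        return 10 ** (example_phase_noise_dbc(f) / 10) * high_pass

    area = 0.0
    for f0, f1 in zip(freqs[:-1], freqs[1:]):
        area += 0.5 * (filtered(f0) + filtered(f1)) * (f1 - f0)   # trapezoidal rule
    phase_rms_rad = math.sqrt(2.0 * area)          # factor 2 for both sidebands
    return phase_rms_rad / (2.0 * math.pi * f_clk_hz) * 1e15


jitter_3mhz = untracked_rms_jitter_fs(14e9, 3e6)
jitter_10mhz = untracked_rms_jitter_fs(14e9, 10e6)
print(f"3MHz CDR BW:  {jitter_3mhz:.0f} fs rms")
print(f"10MHz CDR BW: {jitter_10mhz:.0f} fs rms")
```

A wider CDR bandwidth tracks more of the low-frequency noise, so the untracked jitter drops, which matches the direction of the two measured values.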
Clock Distribution Regulation
[Turker ISSCC 2019]
Figure annotations:
• Suppresses HF ripple
• Low-BW op-amp sets DC level and LF PSRR
• Source follower provides mid-frequency PSRR [Alon '06]
Huawei 60Gb/s PAM4 Clock Distribution
• Wideband ½-rate single-ended clock distribution (2-16GHz)
• 2 independent data rates possible per 4-lane macro
• LDO-powered CMOS inverter-based distribution
• Metal shield around distribution wires to lower crosstalk
[LaCroix ISSCC 2019]
Mediatek 56Gb/s PAM4 Clock Distribution
[Ali ISSCC 2019]
• Tuned standing-wave clock distribution
• Two shunt inductors placed in the middle set boundary conditions for the transmission line and tune the amplitude to be nearly equal at the drop points
Inductive-Loaded Clock Distribution
[Shibasaki ISSCC 2016]
• 2-stage narrow-band buffer drives 2-lane 2:1 MUXs and divider
• Minimal length 28GHz clock path
Active-Inductor-Based Clock Distribution
[Upadhyaya ISSCC 2018]
Active Inductor CML Load
CML2CMOS Converter (1)
• Differential input stage followed by high-swing output stage
• Can be sensitive to power-supply noise and reduce jitter benefits of low-swing distribution techniques
• Often requires some type of duty-cycle control
[Balamurugan JSSC 2008]
CML2CMOS Converter (2)
• AC-coupled self-biased inverter input stages and cross-coupled buffer stages can help improve duty cycle performance
[Kossel JSSC 2008]
ILO-Based Multi-Phase Clock Generation
• ILO generates multiple output phases from differential injected clock
• Coarse frequency tuning loop ensures that the ILO will lock
• Fine quadrature-locked loop minimizes phase error
[Chen, ISSCC 2018]
[Figure: ILO-based phase generator. Labels: injection lock inputs inj/injb; CMOS PI; QPD; V-to-I; LPF; regulator (Vreg_ILRO, Vreg_PI); coarse FTL with coarse frequency-track control and FTL DAC; fine QLL (Vdet_p, Vdet_n, Vctrl); clocks DClk/DClk_b and XClk/XClk_b; output phases CK0/CK180, CK45/CK225, CK90/CK270, CK135/CK315]
IBM 100Gb/s PAM4 QDLL Phase Generation
[Cevrero ISSCC 2019]
• Low-complexity inverter-based DLL generates ¼-rate clock phases from differential distribution
Clock Error Calibration Loop
• 2-stage DCC & 4-stage QEC w/ 2-stage x-coupled buffers
• DCC with current injection and QEC with C-DAC
[Kim ISSCC 2019]
[Figure: clock error calibration loop. Labels: CKIn; DCC (coarse, fine); QEC (coarse, fine); CKI/IB and CKQ/QB; 1-UI Pulse Gen (4:1 MUX); DCD/QED detector; FSM; control decoders; bias control (6b); range control (4b)]
Asynchronous Sampling Error Detection
• Asynchronous VCO samples final ¼-rate clocks
• Duty cycle error minimized w/ equal P/N count
• Quadrature error minimized w/ equal IP*QP/QP*IN count
[Kim ISSCC 2019]
[Figure: asynchronous duty-cycle error detection. Labels: VCO edge; CKIP, CKQP, CKIN, CKQN; CKIP/QP and CKIN/QN sampling flip-flops; CKP Cntr[23:0]; CKN Cntr[23:0]; DC offset[23:0]; Compare; FSM]
[Figure: asynchronous quadrature error detection. Labels: CKIP, CKQP, CKIN, CKQN; CKIP/QP Cntr[23:0]; CKQP/IN Cntr[23:0]; Skew offset[23:0]; Compare; FSM]
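The counting idea can be checked with a small behavioral model: sample the clocks at instants unrelated to the clock period, then compare counters. The sketch below is an idealized illustration, not the circuit from [Kim ISSCC 2019]. It assumes unit-period square clocks, CKIN as the exact complement of CKIP, and an irrational sampling ratio standing in for the asynchronous VCO; all names are hypothetical.

```python
import math


def clock_high(t, duty=0.5, delay=0.0):
    """True if a unit-period clock with the given duty cycle and delay is high at time t."""
    return ((t - delay) % 1.0) < duty


def async_counts(duty=0.5, q_skew=0.0, n_samples=100_000):
    """Count sampled-high events with a sampler that never locks to the clock."""
    ratio = (math.sqrt(5) - 1) / 2   # sampling period / clock period (irrational)
    p = n = ip_qp = qp_in = 0
    for k in range(n_samples):
        t = (k * ratio) % 1.0
        ckip = clock_high(t, duty)
        ckin = not ckip                                  # idealized complement
        ckqp = clock_high(t, duty, 0.25 + q_skew)        # nominally 90 degrees late
        p += ckip
        n += ckin
        ip_qp += ckip and ckqp
        qp_in += ckqp and ckin
    return p, n, ip_qp, qp_in


N = 100_000
p0, n0, a0, b0 = async_counts()                 # ideal clocks
p1, n1, _, _ = async_counts(duty=0.52)          # duty-cycle error
_, _, a2, b2 = async_counts(q_skew=0.02)        # quadrature error

print("ideal:      P-N =", p0 - n0, " IP*QP - QP*IN =", a0 - b0)
print("duty 52%:   (P-N)/N =", (p1 - n1) / N)
print("skew 2%:    (IP*QP - QP*IN)/N =", (a2 - b2) / N)
```

In this model the sign and size of each count difference point to the direction and amount of correction, which is the information a calibration FSM would act on.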
Background Quadrature Clock Calibration
• Duty cycle error detected by low-pass filtering clocks
• IQ mismatch is detected by monitoring output duty cycle of replica mux
• Information is used to control independent I/Q VCDLs
[Figure: replica mux timing (D1, D2, CK0, Dout, Dout,rep) for three cases. Case 1, IQ matched: Dout,rep duty = 50%. Case 2, CK0 early: Dout,rep duty > 50%. Case 3, CK0 late: Dout,rep duty < 50%]
128Gb/s PAM4 TX Clock Generation
• DCC with current injection buffers
• QEC with differential offset voltage in half-rate CML divider
[Toprak-Deniz ISSCC 2018]
DCC w/ Inverter Trip Point Adjustment
• Clocks are AC-coupled to input inverters that are biased at the trip point with feedback resistors
• IDC injected at inverter input shifts trip point and output duty cycle
• Monotonic control achieved with pull-up/down diodes
• RDC can also be adjusted to change tuning range
[Menolfi ISSCC 2018]
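A rough behavioral picture of how an offset relative to the trip point changes duty cycle is sketched below. It is not the circuit from [Menolfi ISSCC 2018]. It assumes a sinusoidal AC-coupled input, an ideal threshold at the trip point, and an effective offset modeled as k·IDC·RDC, where k is a hypothetical lumped factor standing in for the self-biased inverter's loop gain.

```python
import math


def dcc_offset_v(i_dc_a, r_dc_ohm, k=1.0):
    """Assumed effective input offset: k * IDC * RDC (k is a hypothetical lumped factor)."""
    return k * i_dc_a * r_dc_ohm


def fraction_above_trip(amplitude_v, offset_v):
    """Fraction of the period a sinusoid shifted by offset_v spends above the trip point."""
    x = max(-1.0, min(1.0, offset_v / amplitude_v))   # clamp once the offset exceeds the swing
    return 0.5 + math.asin(x) / math.pi


AMPLITUDE_V = 0.4   # assumed input amplitude
for i_dc_ua in (-20, 0, 20):
    offset = dcc_offset_v(i_dc_ua * 1e-6, 5e3)
    above = fraction_above_trip(AMPLITUDE_V, offset)
    # An inverter output is high while its input is below the trip point
    print(f"IDC = {i_dc_ua:+d} uA: inverter output duty = {1.0 - above:.3f}")
```

In this model a larger RDC gives a larger offset per unit of IDC, consistent with using RDC to set the tuning range.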
Clock Generation & Distribution Take-Away Points
• Low-noise clock distribution is necessary in high-performance serial links
• Jitter amplification must be avoided in multi-lane clock distribution
• Efficient multi-phase generation and calibration are necessary for ¼-rate front-ends