A Compact Clock Generator for Heterogeneous GALS
566 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 3, MARCH 2013
Brief Papers
A Compact Clock Generator for Heterogeneous GALS
MPSoCs in 65-nm CMOS Technology
Sebastian Höppner, Holger Eisenreich, Stephan Henker, Dennis Walter, Georg Ellguth, and René Schüffny
Abstract: This paper presents an all-digital phase-locked loop (ADPLL)
clock generator for globally asynchronous locally synchronous (GALS)
multiprocessor systems-on-chip (MPSoCs). With its low power consumption
of 2.7 mW and ultra small chip area of 0.0078 mm² it can be
instantiated per core for fine-grained power management like DVFS. It is
based on an ADPLL providing a multiphase clock signal from which core
frequencies from 83 to 666 MHz with 50% duty cycle are generated by
phase rotation and frequency division. The clock meets the specification
for DDR2/DDR3 memory interfaces. Additionally, it provides a dedicated
high-speed clock up to 4 GHz for serial network-on-chip data links.
Core frequencies can be changed arbitrarily within one clock cycle for
fast dynamic frequency scaling applications. The performance including
statistical analysis of mismatch has been verified by a prototype in 65-nm
CMOS technology.
Index Terms: All-digital phase-locked loop (ADPLL), digitally controlled
oscillator (DCO), dynamic voltage and frequency scaling (DVFS),
globally asynchronous locally synchronous (GALS), multiprocessor sys-
tems-on-chip (MPSoCs).
I. INTRODUCTION
Multiprocessor systems-on-chip (MPSoCs) provide flexible so-
lutions for baseband processing [1], where a wide range of radio
standards has to be supported and high energy efficiency is mandatory
for mobile devices. To meet these requirements, heterogeneous
MPSoCs include different cores like general purpose RISC processors,
digital signal processors (DSPs), or hardware accelerators. Addition-
ally, on-chip data transmission structures like network-on-chip (NoC)
high-speed links [2] and I/O components like double-data-rate (DDR)
memory interfaces or field-programmable gate array (FPGA) connections
[3] are part of complex MPSoCs. They have specialized demands
for clock frequency and clock quality (e.g., jitter, duty cycle). Com-
monly, centralized clock generators are used which provide clocks for
multiple cores or I/O interfaces in heterogeneous MPSoCs [4]. The 80
core processor [5] uses meso-synchronous clocking with a single PLL
only, which does not allow individual per core frequency scaling. In
contrast, globally asynchronous locally synchronous (GALS) clocking
architectures provide each core with a dedicated clock avoiding global
high-speed synchronous clock distribution networks [6].
Manuscript received August 16, 2011; revised December 13, 2011; accepted January 30, 2012. Date of publication March 08, 2012; date of current version February 20, 2013. This work was supported by the German Ministry of Education and Research BMBF under grant number 13N10788 (CoolBaseStations).
The authors are with the Chair of Highly-Parallel VLSI Systems and Neuromorphic Circuits at the Faculty of Electrical Engineering and Information Technology at Technische Universität Dresden, Germany.
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2012.2187224
Fig. 1. Clock generator block-level schematic (simplified).
To further enhance energy efficiency, fine-grained power management techniques
like dynamic voltage and frequency scaling (DVFS) are employed [7],
[8]. Per-core DVFS can effectively be applied to GALS MPSoCs due
to the independent local clocks. Solutions for ultra-fast supply voltage
changes have been reported [7]. Thus a fast responding clock generator
is required to realize these fast DVFS schemes which reduce core idle
times when changing the performance level and increase the system
throughput. [9] and [10] employ simple ring oscillators for GALS
cores, which however are not suitable for DVFS. [11] uses locally
calibrated clock generators based on controlled delay lines, but no fast
core frequency switching is possible here. [8] uses a configurable ring
oscillator for fast DVFS which cannot generate low jitter clocks at
specified frequencies because it is not locked to a reference clock. [12]
presents a versatile clock generator based on flying adder (FA) fre-
quency synthesis, which can generate a wide range of frequencies with
low jitter, but requires large chip area. Considering the trend towards
integration of many-core MPSoCs [5], flexible clocking solutions are
required that fulfill the clock quality demands and support advanced
fine-grained power management techniques.
Therefore, this work presents a low chip area, low power all-digital phase-locked loop (ADPLL)-based clock generator for per-core
instantiation in heterogeneous GALS MPSoCs, providing both a wide
range of output frequencies and specialized high-speed clocks. It en-
ables ultra-fast frequency switching for per-core DVFS.
II. ADPLL SYSTEM
Fig. 1 shows the block-level schematic of the proposed MPSoC clock
generator. A digitally controlled oscillator (DCO) generates eight clock
phases at nominal 2 GHz frequency, from which high-speed NoC fre-
quencies of 2 and 4 GHz are generated by multiplexing or frequency
doubling. Core frequencies up to 666 MHz with 50% duty cycle are
synthesized using an open-loop clock generator. The DCO is locked to
its nominal frequency of 2 GHz by an ADPLL with bang-bang
phase frequency detector (BBPFD). No large area analog filters are
required and the digital filter implementation benefits from geometry
scaling in state-of-the-art CMOS technologies. Moreover, no time-to-
digital converter is required which further reduces area and power.
The ADPLL is locked one time at start-up and core frequency changes
are realized solely by open-loop clock generation. Thus ultra-fast fre-
quency changes are possible and ADPLL lock time is not an issue. The
DCO has an individual supply voltage to reduce supply-noise induced
jitter, whereas all other components are connected to core supply. The
bang-bang phase frequency detector (BBPFD) is adopted from [13].
The digital filter is a standard implementation as often used in bang-
bang ADPLLs [14] and the delta-sigma modulator (DSM) realizes frac-
tional tuning.
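The fractional tuning performed by the DSM can be illustrated with an accumulator-based modulator. This is a minimal sketch only: the text does not state the modulator order or the word widths, so the first-order structure, the function name `dsm_first_order`, and the 8-bit fraction in the usage example are assumptions made for illustration.

```python
def dsm_first_order(frac_word, frac_bits, n_cycles):
    """Return the 0/1 dither sequence of a first-order delta-sigma modulator.

    frac_word is the fractional part of the tuning word, with
    0 <= frac_word < 2**frac_bits. The accumulator carry is the dither bit
    that would be added to the integer fine-tune code, so the time-averaged
    code equals integer + frac_word / 2**frac_bits.
    (Hypothetical helper; order and widths are not taken from the paper.)
    """
    modulus = 1 << frac_bits
    if not 0 <= frac_word < modulus:
        raise ValueError("frac_word must fit in frac_bits")
    acc = 0
    dither = []
    for _ in range(n_cycles):
        acc += frac_word
        carry = acc >= modulus      # accumulator overflow
        if carry:
            acc -= modulus          # wrap the accumulator
        dither.append(1 if carry else 0)
    return dither


if __name__ == "__main__":
    seq = dsm_first_order(frac_word=64, frac_bits=8, n_cycles=256)
    print(sum(seq) / len(seq))      # average dither, here 64/256
```

Averaged over a full accumulator cycle, the dither sequence carries exactly the requested fraction, which is how a coarse-grained tuning input can reach an effective resolution below one tuning step.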
Fig. 2. BBPFD. (a) Schematic. (b) Waveform.
When the ADPLL is started, a coarse lock sequence is performed
which tunes the DCO to compensate process variations. To reduce
hardware effort, the BBPFD is used as a binary frequency detector which
compares the reference period to the period of the divided DCO
signal. A timer defines a frame where the coarse tune signal
is kept constant. The output of the BBPFD at the end of this frame is
used as the frequency comparison value. As shown in Fig. 2(b), the
dead zone of the BBPFD causes wrong output pulses. Considering a
divider output signal which is faster than the reference clock, the expected
BBPFD output would be zero. If a first rising edge (1.) of DIV
occurs within the dead zone, it is ignored and the following rising edge
of REF is considered the first one, causing a false output. Due to the
beat period of the two signals, the ignored edge moves through the dead
zone. The false pulse is corrected when the next rising edge of DIV
occurs before the rising edge of REF, i.e., after
cycles. The false output signal pulse width increases if the ADPLL is
closer to frequency lock condition. To reduce the probability of a false
frequency comparison, a filter is applied whose output is changed to
one/zero if consecutive ones/zeros occur at the BBPFD output. Otherwise
it keeps its previous value. The required number of filter cycles
reads (e.g., , 1 ns,
5 ps, ). Therefore less counter hardware is required compared to
conventional counter-based frequency detectors. The number of cycles
to be counted for a maximum resulting error of the DCO period of
is (e.g., 0.5 ns, 5 ps, ).
Based on this binary frequency detection, linear search or successive
approximation [15] is performed to determine the coarse tune value.
After this, fine tune phase lock is achieved by closed loop ADPLL operation.
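The coarse lock sequence described above (a filtered binary frequency comparison driving a successive approximation search) can be sketched as follows. This is an illustrative model under stated assumptions, not the implemented circuit: the linear DCO model `toy_dco_mhz`, the detector stub `toy_detector`, the filter length of 3, and the assumption that a larger coarse code gives a higher DCO frequency are all hypothetical.

```python
def consecutive_filter(bits, n_consecutive, initial=0):
    """Output changes to 1/0 only after n_consecutive equal input bits;
    otherwise the previous output value is kept."""
    state = initial
    run_value, run_length = None, 0
    for bit in bits:
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length >= n_consecutive:
            state = run_value
    return state


def coarse_tune_sar(dco_is_slow, n_bits=6):
    """Successive approximation over an n_bits coarse tune code.

    dco_is_slow(code) returns 1 if the divided DCO clock is slower than
    the reference for this code, else 0 (filtered detector output).
    Assumption: a larger code gives a higher DCO frequency.
    """
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        if dco_is_slow(trial):      # still too slow: keep this bit set
            code = trial
    return code


def toy_dco_mhz(code):
    """Hypothetical linear DCO model, used only to exercise the search."""
    return 1500 + 20 * code


def toy_detector(code, target_mhz=2010):
    """Detector stub: one false bit is injected into the stream to show
    the filter rejecting an isolated wrong pulse."""
    true_bit = 1 if toy_dco_mhz(code) < target_mhz else 0
    stream = [true_bit] * 4 + [1 - true_bit] + [true_bit] * 4
    return consecutive_filter(stream, n_consecutive=3, initial=1 - true_bit)


if __name__ == "__main__":
    code = coarse_tune_sar(toy_detector)
    print(code, toy_dco_mhz(code))
```

The search settles on the largest code that still leaves the modeled DCO below the target, using one comparison per coarse bit; the remaining frequency offset is what the closed-loop fine tune would then remove.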
III. DIGITALLY CONTROLLED OSCILLATOR
A. Architecture
The DCO is the central component of the ADPLL. It provides eight
equally spaced clock phases at 2 GHz with a small tuning step size
of a few ps only. A four-stage differential ring oscillator is a suitable
topology. Each stage must be tuned equally to preserve phase matching
over the whole tuning range. Thus a centralized tuning mechanism
is a good solution for the targeted multi phase clock generation for
minimized chip area. In this work the ring oscillator is built up using
current-starved inverter cells [see Fig. 3(b)]. Besides the minimized
overhead for equal tuning of all stages, the current-starved inverter
cells provide the advantage of wide tuning range, which is crucial for
compensation of process variations, and good supply noise immunity
[16]. Fig. 3(a) shows the schematic of the DCO. It consists of four
pseudo-differential stages. The cross-coupled inverters ensure 180 de-
gree phase shift between the differential nodes and help to compensate
delay mismatch variations. The 8-phase output signals are buffered
using CMOS inverters.
The DCO tuning bias (tunen/tunep) is generated by a current
based digital-to-analog converter (DAC). This includes a beta multi-
plier-based CMOS reference current source followed by switchable
current mirrors. 6-bit binary weighted current switches are used for
coarse tuning with low control hardware effort. Fine tune is achieved
by 32 thermometer coded switches with minimum sized transistors.
This causes only small parasitic charge injection on the control signal
nodes and therefore minimizes reference clock feedthrough.
Fig. 3. DCO schematic and startup circuit. (a) Four-stage differential ring oscillator. (b) Inverter. (c) Switch.
Fig. 4. Spice simulated differential small-signal DC gain of one DCO stage versus common mode voltage, 1.2 V.
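The split into 6 binary-weighted coarse switches and 32 thermometer-coded fine switches can be illustrated with a small encoding sketch. The function names and the unit currents `i_coarse_lsb` and `i_fine_lsb` are hypothetical; the text does not give the actual current values.

```python
def binary_switches(code, n_bits=6):
    """Binary-weighted switch states, least significant bit first."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("coarse code out of range")
    return [(code >> i) & 1 for i in range(n_bits)]


def thermometer_switches(value, n_switches=32):
    """Thermometer code: the lowest `value` switches are on."""
    if not 0 <= value <= n_switches:
        raise ValueError("fine value out of range")
    return [1 if i < value else 0 for i in range(n_switches)]


def dac_current(coarse, fine, i_coarse_lsb=1.0, i_fine_lsb=0.25):
    """Sum of enabled mirror currents in arbitrary units.
    The unit currents are placeholders, not values from the paper."""
    coarse_part = sum(bit << i for i, bit in enumerate(binary_switches(coarse)))
    fine_part = sum(thermometer_switches(fine))
    return coarse_part * i_coarse_lsb + fine_part * i_fine_lsb


if __name__ == "__main__":
    print(binary_switches(37))
    print(sum(thermometer_switches(5)))
    print(dac_current(coarse=3, fine=4))
```

A thermometer code toggles exactly one switch per fine step, so the fine tuning characteristic stays monotonic, whereas the binary-weighted coarse part needs only six control lines.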
The ring oscillator with an even number of differential stages might
suffer from startup problems because a stable common mode can be
reached, where the differential gain of the delay stages is too small
to meet the oscillation criterion. This can be overcome by high drive
strengths of the cross-coupled inverters that ensure differential signals
at the according nodes. If they are too strong, the main oscillation loop
can be slowed down and regenerative switching can occur, leading to
increased jitter [16]. If they are too weak, startup might fail. Here the
drive strength of the cross-coupled inverters is chosen one third of the
main inverters to prevent regenerative switching [16]. The startup issue
is solved by a special technique.
B. Power-Down and Startup
When a current-starved DCO is disabled (EN=0), the main inverters
are disabled. Their output nodes would be high-resistive. Therefore,
the internal node voltages would not be well defined which might lead
to unwanted static currents in the output inverters and failing startup.
Fig. 4 shows the simulated differential small-signal DC gain of one
pseudo-differential DCO stage. If the common mode is below 0.2 V
or above 0.7 V the differential mode is attenuated and no oscillation
can ramp up. These issues are overcome by switching the bias volt-
ages to dedicated levels during power-down, as shown in Fig. 3(c)
and (a). The switches are built up using CMOS transmission gates.
The main inverters are completely disabled whereas in the cross-cou-
pled stages either the pull-up current source Msp or the pull-down cur-
rent source Msn is enabled during power down. Therefore the internal
nodes are kept in fully settled differential mode (Rst1, Rst0)
during power down as shown in Fig. 3(a), causing the output inverters
to switch completely without drawing static current (except leakage)
when the oscillator is disabled. Start-up can occur safely from true dif-
ferential mode, where the DCO stages have their maximum differential
gain. During operation (EN=1) all current sources
Msp/Msn are connected to the tuning voltages tunep/tunen, respec-
tively.
IV. OPEN-LOOP CLOCK GENERATION
Fig. 5(a) shows the open-loop clock generator which generates the
core clock from the 8-phase DCO signal (nominal period 0.5 ns). A
phase switching frequency division architecture [17] is used because
it can generate sub-integer division ratios and thus a larger variety of
output frequencies. A phase multiplexer selects one out of the eight
clock phases with the DCO period. This multiplexer output is divided by
2 or 3 in the divide-by-2-or-3 circuit and by an even division ratio in the
output divider. So the base division ratio is the product of both ratios
if no phase switching occurs. As illustrated in Fig. 5(b), phases are
switched in a glitch-free reverse switching scheme [18]. Each time the
multiplexer switches phase, its output period is shortened. Based on the
considerations in [17], the switch step is chosen such that the multiplexer
output period is reduced by one or two phase steps per switching event.
The multiplexer clock CM
is fed to the divide-by-2-or-3 circuit which generates the clock C23
and enable pulses for the rotator. The division ratio is set by S23.
Zero or more phase switchings can occur per C23 cycle. The
phase multiplexer select signals are generated by a rotator, consisting
of a 1-hot shift register ring with selectable step size of 1 or 2. The
multiplexer select signals are directly connected to the rotator flip-flop
outputs to achieve minimized skew for phase selection. Rotating is enabled
by clock gating which keeps the rotator unclocked if no phase
is switched to reduce power consumption. The rotator acts like an adder
with an increment of 1 or 2 compared to the FA frequency synthesizer [12]. The clock
C23 is fed to the synchronous divider with even division ratios
that ensure 50% duty cycle of the output clock. In summary, the output
period reads
(1)
Fig. 5. Open-loop core clock generator. (a) Schematic. (b) Reverse phase switching.
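The division scheme in this section can be enumerated in a few lines. The sketch below is an inference and not the paper's equation (1): it assumes that the free parameters are the divide-by-2-or-3 ratio, the rotator step size (1 or 2 of the eight phases), a number of phase switchings per C23 cycle between zero and the divide-by-2-or-3 ratio, and even output divider ratios of 2, 4, 6, and 8. All variable names are hypothetical. Under these assumptions the enumeration gives 33 ratios between 3 and 24 with the integers 13, 17, and 19 absent, which agrees with the figures quoted in the text.

```python
from fractions import Fraction

N_PHASES = 8  # the DCO provides eight equally spaced clock phases


def division_ratios():
    """Enumerate the output-period / DCO-period ratios (assumed model)."""
    ratios = set()
    for n23 in (2, 3):                      # divide-by-2-or-3 setting
        for step in (1, 2):                 # rotator step size in phases
            for k in range(n23 + 1):        # phase switchings per C23 cycle
                # each switching shortens the C23 period by step/8 DCO periods
                c23 = Fraction(n23) - Fraction(k * step, N_PHASES)
                for n_out in (2, 4, 6, 8):  # even output divider ratio
                    ratios.add(c23 * n_out)
    return sorted(ratios)


def output_frequencies_mhz(f_dco_mhz=2000):
    """Core clock frequencies for a given DCO frequency."""
    return [float(f_dco_mhz / r) for r in division_ratios()]


if __name__ == "__main__":
    ratios = division_ratios()
    print(len(ratios), ratios[0], ratios[-1])
    integers = {int(r) for r in ratios if r.denominator == 1}
    print(sorted(set(range(3, 25)) - integers))
```

Exact rational arithmetic is used so that sub-integer ratios such as 15/4 (533 MHz at a 2 GHz DCO) are compared without rounding effects.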
In total 33 different division ratios in the range from 3 to 24 can be re-
alized as shown in Fig. 10(a). Only the integer primes 13, 17 and 19 are
missing. For a DCO period of 0.5 ns the available output frequencies include 100,
133, 166, 200, 266, 333, 400, 533, and 666 MHz with 50% duty cycle
for DDR, DDR2 and DDR3 memory interfaces. A dedicated synchro-
nization block ensures that the fcntrl signal is captured once per output
cycle and all internal control signals are kept constant within the whole
output cycle. This ensures that the output frequency can be changed ar-
bitrarily within a single output clock cycle which enables fast dynamic
frequency scaling. A detailed description of the open-loop clock generator
circuit can be found in [19].
The output clock of the open-loop clock generator exhibits jitter due
to static phase mismatch of its multi-phase input clocks which are gen-
erated by the DCO with a chain of delay cells locked to a period of .
The DCO with four differential stages provides phases, where
the delays are represented by the rising edge delays and falling edge
delays of each delay stage. Each stage exhibits a delay which can be
written as where is the relative
variation which is assumed to be distributed normally with standard de-
viation and zero mean value. represents a tuning constant
which allows tuning by the ADPLL. For phases the total delay is
locked to the reference , and the resulting tune control value
reads
(2)
Fig. 6. MPSoC GALS clocking architecture with DVFS.
Fig. 7. Chip photo and clock generator layout.
The time difference of a switching step over phases is
(3)
Assuming that and its mean value is zero, (3) can be approxi-
mated by its Taylor series [20]. It reads
(4)
which relates the standard deviation of the total switch timing error to
the timing error of a single delay element. The total number of switch-
ings within one open-loop clock generator output period for a given
division ratio as presented by (1) is
resulting in an effective number of switchings
(5)
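The effect of static phase mismatch on a phase switching step can be explored numerically. The sketch below does not reproduce equations (2) to (5); it only assumes, in line with the description above, that every phase delay is a nominal delay scaled by (1 + relative variation) and by one common tuning factor that the loop adjusts until the delays sum to the locked period. Function and parameter names are hypothetical.

```python
def locked_phase_delays(deltas, period):
    """Phase delays after lock: one common scale factor makes the sum of
    all delays equal to the period (assumed delay model)."""
    total = sum(1.0 + d for d in deltas)
    return [period * (1.0 + d) / total for d in deltas]


def switch_timing_error(delays, start, m):
    """Deviation of a switch over m phases, beginning at phase `start`,
    from its ideal value of m * period / number_of_phases."""
    n = len(delays)
    period = sum(delays)
    actual = sum(delays[(start + i) % n] for i in range(m))
    return actual - m * period / n


if __name__ == "__main__":
    # Example relative variations for eight phases (arbitrary numbers).
    deltas = [0.02, -0.01, 0.00, 0.03, -0.02, 0.01, -0.03, 0.00]
    delays = locked_phase_delays(deltas, period=0.5e-9)
    for start in range(len(delays)):
        print(start, switch_timing_error(delays, start, m=1))
```

Because the loop fixes the sum of all delays, the timing errors of the individual phases add up to zero over one full rotation; only their distribution among the phases shows up as deterministic jitter on the core clock.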
For NoC clocking, high-speed clocks of 4 or 2 GHz are generated
by XOR combination (frequency doubling) of 90° spaced DCO clock
phases or multiplexing, respectively.
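Frequency doubling by XOR of two clock phases spaced 90° apart can be checked with ideal sampled square waves. This is a behavioral sketch only; the sample resolution `SAMPLES_PER_PERIOD` and the helper names are illustrative choices.

```python
SAMPLES_PER_PERIOD = 64  # time resolution of the model, one DCO period


def square_wave(sample, phase_deg=0.0):
    """Ideal 50% duty-cycle clock sampled at integer time steps."""
    position = (sample / SAMPLES_PER_PERIOD - phase_deg / 360.0) % 1.0
    return 1 if position < 0.5 else 0


def rising_edge_spacings(samples):
    """Set of distances (in samples) between consecutive rising edges."""
    edges = [i for i in range(1, len(samples))
             if samples[i - 1] == 0 and samples[i] == 1]
    return {b - a for a, b in zip(edges, edges[1:])}


N_SAMPLES = 8 * SAMPLES_PER_PERIOD
clk_0 = [square_wave(i, 0.0) for i in range(N_SAMPLES)]
clk_90 = [square_wave(i, 90.0) for i in range(N_SAMPLES)]
doubled = [a ^ b for a, b in zip(clk_0, clk_90)]

if __name__ == "__main__":
    print(rising_edge_spacings(clk_0), rising_edge_spacings(doubled))
```

With exactly 90° spacing the XOR output has half the input period and keeps a 50% duty cycle; any deviation from 90° would translate into duty-cycle error on the doubled clock.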
V. MPSOC SYSTEM INTEGRATION
Fig. 6 shows the GALS [6] clocking architecture of a heterogeneous
MPSoC with DVFS capability using the proposed clock generator.
Each processor core has a dedicated clock generator and a performance
level lookup table (LUT) which relates a 2-bit performance level to the
6-bit control word fcntrl. A power management unit (PMU) controls
the performance level, which is a defined voltage and frequency set-
ting, depending on its computational load. The LUT can be written
via JTAG interface. For further efficiency increase, several cores can
share one ADPLL clock generator, e.g., if no high speed NoC clock
is required the DCO can drive two independent open-loop core clock
generators for individual frequency scaling of two MPSoC cores.
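The per-core performance level lookup can be modeled as a four-entry table of 6-bit words. The sketch below is a behavioral illustration: the class name and the example entry are hypothetical, and the bit-level encoding of fcntrl is not specified in the text, so the stored value is an arbitrary placeholder.

```python
class PerformanceLevelLut:
    """Maps a 2-bit performance level to a 6-bit fcntrl word."""

    LEVEL_BITS = 2
    FCNTRL_BITS = 6

    def __init__(self):
        self._entries = [0] * (1 << self.LEVEL_BITS)

    def _check_level(self, level):
        if not 0 <= level < (1 << self.LEVEL_BITS):
            raise ValueError("performance level must fit in 2 bits")

    def write(self, level, fcntrl):
        """Configuration write, e.g. as issued over the JTAG interface."""
        self._check_level(level)
        if not 0 <= fcntrl < (1 << self.FCNTRL_BITS):
            raise ValueError("fcntrl must fit in 6 bits")
        self._entries[level] = fcntrl

    def lookup(self, level):
        """Word handed to the clock generator for the selected level."""
        self._check_level(level)
        return self._entries[level]


if __name__ == "__main__":
    lut = PerformanceLevelLut()
    lut.write(3, 0b101101)   # placeholder word, not a real fcntrl setting
    print(lut.lookup(3))
```

Keeping the table writable lets the same hardware serve cores with different frequency sets, while the PMU only has to drive the 2-bit level.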
VI. RESULTS
The ADPLL clock generator has been implemented in TSMC 65-nm
LP CMOS technology. Fig. 7 shows the partial chip photo of an MPSoC
with eight clock generators and its layout. For measurement one clock
generator output is fed to an LVDS pad (up to 2.4 GHz). All measurements
have been performed with DCO and core supply voltages of 1.2
V unless otherwise specified.
Fig. 8. Measured ADPLL jitter performance. (a) Supply sensitivity of period jitter. (b) Accumulated jitter.
TABLE I. DDR2/DDR3 PERIOD JITTER SPECIFICATION
Fig. 9. Measured clock generator signals. (a) Period jitter 5.4 ps of 2 GHz ADPLL output. (b) Instantaneous core clock period change from 12 to 1.5 ns.
The sensitivity of the ADPLL period jitter to noise on the DCO
supply voltage is shown in Fig. 8(a). A sine wave signal is added to
the DCO supply and the period jitter is measured. Due to the cur-
rent-starved inverter architecture of the DCO the supply jitter sensi-
tivity is low, which is mandatory due to the fact that multiple DCOs
are connected to the supply net in the MPSoC. The clock generator is
characterized for the DDR2/DDR3 memory interface clock specifica-
tion [21], [22]. As an example, Fig. 8(b) shows the measured accumu-
lated jitter compared to the DDR2-667 specification. Table I summarizes
the period jitter compared to the DDR2/DDR3 specification.
Fig. 9(a) shows the period jitter histogram of the 2 GHz ADPLL
signal and Fig. 9(b) shows the measured output signal of the open-loop
core clock generator, when the period is changed from its maximum 12
ns to its minimum 1.5 ns, demonstrating the capability of instantaneous
frequency changes. Fig. 10(a) shows the available output frequencies
at 2 GHz DCO frequency.
As shown in Section IV the open-loop core clock generator is af-
fected by mismatch of the eight DCO clock phases. Fig. 10(b) shows
the measured normalized period jitter of the core clock to-
gether with the number of effective phase switchings from (5), for 21
ADPLLs on 21 dies. Ideally, should be constant, if only DCO
jitter is accumulated, if the open-loop clock generator is noise free and
if no phase mismatch is present. In the real circuit phase mismatch
leads to increased output jitter if for high output frequencies
(low fcntrl). For lower output frequencies the phase mismatch effect
Fig. 10. Measured open-loop clock generator frequency and jitter. (a) Core clock frequencies. (b) Mismatch sensitivity from (4) and measured jitter.
decreases because if DCO periods are summed, the output jitter is
, whereas the phase mismatch due to remains constant.
The open-loop clock generator internal noise adds jitter as well which
results in increased for lower frequencies. In summary, the
phase mismatch effect is present but does not significantly degrade the
output clock quality.
Table II compares the ADPLL performance achieved in this work
to previously published PLLs in sub-100-nm CMOS technologies. As
the period jitter is a critical measure for processor core clock quality it
is considered in this comparison. Because usually the DCO frequency
is much higher than the reference frequency of the PLL, uncorrelated
thermal noise as main contributor to DCO period jitter is not filtered
by the closed PLL loop. So in contrast to [31] we consider only the
DCO noise for benchmarking, assuming that it is mainly caused by
thermal noise and that the DCO is the main contributor to power in a
PLL optimized for low period jitter. So we define the figure of merit
(6)
This work with its low power consumption (2.7 mW at 83
MHz and 2 GHz) achieves a similar compared to pre-
vious work. The energy overhead of a dedicated ADPLL instance per
core is reduced. While achieving a wide output frequency range with
ultra-short frequency change times the main advantage of the proposed
clock generator is its extremely small chip area, which is the smallest
reported for PLL clock generators in sub-100-nm CMOS. This enables
efficient per-core instantiation in GALS MPSoCs with flexible, high
quality clocking demands.
VII. CONCLUSION
A low-power clock generator with ultra small chip area for GALS
MPSoCs has been presented. The mostly digital architecture of the
ADPLL, where only the DCO contains analog custom components,
leads to small die area which scales well within modern technologies.
The DCO features a novel power-down scheme to ensure safe
start-up. Output clocks are generated by open-loop methods which
allow instantaneous changes of frequencies without time-consuming
ADPLL re-lock. This enables fast frequency scaling in fine grained
DVFS architectures. The clock generator provides a wide range of
output frequencies from 83 MHz up to 4 GHz for core clocking, special
purpose DDR2/3 clocks and high-speed NoC clocks. A concept of
MPSoC integration with per-core-instantiation has been shown. The
performance has been validated by testchip measurements in TSMC
65-nm LP CMOS technology. Statistical measurements prove the
robustness with respect to mismatch variations. It has been shown that
the effect of phase mismatch in phase rotating frequency dividers is not
an issue for the targeted application. Supply noise sensitivity tests have
proven the capability of robust operation in MPSoC environments.
Therefore the proposed circuit provides an efficient novel clocking
solution for low-power heterogeneous MPSoCs.
TABLE II. PERFORMANCE COMPARISON OF PLL CLOCK GENERATORS IN SUB-100-nm CMOS
REFERENCES
[1] U. Ramacher, "Software-defined radio prospects for multistandard mobile phones," Computer, vol. 40, no. 10, pp. 62-69, Oct. 2007.
[2] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "Low-power, high-speed transceivers for network-on-chip communication," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 12-21, Jan. 2009.
[3] S. Scholze, H. Eisenreich, S. Höppner, G. Ellguth, S. Henker, M. Ander, S. Hänzsche, J. Partzsch, C. Mayr, and R. Schüffny, "A 32 GBit/s communication SoC for a waferscale neuromorphic system," Integr., VLSI J., vol. 45, no. 1, pp. 61-75, 2012.
[4] T. Limberg, M. Winter, M. Bimberg, R. Klemm, E. Matus, M. Tavares, G. Fettweis, H. Ahlendorf, and P. Robelly, "A fully programmable 40 GOPS SDR single chip baseband for LTE/WiMAX terminals," in Proc. Solid-State Circuits Conf., 2008, pp. 466-469.
[5] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar, "An 80-tile sub-100-W teraflops processor in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 29-41, Jan. 2008.
[6] Z. Yu and B. Baas, "High performance, energy efficiency, and scalability with GALS chip multiprocessors," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 66-79, Jan. 2009.
[7] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, "System level analysis of fast, per-core DVFS using on-chip switching regulators," in Proc. IEEE 14th Int. Symp. High Perform. Comput. Arch. (HPCA), 2008, pp. 123-134.
[8] D. Truong, W. Cheng, T. Mohsenin, Z. Yu, A. Jacobson, G. Landge, M. Meeuwsen, C. Watnik, A. Tran, Z. Xiao, E. Work, J. Webb, P. Mejia, and B. Baas, "A 167-processor computational platform in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1130-1144, Apr. 2009.
[9] R. Jipa, "Dedicated solution for local clock programing in GALS designs," in Proc. Int. Semicond. Conf. (CAS), 2008, pp. 393-396.
[10] A. Sobczyk, A. Luczyk, and W. Pleskacz, "Controllable local clock signal generator for deep submicron GALS architectures," in Proc. 11th IEEE Workshop Design Diagnos. Electron. Circuits Syst., 2008, pp. 1-4.
[11] S. Moore, G. Taylor, P. Cunningham, R. Mullins, and P. Robinson, "Self calibrating clocks for globally asynchronous locally synchronous systems," in Proc. Int. Conf. Comput. Design, 2000, pp. 73-78.
[12] L. Xiu, "A flying-adder on-chip frequency generator for complex SoC environment," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 12, pp. 1067-1071, Dec. 2007.
[13] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 42-51, Jan. 2008.
[14] N. Da Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLLs," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 1, pp. 21-31, Jan. 2005.
[15] H. Eisenreich, C. Mayr, S. Henker, M. Wickert, and R. Schüffny, "A novel ADPLL design using successive approximation frequency control," Microelectron. J., vol. 40, pp. 1613-1622, Nov. 2009.
[16] J. A. McNeill and D. Ricketts, The Designer's Guide to Jitter in Ring Oscillators. New York: Springer, 2009.
[17] B. Floyd, "Sub-integer frequency synthesis using phase-rotating frequency dividers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 7, pp. 1823-1833, Jul. 2008.
[18] K. Shu and E. Sánchez-Sinencio, CMOS PLL Synthesizers: Analysis and Design. New York: Springer, 2005.
[19] S. Höppner, S. Henker, H. Eisenreich, and R. Schüffny, "An open-loop clock generator for fast frequency scaling in 65 nm CMOS technology," in Proc. Mixed Design Integr. Circuits Syst., 2011, pp. 264-269.
[20] R. van de Beek, E. Klumperink, C. Vaucher, and B. Nauta, "Low-jitter clock multiplication: a comparison between PLLs and DLLs," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 8, pp. 555-566, Aug. 2002.
[21] JEDEC Solid State Technology Association, Arlington, VA, JESD79-2F DDR2 SDRAM Specification, 2009.
[22] JEDEC Solid State Technology Association, Arlington, VA, JESD79-3E DDR3 SDRAM Specification, 2010.
[23] W. Yin, R. Inti, A. Elshazly, B. Young, and P. K. Hanumolu, "A 0.7-to-3.5 GHz 0.6-to-2.8 mW highly digital phase-locked loop with bandwidth tracking," IEEE J. Solid-State Circuits, DOI: 10.1109/JSSC.2011.2157259.
[24] P.-H. Hsieh, J. Maxey, and C.-K. Yang, "A phase-selecting digital phase-locked loop with bandwidth tracking in 65-nm CMOS technology," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 781-792, Apr. 2010.
[25] C.-C. Hung and S.-I. Liu, "A leakage-compensated PLL in 65-nm CMOS technology," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 7, pp. 525-529, Jul. 2009.
[26] J.-Y. Chang and S.-I. Liu, "A phase-locked loop with background leakage current compensation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 9, pp. 666-670, Sep. 2010.
[27] J.-Y. Chang and S.-I. Liu, "A 1.5 GHz phase-locked loop with leakage current suppression in 65 nm CMOS," IET Circuits, Devices Syst., vol. 3, no. 6, pp. 350-358, Dec. 2009.
[28] M.-W. Chen, D. Su, and S. Mehta, "A calibration-free 800 MHz fractional-N digital PLL with embedded TDC," IEEE J. Solid-State Circuits, vol. 45, no. 12, pp. 2819-2827, Dec. 2010.
[29] W. Grollitsch, R. Nonis, and N. Da Dalt, "A 1.4 psRMS-period-jitter TDC-less fractional-N digital PLL with digitally controlled ring oscillator in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), 2010, pp. 478-479.
[30] A. Rylyakov, J. Tierno, G. English, M. Sperling, and D. Friedman, "A wide tuning range (1 GHz-to-15 GHz) fractional-N all-digital PLL in 45 nm SOI," in Proc. Custom Integr. Circuits Conf., 2008, pp. 431-434.
[31] X. Gao, E. Klumperink, P. Geraedts, and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 2, pp. 117-121, Feb. 2009.