    566 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 21, NO. 3, MARCH 2013

    Brief Papers

    A Compact Clock Generator for Heterogeneous GALS

    MPSoCs in 65-nm CMOS Technology

    Sebastian Höppner, Holger Eisenreich, Stephan Henker, Dennis Walter, Georg Ellguth, and René Schüffny

    Abstract: This paper presents an all-digital phase-locked loop (ADPLL)

    clock generator for globally asynchronous locally synchronous (GALS)

    multiprocessor systems-on-chip (MPSoCs). With its low power consumption

    of 2.7 mW and ultra-small chip area of 0.0078 mm², it can be

    instantiated per core for fine-grained power management like DVFS. It is

    based on an ADPLL providing a multiphase clock signal from which core

    frequencies from 83 to 666 MHz with 50% duty cycle are generated by

    phase rotation and frequency division. The clock meets the specification

    for DDR2/DDR3 memory interfaces. Additionally, it provides a dedicated

    high-speed clock up to 4 GHz for serial network-on-chip data links.

    Core frequencies can be changed arbitrarily within one clock cycle for

    fast dynamic frequency scaling applications. The performance including

    statistical analysis of mismatch has been verified by a prototype in 65-nm

    CMOS technology.

    Index Terms: All-digital phase-locked loop (ADPLL), digitally controlled

    oscillator (DCO), dynamic voltage and frequency scaling (DVFS),

    globally asynchronous locally synchronous (GALS), multiprocessor sys-

    tems-on-chip (MPSoCs).

    I. INTRODUCTION

    Multiprocessor systems-on-chip (MPSoCs) provide flexible so-

    lutions for baseband processing [1], where a wide range of radio

    standards has to be supported and high energy efficiency is mandatory

    for mobile devices. To meet these requirements, heterogeneous

    MPSoCs include different cores like general purpose RISC processors,

    digital signal processors (DSPs), or hardware accelerators. Addition-

    ally, on-chip data transmission structures like network-on-chip (NoC)

    high-speed links [2] and I/O components like double-data-rate (DDR)

    memory interfaces or field-programmable gate array (FPGA) connections

    [3] are part of complex MPSoCs. They have specialized demands

    for clock frequency and clock quality (e.g., jitter, duty cycle). Com-

    monly, centralized clock generators are used which provide clocks for

    multiple cores or I/O interfaces in heterogeneous MPSoCs [4]. The

    80-core processor [5] uses meso-synchronous clocking with a single PLL

    only, which does not allow individual per core frequency scaling. In

    contrast, globally asynchronous locally synchronous (GALS) clocking

    architectures provide each core with a dedicated clock avoiding global

    high-speed synchronous clock distribution networks [6]. To further enhance energy efficiency, fine-grained power management techniques

    Manuscript received August 16, 2011; revised December 13, 2011; accepted January 30, 2012. Date of publication March 08, 2012; date of current version February 20, 2013. This work was supported by the German Ministry of Education and Research BMBF under grant number 13N10788 (CoolBaseStations).

    The authors are with the Chair of Highly-Parallel VLSI Systems and Neuromorphic Circuits at the Faculty of Electrical Engineering and Information Technology at Technische Universität Dresden, Germany (e-mail: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]).

    Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TVLSI.2012.2187224

    Fig. 1. Clock generator block-level schematic (simplified).

    like dynamic voltage and frequency scaling (DVFS) are employed [7],

    [8]. Per-core DVFS can effectively be applied to GALS MPSoCs due

    to the independent local clocks. Solutions for ultra-fast supply voltage

    changes have been reported [7]. Thus a fast responding clock generator

    is required to realize these fast DVFS schemes which reduce core idle

    times when changing the performance level and increase the system

    throughput. [9] and [10] employ simple ring oscillators for GALS

    cores, which however are not suitable for DVFS. [11] uses locally

    calibrated clock generators based on controlled delay lines, but no fast

    core frequency switching is possible here. [8] uses a configurable ring

    oscillator for fast DVFS which cannot generate low-jitter clocks at

    specified frequencies because it is not locked to a reference clock. [12]

    presents a versatile clock generator based on flying adder (FA) fre-

    quency synthesis, which can generate a wide range of frequencies with

    low jitter, but requires large chip area. Considering the trend towards

    integration of many-core MPSoCs [5], flexible clocking solutions are

    required that fulfill the clock quality demands and support advanced

    fine-grained power management techniques.

    Therefore, this work presents a low chip area, low power all-digital phase-locked loop (ADPLL)-based clock generator for per-core

    instantiation in heterogeneous GALS MPSoCs, providing both a wide

    range of output frequencies and specialized high-speed clocks. It en-

    ables ultra-fast frequency switching for per-core DVFS.

    II. ADPLL SYSTEM

    Fig. 1 shows the block-level schematic of the proposed MPSoC clock

    generator. A digitally controlled oscillator (DCO) generates eight clock

    phases at nominal 2 GHz frequency, from which high-speed NoC fre-

    quencies of 2 and 4 GHz are generated by multiplexing or frequency

    doubling. Core frequencies up to 666 MHz with 50% duty cycle are

    synthesized using an open-loop clock generator. The DCO is locked to

    its nominal frequency of 2 GHz by an ADPLL with a bang-bang

    phase frequency detector (BBPFD). No large area analog filters are

    required and the digital filter implementation benefits from geometry

    scaling in state-of-the-art CMOS technologies. Moreover, no time-to-

    digital converter is required which further reduces area and power.

    The ADPLL is locked one time at start-up and core frequency changes

    are realized solely by open-loop clock generation. Thus ultra-fast fre-

    quency changes are possible and ADPLL lock time is not an issue. The

    DCO has an individual supply voltage to reduce supply-noise induced

    jitter, whereas all other components are connected to core supply. The

    bang-bang phase frequency detector (BBPFD) is adopted from [13].

    The digital filter is a standard implementation as often used in bang-

    bang ADPLLs [14] and the delta-sigma modulator (DSM) realizes frac-

    tional tuning.

    1063-8210/$31.00 © 2012 IEEE

    Fig. 2. BBPFD. (a) Schematic. (b) Waveform.

    When the ADPLL is started, a coarse lock sequence is performed

    which tunes the DCO to compensate process variations. To reduce

    hardware effort, the BBPFD is used as binary frequency detector which

    compares the reference period to the period of the divided DCO

    signal. A timer defines a frame where the coarse tune signal

    is kept constant. The output of the BBPFD at the end of this frame is

    used as the frequency comparison value. As shown in Fig. 2(b), the

    dead zone of the BBPFD causes wrong output pulses. Considering a

    divider output signal which is faster than the reference clock, the ex-

    pected BBPFD output would be zero. If a first rising edge (1.) of DIV

    occurs within the dead zone, it is ignored and the following rising edge

    of REF is considered the first one, causing a false output. Due to the

    beat period of the ignored edge moves through the dead

    zone. The false pulse is corrected when the next rising edge of DIV

    occurs before the rising edge of REF, i.e., after

    cycles. The false output signal pulse width increases if the ADPLL is

    closer to frequency lock condition. To reduce the probability of a false

    frequency comparison, a filter is applied whose output is changed to

    one/zero if consecutive ones/zeros occur at the BBPFD output. Otherwise

    it keeps its previous value. The required number of filter cycles

    reads (e.g., , 1 ns,

    5 ps, ). Therefore less counter hardware is required compared to

    conventional counter-based frequency detectors. The number of cycles

    to be counted for a maximum resulting error of the DCO period of

    is (e.g., 0.5 ns, 5 ps, ).

    Based on this binary frequency detection, linear search or successive

    approximation [15] is performed to determine the coarse tune value.

    After this, fine tune phase lock is achieved by closed loop ADPLL op-

    eration.
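    The coarse lock sequence described above can be illustrated with a short behavioral sketch. It is not the implemented control logic: the detector is idealized (no dead zone), the DCO model `toy_dco_period_ps`, the 10 ns reference period, and the divider value of 20 are hypothetical, and the tuning direction (larger code gives a faster DCO) is an assumption. Only the 6-bit coarse word, the binary frequency decision, and the consecutive-value filter follow the text.

```python
def consecutive_filter(samples, n, initial=0):
    """Output changes to 1 (or 0) only after n consecutive ones (or zeros)."""
    state, run_value, run_length = initial, None, 0
    for s in samples:
        if s == run_value:
            run_length += 1
        else:
            run_value, run_length = s, 1
        if run_length >= n:
            state = run_value
    return state


def div_is_slower(t_ref_ps, t_dco_ps, n_div):
    """Idealized binary frequency detector: 1 if DIV is slower than REF."""
    return 1 if n_div * t_dco_ps > t_ref_ps else 0


def coarse_lock_sar(dco_period_ps, t_ref_ps, n_div, bits=6):
    """Successive approximation over the coarse tune word, MSB first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)
        if div_is_slower(t_ref_ps, dco_period_ps(trial), n_div):
            code = trial  # DCO still too slow with this bit set: keep it
    return code


def toy_dco_period_ps(code):
    """Hypothetical monotonic DCO model: 703 ps at code 0, 8 ps faster per step."""
    return 703 - 8 * code


# Assumed example: 10 ns reference, feedback divider 20, target period 500 ps.
coarse = coarse_lock_sar(toy_dco_period_ps, t_ref_ps=10_000, n_div=20)
print(coarse, toy_dco_period_ps(coarse))
```

    In this toy model the search ends one coarse step above the target period, leaving the residual error to the fine tune loop. A linear search would reach the same code with up to 64 comparisons instead of six.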

    III. DIGITALLY CONTROLLED OSCILLATOR

    A. Architecture

    The DCO is the central component of the ADPLL. It provides eight

    equally spaced clock phases at 2 GHz with a small tuning step size

    of a few ps only. A four-stage differential ring oscillator is a suitable

    topology. Each stage must be tuned equally to preserve phase matching

    over the whole tuning range. Thus a centralized tuning mechanism

    is a good solution for the targeted multi phase clock generation for

    minimized chip area. In this work the ring oscillator is built up using

    current-starved inverter cells [see Fig. 3(b)]. Besides the minimized

    overhead for equal tuning of all stages, the current-starved inverter

    cells provide the advantage of wide tuning range, which is crucial for

    compensation of process variations, and good supply noise immunity

    [16]. Fig. 3(a) shows the schematic of the DCO. It consists of four

    pseudo-differential stages. The cross-coupled inverters ensure a 180-degree

    phase shift between the differential nodes and help to compensate

    delay mismatch variations. The 8-phase output signals are buffered

    using CMOS inverters.

    The DCO tuning bias (tunen/tunep) is generated by a current

    based digital-to-analog converter (DAC). This includes a beta multi-

    plier-based CMOS reference current source followed by switchable

    current mirrors. 6-bit binary weighted current switches are used for

    coarse tuning with low control hardware effort. Fine tune is achieved

    by 32 thermometer coded switches with minimum sized transistors. This causes only small parasitic charge injection on the control signal

    nodes and therefore minimizes reference clock feedthrough.

    Fig. 3. DCO schematic and startup circuit. (a) Four-stage differential ring oscillator. (b) Inverter. (c) Switch.

    Fig. 4. Spice simulated differential small-signal DC gain of one DCO stage versus common mode voltage, 1.2 V.

    The ring oscillator with an even number of differential stages might

    suffer from startup problems because a stable common mode can be

    reached, where the differential gain of the delay stages is too small

    to meet the oscillation criterion. This can be overcome by high drive

    strengths of the cross-coupled inverters that ensure differential signals

    at the corresponding nodes. If they are too strong, the main oscillation loop

    can be slowed down and regenerative switching can occur, leading to

    increased jitter [16]. If they are too weak, startup might fail. Here the

    drive strength of the cross-coupled inverters is chosen as one third of that of the

    main inverters to prevent regenerative switching [16]. The startup issue

    is solved by the power-down scheme described in Section III-B.

    B. Power-Down and Startup

    When a current-starved DCO is disabled (EN=0), the main inverters

    are disabled. Their output nodes would be high-resistive. Therefore,

    the internal node voltages would not be well defined which might lead

    to unwanted static currents in the output inverters and failing startup.

    Fig. 4 shows the simulated differential small-signal DC gain of one

    pseudo-differential DCO stage. If the common mode is below 0.2 V

    or above 0.7 V the differential mode is attenuated and no oscillation

    can ramp up. These issues are overcome by switching the bias volt-

    ages to dedicated levels during power-down, as shown in Fig. 3(c)

    and (a). The switches are built up using CMOS transmission gates.

    The main inverters are completely disabled, whereas in the cross-coupled

    stages either the pull-up current source Msp or the pull-down current

    source Msn is enabled during power down. Therefore the internal nodes are kept in fully settled differential mode (Rst1, Rst0)

    during power down as shown in Fig. 3(a), causing the output inverters

    to switch completely without drawing static current (except leakage)

    when the oscillator is disabled. Start-up can occur safely from true differential

    mode, where the DCO stages have their maximum differential

    gain. During operation (EN=1) all current sources

    Msp/Msn are connected to the tuning voltages tunep/tunen, respec-

    tively.

    IV. OPEN-LOOP CLOCK GENERATION

    Fig. 5(a) shows the open-loop clock generator which generates the

    core clock from the 8-phase DCO signal (nominal period 0.5 ns). A

    phase switching frequency division architecture [17] is used because

    it can generate sub-integer division ratios and thus a larger variety of

    output frequencies. A phase multiplexer selects one out of the eight

    Fig. 5. Open-loop core clock generator. (a) Schematic. (b) Reverse phase switching.

    clock phases with period . This multiplexer output is divided by

    and by the even division ratio in the

    output divider. So the base division ratio reads if no phase

    switching occurs. As illustrated in Fig. 5(b) phases are switched in a

    glitch-free reverse switching scheme [18]. Each time the multiplexer

    switches phase, its output period is shortened. Based on the consider-

    ations in [17] which suggests a switch step of , we

    chose . So the multiplexer output period is reduced by

    or per switching event. The multiplexer clock CM

    is fed to the divide-by-2-or-3 circuit which generates the clock C23

    and enable pulses for the rotator. The division ratio is set by S23. From

    up to phase switchings can occur per C23 cycle. The

    phase multiplexer select signals are generated by a rotator, consisting

    of a 1-hot shift register ring with selectable step size of 1 or 2. The

    multiplexer select signals are directly connected to the rotator flip-flop

    outputs to achieve minimized skew for phase selection. Rotating is enabled

    by clock gating, which keeps the rotator unclocked if no phase

    is switched to reduce power consumption. The rotator acts like or

    adder compared to the FA frequency synthesizer [12]. The clock

    C23 is fed to the synchronous divider with even division ratios

    that ensure 50% duty cycle of the output clock. In summary, the output

    period reads

    (1)

    In total 33 different division ratios in the range from 3 to 24 can be re-

    alized as shown in Fig. 10(a). Only the integer primes 13, 17 and 19 are

    missing. For a DCO period of 0.5 ns the available output frequencies include 100,

    133, 166, 200, 266, 333, 400, 533, and 666 MHz with 50% duty cycle

    for DDR, DDR2 and DDR3 memory interfaces. A dedicated synchro-

    nization block ensures that the fcntrl signal is captured once per output

    cycle and all internal control signals are kept constant within the whole

    output cycle. This ensures that the output frequency can be changed ar-

    bitrarily within a single output clock cycle which enables fast dynamic

    frequency scaling. A detailed description of the open-loop clock generator

    circuit can be found in [19].

    The output clock of the open-loop clock generator exhibits jitter due

    to static phase mismatch of its multi-phase input clocks which are gen-

    erated by the DCO with a chain of delay cells locked to a period of .

    The DCO with four differential stages provides phases, where

    the delays are represented by the rising edge delays and falling edge

    delays of each delay stage. Each stage exhibits a delay which can be

    written as where is the relative

    variation which is assumed to be distributed normally with standard de-

    viation and zero mean value. represents a tuning constant

    which allows tuning by the ADPLL. For phases the total delay is

    locked to the reference , and the resulting tune control value

    reads

    (2)

    Fig. 6. MPSoC GALS clocking architecture with DVFS.

    Fig. 7. Chip photo and clock generator layout.

    The time difference of a switching step over phases is

    (3)

    Assuming that and its mean value is zero, (3) can be approxi-

    mated by its Taylor series [20]. It reads

    (4)

    which relates the standard deviation of the total switch timing error to

    the timing error of a single delay element. The total number of switch-

    ings within one open-loop clock generator output period for a given

    division ratio as presented by (1) is

    resulting in an effective number of switchings

    (5)
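    The mismatch model of this section can be checked numerically with a small Monte Carlo sketch. It assumes eight independent delay elements with normally distributed relative variation, all scaled by one common tuning constant so that their sum equals the locked DCO period. The symbol names, the 2% standard deviation, and the closed-form first-order expression used for comparison are assumptions of this sketch and are not claimed to match (2)-(4) term by term.

```python
import math
import random


def switch_step_error_std(n, n_phases=8, sigma=0.02, t_dco=0.5e-9,
                          trials=10000, seed=1):
    """Monte Carlo standard deviation of the timing error of a step over n phases."""
    rng = random.Random(seed)
    ideal = t_dco * n / n_phases
    errors = []
    for _ in range(trials):
        delays = [1.0 + rng.gauss(0.0, sigma) for _ in range(n_phases)]
        # The loop scales every element by the same tuning constant,
        # so the sum over all phases is locked to the DCO period.
        step = t_dco * sum(delays[:n]) / sum(delays)
        errors.append(step - ideal)
    mean = sum(errors) / trials
    return math.sqrt(sum((e - mean) ** 2 for e in errors) / (trials - 1))


def first_order_std(n, n_phases=8, sigma=0.02, t_dco=0.5e-9):
    """First-order (Taylor) approximation of the same standard deviation."""
    return t_dco * sigma * math.sqrt(n * (n_phases - n) / n_phases ** 3)


for n in (1, 2, 4):
    print(n, switch_step_error_std(n), first_order_std(n))
```

    Because the loop forces the total delay to the locked period, the error is zero for a step over all eight phases and largest for a step over half of them, which is why summing whole DCO periods does not accumulate mismatch.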

    For NoC clocking, high-speed clocks of 4 or 2 GHz are generated

    by XOR combination (frequency doubling) of 90°-spaced DCO clock

    phases or multiplexing, respectively.
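    The XOR-based frequency doubling can be illustrated with ideal square waves. The sketch below is a purely behavioral model at an assumed 1 ps time resolution, with no jitter, mismatch, or gate delay.

```python
def square_wave(t_ps, period_ps, delay_ps=0):
    """Ideal 50% duty-cycle clock, high during the first half of each period."""
    return 1 if (t_ps - delay_ps) % period_ps < period_ps // 2 else 0


def rising_edge_times(samples):
    """Sample indices at which the waveform goes from 0 to 1."""
    return [i + 1 for i, (a, b) in enumerate(zip(samples, samples[1:]))
            if (a, b) == (0, 1)]


T_DCO_PS = 500                    # nominal 2 GHz DCO period
times = range(10 * T_DCO_PS)      # ten DCO periods at 1 ps resolution

phase_0 = [square_wave(t, T_DCO_PS) for t in times]
phase_90 = [square_wave(t, T_DCO_PS, T_DCO_PS // 4) for t in times]
doubled = [a ^ b for a, b in zip(phase_0, phase_90)]

edges_in = rising_edge_times(phase_0)
edges_out = rising_edge_times(doubled)
print(edges_in[1] - edges_in[0], edges_out[1] - edges_out[0])
print(sum(doubled) / len(doubled))
```

    The XOR output has half the input period, and its duty cycle stays at 50% only when the two phases are exactly a quarter period apart. Phase mismatch in the DCO therefore appears as duty-cycle distortion of the doubled clock.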

    V. MPSOC SYSTEM INTEGRATION

    Fig. 6 shows the GALS [6] clocking architecture of a heterogeneous

    MPSoC with DVFS capability using the proposed clock generator.

    Each processor core has a dedicated clock generator and a performance

    level lookup table (LUT) which relates a 2-bit performance level to the

    6-bit control word fcntrl. A power management unit (PMU) controls

    the performance level, which is a defined voltage and frequency set-

    ting, depending on its computational load. The LUT can be written

    via JTAG interface. For further efficiency increase, several cores can

    share one ADPLL clock generator, e.g., if no high speed NoC clock

    is required the DCO can drive two independent open-loop core clock

    generators for individual frequency scaling of two MPSoC cores.
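    The per-core lookup table can be modeled in a few lines. The class name, the method names, and the fcntrl codes below are hypothetical placeholders; only the 2-bit performance level and the 6-bit fcntrl width are taken from the text.

```python
class PerformanceLevelLUT:
    """Per-core table from a 2-bit performance level to a 6-bit fcntrl word."""

    LEVEL_BITS = 2
    FCNTRL_BITS = 6

    def __init__(self, entries):
        if len(entries) != 1 << self.LEVEL_BITS:
            raise ValueError("one entry per performance level is required")
        self._table = [0] * (1 << self.LEVEL_BITS)
        for level, fcntrl in enumerate(entries):
            self.write(level, fcntrl)

    def write(self, level, fcntrl):
        """Model of a table write, e.g., through the JTAG interface."""
        if not 0 <= level < (1 << self.LEVEL_BITS):
            raise ValueError("performance level out of range")
        if not 0 <= fcntrl < (1 << self.FCNTRL_BITS):
            raise ValueError("fcntrl does not fit in 6 bits")
        self._table[level] = fcntrl

    def lookup(self, level):
        """fcntrl word that the PMU-selected level applies to the clock generator."""
        return self._table[level]


# Placeholder codes only; the real fcntrl encoding is not given here.
lut = PerformanceLevelLUT([63, 40, 12, 0])
print(lut.lookup(2))
```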

    VI. RESULTS

    The ADPLL clock generator has been implemented in TSMC 65-nm

    LP CMOS technology. Fig. 7 shows the partial chip photo of an MPSoC

    with eight clock generators and its layout. For measurement one clock

    Fig. 8. Measured ADPLL jitter performance. (a) Supply sensitivity of period jitter. (b) Accumulated jitter.

    TABLE I. DDR2/DDR3 PERIOD JITTER SPECIFICATION

    Fig. 9. Measured clock generator signals. (a) Period jitter 5.4 ps of 2 GHz ADPLL output. (b) Instantaneous core clock period change from 12 to 1.5 ns.

    generator output is fed to an LVDS pad (up to 2.4 GHz). All measure-

    ments have been performed with DCO and core supply voltages of 1.2

    V unless otherwise specified.

    The sensitivity of the ADPLL period jitter to noise on the DCO

    supply voltage is shown in Fig. 8(a). A sine wave signal is added to

    the DCO supply and the period jitter is measured. Due to the current-starved

    inverter architecture of the DCO the supply jitter sensitivity

    is low, which is mandatory because multiple DCOs

    are connected to the supply net in the MPSoC. The clock generator is

    characterized for the DDR2/DDR3 memory interface clock specification

    [21], [22]. As an example, Fig. 8(b) shows the measured accumulated

    jitter compared to the DDR2-667 specification. Table I summarizes

    the period jitter compared to the DDR2/DDR3 specification.

    Fig. 9(a) shows the period jitter histogram of the 2 GHz ADPLL

    signal and Fig. 9(b) shows the measured output signal of the open-loop

    core clock generator, when the period is changed from its maximum 12

    ns to its minimum 1.5 ns, demonstrating the capability of instantaneous

    frequency changes. Fig. 10(a) shows the available output frequencies

    at 2 GHz DCO frequency.
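    The set of available division ratios can be reproduced with a short enumeration. The parameter sets used here are partly assumptions: a divide-by-2-or-3 stage, even output ratios of 2, 4, 6, and 8, a rotator step of one or two of the eight phases, and between zero and two (or three) phase switchings per C23 cycle. They were chosen because they are consistent with the architecture of Section IV and reproduce the reported count of 33 ratios between 3 and 24.

```python
from fractions import Fraction

N_PHASES = 8          # DCO output phases
F_DCO_MHZ = 2000      # nominal DCO frequency


def division_ratios():
    """All distinct overall division ratios under the assumed parameter sets."""
    ratios = set()
    for n23 in (2, 3):                        # divide-by-2-or-3 stage
        for step in (1, 2):                   # rotator step size in phases
            for switches in range(n23 + 1):   # phase switchings per C23 cycle
                # Each switching shortens one multiplexer period by step/8.
                per_c23 = n23 - Fraction(switches * step, N_PHASES)
                for n_out in (2, 4, 6, 8):    # assumed even output ratios
                    ratios.add(per_c23 * n_out)
    return sorted(ratios)


ratios = division_ratios()
freqs = [int(F_DCO_MHZ / r) for r in ratios]   # truncated to whole MHz
missing = [n for n in range(3, 25) if Fraction(n) not in ratios]

print(len(ratios), float(ratios[0]), float(ratios[-1]))
print(missing)
print(sorted(set(freqs)))
```

    Under these assumptions the integer ratios 13, 17, and 19 are absent, and the resulting frequency list contains the DDR-related clocks named in Section IV as well as the 83 MHz minimum.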

    As shown in Section IV the open-loop core clock generator is af-

    fected by mismatch of the eight DCO clock phases. Fig. 10(b) shows

    the measured normalized period jitter of the core clock to-

    gether with the number of effective phase switchings from (5), for 21

    ADPLLs on 21 dies. Ideally, should be constant, if only DCO

    jitter is accumulated, if the open-loop clock generator is noise free and

    if no phase mismatch is present. In the real circuit phase mismatch

    leads to increased output jitter if for high output frequencies

    (low fcntrl). For lower output frequencies the phase mismatch effect

    Fig. 10. Measured open-loop clock generator frequency and jitter. (a) Core clock frequencies. (b) Mismatch sensitivity from (4) and measured jitter.

    decreases because if DCO periods are summed, the output jitter is

    , whereas the phase mismatch due to remains constant.

    The open-loop clock generator internal noise adds jitter as well which

    results in increased for lower frequencies. In summary, the

    phase mismatch effect is present but does not significantly degrade the

    output clock quality.

    Table II compares the ADPLL performance achieved in this work

    to previously published PLLs in sub-100-nm CMOS technologies. As

    the period jitter is a critical measure for processor core clock quality it

    is considered in this comparison. Because usually the DCO frequency

    is much higher than the reference frequency of the PLL, uncorrelated

    thermal noise as main contributor to DCO period jitter is not filtered

    by the closed PLL loop. So in contrast to [31] we consider only the

    DCO noise for benchmarking, assuming that it is mainly caused by

    thermal noise and that the DCO is the main contributor to power in a

    PLL optimized for low period jitter. So we define the figure of merit

    (6)

    This work with its low power consumption (2.7 mW at 83

    MHz and 2 GHz) achieves a similar figure of merit compared to previous

    work. The energy overhead of a dedicated ADPLL instance per

    core is reduced. While achieving a wide output frequency range with

    ultra-short frequency change times the main advantage of the proposed

    clock generator is its extremely small chip area, which is the smallest

    reported for PLL clock generators in sub-100-nm CMOS. This enables

    efficient per-core instantiation in GALS MPSoCs with flexible, high

    quality clocking demands.

    VII. CONCLUSION

    A low-power clock generator with ultra small chip area for GALS

    MPSoCs has been presented. The mostly digital architecture of the

    ADPLL, where only the DCO contains analog custom components,

    leads to small die area which scales well within modern technologies.

    The DCO features a novel power-down scheme to ensure safe

    start-up. Output clocks are generated by open-loop methods which

    allow instantaneous changes of frequencies without time-consuming

    ADPLL re-lock. This enables fast frequency scaling in fine grained

    DVFS architectures. The clock generator provides a wide range of

    output frequencies from 83 MHz up to 4 GHz for core clocking, special

    purpose DDR2/3 clocks and high-speed NoC clocks. A concept of

    MPSoC integration with per-core-instantiation has been shown. The

    performance has been validated by testchip measurements in TSMC

    65-nm LP CMOS technology. Statistical measurements prove the

    robustness with respect to mismatch variations. It has been shown that

    TABLE II. PERFORMANCE COMPARISON OF PLL CLOCK GENERATORS IN SUB-100-nm CMOS

    the effect of phase mismatch in phase rotating frequency dividers is not

    an issue for the targeted application. Supply noise sensitivity tests have

    proven the capability of robust operation in MPSoC environments.

    Therefore the proposed circuit provides an efficient novel clocking

    solution for low-power heterogeneous MPSoCs.

    REFERENCES

    [1] U. Ramacher, "Software-defined radio prospects for multistandard mobile phones," Computer, vol. 40, no. 10, pp. 62–69, Oct. 2007.

    [2] D. Schinkel, E. Mensink, E. Klumperink, E. van Tuijl, and B. Nauta, "Low-power, high-speed transceivers for network-on-chip communication," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 12–21, Jan. 2009.

    [3] S. Scholze, H. Eisenreich, S. Höppner, G. Ellguth, S. Henker, M. Ander, S. Hänzsche, J. Partzsch, C. Mayr, and R. Schüffny, "A 32 GBit/s communication SoC for a waferscale neuromorphic system," Integr., VLSI J., vol. 45, no. 1, pp. 61–75, 2012.

    [4] T. Limberg, M. Winter, M. Bimberg, R. Klemm, E. Matus, M. Tavares, G. Fettweis, H. Ahlendorf, and P. Robelly, "A fully programmable 40 GOPS SDR single chip baseband for LTE/WiMAX terminals," in Proc. Solid-State Circuits Conf., 2008, pp. 466–469.

    [5] S. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar, "An 80-tile sub-100-W teraflops processor in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 29–41, Jan. 2008.

    [6] Z. Yu and B. Baas, "High performance, energy efficiency, and scalability with GALS chip multiprocessors," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 1, pp. 66–79, Jan. 2009.

    [7] W. Kim, M. Gupta, G.-Y. Wei, and D. Brooks, "System level analysis of fast, per-core DVFS using on-chip switching regulators," in Proc. IEEE 14th Int. Symp. High Perform. Comput. Arch. (HPCA), 2008, pp. 123–134.

    [8] D. Truong, W. Cheng, T. Mohsenin, Z. Yu, A. Jacobson, G. Landge, M. Meeuwsen, C. Watnik, A. Tran, Z. Xiao, E. Work, J. Webb, P. Mejia, and B. Baas, "A 167-processor computational platform in 65 nm CMOS," IEEE J. Solid-State Circuits, vol. 44, no. 4, pp. 1130–1144, Apr. 2009.

    [9] R. Jipa, "Dedicated solution for local clock programing in GALS designs," in Proc. Int. Semicond. Conf. (CAS), 2008, pp. 393–396.

    [10] A. Sobczyk, A. Luczyk, and W. Pleskacz, "Controllable local clock signal generator for deep submicron GALS architectures," in Proc. 11th IEEE Workshop Design Diagnos. Electron. Circuits Syst., 2008, pp. 1–4.

    [11] S. Moore, G. Taylor, P. Cunningham, R. Mullins, and P. Robinson, "Self calibrating clocks for globally asynchronous locally synchronous systems," in Proc. Int. Conf. Comput. Design, 2000, pp. 73–78.

    [12] L. Xiu, "A flying-adder on-chip frequency generator for complex SoC environment," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 54, no. 12, pp. 1067–1071, Dec. 2007.

    [13] J. A. Tierno, A. V. Rylyakov, and D. J. Friedman, "A wide power supply range, wide tuning range, all static CMOS all digital PLL in 65 nm SOI," IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 42–51, Jan. 2008.

    [14] N. Da Dalt, "A design-oriented study of the nonlinear dynamics of digital bang-bang PLLs," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 1, pp. 21–31, Jan. 2005.

    [15] H. Eisenreich, C. Mayr, S. Henker, M. Wickert, and R. Schüffny, "A novel ADPLL design using successive approximation frequency control," Microelectron. J., vol. 40, pp. 1613–1622, Nov. 2009.

    [16] J. A. McNeill and D. Ricketts, The Designer's Guide to Jitter in Ring Oscillators. New York: Springer, 2009.

    [17] B. Floyd, "Sub-integer frequency synthesis using phase-rotating frequency dividers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 7, pp. 1823–1833, Jul. 2008.

    [18] K. Shu and E. Sánchez-Sinencio, CMOS PLL Synthesizers: Analysis and Design. New York: Springer, 2005.

    [19] S. Höppner, S. Henker, H. Eisenreich, and R. Schüffny, "An open-loop clock generator for fast frequency scaling in 65 nm CMOS technology," in Proc. Mixed Design Integr. Circuits Syst., 2011, pp. 264–269.

    [20] R. van de Beek, E. Klumperink, C. Vaucher, and B. Nauta, "Low-jitter clock multiplication: a comparison between PLLs and DLLs," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 49, no. 8, pp. 555–566, Aug. 2002.

    [21] JEDEC Solid State Technology Association, Arlington, VA, JESD79-2F DDR2 SDRAM Specification, 2009.

    [22] JEDEC Solid State Technology Association, Arlington, VA, JESD79-3E DDR3 SDRAM Specification, 2010.

    [23] W. Yin, R. Inti, A. Elshazly, B. Young, and P. K. Hanumolu, "A 0.7-to-3.5 GHz 0.6-to-2.8 mW highly digital phase-locked loop with bandwidth tracking," IEEE J. Solid-State Circuits, DOI: 10.1109/JSSC.2011.2157259.

    [24] P.-H. Hsieh, J. Maxey, and C.-K. Yang, "A phase-selecting digital phase-locked loop with bandwidth tracking in 65-nm CMOS technology," IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 781–792, Apr. 2010.

    [25] C.-C. Hung and S.-I. Liu, "A leakage-compensated PLL in 65-nm CMOS technology," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 7, pp. 525–529, Jul. 2009.

    [26] J.-Y. Chang and S.-I. Liu, "A phase-locked loop with background leakage current compensation," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 9, pp. 666–670, Sep. 2010.

    [27] J.-Y. Chang and S.-I. Liu, "A 1.5 GHz phase-locked loop with leakage current suppression in 65 nm CMOS," IET Circuits, Devices Syst., vol. 3, no. 6, pp. 350–358, Dec. 2009.

    [28] M.-W. Chen, D. Su, and S. Mehta, "A calibration-free 800 MHz fractional-N digital PLL with embedded TDC," IEEE J. Solid-State Circuits, vol. 45, no. 12, pp. 2819–2827, Dec. 2010.

    [29] W. Grollitsch, R. Nonis, and N. Da Dalt, "A 1.4 psRMS-period-jitter TDC-less fractional-N digital PLL with digitally controlled ring oscillator in 65 nm CMOS," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers (ISSCC), 2010, pp. 478–479.

    [30] A. Rylyakov, J. Tierno, G. English, M. Sperling, and D. Friedman, "A wide tuning range (1 GHz-to-15 GHz) fractional-N all-digital PLL in 45 nm SOI," in Proc. Custom Integr. Circuits Conf., 2008, pp. 431–434.

    [31] X. Gao, E. Klumperink, P. Geraedts, and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 56, no. 2, pp. 117–121, Feb. 2009.