CALIBRATION ADC AND ALGORITHM FOR ADAPTIVE
PREDISTORTION OF HIGH-SPEED DACS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL
ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Alireza Dastgheib
March 2013
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/bk656yt4469
© 2013 by Alireza Dastgheib. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Boris Murmann, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Ada Poon
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Bruce Wooley
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
In this thesis, the design and implementation of circuits and signal processing algorithms required for digital predistortion of a digital-to-analog converter (DAC) with an open-loop driver are presented. On the circuit design side, the implementation of
a high precision, high acquisition-bandwidth calibration analog-to-digital converter
(ADC) for sampling the DAC output is discussed. The ADC has a cyclic core and
a variant of the switched-RC sampling network suitable for high-frequency operation. It is implemented in 90-nm CMOS and achieves an SFDR higher than 72 dB for input frequencies under 500 MHz. On the signal processing side, a calibration
algorithm is presented that uses the input/output data of the DAC to identify its
nonlinearity and cancel it through digital predistortion. The algorithm represents a
novel technique for the linearization of Hammerstein systems with low computational
cost and is suitable for hardware realization. Overall, this calibration architecture
improves the SFDR of the DAC by close to 30 dB to achieve a final linearity of 53
dB for input frequencies up to 400 MHz and peak-to-peak differential output swing
of 800 mV.
Acknowledgements
Much like any journey in life, my PhD was not all about the destination (the thesis in this case), but rather mostly about the process that brought me here. I would like
to thank all the people who accompanied me in this journey and made me a better
person.
Firstly, I thank my advisor, Prof. Boris Murmann, who in addition to being a
research all-star, was a great instructor and a supportive mentor. I also thank Prof.
Bruce Wooley and Prof. Ada Poon, for being on my orals committee as well as my
dissertation reading committee. I must also thank Prof. Norbert Pelc for chairing
my orals committee. I am deeply thankful to Ann Guerra for her extraordinary help
with administrative issues and lots of other random errands. I thank my research
partner, Clay Daigle, for the many things I learned from him.
Being in Silicon Valley, I had the opportunity to get help from several industry
members. I would like to thank Ken Poulton (Agilent Technologies), Keith Ring
(Intersil) and Sim Narasimha for their help and brainstorming during various phases
of the research. I thank Phil Golden (Intersil) and Evan Chang (Hamamatsu) for
their sincere help with debugging my chips. Henrique Miranda went out of his way
in spite of a busy schedule and helped me fix a problem with my evaluation board.
I appreciate the help and valuable feedback from my group mates as well as other
students in the Allen building throughout this research. I thank the current members
of the group: Alex, Bill, Doug, Jon, Mahmoud, Man-Chia, Martin, Noam, Ross,
Ryan, Siddharth, Vaibhav; as well as the alumni: Donghyun, Drew, Echere, Jason,
Justin, Manar, Parastoo, Pedram, Wei. I also thank Mohammad, Maryam, Roozbeh
and Rostam for their friendship and help in and out of work.
I was fortunate to have the wonderful company of many friends during my years
at Stanford. Their help and support was invaluable all along the way. Among them
are some of the most amazing and intelligent people I have ever known in my life. I
am deeply grateful to every one of them. Finally, I would like to thank my family for
their love and support. I would not have made it without them.
Contents
Abstract
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Related Applications
    1.2.1 Pipeline ADCs
    1.2.2 Digital Predistortion of Power Amplifiers
    1.2.3 Loudspeakers
  1.3 Thesis Outline
2 System Considerations
  2.1 DAC Requirements
  2.2 ADC Requirements
    2.2.1 Sampling Rate
    2.2.2 Acquisition Bandwidth
    2.2.3 Boxcar Sampling
    2.2.4 Noise Requirement
  2.3 Summary
3 Calibration Algorithm
  3.1 System Modelling
    3.1.1 Volterra Model
    3.1.2 Static Nonlinearity and Nonlinear Filtering
  3.2 Hammerstein System Calibration
    3.2.1 Parameter Estimation
    3.2.2 Adaptive Predistortion
  3.3 Optimization of the Predistorter
  3.4 Calibration Speed
  3.5 Summary
4 Prototype Design
  4.1 ADC Architecture
    4.1.1 High Bandwidth Sampling
    4.1.2 ADC Core
    4.1.3 Foreground Calibration
    4.1.4 OTA Structure
  4.2 Noise Analysis
    4.2.1 Review of Amplifier Noise Analysis
    4.2.2 Notes on Sampling Noise
    4.2.3 Track-and-hold Noise Analysis
    4.2.4 ADC Noise Analysis
  4.3 Summary
5 Experimental Results
  5.1 Test Setup
  5.2 Measurement Results
    5.2.1 ADC Measurements
    5.2.2 DAC Measurements
  5.3 Performance Summary
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
A Calculation of Least Squares Residual Error
B Predistortion Error Due to Model Inaccuracy
Bibliography
List of Figures
1.1 Basic DAC architectures.
1.2 The research's big picture. This research focuses on the design of the calibration ADC as well as the predistortion algorithm.
1.3 Structure of a pipeline ADC.
1.4 Digitally predistorted PA.
1.5 General setup of an AEC.
2.1 Two examples where a DAC's I/O curve is not reversible. (a) Input codes are not tight enough. (b) I/O curve discontinuity produces dead-zone.
2.2 The calibration algorithm modifies the DAC's mapping between input digital codes and its codomain of output pulses.
2.3 Discrete-time approximation of a continuous-time Volterra system.
2.4 Parameter estimation of a continuous-time Volterra system. The unknown signal is the slow-changing set of parameters θ(t) and not the input code x[n].
2.5 Two examples where pulse energy does not translate to signal linearity. (a) Output signal is nonlinear but pulse energy is linear. (b) Output signal is linear in bandwidth of interest but pulse energy is nonlinear.
2.6 System block diagram and mapping between continuous-time and discrete-time. (a) Accurate mapping of a continuous-time signal into a discrete sample cannot be implemented easily. A prefilter in front of the ADC as shown in (b) can facilitate this.
2.7 Two pre-filter options for the calibration ADC: A filter with a long time response (a) has a narrow frequency bandwidth (b). However, a boxcar sampler with a short time window (c) will pass more frequencies (d).
3.1 DAC with static nonlinearity and nonlinear output filter.
3.2 Calibration of a DAC with static nonlinearity and nonlinear output filter.
3.3 Hammerstein model of a system with input x and output y.
3.4 Calibration through system identification.
3.5 Calibration through adaptive predistortion.
3.6 Recursive least squares using Givens rotations.
3.7 Systolic array implementation of the RLS error calculation algorithm.
3.8 Search for the next simplex in the Nelder-Mead simplex algorithm.
3.9 An example of the Nelder-Mead simplex algorithm.
3.10 Matlab simulation of a calibration run.
3.11 The adaptive calibration consists of three nested iterative algorithms, with simple CORDIC iterations at the lowest level.
4.1 A traditional bottom-plate track-and-hold circuit.
4.2 Track-and-hold circuit with RC sampling.
4.3 (a) Cascaded RC sampling stages to decrease feedthrough. (b) Nonlinear capacitances due to large switches and large resistors degrade tracking linearity.
4.4 Modified RC sampling based on Wheatstone bridge operation.
4.5 Comparison of three switch configurations. (a) Switch in series with a resistor, (b) RC sampler, (c) Wheatstone sampler.
4.6 Signal feedthrough in an RC and a Wheatstone sampler as well as switch parasitic capacitance are shown as a function of the switch width. The length of transistors in the switch is set to 300 nm, the total signal path resistance R = 10 kΩ, and Cs = 500 fF. The resistance r in the Wheatstone sampler is chosen to minimize the signal feedthrough. It can be seen that the signal feedthrough in the Wheatstone sampler is about 3 orders of magnitude smaller than that of the RC sampler.
4.7 RC switch implementation. Two Wheatstone samplers are cascaded to further reduce the signal feedthrough (a); all of the switches are implemented using anti-parallel thick-oxide PMOS devices (b).
4.8 Basic two-phase operation of the ADC.
4.9 Charge injection from the bottom-plate switch.
4.10 More detailed diagram of the ADC MDAC. (a) Finite state machine representing phases of operation. (b) ADC clocking signals. (c) Single-ended MDAC.
4.11 Phase 1 operation duplicates the residue from C1 in (a) onto C3 and C4 in (b). In phase 2, C3 and C4 charges add up (c) and construct a new residue on C1 (d). In the transition between phases 1 and 2, the bottom-plate switch is opened first (e). If the top-plate switches are not opened right after that, switching glitches can get transferred to the bottom-plate and cause charge leakage (f).
4.12 Top-level and booster amplifier schematics.
4.13 Small-signal model of the OTA with varying details.
4.14 Typical pole-zero plots of amplifiers. (a) Miller-compensated amplifier. (b) Cascode-compensated amplifier.
4.15 T/H transient noise.
4.16 Derivation of kT/C result. (a) Frequency domain analysis of a continuous RC branch. (b) Time domain analysis of a switched RC branch.
5.1 Die photo of the prototype chip and the ADC layout. The chip also includes two other DAC-ADC pairs (not outlined in the photo) for debugging options.
5.2 Chip I/O interface. The Wheatstone samplers are simplified in this schematic.
5.3 ADC measured static nonlinearity.
5.4 Measured SFDR/SNDR and magnitude of the fundamental tone of the ADC output versus input frequency.
5.5 Measured frequency responses of the ADC. The input amplitude is 1 Vppd. The ADC sample rate is about 1 MHz.
5.6 Test setup for DAC calibration.
5.7 Measured output spectra of the DAC before and after calibration.
5.8 Measured output spectrum of the DAC before and after calibration. Input signal is an 800 mVppd signal containing two tones at 370 and 376 MHz.
5.9 Measured SFDR of the DAC before and after calibration versus input frequency.
5.10 SFDR versus frequency of some published DACs in comparison with a data point from the prototype chip.
B.1 Alternative model of the DAC including a filter before the driver.
Chapter 1
Introduction
CMOS technology scaling has fueled the continuous growth of throughput in com-
munication systems by enabling increasingly sophisticated digital signal processing
(DSP) techniques. The requirements of high-speed communication systems also propagate to the circuitry further up the signal chain, such as the analog-to-digital and
digital-to-analog converters (ADCs and DACs). Traditional architectures of data
converters do not directly benefit from scaling, so there has been a surge of new architectures (mostly in ADCs) that get a boost from digital gates through calibration.
Recently, a new DAC was introduced that relies on digital calibration for achieving
high performance. The DAC uses an open-loop output driver for delivering power to
off-chip loads. While saving power, this driver is weakly nonlinear. So the DAC relies
on a calibration algorithm to identify its nonlinearity and construct a predistorter to
linearize the overall system. This thesis will focus on the design and implementation
of circuits and algorithms for the calibration of the DAC.
In the rest of this chapter, we briefly review the motivation for the new DAC
architecture. Next, in order to put the problem in perspective, we review some
applications that deal with similar challenges and/or have inspired our solution to
the problem. We discuss how each of these scenarios is similar to the problem at
hand, and in what ways they are unique.
Figure 1.1: Basic DAC architectures. (a) Simplified diagram of a current steering (CS) DAC. (b) Simplified diagram of a switched-capacitor (SC) DAC.
1.1 Motivation
Overview of DAC Architectures
D/A conversion is typically accomplished through the division or multiplication of a
reference quantity with a digitally controlled switched element network. The reference
can be any of the basic analog quantities, i.e., voltage, current or charge. The latter
two quantities can be more easily controlled in analog circuits and so the majority of
DAC designs are either current or charge based.
A B-bit current based DAC consists of 2B − 1 matching unit current sources, as
shown in Figure 1.1(a). Depending on the input code, the currents of a number of
these sources are steered to a common branch and combined to provide a correspond-
ing analog output. Hence, this architecture is named current steering (CS) DAC. In a
CS DAC, the accuracy of the output amplitude is determined by the current matching
between the cells. This quickly becomes a critical problem at high resolution because
there must be as many unit current sources as the output levels. Also, since the
unit currents are directly steered to the output, the dynamic accuracy of the DAC is
determined by the timing matching between the cells. Therefore, any amplitude or
timing error in the unit cells appears directly at the output. These considerations are
the most important challenges in the design of high performance CS DACs.
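To make the matching argument concrete, the following sketch models a thermometer-coded CS DAC with randomly mismatched unit currents and computes its integral nonlinearity (INL). The 8-bit resolution, the 1% Gaussian mismatch, and the endpoint-fit INL metric are illustrative assumptions, not parameters of any particular design.

```python
import numpy as np

rng = np.random.default_rng(0)

B = 8                               # resolution in bits (illustrative)
sigma = 0.01                        # assumed 1% relative unit-current mismatch
units = 1.0 + sigma * rng.standard_normal(2**B - 1)   # 2^B - 1 unit sources

def cs_dac(code):
    """Thermometer coding: steer the first `code` unit currents to the output."""
    return units[:code].sum()

codes = np.arange(2**B)
out = np.array([cs_dac(c) for c in codes])

# INL: deviation from the endpoint-fit line, expressed in LSBs
lsb = (out[-1] - out[0]) / (2**B - 1)
inl = (out - (out[0] + lsb * codes)) / lsb
print(f"peak INL = {np.abs(inl).max():.3f} LSB")
```

Increasing `sigma` or `B` in the sketch shows how the amplitude-matching requirement tightens with resolution.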
Charge based DACs, as shown in Figure 1.1(b), operate through manipulation of
capacitor charges in a switched-capacitor (SC) network, and hence they are known
as SC DACs. The SC core produces a charge output, and so for off-chip driving
capability a driver must be included at the output [1]. The driver samples the output
of the SC core when the conversion is complete and transforms it to an output voltage.
Therefore, amplitude and timing accuracy are separated in this architecture [2], in
contrast to a CS DAC. The fundamental problem with this architecture, however, is
that the output driver must provide linearity at the speed of operation. Therefore,
this architecture has been considered inferior at high operating speeds.
Proposed DAC Architecture
The newly proposed DAC is based on an SC architecture. However, instead of a
traditional closed-loop output driver, it utilizes an open-loop output driver. Figure 1.2
depicts the “big picture” of the research and shows where the newly proposed DAC
fits in. As implied in the picture, the DAC has a weak nonlinearity as a result of
open-loop amplification. This costs accuracy but enables higher speed. A digital look-up table (LUT) is placed in front of the DAC
to cancel out its nonlinearity. However, since the nonlinearity of the DAC cannot be
known a priori and is also subject to drift, a digital calibration loop must constantly
adjust the LUT. The input data is readily available in the digital domain and is
therefore accessible for calibration. Also, the calibration algorithm needs to be able
to look at the DAC output in order to assess its nonlinearity. Therefore, an ADC is
Figure 1.2: The research’s big picture. This research focuses on the design of the
calibration ADC as well as the predistortion algorithm.
needed to provide that data. We will explain why this ADC is labeled “slow” later,
as well as why the addition of all these extra components will not defeat the purpose
of making a higher performance DAC.
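As a rough sketch of this LUT-based predistortion idea, the code below pairs a hypothetical cubic driver nonlinearity with a brute-force table search over a finer internal code grid with some headroom; in hardware the "measured" outputs would come from the calibration ADC rather than from a model, and the search would be replaced by the adaptive algorithm of Chapter 3.

```python
import numpy as np

# Hypothetical weakly compressive driver nonlinearity (illustrative only)
def dac(v):
    return v - 0.1 * v**3

N = 256                                     # 8-bit external codes
v_ideal = np.arange(N) / (N - 1) * 2 - 1    # desired outputs in [-1, 1]

# A finer internal grid with extra headroom stands in for the DAC's
# redundant internal codes.
grid = np.linspace(-1.2, 1.2, 4 * N)
dac_out = dac(grid)

# LUT: for each input code, pick the internal code whose output (here the
# model; in hardware, the calibration ADC's measurement) is closest to ideal.
lut = np.array([grid[np.argmin(np.abs(dac_out - v))] for v in v_ideal])

err_before = np.abs(dac(v_ideal) - v_ideal).max()   # uncorrected error
err_after = np.abs(dac(lut) - v_ideal).max()        # LUT-predistorted error
print(f"max error: {err_before:.4f} -> {err_after:.4f}")
```

Note that without the headroom in `grid`, the full-scale ideal levels could not be reached at all, which previews the redundancy requirement discussed in Chapter 2.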
The first goal of this thesis is the design of an appropriate ADC to sense the
DAC output for use in calibration. The required ADC must maintain high linearity
across the DAC bandwidth while imposing small power and area overhead. The
other aspect of the research is the design of a practical calibration algorithm with low
computational cost that is suitable for hardware realization.
1.2 Related Applications
1.2.1 Pipeline ADCs
Perhaps the majority of digitally-assisted mixed-signal calibration techniques intro-
duced in the past few years were targeted for pipeline ADCs. The constituent blocks
of a pipeline ADC consist of a sub-ADC, a sub-DAC and a residue amplifier, as shown
in Figure 1.3. The interesting property of a pipeline is that it is robust to errors in the
Figure 1.3: Structure of a pipeline ADC.
sub-ADCs, as long as the output of every stage is within the input conversion range
of the rest of the converter. On the other hand, any inaccuracy in the gain of the
residue amplifier is detrimental to the linearity of the ADC and must be compensated
through calibration [3, 4].
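The sensitivity to residue-amplifier gain error can be illustrated with a toy 1-bit-per-stage pipeline model (no redundancy and ideal comparators, both simplifying assumptions): a 1% deviation of the interstage gain from its nominal value of 2 visibly degrades the digital reconstruction.

```python
import numpy as np

def stage(v, gain, vref=1.0):
    """Toy 1-bit pipeline stage: coarse decision d, residue gain*(v - d*vref/2)."""
    d = 1 if v >= 0 else -1
    return d, gain * (v - d * vref / 2)

def pipeline(v, gain, stages=10):
    """Convert v, then reconstruct assuming the NOMINAL gain of 2 per stage."""
    bits = []
    for _ in range(stages):
        d, v = stage(v, gain)
        bits.append(d)
    return sum(b / 2**(i + 1) for i, b in enumerate(bits))

vin = np.linspace(-0.99, 0.99, 1001)
err_ideal = max(abs(pipeline(v, gain=2.00) - v) for v in vin)
err_gain = max(abs(pipeline(v, gain=1.98) - v) for v in vin)  # 1% gain error
print(f"peak error: ideal gain {err_ideal:.5f}, 1% low gain {err_gain:.5f}")
```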
The use of an open-loop amplifier instead of a traditional closed-loop architecture
in pipeline ADCs has been well researched in the past few years [5, 6, 7]. There are
several reasons that make pipeline ADCs especially attractive for calibration tech-
niques. First, ADCs produce digital output that is readily available to a calibration
algorithm without a need for an additional sensing mechanism. Also, the inherent
redundancy of the pipeline architecture tolerates small intentional modulations of
sub-ADC errors without affecting the output. The digital output together with the
introduced modulations inside ADC stages provide information about the nonlinear-
ity under calibration. A DAC, on the other hand, is a continuous time system and so
it must be paired with an ADC in order to feed the calibration algorithm with output
signal information. Also, regular DAC architectures lack an inherent redundancy as
in the case of pipeline ADCs. As we will see in the next chapter, the proposed DAC
is especially designed to have redundant codes that can be exploited for calibration.
Figure 1.4: Digitally predistorted PA.
1.2.2 Digital Predistortion of Power Amplifiers
Open-loop output drivers are reminiscent of power amplifiers (PAs), where the oper-
ation at radio frequencies (RF) precludes the use of a closed-loop amplifier. Similar
to our current application, PAs are also sometimes driven into their nonlinear region
in order to achieve higher DC to RF power conversion efficiency. Operating in the
nonlinear region generates both in-band distortion resulting in higher bit error rate,
as well as out-of-band spectral re-growth causing interference to adjacent channels.
Therefore, there is a need for PA calibration to suppress these errors.
The calibration techniques used for PA nonlinearity compensation are diverse and
depend on the type of nonlinearities involved and the desired level of accuracy. In case
the PA has a memoryless nonlinearity, lookup table (LUT) approaches can be used
for the predistorter [8]. A Wiener predistorter is utilized to linearize a PA modeled
by a Hammerstein system [9]. In wide bandwidth applications such as wideband
code-division multiple access (WCDMA) or wideband orthogonal frequency-division
multiplexing (W-OFDM), PA memory effects are more complicated. These memory
effects can occur both on the timescale of the signal—e.g. due to filtering of the
matching networks—as well as much longer timescales—e.g. due to thermal time
constants and memory of the bias lines. Assuming a general Volterra filter model, the
PA can be predistorted by first identifying the Volterra kernels and then calculating
the so-called pth-order inverse [10]. Also [11] shows the construction of robust memory
polynomial predistorters by adopting an indirect learning architecture.
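A minimal sketch of the indirect learning idea follows, using an assumed memoryless cubic PA model and an odd-order polynomial basis; a real memory-polynomial predistorter would add delayed signal terms to the basis, but the training structure is the same: fit a postdistorter from the PA output back to its input, then copy it in front of the PA.

```python
import numpy as np

rng = np.random.default_rng(1)

def pa(x):
    """Hypothetical memoryless PA compression (illustrative stand-in)."""
    return x - 0.15 * x**3

def basis(u, order=7):
    """Odd-order polynomial basis (memoryless special case)."""
    return np.stack([u**k for k in range(1, order + 1, 2)], axis=1)

x = 0.9 * rng.uniform(-1, 1, 4000)        # training drive signal
y = pa(x)                                 # "measured" PA output (noiseless here)

# Indirect learning: fit a postdistorter mapping the PA output back to its
# input (unit target gain), then copy its coefficients into the predistorter.
w, *_ = np.linalg.lstsq(basis(y), x, rcond=None)
predistort = lambda u: basis(u) @ w

xt = 0.7 * rng.uniform(-1, 1, 2000)       # test signal inside the trained range
err_before = np.abs(pa(xt) - xt).max()
err_after = np.abs(pa(predistort(xt)) - xt).max()
print(f"peak deviation from linear: {err_before:.4f} -> {err_after:.5f}")
```

The test signal is kept inside the training amplitude range; extrapolating a polynomial predistorter beyond the range it was trained on is a known weakness of this approach.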
Figure 1.5: General setup of an AEC.
System-level calibration algorithms become increasingly complicated as aggregate
errors of more blocks come into play. For example, a baseband predistorter in a
Cartesian transmitter must compensate not only the nonlinearities of the output PA,
but also any frequency dependent amplitude mismatch and phase imbalance between
the two paths. In these cases, the calibration does not constitute just a small overhead,
but rather requires considerable power and complexity.
1.2.3 Loudspeakers
Another scenario that resembles the current application is the problem of acoustic
echo cancellation (AEC) in loudspeakers. This problem arises in full-duplex hands-
free telephone systems, where acoustic coupling between the loudspeaker and the
microphone at the near-end of the line causes echoes at the far-end. The microphone
at the near-end picks up both a direct sound from the loudspeaker as well as its
reflections through the loudspeaker-enclosure-microphone system (LEMS) such as a
conference room or a car compartment. Figure 1.5 shows an acoustic echo path in a
telephone system. Similar paths exist at every end of a telephone system that employs
a hands-free system. An acoustic echo has a long and often time-varying delay, and the path can include nonlinear components, so it becomes annoying for the far-end speaker.
Nonlinearity in the system can also occur in cheap loudspeakers when the amplifier is overdriven. If amplifier saturation is negligible or the target sound quality is not high, a linear model or a slowly time-varying model can be used.
However, it is preferable to use low-cost and small-sized loudspeakers in consumer
audio products for economic reasons. These cheaper loudspeakers exhibit nonlinear
characteristics and signal distortion and make the acoustic path nonlinear.
Some solutions to this problem include the use of a Volterra filter as in
[12], which is very computationally demanding. Also, neural network based structures
were proposed in [13], but training the neural network can be costly. An alternative
approach is to model the acoustic path with a Hammerstein system consisting of a
memory-less nonlinearity corresponding to the PA/loudspeaker, and a linear block
corresponding to the room impulse response from the loudspeaker to the microphone.
This model is less complicated than a Volterra filter and can be more easily calibrated
[14].
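The Hammerstein argument can be illustrated numerically. In the sketch below the loudspeaker saturation (a tanh, assumed) and the room impulse response (synthetic) are known exactly; a purely linear FIR canceller fit to the far-end signal leaves residual echo, while the same FIR fit on the nonlinearly mapped signal cancels it almost exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hammerstein echo-path model: an assumed memoryless loudspeaker saturation
# followed by a synthetic linear room impulse response (both illustrative).
def saturate(x):
    return np.tanh(2 * x) / 2

rir = rng.standard_normal(64) * np.exp(-np.arange(64) / 10.0)

x = rng.uniform(-1, 1, 1000)                      # far-end signal stand-in
echo = np.convolve(saturate(x), rir)[: len(x)]    # what the microphone picks up

def lagged(sig, taps=64):
    """Matrix of delayed copies of sig, for FIR least-squares fitting."""
    return np.stack([np.concatenate([np.zeros(k), sig[: len(sig) - k]])
                     for k in range(taps)], axis=1)

# A purely linear canceller fit on x leaves residual echo: the memoryless
# saturation is outside its model.
h, *_ = np.linalg.lstsq(lagged(x), echo, rcond=None)
res_linear = echo - lagged(x) @ h

# Fitting the same FIR on the nonlinearly mapped input (known here,
# estimated in a real AEC) cancels the echo almost exactly.
g, *_ = np.linalg.lstsq(lagged(saturate(x)), echo, rcond=None)
res_hammerstein = echo - lagged(saturate(x)) @ g

print(f"residual echo (std): linear {np.std(res_linear):.4f}, "
      f"Hammerstein {np.std(res_hammerstein):.2e}")
```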
DACs, PAs and loudspeakers all produce continuous-time outputs, can be modeled as systems with a mixture of nonlinearities and memory effects, and can be predistorted in the digital domain to suppress the nonlinearities. However, there is no single recipe that fits all applications. The appropriate calibration technique depends on the characteristics of the model (such as the number and magnitudes of the parameters), the allowable overhead for calibration (in terms of hardware and calibration time), the desired accuracy, and so on.
1.3 Thesis Outline
The rest of this thesis is organized as follows:
• Chapter 2 covers some of the requirements of the system under calibration.
Specifically, we focus on the requirements of the calibration ADC and discuss
how some of the important ADC characteristics such as sampling speed, acqui-
sition bandwidth and noise affect the calibration.
• Chapter 3 introduces the suggested linearization method. We will discuss two
calibration methods that were implemented in software to linearize the DAC
and that achieved comparable results. The first method is based on parameter
estimation that results in a short calibration time but requires considerable
computational resources, which makes it unsuitable for physical implementation.
A second calibration method is proposed that trades off algorithm complexity
with calibration time and has good potential for VLSI implementation.
• Chapter 4 explains the physical design and implementation of the calibration
ADC. We discuss the ADC architecture and go over its various building blocks
at the circuit level. Important considerations and trade-offs in the design of
each block are also covered.
• Chapter 5 provides measurement results of the prototype IC.
• Finally, Chapter 6 concludes the thesis and provides some ideas for future work
and improvement of the system.
Chapter 2
System Considerations
2.1 DAC Requirements
In the previous chapter, we talked about the proposed DAC architecture. In this
section, we review some of the characteristics of the DAC that are essential in under-
standing the system requirements and the calibration.
We mentioned that in order to simplify the calibration, the DAC is specifically designed to have a memoryless nonlinearity [15]. Therefore, there is a one-to-one mapping
between inputs and outputs of the DAC and a digital predistorter having the inverse
mapping can cancel its nonlinearity. It is important to note that there is only so much
that the calibration can do because it does not change the range of the output pulses
that the DAC can generate. The calibration (via the predistorter LUT) can provide
the DAC with alternative input codes, but nevertheless the DAC must be designed to
be able to output all the pulses that a perfectly linear DAC generates. This is illus-
trated in Figure 2.1. The first picture shows a DAC with exaggerated nonlinearity as
well as the desired response. We can see that the DAC cannot always produce outputs
that are within 1 LSB of the desired outputs, such as the ones that are marked with
filled squares. This would only happen close to the mid-code where the I/O curve
has the largest slope. This problem can be solved by including redundant codes in
the DAC. It can be seen that if the mid-code slope is k times larger than the ideal
line then the DAC must have at least k times more codes to be able to fill the output
Figure 2.1: Two examples where a DAC’s I/O curve is not reversible. (a) Input codes
are not tight enough. (b) I/O curve discontinuity produces dead-zone.
Figure 2.2: The calibration algorithm modifies the DAC’s mapping between input
digital codes and its codomain of output pulses.
gaps. Another potential problem is shown in Figure 2.1(b). In this case, the DAC
exhibits a discontinuous I/O curve. This happens because of capacitor mismatches
inside the switched-capacitor (SC) core of the DAC and—if left to randomness—the
jumps can go in either direction. However, the upward jumps create dead zones in
the DAC output that cannot be fixed. Therefore, the capacitors in the core must be
intentionally mismatched so that the SC core produces downward jumps large enough
that random mismatches do not make the transfer curve irreversible.
A similar argument can be made about the dynamics of the DAC output. Figure 2.1
only focuses on the settled values of the output pulses. However, the transients of
a pulse also affect the linearity of the output [16], and these dynamics cannot be
left to the calibration to correct. The idea is demonstrated in Figure 2.2. The DAC
accepts 8-bit input codes, but internally it contains two additional bits for the
implementation of the redundancy. The calibration algorithm can then associate one
of the 2^10 possible outputs with each of the 2^8 input codes. Since the pulse
transients are hard-wired in the circuit, the DAC must be designed such that it can
produce pulses with adequate linearity in the bandwidth of interest.
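The code-to-pulse reassignment above can be sketched numerically. The snippet below builds a predistorter LUT for a hypothetical compressive 10-bit transfer curve (the tanh shape and all constants are illustrative assumptions, not the DAC of this work): each 8-bit input is mapped to the internal code whose output lands closest to the ideal level.

```python
import numpy as np

# Hypothetical compressive transfer curve of the 10-bit internal DAC
# (tanh shape and constants are illustrative only).
internal_codes = np.arange(1024)
v_out = np.tanh(2.5 * (internal_codes / 1023 - 0.5))
v_out = (v_out - v_out.min()) / (v_out.max() - v_out.min())  # normalize to [0, 1]

# Levels a perfectly linear 8-bit DAC would produce over the same range.
ideal = np.arange(256) / 255

# Predistorter LUT: map each 8-bit input to the internal code whose actual
# output lands closest to the ideal level.
lut = np.array([int(np.argmin(np.abs(v_out - v))) for v in ideal])

# Residual error after predistortion, in units of one 8-bit LSB (1/255).
err_lsb = np.abs(v_out[lut] - ideal) * 255
```

Because this particular curve has roughly four times more internal codes than its mid-code slope requires, every ideal level can be matched to within a fraction of an LSB.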
2.2 ADC Requirements
We showed that not every nonlinear DAC can be successfully predistorted. Next, we
discuss the requirements of the calibration ADC. Obviously, a high-precision ADC
that runs at the Nyquist rate of the fastest tone in the system and connects to the
DAC through a high-bandwidth buffer would do. However, such an ADC would burden the
whole system. In this section, we will see which characteristics of the calibration
ADC are key and which can be relaxed without affecting the calibration.
2.2.1 Sampling Rate
The literature on sampling requirements for identification of nonlinear systems is
large and diverse. According to the Nyquist-Shannon sampling theorem, in order to
Figure 2.3: Discrete-time approximation of a continuous-time Volterra system.
discretize a signal without loss of information, it is sufficient to sample at twice
the maximum signal frequency in the system (the Nyquist rate). Nonlinearity expands
the signal bandwidth, and so the Nyquist sampling frequency is determined by the
output. However, the Nyquist frequency of the tones that a nonlinear system
generates is often too high for practical purposes. Still,
information about the nonlinear system can allow a more relaxed sampling
requirement. It has been shown that the output of a system with a memoryless
nonlinearity only needs to be sampled at the Nyquist frequency of its input to be
perfectly reconstructed [17, 18, 19]. This problem has also been studied for
nonlinear systems with memory, i.e., systems with a Volterra kernel model, and again
the sampling frequency only needs to accommodate the input signal [20, 21]. This
result is obtained by inspecting Figure 2.3, where the sampled output of a continuous
time system is compared against the output of an equivalent discrete time system
that operates on the sampled input. It is shown that, with sampling frequencies as
low as the input Nyquist rate, the two outputs will be identical and e[n] = 0. It is
therefore concluded that an identification algorithm that uses the sampled input and
output signals will correctly identify the original continuous time system.
In the present application, however, our objective is not to reconstruct the output
of the DAC; we try to find the parameters of a model of the system. Essentially,
ours is a parameter estimation problem, as shown in Figure 2.4, where the unknown
is the set of system parameters modulated by a known sequence of input data. The
input code x[n] is digital and available to the estimator in its entirety.
Figure 2.4: Parameter estimation of a continuous-time Volterra system. The unknown
signal is the slow-changing set of parameters θ(t) and not the input code x[n].
The challenge of the parameter estimation algorithm is to demodulate x[n] from the
sampled output and deduce the hidden parameters of the system, θ(t). Therefore, the
actual signal that needs to be reconstructed is θ(t). The innovation rate of the system
is the rate at which these unknown parameters change—which is usually very slow—
and the sampling frequency only needs to track such changes in the parameters. These
variations can happen due to environmental effects while the circuit is running or at
startup—when the circuit wakes up to a set of actual parameters that are different
from the preset values. Often the bottleneck is set by the calibration time budget at
startup. This time, along with the amount of data that is needed by the calibration
algorithm to estimate θ, determines the sampling frequency.
As an example, suppose we know that the system can be modelled as y(t) =
x(t) + εx³(t). Then we just need to estimate ε in order to construct a predistorter.
If we ignore noise in this simple example, then no matter what the frequency of
x(t), a single measurement of y(t) at a known x(t) is all we need to calculate ε.
This argument can
be extended to the general case of an arbitrary nonlinear system with memory that
can be modelled with a discrete time Volterra series [22, 23, 24]. In this case, the
relationship between the output y[n] and the input x[n] is

y[n] = h_0 + \sum_{i_1=0}^{m-1} h_1(i_1)\, x[n-i_1]
     + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} h_2(i_1, i_2)\, x[n-i_1]\, x[n-i_2] + \ldots
     + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \cdots \sum_{i_l=0}^{m-1} h_l(i_1, i_2, \ldots, i_l)\, x[n-i_1]\, x[n-i_2] \cdots x[n-i_l]   (2.1)
where m is the length of memory and l is the order of nonlinearity of the system. We
can see that although the output y is a nonlinear function of the input x, it is a linear
function of the filter parameters h. Therefore, these parameters can be found from a
sufficient number of input-output equations using, e.g., a least squares solution.
\begin{bmatrix} \vdots \\ y[n_1] \\ y[n_2] \\ y[n_3] \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & & \vdots & & \vdots & \\
1 & \cdots & x[n_1 - i_1] & \cdots & x[n_1 - i_1]\, x[n_1 - i_2] \cdots x[n_1 - i_l] & \cdots \\
1 & \cdots & x[n_2 - i_1] & \cdots & x[n_2 - i_1]\, x[n_2 - i_2] \cdots x[n_2 - i_l] & \cdots \\
1 & \cdots & x[n_3 - i_1] & \cdots & x[n_3 - i_1]\, x[n_3 - i_2] \cdots x[n_3 - i_l] & \cdots \\
\vdots & & \vdots & & \vdots &
\end{bmatrix}
\begin{bmatrix} h_0 \\ \vdots \\ h_1(i_1) \\ \vdots \\ h_l(i_1, \ldots, i_l) \\ \vdots \end{bmatrix}   (2.2)
From the above equation, we can see that it does not really matter how far apart
in time y[n_1] and y[n_2] are placed. Even if y[n] is obtained at very low sample
rates, the least squares problem can be solved. Of course, the sampling rate must
be fast enough that the filter parameters stay constant while the above matrices are
populated. Therefore, it is possible to find the coefficients of a general Volterra model
from downsampled outputs of the system. There are also several methods to find the
inverse of such a system [10, 25].
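As a small numerical illustration of this point, the sketch below identifies a hypothetical second-order Volterra model (memory m = 2, coefficients chosen arbitrarily) from heavily downsampled, noiseless output observations; only about one output sample in a hundred is used, yet least squares recovers the kernel exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2nd-order Volterra model with memory m = 2 (coefficients are
# arbitrary illustration values):
# y[n] = h0 + h1(0)x[n] + h1(1)x[n-1]
#        + h2(0,0)x[n]^2 + h2(0,1)x[n]x[n-1] + h2(1,1)x[n-1]^2
true = np.array([0.1, 1.0, 0.3, 0.05, -0.02, 0.01])

x = rng.uniform(-1, 1, 5000)            # known digital input sequence

def regressors(n):
    a, b = x[n], x[n - 1]
    return np.array([1.0, a, b, a * a, a * b, b * b])

y = np.array([regressors(n) @ true for n in range(1, len(x))])  # full output

# The calibration ADC observes only one output sample out of every 97:
idx = np.arange(1, len(x), 97)
A = np.vstack([regressors(n) for n in idx])
y_obs = y[idx - 1]                      # downsampling = keeping a subset of rows

est, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
```

Dropping output samples simply removes rows from the regression; as long as enough well-conditioned rows remain, the solution is unchanged.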
It should be noted that modeling a system with a general Volterra series is a
brute-force method that is often not a good recipe for parameter estimation. A
Volterra model is often very complex and the number of involved parameters quickly
grows as the order of nonlinearity and memory in the system increases. For example,
consider a system that consists of the cascade of a static nonlinearity and a linear
filter. Assuming that the nonlinearity can be described with l coefficients and the filter
with m coefficients, the Volterra series of the system would include l ·m coefficients,
whereas only the l + m underlying parameters are truly needed for modelling.
Moreover, many of these coefficients are typically small; when multiplied together
they become even smaller, making the identification problem degenerate and
increasing the error. Numerical stability can be improved with regularization but,
in general, it pays off to model the system more deeply, incorporating the
underlying coefficients, rather than use the general Volterra form.
2.2.2 Acquisition Bandwidth
In the previous section, we concluded that the identification of a discrete-time
Volterra filter with known inputs can be accomplished with arbitrarily low sampling
rates at the output. In this section, we study the requirement on the acquisition
bandwidth of the sampler. For this analysis, we need to think in the frequency
domain and, to keep the thinking simple, we first assume that the sampler runs
faster than necessary, at the Nyquist frequency of the output, so there is no
spectral folding.
The goal of the calibration algorithm is to linearize the DAC. However, the DAC
output spectrum beyond the Nyquist zone will be filtered further down the signal
chain, and so it is irrelevant to the DAC's performance. The calibration ADC is to
capture the output and judge the linearity of the system. So it suffices for the
ADC to have better than the target linearity within the DAC Nyquist zone and to
ignore higher-frequency content. In practice, however, the ADC does not quite ignore
the spectrum above the DAC bandwidth, but captures it with some error. This error
must be bounded so that it does not affect the accuracy of the parameter estimation.
The essence of most parameter estimation algorithms is a least squares problem,
where the algorithm converges to the set of parameters that minimizes the squared
error. In this case, our calibration algorithm tries to linearize the system, so
the error must capture the nonlinearity of the DAC. According to Parseval's theorem,
the total squared nonlinearity error in the time domain is equivalent to the total
distortion power in the frequency domain. Therefore, the calibration can function
properly as long as the out-of-band nonlinearity error does not pollute the in-band
error. In other words, the power of
the dominant in-band harmonic must be larger than the total out-of-band distortion
power.
If the ADC downsamples the DAC output, the harmonics will fold down and mix.
But, as we showed in the previous section, this does not affect the calibration;
it only makes the reconstruction of the DAC output impossible. From the perspective
of a least squares problem, downsampling is equivalent to eliminating rows of
equations. The final result will be intact as long as there is a sufficient number
of equations.
2.2.3 Boxcar Sampling
The slow calibration ADC can sample an output pulse of the high-speed DAC at no
more than one instant. However, ideal sampling cannot fully measure the nonlinearity
of the whole pulse unless the DAC is designed to have sufficiently fast or linear
transitions and the output pulse is sampled after it has settled. A typical DAC
output pulse may exhibit various artefacts, such as nonlinear transitions, overshoot
or ringing, clock feedthrough, etc., before it settles to its final value. Taking a
single sample during such transients does not fully represent the whole pulse. A
better choice might be to sample the pulse close to its end, when its value has
almost settled.
This representation works especially well for sampled systems, such as
switched-capacitor circuits, which process instantaneous signal samples. However,
a DAC output is a continuous-time signal, and signal transients must be taken into
account as part of the whole picture. Heuristically, a better linearity measure is
the pulse energy, which can be obtained by integrating the whole pulse. This way,
all transients are simply combined in proportion to their temporal durations. If
the transients are short, the pulse energy will be proportional to the final value;
if not, they will be weighed in accordingly. Equivalently, we could measure the
so-called "glitch energy", the integral of the difference between the actual and
ideal pulse, as a measure of the nonlinearity of a DAC. Yet again, pulse or glitch
energy does not fully represent a continuous-time pulse: linearity of pulse energy
is neither a necessary nor a sufficient condition for linearity of a DAC. Figure 2.5
illustrates two example cases where looking
Figure 2.5: Two examples where pulse energy does not translate to signal linearity.
(a) Output signal is nonlinear but pulse energy is linear. (b) Output signal is
linear in the bandwidth of interest but pulse energy is nonlinear.
at pulse energy alone could be misleading. In case (a), the signal exhibits
nonlinear overshoot and undershoot; however, the two deviations almost cancel each
other, so the area under the pulse remains linear. In case (b), there is significant
clock feedthrough, which leads to a nonlinear pulse energy. However, such fast
transients are more likely to fall outside the bandwidth of interest and be
completely irrelevant. In comparison, the exact same glitch energy, had it
originated from wrong signal levels, would be more significant. Next, we examine
the heuristic behind pulse energy and consider the situations in which it provides
meaningful information.
The ultimate test for the linearity of the DAC is to examine its output spectrum
via the Fourier transform. If we write the DAC output as a series of pulses,

v_{DAC}(t) = \sum_k p_k(t - kT_s)   (2.3)
then its Fourier transform will be

\mathcal{F}\{v_{DAC}\} = \int \sum_k p_k(t - kT_s)\, e^{-j2\pi f t}\, dt
= \sum_k \left( \int_{T_s} p_k(t)\, e^{-j2\pi f t}\, dt \right) e^{-j2\pi f k T_s}
= \sum_k d_{ADC}[k]\, e^{-j2\pi f k T_s}
= \mathrm{DTFT}\{ d_{ADC}[k] \}   (2.4)
where

d_{ADC}[k] = \int_{T_s} p_k(t)\, e^{-j2\pi f t}\, dt   (2.5)
Therefore, given the above definition of the ADC samples, the DTFT of the ADC
samples will be equivalent to the continuous-time Fourier transform of the DAC
output. This means that, ideally, the ADC must prefilter the DAC output with a
boxcar sampler that has a complex gain of e^{-j2\pi f t}. Measuring the pulse energy
is equivalent to using a plain boxcar sampler with a fixed gain of 1. It gives an
approximation of the ideal case that is valid only at frequencies much smaller than
the DAC sampling frequency:

0 \le t \le T_s : \quad e^{-j2\pi f t} \approx 1 \;\Leftrightarrow\; f T_s \ll 1 \;\Leftrightarrow\; f \ll \frac{1}{T_s}   (2.6)
We see that there is no easy way to accurately represent the DAC pulse waveforms
with discrete samples. However, all we need the ADC for is to calibrate the DAC
nonlinearity. We also know that the DAC signal may undergo some filtering depending
on its (off-chip) load, and any calibration algorithm must be blind to linear
filtering at the output of the DAC. Therefore, the ADC may as well operate on a
filtered version of the DAC output. Assuming that there is a filter h(t) at the
input of the ADC (Figure 2.6), we have
d_{ADC}(t) = h(t) * \sum_k p_k(t - kT_s) = \sum_k \int h(\tau)\, p_k(t - kT_s - \tau)\, d\tau

If the filter response stays approximately constant over intervals of length T_s,
the above equation can be simplified to

d_{ADC}(t) = \sum_k h(t - kT_s)\, e_k   (2.7)
Figure 2.6: System block diagram and mapping between continuous time and discrete
time. (a) Accurate mapping of a continuous-time signal into a discrete sample cannot
be implemented easily. A prefilter in front of the ADC, as shown in (b), can
facilitate this.
Figure 2.7: Two pre-filter options for the calibration ADC: a filter with a long
time response (a) has a narrow frequency bandwidth (b), whereas a boxcar sampler
with a short time window (c) will pass more frequencies (d).
where e_k is the energy of pulse p_k, i.e., its integral.
Two example filters that satisfy the above criterion are shown in Figure 2.7. In
case (a), the filter has a slowly changing time-domain response. However, this
translates into severe filtering in the frequency domain, which decreases the
sampling SNR. The filter shown in (c) provides a better alternative with a wider
bandwidth.
Therefore, we end up using a boxcar sampler, just in line with the heuristic
intuition mentioned at the beginning of this section! To confirm that the
intervening discussion has not taken us in a circle, we note again that in our
system the ADC does not represent the DAC output, but rather its sinc-filtered
output. The reason this is acceptable is that the sampler has to be sensitive to
the DAC nonlinearity and insensitive to output filtering. So the insertion of an
extra filter
before the ADC, as long as it does not limit its acquisition bandwidth, does not get
in the way of the calibration.
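The adequacy of the boxcar approximation in (2.6) can be checked numerically. The sketch below builds a hypothetical DAC waveform (a held low-frequency tone with a short, arbitrary edge transient; all constants are illustrative), integrates each pulse over its period as a boxcar sampler would, and compares the DTFT of those samples with the continuous-time Fourier integral at the tone frequency:

```python
import numpy as np

# Illustrative (hypothetical) DAC waveform: N held levels of one low-frequency
# tone, each with a fast decaying transient at the pulse edge, oversampled by R.
R, N, Ts = 200, 64, 1.0
f0 = 1 / (64 * Ts)                                  # tone well below 1/Ts
t_fine = np.arange(N * R) * (Ts / R)
levels = np.sin(2 * np.pi * f0 * np.arange(N) * Ts)
v = np.repeat(levels, R)
glitch = 0.5 * np.exp(-np.arange(R) / (0.02 * R))   # short edge transient
steps = np.diff(np.concatenate(([0.0], levels)))
for k in range(N):
    v[k * R:(k + 1) * R] += steps[k] * glitch

# Boxcar sampler: integrate each pulse over its period, i.e. eq. (2.5) with
# the approximation e^{-j2*pi*f*t} ~ 1 of eq. (2.6).
d = v.reshape(N, R).sum(axis=1) * (Ts / R)

# Compare the DTFT of the boxcar samples with the continuous-time Fourier
# integral of the waveform at the tone frequency.
ct = np.sum(v * np.exp(-2j * np.pi * f0 * t_fine)) * (Ts / R)
dt = np.sum(d * np.exp(-2j * np.pi * f0 * np.arange(N) * Ts))
rel_err = abs(ct - dt) / abs(dt)
```

The residual between the two spectra is the phase and sinc droop that the fixed-gain boxcar ignores; it shrinks as f·Ts decreases, consistent with (2.6).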
2.2.4 Noise Requirement
In order to see the impact of noise on the calibration, we again consider
approaching the parameter estimation problem with a least squares method. In this
framework, there are typically more equations than unknowns, i.e., the problem is
overdetermined. The least squares approximation is intentionally tailored to weigh
in all the information and average out the noise. Therefore, as long as we can
provide enough measurements, the parameter estimation problem can be made robust to
noise. This means that the ADC can have relaxed noise requirements; the downside,
however, is a longer calibration time. We will take a closer look at the trade-off
between noise and calibration time in the next chapter.
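The averaging effect can be illustrated with a toy least squares problem (the model, noise level, and sample counts below are arbitrary assumptions for the illustration): a quadratic model is estimated from measurements corrupted by heavy noise, once with a short record and once with a long one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy overdetermined estimation problem (model and numbers are illustrative):
# y = p0 + p1*x + p2*x^2 observed through a very noisy ADC.
true = np.array([0.5, 1.0, -0.2])
sigma = 0.5                              # large measurement noise

def estimate(n_meas):
    x = rng.uniform(-1, 1, n_meas)
    A = np.column_stack([np.ones(n_meas), x, x**2])
    y = A @ true + sigma * rng.normal(size=n_meas)
    est, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(est - true)

# Median estimation error over repeated trials, for short vs long records.
err_small = np.median([estimate(100) for _ in range(50)])
err_large = np.median([estimate(10000) for _ in range(50)])
```

The estimation error shrinks roughly as 1/sqrt(N), so a noisy ADC can be compensated by a longer measurement record.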
2.3 Summary
In this chapter, we discussed some of the high-level requirements of the DAC and
the ADC. We saw that the requirements on the calibration ADC are quite different
from those of a general-purpose ADC, which leads to unique tradeoffs and
opportunities. To recap, the ADC can have a very low conversion rate and high noise.
On the other hand, it must have a large acquisition bandwidth, a small power and
area overhead on the overall system, as well as negligible loading on the DAC. These
requirements only make sense when the design process starts at the system level.
Chapter 3
Calibration Algorithm
Before we can develop a calibration strategy, we must have an appropriate model
of the system. The complexity of this model directly affects the complexity of the
calibration algorithm, so it is important that we model the system in the simplest
possible way without compromising accuracy.
In this chapter, we first review a few models of the DAC under discussion. Next,
we cover a few calibration algorithms and describe several techniques to reduce the
computational cost of the algorithm. The development of the calibration algorithms
and the design of the DAC partly overlapped, so the progression of the discussion
in this chapter reflects the train of thought during the research timeline.
3.1 System Modelling
3.1.1 Volterra Model
We mentioned in the previous chapter that a general nonlinear system with memory
can be modeled as a Volterra filter. While general, a Volterra model presents a
convoluted picture of the system and, more often than not, includes redundant
degrees of freedom. Consequently, the estimation of a Volterra filter's parameters
and the calculation of its inverse filter become unreasonably costly. Obviously, the
most direct way of simplifying the model is through careful design of
Figure 3.1: DAC with static nonlinearity and nonlinear output filter.
(a) Predistortion off. (b) Predistortion on.
the system. The specific DAC under calibration was intentionally designed such that
the nonlinearity and the memory effects do not blend together [15]. The nonlinearity
of the DAC comes in part from the SC core, due to capacitor mismatches, and in part
from the open-loop drivers. The memory effect mainly arises at the output load,
which resides off-chip. In order to keep the nonlinearity and the filter isolated,
care was taken to minimize any filtering effect through the SC core up to the
drivers. Then, using circuit-level information, we can come up with a model that is
much simpler than a general Volterra filter.
3.1.2 Static Nonlinearity and Nonlinear Filtering
By minimizing filtering through the DAC core, we are able to model it as a static
nonlinearity, which is the simplest form of nonlinearity. The output filter,
however, might not be a simple linear filter. In particular, in the early versions
of the DAC design, there was non-negligible nonlinearity in the output filter,
caused by the voltage dependence of the junction capacitors at the output and the
large voltage swing at this node. Therefore, the DAC is modeled as a static
nonlinearity cascaded with a nonlinear filter. We next explain an algorithm for
predistortion of a DAC with this model.
Figure 3.1 is a simple diagram showing the predistorter block, the DAC core and
the output filter. In the first picture, the predistorter is off and so the input x directly
gets to the DAC input. The open-loop driver of the DAC produces a current G(x)
with a weak sigmoid nonlinearity. This current is pumped into a resistive load and a
weakly nonlinear output capacitance. Writing a KCL at this node, we can find the
output voltage y as below:
G(x) = C(y)\,\dot{y} + y   (3.1)

where C(y) is the voltage-dependent capacitance, \dot{y} is the derivative of the
output voltage, and the equation is normalized to the value of the resistance. The
information available to the calibration algorithm is the entire sequence of the
input x as well as sporadic samples of y. We see that the derivative of the output
also appears in the system's equation, but it cannot be obtained from downsampled
output values. However, if the nonlinearities of the circuit are small, or if the
predistorter can almost linearize the DAC, then the output y will be almost equal
to x (Figure 3.1(b)). So we may use the input derivative \dot{x} as an initial
approximation for \dot{y}. This approximation improves as the system is
progressively linearized by the calibration. Therefore, we can write

y = -C(y)\,\dot{y} + G(x) \approx -C(y)\,\dot{x} + G(x)   (3.2)
Let us assume for simplicity that we pick polynomial bases to describe the nonlinear
functions G(·) and C(·). Thus,

y \approx -\dot{x} \left( c_0 + c_1 y + c_2 y^2 + c_3 y^3 + \ldots \right) + \left( g_0 + g_1 x + g_2 x^2 + g_3 x^3 + \ldots \right)   (3.3)
As we take more and more samples of the output, we can build up a system of
equations in terms of the unknown coefficients c and g:
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \\
-\dot{x}_l & -\dot{x}_l y_l & -\dot{x}_l y_l^2 & -\dot{x}_l y_l^3 & \cdots & 1 & x_l & x_l^2 & x_l^3 & \cdots \\
-\dot{x}_m & -\dot{x}_m y_m & -\dot{x}_m y_m^2 & -\dot{x}_m y_m^3 & \cdots & 1 & x_m & x_m^2 & x_m^3 & \cdots \\
-\dot{x}_n & -\dot{x}_n y_n & -\dot{x}_n y_n^2 & -\dot{x}_n y_n^3 & \cdots & 1 & x_n & x_n^2 & x_n^3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ \vdots \\ g_0 \\ g_1 \\ g_2 \\ g_3 \\ \vdots \end{bmatrix}   (3.4)
Figure 3.2: Calibration of a DAC with static nonlinearity and nonlinear output
filter.
These equations can be solved using a least-mean-squares (LMS) method, as shown
in Figure 3.2. After obtaining an estimate of the DAC nonlinearity function C(·),
the next step is to predistort the incoming data such that

C(z) = x \;\Rightarrow\; z = C^{-1}(x)   (3.5)
The predistorter can use the following algorithm to calculate the inverse function:
Algorithm 1 Iterative predistortion algorithm to calculate z = C^{-1}(x)
1: z^{(0)} = x
2: while z has not reached steady state do
3:     z^{(n+1)} = z^{(n)} + \varepsilon \left( x - C(z^{(n)}) \right)   // \varepsilon is a small update factor
4: end while
Since the nonlinearity is weak, the initial value of z in line (1) is not far from
accurate. After a few iterations, at steady state we have

z^{(n+1)} = z^{(n)}   (3.6)

which, combined with the update formula in line (3), results in

z^{(n+1)} = z^{(n)} = z^{(n)} + \varepsilon \left( x - C(z^{(n)}) \right) \;\Rightarrow\; x = C(z^{(n)})   (3.7)
Figure 3.3: Hammerstein model of a system with input x and output y.
which is exactly what we needed.
The example calibration in this section shows how the knowledge about the nature
of nonidealities can help us come up with a calibration scheme. The main techniques
that were used in this example—such as LMS filtering, adaptive calibration and
iterative predistortion—are recurrent themes in calibration algorithms. We will see
more examples of these techniques in the rest of this chapter.
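Algorithm 1 is easy to prototype. The sketch below applies it to a hypothetical weak cubic nonlinearity standing in for C(·) (the function, ε, and tolerances are illustrative assumptions); since |1 − εC′(z)| < 1 for this choice of ε, the fixed-point iteration converges:

```python
import numpy as np

# Hypothetical weak static nonlinearity standing in for C(.) (the 0.05
# coefficient is an arbitrary illustration value).
def C(z):
    return z + 0.05 * z**3

def predistort(x, eps=0.5, tol=1e-9, max_iter=1000):
    """Algorithm 1: iterate z <- z + eps*(x - C(z)) until steady state."""
    z = np.array(x, dtype=float)         # line 1: z^(0) = x
    for _ in range(max_iter):
        step = eps * (x - C(z))          # line 3: update toward C(z) = x
        z = z + step
        if np.max(np.abs(step)) < tol:   # steady state reached
            break
    return z

x = np.linspace(-1.0, 1.0, 11)           # input codes, normalized
z = predistort(x)                        # predistorted codes: C(z) ~ x
```

The update is vectorized, so an entire table of input codes can be inverted at once to populate the predistorter LUT.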
3.2 Hammerstein System Calibration
As the design of the DAC progressed, it turned out that the static nonlinearity of the
DAC tends to be more complicated than the output filtering. The DAC eventually
utilized three interleaved cores to increase sampling speed. Since each core is different,
there would be three times more nonlinearity coefficients. On the other hand, the
nonlinearity of output capacitance was diminished through the use of cascode devices.
Eventually, the DAC model boiled down to a cascade of a static nonlinearity and a
linear filter, as illustrated in Figure 3.3. This model is known as a Hammerstein
model. In the next two sections, we cover techniques for predistortion of this model.
3.2.1 Parameter Estimation
A Hammerstein model is defined by two sets of parameters: c ∈ R^l is a vector of
nonlinearity coefficients and h ∈ R^m a vector of filter coefficients. It can be
seen that the relationship between the input and output in this system can be
described as follows:

y_i = h^T \begin{bmatrix} 1 & x_i & x_i^2 & \cdots \\ 1 & x_{i-1} & x_{i-1}^2 & \cdots \\ 1 & x_{i-2} & x_{i-2}^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} c = h^T V_i c   (3.8)
The Hammerstein system identification problem is to find the two vectors c and h
that best describe the input-output relationship of a real system or, in other
words, minimize the error between the actual output of the system and the output
expected from a Hammerstein model. Therefore, we have

[h\;c] = \arg\min_{[h\,c]} \sum_{i=1}^{N} \left( y_i - h^T V_i c \right)^2   (3.9)
The above equation for h and c does not have a closed-form solution. However, we
can see from (3.8) that a Hammerstein model is linear in h for a fixed c, and linear
in c for a fixed h. So, having an estimate of one of the two vectors, we can find
the other through a least squares problem; then we can use the new vector to improve
our initial estimate and continue these iterations until the results converge. We
will now take a closer look at one step of this iterative method.
With a given c vector, the equation below denotes an LS problem that gives h:
y = Ah (3.10)
where y is a vector of DAC outputs digitized by the ADC and A is a tall matrix whose
elements are functions of the DAC inputs and the c coefficients. As the ADC takes
more measurements, new elements are appended to the end of the vector y and new rows
to the equation. For example, a few rows of equation (3.10) will look like
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
f(x_l) & f(x_{l-1}) & f(x_{l-2}) & \cdots \\
f(x_m) & f(x_{m-1}) & f(x_{m-2}) & \cdots \\
f(x_n) & f(x_{n-1}) & f(x_{n-2}) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \end{bmatrix}   (3.11)
where f is our estimate of the nonlinearity function based on the current c vector.
After solving the above equation for h, we can update the c coefficients through
this equation:
y = Bc (3.12)
where the entries of the matrix B are functions of the DAC input as well as the
current estimate of the filter h. So a few rows of this equation will be
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
\sum_k h_k & \sum_k h_k x_{l-k} & \sum_k h_k x_{l-k}^2 & \cdots \\
\sum_k h_k & \sum_k h_k x_{m-k} & \sum_k h_k x_{m-k}^2 & \cdots \\
\sum_k h_k & \sum_k h_k x_{n-k} & \sum_k h_k x_{n-k}^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{bmatrix}   (3.13)
The two LS problems must be solved in each iteration until the coefficients c and
h converge to their final values. There are efficient techniques for solving such
LS problems. In particular, the algorithm can be implemented so that it does not
need to store the whole A or B matrix before solving; rather, the computations can
be paced at the rate at which the matrices are built. The algorithm takes each row
of the matrix and the corresponding entry of y as they come in and updates the
solution. Therefore, the height of the LS matrices, as determined by the noise
requirement, only adds to the computation time, not to the hardware computing
resources. The width of the matrices, however, is equal to the number of unknown
parameters and directly determines the complexity of the DSP hardware.
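The alternating procedure can be sketched end to end. The snippet below simulates a hypothetical Hammerstein system (cubic nonlinearity, 3-tap filter; all coefficients are arbitrary illustration values) and alternates the two least squares steps of (3.11) and (3.13). Note that a Hammerstein cascade has an inherent gain ambiguity between c and h, so convergence is checked up to a scale factor:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical Hammerstein system (all coefficients are illustration values):
# cubic static nonlinearity followed by a 3-tap FIR filter.
c_true = np.array([0.0, 1.0, 0.1, -0.05])   # nonlinearity coefficients
h_true = np.array([1.0, 0.4, 0.1])          # filter coefficients

x = rng.uniform(-1, 1, 2000)
V = np.column_stack([x**p for p in range(4)])   # rows [1, x, x^2, x^3]
y = np.convolve(V @ c_true, h_true)[:len(x)]    # system output

c = np.array([0.0, 1.0, 0.0, 0.0])              # initial guess: identity
h = np.zeros(3)
n = np.arange(2, len(x))                        # skip the filter start-up

for _ in range(50):
    f = V @ c                                    # current nonlinearity output
    A = np.column_stack([f[n - k] for k in range(3)])
    h, *_ = np.linalg.lstsq(A, y[n], rcond=None)     # solve (3.11) for h
    B = sum(h[k] * V[n - k] for k in range(3))
    c, *_ = np.linalg.lstsq(B, y[n], rcond=None)     # solve (3.13) for c

f = V @ c
A = np.column_stack([f[n - k] for k in range(3)])
err = np.max(np.abs(A @ h - y[n]))               # residual of the fitted model
```

In practice each LS step would be updated row by row as ADC samples arrive, rather than solved in batch as here.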
The width of the matrix A is equal to the number of filter coefficients in h and is
set by the time constants associated with the load. The width of the matrix B is
equal to the number of nonlinearity coefficients. As we mentioned before, the DAC
includes three interleaved cores, each of which has its own driver nonlinearity
coefficients as well as SC-core mismatch coefficients. With three cores, equation
(3.11) will
change into

\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
f(x_l) & f'(x_{l-1}) & f''(x_{l-2}) & \cdots \\
f(x_m) & f'(x_{m-1}) & f''(x_{m-2}) & \cdots \\
f(x_n) & f'(x_{n-1}) & f''(x_{n-2}) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \end{bmatrix}   (3.14)
where f(·), f'(·) and f''(·) denote the nonlinear functions corresponding to the
three cores. On the other hand, (3.13) changes into
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
\sum_k h_{3k} & \sum_k h_{3k} x_{l-3k} & \sum_k h_{3k} x_{l-3k}^2 & \cdots \\
\sum_k h_{3k} & \sum_k h_{3k} x_{m-3k} & \sum_k h_{3k} x_{m-3k}^2 & \cdots \\
\sum_k h_{3k} & \sum_k h_{3k} x_{n-3k} & \sum_k h_{3k} x_{n-3k}^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} c \\ c' \\ c'' \end{bmatrix}   (3.15)

where c, c' and c'' are vectors corresponding to the nonlinearity coefficients of each
core. We see that the matrix entries in (3.15) are more complicated than those in
(3.13), and the LS matrix is now three times wider than before. Therefore, solving
the corresponding LS equation in (3.15) becomes a major bottleneck in the
calculations. In the next section, we investigate methods to reduce the complexity
of the problem.
3.2.2 Adaptive Predistortion
The linearization technique based on parameter estimation first attempts to estimate
all parameters of the model; then it passes the nonlinearity coefficients to a
predistorter block where the inverse nonlinearity is constructed (Figure 3.4). This
is clearly a brute-force approach to the linearization problem, because only a
subset of the identified parameters is needed for constructing the predistorter. In
particular, the filter coefficients are discarded at the end of the parameter
estimation. They are only used as a complementary step in the iterative solution,
whereas the main goal is the identification of the nonlinearity coefficients from
which we can calculate the inverse
Figure 3.4: Calibration through system identification.
nonlinearity and construct a predistorter. In fact, even the nonlinearity
coefficients are not directly used at the end of the algorithm; all we care about
is obtaining the inverse coefficients.
The redundancy of the information produced in parameter-estimation-based
linearization suggests that there might be a direct path to obtaining the
predistorter coefficients without having to identify the system parameters. In other
words, the objective of the calibration algorithm is not to estimate the system
parameters per se, but rather to come up with a set of predistortion parameters that
invert the system nonlinearity. An efficient calibration algorithm should be blind
to any linear filter inside the system and only provide a measure of the
nonlinearity of the whole signal path. Given a cost function that measures the path
nonlinearity, an adaptive algorithm can then directly adjust the predistorter until
the desired linearity is obtained. This linearization approach is shown in
Figure 3.5. Such an adaptive method bypasses the calculation of the system
parameters and can therefore be more efficient than linearization through system
identification. In the following section, we investigate the calculation of a cost
function that can be used in an adaptive calibration.
Figure 3.5: Calibration through adaptive predistortion.
Adaptive Linearization Based on Coherence Function
A suitable cost function for linearizing the system must yield a measure of linearity
of the system given its inputs and outputs. Having such a cost function, we can
adjust the control knobs of the system until desired linearity is achieved even without
knowledge of the linear block. An example cost function used in [14] in the context
of AEC calibration is the coherence function as given below:
Cxy(f) =|Sxy(f)|2
Sxx(f)Syy(f)(3.16)
where Sxy(f) is the cross-spectral density between x and y, and Sxx(f) and Syy(f) are
the auto-spectral density of x and y, respectively. It can be shown that this function
is equal to unity at any frequency where the input/output relationship is linear and its
value decreases as the nonlinearity grows. In order to have a measure for nonlinearity
across all frequencies, the integral of the coherence function can be used. Calculation
of the coherence function is especially economical thanks to efficient methods for
computing the DFT. However, capturing the samples for the DFT requires Nyquist-rate
sampling at the output of the system, which we hoped to avoid.
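As a concrete illustration, the coherence measure in (3.16) can be estimated with Welch-style averaged periodograms. The sketch below is a minimal Python model of our own (segment length, window, and the example filter are arbitrary choices, not from this work); it shows the average coherence dropping once a cubic distortion term is added:

```python
import numpy as np

def coherence(x, y, seglen=256):
    """Magnitude-squared coherence via Welch-style averaged periodograms."""
    win = np.hanning(seglen)
    nseg = len(x) // seglen
    Sxx = np.zeros(seglen)
    Syy = np.zeros(seglen)
    Sxy = np.zeros(seglen, dtype=complex)
    for k in range(nseg):
        xs = x[k * seglen:(k + 1) * seglen] * win
        ys = y[k * seglen:(k + 1) * seglen] * win
        X, Y = np.fft.fft(xs), np.fft.fft(ys)
        Sxx += np.abs(X) ** 2          # auto-spectrum of x
        Syy += np.abs(Y) ** 2          # auto-spectrum of y
        Sxy += np.conj(X) * Y          # cross-spectrum
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

rng = np.random.default_rng(1)
x = rng.standard_normal(256 * 200)
h = np.array([0.9, 0.3, -0.1])          # an arbitrary linear filter
y_lin = np.convolve(x, h, mode="same")  # linear path: coherence near 1
y_nl = y_lin + 0.2 * y_lin ** 3         # cubic distortion lowers coherence
c_lin = coherence(x, y_lin).mean()
c_nl = coherence(x, y_nl).mean()
```

Averaging 1 − Cxy(f) over frequency then yields the scalar nonlinearity measure described above.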
Figure 3.6: Recursive least squares using Givens rotations.
Adaptive Linearization Based on LS Error
A better candidate for the cost function of an adaptive predistortion can be obtained
by the following formula:
e = ‖y − Dh‖    (3.17)

where y is the output of the system, D is a matrix containing the inputs, and h is a
filter that captures the best linear relationship between d and y. The error measures the
deviation of the system from a linear filter model, so it can quantify the linearity of
the system.
The least squares (LS) error in (3.17) can, of course, be found by first solving the
equation for h and then explicitly evaluating the expression. This is typically done
by applying orthogonal transformations to both sides of the equation such that the
matrix D is triangularized, which then makes the equation much easier to solve. The
transformation can be done through a series of Givens rotations, where each rotation
zeros an element in the subdiagonal of the matrix. In a real-time application with
growing sets of measurements, every time a new row is appended to D, a series of
Givens rotations is applied to zero that row. This updates the triangular matrix,
which is then used to update the LS solution. An example is illustrated in Figure 3.6.
The leftmost picture shows an already triangularized matrix D alongside the vector
y and it is assumed that the length of the filter h is 3. When the ADC takes a new
sample, a new row will be added to the equation. Three Givens rotations are then
applied that zero the new row. The LS solution h and the LS error in (3.17) can then
be calculated successively.
However, it can be shown that having the Givens rotation angles is sufficient for
Figure 3.7: Systolic array implementation of the RLS error calculation algorithm.
calculating the error of an LS problem, and there is no need for explicit calculation
of the filter coefficients [26, 27]. A proof of this theorem is presented in Appendix 1,
along with pseudocode that shows how the error can be updated online as the LS
equation grows over time. The interestingly simple final result is that
the residual error is proportional to the product of all the cosines of Givens rotation
angles:
en = (∏_{i=1}^{m} cos φi) · pn    (3.18)

where en is the error at time n, φi is the angle of the ith Givens rotation, and pn is
the result of the Givens transformations applied to the nth measurement yn.
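This identity is easy to verify numerically. The following sketch (a floating-point Python model of ours, not the thesis implementation) appends one measurement row at a time, zeroes it with Givens rotations, and checks that the product of cosines times the rotated measurement equals the a posteriori residual y_n − d_n^T h_n:

```python
import numpy as np

def qrd_rls_update(R, z, d, y):
    """Zero the appended row (d, y) against the triangular system (R, z),
    one Givens rotation per column; R and z are updated in place.
    Returns prod(cos) * p, the error e_n of (3.18)."""
    d = d.astype(float)
    m = len(d)
    cos_prod, p = 1.0, float(y)
    for i in range(m):
        rad = np.hypot(R[i, i], d[i])
        if rad == 0.0:
            continue                      # nothing to rotate in this column
        c, s = R[i, i] / rad, d[i] / rad
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * d[i:]     # rotate row i of R with the new row
        d[i:] = -s * Ri + c * d[i:]
        z[i], p = c * z[i] + s * p, -s * z[i] + c * p
        cos_prod *= c
    return cos_prod * p

rng = np.random.default_rng(0)
m, n = 3, 20
D = rng.standard_normal((n, m))
yv = rng.standard_normal(n)
R, z = np.zeros((m, m)), np.zeros(m)
worst = 0.0
for k in range(n):
    e = qrd_rls_update(R, z, D[k], yv[k])
    if k >= m:                            # once R is full rank, compare with
        h = np.linalg.solve(R, z)         # the explicit residual y_k - d_k^T h
        worst = max(worst, abs(e - (yv[k] - D[k] @ h)))
```

The filter coefficients h are computed here only to verify the identity; the point of (3.18) is precisely that the hardware never needs them.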
In this work, the RLS error calculation was implemented in software. However,
much can be done to make it more hardware-friendly. Specifically, the calculation of
the sin φi and cos φi terms in the algorithm (lines 7 and 8 of the pseudocode in
Appendix 1) can be done without any division or square root operation
using the CORDIC algorithm. In this case, the whole algorithm can be implemented
using simple operations such as addition, subtraction, bitshift and table lookup. Fur-
thermore, as Figure 3.7 shows, the algorithm can be pipelined in a systolic array
structure [27]. The systolic array in this case consists of a network of processing cells
and each cell is a CORDIC block. Rows of the data matrix and elements of y enter
the array from the top and the error is obtained at the bottom. Since the processing
is pipelined, the grid accepts new data and sends out an updated error in every cycle.
From Figure 3.7, it can be seen that the number of CORDIC blocks in the systolic
array is about (m + 2)(m + 3)/2, where m is equal to the width of the matrix D or
the length of the filter h. The number of logic gates in a CORDIC block is roughly
comparable to the number of gates in a multiplier. Assuming that the processing is
done with 15 bits of accuracy, the implementation of the systolic array will have
a gate count equivalent to about 100m^2 full adders (FAs).
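To make the shift-add principle concrete, here is a floating-point sketch of CORDIC of our own (not the fixed-point hardware): vectoring mode drives one coordinate of a vector to zero, and the recorded direction bits replay the same rotation on another vector, which is exactly what a Givens cell needs:

```python
import numpy as np

N = 16                                    # number of micro-rotations
K = np.prod([np.sqrt(1 + 4.0 ** (-i)) for i in range(N)])  # CORDIC gain

def cordic_vector(x, y):
    """Vectoring mode: drive y to zero; returns K*sqrt(x^2 + y^2), the
    residual y, and the direction bits encoding the rotation angle."""
    dirs = []
    for i in range(N):
        d = 1 if y < 0 else -1            # rotate toward the x-axis
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        dirs.append(d)
    return x, y, dirs

def cordic_rotate(x, y, dirs):
    """Rotation mode: replay the recorded micro-rotations (same angle)."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
    return x, y

xm, ym, dirs = cordic_vector(3.0, 4.0)    # magnitude 5, angle atan2(4, 3)
xr, yr = cordic_rotate(1.0, 0.0, dirs)    # unit vector rotated by -atan2(4, 3)
```

Every step uses only shifts and adds; in fixed-point hardware the division by 2^i becomes a bit shift and the constant gain K is folded into a single final scaling.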
3.3 Optimization of the Predistorter
In the previous section, we showed how to measure the nonlinearity of the signal path
by using the inputs and sampled outputs of the system. The nonlinearity error can
be fed into an optimization method that controls the predistorter parameters. The
optimizer then tunes the predistorter until the nonlinearity is minimized, i.e. the
predistortion best cancels the nonlinearity of the system.
The objective function to be minimized is a scalar function of the predistorter
parameters. The class of techniques that minimize such functions are the multivariate
descent methods. This multivariate optimization problem can be formulated as
min error = f(θ1, θ2, . . . , θl)
subject to θ ∈ Θ (3.19)
where Θ ⊂ Rl is the domain of acceptable parameters. The function f(·) relates the
nonlinearity of the signal chain—starting from the predistorter all the way through
the DAC, the output filter and the ADC sampling network—to the predistorter
parameters. In the previous section, we described an algorithm to evaluate f based
on output measurements. This, however effective, is far from having a closed-form
Figure 3.8: Search for the next simplex in the Nelder-Mead simplex algorithm. (The labeled points are θ(1), θ(l + 1), the centroid m, the exploratory points r, s, c, cc, and the shrink point v(l + 1).)
expression for the cost function. Therefore, the optimization must be driven only
by function evaluations, and cannot directly use its gradient. The class of optimization
techniques that do not explicitly use derivatives is known as unconstrained
optimization or direct search methods [28].
A very straightforward direct search method is the coordinate search. The search
starts by varying one parameter at a time by steps of the same magnitude. If one of
these steps yields a better cost function, the improved point becomes the new iterate;
otherwise the step size is halved, and the process repeats until the steps are sufficiently
small. There are quite a few variations of this basic algorithm [29]. Algorithms in this
class are slow—essentially because they do not use any curvature information—but
they are very reliable. They are also easy to implement and have low hardware
requirements.
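A minimal sketch of such a coordinate search (the function name, stopping rule, and evaluation budget are illustrative choices of ours):

```python
import numpy as np

def coordinate_search(f, theta, step=0.5, tol=1e-6, max_evals=10000):
    """Vary one parameter at a time; halve the step when no move helps."""
    theta = np.asarray(theta, dtype=float)
    fbest, evals = f(theta), 0
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                cand = theta.copy()
                cand[i] += delta
                fc = f(cand)
                evals += 1
                if fc < fbest:            # accept the improved point
                    theta, fbest = cand, fc
                    improved = True
        if not improved:
            step /= 2                     # shrink the steps and retry
    return theta

# Example: a simple quadratic bowl with its minimum at (1, -2).
opt = coordinate_search(lambda t: (t[0] - 1) ** 2 + 10 * (t[1] + 2) ** 2,
                        [0.0, 0.0])
```

Only function evaluations are used; no gradient is ever formed, which is what makes the method attractive when f is available only through measurement.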
The most widely cited of these unconstrained optimization methods is the Nelder-Mead
simplex algorithm [30, 31]. This algorithm takes a simplex of l + 1 points for an
l-dimensional vector θ and modifies the simplex until it tightly surrounds the minimizer.
The simplex is modified according to the following procedure, as illustrated in
Figure 3.8: First, the points in the simplex are sorted according to their corresponding
cost function from the lowest (best) cost function at θ(1) to the highest (worst) cost
function at θ(l + 1). Then, the cost function is evaluated at four points on a line that
passes through θ(l + 1) and the centroid of the other vertices. These four points are
given by
θ(l + 1) + α (m− θ(l + 1)) (3.20)
where m = (θ(1) + · · · + θ(l))/l is the centroid point and α ∈ {1/2, 3/2, 2, 3}. Depending
on the value of the cost function at the exploratory points r, s, c or cc, one of these
points replaces the current worst vertex. The simplex either reflects around the
centroid point (α = 2), contracts inside itself (α = 1/2), contracts outside itself
(α = 3/2), or expands (α = 3). If all of the exploratory points would make for a worse
simplex, then the whole simplex shrinks towards the best vertex.
In each iteration of the algorithm, the shape of the simplex adapts to
the local topology of the function. This adaptability makes the simplex algorithm
more efficient and faster than the coordinate search algorithm. The
pseudocode in Algorithm 2 describes the algorithm in more detail.
The algorithm is not guaranteed to succeed unless it starts from a
simplex that is close enough to the minimizer; in that case, it will move toward
the minimum very efficiently. Figure 3.9 shows an example of the progress of the
Nelder-Mead simplex algorithm in R2.
Figure 3.10 shows an example of an adaptive calibration with simplex optimiza-
tion. The top pictures show the nonlinear input-output characteristic of the system
as well as the output filter. After the calibration is done, the predistorter curve con-
verges to the inverse of the original nonlinearity, with limits set by the order of the
predistortion function. The last picture shows how the linearity improves with sim-
plex iterations. The linearity is denoted by the spurious-free dynamic range (SFDR),
which is defined as the ratio of the amplitude of the fundamental to the amplitude of
the largest in-band spur. This example only shows the calibration of a single core and
also ignores the discontinuities of the characteristic curve. In this case the simplex
algorithm converges with less than 100 function evaluations. Each of these function
evaluations corresponds to 1000 measurements of the output, which are passed to the
RLS algorithm for an update of the cost function. As we saw previously, the RLS
algorithm consists of a series of Givens rotations, and each of the Givens rotations
Algorithm 2 Nelder-Mead simplex algorithm
Require: an initial simplex θ(i), i = 1, . . . , l + 1
1: order the points from lowest function value f(θ(1)) to highest f(θ(l + 1))
2: while the maximum number of iterations is not exceeded do
3: calculate r = 2m − θ(l + 1), where m = Σ_{i=1}^{l} θ(i)/l, and evaluate f(r)
4: if f(θ(1)) ≤ f(r) < f(θ(l)) then
5: accept r and terminate this iteration
6: else if f(r) < f(θ(1)) then
7: calculate s = m+ 2(m−θ(l + 1)) and evaluate f(s)
8: if f(s) < f(r) then
9: accept s and terminate the iteration
10: else
11: accept r and terminate the iteration
12: end if
13: else if f(r) ≥ f(θ(l)) then
14: if f(r) < f(θ(l + 1)) then
15: calculate c = m+ (r−m)/2 and evaluate f(c)
16: if f(c) < f(r) then
17: accept c and terminate the iteration
18: end if
19: else if f(r) ≥ f(θ(l + 1)) then
20: calculate cc = m+ (θ(l + 1)−m)/2 and evaluate f(cc)
21: if f(cc) < f(θ(l + 1)) then
22: accept cc and terminate the iteration
23: end if
24: end if
25: end if
26: if no new point was accepted then
27: calculate the l points v(i) = θ(1) + (θ(i) − θ(1))/2, i = 2, . . . , l + 1 (shrink)
28: the simplex at the next iteration is θ(1), v(2), . . . , v(l + 1)
29: end if
30: end while
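For reference, the moves of Algorithm 2 condense into a few lines of Python. This is a sketch of ours (the explicit shrink fallback and iteration cap are implementation choices), not the thesis code:

```python
import numpy as np

def nelder_mead(f, simplex, max_iter=500):
    """Reflect/expand/contract/shrink moves of the Nelder-Mead algorithm."""
    simplex = [np.asarray(p, dtype=float) for p in simplex]
    for _ in range(max_iter):
        simplex.sort(key=f)               # best first, worst last
        worst = simplex[-1]
        m = np.mean(simplex[:-1], axis=0) # centroid of the best l points
        r = 2 * m - worst                 # reflection (alpha = 2)
        accepted = None
        if f(simplex[0]) <= f(r) < f(simplex[-2]):
            accepted = r
        elif f(r) < f(simplex[0]):
            s = m + 2 * (m - worst)       # expansion (alpha = 3)
            accepted = s if f(s) < f(r) else r
        elif f(r) < f(worst):
            c = m + (r - m) / 2           # outside contraction (alpha = 3/2)
            if f(c) < f(r):
                accepted = c
        else:
            cc = m + (worst - m) / 2      # inside contraction (alpha = 1/2)
            if f(cc) < f(worst):
                accepted = cc
        if accepted is not None:
            simplex[-1] = accepted
        else:                             # shrink toward the best vertex
            simplex = [simplex[0]] + [simplex[0] + (p - simplex[0]) / 2
                                      for p in simplex[1:]]
    return min(simplex, key=f)

# Example: the same quadratic bowl, starting from a unit simplex.
opt = nelder_mead(lambda t: (t[0] - 1) ** 2 + 10 * (t[1] + 2) ** 2,
                  [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```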
Figure 3.9: An example of the Nelder-Mead simplex algorithm. Panels (a)–(i): initial simplex, expand, expand, reflect, contract inside, contract inside, contract outside, contract outside, contract inside.
Figure 3.10: Matlab simulation of a calibration run. (The panels show the nonlinearity curve and the output filter, the converged predistortion curve versus normalized input, and the SFDR in dB versus simplex iterations.)
Figure 3.11: The adaptive calibration consists of three nested iterative algorithms (simplex iterations, Givens rotation iterations, and CORDIC iterations), with simple CORDIC iterations at the lowest level.
can be implemented through a series of CORDIC iterations. Therefore, as illustrated
in Figure 3.11, the whole predistortion algorithm boils down to a number of CORDIC
iterations at its heart. This algorithm is much simpler to implement than the
parameter estimation algorithm. However, it consists of nested iterative algorithms and
requires more time to converge. In essence, simplicity of the hardware implementation
is traded for a longer calibration time to make the algorithm physically realizable.
The number of simplex iterations depends on the number of optimization param-
eters, while the number of measurements required in the RLS algorithm depends on
the sampling noise. Both of these will directly affect the calibration time. In the next
section, we quantify the relationship between these variables.
3.4 Calibration Speed
In the previous section, we described how an unconstrained optimization can be used to
tune the predistorter parameters. It can be shown that if the number of parameters
is not too large, the number of function evaluations that the optimization algorithm
performs scales linearly with the number of parameters [32]. Thus,
s = k · l (3.21)
where s is the number of function evaluations of the Nelder-Mead simplex algorithm
and l is the number of predistorter parameters that get calibrated. The factor k is a
proportionality constant, which turns out to be on the order of 100 for the Nelder-Mead
method.
Each of the function evaluations called by the optimization algorithm measures
the error in
y = Dh+ ε (3.22)
where ε denotes the ADC sampling noise on top of y. In order to adjust the predis-
tortion parameters, we need to be able to solve this equation with a certain accuracy.
The added noise will degrade the level of accuracy that we can get from the above
equation. In order to compensate for noise, we must add more rows to y and D in
(3.22) and make them taller. This means that we need to gather more samples and,
therefore, the calibration time gets longer.
Equation (3.22) is a multiple linear regression problem: it describes a
linear relationship between the dependent variables in y and the explanatory
variables in D, with the filter h as the vector of regression coefficients. A linear
regression model can be easily fitted using the least squares
approach. Of course, we showed previously that all we care about is the error in the
equation and that can be found through a workaround without explicitly calculating
the unknown h. But in this section, let’s assume that we decide to calculate h and see
how the trade-off between the number of measurements and sampling noise affects
our estimation accuracy. The least squares solution for h is
ĥ = D†y    (3.23)
where
D† = (D^T D)^{-1} D^T    (3.24)
is the Moore-Penrose pseudoinverse of the matrix D. The error in h due to the
sampling noise ε is
h − ĥ = h − D†y = h − D†(Dh + ε) = −D†ε    (3.25)
The covariance of this error is
cov[D†ε] = D† cov[ε] (D†)^T = D† (σ_ADC^2 I) (D†)^T
         = σ_ADC^2 (D^T D)^{-1} D^T ((D^T D)^{-1} D^T)^T
         = σ_ADC^2 (D^T D)^{-1}    (3.26)
The matrix D contains the digital input data that enter the DAC. We can assume
that the elements of D are independent identically distributed random variables with
variance σ_d^2. Therefore, if the matrix D is tall enough, the diagonal entries of
D^T D are inner products of a random vector with itself and therefore
have a mean of nσ_d^2. The off-diagonal elements of D^T D are inner products of two
independent random vectors and, therefore, have zero mean. It can also be shown
that the average of the diagonal elements of (D^T D)^{-1} is approximately the reciprocal
of the average of the diagonal elements of D^T D:
mean(diag((D^T D)^{-1})) = 1 / mean(diag(D^T D)) = 1/(nσ_d^2)    (3.27)
Also, summing over the m diagonal elements,

Σ(diag((D^T D)^{-1})) = m · mean(diag((D^T D)^{-1})) = m/(nσ_d^2)    (3.28)
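These statistics can be spot-checked in simulation; under the i.i.d. assumption and n ≫ m, the diagonal of (D^T D)^{-1} behaves as predicted (the sizes and σ_d below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma_d = 2000, 8, 0.7              # tall data matrix, n >> m
D = rng.normal(0.0, sigma_d, size=(n, m))
G = np.linalg.inv(D.T @ D)

mean_diag = np.mean(np.diag(G))           # approaches 1/(n * sigma_d^2)
trace_G = np.trace(G)                     # approaches m/(n * sigma_d^2)
```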
We assume that the norm of the vector h is one, so that 1/σ_h^2 is the SNR in the
estimated filter coefficients. Also, we can write

yi = di^T h ⇒ σ^2(yi) = ‖h‖^2 σ_d^2 = σ_d^2    (3.29)

Therefore, the ratio σ_d^2/σ_ADC^2 is a measure of the ADC SNR. Thus,

m/n ∝ SNR_ADC / SNR_h    (3.30)
The required number of measurements directly scales with the number of filter coef-
ficients and the desired accuracy in the estimation of the filter, and inversely scales
with the ADC SNR. So, for example, if the desired SNR for the estimation of the
filter coefficients is 60 dB, then
n ∝ m · 10^{(60 − SNR_ADC)/10} ≈ m · 2^{(60 − SNR_ADC)/3}    (3.31)
where the ADC SNR is expressed in dB. We assumed in (3.21) that
the simplex optimization converges after s evaluations of its objective function.
Since each of these function evaluations requires n ADC samples, then, assuming that
the ADC works at a sampling frequency of fs, the calibration time will be

tcal = s · n / fs    (3.32)
Thus, combining (3.32), (3.21) and (3.31), we can write

tcal = α · (l · m / fs) · 2^{(60 − SNR_ADC)/3}    (3.33)

where α is a proportionality factor. For example, assuming α ≈ 100, l = 5, m = 10,
and a target SFDR improvement of 20 dB, the calibration will require approximately
500,000 samples and take about 0.5 s if the sampling is done at 1 MS/s.
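Equation (3.33) is straightforward to evaluate; the helper below (the function name is ours) reproduces the example numbers, assuming SNR_ADC = 40 dB so that the exponent corresponds to the 20 dB gap quoted in the text:

```python
def calibration_time(l, m, fs, snr_adc_db, alpha=100.0):
    """t_cal = alpha * l * m / fs * 2**((60 - SNR_ADC)/3), per (3.33)."""
    return alpha * l * m / fs * 2.0 ** ((60.0 - snr_adc_db) / 3.0)

# l = 5 parameters, m = 10 filter taps, 1 MS/s sampling, SNR_ADC = 40 dB
t = calibration_time(5, 10, 1e6, 40.0)    # about 0.5 s
```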
3.5 Summary
In this chapter, we talked about several calibration techniques for linearizing the
DAC. We eventually converged on an adaptive linearization method and showed its
advantages compared to system identification-based methods. We showed how an
optimization algorithm can be used to minimize the cost function by adjusting the
predistortion LUT, and investigated the system-level trade-offs that affect the cali-
bration time.
Chapter 4
Prototype Design
This chapter describes some of the most important aspects of the design of the
calibration ADC, which is implemented in a cyclic (algorithmic) architecture. Cyclic
ADCs are very similar to pipelines and, although not nearly as widely implemented,
have long been used and understood. However, the special requirements of this
project call for some unique features. In this chapter, we will focus on the features
that differentiate this ADC from typical general-purpose ADCs.
4.1 ADC Architecture
A cyclic ADC is similar to a single stage of a pipeline converter. In the first cycle of
a conversion, the cyclic stage acquires the input signal. In the following cycles, the
output of the stage—instead of moving through a pipeline—is repeatedly fed back
to its input until the conversion is complete. Because of its serial operation, a cyclic
converter cannot work as fast as a pipeline. Also, since it cannot utilize stage scaling,
a cyclic ADC is less energy efficient than a pipeline ADC.
As explained in Chapter 2, the calibration ADC can be very slow. Therefore, the
speed disadvantage of a cyclic architecture is not limiting in this case. To remedy
the energy inefficiency, a cyclic ADC could use tapered conversion cycles [33], where
the MSB residues get more time than the LSB residues. In the current application,
the power of the ADC is lowered due to relaxed noise requirement, and the usual
Figure 4.1: A traditional bottom-plate track-and-hold circuit.
inefficiency of cyclic converters due to the absence of stage scaling is not prominent
in this design. On the plus side, a cyclic ADC has a size advantage compared to
a pipeline. Also, there is only one residue gain involved in a cyclic converter and
therefore the calibration is straightforward. Lastly, the cyclic ADC in this work is
designed to have a programmable number of cycles. Therefore, in exchange of more
conversion time, the LSB size can be decreased. This was critical because we wanted
to have the option of experimenting with different signal levels at the DAC output
when testing the chip. Obviously, such programmability is not an option in a pipeline
architecture.
4.1.1 High Bandwidth Sampling
The biggest challenge of linear sampling is the design of the sampling switch. In
a typical track-and-hold circuit (Figure 4.1) the voltage dependent resistance of the
switch transistors degrades the linearity at high speeds [22].
An interesting solution to the problem of nonlinear sampling switch is to do away
with the switch altogether [34]. As shown in Figure 4.2, the idea is to use fixed resistors
to connect the input signal to the sampling capacitors, so there is no nonlinear element
in the signal path during sampling. In the hold mode, the top plates of the sampling
capacitors are shorted to block the input signal.
Due to the finite resistance of the shorting switches, there will still be some input
Figure 4.2: Track-and-hold circuit with RC sampling.
Figure 4.3: (a) Cascaded RC sampling stages to decrease feedthrough. (b) Nonlinear capacitances due to large switches and large resistors degrade tracking linearity.
Figure 4.4: Modified RC sampling based on Wheatstone bridge operation. (The sketches show the track- and hold-mode responses htrack(t) and hhold(t).)
feedthrough in the hold mode. In order to reduce the feedthrough, two [35] or three
[36] switched-RC branches can be cascaded and both the switches and the resistors
have to be made larger. This is illustrated in Figure 4.3(a). Shown in (b) are the
capacitances along the signal path in the track mode. We can see that large switches
or large resistors add nonlinear junction capacitances along the sampling path, which
decrease the amplitude of the sampled voltage and degrade the linearity at high
frequencies. Therefore, this type of sampling is only appropriate for low-frequency
operation.
An alternative implementation is to arrange the switches so that they form a
Wheatstone bridge in the hold mode, as shown in Figure 4.4. In this case, the input
suppression is achieved through resistance matching between switches and resistors
instead of resistive division. The matching cannot be made perfect so there would
still be some feedthrough. Again, the feedthrough can be made smaller by cascading
two stages; however, in this case there is no need for large resistors or switches and
it is possible to use minimum size devices instead. This will reduce the nonlinear
capacitance along the path and improve tracking linearity. Figure 4.4 also shows the
Figure 4.5: Comparison of three switch configurations. (a) Switch in series with a resistor, (b) RC sampler, (c) Wheatstone sampler.
resulting time-domain response of the sampling interface. During the track phase, the
sampler impulse response is a decaying exponential corresponding to the low-pass
filter formed by R and Cs. In the hold mode, the signal is shut off. If the time constant
RC is much larger than the sampling window, then the sampler approximates a
boxcar response, which, according to the discussion in Chapter 2, is the desired filter
before the ADC.
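The boxcar approximation can be checked with a simple first-order model of our own: the sample taken at the end of a track window of length T weights the input by w(τ) = (1/RC)·e^{−(T−τ)/RC}, which becomes nearly flat over the window when RC ≫ T:

```python
import numpy as np

def track_weight(tau, T, RC):
    """Weight applied to the input at time tau in [0, T] by an RC sampler
    whose hold instant is at t = T (first-order model)."""
    return np.exp(-(T - tau) / RC) / RC

T, RC = 1.0, 20.0                         # RC much larger than the window
tau = np.linspace(0.0, T, 1001)
w = track_weight(tau, T, RC)
flatness = w.max() / w.min()              # exp(T/RC) ~ 1.05: nearly boxcar
```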
Figure 4.5 recaps our discussion with a side-by-side comparison of three basic
switch arrangements. In (a), regular switches are paired with series resistors, which
are included for a fair comparison with the other two configurations. The series re-
sistor improves linearity by decreasing the contribution of nonlinear switch resistance
as well as lowering the input signal level. It also decreases the loading of the ADC
on the DAC driver which is important in this application. To improve the linearity
of the switch, it can be bootstrapped to achieve a signal independent resistance [37].
In the current application, the input signal Vin is the voltage across off-chip loads
and is centered around 2 V. This complicates the design of the bootstrap switch because
it requires a high-voltage charge pump. Also, to avoid breakdown, the switch itself
has to be implemented with high-voltage devices, which make worse switches than the
core transistors.
Figure 4.5(b) shows an RC-sampler. The signal feedthrough is proportional to
r/R, where r is the resistance of the shorting switches and R is the value of the pas-
sive resistor. As mentioned before, in order to decrease this ratio, the resistor and/or
Figure 4.6: Signal feedthrough in an RC and a Wheatstone sampler, as well as the switch parasitic capacitance, shown as a function of the switch width W. The length of the transistors in the switch is set to 300 nm, the total signal path resistance is R = 10 kΩ, and Cs = 500 fF. The resistance r in the Wheatstone sampler is chosen to minimize the signal feedthrough. It can be seen that the signal feedthrough in the Wheatstone sampler is about 3 orders of magnitude smaller than that of the RC sampler.
the switches must be made larger, and this will increase the nonlinear parasitic ca-
pacitance along the signal path. Figure 4.5(c) illustrates the proposed Wheatstone
sampler. The total resistance along the signal path during the track phase is again
equal to R. This resistance is broken into two parts: a first resistor equal to R − r,
which isolates the DAC and the ADC, and a second resistor equal to r, which
approximately matches the switch resistances r1 and r2 and results in minimum feedthrough.
The feedthrough can be found to be

Vft/Vin = r1/(R + r1) − R/(R + r2)    (4.1)

Assuming r1 = R(1 − ε1) and r2 = R(1 + ε2), we have

Vft/Vin ≈ (ε2 − ε1) / (4 + 2(ε2 − ε1))    (4.2)
The switch resistances r1 and r2 are voltage dependent, so the feedthrough cannot
be made exactly zero. However, as we can see from (4.2), the feedthrough depends
on the difference of the errors in r1 and r2. Therefore, it can be made small by
making the switch resistances have a symmetric profile around the input common
mode voltage. Furthermore, unlike the RC sampling configuration, there is no need
for a large resistor or large switches. The switches must be designed to match the
resistor and this results in much smaller switches than in the RC sampler.
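A quick numerical check of (4.1) and the small-error approximation (4.2), using the bridge-divider form given above (the resistance and mismatch values below are arbitrary):

```python
def feedthrough_exact(R, r1, r2):
    """Vft / Vin for the hold-mode bridge, per (4.1)."""
    return r1 / (R + r1) - R / (R + r2)

def feedthrough_approx(eps1, eps2):
    """First-order expression (4.2) with r1 = R(1-eps1), r2 = R(1+eps2)."""
    return (eps2 - eps1) / (4.0 + 2.0 * (eps2 - eps1))

R = 10e3
eps1, eps2 = 0.05, 0.03                   # a few percent of switch mismatch
exact = feedthrough_exact(R, R * (1 - eps1), R * (1 + eps2))
approx = feedthrough_approx(eps1, eps2)
balanced = feedthrough_exact(R, R, R)     # perfectly matched bridge: zero
```

As (4.2) suggests, only the difference of the two mismatch terms matters, so even several percent of absolute switch error leaves the feedthrough below one percent of the input.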
Figure 4.6 compares the signal feedthrough in the hold phase for an RC sampler
and a Wheatstone sampler. The sampler circuits are implemented as shown in Fig-
ures 4.5(b) and (c). The total track phase resistance in each case is set to R=10 KΩ,
the sampling capacitance is Cs = 500 fF, and the input signal is 1 Vppd. The switches
are implemented with an anti-parallel pair of I/O PMOS devices with a length of
300 nm. The signal feedthrough is plotted as a function of the width of the PMOS
transistors, W. For equal values of tracking resistance and identical switches, the
RC sampler and the Wheatstone sampler achieve comparable linearity performance.
However, as seen in the figure, the feedthrough in a Wheatstone sampler is
approximately three orders of magnitude smaller than that of an RC sampler. For a
fixed value of R, the feedthrough in an RC sampler can be made smaller by increas-
ing W , which in turn, adds to the parasitic capacitance along the path. The signal
Figure 4.7: RC switch implementation. (a) Two Wheatstone samplers are cascaded to further reduce the signal feedthrough; (b) all of the switches are implemented using anti-parallel thick-oxide PMOS devices.
feedthrough drops as 1/W whereas the parasitic capacitance grows proportional to
W . Therefore, considering the numbers in the figure, in order to make the feedthrough
of the RC sampler as small as the feedthrough of the Wheatstone sampler, we have
to use very wide transistors with significant parasitic capacitances.
Another requirement of the sampling network is that it must shield the ADC core
from the large input voltage levels. In this design, the input common mode voltage
is stored on the sampling capacitors. As a result, it can change from 1.5 V to 2
V without affecting the sampling linearity. A schematic of the Wheatstone sampler
circuit implemented in the ADC is shown in Figure 4.7. It consists of a cascade of
two Wheatstone samplers to further reduce the signal feedthrough. The switches in
the sampling network are designed with thick-oxide I/O devices with an aspect ratio
of 2 µm/240 nm. The isolating resistor in the sampling network equals 15 kΩ, which
is much larger than the DAC output resistors in order to reduce loading. Also, r =
1 kΩ and Cs = 400 fF. All resistors are implemented using high-resistance poly (HR
poly), which exhibits tight variation.
Two replica sampling networks are used in parallel with the main sampling net-
work in order to regulate the ADC sampling current. During every DAC pulse, either
one of the replicas or the main sampling network is connected to the DAC output;
therefore, any glitch due to sampling is pushed out of the DAC bandwidth to its
Figure 4.8: Basic two-phase operation of the ADC. (a) End of previous phase 2; (b) phase 1: addition of ±Vref/2; (c) phase 2: multiplication; (d) end of phase 2.
sampling frequency. Figure 5.2 in Chapter 5 illustrates a schematic of the sampling
circuit with the replica networks.
4.1.2 ADC Core
The ADC core is a variant of the cyclic architecture similar to the design in [38]. All
of the core operations are achieved through different configurations of one OTA and
six pairs of capacitors Cs, Cf , and C1 to C4. The conversion starts with sampling
the input signal on the sampling capacitor Cs (track phase in Figure 4.4). Then the
voltage is amplified and transferred to a capacitor C1 (hold phase in Figure 4.4).
This amplification compensates for the signal attenuation through the switched-RC
network and reduces the input-referred noise of successive cycles. Next, the ADC
goes back and forth between two phases until it resolves the required number of bits.
In the first phase, as shown in Figure 4.8(b), the voltage on C1 is added or subtracted
from the reference voltage and transferred onto capacitors C3 and C4. In the second
phase (Figure 4.8(c)), the voltage is multiplied by two by combining the charges on
Figure 4.9: Charge injection from the bottom-plate switch.
C3 and C4. The resulting voltage is transferred back onto C1 and makes the new
residue voltage. During phase 1 the OTA drives C3 and C4 and during phase 2 it
drives C1 and the comparator. So even though it is possible to combine the reference
addition and the multiplication by two in one phase, separating them helps balance the
OTA load.
Since the noise requirement for the ADC is relaxed, we can use small capacitors
in the design to lower the power. However, small capacitors are more susceptible to
charge injection errors. As a result, certain design aspects become more important
to mitigate these errors and maintain high linearity. We will next describe the two
most important of these design aspects.
Switch Implementations
Apart from the sampling switch that has direct impact on the linearity of the sampled
voltage, other switches also need to be carefully designed to maintain linear processing
of the signal. In this circuit, the bottom plate switches are implemented with anti-
parallel NMOS transistors and the top plate switches are transmission gates. Several
considerations must be taken into account in sizing these switches. First, their resis-
tances must be small enough so that their associated time constants do not slow the
settling performance of the circuit. Also, the switch resistances must be examined to
avoid phase margin degradation. Another important caveat concerns the
charge injection from the bottom-plate switches, illustrated in Figure 4.9. It is known
that the channel charge from the bottom plate switches is signal independent and
so bottom plate sampling is immune to nonlinear charge injection. However, when
a bottom plate switch opens, the splitting of the channel charge between its source
and drain is a function of the resistance seen from these terminals. Therefore, the
charge that will be injected onto the bottom plate of the sampling capacitor depends
on the resistance of the top plate switch and consequently on the signal level. When
the circuit processes a large differential voltage, the different signal levels in the two
halves of the circuit can cause non-equal resistances in top-plate switches and result
in mismatched charge injection from bottom-plate switches. In order to reduce the
effect of this error charge, the top-plate switches must be designed to have a balanced
resistance profile around the common-mode voltage. A residual error still remains
due to imperfect symmetry and mismatch between switches in the two halves of the
circuit, which was verified to be negligible through simulations.
ADC Phase Transitions
A cyclic ADC operates through the rearrangement of a few circuit components to
accomplish the operations of an extended pipeline converter. As a result, the
connectivity and the switching patterns are more complicated in a cyclic ADC than in a
pipeline. This can be seen from Figure 4.10, which illustrates a single-ended diagram
of the MDAC along with the clock signals. It can be seen that the ADC requires
many more clock phases than, e.g., a pipeline which operates using a pair of non-
overlapping clock signals. It is important, then, to carefully arrange all clock edges
in order to preserve the sampled charge while transitioning between different phases.
Figure 4.11 illustrates an example of incorrect phase transitions. Panels (a) to
(d) show the charge distributions during phase 1 and phase 2 operation, and panels
(e) and (f) show stages of the transition between phases 1 and 2. The end of phase 1
operation is shown in (c). When transitioning to phase 2, the first step is to open the
bottom-plate sampling switch, as is the case in pipeline ADCs, which in this case
is the switch to the left of C4. This state is shown in (e). There are still eleven more
Figure 4.10: More detailed diagram of the ADC MDAC. (a) Finite state machine representing phases of operation. (b) ADC clocking signals. (c) Single-ended MDAC.
(a) Beginning of phase 1. (b) End of phase 1. (c) Beginning of new phase 2. (d) End of phase 2. (e) Transition between phase 1 and 2. (f) Transition between phase 1 and 2.
Figure 4.11: Phase 1 operation duplicates the residue from C1 in (a) onto C3 and C4
in (b). In phase 2, C3 and C4 charges add up (c) and construct new residue on C1
(d). In the transition between phases 1 and 2, the bottom-plate switch is opened first
(e). If the top-plate switches are not opened right after that, switching glitches can
get transferred to the bottom-plate and cause charge leakage (f).
switches that must change state before we reach phase 2. We must open the switches
that connect C3 and C4 to the output of the amplifier, close the switches on the two
sides of C2, disconnect C1 from VCM and connect it to the output, and also switch
the summing node of the amplifier. Interestingly, out of the many possible switching
sequences, only a few are safe. For example, Figure 4.11(f) shows a failing scenario in
which the switch to the left of C2 is closed next. This could result in a negative
voltage excursion on the left plate of C2 that gets coupled to the node between C3
and C4. A negative impulse on that node could partially turn on the bottom-plate
switch and lead to charge leakage.
4.1.3 Foreground Calibration
The output of the cyclic ADC is related to the output bits through the following
relationship:
$$V_{out} = V_{ref}\sum_i g^{-i} d_i + \varepsilon + V_{os} \qquad (4.3)$$
where g is the stage radix, ε is the unresolved residue and Vos is the offset of the
ADC, and they all contain random errors due to circuit mismatches and non-idealities.
However, it is only the stage radix that governs the linearity of the converter and it
can be found through a foreground calibration [38].
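As a sketch of how a relationship like (4.3) is used in practice, the snippet below reconstructs a voltage from per-stage decisions using a calibrated radix; the radix value and bit pattern are hypothetical illustrations, not taken from the prototype.

```python
def decode_cyclic(bits, g, v_ref=1.0):
    """Reconstruct the sampled voltage from stage decisions d_i using the
    calibrated stage radix g, per Eq. (4.3) with residue and offset ignored."""
    return v_ref * sum(d * g ** -(i + 1) for i, d in enumerate(bits))

# Hypothetical example: a radix slightly below 2, as in a redundant design
print(decode_cyclic([1, -1, 1, 1, -1, 1, -1, -1], g=1.93))
```

With an ideal radix of exactly 2 this reduces to ordinary binary weighting; the foreground calibration finds the actual, mismatch-affected value of g.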
4.1.4 OTA Structure
The OTA is implemented as a two-stage Miller compensated amplifier with nested
gain boosting, as illustrated in Figure 4.12(a). The first stage of the top-level OTA as
well as all the nested OTAs are implemented as folded cascode amplifiers. A schematic
of a nested OTA is shown in Figure 4.12(b).
The boosted output impedance will have a pole-zero doublet at the unity gain
frequency of each auxiliary amplifier. To avoid slow settling, these doublets must be
pushed beyond the unity gain frequency of the main amplifier, hence making “fast
doublets” [39]. The feedback factor of the auxiliary amplifiers is near unity, while that
of the main amplifier is usually a fraction of unity. The above condition therefore does not
Figure 4.12: Top-level and booster amplifier schematics (VDD = 1 V). (a) Top-level two-stage gain-boosted OTA. (b) Booster amplifier.
require the auxiliary amplifiers to be faster than the main OTA. The transistors in the
auxiliary amplifiers have smaller widths and non-minimum channel lengths and are biased
at low currents, thus providing high gain. The main stage, on the other hand, has fast
transistors. This separation of gain and speed requirements allows the use of minimum
channel length transistors at the input.
The nested OTAs use active common-mode feedback sensing to preserve their
gains, whereas simple resistive averaging was enough to sense the output common-
mode at the second stage of the top-level OTA. The overall OTA achieves a gain of
more than 95 dB and a unity gain frequency of 130 MHz.
4.2 Noise Analysis
In this section, we review the noise analysis of the ADC. First we consider the noise
contribution of the amplifiers. Due to the complexity of the arithmetic involved, there are
multiple recipes for the derivation of noise, each with a different degree of accuracy.
We briefly review these methods to present a unified view. Next, we revisit the
sampling noise, keeping an eye on the differences between the proposed sampling and
the regular sampling processes. We finally combine the results to get the total noise
of the converter.
4.2.1 Review of Amplifier Noise Analysis
The total output noise of an amplifier in a sampled system is obtained by integrating
its output noise power across all frequencies from zero to infinity. The output noise
power is found by multiplying each noise source in the circuit by the magnitude
square of its transfer function to the output of the amplifier and then adding up all
the results.
$$N_{out,tot} = \sum_i \int_0^\infty S_i(f)\,|H_i(f)|^2\,df \qquad (4.4)$$
where Si is the one-sided noise power spectral density (PSD) of the ith component
and Hi(f) is its corresponding transfer function. Here, we will only consider thermal
noise of transistors and resistors because these are the dominant sources of electronic
Figure 4.13: Small-signal model of the OTA with varying details. (a) Simplified model of the OTA with all noise sources along the signal path. (b) Simplified model of the OTA with dominant noise sources.
Figure 4.14: Typical pole-zero plots of amplifiers. (a) Miller-compensated amplifier. (b) Cascode-compensated amplifier.
noise in the circuit. The channel noise of transistors can be modeled as a current
source between the drain and source terminals and its one-sided PSD in active region
is approximately given by:
$$S_{MOS}(f) = 4k_B T \gamma g_m \qquad (4.5)$$
where kB is the Boltzmann constant, kB = 1.38 × 10−23 J/K, T is the absolute
temperature in kelvin, γ is a coefficient equal to 2/3 for long-channel devices
[40], and gm is the transconductance of the transistor. The thermal noise of a resistor
R can be modeled by a voltage source with the following PSD:
$$S_R(f) = 4k_B T R \qquad (4.6)$$
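As a quick numerical check of (4.5) and (4.6), the snippet below evaluates both PSDs at room temperature; the gm and R values are arbitrary illustrations.

```python
K_B = 1.38e-23  # Boltzmann constant, J/K

def psd_mos(gm, T=300.0, gamma=2.0 / 3.0):
    """One-sided channel thermal noise PSD of a MOS device, Eq. (4.5), in A^2/Hz."""
    return 4.0 * K_B * T * gamma * gm

def psd_res(R, T=300.0):
    """One-sided thermal noise PSD of a resistor, Eq. (4.6), in V^2/Hz."""
    return 4.0 * K_B * T * R

print(psd_mos(gm=1e-3))  # about 1.1e-23 A^2/Hz
print(psd_res(R=1e3))    # about 1.66e-17 V^2/Hz
```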
Deriving the noise transfer function (NTF) for each noise source in a circuit and
then carrying out the noise integrals can be done accurately, even with symbolic
expressions. However, including every single device in the calculations can lead to very
“high-entropy” results that provide little design intuition. Depending on the desired
accuracy, for example whether it is a back-of-the-envelope design or part of an
automated circuit optimization program, we can make assumptions to simplify the
calculations.
A first order approximation of this integral can be obtained by approximating the
filter response with a box and simply multiplying the noise PSD by an equivalent
noise bandwidth [41].
$$N_{out,tot} = \sum_i \int_0^\infty \overline{v_{n,out,i}^2}\,df = \overline{v_{n0}^2}\cdot B_n \qquad (4.7)$$
where $\overline{v_{n0}^2}$ is the total low-frequency output-referred noise PSD of the circuit and $B_n$ is the
equivalent noise bandwidth, chosen such that the above equation holds. The main
contributors to the low-frequency noise are the non-cascode transistors of the first
stage of the amplifier. The contribution of the second stage can be ignored because its
devices do not see the gain of the first stage. The cascode transistors also have negligible
low-frequency gain, so their noise contribution is negligible.
In order to determine the noise bandwidth, we consider the typical pole-zero plots
of amplifier closed loop responses as shown in Figure 4.14. The first picture shows the
case where all poles are real as in a Miller-compensated amplifier and the second one
shows an amplifier with a real and a pair of complex conjugate poles. Although these
pole-zero plots look crowded, first-order approximations can be worked out to obtain
the noise bandwidth [42] [43]. For example, in the first case a first-order approxima-
tion would be to ignore all the poles except the dominant one and approximate the
amplifier with a one-pole system, in which case we have Bn = π/2 · p1.However, the single pole model is quite often a very crude model of the amplifier.
Usually amplifiers are designed for a phase margin of 60 to 70—as opposed to 90—
for best settling response. This means that the higher order poles already begin to kick
in around the unity gain bandwidth of the amplifier. Therefore, a better estimation is
obtained if the effect of these higher order pole are also considered in noise bandwidth
calculation. A small signal model of the amplifier considering the effect of the second
pole is shown in Figure 4.13(b). It is seen that the noise generating elements are now
the input transistors of the first and second stage and the nulling resistor, and they
each see a different transfer function to the output. In fact, due to the dominant pole
at node v1, the noise of M1 is more severely filtered compared to M2 and the resistor,
and so the previous argument about ignoring the noise of second stage components
based on DC gain analysis is inaccurate. The noise integrals are more involved in
this case but the math can nevertheless be worked out. Assuming R = 1/gm2 and
neglecting C1 compared to Cc and CL, we can obtain the following results [44].
$$N_{OTA,single\text{-}ended} = N_1 + N_2 + N_R = \frac{1}{\beta}\,\gamma\,\frac{k_B T}{C_c} + \frac{k_B T}{C_{L,tot}}(\gamma + 1) \qquad (4.8)$$
Comparing Figures 4.13(a) and 4.13(b), we can see that the input cascode device and
the gain boosting device are still absent from our noise equations. The noise of these
devices can only appear at the output at high frequencies, close to the cascode pole related
to node vx. However, while designing the amplifier, we made sure to push out the
cascode pole far away from the first two dominant poles in order to preserve phase
margin. Also, both the cascode as well as the booster device are located in the first
stage where they are influenced by the dominant poles of the loop gain. Therefore, at
frequencies where the noise of these devices starts to leak out, the signal transfer from
v1 to the output has already vanished and the noise is shunted to ground. It is thus
reasonable to ignore the noise of these two devices. Next, we include the
noise of active load devices M11, M12, M13 in the first stage and M21 in the second
stage and double the noise power to account for the other half of the circuit (see
Figure 4.12(a)). Thus we get
$$N_{OTA} = \frac{2}{\beta}\frac{k_B T\gamma}{C_c}\left(1 + \frac{g_{m11} + g_{m12}}{g_{m1}}\right) + \frac{2k_B T}{C_{L,tot}}\left(\gamma\left(1 + \frac{g_{m21}}{g_{m2}}\right) + 1\right) \qquad (4.9)$$
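A sketch evaluating the OTA noise expression of (4.9) numerically; all device values below are hypothetical placeholders, not the sizing of this design.

```python
K_B, T = 1.38e-23, 300.0  # Boltzmann constant and absolute temperature

def ota_noise(beta, Cc, CL_tot, gm1, gm11, gm12, gm2, gm21, gamma=2.0 / 3.0):
    """Total OTA noise power per Eq. (4.9), covering both circuit halves, in V^2."""
    first_stage = (2.0 / beta) * (K_B * T * gamma / Cc) * (1.0 + (gm11 + gm12) / gm1)
    second_stage = (2.0 * K_B * T / CL_tot) * (gamma * (1.0 + gm21 / gm2) + 1.0)
    return first_stage + second_stage

# Hypothetical sizing: load gm's at a fraction of the input-pair gm
n = ota_noise(beta=0.5, Cc=1e-12, CL_tot=2e-12,
              gm1=1e-3, gm11=0.25e-3, gm12=0.25e-3, gm2=2e-3, gm21=1e-3)
print(n)  # about 2.5e-8 V^2, i.e. roughly 0.16 mV rms
```

Note how the ratios of the load transconductances to the input transconductances directly scale the noise, which is why load devices are typically degenerated or biased for low gm.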
4.2.2 Notes on Sampling Noise
In this section, we consider the noise that is collected in the sampling process. The
present sampling method exhibits non-stationary noise and incomplete settling, which
makes it different from common sampling with stationary noise, so it is worth
revisiting the result.
Figure 4.15(a) shows a transient waveform of a sampled signal with stationary noise.
The signal exhibits noisy perturbations during track phases and it is frozen in time
during hold phases. At the beginning of every track phase, the signal starts off from
the value at which it was previously held. If we eliminate the hold phases and stitch the
track phase waveforms together we will get a continuous signal whose statistics do
not change over time, and therefore is stationary. The result is no different from the
output of an RC branch that operates in continuous time and has no switches. The
output noise power in this case can be found by integrating the filtered output noise
PSD, as shown in Figure 4.16(a).
Since the input PSD is directly proportional to R and the circuit bandwidth is
inversely proportional to it, the resistance value drops out and we get the familiar
Figure 4.15: T/H transient noise.
Figure 4.16: Derivation of kT/C result. (a) Frequency domain analysis of a continuous RC branch. (b) Time domain analysis of switched RC branch.
kT/C result. This also does not depend on the length of the tracking phase and
whether the settling is incomplete or not. If the track phase is short compared to the
time constant of the branch, successive noise samples will be correlated. However, as
we take more and more samples they will be spread across longer time frames and
their statistics will approach those of the continuous time process.
The waveforms of a non-stationary noise process are shown in Figure 4.15(c),
which also applies to the Wheatstone sampler used in this work. In this case, the
periods with large signal perturbations denote the track phases and the periods with
quieter variations are hold phases when the sampling capacitor is being reset. The
branch resistance during the track phase is much larger than the resistance of the
resetting switches so the noise variance is larger in track phase compared to hold
phase. From the noise eye diagram in Figure 4.15(d) we see that the noise is not
stationary, which is in contrast with the eye diagram in Figure 4.15(b) that does not
show any time dependence. In this case, the usual frequency domain analysis does
not apply. Instead, the noise power can be found through time-domain analysis of
the noise process’ autocorrelation function [45][46].
$$R_{yy}(t_1, t_2) = h(t_1) * R_{xx}(t_1, t_2) * h(t_2) \qquad (4.10)$$
where Rxx and Ryy are the autocorrelation of the input and output noise, and h(t)
is the impulse response of the RC filter. The variance of the output noise can
be found from its autocorrelation as follows:
$$\sigma_y^2(t) = R_{yy}(t, t) \qquad (4.11)$$
Since the input noise is white, its autocorrelation is a Dirac delta function:
$$R_{xx}(t_1, t_2) = \frac{S_{x0}}{2}\,\delta(t_1 - t_2) \qquad (4.12)$$
where Sx0 = 4kBTR is the one sided white noise PSD of the resistor. Combining
these equations, we can carry out the convolution and find the output noise power as
follows:
$$\sigma_y^2(t) = \frac{S_{x0}}{2}\int_0^{T_s} h^2(\tau)\,d\tau = 2k_B T R\cdot\frac{1}{2\tau}\left(1 - e^{-2T_s/\tau}\right) = \frac{k_B T}{C_s}\left(1 - e^{-2T_s/\tau}\right) \qquad (4.13)$$
Therefore, we see that the sampled noise grows exponentially until it reaches the final
value of kBT/Cs. Another noise source that must be included in this result is the
noise that is stuck on the capacitor from the reset phase. Since the time constant of
the branch is much shorter in this phase, the noise variance reaches the steady value
of kBT/Cs. During the following track phase, this voltage decays exponentially and
its final value at the end of the track phase will be
$$\overline{v_{n,reset}^2} = \frac{k_B T}{C_s}\,e^{-2T_s/\tau} \qquad (4.14)$$
Combining the contributions of the two noise sources, we get the total sampled
noise as follows:
$$\overline{v_n^2} = \frac{k_B T}{C_s} \qquad (4.15)$$
Once again, we arrive at the pervasive kBT/Cs expression. The noise being integrated
increases during the sampling window while the noise power stored on the capacitor
decreases, both exponentially and with the same time constant, so the total noise stays
the same, as shown in Figure 4.16(b). Interestingly, the way the branch resistance falls
out of the equation in this time-domain analysis is reminiscent of how the noise power
of an RC branch is independent of the resistor value in the frequency-domain analysis.
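The cancellation can be verified numerically: the growing track-phase noise of (4.13) and the decaying reset-phase remnant of (4.14) always sum to kT/Cs. The branch values below are arbitrary illustrations.

```python
import math

K_B, T = 1.38e-23, 300.0  # Boltzmann constant and absolute temperature

def sampled_noise(Cs, R, Ts):
    """Track-phase noise (4.13) plus the decayed reset-phase noise (4.14), V^2."""
    tau = R * Cs
    track = (K_B * T / Cs) * (1.0 - math.exp(-2.0 * Ts / tau))
    reset = (K_B * T / Cs) * math.exp(-2.0 * Ts / tau)
    return track + reset

Cs = 200e-15  # hypothetical sampling capacitor
for Ts in (0.1e-9, 1.25e-9, 10e-9):  # total equals kT/Cs regardless of Ts/tau
    print(sampled_noise(Cs, R=1e3, Ts=Ts), K_B * T / Cs)
```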
4.2.3 Track-and-hold Noise Analysis
The noise contribution at the output of a track-and-hold stage consists of the noise
that is generated during track mode and gets transferred to the output, as well as
the hold mode noise. We already calculated the noise contribution of the sampling
capacitor. But the track mode noise also includes contributions from the other capac-
itances that are connected to the amplifier virtual ground node, such as the feedback
capacitor Cf and the parasitic capacitance of that node Cpar. It can be shown that
the total output referred noise of the track mode is [47]
$$N_{track} = 2k_B T\,\frac{C_s + C_f + C_{par}}{C_f^2} \qquad (4.16)$$
The noise added to output during hold-mode operation primarily consists of the
noise of the amplifier as well as the thermal noise of switches:
$$N_{hold} = N_{OTA} + N_{switches} \qquad (4.17)$$
The amplifier noise follows from (4.9). The switches' noise contributions are found
by transferring their noise PSDs to the output and integrating them across all
frequencies. Noise voltages corresponding to switches that are in series with the
sampling capacitor see a gain of (Cs/Cf )2 while others that are in the feedback path
or are connected to the load see unity gain. We assume a second order filter response
with a resonant frequency ω0 and quality factor Q. Therefore, the noise contribution
of switches can be obtained from
$$N_{switches} = 4k_B T\left(\sum R_s\left(\frac{C_s}{C_f}\right)^2 + \sum R_f + \sum R_L\right)\frac{\omega_0 Q}{4} \qquad (4.18)$$
where the summations are carried out over all switch resistances in series with the
sampling, feedback, and load capacitors in the complete differential circuit. The
values of ω0 and Q are related to the loop gain unity gain frequency ωc and phase
margin PM as follows:
$$\omega_0 = \omega_c\,\sqrt[4]{1 + \tan^2(PM)}, \qquad Q = \frac{\sqrt[4]{1 + \tan^2(PM)}}{\tan(PM)} \qquad (4.19)$$
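A sketch combining (4.18) and (4.19); the crossover frequency, phase margin, switch resistances, and capacitor values below are hypothetical.

```python
import math

K_B, T = 1.38e-23, 300.0

def second_order_params(wc, pm_deg):
    """Closed-loop omega_0 and Q from the loop-gain crossover wc (rad/s)
    and phase margin in degrees, per Eq. (4.19)."""
    t = math.tan(math.radians(pm_deg))
    root = (1.0 + t * t) ** 0.25
    return wc * root, root / t

def switch_noise(Rs_sum, Rf_sum, RL_sum, Cs, Cf, w0, Q):
    """Switch thermal noise contribution per Eq. (4.18), in V^2."""
    return 4.0 * K_B * T * (Rs_sum * (Cs / Cf) ** 2 + Rf_sum + RL_sum) * w0 * Q / 4.0

w0, Q = second_order_params(wc=2 * math.pi * 130e6, pm_deg=65.0)
print(Q)  # about 0.72 for a 65-degree phase margin
print(switch_noise(Rs_sum=400.0, Rf_sum=200.0, RL_sum=100.0,
                   Cs=1e-12, Cf=0.5e-12, w0=w0, Q=Q))
```

The factor ω0·Q/4 plays the role of an equivalent noise bandwidth (in rad/s) for the second-order closed-loop response.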
4.2.4 ADC Noise Analysis
In order to calculate the total converter noise, we follow the signal path to see what
noise sources are added to the signal. First, the input signal is sampled through the
RC sampling network and accrues a noise given by (4.16). Then, in the hold phase, the
sampled voltage is transferred from Cs onto C1, while being amplified by Cs/Cf
and acquiring a noise of NSH. Next, the two-phase ADC operation starts and repeats
until the desired number of bits are resolved. We denote the noise contributions in
phase 1 and 2 by N1 and N2, respectively. Each time the signal goes through the
phase 2 operation, it is amplified by a factor of 2. The total input referred noise of
the cyclic ADC is therefore
$$N_{ADC} = N_{track} + \frac{N_{SH} + \left(N_1 + \dfrac{N_2}{4}\right)\left(1 + \dfrac{1}{4} + \dfrac{1}{16} + \cdots\right)}{\left(C_s/C_f\right)^2} \qquad (4.20)$$
where the noise components NSH, N1, and N2 follow from the hold-mode noise equation
(4.17), the only difference being the capacitive network in which the amplifier is
embedded.
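As a sketch, the total from (4.20) can be tallied as below, using the closed form 4/3 for the geometric series 1 + 1/4 + 1/16 + ···. All noise powers and the Cs/Cf ratio are hypothetical, and it is assumed here that the (Cs/Cf)² factor input-refers the post-sampling contributions.

```python
def adc_input_noise(N_track, N_SH, N1, N2, cs_over_cf):
    """Total input-referred cyclic ADC noise, a sketch of Eq. (4.20).
    The geometric series 1 + 1/4 + 1/16 + ... sums to 4/3."""
    series = 4.0 / 3.0
    return N_track + (N_SH + (N1 + N2 / 4.0) * series) / cs_over_cf ** 2

# Hypothetical noise powers in V^2
print(adc_input_noise(N_track=2e-8, N_SH=1e-8, N1=1e-8, N2=1e-8, cs_over_cf=2.0))
```

The bounded series reflects the fact that each pass through the phase-2 gain of 2 attenuates later noise contributions by 4 when referred back to the input.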
4.3 Summary
We covered some of the important and unique aspects of the design of the calibra-
tion ADC. Input sampling is accomplished by the Wheatstone sampling technique
presented in this chapter. Due to the small sizes of capacitors, care was taken to
minimize charge injection errors from switches considering second order effects that
are usually insignificant in regular designs. Appropriately sequencing clock edges be-
tween various phases of the ADC is another important aspect in this design that was
pointed out. Finally, the noise analysis of the complete converter was reviewed.
Chapter 5
Experimental Results
A prototype chip was taped out as a proof of concept for the proposed architecture.
The chip was fabricated in UMC's 90-nm, 9-metal, 1-poly CMOS process. The
die micrograph is shown in Figure 5.1(a). The die has a total area of 2 mm × 2
mm and was packaged in an 88-pin QFN package. The active area of the DAC is
0.36 mm2 and that of the ADC is only 0.04 mm2. The chip contains three slightly
different DAC-ADC pairs for testing purposes. The layout of the ADC is shown in
Figure 5.1(b).
5.1 Test Setup
The chip I/O interface is shown in Figure 5.2. The interface is passive and bidirec-
tional and was used both for capturing the data out of the DAC as well as testing the
ADC. Both single-ended as well as differential I/O options are included on the test
board. For the purpose of testing the ADC, the input signal is generated from an
HP8644B low-jitter, high-performance RF signal generator. It is then filtered through
a K&L tunable bandpass filter to block the spurious tones before connecting to the
test board. A cascade of two ETC-1-1-13 baluns are used on the board for single-
ended to differential conversion with small amplitude imbalance. The 50 Ω resistors
are the off-chip loads of the DAC. The inductors in parallel with the loads are used to
set the input common-mode voltage when testing the ADC. The combination of the
CHAPTER 5. EXPERIMENTAL RESULTS 71
Figure 5.1: Die photo of the prototype chip and the ADC layout. (a) Chip micrograph. (b) ADC layout. The chip also includes two other DAC-ADC pairs (not outlined in the photo) for debugging options.
Figure 5.2: Chip I/O interface. The Wheatstone samplers are simplified in this schematic.
2 pF capacitor and 22 Ω resistors provide an additional stage of filtering right at the
border of the chip. The value of the capacitor is adjusted depending on the frequency
of the test input to the ADC. This capacitor is removed when the DAC is under test
with a broadband signal. As shown in the picture, the bias of the cascode transistors
in the DAC output driver can be set off-chip. During regular DAC operation, they
are biased at 2 V. However, when testing the ADC they are turned off so that the
(nonlinear) output impedance of the DAC driver does not load the ADC. Two repli-
cas of the ADC sampling network are placed in parallel with the ADC. The clocking
of these replicas is programmed such that at any time one branch is configured in
sample mode and two branches are in hold mode. Therefore, the loading on the DAC
driver is regularized across time and there is similar sampling current drawn from the
DAC driver during every output pulse.
5.2 Measurement Results
5.2.1 ADC Measurements
Due to capacitor mismatches, the accurate closed-loop gain of the residue amplifier
in the ADC is not known a priori and must be obtained through measurements. So
before using the ADC to calibrate the DAC, we must perform a foreground calibration
on the ADC and find the residue gain. The right gain is the one that maximizes the
ADC linearity, and can be found by testing either the static or dynamic linearity of
the ADC. In this work, we chose to measure the dynamic linearity by applying a
sinusoid to the ADC and finding the residue gain that maximizes the output SFDR.
This gain is used in all other measurements afterwards.
ADC Static Nonlinearity
Figure 5.3 shows the measured static nonlinearity of the ADC. The input is a low
frequency signal at about 50 MHz and the integral and differential nonlinearities (INL
and DNL) are found through a histogram test. The ADC achieves a DNL below 0.62
LSB and an INL below 2.2 LSB.
Figure 5.3: ADC measured static nonlinearity. Peak DNL = +0.61/−0.52 LSB; peak INL = +2.20/−2.17 LSB.
ADC Dynamic Nonlinearity
Figures 5.4 and 5.5(a) show single-tone tests of the ADC dynamic nonlinearity. The
input frequency was swept from 50 MHz to above 500 MHz. The ADC sampling frequency
is about 1 MHz; the ADC therefore downsamples the input signal. The SFDR rolls
off at higher frequencies, mainly due to the filtering of the sampling interface. This is
confirmed by the expected droop of the fundamental tone.
The bandwidth of the sampling interface is larger than that of an ideal
boxcar sampler, because the actual sampling interface is only an approximation
of one. It exhibits non-zero rise and fall times as well as a
drooping amplitude due to the finite RC time constant of the sampling branch. This
results in an effective sampling window smaller than the ideal value of Ts = 1.25 ns and
hence a larger bandwidth.
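This can be made quantitative with the sinc response of an ideal boxcar sampler; the snippet below checks the droop for the ideal window Ts = 1.25 ns (the -3 dB point of a boxcar of width Ts falls near 0.443/Ts ≈ 354 MHz, an assumption used here for illustration).

```python
import math

def boxcar_gain_db(f, Ts):
    """Magnitude response in dB of an ideal boxcar sampler with window Ts."""
    x = math.pi * f * Ts
    mag = 1.0 if x == 0.0 else abs(math.sin(x) / x)
    return 20.0 * math.log10(mag)

Ts = 1.25e-9  # ideal sampling window of this design
print(boxcar_gain_db(0.443 / Ts, Ts))  # close to -3 dB
print(boxcar_gain_db(500e6, Ts))       # about -6.5 dB of droop at 500 MHz
```

A shorter effective window shifts this entire response up in frequency, consistent with the measured bandwidth exceeding the ideal boxcar value.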
A shortcoming of the single-tone tests is that the observed linearity might be
optimistic as the harmonics of high frequency inputs can get blocked by the ADC
Figure 5.4: Measured SFDR/SNDR and magnitude of the fundamental tone of the ADC output versus input frequency.
itself. Therefore, a two-tone test must be done to cross-check the high-frequency
performance of the ADC. Figure 5.5(b) shows the output spectrum of a two-tone
test. The amplitude of the input signal is kept the same as in the single-tone tests
at 1 Vppd. It can be seen that the performance is in agreement with the results from
single-tone tests.
5.2.2 DAC Measurements
Figure 5.6 shows the main test setup during background calibration of the DAC. The
high-speed inputs for the DAC are generated using ADI’s Data Pattern Generator 2
(DPG2). The calibration ADC digitizes the DAC output with a downsampling ratio of
about 800. The ADC data is captured using an NI board. The calibration algorithm
runs in software and updates the LUTs in the computer. A new set of predistorted
codes are generated accordingly and sent to the DAC through the DPG2. This cycle
continuously repeats until the LUTs converge to their steady state values.
Figure 5.5: Measured frequency responses of the ADC. (a) Single-tone input with a frequency of 491 MHz. (b) Two-tone input with frequencies of 487 and 491 MHz. The input amplitude is 1 Vppd. The ADC sample rate is about 1 MHz.
Figure 5.6: Test setup for DAC calibration: HP8644B signal generator, ADI DPG2 pattern generator (800 MS/s), prototype chip on test board, NI PXI-6120 and PXI-6562 capture boards (1 MS/s), PC, and HP 8560E spectrum analyzer.
During calibration, the DAC input data is a random vector in order to provide
linearly independent equations. When the calibration converges, the DAC is tested
with predistorted sinusoid tones and the performance is evaluated with a spectrum
analyzer.
Single-tone output spectra of the DAC before and after calibration are shown
in Figure 5.7. The first plot shows the outputs due to a low-frequency input at
29 MHz. It can be seen that the linearity before calibration is about 30 dB,
limited by the third-order harmonic. The plot also shows tones at 800/3 ± 29 MHz, which
are due to the input tone mixing with the clock of the DAC cores at 800/3 MHz.
After calibration, the harmonics of the input as well as the mixing terms are pushed
down and the SFDR improves to about 60 dB. The second plot is a test with an
input frequency close to the Nyquist frequency. Here too, the linearity
improves by about 30 dB and the DAC achieves a final SFDR of 53 dB.
After calibration, the DAC was also tested using two-tone inputs to make sure that
the single-tone readings are in fact valid and no significant high-frequency harmonic
is being filtered out and missed. One of these tests is shown in Figure 5.8. The input signal
contains two tones, at frequencies of 370 and 376 MHz, both very close to the DAC
Nyquist frequency of 400 MHz. The resulting spectrum plot shows an SFDR of about
68 dB.
Figure 5.9 shows a plot of DAC SFDR before and after calibration versus fre-
quency. The DAC achieves low frequency SFDRs as high as 58.5 dB and an overall
linearity of better than 53 dB for input frequencies up to 400 MHz and peak-to-peak
differential output swing of 800 mV. The SFDR roll-off is gradual across frequencies,
which confirms that the DAC nonlinearity is, for the most part, frequency independent.
One factor that can explain this roll-off, though, is filtering prior to the
DAC output driver. Specifically, because of non-zero switch resistances there is a
finite bandwidth filter at the output of the SC cores [48]. This filter goes between the
predistorter and the DAC nonlinearity and so the nonlinearity can only be partially
cancelled. As shown in Appendix B, even if the predistorter has the exact inverse of
Figure 5.7: Measured output spectra of the DAC before and after calibration. (a) Input signal is 800 mVppd at 29 MHz. (b) Input signal is 800 mVppd at 373 MHz.
Figure 5.8: Measured output spectrum of the DAC before and after calibration. Input signal is an 800 mVppd signal containing two tones at 370 and 376 MHz.
the DAC driver characteristic, this filter will limit the achievable linearity to
$$HD_3 = \frac{3}{4}A^2\,\frac{c_3}{c_1}\left(\frac{f_{sig}}{f_{T/H}}\right)^2 \qquad (5.1)$$
where A is the signal amplitude, c3/c1 is the relative magnitude of the third-order
nonlinearity to the linear term, fsig is the signal frequency, and fT/H is the bandwidth
of the track-and-hold. Therefore, at high frequencies the simple model of static
nonlinearity becomes invalid and the SFDR starts to degrade.
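As a numerical sketch of (5.1), the snippet below evaluates the residual third harmonic versus signal frequency; the amplitude, c3/c1 ratio, and track-and-hold bandwidth are hypothetical placeholders, not measured values from this design.

```python
import math

def hd3_db(A, c3_over_c1, f_sig, f_th):
    """Residual HD3 after ideal predistortion, per Eq. (5.1), in dB."""
    hd3 = 0.75 * A ** 2 * c3_over_c1 * (f_sig / f_th) ** 2
    return 20.0 * math.log10(hd3)

# Hypothetical: 0.4 V amplitude, c3/c1 = 0.1, 1 GHz track-and-hold bandwidth
for f_sig in (100e6, 400e6):
    print(f_sig / 1e6, hd3_db(A=0.4, c3_over_c1=0.1, f_sig=f_sig, f_th=1e9))
# The residual harmonic rises 12 dB per doubling of the signal frequency
```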
5.3 Performance Summary
Figure 5.10 compares the performance achieved by the calibrated DAC discussed in
this thesis with some other published works. It can be seen that the design landscape
is dominated by current-steering DACs. The DAC discussed in this thesis is one of
the few based on an SC architecture, and achieves the best performance
Figure 5.9: Measured SFDR of the DAC before and after calibration versus input frequency.
in this category. Its performance is comparable to that of the best current-steering
DACs, which demonstrates the potential of this new architecture.
Figure 5.10: SFDR versus frequency of some published DACs [49]-[53] (current steering CMOS and switched-cap CMOS) in comparison with a data point from the prototype chip.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
Digital calibration techniques are increasingly used to push the performance of analog
integrated circuits (ICs). These techniques are especially suitable for mixed-signal
systems where one end of the signal chain is already digital and therefore accessible
to digital signal processing (DSP) functions. Examples include digital calibration of
data converters to correct the nonlinearity of op amps or digital predistortion of
power amplifiers where the aggregate nonlinearity of the transmitter chain can be
compensated in the baseband. Among data converter circuits, much focus has been
on the ADCs—especially pipeline—where the inherent redundancy in the architecture
and availability of the output in digital facilitates the implementation of calibration
algorithms. DAC design, on the other hand, comes in a much more limited variety
and offers very few calibration ideas.
Complementing the new DAC architecture proposed in [48], this work investigates some important ramifications of calibrating a DAC. Important considerations and trade-offs in the design of a sense ADC, as well as an adaptive background calibration algorithm, were discussed. The complete system was implemented and tested as a proof of concept. The results are comparable with those of more traditional designs. We hope that this will open doors to new ideas and architectures in the future.
6.2 Future Work
This work was primarily a step towards identifying and understanding the various
trade-offs involved in the design. Some of this understanding was gained along the
way, and in light of that, some of our design choices can be further improved. Also,
this work was meant to be a proof of concept. Therefore, there are additional steps
that must be taken to obtain an industry-standard product. These two considerations
are explained further below.
Optimization of the Design
As the first design of its kind, this prototype can be further tuned and optimized. Much of the remaining room for improvement stems from the exploratory nature of the design approach: the final calibration algorithm was developed in the lab after the chips were received from the foundry, so there was a disconnect between the design and optimization of the calibration algorithm and the design of the circuit blocks. Specifically, the adaptive calibration algorithm requires on the order of 10^6 measurements before it converges, which translates into a calibration time on the order of seconds; this might be too long for some applications. There are two ways to shorten the calibration time. One is to modify the design of the calibration ADC to increase its sampling frequency or improve its SNR. The second is to modify the DAC design and operate the output drivers at a lower efficiency. This would squeeze the operating range of the output differential pair and result in a more linear response, thereby decreasing the number of nonlinearity coefficients.
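The time scale quoted above follows from simple arithmetic: the measurement count divided by the effective measurement rate. The rate used in the sketch below is an assumed placeholder value, not the prototype's actual sense-ADC rate.

```python
# Calibration time = measurements needed / effective measurement rate.
# ~1e6 measurements at an assumed 1 MS/s effective rate -> ~1 s, which
# matches the "order of seconds" estimate above. The rate is illustrative.
n_measurements = 1e6
rate_hz = 1e6            # assumed effective sense-ADC measurement rate (Hz)
t_cal = n_measurements / rate_hz
print(f"calibration time = {t_cal:.1f} s")
```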
Hardware Implementation of the Calibration Algorithm
The proposed calibration algorithm was implemented in software. This gave us the flexibility to experiment with different algorithms, but it also introduced major complications in testing. The transport of data between the chip, the computer, and the DPG was the main bottleneck in real-time calibration. Furthermore, the lack of a handshaking signal from the DPG caused synchronization problems during real-time operation. The DPG could only play an array of input codes indefinitely, and the array could not be updated while the DPG was playing. Therefore, we had to stop the DPG, update its RAM, and start it again. Because there were three DAC cores in the chip and three corresponding LUTs in the software, at every update the DPG would start from a random LUT, out of sync with the cores. We were able to solve this problem by repeatedly interrupting the DPG operation until chance put everything in sync, but this resulted in a more complex and longer calibration. In a real application, the LUTs must be implemented in hardware, which avoids these issues altogether.
Appendix A
Calculation of Least Squares
Residual Error
Let’s assume that y(n) denotes the vector of measured outputs up until time n, and the sampled output at time n is y_n; the following relationship then captures the growth of the vector y(n) from its previous value:

$$ y(n) = \begin{bmatrix} y(n-1) \\ y_n \end{bmatrix} \in \mathbb{R}^{n} \qquad (A.1) $$

Similarly, D(n) denotes the data matrix at time n. It is constructed by appending a new row of input data d_n^T to the end of D(n-1):

$$ D(n) = \begin{bmatrix} D(n-1) \\ d_n^T \end{bmatrix} \in \mathbb{R}^{n \times m} \qquad (A.2) $$
As explained earlier and illustrated in Figure 3.6, the least-squares problem in (3.17) can be solved by applying a series of Givens rotations to the data matrix and output vector. At time n-1, we have

$$ G(n-1)\,\Lambda(n-1) \begin{bmatrix} D(n-1) & y(n-1) \end{bmatrix} = \begin{bmatrix} R(n-1) & p_1(n-1) \\ 0 & p_2(n-1) \end{bmatrix} \qquad (A.3) $$

where G(n-1) is the aggregate product of all the Givens rotations that triangularize the data matrix D(n-1), R(n-1) is an m-by-m upper triangular matrix, and p_1(n-1) and p_2(n-1) are m-by-1 and (n-m-1)-by-1 vectors, respectively. The matrix Λ(n-1) is a diagonal forgetting matrix that emphasizes the more recent entries of the LS equation:

$$ \Lambda(n-1) = \begin{bmatrix} \lambda^{n-1} & 0 & \cdots & 0 \\ 0 & \lambda^{n-2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \qquad (A.4) $$

where 0 < λ < 1 is the forgetting factor.
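The forgetting factor trades tracking speed against noise averaging: the geometric weights λ^k sum to 1/(1-λ), so the recursion effectively averages over roughly 1/(1-λ) recent samples. A one-line illustration, with arbitrarily chosen values of λ:

```python
# Effective observation window of the exponential weighting in (A.4):
# sum_{k>=0} lambda^k = 1/(1 - lambda), i.e. roughly that many recent
# samples carry most of the weight. Values of lambda chosen arbitrarily.
for lam in (0.99, 0.999, 0.9999):
    print(f"lambda = {lam}: effective window = {1.0/(1.0 - lam):.0f} samples")
```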
Now, using equations (A.1) and (A.2), we write an equation similar to (A.3), but corresponding to time n:

$$ G(n) \begin{bmatrix} \lambda R(n-1) & \lambda p_1(n-1) \\ 0 & \lambda p_2(n-1) \\ d_n^T & y_n \end{bmatrix} = \begin{bmatrix} R(n) & p_1(n) \\ 0 & p_2(n) \end{bmatrix} \qquad (A.5) $$
The error vector at time n is defined as

$$ e(n) = y(n) - D(n)\,h(n) \qquad (A.6) $$

A similar set of recursion equations can be written for the error vector:

$$ e(n) = \begin{bmatrix} e(n-1) \\ e_n \end{bmatrix} \qquad (A.7) $$

where

$$ e_n = y_n - d_n^T h(n) \qquad (A.8) $$
Now, we apply the Givens rotation matrix to the weighted error vector at time n-1, and use (A.3) to expand the result:

$$ G(n-1)\,\Lambda(n-1)\,e(n-1) = G(n-1)\,\Lambda(n-1)\bigl(y(n-1) - D(n-1)\,h(n-1)\bigr) = \begin{bmatrix} p_1(n-1) \\ p_2(n-1) \end{bmatrix} - \begin{bmatrix} R(n-1) \\ 0 \end{bmatrix} h(n-1) \qquad (A.9) $$
The same relationship at time n will be

$$ G(n) \begin{bmatrix} \lambda e(n-1) \\ e_n \end{bmatrix} = \begin{bmatrix} p_1(n) \\ p_2(n) \end{bmatrix} - \begin{bmatrix} R(n) \\ 0 \end{bmatrix} h(n) = \begin{bmatrix} 0 \\ p_2(n) \end{bmatrix} \qquad (A.10) $$

where the last equality holds because at time n we have p_1(n) = R(n)h(n).
Now let’s examine the Givens rotation matrix G(n). This matrix is the product of m Givens rotation matrices that zero out the last row of the data matrix, d_n^T:

$$ G(n) = G_1\, G_2 \cdots G_m, \qquad G_i = \begin{bmatrix} I_{i-1} & & & \\ & \cos\phi_i & & \sin\phi_i \\ & & I_{n-i-1} & \\ & -\sin\phi_i & & \cos\phi_i \end{bmatrix} \qquad (A.11) $$

where each factor G_i is an identity matrix except for the four rotation entries acting on row i and the last row. Examining the product of these matrices, we can see that the Givens rotation matrix has the following structure: its nonzero entries are confined to the top-left m-by-m block, the last row, and the last column, while the remaining diagonal entries are ones and all other entries are zero.
Multiplying both sides of (A.10) by G^T(n), we have

$$ \begin{bmatrix} \lambda e(n-1) \\ e_n \end{bmatrix} = G^T(n) \begin{bmatrix} 0 \\ p_2(n) \end{bmatrix} \qquad (A.12) $$
Now let’s focus on the row-column multiplication in (A.12) that gives e_n. The vector on the right side of (A.12) has m zeros on top, so multiplying it by the last row of G(n) filters out all but the corner element of G(n). Inspecting (A.11) again, we can see that the corner element is the product of all the cos φ’s. Therefore, we have

$$ e_n = G_{nn}\, p_n = \left( \prod_{i=1}^{m} \cos\phi_i \right) p_n \qquad (A.13) $$

where p_n denotes the last element of p_2(n). We thus obtain the LS residual error without explicitly deriving the solution. Algorithm 3 summarizes the steps to compute the error e_n. The inputs to the algorithm are the latest sample y_n and the corresponding data row d_n^T. Also available to the algorithm are the upper triangular matrix R(n) and the vector p_1(n).
Algorithm 3 RLS error calculation
Require: sampled output y_n and the corresponding row of the data matrix d_n^T
1: append d_n^T to the bottom of the upper-triangular matrix R ∈ R^{m×m}
2: append y_n to the bottom of the vector p ∈ R^{m×1}
3: set Π = 1
4: for i = 1 to m do   // calculate the rotation angle to zero out R_{m+1,i}
5:   r_x = λ R_{i,i}
6:   r_y = R_{m+1,i}
7:   sin φ = r_y / sqrt(r_x^2 + r_y^2)
8:   cos φ = r_x / sqrt(r_x^2 + r_y^2)
9:   for j = i to m do   // rotate elements of R
10:    [R_{i,j} ; R_{m+1,j}] = [cos φ, sin φ ; -sin φ, cos φ] [λ R_{i,j} ; R_{m+1,j}]
11:  end for
12:  rotate elements of p: [p_i ; p_{m+1}] = [cos φ, sin φ ; -sin φ, cos φ] [λ p_i ; p_{m+1}]
13:  build up the product of cos φ’s: Π = Π · cos φ
14: end for
15: return e = Π · p_{m+1}
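Algorithm 3 maps directly onto code. The sketch below is our own illustrative implementation, not the thesis software; it uses NumPy and folds the forgetting factor into R and p up front, which is equivalent to applying λ element by element in steps 5 to 12.

```python
import numpy as np

def rls_residual_update(R, p, d, y, lam):
    """One step of Algorithm 3: update the QR factors (R, p) with the new
    sample (d, y) via Givens rotations and return the a-posteriori residual
    e_n = y_n - d_n^T h(n), without solving for h(n) explicitly."""
    m = R.shape[0]
    R = lam * R.copy()            # forgetting factor folded into old factors
    p = lam * p.copy()
    row = np.array(d, dtype=float)
    y = float(y)
    prod_cos = 1.0                # running product of the cos(phi_i)'s
    for i in range(m):
        rx, ry = R[i, i], row[i]
        r = np.hypot(rx, ry)
        if r == 0.0:
            continue              # nothing to zero out in this column
        c, s = rx / r, ry / r
        # rotate row i of R against the appended data row (zeros out row[i])
        Ri, rowi = R[i, i:].copy(), row[i:].copy()
        R[i, i:] = c * Ri + s * rowi
        row[i:] = -s * Ri + c * rowi
        # apply the same rotation to the right-hand side
        p[i], y = c * p[i] + s * y, -s * p[i] + c * y
        prod_cos *= c
    return R, p, prod_cos * y     # (A.13): e_n = (prod cos phi_i) * p_n
```

With λ = 1 and noiseless, consistent data, the returned residual collapses to zero once R reaches full rank, and back-substitution on R h = p recovers the coefficient vector.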
Appendix B
Predistortion Error Due to Model
Inaccuracy
When modeling the DAC, we assumed that it consists of a static nonlinearity followed by a filter at the output, thereby arriving at a Hammerstein model. Of course, there is always some modeling inaccuracy at this level of abstraction. Specifically, there is some filtering before the DAC drivers, due to the finite bandwidth of the SC core track-and-hold, and therefore the assumption of a memoryless nonlinearity is not quite true. The system can be more accurately modeled using a power series with memory. In this section, we use the more general Volterra filter model to analyze the signal chain.

A model that includes the filtering effect prior to the DAC nonlinearity is shown in Figure B.1. In this figure, the sigmoid nonlinearity denoted by the function f(·) corresponds to the DAC driver nonlinearity. The filter h(t) models the filtering effect that was absent in our earlier modeling. The function g(·) denotes the predistorter curve, and it is assumed to have the inverse characteristic of f. The output filter of the Hammerstein model is not shown in this figure. If g and f were cascaded back-to-back, their nonlinearities would cancel out. However, in the presence of h, the predistortion is not perfect and there is some residual nonlinearity in the signal path.

Figure B.1: Alternative model of the DAC including a filter before the driver. (Block diagram: x → g(·) → h(t) → f(·) → y.)
Assuming a simple third-order nonlinearity for the DAC output driver, we have

$$ f(x) = x - \varepsilon x^3 \qquad (B.1) $$

If we ignore terms with higher powers of ε, the predistorter for this nonlinearity will be

$$ g(x) = x + \varepsilon x^3 \qquad (B.2) $$

We assume that the filter h has a first-order response with time constant T. Thus,

$$ h(t) = \frac{1}{T}\, e^{-t/T} \qquad (B.3) $$
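As a quick numeric sanity check on (B.1) and (B.2), the residual of the cascade f(g(x)) - x should scale as ε^2, confirming that g inverts f to first order; a minimal sketch:

```python
# Verify numerically that g in (B.2) inverts f in (B.1) to first order:
# the leftover error f(g(x)) - x is O(eps^2), so shrinking eps by 10x
# should shrink the error by roughly 100x.
def f(x, eps):
    return x - eps * x**3   # driver nonlinearity (B.1)

def g(x, eps):
    return x + eps * x**3   # first-order predistorter (B.2)

x = 0.8
r1 = abs(f(g(x, 1e-3), 1e-3) - x)
r2 = abs(f(g(x, 1e-4), 1e-4) - x)
print(r1 / r2)   # close to 100: the residual is second order in eps
```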
Now, ignoring powers of ε larger than one, we can obtain the output of the DAC as follows:

$$ \begin{aligned} y &= f\bigl(h * g(x)\bigr) \\ &= f\left( \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau \right) \\ &= \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau - \varepsilon \left( \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau \right)^{3} \\ &\approx \int h(\tau)\, x(t-\tau)\, d\tau + \varepsilon \left( \int h(\tau)\, x^3(t-\tau)\, d\tau - \left( \int h(\tau)\, x(t-\tau)\, d\tau \right)^{3} \right) \end{aligned} \qquad (B.4) $$
The last expression in (B.4) can be cast into the following form:

$$ y = \int h(\tau)\, x(t-\tau)\, d\tau + \varepsilon \left( \int h(\tau)\, x^3(t-\tau)\, d\tau - \iiint h(\tau_1)h(\tau_2)h(\tau_3)\, x(t-\tau_1)x(t-\tau_2)x(t-\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 \right) \qquad (B.5) $$
We recognize that the output is in the form of a third-order Volterra filter:

$$ y = \int h_1(\tau)\, x(t-\tau)\, d\tau + \iiint h_3(\tau_1,\tau_2,\tau_3)\, x(t-\tau_1)x(t-\tau_2)x(t-\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 \qquad (B.6) $$

Matching equations (B.4) and (B.6), we can find the symmetric Volterra kernels as follows:

$$ h_1(\tau) = h(\tau) = \frac{1}{T}\, e^{-\tau/T} \qquad (B.7) $$

$$ h_3(\tau_1,\tau_2,\tau_3) = \varepsilon \left( \sqrt[3]{h(\tau_1)h(\tau_2)h(\tau_3)}\; \delta(\tau-\tau_1, \tau-\tau_2, \tau-\tau_3) - h(\tau_1)h(\tau_2)h(\tau_3) \right) = \frac{\varepsilon}{T} \left( e^{-\frac{\tau_1+\tau_2+\tau_3}{3T}}\, \delta(\tau-\tau_1, \tau-\tau_2, \tau-\tau_3) - \frac{1}{T^2}\, e^{-\frac{\tau_1+\tau_2+\tau_3}{T}} \right) \qquad (B.8) $$
The corresponding frequency-domain kernels are

$$ H_1(s) = \frac{1}{1+sT} \qquad (B.9) $$

$$ H_3(s_1,s_2,s_3) = \varepsilon \left( \frac{1}{1+(s_1+s_2+s_3)T} - \frac{1}{(1+s_1 T)(1+s_2 T)(1+s_3 T)} \right) \qquad (B.10) $$
Having the Volterra kernels, we can find the response of the system to a single tone:

$$ \begin{aligned} x &= A\cos(\omega t) \\ y &= A\,|H_1(j\omega)| \cos\bigl(\omega t + \angle H_1(j\omega)\bigr) \\ &\quad + \frac{3A^3}{4}\,|H_3(j\omega, j\omega, -j\omega)| \cos\bigl(\omega t + \angle H_3(j\omega, j\omega, -j\omega)\bigr) \\ &\quad + \frac{A^3}{4}\,|H_3(j\omega, j\omega, j\omega)| \cos\bigl(3\omega t + \angle H_3(j\omega, j\omega, j\omega)\bigr) \end{aligned} $$

Assuming ε ≪ 1, the third-order distortion at the output will be given by [44]

$$ \mathrm{HD3} = \frac{A^2}{4}\, \frac{|H_3(j\omega, j\omega, j\omega)|}{|H_1(j\omega)|} = \frac{A^2 \varepsilon}{4}\, \frac{\left| \frac{1}{1+3j\omega T} - \frac{1}{(1+j\omega T)^3} \right|}{\left| \frac{1}{1+j\omega T} \right|} \qquad (B.11) $$
Next, we assume that the bandwidth of the filter is much larger than the signal frequency, so that ωT ≪ 1. Using a binomial expansion, we can further simplify the above result to get

$$ \begin{aligned} \mathrm{HD3} &\approx \frac{A^2 \varepsilon}{4} \left| \frac{1}{1+3j\omega T} - \frac{1}{(1+j\omega T)^3} \right| \\ &= \frac{A^2 \varepsilon}{4} \left| 1 - 3j\omega T + (3j\omega T)^2 - \bigl( 1 - 3j\omega T + 6(j\omega T)^2 \bigr) \right| \\ &= \frac{3}{4} A^2 \varepsilon (\omega T)^2 = \frac{3}{4} A^2 \varepsilon \left( \frac{f_{\mathrm{sig}}}{f_{T/H}} \right)^{2} \end{aligned} \qquad (B.12) $$
Bibliography
[1] G. Manganaro, S.-U. Kwak, and A. Bugeja, “A Dual 10-b 200 MS/s Pipeline D/A
Converter with DLL-Based Clock Synthesizer,” in Custom Integrated Circuits
Conference, 2003. Proceedings of the IEEE 2003, Sep 2003, pp. 429–432.
[2] K. Khanoyan, F. Behbahani, and A. Abidi, “A 10 b, 400 MS/s Glitch-Free
CMOS D/A Converter,” in VLSI Circuits, 1999. Digest of Technical Papers.
1999 Symposium on, 1999, pp. 73–76.
[3] M. Gustavsson, J. J. Wikner, and N. N. Tan, CMOS Data Converters for Com-
munications. Norwell, MA, USA: Kluwer Academic Publishers, 2000.
[4] F. Maloberti, Data Converters. Springer, 2007.
[5] B. Murmann and B. Boser, “A 12 bit 75 MS/s Pipelined ADC Using Open-Loop
Residue Amplification,” Solid-State Circuits, IEEE Journal of, Dec 2003.
[6] D.-L. Shen and T.-C. Lee, “A 6 bit 800 MS/s Pipelined A/D Converter with
Open-Loop Amplifiers,” Solid-State Circuits, IEEE Journal of, Feb 2007.
[7] A. Nazemi, C. Grace, L. Lewyn, B. Kobeissy, O. Agazzi, P. Voois, C. Abidin,
G. Eaton, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse,
S. Wang, and G. Asmanis, “A 10.3 GS/s 6 bit (5.1 ENOB at Nyquist) Time-
Interleaved/Pipelined ADC Using Open-Loop Amplifiers and Digital Calibration
in 90 nm CMOS,” in VLSI Circuits, 2008 IEEE Symposium on, Jun 2008.
[8] S. Boumaiza, J. Li, M. Jaidane-Saidane, and F. Ghannouchi, “Adaptive Digi-
tal/RF Predistortion Using a Nonuniform LUT Indexing Function with Built-in
Dependence on the Amplifier Nonlinearity,” Microwave Theory and Techniques,
IEEE Transactions on, vol. 52, no. 12, pp. 2670–2677, Dec 2004.
[9] T. Wang and J. Ilow, “Compensation of Nonlinear Distortions with Memory Ef-
fects in OFDM Transmitters,” in Global Telecommunications Conference, 2004.
GLOBECOM ’04. IEEE, vol. 4, Nov.-3 Dec. 2004, pp. 2398–2403.
[10] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. R.E.
Krieger Publishing Company, 1980. [Online]. Available: http://books.google.
com/books?id=dMB3QwAACAAJ
[11] L. Ding, G. Zhou, D. Morgan, Z. Ma, J. Kenney, J. Kim, and C. Giardina, “A Robust Digital Baseband Predistorter Constructed Using Memory Polynomials,” Communications, IEEE Transactions on, vol. 52, no. 1, pp. 159–165, Jan 2004.
[12] F. Kuech and W. Kellermann, “Partitioned Block Frequency-Domain Adaptive
Second-Order Volterra Filter,” Signal Processing, IEEE Transactions on, vol. 53,
no. 2, pp. 564–575, Feb 2005.
[13] L. Ngia and J. Sjoberg, “Efficient Training of Neural Nets for Nonlinear Adaptive
Filtering Using a Recursive Levenberg-Marquardt Algorithm,” Signal Processing,
IEEE Transactions on, vol. 48, no. 7, pp. 1915–1927, Jul 2000.
[14] K. Shi, X. Ma, and G. Zhou, “Acoustic Echo Cancellation Using a Pseudo-
coherence Function in the Presence of Memoryless Nonlinearity,” Circuits and
Systems I: Regular Papers, IEEE Transactions on, vol. 55, no. 9, pp. 2639–2649,
Oct 2008.
[15] C. Daigle, “Switched-Capacitor DACs Using Open-Loop Output Drivers and
Digital Predistortion,” Ph.D. dissertation, Stanford University, Stanford, Aug.
2010. [Online]. Available: http://purl.stanford.edu/fv654tp9350
[16] K. Doris, QRD-RLS Adaptive Filtering. Dordrecht, The Netherlands: Springer,
2006.
[17] Y.-M. Zhu, “Generalized Sampling Theorem,” Circuits and Systems II: Analog
and Digital Signal Processing, IEEE Transactions on, vol. 39, no. 8, pp. 587–588,
Aug 1992.
[18] J. Tsimbinos and K. Lever, “Input Nyquist Sampling Suffices to Identify and
Compensate Nonlinear Systems,” Signal Processing, IEEE Transactions on,
vol. 46, no. 10, pp. 2833–2837, Oct 1998.
[19] J. Tsimbinos, “Identification and Compensation of Nonlinear Distortion,” Ph.D.
dissertation, Univ. of South Australia, The Levels, South Australia, Feb. 1995.
[20] W. Frank, “Sampling Requirements for Volterra System Identification,” Signal
Processing Letters, IEEE, vol. 3, no. 9, pp. 266–268, Sep 1996.
[21] R. Martin, “Volterra System Identification and Kramer’s Sampling Theorem,”
Signal Processing, IEEE Transactions on, vol. 47, no. 11, pp. 3152–3155, Nov
1999.
[22] P. Nikaeen, “Digital Compensation of Dynamic Acquisition Errors at the Front-
End of ADCs,” Ph.D. dissertation, Stanford University, Stanford, Sep. 2008.
[23] P. Wambacq and W. Sansen, Distortion Analysis of Analog Integrated Circuits.
Boston, MA: Kluwer, 1998.
[24] M. Hasler and G. Kubin, “Mixed-Domain System Representation Using Volterra
Series,” in Circuits and Systems, 2008. ISCAS 2008. IEEE International Sym-
posium on, May 2008, pp. 572–575.
[25] M. Schetzen, “Theory of pth-Order Inverses of Nonlinear Systems,” Circuits and
Systems, IEEE Transactions on, vol. 23, no. 5, pp. 285–291, May 1976.
[26] J. G. McWhirter, “Recursive Least-Squares Minimization Using a Systolic Ar-
ray,” vol. 431, no. 6, Jan 1983, pp. 105–112.
[27] J. Apolinario, QRD-RLS Adaptive Filtering, ser. Lecture Notes in Mathematics.
Springer, 2009. [Online]. Available: http://books.google.com/books?id=
w3TROT70zpYC
[28] T. G. Kolda, R. M. Lewis, and V. Torczon, “Optimization by Direct Search: New
Perspectives on Some Classical and Modern Methods,” SIAM Review, vol. 45,
pp. 385–482, 2003.
[29] V. Torczon, “On the Convergence of Pattern Search Algorithms,” SIAM Journal
on Optimization, vol. 7, pp. 1–25, 1993.
[30] J. A. Nelder and R. Mead, “A Simplex Method for Function Minimization,”
The Computer Journal, vol. 7, no. 4, pp. 308–313, Jan. 1965. [Online]. Available:
http://dx.doi.org/10.1093/comjnl/7.4.308
[31] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, “Convergence Prop-
erties of the Nelder-Mead Simplex Method in Low Dimensions,” SIAM Journal
of Optimization, vol. 9, pp. 112–147, 1998.
[32] M. Baudin, “Nelder-Mead User’s Manual,” 2010.
[33] M. G. Kim, G. cho Ahn, and U. ku Moon, “An Improved Algorithmic ADC
Clocking Scheme,” IEEE Int. Symp. Circuits & Systems, pp. 589–592, 2004.
[34] M. Keskin, “A Novel Low-Voltage Switched-Capacitor Input Branch,” Circuits
and Systems II: Analog and Digital Signal Processing, IEEE Transactions on,
vol. 50, no. 6, pp. 315–317, Jun 2003.
[35] J. Li, G.-C. Ahn, D.-Y. Chang, and U.-K. Moon, “A 0.9 V 12 mW 5 MS/s
Algorithmic ADC with 77 dB SFDR,” Solid-State Circuits, IEEE Journal of,
vol. 40, no. 4, pp. 960–969, Apr 2005.
[36] G.-C. Ahn, M. G. Kim, P. Hanumolu, and U.-K. Moon, “A 1 V 10-b 30 MS/s
Switched-RC Pipelined ADC,” in Custom Integrated Circuits Conference, 2007.
CICC ’07. IEEE, Sep 2007, pp. 325–328.
[37] A. Abo and P. Gray, “A 1.5 V, 10 bit, 14.3 MS/s CMOS pipeline Analog-to-
Digital Converter,” Solid-State Circuits, IEEE Journal of, May 1999.
[38] O. Erdogan, P. Hurst, and S. Lewis, “A 12-b Digital-Background-Calibrated
Algorithmic ADC with −90 dB THD,” Solid-State Circuits, IEEE Journal of,
Dec 1999.
[39] K. Bult and G. Geelen, “A Fast-Settling CMOS Op Amp for SC Circuits with
90 dB DC Gain,” Solid-State Circuits, IEEE Journal of, vol. 25, no. 6, pp. 1379–
1384, Dec 1990.
[40] R. Jindal, “Compact Noise Models for MOSFETs,” Electron Devices, IEEE
Transactions on, vol. 53, no. 9, pp. 2051–2061, Sep 2006.
[41] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. New York, NY,
USA: McGraw-Hill, Inc., 2001.
[42] A. Abo, “Design for Reliability of Low-voltage, Switched-capacitor Circuits,”
Ph.D. dissertation, University of California, Berkeley, 1999.
[43] D. Cline, “Noise, Speed, and Power Trade-offs in Pipelined Analog to Digital
Converters,” Ph.D. dissertation, University of California, Berkeley.
[44] B. Murmann, EE214 Lecture Notes, Stanford University, 2012.
[45] ——, EE315A Lecture Notes, Stanford University, 2012.
[46] T. Sepke, P. Holloway, C. Sodini, and H.-S. Lee, “Noise Analysis for Comparator-
Based Circuits,” Circuits and Systems I: Regular Papers, IEEE Transactions on,
Mar 2009.
[47] B. Murmann, EE315B Lecture Notes, Stanford University, 2012.
[48] C. Daigle, A. Dastgheib, and B. Murmann, “A 12-bit 800-MS/s Switched-
Capacitor DAC with Open-Loop Output Driver and Digital Predistortion,” in
Solid State Circuits Conference (A-SSCC), 2010 IEEE Asian, Nov. 2010, pp.
1–4.
[49] W. Schofield, D. Mercer, and L. Onge, “A 16-b 400 MS/s DAC with < −80 dBc IMD to 300 MHz and < −160 dBm/Hz Noise Power Spectral Density,” in Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC. 2003 IEEE International, vol. 1, 2003, pp. 126–482.
[50] B. Schafferer and R. Adams, “A 3 V CMOS 400 mW 14-b 1.4 GS/s DAC for
Multi-Carrier Applications,” in Solid-State Circuits Conference, 2004. Digest of
Technical Papers. ISSCC. 2004 IEEE International, vol. 1, 2004, pp. 360–532.
[51] A. Van den Bosch, M. Borremans, M. S. J. Steyaert, and W. Sansen, “A 10
bit 1 GSample/s Nyquist Current-Steering CMOS D/A Converter,” Solid-State
Circuits, IEEE Journal of, vol. 36, no. 3, pp. 315–324, 2001.
[52] C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci,
X. Liu, and K. Bult, “A 12 bit 2.9 GS/s DAC with IM3 < −60 dBc Beyond 1
GHz in 65 nm CMOS,” Solid-State Circuits, IEEE Journal of, vol. 44, no. 12,
pp. 3285–3293, 2009.
[53] J. Savoj, A. Abbasfar, A. Amirkhany, M. Jeeradit, and B. Garlepp, “A 12 GS/s
Phase-Calibrated CMOS Digital-to-Analog Converter for Backplane Communi-
cations,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 5, pp. 1207–1216,
2008.