CALIBRATION ADC AND ALGORITHM FOR ADAPTIVE
PREDISTORTION OF HIGH-SPEED DACS
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF ELECTRICAL
ENGINEERING
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Alireza Dastgheib
March 2013
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/bk656yt4469
© 2013 by Alireza Dastgheib. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Boris Murmann, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Ada Poon
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Bruce Wooley
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
In this thesis, the design and implementation of circuits and signal processing algorithms required for digital predistortion of a digital-to-analog converter (DAC) with an open-loop driver are presented. On the circuit design side, the implementation of
a high precision, high acquisition-bandwidth calibration analog-to-digital converter
(ADC) for sampling the DAC output is discussed. The ADC has a cyclic core and
a variant of the switched-RC sampling network suitable for high-frequency operation. It is implemented in 90-nm CMOS and achieves an SFDR higher than 72 dB for input frequencies under 500 MHz. On the signal processing side, a calibration
algorithm is presented that uses the input/output data of the DAC to identify its
nonlinearity and cancel it through digital predistortion. The algorithm represents a
novel technique for the linearization of Hammerstein systems with low computational
cost and is suitable for hardware realization. Overall, this calibration architecture
improves the SFDR of the DAC by close to 30 dB to achieve a final linearity of 53
dB for input frequencies up to 400 MHz and peak-to-peak differential output swing
of 800 mV.
Acknowledgements
Much like any journey in life, my PhD was not all about the destination (the thesis in this case), but rather mostly about the process that brought me here. I would like
to thank all the people who accompanied me in this journey and made me a better
person.
Firstly, I thank my advisor, Prof. Boris Murmann, who in addition to being a
research all-star, was a great instructor and a supportive mentor. I also thank Prof.
Bruce Wooley and Prof. Ada Poon, for being on my orals committee as well as my
dissertation reading committee. I must also thank Prof. Norbert Pelc for chairing
my orals committee. I am deeply thankful to Ann Guerra for her extraordinary help
with administrative issues and lots of other random errands. I thank my research
partner, Clay Daigle, for the many things I learned from him.
Being in Silicon Valley, I had the opportunity to get help from several industry
members. I would like to thank Ken Poulton (Agilent Technologies), Keith Ring
(Intersil) and Sim Narasimha for their help and brainstorming during various phases
of the research. I thank Phil Golden (Intersil) and Evan Chang (Hamamatsu) for
their sincere help with debugging my chips. Henrique Miranda went out of his way
in spite of a busy schedule and helped me fix a problem with my evaluation board.
I appreciate the help and valuable feedback from my group mates as well as other
students in the Allen building throughout this research. I thank the current members
of the group: Alex, Bill, Doug, Jon, Mahmoud, Man-Chia, Martin, Noam, Ross,
Ryan, Siddharth, Vaibhav; as well as the alumni: Donghyun, Drew, Echere, Jason,
Justin, Manar, Parastoo, Pedram, Wei. I also thank Mohammad, Maryam, Roozbeh
and Rostam for their friendship and help in and out of work.
I was fortunate to have the wonderful company of many friends during my years
at Stanford. Their help and support was invaluable all along the way. Among them
are some of the most amazing and intelligent people I have ever known in my life. I
am deeply grateful to every one of them. Finally, I would like to thank my family for
their love and support. I would not have made it without them.
Contents
Abstract
Acknowledgements
1 Introduction
  1.1 Motivation
  1.2 Related Applications
    1.2.1 Pipeline ADCs
    1.2.2 Digital Predistortion of Power Amplifiers
    1.2.3 Loudspeakers
  1.3 Thesis Outline
2 System Considerations
  2.1 DAC Requirements
  2.2 ADC Requirements
    2.2.1 Sampling Rate
    2.2.2 Acquisition Bandwidth
    2.2.3 Boxcar Sampling
    2.2.4 Noise Requirement
  2.3 Summary
3 Calibration Algorithm
  3.1 System Modelling
    3.1.1 Volterra Model
    3.1.2 Static Nonlinearity and Nonlinear Filtering
  3.2 Hammerstein System Calibration
    3.2.1 Parameter Estimation
    3.2.2 Adaptive Predistortion
  3.3 Optimization of the Predistorter
  3.4 Calibration Speed
  3.5 Summary
4 Prototype Design
  4.1 ADC Architecture
    4.1.1 High Bandwidth Sampling
    4.1.2 ADC Core
    4.1.3 Foreground Calibration
    4.1.4 OTA Structure
  4.2 Noise Analysis
    4.2.1 Review of Amplifier Noise Analysis
    4.2.2 Notes on Sampling Noise
    4.2.3 Track-and-hold Noise Analysis
    4.2.4 ADC Noise Analysis
  4.3 Summary
5 Experimental Results
  5.1 Test Setup
  5.2 Measurement Results
    5.2.1 ADC Measurements
    5.2.2 DAC Measurements
  5.3 Performance Summary
6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work
A Calculation of Least Squares Residual Error
B Predistortion Error Due to Model Inaccuracy
Bibliography
List of Figures
1.1 Basic DAC architectures.
1.2 The research's big picture. This research focuses on the design of the calibration ADC as well as the predistortion algorithm.
1.3 Structure of a pipeline ADC.
1.4 Digitally predistorted PA.
1.5 General setup of an AEC.
2.1 Two examples where a DAC's I/O curve is not reversible. (a) Input codes are not tight enough. (b) I/O curve discontinuity produces dead-zone.
2.2 The calibration algorithm modifies the DAC's mapping between input digital codes and its codomain of output pulses.
2.3 Discrete-time approximation of a continuous-time Volterra system.
2.4 Parameter estimation of a continuous-time Volterra system. The unknown signal is the slow-changing set of parameters θ(t) and not the input code x[n].
2.5 Two examples where pulse energy does not translate to signal linearity. (a) Output signal is nonlinear but pulse energy is linear. (b) Output signal is linear in bandwidth of interest but pulse energy is nonlinear.
2.6 System block diagram and mapping between continuous-time and discrete-time. (a) Accurate mapping of a continuous-time signal into a discrete sample cannot be implemented easily. A prefilter in front of the ADC as shown in (b) can facilitate this.
2.7 Two pre-filter options for the calibration ADC: A filter with a long time response (a) has a narrow frequency bandwidth (b). However, a boxcar sampler with a short time window (c) will pass more frequencies (d).
3.1 DAC with static nonlinearity and nonlinear output filter.
3.2 Calibration of a DAC with static nonlinearity and nonlinear output filter.
3.3 Hammerstein model of a system with input x and output y.
3.4 Calibration through system identification.
3.5 Calibration through adaptive predistortion.
3.6 Recursive least squares using Givens rotations.
3.7 Systolic array implementation of the RLS error calculation algorithm.
3.8 Search for the next simplex in the Nelder-Mead simplex algorithm.
3.9 An example of the Nelder-Mead simplex algorithm.
3.10 Matlab simulation of a calibration run.
3.11 The adaptive calibration consists of three nested iterative algorithms, with simple CORDIC iterations at the lowest level.
4.1 A traditional bottom-plate track-and-hold circuit.
4.2 Track-and-hold circuit with RC sampling.
4.3 (a) Cascaded RC sampling stages to decrease feedthrough. (b) Nonlinear capacitances due to large switches and large resistors degrade tracking linearity.
4.4 Modified RC sampling based on Wheatstone bridge operation.
4.5 Comparison of three switch configurations. (a) Switch in series with a resistor, (b) RC sampler, (c) Wheatstone sampler.
4.6 Signal feedthrough in an RC and a Wheatstone sampler as well as switch parasitic capacitance are shown as a function of the switch width. The length of transistors in the switch is set to 300 nm, the total signal path resistance R = 10 kΩ, and Cs = 500 fF. The resistance r in the Wheatstone sampler is chosen to minimize the signal feedthrough. It can be seen that the signal feedthrough in the Wheatstone sampler is about 3 orders of magnitude smaller than that of the RC sampler.
4.7 RC switch implementation. Two Wheatstone samplers are cascaded to further reduce the signal feedthrough (a); all of the switches are implemented using anti-parallel thick-oxide PMOS devices (b).
4.8 Basic two-phase operation of the ADC.
4.9 Charge injection from the bottom-plate switch.
4.10 More detailed diagram of the ADC MDAC. (a) Finite state machine representing phases of operation. (b) ADC clocking signals. (c) Single-ended MDAC.
4.11 Phase 1 operation duplicates the residue from C1 in (a) onto C3 and C4 in (b). In phase 2, C3 and C4 charges add up (c) and construct a new residue on C1 (d). In the transition between phases 1 and 2, the bottom-plate switch is opened first (e). If the top-plate switches are not opened right after that, switching glitches can get transferred to the bottom-plate and cause charge leakage (f).
4.12 Top-level and booster amplifier schematics.
4.13 Small-signal model of the OTA with varying details.
4.14 Typical pole-zero plots of amplifiers. (a) Miller-compensated amplifier. (b) Cascode-compensated amplifier.
4.15 T/H transient noise.
4.16 Derivation of kT/C result. (a) Frequency domain analysis of a continuous RC branch. (b) Time domain analysis of a switched RC branch.
5.1 Die photo of the prototype chip and the ADC layout. The chip also includes two other DAC-ADC pairs (not outlined in the photo) for debugging options.
5.2 Chip I/O interface. The Wheatstone samplers are simplified in this schematic.
5.3 ADC measured static nonlinearity.
5.4 Measured SFDR/SNDR and magnitude of the fundamental tone of the ADC output versus input frequency.
5.5 Measured frequency responses of the ADC. The input amplitude is 1 Vppd. The ADC sample rate is about 1 MHz.
5.6 Test setup for DAC calibration.
5.7 Measured output spectra of the DAC before and after calibration.
5.8 Measured output spectrum of the DAC before and after calibration. Input signal is an 800 mVppd signal containing two tones at 370 and 376 MHz.
5.9 Measured SFDR of the DAC before and after calibration versus input frequency.
5.10 SFDR versus frequency of some published DACs in comparison with a data point from the prototype chip.
B.1 Alternative model of the DAC including a filter before the driver.
Chapter 1
Introduction
CMOS technology scaling has fueled the continuous growth of throughput in com-
munication systems by enabling increasingly sophisticated digital signal processing
(DSP) techniques. The requirements of high-speed communication systems also propagate to the circuitry further up the signal chain, such as the analog-to-digital and
digital-to-analog converters (ADCs and DACs). Traditional architectures of data
converters do not directly benefit from scaling, so there has been a surge of new architectures (mostly in ADCs) that get a boost from digital gates through calibration.
Recently, a new DAC was introduced that relies on digital calibration for achieving
high performance. The DAC uses an open-loop output driver for delivering power to
off-chip loads. While saving power, this driver is weakly nonlinear. So the DAC relies
on a calibration algorithm to identify its nonlinearity and construct a predistorter to
linearize the overall system. This thesis will focus on the design and implementation
of circuits and algorithms for the calibration of the DAC.
In the rest of this chapter, we briefly review the motivation for the new DAC
architecture. Next, in order to put the problem in perspective, we review some
applications that deal with similar challenges and/or have inspired our solution to
the problem. We discuss how each of these scenarios is similar to the problem at
hand, and in what ways they are unique.
Figure 1.1: Basic DAC architectures. (a) Simplified diagram of a current steering (CS) DAC. (b) Simplified diagram of a switched-capacitor (SC) DAC.
1.1 Motivation
Overview of DAC Architectures
D/A conversion is typically accomplished through the division or multiplication of a
reference quantity with a digitally controlled switched element network. The reference
can be any of the basic analog quantities, i.e., voltage, current or charge. The latter
two quantities can be more easily controlled in analog circuits and so the majority of
DAC designs are either current or charge based.
A B-bit current based DAC consists of 2B − 1 matching unit current sources, as
shown in Figure 1.1(a). Depending on the input code, the currents of a number of
these sources are steered to a common branch and combined to provide a correspond-
ing analog output. Hence, this architecture is named current steering (CS) DAC. In a
CS DAC, the accuracy of the output amplitude is determined by the current matching
between the cells. This quickly becomes a critical problem at high resolution because
there must be as many unit current sources as the output levels. Also, since the
unit currents are directly steered to the output, the dynamic accuracy of the DAC is
determined by the timing matching between the cells. Therefore, any amplitude or
timing error in the unit cells appears directly at the output. These considerations are
the most important challenges in the design of high performance CS DACs.
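To make the matching argument concrete, the following sketch models a thermometer-coded CS DAC with randomly mismatched unit currents and computes its integral nonlinearity (INL). The 8-bit resolution, the 1% Gaussian mismatch, and the endpoint-fit INL metric are illustrative assumptions, not parameters of any particular design.

```python
import numpy as np

rng = np.random.default_rng(0)

B = 8                               # resolution in bits (illustrative)
sigma = 0.01                        # assumed 1% relative unit-current mismatch
units = 1.0 + sigma * rng.standard_normal(2**B - 1)   # 2^B - 1 unit sources

def cs_dac(code):
    """Thermometer coding: steer the first `code` unit currents to the output."""
    return units[:code].sum()

codes = np.arange(2**B)
out = np.array([cs_dac(c) for c in codes])

# INL: deviation from the endpoint-fit line, expressed in LSBs
lsb = (out[-1] - out[0]) / (2**B - 1)
inl = (out - (out[0] + lsb * codes)) / lsb
print(f"peak INL = {np.abs(inl).max():.3f} LSB")
```

Increasing `sigma` or `B` in the sketch shows how the amplitude-matching requirement tightens with resolution.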
Charge based DACs, as shown in Figure 1.1(b), operate through manipulation of
capacitor charges in a switched-capacitor (SC) network, and hence they are known
as SC DACs. The SC core produces a charge output, and so for off-chip driving
capability a driver must be included at the output [1]. The driver samples the output
of the SC core when the conversion is complete and transforms it to an output voltage.
Therefore, amplitude and timing accuracy are separated in this architecture [2], in
contrast to a CS DAC. The fundamental problem with this architecture, however, is
that the output driver must provide linearity at the speed of operation. Therefore,
this architecture has been considered inferior at high operating speeds.
Proposed DAC Architecture
The newly proposed DAC is based on an SC architecture. However, instead of a
traditional closed-loop output driver, it utilizes an open-loop output driver. Figure 1.2
depicts the “big picture” of the research and shows where the newly proposed DAC
fits in. As implied in the picture, the DAC has a weak nonlinearity as a result of
open-loop amplification. This costs accuracy but enables higher speed. A digital look-up table (LUT) is placed in front of the DAC
to cancel out its nonlinearity. However, since the nonlinearity of the DAC cannot be
known a priori and is also subject to drift, a digital calibration loop must constantly
adjust the LUT. The input data is readily available in the digital domain and is
therefore accessible for calibration. Also, the calibration algorithm needs to be able
to look at the DAC output in order to assess its nonlinearity. Therefore, an ADC is
Figure 1.2: The research’s big picture. This research focuses on the design of the
calibration ADC as well as the predistortion algorithm.
needed to provide that data. We will explain why this ADC is labeled “slow” later,
as well as why the addition of all these extra components will not defeat the purpose
of making a higher performance DAC.
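As a rough sketch of this LUT-based predistortion idea, the code below pairs a hypothetical cubic driver nonlinearity with a brute-force table search over a finer internal code grid with some headroom; in hardware the "measured" outputs would come from the calibration ADC rather than from a model, and the search would be replaced by the adaptive algorithm of Chapter 3.

```python
import numpy as np

# Hypothetical weakly compressive driver nonlinearity (illustrative only)
def dac(v):
    return v - 0.1 * v**3

N = 256                                     # 8-bit external codes
v_ideal = np.arange(N) / (N - 1) * 2 - 1    # desired outputs in [-1, 1]

# A finer internal grid with extra headroom stands in for the DAC's
# redundant internal codes.
grid = np.linspace(-1.2, 1.2, 4 * N)
dac_out = dac(grid)

# LUT: for each input code, pick the internal code whose output (here the
# model; in hardware, the calibration ADC's measurement) is closest to ideal.
lut = np.array([grid[np.argmin(np.abs(dac_out - v))] for v in v_ideal])

err_before = np.abs(dac(v_ideal) - v_ideal).max()   # uncorrected error
err_after = np.abs(dac(lut) - v_ideal).max()        # LUT-predistorted error
print(f"max error: {err_before:.4f} -> {err_after:.4f}")
```

Note that without the headroom in `grid`, the full-scale ideal levels could not be reached at all, which previews the redundancy requirement discussed in Chapter 2.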
The first goal of this thesis is the design of an appropriate ADC to sense the
DAC output for use in calibration. The required ADC must maintain high linearity
across the DAC bandwidth while imposing small power and area overhead. The
other aspect of the research is the design of a practical calibration algorithm with low
computational cost that is suitable for hardware realization.
1.2 Related Applications
1.2.1 Pipeline ADCs
Perhaps the majority of digitally-assisted mixed-signal calibration techniques intro-
duced in the past few years were targeted for pipeline ADCs. The constituent blocks
of a pipeline ADC consist of a sub-ADC, a sub-DAC and a residue amplifier, as shown
in Figure 1.3. The interesting property of a pipeline is that it is robust to errors in the
Figure 1.3: Structure of a pipeline ADC.
sub-ADCs, as long as the output of every stage is within the input conversion range
of the rest of the converter. On the other hand, any inaccuracy in the gain of the
residue amplifier is detrimental to the linearity of the ADC and must be compensated
through calibration [3, 4].
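The sensitivity to residue-amplifier gain error can be illustrated with a toy 1-bit-per-stage pipeline model (no redundancy and ideal comparators, both simplifying assumptions): a 1% deviation of the interstage gain from its nominal value of 2 visibly degrades the digital reconstruction.

```python
import numpy as np

def stage(v, gain, vref=1.0):
    """Toy 1-bit pipeline stage: coarse decision d, residue gain*(v - d*vref/2)."""
    d = 1 if v >= 0 else -1
    return d, gain * (v - d * vref / 2)

def pipeline(v, gain, stages=10):
    """Convert v, then reconstruct assuming the NOMINAL gain of 2 per stage."""
    bits = []
    for _ in range(stages):
        d, v = stage(v, gain)
        bits.append(d)
    return sum(b / 2**(i + 1) for i, b in enumerate(bits))

vin = np.linspace(-0.99, 0.99, 1001)
err_ideal = max(abs(pipeline(v, gain=2.00) - v) for v in vin)
err_gain = max(abs(pipeline(v, gain=1.98) - v) for v in vin)  # 1% gain error
print(f"peak error: ideal gain {err_ideal:.5f}, 1% low gain {err_gain:.5f}")
```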
The use of an open-loop amplifier instead of a traditional closed-loop architecture
in pipeline ADCs has been well researched in the past few years [5, 6, 7]. There are
several reasons that make pipeline ADCs especially attractive for calibration tech-
niques. First, ADCs produce digital output that is readily available to a calibration
algorithm without a need for an additional sensing mechanism. Also, the inherent
redundancy of the pipeline architecture tolerates small intentional modulations of
sub-ADC errors without affecting the output. The digital output together with the
introduced modulations inside ADC stages provide information about the nonlinear-
ity under calibration. A DAC, on the other hand, is a continuous time system and so
it must be paired with an ADC in order to feed the calibration algorithm with output
signal information. Also, regular DAC architectures lack an inherent redundancy as
in the case of pipeline ADCs. As we will see in the next chapter, the proposed DAC
is especially designed to have redundant codes that can be exploited for calibration.
Figure 1.4: Digitally predistorted PA.
1.2.2 Digital Predistortion of Power Amplifiers
Open-loop output drivers are reminiscent of power amplifiers (PAs), where the oper-
ation at radio frequencies (RF) precludes the use of a closed-loop amplifier. Similar
to our current application, PAs are also sometimes driven into their nonlinear region
in order to achieve higher DC to RF power conversion efficiency. Operating in the
nonlinear region generates both in-band distortion resulting in higher bit error rate,
as well as out-of-band spectral re-growth causing interference to adjacent channels.
Therefore, there is a need for PA calibration to suppress these errors.
The calibration techniques used for PA nonlinearity compensation are diverse and
depend on the type of nonlinearities involved and the desired level of accuracy. In case
the PA has a memoryless nonlinearity, lookup table (LUT) approaches can be used
for the predistorter [8]. A Wiener predistorter is utilized to linearize a PA modeled
by a Hammerstein system [9]. In wide bandwidth applications such as wideband
code-division multiple access (WCDMA) or wideband orthogonal frequency-division
multiplexing (W-OFDM), PA memory effects are more complicated. These memory
effects can occur both on the timescale of the signal—e.g. due to filtering of the
matching networks—as well as much longer timescales—e.g. due to thermal time
constants and memory of the bias lines. Assuming a general Volterra filter model, the
PA can be predistorted by first identifying the Volterra kernels and then calculating
the so-called pth-order inverse [10]. Also [11] shows the construction of robust memory
polynomial predistorters by adopting an indirect learning architecture.
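A minimal sketch of the indirect learning idea follows, using an assumed memoryless cubic PA model and an odd-order polynomial basis; a real memory-polynomial predistorter would add delayed signal terms to the basis, but the training structure is the same: fit a postdistorter from the PA output back to its input, then copy it in front of the PA.

```python
import numpy as np

rng = np.random.default_rng(1)

def pa(x):
    """Hypothetical memoryless PA compression (illustrative stand-in)."""
    return x - 0.15 * x**3

def basis(u, order=7):
    """Odd-order polynomial basis (memoryless special case)."""
    return np.stack([u**k for k in range(1, order + 1, 2)], axis=1)

x = 0.9 * rng.uniform(-1, 1, 4000)        # training drive signal
y = pa(x)                                 # "measured" PA output (noiseless here)

# Indirect learning: fit a postdistorter mapping the PA output back to its
# input (unit target gain), then copy its coefficients into the predistorter.
w, *_ = np.linalg.lstsq(basis(y), x, rcond=None)
predistort = lambda u: basis(u) @ w

xt = 0.7 * rng.uniform(-1, 1, 2000)       # test signal inside the trained range
err_before = np.abs(pa(xt) - xt).max()
err_after = np.abs(pa(predistort(xt)) - xt).max()
print(f"peak deviation from linear: {err_before:.4f} -> {err_after:.5f}")
```

The test signal is kept inside the training amplitude range; extrapolating a polynomial predistorter beyond the range it was trained on is a known weakness of this approach.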
Figure 1.5: General setup of an AEC.
System-level calibration algorithms become increasingly complicated as aggregate
errors of more blocks come into play. For example, a baseband predistorter in a
Cartesian transmitter must compensate not only the nonlinearities of the output PA,
but also any frequency dependent amplitude mismatch and phase imbalance between
the two paths. In these cases, the calibration does not constitute just a small overhead,
but rather requires considerable power and complexity.
1.2.3 Loudspeakers
Another scenario that resembles the current application is the problem of acoustic
echo cancellation (AEC) in loudspeakers. This problem arises in full-duplex hands-
free telephone systems, where acoustic coupling between the loudspeaker and the
microphone at the near-end of the line causes echoes at the far-end. The microphone
at the near-end picks up both a direct sound from the loudspeaker as well as its
reflections through the loudspeaker-enclosure-microphone system (LEMS) such as a
conference room or a car compartment. Figure 1.5 shows an acoustic echo path in a
telephone system. Similar paths exist at every end of a telephone system that employs
a hands-free system. An acoustic echo has a long and often time-varying delay, and the path can include nonlinear components, so it becomes annoying for the far-end speaker.
Nonlinearity in the system can also occur in cheap loudspeakers when the amplifier is overdriven. If amplifier saturation is negligible or the target sound quality is not high, a linear model or a slowly time-varying model can be used.
However, it is preferable to use low-cost and small-sized loudspeakers in consumer
audio products for economic reasons. These cheaper loudspeakers exhibit nonlinear
characteristics and signal distortion and make the acoustic path nonlinear.
Some solutions to this problem include the use of a Volterra filter as in
[12], which is very computationally demanding. Also, neural network based structures
were proposed in [13], but training the neural network can be costly. An alternative
approach is to model the acoustic path with a Hammerstein system consisting of a
memory-less nonlinearity corresponding to the PA/loudspeaker, and a linear block
corresponding to the room impulse response from the loudspeaker to the microphone.
This model is less complicated than a Volterra filter and can be more easily calibrated
[14].
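The Hammerstein argument can be illustrated numerically. In the sketch below the loudspeaker saturation (a tanh, assumed) and the room impulse response (synthetic) are known exactly; a purely linear FIR canceller fit to the far-end signal leaves residual echo, while the same FIR fit on the nonlinearly mapped signal cancels it almost exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hammerstein echo-path model: an assumed memoryless loudspeaker saturation
# followed by a synthetic linear room impulse response (both illustrative).
def saturate(x):
    return np.tanh(2 * x) / 2

rir = rng.standard_normal(64) * np.exp(-np.arange(64) / 10.0)

x = rng.uniform(-1, 1, 1000)                      # far-end signal stand-in
echo = np.convolve(saturate(x), rir)[: len(x)]    # what the microphone picks up

def lagged(sig, taps=64):
    """Matrix of delayed copies of sig, for FIR least-squares fitting."""
    return np.stack([np.concatenate([np.zeros(k), sig[: len(sig) - k]])
                     for k in range(taps)], axis=1)

# A purely linear canceller fit on x leaves residual echo: the memoryless
# saturation is outside its model.
h, *_ = np.linalg.lstsq(lagged(x), echo, rcond=None)
res_linear = echo - lagged(x) @ h

# Fitting the same FIR on the nonlinearly mapped input (known here,
# estimated in a real AEC) cancels the echo almost exactly.
g, *_ = np.linalg.lstsq(lagged(saturate(x)), echo, rcond=None)
res_hammerstein = echo - lagged(saturate(x)) @ g

print(f"residual echo (std): linear {np.std(res_linear):.4f}, "
      f"Hammerstein {np.std(res_hammerstein):.2e}")
```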
DACs, PAs and loudspeakers all produce continuous-time outputs, can be modeled as systems with a mixture of nonlinearities and memory effects, and can be predistorted in the digital domain to suppress the nonlinearities. However, there is no single recipe that fits all applications. The appropriate calibration technique depends on the characteristics of the model (such as the number and magnitudes of the parameters), the allowable overhead for calibration (in terms of hardware and calibration time), the desired accuracy, and so on.
1.3 Thesis Outline
The rest of this thesis is organized as follows:
• Chapter 2 covers some of the requirements of the system under calibration.
Specifically, we focus on the requirements of the calibration ADC and discuss
how some of the important ADC characteristics such as sampling speed, acqui-
sition bandwidth and noise affect the calibration.
• Chapter 3 introduces the suggested linearization method. We will discuss two
calibration methods that were implemented in software to linearize the DAC
and that achieved comparable results. The first method is based on parameter
estimation that results in a short calibration time but requires considerable
computational resources, which makes it unsuitable for physical implementation.
A second calibration method is proposed that trades off algorithm complexity
with calibration time and has good potential for VLSI implementation.
• Chapter 4 explains the physical design and implementation of the calibration
ADC. We discuss the ADC architecture and go over its various building blocks
at the circuit level. Important considerations and trade-offs in the design of
each block are also covered.
• Chapter 5 provides measurement results of the prototype IC.
• Finally, Chapter 6 concludes the thesis and provides some ideas for future work
and improvement of the system.
Chapter 2
System Considerations
2.1 DAC Requirements
In the previous chapter, we talked about the proposed DAC architecture. In this
section, we review some of the characteristics of the DAC that are essential in under-
standing the system requirements and the calibration.
We mentioned that in order to simplify the calibration, the DAC is specifically designed to have a memoryless nonlinearity [15]. Therefore, there is a one-to-one mapping
between inputs and outputs of the DAC and a digital predistorter having the inverse
mapping can cancel its nonlinearity. It is important to note that there is only so much
that the calibration can do because it does not change the range of the output pulses
that the DAC can generate. The calibration (via the predistorter LUT) can provide
the DAC with alternative input codes, but nevertheless the DAC must be designed to
be able to output all the pulses that a perfectly linear DAC generates. This is illus-
trated in Figure 2.1. The first picture shows a DAC with exaggerated nonlinearity as
well as the desired response. We can see that the DAC cannot always produce outputs
that are within 1 LSB of the desired outputs, such as the ones that are marked with
filled squares. This would only happen close to the mid-code where the I/O curve
has the largest slope. This problem can be solved by including redundant codes in
the DAC. It can be seen that if the mid-code slope is k times larger than the ideal
line then the DAC must have at least k times more codes to be able to fill the output
Figure 2.1: Two examples where a DAC’s I/O curve is not reversible. (a) Input codes
are not tight enough. (b) I/O curve discontinuity produces dead-zone.
Figure 2.2: The calibration algorithm modifies the DAC’s mapping between input
digital codes and its codomain of output pulses.
gaps. Another potential problem is shown in Figure 2.1(b). In this case, the DAC
exhibits a discontinuous I/O curve. This happens because of capacitor mismatches
inside the switched-capacitor (SC) core of the DAC and—if left to randomness—the
jumps can go in either direction. However, the upward jumps create dead zones in
the DAC output that cannot be fixed. Therefore, the capacitors in the core must be
intentionally mismatched so that the SC core produces downward jumps large enough
that random mismatches do not make the transfer curve irreversible.
A similar argument can be made about the dynamics of the DAC output. Figure 2.1
only focuses on the settled values of the output pulses. However, the transients of
a pulse also affect the linearity of the output [16], and these dynamics cannot be
left to the calibration to correct. The idea is demonstrated in Figure 2.2. The DAC
accepts 8-bit input codes, but internally it contains two additional bits for the
implementation of the redundancy. The calibration algorithm can then associate one
of the 2^10 possible outputs with each of the 2^8 input codes. Since the pulse
transients are hard-wired in the circuit, the DAC must be designed such that it can
produce pulses with adequate linearity in the bandwidth of interest.
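The code-to-pulse reassignment above can be sketched numerically. The snippet below builds a predistorter LUT for a hypothetical compressive 10-bit transfer curve (the tanh shape and all constants are illustrative assumptions, not the DAC of this work): each 8-bit input is mapped to the internal code whose output lands closest to the ideal level.

```python
import numpy as np

# Hypothetical compressive transfer curve of the 10-bit internal DAC
# (tanh shape and constants are illustrative only).
internal_codes = np.arange(1024)
v_out = np.tanh(2.5 * (internal_codes / 1023 - 0.5))
v_out = (v_out - v_out.min()) / (v_out.max() - v_out.min())  # normalize to [0, 1]

# Levels a perfectly linear 8-bit DAC would produce over the same range.
ideal = np.arange(256) / 255

# Predistorter LUT: map each 8-bit input to the internal code whose actual
# output lands closest to the ideal level.
lut = np.array([int(np.argmin(np.abs(v_out - v))) for v in ideal])

# Residual error after predistortion, in units of one 8-bit LSB (1/255).
err_lsb = np.abs(v_out[lut] - ideal) * 255
```

Because this particular curve has roughly four times more internal codes than its mid-code slope requires, every ideal level can be matched to within a fraction of an LSB.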
2.2 ADC Requirements
We showed that not every nonlinear DAC can be successfully predistorted. Next, we
discuss the requirements of the calibration ADC. Obviously, a high-precision ADC
that runs at the Nyquist rate of the fastest tone in the system and connects to the
DAC through a high-bandwidth buffer would do. However, such an ADC would burden the
whole system. In this section, we will see which characteristics of the calibration
ADC are key and which can be relaxed without affecting the calibration.
2.2.1 Sampling Rate
The literature on sampling requirements for identification of nonlinear systems is
large and diverse. According to the Nyquist-Shannon sampling theorem, in order to
Figure 2.3: Discrete-time approximation of a continuous-time Volterra system.
discretize a signal without loss of information, it is sufficient to sample at twice
the maximum signal frequency in the system (the Nyquist rate). Nonlinearity expands
the signal bandwidth, and so the Nyquist sampling frequency is determined by the
output. However, the Nyquist frequency of the tones that a nonlinear system
generates is often too high for practical purposes. Still,
information about the nonlinear system can allow a more relaxed sampling
requirement. It has been shown that the output of a system with a memoryless
nonlinearity only needs to be sampled at the Nyquist frequency of its input to be
perfectly reconstructed [17, 18, 19]. This problem has also been studied for
nonlinear systems with memory, i.e., systems with a Volterra kernel model, and again
the sampling frequency only needs to accommodate the input signal [20, 21]. This
result is obtained by inspecting Figure 2.3, where the sampled output of a continuous
time system is compared against the output of an equivalent discrete time system
that operates on the sampled input. It is shown that, with sampling frequencies as
low as the input Nyquist rate, the two outputs will be identical and e[n] = 0. It is
therefore concluded that an identification algorithm that uses the sampled input and
output signals will correctly identify the original continuous time system.
In the present application, however, our objective is not to reconstruct the output
of the DAC; we try to find the parameters of a model of the system. Essentially,
ours is a parameter estimation problem, as shown in Figure 2.4, where the unknown
is the set of system parameters modulated by a known sequence of input data. The
input code x[n] is digital and available to the estimator in its entirety.
Figure 2.4: Parameter estimation of a continuous-time Volterra system. The unknown
signal is the slow-changing set of parameters θ(t) and not the input code x[n].
The challenge of the parameter estimation algorithm is to demodulate x[n] from the
sampled output and deduce the hidden parameters of the system, θ(t). Therefore, the
actual signal that needs to be reconstructed is θ(t). The innovation rate of the system
is the rate at which these unknown parameters change—which is usually very slow—
and the sampling frequency only needs to track such changes in the parameters. These
variations can happen due to environmental effects while the circuit is running or at
startup—when the circuit wakes up to a set of actual parameters that are different
from the preset values. Often the bottleneck is set by the calibration time budget at
startup. This time, along with the amount of data that is needed by the calibration
algorithm to estimate θ, determines the sampling frequency.
As an example, suppose we know that the system can be modelled as y(t) =
x(t) + εx³(t). Then we just need to estimate ε in order to construct a predistorter.
If we ignore noise in this simple example, then no matter what the frequency of
x(t), a single measurement of y(t) at a known x(t) is all we need to calculate ε.
This argument can
be extended to the general case of an arbitrary nonlinear system with memory that
can be modelled with a discrete time Volterra series [22, 23, 24]. In this case, the
relationship between the output y[n] and the input x[n] is

y[n] = h_0 + \sum_{i_1=0}^{m-1} h_1(i_1)\, x[n-i_1]
     + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} h_2(i_1, i_2)\, x[n-i_1]\, x[n-i_2] + \ldots
     + \sum_{i_1=0}^{m-1} \sum_{i_2=0}^{m-1} \cdots \sum_{i_l=0}^{m-1} h_l(i_1, i_2, \ldots, i_l)\, x[n-i_1]\, x[n-i_2] \cdots x[n-i_l]   (2.1)
where m is the length of memory and l is the order of nonlinearity of the system. We
can see that although the output y is a nonlinear function of the input x, it is a linear
function of the filter parameters h. Therefore, these parameters can be found from a
sufficient number of input-output equations using, e.g., a least squares solution.
\begin{bmatrix} \vdots \\ y[n_1] \\ y[n_2] \\ y[n_3] \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & & \vdots & & \vdots & \\
1 & \cdots & x[n_1 - i_1] & \cdots & x[n_1 - i_1]\, x[n_1 - i_2] \cdots x[n_1 - i_l] & \cdots \\
1 & \cdots & x[n_2 - i_1] & \cdots & x[n_2 - i_1]\, x[n_2 - i_2] \cdots x[n_2 - i_l] & \cdots \\
1 & \cdots & x[n_3 - i_1] & \cdots & x[n_3 - i_1]\, x[n_3 - i_2] \cdots x[n_3 - i_l] & \cdots \\
\vdots & & \vdots & & \vdots &
\end{bmatrix}
\begin{bmatrix} h_0 \\ \vdots \\ h_1(i_1) \\ \vdots \\ h_l(i_1, \ldots, i_l) \\ \vdots \end{bmatrix}   (2.2)
From the above equation, we can see that it does not really matter how far apart
in time y[n_1] and y[n_2] are placed. Even if y[n] is obtained at very low sample
rates, the least squares problem can be solved. Of course, the sampling rate must
be fast enough that the filter parameters stay constant while the above matrices are
populated. Therefore, it is possible to find the coefficients of a general Volterra model
from downsampled outputs of the system. There are also several methods to find the
inverse of such a system [10, 25].
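As a small numerical illustration of this point, the sketch below identifies a hypothetical second-order Volterra model (memory m = 2, coefficients chosen arbitrarily) from heavily downsampled, noiseless output observations; only about one output sample in a hundred is used, yet least squares recovers the kernel exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2nd-order Volterra model with memory m = 2 (coefficients are
# arbitrary illustration values):
# y[n] = h0 + h1(0)x[n] + h1(1)x[n-1]
#        + h2(0,0)x[n]^2 + h2(0,1)x[n]x[n-1] + h2(1,1)x[n-1]^2
true = np.array([0.1, 1.0, 0.3, 0.05, -0.02, 0.01])

x = rng.uniform(-1, 1, 5000)            # known digital input sequence

def regressors(n):
    a, b = x[n], x[n - 1]
    return np.array([1.0, a, b, a * a, a * b, b * b])

y = np.array([regressors(n) @ true for n in range(1, len(x))])  # full output

# The calibration ADC observes only one output sample out of every 97:
idx = np.arange(1, len(x), 97)
A = np.vstack([regressors(n) for n in idx])
y_obs = y[idx - 1]                      # downsampling = keeping a subset of rows

est, *_ = np.linalg.lstsq(A, y_obs, rcond=None)
```

Dropping output samples simply removes rows from the regression; as long as enough well-conditioned rows remain, the solution is unchanged.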
It should be noted that modeling a system with a general Volterra series is a
brute-force method that is often not a good recipe for parameter estimation. A
Volterra model is often very complex and the number of involved parameters quickly
grows as the order of nonlinearity and memory in the system increases. For example,
consider a system that consists of the cascade of a static nonlinearity and a linear
filter. Assuming that the nonlinearity can be described with l coefficients and the filter
with m coefficients, the Volterra series of the system would include l ·m coefficients,
whereas only the l + m underlying parameters are truly needed for modelling.
Moreover, many of these coefficients are typically small; when multiplied together
they become even smaller, making the identification problem degenerate and
increasing the error. Numerical stability can be improved with regularization but,
in general, it pays off to model the system more deeply, incorporating the
underlying coefficients, rather than use the general Volterra form.
2.2.2 Acquisition Bandwidth
In the previous section, we concluded that the identification of a discrete-time
Volterra filter with known inputs can be accomplished with arbitrarily low sampling
rates at the output. In this section, we study the requirement on the acquisition
bandwidth of the sampler. For this analysis, we need to think in the frequency
domain and, to keep the thinking simple, we first assume that the sampler runs
faster than necessary, at the Nyquist frequency of the output, so there is no
spectral folding.
The goal of the calibration algorithm is to linearize the DAC. However, the DAC
output spectrum beyond the Nyquist zone will be filtered further down the signal
chain, and so it is irrelevant to the DAC's performance. The calibration ADC is to
capture the output and judge the linearity of the system. So it suffices for the
ADC to have better than the target linearity within the DAC Nyquist zone and to
ignore higher-frequency content. In practice, however, the ADC does not quite ignore
the spectrum above the DAC bandwidth, but captures it with some error. This error
must be bounded so that it does not affect the accuracy of the parameter estimation.
The essence of most parameter estimation algorithms is a least squares problem,
where the algorithm converges to the set of parameters that minimizes the squared
error. In this case, our calibration algorithm tries to linearize the system, so
the error must capture the nonlinearity of the DAC. According to Parseval's theorem,
the total squared nonlinearity error in the time domain is equivalent to the total
distortion power in the frequency domain. Therefore, the calibration can function
properly as long as the out-of-band nonlinearity error does not pollute the in-band
error. In other words, the power of
the dominant in-band harmonic must be larger than the total out-of-band distortion
power.
If the ADC downsamples the DAC output, the harmonics will fold down and mix.
But, as we showed in the previous section, this does not affect the calibration;
it only makes the reconstruction of the DAC output impossible. From the perspective
of a least squares problem, downsampling is equivalent to eliminating rows of
equations. The final result will be intact as long as there is a sufficient number
of equations.
2.2.3 Boxcar Sampling
The slow calibration ADC can sample an output pulse of the high-speed DAC at no
more than one instant. However, ideal sampling cannot fully measure the nonlinearity
of the whole pulse unless the DAC is designed to have sufficiently fast or linear
transitions and the output pulse is sampled after it has settled. A typical DAC
output pulse may exhibit various artefacts, such as nonlinear transitions, overshoot
or ringing, clock feedthrough, etc., before it settles to its final value. Taking a
single sample during such transients does not fully represent the whole pulse. A
better choice might be to sample the pulse close to its end, when its value has
almost settled.
This representation works especially well for sampled systems, such as
switched-capacitor circuits, which process instantaneous signal samples. However,
a DAC output is a continuous-time signal, and signal transients must be taken into
account as part of the whole picture. Heuristically, a better linearity measure is
the pulse energy, which can be obtained by integrating the whole pulse. This way,
all transients are simply combined in proportion to their temporal durations. If
the transients are short, the pulse energy will be proportional to the final value;
if not, they will be weighed in accordingly. Equivalently, we could measure the
so-called "glitch energy", the integral of the difference between the actual and
ideal pulse, as a measure of the nonlinearity of a DAC. Yet again, pulse or glitch
energy does not fully represent a continuous-time pulse: linearity of pulse energy
is neither a necessary nor a sufficient condition for linearity of a DAC. Figure 2.5
illustrates two example cases where looking
Figure 2.5: Two examples where pulse energy does not translate to signal linearity.
(a) Output signal is nonlinear but pulse energy is linear. (b) Output signal is
linear in the bandwidth of interest but pulse energy is nonlinear.
at pulse energy alone could be misleading. In case (a), the signal exhibits
nonlinear overshoot and undershoot; however, the two deviations almost cancel each
other, so the area under the pulse remains linear. In case (b), there is significant
clock feedthrough, which leads to a nonlinear pulse energy. However, such fast
transients are more likely to fall outside the bandwidth of interest and be
completely irrelevant. In comparison, the exact same glitch energy, had it
originated from wrong signal levels, would be more significant. Next, we examine
the heuristic behind pulse energy and consider the situations in which it provides
meaningful information.
The ultimate test for the linearity of the DAC is to examine its output spectrum
via the Fourier transform. If we write the DAC output as a series of pulses,

v_{DAC}(t) = \sum_k p_k(t - kT_s)   (2.3)
then its Fourier transform will be

\mathcal{F}\{v_{DAC}\} = \int \sum_k p_k(t - kT_s)\, e^{-j2\pi f t}\, dt
= \sum_k \left( \int_{T_s} p_k(t)\, e^{-j2\pi f t}\, dt \right) e^{-j2\pi f k T_s}
= \sum_k d_{ADC}[k]\, e^{-j2\pi f k T_s}
= \mathrm{DTFT}\{ d_{ADC}[k] \}   (2.4)
where

d_{ADC}[k] = \int_{T_s} p_k(t)\, e^{-j2\pi f t}\, dt   (2.5)
Therefore, given the above definition of the ADC samples, the DTFT of the ADC
samples will be equivalent to the continuous-time Fourier transform of the DAC
output. This means that, ideally, the ADC must prefilter the DAC output with a
boxcar sampler that has a complex gain of e^{-j2\pi f t}. Measuring the pulse energy
is equivalent to using a plain boxcar sampler with a fixed gain of 1. It gives an
approximation of the ideal case that is valid only at frequencies much smaller than
the DAC sampling frequency:

0 \le t \le T_s : \quad e^{-j2\pi f t} \approx 1 \;\Leftrightarrow\; f T_s \ll 1 \;\Leftrightarrow\; f \ll \frac{1}{T_s}   (2.6)
We see that there is no easy way to accurately represent the DAC pulse waveforms
with discrete samples. However, all we need the ADC for is to calibrate the DAC
nonlinearity. We also know that the DAC signal may undergo some filtering depending
on its (off-chip) load, and any calibration algorithm must be blind to linear
filtering at the output of the DAC. Therefore, the ADC may as well operate on a
filtered version of the DAC output. Assuming that there is a filter h(t) at the
input of the ADC (Figure 2.6), we have
d_{ADC}(t) = h(t) * \sum_k p_k(t - kT_s) = \sum_k \int h(\tau)\, p_k(t - kT_s - \tau)\, d\tau

If the filter response stays approximately constant over intervals of length T_s,
the above equation can be simplified to

d_{ADC}(t) = \sum_k h(t - kT_s)\, e_k   (2.7)
Figure 2.6: System block diagram and mapping between continuous time and discrete
time. (a) Accurate mapping of a continuous-time signal into a discrete sample cannot
be implemented easily. A prefilter in front of the ADC, as shown in (b), can
facilitate this.
Figure 2.7: Two pre-filter options for the calibration ADC: a filter with a long
time response (a) has a narrow frequency bandwidth (b), whereas a boxcar sampler
with a short time window (c) will pass more frequencies (d).
where e_k is the energy of pulse p_k, i.e., its integral.
Two example filters that satisfy the above criterion are shown in Figure 2.7. In
case (a), the filter has a slowly changing time-domain response. However, this
translates into severe filtering in the frequency domain, which decreases the
sampling SNR. The filter shown in (c) provides a better alternative with a wider
bandwidth.
Therefore, we end up using a boxcar sampler, just in line with the heuristic
intuition mentioned at the beginning of this section! To confirm that the
intervening discussion has not taken us in a circle, we note again that in our
system the ADC does not represent the DAC output, but rather its sinc-filtered
output. The reason this is acceptable is that the sampler has to be sensitive to
the DAC nonlinearity and insensitive to output filtering. So the insertion of an
extra filter
before the ADC, as long as it does not limit its acquisition bandwidth, does not get
in the way of the calibration.
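The adequacy of the boxcar approximation in (2.6) can be checked numerically. The sketch below builds a hypothetical DAC waveform (a held low-frequency tone with a short, arbitrary edge transient; all constants are illustrative), integrates each pulse over its period as a boxcar sampler would, and compares the DTFT of those samples with the continuous-time Fourier integral at the tone frequency:

```python
import numpy as np

# Illustrative (hypothetical) DAC waveform: N held levels of one low-frequency
# tone, each with a fast decaying transient at the pulse edge, oversampled by R.
R, N, Ts = 200, 64, 1.0
f0 = 1 / (64 * Ts)                                  # tone well below 1/Ts
t_fine = np.arange(N * R) * (Ts / R)
levels = np.sin(2 * np.pi * f0 * np.arange(N) * Ts)
v = np.repeat(levels, R)
glitch = 0.5 * np.exp(-np.arange(R) / (0.02 * R))   # short edge transient
steps = np.diff(np.concatenate(([0.0], levels)))
for k in range(N):
    v[k * R:(k + 1) * R] += steps[k] * glitch

# Boxcar sampler: integrate each pulse over its period, i.e. eq. (2.5) with
# the approximation e^{-j2*pi*f*t} ~ 1 of eq. (2.6).
d = v.reshape(N, R).sum(axis=1) * (Ts / R)

# Compare the DTFT of the boxcar samples with the continuous-time Fourier
# integral of the waveform at the tone frequency.
ct = np.sum(v * np.exp(-2j * np.pi * f0 * t_fine)) * (Ts / R)
dt = np.sum(d * np.exp(-2j * np.pi * f0 * np.arange(N) * Ts))
rel_err = abs(ct - dt) / abs(dt)
```

The residual between the two spectra is the phase and sinc droop that the fixed-gain boxcar ignores; it shrinks as f·Ts decreases, consistent with (2.6).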
2.2.4 Noise Requirement
In order to see the impact of noise on the calibration, we again consider
approaching the parameter estimation problem with a least squares method. In this
framework, there are typically more equations than unknowns, i.e., the problem is
overdetermined. The least squares approximation is intentionally tailored to weigh
in all the information and average out the noise. Therefore, as long as we can
provide enough measurements, the parameter estimation problem can be made robust to
noise. This means that the ADC can have relaxed noise requirements; the downside,
however, is a longer calibration time. We will take a closer look at the trade-off
between noise and calibration time in the next chapter.
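The averaging effect can be illustrated with a toy least squares problem (the model, noise level, and sample counts below are arbitrary assumptions for the illustration): a quadratic model is estimated from measurements corrupted by heavy noise, once with a short record and once with a long one.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy overdetermined estimation problem (model and numbers are illustrative):
# y = p0 + p1*x + p2*x^2 observed through a very noisy ADC.
true = np.array([0.5, 1.0, -0.2])
sigma = 0.5                              # large measurement noise

def estimate(n_meas):
    x = rng.uniform(-1, 1, n_meas)
    A = np.column_stack([np.ones(n_meas), x, x**2])
    y = A @ true + sigma * rng.normal(size=n_meas)
    est, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.linalg.norm(est - true)

# Median estimation error over repeated trials, for short vs long records.
err_small = np.median([estimate(100) for _ in range(50)])
err_large = np.median([estimate(10000) for _ in range(50)])
```

The estimation error shrinks roughly as 1/sqrt(N), so a noisy ADC can be compensated by a longer measurement record.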
2.3 Summary
In this chapter, we discussed some of the high-level requirements of the DAC and
the ADC. We saw that the requirements on the calibration ADC are quite different
from those of a general-purpose ADC, which leads to unique tradeoffs and
opportunities. To recap, the ADC can have a very low conversion rate and high noise.
On the other hand, it must have a large acquisition bandwidth, a small power and
area overhead on the overall system, as well as negligible loading on the DAC. These
requirements only make sense when the design process starts at the system level.
Chapter 3
Calibration Algorithm
Before we can develop a calibration strategy, we must have an appropriate model
of the system. The complexity of this model directly affects the complexity of the
calibration algorithm, so it is important that we model the system in the simplest
possible way without compromising accuracy.
In this chapter, we first review a few models of the DAC under discussion. Next,
we cover a few calibration algorithms and describe several techniques to reduce the
computational cost of the algorithm. The development of the calibration algorithms
and the design of the DAC partly overlapped, so the progression of the discussion
in this chapter reflects the train of thought during the research timeline.
3.1 System Modelling
3.1.1 Volterra Model
We mentioned in the previous chapter that a general nonlinear system with memory
can be modeled as a Volterra filter. While general, a Volterra model presents a
convoluted picture of the system and, more often than not, includes redundant
degrees of freedom. Consequently, the estimation of a Volterra filter's parameters
and the calculation of its inverse filter become unreasonably costly. Obviously, the
most direct way of simplifying the model is through careful design of
Figure 3.1: DAC with static nonlinearity and nonlinear output filter.
(a) Predistortion off. (b) Predistortion on.
the system. The specific DAC under calibration was intentionally designed such that
the nonlinearity and the memory effects do not blend together [15]. The nonlinearity
of the DAC comes in part from the SC core, due to capacitor mismatches, and in part
from the open-loop drivers. The memory effect mainly arises at the output load,
which resides off-chip. In order to keep the nonlinearity and the filter isolated,
care was taken to minimize any filtering effect through the SC core up to the
drivers. Then, using circuit-level information, we can come up with a model that is
much simpler than a general Volterra filter.
3.1.2 Static Nonlinearity and Nonlinear Filtering
By minimizing filtering through the DAC core, we are able to model it as a static
nonlinearity, which is the simplest form of nonlinearity. The output filter,
however, might not be a simple linear filter. In particular, in the early versions
of the DAC design, there was non-negligible nonlinearity in the output filter,
caused by the voltage dependence of the junction capacitors at the output and the
large voltage swing at this node. Therefore, the DAC is modeled as a static
nonlinearity cascaded with a nonlinear filter. We next explain an algorithm for
predistortion of a DAC with this model.
Figure 3.1 is a simple diagram showing the predistorter block, the DAC core and
the output filter. In the first picture, the predistorter is off and so the input x directly
gets to the DAC input. The open-loop driver of the DAC produces a current G(x)
with a weak sigmoid nonlinearity. This current is pumped into a resistive load and a
weakly nonlinear output capacitance. Writing a KCL at this node, we can find the
output voltage y as below:
G(x) = C(y)\,\dot{y} + y   (3.1)

where C(y) is the voltage-dependent capacitance, \dot{y} is the derivative of the
output voltage, and the equation is normalized to the value of the resistance. The
information available to the calibration algorithm is the entire sequence of the
input x as well as sporadic samples of y. We see that the derivative of the output
also appears in the system's equation, but it cannot be obtained from downsampled
output values. However, if the nonlinearities of the circuit are small, or if the
predistorter can almost linearize the DAC, then the output y will be almost equal
to x (Figure 3.1(b)). So we may use the input derivative \dot{x} as an initial
approximation for \dot{y}. This approximation improves as the system is
progressively linearized by the calibration. Therefore, we can write

y = -C(y)\,\dot{y} + G(x) \approx -C(y)\,\dot{x} + G(x)   (3.2)
Let us assume for simplicity that we pick polynomial bases to describe the nonlinear
functions G(·) and C(·). Thus,

y \approx -\dot{x} \left( c_0 + c_1 y + c_2 y^2 + c_3 y^3 + \ldots \right) + \left( g_0 + g_1 x + g_2 x^2 + g_3 x^3 + \ldots \right)   (3.3)
As we take more and more samples of the output, we can build up a system of
equations in terms of the unknown coefficients c and g:
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots & \\
-\dot{x}_l & -\dot{x}_l y_l & -\dot{x}_l y_l^2 & -\dot{x}_l y_l^3 & \cdots & 1 & x_l & x_l^2 & x_l^3 & \cdots \\
-\dot{x}_m & -\dot{x}_m y_m & -\dot{x}_m y_m^2 & -\dot{x}_m y_m^3 & \cdots & 1 & x_m & x_m^2 & x_m^3 & \cdots \\
-\dot{x}_n & -\dot{x}_n y_n & -\dot{x}_n y_n^2 & -\dot{x}_n y_n^3 & \cdots & 1 & x_n & x_n^2 & x_n^3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots &
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \\ \vdots \\ g_0 \\ g_1 \\ g_2 \\ g_3 \\ \vdots \end{bmatrix}   (3.4)
Figure 3.2: Calibration of a DAC with static nonlinearity and nonlinear output
filter.
These equations can be solved using a least-mean-squares (LMS) method, as shown
in Figure 3.2. After obtaining an estimate of the DAC nonlinearity function C(·),
the next step is to predistort the incoming data such that

C(z) = x \;\Rightarrow\; z = C^{-1}(x)   (3.5)
The predistorter can use the following algorithm to calculate the inverse function:
Algorithm 1 Iterative predistortion algorithm to calculate z = C^{-1}(x)
1: z^{(0)} = x
2: while z has not reached steady state do
3:     z^{(n+1)} = z^{(n)} + \varepsilon \left( x - C(z^{(n)}) \right)   // \varepsilon is a small update factor
4: end while
Since the nonlinearity is weak, the initial value of z in line (1) is not far from
accurate. After a few iterations, at steady state we have

z^{(n+1)} = z^{(n)}   (3.6)

which, combined with the update formula in line (3), results in

z^{(n+1)} = z^{(n)} = z^{(n)} + \varepsilon \left( x - C(z^{(n)}) \right) \;\Rightarrow\; x = C(z^{(n)})   (3.7)
Figure 3.3: Hammerstein model of a system with input x and output y.
which is exactly what we needed.
The example calibration in this section shows how the knowledge about the nature
of nonidealities can help us come up with a calibration scheme. The main techniques
that were used in this example—such as LMS filtering, adaptive calibration and
iterative predistortion—are recurrent themes in calibration algorithms. We will see
more examples of these techniques in the rest of this chapter.
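Algorithm 1 is easy to prototype. The sketch below applies it to a hypothetical weak cubic nonlinearity standing in for C(·) (the function, ε, and tolerances are illustrative assumptions); since |1 − εC′(z)| < 1 for this choice of ε, the fixed-point iteration converges:

```python
import numpy as np

# Hypothetical weak static nonlinearity standing in for C(.) (the 0.05
# coefficient is an arbitrary illustration value).
def C(z):
    return z + 0.05 * z**3

def predistort(x, eps=0.5, tol=1e-9, max_iter=1000):
    """Algorithm 1: iterate z <- z + eps*(x - C(z)) until steady state."""
    z = np.array(x, dtype=float)         # line 1: z^(0) = x
    for _ in range(max_iter):
        step = eps * (x - C(z))          # line 3: update toward C(z) = x
        z = z + step
        if np.max(np.abs(step)) < tol:   # steady state reached
            break
    return z

x = np.linspace(-1.0, 1.0, 11)           # input codes, normalized
z = predistort(x)                        # predistorted codes: C(z) ~ x
```

The update is vectorized, so an entire table of input codes can be inverted at once to populate the predistorter LUT.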
3.2 Hammerstein System Calibration
As the design of the DAC progressed, it turned out that the static nonlinearity of the
DAC tends to be more complicated than the output filtering. The DAC eventually
utilized three interleaved cores to increase sampling speed. Since each core is different,
there would be three times more nonlinearity coefficients. On the other hand, the
nonlinearity of output capacitance was diminished through the use of cascode devices.
Eventually, the DAC model boiled down to a cascade of a static nonlinearity and a
linear filter, as illustrated in Figure 3.3. This model is known as a Hammerstein
model. In the next two sections, we cover techniques for predistortion of this model.
3.2.1 Parameter Estimation
A Hammerstein model is defined by two sets of parameters: c ∈ R^l is a vector of
nonlinearity coefficients and h ∈ R^m a vector of filter coefficients. It can be
seen that the relationship between the input and output in this system can be
described as follows:

y_i = h^T \begin{bmatrix} 1 & x_i & x_i^2 & \cdots \\ 1 & x_{i-1} & x_{i-1}^2 & \cdots \\ 1 & x_{i-2} & x_{i-2}^2 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix} c = h^T V_i c   (3.8)
The Hammerstein system identification problem is to find the two vectors c and h
that best describe the input-output relationship of a real system or, in other
words, minimize the error between the actual output of the system and the output
expected from a Hammerstein model. Therefore, we have

[h\;c] = \arg\min_{[h\,c]} \sum_{i=1}^{N} \left( y_i - h^T V_i c \right)^2   (3.9)
The above equation for h and c does not have a closed-form solution. However, we
can see from (3.8) that a Hammerstein model is linear in h for a fixed c, and linear
in c for a fixed h. So, having an estimate of one of the two vectors, we can find
the other through a least squares problem; then we can use the new vector to improve
our initial estimate and continue these iterations until the results converge. We
will now take a closer look at one step of this iterative method.
With a given c vector, the equation below denotes an LS problem that gives h:
y = Ah (3.10)
where y is a vector of DAC outputs digitized by the ADC and A is a tall matrix whose
elements are functions of the DAC inputs and the c coefficients. As the ADC takes
more measurements, new elements are appended to the end of the vector y and new rows
to the equation. For example, a few rows of equation (3.10) will look like
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
f(x_l) & f(x_{l-1}) & f(x_{l-2}) & \cdots \\
f(x_m) & f(x_{m-1}) & f(x_{m-2}) & \cdots \\
f(x_n) & f(x_{n-1}) & f(x_{n-2}) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \end{bmatrix}   (3.11)
where f is our estimate of the nonlinearity function based on the current c vector.
After solving the above equation for h, we can update the c coefficients through
this equation:
y = Bc (3.12)
where the entries of the matrix B are functions of the DAC input as well as the
current estimate of the filter h. So a few rows of this equation will be
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
\sum_k h_k & \sum_k h_k x_{l-k} & \sum_k h_k x_{l-k}^2 & \cdots \\
\sum_k h_k & \sum_k h_k x_{m-k} & \sum_k h_k x_{m-k}^2 & \cdots \\
\sum_k h_k & \sum_k h_k x_{n-k} & \sum_k h_k x_{n-k}^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \end{bmatrix}   (3.13)
The two LS problems must be solved in each iteration until the coefficients c and
h converge to their final values. There are efficient techniques for solving such
LS problems. In particular, the algorithm can be implemented so that it does not
need to store the whole A or B matrix before solving; rather, the computations can
be paced at the rate at which the matrices are built. The algorithm takes each row
of the matrix and the corresponding entry of y as they come in and updates the
solution. Therefore, the height of the LS matrices, as determined by the noise
requirement, only adds to the computation time, not to the hardware computing
resources. The width of the matrices, however, is equal to the number of unknown
parameters and directly determines the complexity of the DSP hardware.
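The alternating procedure can be sketched end to end. The snippet below simulates a hypothetical Hammerstein system (cubic nonlinearity, 3-tap filter; all coefficients are arbitrary illustration values) and alternates the two least squares steps of (3.11) and (3.13). Note that a Hammerstein cascade has an inherent gain ambiguity between c and h, so convergence is checked up to a scale factor:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical Hammerstein system (all coefficients are illustration values):
# cubic static nonlinearity followed by a 3-tap FIR filter.
c_true = np.array([0.0, 1.0, 0.1, -0.05])   # nonlinearity coefficients
h_true = np.array([1.0, 0.4, 0.1])          # filter coefficients

x = rng.uniform(-1, 1, 2000)
V = np.column_stack([x**p for p in range(4)])   # rows [1, x, x^2, x^3]
y = np.convolve(V @ c_true, h_true)[:len(x)]    # system output

c = np.array([0.0, 1.0, 0.0, 0.0])              # initial guess: identity
h = np.zeros(3)
n = np.arange(2, len(x))                        # skip the filter start-up

for _ in range(50):
    f = V @ c                                    # current nonlinearity output
    A = np.column_stack([f[n - k] for k in range(3)])
    h, *_ = np.linalg.lstsq(A, y[n], rcond=None)     # solve (3.11) for h
    B = sum(h[k] * V[n - k] for k in range(3))
    c, *_ = np.linalg.lstsq(B, y[n], rcond=None)     # solve (3.13) for c

f = V @ c
A = np.column_stack([f[n - k] for k in range(3)])
err = np.max(np.abs(A @ h - y[n]))               # residual of the fitted model
```

In practice each LS step would be updated row by row as ADC samples arrive, rather than solved in batch as here.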
The width of the matrix A is equal to the number of filter coefficients in h and is
set by the time constants associated with the load. The width of the matrix B is
equal to the number of nonlinearity coefficients. As we mentioned before, the DAC
includes three interleaved cores, each of which has its own driver nonlinearity
coefficients as well as SC-core mismatch coefficients. With three cores, equation
(3.11) will
change into

\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
f(x_l) & f'(x_{l-1}) & f''(x_{l-2}) & \cdots \\
f(x_m) & f'(x_{m-1}) & f''(x_{m-2}) & \cdots \\
f(x_n) & f'(x_{n-1}) & f''(x_{n-2}) & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \end{bmatrix}   (3.14)
where f(·), f'(·) and f''(·) denote the nonlinear functions corresponding to the
three cores. On the other hand, (3.13) changes into
\begin{bmatrix} \vdots \\ y_l \\ y_m \\ y_n \\ \vdots \end{bmatrix} =
\begin{bmatrix}
\vdots & \vdots & \vdots & \\
\sum_k h_{3k} & \sum_k h_{3k} x_{l-3k} & \sum_k h_{3k} x_{l-3k}^2 & \cdots \\
\sum_k h_{3k} & \sum_k h_{3k} x_{m-3k} & \sum_k h_{3k} x_{m-3k}^2 & \cdots \\
\sum_k h_{3k} & \sum_k h_{3k} x_{n-3k} & \sum_k h_{3k} x_{n-3k}^2 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
\begin{bmatrix} c \\ c' \\ c'' \end{bmatrix}   (3.15)

where c, c' and c'' are vectors corresponding to the nonlinearity coefficients of each
core. We see that the matrix entries in (3.15) are more complicated than those in
(3.13), and the LS matrix is now three times wider than before. Therefore, solving
the corresponding LS equation in (3.15) becomes a major bottleneck in the
calculations. In the next section, we investigate methods to reduce the complexity
of the problem.
3.2.2 Adaptive Predistortion
The linearization technique based on parameter estimation first attempts to estimate
all parameters of the model; then it passes the nonlinearity coefficients to a
predistorter block where the inverse nonlinearity is constructed (Figure 3.4). This
is clearly a brute-force approach to the linearization problem, because only a
subset of the identified parameters is needed for constructing the predistorter. In
particular, the filter coefficients are discarded at the end of the parameter
estimation. They are only used as a complementary step in the iterative solution,
whereas the main goal is the identification of the nonlinearity coefficients from
which we can calculate the inverse
Figure 3.4: Calibration through system identification.
nonlinearity and construct a predistorter. In fact, even the nonlinearity
coefficients are not directly used at the end of the algorithm; all we care about
is obtaining the inverse coefficients.
The redundancy of the information produced in parameter-estimation-based
linearization suggests that there might be a direct path to obtaining the
predistorter coefficients without having to identify the system parameters. In other
words, the objective of the calibration algorithm is not to estimate the system
parameters per se, but rather to come up with a set of predistortion parameters that
invert the system nonlinearity. An efficient calibration algorithm should be blind
to any linear filter inside the system and only provide a measure of the
nonlinearity of the whole signal path. Given a cost function that measures the path
nonlinearity, an adaptive algorithm can then directly adjust the predistorter until
the desired linearity is obtained. This linearization approach is shown in
Figure 3.5. Such an adaptive method bypasses the calculation of the system
parameters and can therefore be more efficient than linearization through system
identification. In the following section, we investigate the calculation of a cost
function that can be used in an adaptive calibration.
Figure 3.5: Calibration through adaptive predistortion.
Adaptive Linearization Based on Coherence Function
A suitable cost function for linearizing the system must yield a measure of linearity
of the system given its inputs and outputs. Having such a cost function, we can
adjust the control knobs of the system until desired linearity is achieved even without
knowledge of the linear block. An example cost function used in [14] in the context
of AEC calibration is the coherence function as given below:
Cxy(f) =|Sxy(f)|2
Sxx(f)Syy(f)(3.16)
where Sxy(f) is the cross-spectral density between x and y, and Sxx(f) and Syy(f) are
the auto-spectral density of x and y, respectively. It can be shown that this function
is equal to unity at any frequency where the input/output relationship is linear and its
value decreases as the nonlinearity grows. In order to have a measure for nonlinearity
across all frequencies, the integral of the coherence function can be used. Calculation
of the coherence function is especially economical thanks to efficient methods for
computing the DFT. However, capturing the samples for the DFT requires Nyquist-rate
sampling at the output of the system, which we hoped to avoid.
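As a concrete illustration, the coherence measure in (3.16) can be estimated with Welch-style averaged periodograms. The sketch below is a minimal Python model of our own (segment length, window, and the example filter are arbitrary choices, not from this work); it shows the average coherence dropping once a cubic distortion term is added:

```python
import numpy as np

def coherence(x, y, seglen=256):
    """Magnitude-squared coherence via Welch-style averaged periodograms."""
    win = np.hanning(seglen)
    nseg = len(x) // seglen
    Sxx = np.zeros(seglen)
    Syy = np.zeros(seglen)
    Sxy = np.zeros(seglen, dtype=complex)
    for k in range(nseg):
        xs = x[k * seglen:(k + 1) * seglen] * win
        ys = y[k * seglen:(k + 1) * seglen] * win
        X, Y = np.fft.fft(xs), np.fft.fft(ys)
        Sxx += np.abs(X) ** 2          # auto-spectrum of x
        Syy += np.abs(Y) ** 2          # auto-spectrum of y
        Sxy += np.conj(X) * Y          # cross-spectrum
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

rng = np.random.default_rng(1)
x = rng.standard_normal(256 * 200)
h = np.array([0.9, 0.3, -0.1])          # an arbitrary linear filter
y_lin = np.convolve(x, h, mode="same")  # linear path: coherence near 1
y_nl = y_lin + 0.2 * y_lin ** 3         # cubic distortion lowers coherence
c_lin = coherence(x, y_lin).mean()
c_nl = coherence(x, y_nl).mean()
```

Averaging 1 − Cxy(f) over frequency then yields the scalar nonlinearity measure described above.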
Figure 3.6: Recursive least squares using Givens rotations.
Adaptive Linearization Based on LS Error
A better candidate for the cost function of an adaptive predistortion can be obtained
by the following formula:
e = ‖y − Dh‖    (3.17)

where y is the output of the system, D is a matrix containing the inputs, and h is a
filter that captures the best linear relationship between d and y. The error measures the
deviation of the system from a linear filter model, so it can quantify the linearity of
the system.
The least squares (LS) error in (3.17) can, of course, be found by first solving the
equation for h and then explicitly evaluating the expression. This is typically done
by applying orthogonal transformations to both sides of the equation such that the
matrix D is triangularized, which then makes the equation much easier to solve. The
transformation can be done through a series of Givens rotations, where each rotation
zeros an element in the subdiagonal of the matrix. In a real-time application with
growing sets of measurements, every time a new row is appended to D, a series of
Givens rotations is applied to zero that row. This updates the triangular matrix,
which is then used to update the LS solution. An example is illustrated in Figure 3.6.
The leftmost picture shows an already triangularized matrix D alongside the vector
y and it is assumed that the length of the filter h is 3. When the ADC takes a new
sample, a new row will be added to the equation. Three Givens rotations are then
applied that zero the new row. The LS solution h and the LS error in (3.17) can then
be calculated successively.
However, it can be shown that having the Givens rotation angles is sufficient for
Figure 3.7: Systolic array implementation of the RLS error calculation algorithm.
calculating the error of an LS problem, and there is no need for explicit calculation
of the filter coefficients [26, 27]. A proof of this theorem is presented in Appendix 1,
along with pseudocode that shows how the error can be updated online as the LS
equation grows over time. The interestingly simple final result is that
the residual error is proportional to the product of all the cosines of Givens rotation
angles:
en = (∏_{i=1}^{m} cos φi) · pn    (3.18)

where en is the error at time n, φi is the angle of the ith Givens rotation, and pn is
the result of the Givens transformations applied to the nth measurement yn.
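This identity is easy to verify numerically. The following sketch (a floating-point Python model of ours, not the thesis implementation) appends one measurement row at a time, zeroes it with Givens rotations, and checks that the product of cosines times the rotated measurement equals the a posteriori residual y_n − d_n^T h_n:

```python
import numpy as np

def qrd_rls_update(R, z, d, y):
    """Zero the appended row (d, y) against the triangular system (R, z),
    one Givens rotation per column; R and z are updated in place.
    Returns prod(cos) * p, the error e_n of (3.18)."""
    d = d.astype(float)
    m = len(d)
    cos_prod, p = 1.0, float(y)
    for i in range(m):
        rad = np.hypot(R[i, i], d[i])
        if rad == 0.0:
            continue                      # nothing to rotate in this column
        c, s = R[i, i] / rad, d[i] / rad
        Ri = R[i, i:].copy()
        R[i, i:] = c * Ri + s * d[i:]     # rotate row i of R with the new row
        d[i:] = -s * Ri + c * d[i:]
        z[i], p = c * z[i] + s * p, -s * z[i] + c * p
        cos_prod *= c
    return cos_prod * p

rng = np.random.default_rng(0)
m, n = 3, 20
D = rng.standard_normal((n, m))
yv = rng.standard_normal(n)
R, z = np.zeros((m, m)), np.zeros(m)
worst = 0.0
for k in range(n):
    e = qrd_rls_update(R, z, D[k], yv[k])
    if k >= m:                            # once R is full rank, compare with
        h = np.linalg.solve(R, z)         # the explicit residual y_k - d_k^T h
        worst = max(worst, abs(e - (yv[k] - D[k] @ h)))
```

The filter coefficients h are computed here only to verify the identity; the point of (3.18) is precisely that the hardware never needs them.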
In this work, the RLS error calculation was implemented in software. However,
much can be done to make it more hardware-friendly. Specifically, the calculation of
the sin φi and cos φi terms in the algorithm (lines 7 and 8 of the pseudocode in
Appendix 1) can be done without any division or square root operation
using the CORDIC algorithm. In this case, the whole algorithm can be implemented
using simple operations such as addition, subtraction, bitshift and table lookup. Fur-
thermore, as Figure 3.7 shows, the algorithm can be pipelined in a systolic array
structure [27]. The systolic array in this case consists of a network of processing cells
and each cell is a CORDIC block. Rows of the data matrix and elements of y enter
the array from the top and the error is obtained at the bottom. Since the processing
is pipelined, the grid accepts new data and sends out an updated error in every cycle.
From Figure 3.7, it can be seen that the number of CORDIC blocks in the systolic
array is about (m + 2)(m + 3)/2, where m is equal to the width of the matrix D or
the length of the filter h. The number of logic gates in a CORDIC block is roughly
comparable to the number of gates in a multiplier. Assuming that the processing is
done with 15 bits of accuracy, the implementation of the systolic array will have
a gate count equivalent to about 100m^2 full adders (FAs).
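To make the shift-add principle concrete, here is a floating-point sketch of CORDIC of our own (not the fixed-point hardware): vectoring mode drives one coordinate of a vector to zero, and the recorded direction bits replay the same rotation on another vector, which is exactly what a Givens cell needs:

```python
import numpy as np

N = 16                                    # number of micro-rotations
K = np.prod([np.sqrt(1 + 4.0 ** (-i)) for i in range(N)])  # CORDIC gain

def cordic_vector(x, y):
    """Vectoring mode: drive y to zero; returns K*sqrt(x^2 + y^2), the
    residual y, and the direction bits encoding the rotation angle."""
    dirs = []
    for i in range(N):
        d = 1 if y < 0 else -1            # rotate toward the x-axis
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
        dirs.append(d)
    return x, y, dirs

def cordic_rotate(x, y, dirs):
    """Rotation mode: replay the recorded micro-rotations (same angle)."""
    for i, d in enumerate(dirs):
        x, y = x - d * y * 2.0 ** (-i), y + d * x * 2.0 ** (-i)
    return x, y

xm, ym, dirs = cordic_vector(3.0, 4.0)    # magnitude 5, angle atan2(4, 3)
xr, yr = cordic_rotate(1.0, 0.0, dirs)    # unit vector rotated by -atan2(4, 3)
```

Every step uses only shifts and adds; in fixed-point hardware the division by 2^i becomes a bit shift and the constant gain K is folded into a single final scaling.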
3.3 Optimization of the Predistorter
In the previous section, we showed how to measure the nonlinearity of the signal path
by using the inputs and sampled outputs of the system. The nonlinearity error can
be fed into an optimization method that controls the predistorter parameters. The
optimizer then tunes the predistorter until the nonlinearity is minimized, i.e. the
predistortion best cancels the nonlinearity of the system.
The objective function to be minimized is a scalar function of the predistorter
parameters. The class of techniques that minimize such functions are the multivariate
descent methods. This multivariate optimization problem can be formulated as
min error = f(θ1, θ2, . . . , θl)
subject to θ ∈ Θ (3.19)
where Θ ⊂ Rl is the domain of acceptable parameters. The function f(·) relates the
nonlinearity of the signal chain—starting from the predistorter all the way through
the DAC, the output filter and the ADC sampling network—to the predistorter
parameters. In the previous section, we described an algorithm to evaluate f based
on output measurements. This, however effective, is far from having a closed-form
Figure 3.8: Search for the next simplex in the Nelder-Mead simplex algorithm. (The labeled points are θ(1), θ(l + 1), the centroid m, the exploratory points r, s, c, cc, and the shrink point v(l + 1).)
expression for the cost function. Therefore, the optimization must be driven only
by function evaluations, and cannot directly use its gradient. The class of optimization
techniques that do not explicitly use derivatives is known as unconstrained
optimization or direct search methods [28].
A very straightforward direct search method is the coordinate search. The search
starts by varying one parameter at a time by steps of the same magnitude. If one of
these steps yields a better cost function, the improved point becomes the new iterate;
otherwise the step size is halved, and the process repeats until the steps are sufficiently
small. There are quite a few variations of this basic algorithm [29]. Algorithms in this
class are slow—essentially because they do not use any curvature information—but
they are very reliable. They are also easy to implement and have low hardware
requirements.
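A minimal sketch of such a coordinate search (the function name, stopping rule, and evaluation budget are illustrative choices of ours):

```python
import numpy as np

def coordinate_search(f, theta, step=0.5, tol=1e-6, max_evals=10000):
    """Vary one parameter at a time; halve the step when no move helps."""
    theta = np.asarray(theta, dtype=float)
    fbest, evals = f(theta), 0
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(theta)):
            for delta in (step, -step):
                cand = theta.copy()
                cand[i] += delta
                fc = f(cand)
                evals += 1
                if fc < fbest:            # accept the improved point
                    theta, fbest = cand, fc
                    improved = True
        if not improved:
            step /= 2                     # shrink the steps and retry
    return theta

# Example: a simple quadratic bowl with its minimum at (1, -2).
opt = coordinate_search(lambda t: (t[0] - 1) ** 2 + 10 * (t[1] + 2) ** 2,
                        [0.0, 0.0])
```

Only function evaluations are used; no gradient is ever formed, which is what makes the method attractive when f is available only through measurement.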
The most widely cited of these unconstrained optimization methods is the Nelder-Mead
simplex algorithm [30, 31]. This algorithm takes a simplex of l + 1 points for an
l-dimensional vector θ and modifies the simplex until it tightly surrounds the minimizer.
The simplex is modified according to the following procedure, as illustrated in
Figure 3.8: First, the points in the simplex are sorted according to their corresponding
cost function from the lowest (best) cost function at θ(1) to the highest (worst) cost
function at θ(l + 1). Then, the cost function is evaluated at four points on a line that
passes through θ(l + 1) and the centroid of the other vertices. These four points are
given by
θ(l + 1) + α (m− θ(l + 1)) (3.20)
where m = (θ(1) + · · · + θ(l))/l is the centroid point and α ∈ {1/2, 3/2, 2, 3}. Depending
on the value of the cost function at the exploratory points r, s, c or cc, one of these
points replaces the current worst vertex. The simplex either reflects around the
centroid point (α = 2), contracts inside itself (α = 1/2), contracts outside itself
(α = 3/2), or expands (α = 3). If all of the exploratory points would make for a worse
simplex, then the whole simplex shrinks towards the best vertex.
In each iteration of the algorithm, the shape of the simplex adapts to
the local topology of the function. This adaptability makes the simplex algorithm
more efficient and faster than the coordinate search algorithm. The
pseudocode in Algorithm 2 describes the algorithm in more detail.
The algorithm is not guaranteed to succeed unless it starts from a
simplex that is close enough to the minimizer; in that case, it will move toward
the minimum very efficiently. Figure 3.9 shows an example of the progress of the
Nelder-Mead simplex algorithm in R2.
Figure 3.10 shows an example of an adaptive calibration with simplex optimiza-
tion. The top pictures show the nonlinear input-output characteristic of the system
as well as the output filter. After the calibration is done, the predistorter curve con-
verges to the inverse of the original nonlinearity, with limits set by the order of the
predistortion function. The last picture shows how the linearity improves with sim-
plex iterations. The linearity is denoted by the spurious-free dynamic range (SFDR),
which is defined as the ratio of the amplitude of the fundamental to the amplitude of
the largest in-band spur. This example only shows the calibration of a single core and
also ignores the discontinuities of the characteristic curve. In this case the simplex
algorithm converges with less than 100 function evaluations. Each of these function
evaluations corresponds to 1000 measurements of the output, which are passed to the
RLS algorithm for an update of the cost function. As we saw previously, the RLS
algorithm consists of a series of Givens rotations, and each of the Givens rotations
Algorithm 2 Nelder-Mead simplex algorithm
Require: an initial simplex θ(i), i = 1, . . . , l + 1
1: order the points from lowest function value f(θ(1)) to highest f(θ(l + 1))
2: while the maximum number of iterations is not exceeded do
3: calculate r = 2m − θ(l + 1), where m = Σ_{i=1}^{l} θ(i)/l, and evaluate f(r)
4: if f(θ(1)) ≤ f(r) < f(θ(l)) then
5: accept r and terminate this iteration
6: else if f(r) < f(θ(1)) then
7: calculate s = m+ 2(m−θ(l + 1)) and evaluate f(s)
8: if f(s) < f(r) then
9: accept s and terminate the iteration
10: else
11: accept r and terminate the iteration
12: end if
13: else if f(r) ≥ f(θ(l)) then
14: if f(r) < f(θ(l + 1)) then
15: calculate c = m+ (r−m)/2 and evaluate f(c)
16: if f(c) < f(r) then
17: accept c and terminate the iteration
18: end if
19: else if f(r) ≥ f(θ(l + 1)) then
20: calculate cc = m+ (θ(l + 1)−m)/2 and evaluate f(cc)
21: if f(cc) < f(θ(l + 1)) then
22: accept cc and terminate the iteration
23: end if
24: end if
25: end if
26: if no new point was accepted then
27: calculate the l points v(i) = θ(1) + (θ(i) − θ(1))/2, i = 2, . . . , l + 1 (shrink)
28: the simplex at the next iteration is θ(1), v(2), . . . , v(l + 1)
29: end if
30: end while
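For reference, the moves of Algorithm 2 condense into a few lines of Python. This is a sketch of ours (the explicit shrink fallback and iteration cap are implementation choices), not the thesis code:

```python
import numpy as np

def nelder_mead(f, simplex, max_iter=500):
    """Reflect/expand/contract/shrink moves of the Nelder-Mead algorithm."""
    simplex = [np.asarray(p, dtype=float) for p in simplex]
    for _ in range(max_iter):
        simplex.sort(key=f)               # best first, worst last
        worst = simplex[-1]
        m = np.mean(simplex[:-1], axis=0) # centroid of the best l points
        r = 2 * m - worst                 # reflection (alpha = 2)
        accepted = None
        if f(simplex[0]) <= f(r) < f(simplex[-2]):
            accepted = r
        elif f(r) < f(simplex[0]):
            s = m + 2 * (m - worst)       # expansion (alpha = 3)
            accepted = s if f(s) < f(r) else r
        elif f(r) < f(worst):
            c = m + (r - m) / 2           # outside contraction (alpha = 3/2)
            if f(c) < f(r):
                accepted = c
        else:
            cc = m + (worst - m) / 2      # inside contraction (alpha = 1/2)
            if f(cc) < f(worst):
                accepted = cc
        if accepted is not None:
            simplex[-1] = accepted
        else:                             # shrink toward the best vertex
            simplex = [simplex[0]] + [simplex[0] + (p - simplex[0]) / 2
                                      for p in simplex[1:]]
    return min(simplex, key=f)

# Example: the same quadratic bowl, starting from a unit simplex.
opt = nelder_mead(lambda t: (t[0] - 1) ** 2 + 10 * (t[1] + 2) ** 2,
                  [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```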
Figure 3.9: An example of the Nelder-Mead simplex algorithm. Panels (a)–(i): initial simplex, expand, expand, reflect, contract inside, contract inside, contract outside, contract outside, contract inside.
Figure 3.10: Matlab simulation of a calibration run. (The panels show the nonlinearity curve and the output filter, the converged predistortion curve versus normalized input, and the SFDR in dB versus simplex iterations.)
Figure 3.11: The adaptive calibration consists of three nested iterative algorithms (simplex iterations, Givens rotation iterations, and CORDIC iterations), with simple CORDIC iterations at the lowest level.
can be implemented through a series of CORDIC iterations. Therefore, as illustrated
in Figure 3.11, the whole predistortion algorithm boils down to a number of CORDIC
iterations at its heart. This algorithm is much simpler to implement than the
parameter estimation algorithm. However, it consists of nested iterative algorithms and
requires more time to converge. In essence, simplicity of the hardware implementation
is traded for a longer calibration time to make the algorithm physically realizable.
The number of simplex iterations depends on the number of optimization param-
eters, while the number of measurements required in the RLS algorithm depends on
the sampling noise. Both of these will directly affect the calibration time. In the next
section, we quantify the relationship between these variables.
3.4 Calibration Speed
In the previous section, we described how an unconstrained optimization can be used to
tune the predistorter parameters. It can be shown that if the number of parameters
is not too large, the number of function evaluations that the optimization algorithm
performs scales linearly with the number of parameters [32]. Thus,
s = k · l (3.21)
where s is the number of function evaluations of the Nelder-Mead simplex algorithm
and l is the number of predistorter parameters that get calibrated. The factor k is a
proportionality constant, which turns out to be on the order of 100 for the Nelder-Mead
method.
Each of the function evaluations called by the optimization algorithm measures
the error in
y = Dh+ ε (3.22)
where ε denotes the ADC sampling noise on top of y. In order to adjust the predis-
tortion parameters, we need to be able to solve this equation with a certain accuracy.
The added noise will degrade the level of accuracy that we can get from the above
equation. In order to compensate for noise, we must add more rows to y and D in
(3.22) and make them taller. This means that we need to gather more samples and,
therefore, the calibration time gets longer.
Equation (3.22) is a multiple linear regression problem: it describes a
linear relationship between the dependent variables in y and the explanatory
variables in D, with the filter h as the vector of regression coefficients. A linear
regression model can be easily fitted using the least squares
approach. Of course, we showed previously that all we care about is the error in the
equation and that can be found through a workaround without explicitly calculating
the unknown h. But in this section, let’s assume that we decide to calculate h and see
how the trade-off between the number of measurements and sampling noise affects
our estimation accuracy. The least squares solution for h is
ĥ = D†y    (3.23)
where
D† = (D^T D)^{-1} D^T    (3.24)
is the Moore-Penrose pseudoinverse of the matrix D. The error in h due to the
sampling noise ε is
h − ĥ = h − D†y = h − D†(Dh + ε) = −D†ε    (3.25)
The covariance of this error is
cov[D†ε] = D† cov[ε] (D†)^T = D† (σ_ADC^2 I) (D†)^T
         = σ_ADC^2 (D^T D)^{-1} D^T ((D^T D)^{-1} D^T)^T
         = σ_ADC^2 (D^T D)^{-1}    (3.26)
The matrix D contains the digital input data that enter the DAC. We can assume
that the elements of D are independent identically distributed random variables with
variance σ_d^2. Therefore, if the matrix D is tall enough, the diagonal entries of
D^T D are inner products of a random vector with itself and therefore
have a mean of nσ_d^2. The off-diagonal elements of D^T D are inner products of two
independent random vectors and, therefore, have zero mean. It can also be shown
that the average of the diagonal elements of (D^T D)^{-1} is approximately the reciprocal
of the average of the diagonal elements of D^T D:
mean(diag((D^T D)^{-1})) = 1 / mean(diag(D^T D)) = 1/(nσ_d^2)    (3.27)
Also, summing over the m diagonal elements,

Σ(diag((D^T D)^{-1})) = m · mean(diag((D^T D)^{-1})) = m/(nσ_d^2)    (3.28)
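These statistics can be spot-checked in simulation; under the i.i.d. assumption and n ≫ m, the diagonal of (D^T D)^{-1} behaves as predicted (the sizes and σ_d below are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, sigma_d = 2000, 8, 0.7              # tall data matrix, n >> m
D = rng.normal(0.0, sigma_d, size=(n, m))
G = np.linalg.inv(D.T @ D)

mean_diag = np.mean(np.diag(G))           # approaches 1/(n * sigma_d^2)
trace_G = np.trace(G)                     # approaches m/(n * sigma_d^2)
```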
We assume that the norm of the vector h is one, so that 1/σ_h^2 is the SNR in the
estimated filter coefficients. Also, we can write

yi = di^T h ⇒ σ^2(yi) = ‖h‖^2 σ_d^2 = σ_d^2    (3.29)

Therefore, the ratio σ_d^2/σ_ADC^2 is a measure of the ADC SNR. Thus,

m/n ∝ SNR_ADC / SNR_h    (3.30)
The required number of measurements directly scales with the number of filter coef-
ficients and the desired accuracy in the estimation of the filter, and inversely scales
with the ADC SNR. So, for example, if the desired SNR for the estimation of the
filter coefficients is 60 dB, then
n ∝ m · 10^{(60 − SNR_ADC)/10} ≈ m · 2^{(60 − SNR_ADC)/3}    (3.31)
where the ADC SNR is expressed in dB. We assumed in (3.21) that
the simplex optimization converges after s evaluations of its objective function.
Since each of these function evaluations requires n ADC samples, then, assuming that
the ADC works at a sampling frequency of fs, the calibration time will be

tcal = s · n / fs    (3.32)
Thus, combining (3.32), (3.21) and (3.31), we can write

tcal = α · (l · m / fs) · 2^{(60 − SNR_ADC)/3}    (3.33)

where α is a proportionality factor. For example, assuming α ≈ 100, l = 5, m = 10,
and a target SFDR improvement of 20 dB, the calibration will require approximately
500,000 samples and take about 0.5 s if the sampling is done at 1 MS/s.
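Equation (3.33) is straightforward to evaluate; the helper below (the function name is ours) reproduces the example numbers, assuming SNR_ADC = 40 dB so that the exponent corresponds to the 20 dB gap quoted in the text:

```python
def calibration_time(l, m, fs, snr_adc_db, alpha=100.0):
    """t_cal = alpha * l * m / fs * 2**((60 - SNR_ADC)/3), per (3.33)."""
    return alpha * l * m / fs * 2.0 ** ((60.0 - snr_adc_db) / 3.0)

# l = 5 parameters, m = 10 filter taps, 1 MS/s sampling, SNR_ADC = 40 dB
t = calibration_time(5, 10, 1e6, 40.0)    # about 0.5 s
```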
3.5 Summary
In this chapter, we talked about several calibration techniques for linearizing the
DAC. We eventually converged on an adaptive linearization method and showed its
advantages compared to system identification-based methods. We showed how an
optimization algorithm can be used to minimize the cost function by adjusting the
predistortion LUT, and investigated the system-level trade-offs that affect the cali-
bration time.
Chapter 4
Prototype Design
This chapter describes some of the most important aspects of the design of the
calibration ADC, which is implemented in a cyclic (algorithmic) architecture. Cyclic
ADCs are very similar to pipelines and, although not nearly as widely implemented,
have long been used and understood. However, the special requirements of this
project call for some unique features. In this chapter, we will focus on the features
that differentiate this ADC from typical general-purpose ADCs.
4.1 ADC Architecture
A cyclic ADC is similar to a single stage of a pipeline converter. In the first cycle of
a conversion, the cyclic stage acquires the input signal. In the following cycles, the
output of the stage—instead of moving through a pipeline—is repeatedly fed back
to its input until the conversion is complete. Because of its serial operation, a cyclic
converter cannot work as fast as a pipeline. Also, since it cannot utilize stage scaling,
a cyclic ADC is less energy efficient than a pipeline ADC.
As explained in Chapter 2, the calibration ADC can be very slow. Therefore, the
speed disadvantage of a cyclic architecture is not limiting in this case. To remedy
the energy inefficiency, a cyclic ADC could use tapered conversion cycles [33], where
the MSB residues get more time than the LSB residues. In the current application,
the power of the ADC is lowered due to relaxed noise requirement, and the usual
Figure 4.1: A traditional bottom-plate track-and-hold circuit.
inefficiency of cyclic converters due to the absence of stage scaling is not prominent
in this design. On the plus side, a cyclic ADC has a size advantage compared to
a pipeline. Also, there is only one residue gain involved in a cyclic converter and
therefore the calibration is straightforward. Lastly, the cyclic ADC in this work is
designed to have a programmable number of cycles. Therefore, in exchange of more
conversion time, the LSB size can be decreased. This was critical because we wanted
to have the option of experimenting with different signal levels at the DAC output
when testing the chip. Obviously, such programmability is not an option in a pipeline
architecture.
4.1.1 High Bandwidth Sampling
The biggest challenge of linear sampling is the design of the sampling switch. In
a typical track-and-hold circuit (Figure 4.1) the voltage dependent resistance of the
switch transistors degrades the linearity at high speeds [22].
An interesting solution to the problem of nonlinear sampling switch is to do away
with the switch altogether [34]. As shown in Figure 4.2, the idea is to use fixed resistors
to connect the input signal to the sampling capacitors, so there is no nonlinear element
in the signal path during sampling. In the hold mode, the top plates of the sampling
capacitors are shorted to block the input signal.
Due to the finite resistance of the shorting switches, there will still be some input
Figure 4.2: Track-and-hold circuit with RC sampling.
Figure 4.3: (a) Cascaded RC sampling stages to decrease feedthrough. (b) Nonlinear capacitances due to large switches and large resistors degrade tracking linearity.
Figure 4.4: Modified RC sampling based on Wheatstone bridge operation. (The sketches show the track- and hold-mode responses htrack(t) and hhold(t).)
feedthrough in the hold mode. In order to reduce the feedthrough, two [35] or three
[36] switched-RC branches can be cascaded and both the switches and the resistors
have to be made larger. This is illustrated in Figure 4.3(a). Shown in (b) are the
capacitances along the signal path in the track mode. We can see that large switches
or large resistors add nonlinear junction capacitances along the sampling path, which
decrease the amplitude of the sampled voltage and degrade the linearity at high
frequencies. Therefore, this type of sampling is only appropriate for low-frequency
operation.
An alternative implementation is to arrange the switches so that they form a
Wheatstone bridge in the hold mode, as shown in Figure 4.4. In this case, the input
suppression is achieved through resistance matching between switches and resistors
instead of resistive division. The matching cannot be made perfect so there would
still be some feedthrough. Again, the feedthrough can be made smaller by cascading
two stages; however, in this case there is no need for large resistors or switches and
it is possible to use minimum size devices instead. This will reduce the nonlinear
capacitance along the path and improve tracking linearity. Figure 4.4 also shows the
Figure 4.5: Comparison of three switch configurations. (a) Switch in series with a resistor, (b) RC sampler, (c) Wheatstone sampler.
resulting time-domain response of the sampling interface. During the track phase, the
sampler impulse response is a decaying exponential corresponding to the low-pass
filter formed by R and Cs. In the hold mode, the signal is shut off. If the time constant
RC is much larger than the sampling window, then the sampler approximates a
boxcar response, which, according to the discussion in Chapter 2, is the desired filter
before the ADC.
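The boxcar approximation can be checked with a simple first-order model of our own: the sample taken at the end of a track window of length T weights the input by w(τ) = (1/RC)·e^{−(T−τ)/RC}, which becomes nearly flat over the window when RC ≫ T:

```python
import numpy as np

def track_weight(tau, T, RC):
    """Weight applied to the input at time tau in [0, T] by an RC sampler
    whose hold instant is at t = T (first-order model)."""
    return np.exp(-(T - tau) / RC) / RC

T, RC = 1.0, 20.0                         # RC much larger than the window
tau = np.linspace(0.0, T, 1001)
w = track_weight(tau, T, RC)
flatness = w.max() / w.min()              # exp(T/RC) ~ 1.05: nearly boxcar
```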
Figure 4.5 recaps our discussion with a side-by-side comparison of three basic
switch arrangements. In (a), regular switches are paired with series resistors, which
are included for a fair comparison with the other two configurations. The series re-
sistor improves linearity by decreasing the contribution of nonlinear switch resistance
as well as lowering the input signal level. It also decreases the loading of the ADC
on the DAC driver which is important in this application. To improve the linearity
of the switch, it can be bootstrapped to achieve a signal independent resistance [37].
In the current application, the input signal Vin is the voltage across off-chip loads
and is centered around 2 V. This complicates the design of the bootstrap switch because
it requires a high-voltage charge pump. Also, to avoid breakdown, the switch itself
has to be implemented with high-voltage devices, which make worse switches than the
core transistors.
Figure 4.5(b) shows an RC-sampler. The signal feedthrough is proportional to
r/R, where r is the resistance of the shorting switches and R is the value of the pas-
sive resistor. As mentioned before, in order to decrease this ratio, the resistor and/or
Figure 4.6: Signal feedthrough in an RC and a Wheatstone sampler, as well as the switch parasitic capacitance, shown as a function of the switch width W. The length of the transistors in the switch is set to 300 nm, the total signal path resistance is R = 10 kΩ, and Cs = 500 fF. The resistance r in the Wheatstone sampler is chosen to minimize the signal feedthrough. It can be seen that the signal feedthrough in the Wheatstone sampler is about 3 orders of magnitude smaller than that of the RC sampler.
the switches must be made larger, and this will increase the nonlinear parasitic ca-
pacitance along the signal path. Figure 4.5(c) illustrates the proposed Wheatstone
sampler. The total resistance along the signal path during the track phase is again
equal to R. This resistance is broken into two parts: a first resistor equal to R − r,
which isolates the DAC and the ADC, and a second resistor equal to r, which
approximately matches the switch resistances r1 and r2 and results in minimum feedthrough.
The feedthrough can be found to be

Vft/Vin = r1/(R + r1) − R/(R + r2)    (4.1)

Assuming r1 = R(1 − ε1) and r2 = R(1 + ε2), we have

Vft/Vin ≈ (ε2 − ε1) / (4 + 2(ε2 − ε1))    (4.2)
The switch resistances r1 and r2 are voltage dependent, so the feedthrough cannot
be made exactly zero. However, as we can see from (4.2), the feedthrough depends
on the difference of the errors in r1 and r2. Therefore, it can be made small by
making the switch resistances have a symmetric profile around the input common
mode voltage. Furthermore, unlike the RC sampling configuration, there is no need
for a large resistor or large switches. The switches must be designed to match the
resistor and this results in much smaller switches than in the RC sampler.
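A quick numerical check of (4.1) and the small-error approximation (4.2), using the bridge-divider form given above (the resistance and mismatch values below are arbitrary):

```python
def feedthrough_exact(R, r1, r2):
    """Vft / Vin for the hold-mode bridge, per (4.1)."""
    return r1 / (R + r1) - R / (R + r2)

def feedthrough_approx(eps1, eps2):
    """First-order expression (4.2) with r1 = R(1-eps1), r2 = R(1+eps2)."""
    return (eps2 - eps1) / (4.0 + 2.0 * (eps2 - eps1))

R = 10e3
eps1, eps2 = 0.05, 0.03                   # a few percent of switch mismatch
exact = feedthrough_exact(R, R * (1 - eps1), R * (1 + eps2))
approx = feedthrough_approx(eps1, eps2)
balanced = feedthrough_exact(R, R, R)     # perfectly matched bridge: zero
```

As (4.2) suggests, only the difference of the two mismatch terms matters, so even several percent of absolute switch error leaves the feedthrough below one percent of the input.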
Figure 4.6 compares the signal feedthrough in the hold phase for an RC sampler
and a Wheatstone sampler. The sampler circuits are implemented as shown in Fig-
ures 4.5(b) and (c). The total track phase resistance in each case is set to R=10 KΩ,
the sampling capacitance is Cs = 500 fF, and the input signal is 1 Vppd. The switches
are implemented with an anti-parallel pair of I/O PMOS devices with a length of
300 nm. The signal feedthrough is plotted as a function of the width of the PMOS
transistors, W. For equal values of tracking resistance and identical switches, the
RC sampler and the Wheatstone sampler achieve comparable linearity performance.
However, as seen in the figure, the feedthrough in a Wheatstone sampler is
approximately three orders of magnitude smaller than that of an RC sampler. For a
fixed value of R, the feedthrough in an RC sampler can be made smaller by increas-
ing W , which in turn, adds to the parasitic capacitance along the path. The signal
Figure 4.7: RC switch implementation. (a) Two Wheatstone samplers are cascaded to further reduce the signal feedthrough; (b) all of the switches are implemented using anti-parallel thick-oxide PMOS devices.
feedthrough drops as 1/W whereas the parasitic capacitance grows proportional to
W . Therefore, considering the numbers in the figure, in order to make the feedthrough
of the RC sampler as small as the feedthrough of the Wheatstone sampler, we have
to use very wide transistors with significant parasitic capacitances.
Another requirement of the sampling network is that it must shield the ADC core
from the large input voltage levels. In this design, the input common mode voltage
is stored on the sampling capacitors. As a result, it can change from 1.5 V to 2
V without affecting the sampling linearity. A schematic of the Wheatstone sampler
circuit implemented in the ADC is shown in Figure 4.7. It consists of a cascade of
two Wheatstone samplers to further reduce the signal feedthrough. The switches in
the sampling network are designed with thick-oxide I/O devices with an aspect ratio
of 2 µm/240 nm. The isolating resistor in the sampling network equals 15 kΩ, which
is much larger than the DAC output resistors in order to reduce loading. Also, r =
1 kΩ and Cs = 400 fF. All resistors are implemented using high-resistance poly (HR
poly), which exhibits tight variation.
Two replica sampling networks are used in parallel with the main sampling net-
work in order to regulate the ADC sampling current. During every DAC pulse, either
one of the replicas or the main sampling network is connected to the DAC output;
therefore, any glitch due to sampling is pushed out of the DAC bandwidth to its
Figure 4.8: Basic two-phase operation of the ADC. (a) End of previous phase 2; (b) phase 1: addition of ±Vref/2; (c) phase 2: multiplication; (d) end of phase 2.
sampling frequency. Figure 5.2 in Chapter 5 illustrates a schematic of the sampling
circuit with the replica networks.
4.1.2 ADC Core
The ADC core is a variant of the cyclic architecture similar to the design in [38]. All
of the core operations are achieved through different configurations of one OTA and
six pairs of capacitors Cs, Cf , and C1 to C4. The conversion starts with sampling
the input signal on the sampling capacitor Cs (track phase in Figure 4.4). Then the
voltage is amplified and transferred to a capacitor C1 (hold phase in Figure 4.4).
This amplification compensates for the signal attenuation through the switched-RC
network and reduces the input-referred noise of successive cycles. Next, the ADC
goes back and forth between two phases until it resolves the required number of bits.
In the first phase, as shown in Figure 4.8(b), the voltage on C1 is added or subtracted
from the reference voltage and transferred onto capacitors C3 and C4. In the second
phase (Figure 4.8(c)), the voltage is multiplied by two by combining the charges on
Figure 4.9: Charge injection from the bottom-plate switch.
C3 and C4. The resulting voltage is transferred back onto C1 and makes the new
residue voltage. During phase 1 the OTA drives C3 and C4 and during phase 2 it
drives C1 and the comparator. So even though it is possible to combine the reference
addition and the multiplication by two in one phase, separating them helps balance the
OTA load.
Since the noise requirement for the ADC is relaxed, we can use small capacitors
in the design to lower the power. However, small capacitors are more susceptible to
charge injection errors. As a result, certain design aspects become more important
to mitigate these errors and maintain high linearity. We will next describe the two
most important of these design aspects.
Switch Implementations
Apart from the sampling switch that has direct impact on the linearity of the sampled
voltage, other switches also need to be carefully designed to maintain linear processing
of the signal. In this circuit, the bottom plate switches are implemented with anti-
parallel NMOS transistors and the top plate switches are transmission gates. Several
considerations must be taken into account in sizing these switches. First, their resis-
tances must be small enough so that their associated time constants do not slow the
settling performance of the circuit. Also, the switch resistances must be examined to
avoid phase margin degradation. Another important caveat concerns the
charge injection from the bottom-plate switches, illustrated in Figure 4.9. It is known
that the channel charge from the bottom plate switches is signal independent and
so bottom plate sampling is immune to nonlinear charge injection. However, when
a bottom plate switch opens, the splitting of the channel charge between its source
and drain is a function of the resistance seen from these terminals. Therefore, the
charge that will be injected onto the bottom plate of the sampling capacitor depends
on the resistance of the top plate switch and consequently on the signal level. When
the circuit processes a large differential voltage, the different signal levels in the two
halves of the circuit can cause non-equal resistances in top-plate switches and result
in mismatched charge injection from bottom-plate switches. In order to reduce the
effect of this error charge, the top-plate switches must be designed to have a balanced
resistance profile around the common-mode voltage. A residual error still remains
due to imperfect symmetry and mismatch between switches in the two halves of the
circuit, which was verified to be negligible through simulations.
ADC Phase Transitions
A cyclic ADC operates through the rearrangement of a few circuit components to
accomplish the operations of an extended pipeline converter. As a result, the
connectivity and the switching patterns are more complicated in a cyclic ADC than in a
pipeline. This can be seen from Figure 4.10, which illustrates a single-ended diagram
of the MDAC along with the clock signals. It can be seen that the ADC requires
many more clock phases than, e.g., a pipeline which operates using a pair of non-
overlapping clock signals. It is important, then, to carefully arrange all clock edges
in order to preserve the sampled charge while transitioning between different phases.
Figure 4.11 illustrates an example of incorrect phase transitions. Panels (a) to
(d) show the charge distributions during phase 1 and phase 2 operation, and panels
(e) and (f) show stages of the transition between phases 1 and 2. The end of phase 1
operation is shown in (c). When transitioning to phase 2, the first step is to open the
bottom-plate sampling switch, as is the case in pipeline ADCs, which in this case
is the switch to the left of C4. This state is shown in (e). There are still eleven more
Figure 4.10: More detailed diagram of the ADC MDAC. (a) Finite state machine representing phases of operation. (b) ADC clocking signals. (c) Single-ended MDAC.
(a) Beginning of phase 1. (b) End of phase 1. (c) Beginning of new phase 2. (d) End of phase 2. (e) Transition between phase 1 and 2. (f) Transition between phase 1 and 2.
Figure 4.11: Phase 1 operation duplicates the residue from C1 in (a) onto C3 and C4
in (b). In phase 2, C3 and C4 charges add up (c) and construct new residue on C1
(d). In the transition between phases 1 and 2, the bottom-plate switch is opened first
(e). If the top-plate switches are not opened right after that, switching glitches can
get transferred to the bottom-plate and cause charge leakage (f).
switches that must change state before we reach phase 2. We must open the switches
that connect C3 and C4 to the output of the amplifier, close the switches on the two
sides of C2, disconnect C1 from VCM and connect it to the output, and also switch
the summing node of the amplifier. Interestingly, out of the many possible switching
sequences, only a few are safe. For example, Figure 4.11(f) shows a failing scenario in
which the switch to the left of C2 is closed next. This could result in a negative
voltage excursion on the left plate of C2 that gets coupled to the node between C3
and C4. A negative impulse on that node could partially turn on the bottom-plate
switch and lead to charge leakage.
4.1.3 Foreground Calibration
The output of the cyclic ADC is related to the output bits through the following
relationship:
$$V_{out} = V_{ref}\sum_i g^{-i} d_i + \varepsilon + V_{os} \qquad (4.3)$$
where g is the stage radix, ε is the unresolved residue and Vos is the offset of the
ADC, and they all contain random errors due to circuit mismatches and non-idealities.
However, it is only the stage radix that governs the linearity of the converter and it
can be found through a foreground calibration [38].
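As a sketch of how a relationship like (4.3) is used in practice, the snippet below reconstructs a voltage from per-stage decisions using a calibrated radix; the radix value and bit pattern are hypothetical illustrations, not taken from the prototype.

```python
def decode_cyclic(bits, g, v_ref=1.0):
    """Reconstruct the sampled voltage from stage decisions d_i using the
    calibrated stage radix g, per Eq. (4.3) with residue and offset ignored."""
    return v_ref * sum(d * g ** -(i + 1) for i, d in enumerate(bits))

# Hypothetical example: a radix slightly below 2, as in a redundant design
print(decode_cyclic([1, -1, 1, 1, -1, 1, -1, -1], g=1.93))
```

With an ideal radix of exactly 2 this reduces to ordinary binary weighting; the foreground calibration finds the actual, mismatch-affected value of g.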
4.1.4 OTA Structure
The OTA is implemented as a two-stage Miller compensated amplifier with nested
gain boosting, as illustrated in Figure 4.12(a). The first stage of the top-level OTA as
well as all the nested OTAs are implemented as folded cascode amplifiers. A schematic
of a nested OTA is shown in Figure 4.12(b).
The boosted output impedance will have a pole-zero doublet at the unity gain
frequency of each auxiliary amplifier. To avoid slow settling, these doublets must be
pushed beyond the unity gain frequency of the main amplifier, hence making “fast
doublets” [39]. The feedback factor of the auxiliary amplifiers is near unity, while that
of the main amplifier is usually a fraction of unity. The above condition therefore does not
Figure 4.12: Top-level and booster amplifier schematics (VDD = 1 V). (a) Top-level two-stage gain-boosted OTA. (b) Booster amplifier.
require the auxiliary amplifiers to be faster than the main OTA. The transistors in the
auxiliary amplifiers have smaller widths and non-minimum channel lengths and are biased
at low currents, thus providing high gain. The main stage, on the other hand, has fast
transistors. This separation of gain and speed requirements allows the use of minimum
channel length transistors at the input.
The nested OTAs use active common-mode feedback sensing to preserve their
gains, whereas simple resistive averaging was enough to sense the output common-
mode at the second stage of the top-level OTA. The overall OTA achieves a gain of
more than 95 dB and a unity gain frequency of 130 MHz.
4.2 Noise Analysis
In this section, we review the noise analysis of the ADC. First we consider the noise
contribution of the amplifiers. Due to the complexity of the arithmetic involved, there are
multiple recipes for the derivation of noise, each with a different degree of accuracy.
We briefly review these methods to present a unified view. Next, we revisit the
sampling noise, keeping an eye on the differences between the proposed sampling and
the regular sampling processes. We finally combine the results to get the total noise
of the converter.
4.2.1 Review of Amplifier Noise Analysis
The total output noise of an amplifier in a sampled system is obtained by integrating
its output noise power across all frequencies from zero to infinity. The output noise
power is found by multiplying each noise source in the circuit by the magnitude
square of its transfer function to the output of the amplifier and then adding up all
the results.
$$N_{out,tot} = \sum_i \int_0^\infty S_i(f)\,|H_i(f)|^2\,df \qquad (4.4)$$
where Si is the one-sided noise power spectral density (PSD) of the ith component
and Hi(f) is its corresponding transfer function. Here, we will only consider thermal
noise of transistors and resistors because these are the dominant sources of electronic
Figure 4.13: Small-signal model of the OTA with varying details. (a) Simplified model of the OTA with all noise sources along the signal path. (b) Simplified model of the OTA with dominant noise sources.
Figure 4.14: Typical pole-zero plots of amplifiers. (a) Miller-compensated amplifier. (b) Cascode-compensated amplifier.
noise in the circuit. The channel noise of transistors can be modeled as a current
source between the drain and source terminals and its one-sided PSD in active region
is approximately given by:
$$S_{MOS}(f) = 4k_B T \gamma g_m \qquad (4.5)$$
where kB is the Boltzmann constant, kB = 1.38 × 10−23 J/K, T is the absolute
temperature in kelvin, γ is a coefficient equal to 2/3 for long-channel devices
[40], and gm is the transconductance of the transistor. The thermal noise of a resistor
R can be modeled by a voltage source with the following PSD:
$$S_R(f) = 4k_B T R \qquad (4.6)$$
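As a quick numerical check of (4.5) and (4.6), the snippet below evaluates both PSDs at room temperature; the gm and R values are arbitrary illustrations.

```python
K_B = 1.38e-23  # Boltzmann constant, J/K

def psd_mos(gm, T=300.0, gamma=2.0 / 3.0):
    """One-sided channel thermal noise PSD of a MOS device, Eq. (4.5), in A^2/Hz."""
    return 4.0 * K_B * T * gamma * gm

def psd_res(R, T=300.0):
    """One-sided thermal noise PSD of a resistor, Eq. (4.6), in V^2/Hz."""
    return 4.0 * K_B * T * R

print(psd_mos(gm=1e-3))  # about 1.1e-23 A^2/Hz
print(psd_res(R=1e3))    # about 1.66e-17 V^2/Hz
```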
Deriving the noise transfer function (NTF) for each noise source in a circuit and
then carrying out the noise integrals can be done accurately, even with symbolic
expressions. However, including every single device in the calculations can lead to very
“high-entropy” results that provide little design intuition. Depending on the desired
accuracy, for example whether it is a back-of-the-envelope design or part of an
automated circuit optimization program, we can make assumptions to simplify the
calculations.
A first order approximation of this integral can be obtained by approximating the
filter response with a box and simply multiplying the noise PSD by an equivalent
noise bandwidth [41].
$$N_{out,tot} = \sum_i \int_0^\infty \overline{v_{n,out,i}^2}\,df = \overline{v_{n0}^2}\cdot B_n \qquad (4.7)$$
where $\overline{v_{n0}^2}$ is the total low-frequency output-referred noise PSD of the circuit and $B_n$ is the
equivalent noise bandwidth, chosen such that the above equation holds. The main
contributors to the low-frequency noise are the non-cascode transistors of the first
stage of the amplifier. The contribution of the second stage can be ignored because its
devices do not see the gain of the first stage. The cascode transistors also have negligible
low-frequency gain, so their noise contribution is negligible.
In order to determine the noise bandwidth, we consider the typical pole-zero plots
of amplifier closed loop responses as shown in Figure 4.14. The first picture shows the
case where all poles are real as in a Miller-compensated amplifier and the second one
shows an amplifier with a real and a pair of complex conjugate poles. Although these
pole-zero plots look crowded, first-order approximations can be worked out to obtain
the noise bandwidth [42] [43]. For example, in the first case a first-order approxima-
tion would be to ignore all the poles except the dominant one and approximate the
amplifier with a one-pole system, in which case we have Bn = π/2 · p1.However, the single pole model is quite often a very crude model of the amplifier.
Usually amplifiers are designed for a phase margin of 60 to 70—as opposed to 90—
for best settling response. This means that the higher order poles already begin to kick
in around the unity gain bandwidth of the amplifier. Therefore, a better estimation is
obtained if the effect of these higher order pole are also considered in noise bandwidth
calculation. A small signal model of the amplifier considering the effect of the second
pole is shown in Figure 4.13(b). It is seen that the noise generating elements are now
the input transistors of the first and second stage and the nulling resistor, and they
each see a different transfer function to the output. In fact, due to the dominant pole
at node v1, the noise of M1 is more severely filtered compared to M2 and the resistor,
and so the previous argument about ignoring the noise of second stage components
based on DC gain analysis is inaccurate. The noise integrals are more involved in
this case but the math can nevertheless be worked out. Assuming R = 1/gm2 and
neglecting C1 compared to Cc and CL, we can obtain the following results [44].
$$N_{OTA,single\text{-}ended} = N_1 + N_2 + N_R = \frac{1}{\beta}\,\gamma\,\frac{k_B T}{C_c} + \frac{k_B T}{C_{L,tot}}(\gamma + 1) \qquad (4.8)$$
Comparing Figures 4.13(a) and 4.13(b), we can see that the input cascode device and
the gain boosting device are still absent from our noise equations. The noise of these
devices can only appear at the output at high frequencies, close to the cascode pole related
to node vx. However, while designing the amplifier, we made sure to push out the
cascode pole far away from the first two dominant poles in order to preserve phase
margin. Also, both the cascode as well as the booster device are located in the first
stage where they are influenced by the dominant poles of the loop gain. Therefore, at
frequencies where the noise of these devices starts to leak out, the signal transfer from
v1 to the output has already vanished and the noise is shunted to ground. It is thus
reasonable to ignore the noise of these two devices. Next, we include the
noise of active load devices M11, M12, M13 in the first stage and M21 in the second
stage and double the noise power to account for the other half of the circuit (see
Figure 4.12(a)). Thus we get
$$N_{OTA} = \frac{2}{\beta}\frac{k_B T\gamma}{C_c}\left(1 + \frac{g_{m11} + g_{m12}}{g_{m1}}\right) + \frac{2k_B T}{C_{L,tot}}\left(\gamma\left(1 + \frac{g_{m21}}{g_{m2}}\right) + 1\right) \qquad (4.9)$$
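A sketch evaluating the OTA noise expression of (4.9) numerically; all device values below are hypothetical placeholders, not the sizing of this design.

```python
K_B, T = 1.38e-23, 300.0  # Boltzmann constant and absolute temperature

def ota_noise(beta, Cc, CL_tot, gm1, gm11, gm12, gm2, gm21, gamma=2.0 / 3.0):
    """Total OTA noise power per Eq. (4.9), covering both circuit halves, in V^2."""
    first_stage = (2.0 / beta) * (K_B * T * gamma / Cc) * (1.0 + (gm11 + gm12) / gm1)
    second_stage = (2.0 * K_B * T / CL_tot) * (gamma * (1.0 + gm21 / gm2) + 1.0)
    return first_stage + second_stage

# Hypothetical sizing: load gm's at a fraction of the input-pair gm
n = ota_noise(beta=0.5, Cc=1e-12, CL_tot=2e-12,
              gm1=1e-3, gm11=0.25e-3, gm12=0.25e-3, gm2=2e-3, gm21=1e-3)
print(n)  # about 2.5e-8 V^2, i.e. roughly 0.16 mV rms
```

Note how the ratios of the load transconductances to the input transconductances directly scale the noise, which is why load devices are typically degenerated or biased for low gm.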
4.2.2 Notes on Sampling Noise
In this section, we consider the noise that is collected in the sampling process. The
present sampling method exhibits non-stationary noise and incomplete settling, which
makes it different from common sampling with stationary noise, so it is worth
revisiting the result.
Figure 4.15(a) shows a transient waveform of a sampled signal with stationary noise.
The signal exhibits noisy perturbations during track phases and it is frozen in time
during hold phases. At the beginning of every track phase, the signal starts off from
the value at which it was previously held. If we eliminate the hold phases and stitch the
track phase waveforms together we will get a continuous signal whose statistics do
not change over time, and therefore is stationary. The result is no different from the
output of an RC branch that operates in continuous time and has no switches. The
output noise power in this case can be found by integrating the filtered output noise
PSD, as shown in Figure 4.16(a).
Since the input PSD is directly proportional to R and the circuit bandwidth is
inversely proportional to it, the resistance value drops out and we get the familiar
Figure 4.15: T/H transient noise.
Figure 4.16: Derivation of kT/C result. (a) Frequency domain analysis of a continuous RC branch. (b) Time domain analysis of switched RC branch.
kT/C result. This also does not depend on the length of the tracking phase and
whether the settling is incomplete or not. If the track phase is short compared to the
time constant of the branch, successive noise samples will be correlated. However, as
we take more and more samples they will be spread across longer time frames and
their statistics will approach those of the continuous time process.
The waveforms of a non-stationary noise process are shown in Figure 4.15(c),
which also applies to the Wheatstone sampler used in this work. In this case, the
periods with large signal perturbations denote the track phases and the periods with
quieter variations are hold phases when the sampling capacitor is being reset. The
branch resistance during the track phase is much larger than the resistance of the
resetting switches so the noise variance is larger in track phase compared to hold
phase. From the noise eye diagram in Figure 4.15(d) we see that the noise is not
stationary, which is in contrast with the eye diagram in Figure 4.15(b) that does not
show any time dependence. In this case, the usual frequency domain analysis does
not apply. Instead, the noise power can be found through time-domain analysis of
the noise process’ autocorrelation function [45][46].
$$R_{yy}(t_1, t_2) = h(t_1) * R_{xx}(t_1, t_2) * h(t_2) \qquad (4.10)$$
where Rxx and Ryy are the autocorrelation of the input and output noise, and h(t)
is the impulse response of the RC filter. The variance of the output noise can
be found from its autocorrelation as follows:
$$\sigma_y^2(t) = R_{yy}(t, t) \qquad (4.11)$$
Since the input noise is white, its autocorrelation is a Dirac delta function:
$$R_{xx}(t_1, t_2) = \frac{S_{x0}}{2}\,\delta(t_1 - t_2) \qquad (4.12)$$
where Sx0 = 4kBTR is the one sided white noise PSD of the resistor. Combining
these equations, we can carry out the convolution and find the output noise power as
follows:
$$\sigma_y^2(t) = \frac{S_{x0}}{2}\int_0^{T_s} h^2(\tau)\,d\tau = 2k_B T R\cdot\frac{1}{2\tau}\left(1 - e^{-2T_s/\tau}\right) = \frac{k_B T}{C_s}\left(1 - e^{-2T_s/\tau}\right) \qquad (4.13)$$
Therefore, we see that the sampled noise grows exponentially until it reaches the final
value of kBT/Cs. Another noise source that must be included in this result is the
noise that is stuck on the capacitor from the reset phase. Since the time constant of
the branch is much shorter in this phase, the noise variance reaches the steady value
of kBT/Cs. During the following track phase, this voltage decays exponentially and
its final value at the end of the track phase will be
$$\overline{v_{n,reset}^2} = \frac{k_B T}{C_s}\,e^{-2T_s/\tau} \qquad (4.14)$$
Combining the contributions of the two noise sources, we get the total sampled
noise as follows:
$$\overline{v_n^2} = \frac{k_B T}{C_s} \qquad (4.15)$$
Once again, we arrive at the pervasive kBT/Cs expression. The noise being integrated
increases during the sampling window while the noise power stored on the capacitor
decreases, both exponentially and with the same time constant, so the total noise stays
the same, as shown in Figure 4.16(b). Interestingly, the way the branch resistance falls
out of the equation in this time-domain analysis is reminiscent of how the noise power
of an RC branch is independent of the resistor value in the frequency-domain analysis.
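The cancellation can be verified numerically: the growing track-phase noise of (4.13) and the decaying reset-phase remnant of (4.14) always sum to kT/Cs. The branch values below are arbitrary illustrations.

```python
import math

K_B, T = 1.38e-23, 300.0  # Boltzmann constant and absolute temperature

def sampled_noise(Cs, R, Ts):
    """Track-phase noise (4.13) plus the decayed reset-phase noise (4.14), V^2."""
    tau = R * Cs
    track = (K_B * T / Cs) * (1.0 - math.exp(-2.0 * Ts / tau))
    reset = (K_B * T / Cs) * math.exp(-2.0 * Ts / tau)
    return track + reset

Cs = 200e-15  # hypothetical sampling capacitor
for Ts in (0.1e-9, 1.25e-9, 10e-9):  # total equals kT/Cs regardless of Ts/tau
    print(sampled_noise(Cs, R=1e3, Ts=Ts), K_B * T / Cs)
```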
4.2.3 Track-and-hold Noise Analysis
The noise contribution at the output of a track-and-hold stage consists of the noise
that is generated during track mode and gets transferred to the output, as well as
the hold mode noise. We already calculated the noise contribution of the sampling
capacitor. But the track mode noise also includes contributions from the other capac-
itances that are connected to the amplifier virtual ground node, such as the feedback
capacitor Cf and the parasitic capacitance of that node Cpar. It can be shown that
the total output referred noise of the track mode is [47]
$$N_{track} = 2k_B T\,\frac{C_s + C_f + C_{par}}{C_f^2} \qquad (4.16)$$
The noise added to output during hold-mode operation primarily consists of the
noise of the amplifier as well as the thermal noise of switches:
$$N_{hold} = N_{OTA} + N_{switches} \qquad (4.17)$$
The amplifier noise follows from (4.9). The switches' noise contributions are found
by transferring their noise PSDs to the output and integrating them across all
frequencies. Noise voltages corresponding to switches that are in series with the
sampling capacitor see a gain of (Cs/Cf )2 while others that are in the feedback path
or are connected to the load see unity gain. We assume a second order filter response
with a resonant frequency ω0 and quality factor Q. Therefore, the noise contribution
of switches can be obtained from
$$N_{switches} = 4k_B T\left(\sum R_s\left(\frac{C_s}{C_f}\right)^2 + \sum R_f + \sum R_L\right)\frac{\omega_0 Q}{4} \qquad (4.18)$$
where the summations are carried out over all switch resistances in series with the
sampling, feedback, and load capacitors in the complete differential circuit. The
values of ω0 and Q are related to the loop gain unity gain frequency ωc and phase
margin PM as follows:
$$\omega_0 = \omega_c\,\sqrt[4]{1 + \tan^2(PM)}, \qquad Q = \frac{\sqrt[4]{1 + \tan^2(PM)}}{\tan(PM)} \qquad (4.19)$$
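A sketch combining (4.18) and (4.19); the crossover frequency, phase margin, switch resistances, and capacitor values below are hypothetical.

```python
import math

K_B, T = 1.38e-23, 300.0

def second_order_params(wc, pm_deg):
    """Closed-loop omega_0 and Q from the loop-gain crossover wc (rad/s)
    and phase margin in degrees, per Eq. (4.19)."""
    t = math.tan(math.radians(pm_deg))
    root = (1.0 + t * t) ** 0.25
    return wc * root, root / t

def switch_noise(Rs_sum, Rf_sum, RL_sum, Cs, Cf, w0, Q):
    """Switch thermal noise contribution per Eq. (4.18), in V^2."""
    return 4.0 * K_B * T * (Rs_sum * (Cs / Cf) ** 2 + Rf_sum + RL_sum) * w0 * Q / 4.0

w0, Q = second_order_params(wc=2 * math.pi * 130e6, pm_deg=65.0)
print(Q)  # about 0.72 for a 65-degree phase margin
print(switch_noise(Rs_sum=400.0, Rf_sum=200.0, RL_sum=100.0,
                   Cs=1e-12, Cf=0.5e-12, w0=w0, Q=Q))
```

The factor ω0·Q/4 plays the role of an equivalent noise bandwidth (in rad/s) for the second-order closed-loop response.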
4.2.4 ADC Noise Analysis
In order to calculate the total converter noise, we follow the signal path to see what
noise sources are added to the signal. First, the input signal is sampled through the
RC sampling network and accrues a noise given by (4.16). Then, in the hold phase, the
sampled voltage is transferred from Cs onto C1, while being amplified by Cs/Cf
and acquiring a noise of NSH. Next, the two-phase ADC operation starts and repeats
until the desired number of bits are resolved. We denote the noise contributions in
phase 1 and 2 by N1 and N2, respectively. Each time the signal goes through the
phase 2 operation, it is amplified by a factor of 2. The total input referred noise of
the cyclic ADC is therefore
$$N_{ADC} = N_{track} + \frac{N_{SH} + \left(N_1 + \dfrac{N_2}{4}\right)\left(1 + \dfrac{1}{4} + \dfrac{1}{16} + \cdots\right)}{\left(C_s/C_f\right)^2} \qquad (4.20)$$
where the noise components NSH, N1, and N2 follow from the hold-mode noise equation
(4.17), the only difference being the capacitive network in which the amplifier is
embedded.
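As a sketch, the total from (4.20) can be tallied as below, using the closed form 4/3 for the geometric series 1 + 1/4 + 1/16 + ···. All noise powers and the Cs/Cf ratio are hypothetical, and it is assumed here that the (Cs/Cf)² factor input-refers the post-sampling contributions.

```python
def adc_input_noise(N_track, N_SH, N1, N2, cs_over_cf):
    """Total input-referred cyclic ADC noise, a sketch of Eq. (4.20).
    The geometric series 1 + 1/4 + 1/16 + ... sums to 4/3."""
    series = 4.0 / 3.0
    return N_track + (N_SH + (N1 + N2 / 4.0) * series) / cs_over_cf ** 2

# Hypothetical noise powers in V^2
print(adc_input_noise(N_track=2e-8, N_SH=1e-8, N1=1e-8, N2=1e-8, cs_over_cf=2.0))
```

The bounded series reflects the fact that each pass through the phase-2 gain of 2 attenuates later noise contributions by 4 when referred back to the input.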
4.3 Summary
We covered some of the important and unique aspects of the design of the calibra-
tion ADC. Input sampling is accomplished by the Wheatstone sampling technique
presented in this chapter. Due to the small sizes of capacitors, care was taken to
minimize charge injection errors from switches considering second order effects that
are usually insignificant in regular designs. Appropriately sequencing clock edges be-
tween various phases of the ADC is another important aspect in this design that was
pointed out. Finally, the noise analysis of the complete converter was reviewed.
Chapter 5
Experimental Results
A prototype chip was taped out as a proof of concept for the proposed architecture.
The chip was fabricated in UMC's 90-nm, 9-metal, 1-poly CMOS process. The
die micrograph is shown in Figure 5.1(a). The die has a total area of 2 mm × 2
mm and was packaged in an 88-pin QFN package. The active area of the DAC is
0.36 mm2 and that of the ADC is only 0.04 mm2. The chip contains three slightly
different DAC-ADC pairs for testing purposes. The layout of the ADC is shown in
Figure 5.1(b).
5.1 Test Setup
The chip I/O interface is shown in Figure 5.2. The interface is passive and bidirec-
tional and was used both for capturing the data out of the DAC as well as testing the
ADC. Both single-ended as well as differential I/O options are included on the test
board. For the purpose of testing the ADC, the input signal is generated from an
HP8644B low-jitter, high-performance RF signal generator. It is then filtered through
a K&L tunable bandpass filter to block the spurious tones before connecting to the
test board. A cascade of two ETC-1-1-13 baluns are used on the board for single-
ended to differential conversion with small amplitude imbalance. The 50 Ω resistors
are the off-chip loads of the DAC. The inductors in parallel with the loads are used to
set the input common-mode voltage when testing the ADC. The combination of the
CHAPTER 5. EXPERIMENTAL RESULTS 71
Figure 5.1: Die photo of the prototype chip and the ADC layout. (a) Chip micrograph. (b) ADC layout. The chip also includes two other DAC-ADC pairs (not outlined in the photo) for debugging options.
Figure 5.2: Chip I/O interface. The Wheatstone samplers are simplified in this schematic.
2 pF capacitor and 22 Ω resistors provide an additional stage of filtering right at the
border of the chip. The value of the capacitor is adjusted depending on the frequency
of the test input to the ADC. This capacitor is removed when the DAC is under test
with a broadband signal. As shown in the picture, the bias of the cascode transistors
in the DAC output driver can be set off-chip. During regular DAC operation, they
are biased at 2 V. However, when testing the ADC they are turned off so that the
(nonlinear) output impedance of the DAC driver does not load the ADC. Two repli-
cas of the ADC sampling network are placed in parallel with the ADC. The clocking
of these replicas is programmed such that at any time one branch is configured in
sample mode and two branches are in hold mode. Therefore, the loading on the DAC
driver is regularized across time and there is similar sampling current drawn from the
DAC driver during every output pulse.
5.2 Measurement Results
5.2.1 ADC Measurements
Due to capacitor mismatches, the accurate closed-loop gain of the residue amplifier
in the ADC is not known a priori and must be obtained through measurements. So
before using the ADC to calibrate the DAC, we must perform a foreground calibration
on the ADC and find the residue gain. The right gain is the one that maximizes the
ADC linearity, and can be found by testing either the static or dynamic linearity of
the ADC. In this work, we chose to measure the dynamic linearity by applying a
sinusoid to the ADC and finding the residue gain that maximizes the output SFDR.
This gain is used in all other measurements afterwards.
ADC Static Nonlinearity
Figure 5.3 shows the measured static nonlinearity of the ADC. The input is a low
frequency signal at about 50 MHz and the integral and differential nonlinearities (INL
and DNL) are found through a histogram test. The ADC achieves a DNL below 0.62
LSB and an INL below 2.2 LSB.
Figure 5.3: ADC measured static nonlinearity. Peak DNL = +0.61/−0.52 LSB; peak INL = +2.20/−2.17 LSB.
ADC Dynamic Nonlinearity
Figures 5.4 and 5.5(a) show single-tone tests of the ADC dynamic nonlinearity. The
input frequency was swept from 50 MHz to above 500 MHz. The ADC sampling frequency
is about 1 MHz; the ADC therefore downsamples the input signal. The SFDR rolls
off at higher frequencies, mainly due to the filtering of the sampling interface. This is
confirmed by the expected droop of the fundamental tone.
The bandwidth of the sampling interface is larger than that of an ideal
boxcar sampler, because the actual sampling interface is only an approximation
of one. It exhibits non-zero rise and fall times as well as a
drooping amplitude due to the finite RC time constant of the sampling branch. This
results in an effective sampling window smaller than the ideal value of Ts = 1.25 ns and
hence a larger bandwidth.
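This can be made quantitative with the sinc response of an ideal boxcar sampler; the snippet below checks the droop for the ideal window Ts = 1.25 ns (the -3 dB point of a boxcar of width Ts falls near 0.443/Ts ≈ 354 MHz, an assumption used here for illustration).

```python
import math

def boxcar_gain_db(f, Ts):
    """Magnitude response in dB of an ideal boxcar sampler with window Ts."""
    x = math.pi * f * Ts
    mag = 1.0 if x == 0.0 else abs(math.sin(x) / x)
    return 20.0 * math.log10(mag)

Ts = 1.25e-9  # ideal sampling window of this design
print(boxcar_gain_db(0.443 / Ts, Ts))  # close to -3 dB
print(boxcar_gain_db(500e6, Ts))       # about -6.5 dB of droop at 500 MHz
```

A shorter effective window shifts this entire response up in frequency, consistent with the measured bandwidth exceeding the ideal boxcar value.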
A shortcoming of the single-tone tests is that the observed linearity might be
optimistic as the harmonics of high frequency inputs can get blocked by the ADC
Figure 5.4: Measured SFDR/SNDR and magnitude of the fundamental tone of the ADC output versus input frequency.
itself. Therefore, a two-tone test must be done to cross-check the high-frequency
performance of the ADC. Figure 5.5(b) shows the output spectrum of a two-tone
test. The amplitude of the input signal is kept the same as in the single-tone tests
at 1 Vppd. It can be seen that the performance is in agreement with the results from
single-tone tests.
5.2.2 DAC Measurements
Figure 5.6 shows the main test setup during background calibration of the DAC. The
high-speed inputs for the DAC are generated using ADI’s Data Pattern Generator 2
(DPG2). The calibration ADC digitizes the DAC output with a downsampling ratio of
about 800. The ADC data is captured using an NI board. The calibration algorithm
runs in software and updates the LUTs in the computer. A new set of predistorted
codes are generated accordingly and sent to the DAC through the DPG2. This cycle
continuously repeats until the LUTs converge to their steady state values.
Figure 5.5: Measured frequency responses of the ADC. (a) Single-tone input with a frequency of 491 MHz. (b) Two-tone input with frequencies of 487 and 491 MHz. The input amplitude is 1 Vppd. The ADC sample rate is about 1 MHz.
Figure 5.6: Test setup for DAC calibration: HP8644B signal generator, ADI DPG2 pattern generator (800 MS/s), prototype chip on test board, NI PXI-6120 and PXI-6562 capture boards (1 MS/s), PC, and HP 8560E spectrum analyzer.
During calibration, the DAC input data is a random vector in order to provide
linearly independent equations. When the calibration converges, the DAC is tested
with predistorted sinusoid tones and the performance is evaluated with a spectrum
analyzer.
Single-tone output spectra of the DAC before and after calibration are shown
in Figure 5.7. The first plot shows the outputs due to a low-frequency input at
29 MHz. It can be seen that the linearity before calibration is about 30 dB,
limited by the third-order harmonic. The plot also shows tones at 800/3 ± 29 MHz, which
are due to the input tone mixing with the clock of the DAC cores at 800/3 MHz.
After calibration, the harmonics of the input as well as the mixing terms are pushed
down and the SFDR improves to about 60 dB. The second plot is a test with an
input frequency close to the Nyquist frequency. Here too, the linearity
improves by about 30 dB and the DAC achieves a final SFDR of 53 dB.
After calibration, the DAC was also tested using two-tone inputs to make sure that
the single-tone readings are in fact valid and no significant high-frequency harmonic
is being filtered out and missed. One of these tests is shown in Figure 5.8. The input signal
contains two tones, at frequencies of 370 and 376 MHz, both very close to the DAC
Nyquist frequency of 400 MHz. The resulting spectrum plot shows an SFDR of about
68 dB.
Figure 5.9 shows a plot of DAC SFDR before and after calibration versus fre-
quency. The DAC achieves low frequency SFDRs as high as 58.5 dB and an overall
linearity of better than 53 dB for input frequencies up to 400 MHz and peak-to-peak
differential output swing of 800 mV. The SFDR roll-off is gradual across frequencies,
which confirms that the DAC nonlinearity is, for the most part, frequency independent.
One factor that can explain this roll-off, though, is filtering prior to the
DAC output driver. Specifically, because of non-zero switch resistances there is a
finite bandwidth filter at the output of the SC cores [48]. This filter goes between the
predistorter and the DAC nonlinearity and so the nonlinearity can only be partially
cancelled. As shown in Appendix B, even if the predistorter has the exact inverse of
Figure 5.7: Measured output spectra of the DAC before and after calibration. (a) Input signal is 800 mVppd at 29 MHz. (b) Input signal is 800 mVppd at 373 MHz.
Figure 5.8: Measured output spectrum of the DAC before and after calibration. Input signal is an 800 mVppd signal containing two tones at 370 and 376 MHz.
the DAC driver characteristic, this filter will limit the achievable linearity to
$$HD_3 = \frac{3}{4}A^2\,\frac{c_3}{c_1}\left(\frac{f_{sig}}{f_{T/H}}\right)^2 \qquad (5.1)$$
where A is the signal amplitude, c3/c1 is the relative magnitude of the third-order
nonlinearity to the linear term, fsig is the signal frequency, and fT/H is the bandwidth
of the track-and-hold. Therefore, at high frequencies the simple model of static
nonlinearity becomes invalid and the SFDR starts to degrade.
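As a numerical sketch of (5.1), the snippet below evaluates the residual third harmonic versus signal frequency; the amplitude, c3/c1 ratio, and track-and-hold bandwidth are hypothetical placeholders, not measured values from this design.

```python
import math

def hd3_db(A, c3_over_c1, f_sig, f_th):
    """Residual HD3 after ideal predistortion, per Eq. (5.1), in dB."""
    hd3 = 0.75 * A ** 2 * c3_over_c1 * (f_sig / f_th) ** 2
    return 20.0 * math.log10(hd3)

# Hypothetical: 0.4 V amplitude, c3/c1 = 0.1, 1 GHz track-and-hold bandwidth
for f_sig in (100e6, 400e6):
    print(f_sig / 1e6, hd3_db(A=0.4, c3_over_c1=0.1, f_sig=f_sig, f_th=1e9))
# The residual harmonic rises 12 dB per doubling of the signal frequency
```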
5.3 Performance Summary
Figure 5.10 compares the performance achieved by the calibrated DAC discussed in
this thesis with some other published works. It can be seen that the design landscape
is dominated by current-steering DACs. The DAC discussed in this thesis is one of
the few based on an SC architecture, and achieves the best performance
Figure 5.9: Measured SFDR of the DAC before and after calibration versus input frequency.
in this category. Its performance is comparable to that of the best current-steering
DACs, which demonstrates the potential of this new architecture.
Figure 5.10: SFDR versus frequency of some published DACs [49]-[53] (current steering CMOS and switched-cap CMOS) in comparison with a data point from the prototype chip.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
Digital calibration techniques are increasingly used to push the performance of analog
integrated circuits (ICs). These techniques are especially suitable for mixed-signal
systems where one end of the signal chain is already digital and therefore accessible
to digital signal processing (DSP) functions. Examples include digital calibration of
data converters to correct the nonlinearity of op amps or digital predistortion of
power amplifiers where the aggregate nonlinearity of the transmitter chain can be
compensated in the baseband. Among data converter circuits, much focus has been
on the ADCs—especially pipeline—where the inherent redundancy in the architecture
and availability of the output in digital facilitates the implementation of calibration
algorithms. DAC design, on the other hand, comes in a much more limited variety
and offers very few calibration ideas.
Complementing the new DAC architecture proposed in [48], this work investigates some important ramifications of calibrating a DAC. Important considerations and trade-offs in the design of a sense ADC, as well as an adaptive background calibration algorithm, were discussed. The complete system was implemented and tested as a proof of concept. The results are comparable with those of more traditional designs. We hope that this will open doors to new ideas and architectures in the future.
6.2 Future Work
This work was primarily a step towards identifying and understanding the various
trade-offs involved in the design. Some of this understanding was gained along the
way, and in light of that, some of our design choices can be further improved. Also,
this work was meant to be a proof of concept. Therefore, there are additional steps
that must be taken to obtain an industry-standard product. These two considerations
are explained further below.
Optimization of the Design
As the first design of its kind, this prototype can be further tuned and optimized. Much of the remaining room for improvement stems from the exploratory nature of the design approach: the final calibration algorithm was developed in the lab after the chips were received from the foundry, so there was a disconnect between the design and optimization of the calibration algorithm and the design of the circuit blocks. Specifically, the adaptive calibration algorithm requires on the order of 10^6 measurements before it converges, which translates into a calibration time on the order of seconds; this might be too long for some applications. There are two ways to shorten the calibration time. One is to modify the design of the calibration ADC to increase its sampling frequency or improve its SNR. The second is to modify the DAC design and operate the output drivers at a lower efficiency. This would squeeze the operating range of the output differential pair and result in a more linear response, thereby decreasing the number of nonlinearity coefficients.
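The time scale quoted above follows from simple arithmetic: the measurement count divided by the effective measurement rate. The rate used in the sketch below is an assumed placeholder value, not the prototype's actual sense-ADC rate.

```python
# Calibration time = measurements needed / effective measurement rate.
# ~1e6 measurements at an assumed 1 MS/s effective rate -> ~1 s, which
# matches the "order of seconds" estimate above. The rate is illustrative.
n_measurements = 1e6
rate_hz = 1e6            # assumed effective sense-ADC measurement rate (Hz)
t_cal = n_measurements / rate_hz
print(f"calibration time = {t_cal:.1f} s")
```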
Hardware Implementation of the Calibration Algorithm
The proposed calibration algorithm was implemented in software. This gave us the flexibility to experiment with different algorithms, but it also introduced major complications in testing. The transport of data between the chip, the computer, and the DPG was the main bottleneck in real-time calibration. Furthermore, the lack of a handshaking signal from the DPG caused synchronization problems during real-time operation. The DPG could only play an array of input codes indefinitely, and the array could not be updated while the DPG was playing. Therefore, we had to stop the DPG, update its RAM, and start it again. Because there were three DAC cores in the chip and three corresponding LUTs in the software, at every update the DPG would start from a random LUT, out of sync with the cores. We were able to solve this problem by repeatedly interrupting the DPG operation until chance put everything in sync, but this resulted in a more complex and longer calibration. In a real application, the LUTs must be implemented in hardware, which avoids these issues altogether.
Appendix A
Calculation of Least Squares
Residual Error
Let’s assume that y(n) denotes the vector of measured outputs up until time n, and the sampled output at time n is y_n; the following relationship then captures the growth of the vector y(n) from its previous value:

$$ y(n) = \begin{bmatrix} y(n-1) \\ y_n \end{bmatrix} \in \mathbb{R}^{n} \qquad (A.1) $$

Similarly, D(n) denotes the data matrix at time n. It is constructed by appending a new row of input data d_n^T to the end of D(n-1):

$$ D(n) = \begin{bmatrix} D(n-1) \\ d_n^T \end{bmatrix} \in \mathbb{R}^{n \times m} \qquad (A.2) $$
As explained earlier and illustrated in Figure 3.6, the least-squares problem in (3.17) can be solved by applying a series of Givens rotations to the data matrix and output vector. At time n-1, we have

$$ G(n-1)\,\Lambda(n-1) \begin{bmatrix} D(n-1) & y(n-1) \end{bmatrix} = \begin{bmatrix} R(n-1) & p_1(n-1) \\ 0 & p_2(n-1) \end{bmatrix} \qquad (A.3) $$

where G(n-1) is the aggregate product of all the Givens rotations that triangularize the data matrix D(n-1), R(n-1) is an m-by-m upper triangular matrix, and p_1(n-1) and p_2(n-1) are m-by-1 and (n-m-1)-by-1 vectors, respectively. The matrix Λ(n-1) is a diagonal forgetting matrix that emphasizes the more recent entries of the LS equation:

$$ \Lambda(n-1) = \begin{bmatrix} \lambda^{n-1} & 0 & \cdots & 0 \\ 0 & \lambda^{n-2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \qquad (A.4) $$

where 0 < λ < 1 is the forgetting factor.
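The forgetting factor trades tracking speed against noise averaging: the geometric weights λ^k sum to 1/(1-λ), so the recursion effectively averages over roughly 1/(1-λ) recent samples. A one-line illustration, with arbitrarily chosen values of λ:

```python
# Effective observation window of the exponential weighting in (A.4):
# sum_{k>=0} lambda^k = 1/(1 - lambda), i.e. roughly that many recent
# samples carry most of the weight. Values of lambda chosen arbitrarily.
for lam in (0.99, 0.999, 0.9999):
    print(f"lambda = {lam}: effective window = {1.0/(1.0 - lam):.0f} samples")
```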
Now, using equations (A.1) and (A.2), we write an equation similar to (A.3), but corresponding to time n:

$$ G(n) \begin{bmatrix} \lambda R(n-1) & \lambda p_1(n-1) \\ 0 & \lambda p_2(n-1) \\ d_n^T & y_n \end{bmatrix} = \begin{bmatrix} R(n) & p_1(n) \\ 0 & p_2(n) \end{bmatrix} \qquad (A.5) $$
The error vector at time n is defined as

$$ e(n) = y(n) - D(n)\,h(n) \qquad (A.6) $$

A similar set of recursion equations can be written for the error vector:

$$ e(n) = \begin{bmatrix} e(n-1) \\ e_n \end{bmatrix} \qquad (A.7) $$

where

$$ e_n = y_n - d_n^T h(n) \qquad (A.8) $$
Now, we apply the Givens rotation matrix to the weighted error vector at time n-1, and use (A.3) to expand the result:

$$ G(n-1)\,\Lambda(n-1)\,e(n-1) = G(n-1)\,\Lambda(n-1)\bigl(y(n-1) - D(n-1)\,h(n-1)\bigr) = \begin{bmatrix} p_1(n-1) \\ p_2(n-1) \end{bmatrix} - \begin{bmatrix} R(n-1) \\ 0 \end{bmatrix} h(n-1) \qquad (A.9) $$
The same relationship at time n will be

$$ G(n) \begin{bmatrix} \lambda e(n-1) \\ e_n \end{bmatrix} = \begin{bmatrix} p_1(n) \\ p_2(n) \end{bmatrix} - \begin{bmatrix} R(n) \\ 0 \end{bmatrix} h(n) = \begin{bmatrix} 0 \\ p_2(n) \end{bmatrix} \qquad (A.10) $$

where the last equality holds because at time n we have p_1(n) = R(n)h(n).
Now let’s examine the Givens rotation matrix G(n). This matrix is the product of m Givens rotation matrices that zero out the last row of the data matrix, d_n^T:

$$ G(n) = G_1\, G_2 \cdots G_m, \qquad G_i = \begin{bmatrix} I_{i-1} & & & \\ & \cos\phi_i & & \sin\phi_i \\ & & I_{n-i-1} & \\ & -\sin\phi_i & & \cos\phi_i \end{bmatrix} \qquad (A.11) $$

where each factor G_i is an identity matrix except for the four rotation entries acting on row i and the last row. Examining the product of these matrices, we can see that the Givens rotation matrix has the following structure: its nonzero entries are confined to the top-left m-by-m block, the last row, and the last column, while the remaining diagonal entries are ones and all other entries are zero.
Multiplying both sides of (A.10) by G^T(n), we have

$$ \begin{bmatrix} \lambda e(n-1) \\ e_n \end{bmatrix} = G^T(n) \begin{bmatrix} 0 \\ p_2(n) \end{bmatrix} \qquad (A.12) $$
Now let’s focus on the row-column multiplication in (A.12) that gives e_n. The vector on the right side of (A.12) has m zeros on top, so multiplying it by the last row of G(n) filters out all but the corner element of G(n). Inspecting (A.11) again, we can see that the corner element is the product of all the cos φ’s. Therefore, we have

$$ e_n = G_{nn}\, p_n = \left( \prod_{i=1}^{m} \cos\phi_i \right) p_n \qquad (A.13) $$

where p_n denotes the last element of p_2(n). We thus obtain the LS residual error without explicitly deriving the solution. Algorithm 3 summarizes the steps to compute the error e_n. The inputs to the algorithm are the latest sample y_n and the corresponding data row d_n^T. Also available to the algorithm are the upper triangular matrix R(n) and the vector p_1(n).
Algorithm 3 RLS error calculation
Require: sampled output y_n and the corresponding row of the data matrix d_n^T
1: append d_n^T to the bottom of the upper-triangular matrix R ∈ R^{m×m}
2: append y_n to the bottom of the vector p ∈ R^{m×1}
3: set Π = 1
4: for i = 1 to m do   // calculate the rotation angle to zero out R_{m+1,i}
5:   r_x = λ R_{i,i}
6:   r_y = R_{m+1,i}
7:   sin φ = r_y / sqrt(r_x^2 + r_y^2)
8:   cos φ = r_x / sqrt(r_x^2 + r_y^2)
9:   for j = i to m do   // rotate elements of R
10:    [R_{i,j} ; R_{m+1,j}] = [cos φ, sin φ ; -sin φ, cos φ] [λ R_{i,j} ; R_{m+1,j}]
11:  end for
12:  rotate elements of p: [p_i ; p_{m+1}] = [cos φ, sin φ ; -sin φ, cos φ] [λ p_i ; p_{m+1}]
13:  build up the product of cos φ’s: Π = Π · cos φ
14: end for
15: return e = Π · p_{m+1}
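Algorithm 3 maps directly onto code. The sketch below is our own illustrative implementation, not the thesis software; it uses NumPy and folds the forgetting factor into R and p up front, which is equivalent to applying λ element by element in steps 5 to 12.

```python
import numpy as np

def rls_residual_update(R, p, d, y, lam):
    """One step of Algorithm 3: update the QR factors (R, p) with the new
    sample (d, y) via Givens rotations and return the a-posteriori residual
    e_n = y_n - d_n^T h(n), without solving for h(n) explicitly."""
    m = R.shape[0]
    R = lam * R.copy()            # forgetting factor folded into old factors
    p = lam * p.copy()
    row = np.array(d, dtype=float)
    y = float(y)
    prod_cos = 1.0                # running product of the cos(phi_i)'s
    for i in range(m):
        rx, ry = R[i, i], row[i]
        r = np.hypot(rx, ry)
        if r == 0.0:
            continue              # nothing to zero out in this column
        c, s = rx / r, ry / r
        # rotate row i of R against the appended data row (zeros out row[i])
        Ri, rowi = R[i, i:].copy(), row[i:].copy()
        R[i, i:] = c * Ri + s * rowi
        row[i:] = -s * Ri + c * rowi
        # apply the same rotation to the right-hand side
        p[i], y = c * p[i] + s * y, -s * p[i] + c * y
        prod_cos *= c
    return R, p, prod_cos * y     # (A.13): e_n = (prod cos phi_i) * p_n
```

With λ = 1 and noiseless, consistent data, the returned residual collapses to zero once R reaches full rank, and back-substitution on R h = p recovers the coefficient vector.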
Appendix B
Predistortion Error Due to Model
Inaccuracy
When modeling the DAC, we assumed that it consists of a static nonlinearity followed by a filter at the output, thereby arriving at a Hammerstein model. Of course, there is always some modeling inaccuracy at this level of abstraction. Specifically, there is some filtering before the DAC drivers, due to the finite bandwidth of the SC core track-and-hold, and therefore the assumption of a memoryless nonlinearity is not quite true. The system can be more accurately modeled using a power series with memory. In this section, we use the more general Volterra filter model to analyze the signal chain.

A model that includes the filtering effect prior to the DAC nonlinearity is shown in Figure B.1. In this figure, the sigmoid nonlinearity denoted by the function f(·) corresponds to the DAC driver nonlinearity. The filter h(t) models the filtering effect that was absent in our earlier modeling. The function g(·) denotes the predistorter curve, and it is assumed to have the inverse characteristic of f. The output filter of the Hammerstein model is not shown in this figure. If g and f were cascaded back-to-back, their nonlinearities would cancel out. However, in the presence of h, the predistortion is not perfect and there is some residual nonlinearity in the signal path.

Figure B.1: Alternative model of the DAC including a filter before the driver. (Block diagram: x → g(·) → h(t) → f(·) → y.)
Assuming a simple third-order nonlinearity for the DAC output driver, we have

$$ f(x) = x - \varepsilon x^3 \qquad (B.1) $$

If we ignore terms with higher powers of ε, the predistorter for this nonlinearity will be

$$ g(x) = x + \varepsilon x^3 \qquad (B.2) $$

We assume that the filter h has a first-order response with time constant T. Thus,

$$ h(t) = \frac{1}{T}\, e^{-t/T} \qquad (B.3) $$
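As a quick numeric sanity check on (B.1) and (B.2), the residual of the cascade f(g(x)) - x should scale as ε^2, confirming that g inverts f to first order; a minimal sketch:

```python
# Verify numerically that g in (B.2) inverts f in (B.1) to first order:
# the leftover error f(g(x)) - x is O(eps^2), so shrinking eps by 10x
# should shrink the error by roughly 100x.
def f(x, eps):
    return x - eps * x**3   # driver nonlinearity (B.1)

def g(x, eps):
    return x + eps * x**3   # first-order predistorter (B.2)

x = 0.8
r1 = abs(f(g(x, 1e-3), 1e-3) - x)
r2 = abs(f(g(x, 1e-4), 1e-4) - x)
print(r1 / r2)   # close to 100: the residual is second order in eps
```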
Now, ignoring powers of ε larger than one, we can obtain the output of the DAC as follows:

$$ \begin{aligned} y &= f\bigl(h * g(x)\bigr) \\ &= f\left( \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau \right) \\ &= \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau - \varepsilon \left( \int h(\tau)\bigl(x(t-\tau) + \varepsilon x^3(t-\tau)\bigr)\, d\tau \right)^{3} \\ &\approx \int h(\tau)\, x(t-\tau)\, d\tau + \varepsilon \left( \int h(\tau)\, x^3(t-\tau)\, d\tau - \left( \int h(\tau)\, x(t-\tau)\, d\tau \right)^{3} \right) \end{aligned} \qquad (B.4) $$
The last expression in (B.4) can be cast into the following form:

$$ y = \int h(\tau)\, x(t-\tau)\, d\tau + \varepsilon \left( \int h(\tau)\, x^3(t-\tau)\, d\tau - \iiint h(\tau_1)h(\tau_2)h(\tau_3)\, x(t-\tau_1)x(t-\tau_2)x(t-\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 \right) \qquad (B.5) $$
We recognize that the output is in the form of a third-order Volterra filter:

$$ y = \int h_1(\tau)\, x(t-\tau)\, d\tau + \iiint h_3(\tau_1,\tau_2,\tau_3)\, x(t-\tau_1)x(t-\tau_2)x(t-\tau_3)\, d\tau_1\, d\tau_2\, d\tau_3 \qquad (B.6) $$

Matching equations (B.4) and (B.6), we can find the symmetric Volterra kernels as follows:

$$ h_1(\tau) = h(\tau) = \frac{1}{T}\, e^{-\tau/T} \qquad (B.7) $$

$$ h_3(\tau_1,\tau_2,\tau_3) = \varepsilon \left( \sqrt[3]{h(\tau_1)h(\tau_2)h(\tau_3)}\; \delta(\tau-\tau_1, \tau-\tau_2, \tau-\tau_3) - h(\tau_1)h(\tau_2)h(\tau_3) \right) = \frac{\varepsilon}{T} \left( e^{-\frac{\tau_1+\tau_2+\tau_3}{3T}}\, \delta(\tau-\tau_1, \tau-\tau_2, \tau-\tau_3) - \frac{1}{T^2}\, e^{-\frac{\tau_1+\tau_2+\tau_3}{T}} \right) \qquad (B.8) $$
The corresponding frequency-domain kernels are

$$ H_1(s) = \frac{1}{1+sT} \qquad (B.9) $$

$$ H_3(s_1,s_2,s_3) = \varepsilon \left( \frac{1}{1+(s_1+s_2+s_3)T} - \frac{1}{(1+s_1 T)(1+s_2 T)(1+s_3 T)} \right) \qquad (B.10) $$
Having the Volterra kernels, we can find the response of the system to a single tone:

$$ \begin{aligned} x &= A\cos(\omega t) \\ y &= A\,|H_1(j\omega)| \cos\bigl(\omega t + \angle H_1(j\omega)\bigr) \\ &\quad + \frac{3A^3}{4}\,|H_3(j\omega, j\omega, -j\omega)| \cos\bigl(\omega t + \angle H_3(j\omega, j\omega, -j\omega)\bigr) \\ &\quad + \frac{A^3}{4}\,|H_3(j\omega, j\omega, j\omega)| \cos\bigl(3\omega t + \angle H_3(j\omega, j\omega, j\omega)\bigr) \end{aligned} $$

Assuming ε ≪ 1, the third-order distortion at the output will be given by [44]

$$ \mathrm{HD3} = \frac{A^2}{4}\, \frac{|H_3(j\omega, j\omega, j\omega)|}{|H_1(j\omega)|} = \frac{A^2 \varepsilon}{4}\, \frac{\left| \frac{1}{1+3j\omega T} - \frac{1}{(1+j\omega T)^3} \right|}{\left| \frac{1}{1+j\omega T} \right|} \qquad (B.11) $$
Next, we assume that the bandwidth of the filter is much larger than the signal frequency, so that ωT ≪ 1. Using a binomial expansion, we can further simplify the above result to get

$$ \begin{aligned} \mathrm{HD3} &\approx \frac{A^2 \varepsilon}{4} \left| \frac{1}{1+3j\omega T} - \frac{1}{(1+j\omega T)^3} \right| \\ &= \frac{A^2 \varepsilon}{4} \left| 1 - 3j\omega T + (3j\omega T)^2 - \bigl( 1 - 3j\omega T + 6(j\omega T)^2 \bigr) \right| \\ &= \frac{3}{4} A^2 \varepsilon (\omega T)^2 = \frac{3}{4} A^2 \varepsilon \left( \frac{f_{\mathrm{sig}}}{f_{T/H}} \right)^{2} \end{aligned} \qquad (B.12) $$
Bibliography
[1] G. Manganaro, S.-U. Kwak, and A. Bugeja, “A Dual 10-b 200 MS/s Pipeline D/A
Converter with DLL-Based Clock Synthesizer,” in Custom Integrated Circuits
Conference, 2003. Proceedings of the IEEE 2003, Sep 2003, pp. 429–432.
[2] K. Khanoyan, F. Behbahani, and A. Abidi, “A 10 b, 400 MS/s Glitch-Free
CMOS D/A Converter,” in VLSI Circuits, 1999. Digest of Technical Papers.
1999 Symposium on, 1999, pp. 73–76.
[3] M. Gustavsson, J. J. Wikner, and N. N. Tan, CMOS Data Converters for Com-
munications. Norwell, MA, USA: Kluwer Academic Publishers, 2000.
[4] F. Maloberti, Data Converters. Springer, 2007.
[5] B. Murmann and B. Boser, “A 12 bit 75 MS/s Pipelined ADC Using Open-Loop
Residue Amplification,” Solid-State Circuits, IEEE Journal of, Dec 2003.
[6] D.-L. Shen and T.-C. Lee, “A 6 bit 800 MS/s Pipelined A/D Converter with
Open-Loop Amplifiers,” Solid-State Circuits, IEEE Journal of, Feb 2007.
[7] A. Nazemi, C. Grace, L. Lewyn, B. Kobeissy, O. Agazzi, P. Voois, C. Abidin,
G. Eaton, M. Kargar, C. Marquez, S. Ramprasad, F. Bollo, V. Posse,
S. Wang, and G. Asmanis, “A 10.3 GS/s 6 bit (5.1 ENOB at Nyquist) Time-
Interleaved/Pipelined ADC Using Open-Loop Amplifiers and Digital Calibration
in 90 nm CMOS,” in VLSI Circuits, 2008 IEEE Symposium on, Jun 2008.
[8] S. Boumaiza, J. Li, M. Jaidane-Saidane, and F. Ghannouchi, “Adaptive Digi-
tal/RF Predistortion Using a Nonuniform LUT Indexing Function with Built-in
Dependence on the Amplifier Nonlinearity,” Microwave Theory and Techniques,
IEEE Transactions on, vol. 52, no. 12, pp. 2670–2677, Dec 2004.
[9] T. Wang and J. Ilow, “Compensation of Nonlinear Distortions with Memory Ef-
fects in OFDM Transmitters,” in Global Telecommunications Conference, 2004.
GLOBECOM ’04. IEEE, vol. 4, Nov.-3 Dec. 2004, pp. 2398–2403.
[10] M. Schetzen, The Volterra and Wiener Theories of Nonlinear Systems. R.E.
Krieger Publishing Company, 1980. [Online]. Available: http://books.google.
com/books?id=dMB3QwAACAAJ
[11] L. Ding, G. Zhou, D. Morgan, Z. Ma, J. Kenney, J. Kim, and C. Giardina, “A Robust Digital Baseband Predistorter Constructed Using Memory Polynomials,” Communications, IEEE Transactions on, vol. 52, no. 1, pp. 159–165, Jan 2004.
[12] F. Kuech and W. Kellermann, “Partitioned Block Frequency-Domain Adaptive
Second-Order Volterra Filter,” Signal Processing, IEEE Transactions on, vol. 53,
no. 2, pp. 564–575, Feb 2005.
[13] L. Ngia and J. Sjoberg, “Efficient Training of Neural Nets for Nonlinear Adaptive
Filtering Using a Recursive Levenberg-Marquardt Algorithm,” Signal Processing,
IEEE Transactions on, vol. 48, no. 7, pp. 1915–1927, Jul 2000.
[14] K. Shi, X. Ma, and G. Zhou, “Acoustic Echo Cancellation Using a Pseudo-
coherence Function in the Presence of Memoryless Nonlinearity,” Circuits and
Systems I: Regular Papers, IEEE Transactions on, vol. 55, no. 9, pp. 2639–2649,
Oct 2008.
[15] C. Daigle, “Switched-Capacitor DACs Using Open-Loop Output Drivers and
Digital Predistortion,” Ph.D. dissertation, Stanford University, Stanford, Aug.
2010. [Online]. Available: http://purl.stanford.edu/fv654tp9350
[16] K. Doris, QRD-RLS Adaptive Filtering. Dordrecht, The Netherlands: Springer,
2006.
[17] Y.-M. Zhu, “Generalized Sampling Theorem,” Circuits and Systems II: Analog
and Digital Signal Processing, IEEE Transactions on, vol. 39, no. 8, pp. 587–588,
Aug 1992.
[18] J. Tsimbinos and K. Lever, “Input Nyquist Sampling Suffices to Identify and
Compensate Nonlinear Systems,” Signal Processing, IEEE Transactions on,
vol. 46, no. 10, pp. 2833–2837, Oct 1998.
[19] J. Tsimbinos, “Identification and Compensation of Nonlinear Distortion,” Ph.D.
dissertation, Univ. of South Australia, The Levels, South Australia, Feb. 1995.
[20] W. Frank, “Sampling Requirements for Volterra System Identification,” Signal
Processing Letters, IEEE, vol. 3, no. 9, pp. 266–268, Sep 1996.
[21] R. Martin, “Volterra System Identification and Kramer’s Sampling Theorem,”
Signal Processing, IEEE Transactions on, vol. 47, no. 11, pp. 3152–3155, Nov
1999.
[22] P. Nikaeen, “Digital Compensation of Dynamic Acquisition Errors at the Front-
End of ADCs,” Ph.D. dissertation, Stanford University, Stanford, Sep. 2008.
[23] P. Wambacq and W. Sansen, Distortion Analysis of Analog Integrated Circuits.
Boston, MA: Kluwer, 1998.
[24] M. Hasler and G. Kubin, “Mixed-Domain System Representation Using Volterra
Series,” in Circuits and Systems, 2008. ISCAS 2008. IEEE International Sym-
posium on, May 2008, pp. 572–575.
[25] M. Schetzen, “Theory of pth-Order Inverses of Nonlinear Systems,” Circuits and
Systems, IEEE Transactions on, vol. 23, no. 5, pp. 285–291, May 1976.
[26] J. G. McWhirter, “Recursive Least-Squares Minimization Using a Systolic Ar-
ray,” vol. 431, no. 6, Jan 1983, pp. 105–112.
[27] J. Apolinario, QRD-RLS Adaptive Filtering, ser. Lecture Notes in Mathematics.
Springer, 2009. [Online]. Available: http://books.google.com/books?id=
w3TROT70zpYC
[28] T. G. Kolda, R. M. Lewis, and V. Torczon, “Optimization by Direct Search: New
Perspectives on Some Classical and Modern Methods,” SIAM Review, vol. 45,
pp. 385–482, 2003.
[29] V. Torczon, “On the Convergence of Pattern Search Algorithms,” SIAM Journal
on Optimization, vol. 7, pp. 1–25, 1993.
[30] J. A. Nelder and R. Mead, “A Simplex Method for Function Minimization,”
The Computer Journal, vol. 7, no. 4, pp. 308–313, Jan. 1965. [Online]. Available:
http://dx.doi.org/10.1093/comjnl/7.4.308
[31] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright, “Convergence Prop-
erties of the Nelder-Mead Simplex Method in Low Dimensions,” SIAM Journal
of Optimization, vol. 9, pp. 112–147, 1998.
[32] M. Baudin, “Nelder-Mead User’s Manual,” 2010.
[33] M. G. Kim, G. cho Ahn, and U. ku Moon, “An Improved Algorithmic ADC
Clocking Scheme,” IEEE Int. Symp. Circuits & Systems, pp. 589–592, 2004.
[34] M. Keskin, “A Novel Low-Voltage Switched-Capacitor Input Branch,” Circuits
and Systems II: Analog and Digital Signal Processing, IEEE Transactions on,
vol. 50, no. 6, pp. 315–317, Jun 2003.
[35] J. Li, G.-C. Ahn, D.-Y. Chang, and U.-K. Moon, “A 0.9 V 12 mW 5 MS/s
Algorithmic ADC with 77 dB SFDR,” Solid-State Circuits, IEEE Journal of,
vol. 40, no. 4, pp. 960–969, Apr 2005.
[36] G.-C. Ahn, M. G. Kim, P. Hanumolu, and U.-K. Moon, “A 1 V 10-b 30 MS/s
Switched-RC Pipelined ADC,” in Custom Integrated Circuits Conference, 2007.
CICC ’07. IEEE, Sep 2007, pp. 325–328.
[37] A. Abo and P. Gray, “A 1.5 V, 10 bit, 14.3 MS/s CMOS pipeline Analog-to-
Digital Converter,” Solid-State Circuits, IEEE Journal of, May 1999.
[38] O. Erdogan, P. Hurst, and S. Lewis, “A 12-b Digital-Background-Calibrated
Algorithmic ADC with −90 dB THD,” Solid-State Circuits, IEEE Journal of,
Dec 1999.
[39] K. Bult and G. Geelen, “A Fast-Settling CMOS Op Amp for SC Circuits with
90 dB DC Gain,” Solid-State Circuits, IEEE Journal of, vol. 25, no. 6, pp. 1379–
1384, Dec 1990.
[40] R. Jindal, “Compact Noise Models for MOSFETs,” Electron Devices, IEEE
Transactions on, vol. 53, no. 9, pp. 2051–2061, Sep 2006.
[41] B. Razavi, Design of Analog CMOS Integrated Circuits, 1st ed. New York, NY,
USA: McGraw-Hill, Inc., 2001.
[42] A. Abo, “Design for Reliability of Low-voltage, Switched-capacitor Circuits,”
Ph.D. dissertation, University of California, Berkeley, 1999.
[43] D. Cline, “Noise, Speed, and Power Trade-offs in Pipelined Analog to Digital
Converters,” Ph.D. dissertation, University of California, Berkeley.
[44] B. Murmann, EE214 Lecture Notes, Stanford University, 2012.
[45] ——, EE315A Lecture Notes, Stanford University, 2012.
[46] T. Sepke, P. Holloway, C. Sodini, and H.-S. Lee, “Noise Analysis for Comparator-
Based Circuits,” Circuits and Systems I: Regular Papers, IEEE Transactions on,
Mar 2009.
[47] B. Murmann, EE315B Lecture Notes, Stanford University, 2012.
[48] C. Daigle, A. Dastgheib, and B. Murmann, “A 12-bit 800-MS/s Switched-
Capacitor DAC with Open-Loop Output Driver and Digital Predistortion,” in
Solid State Circuits Conference (A-SSCC), 2010 IEEE Asian, Nov. 2010, pp.
1–4.
[49] W. Schofield, D. Mercer, and L. Onge, “A 16-b 400 MS/s DAC with < −80 dBc IMD to 300 MHz and < −160 dBm/Hz Noise Power Spectral Density,” in Solid-State Circuits Conference, 2003. Digest of Technical Papers. ISSCC. 2003 IEEE International, vol. 1, 2003, pp. 126–482.
[50] B. Schafferer and R. Adams, “A 3 V CMOS 400 mW 14-b 1.4 GS/s DAC for
Multi-Carrier Applications,” in Solid-State Circuits Conference, 2004. Digest of
Technical Papers. ISSCC. 2004 IEEE International, vol. 1, 2004, pp. 360–532.
[51] A. Van den Bosch, M. Borremans, M. S. J. Steyaert, and W. Sansen, “A 10
bit 1 GSample/s Nyquist Current-Steering CMOS D/A Converter,” Solid-State
Circuits, IEEE Journal of, vol. 36, no. 3, pp. 315–324, 2001.
[52] C.-H. Lin, F. van der Goes, J. Westra, J. Mulder, Y. Lin, E. Arslan, E. Ayranci,
X. Liu, and K. Bult, “A 12 bit 2.9 GS/s DAC with IM3 < −60 dBc Beyond 1
GHz in 65 nm CMOS,” Solid-State Circuits, IEEE Journal of, vol. 44, no. 12,
pp. 3285–3293, 2009.
[53] J. Savoj, A. Abbasfar, A. Amirkhany, M. Jeeradit, and B. Garlepp, “A 12 GS/s
Phase-Calibrated CMOS Digital-to-Analog Converter for Backplane Communi-
cations,” Solid-State Circuits, IEEE Journal of, vol. 43, no. 5, pp. 1207–1216,
2008.