CBMS-NSF REGIONAL CONFERENCE...

CBMS-NSF REGIONAL CONFERENCE SERIESIN APPLIED MATHEMATICS

A series of lectures on topics of current research interest in applied mathematics under the directionof the Conference Board of the Mathematical Sciences, supported by the National ScienceFoundation and published by SIAM.

GARRETT BIRKHOEE, The Numerical Solution of Elliptic EquationsD. V. LINDLEY, Bayesian Statistics, A ReviewR. S. VARGA, Functional Analysis and Approximation Theory in Numerical AnalysisR. R. B A H A D U R , Some Limit Theorems in StatisticsPATRICK BILEINGSEEY, Weak Convergence of Measures: Applications in ProbabilityJ. L. LIONS, Some Aspects of the Optimal Control of Distributed Parameter SystemsROGER PENROSE, Techniques of Differential Topology in RelativityHERMAN CHERNOFF, Sequential Analysis and Optimal DesignJ. DURBIN, Distribution Theory for Tests Based on the Sample Distribution FunctionSoi. I. RUBINOW, Mathematical Problems in the Biological SciencesP. D. LAX, Hyperbolic Systems of Conservation Laws and the Mathematical Theory

of Shock WavesI. J. SCHORNBERG, Cardinal Spline InterpolationIVAN SINGER, The Theory of Best Approximation and Functional AnalysisWERNER C. RHEINBOI.DT, Methods of Solving Systems of Nonlinear EquationsHANS F. WEINBERGER, Variational Methods for Eigenvalue ApproximationR. TYRRELL ROCKAEELLAR, Conjugate Duality and OptimizationSIR JAMES LIGHTHILL, Mathematical BiofluiddynamicsG E R A R D SALTON, Theory of IndexingCATHEEEN S. MORAWKTZ, Notes on Time Decay and Scattering for Some Hyperbolic ProblemsF. HOPPENSTEADT, Mathematical Theories of Populations: Demographics, Genetics

and EpidemicsR I C H A R D ASKEY, Orthogonal Polynomials and Special FunctionsL. E. PAYNE. Improperly Posed Problems in Partial Differential EquationsS. ROSEN, Lectures on the Measurement and Evaluation of the Performance

of Computing SystemsHERBERT B. KELLER, Numerical Solution of Two Point Boundary Value ProblemsJ. P. LASALLE, The Stability of Dynamical Systems - Z. ARTSTEIN, Appendix A: Limiting

Equations and Stability of Nonautonomous Ordinary Differential EquationsD. GOTTLIEB AND S. A. ORS/AG, Numerical Analysis of Spectral Methods: Theorv

and ApplicationsPETER J. HUBER. Robust Statistical ProceduresHERBERT SOLOMON, Geometric ProbabilityFRED S. ROBERTS, Graph Theory and Its Applications to Problems of SocietyJ L R I S HARTMANIS, Feasible Computations and Provable Complexity PropertiesZOIIAR M A N N A , Lectures on the Logic of Computer ProgrammingFI.IJS L. JOHNSON, Integer Programming: Facets, Subadditivity, and Duality for Group

and Semi-Group ProblemsSHMUI- . I . WINOGRAD, Arithmetic Complexity of Computations1. F. r. KINCMAN, Mathematics of Genetic DiversityMORION E. GURTIN, Topics in Finite ElasticityTHOMAS G. KURTZ, Approximation of Population Processes

(continued on inside back cover)

INGRID DAUBECHIESRutgers University andAT&T Bell Laboratories

Ten Lectures onWavelets

SOCIETY FOR INDUSTRIAL AND APPLIED MATHEMATICS

PHILADELPHIA, PENNSYLVANIA 1992

Copyright 1992 by the Society for Industrial and Applied Mathematics

All rights reserved. No part of this book may be reproduced, stored, ortransmitted in any manner without the written permission of the Publisher.For information, write the Society for Industrial and Applied Mathematics,3600 University City Science Center, Philadelphia, Pennsylvania 19104-2688.

Second Printing 1992Third Printing 1994Fourth Printing 1995Fifth Printing 1997Sixth Printing 1999Seventh Printing 2002Eighth Printing 2004

Library of Congress Cataloging-in-Publication Data

Daubechies, Ingrid.Ten lectures on wavelets / Ingrid Daubechies.

P. Cm, — (CBMS-WSF regional conforoncQ series in appliedmathematics ; 61)

Includes bibliographical references and index.ISBN 0-89871-274-21. Wavelets (Mathematics)—Congresses. I. Title. II. Series.

QA403.3.D38 1992515'.2433—dc20 92-13201

SIAM. is a registered trademark.

To my mother, who gave me the will to be independent.

To my father, who stimulated my interest in science.

iii

This page intentionally left blank

Contents

vii INTRODUCTION

xi PRELIMINARIES AND NOTATION

1 CHAPTER 1: The What, Why, and How of Wavelets

17 CHAPTER 2: The Continuous Wavelet Transform

53 CHAPTER 3: Discrete Wavelet Transforms: Frames

107 CHAPTER 4: Time-Frequency Density and Orthonormal Bases

129 CHAPTER 5: Orthonormal Bases of Wavelets and Multiresolution Analysis

167 CHAPTER 6: Orthonormal Bases of Compactly Supported Wavelets

215 CHAPTER 7: More About the Regularity of Compactly Supported Wavelets

251 CHAPTER 8: Symmetry for Compactly Supported Wavelet Bases

289 CHAPTER 9: Characterization of Functional Spaces by Means of Wavelets

313 CHAPTER 10: Generalizations and Tricks for Orthonormal Wavelet Bases

341 REFERENCES

353 SUBJECT INDEX

355 AUTHOR INDEX

V

Introduction

Wavelets are a relatively recent development in applied mathematics. Theirname itself was coined approximately a decade ago (Morlet, Arens, Fourgeau,and Giard (1982), Morlet (1983), Grossmann and Morlet (1984)); in the last tenyears interest in them has grown at an explosive rate. There are several rea-sons for their present success. On the one hand, the concept of wavelets can beviewed as a synthesis of ideas which originated during the last twenty or thirtyyears in engineering (subband coding), physics (coherent states, renormalizationgroup), and pure mathematics (study of Calderon-Zygmund operators). As aconsequence of these interdisciplinary origins, wavelets appeal to scientists andengineers of many different backgrounds. On the other hand, wavelets are a fairlysimple mathematical tool with a great variety of possible applications. Alreadythey have led to exciting applications in signal analysis (sound, images) (someearly references are Kronland-Martinet, Morlet and Grossmann (1987), Mallat(1989b), (1989c); more recent references are given later) and numerical analy-sis (fast algorithms for integral transforms in Beylkin, Coifman, and Rokhlin(1991)); many other applications are being studied. This wide applicability alsocontributes to the interest they generate.

This book contains ten lectures I delivered as the principal speaker at theCBMS conference on wavelets organized in June 1990 by the Mathematics De-partment at the University of Lowell, Massachusetts. According to the usualformat of the CBMS conferences, other speakers (G. Battle, G. Beylkin, C. Chui,A. Cohen, R. Coifman, K. Grochenig, J. Liandrat, S. Mallat, B. Torresani,and A. Willsky) provided lectures on their work related to wavelets. Moreover,three workshops were organized, on applications to physics and inverse problems(chaired by B. DeFacio), group theory and harmonic analysis (H. Feichtinger),and signal analysis (M. Vetterli). The audience consisted of researchers activein the field of wavelets as well as of mathematicians and other scientists andengineers who knew little about wavelets and hoped to learn more. This secondgroup constituted the largest part of the audience. I saw it as my task to providea tutorial on wavelets to this part of the audience, which would then be a solidgrounding for more recent work exposed by the other lecturers and myself. Con-sequently, about two thirds of my lectures consisted of "basic wavelet theory,"

vii

viii INTRODUCTION

the other third being devoted to more recent and unpublished work. This divi-sion is reflected in the present write-up as well. As a result, I believe that thisbook will be useful as an introduction to the subject, to be used either for indi-vidual reading, or for a seminar or graduate course. None of the other lecturesor workshop papers presented at the CBMS conference have been incorporatedhere. As a result, this presentation is biased more toward my own work than theCBMS conference was. In many instances I have included pointers to referencesfor further reading or a detailed exposition of particular applications, comple-menting the present text. Other books on wavelets published include Waveletsand Time Frequency Methods (Combes, Grossmann, and Tchamitchian (1987)),which contains the proceedings of the International Wavelet Conference held inMarseille, Prance, in December 1987, Ondelettes, by Y. Meyer (1990) (in French;English translation expected soon), which contains a mathematically more ex-panded treatment than the present lectures, with fewer forays into other fieldshowever, Les Ondelettes en 1989, edited by P. G. Lemarie (1990), a collection oftalks given at the Universite Paris XI in the spring of 1989, and A n Introductionto Wavelets, by C. K. Chui (1992b), an introduction from the approximationtheory viewpoint. The proceedings of the International Wavelet Conference inMay 1989, held again in Marseille, are due to come out soon (Meyer (1992)).Moreover, many of the other contributors to the CBMS conference, as well assome wavelet researchers who could not attend, were invited to write an essayon their wavelet work; the result is the essay collection Wavelets and their Ap-plications (Ruskai et al. (1992)), which can be considered a companion book tothis one. Another wavelet essay book is Wavelets: A Tutorial in Theory andApplications, edited by C. K. Chui (1992c); in addition, I know of several otherwavelet essay books in preparation (edited by J. Benedetto and M. Frazier, an-other by M. Barlaud), as well as a monograph by M. Holschneider; there wasa special wavelet issue of IEEE Trans. Inform. Theory in March of 1992; therewill be another one, later in 1992, of Constructive Approximation Theory, andone in 1993, of IEEE Trans. Sign. Proc. In addition, several recent books in-clude chapters on wavelets. Examples are Multirate Systems and Filter Banksby P. P. Vaidyanathan (1992) and Quantum Physics, Relativity and ComplexSpacetime: Towards a New Synthesis by G. Kaiser (1990). Readers interestedin the present lectures will find these books and special issues useful for manydetails and other aspects not fully presented here. It is moreover clear that thesubject is still developing rapidly.

This book more or less follows the path of my lectures: each of the ten chap-ters stands for one of the ten lectures, presented in the order in which theywere delivered. The first chapter presents a quick overview of different aspectsof the wavelet transform. It sketches the outlines of a big fresco; subsequentchapters then fill in more detail. Prom there on, we proceed to the continu-ous wavelet transform (Chapter 2; with a short review of bandlimited functionsand Shannon's theorem), to discrete but redundant wavelet transforms (frames;Chapter 3) and to a general discussion of time-frequency density and the possibleexistence of orthonormal bases (Chapter 4). Many of the results in Chapters 2-4 can be formulated for the windowed Fourier transform as well as the wavelet

INTRODUCTION ix

transform, and the two cases are presented in parallel, with analogies and differ-ences pointed out as we go along. The remaining chapters all focus on orthonor-mal bases of wavelets: multiresolution analysis and a first general strategy forthe construction of orthonormal wavelet bases (Chapter 5), orthonormal basesof compactly supported wavelets and their link to subband coding (Chapter 6),sharp regularity estimates for these wavelet bases (Chapter 7), symmetry forcompactly supported wavelet bases (Chapter 8). Chapter 9 shows that orthonor-mal bases are "good" bases for many functional spaces where Fourier methodsare not well adapted. This chapter is the most mathematical of the whole book;most of its material is not connected to the applications discussed in other chap-ters, so that it can be skipped by readers uninterested in this aspect of wavelettheory. I included it for several reasons: the kind of estimates used in the proofare very important for harmonic analysis, and similar (but more complicated)estimates in the proof of the "T(l)"-theorem of David and Journe have turnedout to be the groundwork for the applications to numerical analysis in the workof Beylkin, Coifman, and Rokhlin (1991). Moreover, the Calderon-Zygmundtheorem, explained in this chapter, illustrates how techniques using differentscales, one of the forerunners of wavelets, were used in harmonic analysis longbefore the advent of wavelets. Finally, Chapter 10 sketches several extensionsof the constructions of orthonormal wavelet bases: to more than one dimension,to dilation factors different from two (even noninteger), with the possibility ofbetter frequency localization, and to wavelet bases on a finite interval insteadof the whole line. Every chapter concludes with a section of numbered "Notes,"referred to in the text of the chapter by superscript numbers. These containadditional references, extra proofs excised to keep the text flowing, remarks, etc.

This book is a mathematics book: it states and proves many theorems. Italso presupposes some mathematical background. In particular, I assume thatthe reader is familiar with the basic properties of the Fourier transform andFourier series. I also use some basic theorems of measure and integration theory(Fatou's lemma, dominated convergence theorem, Fubini's theorem; these canbe found in any good book on real analysis). In some chapters, familiarity withbasic Hilbert space techniques is useful. A list of the basic notions and theoremsused in the book is given in the Preliminaries.

The reader who finds that he or she does not know all of these prerequisitesshould not be dismayed, however; most of the book can be followed with just thebasic notions of Fourier analysis. Moreover, I have tried to keep a very pedes-trian pace in almost all the proofs, at the risk of boring some mathematicallysophisticated readers. I hope therefore that these lecture notes will interest peo-ple other than mathematicians. For this reason I have often shied away fromthe "Definition-Lemma-Proposition-Theorem-Corollary" sequence, and I havetried to be intuitive in many places, even if this meant that the exposition be-came less succinct. I hope to succeed in sharing with my readers some of theexcitement that this interdisciplinary subject has brought into my scientific life.

I want to take this opportunity to express my gratitude to the many peoplewho made the Lowell conference happen: the CBMS board, and the MathematicsDepartment of the University of Lowell, in particular Professors G. Kaiser and

X INTRODUCTION

M. B. Ruskai. The success of the conference, which unexpectedly turned out tohave many more participants than customary for CBMS conferences, was due inlarge part to its very efficient organization. As experienced conference organizerI. M. James (1991) says, "every conference is mainly due to the efforts of asingle individual who does almost all the work"; for the 1990 Wavelet CBMSconference, this individual was Mary Beth Ruskai. I am especially grateful toher for proposing the conference in the first place, for organizing it in sucha way that I had a minimal paperwork load, while keeping me posted aboutall the developments, and for generally being the organizational backbone, nosmall task. Prior to the conference I had the opportunity to teach much of thismaterial as a graduate course in the Mathematics Department of the Universityof Michigan, in Ann Arbor. My one-term visit there was supported jointly bya Visiting Professorship for Women from the National Science Foundation, andby the University of Michigan. I would like to thank both institutions for theirsupport. I would also like to thank all the faculty and students who sat in onthe course, and who provided feedback and useful suggestions. The manuscriptwas typeset by Martina Sharp, who I thank for her patience and diligence, andfor doing a wonderful job. I wouldn't even have attempted to write this bookwithout her. I am grateful to Jeff Lagarias for editorial comments. Several peoplehelped me spot typos in the galley proofs, and I am grateful to all of them; Iwould like to thank especially Pascal Auscher, Gerry Kaiser, Ming-Jun Lai, andMartin Vetterli. All remaining mistakes are of course my responsiblity. I alsowould like to thank Jim Driscoll and Sharon Murrel for helping me prepare theauthor index. Finally, I want to thank my husband Robert Calderbank for beingextremely supportive and committed to our two-career-track with family, eventhough it occasionally means that he as well as I prove a few theorems less.

Ingrid DaubechiesRutgers University

andAT&T Bell Laboratories

In subsequent printings minor mistakes and many typographical errors havebeen corrected. I am grateful to everybody who helped me to spot them. I havealso updated a few things: some of the previously unpublished references haveappeared and some of the problems that were listed as open have been solved. Ihave made no attempt to include the many other interesting papers on waveletsthat have appeared since the first printing; in any case, the list of references wasnot and is still not meant as a complete bibliography of the subject.

Preliminaries and Notation

This preliminary chapter fixes notation conventions and normalizations. It alsostates some basic theorems that will be used later in the book. For those lessfamiliar with Hilbert and Banach spaces, it contains a very brief primer. (Thisprimer should be used mainly as a reference, to come back to in those instanceswhen the reader comes across some Hilbert or Banach space language that sheor he is unfamiliar with. For most chapters, these concepts are not used.)

Let us start by some notation conventions. For x 6 R, we write \_x\ for thelargest integer not exceeding x,

For example, |_3/2J = 1, [–3/2 = -2, [–2J = -2. Similarly, \x\ is the smallestinteger which is larger than or equal to x.

If a—»0 (or oo), then we denote by O(a) any quantity that is bounded by aconstant times a, by o(a) any quantity that tends to 0 (or oo) when a does.

The end of a proof is always marked with a •; for clarity, many remarks orexamples are ended with a a.

In many proofs, C denotes a "generic" constant, which need not havethe same value throughout the proof. In chains of inequalities, I often useC, C", C", • • • or Ci, 62, G's, • • • to avoid confusion.

We use the following convention for the Fourier transform (in one dimension):

With this normalization, one has

where

xi

xii PRELIMINARIES AND NOTATION

Inversion of the Fourier transform is then given by

with the notation /(€) = -£if.If a function / is compactly supported, i.e., f ( x ) = 0 if x < a or x > 6,

where —oo < a < b < oo, then its Fourier transform /(£) is well defined also forcomplex £, and

Strictly speaking, (0.0.1), (0.0.3) are well defined only if /, respectively Ff, areabsolutely integrable; for general Z/2-functions /, e.g., we should define Tj viaa limiting process (see also below). We will implicitly assume that the adequatelimiting process is used in all cases, and write, with a convenient abuse of no-tation, formulas similar to (0.0.1) and (0.0.3) even when a limiting process isunderstood.

A standard property of the Fourier transform is:

hence

If / is moreover infinitely differentiable, then the same argument can be appliedto fW, leading to bounds on \£\e |/(£)|. For a C°° function / with support [a, 6]there exist therefore constants CN so that the analytic extension of the Fouriertransform of / satisfies

Conversely, any entire function which satisfies bounds of the type (0.0.4) for allN € N is the analytic extension of the Fourier transform of a C°° function withsupport in [a, b]. This is the Paley-Wiener theorem.

We will occasionally encounter (tempered) distributions. These are linearmaps T from the set S(R) (consisting of all C°° functions that decay faster thanany negative power (1 + jxl)"^) to C, such that for all m,n € N, there existsCn,m for which

PRELIMINARIES AND NOTATION xiii

holds, for all / G <S(R). The set of all such distributions is called <S'(R). Anypolynomially bounded function F can be interpreted as a distribution, withF(f) = J dx F(x) /(re). Another example is the so-called "6-function" of Dirac,6(f) = /(O). A distribution T is said to be supported in [a, b] if T(f) = 0 forall functions / the support of which has empty intersection with [a, 6]. One candefine the Fourier transform FT or T of a distribution T by T(f) = T(f) (ifT is a function, then this coincides with our earlier definition). There exists aversion of the Paley-Wiener theorem for distributions: an entire function T(£)is the analytic extension of the Fourier transform of a distribution T in *S'(R)supported in [a, 6] if and only if, for some N € N, CN > 0,

The only measure we will use is Lebesgue measure, on R and Rn. We willoften denote the (Lebesgue) measure of S by |5|; in particular, | [a ,b]\ = b — a(where 6 > a).

Well-known theorems from measure and integration theory which we will useinclude

Fatou's lemma. // fn > 0, fn(x] —>• f(x) almost everywhere (i.e., the setof points where pointwise convergence fails has zero measure with respect toLebesgue measure), then

In particular, if this lim sup is finite, then f is integrable.

(The lim sup of a sequence is defined by

every sequence, even if it does not have a limit (such as an = (—l)n) , has alimsup (which may be oo); for sequences that converge to a limit, the limsupcoincides with the limit.)

Dominated convergence theorem. Suppose fn(x) —>• f(x) almost every-where. If \fn(x)\ < g(x) for all n, and f dx g(x) < oo, then f is integrable,and

Fubini's theorem. If f dx[f dy \f(x,y)\] < oo, then

xiv PRELIMINARIES AND NOTATION

i.e., the order of the integrations can be permuted.In these three theorems the domain of integration can be any measurable

subset of R (or R2 for Fubini).When Hilbert spaces are used, they are usually denoted by H, unless they

already have a name. We will follow the mathematician's convention and usescalar products which are linear in the first argument:

Here the integration runs from —oo to oo; we will often drop the integrationbounds when the integral runs over the whole real line.

Another example is l2(Z), the set of all square summable sequences of com-plex numbers indexed by integers, with

As usual, we have

where a denotes the complex conjugate of a, and (w, u) > 0 for all u 6 H. Wedefine the norm \\u\\ of u by

In a Hilbert space, \\u\\ = 0 implies u = 0, and all Cauchy sequences (withrespect to || ||) have limits within the space. (More explicitly, if un e H and if||wn — um\\ becomes arbitrarily small if n, m are large enough—i.e., for all e > 0,there exists no, depending on e, so that \\un — um\\ < e if n, m > HQ — , then thereexists u e Ji so that the un tend to u for n—>oo, i.e., limn-^ \\u — un\\ — 0.)

A standard example of such a Hilbert soace is L2(R), with

Again, we will often drop the limits on the summation index when we sumover all integers. Both L2(R) and ^2(Z) are infinite-dimensional Hilbert spaces.Even simpler are finite-dimensional Hilbert spaces, of which C is the standardexample, with the scalar product

foru = ( u i , - - - , W f c ) , v = (vi, • • • , « * ) eC fc.Hilbert spaces always have orthonormal bases, i.e., there exist families of

vectors en in 7i

and

PRELIMINARIES AND NOTATION xv

for all u € H. (We only consider separable Hilbert spaces, i.,3., spaces in whichorthonormal bases are countable.) Examples of orthonormal bases are the Her-mite functions in L2(R), the sequences en defined by (en}j = 6nj, with n,j 6 Zin ^2(Z) (i.e., all entries but the nth vanish), or the k vectors ei, • • • , efc in Cfc

defined by (e^)m = ^,m, with 1 < t, m < k. (We use Kronecker's symbol 6 withthe usual meaning: 6ij = 1 if i = j, 0 if i ^ j.)

A standard inequality in a Hilbert space is the Cauchy-Schwarz inequality,

Operators from H to C are called "linear functionals." For bounded linearfunctionals one has Riesz' representation theorem: for any i: ?i—>C, linear and

easily proved by writing (0.0.5) for appropriate linear combinations of v and w.In particular, for /, g e L2(R), we have

A consequence of (0.0.6) is

"Operators" on H are linear maps from H to another Hilbert space, often Hitself. Explicitly, if A is an operator on ?i, then

An operator is continuous if Au — Av can be made arbitrarily small by makingu — v small. Explicitly, for all e > 0 there should exist 6 (depending on e) sothat \\u — v\\ < 6 implies \\Au — Av\\ < e. If we take v = 0, e = 1, then wefind that, for some b > 0, \\Au\\ < 1 if \\u\\ < b. For any w e H we can define

w' = ^w- clearly \\w'\\ < b and therefore \\Aw\\ = ^ \\Aw'\\ < b-l\\w\\. If||Aw;||/||w;|| (w ^ 0) is bounded, then the operator A is called bounded. We havejust seen that any continuous operator is bounded; the reverse is also true. Thenorm \\A\\ of A is defined by

It immediately follows that, for all u € H.,

xvi PRELIMINARIES AND NOTATION

bounded, i.e., \£(u)\ < C\\u\\ for all u € H, there exists a unique vt G H so thati(u) = (u,vi).

An operator U from Hi to H-2 is an isometry if (Uv, C/u;) = (v, u;} for alli>,«; G HI] U is unitary if moreover UH\ = HI, i.e., every element v-2 € 7^2 canbe written as V2 = Uv\ for some v\ €.H\. If the en constitute an orthonormalbasis in HI, and U is unitary, then the Uen constitute an orthonormal basis in7^2- The reverse is also true: any operator that maps an orthonormal basis toanother orthonormal basis is unitary.

A set D is called dense in H if every u 6 H can be written as the limit ofsome sequence of un in D. (One then says that the closure of D is all of H. Theclosure of a set 5 is obtained by adding to it all the v that can be obtained aslimits of sequences in 5.) If Av is only defined for v £ D, but we know that

then we can extend A to all of H "by continuity." Explicitly: if u e H, findun € D so that limn_Kx> un — u. Then the un are necessarily a Cauchy sequence,and because of (0.0.9), so are the Aun; the Aun have therefore a limit, which wecall Au (it does not depend on the particular sequence un that was chosen).

One can also deal with unbounded operators, i.e., A for which there existsno finite C such that \\Au\\ < C\\u\\ holds for all u 6 H. It is a fact of life thatthese can usually only be defined on a dense set D in H, and cannot be extendedby the above trick (since they are not continuous). An example is ^ in L2(R),where we can take D = (70° (R), the set of all infinitely differentiable functionswith compact support, for D. The dense set on which the operator is defined iscalled its domain.

The adjoint A* of a bounded operator A from a Hilbert space HI to a Hilbertspace H2 (which may be HI itself) is the operator from H-2, to HI defined by

which should hold for all u\ e HI, u-i G H-}. (The existence of A* is guaranteedby Riesz' representation theorem: for fixed u-2, we can define a linear functionalt on Hi by £(ui) = (JAu1,u2)- It is clearly bounded, and corresponds thereforeto a vector v so that (HI, v) = t(u\). It is easy to check that the correspondenceu-2,—*v is linear; this defines the operator .A*.) One has

If A* = A (only possible if A maps H to itself), then A is called self-adjoint. Ifa self-adjoint operator A satisfies (Au, u) > 0 for all u £ H, then it is called apositive operator; this is often denoted A > 0. We will write A > B if A — B isa positive operator.

Trace-class operators are special operators such that ^n |(Aen,en)\ is finitefor all orthonormal bases in H. For such a trace-class operator, ^n (Aen, en) isindependent of the chosen orthonormal basis; we call this sum the trace of A,

PRELIMINARIES AND NOTATION xvii

If A is positive, then it is sufficient to check whether ]Tn (Aen,en) is finite foronly one orthonormal basis; if it is, then A is trace-class. (This is not true fornon-positive operators!)

The spectrum cr(A) of an operator A from H. to itself consists of all theA e C such that A — X Id (Id stands for the identity operator, Id u = u) doesnot have a bounded inverse. In a finite-dimensional Hilbert space, ff(A) consistsof the eigenvalues of A] in the infinite-dimensional case, a(A) contains all theeigenvalues (constituting the point spectrum) but often contains other A as well,constituting the continuous spectrum. (For instance, in L2(R), multiplication off ( x ) with sin-Trx has no point spectrum, but its continuous spectrum is [—1,1].)The spectrum of a self-adjoint operator consists of only real numbers; the spec-trum of a positive operator contains only non-negative numbers. The spectralradius p(A) is defined by

It has the properties

Self-adjoint operators can be diagonalized. This is easiest to understand iftheir spectrum consists only of eigenvalues (as is the case in finite dimensions).One then has

with a corresponding orthonormal family of eigenvectors,

It then follows that, for all u e H,

which is the "diagonalization" of A. (The spectral theorem permits us to gen-eralize this if part (or all) of the spectrum is continuous, but we will not need itin this book.) If two operators commute, i.e., ABu = BAu for all u e H, thenthey can be diagonalized simultaneously: there exists an orthonormal basis suchthat

Many of these properties for bounded operators can also be formulated for un-bounded operators: adjoints, spectrum, diagonalization all exist for unboundedoperators as well. One has to be very careful with domains, however. For in-stance, generalizing the simultaneous diagonalization of commuting operatorsrequires a careful definition of commuting operators: there exist pathologicalexamples where ^4, B are both defined on a domain D, where AB and BA bothmake sense on D and are equal on D, but where A and B nevertheless are not

xviii PRELIMINARIES AND NOTATION

simultaneously diagonalizable (because D was chosen "too small"; see, e.g., Reedand Simon (1971) for an example). The proper definition of commuting for un-bounded self-adjoint operators uses associated bounded operators: H\ and HZcommute if their associated unitary evolution operators commute. For a self-adjoint operator H, the associated unitary evolution operators Ut are defined asfollows: for any v € D, the domain of H (beware: the domain of a self-adjointoperator is not just any dense set on which H is well denned), UTV is the solutionv(t] at time t = T of the differential equation

It turns out that all bounded linear functionals on Lp are of this type, i.e.,(Lf)* = Lq. In particular, L2 is its own dual; by Riesz' representation theorem(see above), every Hilbert space is its own dual. The adjoint A* of an operatorA from EI to E2 is now an operator from E% to E{, defined by

with initial condition v(0) — v.Banach spaces share many properties with but are more general than Hilbert

spaces. They are linear spaces equipped with a norm (which need not be andgenerally is not derived from a scalar product), complete with respect to thatnorm (i.e., all Cauchy sequences converge; see above). Some of the conceptswe reviewed above for Hilbert spaces also exist in Banach spaces; e.g., boundedoperators, linear functionals, spectrum and spectral radius. An example of aBanach space that is not a Hilbert space is .^(R), the set of all functions / on Rsuch that \\f\\L*> (see (0.0.2)) is finite, with 1 < p < oo, p ^ 2. Another exampleis L°°(R), the set of all bounded functions on R, with ||/||L°° = SUPX€R l/(x)l-The dual E* of a Banach space E is the set of all bounded linear functionalson E; it is also a linear space, which comes with a natural norm (defined as hi(0.0.8)), with respect to which it is complete: E* is a Banach space itself. In thecase of the //-spaces, 1 < p < oo, it turns out that elements of L9, where p andq are related by p~l + q~l = 1, define bounded linear functionals on L?'. Indeed,one has Holder's inequality,

There exist different types of bases in Banach spaces. (We will again onlyconsider separable spaces, in which bases are countable.) The en consti-tute a Schauder basis if, for all v 6 E, there exist unique p,n 6 C so thatv = limjv^oo X)n=i MnCn (i-6., ||u - £n=1 V>nen\\^Q as JV-KX>). The uniquenessrequirement of the /xn forces the en to be linearly independent, in the sense thatno en can be in the closure of the linear span of all the others, i.e., there exist no7m so that en = limjv-KX) X^m=i m^n TmCm- In a Schauder basis, the orderingof the en may be important. A basis is called unconditional if in addition itsatisfies one of the following two equivalent properties:

for all u G H. If A is a bounded operator with a bounded inverse, then Amaps any orthonormal basis to a Riesz basis. Moreover, all Riesz bases can beobtained as such images of an orthonormal basis. In a way, Riesz bases are thenext best thing to an orthonormal basis. Note that the inequalities in (0.0.10)are not sufficient to guarantee that the en constitute a Riesz basis: the en alsoneed to be linearly independent!

For an unconditional basis, the order in which the basis vectors are taken doesnot matter. Not all Banach spaces have unconditional bases: L:(R) and L°°(]R)do not.

In a Hilbert space H, an unconditional basis is also called a Riesz basis. ARiesz basis can also be characterized by the following equivalent requirement:there exist a > 0, (3 < oo so that

PRELIMINARIES AND NOTATION xix

CHAPTER

The What, Why, and How ofWavelets

The wavelet transform is a tool that cuts up data or functions or operators intodifferent frequency components, and then studies each component with a resolu-tion matched to its scale. Forerunners of this technique were invented indepen-dently in pure mathematics (Calderon's resolution of the identity in harmonicanalysis—see e.g., Calderon (1964)), physics (coherent states for the (ax + &)-group in quantum mechanics, first constructed by Aslaksen and Klauder (1968),and linked to the hydrogen atom Hamiltonian by Paul (1985)) and engineering(QMF filters by Esteban and Galland (1977), and later QMF filters with exactreconstruction property by Smith and Barnwell (1986), Vetterli (1986) in elec-trical engineering; wavelets were proposed for the analysis of seismic data byJ. Morlet (1983)). The last five years have seen a synthesis between all thesedifferent approaches, which has been very fertile for all the fields concerned.

Let us stay for a moment within the signal analysis framework. (The dis-cussion can easily be translated to other fields.) The wavelet transform of asignal evolving in time (e.g., the amplitude of the pressure on an eardrum, foracoustical applications) depends on two variables: scale (or frequency) and time;wavelets provide a tool for time-frequency localization. The first section tells uswhat time-frequency localization means and why it is of interest. The remainingsections describe different types of wavelets.

1.1. Time-frequency localization.

In many applications, given a signal f ( t ) (for the moment, we assume thatt is a continuous variable), one is interested in its frequency content locally intime. This is similar to music notation, for example, which tells the player whichnotes (= frequency information) to play at any given moment. The standardFourier transform,

also gives a representation of the frequency content of /, but information con-cerning time-localization of, e.g., high frequency bursts cannot be read off easilyfrom Tj. Time-localization can be achieved by first windowing the signal /, so

1

1

2 CHAPTER 1

as to cut off only a well-localized slice of /, and then taking its Fourier transform:

This is the windowed Fourier transform, which is a standard technique for time-frequency localization.1 It is even more familiar to signal analysts in its discreteversion, where t and u; are assigned regularly spaced values: t = nt0, u = mujQ,where m, n range over Z, and u;o,^o > 0 are fixed. Then (1.1.1) becomes

This procedure is schematically represented in Figure 1.1: for fixed n, the^|n(/) correspond to the Fourier coefficients of f(-)g(- — nto). If, for instance,g is compactly supported, then it is clear that, with appropriately chosen WQ,the Fourier coefficients T^n(/) are sufficient to characterize and, if need be,to reconstruct /(•)<?(• — ntQ). Changing n amounts to shifting the "slices" bysteps of to and its multiples, allowing the recovery of all of / from the r™'£(/).(We will discuss this in more mathematical detail in Chapter 3.) Many possiblechoices have been proposed for the window function g in signal analysis, mostof which have compact support and reasonable smoothness. In physics, (1.1.1)is related to coherent state representations; the gWlt(s) = elujsg(s — t} are thecoherent states associated to the Weyl-Heisenberg group (see, e.g., Klauder andSkagerstam (1985)). In this context, a very popular choice is a Gaussian g. In allapplications, g is supposed to be well concentrated in both time and frequency; ifg and g are both concentrated around zero, then (Twm/)(u;, t) can be interpretedloosely as the "content" of / near time t and near frequency uj. The windowedFourier transform provides thus a description of / in the time-frequency plane.

FlG. 1.1. The windowed Fourier transform: the function f(t) is multiplied with the windowfunction g(t), and the Fourier coefficients of the product f(t)g(t) are computed; the procedureis then repeated for translated versions of the window, g(t — to), g(t — 2to), • • •-

THE WHAT, WHY, AND HOW OF WAVELETS 3

1.2. The wavelet transform: Analogies and differences with thewindowed Fourier transform.

The wavelet transform provides a similar time-frequency description, with afew important differences. The wavelet transform formulas analogous to (1.1.1)and (1.1.2) are

(for reasons explained in Chapters 2 and 3).Formula (1.2.2) is again obtained from (1.2.1) by restricting a, 6 to only dis-

crete values: a = a™, b = nboa™ in this case, with m, n ranging over Z, andao > 1, 60 > 0 fixed. One similarity between the wavelet and windowed Fouriertransforms is clear: both (1-1.1) and (1.2.1) take the inner products of / with afamily of functions indexed by two labels, gu;'t(s) = ezu}sg(s — t) in (1.1.1), andV>a'6(s) = H~1/2 V> (^r) in (1.2.1). The functions Va'6 are called "wavelets";the function ^ is sometimes called "mother wavelet." (Note that ^ and g areimplicitly assumed to be real, even though this is by no means essential; if theyare not, then complex conjugates have to be introduced in (1.1.1), (1.2.1).) Atypical choice for i[> is ifj(t) = (1 — t2)exp(—t2/2), the second derivative of theGaussian, sometimes called the mexican hat function because it resembles a crosssection of a Mexican hat. The mexican hat function is well localized in both timeand frequency, and satisfies (1.2.3). As a changes, the t/ja'°(s) = \a\~1^2i/j(s/a)cover different frequency ranges (large values of the scaling parameter |a| cor-respond to small frequencies, or large scale t/>a>0; small values of |a| correspondto high frequencies or very fine scale â>0). Changing the parameter b as wellallows us to move the time localization center: each ^°'b(s) is localized arounds = b. It follows that (1.2.1), like (1.1.1), provides a time-frequency descriptionof /. The difference between the wavelet and windowed Fourier transforms liesin the shapes of the analyzing functions gWft and â'6, as shown in Figure 1.2.The functions gwt all consist of the same envelope function <?, translated to theproper time location, and "filled in" with higher frequency oscillations. All thegu'*, regardless of the value of a;, have the same width. In contrast, the t/>a'6 havetime-widths adapted to their frequency: high frequency t/â'6 are very narrow,while low frequency â'6 are much broader. As a result, the wavelet transformis better able than the windowed Fourier transform to "zoom in" on very short-lived high frequency phenomena, such as transients in signals (or singularities

and

In both cases we assume that u satisfies

FlG. 1.2. Topical shapes of (a) windowed Fourier transform functions gu<t, and

(b) wavelets ^a>6. The g'^'t(x) = ell^xg(x — t) can be mewed as translated envelopes g, "filled

in" with higher frequencies; the ^a>b are all copies of the same functions, translated and com-

pressed or stretched.

For the example in Figure 1.3a, v\ = 500 Hz, 1/2 = 1 kHz, r = 1/8,000 sec (i.e.,we have 8,000 samples per second), a — 1.5, and n^ — n\ = 32 (correspondingto 4 milliseconds between the two pulses). The three spectrograms (graphs of

4 CHAPTER 1

in functions or integral kernels). This is illustrated by Figure 1.3, which showswindowed Fourier transforms and the wavelet transform of the same signal /defined by

In practice, this signal is not given by this continuous expression, but by samples,and adding a 6-function is then approximated by adding a constant to one sampleonly. In sampled version, we have then

THE WHAT, WHY, AND HOW OF WAVELETS

FIG. 1.3. (a) The signal /(£). (b) Windowed Fourier transforms of f with three different

window widths. These are so-called spectrograms: only |Twm(/)| is plotted (the phase is not

rendered on the graph), using grey levels (high values = black, zero — white, intermediate

grey levels are assigned proportional to log |Twm(/)|) in the t(abscissa), uj(ordinate) plane.(c) Wavelet transform of f. To make the comparison with (b) we have also plotted |Twav(/)|,with the same grey level method, and a linear frequency axis (i.e., the ordinate corresponds

to a"1), (d) Comparison of the frequency resolution between the three spectrograms and thewavelet transform. I would like to thank Oded Ghitza for generating this figure.

5

6 CHAPTER 1

the modulus of the windowed Fourier transform) in Figure 1.3b use standardHamming windows, with widths 12.8, 6.4, and 3.2 milliseconds, respectively.(Time t varies horizontally, frequency u> vertically, on these plots; the grey levelsindicate the value of \Twm(f)\, with black standing for the highest value.) Asthe window width increases, the resolution of the two pure tones gets better,but it becomes harder or even impossible to resolve the two pulses. Figure 1.3cshows the modulus of the wavelet transform of / computed by means of the(complex) Morlet wavelet V>(£) = Ce~t2/a2(eint - e"*2"2/4), with a = 4. (Tomake comparison with the spectrograms easier, a linear frequency axis has beenused here; for wavelet transforms, a logarithmic frequency axis is more usual.)One already sees that the two impulses are resolved even better than with the3.2 msec Hamming window (right in Figure 1.3b), while the frequency resolu-tion for the two pure tones is comparable with that obtained with the 6.4 msecHamming window (middle in Figure 1.3b). This comparison of frequency resolu-tions is illustrated more clearly by Figure 1.3d: here sections of the spectrograms(i.e., plots of |(Twm/)(-,t)| with fixed t) and of the wavelet transform modulusQ^j-wavy^.^j -wfth fixed b) are compared. The dynamic range (ratio betweenthe maxima and the "dip" between the two peaks) of the wavelet transform iscomparable to that of the 6.4 msec spectrogram. (Note that the flat horizontal"tail" for the wavelet transform in the graphs in Figure 1.3d is an artifact ofthe plotting package used, which set a rather high cut-off, as compared with thespectrogram plots; anyway, this cut-off is already at —24 dB.)

In fact, our ear uses a wavelet transform when analyzing sound, at least inthe very first stage. The pressure amplitude oscillations are transmitted fromthe eardrum to the basilar membrane, which extends over the whole length ofthe cochlea. The cochlea is rolled up as a spiral inside our inner ear; imagine itunrolled to a straight segment, so that the basilar membrane is also stretchedout. We can then introduce a coordinate y along this segment. Experiment andnumerical simulation show that a pressure wave which is a pure tone, f ^ ( t ) =el(4it, leads to a response excitation along the basilar membrane which has thesame frequency in time, but with an envelope in y, Fw,(t,y) = eta>t 4>w(y). In afirst approximation, which turns out to be pretty good for frequencies u> above500 Hz, the dependence on o> of </>w(y) corresponds to a shift by log u>: there existsone function (f> so that <t>w(y) is very close to 0(y—log w). For a general excitationfunction /, f ( t ) = -/== / dw f(u)elut, it follows that the response function F(t, y)is given by the corresponding superposition of "elementary response functions,"

If we now introduce a change of parameterization, by defining

THE WHAT, WHY, AND HOW OF WAVELETS

then it follows that

which (up to normalization) is exactly a wavelet transform. The dilation param-eter comes in, of course, because of the logarithmic shifts in frequency in the </>„,.The occurrence of the wavelet transform in the first stage of our own biologicalacoustical analysis suggests that wavelet-based methods for acoustical analysishave a better chance than other methods to lead, e.g., to compression schemesundetectable by our ear.

1.3. Different types of wavelet transform.

There exist many different types of wavelet transform, all starting from thebasic formulas (1.2.1), (1.2.2). In these notes we will distinguish between

A. The continuous wavelet transform (1.2.1), and

B. The discrete wavelet transform (1.2.2).

Within the discrete wavelet transform we distinguish further between

Bl. Redundant discrete systems (frames) and

B2. Orthonormal (and other) bases of wavelets.

1.3.1. The continuous wavelet transform. Here the dilation and trans-lation parameters a, 6 vary continuously over ]R (with the constraint a = 0). Thewavelet transform is given by formula (1.2.1); a function can be reconstructedfrom its wavelet transform by means of the "resolution of identity" formula

we assume Cu < oo (otherwise (1.3.1) does not make sense). If t/> is in LX(]R)(this is the case in all examples of practical interest), then t/> is continuous, sothat C^ can be finite only if ^(0) = 0, i.e., f dx if)(x) = 0. A proof for (1.3.1)will be given in Chapter 2. (Note that we have implicitly assumed that t/> is real;for complex ijj, we should use tp instead of ip in (1.2.1). In some applications,such complex $ are useful.)

Formula (1.3.1) can be viewed in two different ways: (1) as a way of re-constructing / once its wavelet transform Twav/ is known, or (2) as a way to

7

where ifra>b(x} = |a|~1/2 if> (^p), and ( , ) denotes the L2-inner product. Theconstant C^ depends only on T/> and is given by

8 CHAPTER 1

write / as a superposition of wavelets ^a>6; the coefficients in this superpositionare exactly given by the wavelet transform of /. Both points of view lead tointeresting applications.

The correspondence f ( x ) —> (Twav/)(a, 6) represents a one-variable functionby a function of two variables, into which lots of correlations are built in (seeChapter 2). This redundancy of the representation can be exploited; a beautifulapplication is the concept of the "skeleton" of a signal, extracted from the con-tinuous wavelet transform, which can be used for nonlinear filtering (see, e.g.,Torresani (1991), Delprat et al. (1992)).

1.3.2. The discrete but redundant wavelet transform-frames. In thiscase the dilation parameter a and the translation parameter both take onlydiscrete values. For a we choose the integer (positive and negative) powers ofone fixed dilation parameter ao > 1, i.e., a = a™. As already illustrated byFigure 1.2, different values of ra correspond to wavelets of different widths. Itfollows that the discretization of the translation parameter b should depend onm: narrow (high frequency) wavelets are translated by small steps in order tocover the whole time range, while wider (lower frequency) wavelets are translatedby larger steps. Since the width of t/;(a^mx) is proportional to a™, we choosetherefore to discretize b by b = nb0aoM, where 60 > 0 is fixed, and n e Z. Thecorresponding discretely labelled wavelets are therefore

Figure 1.4a shows schematically the lattice of time-frequency localization centerscorresponding to the tl>m,n- For a given function /, the inner products {/, ̂ m,n)then give exactly the discrete wavelet transform T^&^(f) as defined in (1.2.2)(we assume again that ?/> is real).

In the discrete case, there does not exist, in general, a "resolution of theidentity" formula analogous to (1.3.1) for the continuous case. Reconstructionof / from Twav(/), if at all possible, must therefore be done by some other means.The following questions naturally arise:

(1) Is it possible to characterize / completely by knowing TWAV(f)r?

(2) Is it possible to reconstruct / in a numerically stable way from ywav(J)?

These questions concern the recovery of / from its wavelet transform. We canalso consider the dual problem (see §1.3.1), the possibility of expanding / intowavelets, which then leads to the dual questions:

(!') Can any function be written as a superposition of ipm^

(2') Is there a numerically stable algorithm to compute the coefficients for suchan expansion?

Chapter 3 addresses these questions. As in the continuous case, these discretewavelet transforms often provide a very redundant description of the original

CBMS-NSF REGIONAL CONFERENCE...

Documents

Transcript of CBMS-NSF REGIONAL CONFERENCE...