Wavelets and Approximation Theory

30
Wavelets and Approximation Theory Bradley J. Lucier October 23, 2013 Revision: 1.37 These notes are incomplet and inkorrect (as an old computer documentation joke goes). Nonetheless, I’ve decided to distribute them in case they prove useful to someone. My goal is to present certain results that can be proved in a (relatively) straightforward way. So, for some problems we present complete proofs in L 2 and partial proofs (or none at all) for L p , p = 2. To simplify arguments even more, we consider only wavelet spaces of piecewise constant functions on uniform grids of size 2 k × 2 k on the unit square. As far as I know, all results can be extended to approximations of higher order. We don’t consider at all the area of “construction of wavelets,” which is a large topic in itself. I’ve presented some of these results in graduate “topics” courses at Purdue University. The notes have certain themes: generalized wavelets (results can be generalized to box splines, for example), nonlinearity (nonlinear approximations, nonlinear wavelet decompo- sitions, avoiding arguments that use linear functionals, etc.), non-Hilbert spaces (L p and p for p = 2), nonconvexity (L p and p for 0 <p< 1). My approach has been inspired by and is derived from the approach of Ron DeVore, who has made tremendous contributions to this field and with whom I collaborated for about a decade. Some parts of these notes are influenced directly by the survey paper “Wavelets,” by Ronald A. DeVore and Bradley J. Lucier, Acta Numerica 1 (1992), 1–56. Specific examples of nonlinear, piecewise constant, wavelet approximations appeared in “Image compression through wavelet transform coding,” by Ronald A. DeVore, Bjorn Jawerth, and Bradley J. Lucier, IEEE Transactions on Information Theory , 38 (1992), 719–746. And the approach in that paper was in turn influenced by the proofs in “Interpolation of Besov spaces,” by Ronald A. DeVore and Vasil A. Popov, Transactions of the American Mathematical Society 305 (1988), 397–414. But my goal is not to document everything that influenced these notes, for two reasons: (1) I can’t remember where many ideas came from and (2) nearly all these results are now over 20 years old, so can appear, perhaps, in these informal notes without strict attribution. Please e-mail me at [email protected] to get the latest version of these notes, or to point out errors or gaps in arguments. I thank Jeffrey Gaither, who helped correct and complete some aspects of these notes. 1

Transcript of Wavelets and Approximation Theory

Page 1: Wavelets and Approximation Theory

Wavelets and Approximation Theory

Bradley J. Lucier

October 23, 2013

Revision: 1.37

These notes are incomplet and inkorrect (as an old computer documentation joke goes).

Nonetheless, I’ve decided to distribute them in case they prove useful to someone.

My goal is to present certain results that can be proved in a (relatively) straightforward

way. So, for some problems we present complete proofs in L2 and partial proofs (or none

at all) for Lp, p 6= 2. To simplify arguments even more, we consider only wavelet spaces of

piecewise constant functions on uniform grids of size 2k × 2k on the unit square. As far as

I know, all results can be extended to approximations of higher order. We don’t consider

at all the area of “construction of wavelets,” which is a large topic in itself. I’ve presented

some of these results in graduate “topics” courses at Purdue University.

The notes have certain themes: generalized wavelets (results can be generalized to box

splines, for example), nonlinearity (nonlinear approximations, nonlinear wavelet decompo-

sitions, avoiding arguments that use linear functionals, etc.), non-Hilbert spaces (Lp and

ℓp for p 6= 2), nonconvexity (Lp and ℓp for 0 < p < 1).

My approach has been inspired by and is derived from the approach of Ron DeVore,

who has made tremendous contributions to this field and with whom I collaborated for

about a decade.

Some parts of these notes are influenced directly by the survey paper “Wavelets,” by

Ronald A. DeVore and Bradley J. Lucier, Acta Numerica 1 (1992), 1–56. Specific examples

of nonlinear, piecewise constant, wavelet approximations appeared in “Image compression

through wavelet transform coding,” by Ronald A. DeVore, Bjorn Jawerth, and Bradley J.

Lucier, IEEE Transactions on Information Theory , 38 (1992), 719–746. And the approach

in that paper was in turn influenced by the proofs in “Interpolation of Besov spaces,” by

Ronald A. DeVore and Vasil A. Popov, Transactions of the American Mathematical Society

305 (1988), 397–414.

But my goal is not to document everything that influenced these notes, for two reasons:

(1) I can’t remember where many ideas came from and (2) nearly all these results are now

over 20 years old, so can appear, perhaps, in these informal notes without strict attribution.

Please e-mail me at [email protected] to get the latest version of these notes,

or to point out errors or gaps in arguments.

I thank Jeffrey Gaither, who helped correct and complete some aspects of these notes.

1

Page 2: Wavelets and Approximation Theory

Background

We use the following terminology.

A norm on a vector space X , ‖ · ‖ : X → [0,∞) ⊂ R, satisfies (1) (∀a ∈ R) (∀x ∈ X)

‖ax‖ = |a| ‖x‖, (2) (∀x, y ∈ X) ‖x+ y‖ ≤ ‖x‖+ ‖y‖, and (3) (∀x ∈ X | ‖x‖ = 0) x = 0.

A semi-norm satisfies (1) and (2), but not necessarily (3).

A quasi-norm satisfies (1), (3), and (2′): (∃C > 1) (∀x, y ∈ X) ‖x+y‖ ≤ C(‖x‖+‖y‖).And a semi-quasi-norm (or quasi-semi-norm) satisfies (1) and (2′).

Elements of the sequence space

ℓp =

x = (x1, x2, . . . )

∣∣∣∣xk ∈ R, ‖x‖pℓp :=

k≥0

|xk|p <∞

for 0 < p <∞ (with the usual change when p = ∞) are denoted by (xk) or xk.The notation A(f) ≍ B(f), A,B : X → [0,∞) ⊂ R, means that there exist positive

constants c, c such that for all f ∈ X ,

cA(f) ≤ B(f) ≤ cA(f).

We say that A is equivalent to B onX . It is a standard result that all pairs of (quasi-)norms

on a finite-dimensional space are equivalent.

We use the following two inequalities extensively.

For any bounded domain Ω ⊂ R2, let |Ω| be the volume of Ω. If 0 < q < p < ∞, we

can use Holder’s inequality on Ω with

s =p

q≥ 1 and

1

r+

1

s= 1

to see that for any

f ∈ Lp(Ω) :=

f : Ω → R

∣∣∣∣‖f‖pLp(Ω) :=

Ω

|f |p <∞

we have

1

|Ω|

Ω

(|f |q · 1

)≤(

1

|Ω|

Ω

(|f |q

)s)1/s(

1

|Ω|

Ω

1r)1/r

=

(1

|Ω|

Ω

(|f |q

)p/q)q/p

.

Taking qth roots shows that

(1)

(1

|Ω|

Ω

|f |q)1/q

≤(

1

|Ω|

Ω

|f |p)1/p

.

On the other hand, for 0 < q ≤ p ≤ ∞ and sequence norms we have

(2) ‖(xk)‖ℓp ≤ ‖(xk)‖ℓq ;

when p = ∞ this is obvious, while for other values of p this inequality will be proved later.2

Page 3: Wavelets and Approximation Theory

Nested finite-dimensional linear spaces

Let I be the unit interval [0, 1]2; for k ≥ 0 and

j ∈ Zk2 := j = (j1, j2) ∈ Z

2 | 0 ≤ j1, j2 < 2k

let Ij,k be the square with opposite corners j/2k and (j + (1, 1))/2k; in other words

Ij,k = 2−k(I + j).

The corners of the squares Ij,k and the squares themselves are called dyadic, as the corners

are pairs of rational numbers whose denominators are powers of 2.

We define the characteristic function of Ij,k to be

χj,k(x) :=

1, x ∈ Ij,k,

0, otherwise.

We want to project a function f defined on I onto functions that are constant on each

Ij,k. We’ll call the space of such functions

Sk =f : I → R

∣∣ f∣∣Ij,k

is a constant.

These spaces have the property that Sk ⊂ Sk+1, i.e., this is an expanding sequences

of finite-dimensional spaces.

Projections as near-best approximations

For now, let’s take

Pkf∣∣Ij,k

=1

|Ij,k|

Ij,k

f,

i.e., Pkf in Ij,k is the average of f on Ij,k. Pkf is the best L2(I) approximation to f from

Sk, as calculus shows.

We make a series of claims about Pk. First, note that Pkf is bounded on Lp(I) for

1 ≤ p ≤ ∞, because, for each Ij,k, by (1),

|Pkf | =∣∣∣∣

1

|Ij,k|

Ij,k

f

∣∣∣∣≤(

1

|Ij,k|

Ij,k

|f |p)1/p

.

So, ∫

Ij,k

|Pkf |p ≤∫

Ij,k

1

|Ij,k|

Ij,k

|f |p =∫

Ij,k

|f |p;

3

Page 4: Wavelets and Approximation Theory

summing this inequality over j and taking pth roots gives ‖Pkf‖Lp(I) ≤ ‖f‖Lp(I).Second, if S ∈ Sk, then PkS = S (which is obvious). Third, Pkf is additive with

respect to elements of Sk: for any S in Sk,

Pk(f + S) = Pkf + PkS = Pkf + S, S ∈ Sk.

Pk is in fact linear, but we only need additivity.

These three properties imply that Pkf is a near-best approximation to f in Sk, because

for any S ∈ Sk, we have

‖Pkf − f‖Lp(I) = ‖(Pkf − PkS) + (S − f)‖Lp(I) ≤ ‖Pkf − PkS‖Lp(I) + ‖f − S‖Lp(I)≤ ‖f − S‖Lp(I) + ‖f − S‖Lp(I).

So for all f ∈ Lp(I),

(3) ‖Pkf − f‖Lp(I) ≤ 2 infS∈Sk

‖f − S‖Lp(I).

The same would be true of any (possibly nonlinear) projector of Lp(I) onto Sk that satisfies

these three properties.

Moduli of smoothness

We want to compare how close f is to Pkf by considering the smoothness of f . So for

any x ∈ I, nonnegative integer r, and h ∈ R2 for which this makes sense, we define the

differences

∆0hf(x) := f(x), ∆r+1

h f(x) := ∆rhf(x+ h) −∆r

hf(x), r ≥ 0.

Thus

∆1hf(x) = f(x+ h)− f(x), ∆2

hf(x) = f(x+ 2h)− 2f(x+ h) + f(x), etc.;

by induction,

∆rhf(x) =

r∑

s=0

(r

s

)

(−1)r+sf(x+ sh).

We have x+ sh ∈ I for s = 0, . . . , r, only for x in

Irh := x ∈ I | x+ rh ∈ I,

so for f(x) defined for x ∈ I, ∆rhf(x) is defined for x ∈ Irh.

4

Page 5: Wavelets and Approximation Theory

For any 0 < p ≤ ∞, the rth modulus of smoothness in Lp(I) is defined by

ωr(f, t)p := sup|h|<t

‖∆rhf‖Lp(Irh).

Bounding approximation error by modulus of smoothness

We claim that for 1 ≤ p ≤ ∞, we have

‖f − Pkf‖Lp(I) ≤ (2π)1/pω1(f, 2−k√2)p.

Why? Let’s consider 1 ≤ p <∞. On each Ij,k, we have

Ij,k

|f(x)− Pkf(x)|pdx =

Ij,k

∣∣∣∣f(x)− 1

|Ij,k|

Ij,k

f(y) dy

∣∣∣∣

p

dx.

But now, because f(x) does not depend on y we can pull f(x) into the integral in y; since

1 ≤ p, we can use (1) to show the latter quantity is bounded by

Ij,k

∣∣∣∣

1

|Ij,k|

Ij,k

f(x)− f(y) dy

∣∣∣∣

p

dx ≤∫

Ij,k

1

|Ij,k|

Ij,k

|f(x)− f(y)|p dy dx.

Note that for a given x ∈ Ij,k, the values of y are also restricted to Ij,k; so, in fact, if we

define h = y − x, we have |h| ≤ 2−k√2 and we have the further bound

Ij,k

1

|Ij,k|

|h|≤√2 2−k

x∈Ij,k , x+h∈I

|f(x+ h)− f(x)|p dh dx.

We now sum over all j to find

I

|f(x)− Pkf(x)|p dx ≤ 1

|Ij,k|

I

|h|≤√2 2−k

x∈Ih

|f(x+ h) − f(x)|p dh dx.

Change the order of integration, take a supremum over all |h| ≤ 2−k√2, and get

I

|f(x)− Pkf(x)|p dx ≤ |h | |h| ≤ 2−k√2|

|Ij,k|sup

|h|≤2−k√2

x∈Ih|f(x+ h)− f(x)|p dx

= 2πω1(f, 2−k√2)pp.

Thus

‖f − Pkf‖Lp(I) ≤ (2π)1/pω1(f, 2−k√2)p.

5

Page 6: Wavelets and Approximation Theory

A different argument, which we leave to the reader, is needed when p = ∞.

At this point I’m supposed to present various properties of ωr(f, t)p, 1 ≤ p ≤ ∞, like

(a) ωr(f, t)p is nondecreasing,

(b) ωr(f, t)p → 0 as t → 0 if 1 ≤ p < ∞ and f ∈ Lp(I) or if p = ∞ and f is uniformly

continuous on I,

(c) ωr(f, nt)p ≤ nrωr(f, t)p and ωr(f, λt)p ≤ (λ+ 1)rωr(f, t)p for integer n > 0 and real

λ > 0,

(d) ωr(f + g, t)p ≤ ωr(f, t)p + ωr(g, t)p,

(e) ωr(f, t)p = 0 for all t > 0 iff f is a polynomial of total degree < r on I,

(f) ωr(f, t)p ≤ 2r‖f‖Lp(I).

See Chapter 2 of DeVore and Lorentz, Constructive Approximation Theory , for the one-

dimensional theory. Note that property (d) implies that ωr( · , t)p is a semi-norm on Lp(I).

The “only if” part of (e) is nontrivial. These properties hold for more general domains

than our square I, for example they hold for functions defined on open, convex Ω.

Because of (c) we have, in fact,

(4) ‖f − Pkf‖Lp(I) ≤ Cω1(f, 2−k)p.

And because of (4) and (b), we have that Pkf → f in Lp(I) as k → ∞. Thus, if we define

P−1f = 0 we can write

Pmf = (Pmf − Pm−1f) + · · ·+ (P1f − P0f) =

m∑

k=0

(Pkf − Pk−1f).

Let m→ ∞ to see that

f = limm→∞

Pmf =

∞∑

k=0

(Pkf − Pk−1f),

where the sum converges in Lp(I).

Since Pk−1f ∈ Sk−1 ⊂ Sk and Pkf ∈ Sk, Pkf − Pk−1f ∈ Sk.

So we can write any f in Lp(I) as a convergent sum of elements of Sk, k ≥ 0.

0 < p < 1: Not as strange as it might seem

If we use

Pkf(x) =1

|Ij,k|

Ij,k

f for x ∈ Ij,k

as our projector, then the above stuff works only for p ≥ 1. For certain applications, we

will need p < 1 and it will be nice to get it to work there, too.6

Page 7: Wavelets and Approximation Theory

The basic inequality for 0 < p ≤ 1 is

(5)

∣∣∣∣

i

|ti|∣∣∣∣

p

≤∑

i

|ti|p.

We can prove this as follows.

Without loss of generality we assume that 0 ≤ b ≤ a and

|a+ b|p = |a|p|1 + (b/a)|p = ap(1 + x)p, 0 ≤ x = b/a ≤ 1.

We now note that for x ≥ 0,

F (x) :=(1 + x)p

1 + xp

satisfies F (0) = 1, and for 0 < x < 1,

F ′(x) =(1 + xp)p(1 + x)p−1 − (1 + x)ppxp−1

(1 + xp)2.

The numerator is

p(1+x)p−1[1+xp− (1+x)xp−1] = p(1+x)p−1[1+xp−xp−1−xp] = p(1+x)p−1[1−xp−1].

Since p− 1 ≤ 0 and 0 < x < 1, this quantity is negative. Thus, F (x) is nonincreasing on

[0, 1] and(1 + x)p

1 + xp≤ F (0) = 1, i.e., (1 + x)p ≤ 1 + xp.

Backing up gives

|a+ b|p = ap(1 + (b/a))p ≤ ap(1 + (b/a)p) = ap + bp,

which in turn gives (5).

From (5) we can derive other useful inequalities. For example, if 0 < s ≤ r < ∞ and

ti = |xi|r, then set p = s/r ≤ 1 to see that

(∑

|xi|r)s/r

≤∑

|xi|r·s/r,

or, on taking sth roots of each side,

(∑

|xi|r)1/r

≤(∑

|xi|s)1/s

,

7

Page 8: Wavelets and Approximation Theory

as claimed earlier.

Formula (5) immediately gives∫

I

|f(x) + g(x)|p dx ≤∫

I

(|f(x)|p + |g(x)|p

)dx,

i.e.,

(6) ‖f + g‖pLp(I) ≤ ‖f‖pLp(I) + ‖g‖pLp(I).

The function F (ξ) = ξp is concave on [0,∞) for 0 < p ≤ 1, so for 0 ≤ t ≤ 1 and any

nonnegative a and b

tF (a) + (1− t)F (b) ≤ F(ta+ (1− t)b

).

Setting t = 1/2, a = ‖f‖Lp(I), and b = ‖g‖Lp(I), we get with some algebra

‖f‖pLp(I) + ‖g‖pLp(I) ≤ 21−p(‖f‖Lp(I) + ‖g‖Lp(I)

)p.

Combining this with (6) gives

‖f + g‖Lp(I) ≤ 21p−1(‖f‖Lp(I) + ‖g‖Lp(I)

),

i.e., ‖ · ‖Lp(I) is a quasi-norm. The completeness of Lp(I) for p < 1 is proved in the same

way as for 1 ≤ p <∞.

Projections and moduli of smoothness in Lp(I), 0 < p ≤ ∞For any nontrivial measurable set Ω we can define a (possibly non-unique) median of

f on Ω to be any number m for which

(7) |f(x) ≥ m| ≥ 1

2|Ω| and |f(x) ≤ m| ≥ 1

2|Ω|.

A median is a best L1(Ω) constant approximation to f on Ω. If k < m, for example, then

because |f ≥ m| ≥ 12 |Ω|

|f − k| =∫

f≥k(f − k) +

f<k

(k − f)

=

f≥m(f −m) +

f≥m(m− k) +

k≤f<m(f − k)

+

f<m

(m− f) +

f<m

(k −m)−∫

k≤f<m(k − f)

=

|f −m| + (m− k)(|f ≥ m| − |f < m|) + 2

k≤f<m(f − k)

≥∫

|f −m|.8

Page 9: Wavelets and Approximation Theory

We can try a new projector

fj,k := Pkf∣∣∣Ij,k

:= a median of f on Ij,k.

Note that this isn’t necessarily unique, so just pick one.

We note that ∫

x∈Ij,k||f(x)|≥|fj,k||fj,k|p dx ≤

Ij,k

|f(x)|p dx

No matter the choice of median, the measure of the set of all x such that |f(x)| ≥ |fj,k|on Ij,k is at least 1

2 |Ij,k|, so the left hand side is

|x ∈ Ij,k | |f(x)| ≥ |fj,k|| |fj,k|p =|x ∈ Ij,k | |f(x)| ≥ |fj,k||

|Ij,k||Ij,k| |fj,k|p

≥ 1

2

Ij,k

|fj,k|p,

so ∫

Ij,k

|Pkf |p ≤ 2

Ij,k

|f |p.

Summing over all j gives

‖Pkf‖pLp(I) ≤ 2‖f‖pLp(I).

or

‖Pkf‖Lp(I) ≤ 21/p‖f‖Lp(I).

Note that this argument works for any 0 < p <∞, and the case p = ∞ is trivial.

Although Pk is no longer linear, one sees immediately that PkS = S for S ∈ Sk and

Pk is still additive with respect to elements of Sk, i.e., one can choose medians so that

Pk(f + S) = Pkf + PkS = Pkf + S.

So, for 1 ≤ p and Pk the median operator, we have that ‖Pkf‖Lp(I) ≤ 2‖f‖Lp(I) and the

same argument as for the averaging operator shows that

‖f − Pkf‖Lp(I) ≤ 3 infS∈Sk

‖f − S‖Lp(I).

For 0 < p < 1, use (6) to see that for any S ∈ Sk,

‖f − Pkf‖pLp(I) = ‖f − S + S − Pkf‖pLp(I)≤ ‖f − S‖pLp(I) + ‖S − Pkf‖pLp(I)= ‖f − S‖pLp(I) + ‖Pk(S − f)‖pLp(I)≤ 3‖f − S‖pLp(I)

9

Page 10: Wavelets and Approximation Theory

so

‖f − Pk‖Lp(I) ≤ 31/p infS∈Sk

‖f − S‖Lp(I),

i.e., Pkf is a near-best approximation to f in Sk with a constant that depends on p.

If we now distinguish P averagek f from Pmedian

k we see that for p ≥ 1 and k ≥ 0,

‖f − Pmediank f‖Lp(I) ≤ C inf

S∈Sk‖f − S‖Lp(I) ≤ C‖f − P average

k f‖Lp(I) ≤ Cω1(f, 2−k)p.

A different argument works for all 0 < p < ∞. For fixed x ∈ Ij,k, we note that for

points y that cover at least half the measure of Ij,k, we have

|f(x)− Pkf | ≤ |f(x)− f(y)|,

since f(y) and f(x) will lie on “opposite” sides of the median. So for x ∈ Ij,k and p > 0,

|f(x)− Pkf |p1

2|Ij,k| ≤ |f(x)− Pkf |p

∣∣y ∈ Ij,k | |f(x)− f(y)| ≥ |f(x)− Pkf |

∣∣

=

y∈Ij,k||f(x)−f(y)|≥|f(x)−Pkf ||f(x)− Pkf |p dy

≤∫

y∈Ij,k||f(x)−f(y)|≥|f(x)−Pkf ||f(x)− f(y)|p dy

≤∫

Ij,k

|f(x)− f(y)|p dy.

So1

2|Ij,k|

Ij,k

|f(x)− Pkf |p dx ≤∫

Ij,k

Ij,k

|f(x)− f(y)|p dy dx

or ∫

Ij,k

|f(x)− Pkf |p dx ≤ 2

|Ij,k|

Ij,k

Ij,k

|f(x)− f(y)|p dy dx

and we proceed as with the average projector to show that

‖f − Pkf‖Lp(I) ≤ C(p)ω1(f, 2−k)p.

What are the properties of the modulus of smoothness when 0 < p ≤ 1? When p ≤ 1,

property (d) for the modulus of smoothness is modified to

ωr(f + g, t)pp ≤ ωr(f, t)pp + ωr(g, t)

pp,

10

Page 11: Wavelets and Approximation Theory

or, for all 0 < p ≤ ∞,

ωr(f + g, t)min(p,1)p ≤ ωr(f, t)

min(p,1)p + ωr(g, t)

min(p,1)p ,

Similarly, because

Irh

∣∣∆r

hf(x)∣∣pdx =

Irh

∣∣∣∣

r∑

s=0

(−1)r+s(r

s

)

f(x+ sh)

∣∣∣∣

p

dx

≤∫

Irh

r∑

s=0

(r

s

)p

|f(x+ sh)|p dx

≤∫

Irh

r∑

s=0

(r

s

)

|f(x+ sh)|p dx

≤ 2r‖f‖pLp(I)

for 0 < p ≤ 1, (f) is now

ωr(f, t)min(p,1)p ≤ 2r‖f‖min(p,1)

Lp(I)for 0 < p ≤ ∞.

Finally, for integer n > 0 and real λ > 0, (c) is now

ωr(f, nt)min(p,1)p ≤ nrωr(f, t)

min(p,1)p and ωr(f, λt)

min(p,1)p ≤ (λ+ 1)rω(f, t)min(p,1)

p .

We can choose many other nonlinear projectors. One that I find interesting is

Pkf = round

(1

|Ij,k|

Ij,k

f

)

where round(x) is the integer closest to x (breaking ties arbitrarily). It is shown in DeVore,

Jawerth, Lucier “Image compression through wavelet transform coding” that if f takes on

integer values (as it does from a digital camera, for example), then this Pkf is stable for all

0 < p. This leads to so-called “integer-to-integer” wavelet transforms (which necessarily

are nonlinear) and wavelet transforms for binary images.

One can even calculate the averages using fixed-point arithmetic with enough precision

and still have a stable transform. This nonlinear theory can accomodate many computa-

tional “crimes”.11

Page 12: Wavelets and Approximation Theory

Besov smoothness spaces

We measure the smoothness of f by considering how fast the modulus of smoothness

decays as t→ 0. One natural measure may be: (∃C, α > 0) (∀t > 0),

ωr(f, t)p ≤ Ctα.

This is equivalent to

t−αωr(f, t)p ≤ C

and, in fact, we could take

suptt−αωr(f, t)p

as a semi-norm if p ≥ 1 (and a semi-quasi-norm if 0 < p < 1).

Unfortunately, this is not quite general enough for our purposes. We need to make a

more subtle distinction in the decay of ωr(f, t)p, so we add another parameter q and define

for 0 < p, q ≤ ∞ and r−1 ≤ α < r the (quasi-)semi-norms for the Besov spaces Bαq (Lp(I))

by

|f |Bαq (Lp(I)) :=(∫ 1

0

[t−αωr(f, t)p]q dt

t

)1/q

, 0 < q <∞,

and

|f |Bα∞

(Lp(I)) := sup0<t<1

t−αωr(f, t)p.

We also define the (quasi-)norm

‖f‖Bαq (Lp(I)) := |f |Bαq (Lp(I)) + ‖f‖Lp(I).

(And after that bit of pedantic distinction between (quasi-)(semi-)norms and (semi-)norms,

we shall in the future call quasi-norms “norms.”)

Now∫ 1

0

[t−αωr(f, t)p]q dt

t=∑

k≥0

∫ 2−k

2−k−1

[t−αωr(f, t)p]q dt

t.

and we have for 2−k−1 ≤ t ≤ 2−k

[2αkωr(f, 2−(k+1))p]

q2k ≤ [t−αωr(f, t)p]q 1

t≤ [2α(k+1)ωr(f, 2

−k)p]q2k+1

We integrate over [2−(k+1), 2−k] (which has length 2−(k+1)), sum over k, and use property

(c) for moduli of smoothness (arguing separately for p ≥ 1 and 0 < p < 1) to see that

there are positive constants c and c such that

c

(∑

k≥0

[2αkωr(f, 2−k)p]

q

)1/q

≤ |f |Bαq (Lp(I)) ≤ c

(∑

k≥0

[2αkωr(f, 2−k)p]

q

)1/q

,

12

Page 13: Wavelets and Approximation Theory

i.e.,

|f |Bαq (Lp(I)) ≍ ‖2αkωr(f, 2−k)p‖ℓq .

So we can take as the norm of Bαq (Lp(I)) the quantity

‖f‖Bαq (Lp(I)) := ‖f‖Lp(I) + ‖2αkωr(f, 2−k)p‖ℓq .

Sequence norms of wavelet coefficients: Part I, the direct inequality

We have decomposed f ∈ Lp(I) by

f =∑

k≥0

(Pkf − Pk−1f),

where we can take Pk to be the average projector onto Sk when p ≥ 1 and we take Pk to

be the median projector onto Sk for any p > 0.

Because

Pk−1f ∈ Sk−1 ⊂ Sk,

we have Pkf − Pk−1f ∈ Sk so we define dj,k by

Pkf − Pk−1f =∑

j∈Z2k

dj,kχj,k;

note that

‖Pkf − Pk−1f‖Lp(I) =(∑

j

‖dj,kχj,k‖pLp(I))1/p

= ‖‖dj,kχj,k‖Lp(I)‖ℓp(j∈Z2k).

Now for 0 < p ≤ ∞ and k ≥ 1 we have

‖Pkf − Pk−1f‖Lp(I) = ‖Pkf − f + f − Pk−1f‖Lp(I)≤ C(‖Pkf − f‖Lp(I) + ‖Pk−1f − f‖Lp(I))≤ C(ω1(f, 2

−k)p + ω1(f, 2−(k+1))p)

≤ Cω1(f, 2−k)p,

while for k = 0,

‖P0f − P−1f‖Lp(I) = ‖P0f‖Lp(I) = |d0,0| ≤ C‖f‖Lp(I).13

Page 14: Wavelets and Approximation Theory

Combining these bounds, we see that for any 0 < q ≤ ∞ we have

‖2αk‖Pkf − Pk−1f‖Lp(I)‖ℓq(k≥0) = ‖2αk‖‖dj,kχj,k‖Lp(I)‖ℓp(j∈Z2k)‖ℓq(k≥0)

≤ C(‖2αkω1(f, 2−k)p‖ℓq(k≥0) + ‖f‖Lp(I))

= C‖f‖Bαq (Lp(I)).

We tend to keep ‖χj,k‖Lp(I) in there as a weight ; we can also write the left hand-side

of the inequality as

‖2αk‖χj,k‖Lp(I)‖dj,k‖ℓp(j∈Z2k)‖ℓq(k≥0).

Since we have the explicit value ‖χj,k‖Lp(I) = 2−2k/p we can also write the left-hand-side

of the inequality as

‖2(α−2/p)k‖dj,k‖ℓp(j∈Z2k)‖ℓq(k≥0).

If we use a different normalization for χj,k then the weight 2(α−2/p)k will differ; normalizing

χj,k to be have ‖χj,k‖Lp(I) = 1 rather than just being the characteristic function of Ij,k,

for example, will lead to a weight of 2αk.

The direct inequality

(8) ‖2αk‖χj,k‖Lp(I)‖dj,k‖ℓp(j∈Z2k)‖ℓq(k≥0) ≤ C‖f‖Bαq (Lp(I)),

which followed from the Jackson inequality

‖f − Pkf‖Lp(I) ≤ Cω1(f, 2−k)p,

is the easy part.

Sequence norms of wavelet coefficients: Part II, the inverse inequality

The harder part is to show that

‖f‖Bαq (Lp(I)) ≤ C‖2αk‖Pkf − Pk−1f‖Lp(I)‖ℓq(k≥0),

which is called the inverse inequality. The inverse inequality will follow by a bound on the

modulus of smoothness of any element of Sk; specifically for S ∈ Sk we have the Bernstein

inequality

(9) ω1(S, t)p ≤ C

‖S‖Lp(I), 2−k ≤ t,

2k/pt1/p‖S‖Lp(I), 0 < t ≤ 2−k.14

Page 15: Wavelets and Approximation Theory

The first inequality is a special case of

ωr(S, t)min(1,p)p ≤ 2r‖S‖min(1,p)

Lp(I).

To prove the second, we start with

f(x+ h) − f(x) = f(x+ (h1, h2))− f(x+ (h1, 0)) + f(x+ (h1, 0))− f(x), h = (h1, h2),

so

‖∆1hf‖Lp(Ih) ≤ Cp(‖∆1

(h1,0)f‖Lp(I(h1,0)) + ‖∆1

(0,h2)f‖Lp(I(0,h2))),

so we need to consider only offsets parallel to the coordinate axes.

Now we denote

S =∑

j

sj,kχj,k.

When x ∈ Ij,k and 0 ≤ h ≤ 2−k, we have

S(x+ (h, 0))− S(x) =

sj,k − sj,k = 0, x+ (h, 0) ∈ Ij,k,

sj+(1,0),k − sj,k, x+ (h, 0) ∈ Ij+(1,0),k.

The area where x ∈ Ij,k and x+ (h, 0) ∈ Ij+(1,0),k is h · 2−k; therefore∫

I(h,0)

|S(x+ (h, 0))− S(x)|p dx =∑

j

Ij,k∩I(h,0)|S(x+ (h, 0))− S(x)|p dx

=∑

j1+1<2k

|sj+(1,0),k − sj,k|ph2−k

= 2kh∑

j1+1<2k

|sj+(1,0),k − sj,k|p2−2k

= 2kh‖∆1(2−k,0)S‖

pLp(I(2−k,0))

≤ C2kh‖S‖pLp(I).

This implies the second part of (9).

If we write Sk = Pkf − Pk−1f ∈ Sk, we have

f =∑

k≥0

Sk.

If 0 < p ≤ 1 we have

Ih

|∆1hf(x)|p =

Ih

∣∣∣∣

k≥0

∆1hSk

∣∣∣∣

p

≤∑

k≥0

Ih

|∆1hSk|p.

15

Page 16: Wavelets and Approximation Theory

If we now take |h| ≤ 2−m and use the two parts of (9), we find

Ih

|∆1hf |p ≤ C

(∑

0≤k<m

Ih

|∆1hSk|p +

m≤k

Ih

|∆1hSk|p

)

≤ C

(∑

0≤k<m(2k2−m)‖Sk‖pLp(I) +

m≤k‖Sk‖pLp(I)

)

.

Taking suprema over |h| ≤ 2−m gives

ω1(f, 2−m)pp ≤ C

(∑

0≤k<m2k−m‖Sk‖pLp(I) +

m≤k‖Sk‖pLp(I)

)

, m ≥ 0,

and multiplying by 2αmp gives

2αmpω1(f, 2−m)pp ≤ C

(∑

0≤k<m2k+(αp−1)m‖Sk‖pLp(I) +

m≤k2αmp‖Sk‖pLp(I)

)

.

Now we see that we want to have 2αkp‖Sk‖pLp(I) on the right hand side, so we multiply

and divide each term on the right by 2αkp:

2αmpω1(f, 2−m)pp ≤ C

(∑

0≤k<m2(αp−1)(m−k)2αkp‖Sk‖pLp(I) +

m≤k2α(m−k)p2αkp‖Sk‖pLp(I)

)

This set of inequalities can be written in vector form. We set

xm := 2αmpω1(f, 2−m)pp and yk := 2αkp‖Sk‖pLp(I).

Then, componentwise in m,

(xm) ≤ C

1 2−αp 2−2αp 2−3αp . . .

2(αp−1) 1 2−αp 2−2αp . . .

22(αp−1) 2(αp−1) 1 2−αp . . .

23(αp−1) 22(αp−1) 2(αp−1) 1 . . ....

......

...

(yk).

If p ≥ 1 we proceed similarly, but now we use the triangle inequality and end up with

2αmω1(f, 2−m)p ≤ C

(∑

0≤k<m2(α−1/p)(m−k)2αk‖Sk‖Lp(I) +

m≤k2α(m−k)2αk‖Sk‖Lp(I)

)

.

16

Page 17: Wavelets and Approximation Theory

Now we set

xm := 2αmω1(f, 2−m)p and yk := 2αk‖Sk‖Lp(I)

and get the vector inequality

(xm) ≤ C

1 2−α 2−2α 2−3α . . .

2(α−1/p) 1 2−α 2−2α . . .

22(α−1/p) 2(α−1/p) 1 2−α . . .

23(α−1/p) 22(α−1/p) 2(α−1/p) 1 . . ....

......

...

(yk).

These two vector inequalities, which are the crux of the matter, were derived by Larry

Brown at Purdue.

In either case for ℓ ≥ 0 we set

Tℓ(y0, y1, y2, . . . ) := (yℓ, yℓ+1, . . . ) and T−ℓ(y0, y1, y2, . . . ) := (0, . . . , 0︸ ︷︷ ︸

ℓ times

, y0, y1, . . . ).

For any 0 < r ≤ ∞, we obviously have ‖Tℓ(yk)‖r ≤ ‖(yk)‖r.For 0 < p ≤ 1 our vector inequality can be written

(xm) ≤ C

(∑

0≤ℓ2−αℓpTℓ(yk) +

1≤ℓ2ℓ(αp−1)T−ℓ(yk)

)

.

We set r = q/p; if 0 < r ≤ 1, we have

‖(xm)‖rr ≤ C

(∑

0≤ℓ2−αℓpr‖Tℓ(yk)‖rr +

1≤ℓ2ℓ(αp−1)r‖T−ℓ(yk)‖rr

)

.

Both these sums are finite if αp− 1 < 0, i.e., α < 1/p, and we get

‖(xm)‖rr =∑

m≥0

[2αmpω1(f, 2−m)pp]

q/p ≤ C‖(yk)‖rr = C∑

k≥0

[2αkp‖Sk‖pLp(I)]q/p

which is what we want upon taking qth roots. If 1 ≤ r, then we have

‖(xm)‖r ≤ C

(∑

0≤ℓ2−αℓp‖Tℓ(yk)‖r +

1≤ℓ2ℓ(αp−1)‖T−ℓ(yk)‖r

)

;

again, both these sums are finite if α < 1/p, and

‖(xm)‖r ≤ C‖(yk)‖r,17

Page 18: Wavelets and Approximation Theory

which again gives what we want upon taking pth roots.

For 1 ≤ p our vector inequality becomes

(xm) ≤ C

(∑

0≤ℓ2−αℓTℓ(yk) +

1≤ℓ2ℓ(α−1/p)T−ℓ(yk)

)

.

Now we take r = q; again, whether 0 < r ≤ 1 or 1 ≤ r, we have for α < 1/p

m≥0

[2αmω1(f, 2−m)p]

q ≤ C∑

k≥0

[2αk‖Sk‖Lp(I)]q.

In a similar way we can show

(10) ‖f‖Lp(I) ≤ C‖2αk‖Sk‖Lp(I)‖ℓq(0≤k).1

So we have

(11) ‖f‖Bαq (Lp(I)) ≤ C

(∑

k≥0

[2αk‖Pkf − Pk−1f‖Lp(I)]q)1/q

,

and from (8) and (11) we have proved the following theorem:

1 Indeed, if p ≥ 1 then ‖f‖Lp(I) ≤∑

k≥0 ‖Sk‖Lp(I); if q ≤ 1 we have

k≥0

‖Sk‖Lp(I) ≤

(

k≥0

‖Sk‖qLp(I)

)1/q

(

k≥0

2αkq‖Sk‖qLp(I)

)1/q

and if q ≥ 1 we have (with 1/q + 1/q′ = 1)

k≥0

‖Sk‖Lp(I) =∑

k≥0

‖Sk‖Lp(I)2αk2−αk ≤

(

k≥0

‖Sk‖qLp(I)

2αkq

)1/q(∑

k≥0

2−αkq′)1/q′

,

so we’ve proved (10) for p ≥ 1. If p ≤ 1 then ‖f‖pLp(I)

≤∑

k≥0 ‖Sk‖pLp(I)

; if r = q/p ≤ 1 then

k≥0

‖Sk‖pLp(I)

(

k≥0

‖Sk‖prLp(I)

)1/r

=

(

k≥0

‖Sk‖qLp(I)

)p/q

(

k≥0

2αkq‖Sk‖qLp(I)

)p/q

,

which implies (10), and if r ≥ 1 we have (again with 1/r + 1/r′ = 1)

k≥0

‖Sk‖pLp(I)

=∑

k≥0

‖Sk‖pLp(I)

2αkp2−αkp ≤

(

k≥0

‖Sk‖prLp(I)

2αkpr

)1/r(∑

k≥0

2−αkpr′)1/r′

,

which, since pr = q, again implies (10).

18

Page 19: Wavelets and Approximation Theory

Theorem 1. Assume 0 < p ≤ ∞, 0 < α < min(1, 1/p), 0 < q ≤ ∞. For x ∈ Ij,k we

let

Pkf(x) =

1

|Ij,k|

Ij,k

f, 1 ≤ p,

a median of f on Ij,k, 0 < p.

Then

(12) ‖f‖Bαq (Lp(I)) ≍ ‖2αk‖Pkf − Pk−1f‖Lp(I)‖ℓq(0≤k).

Embeddings of Besov spaces

One can derive a number of properties of Besov spaces from Theorem 1.

Because Lp(I) ⊂ Lp′(I) when p′ ≤ p, we have in that case Bαq (Lp(I)) ⊂ Bαq (Lp′(I)).

Because ℓq ⊂ ℓq′ if 0 < q < q′ ≤ ∞, we have Bαq (Lp(I)) ⊂ Bαq′(Lp(I)) for 0 < q < q′ ≤∞. In particular, Bαq (Lp(I)) ⊂ Bα∞(Lp(I)) for all 0 < q <∞.

If α′ > α, however, then Bα′

q′ (Lp(I)) ⊂ Bαq (Lp(I)) for any q and q′. One sees this by

noting that Bα′

q′ (Lp(I)) ⊂ Bα′

∞(Lp(I)) and f ∈ Bα′

∞(Lp(I)) implies there is a C such that

2α′k‖Pkf − Pk−1f‖Lp(I) ≤ C.

This means that for all k

2αk‖Pkf − Pk−1f‖Lp(I) ≤ C2−(α′−α)k,

and the right-hand side, being a decreasing geometric sequence, is in ℓq for any q.

Thus α is the main determiner of smoothness for fixed p, and the second parameter q

allows us to make finer distinctions.

We now fix δ ∈ R and consider the pairs α, q (q > 0, 0 < α < min(1, 1/q)) that satisfy

(13)1

q− α

2= δ.

For any pair satisfying (13) we have with Pkf − Pk−1f =∑

j∈Z2kdj,kχj,k,

‖2αk‖Pkf − Pk−1f‖Lq(I)‖qℓq(0≤k) =

k≥0

2αkq∑

j∈Z2k

|dj,k|q‖χj,k‖qLq(I)

=∑

k≥0

2αkq∑

j∈Z2k

|dj,k|q2−2k

=∑

k≥0

j∈Z2k

|2(α−2/q)kdj,k|q

=∑

k≥0

j∈Z2k

|2−2kδdj,k|q.

19

Page 20: Wavelets and Approximation Theory

So

(14) ‖f‖Bαq (Lq(I)) ≍ ‖2αk‖Pkf − Pk−1f‖Lq(I)‖ℓq(0≤k) = ‖2−2kδdj,k‖ℓq(k≥0, j∈Z2k).

If the pair α′, q′ also satisfies (13) with α′ > α, then q′ < q; because ℓq′ ⊂ ℓq in this

case, we have that Bα′

q′ (Lq′(I)) ⊂ Bαq (Lq(I)).

All these inclusions come with norm inequalities, so they are in fact embeddings.

To summarize: for α, α′ > 0, 0 < p, p′, q, q′ ≤ ∞,

Bαq (Lp′(I)) → Bαq (Lp(I)) when p′ > p;

Bαq′(Lp(I)) → Bαq (Lp(I)) when q′ < q;

Bα′

q′ (Lp(I)) → Bαq (Lp(I)) when α′ > α for any q, q′;

Bα′

q′ (Lq′(I)) → Bαq (Lq(I)) when1

q− α

2=

1

q′− α′

2and α′ > α.

In fact, we have for 0 < α, 0 < p <∞, and

1

q=α

2+

1

p

(

so δ =1

p

)

,

that Bαq (Lq(I)) → Lp(I). This can be shown simply when p = 2, so

1

q=α

2+

1

2

(

so δ =1

2

)

,

Pkf is the best L2(I) approximation to f on Sk, and 0 < α < 1. For then Pkf − Pk−1f

is orthogonal to all S ∈ Sℓ for ℓ ≤ k − 1, and for fixed k the χj,k, j ∈ Z2k, are orthogonal

because they have essentially disjoint support.

Thus we have

(15)

‖f‖2L2(I)=∑

k≥0

‖Pkf − Pk−1f‖2L2(I)=∑

k≥0

j∈Z2k

‖dj,kχj,k‖2L2(I)

=∑

k≥0

j∈Z2k

|2−kdj,k|2 =∑

k≥0

j∈Z2k

|2−2kδdj,k|2.

= ‖2−2kδdj,k‖ℓ2(k≥0, j∈Z2k).

Again, because q < 2, ℓq → ℓ2. Thus from (15) and (14) we have, as claimed,

Bαq (Lq(I)) → L2(I).

We summarize these results in the important Figure 1. A function space with smooth-

ness α in Lp(I) is graphed at the point (1/p, α). This concept is not so precise, so the

Sobolev space W 1,1(I), the Besov space B1q (L1(I)) for any 0 < q ≤ ∞, and the space of

functions of bounded variation BV(I) would all be represented by the point (1, 1). Sim-

ilarly, all Besov spaces Bαq (Lp(I)) for any q would be graphed at the point (1/p, α), and

the space Lp(I) would be represented by the point (1/p, 0) (since functions in Lp(I) have

zero smoothness).20

Page 21: Wavelets and Approximation Theory

Bα′

q′(Lp′ (I))

0

α

1

δ 1

p

1

p−

α

2= δ

Figure 1. The Besov space Bα′

q′ (Lp′(I)) is embedded in every Besov space

Bαq (Lp(I)) with (1/p, α) in at least the (open) shaded area.

Two-dimensional Haar transform

This may not look much like “regular” wavelets, so we specialize a bit further and see

how this applies to two-dimensional Haar wavelets.

For each k > 0, each interval

Ij,k−1 = I2j,k ∪ I2j+(1,0),k ∪ I2j+(0,1),k ∪ I2j+(1,1),k ,

so naturally we can write on Ij,k−1

Pkf − Pk−1f = d2j,kχ2j,k + d2j+(1,0),kχ2j+(1,0),k

+ d2j+(0,1),kχ2j+(0,1),k + d2j+(1,1),kχ2j+(1,1),k

=d2j+(0,1),k d2j+(1,1),k

d2j,k d2j+(1,0),k,

where we have boxed the values of the difference of projections to suggest the values in

each quarter of the interval Ij,k−1

We choose a different basis for this four-dimensional space: For any numbers a, b, c,

and d, we can write

(16)a b

c d= α

+1 −1

+1 −1+ β

−1 −1

+1 +1+ γ

−1 +1

+1 −1+ δ

+1 +1

+1 +1.

Therefore, for any projector Pk we can write Pkf − Pk−1f as a linear combination

Pkf − Pk−1f =∑

j∈Z2k−1

ψ∈Ψ

cj,k−1,ψψj,k−1

21

Page 22: Wavelets and Approximation Theory

where

Ψ = ψ(1), ψ(2), ψ(3), ψ(4)

and each ψ is defined on I by

ψ(1)(x1, x2) =

1, 0 ≤ x1 < 1/2,

−1, otherwise,

ψ(2)(x1, x2)) =

1, 0 ≤ x2 < 1/2,

−1, otherwise,

ψ(3)(x1, x2) =

1 0 ≤ x1, x2 < 1/2 or 1/2 ≤ x1, x2 ≤ 1,

−1, otherwise,

ψ(4)(x1, x2) = 1 for all (x1, x2) ∈ I.

So we can write

(17) f =∑

k≥0

j∈Z2k

ψ∈Ψ

cj,k,ψψj,k.

On each interval Ij,k−1, we have for any 0 < p <∞

(18)

Ij,k−1

(∑

dℓ,kχℓ,k

)p

≍∑

ψ

Ij,k−1

|cj,k−1,ψψj,k−1|p,

with constants that depend only on p because all (quasi-)norms on a four-dimensional

space are equivalent. Thus, from (12) and (18) we have the norm equivalence

‖f‖Bαq (Lp(I)) ≍(∑

k≥0

[

2αk(∑

j,ψ

‖cj,k,ψψj,k‖pLp(I))1/p]q

)1/q

.

When Pk is the average projector,

Pkf(x) =1

|Ij,k|

Ij,k

f = fj,k for x ∈ Ij,k,

we can specialize things a bit more. We have on Ij,k−1

fj,k−1 =1

4(f2j,k + f2j+(1,0),k + f2j+(0,1),k + f2j+(1,1),k),

i.e., fj,k−1 is the average of the four k-level “pixel values” on Ij,k−1, and the coefficient δ

in (16) is zero. (When k − 1 = 0, δ is the average value of f on I.)22

Page 23: Wavelets and Approximation Theory

We then write Ψ0 = Ψ and

Ψk = ψ(1), ψ(2), ψ(3), k > 0,

and normalize ψj,k in L2(I):

ψj,k(x) = 2kψ(2kx− j), k ≥ 0, ψ ∈ Ψk, j ∈ Z2k.

The set ψj,k | k ≥ 0, ψ ∈ Ψk, j ∈ Z2k is orthonormal and, because of (17), forms an

orthonormal basis for L2(I). In particular, if

f =∑

k≥0

ψ∈Ψk

j∈Z2k

cj,k,ψψj,k (the Haar transform),

then

(19) ‖f‖L2(I) = ‖(cj,k,ψ)‖ℓ2(0≤k,j∈Z2k,ψ∈Ψ).

Furthermore, since

‖ψj,k‖Lp(I) =(∫

Ij,k

[2k]p)1/p

= (2−2k2kp)1/p = 2k(1−2/p) for all ψ ∈ Ψk and j ∈ Z2k,

we can write for 0 < α < 1pand p ≥ 1

‖f‖Bαq (Lp(I)) ≍(∑

k≥0

[

2αk2k(1−2/p)

(∑

ψ∈Ψk

j∈Z2k

|cj,k,ψ|p)1/p]q)1/q

because the average projector Pk is bounded on Lp(I).

In particular, if 0 < α < 1 and p = q satisfies

1

q=α

2+

1

2,

so 1 < q < 2, then the exponent of 2 in the previous formula satisfies

(

α+ 1− 2

p

)

k = 0,

and

(20) ‖f‖Bαq (Lq(I)) ≍ ‖(cj,k,ψ)‖ℓq(0≤k,j∈Z2k,ψ∈Ψ).

Since ℓq ⊂ ℓ2, we see immediately from (19) and (20) that Bαq (Lq(I)) is embedded in

L2(I).

In the following we accept implicitly for the Haar transform that ψ(4) will be used only

when k = 0.23

Page 24: Wavelets and Approximation Theory

The big picture, part I: Linear approximation

In the next two sections we discuss two big ideas in the context of wavelets:

(1) Nonlinear approximation is better than linear approximation.

(2) Approximation is equivalent to smoothness.

We ask a number of questions:

(1) What do we mean by linear and nonlinear approximation by wavelets?

(2) What does it mean to approximate a function well by wavelets?

(3) Can one characterize the set of functions that are approximated well by wavelets?

We note that the average projector Pkf is the best approximation in L2(I) to f on Sk,

and using the Haar wavelets we developed in the previous section we can write

(21) Pkf =∑

0≤ℓ<k, j∈Z2ℓ, ψ∈Ψ

cj,ℓ,ψψj,ℓ, cj,ℓ,ψ = 〈f, ψj,ℓ〉.

There are 22k terms in the sum in (21), and the dimension of Sk is 22k. So for each k we

have chosen, a priori, a set of 22k wavelet terms

ψj,ℓ | 0 ≤ ℓ < k, j ∈ Z2ℓ , ψ ∈ Ψ

before even looking at f . We also have that

Pk(αf + βg) = αPkf + βPkg,

i.e., this approximation process is linear .

So if we define for 1 ≤ p ≤ ∞

EN (f)p = infS∈Sk

‖f − S‖Lp(I), N = 22k (the dimension of Sk),

the error of best approximation of f in Lp(I), we have by (3) and (4),

E22k(f)p ≤ ‖f − Pkf‖Lp(I) ≤ 2E22k(f)p ≤ Cω1(f, 2−k)p.

So, for any 1 ≤ p <∞, 0 < α < 1/p, and 0 < q ≤ ∞,

‖2αkE22k(f)p‖ℓq(k≥0) ≤ C‖2αkω1(f, 2−k)p‖ℓq(k≥0) ≍ C‖f‖Bαq (Lp(I)).

24

Page 25: Wavelets and Approximation Theory

Conversely, from Theorem 1,

‖f‖Bαq (Lp(I)) ≍ ‖2αk‖Pkf − Pk−1f‖Lp(I)‖ℓq(k≥0)

≤ ‖2αk(‖Pkf − f‖Lp(I) + ‖f − Pk−1f‖Lp(I))‖ℓq(k≥0)

≤ Cq‖2αk‖Pkf − f‖Lp(I)‖ℓq(k≥0)

≤ C‖2αkE22k(f)p‖ℓq(k≥0).

Combining the previous two inequality gives

‖2αkE22k(f)p‖ℓq(k≥0) ≍ ‖f‖Bαq (Lp(I)),

i.e., f can be approximated in Sk at a certain rate if and only if f is in a specific Besov

smoothness space.

This is the first example of the dictum:

Approximation is equivalent to smoothness.

The big picture, part II: Compression of wavelet coefficients

Note: This section is not yet written in a way that fits in with the previous material.

We choose an error space; for the moment, we choose L2(I). We also work with

orthonormal Haar wavelets ψj,k.We note that if we want to approximate

f =∑

j,k,ψ

cj,k,ψψj,k

by a sum

f =∑

cj,k,ψψj,k∈Λ

cj,k,ψψj,k

with no more than N terms in Λ and we want to minimize

‖f − f‖L2(I) =

(∑

cj,k,ψψj,k 6∈Λ

|cj,k,ψ|2)1/2

then we should put into Λ the N terms cj,k,ψψj,k with the largest values of |cj,k,ψ|; we call

that approximation fN . (Break ties in an arbitrary manner.)

If we sort cj,k,ψψj,k in nonincreasing order of |cj,k,ψ|, and call the resulting sequence

ci = |cj,k,ψ|, ci ≥ ci+1,25

Page 26: Wavelets and Approximation Theory

then

‖f − fN‖L2(I) =

(∑

N<i

c2i

)1/2

.

We now want to find an equivalence between the rate of decay of ‖f − fN‖L2(I) as

N → ∞ and the smoothness of f in Bαq (Lq(I)), if in fact

‖f‖Bαq (Lq(I)) ≍(∑

j,k,ψ

|cj,k,ψ|q)1/q

=

(∑

i

cqi

)1/q

<∞.

A simple bound on ‖f − fN‖L2(I) when f ∈ Bαq (Lq(I)), 1/q = α/2 + 1/2, can be

obtained as follows.

We choose an ǫ > 0 and ask “How many coefficients can satisfy |cj,k,ψ| > ǫ?” If we

denote this number by N , then we must have

(Nǫq)1/q ≤(

|cj,k,ψ|>ǫ|cj,k,ψ|q

)1/q

≤(∑

j,k,ψ

|cj,k,ψ|q)1/q

≤ C‖f‖Bαq (Lq(I)),

so

N ≤ Cǫ−q‖f‖qBαq (Lq(I)) and ǫ < CN−1/q‖f‖Bαq (Lq(I)).

We put these N coefficients into Λ, the resulting error is

‖f − fN‖L2(I) =

(∑

ci≤ǫc2i

)1/2

≤ supci≤ǫ

c(2−q)/2i

(∑

ci≤ǫcqi

)1/2

≤ ǫ(2−q)/2(∑

cqi

)1/2

≤ C(N−1/q‖f‖Bαq (Lq(I)))(2−q)/2‖f‖q/2Bαq (Lq(I))

= CN−(2−q)/(2q)‖f‖Bαq (Lq(I))= CN−α/2‖f‖Bαq (Lq(I))

since1

q− 1

2=α

2.

We can rewrite this as

Nα/2‖f − fN‖L2(I) ≤ C‖f‖Bαq (Lq(I)),26

Page 27: Wavelets and Approximation Theory

or

(22) ‖Nα/2‖f − fN‖L2(I)‖∞ ≤ C‖f‖Bαq (Lq(I)).

In fact, we have the more subtle estimate

(23)

(∑

N≥1

[Nα/2‖f − fN−1‖L2(I)]q 1

N

)1/q

=

(∑

N≥1

[

Nα/2

( ∞∑

i=N

c2i

)1/2]q1

N

)1/q

≍( ∞∑

i=1

cqi

)1/q

≍ ‖f‖Bαq (Lq(I)),

which we prove using a lemma from DeVore and Temlyakov, Advances in Computational

Math, Vol 5, 1996, 173–197. I’m responsible for any errors in translation. Note that (23)

implies (22) since∑

N 1/N diverges. Let ak be a non-negative, non-increasing sequence,

σ2m =

∞∑

k=m

a2k,

and1

q=α

2+

1

2.

Then we have

a2m ≤ a2m−1 ≤(

1

m

2m−1∑

k=m

a2k

)1/2

≤ 1

m1/2σm.

So ∞∑

m=1

aqm ≤ 2

∞∑

m=1

1

mq/2σqm = 2

∞∑

m=1

mαq/2σqm1

m

or( ∞∑

m=1

aqm

)1/q

≤ 21/q( ∞∑

m=1

[mα/2σm]q 1

m

)1/q

.

In the other direction,

σ2m =

( ∞∑

k=2m

a2k

)1/2

≤( ∞∑

k=m

2ka22k

)1/2

since ak ≤ a2m for the 2m terms k = 2m, . . . , 2m+1 − 1

≤( ∞∑

k=m

2kq/2aq2k

)1/q

27

Page 28: Wavelets and Approximation Theory

since q < 2.

Thus,

∞∑

m=0

2mαq/2σq2m ≤∞∑

m=0

2mαq/2∞∑

k=m

2kq/2aq2k

=

∞∑

k=0

2kq/2k∑

m=0

2mαq/2aq2k

(change order of summation)

=∞∑

k=0

2kq/22(k+1)αq/2 − 1

2αq/2 − 1aq2k

(geometric series)

≤ 2αq/2

2αq/2 − 1

∞∑

k=0

2kq/22αkq/2aq2k

=2αq/2

2αq/2 − 1

∞∑

k=0

2kaq2k, since q/2 + αq/2 = 1,

≤ 2αq/2+1

2αq/2 − 1

∞∑

j=1

aqj

since a2k ≤ aj for the 2k−1 terms with j = 2k−1 + 1, . . . , 2k. We also have

σ2m ≥ σj ≥ σ2m+1 and 2mαq/2 ≤ jαq/2 < 2(m+1)αq/2

for the 2m terms with 2m ≤ j ≤ 2m+1 − 1. So

2m+1−1∑

j=2m

jαq/2σqj1

j≤ 2(m+1)αqσq2m

1

2m2m ≤ 2αq/22mαq/2σq2m .

Combining these inequalities gives us

∞∑

j=1

[jα/2σj]q 1

j≤ 2αq/2

∞∑

m=0

2mαq/2σq2m ≤ 2αq+1

2αq/2 − 1

∞∑

j=1

aqj ,

which is what we needed.

“What is a wavelet?”

In the course of presenting these notes I was asked “What is a wavelet?”.

There are many people who can give a better answer than I can, but I’ll give my

perspective.28

Page 29: Wavelets and Approximation Theory

I prefer the question “What is a wavelet transform?” The reader should realize that

the rest of this section is meta-mathematics, not mathematics.

Let φ : R2 → R be a refinable function, that is, there are finitely many coefficients aj,

j ∈ Z2, for which

φ(x) =∑

j

ajφ(2x− j).

For each k ≥ 0, let Sk be the span of the functions

φjk(x) = φ(2kx− j), j ∈ Z2,

whose support intersects with the unit interval I nontrivially.

Assume that there are (possibly nonlinear) projectors Pk : Lp(I) → Sk for some range

of p and a positive constant C such that for some r > 0 and all k ≥ 0 and all f ∈ Lp(I),

(24) ‖Pkf − f‖Lp(I) ≤ Cωr(f, 2−k)p.

Assume also that there is some β > 0 and C > 0 such that for all S ∈ Sk,

(25) ωr(S, t)p ≤ C

‖S‖Lp(I), 2−k ≤ t,

2kβ/ptβ/p‖S‖Lp(I), 0 < t ≤ 2−k.

In other words, Jackson (24) and Bernstein (25) inequalities hold for the spaces Sk.

In this case, the series

f =∑

k≥0

(Pkf − Pk−1f)

converges in Lp(I) and Theorem 1 will hold for some range of α and p.

Conclusion: This combination of a family of subspaces Sk, generated by the dyadic

dilates and translates of a function φ, together with specific projectors Pk such that Jackson

(24) and Bernstein (25) inequalities hold is a wavelet transform.

In most cases people construct linear projectors Pk such that that Sk is the direct sum

of Sk−1 and the range of Pk − Pk−1, which we’ll denote by W k−1. So

Sk = Sk−1 ⊕W k−1.

Because φ is refinable, the space W k−1 is generated by 2d−1 functions ψ ∈ Ψ, where we’re

working in Rd (d = 2 up until now). If the functions φ(·− j), j ∈ Z

2, are mutually orthog-

onal, then the functions ψ(· − j), j ∈ Z2, ψ ∈ Ψ can be taken to be mutually orthogonal.

This is the situation for the Haar transform, for which φ = χI , the characteristic function

of I, and Pk is the L2(I) projection onto Sk.29

Page 30: Wavelets and Approximation Theory

If Pk is the median transform, however, then it’s not hard to see that the range of

Pk − Pk−1 is Sk itself. Nonetheless, with this specific way of calculating coefficients,

Jackson and Bernstein inequalities hold, and we consider this a wavelet transform.

This definition is at the very least imprecise and most likely incorrect in important

ways. But it is general enough to allow the notion of nonlinear wavelet transforms, binary

wavelet transforms (apply median projectors to functions f whose ranges take only the

values 0 and 1), integer-to-integer wavelet transforms (see Devore-Jawerth-Lucier, “Image

compression through wavelet transform coding”), etc.

30