Transcript of "Harmonic Analysis of Deep Convolutional Neural Networks"

Page 1: Title slide

Harmonic Analysis of Deep Convolutional Neural Networks

Helmut Bölcskei

Department of Information Technology and Electrical Engineering

October 2017

joint work with Thomas Wiatowski and Philipp Grohs

Page 2: ImageNet

ImageNet

Page 3: ImageNet

[Image examples with predicted labels: ski, rock, coffee, plant]

CNNs win the ImageNet 2015 challenge [He et al., 2015]


Page 5: Describing the content of an image

CNNs generate sentences describing the content of an image [Vinyals et al., 2015]

"Carlos Kleiber conducting the Vienna Philharmonic's New Year's Concert 1989."


Page 12: Feature extraction and classification

input: f → non-linear feature extraction → feature vector Φ(f) → linear classifier

output: 〈w, Φ(f)〉 > 0 ⇒ Shannon;  〈w, Φ(f)〉 < 0 ⇒ von Neumann

Page 13: Why non-linear feature extractors?

Task: Separate two categories of data through a linear classifier

Directly on the data f: deciding via 〈w, f〉 > 0 vs. 〈w, f〉 < 0 is not possible!

On the features Φ(f) = [‖f‖, 1]ᵀ: deciding via 〈w, Φ(f)〉 > 0 vs. 〈w, Φ(f)〉 < 0 is possible with w = [1, −1]ᵀ


Page 16: Why non-linear feature extractors?

Task: Separate two categories of data through a linear classifier

Φ(f) = [‖f‖, 1]ᵀ

⇒ Φ is invariant to the angular component of the data
⇒ Linear separability in feature space!

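The toy example above can be checked numerically. A minimal sketch (the class radii 0.5 and 1.5 and the sample counts are illustrative choices, not from the slides): two radially separated point clouds cannot be split by any linear rule 〈w, f〉, but after the feature map Φ(f) = [‖f‖, 1]ᵀ the single weight vector w = [1, −1]ᵀ separates them perfectly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Class A: points with radius < 0.5; class B: points with radius > 1.5.
angles = rng.uniform(0, 2 * np.pi, 200)
radii_a = rng.uniform(0.0, 0.5, 100)
radii_b = rng.uniform(1.5, 2.0, 100)
class_a = np.c_[radii_a * np.cos(angles[:100]), radii_a * np.sin(angles[:100])]
class_b = np.c_[radii_b * np.cos(angles[100:]), radii_b * np.sin(angles[100:])]

def phi(f):
    # Non-linear feature map from the slide: Phi(f) = [||f||, 1]
    return np.array([np.linalg.norm(f), 1.0])

w = np.array([1.0, -1.0])  # decision rule <w, Phi(f)> = ||f|| - 1

scores_a = np.array([w @ phi(f) for f in class_a])
scores_b = np.array([w @ phi(f) for f in class_b])
print(np.all(scores_a < 0), np.all(scores_b > 0))  # → True True
```

The classifier in feature space thresholds the radius ‖f‖ at 1, which no linear functional of the raw coordinates can do.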

Page 18: Translation invariance

Handwritten digits from the MNIST database [LeCun & Cortes, 1998]

Feature vector should be invariant to spatial location ⇒ translation invariance

Page 19: Deformation insensitivity

Feature vector should be independent of cameras (of different resolutions), and insensitive to small acquisition jitters

Page 20: Scattering networks ([Mallat, 2012], [Wiatowski and HB, 2015])

The feature maps form a tree: each layer convolves the previous layer's outputs with filters gλ and applies a modulus, e.g.

f → |f ∗ gλ1^(k)| → ||f ∗ gλ1^(k)| ∗ gλ2^(l)|,   f → |f ∗ gλ1^(p)| → ||f ∗ gλ1^(p)| ∗ gλ2^(r)|

The feature vector Φ(f) collects the low-pass-filtered outputs · ∗ χn of all layers:

f ∗ χ1,   |f ∗ gλ1^(k)| ∗ χ2,   ||f ∗ gλ1^(k)| ∗ gλ2^(l)| ∗ χ3,   |f ∗ gλ1^(p)| ∗ χ2,   ||f ∗ gλ1^(p)| ∗ gλ2^(r)| ∗ χ3

General scattering networks guarantee [Wiatowski & HB, 2015]
- (vertical) translation invariance
- small deformation sensitivity
essentially irrespective of filters, non-linearities, and poolings!
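The tree structure above can be sketched in a few lines. This is a hedged, minimal 1-D illustration, not the construction from the talk: it assumes Gaussian-windowed complex exponentials as the band-pass filters gλ and a Gaussian low-pass χ, with the modulus as non-linearity and no pooling.

```python
import numpy as np

def gaussian(n, sigma):
    x = np.arange(n) - n // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def bandpass(n, sigma, xi):
    # Illustrative g_lambda: complex exponential under a Gaussian window.
    x = np.arange(n) - n // 2
    g = np.exp(-x**2 / (2 * sigma**2)) * np.exp(1j * xi * x)
    return g / np.abs(g).sum()

def conv(f, g):
    return np.convolve(f, g, mode="same")

def scattering(f, filters, lowpass, depth=2):
    """Collect the low-pass outputs |...|f*g|...| * chi of every tree node."""
    features = [conv(f, lowpass)]              # layer-0 output: f * chi_1
    layer = [f]
    for _ in range(depth):
        next_layer = []
        for u in layer:
            for g in filters:
                v = np.abs(conv(u, g))         # propagate |u * g_lambda|
                next_layer.append(v)
                features.append(conv(v, lowpass))  # output |u * g_lambda| * chi
        layer = next_layer
    return np.concatenate(features)

n = 256
f = np.sin(2 * np.pi * 12 * np.arange(n) / n)
filters = [bandpass(33, 4.0, xi) for xi in (0.5, 1.0, 2.0)]
chi = gaussian(33, 8.0)
phi = scattering(f, filters, chi)
print(phi.shape)  # → (3328,): 1 + 3 + 9 = 13 tree nodes, each of length 256
```

With 3 filters and depth 2 the tree has 13 output nodes, matching the branching sketched on the slide.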

Page 23: Building blocks

Basic operations in the n-th network layer: each branch convolves the input with a filter gλn^(k), then applies a non-linearity and pooling.

Filters: Semi-discrete frame Ψn := {χn} ∪ {gλn}λn∈Λn, i.e.,

An‖f‖₂² ≤ ‖f ∗ χn‖₂² + Σλn∈Λn ‖f ∗ gλn‖₂² ≤ Bn‖f‖₂², ∀f ∈ L2(Rd)

e.g.: structured filters, unstructured filters, or learned filters
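The frame condition can be verified numerically for a concrete filter bank. A minimal discrete sketch (the specific frequency bands are illustrative): build transfer functions whose squared moduli partition the spectrum, so that An = Bn = 1 and the branch energies exactly sum to the input energy, as Parseval demands.

```python
import numpy as np

n = 512
freqs = np.fft.fftfreq(n)  # normalized frequencies in [-0.5, 0.5)

# Low-pass chi covering |freq| < 0.1 and band-pass g's covering the rest,
# chosen so that chi_hat^2 + sum g_hat^2 == 1 (a Parseval frame, A = B = 1).
bands = [(0.1, 0.25), (0.25, 0.5001)]
g_hats = [((np.abs(freqs) >= lo) & (np.abs(freqs) < hi)).astype(float)
          for lo, hi in bands]
chi_hat = (np.abs(freqs) < 0.1).astype(float)
assert np.allclose(chi_hat**2 + sum(g**2 for g in g_hats), 1.0)

rng = np.random.default_rng(1)
f = rng.standard_normal(n)
F = np.fft.fft(f)

# Frame condition: ||f * chi||^2 + sum ||f * g||^2 equals ||f||^2 here.
energy = np.sum(np.abs(np.fft.ifft(F * chi_hat))**2)
energy += sum(np.sum(np.abs(np.fft.ifft(F * g))**2) for g in g_hats)
print(np.isclose(energy, np.sum(f**2)))  # → True
```

For general (non-Parseval) filter banks the same computation yields empirical bounds An and Bn as the min and max of the energy ratio over test signals.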

Page 27: Building blocks

Non-linearities Mn: point-wise and Lipschitz-continuous

‖Mn(f) − Mn(h)‖₂ ≤ Ln‖f − h‖₂, ∀f, h ∈ L2(Rd)

⇒ Satisfied by virtually all non-linearities used in the deep learning literature!

ReLU: Ln = 1; modulus: Ln = 1; logistic sigmoid: Ln = 1/4; ...
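The quoted Lipschitz constants can be sanity-checked empirically. A minimal sketch: for random pairs of vectors, the ratio ‖Mn(f) − Mn(h)‖₂ / ‖f − h‖₂ never exceeds the constant Ln stated on the slide (1 for ReLU and modulus, 1/4 for the logistic sigmoid, whose derivative peaks at 1/4).

```python
import numpy as np

rng = np.random.default_rng(0)

def lipschitz_ratio(m, f, h):
    return np.linalg.norm(m(f) - m(h)) / np.linalg.norm(f - h)

relu = lambda x: np.maximum(x, 0.0)
modulus = np.abs
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

worst = {"relu": 0.0, "modulus": 0.0, "sigmoid": 0.0}
for _ in range(1000):
    f, h = rng.standard_normal(64), rng.standard_normal(64)
    worst["relu"] = max(worst["relu"], lipschitz_ratio(relu, f, h))
    worst["modulus"] = max(worst["modulus"], lipschitz_ratio(modulus, f, h))
    worst["sigmoid"] = max(worst["sigmoid"], lipschitz_ratio(sigmoid, f, h))

# Empirical ratios stay below the slide's constants L_n.
print(worst["relu"] <= 1.0, worst["modulus"] <= 1.0, worst["sigmoid"] <= 0.25)
# → True True True
```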

Page 29: Building blocks

Pooling: in continuous time according to

f ↦ Sn^(d/2) Pn(f)(Sn·),

where Sn ≥ 1 is the pooling factor and Pn : L2(Rd) → L2(Rd) is Rn-Lipschitz-continuous

⇒ Emulates most poolings used in the deep learning literature!

e.g.: pooling by sub-sampling, Pn(f) = f with Rn = 1
e.g.: pooling by averaging, Pn(f) = f ∗ φn with Rn = ‖φn‖₁
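A hedged discrete analogue of the two examples, for d = 1 (the window φn and pooling factor S below are illustrative choices): sub-sampling keeps every S-th sample, averaging first convolves with φn, and both carry the normalization S^(d/2) = √S from the continuous-time definition.

```python
import numpy as np

def pool_subsample(f, S):
    # f -> S^(1/2) * f(S·): sub-sampling (P_n = identity, R_n = 1), d = 1.
    return np.sqrt(S) * f[::S]

def pool_average(f, S, phi):
    # f -> S^(1/2) * (f * phi)(S·): averaging, R_n = ||phi||_1.
    return np.sqrt(S) * np.convolve(f, phi, mode="same")[::S]

f = np.sin(2 * np.pi * 4 * np.arange(64) / 64)
phi = np.ones(4) / 4.0   # averaging window with ||phi||_1 = 1
S = 4

out_sub = pool_subsample(f, S)
out_avg = pool_average(f, S, phi)
print(out_sub.shape, out_avg.shape)  # → (16,) (16,)
```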

Page 32: Vertical translation invariance

Theorem (Wiatowski and HB, 2015)
Assume that the filters, non-linearities, and poolings satisfy

Bn ≤ min{1, Ln⁻²Rn⁻²}, ∀n ∈ N.

Let the pooling factors be Sn ≥ 1, n ∈ N. Then,

|||Φn(Ttf) − Φn(f)||| = O(‖t‖ / (S1 · · · Sn)),

for all f ∈ L2(Rd), t ∈ Rd, n ∈ N.

⇒ Features become more invariant with increasing network depth!

Full translation invariance: If limn→∞ S1 · S2 · . . . · Sn = ∞, then

limn→∞ |||Φn(Ttf) − Φn(f)||| = 0

The condition Bn ≤ min{1, Ln⁻²Rn⁻²} is easily satisfied by normalizing the filters {gλn}λn∈Λn.

⇒ The result applies to general filters, non-linearities, and poolings.

Page 37: Philosophy behind invariance results

Mallat's "horizontal" translation invariance [Mallat, 2012]:

limJ→∞ |||ΦW(Ttf) − ΦW(f)||| = 0, ∀f ∈ L2(Rd), ∀t ∈ Rd

- features become invariant in every network layer, but needs J → ∞
- applies to wavelet transform and modulus non-linearity without pooling

"Vertical" translation invariance:

limn→∞ |||Φn(Ttf) − Φn(f)||| = 0, ∀f ∈ L2(Rd), ∀t ∈ Rd

- features become more invariant with increasing network depth
- applies to general filters, general non-linearities, and general poolings
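As a toy reference point for what "fully translation-invariant features" means (this is not the scattering construction itself, just the classical Fourier example): the moduli of the Fourier coefficients of a signal are exactly invariant to circular shifts, since a shift only multiplies each coefficient by a unit-modulus phase.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(128)
t = 17
f_shifted = np.roll(f, t)  # circular translation T_t f

# |fft| is exactly shift-invariant: shifting multiplies the k-th Fourier
# coefficient by exp(-2*pi*i*k*t/n), which the modulus removes.
phi = np.abs(np.fft.fft(f))
phi_shifted = np.abs(np.fft.fft(f_shifted))
print(np.allclose(phi, phi_shifted))  # → True
```

The Fourier modulus achieves invariance in one shot but is unstable to deformations; the scattering results above trade this for invariance that accrues gradually with depth while retaining deformation sensitivity bounds.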

Page 40: Non-linear deformations

Non-linear deformation: (Fτf)(x) = f(x − τ(x)), where τ : Rd → Rd

For "small" τ the deformed image remains close to the original; for "large" τ it does not.

Page 42: Deformation sensitivity for signal classes

Consider (Fτf)(x) = f(x − τ(x)) = f(x − e^(−x²))

[Plots of f1(x) vs. (Fτf1)(x) and of f2(x) vs. (Fτf2)(x)]

For given τ the amount of deformation induced can depend drastically on f ∈ L2(Rd)
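This dependence on the signal is easy to see numerically. A minimal sketch using the slide's deformation τ(x) = e^(−x²) (the sinusoid frequencies 1 and 20 are illustrative choices): the same τ barely changes a slowly varying signal but changes a rapidly oscillating one substantially, measured in relative L² norm.

```python
import numpy as np

x = np.linspace(-5, 5, 4001)
tau = np.exp(-x**2)   # the slide's deformation tau(x) = e^(-x^2)

def rel_deformation(freq):
    f = np.sin(freq * x)
    f_def = np.sin(freq * (x - tau))   # (F_tau f)(x) = f(x - tau(x))
    return np.linalg.norm(f_def - f) / np.linalg.norm(f)

low, high = rel_deformation(1.0), rel_deformation(20.0)
print(low < high)  # → True
```

This is why the sensitivity bounds on the following pages are stated per signal class: the constant in front of ‖τ‖∞^α must grow with, e.g., the bandwidth of the class.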

Page 43: Philosophy behind deformation stability/sensitivity bounds

Mallat's deformation stability bound [Mallat, 2012]:

|||ΦW(Fτf) − ΦW(f)||| ≤ C(2⁻ᴶ‖τ‖∞ + J‖Dτ‖∞ + ‖D²τ‖∞)‖f‖W,

for all f ∈ HW ⊆ L2(Rd)

- The signal class HW and the corresponding norm ‖·‖W depend on the mother wavelet (and hence the network)
- Signal class description complexity is implicit via the norm ‖·‖W
- The bound depends explicitly on higher-order derivatives of τ
- The bound is coupled to horizontal translation invariance: limJ→∞ |||ΦW(Ttf) − ΦW(f)||| = 0, ∀f ∈ L2(Rd), ∀t ∈ Rd

Our deformation sensitivity bound:

|||Φ(Fτf) − Φ(f)||| ≤ CC‖τ‖∞^α, ∀f ∈ C ⊆ L2(Rd)

- The signal class C (band-limited functions, cartoon functions, or Lipschitz functions) is independent of the network
- Signal class description complexity is explicit via CC: L-band-limited functions: CC = O(L); cartoon functions of size K: CC = O(K^(3/2)); M-Lipschitz functions: CC = O(M)
- The decay rate α > 0 of the deformation error is signal-class-specific (band-limited functions: α = 1, cartoon functions: α = 1/2, Lipschitz functions: α = 1)
- The bound depends on the derivative of τ only implicitly, via the condition ‖Dτ‖∞ ≤ 1/(2d)
- The bound is decoupled from vertical translation invariance: limn→∞ |||Φn(Ttf) − Φn(f)||| = 0, ∀f ∈ L2(Rd), ∀t ∈ Rd

Page 48: CNNs in a nutshell

CNNs used in practice employ potentially hundreds of layers and 10,000s of nodes!

e.g.: Winner of the ImageNet 2015 challenge [He et al., 2015]

- Network depth: 152 layers
- Average # of nodes per layer: 472
- # of FLOPs for a single forward pass: 11.3 billion

Such depths (and breadths) pose formidable computational challenges in training and operating the network!

Page 51: Topology reduction

- Determine how fast the energy contained in the propagated signals (a.k.a. feature maps) decays across layers
- Guarantee a trivial null-space for the feature extractor Φ
- Specify the number of layers needed to have "most" of the input signal energy be contained in the feature vector
- For a fixed (possibly small) depth, design CNNs that capture "most" of the input signal energy

Page 55: Building blocks

Basic operations in the n-th network layer: each branch convolves with a filter gλn^(k), applies the modulus, and sub-samples by S.

Filters: Semi-discrete frame Ψn := {χn} ∪ {gλn}λn∈Λn
Non-linearity: Modulus |·|
Pooling: Sub-sampling with pooling factor S ≥ 1

Page 56: Demodulation effect of modulus non-linearity

Components of the feature vector are given by |f ∗ gλn| ∗ χn+1

[Spectral sketch: gλn(ω) is a band-pass filter, χn+1(ω) a low-pass filter; the product f(ω) · gλn(ω) is concentrated in the pass-band of gλn]

The modulus moves the spectrum towards low frequencies: (|f ∗ gλn|)^(ω) is concentrated around ω = 0 and is hence captured in Φ(f) via the low-pass filter χn+1.

Modulus squared: the Fourier transform of |f ∗ gλn(x)|² is the autocorrelation R of f̂ · ĝλn, again concentrated around ω = 0.
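The demodulation effect can be observed directly in a discrete experiment. A hedged sketch (the band location and cutoff are illustrative): filter white noise with an analytic band-pass filter, then compare how much spectral energy sits at low frequencies before the modulus, after the modulus, and after a ReLU applied to the real part.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
f = rng.standard_normal(n)

# Analytic band-pass filter: keep only positive-frequency bins 300..399.
mask = np.zeros(n)
mask[300:400] = 1.0
u = np.fft.ifft(np.fft.fft(f) * mask)   # f * g_lambda, complex-valued

def lowfreq_fraction(v, cutoff=100):
    # Fraction of spectral energy in the low band |k| < cutoff.
    spec = np.abs(np.fft.fft(v))**2
    return (spec[:cutoff].sum() + spec[-cutoff:].sum()) / spec.sum()

frac_bandpass = lowfreq_fraction(u)                    # ~0: energy in the band
frac_modulus = lowfreq_fraction(np.abs(u))             # large: |.| demodulates
frac_relu = lowfreq_fraction(np.maximum(u.real, 0.0))  # ReLU keeps the band

print(frac_bandpass < 1e-10, frac_modulus > 0.8, frac_relu < frac_modulus)
```

The modulus of the analytic band-pass output is its envelope, whose spectrum concentrates near ω = 0; the ReLU output retains a copy of the original high-frequency band, matching the "No!" on the slide below.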

Page 60: Do all non-linearities demodulate?

High-pass filtered signal: F(f ∗ gλ) is a band of width 2R located outside (−2R, 2R).

Modulus: Yes! |F(|f ∗ gλ|)| is concentrated around ω = 0 ... but with (small) tails!

Modulus squared: Yes, and sharply so! |F(|f ∗ gλ|²)| is supported in (−2R, 2R) ... but the squared modulus is not Lipschitz-continuous!

Rectified linear unit: No! |F(ReLU(f ∗ gλ))| retains energy in the original high-frequency band.

Page 64: First goal: Quantify feature map energy decay

Denote by Wn(f) the energy of the feature maps propagated to the n-th layer of the tree:

W1(f): |f ∗ gλ1^(k)|, ..., |f ∗ gλ1^(p)|
W2(f): ||f ∗ gλ1^(k)| ∗ gλ2^(l)|, ..., ||f ∗ gλ1^(p)| ∗ gλ2^(r)|

(the low-pass outputs · ∗ χn are, as before, collected in the feature vector Φ(f))

Page 65: Assumptions (on the filters)

i) Analyticity: For every filter gλn there exists a (not necessarily canonical) orthant Hλn ⊆ Rd such that

supp(gλn) ⊆ Hλn.

ii) High-pass: There exists δ > 0 such that

Σλn∈Λn |gλn(ω)|² = 0, a.e. ω ∈ Bδ(0).

⇒ Comprises various constructions of WH filters, wavelets, ridgelets, (α)-curvelets, shearlets

e.g.: analytic band-limited curvelets [frequency-plane sketch over (ω1, ω2)]

Page 67: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Input signal classes

Sobolev functions of order s ≥ 0:

Hs(Rd) ={f ∈ L2(Rd)

∣∣∣ ∫Rd

(1 + |ω|2)s|f(ω)|2dω <∞}

Hs(Rd) contains a wide range of practically relevant signal classes

- square-integrable functions L2(Rd) = H0(Rd)- L-band-limited functions L2

L(Rd) ⊆ Hs(Rd), ∀L > 0, ∀s ≥ 0

- cartoon functions [Donoho, 2001 ] CCART ⊆ Hs(Rd), ∀s ∈ [0, 12)

Handwritten digits from MNIST database [LeCun & Cortes, 1998 ]

Page 72: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Exponential energy decay

Theorem

Let the filters be wavelets with mother wavelet

supp(ψ̂) ⊆ [r^{−1}, r], r > 1,

or Weyl-Heisenberg (WH) filters with prototype function

supp(ĝ) ⊆ [−R, R], R > 0.

Then, for every f ∈ Hs(Rd), there exists β > 0 such that

Wn(f) = O(a^{−n(2s+β)/(2s+β+1)}),

where a = (r² + 1)/(r² − 1) in the wavelet case, and a = 1/2 + 1/R in the WH case.

⇒ decay factor a is explicit and can be tuned via r,R
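To make the decay factor concrete, a minimal sketch (the choices r = 2, R = 1 and the Sobolev parameters s, β below are illustrative, not from the theorem):

```python
# decay factors from the theorem, for illustrative parameters
r, R = 2.0, 1.0
a_wavelet = (r**2 + 1) / (r**2 - 1)   # (r^2 + 1)/(r^2 - 1) = 5/3
a_wh = 0.5 + 1.0 / R                  # 1/2 + 1/R = 3/2

# per-layer shrinkage of the bound W_n(f) = O(a**(-n*(2s+β)/(2s+β+1)))
s, beta = 1.0, 1.0                    # hypothetical smoothness parameters
exponent = (2 * s + beta) / (2 * s + beta + 1)   # = 3/4
ratio = a_wavelet ** (-exponent)      # bound shrinks by this factor per layer
print(a_wavelet, a_wh, round(ratio, 3))   # 1.666..., 1.5, 0.682
```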

Page 74: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Exponential energy decay

Exponential energy decay:

Wn(f) = O(a^{−n(2s+β)/(2s+β+1)})

- β > 0 determines the decay of f̂(ω) (as |ω| → ∞) according to

|f̂(ω)| ≤ µ (1 + |ω|²)^{−(s/2 + 1/4 + β/4)}, ∀ |ω| ≥ L,

for some µ > 0, and L acts as an “effective bandwidth”

- smoother input signals (i.e., s↑) lead to faster energy decay

- pooling through sub-sampling f ↦ S^{1/2} f(S·) leads to decay factor a/S

What about general filters? ⇒ polynomial energy decay!

Page 80: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

... our second goal ... trivial null-space for Φ

Why trivial null-space?

[Figure: feature space separated by a hyperplane with normal w into regions 〈w,Φ(f)〉 > 0 and 〈w,Φ(f)〉 < 0; Φ(f∗) lies on the separating hyperplane]

Non-trivial null-space: ∃ f∗ ≠ 0 such that Φ(f∗) = 0

⇒ 〈w,Φ(f∗)〉 = 0 for all w !

⇒ these f∗ become unclassifiable!

Page 81: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

... our second goal ...

Trivial null-space for feature extractor: { f ∈ L2(Rd) | Φ(f) = 0 } = { 0 }

Feature extractor Φ(·) = ⋃_{n=0}^∞ Φn(·) shall satisfy

A‖f‖₂² ≤ |||Φ(f)|||² ≤ B‖f‖₂², ∀f ∈ L2(Rd),

for some A, B > 0.

Page 82: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

“Energy conservation”

Theorem

For the frame lower bounds {An}n∈N and upper bounds {Bn}n∈N, define A := ∏_{n=1}^∞ min{1, An} and B := ∏_{n=1}^∞ max{1, Bn}. If

0 < A ≤ B < ∞,

then

A‖f‖₂² ≤ |||Φ(f)|||² ≤ B‖f‖₂², ∀ f ∈ L2(Rd).

- For Parseval frames (i.e., An = Bn = 1, n ∈ N), this yields

|||Φ(f)|||² = ‖f‖₂²

- Connection to energy decay:

‖f‖₂² = ∑_{k=0}^{n−1} |||Φk(f)|||² + Wn(f), where Wn(f) → 0
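A truncated numerical version of the constants A and B (the per-layer frame bounds An, Bn below are hypothetical; the theorem's products run over all n):

```python
from math import prod

An = [0.9, 1.0, 1.0, 0.95]   # hypothetical per-layer lower frame bounds
Bn = [1.1, 1.0, 1.0, 1.05]   # hypothetical per-layer upper frame bounds

A = prod(min(1.0, a) for a in An)   # ≈ 0.855
B = prod(max(1.0, b) for b in Bn)   # ≈ 1.155
assert 0 < A <= B < float("inf")    # the theorem's condition holds

# Parseval case: A_n = B_n = 1 for all n gives A = B = 1 (energy conservation)
assert prod(min(1.0, 1.0) for _ in range(10)) == 1.0
```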

Page 85: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

... and our third goal ...

For a given CNN, specify the number of layers needed to capture “most” of the input signal energy

How many layers n are needed to have at least ((1 − ε) · 100)% of the input signal energy be contained in the feature vector, i.e.,

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^{n} |||Φk(f)|||² ≤ ‖f‖₂², ∀f ∈ L2(Rd).

Page 88: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Number of layers needed

Theorem

Let the frame bounds satisfy An = Bn = 1, n ∈ N. Let the input signal f be L-band-limited, and let ε ∈ (0, 1). If

n ≥ ⌈ log_a( L / (1 − √(1 − ε)) ) ⌉,

then

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^{n} |||Φk(f)|||² ≤ ‖f‖₂².

⇒ also guarantees trivial null-space for ⋃_{k=0}^{n} Φk(f)

- lower bound depends on

- description complexity of input signals (i.e., bandwidth L)

- decay factor (wavelets a = (r² + 1)/(r² − 1), WH filters a = 1/2 + 1/R)

- similar estimates for Sobolev input signals and for general filters (polynomial decay!)

Page 91: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Number of layers needed

Numerical example for bandwidth L = 1:

(1 − ε)              0.25   0.5   0.75   0.9   0.95   0.99
wavelets (r = 2)       2     3     4      6     8      11
WH filters (R = 1)     2     4     5      8    10      14
general filters        2     3     7     19    39     199

Recall: Winner of the ImageNet 2015 challenge [He et al., 2015 ]

- Network depth: 152 layers

- average # of nodes per layer: 472

- # of FLOPS for a single forward pass: 11.3 billion
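The wavelet and WH rows of the table above follow from the depth bound n ≥ ⌈log_a(L/(1 − √(1 − ε)))⌉; a minimal sketch (with the table's parameter choices r = 2, R = 1, L = 1):

```python
import math

def min_layers(L, eps, a):
    # smallest n with n >= ceil(log_a(L / (1 - sqrt(1 - eps))))
    return math.ceil(math.log(L / (1.0 - math.sqrt(1.0 - eps)), a))

a_wavelet = (2**2 + 1) / (2**2 - 1)   # a = (r^2+1)/(r^2-1) with r = 2, i.e. 5/3
a_wh = 0.5 + 1.0 / 1.0                # a = 1/2 + 1/R with R = 1, i.e. 3/2

fracs = [0.25, 0.5, 0.75, 0.9, 0.95, 0.99]   # captured energy fraction (1 - eps)
print([min_layers(1.0, 1 - f, a_wavelet) for f in fracs])  # [2, 3, 4, 6, 8, 11]
print([min_layers(1.0, 1 - f, a_wh) for f in fracs])       # [2, 4, 5, 8, 10, 14]
```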

Page 94: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

... our fourth and last goal ...

For a fixed (possibly small) depth N, design scattering networks that capture “most” of the input signal energy

Recall: Let the filters be wavelets with mother wavelet

supp(ψ̂) ⊆ [r^{−1}, r], r > 1,

or Weyl-Heisenberg filters with prototype function

supp(ĝ) ⊆ [−R, R], R > 0.

For fixed depth N, want to choose r in the wavelet case and R in the WH case so that

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^{N} |||Φk(f)|||² ≤ ‖f‖₂², ∀f ∈ L2(Rd).

Page 97: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Depth-constrained networks

Theorem

Let the frame bounds satisfy An = Bn = 1, n ∈ N. Let the input signal f be L-band-limited, and fix ε ∈ (0, 1) and N ∈ N. If, in the wavelet case,

1 < r ≤ √( (κ + 1)/(κ − 1) ),

or, in the WH case,

0 < R ≤ (κ − 1/2)^{−1},

where κ := ( L / (1 − √(1 − ε)) )^{1/N}, then

(1 − ε)‖f‖₂² ≤ ∑_{k=0}^{N} |||Φk(f)|||² ≤ ‖f‖₂².
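A sketch of the resulting design rule, reading the WH bound as R ≤ (κ − 1/2)^{-1}, which is consistent with a = 1/2 + 1/R; the values L = 1, ε = 0.01, N = 3 are illustrative:

```python
import math

def design(L, eps, N):
    # largest admissible r (wavelet) and R (WH) for depth N
    kappa = (L / (1 - math.sqrt(1 - eps))) ** (1 / N)
    r_max = math.sqrt((kappa + 1) / (kappa - 1))
    R_max = 1 / (kappa - 0.5)
    return kappa, r_max, R_max

kappa, r_max, R_max = design(L=1, eps=0.01, N=3)
# sanity check: both resulting decay factors satisfy a >= kappa,
# so a**(-N) reaches the required energy fraction within N layers
assert (r_max**2 + 1) / (r_max**2 - 1) >= kappa - 1e-9
assert 0.5 + 1 / R_max >= kappa - 1e-9
```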

Page 98: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Depth-width tradeoff

Spectral supports of wavelet filters:

[Figure: frequency axis ω with the spectral support of the mother wavelet ψ̂ on [r^{−1}, r] and dilated wavelets ĝ1, ĝ2, ĝ3 on the bands up to r², r³, ..., covering the spectrum out to L]

Smaller depth N ⇒ smaller “bandwidth” r of mother wavelet

⇒ larger number of wavelets (O(log_r(L))) to cover the spectral support [−L, L] of input signal

⇒ larger number of filters in the first layer

⇒ depth-width tradeoff
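The O(log_r L) count can be made concrete; a toy sketch counting how many bands [r^{j−1}, r^j] are needed to reach a hypothetical bandwidth L = 1024:

```python
def num_wavelets(L, r):
    # smallest J with r**J >= L, i.e. how many bands [r**(j-1), r**j]
    # are needed to cover the spectrum out to L: J = O(log_r L)
    J = 0
    while r ** J < L:
        J += 1
    return J

# shrinking r (forced by a small depth N) inflates the first-layer width
for r in (4.0, 2.0, 1.2):
    print(r, num_wavelets(1024.0, r))   # 5, 10, and 39 bands
```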

Page 103: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Yours truly

Page 104: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Experiment: Handwritten digit classification

- Dataset: MNIST database of handwritten digits [LeCun & Cortes, 1998]; 60,000 training and 10,000 test images

- Φ-network: D = 3 layers; same filters, non-linearities, and pooling operators in all layers

- Classifier: SVM with radial basis function kernel [Vapnik, 1995]

- Dimensionality reduction: Supervised orthogonal least squares scheme [Chen et al., 1991]

Page 105: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Experiment: Handwritten digit classification

Classification error in percent:

          Haar wavelet                    Bi-orthogonal wavelet
      abs   ReLU   tanh   LogSig      abs   ReLU   tanh   LogSig
n.p.  0.57  0.57   1.35   1.49        0.51  0.57   1.12   1.22
sub.  0.69  0.66   1.25   1.46        0.61  0.61   1.20   1.18
max.  0.58  0.65   0.75   0.74        0.52  0.64   0.78   0.73
avg.  0.55  0.60   1.27   1.35        0.58  0.59   1.07   1.26

- modulus and ReLU perform better than tanh and LogSig

- results with pooling (S = 2) are competitive with those without pooling, at significantly lower computational cost

- state-of-the-art: 0.43 [Bruna and Mallat, 2013 ]

- similar feature extraction network with directional, non-separable wavelets and no pooling

- significantly higher computational complexity

Page 109: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Energy decay: Related work

[Waldspurger, 2017 ]: Exponential energy decay

Wn(f) = O(a−n),

for some unspecified a > 1.

- 1-D wavelet filters

- every network layer equipped with the same set of wavelets

- vanishing moments condition on the mother wavelet

- applies to 1-D real-valued band-limited input signals f ∈ L2(R)

Page 112: Harmonic Analysis of Deep Convolutional Neural Networks...\Carlos Kleiber conducting the Vienna Philharmonic’s New Year’s Concert 1989." Describing the content of an image CNNs

Energy decay: Related work

[Czaja and Li, 2016 ]: Exponential energy decay

Wn(f) = O(a−n),

for some unspecified a > 1.

- d-dimensional uniform covering filters (similar to Weyl-Heisenberg filters), but does not cover multi-scale filters (e.g., wavelets, ridgelets, curvelets, etc.)

- every network layer equipped with the same set of filters

- analyticity and vanishing moments conditions on the filters

- applies to d-dimensional complex-valued input signals f ∈ L2(Rd)
