Page 1

Rapid Introduction to Machine Learning/Deep Learning

Hyeong In Choi

Seoul National University

Page 2

Lecture 4b: Convolutional Network

October 30, 2015

Page 3

Table of contents

1. Objectives of Lecture 4b

2. Convolution kernel
   2.1. Convolution

3. Convolutional network
   3.1. 2D convolution
   3.2. Analysis of LeCun's example
   3.3. Another example
   3.4. Classification
   3.5. Training convolutional network

Page 4

1. Objectives of Lecture 4b

Objective 1

Learn the basic formalism of convolutional networks

Objective 2

Go through LeCun’s examples

Objective 3

Learn about the training of convolutional networks

Page 5

2. Convolution kernel
2.1. Convolution

$f(x)$: function
$K(x)$: convolution kernel (filter)

$$(f * K)(x) = \int f(y)\,K(x - y)\,dy = \int f(x - y)\,K(y)\,dy$$

Discrete convolution

$x(n)$: data
$K(n)$: convolution kernel (filter)

$$(x * K)(n) = \sum_m x(m)\,K(n - m) = \sum_m x(n - m)\,K(m)$$

Page 6

Example (1D Convolution)

$$\begin{aligned}
(x * K)(5) &= x(5-1)K(1) + x(5-0)K(0) + x(5+1)K(-1) \\
&= x(4)K(1) + x(5)K(0) + x(6)K(-1) \\
&= x(4) + 2x(5) - x(6)
\end{aligned}$$
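To make the indexing concrete, here is a minimal sketch in Python (a hypothetical helper, not from the slides) of the discrete convolution above, using the kernel implied by this example: $K(-1) = -1$, $K(0) = 2$, $K(1) = 1$.

```python
import numpy as np

def conv1d_at(x, K, n):
    """Discrete convolution (x * K)(n) = sum_m x(n - m) K(m)."""
    return sum(x[n - m] * k for m, k in K.items())

# Kernel read off from the slide's example
K = {-1: -1, 0: 2, 1: 1}
x = np.arange(10.0)                  # hypothetical data

print(conv1d_at(x, K, 5))            # equals x[4] + 2*x[5] - x[6]
print(x[4] + 2*x[5] - x[6])          # same value: 8.0
```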

Page 7

$(x * K)(5) = x(4) + 2x(5) - x(6)$

Page 8

$(x * K)(6) = x(5) + 2x(6) - x(7)$

Page 9

Example (2D Convolution)

$x(m,n)$: data
$K(p,q)$: convolution kernel

$$(x * K)(m,n) = \sum_{p,q} x(m - p,\, n - q)\,K(p,q)$$

Page 10

$$\begin{aligned}
(x * K)(3,4) &= 2x(2,3) + 4x(2,4) - 2x(2,5) \\
&\quad + 3x(3,3) + 6x(3,4) - 3x(3,5) \\
&\quad + x(4,3) + 2x(4,4) - x(4,5)
\end{aligned}$$
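The nine coefficients above determine the $3 \times 3$ kernel. A minimal sketch (with hypothetical helper names) reads the kernel off the expansion and checks one output value:

```python
import numpy as np

# K(p, q) for p, q in {-1, 0, 1}, read off the expansion of (x * K)(3,4)
K = {(1, 1):  2, (1, 0):  4, (1, -1): -2,
     (0, 1):  3, (0, 0):  6, (0, -1): -3,
     (-1, 1): 1, (-1, 0): 2, (-1, -1): -1}

def conv2d_at(x, K, m, n):
    """(x * K)(m, n) = sum_{p,q} x(m - p, n - q) K(p, q)."""
    return sum(x[m - p, n - q] * k for (p, q), k in K.items())

x = np.random.rand(8, 8)             # hypothetical data
lhs = conv2d_at(x, K, 3, 4)
rhs = (2*x[2,3] + 4*x[2,4] - 2*x[2,5]
       + 3*x[3,3] + 6*x[3,4] - 3*x[3,5]
       +   x[4,3] + 2*x[4,4] -   x[4,5])
assert np.isclose(lhs, rhs)
```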

Page 11

$$\begin{aligned}
(x * K)(3,5) &= 2x(2,4) + 4x(2,5) - 2x(2,6) \\
&\quad + 3x(3,4) + 6x(3,5) - 3x(3,6) \\
&\quad + x(4,4) + 2x(4,5) - x(4,6)
\end{aligned}$$

Page 12

Boundary effect

Example: at the boundary

Page 13

There is no $x(-1)$, so $(x * K)(1)$ is not defined.

One may pad 0's around the boundaries.

But the "valid" part of $x * K$ is shorter than $x$ itself.

In the above example, the valid part of $x * K$ is an array of size 3.

Page 14

In general, if $K$ is a $(2p + 1) \times (2q + 1)$ matrix and $x$ is an $M \times N$ matrix, then the valid part of $x * K$ is an $(M - 2p) \times (N - 2q)$ matrix.
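A quick sketch checking these shapes with SciPy's convolve2d (assuming NumPy/SciPy are available): mode="valid" computes exactly the valid part, and mode="full" the zero-padded result.

```python
import numpy as np
from scipy.signal import convolve2d

M, N, p, q = 7, 9, 1, 2
x = np.random.rand(M, N)              # M x N data matrix
K = np.random.rand(2*p + 1, 2*q + 1)  # (2p+1) x (2q+1) kernel

print(convolve2d(x, K, mode="valid").shape)  # (M - 2p, N - 2q) = (5, 5)
print(convolve2d(x, K, mode="full").shape)   # (M + 2p, N + 2q) = (9, 13)
```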

Page 15

3. Convolutional network
3.1. 2D convolution

The same convolution kernel K is applied at every position

Page 16

Example: the same convolution kernel $K$ is applied at every position

Page 17

3.2. Analysis of LeCun’s example

Page 18

3.2.1.

Page 19

Pooling

Moving a $10 \times 10$ window over a $75 \times 75$ image results in a $66 \times 66$ matrix

Pooling is taken as one of the following:

- Maximum (max pooling)
- $L^P$ sum ($P = 1, 2, \cdots$)
- Average
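A minimal NumPy sketch of the $10 \times 10$ pooling above (sliding_window_view is a NumPy ≥ 1.20 utility; the input array is hypothetical):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.random.rand(75, 75)                  # hypothetical 75 x 75 feature map
windows = sliding_window_view(x, (10, 10))  # all 10 x 10 windows: (66, 66, 10, 10)

max_pooled = windows.max(axis=(-2, -1))     # max pooling     -> (66, 66)
avg_pooled = windows.mean(axis=(-2, -1))    # average pooling -> (66, 66)
print(max_pooled.shape)                     # (66, 66)
```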

Page 20

Subsampling

Example: $5 \times 5$ subsampling (i.e., column stride = 5, row stride = 5)

$$14 = \frac{66 - 1}{5} + 1$$

Sampling at $(1,1), (1,6), \cdots, (1,66), (6,1), (6,6), \cdots, (6,66), \cdots, (66,1), (66,6), \cdots, (66,66)$
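With 0-based indexing this subsampling is just strided slicing; a small sketch on a hypothetical array:

```python
import numpy as np

pooled = np.random.rand(66, 66)   # pooled 66 x 66 feature map
sub = pooled[::5, ::5]            # keep rows/columns 0, 5, ..., 65

print(sub.shape)                  # (14, 14), since (66 - 1) // 5 + 1 = 14
```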

Page 21

Layer 3

There are 256 feature maps in Layer 3. Each of these 256 feature maps is obtained as follows:

Randomly select 16 feature maps out of the 64 feature maps in Layer 2.

Page 22

Convolution is done with a $16 \times 9 \times 9$ 3D pipe in the $16 \times 14 \times 14$ volume. For each of the 16 selected feature maps, this defines a 2D convolution kernel; thus there are 16 kernels per Layer 3 feature map, and $256 \times 16 = 4096$ 2D kernels in total.

Augmentation

The step from Convolution to Pooling and Subsampling can be augmented with rectification and Local Contrast Normalization (LCN).

$x_i$: $i$th feature map
$x_{ijk}$: $(j,k)$th pixel value of $x_i$

Rectification ($R_{abs}$): $x_{ijk} \to |x_{ijk}|$

Page 23

Subtractive normalization

$$x_{ijk} \to v_{ijk} = x_{ijk} - \sum_{i,p,q} \omega_{pq}\, x_{i,j+p,k+q},$$

where $\omega_{pq}$ is a Gaussian-like filter such that $\sum_{i,p,q} \omega_{pq} = 1$

Divisive normalization

$$v_{ijk} \to y_{ijk} = v_{ijk} / \max(c, \sigma_{jk}),$$

where $\sigma_{jk} = \Big( \sum_{i,p,q} \omega_{pq}\, v_{i,j+p,k+q}^2 \Big)^{1/2}$
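A rough sketch of both steps, approximating the Gaussian-like weights $\omega_{pq}$ with scipy.ndimage.gaussian_filter and averaging across feature maps; the function name and the parameters sigma and c are illustrative assumptions, not from the slides.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_contrast_normalize(x, sigma=2.0, c=1e-4):
    """LCN over feature maps x of shape (n_maps, H, W).

    Subtractive step: v = x - Gaussian-weighted local mean across all maps.
    Divisive step:    y = v / max(c, sigma_jk), with sigma_jk a local std.
    """
    mean = gaussian_filter(x.mean(axis=0), sigma)        # weighted local mean
    v = x - mean                                         # subtractive normalization
    var = gaussian_filter((v ** 2).mean(axis=0), sigma)  # local second moment of v
    return v / np.maximum(c, np.sqrt(var))               # divisive normalization

y = local_contrast_normalize(np.random.rand(16, 14, 14))
```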


Page 25

Summary: Model architecture

There are $n_1$ input feature maps (images), each of size $n_2 \times n_3$.

Page 26

$x_i$: $i$th image (input feature map)

$k_{ij}$: convolution kernel of size $\ell_1 \times \ell_2$ operating on $x_i$ to produce $y_j$, $j = 1, \cdots, m_1$, where $m_1$ is the number of output feature maps

$y_j$: $j$th output feature map

$$y_j = \begin{cases} g_j \tanh\big( \sum_{i=1}^{n_1} k_{ij} * x_i \big) \\[4pt] g_j\, \mathrm{sigm}\big( \sum_{i=1}^{n_1} k_{ij} * x_i \big) \end{cases} \qquad \text{for } j = 1, \cdots, m_1$$

[Hence $g_j$ is called the gain coefficient]
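A minimal sketch of this formula with the tanh variant; shapes and names are hypothetical, and SciPy's valid-mode convolution stands in for $k_{ij} * x_i$.

```python
import numpy as np
from scipy.signal import convolve2d

def output_feature_maps(x, k, g):
    """y_j = g_j * tanh( sum_{i=1..n1} k_ij * x_i ).

    x: (n1, H, W) input feature maps
    k: (n1, m1, l1, l2) kernels k_ij
    g: (m1,) gain coefficients g_j
    """
    n1, m1 = k.shape[0], k.shape[1]
    return np.stack([
        g[j] * np.tanh(sum(convolve2d(x[i], k[i, j], mode="valid")
                           for i in range(n1)))
        for j in range(m1)
    ])

x = np.random.rand(3, 14, 14)              # n1 = 3 input maps
k = np.random.rand(3, 4, 5, 5)             # m1 = 4 output maps, 5 x 5 kernels
g = np.ones(4)
print(output_feature_maps(x, k, g).shape)  # (4, 10, 10)
```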

Page 27

Notations

(a) C = Convolution, S = sigm/tanh, G = gain $\Rightarrow F_{CSG}$

In LeCun's example above, Layer 1 is denoted by $64F^{9 \times 9}_{CSG}$

[64 = number of kernels, $9 \times 9$ = convolution kernel size]

(b) $R_{abs}$: rectification (= taking the absolute value)

(c) $N$: local contrast normalization (LCN)

(d) $P_A$: average pooling and subsampling; $P_M$: max pooling and subsampling

Page 28

3.3. Another example

Page 29

The above processes are denoted by

$$64F^{9 \times 9}_{CSG} \to R/N/P^{5 \times 5}$$

The whole process is denoted by

$$64F^{9 \times 9}_{CSG} \to R/N/P^{5 \times 5} \to 256F^{9 \times 9}_{CSG} \to R/N/P^{4 \times 4}$$

Page 30

3.4. Classification

The final layer is fed into a classification layer, such as a softmax layer.
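A minimal sketch of such a softmax layer on top of flattened final-layer features; all names and sizes here are hypothetical.

```python
import numpy as np

def softmax(z):
    """Softmax over class scores; subtracting the max is for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

features = np.random.rand(256)       # flattened final-layer feature maps
W = 0.01 * np.random.randn(10, 256)  # fully connected weights, 10 classes
b = np.zeros(10)

probs = softmax(W @ features + b)    # class probabilities, sum to 1
print(probs.argmax())                # predicted class
```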

Page 31

These two layers are fully connected

Train the entire network in a supervised manner

Only the filters (kernels) are trained

The error-derivative backpropagation has to be worked out across the R/N/P layers

Page 32

3.5. Training convolutional network

Weight training (learning)

Convolution weights

Training is done just like for the usual neural network.

To enforce convolution, one needs to maintain an equality constraint on the shared weights.

Example

Suppose the weights satisfy $\omega_1 = \omega_2 = \cdots = \omega_N$ due to the convolution constraint.

During training one gets $\tilde{\omega}_1(\text{new}), \tilde{\omega}_2(\text{new}), \cdots, \tilde{\omega}_N(\text{new})$.

To enforce the equality constraint, define

$$\omega_i(\text{new}) = \frac{1}{N} \sum_{j=1}^{N} \tilde{\omega}_j(\text{new}), \qquad \text{for } i = 1, \cdots, N$$
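As a one-line sketch, this projection back onto the constraint set is just an average (the arrays are hypothetical):

```python
import numpy as np

w_tilde = np.random.rand(9)                    # unconstrained updates w~_1 .. w~_N
w_new = np.full_like(w_tilde, w_tilde.mean())  # w_i(new) = (1/N) * sum_j w~_j(new)
```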

Page 33

R/N/P

The computations in the R/N steps do not involve weights, so there is no need to be concerned with these steps during training.

For the pooling step:

1D example: pooling by 3, subsampling by 2 (stride 2)

Page 34

Combine the weights affecting the subsampling neurons to come up with an effective network.

Page 35

Derivative of max function

$$\max(x_1, x_2) = \frac{1}{2}\big\{\, |x_1 - x_2| + x_1 + x_2 \,\big\}$$

$$\partial_{x_1} \max(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 > x_2 \\ 0 & \text{else} \end{cases}$$

Similarly,

$$\partial_{x_1} \max(x_1, x_2, x_3) = \begin{cases} 1 & \text{if } x_1 > x_2 \text{ and } x_1 > x_3 \\ 0 & \text{else} \end{cases}$$

If the pooling is average pooling or another $L^p$ norm, the derivatives can be easily computed.

Once the derivatives of the pooling layers are computed, the backpropagation algorithm can be applied.
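A minimal sketch of how this derivative is used in the backward pass of a max-pooling unit (a hypothetical helper, not from the slides): the incoming gradient is routed entirely to the arg-max input, matching the derivative above.

```python
import numpy as np

def max_pool_backward(x, grad_out):
    """Route grad_out to the position of the maximum of x (ties broken by argmax)."""
    grad_in = np.zeros_like(x)
    grad_in[np.argmax(x)] = grad_out
    return grad_in

x = np.array([0.2, 1.5, 0.7])
print(max_pool_backward(x, grad_out=1.0))   # [0. 1. 0.]
```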