
Rapid Introduction to Machine Learning/Deep Learning

Hyeong In Choi

Seoul National University


Lecture 4b: Convolutional Network

October 30, 2015


Table of contents

1. Objectives of Lecture 4b

2. Convolution kernel
   2.1. Convolution

3. Convolutional network
   3.1. 2D convolution
   3.2. Analysis of LeCun's example
   3.3. Another example
   3.4. Classification
   3.5. Training convolutional network


1. Objectives of Lecture 4b

Objective 1

Learn the basic formalism of convolutional networks

Objective 2

Go through LeCun’s examples

Objective 3

Learn about the training of convolutional networks


2. Convolution kernel
2.1. Convolution

f(x): function
K(x): convolution kernel (filter)

(f ∗ K)(x) = ∫ f(y) K(x − y) dy = ∫ f(x − y) K(y) dy

Discrete convolution

x(n): data
K(n): convolution kernel (filter)

(x ∗ K)(n) = ∑_m x(m) K(n − m) = ∑_m x(n − m) K(m)


Example (1D Convolution)

With kernel values K(1) = 1, K(0) = 2, K(−1) = −1:

(x ∗ K)(5) = x(5 − 1)K(1) + x(5 − 0)K(0) + x(5 + 1)K(−1)
           = x(4)K(1) + x(5)K(0) + x(6)K(−1)
           = x(4) + 2x(5) − x(6)


(x ∗K)(5) = x(4) + 2x(5) − x(6)


(x ∗K)(6) = x(5) + 2x(6) − x(7)
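For concreteness, here is a minimal Python sketch of this discrete convolution; the kernel values K(1) = 1, K(0) = 2, K(−1) = −1 are the ones assumed in the example above.

```python
# Minimal sketch of (x * K)(n) = sum_m x(n - m) K(m), assuming the kernel
# values from the example above: K(1) = 1, K(0) = 2, K(-1) = -1.
x = list(range(10))                 # toy data x(0), ..., x(9)
K = {1: 1, 0: 2, -1: -1}            # kernel as a map m -> K(m)

def conv1d_at(x, K, n):
    return sum(x[n - m] * k for m, k in K.items())

# (x * K)(5) = x(4) + 2 x(5) - x(6) = 4 + 10 - 6
print(conv1d_at(x, K, 5))           # -> 8
```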


Example (2D Convolution)

x(m, n): data
K(p, q): convolution kernel

(x ∗ K)(m, n) = ∑_{p,q} x(m − p, n − q) K(p, q)


(x ∗ K)(3, 4) = 2x(2, 3) + 4x(2, 4) − 2x(2, 5)
             + 3x(3, 3) + 6x(3, 4) − 3x(3, 5)
             + x(4, 3) + 2x(4, 4) − x(4, 5)


(x ∗ K)(3, 5) = 2x(2, 4) + 4x(2, 5) − 2x(2, 6)
             + 3x(3, 4) + 6x(3, 5) − 3x(3, 6)
             + x(4, 4) + 2x(4, 5) − x(4, 6)
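As a quick check, here is a small Python sketch of the 2D formula; the 3 × 3 kernel values are read off from the two expansions above and are otherwise an assumption.

```python
import numpy as np

# Sketch of (x * K)(m, n) = sum_{p,q} x(m - p, n - q) K(p, q), with the 3x3
# kernel values inferred from the expansions above.
K = {(1, 1): 2, (1, 0): 4, (1, -1): -2,
     (0, 1): 3, (0, 0): 6, (0, -1): -3,
     (-1, 1): 1, (-1, 0): 2, (-1, -1): -1}

def conv2d_at(x, K, m, n):
    return sum(x[m - p, n - q] * k for (p, q), k in K.items())

x = np.arange(64, dtype=float).reshape(8, 8)    # toy 8x8 "image"
print(conv2d_at(x, K, 3, 4))                    # matches the expansion of (x * K)(3, 4)
```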


Boundary effect

Example: convolution at the boundary of a finite array


There is no x(−1), so (x ∗K)(1) is not defined

One may pad 0’s around boundaries

But the “valid” part of x ∗K is shorter than x itself

In the above example, the valid part of x ∗ K is an array of size 3


In general, if K is a (2p + 1) × (2q + 1) matrix and x is an M × N matrix, then the valid part of x ∗ K is an (M − 2p) × (N − 2q) matrix
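A quick numerical check of this size rule, assuming SciPy is available (the lecture itself does not prescribe any library):

```python
import numpy as np
from scipy.signal import convolve2d

# An M x N input convolved with a (2p+1) x (2q+1) kernel, keeping only the
# fully overlapping ("valid") positions, yields an (M - 2p) x (N - 2q) output.
M, N, p, q = 7, 9, 1, 2
x = np.random.rand(M, N)
K = np.random.rand(2 * p + 1, 2 * q + 1)
print(convolve2d(x, K, mode="valid").shape)     # -> (5, 5) = (M - 2p, N - 2q)
```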


3. Convolutional network
3.1. 2D convolution

The same convolution kernel K is applied at every position



3.2. Analysis of LeCun’s example



Pooling

Moving a 10 × 10 window over a 75 × 75 image results in a 66 × 66 matrix

Pooling is taken as one of the following:

Maximum (max pooling)
L^P sum (P = 1, 2, ⋯)
Average
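A rough Python sketch of this pooling step; the stride-1 sliding of the 10 × 10 window follows the description above, while the implementation details are only illustrative.

```python
import numpy as np

def pool(x, win, mode="max", p=2):
    """Slide a win x win window over x (stride 1) and pool each patch."""
    H, W = x.shape
    out = np.empty((H - win + 1, W - win + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + win, j:j + win]
            if mode == "max":
                out[i, j] = patch.max()            # max pooling
            elif mode == "avg":
                out[i, j] = patch.mean()           # average pooling
            else:
                out[i, j] = (np.abs(patch) ** p).sum() ** (1.0 / p)   # L^p sum
    return out

x = np.random.rand(75, 75)
print(pool(x, 10).shape)                           # -> (66, 66)
```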


Subsampling

Example: 5 × 5 subsampling (i.e., column stride = 5, row stride = 5)

14 = (66 − 1)/5 + 1

Sampling at (1, 1), (1, 6), ⋯, (1, 66), (6, 1), (6, 6), ⋯, (6, 66), ⋯, (66, 1), (66, 6), ⋯, (66, 66)
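The subsampling itself is just strided indexing; a small sketch, where the 66 × 66 input stands in for the pooled map from the previous step:

```python
import numpy as np

# Stride-5 subsampling of a 66 x 66 map: positions 1, 6, ..., 66 in each
# direction (1-based), i.e. 14 = (66 - 1)/5 + 1 samples per axis.
pooled = np.random.rand(66, 66)
sub = pooled[::5, ::5]              # every 5th row and column
print(sub.shape)                    # -> (14, 14)
```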


Layer 3

There are 256 feature maps in Layer 3. Each of these 256 feature maps is obtained as follows:

Randomly select 16 feature maps out of the 64 feature maps in Layer 2


Convolution is done with a 16 × 9 × 9 3D pipe moving in the 16 × 14 × 14 volume. For each of the 16 selected feature maps, this defines a 2D convolution kernel, i.e., 16 kernels per output feature map. Thus there are 256 × 16 = 4096 2D kernels
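A shape-level sketch of this layer, with random kernels and random map selections standing in for the trained ones, and SciPy's convolve2d assumed as tooling:

```python
import numpy as np
from scipy.signal import convolve2d

# Each of the 256 output maps combines 16 of the 64 Layer-2 maps (14 x 14)
# through its own 16 kernels of size 9 x 9, so there are 256 * 16 = 4096
# 2D kernels, and each output map has the "valid" size 14 - 8 = 6.
rng = np.random.default_rng(0)
layer2 = rng.random((64, 14, 14))
out_maps = []
for j in range(256):
    sel = rng.choice(64, size=16, replace=False)   # randomly selected 16 maps
    kernels = rng.random((16, 9, 9))               # 16 of the 4096 kernels
    out_maps.append(sum(convolve2d(layer2[s], k, mode="valid")
                        for s, k in zip(sel, kernels)))
print(np.stack(out_maps).shape)                    # -> (256, 6, 6)
```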

Augmentation

The step from convolution to pooling and subsampling can be augmented with rectification and Local Contrast Normalization (LCN)

x_i: ith feature map
x_ijk: (j, k)th pixel value of x_i

Rectification (R_abs): x_ijk → |x_ijk|


Subtractive normalization

x_ijk → v_ijk = x_ijk − ∑_{i,p,q} ω_pq x_{i,j+p,k+q},

where ω_pq is a Gaussian-like filter such that ∑_{i,p,q} ω_pq = 1

Divisive normalization

v_ijk → y_ijk = v_ijk / max(c, σ_jk),

where σ_jk = (∑_{i,p,q} ω_pq v²_{i,j+p,k+q})^{1/2}
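A rough sketch of these two normalization steps; the Gaussian window width, the boundary handling, and the constant c are assumptions, only the formulas above come from the lecture.

```python
import numpy as np

def lcn(x, radius=4, sigma=2.0, c=1e-2):
    """x: feature maps of shape (n_maps, H, W); returns the normalized maps."""
    n_maps, H, W = x.shape
    # Gaussian-like weights w_pq, scaled so that the sum over (i, p, q) is 1.
    ax = np.arange(-radius, radius + 1)
    w = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    w /= w.sum() * n_maps

    def local_avg(z, j, k):
        # sum_{i,p,q} w_pq z[i, j+p, k+q], treating out-of-range pixels as 0
        s = 0.0
        for pi, p in enumerate(ax):
            for qi, q in enumerate(ax):
                if 0 <= j + p < H and 0 <= k + q < W:
                    s += w[pi, qi] * z[:, j + p, k + q].sum()
        return s

    v = np.empty_like(x)
    for j in range(H):
        for k in range(W):
            v[:, j, k] = x[:, j, k] - local_avg(x, j, k)                 # subtractive step
    v2, y = v ** 2, np.empty_like(x)
    for j in range(H):
        for k in range(W):
            y[:, j, k] = v[:, j, k] / max(c, np.sqrt(local_avg(v2, j, k)))   # divisive step
    return y

print(lcn(np.random.rand(3, 16, 16)).shape)        # -> (3, 16, 16)
```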


Summary: Model architecture

There are n_1 input feature maps (images), each of size n_2 × n_3


x_i: ith image (input feature map)

k_ij: convolution kernel of size ℓ_1 × ℓ_2 operating on x_i to produce y_j, j = 1, ⋯, m_1, where m_1 is the number of output feature maps

y_j: jth output feature map

y_j = g_j tanh(∑_{i=1}^{n_1} k_ij ∗ x_i)   or   y_j = g_j sigm(∑_{i=1}^{n_1} k_ij ∗ x_i),   for j = 1, ⋯, m_1

[Hence g_j is called the gain coefficient]
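A minimal sketch of one such layer under the tanh form of the formula, with random kernels and gains standing in for trained ones and SciPy's convolve2d assumed:

```python
import numpy as np
from scipy.signal import convolve2d

# y_j = g_j * tanh( sum_{i=1}^{n1} k_ij * x_i ),  j = 1, ..., m1
rng = np.random.default_rng(1)
n1, m1, l1, l2 = 3, 4, 5, 5                  # input maps, output maps, kernel size
x = rng.random((n1, 20, 20))                 # n1 input feature maps
k = rng.standard_normal((n1, m1, l1, l2))    # kernels k_ij
g = rng.random(m1)                           # gain coefficients g_j
y = np.stack([g[j] * np.tanh(sum(convolve2d(x[i], k[i, j], mode="valid")
                                 for i in range(n1)))
              for j in range(m1)])
print(y.shape)                               # -> (4, 16, 16) "valid" output maps
```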


Notations

(a) C = convolution, S = sigm/tanh, G = gain ⇒ F_CSG

In LeCun's example above, Layer 1 is denoted by 64F^{9×9}_CSG

[64 = number of kernels, 9 × 9 = convolution kernel size]

(b) R_abs: rectification (= taking the absolute value)

(c) N: local contrast normalization (LCN)

(d) P_A: average pooling and subsampling
    P_M: max pooling and subsampling


3.3. Another example


The above process is denoted by

64F^{9×9}_CSG → R/N/P^{5×5}

The whole process is denoted by

64F^{9×9}_CSG → R/N/P^{5×5} → 256F^{9×9}_CSG → R/N/P^{4×4}


3.4. Classification

The final layer is fed into a classification layer, such as a softmax layer
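For instance, a minimal softmax classification layer might look as follows; the feature size and the number of classes are made-up values for illustration only.

```python
import numpy as np

# Fully connected map from the flattened final feature maps to class scores,
# followed by softmax; all sizes here are illustrative, not LeCun's.
rng = np.random.default_rng(2)
features = rng.random(256 * 4 * 4)                    # flattened final feature maps
W = rng.standard_normal((10, features.size)) * 0.01   # 10 classes (assumed)
b = np.zeros(10)
scores = W @ features + b
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(probs.sum())                                    # -> 1.0 (a probability vector)
```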


These two layers are fully connected

Train the entire network in a supervised manner

Only the filters (kernels) are trained

The error derivative backpropagation has to be worked out across the R/N/P layers


3.5. Training convolutional network

Weight training (learning)

Convolution weights

Training is done just like for the usual neural network

To enforce convolution, one needs to maintain an equality constraint on the shared weights

Example

Suppose the weights satisfy ω_1 = ω_2 = ⋯ = ω_N due to the convolution constraint

During training, one gets updated values ω̃_1(new), ω̃_2(new), ⋯, ω̃_N(new)

To enforce the equality constraint, define

ω_i(new) = (1/N) ∑_{j=1}^{N} ω̃_j(new),   for i = 1, ⋯, N
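In code, enforcing the constraint amounts to resetting all tied weights to their common average after the unconstrained update (toy numbers below):

```python
import numpy as np

# Unconstrained update gives slightly different values for the tied weights...
omega_tilde_new = np.array([0.90, 1.10, 1.05, 0.95])
# ...so all of them are reset to the common average to restore the constraint.
omega_new = np.full_like(omega_tilde_new, omega_tilde_new.mean())
print(omega_new)    # -> [1. 1. 1. 1.]
```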


R/N/P

The computations in the R/N steps do not involve weights, so there is no need to worry about these steps during training

For the pooling step:

1D example: pooling by 3, subsampling by 2 (stride 2)


Combine the weights affecting the subsampling neurons to come up with an effective network


Derivative of max function

max(x_1, x_2) = (1/2){ |x_1 − x_2| + x_1 + x_2 }

∂_{x_1} max(x_1, x_2) = 1 if x_1 > x_2, and 0 otherwise

Similarly,

∂_{x_1} max(x_1, x_2, x_3) = 1 if x_1 > x_2 and x_1 > x_3, and 0 otherwise
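In backpropagation, this derivative means the upstream gradient is routed only to the input that attained the maximum in each pooling window; a small sketch with a made-up helper name:

```python
import numpy as np

def max_pool_backward(patch, upstream_grad):
    """Send the upstream gradient to the entry that attained the max, 0 elsewhere."""
    grad = np.zeros_like(patch)
    grad[np.unravel_index(np.argmax(patch), patch.shape)] = upstream_grad
    return grad

patch = np.array([[1.0, 3.0], [2.0, 0.5]])
print(max_pool_backward(patch, 1.0))   # gradient lands at the position of 3.0
```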

If the pooling is average or another L^p norm, the derivatives can easily be computed

Once the derivatives of the pooling layers are computed, the backpropagation algorithm can be applied