Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3...

39
32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)

Transcript of Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3...

Page 1: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201613

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image(i.e. 5*5*3 = 75-dimensional dot product + bias)

Page 2: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201614

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation map

1

28

28

Page 3: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201615

32

32

3

Convolution Layer32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation maps

1

28

28

consider a second, green filter

Page 4: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201616

32

32

3

Convolution Layer

activation maps

6

28

28

For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:

We stack these up to get a “new image” of size 28x28x6!

Page 5: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201617

Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions

32

32

3

28

28

6

CONV,ReLUe.g. 6 5x5x3 filters

Page 6: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201618

Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions

32

32

3

CONV,ReLUe.g. 6 5x5x3 filters 28

28

6

CONV,ReLUe.g. 10 5x5x6 filters

CONV,ReLU

….

10

24

24

Page 7: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201619

Preview [From recent Yann LeCun slides]

Page 8: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201620

Preview [From recent Yann LeCun slides]

Page 9: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201621

example 5x5 filters(32 total)

We call the layer convolutional because it is related to convolution of two signals:

elementwise multiplication and sum of a filter and the signal (image)

one filter => one activation map

Page 10: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201622

preview:

Page 11: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201623

A closer look at spatial dimensions:

32

32

3

32x32x3 image5x5x3 filter

convolve (slide) over all spatial locations

activation map

1

28

28

Page 12: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201624

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 13: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201625

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 14: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201626

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 15: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201627

7x7 input (spatially)assume 3x3 filter

7

7

A closer look at spatial dimensions:

Page 16: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201628

7x7 input (spatially)assume 3x3 filter

=> 5x5 output

7

7

A closer look at spatial dimensions:

Page 17: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201629

7x7 input (spatially)assume 3x3 filterapplied with stride 2

7

7

A closer look at spatial dimensions:

Page 18: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201631

7x7 input (spatially)assume 3x3 filterapplied with stride 2=> 3x3 output!

7

7

A closer look at spatial dimensions:

Page 19: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201632

7x7 input (spatially)assume 3x3 filterapplied with stride 3?

7

7

A closer look at spatial dimensions:

Page 20: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201633

7x7 input (spatially)assume 3x3 filterapplied with stride 3?

7

7

A closer look at spatial dimensions:

doesn’t fit! cannot apply 3x3 filter on 7x7 input with stride 3.

Page 21: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201634

N

NF

F

Output size:(N - F) / stride + 1

e.g. N = 7, F = 3:stride 1 => (7 - 3)/1 + 1 = 5stride 2 => (7 - 3)/2 + 1 = 3stride 3 => (7 - 3)/3 + 1 = 2.33 :\

Page 22: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201635

In practice: Common to zero pad the border0 0 0 0 0 0

0

0

0

0

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

(recall:)(N - F) / stride + 1

Page 23: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201636

In practice: Common to zero pad the border

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

7x7 output!

0 0 0 0 0 0

0

0

0

0

Page 24: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201637

In practice: Common to zero pad the border

e.g. input 7x73x3 filter, applied with stride 1 pad with 1 pixel border => what is the output?

7x7 output!in general, common to see CONV layers with stride 1, filters of size FxF, and zero-padding with (F-1)/2. (will preserve size spatially)e.g. F = 3 => zero pad with 1 F = 5 => zero pad with 2 F = 7 => zero pad with 3

0 0 0 0 0 0

0

0

0

0

Page 25: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201638

Remember back to… E.g. 32x32 input convolved repeatedly with 5x5 filters shrinks volumes spatially!(32 -> 28 -> 24 ...). Shrinking too fast is not good, doesn’t work well.

32

32

3

CONV,ReLUe.g. 6 5x5x3 filters 28

28

6

CONV,ReLUe.g. 10 5x5x6 filters

CONV,ReLU

….

10

24

24

Page 26: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201639

Examples time:

Input volume: 32x32x310 5x5 filters with stride 1, pad 2

Output volume size: ?

Page 27: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201640

Examples time:

Input volume: 32x32x310 5x5 filters with stride 1, pad 2

Output volume size: (32+2*2-5)/1+1 = 32 spatially, so32x32x10

Page 28: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201641

Examples time:

Input volume: 32x32x310 5x5 filters with stride 1, pad 2

Number of parameters in this layer?

Page 29: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201642

Examples time:

Input volume: 32x32x310 5x5 filters with stride 1, pad 2

Number of parameters in this layer?each filter has 5*5*3 + 1 = 76 params (+1 for bias)

=> 76*10 = 760

Page 30: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201643

Page 31: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201644

Common settings:

K = (powers of 2, e.g. 32, 64, 128, 512)- F = 3, S = 1, P = 1- F = 5, S = 1, P = 2- F = 5, S = 2, P = ? (whatever fits)- F = 1, S = 1, P = 0

Page 32: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201645

(btw, 1x1 convolution layers make perfect sense)

64

56

561x1 CONVwith 32 filters

3256

56(each filter has size 1x1x64, and performs a 64-dimensional dot product)

Page 33: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201646

Example: CONV layer in Torch

Page 34: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201654

Pooling layer- makes the representations smaller and more manageable - operates over each activation map independently:

Page 35: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201655

1 1 2 4

5 6 7 8

3 2 1 0

1 2 3 4

Single depth slice

x

y

max pool with 2x2 filters and stride 2 6 8

3 4

MAX POOLING

Page 36: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201656

Page 37: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201657

Common settings:

F = 2, S = 2F = 3, S = 2

Page 38: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201658

Fully Connected Layer (FC layer)- Contains neurons that connect to the entire input volume, as in ordinary Neural

Networks

Page 39: Convolution Layer...Fei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 -13 27 Jan 2016 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot

Lecture 7 - 27 Jan 2016Fei-Fei Li & Andrej Karpathy & Justin JohnsonFei-Fei Li & Andrej Karpathy & Justin Johnson Lecture 7 - 27 Jan 201660

Case Study: LeNet-5[LeCun et al., 1998]

Conv filters were 5x5, applied at stride 1Subsampling (Pooling) layers were 2x2 applied at stride 2i.e. architecture is [CONV-POOL-CONV-POOL-CONV-FC]