Convolutional Neural Networks

Amir H. Payberah
payberah@kth.se

05/12/2018

The Course Web Page

https://id2223kth.github.io

1 / 122

Where Are We?

2 / 122

Where Are We?

3 / 122

Let’s Start With An Example

4 / 122

MNIST Dataset

I Handwritten digits in the MNIST dataset are 28x28 pixel greyscale images.

5 / 122

One-Layer Network For Classifying MNIST (1/4)

I Let’s make a one-layer neural network for classifying digits.

I Each neuron in a neural network:
• computes a weighted sum of all of its inputs,
• adds a bias,
• feeds the result through some non-linear activation function, e.g., softmax (a sketch follows).
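A minimal NumPy sketch of this computation for one image (the shapes follow the MNIST setup; the random values are illustrative placeholders, not from the slides):

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract the max for numerical stability
    return e / e.sum()

x = np.random.rand(784)              # one flattened 28x28 image
W = np.random.randn(784, 10) * 0.01  # column n holds the weights of neuron n
b = np.zeros(10)

y = softmax(x @ W + b)               # weighted sums + biases, through softmax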

6 / 122

One-Layer Network For Classifying MNIST (2/4)

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

7 / 122

One-Layer Network For Classifying MNIST (3/4)

I Assume we have a batch of 100 images as the input.

I Using the first column of the weights matrix W, we compute the weighted sum of all the pixels of the first image.

• The first neuron: L_{0,0} = w_{0,0} x_0^{(1)} + w_{1,0} x_1^{(1)} + · · · + w_{783,0} x_{783}^{(1)}

• The 2nd neuron through the 10th:
L_{0,1} = w_{0,1} x_0^{(1)} + w_{1,1} x_1^{(1)} + · · · + w_{783,1} x_{783}^{(1)}
· · ·
L_{0,9} = w_{0,9} x_0^{(1)} + w_{1,9} x_1^{(1)} + · · · + w_{783,9} x_{783}^{(1)}

• Repeat the operation for the other 99 images, i.e., x^{(2)} · · · x^{(100)} (see the sketch below).
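The bullets above amount to one matrix multiplication. A NumPy sketch (shapes follow the slide: 100 images, 784 pixels, 10 neurons; values are random placeholders):

import numpy as np

X = np.random.rand(100, 784)   # rows are the flattened images x(1) ... x(100)
W = np.random.randn(784, 10)   # W[p, n]: weight from pixel p to neuron n
L = X @ W                      # L[i, n] is the weighted sum for image i, neuron n
print(L.shape)                 # (100, 10)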

8 / 122

One-Layer Network For Classifying MNIST (4/4)

I Each neuron must now add its bias.

I Apply the softmax activation function for each instance x(i).

I For each input instance x^{(i)}:  L_i = [L_{i,0}, L_{i,1}, · · · , L_{i,9}]^T

I y_i = softmax(L_i + b)

9 / 122

How Good Are the Predictions?

I Define the cost function J(W) as the cross-entropy of what the network tells us (ŷ_i) and what we know to be the truth (y_i), for each instance x^{(i)}.

I Compute the partial derivatives of the cross-entropy with respect to all the weights and all the biases, ∇_W J(W).

I Update weights and biases by a fraction of the gradient: W^{(next)} = W − η ∇_W J(W) (sketched below).
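For this one-layer softmax model the gradient has a well-known closed form, so one update step can be sketched directly in NumPy (the learning rate and data below are illustrative placeholders):

import numpy as np

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

X = np.random.rand(100, 784)                     # batch of flattened images
Y = np.eye(10)[np.random.randint(0, 10, 100)]    # one-hot true labels y_i
W, b, eta = np.zeros((784, 10)), np.zeros(10), 0.5

Y_hat = softmax(X @ W + b)                       # predictions
J = -np.mean(np.sum(Y * np.log(Y_hat), axis=1))  # cross-entropy cost J(W)
grad_W = X.T @ (Y_hat - Y) / len(X)              # closed-form gradient for softmax + cross-entropy
grad_b = (Y_hat - Y).mean(axis=0)
W, b = W - eta * grad_W, b - eta * grad_b        # W(next) = W - eta * gradient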

10 / 122

Adding More Layers

I Add more layers to improve the accuracy.

I On intermediate layers we will use the sigmoid activation function.

I We keep softmax as the activation function on the last layer.

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

11 / 122

Some Improvements

I Better activation function, e.g., using ReLU(z) = max(0, z).

I Overcome network overfitting, e.g., using dropout.

I Better network initialization, e.g., using He initialization.

I Better optimizer, e.g., using the Adam optimizer (all four are sketched in the code below).
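In the TensorFlow 1.x API that these slides use later, the four improvements might look as follows (a sketch; the layer sizes and rates are illustrative assumptions):

import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
y_true = tf.placeholder(tf.float32, [None, 10])
training = tf.placeholder(tf.bool)

he_init = tf.variance_scaling_initializer(scale=2.0)              # He initialization
hidden = tf.layers.dense(X, 200, activation=tf.nn.relu,           # ReLU activation
                         kernel_initializer=he_init)
hidden = tf.layers.dropout(hidden, rate=0.25, training=training)  # dropout against overfitting
logits = tf.layers.dense(hidden, 10)

loss = tf.losses.softmax_cross_entropy(y_true, logits)
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)            # Adam optimizer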

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

12 / 122

Vanilla Deep Neural Networks Challenges (1/2)

I Pixels of each image were flattened into a single vector (really bad idea).

I Vanilla deep neural networks do not scale.
• In MNIST, images are black-and-white 28x28 pixel images: 28 × 28 = 784 weights per neuron.

I Handwritten digits are made of shapes, and we discarded the shape information when we flattened the pixels.

13 / 122

Vanilla Deep Neural Networks Challenges (2/2)

I It is difficult to recognize objects, due to:

I Rotation

I Lighting: objects may look different depending on the level of external lighting.

I Deformation: objects can be deformed in a variety of non-affine ways.

I Scale variation: visual classes often exhibit variation in their size.

I Viewpoint variation.

14 / 122

Tackle the Challenges

I Convolutional neural networks (CNNs) can tackle the vanilla model's challenges.

I A CNN is a type of neural network that can take advantage of shape information.

I It applies a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification.

15 / 122

Filters and Convolution Operations

16 / 122

Brain Visual Cortex Inspired CNNs

I 1959, David H. Hubel and Torsten Wiesel.

I Many neurons in the visual cortex have a small local receptive field.

I They react only to visual stimuli located in a limited region of the visual field.

17 / 122

Receptive Fields and Filters

I Imagine a flashlight that is shining over the top left of the image.

I The region that it is shining over is called the receptive field.

I This flashlight is called a filter.

I A filter is a set of weights.

I A filter is a feature detector, e.g., straight edges, simple colors, and curves.

[https://adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks]

18 / 122

Filters Example (1/3)

[https://adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks]

19 / 122

Filters Example (2/3)

[https://adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks]

20 / 122

Filters Example (3/3)

[https://adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks]

21 / 122

Convolution Operation

I Convolution takes a filter and multiplies it over the entire area of an input image.

I Imagine this flashlight (filter) sliding across all the areas of the input image.

[https://adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks]

22 / 122

Convolution Operation - More Formal Definition

I Convolution is a mathematical operation on two functions x and h.
• You can think of x as the input image, and h as a filter (kernel) on the input image.

I For a 1D convolution we can define it as below:

y(k) = ∑_{n=0}^{N−1} h(n) · x(k − n)

I N is the number of elements in h.

I We are sliding the filter h over the input image x.
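This definition translates directly into a few lines of Python (a sketch; the boundary handling is one possible choice, and the sample signal is made up since the slide's figure is not in the transcript):

def conv1d(x, h):
    # y(k) = sum over n of h(n) * x(k - n), for indices that fall inside x
    N = len(h)
    return [sum(h[n] * x[k - n] for n in range(N) if 0 <= k - n < len(x))
            for k in range(len(x) + N - 1)]

y = conv1d([0, 10, 60, 50, 10, 0], [1/3, 1/3, 1/3])  # averaging filter
print(y[3])   # (1/3)*50 + (1/3)*60 + (1/3)*10 = 40.0 (up to float rounding)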

23 / 122

Convolution Operation - 1D Example (1/2)

I Suppose our 1D input image x and the filter h are as follows:

I Let’s call the output image y.

I What is the value of y(3)?

y(k) = ∑_{n=0}^{N−1} h(n) · x(k − n)

24 / 122

Convolution Operation - 1D Example (2/2)

I To compute y(3), we slide the filter so that it is centered around x(3).

y(3) = (1/3) · 50 + (1/3) · 60 + (1/3) · 10 = 40

I We can compute the other values of y as well.

25 / 122

Convolution Operation - 2D Example (1/2)

26 / 122

Convolution Operation - 2D Example (2/2)

I Detect vertical and horizontal lines in an image.

I Slide the filters across the entirety of the image.

I The result is our feature map: it indicates where we've found the feature we're looking for in the original image (see the sketch below).
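A NumPy sketch of this sliding-filter operation (the 5x5 image and the 3x3 vertical-line filter are made-up examples; note CNN layers compute this correlation form, without flipping the kernel):

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image (valid mode, stride 1, no padding).
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[:, 2] = 1.0                             # a vertical line
vertical = np.array([[-1.0, 1.0, -1.0]] * 3)  # a vertical-line detector
print(conv2d(image, vertical))                # the feature map peaks on the line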

27 / 122

Convolutional Neural Network (CNN)

28 / 122

CNN Components (1/2)

I Convolutional layers: apply a specified number of convolution filters to the image.

I Pooling layers: downsample the image data extracted by the convolutional layers to reduce the dimensionality of the feature map, in order to decrease processing time.

I Dense layers: a fully connected layer that performs classification on the features extracted by the convolutional layers and downsampled by the pooling layers.

29 / 122

CNN Components (2/2)

I A CNN is composed of a stack of convolutional modules.

I Each module consists of a convolutional layer followed by a pooling layer.

I The last module is followed by one or more dense layers that perform classification.

I The final dense layer contains a single node for each target class in the model, with a softmax activation function.

30 / 122

Convolutional Layer

31 / 122

Convolutional Layer (1/4)

I Sparse interactions

I Each neuron in the convolutional layers is only connected to pixels in its receptive field (not to every single pixel).

32 / 122

Convolutional Layer (2/4)

I Each neuron applies filters on its receptive field.

• Calculates a weighted sum of the input pixels in the receptive field.

I Adds a bias, and feeds the result through its activation function to the next layer.

I The output of this layer is a feature map (activation map).

33 / 122

Convolutional Layer (3/4)

I Parameter sharing

I All neurons of a convolutional layer reuse the same weights.

I They apply the same filter in different positions.

I Whereas in a fully-connected network, each neuron has its own set of weights.

34 / 122

Convolutional Layer (4/4)

I Assume the filter size (kernel size) is fw × fh.
• fh and fw are the height and width of the receptive field, respectively.

I A neuron in row i and column j of a given layer is connected to the outputs of the neurons in the previous layer in rows i to i + fh − 1, and columns j to j + fw − 1.

35 / 122

Padding

I What will happen if you apply a 5x5 filter to a 32x32 input volume?
• The output volume would be 28x28.
• The spatial dimensions decrease.

I Zero padding: in order for a layer to have the same height and width as the previous layer, it is common to add zeros around the inputs.

I In TensorFlow, the padding argument can be either SAME (add zero padding) or VALID (no padding); the arithmetic is sketched below.
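The spatial arithmetic on this slide can be packed into one small helper (a sketch of the standard output-size formula):

def conv_output_size(n, f, stride=1, pad=0):
    # Output width/height for input size n, filter size f.
    return (n + 2 * pad - f) // stride + 1

print(conv_output_size(32, 5))          # VALID (no padding): 28
print(conv_output_size(32, 5, pad=2))   # SAME-style zero padding: 32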

36 / 122

Stride (1/2)

I The distance between two consecutive receptive fields is called the stride.

I The stride controls how the filter convolves around the input volume.

I Assume sh and sw are the vertical and horizontal strides. Then, a neuron located in row i and column j of a layer is connected to the outputs of the neurons in the previous layer located in rows i × sh to i × sh + fh − 1, and columns j × sw to j × sw + fw − 1.
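Written out as code, the index bookkeeping in the bullet above becomes (a sketch of the formula only):

def receptive_field(i, j, fh, fw, sh=1, sw=1):
    # Input rows and columns seen by the neuron at (i, j) of the next layer.
    rows = list(range(i * sh, i * sh + fh))
    cols = list(range(j * sw, j * sw + fw))
    return rows, cols

print(receptive_field(1, 2, fh=3, fw=3, sh=2, sw=2))
# ([2, 3, 4], [4, 5, 6]): rows i*sh .. i*sh + fh - 1, cols j*sw .. j*sw + fw - 1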

37 / 122

Stride (2/2)

38 / 122

Stacking Multiple Feature Maps

I Up to now, we represented each convolutional layer with a single feature map.

I Each convolutional layer can be composed of several feature maps of equal sizes.

I Input images are also composed of multiple sublayers: one per color channel.

I A convolutional layer simultaneously applies multiple filters to its inputs.

39 / 122

Activation Function

I After calculating a weighted sum of the input pixels in the receptive fields, and adding biases, each neuron feeds the result through its ReLU activation function to the next layer.

I The purpose of this activation function is to add non-linearity to the system.

40 / 122

Pooling Layer

41 / 122

Pooling Layer (1/2)

I After the activation functions, we can apply a pooling layer.

I Its goal is to subsample (shrink) the input image.

• To reduce the computational load, the memory usage, and the number of parameters.

42 / 122

Pooling Layer (2/2)

I Each neuron in a pooling layer is connected to the outputs of a receptive field in theprevious layer.

I A pooling neuron has no weights.

I It aggregates the inputs using an aggregation function such as the max or mean.
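A NumPy sketch of 2x2 max pooling with stride 2 (assuming even height and width; this reshape trick is one way to write it):

import numpy as np

def max_pool_2x2(x):
    # Keep the max of each non-overlapping 2x2 block; no weights involved.
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
print(max_pool_2x2(x))   # [[ 5  7]
                         #  [13 15]]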

43 / 122

Fully Connected Layer

44 / 122

Fully Connected Layer

I This layer takes an input from the last convolution module, and outputs an N-dimensional vector.

• N is the number of classes that the model has to choose from.

I For example, if you wanted a digit classification model, N would be 10.

I Each number in this N-dimensional vector represents the probability of a certain class.

45 / 122

Flattening

I We need to convert the output of the convolutional part of the CNN into a 1D feature vector.

I This operation is called flattening.

I It takes the output of the convolutional layers and flattens all its structure to create a single long feature vector to be used by the dense layer for the final classification (a sketch follows).
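In the same TF 1.x style as the code later in these slides, flattening plus the dense layers might look like this (the 7x7x64 shape is an assumption matching the MNIST architecture described in the next section):

import tensorflow as tf

pool2 = tf.placeholder(tf.float32, [None, 7, 7, 64])  # output of the last conv module
flat = tf.reshape(pool2, [-1, 7 * 7 * 64])            # flattening: one long vector per image
dense = tf.layers.dense(flat, 1024, activation=tf.nn.relu)
logits = tf.layers.dense(dense, 10)                   # one node per target class
y_proba = tf.nn.softmax(logits)                       # class probabilities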

46 / 122

Example

47 / 122

A Toy ConvNet: X’s and O’s

48 / 122

For Example

49 / 122

Trickier Cases

50 / 122

Deciding is Hard

51 / 122

What Computers See

52 / 122

Computers are Literal

53 / 122

ConvNets Match Pieces of the Image

54 / 122

Filters Match Pieces of the Image

55 / 122

Filters Match Pieces of the Image

56 / 122

Filters Match Pieces of the Image

57 / 122

Filters Match Pieces of the Image

58 / 122

Filters Match Pieces of the Image

59 / 122

Filters Match Pieces of the Image

60 / 122

Filtering: The Math Behind the Match

61 / 122

Filtering: The Math Behind the Match

62 / 122

Filtering: The Math Behind the Match

63 / 122

Filtering: The Math Behind the Match

64 / 122

Filtering: The Math Behind the Match

65 / 122

Filtering: The Math Behind the Match

66 / 122

Filtering: The Math Behind the Match

67 / 122

Filtering: The Math Behind the Match

68 / 122

Filtering: The Math Behind the Match

69 / 122

Filtering: The Math Behind the Match

70 / 122

Filtering: The Math Behind the Match

71 / 122

Filtering: The Math Behind the Match

72 / 122

Filtering: The Math Behind the Match

73 / 122

Filtering: The Math Behind the Match

74 / 122

Filtering: The Math Behind the Match

75 / 122

Convolution: Trying Every Possible Match

76 / 122

Three Filters Here, So Three Images Out

77 / 122

Convolution Layer

I One image becomes a stack of filtered images.

78 / 122

Rectified Linear Units (ReLUs)

79 / 122

Rectified Linear Units (ReLUs)

80 / 122

Rectified Linear Units (ReLUs)

81 / 122

Rectified Linear Units (ReLUs)

82 / 122

ReLU Layer

I A stack of images becomes a stack of images with no negative values.

83 / 122

Pooling: Shrinking the Image Stack

84 / 122

Pooling: Shrinking the Image Stack

85 / 122

Pooling: Shrinking the Image Stack

86 / 122

Pooling: Shrinking the Image Stack

87 / 122

Pooling: Shrinking the Image Stack

88 / 122

Pooling: Shrinking the Image Stack

89 / 122

Repeat For All the Filtered Images

90 / 122

Layers Get Stacked

I The output of one becomes the input of the next.

91 / 122

Deep Stacking

92 / 122

Fully Connected Layer

I Flattening the outputs before giving them to the fully connected layer.

93 / 122

Fully Connected Layer

94 / 122

Fully Connected Layer

95 / 122

Fully Connected Layer

96 / 122

Fully Connected Layer

97 / 122

Fully Connected Layer

98 / 122

Fully Connected Layer

99 / 122

Fully Connected Layer

100 / 122

Fully Connected Layer

101 / 122

Putting It All Together

102 / 122

103 / 122

CNN in TensorFlow

104 / 122

CNN in TensorFlow (1/8)

I A CNN for the MNIST dataset with the following architecture:

I Conv. layer 1: computes 32 feature maps using a 5x5 filter with ReLU activation.

I Pooling layer 1: max pooling layer with a 2x2 filter and stride of 2.

I Conv. layer 2: computes 64 feature maps using a 5x5 filter.

I Pooling layer 2: max pooling layer with a 2x2 filter and stride of 2.

I Dense layer: densely connected layer with 1024 neurons.

I Logits layer

105 / 122


CNN in TensorFlow (2/8)

I Conv. layer 1: computes 32 feature maps using a 5x5 filter with ReLU activation.

I Input tensor shape: [batch size, 28, 28, 1]

I Output tensor shape: [batch size, 28, 28, 32]

I Padding "same" is used to preserve the input width and height.

import tensorflow as tf

# MNIST images are 28x28 pixels, and have one color channel
X = tf.placeholder(tf.float32, [None, 28, 28, 1])

y_true = tf.placeholder(tf.float32, [None, 10])

conv1 = tf.layers.conv2d(inputs=X, filters=32, kernel_size=[5, 5], padding="same",
                         activation=tf.nn.relu)
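
The shapes above follow from TensorFlow's output-size arithmetic; a small sketch of the two padding modes (the formulas are TensorFlow's conventions, the numbers are this layer's):

import math

out_same = math.ceil(28 / 1)             # "same" padding: ceil(in / stride) = 28
out_valid = math.ceil((28 - 5 + 1) / 1)  # "valid" padding: ceil((in - k + 1) / stride) = 24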

106 / 122


CNN in TensorFlow (3/8)

I Pooling layer 1: max pooling layer with a 2x2 filter and stride of 2.

I Input tensor shape: [batch size, 28, 28, 32]

I Output tensor shape: [batch size, 14, 14, 32]

pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
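
The drop from 28 to 14 follows from the pooling arithmetic; max_pooling2d defaults to "valid" padding, so:

out_pool = (28 - 2) // 2 + 1  # floor((in - pool) / stride) + 1 = 14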

107 / 122


CNN in TensorFlow (4/8)

I Conv. layer 2: computes 64 feature maps using a 5x5 filter with ReLU activation.

I Input tensor shape: [batch size, 14, 14, 32]

I Output tensor shape: [batch size, 14, 14, 64]

I Padding "same" is used to preserve the input width and height.

conv2 = tf.layers.conv2d(inputs=pool1, filters=64, kernel_size=[5, 5], padding="same",
                         activation=tf.nn.relu)

108 / 122


CNN in TensorFlow (5/8)

I Pooling layer 2: max pooling layer with a 2x2 filter and stride of 2.

I Input tensor shape: [batch size, 14, 14, 64]

I Output tensor shape: [batch size, 7, 7, 64]

pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

109 / 122


CNN in TensorFlow (6/8)

I Flatten the tensor into a batch of vectors.

• Input tensor shape: [batch size, 7, 7, 64]
• Output tensor shape: [batch size, 7 * 7 * 64]

pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 64])  # -1 lets TensorFlow infer the batch dimension

I Dense layer: densely connected layer with 1024 neurons.

• Input tensor shape: [batch size, 7 * 7 * 64]
• Output tensor shape: [batch size, 1024]

dense = tf.layers.dense(inputs=pool2_flat, units=1024, activation=tf.nn.relu)
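
As an aside (assuming TensorFlow >= 1.4; not shown on the slide), the reshape above can also be written with tf.layers.flatten, which infers 7 * 7 * 64 from the static shape:

pool2_flat = tf.layers.flatten(pool2)  # equivalent to the explicit tf.reshape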

110 / 122


CNN in TensorFlow (7/8)

I Add a dropout operation: each element is kept with probability 0.6 (rate = 0.4 is the fraction dropped).

dropout = tf.layers.dropout(inputs=dense, rate=0.4)  # inactive unless training=True (see the sketch below)
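
One subtlety: tf.layers.dropout only drops units when its training argument is True, and it defaults to False, so the line above is a no-op as written. A minimal sketch of the usual fix (the training placeholder is an addition, not on the slide):

training = tf.placeholder_with_default(False, shape=())
dropout = tf.layers.dropout(inputs=dense, rate=0.4, training=training)
# feed {training: True} in sess.run during training steps; omit it for inference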

I Logits layer

• Input tensor shape: [batch size, 1024]
• Output tensor shape: [batch size, 10]

logits = tf.layers.dense(inputs=dropout, units=10)

111 / 122


CNN in TensorFlow (8/8)

# define the cost and accuracy functions
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_true)
cross_entropy = tf.reduce_mean(cross_entropy) * 100  # mean scaled by 100, i.e., the sum over a 100-image batch

# define the optimizer
lr = 0.003
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)

# execute the model
init = tf.global_variables_initializer()
n_steps = 2000  # training steps, each on one mini-batch of 100 images

with tf.Session() as sess:
    sess.run(init)
    for i in range(n_steps):
        # assumes MNIST was loaded with reshape=False, so batch_X has shape [100, 28, 28, 1]
        batch_X, batch_y = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={X: batch_X, y_true: batch_y})
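
The accuracy the first comment promises is never defined on the slide; a minimal sketch, continuing from the graph above (these two lines belong before the Session block):

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_true, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
# e.g., acc = sess.run(accuracy, feed_dict={X: test_images, y_true: test_labels})
# where test_images must have shape [N, 28, 28, 1]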

112 / 122


Training CNNs

114 / 122


Training CNN (1/4)

I Let’s see how to use backpropagation on a single convolutional layer.

I Assume we have an input X of size 3x3 and a single filter W of size 2x2.

I No padding and stride = 1.

I It generates an output H of size 2x2.

115 / 122


Training CNN (2/4)

I Forward pass

h11 = W11X11 + W12X12 + W21X21 + W22X22

h12 = W11X12 + W12X13 + W21X22 + W22X23

h21 = W11X21 + W12X22 + W21X31 + W22X32

h22 = W11X22 + W12X23 + W21X32 + W22X33
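
To make the indexing concrete, here is a minimal NumPy sketch of this forward pass (the numeric values are invented; only the shapes come from the slide):

import numpy as np

X = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])  # 3x3 input
W = np.array([[1., -1.],
              [0.5, 2.]])     # 2x2 filter

H = np.zeros((2, 2))          # 2x2 output: stride 1, no padding
for i in range(2):
    for j in range(2):
        # h_ij = sum of the elementwise product of W with the 2x2 window of X at (i, j)
        H[i, j] = np.sum(W * X[i:i+2, j:j+2])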

116 / 122

Page 178: Convolutional Neural Networks · Convolutional Neural Networks Amir H. Payberah payberah@kth.se 05/12/2018. The Course Web Page 1/122. Where Are We? 2/122. Where Are We? 3/122. Let’s

Training CNN (3/4)

I Backward pass

I E is the total error, accumulated over the four outputs: E = Eh11 + Eh12 + Eh21 + Eh22

I Applying the chain rule, each weight receives gradient contributions from all four outputs:

∂E/∂W11 = (∂Eh11/∂h11)(∂h11/∂W11) + (∂Eh12/∂h12)(∂h12/∂W11) + (∂Eh21/∂h21)(∂h21/∂W11) + (∂Eh22/∂h22)(∂h22/∂W11)

∂E/∂W12 = (∂Eh11/∂h11)(∂h11/∂W12) + (∂Eh12/∂h12)(∂h12/∂W12) + (∂Eh21/∂h21)(∂h21/∂W12) + (∂Eh22/∂h22)(∂h22/∂W12)

∂E/∂W21 = (∂Eh11/∂h11)(∂h11/∂W21) + (∂Eh12/∂h12)(∂h12/∂W21) + (∂Eh21/∂h21)(∂h21/∂W21) + (∂Eh22/∂h22)(∂h22/∂W21)

∂E/∂W22 = (∂Eh11/∂h11)(∂h11/∂W22) + (∂Eh12/∂h12)(∂h12/∂W22) + (∂Eh21/∂h21)(∂h21/∂W22) + (∂Eh22/∂h22)(∂h22/∂W22)

I From the forward pass, ∂h11/∂W11 = X11, ∂h12/∂W11 = X12, ∂h21/∂W11 = X21, and ∂h22/∂W11 = X22: each partial is simply the input value that weight multiplied.
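
Because each partial ∂h_ij/∂W_ab is just the input value under that filter tap, the four sums above collapse into a compact computation. Continuing the NumPy sketch (dH holds invented upstream gradients ∂E/∂h_ij):

dH = np.array([[0.1, -0.2],
               [0.3, 0.4]])  # stand-ins for dE/dh_ij

dW = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # each output gradient contributes its 2x2 input window, scaled by dE/dh_ij
        dW += dH[i, j] * X[i:i+2, j:j+2]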

117 / 122


Training CNN (4/4)

I Update the weights W, where η is the learning rate:

W(next)11 = W11 − η ∂E/∂W11

W(next)12 = W12 − η ∂E/∂W12

W(next)21 = W21 − η ∂E/∂W21

W(next)22 = W22 − η ∂E/∂W22
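
Continuing the NumPy sketch, all four updates are one array operation (η is an invented example value):

eta = 0.01        # learning rate
W = W - eta * dW  # W(next) = W - η * dE/dW, for all four weights at once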

118 / 122


Summary

119 / 122


Summary

I Receptive fields and filters

I Convolution operation

I Padding and strides

I Pooling layer

I Flattening, dropout, dense

120 / 122


Reference

I TensorFlow and Deep Learning without a PhD, https://codelabs.developers.google.com/codelabs/cloud-tensorflow-mnist

I Ian Goodfellow et al., Deep Learning (Ch. 9)

I Aurélien Géron, Hands-On Machine Learning (Ch. 13)

121 / 122


Questions?

122 / 122