
• Convolutional Neural Networks

Amir H. Payberah payberah@kth.se

05/12/2018

• The Course Web Page

https://id2223kth.github.io

1 / 122

• Where Are We?

2 / 122

• Where Are We?

3 / 122

• Let’s Start With An Example

4 / 122

• MNIST Dataset

- Handwritten digits in the MNIST dataset are 28×28-pixel greyscale images.

5 / 122
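For readers following along in code, the dataset can be loaded in one line. A hedged sketch using the Keras dataset loader (the slides themselves do not show this call):

```python
# Load MNIST and confirm the shapes described above.
# Assumes TensorFlow/Keras is installed.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28): 60,000 greyscale 28x28 images
print(y_train[:10])   # integer labels 0-9
```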

• One-Layer Network For Classifying MNIST (1/4)

- Let's make a one-layer neural network for classifying digits.

- Each neuron in a neural network:
  • Does a weighted sum of all of its inputs
  • Adds a bias
  • Feeds the result through some non-linear activation function, e.g., softmax.

6 / 122

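As a minimal sketch of those three steps, here is a NumPy version of one layer of ten such neurons; the names x, W, and b are illustrative, not taken from the slides' code:

```python
# One layer of 10 neurons over a single flattened MNIST image.
import numpy as np

def softmax(z):
    z = z - z.max()               # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

x = np.random.rand(784)              # one flattened 28x28 image
W = np.random.randn(784, 10) * 0.01  # one column of weights per neuron
b = np.zeros(10)

logits = x @ W + b                # weighted sum of all inputs, plus bias
y_hat = softmax(logits)           # non-linear activation: 10 class probabilities
```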

• One-Layer Network For Classifying MNIST (2/4)

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

7 / 122

• One-Layer Network For Classifying MNIST (3/4)

- Assume we have a batch of 100 images as the input.

- Using the first column of the weights matrix W, we compute the weighted sum of all the pixels of the first image.
  • The first neuron: L_{0,0} = w_{0,0} x_0^{(1)} + w_{1,0} x_1^{(1)} + ⋯ + w_{783,0} x_{783}^{(1)}
  • The 2nd neuron through the 10th:
    L_{0,1} = w_{0,1} x_0^{(1)} + w_{1,1} x_1^{(1)} + ⋯ + w_{783,1} x_{783}^{(1)}
    ⋯
    L_{0,9} = w_{0,9} x_0^{(1)} + w_{1,9} x_1^{(1)} + ⋯ + w_{783,9} x_{783}^{(1)}
  • Repeat the operation for the other 99 images, i.e., x^{(2)} ⋯ x^{(100)}

8 / 122

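In matrix form, this whole slide collapses into one multiplication of the 100×784 batch by the 784×10 weight matrix. A small NumPy illustration (shapes per the slide, variable names mine):

```python
import numpy as np

X = np.random.rand(100, 784)   # batch of 100 flattened images, one per row
W = np.random.randn(784, 10)   # weight matrix: one column per output neuron

L = X @ W                      # (100, 10): row i holds L_{i,0} ... L_{i,9}
assert L.shape == (100, 10)
```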

• One-Layer Network For Classifying MNIST (4/4)

- Each neuron must now add its bias.

- Apply the softmax activation function to each instance x^{(i)}.

- For each input instance x^{(i)}: L_i = [L_{i,0}, L_{i,1}, …, L_{i,9}]^T

- ŷ_i = softmax(L_i + b)

9 / 122
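A hedged NumPy sketch of the complete forward pass just described, with a row-wise softmax over the batch; variable names are again illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # stabilize each row
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

X = np.random.rand(100, 784)   # batch of flattened images
W = np.random.randn(784, 10)
b = np.zeros(10)

y_hat = softmax(X @ W + b)     # (100, 10): rows of class probabilities
assert np.allclose(y_hat.sum(axis=1), 1.0)
```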

• How Good Are the Predictions?

- Define the cost function J(W) as the cross-entropy between what the network tells us (ŷ_i) and what we know to be the truth (y_i), for each instance x^{(i)}.

- Compute the partial derivatives of the cross-entropy with respect to all the weights and all the biases, ∇_W J(W).

- Update the weights and biases by a fraction of the gradient: W^{(next)} = W − η ∇_W J(W)

10 / 122

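Putting the three bullets together, a minimal NumPy sketch of one training step, assuming softmax with cross-entropy (for which the gradient of the loss at the logits is (ŷ − y)/m); all names and the learning rate are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

m = 100
X = np.random.rand(m, 784)                    # batch of flattened images
Y = np.eye(10)[np.random.randint(0, 10, m)]   # one-hot true labels y_i
W = np.random.randn(784, 10) * 0.01
b = np.zeros(10)
eta = 0.5                                     # learning rate η (illustrative)

y_hat = softmax(X @ W + b)                              # forward pass
loss = -(Y * np.log(y_hat + 1e-9)).sum(axis=1).mean()   # cross-entropy J(W)

dZ = (y_hat - Y) / m          # gradient of J at the logits (softmax + cross-entropy)
W = W - eta * (X.T @ dZ)      # W(next) = W - η ∇_W J(W)
b = b - eta * dZ.sum(axis=0)
```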

• Adding More Layers

- Add more layers to improve the accuracy.

- On intermediate layers we will use the sigmoid activation function.

- We keep softmax as the activation function on the last layer.

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

11 / 122
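A hedged Keras sketch of such a deeper network, with sigmoid on the intermediate layers and softmax on the last; the layer sizes are my own choice, not the slides':

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense

model = Sequential([
    Input(shape=(784,)),               # flattened 28x28 images
    Dense(200, activation="sigmoid"),  # intermediate layers: sigmoid
    Dense(100, activation="sigmoid"),
    Dense(10, activation="softmax"),   # last layer: softmax over 10 digits
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
```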

• Some Improvements

- Better activation function, e.g., using ReLU(z) = max(0, z).

- Overcome network overfitting, e.g., using dropout.

- Better network initialization, e.g., using He initialization.

- Better optimizer, e.g., using the Adam optimizer.

[https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

12 / 122
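The same sketch with the four improvements applied: ReLU activations, dropout, He initialization, and the Adam optimizer. The dropout rate and layer sizes are again illustrative:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

model = Sequential([
    Input(shape=(784,)),
    Dense(200, activation="relu", kernel_initializer="he_normal"),
    Dropout(0.25),                     # randomly drop units during training
    Dense(100, activation="relu", kernel_initializer="he_normal"),
    Dropout(0.25),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```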

• Vanilla Deep Neural Networks Challenges (1/2)

- Pixels of each image were flattened into a single vector (a really bad idea).

- Vanilla deep neural networks do not scale.
  • In MNIST, images are black-and-white 28×28-pixel images: 28 × 28 = 784 weights per neuron.

- Handwritten digits are made of shapes, and we discarded the shape information when we flattened the pixels.

13 / 122


• Vanilla Deep Neural Networks Challenges (2/2)

- It is difficult to recognize objects under variations such as:
  • Rotation.
  • Lighting: objects may look different depending on the level of external lighting.
  • Deformation: objects can be deformed in a variety of non-affine ways.
  • Scale variation: visual classes often exhibit variation in their size.
  • Viewpoint: recognition should be invariant to the viewpoint.

14 / 122


• Tackle the Challenges

- Convolutional neural networks (CNNs) can tackle the vanilla model challenges.

- A CNN is a type of neural network that can take advantage of shape information.

- It applies a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification.

15 / 122
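As a taste of what this looks like in code, a minimal Keras sketch of a CNN over the raw 28×28 pixels; the architecture is illustrative, not the one developed later in the deck:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Input(shape=(28, 28, 1)),                      # keep the 2D shape: no flattening
    Conv2D(32, kernel_size=3, activation="relu"),  # a bank of 3x3 filters
    MaxPooling2D(pool_size=2),
    Conv2D(64, kernel_size=3, activation="relu"),  # higher-level features
    MaxPooling2D(pool_size=2),
    Flatten(),
    Dense(10, activation="softmax"),               # classify from learned features
])
```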
