
  • Convolutional Neural Networks

    Amir H. Payberah payberah@kth.se

    05/12/2018

  • The Course Web Page

    https://id2223kth.github.io

    1 / 122

  • Where Are We?

    2 / 122


  • Let’s Start With An Example

    4 / 122

  • MNIST Dataset

    ▶ Handwritten digits in the MNIST dataset are 28x28 pixel greyscale images.

    5 / 122
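For concreteness, a minimal sketch of loading MNIST with tf.keras (the slides link to a TensorFlow tutorial, but this particular call and the rescaling step are an illustration, not part of the deck):

```python
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images,
# each a 28x28 greyscale image with an integer label in 0..9.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixel intensities from [0, 255] to [0, 1] before feeding the network.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

print(x_train.shape)  # (60000, 28, 28)
```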

  • One-Layer Network For Classifying MNIST (1/4)

    ▶ Let's make a one-layer neural network for classifying digits (see the code sketch below).

    ▶ Each neuron in a neural network:
      • computes a weighted sum of all of its inputs,
      • adds a bias,
      • feeds the result through a non-linear activation function, e.g., softmax.

    6 / 122
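As a sketch of this slide, the one-layer network can be written in tf.keras as a single dense layer of 10 neurons with softmax activation (the Sequential/Flatten/Dense layer choices are assumed here, not prescribed by the slides):

```python
import tensorflow as tf

# One-layer classifier: flatten the 28x28 pixels into a 784-vector, then one
# dense layer with 10 neurons (one per digit class) and softmax activation.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```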


  • One-Layer Network For Classifying MNIST (2/4)

    [https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

    7 / 122

  • One-Layer Network For Classifying MNIST (3/4)

    ▶ Assume we have a batch of 100 images as the input.

    ▶ Using the first column of the weights matrix W, we compute the weighted sum of all the pixels of the first image.

    • The first neuron: $L_{0,0} = w_{0,0}\,x^{(1)}_{0} + w_{1,0}\,x^{(1)}_{1} + \cdots + w_{783,0}\,x^{(1)}_{783}$

    • The 2nd neuron until the 10th:
      $L_{0,1} = w_{0,1}\,x^{(1)}_{0} + w_{1,1}\,x^{(1)}_{1} + \cdots + w_{783,1}\,x^{(1)}_{783}$
      $\vdots$
      $L_{0,9} = w_{0,9}\,x^{(1)}_{0} + w_{1,9}\,x^{(1)}_{1} + \cdots + w_{783,9}\,x^{(1)}_{783}$

    • Repeat the operation for the other 99 images, i.e., $x^{(2)}, \ldots, x^{(100)}$.

    8 / 122
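All of these weighted sums amount to one matrix product: stacking the 100 flattened images into a 100x784 matrix X and the weights into a 784x10 matrix W gives L = XW, where L[i, j] is the weighted sum of neuron j for image i. A small NumPy sketch, with random data standing in for real images:

```python
import numpy as np

rng = np.random.default_rng(0)

# A batch of 100 flattened 28x28 images (100 x 784) and the weight matrix (784 x 10).
X = rng.random((100, 784), dtype=np.float32)
W = rng.standard_normal((784, 10)).astype(np.float32)

# One matrix product computes every weighted sum at once:
# L[i, j] = sum_k X[i, k] * W[k, j], i.e., neuron j applied to image i.
L = X @ W
print(L.shape)  # (100, 10)
```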


  • One-Layer Network For Classifying MNIST (4/4)

    ▶ Each neuron must now add its bias.

    ▶ Apply the softmax activation function for each instance $x^{(i)}$.

    ▶ For each input instance $x^{(i)}$: $L_i = [L_{i,0}, L_{i,1}, \ldots, L_{i,9}]^\top$

    ▶ $\hat{y}_i = \mathrm{softmax}(L_i + b)$

    9 / 122
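Continuing the sketch: the bias is a vector of 10 values added by broadcasting to every row of L, and softmax turns each row into a probability distribution over the 10 classes. The softmax helper below is a minimal hand-written implementation, not from the slides:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax along the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
L = rng.standard_normal((100, 10)).astype(np.float32)  # weighted sums for a batch
b = np.zeros(10, dtype=np.float32)                      # one bias per neuron

# Broadcasting adds the same bias vector to every row; each row of y_hat sums to 1.
y_hat = softmax(L + b)
print(y_hat.shape, y_hat[0].sum())  # (100, 10) 1.0
```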

  • How Good Are the Predictions?

    ▶ Define the cost function $J(W)$ as the cross-entropy between what the network tells us ($\hat{y}_i$) and what we know to be the truth ($y_i$), for each instance $x^{(i)}$.

    ▶ Compute the partial derivatives of the cross-entropy with respect to all the weights and all the biases, $\nabla_W J(W)$.

    ▶ Update weights and biases by a fraction of the gradient: $W^{(\text{next})} = W - \eta \nabla_W J(W)$ (a full step is sketched below).

    10 / 122
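Putting the slide together as code: one gradient-descent step computes the cross-entropy cost, its gradient with respect to W and b, and moves the parameters a fraction η of the gradient. This NumPy sketch uses the closed-form gradient of softmax plus cross-entropy; the function name and learning rate are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_step(X, y_true, W, b, eta=0.003):
    """One gradient-descent step for the one-layer softmax classifier.

    X: (batch, 784), y_true: (batch, 10) one-hot labels, W: (784, 10), b: (10,).
    """
    y_hat = softmax(X @ W + b)
    # Cross-entropy cost J(W), averaged over the batch.
    J = -np.mean(np.sum(y_true * np.log(y_hat + 1e-9), axis=1))
    # For softmax + cross-entropy, dJ/d(logits) = (y_hat - y_true) / batch_size.
    grad_logits = (y_hat - y_true) / X.shape[0]
    grad_W = X.T @ grad_logits        # gradient of J w.r.t. W
    grad_b = grad_logits.sum(axis=0)  # gradient of J w.r.t. b
    # Move a fraction eta of the gradient: W_next = W - eta * grad_W.
    return W - eta * grad_W, b - eta * grad_b, J
```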


  • Adding More Layers

    ▶ Add more layers to improve the accuracy (a sketch follows below).

    ▶ On intermediate layers we will use the sigmoid activation function.

    ▶ We keep softmax as the activation function on the last layer.

    [https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

    11 / 122
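A sketch of such a deeper network in tf.keras; the intermediate layer sizes are illustrative, not prescribed by the slides:

```python
import tensorflow as tf

# Fully-connected network: sigmoid on the intermediate layers,
# softmax on the last layer.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, activation="sigmoid"),
    tf.keras.layers.Dense(100, activation="sigmoid"),
    tf.keras.layers.Dense(60, activation="sigmoid"),
    tf.keras.layers.Dense(30, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```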

  • Some Improvements

    ▶ Better activation function, e.g., using ReLU(z) = max(0, z).

    ▶ Overcome network overfitting, e.g., using dropout.

    ▶ Better network initialization, e.g., using He initialization.

    ▶ Better optimizer, e.g., using the Adam optimizer (all four are combined in the sketch below).

    [https://github.com/GoogleCloudPlatform/tensorflow-without-a-phd]

    12 / 122
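The same network with the improvements applied, again as a tf.keras sketch (the dropout rate and layer sizes are assumptions for illustration):

```python
import tensorflow as tf

# ReLU activations, He initialization, dropout against overfitting, Adam optimizer.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(200, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(100, activation="relu", kernel_initializer="he_normal"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```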

  • Vanilla Deep Neural Networks Challenges (1/2)

    ▶ Pixels of each image were flattened into a single vector (a really bad idea).

    ▶ Vanilla deep neural networks do not scale.
      • In MNIST, images are black-and-white 28x28 pixel images: 28 × 28 = 784 weights per neuron.

    ▶ Handwritten digits are made of shapes, and we discarded the shape information when we flattened the pixels.

    13 / 122


  • Vanilla Deep Neural Networks Challenges (2/2)

    ▶ It is difficult to recognize objects:

    ▶ Rotation: the same object can appear rotated in the image.

    ▶ Lighting: objects may look different depending on the level of external lighting.

    ▶ Deformation: objects can be deformed in a variety of non-affine ways.

    ▶ Scale variation: visual classes often exhibit variation in their size.

    ▶ Viewpoint variation: the same object can look different from different viewpoints.

    14 / 122


  • Tackle the Challenges

    ▶ Convolutional neural networks (CNNs) can tackle the vanilla model challenges.

    ▶ A CNN is a type of neural network that can take advantage of shape information.

    ▶ It applies a series of filters to the raw pixel data of an image to extract and learn higher-level features, which the model can then use for classification (a minimal example follows).

    15 / 122
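To make the idea concrete, a minimal CNN for MNIST in tf.keras: stacked convolutional filters learn local shape features, pooling reduces resolution, and a dense softmax layer does the classification. The filter counts and kernel sizes are illustrative, not taken from the slides:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),  # greyscale image, 1 channel
    tf.keras.layers.Conv2D(32, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(64, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```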
