Backpropagation and Lecture 4: Neural · PDF file1 Lecture 4: Backpropagation and ... Serena...

Click here to load reader

  • date post

    17-May-2018
  • Category

    Documents

  • view

    214
  • download

    0

Embed Size (px)

Transcript of Backpropagation and Lecture 4: Neural · PDF file1 Lecture 4: Backpropagation and ... Serena...

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20171

    Lecture 4:Backpropagation and

    Neural Networks

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017

    Administrative

    Assignment 1 due Thursday April 20, 11:59pm on Canvas

    2

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 3 - April 11, 2017

    Administrative

    Project: TA specialities and some project ideas are posted on Piazza

    3

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017

    Administrative

    Google Cloud: All registered students will receive an email this week with instructions on how to redeem $100 in credits

    4

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20175

    want

    scores function

    SVM loss

    data loss + regularization

    Where we are...

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20176

    Optimization

    Landscape image is CC0 1.0 public domainWalking man image is CC0 1.0 public domain

    http://maxpixel.freegreatpicture.com/Mountains-Valleys-Landscape-Hills-Grass-Green-699369https://creativecommons.org/publicdomain/zero/1.0/http://maxpixel.freegreatpicture.com/Mountains-Valleys-Landscape-Hills-Grass-Green-699369http://www.publicdomainpictures.net/view-image.php?image=139314&picture=walking-manhttps://creativecommons.org/publicdomain/zero/1.0/http://www.publicdomainpictures.net/view-image.php?image=139314&picture=walking-man

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20177

    Numerical gradient: slow :(, approximate :(, easy to write :)Analytic gradient: fast :), exact :), error-prone :(

    In practice: Derive analytic gradient, check your implementation with numerical gradient

    Gradient descent

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20178

    x

    W

    hinge loss

    R

    + Ls (scores)

    Computational graphs

    *

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 20179

    input image

    loss

    weights

    Convolutional network(AlexNet)

    Figure copyright Alex Krizhevsky, Ilya Sutskever, and

    Geoffrey Hinton, 2012. Reproduced with permission.

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201710

    Neural Turing Machine

    Figure reproduced with permission from a Twitter post by Andrej Karpathy.

    input image

    loss

    https://twitter.com/karpathy/status/597631909930242048?lang=en

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017

    Neural Turing Machine

    Figure reproduced with permission from a Twitter post by Andrej Karpathy.

    https://twitter.com/karpathy/status/597631909930242048?lang=en

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201712

    e.g. x = -2, y = 5, z = -4

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201713

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201714

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201715

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201716

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201717

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201718

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201719

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201720

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201721

    Chain rule:

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201722

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201723

    Chain rule:

    e.g. x = -2, y = 5, z = -4

    Want:

    Backpropagation: a simple example

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201724

    f

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201725

    f

    local gradient

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201726

    f

    local gradient

    gradients

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201727

    f

    local gradient

    gradients

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201728

    f

    local gradient

    gradients

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201729

    f

    local gradient

    gradients

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201730

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201731

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201732

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201733

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201734

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201735

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201736

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201737

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201738

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201739

    Another example:

    (-1) * (-0.20) = 0.20

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201740

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201741

    Another example:

    [local gradient] x [upstream gradient][1] x [0.2] = 0.2[1] x [0.2] = 0.2 (both inputs!)

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201742

    Another example:

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201743

    Another example:

    [local gradient] x [upstream gradient]x0: [2] x [0.2] = 0.4w0: [-1] x [0.2] = -0.2

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201744

    sigmoid function

    sigmoid gate

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 2017

    sigmoid gate

    45

    sigmoid function

    (0.73) * (1 - 0.73) = 0.2

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201746

    add gate: gradient distributor

    Patterns in backward flow

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201747

    add gate: gradient distributor

    Patterns in backward flow

    Q: What is a max gate?

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201748

    add gate: gradient distributor

    Patterns in backward flow

    max gate: gradient router

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201749

    add gate: gradient distributor

    Patterns in backward flow

    max gate: gradient routerQ: What is a mul gate?

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201750

    add gate: gradient distributor

    Patterns in backward flow

    max gate: gradient routermul gate: gradient switcher

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201751

    +

    Gradients add at branches

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201752

    f

    local gradient

    This is now the Jacobian matrix (derivative of each element of z w.r.t. each element of x)

    (x,y,z are now vectors)

    gradients

    Gradients for vectorized code

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201753

    f(x) = max(0,x)(elementwise)

    4096-d input vector

    4096-d output vector

    Vectorized operations

  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 13, 201754