
  • Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 4 - April 12, 2018

  • Administrative

    Assignment 1 due Wednesday April 18, 11:59pm

    All office hours this week will use queuestatus

  • Where we are...

    Scores function: $s = f(x; W) = Wx$

    SVM loss: $L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + 1)$

    Data loss + regularization: $L = \frac{1}{N} \sum_{i=1}^{N} L_i + \lambda R(W)$

    Want: $\nabla_W L$
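    A minimal plain-Python sketch of this loss: the multiclass SVM data loss averaged over N examples plus $\lambda R(W)$. It assumes $R(W)$ is the squared L2 norm of $W$ and uses a hypothetical toy `W`, `X`, `Y`; the helper names (`scores`, `svm_loss_single`, `l2_reg`, `full_loss`) are illustrative and not taken from the lecture's code.

```python
def scores(W, x):
    """Score function s = f(x; W) = Wx, with W as a list of rows and x as a list."""
    return [sum(w * xd for w, xd in zip(row, x)) for row in W]


def svm_loss_single(s, y):
    """Multiclass SVM loss L_i = sum over j != y of max(0, s_j - s_y + 1)."""
    return sum(max(0.0, sj - s[y] + 1.0) for j, sj in enumerate(s) if j != y)


def l2_reg(W):
    """Assumed regularizer R(W): the sum of squared entries of W."""
    return sum(w * w for row in W for w in row)


def full_loss(W, X, Y, lam):
    """Data loss averaged over the N examples, plus lam * R(W)."""
    data_loss = sum(svm_loss_single(scores(W, x), y) for x, y in zip(X, Y)) / len(X)
    return data_loss + lam * l2_reg(W)


# Hypothetical toy problem: 3 classes, 2 features, 2 examples.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
X = [[1.0, 2.0], [2.0, 0.0]]
Y = [0, 1]
print(full_loss(W, X, Y, lam=0.1))  # 5.9
```

    Here the data loss is (5 + 6) / 2 = 5.5 and $\lambda R(W) = 0.1 \cdot 4 = 0.4$.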

  • Optimization

    [Figure: a walking man on a landscape of hills and valleys. Landscape image is CC0 1.0 public domain; walking man image is CC0 1.0 public domain.]

    http://maxpixel.freegreatpicture.com/Mountains-Valleys-Landscape-Hills-Grass-Green-699369
    https://creativecommons.org/publicdomain/zero/1.0/
    http://www.publicdomainpictures.net/view-image.php?image=139314&picture=walking-man
    https://creativecommons.org/publicdomain/zero/1.0/

  • Gradient descent

    Numerical gradient: slow :(, approximate :(, easy to write :)
    Analytic gradient: fast :), exact :), error-prone :(

    In practice: derive the analytic gradient, then check your implementation against the numerical gradient.
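    The "in practice" advice can be sketched in plain Python: a centered-difference numerical gradient is used to check a hand-derived analytic gradient, and the checked gradient then drives vanilla gradient descent. The toy loss $f(w) = \sum_i (w_i - 3)^2$, the step size, and all function names are assumptions for illustration.

```python
def numerical_gradient(f, w, h=1e-5):
    """Centered-difference estimate of the gradient of f at w (a list of floats)."""
    grad = []
    for i in range(len(w)):
        old = w[i]
        w[i] = old + h
        f_plus = f(w)
        w[i] = old - h
        f_minus = f(w)
        w[i] = old  # restore the original value
        grad.append((f_plus - f_minus) / (2.0 * h))
    return grad


def max_relative_error(a, b):
    """Largest elementwise relative error between two gradient vectors."""
    return max(abs(x - y) / max(1e-8, abs(x) + abs(y)) for x, y in zip(a, b))


# Hypothetical toy loss f(w) = sum_i (w_i - 3)^2 with hand-derived gradient 2 (w_i - 3).
def loss(w):
    return sum((wi - 3.0) ** 2 for wi in w)


def analytic_grad(w):
    return [2.0 * (wi - 3.0) for wi in w]


w = [0.5, -1.0, 4.0]
# Gradient check: the analytic and numerical gradients should agree closely.
err = max_relative_error(analytic_grad(w), numerical_gradient(loss, w))
print(err < 1e-6)  # True

# Vanilla gradient descent using the (now checked) analytic gradient.
step_size = 0.1
for _ in range(100):
    grad = analytic_grad(w)
    w = [wi - step_size * gi for wi, gi in zip(w, grad)]
print([round(wi, 6) for wi in w])  # [3.0, 3.0, 3.0]
```

    The numerical gradient is only used as a test; each check costs two loss evaluations per parameter, which is why the loop itself uses the analytic gradient.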

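  • Computational graphs

    [Figure: computational graph in which inputs x and W feed a multiply node (*) that produces the scores s; s feeds the hinge loss, W also feeds the regularization term R, and a + node sums the two into the loss L.]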
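    As a sketch, the graph for the linear SVM loss can be evaluated node by node and then differentiated in reverse, with a numerical gradient as a check. This assumes a single example, $R(W) = \sum W^2$ weighted by a hypothetical `lam`, and toy values for `W`, `x`, `y`; the function name `svm_graph_forward_backward` is illustrative.

```python
def svm_graph_forward_backward(W, x, y, lam):
    """Forward and backward pass through: x, W -> * -> s -> hinge loss -> + <- R <- W."""
    # Forward pass, one node at a time.
    s = [sum(w * xd for w, xd in zip(row, x)) for row in W]        # * node: s = Wx
    margins = [max(0.0, sj - s[y] + 1.0) if j != y else 0.0 for j, sj in enumerate(s)]
    hinge = sum(margins)                                            # hinge loss node
    R = sum(w * w for row in W for w in row)                        # R node (assumed L2)
    L = hinge + lam * R                                             # + node

    # Backward pass: each node multiplies its upstream gradient by its local gradient.
    d_hinge = 1.0                                                   # dL/d(hinge)
    d_R = lam                                                       # dL/dR
    ds = [d_hinge * (1.0 if m > 0.0 else 0.0) for m in margins]     # active margins pass 1
    ds[y] = -sum(ds)                                                # s_y enters each active margin with -1
    dW = [[dsj * xd for xd in x] for dsj in ds]                     # * node: dL/dW_jd = ds_j * x_d
    for j, row in enumerate(W):
        for d, w in enumerate(row):
            dW[j][d] += d_R * 2.0 * w                               # R node: dR/dW = 2W
    return L, dW


# Hypothetical toy input: 3 classes, 2 features, correct class y = 2.
W = [[0.2, -0.5], [1.5, 1.3], [0.0, 0.25]]
x = [1.0, 2.0]
y, lam = 2, 0.1
L, dW = svm_graph_forward_backward(W, x, y, lam)

# Compare every entry of dW with a centered-difference numerical gradient.
h = 1e-5
max_err = 0.0
for j in range(len(W)):
    for d in range(len(x)):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[j][d] += h
        W_minus[j][d] -= h
        num = (svm_graph_forward_backward(W_plus, x, y, lam)[0]
               - svm_graph_forward_backward(W_minus, x, y, lam)[0]) / (2.0 * h)
        max_err = max(max_err, abs(num - dW[j][d]))
print(max_err < 1e-6)  # True
```

    The hinge node is not differentiable exactly where a margin is zero; the toy values keep every margin away from that point so the numerical check is meaningful.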

  • Convolutional network (AlexNet)

    [Figure: AlexNet drawn as a computational graph from the input image and weights to the loss.]

  • Neural Turing Machine

    [Figure: computational graph of a Neural Turing Machine, with the input image and loss labeled.] Figure reproduced with permission from a Twitter post by Andrej Karpathy.

    https://twitter.com/karpathy/status/597631909930242048?lang=en


  • Backpropagation: a simple example

    $f(x, y, z) = (x + y)\,z$

    e.g. x = -2, y = 5, z = -4

    Introduce the intermediate node $q = x + y$, so that $f = qz$, with local gradients $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$, $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$.

    Want: $\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}$

    Chain rule:

    $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial y}$, where $\frac{\partial f}{\partial q}$ is the upstream gradient and $\frac{\partial q}{\partial y}$ is the local gradient (and likewise for $x$).
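    The simple example can be traced as straight-line code, assuming the function on these slides is $f(x, y, z) = (x + y)z$ with x = -2, y = 5, z = -4. The variable names (`q`, `df_dq`, and so on) are illustrative.

```python
# Forward pass: compute and keep every intermediate value.
x, y, z = -2.0, 5.0, -4.0
q = x + y        # q = 3
f = q * z        # f = -12

# Backward pass in reverse order, starting from df/df = 1.
df_df = 1.0
df_dz = q * df_df          # local gradient of f = q*z with respect to z is q  -> 3
df_dq = z * df_df          # local gradient with respect to q is z             -> -4
df_dx = df_dq * 1.0        # chain rule: upstream df/dq times local dq/dx = 1  -> -4
df_dy = df_dq * 1.0        # chain rule: upstream df/dq times local dq/dy = 1  -> -4
print(df_dx, df_dy, df_dz)  # -4.0 -4.0 3.0
```

    The forward values (here `q`) must be kept around, because the backward pass needs them as local gradients.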

  • [Figure: a single node f with inputs x and y and output z]

    Forward pass: f computes z and can also compute its “local gradient” with respect to each input, $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$.

    Backward pass: f receives the upstream gradient $\frac{\partial L}{\partial z}$ and multiplies it by each local gradient to produce the gradients passed back to its inputs, $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x}$ and $\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y}$.
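    One way to package this per-node rule is a gate object with a `forward` method that caches what it needs and a `backward` method that turns the upstream gradient into gradients for its inputs. The classes `MultiplyGate` and `AddGate` below are a hypothetical sketch, not a library API; the usage rebuilds $f = (x + y)z$ at x = -2, y = 5, z = -4.

```python
class MultiplyGate:
    """z = x * y; caches its inputs during forward for use in backward."""

    def forward(self, x, y):
        self.x, self.y = x, y
        return x * y

    def backward(self, dout):
        # dout is the upstream gradient dL/dz; local gradients are dz/dx = y and dz/dy = x.
        return dout * self.y, dout * self.x


class AddGate:
    """z = x + y; both local gradients are 1."""

    def forward(self, x, y):
        return x + y

    def backward(self, dout):
        return dout * 1.0, dout * 1.0


# Rebuild f = (x + y) * z from gate objects and run forward, then backward.
add, mul = AddGate(), MultiplyGate()
q = add.forward(-2.0, 5.0)
f = mul.forward(q, -4.0)
dq, dz = mul.backward(1.0)   # upstream gradient at the output is df/df = 1
dx, dy = add.backward(dq)
print(f, dx, dy, dz)  # -12.0 -4.0 -4.0 3.0
```

    Each gate only knows its own local gradients; chaining the `backward` calls in reverse order of the `forward` calls is what applies the chain rule across the whole graph.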

  • Another example:

    $f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$


  • Another example: backward pass

    [Figure: working backward from the output, each gate's gradient is its upstream gradient times its local gradient]


  • Another example: add gate

    [upstream gradient] x [local gradient]
    [0.2] x [1] = 0.2
    [0.2] x [1] = 0.2 (both inputs!)


  • Another example: multiply gate

    [upstream gradient] x [local gradient]
    x0: [0.2] x [2] = 0.4
    w0: [0.2] x [-1] = -0.2
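    The whole backward pass for $f(w, x) = \frac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$ can be traced gate by gate. Here w0 = 2 and x0 = -1 come from the multiply-gate step above; w1 = -3, x1 = -2 and w2 = -3 are assumed values, chosen so the weighted sum is 1, which reproduces the 0.73 sigmoid output used on the sigmoid-gate slide.

```python
import math

# w0, x0 match the multiply-gate step; w1, x1, w2 are assumed values (weighted sum = 1).
w0, x0 = 2.0, -1.0
w1, x1, w2 = -3.0, -2.0, -3.0

# Forward pass through the primitive gates.
a = w0 * x0          # -2
b = w1 * x1          #  6
c = a + b            #  4
d = c + w2           #  1
e = -d               # -1        (*-1 gate)
g = math.exp(e)      #  0.37     (exp gate)
h = g + 1.0          #  1.37     (+1 gate)
f = 1.0 / h          #  0.73     (1/x gate)

# Backward pass: upstream gradient times local gradient at each gate.
df = 1.0
dh = df * (-1.0 / h ** 2)      # 1/x gate, local gradient -1/x^2  -> -0.53
dg = dh * 1.0                  # +1 gate, local gradient 1        -> -0.53
de = dg * math.exp(e)          # exp gate, local gradient e^x     -> -0.20
dd = de * -1.0                 # *-1 gate, local gradient -1      ->  0.20
dc, dw2 = dd * 1.0, dd * 1.0   # add gate: both inputs get 0.20
da, db = dc * 1.0, dc * 1.0    # add gate: both inputs get 0.20
dw0, dx0 = da * x0, da * w0    # multiply gate: -0.20 and 0.39
dw1, dx1 = db * x1, db * w1    # multiply gate: -0.39 and -0.59
print(round(dw0, 2), round(dx0, 2), round(dw1, 2), round(dx1, 2), round(dw2, 2))
# -0.2 0.39 -0.39 -0.59 0.2
```

    The 0.4 for x0 on the slide comes from rounding the upstream gradient to 0.2 before multiplying by 2; without rounding the value is about 0.39.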

  • Sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$, with local gradient $\frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)$

    [Figure: the last gates of the graph grouped into a single sigmoid gate]

    Computational graph representation may not be unique. Choose one where local gradients at each node can be easily expressed!

  • Sigmoid gate

    [upstream gradient] x [local gradient]
    [1.00] x [(1 - 0.73) (0.73)] = 0.2
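    Treating the sigmoid as one gate, the backward pass only needs the cached forward output, since $\frac{d\sigma}{dx} = (1 - \sigma(x))\,\sigma(x)$. A minimal sketch, assuming a hypothetical `SigmoidGate` class with `forward`/`backward` methods (not a library API), evaluated at input 1.0 so the numbers line up with the 0.73 and 0.2 on the slide.

```python
import math


class SigmoidGate:
    """sigma(x) = 1 / (1 + e^(-x)) treated as a single gate."""

    def forward(self, x):
        self.out = 1.0 / (1.0 + math.exp(-x))   # cache the output for backward
        return self.out

    def backward(self, dout):
        # Local gradient (1 - sigma(x)) * sigma(x), reusing the cached forward output.
        return dout * (1.0 - self.out) * self.out


gate = SigmoidGate()
out = gate.forward(1.0)        # about 0.73
dx = gate.backward(1.0)        # about 0.20, as on the slide

# Sanity check against a centered difference.
h = 1e-5
numeric = (1.0 / (1.0 + math.exp(-(1.0 + h))) - 1.0 / (1.0 + math.exp(-(1.0 - h)))) / (2.0 * h)
print(abs(numeric - dx) < 1e-8)  # True
```

    Collapsing the four primitive gates into one node gives the same gradient with a single, cheap local-gradient expression, which is the point of choosing a convenient graph representation.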

  • Patterns in backward flow

    add gate: gradient distributor
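    A small numerical check of the add-gate pattern: every input of a sum has local gradient 1, so each receives the upstream gradient unchanged. The function `add_gate_backward` and the toy loss $L = 3(a + b + c)$ are hypothetical, chosen so the upstream gradient arriving at the sum is 3.

```python
def add_gate_backward(dout, num_inputs):
    """Each input of z = x_1 + ... + x_n has local gradient 1, so each receives dout unchanged."""
    return [dout * 1.0] * num_inputs


# Toy loss (hypothetical): L = 3 * (a + b + c), so dL/d(a + b + c) = 3.
def L(a, b, c):
    return 3.0 * (a + b + c)


upstream = 3.0
grads = add_gate_backward(upstream, 3)

# Centered-difference gradient of L with respect to each input of the sum.
h = 1e-5
point = [0.5, -1.0, 2.0]
numeric = []
for i in range(3):
    plus, minus = point[:], point[:]
    plus[i] += h
    minus[i] -= h
    numeric.append((L(*plus) - L(*minus)) / (2.0 * h))
print(grads, [round(n, 6) for n in numeric])  # [3.0, 3.0, 3.0] [3.0, 3.0, 3.0]
```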

  • Patterns in backward flow

    Q: What is a max gate?
