Lecture 10: Recurrent Neural Networks (Stanford, Fei-Fei Li & Justin Johnson & Serena Yeung, May 4, 2017)


  • Slide 1

    Lecture 10: Recurrent Neural Networks

  • Slide 2

    Administrative

    A1 grades will go out soon.

    A2 is due today (11:59pm).

    Midterm is in-class on Tuesday! We will send out details on where to go soon.

  • Slide 3

    Extra Credit: Train Game

    More details on Piazza by early next week

  • Slide 4

    Last Time: CNN Architectures

    [Figure: AlexNet architecture. Figure copyright Kaiming He, 2016. Reproduced with permission.]

  • Slide 5

    Last Time: CNN Architectures

    [Figure: layer-by-layer diagrams of VGG16, VGG19, and GoogLeNet (stacks of 3x3 conv and Pool layers ending in FC 4096, FC 4096, FC 1000, Softmax). Figure copyright Kaiming He, 2016. Reproduced with permission.]

  • Slide 6

    Last Time: CNN Architectures

    [Figure: ResNet (7x7 conv, 64 / 2 stem, then stacked 3x3 conv stages with pooling, FC 1000, Softmax) and a residual block: two conv layers compute F(x), which is added to the identity shortcut x to give F(x) + x, followed by ReLU. Figure copyright Kaiming He, 2016. Reproduced with permission.]

  • Slide 7

    [Figure: DenseNet (dense blocks of 1x1 conv layers whose inputs concatenate all earlier feature maps) and FractalNet. Figures copyright Larsson et al., 2017. Reproduced with permission.]

  • Slide 8

    Last Time: CNN Architectures

  • Slide 9

    Last Time: CNN Architectures

    AlexNet and VGG have tons of parameters in the fully connected layers.

    AlexNet: ~62M parameters

    FC6: 256x6x6 -> 4096: 38M params
    FC7: 4096 -> 4096: 17M params
    FC8: 4096 -> 1000: 4M params
    ~59M params in FC layers!
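
    As a sanity check on those counts: a fully connected layer with n inputs and m outputs has about n*m weights (ignoring biases). A quick Python check of the slide's numbers:

        fc6 = 256 * 6 * 6 * 4096    # ~37.7M, rounds to the slide's 38M
        fc7 = 4096 * 4096           # ~16.8M, rounds to 17M
        fc8 = 4096 * 1000           # ~4.1M, rounds to 4M
        print((fc6 + fc7 + fc8) / 1e6)  # ~58.6M, i.e. "~59M params in FC layers"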

  • Slide 10

    Today: Recurrent Neural Networks

  • Slide 11

    Vanilla Neural Networks

    [Diagram: a one-to-one "Vanilla Neural Network": fixed-size input -> hidden layer -> fixed-size output]

  • Slide 12

    Recurrent Neural Networks: Process Sequences

    e.g. Image Captioning: image -> sequence of words

  • Slide 13

    Recurrent Neural Networks: Process Sequences

    e.g. Sentiment Classification: sequence of words -> sentiment

  • Slide 14

    Recurrent Neural Networks: Process Sequences

    e.g. Machine Translation: seq of words -> seq of words

  • Slide 15

    Recurrent Neural Networks: Process Sequences

    e.g. Video classification at the frame level

  • Slide 16

    Sequential Processing of Non-Sequence Data

    Classify images by taking a series of glimpses.

    Ba, Mnih, and Kavukcuoglu, Multiple Object Recognition with Visual Attention, ICLR 2015.
    Gregor et al, DRAW: A Recurrent Neural Network For Image Generation, ICML 2015.
    Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

  • Slide 17

    Sequential Processing of Non-Sequence Data

    Generate images one piece at a time!

    Gregor et al, DRAW: A Recurrent Neural Network For Image Generation, ICML 2015.
    Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

  • Slide 18

    Recurrent Neural Network

    [Diagram: input x feeding into a recurrent RNN block]

  • Slide 19

    Recurrent Neural Network

    [Diagram: x -> RNN -> y]

    Usually we want to predict a vector at some time steps.

  • Slide 20

    Recurrent Neural Network

    [Diagram: x -> RNN -> y, with the RNN's state looping back into itself]

    We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

    where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at some time step, and f_W is some function with parameters W.

  • Slide 21

    Recurrent Neural Network

    We can process a sequence of vectors x by applying a recurrence formula at every time step:

    h_t = f_W(h_{t-1}, x_t)

    Notice: the same function and the same set of parameters are used at every time step.
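
    A minimal numpy sketch of this recurrence, assuming a toy choice of f_W; the helper name process_sequence and the random sizes are illustrative, not from the slides:

        import numpy as np

        def process_sequence(f_W, h0, xs):
            # Apply h_t = f_W(h_{t-1}, x_t) at every time step; the same f_W
            # (and hence the same parameters W) is reused for the whole sequence.
            h, states = h0, []
            for x in xs:
                h = f_W(h, x)
                states.append(h)
            return states

        rng = np.random.default_rng(0)
        W = rng.standard_normal((4, 4))
        f_W = lambda h, x: np.tanh(W @ h + x)   # one possible choice of f_W
        states = process_sequence(f_W, np.zeros(4),
                                  [rng.standard_normal(4) for _ in range(3)])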

  • Slide 22

    (Vanilla) Recurrent Neural Network

    The state consists of a single hidden vector h:

    h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y_t = W_hy h_t
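
    A sketch of this vanilla update in numpy; the weight names mirror the formulas, while the sizes and the 0.01 initialization scale are illustrative assumptions:

        import numpy as np

        H, D, V = 4, 3, 2                        # hidden, input, output sizes (toy)
        rng = np.random.default_rng(1)
        W_hh = 0.01 * rng.standard_normal((H, H))
        W_xh = 0.01 * rng.standard_normal((H, D))
        W_hy = 0.01 * rng.standard_normal((V, H))

        def vanilla_rnn_step(h_prev, x):
            h = np.tanh(W_hh @ h_prev + W_xh @ x)  # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
            y = W_hy @ h                           # y_t = W_hy h_t
            return h, y

        h = np.zeros(H)
        for x in [rng.standard_normal(D) for _ in range(5)]:
            h, y = vanilla_rnn_step(h, x)          # same weights at every step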

  • Slide 23

    RNN: Computational Graph

    [Diagram: h0 -> f_W -> h1, with input x1 feeding f_W]

  • Slide 24

    RNN: Computational Graph

    [Diagram: h0 -> f_W -> h1 -> f_W -> h2, with inputs x1, x2]

  • Slide 25

    RNN: Computational Graph

    [Diagram: h0 -> f_W -> h1 -> f_W -> h2 -> f_W -> h3 -> ... -> hT, with inputs x1, x2, x3, ...]

  • Slide 26

    RNN: Computational Graph

    Re-use the same weight matrix at every time-step.

    [Diagram: the unrolled graph above, with a single shared W node feeding every f_W]

  • Slide 27

    RNN: Computational Graph: Many to Many

    [Diagram: the unrolled graph emits an output at every step: h1 -> y1, h2 -> y2, h3 -> y3, ..., hT -> yT]

  • Slide 28

    RNN: Computational Graph: Many to Many

    [Diagram: each output y1, y2, y3, ..., yT gets its own loss L1, L2, L3, ..., LT]

  • Slide 29

    RNN: Computational Graph: Many to Many

    [Diagram: the per-time-step losses L1, L2, L3, ..., LT are summed into a single total loss L]
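
    A sketch of this many-to-many forward pass, accumulating a per-step loss L_t into the total L; the squared-error loss here is an illustrative assumption, not the slide's choice:

        import numpy as np

        def many_to_many_loss(xs, targets, h0, W_hh, W_xh, W_hy):
            h, L = h0, 0.0
            for x, t in zip(xs, targets):
                h = np.tanh(W_hh @ h + W_xh @ x)      # h_t
                y = W_hy @ h                          # y_t
                L += 0.5 * np.sum((y - t) ** 2)       # L_t, summed into total L
            return L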

  • Slide 30

    RNN: Computational Graph: Many to One

    [Diagram: inputs x1, x2, x3, ... feed the unrolled graph; a single output y is read off the final hidden state hT]
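
    The many-to-one case in the same style: run the recurrence over the whole input, then read a single prediction off the final state hT (a sketch, e.g. for sentiment classification):

        import numpy as np

        def many_to_one(xs, h0, W_hh, W_xh, W_hy):
            h = h0
            for x in xs:                      # consume the entire input sequence
                h = np.tanh(W_hh @ h + W_xh @ x)
            return W_hy @ h                   # single output y computed from hT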

  • Slide 31

    RNN: Computational Graph: One to Many

    [Diagram: a single input x feeds the first step; the unrolled graph then emits y1, y2, y3, ..., yT at every step]

  • Slide 32

    Sequence to Sequence: Many-to-one + One-to-many

    Many to one: Encode input sequence in a single vector.

    [Diagram: encoder unrolled as h0 -> f_W -> h1 -> f_W -> h2 -> f_W -> h3 -> ... -> hT over inputs x1, x2, x3, with shared weights W1]
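
    A minimal sketch of the slide's decomposition: a many-to-one encoder compresses the input into one vector hT, from which a one-to-many decoder then unrolls. W1 follows the diagram's encoder weights; the decoder weights W2 and all shapes are illustrative assumptions:

        import numpy as np

        def encode(xs, h0, W1_hh, W1_xh):
            h = h0
            for x in xs:                          # many-to-one over the input
                h = np.tanh(W1_hh @ h + W1_xh @ x)
            return h                              # hT: the whole input in one vector

        def decode(hT, T, W2_hh, W2_hy):
            h, ys = hT, []
            for _ in range(T):                    # one-to-many from the summary
                h = np.tanh(W2_hh @ h)
                ys.append(W2_hy @ h)
            return ys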