# Lecture 10: Recurrent Neural Networks - Stanford


Fei-Fei Li & Justin Johnson & Serena Yeung, Lecture 10, May 4, 2017

Lecture 10: Recurrent Neural Networks


Administrative

A1 grades will go out soon

A2 is due today (11:59pm)

Midterm is in-class on Tuesday! We will send out details on where to go soon.


Extra Credit: Train Game

More details on Piazza by early next week


Last Time: CNN Architectures

AlexNet

Figure copyright Kaiming He, 2016. Reproduced with permission.


Last Time: CNN Architectures

Figure copyright Kaiming He, 2016. Reproduced with permission.

[Figure: layer-by-layer diagrams of VGG16 and VGG19 (stacks of 3x3 conv layers from 64 up to 512 channels, with pooling between stages and FC 4096, FC 4096, FC 1000, softmax on top), alongside GoogLeNet]


Last Time: CNN Architectures

Figure copyright Kaiming He, 2016. Reproduced with permission.

[Figure: the full ResNet architecture (7x7 conv, then stacks of 3x3 conv residual blocks, ending in pooling, FC 1000, softmax), and a residual block computing F(x) + x: two conv layers with a ReLU in between, plus an identity shortcut carrying x around them]


Figures copyright Larsson et al., 2017. Reproduced with permission.

[Figure: DenseNet, built from dense blocks in which each layer's output is concatenated with the outputs of all previous layers (with conv/pool transition layers between blocks), and FractalNet]


Last Time: CNN Architectures


Last Time: CNN Architectures

AlexNet and VGG have tons of parameters in the fully connected layers.

AlexNet: ~62M parameters

FC6: 256x6x6 -> 4096: 38M params
FC7: 4096 -> 4096: 17M params
FC8: 4096 -> 1000: 4M params
~59M params in FC layers!
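These counts follow directly from the layer shapes; a quick check (weights only, since the bias terms add only about 9K parameters):

```python
# AlexNet fully connected layer weight counts.
fc6 = 256 * 6 * 6 * 4096   # flattened conv features -> 4096, ~38M
fc7 = 4096 * 4096          # 4096 -> 4096, ~17M
fc8 = 4096 * 1000          # 4096 -> 1000 classes, ~4M
total = fc6 + fc7 + fc8    # ~59M of AlexNet's ~62M total parameters
print(f"{fc6:,} + {fc7:,} + {fc8:,} = {total:,}")
```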


Today: Recurrent Neural Networks


Vanilla Neural Networks

Vanilla Neural Network


Recurrent Neural Networks: Process Sequences

e.g. Image Captioning: image -> sequence of words


Recurrent Neural Networks: Process Sequences

e.g. Sentiment Classification: sequence of words -> sentiment


Recurrent Neural Networks: Process Sequences

e.g. Machine Translation: seq of words -> seq of words


Recurrent Neural Networks: Process Sequences

e.g. Video classification at the frame level


Sequential Processing of Non-Sequence Data

Ba, Mnih, and Kavukcuoglu, Multiple Object Recognition with Visual Attention, ICLR 2015.
Gregor et al, DRAW: A Recurrent Neural Network For Image Generation, ICML 2015.
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Classify images by taking a series of glimpses


Sequential Processing of Non-Sequence Data

Gregor et al, DRAW: A Recurrent Neural Network For Image Generation, ICML 2015.
Figure copyright Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende, and Daan Wierstra, 2015. Reproduced with permission.

Generate images one piece at a time!


Recurrent Neural Network

[Diagram: input x feeds into an RNN block]


Recurrent Neural Network

[Diagram: input x feeds into the RNN, which emits an output y]

We usually want to predict a vector at some time steps.


Recurrent Neural Network

[Diagram: input x, RNN block, output y]

We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} is the old state, x_t is the input vector at this time step, and f_W is some function with parameters W.
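A minimal sketch of that recurrence as a loop; f_W here is an abstract placeholder for the function with parameters W, not any specific network:

```python
def run_rnn(f_W, h0, xs):
    """Apply the same recurrence formula at every time step.

    h0: initial state; xs: sequence of input vectors x_1..x_T.
    Returns the states h_1..h_T.
    """
    states = []
    h = h0
    for x in xs:
        h = f_W(h, x)      # new state = f_W(old state, input vector)
        states.append(h)
    return states

# Toy example: a "recurrence" that just accumulates its inputs.
hs = run_rnn(lambda h, x: h + x, 0, [1, 2, 3])  # states: 1, 3, 6
```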


Recurrent Neural Network

We can process a sequence of vectors x by applying a recurrence formula at every time step.

Notice: the same function and the same set of parameters are used at every time step.


(Vanilla) Recurrent Neural Network

The state consists of a single hidden vector h:

h_t = tanh(W_hh h_{t-1} + W_xh x_t)
y_t = W_hy h_t
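In code, one step of the vanilla RNN looks like the following numpy sketch. The matrix names use the usual convention (hidden-to-hidden W_hh, input-to-hidden W_xh, hidden-to-output W_hy); the sizes are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, V = 5, 8, 10                          # input, hidden, output sizes (arbitrary)
W_hh = rng.standard_normal((H, H)) * 0.01   # hidden -> hidden
W_xh = rng.standard_normal((H, D)) * 0.01   # input  -> hidden
W_hy = rng.standard_normal((V, H)) * 0.01   # hidden -> output

def step(h_prev, x):
    h = np.tanh(W_hh @ h_prev + W_xh @ x)   # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
    y = W_hy @ h                            # y_t = W_hy h_t
    return h, y

h, y = step(np.zeros(H), rng.standard_normal(D))
```

The tanh squashes the hidden state into [-1, 1], which keeps the recurrence from blowing up immediately but, as later slides on LSTMs discuss, does not solve vanishing gradients.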


RNN: Computational Graph

[Diagram: h0 and input x1 feed f_W, producing h1]


RNN: Computational Graph

[Diagram: unrolled two steps: h0 and x1 feed f_W to produce h1; h1 and x2 feed f_W to produce h2]


RNN: Computational Graph

[Diagram: the graph unrolled over the whole sequence: h0 → h1 → h2 → h3 → ... → hT, consuming x1, x2, x3, ... along the way]


RNN: Computational Graph

Re-use the same weight matrix at every time step.

[Diagram: a single weight matrix W feeds every f_W node in the unrolled graph h0 → h1 → h2 → h3 → ... → hT]


RNN: Computational Graph: Many to Many

[Diagram: the unrolled graph now emits an output y1, y2, y3, ..., yT at every time step]


RNN: Computational Graph: Many to Many

[Diagram: each output y1, y2, y3, ..., yT is scored by a per-step loss L1, L2, L3, ..., LT]


RNN: Computational Graph: Many to Many

[Diagram: the per-step losses L1, L2, L3, ..., LT are summed into a single total loss L]
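With a softmax classifier at each step, the total loss is simply the sum of the per-step cross-entropy losses. A minimal sketch (the function names here are ours, not the assignment's API):

```python
import numpy as np

def softmax_cross_entropy(scores, target):
    # Cross-entropy loss of one score vector against an integer class label,
    # with the max subtracted for numerical stability.
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return -np.log(p[target])

def total_loss(ys, targets):
    # L = L_1 + L_2 + ... + L_T, one loss term per time step.
    return sum(softmax_cross_entropy(y, t) for y, t in zip(ys, targets))

# Uniform scores at each of 3 steps over 4 classes: each L_t = log(4).
L = total_loss([np.zeros(4)] * 3, [0, 1, 2])
```

Because every loss term depends on the same W through its time step, backpropagation through this sum accumulates gradient contributions from all T steps into the one shared weight matrix.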


RNN: Computational Graph: Many to One

[Diagram: inputs x1, x2, x3, ... are consumed step by step, and a single output y is produced from the final hidden state hT]
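A sketch of the many-to-one case: run the recurrence over the whole input, then read a single prediction off the final hidden state hT only, as in sentiment classification. Shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
D, H, V = 4, 6, 2                           # e.g. V = 2 sentiment classes
W_hh = rng.standard_normal((H, H)) * 0.01
W_xh = rng.standard_normal((H, D)) * 0.01
W_hy = rng.standard_normal((V, H)) * 0.01

h = np.zeros(H)
for x in rng.standard_normal((7, D)):       # a length-7 input sequence
    h = np.tanh(W_hh @ h + W_xh @ x)        # intermediate states are not scored
y = W_hy @ h                                # one output, computed from hT only
```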


RNN: Computational Graph: One to Many

[Diagram: a single input x seeds the recurrence, which then emits outputs y1, y2, y3, ..., yT at the following steps]
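A sketch of the one-to-many case: a single input seeds the state, then the recurrence runs with no further input, emitting an output at every step. This "no-input" variant is one common convention; another feeds each step's output back in as the next input:

```python
import numpy as np

rng = np.random.default_rng(2)
D, H, V, T = 4, 6, 5, 3
W_hh = rng.standard_normal((H, H)) * 0.01
W_xh = rng.standard_normal((H, D)) * 0.01
W_hy = rng.standard_normal((V, H)) * 0.01

x = rng.standard_normal(D)                  # single fixed input (e.g. an image code)
h = np.tanh(W_xh @ x)                       # seed the hidden state once
ys = []
for _ in range(T):
    ys.append(W_hy @ h)                     # emit y_t at every step
    h = np.tanh(W_hh @ h)                   # recur with no new input
```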


Sequence to Sequence: Many-to-one + one-to-many

Many to one: Encode input sequence in a single vector.

[Diagram: an encoder with weights W1 applies f_W to x1, x2, x3, producing h1, h2, h3 and a final summary vector hT]
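Putting the two pieces together as the heading says: a many-to-one encoder (weights W1, per the slide) compresses the input sequence into hT, which then initializes a one-to-many decoder with its own weights. A rough sketch with made-up shapes and a no-input decoder:

```python
import numpy as np

rng = np.random.default_rng(3)
D, H, V, T_out = 4, 6, 5, 3

# Encoder (weights "W1"): many to one, summarizing x_1..x_T into hT.
W1_hh = rng.standard_normal((H, H)) * 0.01
W1_xh = rng.standard_normal((H, D)) * 0.01
h = np.zeros(H)
for x in rng.standard_normal((7, D)):       # length-7 input sequence
    h = np.tanh(W1_hh @ h + W1_xh @ x)
hT = h                                      # single vector encoding the input

# Decoder (separate weights): one to many, unrolled from hT.
W2_hh = rng.standard_normal((H, H)) * 0.01
W2_hy = rng.standard_normal((V, H)) * 0.01
ys = []
h = hT
for _ in range(T_out):
    h = np.tanh(W2_hh @ h)
    ys.append(W2_hy @ h)                    # output sequence, e.g. translated words
```

Note the input and output sequences can have different lengths (7 in, 3 out here), which is exactly why machine translation uses this two-stage layout.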
