High Level Computer Vision
Recurrent Neural Networks, June 5, 2019
Bernt Schiele & Mario Fritz
www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken
High Level Computer Vision | Bernt Schiele & Mario Fritz
Overview Today’s Lecture
• Recurrent Neural Networks (RNNs) ‣ Motivation & flexibility of RNNs (some recap from last week)
‣ Language modeling - including “unreasonable effectiveness of RNNs”
‣ RNNs for image description / captioning
‣ Standard RNN and a particularly successful RNN: Long Short Term Memory (LSTM) - including “visualizations of RNN cells”
Recurrent Networks offer a lot of flexibility:
slide credit: Andrej Karpathy
Sequences in Vision: sequences in the input… (many-to-one)
Jumping, Dancing, Fighting, Eating, Running
slide credit: Jeff Donahue
Sequences in Vision: sequences in the output… (one-to-many)
A happy brown dog.
Sequences in Vision: sequences everywhere! (many-to-many)
A dog jumps over a hurdle.
ConvNets
Krizhevsky et al., NIPS 2012
Problem #1: fixed-size, static input (e.g. a 224×224 image)
Problem #2: the output is a single choice from a fixed list of options (e.g. cat, dog, horse, fish, snake). Captions such as “a happy brown dog”, “a big brown dog”, “a happy red dog”, “a big red dog”, … do not fit a fixed list.
Recurrent Networks offer a lot of flexibility:
Recurrent Neural Network (RNN)
(Simple) Recurrent Neural Network
RNN: Computational Graph
RNN: Computational Graph: Many to Many
RNN: Computational Graph: Many to One
RNN: Computational Graph: One to Many
Sequence to Sequence
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Recurrent Networks offer a lot of flexibility:
Language Models
Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]
Word-level language model. Similar to:
Suppose we had the training sentence “cat sat on mat”
We want to train a language model: P(next word | previous words)
i.e. want these to be high: P(cat | [<S>]) P(sat | [<S>, cat]) P(on | [<S>, cat, sat]) P(mat | [<S>, cat, sat, on]) P(<E>| [<S>, cat, sat, on, mat])
With only a finite, 1-word history, we would instead want these to be high: P(cat | <S>), P(sat | cat), P(on | sat), P(mat | on), P(<E> | mat)
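The 1-word-history model is just a bigram model, which can be estimated directly by counting. A minimal sketch in Python; the corpus and function name are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate P(next word | previous word) from counts."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<S>"] + sent.split() + ["<E>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # normalize each row of counts into a conditional distribution
    return {prev: {w: c / sum(row.values()) for w, c in row.items()}
            for prev, row in counts.items()}

model = train_bigram(["cat sat on mat"])
print(model["cat"]["sat"])  # 1.0: "sat" always follows "cat" in this tiny corpus
```

With one training sentence every conditional probability is 1.0; the RNN replaces these count tables with a learned, unbounded history.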
[Unrolled RNN over “cat sat on mat”: inputs x0 = <START>, x1 = “cat”, x2 = “sat”, x3 = “on”, x4 = “mat”; hidden states h0…h4; outputs y0…y4]
Each word in the vocabulary is associated with 300 learnable numbers (its embedding).
hidden layer (e.g. 500-D vectors): h4 = max(0, Wxh * x4)
10,001-D class scores: softmax over 10,000 words and a special <END> token: y4 = Why * h4
Recurrent Neural Network: h4 = max(0, Wxh * x4 + Whh * h3)
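One step of this recurrence can be sketched in NumPy. The sizes (300-D embeddings, 500-D hidden vectors, 10,001 output scores) follow the slides; the transcript's garbled "tanh(0, …)" is read here as "max(0, …)", i.e. a ReLU, and the weight values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
D_embed, D_hidden, V = 300, 500, 10_001          # sizes from the slides
Wxh = rng.normal(0, 0.01, (D_hidden, D_embed))   # input-to-hidden
Whh = rng.normal(0, 0.01, (D_hidden, D_hidden))  # hidden-to-hidden
Why = rng.normal(0, 0.01, (V, D_hidden))         # hidden-to-scores

def rnn_step(x, h_prev):
    """h_t = max(0, Wxh x_t + Whh h_{t-1}); y_t = Why h_t."""
    h = np.maximum(0, Wxh @ x + Whh @ h_prev)
    y = Why @ h          # 10,001-D class scores (10,000 words + <END>)
    return h, y

h = np.zeros(D_hidden)                           # h_{-1} starts at zero
for x in rng.normal(size=(5, D_embed)):          # 5 dummy word embeddings
    h, y = rnn_step(x, h)
print(h.shape, y.shape)  # (500,) (10001,)
```

Each step mixes the current word embedding with the entire history summarized in h, which is what lets the model condition on all previous words.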
Training this on a lot of sentences gives us a language model: a way to predict P(next word | previous words).
Generating Sentences…
Feed x0 = <START>, compute h0 and y0, and sample a word from y0 (say “cat”). Feed the sample back in as x1, compute h1 and y1, sample again (“sat”), and repeat (“on”, “mat”, …). When the model samples <END>, the sentence is done.
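The sampling loop described above can be sketched end to end. Vocabulary size, dimensions, and weights here are tiny illustrative placeholders; softmax turns the scores y into a distribution to sample from:

```python
import numpy as np

rng = np.random.default_rng(1)
V, D = 10, 16                       # tiny vocab + hidden size for illustration
embed = rng.normal(0, 0.1, (V, D))  # one embedding row per token
Wxh = rng.normal(0, 0.1, (D, D))
Whh = rng.normal(0, 0.1, (D, D))
Why = rng.normal(0, 0.1, (V, D))
START, END = 0, 1                   # special token ids

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_sentence(max_len=20):
    h = np.zeros(D)
    tok, out = START, []
    for _ in range(max_len):
        h = np.maximum(0, Wxh @ embed[tok] + Whh @ h)
        tok = rng.choice(V, p=softmax(Why @ h))  # sample the next token
        if tok == END:                           # sampled <END>? done.
            break
        out.append(tok)
    return out

print(sample_sentence())
```

With untrained weights the output is random token ids; training shapes the softmax so that sampled sequences look like the training sentences.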
Example…
Learning via Backpropagation…
“The Unreasonable Effectiveness of Recurrent Neural Networks”
karpathy.github.io
Character-level language model example
Vocabulary: [h,e,l,o]
Example training sequence: “hello”
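For the character-level example, each character of the vocabulary [h, e, l, o] is one-hot encoded, and the training sequence “hello” gives input characters “hell” paired with target characters “ello”. A sketch of that encoding:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

seq = "hello"
inputs  = [one_hot(ch) for ch in seq[:-1]]    # h, e, l, l
targets = [char_to_ix[ch] for ch in seq[1:]]  # e, l, l, o

print(targets)    # [1, 2, 2, 3]
print(inputs[0])  # [1. 0. 0. 0.]  (one-hot for 'h')
```

At each step the RNN is trained to put high score on the target character; note the same input 'l' must produce different targets ('l' then 'o'), which only works because the hidden state carries position information.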
Try it yourself: char-rnn on Github (uses Torch7)
Cooking Recipes
Obama Speeches
Learning from Linux Source Code
Yoav Goldberg’s n-gram experiments
Order-10 n-gram model on Shakespeare:
But on Linux:
Image captioning: a training example, an image X with the caption “straw hat”.
[Unrolled RNN over the caption: inputs x0 = <START>, x1 = “straw”, x2 = “hat”; hidden states h0…h2; outputs y0…y2]
To condition on the image, its CNN feature v is fed into the first hidden state:
before: h0 = max(0, Wxh * x0)
now: h0 = max(0, Wxh * x0 + Wih * v)
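The conditioning change on this slide, adding a Wih * v term for the image feature, can be sketched as follows. The 300-D embedding and 500-D hidden sizes follow the earlier slides; the 4096-D CNN feature size is an assumption (a typical fc7 feature), and all values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
D_embed, D_hidden, D_img = 300, 500, 4096   # 4096-D CNN feature is an assumption
Wxh = rng.normal(0, 0.01, (D_hidden, D_embed))
Wih = rng.normal(0, 0.01, (D_hidden, D_img))   # new image-to-hidden weights

x0 = rng.normal(size=D_embed)   # embedding of <START>
v  = rng.normal(size=D_img)     # CNN feature of the image

h0_before = np.maximum(0, Wxh @ x0)            # before: text only
h0_now    = np.maximum(0, Wxh @ x0 + Wih @ v)  # now: image feature mixed in
print(h0_before.shape, h0_now.shape)  # (500,) (500,)
```

Only the first step changes; every later step then sees the image indirectly through the hidden state.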
At test time, feed the test image and x0 = <START>, compute h0 and y0, and sample the first word (“straw”). Feed it back in, compute h1 and y1, sample again (“hat”), and repeat; sampling the <END> token finishes the caption.
Qualitative captioning results range from “Wow I can’t believe that worked” to “Well, I can kind of see it” to “Not sure what happened there…”
Vanilla RNN...
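The transcript keeps only the titles of these vanilla-RNN slides; the standard point of this part of the CS231n material they credit is that backpropagation through a vanilla RNN multiplies the gradient by the recurrent weight matrix once per time step, so over long sequences it either explodes or vanishes. A numeric sketch with a toy diagonal recurrent matrix (values purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D, T = 50, 100                     # hidden size, number of time steps
g = rng.normal(size=D)             # some upstream gradient
results = {}
for scale in (0.9, 1.1):           # largest singular value below / above 1
    Whh = scale * np.eye(D)        # toy recurrent matrix
    grad = g.copy()
    for _ in range(T):             # backprop through T time steps
        grad = Whh.T @ grad        # repeated multiplication by Whh^T
    results[scale] = np.linalg.norm(grad) / np.linalg.norm(g)
print(results)
```

With scale 0.9 the gradient norm collapses by roughly five orders of magnitude over 100 steps, and with 1.1 it blows up by four; this is the motivation for the LSTM's additive cell-state path on the next slides.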
Long Short Term Memory (LSTM)
Long Short Term Memory (LSTM): Gradient Flow
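One LSTM step, in the standard formulation these slides cover: input, forget, and output gates i, f, o plus a candidate update g, with the cell state updated additively as c_t = f ⊙ c_{t-1} + i ⊙ g. That additive path is what the "Gradient Flow" slides refer to: gradients through c avoid repeated multiplication by a weight matrix. Sizes and weights below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [x; h_prev] to the 4 stacked gate pre-activations."""
    D = h_prev.size
    a = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(a[0*D:1*D])        # input gate
    f = sigmoid(a[1*D:2*D])        # forget gate
    o = sigmoid(a[2*D:3*D])        # output gate
    g = np.tanh(a[3*D:4*D])        # candidate cell update
    c = f * c_prev + i * g         # additive cell-state update (good gradient flow)
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(4)
Dx, Dh = 8, 16                     # toy input / hidden sizes
W = rng.normal(0, 0.1, (4 * Dh, Dx + Dh))
b = np.zeros(4 * Dh)
h, c = lstm_step(rng.normal(size=Dx), np.zeros(Dh), np.zeros(Dh), W, b)
print(h.shape, c.shape)  # (16,) (16,)
```

Compared with the vanilla recurrence, the only multiplicative interaction on the c path is the elementwise forget gate, which the network can learn to keep near 1 to preserve information over many steps.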
Alternatives
Summary
“Visualizing and Understanding Recurrent Networks”, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)
Hunting interpretable cells:
• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
• something interesting cell (not quite sure what)