High Level Computer Vision
Recurrent Neural Networks, June 5, 2019
Bernt Schiele & Mario Fritz
www.mpi-inf.mpg.de/hlcv/ Max Planck Institute for Informatics & Saarland University, Saarland Informatics Campus Saarbrücken
High Level Computer Vision | Bernt Schiele & Mario Fritz
Overview Today’s Lecture
• Recurrent Neural Networks (RNNs) ‣ Motivation & flexibility of RNNs (some recap from last week)
‣ Language modeling - including “unreasonable effectiveness of RNNs”
‣ RNNs for image description / captioning
‣ Standard RNN and a particularly successful RNN: Long Short Term Memory (LSTM) - including “visualizations of RNN cells”
Recurrent Networks offer a lot of flexibility:
slide credit: Andrej Karpathy
Sequences in Vision: sequences in the input… (many-to-one)
Jumping, Dancing, Fighting, Eating, Running
slide credit: Jeff Donahue
Sequences in Vision: sequences in the output… (one-to-many)
A happy brown dog.
Sequences in Vision: sequences everywhere! (many-to-many)
A dog jumps over a hurdle.
ConvNets
Krizhevsky et al., NIPS 2012
Problem #1: fixed-size, static input (e.g. a 224×224 image)
Problem #2: the output is a single choice from a fixed list of options (e.g. cat, dog, horse, fish, snake). Captions such as “a happy brown dog”, “a big brown dog”, “a happy red dog”, “a big red dog”, … do not fit a fixed list.
Recurrent Networks offer a lot of flexibility:
Recurrent Neural Network (RNN)
(Simple) Recurrent Neural Network
RNN: Computational Graph
RNN: Computational Graph: Many to Many
RNN: Computational Graph: Many to One
RNN: Computational Graph: One to Many
Sequence to Sequence
slide credit: Fei-Fei, Justin Johnson, Serena Yeung
Recurrent Networks offer a lot of flexibility:
Language Models
Recurrent Neural Network Based Language Model [Tomas Mikolov, 2010]
Word-level language model. Similar to:
Suppose we had the training sentence “cat sat on mat”
We want to train a language model: P(next word | previous words)
i.e. want these to be high: P(cat | [<S>]) P(sat | [<S>, cat]) P(on | [<S>, cat, sat]) P(mat | [<S>, cat, sat, on]) P(<E>| [<S>, cat, sat, on, mat])
With only a finite, 1-word history, we would instead want these to be high: P(cat | <S>), P(sat | cat), P(on | sat), P(mat | on), P(<E> | mat)
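The 1-word-history model is just a bigram model, which can be estimated directly by counting. A minimal sketch in Python; the corpus and function name are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate P(next word | previous word) from counts."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = ["<S>"] + sent.split() + ["<E>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    # normalize each row of counts into a conditional distribution
    return {prev: {w: c / sum(row.values()) for w, c in row.items()}
            for prev, row in counts.items()}

model = train_bigram(["cat sat on mat"])
print(model["cat"]["sat"])  # 1.0: "sat" always follows "cat" in this tiny corpus
```

With one training sentence every conditional probability is 1.0; the RNN replaces these count tables with a learned, unbounded history.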
[Unrolled RNN over “cat sat on mat”: inputs x0 = <START>, x1 = “cat”, x2 = “sat”, x3 = “on”, x4 = “mat”; hidden states h0…h4; outputs y0…y4]
Each word in the vocabulary is associated with 300 learnable numbers (its embedding).
hidden layer (e.g. 500-D vectors): h4 = max(0, Wxh * x4)
10,001-D class scores: softmax over 10,000 words and a special <END> token: y4 = Why * h4
Recurrent Neural Network: h4 = max(0, Wxh * x4 + Whh * h3)
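One step of this recurrence can be sketched in NumPy. The sizes (300-D embeddings, 500-D hidden vectors, 10,001 output scores) follow the slides; the transcript's garbled "tanh(0, …)" is read here as "max(0, …)", i.e. a ReLU, and the weight values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
D_embed, D_hidden, V = 300, 500, 10_001          # sizes from the slides
Wxh = rng.normal(0, 0.01, (D_hidden, D_embed))   # input-to-hidden
Whh = rng.normal(0, 0.01, (D_hidden, D_hidden))  # hidden-to-hidden
Why = rng.normal(0, 0.01, (V, D_hidden))         # hidden-to-scores

def rnn_step(x, h_prev):
    """h_t = max(0, Wxh x_t + Whh h_{t-1}); y_t = Why h_t."""
    h = np.maximum(0, Wxh @ x + Whh @ h_prev)
    y = Why @ h          # 10,001-D class scores (10,000 words + <END>)
    return h, y

h = np.zeros(D_hidden)                           # h_{-1} starts at zero
for x in rng.normal(size=(5, D_embed)):          # 5 dummy word embeddings
    h, y = rnn_step(x, h)
print(h.shape, y.shape)  # (500,) (10001,)
```

Each step mixes the current word embedding with the entire history summarized in h, which is what lets the model condition on all previous words.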
Training this on a lot of sentences gives us a language model: a way to predict P(next word | previous words).
Generating Sentences…
Feed x0 = <START>, compute h0 and y0, and sample a word from y0 (say “cat”). Feed the sample back in as x1, compute h1 and y1, sample again (“sat”), and repeat (“on”, “mat”, …). When the model samples <END>, the sentence is done.
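The sampling loop described above can be sketched end to end. Vocabulary size, dimensions, and weights here are tiny illustrative placeholders; softmax turns the scores y into a distribution to sample from:

```python
import numpy as np

rng = np.random.default_rng(1)
V, D = 10, 16                       # tiny vocab + hidden size for illustration
embed = rng.normal(0, 0.1, (V, D))  # one embedding row per token
Wxh = rng.normal(0, 0.1, (D, D))
Whh = rng.normal(0, 0.1, (D, D))
Why = rng.normal(0, 0.1, (V, D))
START, END = 0, 1                   # special token ids

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sample_sentence(max_len=20):
    h = np.zeros(D)
    tok, out = START, []
    for _ in range(max_len):
        h = np.maximum(0, Wxh @ embed[tok] + Whh @ h)
        tok = rng.choice(V, p=softmax(Why @ h))  # sample the next token
        if tok == END:                           # sampled <END>? done.
            break
        out.append(tok)
    return out

print(sample_sentence())
```

With untrained weights the output is random token ids; training shapes the softmax so that sampled sequences look like the training sentences.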
Example…
Learning via Backpropagation…
“The Unreasonable Effectiveness of Recurrent Neural Networks”
karpathy.github.io
Character-level language model example
Vocabulary: [h,e,l,o]
Example training sequence: “hello”
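For the character-level example, each character of the vocabulary [h, e, l, o] is one-hot encoded, and the training sequence “hello” gives input characters “hell” paired with target characters “ello”. A sketch of that encoding:

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    v = np.zeros(len(vocab))
    v[char_to_ix[ch]] = 1.0
    return v

seq = "hello"
inputs  = [one_hot(ch) for ch in seq[:-1]]    # h, e, l, l
targets = [char_to_ix[ch] for ch in seq[1:]]  # e, l, l, o

print(targets)    # [1, 2, 2, 3]
print(inputs[0])  # [1. 0. 0. 0.]  (one-hot for 'h')
```

At each step the RNN is trained to put high score on the target character; note the same input 'l' must produce different targets ('l' then 'o'), which only works because the hidden state carries position information.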
Try it yourself: char-rnn on Github (uses Torch7)
Cooking Recipes
Obama Speeches
Learning from Linux Source Code
Yoav Goldberg’s n-gram experiments
Order-10 n-gram model on Shakespeare:
But on Linux:
Image captioning: a training example, an image X with the caption “straw hat”.
[Unrolled RNN over the caption: inputs x0 = <START>, x1 = “straw”, x2 = “hat”; hidden states h0…h2; outputs y0…y2]
To condition on the image, its CNN feature v is fed into the first hidden state:
before: h0 = max(0, Wxh * x0)
now: h0 = max(0, Wxh * x0 + Wih * v)
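The conditioning change on this slide, adding a Wih * v term for the image feature, can be sketched as follows. The 300-D embedding and 500-D hidden sizes follow the earlier slides; the 4096-D CNN feature size is an assumption (a typical fc7 feature), and all values are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
D_embed, D_hidden, D_img = 300, 500, 4096   # 4096-D CNN feature is an assumption
Wxh = rng.normal(0, 0.01, (D_hidden, D_embed))
Wih = rng.normal(0, 0.01, (D_hidden, D_img))   # new image-to-hidden weights

x0 = rng.normal(size=D_embed)   # embedding of <START>
v  = rng.normal(size=D_img)     # CNN feature of the image

h0_before = np.maximum(0, Wxh @ x0)            # before: text only
h0_now    = np.maximum(0, Wxh @ x0 + Wih @ v)  # now: image feature mixed in
print(h0_before.shape, h0_now.shape)  # (500,) (500,)
```

Only the first step changes; every later step then sees the image indirectly through the hidden state.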
At test time, feed the test image and x0 = <START>, compute h0 and y0, and sample the first word (“straw”). Feed it back in, compute h1 and y1, sample again (“hat”), and repeat; sampling the <END> token finishes the caption.
Qualitative captioning results range from “Wow I can’t believe that worked” to “Well, I can kind of see it” to “Not sure what happened there…”
Vanilla RNN...
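The transcript keeps only the titles of these vanilla-RNN slides; the standard point of this part of the CS231n material they credit is that backpropagation through a vanilla RNN multiplies the gradient by the recurrent weight matrix once per time step, so over long sequences it either explodes or vanishes. A numeric sketch with a toy diagonal recurrent matrix (values purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
D, T = 50, 100                     # hidden size, number of time steps
g = rng.normal(size=D)             # some upstream gradient
results = {}
for scale in (0.9, 1.1):           # largest singular value below / above 1
    Whh = scale * np.eye(D)        # toy recurrent matrix
    grad = g.copy()
    for _ in range(T):             # backprop through T time steps
        grad = Whh.T @ grad        # repeated multiplication by Whh^T
    results[scale] = np.linalg.norm(grad) / np.linalg.norm(g)
print(results)
```

With scale 0.9 the gradient norm collapses by roughly five orders of magnitude over 100 steps, and with 1.1 it blows up by four; this is the motivation for the LSTM's additive cell-state path on the next slides.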
Long Short Term Memory (LSTM)
Long Short Term Memory (LSTM): Gradient Flow
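One LSTM step, in the standard formulation these slides cover: input, forget, and output gates i, f, o plus a candidate update g, with the cell state updated additively as c_t = f ⊙ c_{t-1} + i ⊙ g. That additive path is what the "Gradient Flow" slides refer to: gradients through c avoid repeated multiplication by a weight matrix. Sizes and weights below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [x; h_prev] to the 4 stacked gate pre-activations."""
    D = h_prev.size
    a = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(a[0*D:1*D])        # input gate
    f = sigmoid(a[1*D:2*D])        # forget gate
    o = sigmoid(a[2*D:3*D])        # output gate
    g = np.tanh(a[3*D:4*D])        # candidate cell update
    c = f * c_prev + i * g         # additive cell-state update (good gradient flow)
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(4)
Dx, Dh = 8, 16                     # toy input / hidden sizes
W = rng.normal(0, 0.1, (4 * Dh, Dx + Dh))
b = np.zeros(4 * Dh)
h, c = lstm_step(rng.normal(size=Dx), np.zeros(Dh), np.zeros(Dh), W, b)
print(h.shape, c.shape)  # (16,) (16,)
```

Compared with the vanilla recurrence, the only multiplicative interaction on the c path is the elementwise forget gate, which the network can learn to keep near 1 to preserve information over many steps.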
Alternatives
Summary
“Visualizing and Understanding Recurrent Networks”, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei (on arXiv.org)
Hunting interpretable cells:
• quote detection cell
• line length tracking cell
• if statement cell
• quote/comment cell
• code depth cell
• something interesting cell (not quite sure what)