Source: aritter.github.io/courses/5523_slides/rnn.pdf

Recurrent Neural Networks

Instructor: Alan Ritter

Slides adapted from Andrej Karpathy (Fei-Fei Li, Andrej Karpathy & Justin Johnson, CS231n Lecture 10, 8 Feb 2016)

Recurrent Networks offer a lot of flexibility:

- Vanilla Neural Networks: fixed-size input -> fixed-size output (one to one)
- e.g. Image Captioning: image -> sequence of words (one to many)
- e.g. Sentiment Classification: sequence of words -> sentiment (many to one)
- e.g. Machine Translation: sequence of words -> sequence of words (many to many)
- e.g. Video classification at the frame level (many to many, one output per frame)

Recurrent Neural Network

Input x -> RNN -> output y: usually we want to predict a vector y at some (or every) time step.

Recurrent Neural Network

We can process a sequence of vectors x by applying a recurrence formula at every time step:

h_t = f_W(h_{t-1}, x_t)

where h_t is the new state, h_{t-1} the old state, x_t the input vector at some time step, and f_W some function with parameters W.

Notice: the same function and the same set of parameters are used at every time step.

(Vanilla) Recurrent Neural Network

The state consists of a single “hidden” vector h:

h_t = tanh(Whh * h_{t-1} + Wxh * x_t)
y_t = Why * h_t
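A rough sketch of this update in NumPy (the weight names follow the slides; shapes and initialization are illustrative assumptions, not the lecture's code):

import numpy as np

def rnn_step(x, h, Wxh, Whh, Why):
    # h_t = tanh(Whh * h_{t-1} + Wxh * x_t); y_t = Why * h_t
    h = np.tanh(Wxh @ x + Whh @ h)   # new hidden state
    y = Why @ h                      # prediction at this time step
    return y, h

# e.g. with hidden size n=100 and input size d=50:
# Wxh: (100, 50), Whh: (100, 100), Why: (vocab_size, 100); x: (50, 1), h: (100, 1)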


Character-level language model example

Vocabulary: [h, e, l, o]
Example training sequence: “hello”

Each character is fed to the RNN as a one-hot input x; at every time step the output y scores the likely next character.


min-char-rnn.py gist: 112 lines of Python

(https://gist.github.com/karpathy/d4dee566867f8291f086)

The gist walks through, in order:
- Data I/O
- Initializations
- Main loop
- Loss function: forward pass (compute loss), backward pass (compute parameter gradients)
- Softmax classifier
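For flavor, a compressed sketch of the forward half of such a loss function (in the spirit of min-char-rnn.py; variable names here are illustrative, not copied from the gist):

import numpy as np

def forward_loss(inputs, targets, hprev, Wxh, Whh, Why, bh, by, vocab_size):
    # inputs/targets: lists of character indices; hprev: initial hidden state
    h, loss = hprev, 0.0
    for t in range(len(inputs)):
        x = np.zeros((vocab_size, 1))
        x[inputs[t]] = 1                                 # one-hot encode the input char
        h = np.tanh(Wxh @ x + Whh @ h + bh)              # vanilla RNN recurrence
        scores = Why @ h + by                            # unnormalized next-char scores
        probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax classifier
        loss += -np.log(probs[targets[t], 0])            # cross-entropy on the correct char
    return loss, h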


Training a character-level RNN on text and sampling from it: at first the samples are random jumbles of characters; as we train more and more, the samples become increasingly realistic.
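A hedged sketch of that sampling procedure, one character at a time (min-char-rnn.py has an equivalent sample function, though the details here are assumptions):

import numpy as np

def sample(h, seed_ix, n, Wxh, Whh, Why, bh, by, vocab_size, rng=np.random.default_rng()):
    # generate n character indices, starting from seed_ix, feeding samples back in
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1
    ixes = []
    for _ in range(n):
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        scores = Why @ h + by
        p = np.exp(scores) / np.sum(np.exp(scores))   # distribution over the next char
        ix = int(rng.choice(vocab_size, p=p.ravel()))
        x = np.zeros((vocab_size, 1))
        x[ix] = 1                                     # the sample becomes the next input
        ixes.append(ix)
    return ixes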

Example: an open-source textbook on algebraic geometry; trained on the LaTeX source, the model generates plausible-looking LaTeX.


Example: generated C code (the same model trained on C source code).


Searching for interpretable cells

[Visualizing and Understanding Recurrent Networks, Andrej Karpathy*, Justin Johnson*, Li Fei-Fei]

Some cells found this way are interpretable:
- quote detection cell
- line length tracking cell
- if statement cell
- quote/comment cell
- code depth cell

Image Captioning

- Explain Images with Multimodal Recurrent Neural Networks, Mao et al.
- Deep Visual-Semantic Alignments for Generating Image Descriptions, Karpathy and Fei-Fei
- Show and Tell: A Neural Image Caption Generator, Vinyals et al.
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description, Donahue et al.
- Learning a Recurrent Visual Representation for Image Caption Generation, Chen and Zitnick

A Convolutional Neural Network encodes the test image into a feature vector v; a Recurrent Neural Network then generates the caption, conditioned on v through an extra term in the recurrence:

before: h = tanh(Wxh * x + Whh * h)
now:    h = tanh(Wxh * x + Whh * h + Wih * v)

Decoding starts from a special <START> token as the first input x0. At each step the RNN produces a hidden state (h0, h1, h2, ...) and an output distribution over words (y0, y1, y2, ...); we sample a word (e.g. “straw”, then “hat”) and feed the sampled word back in as the next input. Sampling the <END> token finishes the caption.
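A minimal sketch of that decode loop (cnn_features, embed, and the weight names are hypothetical stand-ins for illustration, not the papers' code):

import numpy as np

def generate_caption(image, vocab, embed, Wxh, Whh, Wih, Why,
                     cnn_features, start_ix, end_ix, max_len=20,
                     rng=np.random.default_rng()):
    v = cnn_features(image)                        # image feature vector v from the CNN
    h = np.zeros((Whh.shape[0], 1))
    x = embed[:, [start_ix]]                       # x0 = embedding of the <START> token
    words = []
    for _ in range(max_len):
        h = np.tanh(Wxh @ x + Whh @ h + Wih @ v)   # recurrence with the image term
        scores = Why @ h
        p = np.exp(scores) / np.sum(np.exp(scores))
        ix = int(rng.choice(len(vocab), p=p.ravel()))  # sample the next word
        if ix == end_ix:                           # <END> token => finish
            break
        words.append(vocab[ix])
        x = embed[:, [ix]]                         # feed the sampled word back in
    return " ".join(words)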

Image Sentence Datasets

Microsoft COCO [Tsung-Yi Lin et al., 2014], mscoco.org
currently: ~120K images, ~5 sentences each

RNNs can be stacked: hidden states are indexed by both time and depth. At layer l, the recurrence takes the hidden state from the layer below at the same time step together with this layer's state from the previous time step:

RNN:  h_t^l = tanh(W^l * [h_t^(l-1); h_(t-1)^l])

LSTM: the same stacking scheme, with the LSTM update in place of the tanh recurrence.
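A sketch of one time step of such a stack (two or more layers; shapes are assumptions for illustration):

import numpy as np

def stacked_rnn_step(x, hs, Ws):
    # hs: list of hidden states, one per layer; Ws[l] maps [input from below; h_prev]
    below = x
    for l in range(len(hs)):
        hs[l] = np.tanh(Ws[l] @ np.vstack([below, hs[l]]))  # same form at every depth
        below = hs[l]                                       # feeds the layer above
    return hs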


Long Short Term Memory (LSTM) [Hochreiter & Schmidhuber, 1997]

The inputs are the vector from below (x) and the vector from before (h). A weight matrix W of size 4n x 2n maps the stacked 2n-dimensional [x; h] to a 4n-dimensional vector, which is split into four n-dimensional gates:

i = sigmoid(.)   input gate
f = sigmoid(.)   forget gate
o = sigmoid(.)   output gate
g = tanh(.)      candidate values

The cell state c is updated additively (⊙ = elementwise multiplication):

c_t = f ⊙ c_{t-1} + i ⊙ g     (forget part of the old cell, add new candidate values)
h_t = o ⊙ tanh(c_t)

The hidden state h_t goes to the higher layer, or to the prediction.
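Putting the gates and the additive cell update together, one LSTM step might be sketched as follows (the single stacked matrix W of size 4n x 2n follows the slide; the rest is an illustrative assumption):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W):
    # x and h are both n-dimensional; W: (4n, 2n) maps [x; h] to the four gates
    n = h.shape[0]
    gates = W @ np.vstack([x, h])       # 4n-dimensional pre-activations
    i = sigmoid(gates[0*n:1*n])         # input gate
    f = sigmoid(gates[1*n:2*n])         # forget gate
    o = sigmoid(gates[2*n:3*n])         # output gate
    g = np.tanh(gates[3*n:4*n])         # candidate values
    c = f * c + i * g                   # additive cell state update
    h = o * np.tanh(c)                  # gated hidden state
    return h, c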

Unrolled over time, one timestep after another applies the same gated update: at each step the gates (i, f, o, g) are recomputed from the current x and h, while the cell state c flows from timestep to timestep, modified only by the elementwise forget multiply and the additive update.

Gradient flow: in a plain RNN, the state passes through the full transformation f at every step, so backward signals are repeatedly squashed and transformed. In an LSTM (ignoring forget gates), the cell state is modified only by addition (f, +, f, +, ...), giving gradients an uninterrupted path backward through time.

Recall: “PlainNets” vs. ResNets

ResNet is to PlainNet what LSTM is to RNN, kind of.


Understanding gradient flow dynamics

Cute backprop signal video: http://imgur.com/gallery/vaNahKE

Understanding gradient flow dynamics

Backpropagation through time repeatedly multiplies the gradient by the recurrence Jacobian (involving Whh):
- if its largest eigenvalue is > 1, the gradient will explode
- if its largest eigenvalue is < 1, the gradient will vanish

[On the difficulty of training Recurrent Neural Networks, Pascanu et al., 2013]

We can control exploding gradients with gradient clipping, and vanishing gradients with an LSTM.
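Two common clipping variants (the elementwise form matches what min-char-rnn.py does; the norm-based form is the one proposed by Pascanu et al.):

import numpy as np

def clip_elementwise(grads, limit=5.0):
    # clip each gradient entry into [-limit, limit], in place
    for g in grads:
        np.clip(g, -limit, limit, out=g)
    return grads

def clip_by_global_norm(grads, max_norm=5.0):
    # rescale all gradients together if their joint norm exceeds max_norm
    total = np.sqrt(sum(np.sum(g * g) for g in grads))
    if total > max_norm:
        for g in grads:
            g *= max_norm / total
    return grads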

LSTM variants and friends

[LSTM: A Search Space Odyssey, Greff et al., 2015]
[An Empirical Exploration of Recurrent Network Architectures, Jozefowicz et al., 2015]
GRU [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, Cho et al., 2014]
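For comparison, a sketch of a GRU step (reset gate r, update gate z, following the equations in Cho et al., 2014; the stacked-weight layout is an assumption):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h, Wz, Wr, Wh):
    # each weight matrix maps its stacked input to an n-dimensional vector
    z = sigmoid(Wz @ np.vstack([x, h]))             # update gate
    r = sigmoid(Wr @ np.vstack([x, h]))             # reset gate
    h_tilde = np.tanh(Wh @ np.vstack([x, r * h]))   # candidate state (reset applied)
    return z * h + (1.0 - z) * h_tilde              # interpolate old state and candidate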

Summary

- RNNs allow a lot of flexibility in architecture design.
- Vanilla RNNs are simple but don't work very well.
- It is common to use LSTM or GRU: their additive interactions improve gradient flow.
- Backward flow of gradients in an RNN can explode or vanish. Exploding is controlled with gradient clipping; vanishing is controlled with additive interactions (LSTM).
- Better/simpler architectures are a hot topic of current research.
- Better understanding (both theoretical and empirical) is needed.