
Recurrent Neural Networks

Slides borrowed from Abhishek Narwekar and Anusri Pampari at UIUC

Lecture Outline

1. Introduction

2. Learning Long Term Dependencies

3. Visualization for RNNs

Applications of RNNs

Image Captioning [reference]

… and Trump [reference]

Write like Shakespeare [reference]

… and more!

Applications of RNNs

Technically, an RNN models sequences:
• Time series
• Natural language, speech

We can even convert non-sequences to sequences, e.g., feed an image as a sequence of pixels!

Applications of RNNs

• RNN Generated TED Talks

• YouTube Link

• RNN-Generated Eminem-style rap

• RNN Shady

• RNN Generated Music

• Music Link

Why RNNs?

• Can model sequences having variable length

• Efficient: Weights shared across time-steps

• They work!

• State of the art (SOTA) in several speech and NLP tasks

The Recurrent Neuron

[Diagram: a recurrent neuron whose state feeds into the next time step]

Source: Slides by Arun

● xt: input at time t
● ht−1: state at time t−1

Unfolding an RNN

Weights shared over time!

Source: Slides by Arun
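To make the recurrence concrete, here is a minimal NumPy sketch of one recurrent step and of unfolding it over a toy sequence. The tanh nonlinearity, the dimensions, and the variable names are illustrative assumptions, not taken from the slides.

import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # One recurrence step: combine the current input with the previous state.
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

# Toy dimensions (assumed for illustration).
input_dim, hidden_dim = 3, 5
rng = np.random.default_rng(0)
W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

# Unfolding: the SAME weights are reused at every time step.
h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(4, input_dim)):   # a length-4 toy sequence
    h = rnn_step(x_t, h, W_x, W_h, b)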

Making Feedforward Neural Networks Deep

Source: http://www.opennn.net/images/deep_neural_network.png

Option 1: Feedforward Depth (df)

• Feedforward depth: longest path between an input and output at the same timestep

Feedforward depth = 4

High level feature!

Notation: h0,1 ⇒ time step 0, neuron #1

Option 2: Recurrent Depth (dr)

● Recurrent depth: longest path between the same hidden state in successive timesteps

Recurrent depth = 3

Backpropagation Through Time (BPTT)

• Objective is to update the weight matrix:

• Issue: W occurs at each timestep
• Every path from W to L is one dependency
• Find all paths from W to L!

(note: dropping subscript h from Wh for brevity)

Systematically Finding All Paths

How many paths exist from W to L through L1?

Just 1, originating at h0.

Systematically Finding All Paths

How many paths from W to L through L2?

2, originating at h0 and h1.

Systematically Finding All Paths

And 3 in this case, through L3: originating at h0, h1, and h2.

The gradient has two summations:
1: over Lj
2: over hk

Origin of path = basis for Σ

Backpropagation as two summations

• First summation over L:

$L = L_1 + L_2 + \dots + L_{T-1}$, giving the terms $\frac{\partial L}{\partial L_j}$


Backpropagation as two summations

● Second summation over h: Each Lj depends on the weight matrices before it

Lj depends on all hk before it.

Backpropagation as two summations

● No explicit dependence of Lj on hk
● Use the chain rule to fill in the missing steps


The Jacobian

Indirect dependency. One final use of the chain rule gives:

“The Jacobian”


The Final Backpropagation Equation
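The equation on this slide was an image; the standard BPTT form it corresponds to, combining the two summations above (over the per-timestep losses Lj and over the earlier states hk), is:

\frac{\partial L}{\partial W} \;=\; \sum_{j} \frac{\partial L}{\partial L_j} \sum_{k \le j} \frac{\partial L_j}{\partial h_j}\,\frac{\partial h_j}{\partial h_k}\,\frac{\partial h_k}{\partial W}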

Backpropagation as two summations

● Often, to reduce the memory requirement, we truncate the network
● The inner summation runs from j-p to j for some p ⇒ truncated BPTT

Expanding the Jacobian
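For a vanilla RNN update h_t = f(W_h h_{t-1} + W_x x_t) (bias omitted), the Jacobian expands into a product of per-step factors, one weight-matrix term and one activation-derivative term per step. This is the standard expansion, reconstructed here since the slide's own equation was an image:

\frac{\partial h_j}{\partial h_k} \;=\; \prod_{t=k+1}^{j} \frac{\partial h_t}{\partial h_{t-1}} \;=\; \prod_{t=k+1}^{j} W_h^{\top}\,\mathrm{diag}\!\big(f'(W_h h_{t-1} + W_x x_t)\big)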

The Issue with the Jacobian

Repeated matrix multiplications lead to vanishing and exploding gradients. How? Let's take a slight detour.

(Terms in the product: the weight matrix, and the derivative of the activation function.)

Eigenvalues and Stability

• Consider an identity activation function
• If the recurrent matrix Wh is diagonalizable:

Computing powers of Wh is simple:

Bengio et al, "On the difficulty of training recurrent neural networks." (2012)

Q: matrix composed of the eigenvectors of Wh

Λ: diagonal matrix with the eigenvalues placed on the diagonal
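With the identity activation, the product of Jacobians over n steps reduces to powers of Wh; a reconstruction of the diagonalization identity shown on the slide (the original equation was an image):

W_h = Q \Lambda Q^{-1} \quad\Longrightarrow\quad W_h^{\,n} = Q \Lambda^{n} Q^{-1}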

Eigenvalues and Stability

• Eigenvalues of Wh with magnitude less than 1 ⇒ vanishing gradients
• Eigenvalues of Wh with magnitude greater than 1 ⇒ exploding gradients

2. Learning Long Term Dependencies

Outline

Vanishing/Exploding Gradients in RNN

Weight Initialization Methods
● Identity-RNN
● np-RNN

Constant Error Carousel
● LSTM
● GRU

Hessian-Free Optimization

Echo State Networks


Weight Initialization Methods

• Activation function: ReLU

Bengio et al,. "On the difficulty of training recurrent neural networks." ( 2012

Weight Initialization Methods

• Random initialization of Wh places no constraint on the eigenvalues ⇒ vanishing or exploding gradients in the initial epoch

Weight Initialization Methods

• Careful initialization of Wh with suitable eigenvalues ⇒ allows the RNN to learn in the initial epochs ⇒ hence it can generalize well in further iterations

Weight Initialization Trick #1: IRNN

● Wh initialized to Identity

● Activation function: ReLU

Le et al., “A Simple Way to Initialize Recurrent Networks of Rectified Linear Units”
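A minimal NumPy sketch of the IRNN idea: start the recurrent matrix at the identity, biases at zero, and use a ReLU recurrence. The dimensions and names are illustrative assumptions.

import numpy as np

hidden_dim, input_dim = 100, 20        # illustrative sizes
rng = np.random.default_rng(0)

W_h = np.eye(hidden_dim)               # recurrent matrix initialized to the identity
W_x = rng.normal(scale=0.001, size=(hidden_dim, input_dim))
b_h = np.zeros(hidden_dim)

def irnn_step(x_t, h_prev):
    # ReLU recurrence: with W_h = I, the recurrent path initially copies the state forward.
    return np.maximum(0.0, W_h @ h_prev + W_x @ x_t + b_h)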

Weight Initialization Trick #2: np-RNN

● Wh positive definite (positive real eigenvalues)

● At least one eigenvalue is 1; all others are less than or equal to one

● Activation function: ReLU

Talathi and Vartak, “Improving Performance of Recurrent Neural Network with ReLU Nonlinearity”
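A sketch of one way to construct a recurrent matrix with the spectrum stated above (positive real eigenvalues, the largest equal to 1, the rest at most 1). This follows the listed constraints rather than claiming to reproduce the paper's exact recipe.

import numpy as np

def np_rnn_recurrent_init(hidden_dim, seed=0):
    # Build a symmetric positive-definite matrix, then rescale so that its
    # largest eigenvalue is exactly 1 and all others lie in (0, 1].
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(hidden_dim, hidden_dim))
    A = R @ R.T / hidden_dim + 1e-6 * np.eye(hidden_dim)
    return A / np.linalg.eigvalsh(A).max()

W_h = np_rnn_recurrent_init(200)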

np-RNN vs IRNN

• Sequence Classification Task

Talathi and Vartak, “Improving Performance of Recurrent Neural Network with ReLU Nonlinearity”

RNN type | Test accuracy | Parameter complexity (compared to RNN) | Sensitivity to parameters
IRNN     | 67%           | 1x                                     | High
np-RNN   | 75.2%         | 1x                                     | Low
LSTM     | 78.5%         | 4x                                     | Low

Outline

Vanishing/Exploding Gradients in RNN

Weight Initialization Methods
● Identity-RNN
● np-RNN

Constant Error Carousel
● LSTM
● GRU

Hessian-Free Optimization

Echo State Networks

The LSTM Network (Long Short-Term Memory)

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

The LSTM Cell

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

[Diagram: cell state ct−1 → ct; hidden state / output ht−1 → ht]

The LSTM Cell

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

Gates:
● Forget gate (f)
● Input gate (i)
● Candidate state (g)
● Output gate (o)

[Diagram: cell state ct−1 → ct; hidden state / output ht−1 → ht]

The LSTM Cell: Forget Old State, Remember New State

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

● Forget gate (f): element-wise multiplication with ct−1 forgets the old state
● Input gate (i) × candidate state (g): remembers the new state
● Output gate (o): produces the output ht from the updated cell state ct
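The gates in the diagram correspond to the standard LSTM update; here is a minimal NumPy sketch of one step. The stacked-parameter layout, sizes, and names are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b stack the parameters of the four gates in the order [i, f, g, o].
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2*H])          # forget gate
    g = np.tanh(z[2*H:3*H])        # candidate state
    o = sigmoid(z[3*H:4*H])        # output gate
    c_t = f * c_prev + i * g       # forget old state + remember new state
    h_t = o * np.tanh(c_t)         # output / new hidden state
    return h_t, c_t

# Toy usage with assumed sizes.
D, H = 3, 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)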


Gated Recurrent Unit

Source: https://en.wikipedia.org/wiki/Gated_recurrent_unit

h[t]: memory
ĥ[t]: new information
z[t]: update gate (where to add)
r[t]: reset gate
y[t]: output
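Written out with the gates listed above, under one common convention (the roles of z and 1−z are swapped in some formulations, and the output y[t] is typically just a readout of h[t]):

z_t = \sigma(W_z x_t + U_z h_{t-1})
r_t = \sigma(W_r x_t + U_r h_{t-1})
\hat{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}))
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t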

Comparing GRU and LSTM

• Both GRU and LSTM perform better than an RNN with tanh on music and speech modeling

• GRU performs comparably to LSTM

• No clear consensus between GRU and LSTM

Source: Chung et al., "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling" (2014)

Section 3: Visualizing and Understanding Recurrent Networks

Character Level Language Modelling

Task: Predicting the next character given the current character

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
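A minimal sketch of what "predicting the next character" looks like in code: one-hot encode the current character, advance the hidden state, and take a softmax over the vocabulary. The toy vocabulary, sizes, and names are assumptions for illustration, not Karpathy's implementation.

import numpy as np

vocab = sorted(set("hello world"))                 # toy character vocabulary
char_to_ix = {ch: i for i, ch in enumerate(vocab)}
V, H = len(vocab), 16

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))

def next_char_probs(ch, h):
    # Advance the hidden state with the current character and return a
    # probability distribution over the next character.
    x = np.zeros(V)
    x[char_to_ix[ch]] = 1.0
    h = np.tanh(W_xh @ x + W_hh @ h)
    logits = W_hy @ h
    p = np.exp(logits - logits.max())               # softmax
    return p / p.sum(), h

p, h = next_char_probs("h", np.zeros(H))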

Generated text:
● Remembers to close a bracket
● Capitalizes nouns
● 404 Page Not Found! :P The LSTM hallucinates it.

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

[Generated text samples after the 100th, 300th, 700th, and 2000th training iterations]

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Visualizing Predictions and Neuron “firings”

● Excited neuron inside the URL; neuron not excited outside the URL
● Likely prediction vs. not a likely prediction

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

What Features Does an RNN Capture in Common Language?

Cell Sensitive to Position in Line

● Can be interpreted as tracking the line length

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Cell That Turns On Inside Quotes

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

What Features Does an RNN Capture in the C Language?

Cell That Activates Inside IF Statements

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Cell That Is Sensitive to Indentation

● Can be interpreted as tracking the indentation of code

● Increasing strength as indentation increases

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Non-Interpretable Cells

● Only 5% of the cells show such interesting properties
● A large portion of the cells are not interpretable by themselves

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks ”

Visualizing Hidden State Dynamics

• Observe changes in the hidden state representation over time
• Tool: LSTMVis

Strobelt et al., “Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks”

Visualizing Hidden State Dynamics

Strobelt et al., “Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks”


Key Takeaways

• Recurrent neural networks for sequential data processing
• Feedforward depth

• Recurrent depth

• Long term dependencies are a major problem in RNNs. Solutions:
• Intelligent weight initialization

• LSTMs / GRUs

• Visualization helps
• Analyze finer details of features produced by RNNs

References

• Survey Papers
  • Lipton, Zachary C., John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Chapter 10: Sequence Modeling: Recurrent and Recursive Nets. MIT Press, 2016.
• Training
  • Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. Recurrent dropout without memory loss. arXiv preprint arXiv:1603.05118 (2016).
  • Arjovsky, Martin, Amar Shah, and Yoshua Bengio. Unitary evolution recurrent neural networks. arXiv preprint arXiv:1511.06464 (2015).
  • Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015).
  • Cooijmans, Tim, et al. Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016).

References (contd)

• Architectural Complexity Measures
  • Zhang, Saizheng, et al. Architectural complexity measures of recurrent neural networks. Advances in Neural Information Processing Systems, 2016.
  • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026 (2013).
• RNN Variants
  • Zilly, Julian Georg, et al. Recurrent highway networks. arXiv preprint arXiv:1607.03474 (2016).
  • Chung, Junyoung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. arXiv preprint arXiv:1609.01704 (2016).
• Visualization
  • Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078 (2015).
  • Strobelt, Hendrik, Sebastian Gehrmann, Bernd Huber, Hanspeter Pfister, and Alexander M. Rush. LSTMVis: Visual Analysis for RNN. arXiv preprint arXiv:1606.07461 (2016).