Recurrent Neural Networks - Simon Fraser University (CMPT 762)

Page 1

Recurrent Neural Networks

Slides borrowed from Abhishek Narwekar and Anusri Pampari at UIUC

Page 2

Lecture Outline

1. Introduction

2. Learning Long Term Dependencies

3. Visualization for RNNs

Page 3

Applications of RNNs

Image Captioning [reference]

… and Trump [reference]

Write like Shakespeare [reference]

… and more!

Page 4

Applications of RNNs

Technically, an RNN models sequences:
• Time series
• Natural language, speech

We can even convert non-sequences to sequences, e.g., feed an image as a sequence of pixels!

Page 5

Applications of RNNs

• RNN Generated TED Talks

• YouTube Link

• RNN-Generated Eminem-style rap

• RNN Shady

• RNN Generated Music

• Music Link

Page 6

Why RNNs?

• Can model sequences having variable length

• Efficient: Weights shared across time-steps

• They work!

• State of the art (SOTA) in several speech and NLP tasks

Page 7

The Recurrent Neuron

next time step

Source: Slides by Arun

● x_t: input at time t
● h_{t-1}: state at time t-1
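For concreteness, the recurrent update sketched in this diagram can be written as follows (a reconstruction assuming the usual tanh non-linearity and weight matrices W_h, W_x with bias b, since the exact equation appeared only as a figure):

$$h_t = \tanh\!\left(W_h\, h_{t-1} + W_x\, x_t + b\right)$$

The new state h_t is used both for the output at time t and as input to the next time step.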

Page 8

Unfolding an RNN

Weights shared over time!

Source: Slides by Arun
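As a minimal sketch of this unrolling (not from the original slides; the weight matrices Wh, Wx, Wy and the tanh non-linearity are assumptions for illustration), note how the same weights are reused at every time step:

import numpy as np

def run_rnn(xs, Wh, Wx, Wy, b, h0):
    # Unroll a vanilla RNN over the sequence xs.
    # The same Wh, Wx, Wy are applied at every time step (weights shared over time).
    h = h0
    ys = []
    for x in xs:                          # one iteration per time step
        h = np.tanh(Wh @ h + Wx @ x + b)  # recurrent state update
        ys.append(Wy @ h)                 # per-step output
    return ys, h

Unrolling turns the cyclic graph of the previous slide into an ordinary feedforward graph with one copy of the cell per time step.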

Page 9

Making Feedforward Neural Networks Deep

Source: http://www.opennn.net/images/deep_neural_network.png

Page 10

Option 1: Feedforward Depth (df)

• Feedforward depth: longest path between an input and output at the same timestep

Feedforward depth = 4

High level feature!

Notation: h_{0,1} ⇒ time step 0, neuron #1

Page 11

Option 2: Recurrent Depth (dr)

● Recurrent depth: longest path between the same hidden state in successive timesteps

Recurrent depth = 3

Page 12

Backpropagation Through Time (BPTT)

• Objective is to update the weight matrix:

• Issue: W occurs at each timestep
• Every path from W to L is one dependency
• Find all paths from W to L!

(note: dropping the subscript h from W_h for brevity)

Page 13

Systematically Finding All Paths

How many paths exist from W to L through L1?

Just 1, originating at h_0.

Page 14

Systematically Finding All Paths

How many paths from W to L through L2?

2, originating at h_0 and h_1.

Page 15

Systematically Finding All Paths

And 3 in this case.

The gradient has two summations:
1: over L_j
2: over h_k

Origin of path = basis for Σ

Page 16

Backpropagation as two summations

• First summation over L: $L = L_1 + L_2 + \cdots + L_{T-1}$, contributing the factor $\frac{\partial L}{\partial L_j}$


Page 18

Backpropagation as two summations

● Second summation over h: each L_j depends on the weight matrices before it

L_j depends on all h_k before it.

Page 19

Backpropagation as two summations

● No explicit dependency of L_j on h_k

● Use the chain rule to fill in the missing steps


Page 21

The Jacobian

Indirect dependency. One final use of the chain rule gives:

“The Jacobian”

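Written out (a reconstruction consistent with the surrounding slides, since the original equation appeared only as a figure), the chain rule expands the indirect dependency of h_j on h_k into a product of single-step Jacobians:

$$\frac{\partial h_j}{\partial h_k} \;=\; \prod_{t=k+1}^{j} \frac{\partial h_t}{\partial h_{t-1}}$$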

Page 22

The Final Backpropagation Equation
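The equation itself was shown as an image in the original deck; assembling the two summations and the Jacobian from the previous slides gives, as a reconstruction:

$$\frac{\partial L}{\partial W} \;=\; \sum_{j} \frac{\partial L}{\partial L_j} \sum_{k=1}^{j} \frac{\partial L_j}{\partial h_j}\,\frac{\partial h_j}{\partial h_k}\,\frac{\partial h_k}{\partial W}$$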

Page 23

Backpropagation as two summations

● Often, to reduce the memory requirement, we truncate the network

● The inner summation then runs from j−p to j for some p ⇒ truncated BPTT (sketched below)
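Under that truncation, the gradient is approximated by limiting the inner sum (a sketch using the reconstructed equation above):

$$\frac{\partial L}{\partial W} \;\approx\; \sum_{j} \frac{\partial L}{\partial L_j} \sum_{k=j-p}^{j} \frac{\partial L_j}{\partial h_j}\,\frac{\partial h_j}{\partial h_k}\,\frac{\partial h_k}{\partial W}$$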

Page 24

Expanding the Jacobian
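The expansion was shown as a figure; consistent with the labels on the next slide (weight matrix, derivative of the activation function), each single-step factor of the Jacobian expands roughly as:

$$\frac{\partial h_t}{\partial h_{t-1}} \;=\; W_h^{\top}\,\operatorname{diag}\!\left(\sigma'\!\left(W_h h_{t-1} + W_x x_t\right)\right)$$

so $\frac{\partial h_j}{\partial h_k}$ is a product of $j-k$ such factors.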

Page 25

The Issue with the Jacobian

Repeated matrix multiplications lead to vanishing and exploding gradients. How? Let's take a slight detour.

(Figure labels: weight matrix; derivative of the activation function)

Page 26

Eigenvalues and Stability
• Consider an identity activation function
• If the recurrent matrix W_h is diagonalizable:

Computing powers of W_h is simple:

Bengio et al., On the difficulty of training recurrent neural networks (2012)

Q: matrix composed of eigenvectors of W_h

Λ: a diagonal matrix with the eigenvalues placed on the diagonal
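The decomposition referred to above (shown as an image in the original slides) is the standard eigendecomposition, from which powers follow directly:

$$W_h = Q\,\Lambda\,Q^{-1}, \qquad W_h^{\,n} = Q\,\Lambda^{n}\,Q^{-1}$$

so repeated multiplication raises each eigenvalue to the n-th power.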

Page 27

Eigenvalues and Stability
• Largest eigenvalue magnitude < 1 ⇒ vanishing gradients
• Largest eigenvalue magnitude > 1 ⇒ exploding gradients

Page 28

2. Learning Long Term Dependencies

Page 29

Outline

Vanishing/Exploding Gradients in RNN

• Weight Initialization Methods
  ● Identity-RNN
  ● np-RNN
• Constant Error Carousel
  ● LSTM
  ● GRU
• Hessian Free Optimization
• Echo State Networks


Page 31

Weight Initialization Methods

• Activation function: ReLU

Bengio et al., On the difficulty of training recurrent neural networks (2012)

Page 32

Weight Initialization Methods

• Random initialization of W_h places no constraint on its eigenvalues
⇒ vanishing or exploding gradients in the initial epoch

Page 33

Weight Initialization Methods

• Careful initialization of W_h with suitable eigenvalues
⇒ allows the RNN to learn in the initial epochs
⇒ and hence to generalize well in later iterations

Page 34

Weight Initialization Trick #1: IRNN

● W_h initialized to the identity matrix

● Activation function: ReLU

Le et al., A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Page 35

Weight Initialization Trick #2: np-RNN
● W_h positive definite (positive real eigenvalues)

● At least one eigenvalue is 1; all others are less than or equal to 1

● Activation function: ReLU

Talathi and Vartak, Improving Performance of Recurrent Neural Network with ReLU Nonlinearity
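As an illustrative sketch of the two tricks (not from the slides; np_rnn_init below is a simplified stand-in for the published np-RNN recipe, not the exact construction):

import numpy as np

def irnn_init(n):
    # IRNN: start the recurrent matrix at the identity (all eigenvalues exactly 1)
    return np.eye(n)

def np_rnn_init(n):
    # Simplified np-RNN-style init: build a symmetric positive semi-definite matrix
    # (real, non-negative eigenvalues), then rescale so the largest eigenvalue is 1.
    A = np.random.randn(n, n) / np.sqrt(n)
    W = A @ A.T
    W /= np.max(np.linalg.eigvalsh(W))
    return W

Both are paired with a ReLU activation, as stated above.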

Page 36

np-RNN vs IRNN

• Sequence Classification Task

Talathi and Vartak, Improving Performance of Recurrent Neural Network with ReLU Nonlinearity

RNN Type | Test Accuracy | Parameter Complexity (vs. RNN) | Sensitivity to Parameters
IRNN     | 67%           | 1x                             | high
np-RNN   | 75.2%         | 1x                             | low
LSTM     | 78.5%         | 4x                             | low


Page 38

The LSTM Network (Long Short-Term Memory)

Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Page 39

The LSTM Cell

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

● h_{t-1} → h_t (output)
● c_{t-1} → c_t (cell state)

Page 40

The LSTM Cell

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

Gates: forget gate (f), input gate (i), candidate state (g), output gate (o)

● h_{t-1} → h_t (output)
● c_{t-1} → c_t (cell state)
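For reference, the gate equations (shown graphically in the original; this follows the standard formulation from the colah.github.io post cited above, using g for the candidate state as in these slides):

$$f_t = \sigma(W_f[h_{t-1}, x_t] + b_f), \qquad i_t = \sigma(W_i[h_{t-1}, x_t] + b_i), \qquad o_t = \sigma(W_o[h_{t-1}, x_t] + b_o)$$
$$g_t = \tanh(W_g[h_{t-1}, x_t] + b_g), \qquad c_t = f_t \odot c_{t-1} + i_t \odot g_t, \qquad h_t = o_t \odot \tanh(c_t)$$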

Page 41

The LSTM Cell: forget old state, remember new state

Forget gate (f), input gate (i), candidate state (g), output gate (o)

● σ(): sigmoid non-linearity
● ×: element-wise multiplication

● h_{t-1} → h_t (output)
● c_{t-1} → c_t (cell state)


Page 43

Gated Recurrent Unit

Source: https://en.wikipedia.org/wiki/Gated_recurrent_unit

h[t]: memory
ĥ[t]: new information
z[t]: update gate (where to add)
r[t]: reset gate
y[t]: output
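The corresponding update equations (a reconstruction following the Wikipedia article cited above; the exact convention for z_t varies between sources):

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1})$$
$$\hat{h}_t = \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1})\right), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t$$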

Page 44

Comparing GRU and LSTM

• Both GRU and LSTM perform better than an RNN with tanh on music and speech modeling

• GRU performs comparably to LSTM

• No clear consensus between GRU and LSTM

Source: Chung et al., Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (2014)

Page 45

Section 3: Visualizing and Understanding Recurrent Networks

Page 46

Character Level Language Modelling

Task: Predicting the next character given the current character

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”
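As a rough sketch of the setup (not Karpathy's actual code; the toy corpus, sizes, and helpers here are illustrative assumptions): characters are one-hot encoded, a vanilla RNN step produces a softmax distribution over the next character, and sampling from that distribution generates text.

import numpy as np

chars = sorted(set("hello world"))          # toy character vocabulary
V = len(chars)
char_to_ix = {c: i for i, c in enumerate(chars)}

H = 16                                      # hidden size (arbitrary for this sketch)
Wx = np.random.randn(H, V) * 0.01           # input-to-hidden weights
Wh = np.random.randn(H, H) * 0.01           # hidden-to-hidden (recurrent) weights
Wy = np.random.randn(V, H) * 0.01           # hidden-to-output weights

def step(h, c):
    # Consume one character, return the new hidden state and a
    # probability distribution over the next character.
    x = np.zeros(V)
    x[char_to_ix[c]] = 1.0                  # one-hot encode the current character
    h = np.tanh(Wx @ x + Wh @ h)            # recurrent update
    logits = Wy @ h
    p = np.exp(logits - logits.max())
    return h, p / p.sum()                   # softmax over the next character

h = np.zeros(H)
for c in "hello":
    h, p = step(h, c)                       # p: (untrained) prediction of the next character

Training would backpropagate the cross-entropy of p against the true next character through time, as in the BPTT section above.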

Page 47

Generated Text:
● Remembers to close a bracket
● Capitalizes nouns
● 404 Page Not Found! :P (the LSTM hallucinates it)

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 48

100th iteration

300th iteration

700th iteration

2000th iteration

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 49

Visualizing Predictions and Neuron “firings”

● Excited neuron inside a URL; not excited outside the URL
● Likely prediction vs. not a likely prediction

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 50

Features RNN Captures in Common Language ?

Page 51

Cell Sensitive to Position in Line
● Can be interpreted as tracking the line length

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 52

Cell That Turns On Inside Quotes

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 53

Features RNN Captures in C Language?

Page 54

Cell That Activates Inside IF Statements

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 55

Cell That Is Sensitive To Indentation
● Can be interpreted as tracking indentation of code

● Increasing strength as indentation increases

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 56

Non-Interpretable Cells

● Only 5% of the cells show such interesting properties
● A large portion of the cells are not interpretable by themselves

Andrej Karpathy, Blog on “Unreasonable Effectiveness of Recurrent Neural Networks”

Page 57

Visualizing Hidden State Dynamics

• Observe changes in the hidden state representation over time
• Tool: LSTMVis

Strobelt et al., Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

Page 58

Visualizing Hidden State Dynamics

Strobelt et al., Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

Page 59

Visualizing Hidden State Dynamics

Strobelt et al., Visual Analysis of Hidden State Dynamics in Recurrent Neural Networks

Page 60

Key Takeaways

• Recurrent neural networks for sequential data processing
  • Feedforward depth
  • Recurrent depth

• Long term dependencies are a major problem in RNNs. Solutions:
  • Intelligent weight initialization
  • LSTMs / GRUs

• Visualization helps
  • Analyze finer details of the features produced by RNNs

Page 61

References

• Survey Papers
  • Lipton, Zachary C., John Berkowitz, and Charles Elkan. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019 (2015).
  • Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Chapter 10: Sequence Modeling: Recurrent and Recursive Nets. MIT Press, 2016.

• Training
  • Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. Recurrent dropout without memory loss. arXiv preprint arXiv:1603.05118 (2016).
  • Arjovsky, Martin, Amar Shah, and Yoshua Bengio. Unitary evolution recurrent neural networks. arXiv preprint arXiv:1511.06464 (2015).
  • Le, Quoc V., Navdeep Jaitly, and Geoffrey E. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv preprint arXiv:1504.00941 (2015).
  • Cooijmans, Tim, et al. Recurrent batch normalization. arXiv preprint arXiv:1603.09025 (2016).

Page 62

References (contd)

• Architectural Complexity Measures
  • Zhang, Saizheng, et al. Architectural Complexity Measures of Recurrent Neural Networks. Advances in Neural Information Processing Systems, 2016.
  • Pascanu, Razvan, et al. How to construct deep recurrent neural networks. arXiv preprint arXiv:1312.6026 (2013).

• RNN Variants
  • Zilly, Julian Georg, et al. Recurrent highway networks. arXiv preprint arXiv:1607.03474 (2016).
  • Chung, Junyoung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. arXiv preprint arXiv:1609.01704 (2016).

• Visualization
  • Karpathy, Andrej, Justin Johnson, and Li Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078 (2015).
  • Strobelt, Hendrik, Sebastian Gehrmann, Bernd Huber, Hanspeter Pfister, and Alexander M. Rush. LSTMVis: Visual Analysis for RNN. arXiv preprint arXiv:1606.07461 (2016).