Recurrent neural networks


Page 1: Recurrent neural networks

Recurrent Neural Networks

Viacheslav Khomenko, Ph.D.

Page 2: Recurrent neural networks

Contents

Recap: feed-forward artificial neural network

Temporal dependencies

Recurrent neural network architectures

RNN training

New RNN architectures

Practical considerations

Neural models for locomotion

Application of RNNs

Page 3: Recurrent neural networks

RECAP: FEED-FORWARD ARTIFICIAL NEURAL NETWORK

Page 4: Recurrent neural networks

Feed-forward network

W. McCulloch and W. Pitts, 1940s: abstract mathematical model of a brain cell

Perceptron for classification: F. Rosenblatt, 1958

Multi-layer artificial neural network: P. Werbos, 1975

[Figure: feed-forward network for Iris flower classification. Input features (petals, sepal, yellow patch, veins) feed the input layer; hidden layer(s) follow; the output layer produces the decisions Iris / ¬Iris.]

Page 5: Recurrent neural networks

Feed-forward network

Decisions are based on current inputs:

• No memory about the past

• No future scope

[Figure: input x → input layer → hidden layer(s) → output layer → decision output y.]

Simplified representation: x → A → y

Vector of input features: $x$
Vector of predicted values: $y$
Neural activation: $y = A(w \cdot x + b)$, where A is some activation function (tanh, etc.) and $w$, $b$ are the network parameters.

Page 6: Recurrent neural networks

TEMPORAL DEPENDENCIES

Page 7: Recurrent neural networks

Temporal dependencies

Analyzing temporal dependencies: per-frame predictions

Frame 0 (stem: seen, petals: hidden):     P(Iris) = 0.1,  P(¬Iris) = 0.9
Frame 1 (stem: seen, petals: hidden):     P(Iris) = 0.11, P(¬Iris) = 0.89
Frame 2 (stem: seen, petals: partial):    P(Iris) = 0.2,  P(¬Iris) = 0.8
Frame 3 (stem: partial, petals: partial): P(Iris) = 0.45, P(¬Iris) = 0.55
Frame 4 (stem: hidden, petals: seen):     P(Iris) = 0.9,  P(¬Iris) = 0.1

Decision on the sequence of observations → improved decisions

Page 8: Recurrent neural networks

Reber Grammar

A synthetic problem that cannot be solved without memory: for each state, learn to predict the next possible edges.

Transitions (edges) leaving a state (node) have equal probabilities, e.g. P(1→2) = P(1→3) = 0.5.

[Figure: Reber grammar automaton with states (nodes) and labeled transitions (edges), each outgoing transition having probability 0.5.]
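To make the task concrete, here is a small generator for such strings. The transition table is an assumed reconstruction of the standard Reber grammar graph (node numbering chosen to match the table on the next slide), not taken verbatim from the figure.

```python
import random

# Assumed reconstruction of the Reber grammar graph: each node maps to its
# outgoing edges (symbol, next node), chosen with equal probability
# (0.5 per edge, as stated on the slide).
REBER_EDGES = {
    0: [("B", 1)],                # Begin -> node 1
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 4)],
    3: [("T", 3), ("V", 5)],
    4: [("X", 3), ("S", 6)],
    5: [("P", 4), ("V", 6)],
    6: [("E", None)],             # node 6 -> End
}

def generate_reber_string(rng=random):
    """Generate one Reber grammar word, e.g. 'BPTTVPXTVVE'."""
    node, symbols = 0, []
    while node is not None:
        symbol, node = rng.choice(REBER_EDGES[node])
        symbols.append(symbol)
    return "".join(symbols)

def next_possible_symbols(node):
    """Target for the prediction task: the set of legal next symbols."""
    return {symbol for symbol, _ in REBER_EDGES[node]}

if __name__ == "__main__":
    print(generate_reber_string())
```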

Page 9: Recurrent neural networks

Reber Grammar: the word BPTTTTTVPXTTTTVVE processed step by step.

Symbol | Step | Current node (Begin 1 2 3 4 5 6) | Possible paths (1 2 3 4 5 6 End)
B      |  0   | 1 0 0 0 0 0 0                    | 1 0 0 0 0 0 0
P      |  1   | 0 1 0 0 0 0 0                    | 0 1 1 0 0 0 0
T      |  2   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      |  3   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      |  4   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      |  5   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      |  6   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V      |  7   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
P      |  8   | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
X      |  9   | 0 0 0 0 1 0 0                    | 0 0 1 0 0 1 0
T      | 10   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      | 11   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      | 12   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
T      | 13   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V      | 14   | 0 0 0 1 0 0 0                    | 0 0 1 0 1 0 0
V      | 15   | 0 0 0 0 0 1 0                    | 0 0 0 1 0 1 0
E      | 16   | 0 0 0 0 0 0 1                    | 0 0 0 0 0 0 1

The input vector x at time t = 2 and the output vector y at time t = 2 correspond to the step-2 row of the table (the "current node" and "possible paths" encodings, respectively).

Page 20: Recurrent neural networks

Memory is important → reasoning relies on experience

Page 21: Recurrent neural networks

Time-delay neural network (TDNN):
• An FFNN with delayed inputs: x(t), x(t-1), x(t-2), x(t-3)
• No internal state

Pro: captures dependencies between features at different timestamps

Cons:
• Limited history of the input (< 10 timestamps)
• Delay values must be set explicitly
• Not general: cannot solve complex tasks (such as the Reber Grammar)

[Figure: delayed copies of the input x(t), x(t-1), x(t-2), x(t-3) feed the input layer of a feed-forward network with one hidden layer and output y(t).]
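As a rough sketch, a TDNN is just an ordinary feed-forward layer applied to the concatenation of the current and delayed inputs; the layer sizes, tanh activation, and random weights below are illustrative assumptions.

```python
import numpy as np

def tdnn_forward(x_window, W_h, b_h, W_o, b_o):
    """Time-delay network step: a plain FFNN over [x(t), x(t-1), x(t-2), x(t-3)].

    x_window : array of shape (delays + 1, n_features), newest frame first.
    """
    z = x_window.reshape(-1)              # concatenate the delayed inputs
    h = np.tanh(W_h @ z + b_h)            # hidden layer
    return W_o @ h + b_o                  # output y(t)

# Example with assumed sizes: 4 frames of 3 features, 8 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W_h, b_h = rng.normal(size=(8, 4 * 3)), np.zeros(8)
W_o, b_o = rng.normal(size=(2, 8)), np.zeros(2)
y_t = tdnn_forward(rng.normal(size=(4, 3)), W_h, b_h, W_o, b_o)
```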

Page 22: Recurrent neural networks

RECURRENT NEURAL NETWORK ARCHITECTURES

Page 23: Recurrent neural networks

Naïve attempt: simple recurrence, feeding the output back to the input (with a 1-step delay).

But... it does not work, because it is not stable: there is no feedback control, and the obtained output drifts away from the expected output.

[Figure: input x(t) and the past output state (1-step delay) feed the input layer; hidden layer; output layer produces y(t). Plots compare the expected and obtained outputs.]

Introducing recurrence

Page 24: Recurrent neural networks

Jordan recurrent network (M.I. Jordan, 1986)

Output-to-hidden connections: a context layer holds the previous output (1-step delay) and feeds it back to the hidden layer. This gives a limited short-term memory.

[Figure: input layer → hidden layer → output layer, with a context layer fed by the output through a 1-step delay.]

Pro: fast to train, because it can be parallelized in time.

Cons:
• The output transforms the hidden state → nonlinear effects, information is distorted
• The output dimension may be too small → information in the hidden states is truncated

Page 25: Recurrent neural networks

J.L. Elman, 1990

Often referenced as the basic RNN structure and called the "Vanilla" RNN.

Hidden-to-hidden connections make the system Turing-complete.

• Must see the complete sequence to be trained
• Cannot be parallelized across timestamps
• Has some important training difficulties...

[Figure: input layer → hidden layer → output layer, with a context layer holding the hidden state through a 1-step delay.]

Elman recurrent network

Page 26: Recurrent neural networks

$W_{ih}$ : weight matrix from input to hidden
$W_o$ : weight matrix from hidden to output
$x_t$ : input (feature) vector at time t
$y_t$ : network output vector at time t
$h_t$ : network internal (hidden) state vector at time t
$U$ : weight matrix from hidden to hidden
$b$ : bias parameter vector

$h_t = \sigma(W_{ih} \cdot x_t + U \cdot h_{t-1} + b)$

$y_t = \sigma(W_o \cdot h_t)$

Vanilla RNN
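A minimal NumPy sketch of these two update equations; the choice of tanh for the hidden nonlinearity, a logistic sigmoid for the output, and the parameter packing are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev, W_ih, U, W_o, b):
    """One step of the vanilla (Elman) RNN:
       h_t = sigma(W_ih @ x_t + U @ h_prev + b),  y_t = sigma(W_o @ h_t)."""
    h_t = np.tanh(W_ih @ x_t + U @ h_prev + b)   # sigma = tanh (assumption)
    y_t = sigmoid(W_o @ h_t)                     # output nonlinearity (assumption)
    return h_t, y_t

def rnn_forward(xs, h0, params):
    """Run the recurrence over a whole sequence, collecting the outputs."""
    h, ys = h0, []
    for x_t in xs:
        h, y_t = rnn_step(x_t, h, *params)       # params = (W_ih, U, W_o, b)
        ys.append(y_t)
    return np.array(ys), h
```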

Page 27: Recurrent neural networks

Unfolding the network in time

Vanilla RNN

$h_t = \sigma(W_{ih} \cdot x_t + U \cdot h_{t-1} + b)$

$y_t = \sigma(W_o \cdot h_t)$

Page 28: Recurrent neural networks

RNN TRAINING

Page 29: Recurrent neural networks

Target: obtain the network parameters that optimize the cost function.

Cost functions: log loss, root mean squared error, etc.

Tasks:
• For each timestamp of the input sequence x, predict the output y (synchronously)
• For the input sequence x, predict a scalar value y (e.g., at the end of the sequence)
• For the input sequence x of length Lx, generate an output sequence y of a different length Ly

Methods:
• Backpropagation: reliable and controlled convergence; supported by most ML frameworks
• Research methods: evolutionary methods, expectation maximization, non-parametric methods, particle swarm optimization

RNN training

Page 30: Recurrent neural networks

1. Unfold the network.
2. Repeat over the training data:
   1. Take an input sequence x.
   2. For t in 0, ..., N-1:
      1. Initialize the hidden state to its past value $h_{t-1}$.
      2. Forward-propagate.
      3. Compute the next hidden state value $h_t$.
   3. Obtain the output sequence $\hat{y}$.
   4. Calculate the error $E(y, \hat{y})$.
   5. Back-propagate the error across the unfolded network.
   6. Average the weights.

$h_t = \sigma(W_{ih} \cdot x_t + U \cdot h_{t-1} + b)$

$y_t = \sigma(W_o \cdot h_t)$

E.g., cross-entropy loss: $E(y, \hat{y}) = -\sum_t y_t \log \hat{y}_t$

Back-propagation through time
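A minimal NumPy sketch of this procedure for the vanilla RNN defined above, assuming a softmax output with the cross-entropy loss from the slide; it processes a single sequence and returns the gradients, leaving the weight update/averaging step out.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def bptt(xs, ys, h0, W_ih, U, W_o, b):
    """Forward pass, cross-entropy loss E = -sum_t y_t . log(y_hat_t),
    then back-propagation through the unfolded network (one sequence)."""
    T = len(xs)
    hs, yhats = {-1: h0}, {}
    for t in range(T):                                    # forward, unfolded in time
        hs[t] = np.tanh(W_ih @ xs[t] + U @ hs[t - 1] + b)
        yhats[t] = softmax(W_o @ hs[t])
    loss = -sum(ys[t] @ np.log(yhats[t]) for t in range(T))

    dW_ih, dU, dW_o, db = (np.zeros_like(m) for m in (W_ih, U, W_o, b))
    dh_next = np.zeros_like(h0)
    for t in reversed(range(T)):                          # backward through time
        do = yhats[t] - ys[t]                             # dE/d(W_o h_t) for softmax + CE
        dW_o += np.outer(do, hs[t])
        dh = W_o.T @ do + dh_next                         # from the output and from step t+1
        da = dh * (1.0 - hs[t] ** 2)                      # through the tanh nonlinearity
        dW_ih += np.outer(da, xs[t])
        dU += np.outer(da, hs[t - 1])
        db += da
        dh_next = U.T @ da                                # pass the gradient to step t-1
    return loss, (dW_ih, dU, dW_o, db)
```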

Page 31: Recurrent neural networks

Apply chain rule:

Back-propagation through time

$\theta$ : network parameters. For time 2:

$\frac{\partial E_2}{\partial \theta} = \sum_{k=0}^{2} \frac{\partial E_2}{\partial \hat{y}_2} \cdot \frac{\partial \hat{y}_2}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_k} \cdot \frac{\partial h_k}{\partial \theta}$

$\frac{\partial h_2}{\partial h_0} = \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial h_0}$

Page 38: Recurrent neural networks

Saturated neurons have gradients close to 0, which drives the gradients of previous layers toward 0 (especially for distant timestamps).

• Smaller weight parameters lead to faster vanishing of the gradients.
• Very large initial parameters make gradient descent diverge quickly (explode).

This is a known problem for deep feed-forward networks. For recurrent networks (even shallow ones) it makes learning long-term dependencies impossible!

$\frac{\partial h_t}{\partial h_0} = \frac{\partial h_t}{\partial h_{t-1}} \cdot \cdots \cdot \frac{\partial h_3}{\partial h_2} \cdot \frac{\partial h_2}{\partial h_1} \cdot \frac{\partial h_1}{\partial h_0}$

• This product decays exponentially.
• The network stops learning and cannot update.
• It becomes impossible to learn correlations between temporally distant events.

Problem: vanishing gradients
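The decay can be seen numerically by multiplying the one-step Jacobians ∂h_t/∂h_{t-1} = diag(1 - h_t²)·U of a tanh RNN along a sequence; the dimensions, random inputs, and weight scale below are arbitrary assumptions chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
U = rng.normal(scale=0.5 / np.sqrt(n), size=(n, n))    # small recurrent weights
W_ih = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
h = np.zeros(n)
J = np.eye(n)                                           # accumulates dh_t / dh_0

for t in range(1, 31):
    h = np.tanh(W_ih @ rng.normal(size=n) + U @ h)
    # Jacobian of one step: dh_t/dh_{t-1} = diag(1 - h_t^2) @ U
    J = np.diag(1.0 - h**2) @ U @ J
    if t % 10 == 0:
        print(f"t = {t:2d}   ||dh_t/dh_0|| = {np.linalg.norm(J):.2e}")
```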

Page 39: Recurrent neural networks

The network cannot converge and the weight parameters do not stabilize.

Diagnostics: NaNs; large fluctuations of the cost function; a large increase in the norm of the gradient during training.

Pascanu R. et al., On the difficulty of training recurrent neural networks. arXiv (2012)

Problem: exploding gradients

Solutions:
• Use gradient clipping (see the sketch below)
• Try reducing the learning rate
• Change the loss function by setting constraints on the weights (L1/L2 norms)
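A sketch of norm-based gradient clipping in the spirit of Pascanu et al.; the threshold value and the parameter/gradient names reuse the hypothetical BPTT sketch above.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale the gradients if their global L2 norm exceeds max_norm
    (the threshold value is an arbitrary assumption)."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-12)
        grads = [g * scale for g in grads]
    return grads

# Usage with the BPTT sketch above:
# loss, grads = bptt(xs, ys, h0, W_ih, U, W_o, b)
# grads = clip_gradients(grads, max_norm=5.0)
# for param, grad in zip((W_ih, U, W_o, b), grads):
#     param -= learning_rate * grad
```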

Page 40: Recurrent neural networks

Training difficulties of deep networks:

• Vanishing gradient

• Exploding gradient

Possible solutions:

• One of the previously proposed solutions

or

• Use unsupervised pre-training → difficult to implement; sometimes the unsupervised solution differs a lot from the supervised one

or

• Improve network architecture!

Fundamental deep learning problem

Page 41: Recurrent neural networks

NEW RNN ARCHITECTURES

Page 42: Recurrent neural networks

Echo State Network (Herbert Jaeger, 2001)

Only the readout neurons are trained!

In practice:
• Easy to over-fit (the model learns by heart): gives good results on the training data only
• Optimization of the reservoir hyper-parameters is not evident

Reservoir computing

Page 43: Recurrent neural networks

Liquid state machine

Similar to the ESN, but uses more biologically plausible neuron models → spiking (dynamic) neurons.

In practice:
• Still more of a research area
• Requires special hardware to be computationally efficient

[Images: Daniel Brunner; Tal Dahan and Astar Sade]

Reservoir computing

Page 44: Recurrent neural networks

Long short-term memory (LSTM)
S. Hochreiter & J. Schmidhuber, 1997

Due to its gating (routing) mechanism, it can be trained efficiently to learn LONG-TERM dependencies.

Variants:
• No Input Gate
• No Forget Gate
• No Output Gate
• No Input Activation Function
• No Output Activation Function
• No Peepholes
• Coupled Input and Forget Gate
• Full Gate Recurrence
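For reference, a minimal sketch of one step of the standard (non-ablated, no-peephole) LSTM cell; the parameter dictionary layout is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a standard LSTM cell: the input, forget and output gates
    route information into and out of the cell state c."""
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])   # input gate
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])   # forget gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])   # output gate
    g = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])   # input activation
    c_t = f * c_prev + i * g          # cell state: additive path through time
    h_t = o * np.tanh(c_t)            # hidden state exposed to the next layer
    return h_t, c_t
```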

Page 45: Recurrent neural networks

Has context in both directions, at any timestamp

Bidirectional RNN
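A sketch of the bidirectional wrapper around any recurrent cell (for example, the rnn_step above reduced to return only the hidden state); the cell step functions are assumed arguments.

```python
import numpy as np

def birnn_forward(xs, h0_fwd, h0_bwd, step_fwd, step_bwd):
    """Run one RNN left-to-right and another right-to-left, then concatenate
    the two hidden states per timestamp, so every output has context from
    both directions."""
    hs_fwd, h = [], h0_fwd
    for x_t in xs:                       # forward pass over the sequence
        h = step_fwd(x_t, h)
        hs_fwd.append(h)
    hs_bwd, h = [], h0_bwd
    for x_t in reversed(xs):             # backward pass over the sequence
        h = step_bwd(x_t, h)
        hs_bwd.append(h)
    hs_bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```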

Page 46: Recurrent neural networks

Testing the capacity to maintain long-term dependencies: the last-1 output must equal the first+1 symbol, e.g. BPXXXXXPE, BTXXXXXXXXTE.

Correct cases

BT ….. TE

BP ….. PE

Incorrect cases

BT ….. PE

BP ….. TE

The system must be able to learn to compare the first+1 symbol with the last-1 symbol.

Embedded Reber Grammar

Page 47: Recurrent neural networks

PRACTICAL CONSIDERATIONS

Page 48: Recurrent neural networks

Masking the input (output): when the inputs (outputs) in a data batch have variable lengths, the shorter sequences are padded and a mask marks the valid timestamps.
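A minimal padding-and-masking sketch for such variable-length batches; the padding value and array layout are assumptions.

```python
import numpy as np

def pad_and_mask(sequences, pad_value=0.0):
    """Pad variable-length sequences (each of shape (length, n_features)) to a
    common length and build a mask that is 1 for real timestamps and 0 for
    padding, so that padded steps can be ignored in the loss."""
    n_feat = sequences[0].shape[1]
    max_len = max(len(s) for s in sequences)
    batch = np.full((len(sequences), max_len, n_feat), pad_value)
    mask = np.zeros((len(sequences), max_len))
    for i, seq in enumerate(sequences):
        batch[i, : len(seq)] = seq
        mask[i, : len(seq)] = 1.0
    return batch, mask

# Example: masked mean of a per-timestamp loss.
# loss = (per_step_loss * mask).sum() / mask.sum()
```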

Page 49: Recurrent neural networks

Length of input ≠ length of output. Two common options:
• CTC loss function
• Encoder-decoder architecture

CTC transforms the network outputs into a conditional probability distribution over label sequences, using a BLANK symbol "-", e.g. the labelling "- C - A - T -".

Result decoding:
Raw output: -----CCCC---AA-TTTT---
1) Remove repeating symbols: -C-A-T-
2) Remove blanks: CAT
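The two decoding steps above map directly onto a few lines of greedy (best-path) decoding; full beam-search decoding over the CTC distribution is not shown.

```python
from itertools import groupby

def ctc_greedy_decode(frame_symbols, blank="-"):
    """Collapse a raw per-frame labelling into the final label sequence:
    1) remove repeating symbols, 2) remove blanks."""
    collapsed = [symbol for symbol, _ in groupby(frame_symbols)]   # step 1
    return "".join(s for s in collapsed if s != blank)             # step 2

print(ctc_greedy_decode("-----CCCC---AA-TTTT---"))   # -> "CAT"
```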

Page 50: Recurrent neural networks

NEURAL MODELS FOR LOCOMOTION

Page 51: Recurrent neural networks

Locomotion principles in nature

[S.Roland et al., 2004]

Locomotion: movement, or the ability to move from one place to another.

Manipulation ≠ Locomotion

[Figure: aperiodic series of motions (stable) vs. periodic motion gaits (quasi-stable).]

[A. Ijspeert et al., 2007]

Page 52: Recurrent neural networks

[Figure: wheeled locomotion on soft ground.] [S. Roland et al., 2004]

Locomotion efficiency

Page 53: Recurrent neural networks

Nature: no “pure” wheeled locomotion

Reason: variety of surfaces, rough terrain, adaptation is necessary

Biological locomotion exploits patterns

The number of legs influences:
• Mechanical complexity
• Control complexity
• The number of possible generated patterns (for k = 6 legs, N = (2k-1)! = 11! = 39 916 800)

[S.Roland 2004]

Locomotion efficiency

Page 54: Recurrent neural networks

• Gait control is on “automatic pilot”

• Automatic gait is energy efficient

• Perturbation introduces modification

This is not fully the way nature does it (weak adaptation, no decisions).

How does nature deal with locomotion?

- Initiate motion by injecting energy

- Passive stage

- Generate

- Control for stability

- Repeat

- Brain?

- Nervous system?

- Spinal cord?

Inconceivable automation

Page 55: Recurrent neural networks

Complexity of the phenomena involved in motor control

Central Nervous System

Motor Nervous System

Neuromuscular junction

Models of musculoskeletal system …

Models of Motor Nervous System

Excerpts: Univ. du Québec - ÉTS Montréal (course); Collège de France (L. Damn); Univ. Paris 8 - Licence course L.612

Spinal cord

[P. Hénaff 2013]

Biological motor control

Page 56: Recurrent neural networks

Motor unit: an MU aggregates the muscular fibers innervated by a common motor neuron; contraction of these fibers is thus simultaneous.

[Figure: spinal reflex pathways (sensory nerve, motor nerve, dorsal root, posterior horn, anterior horn, ventral root, neuromuscular fiber).]

Reflex pathways: muscle contraction as a response to its own elongation, and muscle contraction as a response to external stimuli.

[P. Hénaff 2013]

Page 57: Recurrent neural networks

Central Pattern Generator
• Automatic activity is controlled by spinal centers
• A CPG (Central Pattern Generator) is a group of synaptic connections that generates rhythmic motions
• The spinal pattern-generating networks do not require sensory input, but are nevertheless strongly regulated by input from limb proprioceptors

Page 58: Recurrent neural networks

Sensory-motor architecture for locomotion

[McCrea 2006]

Biological sensory-motor architecture

models

Page 59: Recurrent neural networks

Muscular contraction is put in place during embryonic life or after birth:
• Insects can walk immediately upon birth
• Most mammals require several minutes to stand
• Humans require more than a year to walk on two legs

How learning occurs

[ejjack2]

Page 60: Recurrent neural networks

Mathematical modeling of CPG

[J. Nassour et al., 2010]  [P.F. Rowat, A.I. Selverston, 1997]

Page 61: Recurrent neural networks

Mathematical modeling of CPG:
• CPG approximation: limit cycle behavior, e.g. the Hopf oscillator (see the sketch below)
• Gait matrix
• Coupling of different CPGs
• Sensory feedback
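As an illustration of the limit-cycle idea, a Hopf oscillator can serve as a single CPG unit; the parameter values and the simple Euler integration below are illustrative assumptions.

```python
import numpy as np

def hopf_oscillator(mu=1.0, omega=2.0 * np.pi, dt=1e-3, steps=5000, x=0.1, y=0.0):
    """Integrate a Hopf oscillator; any initial condition converges to a
    limit cycle of radius sqrt(mu), which makes it a convenient CPG unit."""
    xs, ys = [], []
    for _ in range(steps):
        r2 = x * x + y * y
        dx = (mu - r2) * x - omega * y       # radial + rotational dynamics
        dy = (mu - r2) * y + omega * x
        x, y = x + dt * dx, y + dt * dy      # explicit Euler step
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)

# Coupling several oscillators (e.g., one per joint) with phase offsets given
# by a gait matrix, and adding sensory feedback terms, follows the same pattern.
```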

Page 62: Recurrent neural networks

Neural controllers

Neural-based CPG controller for biped locomotion [Taga 1995]:
• 1 CPG per joint, plus a CPG of the trunk
• 2 coupled neurons per CPG (model of neuron i: Matsuoka 1985)
• Inhibitions: contralateral and ipsilateral connections
• Sensorimotor integration: internal coupling of the network; articular sensory inputs (speeds, forces, ground contact)

[Figure: excerpt from Taga 1995 (Biol. Cyb.).] [P. Hénaff 2013]

Page 63: Recurrent neural networks

Compensation of articulation defects (ROBIAN biped, LISV, UVSQ)

Temporal evolution of the frequency components of the sagittal acceleration of the robot's pelvis:
• Automatically determines the robot's natural frequencies
• Continuously adapts to the evolution of defects

[Figure: phase portraits of the oscillator without coupling, with coupling, during learning, and when synchronous.]

[V. Khomenko, 2013, LISV, UVSQ, France]

Page 64: Recurrent neural networks

APPLICATION OF RECURRENT NEURAL NETWORKS

Page 65: Recurrent neural networks

• Human-computer interaction

– Speech and handwriting recognition

– Music composition

– Activity recognition

• Identification and control

– Identification and control of dynamic systems by learning

– Biologically inspired robotics for adaptive locomotion

– Study of the formation and evaluation of biological pattern structures

Application of RNNs