Multilayer Perceptrons 1


Overview

Recap of neural network theory
The multi-layered perceptron
Back-propagation
Introduction to training
Uses


Recap


Linear separability

When a neuron learns, it is positioning a line so that all points on or above the line give an output of 1, and all points below the line give an output of 0

When there are more than 2 inputs, the pattern space is multi-dimensional, and is divided by a multi-dimensional surface (or hyperplane) rather than a line
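As an illustration, here is a minimal Python sketch (with hypothetical weights) of a single hard-limiting neuron positioning such a line in a two-input pattern space:

```python
import numpy as np

def neuron_output(x, weights, bias):
    """Hard-limited neuron: output 1 on or above the line, 0 below it."""
    net = np.dot(weights, x) + bias
    return 1 if net >= 0 else 0

weights = np.array([1.0, 1.0])   # hypothetical weights for inputs X1 and X2
bias = -1.5                      # hypothetical bias; the line is X1 + X2 = 1.5

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, neuron_output(np.array(point, dtype=float), weights, bias))
# Only (1, 1) lies on or above the line, so only it gives an output of 1 (an AND gate)
```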


Pattern space - linearly separable

[Figure: a linearly separable pattern space, axes X1 and X2, divided by a single straight line]


Non-linearly separable problems

If a problem is not linearly separable, then it is impossible for a single straight line (or hyperplane) to divide the pattern space into the two required regions

A network of neurons is needed


Pattern space - non-linearly separable

[Figure: a non-linearly separable pattern space, axes X1 and X2, with a curved decision surface]


The multi-layered perceptron (MLP)


[Figure: the multi-layered perceptron (MLP), showing the input layer, hidden layer and output layer]


Complex decision surface

The MLP can approximate any continuous function using one hidden layer with sigmoid activation functions and a linear output layer

A 3-layered network can therefore produce any complex decision surface

However, the number of neurons needed in the hidden layer cannot be calculated in advance; it has to be found by experiment


Network architecture

All neurons in one layer are connected to all neurons in the next layer

The network is a feedforward network, so all data flows from the input to the output

The architecture of the network shown is described as 3:4:2

All neurons in the hidden and output layers have a bias connection
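A minimal Python/NumPy sketch of this 3:4:2 fully connected feedforward arrangement (the random weights and the choice of sigmoid outputs are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# 3:4:2 architecture: 3 inputs, 4 hidden neurons, 2 output neurons.
# Hidden and output neurons each have a bias connection, held here as an
# extra first column of the weight matrices (the bias input is fixed at 1).
W_hidden = rng.normal(size=(4, 1 + 3))   # hypothetical hidden-layer weights
W_output = rng.normal(size=(2, 1 + 4))   # hypothetical output-layer weights

def feedforward(x):
    """All data flows from the input layer, through the hidden layer, to the output layer."""
    hidden = sigmoid(W_hidden @ np.append(1.0, x))       # sigmoid hidden layer
    return sigmoid(W_output @ np.append(1.0, hidden))    # output layer (could also be linear)

print(feedforward(np.array([0.2, 0.5, 0.9])))   # two outputs between 0 and 1
```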


Input layer

Receives all of the inputs
Number of neurons equals the number of inputs
Does no processing
Connects to all the neurons in the hidden layer


Hidden layer

Could be more than one layer, but theory says that only one layer is necessary

The number of neurons is found by experiment

Processes the inputs
Connects to all neurons in the output layer
The output is a sigmoid function


Output layer

Produces the final outputs
Processes the outputs from the hidden layer
The number of neurons equals the number of outputs
The output could be linear or sigmoid


Problems with networks

Originally the neurons had a hard-limiter on the output

Although an error could be found between the desired output and the actual output, which could be used to adjust the weights in the output layer, there was no way of knowing how to adjust the weights in the hidden layer


The invention of back-propagation

By introducing a smoothly changing output function, it was possible to calculate an error that could be used to adjust the weights in the hidden layer(s)


Output function

The sigmoid function

[Figure: plot of the sigmoid function, y against net, for net from -5 to 5; y rises smoothly from 0 to 1]


Sigmoid function

The sigmoid function goes smoothly from 0 to 1 as net increases

The value of y when net = 0 is 0.5
When net is negative, y is between 0 and 0.5
When net is positive, y is between 0.5 and 1.0
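A minimal Python sketch of the sigmoid, assuming the standard form y = 1/(1 + e^(-net)), which is consistent with the numbers in the worked example later:

```python
import numpy as np

def sigmoid(net):
    """The sigmoid output function: y = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

print(sigmoid(0.0))    # 0.5 when net = 0
print(sigmoid(-5.0))   # about 0.0067: between 0 and 0.5 for negative net
print(sigmoid(5.0))    # about 0.9933: between 0.5 and 1.0 for positive net
```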


Back-propagation

The method of training is called the back-propagation of errors

The algorithm is an extension of the delta rule, called the generalised delta rule


Generalised delta rule

The equation for the generalised delta rule is ΔWi = ηXiδ

δ is defined according to which layer is being considered.

For the output layer, δ = y(1-y)(d-y). For the hidden layer, δ is more complex.
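A minimal Python sketch of the rule applied to one output-layer weight (all numbers are hypothetical):

```python
eta = 0.5     # learning rate η (hypothetical value)
x = 0.7       # X, the input feeding this weight (the output of the previous-layer neuron)
y = 0.6       # actual output of the output-layer neuron
d = 1.0       # desired output

delta = y * (1 - y) * (d - y)    # output-layer delta: y(1-y)(d-y) = 0.096
delta_w = eta * x * delta        # generalised delta rule: ΔW = ηXδ = 0.0336
print(delta, delta_w)
```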


Training a network

Example: the problem could not be implemented on a single layer because it is not linearly separable

A 3-layer MLP was tried with 2 neurons in the hidden layer, which trained successfully

With 1 neuron in the hidden layer it failed to train


The hidden neurons

[Chart: the two hidden-neuron decision lines (Series1 and Series2) plotted in the pattern space, both axes from 0 to 6]


The weights

The weights for the 2 neurons in the hidden layer are -9, 3.6 and 0.1 for the first neuron, and 6.1, 2.2 and -7.8 for the second

These weights can be shown in the pattern space as two lines

The lines divide the space into 4 regions


Training and Testing


Starting with a data set, the first step is to divide the data into a training set and a test set

Use the training set to adjust the weights until the error is acceptably low

Test the network using the test set, and see how many it gets right
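A minimal Python sketch of this procedure (the data set, the 80/20 split and the train_network/predict calls are assumptions, shown as placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: 100 patterns, 3 inputs each, with a 0/1 target.
X = rng.random((100, 3))
t = (X.sum(axis=1) > 1.5).astype(float)

# Step 1: divide the data into a training set and a test set.
order = rng.permutation(len(X))
train, test = order[:80], order[80:]

# Step 2: use the training set to adjust the weights until the error is
# acceptably low (train_network is a placeholder for back-propagation training).
# network = train_network(X[train], t[train])

# Step 3: test the network using the test set and count how many it gets right.
# correct = sum(network.predict(x) == target for x, target in zip(X[test], t[test]))
```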


A better approach

Critics of this standard approach have pointed out that training to a low error can sometimes cause “overfitting”, where the network performs well on the training data but poorly on the test data

The alternative is to divide the data into three sets, the extra one being the validation set


Validation set

During training, the training data is used to adjust the weights

At each iteration, the validation data is also passed through the network and its error is recorded, but the weights are not adjusted

The training stops when the error on the validation set starts to increase
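A minimal Python sketch of this stopping rule (the error curves are simulated with an assumed shape so the loop can run; in a real run they would come from passing the training and validation sets through the network at each iteration):

```python
def simulated_errors(epoch):
    """Stand-ins for the training-set and validation-set errors at each iteration."""
    train_err = 1.0 / (epoch + 2)                                   # keeps falling
    val_err = 0.5 - 0.02 * epoch if epoch <= 15 else 0.2 + 0.02 * (epoch - 15)
    return train_err, val_err

best_val = float("inf")
for epoch in range(100):
    train_err, val_err = simulated_errors(epoch)   # weights would only be adjusted on the training data
    if val_err > best_val:                         # validation error has started to increase
        print(f"Stop here: iteration {epoch}, validation error {val_err:.3f}")
        break
    best_val = val_err
```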


Stopping criteria

[Figure: training-set and validation-set error plotted against time; the validation-set error falls then rises, and training stops ("Stop here") where it starts to increase]


The multi-layered perceptron (MLP) and back-propagation


Architecture

[Figure: MLP architecture showing the input layer, hidden layer and output layer]


Back-propagation

The method of training is called the back-propagation of errors

The algorithm is an extension of the delta rule, called the generalised delta rule


Generalised delta rule

The equation for the generalised delta rule is ΔWi = ηXiδ

δ is defined according to which layer is being considered.

For the output layer, δ = y(1-y)(d-y). For the hidden layer, δ is more complex.


Hidden Layer

We have to deal with the error from the output layer being fed back to the hidden layer.

Let's look at an example: the weight w2(1,2), which is the weight connecting neuron 1 in the input layer with neuron 2 in the hidden layer.


Δw2(1,2) = ηX1(1)δ2(2), where

X1(1) is the output of neuron 1 in the input layer

δ2(2) is the error on the output of neuron 2 in the hidden layer

δ2(2) = X2(2)[1-X2(2)] w3(2,1) δ3(1)


δ3(1) = y(1-y)(d-y)=x3(1)[1-x3(1)][d-x3(1)]

So we start with the error at the output and use this result to ripple backwards altering the weights.


Example

Exclusive OR using the network shown earlier: 2:2:1 network

Initial weights:
W2(0,1)=0.862518, W2(1,1)=-0.155797, W2(2,1)=0.282885
W2(0,2)=0.834986, W2(1,2)=-0.505997, W2(2,2)=-0.864449
W3(0,1)=0.036498, W3(1,1)=-0.430437, W3(2,1)=0.48121


Feedforward – hidden layer (neuron 1)

So if X1(0)=1 (the bias), X1(1)=0, X1(2)=0

The weighted sum inside neuron 1 in the hidden layer = 0.862518

Then, using the sigmoid function, X2(1)=0.7031864


Feedforward – hidden layer (neuron 2)

So if X1(0)=1 (the bias), X1(1)=0, X1(2)=0

The weighted sum inside neuron 2 in the hidden layer = 0.834986

Then, using the sigmoid function, X2(2)=0.6974081


Feedforward – output layer

So if X2(0)=1 (the bias), X2(1)=0.7031864, X2(2)=0.6974081

The weighted sum inside neuron 1 in the output layer = 0.0694203

Then, using the sigmoid function, X3(1)=0.5173481

Desired output = 0
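A minimal Python sketch reproducing this feedforward pass from the initial weights listed above (assuming the standard sigmoid y = 1/(1 + e^(-net))):

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

# Initial weights from the slides; column 0 holds the bias weight W(0,·).
W2 = np.array([[0.862518, -0.155797,  0.282885],    # hidden neuron 1
               [0.834986, -0.505997, -0.864449]])   # hidden neuron 2
W3 = np.array([[0.036498, -0.430437,  0.48121]])    # output neuron 1

x1 = np.array([1.0, 0.0, 0.0])           # X1(0)=1 (bias), X1(1)=0, X1(2)=0
x2 = sigmoid(W2 @ x1)                     # hidden outputs: X2(1)≈0.7031864, X2(2)≈0.6974081
x3 = sigmoid(W3 @ np.append(1.0, x2))     # output: X3(1)≈0.5173481 (desired output is 0)
print(x2, x3)
```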


δ3(1) = x3(1)[1-x3(1)][d-x3(1)] = -0.1291812

δ2(1) = X2(1)[1-X2(1)] w3(1,1) δ3(1) = 0.0116054

δ2(2) = X2(2)[1-X2(2)] w3(2,1) δ3(1) = -0.0131183

Now we can use the delta rule to calculate the change in the weights

ΔWi = ηXiδ
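Continuing the feedforward sketch above (reusing x1, x2, x3 and W3 from it), the error terms come out as on the slide:

```python
d = 0.0                                               # desired output for the input (0, 0)
delta3_1 = x3[0] * (1 - x3[0]) * (d - x3[0])          # output-layer delta ≈ -0.1291812
delta2_1 = x2[0] * (1 - x2[0]) * W3[0, 1] * delta3_1  # hidden neuron 1 delta ≈ 0.0116054
delta2_2 = x2[1] * (1 - x2[1]) * W3[0, 2] * delta3_1  # hidden neuron 2 delta ≈ -0.0131183
print(delta3_1, delta2_1, delta2_2)
```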


Examples

If we set η=0.5:

ΔW2(0,1) = ηX1(0)δ2(1) = 0.5 x 1 x 0.0116054 = 0.0058027

ΔW3(1,1) = ηX2(1)δ3(1) = 0.5 x 0.7031864 x -0.1291812 = -0.0454193
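Continuing the same sketch, the weight changes with η = 0.5:

```python
eta = 0.5
dW2_0_1 = eta * x1[0] * delta2_1   # ΔW2(0,1) = 0.5 x 1 x 0.0116054 ≈ 0.0058027
dW3_1_1 = eta * x2[0] * delta3_1   # ΔW3(1,1) = 0.5 x 0.7031864 x -0.1291812 ≈ -0.0454193
print(dW2_0_1, dW3_1_1)
```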


What would be the results of the following?

ΔW2(2,1) = ηX1(2)δ2(1)

ΔW2(2,2) = ηX1(2)δ2(2)


ΔW2(2,1) = ηX1(2)δ2(1) = 0.5 x 0 x 0.0116054 = 0

ΔW2(2,2) = ηX1(2)δ2(2) = 0.5 x 0 x -0.0131183 = 0


New weights:
W2(0,1)=0.868321, W2(1,1)=-0.155797, W2(2,1)=0.282885
W2(0,2)=0.828427, W2(1,2)=-0.505997, W2(2,2)=-0.864449
W3(0,1)=-0.028093, W3(1,1)=-0.475856, W3(2,1)=0.436164