EE459 Neural Networks: Backpropagation
Kasin Prakobwaitayakit
Department of Electrical Engineering
Chiangmai University


Background

Artificial neural networks (ANNs) provide a general, practical method for learning real-valued, discrete-valued, and vector-valued functions from examples. Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters to best fit a training set of input-output pairs. ANN learning is robust to errors in the training data and has been successfully applied to problems such as face recognition/detection, speech recognition, and learning robot control strategies.


Autonomous Vehicle Steering


Characteristics of ANNs

• Instances are represented by many attribute-value pairs.
• The target function output may be discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes.
• The training examples may contain errors.
• Long training times are acceptable.
• Fast evaluation of the learned target function may be required.
• The ability of humans to understand the learned target function is not important.


Very simple example

A single unit with two inputs, 0 and 1, connected by weights 0.4 and -0.1:

net input = 0.4 * 0 + (-0.1) * 1 = -0.1


Learning problem to be solved

• Suppose we have an input pattern (0 1)
• We have a single output pattern (1)
• We have a net input of -0.1, which gives an output pattern of (0)
• How could we adjust the weights, so that this situation is remedied and the spontaneous output matches our target output pattern of (1)?


Answer

• Increase the weights, so that the net input exceeds 0.0
• E.g., add 0.2 to all weights
• Observation: the weight from the input node with activation 0 does not have any effect on the net input
• So we will leave it alone
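As a quick illustration, here is this worked example in Python (a sketch; the helper name net_input is my own):

```python
# Two-input unit from the example: weights 0.4 and -0.1, inputs (0, 1).
weights = [0.4, -0.1]
inputs = [0, 1]

def net_input(weights, inputs):
    """Weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

print(net_input(weights, inputs))  # -0.1 -> output 0, but the target is 1

# Remedy: raise only the weight on the active input (the weight on the
# input with activation 0 has no effect on the net input).
weights[1] += 0.2
print(net_input(weights, inputs))  # 0.1 > 0 -> output 1, matching the target
```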


Perceptrons

One type of ANN system is based on a unit called a perceptron. A perceptron takes a vector of real-valued inputs, calculates a linear combination of them, and outputs 1 if the result is positive and -1 otherwise:

o(x_1, ..., x_n) = 1 if w_0 + w_1*x_1 + ... + w_n*x_n > 0, and -1 otherwise

The perceptron function can sometimes be written as o(x) = sgn(w · x), where sgn(y) is 1 for y > 0 and -1 otherwise. The space H of candidate hypotheses considered in perceptron learning is the set of all possible real-valued weight vectors.
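A minimal sketch of this perceptron function in Python (names are illustrative):

```python
def perceptron(x, w):
    """Perceptron output: sgn of the weighted sum.

    w[0] is the bias weight w_0; x holds the inputs x_1..x_n.
    Returns 1 if w_0 + w_1*x_1 + ... + w_n*x_n > 0, else -1.
    """
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1
```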


Representational Power of Perceptrons


Decision surface

(Figures: a linear decision surface and a nonlinear decision surface.)

Programming Example of Decision Surface
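Along the lines of such a programming example, the sketch below (with hand-picked weights, my own choices) shows a single perceptron realizing a linear decision surface for AND, and notes why no such surface exists for XOR:

```python
def perceptron(x, w):
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if s > 0 else -1

# A linear decision surface: w0 + w1*x1 + w2*x2 = 0 is a line in the plane.
w_and = [-1.5, 1.0, 1.0]  # hand-picked weights that realize AND
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, w_and))  # only (1, 1) falls on the positive side

# XOR is not linearly separable: no single line puts (0,1) and (1,0) on one
# side and (0,0) and (1,1) on the other, so no weight vector can realize it.
```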


The Perceptron Training Rule

One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed until the perceptron classifies all training examples correctly. Weights are modified at each step according to the perceptron training rule, which revises the weight w_i associated with input x_i according to the rule

w_i ← w_i + Δw_i, where Δw_i = η(t - o)x_i

Here t is the target output for the current training example, o is the output generated by the perceptron, and η is the learning rate.
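A sketch of this training loop in Python (the learning rate, initialization range, and the AND data set are illustrative choices):

```python
import random

def train_perceptron(examples, n_inputs, eta=0.1, max_epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i.

    examples: list of (inputs, target) pairs with targets in {-1, +1}.
    w[0] is the bias weight, paired with a constant input x_0 = 1.
    """
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        mistakes = 0
        for x, t in examples:
            x = (1,) + tuple(x)  # prepend the constant input x_0 = 1
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if o != t:  # modify weights only on a misclassification
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # all training examples classified correctly
            break
    return w

# Example: learn AND with targets in {-1, +1}.
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
print(train_perceptron(data, n_inputs=2))
```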


Gradient Descent and Delta Rule

The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by

o = w_0 + w_1*x_1 + ... + w_n*x_n

In order to derive a weight learning rule for linear units, let us begin by specifying a measure for the training error of a hypothesis (weight vector), relative to the training examples:

E(w) = (1/2) Σ_d in D (t_d - o_d)^2

where D is the set of training examples, t_d is the target output for example d, and o_d is the output of the linear unit for example d.
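In code, the linear unit and this error measure might look like the following sketch (D is assumed to be a list of (inputs, target) pairs):

```python
def linear_output(w, x):
    """Unthresholded linear unit: o = w_0 + w_1*x_1 + ... + w_n*x_n."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def training_error(w, D):
    """E(w) = (1/2) * sum over examples d of (t_d - o_d)^2."""
    return 0.5 * sum((t - linear_output(w, x)) ** 2 for x, t in D)
```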


Visualizing the Hypothesis Space

(Figure: the error surface over weight space; starting from a randomly chosen initial weight vector, gradient descent moves down to the weight vector with minimum error.)


Derivation of the Gradient Descent Rule

The vector derivative [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n] is called the gradient of E with respect to w, written ∇E(w). The gradient specifies the direction that produces the steepest increase in E. The negative of this vector therefore gives the direction of steepest decrease. The training rule for gradient descent is

w ← w + Δw, where Δw = -η∇E(w)


Derivation of the Gradient Descent Rule (cont.)

The negative sign is present because we want to move the weight vector in the direction that decreases E. This training rule can also be written in its component form

w_i ← w_i + Δw_i, where Δw_i = -η ∂E/∂w_i

which makes it clear that steepest descent is achieved by altering each component w_i of w in proportion to ∂E/∂w_i.


Derivation of the Gradient Descent Rule (cont.)

The vector of derivatives that form the gradient can be obtained by differentiating E:

∂E/∂w_i = Σ_d in D (t_d - o_d)(-x_id)

where x_id denotes the input component x_i for training example d. The weight update rule for standard gradient descent can be summarized as

Δw_i = η Σ_d in D (t_d - o_d) x_id
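A sketch of one standard (batch) gradient descent update implementing this rule (the bias handling via a constant input x_0 = 1 is the usual convention; names are my own):

```python
def gradient_descent_step(w, D, eta):
    """One batch update: delta_w_i = eta * sum over d of (t_d - o_d) * x_id."""
    grad = [0.0] * len(w)
    for x, t in D:
        xd = (1,) + tuple(x)  # constant input x_0 = 1 pairs with bias w_0
        o = sum(wi * xi for wi, xi in zip(w, xd))
        for i, xi in enumerate(xd):
            grad[i] += (t - o) * xi  # accumulate over all examples first
    return [wi + eta * gi for wi, gi in zip(w, grad)]
```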


Stochastic Approximation to Gradient Descent
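Under this heading, the standard stochastic approximation updates the weights incrementally after each individual training example, rather than summing over all of D first. A minimal sketch, assuming that incremental delta rule:

```python
def stochastic_step(w, x, t, eta):
    """Incremental (stochastic) delta rule: update after one example.

    delta_w_i = eta * (t - o) * x_i, applied immediately.
    """
    xd = (1,) + tuple(x)  # constant input x_0 = 1 pairs with bias w_0
    o = sum(wi * xi for wi, xi in zip(w, xd))
    return [wi + eta * (t - o) * xi for wi, xi in zip(w, xd)]
```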


Summary of Perceptron

Perceptron training rule guaranteed to succeed if
• training examples are linearly separable
• sufficiently small learning rate

Linear unit training rule uses gradient descent
• guaranteed to converge to hypothesis with minimum squared error
• given sufficiently small learning rate
• even when training data contains noise


BACKPROPAGATION Algorithm



Error Function

The Backpropagation algorithm learns the weights for a multilayer network, given a network with a fixed set of units and interconnections. It employs gradient descent to attempt to minimize the squared error between the network output values and the target values for those outputs. We begin by redefining E to sum the errors over all of the network output units:

E(w) = (1/2) Σ_d in D Σ_k in outputs (t_kd - o_kd)^2

where outputs is the set of output units in the network, and t_kd and o_kd are the target and output values associated with the kth output unit and training example d.


Architecture of Backpropagation

Backpropagation Learning Algorithm


Inputs To Neurons

• Arise from other neurons or from outside the network
• Nodes whose inputs arise outside the network are called input nodes and simply copy values
• An input may excite or inhibit the response of the neuron to which it is applied, depending upon the weight of the connection


Weights

• Represent synaptic efficacy and may be excitatory or inhibitory
• Normally, positive weights are considered excitatory while negative weights are thought of as inhibitory
• Learning is the process of modifying the weights in order to produce a network that performs some function


Output

• The response function is normally nonlinear
• Samples include:
  – Sigmoid: f(x) = 1 / (1 + e^(-x))
  – Piecewise linear: f(x) = x if x ≥ θ, and f(x) = 0 if x < θ
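Both response functions, as reconstructed above, in code (the threshold theta in the piecewise-linear form is an assumption):

```python
import math

def sigmoid(x):
    """Sigmoid response: f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def piecewise_linear(x, theta=0.0):
    """Piecewise linear response: f(x) = x if x >= theta, else 0."""
    return x if x >= theta else 0.0
```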


Backpropagation Preparation

• Training Set: a collection of input-output patterns that are used to train the network
• Testing Set: a collection of input-output patterns that are used to assess network performance
• Learning Rate η: a scalar parameter, analogous to step size in numerical integration, used to set the rate of adjustments


Network Error

• Total-Sum-Squared-Error (TSSE):

TSSE = (1/2) Σ_patterns Σ_outputs (desired - actual)^2

• Root-Mean-Squared-Error (RMSE):

RMSE = sqrt( 2 * TSSE / (#patterns * #outputs) )
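Both measures in code (desired and actual are assumed to be lists of per-pattern output lists):

```python
import math

def tsse(desired, actual):
    """Total-Sum-Squared-Error over all patterns and output units."""
    return 0.5 * sum((d - a) ** 2
                     for dp, ap in zip(desired, actual)
                     for d, a in zip(dp, ap))

def rmse(desired, actual):
    """Root-Mean-Squared-Error derived from TSSE."""
    n_patterns, n_outputs = len(desired), len(desired[0])
    return math.sqrt(2.0 * tsse(desired, actual) / (n_patterns * n_outputs))
```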


A Pseudo-Code Algorithm

• Randomly choose the initial weights
• While error is too large
  – For each training pattern
    • Apply the inputs to the network
    • Calculate the output for every neuron from the input layer, through the hidden layer(s), to the output layer
    • Calculate the error at the outputs
    • Use the output error to compute error signals for pre-output layers
    • Use the error signals to compute weight adjustments
    • Apply the weight adjustments
  – Periodically evaluate the network performance
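A runnable sketch of this pseudo-code for a single-hidden-layer network of sigmoid units (the error-signal formulas follow the standard backpropagation derivation; the network shape, learning rate, and stopping threshold are illustrative):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_out):
    """Output of every neuron, input layer through output layer.
    Each weight row is [bias, w_1, ..., w_n]."""
    h = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
         for w in w_hidden]
    o = [sigmoid(w[0] + sum(wi * hi for wi, hi in zip(w[1:], h)))
         for w in w_out]
    return h, o

def train(patterns, n_in, n_hidden, n_out, eta=0.5, max_epochs=10000):
    rnd = lambda: random.uniform(-0.5, 0.5)   # randomly chosen initial weights
    w_hidden = [[rnd() for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [[rnd() for _ in range(n_hidden + 1)] for _ in range(n_out)]
    for _ in range(max_epochs):
        tsse = 0.0
        for x, target in patterns:                 # for each training pattern
            h, o = forward(x, w_hidden, w_out)     # apply inputs, get outputs
            tsse += 0.5 * sum((t - ok) ** 2 for t, ok in zip(target, o))
            # Error signals at the outputs: delta_k = o_k(1 - o_k)(t_k - o_k).
            d_out = [ok * (1 - ok) * (t - ok) for ok, t in zip(o, target)]
            # Error signals for the pre-output (hidden) layer.
            d_hid = [hj * (1 - hj) * sum(dk * w_out[k][j + 1]
                                         for k, dk in enumerate(d_out))
                     for j, hj in enumerate(h)]
            # Weight adjustments: delta_w = eta * delta * input.
            for k, dk in enumerate(d_out):
                w_out[k] = [w + eta * dk * inp
                            for w, inp in zip(w_out[k], [1.0] + h)]
            for j, dj in enumerate(d_hid):
                w_hidden[j] = [w + eta * dj * inp
                               for w, inp in zip(w_hidden[j], [1.0] + list(x))]
        if tsse < 0.01:                            # error small enough: stop
            break
    return w_hidden, w_out

# Example: learn XOR, which requires the hidden layer.
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
wh, wo = train(xor, n_in=2, n_hidden=2, n_out=1)
for x, t in xor:
    print(x, round(forward(x, wh, wo)[1][0], 2))
```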


Face Detection using Neural Networks

(Figure: during the training process, a face database is presented with target output = 1 and a non-face database with target output = 0; during the testing process, the trained network classifies a new image as face or non-face.)


Backpropagation Using Gradient Descent

• Advantages
  – Relatively simple implementation
  – Standard method and generally works well
• Disadvantages
  – Slow and inefficient
  – Can get stuck in local minima resulting in sub-optimal solutions


Local Minima

(Figure: an error surface with a local minimum alongside the global minimum; gradient descent can settle in the local minimum.)


Alternatives To Gradient Descent

• Simulated Annealing
  – Advantages
    • Can guarantee optimal solution (global minimum)
  – Disadvantages
    • May be slower than gradient descent
    • Much more complicated implementation


Alternatives To Gradient Descent

• Genetic Algorithms/Evolutionary Strategies
  – Advantages
    • Faster than simulated annealing
    • Less likely to get stuck in local minima
  – Disadvantages
    • Slower than gradient descent
    • Memory intensive for large nets


Enhancements To Gradient Descent

• Momentum
  – Adds a percentage of the last movement to the current movement


Enhancements To Gradient Descent

• Momentum
  – Useful to get over small bumps in the error function
  – Often finds a minimum in fewer steps
  – Δw(t) = -n*d*y + a*Δw(t-1)
    • Δw is the change in weight
    • n is the learning rate
    • d is the error
    • y is different depending on which layer we are calculating
    • a is the momentum parameter
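In code, keeping the slide's symbols as variable names (a sketch; the gradient factor d*y is assumed to be passed in precomputed):

```python
def momentum_step(w, d_times_y, prev_delta, n=0.1, a=0.9):
    """Momentum update: delta_w(t) = -n*d*y + a*delta_w(t-1).

    n is the learning rate, a the momentum parameter; d_times_y is the
    error term times the layer-dependent factor from the slide.
    """
    delta = -n * d_times_y + a * prev_delta
    return w + delta, delta  # return delta so the next step can reuse it
```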


Enhancements To Gradient Descent

• Adaptive Backpropagation Algorithm
  – It assigns each weight a learning rate
  – That learning rate is determined by the sign of the gradient of the error function from the last iteration
    • If the signs are equal, it is more likely to be a shallow slope, so the learning rate is increased
    • The signs are more likely to differ on a steep slope, so the learning rate is decreased
  – This will speed up the advancement when on gradual slopes
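A sketch of the sign rule for a single weight's learning rate (the increase and decrease factors are assumptions, not given on the slides):

```python
def adapt_learning_rate(eta, grad, prev_grad, up=1.2, down=0.5):
    """Per-weight learning-rate adaptation by gradient sign.

    Equal signs suggest a shallow slope, so eta is increased;
    differing signs suggest a steep slope (overshoot), so eta is
    decreased. The factors up and down are illustrative choices.
    """
    if grad * prev_grad > 0:    # same sign on consecutive iterations
        return eta * up
    elif grad * prev_grad < 0:  # sign changed
        return eta * down
    return eta                  # one of the gradients was zero
```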


Enhancements To Gradient Descent

• Adaptive Backpropagation
  – Possible problems:
    • Since we minimize the error for each weight separately, the overall error may increase
  – Solution:
    • Calculate the total output error after each adaptation, and if it is greater than the previous error, reject that adaptation and calculate new learning rates


Enhancements To Gradient Descent

• SuperSAB (Super Self-Adapting Backpropagation)
  – Combines the momentum and adaptive methods
  – Uses the adaptive method and momentum so long as the sign of the gradient does not change
    • This is an additive effect of both methods, resulting in a faster traversal of gradual slopes
  – When the sign of the gradient does change, the momentum will cancel the drastic drop in learning rate
    • This allows the function to roll up the other side of the minimum, possibly escaping local minima
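A simplified per-weight sketch of how the two mechanisms might combine (SuperSAB formulations vary; the factors here are illustrative, not from the slides):

```python
def supersab_step(w, delta_prev, eta, grad, grad_prev,
                  up=1.05, down=0.5, alpha=0.9):
    """One SuperSAB-style update for a single weight (a simplification).

    While the gradient sign is unchanged, the per-weight rate eta grows
    and momentum adds to the step; on a sign change eta is cut, while
    the momentum term carries the weight onward, softening the drop.
    """
    if grad * grad_prev >= 0:
        eta *= up      # gradual slope: keep accelerating
    else:
        eta *= down    # sign flip: drastic learning-rate drop
    delta = -eta * grad + alpha * delta_prev  # momentum smooths the flip
    return w + delta, delta, eta
```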


Enhancements To Gradient Descent

• SuperSAB
  – Experiments show that SuperSAB converges faster than gradient descent
  – Overall, this algorithm is less sensitive (and so is less likely to get caught in local minima)


Other Ways To Minimize Error

• Varying training data
  – Cycle through input classes
  – Randomly select from input classes
• Add noise to training data
  – Randomly change value of input node (with low probability)
• Retrain with expected inputs after initial training
  – E.g., speech recognition
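A sketch of the noise-injection idea for one input pattern (the probability and noise scale are illustrative; for binary inputs one might flip values instead):

```python
import random

def add_input_noise(pattern, p=0.05, scale=0.1):
    """Randomly perturb input node values with low probability p."""
    return [x + random.uniform(-scale, scale) if random.random() < p else x
            for x in pattern]
```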


Other Ways To Minimize Error

• Adding and removing neurons from layers
  – Adding neurons speeds up learning but may cause loss in generalization
  – Removing neurons has the opposite effect