Page 1:

Department of Computer Engineering

University of Kurdistan

Neural Networks (Graduate level)

Single layer and multi layer perceptron

(Supervised learning)

By: Dr. Alireza Abdollahpouri

Page 2:

Classification- Supervised learning

Page 3:

Classification

Basically we want our system to classify a set of patterns as belonging to a given class or not.

Page 4:

Classification

Page 5:

Linear Classifier

Page 6:

Supervised learning

Page 7:

Learning phase

Page 8:

The Perceptron

(Diagram: inputs x1 … x5 with weights w1 … w5 and a bias weight w0, feeding a summation unit and a sign function that produces the output y.)

y = sgn( Σi wi·xi + w0 )

The first model of a biological neuron


Page 9:

Perceptron for Classification

Page 10:

Classification

Simple case (two classes – one output neuron)

General case (multiple classes – Several output neurons)

Page 11:

Complexity of Pattern Recognition (PR) – An Example

Problem: Sorting incoming fish on a conveyor belt.

Assumption: Two kinds of fish:
(1) sea bass
(2) salmon

Page 12:

Pre-processing Step

Examples:
(1) Image enhancement
(2) Separate touching or occluding fish
(3) Find the boundary of each fish

Page 13:

Feature Extraction

Assume a fisherman told us that a sea bass is generally longer than a salmon.

We can use length as a feature and decide between sea bass and salmon according to a threshold on length.

How should we choose the threshold?


Page 14:

“Length” Histograms

• Even though sea bass is longer than salmon on average, there are many examples of fish for which this observation does not hold.

threshold l*

Page 15:

“Average Lightness” Histograms

• Consider a different feature such as “average lightness”

• It seems easier to choose the threshold x*, but we still cannot make a perfect decision.

threshold x*

Page 16:

Multiple Features

To improve recognition accuracy, we might have to use more than one feature at a time.

Single features might not yield the best performance.

Using combinations of features might yield better performance.

How many features should we choose?

Feature vector: x = (x1, x2), where x1 = lightness and x2 = width.

Page 17:

Classification

Partition the feature space into two regions by finding the decision boundary that minimizes the error.

How should we find the optimal decision boundary?


Page 18:

What does a Perceptron do?

For a perceptron with two input variables x1 and x2, the equation WᵀX = 0 determines a line separating positive from negative examples.

y = sgn(w1x1 + w2x2 + w0)

Page 19:

What does a Perceptron do?

For a perceptron with n input variables, it draws a Hyper-plane as the decision boundary over the (n-dimensional) input space. It classifies input patterns into two classes.

The perceptron outputs 1 for instances lying on one side of the hyperplane and outputs –1 for instances on the other side.


Page 20:

What can be represented using Perceptrons?


Representation Theorem: perceptrons can only represent linearly separable functions. Examples: AND, OR, NOT.
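The theorem can be checked by choosing weights by hand. The sketch below (Python, with outputs in {0, 1} rather than {−1, +1}; the particular weight and bias values are one possible choice, not taken from the slides) implements AND and OR as single perceptrons:

```python
import numpy as np

def perceptron(w, b):
    """Build a unit computing step(w . x + b), with output in {0, 1}."""
    return lambda x: int(np.dot(w, x) + b > 0)

# AND fires only when both inputs are 1; OR fires when at least one is.
AND = perceptron(np.array([1.0, 1.0]), -1.5)
OR = perceptron(np.array([1.0, 1.0]), -0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_out = [AND(np.array(x)) for x in inputs]  # [0, 0, 0, 1]
or_out = [OR(np.array(x)) for x in inputs]    # [0, 1, 1, 1]
```

No such weight choice exists for XOR, which is exactly the limitation discussed on the following slides.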

Page 21:

Limits of the Perceptron

A perceptron can learn only examples that are called "linearly separable", i.e., examples that can be perfectly separated by a hyperplane.


Linearly separable Non-linearly separable

Page 22:

Learning Perceptrons

• Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

• In the case of Perceptrons, we use supervised learning.

• Learning a perceptron means finding the right values for W that satisfy the training examples {(inputi, targeti)}.

• The hypothesis space of a perceptron is the space of all weight vectors.

Page 23:

How to find the weights? We want to find a set of weights that enable our perceptron to correctly classify our data.

If we know ∂E/∂wi, then we can search for a minimum of E in weight space.

Page 24:

Learning Perceptrons

Principle of learning using the perceptron rule:

1. A set of training examples is given: {(x, t)}, where x is the input and t the target output [supervised learning]

2. Examples are presented to the network.

3. For each example, the network gives an output o.

4. If there is an error, the hyperplane is moved in order to correct the output error.

5. When all training examples are correctly classified, Stop learning.

Page 25:

More formally, the algorithm for learning Perceptrons is as follows:

1. Assign random values to the weight vector

2. Apply the perceptron rule to every training example.

3. Are all training examples correctly classified? Yes: quit. No: go back to Step 2.

Learning Perceptrons

Page 26:

Perceptron Training Rule

The perceptron training rule: For a new training example [X = (x1, x2, …, xn), t] update each weight according to this rule: wi = wi + Δwi

Where Δwi = η (t-o) xi

t: target output
o: output generated by the perceptron
η: a constant called the learning rate (e.g., 0.1)
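The rule can be turned into a small training loop. The following is an illustrative Python sketch rather than the slides' MATLAB; the OR data set, zero initial weights, and η = 0.1 are assumptions chosen for the demonstration:

```python
import numpy as np

def train_perceptron(X, t, eta=0.1, epochs=100):
    """Perceptron rule: w_i <- w_i + eta * (t - o) * x_i.
    A constant input of 1 is appended so w0 acts as the bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        errors = 0
        for x, target in zip(Xb, t):
            o = 1 if np.dot(w, x) > 0 else -1   # sgn output
            if o != target:
                w += eta * (target - o) * x     # the update rule above
                errors += 1
        if errors == 0:                         # all examples classified: stop
            break
    return w

# Linearly separable example: logical OR with targets in {-1, +1}
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, 1, 1, 1])
w = train_perceptron(X, t)
preds = [1 if np.dot(w, np.append(x, 1.0)) > 0 else -1 for x in X]
```

For linearly separable data such as this, the loop stops after a few epochs with all four examples classified correctly.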

Page 27:

Comments about the perceptron training rule:

• If the example is correctly classified, the term (t-o) equals zero, and no update of the weight is necessary.

• If the perceptron outputs –1 and the real answer is 1, the weight is increased.

• If the perceptron outputs a 1 and the real answer is -1, the weight is decreased.

• Provided the examples are linearly separable and a small value for η is used, the rule is proved to classify all training examples correctly.

Perceptron Training Rule

Page 28:

Consider the following example: (two classes: Red and Green)

Perceptron Training Rule

Page 29:

Random Initialization of perceptron weights …

Perceptron Training Rule

Page 30:

Apply Iteratively Perceptron Training Rule on the different examples:

Perceptron Training Rule

Page 31:

Apply Iteratively Perceptron Training Rule on the different examples:

Perceptron Training Rule

Page 32:

Apply Iteratively Perceptron Training Rule on the different examples:

Perceptron Training Rule

Page 33:

Apply Iteratively Perceptron Training Rule on the different examples:

Perceptron Training Rule

Page 34:

All examples are correctly classified … stop Learning

Perceptron Training Rule

Page 35:

The straight line w1x1 + w2x2 + w0 = 0 separates the two classes.

Perceptron Training Rule

Page 36:

Learning AND/OR operations

P = [ 0 0 1 1; ...   % Input patterns
      0 1 0 1 ];
T = [ 0 1 1 1 ];     % Desired outputs (OR)
net = newp([0 1; 0 1], 1);
net.adaptParam.passes = 35;
net = adapt(net, P, T);
x = [1; 1];
y = sim(net, x);
display(y);


Page 37:

Example

We want to classify the following 21 characters written by 3 fonts into 7 classes.


Page 38:

Example

Single-layer network

Page 39:

Example


Page 40:

Multi-Layer Perceptron (MLP)


Page 41:

MultiLayer Perceptron

In contrast to perceptrons, multilayer networks can learn not only multiple decision boundaries, but the boundaries may also be nonlinear.

Input nodes Internal nodes Output nodes

Page 42:

MultiLayer Perceptron- Decision Boundaries

(Figure: example decision regions separating classes A and B.)

Single-layer: half plane bounded by a hyperplane.
Two-layer: convex open or closed region.
Three-layer: arbitrary (complexity limited by the number of neurons).

Page 43:

Solution for XOR : Add a hidden layer !!

Input nodes Internal nodes Output nodes


Page 44:

Solution for XOR : Add a hidden layer !!

The problem is: how do we learn Multi Layer Perceptrons?

Solution: the Backpropagation Algorithm, invented by Rumelhart and colleagues in 1986.
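One concrete hidden-layer solution builds XOR out of linearly separable pieces: XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)). A Python sketch with hand-picked weights (these particular values are illustrative assumptions; any weights realizing the same three functions would work):

```python
def step(z):
    """Threshold activation with outputs in {0, 1}."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two hidden units (OR and NAND) feeding one output unit (AND)."""
    h1 = step(1.0 * x1 + 1.0 * x2 - 0.5)     # OR
    h2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)    # NAND
    return step(1.0 * h1 + 1.0 * h2 - 1.5)   # AND
```

Each unit on its own is an ordinary perceptron; only the composition makes the non-linearly-separable XOR representable.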

Page 45:

Problems

How do we train a multi-layered network?

What is the desired output of hidden neurons?


Page 46:

Backpropagation learning Algorithm

Page 47:

1- Feed Forward

2- Error Backpropagation

3- Update the weights


Backpropagation- Algorithm

Page 48:

Backpropagation: Objectives


Page 49:

Backpropagation (Error or cost)

Page 50:

Backpropagation (Error or cost)

Page 51:

Error space (Multi-Modal Cost Surface)

(Figure: a multi-modal error surface over two weights, showing a global minimum and a local minimum; gradient descent moves downhill on this surface.)

Page 52:

Local vs. global minimum

Page 53:

Weight updates

Page 54:

Backpropagation- Algorithm


Page 55:

Minimizing Error Using Steepest Descent

The main idea:

Find the way downhill and take a step:

(Figure: error E plotted against x, with the minimum marked.)

downhill direction = -dE/dx

With step size η, take a step:

x ← x - η·dE/dx
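The update x ← x − η·dE/dx can be sketched on a one-dimensional example. A minimal Python illustration (the quadratic E(x) = (x − 3)² and the step size η = 0.1 are assumptions chosen for the demonstration):

```python
def gradient_descent(dE, x0, eta=0.1, steps=100):
    """Repeatedly step downhill: x <- x - eta * dE/dx."""
    x = x0
    for _ in range(steps):
        x -= eta * dE(x)
    return x

# E(x) = (x - 3)^2 has dE/dx = 2*(x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

On this convex toy error the iterates converge to the minimum; on the multi-modal surfaces shown earlier the same rule can instead settle in a local minimum.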

Page 56:

Convergence

Page 57:

Updating weights

Page 58:

Back-propagating error

Page 59:

Back-propagating- computing δj (for output layer)

Page 60:

Back-propagating- computing δj (for hidden layers)

Page 61:

Updating weights

Page 62:

Page 63:

Minimizing the global error


Page 64:

Backpropagating- remarks

Page 65:

Backpropagating- remarks

Page 66:

Learning rate


Page 67:

Initializing weights


Page 68:

Hyperparameters

Page 69:

Type of data sets


(Used to decide when to stop training only by monitoring the error.)

Page 70:

Consistency of the TS


If some examples are inconsistent, convergence of learning is not guaranteed:

In real cases, inconsistencies can be introduced by similar noisy patterns belonging to different classes

Examples of problematic training patterns taken from images of handwritten characters:

Page 71:

Stopping criteria


Page 72:

Generalization in Classification

Suppose the task of our network is to learn a classification decision boundary. Our aim is for the network to generalize, i.e. to classify new inputs appropriately. If we know that the training data contains noise, we don't necessarily want the training data to be classified totally accurately, as that is likely to reduce the generalization ability.

Page 73:

The problem of overfitting …

Approximation of the function y = f(x) :

2 neurons in hidden layer

5 neurons in hidden layer

40 neurons in hidden layer

x

y

• Overfitting is not detectable in the learning phase …

• So use Cross-Validation ...

Page 74:

Overfitting and underfitting


Page 75:

If the network is well dimensioned, both errors ETS (on the training set) and EVS (on the validation set) are small.


Generalization in Function Approximation

Page 76:

If the network does not have enough hidden neurons, it is not able to approximate the function and both errors are large:


Generalization in Function Approximation

Page 77:

If the network has too many hidden neurons, it could respond correctly to the TS (ETS < ε) but fail to generalize well (EVS too large):


Generalization in Function Approximation

Page 78:

To avoid the overtraining effect, we can train the network using the TS, but monitor EVS and stop the training when EVS < ε:

(The network must learn the rule, not just the examples.)

Generalization in Function Approximation

Page 79:

Use of a validation set allows periodic testing to see whether the model has overfitted

Stop here

Earlier Stopping - Good Generalization
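Early stopping can be sketched as a loop that tracks the best validation error seen so far. A Python illustration (the `patience` mechanism and the toy V-shaped error curve are assumptions chosen for the demonstration; the slides only describe stopping once validation error stops improving):

```python
def early_stopping(val_error, max_epochs=1000, patience=5):
    """Track the best validation error; stop after `patience` epochs
    without improvement, reporting when the best model was seen."""
    best_err, best_epoch, since_best = float('inf'), 0, 0
    for epoch in range(max_epochs):
        e = val_error(epoch)          # validation error after this epoch
        if e < best_err:
            best_err, best_epoch, since_best = e, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                break                 # validation error keeps rising: stop
    return best_epoch, best_err

# Toy validation curve: falls until epoch 20, then rises (overfitting)
epoch, err = early_stopping(lambda ep: abs(ep - 20) + 1)
```

In practice `val_error` would evaluate the network on the held-out validation set, and the weights from `best_epoch` would be kept.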

Page 80:

K-Fold Cross Validation


Page 81:

Application of MLPs

The general scheme when using ANNs is as follows: the stimulus is encoded into an input pattern (e.g., 0 1 0 1 1 1 0 0), the network maps it to an output pattern (e.g., 1 1 0 0 1 0 1 0), and the output pattern is decoded into the response.

Page 82:

Application: Digit Recognition

Page 83:


Learning XOR Operation: Matlab Code

P = [ 0 0 1 1; ...
      0 1 0 1 ];
T = [ 0 1 1 0 ];
net = newff([0 1; 0 1], [6 1], {'tansig' 'tansig'});
net.trainParam.epochs = 4850;
net = train(net, P, T);
X = [0; 1];      % test input as a column vector
Y = sim(net, X);
display(Y);

Page 84:

Solving the EXOR operation

(Network diagram: input layer with x1 and x2; hidden layer with neurons 3 and 4; output layer with neuron 5; weights w13, w14, w23, w24, w35, w45; each neuron also receives a fixed threshold input.)

Page 85:

The effect of the threshold applied to a neuron in the hidden or output layer is represented by its weight, θ, connected to a fixed input equal to -1.

The initial weights and threshold levels are set randomly as follows: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1 and θ5 = 0.3.


Solving the EXOR operation

Page 86:

We consider a training set where inputs x1 and x2 are equal to 1 and desired output yd,5 is 0. The actual outputs of neurons 3 and 4 in the hidden layer are calculated as

Now the actual output of neuron 5 in the output layer is determined as:

Thus, the following error is obtained:

y3 = sigmoid(x1·w13 + x2·w23 - θ3) = 1 / (1 + e^-(1·0.5 + 1·0.4 - 1·0.8)) = 0.5250

y4 = sigmoid(x1·w14 + x2·w24 - θ4) = 1 / (1 + e^-(1·0.9 + 1·1.0 + 1·0.1)) = 0.8808

y5 = sigmoid(y3·w35 + y4·w45 - θ5) = 1 / (1 + e^-(-0.5250·1.2 + 0.8808·1.1 - 1·0.3)) = 0.5097

e = yd,5 - y5 = 0 - 0.5097 = -0.5097


Solving the EXOR operation
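The forward-pass arithmetic above can be verified with a short script. A sketch in Python (the slides' own code examples use MATLAB; this is only a numerical check of the slide's values):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initial weights and thresholds from the slide
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
th3, th4, th5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1, 1, 0      # training example: x1 = x2 = 1, target 0

y3 = sigmoid(x1 * w13 + x2 * w23 - th3)   # hidden neuron 3 -> 0.5250
y4 = sigmoid(x1 * w14 + x2 * w24 - th4)   # hidden neuron 4 -> 0.8808
y5 = sigmoid(y3 * w35 + y4 * w45 - th5)   # output neuron 5 -> 0.5097
e = yd5 - y5                              # error -> -0.5097
```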

Page 87:

The next step is weight training. To update the weights and threshold levels in our network, we propagate the error, e, from the output layer backward to the input layer.

First, we calculate the error gradient for neuron 5 in the output layer:

Then we determine the weight corrections assuming that the learning rate parameter, α, is equal to 0.1:

δ5 = y5·(1 - y5)·e = 0.5097·(1 - 0.5097)·(-0.5097) = -0.1274

Δw35 = α·y3·δ5 = 0.1·0.5250·(-0.1274) = -0.0067
Δw45 = α·y4·δ5 = 0.1·0.8808·(-0.1274) = -0.0112
Δθ5 = α·(-1)·δ5 = 0.1·(-1)·(-0.1274) = 0.0127


Solving the EXOR operation

Page 88:

Next we calculate the error gradients for neurons 3 and 4 in the hidden layer:

We then determine the weight corrections:

δ3 = y3·(1 - y3)·δ5·w35 = 0.5250·(1 - 0.5250)·(-0.1274)·(-1.2) = 0.0381
δ4 = y4·(1 - y4)·δ5·w45 = 0.8808·(1 - 0.8808)·(-0.1274)·1.1 = -0.0147

Δw13 = α·x1·δ3 = 0.1·1·0.0381 = 0.0038
Δw23 = α·x2·δ3 = 0.1·1·0.0381 = 0.0038
Δθ3 = α·(-1)·δ3 = 0.1·(-1)·0.0381 = -0.0038
Δw14 = α·x1·δ4 = 0.1·1·(-0.0147) = -0.0015
Δw24 = α·x2·δ4 = 0.1·1·(-0.0147) = -0.0015
Δθ4 = α·(-1)·δ4 = 0.1·(-1)·(-0.0147) = 0.0015


Solving the EXOR operation

Page 89:

At last, we update all weights and threshold:

The training process is repeated until the sum of squared errors is less than 0.001.

w13 = w13 + Δw13 = 0.5 + 0.0038 = 0.5038
w14 = w14 + Δw14 = 0.9 - 0.0015 = 0.8985
w23 = w23 + Δw23 = 0.4 + 0.0038 = 0.4038
w24 = w24 + Δw24 = 1.0 - 0.0015 = 0.9985
w35 = w35 + Δw35 = -1.2 - 0.0067 = -1.2067
w45 = w45 + Δw45 = 1.1 - 0.0112 = 1.0888
θ3 = θ3 + Δθ3 = 0.8 - 0.0038 = 0.7962
θ4 = θ4 + Δθ4 = -0.1 + 0.0015 = -0.0985
θ5 = θ5 + Δθ5 = 0.3 + 0.0127 = 0.3127


Solving the EXOR operation
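The whole iteration (forward pass, error gradients, updates) can likewise be checked in a few lines. A Python sketch reproducing the slides' numbers:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

alpha = 0.1
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
th3, th4, th5 = 0.8, -0.1, 0.3
x1, x2, yd5 = 1, 1, 0

# Forward pass
y3 = sigmoid(x1 * w13 + x2 * w23 - th3)
y4 = sigmoid(x1 * w14 + x2 * w24 - th4)
y5 = sigmoid(y3 * w35 + y4 * w45 - th5)
e = yd5 - y5

# Error gradients: output layer first, then the hidden layer
d5 = y5 * (1 - y5) * e
d3 = y3 * (1 - y3) * d5 * w35
d4 = y4 * (1 - y4) * d5 * w45

# Updates; each threshold acts as a weight on a fixed input of -1
w13 += alpha * x1 * d3; w23 += alpha * x2 * d3; th3 += alpha * (-1) * d3
w14 += alpha * x1 * d4; w24 += alpha * x2 * d4; th4 += alpha * (-1) * d4
w35 += alpha * y3 * d5; w45 += alpha * y4 * d5; th5 += alpha * (-1) * d5
```

After one iteration the updated values match the slide (w13 = 0.5038, w35 = -1.2067, θ5 = 0.3127, and so on).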

Page 90:

Learning curve for the Exclusive-OR operation (figure): the sum-squared network error decreases from about 10^1 to below 10^-3 over 224 epochs, plotted on a logarithmic scale.

Page 91:

Final results of three-layer network learning

Inputs (x1, x2)   Desired output yd   Actual output y5   Error e
1, 1              0                   0.0155             -0.0155
0, 1              1                   0.9849              0.0151
1, 0              1                   0.9849              0.0151
0, 0              0                   0.0175             -0.0175

Sum of squared errors: 0.0010

Page 92:

Training Modes

Incremental mode (on-line, sequential, stochastic, or per-observation): Weights updated after each instance is presented

Batch mode (off-line or per-epoch): Weights updated after all the patterns are presented


Page 93:

NN: Universal Approximator

Any desired continuous function can be implemented by a three-layer network, given a sufficient number of hidden units, proper nonlinearities, and weights (Kolmogorov).
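As an illustration of this property (a sketch, not Kolmogorov's construction; the layer sizes and the random-feature shortcut are choices of convenience), a single hidden layer of sigmoid units with a linear output can fit a smooth function closely:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a continuous function on [-pi, pi]
X = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(X).ravel()

# Hidden layer: 50 sigmoid units with random weights and biases
W = rng.normal(scale=3.0, size=(1, 50))
b = rng.normal(scale=3.0, size=50)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))      # hidden activations

# Output layer: linear, solved directly by least squares (plus a bias column)
A = np.hstack([H, np.ones((len(X), 1))])
w_out, *_ = np.linalg.lstsq(A, y, rcond=None)

max_err = np.abs(A @ w_out - y).max()
print(max_err)  # small: the three-layer net approximates sin() closely
```

Here only the output weights are trained; with enough hidden units even random nonlinear features already drive the approximation error down.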


NN DESIGN ISSUES

● Data representation

● Network Topology

● Network Parameters

● Training


Data Representation

● Data representation depends on the problem.

● In general, ANNs work on continuous (real-valued) attributes. Therefore, symbolic attributes are encoded into continuous ones.

● Attributes of different types may have different ranges of values, which affects the training process.

● Normalization may be used, such as the following, which scales each attribute to values between 0 and 1:

x'i = (xi - mini) / (maxi - mini)

where, for each value xi of the ith attribute, mini and maxi are the minimum and maximum values of that attribute over the training set.
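The min-max scaling described above can be sketched as follows (a minimal implementation over a list-of-rows training set):

```python
def min_max_normalize(data):
    """Scale each attribute (column) of the training set to [0, 1]:
    x' = (x - min) / (max - min), computed per attribute."""
    n_attrs = len(data[0])
    mins = [min(row[i] for row in data) for i in range(n_attrs)]
    maxs = [max(row[i] for row in data) for i in range(n_attrs)]
    return [
        [(row[i] - mins[i]) / (maxs[i] - mins[i]) for i in range(n_attrs)]
        for row in data
    ]

# Two attributes with very different ranges end up on the same scale
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
print(min_max_normalize(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```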


Network Topology

● The number of layers and neurons depends on the specific task.

● In practice, this issue is solved by trial and error.

● Two types of adaptive algorithms can be used:

− start from a large network and successively remove some neurons and links until network performance degrades.

− begin with a small network and introduce new neurons until performance is satisfactory.


Network parameters

● How are the weights initialized?

● How is the learning rate chosen?

● How many hidden layers and how many neurons?

● How many examples in the training set?


Initialization of weights

● In general, initial weights are randomly chosen, with typical values between -1.0 and 1.0 or -0.5 and 0.5.

● If some inputs are much larger than others, random initialization may bias the network to give much more importance to larger inputs.

● In such a case, weights can be initialized as follows:

wij = 1 / (2N |x̄i|),  i = 1, ..., N    (for weights from the input to the first layer)

wjk = 1 / (2N |Σi wij x̄i|)    (for weights from the first to the second layer)

where N is the number of inputs and x̄i is the mean value of the ith input over the training set.
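A sketch of this input-scaled initialization for the first layer (assuming x̄i is the mean absolute value of the ith input over the training set; the function name and the random sign choice are illustrative):

```python
import random

def init_input_weights(train_inputs, n_hidden):
    """Initialize input-to-hidden weights scaled by each input's magnitude,
    so that large-valued inputs do not dominate: |w_ij| = 1 / (2 * N * |mean_i|)."""
    N = len(train_inputs[0])  # number of inputs
    means = [sum(abs(row[i]) for row in train_inputs) / len(train_inputs)
             for i in range(N)]
    return [[random.choice((-1, 1)) / (2 * N * means[i])
             for j in range(n_hidden)]
            for i in range(N)]

random.seed(0)
# Second input is three orders of magnitude larger than the first
W = init_input_weights([[0.1, 200.0], [0.3, 400.0]], n_hidden=3)
# Weights on the large-valued second input come out much smaller in magnitude
print(abs(W[0][0]), abs(W[1][0]))
```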


Choice of learning rate

● The right value of the learning rate η depends on the application.

● Values between 0.1 and 0.9 have been used in many applications.

● Another heuristic is to adapt η during training, as described in previous slides.

● It is common to start with large values and decrease η monotonically:

− start with η = 0.9 and decrease it every 5 epochs

− use a Gaussian function

− set η = 1/k

− ...
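The decreasing schedules above can be sketched as follows (the halving factor in the step schedule is an assumption for illustration; the slide only says the rate is decreased every 5 epochs):

```python
def step_decay(epoch, eta0=0.9, factor=0.5, every=5):
    """Start with a large rate and shrink it every `every` epochs."""
    return eta0 * factor ** (epoch // every)

def inverse_decay(k):
    """eta = 1/k for iteration k = 1, 2, 3, ..."""
    return 1.0 / k

print([round(step_decay(e), 4) for e in (0, 5, 10)])  # [0.9, 0.45, 0.225]
print([inverse_decay(k) for k in (1, 2, 4)])          # [1.0, 0.5, 0.25]
```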


Size of Training set

● Rule of thumb:

− the number of training examples should be at least five to ten times the number of weights of the network.

● Another rule:

N ≥ |W| / (1 − a)

where |W| is the number of weights of the network and a is the expected accuracy on the test set.
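Putting this rule to work (the 2-3-1 network below is a hypothetical example for the arithmetic):

```python
def min_training_examples(n_weights, accuracy):
    """N >= |W| / (1 - a): minimum training-set size for |W| weights
    and expected test-set accuracy a."""
    return n_weights / (1.0 - accuracy)

# A 2-3-1 network with biases has 2*3 + 3 + 3*1 + 1 = 13 weights.
# For 90% expected accuracy on the test set:
print(round(min_training_examples(13, 0.9)))  # about 130 examples
```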
