Neural Networks Lecture 8: Backpropagation Learning (September 30, 2010)


Page 1: Sigmoidal Neurons

In backpropagation networks, we typically choose τ = 1 and θ = 0.

[Figure: plot of the activation $f_i(\text{net}_i(t))$ against the net input $\text{net}_i(t)$ over the range −1 to 1, with output values between 0 and 1; two curves are shown, for τ = 1 and τ = 0.1.]

$$f_i(\text{net}_i(t)) = \frac{1}{1 + e^{-(\text{net}_i(t) - \theta)/\tau}}$$
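To make the effect of τ concrete, here is a minimal Python sketch (not from the lecture; the function and variable names are my own) that evaluates this activation for the two steepness values shown in the figure:

```python
import math

def sigmoid_neuron(net, theta=0.0, tau=1.0):
    """Sigmoidal activation: f_i(net_i(t)) = 1 / (1 + exp(-(net - theta) / tau))."""
    return 1.0 / (1.0 + math.exp(-(net - theta) / tau))

# Compare the gentle (tau = 1) and steep (tau = 0.1) curves from the figure.
for net in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(f"net = {net:+.1f}   tau=1: {sigmoid_neuron(net, tau=1.0):.3f}   "
          f"tau=0.1: {sigmoid_neuron(net, tau=0.1):.3f}")
```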

Page 2: Sigmoidal Neurons

This leads to a simplified form of the sigmoid function:

$$S(\text{net}) = \frac{1}{1 + e^{-\text{net}}}$$

We do not need a modifiable threshold θ, because we will use "dummy" inputs as we did for perceptrons. The choice τ = 1 works well in most situations and results in a very simple derivative of S(net).

Page 3: Sigmoidal Neurons

$$S(x) = \frac{1}{1+e^{-x}}$$

$$S'(x) = \frac{dS(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1}{1+e^{-x}} - \frac{1}{(1+e^{-x})^2} = S(x)\,(1 - S(x))$$

This result will be very useful when we develop the backpropagation algorithm.
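The identity S'(x) = S(x)(1 − S(x)) is easy to verify numerically; the following sketch (my own illustration, not part of the slides) compares it to a central finite difference:

```python
import math

def S(x):
    """Simplified sigmoid (tau = 1, theta = 0)."""
    return 1.0 / (1.0 + math.exp(-x))

def S_prime(x):
    """Analytical derivative S'(x) = S(x) * (1 - S(x))."""
    return S(x) * (1.0 - S(x))

h = 1e-6
for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    numeric = (S(x + h) - S(x - h)) / (2.0 * h)   # central finite difference
    print(f"x = {x:+.1f}   analytic = {S_prime(x):.6f}   numeric = {numeric:.6f}")
```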

Page 4: Backpropagation Learning

Similar to the Adaline, the goal of the backpropagation learning algorithm is to modify the network's weights so that its output vector

$$\mathbf{o}_p = (o_{p,1}, o_{p,2}, \ldots, o_{p,K})$$

is as close as possible to the desired output vector

$$\mathbf{d}_p = (d_{p,1}, d_{p,2}, \ldots, d_{p,K})$$

for K output neurons and input patterns p = 1, …, P.

The set of input-output pairs (exemplars) $\{(\mathbf{x}_p, \mathbf{d}_p) \mid p = 1, \ldots, P\}$ constitutes the training set.

Page 5: Backpropagation Learning

Also similar to the Adaline, we need a cumulative error function that is to be minimized:

$$\text{Error} = \sum_{p=1}^{P} \text{Err}(\mathbf{o}_p, \mathbf{d}_p)$$

We can choose the mean square error (MSE) once again (but the 1/P factor does not matter):

$$\text{MSE} = \frac{1}{P} \sum_{p=1}^{P} \sum_{j=1}^{K} (l_{p,j})^2, \qquad \text{where} \quad l_{p,j} = o_{p,j} - d_{p,j}$$
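For concreteness, here is a small NumPy sketch of this error measure (the array names and toy values are my own, not from the lecture):

```python
import numpy as np

def mse(outputs, desired):
    """Mean square error over P patterns and K output nodes.

    outputs, desired: arrays of shape (P, K) holding o_{p,j} and d_{p,j}.
    """
    l = outputs - desired                      # l_{p,j} = o_{p,j} - d_{p,j}
    return np.mean(np.sum(l ** 2, axis=1))     # (1/P) * sum_p sum_j l_{p,j}^2

# Toy example with P = 2 patterns and K = 2 output nodes.
o = np.array([[0.8, 0.2], [0.4, 0.9]])
d = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mse(o, d))   # 0.125
```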

Page 6: Terminology

Example: Network function $f: \mathbb{R}^3 \to \mathbb{R}^2$

[Figure: example network with the input vector $(x_1, x_2, x_3)$ feeding the input layer, followed by a hidden layer and an output layer that produces the output vector $(o_1, o_2)$. Connections are labeled with weights such as $w^{(1,0)}_{1,1}$ and $w^{(1,0)}_{4,3}$ between input and hidden layer, and $w^{(2,1)}_{1,1}$ and $w^{(2,1)}_{2,4}$ between hidden and output layer.]

Page 7: Backpropagation Learning

For input pattern p, the i-th input layer node holds $x_{p,i}$.

Net input to the j-th node in the hidden layer:

$$\text{net}_j^{(1)} = \sum_{i=0}^{n} w_{j,i}^{(1,0)} x_{p,i}$$

Output of the j-th node in the hidden layer:

$$x_{p,j}^{(1)} = S\left(\sum_{i=0}^{n} w_{j,i}^{(1,0)} x_{p,i}\right)$$

Net input to the k-th node in the output layer:

$$\text{net}_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} x_{p,j}^{(1)}$$

Output of the k-th node in the output layer:

$$o_{p,k} = S\left(\sum_{j} w_{k,j}^{(2,1)} x_{p,j}^{(1)}\right)$$

Network error for p:

$$E_p = \sum_{k=1}^{K} (l_{p,k})^2 = \sum_{k=1}^{K} (o_{p,k} - d_{p,k})^2$$
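Putting these formulas together, here is a minimal NumPy sketch of the feedforward phase for one pattern. It assumes a network with 3 inputs, 4 hidden nodes, and 2 outputs (sizes suggested by the terminology example; the hidden-layer size is an assumption), and it adds the "dummy" input x_0 = 1 only at the input layer, exactly as the formulas above are written. Names are my own.

```python
import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x_p, W10, W21):
    """Feedforward phase for one input pattern x_p.

    W10[j, i] holds w^{(1,0)}_{j,i} (column i = 0 is the dummy input);
    W21[k, j] holds w^{(2,1)}_{k,j}.
    """
    x_in = np.concatenate(([1.0], x_p))   # dummy input x_0 = 1
    net1 = W10 @ x_in                     # net^{(1)}_j, hidden net inputs
    x1 = S(net1)                          # x^{(1)}_j, hidden outputs
    net2 = W21 @ x1                       # net^{(2)}_k, output net inputs
    o = S(net2)                           # o_k, network outputs
    return x1, o

rng = np.random.default_rng(0)
W10 = rng.uniform(-0.5, 0.5, size=(4, 4))   # 4 hidden nodes, 3 inputs + dummy
W21 = rng.uniform(-0.5, 0.5, size=(2, 4))   # 2 output nodes
x1, o = forward(np.array([0.2, 0.7, -0.1]), W10, W21)
print("hidden outputs:", x1)
print("network outputs:", o)
```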

Page 8: Backpropagation Learning

As E is a function of the network weights, we can use gradient descent to find those weights that result in minimal error.

For individual weights in the hidden and output layers, we should move against the error gradient (omitting index p):

$$\Delta w_{k,j}^{(2,1)} = -\eta \, \frac{\partial E}{\partial w_{k,j}^{(2,1)}} \qquad\qquad \Delta w_{j,i}^{(1,0)} = -\eta \, \frac{\partial E}{\partial w_{j,i}^{(1,0)}}$$

where η is the learning rate.

Output layer: derivative easy to calculate.

Hidden layer: derivative difficult to calculate.

Page 9: Backpropagation Learning

When computing the derivative with regard to $w_{k,j}^{(2,1)}$, we can disregard any output units except $o_k$:

$$\frac{\partial E}{\partial o_k} = \frac{\partial}{\partial o_k} \sum_i l_i^2 = \frac{\partial}{\partial o_k} \sum_i (o_i - d_i)^2 = 2(o_k - d_k)$$

Remember that $o_k$ is obtained by applying the sigmoid function S to $\text{net}_k^{(2)}$, which is computed by:

$$\text{net}_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} x_j^{(1)}$$

Therefore, we need to apply the chain rule twice.

Page 10: Backpropagation Learning

$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}} = \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial \text{net}_k^{(2)}} \cdot \frac{\partial \text{net}_k^{(2)}}{\partial w_{k,j}^{(2,1)}}$$

Since

$$\text{net}_k^{(2)} = \sum_{j} w_{k,j}^{(2,1)} x_j^{(1)},$$

we have:

$$\frac{\partial \text{net}_k^{(2)}}{\partial w_{k,j}^{(2,1)}} = x_j^{(1)}$$

We know that:

$$\frac{\partial o_k}{\partial \text{net}_k^{(2)}} = S'(\text{net}_k^{(2)})$$

Which gives us:

$$\frac{\partial E}{\partial w_{k,j}^{(2,1)}} = 2(o_k - d_k)\, S'(\text{net}_k^{(2)})\, x_j^{(1)}$$
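One way to convince yourself of this result is a numerical gradient check. The sketch below (my own illustration, not from the lecture) treats the hidden outputs x^(1) as fixed, perturbs a single output-layer weight, and compares the finite-difference slope of E with 2(o_k − d_k) S'(net_k^(2)) x_j^(1):

```python
import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-x))

def E(W21, x1, d):
    """Network error sum_k (o_k - d_k)^2 for fixed hidden outputs x1."""
    o = S(W21 @ x1)
    return np.sum((o - d) ** 2)

rng = np.random.default_rng(1)
x1 = rng.uniform(0.0, 1.0, size=4)            # hidden outputs x^{(1)}_j
W21 = rng.uniform(-0.5, 0.5, size=(2, 4))     # output-layer weights w^{(2,1)}_{k,j}
d = np.array([1.0, 0.0])                      # desired outputs d_k

k, j = 0, 2                                   # pick one particular weight
o = S(W21 @ x1)

# Analytical gradient from this page: 2 (o_k - d_k) S'(net_k) x_j, with S' = S (1 - S).
analytic = 2.0 * (o[k] - d[k]) * o[k] * (1.0 - o[k]) * x1[j]

# Central finite difference on the same weight.
eps = 1e-6
Wp, Wm = W21.copy(), W21.copy()
Wp[k, j] += eps
Wm[k, j] -= eps
numeric = (E(Wp, x1, d) - E(Wm, x1, d)) / (2.0 * eps)

print(f"analytic = {analytic:.8f}   numeric = {numeric:.8f}")
```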

Page 11: Backpropagation Learning

For the derivative with regard to $w_{j,i}^{(1,0)}$, notice that E depends on it through $\text{net}_j^{(1)}$, which influences each $o_k$ with k = 1, …, K:

$$o_k = S(\text{net}_k^{(2)}), \qquad \text{net}_j^{(1)} = \sum_i w_{j,i}^{(1,0)} x_i, \qquad x_j^{(1)} = S(\text{net}_j^{(1)})$$

Using the chain rule of derivatives again:

$$\frac{\partial E}{\partial w_{j,i}^{(1,0)}} = \sum_{k=1}^{K} \frac{\partial E}{\partial o_k} \cdot \frac{\partial o_k}{\partial \text{net}_k^{(2)}} \cdot \frac{\partial \text{net}_k^{(2)}}{\partial x_j^{(1)}} \cdot \frac{\partial x_j^{(1)}}{\partial \text{net}_j^{(1)}} \cdot \frac{\partial \text{net}_j^{(1)}}{\partial w_{j,i}^{(1,0)}}$$

$$\frac{\partial E}{\partial w_{j,i}^{(1,0)}} = \sum_{k=1}^{K} 2(o_k - d_k)\, S'(\text{net}_k^{(2)})\, w_{k,j}^{(2,1)}\, S'(\text{net}_j^{(1)})\, x_i$$

Page 12: Backpropagation Learning

This gives us the following weight changes at the output layer (moving against the gradient; the constant factor 2 is absorbed into the learning rate η):

$$\Delta w_{k,j}^{(2,1)} = \eta \, \delta_k \, x_j^{(1)} \qquad \text{with} \qquad \delta_k = (d_k - o_k) \, S'(\text{net}_k^{(2)})$$

… and at the inner layer:

$$\Delta w_{j,i}^{(1,0)} = \eta \, \delta_j \, x_i \qquad \text{with} \qquad \delta_j = S'(\text{net}_j^{(1)}) \sum_{k=1}^{K} \delta_k \, w_{k,j}^{(2,1)}$$
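These two update rules translate directly into code. The following NumPy sketch (variable names and the learning rate value are my own choices) performs one weight update for a single pattern:

```python
import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x_p, d, W10, W21, eta=0.5):
    """One backpropagation weight update for a single pattern (updates in place)."""
    x_in = np.concatenate(([1.0], x_p))            # dummy input x_0 = 1
    x1 = S(W10 @ x_in)                             # hidden outputs x^{(1)}_j
    o = S(W21 @ x1)                                # network outputs o_k

    # Generalized error terms, using S'(net) = S(net) (1 - S(net)).
    delta_k = (d - o) * o * (1.0 - o)              # output-layer deltas
    delta_j = x1 * (1.0 - x1) * (W21.T @ delta_k)  # hidden-layer deltas

    # Delta w^{(2,1)}_{k,j} = eta delta_k x^{(1)}_j;  Delta w^{(1,0)}_{j,i} = eta delta_j x_i.
    W21 += eta * np.outer(delta_k, x1)
    W10 += eta * np.outer(delta_j, x_in)
    return o

rng = np.random.default_rng(2)
W10 = rng.uniform(-0.5, 0.5, size=(4, 3))   # 4 hidden nodes, 2 inputs + dummy
W21 = rng.uniform(-0.5, 0.5, size=(1, 4))   # 1 output node
print(backprop_step(np.array([1.0, 0.0]), np.array([1.0]), W10, W21))
```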

Page 13: Backpropagation Learning

As you surely remember from a few minutes ago:

$$S'(x) = S(x)\,(1 - S(x))$$

Then we can simplify the generalized error terms:

$$\delta_k = (d_k - o_k)\, S'(\text{net}_k^{(2)}) = (d_k - o_k)\, o_k\, (1 - o_k)$$

And:

$$\delta_j = S'(\text{net}_j^{(1)}) \sum_{k=1}^{K} \delta_k \, w_{k,j}^{(2,1)} = x_j^{(1)} (1 - x_j^{(1)}) \sum_{k=1}^{K} \delta_k \, w_{k,j}^{(2,1)}$$

Page 14: Backpropagation Learning

The simplified error terms $\delta_k$ and $\delta_j$ use variables that are calculated in the feedforward phase of the network and can thus be calculated very efficiently.

Now let us state the final equations again and reintroduce the subscript p for the p-th pattern:

$$\Delta w_{k,j}^{(2,1)} = \eta \, \delta_{p,k} \, x_{p,j}^{(1)} \qquad \text{with} \qquad \delta_{p,k} = (d_{p,k} - o_{p,k})\, o_{p,k}\, (1 - o_{p,k})$$

$$\Delta w_{j,i}^{(1,0)} = \eta \, \delta_{p,j} \, x_{p,i} \qquad \text{with} \qquad \delta_{p,j} = x_{p,j}^{(1)} (1 - x_{p,j}^{(1)}) \sum_{k=1}^{K} \delta_{p,k} \, w_{k,j}^{(2,1)}$$

Page 15: Backpropagation Learning

Algorithm Backpropagation;
  Start with randomly chosen weights;
  while MSE is above desired threshold and computational bounds are not exceeded, do
    for each input pattern x_p, 1 ≤ p ≤ P,
      Compute hidden node inputs;
      Compute hidden node outputs;
      Compute inputs to the output nodes;
      Compute the network outputs;
      Compute the error between output and desired output;
      Modify the weights between hidden and output nodes;
      Modify the weights between input and hidden nodes;
    end-for
  end-while.
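A self-contained Python sketch of this algorithm is given below. The stopping threshold, learning rate, network size, and the XOR toy data are illustrative choices of mine, not part of the lecture:

```python
import numpy as np

def S(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, D, n_hidden=4, eta=0.5, mse_threshold=0.01, max_epochs=20000, seed=0):
    """Online backpropagation for a two-layer network of sigmoidal nodes."""
    rng = np.random.default_rng(seed)
    W10 = rng.uniform(-0.5, 0.5, size=(n_hidden, X.shape[1] + 1))  # +1 for dummy input
    W21 = rng.uniform(-0.5, 0.5, size=(D.shape[1], n_hidden))

    for epoch in range(max_epochs):
        sq_err = 0.0
        for x_p, d_p in zip(X, D):
            x_in = np.concatenate(([1.0], x_p))   # dummy input x_0 = 1
            x1 = S(W10 @ x_in)                    # hidden node outputs
            o = S(W21 @ x1)                       # network outputs
            sq_err += np.sum((o - d_p) ** 2)      # error vs. desired output

            delta_k = (d_p - o) * o * (1.0 - o)             # output-layer error terms
            delta_j = x1 * (1.0 - x1) * (W21.T @ delta_k)   # hidden-layer error terms
            W21 += eta * np.outer(delta_k, x1)    # modify hidden-to-output weights
            W10 += eta * np.outer(delta_j, x_in)  # modify input-to-hidden weights

        if sq_err / len(X) < mse_threshold:       # stop when MSE is below threshold
            break
    return W10, W21

# Toy example: the XOR function with two inputs and one output.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
W10, W21 = train_backprop(X, D)
for x_p in X:
    print(x_p, S(W21 @ S(W10 @ np.concatenate(([1.0], x_p)))))
```

Depending on the random seed and learning rate, XOR may take many epochs or occasionally fail to converge; that is expected behavior for plain gradient descent and not a flaw in the algorithm as stated.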