Fundamental Neurocomputing Concepts


Transcript of Fundamental Neurocomputing Concepts

Page 1: Fundamental Neurocomputing Concepts

1

Fundamental Neurocomputing Concepts

National Yunlin University of Science and Technology, Graduate Institute of Computer Science and Information Engineering
Chuan-Yu Chang, Ph.D.
Office: ES 709  TEL: 05-5342601 ext. 4337  E-mail: [email protected]

Page 2: Fundamental Neurocomputing Concepts

2

Basic Models of Artificial Neurons

An artificial neuron can be referred to as a processing element, node, or threshold logic unit. There are four basic components of a neuron:

A set of synapses with associated synaptic weights.
A summing device: each input is multiplied by its associated synaptic weight and the products are summed.
An activation function, which serves to limit the amplitude of the neuron's output.
A threshold, externally applied, which lowers the cumulative input to the activation function.

Page 3: Fundamental Neurocomputing Concepts

3

Basic Models of Artificial neurons

Page 4: Fundamental Neurocomputing Concepts

4

Basic Models of Artificial Neurons

The output of the linear combiner is
$$u_q = \sum_{j=1}^{n} w_{qj} x_j = \mathbf{w}_q^T \mathbf{x}$$
where $\mathbf{w}_q = [w_{q1}, w_{q2}, \ldots, w_{qn}]^T \in \mathbb{R}^n$.

The output of the activation function is
$$y_q = f(v_q) = f(u_q - \theta_q)$$

The output of the neuron is given by
$$y_q = f\left(\sum_{j=1}^{n} w_{qj} x_j - \theta_q\right)$$

Page 5: Fundamental Neurocomputing Concepts

5

Basic Models of Artificial Neurons

The threshold (or bias) is incorporated into the synaptic weight vector $\mathbf{w}_q$ for neuron q.

Page 6: Fundamental Neurocomputing Concepts

6

Basic Models of Artificial Neurons

The effective internal activation potential is written as
$$v_q = \sum_{j=0}^{n} w_{qj} x_j$$

The output of the neuron is written as
$$y_q = f(v_q)$$

Page 7: Fundamental Neurocomputing Concepts

7

Basic Activation Functions

The activation function (also called the transfer function) can be linear or nonlinear.

Linear (identity) activation function:
$$y_q = f_{lin}(v_q) = v_q$$

Page 8: Fundamental Neurocomputing Concepts

8

Basic Activation Functions

Hard limiter (binary function, threshold function), with outputs in (0, 1). The output of the binary hard limiter can be written as
$$y_q = f_{hl}(v_q) = \begin{cases} 0 & \text{if } v_q < 0 \\ 1 & \text{if } v_q \geq 0 \end{cases}$$
(Figure: hard limiter activation function.)

Page 9: Fundamental Neurocomputing Concepts

9

Basic Activation Functions

Bipolar, symmetric hard limiter, with outputs in (-1, 1). The output of the symmetric hard limiter can be written as
$$y_q = f_{shl}(v_q) = \begin{cases} -1 & \text{if } v_q < 0 \\ 0 & \text{if } v_q = 0 \\ 1 & \text{if } v_q > 0 \end{cases}$$
Sometimes referred to as the signum (or sign) function.
(Figure: symmetric hard limiter activation function.)

Page 10: Fundamental Neurocomputing Concepts

10

Basic Activation Functions

Saturation linear function (piecewise linear function). The output of the saturation linear function is given by
$$y_q = f_{sl}(v_q) = \begin{cases} 0 & \text{if } v_q < -\tfrac{1}{2} \\ v_q + \tfrac{1}{2} & \text{if } -\tfrac{1}{2} \leq v_q \leq \tfrac{1}{2} \\ 1 & \text{if } v_q > \tfrac{1}{2} \end{cases}$$
(Figure: saturation linear activation function.)

Page 11: Fundamental Neurocomputing Concepts

11

Basic Activation Functions

Symmetric saturation linear function. The output of the symmetric saturation linear function is given by
$$y_q = f_{ssl}(v_q) = \begin{cases} -1 & \text{if } v_q < -1 \\ v_q & \text{if } -1 \leq v_q \leq 1 \\ 1 & \text{if } v_q > 1 \end{cases}$$
(Figure: symmetric saturation linear activation function.)

Page 12: Fundamental Neurocomputing Concepts

12

Basic Activation Functions

Sigmoid function (S-shaped function): binary sigmoid function. The output of the binary sigmoid function is given by
$$y_q = f_{bs}(v_q) = \frac{1}{1 + e^{-\alpha v_q}}$$
where $\alpha$ is the slope parameter of the binary sigmoid function.
(Figure: binary sigmoid function.)

The hard limiter has no derivative at the origin; the binary sigmoid is a continuous and differentiable function.

Page 13: Fundamental Neurocomputing Concepts

13

Basic Activation Functions

The derivative of the binary sigmoid function, shown for two different values of the slope parameter, is
$$g(v_q) = \frac{d f_{bs}(v_q)}{d v_q} = \frac{\alpha e^{-\alpha v_q}}{\left(1 + e^{-\alpha v_q}\right)^2} = \alpha f_{bs}(v_q)\left[1 - f_{bs}(v_q)\right]$$

Page 14: Fundamental Neurocomputing Concepts

14

Basic Activation Functions

Sigmoid function (S-shaped function): bipolar sigmoid function (hyperbolic tangent sigmoid). The output of the bipolar sigmoid function is given by
$$y_q = f_{hts}(v_q) = \tanh\left(\frac{\alpha v_q}{2}\right) = \frac{e^{\alpha v_q/2} - e^{-\alpha v_q/2}}{e^{\alpha v_q/2} + e^{-\alpha v_q/2}} = \frac{1 - e^{-\alpha v_q}}{1 + e^{-\alpha v_q}}$$
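The activation functions above are simple enough to express directly in code. A minimal NumPy sketch follows (the function names mirror the f_hl, f_shl, f_sl, f_ssl, f_bs, f_hts notation above; the default slope parameter alpha = 1 and the vectorized evaluation are implementation conveniences, not part of the slides):

import numpy as np

def f_hl(v):                      # binary hard limiter: outputs 0 or 1
    return np.where(v >= 0, 1.0, 0.0)

def f_shl(v):                     # symmetric hard limiter (signum): -1, 0, or 1
    return np.sign(v)

def f_sl(v):                      # saturation linear function
    return np.clip(v + 0.5, 0.0, 1.0)

def f_ssl(v):                     # symmetric saturation linear function
    return np.clip(v, -1.0, 1.0)

def f_bs(v, alpha=1.0):           # binary sigmoid with slope parameter alpha
    return 1.0 / (1.0 + np.exp(-alpha * v))

def f_hts(v, alpha=1.0):          # hyperbolic tangent (bipolar) sigmoid
    return np.tanh(alpha * v / 2.0)

v = np.linspace(-3.0, 3.0, 7)
print(f_bs(v, alpha=2.0))         # example evaluation over a small grid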

Page 15: Fundamental Neurocomputing Concepts

15

Basic Activation Functions

The effect of the threshold $\theta_q$ and the bias $\beta_q$ can be illustrated by observing the binary sigmoid activation function. Three plots of the binary sigmoid function are shown: threshold $\theta_q = 2$, bias $\beta_q = 2$, and the nominal case.

Applying a threshold is analogous to delaying a time-domain signal.
Adding a bias is analogous to an advance of a signal.

Page 16: Fundamental Neurocomputing Concepts

16

The Hopfield Model of the Artificial Neuron

The Hopfield neural network is asynchronous, parallel-processing, and fully interconnected.

Discrete-time model of the Hopfield neuron.

Page 17: Fundamental Neurocomputing Concepts

17

The Hopfield Model of the Artificial Neuron

The output of the neuron before the unit delay $z^{-1}$ is written as
$$y_q(k+1) = f_{shl}(v_q(k+1)) \quad (2.16)$$
where
$$v_q(k+1) = \sum_{j=1}^{n} w_{qj} x_j(k) \quad (2.17)$$
Using (2.16) and (2.17), the output of neuron q at time $(k+1)$ is
$$y_q(k+1) = f_{shl}\left(\sum_{j=1}^{n} w_{qj} x_j(k)\right) \quad (2.18)$$
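A minimal sketch of the discrete-time update (2.16)-(2.18) for a single neuron of this type, assuming the symmetric hard limiter defined earlier and illustrative weights and inputs:

import numpy as np

def f_shl(v):
    # symmetric hard limiter (signum)
    return np.sign(v)

def hopfield_neuron_step(w, x):
    # v(k+1) = sum_j w_qj * x_j(k); y(k+1) = f_shl(v(k+1))
    v_next = np.dot(w, x)
    return f_shl(v_next)

w = np.array([0.5, -0.3, 0.8])     # illustrative synaptic weights
x = np.array([1.0, -1.0, 1.0])     # inputs at time k
print(hopfield_neuron_step(w, x))  # output y(k+1)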

Page 18: Fundamental Neurocomputing Concepts

18

The Hopfield Model of the Artificial Neuron

Continuous-time model of the Hopfield artificial neuron

$$T_{cq}\frac{dv_q(t)}{dt} = -v_q(t) + \sum_{j=1}^{n} w_{qj} x_j(t) + \theta_q, \qquad y_q = f_{hts}(v_q)$$

Page 19: Fundamental Neurocomputing Concepts

19

Adaline and Madaline: Least-Mean-Square (LMS) Algorithm

Also known as the Widrow-Hoff learning rule or delta rule. The LMS is an adaptive algorithm that computes adjustments of the neuron's synaptic weights. The algorithm is based on the method of steepest descent. It adjusts the neuron weights to minimize the mean square error between the inner product of the weight vector with the input vector and the desired output of the neuron.

Adaline (adaptive linear element): a single neuron whose synaptic weights are updated according to the LMS algorithm.
Madaline (Multiple Adaline).

Page 20: Fundamental Neurocomputing Concepts

20

Simple adaptive linear combiner

$$v(k) = \mathbf{x}^T(k)\mathbf{w}(k) = \mathbf{w}^T(k)\mathbf{x}(k)$$
with inputs $x_0 = 1$ and $w_0 = \theta$ (bias).

Page 21: Fundamental Neurocomputing Concepts

21

Simple adaptive linear combiner

The difference between the desired response and the network response is
$$e(k) = d(k) - v(k) = d(k) - \mathbf{w}^T(k)\mathbf{x}(k) \quad (2.22)$$

The MSE criterion can be written as
$$J(\mathbf{w}) = \tfrac{1}{2}E\left[e^2(k)\right] = \tfrac{1}{2}E\left[\left(d(k) - \mathbf{w}^T(k)\mathbf{x}(k)\right)^2\right] \quad (2.23)$$

Expanding Eq. (2.23):
$$J(\mathbf{w}) = \tfrac{1}{2}E\left[d^2(k)\right] - E\left[d(k)\mathbf{x}^T(k)\right]\mathbf{w}(k) + \tfrac{1}{2}\mathbf{w}^T(k)E\left[\mathbf{x}(k)\mathbf{x}^T(k)\right]\mathbf{w}(k) \quad (2.24)$$
$$J(\mathbf{w}) = \tfrac{1}{2}E\left[d^2(k)\right] - \mathbf{p}^T\mathbf{w}(k) + \tfrac{1}{2}\mathbf{w}^T(k)C_x\mathbf{w}(k) \quad (2.25)$$

Page 22: Fundamental Neurocomputing Concepts

22

Simple adaptive linear combiner

Cross-correlation vector between the desired response and the input patterns:
$$\mathbf{p} = E\left[d(k)\mathbf{x}(k)\right] \quad (2.26)$$

Covariance matrix of the input patterns:
$$C_x = E\left[\mathbf{x}(k)\mathbf{x}^T(k)\right] \quad (2.27)$$

The MSE surface of J(w) has a single minimum, so we solve for the weights at which the gradient equals zero:
$$\nabla_w J(\mathbf{w}) = \frac{\partial J(\mathbf{w})}{\partial \mathbf{w}} = -\mathbf{p} + C_x\mathbf{w}(k) = 0$$

Therefore, the optimal weight vector is
$$\mathbf{w}^* = C_x^{-1}\mathbf{p}$$

Page 23: Fundamental Neurocomputing Concepts

23

Adaline and Madaline: Typical MSE surface of an adaptive linear combiner

Page 24: Fundamental Neurocomputing Concepts

24

The LMS Algorithm

The above expression has two limitations: computing the inverse of the covariance matrix is time-consuming, and it is not suitable for real-time weight updating because in most cases the covariance matrix and cross-correlation vector are not known in advance. To avoid these problems, Widrow and Hoff proposed the LMS algorithm.

The goal is to obtain the optimal values of the synaptic weights where J(w) is minimum. The error surface is searched using a gradient descent method to find the minimum value. We can reach the bottom of the error surface by changing the weights in the direction of the negative gradient of the surface.

Page 25: Fundamental Neurocomputing Concepts

25

The LMS Algorithm

Because the gradient on the surface cannot be computed without knowledge of the input covariance matrix and the cross-correlation vector, these must be estimated during an iterative procedure. An estimate of the MSE gradient can be obtained by taking the gradient of the instantaneous error surface.

The gradient of J(w) is approximated as
$$\nabla_w J(\mathbf{w}) \approx \nabla_w\left[\tfrac{1}{2}e^2(k)\right] = -e(k)\mathbf{x}(k) \quad (2.28)$$

The learning rule for updating the weights using the steepest descent gradient method is
$$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\nabla_w J(\mathbf{w}) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \quad (2.29)$$

The learning rate $\mu$ specifies the magnitude of the update step for the weights in the negative gradient direction.

Page 26: Fundamental Neurocomputing Concepts

26

The LMS Algorithm

If the value of $\mu$ is chosen to be too small, the learning algorithm will modify the weights slowly and a relatively large number of iterations will be required.

If the value of $\mu$ is set too large, the learning rule can become numerically unstable, leading to weights that do not converge.

Page 27: Fundamental Neurocomputing Concepts

27

The LMS Algorithm

The scalar form of the LMS algorithm can be written from (2.22) and (2.29) as
$$e(k) = d(k) - \sum_{h=1}^{n} w_h(k) x_h(k) \quad (2.30)$$
$$w_i(k+1) = w_i(k) + \mu e(k) x_i(k) \quad (2.31)$$

From (2.22) and (2.29), an upper bound must be placed on the learning rate $\mu$ to maintain stability:
$$0 < \mu < \frac{2}{\lambda_{max}} \quad (2.32)$$
where $\lambda_{max}$ is the largest eigenvalue of the input covariance matrix $C_x$.

Page 28: Fundamental Neurocomputing Concepts

28

The LMS Algorithm

For minimally acceptable stability of LMS convergence, the learning rate can be bounded as
$$0 < \mu < \frac{2}{\mathrm{trace}(C_x)} \quad (2.33)$$

Equation (2.33) is a reasonable approximation of (2.32), because
$$\mathrm{trace}(C_x) = \sum_{h=1}^{n} c_{hh} = \sum_{h=1}^{n} \lambda_h \geq \lambda_{max} \quad (2.34)$$

Page 29: Fundamental Neurocomputing Concepts

29

The LMS Algorithm

From (2.32) and (2.33), determining the learning rate requires at least computing the covariance matrix of the input samples, which is difficult to achieve in practical applications. Even when it can be obtained, a fixed learning rate of this kind limits the accuracy of the result. Therefore, Robbins and Monro's root-finding algorithm introduced a learning rate that varies with time (stochastic approximation):
$$\mu(k) = \frac{K}{k} \quad (2.35)$$
where K is a very small constant. Drawback: the learning rate decreases too quickly.

Page 30: Fundamental Neurocomputing Concepts

30

The LMS Algorithm

Ideally, during learning the learning rate should start with a relatively large value and then gradually decrease (schedule-type adjustment).

Darken and Moody's search-then-converge algorithm:
$$\mu(k) = \frac{\mu_0}{1 + k/\tau} \quad (2.36)$$
Search phase: $\mu$ is relatively large and almost constant.
Converge phase: $\mu$ decreases exponentially to zero.
$\mu_0 > 0$ and $\tau \gg 1$, typically $100 \leq \tau \leq 500$.
These methods of adjusting the learning rate are commonly called learning rate schedules.
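A small sketch of the two schedules, (2.35) and (2.36); the constants K, mu0, and tau below are illustrative values only:

def mu_stochastic(k, K=0.01):
    # stochastic approximation schedule (2.35): mu(k) = K / k
    return K / k

def mu_search_then_converge(k, mu0=0.1, tau=200.0):
    # Darken and Moody search-then-converge schedule (2.36)
    return mu0 / (1.0 + k / tau)

for k in (1, 10, 100, 1000):
    print(k, mu_stochastic(k), mu_search_then_converge(k))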

Page 31: Fundamental Neurocomputing Concepts

31

The LMS Algorithm

Adaptive normalization approach (non-schedule-type): $\mu$ is adjusted according to the input data at every time step:
$$\mu(k) = \frac{\mu_0}{\|\mathbf{x}(k)\|_2^2} \quad (2.37)$$
where $\mu_0$ is a fixed constant. Stability is guaranteed if $0 < \mu_0 < 2$; the practical range is $0.1 \leq \mu_0 \leq 1$.

Page 32: Fundamental Neurocomputing Concepts

32

The LMS Algorithm

Comparison of two learning rate schedules: the stochastic approximation schedule of Eq. (2.35) and the search-then-converge schedule of Eq. (2.36), together with a constant learning rate $\mu$ for reference.

Page 33: Fundamental Neurocomputing Concepts

33

Summary of the LMS algorithm

Step 1: Set k = 1, initialize the synaptic weight vector w(k = 1), and select values for $\mu_0$ and $\tau$.
Step 2: Compute the learning rate parameter
$$\mu(k) = \frac{\mu_0}{1 + k/\tau}$$
Step 3: Compute the error
$$e(k) = d(k) - \sum_{h=1}^{n} w_h(k) x_h(k)$$
Step 4: Update the synaptic weights
$$w_i(k+1) = w_i(k) + \mu(k) e(k) x_i(k)$$
Step 5: If convergence is achieved, stop; otherwise set k = k + 1 and go to Step 2.
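A minimal NumPy sketch of Steps 1-5, assuming the search-then-converge schedule and a simple mean-squared-error threshold as the convergence test; the data and constants below are illustrative:

import numpy as np

def lms_train(X, d, mu0=0.1, tau=200.0, err_goal=1e-8, max_epochs=100):
    # X: (m, n) array of input patterns; d: (m,) desired outputs
    m, n = X.shape
    w = np.zeros(n)                          # Step 1: initialize weights, k = 1
    k = 1
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, target in zip(X, d):
            mu = mu0 / (1.0 + k / tau)       # Step 2: learning rate mu(k)
            e = target - np.dot(w, x)        # Step 3: error e(k)
            w = w + mu * e * x               # Step 4: weight update
            sq_err += 0.5 * e ** 2
            k += 1
        if sq_err / m < err_goal:            # Step 5: convergence check
            break
    return w

# Illustrative use: recover a linear model d = b^T x from noiseless data.
rng = np.random.default_rng(0)
b = np.array([1.0, 0.8, -1.0])
X = rng.normal(size=(1000, 3))
d = X @ b
print(lms_train(X, d))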

Page 34: Fundamental Neurocomputing Concepts

34

Example 2.1: Parametric system identification

The input data consist of 1000 zero-mean Gaussian random vectors with three components. The bias is set to zero. The variances of the components of x are 5, 1, and 0.5. The assumed linear model is given by b = [1, 0.8, -1]^T.

To generate the target values, the 1000 input vectors are used to form a matrix X = [x_1 x_2 ... x_1000], and the desired outputs are computed according to d = b^T X.

The progress of the learning rate parameter is shown as it is adjusted according to the search-then-converge schedule, with $\tau = 200$ and
$$C_x = \frac{1}{1000}\sum_{h=1}^{1000}\mathbf{x}_h\mathbf{x}_h^T = \frac{1}{1000}XX^T, \qquad \mu_0 = \frac{0.9}{\lambda_{max}} = 0.1936$$

The learning process was terminated when $J = \tfrac{1}{2}e^2(k) \leq 10^{-8}$.

Page 35: Fundamental Neurocomputing Concepts

35

Example 2.1 (cont.)

Parametric system identification: estimating a parameter vector associated with a dynamic model of a system, given only input/output data from the system.

The root mean square (RMS) value of the performance measure.

Page 36: Fundamental Neurocomputing Concepts

36

Adaline and Madaline

Adaline: an adaptive pattern classification network trained by the LMS algorithm.

x_0(k) = 1 is an adjustable bias (weight).

The output is bipolar (+1, -1); with a different activation function, a (0, 1) output is also possible.

$$e(k) = d(k) - v(k), \qquad \tilde{e}(k) = d(k) - y(k)$$

Page 37: Fundamental Neurocomputing Concepts

37

Adaline

Linear error: the difference between the desired output and the output of the linear combiner,
$$e(k) = d(k) - v(k)$$

Quantizer error: the difference between the desired output and the output of the symmetric hard limiter,
$$\tilde{e}(k) = d(k) - y(k)$$

Page 38: Fundamental Neurocomputing Concepts

38

Adaline

Adaline training process: The input vector x must be presented to the Adaline together with its corresponding desired output d. The synaptic weights w are adjusted dynamically according to the linear LMS algorithm. The activation function is not used during training (it is used only in the testing phase). Once the network weights have been properly adjusted, the Adaline's response can be tested with patterns that were not used for training. If the Adaline's outputs for the test inputs are highly accurate, the network is said to have generalized.

Page 39: Fundamental Neurocomputing Concepts

39

During training, the input vector x and the desired output are presented to the Adaline together. The weights are changed dynamically according to the linear LMS rule. The activation function is not used during training (it is used only in the testing phase). Once the weights have been properly adjusted, untrained patterns can be used to test the Adaline's response.

Page 40: Fundamental Neurocomputing Concepts

40

Adaline

One common application of the Adaline is the realization of a small class of logic functions, e.g. AND (with bipolar inputs):
$$y = \mathrm{AND}(x_1, x_2, \ldots, x_n) = \mathrm{sgn}\left(\sum_{j=1}^{n} x_j - (n-1)\right) = \begin{cases} 1 & \text{if all } x_j = 1 \\ -1 & \text{otherwise} \end{cases}$$

Page 41: Fundamental Neurocomputing Concepts

41

Adaline OR

$$y = \mathrm{OR}(x_1, x_2, \ldots, x_n) = \mathrm{sgn}\left(\sum_{j=1}^{n} x_j + (n-1)\right) = \begin{cases} 1 & \text{if some } x_j = 1 \\ -1 & \text{otherwise} \end{cases}$$

Page 42: Fundamental Neurocomputing Concepts

42

Adaline Majority

$$y = \mathrm{MAJ}(x_1, x_2, \ldots, x_n) = \mathrm{sgn}\left(\sum_{j=1}^{n} x_j\right) = \begin{cases} 1 & \text{if the majority of } x_j = 1 \\ -1 & \text{otherwise} \end{cases}$$
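A small sketch of these fixed-weight realizations for bipolar inputs; the bias terms follow the expressions above (np.sign returns 0 on ties, so MAJ is intended for an odd number of inputs):

import numpy as np

def adaline_and(x):
    # fires +1 only if every bipolar input x_j = +1
    n = len(x)
    return np.sign(np.sum(x) - (n - 1))

def adaline_or(x):
    # fires +1 if at least one bipolar input x_j = +1
    n = len(x)
    return np.sign(np.sum(x) + (n - 1))

def adaline_maj(x):
    # majority vote of bipolar inputs
    return np.sign(np.sum(x))

x = np.array([1, -1, 1])
print(adaline_and(x), adaline_or(x), adaline_maj(x))   # expected: -1, 1, 1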

Page 43: Fundamental Neurocomputing Concepts

43

Adaline Linear separability

The Adaline acts as a classifier which separates all possible input patterns into two categories.

The output of the linear combiner is given as

$$v(k) = w_1(k)x_1(k) + w_2(k)x_2(k) + w_0(k)$$
Setting $v(k) = 0$ gives the separating boundary
$$w_1(k)x_1(k) + w_2(k)x_2(k) + w_0(k) = 0$$
or
$$x_2(k) = -\frac{w_1(k)}{w_2(k)}x_1(k) - \frac{w_0(k)}{w_2(k)}$$

Page 44: Fundamental Neurocomputing Concepts

44

Adaline: Linear separability of the Adaline

The Adaline can only separate patterns that are linearly separable.

Page 45: Fundamental Neurocomputing Concepts

45

Adaline: Nonlinear separation problem

If the decision boundary is not a straight line, the Adaline cannot separate the patterns.

Page 46: Fundamental Neurocomputing Concepts

46

Adaline

Adaline with nonlinearly transformed inputs (polynomial discriminant function). To solve the classification problem for patterns that are not linearly separable, the inputs to the Adaline can be preprocessed with fixed nonlinearities (a polynomial discriminant function):
$$v(k) = w_0(k) + w_1(k)x_1 + w_2(k)x_1^2 + w_3(k)x_1x_2 + w_4(k)x_2^2 + w_5(k)x_2 \quad (2.45)$$

Page 47: Fundamental Neurocomputing Concepts

47

Adaline

The critical thresholding condition for this Adaline with nonlinearly transformed inputs occurs when v(k) in (2.45) is set to zero. Realizing a nonlinearly separable function (XNOR).

If the appropriate nonlinearities are chosen, the network can be trained to separate the input space into two subspaces which are not linearly separable.

Page 48: Fundamental Neurocomputing Concepts

48

Adaline (cont.): Linear error correction rules

There are two basic linear error correction rules that can be used to dynamically adjust the network weights (the change in the weights depends on the difference between the network's actual output and the desired output):
μ-LMS: the same as (2.22) and (2.29).
α-LMS: a self-normalizing version of the μ-LMS learning rule.

The α-LMS algorithm is based on the minimal-disturbance principle: when the weights are adjusted to accommodate a new pattern, the responses to previously learned patterns should be disturbed as little as possible.

μ-LMS is based on minimizing the MSE surface, whereas α-LMS updates the weights to reduce the current error:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\frac{e(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} \quad (2.46)$$
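A one-step sketch of the α-LMS update (2.46); the value of alpha below is illustrative (its usual range is discussed on the next slide):

import numpy as np

def alpha_lms_step(w, x, d, alpha=0.5):
    # self-normalizing alpha-LMS update (2.46)
    e = d - np.dot(w, x)                        # linear error e(k)
    return w + alpha * e * x / np.dot(x, x)     # normalize by ||x(k)||^2

w = np.zeros(3)
w = alpha_lms_step(w, np.array([1.0, 2.0, -1.0]), d=1.0)
print(w)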

Page 49: Fundamental Neurocomputing Concepts

49

Adaline (cont.)

Consider the change in the error for α-LMS:
$$\Delta e(k) = \left[d(k) - \mathbf{w}^T(k+1)\mathbf{x}(k)\right] - \left[d(k) - \mathbf{w}^T(k)\mathbf{x}(k)\right] = -\Delta\mathbf{w}^T(k)\mathbf{x}(k) = -\alpha e(k)\frac{\mathbf{x}^T(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} \quad (2.47)$$

From (2.47),
$$\Delta e(k) = -\alpha e(k) \quad (2.48)$$

The choice of α controls stability and speed of convergence; α is typically set in the range
$$0.1 < \alpha < 1$$

Page 50: Fundamental Neurocomputing Concepts

50

Adaline (cont.)

Detailed comparison of the α-LMS and μ-LMS rules. From (2.46),
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\frac{e(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} = \mathbf{w}(k) + \alpha\left[\frac{d(k)}{\|\mathbf{x}(k)\|_2} - \mathbf{w}^T(k)\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2}\right]\frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2} \quad (2.49)$$

Define the normalized desired response and the normalized training vector
$$\hat{d}(k) = \frac{d(k)}{\|\mathbf{x}(k)\|_2}, \qquad \hat{\mathbf{x}}(k) = \frac{\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2} \quad (2.50\text{-}51)$$

Eq. (2.49) can then be rewritten as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\left[\hat{d}(k) - \mathbf{w}^T(k)\hat{\mathbf{x}}(k)\right]\hat{\mathbf{x}}(k) \quad (2.52)$$

This has the same form as μ-LMS, so α-LMS can be viewed as μ-LMS applied to normalized input patterns.

Page 51: Fundamental Neurocomputing Concepts

51

Multiple Adaline (Madaline)

A single Adaline cannot solve problems that require a nonlinear separation of the input space. Multiple Adalines can be used instead: the Multiple Adaline (Madaline).

Madaline I: single-layer network with a single output.
Madaline II: multi-layer network with multiple outputs.

Page 52: Fundamental Neurocomputing Concepts

52

Example of Madaline I network consisting of three Adalines

The output unit may be OR, AND, or MAJ.

Page 53: Fundamental Neurocomputing Concepts

53

Two-layer Madaline II architecture

Page 54: Fundamental Neurocomputing Concepts

54

Madaline I realization of an XNOR logic function

Page 55: Fundamental Neurocomputing Concepts

55

Multiple Adaline (Madaline)

For the first Adaline,
$$v_1(k) = w_{11}(k)x_1(k) + w_{12}(k)x_2(k) + w_{10}(k)x_0(k), \qquad x_0(k) = 1$$
Setting $v_1(k) = 0$,
$$w_{11}(k)x_1(k) + w_{12}(k)x_2(k) + w_{10}(k) = 0$$
Dividing through by $w_{12}(k)$ and rearranging gives
$$x_2(k) = -\frac{w_{11}(k)}{w_{12}(k)}x_1(k) - \frac{w_{10}(k)}{w_{12}(k)}$$

Page 56: Fundamental Neurocomputing Concepts

56

Multiple Adaline (Madaline)

Similarly, for the second Adaline,
$$v_2(k) = w_{21}(k)x_1(k) + w_{22}(k)x_2(k) + w_{20}(k)x_0(k), \qquad x_0(k) = 1$$
Setting $v_2(k) = 0$ and dividing through by $w_{22}(k)$ gives
$$x_2(k) = -\frac{w_{21}(k)}{w_{22}(k)}x_1(k) - \frac{w_{20}(k)}{w_{22}(k)}$$

Page 57: Fundamental Neurocomputing Concepts

57

Madaline I separation properties for the XNOR problem

Page 58: Fundamental Neurocomputing Concepts

58

Madaline Learning Strategies

There are two learning strategies for the Madaline.

Madaline rule I (MRI), for Madaline I: the basic idea is to adjust the weights of the neuron whose linear output v_j(k) is closest to zero. MRI follows the minimal-disturbance principle.

Madaline rule II (MRII), for Madaline II: the weights are initially set to small random values, and the training patterns are presented in a random fashion with the objective of minimizing the average Hamming error over the training set.

Page 59: Fundamental Neurocomputing Concepts

59

Simple Perceptron

The simple perceptron (single-layer perceptron) is very similar to the Adaline; it was proposed by Frank Rosenblatt in the 1950s.

Minsky and Papert identified a serious limitation: the perceptron cannot solve the XOR problem. With an appropriate processing layer, the XOR problem, and more generally the parity function problem, can be solved.

The simple perceptron is related to the classical maximum-likelihood Gaussian pattern classifier; both can be viewed as linear classifiers.

Most perceptron training is supervised, but some is self-organizing.

Page 60: Fundamental Neurocomputing Concepts

60

Simple Perceptron (cont.)

In Rosenblatt's early work, the perceptron had three layers: a sensory surface (retina), an association area (A units), and a response unit (R unit).

It was not allowed to have more than one R unit on at a time (winner-take-all).

Page 61: Fundamental Neurocomputing Concepts

61

Simple Perceptron (cont.)

Rosenblatt's original perceptron: binary inputs, no bias.

Modified perceptron: bipolar inputs and a bias term; output y ∈ {-1, 1}.

Page 62: Fundamental Neurocomputing Concepts

62

Simple Perceptron (cont.)

The quantizer error is used to adjust the synaptic weights of the neuron. The adaptive algorithm for adjusting the neuron weights (the perceptron learning rule) is given as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{\alpha}{2}\tilde{e}(k)\mathbf{x}(k) \quad (2.55)$$
where
$$\tilde{e}(k) = d(k) - y(k) = d(k) - \mathrm{sgn}\left(\mathbf{w}^T(k)\mathbf{x}(k)\right) \quad (2.56)$$

Rosenblatt normally set α to unity. The choice of the learning rate α does not affect the numerical stability of the perceptron learning rule, but it can affect the speed of convergence. Compare with (2.46).
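A minimal sketch of the rule (2.55)-(2.56), assuming bipolar targets d in {-1, +1} and a bias folded into the weight vector via a constant input x_0 = 1; the toy data are illustrative:

import numpy as np

def perceptron_step(w, x, d, alpha=1.0):
    # one update of the perceptron learning rule (2.55)-(2.56)
    y = np.sign(np.dot(w, x))            # quantized output
    e_tilde = d - y                      # quantizer error
    return w + (alpha / 2.0) * e_tilde * x

# One pass over a small linearly separable set (first column is the bias input x_0 = 1).
X = np.array([[1.0, 2.0, 1.0], [1.0, -1.0, -2.0], [1.0, 1.5, 0.5], [1.0, -2.0, -0.5]])
d = np.array([1.0, -1.0, 1.0, -1.0])
w = np.zeros(3)
for x, t in zip(X, d):
    w = perceptron_step(w, x, t)
print(w)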

Page 63: Fundamental Neurocomputing Concepts

63

Simple Perceptron (cont.)

The perceptron learning rule is considered a nonlinear algorithm.

The perceptron learning rule updates the weights until all the input patterns are classified correctly. The quantizer error is then zero for all training pattern inputs, and no further weight adjustments occur. The weights are not guaranteed to be optimal.

Page 64: Fundamental Neurocomputing Concepts

64

Simple Perceptron

Mays's perceptron learning rules. Mays proposed two modifications to the standard perceptron learning rule: the increment adaptation algorithm and the modified relaxation algorithm.

Increment adaptation algorithm:
$$\mathbf{w}(k+1) = \begin{cases} \mathbf{w}(k) + \alpha\dfrac{\tilde{e}(k)\mathbf{x}(k)}{2\|\mathbf{x}(k)\|_2^2} & \text{if } |v(k)| \geq \gamma \\[2mm] \mathbf{w}(k) + \alpha\dfrac{d(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} & \text{if } |v(k)| < \gamma \end{cases} \quad (2.57)$$

Modified relaxation algorithm:
$$\mathbf{w}(k+1) = \begin{cases} \mathbf{w}(k) & \text{if } \tilde{e}(k) = 0 \text{ and } |v(k)| \geq \gamma \\[2mm] \mathbf{w}(k) + \alpha\dfrac{e(k)\mathbf{x}(k)}{\|\mathbf{x}(k)\|_2^2} & \text{otherwise} \end{cases} \quad (2.58)$$
where $\gamma$ is the dead-zone parameter.

Page 65: Fundamental Neurocomputing Concepts

65

Simple Perceptron with a Sigmoid Activation Function

The learning rule is based on the method of steepest descent and attempts to minimize an instantaneous performance function.

Page 66: Fundamental Neurocomputing Concepts

66

Simple Perceptron with a Sigmoid Activation Function (cont.)

The learning algorithm can be derived from the MSE criterion
$$J(\mathbf{w}_q) = \tfrac{1}{2}E\left[\tilde{e}_q^2(k)\right], \qquad \tilde{e}_q(k) = d_q(k) - y_q(k) \quad (2.59)$$

The instantaneous performance function to be minimized is given as
$$J(\mathbf{w}_q) = \tfrac{1}{2}\tilde{e}_q^2(k) = \tfrac{1}{2}\left[d_q(k) - y_q(k)\right]^2 = \tfrac{1}{2}\left[d_q^2(k) - 2d_q(k)y_q(k) + y_q^2(k)\right] \quad (2.60)$$
where
$$y_q(k) = f(v_q(k)) = f\left(\mathbf{x}^T(k)\mathbf{w}_q(k)\right) \quad (2.61)$$

Page 67: Fundamental Neurocomputing Concepts

67

Simple Perceptron with a Sigmoid Activation Function (cont.)

Assume the activation function is the hyperbolic tangent sigmoid; the neuron output can then be expressed as
$$y_q(k) = f_{hts}(v_q(k)) = \tanh(v_q(k)) \quad (2.62)$$

From (2.15), the derivative of the hyperbolic tangent sigmoid is
$$g(v_q(k)) = f'(v_q(k)) = 1 - f^2(v_q(k)) \quad (2.63)$$

Using steepest descent, the discrete-time learning rule is (cf. Eq. 2.29)
$$\mathbf{w}_q(k+1) = \mathbf{w}_q(k) - \mu\nabla_{w_q}J(\mathbf{w}_q) \quad (2.64)$$

Page 68: Fundamental Neurocomputing Concepts

68

Simple Perceptron with a Sigmoid Activation Function (cont.)

Computing the gradient in (2.64):
$$\nabla_{w_q}J(\mathbf{w}_q) = -\left[d_q(k) - f(v_q(k))\right]f'(v_q(k))\mathbf{x}(k) = -\tilde{e}_q(k)f'(v_q(k))\mathbf{x}(k) \quad (2.65)$$

Substituting (2.63) into (2.65):
$$\nabla_{w_q}J(\mathbf{w}_q) = -\tilde{e}_q(k)\left[1 - y_q^2(k)\right]\mathbf{x}(k) \quad (2.66)$$

Using the gradient in (2.66), the discrete-time learning rule for the simple perceptron can be written as
$$\mathbf{w}_q(k+1) = \mathbf{w}_q(k) + \mu\tilde{e}_q(k)\left[1 - y_q^2(k)\right]\mathbf{x}(k) \quad (2.67)$$

Page 69: Fundamental Neurocomputing Concepts

69

Simple Perceptron with a Sigmoid Activation Function (cont.)

Equation (2.67) can be rewritten in scalar form as
$$w_{qj}(k+1) = w_{qj}(k) + \mu\tilde{e}_q(k)\left[1 - y_q^2(k)\right]x_j(k) \quad (2.68)$$
where
$$\tilde{e}_q(k) = d_q(k) - y_q(k) \quad (2.69)$$
$$y_q(k) = f(v_q(k)) = f\left(\sum_{j=1}^{n} w_{qj}(k)x_j(k)\right) \quad (2.70)$$

Equations (2.68), (2.69), and (2.70) are the standard form of the backpropagation training algorithm.
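A minimal sketch of one training step of (2.68)-(2.70) for a single tanh neuron; the learning rate mu, the input vector, and the target value are illustrative:

import numpy as np

def sigmoid_perceptron_step(w, x, d, mu=0.25):
    # one step of (2.68)-(2.70) with a hyperbolic tangent activation
    y = np.tanh(np.dot(w, x))                     # (2.70) neuron output
    e_tilde = d - y                               # (2.69) error
    return w + mu * e_tilde * (1.0 - y ** 2) * x  # (2.68) weight update

w = np.zeros(4)
x = np.array([1.0, 0.2, -0.5, 0.7])               # illustrative input (x_0 = 1 bias)
for _ in range(50):
    w = sigmoid_perceptron_step(w, x, d=0.5)
print(np.tanh(w @ x))                             # output approaches the target 0.5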

Page 70: Fundamental Neurocomputing Concepts

70

Example 2.2

The structure of Figure 2.30 is used to learn the character "E". The character image is a 5x5 array of 25 pixels (stored in column-major order). The learning rule is (2.67), with slope parameter α = 1 and learning rate μ = 0.25. The desired neuron response is d = 0.5, with an error goal of 10^-8. The initial weights of the neuron were randomized. After 39 presentations of the training pattern, the actual neuron output is y = 0.50009 (see Fig. 2.32).

Page 71: Fundamental Neurocomputing Concepts

71

Example 2.2 (cont.)

The single neuron cannot correct for a noisy input: for Fig. 2.31(b), y = 0.5204; for Fig. 2.31(c), y = 0.6805.

To compensate for noisy inputs, use a multi-layer perceptron or a Hopfield associative memory.

Page 72: Fundamental Neurocomputing Concepts

72

Feedforward Multilayer Perceptron

Multilayer perceptron (MLP): the branches can only broadcast information in one direction. The synaptic weights can be adjusted according to a defined learning rule. An h-p-m feedforward MLP neural network. In general there can be any number of hidden layers in the architecture; however, from a practical perspective, only one or two hidden layers are used.

Page 73: Fundamental Neurocomputing Concepts

73

Feedforward Multilayer Perceptron (cont.)

Page 74: Fundamental Neurocomputing Concepts

74

Feedforward Multilayer Perceptron (cont.)

The first layer has the weight matrix $W^{(1)} = [w_{ji}^{(1)}] \in \mathbb{R}^{h \times n}$.
The second layer has the weight matrix $W^{(2)} = [w_{rj}^{(2)}] \in \mathbb{R}^{p \times h}$.
The third layer has the weight matrix $W^{(3)} = [w_{sr}^{(3)}] \in \mathbb{R}^{m \times p}$.

Define a diagonal nonlinear operator matrix
$$\Gamma[\cdot] = \mathrm{diag}\left[f(\cdot), f(\cdot), \ldots, f(\cdot)\right] \quad (2.71)$$

Page 75: Fundamental Neurocomputing Concepts

75

Feedforward Multilayer Perceptron (cont.)

The output of the first layer can be written as
$$\mathbf{x}_{out}^{(1)} = \Gamma^{(1)}\left[\mathbf{v}^{(1)}\right] = \Gamma^{(1)}\left[W^{(1)}\mathbf{x}\right] \quad (2.72)$$

The output of the second layer can be written as
$$\mathbf{x}_{out}^{(2)} = \Gamma^{(2)}\left[\mathbf{v}^{(2)}\right] = \Gamma^{(2)}\left[W^{(2)}\mathbf{x}_{out}^{(1)}\right] \quad (2.73)$$

The output of the third layer can be written as
$$\mathbf{x}_{out}^{(3)} = \Gamma^{(3)}\left[\mathbf{v}^{(3)}\right] = \Gamma^{(3)}\left[W^{(3)}\mathbf{x}_{out}^{(2)}\right] \quad (2.74)$$

Substituting (2.72) into (2.73) and then into (2.74), the final output is
$$\mathbf{y} = \Gamma^{(3)}\left[W^{(3)}\Gamma^{(2)}\left[W^{(2)}\Gamma^{(1)}\left[W^{(1)}\mathbf{x}\right]\right]\right] \quad (2.75)$$

Here the synaptic weights are fixed; a training process must be carried out a priori to properly adjust the weights.
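A minimal sketch of the forward pass (2.72)-(2.75) for an h-p-m network, assuming tanh nonlinearities inside each Γ operator and illustrative layer sizes with random weights:

import numpy as np

def mlp_forward(x, W1, W2, W3, f=np.tanh):
    # three-layer feedforward pass: y = Gamma3[W3 Gamma2[W2 Gamma1[W1 x]]]
    x1 = f(W1 @ x)       # (2.72) first-layer output
    x2 = f(W2 @ x1)      # (2.73) second-layer output
    return f(W3 @ x2)    # (2.74)-(2.75) network output

rng = np.random.default_rng(0)
n, h, p, m = 4, 5, 3, 2                    # input dimension and layer sizes
W1 = rng.normal(size=(h, n))
W2 = rng.normal(size=(p, h))
W3 = rng.normal(size=(m, p))
print(mlp_forward(rng.normal(size=n), W1, W2, W3))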

Page 76: Fundamental Neurocomputing Concepts

76

Overview of Basic Learning Rules for a Single Neuron: Generalized LMS Learning Rule

Define a performance function (energy function) to be minimized:
$$E(\mathbf{w}) = \phi(e) + \frac{\gamma}{2}\|\mathbf{w}\|_2^2 \quad (2.76)$$
where $\|\mathbf{w}\|_2$ is the Euclidean norm of the weight vector $\mathbf{w}$, $\phi(\cdot)$ is any differentiable function, and e is the linear error,
$$e = d - \mathbf{w}^T\mathbf{x} \quad (2.77)$$
with d the desired output, $\mathbf{w}$ the weight vector, and $\mathbf{x}$ the input vector.

Page 77: Fundamental Neurocomputing Concepts

77

Generalized LMS Learning Rule (cont.)

Using the steepest descent approach, the general LMS algorithm is obtained.

Continuous-time learning rule (a vector derivative):
$$\frac{d\mathbf{w}}{dt} = -\mu\nabla_w E(\mathbf{w}) \quad (2.78)$$
$$\frac{d\mathbf{w}}{dt} = \mu\left[g(e)\mathbf{x} - \gamma\mathbf{w}\right] \quad (2.79)$$

Discrete-time learning rule:
$$\mathbf{w}(k+1) = \mathbf{w}(k) - \mu\nabla_w E(\mathbf{w}) \quad (2.81)$$
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[g(e(k))\mathbf{x}(k) - \gamma\mathbf{w}(k)\right] \quad (2.82)$$
where $\mu$ is the learning rate and $\gamma$ is the leakage factor.

If $\phi(t) = \tfrac{1}{2}t^2$ and $\phi'(t) = g(t) = t$, then the learning rule can be written as
$$\frac{d\mathbf{w}}{dt} = \mu\left[e\mathbf{x} - \gamma\mathbf{w}\right] = \mu e\mathbf{x} - \mu\gamma\mathbf{w} \quad (2.83)$$

Page 78: Fundamental Neurocomputing Concepts

78

Generalized LMS Learning Rule (cont.)

Leaky LMS algorithm ($0 \leq \gamma < 1$):
$$\mathbf{w}(k+1) = (1 - \mu\gamma)\mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \quad (2.84)$$

Standard LMS algorithm ($\gamma = 0$):
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) \quad (2.85)$$

The scalar form of the standard LMS algorithm:
$$w_j(k+1) = w_j(k) + \mu e(k)x_j(k), \qquad j = 0, 1, 2, \ldots, n \quad (2.86)$$
$$e(k) = d(k) - \sum_{j=1}^{n} w_j(k)x_j(k)$$

Page 79: Fundamental Neurocomputing Concepts

79

Generalized LMS Learning Rule (cont.)

There are three important variations of the standard LMS algorithm.

Momentum is designed to give the weight-vector change a certain inertia in the direction of the average downhill force. It can be defined through the difference between the current weight w(k) and the previous weight w(k-1):
$$\Delta\mathbf{w}(k) = \mathbf{w}(k) - \mathbf{w}(k-1) \quad (2.87)$$

Therefore (2.85) can be rewritten as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu e(k)\mathbf{x}(k) + \eta\left[\mathbf{w}(k) - \mathbf{w}(k-1)\right] \quad (2.88)$$
where $0 < \eta < 1$ is the momentum parameter. This is the standard LMS algorithm with momentum.

Page 80: Fundamental Neurocomputing Concepts

80

Generalized LMS Learning Rule (cont.)

Recursive weighted least-squares involves an update expression for the parameter vector, an update expression for the gain vector, and an update expression for the weighting matrix.

The update expression for the parameter vector $\mathbf{w} \in \mathbb{R}^{(n+1)\times 1}$ is
$$\mathbf{w}(k+1) = \mathbf{w}(k) + L(k+1)e(k) \quad (2.89)$$

The update expression for the gain vector $L \in \mathbb{R}^{(n+1)\times 1}$ is
$$L(k+1) = \frac{P(k)\mathbf{x}(k)}{1 + \mathbf{x}^T(k)P(k)\mathbf{x}(k)} \quad (2.90)$$

The update expression for the weighting matrix $P \in \mathbb{R}^{(n+1)\times(n+1)}$ is
$$P(k+1) = P(k) - L(k+1)\mathbf{x}^T(k)P(k) \quad (2.91)$$

Page 81: Fundamental Neurocomputing Concepts

81

Generalized LMS Learning Rule (cont.)

where the error term e(k) is given by
$$e(k) = d(k) - \mathbf{w}^T(k)\mathbf{x}(k) \quad (2.92)$$

Substituting (2.90) into (2.89) gives
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \frac{e(k)P(k)\mathbf{x}(k)}{1 + \mathbf{x}^T(k)P(k)\mathbf{x}(k)} \quad (2.93)$$

Therefore the updated synaptic weight vector can be expressed as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \gamma(k)e(k)P(k)\mathbf{x}(k) \quad (2.94)$$
where
$$\gamma(k) = \frac{1}{1 + \mathbf{x}^T(k)P(k)\mathbf{x}(k)} \quad (2.95)$$

Page 82: Fundamental Neurocomputing Concepts

82

Generalized LMS Learning Rule (cont.)

Substituting (2.90) into (2.91) gives the weighting matrix
$$P(k+1) = P(k) - \frac{P(k)\mathbf{x}(k)\mathbf{x}^T(k)P(k)}{1 + \mathbf{x}^T(k)P(k)\mathbf{x}(k)} = P(k) - \gamma(k)P(k)\mathbf{x}(k)\mathbf{x}^T(k)P(k) \quad (2.96)$$

The update expression for the weighting matrix is written as
$$P(k+1) = \left[I - \gamma(k)P(k)\mathbf{x}(k)\mathbf{x}^T(k)\right]P(k) \quad (2.97)$$
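A minimal sketch of one recursive weighted least-squares update, following (2.92)-(2.97); the initialization of P as a large multiple of the identity is an assumption for illustration, not something stated on the slides:

import numpy as np

def rwls_step(w, P, x, d):
    # one RWLS update of the weight vector w and the weighting matrix P
    e = d - np.dot(w, x)                                        # (2.92) error
    gamma = 1.0 / (1.0 + x @ P @ x)                             # (2.95) scalar gain
    w_new = w + gamma * e * (P @ x)                             # (2.94) weight update
    P_new = (np.eye(len(x)) - gamma * np.outer(P @ x, x)) @ P   # (2.96)-(2.97)
    return w_new, P_new

n = 3
w, P = np.zeros(n), 1e3 * np.eye(n)       # assumed initialization
w, P = rwls_step(w, P, np.array([1.0, 0.5, -0.2]), d=0.7)
print(w)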

Page 83: Fundamental Neurocomputing Concepts

83

Generalized LMS Learning Rule (cont.)

Minimal disturbance principle: modified normalized LMS. A positive constant $\varepsilon$ is added to the denominator of (2.46) to ensure the weight update cannot become unbounded:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \alpha\frac{e(k)\mathbf{x}(k)}{\varepsilon + \|\mathbf{x}(k)\|_2^2} \quad (2.98)$$

Page 84: Fundamental Neurocomputing Concepts

84

Example 2.3

The same as Example 2.1, but using different LMS algorithms.

The same initial weight vector, initial learning rate, and termination criterion are used.

Page 85: Fundamental Neurocomputing Concepts

85

Overview of basic learning rules for a single neuron: Hebbian Learning

[Donald Hebb] The strength of a synapse between cells A and B is increased slightly when firing in A is followed by firing in B with a very small time delay. For two neurons on either side of a synapse that are synchronously activated, the strength of the synapse is increased.

[Stent] expanded Hebb's original statement to include the case in which two neurons on either side of a synapse are asynchronously activated, leading to a weakened synapse.

[Rumelhart] Adjust the strength of the connection between units A and B in proportion to the product of their simultaneous activation. If the product of the activations is positive, the modification to the synaptic connection is more excitatory; if the product is negative, the modification is more inhibitory.

Page 86: Fundamental Neurocomputing Concepts

86

Overview of basic learning rules for a single neuron (cont.)

Hebbian synapse: uses a highly local, time-dependent, and strongly interactive mechanism to increase synaptic efficiency as a function of the correlation between the presynaptic and postsynaptic activity levels.

Page 87: Fundamental Neurocomputing Concepts

87

Overview of basic learning rules for a single neuron (cont.)

Four key properties of a Hebbian synapse:

Time-dependent mechanism: the changes in a Hebbian synapse depend on the precise time of occurrence of the presynaptic and postsynaptic activity levels.

Local mechanism: within a synapse, the ongoing activity levels in the presynaptic and postsynaptic units are used by a Hebbian synapse to produce an input-dependent, local synaptic modification.

Interactive mechanism: any form of Hebbian learning depends on the interaction between presynaptic and postsynaptic activities.

Conjunctional (correlational) mechanism: the co-occurrence of presynaptic and postsynaptic activities within a relatively short time interval is sufficient to produce a synaptic modification.

Page 88: Fundamental Neurocomputing Concepts

88

Overview of basic learning rules for a single neuron (cont.)

Synaptic activities can be categorized as:

Hebbian: a Hebbian synapse increases its strength when the presynaptic and postsynaptic activities are positively correlated, and decreases its strength when the activities are either uncorrelated or negatively correlated.

Anti-Hebbian: an anti-Hebbian synapse strengthens negatively correlated presynaptic and postsynaptic activities and weakens positively correlated activities.

Non-Hebbian: a non-Hebbian synapse does not involve the strongly interactive, highly local, time-dependent mechanism.

Page 89: Fundamental Neurocomputing Concepts

89

Overview of basic learning rules for a single neuron (cont.)

Standard Hebbian learning for a single neuron.

Page 90: Fundamental Neurocomputing Concepts

90

Overview of basic learning rules for a single neuron (cont.)

The standard Hebbian learning rule for a single neuron can be derived from an energy function defined as
$$E(\mathbf{w}) = -\phi\left(\mathbf{w}^T\mathbf{x}\right) + \frac{\gamma}{2}\|\mathbf{w}\|_2^2 \quad (2.99)$$

The output of the neuron is
$$y = \frac{d\phi(v)}{dv} = f(v), \qquad v = \mathbf{w}^T\mathbf{x} \quad (2.100)$$

Taking the steepest descent approach, the continuous-time learning rule is
$$\frac{d\mathbf{w}}{dt} = -\mu\nabla_w E(\mathbf{w}) \quad (2.101)$$

Page 91: Fundamental Neurocomputing Concepts

91

Overview of basic learning rules for a single neuron (cont.)

The gradient of (2.99) is given as
$$\nabla_w E(\mathbf{w}) = -f(v)\mathbf{x} + \gamma\mathbf{w} = -y\mathbf{x} + \gamma\mathbf{w} \quad (2.102)$$

Using (2.102) and (2.101), the continuous-time standard Hebbian learning rule for a single neuron is given as
$$\frac{d\mathbf{w}}{dt} = \mu\left(y\mathbf{x} - \gamma\mathbf{w}\right) \quad (2.103)$$

The discrete-time standard Hebbian learning rule is
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[y(k)\mathbf{x}(k) - \gamma\mathbf{w}(k)\right] \quad (2.104)$$

The scalar discrete-time form is
$$w_j(k+1) = w_j(k) + \mu\left[y(k)x_j(k) - \gamma w_j(k)\right] \quad (2.105)$$
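A minimal sketch of the discrete-time rule (2.104); the learning rate mu, decay factor gamma, and the tanh output nonlinearity are illustrative assumptions:

import numpy as np

def hebbian_step(w, x, mu=0.01, gamma=0.1):
    # standard Hebbian update (2.104): w <- w + mu*(y*x - gamma*w)
    y = np.tanh(np.dot(w, x))            # neuron output y = f(v); tanh assumed here
    return w + mu * (y * x - gamma * w)

w = np.array([0.1, -0.2, 0.05])
for _ in range(100):
    w = hebbian_step(w, np.array([1.0, 0.5, -0.5]))
print(w)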

Page 92: Fundamental Neurocomputing Concepts

92

Overview of basic learning rules for a single neuron (cont.) Generalized Hebbian learning rule

It can be considered a gradient optimization process when an appropriate energy or Lyapunov function is selected:
$$\frac{d\mathbf{w}}{dt} = -\mu\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} \quad (2.106)$$

The resulting generalized Hebbian learning rule is given as
$$\frac{d\mathbf{w}}{dt} = \mu\left[\zeta(\cdot)\mathbf{x} - \gamma\mathbf{w}\right] \quad (2.107)$$
where $\zeta(\cdot)$ is the learning signal.

The discrete-time form is
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[\zeta(k)\mathbf{x}(k) - \gamma\mathbf{w}(k)\right] = (1 - \mu\gamma)\mathbf{w}(k) + \mu\zeta(k)\mathbf{x}(k) \quad (2.108)$$

Page 93: Fundamental Neurocomputing Concepts

93

Overview of basic learning rules for a single neuron (cont.)

Assume that the learning signal is the output of the neuron

Therefore, (2.107) becomes

$$\zeta = y = \frac{d\phi(v)}{dv} = f(v) \quad (2.109)$$
$$\frac{d\mathbf{w}}{dt} = \mu\left(y\mathbf{x} - \gamma\mathbf{w}\right) \quad (2.110)$$

Page 94: Fundamental Neurocomputing Concepts

94

Overview of basic learning rules for a single neuron (cont.): Oja's Learning Rule

Oja's learning rule can be derived by minimizing the following energy function:
$$E(\mathbf{w}) = \frac{1}{2}\|\hat{\mathbf{e}}\|_2^2, \qquad \hat{\mathbf{e}} = \mathbf{x} - \hat{\mathbf{x}} = \mathbf{x} - y\mathbf{w} \quad (2.111)$$

Two basic assumptions are made: the neuron's weight vector is normalized, $\|\mathbf{w}\|_2 = 1$, and the neuron uses a linear activation function, $y = \mathbf{w}^T\mathbf{x}$ (2.112).

Therefore the energy function can be rewritten as
$$E(\mathbf{w}) = \frac{1}{2}\|\mathbf{x} - y\mathbf{w}\|_2^2 = \frac{1}{2}\left(\mathbf{x}^T\mathbf{x} - 2y\mathbf{w}^T\mathbf{x} + y^2\mathbf{w}^T\mathbf{w}\right) = \frac{1}{2}\|\mathbf{x}\|_2^2 - y^2 + \frac{1}{2}y^2\|\mathbf{w}\|_2^2 \quad (2.114)$$

Page 95: Fundamental Neurocomputing Concepts

95

Overview of basic learning rules for a single neuron (cont.)

Using the steepest descent method, Oja's continuous-time learning rule can be written as
$$\frac{d\mathbf{w}}{dt} = -\mu\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} \quad (2.115)$$
$$\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} = -\left(y\mathbf{x} - y^2\mathbf{w}\right) \quad (2.116)$$

Substituting (2.116) into (2.115) gives Oja's continuous-time learning rule
$$\frac{d\mathbf{w}}{dt} = \mu\left(y\mathbf{x} - y^2\mathbf{w}\right), \qquad y = v = \mathbf{w}^T\mathbf{x} \quad (2.117)$$
where $\mu > 0$ is the learning rate. The term $y\mathbf{x}$ is the Hebbian co-occurrence term and $y^2\mathbf{w}$ is the active decay term.

Page 96: Fundamental Neurocomputing Concepts

96

Overview of basic learning rules for a single neuron (cont.)

Equation (2.117) can be rewritten in discrete-time form:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu y(k)\left[\mathbf{x}(k) - y(k)\mathbf{w}(k)\right] \quad (2.119)$$

The scalar form of (2.119) can be written as
$$w_j(k+1) = w_j(k) + \mu y(k)\left[x_j(k) - y(k)w_j(k)\right] \quad (2.120)$$

Typical (simple) form of Hebbian learning:
$$w_j(k+1) = w_j(k) + \mu y(k)x_j(k) \quad (2.121)$$
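A minimal sketch of the discrete-time Oja rule (2.119) with a linear output y = w^T x; the anisotropic random data are illustrative, and the final weight vector approaches the dominant principal direction of the inputs (a well-known property of Oja's rule, illustrated rather than proven here):

import numpy as np

def oja_step(w, x, mu=0.01):
    # Oja's rule (2.119): w <- w + mu*y*(x - y*w), with linear output y = w^T x
    y = np.dot(w, x)
    return w + mu * y * (x - y * w)

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.5])   # largest variance on the first axis
w = rng.normal(size=3)
for x in X:
    w = oja_step(w, x)
print(w / np.linalg.norm(w))      # close to +/- [1, 0, 0]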

Page 97: Fundamental Neurocomputing Concepts

97

Example 2.4

Page 98: Fundamental Neurocomputing Concepts

98

Overview of basic learning rules for a single neuron (cont.) Potential Learning Rule

No desired signal is required, so this is a form of unsupervised learning. The learning is performed on the activity level of the neuron.

The potential learning rule can be derived by minimizing the following energy function:
$$E(\mathbf{w}) = -\mathcal{L}\left(\mathbf{w}^T\mathbf{x}\right) + \frac{\gamma}{2}\|\mathbf{w}\|_2^2 \quad (2.122)$$
where $v = \mathbf{w}^T\mathbf{x}$, $\mathcal{L}(\cdot)$ is the loss function, and $\gamma > 0$.

Page 99: Fundamental Neurocomputing Concepts

99

Overview of basic learning rules for a single neuron (cont.)

The gradient of (2.122) is given as
$$\nabla_w E(\mathbf{w}) = -\varphi(v)\mathbf{x} + \gamma\mathbf{w} \quad (2.123)$$

The learning signal is
$$\varphi(v) = \frac{d\mathcal{L}(v)}{dv} \quad (2.124)$$

Using Amari's result in (2.106) and the gradient in (2.123), the continuous-time potential learning rule is
$$\frac{d\mathbf{w}}{dt} = \mu\left[\varphi(v)\mathbf{x} - \gamma\mathbf{w}\right] \quad (2.125)$$

The discrete-time form is
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[\varphi(v(k))\mathbf{x}(k) - \gamma\mathbf{w}(k)\right]$$

The discrete-time scalar form is
$$w_j(k+1) = w_j(k) + \mu\left[\varphi(v(k))x_j(k) - \gamma w_j(k)\right] \quad (2.126)$$

Page 100: Fundamental Neurocomputing Concepts

100

Overview of basic learning rules for a single neuron (cont.) Correlation Learning Rule

The correlation learning rule is obtained by minimizing the energy function
$$E(\mathbf{w}) = -d\,\mathbf{x}^T\mathbf{w} + \frac{\gamma}{2}\|\mathbf{w}\|_2^2 \quad (2.127)$$

The gradient with respect to the synaptic weight vector is
$$\nabla_w E(\mathbf{w}) = -d\,\mathbf{x} + \gamma\mathbf{w} \quad (2.128)$$

Using Amari's result (2.106) and (2.128), the continuous-time correlation learning rule is
$$\frac{d\mathbf{w}}{dt} = \mu\left(d\,\mathbf{x} - \gamma\mathbf{w}\right) \quad (2.129)$$

The discrete-time form can be written as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\left[d(k)\mathbf{x}(k) - \gamma\mathbf{w}(k)\right] \quad (2.130)$$

The discrete-time scalar form is
$$w_j(k+1) = w_j(k) + \mu\left[d(k)x_j(k) - \gamma w_j(k)\right] \quad (2.131)$$

The correlation learning rule is commonly used to record data in memory networks; if d in (2.129) is replaced by y, it becomes the Hebbian learning rule.

Page 101: Fundamental Neurocomputing Concepts

101

Overview of basic learning rules for a single neuron (cont.) Standard Perceptron Learning Rule

The standard perceptron learning rule can be derived by minimizing the MSE criterion
$$E(\mathbf{w}) = \frac{1}{2}e^2 \quad (2.132)$$
where
$$e = d - y \quad (2.133)$$
and the output of the neuron is
$$y = f\left(\mathbf{w}^T\mathbf{x}\right) = f(v)$$

Using the steepest descent approach, the continuous-time learning rule is given by
$$\frac{d\mathbf{w}}{dt} = -\mu\nabla_w E(\mathbf{w}) \quad (2.134)$$

Page 102: Fundamental Neurocomputing Concepts

102

Overview of basic learning rules for a single neuron (cont.)

The gradient of (2.132) is then
$$\nabla_w E(\mathbf{w}) = -\left[d - f(v)\right]\frac{df(v)}{dv}\mathbf{x} = -e\frac{df(v)}{dv}\mathbf{x} \quad (2.135)$$
where
$$\delta = e\frac{df(v)}{dv} = e\,f'(v) = e\,g(v) \quad (2.136)$$

Page 103: Fundamental Neurocomputing Concepts

103

Overview of basic learning rules for a single neuron (cont.)

Using (2.134), (2.135), and (2.136), the continuous-time standard perceptron learning rule for a single neuron is
$$\frac{d\mathbf{w}}{dt} = \mu\delta\mathbf{x} \quad (2.137)$$

Equation (2.137) can be rewritten in discrete-time form as
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\delta(k)\mathbf{x}(k) \quad (2.138)$$

The scalar form of (2.138) can be written as
$$w_j(k+1) = w_j(k) + \mu\delta(k)x_j(k) \quad (2.139)$$

Page 104: Fundamental Neurocomputing Concepts

104

Overview of basic learning rules for a single neuron (cont.) Generalized Perceptron Learning Rule

Page 105: Fundamental Neurocomputing Concepts

105

Overview of basic learning rules for a single neuron (cont.) Generalized Perceptron Learning Rule

When the energy function is not defined to be the MSE criterion, we can define a general energy function as
$$E(\mathbf{w}) = \phi(e) = \phi(d - y) \quad (2.140)$$
where $\phi(\cdot)$ is a differentiable function. If $\phi(e) = \tfrac{1}{2}e^2$, this reduces to the standard perceptron learning rule.

The gradient is
$$\nabla_w E(\mathbf{w}) = \frac{\partial\phi(e)}{\partial e}\frac{\partial e}{\partial y}\frac{\partial y}{\partial v}\frac{\partial v}{\partial \mathbf{w}} \quad (2.141)$$
where
$$\frac{\partial\phi(e)}{\partial e} = \phi'(e) = \psi(e) \quad (2.142)$$
$$y = f\left(\mathbf{w}^T\mathbf{x}\right) = f(v) \quad (2.143)$$

Page 106: Fundamental Neurocomputing Concepts

106

Overview of basic learning rules for a single neuron (cont.)

f(·) is a differentiable function, and
$$\frac{dy}{dv} = \frac{df(v)}{dv} = f'(v) = g(v) \quad (2.144)$$

Then (2.141) can be written as
$$\nabla_w E(\mathbf{w}) = -\psi(e)g(v)\mathbf{x} \quad (2.145)$$

The continuous-time general perceptron learning rule is given as
$$\frac{d\mathbf{w}}{dt} = \mu\psi(e)g(v)\mathbf{x} \quad (2.146)$$

If we define the learning signal as
$$\delta = \psi(e)g(v) \quad (2.147)$$
then (2.146) can be written as
$$\frac{d\mathbf{w}}{dt} = \mu\delta\mathbf{x} \quad (2.148)$$

Page 107: Fundamental Neurocomputing Concepts

107

Overview of basic learning rules for a single neuron (cont.)

Discrete-time form:
$$\mathbf{w}(k+1) = \mathbf{w}(k) + \mu\delta(k)\mathbf{x}(k) \quad (2.149)$$

Discrete scalar form:
$$w_j(k+1) = w_j(k) + \mu\delta(k)x_j(k) \quad (2.150)$$

Page 108: Fundamental Neurocomputing Concepts

108

Data Preprocessing

The performance of a neural network is strongly dependent on the preprocessing that is performed on the training data.

Scaling: the training data can be amplitude-scaled in two ways, so that the values of each pattern lie either between -1 and 1 or between 0 and 1. This is referred to as min/max scaling (MATLAB: premnmx).
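A minimal NumPy sketch of min/max scaling to [-1, 1], assuming one feature per row and one pattern per column (the same convention used below for the matrices A and C):

import numpy as np

def minmax_scale(A):
    # scale each row of A to the range [-1, 1]
    a_min = A.min(axis=1, keepdims=True)
    a_max = A.max(axis=1, keepdims=True)
    return 2.0 * (A - a_min) / (a_max - a_min) - 1.0

A = np.array([[1.0, 2.0, 3.0],
              [10.0, 20.0, 40.0]])
print(minmax_scale(A))       # each row now spans exactly [-1, 1]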

Page 109: Fundamental Neurocomputing Concepts

109

Data Preprocessing (cont.)

Another scaling process:

Mean centering: used when the training data contain biases (offsets).
Variance scaling: used when the training data are measured in different units.

Assume the input vectors are arranged as the columns of a matrix $A \in \mathbb{R}^{n \times m}$ and the target vectors as the columns of a matrix $C \in \mathbb{R}^{p \times m}$.

Mean centering: compute the mean value of each row of A and C, and subtract the corresponding mean from every element of that row.
Variance scaling: compute the standard deviation of each row of A and C, and divide every element of that row by the corresponding standard deviation.

Page 110: Fundamental Neurocomputing Concepts

110

Data Preprocessing (cont.)

Transformations: the features of certain "raw" signals, when used as training inputs, provide better results than the raw signals themselves. A front-end feature extractor can be used to discern salient or distinguishing characteristics of the data.

Four transform methods:
Fourier Transform
Principal-Component Analysis
Partial Least-Squares Regression
Wavelets and Wavelet Transforms

Page 111: Fundamental Neurocomputing Concepts

111

Data Preprocessing (cont.)

Fourier Transform: the FFT can be used to extract the important features of the data, and these dominant characteristic features can then be used to train the neural network.

Page 112: Fundamental Neurocomputing Concepts

112

Data Preprocessing (cont.)

Three signals with the same waveform but different phases; each signal has 1024 samples.

Page 113: Fundamental Neurocomputing Concepts

113

Data Preprocessing (cont.)

The three signals share the same FFT magnitude response, and only 16 magnitude samples are needed.

The three signals with the same waveform but different phases differ in their FFT phase responses.

Page 114: Fundamental Neurocomputing Concepts

114

Data Preprocessing (cont.)

Principal-Component Analysis: PCA can be used to "compress" the input training data set and reduce the dimension of the inputs by determining the important features of the data according to an assessment of their variance.

In MATLAB, prepca is provided to perform PCA on the training data.

Page 115: Fundamental Neurocomputing Concepts

115

Data Preprocessing (cont.)

Given a set of training data $A \in \mathbb{R}^{n \times m}$, where it is assumed that $m \gg n$; n denotes the dimension of the input training patterns and m denotes the number of training patterns.

Using PCA, an "optimal" orthogonal transformation matrix $W_{pca} \in \mathbb{R}^{h \times n}$ can be determined, where $h \ll n$ (the degree of dimension reduction).

The dimension of the input vectors can be reduced according to the transformation
$$A_r = W_{pca}A \quad (2.151)$$
where $A_r \in \mathbb{R}^{h \times m}$ is the reduced-dimension set of training patterns. The columns of $A_r$ are the principal components for each of the inputs from A.
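A minimal sketch of the transformation (2.151), assuming W_pca is formed from the top h eigenvectors of the sample covariance of the (column-wise) training patterns; the random data are illustrative:

import numpy as np

def pca_transform(A, h):
    # A: (n, m) matrix with m training patterns in columns; returns W_pca (h, n) and A_r (h, m)
    A_centered = A - A.mean(axis=1, keepdims=True)       # mean-center each input component
    C = (A_centered @ A_centered.T) / A.shape[1]         # (n, n) covariance estimate
    eigvals, eigvecs = np.linalg.eigh(C)                  # eigenvalues in ascending order
    W_pca = eigvecs[:, ::-1][:, :h].T                     # top-h eigenvectors as rows
    return W_pca, W_pca @ A_centered                      # (2.151) A_r = W_pca A

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 200))                             # 5-dimensional inputs, 200 patterns
W_pca, A_r = pca_transform(A, h=2)
print(A_r.shape)                                          # (2, 200)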

Page 116: Fundamental Neurocomputing Concepts

116

Data Preprocessing (cont.): Partial Least-Squares Regression

PLSR can be used to compress the input training data set. Its use is restricted to supervised trained neural networks, and only scalar target values are allowed. The factor analysis in PLSR determines the degree of compression of the input data.

After the optimal number of PLSR factors h has been determined, the weight loading vectors can be used to transform the data, similar to the PCA approach. The optimal set of weight loading vectors forms the columns of an orthogonal transformation matrix $W_{plsr} \in \mathbb{R}^{n \times h}$.

The dimension of the input vectors can be reduced according to the transformation
$$A_r = W_{plsr}^T A \quad (2.152)$$

Page 117: Fundamental Neurocomputing Concepts

117

Data Preprocessing (cont.)

PCA and PLSR orthogonal transformation vectors used for data compression.

PLSR uses both the input data and the target data to generate the weight loading vectors of the orthogonal transformation $W_{plsr}$.

Page 118: Fundamental Neurocomputing Concepts

118

Data Preprocessing (cont.)

Wavelets and Wavelet Transforms: A wave is an oscillating function of time. Fourier analysis is used for analyzing waves: certain functions can be expanded in terms of sinusoidal waves, showing how much of each frequency component is required to synthesize the signal. It is very useful for periodic, time-invariant, stationary signal analysis.

A wavelet can be considered a small wave whose energy is concentrated. Wavelets are useful for analyzing signals that are time-varying, transient, and nonstationary, allowing simultaneous time and frequency analysis. Wavelets are local waves. The wavelet transform can provide a time-frequency description of signals and can be used to compress data for training a neural network.