Neural Networks Review

Transcript of "Neural Networks Review" (Machine Learning 10-601, Fall 2012 Recitation)

Page 1

Neural Networks Review

Machine Learning 10-601 Fall 2012 Recitation

October 2 & 3, 2012 Petia Georgieva

Page 2

Outline

• Biological neuron networks

• Artificial neuron model (perceptron)

• Activation functions

• Gradient Descent

• BackPropagation

Page 3

Biological neuron networks

(figures: bird cerebellum; human hippocampus; human cortex. Source: http://en.wikipedia.org/wiki/Neuron)

Page 4

Rat neocortex

BlueBrainProject: http://mediatheque.epfl.ch/sv/Blue_brain

Page 5

Artificial neuron model (McCulloch and Pitts, 1943)

$g : \mathbb{R}^n \to \mathbb{R}$

$\mathbf{x} = [x_1, x_2, \ldots, x_n]$ - Neuron Input Vector

$s = \sum_{i=0}^{n} w_i x_i$, with $x_0 = 1$ (bias)

$y = f(s)$ - Neuron Scalar Output

$f(\cdot)$ - Activation Function
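As a concrete sketch (not from the slides) of this neuron model in Python; the function name and the example weights are assumptions:

    import numpy as np

    def neuron_output(w, x, f):
        """McCulloch-Pitts-style neuron: y = f(sum_{i=0}^{n} w_i x_i), x_0 = 1 (bias)."""
        x = np.concatenate(([1.0], x))   # prepend the constant bias input x_0 = 1
        return f(np.dot(w, x))           # s = w . x, then the activation f

    # Example with hypothetical weights: logical AND with a step activation
    step = lambda s: 1.0 if s >= 0 else 0.0
    w = np.array([-1.5, 1.0, 1.0])       # w_0 (bias), w_1, w_2
    print(neuron_output(w, np.array([1.0, 1.0]), step))   # 1.0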

Page 6

Biological analogy

Page 7

Activation functions

• Linear: $f(s) = \beta s$

• Step (Heaviside): $f(s) = \begin{cases} \beta_1, & s \ge 0 \\ \beta_2, & s < 0 \end{cases}$, normally $\beta_1 = 1$, $\beta_2 = 0$ (or $-1$)

• Ramp: $f(s) = \begin{cases} \beta, & s > \beta \\ s, & |s| \le \beta \\ -\beta, & s < -\beta \end{cases}$

• Sigmoid: $f(s) = \frac{1}{1 + \exp(-\lambda s)}$

• Tangent hyperbolic: $f(s) = \frac{\exp(\lambda s) - \exp(-\lambda s)}{\exp(\lambda s) + \exp(-\lambda s)}$
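As a sketch (not part of the slides), the five activation functions in Python, with β and λ as assumed parameters:

    import numpy as np

    def linear(s, beta=1.0):
        return beta * s

    def step(s, beta1=1.0, beta2=0.0):   # Heaviside; beta2 = -1 is the other common choice
        return np.where(s >= 0, beta1, beta2)

    def ramp(s, beta=1.0):
        return np.clip(s, -beta, beta)   # beta for s > beta, s in between, -beta below

    def sigmoid(s, lam=1.0):
        return 1.0 / (1.0 + np.exp(-lam * s))

    def tanh_act(s, lam=1.0):            # (e^(λs) - e^(-λs)) / (e^(λs) + e^(-λs))
        return np.tanh(lam * s)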

Page 8

Activation functions

Page 9

Discriminative capacity of one neuron

Linearly separable objects: there exists a hyperplane that separates the objects ($y > 0$ / $y < 0$).

Rosenblatt's perceptron (late 1950s) - a simple Learning Machine with a step activation function.

Example: Logical AND and Logical OR are linearly separable classification problems.
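A minimal sketch (my illustration, not from the slides) of Rosenblatt's perceptron learning rule solving Logical AND; the learning rate and epoch count are assumptions:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([0, 0, 0, 1], dtype=float)   # Logical AND targets

    w = np.zeros(3)    # w_0 (bias), w_1, w_2
    eta = 0.1          # assumed learning rate

    for epoch in range(20):
        for x, target in zip(X, d):
            x = np.concatenate(([1.0], x))          # bias input x_0 = 1
            y = 1.0 if np.dot(w, x) >= 0 else 0.0   # step activation
            w += eta * (target - y) * x             # perceptron update rule

    print(w)   # a separating hyperplane for AND, e.g. close to [-0.3, 0.2, 0.1]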

Page 10

Limitations of the perceptron (Minsky and Papert, Perceptrons, 1969)

The exclusive-or (XOR) problem cannot be computed by a perceptron (or any two-layer network, i.e., one with no hidden layer). XOR problem: the output must be turned on when either of the inputs is turned on, but not when both are turned on. XOR is not a linearly separable problem.

Page 11

Why a Hidden Layer? Idea 1: Recode the XOR problem in three dimensions so that it becomes linearly separable. By adding an appropriate extra feature, it is possible to linearly separate the two classes.

Idea 2: Another way to make a problem linearly separable is to add an extra (hidden) layer between the inputs and the outputs. Given a sufficient number of hidden units, it is possible to recode any such unsolvable problem into a linearly separable one.
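A quick sketch of Idea 1 (my illustration; the extra feature and the hand-picked weights are assumptions): adding $x_1 x_2$ as a third dimension makes XOR separable for a single step-activation neuron:

    import numpy as np

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    d = np.array([0, 1, 1, 0])                  # XOR targets

    # Idea 1: recode in three dimensions with the extra feature x1 * x2
    X3 = np.column_stack([X, X[:, 0] * X[:, 1]])

    # Hand-picked separating hyperplane: s = -0.5 + x1 + x2 - 2 * (x1 * x2)
    w = np.array([-0.5, 1.0, 1.0, -2.0])        # w_0 (bias), w_1, w_2, w_3
    for x, target in zip(X3, d):
        y = 1 if w[0] + np.dot(w[1:], x) >= 0 else 0
        print(x[:2], "->", y, "(target:", target, ")")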

Page 12

Gradient Descent (GD)

Quadratic Error Function: $e_j = (d_j - y_j)^2$

$d_j$ - target output, $y_j$ - observed (NN) output

(fig.: A. P. Engelbrecht, Computational Intelligence: An Introduction, John Wiley & Sons, 2002)

Page 13

Gradient Descent (GD)

Given data point $j$: $\mathbf{x}_j \equiv [x_{j1}, x_{j2}, \ldots, x_{jn}] \to d_j$

Each weight $w_i$ at iteration $t$ is updated as

$w_i(t) = w_i(t-1) + \Delta w_i(t)$

$\Delta w_i(t) = \eta \left[ -\frac{\partial e_j}{\partial w_i} \right]$

$\frac{\partial e_j}{\partial w_i} = -2 (d_j - y_j) \frac{\partial f}{\partial s} x_{ji}$

$x_{ji}$ - input $i$ of data point $j$, $\eta$ - learning rate

$f$ - neuron activation function. Any constraints on $f$?
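A hedged sketch of this update for a single neuron (the function names and η are my assumptions). Note that the update needs $\partial f / \partial s$, which answers the question above: $f$ must be differentiable (so the step function does not qualify).

    import numpy as np

    def gd_update(w, x_j, d_j, f, f_prime, eta=0.1):
        """One gradient-descent step on e_j = (d_j - y_j)^2 for a single neuron.

        x_j must already include the bias input x_j0 = 1.
        """
        s = np.dot(w, x_j)       # s = sum_i w_i x_ji
        y = f(s)                 # y_j = f(s)
        # Δw_i = η [ -de_j/dw_i ] = 2 η (d_j - y_j) f'(s) x_ji
        return w + 2.0 * eta * (d_j - y) * f_prime(s) * x_j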

Page 14

GD - Sigmoid Activation Function

If $f(s) = \frac{1}{1 + \exp(-\lambda s)}$, then $\frac{\partial f}{\partial s} = \lambda f(s) (1 - f(s))$

$\Rightarrow \Delta w_i(t) = 2 \eta \lambda (d_j - y_j) f(s_j) (1 - f(s_j)) x_{ji}$

Least Mean Squares (Delta Rule)

If $f(s) = s$ (linear activation function), then $\frac{\partial f}{\partial s} = 1$

$\Rightarrow \Delta w_i(t) = 2 \eta (d_j - y_j) x_{ji}$

Adaline (Widrow and Hoff, 1959) was the first hardware realization of a NN trained with the Delta Rule.
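A minimal delta-rule (LMS) training loop in the spirit of Adaline, with a linear activation; the toy data and η are my assumptions:

    import numpy as np

    # Toy data (assumed): targets generated by d = 2*x1 - x2
    X = np.array([[1.0, 0.5], [0.2, 1.0], [0.8, 0.3], [0.5, 0.9]])
    d = 2 * X[:, 0] - X[:, 1]

    w = np.zeros(3)    # w_0 (bias), w_1, w_2
    eta = 0.1

    for epoch in range(200):
        for x, target in zip(X, d):
            x = np.concatenate(([1.0], x))
            y = np.dot(w, x)                   # linear activation: f(s) = s
            w += 2 * eta * (target - y) * x    # Delta Rule: Δw_i = 2η (d_j - y_j) x_ji

    print(w)   # approaches [0, 2, -1]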

Page 15

Multilayer perceptron (MLP)

FeedForward Neural Network (FFNN)

Perceptron (one layer)

Page 16

MLP as universal approximator

• FFNN with 1 hidden layer and continuous and differentiable activation functions can approximate any continuous function.

1. G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2:303-314, 1989.

2. K. Hornik. Some new results on neural network approximation. Neural Networks, 6:1069-1072, 1993.

• FFNN with 2 hidden layers and continuous and differentiable activation functions can approximate any function.

3. A. Lapedes and R. Farber. How neural nets work. In Anderson, editor, Neural Information Processing Systems, pages 442-456. New York, American Institute of Physics, 1987.

4. G. Cybenko. Continuous valued neural networks with two hidden layers are sufficient. Technical report, Dep. of Computer Science, Tufts University, Medford, MA, 1988.

Page 17

BackPropagation (BP)

1. Initialize the NN weights. For each object of the training set compute:

2. Forward computations

$s_j = w_{j0} + \sum_i w_{ji} x_i$ - first layer

$s_j = w_{j0} + \sum_{i \in \text{previous layer}} w_{ji} y_i$ - hidden & output layers

$y_j = f(s_j)$ - output of neuron $j$
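A minimal sketch of these forward computations for a one-hidden-layer FFNN in Python; the matrix shapes, function names, and the sigmoid activation are my assumptions:

    import numpy as np

    def sigmoid(s, lam=1.0):
        return 1.0 / (1.0 + np.exp(-lam * s))

    def forward(x, W1, W2):
        """Forward pass: s_j = w_j0 + sum_i w_ji y_i (x_i in the first layer); y_j = f(s_j).

        W1: (n_hidden, n_in + 1) first-layer weights, column 0 = biases w_j0.
        W2: (n_out, n_hidden + 1) output-layer weights, column 0 = biases.
        """
        x = np.concatenate(([1.0], x))   # constant bias input x_0 = 1
        y_hidden = sigmoid(W1 @ x)       # first (hidden) layer
        y_out = sigmoid(W2 @ np.concatenate(([1.0], y_hidden)))   # output layer
        return y_hidden, y_out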

Page 18

BackPropagation (BP)

3. Backward computations ($c$ - cost function)

$e_j = f'(s_j) \frac{\partial c}{\partial y_j}$ - output layer

$e_j = f'(s_j) \sum_{p \in \text{next layer}} w_{pj} e_p$ - hidden & input layers

4. Weights update (batch training)

$w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$

$\Delta w_{ij} = -\frac{\eta}{N} \sum_{\mathbf{x} \in X} e_j x_i$ - first layer

$\Delta w_{ij} = -\frac{\eta}{N} \sum_{\mathbf{x} \in X} e_j y_i$ - hidden & output layers

5. Stop if the stopping conditions are met; otherwise go back to step 2.
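A hedged sketch (my code, not from the slides) of steps 3-4 for the one-hidden-layer MLP from the forward sketch above, using the on-line update and the MSE cost of the next page:

    import numpy as np

    def backprop_step(x, d, W1, W2, eta=0.1, lam=1.0):
        """One BP step (steps 2-4) for the MLP sketched on Page 17.

        W1: (n_hidden, n_in + 1), W2: (n_out, n_hidden + 1); column 0 holds biases.
        """
        f = lambda s: 1.0 / (1.0 + np.exp(-lam * s))

        # 2. Forward computations
        x1 = np.concatenate(([1.0], x))
        y1 = np.concatenate(([1.0], f(W1 @ x1)))   # hidden outputs, with bias unit
        y2 = f(W2 @ y1)                            # network outputs

        # 3. Backward computations: e_j = f'(s_j) dc/dy_j at the output,
        #    e_j = f'(s_j) sum_p w_pj e_p at the hidden layer (f' = lam * f * (1 - f))
        dc_dy = -(2.0 / len(d)) * (d - y2)         # MSE cost derivative
        e2 = lam * y2 * (1.0 - y2) * dc_dy
        e1 = lam * y1[1:] * (1.0 - y1[1:]) * (W2[:, 1:].T @ e2)

        # 4. Weights update (on-line): Δw_ij = -η e_j y_i (x_i in the first layer)
        W2 -= eta * np.outer(e2, y1)
        W1 -= eta * np.outer(e1, x1)
        return W1, W2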

Page 19

BackPropagation (BP)

4. Weights update (on-line training)

$\Delta w_{ij} = -\eta e_j x_i$ - first layer

$\Delta w_{ij} = -\eta e_j y_i$ - hidden & output layers

For a sigmoid $f$: $f'(s_j) = f(s_j) (1 - f(s_j))$, with $\lambda = 1$

Mean Squared Error (MSE) cost function

$c = \frac{1}{N} \sum_{j=1}^{N} (d_j - y_j)^2$

$\frac{\partial c}{\partial y_j} = -\frac{2}{N} (d_j - y_j)$
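As a usage illustration (mine, not from the slides), the forward and backprop_step sketches above can be combined to train the MLP on XOR on-line; the seed, layer sizes, η, and epoch count are assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.5, size=(3, 3))   # 3 hidden neurons, 2 inputs + bias
    W2 = rng.normal(scale=0.5, size=(1, 4))   # 1 output neuron, 3 hidden + bias

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    D = np.array([[0], [1], [1], [0]], dtype=float)

    for epoch in range(5000):
        for x, d in zip(X, D):
            W1, W2 = backprop_step(x, d, W1, W2, eta=0.5)

    for x in X:
        _, y = forward(x, W1, W2)
        print(x, "->", y.round(2))   # typically close to the XOR targets,
                                     # depending on the random initialization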

Page 20

Classification with MLP

An MLP creates a decision boundary between classes. Rule of thumb: each line segment of the boundary corresponds to one hidden-layer neuron.

(fig.: Engelbrecht, p. 50)

Page 21

Regression with MLP (function approximation)

Rule of thumb: # of hidden-layer neurons = number of inflection points of the function to approximate

(fig.: Engelbrecht, p.51)

# of hidden-layer neurons? The NN architecture?

Page 22

ANN Applications - System Identification

Page 23

ANN Applications - Model Predictive Control (NN MPC)

Page 24

ANN Applications - Brain-Machine Interface (BMI)