
Page 1:

INTRODUCTION TO NEURAL NETWORKS

Page 2:

Complex computations: Mach bands

Observe the transitions among the bands

Page 3:

Complex computations: Mach bands

Page 4:

From: R. Pierantoni, La trottola di Prometeo, Laterza (1996)

Complex computations: Mach bands

Observe the transitions among the bands

Page 5:

Complex computations: Mach bands

[Figure: two panels, Stimulus and Percept, plotting intensity across the bands]

Page 6:

A simple model of the retina neuron

Linear light-to-potential transducer: Light → Potential

[Figure: potential (mV) grows linearly with incident intensity (photons/s)]

Page 7:

Neuron transduction

[Figure: stimulus (photons/s, top) and transduced potential (mV, bottom) for a row of 19 neurons]

Page 8:

Adding lateral inhibition

Each neuron inhibits its neighbors by 10% of its own non-inhibited potential.

In the middle of the bright band: 160 - 0.1×160 - 0.1×160 = 128

[Figure: the stimulus profile (photons/s) for the row of 19 neurons]

Page 9:

Adding lateral inhibition

Each neuron inhibits its neighbors by 10% of its own non-inhibited potential:

160 - 0.1×160 - 0.1×40 = 140
40 - 0.1×160 - 0.1×40 = 20
40 - 0.1×40 - 0.1×40 = 32
160 - 0.1×160 - 0.1×160 = 128

[Figure: stimulus (photons/s) and inhibited potential (mV) across the 19 neurons: an overshoot (140) and an undershoot (20) appear at the edge between the bands, reproducing the Mach bands]
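This computation is easy to simulate. The sketch below is a minimal Python version, assuming a step stimulus from 40 to 160 photons/s across the 19 neurons, zero input beyond the ends of the row, and the linear transducer of Page 6 (so uninhibited potentials equal the stimulus values); it reproduces the four numbers above.

```python
# Lateral inhibition: each neuron's potential is reduced by 10% of each
# neighbor's uninhibited potential, sharpening the edge between the bands.

stimulus = [40] * 9 + [160] * 10   # photons/s for the 19 neurons (assumed split)

def inhibited(potentials, k=0.1):
    """Subtract k times each neighbor's uninhibited potential."""
    out = []
    for i, p in enumerate(potentials):
        left = potentials[i - 1] if i > 0 else 0    # no neighbor beyond the ends
        right = potentials[i + 1] if i < len(potentials) - 1 else 0
        out.append(p - k * left - k * right)
    return out

print(inhibited(stimulus))
# Interior of the dark band:   40 - 0.1*40  - 0.1*40  = 32
# Dark side of the edge:       40 - 0.1*160 - 0.1*40  = 20
# Bright side of the edge:     160 - 0.1*160 - 0.1*40 = 140
# Interior of the bright band: 160 - 0.1*160 - 0.1*160 = 128
```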

Page 10:

Complex computations: Mach bands

Many identical computing units, each one performing very simple operations, can perform very complex computations when they are widely and specifically connected.

The "knowledge" is stored in the topology and in the strengths of the synapses.

Page 11:

Model Neuron: McCulloch and Pitts

A neuron is a computational unit that
1) performs the weighted sum of the input signals, computing the activation signal (a);
2) transforms the activation signal through a transfer function g, computing the output z.

$a = \sum_{i=1}^{d} w_i x_i - \theta \qquad z = g(a)$

w: synaptic weights; $\theta$: activation threshold
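A minimal sketch of this unit (the step convention z = 1 for a ≥ 0 is an assumption; the slides use it implicitly in the logic-gate pages):

```python
# McCulloch-Pitts unit: a = sum_i w_i x_i - theta, z = g(a).

def mcculloch_pitts(x, w, theta, g=lambda a: 1 if a >= 0 else 0):
    a = sum(wi * xi for wi, xi in zip(w, x)) - theta   # activation signal
    return g(a)                                        # output z

# Example: two inputs with equal weights and a low threshold
print(mcculloch_pitts([1, 0], w=[0.5, 0.5], theta=0.25))   # -> 1
```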

Page 12:

Transfer functions

$g(a) = \frac{1}{1 + e^{-a}}$

[Figure: the sigmoid plotted for a from -10 to 10, rising from 0 through 0.5 to 1]

Usually, NON-linear functions are adopted.

Page 13:

Non-linearity

[Figure: the sigmoid plotted for a from -10 to 10]

The same variation in the input can give very different variations in the transferred signal.

Page 14:

Artificial Neural Networks

$w_{ij}$: synaptic weights

Neuron i: $a = \sum_{i=1}^{d} w_i x_i - \theta \qquad z = g(a)$

The threshold can be implicitly considered by adding an extra neuron, always activated and connected to the current neuron with weight equal to $-\theta$.
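A sketch of this trick, assuming the extra neuron is represented as a constant input fixed at 1:

```python
# Implicit threshold: append an always-active input x_0 = 1 whose weight
# is -theta, so the activation becomes a plain weighted sum.

def activation_with_bias(x, w, theta):
    x_ext = x + [1.0]        # the extra, always-activated "neuron"
    w_ext = w + [-theta]     # its weight encodes the threshold
    return sum(wi * xi for wi, xi in zip(w_ext, x_ext))

print(activation_with_bias([1, 0], [0.5, 0.5], 0.25))   # 0.25, same as a = wx - theta
```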

Page 15:

Topology of artificial neural networks

The topology of the connections among the neurons defines the network class. We will take into consideration only the feed-forward architectures, where the neurons are organized into hierarchical layers and the signal flows in just one direction.

Perceptrons: 2 layers, Input and Output

$z_j = g\left(\sum_i w_{ij} x_i - \theta_j\right)$

Page 16:

Neural networks and logical operators

OR (inputs: neurons 1 and 2; output: neuron 3): $w_{13} = 0.5$, $w_{23} = 0.5$, $\theta_3 = 0.25$

x1 x2 |  a3   | z3
 1  0 |  0.25 | 1
 0  1 |  0.25 | 1
 1  1 |  0.75 | 1
 0  0 | -0.25 | 0

Page 17:

Neural networks and logical operators

AND: $w_{13} = 0.5$, $w_{23} = 0.5$, $\theta_3 = 0.75$

x1 x2 |  a3   | z3
 1  0 | -0.25 | 0
 0  1 | -0.25 | 0
 1  1 |  0.25 | 1
 0  0 | -0.75 | 0

Page 18:

Neural networks and logical operators

NOT(1): $w_{13} = -0.5$, $w_{23} = 0.1$, $\theta_3 = -0.25$

x1 x2 |  a3   | z3
 1  0 | -0.25 | 0
 0  1 |  0.35 | 1
 1  1 | -0.15 | 0
 0  0 |  0.25 | 1
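A sketch that reproduces the three tables above with the slides' weights (again assuming the step convention z = 1 for a ≥ 0):

```python
# One output neuron (3) fed by two input neurons (1, 2).

def unit(x1, x2, w13, w23, theta3):
    a3 = w13 * x1 + w23 * x2 - theta3
    return a3, 1 if a3 >= 0 else 0

for name, params in [("OR", (0.5, 0.5, 0.25)),
                     ("AND", (0.5, 0.5, 0.75)),
                     ("NOT(1)", (-0.5, 0.1, -0.25))]:
    print(name)
    for x1, x2 in [(1, 0), (0, 1), (1, 1), (0, 0)]:
        a3, z3 = unit(x1, x2, *params)
        print(f"  x=({x1},{x2})  a3={a3:+.2f}  z3={z3}")
```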

Page 19:

Supervised artificial neural networks

Feed-forward artificial neural networks can be trained starting from examples with known solution.

Error function
Given a set of examples $x^i$ with known desired output $d^i$, and given a network with parameters w, the square error is computed starting from the output of the network z (j sums over the output neurons):

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - d_j^i \right)^2$

The training procedure consists in finding the parameters w that minimize the error: iterative minimization algorithms are adopted. However, they do NOT guarantee to reach the global minimum.

Page 20:

Training a perceptron

We consider a differentiable transfer function:

$g(a) = \frac{1}{1+e^{-a}} \qquad g'(a) = \frac{e^{-a}}{(1+e^{-a})^2} = g(a)\,(1 - g(a))$

The network has inputs $x_1, x_2$ and outputs $z_1, z_2$, with $a_j = \sum_l w_{lj} x_l - \theta_j$, $z_j = g(a_j)$, desired outputs $d_j$, and

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - d_j^i \right)^2$

Given some initial parameters w, the chain rule gives:

$\frac{\partial E}{\partial w_{lj}} = \sum_i \frac{\partial E}{\partial z_j(x^i,w)} \cdot \frac{\partial z_j(x^i,w)}{\partial a_j(x^i,w)} \cdot \frac{\partial a_j(x^i,w)}{\partial w_{lj}}$

where

$\frac{\partial E}{\partial z_j(x^i,w)} = z_j(x^i,w) - d_j^i \qquad \frac{\partial z_j(x^i,w)}{\partial a_j(x^i,w)} = g'(a_j(x^i,w)) \qquad \frac{\partial a_j(x^i,w)}{\partial w_{lj}} = x_l^i$

deviation: $\delta_j^i = \left( z_j(x^i,w) - d_j^i \right) g'(a_j(x^i,w))$

Page 21:

Training a perceptron

Then

$\frac{\partial E}{\partial w_{lj}} = \sum_i \left( z_j(x^i,w) - d_j^i \right) g'(a_j(x^i,w)) \, x_l^i = \sum_i \delta_j^i \, x_l^i$

($\delta_j^i$ is the deviation defined above.)

Using the gradient we can update the weights with the "steepest descent" procedure:

$w_{lj} \leftarrow w_{lj} - \eta \, \frac{\partial E}{\partial w_{lj}}$

$\eta$ is the learning rate.
Too low: slow training.
Too high: the minima can be lost.

Convergence: $\frac{\partial E}{\partial w_{lj}} = 0$
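These two pages define the whole update. A minimal sketch of one batch update for a single sigmoid output neuron follows (function and variable names are mine; the slides define only the math):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def steepest_descent_step(w, theta, examples, eta):
    """One batch update: w <- w - eta * dE/dw, and theta likewise."""
    grad_w = [0.0] * len(w)
    grad_theta = 0.0
    for x, d in examples:
        a = sum(wl * xl for wl, xl in zip(w, x)) - theta
        z = sigmoid(a)
        delta = (z - d) * z * (1 - z)      # deviation (z - d) g'(a)
        for l, xl in enumerate(x):
            grad_w[l] += delta * xl        # dE/dw_l = sum_i delta^i x_l^i
        grad_theta -= delta                # a depends on theta with sign -1
    w = [wl - eta * gl for wl, gl in zip(w, grad_w)]
    return w, theta - eta * grad_theta
```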

Page 22:

Steepest descent finds the minimum of a function by always pointing in the direction that leads downhill.

Page 23:

Steepest descent finds the LOCAL minimum of a function by always pointing in the direction that leads downhill.

Page 24:

Gradient

$f: \mathbb{R}^n \to \mathbb{R}$, with f(x) of class $C^2$: the objective function.

Gradient of f:

$\nabla f = \left( \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right)$

It is a vector containing all the partial derivatives of the first order.

Page 25:

The Gradient is locally perpendicular to level curves

Given a function f(x,y) and a level curve f(x,y) = c, the gradient of f is:

$\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)$

Consider 2 points of the curve, $(x, y)$ and $(x + \varepsilon_x, y + \varepsilon_y)$, for small $\varepsilon$. Expanding to first order:

$f(x + \varepsilon_x, y + \varepsilon_y) = f(x, y) + \varepsilon_x \frac{\partial f}{\partial x}\Big|_{(x,y)} + \varepsilon_y \frac{\partial f}{\partial y}\Big|_{(x,y)} = f(x, y) + \boldsymbol{\varepsilon}^T \nabla f\big|_{(x,y)}$

Page 26:

The local perpendicular to a curve: Gradient

Since both points, $(x, y)$ and $(x + \varepsilon_x, y + \varepsilon_y)$, satisfy the curve equation:

$c = c + \boldsymbol{\varepsilon}^T \nabla f\big|_{(x,y)} \implies \boldsymbol{\varepsilon}^T \nabla f\big|_{(x,y)} = 0$

The gradient is perpendicular to $\boldsymbol{\varepsilon}$. For small $\boldsymbol{\varepsilon}$, $\boldsymbol{\varepsilon}$ is parallel to the curve and, by consequence, the gradient is perpendicular to the curve.

The gradient points towards the direction of maximum increase of f.

Page 27:

Page 28:

Steepest descent finds the LOCAL minimum of a function by always pointing in the direction that leads downhill.

Page 29:

Example: OR

$w_{13} = 0$, $w_{23} = 0$, $\theta_3 = 0$, $\eta = 2$

$g(a) = \frac{1}{1+e^{-a}} \qquad g'(a) = g(a)(1-g(a)) = z(1-z) \qquad \frac{\partial E}{\partial w_{l3}} = \sum_i \left( z(x^i) - d^i \right) g'(a^i) \, x_l^i = \sum_i \delta^i x_l^i$

Training examples:

x1 x2 d |   a   |  z   |   E   | dE/dw13 | dE/dw23 | dE/dθ3
 1  0 1 |   0   | 0.5  | 0.125 | -0.125  |  0      |  0.125
 0  1 1 |   0   | 0.5  | 0.125 |  0      | -0.125  |  0.125
 0  0 0 |   0   | 0.5  | 0.125 |  0      |  0      | -0.125
 0  0 0 |   0   | 0.5  | 0.125 |  0      |  0      | -0.125
sum     |       |      | 0.5   | -0.125  | -0.125  |  0

Page 30:

Example: OR, Step 1

$w_{13} = 0.25$, $w_{23} = 0.25$, $\theta_3 = 0$, $\eta = 2$

x1 x2 d |   a   |  z   |   E   | dE/dw13 | dE/dw23 | dE/dθ3
 1  0 1 |  0.25 | 0.56 | 0.096 | -0.108  |  0      |  0.108
 0  1 1 |  0.25 | 0.56 | 0.096 |  0      | -0.108  |  0.108
 0  0 0 |  0    | 0.5  | 0.125 |  0      |  0      | -0.125
 0  0 0 |  0    | 0.5  | 0.125 |  0      |  0      | -0.125
sum     |       |      | 0.442 | -0.108  | -0.108  | -0.035

Page 31:

Example: OR, Step 2

$w_{13} = 0.466$, $w_{23} = 0.466$, $\theta_3 = 0.069$, $\eta = 2$

x1 x2 d |    a   |   z   |   E   | dE/dw13 | dE/dw23 | dE/dθ3
 1  0 1 |  0.397 | 0.598 | 0.081 | -0.097  |  0      |  0.097
 0  1 1 |  0.397 | 0.598 | 0.081 |  0      | -0.097  |  0.097
 0  0 0 | -0.069 | 0.483 | 0.117 |  0      |  0      | -0.121
 0  0 0 | -0.069 | 0.483 | 0.117 |  0      |  0      | -0.121
sum     |        |       | 0.395 | -0.097  | -0.097  | -0.048

Page 32:

Example: OR, Step 3

$w_{13} = 0.659$, $w_{23} = 0.659$, $\theta_3 = 0.164$, $\eta = 2$

x1 x2 d |    a   |   z   |   E   | dE/dw13 | dE/dw23 | dE/dθ3
 1  0 1 |  0.494 | 0.621 | 0.072 | -0.089  |  0      |  0.089
 0  1 1 |  0.494 | 0.621 | 0.072 |  0      | -0.089  |  0.089
 0  0 0 | -0.164 | 0.459 | 0.105 |  0      |  0      | -0.114
 0  0 0 | -0.164 | 0.459 | 0.105 |  0      |  0      | -0.114
sum     |        |       | 0.354 | -0.089  | -0.089  | -0.05

Page 33:

Generalization

$w_{13} = 0.659$, $w_{23} = 0.659$, $\theta_3 = 0.164$, $\eta = 2$

And what happens for the input (1,1)?

x1 x2 d |   a   |   z
 1  1 1 | 1.153 | 0.760

The network generalized the rules learned from known examples.
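The whole run can be reproduced in a few lines. This sketch (assuming the same batch update with η = 2, and the duplicated (0,0) row of the slides) prints parameters matching Steps 1-3 and the (1,1) prediction above:

```python
import math

examples = [((1, 0), 1), ((0, 1), 1), ((0, 0), 0), ((0, 0), 0)]
w13 = w23 = theta3 = 0.0
eta = 2.0

for step in range(1, 4):
    g13 = g23 = gth = 0.0
    for (x1, x2), d in examples:
        a = w13 * x1 + w23 * x2 - theta3
        z = 1.0 / (1.0 + math.exp(-a))
        delta = (z - d) * z * (1 - z)   # (z - d) g'(a), with g'(a) = z(1 - z)
        g13 += delta * x1
        g23 += delta * x2
        gth -= delta
    w13 -= eta * g13                    # steepest descent
    w23 -= eta * g23
    theta3 -= eta * gth
    print(f"step {step}: w13 = w23 = {w13:.3f}, theta3 = {theta3:.3f}")

a = w13 + w23 - theta3                  # the unseen input (1, 1)
print(f"(1,1): a = {a:.3f}, z = {1.0 / (1.0 + math.exp(-a)):.3f}")
# step 3 gives w ~ 0.659, theta3 ~ 0.164, and z(1,1) ~ 0.760, as in the slides
```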

Page 34:

Linear separability

Given a step-like transfer function, the output neuron of a perceptron is activated if the activation is positive:

$a > 0 \iff \sum_{i=1}^{d} w_i x_i - \theta > 0$

The input space is then divided into two regions.

If the requested mapping cannot be separated by a hyperplane, the perceptron is insufficient.

Page 35:

Linear separability

[Figure: in the (x1, x2) plane, AND, OR and NOT(1) can each be separated by a line; the XOR pattern cannot]

The XOR problem cannot be solved with a perceptron.

Page 36:

Multi-layer feed-forward neural networks

Neurons are organized into hierarchical layers.

Each layer receives its inputs from the previous one and transmits its output to the next one:

$z^1_j = g\left(\sum_i w^1_{ij} x_i - \theta^1_j\right) \qquad z^2_j = g\left(\sum_i w^2_{ij} z^1_i - \theta^2_j\right)$

Page 37:

XOR

$w^1_{11} = 0.7$, $w^1_{21} = 0.7$, $\theta^1_1 = 0.5$; $w^1_{12} = 0.3$, $w^1_{22} = 0.3$, $\theta^1_2 = 0.5$; $w^2_{11} = 0.7$, $w^2_{21} = -0.7$, $\theta^2_1 = 0.5$

Input $x_1 = 0$, $x_2 = 0$: $a^1_1 = -0.5$, $z^1_1 = 0$; $a^1_2 = -0.5$, $z^1_2 = 0$; $a^2_1 = -0.5$, $z^2_1 = 0$

Page 38:

XOR (same weights as above)

Input $x_1 = 1$, $x_2 = 0$: $a^1_1 = 0.2$, $z^1_1 = 1$; $a^1_2 = -0.2$, $z^1_2 = 0$; $a^2_1 = 0.2$, $z^2_1 = 1$

Page 39:

XOR (same weights as above)

Input $x_1 = 0$, $x_2 = 1$: $a^1_1 = 0.2$, $z^1_1 = 1$; $a^1_2 = -0.2$, $z^1_2 = 0$; $a^2_1 = 0.2$, $z^2_1 = 1$

Page 40:

XOR (same weights as above)

Input $x_1 = 1$, $x_2 = 1$: $a^1_1 = 0.9$, $z^1_1 = 1$; $a^1_2 = 0.1$, $z^1_2 = 1$; $a^2_1 = -0.5$, $z^2_1 = 0$

Page 41:

The hidden layer maps the input in a new representation that is linearly separable

Input | Desired output | Activation of hidden neurons
0 0   | 0              | 0 0
1 0   | 1              | 1 0
0 1   | 1              | 1 0
1 1   | 0              | 1 1
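A sketch of this network with the slides' weights (assuming the step convention z = 1 for a ≥ 0) reproduces the table:

```python
step = lambda a: 1 if a >= 0 else 0

w1 = [[0.7, 0.3],    # weights from x1 to hidden neurons 1, 2
      [0.7, 0.3]]    # weights from x2 to hidden neurons 1, 2
theta1 = [0.5, 0.5]
w2 = [0.7, -0.7]     # weights from hidden neurons 1, 2 to the output
theta2 = 0.5

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    z1 = [step(w1[0][j] * x1 + w1[1][j] * x2 - theta1[j]) for j in (0, 1)]
    z2 = step(w2[0] * z1[0] + w2[1] * z1[1] - theta2)
    print(f"input ({x1},{x2}) -> hidden ({z1[0]},{z1[1]}) -> output {z2}")
```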

Page 42:

Supervised artificial neural networks

Feed-forward artificial neural networks can be trained starting from examples with known solution.

Error function
Given a set of examples $x^i$ with known desired output $d^i$, and given a network with parameters w, the square error is computed starting from the output of the network z (j sums over the output neurons):

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - d_j^i \right)^2$

The training procedure consists in finding the parameters w that minimize the error: iterative minimization algorithms are adopted. However, they do NOT guarantee to reach the global minimum.

Page 43:

Training a perceptron

We consider a differentiable transfer function:

$g(a) = \frac{1}{1+e^{-a}} \qquad g'(a) = \frac{e^{-a}}{(1+e^{-a})^2} = g(a)\,(1 - g(a))$

The network has inputs $x_1, x_2$ and outputs $z_1, z_2$, with $a_j = \sum_l w_{lj} x_l - \theta_j$, $z_j = g(a_j)$, desired outputs $d_j$, and

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - d_j^i \right)^2$

Given some initial parameters w, the chain rule gives:

$\frac{\partial E}{\partial w_{lj}} = \sum_i \frac{\partial E}{\partial z_j(x^i,w)} \cdot \frac{\partial z_j(x^i,w)}{\partial a_j(x^i,w)} \cdot \frac{\partial a_j(x^i,w)}{\partial w_{lj}}$

where

$\frac{\partial E}{\partial z_j(x^i,w)} = z_j(x^i,w) - d_j^i \qquad \frac{\partial z_j(x^i,w)}{\partial a_j(x^i,w)} = g'(a_j(x^i,w)) \qquad \frac{\partial a_j(x^i,w)}{\partial w_{lj}} = x_l^i$

deviation: $\delta_j^i = \left( z_j(x^i,w) - d_j^i \right) g'(a_j(x^i,w))$

Page 44:

Training a perceptron

Then

$\frac{\partial E}{\partial w_{lj}} = \sum_i \left( z_j(x^i,w) - d_j^i \right) g'(a_j(x^i,w)) \, x_l^i = \sum_i \delta_j^i \, x_l^i$

($\delta_j^i$ is the deviation defined above.)

Using the gradient we can update the weights with the "steepest descent" procedure:

$w_{lj} \leftarrow w_{lj} - \eta \, \frac{\partial E}{\partial w_{lj}}$

$\eta$ is the learning rate.
Too low: slow training.
Too high: the minima can be lost.

Convergence: $\frac{\partial E}{\partial w_{lj}} = 0$

Page 45:

Training of multilayer network: Back-propagation

For the layer 2, the perceptron formula holds, upon the substitution $x \to z^{1,i}$:

$\frac{\partial E}{\partial w^2_{lj}} = \sum_i \left( z_j(x^i,w) - d_j^i \right) g'(a_j^{2,i}) \, z_l^{1,i} = \sum_i \delta_j^{2,i} \, z_l^{1,i}$

Page 46:

Training of multilayer network: Back-propagation

For the layer 1:

$\frac{\partial E}{\partial w^1_{lj}} = \sum_i \frac{\partial E}{\partial a_j^{1,i}} \cdot \frac{\partial a_j^{1,i}}{\partial w^1_{lj}} = \sum_i \delta_j^{1,i} \, x_l^i \qquad \text{with } \frac{\partial a_j^{1,i}}{\partial w^1_{lj}} = x_l^i$

Since $a_k^{2,i} = \sum_m w^2_{mk} \, g(a_m^{1,i}) - \theta^2_k$, we have $\frac{\partial a_k^{2,i}}{\partial a_j^{1,i}} = g'(a_j^{1,i}) \, w^2_{jk}$, so

$\frac{\partial E}{\partial a_j^{1,i}} = \sum_k \frac{\partial E}{\partial a_k^{2,i}} \cdot \frac{\partial a_k^{2,i}}{\partial a_j^{1,i}} = \sum_k \delta_k^{2,i} \, g'(a_j^{1,i}) \, w^2_{jk}$

Defining $\delta_j^{1,i} = g'(a_j^{1,i}) \sum_k \delta_k^{2,i} \, w^2_{jk}$: the deviations of the hidden layer are obtained by propagating back those of the output layer.

Page 47:

Training of multilayer network: Back-propagation

Compute $z^l$ for each example (feed-forward step, from Input to Output);

Compute the deviation on the output layer, $\delta^2_l$;

Compute the deviation on the hidden layer, $\delta^1_j$;

Compute the gradient of the Error with respect to the weights;

Update the weights with the steepest-descent method.
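A compact sketch of these five steps in numpy, for one hidden layer of sigmoid units. The network size, learning rate, number of iterations and random initialization are illustrative; thresholds are folded into the weight matrices through an always-on extra input, as on Page 14.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_train(X, D, n_hidden=2, eta=0.5, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1] + 1, n_hidden))  # last row: -theta1
    W2 = rng.normal(0.0, 0.5, (n_hidden + 1, D.shape[1]))  # last row: -theta2
    Xb = np.hstack([X, np.ones((len(X), 1))])              # always-on input
    for _ in range(steps):
        # 1) feed-forward step
        Z1 = sigmoid(Xb @ W1)
        Z1b = np.hstack([Z1, np.ones((len(Z1), 1))])
        Z2 = sigmoid(Z1b @ W2)
        # 2) deviation on the output layer: delta2 = (z2 - d) g'(a2)
        d2 = (Z2 - D) * Z2 * (1.0 - Z2)
        # 3) deviation on the hidden layer: delta1 = g'(a1) sum_k delta2_k w2_jk
        d1 = (d2 @ W2[:-1].T) * Z1 * (1.0 - Z1)
        # 4) gradient of the error, 5) steepest-descent update
        W2 -= eta * (Z1b.T @ d2)
        W1 -= eta * (Xb.T @ d1)
    return W1, W2

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)            # XOR targets
W1, W2 = backprop_train(X, D)
Z1 = sigmoid(np.hstack([X, np.ones((4, 1))]) @ W1)
print(sigmoid(np.hstack([Z1, np.ones((4, 1))]) @ W2).round(2))
# As noted on Page 19, the descent can land in a local minimum; if it does,
# try another seed or more hidden neurons.
```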

Page 48:

What does a neural network learn?

Consider the ideal case, consisting in a continuous set of examples x, each one represented with frequency P(x). The desired solutions d are associated to the input with probability P(d | x).

$E = \frac{1}{2} \sum_j \int \left( z_j(x, w) - d_j \right)^2 P(d \mid x) \, P(x) \; \mathrm{d}d \, \mathrm{d}x$

Training, after convergence (functional derivative):

$\frac{\delta E}{\delta z_j(x, w)} = 0$

$0 = \int \left( z_j(x', w) - d_j \right) P(d \mid x') \, P(x') \, \delta(x - x') \; \mathrm{d}d \, \mathrm{d}x' \implies z_j(x, w) = \int d_j \, P(d_j \mid x) \; \mathrm{d}d_j$

The activation state of the j-th output neuron is equal to the average of the solutions associated to the input x in the training set.
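A small numeric sketch of this statement: for one input x seen with several different targets d, the output z that minimizes the squared error is their average (the target values below are illustrative).

```python
import numpy as np

d = np.array([0.0, 1.0, 1.0, 1.0])   # targets seen for the same input x
zs = np.linspace(0.0, 1.0, 101)       # candidate output values
errors = [0.5 * np.sum((z - d) ** 2) for z in zs]
print(zs[int(np.argmin(errors))], d.mean())   # both are 0.75
```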

Page 49:

Neural Networks for classification and regression

Networks can be used for classification or for regression.

In regression: desired outputs are real numbers.
In classification: desired outputs are 0 or 1.

Error function:

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - y_j^i \right)^2$

Page 50:

Neural Networks and overfitting

Increasing the hidden neurons increases the number of parameters and therefore the risk of overfitting the learning data.

Page 51:

1) Be sure that the number of parameters is far lower than the number of points to learn.

(What is the number of parameters of a network with n inputs, k outputs and r hidden neurons?)

2) Use regularizers (if possible). E.g.:

$E = \frac{1}{2} \sum_{i,j} \left( z_j(x^i, w) - y_j^i \right)^2 + \lambda \sum_{k,ij} \left( w^k_{ij} \right)^2$

Many other formulations are possible.
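A sketch of this weight-decay penalty, assuming λ (lam) is a small regularization coefficient multiplying the squared weights (the coefficient symbol is lost in the slide and is a conventional choice here):

```python
import numpy as np

def regularized_error(sq_error, weight_matrices, lam=1e-3):
    """E = (1/2) sum (z - y)^2 + lam * (sum of squared weights)."""
    return sq_error + lam * sum(np.sum(W ** 2) for W in weight_matrices)

def regularized_gradient(grad_W, W, lam=1e-3):
    """The penalty adds 2 * lam * W to each weight gradient, shrinking weights."""
    return grad_W + 2.0 * lam * W
```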

Page 52:

3) Always use an independent test set for deciding when to stop the training [EARLY STOPPING]

(and then validate the method on a third independent set)

[Figure: training and test error vs. training iterations; the test error reaches a minimum and then rises again. Stop the training at this iteration.]
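A sketch of the early-stopping loop, assuming a train_step that returns updated weights and an error function evaluated on the independent test set (both names are placeholders, e.g. built from the back-propagation sketch on Page 47):

```python
def train_with_early_stopping(weights, train_set, test_set,
                              train_step, error,
                              max_iters=10000, patience=50):
    """Keep the weights from the iteration with the lowest test error."""
    best_w, best_err, since_best = weights, float("inf"), 0
    for _ in range(max_iters):
        weights = train_step(weights, train_set)
        err = error(weights, test_set)
        if err < best_err:
            best_w, best_err, since_best = weights, err, 0
        else:
            since_best += 1
            if since_best >= patience:   # test error no longer improving
                break
    return best_w
```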

Page 53:

Can we add more layers?

Back propagation is not suitable for training the networks as the number of layers increases: DEEP LEARNING PROCEDURES ARE NEEDED.

Page 54:

Stuttgart Neural Network Simulator

http://www.ra.cs.uni-tuebingen.de/SNNS/

Page 55:

http://www.opennn.net/

OpenNN

Page 56:

http://deeplearning.net/software/theano/

THEANO

Page 57:

More on: https://grey.colorado.edu/emergent/index.php/Comparison_of_Neural_Network_Simulators