
INTRODUCTION TO NEURAL NETWORKS

Complex computations: Mach’s Bands

Observe the transitions among the bands

Complex computations: Mach’s Bands

From: R. Pierantoni, La trottola di Prometeo, Laterza (1996)

Complex computations: Mach's Bands

Observe the transitions among the bands

[Figure: "Stimulus" and "Percept" intensity profiles across the bands]

Complex computations: Mach’s Bands

A simple model of the retina neuron

[Figure: Linear light-to-potential transducer. Potential (mV) vs. incident intensity (photons/s); diagram: Light -> neuron -> Potential]

Neuron transduction

[Figure: input profile in photons/s and resulting output profile in mV across a row of 19 neurons]

Adding lateral inhibition

[Figure: input profile in photons/s across the row of 19 neurons, with lateral inhibitory connections between neighbors]

Each neuron inhibits its neighbors by 10% of its own uninhibited potential.

Example (flat region at 160): 160 - 0.1*160 - 0.1*160 = 128 mV

Adding lateral inhibition

[Figure: input profile (photons/s) and inhibited output profile (mV) across the row of 19 neurons]

Each neuron inhibits its neighbors by 10% of its own uninhibited potential. At the step between the 160 and 40 regions:

160 - 0.1*160 - 0.1*160 = 128   (bright side, far from the edge)
160 - 0.1*160 - 0.1*40  = 140   (bright side, at the edge)
40  - 0.1*160 - 0.1*40  = 20    (dark side, at the edge)
40  - 0.1*40  - 0.1*40  = 32    (dark side, far from the edge)
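The following short sketch (my own code, not part of the original slides) reproduces this lateral-inhibition computation on a 19-neuron row with a 160/40 step input:

# A minimal sketch of the lateral-inhibition rule described above: each neuron
# loses 10% of each neighbor's uninhibited potential, producing the Mach-band
# overshoot and undershoot at the edge.
import numpy as np

def lateral_inhibition(potential, k=0.1):
    """Subtract k times each neighbor's uninhibited potential."""
    left = np.roll(potential, 1)
    right = np.roll(potential, -1)
    left[0] = potential[0]       # edge neurons: treat the missing neighbor
    right[-1] = potential[-1]    # as having the same potential
    return potential - k * left - k * right

# Step stimulus: 160 mV on the left half, 40 mV on the right half
p = np.array([160.0] * 10 + [40.0] * 9)
print(lateral_inhibition(p))
# -> 128 ... 128, 140, 20, 32 ... 32  (overshoot/undershoot at the edge)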

Many identical computing units, each one performing very simple operations, can perform very complex computations when they are widely and specifically connected.

The “knowledge” is stored in the topology and in the strength of the synapses


Model Neuron: McCulloch and Pitts

A neuron is a computational unit that:

1) performs the weighted sum of the input signals, computing the activation signal (a);

2) transforms the activation signal through a transfer function g, computing the output z.

a = \sum_{i=1}^{d} w_i x_i - \theta, \qquad z = g(a)

w_i: synaptic weights; \theta: activation threshold

Transfer functions

[Figure: sigmoid transfer function, ranging from 0 to 1 over a in [-10, 10]]

g(a) = \frac{1}{1 + e^{-a}}

Usually, NON-linear functions are adopted.

Non linearity

[Figure: sigmoid transfer function over a in [-10, 10]]

The same variation in the input can produce very different variations in the transferred signal.
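A tiny numerical illustration of this point (my addition, using the sigmoid defined above):

# The same input step of 1 changes the sigmoid output a lot around a = 0
# and almost not at all in the saturated region.
import math

def g(a):
    return 1.0 / (1.0 + math.exp(-a))

print(g(1) - g(0))   # ~0.23   large change near the center
print(g(9) - g(8))   # ~0.0002 negligible change in the saturated region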

Artificial Neural Networks

W_ij: synaptic weights

[Diagram: neuron i receiving its inputs through the weights W_ij]

a_i = \sum_{j=1}^{d} w_{ji} x_j - \theta_i, \qquad z_i = g(a_i)

The threshold can be implicitly taken into account by adding an extra neuron, always activated (output fixed to 1) and connected to the current neuron with weight equal to -\theta.
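In formulas (my addition, using the same notation): with an extra input x_0 fixed to 1 and weight w_{0i} = -\theta_i, the threshold is absorbed into the sum:

a_i = \sum_{j=1}^{d} w_{ji} x_j - \theta_i = \sum_{j=0}^{d} w_{ji} x_j, \qquad x_0 = 1,\; w_{0i} = -\theta_i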

Topology of artificial neural networks

The topology of the connections among the neurons defines the network class. We will take into consideration only the feed-forward architectures, where the neurons are organized into hierarchical layers and the signal flows in just one direction.

Perceptrons: 2 layers, Input and Output

z_j = g\left( \sum_i w_{ij} x_i - \theta_j \right)

Neural networks and logical operators

[Diagram: inputs 1 and 2 feeding output neuron 3]

OR: w13 = 0.5, w23 = 0.5, θ3 = 0.25

x1 x2 | a3    z3
1  0  |  0.25  1
0  1  |  0.25  1
1  1  |  0.75  1
0  0  | -0.25  0

[Diagram: inputs 1 and 2 feeding output neuron 3]

AND: w13 = 0.5, w23 = 0.5, θ3 = 0.75

x1 x2 | a3    z3
1  0  | -0.25  0
0  1  | -0.25  0
1  1  |  0.25  1
0  0  | -0.75  0

Neural networks and logical operators

[Diagram: inputs 1 and 2 feeding output neuron 3]

NOT(1): w13 = -0.5, w23 = 0.1, θ3 = -0.25

x1 x2 | a3    z3
1  0  | -0.25  0
0  1  |  0.35  1
1  1  | -0.15  0
0  0  |  0.25  1
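The following sketch (my own code, using a step transfer function as in the linear-separability discussion later) checks the three weight settings above:

# Check of the OR, AND and NOT(1) perceptrons with a step transfer function.
def perceptron(x, w, theta):
    a = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if a >= 0 else 0

gates = {
    "OR":     ([0.5, 0.5], 0.25),
    "AND":    ([0.5, 0.5], 0.75),
    "NOT(1)": ([-0.5, 0.1], -0.25),
}
for name, (w, theta) in gates.items():
    print(name, [perceptron((x1, x2), w, theta) for x1 in (0, 1) for x2 in (0, 1)])
# OR     [0, 1, 1, 1]   for inputs (0,0), (0,1), (1,0), (1,1)
# AND    [0, 0, 0, 1]
# NOT(1) [1, 1, 0, 0]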


Supervised artificial neural networks

Feed-forward artificial neural networks can be trained starting from examples with known solutions.

Error function: given a set of examples x^i with known desired outputs d^i, and given a network with parameters w, the squared error is computed from the output of the network z (j runs over the output neurons):

E = \frac{1}{2} \sum_{i,j} \left[ z_j(x^i, w) - d_j^i \right]^2

The training procedure consists in finding the parameters w that minimize the error: iterative minimization algorithms are adopted. However, they do NOT guarantee reaching the global minimum.

Training a perceptron

We consider a differentiable transfer function

g(a) = \frac{1}{1 + e^{-a}}, \qquad g'(a) = \frac{e^{-a}}{(1 + e^{-a})^2} = g(a)\,[1 - g(a)]

Given some initial parameters w:

[Diagram: a perceptron with inputs x_1, x_2 and output neurons z_1, z_2, with desired outputs d^i]

z_j = g(a_j), \qquad a_j = \sum_l w_{lj}\, x_l - \theta_j, \qquad E = \frac{1}{2} \sum_{i,j} \left[ z_j(x^i, w) - d_j^i \right]^2

\frac{\partial E}{\partial w_{lj}} = \sum_i \frac{\partial E}{\partial z_j(x^i, w)} \cdot \frac{\partial z_j(x^i, w)}{\partial a_j(x^i, w)} \cdot \frac{\partial a_j(x^i, w)}{\partial w_{lj}}

with

\frac{\partial E}{\partial z_j(x^i, w)} = z_j(x^i, w) - d_j^i, \qquad \frac{\partial z_j(x^i, w)}{\partial a_j(x^i, w)} = g'(a_j^i), \qquad \frac{\partial a_j(x^i, w)}{\partial w_{lj}} = x_l^i

Deviation: \delta_j^i \equiv \left[ z_j(x^i, w) - d_j^i \right] g'(a_j^i)

Then

\frac{\partial E}{\partial w_{lj}} = \sum_i \left[ z_j(x^i, w) - d_j^i \right] g'(a_j^i)\, x_l^i = \sum_i \delta_j^i\, x_l^i

Using the gradient, we can update the weights with the "steepest descent" procedure:

w_{lj} \leftarrow w_{lj} - \eta \frac{\partial E}{\partial w_{lj}}

\eta is the learning rate. Too low: slow training. Too high: the minima can be missed.

Convergence: \frac{\partial E}{\partial w_{lj}} \rightarrow 0

Training a perceptron

Steepest descent finds the LOCAL minimum of a function by always pointing in the direction that leads downhill.

Objective function f: R^n -> R, with f(x) of class C^2.

Gradient of f: the vector containing all the first-order partial derivatives.

Gradient

Given a function f(x,y) and a level curve f(x,y) = c, the gradient of f is:

\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right)

Consider two points of the curve, (x, y) and (x + \epsilon_x, y + \epsilon_y), for small \epsilon.

The gradient is locally perpendicular to the level curves.

[Diagram: level curve f(x,y) = c, the points (x, y) and (x + \epsilon_x, y + \epsilon_y), and the vectors \vec{\epsilon} and grad(f)]

f(x + \epsilon_x, y + \epsilon_y) \simeq f(x, y) + \frac{\partial f}{\partial x}\Big|_{(x,y)} \epsilon_x + \frac{\partial f}{\partial y}\Big|_{(x,y)} \epsilon_y = f(x, y) + \nabla f\big|_{(x,y)}^T\, \vec{\epsilon}

The local perpendicular to a curve: Gradient

Since both points, (x, y) and (x + \epsilon_x, y + \epsilon_y), satisfy the curve equation:

c \simeq c + \nabla f\big|_{(x,y)}^T\, \vec{\epsilon} \quad \Rightarrow \quad \nabla f\big|_{(x,y)}^T\, \vec{\epsilon} = 0

The gradient is perpendicular to \vec{\epsilon}. For small \vec{\epsilon}, \vec{\epsilon} is parallel to the curve and, as a consequence, the gradient is perpendicular to the curve.

The gradient points towards the direction of maximum increase of f.

Steepest descent finds the LOCAL minimum of a function by always pointing in the direction that leads downhill.

Example: OR

[Diagram: inputs 1 and 2 feeding output neuron 3]

w13 = 0, w23 = 0, θ3 = 0, η = 2

Training examples:

x1 x2 d | a     z    E     dE/dw13 dE/dw23 dE/dθ3
1  0  1 | 0     0.5  0.125 -0.125   0       0.125
0  1  1 | 0     0.5  0.125  0      -0.125   0.125
0  0  0 | 0     0.5  0.125  0       0      -0.125
0  0  0 | 0     0.5  0.125  0       0      -0.125
Total   |            0.5   -0.125  -0.125   0

g(a) = \frac{1}{1 + e^{-a}}, \qquad g'(a) = g(a)\,[1 - g(a)] = z\,(1 - z)

\frac{\partial E}{\partial w_{lj}} = \sum_i \left[ z_j(x^i) - d_j^i \right] g'(a_j^i)\, x_l^i = \sum_i \delta_j^i\, x_l^i
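The following sketch (my own code, not part of the slides) implements this update with η = 2 and reproduces the weights shown in the step tables that follow:

# One steepest-descent step for the OR perceptron of this example.
import math

def g(a):
    return 1.0 / (1.0 + math.exp(-a))

# Training set used on the slides: (1,0)->1, (0,1)->1 and (0,0)->0 twice.
examples = [((1, 0), 1), ((0, 1), 1), ((0, 0), 0), ((0, 0), 0)]

def train_step(w13, w23, theta3, eta=2.0):
    gw13 = gw23 = gtheta = 0.0
    for (x1, x2), d in examples:
        a = w13 * x1 + w23 * x2 - theta3
        z = g(a)
        delta = (z - d) * z * (1 - z)   # deviation: delta = (z - d) g'(a)
        gw13 += delta * x1              # dE/dw13
        gw23 += delta * x2              # dE/dw23
        gtheta += -delta                # dE/dtheta3 (a depends on -theta3)
    return w13 - eta * gw13, w23 - eta * gw23, theta3 - eta * gtheta

w13 = w23 = theta3 = 0.0
for step in range(3):
    w13, w23, theta3 = train_step(w13, w23, theta3)
    print(step + 1, round(w13, 3), round(w23, 3), round(theta3, 3))
# 1 0.25  0.25  0.0
# 2 0.466 0.466 0.069
# 3 0.659 0.659 0.164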

Example: OR, Step 1

[Diagram: inputs 1 and 2 feeding output neuron 3]

w13 = 0.25, w23 = 0.25, θ3 = 0, η = 2

Training examples:

x1 x2 d | a     z     E     dE/dw13 dE/dw23 dE/dθ3
1  0  1 | 0.25  0.56  0.096 -0.108   0       0.108
0  1  1 | 0.25  0.56  0.096  0      -0.108   0.108
0  0  0 | 0     0.5   0.125  0       0      -0.125
0  0  0 | 0     0.5   0.125  0       0      -0.125
Total   |             0.442 -0.108  -0.108  -0.035

Example: OR, Step 2

[Diagram: inputs 1 and 2 feeding output neuron 3]

w13 = 0.466, w23 = 0.466, θ3 = 0.069, η = 2

Training examples:

x1 x2 d | a      z     E     dE/dw13 dE/dw23 dE/dθ3
1  0  1 |  0.397 0.598 0.081 -0.097   0       0.097
0  1  1 |  0.397 0.598 0.081  0      -0.097   0.097
0  0  0 | -0.069 0.483 0.117  0       0      -0.121
0  0  0 | -0.069 0.483 0.117  0       0      -0.121
Total   |              0.395 -0.097  -0.097  -0.048

Example: OR, Step 3

[Diagram: inputs 1 and 2 feeding output neuron 3]

w13 = 0.659, w23 = 0.659, θ3 = 0.164, η = 2

Training examples:

x1 x2 d | a      z     E     dE/dw13 dE/dw23 dE/dθ3
1  0  1 |  0.494 0.621 0.072 -0.089   0       0.089
0  1  1 |  0.494 0.621 0.072  0      -0.089   0.089
0  0  0 | -0.164 0.459 0.105  0       0      -0.114
0  0  0 | -0.164 0.459 0.105  0       0      -0.114
Total   |              0.354 -0.089  -0.089  -0.05

Generalization

[Diagram: inputs 1 and 2 feeding output neuron 3]

w13 = 0.659, w23 = 0.659, θ3 = 0.164, η = 2

And what happens for the input (1,1)?

x1 x2 d | a     z
1  1  1 | 1.153 0.760

The network generalized the rules learned from known examples
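A quick check of this value (my addition, reusing g from the training sketch above):

# Generalization check with the step-3 parameters:
a = 0.659 + 0.659 - 0.164   # activation for input (1, 1)
print(round(g(a), 2))       # ~0.76, i.e. close to the OR output 1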

Linear separability

Given a step-like transfer function, the output neuron of a perceptron is activated if the activation is positive:

a \geq 0 \quad \Longleftrightarrow \quad \sum_{i=1}^{d} w_i x_i - \theta \geq 0

The input space is then divided into two regions.

If the requested mapping cannot be separated by a hyperplane, the perceptron is insufficient.

Linear separability

[Figure: input planes for AND, OR and NOT(1), each separable by a line, and for XOR, which is not]

The XOR problem cannot be solved with a perceptron.

Multi-layer feed-forward neural networks

Neurons are organized into hierarchical layers

Each layer receives its inputs from the previous one and transmits its output to the next one.

[Diagram: input layer x, hidden layer with weights w^1_{ij}, output layer with weights w^2_{ij}]

z^1_j = g\left( \sum_i w^1_{ij} x_i - \theta^1_j \right), \qquad z^2_j = g\left( \sum_i w^2_{ij} z^1_i - \theta^2_j \right)

XOR

[Diagram: inputs x_1, x_2, hidden neurons 1^(1) and 2^(1), output neuron 1^(2), with weights w^1_{11}, w^1_{21}, w^1_{12}, w^1_{22}, w^2_{11}, w^2_{21}]

w^1_{11} = 0.7, w^1_{21} = 0.7, θ^1_1 = 0.5
w^1_{12} = 0.3, w^1_{22} = 0.3, θ^1_2 = 0.5
w^2_{11} = 0.7, w^2_{21} = -0.7, θ^2_1 = 0.5

x1 x2 | a^1_1  z^1_1 | a^1_2  z^1_2 | a^2_1  z^2_1
0  0  | -0.5   0     | -0.5   0     | -0.5   0
1  0  |  0.2   1     | -0.2   0     |  0.2   1
0  1  |  0.2   1     | -0.2   0     |  0.2   1
1  1  |  0.9   1     |  0.1   1     | -0.5   0

The hidden layer maps the input into a new representation that is linearly separable.

Input | Desired output | Activation of hidden neurons
0 0   | 0              | 0 0
1 0   | 1              | 1 0
0 1   | 1              | 1 0
1 1   | 0              | 1 1
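A minimal sketch of this XOR network (my own code, using a step transfer function):

def step(a):
    return 1 if a >= 0 else 0

def xor_net(x1, x2):
    # Hidden layer
    z1_1 = step(0.7 * x1 + 0.7 * x2 - 0.5)   # active when at least one input is on
    z1_2 = step(0.3 * x1 + 0.3 * x2 - 0.5)   # active only when both inputs are on
    # Output layer
    return step(0.7 * z1_1 - 0.7 * z1_2 - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
# 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0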


Training of multilayer network: Back-propagation

[Diagram: two-layer network with weights w^1_{ij} (input to hidden) and w^2_{ij} (hidden to output)]

For layer 2, the perceptron formula holds, upon the substitution x_l \rightarrow z^1_{l}:

\frac{\partial E}{\partial w^2_{lj}} = \sum_i \left[ z_j(x^i, w) - d_j^i \right] g'(a^2_{j,i})\, z^1_{l,i} = \sum_i \delta^2_{j,i}\, z^1_{l,i}

For layer 1:

\frac{\partial E}{\partial w^1_{lj}} = \sum_i \frac{\partial E}{\partial a^1_{j,i}} \cdot \frac{\partial a^1_{j,i}}{\partial w^1_{lj}} = \sum_i \delta^1_{j,i}\, x^i_l

\frac{\partial E}{\partial a^1_{j,i}} = \sum_k \frac{\partial E}{\partial a^2_{k,i}} \cdot \frac{\partial a^2_{k,i}}{\partial a^1_{j,i}} = \sum_k \delta^2_{k,i}\, \frac{\partial a^2_{k,i}}{\partial a^1_{j,i}}

Since a^2_{k,i} = \sum_m w^2_{mk}\, g(a^1_{m,i}) - \theta^2_k, we have \frac{\partial a^2_{k,i}}{\partial a^1_{j,i}} = g'(a^1_{j,i})\, w^2_{jk}.

Defining \delta^1_{j,i} \equiv \frac{\partial E}{\partial a^1_{j,i}}:

\delta^1_{j,i} = g'(a^1_{j,i}) \sum_k \delta^2_{k,i}\, w^2_{jk}

Training of multilayer network: Back-propagation

1) Compute z_l for each example (feed-forward step);

2) Compute the deviation on the output layer, \delta^2_l;

3) Compute the deviation on the hidden layer, \delta^1_j;

4) Compute the gradient of the error with respect to the weights;

5) Update the weights with the steepest-descent method (a sketch in code follows below).

[Diagram: the network, from Input to Output]

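A minimal back-propagation sketch (my own code, not from the slides; the array shapes and the XOR usage at the end are illustrative assumptions), following the five steps above:

import numpy as np

def g(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, d, w1, th1, w2, th2, eta=0.5):
    """One steepest-descent update on a batch of examples x with targets d."""
    # 1) Feed-forward
    a1 = x @ w1 - th1          # hidden activations, shape (n_examples, n_hidden)
    z1 = g(a1)
    a2 = z1 @ w2 - th2         # output activations
    z2 = g(a2)
    # 2) Deviation on the output layer: delta2 = (z2 - d) g'(a2)
    delta2 = (z2 - d) * z2 * (1.0 - z2)
    # 3) Deviation on the hidden layer: delta1 = g'(a1) * sum_k delta2_k w2_jk
    delta1 = z1 * (1.0 - z1) * (delta2 @ w2.T)
    # 4) Gradients of E with respect to the weights and the thresholds
    gw2, gth2 = z1.T @ delta2, -delta2.sum(axis=0)
    gw1, gth1 = x.T @ delta1, -delta1.sum(axis=0)
    # 5) Steepest-descent update
    return w1 - eta * gw1, th1 - eta * gth1, w2 - eta * gw2, th2 - eta * gth2

# Usage sketch: a 2-2-1 network trained on XOR from a random initialization.
rng = np.random.default_rng(0)
x = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
d = np.array([[0.], [1.], [1.], [0.]])
w1, th1 = rng.normal(size=(2, 2)), np.zeros(2)
w2, th2 = rng.normal(size=(2, 1)), np.zeros(1)
for _ in range(20000):
    w1, th1, w2, th2 = backprop_step(x, d, w1, th1, w2, th2)
print(g(g(x @ w1 - th1) @ w2 - th2).round(2))
# often close to [[0], [1], [1], [0]]; as noted earlier, reaching the global
# minimum is not guaranteed, so some initializations may fail.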

What does a neural network learn?

Consider the ideal case, consisting of a continuous set of examples x, each one appearing with frequency P(x). The desired solutions d are associated to the input with probability P(d | x).

E = \frac{1}{2} \sum_j \int \left[ z_j(x, w) - d_j \right]^2 P(d_j | x)\, P(x)\, \mathrm{d}d_j\, \mathrm{d}x

Training, after convergence (functional derivative):

\frac{\delta E}{\delta z_j(x, w)} = 0

0 = \int \left[ z_j(x', w) - d_j \right] P(d_j | x')\, P(x')\, \delta(x - x')\, \mathrm{d}x'\, \mathrm{d}d_j \quad \Longrightarrow \quad z_j(x, w) = \int d_j\, P(d_j | x)\, \mathrm{d}d_j

The activation state of the j-th output neuron is equal to the average of the solutions associated to the input x in the training set.
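A tiny numerical check of this statement (my addition): if the same input appears with several different targets, the output value that minimizes the squared error is their average.

import numpy as np

targets = np.array([0.0, 1.0, 1.0])               # targets seen for one input
z = np.linspace(0, 1, 1001)                       # candidate output values
errors = 0.5 * ((z[:, None] - targets) ** 2).sum(axis=1)
print(z[errors.argmin()], targets.mean())         # both ~0.667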

Neural Networks for classification and regression

Networks can be used for classification or for regression

In regression: desired outputs are real numbers.
In classification: desired outputs are 0 or 1.

Error function:

E = \frac{1}{2} \sum_{i,j} \left[ z_j(x^i, w) - y_j^i \right]^2

Increasing the number of hidden neurons increases the number of parameters and therefore increases the risk of overfitting the learning data.

Neural Networks and overfitting

1) Be sure that the number of parameters is far lower than the number of points to learn

(What is the number of parameters of a network with n inputs, k outputs and r hidden neurons?)

2) Use regularizers (if possible). E.g.

E = \frac{1}{2} \sum_{i,j} \left[ z_j(x^i, w) - y_j^i \right]^2 + \lambda \sum_{k,ij} \left( w^k_{ij} \right)^2

Many other formulations are possible.
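As a remark (my addition, using the \lambda written above): the regularizer only adds a term proportional to the weight itself to each gradient component, so the steepest-descent update also shrinks the weights ("weight decay"):

\frac{\partial E}{\partial w^k_{ij}} = \left( \text{data term} \right) + 2 \lambda\, w^k_{ij}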

3) Always use an independent test set to decide when to stop the training [EARLY STOPPING]

(and then validate the method on a third independent set)

[Figure: training and test error during training; the test error has a minimum at the iteration where the training should be stopped ("Stop the training at this iteration")]
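A minimal early-stopping sketch (my own code; update and test_error are hypothetical callables standing for one training step and the error on the independent test set):

def train_with_early_stopping(params, update, test_error, patience=20):
    """Keep the parameters with the lowest test-set error; stop when it no longer improves."""
    best_params, best_err, bad_steps = params, test_error(params), 0
    while bad_steps < patience:
        params = update(params)        # one steepest-descent step
        err = test_error(params)       # error on the independent test set
        if err < best_err:
            best_params, best_err, bad_steps = params, err, 0
        else:
            bad_steps += 1             # no improvement on the test set
    return best_params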

Can we add more layers?

Back-propagation is not suitable for training networks as the number of layers increases: DEEP LEARNING PROCEDURES ARE NEEDED.

Stuttgart Neural Network Simulator (SNNS): http://www.ra.cs.uni-tuebingen.de/SNNS/

OpenNN: http://www.opennn.net/

THEANO: http://deeplearning.net/software/theano/

More on: https://grey.colorado.edu/emergent/index.php/Comparison_of_Neural_Network_Simulators