Transcript of "Neural Networks" – Sven Hellbach, Technische Universität Ilmenau
Self-Organization in Communication Systems (WS 2009), 45 pages

Page 1: whoami

• Dipl.-Inf. Sven Hellbach, [email protected]
• Finished Ph.D. thesis on motion analysis, with a focus on prediction and classification

Page 2: Agenda

• What is a neuron? (with a little biological background)
• How does this lead to complex networks?
• Which kinds of network topologies exist, and what are they made for? Just to name a few:
  – Recurrent Neural Networks (RNN)
  – Kohonen Networks
    • Self-organizing feature maps
    • Neural Gas
• How are these networks adapted to the problem?

Page 3: Structure and function of the principal sections of a neuron

Sections of a neuron and their functions (from a data-processing point of view):

1. Synapses – Directed information transfer & memory
   – Adaptive analog switch
   – D-to-A conversion of the pre-synaptic spike sequence into analog membrane potentials

2. Dendrite – Spatio-temporal integration & transfer
   – Non-linear spatio-temporal superposition of the local dendrite membrane potentials
   – Transfer of the local membrane potentials (including loss, delay and latency effects)

3. Soma – Global spatio-temporal integration
   – Integration of the individual dendrite potentials into the soma membrane potential

4. Axon hillock – Interference-safe recoding
   – A-to-D conversion of the soma potential into spike sequences
   – Information coding in spike rate & spike timing

5. Axon – Information routing and distribution
   – Safe, loss-free and energy-optimal routing
   – Active spatio-temporal recoding

Page 4: Modeling the Neuron

Remark:
• Up to now, a large number of neuron models have been developed
• They differ in complexity and in fidelity to the biological archetype
• Important models are:

1. Compartment model neurons (skipped in this lecture)
2. Integrate & fire neurons
3. Leaky integrators (dynamical rate neurons)
4. Static rate neurons

[Diagram: the four models ordered along two axes, "practical relevance" and "relation to biology"]

Page 5: The Integrate & Fire (I&F) Neuron

[Figure: I&F neuron – input spikes arrive at the synapses (weights $w_{ij}$) of the dendrites and produce weighted postsynaptic potentials $z_{ij}(t)$; the soma integrates them and, once the threshold is reached, the axon emits output spikes, followed by an afterhyperpolarization $z_{ahp}(t)$.]

• Synapses are modeled as linear transfer elements
• Each spike (Dirac impulse) leads to an impulse response at the synapse, shaped by the weighting function g(t): a local postsynaptic potential $z_{ij}(t)$
• The weighting function g(t) is usually approximated by an alpha function

Page 6: Illustration of the temporal convolution

• Demo on the calculation of the PSP at a synapse of an I&F neuron

[Figure: a spike train x(t) over the time window from t = −kΔT to t = 0 is convolved with the kernel g(t); the superposition of the shifted impulse responses yields the postsynaptic potential $z_i(t)$.]
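To make the convolution concrete, here is a minimal numerical sketch (not from the slides; the time constant, weight and spike times are illustrative assumptions) that superimposes alpha-function responses for a small spike train:

```python
import numpy as np

dt = 1e-4                      # time step [s]
t = np.arange(0.0, 0.1, dt)    # 100 ms simulation window
tau = 5e-3                     # synaptic time constant (illustrative)

# Alpha function g(t) = (t/tau) * exp(1 - t/tau), peak normalized to 1
g = (t / tau) * np.exp(1.0 - t / tau)

# Spike train: Dirac impulses approximated as 1/dt at the spike times
spike_times = [0.010, 0.030, 0.035]
x = np.zeros_like(t)
x[np.round(np.array(spike_times) / dt).astype(int)] = 1.0 / dt

w_ij = 0.8                     # synaptic weight (illustrative)
# Each spike triggers an impulse response g(t - t_spike); their sum is the
# temporal convolution z_ij(t) = w_ij * (x * g)(t)
z_ij = w_ij * np.convolve(x, g)[: len(t)] * dt
```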

Page 7: Leaky Integrator

[Figure: leaky-integrator neuron – the inputs $x_1(t) \ldots x_n(t)$ and the output $y_i(t)$ are average spike rates. Each synapse j forms a local potential $z_{ij}(t) = w_{ij}\, x_j(t) * g(t)$ on the dendrite; the soma combines them via the activation function, and the axon hillock applies the output function.]

Soma (activation function): $z_i(t) = \sum_{j=1}^{n} z_{ij}(t)$

Axon hillock (output function): $y_i(t) = f(z_i(t))$

• Input and output are interpreted as temporally changing average spike rates
• The model focuses on the decay of the alpha function, leaving out its steep ascent
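A minimal simulation sketch of these dynamics, assuming the first-order (PT1) leaky-integrator form $\tau \dot{z} = -z + \sum_j w_{ij} x_j$ suggested by the slides; the parameter values and the random input rates are illustrative:

```python
import numpy as np

def simulate_leaky_integrator(x, w, tau=0.02, dt=1e-3):
    """x: (T, n) input rates, w: (n,) weights; returns the activation z(t)."""
    z = np.zeros(len(x))
    for k in range(1, len(x)):
        drive = w @ x[k - 1]                            # weighted input sum
        z[k] = z[k - 1] + dt / tau * (drive - z[k - 1])  # Euler step of PT1 dynamics
    return z

def fermi(z, c=1.0):
    """Sigmoid output function turning the activation into an average rate."""
    return 1.0 / (1.0 + np.exp(-c * z))

T, n = 500, 3
x = np.abs(np.random.randn(T, n))            # non-negative "spike rates"
w = np.array([0.5, -0.2, 0.8])               # illustrative weights
y = fermi(simulate_leaky_integrator(x, w))   # average output spike rate y_i(t)
```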

Page 8: The Static Rate Neuron

[Figure: static rate neuron – the inputs $x_1 \ldots x_n$ and the output $y_i$ are average spike rates. The synapses/dendrite compute the activation or state function $z_i = f(\mathbf{x}, \mathbf{w}_i)$ (soma), and the axon hillock applies the output function $y_i = f(z_i)$.]

• Derived from the dynamical rate neuron
• Neglects the temporal functionality of the neuron, i.e. the temporal change caused by the input signals

Page 9: Intermission

What have we learned?
• Three types of neurons motivated by the biological archetype, the human brain:
  1. Integrate & fire neurons
  2. Leaky integrators (dynamical rate neurons)
  3. Static rate neurons

What's yet unknown?
• Activation function
• Output function

Page 10: Activation or State function

Definition: linear or non-linear mapping of the input vector x onto the activation $z_i$ of neuron i, by use of the weight vector $\mathbf{w}_i$. The inner product activation is a globally operating activation function.

a) Inner product neuron:

$z_i(\mathbf{x}, \mathbf{w}_i) = net_i = \mathbf{w}_i^T \mathbf{x} = \sum_{j=1}^{n} w_{ij} x_j = |\mathbf{w}_i| \cdot |\mathbf{x}| \cdot \cos\gamma$

[Figure: in the $(x_1, w_1)$-$(x_2, w_2)$ plane, the input vector $\mathbf{x}_p$ encloses the angle γ with the weight vector $\mathbf{w}_i$; the activation corresponds to the projection of $\mathbf{x}_p$ onto $\mathbf{w}_i$.]
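A minimal sketch checking the two equivalent forms of the inner product activation; the example vectors are arbitrary:

```python
import numpy as np

w_i = np.array([1.0, 2.0])     # weight vector (illustrative)
x = np.array([3.0, 1.0])       # input vector (illustrative)

z_i = w_i @ x                  # z_i = w_i^T x = sum_j w_ij * x_j

# The same value via |w||x|cos(gamma)
cos_gamma = (w_i @ x) / (np.linalg.norm(w_i) * np.linalg.norm(x))
z_geom = np.linalg.norm(w_i) * np.linalg.norm(x) * cos_gamma

assert np.isclose(z_i, z_geom)  # both forms agree
```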

Page 11: Geometric View on the Inner Product Neuron

Question: which type of mathematical function is realized by an inner product neuron in $\mathbb{R}^n$?

→ Visualizing the mapping in an input-output coordinate system leads to a simple plane, e.g. for $w_1 = w_2 = 1$:

$z = w_1 x_1 + w_2 x_2 = 1 \cdot x_1 + 1 \cdot x_2$

[Figure: the plane $z = x_1 + x_2$ over the $(x_1, x_2)$ plane; its intersection with $z = 0$ is the separation line.]

• $z = 0$ → separation line
• Infinite activation plane → global operation
• Orientation and steepness of the plane are controlled by $w_1$ and $w_2$
• Translation of the plane in z-direction is realized by a threshold T: $z = w_1 x_1 + w_2 x_2 - T$
• Determination of the parameters is done by iterative teaching (it can also be done analytically)

Page 12: Inner product based activation

b) Sigma-Pi (ΣΠ) activation:

$z_i(\mathbf{x}, \mathbf{w}_i) = \sum_{j=1}^{n} w_{ij} \prod_{u=1}^{p} x_{j_u}$

[Figure: a Sigma-Pi neuron – the inputs $x_1 \ldots x_n$ are first combined multiplicatively in groups $x_{j_1}, x_{j_2}, \ldots, x_{j_k}$; the products are then weighted and summed into $y_i$.]

• Biological archetype: presynaptic modulation
• Allows a solution for non-linearly separable classification problems, e.g. XOR using the product term: $y = z = x_1 + x_2 - 2 x_1 x_2$

x1  x2  x1·x2  y=z
0   0   0      0
0   1   0      1
1   0   0      1
1   1   1      0
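A minimal sketch verifying the truth table above; note that the product coefficient 2 is what makes the expression reproduce XOR rather than OR:

```python
# Sigma-Pi activation with a single product term, checked against XOR
for x1 in (0, 1):
    for x2 in (0, 1):
        y = x1 + x2 - 2 * x1 * x2   # weighted sum plus weighted product
        assert y == (x1 ^ x2)       # matches the XOR truth table
        print(x1, x2, x1 * x2, y)
```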

Page 13: Distance-Based Activation – a Locally Operating Activation Function

Vector norm: the common p- (or L-) norm of $\mathbf{d} = \mathbf{x} - \mathbf{w}$:

$\|\mathbf{d}\|_p = \left( \sum_{j=1}^{n} |d_j|^p \right)^{1/p}$

– Sum-of-absolutes or Manhattan norm (L = p = 1): $\|\mathbf{d}\|_1 = \sum_{j=1}^{n} |d_j|$
– Euclidean norm (L = p = 2): $\|\mathbf{d}\|_2 = \sqrt{\sum_{j=1}^{n} d_j^2}$
– Maximum norm (L = p = ∞): $\|\mathbf{d}\|_\infty = \max_{1 \le j \le n} |d_j|$

Example: [Figure: the distance $\mathbf{d} = \mathbf{x} - \mathbf{w}$ between input x and weight vector w in the $(x_1, w_1)$-$(x_2, w_2)$ plane, measured with the L1, L2 and L∞ norms.]
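A minimal sketch of the three norms applied to a difference vector d = x − w; the example vectors are arbitrary:

```python
import numpy as np

x = np.array([2.0, -1.0, 3.0])
w = np.array([1.0, 1.0, 1.0])
d = x - w                              # d = [1, -2, 2]

l1 = np.linalg.norm(d, ord=1)          # Manhattan norm: sum of absolutes
l2 = np.linalg.norm(d, ord=2)          # Euclidean norm
linf = np.linalg.norm(d, ord=np.inf)   # maximum norm
print(l1, l2, linf)                    # 5.0, 3.0, 2.0
```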

Page 14: Output function

Basic idea: how does the output activity $y_i$ (average spike rate) behave depending on the internal activation $z_i$?

[Figure: the static rate neuron again – the synapses/dendrite compute the activation or state function $z_i = f(\mathbf{x}, \mathbf{w}_i)$ (soma), the axon hillock applies the output function $y_i = f(z_i)$; inputs and output are average spike rates.]

Page 15: Overview of Significant Output Functions

[Figure: plots of $f(z_i)$ over $z_i$ for six output functions:]

1. Identity function
2. Ramp function (with thresholds T1, T2)
3. Step function
4. Fermi function (with slopes c > 1, c = 1, c < 1)
5. Hyperbolic tangent
6. Gauss function
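A minimal sketch of the six output functions as vectorized Python functions; the thresholds, slope and width parameters are illustrative assumptions:

```python
import numpy as np

identity = lambda z: z
ramp     = lambda z, t1=-1.0, t2=1.0: np.clip((z - t1) / (t2 - t1), 0.0, 1.0)
step     = lambda z, t=0.0: (z >= t).astype(float)
fermi    = lambda z, c=1.0: 1.0 / (1.0 + np.exp(-c * z))       # logistic sigmoid
tanh_f   = np.tanh                                             # hyperbolic tangent
gauss    = lambda z, sigma=1.0: np.exp(-z**2 / (2.0 * sigma**2))

z = np.linspace(-3, 3, 7)
for f in (identity, ramp, step, fermi, tanh_f, gauss):
    print(f(z))                        # sample each output function
```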

Page 16: Overview of Neuron Types

• Static rate neuron: inner product activation, linear output
• Static rate neuron: inner product activation, binary output
• Static rate neuron: inner product activation, output: sigmoid function
• Static rate neuron: distance- or norm-based activation function, output: Gauss function
• Static rate neuron with time window: inner product activation, output: sigmoid function

Page 17: Overview of Significant Neuron Types (2)

• Dynamic rate neuron: inner product activation, leaky integrator (PT1 behavior), linear output
• Dynamic rate neuron: inner product activation, leaky integrator (PT1 behavior), output: sigmoid function
• Integrate-and-fire (spiking) neuron: inner product activation, leaky integrator (PT1 behavior), output: delta impulses (spikes)

Page 18: The Static Rate Neuron as Linear Classifier

Goal:
1. Search for a separation plane (class border) which discriminates the set of data points
2. Automatic classification of unknown data points by determining their position relative to the separation plane

Prerequisites:
• Separation of the two classes has to be possible by a line (plane) → linear separability

Determining the separation plane:
1. Teaching of the neuron with a known training data set

[Figure: data points of two classes in the $(x_1, x_2)$ plane, e.g. length and width of a cell; the neuron's binary output (+1 / −1) linearly separates two regions of the input space by a separation line.]
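A minimal teaching sketch using the classic perceptron rule, one possible iterative scheme (the slides do not fix a specific rule); the two-cluster data set is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.where(rng.random(n) < 0.5, 1.0, -1.0)           # class labels in {-1, +1}
X = rng.normal(scale=0.5, size=(n, 2)) + 2.0 * t[:, None]  # two separable clusters

w = np.zeros(2)          # weight vector
T = 0.0                  # threshold
for _ in range(20):      # a few passes over the training set
    for x_p, t_p in zip(X, t):
        y = 1.0 if w @ x_p - T >= 0 else -1.0          # step output function
        if y != t_p:                                   # error correction step
            w += 0.1 * t_p * x_p
            T -= 0.1 * t_p
# After teaching, w and T define the separation line w1*x1 + w2*x2 - T = 0
```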

Page 19: Intermission

What have we learned?
• An n-D input can be transformed into a 1-D output by use of specific functions
• Small problems can already be solved

What's yet unknown?
• What about larger problems?
• How to build larger structures?

Page 20: Anatomy of the brain

[Figure: anatomy of the brain]

Page 21: Models for Neural Networks

Task: connect known neurons in an appropriate way to generate neural networks.

Common network topologies:
• Homogeneous vs. structured networks
• Completely vs. partially connected networks
• Feed-forward vs. recurrent networks
• Networks with constant vs. variable number of neurons

Page 22: Homogeneous vs. Structured Networks

Homogeneous networks:
• Very formal and regular structure
• Occur as feed-forward or recurrent networks
• Particular knowledge about local connections or relations is not considered
• Only very few analogies in the biological archetype

[Figure: feed-forward networks (e.g. MLP) map x to y through layers; recurrent networks (e.g. Hopfield nets) feed activity back.]

Page 23: Homogeneous vs. Structured Networks (2)

Structured networks:
• Allow the integration of structural knowledge about specific local connections or relations
• Hence, they show a strongly structured configuration
• Occur as feed-forward or recurrent networks
• Clear and plausible biological background

Example: [Figure: local connections of the pyramidal cells in the visual cortex (layers 2 and 3)]

Page 24: Homogeneous Feed-Forward Networks

Properties and application:
• Neurons are ordered in layers
• Only consecutive layers are connected, typically completely
• Network activity is propagated in a unidirectional way (forward) from the input layer over the hidden layers to the output layer
• Usually applied as hetero-associative or auto-associative memory, e.g. for classification or function approximation tasks
• The number of layers determines the mapping abilities of the network (see the sketch below)

[Figure: a layered feed-forward network mapping x to y]
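A minimal forward-propagation sketch for such a network; the layer sizes, the random weights and the sigmoid non-linearity are illustrative assumptions:

```python
import numpy as np

def fermi(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [4, 6, 2]                                # input, hidden, output layer
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

def forward(x):
    a = x
    for W in weights:        # only consecutive layers are connected
        a = fermi(W @ a)     # unidirectional (forward) propagation
    return a

y = forward(rng.normal(size=4))
```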

Page 25: Separation tasks with multi-layer feed-forward networks (using Σ-neurons) in R²

Number of network layers → available decision regions:

1 layer:  half-plane, delimited by a hyper-plane (in R² a separation line)
2 layers: convex infinite or finite regions
3 layers: arbitrary regions (complexity only limited by the number of neurons)

[Table figure: for each depth, the decision regions are illustrated graphically in R² on example 1 (the XOR problem, classes A and B) and example 2 (a concave data distribution).]

Page 26: Representing Time: The Sliding Window Technique

• Usage of a moving time window to detect local temporal relationships (causalities)
• Biological motivation: different signal delays at the neuronal dendrite
• Instead of a single pattern $\mathbf{x}(t)^T = (x_1, x_2, \ldots, x_j, \ldots, x_n)$, a time window with delay k is applied to the temporally changing input pattern:

$\mathbf{X} = \big(\mathbf{x}(t)^T, \; \ldots, \; \mathbf{x}(t - k\Delta T)^T\big)$

[Figure: the delayed copies $x_j(t), x_j(t - \Delta T), \ldots, x_j(t - k\Delta T)$ of input j feed the network in parallel.]
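A minimal sketch that builds the window matrix X from a stored signal history; array sizes and the storage convention (newest row last) are illustrative assumptions:

```python
import numpy as np

def time_window(x_hist, k):
    """x_hist: (T, n) array of input patterns x(t), newest row last.
    Returns the (k+1, n) window [x(t); x(t-dT); ...; x(t-k*dT)]."""
    return x_hist[-1 : -(k + 2) : -1]   # last k+1 rows, newest first

T, n, k = 50, 3, 4
x_hist = np.random.randn(T, n)          # synthetic signal history
X = time_window(x_hist, k)              # shape (5, 3)
```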

Page 27: Representing Time: The Sliding Window Technique (2)

Basic idea: explicit or external representation of the signal or pattern history by sliding a time window of delay kΔT over the signal x(t).

→ Time-Delay Neural Nets (TDNN)

Page 28: Homogeneous Complete Recurrent Networks

Properties and application:
• Each neuron is connected with all other neurons of the network
• Applied as auto-associative memory for pattern memorization
• Allows recognition of noisy or incomplete input patterns

[Figure: a completely recurrent network mapping x to y]

Page 29: Partially Recurrent Networks

Properties and application:
• A subset of the neurons couples its activity back into the network
• Commonly applied for representing and processing temporal dependencies within the data (e.g. signal patterns, video sequences)
• Usage of dynamical neurons as context neurons
• Examples for partially recurrent networks: Elman network and Jordan network

[Figure: Elman and Jordan networks, each with input layer, hidden layer and output layer, plus context cells coupled back with fixed weights w = 1.]

Page 30: Neuro-Oscillators

• Neuro-oscillators as a concept for generating temporally changing neuron activity (variable average fire rates) to realize synchronization processes without the application of I&F neurons
• Based on a group of coupled dynamic rate neurons which excite or inhibit each other, e.g. the Wilson-Cowan oscillator (1973)

Two coupled formal dynamical rate neurons u and v yield a system of non-linear 1st-order differential equations:

$\tau_u \dot{z}_u(t) = -z_u(t) + w_{uu} y_u(t) - w_{uv} y_v(t) + w_I I(t)$
$\tau_v \dot{z}_v(t) = -z_v(t) - w_{vv} y_v(t) + w_{vu} y_u(t)$

with the sigmoid outputs

$y_u(t) = \frac{1}{1 + e^{-c\, z_u(t)}}, \qquad y_v(t) = \frac{1}{1 + e^{-c\, z_v(t)}}$

[Figure: neuron u excites itself via $w_{uu}$ and drives v via $w_{vu}$; v inhibits itself via $w_{vv}$ and u via $w_{uv}$; the external input I enters u via $w_I$.]
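A minimal sketch that Euler-integrates the two coupled equations above; the weights, time constants and sigmoid slope are illustrative assumptions, and whether the pair actually oscillates depends on that choice:

```python
import numpy as np

def sigmoid(z, c=5.0):
    return 1.0 / (1.0 + np.exp(-c * z))

tau_u = tau_v = 0.01                           # time constants (illustrative)
w_uu, w_vv, w_uv, w_vu, w_I = 1.5, 1.0, 2.0, 2.0, 1.0
I, dt, steps = 1.0, 1e-4, 20000

z_u = z_v = 0.0
trace = []
for _ in range(steps):
    y_u, y_v = sigmoid(z_u), sigmoid(z_v)
    z_u += dt / tau_u * (-z_u + w_uu * y_u - w_uv * y_v + w_I * I)
    z_v += dt / tau_v * (-z_v - w_vv * y_v + w_vu * y_u)
    trace.append((y_u, y_v))                   # average fire rates over time
```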

Page 31: The Buhmann Oscillator – Simulation

[Figure: a Buhmann oscillator with units u and v, coupling weights 1, 1, w, (1−w)/2 and 1/2, driven by I = 1. The simulation shows a typical oscillation of the activities and, in the (u, v) phase plane, a limit cycle attractor.]

Page 32: Echo State Networks

[Figure: an echo state network – the input layer x(1) … x(i) feeds via $W_{in}$ into the hidden layer, a randomly initialized "dynamic reservoir" of recurrently connected neurons (weights W); the output layer $y_{out}(1) \ldots y_{out}(j)$ is read out via $W_{out}$, with optional feedback weights $W_{back}$. $W_{in}$ and W are constant, $W_{back}$ is optional; only $W_{out}$ is trainable.]

H. Jaeger, H. Haas: Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless telecommunication. Science, 2004.

Page 33: Dynamic Reservoir

• Weights in the reservoir can be set to zero, which corresponds to the non-existence of the corresponding connection
• Advantage of such a sparse weight matrix for the reservoir: formation of independent sub-networks and hence uncorrelated dynamics, which are linearly combined into the output
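A minimal echo state network sketch: a sparse random reservoir is driven by the input, and only the linear read-out is trained (here by least squares). The sizes, sparsity, spectral-radius scaling and the one-step prediction task are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, T = 1, 100, 1000

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W[rng.random((n_res, n_res)) < 0.9] = 0.0        # sparse reservoir matrix
W *= 0.9 / max(abs(np.linalg.eigvals(W)))        # scale spectral radius below 1

u = np.sin(np.linspace(0, 20 * np.pi, T))[:, None]  # input signal
target = np.roll(u, -1)                          # task: one-step prediction

x = np.zeros(n_res)
states = np.zeros((T, n_res))
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])             # reservoir state update
    states[t] = x

W_out = np.linalg.lstsq(states, target, rcond=None)[0]  # train read-out only
y = states @ W_out                               # network output
```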

Page 34: Intermission

What have we learned?
• How to build complex networks by combining simple neurons
• These networks can solve classification and regression tasks in the spatial as well as the temporal domain

What's yet unknown?
• There is a large set of parameters (weights): how do we determine the right values for our problem?

Page 35: Learning with Neural Networks

Basic kinds of learning:

a) Learning of the model structure
  1. Incremental learning paradigm
     – Insertion of new neurons
     – Insertion of new synaptic connections or topology connections between the neurons
     – Examples: Growing Neural Gas, Dynamic Cell Structures (neurons with local activation function); Cascade Correlation (neurons with global activation function)
  2. Decremental learning paradigm
     – Deletion of neurons
     – Deletion of existing synaptic connections or topology connections

b) Learning of the model parameters
  – The parameters define the placement of the hyper-planes / receptive fields, as well as their form and extent
  – Modification of the synaptic weights of the neurons
  – Modification of the activation and output functions

Hint: incremental networks (GNG, DCS) apply both paradigms.

Page 36: Basic Kinds of Learning

1. Error-driven learning (supervised learning, learning with a teacher)
2. Reinforcement learning
3. Self-organizing learning (unsupervised learning, competitive learning)

Hint: innovative network models do not differentiate between a learning and an application phase; they are able to perform on-line and life-long learning.

Page 37: Assignment of Common Neural Networks to the Learning Paradigms

[Table: learning paradigm (rows) vs. processing & topology (columns: feed-forward, recurrent)]

Supervised (error-driven), feed-forward – error minimizing:
– Perceptrons / Adalines with least-mean-square (LMS) learning (delta rule)
– Multilayer Perceptrons (MLP) with back-propagation learning
– Time Delay Neural Nets (TDNN)
– Radial Basis Function nets (RBF)
– Counter-propagation nets
– Probabilistic nets
– Adaptive Resonance Theory: ARTMAP, FuzzyARTMAP

Supervised (error-driven), recurrent:
– Partially recurrent: Jordan network, Elman network, Mozer network
– Completely recurrent: back-propagation through time (BPTT), real-time recurrent learning (RTRL)

Reinforcement learning (Q, AHC)

Unsupervised (self-organizing), feed-forward:
– Adaptive vector quantization: Self-Organizing Feature Maps (SOFM), Neural Gas, Learning Vector Quantization (LVQ 1-3)
– Incremental vector quantization: Growing Neural Gas (GNG), Growing Cell Structures (GCS)
– Adaptive Resonance Theory: ART 1-3; FuzzyART, Gaussian ART; Distributed ART, Laminar ART

Unsupervised (self-organizing), recurrent – associative memory:
– Hopfield networks
– Boltzmann machine

Page 38: Error-Driven or Supervised Learning

[Figure: a two-layer network with weight matrices W(1) (input neurons → hidden neurons) and W(2) (hidden → output neurons); the input pattern $\mathbf{x}_p$ produces the output $\mathbf{y}_p$, which is compared with the teacher signal $\mathbf{t}_p$ to form the error vector; both weight matrices receive corrections ΔW(1), ΔW(2).]

Basic idea:

Mapping error: $\mathbf{d}_p = \mathbf{t}_p - \mathbf{y}_p$, with the error function $E = \sum_p E_p = \sum_p \|\mathbf{d}_p\|^2$

Realization:
• Presentation of the pattern pairs (x, t) from the learning sample (training set) which have to be associated
• Learning is usually done by error correction using a gradient descent method on the error function E(w)
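A minimal gradient-descent sketch for the error function above, shown for a single linear static rate neuron (the delta rule case); the data, learning rate and averaging over patterns are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))           # patterns x_p
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true                          # teacher signals t_p

w = np.zeros(3)
eta = 0.1                               # learning rate (illustrative)
for _ in range(100):
    y = X @ w                           # network outputs y_p
    d = t - y                           # mapping errors d_p
    w += eta * X.T @ d / len(X)         # gradient step: dE/dw is prop. to -X^T d

E = np.sum((t - X @ w) ** 2)            # remaining error, approaches 0
```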

Page 39: Self-Organizing/Unsupervised Learning – Hebbian Correlation Learning

The most simple unsupervised learning rule (Hebb, 1949): the synaptic coupling $w_{ij}$ between two neurons behaves plastically and changes proportionally to the correlation between the activities of the sending neuron j and the receiving neuron i, i.e. the pre- and post-synaptic activities:

$w_{ij}^{k+1} = w_{ij}^k + \Delta w_{ij}^k, \qquad \Delta w_{ij}^k = \alpha \cdot y_i^k \cdot y_j^k = \alpha \cdot y_i^k \cdot x_j^k$

[Figure: sending neuron j projects via $w_{ij}$ onto receiver neuron i, with $y_j^k = x_j^k$. Correlation table: Δw > 0 only if $y_j = 1$ and $y_i = 1$, otherwise 0.]

Example: assembly grouping of feature detectors
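A minimal sketch of one Hebbian update step for a receiver neuron driven by three senders; the learning rate and activities are illustrative assumptions:

```python
import numpy as np

alpha = 0.1                        # learning rate (illustrative)
x = np.array([1.0, 0.0, 1.0])      # pre-synaptic activities y_j = x_j
w_i = np.zeros(3)                  # couplings w_ij of receiver neuron i

y_i = 1.0                          # post-synaptic activity in this step
w_i += alpha * y_i * x             # dw_ij = alpha * y_i * y_j:
                                   # only co-active (correlated) synapses grow
```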

Page 40: Self-Organizing / Competitive Learning

Basic idea:
• Learning without external or internal supervision, i.e. neither an explicit teacher nor an implicit critic
• Learning is based only on the data, depending on the statistics of the input data in the learning sample (its probability distribution, p.d.f.) → clustering
• The neural network tries to cluster or segment the continuous input space into a finite number of receptive fields (Voronoi regions) – not necessarily uniform, but depending on the p.d.f. of the input data
• → Vector quantization or Voronoi tessellation of the input space

[Figure: weight vectors $\mathbf{w}_1, \mathbf{w}_2, \ldots$ in the $(x_1, x_2)$ input space induce a Voronoi tessellation; connecting neighboring regions yields the Delaunay triangulation.]

Page 41: Self-Organizing Feature Maps (SOFM)

Goal: self-organization of a topology-conserving mapping Φ of the n-dimensional input space $\mathbb{R}^n$ (feature space) onto an m-dimensional neural Kohonen map $\mathbb{R}^m$ with a pre-defined grid structure.

[Figure: a stimulus s from the input space $\mathbb{R}^2$ is mapped onto the Kohonen map; the best-matching neuron r′ receives the strongest weight update $\Delta \mathbf{w}_{r'}$, and the neurons in the excitation area around r′ are adapted as well.]

Page 42: Kohonen Algorithm

[Figure: a 1-D Kohonen map with closed ring topology (neurons 1-5) in the $(x_1, w_1)$-$(x_2, w_2)$ input space. For input $\mathbf{x}_p$, neuron 3 is the best match and is pulled along $\mathbf{x}_p - \mathbf{w}_3$ with neighborhood value h(t) = 1.0 at learning rate ε = 0.5; its ring neighbors receive the smaller values 0.8 and 0.5, giving the new weights $\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \Delta \mathbf{w}_i$.]
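A minimal sketch of one Kohonen update step on such a ring; the learning rate, the Gaussian neighborhood and its width are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.random((5, 2))                           # weight vectors w_i on the ring

def kohonen_step(W, x, eps=0.5, sigma=1.0):
    best = np.argmin(np.linalg.norm(W - x, axis=1))       # best-matching neuron
    idx = np.arange(len(W))
    ring = np.minimum(np.abs(idx - best), len(W) - np.abs(idx - best))
    h = np.exp(-ring**2 / (2 * sigma**2))                 # neighborhood on the ring
    return W + eps * h[:, None] * (x - W)                 # w_i += eps*h*(x_p - w_i)

x_p = np.array([0.3, 0.7])
W = kohonen_step(W, x_p)
```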

Page 43: Neural Gas

[Figure: the same five neurons, but without a fixed grid. For input $\mathbf{x}_p$, the update strength of each neuron depends on its distance rank in the input space rather than on a map topology: the closest neuron (here neuron 3) is updated with h(t) = 1.0 at ε = 0.5, the following ones with 0.8, 0.6, 0.4 and 0.2, giving $\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \Delta \mathbf{w}_i$.]
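A minimal sketch of one Neural Gas update step; the learning rate and the exponential rank decay are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.random((5, 2))                            # weight vectors w_i

def neural_gas_step(W, x, eps=0.5, lam=1.0):
    dist = np.linalg.norm(W - x, axis=1)
    rank = np.argsort(np.argsort(dist))           # rank 0 for the closest neuron
    h = np.exp(-rank / lam)                       # rank-based neighborhood,
    return W + eps * h[:, None] * (x - W)         # not a grid-based one

x_p = np.array([0.3, 0.7])
W = neural_gas_step(W, x_p)
```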

Page 44: Suggested Reading (1)

Books:
• Zell, A.: Simulation Neuronaler Netzwerke. Addison-Wesley/Oldenbourg, 1994.
• Brause, R.: Neuronale Netze. B.G. Teubner, Stuttgart, 1999.
• Ritter, H., Martinetz, Th., Schulten, K.: Neuronale Netze. Addison-Wesley/Oldenbourg, 1994.
• Lämmel, U., Cleve, J.: Künstliche Intelligenz – Lehr- und Übungsbuch. Fachbuchverlag Leipzig, 2001.

Page 45: Suggested Reading (2)

Books:
• Bishop, Chr.: Neural Networks for Pattern Recognition. Oxford University Press, 1997.
• Kohonen, T.: Self-Organizing Maps. Springer Series in Information Sciences, Vol. 30, Springer, 1995.
• Görz, G., Rollinger, C.-R., Schneeberger, J.: Handbuch der Künstlichen Intelligenz. Oldenbourg, 2000.
• Vetters, K.: Mathematik für Ingenieure und Naturwissenschaftler – Formeln und Fakten. Teubner, Stuttgart, 1996.

Journals:
• Neural Networks, Neural Computing, Neural Computation
• IEEE Transactions on Neural Networks, Künstliche Intelligenz