Introduction to Machine Learning
Lecture 11: Neural Networks
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-10: Data classification
Decision trees (C4.5)
Instance-based learners (kNN and CBR)
Slide 2 · Artificial Intelligence – Machine Learning
Recap of Lectures 5-10: Data classification
Probabilistic-based learners
P(h|D) = P(D|h) P(h) / P(D)
Linear/polynomial classifier
Today’s Agenda
Why Neural Networks?
Looking into a Brain
Neural Networks
Starting from the Beginning: Perceptrons
Multi-layer Perceptrons
Why Neural Networks? Brain vs. machines
Machines are tremendously faster than brains at well-defined problems:
Invert matrices, solve differential equations, etc.
Brains are tremendously faster and more accurate than machines at ill-defined problems that require a lot of processing:
Recognize characters or objects on TV
Let's simulate our brains with artificial neural networks!
Massive parallelism: neurons interchanging signals
Looking into a Brain
10^11 neurons of more than 20 different types
0.001 seconds of neuron switching time
10^4 to 10^5 connections per neuron
0.1 seconds of scene recognition time
Artificial Neural Networks
Borrow some ideas from the nervous systems of animals

a_i = g(in_i) = g( Σ_j W_{j,i} a_j )
THE PERCEPTRON (McCulloch & Pitts)
Adaline (Adaptive Linear Element)
An adaptive linear combiner cascaded with a hard-limiting quantizer
The linear output is transformed to binary by means of a threshold device
Training = adjusting the weights
Activation functions
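The linear combiner plus hard limiter can be sketched in a few lines. This is a minimal illustration, not a trained model: the weights below are hand-picked (an assumption) so that the unit happens to implement a logical AND on {0,1} inputs.

```python
import numpy as np

def adaline_output(x, w):
    """Adaptive linear combiner: s = w0 + sum_i w_i * x_i."""
    return w[0] + np.dot(w[1:], x)

def hard_limit(s):
    """Hard-limiting quantizer: threshold the linear output to a binary class."""
    return 1 if s >= 0 else -1

# Hand-picked weights (illustrative): w0, w1, w2 implementing AND on {0,1} inputs
w = np.array([-1.5, 1.0, 1.0])
print(hard_limit(adaline_output(np.array([1, 1]), w)))  # 1
print(hard_limit(adaline_output(np.array([0, 1]), w)))  # -1
```

Training, covered later in the lecture, is exactly the process of finding such weights automatically.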
Adaline
Note that Adaline implements the function

f(x, w) = w_0 + Σ_{i=1}^{n} w_i x_i

This defines a threshold where the output is zero:

f(x, w) = w_0 + Σ_{i=1}^{n} w_i x_i = 0
Adaline
Let's assume that we have two variables. Therefore

f(x, w) = w_0 + w_1 x_1 + w_2 x_2 = 0
x_2 = −(w_1 / w_2) x_1 − w_0 / w_2

So, Adaline is drawing a linear discriminant that divides the space into two regions:
a linear classifier.
Adaline
So, we got a cool way to create linear classifiers
But are linear classifiers enough to tackle our problems?
Can you draw a line that separates examples of class white and black for the last example?
Moving to more Flexible NN
So, we want to classify problems such as XOR. Any idea?
Polynomial discriminant functions
In this system:

f(x, w) = w_0 + w_1 x_1 + w_11 x_1^2 + w_12 x_1 x_2 + w_2 x_2 + w_22 x_2^2 = 0
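A quick sketch of why the polynomial discriminant helps: the cross term w_12 x_1 x_2 lets a single unit separate XOR, which no purely linear discriminant can. The weights below are hand-picked for illustration (an assumption, not trained values).

```python
import numpy as np

def poly_features(x1, x2):
    """Expand (x1, x2) into the polynomial terms of the discriminant:
    [1, x1, x1^2, x1*x2, x2, x2^2]."""
    return np.array([1.0, x1, x1**2, x1 * x2, x2, x2**2])

# Hand-picked weights (illustrative): f = -0.5 + x1 - 2*x1*x2 + x2
w = np.array([-0.5, 1.0, 0.0, -2.0, 1.0, 0.0])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    f = np.dot(w, poly_features(x1, x2))
    print((x1, x2), 1 if f >= 0 else 0)  # XOR: 0, 1, 1, 0
```

The quadratic terms x_1^2 and x_2^2 are unused here; for XOR the interaction term alone bends the decision boundary enough.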
Moving to more Flexible NN
With appropriate values of w, I can fit data that is not linearly separable
Even more Flexible: Multi-layer NN
So, we want to classify problems such as XOR. Any other idea?
Madaline: Multiple Adalines connected
This also enables the network to solve non-separable problems
But Step Down… How Do I Learn w?
We have seen that different structures enable me to define different functions
But the key is to get a proper estimation of w
There are many algorithms:
Perceptron rule
α-LMS
α-perceptron
May's algorithm
Backpropagation
We are going to see two examples: α-LMS and backpropagation.
Weight Learning in Adaline
Recall that we want to adjust w
Weight Learning in Adaline
Weight learning with the α-LMS algorithm:
Incrementally update the weights as

W_{k+1} = W_k + α ε_k X_k / |X_k|^2

The error is the difference between the expected and the actual output:

ε_k = d_k − W_k^T X_k

A change in the weights affects the error:

Δε_k = Δ(d_k − W_k^T X_k) = −X_k^T ΔW_k

And the weight change is

ΔW_k = W_{k+1} − W_k = α ε_k X_k / |X_k|^2

Therefore

Δε_k = −α ε_k X_k^T X_k / |X_k|^2 = −α ε_k
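The α-LMS update can be written directly from the rule above. This is a minimal sketch on an assumed toy dataset (the data, targets, and hyperparameters below are illustrative, not from the lecture).

```python
import numpy as np

def alpha_lms(X, d, alpha=0.1, epochs=50):
    """alpha-LMS: W_{k+1} = W_k + alpha * eps_k * X_k / |X_k|^2,
    where eps_k = d_k - W_k^T X_k is the error before the quantizer."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend the bias input x0 = 1
    W = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xk, dk in zip(Xb, d):
            eps = dk - W @ xk
            W += alpha * eps * xk / (xk @ xk)   # normalized by |X_k|^2
    return W

# Toy linearly separable data (illustrative); targets are +1 / -1
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
W = alpha_lms(X, d)
preds = np.sign(np.hstack([np.ones((len(X), 1)), X]) @ W)
print(preds)  # matches d for this separable toy set
```

Note the normalization by |X_k|^2: each update reduces the error on the current example by exactly the fraction α, which is why 0 < α < 2 is required for stability.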
Backpropagation
α-LMS works for networks with a single layer. But what happens in networks with multiple layers?
Backpropagation (Rumelhart, 1986): the most influential development of NN in the 1980s
Here, we present the method conceptually (the math details are in the papers)
Let's assume a network with:
Three neurons in the input layer
Two neurons in the output layer
Backpropagation
Strategy:
Compute the gradient of the error:

∇̂ε_k = ∂ε_k^2 / ∂W_k

Adjust the weights in the direction opposite to the instantaneous error gradient
Now, W_k is a vector that contains all the components of the net
Backpropagation
Algorithm:
1. Insert a new example X_k into the network and sweep it forward until getting the output y
2. Compute the square error of this example:

ε_k^2 = Σ_{i=1}^{N_y} ε_{ik}^2 = Σ_{i=1}^{N_y} (d_{ik} − y_{ik})^2

For example, for two outputs (disregarding k):

ε^2 = (d_1 − y_1)^2 + (d_2 − y_2)^2

3. Propagate the error to the previous layer (back-propagation). How? Steepest descent: compute the derivative of the square error δ for each Adaline
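The three steps above can be sketched for a small network. Everything here is an assumption for illustration: the layer sizes (3 inputs, 3 hidden sigmoid units, 2 outputs), the single training example, and the learning rate are made up; the deltas follow the eps * sgm'(s) form derived later in the lecture.

```python
import numpy as np

def sgm(s):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
# Assumed sizes for the sketch: 3 inputs, 3 hidden units, 2 outputs
W1 = rng.normal(scale=0.5, size=(3, 4))   # hidden layer (column 0 = bias weight)
W2 = rng.normal(scale=0.5, size=(2, 4))   # output layer (column 0 = bias weight)

x = np.array([1.0, 0.5, -1.0])            # one training example (illustrative)
d = np.array([1.0, 0.0])                  # its desired outputs
alpha = 0.5

def forward(x):
    xb = np.append(1.0, x)                # step 1: sweep the example forward
    a1b = np.append(1.0, sgm(W1 @ xb))    # hidden activations, with bias
    y = sgm(W2 @ a1b)                     # network outputs
    return xb, a1b, y

for _ in range(300):
    xb, a1b, y = forward(x)
    # step 3: output deltas eps * sgm'(s), then backpropagate to the hidden layer
    delta2 = (d - y) * y * (1 - y)
    delta1 = (W2[:, 1:].T @ delta2) * a1b[1:] * (1 - a1b[1:])
    W2 += alpha * np.outer(delta2, a1b)
    W1 += alpha * np.outer(delta1, xb)

_, _, y = forward(x)
print(y)  # approaches d = [1, 0]
```

Note that sgm'(s) = sgm(s)(1 − sgm(s)), so the derivative can be computed from the activations already stored during the forward sweep.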
Backpropagation Example
Example borrowed from: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
Backpropagation Example
1. Sweep the example forward
Backpropagation Example
2. Backpropagate the error
Backpropagation Example
3. Modify the weights of each neuron
Backpropagation Example
3.bis. Do the same for each neuron
Backpropagation Example
3.bis2. Until reaching the output
Backpropagation for a Two-Layer Net.
That is, the algorithm is:
1. Find the instantaneous square error derivative:

δ_j^(l) = −(1/2) ∂ε^2 / ∂s_j^(l)

This tells us how sensitive the square output error of the network is to changes in the linear output s of the associated Adaline
2. Expanding the error term we get:

δ_1^(2) = −(1/2) ∂[(d_1 − y_1)^2 + (d_2 − y_2)^2] / ∂s_1^(2) = −(1/2) ∂[d_1 − sgm(s_1^(2))]^2 / ∂s_1^(2)

3. And recognizing that d_1 is independent of s_1^(2):

δ_1^(2) = [d_1 − sgm(s_1^(2))] sgm'(s_1^(2)) = ε_1^(2) sgm'(s_1^(2))
Backpropagation for a Two-Layer Net.
That is, the algorithm is:
4. Similarly, for the hidden layer we have (by the chain rule):

δ_1^(1) = −(1/2) ∂ε^2/∂s_1^(1) = −(1/2) [ (∂ε^2/∂s_1^(2)) (∂s_1^(2)/∂s_1^(1)) + (∂ε^2/∂s_2^(2)) (∂s_2^(2)/∂s_1^(1)) ]

5. That is:

δ_1^(1) = δ_1^(2) ∂s_1^(2)/∂s_1^(1) + δ_2^(2) ∂s_2^(2)/∂s_1^(1)

6. Which yields, with s_i^(2) = w_{i0}^(2) + Σ_{j=1}^{3} w_{ij}^(2) sgm(s_j^(1)):

δ_1^(1) = δ_1^(2) ∂[w_{10}^(2) + Σ_{j=1}^{3} w_{1j}^(2) sgm(s_j^(1))]/∂s_1^(1) + δ_2^(2) ∂[w_{20}^(2) + Σ_{j=1}^{3} w_{2j}^(2) sgm(s_j^(1))]/∂s_1^(1)
= δ_1^(2) w_{11}^(2) sgm'(s_1^(1)) + δ_2^(2) w_{21}^(2) sgm'(s_1^(1))
= [δ_1^(2) w_{11}^(2) + δ_2^(2) w_{21}^(2)] sgm'(s_1^(1))
Backpropagation for a Two-Layer Net.
Defining

ε_1^(1) ≜ δ_1^(2) w_{11}^(2) + δ_2^(2) w_{21}^(2)

we obtain

δ_1^(1) = ε_1^(1) sgm'(s_1^(1))
Implementation details of each Adaline
Next Class
Support Vector Machines