Introduction to Machine Learning
Lecture 11: Neural Networks
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lectures 5-10: Data classification
Decision trees (C4.5)
Instance-based learners (kNN and CBR)
Slide 2 · Artificial Intelligence – Machine Learning
Recap of Lectures 5-10: Data classification
Probabilistic-based learners
P(h|D) = P(D|h) P(h) / P(D)
Linear/polynomial classifier
Today’s Agenda
Why Neural Networks?
Looking into a Brain
Neural Networks
Starting from the Beginning: Perceptrons
Multi-layer Perceptrons
Why Neural Networks? Brain vs. machines
Machines are tremendously faster than brains at well-defined problems:
Invert matrices, solve differential equations, etc.
Brains are tremendously faster and more accurate than machines at ill-defined problems that require a lot of processing:
Recognize characters or objects on TV
Let's simulate our brains with artificial neural networks!
Massive parallelism: neurons interchanging signals
Looking into a Brain
10^11 neurons of more than 20 different types
0.001 seconds of neuron switching time
10^4 to 10^5 connections per neuron
0.1 seconds of scene recognition time
Artificial Neural Networks
Borrow some ideas from the nervous systems of animals

a_i = g(in_i) = g( Σ_j W_{j,i} a_j )
THE PERCEPTRON (McCulloch & Pitts)
Adaline (Adaptive Linear Element)
An adaptive linear combiner cascaded with a hard-limiting quantizer
The linear output is transformed to binary by means of a threshold device
Training = adjusting the weights
Activation functions
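The linear combiner plus hard limiter can be sketched in a few lines. This is a minimal illustration, not a trained model: the weights below are hand-picked (an assumption) so that the unit happens to implement a logical AND on {0,1} inputs.

```python
import numpy as np

def adaline_output(x, w):
    """Adaptive linear combiner: s = w0 + sum_i w_i * x_i."""
    return w[0] + np.dot(w[1:], x)

def hard_limit(s):
    """Hard-limiting quantizer: threshold the linear output to a binary class."""
    return 1 if s >= 0 else -1

# Hand-picked weights (illustrative): w0, w1, w2 implementing AND on {0,1} inputs
w = np.array([-1.5, 1.0, 1.0])
print(hard_limit(adaline_output(np.array([1, 1]), w)))  # 1
print(hard_limit(adaline_output(np.array([0, 1]), w)))  # -1
```

Training, covered later in the lecture, is exactly the process of finding such weights automatically.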
Adaline
Note that Adaline implements the function

f(x, w) = w_0 + Σ_{i=1}^{n} w_i x_i

This defines a threshold where the output is zero:

f(x, w) = w_0 + Σ_{i=1}^{n} w_i x_i = 0
Adaline
Let's assume that we have two variables. Therefore

f(x, w) = w_0 + w_1 x_1 + w_2 x_2 = 0
x_2 = −(w_1 / w_2) x_1 − w_0 / w_2

So, Adaline is drawing a linear discriminant that divides the space into two regions:
a linear classifier.
Adaline
So, we got a cool way to create linear classifiers
But are linear classifiers enough to tackle our problems?
Can you draw a line that separates examples of class white and black for the last example?
Moving to more Flexible NN
So, we want to classify problems such as XOR. Any idea?
Polynomial discriminant functions
In this system:

f(x, w) = w_0 + w_1 x_1 + w_11 x_1^2 + w_12 x_1 x_2 + w_2 x_2 + w_22 x_2^2 = 0
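A quick sketch of why the polynomial discriminant helps: the cross term w_12 x_1 x_2 lets a single unit separate XOR, which no purely linear discriminant can. The weights below are hand-picked for illustration (an assumption, not trained values).

```python
import numpy as np

def poly_features(x1, x2):
    """Expand (x1, x2) into the polynomial terms of the discriminant:
    [1, x1, x1^2, x1*x2, x2, x2^2]."""
    return np.array([1.0, x1, x1**2, x1 * x2, x2, x2**2])

# Hand-picked weights (illustrative): f = -0.5 + x1 - 2*x1*x2 + x2
w = np.array([-0.5, 1.0, 0.0, -2.0, 1.0, 0.0])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    f = np.dot(w, poly_features(x1, x2))
    print((x1, x2), 1 if f >= 0 else 0)  # XOR: 0, 1, 1, 0
```

The quadratic terms x_1^2 and x_2^2 are unused here; for XOR the interaction term alone bends the decision boundary enough.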
Moving to more Flexible NN
With appropriate values of w, I can fit data that is not linearly separable
Even more Flexible: Multi-layer NN
So, we want to classify problems such as XOR. Any other idea?
Madaline: Multiple Adalines connected
This also enables the network to solve non-separable problems
But Step Down… How Do I Learn w?
We have seen that different structures enable me to define different functions
But the key is to get a proper estimation of w
There are many algorithms:
Perceptron rule
α-LMS
α-perceptron
May's algorithm
Backpropagation
We are going to see two examples: α-LMS and backpropagation.
Weight Learning in Adaline
Recall that we want to adjust w
Weight Learning in Adaline
Weight learning with the α-LMS algorithm:
Incrementally update the weights as

W_{k+1} = W_k + α ε_k X_k / |X_k|^2

The error is the difference between the expected and the actual output:

ε_k = d_k − W_k^T X_k

A change in the weights affects the error:

Δε_k = Δ(d_k − W_k^T X_k) = −X_k^T ΔW_k

And the weight change is

ΔW_k = W_{k+1} − W_k = α ε_k X_k / |X_k|^2

Therefore

Δε_k = −α ε_k X_k^T X_k / |X_k|^2 = −α ε_k
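The α-LMS update can be written directly from the rule above. This is a minimal sketch on an assumed toy dataset (the data, targets, and hyperparameters below are illustrative, not from the lecture).

```python
import numpy as np

def alpha_lms(X, d, alpha=0.1, epochs=50):
    """alpha-LMS: W_{k+1} = W_k + alpha * eps_k * X_k / |X_k|^2,
    where eps_k = d_k - W_k^T X_k is the error before the quantizer."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend the bias input x0 = 1
    W = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xk, dk in zip(Xb, d):
            eps = dk - W @ xk
            W += alpha * eps * xk / (xk @ xk)   # normalized by |X_k|^2
    return W

# Toy linearly separable data (illustrative); targets are +1 / -1
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
W = alpha_lms(X, d)
preds = np.sign(np.hstack([np.ones((len(X), 1)), X]) @ W)
print(preds)  # matches d for this separable toy set
```

Note the normalization by |X_k|^2: each update reduces the error on the current example by exactly the fraction α, which is why 0 < α < 2 is required for stability.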
Backpropagation
α-LMS works for networks with a single layer. But what happens in networks with multiple layers?
Backpropagation (Rumelhart, 1986): the most influential development of NN in the 1980s
Here, we present the method conceptually (the math details are in the papers)
Let's assume a network with:
Three neurons in the input layer
Two neurons in the output layer
Backpropagation
Strategy:
Compute the gradient of the error:

∇̂ε_k = ∂ε_k^2 / ∂W_k

Adjust the weights in the direction opposite to the instantaneous error gradient
Now, W_k is a vector that contains all the components of the net
Backpropagation
Algorithm:
1. Insert a new example X_k into the network and sweep it forward until getting the output y
2. Compute the square error of this example:

ε_k^2 = Σ_{i=1}^{N_y} ε_{ik}^2 = Σ_{i=1}^{N_y} (d_{ik} − y_{ik})^2

For example, for two outputs (disregarding k):

ε^2 = (d_1 − y_1)^2 + (d_2 − y_2)^2

3. Propagate the error to the previous layer (back-propagation). How? Steepest descent: compute the derivative of the square error δ for each Adaline
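The three steps above can be sketched for a small network. Everything here is an assumption for illustration: the layer sizes (3 inputs, 3 hidden sigmoid units, 2 outputs), the single training example, and the learning rate are made up; the deltas follow the eps * sgm'(s) form derived later in the lecture.

```python
import numpy as np

def sgm(s):
    """Sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
# Assumed sizes for the sketch: 3 inputs, 3 hidden units, 2 outputs
W1 = rng.normal(scale=0.5, size=(3, 4))   # hidden layer (column 0 = bias weight)
W2 = rng.normal(scale=0.5, size=(2, 4))   # output layer (column 0 = bias weight)

x = np.array([1.0, 0.5, -1.0])            # one training example (illustrative)
d = np.array([1.0, 0.0])                  # its desired outputs
alpha = 0.5

def forward(x):
    xb = np.append(1.0, x)                # step 1: sweep the example forward
    a1b = np.append(1.0, sgm(W1 @ xb))    # hidden activations, with bias
    y = sgm(W2 @ a1b)                     # network outputs
    return xb, a1b, y

for _ in range(300):
    xb, a1b, y = forward(x)
    # step 3: output deltas eps * sgm'(s), then backpropagate to the hidden layer
    delta2 = (d - y) * y * (1 - y)
    delta1 = (W2[:, 1:].T @ delta2) * a1b[1:] * (1 - a1b[1:])
    W2 += alpha * np.outer(delta2, a1b)
    W1 += alpha * np.outer(delta1, xb)

_, _, y = forward(x)
print(y)  # approaches d = [1, 0]
```

Note that sgm'(s) = sgm(s)(1 − sgm(s)), so the derivative can be computed from the activations already stored during the forward sweep.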
Backpropagation Example
Example borrowed from: http://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
Backpropagation Example
1. Sweep the example forward
Backpropagation Example
2. Backpropagate the error
Backpropagation Example
3. Modify the weights of each neuron
Backpropagation Example
3.bis. Do the same for each neuron
Backpropagation Example
3.bis2. Until reaching the output
Backpropagation for a Two-Layer Net.
That is, the algorithm is:
1. Find the instantaneous square error derivative:

δ_j^(l) = −(1/2) ∂ε^2 / ∂s_j^(l)

This tells us how sensitive the square output error of the network is to changes in the linear output s of the associated Adaline
2. Expanding the error term we get:

δ_1^(2) = −(1/2) ∂[(d_1 − y_1)^2 + (d_2 − y_2)^2] / ∂s_1^(2) = −(1/2) ∂[d_1 − sgm(s_1^(2))]^2 / ∂s_1^(2)

3. And recognizing that d_1 is independent of s_1^(2):

δ_1^(2) = [d_1 − sgm(s_1^(2))] sgm'(s_1^(2)) = ε_1^(2) sgm'(s_1^(2))
Backpropagation for a Two-Layer Net.
That is, the algorithm is:
4. Similarly, for the hidden layer we have (by the chain rule):

δ_1^(1) = −(1/2) ∂ε^2/∂s_1^(1) = −(1/2) [ (∂ε^2/∂s_1^(2)) (∂s_1^(2)/∂s_1^(1)) + (∂ε^2/∂s_2^(2)) (∂s_2^(2)/∂s_1^(1)) ]

5. That is:

δ_1^(1) = δ_1^(2) ∂s_1^(2)/∂s_1^(1) + δ_2^(2) ∂s_2^(2)/∂s_1^(1)

6. Which yields, with s_i^(2) = w_{i0}^(2) + Σ_{j=1}^{3} w_{ij}^(2) sgm(s_j^(1)):

δ_1^(1) = δ_1^(2) ∂[w_{10}^(2) + Σ_{j=1}^{3} w_{1j}^(2) sgm(s_j^(1))]/∂s_1^(1) + δ_2^(2) ∂[w_{20}^(2) + Σ_{j=1}^{3} w_{2j}^(2) sgm(s_j^(1))]/∂s_1^(1)
= δ_1^(2) w_{11}^(2) sgm'(s_1^(1)) + δ_2^(2) w_{21}^(2) sgm'(s_1^(1))
= [δ_1^(2) w_{11}^(2) + δ_2^(2) w_{21}^(2)] sgm'(s_1^(1))
Backpropagation for a Two-Layer Net.
Defining

ε_1^(1) ≜ δ_1^(2) w_{11}^(2) + δ_2^(2) w_{21}^(2)

we obtain

δ_1^(1) = ε_1^(1) sgm'(s_1^(1))
Implementation details of each Adaline
Next Class
Support Vector Machines