Page 1:

Artificial Intelligence

Statistical learning methods

Chapter 20, AIMA 2nd ed.; Chapter 18, AIMA 3rd ed. (only ANNs & SVMs)

Page 2:

Artificial neural networks

The brain is a pretty intelligent system.

Can we "copy" it?

There are approx. 10^11 neurons in the brain.

There are approx. 23·10^9 neurons in the male cortex (females have about 15% less).

Page 3:

The simple model

• The McCulloch-Pitts model (1943)

Image from Neuroscience: Exploring the brain by Bear, Connors, and Paradiso

y = g(w0 + w1·x1 + w2·x2 + w3·x3)

Page 4:

Neuron firing rate

[Figure: neuron firing rate as a function of depolarization]

Page 5:

Transfer functions g(z)

The Heaviside function and the logistic function (see the sketch below).
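As an illustration, here is a minimal sketch of these two transfer functions in Python; the function names are my own and NumPy is assumed.

```python
import numpy as np

def heaviside(z):
    """Heaviside step function: 1 for z >= 0, otherwise 0."""
    return np.where(z >= 0.0, 1.0, 0.0)

def logistic(z):
    """Logistic (sigmoid) function: a smooth squashing of z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 5)
print(heaviside(z))   # [0. 0. 1. 1. 1.]
print(logistic(z))    # values rising smoothly from about 0.007 to about 0.993
```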

Page 6:

The simple perceptron

With {-1, +1} representation:

y(x) = sgn[ w^T x ] = +1 if w^T x ≥ 0, and -1 if w^T x < 0

where w^T x = w0 + w1·x1 + w2·x2.

Traditionally (early 1960s) trained with Perceptron learning.

Page 7:

Perceptron learning

Repeat until no errors are made anymore:

1. Pick a random example [x(n), f(n)]

2. If the classification is correct, i.e. if y(x(n)) = f(n), then do nothing.

3. If the classification is wrong, then do the following update to the parameters (η, the learning rate, is a small positive number; see the sketch below):

wi ← wi + η·f(n)·xi(n)

Desired output:

f(n) = +1 if x(n) belongs to class A
f(n) = -1 if x(n) belongs to class B
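A minimal Python sketch of this procedure, under the assumption that each input is augmented with a constant x0 = 1 so that w0 acts as the threshold weight (names and the epoch cap are my own):

```python
import random

def sgn(z):
    return 1 if z >= 0 else -1

def perceptron_learning(examples, w, eta=0.3, max_epochs=1000):
    """examples: list of (x, f) pairs with x = [1, x1, x2, ...] and f in {-1, +1}.
    Repeats until no errors are made anymore (or until max_epochs)."""
    for _ in range(max_epochs):
        errors = 0
        for x, f in random.sample(examples, len(examples)):   # pick examples in random order
            y = sgn(sum(wi * xi for wi, xi in zip(w, x)))
            if y != f:                                         # wrong classification: learning action
                w = [wi + eta * f * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                                        # no errors made anymore
            return w
    return w
```

For the AND example on the following pages this would be called with examples = [([1,0,0], -1), ([1,0,1], -1), ([1,1,0], -1), ([1,1,1], +1)], w = [-0.5, 1, 1] and eta = 0.3.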

Page 8:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

Initial values:

η = 0.3

w = (w0, w1, w2) = (-0.5, 1, 1)

Page 9:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-0.5, 1, 1)

This one is correctly classified, no action.

Page 10:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-0.5, 1, 1)

This one, (x1, x2) = (0, 1), is incorrectly classified, learning action:

w0 ← w0 - 0.3·1 = -0.8
w1 ← w1 - 0.3·0 = 1
w2 ← w2 - 0.3·1 = 0.7

Page 11:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-0.8, 1, 0.7)

This one is incorrectly classified, learning action:

w0 ← w0 - 0.3·1 = -0.8
w1 ← w1 - 0.3·0 = 1
w2 ← w2 - 0.3·1 = 0.7

Page 12:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-0.8, 1, 0.7)

This one is correctly classified, no action.

Page 13:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-0.8, 1, 0.7)

This one, (x1, x2) = (1, 0), is incorrectly classified, learning action:

w0 ← w0 - 0.3·1 = -1.1
w1 ← w1 - 0.3·1 = 0.7
w2 ← w2 - 0.3·0 = 0.7

Page 14:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-1.1, 0.7, 0.7)

This one is incorrectly classified, learning action:

w0 ← w0 - 0.3·1 = -1.1
w1 ← w1 - 0.3·1 = 0.7
w2 ← w2 - 0.3·0 = 0.7

Page 15:

Example: Perceptron learning

The AND function

x1  x2 |  f
 0   0 | -1
 0   1 | -1
 1   0 | -1
 1   1 | +1

w = (w0, w1, w2) = (-1.1, 0.7, 0.7)

Final solution.
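A quick check (my own) that the final weight vector really implements AND on the four training examples:

```python
def sgn(z):
    return 1 if z >= 0 else -1

w0, w1, w2 = -1.1, 0.7, 0.7   # final solution from the example
for (x1, x2), f in [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), +1)]:
    y = sgn(w0 + w1 * x1 + w2 * x2)
    print((x1, x2), "->", y, "target:", f)
```

All four points come out on the correct side of the line.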

Page 16:

Perceptron learning

• Perceptron learning is guaranteed to find a solution in finite time, if a solution exists.

• Perceptron learning cannot be generalized to more complex networks.

• Better to use gradient descent, based on formulating an error function and using differentiable transfer functions:

E(W) = Σ_{n=1}^{N} [ f(n) - y(n, W) ]²

Page 17:

Gradient search

ΔW = -η ∇_W E(W)

"Go downhill"

The "learning rate" (η) is set heuristically.

W(k+1) = W(k) + ΔW(k)

[Figure: error surface E(W) over W, with a step taken downhill from W(k)]
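A hedged sketch of this update rule for the squared-error E(W) of the previous page, applied to a single linear unit y = Xw (a simplification of my own; the multilayer case follows on the next pages):

```python
import numpy as np

def gradient_descent(X, f, eta=0.01, steps=1000):
    """X: N x D inputs (include a column of ones for w0), f: N targets.
    Minimizes E(w) = sum_n (f(n) - x(n)^T w)^2 with repeated downhill steps."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        y = X @ w                      # current outputs
        grad = -2.0 * X.T @ (f - y)    # gradient of E with respect to w
        w = w - eta * grad             # delta_w = -eta * grad ("go downhill")
    return w
```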

Page 18:

The Multilayer Perceptron (MLP)

• Combine several single layer perceptrons.

• Each single layer perceptron uses a sigmoid transfer function, e.g.

σ(z) = 1 / (1 + exp(-z))   or   σ(z) = tanh(z)

[Figure: network with inputs x_k, hidden units h_j and h_i, and outputs y_l]

Can be trained using gradient descent.

Page 19:

Example: One hidden layer

• Can approximate any continuous function.

y_i(x) = θ( v_i0 + Σ_{j=1}^{J} v_ij h_j(x) )

h_j(x) = φ( w_j0 + Σ_{k=1}^{D} w_jk x_k )

with θ(z) = sigmoid or linear, and φ(z) = sigmoid.

[Figure: inputs x_k, hidden units h_j, outputs y_i]
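A minimal NumPy sketch of this forward pass, assuming a logistic sigmoid for φ and a linear output θ (all names and the example sizes are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W, w0, V, v0):
    """x: D inputs; W: J x D and w0: J for the hidden layer; V: I x J and v0: I for the output layer."""
    h = sigmoid(W @ x + w0)   # h_j(x) = phi(w_j0 + sum_k w_jk x_k)
    y = V @ h + v0            # y_i(x) = theta(v_i0 + sum_j v_ij h_j(x)), here theta = linear
    return h, y

# Example with D = 2 inputs, J = 3 hidden units, I = 1 output
rng = np.random.default_rng(0)
h, y = mlp_forward(np.array([0.5, -1.0]),
                   rng.normal(size=(3, 2)), np.zeros(3),
                   rng.normal(size=(1, 3)), np.zeros(1))
print(h, y)
```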

Page 20:

Training: Backpropagation (gradient descent)

Δw(t) = -η ∇_w E(t)

For the one-hidden-layer network this gives

Δv_ij(t) = η Σ_{n=1}^{N} δ_i(n,t) h_j[x(n), t]

Δw_jk(t) = η Σ_{n=1}^{N} δ_j(n,t) x_k(n)

where

δ_j(n,t) = Σ_i δ_i(n,t) v_ij(t) φ'( h_j[x(n), t] )

and δ_i(n,t) is the error term of output unit i for example n at step t.
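A compact NumPy sketch of one batch backpropagation step for this network, under my own simplifying assumptions (logistic hidden units, linear outputs, squared error; the factor 2 from the squared error is absorbed into η):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, F, W, w0, V, v0, eta=0.1):
    """X: N x D inputs, F: N x I targets. One gradient-descent step on all weights."""
    H = sigmoid(X @ W.T + w0)             # hidden activations, N x J
    Y = H @ V.T + v0                      # linear outputs, N x I
    d_out = F - Y                         # output error terms delta_i(n)
    d_hid = (d_out @ V) * H * (1.0 - H)   # hidden error terms delta_j(n), backpropagated
    V  = V  + eta * d_out.T @ H           # delta_v_ij = eta * sum_n delta_i(n) h_j(n)
    v0 = v0 + eta * d_out.sum(axis=0)
    W  = W  + eta * d_hid.T @ X           # delta_w_jk = eta * sum_n delta_j(n) x_k(n)
    w0 = w0 + eta * d_hid.sum(axis=0)
    return W, w0, V, v0
```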

Page 21:

Support vector machines

Page 22:

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Page 23:

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Choose the line with the largest margin.

The "large margin classifier"

[Figure: separating line with its margin marked]

Page 24:

Linear classifier on a linearly separable problem

There are infinitely many lines that have zero training error.

Which line should we choose?

Choose the line with the largest margin.

The "large margin classifier"

[Figure: large-margin separating line; the points lying on the margin are the "support vectors"]

Page 25:

Computing the margin

The plane separating the two classes is defined by

w^T x = a

The dashed planes are given by

w^T x = a + b
w^T x = a - b

[Figure: separating plane with normal vector w, flanked by the two dashed planes at the margin]

Page 26:

Computing the margin

Divide by b:

w^T x / b - a/b = +1
w^T x / b - a/b = -1

Define new w = w/b and θ = a/b:

w^T x - θ = +1
w^T x - θ = -1

We have defined a scale for w and b.

Page 27:

Computing the margin

We have

w^T x - θ = -1           (x lies on one dashed plane)
w^T (x + λw) - θ = +1    (x + λw lies on the other dashed plane)

which gives λ‖w‖² = 2 and therefore

margin = λ‖w‖ = 2 / ‖w‖

[Figure: the vector λw connecting the two dashed planes; its length is the margin]

Page 28:

Linear classifier on a linearly separable problem

Maximizing the margin is equal to minimizing

‖w‖

subject to the constraints

w^T x(n) - θ ≥ +1 for all x(n) in the +1 class
w^T x(n) - θ ≤ -1 for all x(n) in the -1 class

Quadratic programming problem; the constraints can be included with Lagrange multipliers.
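As a hedged illustration (not from the slides), this quadratic program can be handed to an off-the-shelf solver; here scikit-learn's SVC with a linear kernel on toy 2D data of my own, with a large C to approximate the hard-margin case:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data, labels in {-1, +1}
X = np.array([[0.0, 0.0], [1.0, 0.2], [0.2, 0.8],
              [2.0, 2.0], [3.0, 2.5], [2.5, 3.0]])
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]              # the weight vector w
theta = -clf.intercept_[0]    # sklearn's intercept_ corresponds to -theta in the slide's notation
print("support vectors:\n", clf.support_vectors_)
print("margin = 2/||w|| =", 2.0 / np.linalg.norm(w))
```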

Page 29:

Minimize the cost (the Lagrangian)

L_p = ½ ‖w‖² - Σ_{n=1}^{N} α(n) [ y(n) ( w^T x(n) - θ ) - 1 ]

Quadratic programming problem.

The minimum of L_p occurs at the maximum of (the Wolfe dual)

L_D = Σ_{n=1}^{N} α(n) - ½ Σ_{n=1}^{N} Σ_{m=1}^{N} α(n) α(m) y(n) y(m) x(n)^T x(m)

Only scalar products in the cost. IMPORTANT!

Page 30:

Linear Support Vector Machine

Test phase: the predicted output is

ŷ(x) = sgn( w^T x - θ ) = sgn( Σ_{n ∈ S} α(n) y(n) x(n)^T x - θ )

where S is the set of support vectors and θ is determined e.g. by looking at one of the support vectors.

Still only scalar products in the expression.
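Continuing the hedged scikit-learn sketch from page 28 (clf, X and y as defined there), the test-phase expression can be evaluated directly from the support vectors; dual_coef_ stores the products α(n)·y(n):

```python
import numpy as np

x_test = np.array([1.5, 0.5])

# sgn( sum_{n in S} alpha(n) y(n) x(n)^T x  -  theta ), using only scalar products
scalar_products = clf.support_vectors_ @ x_test
decision = clf.dual_coef_[0] @ scalar_products + clf.intercept_[0]
print(np.sign(decision), clf.predict([x_test])[0])   # the two should agree
```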

Page 31:

How do we deal with the nonlinear case?

• Project the data into a high-dimensional space Z. There we know that it will be linearly separable (due to the VC dimension of the linear classifier).

• We don't even have to know the projection...!

z(n) = φ[x(n)]

Page 32:

Scalar product kernel trick

If we can find a kernel K such that

K[x(n), x(m)] = φ(x(n))^T φ(x(m))

then we don't even have to know the mapping φ to solve the problem...
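A small NumPy illustration (my own) of this idea for the degree-2 polynomial kernel in 2D: the explicit mapping φ(x) = (x1², √2·x1·x2, x2²) gives the same scalar product as K(x, y) = (x^T y)²:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2D."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def K(x, y):
    """The same quantity computed directly in input space; no mapping needed."""
    return (x @ y) ** 2

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(phi(x) @ phi(y), K(x, y))   # identical values (2.25 and 2.25)
```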

Page 33:

Valid kernels (Mercer’s theorem)

Define the matrix

K = | K[x(1),x(1)]  K[x(1),x(2)]  ...  K[x(1),x(N)] |
    | K[x(2),x(1)]  K[x(2),x(2)]  ...  K[x(2),x(N)] |
    | ...                                           |
    | K[x(N),x(1)]  K[x(N),x(2)]  ...  K[x(N),x(N)] |

If K is symmetric, K = K^T, and positive semi-definite, then K[x(i),x(j)] is a valid kernel.
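A small NumPy sketch (my own) of this check for one data sample: build the matrix K for a candidate kernel and verify symmetry and non-negative eigenvalues. This tests the condition on that sample; it does not prove Mercer's condition in general.

```python
import numpy as np

def gaussian_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

X = np.random.default_rng(1).random((20, 2))   # 20 points in 2D
K = np.array([[gaussian_kernel(xi, xj) for xj in X] for xi in X])

print("symmetric:", np.allclose(K, K.T))
print("smallest eigenvalue:", np.linalg.eigvalsh(K).min())   # >= 0 up to rounding
```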

Page 34:

Examples of kernels

K[x(i), x(j)] = exp( -‖x(i) - x(j)‖² / 2σ² )

K[x(i), x(j)] = ( x(i)^T x(j) )^d

The first is the Gaussian kernel, the second the polynomial kernel. With d = 1 we have the linear SVM.

The linear SVM is often used with good success on high-dimensional data (e.g. text classification).

Page 35:

Example: Robot color vision (Competition 1999)

Classify the Lego pieces into red, blue, and yellow. Classify white balls, black sideboard, and green carpet.

Page 36:

What the camera sees (RGB space)

[Figure: the camera pixels plotted in RGB space, with clusters labelled Yellow, Green, and Red]

Page 37:

Mapping RGB (3D) to rgb (2D)

r = R / (R + G + B)
g = G / (R + G + B)
b = B / (R + G + B)
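A one-function sketch of this normalization (the helper name is mine):

```python
def rgb_normalize(R, G, B):
    """Map RGB to chromaticity rgb. Since r + g + b = 1, only two values are independent."""
    s = float(R + G + B)
    return R / s, G / s, B / s

print(rgb_normalize(200, 100, 50))   # a reddish pixel -> (0.571..., 0.285..., 0.142...)
```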

Page 38:

Lego in normalized rgb space

Input is 2D: x ∈ X ⊂ R² (the coordinates x1, x2 in the normalized rgb space).

Output is 6D: c ∈ C = {red, blue, yellow, green, black, white}.

Page 39:

MLP classifier

2-3-1 MLP

E_train = 0.21%
E_test = 0.24%

Levenberg-Marquardt

Training time (150 epochs): 51 seconds

Page 40:

SVM classifier

SVM with K(x, y) = exp( -‖x - y‖² ), = 1000

E_train = 0.19%
E_test = 0.20%

Training time: 22 seconds

Page 41:

Machine Learning

• Machine learning (multilayer perceptrons, support vector machines, clustering) is covered in great detail in the course "Learning Systems".