CS621/CS449 Artificial Intelligence Lecture Notes


Page 1: CS621/CS449 Artificial Intelligence Lecture Notes

Instructor: Prof. Pushpak Bhattacharyya, IIT Bombay

CS621/CS449 Artificial Intelligence Lecture Notes

Set 4: 13/08/2004, 17/08/2004, 18/08/2004, 20/08/2004

Page 2: CS621/CS449 Artificial Intelligence Lecture Notes


Outline

• Proof of Convergence of PTA
• Some Observations
• Study of Linear Separability
• Test for Linear Separability
• Asummability

Page 3: CS621/CS449 Artificial Intelligence Lecture Notes


Proof of Convergence of PTA

• Perceptron Training Algorithm (PTA)

• Statement:

Whatever the initial choice of weights and whatever vector is chosen for testing, the PTA converges if the vectors are from a linearly separable function.

Page 4: CS621/CS449 Artificial Intelligence Lecture Notes


Proof of Convergence of PTA

• Suppose wn is the weight vector at the nth step of the algorithm.

• At the beginning, the weight vector is w0.

• Go from wi to wi+1 when a vector Xj fails the test wi . Xj > 0, updating wi as

  wi+1 = wi + Xj

• Since the Xj's are from a linearly separable function, ∃ w* s.t. w* . Xj > 0 ∀ j
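A minimal sketch of this loop in Python, assuming the vectors have already been transformed so that the goal is w . Xj > 0 for every j (the augmented, negated form introduced on Page 13); the AND vectors and the zero initial weight are illustrative choices, not prescribed by the notes:

```python
import numpy as np

def pta(Y, max_steps=10_000):
    """Perceptron Training Algorithm: while some Yj fails the test
    w . Yj > 0, update w <- w + Yj. The step cap is only a safety
    net for this illustration; the convergence proof needs no cap."""
    Y = np.asarray(Y, dtype=float)
    w = np.zeros(Y.shape[1])            # w0 = 0, one possible initial choice
    for _ in range(max_steps):
        failed = [y for y in Y if w @ y <= 0]
        if not failed:
            return w                    # every vector passes the test: converged
        w = w + failed[0]               # wi+1 = wi + Xj for a failing Xj
    raise RuntimeError("no convergence; vectors may not be linearly separable")

# AND in augmented-negated form: (0,0)-, (0,1)-, (1,0)- negated, (1,1)+ kept
and_Y = [[1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]]
print(pta(and_Y))                       # a w with w . Yj > 0 for all j
```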

Page 5: CS621/CS449 Artificial Intelligence Lecture Notes


Proof of Convergence of PTA

• Consider the expression

  G(wn) = (wn . w*) / |wn|

  where wn = weight vector at the nth iteration

• G(wn) = (|wn| . |w*| . cos θ) / |wn|, where θ = angle between wn and w*

• G(wn) = |w*| . cos θ

• G(wn) ≤ |w*|   (as -1 ≤ cos θ ≤ 1)

Page 6: CS621/CS449 Artificial Intelligence Lecture Notes


Behavior of Numerator of G

wn . w* = (wn-1 + Xn-1fail) . w*
        = wn-1 . w* + Xn-1fail . w*
        = (wn-2 + Xn-2fail) . w* + Xn-1fail . w*
        …
        = w0 . w* + (X0fail + X1fail + … + Xn-1fail) . w*

• Let δ = min over j of (Xj . w*). Since w* . Xj > 0 for every j and the vectors are finite in number, δ > 0.

• Numerator of G ≥ w0 . w* + n . δ

• So the numerator of G grows at least linearly with n.

Page 7: CS621/CS449 Artificial Intelligence Lecture Notes


Behavior of Denominator of G

• |wn|² = wn . wn
        = (wn-1 + Xn-1fail) . (wn-1 + Xn-1fail)
        = |wn-1|² + 2 . wn-1 . Xn-1fail + |Xn-1fail|²
        ≤ |wn-1|² + |Xn-1fail|²      (as wn-1 . Xn-1fail ≤ 0 for a failed vector)
        ≤ |w0|² + |X0fail|² + |X1fail|² + … + |Xn-1fail|²

• Let |Xj| ≤ η for all j, where η is the maximum magnitude.

• So |wn|² ≤ |w0|² + n . η², i.e. the denominator |wn| grows at most as √n.

Page 8: CS621/CS449 Artificial Intelligence Lecture Notes


Some Observations

• The numerator of G grows at least as n.
• The denominator of G grows at most as √n.

=> The numerator grows faster than the denominator.

• If PTA did not terminate, the G(wn) values would become unbounded.

• But G(wn) ≤ |w*|, which is finite. This is impossible!

• Hence, PTA has to converge.
• The proof is due to Marvin Minsky.
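To see the argument concretely, one can track G(wn) during a run. A small illustration below, where w_star = (1.5, 1, 1) is an assumed valid separator for AND (not taken from the notes), and the vectors are in augmented-negated form:

```python
import numpy as np

# AND in augmented-negated form; w_star is used only to evaluate G(wn).
Y = np.array([[1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]], dtype=float)
w_star = np.array([1.5, 1.0, 1.0])
assert (Y @ w_star > 0).all()       # w_star really separates: w* . Yj > 0

w = np.zeros(3)
step = 0
while True:
    failed = [y for y in Y if w @ y <= 0]
    if not failed:
        break
    w = w + failed[0]               # one PTA update
    step += 1
    print(step, w @ w_star / np.linalg.norm(w))   # G(wn): grows with n
print("upper bound |w*| =", np.linalg.norm(w_star))
```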

Page 9: CS621/CS449 Artificial Intelligence Lecture Notes


Study of Linear Separability

[Figure: a perceptron with inputs x1, x2, x3, …, xn, weights w1, w2, w3, …, wn, threshold θ, and output y]

• W . Xj = 0 defines a hyperplane in the (n+1)-dimensional augmented space (the extra dimension carries the threshold θ).

=> W is perpendicular to every vector Xj lying on that hyperplane.

Page 10: CS621/CS449 Artificial Intelligence Lecture Notes


Linear Separability

[Figure: positive points X1, X2, …, Xk (marked +) and negative points Xk+1, Xk+2, …, Xm (marked -), separated by a hyperplane]

Positive set: w . Xj > 0, ∀ j ≤ k

Negative set: w . Xj < 0, ∀ j > k

Page 11: CS621/CS449 Artificial Intelligence Lecture Notes


Linear Separability

• w . Xj = 0 => w is normal to the hyperplane which separates the +ve points from the -ve points.

• In this computing paradigm, computation means "placing hyperplanes".

• Functions computable by the perceptron are called
  – "threshold functions", because ∑ wixi is compared with θ (the threshold); see the sketch below
  – "linearly separable", because linear surfaces are set up to separate the +ve and -ve points
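A threshold function in this sense is just a comparison of the weighted sum against θ. A minimal sketch, where the AND weights w = (1, 1) and θ = 1.5 are an assumed standard choice, not given on this slide:

```python
def threshold_unit(w, x, theta):
    """Output 1 iff the weighted sum of the inputs exceeds the threshold."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

# AND as a threshold function (assumed weights): fires only on (1, 1)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, threshold_unit((1, 1), x, 1.5))
```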

Page 12: CS621/CS449 Artificial Intelligence Lecture Notes


Linear Separability

• Decision problems may have +ve and -ve points that need to be separated.

• The hyperplane found should generalize beyond the training points; PAC learning, SVMs, etc. address this learning problem.

Page 13: CS621/CS449 Artificial Intelligence Lecture Notes


Test for Linear Separability (LS)

• Theorem:

A function is linearly separable iff the vectors corresponding to the function do not have a Positive Linear Combination (PLC).

• Absence of a PLC is both a necessary and a sufficient condition for linear separability.

• X1, X2, …, Xm - vectors of the function
• Y1, Y2, …, Ym - augmented, negated set

• Each Yi is obtained by prepending -1 (a slot for the threshold θ) to Xi; if Xi belongs to the 0-class, the augmented vector is then negated. The function is LS iff some weight vector W satisfies W . Yi > 0 for every i, which is exactly the test the PTA uses.
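A small helper performing this construction (the function name is illustrative); the two printed rows match the XNOR table on the next page:

```python
def augment_negate(x, label):
    """Build Yi from Xi: prepend -1 (the threshold slot), then negate
    the whole vector if Xi belongs to the 0-class."""
    y = [-1] + list(x)
    return [-v for v in y] if label == 0 else y

print(augment_negate((0, 0), 1))   # [-1, 0, 0]  (+ class, kept as-is)
print(augment_negate((0, 1), 0))   # [1, 0, -1]  (- class, negated)
```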

Page 14: CS621/CS449 Artificial Intelligence Lecture Notes


Example (1) - XNOR

• The set {Yi} has a PLC if ∑ Pi Yi = 0, 1 ≤ i ≤ m,

  – where each Pi is a non-negative scalar, and

  – at least one Pi > 0.
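Whether such Pi exist is a linear-programming feasibility question. A sketch using scipy.optimize.linprog, where the normalization ∑ Pi = 1 is an assumed device to rule out the trivial all-zero solution:

```python
import numpy as np
from scipy.optimize import linprog

def has_plc(Y):
    """Test whether the augmented-negated vectors Y admit a PLC:
    P >= 0, not all zero, with sum_i P_i * Y_i = 0. Normalizing
    sum_i P_i = 1 turns this into a pure LP feasibility check."""
    Y = np.asarray(Y, dtype=float)
    m = len(Y)
    A_eq = np.vstack([Y.T, np.ones(m)])           # each component sums to 0 ...
    b_eq = np.append(np.zeros(Y.shape[1]), 1.0)   # ... and the P_i sum to 1
    res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.success

# XNOR (the example below): a PLC exists, so it is not linearly separable
xnor_Y = [[-1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]]
print(has_plc(xnor_Y))   # True
```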

• Example: 2-bit even parity (XNOR function)

  Xi       class   Yi
  <0,0>      +     <-1, 0, 0>
  <0,1>      -     < 1, 0, -1>
  <1,0>      -     < 1, -1, 0>
  <1,1>      +     <-1, 1, 1>

Page 15: CS621/CS449 Artificial Intelligence Lecture Notes


Example (1) - XNOR

• P1 [-1 0 0]T + P2 [1 0 -1]T + P3 [1 -1 0]T + P4 [-1 1 1]T = [0 0 0]T

• Setting all Pi = 1 gives the result.

• For the parity function a PLC exists => not linearly separable.
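A one-line check of this claim (illustration only):

```python
import numpy as np

Y = np.array([[-1, 0, 0], [1, 0, -1], [1, -1, 0], [-1, 1, 1]])
print(np.ones(4) @ Y)   # [0 0 0]: all Pi = 1 is a PLC, so XNOR is not LS
```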

Page 16: CS621/CS449 Artificial Intelligence Lecture Notes


Example (2) – Majority function

  Xi        class   Yi
  <0,0,0>     -     < 1, 0, 0, 0>
  <0,0,1>     -     < 1, 0, 0, -1>
  <0,1,0>     -     < 1, 0, -1, 0>
  <0,1,1>     +     <-1, 0, 1, 1>
  <1,0,0>     -     < 1, -1, 0, 0>
  <1,0,1>     +     <-1, 1, 0, 1>
  <1,1,0>     +     <-1, 1, 1, 0>
  <1,1,1>     +     <-1, 1, 1, 1>

• 3-bit majority function

• Suppose a PLC exists. The equations obtained, one per component, are:

  P1 + P2 + P3 - P4 + P5 - P6 - P7 - P8 = 0
  -P5 + P6 + P7 + P8 = 0
  -P3 + P4 + P7 + P8 = 0
  -P2 + P4 + P6 + P8 = 0

• On solving, all Pi are forced to 0.

• 3-bit majority function => no PLC => LS (checked mechanically below)
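The same conclusion via the has_plc sketch from Page 14 (assuming that function is in scope), fed the Yi table above:

```python
maj_Y = [[ 1, 0, 0, 0], [ 1, 0, 0, -1], [ 1, 0, -1, 0], [-1, 0, 1, 1],
         [ 1, -1, 0, 0], [-1, 1, 0, 1], [-1, 1, 1, 0], [-1, 1, 1, 1]]
print(has_plc(maj_Y))   # False: no PLC, so majority is linearly separable
```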

Page 17: CS621/CS449 Artificial Intelligence Lecture Notes


Example (3) – n bit comparator

[Figure: threshold unit with inputs x0, …, xn-1 and yn, …, y2n-1, weights w0, …, w2n-1, threshold θ, and output z]

• n-bit comparator

• z = 1 if dec(y) > dec(x)
    = 0 otherwise

• Is a perceptron realization possible? Yes. Choose the w vector such that

  wi = -2^i       ∀ i, 0 ≤ i ≤ n-1
  wj = 2^(j-n)    ∀ j, n ≤ j ≤ 2n-1

  and θ = 0.

• Then the weighted sum equals dec(y) - dec(x), which exceeds θ = 0 exactly when dec(y) > dec(x); a sketch verifying this follows.
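A sketch of this construction (bit 0 taken as least significant, an assumption about the slide's encoding):

```python
from itertools import product

def comparator(y_bits, x_bits):
    """z = 1 iff dec(y) > dec(x), using the slide's weights:
    +2^i on y_i, -2^i on x_i, and threshold 0."""
    s = sum(2 ** i * (yi - xi) for i, (yi, xi) in enumerate(zip(y_bits, x_bits)))
    return 1 if s > 0 else 0                       # s = dec(y) - dec(x)

def dec(bits):
    return sum(b * 2 ** i for i, b in enumerate(bits))

# exhaustive check for n = 2
for y in product((0, 1), repeat=2):
    for x in product((0, 1), repeat=2):
        assert comparator(y, x) == (1 if dec(y) > dec(x) else 0)
print("the weights realize the 2-bit comparator")
```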

Page 18: CS621/CS449 Artificial Intelligence Lecture Notes


Example (3) – n bit comparator

• 2-bit comparator: inputs (y0, y1, x0, x1), output z

• +ve class: 0100, 1000, and so on
• -ve class: 0000, 0001, and so on

• After augmentation and negation, 16 non-negative numbers Pi are to be found such that not all of them are zero.

• Suppose a PLC exists: Σ Pi Yi = 0 such that Pi ≥ 0 for i = 1…16 and ¬(∀i, Pi = 0).

• Solving the equations, we eventually end up with all Pi = 0: no PLC, so the 2-bit comparator is LS. A mechanical check follows.
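The same conclusion mechanically, reusing augment_negate and has_plc from the earlier sketches (assumed in scope):

```python
from itertools import product

Y = []
for y1, y0, x1, x0 in product((0, 1), repeat=4):
    label = 1 if (2 * y1 + y0) > (2 * x1 + x0) else 0
    Y.append(augment_negate((y1, y0, x1, x0), label))
print(has_plc(Y))   # False: no PLC, the 2-bit comparator is LS
```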

Page 19: CS621/CS449 Artificial Intelligence Lecture Notes


Observations

• If the vectors are from a linearly separable function, then during the course of training there can never be a repetition of weight vectors.

• The projection of the weights on w* strictly increases with every update (shown below), hence the weights cannot repeat.

Since Xnfail . w* > 0 for every failed vector,

  wn+1 . w* = (wn + Xnfail) . w*
            = wn . w* + Xnfail . w*
            > wn . w*

and hence wn+1 . w* > wn . w* > … > w1 . w* > w0 . w*.

• Asummability (taken up next)

Page 20: CS621/CS449 Artificial Intelligence Lecture Notes


Asummability

• Another form of the PLC test
• Helps in quick testing of LS
• Definition of k-summability:

Given the +ve and -ve classes, a function is called k-summable if we can find k vectors Y1, …, Yk from the +ve class and k vectors Y'1, …, Y'k from the -ve class such that

  p1 . Y1 + p2 . Y2 + … + pk . Yk = n1 . Y'1 + n2 . Y'2 + … + nk . Y'k

where the pi's and nj's are > 0.

Page 21: CS621/CS449 Artificial Intelligence Lecture Notes


Asummability

• A function is called asummable if it is not k-summable for any k.

• Asummability is equivalent to LS.
• Example: XOR

  +ve class: <0,1>, <1,0>
  -ve class: <1,1>, <0,0>

  <0,1> + <1,0> = <1,1> = <1,1> + <0,0>

• XOR is 2-summable => not LS
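The 2-sum above, checked directly (illustration only):

```python
pos = [(0, 1), (1, 0)]     # +ve class of XOR
neg = [(1, 1), (0, 0)]     # -ve class of XOR
sum_pos = tuple(map(sum, zip(*pos)))
sum_neg = tuple(map(sum, zip(*neg)))
print(sum_pos == sum_neg)  # True: XOR is 2-summable, hence not LS
```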

Page 22: CS621/CS449 Artificial Intelligence Lecture Notes


Asummability – Examples

• AND is not k-summable for any k => asummable => LS

  +ve class: <1,1>
  -ve class: <0,0>, <0,1>, <1,0>

• A function with a single vector in one class and all the other vectors in the other class is always linearly separable.

• Another example: the majority function.