Introduction to Neural Networks (undergraduate course) Lecture 6 of 9


Transcript of Introduction to Neural Networks (undergraduate course) Lecture 6 of 9

Page 1

Neural Networks

Dr. Randa Elanwar

Lecture 6

Page 2

Lecture Content

• Non-linearly separable functions: XOR gate implementation

– MLP data transformation

– mapping implementation

– graphical solution


Page 3

Non-linear problems

• XOR problem

• The only way to separate the positive from the negative examples is to draw 2 lines (i.e., we need 2 straight-line equations) or a nonlinear region that captures one type only


[Figure: the XOR patterns on the (x1, x2) plane; the +ve and -ve examples alternate at the four corners, so a single straight line cannot capture one type only; two lines or a nonlinear (quadratic) region are needed.]
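As a quick numerical illustration of this claim (an addition for this transcript, not part of the original slides), the Python sketch below tries a coarse grid of candidate lines w1*x1 + w2*x2 + b = 0 and finds none that separates the four XOR patterns; in fact no single line can.

```python
import itertools

# The four XOR patterns and their labels (1 = +ve example, 0 = -ve example)
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(w1, w2, b):
    """True if the single line w1*x1 + w2*x2 + b = 0 puts all +ve patterns
    on one side and all -ve patterns on the other."""
    return all((w1 * x1 + w2 * x2 + b > 0) == (label == 1)
               for (x1, x2), label in patterns)

# Try a coarse grid of line parameters; none of them separates XOR.
grid = [i / 4 for i in range(-8, 9)]
found = any(separates(w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3))
print("single separating line found:", found)   # prints False
```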

Page 4

Non-linear problems

• To implement the nonlinearity we need to insert one or more extra layers of nodes between the input layer and the output layer (hidden layers)


Page 5

Non-linear problems

2-layer feed-forward example: XOR solution


Page 6

MLP data transformation and mapping implementation

• Need for hidden units:

• If there is one layer with enough hidden units, the input can be recoded (memorized); such a network is a multilayer perceptron (MLP)

• This recoding allows any problem to be mapped/represented (e.g., x2, x3, etc.)

• Question: how can the weights of the hidden units be trained?

• Answer: learning algorithms, e.g., backpropagation

• The term 'backpropagation' refers to propagating the error for weight adaptation of the layers, beginning from the last hidden layer back to the first, i.e., the weights of the last layer are computed before those of the previous layers
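As a rough illustration of this idea, here is a minimal backpropagation sketch for a small 2-layer sigmoid network trained on XOR with a squared-error loss; the network size (4 hidden units), learning rate, and iteration count are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR training patterns and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# A 2-4-1 network: weights/biases for the hidden layer and the output layer
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5                                  # learning rate (arbitrary choice)
for _ in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)              # hidden-layer activations
    y = sigmoid(h @ W2 + b2)              # network output

    # backward pass: the error term of the output layer is computed first,
    # then propagated back to obtain the error term of the hidden layer
    d_out = (y - T) * y * (1 - y)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # weight updates, last layer first, earlier layer afterwards
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0)

# outputs should now be close to the XOR targets (convergence depends on the
# random initialisation and the number of iterations)
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```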


Page 7

Learning Non-Linearly Separable Functions

• Backpropagation effectively transforms the training patterns to make them almost linearly separable, so that a linear network can then be used

• In other words, if we need more than 1 straight line to separate +ve and –ve patterns, we solve the problem in two phases:

– In phase 1, we represent each straight line by a single perceptron and use it to classify/map the training patterns (its output)

– In phase 2, these outputs are treated as new patterns, which are now linearly separable and can be classified by an additional perceptron, giving the final result.
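In code, these two phases amount to a layer of line perceptrons followed by one output perceptron. The sketch below is a generic scaffold with a hard-limiting (step) activation; the concrete line coefficients for XOR are derived on the following slides.

```python
def step(v):
    # hard-limiting activation: 0 if the net input is -ve, 1 otherwise
    return 0 if v < 0 else 1

def phase1(x, lines):
    """Phase 1: one perceptron per separating line; returns the transformed pattern y."""
    return tuple(step(w1 * x[0] + w2 * x[1] + b) for (w1, w2, b) in lines)

def phase2(y, w1, w2, b):
    """Phase 2: a single perceptron on the (now linearly separable) transformed pattern."""
    return step(w1 * y[0] + w2 * y[1] + b)

# Usage with the XOR coefficients derived on the following slides:
#   y = phase1((x1, x2), [(1, 1, -0.5), (1, 1, -1.5)])
#   out = phase2(y, 1, -2, -0.5)    # out = 1 corresponds to class A (XOR = 1)
```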


Page 8

Multi-layer Networks and Perceptrons


- Have one or more layers of hidden units.

- With two possibly very large hidden layers, it is possible to implement any function.

- Networks without a hidden layer are called perceptrons.

- Perceptrons are very limited in what they can represent, but this makes their learning problem much simpler.

Page 9

Solving Non-Linearly Separable Functions

• Example: XOR problem

• Phase 1: we draw arbitrary lines.

• We find the line equations g1(x) = 0 and g2(x) = 0 using arbitrary intersections on the axes (points p1, p2, p3, p4 in the figure).

• We assume the +ve and -ve directions for each line.

• We classify the given patterns as +ve/-ve with respect to both g1(x) and g2(x).

• Phase 2: we transform the patterns we have.

• Let the patterns that are +ve/+ve or -ve/-ve with respect to both g1(x) and g2(x) belong to class B (similar signs); otherwise they belong to class A (different signs).

• We find the line equation g(y) = 0 using arbitrary intersections on the new axes.


[Figure: the four XOR patterns on the (x1, x2) plane, labelled A/B, with the two lines g1(x) and g2(x), their +ve directions, and the intersection points p1, p2, p3, p4.]

x1  x2  XOR  Class
0   0   0    B
0   1   1    A
1   0   1    A
1   1   0    B

Page 10

Solving Non-Linearly Separable Functions

• Let p1 = (0.5, 0), p2 = (0, 0.5), p3 = (1.5, 0), p4 = (0, 1.5)

• Constructing g1(x) = 0

• g1(x) = x1 + x2 – 0.5 = 0

• Constructing g2(x) = 0

• g2(x) = x1 + x2 – 1.5 = 0


Using the two-point form of a line through points p and q, (x2 - p(2)) / (x1 - p(1)) = (q(2) - p(2)) / (q(1) - p(1)):

For g1(x), through p1 = (0.5, 0) and p2 = (0, 0.5):

(x2 - 0) / (x1 - 0.5) = (0.5 - 0) / (0 - 0.5)  =>  x2 = -(x1 - 0.5)  =>  g1(x) = x1 + x2 - 0.5 = 0

For g2(x), through p3 = (1.5, 0) and p4 = (0, 1.5):

(x2 - 0) / (x1 - 1.5) = (1.5 - 0) / (0 - 1.5)  =>  x2 = -(x1 - 1.5)  =>  g2(x) = x1 + x2 - 1.5 = 0
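The same construction can be checked numerically. The small helper below (an illustrative addition) builds the coefficients (a, b, c) of a line a*x1 + b*x2 + c = 0 from two points and reproduces g1(x) and g2(x) up to a common scale factor.

```python
def line_from_points(p, q):
    """Coefficients (a, b, c) of the line a*x1 + b*x2 + c = 0 through points p and q."""
    a = q[1] - p[1]              # difference of the second coordinates
    b = -(q[0] - p[0])           # minus the difference of the first coordinates
    c = -(a * p[0] + b * p[1])   # chosen so that the line passes through p
    return a, b, c

print(line_from_points((0.5, 0), (0, 0.5)))   # (0.5, 0.5, -0.25), i.e. x1 + x2 - 0.5 = 0
print(line_from_points((1.5, 0), (0, 1.5)))   # (1.5, 1.5, -2.25), i.e. x1 + x2 - 1.5 = 0
```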

Page 11

Solving Non-Linearly Separable Functions

• Assume x1 > p1(1) is the positive direction for g1(x).

• Assume x1 > p3(1) is the positive direction for g2(x).

• Classifying the given patterns with respect to g1(x) and g2(x) gives the table below.

• We represent the +ve and -ve values as the result of a step function, i.e., y1 = f(g1(x)) = 0 if g1(x) is -ve and 1 otherwise, and y2 = f(g2(x)) = 0 if g2(x) is -ve and 1 otherwise.

• We now have only three distinct patterns, which are linearly separable; the extra pattern that caused the problem is gone (two of the original patterns coincide after the transformation).


x1 x2 g1(x) g2(x) y1 y2 Class

0 0 -ve -ve 0 0 B

0 1 +ve -ve 1 0 A

1 0 +ve -ve 1 0 A

1 1 +ve +ve 1 1 B
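This table can be reproduced directly; the short sketch below applies the step function to g1(x) and g2(x) for the four patterns and assigns class B to similar signs and class A to different signs, as defined earlier.

```python
def step(v):
    return 0 if v < 0 else 1   # 0 for a -ve argument, 1 otherwise

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    g1 = x1 + x2 - 0.5
    g2 = x1 + x2 - 1.5
    y1, y2 = step(g1), step(g2)
    cls = 'B' if y1 == y2 else 'A'   # similar signs -> class B, different signs -> class A
    print(x1, x2, y1, y2, cls)
```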

Page 12

Solving Non-Linearly Separable Functions

• Let p1 = (0.5,0), p2 = (0,-0.25)

• Constructing g(y) = 0

• g(y) = y1 – 2 y2 – 0.5 = 0


[Figure: the transformed patterns on the (y1, y2) plane: class B at (0, 0) and (1, 1), class A at (1, 0); the separating line g(y) with its +ve direction and the points p1, p2.]

Using the two-point form with p1 = (0.5, 0) and p2 = (0, -0.25):

(y2 - 0) / (y1 - 0.5) = (-0.25 - 0) / (0 - 0.5)  =>  y2 = 0.5 (y1 - 0.5)  =>  g(y) = y1 - 2 y2 - 0.5 = 0

[Figure: the resulting network. Input layer: x1, x2. Hidden layer: g1(x) with weights (1, 1) and bias -0.5, g2(x) with weights (1, 1) and bias -1.5. Output layer: g(y) with weights (1, -2) and bias -0.5.]
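Putting the derived weights together, the complete 2-layer XOR network sketched in the figure can be written out as follows (the code itself is an illustrative addition; the weights and the step activation are the ones derived above).

```python
import numpy as np

# Hidden layer: g1(x) = x1 + x2 - 0.5 and g2(x) = x1 + x2 - 1.5
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])        # one row of weights per hidden unit
b_hidden = np.array([-0.5, -1.5])

# Output layer: g(y) = y1 - 2*y2 - 0.5
w_out = np.array([1.0, -2.0])
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # step activations: 0 for a -ve net input, 1 otherwise
    y = (W_hidden @ np.array(x, dtype=float) + b_hidden >= 0).astype(int)
    out = 1 if w_out @ y + b_out >= 0 else 0
    print(x, "->", tuple(int(v) for v in y), "-> class", "A" if out == 1 else "B")
```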

Page 13

Solving Non-Linearly Separable Functions

• Example: The linearly non-separable patterns x1 = [3 0], x2 = [5 2], x3 = [1 3], x4 = [2 4], x5 = [1 1], x6 = [3 3] have to be classified into two categories C1 = {x1, x2, x3, x4} and C2 = {x5, x6} using a feed-forward 2-layer neural network.

– Select a suitable number of partitioning straight lines.

– Consequently design the first stage (hidden layer) of the network with bipolar discrete perceptrons.

– Using this layer, transform the six samples.

– Design the output layer of the network with a bipolar discrete perceptron using the transformed samples.
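For reference, a bipolar discrete perceptron differs from the earlier unipolar one only in its activation, which outputs -1/+1 instead of 0/1. A minimal sketch (the function name and sample call are illustrative):

```python
def bipolar_perceptron(x, w, b):
    """Discrete bipolar perceptron: -1 if the net input w.x + b is -ve, +1 otherwise."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    return -1 if net < 0 else 1

# e.g. the first hidden unit designed on the next slide, g1(x) = x1 - x2 + 1:
print(bipolar_perceptron((3, 0), (1, -1), 1))   # +1: sample x1 = [3 0] is on the +ve side of g1
```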


Page 14

Solving Non-Linearly Separable Functions

• Let p1 = (0, 1), p2 = (1, 2), p3 = (2, 0), p4 = (3, 1)

• Constructing g1(x) = 0

• g1(x) = x1 - x2 + 1 = 0

• Constructing g2(x) = 0

• g2(x) = x1 - x2 - 2 = 0


[Figure: the six samples x1 to x6 on the (X1, X2) plane, the two partitioning lines g1(x) and g2(x) with their +ve directions, and the points p1, p2, p3, p4 used to construct them.]

Using the two-point form:

For g1(x), through p1 = (0, 1) and p2 = (1, 2):

(x2 - 1) / (x1 - 0) = (2 - 1) / (1 - 0)  =>  x2 = x1 + 1  =>  g1(x) = x1 - x2 + 1 = 0

For g2(x), through p3 = (2, 0) and p4 = (3, 1):

(x2 - 0) / (x1 - 2) = (1 - 0) / (3 - 2)  =>  x2 = x1 - 2  =>  g2(x) = x1 - x2 - 2 = 0

Page 15

Solving Non-Linearly Separable Functions


x1 x2 g1(x) g2(x) y1 y2 Class

3 0 +ve +ve 1 1 B

5 2 +ve +ve 1 1 B

1 3 -ve -ve -1 -1 B

2 4 -ve -ve -1 -1 B

1 1 +ve -ve 1 -1 A

3 3 +ve -ve 1 -1 A

Assume x2 < p1(2) is the positive direction for g1(x). Assume x1 > p3(1) is the positive direction for g2(x).

We represent the +ve and -ve values as the result of a bipolar function: y1 = f(g1(x)) = -1 if g1(x) is -ve and 1 otherwise; y2 = f(g2(x)) = -1 if g2(x) is -ve and 1 otherwise.

Classifying the given patterns with respect to g1(x) and g2(x) gives the table above. We now have only three distinct patterns, which are linearly separable (pairs of the original patterns coincide after the transformation).
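As a quick check (an illustrative sketch using the bipolar function just defined), the transformation of the six samples can be reproduced as follows:

```python
samples = {'x1': (3, 0), 'x2': (5, 2), 'x3': (1, 3),
           'x4': (2, 4), 'x5': (1, 1), 'x6': (3, 3)}

def bipolar(v):
    return -1 if v < 0 else 1   # -1 for a -ve argument, +1 otherwise

for name, (x1, x2) in samples.items():
    y1 = bipolar(x1 - x2 + 1)   # g1(x) = x1 - x2 + 1
    y2 = bipolar(x1 - x2 - 2)   # g2(x) = x1 - x2 - 2
    cls = 'B' if y1 == y2 else 'A'   # similar signs -> class B (C1), different -> class A (C2)
    print(name, (x1, x2), '->', (y1, y2), cls)
```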

Page 16

Solving Non-Linearly Separable Functions

• Let p1 = (1,0), p2 = (0,-1)

• Constructing g(y) = 0

• g(y) = y1 – y2 – 1 = 0


[Figure: the transformed patterns on the (y1, y2) plane: x1, x2 map to (1, 1) and x3, x4 map to (-1, -1) (class B), while x5, x6 map to (1, -1) (class A); the separating line g(y) with its +ve direction and the points p1, p2.]

Using the two-point form with p1 = (1, 0) and p2 = (0, -1):

(y2 - 0) / (y1 - 1) = (-1 - 0) / (0 - 1)  =>  y2 = y1 - 1  =>  g(y) = y1 - y2 - 1 = 0

[Figure: the resulting network. Input layer: x1, x2. Hidden layer: g1(x) with weights (1, -1) and bias 1, g2(x) with weights (1, -1) and bias -2. Output layer: g(y) with weights (1, -1) and bias -1.]
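Combining the derived weights, the complete network for this example (bipolar hidden units g1, g2 and a bipolar output unit g(y)) can be sketched as follows; the code is an illustrative addition, with the -1/+1 output mapped back to the categories C1 and C2:

```python
import numpy as np

def bipolar(v):
    # discrete bipolar activation, elementwise: -1 for a -ve net input, +1 otherwise
    return np.where(v < 0, -1, 1)

# Hidden layer: g1(x) = x1 - x2 + 1 and g2(x) = x1 - x2 - 2
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([1.0, -2.0])

# Output layer: g(y) = y1 - y2 - 1
w_out = np.array([1.0, -1.0])
b_out = -1.0

for x in [(3, 0), (5, 2), (1, 3), (2, 4), (1, 1), (3, 3)]:
    y = bipolar(W_hidden @ np.array(x, dtype=float) + b_hidden)   # transformed pattern (y1, y2)
    out = int(bipolar(w_out @ y + b_out))                         # -1 -> category C1, +1 -> category C2
    print(x, '->', tuple(int(v) for v in y), '->', 'C1' if out == -1 else 'C2')
```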