
Page 1:

Learning via Neural Networks

L. Manevitz All rights reserved

Page 2:

Natural versus Artificial Neuron

• Natural Neuron vs. McCullough-Pitts Neuron

Page 3:

Definitions and History

• McCullough-Pitts Neuron

• Perceptron

• Adaline

• Linear Separability

• Multi-Level Neurons

• Neurons with Loops

Page 4:

Sample Feed Forward Network (no loops)

[Diagram: input units feeding a hidden layer and then an output layer; the two layers of weights are W_ji and V_ik, and each unit computes F(Σ_j w_ji x_j).]
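To make the diagram concrete, here is a minimal numpy sketch of one forward pass through such a two-weight-layer network. The layer sizes, the sigmoid choice for F, and the variable names are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def F(a):
    # Activation function; a sigmoid is assumed here for illustration.
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W, V):
    """One pass through the diagrammed network: input -> hidden -> output.

    x : input vector, shape (n_in,)
    W : input-to-hidden weights (the w_ji), shape (n_in, n_hidden)
    V : hidden-to-output weights (the v_ik), shape (n_hidden, n_out)
    """
    hidden = F(x @ W)        # each hidden unit i computes F(sum_j w_ji * x_j)
    output = F(hidden @ V)   # each output unit k combines the hidden values
    return output

# Tiny usage example with made-up sizes.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(3, 4))
V = rng.normal(size=(4, 2))
print(forward(x, W, V))
```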

Page 5:

Perceptron

• Pattern Identification

• (Note: the neuron is trained)

• Weights and threshold: if Σ_i w_i x_i > threshold, the letter "A" is in the receptive field.

Page 6:

Feed Forward Network

• Weights at each layer; each unit fires when Σ_i w_i x_i > threshold (its receptive field).

Page 7:

Reason for Explosion of Interest

• Two coincident developments (around 1985 – 87)

– (Re-)discovery of mathematical tools and algorithms for handling large networks

– Availability (hurray for Intel and company!) of sufficient computing power to make experiments practical.

Page 8:

Some Properties of NNs

• Universal: Can represent and accomplish any task.

• Uniform: “Programming” is changing weights

• Automatic: Algorithms for Automatic Programming; Learning

Page 9:

Networks are Universal

• All logical functions can be represented by a three-level (loop-free) network (McCullough-Pitts)

• All continuous (and more) functions can be represented by three-level feed-forward networks (Cybenko et al.)

• Networks can self-organize (without a teacher).

• Networks serve as associative memories

Page 10:

Universality

• McCullough-Pitts: Adaptive Logic Gates; can represent any logic function

• Cybenko: Any continuous function representable by three-level NN.
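As an informal illustration of the Cybenko-style claim (not the construction used in the proof), the sketch below fits a one-hidden-layer sigmoid network to a continuous target by fixing random hidden weights and solving for the output weights by least squares. The target function, layer size, and weight scales are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A continuous target function on [0, 1].
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel()

# Hidden layer: 50 sigmoid units with fixed random weights and biases.
H = 50
W = rng.normal(scale=10.0, size=(1, H))
b = rng.normal(scale=10.0, size=H)
hidden = 1.0 / (1.0 + np.exp(-(x @ W + b)))

# Output layer: linear output weights chosen by least squares.
v, *_ = np.linalg.lstsq(hidden, y, rcond=None)
approx = hidden @ v

print("max |error| =", float(np.abs(approx - y).max()))
```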

Page 11:

Networks can “LEARN” and Generalize (Algorithms)

• One Neuron (Perceptron and Adaline): very popular in the 1960s to early 70s; limited by representability (only linearly separable concepts)

• Feed-forward networks (Back Propagation): currently the most popular network (1987 – now)

• Kohonen Self-Organizing Network (1980s – now) (loops)

• Attractor Networks (loops)

Page 12:

Learnability (Automatic Programming)

• One neuron: Perceptron and Adaline algorithms (Rosenblatt and Widrow-Hoff) (1960s – now)

• Feed-forward Networks: Backpropagation (1987 – now)

• Associative Memories and Looped Networks (“Attractors”) (1990s – now)

Page 13:

Generalizability

• Typically train a network on a sample set of examples

• Use it on general class

• Training can be slow; but execution is fast.

Page 14:

Feed Forward Network

• Weights at each layer; each unit fires when Σ_i w_i x_i > threshold (its receptive field).

Page 15:

Classical Applications (1986 – 1997)

• “Net Talk” : text to speech

• ZIPcodes: handwriting analysis

• Glovetalk: Sign Language to speech

• Data and Picture Compression: “Bottleneck”

• Steering of Automobile (up to 55 m.p.h)

• Market Predictions

• Associative Memories

• Cognitive Modeling: (especially reading, …)

• Phonetic Typewriter (Finnish)

Page 16:

Neural Network

• Once the architecture is fixed, the only free parameters are the weights

• Thus: Uniform Programming

• Potentially: Automatic Programming

• Search for Learning Algorithms

Page 17:

Programming: Just find the weights!

• AUTOMATIC PROGRAMMING

• One Neuron: Perceptron or Adaline

• Multi-Level: Gradient Descent on Continuous Neuron (Sigmoid instead of step function).
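A minimal sketch of "gradient descent on a continuous (sigmoid) neuron": one unit, squared error, batch updates. The learning rate, epoch count, and the OR task used in the demo are assumptions chosen for illustration, not taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_sigmoid_neuron(X, t, lr=0.5, epochs=5000, seed=0):
    """Gradient descent on squared error for a single sigmoid neuron.

    X : examples with a trailing 1 appended (bias folded into the weights)
    t : targets in {0, 1}
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=X.shape[1])
    for _ in range(epochs):
        y = sigmoid(X @ w)
        grad = X.T @ ((y - t) * y * (1.0 - y))   # dE/dw for E = 0.5 * sum (y - t)^2
        w -= lr * grad
    return w

# Usage: OR is linearly separable, so one neuron suffices.
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
w = train_sigmoid_neuron(X, t)
print(np.round(sigmoid(X @ w)))   # expect approximately [0, 1, 1, 1]
```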

Page 18:

Prediction

[Diagram: the input/output signal is fed to the NN through a delay; the NN's prediction is compared with the value that actually arrives.]
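A small sketch of the prediction setup in the diagram: the series is delayed into input windows, and the network's output would then be compared with the value that actually arrives next. Only the data preparation is shown; the window length and the sine signal are arbitrary assumptions.

```python
import numpy as np

def make_prediction_pairs(series, delay=3):
    """Turn a signal into (past window, next value) training pairs.

    The NN sees the last `delay` values; the Compare step matches its
    output against the value one step later.
    """
    X = np.array([series[i:i + delay] for i in range(len(series) - delay)])
    y = np.array(series[delay:])
    return X, y

signal = np.sin(np.linspace(0.0, 10.0, 100))
X, y = make_prediction_pairs(signal, delay=3)
print(X.shape, y.shape)   # (97, 3) (97,)
```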

Page 19:

Abstracting

• Note: Size may not be crucial (an Aplysia or a crab does many things)

• Look at simple structures first

Page 20:

Real and Artificial Neurons

Page 21:

One Neuron: McCullough-Pitts

• This is very complicated. But abstracting away the details, we have:

[Diagram: inputs x1, x2, …, xn with weights w1, w2, …, wn feeding a single integrate-and-threshold unit.]

Integrate-and-fire Neuron
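A minimal sketch of the abstracted unit: a weighted sum followed by a hard threshold. The function name is mine; the example weights and the 1.5 threshold match the AND unit shown a few slides later.

```python
def mcp_neuron(inputs, weights, threshold):
    """McCullough-Pitts unit: output 1 iff the weighted input sum exceeds the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0

# Usage: with weights 1.0, 1.0 and threshold 1.5 the unit fires only when
# both inputs are on (an AND-like unit).
print(mcp_neuron([1, 1], [1.0, 1.0], 1.5))   # 1
print(mcp_neuron([1, 0], [1.0, 1.0], 1.5))   # 0
```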

Page 22:

Representability

• What functions can be represented by a network of McCullough-Pitts neurons?

• Theorem: Every logic function of an arbitrary number of variables can be represented by a three-level network of such neurons.

Page 23:

Proof

• Show simple functions: and, or, not, implies

• Recall that every logic function is representable in DNF (disjunctive normal form).

Page 24:

AND, OR, NOT

[Diagram: threshold unit with inputs x1, x2, weights 1.0 and 1.0, and threshold 1.5: computes AND.]

Page 25:

AND, OR, NOT

[Diagram: threshold unit with inputs x1, x2, weights 1.0 and 1.0, and threshold .9: computes OR.]

Page 26:

AND, OR, NOT

[Diagram: threshold unit with a single input, weight -1.0 and threshold -.5: computes NOT.]

Page 27:

DNF and All Functions
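A sketch of the DNF construction behind the representability theorem, using the gate weights and thresholds from the previous slides (NOT is taken as weight -1.0, threshold -0.5). XOR is written in DNF and wired as a three-level network of threshold units; the function names are mine, not from the slides.

```python
def mcp(inputs, weights, threshold):
    """Threshold unit: 1 iff sum_i w_i * x_i > threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) > threshold else 0

# Gates built from single units (weights/thresholds as on the slides).
def AND(x1, x2): return mcp([x1, x2], [1.0, 1.0], 1.5)
def OR(x1, x2):  return mcp([x1, x2], [1.0, 1.0], 0.9)
def NOT(x):      return mcp([x], [-1.0], -0.5)

# DNF gives a three-level network for any logic function.
# Example: XOR = (x1 AND NOT x2) OR (NOT x1 AND x2)
#   level 1: negated literals, level 2: one AND per DNF term, level 3: OR.
def XOR(x1, x2):
    return OR(AND(x1, NOT(x2)), AND(NOT(x1), x2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))
```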

Page 28:

Learnability and Generalizability

• The previous theorem tells us that neural networks are potentially powerful, but doesn’t tell us how to use them.

• We desire simple networks with uniform training rules.

Page 29:

One Neuron (Perceptron)

• What can be represented by one neuron?

• Is there an automatic way to learn a function by examples?

Page 30:

Perceptron Training Rule

• Loop: Take an example and apply it to the network. If the answer is correct, return to Loop; if incorrect, go to FIX.

• FIX: Adjust the network weights by the input example (added or subtracted according to the example's sign, as made precise on the Perceptron Algorithm slide). Go to Loop.

Page 31:

Example of Perceptron Learning

• X1 = 1 (+), X2 = -.5 (-)

• X3 = 3 (+), X4 = -2 (-)

• Expanded vectors:

– Y1 = (1,1) (+), Y2 = (-.5,1) (-)

– Y3 = (3,1) (+), Y4 = (-2,1) (-)

• Random initial weight: W1 = (-2.5, 1.75)

Page 32:

Trace of Perceptron

– W1 · Y1 = (-2.5,1.75) · (1,1) < 0: wrong

• W2 = W1 + Y1 = (-1.5, 2.75)

– W2 · Y2 = (-1.5,2.75) · (-.5,1) > 0: wrong

• W3 = W2 – Y2 = (-1, 1.75)

– W3 · Y3 = (-1,1.75) · (3,1) < 0: wrong

• W4 = W3 + Y3 = (2, 2.75)

Page 33:

Perceptron Convergence Theorem

• If the concept is representable in a perceptron then the perceptron learning rule will converge in a finite amount of time.

• (MAKE PRECISE and Prove)

Page 34:

Fraction of Boolean Functions Representable by a Perceptron

Inputs   Boolean functions    Representable by a perceptron
1        4                    4
2        16                   14
3        256                  104
4        65,536               1,882
5        ~10**9               94,572
6        ~10**19              15,028,134
7        ~10**38              8,378,070,864
8        ~10**77              17,561,539,552,946

Page 35:

What is a Neural Network?

• What is an abstract Neuron?

• What is a Neural Network?

• How are they computed?

• What are the advantages?

• Where can they be used?

– Agenda

– What to expect

Page 36:

Perceptron Algorithm

• Start: Choose arbitrary value for weights, W

• Test: Choose arbitrary example X

• If (X pos and W·X > 0) or (X neg and W·X <= 0), go to Test

• Fix:

– If X pos: W := W + X

– If X neg: W := W - X

– Go to Test
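A minimal sketch of the Start/Test/Fix procedure above. The slide chooses examples arbitrarily; this sketch simply sweeps them in order and stops after a full pass without a Fix, with a pass cap added as a safety assumption.

```python
import numpy as np

def perceptron(pos, neg, w0, max_passes=100):
    """Start/Test/Fix loop. pos/neg are lists of example vectors, already
    expanded with a trailing 1 so the threshold is folded into the weights."""
    w = np.array(w0, dtype=float)
    for _ in range(max_passes):
        fixed = False
        for x in pos:
            x = np.asarray(x, dtype=float)
            if not (w @ x > 0):      # positive example misclassified
                w = w + x            # Fix: W := W + X
                fixed = True
        for x in neg:
            x = np.asarray(x, dtype=float)
            if not (w @ x <= 0):     # negative example misclassified
                w = w - x            # Fix: W := W - X
                fixed = True
        if not fixed:                # a full pass with no Fix: done
            return w
    return w
```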

Page 37:

Perceptron Conv. Thm.

• Let F be a set of unit-length vectors. If there is a unit vector V* and a value e > 0 such that V*·X > e for all X in F, then the perceptron program goes to FIX only a finite number of times.

Page 38:

Applying Algorithm to “And”

• W0 = (0,0,1) or random

• X1 = (0,0,1) result 0

• X2 = (0,1,1) result 0

• X3 = (1,0, 1) result 0

• X4 = (1,1,1) result 1

Page 39:

“And” continued

• W0 X1 = 1 > 0: wrong

• W1 = W0 – X1 = (0,0,0)

• W1 X2 = 0 OK (Bdry)

• W1 X3 = 0 OK

• W1 X4 = 0: wrong

• W2 = W1 + X4 = (1,1,1)

• W2 X1 = 1: wrong

• W3 = W2 – X1 = (1,1,0)

• W3 X2 = 1: wrong

• W4 = W3 – X2 = (1,0,-1)

• W4 X3 = 0 OK

• W4 X4 = 0: wrong

• W5 = W4 + X4 = (2,1,0)

• W5 X1 = 0 OK

• W5 X2 = 1: wrong

• W6 = W5 – X2 = (2,0,-1)

Page 40:

“And” page 3

• W6 X3 = 1: wrong

• W7 = W6 – X3 = (1,0,-2)

• W7 X4 = -1: wrong

• W8 = W7 + X4 = (2,1,-1)

• W8 X1 = -1 OK

• W8 X2 = 0 OK

• W8 X3 = 1: wrong

• W9 = W8 – X3 = (1,1,-2)

• W9 X4 = 0: wrong

• W10 = W9 + X4 = (2,2,-1)

• W10 X1 = -1 OK

• W10 X2 = 1: wrong

• W11 = W10 – X2 = (2,1,-2)

• W11 now classifies all four examples correctly, so no further FIX is needed.
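The trace can be reproduced mechanically. The sketch below replays the Fix rule on the “And” data in the same presentation order (X1, X2, X3, X4, cycling), printing the weights after every Fix; the pass cap is an added safety assumption.

```python
import numpy as np

# The "And" data from page 38, with the trailing-1 expansion.
X = [np.array([0., 0., 1.]), np.array([0., 1., 1.]),
     np.array([1., 0., 1.]), np.array([1., 1., 1.])]
positive = [False, False, False, True]     # only (1,1) is a positive example

w = np.array([0., 0., 1.])                 # W0 from the slide
step = 0
for _ in range(50):                        # pass cap (assumption)
    fixed = False
    for x, pos in zip(X, positive):
        ok = (w @ x > 0) if pos else (w @ x <= 0)
        if not ok:
            w = w + x if pos else w - x    # the Fix step
            step += 1
            print(f"fix {step}: w = {w}")
            fixed = True
    if not fixed:
        break

print("final weights:", w)                 # ends at (2, 1, -2)
```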

Page 41:

Proof of Conv Theorem

• Note:

1. By hypothesis, there is an e > 0 such that V*·X > e for all X in F.

2. We can eliminate the threshold by adding an additional dimension to the input: W·(x,y,z) > threshold if and only if W*·(x,y,z,1) > 0, where W* = (W, -threshold).

3. We can assume all examples are positive ones (replace each negative example by its negated vector): W·(x,y,z) < 0 if and only if W·(-x,-y,-z) > 0.

Page 42:

Proof (cont).

• Consider the quotient V*·W/|W|. (Note: this is the multidimensional cosine of the angle between V* and W.)

Recall that V* is a unit vector, so the quotient is <= 1.

Page 43:

Proof (cont.)

• Each time FIX is visited, W changes by adding the (positive) example X:

V*·W(n+1) = V*·(W(n) + X) = V*·W(n) + V*·X >= V*·W(n) + e

Hence, starting from W(0) = 0,

V*·W(n) >= n·e (*)

Page 44:

Proof (cont)

• Now consider the denominator:

|W(n+1)|**2 = W(n+1)·W(n+1) = (W(n) + X)·(W(n) + X) = |W(n)|**2 + 2 W(n)·X + 1 (recall |X| = 1)

<= |W(n)|**2 + 1, since W(n)·X <= 0 whenever FIX is entered (the example was misclassified).

So after n visits to FIX (again starting from W(0) = 0),

|W(n)|**2 <= n (**)

Page 45:

Proof (cont)

• Putting (*) and (**) together:

Quotient = V*·W(n)/|W(n)| >= n·e / sqrt(n) = sqrt(n)·e

Since Quotient <= 1, this means

n <= 1/e**2

This means we enter FIX a bounded number of times.

Q.E.D.

Page 46:

Geometric Proof

• See hand slides.

Page 47:

Additional Facts

• Note: If the X's are presented in a systematic way, then a solution W is always found.

• Note: The solution is not necessarily the same as V*.

• Note: If F is not finite, a solution may not be obtained in finite time.

• The algorithm can be modified in minor ways and remains valid (e.g., bounded rather than unit-length examples); the bound on W(n) changes accordingly.

Page 48:

What won't work?

• Example: Connectedness cannot be computed by a diameter-limited perceptron.

• Compare with convexity, which can be computed (using sensors of order three).

Page 49:

What won't work?

• Try XOR.

Page 50:

Limitations of Perceptron

• Representability

– Only concepts that are linearly separable.

– Compare: Convex versus connected

– Examples: XOR vs OR
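For contrast with the “And” run above, the same Fix loop applied to XOR (with the expanded-vector trick) never reaches an error-free pass, since XOR is not linearly separable; the sweep cap below is only there to stop the loop.

```python
import numpy as np

# XOR with the trailing-1 expansion: (0,1) and (1,0) are the positive examples.
X = [np.array([0., 0., 1.]), np.array([0., 1., 1.]),
     np.array([1., 0., 1.]), np.array([1., 1., 1.])]
positive = [False, True, True, False]

w = np.zeros(3)
converged = False
for sweep in range(1000):                  # cap; the loop would otherwise run forever
    fixed = False
    for x, pos in zip(X, positive):
        ok = (w @ x > 0) if pos else (w @ x <= 0)
        if not ok:
            w = w + x if pos else w - x
            fixed = True
    if not fixed:
        converged = True
        break

print("converged:", converged, "after", sweep + 1, "sweeps; w =", w)
```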