Page 1

Bioinspired Computing Lecture 6

Artificial Neural Networks:

The rise & fall of the perceptron

Netta Cohen

Page 2

Last time... biological neural networks

We introduced biological neural networks. We found complexity at every level, from the sub-cellular to the entire brain. We realised that even with a limited understanding, cartoon models can be derived for some functions of neurons (action potentials, synaptic transmission, neuronal computation and coding). Despite (or perhaps because of) their simplicity, these cartoon models are priceless.

This time... Artificial neural networks (part 1)

Forget the complexity. Focus on cartoon models of biological nnets & simplify them further. Build on the biology to design simple artificial networks that perform classification tasks. Today, we start with a single artificial neuron and study its computational power.

Page 3

Learning

No discussion of the brain, or nervous systems more generally, is complete without mention of learning.

• What is learning?
• How does a neural network 'know' what computation to perform?
• How does it know when it gets an 'answer' right (or wrong)?
• What actually changes as a neural network undergoes 'learning'?

[Figure: the brain receives sensory inputs and produces motor outputs; the brain sits within a body, which sits within an environment.]

Page 4

Learning (cont.)

Learning can take many forms:
• Supervised learning
• Reinforcement learning
• Association
• Conditioning
• Evolution

At the level of neural networks, the best understood forms of learning occur in the synapses, i.e., the strengthening and weakening of connections between neurons. The brain uses its own learning algorithms to define how connections should change in a network.

Page 5

Learning from experience

How do neural networks form in the brain? Once formed, what determines how the circuit might change?

In 1949, Donald Hebb, in his book "The Organization of Behavior", showed how basic psychological phenomena of attention, perception & memory might emerge in the brain.

Hebb regarded neural networks as a collection of cells that can collectively store memories. Our memories reflect our experience.

How does experience affect neurons and neural networks? How do neural networks learn?

Page 6

Synaptic Plasticity

Definition of Learning: experience alters behaviour

The basic experience in neurons is spikes. Spikes are transmitted between neurons through synapses.

Hebb suggested that connections in the brain change in response to experience.

[Figure: spike trains of a pre-synaptic cell and a post-synaptic cell plotted against time; the post-synaptic spike follows the pre-synaptic one after a short delay.]

Hebbian learning: If the pre-synaptic cell causes the post-synaptic cell to fire a spike, then the connection between them will be enhanced. Eventually, this will lead to a path of ‘least resistance’ in the network.
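
Hebb's idea can be captured in a few lines of code. A minimal sketch (an illustration, not the lecture's formalism — the learning rate eta and the 0/1 activity encoding are assumptions):

```python
# Minimal Hebbian update sketch: strengthen a synapse whenever
# pre-synaptic firing is accompanied by a post-synaptic spike.
# 'eta' (the learning rate) and the 0/1 activity encoding are
# illustrative assumptions, not part of the lecture's formalism.

def hebbian_update(w, pre_active, post_active, eta=0.1):
    """Return the updated synaptic weight for one time step."""
    if pre_active and post_active:  # coincident activity: potentiate
        w += eta
    return w

# Repeated paired firing enhances the connection:
w = 0.5
for _ in range(5):
    w = hebbian_update(w, pre_active=True, post_active=True)
print(w)  # ~1.0: a 'path of least resistance' is being carved out
```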

Page 7

Today... Artificial neural networks (part 1)

Focus on the simplest cartoon models of biological neural nets. We will build on lessons from today to design simple artificial neurons and networks that perform useful computational tasks.

Page 8

The Appeal of Neural Computing

The only intelligent systems that we know of are biological. In particular, most brains share the following feature in their neural architecture: they are massively parallel networks organised into interconnected hierarchies of complex structures.

In addition, they are very good at some tasks that computers are typically poor at:
• recognising patterns, balancing conflicts, sensory-motor coordination, interaction with the environment, anticipation, learning… even curiosity, creativity & consciousness.

For computer scientists, many natural systems appear to share many attractive properties:
• speed, tolerance, robustness, flexibility, self-driven dynamic activity

Page 9

The first artificial neuron model

In analogy to a biological neuron, we can think of a virtual neuron that crudely mimics the biological neuron and performs analogous computation.

The artificial neuron is a cartoon model that will not have all the biological complexity of real neurons. How powerful is it?

Just like biological neurons, this artificial neuron will have:

• Inputs (like biological dendrites) that carry signals to the cell body,
• A body (like the soma) that sums over the inputs to compute the output, and
• Outputs (like synapses on the axon) that transmit the output downstream.

[Figure: several inputs converge on a cell body marked Σ, which produces a single output.]

Page 10

Early history (1943)

In a seminal 1943 paper (full citation below), Warren McCulloch and Walter Pitts invented the first artificial (MP) neuron, based on the insight that a nerve cell will fire an impulse only if its threshold value is exceeded. MP neurons are hard-wired devices, reading pre-defined input-output associations to determine their final output. Despite their simplicity, M&P proved that a single MP neuron can perform universal logic operations.

A network of such neurons can therefore do anything a Turing machine can do, but with a much more flexible (and potentially very parallel) architecture.

McCulloch & Pitts (1943). “A logical calculus of the ideas immanent in nervous activity”, Bulletin of Mathematical Biophysics, 5, 115-137.

Page 11

The McCulloch-Pitts (MP) neuron

The "computation" consists of "adders" and a threshold:

• Inputs x are binary: 0, 1.
• Each input has an assigned weight w.
• Weighted inputs are summed in the cell body: ∑ xi wi (over all i).
• The neuron fires if the sum exceeds (or equals) the activation threshold θ.
• If the neuron fires, the output = 1. Otherwise, the output = 0.

output = 1 if ∑ xi wi ≥ θ
output = 0 if ∑ xi wi < θ

[Figure: inputs x1, x2, x3, …, xn arrive with weights w1, w2, w3, …, wn; the weighted sum is compared to the threshold θ to produce the output.]

Note: an equivalent formalism sets θ = 0 and, instead of a threshold, introduces an extra bias input, such that bias × wbias = −θ.
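
A minimal Python sketch of this neuron (the ≥ threshold convention is taken from the slide; the function names are mine):

```python
# McCulloch-Pitts neuron: binary inputs, fixed weights, and a hard
# threshold. Fires (outputs 1) when the weighted sum reaches theta.

def mp_neuron(inputs, weights, theta):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

# Equivalent bias formalism: fold the threshold into an always-on
# bias input with weight w_bias = -theta, then compare against 0.
def mp_neuron_bias(inputs, weights, theta):
    return mp_neuron(inputs + [1], weights + [-theta], 0)

# Sanity check: the two formalisms agree on every binary input pair
# (the weights and threshold here are the AND gate of a later slide).
for x1 in (0, 1):
    for x2 in (0, 1):
        assert mp_neuron([x1, x2], [1, 1], 1.5) == \
               mp_neuron_bias([x1, x2], [1, 1], 1.5)
```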

Page 12

Logic gates with MP neurons

For binary logic gates with only one input, the possible outputs are described by the following truth tables:

IN 1 | OUT 1      IN 1 | OUT 2      IN 1 | OUT 3      IN 1 | OUT 4
  0  |   0          0  |   0          0  |   1          0  |   1
  1  |   0          1  |   1          1  |   0          1  |   1
 (Always 0)        (IDENTITY)          (NOT)          (Always 1)

For example, the NOT gate is implemented by a single MP neuron with weight w = −1 and threshold θ = −0.5:

x →[ w = −1, θ = −0.5 ]→ NOT x

Exercise: Find w and θ for the 3 remaining gates.
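
A quick check of the NOT gate parameters (the three-line neuron is repeated so the snippet runs on its own):

```python
# Verify the NOT gate: w = -1, theta = -0.5.
def mp_neuron(inputs, weights, theta):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

for x in (0, 1):
    print(x, "->", mp_neuron([x], [-1], -0.5))  # 0 -> 1, 1 -> 0
```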

Page 13

Logic gates with MP neurons (cont.)

With two binary inputs, there are 4 possible input combinations and 2⁴ = 16 corresponding truth tables (outputs)!

For example, the AND gate:

IN 1 | IN 2 | OUT
  0  |   0  |  0
  0  |   1  |  0
  1  |   0  |  0
  1  |   1  |  1

Here is a compact, graphical representation of the same truth table:

            IN 2
           0   1
IN 1   0   0   0
       1   0   1

The AND gate is implemented in the MP neuron with weights w1 = w2 = +1 and threshold θ = +1.5:

x1, x2 →[ w1 = 1, w2 = 1, θ = +1.5 ]→ x1 AND x2

Exercise: Find w and θ for OR & NAND.
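
The AND parameters can be checked the same way (and the snippet is easily edited to test candidate weights for OR & NAND):

```python
# Verify the AND gate: w1 = w2 = 1, theta = 1.5.
def mp_neuron(inputs, weights, theta):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mp_neuron([x1, x2], [1, 1], 1.5))
# Only the input (1, 1) fires, matching the AND truth table.
```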

Page 14

Computational power of MP neurons

Universality: NOT & AND can be combined to perform any logical function; MP neurons, circuited together in a network, can solve any problem that a conventional computer could.

But let's examine the single neuron a little longer.

Q: Just how powerful is a single MP neuron?

A: It can solve any problem that can be expressed as a classification of points on a plane by a single straight line. For two inputs, the neuron fires exactly when w1 x1 + w2 x2 ≥ θ, so the line w1 x1 + w2 x2 = θ is its decision boundary.

Generalisation to many inputs: points in many dimensions are now classified, not by a line, but by a flat surface.

[Figure: the AND truth table drawn on the IN 1 / IN 2 grid; a single straight line separates the lone 1 at (1, 1) from the three 0s.]

Even one neuron can successfully handle simple classification problems.
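
For the curious, a brute-force sketch (illustrative; the coarse weight grid is an assumption that happens to suffice for binary inputs) counting how many of the 16 two-input truth tables a single MP neuron can realise:

```python
# Count the two-input truth tables realisable by one MP neuron,
# scanning a coarse grid of weights and thresholds.
from itertools import product

def mp_neuron(inputs, weights, theta):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= theta else 0

grid = [i / 2 for i in range(-4, 5)]        # -2.0, -1.5, ..., +2.0
patterns = list(product((0, 1), repeat=2))  # the 4 input pairs

realisable = 0
for table in product((0, 1), repeat=4):     # all 16 truth tables
    for w1, w2, theta in product(grid, repeat=3):
        if all(mp_neuron([x1, x2], [w1, w2], theta) == target
               for (x1, x2), target in zip(patterns, table)):
            realisable += 1
            break

print(realisable, "of 16 truth tables are linearly separable")  # 14
```

The two tables the search fails to find are exactly the ones that no straight line can separate; they return at the end of the lecture.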

Page 15

Classification in Action

A set of patients may have a medical problem. Blood samples are analysed for the quantities of two trace elements:

trace 1:    2.4   9.8   1.2   0.4   7.9   6.7   etc.
trace 2:    1.0   8.3   0.2   2.1   8.8   7.2   etc.
problem?:   yes   no    yes   yes   no    no    etc.

An MP neuron takes inputs x1 = trace 1, x2 = trace 2 and a constant bias input, with weights w1, w2, w3, and computes the sum ∑ xi wi.

With w1 = −1, w2 = −1, w3 = +10 & bias = +1, the sums are:

+6.6   −8.1   +8.6   +7.5   −6.7   −3.9   etc.

and the outputs are:

Yes    No     Yes    Yes    No     No     etc.

With correct weights, this MP neuron consistently classifies patients.

+ive output = problem
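
A sketch that reproduces the slide's numbers (the patient data and weights are taken from the slide; the strictly-positive cut-off matches "+ive output = problem"):

```python
# Classify the six patients from the slide with the given weights.
patients = [(2.4, 1.0), (9.8, 8.3), (1.2, 0.2),
            (0.4, 2.1), (7.9, 8.8), (6.7, 7.2)]
w1, w2, w3, bias = -1.0, -1.0, +10.0, +1.0

for x1, x2 in patients:
    s = w1 * x1 + w2 * x2 + w3 * bias   # the neuron's weighted sum
    print(f"sum = {s:+.1f} -> {'problem' if s > 0 else 'no problem'}")
# Prints +6.6, -8.1, +8.6, +7.5, -6.7, -3.9: Yes, No, Yes, Yes, No, No.
```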

Page 16

The missing step

The ability of the neuron to classify inputs correctly hinges on the appropriate assignment of the weights and threshold.

So far, we have done this by hand.

Imagine we had an automatic algorithm for the neuron to learn the right weights and threshold on its own.

In 1962, Rosenblatt, inspired by biological learning rules, devised just such an algorithm.

Frank Rosenblatt (1962). Principles of Neurodynamics, Spartan, New York

Page 17

Imagine a naive, randomly weighted neuron. One way to train a neuron to discriminate the sick from the healthy is by reinforcing good behaviour and penalising bad. This carrot & stick model is the basis for the learning rule (a code sketch follows after the list):

• Compile a training set of N (say 100) sick and healthy patients.
• Initialise the neuronal weights (random initialisation is the standard).
• Run each input set in turn through the neuron & note its output.
• Whenever a wrong output is encountered, alter the responsible weights by the Learning Rule:

wi ← wi + xi if output too low
wi ← wi − xi if output too high

• Repeatedly run through the training set until all outputs agree with the targets.
• When training is complete, test the neuron on a new testing set of patients.
• If the neuron succeeds, it can be used to classify patients whose health is unknown.
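
Here is the promised sketch of the whole procedure (the strictly-positive firing convention and the epoch cap are assumptions for illustration):

```python
import random

# Perceptron learning rule: reinforce when the output is too low,
# penalise when it is too high. A constant bias input is appended
# so the threshold is learned along with the weights.
def train(data, n_inputs, max_epochs=100):
    w = [random.uniform(-1, 1) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        errors = 0
        for x, target in data:
            x = x + [1.0]                              # bias input
            out = 1 if sum(xi * wi for xi, wi in zip(x, w)) > 0 else 0
            if out < target:                           # too low: wi += xi
                w = [wi + xi for wi, xi in zip(w, x)]
                errors += 1
            elif out > target:                         # too high: wi -= xi
                w = [wi - xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                                # every target matched
            return w
    return w
```

Training stops after a full error-free pass through the training set; as a later slide notes, such a pass is only guaranteed to arrive when a separating set of weights exists.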

Page 18

Supervised learning

The learning rule is an example of supervised learning.

Training MP neurons requires a training set, for which the ‘correct’ output is known.

These ‘correct’ or ‘desired’ outputs are used to calculate the error, which in turn is used to adjust the input-output relation of the neuron.

Without knowledge of the desired output, the neuron cannot be trained. Therefore, supervised learning is a powerful tool when training sets with desired outputs are available.

When can’t supervised learning be used?

Are biological neurons supervised?

Page 19

A simple example

Let's try to train a neuron to learn the logical OR operation. The neuron has inputs x1 and x2, a constant bias input x3 = 1, and weights w1, w2, w3; it fires (output 1) when ∑ xi wi > 0.

The desired input-output behaviour:

x1 | x2 | x3 | desired output (x1 OR x2)
 0 |  0 |  1 |  0
 0 |  1 |  1 |  1
 1 |  0 |  1 |  1
 1 |  1 |  1 |  1

wi ← wi + xi if output too low
wi ← wi − xi if output too high

[Worked trace on the slide: each input pattern is run through the neuron in turn; after every WRONG output the weights are adjusted by the rule, and the training set is repeated. The run converges to w1 = 1, w2 = 1, w3 = 0, which gets all four patterns right. A runnable version follows below.]
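
A runnable version of this example (zero initial weights are an assumption so that the run is reproducible; the slide starts elsewhere but ends at the same place):

```python
# Train an MP neuron on OR with the perceptron rule.
# A constant bias input x3 = 1 is appended; the neuron fires when sum > 0.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = [0.0, 0.0, 0.0]                  # w1, w2, w3 (bias weight)

converged = False
while not converged:
    converged = True
    for x, target in data:
        x = x + [1]                  # bias input
        out = 1 if sum(xi * wi for xi, wi in zip(x, w)) > 0 else 0
        if out < target:             # too low: wi <- wi + xi
            w = [wi + xi for wi, xi in zip(w, x)]
            converged = False
        elif out > target:           # too high: wi <- wi - xi
            w = [wi - xi for wi, xi in zip(w, x)]
            converged = False

print(w)  # [1.0, 1.0, 0.0], the final weights shown on the slide
```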

Page 20

The power of learning rules

The rule is guaranteed to converge on a set of appropriate weights, if a solution exists. While it might not be the most efficient of algorithms, this proven convergence is crucial.

What can be done to improve the convergence rate?

Some common variations on this learning rule:

• Adding a learning rate 0 < r < 1 which "damps" weight changes (∆wi = r xi or ∆wi = −r xi).

• Widrow & Hoff recognised that weight changes should be large when the actual output a and the target output t are very different, but smaller otherwise. They introduced an error term ∆ = t − a, such that ∆wi = r ∆ xi.
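
The Widrow & Hoff update in code (a sketch; the names are illustrative):

```python
# Widrow-Hoff (delta) rule: the weight change is proportional to the
# error delta = t - a, so large errors move the weights more.
def delta_update(w, x, target, actual, r=0.1):
    delta = target - actual
    return [wi + r * delta * xi for wi, xi in zip(w, x)]

# A wrong output (error = 1) nudges both active weights up by r:
print(delta_update([0.5, 0.5], [1, 1], target=1, actual=0))  # [0.6, 0.6]
```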

Page 21

The Fall of the Artificial Neuron

Marvin Minsky & Seymour Papert (1969). Perceptrons, MIT Press, Cambridge, MA.

• Before long researchers had begun to discover the neuron's limitations.
• Unless input categories were "linearly separable", a perceptron could not learn to discriminate between them.
• Unfortunately, it appeared that many important categories were not linearly separable. This proved a fatal blow to the artificial neural networks community.

[Figure: successful and unsuccessful footballers and academics plotted against hours in the gym per week (few vs. many); no single straight line separates the footballers from the academics.]

In this example, an MP neuron would not be able to discriminate between the footballers and the academics…

This failure caused the majority of researchers to walk away.

Exercise: Which logic operation is described in this example?

Page 22

Connectionism Reborn

The crisis in artificial neural networks can be understood, not as an inability to connect many neurons in a network, but as an inability to generalise the training algorithms to arbitrary architectures. By arranging the neurons in an 'appropriate' architecture, a suitable training algorithm could be invented. The solution, once found, quickly emerged as the most popular learning algorithm for nnets.

Back-propagation was first discovered in 1974 (Werbos, PhD thesis, Harvard), but the discovery went unnoticed. In the mid-80s, it was rediscovered independently by three groups within about one year.

Most influential of these was a two-volume book by Rumelhart & McClelland, who suggested a feed-forward architecture of neurons: layers of neurons, with each layer feeding its calculations on to the next.

David E. Rumelhart & James L. McClelland (1986). Parallel Distributed Processing, Vols. 1 & 2, MIT Press, Cambridge, MA.

Page 23

This time…
• The appeal of neural computing
• From biological to artificial neurons
• Nervous systems as logic circuits
• Classification with the McCulloch & Pitts neuron
• Developments in the 60s:
  – The Delta learning rule & variations
  – Simple applications
  – The fatal flaw of linearity

Next time…

The disappointment with the single neuron dissipated almost as quickly as it had dawned on the AI community. Next time, we will see why the single neuron's simplicity does not rule out immense richness at the network level. We will examine the simplest architecture of feed-forward neural networks and generalise the delta learning rule to these multi-layer networks. We will also re-discover some impressive applications.

Page 24

Optional reading

Excellent treatments of the perceptron, the delta rule & Hebbian learning, the multi-layer perceptron and the back-propagation learning algorithm can be found in:

Beale & Jackson (1990). Neural Computing, chaps. 3 & 4.

Hinton (1992). How neural networks learn from experience, Scientific American, 267 (Sep):104-109.