Deep Learning in a Brain (Skolkovo, 2016)

Transcript of "Deep Learning in a Brain: What makes our mind deep?"

[email protected]

My goal today is to give a very coarse-grained and preliminary answer to the question: what makes our mind deep? And how can we use this knowledge, both to better understand ourselves and to make machines a little more intelligent?

Brain is a survival machine

Basic tasks: What happens? What should I do?

To start with, we should understand in outline that the brain is, above all, a survival machine. As such, it solves two basic tasks: what is happening in the world right now, and how should I react? Evolution only made the solutions to those tasks deeper and deeper: more abstract features, more strategic behavior.

In mammals this was accomplished mainly via the expansion of the neocortex. The area of the neocortex expanded, but its structure remained the same, and this gives us the first clue to understanding our brain. The second clue is that under the hood of the rapidly expanding neocortex there is the ancient reptilian brain, which remained basically the same throughout the evolution of mammals.

What makes it deep? New (mammalian) brain: deep unsupervised learning, similar to Deep Belief Nets

Old (reptilian) brain: deep reinforcement learning, similar to Long Short-Term Memory

So, basically, our goal is to understand how these two components perform their tasks and how they interact with each other. As we will see, the depth of our mind can be explained by: deep unsupervised learning in our large neocortex (which looks very much like learning in a stack of Restricted Boltzmann Machines), and deep reinforcement learning, taking place mainly in our reptilian brain (which in turn resembles the famous LSTM model).

Neocortex: homogeneous 2D tissue. Mammals: 2-3 mm thick. Humans: 75% of brain volume

Single basic algorithm: pattern recognition, unsupervised learning

Let's start with the neocortex, which is a two-dimensional sheet of neurons, 2-3 mm thick, with the same structure and function in all mammals. In humans it represents 75% of the brain volume and is convoluted, since its surface expanded very rapidly over the last several million years. As we will see shortly, the larger the surface of the neocortex, the deeper the corresponding algorithms.

What makes the neocortex simple (in a sense) is that it has one basic plan, repeated over and over again. Roughly speaking, it contains millions of simple pattern-recognition units, powered by unsupervised Hebbian learning.

Cortical (hyper)columns, 1-3 mm. Boucsein, Beyond the cortical column (2011)

0.3 mm. V. Mountcastle, The columnar organization of the neocortex (1997)

Such a basic pattern-recognition unit is called a column and is about 0.3 mm in diameter. This is approximately the size of the dendritic tree of pyramidal neurons, which make up the majority of cortical neurons (4 out of every 5). Such a column contains about 10,000 neurons and is activated by some unique pattern of its inputs, which it learns during its lifetime.

Another, larger scale of about 1-3 mm is set by the reach of pyramidal axons within the cortical sheet. Many of them terminate on inhibitory neurons, so that a column, once activated, inhibits all nearby columns, preventing them from activating.


Cortex: Self-Organizing Maps

Hebb's learning rule: "Fire together, wire together"

Recognition

T. Kohonen (1982)

Learning in a two-dimensional grid of such elements with lateral inhibition results in the formation of so-called topological feature maps, as was shown by T. Kohonen in 1982. That is: if such a layer is exposed to a large number of input signals, each of its elements eventually becomes a detector of one signal or another, and similar signals come to be recognized by neighboring elements.

During recognition only one element fires, indicating the corresponding input pattern and inhibiting all the rival detectors in its neighborhood. Each such recognition act further strengthens the connections between the excited neurons, implementing the Hebbian rule: fire together, wire together.
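This self-organizing process can be sketched in a few lines of Python. The code below is a toy illustration, not a biological model (grid size, learning rate, and decay schedules are all invented for the demo): a small grid of "columns" competes for each input, lateral inhibition keeps only the best-matching winner and its neighborhood, and the winners' weights are pulled toward the input in Hebbian fashion.

```python
import math
import random

def train_som(data, grid=5, dim=2, epochs=50, lr=0.5, sigma=2.0, seed=0):
    """Toy Kohonen self-organizing map on a grid x grid sheet of units."""
    rng = random.Random(seed)
    # each 'column' starts with random weights (its preferred pattern)
    w = {(i, j): [rng.random() for _ in range(dim)]
         for i in range(grid) for j in range(grid)}
    for t in range(epochs):
        s = max(0.5, sigma * 0.9 ** t)   # shrinking neighborhood radius
        a = lr * 0.98 ** t               # decaying learning rate
        for x in data:
            # winner-take-all: the best-matching unit fires and
            # suppresses rivals outside its neighborhood
            win = min(w, key=lambda c: sum((p - q) ** 2
                                           for p, q in zip(w[c], x)))
            for c, wc in w.items():
                d2 = (c[0] - win[0]) ** 2 + (c[1] - win[1]) ** 2
                h = math.exp(-d2 / (2 * s * s))      # neighborhood function
                for k in range(dim):
                    wc[k] += a * h * (x[k] - wc[k])  # Hebbian pull toward input
    return w

def winner(w, x):
    """Which unit recognizes pattern x after training."""
    return min(w, key=lambda c: sum((p - q) ** 2 for p, q in zip(w[c], x)))
```

After training on two clusters of inputs, distinct units become detectors of the two clusters, and similar inputs land on neighboring units, just as on the cortical sheet.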

V1 map

1 mm

As a result, the whole neocortical surface of the brain comprises several million specialized pattern-recognition elements, arranged in tens of thousands of self-organizing maps, 1-3 mm in diameter. Each such map works with its own input and is, in turn, part of the input of another map.

For example, these are the self-organizing maps of the primary visual cortex. Here each color corresponds to the orientation of lines in a certain place on the retina. Green elements fire when they see vertical lines, the red ones are activated by horizontal lines, and so on.

These primary detectors form the input pattern for the next part of the cortex, and so on.

Cortical hierarchy

A.R. Luria, Higher Cortical Functions in Man (1962)

This is how the hierarchy of deep cortical neural networks is formed bottom-up, starting from the primary sensory areas.

Cortical hierarchy

A.R. Luria, Higher Cortical Functions in Man (1962)

RBM

All cortical interconnections are typically bidirectional. So, if the next layer detects some pattern in a lower one, it also activates this pattern, helping to recognize it in a noisy environment. This reciprocal excitation between layers very much resembles the so-called Restricted Boltzmann Machine.

Deep Neural Networks

G. Hinton, Deep belief networks (2009)

Indeed, this was exactly the learning algorithm of the early deep neural networks, the Deep Belief Nets proposed by Geoffrey Hinton. Each successive layer of RBMs learns more and more abstract features.

This is much like how human children learn more and more abstract notions during their education.
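This reciprocal, bidirectional learning can be sketched with the standard one-step contrastive-divergence (CD-1) recipe; the layer sizes and learning rate below are invented for the demo. Visible units excite hidden features, the hidden features reconstruct the visible layer top-down, and the weights move so that reconstructions match reality. Stacking such machines, each trained on the hidden activity of the previous one, gives a Deep Belief Net.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class RBM:
    """Toy Restricted Boltzmann Machine trained with CD-1."""

    def __init__(self, n_vis, n_hid, seed=0):
        self.rng = random.Random(seed)
        self.W = [[self.rng.uniform(-0.1, 0.1) for _ in range(n_hid)]
                  for _ in range(n_vis)]
        self.b = [0.0] * n_vis  # visible biases
        self.c = [0.0] * n_hid  # hidden biases

    def up(self, v):
        """Bottom-up pass: probability each hidden feature turns on."""
        return [sigmoid(self.c[j] + sum(v[i] * self.W[i][j]
                                        for i in range(len(v))))
                for j in range(len(self.c))]

    def down(self, h):
        """Top-down pass: hidden features reconstruct the visible layer."""
        return [sigmoid(self.b[i] + sum(h[j] * self.W[i][j]
                                        for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1(self, v0, lr=0.1):
        h0 = self.up(v0)
        hs = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.down(hs)   # 'fantasy' reconstruction
        h1 = self.up(v1)
        # strengthen what the data drives, weaken what the fantasy drives
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])
```

After training on two complementary patterns, the top-down reconstruction pulls an input toward the learned pattern, the same way cortical feedback helps recognition in noise.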

Features

Organization of behavior: Sensors, Strategies, Plans, Acts, Muscles

Scenes

Goal-directed learning?


In the brain, the bottom-up sensory hierarchy is supplemented by a top-down executive hierarchy in the frontal lobe. The top, strategic layer learns strategies, that is, sequences of actions suitable for the current situation. Each of these actions is eventually decomposed into orders for individual muscles.

When the appropriate behavior is already learned, it takes a fraction of a second to react (recall, e.g., a cowboy drawing and shooting).

The question is: how is this goal-directed behavior learned? It can't be unsupervised learning. Someone must somehow assess the various candidate strategies, plans, and actions.

Reinforcement learning. Strategies

Scenes, Plans

This kind of learning is known as reinforcement learning. Our next goal is to find out how it is implemented in our brain.

Suppose there is some assessor deep in the brain (under the hood of the neocortex). It monitors all the consequences of a given strategy or action and assesses it according to the accumulated reward.

Strategies

Scenes, Plans

Reinforcement learning

Gradually it finds out the utility of each strategy and learns to choose the appropriate one.
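The assessor's bookkeeping can be sketched as a simple bandit-style value learner. The strategy names and reward values below are invented for illustration: each strategy's utility estimate drifts toward the reward it actually brings, and the choice gradually shifts toward the best one.

```python
import random

def learn_utilities(strategies, reward, trials=500, lr=0.2, eps=0.3, seed=0):
    """Toy assessor: keep a running utility Q[s] for each strategy,
    usually pick the best-looking one, sometimes explore."""
    rng = random.Random(seed)
    Q = {s: 0.0 for s in strategies}
    for _ in range(trials):
        if rng.random() < eps:
            s = rng.choice(strategies)   # explore a random strategy
        else:
            s = max(Q, key=Q.get)        # exploit the current best estimate
        r = reward(s, rng)               # accumulated reward of the episode
        Q[s] += lr * (r - Q[s])          # drift the estimate toward the outcome
    return Q

# Hypothetical strategies with noisy payoffs:
def reward(s, rng):
    mean = {"hunt": 1.0, "hide": 0.3, "wander": 0.1}[s]
    return rng.gauss(mean, 0.1)
```

Run on these noisy payoffs, the utilities sort themselves out and the assessor ends up preferring the most rewarding strategy.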

Strategies

Scenes, Plans

Reinforcement learning


Strategies

Scenes, Plans. Who chooses strategies, and how?

Reinforcement learning

The question is: who chooses those strategies, plans, and actions, and how?

Subcortical structures

Basal Ganglia, Thalamus

Neocortex

The short answer is: this assessor is our ancient reptilian brain, that is, the subcortical structures. We will now consider the interaction between the neocortex and two major subcortical structures: the basal ganglia and the thalamus.

It turns out that these recurrent interactions organize our long-term thinking.

Let's start with the thalamo-cortical system.

Thalamus: Attention. Cortex

Thalamus. ~40 Hz: positive feedback, gamma rhythm, self-excitation

There are very intimate relations between the cortex and the thalamus. Each element of the cortex receives some inputs from the thalamus and sends back its feedback output. Such a positive feedback loop results in self-excitation and self-sustained oscillations, giving rise to the well-known gamma rhythm.

This is the mechanism underlying attention: thanks to it we are able to concentrate on some idea not for a fraction of a second, but for seconds, maybe minutes, or even longer.

Thalamus: Attention, Binding. Cortex

Thalamus. ~40 Hz, ~40 Hz: synchronization. LaBerge, Triangular Circuit Theory of Attention (1998)

Moreover, the thalamus may synchronize oscillations in diverse parts of the cortex, thus providing the so-called binding of different activations into one coherent percept.

Thalamus: Attention, Binding

~40 Hz, ~40 Hz

Thalamus. Color, Shape, Word, Notion


For instance, when we see this red triangle, the image is analyzed in different parts of the cortex. One part indicates its color, another one its shape. In the language department the word "triangle" is activated, and in the mathematical department the corresponding mathematical notion.

But the same is valid for the gray rectangle in this picture.

Then how do I know that the triangle is red while the rectangle is gray, and not vice versa? The answer is that the corresponding detectors of "triangle" and "red" are synchronized, i.e., they oscillate in the same phase, and this synchrony fuses them into a single percept.
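As a toy illustration of binding by synchrony (the phase values and detector names are invented, and phase wrap-around is ignored), feature detectors can simply be grouped by the phase of their oscillation:

```python
def bind_by_phase(detectors, tol=0.2):
    """Toy binding-by-synchrony: detectors oscillating in (nearly) the
    same phase are fused into one percept. Phases are in radians."""
    percepts = []
    for name, phase in detectors:
        for p in percepts:
            if abs(p["phase"] - phase) < tol:
                p["members"].append(name)  # in phase: bind to this percept
                break
        else:
            percepts.append({"phase": phase, "members": [name]})
    return [p["members"] for p in percepts]
```

Here "triangle" and "red" fire in one phase and "rectangle" and "gray" in another, so the two objects come out correctly bound rather than as a red rectangle and a gray triangle.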

Cortical-subcortical interaction

Basal Ganglia, Thalamus

Neocortex

Now, what is the role of the basal ganglia, which lie over the thalamus very much like a brake pad?

Basal ganglia: global inhibition

Striatum: value function. Pallidus: action control. Substantia nigra: reinforcement

The basal ganglia are indeed a sort of brake, since they are the largest inhibitory part of the brain: they consist mainly of inhibitory neurons.

The outer shell, called the striatum, implements the value function, while the inner one, the pallidus, implements action control, applying the brakes to the thalamus.

Finally, the substantia nigra, closer to the brain stem, provides the reinforcement signal for learning.

Basal ganglia: action control. Cortex

Thalamus

Striatum, Pallidus

Substantia nigra. Basal ganglia: brake. No activity without Striatum approval

This is how it works. By default the pallidus is active and inhibits the thalamus. Because of these brakes, no activity in the cortex can be sustained without the permission of the striatum.

Cortex

Striatum, Pallidus

Activity approved by Striatum. Basal ganglia

Substantia nigra

Basal ganglia: action control. Thalamus

LSTM. Hochreiter & Schmidhuber (1997)

Thus, each activity pattern in the cortex effectively sends a request to the striatum. If and only if the striatum approves it, the brakes are released, and such activity may survive, i.e., enter our thoughts.

This is very much like the so-called Long Short-Term Memory: prolonged activity of neurons in recurrent neural networks, proposed by Hochreiter & Schmidhuber back in 1997.

The memory cell here represents a cortical column, and the gates are implemented by the basal ganglia and thalamus. Each gate has its own weights, implemented in the striatum in the course of reinforcement learning.
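For reference, one step of a single LSTM memory cell can be sketched as follows (the weight layout is generic, not taken from the talk). In the analogy above, the cell state is the sustained activity of a cortical column, the input gate plays the role of the striatal "approval", and the forget gate is the brake that keeps or releases the stored activity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, W):
    """One step of a single LSTM cell (Hochreiter & Schmidhuber, 1997).
    W maps each gate name to a (w_x, w_h, bias) triple."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h + W["i"][2])    # input gate: admit new activity?
    f = sigmoid(W["f"][0] * x + W["f"][1] * h + W["f"][2])    # forget gate: keep the held state?
    o = sigmoid(W["o"][0] * x + W["o"][1] * h + W["o"][2])    # output gate: express the state?
    g = math.tanh(W["g"][0] * x + W["g"][1] * h + W["g"][2])  # candidate activity
    c = f * c + i * g        # gated update of the memory cell
    h = o * math.tanh(c)     # gated output
    return h, c
```

With the input gate shut and the forget gate open, the cell holds its activity across many steps, the same trick that lets a thought persist for seconds rather than milliseconds.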

Cortex

Thalamus. Basal ganglia: Striatum, Pallidus

Weights strengthened by the dopamine signal

Substantia nigra

Basal ganglia: reinforcement learning

Reinforcement signals, strengthening the weights of striatal neurons, are provided by the dopaminergic neurons of the substantia nigra. These neurons project mainly to the striatum and inject the hormone dopamine if the reward is high enough.

Dopamine: reinforcement signal ("hormone of joy")

From Substantia nigra. From cortex

D+ / D-

Dopamine is known as the "hormone of joy", since we feel joy when our striatum receives a dopamine shot. Thus the striatum reinforces the proper actions of the cortex, based on these dopamine signals.

There are two types of dopamine receptors in the striatum, D+ and D-, which strengthen or weaken synaptic connections, respectively. Thus there are two neuron populations, implementing two value functions: Q+ and Q-.

Cortex

Basal ganglia

Value of action a in state s

state s, action a

Hypothalamus, Amygdala

Reward prediction error. Basal ganglia: reinforcement learning

Reward prediction

Both populations of neurons project to the substantia nigra in different ways, so that Q+ neurons inhibit it and Q- neurons activate it. The difference between these signals may be considered a prediction of the reward. The actual reward comes from deeper and more ancient parts of the brain, the hypothalamus and the amygdala, where all our instincts are imprinted.

If the actual reward is greater than expected, a dopamine shot goes to the striatum as a reinforcement signal. That is why we always want more than we already have!

Cortex

Basal ganglia

Basal ganglia: reinforcement learning. Accumulated joy, Anticipated joy

state s, action a

Dopamine strengthens the weights of Q+ neurons and weakens the weights of Q- ones. The net result is that the Q+ population represents the accumulated joy associated with a given behavior in a given context, while the Q- population represents the anticipated future joy.

These are, in a sense, the upper and lower estimates of the utility of the given action.
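The arithmetic of this two-population scheme can be sketched in a few lines (the variable names are invented, and the real circuitry is far messier): the prediction is Q+ minus Q-, the dopamine signal is the reward prediction error, and it pushes the two estimates in opposite directions.

```python
def dopamine_step(q_plus, q_minus, reward, lr=0.1):
    """Toy reward-prediction-error update for the two striatal populations."""
    prediction = q_plus - q_minus   # Q+ inhibits Substantia nigra, Q- excites it
    dopamine = reward - prediction  # shot is large only when reward beats expectation
    q_plus += lr * dopamine         # D+ receptors: dopamine strengthens Q+
    q_minus -= lr * dopamine        # D- receptors: dopamine weakens Q-
    return q_plus, q_minus, dopamine
```

Repeating the same reward lets the prediction catch up, so the dopamine response to a fully expected reward fades to zero; only surprises keep paying, which is exactly why we always want more than we already have.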

Mammals: dopamine addicts. Curious: we like surprises. Explorative behavior. Motivation for learning

Forward thinking. Anticipated, not immediate reward. Value-based, not reactive behavior

Summing up, all mammals are kind of dopamine addicts.

This kind of reinforcement learning results in the following:

Mammals are curious: we like surprises, i.e., unexpected rewards. That is why we are motivated toward exploratory behavior and constant learning; it is imprinted in our brains. We are forward-thinking animals: we are motivated not by the immediate but by the integrated reward. In other words, our behavior is governed not by sheer instincts but by intrinsic values, which we learn in childhood and throughout our lives.


Deep control architecture

Motor Execution Motivation

And last but not least.

There are multiple loops in the brain, resulting in what I call a deep control architecture. Behavior is first chosen based on the current high-level motivation in the so-called orbitofrontal cortex; then its execution is organized in the prefrontal cortex; and finally it is executed in the premotor and motor regions.

Deep control architecture

Deep Belief Networks

Recurrent Reinforcement networks

Motor Execution Motivation

This slide illustrates the deep control architecture. Each module consists of green sensory and action parts of the cortex, which govern a certain level of behavior. The red slices are the gating elements of the basal ganglia and thalamus, which admit only the actions appropriate in the current context. Finally, the blue circles comprise the dopamine subsystem, providing reinforcement for learning.

Note that we already have good models for the sensory hierarchy: the well-known deep neural networks for pattern recognition. We also have an LSTM-like model for the interaction of the neocortex and basal ganglia. So we already understand a lot about how the brain works.

Takeaway: The Brain is not THAT complex

We can HACK IT

This leads us to the following overall conclusion.