Using Backprop to Understand Aspects of Cognitive Development


PDP Class, Feb 8, 2010

Back propagation algorithm

• Propagate activation forward

• Propagate “error” backward

• Calculate ‘weight error derivative’ terms = δ_r a_s

• Change weights after
– Each pattern
– A batch of patterns

[Figure: three-layer network with input units k, hidden units j, and output units i]

At the output level: δ_i = (t_i − a_i) f′(net_i)

At other levels: δ_j = f′(net_j) Σ_i δ_i w_ij, etc.

a_i = f(net_i) = 1 / (1 + e^(−net_i))
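The three steps above can be sketched in a few lines of Python. This is a minimal illustration with made-up layer sizes, random weights, and an invented training pattern, not the PDP lab software's implementation:

```python
import numpy as np

def f(net):
    """Logistic activation: a = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def f_prime(a):
    """Logistic derivative expressed via the activation: f'(net) = a(1 - a)."""
    return a * (1.0 - a)

rng = np.random.default_rng(0)
W_jk = rng.normal(scale=0.5, size=(3, 4))   # hidden units j <- input units k
W_ij = rng.normal(scale=0.5, size=(2, 3))   # output units i <- hidden units j

a_k = np.array([1.0, 0.0, 1.0, 0.0])        # an input pattern (made up)
t_i = np.array([1.0, 0.0])                  # its target (made up)

# 1. Propagate activation forward
a_j = f(W_jk @ a_k)
a_i = f(W_ij @ a_j)

# 2. Propagate "error" backward
delta_i = (t_i - a_i) * f_prime(a_i)         # output level: (t_i - a_i) f'(net_i)
delta_j = f_prime(a_j) * (W_ij.T @ delta_i)  # other levels: f'(net_j) sum_i delta_i w_ij

# 3. Weight error derivatives delta_r * a_s, applied after this pattern
eps = 0.5
W_ij += eps * np.outer(delta_i, a_j)
W_jk += eps * np.outer(delta_j, a_k)
```

Here the weights are changed after each pattern; accumulating the `np.outer(...)` terms over patterns before applying them gives the batch variant.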

Variants/Embellishments to back propagation

• We can include weight decay and momentum: Δw_rs = ε Σ_p δ_rp a_sp − ω w_rs + α Δw_rs(prev), where ω scales the weight decay and α the momentum.

• An alternative error measure has both conceptual and practical advantages:

CE_p = −Σ_i [t_ip log(a_ip) + (1 − t_ip) log(1 − a_ip)]

• If targets are actually probabilistic, minimizing CEp causes activations to match the probability of the observed target values.

• This also eliminates the ‘pinned output unit’ problem.
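Why cross-entropy eliminates the pinned-output problem can be seen directly: with a logistic output unit, the squared-error delta carries an a(1 − a) factor that vanishes when the activation saturates, whereas the cross-entropy delta reduces to t − a. A small numeric sketch (the net-input value is made up):

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

def sse_delta(t, a):
    """Squared-error delta: (t - a) * f'(net), with f'(net) = a(1 - a)."""
    return (t - a) * a * (1.0 - a)

def ce_delta(t, a):
    """Cross-entropy delta: the a(1 - a) factor cancels, leaving t - a."""
    return t - a

# A 'pinned' output unit: large negative net input, but the target is 1.
a = logistic(-8.0)          # activation very close to 0
print(sse_delta(1.0, a))    # vanishingly small -> learning stalls
print(ce_delta(1.0, a))     # close to 1 -> strong corrective signal
```

The pinned unit gets essentially no squared-error signal but a near-maximal cross-entropy signal, so it can recover.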

Is backprop biologically plausible?

• Neurons do not send error signals backward across their weights through a chain of neurons, as far as anyone can tell.

• But we shouldn’t be too literal minded about the actual biological implementation of the learning rule.

• Some neurons appear to use error signals, and there are ways to use differences between activation signals to carry error information. (We will explore this in a later lecture.)

Why is back propagation important?

• Provides a procedure that allows networks to learn weights that can solve any deterministic input-output problem.

– Contrary to expectation, it does not get stuck in local minima except in cases where the network is exceptionally tightly constrained.

– Allows networks with multiple hidden layers to be trained, although learning tends to proceed slowly (later we will learn about procedures that can fix this).

• Allows networks to learn how to represent information as well as how to use it.

• Raises questions about the nature of representations and of what must be specified in order to learn them.

The Time-Course of Cognitive Development

• Networks trained with back-propagation address several issues in development, including:
– Whether innate knowledge is necessary as a starting point for learning
– Aspects of the time course of development
– What causes changes in the pattern of responses children make at different times during development
– What allows a learner to reach the point of being ready to learn something s/he previously was not ready to learn

Two Example Models

• Rumelhart’s semantic learning model
– Addresses most of the issues above
– Available as the “semnet” script in the bp directory

• Model of child development in a ‘naïve physics’ task (Piaget’s balance scale task)
– Addresses stage transitions and readiness to learn new things
– We will not get to this; see readings if interested

Quillian’s (1969) Hierarchical Propositional Model

The Rumelhart (1990) Model

The Training Data:

All propositions true of items at the bottom level of the tree, e.g.:

Robin can {fly, move, grow}

The Rumelhart Model: Target output for ‘robin can’ input

The Rumelhart Model

[Figure: the model’s internal representations at three points in training — early, later, and later still — as a function of experience]

Inference and Generalization in the PDP Model

• A semantic representation for a new item can be derived by error propagation from given information, using knowledge already stored in the weights.

Start with a neutral representation on the representation units. Use backprop to adjust the representation to minimize the error.

The result is a representation similar to that of the average bird…

Use the representation to infer what this new thing can do.
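The procedure just described — hold the learned weights fixed and use back-propagation to adjust the representation itself — can be sketched as follows. The weight matrix here is random, standing in for learned knowledge, and the "given information" is generated from an invented true representation; this is an illustration of the mechanism, not the semnet model:

```python
import numpy as np

def f(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))          # stands in for learned weights; held fixed

# Properties given for the new item (generated from a hidden 'true'
# representation so that a consistent answer exists).
target = f(W @ np.array([1.0, 0.5, -0.5]))

rep = np.zeros(3)                    # start with a neutral representation
for _ in range(2000):
    a = f(W @ rep)
    delta = (target - a) * a * (1.0 - a)   # output-level error term
    rep += 0.5 * (W.T @ delta)             # adjust the representation, not the weights

# 'rep' now reproduces the given properties; feeding it forward through
# the frozen weights fills in the item's remaining attributes.
```

Because the error is propagated to the representation units rather than the weights, the network settles on whatever stored representation best accounts for the information it was given.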

Some Phenomena in Conceptual Development

• Progressive differentiation of concepts
• Illusory correlations and U-shaped developmental trajectories
• Domain- and property-specific constraints on generalization
• Reorganization of conceptual knowledge

• Waves of differentiation reflect sensitivity to patterns of coherent covariation of properties across items.

• Patterns of coherent covariation are reflected in the principal components of the property covariance matrix.

• Figure shows attribute loadings on the first three principal components:

– 1. Plants vs. animals– 2. Birds vs. fish– 3. Trees vs. flowers

• Same color = features covary in a component
• Different color = anti-covarying features
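The link between coherent covariation and principal components can be checked on a toy item-by-property matrix. The items and properties below are invented to echo the lecture's hierarchy; this is not the actual training corpus:

```python
import numpy as np

# Rows: items (2 trees, 2 flowers, 2 birds, 2 fish); columns: properties.
#            roots bark petals wings scales fly swim grow
items = np.array([
    [1, 1, 0, 0, 0, 0, 0, 1],   # pine
    [1, 1, 0, 0, 0, 0, 0, 1],   # oak
    [1, 0, 1, 0, 0, 0, 0, 1],   # rose
    [1, 0, 1, 0, 0, 0, 0, 1],   # daisy
    [0, 0, 0, 1, 0, 1, 0, 1],   # robin
    [0, 0, 0, 1, 0, 1, 0, 1],   # canary
    [0, 0, 0, 0, 1, 0, 1, 1],   # salmon
    [0, 0, 0, 0, 1, 0, 1, 1],   # sunfish
], dtype=float)

# Property covariance matrix and its principal components
X = items - items.mean(axis=0)
cov = X.T @ X / len(items)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues ascending
pc1 = eigvecs[:, -1]                     # first principal component

# Plant properties (roots, bark, petals) load with one sign on PC1;
# animal properties (wings, scales, fly, swim) load with the opposite sign.
plant_load = pc1[[0, 1, 2]]
animal_load = pc1[[3, 4, 5, 6]]
```

With this toy matrix the second component contrasts the bird properties (wings, fly) with the fish properties (scales, swim), and the third contrasts trees with flowers, mirroring the waves of differentiation described above. The universally shared property (grow) has no variance and loads on nothing.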

What Drives Progressive Differentiation?

Coherent Covariation

• The tendency of properties of objects to co-occur in clusters.

• e.g.
– Has wings
– Can fly
– Is light

• Or
– Has roots
– Has rigid cell walls
– Can grow tall

Coherence Training Patterns

No labels are provided. Each item and each property occurs with equal frequency.

[Figure: training-pattern matrix — 16 items × is/can/has properties, split into coherent and incoherent property sets]
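The structure of such a training set can be mocked up directly: coherent properties come in bundles tied to an item's cluster, while incoherent properties are assigned to items independently, so nothing covaries with anything else. This is a toy sketch; the cluster sizes, bundle sizes, and probabilities are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_clusters = 16, 4

# Coherent properties: each item belongs to one cluster, and cluster
# membership determines a whole bundle of three properties at once.
cluster = np.repeat(np.arange(n_clusters), n_items // n_clusters)
coherent = np.zeros((n_items, n_clusters * 3))
for c in range(n_clusters):
    coherent[cluster == c, c * 3:(c + 1) * 3] = 1   # 3 perfectly covarying props

# Incoherent properties: scattered across items at random (note this toy
# version does not enforce the exactly-equal frequencies of the lecture's set).
incoherent = (rng.random((n_items, 12)) < 0.25).astype(float)
```

Training a network on `coherent` versus `incoherent` columns is what produces the contrast in learning speed shown in the following slides.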

Effects of Coherence on Learning

[Figure: learning curves for coherent vs. incoherent properties]

Effect of Coherence on Representation

Effects of Coherent Variation on Learning in Connectionist Models

• Attributes that vary together create the acquired concepts that populate the taxonomic hierarchy, and determine which properties are central and which are incidental to a given concept.
– Labeling of these concepts or their properties is in no way necessary, but it may contribute additional ‘covarying’ information, and can affect the pattern of differentiation.

• Arbitrary properties (those that do not co-vary with others) are very difficult to learn.
– And it is harder to learn names for concepts that are only differentiated by such arbitrary properties.

Sensitivity to Coherence Requires Convergence


Illusory Correlations

• Rochel Gelman found that children think that all animals have feet.
– Even animals that look like small furry balls and don’t seem to have any feet at all.

[Figure: illusory-correlation predictions — a typical property that a particular object lacks (e.g., pine has leaves) vs. an infrequent, atypical property]

Domain Specificity

• What constraints are required for development and elaboration of domain-specific knowledge?
– Are domain-specific constraints required?
– Or are there general principles that allow for acquisition of conceptual knowledge of all different types?

Differential Importance (Macario, 1991)

• 3-4 yr old children see a puppet and are told he likes to eat, or play with, a certain object (e.g., top object at right).
– Children then must choose another one that will “be the same kind of thing to eat” or that will be “the same kind of thing to play with”.
– In the first case they tend to choose the object with the same color.
– In the second case they tend to choose the object with the same shape.
– Can the knowledge that one kind of property is important for one type of thing while another is important for a different type of thing be learned?
– It can in the PDP model, since the model is sensitive to domain-specific patterns of coherent covariation.

Adjustments to the Training Environment

• Among the plants:
– All trees are large
– All flowers are small
– Either can be bright or dull

• Among the animals:
– All birds are bright
– All fish are dull
– Either can be small or large

• In other words:
– Size covaries with properties that differentiate different types of plants
– Brightness covaries with properties that differentiate different types of animals

Testing Feature Importance

• After partial learning, the model is shown eight test objects:
– Four “Animals”:
• All have skin
• One is large and bright; one small and bright; one large and dull; one small and dull
– Four “Plants”:
• All have roots
• Same 4 combinations as above

• Representations are generated by using back-propagation to the representation units.

• Representations are then compared to see which ‘animals’ are treated as most similar, and which ‘plants’ are treated as most similar.

The Rumelhart Model

Similarities of Obtained Representations

Size is relevant for Plants

Brightness is relevant for Animals

Additional Properties of the Model

• The model is sensitive to the amount and type of exposure, addressing frequency effects and expertise effects, and capturing different types of expertise.

• The model’s pattern of generalization varies as a function of the type of property as well as the domain.

• The model can reorganize its knowledge:
– It will first learn about superficial appearance properties if these are generally available; later, it can re-organize its knowledge based on coherent covariation among properties that occur only in specific contexts.