
Time and Causality: A theory of learning

What is associative learning for?

How does Rescorla Wagner do? How does it fail?

Wagner’s time-based theory of learning

Applications


Learning about causality: tone --> food


Learning about features of stimuli: what goes with what

[Figure: feature words such as juicy, nice and warm linked to fruit and pastry]

If you want to design a model to learn about causality, what should it be like?


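Directionality: the model should distinguish cause --> effect from effect --> cause

Sensitivity to the delay between cause and effect

Sensitivity to correlation

Learning about predictable outcomes: an outcome that is already predicted should support little new learning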

What possible rules are there for forming associations?

Would a pure contiguity model have the properties we want? (e.g. Hebb)

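$\Delta V = \alpha\beta\lambda$ (α: salience of the CS; β: learning-rate parameter for the US; λ: maximum associative strength the US can support)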

Direction X

Delay yes

Correlation X

Predictable outcomes X
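The contiguity rule can be sketched in a few lines of Python. This is a minimal illustration; the values α = 0.3, β = 1 and λ = 1 are assumptions, not taken from the lecture. Because the increment ignores how well the US is already predicted, V grows by the same amount on every pairing and never levels off:

```python
# Pure contiguity (Hebbian) rule: Delta V = alpha * beta * lambda on every pairing.
# Parameter values are illustrative assumptions, not taken from the lecture.

def hebb_trials(n_trials, alpha=0.3, beta=1.0, lam=1.0):
    """Return associative strength V after each of n_trials CS-US pairings."""
    v = 0.0
    history = []
    for _ in range(n_trials):
        v += alpha * beta * lam  # increment ignores how well the US is already predicted
        history.append(v)
    return history

strengths = hebb_trials(10)
print([round(v, 2) for v in strengths])  # rises linearly, with no asymptote
```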


Rescorla Wagner avoids some of these problems:

$\Delta V = \alpha\beta(\lambda - V)$

The bracketed term, $(\lambda - V)$, measures how surprising the US is: it is large when the US is poorly predicted and shrinks towards zero as V approaches λ.
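A minimal Python sketch of the Rescorla Wagner rule, assuming the standard multi-cue form in which the error term uses the summed strength of all cues present on a trial (the slide shows a single V), and assuming α = 0.3, β = 1, λ = 1. It shows the error term shrinking during acquisition, and a pretrained tone leaving almost nothing for an added light to learn (blocking):

```python
# Rescorla-Wagner: Delta V_i = alpha * beta * (lambda - sum of V over cues present).
# The summed error term is the standard multi-cue form; parameter values are assumptions.

def rw_trial(v, cues, us_present, alpha=0.3, beta=1.0, lam=1.0):
    """Update strengths in dict v in place for one trial; return the error term."""
    target = lam if us_present else 0.0
    error = target - sum(v[c] for c in cues)  # how surprising the outcome is
    for c in cues:
        v[c] += alpha * beta * error
    return error

# Acquisition: tone -> food. The prediction error shrinks across trials.
v = {"tone": 0.0, "light": 0.0}
errors = [rw_trial(v, ["tone"], True) for _ in range(20)]

# Blocking: tone+light -> food after the tone already predicts food.
for _ in range(10):
    rw_trial(v, ["tone", "light"], True)

# Control: tone+light -> food with no Stage 1 training.
control = {"tone": 0.0, "light": 0.0}
for _ in range(10):
    rw_trial(control, ["tone", "light"], True)

print("error on first and last acquisition trial:", round(errors[0], 3), round(errors[-1], 3))
print("light after blocking:", round(v["light"], 3))
print("light in control:", round(control["light"], 3))
```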

Rescorla Wagner thus allows selective learning about surprising outcomes.

It can also explain sensitivity to correlation:

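[Figure: a session in which the context is always present and the tone is presented intermittently; food occurs both with and without the tone]

context ---> food     context+tone ---> food

Because the context alone is followed by food, it gains associative strength and blocks learning about the tone, so a tone that is uncorrelated with food acquires little strength.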
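The context account can be checked with a small Rescorla Wagner simulation. The two trial mixes, the strict alternation of trial types and α = 0.1 are illustrative assumptions:

```python
# Rescorla-Wagner account of sensitivity to correlation: the context is present on
# every trial, so when food also occurs without the tone, context --> food learning
# blocks the tone. Trial mixes and parameters are illustrative assumptions.

def rw_trial(v, cues, us_present, alpha=0.1, lam=1.0):
    error = (lam if us_present else 0.0) - sum(v[c] for c in cues)
    for c in cues:
        v[c] += alpha * error

def run(trial_types, n_blocks=500):
    """Cycle through the trial types n_blocks times and return final strengths."""
    v = {"context": 0.0, "tone": 0.0}
    for _ in range(n_blocks):
        for cues, us in trial_types:
            rw_trial(v, cues, us)
    return v

# Positive correlation: food only when the tone is on.
positive = [(["context", "tone"], True), (["context"], False)]
# Zero correlation: food equally likely with or without the tone.
zero = [(["context", "tone"], True), (["context"], True)]

v_pos = run(positive)
v_zero = run(zero)
print("tone, positive correlation:", round(v_pos["tone"], 2))
print("tone, zero correlation:", round(v_zero["tone"], 2))
```

With a positive correlation the tone ends up carrying the prediction; with zero correlation the context absorbs it and the tone is left with almost none.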

Rescorla Wagner

Direction X

Delay X

Correlation yes

Predictable outcomes yes

Rescorla Wagner cannot explain why backward conditioning should be ineffective, and cannot easily explain the effect of trace intervals.

This is because nothing in the Rescorla Wagner equation refers to time, and time is the essence of causality.
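To make the point concrete, here is a sketch with a hypothetical helper, `rw_from_sequence` (assumed α = 0.3, λ = 1), that accepts an ordered list of events but, like the Rescorla Wagner equation, reduces it to the set of cues present and whether food occurred:

```python
# Rescorla-Wagner sees a trial only as "which cues were present, and did the US occur".
# Event order within the trial is discarded, so forward (tone then food) and
# backward (food then tone) pairings produce identical updates. Values are assumptions.

def rw_from_sequence(v, events, alpha=0.3, lam=1.0):
    """events: ordered list such as ["tone", "food"]; the order is ignored below."""
    cues = {e for e in events if e != "food"}  # set of CSs present
    us_present = "food" in events
    error = (lam if us_present else 0.0) - sum(v[c] for c in cues)
    for c in cues:
        v[c] += alpha * error
    return v

forward = {"tone": 0.0}
backward = {"tone": 0.0}
for _ in range(5):
    rw_from_sequence(forward, ["tone", "food"])
    rw_from_sequence(backward, ["food", "tone"])

print(forward == backward)  # the model cannot distinguish the two procedures
```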

Wagner’s SOP (1981)

Sometimes Opponent Process Theory

SOP incorporates time by assuming that the processing of a stimulus can vary:

as a function of time (cf. trace decay in STM)

as a function of recent events

Stimulus processing is reduced if:

the same stimulus has just been presented (self-generated priming)

a predictor (CS) for the stimulus has just been presented (retrieval-generated priming)

General Assumptions

Each stimulus is represented as a set of elements, some of which may be activated when the stimulus is presented.

Elements may be inactive, or in one of two states:

A1 is a primary state of limited capacity (corresponding to rehearsal/STM)

A2 is a secondary state of activation.


Differences between A1 and A2:

The response elicited by A2 is often less intense than that elicited by A1; in some cases it is opposite in direction.


When a stimulus is presented, some of its inactive elements enter A1, then gradually decay into A2, and then become inactive again.

inactive (I) --> A1 --> A2 --> inactive (I)

Decay from A1 to A2 is fast; decay from A2 back to I is slow.
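These transitions can be sketched as a discrete-time simulation of the proportions of a stimulus's elements in I, A1 and A2. The parameter names p1, pd1 and pd2 follow the usual SOP notation, but the numeric values and the step size are assumptions, and the limited capacity of A1 is ignored:

```python
# Discrete-time sketch of SOP state transitions for one stimulus, tracking the
# proportion of its elements in I, A1 and A2. Numeric values are assumptions;
# pd1 > pd2 encodes "A1 -> A2 fast, A2 -> I slow". A1 capacity limits are ignored.

def sop_states(n_steps, stim_on, p1=0.6, pd1=0.2, pd2=0.02):
    """stim_on(t) -> bool says whether the stimulus is present at step t.
    Returns a list of (pI, pA1, pA2) after each step."""
    p_i, p_a1, p_a2 = 1.0, 0.0, 0.0
    trace = []
    for t in range(n_steps):
        to_a1 = p1 * p_i if stim_on(t) else 0.0  # recruitment only while presented
        to_a2 = pd1 * p_a1                        # fast decay A1 -> A2
        to_i = pd2 * p_a2                         # slow decay A2 -> I
        p_i += to_i - to_a1
        p_a1 += to_a1 - to_a2
        p_a2 += to_a2 - to_i
        trace.append((p_i, p_a1, p_a2))
    return trace

# A brief stimulus lasting 3 steps, followed by 200 steps with nothing presented.
trace = sop_states(203, stim_on=lambda t: t < 3)
peak_a1 = max(range(len(trace)), key=lambda t: trace[t][1])
peak_a2 = max(range(len(trace)), key=lambda t: trace[t][2])
print("A1 peaks at step", peak_a1, "; A2 peaks at step", peak_a2)
```

A1 activity peaks while the stimulus is on; A2 activity builds up afterwards and lingers long after the stimulus has gone.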

How does this model produce self-generated priming?

[Figure: on first presentation, elements move from I into A1; after a while they decay (fast) into A2]

When the stimulus is first presented, its elements go into A1 and then quickly decay into A2.

Elements cannot go from A2 directly to A1; they must decay to I first.

The more elements that have accumulated in A2, the fewer are left in I for the next presentation of the CS to put into A1.

So the second presentation produces less A1 activity, and the stimulus is less effective.

[Figure: the state of the stimulus elements by the time the next CS occurs]
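A sketch of self-generated priming under the same assumed dynamics: the stimulus is presented twice, and we measure how many inactive elements the second presentation moves into A1. The function name, durations and parameter values are hypothetical:

```python
# Self-generated priming in a discrete-time SOP sketch: the stimulus is presented
# twice, and we measure how many inactive elements the second presentation moves
# into A1. p1, pd1, pd2, dur and the gaps are illustrative assumptions.

def recruited_on_second(gap, p1=0.6, pd1=0.2, pd2=0.02, dur=3):
    """Proportion of elements newly entering A1 during a second presentation that
    starts `gap` steps after the first (gap must be >= dur so they do not overlap)."""
    assert gap >= dur
    p_i, p_a1, p_a2 = 1.0, 0.0, 0.0
    recruited = 0.0
    for t in range(gap + dur):
        first = t < dur
        second = gap <= t < gap + dur
        to_a1 = p1 * p_i if (first or second) else 0.0  # only I elements can enter A1
        to_a2 = pd1 * p_a1
        to_i = pd2 * p_a2
        p_i += to_i - to_a1
        p_a1 += to_a1 - to_a2
        p_a2 += to_a2 - to_i
        if second:
            recruited += to_a1
    return recruited

for gap in (5, 20, 400):
    print("gap", gap, "-> new A1 activity", round(recruited_on_second(gap), 3))
```

At short gaps most elements are still in A2, so the repeated stimulus produces little new A1 activity; after a long gap they have returned to I and the stimulus is fully effective again.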

Retrieval-generated priming:

if an associate of the stimulus is presented, then its elements are activated directly to the A2 state.

inactive (I) --> A2 --> inactive (I), bypassing A1

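Condition tone ---> food, then present the tone: what happens to the food elements?

[Figure: I/A1/A2 state diagram for the food elements]

The tone puts some food elements directly into A2, so when food is then presented it is less effective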

“conditioned diminution of the UR”
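A single-moment sketch of this effect (an assumption: the real model unfolds over time). A conditioned tone puts food elements straight from I into A2, so fewer are free to enter A1 when food arrives; the UR is taken to be driven by A1 activity, and `retrieval` and p1 are hypothetical parameters:

```python
# Retrieval-generated priming, sketched at a single moment (an assumption): a
# conditioned tone puts food elements straight from I into A2, so fewer remain
# inactive and able to enter A1 when food arrives. `retrieval` and p1 are
# hypothetical parameters; the UR is taken to be driven by A1 activity.

def food_a1(v_tone, tone_first, p1=0.6, retrieval=0.8):
    """Proportion of food elements entering A1 when food is presented."""
    p_i = 1.0
    if tone_first:
        p_i -= retrieval * min(max(v_tone, 0.0), 1.0) * p_i  # I -> A2, bypassing A1
    return p1 * p_i

print("food alone:", food_a1(v_tone=1.0, tone_first=False))
print("food after a conditioned tone:", food_a1(v_tone=1.0, tone_first=True))
print("food after a neutral tone:", food_a1(v_tone=0.0, tone_first=True))
```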

Differences between A1 and A2:

Learning obeys different rules depending on whether elements are in A1 or A2.

To form an excitatory association:

--- the CS must be in A1

--- the US must also be in A1

--- if the CS is in A1 but the US is in A2, an inhibitory association forms instead
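These rules can be written as two increments computed from the proportions of CS and US elements in each state. The rate parameters `l_plus` and `l_minus` and all example values are assumptions:

```python
# SOP learning rule sketch: excitatory increments depend on the CS and US being
# in A1 together; inhibitory increments on the CS being in A1 while the US is in A2.
# A CS in A2 contributes nothing. l_plus and l_minus are assumed rate parameters.

def sop_increments(cs_a1, us_a1, us_a2, l_plus=1.0, l_minus=0.5):
    """Return (excitatory, inhibitory, net) changes for one moment."""
    excitatory = l_plus * cs_a1 * us_a1
    inhibitory = l_minus * cs_a1 * us_a2
    return excitatory, inhibitory, excitatory - inhibitory

print(sop_increments(cs_a1=0.8, us_a1=0.9, us_a2=0.1))  # early training: mostly excitatory
print(sop_increments(cs_a1=0.8, us_a1=0.0, us_a2=1.0))  # US only in A2: inhibitory
```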

How does conditioning happen?

After one trial:

[Figure: I/A1/A2 state diagrams for the tone and for food]

CS and US both mainly in A1 --> excitatory learning

How does conditioning stop?

After many trials:

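[Figure: I/A1/A2 state diagrams for the tone and for food]

CS mainly in A1, US mainly in A2 (put there by the tone) --> a mix of excitatory and inhibitory learning; when the two balance, conditioning stops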

How does extinction happen?

[Figure: I/A1/A2 state diagrams for the tone and for the US, with the tone followed by nothing]

CS mainly in A1, US elements all in A2 (retrieved by the tone) --> inhibitory learning
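Acquisition, its asymptote and extinction can be sketched with a trial-level simplification. This is an assumption, not Wagner's full real-time model: each trial is collapsed into one moment, the tone's net strength sets the proportion of US elements it puts into A2, and only net strength is tracked. All parameter values are hypothetical:

```python
# Trial-level simplification of SOP (an assumption: each trial is collapsed into a
# single moment, and only net strength is tracked). The tone retrieves US elements
# into A2 in proportion to its net strength, so a predicted US has fewer elements
# left to enter A1. Parameter values are hypothetical.

def sop_trial(v, us_present, cs_a1=0.8, p1=0.9, l_plus=0.2, l_minus=0.2):
    us_a2 = min(max(v, 0.0), 1.0)                  # retrieval-generated A2 activity
    us_a1 = p1 * (1.0 - us_a2) if us_present else 0.0
    dv = l_plus * cs_a1 * us_a1 - l_minus * cs_a1 * us_a2
    return v + dv

v = 0.0
acq = []
for _ in range(60):           # tone --> food
    v = sop_trial(v, us_present=True)
    acq.append(v)
ext = []
for _ in range(60):           # tone --> nothing
    v = sop_trial(v, us_present=False)
    ext.append(v)

print("after acquisition:", round(acq[-1], 3), "after extinction:", round(ext[-1], 3))
```

During acquisition the excitatory term shrinks and the inhibitory term grows until they cancel; during extinction only the inhibitory term remains.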

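Inhibitory conditioning:

First establish a tone --> food association.

[Figure: I/A1/A2 state diagrams for the tone and for food]

Then introduce tone+light --> nothing trials.

[Figure: I/A1/A2 state diagrams for the light and for food]

The tone puts food elements into A2; the light is mainly in A1 while the US elements are all in A2 --> inhibitory learning about the light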

An inhibitor prevents inactive elements of the US from entering A2.

It will thus interfere with the action of a conditioned excitor, which is trying to put inactive US elements into A2.
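The same trial-level simplification (an assumption) can sketch conditioned inhibition: tone --> food trials interleaved with tone+light --> nothing trials. Retrieval into A2 is assumed to be set by the summed net strength of the cues present, so a light with negative strength cancels the tone's retrieval. The function names and parameter values are hypothetical:

```python
# Conditioned inhibition in a trial-level SOP simplification (an assumption):
# US elements put into A2 depend on the summed net strength of the cues present,
# so a cue with negative strength stops the excitor from putting them there.

def trial(v, cues, us_present, cs_a1=0.8, p1=0.9, l_plus=0.2, l_minus=0.2):
    us_a2 = min(max(sum(v[c] for c in cues), 0.0), 1.0)
    us_a1 = p1 * (1.0 - us_a2) if us_present else 0.0
    for c in cues:
        v[c] += l_plus * cs_a1 * us_a1 - l_minus * cs_a1 * us_a2

def retrieved(v, cues):
    """Proportion of US elements the cues put into A2."""
    return min(max(sum(v[c] for c in cues), 0.0), 1.0)

v = {"tone": 0.0, "light": 0.0}
for _ in range(60):           # Stage 1: tone --> food
    trial(v, ["tone"], True)
for _ in range(200):          # Stage 2: tone --> food interleaved with tone+light --> nothing
    trial(v, ["tone"], True)
    trial(v, ["tone", "light"], False)

print("light strength:", round(v["light"], 3))
print("US put into A2 by tone:", round(retrieved(v, ["tone"]), 3))
print("US put into A2 by tone+light:", round(retrieved(v, ["tone", "light"]), 3))
```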

So how does this model do all the things that learning about causality would require?

Selective learning about signals for surprising events

Correlation

Delay

Directionality

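Blocking: Early Stage 1

Stage 1: tone --> food; Stage 2: tone+light --> food

[Figure: I/A1/A2 state diagrams for the tone and for food]

Blocking: Late Stage 1

[Figure: I/A1/A2 state diagrams for the tone and for food]

Blocking: Stage 2

[Figure: I/A1/A2 state diagrams for the light and for food]

CS mainly in A1, US mainly in A2 (put there by the tone) --> a mix of excitatory and inhibitory learning, so the light gains little net strength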
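Blocking can be sketched with the same trial-level simplification (an assumption): the US elements put into A2 are set by the summed net strength of all cues present, and every cue learns from the resulting A1/A2 mix. Parameter values are hypothetical:

```python
# SOP-style account of blocking under a trial-level simplification (an assumption):
# the US elements put into A2 depend on the summed net strength of the cues present,
# and every cue in A1 learns from the resulting A1/A2 mix. Values are hypothetical.

def trial(v, cues, us_present, cs_a1=0.8, p1=0.9, l_plus=0.2, l_minus=0.2):
    us_a2 = min(max(sum(v[c] for c in cues), 0.0), 1.0)
    us_a1 = p1 * (1.0 - us_a2) if us_present else 0.0
    dv = l_plus * cs_a1 * us_a1 - l_minus * cs_a1 * us_a2
    for c in cues:
        v[c] += dv

blocked = {"tone": 0.0, "light": 0.0}
for _ in range(60):                     # Stage 1: tone --> food
    trial(blocked, ["tone"], True)
control = {"tone": 0.0, "light": 0.0}   # control group skips Stage 1

for group in (blocked, control):
    for _ in range(60):                 # Stage 2: tone+light --> food
        trial(group, ["tone", "light"], True)

print("light, blocking group:", round(blocked["light"], 3))
print("light, control group:", round(control["light"], 3))
```

In the blocking group the tone already puts the US into A2, so excitatory and inhibitory increments for the light cancel from the start.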

Excitatory Conditioning: Short ISI

[Figure: I/A1/A2 state diagrams for the CS and for the US]

Mainly A1/A1 ---> strong excitatory association

Excitatory Conditioning: Medium ISI

[Figure: I/A1/A2 state diagrams for the CS and for the US]

Less CS in A1 ---> weaker excitatory association

Excitatory Conditioning: Very Long ISI

[Figure: I/A1/A2 state diagrams for the CS and for the US]

No CS in A1 ---> no excitatory association

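Backward conditioning: food --> tone

[Figure: I/A1/A2 state diagrams for the tone and for food]

By the time the tone is in A1, the food elements have decayed into A2 --> inhibitory rather than excitatory learning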
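The time course behind these ISI and backward predictions can be sketched moment by moment: both stimuli follow the I --> A1 --> A2 --> I dynamics, and learning follows the A1/A1 (excitatory) and A1/A2 (inhibitory) rule. The step size, durations, parameter values and equal learning rates are all assumptions, and the sketch covers a single trial with no prior learning:

```python
# Moment-by-moment SOP sketch for one CS-US trial: both stimuli follow the I -> A1
# -> A2 -> I dynamics, and learning follows the A1/A1 (excitatory) and A1/A2
# (inhibitory) rule. All parameter values are illustrative assumptions.

P1, PD1, PD2, DUR = 0.6, 0.2, 0.02, 3

def step(state, on):
    p_i, p_a1, p_a2 = state
    to_a1 = P1 * p_i if on else 0.0
    to_a2 = PD1 * p_a1
    to_i = PD2 * p_a2
    return (p_i + to_i - to_a1, p_a1 + to_a1 - to_a2, p_a2 + to_a2 - to_i)

def run_trial(cs_onset, us_onset, n_steps=150, l_plus=1.0, l_minus=1.0):
    """Return (excitatory, inhibitory) learning summed over the trial."""
    cs = us = (1.0, 0.0, 0.0)
    exc = inh = 0.0
    for t in range(n_steps):
        cs = step(cs, cs_onset <= t < cs_onset + DUR)
        us = step(us, us_onset <= t < us_onset + DUR)
        exc += l_plus * cs[1] * us[1]   # CS in A1 with US in A1
        inh += l_minus * cs[1] * us[2]  # CS in A1 with US in A2
    return exc, inh

for isi in (2, 8, 40):                  # forward: CS at 0, US at isi
    exc, _ = run_trial(0, isi)
    print(f"forward ISI {isi}: excitatory learning {exc:.3f}")

exc, inh = run_trial(10, 0)             # backward: US at 0, CS 10 steps later
print(f"backward: net {exc - inh:.3f}")
```

Excitatory learning falls as the ISI lengthens because less of the CS is still in A1 when the US arrives; in the backward arrangement the CS meets a US that is mostly in A2, so the net change is negative.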

Further Predictions and Applications

The theory predicts that a US will be processed less effectively when it is predicted. This was tested by Terry and Wagner (1975).

Train:

US, then CS --> US; no US, then CS --> nothing

(or the opposite)

[Figure: % CRs to CS+ and CS- across sessions, in two panels: "US signals reinforcement" and "US signals nonreinforcement"]

After this training, train tone --> US and light --> no US.

Test: compare

tone --> US, then CS --> ??

light --> US, then CS --> ??

A predicted shock should be less effective than an unsignalled (surprising) one, so tone trials should be less accurate than light trials.

[Figure, redrawn from Terry & Wagner: mean percent correct deviation from P as a function of preparatory releaser interval (1, 2 and 3 sec), for CS+ and CS-]

Another prediction of the account is that a predicted CS is less effective at evoking its CR than a surprising one -- priming

A --> X --> food

B --> Y --> food

Test the CR to X and Y with the same combinations and with different combinations:

Same: A --> X, B --> Y

Different: A --> Y, B --> X

[Figure, from Honey, Hall & Bonardi, 1993: elevation ratio on Same and Different test trials]

Applications 1

Andresen et al. (1990): the scapegoat effect

They suggested that a novel-tasting food eaten after "normal" food, with both preceding chemotherapy (CT), will acquire a strong association and overshadow the association to the normal food (acting as a scapegoat).

This idea appeals to two principles:

(i) conditioning two stimuli together results in less learning about each than conditioning just one (overshadowing)

(ii) novel stimuli condition better than familiar ones (latent inhibition)

Applications 1

[Figure: a series of CS presentations ending in a reinforced one (CS CS CS CS +), shown first alone and then with the same context present on each presentation]
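In SOP, latent inhibition can be traced to the context: repeated CS presentations in a context let the context retrieve the CS into A2, so a familiar CS has fewer elements free to enter A1 on the conditioning trial. A sketch of that idea, in which the growth rule for the context --> CS link, the function names and every parameter value are hypothetical:

```python
# SOP-style sketch of latent inhibition (all names and values are hypothetical):
# each CS preexposure strengthens a context --> CS link; on the conditioning trial
# the context retrieves CS elements into A2, leaving fewer able to enter A1.

def context_cs_strength(n_preexposures, rate=0.2):
    v = 0.0
    for _ in range(n_preexposures):
        v += rate * (1.0 - v)        # assumed saturating growth of the context --> CS link
    return v

def cs_a1_on_conditioning(n_preexposures, p1=0.8):
    retrieved = context_cs_strength(n_preexposures)  # proportion of CS elements put into A2
    return p1 * (1.0 - retrieved)                     # only inactive elements can enter A1

def excitatory_learning(n_preexposures, us_a1=0.8, l_plus=1.0):
    # SOP excitatory increment: CS in A1 together with US in A1
    return l_plus * cs_a1_on_conditioning(n_preexposures) * us_a1

print("novel CS:", round(excitatory_learning(0), 3))
print("familiar CS (20 preexposures):", round(excitatory_learning(20), 3))
```

On this sketch the novel food in the scapegoat procedure corresponds to the first case, and the familiar "normal" food to the second.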

Applications 2

Drug addiction and tolerance, e.g. Paletta & Wagner (1986)

Response elicited by A2 may be opposite to that elicited by A1

If the UR has two phases, one opposite to the other, this suggests that A1 and A2 activity produce opposite responses

e.g. the UR to morphine: sedation/hypoactivity (A1 response) followed by hyperactivity (compensatory A2 response)

This means that CSs associated with the drug may produce tolerance to the drug's effect: they put drug elements into A2, which evokes the compensatory response and leaves fewer elements to enter A1 when the drug is given
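A single-moment sketch of this account. It assumes (hypothetically) that each unit of A1 activity adds +1 to a sedation score, each unit of A2 activity adds -1 (hyperactivity), and that the context's net strength sets the proportion of drug elements it puts into A2; the function name and values are illustrative:

```python
# Sketch of SOP's account of context-specific tolerance (assumptions: the A1
# response is sedation, scored +1 per unit of A1 activity; the A2 response is the
# opposite, hyperactivity, scored -1 per unit of A2 activity; values hypothetical).

def net_response(context_strength, drug_given, p1=0.8):
    a2 = min(max(context_strength, 0.0), 1.0)    # drug elements put into A2 by the context
    a1 = p1 * (1.0 - a2) if drug_given else 0.0  # only inactive elements can enter A1
    return a1 - a2                               # positive = sedation, negative = hyperactivity

print(net_response(0.0, True))   # drug in a neutral context: full sedation
print(net_response(0.3, True))   # drug in a drug-paired context: reduced effect (tolerance)
print(net_response(0.3, False))  # drug-paired context alone: compensatory hyperactivity
```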

Paletta & Wagner 1986

Three groups of animals:

Morphine (distinctive context)

Morphine (home cage)

No drug

Then tested all groups in distinctive context

measure activity and sensitivity to pain (tail flick test)

Across several experiments they found evidence of hyperactivity and hyperalgesia in the group that had experienced morphine in a distinctive context: the opposite of the drug's normal effects

Suggested Reading

Dickinson, A. (1980). Contemporary animal learning theory. Cambridge University Press. (Short, sophisticated but compelling introduction to learning theory written from a causal perspective)

Honey, R.C., Hall, G., & Bonardi, C. (1993). Negative priming in associative learning: Evidence from serial conditioning procedures. Journal of Experimental Psychology: Animal Behavior Processes, 19, 90-97.

A test of Wagner's theory

Marlin, N.A., & Miller, R.R. (1981). Associations to contextual stimuli as a determinant of long term habituation. Journal of Experimental Psychology: Animal Behavior Processes, 7, 313-333.


Paletta, M.S., & Wagner, A.R. (1986). Development of context-specific tolerance to morphine: support for a dual process interpretation. Behavioral Neuroscience, 100, 611-623.

Application of Wagner's theory

Terry, W.S., & Wagner, A.R. (1975). Short term memory for "surprising" versus "expected" conditioned stimuli in Pavlovian conditioning. Journal of Experimental Psychology: Animal Behavior Processes, 104, 122-133.


Wagner, A.R. (1981). SOP: A model of automatic memory processing in animals. In N.E. Spear & R.R. Miller (Eds.), Information processing in animals: Memory mechanisms (pp. 95-128). Hillsdale, NJ: Erlbaum.

Wagner’s theory!
