Neurobiological Models of Instrumental Conditioning

Matthew J. Crossley

Department of Psychological and Brain Sciences, University of California, Santa Barbara, 93106

Outline

I. A neurobiological model of appetitive instrumental conditioning

II. Applications of the model

• Fast Reacquisition

• Partial Reinforcement Extinction

• Renewal

III. Temporal-Difference (TD) model of DA

Why Instrumental Conditioning?

• The Ashby lab's bread and butter is category learning

• Information-integration category learning is a procedural skill

• Appetitive Instrumental Conditioning is a procedural skill

• Learned incrementally from feedback

• Model-free reinforcement learning

• Habitual control

• E.g., riding a bike or playing an instrument

• E.g., radiology

Procedural Skills

Where are the tumors?

TUMORS!

Procedural Skills Depend on the Basal Ganglia

• Basal ganglia are a collection of subcortical nuclei

• Interconnected with cortex in well-defined circuits

• Striatum is a major input structure

Cortex Excites the Striatum

Striatum Inhibits the GPi

GPi Inhibits the Thalamus

• GPi has a high baseline firing rate

Striatum Disinhibits the Thalamus

Thalamus Excites Cortex
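The chain of slide titles above (cortex excites striatum, striatum inhibits GPi, GPi inhibits thalamus) amounts to disinhibition. A toy firing-rate sketch in Python may make the sign logic concrete. The function name `direct_pathway`, the unit weights, and the rectified-linear rates are illustrative assumptions, not part of the model presented in these slides.

```python
def direct_pathway(cortex, w_ctx_msn=1.0, gpi_baseline=1.0, thalamus_drive=1.0):
    """Toy rate model of the cortex -> striatum -| GPi -| thalamus chain.

    All parameter values are hypothetical and chosen only for illustration.
    """
    # Cortex excites the striatum.
    striatum = max(0.0, w_ctx_msn * cortex)
    # Striatum inhibits the GPi, which otherwise fires at a high baseline rate.
    gpi = max(0.0, gpi_baseline - striatum)
    # GPi inhibits the thalamus; less GPi activity means more thalamic output.
    thalamus = max(0.0, thalamus_drive - gpi)
    return {"striatum": striatum, "gpi": gpi, "thalamus": thalamus}
```

With no cortical input the GPi stays at baseline and the thalamus is silenced. With strong cortical input the striatum shuts down the GPi and the thalamus is released.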

Dopamine Modulates Activity

Procedural Learning Depends on the Striatum

• Single-cell recordings (Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernández, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995)

• Lesion studies (Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992)

• Neuropsychological patient studies (Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996)

• Neuroimaging (Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011)

Striatal Neurons

Medium Spiny Projection Neurons (MSNs)

96%

GABA Interneurons 2%

TANs - Cholinergic Interneurons 2%

The TANs are of Particular Interest

• Tonically active; pause in response to excitatory input

• Presynaptically inhibit cortical input to MSNs

• Receive major input from CM-Pf (thalamus)

• Learn to pause to stimuli that predict reward (requires dopamine)




Model Architecture

Ashby and Crossley (2011)

Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses

Pf-TAN Synapse

CTX-MSN Synapse


Network Dynamics: Early Trial

[Figure: network dynamics on an early trial; panel label: SMA]

Response and Feedback

• Model responds if SMA crosses threshold

• Model is given feedback after every trial



CTX-MSN Synaptic Modification Requires a TANs Pause

• Synaptic Strengthening:

- Strong presynaptic activation

- Strong postsynaptic activation

- Elevated DA levels

• Synaptic Weakening:

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

Synaptic Plasticity in the Striatum Depends on Dopamine (DA)

• Synaptic Strengthening:

- Elevated DA levels

• Synaptic Weakening:

- Depressed DA levels

Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)

DA Encodes Reward Prediction Error (RPE)

• Elevated after unexpected reward

• Depressed after unexpected absence of reward

• Unchanged when the outcome is as expected

Bayer & Glimcher (2005)

Computing RPE

Obtained feedback on trial n:

$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$

Predicted feedback on trial n:

$$P_n = P_{n-1} + \alpha\,(R_{n-1} - P_{n-1})$$

RPE on trial n:

$$RPE(n) = R_n - P_n$$
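The predicted feedback on this slide is a running average of past feedback, and the RPE is the gap between obtained and predicted feedback. A minimal Python sketch follows. The learning-rate value `alpha=0.1`, the initial prediction `p0=0.0`, and the function name `rpe_sequence` are assumptions made for illustration; the slide does not state them.

```python
def rpe_sequence(feedback, alpha=0.1, p0=0.0):
    """Return RPE(n) = R_n - P_n for each trial in a list of 0/1 feedback values.

    alpha (learning rate) and p0 (prediction on the first trial) are
    hypothetical values, not taken from the slides.
    """
    p = p0            # predicted feedback P_n
    prev_r = None     # feedback obtained on the previous trial, R_{n-1}
    rpes = []
    for r in feedback:
        if prev_r is not None:
            # P_n = P_{n-1} + alpha * (R_{n-1} - P_{n-1})
            p = p + alpha * (prev_r - p)
        rpes.append(r - p)
        prev_r = r
    return rpes
```

Under continuous reinforcement the RPE shrinks toward zero as the prediction catches up with the reward. When reward is then withheld, the RPE turns negative.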

DA Released on Trial n

$$DA(n) = \begin{cases} 1 & \text{if } RPE > 1 \\ 0.8\,RPE + 0.2 & \text{if } -0.25 < RPE \le 1 \\ 0 & \text{if } RPE \le -0.25 \end{cases}$$
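The DA function on this slide is piecewise linear in the RPE. A short Python sketch, assuming the breakpoints are 1 and -0.25 and using the hypothetical function name `dopamine`:

```python
def dopamine(rpe):
    """Piecewise-linear DA release as a function of reward prediction error."""
    if rpe > 1.0:
        return 1.0                 # saturates for very large positive RPE
    if rpe > -0.25:
        return 0.8 * rpe + 0.2     # linear region
    return 0.0                     # floor for large negative RPE
```

Under these assumptions an RPE of 0 yields a DA level of 0.2, and the function reaches 0 exactly at an RPE of -0.25, so the two lower pieces join without a jump.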

Updating Synapses in the Model

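$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D(n) - D_{base}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D_{base} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{NMDA} - S_J(n)]^+\,[S_J(n) - \theta_{AMPA}]^+\,w_{K,J}(n).
\end{aligned}
$$

The first term is synaptic strengthening; the remaining terms are synaptic weakening.

Presynaptic Activity: $I_K(n)$ scales both the strengthening and the weakening terms.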
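The update rule on this slide has one strengthening term and two weakening terms, each gated by rectified differences (the $[\cdot]^+$ brackets). A minimal Python sketch of a single-synapse update follows. Every numeric value (`alpha_w`, `beta_w`, `gamma_w`, `theta_nmda`, `theta_ampa`, `d_base`) and the function names are assumptions chosen for illustration, not parameters reported in the slides.

```python
def pos(x):
    """Rectification [x]^+ : x if x > 0, else 0."""
    return max(x, 0.0)


def update_weight(w, pre, post, da,
                  alpha_w=0.1, beta_w=0.05, gamma_w=0.02,
                  theta_nmda=0.5, theta_ampa=0.1, d_base=0.2):
    """One trial of the three-term synaptic update (hypothetical parameters).

    w    : current synaptic strength, assumed to lie in [0, 1]
    pre  : presynaptic activity I_K(n)
    post : postsynaptic activation S_J(n)
    da   : dopamine level D(n)
    """
    # Strengthening: strong postsynaptic activation and DA above baseline.
    ltp = alpha_w * pre * pos(post - theta_nmda) * pos(da - d_base) * (1.0 - w)
    # Weakening: strong postsynaptic activation but DA below baseline.
    ltd_da = beta_w * pre * pos(post - theta_nmda) * pos(d_base - da) * w
    # Weakening: postsynaptic activation between the AMPA and NMDA thresholds.
    ltd_weak = gamma_w * pre * pos(theta_nmda - post) * pos(post - theta_ampa) * w
    return w + ltp - ltd_da - ltd_weak
```

In this sketch the weight is unchanged when DA sits at baseline and postsynaptic activation is above the NMDA threshold, and nothing changes without presynaptic activity.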


Postsynaptic Activation: in the update rule above, postsynaptic activation $S_J(n)$ must exceed the NMDA threshold for synaptic strengthening; synaptic weakening occurs either when it exceeds the NMDA threshold while DA is below baseline, or when it falls between the AMPA and NMDA thresholds.


Elevated DA, $[D(n) - D_{base}]^+$, drives synaptic strengthening in the update rule above; depressed DA, $[D_{base} - D(n)]^+$, drives synaptic weakening.

Network Dynamics: Late Trial

[Figure: network dynamics on a late trial; panel label: SMA]

Model Accounts for Electrophysiological Recordings from TANs


Model Accounts for Electrophysiological Recordings from MSNs





Fast Reacquisition


Fast reacquisition is evidence that extinction did not erase initial learning

Fast Reacquisition Mechanics

TANs quickly stop pausing, and thereby protect cortico-striatal synapses


Partial Reinforcement Extinction (PRE)

Extinction is slower when acquisition is trained with partial reinforcement

PRE Mechanics

TANs take longer to stop pausing under partial reinforcement

Slowed Reacquisition

Phase \ Condition | Ext2             | Ext8             | Prf2          | Prf8
Acquisition       | VI-30 sec        | VI-30 sec        | VI-30 sec     | VI-30 sec
Extinction        | No Reinforcement | No Reinforcement | Lean Schedule | Lean Schedule
Reacquisition     | VI-2 min         | VI-8 min         | VI-2 min      | VI-8 min

Woods and Bouton (2007)

Behavioral Results

Crossley, Horvitz, Balsam, & Ashby (in prep)

Modeling Results


TANs don’t stop pausing during extinction in Prf Conditions

[Figure panels: CTX-MSN Synapse; Pf-TAN Synapse]

Renewal - Basic Design

Phase \ Condition    | ABA           | AAB           | ABC
Acquisition          | Environment A | Environment A | Environment A
Extinction           | Environment B | Environment A | Environment B
Renewal (Extinction) | Environment A | Environment B | Environment C

Bouton et al. (2011)

Renewal

Model Architecture


Synaptic Plasticity at ALL Pf-TAN Synapses


Renewal


ABA Mechanics


Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
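The averaging rule stated above can be sketched in a few lines of Python. The function name `net_pf_tan_weight`, the list-based representation of synapses, and the choice to return 0 when no synapse is active are assumptions made for illustration.

```python
def net_pf_tan_weight(weights, active):
    """Average the weights of the Pf-TAN synapses that are currently active.

    weights : one synaptic weight per Pf-TAN synapse
    active  : booleans of the same length marking which synapses are active
    Returns 0.0 if no synapse is active (a hypothetical convention).
    """
    active_weights = [w for w, a in zip(weights, active) if a]
    if not active_weights:
        return 0.0
    return sum(active_weights) / len(active_weights)
```

For example, if two synapses with weights 0.8 and 0.2 are active and a third is silent, the net weight is their mean, 0.5. Changing which synapses are active changes the net weight even when no individual weight has changed.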

Instrumental Conditioning Summary

• The TANs protect learning at CTX-MSN synapses.

• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.

Untested Physiological Predictions

• Development of TANs pause precedes development of category-specific responses in MSNs

• TANs should stop pausing during extinction




III. Temporal-Difference (TD) model of DA


Putting TD into the model

We want to replace the discrete-trial model of DA with a continuous-time model

The TD Prediction Error

[Figure: prediction error plotted by trial and time step]

The TD Prediction Error

$$\delta_t = r_t + \gamma V(t+1) - V(t)$$

$$r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}$$

Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947.
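The TD prediction error compares the reward received plus the discounted value of the next time step against the value of the current time step. A minimal Python sketch for one trial follows. The function name `td_errors`, the discount value `gamma=0.98`, and the convention that the value after the last time step is 0 are assumptions for illustration.

```python
def td_errors(values, rewards, gamma=0.98):
    """Return the TD error r_t + gamma * V(t+1) - V(t) for each time step.

    values  : V(t) for each time step in the trial
    rewards : r_t for each time step (1 if reward at time t, else 0)
    The value after the final time step is taken to be 0 (an assumption).
    """
    deltas = []
    for t in range(len(rewards)):
        v_next = values[t + 1] if t + 1 < len(values) else 0.0
        deltas.append(rewards[t] + gamma * v_next - values[t])
    return deltas
```

With all values at zero, an unpredicted reward gives a positive error at the time of reward. Once the values fully predict the reward, the error at reward time vanishes, and omitting the reward then gives a negative error.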

Model Architecture

Spiking neuron driven by the TD prediction error:

$$\delta_t = r_t + \gamma V(t+1) - V(t)$$

TANs were removed for initial TD applications

We Need Modified Learning Equations

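Discrete-trial update rule:

$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D(n) - D_{base}]^+\,[1 - w_{K,J}(n)] \\
&- \beta_w I_K(n)\,[S_J(n) - \theta_{NMDA}]^+\,[D_{base} - D(n)]^+\,w_{K,J}(n) \\
&- \gamma_w I_K(n)\,[\theta_{NMDA} - S_J(n)]^+\,[S_J(n) - \theta_{AMPA}]^+\,w_{K,J}(n).
\end{aligned}
$$

The first term is synaptic strengthening; the remaining terms are synaptic weakening.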

DA is no longer modeled on a discrete trial-by-trial basis!

A Cortico-Striatal Synapse

CaMKII, PP-1 and Striatal Plasticity

Learning Equations

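$$
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{CaMKII}(t) - S_{CaMKII\,base}]^+\,[D_{PP\text{-}1}(t) - D_{base}]^+\,[w_{max} - w(n)]\,dt \\
&- \beta_w \int [S_{CaMKII}(t) - S_{CaMKII\,base}]^+\,[D_{base} - D_{PP\text{-}1}(t)]^+\,w(n)\,dt
\end{aligned}
$$

The first term is synaptic strengthening; the second is synaptic weakening.

CaMKII Activity: the factor $[S_{CaMKII}(t) - S_{CaMKII\,base}]^+$ gates both terms.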
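The continuous-time learning equation on this slide integrates over time steps within a trial rather than using one DA value per trial. A minimal Python sketch follows, approximating each integral by a Riemann sum. It assumes that CaMKII activity above baseline gates both the strengthening and the weakening term; all numeric values (`dt`, `alpha_w`, `beta_w`, `camkii_base`, `d_base`, `w_max`) and the function names are hypothetical.

```python
def pos(x):
    """Rectification [x]^+ : x if x > 0, else 0."""
    return max(x, 0.0)


def update_weight_continuous(w, camkii, d_pp1, dt=0.001,
                             alpha_w=1.0, beta_w=1.0,
                             camkii_base=0.1, d_base=0.2, w_max=1.0):
    """Update w(n) -> w(n+1) from within-trial traces (hypothetical parameters).

    camkii : samples of S_CaMKII(t), one per time step
    d_pp1  : samples of D_PP-1(t), one per time step
    """
    ltp_integral = 0.0
    ltd_integral = 0.0
    for s, d in zip(camkii, d_pp1):
        gate = pos(s - camkii_base)                  # CaMKII above baseline
        ltp_integral += gate * pos(d - d_base) * dt  # D_PP-1 above baseline
        ltd_integral += gate * pos(d_base - d) * dt  # D_PP-1 below baseline
    # w(n) is constant within a trial, so it factors out of both integrals.
    return w + alpha_w * ltp_integral * (w_max - w) - beta_w * ltd_integral * w
```

Because the weight is held fixed within a trial, the two integrals can be accumulated first and the weight-dependent factors applied once at the end.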

Learning Equations

PP-1 Activity: in the learning equation above, $[D_{PP\text{-}1}(t) - D_{base}]^+$ enters the synaptic strengthening term and $[D_{base} - D_{PP\text{-}1}(t)]^+$ enters the synaptic weakening term.

Acquisition and Extinction

[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]

MSN and SNc

[Figure: MSN output and SNc output, each plotted by trial and time step]

CaMKII and PP-1

DA model learns very quickly that reward is taken away

[Figure: two panels, each plotted by trial and time step]

Extinction under noncontingent reward delivery

[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]

MSN and SNc

[Figure: MSN output and SNc output, each plotted by trial and time step]

MSN and SNc

Noncontingent reward delivery keeps DA surprised

[Figure: two panels, each plotted by trial and time step]

CaMKII and PP-1

Noncontingent reward delivery keeps DA surprised

[Figure: two panels, each plotted by trial and time step]

Summary and Future Directions

• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward

• TD model might need to be modified once the TANs are included and post-extinction effects are examined

Acknowledgments

Collaborators:

Greg Ashby

The Ashby Lab

Todd Maddox

Jon Horvitz

Peter Balsam


Funding:

NIMH Grant MH3760-2, Todd Wilkinson
