Neurobiological Models of Instrumental Conditioning
Matthew J. Crossley
Department of Psychological and Brain Sciences, University of California, Santa Barbara, 93106
Outline
I. A neurobiological model of appetitive instrumental conditioning
II. Applications of the model
- Fast Reacquisition
- Partial Reinforcement Extinction
- Renewal
III. Temporal-Difference model of DA
Why Instrumental Conditioning?
• The Ashby lab's bread and butter is category learning
• Information-Integration category-learning is a procedural skill
• Appetitive Instrumental Conditioning is a procedural skill
Procedural Skills
• Learned incrementally from feedback
• Model-free reinforcement learning
• Habitual control
• E.g., riding a bike or playing an instrument
• E.g., radiology
Procedural Skills
Where are the tumors?
Procedural Skills
TUMORS!
Procedural Skills Depend on the Basal Ganglia
• Basal ganglia are a collection of subcortical nuclei
• Interconnect with cortex in well-defined circuits
• Striatum is a major input structure
Cortex Excites the Striatum
Striatum Inhibits the GPi
GPi Inhibits the Thalamus
GPi has a high baseline firing rate
Striatum Disinhibits the Thalamus
Thalamus Excites Cortex
Dopamine Modulates Activity
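The chain of slides above (cortex excites striatum, striatum inhibits GPi, GPi inhibits thalamus, thalamus excites cortex) can be sketched as a tiny rate model. This is only an illustration of disinhibition, not the model from the talk: the function name `direct_pathway`, the unit weights, and the values of `gpi_baseline` and `thalamus_drive` are all assumptions, and dopaminergic modulation is left out.

```python
def relu(x):
    """Rectify: firing rates cannot go below zero."""
    return max(0.0, x)


def direct_pathway(cortex_drive, gpi_baseline=1.0, thalamus_drive=0.5):
    """One feedforward pass through the loop (all parameters are illustrative)."""
    striatum = relu(cortex_drive)           # cortex excites the striatum
    gpi = relu(gpi_baseline - striatum)     # striatum inhibits the tonically active GPi
    thalamus = relu(thalamus_drive - gpi)   # GPi inhibits the thalamus
    cortex_out = thalamus                   # thalamus excites cortex
    return {"striatum": striatum, "gpi": gpi,
            "thalamus": thalamus, "cortex_out": cortex_out}


if __name__ == "__main__":
    # At rest the GPi's high baseline rate silences the thalamus;
    # cortical drive to the striatum removes that inhibition.
    print("rest:  ", direct_pathway(0.0))
    print("driven:", direct_pathway(1.0))
```

With no cortical drive the thalamus stays silent because the GPi fires at its baseline rate; with drive, the striatum shuts the GPi off and the thalamus is released.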
Procedural Learning Depends on the Striatum
• Single-cell recordings (Carelli, Wolske, & West, 1997; Merchant, Zainos, Hernandez, Salinas, & Romo, 1997; Romo, Merchant, Ruiz, Crespo, & Zainos, 1995)
• Lesion studies (Eacott & Gaffan, 1991; Gaffan & Eacott, 1995; Gaffan & Harrison, 1987; McDonald & White, 1993, 1994; Packard, Hirsch, & White, 1989; Packard & McGaugh, 1992)
• Neuropsychological patient studies (Filoteo, Maddox, & Davis, 2001; Filoteo, Maddox, Salmon, & Song, 2005; Knowlton, Mangels, & Squire, 1996)
• Neuroimaging (Nomura et al., 2007; Seger & Cincotta, 2002; Waldschmidt & Ashby, 2011)
Striatal Neurons
Medium Spiny Projection Neurons (MSNs): 96%
GABA Interneurons: 2%
TANs - Cholinergic Interneurons: 2%
The TANs are of Particular Interest
• Tonically active; pause in response to excitatory input
• Presynaptically inhibit cortical input to MSNs
• Get major input from CM-Pf (thalamus)
• Learn to pause to stimuli that predict reward (requires dopamine)
I. A neurobiological model of appetitive instrumental conditioning
Model Architecture
Ashby and Crossley (2011)
Learning Occurs at the CTX-MSN Synapse and at Pf-TAN Synapses
Pf-TAN Synapse
CTX-MSN Synapse
Ashby and Crossley (2011)
Network Dynamics: Early Trial
[Figure sequence: network activity on an early trial; one panel is labeled SMA]
Response and Feedback
• Model responds if SMA crosses threshold
• Model is given feedback after every trial
CTX-MSN Synaptic Modification Requires a TANs Pause
Synaptic Plasticity in the Striatum Depends on Dopamine (DA)
• Synaptic Strengthening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Elevated DA levels
• Synaptic Weakening:
- Strong presynaptic activation
- Strong postsynaptic activation
- Depressed DA levels
Arbuthnott, Ingham, & Wickens (2000); Calabresi, Pisani, Mercuri, & Bernardi (1996); Reynolds & Wickens (2002)
DA Encodes Reward Prediction Error (RPE)
• Elevated after unexpected reward
• Depressed after unexpected no-reward
• Stays at baseline when the outcome is as expected
Bayer & Glimcher (2005)
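Computing RPE
Obtained feedback on trial n:
$$R_n = \begin{cases} 1 & \text{if positive feedback} \\ 0 & \text{otherwise} \end{cases}$$
Predicted feedback on trial n:
$$P_n = P_{n-1} + \alpha (R_{n-1} - P_{n-1})$$
RPE on trial n:
$$\mathrm{RPE}(n) = R_n - P_n$$
DA Released on Trial n
$$\mathrm{DA}(n) = \begin{cases} 1 & \text{if } \mathrm{RPE} > 1 \\ 0.8\,\mathrm{RPE} + 0.2 & \text{if } -0.25 < \mathrm{RPE} \le 1 \\ 0 & \text{if } \mathrm{RPE} \le -0.25 \end{cases}$$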
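The trial-by-trial computation above can be sketched in a few lines. The piecewise DA function follows the slide; the function names `dopamine` and `run_trials`, the learning rate `alpha=0.05`, and the starting prediction `p0=0.0` are assumptions chosen only for illustration.

```python
def dopamine(rpe):
    """Piecewise-linear DA release as a function of RPE (baseline is 0.2 at RPE = 0)."""
    if rpe > 1.0:
        return 1.0
    if rpe > -0.25:
        return 0.8 * rpe + 0.2
    return 0.0


def run_trials(feedback, alpha=0.05, p0=0.0):
    """Return (P_n, RPE(n), DA(n)) for each trial; alpha and p0 are assumed values."""
    p = p0
    history = []
    for r in feedback:            # r is R_n: 1 for positive feedback, 0 otherwise
        rpe = r - p               # RPE(n) = R_n - P_n
        history.append((p, rpe, dopamine(rpe)))
        p = p + alpha * (r - p)   # prediction carried into the next trial
    return history


if __name__ == "__main__":
    # 200 rewarded trials followed by one unexpected omission.
    hist = run_trials([1] * 200 + [0])
    for n in (0, 199, 200):
        p, rpe, da = hist[n]
        print(f"trial {n}: P={p:.3f} RPE={rpe:.3f} DA={da:.3f}")
```

In this sketch DA starts near its maximum on the first rewarded trial, settles toward baseline as the reward becomes predicted, and drops to zero when an expected reward is omitted.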
Updating Synapses in the Model
$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n) \, [S_J(n) - \theta_{NMDA}]^+ \, [D(n) - D_{base}]^+ \, [1 - w_{K,J}(n)] \\
&- \beta_w I_K(n) \, [S_J(n) - \theta_{NMDA}]^+ \, [D_{base} - D(n)]^+ \, w_{K,J}(n) \\
&- \gamma_w I_K(n) \, [\theta_{NMDA} - S_J(n)]^+ \, [S_J(n) - \theta_{AMPA}]^+ \, w_{K,J}(n).
\end{aligned}
$$
Factors gating synaptic strengthening and synaptic weakening:
• Presynaptic Activity: $I_K(n)$
• Postsynaptic Activation: $[S_J(n) - \theta_{NMDA}]^+$
• Elevated DA (Synaptic Strengthening): $[D(n) - D_{base}]^+$
• Depressed DA (Synaptic Weakening): $[D_{base} - D(n)]^+$
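A minimal sketch of the update rule for a single CTX-MSN synapse, assuming scalar activations on one trial. The function name `update_weight` and every numeric parameter value (`alpha_w`, `beta_w`, `gamma_w`, the two thresholds, and the DA baseline of 0.2) are placeholders chosen for illustration, not values reported in the talk.

```python
def pos(x):
    """The [x]^+ operator: x if positive, otherwise 0."""
    return max(0.0, x)


def update_weight(w, pre, post, da,
                  alpha_w=0.1, beta_w=0.05, gamma_w=0.01,
                  theta_nmda=0.5, theta_ampa=0.1, da_base=0.2):
    """One trial of the three-term update (all parameter values are assumed)."""
    # Strengthening: presynaptic activity, postsynaptic activation above the
    # NMDA threshold, and DA above baseline; saturates as w approaches 1.
    ltp = alpha_w * pre * pos(post - theta_nmda) * pos(da - da_base) * (1.0 - w)
    # Weakening: same pre/post conditions but DA below baseline.
    ltd_da = beta_w * pre * pos(post - theta_nmda) * pos(da_base - da) * w
    # Weakening: postsynaptic activation between the AMPA and NMDA thresholds.
    ltd_weak = gamma_w * pre * pos(theta_nmda - post) * pos(post - theta_ampa) * w
    return w + ltp - ltd_da - ltd_weak


if __name__ == "__main__":
    w = 0.5
    print("elevated DA: ", update_weight(w, pre=1.0, post=1.0, da=1.0))
    print("depressed DA:", update_weight(w, pre=1.0, post=1.0, da=0.0))
    print("baseline DA: ", update_weight(w, pre=1.0, post=1.0, da=0.2))
```

Because each term is a product of rectified factors, the weight changes only when presynaptic activity, postsynaptic activation, and a DA deviation from baseline coincide, or when postsynaptic activation is weak.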
Network Dynamics: Late Trial
[Figure sequence: network activity on a late trial; one panel is labeled SMA]
Model Accounts for Electrophysiological Recordings from TANs
Ashby and Crossley (2011)
Model Accounts for Electrophysiological Recordings from MSNs
Ashby and Crossley (2011)
II. Applications of the model
Fast Reacquisition
Ashby and Crossley (2011)
Fast reacquisition is evidence that extinction did not erase initial learning
Fast Reacquisition Mechanics
During extinction, the TANs quickly stop pausing and thereby protect cortico-striatal synapses
Partial Reinforcement Extinction (PRE)
Extinction is slower when acquisition is trained with partial reinforcement
PRE Mechanics
TANs take longer to stop pausing under partial reinforcement
Slowed Reacquisition
Phase \ Condition | Ext2             | Ext8             | Prf2          | Prf8
Acquisition       | VI-30 sec        | VI-30 sec        | VI-30 sec     | VI-30 sec
Extinction        | No Reinforcement | No Reinforcement | Lean Schedule | Lean Schedule
Reacquisition     | VI-2 min         | VI-8 min         | VI-2 min      | VI-8 min
Woods and Bouton (2007)
Behavioral Results
Crossley, Horvitz, Balsam, & Ashby (in prep)
Modeling Results
Crossley, Horvitz, Balsam, & Ashby (in prep)
TANs don’t stop pausing during extinction in Prf Conditions
CTX-MSN Synapse Pf-TAN Synapse
Renewal - Basic Design
Phase \ Condition    | ABA           | AAB           | ABC
Acquisition          | Environment A | Environment A | Environment A
Extinction           | Environment B | Environment A | Environment B
Renewal (Extinction) | Environment A | Environment B | Environment C
Bouton et al. (2011)
Renewal
Model Architecture
Crossley, Horvitz, Balsam, & Ashby (in prep)
Synaptic Plasticity at ALL Pf-TAN Synapses
Crossley, Horvitz, Balsam, & Ashby (in prep)
Renewal
Crossley, Horvitz, Balsam, & Ashby (in prep)
ABA Mechanics
Crossley, Horvitz, Balsam, & Ashby (in prep)
Net Pf-TAN synaptic weight is the average of all active Pf-TAN synapses
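The averaging rule can be illustrated with a toy ABA example. Everything here is hypothetical: the function name `net_pf_tan_weight`, the idea of one Pf-TAN synapse for the cue plus one per environment, and the weight values, which stand in for a state after acquisition in Environment A and extinction in Environment B.

```python
def net_pf_tan_weight(weights, active):
    """Average the weights of the Pf-TAN synapses that are currently active."""
    active_weights = [weights[name] for name in active]
    return sum(active_weights) / len(active_weights)


# Hypothetical weights after acquisition in A and extinction in B:
# the synapse for environment A is still strong, the one for B has weakened,
# and the cue synapse (active in both phases) sits in between.
weights = {"cue": 0.5, "env_A": 0.9, "env_B": 0.1}

net_in_A = net_pf_tan_weight(weights, ["cue", "env_A"])
net_in_B = net_pf_tan_weight(weights, ["cue", "env_B"])

if __name__ == "__main__":
    print("net weight tested in A:", net_in_A)
    print("net weight tested in B:", net_in_B)
```

Under these assumed values, returning to Environment A raises the net Pf-TAN weight relative to Environment B, which is the kind of context dependence an ABA design probes.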
Instrumental Conditioning Summary
• The TANs protect learning at CTX-MSN synapses.
• Manipulations that keep the TANs paused during extinction leave learning at the CTX-MSN synapse subject to change.
Untested Physiological Predictions
• Development of TANs pause precedes development of category-specific responses in MSNs
• TANs should stop pausing during extinction
III. Temporal-Difference (TD) model of DA
Putting TD into the model
We want to replace the discrete-trial model of DA with a continuous-time model
The TD Prediction Error
[Figure: prediction error plotted by trial and time step]
$$\delta_t = r_t + \gamma V(t+1) - V(t)$$
$$r_t = \begin{cases} 1 & \text{if reward at time } t \\ 0 & \text{if no reward at time } t \end{cases}$$
Montague, Dayan, & Sejnowski (1996). Journal of Neuroscience, 16(5), 1936-1947.
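A minimal tabular sketch of the TD prediction error within a trial, assuming one value estimate per time step and a single reward at a fixed step. The function name `run_trial`, the trial length, the reward step, and the values of `alpha` and `gamma` are illustrative assumptions, not the settings used in the talk.

```python
def run_trial(values, rewards, alpha=0.2, gamma=1.0, learn=True):
    """Compute delta_t = r_t + gamma * V(t+1) - V(t) at every step of one trial."""
    n = len(rewards)
    deltas = []
    for t in range(n):
        v_next = values[t + 1] if t + 1 < n else 0.0   # value after the trial ends is 0
        delta = rewards[t] + gamma * v_next - values[t]
        deltas.append(delta)
        if learn:
            values[t] += alpha * delta                  # TD(0) value update
    return deltas


N_STEPS, REWARD_STEP = 10, 7                            # assumed trial layout
rewarded = [1.0 if t == REWARD_STEP else 0.0 for t in range(N_STEPS)]
omitted = [0.0] * N_STEPS

values = [0.0] * N_STEPS
first_trial = run_trial(values, rewarded)
for _ in range(199):
    run_trial(values, rewarded)
late_trial = run_trial(values, rewarded, learn=False)
omission_trial = run_trial(values, omitted, learn=False)

if __name__ == "__main__":
    print("delta at reward, first trial:", first_trial[REWARD_STEP])
    print("delta at reward, late trial: ", late_trial[REWARD_STEP])
    print("delta at omitted reward:     ", omission_trial[REWARD_STEP])
```

In this sketch the prediction error at the time of reward is large on the first trial, shrinks toward zero once the reward is predicted, and turns negative when a predicted reward is withheld.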
Model Architecture
Spiking neuron driven by the TD prediction error:
$$\delta_t = r_t + \gamma V(t+1) - V(t)$$
TANs were removed for initial TD applications
We Need Modified Learning Equations
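$$
\begin{aligned}
w_{K,J}(n+1) = w_{K,J}(n)
&+ \alpha_w I_K(n) \, [S_J(n) - \theta_{NMDA}]^+ \, [D(n) - D_{base}]^+ \, [1 - w_{K,J}(n)] \\
&- \beta_w I_K(n) \, [S_J(n) - \theta_{NMDA}]^+ \, [D_{base} - D(n)]^+ \, w_{K,J}(n) \\
&- \gamma_w I_K(n) \, [\theta_{NMDA} - S_J(n)]^+ \, [S_J(n) - \theta_{AMPA}]^+ \, w_{K,J}(n).
\end{aligned}
$$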
Synaptic Strengthening
Synaptic Weakening
DA is no longer modeled on a discrete trial-by-trial basis!
A Cortico-Striatal Synapse
CaMKII, PP-1 and Striatal Plasticity
Learning Equations
$$
\begin{aligned}
w(n+1) = w(n)
&+ \alpha_w \int [S_{CaMKII}(t) - S_{CaMKII,base}]^+ \, [D_{PP\text{-}1}(t) - D_{base}]^+ \, [w_{max} - w(n)] \, dt \\
&- \beta_w \int [S_{CaMKII}(t) - S_{CaMKII,base}]^+ \, [D_{base} - D_{PP\text{-}1}(t)]^+ \, w(n) \, dt
\end{aligned}
$$
Factors gating synaptic strengthening and synaptic weakening:
• CaMKII Activity: $[S_{CaMKII}(t) - S_{CaMKII,base}]^+$
• PP-1 Activity: $[D_{PP\text{-}1}(t) - D_{base}]^+$ (strengthening) and $[D_{base} - D_{PP\text{-}1}(t)]^+$ (weakening)
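The within-trial integrals can be approximated with a simple Riemann sum over sampled traces. This is a sketch under stated assumptions: the function name `update_weight_ct`, the sampling step `dt`, the baselines, the rate constants, and the constant test traces are all hypothetical.

```python
def pos(x):
    """The [x]^+ operator: x if positive, otherwise 0."""
    return max(0.0, x)


def update_weight_ct(w, camkii, pp1, dt,
                     alpha_w=0.5, beta_w=0.5,
                     camkii_base=0.2, d_base=0.2, w_max=1.0):
    """Update w once per trial from sampled CaMKII and PP-1 traces (values assumed)."""
    # w(n) is constant within a trial, so it factors out of both integrals.
    ltp_integral = sum(pos(s - camkii_base) * pos(d - d_base)
                       for s, d in zip(camkii, pp1)) * dt
    ltd_integral = sum(pos(s - camkii_base) * pos(d_base - d)
                       for s, d in zip(camkii, pp1)) * dt
    return w + alpha_w * ltp_integral * (w_max - w) - beta_w * ltd_integral * w


if __name__ == "__main__":
    active = [1.0] * 10                       # CaMKII trace above its baseline
    print("elevated PP-1 trace: ", update_weight_ct(0.5, active, [0.8] * 10, dt=0.1))
    print("depressed PP-1 trace:", update_weight_ct(0.5, active, [0.0] * 10, dt=0.1))
```

As in the discrete-trial rule, the weight moves only while the CaMKII trace is above baseline, and the sign of the change follows whether the PP-1 trace is above or below its baseline.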
Acquisition and Extinction
[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]
MSN and SNc
[Figure: MSN output and SNc output, each plotted by trial and time step]
CaMKII and PP-1
DA model learns very quickly that reward is taken away
[Figure: two panels, each plotted by trial and time step]
Extinction under noncontingent reward delivery
[Figure: proportion of responses emitted by trial; CTX-MSN synaptic strength by trial]
MSN and SNc
[Figure: MSN output and SNc output, each plotted by trial and time step]
MSN and SNc
Noncontingent reward delivery keeps DA surprised
[Figure: two panels, each plotted by trial and time step]
CaMKII and PP-1
Noncontingent reward delivery keeps DA surprised
[Figure: two panels, each plotted by trial and time step]
Summary and Future Directions
• TANs need to be added to account for reacquisition, renewal, and other effects after extinction with noncontingent reward
• TD model might need to be modified once the TANs are included and post-extinction effects are examined
Acknowledgments
Collaborators:
Greg Ashby
The Ashby Lab
Todd Maddox
Jon Horvitz
Peter Balsam
Funding:
NIMH Grant MH3760-2, Todd Wilkinson