
Phil. Trans. R. Soc. B (2005) 360, 815–836

doi:10.1098/rstb.2005.1622

A theory of cortical responses

Published online 29 April 2005

Karl Friston*

One contribution to a Theme Issue 'Cerebral cartography 1905–2005'.

* (k.friston@fil.ion.ucl.ac.uk).

The Wellcome Department of Imaging Neuroscience, Institute of Neurology, University College London, 12 Queen Square, London WC1N 3BG, UK

This article concerns the nature of evoked brain responses and the principles underlying their generation. We start with the premise that the sensory brain has evolved to represent or infer the causes of changes in its sensory inputs. The problem of inference is well formulated in statistical terms. The statistical fundaments of inference may therefore afford important constraints on neuronal implementation. By formulating the original ideas of Helmholtz on perception, in terms of modern-day statistical theories, one arrives at a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts.

It turns out that the problems of inferring the causes of sensory input (perceptual inference) and learning the relationship between input and cause (perceptual learning) can be resolved using exactly the same principle. Specifically, both inference and learning rest on minimizing the brain's free energy, as defined in statistical physics. Furthermore, inference and learning can proceed in a biologically plausible fashion. Cortical responses can be seen as the brain's attempt to minimize the free energy induced by a stimulus and thereby encode the most likely cause of that stimulus. Similarly, learning emerges from changes in synaptic efficacy that minimize the free energy, averaged over all stimuli encountered. The underlying scheme rests on empirical Bayes and hierarchical models of how sensory input is caused. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of cortical organization and responses. The aim of this article is to encompass many apparently unrelated anatomical, physiological and psychophysical attributes of the brain within a single theoretical perspective.

In terms of cortical architectures, the theoretical treatment predicts that sensory cortex should be arranged hierarchically, that connections should be reciprocal and that forward and backward connections should show a functional asymmetry (forward connections are driving, whereas backward connections are both driving and modulatory). In terms of synaptic physiology, it predicts associative plasticity and, for dynamic models, spike-timing-dependent plasticity. In terms of electrophysiology, it accounts for classical and extra-classical receptive field effects and long-latency or endogenous components of evoked cortical responses. It predicts the attenuation of responses encoding prediction error with perceptual learning and explains many phenomena such as repetition suppression, mismatch negativity (MMN) and the P300 in electroencephalography. In psychophysical terms, it accounts for the behavioural correlates of these physiological phenomena, for example, priming and global precedence. The final focus of this article is on perceptual learning as measured with the MMN and the implications for empirical studies of coupling among cortical areas using evoked sensory responses.

Keywords: cortical; inference; predictive coding; generative models; Bayesian; hierarchical

© 2005 The Royal Society

1. INTRODUCTION

This article represents an attempt to understand evoked cortical responses in terms of models of perceptual inference and learning. The specific model considered here rests on empirical Bayes, in the context of generative models that are embodied in cortical hierarchies. This model can be regarded as a mathematical formulation of the longstanding notion that 'our minds should often change the idea of its sensation into that of its judgment, and make one serve only to excite the other' (Locke 1690). In a similar vein, Helmholtz (1860) distinguishes between perception and sensation. 'It may often be rather hard to say how much from perceptions as derived from the sense of sight is due directly to sensation, and how much of them, on the other hand, is due to experience and training' (see Pollen 1999). In short, there is a distinction between percepts, which are the products of recognizing the causes of sensory input, and sensation per se. Recognition (i.e. inferring causes from sensation) is the inverse of generating sensory data from their causes. It follows that recognition rests on models, learned through experience, of how sensations are caused. In this article, we will consider hierarchical generative models and how evoked cortical responses can be understood as part of the recognition process. The particular recognition scheme we will focus on is based on empirical Bayes, where prior expectations are abstracted from the sensory data, using a hierarchical model of how those data were caused. The particular implementation of empirical Bayes we consider is predictive coding, where prediction error is used to adjust the state of the generative model until prediction error is minimized and the most likely causes of sensory input have been identified.
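The predictive-coding loop just described can be sketched in a few lines. This is a deliberately minimal caricature, not the scheme developed in this paper: the generative mapping is assumed linear (a matrix `theta`), and the learning rate and dimensions are arbitrary illustrative choices. The estimate of the causes is adjusted by gradient descent on the squared prediction error.

```python
import numpy as np

# Minimal predictive-coding sketch (illustrative assumptions throughout):
# sensory input u is explained by hidden causes v through a linear
# generative model g(v) = theta @ v. Recognition adjusts the estimate of v
# until the prediction error e = u - g(v) is minimized.

theta = np.array([[1.0, 0.5],
                  [0.2, 1.0],
                  [0.3, 0.3]])      # parameters of the generative model
v_true = np.array([2.0, -1.0])      # the causes actually present
u = theta @ v_true                  # sensory input generated from the causes

v = np.zeros(2)                     # initial estimate of the causes
eta = 0.1                           # integration (learning) rate
for _ in range(500):
    e = u - theta @ v               # prediction error at the input level
    v += eta * theta.T @ e          # descend the error; in the brain, backward
                                    # connections would carry the predictions
                                    # and forward connections the error

print(np.round(v, 3))               # the estimate converges to v_true
```

On this toy problem the error can be driven to zero because the linear mapping is invertible; the point of the later sections is precisely that realistic generative models are not.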

Conceptually, empirical Bayes and generative models are related to 'analysis-by-synthesis' (Neisser 1967). This approach to perception, drawn from cognitive psychology, involves adapting an internal model of the world to match sensory input and was suggested by Mumford (1992) as a way of understanding hierarchical neuronal processing. The idea is reminiscent of MacKay's epistemological automata (MacKay 1956), which perceive by comparing expected and actual sensory input (Rao 1999). These models emphasize the role of backward connections in mediating predictions of lower level input, based on the activity of higher cortical levels.

Recognition is simply the process of solving an inverse problem by jointly minimizing prediction error at all levels of the cortical hierarchy. The main point of this article is that evoked cortical responses can be understood as transient expressions of prediction error, which index some recognition process. This perspective accommodates many physiological and behavioural phenomena, for example, extra-classical RF effects and repetition suppression in unit recordings, the MMN and P300 in ERPs, and priming and global precedence effects in psychophysics. Critically, many of these emerge from the same basic principles governing inference with hierarchical generative models.

In a series of previous papers (Friston 2002, 2003), we have described how the brain might use empirical Bayes for perceptual inference. These papers considered other approaches to representational learning as special cases of generative models, starting with supervised learning and ending with empirical Bayes. The latter predicts many architectural features seen in the real brain, such as a hierarchical cortical system, prevalent top-down backward influences and functional asymmetries between forward and backward connections. The focus of previous work was on functional cortical architectures. This paper looks at evoked responses and the relevant empirical findings, in relation to predictions and theoretical constraints afforded by the same theory. This is probably more relevant for experimental studies. We will therefore take a little time to describe recent advances in modelling evoked responses in human cortical systems, to show the detailed level of characterization it is now possible to attain.

(a) Overview

We start by reviewing two principles of brain organization, namely functional specialization and functional integration, and how they rest upon the anatomy and physiology of hierarchical cortico-cortical connections. Representational inference and learning are discussed from a theoretical or computational perspective in §2. This section reviews the heuristics behind schemes that use the framework of hierarchical generative models and introduces the empirical Bayesian learning they enable. The key focus of this section is on the functional architectures implied by the model. Representational inference and learning can, in some cases, proceed using only forward connections. However, this is only tenable when the processes generating sensory inputs are invertible and independent. Invertibility is precluded by nonlinear interactions among causes of sensory input (e.g. visual occlusion). These interactions create a problem for recognition that can be resolved using generative models. Generative or forward models solve the recognition problem using the a priori distribution of causes. Empirical Bayes allows these priors to be induced by sensory input, using the hierarchies of backward and lateral projections that prevail in the real brain. In short, hierarchical models of representational learning are a natural choice for understanding real functional architectures and, critically, confer a necessary role on backward connections. Predictions and empirical findings that arise from the theoretical considerations are reviewed in §5–7. Implications for functional architectures, in terms of how connections are organized, functional asymmetries between forward and backward connections and how they change with learning, are highlighted in §3. Then, §4 moves from infrastructural issues to implications for physiological responses during perceptual inference. It focuses on extra-classical RF effects and long-latency responses in electrophysiology. The final sections look at the effect of perceptual learning on evoked responses subtending inference, as indexed by responses to novel or deviant stimuli. We conclude with a demonstration of how plasticity, associated with perceptual learning, can be measured and used to test some key theoretical predictions.

2. FUNCTIONAL SPECIALIZATION AND INTEGRATION

(a) Background

The brain appears to adhere to two fundamental principles of functional organization: integration and specialization. The distinction relates to that between 'localizationism' and '(dis)connectionism' that dominated thinking about cortical function in the nineteenth century. Since the early anatomic theories of Gall, the identification of a particular brain region with a specific function has become a central theme in neuroscience and was the motivation for Brodmann's cytoarchitectonic work (Brodmann 1905; see also Kotter & Wanke 2005). Brodmann posited 'areae anatomicae' to denote distinct cortical fields that could be recognized using anatomical techniques. His goal was to create a comparative system of organs that comprised these elemental areas, each with a specific function integrated within the system (Brodmann 1909).

Initially, functional localization per se was not easy to demonstrate. For example, a meeting that took place on 4 August 1881 addressed the difficulties of attributing function to a cortical area, given the dependence of cerebral activity on underlying connections (Phillips et al. 1984). This meeting was entitled Localization of function in the cortex cerebri. Although accepting the results of electrical stimulation in dog and monkey cortex, Goltz considered that the excitation method was inconclusive because the behaviours elicited might have originated in related pathways, or current could have spread to distant centres. In short, the excitation method could not be used to infer functional localization because localizationism discounted interactions or functional integration among different brain areas. It was proposed that lesion studies could supplement excitation experiments. Ironically, it was observations on patients with brain lesions some years later (see Absher & Benson 1993) that led to the concept of 'disconnection syndromes' and the refutation of localizationism as a complete or sufficient explanation of cortical organization. The cortical infrastructure supporting a single function may involve many specialized areas whose union is mediated by functional integration. Functional specialization and integration are not exclusive; they are complementary. Functional specialization is only meaningful in the context of functional integration and vice versa.

(b) Functional specialization and segregation

The functional role played by any component of the brain (e.g. a cortical area, subarea, neuronal population or neuron) is defined largely by its connections. Clearly, this 'connectivity' may transcend many scales (e.g. molecular to social). However, here we focus on anatomical connections. Certain patterns of cortical projections are so common that they could amount to rules of cortical connectivity. 'These rules revolve around one, apparently, overriding strategy that the cerebral cortex uses—that of functional segregation' (Zeki 1990). Functional segregation demands that cells with common functional properties be grouped together. There are many examples of this grouping (e.g. laminar selectivity, ocular dominance bands and orientation domains in V1). This architectural constraint necessitates both convergence and divergence of cortical connections. Extrinsic connections, between cortical regions, are not continuous but occur in patches or clusters. In some instances, this patchiness has a clear relationship to functional segregation. For example, the secondary visual area V2 has a distinctive cytochrome oxidase architecture, consisting of thick stripes, thin stripes and inter-stripes. When recordings are made in V2, directionally selective (but not wavelength or colour-selective) cells are found exclusively in the thick stripes. Retrograde (i.e. backward) labelling of cells in V5 is limited to these thick stripes. All the available physiological evidence suggests that V5 is a functionally homogeneous area that is specialized for visual motion. Evidence of this nature supports the idea that patchy connectivity is the anatomical infrastructure that underpins functional segregation and specialization.

(c) The anatomy and physiology of cortico-cortical connections

If specialization depends upon connections, then important organizational principles should be embodied in their anatomy and physiology. Extrinsic connections couple different cortical areas, whereas intrinsic connections are confined to the cortical sheet. There are certain features of cortico-cortical connections that provide strong clues about their functional role. In brief, there appears to be a hierarchical organization that rests upon the distinction between forward and backward connections (Maunsell & Van Essen 1983). The designation of a connection as forward or backward depends primarily on its cortical layers of origin and termination. The important characteristics of cortico-cortical connections are listed below. This list is not exhaustive but serves to introduce some principles that have emerged from empirical studies of visual cortex.

(i) Hierarchical organization

The organization of the visual cortices can be considered as a hierarchy of cortical levels with reciprocal cortico-cortical connections among the constituent cortical areas (Maunsell & Van Essen 1983; Felleman & Van Essen 1991). Forward connections run from lower to higher areas and backward connections from higher to lower. Lateral connections connect regions within a hierarchical level. The notion of a hierarchy depends upon a distinction between extrinsic forward and backward connections (see figure 1).

(ii) Reciprocal connections

Although reciprocal, forward and backward connections show a microstructural and functional asymmetry, and the terminations of both show laminar specificity. Forward connections (from a low to a high level) have sparse axonal bifurcations and are topographically organized, originating in supragranular layers and terminating largely in layer 4. On the other hand, backward connections show abundant axonal bifurcation and a more diffuse topography, although they can be patchy (Angelucci et al. 2002a,b). Their origins are bilaminar/infragranular and they terminate predominantly in agranular layers (Rockland & Pandya 1979; Salin & Bullier 1995). An important distinction is that backward connections are more divergent. For example, the divergence region of a point in V5 (i.e. the region receiving backward afferents from V5) may include thick and inter-stripes in V2, whereas its convergence region (i.e. the region providing forward afferents to V5) is limited to the thick stripes (Zeki & Shipp 1988). Furthermore, backward connections are more abundant. For example, the ratio of forward efferent connections to backward afferents in the lateral geniculate is about 1 : 10. Another distinction is that backward connections traverse a number of hierarchical levels, whereas forward connections are more restricted. For example, there are backward connections from TE and TEO to V1 but no monosynaptic connections from V1 to TE or TEO (Salin & Bullier 1995).

Figure 1. Schematic illustrating hierarchical structures in the brain and the distinction between forward, backward and lateral connections. This schematic is inspired by Mesulam's (1998) notion of sensory-fugal processing over 'a core synaptic hierarchy, which includes the primary sensory, upstream unimodal, downstream unimodal, heteromodal, paralimbic and limbic zones of the cerebral cortex' (see Mesulam 1998 for more details).

(iii) Functionally asymmetric forward and backward connections

Functionally, reversible inactivation studies (e.g. Sandell & Schiller 1982; Girard & Bullier 1989) and neuroimaging (e.g. Buchel & Friston 1997) suggest that forward connections are driving and always elicit a response, whereas backward connections can be modulatory. In this context, modulatory means backward connections modulate responsiveness to other inputs. At the single cell level, 'inputs from drivers can be differentiated from those of modulators. The driver can be identified as the transmitter of RF properties; the modulator can be identified as altering the probability of certain aspects of that transmission' (Sherman & Guillery 1998).
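The driver/modulator distinction lends itself to a one-line caricature. The multiplicative gain model below is an assumption for illustration, not a quantitative model from the physiology cited: a driving input determines the response, while a modulatory input scales responsiveness without eliciting a response on its own.

```python
# Toy gain model of driving versus modulatory inputs (illustrative assumption):
# the modulator multiplies the driver's effect but cannot drive a response alone.

def response(driving: float, modulatory: float, gain: float = 1.0) -> float:
    return (gain + modulatory) * driving

print(response(driving=2.0, modulatory=0.0))   # → 2.0: the driver alone responds
print(response(driving=0.0, modulatory=3.0))   # → 0.0: the modulator alone does not
print(response(driving=2.0, modulatory=3.0))   # → 8.0: modulation of the driven response
```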

The notion that forward connections are concerned with the promulgation and segregation of sensory information is consistent with: (i) their sparse axonal bifurcation; (ii) patchy axonal terminations; and (iii) topographic projections. In contrast, backward connections are considered to have a role in mediating contextual effects and in the co-ordination of processing channels. This is consistent with: (i) their frequent bifurcation; (ii) diffuse axonal terminations; and (iii) more divergent topography (Salin & Bullier 1995; Crick & Koch 1998). Forward connections mediate their post-synaptic effects through fast AMPA (1.3–2.4 ms decay) and GABAA (6 ms decay) receptors. Modulatory effects can be mediated by NMDA receptors. NMDA receptors are voltage-sensitive, showing nonlinear and slow dynamics (approximately 50 ms decay). They are found predominantly in supragranular layers where backward connections terminate (Salin & Bullier 1995). These slow time constants again point to a role in mediating contextual effects that are more enduring than phasic sensory-evoked responses.

The clearest evidence for the modulatory role of backward connections (that is mediated by 'slow' glutamate receptors) comes from corticogeniculate connections. In the cat LGN, cortical feedback is partly mediated by type 1 metabotropic glutamate receptors, which are located exclusively on distal segments of the relay-cell dendrites. Rivadulla et al. (2002) have shown that these backward afferents enhance the excitatory centre of the thalamic RF. 'Therefore, cortex, by closing this corticofugal loop, is able to increase the gain of its thalamic input within a focal spatial window, selecting key features of the incoming signal.'

Angelucci et al. (2002a,b) used a combination of anatomical and physiological recording methods to determine the spatial scale of intra-areal V1 horizontal connections and inter-areal backward connections to V1. 'Contrary to common beliefs, these (monosynaptic horizontal) connections cannot fully account for the dimensions of the surround field (of macaque V1 neurons). The spatial scale of feedback circuits from extrastriate cortex to V1 is, instead, commensurate with the full spatial range of centre-surround interactions. Thus these connections could represent an anatomical substrate for contextual modulation and global-to-local integration of visual signals.'

It should be noted that the hierarchical ordering of areas is a matter of debate and may be indeterminate. Based on computational neuroanatomic studies, Hilgetag et al. (2000) conclude that the laminar hierarchical constraints presently available in the anatomical literature are 'insufficient to constrain a unique ordering' for any of the sensory systems analysed. However, basic hierarchical principles were evident. Indeed, the authors note, 'All the cortical systems we studied displayed a significant degree of hierarchical organization', with the visual and somatomotor systems showing an organization that was 'surprisingly strictly hierarchical'.

In the post-developmental period, synaptic plasticity is an important functional attribute of connections in the brain and is thought to subserve perceptual and procedural learning and memory. This is a large and fascinating field that ranges from molecules to maps (e.g. Buonomano & Merzenich 1998; Martin et al. 2000). Changing the strength of connections between neurons is widely assumed to be the mechanism by which memory traces are encoded and stored in the central nervous system. In its most general form, the synaptic plasticity and memory hypothesis states that 'Activity-dependent synaptic plasticity is induced at appropriate synapses during memory formation and is both necessary and sufficient for the information storage underlying the type of memory mediated by the brain area in which that plasticity is observed' (see Martin et al. 2000 for an evaluation of this hypothesis). A key aspect of this plasticity is that it is generally associative.

(iv) Associative plasticity

Synaptic plasticity may be transient (e.g. short-term potentiation or depression) or enduring (e.g. long-term potentiation or depression) with many different time constants. In contrast to short-term plasticity, long-term changes rely on protein synthesis, synaptic remodelling and infrastructural changes in cell processes (e.g. terminal arbours or dendritic spines) that are mediated by calcium-dependent mechanisms. An important aspect of NMDA receptors, in the induction of long-term potentiation, is that they confer associativity on changes in connection strength. This is because their voltage-sensitivity allows calcium ions to enter the cell when, and only when, there is conjoint pre-synaptic release of glutamate and sufficient post-synaptic depolarization (i.e. the temporal association of pre- and post-synaptic events). Calcium entry renders the post-synaptic specialization eligible for future potentiation by promoting the formation of synaptic 'tags' (e.g. Frey & Morris 1997) and other calcium-dependent intracellular mechanisms.
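The associativity conferred by the NMDA voltage gate can be caricatured as follows. All quantities here are arbitrary illustrative assumptions (rate-based activities, a hard depolarization threshold standing in for the voltage-dependent magnesium block): the weight is potentiated only when pre-synaptic input and post-synaptic depolarization coincide.

```python
# Sketch of associative plasticity with an NMDA-like gate (toy numbers):
# potentiation requires the conjunction of pre-synaptic release and
# sufficient post-synaptic depolarization.

def plasticity(pre, post, w=0.1, lr=0.05, threshold=0.5):
    """Update one synaptic weight over paired pre-/post-synaptic traces."""
    for p, q in zip(pre, post):
        if p > 0 and q > threshold:   # voltage gate open: calcium can enter
            w += lr * p * q           # associative potentiation
        # sub-threshold depolarization: gate closed, no change
    return w

pre = [1.0, 1.0, 1.0, 1.0]              # pre-synaptic activity
post_paired = [0.9, 0.8, 0.9, 0.7]      # depolarized when input arrives
post_unpaired = [0.1, 0.2, 0.1, 0.2]    # input without depolarization

print(round(plasticity(pre, post_paired), 3))    # → 0.265: potentiated
print(round(plasticity(pre, post_unpaired), 3))  # → 0.1: unchanged
```

Only the temporal conjunction of pre- and post-synaptic events changes the weight, which is the operational meaning of associativity in the text above.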

In summary, the anatomy and physiology of cortico-cortical connections suggest that forward connections are driving and commit cells to a prespecified response, given the appropriate pattern of inputs. Backward connections, on the other hand, are less topographic and are in a position to modulate the responses of lower areas. Modulatory effects imply that the post-synaptic response evoked by pre-synaptic input is modulated by, or interacts in a nonlinear way with, another input. This interaction depends on nonlinear synaptic or dendritic mechanisms. Finally, brain connections are not static but are changing at the synaptic level all the time. In many instances, this plasticity is associative. In §3, we describe a theoretical perspective, provided by generative models, that highlights the functional importance of hierarchies, backward connections, nonlinear coupling and associative plasticity.

3. REPRESENTATIONAL INFERENCE AND LEARNING

This section introduces learning and inference based on empirical Bayes. A more detailed discussion can be found in Friston (2002, 2003). We will introduce the notion of generative models and a generic scheme for their estimation. This scheme uses expectation maximization (EM; an iterative scheme that estimates conditional expectations and maximum likelihoods of model parameters, in an E- and M-step, respectively). We show that predictive coding can be used to implement EM and, in the context of hierarchical generative models, is sufficient to implement empirical Bayesian inference.
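The division of labour between the E-step (inference of causes) and the M-step (learning of parameters) can be illustrated with a drastically simplified stand-in: alternating least-squares on a noiseless linear model. This is an assumption for illustration only, not the EM scheme derived in Friston (2002, 2003); it merely shows perception (estimating the causes with the parameters fixed) and learning (estimating the parameters with the causes fixed) solving the same problem by turns.

```python
import numpy as np

# Alternating inference/learning on a toy linear generative model U = theta @ V
# (a simplified stand-in for EM; sizes and seeds are arbitrary assumptions).

rng = np.random.default_rng(1)
theta_true = rng.normal(size=(4, 2))    # the process actually generating data
V = rng.normal(size=(2, 50))            # hidden causes for 50 stimuli
U = theta_true @ V                      # sensory data over all stimuli

theta = rng.normal(size=(4, 2))         # the brain's initial model
for _ in range(10):
    V_hat = np.linalg.pinv(theta) @ U   # "E-step": infer causes, model fixed
    theta = U @ np.linalg.pinv(V_hat)   # "M-step": learn parameters, averaged
                                        # over all stimuli encountered

# Residual of the learned model: how much of the data it fails to explain
error = np.linalg.norm(U - theta @ np.linalg.pinv(theta) @ U)
print(error < 1e-8)                     # the learned model explains the data
```

Note that `theta` is only identified up to an invertible mixing of the causes, which is why the check is on the reconstruction of the data rather than on recovery of `theta_true` itself.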

(a) Causes and representations

Here, a representation is taken to be a neuronal response that represents some 'cause' in the sensorium. Causes are simply the states of processes generating sensory data. It is not easy to ascribe meaning to these states without appealing to the way that we categorize things, either perceptually or conceptually. Causes may be categorical in nature, such as the identity of a face or the semantic category to which an object belongs. Others may be parametric, such as the position of an object. Even though causes may be difficult to describe, they are easy to define operationally. Causes are quantities or states that are necessary to specify the products of a process generating sensory information. For the sake of simplicity, let us frame the problem of representing causes in terms of a deterministic nonlinear function

u = g(v, θ),  (3.1)

where v is a vector (i.e. a list) of underlying causes in the environment (e.g. the velocity of a particular object, the direction of radiant light, etc.), and u represents sensory input; g(v, θ) is a function that generates inputs from the causes; θ represents the parameters of the generative model. Unlike the causes, the parameters are fixed quantities that have to be learned. We shall see later that the parameters correspond to connection strengths in the brain's model of how inputs are caused. Nonlinearities in equation (3.1) represent interactions among the causes. These can often be viewed as contextual effects, where the expression of a particular cause depends on the context established by another. A ubiquitous example from early visual processing is the occlusion of one object by another. In a linear world, the visual sensation caused by two objects would be a transparent overlay or superposition. Occlusion is a nonlinear phenomenon because the sensory input from one object (the occluded) interacts with, or depends on, the other (the occluder). This interaction is an example of nonlinear mixing of causes to produce sensory data. At a cognitive level, the cause associated with the word 'hammer' will depend on the semantic context (which determines whether the word is a verb or a noun).

The problem the brain has to contend with is to find a function of the inputs that recognizes the underlying causes. To do this, the brain must effectively undo the interactions to disclose contextually invariant causes. In other words, the brain must perform a nonlinear unmixing of causes and context. The key point here is that the nonlinear mixing may not be invertible and that the estimation of causes from input may be fundamentally ill-posed. For example, no amount of unmixing can discern the parts of an object that are occluded by another. The corresponding indeterminacy in probabilistic learning rests on the combinatorial explosion of ways in which stochastic generative models can generate input patterns (Dayan et al. 1995). In what follows, we consider the implications of this problem. Put simply, recognition of causes from sensory data is the inverse of generating data from causes. If the generative model is not invertible, then recognition can only proceed if there is an explicit generative model in the brain. This speaks to the importance of backward connections that may embody this model.
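The noninvertibility of occlusion can be illustrated with a toy numerical sketch. The `occlude` function and the pixel vectors below are hypothetical, purely for illustration:

```python
import numpy as np

def occlude(front, back):
    """Toy nonlinear generative function: the front object occludes the
    back one wherever the front is 'opaque' (non-zero)."""
    return np.where(front > 0, front, back)

# Two different pairs of causes...
a = occlude(np.array([1, 1, 0, 0]), np.array([2, 2, 2, 2]))
b = occlude(np.array([1, 1, 0, 0]), np.array([3, 3, 2, 2]))

# ...produce identical sensory input: the occluded parts of the back
# object cannot be recovered, so the mapping from causes to data is
# many-to-one and has no inverse.
print(np.array_equal(a, b))
```

Because distinct causes map to the same input, no function of the input alone can recover the causes; prior knowledge must supply the missing information.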

(b) Generative models and representational learning

This section introduces the basic framework within which one can understand learning and inference. This framework rests upon generative and recognition models, which are simply functions that map causes to sensory input and vice versa. Generative models afford a generic formulation of representational learning and inference in a supervised or self-supervised context. There are many forms of generative models, ranging from conventional statistical models (e.g. factor and cluster analysis) to those motivated by Bayesian inference and learning (e.g. Dayan et al. 1995; Hinton et al. 1995). The goal of generative models is 'to learn representations that are economical to describe but allow the input to be reconstructed accurately' (Hinton et al. 1995). The distinction between reconstructing inputs and learning efficient representations relates directly to the distinction between inference and learning.

Phil. Trans. R. Soc. B (2005)

(i) Inference versus learning

Generative models relate unknown causes v and unknown parameters θ to observed inputs u. The objective is to make inferences about the causes and learn the parameters. Inference may be simply estimating the most likely cause, and is based on estimates of the parameters from learning. A generative model is specified in terms of a prior distribution over the causes p(v; θ) and the generative distribution or likelihood of the inputs given the causes p(u|v; θ). Together, these define the marginal distribution of inputs implied by a generative model

p(u; θ) = ∫ p(u|v; θ) p(v; θ) dv.  (3.2)

The conditional density of the causes, given the inputs, is given by the recognition model, which is defined in terms of the recognition distribution

p(v|u; θ) = p(u|v; θ) p(v; θ) / p(u; θ).  (3.3)

However, as considered above, the generative model may not be inverted easily and it may not be possible to parameterize this recognition distribution. This is crucial because the endpoint of learning is the acquisition of a useful recognition model that can be applied to sensory inputs by the brain. One solution is to posit an approximate recognition or conditional density q(v; u) that is consistent with the generative model and that can be parameterized. Estimating the moments (e.g. the expectation) of this density corresponds to inference. Estimating the parameters of the underlying generative model corresponds to learning. This distinction maps directly onto the two steps of EM.
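For intuition, equations (3.2) and (3.3) can be checked numerically for a discrete cause. The prior and likelihood values below are purely illustrative:

```python
import numpy as np

# Discrete toy: two candidate causes v ∈ {0, 1}, with a prior p(v; θ) and
# a likelihood p(u|v; θ) for one observed input u (numbers illustrative).
prior = np.array([0.7, 0.3])        # p(v; θ)
likelihood = np.array([0.2, 0.9])   # p(u|v; θ) for the observed u

# Equation (3.2): the marginal probability of the input (a sum replaces
# the integral for a discrete cause).
evidence = np.sum(likelihood * prior)

# Equation (3.3): the recognition distribution via Bayes rule.
posterior = likelihood * prior / evidence

print(posterior)
```

Even though the prior favours the first cause, the input shifts the recognition density towards the second; this is exactly the computation that becomes intractable when v is high-dimensional and g is nonlinear, motivating the approximate density q(v; u).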

(c) Expectation maximization

Here, we introduce a general scheme for inference and learning using EM (Dempster et al. 1977). To keep things simple, we will assume that we are only interested in the first moment or expectation of q(v; u), which we will denote by φ. This is the conditional mean or expected cause. EM is a coordinate ascent scheme that comprises an E-step and an M-step. In the present context, the E-step entails finding the conditional expectation of the causes (i.e. inference), while the M-step identifies the maximum likelihood value of the parameters (i.e. learning). Critically, both adjust the conditional causes and parameters to maximize the same objective function.

(i) The free energy formulation

EM provides a useful procedure for density estimation that has direct connections with statistical mechanics. Both steps of the EM algorithm involve maximizing a function of the densities above that corresponds to the negative free energy in physics.

F = ⟨L⟩_u
L = ln p(u; θ) − KL{q(v; u), p(v|u; θ)}.  (3.4)

This objective function has two terms. The first is the likelihood of the inputs under the generative model. The second term is the Kullback–Leibler divergence¹ between the approximate and true recognition densities. Critically, the second term is always positive, rendering F a lower bound on the expected log likelihood of the inputs. This means maximizing the objective function (i.e. minimizing the free energy) is simply minimizing our surprise about the data. The E-step increases F with respect to the expected cause φ, ensuring a good approximation to the recognition distribution implied by the parameters θ. This is inference. The M-step changes θ, enabling the generative model to match the input density, and corresponds to learning.

Inference (E):  φ = max_φ F
Learning (M):  θ = max_θ F.  (3.5)

EM enables exact and approximate maximum likelihood density estimation for a whole variety of generative models that can be specified in terms of prior and generative distributions. Dayan & Abbott (2001) work through a series of didactic examples, from cluster analysis to independent component analysis, within this unifying framework. From a neurobiological perspective, the remarkable thing about this formalism is that both inference and learning are driven in exactly the same way, namely to minimize the free energy. This is effectively the same as minimizing surprise about the sensory inputs encountered. As we will see below, the implication is that the same simple principle can explain phenomena as wide-ranging as the MMN in evoked electrical brain responses and Hebbian plasticity during perceptual learning.
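The bound in equation (3.4) can be verified numerically for a discrete cause. The sketch below, with illustrative numbers, shows that L equals the log evidence exactly when the approximate density q matches the true recognition density, and falls below it for any other q:

```python
import numpy as np

def objective(q, lik, prior):
    """L = ln p(u; θ) - KL{q, p(v|u; θ)} for a discrete cause (equation 3.4)."""
    evidence = np.sum(lik * prior)
    post = lik * prior / evidence          # true recognition density
    kl = np.sum(q * np.log(q / post))      # Kullback-Leibler divergence
    return np.log(evidence) - kl

lik = np.array([0.2, 0.9])     # p(u|v; θ) for the observed u (illustrative)
prior = np.array([0.7, 0.3])   # p(v; θ)
post = lik * prior / np.sum(lik * prior)

# The bound is tight when q is the true recognition density...
print(objective(post, lik, prior), np.log(np.sum(lik * prior)))
# ...and strictly lower for any other q (here, a flat density).
print(objective(np.array([0.5, 0.5]), lik, prior))
```

Maximizing L with respect to q (the E-step) therefore drives q towards the true recognition density, while maximizing the same quantity with respect to the parameters (the M-step) pushes the log evidence itself upwards.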

(d) Predictive coding

In §3(c), we established an objective function that is maximized to enable inference and learning in the E- and M-steps, respectively. In this section, we consider how that maximization might be implemented. In particular, we will look at predictive coding, which is based on minimizing prediction error. Prediction error is the difference between the input observed and that predicted by the generative model and inferred causes. We will see that minimizing the free energy is equivalent to minimizing prediction error. Consider any static nonlinear generative model under Gaussian assumptions

u = g(v, θ) + ε_u
v = v_p + ε_p,  (3.6)

where Cov{ε_u} = Σ_u is the covariance of any random or stochastic part of the generative process. Priors on the causes are specified in terms of their expectation v_p and covariance Cov{ε_p} = Σ_p. This form will be useful in the next section when we generalize to hierarchical models. For simplicity, we will approximate the recognition density with a point mass. From equation (3.4),

L = −½ ξ_u^T ξ_u − ½ ξ_p^T ξ_p − ½ ln|Σ_u| − ½ ln|Σ_p|
ξ_u = Σ_u^(−1/2) (u − g(φ, θ))
ξ_p = Σ_p^(−1/2) (φ − v_p).  (3.7)

The first term in equation (3.7) is the prediction error that is minimized in predictive coding. The second corresponds to a prior term that constrains or regularizes conditional estimates of the causes. The need for this term stems from the ambiguous or ill-posed nature of recognition discussed above and is a ubiquitous component of inverse solutions.
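A scalar sketch makes this concrete. Taking a linear g(v, θ) = θv, with all values below invented for illustration, maximizing L in equation (3.7) yields a regularized estimate pulled between the maximum likelihood value and the prior expectation:

```python
import numpy as np

# Scalar instance of equation (3.7) with a linear g(v, θ) = θ·v.
theta, u = 2.0, 3.0
v_p, S_u, S_p = 1.0, 0.5, 1.0   # prior mean and (co)variances, illustrative

def L(phi):
    xi_u = (u - theta * phi) / np.sqrt(S_u)   # precision-weighted prediction error
    xi_p = (phi - v_p) / np.sqrt(S_p)         # deviation from the prior
    return -0.5 * xi_u**2 - 0.5 * xi_p**2 - 0.5 * np.log(S_u) - 0.5 * np.log(S_p)

# Locate the maximizer on a fine grid.
phis = np.linspace(-2, 4, 100001)
phi_star = phis[np.argmax(L(phis))]
print(phi_star)
```

With these values the maximum lies at 13/9 ≈ 1.44, between the maximum likelihood estimate u/θ = 1.5 and the prior mean v_p = 1.0: the prior term regularizes the otherwise ill-posed inversion.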

Predictive coding schemes can be regarded as arising from the distinction between forward and inverse models adopted in machine vision (Ballard et al. 1983; Kawato et al. 1993). Forward models generate inputs from causes (cf. generative models), whereas inverse models approximate the reverse transformation of inputs to causes (cf. recognition models). This distinction embraces the noninvertibility of generating processes and the ill-posed nature of inverse problems. As with all underdetermined inverse problems, the role of constraints is central. In the inverse literature, a priori constraints usually enter in terms of regularized solutions. For example, 'Descriptions of physical properties of visible surfaces, such as their distance and the presence of edges, must be recovered from the primary image data. Computational vision aims to understand how such descriptions can be obtained from inherently ambiguous and noisy data. A recent development in this field sees early vision as a set of ill-posed problems, which can be solved by the use of regularization methods' (Poggio et al. 1985). The architectures that emerge from these schemes suggest that 'Feedforward connections from the lower visual cortical area to the higher visual cortical area provides an approximated inverse model of the imaging process (optics)'. Conversely, '…the backprojection connection from the higher area to the lower area provides a forward model of the optics' (Kawato et al. 1993; see also Harth et al. 1987). This perspective highlights the importance of backward connections and the role of priors in enabling predictive coding schemes.

(i) Predictive coding and Bayes

Predictive coding is a strategy that has some compelling (Bayesian) underpinnings. To finesse the inverse problem posed by noninvertible generative models, constraints or priors are required. These resolve the ill-posed problems that confound recognition based on purely forward architectures. It has long been assumed that sensory units adapt to the statistical properties of the signals to which they are exposed (see Simoncelli & Olshausen 2001 for a review). In fact, the Bayesian framework for perceptual inference has its origins in Helmholtz's notion of perception as unconscious inference. Helmholtz realized that retinal images are ambiguous and that prior knowledge was required to account for perception (Kersten et al. 2004). Kersten et al. (2004) provide an excellent review of object perception as Bayesian inference and ask a fundamental question: 'Where do the priors come from? Without direct input, how does image-independent knowledge of the world get put into the visual system?' In §3(e), we answer this question and show how empirical Bayes allows priors to be learned and induced online during inference.

(e) Cortical hierarchies and empirical Bayes

The problem with fully Bayesian inference is that the brain cannot construct the prior expectation and variability, v_p and Σ_p, de novo. They have to be learned and also adapted to the current experiential context. This is a solved problem in statistics and calls for empirical Bayes, in which priors are estimated from data. Empirical Bayes harnesses the hierarchical structure of a generative model, treating the estimates at one level as priors on the subordinate level (Efron & Morris 1973). This provides a natural framework within which to treat cortical hierarchies in the brain, each level providing constraints on the level below. This approach models the world as a hierarchy of systems where supraordinate causes induce and moderate changes in subordinate causes. These priors offer contextual guidance towards the most likely cause of the input. Note that predictions at higher levels are subject to the same constraints; only the highest level, if there is one in the brain, is unconstrained. If the brain has evolved to recapitulate the causal structure of its environment, in terms of its sensory infrastructures, it is possible that our visual cortices reflect the hierarchical causal structure of our environment.

Next, we introduce hierarchical models and extend the parameterization of the ensuing generative model to cover priors. This means that the constraints, required by predictive coding and regularized solutions to inverse problems, are now absorbed into the learning scheme and are estimated in exactly the same way as the parameters. These extra parameters encode the variability or precision of the causes and are referred to as hyperparameters in the classical covariance component literature. Hyperparameters are updated in the M-step and are treated in exactly the same way as the parameters.

(i) Hierarchical models

Consider any level i in a hierarchy whose causes v_i are elicited by causes in the level above, v_{i+1}. The hierarchical form of the generative model is

u = g_1(v_2, θ_1) + ε_1
v_2 = g_2(v_3, θ_2) + ε_2
v_3 = …,  (3.8)

with u = v_1 (cf. equation (3.6)). Technically, these models fall into the class of conditionally independent hierarchical models when the stochastic terms are independent (Kass & Steffey 1989). These models are also called parametric empirical Bayes (PEB) models because the obvious interpretation of the higher-level densities as priors led to the development of PEB methodology (Efron & Morris 1973). Often, in statistics, these hierarchical models comprise just two levels, which is a useful way to specify simple shrinkage priors on the parameters of single-level models. We will assume the stochastic terms are Gaussian with covariance Σ_i = Σ(λ_i). Therefore, v_{i+1}, θ_i and λ_i parameterize the means and covariances of the likelihood at each level.

p(v_i | v_{i+1}; θ) = N(g_i(v_{i+1}, θ_i), Σ_i).  (3.9)

This likelihood also plays the role of a prior on v_i at the level below, where it is jointly maximized with the likelihood p(v_{i−1} | v_i; θ). This is the key to understanding the utility of hierarchical models. By learning the parameters of the generative distribution of level i, one is implicitly learning the parameters of the prior distribution for level i−1. This enables the learning of prior densities.

The hierarchical nature of these models lends an important context-sensitivity to recognition densities not found in single-level models. The key point here is that high-level causes v_{i+1} determine the prior expectation of causes v_i in the subordinate level. This can completely change the distributions p(v_i | v_{i+1}; θ), upon which inference is based, in an input- and context-dependent way.
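This context-sensitivity is easy to demonstrate by sampling from a hypothetical two-level linear hierarchy (cf. equation (3.8)); all parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(v3, theta2=1.5, theta1=0.8, s2=0.1, s1=0.1):
    """Generate (v2, u) from a two-level linear hierarchy: the high-level
    cause v3 sets the prior expectation of v2, which generates u."""
    v2 = theta2 * v3 + rng.normal(0, np.sqrt(s2))   # empirical prior on v2
    u = theta1 * v2 + rng.normal(0, np.sqrt(s1))    # sensory input
    return v2, u

# Changing the supraordinate cause shifts the whole distribution of v2
# (and hence of u): the prior at one level is supplied by the level above.
samples_low = np.array([sample(0.0)[1] for _ in range(2000)])
samples_high = np.array([sample(2.0)[1] for _ in range(2000)])
print(samples_low.mean(), samples_high.mean())
```

A recognition scheme that conditioned only on u, with a fixed prior on v_2, could not exploit this shift; conditioning on the inferred v_3 supplies the context-dependent empirical prior.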

(ii) Implementation

The biological plausibility of empirical Bayes in the brain can be established fairly simply. To do this, a hierarchical scheme is described in some detail. A more thorough account, including simulations of various neurobiological and psychophysical phenomena, will appear in future publications. For the moment, we will address neuronal implementation at a purely theoretical level, using the framework above.

For simplicity, we will again assume deterministic recognition. In this setting, with conditional independence, the objective function is

L = −½ ξ_1^T ξ_1 − ½ ξ_2^T ξ_2 − … − ½ ln|Σ_1| − ½ ln|Σ_2| − …
ξ_i = φ_i − g_i(φ_{i+1}, θ_i) − λ_i ξ_i
    = (1 + λ_i)^(−1) (φ_i − g_i(φ_{i+1}, θ_i))  (3.10)

(cf. equation (3.7)). Here, Σ_i^(1/2) = 1 + λ_i. In neuronal models, the prediction error is encoded by the activities of units denoted by ξ_i. These error units receive a prediction from units in the level above² via backward connections and lateral influences from the representational units φ_i being predicted. Horizontal interactions among the error units serve to decorrelate them (cf. Foldiak 1990), where the symmetric lateral connection strengths λ_i hyperparameterize the covariances of the errors Σ_i, which are the prior covariances for level i−1.


The estimators φ_i and the parameters perform a gradient ascent on the objective function

E:  φ̇_{i+1} = ∂F/∂φ_{i+1} = −(∂ξ_i^T/∂φ_{i+1}) ξ_i − (∂ξ_{i+1}^T/∂φ_{i+1}) ξ_{i+1}

M:  θ̇_i = ∂F/∂θ_i = −⟨(∂ξ_i^T/∂θ_i) ξ_i⟩_u  (3.11)

    λ̇_i = ∂F/∂λ_i = −⟨(∂ξ_i^T/∂λ_i) ξ_i⟩_u − (1 + λ_i)^(−1).

Inferences mediated by the E-step rest on changes in the representational units, mediated by forward connections from error units in the level below and lateral interaction with error units within the same level. Similarly, prediction error is constructed by comparing the activity of representational units, within the same level, to their predicted activity conveyed by backward connections.

This is the simplest version of a very general learning algorithm. It is general in the sense that it does not require the parameters of either the generative or the prior distributions to be specified in advance. It can learn noninvertible, nonlinear generative models and encompasses complicated hierarchical processes. Furthermore, each of the learning components has a relatively simple neuronal interpretation (see below).
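A minimal simulation shows the E-step of equation (3.11) at work for a single cause and a linear generative model. The parameter and input values are illustrative, and λ is set to zero so that the prior covariances are fixed at unity:

```python
import numpy as np

# Two-level linear model: u = θ1·φ2 + ε1, with prior φ2 = v_p + ε2.
theta1, u, v_p = 2.0, 3.0, 1.0
l1, l2 = 0.0, 0.0               # λ: lateral connections (unit covariances)

phi2 = 0.0                      # conditional expectation of the cause
for _ in range(2000):
    xi1 = (1 + l1) ** -1 * (u - theta1 * phi2)   # prediction error, level 1
    xi2 = (1 + l2) ** -1 * (phi2 - v_p)          # prediction error, level 2
    # Gradient ascent on L: forward influence of xi1, lateral influence of xi2.
    phi2 += 0.01 * (theta1 * xi1 - xi2)

# The dynamics settle at the posterior mode (θ1·u + v_p)/(θ1² + 1) = 1.4,
# balancing the sensory evidence against the prior.
print(phi2)
```

Note that the representational unit never sees the input directly; it is driven only by the error units, exactly as the forward connections in the scheme above convey prediction error rather than raw input.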

4. IMPLICATIONS FOR CORTICAL INFRASTRUCTURE AND PLASTICITY

(a) Cortical connectivity

The scheme implied by equation (3.11) has four clear implications or predictions about the functional architectures required for its implementation. We now review these in relation to cortical organization in the brain. A schematic summarizing these points is provided in figure 2. In short, we arrive at exactly the same four points presented at the end of §2(c).

(i) Hierarchical organization

Hierarchical models enable empirical Bayesian estimation of prior densities and provide a plausible model for sensory inputs. Models that do not show conditional independence (e.g. those used by connectionist and infomax schemes) depend on prior constraints for unique inference and do not invoke a hierarchical cortical organization. The useful thing about the architecture in figure 2 is that the responses of units at the ith level, φ_i, depend only on the error at the current level and the immediately preceding level. Similarly, the error units ξ_i are only connected to representational units in the current level and the level above. This hierarchical organization follows from conditional independence and is important because it permits a biologically plausible implementation, where the connections driving inference run only between neighbouring levels.

(ii) Reciprocal connections

In the hierarchical scheme, the dynamics of representational units φ_{i+1} are subject to two, locally available, influences: a likelihood or recognition term mediated by forward afferents from the error units in the level below and an empirical prior conveyed by error units in the same level. Critically, the influences of the error units in both levels are mediated by linear connections with strengths that are exactly the same as the (negative) reciprocal connections from φ_{i+1} to ξ_i and ξ_{i+1}. Mathematically, from equation (3.11),

∂φ̇_{i+1}/∂ξ_i = −∂ξ_i^T/∂φ_{i+1}
∂φ̇_{i+1}/∂ξ_{i+1} = −∂ξ_{i+1}^T/∂φ_{i+1}.  (4.1)

Functionally, forward and lateral connections are reciprocated, where backward connections generate predictions of lower-level responses. Forward connections allow prediction error to drive representational units in supraordinate levels. Within each level, lateral connections mediate the influence of error units on the predicting units, and intrinsic connections λ_i among the error units decorrelate them, allowing competition among prior expectations with different precisions (precision is the inverse of variance). In short, lateral, forward and backward connections are all reciprocal, consistent with anatomical observations.

(iii) Functionally asymmetric forward and backward connections

Although the connections are reciprocal, the functional attributes of forward and backward influences are different. The top-down influences of units φ_{i+1} on error units in the lower level ξ_i instantiate the forward model ξ_i = φ_i − g_i(φ_{i+1}, θ_i) − λ_i ξ_i. These influences can be nonlinear, where each unit in the higher level may modulate or interact with the influence of others, according to the nonlinearities in g_i(φ_{i+1}, θ_i). In contrast, the bottom-up influences of units in lower levels do not interact when producing changes at the higher level because, according to equation (3.11), their effects are linearly separable. This is a key observation because the empirical evidence, reviewed in the previous section, suggests that backward connections are in a position to interact (e.g. through NMDA receptors expressed predominantly in supragranular layers in receipt of backward connections), whereas forward connections are not. In summary, nonlinearities in the way sensory inputs are produced necessitate nonlinear interactions in the generative model that are mediated by backward connections but do not require forward connections to be modulatory.

(iv) Associative plasticity

Changes in the parameters correspond to plasticity in the sense that the parameters control the strength of backward and lateral connections. The backward connections parameterize the prior expectations and the lateral connections hyperparameterize the prior covariances. Together, they parameterize the Gaussian densities that constitute the priors (and likelihoods) of the model. The plasticity implied can be seen more clearly with an explicit model. For example, let


Figure 2. Upper panel: schematic depicting a hierarchical predictive coding architecture. Here, hierarchical arrangements within the model serve to provide predictions or priors to representations in the level below. The upper circles represent error units and the lower circles functional subpopulations encoding the conditional expectation of causes. These expectations change to minimize both the discrepancy with their predicted value and the mismatch incurred by their prediction of the level below. These two constraints correspond to prior and likelihood terms, respectively (see main text). Lower panel: a more detailed depiction of the influences on representational and error units.


g_i(v_{i+1}, θ_i) = θ_i v_{i+1}. In this instance,

θ̇_i = (1 + λ_i)^(−1) ⟨ξ_i φ_{i+1}^T⟩_u
λ̇_i = (1 + λ_i)^(−1) (⟨ξ_i ξ_i^T⟩_u − 1).  (4.2)

This is simply Hebbian or associative plasticity, where the connection strengths change in proportion to the product of pre- and postsynaptic activity, for example ⟨ξ_i φ_{i+1}^T⟩. An intuition about equation (4.2) is obtained by considering the conditions under which the expected change in parameters is zero (i.e. after learning). For the backward connections, this implies there is no component of prediction error that can be explained by estimates at the higher level: ⟨ξ_i φ_{i+1}^T⟩ = 0. The lateral connections stop changing when the prediction error has been whitened: ⟨ξ_i ξ_i^T⟩ = 1.

It is evident that the predictions of the theoretical analysis coincide almost exactly with the empirical aspects of functional architectures in visual cortices highlighted in §2(c) (hierarchical organization, reciprocity, functional asymmetry and associative plasticity). Although somewhat contrived, it is pleasing that purely theoretical considerations and neurobiological empiricism converge so precisely.
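The Hebbian character of equation (4.2) can be simulated for a single backward connection. This is a sketch with invented values, treating the higher-level estimates as given so that only the connection strength is learned:

```python
import numpy as np

rng = np.random.default_rng(1)

# One-level linear model u = θ_true·v + ε; learn θ from the Hebbian
# rule Δθ ∝ ⟨ξ φᵀ⟩, the product of error- and representational-unit
# activity (λ = 0, so the (1 + λ)⁻¹ factor is unity).
theta_true, s = 1.5, 0.05
theta = 0.0
for _ in range(500):
    v = rng.normal(size=200)                         # causes (taken as known)
    u = theta_true * v + rng.normal(0, np.sqrt(s), size=200)
    xi = u - theta * v                               # prediction error
    theta += 0.1 * np.mean(xi * v)                   # Hebbian update ⟨ξ φᵀ⟩

# Learning stops when no component of the error is explained by the
# higher-level estimate, i.e. ⟨ξ φᵀ⟩ ≈ 0, which happens at θ ≈ θ_true.
print(theta)
```

At convergence the residual error is just the observation noise, uncorrelated with the presynaptic activity, so the expected weight change vanishes, which is exactly the fixed-point condition ⟨ξ φᵀ⟩ = 0 discussed above.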

(b) Functional organization

In short, representational inference and learning lend themselves naturally to a hierarchical treatment, which considers the brain as an empirical Bayesian device. The dynamics of the units or populations are driven to minimize error at all levels of the cortical hierarchy and implicitly render themselves posterior modes (i.e. most likely values) of the causes, given the data. In contrast to supervised learning, hierarchical prediction does not require any desired output. Unlike information theoretic approaches, it does not assume independent causes. In contrast to regularized inverse solutions (e.g. in machine vision), it does not depend on a priori constraints; these emerge spontaneously as empirical priors from higher levels.


The overall scheme implied by equation (3.11) sits comfortably with the hypothesis (Mumford 1992) that:

on the role of the reciprocal, topographic pathways between two cortical areas, one often a 'higher' area dealing with more abstract information about the world, the other 'lower', dealing with more concrete data. The higher area attempts to fit its abstractions to the data it receives from lower areas by sending back to them from its deep pyramidal cells a template reconstruction best fitting the lower level view. The lower area attempts to reconcile the reconstruction of its view that it receives from higher areas with what it knows, sending back from its superficial pyramidal cells the features in its data which are not predicted by the higher area. The whole calculation is done with all areas working simultaneously, but with order imposed by synchronous activity in the various top-down, bottom-up loops.

(i) Backward or feedback connections?

There is something slightly counterintuitive about generative models in the brain. In this view, cortical hierarchies are trying to generate sensory data from high-level causes. This means the causal structure of the world is embodied in the backward connections. Forward connections simply provide feedback by conveying prediction error to higher levels. In short, forward connections are the feedback connections. This is why we have been careful not to ascribe a functional label like feedback to backward connections. Perceptual inference emerges from mutually informed top-down and bottom-up processes that enable sensation to constrain perception. This self-organizing process is distributed throughout the hierarchy. Similar perspectives have emerged in cognitive neuroscience on the basis of psychophysical findings. For example, reverse hierarchy theory distinguishes between early explicit perception and implicit low-level vision, where 'our initial conscious percept—vision at a glance—matches a high-level, generalized, categorical scene interpretation, identifying "forest before trees"' (Hochstein & Ahissar 2002).

(c) Dynamic models and prospective coding

Hitherto, we have framed things in terms of static hierarchical models. Dynamic models require a simple extension of equation (3.8) to include hidden states x_i that serve to remember past causes v_i of sensory inputs:

u = g_1(x_1, v_2, θ_1) + ε_1
ẋ_1 = f_1(x_1, v_2, θ_1)
v_2 = g_2(x_2, v_3, θ_2) + ε_2
ẋ_2 = f_2(x_2, v_3, θ_2)
v_3 = …  (4.3)
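A forward simulation illustrates the role of the hidden states in equation (4.3). This is an Euler-integration sketch of a one-level model with a leaky hidden state; the dynamics f_1 and all numerical values are hypothetical:

```python
import numpy as np

def simulate(v2, dt=0.01, steps=500, theta=0.9):
    """Euler integration of a one-level dynamic model: the hidden state x1
    leakily integrates the cause v2, and the input u reads out x1."""
    x1, us = 0.0, []
    for t in range(steps):
        u = theta * x1                 # u = g1(x1, v2; θ1), noise omitted
        x1 += dt * (v2[t] - x1)        # dx1/dt = f1(x1, v2; θ1): leaky integration
        us.append(u)
    return np.array(us)

# A brief pulse in the cause leaves a decaying trace in the hidden state,
# so the input outlasts the cause itself.
v = np.zeros(500)
v[50:60] = 1.0
u = simulate(v)
print(u[60], u[100])
```

Because the hidden state remembers past causes, recognizing the current input requires inference about a trajectory, not a single value, which is what motivates the prospective E-step below.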

In a subsequent paper, describing dynamic expectation maximization (DEM) for hierarchical dynamic models, we will show that it is necessary to minimize the prediction errors and their temporal derivatives. However, the form of the objective function and the ensuing E- and M-steps remains the same. This means the conclusions above hold in a dynamic setting, with some interesting generalizations.

When the generative model is dynamic (i.e. is effectively a convolution operator), the E-step subtending inference is more complicated and rests on a generalization of equation (3.11) to cover dynamic systems

E:  φ̇_i(t) = ∂L(t + τ)/∂φ_i = ∂L(t)/∂φ_i + τ ∂L̇(t)/∂φ_i + (τ²/2) ∂L̈(t)/∂φ_i + …  (4.4)

In DEM, the aim is not to find the most likely cause of the sensory input but to encode the evolution of causes in terms of conditional trajectories. For static models, equation (4.4) reduces to equation (3.11), which can be regarded as a special case when there are no dynamics and the trajectory becomes a single point. The dynamic version of the E-step is based on the objective function evaluated prospectively at some point τ in the future and can be understood as a gradient ascent on the objective function and all its higher derivatives (see the expansion in equation (4.4)). This prospective aspect of DEM lends it some interesting properties. Among these is the nature of the plasticity. Because the associative terms involve prospective prediction errors, synaptic changes occur when presynaptic activity is high and postsynaptic activity is increasing. This has an interesting connection with spike-timing-dependent plasticity (STDP), where increases in efficacy rely on postsynaptic responses occurring shortly after presynaptic inputs. In this instance, at peak presynaptic input, the postsynaptic response will be rising, to peak a short time later. The DEM scheme offers a principled explanation for this aspect of plasticity that can be related to other perspectives on its functional role. For example, Kepecs et al. (2002) note that the temporal asymmetry implicit in STDP may underlie learning and review some of the common themes from a range of findings in the framework of predictive coding (see also Fuhrmann et al. 2002).

Generative models of a dynamic sort confer a temporal continuity and prospective aspect on representational inference that is evident in empirical studies. As noted by Mehta (2001), 'a critical task of the nervous system is to learn causal relationships between stimuli to anticipate events in the future'. Both inference and learning about the states of the environment, in terms of trajectories, enable this anticipatory aspect. Mehta (2001) reviews findings from hippocampal electrophysiology, in which spatial RFs can show large and rapid anticipatory changes in their firing characteristics, which are discussed in the context of predictive coding (see also Rainer et al. 1999 for a discussion of prospective coding for objects in the context of delayed paired-associate tasks).


It would be premature to go into the details of DEM here. We anticipate communicating several articles covering the above themes in the near future. However, there are a number of complementary approaches to learning in the context of dynamic models that are already in the literature. For example, the seminal paper of Rao & Ballard (1999) uses Kalman filtering and a hierarchical hidden Markov model to provide a functional interpretation of many extra-classical RF effects (see below). Particularly relevant here is the discussion of hierarchical Bayesian inference in the visual cortex by Lee & Mumford (2003). These authors consider particle filtering and Bayesian-belief propagation (algorithms from machine learning) that might model the cortical computations implicit in hierarchical Bayesian inference, and review the neurophysiological evidence that supports their plausibility. Irrespective of the particular algorithm employed by the brain for empirical (i.e. hierarchical) Bayes, they each provide predictions about the physiology of cortical responses. These predictions are the subject of §5 below.

5. IMPLICATIONS FOR CORTICAL PHYSIOLOGY

The empirical Bayes perspective on perceptual inference suggests that the role of backward connections is to provide contextual guidance to lower levels through a prediction of the lower level's inputs. When this prediction is incomplete or incompatible with the lower area's input, a prediction error is generated that engenders changes in the area above until reconciliation. When (and only when) the bottom-up driving inputs are in accord with top-down predictions, error is suppressed and a consensus between the prediction and the actual input is established. Given this conceptual model, a stimulus-related response can be decomposed into two components, corresponding to the transients evoked in two functional subpopulations of units. The first, representational, subpopulation encodes the conditional expectation of perceptual causes φ. The second encodes prediction error ξ. Responses will be evoked in both, with the error units of one level exciting appropriate representational units through forward connections and the representational units suppressing error units through backward connections (see figure 2). As inference converges, high-level representations are expressed as the late component of evoked responses, with a concomitant suppression of error signal in lower areas.
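The reciprocal dynamics of error and representational units can be illustrated with a toy linear generative model. This is a sketch for intuition only, not the scheme of the paper: the weights W, the gradient step and the dimensions are all hypothetical.

```python
import numpy as np

# Toy two-population predictive coding loop: error units compute
# xi = u - W @ phi (input minus top-down prediction), and representational
# units phi are driven by the error through the 'forward' weights W.T
# until the backward prediction explains the input.

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # hypothetical generative (backward) weights
phi_true = np.array([1.0, -0.5])     # the 'true' cause of the stimulus
u = W @ phi_true                     # sensory input generated by that cause

phi = np.zeros(2)                    # conditional expectation of the cause
for _ in range(500):
    xi = u - W @ phi                 # error units: input minus prediction
    phi += 0.1 * W.T @ xi            # representational units: error-driven update

print(np.round(phi, 3))              # converges towards phi_true
```

As inference converges, the error ξ is suppressed while φ comes to encode the cause, mirroring the early error transient and late representational component described above.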

In short, within the model, activity in the cortical hierarchy self-organizes to minimize its free energy through minimizing prediction error. Is this sufficient to account for classical RFs and the functional segregation seen in cortical hierarchies, such as the visual system?

(a) Classical receptive fields

The answer to the above question is yes. We have shown previously that minimizing the free energy is equivalent to maximizing the mutual information between sensory inputs and neuronal activity encoding their underlying causes (Friston 2003). There have been many compelling developments in theoretical neurobiology that have used information theory

Phil. Trans. R. Soc. B (2005)

(e.g. Barlow 1961; Optican & Richmond 1987; Oja 1989; Foldiak 1990; Linsker 1990; Tovee et al. 1993; Tononi et al. 1994). Many appeal to the principle of maximum information transfer (e.g. Atick & Redlich 1990; Linsker 1990; Bell & Sejnowski 1995). This principle has proven extremely powerful in predicting many of the basic RF properties of cells involved in early visual processing (e.g. Atick & Redlich 1990; Olshausen & Field 1996). It represents a formal statement of the common-sense notion that neuronal dynamics in sensory systems should reflect, efficiently, what is going on in the environment (Barlow 1961).

There are many examples where minimizing the free energy produces very realistic RFs; a very compelling example can be found in Olshausen & Field (1996). An example from our own work, which goes beyond single RFs, concerns the selectivity profile of units in V2. In Friston (2000), we used the infomax principle (Linsker 1990) to optimize the spatio-temporal RFs of simple integrate-and-fire units exposed to moving natural scenes. We examined the response profiles in terms of selectivity to orientation, speed, direction and wavelength. The units showed two principal axes of selectivity. The first partitioned cells into those with wavelength selectivity and those without. Within the latter, the main axis was between units with direction selectivity and those without. This pattern of selectivity fits nicely with the characteristic response profiles of units in the thin, thick and interstripes of V2. See figure 3 for examples of the ensuing RFs, shown in the context of the functional architecture of visual processing pathways described in Zeki (1993).
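For intuition about how simple local rules can serve information transfer, one of the Hebbian schemes cited above, Oja's (1989) rule, can be sketched: a single linear unit converges on the direction of maximal input variance, i.e. the first principal component. The data and learning rate below are hypothetical, and this is far simpler than the spatio-temporal simulations described in the text.

```python
import numpy as np

# Oja's rule: Hebbian growth y*x with an implicit weight-decay term
# -y^2*w that normalizes the weight vector. The unit converges on the
# principal axis of its inputs (maximum variance direction).

rng = np.random.default_rng(1)
# toy inputs whose variance is concentrated along (1, 1)/sqrt(2)
x = rng.normal(size=(5000, 2)) * np.array([3.0, 0.3])
x = x @ (np.array([[1, 1], [-1, 1]]) / np.sqrt(2))

w = rng.normal(size=2)
for xi in x:
    y = w @ xi
    w += 0.001 * y * (xi - y * w)    # Hebbian term minus decay

w /= np.linalg.norm(w)
print(np.round(np.abs(w), 2))        # close to the principal axis (0.71, 0.71)
```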

(b) Extra-classical receptive fields

Classical models (e.g. classical RFs) assume that evoked responses will be expressed invariably in the same units or neuronal populations, irrespective of context. However, real neuronal responses are not invariant but depend upon the context in which they are evoked. For example, visual cortical units have dynamic RFs that can change from moment to moment. A useful synthesis that highlights the anatomical substrates of context-dependent responses can be found in Angelucci et al. (2002a,b). The key conclusion is that 'feedback from extrastriate cortex (possibly together with overlap or inter-digitation of coactive lateral connectional fields within V1) can provide a large and stimulus-specific surround modulatory field. The stimulus specificity of the interactions between the centre and surround fields may be due to the orderly, matching structure and different scales of intra-areal and feedback projection excitatory pathways.'

Extra-classical effects are commonplace and are generally understood in terms of the modulation of RF properties by backward and lateral afferents. There is clear evidence that horizontal connections in visual cortex are modulatory in nature (Hirsch & Gilbert 1991), speaking to an interaction between the functional segregation implicit in the columnar architecture of V1 and activity in remote populations.



These observations suggest that lateral and backward interactions may convey contextual information that shapes the responses of any neuron to its inputs (e.g. Kay & Phillips 1996; Phillips & Singer 1997), conferring the ability to make conditional inferences about sensory input.

The most detailed and compelling analysis of extra-classical effects in the context of hierarchical models and predictive coding is presented in Rao & Ballard (1999). These authors exposed a hierarchical network of model neurons, using a predictive coding scheme, to natural images. The neurons developed simple-cell-like RFs. In addition, a subpopulation of error units showed a variety of extra-classical RF effects, suggesting that 'non-classical surround effects in the visual cortex may also result from cortico-cortical feedback as a consequence of the visual system using an efficient hierarchical strategy for encoding natural images.' One non-classical feature on which the authors focus is end-stopping. Visual neurons that respond optimally to line segments of a particular length are abundant in supragranular layers and have the curious property of end-stopping or end-inhibition: vigorous responses to optimally oriented line segments are attenuated or eliminated when the line extends beyond the classical RF. The explanation for this effect is simple: because the hierarchy was trained on natural images, containing long line segments, the input caused by short segments could not be predicted and error responses could not be suppressed. This example makes a fundamental point, which we will take up further below. The selective response of these units does not mean they have learned to encode short line segments. Their responses reflect the fact that short line segments have not been encountered before and represent an unexpected visual input, given the context established by input beyond the classical RF. In short, their response signals a violation of statistical regularities that have been learned.

If these models are right, interruption of backward connections should disinhibit the response of supragranular error units that are normally suppressed by extra-classical stimuli. Rao & Ballard (1999) cite inactivation studies of high-level visual cortex in anaesthetized monkeys, in which disinhibition of responses to surround stimuli is observed in lower areas (Hupe et al. 1998). Furthermore, removal of feedback from V1 and V2 to the LGN reduces the end-stopping of LGN cells (Murphy & Sillito 1987).

(c) Long-latency evoked responses

In addition to explaining the form of classical RFs, the temporal form of evoked transients is consistent with empirical (hierarchical) Bayes. This is aptly summarized by Lee & Mumford (2003): 'Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system.' Long-latency responses are used to motivate


hierarchical Bayesian inference, in which 'the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference.'

The prevalence of long-latency responses in unit recordings is mirrored in similar late components of ERPs recorded noninvasively. The cortical hierarchy in figure 2 comprises a chain of coupled oscillators. The response of these systems to sensory perturbation conforms to a damped oscillation, emulating a succession of late components. Functionally, the activity of error units at any one level reflects states that have yet to be explained by higher-level representations and will wax and wane as higher-level causes are selected and refined. The ensuing transient provides a compelling model for the form of ERPs, which look very much like damped oscillations in the alpha (10 Hz) range. In some instances, specific components of ERPs can be identified with specific causes. For example, the N170, a negative wave about 170 ms after stimulus onset, is elicited by face stimuli relative to non-face stimuli. In the following, we focus on examples of late components. We will not ascribe these components to representational or error subpopulations, because their respective dynamics are tightly coupled. The theme we highlight is that late components reflect inference about supraordinate or global causes at higher levels in the hierarchy.
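The idea that a perturbed chain of coupled oscillators yields a succession of late components can be illustrated with a single damped oscillator at an alpha-band natural frequency. This is a caricature, not the full hierarchical model; the frequency, damping and integration step are illustrative only.

```python
import numpy as np

# A stimulus at t = 0 perturbs a damped harmonic oscillator with a
# 10 Hz natural frequency; the decaying peaks of the relaxation mimic
# a succession of late ERP components, spaced about one cycle apart.

f, zeta, dt = 10.0, 0.15, 0.001          # frequency (Hz), damping ratio, step (s)
omega = 2 * np.pi * f
x, v = 1.0, 0.0                          # perturbed state at stimulus onset
trace = []
for _ in range(500):                     # 500 ms of simulated response
    a = -2 * zeta * omega * v - omega ** 2 * x
    v += dt * a                          # semi-implicit Euler (stable here)
    x += dt * v
    trace.append(x)

peaks = [i for i in range(1, len(trace) - 1)
         if trace[i - 1] < trace[i] > trace[i + 1]]
print(list(np.diff(peaks)))              # peak intervals near 100 ms (one cycle)
```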

(i) Examples from neurophysiology

This subsection considers evidence for hierarchical processing in terms of single-cell responses to visual stimuli in the temporal cortex of behaving monkeys. If perceptual inference rests on a hierarchical generative model, then predictions that depend on the high-order attributes of a stimulus must be conferred by top-down influences. Consequently, one might expect to see the emergence of selectivity for high-level attributes after the initial visual response (although delays vary greatly, it typically takes about 10 ms for spike volleys to propagate from one cortical area to another and about 100 ms to reach prefrontal areas). This delay in the emergence of selectivity is precisely what one sees empirically. For example, Sugase et al. (1999) recorded neurons in macaque temporal cortex during the presentation of faces and objects. The faces were either human or monkey faces and were categorized in terms of identity (whose face it was) and expression (happy, angry, etc.): 'Single neurones conveyed two different scales of facial information in their firing patterns, starting at different latencies. Global information, categorizing stimuli as monkey faces, human faces or shapes, was conveyed in the earliest part of the responses. Fine information about identity or expression was conveyed later', starting, on average, about 50 ms after face-selective responses. These observations speak to a temporal dissociation in the encoding of stimulus category, facial identity and expression that is a natural consequence of hierarchically distributed processing.

A similar late emergence of selectivity is seen in motion processing. A critical aspect of visual


Figure 3. Schematic adapted from Zeki (1993) summarizing the functional segregation of processing pathways and the relationship of simulated RFs to the stripe structures in V2. LGN, lateral geniculate nucleus; P, parvocellular pathway; M, magnocellular pathway. These RFs were obtained by minimizing the free energy of a model neuronal system when exposed to moving natural scenes. See Friston (2000) for details.


processing is the integration of local motion signals generated by moving objects. This process is complicated by the fact that local velocity measurements can differ depending on contour orientation and spatial position. Specifically, any local motion detector can measure only the component of motion perpendicular to a contour that extends beyond its field of view (Pack & Born 2001). This aperture problem is particularly relevant to direction-selective neurons early in the visual pathways, where small RFs permit only a limited view of a moving object. Pack & Born (2001) have shown 'that neurons in the middle temporal visual area (known as MT or V5) of the macaque brain reveal a dynamic solution to the aperture problem. MT neurons initially respond primarily to the component of motion perpendicular to a contour's orientation, but over a period of approximately 60 ms the responses gradually shift to encode the true stimulus direction, regardless of orientation'.

Finally, it is interesting to note that extra-classical RF effects in supragranular V1 units are often manifest 80–100 ms after stimulus onset, 'suggesting that feedback from higher areas may be involved in mediating these effects' (Rao & Ballard 1999).


(ii) Examples from electrophysiology

In the discussion of extra-classical RF effects above, we established that evoked responses, expressed 100 ms or so after stimulus onset, could be understood in terms of a failure to suppress prediction error when the local information in the classical RF was incongruent with the global context established by the surround. Exactly the same phenomenon can be observed in ERPs evoked by the processing of compound stimuli that have local and global attributes (e.g. an ensemble of L-shaped stimuli arranged to form an H). For example, Han & He (2003) have shown that incongruency between global and local letters enlarged the posterior N2, a component of visually evoked responses occurring about 200 ms after stimulus onset. This sort of result may be the electrophysiological correlate of the global precedence effect expressed behaviourally. The global precedence effect refers to a speeded behavioural response to a global attribute relative to local attributes and the slowing of local responses by incongruent global information (Han & He 2003).

(iii) Examples from neuroimaging

Although neuroimaging has poor temporal resolution, the notion that V1 responses evoked by


Figure 4. Repetition suppression as measured with fMRI in normal subjects. Top panel: estimated hemodynamic responses to the presentation of faces that were (red), and were not (blue), seen previously during the scanning session. These estimates were based on a linear convolution model of fMRI responses in the most significant voxel in the corresponding statistical parametric map. Lower panel: statistical parametric maps, overlaid on a cortical rendering of a single subject, showing areas that responded to all faces (left) and the region showing significant repetition suppression. For details, see Henson et al. (2000).


compound stimuli can be suppressed by congruent global information can be tested easily. Murray et al. (2002) used fMRI to measure responses in V1 and a higher object-processing area, the lateral occipital complex, to visual elements that were either grouped into objects or arranged randomly. They 'observed significant activity increases in the lateral occipital complex and concurrent reductions of activity in primary visual cortex when elements formed coherent shapes, suggesting that activity in early visual areas is reduced as a result of grouping processes performed in higher areas. These findings are consistent with predictive coding models of vision that postulate that inferences of high-level areas are subtracted from incoming sensory information in lower areas through cortical feedback.'

Our own work in this area uses coherent motion subtended by sparse dots that cannot fall within the same classical RF of V1 neurons. As predicted, the V1 response to coherent, relative to incoherent, stimuli was significantly reduced (Harrison et al. 2004). This is a clear indication that error suppression is mediated by backward connections, because only higher cortical areas have RFs that are sufficiently large to encompass more than one dot.

(d) Shut up or stop gossiping?

In summary, a component of evoked responses can be understood as the transient expression of prediction error that is suppressed quickly by predictions from higher cortical areas. This suppression may be compromised if the stimulus has not been learned previously or is incongruent with the global context in which it appears. Kersten et al. (2004) introduce two heuristics concerning the reduction of early visual responses to coherent or predictable stimuli. High-level areas may explain away the sensory input and tell the lower levels to 'shut up'. Alternatively, high-level areas might sharpen the responses of early areas by reducing activity that is inconsistent with the high-level interpretation; that is, high-level areas tell competing representations in lower areas to 'stop gossiping'. In fact, the empirical Bayes framework accommodates both heuristics. High-level predictions explain away prediction error and tell the error units to 'shut up'. Concurrently, units encoding the causes of sensory input are selected by lateral interactions, with the error units, that mediate empirical priors. This selection stops the gossiping. The conceptual tension between the two heuristics is resolved by positing two functionally distinct subpopulations, encoding the conditional expectations of perceptual causes and the prediction error, respectively.

In §6, we turn to the implications of error suppression for responses evoked during perceptual learning and their electrophysiological correlates.

6. IMPLICATIONS FOR SENSORY LEARNING AND ERPs

In §5, we introduced the notion that evoked response components in sensory cortex encode a transient prediction error that is rapidly suppressed by


predictions mediated by backward connections. If the stimulus is novel or inconsistent with its context, then this suppression is compromised. An example of this might be extra-classical effects expressed 100 ms or so after stimulus onset. In the following, we consider responses to novel or deviant stimuli as measured with ERPs. This may be important for empirical studies because ERPs can be acquired noninvasively and can be used to study humans in both basic and clinical contexts.

(a) Perceptual learning and long-latency responses

The E-step in our empirical Bayes scheme provides a model for the dynamics of evoked transients in terms of the responses of representational and error units. Representational learning in the M-step models plasticity in backward and lateral connections to enable more efficient inference using the same objective function. This means that perceptual learning should progressively reduce free energy, or prediction error, on successive exposures to the same stimulus. For simple or elemental stimuli, this should be expressed fairly soon after stimulus onset; for high-order attributes of compound stimuli, later components should be suppressed. This suppression of responses to repeated stimuli is exactly what one observes empirically and is referred to as repetition suppression (Desimone 1996). This phenomenon is ubiquitous and can be observed using many different sorts of measurements. An example is shown in figure 4, which details reduced



activation in the fusiform region to repeated faces, relative to new faces, using fMRI (see Henson et al. 2000 for details).

In §5, we presented empirical examples where coherence or congruency was used to control the predictability of stimuli. Below, we look at examples where rapid sensory and perceptual learning renders some stimuli more familiar than others. The behavioural correlate of repetition effects is priming (cf. global precedence for congruency effects). The example we focus on is the MMN elicited with simple stimuli. However, there are many other examples in electrophysiology, such as the P300.

Figure 5. Schematic using empirical results reported in Baldeweg et al. (2004), relating the MMN to prediction-error suppression during perceptual learning. The idea is that perceptual synthesis (E-step) minimizes prediction error 'online' to terminate an early negativity, while perceptual learning (M-step) attenuates its expression over repeated exposures (solid black bars). The magnitude of the MMN increases with the number N of standards in each 'roving' stimulus train. This may be due to the suppression of an N1-like component over repeated presentation of the standards (dotted lines) that reveals the MMN.
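The learning-related suppression of prediction error that underlies repetition effects can be sketched with a toy linear model, separating fast inference (cf. the E-step) from slow plasticity (cf. the M-step). This is an illustration only, not the paper's scheme; the stimulus, weights and learning rate are hypothetical.

```python
import numpy as np

# On each presentation of the same stimulus u, inference estimates the
# cause phi under the current weights W; a slow weight update then
# reduces the residual error, so the error evoked by the repeated
# stimulus shrinks across presentations (repetition suppression).

rng = np.random.default_rng(2)
u = rng.normal(size=4)                 # the repeated stimulus
W = rng.normal(size=(4, 2))            # initial generative weights

errors = []
for presentation in range(5):
    phi = np.linalg.lstsq(W, u, rcond=None)[0]    # fast inference (cf. E-step)
    xi = u - W @ phi                               # residual prediction error
    errors.append(float(xi @ xi))
    W += 0.5 * np.outer(xi, phi) / (phi @ phi)     # slow plasticity (cf. M-step)

print([round(e, 4) for e in errors])    # error decreases across presentations
```

Each weight update removes at least half of the remaining residual (in amplitude), so the evoked 'error' falls rapidly over the first few repetitions, in the spirit of the attenuating negativity shown in figure 5.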

(b) MMN and perceptual learning

The MMN is a negative component of the ERP elicited by any perceptible change in some repetitive aspect of auditory stimulation. The MMN can be seen in the absence of attention and is generally thought to reflect pre-attentive processing in the temporal and frontal system (Naatanen 2003). The MMN is elicited by stimulus change at about 100–200 ms after the stimulus, and is presumed to reflect an automatic comparison of stimuli to sensory memory representations encoding the repetitive aspects of auditory inputs. This prevailing theory assumes that there are distinct change-specific neurons in auditory cortex that generate the MMN. The alternative view is that preceding stimuli adapt feature-specific neurons. In this adaptation hypothesis, the N1 response is delayed and suppressed on exposure to repeated stimuli, giving rise to the MMN. The N1 is a negative electrical response to stimuli, peaking at about 100 ms. The problem for the adaptation hypothesis is that the sources of the N1 and MMN are, apparently, different (Jaaskelainen et al. 2004).

Neither the change-specific neuron hypothesis nor the intrinsic adaptation hypothesis is consistent with the theoretical framework established above. The empirical Bayes scheme would suggest that a component of the N1 response, corresponding to prediction error, is suppressed more efficiently after learning-related plasticity in backward and lateral connections. This suppression would be specific for the repeated aspects of the stimuli and would be a selective suppression of prediction error. Recall that error suppression (i.e. minimization of free energy) is the motivation for plasticity in the M-step. This repetition suppression hypothesis suggests the MMN is simply the attenuation of the N1. Fortunately, the apparent dislocation of N1 and MMN sources has been resolved recently: Jaaskelainen et al. (2004) show that the MMN results from differential suppression of anterior and posterior auditory N1 sources by preceding stimuli. This alters the centre of gravity of the N1 source, creating a specious difference between N1 and MMN loci when estimated using dipole equivalents.

In summary, both the E-step and M-step try to minimize free energy; the E-step does so during perceptual synthesis on a time-scale of milliseconds and the M-step does so during perceptual learning over seconds or longer. If the N1 is an index of prediction error (i.e. free energy), then the N1, evoked by the first in


a train of repeated stimuli, will decrease with each subsequent stimulus. This decrease discloses the MMN evoked by a new (oddball) stimulus. In this view, the MMN is subtended by a positivity that increases with the number of standards. Recent advances in the MMN paradigm that use a 'roving' paradigm show exactly this (see Baldeweg et al. 2004; see also figure 5).

(i) MMN and plasticity

The suppression hypothesis for the MMN rests on plasticity in backward and lateral connections to enable the minimization of prediction error. The adaptation hypothesis does not. If the suppression hypothesis is correct, then the MMN should be attenuated when plasticity is compromised pharmacologically. This is precisely what happens. Umbricht et al. (2000) show that ketamine significantly decreases the MMN amplitude, to both pitch and duration, by about 20%. Ketamine is a non-competitive NMDA receptor antagonist. NMDA receptors play a key role in short- and long-term plasticity, as mentioned in §2(c).

The mechanistic link between plasticity and the MMN, provided by perceptual learning based on empirical Bayes, is important because a number of neuropsychiatric disorders are thought to arise from aberrant cortical plasticity. Two interesting examples are dyslexia and schizophrenia. In dyslexia, the MMN to pitch is attenuated in adults and the degree of attenuation correlates with reading disability. In contrast, in schizophrenia, the MMN to duration is more affected and has been shown to correlate with the expression of negative symptoms (Naatanen et al. 2004). This is potentially important because the disconnection hypothesis for schizophrenia (Friston 1998) rests on abnormal experience-dependent plasticity. From the point of view of this paper, the key pathophysiology of schizophrenia reduces to aberrant representational learning, as modelled by the M-step. If the MMN could be used as a quantitative index of perceptual learning, then it would be very useful in schizophrenia research (see also Baldeweg et al. 2002).



In §7, we show that the MMN can indeed be explained by changes in connectivity.

7. AN EMPIRICAL EPILOGUE

Although it is pleasing to have a principled explanation for many anatomical and physiological aspects of neuronal systems, the question can be asked: is this explanation empirically useful? We conclude by showing that recent advances in the DCM of evoked responses now afford measures of connectivity among cortical sources that can be used to quantify theoretical predictions about perceptual learning. These measures may provide mechanistic insights into putative functional disconnection syndromes, such as dyslexia and schizophrenia. The advances in data analysis are useful because they use exactly the same EM scheme to analyse neurophysiological data as proposed here for the brain's analysis of sensory data.

(a) Dynamic causal modelling with neural mass models

ERPs have been used for decades as electrophysiological correlates of perceptual and cognitive operations. However, the exact neurobiological mechanisms underlying their generation are largely unknown. In the following, we use neuronally plausible models to explain event-related responses. The example used here suggests changes in connectivity are sufficient to explain ERP components. Specifically, we will look at late components associated with rare or unexpected events (e.g. the MMN). If the unexpected nature of rare stimuli depends on learning frequent stimuli, then the MMN must be due to plastic changes in connectivity that mediate perceptual learning. If the empirical Bayes model of perception is right, then this learning must involve changes in backward, lateral and forward connections. Below, we test this hypothesis in relation to changes that are restricted to forward connections, using a neural mass model of ERPs and EEG data.

(i) A hierarchical neural mass model

The minimal model we have developed (David et al. 2005) uses the laminar-specific rules outlined in §2(c) and described in Felleman & Van Essen (1991) to assemble a network of coupled sources. These rules are based on a partitioning of the cortical sheet into supragranular, infragranular and granular (layer 4) layers. Bottom-up or forward connections originate in agranular layers and terminate in layer 4. Top-down or backward connections target agranular layers. Lateral connections originate in agranular layers and target all layers. These long-range or extrinsic cortico-cortical connections are excitatory and arise from pyramidal cells.

Each region or source is modelled using a neural mass model described in David & Friston (2003), based on the model of Jansen & Rit (1995). This model emulates the activity of a cortical area using three neuronal subpopulations, assigned to granular and agranular layers. A population of excitatory pyramidal (output) cells receives inputs from inhibitory and excitatory populations of interneurons, via intrinsic


connections. Within this model, excitatory interneurons can be regarded as spiny stellate cells, found predominantly in layer 4 and in receipt of forward connections. Excitatory pyramidal cells and inhibitory interneurons will be considered to occupy agranular layers and receive backward and lateral inputs (see figure 6).

To model ERPs, the network receives inputs via input connections. These connections are exactly the same as forward connections and deliver inputs w to the spiny stellate cells in layer 4. In the present context, these are subcortical auditory inputs. The vector C controls the influence of the ith input on each source. The matrices A^F, A^B and A^L encode forward, backward and lateral connections, respectively. The DCM here is specified in terms of the state equations shown in figure 6 and a linear output equation,

ẋ(t) = f(x, w)

y = Lx₀ + ε    (7.1)

where x₀ represents the transmembrane potential of pyramidal cells and L is a lead field matrix coupling electrical sources to the EEG channels. As an example, the state equation for an inhibitory subpopulation is

ẋ₇ = x₈

ẋ₈ = (Hₑ/τₑ)((A^B + A^L + γ₃I)S(x₀)) − 2x₈/τₑ − x₇/τₑ²    (7.2)

Within each subpopulation, the evolution of neuronal states rests on two operators. The first transforms the average density of presynaptic inputs into the average postsynaptic membrane potential. This is modelled by a linear transformation with excitatory and inhibitory kernels parameterized by H and τ. H controls the maximum postsynaptic potential and τ

represents a lumped rate constant. The second operator, S, transforms the average potential of each subpopulation into an average firing rate. This is assumed to be instantaneous and is a sigmoid function. Interactions among the subpopulations depend on constants γ₁, γ₂, γ₃ and γ₄, which control the strength of intrinsic connections and reflect the total number of synapses expressed by each subpopulation. In equation (7.2), the top line expresses the rate of change of voltage as a function of current. The second line specifies how current changes as a function of voltage, current and presynaptic input from extrinsic and intrinsic sources. Having specified the DCM, one can estimate the coupling parameters from empirical data using EM (Friston et al. 2003), using exactly the same minimization of free energy proposed for perceptual learning.
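As a toy illustration of the two operators just described (not the full DCM), one can integrate a single second-order synaptic kernel, with maximum potential Hₑ and rate constant τₑ, driven by a brief presynaptic volley, and pass the resulting potential through a sigmoid S. Connectivity matrices are omitted and the input amplitude is hypothetical; parameter values follow common Jansen–Rit-style conventions.

```python
import numpy as np

He, te, dt = 3.25, 0.010, 1e-4     # max PSP (mV), rate constant (s), time step (s)

def S(v, r=0.56, v0=6.0, e0=2.5):
    # sigmoid operator: mean membrane potential -> mean firing rate
    return 2 * e0 / (1 + np.exp(r * (v0 - v)))

x7, x8 = 0.0, 0.0                  # potential and current (cf. x7, x8 in eq. 7.2)
trace = []
for k in range(5000):              # 0.5 s at dt = 0.1 ms
    m = 50.0 if k < 100 else 0.0   # hypothetical 10 ms presynaptic volley
    x8_dot = (He / te) * m - 2 * x8 / te - x7 / te ** 2
    x7 += dt * x8                  # voltage changes as a function of current
    x8 += dt * x8_dot              # current changes with voltage, current, input
    trace.append(x7)

print(round(max(trace), 2), round(S(max(trace)), 2))   # peak PSP and evoked rate
```

The kernel produces a transient postsynaptic potential that peaks shortly after the volley and decays back to rest, which is the elementary response out of which the coupled-source ERP model is built.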

(ii) Perceptual learning and the MMN

We elicited ERPs that exhibited a strong modulation of late components, on comparing responses to frequent and rare stimuli, using an auditory oddball paradigm. Auditory stimuli (1000 and 2000 Hz tones with 5 ms rise and fall times and 80 ms duration) were presented binaurally. The tones were presented for 15 min, every 2 s, in a pseudo-


Figure 6. This schematic shows the state equations describing the dynamics of one source. Each source is modelled with threesubpopulations (pyramidal, spiny stellate and inhibitory inter neurons) as described in Jansen & Rit (1995) and David & Friston(2003). These have been assigned to granular and agranular cortical layers which receive forward and backward connection,respectively.

832 K. Friston A theory of cortical responses

random sequence with 2000 Hz tones on 20% ofoccasions and 1000 Hz tones for 80% of the time(standards). The subject was instructed to keep amental record of the number of 2000 Hz tones(oddballs). Data were acquired using 128 EEGelectrodes with 1000 Hz sample frequency. Beforeaveraging, data were referenced to mean earlobeactivity and band-pass filtered between 1 and 30 Hz.Trials showing ocular artefacts and bad channels wereremoved from further analysis.
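The preprocessing steps described above can be sketched as follows. The data, channel indices for the earlobe electrodes and stimulus onsets are all synthetic stand-ins; only the pipeline (re-referencing, zero-phase 1–30 Hz band-pass, epoch averaging) mirrors the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                                    # sample frequency (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 4000))         # 128 channels x 4 s of fake data
earlobes = eeg[[0, 1]].mean(axis=0)            # hypothetical earlobe channels

# Reference to mean earlobe activity
referenced = eeg - earlobes

# Zero-phase band-pass filter between 1 and 30 Hz
b, a = butter(4, [1.0 / (fs / 2), 30.0 / (fs / 2)], btype="band")
filtered = filtfilt(b, a, referenced, axis=1)

# Average 500 ms epochs time-locked to (hypothetical) onsets every 2 s
onsets = np.arange(0, 4000, 2000)
erp = np.mean([filtered[:, t:t + 500] for t in onsets], axis=0)
```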

Six sources were identified using conventional procedures (David et al. 2005) and used to construct DCMs (see figure 7). To establish evidence for changes in backward and lateral connections, over and above changes in forward connections, we employed a Bayesian model selection procedure. This entailed specifying four models that allowed for changes in forward connections, backward connections, forward and backward connections, and all connections. These changes in extrinsic connectivity may explain the differences in ERPs elicited by standard relative to oddball stimuli. The models were compared using the negative free energy as an approximation to the log evidence for each model. From equation (3.4), if we assume the approximating conditional density is sufficiently good, then the free energy reduces to the log evidence. In Bayesian model selection, a difference in log evidence of three or more can be considered very strong evidence for the model with the greater evidence, relative to the one with less. The log evidence for the four models is shown in figure 7. The model with the highest evidence (by a margin of 27.9) is the DCM that allows for learning-related changes in forward, backward and lateral connections. These results provide clear evidence that changes in backward and lateral connections are needed to explain the observed differences in cortical responses.

Phil. Trans. R. Soc. B (2005)
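The model comparison above can be expressed in a few lines: each model is scored by its approximate log evidence, and models are compared via log-evidence differences (log Bayes factors). The numbers below are invented for illustration, apart from reproducing the reported winning margin of 27.9 for the FBL model.

```python
import numpy as np

# Invented log-evidence values for the four models (F, B, FB, FBL)
log_evidence = {"F": -340.0, "B": -330.0, "FB": -301.8, "FBL": -273.9}

best = max(log_evidence, key=log_evidence.get)
margin = log_evidence[best] - sorted(log_evidence.values())[-2]

# A log-evidence difference of 3 corresponds to a Bayes factor of exp(3) ~ 20,
# conventionally taken as very strong evidence
very_strong = margin >= 3.0

# Posterior probability of each model under flat model priors
vals = np.array(list(log_evidence.values()))
posterior = np.exp(vals - vals.max())
posterior /= posterior.sum()
```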

Figure 7. Upper right: transparent views of the cortical surface showing the localized sources that entered the DCM. A bilateral extrinsic input acts on primary auditory cortices (red), which project reciprocally to orbitofrontal regions (green). In the right hemisphere, an indirect pathway was specified via a relay in the superior temporal gyrus (magenta). At the highest level, orbitofrontal and left posterior cingulate (blue) cortices were assumed to be laterally and reciprocally connected (broken lines). Lower left: schematic showing the extrinsic connectivity architecture of the DCM used to explain the empirical data. Sources were coupled with extrinsic cortico-cortical connections following the rules of Felleman & Van Essen (1991). A1, primary auditory cortex; OF, orbitofrontal cortex; PC, posterior cingulate cortex; STG, superior temporal gyrus (right is on the right and left on the left). The free parameters of this model included extrinsic connection strengths that were adjusted to best explain the observed ERPs. Critically, these parameters allowed for differences in connections between the standard and oddball trials. Lower right: the results of a Bayesian model selection, shown in terms of the log evidence for models allowing changes in forward (F), backward (B), forward and backward (FB) and forward, backward and lateral connections (FBL). There is very strong evidence that both backward and lateral connections change with perceptual learning, as predicted theoretically.

These differences can be seen in figure 8 in terms of the responses seen and those predicted by the fourth DCM. The measured responses over channels have been reduced to the first three (spatial) modes or eigenvectors. The underlying response in each of the six sources is shown in the lower panel. The lower panel also shows where changes in connectivity occurred. The numbers by each connection represent the relative strength of the connections during the oddball stimuli relative to the standards. The percentages in brackets are the conditional confidence that these differences are greater than zero (see David et al. 2005 for details of this study). Note that we have not attempted to assign a functional role to the three populations in terms of a predictive coding model, nor have we attempted to interpret the sign of the connection changes. This represents the next step in creating theoretically informed DCMs. At present, all we are demonstrating is that exuberant responses to rare stimuli, which may represent a failure to suppress prediction error, can be explained quantitatively by changes in the coupling among cortical sources, which may represent perceptual learning with empirical Bayes.

In summary, we estimated differences in the strength of connections for rare and frequent stimuli. As expected, we could account for detailed differences in the ERPs by changes in connectivity. These changes were expressed in forward, backward and lateral connections. If this model is a sufficient approximation to the real sources, then these changes are a non-invasive measure of the plasticity mediating perceptual learning in the human brain.


Figure 8. Auditory oddball paradigm: DCM results for the FBL model of the previous figure. Upper panel: the data are the projection of the original scalp time-series onto the first three spatial modes or eigenvectors. Note the correspondence between the measured ERPs (thin lines) and those generated by the model (thick lines). Lower panel: the response of each source is shown for the standard (grey) and oddball (black) trials, based on the conditional expectation of the DCM parameters. Changes in coupling are shown alongside each connection in terms of the relative strength (oddball to standard). The percentages refer to the conditional confidence that this change is non-zero (i.e. a relative strength of more than one). Changes with over 95% confidence are shown as solid lines. A1, primary auditory cortex; OF, orbitofrontal cortex; PC, posterior cingulate cortex; STG, superior temporal gyrus.
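The reduction of channel data to its first three spatial modes can be sketched with a singular value decomposition: the modes are the leading left singular vectors of the channels-by-time data matrix. The data here are synthetic; only the decomposition itself is the point.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.standard_normal((128, 500))   # channels x time-bins (fake data)

# Spatial modes are the leading left singular vectors of the data matrix
U, s, Vt = np.linalg.svd(data, full_matrices=False)
modes = U[:, :3]                          # first three spatial eigenvectors
projected = modes.T @ data                # three time-series entering the fit

# Proportion of variance captured by the first three modes
explained = (s[:3] ** 2).sum() / (s ** 2).sum()
```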


ENDNOTES

1. The Kullback–Leibler divergence is a measure of the distance or difference between two probability densities.

2. Clearly, in the brain, backward connections are not inhibitory. However, after mediation by inhibitory interneurons, their effective influence could be thus rendered.

3. Propagation delays on the extrinsic connections have been omitted for clarity, here and in figure 6.
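The Kullback–Leibler divergence of endnote 1 is easily computed for discrete densities; the sketch below uses two arbitrary example distributions and shows its two defining properties: it is non-negative, vanishing only when the densities coincide, and it is not symmetric (so it is a divergence rather than a true distance).

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete densities."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

# Two arbitrary example densities over three outcomes
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
```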

The Wellcome Trust funded this work. I would like to thank my colleagues for help in writing this paper and developing the ideas, especially Cathy Price, Peter Dayan, Rik Henson, Olivier David, Klaas Stephan, Lee Harrison, James Kilner, Stephan Kiebel, Jeremie Mattout and Will Penny. I would also like to thank Dan Kersten and Bruno Olshausen for didactic discussions.

REFERENCES

Absher, J. R. & Benson, D. F. 1993 Disconnection syndromes: an overview of Geschwind's contributions. Neurology 43, 862–867.

Angelucci, A., Levitt, J. B. & Lund, J. S. 2002a Anatomical origins of the classical receptive field and modulatory surround field of single neurons in macaque visual cortical area V1. Prog. Brain Res. 136, 373–388.

Angelucci, A., Levitt, J. B., Walton, E. J., Hupe, J. M., Bullier, J. & Lund, J. S. 2002b Circuits for local and global signal integration in primary visual cortex. J. Neurosci. 22, 8633–8646.

Atick, J. J. & Redlich, A. N. 1990 Towards a theory of early visual processing. Neural Comput. 2, 308–320.

Baldeweg, T., Klugman, A., Gruzelier, J. H. & Hirsch, S. R. 2002 Impairment in frontal but not temporal components of mismatch negativity in schizophrenia. Int. J. Psychophysiol. 43, 111–122.

Baldeweg, T., Klugman, A., Gruzelier, J. & Hirsch, S. R. 2004 Mismatch negativity potentials and cognitive impairment in schizophrenia. Schizophr. Res. 69, 203–217.

Ballard, D. H., Hinton, G. E. & Sejnowski, T. J. 1983 Parallel visual computation. Nature 306, 21–26.

Barlow, H. B. 1961 Possible principles underlying the transformation of sensory messages. In Sensory communication (ed. W. A. Rosenblith). Cambridge, MA: MIT Press.

Bell, A. J. & Sejnowski, T. J. 1995 An information maximisation approach to blind separation and blind deconvolution. Neural Comput. 7, 1129–1159.

Brodmann, K. 1905 Beiträge zur histologischen Lokalisation der Großhirnrinde. III. Mitteilung. Die Rindenfelder der niederen Affen. J. Psychol. Neurol. 4, 177–226.

Brodmann, K. 1909 Vergleichende Lokalisationslehre der Großhirnrinde in ihren Prinzipien dargestellt auf Grund des Zellenbaues. Leipzig: Barth.

Buchel, C. & Friston, K. J. 1997 Modulation of connectivity in visual pathways by attention: cortical interactions evaluated with structural equation modelling and fMRI. Cereb. Cortex 7, 768–778.

Buonomano, D. V. & Merzenich, M. M. 1998 Cortical plasticity: from synapses to maps. Annu. Rev. Neurosci. 21, 149–186.

Crick, F. & Koch, C. 1998 Constraints on cortical and thalamic projections: the no-strong-loops hypothesis. Nature 391, 245–250.

David, O. & Friston, K. J. 2003 A neural mass model for MEG/EEG: coupling and neuronal dynamics. NeuroImage 20, 1743–1755.

David, O., Kiebel, S. J., Harrison, L. M., Mattout, J., Kilner, J. M. & Friston, K. J. 2005 Dynamic causal modelling of evoked responses in EEG and MEG. PLoS Biology. (Under review.)

Dayan, P. & Abbott, L. F. 2001 Theoretical neuroscience: computational and mathematical modelling of neural systems. Cambridge, MA: MIT Press.

Dayan, P., Hinton, G. E. & Neal, R. M. 1995 The Helmholtz machine. Neural Comput. 7, 889–904.

Dempster, A. P., Laird, N. M. & Rubin, D. B. 1977 Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38.

Desimone, R. 1996 Neural mechanisms for visual memory and their role in attention. Proc. Natl Acad. Sci. USA 93, 13494–13499.

Efron, B. & Morris, C. 1973 Stein's estimation rule and its competitors: an empirical Bayes approach. J. Am. Stat. Assoc. 68, 117–130.

Felleman, D. J. & Van Essen, D. C. 1991 Distributed hierarchical processing in the primate cerebral cortex. Cereb. Cortex 1, 1–47.

Foldiak, P. 1990 Forming sparse representations by local anti-Hebbian learning. Biol. Cybern. 64, 165–170.

Frey, U. & Morris, R. G. M. 1997 Synaptic tagging and long-term potentiation. Nature 385, 533–536.

Friston, K. J. 1998 The disconnection hypothesis. Schizophr. Res. 30, 115–125.

Friston, K. J. 2000 The labile brain. III. Transients and spatio-temporal receptive fields. Phil. Trans. R. Soc. B 355, 253–265.

Friston, K. J. 2002 Functional integration and inference in the brain. Prog. Neurobiol. 68, 113–143.

Friston, K. J. 2003 Learning and inference in the brain. Neural Netw. 16, 1325–1352.

Friston, K. J., Harrison, L. & Penny, W. 2003 Dynamic causal modelling. NeuroImage 19, 1273–1302.

Fuhrmann, G., Segev, I., Markram, H. & Tsodyks, M. 2002 Coding of temporal information by activity-dependent synapses. J. Neurophysiol. 87, 140–148.

Girard, P. & Bullier, J. 1989 Visual activity in area V2 during reversible inactivation of area 17 in the macaque monkey. J. Neurophysiol. 62, 1287–1301.

Han, S. & He, X. 2003 Modulation of neural activities by enhanced local selection in the processing of compound stimuli. Hum. Brain Mapp. 19, 273–281.

Harrison, L. M., Rees, G. & Friston, K. J. 2004 Extra-classical and predictive coding effects measured with fMRI. Presented at the 10th Annual Meeting of the Organization for Human Brain Mapping, Budapest, Hungary, June 14–17 2004. (Available on CD-ROM in NeuroImage Vol. 22.)

Harth, E., Unnikrishnan, K. P. & Pandya, A. S. 1987 The inversion of sensory processing by feedback pathways: a model of visual cognitive functions. Science 237, 184–187.

Helmholtz, H. 1860/1962 Handbuch der physiologischen Optik (ed. J. P. C. Southall), vol. 3. New York: Dover. (English trans.)

Henson, R., Shallice, T. & Dolan, R. 2000 Neuroimaging evidence for dissociable forms of repetition priming. Science 287, 1269–1272.

Hilgetag, C. C., O'Neill, M. A. & Young, M. P. 2000 Hierarchical organisation of macaque and cat cortical sensory systems explored with a novel network processor. Phil. Trans. R. Soc. B 355, 71–89.

Hinton, G. E., Dayan, P., Frey, B. J. & Neal, R. M. 1995 The 'Wake–Sleep' algorithm for unsupervised neural networks. Science 268, 1158–1161.

Hirsch, J. A. & Gilbert, C. D. 1991 Synaptic physiology of horizontal connections in the cat's visual cortex. J. Neurosci. 11, 1800–1809.

Hochstein, S. & Ahissar, M. 2002 View from the top: hierarchies and reverse hierarchies in the visual system. Neuron 36, 791–804.

Hupe, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P. & Bullier, J. 1998 Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature 394, 784–787.

Jaaskelainen, I. P. et al. 2004 Human posterior auditory cortex gates novel sounds to consciousness. Proc. Natl Acad. Sci. 101, 6809–6814.

Jansen, B. H. & Rit, V. G. 1995 Electroencephalogram and visual evoked potential generation in a mathematical model of coupled cortical columns. Biol. Cybern. 73, 357–366.

Kass, R. E. & Steffey, D. 1989 Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). J. Am. Stat. Assoc. 407, 717–726.

Kawato, M., Hayakawa, H. & Inui, T. 1993 A forward-inverse optics model of reciprocal connections between visual cortical areas. Network 4, 415–422.

Kay, J. & Phillips, W. A. 1996 Activation functions, computational goals and learning rules for local processors with contextual guidance. Neural Comput. 9, 895–910.

Kepecs, A., Van Rossum, M. C., Song, S. & Tegner, J. 2002 Spike-timing-dependent plasticity: common themes and divergent vistas. Biol. Cybern. 87, 446–458.

Kersten, D., Mamassian, P. & Yuille, A. 2004 Object perception as Bayesian inference. Annu. Rev. Psychol. 55, 271–304.

Kotter, R. & Wanke, E. 2005 Mapping brains without coordinates. Phil. Trans. R. Soc. B 360.

Lee, T. S. & Mumford, D. 2003 Hierarchical Bayesian inference in the visual cortex. J. Opt. Soc. Am. Opt. Image Sci. Vis. 20, 1434–1448.

Linsker, R. 1990 Perceptual neural organisation: some approaches based on network models and information theory. Annu. Rev. Neurosci. 13, 257–281.

Locke, J. 1690/1976 An essay concerning human understanding. London: Dent.

MacKay, D. M. 1956 The epistemological problem for automata. In Automata studies (ed. C. E. Shannon & J. McCarthy), pp. 235–251. Princeton, NJ: Princeton University Press.

Martin, S. J., Grimwood, P. D. & Morris, R. G. 2000 Synaptic plasticity and memory: an evaluation of the hypothesis. Annu. Rev. Neurosci. 23, 649–711.

Maunsell, J. H. & Van Essen, D. C. 1983 The connections of the middle temporal visual area (MT) and their relationship to a cortical hierarchy in the macaque monkey. J. Neurosci. 3, 2563–2586.

Mehta, M. R. 2001 Neuronal dynamics of predictive coding. Neuroscientist 7, 490–495.

Mesulam, M. M. 1998 From sensation to cognition. Brain 121, 1013–1052.

Mumford, D. 1992 On the computational architecture of the neocortex. II. The role of cortico-cortical loops. Biol. Cybern. 66, 241–251.

Murphy, P. C. & Sillito, A. M. 1987 Corticofugal feedback influences the generation of length tuning in the visual pathway. Nature 329, 727–729.

Murray, S. O., Kersten, D., Olshausen, B. A., Schrater, P. & Woods, D. L. 2002 Shape perception reduces activity in human primary visual cortex. Proc. Natl Acad. Sci. USA 99, 15164–15169.

Naatanen, R. 2003 Mismatch negativity: clinical research and possible applications. Int. J. Psychophysiol. 48, 179–188.

Naatanen, R., Pakarinen, S., Rinne, T. & Takegata, R. 2004 The mismatch negativity (MMN): towards the optimal paradigm. Clin. Neurophysiol. 115, 140–144.

Neisser, U. 1967 Cognitive psychology. New York: Appleton-Century-Crofts.

Oja, E. 1989 Neural networks, principal components, and subspaces. Int. J. Neural Syst. 1, 61–68.

Olshausen, B. A. & Field, D. J. 1996 Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609.

Optican, L. & Richmond, B. J. 1987 Temporal encoding of two-dimensional patterns by single units in primate inferior cortex. II. Information theoretic analysis. J. Neurophysiol. 57, 132–146.

Pack, C. C. & Born, R. T. 2001 Temporal dynamics of a neural solution to the aperture problem in visual area MT of macaque brain. Nature 409, 1040–1042.

Phillips, W. A. & Singer, W. 1997 In search of common foundations for cortical computation. Behav. Brain Sci. 20, 57–83.

Phillips, C. G., Zeki, S. & Barlow, H. B. 1984 Localization of function in the cerebral cortex: past, present and future. Brain 107, 327–361.

Poggio, T., Torre, V. & Koch, C. 1985 Computational vision and regularization theory. Nature 317, 314–319.

Pollen, D. A. 1999 On the neural correlates of visual perception. Cereb. Cortex 9, 4–19.

Rainer, G., Rao, S. C. & Miller, E. K. 1999 Prospective coding for objects in primate prefrontal cortex. J. Neurosci. 19, 5493–5505.

Rao, R. P. 1999 An optimal estimation approach to visual perception and learning. Vision Res. 39, 1963–1989.

Rao, R. P. & Ballard, D. H. 1999 Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive field effects. Nat. Neurosci. 2, 79–87.

Rivadulla, C., Martinez, L. M., Varela, C. & Cudeiro, J. 2002 Completing the corticofugal loop: a visual role for the corticogeniculate type 1 metabotropic glutamate receptor. J. Neurosci. 22, 2956–2962.

Rockland, K. S. & Pandya, D. N. 1979 Laminar origins and terminations of cortical connections of the occipital lobe in the rhesus monkey. Brain Res. 179, 3–20.

Salin, P.-A. & Bullier, J. 1995 Corticocortical connections in the visual system: structure and function. Physiol. Rev. 75, 107–154.

Sandell, J. H. & Schiller, P. H. 1982 Effect of cooling area 18 on striate cortex cells in the squirrel monkey. J. Neurophysiol. 48, 38–48.

Sherman, S. M. & Guillery, R. W. 1998 On the actions that one nerve cell can have on another: distinguishing 'drivers' from 'modulators'. Proc. Natl Acad. Sci. USA 95, 7121–7126.

Simoncelli, E. P. & Olshausen, B. A. 2001 Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216.

Sugase, Y., Yamane, S., Ueno, S. & Kawano, K. 1999 Global and fine information coded by single neurons in the temporal visual cortex. Nature 400, 869–873.

Tononi, G., Sporns, O. & Edelman, G. M. 1994 A measure for brain complexity: relating functional segregation and integration in the nervous system. Proc. Natl Acad. Sci. USA 91, 5033–5037.

Tovee, M. J., Rolls, E. T., Treves, A. & Bellis, R. P. 1993 Information encoding and the response of single neurons in the primate temporal visual cortex. J. Neurophysiol. 70, 640–654.

Umbricht, D., Schmid, L., Koller, R., Vollenweider, F. X., Hell, D. & Javitt, D. C. 2000 Ketamine-induced deficits in auditory and visual context-dependent processing in healthy volunteers. Arch. Gen. Psychiatry 57, 1139–1147.

Zeki, S. 1990 The motion pathways of the visual cortex. In Vision: coding and efficiency (ed. C. Blakemore), pp. 321–345. Cambridge, UK: Cambridge University Press.

Zeki, S. 1993 A vision of the brain. Oxford: Blackwell Scientific.

Zeki, S. & Shipp, S. 1988 The functional logic of cortical connections. Nature 335, 311–317.

GLOSSARY

DEM: dynamic expectation maximization

DCM: dynamic causal modelling

EM: expectation maximization

ERP: event-related potential

fMRI: functional magnetic resonance imaging

LGN: lateral geniculate nucleus

MMN: mismatch negativity

RF: receptive field

STDP: spike-timing-dependent plasticity