Chapter V Adventures into terra incognita: probing the...

BiologicalandComputerVision GabrielKreiman©ChapterV 2019

1

ChapterV Adventuresintoterraincognita:probingtheneural1circuitsalongtheventralvisualstream2 3 Lesions studies had provided a compelling case that damage to 4circumscribed brain regions leads to specific visual processing deficits (Chapter 5IV) and behavioral experiments had characterized many phenomenological 6aspects of visual perception that begged for a mechanistic explanation (Chapter 7III). The time was ripe to open the black box of the brain and begin to think about 8how vision emerges from the spiking activity of neurons in cortex. 9 10

Retinal ganglion neurons project to the lateral geniculate nucleus (LGN) in 11the 12Thalamus and the main output projection from the LGN conveys visual information 13to primary visual cortex (V1) (Chapter II), the first stage of cortical processing for 14visual information. From V1, information is propagated into a large number of 15visual cortical areas which are responsible for transforming a pixel-like 16representation of sensory information into the rich and complex visual percepts 17(Chapter I, Figure I-5). The exploration and computational modeling of visual 18cortex is an ongoing adventure, where courageous conquistadores dare to peek 19inside the inner workings of the most complex system ever examined by science. 20Basic structural and functional principles of computation are beginning to emerge 21out of the sometimes seemingly enigmatic terra incognita of visual cortex. These 22basic principles are introduced in this Chapter and form the basis of the 23computational models of vision discussed in Chapters VII-IX. 24 25

V.1. About neocortex 26 27

The neocortex is the outer layer of the neural tissue in the brain, and is 28thought to be responsible for cognition. The human neocortex is about 2-4 mm 29thick, comprises about 40% of the brain mass and contains on the order of 1010 30neurons. Cortex shows a large number of folds such that it can fit about 2600 cm2, 31approximately half a basketball court, into the size of the brain. Because of its 32extensive surface and relatively shallow depth, many investigators think of 33neocortex as a quasi-2D structure, though there are important computations that 34take place along the third dimension. The biggest fold is the longitudinal fissure 35separating the right and left hemisphere. The human neocortex has more folds 36than that of many other mammals; for example, the mouse cortex appears 37relatively smooth in comparison to the human cortex. Mechanical tension, 38combined with the strong constraint to save wiring and space are likely to have 39been important factors in determining the shape and folds of cortex {Van Essen, 401997 #193}. 41

42To a pretty reasonable approximation, cortex is cortex: staining of cortical 43

tissue appears at a gross level to be very similar both across different parts of the 44brain, and even across different species. It takes a connoisseur to distinguish a 45


2

piece of cortical tissue from a mouse from a human one. This similarity is perhaps 46remarkably to some people. Egomaniac or anthropomorphic considerations might 47lead some to think that human cortex might be substantially different; after all, mice 48don’t place chess or read Shakespeare. These coarse similarities suggest that 49approximately the same pieces of hardware can be combined in different and 50exciting ways to account for the cognitive capacities of different species. 51

52Upon further scrutiny, specialists can distinguish different species by 53

examining cortical tissue. Furthermore, it is also possible to demarcate different 54brain regions by examining cortex. The German neuroanatomist Korbinian 55Brodmann (1868-1918) devised a parcellation of the human and monkey brains – 56as well as many other species – based on morphological cytoarchitectonic criteria. 57Many parts of neocortex are still referred to by their Brodmann area number 58(Figure V-1) {Brodmann, 1909 #1031}. For example, primary visual cortex 59corresponds to Brodmann’s area 17. Biologists are rather fond of assigning names 60to structures and sometimes they are not content with only one name. Primary 61visual cortex is described in the literature as area 17, or area V1, or even striate 62cortex because of its distinctive stripe patterns. Subsequent physiological and 63lesion studies have shown that several of the structural subdivisions proposed by 64Brodmann and other neuroanatomists correlate with functional differences. 65Attempts to separate cortical regions, particularly combined with attempts to attach 66cognitive functions to different regions have a long and rich history that continues 67to current days (Finger, 2000). 68 69

V.2. Connectivity to and from primary visual cortex 70 71

Primary visual cortex is the first stage where information from the two eyes 72converges onto individual neurons. Each hemisphere in V1 represents the 73contralateral visual field. The part of the retina that is closer to the nose is called 74nasal while the other half of the retina is called temporal. The left visual field (left 75

Figure V-1. Cortex can be subdivided into multiple brain areas based on cytoarchitectonic criteria. Brodmann subdivided neocortex into multiple areas based on cytoarchitectonic criteria. Primary visual cortex corresponds to Brodmann area 17 in this diagram [source = Wikipedia].


3

of where the eyes are fixating on) is represented by the nasal part of the retina on 76the left eye and by the temporal part of the retina on the right eye. Information from 77the nasal retina on the left eye will cross the brain and end up represented in the 78right hemisphere in primary visual cortex. Information from the temporal retina on 79the right eye will turn at the optic chiasm and also end up represented in the right 80hemisphere in primary visual cortex. 81

82Like most other aspects 83

of neuroanatomy, the first 84drawings of primary visual 85cortex were made by 86Santiago Ramon y Cajal, 87who was introduced in 88Chapter II. The basic 89architecture of primary 90visual cortex turned out to 91be approximately similar to 92that of other parts of visual 93neocortex. The neocortical 94sheet is characterized by 95six layers that can be 96distinguished in a Nissl 97staining, a technique used 98to sparsely introduce a dye 99into many neurons in a 100given area for visualization. 101The six layers are 102characterized by a 103

stereotypical connectivity pattern that is often referred to as the canonical 104microcircuit. With some exceptions -- it is Biology after all -- this canonical 105connectivity pattern is shared across different visual areas and also across 106different sensory modalities. 107

108Connections among different areas of cortex are often described as “bottom-109

up”, “top-down” or “horizontal” connections, a nomenclature that is also used to 110describe connectivity in neural network architectures (Chapter VII). These 111different types of connections can be defined based on the specific layer of the 112pre- and post-synaptic neurons. The connections between and within visual 113cortical areas follow a typical pattern that has been used to define what area is 114“upstream” or “downstream”, and which connections are bottom-up or top-down 115(Felleman and Van Essen, 1991) (Figure V-2). Bottom-up connections arrive at 116layer 4. The LGN projects to pyramidal neurons in layer 4 in primary visual cortex, 117perhaps the most studied layer. Layer 1 is the most superficial layer and contains 118mostly dendrites and few neuron bodies; the neuron bodies for those dendritic 119arbors are mostly located in layers 2 and 3. Top-down connections from other 120visual cortical areas typically end in the deep layers 5 and 6, and also to a lesser 121

Figure V-2. Canonical cortical circuits. Cortical connectivity across visual cortex follows stereotypical connectivity patterns illustrated here. L1 through L6 refer to the 6 cortical layers. “Bottom-up” connectivity between areas is shown in black, “Top-down” connectivity between areas is shown in light gray, and connections within an area are shown in medium gray.


4

degree in layers 2 and 3. After LGN input (or input from a “lower area”) arrives onto 122layer 4, information flows from layer 4 to layers 2/3 and then onto layer 5 and layer 1236. Information from layer 6 provides back projections to the LGN (or to a “lower” 124visual area) and is also fed back to layer 4. An important aspect of connectivity in 125visual cortex is that connections between areas are almost invariably reciprocal in 126that if area A provides bottom-up input into area B, area B provides top-down inputs 127to area A {Felleman, 1991 #641; Markov, 2014 #6816}. 128

129By scrutinizing the connectivity patterns across layers in multiple brain areas, 130

investigators have come up with an approximate map of the anatomical paths 131through which different visual areas communicate with each other (Chapter I, 132Figure I-5). Based on the separation of connections into bottom-up and top-down, 133it is possible to arrange the multiple different visual brain areas into an 134approximately hierarchical structure {Felleman, 1991 #641}. The Felleman and 135Van Essen diagram provides a semi-hierarchical description of the anatomical flow 136of information in the visual system. 137

138The more we study connectivity in visual cortex, the more we realize that this 139

stereotypical pattern is full of exceptions. There are differences across species, 140differences between visual cortex and motor cortex, even differences between 141different visual cortical areas. To make matters more complicated, these layers 142can in turn be subdivided into sub-layer structures and the connectivity patterns 143may be different depending on the types of neurons being considered. 144Additionally, the hierarchical nature of visual cortex should not be interpreted too 145strictly; for example, there are numerous “bypass” connections that send 146information from an area A to area D, without going through the intermediate areas 147B and C. Despite the subdivisions, exceptions, and refinements, the basic 148principles of connectivity in visual cortex have played an important role in imposing 149method to the apparent madness, and have inspired the development of the best 150computational models that we have today. 151

152A word of caution about nomenclature is pertinent, particularly for computer 153

scientists used to thinking about neural networks. Biologists talk about different 154cortical areas, such as V1, V2, etc. Each of these areas has 6 layers, as described 155in this section. In Chapters VII-VIII, we will discuss computational models of visual 156processing, which often refer to computational steps instantiated in different 157“layers”. Those computational layers should not be confounded with the cortical 158layers described in here. A layer in a neural network is not necessarily directly 159linked to one of the 6 layers in neocortex in any given brain area. In fact, in many 160cases, people think about a layer in a neural network as potentially equivalent to a 161whole brain area in cortex. We will come back to the issue of making a commitment 162in the mapping between computational models and biological anatomy. For the 163moment, here we refer to layers in the biological sense discussed in the previous 164paragraph and in Figure V-2. In addition to information flowing from one layer to 165another layer within a visual area, and also information flowing between brain 166areas, there are extensive horizontal connections whereby information flows within 167


5

a layer. Some investigators use the term recurrent connections to refer to both 168horizontal and top-down connections. 169 170

V.3. How to study neuronal circuits 171 172

Every problem has an appropriate scale of study, a Goldilocks scale, not 173too coarse, not too fine. For example, it is particularly tedious and difficult to 174attempt to read the newspaper using a microscope (too fine a resolution). It is also 175extremely challenging to read a newspaper from a distance of 200 meters away 176(too coarse). A plethora of methods are available to study the brain, ranging from 177elucidating the three-dimensional structure of specific types of ion channels, all the 178way to indirectly measuring signals that show some degree of correlation with 179blood flow, averaged over coarse spatial scales. 180

181In the case of neocortical circuits, this Goldilocks scale is the activity of 182

individual neurons. Studying the three-dimensional structure of each protein inside 183a neuron is equivalent to trying to read the newspaper with a microscope -- but it 184can be extremely useful for other questions such as understanding the kinetics 185and properties of ion channels in the neuronal membrane. Studying blood flowing 186through half a cubic centimeter of cortex averaged over several seconds is 187equivalent to attempting to extract ink tones from the newspaper from 200 meters 188away (but it can be extremely useful for other questions such as differentiating 189general properties of a part of cortex). In addition to this spatial scale, there is also 190a natural time scale to examine neuronal activity. Neurons communicate with each 191other by sending electrical signals called action potentials (Kandel et al., 2000) 192lasting about 2 milliseconds. For most purposes, it is sufficient to study neuronal 193activity at the millisecond level. With a few exceptions (e.g. small differences in 194timing between signals arriving at the two ears), microsecond resolution timescales 195do not provide additional information (one day has 1,440 minutes and therefore 196the analogy for studying brain at the microsecond scale would be to reread the 197same newspaper every minute). At the other end of the spectrum, techniques that 198average activity over many seconds are too coarse to elucidate cortical 199computations (one second is 1,000 milliseconds and therefore the analogy for 200studying brain at the scale of several seconds would be to average the newspaper 201over 2 years). 202

203Studying the activity of neocortical circuits at neuronal resolution is not 204

trivial. The gold standard is to examine the activity of individual neurons at 205millisecond resolution by inserting thin microelectrodes. Neuronal action potentials 206lead to changes in the electrical potential in the extracellular milieu. It is possible 207to amplify and measure this electrical potential in the extracellular space and 208measure the action potentials emitted by individual neurons. The methodology was 209established by Edgar Adrian (Adrian, 1926) and we already introduced example 210measurements of single neuron activity in the retina in Chapter II. 211


6

212V.4. Neurons in primary visual cortex respond selectively to bars shown at 213

specific orientations 214 215Primary visual cortex consists of about 250 million neurons arranged in a 2- 216

mm-thick sheet encompasses a few square inches in surface. There are probably 217more papers examining the neurophysiology of primary visual cortex than the rest 218of the visual cortex combined. Neurons in primary visual cortex -- as well as those 219in the retina and LGN (Chapter II), and also neurons in other parts of visual cortex 220-- show spatially restricted receptive fields, that is, they respond only to a certain 221part of the visual field. The ensemble of all the neurons tiles the visual field. The 222receptive field size of neurons in primary visual cortex is larger than the ones in 223the retina and LGN and can typically encompass about 0.5 to 1 degree of visual 224angle. A typical neurophysiology experiment often starts by determining the 225receptive field location of the neuron under study. After determining the location of 226the receptive field, a battery of stimuli is used to probe the neuron’s response 227preferences. 228

229The initial and 230

paradigm-shifting strides 231towards describing the 232neurophysiological 233responses in primary 234visual cortex were 235introduced by Torsten 236Wiesel and David Hubel 237(1926-2013). It is said 238that, the history of visual 239neuroscience is the 240history of visual stimuli. 241Typically, before the 242Hubel-Wiesel era, 243investigators had 244examined the responses 245in primary visual cortex 246using diffuse light or the 247type of point sources used 248to elicit activity in the 249retina and LGN. By a 250combination of inspiration, 251perspiration, and careful 252observation, Hubel and 253Wiesel realized that 254neurons in primary visual 255cortex responded most 256strongly when a bar of a 257

Figure V-3.Example responses of a neuron in monkey primary visual cortex. Physiological responses of a neuron in primary visual cortex to bars of different orientation. In these examples, the bar was moved in a direction perpendicular to its orientation. The dashed lines on the left indicate the receptive field, the black rectangle is the oriented bar and the arrows indicate the direction of motion. The neuronal response traces are shown on the right. Reproduced from {Hubel, 1968 #2321}.


7

particular orientation was presented within the neuron’s receptive field {Hubel, 2581959 #1045}(Hubel and Wiesel, 1998). The story of how this discovery came about 259is particularly fascinating and recounted in David Hubel’s Nobel Lecture {Hubel, 2601981 #6722}. Hubel and Wiesel did not have particularly grandiose hypotheses 261about the function of neurons in visual cortex before they embarked on these 262

investigations, but rather intuited that 263interesting principles would emerge by 264courageously placing electrodes in V1. 265After a particularly long day recording 266the activity of a V1 neuron, they were 267frustrated by how little the neuron 268seemed to care about the presence of 269a light or dark annulus inside the 270receptive field. In those days, they did 271not have computers to present stimuli; 272instead they used slides inserted into a 273projector. Their careful power of 274observation led them to realize that the 275neuron would show a burst of activity 276every time they inserted the slide into 277the projector. It was the edge of the 278slide moving into and out of the 279projector that triggered activation much 280more than its content. Excited by this 281finding, they went on to discover that 282the orientation of the edge mattered 283quite a lot, certain orientations led to 284much larger activation than others. 285

286A typical pattern of responses 287

obtained from V1 recordings is 288illustrated in Figure V-3. In this 289experiment, an oriented bar was moved 290within the receptive field of the neuron 291under study. The direction of 292movement was perpendicular to the 293bar’s orientation. Different orientations 294elicited drastically distinct numbers of 295action potentials. While the number of 296action potentials (or spike count) is not 297the only variable that can be used to 298define the neuronal response, it 299provides a simple and good starting 300point to examine neuronal preferences. 301When the bar was approximately at a -30245-degree angle (panel D in Figure 303

Figure V-4. Complex neurons show tolerance to position changes. A. Schematic diagram showing responses from a simple neuron that responds maximally to a -45-degree oriented line when it is positioned in the center of the receptive field (top) but not when the position is shifted (rows 2,3) or when the orientation changes (bottom). B. Schematic diagram showing responses of a complex neuron that shows tolerance to position changes.


8

V-3), the neuron emitted more spikes than for any other orientation. Moreover, the 304activity of this neuron was also dependent on the direction of motion. When the 305bar was moving towards the upper right, the neuron was vigorously active whereas 306there was minimal activation in the opposite direction of motion. 307

308Hubel and Wiesel went on to characterize the properties of V1 neurons in 309

terms of their topography, orientation preference, ocular preference, color, 310direction of motion, and even how those properties arise during development. Their 311Nobel-prize winning discovery inspired generations of neurophysiologists to 312examine neuronal responses throughout the visual cortex. 313

314V.5. Complex neurons achieve tolerance to position changes 315 316 In the example shown in Figure V-3, the V1 neuron responds preferentially 317to a moving bar. Neurons in V1 also respond to flashes of static stimuli. How 318precise does the position of the oriented bar have to be within the neuron’s 319receptive field to trigger a response? A distinction has been observed between two 320types of responses in V1: simple and complex V1 neurons. The latter are less 321sensitive to the exact position of the bar. When using gratings containing multiple 322oriented bars arranged at a certain spatial frequency, complex neurons tolerate 323larger changes in those spatial frequencies. Simple and complex neurons are often 324distinguished by the ratio of the “DC” maintained response to their “AC” response 325elicited by a moving grating (De Valois et al., 1982). Complex neurons show a 326small AC/DC ratio (typically <10) whereas simple neurons have a larger AC/DC 327ratio (typically >10). In other words, complex neurons show a higher degree of 328tolerance to the exact position of a bar with the preferred orientation within the 329receptive field compared to a simple neuron whose response magnitude 330decreases when the bar is shifted away from the preferred position within the 331receptive field (Figure V-4). This progression from a simple neuron to a complex 332neuron showing increased tolerance has inspired the development of hierarchical 333computational models of object recognition that concatenate simple and complex-334like operations as a way of keeping selectivity while achieving tolerance to 335transformations in the stimulus. 336

337Some complex neurons also show “end-stopping”, meaning that their 338

optimum stimulus includes an end within the receptive field, as opposed to very 339long bars whose ending is outside of the receptive field. This end-stopping 340phenomenon can be understood as a form of contextual modulation where the 341light patterns in the region surrounding the receptive field influence the responses 342to the stimulus inside the receptive field. Such influences from outside the 343receptive field are not restricted to end-stopping. V1 neurons also show surround 344suppression, similar to the suppressive effects of light around the receptive field 345center for on-center retinal ganglion neurons described in Chapter II. In sum, V1 346neurons are particularly sensitive to spatial changes, to detect an edge indicative 347of a discontinuity in the visual field, to detect when the line stops, and complex 348


9

neurons can further detect the presence of their preferred features irrespective of 349small changes in position or scale. 350

351V.6. Quantitative phenomenological description of the responses in primary 352

visual cortex 353 354The receptive field structure of orientation-tuned simple V1 neurons (D(x,y) 355

describing the responses at position x, y with respect to the fixation point) is often 356mathematically characterized by a Gabor function, that is, the product of an 357exponential and a cosine function: 358

359 360

Equation V.1 361 362

where sx and sy control the spatial spread of the receptive field, k controls the 363spatial frequency, and f the phase (Dayan and Abbott, 2001). An example 364illustration of a Gabor function is shown in Figure V-5. The Gabor function is 365characterized by an excitatory region as well as a surrounding inhibitory region. 366 367 In addition to the spatial aspects of the receptive field, it is important to 368characterize the temporal dynamics of responses in V1. In most cases, the spatial 369and temporal aspects of the receptive fields in V1 can be considered to be 370approximately independent or separable. The temporal aspects of the receptive 371field can be described by the following equation: 372D(τ ) =αexp(−ατ ) (ατ )5 / 5!− (ατ )7 / 7!⎡⎣ ⎤⎦ Equation V.2 373for t >=0 and 0 otherwise. 374

375V.7. Nearby neurons show similar properties 376

377Neurons in primary visual cortex are topographically organized, in a similar 378

fashion to the situation described in the retina in Chapter II. The V1 topography is 379

�

D(x,y) = 12πσxσy

exp − x2

2σx2 −

y2

2σy2

⎡

⎣ ⎢

⎤

⎦ ⎥ cos kx−φ( )

Figure V-5. The spatial structure of receptive fields of V1 neurons is often described by a Gabor function. A. Illustration of a Gabor function. B. Contour plot.


10

inherited from the LGN: the connections from the LGN to primary visual cortex are 380topographically organized, meaning that nearby neurons in the LGN map onto 381nearby neurons in primary visual cortex. V1 neurons cover the visible visual field, 382with a much higher density of neurons covering the foveal region. These 383neurophysiological observations are consistent with the types of scotomas 384described for cases of V1 lesions described in Chapter III. 385 386

Another aspect of the topographical arrangement of neurons in V1 was 387discovered by Hubel and Wiesel by comparing the preferences of different neurons 388recorded during the same electrode penetration. In addition to sharing properties 389with their two-dimensional neighbors along the cortical sheet, neurons also share 390similar response patterns with their neighbors in the third dimension representing 391cortical depth. Advancing the electrode in a direction approximately tangential to 392the cortical surface, different neurons along a penetration share similar orientation 393tuning preferences. This observation led to the notion of a columnar structure: 394neurons within a column have similar preferences, neurons in adjacent columns 395show a continuous variation in their orientation tuning preferences. 396

397This smooth map of tuning properties onto V1 is not a necessary 398

requirement for V1 computations. In fact, recent work has shown that this level of 399organization may not be a universal property. The mouse primary visual cortex 400does not have such a clear topographical mapping and the map of tuning 401preferences is described as salt-and-pepper, or at best, showing having a very 402weak topographical arrangement. Topography may thus be dissociated from 403function. 404 405

V.8. A simple model of orientation selectivity in primary visual cortex 406 407

Figure V-6. Building orientation tuning by combining circular center-surround neurons. Schematic diagram showing how multiple LGN neurons with a circular center-surround receptive field structure can be combined to give rise to a V1 simple neuron that shows orientation tuning when those receptive field centers are properly aligned (modified from {Hubel, 1962 #1852}


11

In a remarkable feat of intuition, Hubel and Wiesel proposed a simple and 408elegant biophysically plausible model of how orientation tuning could arise from 409the responses of LGN-type receptive fields (Figure V-6). The input to a V1 neuron 410in the model comes from LG 411

In their model, multiple LGN neurons with circularly symmetric center-412surround receptive fields oriented along a line were made to project and converge 413onto a single V1 neuron. Subsequent work gave rise to a plethora of other possible 414models and there is still ongoing debate about the extent to which the Hubel-415Wiesel purely feed-forward model represents the only mechanism giving rise to 416orientation selectivity in area V1 (e.g. (Carandini et al., 2005)). Still, this simple and 417elegant interpretation of the origin of V1 receptive fields constitutes a remarkable 418example of how experimentalists can provide reasonable and profound models 419that account for their data. Furthermore, the basic ideas behind this model have 420been extended to explain the build-up of more complex neuronal preferences in 421other areas (e.g. (Serre et al., 2007)). 422

423In spite of significant amounts of work investigating the neuronal properties 424

in primary visual cortex, investigators do not agree in terms of how much still 425remains to be explained (Carandini et al., 2005). Biases in the recording 426procedures, stimuli, theories and ignorance of contextual effects and internal 427expectations may have an effect on the responses of neurons in V1. Yet, there has 428been significant progress over the last several years. Deciphering the neuronal 429preferences along the human ventral visual cortex is arguably one of the greatest 430adventures of Neuroscience. 431

432Extending their model for orientation selectivity in simple neurons by 433

combining the output of LGN neurons, Hubel and Wiesel proposed that the 434responses of a complex neurons could originate by the combination of responses 435from multiple simple neurons with similar orientation preferences but slightly 436shifted receptive fields. 437 438

V.9. Divide and conquer 439 440 Ascending through the hierarchy of cortical computations, leaving primary 441visual cortex, we reach the fascinating and bewildering cortical areas that bridge 442

Figure V-7. How does cortex convert pixels to percepts? Through the cascade of computations along the ventral visual stream, the brain can convert preferences for simple stimulus properties such as orientation tuning into complex features such as faces.


12

low-level visual features into the 443building blocks of perception. In 444primary visual cortex there are 445neurons that respond selectively to 446lines of different orientation (Hubel 447& Wiesel 1959, Hubel & Wiesel 4481968). At the other end of the visual 449hierarchy, there are neurons that 450respond selectively to complex 451shapes such as faces. In between, 452there is a large expanse of cortex 453involved in the magic 454transformations that take oriented 455lines into complex shapes. How do 456we go from oriented lines to 457recognizing faces and cars and 458

other fancy shapes Figure V-7.? Despite heroic efforts by a talented cadre of 459investigators to scrutinize the responses between primary visual cortex and the 460highest echelons of inferior temporal cortex, this part of cortex remains terra 461incognita in many ways. Visual information flows along the ventral visual stream 462from primary visual cortex into areas V2, V4, posterior and anterior parts of inferior 463temporal cortex. The cortical real estate between V2 and inferior temporal cortex 464composes a mysterious, seductive, controversial and fascinating ensemble of 465neurons whose functions remain unclear and are only beginning to be deciphered. 466 467 To solve the complex task of object recognition, the visual system seems 468to have adopted a “divide and conquer” strategy. Instead of trying to come up with 469a single function that will transform lines into complex shapes in one step, the 470computations underlying pattern recognition are implemented by a cascade of 471multiple approximately sequential computations. Each of these computations may 472be deceptively simple and yet the concatenation of such steps can lead to 473interesting and complex results. As a coarse analogy, consider a factory making 474cars. There is a long sequence of specialized areas, departments and tasks. One 475group of workers may be involved in receiving and ordering different parts, others 476may be specialized in assembling the carburetor, others in painting the exterior. 477The car is the result of all of these sequential and parallel steps. To understand 478the entire mechanistic process by which a car is made, we need to dig deeper into 479each of those specialized sub-steps. To understand the mechanisms orchestrating 480visual object recognition, we need to inspect neuronal ensembles along the ventral 481visual stream. 482 483

V.10. We cannot exhaustively study all possible visual stimuli 484 485 One of the challenges to investigate the function and preferences of 486neurons in cortex is that we have a limited amount of recording time for a given 487neuron. Given current techniques, it is simply impossible to examine the large 488

Table V.1: Response latencies in different areas in the macaque monkey (from Schmolesky et al 1998).

Area Mean(ms) S.D.(ms)LGNdMlayer 33 3.8LGNdPlayer 50 8.7V1 66 10.7V2 82 21.1V4 104 23.4V3 72 8.6MT 72 10.3MST 74 16.1FEF 75 13


13

number of possible combinations of different stimuli that might drive a neuron. 489Consider a simple scenario where we present a 5x5 image patch where each pixel 490is either black or white (Error! Reference source not found.). There are 225 such 491stimuli. If we present each stimulus for 100 ms and we do not allow for any 492intervening time in between stimuli, it would take more than 5 weeks to present all 493possible combinations. There are many more possibilities if we allow each pixel to 494have three colors (Red, Green, Blue) with an intensity between 0 and 255. We can 495typically hold extracellular recordings with single (non-chronic) electrodes for a 496couple of hours. Recent heroic efforts have managed to track the activity of 497presumably the same neuron for months (Bondar et al 2009, McMahon et al 2014). 498Yet, even with such chronic electrodes, it is difficult to keep an animal engaged in 499a visual presentation task for more than a few hours a day. Thus, investigators 500often recur to a number of astute strategies to decide which stimuli to use to 501investigate the responses of cortical neurons. These strategies typically involve a 502combination of: (i) inspiration from previous studies (past behavior of neurons in 503other studies is a good predictor of how neurons will behave in a new 504investigation); (ii) intuitions about what might matter for neurons (for example, 505many investigators have argued that real world objects such as faces should be 506important); (iii) statistics of natural stimuli (as discussed in Chapter II, it is 507reasonable to assume that neuronal tuning is sculpted by exposure to images in 508the natural world); (iv) computational models (to be discussed in more detail in 509Chapters VII-IX); (v) serendipity (the role of rigorous scrutiny and systematic 510observation combined with luck should not be underestimated). Combining these 511approaches, several investigators have begun probing the neural code for visual 512shapes along the ventral visual cortex. 513

514V.11. We live in the visual past: response latencies increase along the ventral 515

stream 516 517

Figure V-8.The curse of dimensionality in vision. With current techniques, we cannot exhaustively sample all possible stimuli. Here we consider a 5x5 grid of possible binary images (top) or possible grayscale images (bottom). Even for such simple stimuli, the number of possibilities is immense.


14

Vision seems to be instantaneous. You open your eyes and the world is 518out there, apparently immediately. Visual processing is indeed very fast. Indeed, 519we argued in Chapter I that the speed of vision is likely to have conferred critical 520advantages to the first species with eyes and may well constitute one of the key 521reasons why evolution led to the expansion of visual capabilities. Yet, the intuition 522that vision is instantaneous is nothing more than an illusion. It takes time for signals 523to propagate through the brain. A small fraction of this time has to do with the 524speed of propagation of signals within a neuron, along dendrites and axons. Yet, 525the within neuron delays are relatively small. In particular, action potential signals 526within axons that are insulated by myelin can propagate with speeds of about 100 527meters per second. Dendrites tend to be shorter than axons and propagation 528speeds within dendrites is also quite fast. The main reason why vision is far from 529instantaneous is the multiple computations and integration steps in each neuron 530combined with the synaptic hand-off of information from one neuron to the next. 531 532 At each processing stage in the visual system, it is possible to estimate the 533time it takes for neurons in that area to realize that a flash of light was presented. 534Response latencies to a stimulus flash within the receptive field of a neuron 535increase from ~45 ms in the LGN to ~100 ms in inferior temporal cortex (Hung et 536al, 2005, Schmolesky et al 1998) (Table 5.1). There is an increase in the average 537latency within each area from the retina to the LGN to V1, to V2, to V4, to ITC. This 538progression of latencies has further reinforced the notion of the ventral processing 539stream as an approximately hierarchical and sequential architectures. Each 540additional processing stage along the ventral stream adds an average of ~15 ms 541of computation time. It should be emphasized that these are only coarse values 542and there is a lot of neuron-to-neuron variability within each area. For example, an 543analysis in anesthetized monkeys by Schmolesky and colleagues showed 544latencies ranging from 30 ms all the way to 70 ms in primary visual cortex. Because 545of this heterogeneity, the distribution of response latencies overlap and the fastest 546neurons in a given area (say V2) may fire before the slowest neurons in an earlier 547area (say V1). The notion of sequential processing is only a coarse approximation. 548The response latencies constrain the number of computations required to perform 549computations along the visual hierarchy. 550 551 Because of these latencies, we continuously live in the past in terms of 552vision. The notion that we only see the past events is particularly evident when we 553consider distant stars. The light signals that reach the Earth have left those stars 554a long time ago. Although much less intuitive, the same idea applies to visual 555processing in the brain. Of course, the time it takes for light to bounce on a given 556object and reach the retina is negligible, yet signal propagation in the brain takes 557on the order of a hundred milliseconds as discussed above. In several cases, 558through learning, the brain might be able to account for these delays by predicting 559what will happen next. For example, how is it possible for a ping-pong player to 560respond to a smash? The ball may be moving at about 50 km/h (apparently, the 561world record is about 112 km/h) and thus traverses the ~3 m distance in about 200 562ms. By the time the opponent has to hit the ball back, his or her visual cortex are 563


15

processing sensory inputs from the time when the ball was passing the net in the 564best-case scenario. Not to mention that to orchestrate a movement also takes time 565(signals need to travel from the decision centers of the brain all the way to the 566muscles). The only way to play ping-pong and other sports is to use the visual input 567combined with predictions learnt through experience. Because of these 568predictions, players not only capitalize on smashing speed but also recur to other 569strategies such as embedding the ball with spinning effects to confuse the 570opponent. 571 572

V.12. Receptive field sizes increase along the ventral visual stream 573 574 Concomitant with the prolonged latencies, as we ascend through the visual 575hierarchy, receptive fields become larger (Error! Reference source not found.). 576Receptive fields range from below one degree in the initial steps (LGN, V1) all the 577way to several degrees or even in some cases tens of degrees in the highest 578echelons of cortex (Kobatake & Tanaka 1994, Rolls 1991). Each area has a 579complete map of the visual field; thus, the centers of the receptive fields go from 580the fovea all the way to the periphery. As discussed for primary visual cortex, within 581each area, the size of the receptive field increases as we move farther away from 582the periphery. There is always better resolution in the fovea, across all visual 583areas. The range of receptive field sizes within an area also increases with the 584mean receptive field size. The distributions are relatively narrow in primary visual 585cortex but investigators have described a wide range of receptive field sizes in V4 586or inferior temporal cortex. The scaling factor between receptive field size and 587eccentricity is more pronounced in V4 than in V2 and in V2 compared to V1. 588 589 590

V.13. What do neurons beyond V1 prefer? 591 592 There have been a few systematic parametric studies of the neuronal 593preferences in areas V2 and V4. These studies have clearly opened the doors to 594

Figure V-9.Receptive field sizes increase with eccentricity and along the ventral stream. Receptive field size increases within eccentricity for a given area. Additionally, receptive field size increases along the ventral visual stream at a fixed eccentricity. a. Experimental measurements based on neurophysiological recordings in macaque monkeys. B. Schematic rendering of receptive field sizes in areas V1, V2 and V4. Reproduced from {Freeman, 2011 #3743}.


16

investigate the complex transformations along the ventral visual stream. Despite 595multiple interesting studies comparing responses in V1, V2 and V4, there isn’t yet 596a clear unified theory of what neurons “prefer” in these higher visual areas. Of 597course, the term “prefer” is an anthropomorphism. Neurons do not prefer anything. 598They fire spikes whenever the integration of their inputs exceeds a given threshold. 599Investigators often speak about neuronal preferences in terms of what types of 600images will elicit high firing rates. 601 602 The notion that V1 neurons show a preference for orientation tuning is well 603established, even if this only accounts for part of the variance in V1 responses to 604natural stimuli (Carandini et al 2005). There is significantly less agreement as to 605the types of shape features that are encoded in V2 and V4. There have been 606several studies probing responses with stimuli that are more complex than oriented 607bars and less complex than everyday objects. These stimuli include sinusoidal 608gratings, hyperbolic gratings, polar gratings, angles formed by intersecting lines, 609curvatures with different properties, among others (Hegde & Van Essen 2003, 610Hegde & Van Essen 2007, Kobatake & Tanaka 1994, Pasupathy & Connor 2001). 611Simple stimuli such as Cartesian gratings can certainly drive responses in V2 and 612V4. As a general rule, neurons in V2 and V4 can be driven more strongly by more 613complex shapes. As discussed above in the context of latency, there is a wide 614distribution of stimulus preferences in V2 and V4. 615 616 Perhaps one of the challenges is that investigators seek an explanation of 617neural coding preferences in terms of colloquial English expressions such as 618orientation, curvature, etc. An attractive idea that is gaining momentum is the 619notion that neurons in these higher visual areas filter the inputs from previous 620stages to produce complex tuning functions that defy language-based 621descriptions. A neuron may be particularly activated by a patch representing 622complex shapes and textures that is not simply defined as an angle or a convex 623curve. Ultimately, the language of science is mathematics, not English or 624Esperanto. Neuronal tuning properties do not have to map in any direct way to a 625short language-based description. 626 627

V.14. Brains construct an interpretation of the world: the case of illusory 628contours 629

630 Another pervasive illusion is that our senses contain a veridical 631representation of exactly what is out there in the world. This notion can be readily 632debunked through the study of visual illusions (Chapter I). Let us revisit the 633Kanizsa triangle (Error! Reference source not found.) where we have the strong 634illusion of perceiving an equilateral triangle in the midst of the three pacman icons. 635The sides of the triangle near the vertices are composed of real black/white 636contours. However, the center of each side is composed of a line that does not 637really exist. These lines represent illusory contours, edges created without any 638change in luminance. It is relatively easy to “trick the eye”. Except that the eye is 639typically not tricked in most visual illusions. Visual illusions represent situations 640


17

where our brains construct an 641interpretation of the image that is 642different from the pixel level 643content. In most such illusions, 644retinal ganglions neurons do 645follow the pixel level content in 646the image relatively well. If we 647record the activity of a retinal 648ganglion neuron whose 649receptive field is right in the 650center of the illusory contour, 651nothing would happen upon 652flashing the Kanizsa figure. In 653other words, the activity of 654retinal ganglion neurons does 655not correlate with our 656perception. But if the retina does 657not reflect perception, then who 658does? It seems reasonable to 659conjecture that there must be 660neurons somewhere that 661explicitly represent the contents 662of our perception, in this case 663the illusory contours (this is a 664critical postulate that we will 665

discuss again in more depth when we take up the question of the neuronal 666correlates of consciousness in Chapter X). 667 668 Indeed, neurons in area V2 respond vigorously to illusory contours (Error! 669Reference source not found.). These V2 neurons respond almost equally well to 670an illusory line or to a real line (Lee 2003, Lee & Nguyen 2001, von der Heydt et al 6711984). The responses to illusory contours are remarkable because there is no 672contrast change within the neuron’s receptive field. Hence, these responses must 673indicate a form of context modulation that is consistent with the subjective 674interpretation of borders. There are also neurons in V1 that respond to illusory 675contours but there are more such neurons in V2. 676 677

V.15. A colorful V4 678 679 Neurons in area V4 are particularly sensitive to stimulus color (Zeki 1983). 680Neurons in area V4 demonstrate sensitivity to color properties that are more 681complex than those observed in earlier areas such as LGN parvocellular neurons 682or V1 blobs. Neurons in V4 have been implicated in the phenomenon of color 683constancy whereby an object’s color is relatively insensitive to large changes in 684the illumination, in contrast to the responses earlier in the visual system. 685 686

Figure V-10. V2 neurons can represent lines that do not exist except in the eyes of the beholder. The figure shows the Kanizsa triangle visual illusion and a schematic rendering of neurophysiological recordings from 4 neurons: two retinal ganglion neurons (RGC) and two V2 neurons. When the receptive fields (gray dotted circles) encompass locations that have a real contour (A), both RGC and V2 neurons fire vigorously. In contrast, when the receptive fields encompass an illusory contour (B), the V2 neuron fires vigorously but the RGC neuron only fires a few baseline spikes.


18

V.16. Attentional modulation 687 688 As discussed for V1, neurons along the ventral visual cortex receive 689massive top-down signals in addition to their bottom-up inputs (Markov et al 2012). 690Presumably through these top-down signaling mechanisms, the activity of neurons 691along ventral visual cortex can be strongly modulated by signals beyond the 692specific content of their receptive fields including spatial context and higher level 693cognitive influences such as task goals. 694 695 A prime example of this type of modulation involves spatial attention 696(Desimone & Duncan 1995, Reynolds & Chelazzi 2004). Importantly, spatial 697attention effects can be demonstrated outside of the fixation focus. That is, a 698subject can be looking at one place and paying attention to another place, a 699phenomenon known as covert attention (as opposed to overt attention which is the 700more common scenario where attention is allocated to the fixated area). Through 701a series of astute training paradigms, investigators have been able to train animals 702to deploy covert attention, thus enabling them to investigate the consequences of 703spatial attention on neurons with receptive fields outside the fovea. Neurons 704typically show an enhancement in the responses when their receptive field is within 705the locus of attention. The magnitude of this attentional effect follows the reverse 706hierarchical order, being significantly stronger in area V4 compared to area V1. 707 708

V.17. Summary 709 710

• Visual computations transpire in the six-layered neocortical structure 711 712

• Cortex is characterized by stereotypical connectivity patterns from 713one area to the next forming approximately canonical microcircuits 714

715• The gold standard to study cortical function is to scrutinize the activity 716

of individual neurons 717 718

• Neurons in primary visual cortex show orientation tuning, responding 719more strongly to a bar in a certain orientation within the receptive 720field 721

722• Complex neurons in primary visual cortex show tolerance to the 723

exact position of the preferred stimulus within the receptive field 724 725

• The responses of neurons in primary visual cortex can be 726phenomenologically fitted by Gabor functions 727

728• A model posits that V1 simple cell receptive fields can be created by 729

adequately combining the outputs of center-surround neurons from 730the lateral geniculate nucleus positioned to create the desired 731orientation 732


19

733• A model posits that V1 complex cell receptive fields can be created 734

by adequately combining the outputs of V1 simple cells with the 735same orientation preferences but slightly shifted receptive fields 736

737• Visual cortex uses a divide and conquer strategy, subdividing visual 738

processing into a sequence of computations in tens of different brain 739areas arranged into an approximate hierarchy 740

741• Ascending through the visual hierarchy, neurons show increased 742

receptive field sizes, more complex tuning preferences, and longer 743latencies 744

745 746

V.18. References 747 748Adrian, E. (1926). The impulses produced by sensory nerve endings. Part 2: The749responseofasingleend-organ.JournalofPhysiology61,151-171.750Brodmann,K.(1909).VergleichendeLokalisationslehrederGrosshirnnrindeinihren751PrinzipiendargestelltaufGrunddesZellenbaues(Leipzig:Barth).752Carandini,M.,Demb,J.B.,Mante,V.,Tolhurst,D.J.,Dan,Y.,Olshausen,B.A.,Gallant,J.L.,753andRust,N.C.(2005).Doweknowwhattheearlyvisualsystemdoes?JNeurosci25,75410577-10597.755Dayan,P.,andAbbott,L.(2001).TheoreticalNeuroscience(Cambridge:MITPress).756DeValois,R.L.,Albrecht,D.G.,andThorell,L.G.(1982).Spatialfrequencyselectivityof757cellsinmacaquevisualcortex.VisionRes22,545-559.758Felleman,D.J.,andVanEssen,D.C.(1991).Distributedhierarchicalprocessinginthe759primatecerebralcortex.CerebralCortex1,1-47.760Finger, S. (2000). Minds behind the brain. A history of the pioneers and their761discoveries.(NewYork:OxfordUniversityPress).762Holmes, G. (1918). Disturbances of vision by cerebral lesions. British Journal of763Ophthalmology2,353-384.764Hubel,D.H.,andWiesel,T.N.(1998).Earlyexplorationofthevisualcortex.Neuron20,765401-412.766Kandel,E.,Schwartz, J.,andJessell,T.(2000).PrinciplesofNeuralScience,4thedn767(NewYork:McGraw-Hill).768Kreiman, G. (2004). Neural coding: computational and biophysical perspectives.769PhysicsofLifeReviews1,71-102.770Serre, T., Kreiman, G., Kouh,M., Cadieu, C., Knoblich, U., and Poggio, T. (2007). A771quantitativetheoryofimmediatevisualrecognition.ProgressInBrainResearch165C,77233-56.773774 775 776

Chapter V Adventures into terra incognita: probing the...

Documents

Transcript of Chapter V Adventures into terra incognita: probing the...