Computational theory of the responses of V1 & MT...


Computational theory of the responses of V1 & MT neurons and psychophysics of motion perception

~50,000 neurons per cubic mm

~6,000 synapses per neuron

~10 billion neurons & ~60 trillion synapses in cortex

Neural circuits perform computations

Computational theory: how do neurons compute motion?

Hubel & Wiesel (1968)

V1 orientation tuning

No stimulus in receptive field: no response

Non-preferred stimulus: no response

Preferred stimulus: large response

Orientation selectivity model

stimulus → linear weighting function → rectification → firing rate

Complementary receptive fields

Rectification and squaring

Rectification and spiking threshold
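The model on these slides is a linear/nonlinear (LN) cascade. Here is a minimal sketch of it in Python; the filter weights and bar stimuli are toy assumptions, not the lecture's specific values:

```python
# A minimal sketch of the LN orientation-selectivity model on these slides:
# stimulus -> linear weighting -> rectification (and squaring) -> firing rate.
# Weights and stimuli are toy assumptions.
import numpy as np

def ln_response(stimulus, weights):
    """Firing rate: half-wave rectified, squared linear weighting."""
    linear = float(np.dot(weights, stimulus))  # linear weighting function
    rectified = max(linear, 0.0)               # rectification / spike threshold
    return rectified ** 2                      # squaring

# Toy 1D receptive field: excitatory center, inhibitory flanks.
weights = np.array([-1.0, -1.0, 2.0, 2.0, 2.0, -1.0, -1.0])

preferred = np.array([0, 0, 1, 1, 1, 0, 0], dtype=float)      # bar on center
non_preferred = np.array([1, 1, 0, 0, 0, 1, 1], dtype=float)  # bar on flanks
no_stimulus = np.zeros(7)

print(ln_response(no_stimulus, weights))     # 0.0: no stimulus, no response
print(ln_response(non_preferred, weights))   # 0.0: rectified away
print(ln_response(preferred, weights))       # 36.0: large response
```

The rectification step plays the role of the spiking threshold: negative linear responses are simply not expressed in the firing rate.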

Stimulus: vertical bar

Responses of each of several orientation-tuned neurons.

Peak (distribution mean) codes for stimulus orientation.

Distributed representation of orientation

Broad tuning can code for small changes

Neural code depends on multiple factors
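To make the distributed-representation slides concrete, here is a hedged sketch of decoding orientation from a population of broadly tuned neurons. The tuning curves, their width, and the vector-average readout are illustrative assumptions:

```python
# Reading out orientation from a population of broadly tuned neurons.
# Tuning form, width, and the vector-average decoder are assumptions.
import numpy as np

prefs = np.deg2rad(np.arange(0, 180, 15))   # preferred orientations (12 cells)
sigma = np.deg2rad(30.0)                    # broad orientation tuning

def population_rates(theta):
    """Circular-Gaussian tuning; orientation wraps every 180 degrees."""
    diff = np.angle(np.exp(2j * (theta - prefs))) / 2   # wrapped difference
    return np.exp(-0.5 * (diff / sigma) ** 2)

def decode(rates):
    """Mean of the population distribution, taken on the doubled angle."""
    z = np.sum(rates * np.exp(2j * prefs))
    return np.rad2deg(np.angle(z) / 2) % 180

theta = np.deg2rad(37.0)                 # not the preference of any one neuron
print(decode(population_rates(theta)))   # ~37: broad tuning still codes
                                         # small orientation changes
```

Even though each neuron is tuned far more broadly than 1 degree, the peak (mean) of the population distribution tracks small orientation changes, which is the point of the "broad tuning can code for small changes" slide.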

Direction selectivity

Hubel & Wiesel (1968)

Motion is like orientation in space-time and spatiotemporally oriented filters can be used to detect and measure it.

Orientation in space-time

Adelson & Bergen (1985)
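A toy illustration of this idea: a dot translating at constant speed traces an oriented line in the x-t plane, so its space-time orientation (slope) is its speed. The frame counts, sizes, and speed below are assumptions:

```python
# "Motion is orientation in space-time": a moving dot traces a line in the
# x-t plane whose slope equals its speed. All values are toy assumptions.
import numpy as np

n_frames, n_pixels, v = 32, 64, 1.5        # speed in pixels per frame
xt = np.zeros((n_frames, n_pixels))        # rows = time, columns = space
for t in range(n_frames):
    xt[t, int(round(v * t))] = 1.0         # dot position at each frame

# The space-time orientation (slope of the trajectory) recovers the speed.
times, positions = np.nonzero(xt)
slope = np.polyfit(times, positions, 1)[0]
print(slope)                               # ~1.5
```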

Motion is orientation in space-time (X, Y, T)

Strong response for motion in preferred direction.

Weak response for motion in non-preferred direction.

Direction selectivity model

Ohzawa, DeAngelis, & Freeman

Space-time receptive field

[Figure: receptive field plotted in x-t; its space-time slope corresponds to the preferred speed]

Population code

Distributed representation of speed

Each spatiotemporal filter computes something like a derivative of image intensity in space and/or time. "Perceived speed" is the orientation corresponding to the gradient in space-time (max response).
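A minimal sketch of this derivative reading: for a 1D pattern translating at speed v, I(x, t) = f(x - v t), so I_x v + I_t = 0 and v = -I_t / I_x. The sinusoidal stimulus and step sizes are assumptions:

```python
# Gradient-based speed estimate for a translating 1D signal.
# The stimulus and sampling parameters are toy assumptions.
import numpy as np

x = np.linspace(0, 2 * np.pi, 200, endpoint=False)
dt, v_true = 0.01, 3.0
frame0 = np.sin(x)                      # I(x, t)
frame1 = np.sin(x - v_true * dt)        # I(x, t + dt): shifted by v*dt

I_x = np.gradient(frame0, x)            # spatial derivative
I_t = (frame1 - frame0) / dt            # temporal derivative

# Least-squares combination over positions (avoids dividing where I_x ~ 0).
v_est = -np.sum(I_x * I_t) / np.sum(I_x ** 2)
print(v_est)                            # ~3.0: the true speed
```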

Impulse response

Strong response to preferred direction

Note: negative responses not seen in neural firing rates

Weak response to opposite direction
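A compact sketch in the spirit of the Adelson & Bergen (1985) motion-energy model: a quadrature pair of space-time oriented filters gives a strong, phase-invariant response to the preferred direction and a weak response to the opposite one. The squaring also addresses the note above, since energies are non-negative like firing rates. Filter sizes and frequencies are illustrative assumptions:

```python
# Motion energy from a quadrature pair of space-time oriented filters,
# in the spirit of Adelson & Bergen (1985). Parameters are assumptions.
import numpy as np

x = np.linspace(-1, 1, 64)
t = np.linspace(-1, 1, 64)
X, T = np.meshgrid(x, t)

fx, ft = 4.0, 4.0                                        # center frequencies
envelope = np.exp(-(X**2 + T**2) / 0.18)
even = envelope * np.cos(2 * np.pi * (fx * X + ft * T))  # oriented in x-t
odd = envelope * np.sin(2 * np.pi * (fx * X + ft * T))   # 90 deg phase shift

def motion_energy(stimulus):
    """Sum of squared quadrature responses (insensitive to stimulus phase)."""
    return np.sum(even * stimulus) ** 2 + np.sum(odd * stimulus) ** 2

preferred = np.cos(2 * np.pi * (fx * X + ft * T))  # drifts at preferred velocity
opposite = np.cos(2 * np.pi * (fx * X - ft * T))   # same speed, other direction

print(motion_energy(preferred))   # strong response
print(motion_energy(opposite))    # near zero: direction selective
```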

“Aperture Problem”


Fig. 1. Three different motions that produce the same physical stimulus.

…moves to the left. Note that in all three cases the appearance of the moving grating, as seen through the window, is identical: the bars appear to move up and to the left, normal to their own orientation, as if produced by the arrangement shown in Fig. 1A. The fact that a single stimulus can have many interpretations derives from the structure of the stimulus rather than from any quirk of the visual system. Any motion parallel to a grating's bars is invisible, and only motion normal to the bars can be detected. Thus, there will always be a family of real motions in two dimensions that can give rise to the same motion of an isolated contour or grating (Wohlgemuth, 1911; Wallach, 1935; Fennema and Thompson, 1979; Marr and Ullman, 1981).

[Wallach 1935; Horn & Schunck 1981; Marr & Ullman 1981]

Figure: Movshon, Adelson, Gizzi, Newsome, 1985

The “aperture problem”

These three motions are different but look the same when viewed through a small aperture (i.e., that of a direction-selective receptive field).

Wallach (1935)

Intersection-of-constraints (IOC)


Fig. 4. A single grating (A) and a 90 deg plaid (B), and the representation of their motions in velocity space. Both patterns move directly to the right, but have different orientations and 1-D motions. The dashed lines indicate the families of possible motions for each component.

…in spatial extent, and uniformly stimulated the entire retinal region they covered. This sidesteps the issue which arises in considering stimuli like the diamond of Fig. 2, of how the identification of spatially separate moving borders with a common object takes place. Moreover, the plaid patterns were the literal physical sum of the grating patterns, which makes superposition models particularly simple to evaluate.

These stimuli were generated by a PDP11 computer on the face of a display oscilloscope, using modifications of methods that are well-established (Movshon et al., 1978). Gratings were generated by modulat…

[Adelson & Movshon, 1982]


Intersection of constraints

With two different motion components within the aperture, there is a unique solution:

Adelson & Movshon (1981)
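A hedged sketch of the IOC computation itself: each component grating constrains only the velocity component along its normal (n · v = s), and two non-parallel constraints meet at a unique 2D velocity. The example plaid is an assumption chosen to move rightward at 1 deg/s:

```python
# Intersection of constraints: solve n_k . v = s_k for the 2D velocity.
# The example plaid's normals and speeds are assumptions.
import numpy as np

def ioc_velocity(normal_angles_deg, normal_speeds):
    """Solve n_k . v = s_k for v (least squares if more than two components)."""
    angles = np.deg2rad(normal_angles_deg)
    normals = np.column_stack([np.cos(angles), np.sin(angles)])
    v, *_ = np.linalg.lstsq(normals, np.asarray(normal_speeds, float),
                            rcond=None)
    return v

# 90-degree plaid: component normals at +45 and -45 degrees, each grating
# drifting at 1/sqrt(2) deg/s along its normal.
print(ioc_velocity([45.0, -45.0], [np.sqrt(0.5), np.sqrt(0.5)]))  # ~[1, 0]
```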

Component vs. pattern motion (perception)

Adelson & Movshon (1981)

grating + grating = weak pattern-motion percept

grating + grating = strong pattern-motion percept

Adelson & Movshon (1981)

Component vs. pattern motion (perception)

component-motion cell: grating component moving up-right => strong response

pattern-motion cell: pattern moving up-right => strong response

Component vs. pattern motion selectivity

[Polar direction-tuning plots, panels A-H: V1 and MT neurons tested with gratings and plaids; data from Movshon et al., 1983; model from Simoncelli & Heeger, 1998]

Component vs. pattern motion: single neurons

[Stimulus conditions: component gratings, adapted-direction plaids, mixed-direction plaids]

Component vs. pattern motion: fMRI adaptation

Huk & Heeger (2002)


…multiples of 90° from trial to trial to minimize adaptation. The response modulations in MT+ were not significantly different from zero (A.C.H., p = 0.35; D.J.H., p = 0.74, two-tailed t-test), demonstrating that the two blocks of plaids elicited similar response levels when the effects of both component- and pattern-motion adaptation were absent. This result provides further evidence that adaptation, due to the repetition of pattern direction, was the key factor in the original experiment.

DISCUSSION

Our findings demonstrate that human MT+ contains a population of pattern-motion cells and that the activity of those neurons is linked to the perception of coherent pattern motion. The pattern-motion responsivity of human MT+ adds to the case for a homology to macaque MT, which includes a relatively large proportion of pattern-motion cells1. We also observed lesser degrees of pattern-motion adaptation in V2, V3, V3A and V4v. Macaque V3 is known to have a minority of pattern-motion cells13, but there are no published investigations of pattern-motion cells in macaque V2, V3A or V4. Although our data demonstrate pattern-motion responses in each of these visual areas, we cannot determine if pattern motion is computed separately in each visual area or if the responses in V2–V4 are affected by the adaptation that is taking place in MT+. We emphasize that fMRI adaptation studies14–17 can reveal the selectivities of subpopulations of neurons in the human brain, even when those neurons are intermingled at a spatial scale that is finer than the spatial sampling resolution (voxel size) of the fMRI measurements.

METHODS

We collected fMRI data in 3 subjects, males, 25–39 years old, all with normal or corrected-to-normal vision. Experiments were undertaken with the written consent of each subject, and in compliance with the safety guidelines for MR research. Each subject participated in several scanning sessions: one to obtain a high-resolution anatomical volume, one to identify MT+, one to identify the retinotopically organized cortical visual areas, 2–3 to measure motion adaptation, one to measure baseline responses and 1–3 to perform control measurements. In each subject, we collected 8–20 repeats of the pattern-motion adaptation experiment and 8–16 repeats of the various control experiments.

Stimulus and protocol. Stimuli were presented on a flat-panel display (NEC, Multisynch LCD 2000, Itasca, Illinois) placed within a Faraday box with a conducting glass front, positioned near the subjects' feet. Subjects lay on their backs in the MR scanner and viewed the display through binoculars.

Subjects viewed a pair of circular patches 12° in diameter centered 7.5° to the left and right of a central fixation point. Patches were filled with a plaid stimulus comprised of two superimposed sinusoidal gratings. Individual component gratings had 20% contrast, and spatial and temporal frequencies were selected to yield a variety of pattern directions when superimposed in various combinations (Fig. 1).

Each scan consisted of 6 (32-s) cycles; each cycle consisted of alternating adapted-direction and mixed-direction blocks. Adapted-direction blocks consisted of 8 consecutive trials in which the plaid stimulus always appeared to move in the same direction (horizontally, at 12.9 or 1.9°/s; Fig. 1a); mixed-direction blocks consisted of 8 trials in which the direction of the plaids varied from trial to trial (possible plaid directions, computed from the intersection-of-constraints of the component gratings, were ±31°, ±123° from horizontal at 5.3°/s and 6.3°/s; Fig. 1b). The component gratings with orientations of ±72° had spatial frequencies of 0.5 cycles/degree and temporal frequencies of 2 cycles/second (Fig. 1a, components above first plaid); the component gratings with orientations ±45° had spatial frequencies of 0.5 cycles/degree and temporal frequencies of 0.67 cycles/second (Fig. 1a, components above second plaid). In the component-motion experiment, perceptual transparency was achieved by scaling one component's spatial frequency up to 1 cycle/degree and the other down to 0.125 cycle/degree, producing a 3-octave separation. (Temporal frequencies were also scaled accordingly to leave component velocities unchanged.)

To control attention, subjects performed a speed discrimination judgment on each stimulus presentation16. Each 2-s trial consisted of 1300 ms of plaid motion followed by a 700-ms luminance-matched blank period during which subjects pressed a button to indicate which plaid (left or right of fixation) moved faster. The speed differences were determined by an adaptive staircase procedure, adjusting the speeds from trial to trial so that subjects would be approximately 80% correct.

Across different blocks and experiments, we chose to equate percent-correct performance, instead of the exact stimulus speed (although speeds did remain within a few percent), because in previous work, we and others have noted large attentional effects on MT+ responses, but no effects of slight differences in speed18–20. Although the speed discrimination thresholds were larger for non-coherent (transparent gratings) than for coherent plaids (but not for mixed- versus adapted-direction blocks), the differences were not very large (percent speed-increment thresholds were ~15% versus ~10% for non-coherent versus coherent, respectively). These small speed differences might affect the responses of some individual neurons (although speed tuning curves of all direction-selective cells are rather broad), but these speed differences would not be expected to evoke measurable changes in the pooled activity (as measured with fMRI) of large populations of neurons.

In the adaptation experiments, equal numbers of scans were collected with the plaids moving in opposite directions (for example, inward…

[Fig. 2 plots: (a) fMRI response (percent BOLD signal) vs. time (0–32 s), for pattern motion and component motion, with adapted-direction and mixed-direction blocks; (b) pattern adaptation index (adaptation response/baseline response) across visual areas V1, V2, V3, V4v, V3A, MT+]

Fig. 2. Pattern-motion adaptation in human visual cortex. (a) Average time series in MT+. Pattern-motion adaptation produced strong modulations in MT+ activity. Transparent component motion evoked much less adaptation. Each trace represents the average MT+ response, averaged across subjects and scanning sessions. (b) Adaptation index across all visual areas. Pattern-motion adaptation was largest in MT+, but also evident in other extrastriate visual areas. Adaptation was weak and roughly equal across visual areas in the transparent component-motion experiment. Height of bars, geometric mean across subjects (arithmetic mean yielded similar results). Error bars, bootstrap estimates of the 68% confidence intervals.


Huk & Heeger (2002)

[Plot: motion adaptation index (adaptation response/baseline response) for visual areas V1, V2, V3, V4v, V3A, MT+]

Strong pattern motion percept

Weak pattern motion percept

Pattern motion selectivity across visual areas

Pattern motion selectivity model

Simoncelli & Heeger (1998)

Intersection of constraints (two components)

Each component activates a different V1 neuron, selective for a different orientation and speed.

Intersection of constraints (many components)

Each component activates a different V1 neuron, selective for a different orientation and speed.

How do you get selectivity for the moving pattern as a whole, not the individual components?


Answer: For each possible 2D velocity, add up the responses of those V1 neurons whose preferred orientation and speed is consistent with that 2D velocity.
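A minimal sketch of that answer, not the Simoncelli & Heeger (1998) model itself: it assumes toy "V1 responses" to a 90-degree plaid moving rightward, and a Gaussian weighting on the constraint n · v = s (both assumptions standing in for the model's actual operators):

```python
# For each candidate 2D velocity, sum the responses of V1 units whose
# preferred orientation and normal speed are consistent with it.
# The V1 responses and the Gaussian weighting are toy assumptions.
import numpy as np

# (normal angle deg, normal speed, response) for the two component gratings.
v1_units = [(45.0, np.sqrt(0.5), 1.0),
            (-45.0, np.sqrt(0.5), 1.0)]

def mt_response(v, units, sigma=0.1):
    """Sum V1 responses, weighted by consistency with candidate velocity v."""
    total = 0.0
    for angle_deg, s, r in units:
        a = np.deg2rad(angle_deg)
        err = np.cos(a) * v[0] + np.sin(a) * v[1] - s   # constraint violation
        total += r * np.exp(-0.5 * (err / sigma) ** 2)
    return total

# MT population tiling velocity space; the peak is the pattern velocity.
vx, vy = np.meshgrid(np.linspace(-2, 2, 81), np.linspace(-2, 2, 81))
pop = np.vectorize(lambda a, b: mt_response((a, b), v1_units))(vx, vy)
i, j = np.unravel_index(np.argmax(pop), pop.shape)
print(vx[i, j], vy[i, j])    # ~ (1.0, 0.0): the plaid's 2D pattern motion
```

The peak of the resulting MT population map sits at the plaid's pattern velocity, not at either component's normal velocity, which is the signature of a pattern-motion cell population.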

Neural implementation of IOC

[Figure: space-time (x, y, t) filter diagram]

Construction of MT pattern-cell velocity selectivity via combination of V1 complex-cell afferents, shown in the Fourier domain.

Linear velocity selectivity

Simoncelli, 1993

Add spectral energy on plane

Subtract spectral energy off plane

Spatiotemporal frequency domain

Spatiotemporal frequency response of space-time oriented linear filter.

Frequency responses of filters that are all consistent with one velocity.
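The geometry behind this slide can be checked directly: a pattern translating at (vx, vy) has all of its spectral energy on the plane ωt + vx·ωx + vy·ωy = 0, so a filter is "consistent with one velocity" when its center frequency lies on that plane. The filter centers below are illustrative assumptions:

```python
# Check whether spatiotemporal filter center frequencies lie on the
# spectral plane of a given translation velocity. Centers are assumptions.
import numpy as np

def on_velocity_plane(center, velocity, tol=1e-9):
    """True if the center frequency (wx, wy, wt) lies on the velocity plane."""
    wx, wy, wt = center
    vx, vy = velocity
    return abs(wt + vx * wx + vy * wy) < tol

velocity = (1.0, 0.0)          # rightward motion
filters = [
    (4.0, 0.0, -4.0),          # consistent with the velocity
    (2.0, 3.0, -2.0),          # consistent (wy is free when vy = 0)
    (4.0, 0.0, 4.0),           # opposite direction: off the plane
]
for c in filters:
    print(c, on_velocity_plane(c, velocity))
```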

!"#$%&&

!'()%*$+"*,$-%./0*"1'2,*,$

!!

!"!#

!

$!

$"

3/45*,%6/2"72"6/8-92,*"6"2,%:8/;:*88%;%,"/;*%,"#"*/,6<

'2,*,$=./0#.*>%:*,"+%*4#$%?@%./0*"1:/4#*,<

A7/72.#"*/,/8"+%6%4%0+#,*6467;/@*:%6#:*6";*52"%:;%7;%6%,?"#"*/,/8@%./0*"1<

AB-CDEFGG

MT

Linear weightingTuning

Simoncelli, 1993

!"#$% &&

!'( )%*$+"*,$ -%./0*"1 '2,*,$

!!

!"!#

!

$!

$"

3/45*,%6 /2"72"6 /8 -9 2,*"6 "2,%: 8/; :*88%;%," /;*%,"#"*/,6<

'2,*,$= ./0#.*>%: *, "+% *4#$%?@%./0*"1 :/4#*,<

A 7/72.#"*/, /8 "+%6% 4%0+#,*646 7;/@*:%6 # :*6";*52"%: ;%7;%6%,?"#"*/, /8 @%./0*"1<

AB-CD EFG G

MT

Linear weighting Tuning

Simoncelli, 1993

!"#$% &&

!'( )%*$+"*,$ -%./0*"1 '2,*,$

!!

!"!#

!

$!

$"

3/45*,%6 /2"72"6 /8 -9 2,*"6 "2,%: 8/; :*88%;%," /;*%,"#"*/,6<

'2,*,$= ./0#.*>%: *, "+% *4#$%?@%./0*"1 :/4#*,<

A 7/72.#"*/, /8 "+%6% 4%0+#,*646 7;/@*:%6 # :*6";*52"%: ;%7;%6%,?"#"*/, /8 @%./0*"1<

AB-CD EFG G

MT

Linear weighting Tuning

Simoncelli, 1993

Distributed representation of 2D velocity

Brightness at each location represents the firing rate of a single MT neuron with a different preferred velocity. Location of peak corresponds to perceived velocity.


Visual motion ambiguity

[Velocity-space (Vx, Vy) diagrams: a single component's constraint line leaves the 2D velocity ambiguous]

[Plots: perceived direction bias (degrees) vs. log contrast ratio, for a human subject and for the model, at total contrasts of 5%, 10%, 20%, 40%; Stone, Watson, & Mulligan, 1990]

IOC failure

[Stone et al., 1990]

Bias in perceived velocity

Stone, Watson, & Mulligan (1990)

Perception is our best guess as to what is in the world, given our current sensory input and our prior experience (Helmholtz, 1866).

Goal: explain “mistakes” in perception as “optimal” solutions given the statistics of the environment.

Bayesian models of perception

Perception combines current sensory measurements with memory (prior experience).

[Velocity-space (Vx, Vy) panels: constraint distributions for one and two motion components; a single component leaves the velocity ambiguous]

Bayesian posteriors

Prior bias for slower speeds

Simoncelli (1993)

Bayesian estimation of velocity

Likelihood P(m|v): the probability of the measurement m, plotted as a function of velocity v.

Bayesian perception

posterior

v

pro

bab

ility

P(m|v) P(v) ~ P(v|m)x

prior

Bayesian estimation of velocity

prior

Bayesian estimation of velocity

The estimate v̂ is the peak of the posterior; the prior for slow speeds pulls v̂ toward slower velocities.
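A hedged sketch of the whole estimation chain, with Gaussian likelihood and prior and a contrast-dependent likelihood width; these functional forms are assumptions in the spirit of Weiss, Simoncelli & Adelson (2002), not the lecture's exact model. It reproduces the qualitative prediction: lower contrast gives a broader likelihood, so the slow-speed prior biases the estimate more.

```python
# Bayesian velocity estimation with a slow-speed prior.
# Gaussian forms and the contrast-dependent noise model are assumptions.
import numpy as np

v = np.linspace(0, 5, 1001)              # candidate speeds
v_true = 2.0
prior = np.exp(-0.5 * v ** 2)            # prior peaked at zero speed

def map_estimate(contrast):
    """Lower contrast -> noisier measurement -> broader likelihood."""
    sigma = 0.25 / contrast                            # assumed noise model
    likelihood = np.exp(-0.5 * ((v - v_true) / sigma) ** 2)
    posterior = likelihood * prior                     # P(v|m) ~ P(m|v) P(v)
    return v[np.argmax(posterior)]                     # v-hat: posterior peak

print(map_estimate(1.0))   # ~1.9: high contrast, nearly veridical
print(map_estimate(0.1))   # ~0.3: low contrast, biased toward slow speeds
```

The same mechanism, applied in 2D velocity space, biases perceived direction when the two components of a plaid differ in contrast, which is the pattern in the Stone et al. (1990) data above.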

[Panels: stimulus, idealization, and model prediction, each shown in velocity space (Vx, Vy)]

[Simoncelli & Heeger, ARVO ‘92]

Bayesian model predictions


[Plots: perceived direction bias (deg) vs. log contrast ratio, human observers (Stone et al., 1990) alongside the model (Heeger & Simoncelli, 1993), at total contrasts of 5%, 10%, 20%, 40%]

[Simoncelli & Heeger, ARVO '92]

A prior for slow speeds explains the perceptual bias.

Bayesian model predictions

[Model fits to behavioral data: relative perceived speed vs. log contrast ratio (max contrast 40% and 70%); direction bias (degrees) vs. log2 contrast ratio (5% and 40% contrast); percent correct for feature motion and normal motion vs. contrast; perceived direction (degrees) across conditions, relative to the IOC prediction; judged plaid direction vs. plaid component separation (degrees), with vector-average (VA) and IOC predictions; percentage of responses in the VA direction vs. ratio of component speeds]

Stone & Thompson, '90; Stone et al., '90; Lorenceau et al., '92; Yo & Wilson, '92; Burke & Wenderoth, '93; Bowns, '96

[Weiss, Simoncelli, Adelson, ‘02]

Weiss, Simoncelli, & Adelson (2002); see also Stocker & Simoncelli (2006)

Theory fits lots of behavioral data

The “big picture”

• Functional specialization and computational theory (two balancing principles in the field).

• Canonical computation (linear sum, threshold or sigmoid nonlinearity, adaptation).

• Perception is an inference that has evolved/developed to match the statistics of the environment (Bayesian estimation with priors that embody the statistics of the environment).

A computational theory of motion appearance

L+M+S, L-M, L+M-S

A computational theory of color appearance

What distinguishes neural activity that underlies conscious visual appearance?

- Neural activity in certain brain areas.

- Activity of specific subtypes of neurons.

- Particular temporal patterns of neural activity (e.g., oscillations).

- Synchronous activity across groups of neurons in different brain areas.

- Neural activity that is driven by a coherent combination of bottom-up sensory information and top-down recurrent processing (e.g., linked to attention).

- Nothing. Once you know the computations, you’re done!