Learning sparse representations of static and time-varying natural images

Transcript of slides: helper.ipam.ucla.edu › publications › mgaws1 › mgaws1_5063.pdf

Page 1:

Learning sparse representations of static and time-varying natural images

Bruno A. Olshausen

Center for Neuroscience, U.C. Davis &

Redwood Neuroscience Institute, Menlo Park, CA

Page 2:

Main Points

• Vision as inference

• Learning sparse, overcomplete image representations

• Sparse coding in V1

• Learning shiftable basis functions

Page 3:

Page 4:

Recurrent computation is pervasive throughout cortex

V4

LGN

V1 V2

IT

retina

Page 5:

Vision as inference

[Diagram: the World projects through a lens to an Image; a Model infers the World from the Image]

Page 6:

How do you interpret an edge?

Page 7:

Page 8:

Lightness perception depends on 3D scene layout

Page 9:

Bayes’ rule

P(E|D) ∝ P(D|E) × P(E)

where P(D|E) describes how data are generated by the environment (the likelihood), and P(E) encodes prior beliefs about the environment.

E = the actual state of the environment
D = data about the environment
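As a toy illustration of the rule above (all numbers invented), the posterior over two candidate interpretations of the same data can be computed by multiplying likelihood and prior, then normalizing:

```python
# Hypothetical numbers: two candidate environment states E explaining data D
p_E = {"edge": 0.7, "shadow": 0.3}          # prior P(E)
p_D_given_E = {"edge": 0.4, "shadow": 0.9}  # likelihood P(D|E)

# Unnormalized posterior: P(E|D) ∝ P(D|E) P(E)
unnorm = {E: p_D_given_E[E] * p_E[E] for E in p_E}
Z = sum(unnorm.values())
posterior = {E: v / Z for E, v in unnorm.items()}
```

Even though "shadow" explains the data better, the prior keeps "edge" slightly more probable here, which is the point of the rule.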

Page 10:

Sparse coding

[Diagram: image I(x,y) encoded as sparse coefficients a_i]

• Provides a simple description of images

• Makes image structure explicit → Grouping

• Makes it easier to learn associations

• Field’s (1987) analysis of simple-cell receptive fields suggests they have been optimized for sparseness.

Page 11:

Image model

I(x, y) = ∑_i a_i φ_i(x, y) + ν(x, y)

P(a|I, θ) ∝ P(I|a, θ) P(a|θ)

P(I|θ) = ∫ P(I|a, θ) P(a|θ) da

Goal: Find a set of basis functions {φ_i} for representing natural images such that the coefficients a_i are as sparse and statistically independent as possible.
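A minimal numeric sketch of this generative model, with random stand-ins for the basis functions and sizes borrowed from the slides (12×12 = 144 pixels, 200 basis functions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_basis = 12 * 12, 200                  # sizes from the slides
phi = rng.standard_normal((n_pix, n_basis))    # basis functions (random stand-ins)

a = np.zeros(n_basis)                          # sparse coefficients: only 5 active
active = rng.choice(n_basis, size=5, replace=False)
a[active] = rng.standard_normal(5)

noise = 0.01 * rng.standard_normal(n_pix)      # nu(x, y)
I = phi @ a + noise                            # I = sum_i a_i phi_i + nu
```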

Page 12:

Prior

• Factorial: P(a|θ) = ∏_i P(a_i|θ)

• Sparse: P(a_i|θ) = (1/Z_S) e^(−S(a_i))

[Plot: P(a_i) vs. a_i — sharply peaked at zero with heavy tails]
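The shape of such a sparse prior can be checked numerically. With S(a_i) = |a_i| the prior is a Laplacian (one possible choice, not necessarily the one used in the talk), which is far more peaked and heavy-tailed than a Gaussian, as measured by excess kurtosis:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
sparse = rng.laplace(size=n)     # P(a) ∝ exp(-|a|): peaked at 0, heavy tails
gauss = rng.standard_normal(n)   # Gaussian baseline

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (0 for a Gaussian)."""
    x = x - x.mean()
    return (x**4).mean() / (x**2).mean() ** 2 - 3.0
```

For the Laplacian, excess kurtosis is about 3; for the Gaussian it is near 0.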

Page 13:

Objective functions for inference and learning

Inference (perception):

P (a|I, θ) ∝ P (I|a, θ) P (a|θ)

Learning:

⟨log P(I|θ)⟩ = ⟨ log ∫ P(I|a, θ) P(a|θ) da ⟩

Page 14:

Energy function

E = −log P(a|I, θ)

  = (λ_N / 2) ∑_{x,y} [ I(x, y) − ∑_i a_i φ_i(x, y) ]² + ∑_i S(a_i) + const.
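A sketch of this energy in code, using S(a) = log(1 + a²) as the sparsity cost (one of the costs used in Olshausen & Field's work) and a random stand-in basis. With a noiseless, perfectly reconstructed image, only the sparsity term remains:

```python
import numpy as np

def energy(I, phi, a, S, lam=1.0):
    """E = (lam/2) * sum (I - phi @ a)^2 + sum S(a_i), constant dropped."""
    r = I - phi @ a
    return 0.5 * lam * np.sum(r**2) + np.sum(S(a))

S = lambda a: np.log1p(a**2)             # sparsity cost S(a) = log(1 + a^2)
rng = np.random.default_rng(0)
phi = rng.standard_normal((144, 200))    # random stand-in basis
a = np.zeros(200)
a[:3] = 1.0                              # three active coefficients
I = phi @ a                              # noiseless image: zero reconstruction error
E_val = energy(I, phi, a, S)             # only sparsity cost: 3 * log(2)
```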

Page 15:

Dynamics

ȧ_i ∝ −∂E/∂a_i :

τ ȧ_i = b_i − ∑_j C_ij a_j − S′(a_i)

b_i = λ_N ∑_{x,y} φ_i(x, y) I(x, y)

C_ij = λ_N ∑_{x,y} φ_i(x, y) φ_j(x, y)

Page 16:

Network implementation

[Network diagram: input I(x) drives units a_i through feedforward weights φ_i(x); units inhibit one another through −C_ij and are shaped by the sparsity term −S′]

Page 17:

Learning

Δφ_i ∝ −⟨ ∂E/∂φ_i ⟩ :

Δφ_i(x, y) = η ⟨ a_i r(x, y) ⟩

r(x, y) = I(x, y) − ∑_i a_i φ_i(x, y)
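A single learning step can be sketched as follows (one sample, so the average ⟨·⟩ collapses to that sample). With the coefficients held fixed, this update shrinks the residual by exactly the factor (1 − η ∑_i a_i²):

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((144, 20))    # random stand-in basis
a = np.zeros(20)
a[:2] = 1.0                             # two active coefficients, held fixed
I = rng.standard_normal(144)            # arbitrary image

r = I - phi @ a                         # residual r = I - sum_i a_i phi_i
eta = 0.01
phi_new = phi + eta * np.outer(r, a)    # Delta phi_i = eta * a_i * r
r_new = I - phi_new @ a                 # equals (1 - eta * sum_i a_i^2) * r
```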

Page 18:

Learned basis functions (200, 12x12)

Page 19:

Sparsification

[Figure: coefficient outputs for an image reconstruction — top: posterior maximum (sparsified); bottom: linear prediction. Axes: coefficient index (0–200) vs. output (−2 to 2)]

Page 20:

Tiling properties

[Figure panels: basis-function tiling in spatial frequency (f_x vs. f_y) and in space (x vs. y); bandwidth vs. f_r; number vs. f_r; log₂(number) vs. log₂(f_r); orientation bandwidth vs. f_r]

Page 21:

Page 22:

Scale space cross-section of a fractal contour

Page 23:

Space-time image model

I(x, y, t) = ∑_i a_i(t) ∗ φ_i(x, y, t) + ν(x, y, t)

[Diagram: time-varying coefficients a_i(t) convolved with space-time basis functions φ_i(x, y, t − t′) to produce the image sequence I(x, y, t)]

Goal: Find a set of space-time basis functions {φ_i} for representing natural images such that the time-varying coefficients a_i(t) are as sparse and statistically independent as possible over both space and time.
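A sketch of this convolutional synthesis; the 12×12×7 kernel size matches the next slide, while the number of basis functions and sequence length are invented here. Each coefficient train a_i(t) is convolved in time with its space-time kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, H, W, Tk, T = 3, 12, 12, 7, 50         # 12x12x7 kernels; rest invented
phi = rng.standard_normal((n_basis, H, W, Tk))  # space-time basis functions
a = np.zeros((n_basis, T))                      # sparse coefficient trains a_i(t)
a[0, 10] = 1.0                                  # one event at t = 10
a[1, 30] = -0.5                                 # another at t = 30

I = np.zeros((H, W, T + Tk - 1))
for i in range(n_basis):                        # I(x,y,t) = sum_i a_i(t) * phi_i(x,y,t)
    for x in range(H):
        for y in range(W):
            I[x, y] += np.convolve(a[i], phi[i, x, y])
```

Because the events do not overlap in time, the movie around t = 10 is just a copy of kernel 0.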

Page 24:

Learned space-time basis functions (200, 12 × 12 × 7)

Training set: nature documentary

Page 25:

V1 space-time receptive field

(Courtesy of Dario Ringach)

Page 26:

Basis function properties

[Figure panels: speed vs. direction (polar plot) and speed (pixels/frame) vs. spatial frequency (cy/pixel)]

Page 27:

Spike encoding and reconstruction

[Figure: coefficient amplitude vs. time (0–7 sec) — top: sparsified; bottom: convolution]

Page 28:

“Standard model” of V1 simple-cells

[Diagram: image I(x,y,t) → receptive field K(x,y,t) → linear response → pointwise non-linearity → response normalization (division by neighboring neurons' activity) → response r(t)]

Page 29:

Evidence for sparse coding

Data from Gray lab (J. Baker and S. Yen)

[Figure: actual response (PSTH) vs. predicted response, in spikes/sec over 30 sec]

Page 30:

Responses of nearby units are heterogeneous

[Figure: PSTHs (spikes/sec) for three nearby cells over 30 sec, showing heterogeneous responses]

Page 31:

Context in natural scenes sparsifies responses

Vinje & Gallant (2000, 2002)

…metabolic load by lowering mean spike rates. Furthermore, these three findings suggest that nCRF stimulation reduces the effective bandwidth of single neurons, thereby restricting the range of stimuli that they represent.

nCRF stimulation increases information transmission rate

Does this shrinkage in effective bandwidth reduce the amount of information represented by V1 neurons? If information is lost, then the stimulus representation will be coarsened rather than made sparser (Foldiak and Young, 1995; Olshausen and Field, 1997; Barlow, 2001). Information must be preserved if nCRF stimulation truly increases sparseness. Information transmission can be preserved in numerous ways. One possibility is that the overall information transmission rate might be preserved at the level of individual neurons. Alternatively, some neurons may increase their information transmission rates while other neurons transmit less information.

Information transmission rates (bits per second) for our sample of V1 neurons are shown in Figure 6A–D. For each neuron at each stimulus size, we compared information rates observed with and without nCRF stimulation. Neurons with significantly increased information rates are shown in black, while those with significantly decreased rates are shown in white (p < 0.01). The effects of natural nCRF stimulation vary across neurons. Some exhibit decreases in information transmission rates, whereas others exhibit increases. Interestingly, significant increases in information transmission rates occur more frequently than significant decreases. The ratio of significant increases to significant decreases is 3.8:1 at 2× CRF, 3.4:1 at 3× CRF, and 3.7:1 at 4× CRF.

For our sample of neurons, the average information transmission rate also increases with stimulus size (Fig. 6E). The increase in mean rate is modest but statistically significant for stimulus sizes of 2× CRF and 3× CRF (p < 0.05) and is marginally significant for stimuli of 4× CRF diameter (p = 0.07).

Table 1 shows the average information rate as a function of stimulus size and time-bin duration. In general, the average rate increases as time-bin duration decreases. From 50 msec to 4.6 msec, the information transmission rate increases by ~250%. The increase in information rates for short binning times is commonly observed in neurophysiological data sets (Strong et al., 1998) and occurs because H(r) increases more rapidly than H(r|s) as bin duration shrinks.

Our second prediction is that the average information transmission rate should not decrease as stimulus size increases. Our results demonstrate that information transmission actually increases with stimulus size. This is consistent with the predicted preservation of information. It also suggests that nCRF stimulation may be necessary to fully realize the information-processing potential of V1 neurons.

nCRF stimulation increases information per spike

As discussed in the introductory remarks, sparse coding offers several potential advantages to the nervous system. It may simplify development of neural connections, increase learning rates, and increase memory capacity (Barlow, 1961, 2001). Sparse coding also reduces the number of action potentials required to represent a scene and thereby decreases the metabolic demands of information processing (Srinivasan et al., 1982; Laughlin et al., 1998). If the system is to maintain the fidelity with which a scene is represented, this reduction in spiking activity must be accompanied by an increase in the average amount of information each spike provides about the stimulus. Thus, natural nCRF stimulation should increase the average information carried by each spike.

The average information that a spike transmits about the stimulus is found by simply dividing the information per second by the mean number of spikes per second: I_spike = I_sec/λ, where λ is the mean spike rate of the neuron for all stimuli of a given size.
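A trivial check of this relation, inverting it to recover the mean spike rate implied by the example numbers reported in the Figure 4 caption below (28.4 bits/sec and 0.67 bits/spike):

```python
# Invert I_spike = I_sec / lambda to recover the implied mean spike rate
I_sec = 28.4                  # information per second, bits/sec (from the excerpt)
I_spike = 0.67                # information per spike, bits/spike (from the excerpt)
mean_rate = I_sec / I_spike   # spikes/sec implied by the two reported values
print(round(mean_rate, 1))    # -> 42.4
```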

Information transmission per spike is shown in Figure 7A–D. Figure conventions are identical to those used in Figure 6. Stimulation of the nCRF can increase or decrease the information per spike, but the trend is strongly toward increasing the information content of spikes. The ratio of neurons with significant increases to those with significant decreases is 6.5:1 at 2× CRF and 26:1 at 3× CRF. For data obtained with stimuli of 4× CRF diameter, all significantly modulated neurons show increases in their information transmission per spike.

The mean information per spike also increases substantially as a function of stimulus size (Fig. 7E, black circles). For stimuli of 4× CRF diameter, the mean information per spike is 1.85 times larger than the value obtained with CRF-sized stimuli. All

Figure 4. The nCRF modulates responses during natural vision. A, PSTH obtained from one V1 neuron in response to a natural-vision movie confined to the CRF. Responses are weakly modulated by the simulated fixations (information per second, 13.1 bits/sec; information per spike, 0.18 bits/spike; efficiency, 10%; selectivity index, 13%). B, Responses of the same cell to a natural-vision movie composed of the CRF stimulation used in A plus a circular surrounding region. The overall stimulus size was 4× CRF diameter. Stimulation of the nCRF dramatically increases variation of responses across fixations (information per second, 28.4 bits/sec; information per spike, 0.67 bits/spike; efficiency, 26%; selectivity index, 51%). Responses to some stimuli are significantly enhanced (black bins; p < 0.01). For this neuron, enhancement is concentrated in the onset transients occurring at the beginning of simulated fixations. Other responses are strongly suppressed (white bins; p < 0.01). The under-bar highlights those time bins where significant enhancement and suppression occur.

Vinje and Gallant • nCRF Stimulation Increases Efficiency of V1 Cells J. Neurosci., April 1, 2002, 22(7):2904–2915 2909

Page 32:

Extreme sparse coding

• Gilles Laurent - mushroom body, insect

• Michael Fee - HVC, zebra finch

• Tony Zador - auditory cortex, mouse

• Bill Skaggs - hippocampus, primate

• Harvey Swadlow - motor cortex, rabbit

• Michael Brecht - barrel cortex, rat

• Christof Koch - inferotemporal cortex, human

Page 33:

Hahnloser RHR, Kozhevnikov AA, Fee MS (2002) An ultra-sparse code underlies the generation of neural sequences in a songbird. Nature, 419, 65-70.

Page 34:

Page 35:

Review article

Olshausen BA, Field DJ (2004) Sparse coding of sensory inputs. Current Opinion in Neurobiology, 14, 481-487.

http://redwood.ucdavis.edu/bruno

Page 36:

Problems with the current model

• Sparsification: small changes in the image could lead to drastic changes in the output representation.

• Factorial prior: coefficients exhibit strong dependencies, so the factorial prior is wrong.

• Linear model: how to extend to a hierarchical model?

Page 37:

Page 38:

Image model with ‘shiftable’ basis functions

I(x) = ∑_i Re{ z_i φ_i(x) }

z_i = a_i e^(j α_i)

φ_i(x) = φ_i^R(x) + j φ_i^I(x)

I(x) = ∑_i a_i [ cos(α_i) φ_i^R(x) + sin(α_i) φ_i^I(x) ]
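A numeric check that the complex form and the expanded form agree (random stand-in basis functions). Note that the +sin expansion comes out when φ_i is paired with the conjugate coefficient, a sign convention the extracted slides leave ambiguous:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pix, n_basis = 64, 8
phiR = rng.standard_normal((n_pix, n_basis))   # phi_i^R (random stand-ins)
phiI = rng.standard_normal((n_pix, n_basis))   # phi_i^I
a = rng.random(n_basis)                        # amplitudes a_i
alpha = rng.uniform(0, 2 * np.pi, n_basis)     # phases alpha_i

z = a * np.exp(1j * alpha)                     # z_i = a_i e^{j alpha_i}
phi = phiR + 1j * phiI                         # phi_i = phi_i^R + j phi_i^I

# Complex form, pairing phi with conj(z) so the +sin expansion below matches
I_complex = (np.conj(z) * phi).sum(axis=1).real

# Expanded form: sum_i a_i [cos(alpha_i) phi_i^R + sin(alpha_i) phi_i^I]
I_expanded = (phiR * (a * np.cos(alpha)) + phiI * (a * np.sin(alpha))).sum(axis=1)
```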

Page 39:

Learned complex basis functions (144, 12 × 12 patches)

[Figure: real and imaginary parts of the learned complex basis functions, animated in the talk]

Page 40:

Local phase is important

Original image

Page 41:

Local phase is important

Magnitudes only

Page 42:

Local phase is important

Phases only
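The point of these three slides can be reproduced in a few lines: take the 2D Fourier transform of an image, then resynthesize keeping only the magnitudes or only the phases (the test image here is an invented toy square, not the slides' image):

```python
import numpy as np

# Toy structured image: a bright square on a dark background
img = np.zeros((64, 64))
img[20:44, 20:44] = 1.0

F = np.fft.fft2(img)
mag, phase = np.abs(F), np.angle(F)

mag_only = np.fft.ifft2(mag).real                        # discard phase
phase_only = np.fft.ifft2(np.exp(1j * phase)).real       # unit magnitudes, keep phase
roundtrip = np.fft.ifft2(mag * np.exp(1j * phase)).real  # keep both -> original
```

Displaying the three reconstructions shows the classic result: the phase-only image preserves the square's edges and location, while the magnitude-only image does not.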

Page 43:

Conclusions

• V1 neurons represent time-varying natural images in terms of sparse events.

• Joint dependencies among coefficients may be modeled with shiftable basis functions → neurons carry both amplitude and phase?

Page 44:

Further information and details

[email protected]
http://redwood.ucdavis.edu/bruno