Statistical Modeling and Learning in Vision --- cortex-like generative models Ying Nian Wu

Post on 30-Jan-2016


Statistical Modeling and Learning in Vision --- cortex-like generative models

Ying Nian Wu, UCLA Department of Statistics

JSM, August 2010

http://www.stat.ucla.edu/~ywu/ActiveBasis

Matlab/C code, Data

Outline
• Primary visual cortex (V1)
• Modeling and learning in V1
• Layered hierarchical models

Source: Scientific American, 1999

Visual cortex: layered hierarchical architecture

V1: primary visual cortex (simple cells, complex cells)

bottom-up/top-down

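Simple V1 cells (Daugman, 1985)

Gabor wavelets: localized sine and cosine waves

$$G(x) = \exp\left\{-\frac{1}{2}\left[\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2}\right]\right\} e^{i x_1}$$

Translation, rotation, dilation of the above function give $B_{x,s,\alpha}$ (location $x$, scale $s$, orientation $\alpha$)

$$\langle I, B_{x,s,\alpha} \rangle = \sum_{x'} I(x')\, B_{x,s,\alpha}(x')$$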
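The Gabor wavelet and the response $\langle I, B_{x,s,\alpha} \rangle$ can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the values of `sigma1` and `sigma2`, the rotation convention, the `1/s` normalization, and the toy grating image are all choices made here, not taken from the slides.

```python
import cmath
import math


def gabor(u1, u2, sigma1=1.0, sigma2=2.0):
    """G(u) = exp{-(u1^2/sigma1^2 + u2^2/sigma2^2) / 2} * e^{i u1}."""
    envelope = math.exp(-0.5 * (u1 ** 2 / sigma1 ** 2 + u2 ** 2 / sigma2 ** 2))
    return envelope * cmath.exp(1j * u1)


def gabor_element(x, s, alpha):
    """Return B_{x,s,alpha} as a function of the pixel x' = (row, col)."""
    cos_a, sin_a = math.cos(alpha), math.sin(alpha)

    def element(x_prime):
        d1, d2 = x_prime[0] - x[0], x_prime[1] - x[1]   # translation
        u1 = (d1 * cos_a + d2 * sin_a) / s               # rotation, then dilation
        u2 = (-d1 * sin_a + d2 * cos_a) / s
        return gabor(u1, u2) / s   # assumed 1/s factor keeps the L2 norm fixed

    return element


def response(image, element):
    """<I, B> = sum over pixels x' of I(x') * B(x')."""
    return sum(value * element((i, j))
               for i, row in enumerate(image)
               for j, value in enumerate(row))


# Toy image (an assumption): a cosine grating varying along the first coordinate.
size, centre = 21, (10, 10)
grating = [[math.cos(i - centre[0]) for _ in range(size)] for i in range(size)]
aligned = abs(response(grating, gabor_element(centre, 1.0, 0.0)))
rotated = abs(response(grating, gabor_element(centre, 1.0, math.pi / 2)))
```

The element whose wave direction matches the grating (`aligned`) responds much more strongly than the one rotated by 90 degrees (`rotated`), which is the orientation selectivity attributed to simple cells.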

Image pixels

V1 simple cells $B_{x,s,\alpha}$: respond to edges

Complex V1 cells (Riesenhuber and Poggio, 1999)

$$\max_{(\Delta x, \Delta\alpha) \in A(\alpha)} |\langle I, B_{x+\Delta x,\, s,\, \alpha+\Delta\alpha} \rangle|^2$$

Image pixels

V1 simple cells

V1 complex cells

Local max

Local sum

• Larger receptive field
• Less sensitive to deformation
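The local maximization performed by complex cells can be sketched as a pooling step over a map of simple-cell energies. The sketch assumes a 1-D position axis, a small number of discrete orientations that wrap around, and hypothetical pooling ranges `shift` and `turn`; none of these specifics come from the slides.

```python
def local_max(sum1, shift=1, turn=1):
    """MAX1[x][a] = max of SUM1 over |dx| <= shift and |da| <= turn."""
    n_pos, n_ori = len(sum1), len(sum1[0])
    max1 = []
    for x in range(n_pos):
        row = []
        for a in range(n_ori):
            candidates = [
                sum1[x + dx][(a + da) % n_ori]      # orientation wraps around
                for dx in range(-shift, shift + 1)
                if 0 <= x + dx < n_pos              # positions are clipped
                for da in range(-turn, turn + 1)
            ]
            row.append(max(candidates))
        max1.append(row)
    return max1


# Toy energy map: 5 positions, 4 orientations, one strong simple-cell response.
sum1 = [[0.0] * 4 for _ in range(5)]
sum1[2][1] = 9.0
max1 = local_max(sum1)
```

After pooling, the single strong response is visible from neighbouring positions and orientations, which is why the pooled unit has a larger receptive field and tolerates small deformations.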

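Independent Component Analysis (Bell and Sejnowski, 1996)

$$I = c_1 B_1 + \dots + c_N B_N = \mathbf{B} C$$

$$c_i \sim p(c) \text{ independently}, \quad i = 1, \dots, N$$

$$N = \dim(I)$$

$$C = \mathbf{B}^{-1} I = \mathbf{A} I$$

$$I_m = c_{m,1} B_1 + \dots + c_{m,N} B_N = \mathbf{B} C_m$$

$$C_m = \mathbf{B}^{-1} I_m = \mathbf{A} I_m$$

Laplacian/Cauchy

Hyvarinen, 2000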
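Because $N = \dim(I)$ in ICA, inference is a single linear map $C = \mathbf{A} I$ with $\mathbf{A} = \mathbf{B}^{-1}$. A tiny worked example makes this concrete; the 2-by-2 basis and the coefficients below are made-up values for illustration only.

```python
def invert_2x2(b):
    """Inverse of a 2x2 matrix given as two rows."""
    (p, q), (r, s) = b
    det = p * s - q * r
    if det == 0:
        raise ValueError("basis is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


basis = [[1.0, 1.0],
         [0.0, 2.0]]              # columns are B_1 = (1, 0) and B_2 = (1, 2)
coefficients = [3.0, -0.5]        # C
image = mat_vec(basis, coefficients)            # I = B C
recovered = mat_vec(invert_2x2(basis), image)   # C = A I with A = B^{-1}
```

Here `recovered` reproduces `coefficients`, so no search over coefficients is needed, in contrast to the overcomplete sparse coding case.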

Sparse coding (Olshausen and Field, 1996)

$$I = c_1 B_1 + \dots + c_N B_N$$

$$c_i \sim p(c) \text{ independently}, \quad i = 1, \dots, N$$

Laplacian/Cauchy/mixture Gaussians

$$I_m = c_{m,1} B_1 + \dots + c_{m,N} B_N, \quad N > \dim(I)$$

Sparse coding / variable selection (Olshausen and Field, 1996)

$$I = c_1 B_1 + \dots + c_N B_N, \quad c_i \sim p(c) \text{ independently}, \quad i = 1, \dots, N$$

Inference: sparsification, non-linear
• lasso/basis pursuit/matching pursuit
• mode and uncertainty of p(C|I)
• explaining-away, lateral inhibition

Learning: a dictionary of representational elements (regressors)

$$I_m = c_{m,1} B_1 + \dots + c_{m,N} B_N, \quad N > \dim(I)$$
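Matching pursuit, one of the inference options named above, can be sketched as a greedy loop. The sketch assumes unit-norm dictionary elements and uses a hypothetical three-element dictionary in two dimensions, so that $N > \dim(I)$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(image, dictionary, n_select):
    """Greedy sparse coding; dictionary elements are assumed to have unit norm."""
    residual = list(image)
    selected = []
    for _ in range(n_select):
        scores = [dot(residual, b) for b in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        c = scores[best]
        selected.append((best, c))
        # Explaining-away: remove the selected element's contribution.
        residual = [r - c * b for r, b in zip(residual, dictionary[best])]
    return selected, residual


root_half = 1.0 / math.sqrt(2.0)
dictionary = [[1.0, 0.0], [0.0, 1.0], [root_half, root_half]]  # N = 3 > dim(I) = 2
image = [3.0 * root_half, 3.0 * root_half]
selected, residual = matching_pursuit(image, dictionary, n_select=1)
```

All three elements respond to this image, but once the diagonal element is selected the residual vanishes and the other two have nothing left to explain, which is the explaining-away effect.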

Restricted Boltzmann Machine (Hinton, Osindero and Teh, 2006)

$$p(C, I) = \frac{1}{Z(\mathbf{B})} \exp\left\{\sum_{i,j} B_{i,j}\, c_i I_j\right\}$$

$c_i, \; i = 1, \dots, N$: hidden, binary

$I$: visible

$P(I \mid C)$ and $P(C \mid I)$: factorized, no explaining-away
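The factorization of $P(C \mid I)$ can be checked numerically. With binary hidden units and the bilinear energy above (no bias terms, as on the slide), each hidden unit has $P(c_i = 1 \mid I) = \mathrm{sigmoid}(\sum_j B_{i,j} I_j)$. The weights and visible vector below are arbitrary illustrative values.

```python
import itertools
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def hidden_posterior(weights, visible):
    """Factorized P(c_i = 1 | I) for each hidden unit i."""
    return [sigmoid(sum(w * v for w, v in zip(row, visible))) for row in weights]


def brute_force_posterior(weights, visible):
    """Same marginals by enumerating every binary hidden configuration."""
    n_hidden = len(weights)
    totals, z = [0.0] * n_hidden, 0.0
    for c in itertools.product([0, 1], repeat=n_hidden):
        exponent = sum(weights[i][j] * c[i] * visible[j]
                       for i in range(n_hidden) for j in range(len(visible)))
        weight = math.exp(exponent)
        z += weight
        for i in range(n_hidden):
            totals[i] += weight * c[i]
    return [t / z for t in totals]


weights = [[0.5, -1.0, 0.25], [-0.3, 0.8, 0.1]]   # B_{i,j}: 2 hidden, 3 visible
visible = [1.0, 0.0, 2.0]
fast = hidden_posterior(weights, visible)
slow = brute_force_posterior(weights, visible)
```

The two results agree, so the hidden units do not compete for the same evidence given $I$: there is no explaining-away in this model.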

Energy-based model (Teh, Welling, Osindero and Hinton, 2003)

$$p(I) = \frac{1}{Z(\mathbf{B}, \lambda)} \exp\left\{\sum_i \lambda_i(\langle I, B_i \rangle)\right\}$$

Features, no explaining-away

Maximum entropy with marginals

Exponential family with sufficient statistics

$$p(I) = \frac{1}{Z(\lambda)} \exp\left\{\sum_{x,s,\alpha} \lambda_{s,\alpha}(\langle I, B_{x,s,\alpha} \rangle)\right\}$$

Markov random field/Gibbs distribution

Zhu, Wu, and Mumford, 1997
Wu, Liu, and Zhu, 2000


What is beyond V1? Hierarchical model?

Hierarchical ICA/Energy-based model?
• Larger features
• Must introduce nonlinearities
• Purely bottom-up

Hierarchical RBM (Hinton, Osindero and Teh, 2006)

$$P(I, C) = P(C)\, P(I \mid C)$$

$P(C) \rightarrow P(J, C)$

Layers: $I$, $C$, $J$

Discriminative correction by back-propagation

Unfolding, untying, re-learning

Hierarchical sparse coding

$$I = c_1 B_1 + \dots + c_N B_N$$

Attributed sparse coding elements $B_{x,s,\alpha}$
• transformation group
• topological neighborhood system

$$I = \sum_{i=1}^{n} c_i B_{x_i,\, s,\, \alpha_i} + U$$

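Layer above: further coding of the attributes of selected sparse coding elements

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_{m,i},\, s,\, \alpha_{m,i}} + U_m$$

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$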

Hierarchical sparse coding

Active basis (Wu, Si, Fleming, Zhu, 2007)

$$x_{m,i} = x_i + \Delta x_{m,i}$$

$$\alpha_{m,i} = \alpha_i + \Delta\alpha_{m,i}$$

Residual generalization

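Shared matching pursuit

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$

$$\sum_{m=1}^{M} \left\| I_m - \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} \right\|^2$$

0: Initialize $U_m \leftarrow I_m$, $i \leftarrow 0$.

1: Let $i \leftarrow i + 1$. Let $(x_i, \alpha_i) = \arg\max_{x,\alpha} \sum_{m=1}^{M} \max_{(\Delta x, \Delta\alpha) \in A(\alpha)} |\langle U_m, B_{x+\Delta x,\, s,\, \alpha+\Delta\alpha} \rangle|^2$.

2: Let $(\Delta x_{m,i}, \Delta\alpha_{m,i}) = \arg\max_{(\Delta x, \Delta\alpha) \in A(\alpha_i)} |\langle U_m, B_{x_i+\Delta x,\, s,\, \alpha_i+\Delta\alpha} \rangle|^2$.

3: Let $c_{m,i} = \langle U_m, B_{x_i+\Delta x_{m,i},\, s,\, \alpha_i+\Delta\alpha_{m,i}} \rangle$. Update $U_m \leftarrow U_m - c_{m,i} B_{x_i+\Delta x_{m,i},\, s,\, \alpha_i+\Delta\alpha_{m,i}}$.

4: Stop if $i = n$, else go back to 1.

1. Local maximization in step 1: complex cells (Riesenhuber and Poggio, 1999)
2. Arg-max in step 2: inferring hidden variables
3. Explaining-away in step 3: lateral inhibition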
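Steps 0 to 4 can be sketched on a 1-D toy problem. This is a simplified illustration, not the authors' Matlab/C code: elements are indexed by position only (no orientation or scale), the perturbation set is a window of hypothetical half-width `radius`, and the dictionary is assumed to hold unit-norm vectors.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def neighbours(x, size, radius):
    """Allowed perturbed positions around x, clipped to the signal."""
    return range(max(0, x - radius), min(size, x + radius + 1))


def shared_matching_pursuit(images, dictionary, n_elements, radius=1):
    residuals = [list(image) for image in images]        # step 0: U_m <- I_m
    size = len(dictionary)
    template, codes = [], []
    for _ in range(n_elements):                           # step 4: stop at i = n
        # energy[m][x] = |<U_m, B_x>|^2
        energy = [[dot(u, b) ** 2 for b in dictionary] for u in residuals]

        def pooled(x):
            # Step 1: sum over images of the local maximum around x.
            return sum(max(e[y] for y in neighbours(x, size, radius))
                       for e in energy)

        x_i = max(range(size), key=pooled)
        template.append(x_i)
        row = []
        for u, e in zip(residuals, energy):
            # Step 2: per-image arg-max gives the perturbation of element i.
            y = max(neighbours(x_i, size, radius), key=lambda k: e[k])
            # Step 3: coefficient, then explain away from the residual.
            c = dot(u, dictionary[y])
            for j, b_j in enumerate(dictionary[y]):
                u[j] -= c * b_j
            row.append((y - x_i, c))
        codes.append(row)
    return template, codes, residuals


size = 8
identity = [[1.0 if j == k else 0.0 for j in range(size)] for k in range(size)]
images = [[0.0, 3.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0]]
template, codes, residuals = shared_matching_pursuit(images, identity, n_elements=2)
```

In this toy run the two images share elements at nominal positions 2 and 5, each perturbed by one pixel in opposite directions, and `codes[i][m]` holds the pair $(\Delta x_{m,i}, c_{m,i})$.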

Wu, Si, Fleming, Zhu, 2007

Active basis

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$

Two different scales

Putting multiple scales together

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s_i,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$

$$B_{x,s,\alpha}(x')$$

More elements added

Residual images

$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$

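Statistical modeling

$\mathbf{B}_m = (B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}},\; i = 1, \dots, n)$ orthogonal

$$c_{m,i} = \langle I_m, B_{x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} \rangle$$

$$p(U_m \mid C_m) = q(U_m \mid C_m)$$

Conditional independence of coefficients

$$p(I_m \mid \mathbf{B}_m) = q(I_m) \prod_{i=1}^{n} \frac{p_i(c_{m,i})}{q(c_{m,i})} = q(I_m) \prod_{i=1}^{n} \frac{p_i(r_{m,i})}{q(r_{m,i})}$$

$$r_{m,i} = |c_{m,i}|^2$$

Exponential family model

$$p_i(r) = \frac{1}{Z(\lambda_i)} \exp\{\lambda_i h(r)\}\, q(r)$$

Strong edges in background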
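The exponential family model $p_i(r) \propto \exp\{\lambda_i h(r)\}\, q(r)$ can be sketched numerically by representing $q$ with a sample of background responses. Everything specific here is an assumption for illustration: the saturating form of `saturate` (standing in for $h$), the value `xi=6.0`, the toy `background` sample, and fitting $\lambda$ by bisection so that the tilted mean of $h$ matches a target.

```python
import math


def saturate(r, xi=6.0):
    """Assumed sigmoid-like h(r): roughly r for small r, levelling off near xi."""
    return xi * (2.0 / (1.0 + math.exp(-2.0 * r / xi)) - 1.0)


def log_z(lam, background):
    """log Z(lambda) = log E_q[exp{lambda h(r)}], with q given by samples."""
    return math.log(sum(math.exp(lam * saturate(r)) for r in background)
                    / len(background))


def tilted_mean(lam, background):
    """E_p[h(r)] under p(r) proportional to exp{lambda h(r)} q(r)."""
    weights = [math.exp(lam * saturate(r)) for r in background]
    return sum(w * saturate(r) for w, r in zip(weights, background)) / sum(weights)


def fit_lambda(target_mean, background, low=0.0, high=20.0, n_iter=60):
    """Bisection: the tilted mean increases with lambda."""
    for _ in range(n_iter):
        mid = 0.5 * (low + high)
        if tilted_mean(mid, background) < target_mean:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def log_likelihood_ratio(responses, lambdas, background):
    """log p(I | B) - log q(I) = sum_i [lambda_i h(r_i) - log Z(lambda_i)]."""
    return sum(lam * saturate(r) - log_z(lam, background)
               for r, lam in zip(responses, lambdas))


background = [0.1, 0.5, 1.0, 2.0, 4.0, 8.0]   # toy sample of r under q
lam = fit_lambda(3.0, background)
```

A saturating $h$ limits how much a single very strong response can contribute, which is one plausible reading of the "strong edges in background" remark.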

Wu, Si, Gong, Zhu, 2010

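$$I = \sum_{i=1}^{n} c_i B_{x + x_i + \Delta x_i,\, s,\, \alpha_i + \Delta\alpha_i} + U$$

$$l(x) = \sum_{i=1}^{n} \left[ \lambda_i\, h\!\left( \max_{(\Delta x, \Delta\alpha) \in A(\alpha_i)} |\langle I, B_{x + x_i + \Delta x,\, s,\, \alpha_i + \Delta\alpha} \rangle|^2 \right) - \log Z(\lambda_i) \right]$$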

Detection by sum-max maps (Wu, Si, Gong, Zhu, 2010)

Image pixels

V1 simple cells

V1 complex cells

Local max

Local sum

Complex V1 cells (Riesenhuber and Poggio, 1999)

$$\max_{(\Delta x, \Delta\alpha) \in A(\alpha)} |\langle I, B_{x+\Delta x,\, s,\, \alpha+\Delta\alpha} \rangle|^2$$

• Larger receptive field
• Less sensitive to deformation

SUM-MAX maps (bottom-up/top-down)

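$$l(x) = \sum_{i=1}^{n} \left[ \lambda_i\, h\!\left( \max_{(\Delta x, \Delta\alpha) \in A(\alpha_i)} |\langle I, B_{x + x_i + \Delta x,\, s,\, \alpha_i + \Delta\alpha} \rangle|^2 \right) - \log Z(\lambda_i) \right]$$

Local maximization: complex cells (Riesenhuber and Poggio, 1999)

Gabor wavelets: simple cells (Olshausen and Field, 1996)

SUM2 operator: what “cell”?

Bottom-up detection

Top-down sketching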

SUM1

MAX1

SUM2

arg MAX1

Sparse selective connection as a result of learning

Explaining-away in learning but not in inference
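The SUM2 step, which scores every candidate location from the MAX1 map, can be sketched as follows. The sketch assumes a 1-D position axis, a template given as hypothetical tuples `(offset, orientation, lam, log_z)`, and an identity `h` by default; out-of-range elements are treated as contributing zero response.

```python
def sum2_map(max1, template, h=lambda r: r):
    """SUM2[x] = sum_i [lam_i * h(MAX1[x + offset_i][orientation_i]) - log_z_i]."""
    n_pos = len(max1)
    scores = []
    for x in range(n_pos):
        total = 0.0
        for offset, orientation, lam, log_z in template:
            y = x + offset
            r = max1[y][orientation] if 0 <= y < n_pos else 0.0
            total += lam * h(r) - log_z
        scores.append(total)
    return scores


# Toy MAX1 map: 5 positions, 2 orientations.
max1 = [[0.0, 0.0] for _ in range(5)]
max1[1][0] = 4.0
max1[3][1] = 5.0
# Two selected elements: orientation 0 at offset 0, orientation 1 at offset 2.
template = [(0, 0, 1.0, 0.0), (2, 1, 1.0, 0.0)]
scores = sum2_map(max1, template)
detected = max(range(len(scores)), key=lambda x: scores[x])
```

Only the few selected elements are connected to the SUM2 unit, which reflects the sparse selective connection; the arg-max over `scores` gives the bottom-up detection, after which the arg-max inside each MAX1 window can be retrieved for top-down sketching.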

Bottom-up scoring and top-down sketching

Adjusting Active Basis Model by L2 Regularized Logistic Regression
By Ruixun Zhang

L2 regularized logistic regression
re-estimated lambda’s

Conditional on:
(1) selected basis elements
(2) inferred hidden variables
(1) and (2): generative learning

• Exponential family model, q(I) negatives → logistic regression
• Generative learning without negative examples
• Discriminative correcting of conditional independence assumption (with hugely reduced dimensionality)
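Re-estimating the lambdas with L2 regularized logistic regression can be sketched with plain gradient descent. The sketch assumes each image has already been reduced to a short feature vector (one value per selected basis element); the feature values, `penalty`, `step`, and `n_iter` below are illustrative choices, not settings from the slides.

```python
import math


def predict(weights, bias, x):
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def objective(weights, bias, features, labels, penalty):
    """Mean negative log-likelihood plus (penalty / 2) * ||weights||^2."""
    loss = 0.0
    for x, y in zip(features, labels):
        p = predict(weights, bias, x)
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return loss / len(labels) + 0.5 * penalty * sum(w * w for w in weights)


def fit_logistic_l2(features, labels, penalty=0.1, step=0.1, n_iter=500):
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(labels)
    for _ in range(n_iter):
        grad_w = [penalty * w for w in weights]     # gradient of the L2 term
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = predict(weights, bias, x) - y
            grad_b += error / n
            for j, v in enumerate(x):
                grad_w[j] += error * v / n
        weights = [w - step * g for w, g in zip(weights, grad_w)]
        bias -= step * grad_b
    return weights, bias


# Toy data: two features per image; label 1 = object, label 0 = negative.
features = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [0.2, 0.1], [0.5, 0.3], [0.1, 0.6]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = fit_logistic_l2(features, labels)
```

Because the regression runs only on the responses of the selected elements, its dimensionality is the template size rather than the number of pixels.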

Learning from non-aligned training images

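$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_m + x_i + \Delta x_{m,i},\, s,\, \alpha_i + \Delta\alpha_{m,i}} + U_m$$

$$\mathbf{B} = (B_{x_i,\, s,\, \alpha_i},\; i = 1, \dots, n)$$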

Learning from non-aligned training images

EM mixture

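$$I_m = \sum_{i=1}^{n} c_{m,i} B_{x_i^{(k)} + \Delta x_{m,i},\, s,\, \alpha_i^{(k)} + \Delta\alpha_{m,i}} + U_m$$

$$\mathbf{B} = \{\mathbf{B}^{(k)} = (B_{x_i^{(k)},\, s,\, \alpha_i^{(k)}},\; i = 1, \dots, n),\; k = 1, \dots, K\}$$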

EM mixture

MNIST

Active bases as part-templates

Split bike template to detect and sketch tandem bike

Is there an edge here?

Is there an edge nearby?

Is there a wheel here?

Is there a wheel nearby?

Is there a tandem bike here?

Soft scoring instead of hard decision

Learning part templates or visual words

Shape script model

Shape motifs: elementary geometric shapes

$$I = \sum_{i=1}^{n} c_i B_{x_i,\, s,\, \alpha_i} + U$$

),...,1,,(motif shape),...,1,,( nixnix iik

kii

UCIkkkk sx

K

kk

,,,

1

B nK

Si and Wu, 2010

$$I = \sum_{i=1}^{n} c_i B_{x_i,\, s,\, \alpha_i} + U$$

),...,1,,(motif shape),...,1,,( nixnix iik

kii

Layers of attributed sparse coding elements