
Deep learning and human vision

Mini lecture 2: hand-wired vs. learning

local building blocks

continuous-valued inputs and outputs representing frequency of action potentials (“spikes”)

[Figure: a model neuron with weights w1–w5 on its inputs]

what determines the weights wi?
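The model neuron on this slide reduces to a few lines: a weighted sum of continuous-valued inputs passed through a point-wise non-linearity, with the output standing in for a firing rate. The input values, weights, and the choice of a rectifier below are illustrative, not from the slides.

```python
import numpy as np

def neuron(x, w, f=lambda s: np.maximum(s, 0.0)):
    """One model neuron: a weighted sum of continuous-valued inputs
    passed through a point-wise non-linearity f; the rectifier keeps
    the output non-negative, like a firing rate."""
    return f(np.dot(w, x))

# five inputs with weights w1..w5, echoing the diagram (values made up)
x = np.array([0.2, 0.8, 0.5, 0.1, 0.9])
w = np.array([1.0, 0.5, 0.3, 0.7, 0.2])
rate = neuron(x, w)  # 0.2 + 0.4 + 0.15 + 0.07 + 0.18 = 1.0
```

The question on the slide is then: what process sets the entries of `w`?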

V1 (1-2 mm)

Hubel & Wiesel, 1960s

V1 receptive fields

a dictionary of image features?

[Figure: V1 model, a standard non-linear spatial filter: a linear receptive field (- + -) applied by convolution over a 1-2 mm patch, followed by a point-wise non-linearity]

just one of a set of spatial frequency “channels”

[Figure: V1 model with divisive normalization: the same linear receptive field (- + -) and convolution, with each output divided by the pooled responses of neighboring neurons]

an example of what DiCarlo et al. called a “normalized LN” model, or NLN
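The NLN pipeline can be sketched numerically. This is a 1-D toy version; the kernel, pool size, and the constant sigma are illustrative choices, not from the slides.

```python
import numpy as np

def nln_response(signal, kernel, sigma=0.1):
    """Normalized linear-nonlinear (NLN) sketch:
    L: linear stage -- convolve with a receptive-field kernel
    N: point-wise non-linearity -- half-wave rectify, then square
    N: divisive normalization -- divide by the pooled activity of
       neighboring units, plus a constant sigma."""
    linear = np.convolve(signal, kernel, mode="same")     # L
    rectified = np.maximum(linear, 0.0) ** 2              # N (point-wise)
    pool = np.ones(5) / 5.0                               # neighborhood pool
    pooled = np.convolve(rectified, pool, mode="same")    # neighbors' activity
    return rectified / (sigma + pooled)                   # divisive norm.

bar = np.zeros(20); bar[10] = 1.0
kernel = np.array([-0.5, 1.0, -0.5])   # "- + -" receptive field
weak = nln_response(bar, kernel)
strong = nln_response(10 * bar, kernel)
```

With these numbers a tenfold increase in input contrast raises the peak response only about 1.5×, where the squaring non-linearity alone would give 100×; that compression is what the divisive step buys.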

shallow convolutional networks: what can they do?

textures

detect faces

detect edges

…but despite much research, few of these shallow networks work very well as models of human perception, except for simple stimuli and tasks

e.g. they work well as predictors of contrast detection and of discrimination of certain types of textures.

simple and complex cells in V1

one possible circuit model for a complex cell, illustrating local translation invariance

[Figure: simple cell receptive fields (- + -) feeding the complex cell]

what determines the weights wij as one proceeds up the levels (j) of the hierarchy?

the tasks of vision, e.g. “core” recognition

the regularities in images, e.g. high correlations between nearby pixels

?
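The claim about image regularities is easy to check on synthetic data. Below, a box blur of white noise stands in for the local smoothness of natural images (a deliberate simplification, not the slides' example).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
noise = rng.standard_normal((200, 200))

# crude stand-in for natural-image smoothness: 5x5 box blur of the noise
smooth = sliding_window_view(noise, (5, 5)).mean(axis=(2, 3))

def neighbor_corr(img):
    """Correlation between each pixel and its right-hand neighbor."""
    return np.corrcoef(img[:, :-1].ravel(), img[:, 1:].ravel())[0, 1]

r_noise = neighbor_corr(noise)    # essentially zero
r_smooth = neighbor_corr(smooth)  # high: adjacent 5x5 windows share
                                  # 20 of 25 samples, so ~0.8
```

Raw noise shows essentially no neighbor correlation, while the blurred image shows a correlation near 0.8; this is the kind of regularity an unsupervised learner could exploit to set the weights.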

hierarchical models for feature extraction given task constraints, e.g. core recognition

• Local features progressively grouped into more structured representations

• edges => contours => fragments => parts => objects

• Selectivity/invariance trade-off

• Increased selectivity for object/pattern type

• Decreased sensitivity to view-dependent variations of translation, scale and illumination

Fukushima 1988

Fukushima, K. (1988). Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1(2), 119–130.


simple and complex cells in V1

one model illustrating local translation invariance

[Figure: simple cell receptive fields (- + -) combined by “AND-ing”; the complex cell pools them by “OR-ing”]

simple & complex cells in V1

• Simple cells

• “template matching”, i.e. detect conjunctions, logical “AND”

• Complex cells

• insensitivity to small changes in position, detect disjunctions, logical “OR”

• Recognition as the hierarchical detection of “disjunctions of conjunctions”

Recognize the letter “t”

[Figure: the bar pair shown at candidate locations i = 1, 2, 3, …, 9]

“t” is represented by the conjunction (AND) of a vertical and a horizontal bar, which can occur at any one of many locations i (an OR over locations).

“t”: h1 && v1 || h2 && v2 || h3 && v3...
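The boolean expression above translates directly into code. The detectors `h[i]` and `v[i]` are hypothetical stand-ins for horizontal- and vertical-bar features at location i.

```python
def detect_t(h, v):
    """Disjunction of conjunctions: "t" is present if a horizontal
    bar (h[i]) AND a vertical bar (v[i]) co-occur at ANY location i.
    h, v: lists of booleans, one entry per candidate location."""
    return any(hi and vi for hi, vi in zip(h, v))

detect_t([False, False, True], [False, True, True])   # True: bars meet at i = 2
detect_t([True, False, False], [False, True, False])  # False: no shared location
```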

Riesenhuber & Poggio, 1999

S - simple cell like

C - complex cell like
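A minimal numerical sketch of the alternating S/C scheme, under the usual HMAX-style reading: S units template-match by a dot product, C units take a max over nearby positions. The template, signal sizes, and pooling stride below are made up for illustration.

```python
import numpy as np

def s_layer(signal, template):
    """S (simple-cell-like): template matching at every position."""
    n = len(template)
    return np.array([signal[i:i + n] @ template
                     for i in range(len(signal) - n + 1)])

def c_layer(responses, pool=3):
    """C (complex-cell-like): max over a local pool of positions,
    giving tolerance to small shifts of the stimulus."""
    return np.array([responses[i:i + pool].max()
                     for i in range(0, len(responses) - pool + 1, pool)])

template = np.array([-0.5, 1.0, -0.5])   # "- + -" receptive field
x = np.zeros(12); x[4] = 1.0             # bar at position 4
x_shift = np.zeros(12); x_shift[5] = 1.0 # same bar, shifted by one position
```

Shifting the bar by one position changes the S responses but leaves the C responses unchanged: the local translation invariance the slides describe.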

volunteers to lead next week's paper discussions?

• Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 411–426.

• Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104(15), 6424–6429.

• Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.

• Wu, C.-T., Crouzet, S. M., Thorpe, S. J., & Fabre-Thorpe, M. (2015). At 120 msec You Can Spot the Animal but You Don't Yet Know It's a Dog. Journal of Cognitive Neuroscience, 27(1), 141–149. http://doi.org/10.1162/jocn_a_00701