local building blocks Deep learning and human...
Transcript of local building blocks Deep learning and human...
Deep learning and human vision
Mini lecture 2: hand-wired vs. learning
local building blocks
continuous valued inputs and outputs representing frequency of action potentials (spikes”
w1
w4
w5
w3
w2
what determines the weights wi ?,
V1
1-2 mm
Hubel & Wiesel, 1960s
V1receptive fields
a dictionary of image features?
linear
1-2 mm
- + -
point-wise non-linearity
Receptive field
standard non-linear spatial filter
V1 model
convolution
just one of a set of spatial frequency “channels”
linear
1-2 mm
- + - divisive normalization
Neighboring neurons
Receptive field
/
standard non-linear spatial filter
V1 model
convolution
an example of what DiCarlo et al. called a “normalized LN” mode, or NLN
shallow convolutional networkswhat can they do?
texturesdetect faces
detect edges
…but despite much research, few of these shallow networks work very well as models of human perception, except for simple stimuli and tasks
e.g. they work well as predictors of contrast detection, discrimination of certain types of textures.
simple and complex cells in V1
one possible circuit model for complex cell—illustrating
local translation invariance
- +-
Simple Cell receptive
field
what determines the weights wij as one proceeds up levels (j) of the hierarchy?,
the tasks of vision, e.g. “core” recognition
the regularities in images, e.g. high correlations between nearby pixels
?
hierarchical models for feature extraction given task constraints, e.g. core recognition
• Local features progressively grouped into more structured representations
• edges => contours => fragments => parts => objects
• Selectivity/invariance trade-off
• Increased selectivity for object/pattern type
• Decreased sensitivity to view-dependent variations of translation, scale and illumination
Fukushima 1988
Fukushima, K. (1988). Neocognitron - a Hierarchical Neural Network Capable of Visual-Pattern Recognition. Neural Networks, 1(2), 119–130.
1-2 mm
simple and complex cells in V1
one model illustrating local translation invariance
- + -
simple cell receptive
field“AND-ing”
“OR-ing”
simple & complex cells in V1• Simple cells
• “template matching”, i.e. detect conjunctions, logical “AND”
• Complex cells
• insensitivity to small changes in position, detect disjunctions, logical “OR”
• Recognition as the hierarchical detection of “disjunctions of conjunctions”
Recognize the letter “t”
i=1 i= 2 i=3 i=1 i= 2 i=3 i=1 i= 2 i=3
i=9
“t” is represented by the conjunction of a vertical and horizontal bar AND
OR OR ...
= t
which can occur at any one of many locations i
“t”: h1 && v1 || h2 && v2 || h3 && v3...
Riesenhuber & Poggio, 1999
S - simple cell like
C - complex cell like
volunteers to lead next week paper discussions?
• Serre, T., Wolf, L., Bileschi, S., Riesenhuber, M., & Poggio, T. (2007). Robust object recognition with cortex-like mechanisms. Pattern Analysis and Machine Intelligence, 29(3), 411–426.
• Serre, T., Oliva, A., & Poggio, T. (2007). A Feedforward Architecture Accounts for Rapid Categorization, 104(15), 6424–6429.
• Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2, 1019–1025.
• Wu, C.-T., Crouzet, S. M., Thorpe, S. J., & Fabre-Thorpe, M. (2015). At 120 msec You Can Spot the Animal but You Don“t Yet Know It”s a Dog. Cognitive Neuroscience, Journal of, 27(1), 141–149. http://doi.org/10.1162/jocn_a_00701