Learning and Selecting Features Jointly with Point-wise Gated Boltzmann Machines
Kihyuk Sohn, Guanyu Zhou, Chansoo Lee, Honglak Lee
Dept. of Electrical Engineering and Computer Science, University of Michigan
Outline
• Overview
• Preliminary
• Point-wise Gated Boltzmann Machines
• Experimental results
• Conclusion
Learning from scratch
• Unsupervised feature learning (Hinton et al., 2006; Bengio et al., 2007; Ranzato et al., 2007; Bengio, 2009)
– Powerful in discovering representations from unlabeled data.
– However, not all patterns (or data) are equally important.
    • When data contains many distracting factors, learning meaningful representations can be challenging.
• Feature selection (Jain & Zongker, 1997; Yang & Pedersen, 1997; Weston et al., 2001; Guyon & Elisseeff, 2003)
– Powerful in selecting features from labeled data.
– However, it assumes the existence of discriminative features.
    • There may not be such features at hand.
Motivating Example
• Learning features from images for object recognition.
• Want to learn “person”-specific high-level features for good recognition performance.
• There are lots of irrelevant patterns in the background other than the person.
• Class labels may be helpful, though they don’t specify where to focus in the image.
Q. How can we learn task-relevant high-level features using (weak) supervision?
We develop a joint model for feature learning and feature selection.
Related Work
• Feature learning using class labels
– Convolutional Deep Neural Networks (LeCun et al., Neural Computation 1989; Krizhevsky et al., NIPS 2012; Ciresan et al., Neural Computation 2011; etc.)
– Deep (Belief) Networks (Hinton and Salakhutdinov, Science 2006; Bengio et al., NIPS 2006; Hinton et al., Neural Computation 2006; etc.)
– Discriminative RBMs (Larochelle & Bengio, ICML 2008)
• Foreground and background modeling with Boltzmann machines
– Robust Boltzmann machines (Tang et al., CVPR 2012)
– Masked RBMs (Le Roux et al., Neural Computation 2011; Heess et al., ICANN 2011)
– Our model makes use of class labels and performs generative feature selection during feature learning.
Restricted Boltzmann Machines
• Representation
– Undirected bipartite graphical model.
– 𝐯: binary visible (observed) units.
– 𝐡: binary hidden units.
[Figure: bipartite graph of hidden units and visible units connected by weights w1, w2, w3]
Inference and Learning in RBM
• Inference
– Efficient and exact due to conditional independence.
– Joint probability can be estimated using Gibbs sampling.
– Posterior can be used as a feature.
• Training: maximum likelihood.
– Stochastic gradient descent using a sampling-based approximation (e.g., contrastive divergence).
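The inference and training bullets for the RBM can be sketched in a few lines of NumPy. This is a minimal illustration of one contrastive-divergence (CD-1) update for a binary RBM, not the authors' code; the function name `cd1_step`, the parameter names `W`, `b`, `c`, and the learning rate are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One CD-1 update for a binary RBM (illustrative sketch).

    v0: (N, D) batch of binary visible vectors
    W:  (D, K) weights, b: (K,) hidden biases, c: (D,) visible biases
    """
    # Inference: the hidden posterior factorizes, so it is exact and cheap.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # One step of block Gibbs sampling gives the "negative" sample.
    pv1 = sigmoid(h0 @ W.T + c)
    v1 = (rng.random(pv1.shape) < pv1).astype(v0.dtype)
    ph1 = sigmoid(v1 @ W + b)

    # Approximate likelihood gradient: data statistics minus sample statistics.
    n = v0.shape[0]
    W_new = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_new = b + lr * (ph0 - ph1).mean(axis=0)
    c_new = c + lr * (v0 - v1).mean(axis=0)
    return W_new, b_new, c_new, ph0
```

The returned `ph0` is the hidden posterior for the batch, which is the quantity the slide suggests using as a feature.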
Feature Encoding in RBM
[Figure: RBM with a hidden layer and a visible layer. The input samples are from variations of MNIST with natural images in the background (Larochelle et al., ICML 2007).]
Issues with standard RBMs:
1. RBMs assume all input features are useful (e.g., task-relevant), but this may not be true in many scenarios.
2. The set of useful input features may vary across examples.
Point-wise Gated Boltzmann Machines (PGBM)
• Point-wise (or input coordinate-wise) multiplicative interaction between switch and visible variables.
• A per-visible-unit binary switch variable (z1,…,zD) gates each visible variable, so that a visible variable contributes only when it is useful.
[Figure: hidden layer, binary switch variables, and visible layer; example switch pattern 1 0 0 1 1]
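The point-wise gating described here amounts to an element-wise product between the switch vector and the visible vector. A minimal sketch follows, using the switch pattern shown on the slide; the visible values and the helper name `gate` are hypothetical.

```python
import numpy as np

def gate(v, z):
    """Split a visible vector by a binary switch vector (illustrative helper).

    Returns the coordinates that are switched on and, separately,
    the coordinates that are switched off.
    """
    v = np.asarray(v)
    z = np.asarray(z)
    return z * v, (1 - z) * v

v = np.array([1, 1, 0, 1, 0])  # hypothetical binary visible units
z = np.array([1, 0, 0, 1, 1])  # switch pattern from the slide
kept, dropped = gate(v, z)
# kept    -> [1, 0, 0, 1, 0]
# dropped -> [0, 1, 0, 0, 0]
```

Because the product is taken per coordinate, a different `z` for each example selects a different subset of input features.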
[Figure: visible variables 𝐯 and inferred switch variables 𝒛 (red: 1, blue: 0); applying the switch variables yields a clean input.]
• Focus on a different set of input features dynamically.
• What about the irrelevant patterns? PGBM models them using another set of hidden variables.
• Modeling irrelevant patterns helps distinguish relevant patterns from irrelevant ones.
[Figure: plate notation. One group of hidden units focuses on modeling the useful input features; the other group focuses on modeling the rest of the input features.]
• Modeling with more than two components is also possible.
[Figure: plate notation for a model with more than two components]
• Representation
– 𝐯: binary visible units.
– 𝐡: binary hidden units.
– 𝒛: binary switch units.
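One way to write an energy function consistent with this representation, for the two-group case, is sketched below. The symbols $W^1, W^2$ (weights), $b^1, b^2$ (hidden biases), and $c^1, c^2$ (visible biases) are assumptions introduced for illustration rather than notation taken from the slides; $h^1$ and $h^2$ denote the two groups of hidden units.

```latex
E(\mathbf{v}, \mathbf{z}, \mathbf{h}^1, \mathbf{h}^2)
  = -\sum_{i=1}^{D} \sum_{k} z_i v_i \, W^1_{ik} h^1_k
    -\sum_{i=1}^{D} \sum_{k} (1 - z_i) v_i \, W^2_{ik} h^2_k
    -\sum_{k} b^1_k h^1_k - \sum_{k} b^2_k h^2_k
    -\sum_{i=1}^{D} c^1_i z_i v_i - \sum_{i=1}^{D} c^2_i (1 - z_i) v_i
```

Each visible unit $v_i$ interacts with $h^1$ only through $z_i v_i$ and with $h^2$ only through $(1 - z_i) v_i$, which is the point-wise multiplicative gating.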
Learning Discriminative Features in PGBM
• PGBM is an unsupervised learning algorithm; on its own, it can only group semantically distinct features into separate groups of hidden units.
• How can we make PGBM learn discriminative features using class labels?
→ Supervised PGBM
Supervised PGBM
• Labels are connected to one group of hidden units.
[Figure: visible and switch layer, a hidden layer split into task-relevant and task-irrelevant hidden units, and a label layer]
[Related work: Rifai et al., ECCV 2012]
• Representation
– 𝐯: binary visible units.
– 𝐡: binary hidden units.
– 𝒛: binary switch units.
– 𝐲: 1-of-L label units.
Inference and Learning in PGBM
• Inference
– Mean-field or alternating Gibbs sampling for approximate inference.
– Variables of a single type are conditionally independent given the other variables.
• Conditional probabilities P(h1|-)
– h1 focuses on the task-relevant part of the input features, which is gated by the switch variables.
• Conditional probabilities P(h2|-)
– h2 focuses on the task-irrelevant part of the input features, which is gated by the complement of the switch variables.
• Conditional probabilities P(z|-)
– Each switch variable is determined by a competition between h1 and h2, based on how well the contribution (reconstruction) from each group of hidden units matches the visible variable.
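The competition between h1 and h2 can be illustrated with a short sketch. It assumes a two-group model in which each group sends a top-down input to every visible unit, and the switch probability is a two-way softmax over how well each input agrees with the visible value; the parameter names `W1`, `W2`, `c1`, `c2` and the exact functional form are assumptions for this example, not formulas recovered from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def switch_posterior(v, h1, h2, W1, W2, c1, c2):
    """P(z_i = 1 | v, h1, h2) under an assumed two-group parameterization.

    v: (D,) visible units; h1: (K1,), h2: (K2,) hidden groups
    W1: (D, K1), W2: (D, K2) weights; c1, c2: (D,) visible biases
    """
    a1 = W1 @ h1 + c1  # top-down input from the task-relevant group
    a2 = W2 @ h2 + c2  # top-down input from the task-irrelevant group
    # exp(v*a1) / (exp(v*a1) + exp(v*a2)) simplifies to a sigmoid of the difference.
    return sigmoid(v * (a1 - a2))
```

Under this form, a visible unit that is off (`v_i = 0`) gives no evidence either way, so its switch probability is 0.5.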
• Conditional probabilities P(v|-)
– The visible variables are determined by both groups of hidden units.
• Conditional probabilities P(y|-)
– The label variable is inferred from h1 only, not from h2.
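Since the label units are 1-of-L and connect only to h1, a natural form for this conditional is a softmax over label scores computed from h1 alone. The sketch below assumes a label weight matrix `U` and label bias `d`; both names are hypothetical.

```python
import numpy as np

def label_posterior(h1, U, d):
    """P(y = l | h1) as a softmax over L labels (illustrative sketch).

    h1: (K1,) task-relevant hidden units
    U:  (L, K1) label weights, d: (L,) label biases (assumed names)
    Note that h2 does not appear: the labels see only the task-relevant group.
    """
    logits = U @ h1 + d
    logits = logits - logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()
```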
• Training
– Maximum likelihood for the joint distribution over the visible and label variables.
– Stochastic gradient descent using contrastive divergence.
Extensions – deeper architecture
• Propagate only “task-relevant” information to higher layers.
• Stack multiple layers of neural networks on top of the task-relevant group of hidden units.
[Figure: 2nd and 3rd hidden layers and a label layer stacked on the task-relevant hidden units]
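The idea of propagating only task-relevant information upward can be sketched as a feedforward pass that uses the gated input and the task-relevant weights, then applies further layers. This is an illustration under assumed names (`forward_task_relevant`, `W1`, `b1`, `upper_layers`) and assumes sigmoid layers throughout; it is not the authors' architecture definition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_task_relevant(v, z, W1, b1, upper_layers):
    """Feed only the task-relevant hidden activations to higher layers.

    v: (D,) visible units, z: (D,) binary switch units
    W1: (D, K1), b1: (K1,) parameters of the task-relevant group
    upper_layers: list of (W, b) pairs for the stacked layers
    """
    h = sigmoid((z * v) @ W1 + b1)  # gated input, task-relevant group only
    for W, b in upper_layers:
        h = sigmoid(h @ W + b)
    return h
```

Coordinates with `z_i = 0` never reach the upper layers, so changing them leaves the output unchanged.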
Extensions – convolutional PGBM
• Convolutional Point-wise Gated Deep Network (CPGDN)
– Convolutional architectures are good at dealing with spatially (or temporally) correlated data.
– Convolutional deep belief network (CDBN; Lee et al., ICML 2009; Desjardins and Bengio, 2008)
– Low-level features are generic patterns (e.g., edges).
– High-level features are semantically meaningful.
– There are some irrelevant patterns as well.
[Figure: input image with first-layer (W1) and second-layer (W2) features learned by a CDBN (Lee et al., ICML 2009)]
– We can distinguish between task-relevant and task-irrelevant features by applying the point-wise gating idea during feature learning.
[Figure: input image with the learned W1 and W2 features]
Experiments
• Task 1: handwritten digit recognition in the presence of background noise.
• Task 2: learning from large images with cluttered backgrounds, applied to weakly supervised object localization and object recognition.
Experiments – variations of MNIST with background noise
• Recognizing handwritten digits in the presence of background noise.
– Uniform random noise or natural images in the background.
– Rotation transformations are applied.
– Due to the significant amount of distracting factors, learning good features becomes much more challenging, which results in poor recognition performance.
[Figure: example digits from the original, back-rand, back-image, rot-back-rand, and rot-back-image datasets]
Experiments – visualizations
• Learning from noisy handwritten digits with PGBM
[Figure panels: noisy digit images (mnist-back-image); inferred switch variables; learned task-relevant hidden unit weights (mostly pen strokes); learned task-irrelevant hidden unit weights (noisy patterns)]
Experiments – digit recognition
• Handwritten digit recognition
– Evaluated on variations of MNIST.
– Compared with several variations of RBMs:
    • Standard RBM.
    • Implicit mixture of RBMs (imRBM; Nair and Hinton, NIPS 2008) – multiple groups of hidden units.
    • Discriminative RBM (discRBM; Larochelle and Bengio, ICML 2008) – supervised, semi-supervised training.
    • Standard RBM + feature selection.
• Handwritten digit recognition error rates
Experiments – digit recognition
0
10
20
30
40
50
60
RBM
imRBM
discRBM
RBM-FS
PGBM
supervised PGBM
back-rand back-image rot-back-image rot-back-rand
![Page 57: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/57.jpg)
57 / 61
Kihyuk Sohn
• Handwritten digit recognition error rates
Experiments – digit recognition
0
10
20
30
40
50
60
RBM
imRBM
discRBM
RBM-FS
PGBM
supervised PGBM
back-rand back-image rot-back-image rot-back-rand
![Page 58: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/58.jpg)
58 / 61
Kihyuk Sohn
• Handwritten digit recognition error rates
Experiments – digit recognition
[Figure: bar chart of error rates (axis ticks 0 to 60) on back-rand, back-image, rot-back-image, and rot-back-rand for RBM, imRBM, discRBM, RBM-FS, PGBM, and supervised PGBM.]
Feature selection on top of a standard RBM (RBM-FS) does not improve performance.
![Page 59: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/59.jpg)
• Handwritten digit recognition error rates
[Figure: bar chart of error rates (axis ticks 0 to 60) on back-rand, back-image, rot-back-image, and rot-back-rand for RBM, imRBM, discRBM, RBM-FS, PGBM, and supervised PGBM.]
Experiments – digit recognition
Joint feature learning and feature selection improves performance.
![Page 60: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/60.jpg)
• Handwritten digit recognition error rates
[Figure: bar chart of error rates (axis ticks 0 to 60) on back-rand, back-image, rot-back-image, and rot-back-rand for RBM, imRBM, discRBM, RBM-FS, PGBM, and supervised PGBM.]
Experiments – digit recognition
Absolute error rates decrease by 2 to 8 percentage points.
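An absolute decrease is a difference in percentage points, which is not the same as a relative decrease. The short sketch below makes the distinction concrete. The function names and the two error rates are invented for illustration and are not values read from the chart.

```python
def absolute_decrease(baseline_error, new_error):
    """Difference, in percentage points, between two error rates given in percent."""
    return baseline_error - new_error


def relative_decrease(baseline_error, new_error):
    """Decrease expressed as a fraction of the baseline error rate."""
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, NOT values from the slides.
baseline, improved = 20.0, 15.0
print(absolute_decrease(baseline, improved))  # 5.0 percentage points
print(relative_decrease(baseline, improved))  # 0.25, i.e. 25% relative
```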
![Page 61: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/61.jpg)
• Comparison to other deep learning methods
Experiments – digit recognition
[Figure: bar chart of error rates (axis ticks 0 to 60) on back-rand, back-image, and rot-back-image for RBM, DBN-3 (Vincent et al., 2008), CAE-2 (Rifai et al., 2011), PGBM, supervised PGBM, and PGBM + DN-1.]
![Page 64: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/64.jpg)
• Comparison to other deep learning methods
Experiments – digit recognition
[Figure: bar chart of error rates (axis ticks 0 to 60) on back-rand, back-image, and rot-back-image for RBM, DBN-3 (Vincent et al., 2008), CAE-2 (Rifai et al., 2011), PGBM, supervised PGBM, and PGBM + DN-1.]
Absolute error rates decrease by 3 to 13 percentage points, achieving state-of-the-art results.
![Page 65: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/65.jpg)
• Given cluttered, high-resolution images, how can we find relevant foreground features?
– Weakly supervised setting (no bounding box is given).
• Convolutional point-wise gating performs generative feature selection jointly with feature learning on large images.
Experiments – learning from large images with cluttered background
![Page 66: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/66.jpg)
• CPGDN on Caltech 101 dataset
Experiments – CPGDN
[Figure: CPGDN diagram with an input image and panels labeled W1 and W2.]
![Page 67: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/67.jpg)
• Learned set of filters (task-relevant/irrelevant)
• (Weakly supervised) object localization
Experiments – weakly supervised object segmentation
[Figure panels: Caltech101 – Faces; Caltech101 – car side. 1st row: switch unit activation map. 2nd row: predicted and ground-truth bounding boxes.]
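The slides do not say how a bounding box is derived from the switch unit activation map. A minimal sketch of one plausible approach is to threshold the map and take the tightest box around the surviving cells, then compare it with the ground truth by intersection over union. The threshold value, the function names, and the toy map below are all assumptions made for illustration, not the authors' procedure.

```python
def bounding_box(activation_map, threshold=0.5):
    """Tightest inclusive (top, left, bottom, right) box around cells above threshold.

    Returns None when no cell exceeds the threshold. The threshold is an assumption.
    """
    rows = [r for r, row in enumerate(activation_map) if any(a > threshold for a in row)]
    cols = [c for row in activation_map for c, a in enumerate(row) if a > threshold]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


def box_area(box):
    """Number of cells in an inclusive (top, left, bottom, right) box."""
    return (box[2] - box[0] + 1) * (box[3] - box[1] + 1)


def iou(box_a, box_b):
    """Intersection over union of two inclusive boxes."""
    top, left = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    bottom, right = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, bottom - top + 1) * max(0, right - left + 1)
    return inter / (box_area(box_a) + box_area(box_b) - inter)


# Toy activation map (hypothetical values), high in the central 2 x 2 block.
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.1, 0.9, 0.8, 0.1],
    [0.0, 0.7, 0.9, 0.2],
    [0.1, 0.1, 0.2, 0.1],
]
predicted = bounding_box(toy_map)
ground_truth = (1, 1, 3, 2)  # hypothetical annotation
print(predicted)                     # (1, 1, 2, 2)
print(iou(predicted, ground_truth))  # 4 shared cells out of 6 in the union
```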
![Page 68: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/68.jpg)
• Learned set of bases from 101 classes
Experiments – weakly supervised object segmentation
Caltech101 – task-relevant
Caltech101 – task-irrelevant
![Page 69: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/69.jpg)
Experiments – weakly supervised object segmentation
1st row: switch unit activation map, 2nd row: predicted and ground truth bounding box.
![Page 70: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/70.jpg)
• Object recognition with predicted bounding boxes.
Experiments – object recognition
1. Bounding box prediction using CPGDN.
2. Dense SIFT feature extraction from cropped image.
3. Feature encoding with Gaussian RBM or CRBM (Sohn et al., ICCV 2011).
4. Spatial pyramid pooling, followed by linear SVM.
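Step 4 can be illustrated with a small, dependency-free sketch of spatial pyramid pooling. It assumes max pooling, pyramid levels of 1 x 1, 2 x 2, and 4 x 4, and a grid of already encoded feature vectors. None of these choices is stated on the slide, and the function names are hypothetical. The linear SVM that would consume the pooled vector is omitted.

```python
def max_pool(vectors):
    """Element-wise maximum over a non-empty list of equal-length feature vectors."""
    return [max(values) for values in zip(*vectors)]


def spatial_pyramid_pool(feature_grid, levels=(1, 2, 4)):
    """Concatenate max-pooled features over an n x n partition for each n in levels.

    feature_grid[r][c] is the encoded feature vector at grid position (r, c).
    Assumes the grid has at least max(levels) rows and columns, so no cell is empty.
    """
    height, width = len(feature_grid), len(feature_grid[0])
    pooled = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                r0, r1 = i * height // n, (i + 1) * height // n
                c0, c1 = j * width // n, (j + 1) * width // n
                cell = [feature_grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                pooled.extend(max_pool(cell))
    return pooled


# Toy 4 x 4 grid of 2-dimensional features; each feature is [row, column].
grid = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
descriptor = spatial_pyramid_pool(grid)
print(len(descriptor))  # (1 + 4 + 16) cells x 2 dimensions = 42
```

The descriptor length grows with the number of pyramid cells, so the pooled vector has a fixed size regardless of the size of the cropped image.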
![Page 71: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/71.jpg)
• Classification accuracy on Caltech 101 dataset
Experiments – object recognition
[Figure: bar chart of classification accuracy (axis ticks 50 to 80) at 15 images/class and 30 images/class for RBM (Sohn et al., 2011), CPGDN + RBM, CRBM (Sohn et al., 2011), and CPGDN + CRBM.]
![Page 73: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/73.jpg)
• Comparison to other results on Caltech 101 dataset
– With a single type of features.
– MKL (Yang et al., 2009): 84.3%
– Multipath sparse coding (Bo et al., 2013): 82.5%
– ….
Experiments – object recognition
![Page 74: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/74.jpg)
• We propose PGBMs, which jointly perform feature learning and feature selection in a unified framework.
• PGBMs effectively learn useful representations from data containing significant irrelevant or distracting patterns.
Conclusion
![Page 75: Learning and Selecting Features Jointly with Point-wise ...kihyuks/pubs/icml2013-Sohnetal-pgbm_slide.pdf · Learning and Selecting Features Jointly with Point-wise Gated Boltzmann](https://reader033.fdocuments.net/reader033/viewer/2022041417/5e1c7a54b2f8cf19605386b3/html5/thumbnails/75.jpg)
Thank you
Demo code is available at http://umich.edu/~kihyuks/pubs/pgbm.zip