Pay Attention to the Pixel, Understand the Scene Better
Transcript of slides: skong2/slides/20180514_AIML_UCI.pdf
![Page 1: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/1.jpg)
Shu Kong
CS, ICS, UCI
Pay Attention to the Pixel, Understand the Scene Better
![Page 2: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/2.jpg)
pixel-level labeling
semantic segmentation -- assigning a class label to each pixel
Background: Scene Parsing
wall, painting, pillow, sofa, cabinet, towel
![Page 3: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/3.jpg)
Background: Scene Parsing
old days (before 2013): extracting features to represent pixels, unary pixel classification, and pixel pairs for the CRF
features, transforms, grouping, etc.
![Page 4: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/4.jpg)
Background: Scene Parsing
nowadays, deep learning
![Page 5: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/5.jpg)
Convolutional Neural Network (CNN)
aggregating local information
avoids computing over the whole image directly
Background: Convolutional Neural Net
Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.
![Page 6: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/6.jpg)
Receptive Field @ high-level layers
Background: Receptive Field in CNN
photo credit to Honglak Lee
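How the receptive field grows with depth can be made concrete with the usual recurrence for stacked layers. The sketch below is illustrative only; the `(kernel, stride, dilation)` layer tuples and the three toy stacks are assumptions, not networks from the slides.

```python
def receptive_field(layers):
    """Return the receptive-field size (in input pixels) of the last layer.

    Each layer is a (kernel_size, stride, dilation) tuple.
    """
    rf, jump = 1, 1  # RF of one input pixel; spacing between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Hypothetical toy stacks.
three_convs = [(3, 1, 1)] * 3                       # three 3x3 convs
with_pooling = [(3, 1, 1), (2, 2, 1), (3, 1, 1)]    # conv, 2x2 pool, conv
with_dilation = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # dilation 1, 2, 4

print(receptive_field(three_convs))
print(receptive_field(with_pooling))
print(receptive_field(with_dilation))
```

Striding and dilation both enlarge the receptive field without adding weights, which is why later slides use dilation to control it.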
![Page 7: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/7.jpg)
Image Classification
Background: CNN for Scene Parsing
cat vs. dog
![Page 8: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/8.jpg)
Dense Prediction
Background: CNN for Scene Parsing
wall, painting, pillow, sofa, cabinet, towel
![Page 9: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/9.jpg)
Background: Perspective and Scale
size(car) > size(train)?
size(chair) > size(whiteboard)?
![Page 10: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/10.jpg)
1. Perspective-aware Pooling
2. Pixel-wise Attentional Gating (PAG)
   1. Attentional Pooling for the “right” receptive field
   2. Pixel-level dynamic routing
3. Discussion
Outline
![Page 14: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/14.jpg)
Goal: decide, for each pixel, the size of the receptive field (RF) over which to aggregate information
The closer an object is to the camera, the larger it appears in the image, and the larger the RF the model should use to aggregate information.
Depth conveys scale information.
Perspective-aware Pooling
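The link between depth and apparent size follows from the pinhole camera model. As a sketch, assume an object of physical height $H$ at depth $Z$, imaged with focal length $f$ to an image height $h$ (these symbols are introduced here and do not appear in the slides):

```latex
h = \frac{f\,H}{Z}
\qquad\Longrightarrow\qquad
\frac{h_1}{h_2} = \frac{Z_2}{Z_1}
\quad \text{for the same } H .
```

Under this model, halving the depth doubles the apparent size, so a pixel on a nearby object calls for a proportionally larger RF.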
![Page 16: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/16.jpg)
Idea: make the pooling size adaptive w.r.t. depth
dilated convolution (atrous convolution)
Perspective-aware Pooling
![Page 18: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/18.jpg)
2D atrous convolution with different dilation rates.
allowing for larger RF to aggregate more contextual information
Perspective-aware Pooling
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
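To illustrate how the dilation rate enlarges the RF without adding weights, here is a minimal pure-Python sketch of 2D atrous convolution. It is not the DeepLab implementation; the function name `atrous_conv2d` and the toy ramp image and box filter are assumptions made for this example.

```python
def atrous_conv2d(image, kernel, rate):
    """'Valid' 2D atrous (dilated) convolution on nested lists.

    Kernel taps are spaced `rate` pixels apart, so a k x k kernel covers
    rate * (k - 1) + 1 pixels per side with the same number of weights.
    Like most deep-learning libraries, this computes cross-correlation.
    """
    k = len(kernel)
    span = rate * (k - 1) + 1  # side length of the receptive field
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    return [
        [
            sum(
                kernel[i][j] * image[y + i * rate][x + j * rate]
                for i in range(k)
                for j in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical toy inputs: a 7x7 ramp image and a 3x3 box filter.
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
box = [[1.0] * 3 for _ in range(3)]

dense = atrous_conv2d(image, box, rate=1)   # 3x3 RF, 5x5 output
sparse = atrous_conv2d(image, box, rate=2)  # 5x5 RF, 3x3 output
```

With `rate=2` the same nine weights read pixels from a 5x5 window, which is the mechanism the slide refers to as aggregating more context.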
![Page 21: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/21.jpg)
Idea: make the pooling size adaptive w.r.t. depth
quantize the depth into five scales with dilation rates {1, 2, 4, 8, 16}
Perspective-aware Pooling
Multiplicative gating
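The quantize-then-gate idea can be sketched for a single pixel as follows. The five rates come from the slide; the uniform binning over `[d_min, d_max]`, the helper names `depth_to_scale` and `gate`, and the branch responses are assumptions of this sketch, not details confirmed by the source.

```python
RATES = [1, 2, 4, 8, 16]  # dilation rates from the slide


def depth_to_scale(depth, d_min, d_max, n_scales=len(RATES)):
    """Quantize depth into a scale index: nearest depth -> largest rate.

    Uniform bins over [d_min, d_max] are an assumption of this sketch.
    """
    t = (depth - d_min) / (d_max - d_min)  # 0 = near, 1 = far
    t = min(max(t, 0.0), 1.0)              # clamp out-of-range depths
    bin_index = min(int(t * n_scales), n_scales - 1)
    return n_scales - 1 - bin_index        # flip so that near maps to rate 16


def gate(branch_outputs, scale_index):
    """Multiplicative gating at one pixel: one-hot mask times branch outputs."""
    mask = [1.0 if k == scale_index else 0.0 for k in range(len(branch_outputs))]
    return sum(m * y for m, y in zip(mask, branch_outputs))


# Hypothetical responses of the five dilated branches at one pixel.
branches = [0.1, 0.2, 0.3, 0.4, 0.5]
near = depth_to_scale(0.5, d_min=0.5, d_max=10.0)
far = depth_to_scale(10.0, d_min=0.5, d_max=10.0)

print(RATES[near], gate(branches, near))
print(RATES[far], gate(branches, far))
```

A near pixel selects the branch with rate 16 and a far pixel the branch with rate 1, matching the rule that closer objects need a larger RF.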
![Page 27: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/27.jpg)
When depth is not available at inference:
Idea: train a depth estimator
Perspective-aware Pooling
S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
Why better?
capacity, representation power
![Page 28: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/28.jpg)
Consistent improvement
Perspective-aware Pooling
S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
![Page 29: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/29.jpg)
Recurrent refinement by adapting the predicted depth
Perspective-aware Pooling
S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
![Page 35: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/35.jpg)
No depth at all?
Idea: train an unsupervised attention map
Perspective-aware Pooling
S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018
gt-depth
pred-depth
attention
![Page 36: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/36.jpg)
1. Perspective-aware Pooling
2. Pixel-wise Attentional Gating (PAG)
   1. Attentional Pooling for the “right” receptive field
   2. Pixel-level dynamic routing
3. Discussion
Pixel-wise Attentional Gating
![Page 39: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/39.jpg)
Improvement
1. attention vs. depth
general, learning rule
2. binary gating vs. weighted gating
computational saving
Pixel-wise Attentional Gating
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018
![Page 42: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/42.jpg)
PAG produces binary masks, e.g., using argmax on softmax.
How can binary maps be output while still allowing end-to-end training?
Idea: Gumbel-softmax trick [1, 2]
Pixel-wise Attentional Gating
[1] Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017
[2] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017
![Page 46: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/46.jpg)
The usual way to sample is to apply the argmax operator to the softmax output.
Then,
Here is an alternative that
where
random variable m follows
Pixel-wise Attentional Gating
Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)
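For reference, the standard Gumbel-max trick can be written as below. The symbols $\pi_k$ (class probabilities over $K$ classes), $g_k$ (Gumbel noise) and $u_k$ (uniform noise) are notation chosen for this sketch and may differ from the slide's own.

```latex
m = \arg\max_{k \in \{1,\dots,K\}} \bigl( \log \pi_k + g_k \bigr),
\qquad
g_k = -\log\bigl(-\log u_k\bigr),
\quad u_k \sim \mathrm{Uniform}(0, 1),
\qquad\text{so that}\qquad
P(m = k) = \pi_k .
```

The randomness is moved into the noise $g_k$, so the sample $m$ becomes a deterministic function of $\pi$ and the noise.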
![Page 49: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/49.jpg)
but not continuous, not differentiable
expressing a discrete random variable as a one-hot vector
the Gumbel softmax relaxation is
Pixel-wise Attentional Gating
Categorical Reparameterization with Gumbel-Softmax, ICLR, 2017
The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables, ICLR, 2017
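As given in the two cited papers, the relaxation replaces the argmax with a temperature-controlled softmax. In the sketch below, $\pi_k$ denotes class probabilities, $g_k = -\log(-\log u_k)$ with $u_k \sim \mathrm{Uniform}(0,1)$ is Gumbel noise, and $\tau > 0$ is the temperature; this notation is an assumption and may not match the slide's.

```latex
y_k = \frac{\exp\bigl((\log \pi_k + g_k)/\tau\bigr)}
           {\sum_{j=1}^{K} \exp\bigl((\log \pi_j + g_j)/\tau\bigr)},
\qquad k = 1, \dots, K .
```

The vector $y$ is continuous and differentiable in $\pi$, and it approaches a one-hot vector as $\tau \to 0$.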
![Page 50: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/50.jpg)
one-hot-encoded categorical distribution
from argmax to uniform, controlled by the temperature τ (from 0 to ∞)
Pixel-wise Attentional Gating
Categorical reparameterization with gumbel-softmax, ICLR, 2017
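The effect of the temperature can be seen in a small standard-library sketch of Gumbel-softmax sampling. This is an illustration rather than the authors' code; the function names, the seed, and the example probabilities are assumptions.

```python
import math
import random


def sample_gumbel(rng):
    """Draw g = -log(-log(u)) with u ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_softmax(probs, tau, rng):
    """Relaxed one-hot sample: softmax((log p_k + g_k) / tau)."""
    scores = [(math.log(p) + sample_gumbel(rng)) / tau for p in probs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
probs = [0.7, 0.2, 0.1]  # hypothetical class probabilities

soft = gumbel_softmax(probs, tau=1.0, rng=rng)     # smooth weights
hard = gumbel_softmax(probs, tau=0.01, rng=rng)    # low tau: approaches one-hot
flat = gumbel_softmax(probs, tau=1000.0, rng=rng)  # high tau: approaches uniform
```

In training, a low temperature gives masks close to binary while gradients can still flow through the softmax.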
![Page 51: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/51.jpg)
PAG produces binary masks.
Two applications
1. parallel pooling branch for deciding the “right” receptive field
2. pixel-level dynamic routing
Pixel-wise Attentional Gating
![Page 52: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/52.jpg)
1. Perspective-aware Pooling
2. Pixel-wise Attentional Gating (PAG)
   1. Attentional Pooling for the “right” receptive field
   2. Pixel-level dynamic routing
3. Discussion
Attentional Pooling for the “Right” Receptive Field
![Page 53: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/53.jpg)
Improvement
1. attention vs. depth
general, learning rule
2. binary gating vs. weighted gating
computational saving
Attentional Pooling for the “Right” Receptive Field
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018
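Where the computational saving of binary gating comes from can be shown with a toy count of branch evaluations at one pixel. The branches, weights, and call counter below are hypothetical and stand in for the parallel pooling branches; this is a sketch of the idea, not the paper's implementation.

```python
def weighted_gating(x, branches, weights):
    """Soft gating: every branch is evaluated, then blended by its weight."""
    return sum(w * f(x) for f, w in zip(branches, weights))


def binary_gating(x, branches, weights):
    """Binary gating: only the arg-max branch is evaluated."""
    k = max(range(len(weights)), key=weights.__getitem__)
    return branches[k](x)


calls = {"n": 0}


def make_branch(scale):
    def branch(x):
        calls["n"] += 1  # count how often a branch is actually run
        return scale * x
    return branch


branches = [make_branch(s) for s in (1.0, 2.0, 4.0)]
weights = [0.2, 0.7, 0.1]  # hypothetical per-pixel attention weights

soft_out = weighted_gating(3.0, branches, weights)
soft_calls = calls["n"]

calls["n"] = 0
hard_out = binary_gating(3.0, branches, weights)
hard_calls = calls["n"]
```

Weighted gating runs all three branches, while binary gating runs only the selected one, so the cost per pixel no longer grows with the number of branches.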
![Page 54: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/54.jpg)
Visual summary of three tasks on three different datasets
Consistent result:
PAG improves over the baseline and is comparable to multiplicative gating
Attentional Pooling for the “Right” Receptive Field
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 55: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/55.jpg)
semantic segmentation
Attentional Pooling for the “Right” Receptive Field
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 56: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/56.jpg)
1. Perspective-aware Pooling
2. Pixel-wise Attentional Gating (PAG)
   1. Attentional Pooling for the “right” receptive field
   2. Pixel-level dynamic routing
3. Discussion
Pixel-Level Dynamic Routing
![Page 57: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/57.jpg)
dynamic computation time for each pixel of an image
It is useful under a limited computation budget.
Pixel-Level Dynamic Routing
![Page 58: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/58.jpg)
Pixel-Level Dynamic Routing
![Page 59: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/59.jpg)
sparse PAG at each layer
Pixel-Level Dynamic Routing
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 60: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/60.jpg)
sparse binary mask
Using KL-divergence term for sparse masks.
Pixel-Level Dynamic Routing
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
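One common way to write a KL sparsity term, familiar from sparse autoencoders, compares a target activation rate with the mask's empirical rate. The slide does not give the exact form, so the direction of the KL, the target `rho_target`, and the helper names below are assumptions of this sketch.

```python
import math


def bernoulli_kl(rho_target, rho_hat, eps=1e-8):
    """KL( Bernoulli(rho_target) || Bernoulli(rho_hat) ), rho_target in (0, 1)."""
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)  # keep the logs finite
    return (
        rho_target * math.log(rho_target / rho_hat)
        + (1.0 - rho_target) * math.log((1.0 - rho_target) / (1.0 - rho_hat))
    )


def sparsity_penalty(mask, rho_target):
    """Penalty that grows as the mask's mean activation drifts from the target."""
    flat = [v for row in mask for v in row]
    rho_hat = sum(flat) / len(flat)
    return bernoulli_kl(rho_target, rho_hat)


# Hypothetical 2x4 masks (1 = pixel is processed by the layer).
mask_dense = [[1, 1, 1, 1], [1, 1, 0, 1]]   # 7 of 8 pixels active
mask_sparse = [[1, 0, 0, 0], [0, 1, 0, 0]]  # 2 of 8 pixels active

dense_penalty = sparsity_penalty(mask_dense, rho_target=0.25)
sparse_penalty = sparsity_penalty(mask_sparse, rho_target=0.25)
```

The penalty vanishes when the fraction of active pixels equals the target and increases as the mask becomes denser, which pushes the gates toward sparse masks.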
![Page 61: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/61.jpg)
Experiment: semantic segmentation
Pixel-Level Dynamic Routing
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 63: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/63.jpg)
learning to skip layers for dynamic routing
Pixel-Level Dynamic Routing
[1] BlockDrop: Dynamic Inference Paths in Residual Networks
[2] Convolutional Networks with Adaptive Computation Graphs
[3] SkipNet: Learning Dynamic Routing in Convolutional Networks
![Page 64: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/64.jpg)
Experiment: semantic segmentation
Pixel-Level Dynamic Routing
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 66: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/66.jpg)
Semantic segmentation on NYU-depth-v2 dataset
Boundary detection on BSDS500 dataset
Pixel-Level Dynamic Routing
S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018
![Page 67: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/67.jpg)
Visualization of sparse binary masks
ponder map = pixel-level computation depth
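A rough sketch of how such a ponder map could be computed, assuming it is the per-pixel count of layers whose binary mask is active (one reading of "pixel-level computation depth"). The function name `ponder_map` and the toy masks are illustrative assumptions, not the authors' code or data.

```python
import numpy as np

def ponder_map(masks):
    """Sum per-layer binary masks into a per-pixel computation depth.

    masks: (L, H, W) array of 0/1 values, one mask per gated layer
    returns: (H, W) array; entry (i, j) is the number of layers
             that were actually computed at pixel (i, j)
    """
    return np.asarray(masks).sum(axis=0)

# Three gated layers over a 2x2 image (made-up masks).
masks = np.array([
    [[1, 0], [1, 1]],
    [[1, 0], [0, 1]],
    [[1, 1], [0, 1]],
])
pm = ponder_map(masks)

# Fraction of (layer, pixel) pairs that were computed at all.
computed_fraction = masks.mean()
```

Here the pixels at (0, 0) and (1, 1) pass through all three layers, while the other two pixels are processed by only one.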
Pixel-Level Dynamic Routing
![Page 68: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/68.jpg)
indoor image, panorama, street scene
Pixel-Level Dynamic Routing
![Page 69: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/69.jpg)
1. Perspective-aware Pooling
2. Pixel-wise Attentional Gating (PAG)
   1. Attentional Pooling for the “right” receptive field
   2. Pixel-level dynamic routing
3. Discussion
Discussion
![Page 70: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/70.jpg)
1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.
Discussion
![Page 71: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/71.jpg)
2. PAG scales back computation with little cost in performance.
Discussion
![Page 72: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/72.jpg)
enough savings? real-time?
Discussion
![Page 73: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/73.jpg)
3. More attention to pixels for scene parsing.
Discussion
![Page 74: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/74.jpg)
More Attention to Pixels for Scene Parsing
photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot
![Page 75: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/75.jpg)
![Page 76: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/76.jpg)
![Page 77: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/77.jpg)
![Page 78: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/78.jpg)
![Page 79: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/79.jpg)
1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.
2. PAG scales back computation with little cost in performance. real-time?
3. More attention to pixels for scene parsing.
4. Potentially a unified model for all these tasks. How?
Discussion
![Page 80: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,](https://reader034.fdocuments.net/reader034/viewer/2022050106/5f444178251ff36099231a41/html5/thumbnails/80.jpg)
Thanks
Q&A
Charless Fowlkes, Shu Kong