Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When...

80
Shu Kong CS, ICS, UCI Pay Attention to the Pixel, Understand the Scene Better

Transcript of Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When...

Page 1: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Shu KongCS, ICS, UCI

Pay Attention to the Pixel, Understand the Scene Better

Page 2: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

pixel level labelingsemantic segmentation -- assigning a class label to each pixel

Background: Scene Parsing

wall painting pillow sofa cabinet towel

Page 3: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Background: Scene Parsing

old days (before 2013), extracting features to represent pixels,unary pixel classification and pixel pairs for CRF

features, transforms,grouping, etc.

Page 4: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Background: Scene Parsing

nowadays, deep learning

Page 5: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Convolutional Neural Network (CNN)

aggregating local information

avoid computing over whole image directly

Background: Convolutional Neural Net

Y. Handwritten digit recognition: Applications of neural net chips and automatic learning, Neural Computation, 1989.

Page 6: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Receptive Field @ high-level layers

Background: Receptive Field in CNN

photo credit to Honglak Lee

Page 7: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Image Classification

Background: CNN for Scene Parsing

cat vs. dog

Page 8: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Dense Prediction

Background: CNN for Scene Parsing

wall painting pillow sofa cabinet towel

Page 9: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Background: Perspective and Scale

size(car) > size(train)?

size(chair) > size(whiteboard)?

Page 10: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Outline

Page 11: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Outline

Page 12: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Goal: deciding for each pixel the size of receptive field (RF) to aggregate information

Perspective-aware Pooling

Page 13: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Goal: deciding for each pixel the size of receptive field (RF) to aggregate information

The closer the object is to the camera, the larger size it appears in the image, the larger RF the model should use to aggregate information.

Perspective-aware Pooling

Page 14: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Goal: deciding for each pixel the size of receptive field (RF) to aggregate information

The closer the object is to the camera, the larger size it appears in the image, the larger RF the model should use to aggregate information..

Depth conveys the scale information.

Perspective-aware Pooling

Page 15: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

Perspective-aware Pooling

Page 16: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

dilated convolution (Atrous Convolution).

Perspective-aware Pooling

Page 17: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

dilated convolution (Atrous Convolution).

Perspective-aware Pooling

Page 18: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

2D atrous convolution of different dilate rates.

allowing for larger RF to aggregate more contextual information

Perspective-aware Pooling

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

Page 19: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

Perspective-aware Pooling

Page 20: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

quantize the depth into five scales with dilate rates {1, 2, 4, 8, 16}

Perspective-aware Pooling

Page 21: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

quantize the depth into five scales with dilate rates {1, 2, 4, 8, 16}

Perspective-aware Pooling

Multiplicative gating

Page 22: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

quantize the depth into five scales with dilate rates {1, 2, 4, 8, 16}

Perspective-aware Pooling

Page 23: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Idea: making the pooling size adaptive w.r.t depth

quantize the depth into five scales with dilate rates {1, 2, 4, 8, 16}

Perspective-aware Pooling

Page 24: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

When depth is not available in inference --

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 25: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

When depth is not available in inference --

Idea: train a depth estimator

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 26: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

When depth is not available in inference --

Idea: train a depth estimator

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 27: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

When depth is not available in inference --

Idea: train a depth estimator

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Why better?

capacity, representation power

Page 28: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Consistent improvement

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 29: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Recurrent refinement by adapting the predicted depth

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 30: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Recurrent refinement by adapting the predicted depth

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 31: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Recurrently refining by adapting the predicted depth

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 32: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Recurrent refinement by adapting the predicted depth

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 33: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

No depth at all?

Idea: train unsupervised attention map

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 34: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

No depth at all?

Idea: train unsupervised attention map

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

Page 35: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

No depth at all?

Idea: train unsupervised attention map

Perspective-aware Pooling

S. Kong, C. Fowlkes, Recurrent Scene Parsing with Perspective Understanding in the Loop, CVPR, 2018

gt-depth

pred-depth

attention

Page 36: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Pixel-wise Attentional Gating

Page 37: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Improvement

1. attention vs. depth

general, learning rule

Pixel-wise Attentional Gating

Page 38: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Improvement

1. attention vs. depth

general, learning rule

2. binary gating vs. weighted gating

Pixel-wise Attentional Gating

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018

Page 39: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Improvement

1. attention vs. depth

general, learning rule

2. binary gating vs. weighted gating

computational saving

Pixel-wise Attentional Gating

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018

Page 40: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

PAG produces binary masks, e.g., using argmax on softmax.

Pixel-wise Attentional Gating

Page 41: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

PAG produces binary masks, e.g., using argmax on softmax.

How to output binary maps but still allowing end-to-end training?

Pixel-wise Attentional Gating

Page 42: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

PAG produces binary masks, e.g., using argmax on softmax.

How to output binary maps but still allowing end-to-end training?

Idea: Gumbel-softmax trick[1,2]

Pixel-wise Attentional Gating

[1] Categorical reparameterization with gumbel-softmax, ICLR, 2017[2] The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

Page 43: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

The normal way for sampling is to use argmax operator on softmax.

Pixel-wise Attentional Gating

Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)

Page 44: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

The normal way for sampling is to use argmax operator on softmax.

Then,

Pixel-wise Attentional Gating

Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)

Page 45: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

The normal way for sampling is to use argmax operator on softmax.

Then,

Here is an alternative that

where

Pixel-wise Attentional Gating

Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)

Page 46: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

The normal way for sampling is to use argmax operator on softmax.

Then,

Here is an alternative that

where

random variable m follows

Pixel-wise Attentional Gating

Gumbel, E.J.: Statistics of extremes. Courier Corporation (2012)

Page 47: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

but not continuous, not differentiable

Pixel-wise Attentional Gating

Categorical reparameterization with gumbel-softmax, ICLR, 2017The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

Page 48: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

but not continuous, not differentiable

expressing a discrete random variable as a one-hot vector

Pixel-wise Attentional Gating

Categorical reparameterization with gumbel-softmax, ICLR, 2017The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

Page 49: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

but not continuous, not differentiable

expressing a discrete random variable as a one-hot vector

the Gumbel softmax relaxation is

Pixel-wise Attentional Gating

Categorical reparameterization with gumbel-softmax, ICLR, 2017The concrete distribution: A continuous relaxation of discrete random variables, ICLR, 2017

Page 50: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

one-hot-encoded categorical distribution

from argmax to uniform controlled by = 0~inf

Pixel-wise Attentional Gating

Categorical reparameterization with gumbel-softmax, ICLR, 2017

Page 51: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

PAG produces binary masks.

Two applications

1. parallel pooling branch for deciding the “right” receptive field

2. pixel-level dynamic routing

Pixel-wise Attentional Gating

Page 52: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Attentional Pooling for the “Right” Receptive Field

Page 53: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Improvement

1. attention vs. depth

general, learning rule

2. binary gating vs. weighted gating

computational saving

Attentional Pooling for the “Right” Receptive Field

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, 2018

Page 54: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Visual summary of three tasks on three different datasets

Consistent result:

PAG improves baseline, and is comparable to Multiplicative gating

Attentional Pooling for the “Right” Receptive Field

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 55: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

semantic segmentation

Attentional Pooling for the “Right” Receptive Field

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 56: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Pixel-Level Dynamic Routing

Page 57: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

dynamic computation time for each pixel of an image

It is useful with limited computation budget.

Pixel-Level Dynamic Routing

Page 58: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Pixel-Level Dynamic Routing

Page 59: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

sparse PAG at each layer

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 60: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

sparse binary mask

Using KL-divergence term for sparse masks.

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 61: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

experiment of semantic segmentation

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 62: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

experiment of semantic segmentation

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 63: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

learning to skip layers for dynamic routing

Pixel-Level Dynamic Routing

[1] BlockDrop: Dynamic Inference Paths in Residual Networks[2] Convolutional Networks with Adaptive Computation Graphs[3] SkipNet: Learning Dynamic Routing in Convolutional Networks

Page 64: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

experiment of semantic segmentation

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 65: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

experiment of semantic segmentation

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 66: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Semantic segmentation on NYU-depth-v2 dataset

Boundary detection on BSDS500 dataset

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 67: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Visualization of sparse binary masks

ponder map = pixel-level computation depth

Pixel-Level Dynamic Routing

S. Kong, C. Fowlkes, Pixel-wise Attentional Gating for Parsimonious Pixel Labeling, arxiv 1805.01556, 2018

Page 68: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

indoor image panorama street scene

Pixel-Level Dynamic Routing

Page 69: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Perspective-aware Pooling

2. Pixel-wise Attentional Gating (PAG)

1. Attentional Pooling for the “right” receptive field

2. pixel-level dynamic routing

3. Discussion

Discussion

Page 70: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.

Discussion

Page 71: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.

2. PAG scales back computation with little cost in performance.

Discussion

Page 72: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.

2. PAG scales back computation with little cost in performance.

enough savings? real-time?

Discussion

Page 73: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.

2. PAG scales back computation with little cost in performance.

3. More attention to pixels for scene parsing.

Discussion

Page 74: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

Page 75: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

Page 76: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

Page 77: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

Page 78: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

More Attention to Pixels for Scene Parsing

photo credit to Nathan Silberman, Wongun Choi, Sean Bell, Edward Hsiao, iRobot

Page 79: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

1. Pixel-wise Attentional Gating unit (PAG) allocates dynamic computation for pixels; it is general, agnostic to architectures and problems.

2. PAG scales back computation with little cost in performance. real-time?

3. More attention to pixels for scene parsing.

4. Potentially a unified model for all these tasks. How?

Discussion

Page 80: Pay Attention to the Pixel, Understand the Scene Betterskong2/slides/20180514_AIML_UCI.pdf · When depth is not available in inference --Perspective-aware Pooling S. Kong, C. Fowlkes,

Thanks

Q&A

Charless FowlkesShu Kong