Fully Convoluional Networks for Semantic...

16
Fully Convolutional Networks for Semantic Segmentation Jonathan Long, Evan Shelhamer, Trevor Darrell Presenter: Hannah Li

Transcript of Fully Convoluional Networks for Semantic...

Page 1: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Fully Convolutional Networks for Semantic Segmentation

Jonathan Long, Evan Shelhamer, Trevor Darrell

Presenter: Hannah Li

Page 2: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Fixed dimension and throws away spatial coordinatesConvolutional Networks

Fully Convolutional Networks

Page 3: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Fully Convolutional Networks

xi,j - location (i, j) for a particular layeryi,j - location (i, j) for the following layerk - kernel (filter) sizes - stridefks - determines the layer type:

- matrix multiplication for convolution or average pooling- spatial max for max pooling- elementwise nonlinearity for an activation function

To find the gradient descent on the whole image, you only need to find the gradient descent of the final layer

Page 4: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Output dimensions are reduced to keep filters small and

computational requirements reasonable

However, we need dense pixels…

Page 5: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Solution: Deconvolution/Upsampling

stride = 1 stride = 2

https://github.com/vdumoulin/conv_arithmetic

Page 6: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-pixels.pdf

Deconvolution

Bilinear interpolation

Page 7: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

From classifier to dense FCN1. Discard the final layer2. Convert all fully connected layers to convolutions3. Append convolution with channel dimension 21 to

predict score for the PASCAL classes

Page 8: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean
Page 9: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean
Page 10: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

FCN for segmentation Technique: combining final

prediction layer with lower layers with finer strides

Not detailed enough

Combining fine layers and coarse layers lets the model make local predictions that respect global

structure.

Page 11: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Results

20% improvement for mean IoU

286 times faster

*Simultaneous detection and segmentation. ECCV 2014

*

Page 12: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Conclusions

• “Fully convolutional” networks take inputs of any size

• Adapt AlexNet, VGG net, GoogLeNetinto fully convolutional networks

• Fine-tune them to the segmentation task

• New architecture: combines semantic info from coarse layer with info from shallow, fine layer

• 20% relative improvement to 62.2% mean IU (PASCAL VOC 2012)

Page 13: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Extras

Page 14: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Fully Convolutional Networks

xi,j - location (i, j) for a particular layeryi,j - location (i, j) for the following layerk - kernel (filter) sizes - stridefks - determines the layer type:

- matrix multiplication for convolution or average pooling- spatial max for max pooling- elementwise nonlinearity for an activation function

- Kernel size and stride for this functional form obeys the transformation rule above- Computes a nonlinear filter (as opposed to a general nonlinear function)

FCN naturally operates on an input of any size!

Page 15: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean

Loss Function

• Loss function is a sum over the spatial dimension of the final layer

• Gradient of loss function is a sum over the gradients of each of its spatial components

• Takes all of the final layer receptive fields as a minibatch

The receptive fields overlap significantly ->More efficient!

5 times faster than AlexNet

Spatial output maps are suitable dense problems like semantic

segmentation

Page 16: Fully Convoluional Networks for Semantic Segmentationvicente/recognition/2016/presentations/fcns.pdf · predictions that respect global structure. Results 20% improvement for mean