Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional...

49
Convolutional Neural Networks for Semantic Segmentation Nikola Milosavljević July 18, 2017

Transcript of Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional...

Page 1: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Convolutional Neural Networks for Semantic Segmentation

Nikola

Milosavljević

July 18, 2017

Page 2: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Goal: classify each pixel into one of semantic classes

Does not distinguish objects of the same class

Semantic segmentation

http://jamie.shotton.org

Page 3: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Related problems

CS231N (Stanford)

Classification Classification

+ localization

Semantic

segmentation Object

detection

Instance

segmentation

CAT CAT CAT GRASS

TREE SKY

DOG DOG

CAT

DOG DOG

CAT

Single object Just pixels

(no objects)

Multiple objects

Page 4: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Autonomous driving

Differentiate road, pedestrians, traffic signs, …

Cordts et al., The Cityscapes Dataset for Semantic Urban Scene Understanding

Page 5: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Image/video editing, visual effects

Sky replacement

Tsai et al., Sky is Not the Limit: Semantic-Aware Sky Replacement

Page 6: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Map extraction from satellite image

Distinguish streets from houses, parks etc.

Marmanis et al., Semantic segmentation of aerial images with an ensemble of fully convolutional neural networks

Page 7: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Outline

Semantic segmentation and related problems

Convolutional neural networks for image classification

Fully convolutional networks

Upsampling methods

Refinement methods

Evaluation

Page 8: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Image classification

Goal: classify the main object in a given image

Basis for solving other problems

Krizhevsky, Sutskever, Hinton, ImageNet Classification with Deep Convolutional Neural Networks

Page 9: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

ImageNet

ImageNet 1K challenge 1000 classes

1.28 million training images

50.000 test images

http://image-

net.org/challenges/talks/ILSVRC+MSCOCO_12_17_15_introduction.pdf

Convolutional nets

Russakovsky et al. ImageNet Large Scale Visual

Recognition Challenge, IJCV, 2015

http://www.image-net.org/challenges/LSVRC/

Page 10: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Convolutional neural networks

Feedforward neural networks suitable for image data

Convolutional, max-pooling, and fully connected layers

https://research.googleblog.com/2016/08/improving-inception-and-image.html

Inception v3 (colors denote layer types)

Page 11: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Convolutional neural networks (cont.)

Activations can be viewed as multichannel images

Number of channels increases, spatial size decreases

Page 12: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Convolutional filter

3D tensor of weights which “slides” across input tensor

Followed by a nonlinear activation function

Generates one output channel (“feature map”)

Output Input

Filter

Page 13: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Example: edge detection

Sobel operator is implemented as convolution

+1 0

+2 0

+1 0

-1

-2

-1

By Simpsons contributor, CC BY-SA 3.0,

https://commons.wikimedia.org/w/index.php?curid=8904663

Page 14: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Convolutional layer

Consists of multiple filters acting on the same input, each

producing one output channel

Filter1 Input

Output

Input

Input

Filter2

Filter3

Page 15: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Max-pooling layer

Output is the max value within a sliding window

Invariance to translation

Input

Window

Output

5

7

2

0 0 7 0 0

0 5 -3 6 2

0 0 1 0 0

0 2 0 0 -1

2 0 0 5 0

7 Max pooling

Page 16: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Fully connected layer

Like a convolutional layer with

filter size equal to its input size Output spatial size 1 x 1

Same as hidden layer in

multilayer perceptrons Product between matrix of

weight and vector of inputs

Input

Output Input

Input

Input Output

Page 17: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Stride

Distance between consecutive sliding window positions

Horizontal and vertical stride usually equal

Used for downsampling

Output

Filter

Stride 1 pixel

Page 18: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Outline

Semantic segmentation and related problems

Intro to convolutional neural networks for image classification

Fully convolutional networks

Upsampling methods

Refinement methods

Evaluation

Page 19: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Fully convolutional networks (FCNs)

Long, Shelhamer, Darrell, Fully Convolutional Networks for Semantic Segmentation

Family of architectures for pixelwise predictions

Any architecture for image classification can be converted

and finetuned for a pixelwise task

Contain only “sliding window” layers (convolution, pooling) Do not contain fully connected layers

Page 20: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Reduction to image classification

Apply image classifier to sliding window

Apply result to the central part of each patch

Grass

Elephant

Elephant

Grass

Page 21: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Patches classified independently even though they overlap Not reusing computation

Large number of heavily overlapping patches Patch size much bigger than labeled region per patch

Need context to reliably classify

Problem: inefficiency

Patch size ~ 100s of pixels

Labeled region ~ a few pixels

Page 22: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Idea for reusing computation

Convolution and pooling layers accept input of any size Get proportionally bigger output

Apply first layer to the whole image Contains activations for all invocations of the classifier

Apply second layer to the whole output of the first layer etc.

Output Input

Filter

New input New output

Page 23: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Fully connected layer is just a convolutional layer applied

only once

Convert it to a convolutional layer acting on a bigger input

Converting fully connected layers

Input

Output Filter Filter

New output

New input

Page 24: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Start from a (pretrained) image classification network AlexNet, VGG, Inception, ResNet, …

Convert any fully connected layers to convolutional layers

Converted network can be applied to image of any size Each layer’s output size changes proportionally to the input size

Recipe for vanilla fully convolutional network

Page 25: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Compared to the sliding window approach, we get a

downsampled version of output

Downsampling effect

Page 26: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Input image: 512 x 512

Classifier network: input 256 x 256, output 1 x 1

Comparison of output sizes

Downsampling effect: example

Sliding window approach

1 + (512 – 256) = 257

Fully convolutional approach

≈ 1 ∙ 512 / 256 = 2

Page 27: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Outline

Semantic segmentation and related problems

Intro to convolutional neural networks for image classification

Fully convolutional networks

Upsampling methods

Refinement methods

Evaluation

Page 28: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Two approaches Eliminating existing downsampling

Appending explicit upsampling layers

Usually a hybrid, dictated by computational efficiency

Obtaining high-resolution output

Page 29: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Eliminating strided convolution/pooling

Replacing stride s > 1 by stride 1

All subsequent feature maps are upsampled s times

Effectively, each subsequent kernel is s times smaller

New output Old input

Filter

New input

Old output

Page 30: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Dilated convolution

Insert s - 1 zeros between every two consecutive entries of

the original kernel

Dilation 3

Original kernel 3 x 3

New kernel 7 x 7

w11 w12

w21 w22

w32 w32

w13

w23

w33

0 0

0 w22

0 0

0

0

0

0 w12

0 0

0

0

0 0

0 w22

0

0

0

0

0

0

w23

0

0

0

w13

0

0

0

0

w23

0 0

w21 0

0 0

w11 0

0 0

0 0

w21 0

Page 31: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Dilated convolution (cont.)

Does not increase parameter count

Can keep the same weights Important when starting from a pretrained network

Zero-optimization helps

Can eliminate all downsampling in principle Typically not done in practice due to computational complexity

Page 32: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Bilinear upsampling

Each input multiplies the kernel

Arrange kernels as sliding window

Add overlapping elements

3 x 3

5 x 5

7 x 7

Output

Input

Bilinear kernel

Page 33: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Transposed convolution

Generalization of bilinear upsampling Weights can be learned

Channels can be “mixed”

Input Output

Kernel

Kernel1 Output

Input

Output

Output

Kernel2

Kernel3

Long, Shelhamer, Darrell, Fully Convolutional Networks for Semantic Segmentation

Page 34: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Max-unpooling

Store indices of “winning” pixels in max-pooling

Upsample by restoring to the same position

Noh et al., Learning Deconvolution Network for Semantic Segmentation

Page 35: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Outline

Semantic segmentation and related problems

Intro to convolutional neural networks for image classification

Fully convolutional networks

Upsampling methods

Refinement methods

Evaluation

Page 36: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Fusing high-res and low-res information

Long, Shelhamer, Darrell, Fully Convolutional Networks for Semantic Segmentation

High resolution Low resolution

Poor classification Good classification

Appearance information Semantic information

Shallow layers Deep layers

+

Page 37: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Fusion by simple addition

Long, Shelhamer, Darrell, Fully Convolutional Networks for Semantic Segmentation

Upsample

+

High-res

features

Low-res

features

ConvNet ConvNet ConvNet 16x + 2x

ConvNet ConvNet ConvNet 2x + 2x + 8x

ConvNet ConvNet ConvNet 32x

“Backbone” network

Page 38: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Effects of fusion

Long, Shelhamer, Darrell, Fully Convolutional Networks for Semantic Segmentation

Page 39: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Refinement modules

More sophisticated way of fusing information from

“shallower” layers

ConvNet

ConvNet Upsample

+ ConvNet

High-res

features

Low-res

features

Fused

high-res

features

Lin et al., RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Page 40: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Refinement modules (cont.)

Lin et al., RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Page 41: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Conditional random field (CRF) postprocessing

Encourage consistent labels for “similar” pixels

Energy function Nodes xi: original network outputs

Nodes yj: postprocessed outputs

Edges: energy terms (unary and pairwise)

CRF inference Given xi minimize energy over yi

Practical algorithms exist only in special cases

x1 x2

x3 x4

y1 y2

y3 y4

Chen et al., DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs Krähenbühl and Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials

Page 42: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

CRF postprocessing (cont.)

Unary potential is high for labels which the network

assigns low probability

Pairwise potential is high if corresponding pixels in the

original input image have similar color or spatial position

Inference can be approximated by a convolutional network Enables end-to-end training

Zheng et al., Conditional Random Fields as Recurrent Neural Networks

Page 43: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Outline

Semantic segmentation and related problems

Intro to convolutional neural networks for image classification

Fully convolutional networks

Upsampling methods

Refinement methods

Evaluation

Page 44: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Datasets

Classic, relatively small MSRC

SiftFlow

Stanford

LabelMe

Newer & bigger Pascal VOC

Microsoft COCO

Including depth channel NYU

Sun RGBD

Autonomous driving KITTI

CamVid

Cityscapes

Page 45: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

PASCAL VOC challenge

Image classification, object detection,

semantic segmentation…

21 classes

A few thousand images for

training/test

Everingham et al., The PASCAL Visual Object Classes (VOC) Challenge http://host.robots.ox.ac.uk/pascal/VOC/

Page 46: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Intersection-over-union (IoU) Intersection and union computed per

class, and over whole set

Mean intersection-over-union (mIOU) Average over classes

Standard accuracy, precision, and recall also used

Semantic segmentation metrics

Intersection

Union

Page 47: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Accuracy

Mean per-class accuracy

Mean intersection-over-union (mIOU)

Semantic segmentation metrics

Page 48: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Pascal VOC leaderboard

http://host.robots.ox.ac.uk:8080/leaderboard/

Page 49: Convolutional Neural Networksimft.ftn.uns.ac.rs/ssip2017/wp-content/uploads/... · Convolutional Neural Networks for Semantic ... Shelhamer, Darrell, Fully Convolutional Networks

Summary

Fully convolutional neural networks can be used for pixelwise

classification on images

Can start from pretrained network for image classification

Need to adapt architecture to upsample and refine output