
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group, University of Oxford, UK

This work was supported by ERC grant VisRec no. 228180. We acknowledge the support of NVIDIA Corporation with the donation of a Tesla K40 GPU.

1. OVERVIEW

• Given an image classification ConvNet, we aim to answer two questions:
  • What does a class model look like?
  • What makes an image belong to a class?

• To this end, we visualise:
  • the canonical image of a class
  • a class saliency map for a given image and class

• Both visualisations are based on the derivative of the class score w.r.t. the input image (computed using back-prop)



2. CLASS MODEL VISUALISATION

[Diagram: ConvNet ending in a fully connected classifier layer followed by a soft-max layer]

• We compute a (regularised) image $I$ with a high score $S_c$ for class $c$ [Erhan et al., 2009]:

  $\arg\max_I \; S_c(I) - \lambda \|I\|_2^2$

• Optimised using gradient descent, initialised with the zero image

• Gradient is computed using back-prop

• Maximising the soft-max score instead leads to worse visualisations, since the soft-max can be increased by suppressing the scores of other classes rather than raising the target class score

• We visualise a ConvNet trained on ImageNet ILSVRC 2013 (1000 classes); a minimal sketch of the optimisation follows
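A minimal sketch of this optimisation in PyTorch. The stand-in model (torchvision's AlexNet), step count, learning rate, and regularisation weight are illustrative assumptions, not the poster's settings:

```python
# Sketch, not the authors' code: class model visualisation by gradient ascent
# on the unnormalised class score with L2 regularisation [Erhan et al., 2009].
import torch
import torchvision.models as models

model = models.alexnet(weights="DEFAULT").eval()  # stand-in for the ILSVRC 2013 ConvNet
for p in model.parameters():
    p.requires_grad_(False)

def class_model(target_class, steps=200, lr=1.0, lam=1e-4):
    img = torch.zeros(1, 3, 224, 224, requires_grad=True)  # zero-image initialisation
    for _ in range(steps):
        score = model(img)[0, target_class]         # raw class score S_c, not soft-max
        objective = score - lam * img.pow(2).sum()  # S_c(I) - lambda * ||I||_2^2
        objective.backward()                        # gradient computed via back-prop
        with torch.no_grad():
            img += lr * img.grad                    # ascend the regularised score
            img.grad.zero_()
    return img.detach()
```

Starting from the zero image keeps the result driven purely by the gradient of the class score rather than by the content of any seed image.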

3. IMAGE-SPECIFIC CLASS SALIENCY VISUALISATION

• Linear approximation of the class score $S_c$ in the neighbourhood of an image $I_0$:

  $S_c(I) \approx w^T I + b$, where $w = \left.\frac{\partial S_c}{\partial I}\right|_{I_0}$

  – $w$ is computed using back-prop
  – $S_c$ is the score of the $c$-th class

• $w$ has the same size as the image $I_0$
• The magnitude of $w$ defines a saliency map for image $I_0$ and class $c$ (see the sketch below)
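A minimal sketch of the computation under these definitions, assuming a preprocessed (1, 3, H, W) input and a model that returns raw class scores (the function name is illustrative):

```python
# One back-prop pass gives w = dS_c/dI at I0; for a colour image the saliency
# map takes the per-pixel maximum of |w| over the colour channels.
import torch

def saliency_map(model, image, target_class):
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]  # S_c(I0), the raw class score
    score.backward()                       # single back-prop pass: w = dS_c/dI |_{I0}
    w = image.grad[0]                      # (3, H, W) -- same size as the image
    return w.abs().max(dim=0).values       # (H, W) saliency map for class c
```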

Image-Specific Class Saliency Properties:

• Weakly supervised:
  • computed using a classification ConvNet trained on image labels alone
  • no additional annotation (e.g. boxes or masks) is required

• Highlights discriminative object parts

• Instant computation – no sliding window, just a single back-prop pass

• Fires on several object instances

4. WEAKLY-SUPERVISED OBJECT LOCALISATION

• Given an image and a saliency map (a sketch of this pipeline follows the list):

1. Saliency map is thresholded to obtain foreground / background masks

2. GraphCut colour segmentation [Boykov and Jolly, 2001] is initialised with the masks

3. Object localisation: bounding box of the largest foreground connected component

• GraphCut propagates segmentation from the most salient areas of the object

• ILSVRC 2013 top-5 localisation error: 46.4%
  • weak supervision: ground-truth bounding boxes were not used for training
  • saliency maps for the top-5 predicted classes were used to compute five bounding box predictions
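An illustrative sketch of the pipeline, assuming OpenCV: `cv2.grabCut` (GrabCut) stands in here for the Boykov–Jolly GraphCut colour segmentation used on the poster, and the seed thresholds are assumed values:

```python
# Sketch of saliency-driven localisation: threshold the saliency map into
# seed masks, run graph-cut colour segmentation, box the largest component.
import cv2
import numpy as np

def localise(image_bgr, saliency, fg_q=0.95, bg_q=0.30):
    # 1. Threshold the saliency map into foreground / background seed masks
    mask = np.full(saliency.shape, cv2.GC_PR_BGD, np.uint8)
    mask[saliency >= np.quantile(saliency, fg_q)] = cv2.GC_FGD
    mask[saliency <= np.quantile(saliency, bg_q)] = cv2.GC_BGD
    # 2. Graph-cut colour segmentation initialised with the masks
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
    fg = np.isin(mask, (cv2.GC_FGD, cv2.GC_PR_FGD)).astype(np.uint8)
    # 3. Bounding box of the largest foreground connected component
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
    if n < 2:
        return None  # no foreground component found
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])  # skip background label 0
    x, y, w, h, _ = stats[largest]
    return (x, y, x + w, y + h)
```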

5. RELATION TO DECONVOLUTIONAL NETS

Layer-by-layer comparison of the DeconvNet [Zeiler & Fergus, 2013] with back-prop w.r.t. the input ($X_n$ – $n$-th layer activity; $R_n$ – $n$-th layer DeconvNet reconstruction; $f$ – visualised neuron activity):

• Convolution (forward: $X_{n+1} = X_n \star K_n$): equivalent – back-prop and the DeconvNet both convolve the incoming signal with the flipped kernel $\hat{K}_n$
• RELU (forward: $X_{n+1} = \max(X_n, 0)$): slightly different – back-prop thresholds on the layer input, $\frac{\partial f}{\partial X_n} = \frac{\partial f}{\partial X_{n+1}} \mathbb{1}(X_n > 0)$, while the DeconvNet thresholds on the layer output, $R_n = R_{n+1} \mathbb{1}(R_{n+1} > 0)$ (see the toy example below)
• Max-pooling: equivalent – both route the signal through the max-location "switches" recorded in the forward pass
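A toy numeric illustration of the RELU row (assumed tensors; the notation follows the comparison above):

```python
# Back-prop gates the incoming signal by the forward input X_n;
# the DeconvNet gates the signal by its own sign.
import torch

x_n = torch.tensor([-1.0, 2.0, -3.0, 4.0])       # forward RELU input X_n
upstream = torch.tensor([5.0, -6.0, 7.0, -8.0])  # signal arriving from layer n+1

backprop = upstream * (x_n > 0)        # df/dX_n = df/dX_{n+1} * 1(X_n > 0)
deconvnet = upstream * (upstream > 0)  # R_n = R_{n+1} * 1(R_{n+1} > 0)

print(backprop)   # tensor([ 0., -6.,  0., -8.])
print(deconvnet)  # tensor([5., 0., 7., 0.])
```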

[Figure: example class model visualisations and image-specific saliency maps for goose, toucan, ostrich, keyboard, dumbbell, husky, kit fox, dalmatian, bell pepper, and lemon]

6. CONCLUSION

• The derivative of a ConvNet class score w.r.t. the input image is useful for visualising:
  • the canonical image of a class
  • image-specific class saliency
• Image-specific class saliency can be further processed to perform weakly-supervised object segmentation and detection