Convolutional Neural Networks (CNN) - Robot Vision...

53
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convolutional Neural Networks (CNN) Algorithm and Some Applications in Computer Vision Luo Hengliang Institute of Automation June 10, 2014 Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 1 / 53

Transcript of Convolutional Neural Networks (CNN) - Robot Vision...

Page 1: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

.

......

Convolutional Neural Networks (CNN)Algorithm and Some Applications in Computer Vision

Luo Hengliang

Institute of Automation

June 10, 2014

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 1 / 53

Page 2: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Table of Contents

...1 Algorithm of CNN

...2 Application in Traffic Sign Recoginition

...3 How to use it in detection?

...4 Descriptor matching

...5 Conclusion

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 2 / 53

Page 3: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Introduction to CNN1

1LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 3 / 53

Page 4: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Traditional pattern recognition method for image

.hand-crafted feature + general classifier..

......

The first module, called the featureextractor, transforms the input patternsso that they can be represented bylow-dimensional vectors.

The classifier, on the other hand, isoften general purpose and trainable.

.main problem..

......

The recognition accuracy is largelydetermined by the ability of the designerto come up with an appropriate set offeatures.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 4 / 53

Page 5: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

How to learn the feature extractor itself?

.If we input the raw pixels to the multilayer networks and train it, then:..

......

Typical images are large, and the networks contain several tens of thousandsof weights. Such a large number of parameters increases the capacity of thesystem and therefore requires a larger training set.

The unstructured nets for image applications have no built-in invariance withrespect to translations or local distortions of the inputs.

The topology of the input is entirely ignored. The input variables can bepresented in any (fixed) order without affecting the outcome of the training.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 5 / 53

Page 6: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

How to learn the feature extractor itself?(cont’d)

Architecture of Convolutional Neural Networks

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 6 / 53

Page 7: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Rebirth of CNN2

2Krizhevsky A, Sutskever I, Hinton G E. ImageNet Classification with Deep Convolutional Neural Networks[C]//NIPS. 2012, 1(2): 4.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 7 / 53

Page 8: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Dataset

.ImageNet..

......

15 million labeled high-resolution images belonging to roughly 22,000categories.

.ILSVRC(ImageNet Large-Scale Visual Recognition Challenge)..

......

a subset of ImageNet with roughly 1000 images in each of 1000 categories.

there are roughly 1.2 million training images, 50,000 validation images, and150,000 testing images.

it is customary to report two error rates:top-1 and top-5, where the top-5error rate is the fraction of test images for which the correct label is notamong the five labels considered most probable by the model.

.Data pre-process..

......

rescale the image such that the shorter side was of length 256, and thencropped out the central 256x256 patch from the resulting image.

subtract the mean activity over the training set from each pixel.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 8 / 53

Page 9: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

The Architecture

The Architecture

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 9 / 53

Page 10: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

The Architecture(cont’d)

ReLU(Rectified Linear Units) Nonlinearity:deep convolutional neural networks withReLUs train several times faster than theirequivalents with tanh units.

Training on Multiple GPUs.

Local Response Normalization:localnormalization after ReLU Nonlinearity aidsgeneralization.

Overlapping Pooling: during training thatmodels with overlapping pooling find itslightly more difficult to overfit.

−5 −4 −3 −2 −1 0 1 2 3 4 5−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1y=tanh(x)

x

y

(c) tanh

−5 −4 −3 −2 −1 0 1 2 3 4 5−1

0

1

2

3

4

5y=ReLU(x)

x

y(d) ReLU

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 10 / 53

Page 11: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Reducing Overfitting

.Data Augmentation..

......

generating image translations and horizontal reflections. We do this byextracting random 224x224 patches (and their horizontal reflections) fromthe 256x256 images and training our network on these extracted patches .This increases the size of our training set by a factor of 2048,

The second form of data augmentation consists of altering the intensities ofthe RGB channels in training images. Specifically, we perform PCA on theset of RGB pixel values throughout the ImageNet training set.

.Dropout..

......

We use dropout in the first two fully-connected layer. Without dropout, ournetwork exhibits substantial overfitting.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 11 / 53

Page 12: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Result on ILSVRC-2010

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 12 / 53

Page 13: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Result on ILSVRC-2010(cont’d)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 13 / 53

Page 14: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

CNN Features off-the-shelf3

3Razavian A S, Azizpour H, Sullivan J, et al. CNN Features off-the-shelf: an Astounding Baseline for Recognition[J]. arXiv preprint arXiv:1403.6382,

2014.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 14 / 53

Page 15: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

CNN result on many datasets

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 15 / 53

Page 16: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Visual Classification Method

For all the experiments we resize the whole image (or cropped sub-window) to221x221 and input the image to OverFeat. This gives a vector of 4096dimensions. We have two settings:

The feature vector is further L2 normalized to unit length for all theexperiments. We use the 4096 dimensional feature vector in combinationwith a Support Vector Machine (SVM) to solve different classification tasks(CNN-SVM).

We further augment the training set by adding cropped and rotated samplesand doing component-wise power transform and report separate results(CNNaug+SVM)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 16 / 53

Page 17: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Pascal VOC 2007 Image Classification Results

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 17 / 53

Page 18: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Table of Contents

...1 Algorithm of CNN

...2 Application in Traffic Sign Recoginition

...3 How to use it in detection?

...4 Descriptor matching

...5 Conclusion

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 18 / 53

Page 19: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

GTSRB4 Dataset and reletated works5

.GTSRB..

......

43 classes, 39,209 training images, 12,630 test images, images size vary from15x15 to 250x250.

.Result..

......Result overview for the final stage of the GTSRB.

4http://benchmark.ini.rub.de/?section=gtsrb&subsection=news

5Stallkamp J, Schlipsing M, Salmen J, et al. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition[J]. Neural

networks, 2012, 32: 323-332.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 19 / 53

Page 20: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Programming tool:Torch76

.About Torch7..

......

Torch7 is a scientific computing framework with wide support for machine learningalgorithms. It is easy to use and provides a very efficient implementation, thanksto an easy and fast scripting language, LuaJIT, and an underlying Cimplementationa.

atorch.ch

.Why Choose Torch7?..

......

It was recommended by Yann Lecun

It is very fast

See more from the websitea below.

ahttp://www.kdnuggets.com/2014/02/exclusive-yann-lecun-deep-learning-facebook-ai-lab.html

6Collobert R, Farabet C, Kavukcuoglu K. Torch7: A matlab-like environment for machine learning[C]//BigLearn, NIPS Workshop. 2011

(EPFL-CONF-192376).

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 20 / 53

Page 21: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Image pre-processing

Crop the image Color to gray CLAHEResize to 32x32

Image preprocessing method from the paper7

(g) Origin (h) Crop (i) Gray (j) CLAHE

7Ciresan D, Meier U, Masci J, et al. A committee of neural networks for traffic sign classification[C]//Neural Networks (IJCNN), The 2011

International Joint Conference on. IEEE, 2011: 1918-1921.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 21 / 53

Page 22: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Contrast Limited Adaptive Histogram Equalization8

.Adaptive histogram equalization..

......

.Contrast limited adaptive histogram equalization..

......

8http://en.wikipedia.org/wiki/Adaptive_histogram_equalization

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 22 / 53

Page 23: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

CNN Structure

input32x32

C1:feature maps16@28x28 S2:maps

16@14x14

C3:maps32@10x10

S4:maps32@5x5

F5:layer800 F6:layer

256 output43

convolutions convolutions

subsampling subsampling

reshape

full-connected MLP

CNN Structure

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 23 / 53

Page 24: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Define Model in Torch7

model : add ( nn . Spat ia lConvo lut ionMM (1 , 16 , 5 , 5 ) )model : add ( nn . Tanh ( ) )model : add ( nn . Spa t i a l LPPoo l i n g (16 , 2 , 2 , 2 , 2 , 2 ) )

model : add ( nn . Spat ia lConvo lut ionMM (16 , 32 , 5 , 5 ) )model : add ( nn . Tanh ( ) )model : add ( nn . Spa t i a l LPPoo l i n g (32 , 2 , 2 , 2 , 2 , 2 ) )

model : add ( nn . Reshape (32∗5∗5))model : add ( nn . L i n e a r (32∗5∗5 , 256))model : add ( nn . Tanh ( ) )model : add ( nn . L i n e a r (256 , 43) )

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 24 / 53

Page 25: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Train Result

0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 57570

75

80

85

90

95

100

The Train Dataset Result

Trainning set Iterations

Tra

in A

ccur

acy

(%)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 25 / 53

Page 26: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Test Result

0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 57590

91

92

93

94

95

96

97

98The Test Dataset Result

Trainning set Iterations

Tes

t Acc

urac

y (%

)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 26 / 53

Page 27: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Result(cont’d)

0 25 50 75 100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500 525 550 57597

97.1

97.2

97.3

97.4

97.5

97.6

97.7

97.8The Test Dataset Result

Trainning set Iterations(from 25 to the end)

Tes

t Acc

urac

y (%

)

The best test accuracy is 97.75%.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 27 / 53

Page 28: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Some misclassified examples in test set

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 28 / 53

Page 29: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Retrain:data augmentation

.Data augmentation..

......

scaling:[0.8, 1.2]

translation:random

rotation:[−15◦, 15◦]

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 29 / 53

Page 30: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Recognize the detection results

Error CountClass Total Without Aug. With Aug.

Danger 62 12 0Mandatory 56 23 1Prohibitory 168 48 1

Some “hard” detection results

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 30 / 53

Page 31: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Integrated with the Application

.Forward-cnn..

......

Torch7 doesn’t support Windows now, but we need to create a gui demoapplication in Windows.

So I write the forward cnn in C++ athttps://github.com/beenfrog/cnn-forward

input32x32

C1:feature maps16@28x28 S2:maps

16@14x14

C3:maps32@10x10

S4:maps32@5x5

F5:layer800 F6:layer

256 output43

convolutions convolutions

subsampling subsampling

reshape

full-connected MLP

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 31 / 53

Page 32: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Table of Contents

...1 Algorithm of CNN

...2 Application in Traffic Sign Recoginition

...3 How to use it in detection?

...4 Descriptor matching

...5 Conclusion

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 32 / 53

Page 33: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

R-CNN: Regions with CNN features9

9Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[J]. arXiv preprint

arXiv:1311.2524, 2013.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 33 / 53

Page 34: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Object detection system overview

.overview..

......

takes an input image

extracts around 2000 bottom-up region proposals

computes features for each proposal using a large convolutional neuralnetwork (CNN)

classifies each region using class-specific linear SVMs

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 34 / 53

Page 35: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Region proposals:Selective Search10

.Selective Search..

......

10van de Sande K E A, Uijlings J R R, Gevers T, et al. Segmentation as selective search for object recognition[C]//Computer Vision (ICCV), 2011

IEEE International Conference on. IEEE, 2011: 1879-1886.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 35 / 53

Page 36: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Feature extraction

.Feature extraction..

......

We extract a 4096-dimensional feature vector from each region proposalusing our own implementation of the CNN of [Hinton2012].

In order to compute features for a region proposal, we must first convert theimage data in that region into a fixed 224x224 pixel size.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 36 / 53

Page 37: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Training11

.CNN pre-training..

......

.CNN fine-tuning..

......

.Object category classifiers..

......11

http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 37 / 53

Page 38: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

R-CNN: Result

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 38 / 53

Page 39: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Any other new approaches to detection with CNN?12

Question from me:How to use the CNN effectively in object detection? Thetraditional sliding window method may be too slow. There are some worksfocused on generating region proposals first, such ashttp://arxiv.org/abs/1311.2524, any other new approaches? Thanks!

Answer by ylecun:ConvNets are not too slow for detection. Look at ourpaper on OverFeat [Sermanet et al. ICLR 2014], on pedestrian detection[Sermanet et al. CVPR 2013], and on face detection [Osadchy et al. JMLR2007] and [Vaillant et al. 1994]. The key insight is that you can apply aConvNet.....convolutionally over a large image, without having to recomputethe entire network at every location (because much of the computation wouldbe redundant). We have known this since the early 90’s.

Answer by osdf:A recent paper that takes the idea of avoidingrecomputations to CNNs with max-pooling operations: Fast image scanningwith deep max-pooling convolutional neural networks.

12http://www.reddit.com/r/MachineLearning/comments/25lnbt/ama_yann_lecun

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 39 / 53

Page 40: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Fast Image Scanning 13

13Giusti A, Cirean D C, Masci J, et al. Fast image scanning with deep max-pooling convolutional neural networks[J]. arXiv preprint arXiv:1302.1700,

2013.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 40 / 53

Page 41: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Speed up the forward computation of CNN14

Both schemes follow the same intuition: that CNN filter banks can be approximatedusing a low rank basis of filters that are separable in the spatial domain

14Jaderberg M, Vedaldi A, Zisserman A. Speeding up Convolutional Neural Networks with Low Rank Expansions[J]. arXiv preprint arXiv:1405.3866,

2014.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 41 / 53

Page 42: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Table of Contents

...1 Algorithm of CNN

...2 Application in Traffic Sign Recoginition

...3 How to use it in detection?

...4 Descriptor matching

...5 Conclusion

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 42 / 53

Page 43: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Descriptor Matching15

15Fischer P, Dosovitskiy A, Brox T. Descriptor Matching with Convolutional Neural Networks: a Comparison to SIFT[J]. arXiv preprint

arXiv:1405.5769, 2014.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 43 / 53

Page 44: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Feature Learning with Convolutional Neural Nets

.Supervised Training..

......We used a pre-trained model form [Hinton2012].

.Unsupervised Training..

......

We used random images from Flickr because we expect those to be betterrepresentatives of the distribution of natural images.

Next N = 16000 “seed” patches of size 64x64 pixels were extracted randomlyfrom different images at various locations and scales. Each of these “seed”patches was declared to represent a surrogate class of its own.

These classes were augmented by applying K = 150 random transformationsto each of the “seed” patches. Each transformation was a composition ofrandom elementary transformations. These included translation, scalevariation, rotation, color variation, contrast variation, and also blur, which isoften relevant for matching problems.

As a result we obtained a surrogate labeled dataset with N classes containingK samples each. We used these data to train a convolutional neural network.

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 44 / 53

Page 45: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

The New Test DataSet

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 45 / 53

Page 46: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Result

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 46 / 53

Page 47: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Feature computation time

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 47 / 53

Page 48: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Table of Contents

...1 Algorithm of CNN

...2 Application in Traffic Sign Recoginition

...3 How to use it in detection?

...4 Descriptor matching

...5 Conclusion

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 48 / 53

Page 49: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Deep ConvNets: astounding baseline for vision16

16http://cs.nyu.edu/~sermanet/papers/Deep_ConvNets_for_Vision-Results.pdf

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 49 / 53

Page 50: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Deep ConvNets: astounding baseline for vision (cont’d)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 50 / 53

Page 51: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Deep ConvNets: astounding baseline for vision (cont’d)

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 51 / 53

Page 52: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Future works

Use the feature learned from CNN in other vision tasks.

Unsupervised Learning.

. . .

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 52 / 53

Page 53: Convolutional Neural Networks (CNN) - Robot Vision Groupvision.ia.ac.cn/zh/senimar/reports/ConvNet-slide-luohl.pdf · 2014-06-10 · Luo Hengliang (Institute of Automation) Convolutional

..........

.....

.....................................................................

.....

......

.....

.....

.

Thank you!

Luo Hengliang (Institute of Automation) Convolutional Neural Networks (CNN) June 10, 2014 53 / 53