AI&BigData Lab. Artem Chernodub, "Image Recognition...

Lazy Deep Learning for Image Recognition in the ZZ Photo App

Artem Chernodub, George Paschenko

IMMSP NASU

AI&Big Data Lab, 23 April, 2015, Odessa.

ZZ Photo

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$$

Biologically-inspired models

Neuroscience

Machine Learning

2 / 55

Biological Neural Networks

3 / 55

Artificial Neural Networks

Traditional (Shallow) Neural Networks

Deep Neural Networks

Deep Feedforward Neural Networks

Recurrent Neural Networks

4 / 55

Conventional Methods vs Deep Learning

5 / 55

Deep Learning = Learning of Representations (Features)

The traditional model of pattern recognition (since the late 50's):

fixed/engineered features + trainable classifier

Hand-crafted Feature Extractor → Trainable Classifier

End-to-end learning / Feature learning / Deep learning:

trainable features + trainable classifier

Trainable Feature Extractor → Trainable Classifier

6 / 55

ImageNet

Le et al. "Building high-level features using large-scale unsupervised learning" // ICML 2012.

Model                    # of parameters   Accuracy, %
Deep Net                 10M               15.8
Best state-of-the-art    N/A               9.3

Training data: 16M images, 20K categories

7 / 55

Deep Face (Facebook)

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.

Model            # of parameters   Accuracy, %
Deep Face Net    128M              97.35
Human level      N/A               97.5

Training data: 4M facial images

8 / 55

TIMIT Phoneme Recognition

Graves, A., Mohamed, A.-R., and Hinton, G. E. (2013). Speech recognition with deep recurrent neural networks // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 6645-6649. IEEE.

Mohamed, A. and Hinton, G. E. (2010). Phone recognition using restricted Boltzmann machines // IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4354-4357.

Model                        # of parameters   Phoneme error rate, %
Hidden Markov Model, HMM     N/A               27.3
Deep Belief Network, DBN     ~4M               26.7
Deep RNN                     4.3M              17.7

Training data: 462 speakers train / 24 speakers test, 3.16 / 0.14 hrs.

9 / 55

Google Large Vocabulary Speech Recognition

H. Sak, A. Senior, F. Beaufays. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling // INTERSPEECH’2014.

K. Vesely, A. Ghoshal, L. Burget, D. Povey. Sequence-discriminative training of deep neural networks // INTERSPEECH’2014.

Model                        # of parameters   Cross-entropy
ReLU DNN                     85M               11.3
Deep Projection LSTM RNN     13M               10.7

Training data: 3M utterances (1900 hrs).

10 / 55

Classic Feedforward Neural Networks (before 2006).

• Single hidden layer (the Kolmogorov-Cybenko Universal Approximation Theorem as the main hope).

• Vanishing gradients effect prevents using more layers.

• Less than 10K free parameters.

• Feature preprocessing stage is often critical.

11 / 55

Training the traditional (shallow) Neural Network: derivative + optimization

12 / 55

1) forward propagation pass

$$z_j = f\Big(\sum_i w^{(1)}_{ji} x_i\Big),$$

$$\tilde{y}(k+1) = g\Big(\sum_j w^{(2)}_j z_j\Big),$$

where $z_j$ is the postsynaptic value for the j-th hidden neuron, $w^{(1)}$ are the hidden layer’s weights, $f()$ are the hidden layer’s activation functions, $w^{(2)}$ are the output layer’s weights, and $g()$ are the output layer’s activation functions.
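A minimal sketch of this forward pass in plain Python. It assumes a scalar output, tanh as f() and a linear g(); biases are omitted because the formulas above do not show them. The weight values are made up for illustration.

```python
import math

def forward(x, w1, w2, f=math.tanh, g=lambda a: a):
    """Forward pass of a single-hidden-layer network with a scalar output."""
    # z_j = f(sum_i w1[j][i] * x[i]) for every hidden neuron j
    z = [f(sum(w_ji * x_i for w_ji, x_i in zip(row, x))) for row in w1]
    # y = g(sum_j w2[j] * z[j])
    y = g(sum(w_j * z_j for w_j, z_j in zip(w2, z)))
    return z, y

x = [1.0, -2.0]                    # input vector
w1 = [[0.5, 0.25], [-1.0, 0.0]]    # hidden layer: 2 neurons x 2 inputs (illustrative values)
w2 = [2.0, -1.0]                   # output layer weights (illustrative values)
z, y = forward(x, w1, w2)
print(z, y)
```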

13 / 55

2) backpropagation pass

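Local gradients calculation:

$$\delta^{OUT} = t(k+1) - \tilde{y}(k+1),$$

$$\delta^{HID}_j = f'(z_j)\, w^{(2)}_j\, \delta^{OUT}.$$

Derivatives calculation:

$$\frac{\partial E(k)}{\partial w^{(2)}_j} = \delta^{OUT} z_j,$$

$$\frac{\partial E(k)}{\partial w^{(1)}_{ji}} = \delta^{HID}_j x_i.$$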
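The backpropagation pass can be checked numerically. The sketch below is one possible instantiation, not the authors' code: it assumes the squared error E = 0.5 (t - y)^2, tanh hidden units and a linear output. Under these assumptions the products of local gradients and inputs are the negative of the true gradient (the descent direction), so the code makes the minus sign explicit before comparing with a finite difference.

```python
import math

def forward(x, w1, w2):
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    y = sum(w * zj for w, zj in zip(w2, z))   # linear output (assumption)
    return z, y

def loss(x, t, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (t - y) ** 2                 # squared error (assumption)

def backprop(x, t, w1, w2):
    z, y = forward(x, w1, w2)
    delta_out = t - y                         # output local gradient
    # hidden local gradients; for tanh, f' expressed via the output is 1 - z^2
    delta_hid = [(1.0 - zj ** 2) * w2j * delta_out for zj, w2j in zip(z, w2)]
    grad_w2 = [-delta_out * zj for zj in z]                   # dE/dw2[j]
    grad_w1 = [[-dj * xi for xi in x] for dj in delta_hid]    # dE/dw1[j][i]
    return grad_w1, grad_w2

x, t = [0.5, -1.0], 0.3
w1 = [[0.1, -0.2], [0.4, 0.3]]
w2 = [0.7, -0.5]
grad_w1, grad_w2 = backprop(x, t, w1, w2)

# Central finite difference for one hidden-layer weight, w1[0][1]
eps = 1e-6
plus = [row[:] for row in w1]
minus = [row[:] for row in w1]
plus[0][1] += eps
minus[0][1] -= eps
numeric = (loss(x, t, plus, w2) - loss(x, t, minus, w2)) / (2 * eps)
print(grad_w1[0][1], numeric)
```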

14 / 55

Bad effect of vanishing (exploding) gradients: a problem

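$$\frac{\partial E(k)}{\partial w^{(m)}_{ji}} = \delta^{(m)}_j z^{(m-1)}_i,$$

$$\delta^{(m)}_j = f'^{(m)}_j \sum_i w^{(m+1)}_{ij} \delta^{(m+1)}_i,$$

$$\Rightarrow\ \frac{\partial E(k)}{\partial w^{(m)}_{ji}} \to 0 \quad \text{for } m \to 1.$$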
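A toy illustration of this recursion, assuming a chain of ten layers with one sigmoid neuron each and all weights set to 1 (both choices are made up for the example). Since the sigmoid derivative never exceeds 0.25, the local gradient shrinks by at least a factor of 4 per layer on the way down.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backward_deltas(weights, x, delta_top=1.0):
    """Local gradients of a chain of one-neuron sigmoid layers (index 0 = first layer)."""
    # Forward pass: store each layer's output
    outputs = []
    z = x
    for w in weights:
        z = sigmoid(w * z)
        outputs.append(z)
    # Backward pass: delta(m) = f'(m) * w(m+1) * delta(m+1)
    deltas = [0.0] * len(weights)
    deltas[-1] = outputs[-1] * (1.0 - outputs[-1]) * delta_top
    for m in range(len(weights) - 2, -1, -1):
        f_prime = outputs[m] * (1.0 - outputs[m])
        deltas[m] = f_prime * weights[m + 1] * deltas[m + 1]
    return deltas

deltas = backward_deltas([1.0] * 10, x=0.5)
for m, d in enumerate(deltas, start=1):
    print(f"layer {m:2d}: delta = {d:.3e}")
```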

15 / 55

Bad effect of vanishing (exploding) gradients: two hypotheses

1) increased frequency and severity of bad local minima;

2) pathological curvature, like the type seen in the well-known Rosenbrock function:

$$f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2$$
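To get a feel for the second hypothesis, here is a small sketch of plain gradient descent on the Rosenbrock function. The starting point, learning rate and step count are arbitrary choices for illustration; the global minimum is at (1, 1), where f = 0.

```python
def rosenbrock(x, y):
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def rosenbrock_grad(x, y):
    dx = -2 * (1 - x) - 400 * x * (y - x ** 2)
    dy = 200 * (y - x ** 2)
    return dx, dy

def gradient_descent(x, y, lr=5e-4, steps=2000):
    """Fixed-step gradient descent; returns the final point and the loss history."""
    history = [rosenbrock(x, y)]
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        history.append(rosenbrock(x, y))
    return (x, y), history

point, history = gradient_descent(-1.0, 1.0)
print(f"start f = {history[0]:.4f}, after {len(history) - 1} steps f = {history[-1]:.4f} at {point}")
```

The step size has to stay small because of the steep valley walls, while progress along the nearly flat valley floor is slow; printing the history shows how many steps are spent creeping toward (1, 1).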

16 / 55

Deep Feedforward Neural Networks

• 2-stage training process: i) unsupervised pre-training; ii) fine-tuning (the vanishing gradients problem is beaten!).

• Number of hidden layers > 1 (usually 6-9).

• 100K – 100M free parameters.

• No (or less) feature preprocessing stage.

17 / 55

Sparse Autoencoders

18 / 55

Dimensionality reduction

• Use a stacked RBM as a deep autoencoder

1. Train the RBM with images as input & output

2. Limit one layer to a few dimensions

Information has to pass through the middle layer

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504-507.

19 / 55

[Figure: original images vs. Deep RBM and PCA reconstructions]

Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 → 30)



How to use unsupervised pre-training stage / 1

21 / 55


22 / 55


23 / 55


24 / 55

Unlabeled data

Unlabeled data is readily available

Example: Images from the web

1. Download 10’000’000 images

2. Train a 9-layer DNN

3. Concepts are formed by DNN




[Figure: PCA vs. Deep RBM]

804’414 Reuters news stories, reduction to 2 dimensions



Hierarchy of trained representations

Low-level feature → Middle-level feature → Top-level feature

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

27 / 55

Hessian-Free optimization: Deep Learning with no pre-training stage

J. Martens. Deep Learning via Hessian-free Optimization // Proceedings of the 27th

International Conference on Machine Learning (ICML), 2010.

28 / 55

FLOPS comparison

https://ru.wikipedia.org/wiki/FLOPS

Type    Name                                        GFLOPS                                    Cost
Mobile  Raspberry Pi 1st Gen, 700 MHz               0.04                                      $35
Mobile  Apple A8                                    1.4                                       $700 (in iPhone 6)
CPU     Intel Core i7-4930K (Ivy Bridge), 3.7 GHz   140                                       $700
CPU     Intel Core i7-5960X (Haswell), 3.0 GHz      350                                       $1300
GPU     NVidia GTX 980                              4612 (single precision), 144 (double)     $600 + cost of PC (~$1000)
GPU     NVidia Tesla K80                            8740 (single precision), 2910 (double)    $4500 + cost of PC (~$1500)
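A quick way to read this table is performance per dollar. The sketch below copies the figures from the table, takes the single-precision value for the GPUs, and adds the quoted PC cost to the GPU price (reading "~1500" as US dollars); these choices are assumptions made for the comparison.

```python
# (name, GFLOPS, total cost in USD), copied from the table above.
# GPU rows use the single-precision figure and include the quoted PC cost.
devices = [
    ("Raspberry Pi 1st Gen", 0.04, 35),
    ("Apple A8 (iPhone 6)", 1.4, 700),
    ("Intel Core i7-4930K", 140, 700),
    ("Intel Core i7-5960X", 350, 1300),
    ("NVidia GTX 980", 4612, 600 + 1000),
    ("NVidia Tesla K80", 8740, 4500 + 1500),
]

def gflops_per_dollar(gflops, cost):
    return gflops / cost

# Sort from best to worst value for money
ranking = sorted(devices, key=lambda d: gflops_per_dollar(d[1], d[2]), reverse=True)
for name, gflops, cost in ranking:
    print(f"{name:24s} {gflops_per_dollar(gflops, cost):8.4f} GFLOPS per $")
```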

29 / 55



Deep Networks Training time using GPU

• Pretraining: from 2-3 weeks to 2-3 months.

• Fine-tuning (final supervised training): from 1 day to 1 week.

30 / 55

Tools for training Deep Neural Networks

D. Kruchinin, E. Dolotov, K. Kornyakov, V. Kustikova, P. Druzhkov. The Comparison of Deep Learning Libraries on the Problem of Handwritten Digit Classification // Analysis of Images, Social Networks and Texts (AIST), April 9-11, 2015, Yekaterinburg.

31 / 55

Lazy Deep Learning: motivation

A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an

Astounding Baseline for Recognition //2014 IEEE Conference on Computer Vision and

Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512

– 519.

32 / 55

Lazy Deep Learning: benchmark results

A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson. CNN Features off-the-shelf: an

Astounding Baseline for Recognition //2014 IEEE Conference on Computer Vision and

Pattern Recognition Workshops (CVPRW), 23-28 June 2014, Columbus, USA, p. 512

– 519.

33 / 55

Convolutional Neural Networks: Return of the Jedi

Andrej Karpathy and Fei-Fei Li. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks

Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook

34 / 55





AlexNet, 2012 — MeGa HiT

A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep

Convolutional Neural Networks // Advances in Neural Information Processing

Systems 25 (NIPS 2012).

35 / 55

AlexNet, results on ILSVRC-2012

A. Krizhevsky, I. Sutskever, G.E. Hinton. ImageNet Classification with Deep

Convolutional Neural Networks // Advances in Neural Information Processing

Systems 25 (NIPS 2012).

36 / 55

Convolution Layer




Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook

37 / 55










Pooling layer




Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook

38 / 55










Implementation tricks: im2col

K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural Networks

for Document Processing // International Workshop on Frontiers in Handwriting

Recognition, 2006.

39 / 55

Implementation tricks: im2col for convolution

K. Chellapilla, S. Puri, P. Simard. High Performance Convolutional Neural Networks

for Document Processing // International Workshop on Frontiers in Handwriting

Recognition, 2006.
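A minimal sketch of the im2col idea for a single-channel image, stride 1 and no padding (all simplifying assumptions; real implementations handle channels, strides, padding and batches). Every kernel-sized patch is unrolled into a row, so the convolution becomes a single matrix-vector product that an optimized BLAS routine could execute. As in most CNN libraries, the kernel is not flipped, i.e. this is cross-correlation.

```python
def im2col(image, kh, kw):
    """Unroll every kh x kw patch (stride 1, no padding) into one row."""
    h, w = len(image), len(image[0])
    rows = []
    for top in range(h - kh + 1):
        for left in range(w - kw + 1):
            rows.append([image[top + i][left + j]
                         for i in range(kh) for j in range(kw)])
    return rows

def conv2d_im2col(image, kernel):
    """2D convolution (cross-correlation) as patches-matrix times flattened kernel."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    patches = im2col(image, kh, kw)
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in patches]
    out_w = len(image[0]) - kw + 1
    # Fold the flat result back into a 2D feature map
    return [flat_out[r:r + out_w] for r in range(0, len(flat_out), out_w)]

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d_im2col(image, kernel)
print(result)  # [[-4, -4], [-4, -4]]
```

The price of the trick is memory: overlapping patches are stored repeatedly, so the unrolled matrix is larger than the input image.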

40 / 55

Activation functions




Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook

ReLU activation function:

$$f(x) = \max(0, x), \qquad f'(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
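In code the ReLU and its derivative are one-liners. The sketch follows the convention of the formula above, with the derivative at x = 0 set to 1; other implementations may pick 0 there.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Derivative is 1 for x >= 0 and 0 otherwise (convention of the formula above)
    return 1.0 if x >= 0 else 0.0

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), relu_grad(x))
```

Unlike a sigmoid, the derivative of an active unit is exactly 1, so the local gradient is not scaled down when it passes through the unit.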

41 / 55










Development of AlexNet on OpenCV

VGG MatConvNet: CNNs for MATLAB http://www.vlfeat.org/matconvnet/

mexopencv: MATLAB-OpenCV interface http://kyamagu.github.io/mexopencv/matlab

[Diagram: MatConvNet (MATLAB + CUDA); YAML; OpenCV app (C++)]

42 / 55


ZZ Photo – photo organizer

Free beta version is available on http://zzphoto.me

43 / 55


MIT-8 toy problem: formulation

• 8 classes

• 2688 images in total

• TRAIN: 2000 images, 250 per class

• TEST: 688 images, ~86 per class

S. Banerji, A. Verma, C. Liu. Novel Color LBP Descriptors for Scene and Image Texture Classification // Cross Disciplinary Biometric Systems, 2012, 15th International Conference on Image Processing, Computer Vision, and Pattern Recognition, Las Vegas, Nevada, pp. 205-225.

44 / 55

MIT-8 toy problem: results

#   Method                                                Acc. TRAIN   Acc. TEST
1   LBP + SVM with RBF kernel                             27.2%        19.0%
2   LPQ + SVM with RBF kernel                             38.4%        30.5%
3   LBP + SVM with χ2 kernel                              94.2%        74.0%
4   LPQ + SVM with χ2 kernel                              99.1%        82.2%
5   Deep CNN (AlexNet) + SVM with RBF kernel (LAZY DL)    95.1%        91.8%
6   Deep CNN (AlexNet) + SVM with χ2 kernel (LAZY DL)     100.0%       93.2%
7   Deep CNN (AlexNet) + MLP (LAZY DL)                    100.0%       92.3%

Original results, to be published.

45 / 55
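For reference, a sketch of a χ2 kernel between two non-negative feature vectors (histograms such as LBP, or CNN activations after ReLU). The slides do not say which variant was used; the exponential form below and the gamma parameter are assumptions.

```python
import math

def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel: exp(-gamma * sum((x_i - y_i)^2 / (x_i + y_i)))."""
    dist = 0.0
    for a, b in zip(x, y):
        if a + b > 0:          # skip bins that are empty in both vectors
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)

h1 = [0.2, 0.5, 0.3]   # illustrative normalized histograms
h2 = [0.1, 0.6, 0.3]
print(chi2_kernel(h1, h1), chi2_kernel(h1, h2))
```

A kernel matrix built with such a function can be passed to an SVM implementation that accepts precomputed kernels.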

Pets detection problem (Kaggle Dataset + random Other images)

• Kaggle Dataset + random “other” images;

• 2 classes (cats & dogs VS other);

• TRAIN: 1,000 samples;

• TEST: 12,000 samples.

46 / 55

Viola-Jones Object Detector

• Very popular for Human Face Detection.

• May be trained for Cat and Dog Face detection.

• Available free in OpenCV library (http://opencv.org).

O. Parkhi, A. Vedaldi, C. V. Jawahar, and A. Zisserman. The Truth about Cats and Dogs // Proceedings of the International Conference on Computer Vision (ICCV), 2011.

J. Liu, A. Kanazawa, D. Jacobs, P. Belhumeur. Dog Breed Classification Using Part Localization // Lecture Notes in Computer Science Volume 7572, 2012, pp 172-185.

47 / 55


Images pyramid for Viola-Jones

48 / 55

Viola-Jones Object Detector Classifier Structure

49 / 55

P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple

features // Proceedings of the 2001 IEEE Computer Society Conference on

Computer Vision and Pattern Recognition, CVPR 2001.

Pets detection results: FAR vs FRR graphs
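A sketch of how FAR and FRR can be computed from classifier scores, assuming the usual definitions: FAR is the share of "other" images accepted as pets, FRR is the share of pet images rejected. The scores and labels below are made up for illustration.

```python
def far_frr(scores, labels, threshold):
    """labels: True for pet, False for other; a sample is accepted when score >= threshold."""
    neg = [s for s, is_pet in zip(scores, labels) if not is_pet]
    pos = [s for s, is_pet in zip(scores, labels) if is_pet]
    far = sum(s >= threshold for s in neg) / len(neg)   # false acceptance rate
    frr = sum(s < threshold for s in pos) / len(pos)    # false rejection rate
    return far, frr

def frr_at_far(scores, labels, max_far):
    """Lowest threshold whose FAR does not exceed max_far, with the resulting FRR."""
    for threshold in sorted(set(scores)):
        far, frr = far_frr(scores, labels, threshold)
        if far <= max_far:
            return threshold, far, frr
    return None   # no threshold among the observed scores meets the FAR target

scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [False, False, True, True, False, True]
print(far_frr(scores, labels, 0.5))
print(frr_at_far(scores, labels, max_far=0.0))
```

Sweeping the threshold over all scores and plotting the two rates against it gives FAR vs FRR graphs; fixing FAR and reporting FRR gives a single operating point.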


Pets detection results: FAR = 0.5%

#   Method                                                   Error, %
1   Viola-Jones Face Detector for Cats & Dogs + LBP + SVM    79.73
2   AlexNet, argmax (STANDARD DL, ImageNet-2012, 1000)       26.11
3   AlexNet, sum (STANDARD DL, ImageNet-2012, 1000)          26.11
4   AlexNet + SVM linear (LAZY DL)                           4.35
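One possible reading of rows 2 and 3, stated here as an assumption since the slides do not spell it out: the 1000-class ImageNet output is used directly, and an image counts as a pet either when the top-scoring class is a cat or dog class ("argmax") or when the summed probability of all cat and dog classes passes a threshold ("sum"). The 5-class vector and the pet class indices below are hypothetical stand-ins for the 1000-class output.

```python
def is_pet_argmax(probs, pet_classes):
    """Pet if the single most probable class belongs to the pet classes."""
    top = max(range(len(probs)), key=probs.__getitem__)
    return top in pet_classes

def pet_score_sum(probs, pet_classes):
    """Total probability mass assigned to the pet classes."""
    return sum(probs[i] for i in pet_classes)

probs = [0.40, 0.25, 0.30, 0.03, 0.02]   # hypothetical softmax output
pet_classes = {1, 2}                      # hypothetical cat/dog class indices

print(is_pet_argmax(probs, pet_classes))         # False: class 0 wins
print(pet_score_sum(probs, pet_classes) >= 0.5)  # True: pet classes hold most of the mass
```

The two rules can disagree on individual images, as in this toy vector, even if their aggregate error on a test set turns out the same.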


Pets detection results: ROC curve


Labeled Faces in the Wild (LFW) Dataset

G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A

Database for Studying Face Recognition in Unconstrained Environments // University

of Massachusetts, Amherst, Technical Report 07-49, October, 2007

• More than 13,000 images of faces collected from the web.

• Pairs comparison, restricted mode.

• Test: 10-fold cross-validation, 6000 face pairs.

53 / 55

Face Recognition on LFW, results

54 / 55

Y. Taigman, M. Yang, M. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.

#   Method                                       Accuracy, %
1   Principal Component Analysis (EigenFaces)    60.2
2   Local Binary Pattern Histograms (LBP)        72.4
3   Deep CNN (AlexNet) + Euclid (LAZY DL)        71.0
4   DeepFace by Facebook (STANDARD DL)           97.25
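A sketch of what row 3 could look like in code: two faces are declared the same person when the Euclidean distance between their CNN feature vectors falls below a threshold. The L2 normalization, the threshold value and the tiny 2-dimensional vectors are assumptions for illustration; in a 10-fold protocol the threshold would be picked on the training folds.

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def same_person(feat_a, feat_b, threshold):
    """Pair verification: small distance between normalized features means same identity."""
    return euclid(l2_normalize(feat_a), l2_normalize(feat_b)) < threshold

face_a = [3.0, 4.0]    # stand-ins for high-dimensional CNN features
face_b = [6.0, 8.0]    # same direction as face_a
face_c = [4.0, -3.0]   # orthogonal to face_a

print(same_person(face_a, face_b, threshold=0.5))  # True
print(same_person(face_a, face_c, threshold=0.5))  # False
```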

contact: [email protected]

Thanks!
