Cognitive IoT using DeepLearning on data parallel frameworks like Spark & Flink
Chernodub Paris Deeplearning
-
Upload
anton-kulaga -
Category
Documents
-
view
37 -
download
4
description
Transcript of Chernodub Paris Deeplearning
![Page 1: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/1.jpg)
Deep Learning for Face Recognition
Artem Chernodub, Yurii Paschenko
IMMSP NASU
Paris, 30 September 2015.
ZZWolf
![Page 2: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/2.jpg)
𝑝 (𝑥|𝑦 )=𝑝 ( 𝑦|𝑥 )𝑝 (𝑥)
𝑝 (𝑦 )
Biological-inspired models
Neuroscience
Machine Learning
2 / 50
![Page 3: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/3.jpg)
Biological Neural Networks
3 / 50
![Page 4: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/4.jpg)
Artificial Neural Networks
Traditional (Shallow) Neural Networks
Deep Neural Networks
Deep Feedforward Neural Networks
Recurrent Neural Networks
4 / 50
![Page 5: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/5.jpg)
Deep Neural Networks
![Page 6: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/6.jpg)
Deep Learning = Learning of Representations (Features)
The traditional model of pattern recognition (since the late 50's):
fixed/engineered features + trainable classifierHand-
crafted Feature
Extractor
TrainableClassifier
Trainable Feature
Extractor
TrainableClassifier
End-to-end learning / Feature learning / Deep learning:
trainable features + trainable classifier
6 / 50
![Page 7: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/7.jpg)
Deep Feedforward Neural Networks
• 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!).
• Number of hidden layers > 1 (usually 6-9).• 100K – 100M free parameters.• No (or less) feature preprocessing stage. 7 / 50
![Page 8: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/8.jpg)
FLOPS comparison
https://ru.wikipedia.org/wiki/FLOPS
Type Name Flops Cost
Mobile Raspberry Pi 1st Gen, 700 Mhz
0,04 Gflops $35
Mobile Apple A8 1,4 Gflops $700 (in iPhone 6)
CPU Intel Core i7-4930K (Ivy Bridge), 3.7 GHz
140 Gflops $700
CPU Intel Core i7-5960X (Haswell), 3.0 GHz
350 Gflops $1300
GPU NVidia GTX 980 4612 Gflops (single precision),
144 Gflops (double precision)
$600 + cost of PC (~$1000)
GPU NVidia Tesla K80 8740 Gflops (single precision),
2910 Gflops (double precision)
$4500 + cost of PC (~1500)
8 / 50
![Page 9: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/9.jpg)
Sparse Autoencoders
9 / 50
![Page 10: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/10.jpg)
Dimensionality reduction
• Use a stacked RBM as deep auto-encoder
1. Train RBM with images as input & output
2. Limit one layer to few dimensions
Information has to pass through middle layer
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.10 /
50
![Page 11: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/11.jpg)
Original
Deep RBN
PCA
Dimensionality reduction
Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 30)
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.11 /
50
![Page 12: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/12.jpg)
How to use unsupervised pre-training stage / 1
12 / 50
![Page 13: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/13.jpg)
How to use unsupervised pre-training stage / 2
13 / 50
![Page 14: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/14.jpg)
How to use unsupervised pre-training stage / 3
14 / 50
![Page 15: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/15.jpg)
How to use unsupervised pre-training stage / 4
15 / 50
![Page 16: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/16.jpg)
Unlabeled data
Unlabeled data is readily available
Example: Images from the web
1. Download 10’000’000 images2. Train a 9-layer DNN3. Concepts are formed by DNN
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.16 /
50
![Page 17: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/17.jpg)
Dimensionality reduction
PCA Deep RBN
804’414 Reuters news stories, reduction to 2 dimensions
G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.17 /
50
![Page 18: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/18.jpg)
Hierarchy of trained representations
Low-level feature
Middle-level
feature
Top-level feature
Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]
18 / 50
![Page 19: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/19.jpg)
Face Recognition
![Page 20: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/20.jpg)
Standard Face Recognition Scenario
Step 1. Face Detection
Step 2. Face Alignment
Step 3. Face Matching
20 / 50
![Page 21: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/21.jpg)
Face Matching (Face Recognition)
A) Training the classifier B) Matching two face images
Classifier
Features
Answer21 /
50
![Page 22: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/22.jpg)
Recognition of Face Attributes
• man;• white;• 45-50 years;• glasses – no;• moustache –
yes;• beard – yes.
22 / 50
![Page 23: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/23.jpg)
Beauty Recognition
http://www.linkface.cn/face/beauty
Чернодуб А.Н., Пащенко Ю.А., Головченко К.А. Нейросетевая система определения привлекательности лица человека // «Нейроинформатика-2013», Москва, 21-25 января 2013, c. 254 — 259. 23 /
50
![Page 24: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/24.jpg)
Face Matching
![Page 25: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/25.jpg)
EigenFaces, Principal Component Analysis (PCA) for Face Matching, 1991
M. Turk, A. Pentland. Face Recognition using EigenFaces // Journal of cognitive neuroscience 3 (1), p. 71-86.
• Well-known method for dimensionality reduction.• Building features from the training data.• Impressive visualization of the basis vectors.
25 / 50
![Page 26: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/26.jpg)
Local Binary Pattern Histograms (LBPH), 2006
• Fast & robust features for texture descriptions applied to face recognition.
• distance for comparison of histograms.• Matrix of importance regions (weighted .
T. Ahonen, A. Hadid. Face Description with Local Binary Patterns:Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, p. 2037-2041.
26 / 50
![Page 27: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/27.jpg)
Deep Face (Facebook), 2014
Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.
Model # of parameters
Accuracy, %, LFW
Deep Face Net 128M 97.25
Human level N/A ~ 97.64
Training data: 4M facial images
27 / 50
![Page 28: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/28.jpg)
Convolutional Neural Networks: Return of Jedi
Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks
Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 28 /
50
![Page 29: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/29.jpg)
DeepFace Training Framework
Step 1. Supervised training for identification
Step 2. Use produced features for face matching
Big Face database
~1-10M images,~ 1-5K persons
ConvolutionalNeuralNetwork
Feature
1-5Klabels
Feature 2
Feature 1 - Cosine metric- Euclid metric
- metric- Siamese networks
Similarity
29 / 50
![Page 30: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/30.jpg)
Face Recognition Algorithm
Face localization
Normalization
Feature extraction
Comparing
Verification
metricFALSE
30 / 50
![Page 31: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/31.jpg)
Databases for Deep Face Training
• Facebook’s Social Face Classification (SCF) dataset, 2014. 4.4M labeled faces, 4030 people with 800 to 1200 faces each. Private dataset.
• Visual Geometry Group Dataset, Oxford, 2015. 2622 people with 1000 faces each. Private dataset.
• CASIA WebFace Database, 2014. 10575 people, 500K faces. Public dataset.http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
31 / 50
![Page 32: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/32.jpg)
CASIA
• Dataset: CASIA WebFace (10 575 subjects and 494 414 face images)
• Alignment: 2D• Architectures: single CNN + triplet
loss• Verification metric:– Cosine– Joint Bayes
D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.
32 / 50
![Page 33: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/33.jpg)
CASIA architecture
• Input: gray image 100x100
• Feature vector size: 320
• Parameters: ~5M
33 / 50
![Page 34: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/34.jpg)
CASIA replication
Training data Accuracy on LFW
Normalized 97,27 %
No alignment 94,15 %
Our 2D alignment 96,23 %
34 / 50
![Page 35: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/35.jpg)
Error rate vs Database size
Identities
Faces Error rate
1.5K 150K 3.1%
9K 450K 1.35%
18K 1.2M 0.87%
J. Liu, Y. Deng, T. Bai, Zh. Wei, C. Huang. Baidu Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, 2015 http://hdl.handle.net/1721.1/97999
35 / 50
![Page 36: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/36.jpg)
Face Databases (benchmarks)
![Page 37: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/37.jpg)
AT&T Database (Olivetti, ORL): 1992-1994
F. Samaria, A. Harter. Parameterisation of a Stochastic Model for Human Face Identification // Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, December 1994.
http://www.uk.research.att.com/
• Grayscale images, 92x112.
• 40 subjects, 10 images per subject.
• Different facial expressions, poses, glasses.
• Studio, single session.
37 / 50
![Page 38: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/38.jpg)
FERET dataset, 1994-1996
P.J. Phillips, H. Moon, P. Rauss, S. A. Rizvi. The FERET September 1996 Database and Evaluation Procedure // First International Conference, AVBPA'97 Crans-Montana, Switzerland, March 12–14, 1997, pp. 395-402
http://www.nist.gov/srd/
• 2413 still facial images, representing 856 individuals, studio, two sessions.
• Different facial expressions, poses, light.
38 / 50
![Page 39: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/39.jpg)
Labeled Faces in the Wild dataset, 2007
G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.http://vis-www.cs.umass.edu/lfw/
• Images with faces collected from the web.
• Pairs comparison, restricted mode.
• test: 10-fold cross-validation, 6000 face pairs.
39 / 50
![Page 40: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/40.jpg)
Results for Face Matching, LWF (option “Labeled outside data”)
http://vis-www.cs.umass.edu/lfw/results.html
Method Accuracy
1 EigenFaces 60,20%
2 LBP-classic (3K features) 72,43%
3 High-dim LBP (100K features) 95,17%
4 DeepFace 97,25%
5 Human level 97,64%
6 DeepID3 99,53%
7 Baidu 99,77%
40 / 50
![Page 41: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/41.jpg)
Face Detection
![Page 42: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/42.jpg)
Viola-Jones Object Detector
• Very popular for Face Detection.• Different features except HAAR: LBP, SURF/SIFT, etc.• Available free in OpenCV library (http://opencv.org).• OpenCV implementation supports Windows, Linux,
Mac OS, iOS (not officially) and Android.• May be trained with MATLAB tool traincascade.
42 / 50
![Page 43: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/43.jpg)
Images pyramid for Viola-Jones
43 / 50
![Page 44: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/44.jpg)
FDDB: Face Detection Data Set and Benchmark
Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings // Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010.
http://vis-www.cs.umass.edu/fddb
• 5171 faces in a set of 2845 LWF images.
• Special software tool for testing.
44 / 50
![Page 45: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/45.jpg)
Viola-Jones Object Detector Classifier Structure
P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001.
45 / 50
![Page 46: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/46.jpg)
FDDB: state-of-the-art
http://vis-www.cs.umass.edu/fddb/results.html46 /
50
![Page 47: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/47.jpg)
Face Alignment
![Page 48: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/48.jpg)
3D Face Alignment
Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014. 48 /
50
![Page 49: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/49.jpg)
2D Face Alignment
D. Yi, Z. Lei, S. Liao, S.Z. Li. Learning Face Representation from Scratch // CoRR abs/1411.7923 (2014). [CASIA]
49 / 50
![Page 50: Chernodub Paris Deeplearning](https://reader036.fdocuments.net/reader036/viewer/2022062407/563db93a550346aa9a9b4f3a/html5/thumbnails/50.jpg)
Some Computer Vision / Machine Learning tools• OpenCV, VLFeat – popular Computer Vision
libraries.• mexopencv – MATLAB interface for
OpenCV.• pythonXY – all-in-one Machine Learning
Package.• libSVM – reliable cross-platform Support
Vector Machine library.• NetLab – Shallow Neural Networks & other
ML models, from Christopher Bishop’s book.
• Caffe, Theano, Torch, matconvnet – frameworks for training Deep Neural Networks. 50 /
50