Scalable Image Recognition Model with Deep Embedding
Chieh-En [email protected]
Motivation
Motivation: The Rise of DNNs
• Deep Neural Networks have achieved the best performance on a variety of visual tasks.
Motivation: Ubiquitous Mobile Devices
• Devices like smartphones, in-car cameras, GoPros, and IoT devices are popping up everywhere.
• A huge amount of valuable images is stored not on servers, but on mobile & IoT devices.
Motivation: Exploiting DNNs
• High performance brought by DNNs
• Valuable data brought by mobile & IoT devices
How can we exploit the best of both worlds?
Solution: Client-Server System
[Figure: a photo is uploaded to a server, which returns the label "La Tour Eiffel"]
• Round trips average 7-12 sec; can't do real-time applications.
Or, another way
Solution: pure mobile system
[Figure: on-device pipeline. Images go through feature extraction, then either linear classification on the device, or the low-dimensional feature is sent to the server for more complicated jobs / further processing.]
Problem: Limited Storage & Computing Power
• A DNN model has too many parameters to fit in a storage- and compute-limited system such as a mobile or IoT device.
• How can we perform image classification on mobile & IoT devices?
Krizhevsky et al. Model Size (AlexNet)
A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” NIPS, 2012.
Layer: Model Size (MB)
Conv1: float*(48+48)*(3*11^2) = 0.1
Conv2: float*(128+128)*(48*5^2) = 1.2
Conv3: float*(192+192)*(256*3^2) = 3.4
Conv4: float*(192+192)*(192*3^2) = 2.5
Conv5: float*(128+128)*(192*3^2) = 1.7
FC6: float*((128+128)*6^2)*4096 = 144 (66%)
FC7: float*4096*4096 = 64 (29%)
Total = 217 MB
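The per-layer sizes above can be reproduced with a short script. This is a sketch that assumes 4-byte floats and the grouped AlexNet filter shapes exactly as listed in the table (biases and the final FC8 layer are ignored, as in the slide):

```python
# Sketch: reproduce the AlexNet model-size breakdown from the slide.
# Assumes 4-byte floats and the grouped filter shapes above; biases ignored.
FLOAT = 4  # bytes per parameter

layers = {
    "Conv1": (48 + 48) * (3 * 11 ** 2),
    "Conv2": (128 + 128) * (48 * 5 ** 2),
    "Conv3": (192 + 192) * (256 * 3 ** 2),
    "Conv4": (192 + 192) * (192 * 3 ** 2),
    "Conv5": (128 + 128) * (192 * 3 ** 2),
    "FC6": ((128 + 128) * 6 ** 2) * 4096,
    "FC7": 4096 * 4096,
}

sizes_mb = {name: FLOAT * n / 2 ** 20 for name, n in layers.items()}
total_mb = sum(sizes_mb.values())

for name, mb in sizes_mb.items():
    print(f"{name}: {mb:.1f} MB ({100 * mb / total_mb:.1f}%)")
print(f"Total = {total_mb:.0f} MB")  # ~217 MB; FC6+FC7 alone are ~95%
```

Running this confirms the slide's headline numbers: about 217 MB total, with the two fully connected layers accounting for roughly 95% of the parameters.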
Solution: Semantic-Rich Low-Dim. Feature
• The activations of the fully connected layers of the AlexNet model have come to be viewed as a general, high-semantic feature in recent years.
• 95% of the model parameters are in the fully connected layers.
Solution: Semantic-Rich Low-Dim. Feature
Drop the fully connected layers from the final model while still encoding their information!
How?
Kernel Preserving Projection (KPP)
• Find a linear transformation that projects features into a lower-dimensional space while "preserving the relevance distances in kernel space".
Y.-C. Su et al., "Scalable Mobile Visual Classification by Kernel Preserving Projection over High Dimensional Features," IEEE, 2014.
Kernel Preserving Projection (KPP)
• Find an explicit transform such that:
• In matrix representation, we want to find a matrix:
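The equations on these slides were images in the original and did not survive transcription. As a rough sketch of the idea only (not the paper's exact MVProjection/L1MVProjection objectives), one way to obtain an explicit kernel-preserving linear map is to factor a truncated eigendecomposition of the training kernel matrix and then fit the projection by least squares; the toy data and all names below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # toy data: 200 samples, 50-dim features
Xa = np.hstack([X, np.ones((200, 1))])    # augment with a bias column

# RBF kernel matrix on the training set (gamma chosen arbitrarily for the toy data)
gamma = 0.005
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)

# Low-dimensional target Y with Y @ Y.T ~ K, via truncated eigendecomposition of K
d = 10
vals, vecs = np.linalg.eigh(K)
top = np.argsort(vals)[::-1][:d]
Y = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Explicit linear transform P with Xa @ P ~ Y, fit by least squares: new samples
# are then projected as [x, 1] @ P with no kernel evaluation at test time.
P, *_ = np.linalg.lstsq(Xa, Y, rcond=None)

Z = Xa @ P
err = np.linalg.norm(Z @ Z.T - K) / np.linalg.norm(K)
print(f"relative kernel reconstruction error: {err:.3f}")
```

The point of the explicit transform is the last two lines: once P is learned, projecting a new feature is a single matrix multiply, so the full kernel machinery never runs on the device.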
Kernel Preserving Projection(KPP)
• MVProjection:
• L1MVProjection:
Deep Embedding
• Experimental results show that on hand-crafted features, the RBF kernel performs best.
• Though infinite-dimensional, the RBF space itself is semantically meaningless!
Deep Embedding
• For RBF kernel,
• For Deep Embedding,
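The two kernel definitions on this slide were images in the original and are missing from the transcript. As an assumption reconstructed from the surrounding text, the contrast is likely between the standard RBF kernel and a kernel induced by the DNN's fully connected activations (the symbol $f_{\mathrm{FC}}$ is a placeholder name, not from the source):

```latex
% RBF kernel: implicit, infinite-dimensional, semantically opaque space
K_{\mathrm{RBF}}(x_i, x_j) = \exp\!\left(-\gamma \lVert x_i - x_j \rVert^2\right)

% Deep embedding (assumed form): kernel induced by the FC-layer activations,
% so KPP preserves distances in a semantically meaningful DNN feature space
K_{\mathrm{deep}}(x_i, x_j) = \left\langle f_{\mathrm{FC}}(x_i),\; f_{\mathrm{FC}}(x_j) \right\rangle
```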
Deep Embedding
Not only is the model reduced, but so is the classifier.
Result
In the experiments, we use LIBLINEAR as our classifier and perform 10-fold cross-validation on the Scene-15 benchmark dataset. We first compare KPP (RBF) against other methods on a state-of-the-art hand-crafted feature (VLAD) to show how KPP outperforms the others.
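The evaluation protocol can be sketched as follows. This is a stand-in only: synthetic Gaussian data replaces Scene-15 and a nearest-centroid classifier replaces the linear SVM in LIBLINEAR, neither of which is reproduced here; only the 10-fold split logic mirrors the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Scene-15: 3 classes of shifted Gaussian features
n_per, dim = 60, 20
X = np.vstack([rng.normal(loc=c, size=(n_per, dim)) for c in range(3)])
y = np.repeat(np.arange(3), n_per)

# 10-fold cross-validation: shuffle once, split into 10 disjoint folds
idx = rng.permutation(len(y))
folds = np.array_split(idx, 10)

accs = []
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Nearest-centroid classifier (stand-in for the LIBLINEAR SVM)
    centroids = np.stack([X[train_idx][y[train_idx] == c].mean(0) for c in range(3)])
    dist = ((X[test_idx][:, None, :] - centroids[None]) ** 2).sum(-1)
    pred = dist.argmin(1)
    accs.append((pred == y[test_idx]).mean())

acc = float(np.mean(accs))
print(f"10-fold accuracy: {acc:.3f}")
```

Each sample is held out exactly once, and the reported number is the mean accuracy over the 10 held-out folds, matching the protocol described above.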
Result
Result: Deep Embed
- The accuracy boost from 75.6% (hand-crafted) to 89.5% (AlexNet) shows the power of DNNs.
- Deep embedding outperforms the other methods by a large margin on DNN features.
The final model:
- Requires only 14% of the parameters, saving 86% of the space (217 MB -> 30 MB).
- Accuracy drops only 1.12% (89.5% -> 88.38%).
- Suitable for mobile & IoT device computing!
Result: Deep Embed
[Chart: model size comparison; final model ~30 MB]
Thank you!