Creation and Optimization of a Logo Recognition System
Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao
Academic Mentor: Dr. Albert Ku
Industrial Mentor: Mr. Sun Lin
August 6, 2015
Qi, Richfield, Zeng, Zhao
RIPS-HK: Lenovo
Problem Description
Problem: What if there was an app that could provide a smartphone user with information about a company just by recognizing that company’s logo in an image?
Goal: Create this app.
Outline
• Model Introduction
  - Bag of Features Model
  - Convolutional Neural Network
• Model Testing and Results
• Application Demonstration
• Conclusions and Future Work
Bag of Features Model
Feature Extraction
Feature Extraction and Description: SURF
• Interest point detection
  - Rotation- and scale-invariant features
• Interest point description
  - A good representation of the image
SURF: Interest Point Detection
• Use the determinant of the Hessian to detect blob-like structures.
• Use box filters to approximate the second-order derivatives of the Gaussian filter.
  Figure: Second-order box filter
• Take advantage of the integral image, so that each box-filter sum needs only a few table lookups.
• Apply scale-space analysis to choose the appropriate scale for each point.
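The integral-image step can be illustrated with a minimal sketch in plain Python. It is not the SURF implementation used in the project; the function names `integral_image` and `box_sum` and the toy image are assumptions made for illustration. The point is only that, once the table is built, the sum over any box costs four lookups regardless of the box size.

```python
def integral_image(img):
    """Return the summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above and to the left, inclusive.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1 in four lookups."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 3x4 image (hypothetical values).
img = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 4))  # whole image
print(box_sum(ii, 1, 1, 3, 3))  # rows 1-2, columns 1-2
```

A box filter is a weighted combination of a few such box sums, which is why its cost does not grow with the filter size.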
SURF: Interest Point Description
• Calculate the dominant orientation based on Haar wavelet analysis.
• Build the descriptor over a 4×4 grid of subregions.
BOW Training
Feature Vector Clustering
Basics of K-means
• Clustering method in N-dimensional space
• Algorithmic steps:
  - With a given set of data, choose k cluster centers
  - Calculate the distance between each data point and each cluster center
  - Cluster points based on minimum distance
  - Recalculate the cluster centers:
    $$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$$
  - $v_i$ = new cluster center, $c_i$ = number of data points in the ith cluster, $x_j$ = jth data point in the ith cluster.
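The four steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the clustering code used in the project; the random initialization, the fixed iteration count, the handling of empty clusters, and the toy 2-D points are all assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step 1: choose k initial centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Steps 2-3: assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda i: squared_distance(p, centers[i]))
            for p in points
        ]
        # Step 4: v_i = (1 / c_i) * sum of the points in cluster i.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Two well-separated toy groups (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, 2)
print(centers, labels)
```

In practice the loop would stop when the assignments no longer change; a fixed iteration count keeps the sketch short.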
K-means Clustering
Hierarchical K-means
Bag of Words and Hierarchical K-means
[Figure: animation of a tree of clusters (CL.) built from the feature vectors; descriptors (X) are passed down the tree and accumulate at the clusters.]
Bag of Words and Hierarchical K-means
[Figure: histogram of visual-word counts (word1 to word5, vertical axis 0 to 8) next to the count vector (3, 8, 2, 5, 1) used for matching.]
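How a descriptor ends up in a word, and how an image becomes a histogram of words, can be sketched as follows. This is a minimal sketch under stated assumptions: the nested-dictionary tree layout, the names `quantize` and `bag_of_words`, and the 2-D toy descriptors are hypothetical (real SURF descriptors have many more dimensions, and the tree would come from hierarchical k-means rather than being written by hand).

```python
from collections import Counter


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, node):
    """Descend the tree, following the nearest child center, to a leaf word id."""
    while "word" not in node:
        node = min(
            node["children"],
            key=lambda child: squared_distance(descriptor, child["center"]),
        )
    return node["word"]


def bag_of_words(descriptors, tree):
    """Histogram of visual-word ids for one image."""
    return Counter(quantize(d, tree) for d in descriptors)


# Hypothetical tree with 2 branches and 2 levels, i.e. 2**2 = 4 words.
tree = {"children": [
    {"center": (0, 0), "children": [
        {"center": (0, 0), "word": 0},
        {"center": (0, 2), "word": 1},
    ]},
    {"center": (10, 10), "children": [
        {"center": (9, 9), "word": 2},
        {"center": (12, 12), "word": 3},
    ]},
]}

histogram = bag_of_words([(0, 0.1), (0.2, 1.9), (0, 2.2), (11.5, 12)], tree)
print(histogram)
```

Each descriptor is compared with only B centers per level instead of with every word, which is what makes very large vocabularies practical.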
Inverted File Index
• word 1: image 1, image 3, image 5, ...
• word 2: image 4, image 9, image 16, ...
• word 3: image 4, image 12, image 13, ...
• word 4: image 1, image 5, image 7, ...
• word 5: image 2, image 3, image 9, ...
• word 6: image 7, image 12, image 17, ...
• ...
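An inverted file of this shape can be sketched with a dictionary from word id to a set of image ids. The code below is an illustration only: the function names, the simple one-vote-per-shared-word scoring, and the toy database are assumptions, not the scoring used in the project.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def candidate_images(index, query_words, top_n):
    """Vote for images sharing words with the query; return the top_n ids."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return [image_id for image_id, _ in votes.most_common(top_n)]


# Hypothetical database: image id -> visual words found in that image.
image_words = {1: [1, 4], 3: [1, 5], 4: [2, 3], 5: [1, 4], 7: [4, 6]}
index = build_inverted_index(image_words)
print(sorted(index[1]))                     # images containing word 1
print(candidate_images(index, [1, 4], 2))   # two best candidates for a query
```

Only the images listed under the query's words are touched, so images that share no word with the query are never examined.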
Classification: Inverted File Index
• Benefit: retrieval via the inverted file is faster than searching every image
• Drawback: lack of spatial accuracy
• Need additional verification to re-rank the retrieved images
Re-ranking of Returned Images
• Match descriptors of the query image to descriptors in the images in the returned list.
• Simple algorithm:
  - Match each descriptor in the query image to its nearest-neighbor descriptor from the list image.
  - Compare the L2 distance of that pair with the L2 distance between the query descriptor and every other descriptor in the list image.
  - If the nearest-neighbor distance is significantly smaller, count it as a “match”.
  - Sum up the number of “matches” for each list image and divide by the total number of features.
• The returned list is then re-ranked based on this “match ratio” and returned to the user.
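The simple algorithm can be sketched as a nearest-neighbor ratio test. Several details below are assumptions, since the slide does not fix them: "significantly smaller" is taken to mean less than 0.7 times the second-smallest distance, the match count is divided by the number of query descriptors, and the names `match_ratio` and `rerank` and the toy 2-D descriptors are hypothetical.

```python
import math


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_ratio(query_descriptors, image_descriptors, ratio=0.7):
    """Fraction of query descriptors whose nearest neighbor passes the ratio test."""
    if len(image_descriptors) < 2 or not query_descriptors:
        return 0.0
    matches = 0
    for q in query_descriptors:
        distances = sorted(l2(q, d) for d in image_descriptors)
        nearest, second = distances[0], distances[1]
        # Beating the second-nearest means beating every other descriptor.
        if nearest < ratio * second:
            matches += 1
    return matches / len(query_descriptors)


def rerank(query_descriptors, candidates):
    """Sort candidate image ids by descending match ratio."""
    return sorted(
        candidates,
        key=lambda image_id: match_ratio(query_descriptors, candidates[image_id]),
        reverse=True,
    )


query = [(0, 0), (5, 5)]
candidates = {
    "b": [(3, 3), (2.5, 2.5), (2, 3)],   # ambiguous neighbors, no matches
    "a": [(0, 0.1), (9, 9), (5, 5.1)],   # one clear neighbor per query point
}
print(rerank(query, candidates))
```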
Convolutional Neural Networks (CNNs)
Neural Networks
Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png
Convolutional Neural Networks
Convolutional neural networks are neural networks with an additional biological inspiration. Each layer is of one of two basic types: convolution and pooling.
• Convolution is the process of convolving an image with a kernel. This idea comes from image processing, where it has been used for things like edge detection. Here, we want to learn kernels specific to the data.
• Pooling refers to the process of providing a statistical summary of the outputs of several nearby “neurons”, e.g. by taking an average or max.
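Both operations fit in a short plain-Python sketch. It uses a fixed, hand-written edge kernel rather than a learned one, and the function names and the toy image are assumptions for illustration; a real network would use a framework such as Caffe.

```python
def convolve2d(image, kernel):
    """'Valid' 2-D convolution: flip the kernel, then slide it over the image."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[y + i][x + j] * flipped[i][j]
                for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; leftover border rows and columns are dropped."""
    h, w = len(feature_map) // size, len(feature_map[0]) // size
    return [
        [
            max(feature_map[y * size + i][x * size + j]
                for i in range(size) for j in range(size))
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edges = convolve2d(image, [[1, -1]])  # horizontal difference kernel
pooled = max_pool(edges, size=2)
print(edges[0], pooled)
```

The convolution responds only where the intensity changes, and pooling keeps that response while halving the resolution.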
Figure: Description of convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.
Implementation and Architecture
For implementation of CNNs, we used Caffe [?]. We only had around 16,000 images, so we fine-tuned two pre-trained models:
• AlexNet [?], the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012.
• GoogLeNet [?], the winner of the ILSVRC 2014.
Both of these are provided in Caffe’s Model Zoo, with a file that stores the weights of these models after training on ImageNet.
AlexNet
Figure: Image of AlexNet architecture (from [?]). This also illustrates how the original network was split to train on two GPUs.
GoogLeNet
Figure: Image of GoogLeNet architecture (from [?]). Deeper, and with 12x fewer parameters than AlexNet.
Filter/Layer Visualization
Let’s do some filter/layer visualization!
• 143.89.75.120/filayer.html
Model Testing
Dataset Construction
We gathered a data set of logo images for 167 brands using the Bing Search API (on average, 100 images per brand), searching for things like “<brand>”, “<brand> building”, “<brand> <product>”. One problem we faced was that we downloaded either mislabeled or irrelevant images. We filtered the dataset using two methods:
• Compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and toss the image if it doesn’t meet some threshold.
• import ManualLabor
Testing the original pipeline
• Parameter tuning
• Cross validation
Parameter Tuning
• BOW structure: how to choose the vocabulary size:
  - words = B^L
  - B: number of branches; L: number of levels
  - Too large: lack of generalization, overfitting
  - Too small: lack of discrimination, mismatches
Parameter Tuning
• Vocabulary size
• How to choose the number of images returned by the inverted file index search
  - accuracy
  - the computation time of re-ranking
• How to choose the number of images shown on the client side
  - accuracy
  - mobile application: the size of the screen
Parameter Tuning
• Vocabulary size
• The number of images returned by searching
• The number of images shown
• Re-ranking: how to determine the weight factor w in the weighted function
  - score = w * I + (1 - w) * F
  - I: number of inliers
  - F: frequency of the brand in the returned images
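The weighted function can be tried out with a short sketch. It applies the formula to raw counts exactly as written; whether I and F were normalized in the project is not stated, so that choice, the function name `rerank_by_score`, and the toy data are assumptions.

```python
from collections import Counter


def rerank_by_score(returned, w):
    """returned: list of (brand, inlier_count), one entry per retrieved image."""
    frequency = Counter(brand for brand, _ in returned)  # F for each brand
    scored = [
        (w * inliers + (1 - w) * frequency[brand], brand)
        for brand, inliers in returned
    ]
    return sorted(scored, reverse=True)


# Hypothetical retrieval result: three images of one brand, one of another.
returned = [("lenovo", 12), ("lenovo", 9), ("nike", 14), ("lenovo", 3)]
print(rerank_by_score(returned, w=0.4)[0])  # brand frequency tips the balance
print(rerank_by_score(returned, w=1.0)[0])  # inliers only
```

With w = 1 the ranking depends on inliers alone, and with w = 0 on brand frequency alone, so tuning w trades off geometric evidence against agreement among the returned images.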
Parameters for Evaluation
• Vocabulary size
  - number of branches
  - number of levels
• The number of images returned by searching
• The number of images shown
• Weight factor w in the weighted function
• Calculation of the accuracy
  - a query counts as correct (accuracy = 1) if at least one of the returned images is correct
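This accuracy rule can be written as a small function. The sketch assumes each query yields a ranked list of brand names and that a query is a hit when the true brand appears among the first `shown` entries; the names and the toy data are hypothetical.

```python
def hit_accuracy(results, truths, shown):
    """Fraction of queries whose true brand appears among the first `shown` results."""
    hits = sum(
        1 for ranked, truth in zip(results, truths) if truth in ranked[:shown]
    )
    return hits / len(truths)


# Hypothetical ranked results for three queries and their true brands.
results = [
    ["nike", "lenovo", "ibm"],
    ["ibm", "dell", "hp"],
    ["hp", "acer", "dell"],
]
truths = ["lenovo", "hp", "asus"]
print(hit_accuracy(results, truths, shown=2))
print(hit_accuracy(results, truths, shown=3))
```

Showing more images can only raise this accuracy, which is why it has to be balanced against the size of the screen.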
Cross Validation
• Application
  - model selection
  - model assessment
• Procedure
Cross Validation
Randomly divide the data into K equal-sized parts.
• Leave out part k, fit the model to the other K-1 parts (combined), and then obtain predictions for the left-out kth part.
• This is done in turn for each part k = 1, 2, ..., K, and then the results are combined.
• We choose K = 5.
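The procedure can be sketched in plain Python. The helper names, the fixed random seed, and the `fit_and_score` callback (standing in for training and evaluating a model) are assumptions for illustration.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds whose sizes differ by at most one."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(n, k, fit_and_score):
    """Call fit_and_score(train_idx, test_idx) once per fold; return the scores."""
    folds = k_fold_indices(n, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Combine the other k-1 folds into the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores


# Stand-in callback: report the training-set size instead of a real accuracy.
print(cross_validate(10, 5, lambda train, test: len(train)))
```

Every sample is used for testing exactly once, so the combined scores reflect the whole data set.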
Testing Result
• Test on vocabulary size
• Optimal number of words: 500,000 to 800,000
  - number of branches = 14 or 15
  - number of levels = 5
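These settings are consistent with the vocabulary-size formula words = B^L, as a quick check shows. The function name is hypothetical, and the check assumes a full tree in which every node has B children.

```python
def vocabulary_size(branches, levels):
    """Number of leaf words in a full tree: branches ** levels."""
    return branches ** levels


for b in (14, 15):
    print(b, vocabulary_size(b, 5))
# prints:
# 14 537824
# 15 759375
```

Both values fall inside the reported optimal range of 500,000 to 800,000 words.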
Testing Result
• With other parameters fixed, test on:
  - weight factor
  - number of returned images
  - number of images shown on the client side
Testing Result
• Optimal parameter setting:
  - number of images shown = 6
  - set the number of returned images to 15, saving about 0.3 s
Testing Summary
• Optimal parameter setting:
  - number of words: 500,000 to 800,000
  - number of images returned: 15
  - number of images shown: 6
• The stability of the system was also tested:
  - the standard deviation over 5-fold cross validation ranges from 0.005 to 0.007
Evaluation of Deep Learning framework
Cross-validation for AlexNet (Top-5 Accuracy)
[Figure: top-5 accuracy (vertical axis 0.87 to 0.95) plotted at horizontal-axis ticks from 1,000 to 196,000 in steps of 5,000.]
Cross Validation Example
94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, 93.80%
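The spread of the ten values listed under "Cross Validation Example" can be summarized with the standard library. This is only arithmetic on the numbers as printed; treating them as one sample and using the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

# The ten accuracies listed on the slide, in percent.
accuracies = [94.63, 94.02, 93.80, 94.02, 93.90, 93.59, 94.11, 93.44, 94.54, 93.80]

average = mean(accuracies)
spread = pstdev(accuracies)  # population standard deviation
print(f"mean = {average:.3f}%, std = {spread:.3f} percentage points")
```

The mean is 93.985%, and the standard deviation is well under half a percentage point, so the listed runs agree closely with each other.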
Evaluation of Deep Learning framework
Cross-validation for AlexNet
Final accuracy reached (AlexNet):
Top-1 Accuracy 93.33%
Top-5 Accuracy 96.73%
Evaluation of Deep Learning framework
Cross-validation for GoogLeNet (Top-5 Accuracy)
Evaluation of Deep Learning framework
Cross-validation for AlexNet
Cross-validation for GoogLeNet
Final accuracy reached (GoogLeNet):
Top-1 Accuracy 94.05%
Top-5 Accuracy 97.39%
Evaluation of Deep Learning framework
Final Comparison
                                    GoogLeNet   AlexNet    Visual Bag of Words
Accuracy (Top-5)                    97.39%      96.73%     87.6%
Efficiency
  Preprocess                        8.47 ms     7.5 ms     6 ms
  Classification                    17.7 ms     6.94 ms    -
  SURF feature extraction           -           -          24 ms
  Total time (including some
  system-level operations)          129 ms      170 ms     281 ms
Demonstration
Future development
There is still more we can do to improve the system:
• Enlarge the data set (currently 167 classes and 16,000 images).
• Test different deep learning frameworks.
• Combine local hand-crafted features with global deep-learned features to achieve better accuracy.
We would like to thank
• Mr. Sun Lin and Lenovo-Hong Kong.
• Professor Shingyu Leung, Dr. Ku Yin Bon and Hong Kong University of Science and Technology.
• Professor Susanna Serna and the Institute for Pure and Applied Mathematics.
• The National Science Foundation for program funding - Grant DMS #0931852.