Unsupervised Auxiliary Visual Words Discovery for Large-Scale Image Object Retrieval
Yin-Hsi Kuo 1,2, Hsuan-Tien Lin 1, Wen-Huang Cheng 2, Yi-Hsuan Yang 1, and Winston H. Hsu 1
1 National Taiwan University and 2 Academia Sinica, Taipei, Taiwan
CVPR 2011
Outline
• Introduction
• Key Observations: the problems of the BoW model
• Graph Construction and Image Clustering
• Semantic Visual Features Propagation
• Common Visual Words Selection
• Solution & Optimization
  – Gradient Descent Solver
  – Analytic Solver
• Experiments and Results
• Conclusion & Future Work
Introduction
• Image object retrieval, that is, retrieving images that contain the target image object, is one of the key techniques for managing exponentially growing image/video collections
• It is a challenging problem because the target may cover only a small region of an image
(Figure: a query image and its retrieval result)
Introduction
• Although BoW is popular and has been shown to be effective for image object retrieval [14], BoW-like methods fail to address issues related to:
  – Noisily quantized visual features
  – Vast variations in viewpoints
  – Lighting conditions
  – Occlusions
• Thus they suffer from a low recall rate
Traditional BoW vs. Proposed
Introduction
• The contributions of this paper:
  – Observing two problems of the conventional BoW model in large-scale image object retrieval
  – Proposing auxiliary visual words (AVW) discovery through visual and textual clusters in an unsupervised and scalable fashion
  – Investigating different optimization methods for efficiency and accuracy in AVW discovery
  – Conducting experiments on consumer photos and showing an improved recall rate for image object retrieval
Problem 1: Sparseness of the Visual Words
• The Flickr550 dataset contains 540,321 images in total
  – Half of the VWs occur in less than 0.11% of the images (57 images)
  – Most (96%) VWs occur in about 0.5% of the images (2702 images)
• Similar images will therefore have very few VWs in common
• This is known as the uniqueness of VWs [2]
• It is partly due to quantization errors or noisy features
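The statistics on this slide are document frequencies: for each VW, the fraction of images in which it occurs. A minimal sketch of how such a statistic could be computed, assuming each image is given as a list of VW ids (the toy data and the helper name `document_frequency_fractions` are hypothetical and not taken from the slides):

```python
from collections import Counter
from statistics import median

def document_frequency_fractions(images):
    """Map each visual word (VW) id to the fraction of images containing it.

    `images` is a list of iterables of VW ids, one per image
    (an input format assumed here for illustration).
    """
    counts = Counter()
    for vws in images:
        counts.update(set(vws))  # count each VW at most once per image
    n = len(images)
    return {vw: c / n for vw, c in counts.items()}

# Toy collection: 4 images over a small vocabulary.
toy_images = [[1, 2, 3], [3, 4], [3, 5, 5], [6]]
fractions = document_frequency_fractions(toy_images)

# Median fraction: half of the VWs occur in at most this share of images.
median_fraction = median(fractions.values())
```

On a real collection the same median would correspond to the "half of the VWs" figure quoted above.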
Problem 2: Lacking Semantics-Related Features
Graph Construction and Image Clustering
• Image clustering is based on graph construction
• Images are represented by 1M VWs and by 90K text tokens obtained from Google snippets of the associated tags
• The large-scale image graph is constructed with the MapReduce [4] framework, which handles the large-scale computation
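As an illustration of what an image graph is, the sketch below links images whose sparse feature vectors are similar enough. The cosine measure, the threshold, and the function names are assumptions made for this example; the slides do not state which similarity is used, and the all-pairs loop here is only suitable for toy data, whereas the source relies on MapReduce for scale.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {index: weight}."""
    dot = sum(w * v[i] for i, w in u.items() if i in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_graph(features, threshold):
    """Return {(i, j): similarity} for image pairs with similarity >= threshold."""
    edges = {}
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            s = cosine(features[i], features[j])
            if s >= threshold:
                edges[(i, j)] = s
    return edges

# Toy features: images 0 and 1 share two VWs, image 2 shares none.
toy = [{0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0, 2: 1.0}, {3: 2.0}]
graph = build_graph(toy, threshold=0.5)
```

The same routine could be run once on visual features and once on textual features to obtain the two graphs that are clustered afterwards.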
Graph Construction and Image Clustering
• To cluster images on the image graph, we apply Affinity Propagation (AP) [5]
• AP's advantages:
  – Automatically determines the number of clusters
  – Automatically detects the canonical image within each cluster
Graph Construction and Image Clustering
• The Affinity Propagation algorithm is applied to both the textual and the visual relations
Semantic Visual Features Propagation
• Propagation is conducted on each extended visual cluster (Fig. b)
• Even an image that is alone in its visual cluster (Fig. b, point H) can obtain AVWs from the extended visual cluster
• The VW histograms X are given; the propagation matrix P is unknown (X_i is the VW combination of image i)
Semantic Visual Features Propagation
• Propagation is formulated as the minimization of a two-term objective over P
• First term: avoids propagating too many VWs
• Second term: keeps P similar to the original propagation matrix
• Both terms are measured with the Frobenius (Euclidean) norm
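The equation itself is not preserved in this transcript. A form consistent with the two term descriptions and with the symbols used on the later slides (X, P, P0, α2) would be the following; the weight name α1 and the omission of any normalization constants are assumptions:

```latex
\min_{P}\; f(P), \qquad
f(P) = \alpha_1 \lVert P X \rVert_F^2 + \alpha_2 \lVert P - P_0 \rVert_F^2
```

Here PX holds the VWs each image receives after propagation, so the first term penalizes propagating too much, while the second term keeps P close to the original propagation matrix P0.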
Common Visual Words Selection
Common Visual Words Selection
• Let X be the VW combinations and S the (unknown) selection matrix
• Selection is formulated as the minimization of a two-term objective over S
• First term: avoids too much distortion from the original features
• Second term: reduces the number of selected features
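The selection equation is likewise missing from this transcript. One form that matches the two term descriptions is sketched below. The symbol S0 (an initial selection matrix that keeps all features) and the weights β1 and β2 are assumptions introduced for illustration, and normalization constants are omitted:

```latex
\min_{S}\; f(S), \qquad
f(S) = \beta_1 \lVert X S_0 - X S \rVert_F^2 + \beta_2 \lVert S \rVert_F^2
```

The first term measures how far the selected features XS drift from the original features X S0, and the second term shrinks S so that fewer features are retained.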
Finding Solutions
• Stack the columns of P into a vector:
  – p = vec(P)
  – p0 = vec(P0)
• Replace vec(PX) with (X^T ⊗ I_M) p
• ⊗ is the Kronecker product
• The propagation function thereby becomes a function of the vector p
Kronecker product
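The identity vec(PX) = (X^T ⊗ I_M) vec(P), with P of size M × N, can be checked numerically on a toy example. The sketch below uses plain Python lists; the helper names and the toy matrices are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def vec(A):
    """Stack the columns of A into one flat list (column-major order)."""
    return [A[i][j] for j in range(len(A[0])) for i in range(len(A))]

def kron(A, B):
    """Kronecker product of A (m x n) and B (p x q), an (mp x nq) matrix."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

P = [[1, 2, 3], [4, 5, 6]]      # M x N with M = 2, N = 3
X = [[1, 0], [2, 1], [0, 3]]    # N x D with D = 2

lhs = vec(matmul(P, X))                      # vec(PX)
K = kron(transpose(X), identity(len(P)))     # X^T (Kronecker) I_M
p = vec(P)
rhs = [sum(k * v for k, v in zip(row, p)) for row in K]   # K times vec(P)
```

Note that K already has M·D rows and M·N columns for this tiny case, which hints at why forming it explicitly is impractical for large M, N and D.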
Optimization
• The first term of (5) is positive semi-definite
• The second term of (5) is positive definite because α2 > 0
• So the propagation function has a unique optimal solution
• The same holds for the selection function
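Why these two properties give uniqueness can be made explicit. Equation (5) is not preserved in this transcript, so the sketch below assumes that the vectorized propagation function has the form shown in its first line (the weight α1 and the absence of normalization constants are assumptions). Its Hessian is then the sum of a positive semi-definite and a positive definite matrix:

```latex
\begin{aligned}
f(p) &= \alpha_1 \lVert (X^{T} \otimes I_M)\, p \rVert_2^2 + \alpha_2 \lVert p - p_0 \rVert_2^2, \\
H = \nabla^2 f(p) &= 2\alpha_1\, (X X^{T} \otimes I_M) + 2\alpha_2 I, \\
v^{T} H v &= 2\alpha_1 \lVert (X^{T} \otimes I_M)\, v \rVert_2^2 + 2\alpha_2 \lVert v \rVert_2^2 > 0
\quad \text{for all } v \neq 0 \text{ when } \alpha_2 > 0 .
\end{aligned}
```

A positive definite Hessian means f is strictly convex, so it has exactly one minimizer.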
Optimization
• The two formulations are strictly convex quadratic programming problems
• A quadratic programming solver can therefore be used to find the optimal solutions
• Two solvers are used for evaluation:
  – Gradient Descent Solver
  – Analytic Solver
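Gradient Descent Solver
• Updates p by a step against the gradient: p ← p − η ∇f(p)
• η is called the learning rate
• Evaluating this update in the vectorized (Kronecker) form is time consuming
• The function is therefore rearranged back into matrix form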
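Gradient Descent Solver
• Finally, the update is obtained directly in terms of the matrix P
• The initial P is P0
• The same derivation applied to the selection formulation gives the corresponding update for S
• S, however, is initialized to the zero matrix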
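The matrix-form update can be sketched as follows. This is not the authors' implementation: it assumes the propagation objective f(P) = a1·||PX||_F² + a2·||P − P0||_F² (weights a1, a2 and the lack of normalization are assumptions), whose gradient is 2·a1·P·(X X^T) + 2·a2·(P − P0). The toy data, step size, and helper names are hypothetical.

```python
def matmul(A, B):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def sq_frobenius(A):
    return sum(a * a for row in A for a in row)

def objective(P, X, P0, a1, a2):
    """Assumed objective: a1 * ||P X||_F^2 + a2 * ||P - P0||_F^2."""
    diff = [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, P0)]
    return a1 * sq_frobenius(matmul(P, X)) + a2 * sq_frobenius(diff)

def gradient(P, G, P0, a1, a2):
    """Gradient 2*a1*P*G + 2*a2*(P - P0), with G = X X^T precomputed."""
    PG = matmul(P, G)
    return [[2 * a1 * pg + 2 * a2 * (p - q) for pg, p, q in zip(rpg, rp, rq)]
            for rpg, rp, rq in zip(PG, P, P0)]

def gradient_descent(X, P0, a1, a2, eta, steps):
    G = matmul(X, transpose(X))   # N x N, computed once; no Kronecker product
    P = [row[:] for row in P0]    # start from P0, as stated on the slide
    for _ in range(steps):
        g = gradient(P, G, P0, a1, a2)
        P = [[p - eta * d for p, d in zip(rp, rg)] for rp, rg in zip(P, g)]
    return P

X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # toy data: 2 images x 3 VWs
P0 = [[1.0, 0.5], [0.5, 1.0]]            # toy initial propagation weights
P = gradient_descent(X, P0, a1=0.5, a2=0.5, eta=0.1, steps=200)
```

Working with the N × N matrix X X^T keeps every step small compared with the vectorized form. The learning rate must be small enough for the iteration to converge; 0.1 is chosen for this toy problem only.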
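Analytic Solver
• The optimal solution should satisfy the zero-gradient condition ∇f(p) = 0
• From eq. (4), the gradient can be expressed through H, the positive definite Hessian matrix
• Because H is positive definite it is invertible, so p can be solved for directly and then converted back to matrix form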
Analytic Solver
• Similarly, S can be solved in closed form
• Using a matrix inversion identity, the solution for S can be rewritten
• X^T X is 1M × 1M, but X X^T is much smaller, so inverting the latter saves time
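The closed forms are not preserved in this transcript. Under the assumed objectives f(P) = α1‖PX‖_F² + α2‖P − P0‖_F² and f(S) = β1‖X S0 − X S‖_F² + β2‖S‖_F² (the weights and the symbol S0, an initial selection matrix, are assumptions, and normalization constants are omitted), setting the gradients to zero would give:

```latex
\begin{aligned}
\nabla_P f = 2\alpha_1 P X X^{T} + 2\alpha_2 (P - P_0) = 0
  \;&\Rightarrow\; P^{*} = \alpha_2 P_0 \,(\alpha_1 X X^{T} + \alpha_2 I)^{-1}, \\
\nabla_S f = -2\beta_1 X^{T} (X S_0 - X S) + 2\beta_2 S = 0
  \;&\Rightarrow\; S^{*} = \beta_1 (\beta_1 X^{T} X + \beta_2 I)^{-1} X^{T} X S_0 \\
  \;&\phantom{\Rightarrow\;} S^{*} = \beta_1 X^{T} (\beta_1 X X^{T} + \beta_2 I)^{-1} X S_0 .
\end{aligned}
```

The last line uses the identity (β1 X^T X + β2 I)^{-1} X^T = X^T (β1 X X^T + β2 I)^{-1}, which follows from X^T (β1 X X^T + β2 I) = (β1 X^T X + β2 I) X^T. It replaces the inverse of a matrix the size of X^T X with the inverse of one the size of X X^T.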
Experiments
• Flickr550 is used as the main dataset
• 56 query images are selected (1282 ground truths)
• 10000 images are picked from Flickr550 to form a smaller subset called Flickr11k
Experiments
• Mean Average Precision (MAP) over all queries is used to evaluate performance
• The query expansion technique of pseudo-relevance feedback (PRF) is applied
• L1 distance is taken as the baseline for the BoW model
• The baseline MAP is 0.245 with 22M feature points
• MAP after PRF is 0.297
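For reference, MAP can be computed as below. This uses one common definition of average precision (precision averaged over the ranks of the relevant items); the slides do not say which variant was used, and the toy rankings and function names are hypothetical. The last line reproduces the relative gain implied by the two MAP values quoted above.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank   # precision at this relevant item's rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [
    (["a", "b", "c", "d"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"z"}),             # AP = (1/3) / 1
]
map_score = mean_average_precision(runs)

# Relative gain of PRF over the baseline, from the MAP values on the slide.
relative_gain = (0.297 - 0.245) / 0.245
```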
Results and Discussion
The MAP of the AVW results with the best iteration number and with PRF on Flickr11K, with 22M (SIFT) feature points in total. Note that the MAP of the baseline BoW model [14] is 0.245, and 0.297 after PRF (+21.2%).
#F is the total number of features retained; M is short for million. % indicates the relative MAP gain over the BoW baseline.
Results and Discussion
1. Propagation then selection
2. Selection then propagation
Propagation then selection is more accurate, because ordering 2 might lose some common VWs before propagation.
Results and Discussion
• Only one or two iterations are needed to achieve better results
  – Informative and representative VWs have been propagated or selected in the early iteration steps
• The number of features is significantly reduced from 22.2M to 0.3M (1.4%)
• Using α = β = 0.5

Learning time (s)   GDS    AS
Propagation         2720   123
Selection           1468   895
(GDS: Gradient Descent Solver; AS: Analytic Solver)
Search Result by Auxiliary VWs
Results and Discussion
From the figure, α = 0.6 should work well
Conclusions & Future Work
• Conclusions:
  – Showed the problems of the current BoW model and the need for semantic visual words to improve the recall rate
  – Formulated the processes as unsupervised optimization problems
  – Improved accuracy by 111% relative to the BoW model
• Future work:
  – Look for other solvers to maximize accuracy and efficiency
Thank you