Automatic Image Annotation (AIA)
Seminar Report Presented to:
Dr. Shanbehzadeh
Presented by: Farzaneh Rezaei
November 2015
2
What is the goal of computer vision?
Perceive the story behind the picture
See the world!
But what exactly does it mean to see?
Source: Wall-e Movie: Pixar, Walt Disney Pictures
3
Outline
Introduction to Image Annotation
• What?
• Why?
Story Behind AIA
• Components of AIA
• Progress of AIA
• Issues & Conclusions
Going Deeper!
• Feature Extraction
• Learning Methods
• Deep Learning
• Conclusions
Useful Information
• Recent Articles
• Toolbox
• Databases
• Authors
Conclusions
• References
5
What is Automatic Image Annotation?
Automatic image annotation is the task of automatically assigning words to an image that describe the content of the image.
Munirathnam Srikanth, et al. Exploiting ontologies for automatic image annotation
Source: Personalizing Automated Image Annotation Using Cross-Entropy: https://ivi.fnwi.uva.nl/isis/publications/bibtexbrowser.php?key=LiICM2011&bib=all.bib
6
What is Automatic Image Annotation? (Cont.)
Source: MS COCO Captioning Challenge: http://mscoco.org/dataset/#captions-challenge2015
7
3,000 Photos Are Uploaded Every Second to Facebook
Why is image annotation important?
Recently, we have witnessed an exponential growth of user-generated videos and images, due to the boom of social networks such as Facebook and Flickr.
Source: http://petapixel.com/2012/02/01/3000-photos-are-uploaded-every-second-to-facebook/
8
Why is image annotation important? (Cont.)
Source: Barriuso, A., & Torralba, A. (2012). Notes on image annotation
• Applications, e.g. photo organizer apps
• Image classification systems
9
[Bar chart: number of articles per year with “Automatic Image Annotation” in the title, 2000 to 2015; vertical axis from 0 to 70. Reported by: Google Scholar]
11
How do you annotate these images?
12
What are the components of an Automatic Image Annotation system?
13
How to classify images?
16
What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
Pattern Recognition!
17
Slide Credit
18
An Example of classical approaches in AIA
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013
19
Issues of classical approaches
Theoretical Limitations of Shallow Architectures*
Functions that can be compactly represented by a depth k architecture might require an exponential number of computational elements to be represented by a depth k − 1 architecture.
*Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
20
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
• Shallow? Deep?
• Functions?
• Compact?
• Depth?
• Computational Elements?
logic circuit
21
Issues of classical approaches (Cont.)
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
[Figure: two example computation graphs, one of depth 4 and one of depth 3]
22
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
• Linear regression and logistic regression have depth 1, i.e., they have a single level.
• Ordinary multi-layer neural networks, with the most common choice of one hidden layer, have depth two.
• Decision trees can also be seen as having two levels.
• Boosting (Freund & Schapire, 1996) usually adds one level to its base learners: that level computes a vote or linear combination of the outputs of the base learners.
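The depth figures above can be reproduced with a small sketch. It assumes the usual definition of depth as the longest path from an input to the output in the graph of computational elements, with each artificial neuron counted as one element. The graph encoding (a dict from element name to the names feeding it) and all node names are hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def depth(graph, output):
    """Longest path, counted in computational elements, from any input to `output`.

    `graph` maps each element to the list of nodes feeding it.
    Nodes that never appear as keys are treated as raw inputs (depth 0).
    """
    @lru_cache(maxsize=None)
    def longest(node):
        if node not in graph:          # raw input
            return 0
        return 1 + max(longest(parent) for parent in graph[node])

    return longest(output)


# Logistic regression: a single neuron reading the inputs.
logistic_regression = {"y": ["x1", "x2"]}

# Neural network with one hidden layer: hidden neurons, then an output neuron.
one_hidden_layer = {"h1": ["x1", "x2"], "h2": ["x1", "x2"], "y": ["h1", "h2"]}

# Boosting over two such networks: one extra level that votes on their outputs.
boosted = {
    "a1": ["x1", "x2"], "a2": ["x1", "x2"], "ya": ["a1", "a2"],
    "b1": ["x1", "x2"], "b2": ["x1", "x2"], "yb": ["b1", "b2"],
    "vote": ["ya", "yb"],
}

print(depth(logistic_regression, "y"))  # 1
print(depth(one_hidden_layer, "y"))     # 2
print(depth(boosted, "vote"))           # 3
```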
23
Issues of classical approaches (Cont.)
Theoretical Limitations of Shallow Architectures
• Shallow? Deep?
• Functions
• Compact
• Depth
• Computational Elements
25
Issues of classical approaches (Cont.)
• A two-layer circuit of logic gates can represent any boolean function (Mendelson, 1997).
• With depth-two logical circuits, most boolean functions require an exponential number of logic gates (with respect to input size) to be represented (Wegener, 1987).
• There are functions computable with a polynomial-size logic gate circuit of depth k that require exponential size when restricted to depth k − 1 (Håstad, 1986). The proof of this theorem relies on earlier results (Yao, 1985) showing that d-bit parity circuits of depth 2 have exponential size.
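The parity case can be made concrete with a short count. This is an illustrative sketch, not taken from the cited papers: it counts the AND terms in the canonical depth-2 circuit (an OR of ANDs, one AND per input pattern with odd parity) and compares that with a chain of two-input XOR gates, which is deeper but needs only d − 1 gates. Function names are hypothetical.

```python
from itertools import product


def parity(bits):
    """Return 1 if an odd number of bits are set, else 0."""
    return sum(bits) % 2


def dnf_terms_for_parity(d):
    """Number of AND terms in the canonical OR-of-ANDs circuit for d-bit parity.

    There is one term per input pattern whose parity is 1. Flipping any single
    bit changes the parity, so no two of these terms can be merged.
    """
    return sum(1 for bits in product((0, 1), repeat=d) if parity(bits) == 1)


def xor_chain_gates(d):
    """Number of two-input XOR gates in a chain computing d-bit parity."""
    return d - 1


for d in (2, 4, 8, 16):
    print(d, dnf_terms_for_parity(d), xor_chain_gates(d))
```

For d input bits the shallow count is 2^(d−1), so it doubles with every added bit, while the chain grows by a single gate.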
26
Issues of classical approaches (Cont.)
• One might wonder whether these computational complexity results for boolean circuits are relevant to machine learning.
• See Orponen (1994) for an early survey of theoretical results in computational complexity relevant to learning algorithms.
• Interestingly, many of the results for boolean circuits can be generalized to architectures whose computational elements are linear threshold units (also known as artificial neurons (McCulloch & Pitts, 1943)), which compute:
$f(x) = \mathbf{1}_{w^{\top} x + b \ge 0}$ (1)
with parameters w and b.
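A minimal sketch of a linear threshold unit, written with NumPy. The weights below are hand-picked for illustration and are not from the slides: they show that two levels of threshold units compute XOR (2-bit parity), which no single threshold unit can do because XOR is not linearly separable.

```python
import numpy as np


def threshold_unit(x, w, b):
    """Linear threshold unit: outputs 1 if w.x + b >= 0, else 0."""
    return int(np.dot(w, x) + b >= 0)


def xor_two_layers(x):
    """XOR built from three threshold units arranged in two levels."""
    h_or = threshold_unit(x, np.array([1.0, 1.0]), -0.5)     # 1 if at least one input is 1
    h_nand = threshold_unit(x, np.array([-1.0, -1.0]), 1.5)  # 1 unless both inputs are 1
    hidden = np.array([h_or, h_nand])
    return threshold_unit(hidden, np.array([1.0, 1.0]), -1.5)  # AND of the two hidden units


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor_two_layers(np.array(x)))
```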
27
Issues of classical approaches (Cont.)
1 Theoretical Limitations of Shallow Architectures
2 Theoretical Advantages of Deep Architectures
Which one ?? !
28
Slide Credit
29
Slide Credit
30
How to assign a word to an image?
What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
Pattern Recognition!
Components of AIA
Classical or Shallow
Structure Issues
31
Source: http://graffiti-artist.net/corporate-offices/ny-facebook-office-graffiti/
33
Going Deeper!
Feature Extraction & Representation
• Color
• Texture
• Shape
• Segmentation
Learning Methods
• ANN
• SVM
• Bayes
• Metadata
34
Feature Extraction
Color
• Color Histogram
• Color Moments
• Color Coherence Vector
• Color Correlogram
• Scalable Color Descriptor
• Color Structure Descriptor
• Dominant Color Descriptor
Texture
• Spatial: Statistical, Structural, Model-based
• Spectral: FT, DCT, Wavelet, ...
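Two of the color descriptors listed above, the color histogram and color moments, are simple enough to sketch with NumPy. This is a minimal illustration under stated assumptions: the image is an H x W x 3 uint8 RGB array, the histogram uses a hypothetical 4 bins per channel, and the third moment is taken as the cube root of the third central moment. The function names are not from any particular toolbox.

```python
import numpy as np


def color_histogram(image, bins=4):
    """Normalised joint RGB histogram with bins**3 entries (no spatial info)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    return hist.ravel() / pixels.shape[0]


def color_moments(image):
    """Per-channel mean, standard deviation and skewness: a 9-value descriptor."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = np.cbrt(((pixels - mean) ** 3).mean(axis=0))
    return np.concatenate([mean, std, skew])


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(color_histogram(image).shape)  # (64,)
print(color_moments(image).shape)    # (9,)
```

The difference in length (64 values against 9) illustrates the trade-off in the comparison tables: the histogram is high-dimensional, the moments are compact.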
35
Color
36
Color
37
Color: Comparisons
Color method | Pros | Cons
Histogram | Simple to compute, intuitive | High dimension, no spatial info, sensitive to noise
CM | Compact, robust | Not enough to describe all colors, no spatial info
CCV | Spatial info | High dimension, high computation cost
Correlogram | Spatial info | Very high computation cost, sensitive to noise, rotation and scale
38
Color: Comparisons (Cont.)
Color method | Pros | Cons
DCD | Compact, robust, perceptual meaning | Need post-processing for spatial info
CSD | Spatial info | Sensitive to noise, rotation and scale
SCD | Compact on need, scalability | No spatial info, less accurate if compact
39
Spatial Texture: Comparisons
Texture method | Pros | Cons
Texton | Intuitive | Sensitive to noise, rotation and scale, difficult to define textons
GLCM-based method | Intuitive, compact, robust | High computation cost, not enough to describe all textures
Tamura | Perceptually meaningful | Too few features
SAR | Compact, robust, rotation invariant | High computation cost, difficult to define pattern size
FD | Compact, perceptually meaningful | High computation cost, sensitive to scale
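As an illustration of the GLCM-based row, the sketch below builds a grey-level co-occurrence matrix for one pixel offset and derives three commonly used statistics from it. It assumes an integer-valued grey image already quantised to `levels` values and a non-negative offset; the function names and the choice of statistics are illustrative, not taken from the slides.

```python
import numpy as np


def glcm(gray, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for the offset (dx, dy), with dx, dy >= 0.

    Entry (i, j) is the fraction of pixel pairs where the first pixel has
    grey level i and the pixel dx columns right / dy rows down has level j.
    """
    h, w = gray.shape
    src = gray[:h - dy, :w - dx]
    dst = gray[dy:, dx:]
    counts = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(counts, (src.ravel(), dst.ravel()), 1)  # accumulate repeated pairs
    return counts / counts.sum()


def glcm_features(p):
    """Contrast, energy and homogeneity of a normalised co-occurrence matrix."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    return contrast, energy, homogeneity


stripes = np.array([[0, 1, 0, 1],
                    [0, 1, 0, 1]])
print(glcm_features(glcm(stripes, levels=2)))
```

The matrix has levels x levels entries per offset, which is one reason such methods are described as costly when many grey levels and offsets are used.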
40
Spectral Texture: Comparisons (Cont.)
Texture method | Pros | Cons
FT/DCT | Fast computation | Sensitive to scale and rotation
Wavelet | Fast computation, multi-resolution | Sensitive to rotation, limited orientations
Gabor | Multi-scale, multi-orientation, robust | Need rotation normalisation, loss of spectral information due to incomplete coverage of the spectrum plane
Curvelet | Multi-resolution, multi-orientation, robust | Need rotation normalisation
41
Shape
Chart Source: [Zhang and Lu 2004]
42
Chart Source: [M. Yang, K. Kpalma, J. Ronsin 2008]
Shape (Cont.)
43
Shape (Cont.)
Contour Based
Calculate shape features only from the boundary of the shape
Region Based
Extract features from the entire region
44
Shape (Cont.)
• Contour-based techniques are more sensitive to noise than region-based techniques.
• Therefore, color image retrieval usually employs region-based shape features.
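The distinction between the two families can be sketched on a binary mask. This is a minimal example under stated assumptions: the shape is given as a 2-D boolean array, region features are the area, centroid and second-order central moments of all shape pixels, and the contour is taken as the shape pixels with at least one 4-connected background neighbour. The function names are hypothetical.

```python
import numpy as np


def region_features(mask):
    """Region-based features: use every pixel inside the shape."""
    ys, xs = np.nonzero(mask)
    area = ys.size
    cy, cx = ys.mean(), xs.mean()
    mu20 = ((xs - cx) ** 2).mean()  # horizontal spread
    mu02 = ((ys - cy) ** 2).mean()  # vertical spread
    return area, (cy, cx), mu20, mu02


def boundary_pixels(mask):
    """Contour-based starting point: shape pixels touching the background."""
    m = np.pad(mask.astype(bool), 1)  # pad with background
    center = m[1:-1, 1:-1]
    interior = center & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    return center & ~interior


square = np.zeros((6, 6), dtype=bool)
square[1:5, 1:5] = True
print(region_features(square)[0])    # area: 16
print(boundary_pixels(square).sum())  # boundary length in pixels: 12
```

A contour descriptor would be computed from the pixels returned by `boundary_pixels` only, while a region descriptor uses all pixels of the mask.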
45
Learning Methods
• SVM
• ANN
• Tree
• Parametric
• Non-Parametric
46
Learning Methods: Comparisons
Annotation method | Pros | Cons
SVM | Small sample, optimal class boundary, non-linear classification | Single labelling, one class per time, expensive trial and run, sensitive to noisy data, prone to over-fitting
ANN | Multiclass outputs, non-linear classification, robust to noisy data, suitable for complex problems | Single labelling, sub-optimal, expensive training, complex and black-box classification
DT | Intuitive, semantic rules, multiclass outputs, fast, allows missing values, handles both categorical and numerical values | Single labelling, sub-optimal, needs pruning, can be unstable
47
Learning Methods: Comparisons
Annotation method | Pros | Cons
Non-parametric | Multi-labelling, model free, fast | Large number of parameters, large sample, sensitive to noisy data
Parametric | Multi-labelling, small sample, good approximation of unknown distribution | Predefined distribution, expensive training, approximated boundary
Metadata | Use of both textual and visual features | Difficult to relate visual features with textual features, difficult textual feature extraction
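To make the "non-parametric, multi-labelling" row concrete, here is a sketch of a plain nearest-neighbour tag-transfer baseline. It is one simple member of that family, chosen for illustration and not the method of any paper cited in these slides. It assumes each training image is a feature vector with a binary tag row; the query receives the tags that occur most often among its k nearest training images. All names, features and tags below are made-up toy data.

```python
import numpy as np


def knn_annotate(query, train_features, train_tags, vocabulary, k=3, n_tags=2):
    """Return the n_tags most frequent tags among the k nearest training images.

    train_features: (N, D) float array, one feature vector per image.
    train_tags:     (N, V) 0/1 array, train_tags[i, j] = 1 if image i has tag j.
    """
    dists = np.linalg.norm(train_features - query, axis=1)  # Euclidean distance
    nearest = np.argsort(dists)[:k]
    votes = train_tags[nearest].sum(axis=0)
    ranked = np.argsort(-votes, kind="stable")[:n_tags]     # ties keep vocabulary order
    return [vocabulary[i] for i in ranked]


vocabulary = ["sky", "sea", "car", "road"]
train_features = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_tags = np.array([[1, 1, 0, 0],
                       [1, 0, 0, 0],
                       [0, 0, 1, 1],
                       [0, 0, 1, 0]])

print(knn_annotate(np.array([0.0, 0.2]), train_features, train_tags, vocabulary, k=2))
print(knn_annotate(np.array([5.0, 5.4]), train_features, train_tags, vocabulary, k=2))
```

There is no training step, which matches "model free", but every training image must be kept and searched at query time, which is why such methods need a large sample and are sensitive to noisy data.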
48
Deep Learning
• Deep belief networks
• Deep Boltzmann machines
• Deep convolutional neural networks
• Deep recurrent neural networks
• Hierarchical temporal memory
Source: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts
49
Deep Learning (Cont.)
Source: Ranzato, 4 October 2013, Slides
50
Deep Learning (Cont.)
• A potential problem with deep learning*?
• Optimization task
• See:
• Bengio’s articles
• Videos about deep learning on YouTube
• Ranzato, 4 October 2013: https://www.youtube.com/watch?v=clgMTk5V2Sk
*: Ranzato, 4 October 2013, Slides
52
Useful Information: Recent Articles
2009, Shallow
Source: Venkatesh N. Murthy, S. Maji, R. Manmatha, Automatic Image Annotation using Deep Learning Representations, 2015
53
Which one ?? !
1 Theoretical Limitations of Shallow Architectures
2 Theoretical Advantages of Deep Architectures
54
Useful Information: Recent Articles (Cont.)
Source: B. Klein, G. Lev, G. Sadeh, and L. Wolf, Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation, 2015
55
Useful Information: Toolbox
MatConvNet
• MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs. Several example CNNs are included to classify and encode images.
Caffe
• Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
56
Useful Information: Databases
Corel5k: An important benchmark for keyword-based image retrieval and image annotation. 5000 images manually annotated with 1 to 5 keywords. The vocabulary contains 260 words.
ESP Game: This data set is obtained from an online game where two players, who cannot communicate outside the game, gain points by agreeing on words describing the image.
IAPR TC12: This set of 20,000 images accompanied by descriptions in several languages was initially published for cross-lingual retrieval.
57
Useful Information: Databases (Cont.)
• Other databases:
• Flickr8,10,30
Table Source: M. Guillaumin, T. Mensink, J. Verbeek and C. Schmid, TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Auto-Annotation
58
Useful Information: Authors
Cordelia Schmid
• Research director, INRIA
• Computer vision, object recognition, video recognition, learning
Li Fei-Fei
• Professor, Stanford University
• Artificial intelligence, machine learning, computer vision, neuroscience
Yoshua Bengio
• Professor, U. Montreal, Computer Sc.
• Machine learning, deep learning, artificial intelligence
Reported by: Google Scholar
59
Useful Information: Authors (Cont.)
Richard Socher
• MetaMind
• Deep learning, machine learning, natural language processing, computer vision
• Recursive Deep Learning for Natural Language Processing and Computer Vision, PhD Thesis, Computer Science Department, Stanford University
• 2014 Arthur L. Samuel Best Computer Science PhD Thesis Award
Reported by: Google Scholar
61
How to assign a word to an image?
What are the components of an Automatic Image Annotation system?
• Feature Extraction
• Classification Methods
Pattern Recognition!
Components of AIA
Classical or Shallow
Structure Issues
Conclusions!
62
Conclusions (Cont.)
1. High-dimensional feature analysis.
2. How to build an effective annotation model?
3. The third issue is that currently annotation and ranking are done online simultaneously in the multiple-labelling annotation approaches. This is not efficient for image retrieval.
4. Lack of a standard vocabulary and taxonomy.
5. There is no commonly acceptable image database.
6. Insufficient depth of architectures, and locality of estimators [Bengio, 2009].
Picture Source: Bengio, Y. (2009). Learning Deep Architectures for AI. Foundations and Trends® in Machine Learning
Source: Zhang, D., Islam, M. M., & Lu, G. (2012). A review on automatic image annotation techniques. Pattern Recognition, 45(1), 346–362. doi:10.1016/j.patcog.2011.05.013
63
References