Machine learning and multimedia information retrieval
Machine Learning and Multimedia Information Retrieval*
Integrated Knowledge Solutions
* Based on a talk at ICMLA Conference
Outline
• Introduction
• Bridging the Semantic Gap
• Events in Videos
• Use of Tagging in MIR
• Killer Apps of MIR
• Take Home Message
12/12/2010 2 ICMLA Talk
Too Much Information
Which is more frustrating?
Being stuck in traffic on the way to or from work
Not being able to find information you urgently need
According to a survey by Xerox
Nalanda University was one of the first universities in the world. Its site is reported to have been visited by the Buddha during his lifetime; the university itself was founded in the 5th century AD. At its peak, in the 7th century AD, Nalanda held some 10,000 students when it was visited by the Chinese scholar Xuanzang.
Not a New Problem
The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 30 B.C.
However, Earlier
Data Producers
Data Consumers
But Nowadays
Some Relevant Numbers
Photobucket has 6.2 billion photos and Flickr has over 2 billion.
Facebook has over 10 billion photos and over 400 million active users.
Phenomenon
• 24 hours of video are uploaded to YouTube every minute
• YouTube streams 2 billion videos every day
So how do we get help in finding the desired multimedia information?
MIR
So What is MIR?
• Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval)
• Deals with systems that manage and facilitate content-based searching of multimedia documents such as images, videos, audio clips, and slides
History of MIR
• Conference on Database Techniques for Pictorial Applications, 1979 (Florence, Italy)
• NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA)
• QBIC (Query By Image Content), 1993 (SPIE’s Conf on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference
• Shift to semantic similarity from signal similarity, 1999
• Community tagging, photo and video sharing sites, 2002
A Typical MIR System
[Block diagram with the components: Media Collection, Feature Extraction, Features, Indexing & Matching, Query, Feature Extraction (for the query), Retrieved Results, Relevance Feedback]
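The components named on this slide can be sketched in a few lines. This is only a minimal illustration, not the system from the talk: the color-histogram feature, the Euclidean nearest-neighbor matching, the Rocchio-style feedback rule, and every function name below are assumptions chosen for brevity, and the "images" are synthetic.

```python
import numpy as np

def extract_features(image, bins=4):
    """Joint RGB color histogram, L1-normalized (a simple whole-image feature)."""
    pixels = image.reshape(-1, 3)
    hist, _ = np.histogramdd(pixels, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def build_index(collection):
    """Feature extraction over the media collection; the 'index' is just a matrix."""
    return np.stack([extract_features(img) for img in collection])

def search(index, query_vec, k=3):
    """Matching: indices of the k collection items closest to the query features."""
    dists = np.linalg.norm(index - query_vec, axis=1)
    return np.argsort(dists)[:k]

def relevance_feedback(query_vec, index, relevant, nonrelevant,
                       alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update (an assumed feedback rule): move the query toward
    results marked relevant and away from results marked non-relevant."""
    new_q = alpha * query_vec
    if len(relevant):
        new_q = new_q + beta * index[relevant].mean(axis=0)
    if len(nonrelevant):
        new_q = new_q - gamma * index[nonrelevant].mean(axis=0)
    new_q = np.clip(new_q, 0, None)
    total = new_q.sum()
    return new_q / total if total > 0 else query_vec

# Synthetic stand-ins for images: three reddish and three bluish pictures.
rng = np.random.default_rng(0)

def synthetic_image(base_color):
    noise = rng.integers(-20, 21, size=(16, 16, 3))
    return np.clip(np.array(base_color) + noise, 0, 255)

collection = ([synthetic_image((200, 40, 40)) for _ in range(3)]
              + [synthetic_image((40, 40, 200)) for _ in range(3)])
index = build_index(collection)

query = extract_features(synthetic_image((200, 40, 40)))
top = search(index, query, k=3)

# The user marks items 0 and 1 as relevant and item 3 as non-relevant.
refined_query = relevance_feedback(query, index, relevant=[0, 1], nonrelevant=[3])
top_refined = search(index, refined_query, k=3)
```

A real system would replace the brute-force distance scan with an index structure and would use far richer features than a color histogram.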
Semantic Gap
Early systems produced results wherein the retrieved documents were visually similar (signal level similar) but not necessarily similar in showing the same semantic concept.
Smeulders, Worring, Santini, Gupta, and Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000
http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/
Semantic Gap
Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts.
Visually dissimilar images representing the same concept.
How to Bridge the Semantic Gap?
Exploit context:
• Text surrounding images
• Associated sound track and closed captions in videos
• Query history
Use machine learning to:
• Build image category classifiers to perform semantic filtering of the results
• Build specific detectors for objects to associate concepts with images
• Build object models using low-level features
Exploiting Context: An Example
Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001
Example of Using Surrounding Text
Context via Surrounding Text
Context Via Surrounding Text: One More Example
Better Context with More Text
Improving Context via More Words per Query
Issues Unique to ML for MIR
• Simultaneous presence of multiple concepts
• How to extract/isolate concept-specific features? Segment or do not segment?
• Imbalance between positive and negative examples
• Extremely large number of concepts for a general purpose MIR
Romance, couple, beach, sundown From: s163.photobucket.com
A Template Relating Concepts with Pictures
[Figure with three levels: Concepts, Image Tokens, Images]
Feature Extraction Issues
Whole-image features: easy to use but not very effective.
Region-based features: both regular region structures and segmented regions are popular.
Salient-object features: connected regions corresponding to dominant visual properties of objects in an image.
Scale Invariant Feature Transform (SIFT) Descriptors
SIFT descriptors and their variants are currently the most popular features in use. Each image generates thousands of features (keypoint descriptors), with each feature typically consisting of 128 values.
http://www.vlfeat.org/
D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
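To give a feel for how sets of 128-value descriptors from two images are compared, here is a small sketch of nearest-neighbor matching with the distance-ratio test described in Lowe's paper (threshold 0.8). The descriptors are random stand-ins, not real SIFT output, and the function name is an assumption; a library such as VLFeat would supply the actual keypoints and descriptors.

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] matches desc_b[j].

    A match is kept only if the nearest neighbor is clearly closer than the
    second nearest (distance-ratio test). desc_b needs at least two rows.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc_a))
    keep = d[rows, best] < ratio * d[rows, second]
    return [(int(i), int(best[i])) for i in rows[keep]]

# Random 128-dimensional vectors standing in for SIFT descriptors.
rng = np.random.default_rng(0)
desc_b = rng.random((50, 128))
# Image A "sees" slightly perturbed versions of the first five descriptors of B.
desc_a = desc_b[:5] + rng.normal(scale=0.01, size=(5, 128))

matches = match_descriptors(desc_a, desc_b)
```

With thousands of descriptors per image, the brute-force distance matrix used here would normally be replaced by an approximate nearest-neighbor search.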
Feature Discovery
The basic idea is to discover the features best suited to a given collection.
Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004
Image Category Classifiers (ICC)
• Trained using both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ, etc.)
• Early work was limited to a few tens of categories; however, some current systems can work with thousands of categories/concepts
VQ Based Image Category Classifier
[Figure: a Test Image is compared against the Water, Sky, and Fire codebooks; the output is the Best Codebook Label]
Mustafa & Sethi (2004)
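One way to read this slide is as one codebook per category, with the test image assigned to the category whose codebook encodes it best. The sketch below follows that reading under stated assumptions: codebooks are trained with plain k-means, "best" means lowest average quantization distortion, and the block features are synthetic RGB means. None of these details are confirmed by the slide, and the function names are hypothetical.

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    """Plain k-means; the k centroids form the category's codebook."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            members = vectors[assign == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
    return float((d.min(axis=1) ** 2).mean())

def classify(vectors, codebooks):
    """Label of the codebook that quantizes the image's vectors with least distortion."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))

# Synthetic block features: mean RGB of image blocks, one cluster per category.
rng = np.random.default_rng(1)

def blocks(mean_rgb, n=200):
    return rng.normal(loc=mean_rgb, scale=15.0, size=(n, 3))

training = {
    "water": blocks((30, 90, 120)),
    "sky": blocks((135, 200, 235)),
    "fire": blocks((230, 100, 20)),
}
codebooks = {label: train_codebook(v) for label, v in training.items()}

test_image = blocks((225, 105, 25), n=50)  # orange-ish blocks
label = classify(test_image, codebooks)
```

Adding a new category only requires training one more codebook; the existing ones are untouched.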
Object Detectors
PASCAL Visual Object Classes Challenge
Project
http://labelme.csail.mit.edu/
Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors.
IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available (see IMARS demos).
Image Category Classifiers Examples
Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities.
From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval (Nuno Vasconcelos, IEEE Computer, July 2007)
Image Classification via Probabilistic Modeling
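The idea of representing an image by a vector of posterior concept probabilities can be illustrated with a toy model. The sketch below is not the system from the cited article: it assumes one diagonal Gaussian per concept, a uniform prior over concepts, and a per-vector average of the log-likelihood (a simplification that keeps the posteriors from saturating). All names and data are made up for illustration.

```python
import numpy as np

def fit_concept_models(training):
    """training: concept -> (n, d) feature array. Returns concept -> (mean, variance)."""
    return {c: (x.mean(axis=0), x.var(axis=0) + 1e-6) for c, x in training.items()}

def log_likelihood(x, mean, var):
    """Sum over rows of x of the log density under a diagonal Gaussian."""
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)))

def semantic_vector(x, models):
    """Posterior-style probability for each concept given the image's features x.

    Uniform prior; the log-likelihood is averaged over the feature vectors
    of the image (an assumed normalization).
    """
    concepts = sorted(models)
    scores = np.array([log_likelihood(x, *models[c]) / len(x) for c in concepts])
    scores -= scores.max()          # numerical stability before exponentiating
    post = np.exp(scores)
    return concepts, post / post.sum()

# Synthetic 3-dimensional features for three concepts.
rng = np.random.default_rng(0)
training = {
    "beach": rng.normal((0.8, 0.7, 0.4), 0.1, size=(300, 3)),
    "forest": rng.normal((0.2, 0.6, 0.2), 0.1, size=(300, 3)),
    "sky": rng.normal((0.4, 0.6, 0.9), 0.1, size=(300, 3)),
}
models = fit_concept_models(training)

image_features = rng.normal((0.8, 0.7, 0.4), 0.1, size=(40, 3))
concepts, posterior = semantic_vector(image_features, models)
top_concept = concepts[int(posterior.argmax())]
```

Two images can then be compared in this semantic space, for instance by a distance between their probability vectors, instead of in raw feature space.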
Retrieving Events in Videos
• An event in MIR implies an interesting spatiotemporal instance
• Considerable work in the MIR community on events because of the popularity of sports videos
• Also tremendous interest in detecting and recognizing events with potential homeland security applications
Event Retrieval Examples: Supervised Approach
Mustafa & Sethi AVSS Conference 2005
Unsupervised Learning for Event Retrieval
Mustafa & Sethi, ICTAI 2007
Unsupervised Learning Based Event Retrieval
Mustafa & Sethi, ICTAI 2007
Retrieval By Cross-Modal Associations
Approaches:
• Latent semantic indexing (LSI)
• Cross-modal factor analysis (CFA)
• Canonical correlation analysis (CCA)
- Uses a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video)
- Works directly on low-level features
Li, Dimitrova, Li and Sethi (ACM MM 03)
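Of the three approaches, CCA is the easiest to sketch compactly. The example below is an illustration under assumptions, not the method of the cited paper: the "audio" and "video" features are synthetic linear views of a shared latent signal, the regularization constant is arbitrary, and all names are hypothetical. It learns a pair of projections from training pairs and then uses an audio query to retrieve the nearest video item in the shared space.

```python
import numpy as np

def inv_sqrt(m):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(m)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

def fit_cca(x, y, dims=2, reg=1e-3):
    """Regularized CCA via SVD of the whitened cross-covariance.

    Returns ((mean_x, proj_x), (mean_y, proj_y), canonical correlations).
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = len(x)
    cxx = xc.T @ xc / n + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / n + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / n
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return (mx, wx @ u[:, :dims]), (my, wy @ vt.T[:, :dims]), s[:dims]

# Synthetic paired data: both modalities are noisy linear views of a 3-D latent.
rng = np.random.default_rng(0)
w_audio = rng.normal(size=(3, 6))
w_video = rng.normal(size=(3, 8))

def make_pairs(n):
    z = rng.normal(size=(n, 3))
    audio = z @ w_audio + 0.01 * rng.normal(size=(n, 6))
    video = z @ w_video + 0.01 * rng.normal(size=(n, 8))
    return audio, video

train_audio, train_video = make_pairs(200)
(mean_a, proj_a), (mean_v, proj_v), corr = fit_cca(train_audio, train_video, dims=3)

# Cross-modal retrieval: each audio query should retrieve its paired video item.
query_audio, collection_video = make_pairs(40)
qa = (query_audio - mean_a) @ proj_a
cv = (collection_video - mean_v) @ proj_v
dist = np.linalg.norm(qa[:, None] - cv[None], axis=2)
retrieved = dist.argmin(axis=1)
accuracy = float((retrieved == np.arange(40)).mean())
```

Because the projections are learned directly from low-level feature pairs, no intermediate semantic labels are needed.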
Talking Face Example
[Block diagram with the components: Query, Feature Extraction, Collection of Image Sequences, Feature Extraction, Cross-Modal Association, Retrieval Results]
M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME, 2003
Tagging in MIR
All time most popular tags at Flickr
About Tags
• User centered
• Imprecise and often overly personalized
• Tag distribution follows power law
• Most users use very few distinct tags, while a small group of users works with an extremely large set of tags
How are Tags Being Used in MIR?
Relating tags in different languages through visual features
Aurnhammer, Hanappe and Steels Proc. WWW2006
Tag Suggester
Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)
Collaborative Tags
• Also known as folksonomy, social tagging, and social classification
• Great for content characterization
• The tag size represents the number of times the tag has been applied to the same item by different users. It roughly indicates the level of agreement or confidence in a tag.
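The tag-size rule above amounts to counting distinct users per (item, tag) pair. A small sketch, with made-up data and a hypothetical linear mapping from counts to font sizes:

```python
from collections import Counter

def tag_weights(assignments):
    """assignments: iterable of (user, item, tag) triples.

    Returns item -> Counter mapping each tag to the number of distinct users
    who applied it to that item. Repeats by the same user count once.
    """
    weights = {}
    for user, item, tag in set(assignments):
        weights.setdefault(item, Counter())[tag] += 1
    return weights

def font_size(count, max_count, smallest=10, largest=32):
    """Linear mapping from tag count to display size (an assumed choice)."""
    return smallest + (largest - smallest) * count / max_count

assignments = [
    ("ann", "photo1", "beach"),
    ("bob", "photo1", "beach"),
    ("cho", "photo1", "beach"),
    ("bob", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),   # same user, same tag: counted once
    ("ann", "photo2", "dog"),
]

weights = tag_weights(assignments)
counts = weights["photo1"]
sizes = {tag: font_size(n, max(counts.values())) for tag, n in counts.items()}
```

Here "beach" is displayed larger than "sunset" for photo1 because three different users agree on it.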
Decision Tree Based Tagger
• Uses social tags in binary/weighted mode
• Generates/suggests multiple tags through a single decision tree classifier
First, the label vectors associated with training vectors are clustered into two initial groups
Next, the SVM is used on training vectors to yield the split that best matches the clustering result
An impurity-based measure is used to iteratively adjust the split, if needed
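The construction of a single tree node can be sketched as follows. This is a rough illustration of the three steps, not the authors' implementation: the 2-means clustering of label vectors, the linear SVM trained by subgradient descent on the hinge loss, and the Gini-style multilabel impurity are all assumed stand-ins, the iterative adjustment of the split is omitted, and the data are synthetic.

```python
import numpy as np

def two_means(labels, iters=10):
    """Step 1: cluster the label (tag) vectors into two groups (0/1 assignment)."""
    c0 = labels[0].astype(float)
    c1 = labels[np.linalg.norm(labels - c0, axis=1).argmax()].astype(float)
    assign = np.zeros(len(labels), dtype=int)
    for _ in range(iters):
        d0 = np.linalg.norm(labels - c0, axis=1)
        d1 = np.linalg.norm(labels - c1, axis=1)
        assign = (d1 < d0).astype(int)
        if assign.min() == assign.max():
            break  # everything fell into one group; nothing to split
        c0 = labels[assign == 0].mean(axis=0)
        c1 = labels[assign == 1].mean(axis=0)
    return assign

def train_linear_svm(x, y, lam=0.01, epochs=200, lr=0.1):
    """Step 2: linear SVM (L2-regularized hinge loss, subgradient descent); y in {-1, +1}."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (x @ w + b) < 1           # points violating the margin
        grad_w = lam * w - (y[active][:, None] * x[active]).sum(axis=0) / len(x)
        grad_b = -y[active].sum() / len(x)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def split_impurity(labels, side):
    """Step 3 (measure only): size-weighted Gini impurity averaged over tags."""
    total = 0.0
    for s in (0, 1):
        part = labels[side == s]
        if len(part):
            p = part.mean(axis=0)              # per-tag frequency on this side
            total += len(part) / len(labels) * float((2 * p * (1 - p)).mean())
    return total

# Synthetic data: two feature clusters, each with its own pair of tags.
rng = np.random.default_rng(0)
n = 40
x = np.vstack([rng.normal((-2, 0), 0.5, size=(n, 2)),
               rng.normal((2, 0), 0.5, size=(n, 2))])
labels = np.vstack([np.tile([1, 1, 0, 0], (n, 1)),
                    np.tile([0, 0, 1, 1], (n, 1))])

groups = two_means(labels)
w, b = train_linear_svm(x, 2 * groups - 1)
side = (x @ w + b > 0).astype(int)

agreement = float((side == groups).mean())
impurity_before = split_impurity(labels, np.zeros(len(labels), dtype=int))
impurity_after = split_impurity(labels, side)
```

Each leaf of the resulting tree can then suggest every tag that is frequent among its training items, which is how a single tree yields multiple tags.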
Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
Current Status of MIR
• Extensive interest as evident from conferences, journals, and special issues
• Most in the MM community are happy with the progress
• Gap between published results and results from publicly available systems on the web (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)
• Lack of application focus
• Plenty of scope for machine learning to help improve MIR system performance
• Killer applications are beginning to emerge
MIR Application Examples
Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)
Biological and Medical Data Retrieval
http://www.cs.washington.edu/research/VACE/Multimedia/
Killer Apps?
http://www.iqengines.com/applications.php
http://www.iqengines.com/applications.php
http://www.thingd.com
Bloomberg Businessweek, Nov 29, 2010
Take Home Message
• MIR is emerging in the commercial domain. A lot more activity is expected in the near future
• The MIR community is obsessed with a general-purpose retrieval engine, a folly the computer vision community pursued for a long time
• ML is playing a vital role in MIR
• Approaches combining social search and visual search techniques are expected to gain prominence
Acknowledgement
• This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional and I apologize for that.
• I also want to thank my present and past students and collaborators.
Questions?