Machine learning and multimedia information retrieval

Machine Learning and Multimedia Information Retrieval*

Integrated Knowledge Solutions


* Based on a talk at ICMLA Conference

Outline

• Introduction

• Bridging the Semantic Gap

• Events in Videos

• Use of Tagging in MIR

• Killer Apps of MIR

• Take Home Message

12/12/2010 2 ICMLA Talk

Too Much Information

Which is more frustrating?

Being stuck in traffic on way to or from work

Not being able to find information you urgently need


According to a survey by Xerox

Not a New Problem

Nalanda University, one of the first universities in the world, was founded in the 5th century AD on a site reported to have been visited by the Buddha during his lifetime. At its peak, in the 7th century AD, when it was visited by the Chinese scholar Xuanzang, Nalanda held some 10,000 students.

The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 30 B.C.


However, Earlier

Data Producers

Data Consumers


But Nowadays

Some Relevant Numbers

Photobucket has 6.2 billion photos and Flickr has over 2 billion.

Facebook has over 10 billion photos and over 400 million active users.

Phenomenon

• 24 hours of video are uploaded to YouTube every minute

• YouTube streams 2 billion videos every day

So how do we get help in finding the desired multimedia information?

MIR

So What is MIR?

• Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval)

• Deals with systems that manage multimedia documents (images, videos, audio clips, slides, etc.) and facilitate searching them based on their content

History of MIR

• Conference on Database Techniques for Pictorial Applications, 1979 (Florence, Italy)

• NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA)

• QBIC (Query By Image Content), 1993 (SPIE’s Conf. on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference

• Shift to semantic similarity from signal similarity, 1999

• Community tagging, photo and video sharing sites, 2002


A Typical MIR System

Diagram blocks: Media Collection, Feature Extraction, Features, Indexing & Matching, Query, Feature Extraction, Retrieved Results, Relevance Feedback
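The blocks of a typical MIR system can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the design of any particular system: the gray-level histogram feature, the L1 distance, the Rocchio-style feedback rule, and all function names are choices made here for brevity, and the "images" are synthetic arrays.

```python
import numpy as np

def extract_features(image, bins=8):
    """Whole-image gray-level histogram, normalized to sum to 1 (assumed feature)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / hist.sum()

def build_index(collection):
    """Feature extraction over the media collection: one feature row per item."""
    return np.stack([extract_features(img) for img in collection])

def match(index, query_vec, k=3):
    """Indices of the k items whose features are closest to the query (L1 distance)."""
    dists = np.abs(index - query_vec).sum(axis=1)
    return np.argsort(dists)[:k]

def relevance_feedback(query_vec, index, relevant, non_relevant,
                       alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward items the user marked relevant."""
    new_q = alpha * query_vec
    if len(relevant):
        new_q = new_q + beta * index[relevant].mean(axis=0)
    if len(non_relevant):
        new_q = new_q - gamma * index[non_relevant].mean(axis=0)
    return np.clip(new_q, 0, None)

# Synthetic media collection: three dark and three bright 16x16 images.
rng = np.random.default_rng(0)
dark = [rng.integers(0, 100, size=(16, 16)) for _ in range(3)]
bright = [rng.integers(150, 256, size=(16, 16)) for _ in range(3)]
index = build_index(dark + bright)

# Query by example with a bright image, then refine using user feedback.
query = extract_features(rng.integers(150, 256, size=(16, 16)))
results = match(index, query, k=3)
refined = relevance_feedback(query, index, relevant=results[:2], non_relevant=[])
```

Here the query is matched purely at the signal level, which is exactly the behavior that leads to the semantic gap discussed next.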

Semantic Gap

Early systems retrieved documents that were visually similar to the query (similar at the signal level) but that did not necessarily show the same semantic concept.

Smeulders, Worring, Santini, Gupta, and Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000

http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/

http://www.computer.org/tpami/


Semantic Gap

Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts.

Visually dissimilar images representing the same concept.


How to Bridge the Semantic Gap?

Exploit context

• Text surrounding images

• Associated sound track and closed captions in videos

• Query history

Use machine learning to:

• Build image category classifiers to perform semantic filtering of the results

• Build specific detectors for objects to associate concepts with images

• Build object models using low-level features
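As a rough illustration of exploiting context, images can be indexed by the text that surrounds them so that a word query retrieves them without looking at pixels. The sketch below is an assumption-laden toy: the file names, the captions, the TF-IDF weighting, and the function names are all hypothetical and are not taken from the systems cited in this talk.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical image identifiers mapped to the text found around each image.
surrounding_text = {
    "img_001.jpg": "Sunset over the beach with a couple walking along the shore",
    "img_002.jpg": "Firefighters battle a large fire in a downtown warehouse",
    "img_003.jpg": "Clear blue sky above the beach on a summer morning",
}

def build_text_index(docs):
    """Term frequencies per image plus a smoothed inverse document frequency."""
    tf = {name: Counter(tokenize(text)) for name, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)
    n = len(docs)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return tf, idf

def search(query, tf, idf):
    """Rank images by the summed TF-IDF weight of the query words."""
    scores = {}
    for name, counts in tf.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in tokenize(query))
        if score > 0:
            scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

tf, idf = build_text_index(surrounding_text)
hits = search("beach sunset", tf, idf)
```

Longer surrounding text and longer queries both give the ranking more words to work with, which is the intuition behind the "more text" and "more words per query" slides that follow.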

Exploiting Context: An Example


Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001

Example of Using Surrounding Text


Context via Surrounding Text


Context Via Surrounding Text: One More Example


Better Context with More Text


Improving Context via More Words per Query


Issues Unique to ML for MIR

• Simultaneous presence of multiple concepts

• How to extract/isolate concept-specific features? Segment or do not segment?

• Imbalance between positive and negative examples

• Extremely large number of concepts for a general purpose MIR

Example image concepts: romance, couple, beach, sundown (From: s163.photobucket.com)

A Template Relating Concepts with Pictures

Diagram levels: Concepts, Image Tokens, Images

Feature Extraction Issues

Whole-image features: easy to use but not very effective.

Region-based features: both regular region structures and segmented regions are popular.

Salient-object features: connected regions corresponding to the dominant visual properties of objects in an image.

Scale Invariant Feature Transform (SIFT) Descriptors

SIFT descriptors and their variants are currently the most popular features in use. Each image generates thousands of features (keypoint descriptors), with each feature typically consisting of 128 values.

http://www.vlfeat.org/


D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
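The 128 values come from a 4x4 grid of cells around the keypoint, each holding an 8-bin histogram of gradient orientations (4 x 4 x 8 = 128). The sketch below builds such a vector for a single 16x16 patch. It is a deliberately simplified stand-in, not Lowe's full algorithm: keypoint detection, scale and rotation normalization, Gaussian weighting, and bin interpolation are all omitted, and the function name is hypothetical.

```python
import numpy as np

def sift_like_descriptor(patch):
    """Simplified 128-value descriptor for a 16x16 gray-level patch."""
    patch = np.asarray(patch, dtype=float)
    assert patch.shape == (16, 16)

    # Per-pixel gradient magnitude and orientation.
    gy, gx = np.gradient(patch)            # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)
    bin_index = np.minimum((orientation / (2 * np.pi) * 8).astype(int), 7)

    # 4x4 grid of cells, each with an 8-bin orientation histogram.
    descriptor = np.zeros((4, 4, 8))
    for row in range(16):
        for col in range(16):
            descriptor[row // 4, col // 4, bin_index[row, col]] += magnitude[row, col]

    # Normalize, clip large entries at 0.2, and renormalize, as in Lowe (2004).
    vec = descriptor.ravel()
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = vec / norm
    vec = np.minimum(vec, 0.2)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

rng = np.random.default_rng(0)
example = sift_like_descriptor(rng.integers(0, 256, size=(16, 16)))
```

For real use, a tested implementation such as the one at the VLFeat link above is the better choice; this sketch only shows where the 128 numbers come from.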

Feature Discovery

The basic idea is to discover the features that are best suited to a given collection.

Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004

Image Category Classifiers (ICC)

• Trained using both supervised and unsupervised learning methods (SVM, DT, AdaBoost, VQ, etc.)

• Early work was limited to a few tens of categories; however, some current systems can work with thousands of categories/concepts

VQ Based Image Category Classifier

Diagram: Test Image → Water Codebook, Sky Codebook, Fire Codebook → Best Codebook Label

Mustafa & Sethi (2004)
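The diagram can be read as follows: one codebook is trained per category, and a test image receives the label of the codebook that represents it best. The sketch below assumes that "best" means lowest average quantization distortion and uses plain k-means on synthetic RGB block features. These choices, the feature values, and the function names are assumptions for illustration and are not taken from Mustafa & Sethi.

```python
import numpy as np

def train_codebook(vectors, k=4, iters=20, seed=0):
    """Plain k-means; returns k code vectors for one category."""
    rng = np.random.default_rng(seed)
    codebook = vectors[rng.choice(len(vectors), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        for j in range(k):
            members = vectors[nearest == j]
            if len(members):
                codebook[j] = members.mean(axis=0)
    return codebook

def distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest code vector."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return float((dists.min(axis=1) ** 2).mean())

def classify(vectors, codebooks):
    """Label the test image with the category whose codebook fits it best."""
    return min(codebooks, key=lambda label: distortion(vectors, codebooks[label]))

rng = np.random.default_rng(1)
# Hypothetical block color features (RGB in [0, 1]) for three categories.
centers = {"water": [0.1, 0.4, 0.6], "sky": [0.5, 0.7, 0.95], "fire": [0.9, 0.4, 0.1]}
training = {label: np.array(c) + 0.05 * rng.standard_normal((200, 3))
            for label, c in centers.items()}
codebooks = {label: train_codebook(vecs) for label, vecs in training.items()}

# Blocks from a test image that resembles the "fire" training data.
test_image_blocks = np.array(centers["fire"]) + 0.05 * rng.standard_normal((50, 3))
label = classify(test_image_blocks, codebooks)
```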


Object Detectors


PASCAL Visual Object Classes Challenge

LabelMe Project

http://labelme.csail.mit.edu/

Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors.


Image Category Classifiers Examples

IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events. It is easy to add new ones. IMARS can work on a PC or laptop (a trial version is available at IBM alphaWorks). IMARS can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available.

http://www.alphaworks.ibm.com/tech/imars

http://mp7.watson.ibm.com/

Image Classification via Probabilistic Modeling

Semantic labeling. (a) An MPE semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities.

Nuno Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval,” IEEE Computer, July 2007

Retrieving Events in Videos

• In MIR, an event is an interesting spatiotemporal instance

• Considerable work on events in the MIR community, driven by the popularity of sports videos

• Also tremendous interest in detecting and recognizing events for potential homeland security applications

Event Retrieval Examples: Supervised Approach

Mustafa & Sethi, AVSS Conference 2005

Unsupervised Learning for Event Retrieval

Mustafa & Sethi, ICTAI 2007


Unsupervised Learning Based Event Retrieval


Mustafa & Sethi, ICTAI 2007

Retrieval By Cross-Modal Associations

Approaches:

• Latent semantic indexing (LSI)

• Cross-modal factor analysis (CFA)

• Canonical correlation analysis (CCA)

- Using a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video)

- Directly on low-level features

Li, Dimitrova, Li and Sethi (ACM MM 03)
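Of the three approaches, CCA is perhaps the easiest to sketch: it learns one linear projection per modality so that paired audio and video features become maximally correlated, after which a query from one modality can be compared with items from the other in the shared space. The code below is a minimal NumPy sketch on synthetic data. The feature dimensions, the small regularization term, and the function names are assumptions, and it does not reproduce the experiments of the cited paper.

```python
import numpy as np

def inv_sqrt(mat):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(X, Y, dims=2, reg=1e-3):
    """Return per-modality means and projection matrices (mx, my, Wx, Wy)."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])   # regularized covariances
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Sx, Sy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, _, Vt = np.linalg.svd(Sx @ Cxy @ Sy)
    return mx, my, Sx @ U[:, :dims], Sy @ Vt.T[:, :dims]

def retrieve(query_x, Y, model, k=1):
    """Project an audio query and all video items into the shared space; rank by distance."""
    mx, my, Wx, Wy = model
    q = (query_x - mx) @ Wx
    items = (Y - my) @ Wy
    return np.argsort(np.linalg.norm(items - q, axis=1))[:k]

# Synthetic paired data: a shared 2-D cause observed through two "modalities".
rng = np.random.default_rng(0)
latent = rng.standard_normal((50, 2))
audio = latent @ rng.standard_normal((2, 5)) + 0.01 * rng.standard_normal((50, 5))
video = latent @ rng.standard_normal((2, 6)) + 0.01 * rng.standard_normal((50, 6))

model = fit_cca(audio, video)
hits = [retrieve(audio[i], video, model)[0] == i for i in range(50)]
accuracy = float(np.mean(hits))   # fraction of audio queries that find their own video
```

Note that the projections operate directly on the low-level feature vectors, with no intermediate semantic labels.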

Talking Face Example

Diagram blocks: Query, Feature Extraction, Collection of Image Sequences, Feature Extraction, Cross-Modal Association, Retrieval Results

M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME, 2003

Tagging in MIR

All time most popular tags at Flickr


About Tags

• User centered

• Imprecise and often overly personalized

• Tag distribution follows a power law

• Most users use very few distinct tags, while a small group of users works with an extremely large set of tags

How are Tags Being Used in MIR?

Relating tags in different languages through visual features

Aurnhammer, Hanappe and Steels, Proc. WWW 2006

Tag Suggester

Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)


Collaborative Tags

• Also known as folksonomy, social tagging, and social classification

• Great for content characterization

• The tag size represents the number of times the tag has been applied to the same item by different users. It roughly represents the level of agreement on, or confidence in, a tag.
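The relation between tag counts and tag size can be sketched as below. The (user, tag) pairs are made up, and the logarithmic scaling and the point-size range are assumptions chosen here because raw tag counts tend to be heavily skewed; actual tag clouds may use a different mapping.

```python
import math
from collections import Counter

def tag_sizes(assignments, min_pt=10, max_pt=32):
    """Count distinct users per tag and map each count to a font size (log scale)."""
    # set() ensures a user who applies the same tag twice is counted once.
    counts = Counter(tag for _user, tag in set(assignments))
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, count in counts.items():
        if hi == lo:
            scale = 1.0
        else:
            scale = (math.log(count) - math.log(lo)) / (math.log(hi) - math.log(lo))
        sizes[tag] = round(min_pt + scale * (max_pt - min_pt))
    return counts, sizes

# Hypothetical (user, tag) assignments for a single photo.
assignments = [
    ("ann", "beach"), ("bob", "beach"), ("cat", "beach"), ("dan", "beach"),
    ("ann", "sunset"), ("bob", "sunset"),
    ("ann", "romance"),
    ("ann", "beach"),   # repeated by the same user, so not counted again
]
counts, sizes = tag_sizes(assignments)
```

In this toy example "beach" is applied by four different users and gets the largest size, reflecting the strongest agreement.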

Decision Tree Based Tagger

• Uses social tags in binary/weighted mode

• Generates/suggests multiple tags through a single decision tree classifier

First, the label vectors associated with the training vectors are clustered into two initial groups.

Next, an SVM is trained on the training vectors to yield the split that best matches the clustering result.

An impurity-based measure is then used to iteratively adjust the split, if needed.

Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
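A single node split of such a tree might look like the sketch below. It is not the authors' implementation: to stay self-contained it substitutes a least-squares linear classifier for the SVM, assumes a Gini-style impurity averaged over tags, and only measures impurity rather than iteratively adjusting the split. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def two_means(labels, iters=20):
    """Cluster binary label vectors into two groups (k-means with k = 2)."""
    # Start from the two label vectors that are farthest apart.
    d = np.linalg.norm(labels[:, None, :] - labels[None, :, :], axis=2)
    i, j = np.unravel_index(d.argmax(), d.shape)
    centers = labels[[i, j]].astype(float)
    for _ in range(iters):
        assign = np.linalg.norm(labels[:, None, :] - centers[None], axis=2).argmin(axis=1)
        for c in (0, 1):
            if np.any(assign == c):
                centers[c] = labels[assign == c].mean(axis=0)
    return assign

def fit_linear_split(features, assign):
    """Least-squares hyperplane separating the two groups (stand-in for the SVM)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, np.where(assign == 1, 1.0, -1.0), rcond=None)
    return w

def apply_split(features, w):
    """Send each training vector to child 0 or child 1."""
    A = np.hstack([features, np.ones((len(features), 1))])
    return (A @ w > 0).astype(int)

def label_impurity(labels, side):
    """Size-weighted mean Gini impurity of each tag within the two children."""
    total = 0.0
    for c in (0, 1):
        child = labels[side == c]
        if len(child):
            p = child.mean(axis=0)
            total += len(child) / len(labels) * float(np.mean(2 * p * (1 - p)))
    return total

# Synthetic data: 2-D feature vectors with 4-tag binary label vectors.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
labels = np.vstack([np.tile([1, 1, 0, 0], (20, 1)), np.tile([0, 0, 1, 1], (20, 1))])
labels[0, 2] = 1   # one item also carries a tag from the other group

assign = two_means(labels)                  # step 1: cluster the label vectors
w = fit_linear_split(features, assign)      # step 2: split in feature space
side = apply_split(features, w)
before = label_impurity(labels, np.zeros(len(labels), dtype=int))
after = label_impurity(labels, side)        # step 3: impurity of the resulting split
```

Because each leaf of the finished tree holds a group of similar label vectors, a single tree can suggest several tags at once.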


Current Status of MIR

• Extensive interest as evident from conferences, journals, and special issues

• Most in the MM community are happy with the progress

• Gap between published results and results from publicly available systems on the web (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)

• Lack of application focus

• Plenty of scope for machine learning to help improve MIR system performance

• Killer applications are beginning to emerge


MIR Application Examples


Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)

Biological and Medical Data Retrieval


http://www.cs.washington.edu/research/VACE/Multimedia/

Killer Apps?


http://www.iqengines.com/applications.php


http://www.thingd.com

Bloomberg Businessweek, Nov 29, 2010

Take Home Message

• MIR is emerging in the commercial domain. A lot more activity is expected in the near future

• The MIR community is obsessed with a general-purpose retrieval engine, a folly long pursued by the computer vision community

• ML is playing a vital role in MIR

• Approaches combining social search and visual search techniques are expected to gain prominence


Acknowledgement

• This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional and I apologize for that.

• I also want to thank my present and past students and collaborators.

Questions?

