Tools and Libraries - KIT · User statistics In the first four months after release 13,630 people...

Content-based image and video analysis

Tools and Libraries

19.07.2010

Hazım Kemal Ekenel

Rainer Stiefelhagen

Lecture overview

Labeled data is needed in almost all discussed approaches

But: Labeling data is tedious and expensive work

We will discuss different approaches to solve this problem

Software libraries for standard tasks can drastically reduce development time

We will discuss software libraries for image processing and machine learning


Labels

Diverse types of data annotations are needed

Face recognition: face bounding box, facial landmarks, ID

Object recognition: bounding box, object class

High-level features: images depicting the concept

Genre classification: videos of a certain genre

…

The only way to get the labels is to have people label the data manually

It is a very boring task

People need to be compensated somehow

Most common way: just pay them!

Can be quite expensive for large datasets


LabelMe

Collaborative labeling

To download the database, a certain number of images have to be annotated

Label quality is probably good, since the annotators have to use the labels themselves

Public (everyone can join)

A large database (460,000 labeled objects of all kinds)

http://labelme.csail.mit.edu/

Credit: Franziska Kraus


Annotation Tool

Draw polygons and name the labeled object


Annotation Tool

Quality of polygons and names is not ensured by supervision → still sufficient

A lot of images have more than 80% of their pixels labeled

Several different object categories in many images
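
The share of labeled pixels in an image can be estimated by rasterizing the annotation polygons. The following is a minimal sketch, not the LabelMe implementation: the function names and the tiny 10x10 example image are hypothetical, and a pixel counts as labeled when its centre lies inside at least one polygon.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def labeled_fraction(width, height, polygons):
    """Fraction of pixel centres covered by at least one polygon."""
    covered = 0
    for row in range(height):
        for col in range(width):
            cx, cy = col + 0.5, row + 0.5
            if any(point_in_polygon(cx, cy, p) for p in polygons):
                covered += 1
    return covered / (width * height)


# Hypothetical 10x10 image with two annotated objects.
annotations = {
    "sky": [(0, 0), (10, 0), (10, 5), (0, 5)],
    "road": [(0, 6), (10, 6), (10, 10), (0, 10)],
}
fraction = labeled_fraction(10, 10, list(annotations.values()))
print(f"{fraction:.0%} of pixels labeled")
```

Overlapping polygons are counted only once, which is why the fraction is computed per pixel rather than by summing polygon areas.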


Annotation Tool

Objects are mostly labeled completely despite partial occlusions

Object-parts hierarchies

Depth-ordering


Browse Database


Query Database


Query Database

Label names are up to the user → no consistency

Use WordNet (dictionary) to group categories
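
Grouping free-text labels into categories can be sketched as follows. This is only an illustration under stated assumptions: the small CANONICAL table is a hand-made, hypothetical stand-in for a real WordNet lookup, and the function names are made up for this example.

```python
from collections import defaultdict

# Hypothetical synonym table standing in for a WordNet lookup.
CANONICAL = {
    "car": "car",
    "automobile": "car",
    "auto": "car",
    "person": "person",
    "pedestrian": "person",
    "man": "person",
}


def normalize(label):
    """Lower-case and trim a free-text label, then map it to its category."""
    key = label.strip().lower()
    # Labels that are not in the table form their own category.
    return CANONICAL.get(key, key)


def group_labels(labels):
    """Group raw user labels by canonical category."""
    groups = defaultdict(list)
    for label in labels:
        groups[normalize(label)].append(label)
    return dict(groups)


raw = ["Car", "automobile", "pedestrian", "person ", "tree"]
groups = group_labels(raw)
print(groups)
```

A query for "car" can then return all polygons whose raw label falls into the "car" group, regardless of how the annotator spelled it.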


Object-parts hierarchies

Polygons with high overlap indicate either an object-part hierarchy (e.g. head and body) or an occlusion

For a given query (e.g. “car”) check for polygons that often have a high overlap (e.g. “wheel”)

Compute score as percentage of images where part (“wheel”) has high overlap with object (“car”)

List of object-part candidates
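
The scoring step can be sketched as follows. This is a minimal sketch, not the LabelMe implementation: polygons are simplified to axis-aligned boxes (x0, y0, x1, y1), "high overlap" is assumed to mean that at least 90% of the part lies inside the object, and the score is computed over the images that contain both labels. All names are hypothetical.

```python
def box_area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)


def intersection_area(a, b):
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x0, y0, x1, y1))


def part_score(images, obj, part, min_overlap=0.9):
    """Fraction of images containing both labels in which some `part`
    region lies almost entirely inside some `obj` region."""
    both = hits = 0
    for annotations in images:
        objs = [box for name, box in annotations if name == obj]
        parts = [box for name, box in annotations if name == part]
        if not objs or not parts:
            continue
        both += 1
        if any(
            box_area(p) > 0
            and intersection_area(p, o) / box_area(p) >= min_overlap
            for p in parts
            for o in objs
        ):
            hits += 1
    return hits / both if both else 0.0


# Hypothetical annotations: one list of (label, box) pairs per image.
images = [
    [("car", (0, 0, 100, 50)), ("wheel", (10, 30, 30, 50))],
    [("car", (0, 0, 100, 50)), ("wheel", (20, 35, 40, 50)),
     ("tree", (120, 0, 150, 80))],
    [("car", (0, 0, 100, 50)), ("wheel", (200, 0, 220, 20))],
    [("tree", (0, 0, 30, 80))],
]
wheel_score = part_score(images, "car", "wheel")
tree_score = part_score(images, "car", "tree")
print(wheel_score, tree_score)
```

Labels with a high score ("wheel") become object-part candidates for the query; labels with a low score ("tree") do not.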


Occlusion / Depth-ordering

Simple heuristics

Some things can never occlude other objects (sky)

An object completely contained in another is on top

May be wrong if the containing object is transparent etc.

The polygon with more control points in the intersecting area is on top

Use color histograms: compare the histogram of the overlapping region to those of the two non-overlapping regions

The more similar region is on top

Combined heuristic achieves 2.9% error
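
The color histogram heuristic can be sketched as follows. This is an illustration under assumptions, not the method's reference implementation: pixels are given as already quantized color names, similarity is measured with histogram intersection (one possible choice), all regions are non-empty, and the function names are hypothetical.

```python
from collections import Counter


def normalized_histogram(colors):
    """Relative frequency of each quantized color in a region."""
    counts = Counter(colors)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return sum(min(h1.get(c, 0.0), h2.get(c, 0.0)) for c in set(h1) | set(h2))


def on_top(overlap_colors, only_a_colors, only_b_colors):
    """Return "A" or "B": the object whose non-overlapping region looks
    more like the overlapping region is assumed to be on top."""
    h_overlap = normalized_histogram(overlap_colors)
    sim_a = histogram_intersection(h_overlap, normalized_histogram(only_a_colors))
    sim_b = histogram_intersection(h_overlap, normalized_histogram(only_b_colors))
    return "A" if sim_a >= sim_b else "B"


# Hypothetical example: a red car (A) partially covering a gray road (B).
overlap = ["red"] * 8 + ["gray"] * 2
car_only = ["red"] * 18 + ["black"] * 2
road_only = ["gray"] * 20
print(on_top(overlap, car_only, road_only))
```

In the example the overlapping region is mostly red, so it resembles the car more than the road and the car is placed on top.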


Semi-automatic labeling

Use available labels to train detectors

Run detector on unlabeled data

Let user verify the generated labels
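
The three steps above can be sketched as a simple loop. This is a toy illustration only: a nearest-centroid classifier on 2-D feature vectors stands in for a real detector, the `verify` callback stands in for the human annotator, and all names and data are hypothetical.

```python
def train_centroids(labeled):
    """Toy 'detector': the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in labeled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def predict(centroids, features):
    """Label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def semi_automatic_labeling(labeled, unlabeled, verify):
    """Propose a label for every unlabeled sample and keep only the
    proposals that the user confirms."""
    centroids = train_centroids(labeled)
    accepted, rejected = [], []
    for features in unlabeled:
        proposal = predict(centroids, features)
        if verify(features, proposal):
            accepted.append((features, proposal))
        else:
            rejected.append(features)
    return accepted, rejected


labeled = [((0.0, 0.0), "sky"), ((0.2, 0.1), "sky"),
           ((1.0, 1.0), "car"), ((0.9, 1.1), "car")]
unlabeled = [(0.1, 0.0), (1.0, 0.9), (0.5, 0.6)]
# Simulated user: knows the true label and confirms only correct proposals.
truth = {(0.1, 0.0): "sky", (1.0, 0.9): "car", (0.5, 0.6): "sky"}
accepted, rejected = semi_automatic_labeling(
    labeled, unlabeled, verify=lambda f, lab: truth[f] == lab)
print(accepted, rejected)
```

Verifying a proposed label is usually much quicker than drawing it from scratch; rejected samples still have to be labeled by hand.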


Summary of LabelMe

Advantages Motivation to label images is given

To download the database a user first has to label a certain number of images

→ Data quality is probably good

Disadvantages Only a few people are interested in the labels

They are the only ones providing the labels

Is it be possible to get people with no interest in the labels to annotate the images (without paying them)?


Human Computation

Humans can solve problems that computers can't solve yet

Simple example: captchas

Humans are much better than computers at labeling and tagging images

How do we get them to do that?

Do people on the Internet have enough time?


Human Computation

“People all over the world spent 9 billion hours playing solitaire in 2003” (Luis von Ahn, 2006)

Construction of the Empire State Building took 7 million human-hours, equivalent to ~6.8 hours of worldwide solitaire playing

Building the Panama Canal took 20 million human-hours, equivalent to less than 1 day of worldwide solitaire playing
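
The comparison follows from simple arithmetic, sketched below. The only inputs are the figures quoted above; the assumption is that the 9 billion hours are spread evenly over a 365-day year.

```python
SOLITAIRE_HOURS_PER_YEAR = 9e9  # human-hours of solitaire in 2003 (quoted above)
HOURS_PER_YEAR = 365 * 24       # wall-clock hours in a year

# Human-hours of solitaire played worldwide during one wall-clock hour.
human_hours_per_hour = SOLITAIRE_HOURS_PER_YEAR / HOURS_PER_YEAR


def solitaire_equivalent_hours(project_human_hours):
    """Wall-clock hours of worldwide solitaire that equal a project's effort."""
    return project_human_hours / human_hours_per_hour


empire_state = solitaire_equivalent_hours(7e6)
panama_canal = solitaire_equivalent_hours(20e6)
print(f"Empire State Building: {empire_state:.1f} h")
print(f"Panama Canal: {panama_canal:.1f} h")
```

Roughly one million human-hours of solitaire are played per hour, so 7 million human-hours correspond to about 6.8 hours and 20 million human-hours to about 19.5 hours.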


Human Computation

Humans have a lot of time

Humans can easily label and tag images

BUT you'll have to pay them to do that for you

Solution: create a game that encourages people to label images

Or to do any other task you want them to do


Games With A Purpose

http://www.gwap.com/


ESP-Game

Game for tagging images from the Web

Users see a picture and have to agree on a tag for what the picture contains

→ A lot of image-tag pairs, but no information about location or size of objects in the image
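
The agreement step can be sketched as follows. This is a simplified illustration, not the game's actual implementation: both players type guesses independently, and the first word that both have entered becomes the tag. The function name and the timestamped input format are assumptions.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first word that both players have entered, or None.

    Each argument is a list of (timestamp_seconds, word) pairs.
    """
    events = sorted(
        [(t, "a", w.strip().lower()) for t, w in guesses_a]
        + [(t, "b", w.strip().lower()) for t, w in guesses_b]
    )
    seen = {"a": set(), "b": set()}
    for _, player, word in events:
        other = "b" if player == "a" else "a"
        if word in seen[other]:
            return word  # the partner typed this word earlier: agreement
        seen[player].add(word)
    return None


# Hypothetical round: both players look at the same picture of a dog.
player_a = [(1.0, "animal"), (2.5, "dog"), (4.0, "grass")]
player_b = [(1.5, "puppy"), (3.0, "grass"), (5.0, "dog")]
tag = first_agreement(player_a, player_b)
print(tag)
```

Because the players cannot communicate, a word they both type is likely to describe the picture, which is what makes the resulting image-tag pair useful.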


ESP-Game


User statistics

In the first four months after release

13,630 people played the game

80% of them played more than once

1,271,451 labels for 293,760 images were generated

33 people played more than 1000 games (>50h)

Extrapolating

5000 people playing 24h/d could label all images in Google image search in one month!

On popular online gaming sites, many more players are online at a time (>100,000 players)


Label quality

Search for labels

For 10 random labels, images with this label were displayed

For all returned images, the label made sense

Very high precision

Manually labeled images

For all images at least 83% of the ESP game tags were also used by the manual annotators

For all images the three most common tags used by the manual annotators were also in the ESP game tags

Manual quality assessment

People would use 85% of the ESP game tags to describe the corresponding image

Only 1.7% of the tags don’t make sense as a description of the image


Peekaboom

Game for locating objects in images

Improves data collected by the ESP Game

Two random players are paired up

Boom (Player 1) reveals parts of the image

Peek (Player 2) guesses the associated word

Role switch on successful guess

“Bots” play when the number of players is odd or when people quit the game


Game Overview


Pings

Boom can “ping” parts of the revealed object to point out particular parts


Word – Image Relation

Boom can give hints about how the word relates to the image


Image Metadata

For each image-word pair, metadata is collected

How the word relates to the image: hints

Necessary pixels to guess the word: area that is revealed

Pixels inside the specified object: pings

Most salient aspects of objects in the image: sequence of Boom's clicks

Elimination of poor image-word pairs: many pairs of players click “pass”


Cheating

People could try to cheat by logging in at the same time and telling each other which words to type

→ Reveal the wrong parts and type the right word anyway

Multiple anti-cheating mechanisms


Anti-Cheating Mechanisms

Player queue: a player has to wait n seconds until he is paired up

IP address checks

Seed images: images with hand-verified metadata in the game

Limited freedom to enter guesses: the guess field could otherwise be used for communication

Only letters, only words in the dictionary

Aggregating data from multiple players
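
The restriction on guesses can be sketched as a simple filter. This is an illustration only: the tiny DICTIONARY set and the function name are hypothetical, and the real game's word list and rules are not given here.

```python
DICTIONARY = {"car", "wheel", "dog", "tree", "sky"}  # hypothetical word list


def is_valid_guess(text, dictionary=DICTIONARY):
    """Accept a guess only if it is a single dictionary word made of letters."""
    word = text.strip().lower()
    # isalpha() rejects digits, punctuation, and spaces between words,
    # so the guess field cannot carry free-form messages.
    return word.isalpha() and word in dictionary


for guess in ["Car", "click top left", "c4r", "xyzzy"]:
    print(guess, is_valid_guess(guess))
```

Only "Car" passes: the phrase contains spaces, "c4r" contains a digit, and "xyzzy" is not in the word list.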


Implementation

Spelling check: incorrect words are displayed in a different color

Inappropriate word replacement: substituted with words like “ILuvPeekaboom”

Top scores list and ranks: top scores of the day and of all time

Users get a rank based on their total number of points


Additional Applications

Improving image-search results

Peekaboom gives an estimate of the fraction of the image that is related to the word

→ use these fractions to order image results

Use ping data for pointing to objects

Object bounding boxes

Image search engine with highlighted results
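
Both applications can be sketched from the set of pixels that had to be revealed for a word. This is a minimal sketch under assumptions: revealed areas are given as sets of (x, y) pixel coordinates, results with a larger related fraction are ranked first, and all names and numbers are hypothetical.

```python
def related_fraction(revealed_pixels, width, height):
    """Fraction of the image that players had to reveal for the word."""
    return len(revealed_pixels) / (width * height)


def bounding_box(revealed_pixels):
    """Smallest axis-aligned box (x0, y0, x1, y1) around the revealed pixels."""
    xs = [x for x, _ in revealed_pixels]
    ys = [y for _, y in revealed_pixels]
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)


def rank_results(fractions):
    """Order image ids so that the largest related fraction comes first."""
    return sorted(fractions, key=fractions.get, reverse=True)


# Hypothetical data: a 4x2 pixel patch revealed in a 10x10 image.
revealed = {(x, y) for x in range(2, 6) for y in range(3, 5)}
fraction = related_fraction(revealed, 10, 10)
box = bounding_box(revealed)
ranking = rank_results({"img_a": fraction, "img_b": 0.35, "img_c": 0.20})
print(fraction, box, ranking)
```

The same revealed area thus yields both a ranking signal for image search and a box that can be used to highlight the object in the result.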


Ping Accuracy

Use ping data for pointing

Arrow lines pointing to the objects

Pings selected at random

100% accuracy shown in experiment


Bounding Box Accuracy

Bounding box created with Peekaboom

Lowest overlap with user-created box was 50%
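
Overlap between two bounding boxes can be computed as sketched below, assuming that overlap is measured as the area of the intersection divided by the area of the union. Boxes are given as (x0, y0, x1, y1); the function name and example boxes are hypothetical.

```python
def overlap(a, b):
    """Intersection area divided by union area of two boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


game_box = (0, 0, 10, 10)   # hypothetical box derived from game data
user_box = (5, 0, 15, 10)   # hypothetical box drawn by a volunteer
print(overlap(game_box, user_box))
```

With this measure, two equally sized boxes shifted by half their width overlap by only one third, so a minimum of 50% already indicates fairly close agreement.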


User Statistics

How enjoyable is the game?

14,000 different people and 1.1 million pieces of data during the first month

User comments:


“One unfortunate side effect of playing so much in such a short time was a mild case of carpal tunnel syndrome in my right hand and forearm, but that dissipated quickly.”

“This game is like crack. I've been Peekaboom-free for 32 hours. Unlike other games, Peekaboom is cooperative.”

“[...] I would say that it gives the same gut feeling as combining gambling with charades while riding on a roller coaster. The good points are that you increase and stimulate your intelligence, you don't lose all your money and you don't fall off the ride. The bad point is that you look at your watch and eight hours have just disappeared!”

Summary of Peekaboom

Peekaboom works because people like to play games

Experiments show that results are sufficiently accurate

In addition to the ESP data, information about location and size of objects in the image is retrieved


Other “Games with a purpose”

Tag a tune

Some piece of music is played to you and your partner

You must describe the music with words

Based on your partner’s description you have to decide whether you two are listening to the same song

Verbosity

Describe a word to the partner using other words

Partner must guess the secret word

Squigl

Trace the outline of an object in the same way as your partner

Very limited time (5-10 seconds)

Matchin

Decide which one of two images you like best

Points when partner agrees

Popvideo

You and your partner are shown a video clip

You have to enter tags describing the video (and audio!)

Points when tags match


Publicly available labeled datasets

Some datasets are available for free in order to advance research

Caltech 101/256, AR, …

Some evaluation campaigns generate a lot of labeled data and provide it

for everybody: PASCAL VOC challenge

for participants: TRECVID, ImageCLEF


Caltech 101/256

Freely available http://vision.caltech.edu/Image_Datasets/Caltech101

http://vision.caltech.edu/Image_Datasets/Caltech256

Pictures of 101/256 object categories

Labels and outlines of the objects




PASCAL

Collection of object recognition databases with ground truth

The VOC challenge uses it

Freely available http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html



Software libraries

Many systems use standard algorithms

Linear algebra

Image processing: image filters, image features, etc.

Machine learning: SVMs, PCA, LDA, etc.

Advantages of using standard libraries

Avoid errors in own implementation

Leverage know-how of other people

The implementation details of simple algorithms can be quite tricky sometimes!

Save a lot of development time

More time to work on your algorithm


OpenCV

Open Computer Vision library

http://sourceforge.net/projects/opencvlibrary/

http://opencv.willowgarage.com/

Features

C-like interface (Version 2.0 is more C++)

Linear algebra

Image processing functions: filters, FFT, DCT, etc.

Image features: SURF, HOG, Haar-like features (mainly in 2.0/SVN)

Detectors: Haar cascades (training & detection)

Machine learning: SVMs, neural nets, decision trees, GMMs, …

Much, much more…



OKAPI

Open Karlsruhe library for processing of images

http://cvhci.anthropomatik.kit.edu/okapi

Features

C++

Cameras (V4L, Firewire, VfW)

Videos (frame-accurate random access)

Image Features (DCT, Gabor, LBP, MCT, …)

Linear Projections (PCA, LDA, RCA)

Detector (using MCT features, very fast)

SVMs (libsvm, liblinear)

3D geometry functions

Simple GUI for prototypes

Many utility functions (timing, XML, in-memory imageIO, …)



SVMs

Libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Simple and standard

Liblinear (linear SVMs) http://www.csie.ntu.edu.tw/~cjlin/liblinear/

Much faster for linear SVMs

SVMlight http://svmlight.joachims.org/

Also very popular

Shogun toolbox http://www.shogun-toolbox.org/

Many kernel functions and Multi-Kernel Learning (MKL)

Uses libsvm or SVMlight in the background





Machine Learning

Weka (Java) http://www.cs.waikato.ac.nz/ml/weka/

Java-ML (Java) http://java-ml.sourceforge.net/

Torch (C/Lua) http://torch5.sourceforge.net/

MLC++ (C++) http://www.sgi.com/tech/mlc/

MLPACK / FASTlib http://mloss.org/software/view/152/

http://fastlib.analytics1305.com/

Spider (Matlab) http://www.kyb.tuebingen.mpg.de/bs/people/spider/

FLANN (Approximate k-Nearest Neighbors) http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN

List of Open Source Machine Learning software http://mloss.org/



References

B.C. Russell, A. Torralba, K. Murphy, W.T. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. International Journal of Computer Vision, vol. 77, issue 1, May 2008

L. von Ahn, L. Dabbish. Labeling images with a computer game. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2004

L. von Ahn, R. Liu, M. Blum. Peekaboom: a game for locating objects in images. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2006


Lecture Overview

Introduction

Visual Descriptors

Image Segmentation

Classification

Shot Boundary Detection & Genre Classification

High-level Feature Detection

High-level Feature Detection II

Semantics

Event Recognition

Person Identification

Copy Detection

Search

Tools and Libraries


Computer Vision & Machine Learning Overview

Indexing & General concept detection

Matching

Special topics in concept detection

Automatic and interactive search

Annotation tools, databases, and software libraries

Intro. to Video proc.
