Tools and Libraries - KIT
Content-based image and video analysis
Tools and Libraries
19.07.2010
Hazım Kemal Ekenel, [email protected]
Rainer Stiefelhagen, [email protected]
Lecture overview
Labeled data is needed in almost all discussed approaches
But: labeling data is tedious and expensive work
We will discuss different approaches to solve this problem
Software libraries for standard tasks can drastically reduce development time
We will discuss software libraries for image processing and machine learning
Content-based Image and Video Retrieval 2
Labels
Diverse types of data annotations are needed
Face recognition: face bounding box, facial landmarks, ID
Object recognition: bounding box, object class
High-level features: images depicting the concept
Genre classification: videos of a certain genre
…
The only way to get the labels is to have people label the data manually
It is a very boring task
People need to be compensated somehow
Most common way: just pay them!
This can be quite expensive for large datasets
LabelMe
Collaborative labeling
To download the database, a certain number of images have to be annotated
Label quality is probably good, since annotators have to use the labels themselves
Public (everyone can join)
A large database (460,000 labeled objects of all kinds)
http://labelme.csail.mit.edu/
Credit: Franziska Kraus
Annotation Tool
Draw polygons and name the labeled object
Annotation Tool
Quality of polygons and names is not ensured by supervision → still sufficient
A lot of images have more than 80% of their pixels labeled
Several different object categories in many images
Annotation Tool
Objects are mostly labeled completely despite partial occlusions
Object-parts hierarchies
Depth-ordering
Browse Database
Query Database
Label names are up to the user → no consistency
Use WordNet (dictionary) to group categories
Object-parts hierarchies
Polygons with high overlap indicate either
an object-part hierarchy (head, body) or
an occlusion
For a given query (e.g. “car”) check for polygons that often have a high overlap (e.g. “wheel”)
Compute score as percentage of images where part (“wheel”) has high overlap with object (“car”)
List of object-part candidates
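The scoring step can be sketched as follows. This is a minimal illustration, not the LabelMe implementation: polygons are simplified to axis-aligned boxes, "high overlap" is assumed to mean that at least 90% of the part's area lies inside the object, and the score is computed over images that contain both labels. The names `overlap_fraction`, `part_score` and the threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Annotation = Dict[str, List[Box]]        # label name -> boxes in one image


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def overlap_fraction(part: Box, obj: Box) -> float:
    """Fraction of the part's area that lies inside the object box."""
    inter = (max(part[0], obj[0]), max(part[1], obj[1]),
             min(part[2], obj[2]), min(part[3], obj[3]))
    a = area(part)
    return area(inter) / a if a > 0 else 0.0


def part_score(images: List[Annotation], obj_label: str, part_label: str,
               threshold: float = 0.9) -> float:
    """Share of images (containing both labels) in which some part
    instance has high overlap with some object instance."""
    both = [img for img in images if obj_label in img and part_label in img]
    if not both:
        return 0.0
    hits = sum(
        any(overlap_fraction(p, o) >= threshold
            for p in img[part_label] for o in img[obj_label])
        for img in both)
    return hits / len(both)


images = [
    {"car": [(0, 0, 10, 5)], "wheel": [(1, 0, 3, 2)]},    # wheel inside car
    {"car": [(0, 0, 10, 5)], "wheel": [(20, 0, 22, 2)]},  # wheel elsewhere
]
print(part_score(images, "car", "wheel"))
```

Sorting all labels by this score for a query such as "car" would give a candidate list of parts.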
Occlusion / Depth-ordering
Simple heuristics
Some things can never occlude other objects (sky)
An object completely contained in another is on top
May be wrong if the containing object is transparent etc.
The polygon with more control points in the intersecting area is on top
Use color histograms: compare the histogram of the overlapping region to the histograms of the two non-overlapping regions
The polygon whose region is more similar is on top
Combined heuristic achieves 2.9% error
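The color histogram heuristic might look like the sketch below. It assumes that pixels are already quantized to discrete color values and that similarity is measured by histogram intersection; the slides do not name the similarity measure, so that choice and all function names are assumptions.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def normalized_histogram(pixels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Relative frequency of each (already quantized) color value."""
    counts = Counter(pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()} if total else {}


def histogram_intersection(h1: Dict[Hashable, float],
                           h2: Dict[Hashable, float]) -> float:
    """Similarity in [0, 1]; 1 means identical distributions."""
    return sum(min(p, h2.get(color, 0.0)) for color, p in h1.items())


def polygon_on_top(overlap_pixels, rest_of_a, rest_of_b) -> str:
    """Return 'A' or 'B': the polygon whose non-overlapping region
    looks more like the overlapping region."""
    h_overlap = normalized_histogram(overlap_pixels)
    sim_a = histogram_intersection(h_overlap, normalized_histogram(rest_of_a))
    sim_b = histogram_intersection(h_overlap, normalized_histogram(rest_of_b))
    return "A" if sim_a >= sim_b else "B"


# The overlapping region is mostly red, like the rest of polygon A.
overlap = ["red"] * 8 + ["gray"] * 2
print(polygon_on_top(overlap, ["red"] * 9 + ["gray"], ["green"] * 10))
```

In a combined heuristic, this test would only be one vote next to the containment and control-point rules.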
Semi-automatic labeling
Use available labels to train detectors
Run detector on unlabeled data
Let user verify the generated labels
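The three steps can be sketched as a loop. The detector below is only a toy stand-in (nearest class mean on a single number per sample), and `user_accepts` represents the human verification step; both are hypothetical and not part of LabelMe.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = float
Labeled = Tuple[Sample, str]


def train_detector(labeled: List[Labeled]) -> Callable[[Sample], str]:
    """Toy stand-in for a detector: nearest class mean on a 1-D feature."""
    by_label: Dict[str, List[float]] = {}
    for feature, label in labeled:
        by_label.setdefault(label, []).append(feature)
    centroids = {label: mean(values) for label, values in by_label.items()}
    return lambda x: min(centroids, key=lambda label: abs(x - centroids[label]))


def semi_automatic_labeling(labeled: List[Labeled],
                            unlabeled: List[Sample],
                            user_accepts: Callable[[Sample, str], bool]):
    """Propose labels with the detector; keep only those the user confirms."""
    detector = train_detector(labeled)          # 1. train on available labels
    accepted, rejected = [], []
    for sample in unlabeled:
        proposal = detector(sample)             # 2. run detector on unlabeled data
        if user_accepts(sample, proposal):      # 3. let the user verify
            accepted.append((sample, proposal))
        else:
            rejected.append(sample)             # left for manual labeling
    return labeled + accepted, rejected


seed = [(0.1, "sky"), (0.2, "sky"), (0.9, "car"), (1.0, "car")]
new_labeled, todo = semi_automatic_labeling(
    seed, [0.15, 0.95, 0.5], lambda sample, label: sample != 0.5)
print(new_labeled, todo)
```

Verifying a proposed label is usually much faster for the user than drawing it from scratch, which is where the saving comes from.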
Summary of LabelMe
Advantages
Motivation to label images is given
To download the database, a user first has to label a certain number of images
→ Data quality is probably good
Disadvantages
Only a few people are interested in the labels
They are the only ones providing the labels
Is it possible to get people with no interest in the labels to annotate the images (without paying them)?
Human Computation
Humans can solve problems that computers can't solve yet
Simple example: captchas
Humans are far better than computers at labeling and tagging images
How do we get them to do that?
Do people on the Internet have enough time?
Human Computation
“People all over the world spent 9 billion hours playing solitaire in 2003” Luis von Ahn, 2006
Construction of the Empire State Building (7 million human-hours) took the equivalent of ~6.8 hours of worldwide solitaire playing
Building the Panama Canal (20 million human-hours) took the equivalent of less than 1 day of worldwide solitaire playing
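The two comparisons can be checked with a short calculation. It assumes the 9 billion hours were spread evenly over the 8,760 hours of the year 2003; the variable names are only for illustration.

```python
HOURS_PER_YEAR = 365 * 24        # 2003 was not a leap year
SOLITAIRE_HOURS_2003 = 9e9       # figure quoted on the slide

# Human-hours of solitaire played worldwide per hour of wall-clock time
solitaire_rate = SOLITAIRE_HOURS_2003 / HOURS_PER_YEAR


def equivalent_solitaire_time(project_human_hours: float) -> float:
    """Wall-clock hours of worldwide solitaire that match the project's effort."""
    return project_human_hours / solitaire_rate


empire_state = equivalent_solitaire_time(7e6)    # about 6.8 hours
panama_canal = equivalent_solitaire_time(20e6)   # about 19.5 hours, less than a day
print(round(empire_state, 1), round(panama_canal, 1))
```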
Human Computation
Humans have a lot of time
Humans can easily label and tag images
BUT you'll have to pay them to do that for you
Solution: create a game that encourages people to label images
Or any other task you want them to do
Games With A Purpose
http://www.gwap.com/
ESP-Game
Game for tagging images from the Web
Users see a picture and have to agree on a tag for what the picture contains
→ A lot of image-tag pairs, but no information about location or size of objects in the image
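The agreement rule can be sketched as below: the first word that both players have typed becomes the tag. The timestamped guess format and the function name `agreed_tag` are assumptions for illustration; the real game has further rules that are not modeled here.

```python
from typing import Iterable, Optional, Tuple

Guess = Tuple[float, str]  # (seconds since round start, typed word)


def agreed_tag(player1: Iterable[Guess], player2: Iterable[Guess]) -> Optional[str]:
    """Return the first word that both players have typed, or None."""
    seen = {1: set(), 2: set()}
    events = sorted([(t, 1, w) for t, w in player1] +
                    [(t, 2, w) for t, w in player2])
    for _, player, word in events:
        word = word.strip().lower()
        seen[player].add(word)
        if word in seen[3 - player]:   # 3 - player is the other player
            return word
    return None


p1 = [(1.0, "dog"), (2.5, "grass"), (4.0, "ball")]
p2 = [(1.5, "puppy"), (3.0, "Ball"), (5.0, "dog")]
print(agreed_tag(p1, p2))
```

The players do not have to type the word at the same moment; it is enough that each has entered it at some point during the round.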
User statistics
In the first four months after release
13,630 people played the game
80% of them played more than once
1,271,451 labels for 293,760 images were generated
33 people played more than 1000 games (>50h)
Extrapolating
5000 people playing 24h/d could label all images in Google image search in one month!
On popular online gaming sites, many more players are online at a time (>100,000 players)
Label quality
Search for labels
For 10 random labels, images with this label were displayed
For all returned images, the label made sense
→ Very high precision
Manually labeled images
For all images, at least 83% of the ESP game tags were also used by the manual annotators
For all images, the three most common tags used by the manual annotators were also among the ESP game tags
Manual quality assessment
People would use 85% of the ESP game tags to describe the corresponding image
Only 1.7% of the tags don’t make sense as a description of the image
Peekaboom
Game for locating objects in images
Improves data collected by the ESP Game
Two random players are paired up
Boom (Player 1) reveals parts of the image
Peek (Player 2) guesses the associated word
Role switch on successful guess
“Bots” play when the number of players is odd or when people quit the game
Game Overview
Pings
Boom can “ping” parts of the revealed object to point out particular parts
Word – Image Relation
Boom can give hints about how the word relates to the image
Image Metadata
For each image-word pair, metadata is collected
How the word relates to the image: hints
Pixels necessary to guess the word: area that is revealed
Pixels inside the specified object: pings
Most salient aspects of objects in the image: sequence of Boom's clicks
Elimination of poor image-word pairs: many pairs of players click “pass”
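One possible record layout for this metadata is sketched below. The field names, the class name and the pass-rate thresholds are hypothetical; the slides only list which kinds of data are collected.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


@dataclass
class ImageWordMetadata:
    image_id: str
    word: str
    hints: List[str] = field(default_factory=list)             # how the word relates to the image
    revealed: Set[Pixel] = field(default_factory=set)          # area revealed before the guess
    pings: List[Pixel] = field(default_factory=list)           # pixels inside the object
    click_sequence: List[Pixel] = field(default_factory=list)  # Boom's clicks, in order
    rounds_played: int = 0
    rounds_passed: int = 0

    def is_poor_pair(self, min_rounds: int = 5, max_pass_rate: float = 0.5) -> bool:
        """Flag the pair once enough player pairs saw it and most of them passed."""
        if self.rounds_played < min_rounds:
            return False
        return self.rounds_passed / self.rounds_played > max_pass_rate


record = ImageWordMetadata("img_001", "cow", rounds_played=6, rounds_passed=4)
print(record.is_poor_pair())
```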
Cheating
People could try to cheat by
logging in at the same time
telling each other which words to type
→ reveal the wrong parts and type the right word anyway
Multiple anti-cheating mechanisms
Anti-Cheating Mechanisms
Player queue: a player has to wait n seconds until being paired up
IP address checks
Seed images: images with hand-verified metadata in the game
Limited freedom to enter guesses
The guess field could otherwise be used for communication
Only letters, only words in the dictionary
Aggregating data from multiple players
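Two of these mechanisms are easy to illustrate. The sketch below filters guesses and aggregates revealed areas over several games; the function names, the lowercase normalization and the 50% vote threshold are assumptions, not details from the slides.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pixel = Tuple[int, int]


def is_valid_guess(guess: str, dictionary: Set[str]) -> bool:
    """Accept only purely alphabetic guesses that are dictionary words."""
    word = guess.strip().lower()
    return word.isalpha() and word in dictionary


def aggregate_reveals(reveals: Iterable[Set[Pixel]],
                      min_fraction: float = 0.5) -> Set[Pixel]:
    """Keep pixels that were revealed in at least `min_fraction` of the games."""
    games: List[Set[Pixel]] = list(reveals)
    votes = Counter(p for game in games for p in game)
    needed = min_fraction * len(games)
    return {p for p, n in votes.items() if n >= needed}


words = {"dog", "cat"}
print(is_valid_guess("Dog", words), is_valid_guess("top left", words))
print(aggregate_reveals([{(0, 0), (0, 1)}, {(0, 1), (5, 5)}, {(0, 1), (0, 0)}]))
```

Restricting guesses to dictionary words makes the guess field useless as a chat channel, and aggregation means a single cheating pair cannot decide the stored data alone.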
Implementation
Spelling check: incorrect words are displayed in a different color
Inappropriate word replacement: substituted with words like “ILuvPeekaboom”
Top scores list and ranks
Top scores of the day and of all time
Users get a rank based on their total number of points
Additional Applications
Improving image-search results
Peekaboom gives an estimate of the fraction of the image that is related to the word
→ use these fractions to order image results
Use ping data for pointing to objects
Object bounding boxes
Image search engine with highlighted results
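Ordering search results by these fractions could look like the following sketch. It assumes that the pixels attributed to the word are available as a set per image; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def related_fraction(object_pixels: Set[Pixel], width: int, height: int) -> float:
    """Fraction of the image area covered by pixels attributed to the word."""
    return len(object_pixels) / (width * height)


def rank_results(fractions: Dict[str, float]) -> List[str]:
    """Image ids sorted so that images dominated by the queried object come first."""
    return sorted(fractions, key=fractions.get, reverse=True)


fractions = {"a.jpg": 0.1, "b.jpg": 0.6, "c.jpg": 0.3}
print(rank_results(fractions))
```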
Ping Accuracy
Use ping data for pointing
Arrow lines pointing to the objects
Pings selected at random
100% accuracy shown in experiment
Bounding Box Accuracy
Bounding box created with Peekaboom
Lowest overlap with user-created box was 50%
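The slide does not define how overlap is measured. A common choice, assumed here, is intersection area divided by union area. The sketch also shows how a box could be derived from a set of object pixels; both helper names are hypothetical.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max edges exclusive


def bounding_box(pixels: Iterable[Tuple[int, int]]) -> Box:
    """Smallest axis-aligned box that contains all given pixels."""
    xs, ys = zip(*pixels)
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def box_area(b: Box) -> int:
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def overlap(a: Box, b: Box) -> float:
    """Intersection area divided by union area."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0


game_box = bounding_box([(2, 3), (4, 1), (3, 7)])
print(game_box, overlap((0, 0, 10, 10), (5, 0, 15, 10)))
```

Under this measure, two equally sized boxes shifted by half their width already drop to an overlap of one third, so a minimum of 50% is a fairly strict result.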
User Statistics
How enjoyable is the game?
14,000 different people and 1.1 million pieces of data during the first month
User comments:
“One unfortunate side effect of playing so much in such a short time was a mild case of carpal tunnel syndrome in my right hand and forearm, but that dissipated quickly.”
“This game is like crack. I've been Peekaboom-free for 32 hours. Unlike other games, Peekaboom is cooperative.”
“[...] I would say that it gives the same gut feeling as combining gambling with charades while riding on a roller coaster. The good points are that you increase and stimulate your intelligence, you don't lose all your money and you don't fall off the ride. The bad point is that you look at your watch and eight hours have just disappeared!”
Summary of Peekaboom
Peekaboom works because people like to play games
Experiments show that results are sufficiently accurate
In addition to the ESP data, information about location and size of objects in the image is retrieved
Other “Games with a purpose”
Tag a Tune
A piece of music is played to you and your partner
You must describe the music with words
Based on your partner’s description, you have to decide whether you two are listening to the same song
Verbosity
Describe a word to your partner using other words
The partner must guess the secret word
Squigl
Trace the outline of an object in the same way as your partner
Very limited time (5-10 seconds)
Matchin
Decide which one of two images you like best
Points when your partner agrees
Popvideo
You and your partner are shown a video clip
You have to enter tags describing the video (and audio!)
Points when tags match
Publicly available labeled datasets
Some datasets are available for free in order to advance research
Caltech 101/256, AR, …
Some evaluation campaigns generate a lot of labeled data and provide it
for everybody: PASCAL VOC challenge
for participants: TRECVID, ImageCLEF
Caltech 101/256
Freely available
http://vision.caltech.edu/Image_Datasets/Caltech101
http://vision.caltech.edu/Image_Datasets/Caltech256
Pictures of 101/256 object categories
Labels and outlines of the objects
PASCAL
Collection of object recognition databases with ground truth
VOC challenge uses it
Freely available
http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html
Software libraries
Many systems use standard algorithms
Linear algebra
Image processing: image filters, image features, etc.
Machine learning: SVMs, PCA, LDA, etc.
Advantages of using standard libraries
Avoid errors in your own implementation
Leverage the know-how of other people
The implementation details of even simple algorithms can be quite tricky!
Save a lot of development time
→ More time to work on your algorithm
OpenCV
Open Computer Vision library
http://sourceforge.net/projects/opencvlibrary/
http://opencv.willowgarage.com/
Features
C-like interface (version 2.0 is more C++)
Linear algebra
Image processing functions: filters, FFT, DCT, etc.
Image features: SURF, HOG, Haar-like features (mainly in 2.0/SVN)
Detectors: Haar cascades (training & detection)
Machine learning: SVMs, neural nets, decision trees, GMMs, …
Much, much more…
OKAPI
Open Karlsruhe library for processing of images
http://cvhci.anthropomatik.kit.edu/okapi
Features
C++
Cameras (V4L, Firewire, VfW)
Videos (frame-accurate random access)
Image Features (DCT, Gabor, LBP, MCT, …)
Linear Projections (PCA, LDA, RCA)
Detector (using MCT features, very fast)
SVMs (libsvm, liblinear)
3D geometry functions
Simple GUI for prototypes
Many utility functions (timing, XML, in-memory imageIO, …)
SVMs
Libsvm: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Simple and standard
Liblinear (linear SVMs): http://www.csie.ntu.edu.tw/~cjlin/liblinear/
Much faster for linear SVMs
SVMlight: http://svmlight.joachims.org/
Also very popular
Shogun toolbox: http://www.shogun-toolbox.org/
Many kernel functions and Multiple Kernel Learning (MKL)
Uses libsvm or SVMlight in the background
Machine Learning
Weka (Java) http://www.cs.waikato.ac.nz/ml/weka/
Java-ML (Java) http://java-ml.sourceforge.net/
Torch (C/Lua) http://torch5.sourceforge.net/
MLC++ (C++) http://www.sgi.com/tech/mlc/
MLPACK / FASTlib http://mloss.org/software/view/152/
http://fastlib.analytics1305.com/
Spider (Matlab) http://www.kyb.tuebingen.mpg.de/bs/people/spider/
FLANN (Approximate k-Nearest Neighbors) http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
List of Open Source Machine Learning software http://mloss.org/
References
B.C. Russell, A. Torralba, K. Murphy, W.T. Freeman. LabelMe: A Database and Web-Based Tool for Image Annotation. International Journal of Computer Vision, vol. 77, issue 1, May 2008
L. von Ahn, L. Dabbish. Labeling images with a computer game. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2004
L. von Ahn, R. Liu, M. Blum. Peekaboom: a game for locating objects in images. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2006
Lecture Overview
Introduction
Visual Descriptors
Image Segmentation
Classification
Shot Boundary Detection & Genre Classification
High-level Feature Detection
High-level Feature Detection II
Semantics
Event Recognition
Person Identification
Copy Detection
Search
Tools and Libraries
Computer Vision & Machine Learning Overview
Indexing & General concept detection
Matching
Special topics in concept detection
Automatic and interactive search
Annotation tools, databases, and software libraries
Intro. to Video proc.