EE 6882 Statistical Methods for Video Indexing and Analysis
Fall 2004
Prof. Shih-Fu Chang
http://www.ee.columbia.edu/~sfchang
Lecture 1 part A (9/8/04)
2EE6882-Chang
EE E6882 SVIA Lecture #1
Part I
  Introduction
  Course Syllabus
  Readings
    A. Jain et al., "Statistical Pattern Recognition: A Review," IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 22, No. 1, Jan. 2000.
    Gonzalez and Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2001 (Chapter 12, Object Recognition).
    Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989 (Chapter 9.14).
Part II
  Introduction of a simple image search system
  Image feature extraction
  Similarity matching, performance metrics
  Readings
    J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Vol. 4, No. 3, pp. 12-20, Summer 1997.
    John R. Smith and Shih-Fu Chang, "VisualSEEk: a Fully Automated Content-Based Image Query System," in ACM Multimedia, Boston, MA, November 1996.
Problems in Video Indexing and Analysis
Indexing, search, and retrieval for images and videos
  See Columbia’s WebSEEk and EdSearch demos
  Google image search?
  “find video clips of basketball going through the hoop”
  “find images containing shape shown in the sketch”
Automatic annotation of visual content
  (e.g., recognition of text, face, scene, vehicle, location, etc.)
Automatic parsing of video programs into structures
  (e.g., break videos into shots, scenes, and stories)
Event detection
  (e.g., sports events, human activities, meetings, medical, and other spatio-temporal patterns)
Summary
  e.g., topic clustering, highlight generation
  See Columbia’s sports highlight, news topic clustering demo
Examples of object recognition and structure parsing problems
[Figure: video program with the labels "shot", "story", and "anchor shot"]
How to detect and recognize the characters and words? (Demo)
How to detect the boundaries of programs, stories, and commercials?
Statistical Paradigm
Many problems can be posed as pattern recognition
  (e.g., Matlab statistical classification demo)
Statistical models to handle uncertainty and provide flexibility
Rich tools for learning and prediction
Image processing toolkits available
Increasing benchmark data
  (e.g., NIST TREC Video)
A Very High-Level Stat. Pattern Recog. Architecture
(From Jain, Duin, and Mao, SPR Review, ’99)
Important issues
Image/video pre-processing: quality, resolution, etc.
Feature extraction
  Color, texture, motion, shape, layout, regions, parts, etc.
Feature representation
  Discrete vs. continuous, vectorization, dimension
  Invariance to scale, rotation, translation, ...
Feature selection
  PCA, MDS, Kernel PCA, etc.
Classification models
  Generative vs. discriminative
  Multi-modal fusion, early fusion vs. late fusion
Size of training/test data and manual supervision efforts
Validation and evaluation processes
Complexity
Some examples of feature representation
Features determine the patterns and their separability
E.g.,
  Angular distance for closed shapes
  Part features for iris flowers
Another example of a feature
Bankers Association font used on personal checks
Magnetic ink and a magnetic reader are used to simplify segmentation
Feature: the horizontal scan of the rate of increase/decrease of the character area
Peaks and zeros are arranged to fall on the vertical grid lines so that they can be sampled accurately
Patterns can be easily distinguished
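As a rough illustration of the "rate of increase/decrease of the character area" feature, the sketch below scans a binary character bitmap from left to right and reports how the inked area changes from one column to the next. The function name `area_rate_signature` and the two toy glyphs are hypothetical and are not the actual check font; a real reader senses this signal magnetically rather than from pixels.

```python
def area_rate_signature(bitmap):
    """Return the change in inked area between successive columns of a binary glyph."""
    height = len(bitmap)
    width = len(bitmap[0])
    # Inked area seen by a narrow vertical slit at each horizontal position.
    column_area = [sum(bitmap[r][c] for r in range(height)) for c in range(width)]
    # Pad with zero on both sides so the leading and trailing edges show up.
    padded = [0] + column_area + [0]
    return [padded[i + 1] - padded[i] for i in range(len(padded) - 1)]


# Two hypothetical 3x4 glyphs (1 = ink, 0 = background).
GLYPH_BAR = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
GLYPH_STEP = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]

bar_signature = area_rate_signature(GLYPH_BAR)
step_signature = area_rate_signature(GLYPH_STEP)
```

The two glyphs produce different sequences of positive peaks, zeros, and negative peaks, which is the property the slide relies on to tell characters apart.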
Classification Paradigms
[Figure, panel "Probabilistic": likelihood curves for Class 1 and Class 2 plotted over a feature x (height, income, ...). A sample x0 is assigned a class C(x0) by testing whether P(x|C=1) > or < P(x|C=2).]
[Figure, panel "Discriminative": "+" and "-" samples in the (x1, x2) plane separated by a decision boundary. f(x) is the discriminant function, with f(x) > 0 on one side and f(x) < 0 on the other.]
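The two paradigms can be contrasted in a few lines of code. This is a minimal sketch under stated assumptions: the probabilistic rule uses 1-D Gaussian class-conditional likelihoods with equal priors, and the discriminative rule uses a linear discriminant f(x) = w·x + b. The slides do not fix either model, and all parameter values below are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, std):
    """1-D Gaussian density, used here as an assumed form for P(x|C)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def classify_by_likelihood(x0, params_1, params_2):
    """Probabilistic rule: return 1 if P(x0|C=1) > P(x0|C=2), else 2."""
    p1 = gaussian_pdf(x0, *params_1)
    p2 = gaussian_pdf(x0, *params_2)
    return 1 if p1 > p2 else 2


def discriminant(x, w, b):
    """Linear discriminant function f(x) = w . x + b (an assumed form)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_by_discriminant(x, w, b):
    """Discriminative rule: '+' where f(x) > 0, '-' otherwise."""
    return "+" if discriminant(x, w, b) > 0 else "-"


# Hypothetical (mean, std) of a height-like feature for each class.
CLASS_1 = (160.0, 10.0)
CLASS_2 = (180.0, 10.0)
label_a = classify_by_likelihood(165.0, CLASS_1, CLASS_2)
label_b = classify_by_likelihood(175.0, CLASS_1, CLASS_2)

# Hypothetical boundary x1 + x2 - 3 = 0 in the (x1, x2) plane.
W, B = (1.0, 1.0), -3.0
side_a = classify_by_discriminant((3.0, 2.0), W, B)
side_b = classify_by_discriminant((0.5, 1.0), W, B)
```

The probabilistic classifier needs a model of each class distribution, whereas the discriminative one only needs the boundary itself.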
Training / Validation / Testing
Assume the same distribution in the different sets; otherwise the optimal solution from validation may not be optimal on the test data
[Figure: three scatter plots of "+" and "-" samples in the (x(1), x(2)) plane, one per set]
Training: learn optimal features, models, parameters
Validation: select the optimal hypothesis through validation
Testing: evaluate performance over the test data
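The three roles can be sketched as follows. This is an illustrative example, not the course's procedure: the data are synthetic 1-D samples, the model is a k-nearest-neighbor vote, and the candidate values of k are arbitrary. The training set supplies the stored samples, the validation set picks k, and the test set is touched only once at the end.

```python
import random


def make_data(n, rng):
    """Synthetic 1-D samples: class +1 centered at 2.0, class -1 at 0.0."""
    data = []
    for _ in range(n):
        label = rng.choice([+1, -1])
        center = 2.0 if label == +1 else 0.0
        data.append((rng.gauss(center, 1.0), label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training samples closest to x (k odd)."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    vote = sum(label for _, label in nearest)
    return +1 if vote > 0 else -1


def accuracy(train, samples, k):
    hits = sum(knn_predict(train, x, k) == y for x, y in samples)
    return hits / len(samples)


rng = random.Random(0)
data = make_data(300, rng)
# All three sets come from the same generator, i.e., the same distribution.
train, val, test = data[:180], data[180:240], data[240:]

candidate_ks = [1, 3, 5, 9]
# Validation: choose the hypothesis (here, k) that does best on held-out data.
best_k = max(candidate_ks, key=lambda k: accuracy(train, val, k))
# Testing: report performance once, using data never used for any choice.
test_accuracy = accuracy(train, test, best_k)
```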
Training / Validation / Testing (cont.)
Multiple validation sets can be used for different optimization steps.
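[Figure: validation set Val-1 is used to select the optimal classifier using feature 1, the optimal classifier using feature 2, ...; validation set Val-2 is used to select the optimal classifier fusing multiple features]
Cross validation, leave-one-out
[Figure: the data are divided into folds 1, 2, ..., K; folds are marked as Training or Testing]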
Rotate the choice of the test set and average the performance over runs
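A minimal sketch of the rotation described above: the data are cut into K folds, each fold serves once as the test set, and the scores are averaged. Leave-one-out is the special case where K equals the number of samples. The helper names and the toy threshold classifier are assumptions made for this example.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, k, fit, score):
    """Rotate the test fold and return the average score over the k runs."""
    scores = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train = [s for i, s in enumerate(samples) if i not in held_out]
        test = [samples[i] for i in test_idx]
        scores.append(score(fit(train), test))
    return sum(scores) / len(scores)


def fit_threshold(train):
    """Toy model: threshold halfway between the two class means.
    Assumes both classes are present in every training split."""
    pos = [x for x, y in train if y == +1]
    neg = [x for x, y in train if y == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def threshold_accuracy(threshold, test):
    hits = sum((+1 if x > threshold else -1) == y for x, y in test)
    return hits / len(test)


SAMPLES = [(0.0, -1), (0.2, -1), (1.0, +1), (1.2, +1), (0.1, -1), (1.1, +1)]
three_fold = cross_validate(SAMPLES, 3, fit_threshold, threshold_accuracy)
leave_one_out = cross_validate(SAMPLES, len(SAMPLES), fit_threshold, threshold_accuracy)
```

The folds here are contiguous for simplicity; in practice the samples are usually shuffled first so that each fold has a similar class mix.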
Curse of Dimensionality and Overtraining
Rule of thumb: (# of training patterns per class) / (# of features) > 10
[Figure "Overtraining": "+" and "-" samples in the (x(1), x(2)) plane illustrating a case of overtraining]
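The rule of thumb is simple arithmetic and can be checked before training. In the sketch below, applying the rule to the smallest class is an assumption (the slide only says "per class"), and the class names and counts are invented for illustration.

```python
def meets_rule_of_thumb(patterns_per_class, n_features, min_ratio=10):
    """True if (training patterns in the smallest class) / (features) > min_ratio."""
    smallest_class = min(patterns_per_class.values())
    return smallest_class / n_features > min_ratio


def max_features(patterns_per_class, min_ratio=10):
    """Largest feature count d that still satisfies smallest_class / d > min_ratio."""
    smallest_class = min(patterns_per_class.values())
    return (smallest_class - 1) // min_ratio


# Hypothetical training set sizes per class.
counts = {"anchor": 500, "sports": 120}
ok_with_8 = meets_rule_of_thumb(counts, 8)     # 120 / 8 = 15
ok_with_12 = meets_rule_of_thumb(counts, 12)   # 120 / 12 = 10, not strictly greater
feature_budget = max_features(counts)
```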
About the course
Objectives:
  Learn how to formulate and solve problems in this field
    Feature extraction, object/event recognition, structure detection, video search and retrieval
  Get insights and experience of recent machine learning techniques
    Statistical, Bayesian, Neural Network, PCA, HMM, SVM
  Have fun in experimenting with actual visual classification/indexing problems
Intended Audience
  Beginning graduate students or professionals
    familiar with signal/image processing
    comfortable with probability, statistics, linear algebra, and some machine learning
Course Format
Overview lectures + student presentations + final projects
I will give several overview lectures at the beginning.
Student paper presentation
  One paper assigned to each student
  Assignments determined 3 weeks in advance
  CVN students present over the phone
Everyone writes comments before and after class on the class wiki site (starting the 3rd week)
One written exam after all presentations
  Test understanding of concepts discussed throughout the course
One term project at the end of the course
Grading
  Paper presentation/demo 30%
  Exam 30%
  Final project 40%
Paper review and demo
Each student discusses the paper and demo with me and the TA 2 weeks before class
  Week 1: review and research
  Week 2: simulate a toy problem using available data sets and tools
  Week 3: prepare presentation
Upload the slides and code to the class wiki site before class
Presentation
  30 minutes per paper (including demo)
  I will provide additional materials about the subject.
Paper Review and Demo (2)
Review
  Background review and examples
  Problem addressed and main ideas
  Insights about why it works
  Limitation, generality, and repeatability
  Alternatives and comparisons
Demo
  Software and data available and repeatable?
  Reconstruct the method and try on toy data set? (from some available generic toolkit)
  Analysis of results (not just accuracy numbers; offer explanations and verifiable theories about observations)
  Demo code archived on class site and shared with others
Resources and Matlab
Links on the class web site
  Tutorials on paper writing, Matlab, etc.
  Software links on web site to Matlab, Neural Network, HMM, Netlab, SVM
SVIA EE6882 Class Dataset
  Benchmark data set, a few thousand images from broadcast news and stock photos
  Extracted features and labels
  Will distribute on a DVD for class project use only
Matlab is recommended for programming
  Accessible in Mudd 251 Computer Lab
  Need CU ACIS account
  Very brief introduction next week
Paper categories
Problems
  Feature extraction and image search
  Image/video classification
  Interactive image retrieval
  Video structure parsing
  Multimedia information retrieval
Statistical Techniques
  Bayesian, factor graph, graphical model
  SVM and variations
  Language model, relevance model from IR
  HMM and variations
  others
A few papers reviewed last year
Maximum Entropy Fusing
Objective: a story boundary at time τ_k? τ_k = {shot boundaries or significant pauses}
[Figure: observations from {video, audio} and {cue words} along the time axis around the candidate points τ_{k-1}, τ_k, τ_{k+1}]
Observations:
  a static face?
  motion energy changes?
  change from music to speech?
  speech segment?
  {cue words}_i appear, {cue words}_j appear
(Hsu and Chang)
Bayesian Image Classification (Vailaya et al. 98 and 01)
How to select the categories and tree?
How to estimate the distributions of features for each class?
Concept (In)Dependence
(Naphade et al)
Boosting
Extract > 45K selective, efficient features by multi-scale filtering
Classifier combination and sample re-weighting
(Tieu and Viola)
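"Classifier combination and sample re-weighting" can be illustrated with a generic AdaBoost loop over single-feature threshold rules (decision stumps). This is a textbook-style sketch on a made-up 1-D data set, not the Tieu and Viola system: their features, weak learners, and data are not reproduced here, and all names below are hypothetical.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: `polarity` if x[feature] > threshold, else -polarity."""
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, weights):
    """Pick the single-feature threshold rule with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (+1, -1):
                error = sum(w for x, label, w in zip(X, y, weights)
                            if stump_predict(x, feature, threshold, polarity) != label)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, feature, threshold, polarity = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, feature, threshold, polarity))
        # Sample re-weighting: misclassified samples gain weight, correct ones lose it.
        weights = [w * math.exp(-alpha * label * stump_predict(x, feature, threshold, polarity))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Classifier combination: sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, f, t, p) for alpha, f, t, p in ensemble)
    return +1 if score > 0 else -1


# Toy data that no single threshold can separate.
X = [(float(i),) for i in range(10)]
y = [+1, +1, +1, -1, -1, -1, +1, +1, +1, -1]
single_stump_error = best_stump(X, y, [0.1] * 10)[0]
ensemble = adaboost(X, y, rounds=3)
predictions = [predict(ensemble, x) for x in X]
```

On this toy set the best single stump still misclassifies some samples, while the weighted vote of three stumps fits all ten training points.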
Boosting retrieval interface
User selected examples
20 retrieval results
Negative images in the training set close to decision boundary
Images in the testing set close to the decision boundary
Real-time evaluation of 20 features over millions of images
Two class problem: relevant vs. irrelevant
Object-Word Correspondence (Duygulu et al)
Model the joint distribution between words and blobs
Used in automatic annotation and retrieval
Unsupervised Video Structure Discovery: Hierarchical Hidden Markov Model
[Figure: baseball example along the time axis. Top-level states: running, pitching, break. Bottom-level states: bench, close up, batter, audience, field bird view, pitcher, 1st base.]
Learning Multi-Level Markovian Temporal Dependence
High-level states represent distinct events
Presence of each event produces observations modeled by low-level HMMs
(Xie et al)