EE 6882 Statistical Methods for Video Indexing and...

1

EE 6882 Statistical Methods for Video Indexing and Analysis

Fall 2004Prof. Shih-Fu Chang

http://www.ee.columbia.edu/~sfchang

Lecture 1 part A (9/8/04)

2EE6882-Chang

EE E6882 SVIA Lecture #1Part I

Introduction Course SyllabusReadings

A. Jain et al, "Statistical Pattern Recognition: A Review," IEEE Tran. on Pattern Analysis and Machine Intelligence, vol 22, No 1, Jan. 2000.Gonzalez and Woods, Digital Image Processing, 2nd edition, Prentice Hall, 2001 (Chapter 12, Object recognition)Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989. (Chapter 9.14)

Part IIIntroduction of a simple image search systemImage feature extractionSimilarity matching, Performance metricsReadings

J. R. Smith and S.-F. Chang, "Visually Searching the Web for Content," IEEE Multimedia Magazine, Summer, Vol. 4 No. 3, pp.12-20, 1997. John R. Smith, Shih-Fu Chang. “VisualSEEk: a Fully Automated Content-Based Image Query System,” In ACM Multimedia, Boston, MA, November 1996.

3EE6882-Chang

Problems in Video Indexing and Analysis

Indexing, search, and retrieval for images and videosSee Columbia’s WebSEEk and EdSearch demosGoggle image search?“find video clips of basketball going through the hoop”“find images containing shape shown in the sketch”

Automatic annotation of visual content(e.g., recognition of text, face, scene, vehicle, location, etc)

Automatic parsing of video programs into structures(e.g., break videos into shots, scenes, and stories)

Event detection(e.g., sports events, human activities, meetings, medical, and other spatio-temporal patterns)

Summarye.g., topic clustering, highlight generationSee Columbia’s sports highlight, news topic clustering demo

4EE6882-Chang

Examples of object recognition and structure parsing problems

shotstory

anchor shot

How to detect and recognize the characters and words?

(Demo)

How to detect the boundaries of programs, stories, and

commercials?

5EE6882-Chang

Statistical Paradigm

Many problems can be posed as pattern recognition

(e.g., Matlab statistical classification demo)Statistical models to handle uncertainty and provide flexibilityRich tools for learning and predictionImage processing toolkits availableIncreasing benchmark data

(e.g., NIST TREC Video)

6EE6882-Chang

A Very High-Level Stat. Pattern Recog. Architecture

(From Jain, Duin, and Mao, SPR Review, ’99)

7EE6882-Chang

Important issuesImage/video pre-processing – quality, resolution etcFeature extraction

Color, texture, motion, shape, layout, regions, parts, etcFeature representation

Discrete vs. continuous, vectorization, dimensionInvariance to scale, rotation, translation …

Feature selectionPCA, MDS, Kernel PCA, etc

Classification modelsGenerative vs. discriminativeMulti-modal fusion, early fusion vs. late fusion

Size of training/test data and manual supervision effortsValidation and evaluation processesComplexity

8EE6882-Chang

Some examples of feature representation

Features determine the patterns and their separabilityE.g.,

Angular distance for closed shapesPart features for iris flowers

9EE6882-Chang

Another example of feature

Bankers Asso. Font used on personal checksUse magnetic ink and reader to simplify segmentationFeature: the horizontal scan of the rate of increase/decrease of the character areaPeaks and zeros are arranged to be located at the vertical grid lines can be sampled accuratelyPatterns can be easily distinguished

11EE6882-Chang

Classification Paradigms

x

Likelihood

Probabilistic

Class 1 Class 2

x0 (Height, income, …)

P(x|C=1) > or < P(x|C=2)

C(x0 )=?

x1

Decision Boundary

+++

+ + ++

+++++

+

+++ +

++

++

++

--

--

-- -

-- --

- -

-

-

-----

- --

--

--

--

----

- --

--

-

---

--

x2

Discriminative

+

++

+

++ +

+

f(x) < 0

f(x) > 0

f(x) discriminant function

12EE6882-Chang

Training / Validation / Testing

Assume the same distribution in different set, otherwise the optimal solution from validation may not be optimal in test data

x(1)

x(2)Training

- ---

++

+

++ -

optimal features, models, parameters

x(1)

x(2) Validation

- --

++

++

-

Select optimal hypothesis through

validation

x(1)

x(2)Testing

- --

+

+

++ -+-

Evaluate performance over test data

13EE6882-Chang

Training / Validation / Testing (cont.)

Multiple validation sets can be used for different optimization steps.

Val - 1

Val - 1

Optimal classifierusing feature 1

Val - 2Optimal classifierusing feature 2

Optimal classifierfusing multiple features

… …

Cross validation, leave-one-out

1 2 … K

Training Testing

Rotate the choice of the test set and average the performance over runs

14EE6882-Chang

Curse of Dimensionality and Overtraining

Rule of thumb –(# of training patterns per class) / (# of features)

> 10

x(1)

x(2)Overtraining

--

--

++

+

++ -

++

+

+

+

-

-- - --

A case of overtraining

15EE6882-Chang

About the courseObjectives:

Learn how to formulate and solve problems in this fieldFeature extraction, object/event recognition, structure detection, video search and retrieval

Get insights and experience of recent machine learning techniques

Statistical, Bayesian, Neural Network, PCA, HMM, SVMHave fun in experimenting with actual visual classification/indexing problems

Intended AudienceBeginning graduate students or professionalsfamiliar with signal/image processingcomfortable with probability, statistics, linear algebra, and some machine learning

16EE6882-Chang

Course Format

Overview Lectures + student presentations + final projectsI will give several overview lectures at the beginning.Student paper presentation

One paper assigned to each studentassignments determined 3 weeks in advanceCVN students present over the phone

Everyone writes comments before and after class on the class wiki site (starting the 3rd week)One written exam after all presentations

test understanding of concepts discussed throughout the courseOne term project at the end of the course Grading

Paper presentation/demo 30%Exam 30%Final Project 40%

17EE6882-Chang

Paper review and demo

Each student discusses paper and demos with me and TA 2 weeks before class

Week 1: review and researchWeek 2: simulate a toy problem using available data set and toolsWeek 3: prepare presentation

Upload the slide and codes to the class wiki site before classPresentation

30 mins each paper (including demo)I will provide additional materials about the subject.

18EE6882-Chang

Paper Review and Demo (2)Review

Background review and examplesProblem addressed and main ideasInsights about why it worksLimitation, generality, and repeatabilityAlternatives and comparisons

DemoSoftware and data available and repeatable?Reconstruct the method and try on toy data set?(from some available generic toolkit)Analysis of results (not just accuracy numbers, offer explanations and verifiable theories about observations)Demo code archived on class site and shared with others

19EE6882-Chang

Resources and MatlabLinks on the class web site

Tutorials on paper writing, Matlab, etcSoftware links on web site to Matlab, Neural Network, HMM, Netlab, SVMSVIA EE6882 Class Dataset

Benchmark data set, a few thousands of images from broadcast news and stock photosExtracted features and labelsWill distribute on a DVD for class project use only

Matlab is recommended for programmingAccessible in Mudd 251 Computer LabNeed CU ACIS accountVery brief introduction next week

20EE6882-Chang

Paper categoriesProblems

Feature extraction and image searchImage/video classificationInteractive image retrievalVideo structure parsingMultimedia information retrieval

Statistical TechniquesBayesian, factor graph, graphical modelSVM and variationsLanguage model, relevance model from IR HMM and variationsothers

21EE6882-Chang

A few papers reviewed last year

22EE6882-Chang

Maximum Entropy Fusing

Objective: a story boundary at time ?= { shot boundaries or significant pauses}

observation

time

{video, audio}a static face?

motion energy changes?

change from music to speech?

speech segment?

{cue words}j appear{cue words}i appear

kτ

kτ 1kτ +1kτ −

(Hsu and Chang)kτ

23EE6882-Chang

Bayesian Image Classification(Valaiya et al 98 and 01)

How to select the categories and tree?How to estimate the distributions of features for each class?

25EE6882-Chang

Concept (In)Dependence

(Naphade et al)

26EE6882-Chang

Boosting

Extract > 45K selectiveefficient features by multi-scale filtering

Classifier combination and sample re-weighting

(Tieu and Viola)

27EE6882-Chang

Boosting retrieval interface

User selected examples

20 retrieval results

Negative images in the training set close to decision boundary

Images in the testing set close to the decision boundary

Real-time evaluation of 20 features over millions of images

Two class problem: relevant vs. irrelevant

28EE6882-Chang

Object-Word Correspondence(Duygulu et al)

Model the joint distribution between words and blobsUsed in automatic annotation and retrieval

29EE6882-Chang

Unsupervised Video Structure Discovery: Hierarchical Hidden Markov Model

time… ……

top-level states

running pitching

break

bottom-level states

bench close up

batteraudiencefield bird view

pitcher1st base

Learning Multi-Level Markovian Temporal DependenceHigh-level states represent distinct events Presence of each event produces observations modeled by low-level HMMs

BaseballExample

(Xie et al)

EE 6882 Statistical Methods for Video Indexing and...

Documents

Transcript of EE 6882 Statistical Methods for Video Indexing and...