Content-Based Video Retrieval System Presented by: Edmund Liang CSE 8337: Information Retrieval.



Introduction

Traditional library search method

Introduction (cont.)

Other search engines still use the description-based search method.

Current image search method: by description.

Introduction (cont.)

Sample of Google Video Search:

Introduction (cont.)

Google Video Archive selections:

Introduction (cont.)

A picture is worth a thousand words.
It conveys more than words can express.
With the growing number of video clips on MySpace and YouTube, there is a need for a video search engine.

Introduction (cont.)

Sample YouTube Video page:

Introduction (cont.)

Therefore, we need a better search technique: a Content-Based Video Retrieval (CBVR) system.

Introduction (cont.)

What good is video retrieval?
• Historical archive
• Forensic documents
• Fingerprint & DNA matching
• Security usage

Overview (cont.)

CBVR has two approaches:
• Attribute based
• Object based

CBVR can be done by:
• Color
• Texture
• Shape
• Spatial relationship
• Semantic primitives
• Browsing
• Objective attribute
• Subjective attribute
• Motion
• Text & domain concepts

Overview (cont.)

CBVR has two phases:

Database population phase
• Video shot boundary detection
• Key frames selection
• Feature extraction

Video retrieval phase
• Similarity measure

Overview (cont.)

How CBVR works:

[Wang, Li, Wiederhold, 2001]

Database Population Phase

Here are the three major procedures:
Shot boundary detection: partition the video into segments (shots).

[Luo, Hwang, Wu, 2004]
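The slide does not say which detector is used. One common baseline, shown here only as a minimal sketch, compares the gray-level histograms of consecutive frames and reports a boundary when the difference exceeds a threshold. The frame representation (a flat list of 0..255 gray values), the function names, the bin count, and the threshold are all assumptions made for illustration.

```python
def gray_histogram(frame, bins=8):
    """Normalized gray-level histogram of a frame (flat list of 0..255 values)."""
    hist = [0] * bins
    for pixel in frame:
        hist[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in hist]


def detect_shot_boundaries(frames, threshold=0.5, bins=8):
    """Return every index i at which frames[i] starts a new shot."""
    if not frames:
        return []
    boundaries = []
    previous = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        current = gray_histogram(frames[i], bins)
        # L1 distance between two normalized histograms lies in [0, 2].
        distance = sum(abs(a - b) for a, b in zip(previous, current))
        if distance > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


# Hypothetical toy clip: two dark frames, two bright frames, one dark frame.
dark = [10] * 16
bright = [240] * 16
frames = [dark, dark, bright, bright, dark]
print(detect_shot_boundaries(frames))
```

A fixed threshold only catches hard cuts; gradual transitions such as fades would need a different test.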

Database Population Phase (cont.)

Key frames selection: select the characteristic frames of each shot.
Feature extraction: extract low-level spatial features such as color, texture, shape, etc.

[Luo, Hwang, Wu, 2004]
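How the characteristic frame is chosen is not stated on the slide. As one possible illustration, the sketch below assumes each frame in a shot is already described by a feature vector (for example a color histogram) and picks the frame closest to the shot's mean vector. `select_key_frame` and the sample data are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_key_frame(shot_features):
    """Index of the frame whose features are closest to the shot mean."""
    center = mean_vector(shot_features)

    def squared_distance(vector):
        return sum((a - b) ** 2 for a, b in zip(vector, center))

    return min(range(len(shot_features)),
               key=lambda i: squared_distance(shot_features[i]))


# Hypothetical two-bin color histograms for a three-frame shot.
shot = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
print(select_key_frame(shot))
```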

Database Population Phase (cont.)

Video is a complex data type: audio and video.
Audio can be handled by query by humming.
Voice recognition systems use a Patricia-like tree to construct all possible substrings of a sentence.

Audio is categorized as speech, music, or sound.

Audio retrieval methods: Hidden Markov Model, Boolean search with multi-query using fuzzy logic.

Database Population Phase (cont.)

Simplest database storage: a description of the video is stored as an index along with the video.

Human effort is involved in this case.
We are looking for an automatic video indexing and digital image storage method: Latent Semantic Indexing (LSI).

Database Population Phase (cont.)

LSI uses the vector space model: a low-rank approximation of the vector space represents the image document collection.

The original matrix is replaced by a matrix that is as close to it as possible, whose column space is only a subspace of the original matrix's column space.

By reducing the rank of the matrix, noise (duplicate frames) is reduced, which improves storage and retrieval performance.

Term indexing refers to the process of assigning terms to the content of the video.
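The rank reduction described on this slide is commonly computed with a truncated singular value decomposition. The sketch below assumes NumPy is available and uses a made-up feature-by-frame matrix; `low_rank_approximation` is an illustrative name, not something taken from the cited work.

```python
import numpy as np


def low_rank_approximation(matrix, rank):
    """Closest matrix of the given rank in the least-squares (Frobenius) sense."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the `rank` largest singular values and their vectors.
    return u[:, :rank] @ np.diag(s[:rank]) @ vt[:rank, :]


# Hypothetical matrix: rows are features, columns are frames.
# Columns 0 and 1 are near-duplicate frames.
A = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 2.1, 0.0],
    [0.0, 0.0, 3.0],
])
A2 = low_rank_approximation(A, rank=2)
print(np.linalg.matrix_rank(A2))
print(np.round(A2, 2))
```

The approximation keeps the original shape, so queries can still be compared against its columns, while the small difference between the two near-duplicate frames is treated as noise.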

Database Population Phase (cont.)

The closest terms in the database are returned based on the similarity measure between the query images and the resulting ones.

The cosine similarity measure is used in the vector space model.

Cosine similarity measure on the term-by-video matrix:

$$\cos(t, v) = \frac{\sum_{h=1}^{k} t_h v_h}{\sqrt{\sum_{h=1}^{k} t_h^2}\,\sqrt{\sum_{h=1}^{k} v_h^2}}$$
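As a minimal sketch of the cosine measure on this slide, the code below compares a query vector with the columns of a small term-by-video matrix and ranks the videos by similarity. The clip names and vectors are invented for illustration.

```python
import math


def cosine_similarity(t, v):
    """Cosine of the angle between two equally long vectors."""
    dot = sum(a * b for a, b in zip(t, v))
    norm_t = math.sqrt(sum(a * a for a in t))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_t == 0 or norm_v == 0:
        return 0.0  # Convention chosen here: a zero vector matches nothing.
    return dot / (norm_t * norm_v)


def rank_videos(query, videos):
    """Video names ordered from most to least similar to the query."""
    return sorted(videos,
                  key=lambda name: cosine_similarity(query, videos[name]),
                  reverse=True)


# Hypothetical columns of a term-by-video matrix (k = 3 terms).
videos = {
    "clip_a": [1, 0, 2],
    "clip_b": [0, 3, 0],
    "clip_c": [1, 1, 0],
}
print(rank_videos([1, 0, 2], videos))
```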

Database Population Phase (cont.)

Enterprise databases like Oracle introduce a new object type, ORDImage, which contains four different visual attributes: global color, local color, texture, and shape.

ORDImageIndex provides a multidimensional index structure to speed up retrieval of stored feature vectors.

Database Population Phase (cont.)

Oracle example of joining two image tables, Picture1 and Picture2:

    -- Each table stores an image together with its signature.
    CREATE TABLE Picture1(
        author VARCHAR2(30),
        description VARCHAR2(200),
        photo1 ORDSYS.ORDImage,
        photo1_sig ORDSYS.ORDImageSignature
    );

    CREATE TABLE Picture2(
        mydescription VARCHAR2(200),
        photo2 ORDSYS.ORDImage,
        photo2_sig ORDSYS.ORDImageSignature
    );

    -- Return the description pairs whose signatures match within threshold 20.
    SELECT p1.description, p2.mydescription
    FROM Picture1 p1, Picture2 p2
    WHERE ORDSYS.IMGSimilar(p1.photo1_sig, p2.photo2_sig,
        'color="0.6" texture="0.2" shape="0.1" location="0.1"', 20) = 1;

Note: If the weighted sum of the distances of the visual attributes is less than or equal to the threshold, the image is matched.
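The matching rule in the note can be sketched outside the database as follows. This is not Oracle's implementation. It assumes the per-attribute distances have already been computed on a 0 to 100 scale, and the weights and threshold simply mirror the query on this slide.

```python
def weighted_distance(distances, weights):
    """Weighted average of per-attribute distances."""
    total_weight = sum(weights.values())
    return sum(weights[name] * distances[name] for name in weights) / total_weight


def is_similar(distances, weights, threshold):
    """True when the weighted distance does not exceed the threshold."""
    return weighted_distance(distances, weights) <= threshold


weights = {"color": 0.6, "texture": 0.2, "shape": 0.1, "location": 0.1}

# Hypothetical distances between two image signatures.
close_pair = {"color": 10.0, "texture": 30.0, "shape": 50.0, "location": 20.0}
far_pair = {"color": 40.0, "texture": 30.0, "shape": 50.0, "location": 20.0}

print(is_similar(close_pair, weights, threshold=20))
print(is_similar(far_pair, weights, threshold=20))
```

Because color carries the largest weight, a change in the color distance alone is enough to move a pair across the threshold.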

Image Retrieval Phase

Query by example (QBE)
Allows the user to select a sample image to search with.

[Wang, Li, Wiederhold, 2001]

Image Retrieval Phase (cont.)

[Li, Shapiro 2004]

Yet Another CBVR Application Interface

Image Retrieval Phase (cont.)

Query by color anglogram
Histogram intersection is a fairly standard metric for comparing histogram-based features.

The image is divided into 5 sub-images: upper right, upper left, lower right, lower left, and the center image.
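The following sketch illustrates histogram intersection over the five sub-images. It assumes each image is a 2D list of already quantized color-bin indices, at least 2 by 2 in size. The way the center region is cropped and the averaging of the five scores are illustrative choices, not details taken from the slide.

```python
def histogram(region, bins):
    """Normalized histogram of a 2D region of bin indices."""
    counts = [0] * bins
    total = 0
    for row in region:
        for value in row:
            counts[value] += 1
            total += 1
    return [count / total for count in counts]


def five_regions(image):
    """Upper-left, upper-right, lower-left, lower-right, and center crops."""
    h, w = len(image), len(image[0])

    def crop(top, bottom, left, right):
        return [row[left:right] for row in image[top:bottom]]

    return [
        crop(0, h // 2, 0, w // 2),
        crop(0, h // 2, w // 2, w),
        crop(h // 2, h, 0, w // 2),
        crop(h // 2, h, w // 2, w),
        crop(h // 4, h - h // 4, w // 4, w - w // 4),
    ]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def region_similarity(image_a, image_b, bins):
    """Mean histogram intersection over the five sub-images."""
    scores = [
        histogram_intersection(histogram(ra, bins), histogram(rb, bins))
        for ra, rb in zip(five_regions(image_a), five_regions(image_b))
    ]
    return sum(scores) / len(scores)


# Two hypothetical 4x4 images with the same colors in different places.
top_bottom = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
left_right = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]

print(histogram_intersection(histogram(top_bottom, 2), histogram(left_right, 2)))
print(region_similarity(top_bottom, left_right, 2))
```

The two toy images have identical global histograms, so only the sub-image comparison can tell them apart, which is the motivation for splitting the image.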

Image Retrieval Phase (cont.)

Query by color anglogram (cont.)
Convert RGB to HSV [wikipedia]

Global and sub-image histograms form the LSI matrix.

[Zhao & Grosky 2002]
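For the RGB to HSV step, Python's standard `colorsys` module can serve as a quick illustration. The sketch assumes 8-bit RGB input and reports hue in degrees; the helper name `rgb_to_hsv_pixel` is made up for this example.

```python
import colorsys


def rgb_to_hsv_pixel(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation 0..1, value 0..1)."""
    # colorsys works on floats in [0, 1] and returns hue as a fraction of a turn.
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


for name, rgb in [("red", (255, 0, 0)), ("green", (0, 255, 0)), ("gray", (128, 128, 128))]:
    print(name, rgb_to_hsv_pixel(*rgb))
```

Working in HSV separates hue and saturation from brightness, which is why color histograms are often built in this space.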

Image Retrieval Phase (cont.)

Sample results:

Ancient Towers

Ancient Columns

Horses Figure

[Zhao & Grosky 2002]

Image Retrieval Phase (cont.)

Retrieve by shape anglogram
Each image is divided into 256 blocks.
Each block is approximated by its hue and saturation values.
Corresponding feature points are mapped perceptually based on the saturation value.
The feature histogram is obtained by measuring the largest angle of the nearest feature points.
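The slide does not specify how neighboring feature points are grouped. The sketch below is a simplified stand-in that assumes each feature point forms a triangle with its two nearest neighbors and that the largest interior angle of that triangle is binned into a histogram. It requires at least three distinct points and Python 3.8 or later for `math.dist`; the function names and bin width are illustrative.

```python
import math


def largest_angle(p, q, r):
    """Largest interior angle, in degrees, of the triangle p-q-r."""
    a, b, c = math.dist(q, r), math.dist(p, r), math.dist(p, q)
    angles = []
    for opposite, s1, s2 in ((a, b, c), (b, a, c), (c, a, b)):
        # Law of cosines; clamp to [-1, 1] to absorb rounding error.
        cos_value = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cos_value)))))
    return max(angles)


def shape_anglogram(points, bin_width=30):
    """Histogram over 0..180 degrees of largest angles, one triangle per point."""
    bins = [0] * math.ceil(180 / bin_width)
    for p in points:
        # Assumption: the triangle is formed with the two nearest other points.
        others = sorted((q for q in points if q != p),
                        key=lambda q: math.dist(p, q))
        q, r = others[0], others[1]
        index = min(int(largest_angle(p, q, r) // bin_width), len(bins) - 1)
        bins[index] += 1
    return bins


# Hypothetical feature points forming a 3-4-5 right triangle.
points = [(0, 0), (4, 0), (0, 3)]
print(largest_angle(*points))
print(shape_anglogram(points))
```

Because angles do not change under translation, rotation, or uniform scaling of the point set, a histogram of this kind is a natural shape descriptor.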

Image Retrieval Phase (cont.)

Query by shape anglogram (cont.): Demo

[Zhao & Grosky 2002]

Image Retrieval Phase (cont.)

Query by shape anglogram sample output:

[Zhao & Grosky 2002]

Image Retrieval Phase (cont.)

Query by color combined with a selection of other categories.
Use a training dataset of categories: sky, sun, land, water, boat, grass, horse, rhino, bird, human, pyramid, column, tower, sphinx, and snow.

Category selections such as sun (5%), grass (15%), and sky (20%) are combined with the LSI matrix to return better results.

Future Work

Handle multi-layer images.
Include a human-interactive relevance feedback system for retrieval.
Eliminate biased objects without affecting performance.

Summary

A Content-Based Video Retrieval system contains two phases:

Database population phase
• Shot boundary detection
• Key frames selection
• Extract low-level features

Image retrieval phase
• Query by example
• Query by color anglogram
• Query by shape anglogram
• Query by color anglogram and category bit

Conclusion

Content-Based Video Retrieval is not yet a sound system.

Video streams will become mainstream in the years to come.

We would be better off with an efficient CBVR search engine ready.

Many areas still need to be improved.

The End

Thank you.