New Challenges in Semantic Concept Detection


M.-F. Weng, C.-K. Chen, Y.-H. Yang, R.-E. Fan, Y.-T. Hsieh, Y.-Y. Chuang, W. H. Hsu, and C.-J. Lin

National Taiwan University

2

Preliminaries for Semantic Concept Detection

• What are the preliminaries for building a semantic concept detection system?
  – A lexicon of well-defined concepts
  – Training resources
    • Video data
    • Annotations
    • Features
  – Tools
    • Tagging or labeling tools (e.g., CMU and IBM tools)
    • Feature extractors
    • Machine learning tools (e.g., LIBSVM)
    • Tools tailored for semantic concept detection

3

Semantic Concepts

[Figure: concept lexicons]
• MediaMill 101
• Columbia374
• LSCOM 449
• TRECVID-2006: LSCOM-Lite (39 concepts)
• TRECVID-2005: 10 concepts

4

Video Data Sets

[Figure: bar chart of video length (hrs, 0 to 200) for the development and test sets of TV 03, TV 04, TV 05, TV 06, and TV 07]

• ~190 hours of news video
• ~330 hours of multi-lingual broadcast news video
• 100 hours of Sound and Vision video
• …

5

Annotations

[Figure: bar chart of video length (hrs, 0 to 200) for the development and test sets of TV 05, TV 06, and TV 07, with portions labeled "NIST Truth Judgments" and "Common Annotations"]

6

Features, Detectors, Scores

• VIREO-374
  – Features: color moment, wavelet texture, keypoint feature
  – Detectors: VIREO-374 detectors
  – Scores: scores of TV 07 dataset
• Columbia374
  – Features: EDH, GBR, GCM
  – Detectors: Columbia374 detectors
  – Scores: Columbia374 scores of TV 06/07 dataset
• MediaMill Baseline
  – Features: visual feature, text feature
  – Detectors: 5 sets of 101 classifiers
  – Scores: scores of TV 05/06 dataset

7

Available Resources

• Concept definition is sufficient

• Training resources are plentiful

• No feature extractors and tailored tools available

[Figure: diagram of the preliminaries. Well-defined concepts; Training resources: video data, annotations, features; Tools: tagging tools, feature extractors, machine learning tools, tailored tools]

8

The New Challenges

• Challenge 1: Easy and Efficient Tools
  – L datasets, M concepts, and N features imply L*M*N classifiers
  – Each classifier has to consider many parameters
  – Time is very limited for validating each parameter and training all classifiers
• Challenge 2: Resource Exploitation or Reuse
  – Resources are precious
  – Existing resources are potentially useful for new datasets
  – Plentiful resources have not been fully utilized
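
To make the scale of Challenge 1 concrete, the sketch below enumerates one training job per (dataset, concept, feature) combination. The dataset, concept, and feature names and the parameter grid are illustrative assumptions, not the actual experimental setup.

```python
from itertools import product

# Illustrative names only; the real lexicon and feature set are much larger.
datasets = ["TV05", "TV06", "TV07"]             # L = 3
concepts = ["sports", "weather", "maps"]        # M = 3
features = ["color_moment", "wavelet_texture"]  # N = 2

# One classifier per (dataset, concept, feature) combination: L * M * N in total.
jobs = list(product(datasets, concepts, features))

# Each classifier also needs its own parameter validation, which multiplies the cost.
param_grid = list(product([1, 10, 100], [0.01, 0.1]))  # hypothetical (C, gamma) pairs
total_trainings = len(jobs) * len(param_grid)

print(len(jobs), total_trainings)
```

Even this toy setting needs 18 classifiers and 108 training runs, which is why both the per-run cost and the size of the parameter grid matter.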

9

Facing the New Challenges

• Challenge 1:

– Extended LIBSVM to improve training efficiency

– Developed an efficient and easy-to-use toolkit tailored for semantic concept detection

• Challenge 2:

– Reused classifiers of past data to improve accuracy by late aggregation

– Exploited contextual relationship and temporal dependency from annotations to boost accuracy

10

Facing the New Challenges

• Challenge 1: Easy and Efficient Tools
  – Extended LIBSVM to improve training efficiency
  – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
• Challenge 2
  – Reused classifiers of past data to improve accuracy by late aggregation
  – Exploited contextual relationship and temporal dependency from annotations to boost accuracy

11

A Tailored Toolkit

• We extended LIBSVM in three aspects for semantic concept detection:
  – Using dense representations
  – Exploiting parallelism across independent concepts, features, and SVM model parameters
  – Narrowing down the parameter search to a safe range
• Overall, the training time of our baseline was reduced from about 14 days to about 3 days
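
The parallelism and narrowed parameter search described above can be sketched as follows. This is a minimal illustration, not the toolkit itself: `train_and_validate` is a hypothetical stand-in for training one SVM, and the concept names, feature names, and parameter ranges are assumed values.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def train_and_validate(job):
    """Hypothetical stand-in for training one SVM and returning a validation score.

    A real version would call an SVM trainer on dense feature vectors; here the
    score is a toy function that peaks at C=10, gamma=0.1 so the sketch runs anywhere.
    """
    concept, feature, C, gamma = job
    score = 1.0 / (1.0 + abs(C - 10) / 10 + abs(gamma - 0.1) * 10)
    return (concept, feature), (C, gamma), score

concepts = ["sports", "weather"]                 # illustrative
features = ["color_moment", "wavelet_texture"]   # illustrative
C_range = [1, 10, 100]                           # narrowed search range (assumed values)
gamma_range = [0.01, 0.1, 1.0]

# Every (concept, feature, C, gamma) job is independent, so all can run concurrently.
jobs = list(product(concepts, features, C_range, gamma_range))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train_and_validate, jobs))

# Keep the best parameter pair for each (concept, feature) classifier.
best = {}
for key, params, score in results:
    if key not in best or score > best[key][1]:
        best[key] = (params, score)
```

A thread pool keeps the sketch portable; for CPU-bound SVM training, a process pool or a cluster scheduler would be the more likely choice.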

12

Facing the New Challenges

• Challenge 1
  – Extended LIBSVM to improve training efficiency
  – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
• Challenge 2: Resource Exploitation or Reuse
  – Reused classifiers of past data to improve accuracy by late aggregation
  – Exploited contextual relationship and temporal dependency from annotations to boost accuracy

13

Reuse Past Data

• Early aggregation
  – Classifiers must be re-trained
  – Causes considerable training time
• Late aggregation
  – Simple and direct
  – May be biased

14

Late Aggregation

• We adopt late aggregation to reuse existing classifiers with two strategies:
  – Equally Average Aggregation
    • Simply average the scores of past and newly trained classifiers
  – Concept-dependent Weighted Aggregation
    • Use concept-dependent weights to aggregate classifiers
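
The two late-aggregation strategies can be sketched as below. The scores and the per-concept weights are made-up values for illustration; the slides do not say how the concept-dependent weights are obtained, so deriving them from validation accuracy is only an assumption.

```python
def average_aggregation(score_lists):
    """Equally average the scores that several classifiers give to each shot."""
    n = len(score_lists)
    return [sum(shot_scores) / n for shot_scores in zip(*score_lists)]

def weighted_aggregation(score_lists, weights):
    """Combine classifier scores with per-classifier weights chosen for one concept."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, shot_scores)) / total
        for shot_scores in zip(*score_lists)
    ]

# Scores for three shots from a past-data classifier and a newly trained one.
past_scores = [0.2, 0.6, 0.9]
new_scores = [0.4, 0.8, 0.5]

# Hypothetical concept-dependent weights, e.g. derived from validation accuracy.
concept_weights = {"sports": [0.25, 0.75], "weather": [0.5, 0.5]}

avg = average_aggregation([past_scores, new_scores])
weighted = weighted_aggregation([past_scores, new_scores], concept_weights["sports"])
```

With equal weights the weighted form reduces to the plain average, so the two strategies differ only in whether the weights depend on the concept.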

15

Aggregation Benefits

[Figure: bar chart of infAP (0 to 0.3) per concept and overall, comparing TV07 Classifiers, Average Aggregation, and Weighted Aggregation. Concepts: Flag-US, Police_Security, Explosion_Fire, Airplane, Charts, Military, Desert, Truck, Weather, Mountain, People-Marching, Sports, Boat_Ship, Maps, Meeting, Computer_TV-screen, Car, Animal, Office, Waterscape_Waterfront.]

Overall improvement ratio:
• Average Aggregation: 22%
• Weighted Aggregation: 30%

16

Facing the New Challenges

• Challenge 1
  – Extended LIBSVM to improve training efficiency
  – Developed an efficient and easy-to-use toolkit tailored for semantic concept detection
• Challenge 2: Resource Exploitation or Reuse
  – Reused classifiers of past data to improve accuracy by late aggregation
  – Exploited contextual relationship and temporal dependency from annotations to boost accuracy

17

Observation in Annotations

[Figure: annotation matrix of a lexicon of concepts (rows) against a sequence of video shots (columns), labeled "Contextual relationship" and "Temporal dependency"]

concept    shots 1 to 8
car        1 1 1 1 1 1 1 1
outdoor    1 1 1 1 1 1 1 1
building   1 0 0 1 1 1 1 0
sky        0 1 1 1 0 1 1 0
people     0 0 0 1 0 1 1 0
urban      1 1 0 1 1 1 1 1
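
As a rough illustration of the two observations, the sketch below reads the annotation matrix from this slide and computes one simple statistic for each. Both statistics (a within-shot conditional frequency and an adjacent-shot agreement rate) are choices made here for illustration, not measures named in the slides.

```python
# Annotation matrix from the slide: one row per concept, one column per shot.
annotations = {
    "car":      [1, 1, 1, 1, 1, 1, 1, 1],
    "outdoor":  [1, 1, 1, 1, 1, 1, 1, 1],
    "building": [1, 0, 0, 1, 1, 1, 1, 0],
    "sky":      [0, 1, 1, 1, 0, 1, 1, 0],
    "people":   [0, 0, 0, 1, 0, 1, 1, 0],
    "urban":    [1, 1, 0, 1, 1, 1, 1, 1],
}

def cooccurrence(a, b):
    """Estimate P(b = 1 | a = 1) within the same shot (contextual relationship)."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    count_a = sum(a)
    return both / count_a if count_a else 0.0

def persistence(labels):
    """Fraction of adjacent shot pairs with the same label (temporal dependency)."""
    pairs = list(zip(labels, labels[1:]))
    return sum(1 for x, y in pairs if x == y) / len(pairs)

sky_given_people = cooccurrence(annotations["people"], annotations["sky"])
urban_persistence = persistence(annotations["urban"])
```

In this small matrix every shot labeled people is also labeled sky, and the urban label agrees across 5 of the 7 adjacent shot pairs.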

18

Post-processing Framework

[Figure: framework diagram with three phases (detecting, processing, mining). Blocks: Video Sequence, Video Segmentation, Feature Extraction, Concept Detection, Temporal Filtering, Concept Reranking, Combination, Shot Ranking, Annotation, Temporal Dependency Mining, Temporal Filter Design, Unsupervised Contextual Fusion]

19

Temporal Filtering

[Figure: the post-processing framework diagram, repeated]

20

Temporal Dependency

• Different concepts have different levels of dependency at different temporal distances
  – E.g., sports, weather, maps, explosion

[Figure: chi-square test statistic $\chi^2_k$ (0 to 10000) versus temporal distance k (1 to 20) for sports, weather, maps, and explosion]
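
One way to measure dependency at temporal distance k is a chi-square test of independence between the label of shot t and the label of shot t+k. The sketch below uses the standard 2x2 contingency-table statistic; whether the authors computed their statistic in exactly this way is an assumption, and the label sequence is made up.

```python
def chi_square_at_distance(labels, k):
    """Chi-square statistic for independence of l_t and l_{t+k} in a binary sequence."""
    pairs = list(zip(labels, labels[k:]))
    n = len(pairs)
    n11 = sum(1 for a, b in pairs if a == 1 and b == 1)
    n10 = sum(1 for a, b in pairs if a == 1 and b == 0)
    n01 = sum(1 for a, b in pairs if a == 0 and b == 1)
    n00 = n - n11 - n10 - n01
    # Product of the four marginal totals of the 2x2 table.
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a constant sequence carries no evidence of dependency
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

# Made-up annotation sequence for one concept: labels arrive in runs of five shots.
labels = [1] * 5 + [0] * 5 + [1] * 5 + [0] * 5
stats = {k: chi_square_at_distance(labels, k) for k in range(1, 6)}
```

Plotting `stats` against k for each concept would give curves of the kind shown in the figure, with larger values indicating stronger dependency at that distance.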

21

Temporal Filter

[Figure: an SVM classifier scores each shot in the window x_{t-k}, ..., x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2}, ..., x_{t+k}, giving P(l_{t-k} = 1 | x_{t-k}), ..., P(l_t = 1 | x_t), ..., P(l_{t+k} = 1 | x_{t+k}). The diagram pairs these outputs with the estimates P(l_t = 1 | x_{t-k}), ..., P(l_t = 1 | x_{t+k}) for shot t, which are combined with weights w_k, ..., w_2, w_1, w_0, w_1, w_2, ..., w_k.]

$\hat{P}(l_t = 1) = \sum_{k=0}^{d} w_k \left[ P(l_t = 1 \mid \mathbf{x}_{t-k}) \right]$
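
The filter on this slide is a weighted sum of classifier outputs over nearby shots, with weight w_k at temporal distance k up to a maximum distance d. A minimal sketch follows. Several details are assumptions made here: the classifier score of a neighboring shot stands in for its estimate of the label of shot t, out-of-range neighbors are skipped, the weights are renormalized, and the `two_sided` switch adds the shots after t that the diagram shows.

```python
def temporal_filter(scores, weights, two_sided=False):
    """Smooth per-shot classifier scores with a weighted sum over nearby shots.

    scores[t] is the classifier output for shot t; weights[k] is w_k for
    temporal distance k. Neighbors that fall outside the sequence are skipped
    and the used weights are renormalized (a boundary rule assumed here).
    """
    filtered = []
    for t in range(len(scores)):
        num, den = 0.0, 0.0
        for k, w in enumerate(weights):
            # Distance 0 is the shot itself; otherwise look back, and optionally ahead.
            offsets = [-k] if (k == 0 or not two_sided) else [-k, k]
            for off in offsets:
                j = t + off
                if 0 <= j < len(scores):
                    num += w * scores[j]
                    den += w
        filtered.append(num / den if den else scores[t])
    return filtered

# A low score surrounded by high ones is pulled up after filtering.
raw = [0.9, 0.8, 0.1, 0.85, 0.9]
smoothed = temporal_filter(raw, weights=[0.5, 0.25], two_sided=True)
```

In practice the weights would come from the temporal dependency mined for each concept; here they are arbitrary.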

22

Filtering Prediction

• A sequence of shots for predicting sports
  – Classifier prediction results
  – After temporal filtering

[Figure: two plots of prediction scores over a sequence of 20 shots, one for the classifier prediction results and one after temporal filtering, annotated "Misclassification picked up" and "Rank rose"]

23

Concept Reranking

[Figure: the post-processing framework diagram, repeated]

24

Initial ranking list produced by a baseline method
Target concept: 'boat' (search or detection)

[Figure: initial list]

25

Step 1. Randomly split the initial list into training and test sets

26

Step 2. Learn to maintain the ranking orders, using shot pairs from the training set
1. Related concepts (e.g., ocean, waterscape)
2. Importance of each concept

27

Step 3. Context fusion on the test data, using the related concepts (ocean, waterscape)

28

Step 4. Merge

[Figure: the reranked test set and the training part of the initial list are merged]
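
The four steps can be sketched end to end as below. This is an illustration under stated assumptions, not the authors' algorithm: a pairwise perceptron is swapped in as the learner for Step 2, and the merge rule in Step 4 (refilling the slots of test shots in the initial ranking with the reranked test shots) is a guess at one reasonable reading of the slide.

```python
import random

def learn_concept_weights(train, epochs=5):
    """Pairwise perceptron: weight related-concept scores so they keep the initial order.

    Each training item is (initial_score, [related concept scores]).
    """
    w = [0.0] * len(train[0][1])
    for _ in range(epochs):
        for hi_score, hi_feats in train:
            for lo_score, lo_feats in train:
                if hi_score <= lo_score:
                    continue
                diff = [a - b for a, b in zip(hi_feats, lo_feats)]
                if sum(wi * d for wi, d in zip(w, diff)) <= 0:
                    w = [wi + d for wi, d in zip(w, diff)]  # fix a misordered pair
    return w

def rerank(initial, seed=0):
    """initial: list of (shot_id, initial_score, related_concept_scores)."""
    shots = initial[:]
    random.Random(seed).shuffle(shots)
    half = len(shots) // 2
    train, test = shots[:half], shots[half:]                    # Step 1: random split
    w = learn_concept_weights([(s, f) for _, s, f in train])    # Step 2: learn weights
    context = {sid: sum(wi * fi for wi, fi in zip(w, f))        # Step 3: context fusion
               for sid, _, f in test}
    ranked_test = sorted(test, key=lambda x: context[x[0]], reverse=True)
    # Step 4 (assumed rule): training shots keep their initial positions; the
    # positions held by test shots are refilled in reranked order.
    order = sorted(initial, key=lambda x: x[1], reverse=True)
    test_ids = {sid for sid, _, _ in test}
    refill = iter(ranked_test)
    merged = [next(refill) if item[0] in test_ids else item for item in order]
    return [sid for sid, _, _ in merged], w

# Toy initial list for 'boat'; the two features stand for hypothetical related-concept scores.
initial = [(i, i / 10, [i / 10, 1 - i / 10]) for i in range(8)]
ranking, weights = rerank(initial)
```

The learned weights play the role of the "importance of each concept": a positive weight marks a concept whose score rises with the target concept, a negative weight one that falls.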

29

Combination

[Figure: the post-processing framework diagram, repeated]

30

Post-processing Benefits

[Figure: bar chart of infAP (0 to 0.3) per concept and overall, comparing Classification, Concept Reranking, Temporal Filtering, and Parallel Combination. Concepts: Flag-US, Police_Security, Explosion_Fire, Military, Airplane, Weather, Charts, Truck, Mountain, Meeting, People-Marching, Boat_Ship, Sports, Desert, Computer_TV-screen, Animal, Car, Maps, Office, Waterscape_Waterfront. Annotations on the chart: +45%, +29%, +37%, +17%.]

Overall improvement ratio:
• Parallel Combination: +10%

31

Conclusion

• We reduce the training time of detectors by using a toolkit tailored for semantic concept detection
• The proposed aggregation methods reuse the classifiers of past data and can boost detection accuracy
• Our post-processing approaches exploit existing resources and can further improve detection results

32

Thank You For Your Attention