Post on 24-Feb-2016
Utilizing Video Ontology for Fast and Accurate Query-by-Example Retrieval
Kimiaki Shirahama, Graduate School of Economics, Kobe University
Kuniaki Uehara, Graduate School of System Informatics, Kobe University
Need for Video Ontology
QBE (Query by Example):
1. Represent a query by providing example shots
2. Retrieve shots similar to example shots in terms of features
The similarity on features does not agree with the similarity on semantic contents!
[Figure: example shots and retrieved shots (relevant and irrelevant) for the query "Tall buildings are shown"; characterized by few edges corresponding to sky regions (overfitting)]
Our Use of Video Ontology
Construct a video ontology as a knowledge base in QBE (a formal and explicit specification of concepts, concept properties and their relations)
→ Filter shots that are clearly irrelevant to the query
Query: Tall buildings are shown
Concept detection scores are already assigned to all shots.
[Figure: example shots, retained shots and filtered shots, each labeled with its concept detection scores]
Building: 0.9, Urban: 0.7, Person: 0.1
Building: 0.8, Urban: 0.8, Person: 0
Building: 0.9, Urban: 0.6, Person: 0
Building: 0.8, Urban: 0.9, Person: 0.2
Building: 0.1, Urban: 0.0, Person: 0.8
Building: 0.0, Urban: 0.0, Person: 0.1
Building: 0.0, Urban: 0.1, Person: 0
Building: 0.9, Urban: 0.7, Person: 0.1
Overview of Our Approach
1. Video ontology construction: Organize concepts into a meaningful structure
2. Concept selection: Select concepts related to a given query
3. Shot filtering: Filter irrelevant shots based on selected concepts
   a. Filter as many irrelevant shots as possible
   b. Retain as many relevant shots as possible
[Figure: a concept vocabulary (Building, Outdoor, Tower, Window, House) is organized into a video ontology; for the query "Buildings are shown", concept selection picks Buildings, Outdoor, Tower, etc., and shot filtering splits shots into retained and filtered]
Large-Scale Concept Ontology for Multimedia (LSCOM)
The most popular ontology for video retrieval: 1,000 concepts in the broadcast news video domain
[Figure: LSCOM hierarchy rooted at Broadcast_News, with nodes including Location, People, Objects, Activity and events, Program, Graphics, Office, Meeting, Crowd, Face, Bus, Vehicle, Weather, Sports, People related, Walk/Run, Maps and Charts]
Concept properties and relations are clearly insufficient!
→ Organize LSCOM concepts into a meaningful structure!
Much research effort has gone into developing effective detection methods for LSCOM concepts
→ Use detection scores of 374 concepts provided by City University of Hong Kong
Concept Organization Approach
Inductive approach
• Use annotated video collection (training data)
• Only degrees of relatedness are represented
→ Limited kinds of reasoning (e.g. majority voting, linear combination)
Deductive approach (our approach)
• Manually organize LSCOM concepts
• Various properties and relations are represented
→ Various kinds of reasoning!
[Figure: example concept graphs with concepts such as OBJECT, CONSTRUCTION, Building, Tower, House, Window, Outdoor, Vehicle, Ground_Vehicle, WITH_PERSON, Car, Bicycle, Motorcycle and Person, and relations partOf, locatedAt and appearWith]
(Wei, 2011)
Our Video Ontology
Shot with 8 attributes (categories of semantic contents)
Disjoint partition Visual co-occurrence
Define new concepts to construct a meaningful structure
appearWith
Video Ontology Construction
1. Disjoint partition
An instance cannot belong to more than one subconcept!
[Figure: Vehicle, Ground_Vehicle, Car, Bus]
2. Visual co-occurrence
Some concepts are frequently shown in the same shots
Not all concepts should be organized into a single hierarchy
[Figure: Ground_Vehicle, WITH_PERSON, Bicycle, Motorcycle, Car, Person]
Concept Selection
Query: Buildings are shown
1. Select concepts matching words in the text description of a query
2. Select their subconcepts, and concepts specified as properties
3. Select concepts with properties matching words in the text description of the query
4. Validate selected concepts using example shots
→ Concepts detected with high detection scores are selected
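The four selection steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy ontology contents, the naive plural stripping in `normalize`, the restriction to direct subconcepts, and the 0.5 validation threshold are all assumptions made for the example.

```python
# Hypothetical toy ontology; concept names follow the slides, structure is assumed.
SUBCONCEPTS = {"Building": ["Tower", "House", "Office_Building"]}
PROPERTIES = {
    "Building": {"locatedAt": ["Outdoor"]},
    "Window": {"partOf": ["Building"]},
}


def normalize(word):
    # Naive singularization, good enough for this toy example only.
    return word.lower().rstrip("s")


def select_concepts(query, example_shots, threshold=0.5):
    words = {normalize(w) for w in query.split()}

    # Collect every concept name mentioned anywhere in the toy ontology.
    concepts = set(SUBCONCEPTS) | set(PROPERTIES)
    for subs in SUBCONCEPTS.values():
        concepts.update(subs)
    for props in PROPERTIES.values():
        for targets in props.values():
            concepts.update(targets)

    # Step 1: concepts whose names match words in the query text.
    matched = {c for c in concepts if normalize(c) in words}
    selected = set(matched)

    # Step 2: their (direct) subconcepts and the concepts given as properties.
    for c in matched:
        selected.update(SUBCONCEPTS.get(c, []))
        for targets in PROPERTIES.get(c, {}).values():
            selected.update(targets)

    # Step 3: concepts that have a property value matching a query word.
    for c, props in PROPERTIES.items():
        for targets in props.values():
            if any(normalize(t) in words for t in targets):
                selected.add(c)

    # Step 4: validate with example shots (mean detection score, assumed rule).
    def mean_score(c):
        return sum(shot.get(c, 0.0) for shot in example_shots) / len(example_shots)

    return {c for c in selected if mean_score(c) >= threshold}


EXAMPLE_SHOTS = [  # made-up detection scores for two example shots
    {"Building": 0.9, "Outdoor": 0.8, "Tower": 0.6, "Window": 0.7,
     "House": 0.1, "Office_Building": 0.2},
    {"Building": 0.8, "Outdoor": 0.9, "Tower": 0.7, "Window": 0.5,
     "House": 0.0, "Office_Building": 0.3},
]
print(sorted(select_concepts("Buildings are shown", EXAMPLE_SHOTS)))
```

In this sketch, House and Office_Building are reached in step 2 but dropped in step 4 because the example shots give them low scores.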
Shot Filtering Using Concept Relations
Query: Person appears with computers
Selected concepts: Person, Female_Person, Computer (indoor object), Room, Laboratory (indoor locations)
Simple filtering: Filter shots if none of the selected concepts are detected.
(Wrongly retained shot)
Shots where Outdoor is detected should be filtered!
→ Use concept relations to efficiently filter irrelevant shots
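As a rough sketch of the simple filtering rule, the snippet below retains a shot when at least one selected concept is detected and filters it otherwise. The rule "detected means score >= 0.5" and the shot identifiers are assumptions for illustration; the scores are taken from the earlier "Tall buildings are shown" example.

```python
def simple_filter(shots, selected_concepts, threshold=0.5):
    """Split shots into (retained, filtered) lists of shot ids.

    A shot is filtered only if none of the selected concepts is detected.
    The detection threshold is an assumed parameter, not from the slides.
    """
    retained, filtered = [], []
    for shot_id, scores in shots.items():
        detected = any(scores.get(c, 0.0) >= threshold for c in selected_concepts)
        (retained if detected else filtered).append(shot_id)
    return retained, filtered


SHOTS = {  # hypothetical shot ids with detection scores from the slides
    "shot_a": {"Building": 0.9, "Urban": 0.7, "Person": 0.1},
    "shot_b": {"Building": 0.1, "Urban": 0.0, "Person": 0.8},
    "shot_c": {"Building": 0.0, "Urban": 0.0, "Person": 0.1},
}
retained, filtered = simple_filter(SHOTS, ["Building", "Urban"])
print(retained, filtered)
```

Because a single detected concept is enough to retain a shot, this rule is permissive, which is why irrelevant shots can be wrongly retained.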
Use of Concept Relations
Hierarchical relations
Query: Buildings are shown
IF Office_Building is detected, THEN Building should be detected; otherwise, Office_Building is wrongly detected
Sibling relations (Disjoint partition)
Query: Person appears with computers → INDOOR needs to be detected
LOCATION: INDOOR, Outdoor, Underwater, Outer_Space
Outdoor, Underwater and Outer_Space should not be detected
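The two kinds of relations can be sketched as simple consistency checks. This is only an illustration under assumptions: the 0.5 detection threshold, the dictionary layout and the function names are hypothetical, and the actual filtering rules may differ.

```python
# Hypothetical encoding of the relations shown on the slide.
SUPERCONCEPT = {"Office_Building": "Building"}
LOCATION_PARTITION = ["INDOOR", "Outdoor", "Underwater", "Outer_Space"]


def detected_concepts(scores, threshold=0.5):
    """Detected concepts after the hierarchical check.

    A subconcept detection is discarded as a wrong detection when its
    superconcept is not detected as well.
    """
    detected = {c for c, s in scores.items() if s >= threshold}
    return {
        c for c in detected
        if c not in SUPERCONCEPT or SUPERCONCEPT[c] in detected
    }


def passes_sibling_check(scores, required, partition, threshold=0.5):
    """False if a disjoint sibling of the required concept is detected."""
    detected = detected_concepts(scores, threshold)
    return not any(c in detected for c in partition if c != required)


# Office_Building without Building is treated as a wrong detection.
print(detected_concepts({"Office_Building": 0.8, "Building": 0.1}))
# A shot with Outdoor detected fails the INDOOR requirement.
print(passes_sibling_check({"Outdoor": 0.9, "Person": 0.8},
                           "INDOOR", LOCATION_PARTITION))
```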
Uncertainty in Concept Detection
Indoor fails to be detected
Query: Person appears with computers → INDOOR needs to be detected
Ontologies represent a priori knowledge that humans take as true.
→ Lack support for uncertainty
Traditional ontology reasoning assumes … (Russell, 2003)
1. Locality: If A⇒B, then B is concluded from A without considering any other rules.
2. Detachment: Once B is proven, it is used regardless of how it was derived.
→ Does not consider the uncertainty of a hypothesis
3. Truth functionality: A complex rule can be examined from the truth of its components.
→ Does not consider the uncertainty of combining multiple hypotheses
How to manage erroneous concept detection results?
Dempster-Shafer Theory (1/2)
1. Locality: If A⇒B, then B is concluded from A without considering any other rules.
2. Detachment: Once B is proven, it is used regardless of how it was derived.
→ Does not consider the uncertainty of a hypothesis
⇒ Represent the degree of belief of a hypothesis
Dempster-Shafer Theory (DST): Generalization of Bayesian theory where the probability of a hypothesis is defined based on its degree of belief
Degree of belief that a shot is certainly relevant: m({relevance}) = 0.2
Degree of belief that its relevance is uncertain: m({relevance, irrelevance}) = 0.6
Degree of belief that it is certainly irrelevant: m({irrelevance}) = 0.2
Query: Person appears with computers → INDOOR needs to be detected
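A mass function like the one above can be represented as a mapping from subsets of {relevance, irrelevance} to masses. The sketch below also computes belief and plausibility, which are standard Dempster-Shafer quantities that the slide does not mention; the variable and function names are assumptions for illustration.

```python
RELEVANT = frozenset({"relevance"})
IRRELEVANT = frozenset({"irrelevance"})
UNCERTAIN = RELEVANT | IRRELEVANT  # the whole frame: relevance is unknown

# Masses from the slide; they must sum to 1.
mass = {RELEVANT: 0.2, UNCERTAIN: 0.6, IRRELEVANT: 0.2}
assert abs(sum(mass.values()) - 1.0) < 1e-9


def belief(m, hypothesis):
    # Total mass committed to subsets of the hypothesis (lower bound).
    return sum(v for s, v in m.items() if s <= hypothesis)


def plausibility(m, hypothesis):
    # Total mass not contradicting the hypothesis (upper bound).
    return sum(v for s, v in m.items() if s & hypothesis)


print(belief(mass, RELEVANT), plausibility(mass, RELEVANT))
```

With these masses, the support for relevance lies between a belief of about 0.2 and a plausibility of about 0.8; the gap is the 0.6 assigned to the uncertain set.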
Dempster-Shafer Theory (2/2)
3. Truth functionality: A complex rule can be examined from the truth of its components.
→ Does not consider the uncertainty of combining multiple hypotheses
⇒ Combination rule for integrating uncertain hypotheses
Query: Persons appear with computers → INDOOR needs to be detected
Filtering by video ontology: m({relevance}) = 0.2, m({relevance, irrelevance}) = 0.6, m({irrelevance}) = 0.2
Filtering by visual-based approach: m({relevance}) = 0.7, m({relevance, irrelevance}) = 0.2, m({irrelevance}) = 0.1
[Figure labels: Conflict, Agreement]
Joint mass after combination: jm({relevant}) = 0.71
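The value 0.71 is what the standard Dempster's rule of combination gives for the two mass functions above, so the sketch below assumes that rule is the one used. Function and variable names are hypothetical.

```python
from itertools import product

RELEVANT = frozenset({"relevance"})
IRRELEVANT = frozenset({"irrelevance"})
UNCERTAIN = RELEVANT | IRRELEVANT

m_ontology = {RELEVANT: 0.2, UNCERTAIN: 0.6, IRRELEVANT: 0.2}
m_visual = {RELEVANT: 0.7, UNCERTAIN: 0.2, IRRELEVANT: 0.1}


def combine(m1, m2):
    """Dempster's rule: multiply masses, intersect sets, renormalize."""
    combined = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreement: the product supports the common subset.
            combined[intersection] = combined.get(intersection, 0.0) + va * vb
        else:
            # Conflict: the two sources support disjoint hypotheses.
            conflict += va * vb
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; cannot combine.")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


joint = combine(m_ontology, m_visual)
print(round(joint[RELEVANT], 2))  # 0.71
```

Here the conflicting mass is 0.2 × 0.1 + 0.2 × 0.7 = 0.16, and the mass supporting relevance is 0.14 + 0.04 + 0.42 = 0.60, giving 0.60 / 0.84 ≈ 0.71.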
Experimental Setting
TRECVID 2009 video data
• 219 development videos (36,106 shots) → Manually select 10 example shots for each query
• 619 test videos (97,150 shots) → Retrieve shots matching the query
Target Queries
• Query 1: A view of one or more tall buildings and the top story visible
• Query 2: One or more people, each at a table or desk with a computer visible
• Query 3: One or more people, each sitting in a chair, talking
Evaluation measures
1. Precision: The fraction of retained shots that are relevant to a query.
2. Recall: The fraction of relevant shots that are successfully retained.
3. Filter recall: The fraction of irrelevant shots that are successfully filtered.
4. Retrieval performance: The number of relevant shots within 1,000 retrieved shots
Retrieval method: (Shirahama, 2011)
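The first three evaluation measures can be computed directly from sets of shot ids. The sketch below follows the definitions given above; the function name and the toy data are made up for illustration.

```python
def filtering_measures(retained, relevant, all_shots):
    """Return (precision, recall, filter_recall) for one query."""
    retained, relevant, all_shots = set(retained), set(relevant), set(all_shots)
    filtered = all_shots - retained
    irrelevant = all_shots - relevant
    hits = retained & relevant

    # Fraction of retained shots that are relevant.
    precision = len(hits) / len(retained) if retained else 0.0
    # Fraction of relevant shots that are retained.
    recall = len(hits) / len(relevant) if relevant else 0.0
    # Fraction of irrelevant shots that are filtered.
    filter_recall = (
        len(filtered & irrelevant) / len(irrelevant) if irrelevant else 0.0
    )
    return precision, recall, filter_recall


# Toy data: 10 shots, 4 relevant, 5 retained (3 of them relevant).
all_shots = range(1, 11)
relevant = {1, 2, 3, 4}
retained = {1, 2, 3, 5, 6}
print(filtering_measures(retained, relevant, all_shots))
```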
Importance of Concept Selection
Filter shots if none of the selected concepts are detected
[Figure: (a) Precision, (b) Recall and (c) Filter recall of Onto, WordNet and Visual for Query 1, Query 2 and Query 3]
Shot filtering does not heavily rely on concept selection methods.
(Selected concepts are similar to each other)
→ Examine the importance of using concept relations for shot filtering
Effectiveness of Using Concept Relations and DST
[Figure: (a) Precision, (b) Recall and (c) Filter recall of Onto (base), No-DST and DST for Query 1, Query 2 and Query 3]
• Onto: Many irrelevant shots are wrongly retained (Filter recall)
• No-DST: Precision is significantly improved, but recall is degraded
• DST: Recall is recovered while improving precision and filter recall (Best!)
Concept relations and DST are useful to achieve shot filtering, which not only filters many irrelevant shots, but also retains many relevant shots.
Retrieval Performance and Time
[Figure: (a) Retrieval performance and (b) Retrieval time for Query 1, Query 2 and Query 3, without using shot filtering and using shot filtering]
Shot filtering by our video ontology is effective for both improving the retrieval performance and reducing the retrieval time!
Conclusion and Future Works
Conclusion
Video ontology construction and utilization for fast and accurate QBE
1. Video ontology construction based on design patterns of general ontologies
   Disjoint partition and visual co-occurrence
2. Concept selection by tracing the video ontology
3. Shot filtering
   Improve precision → Concept relations
   Improve recall → Dempster-Shafer theory (DST)
Future works
• Compute detection scores for self-defined concepts
• Estimate the optimal parameter in Basic Belief Assignment (BBA) functions in DST
• Reduce retrieval time by parallelizing the retrieval process
Thank you!