Content-Based Video Copy Detection: PRISMA at TRECVID 2010PRISMA (University of Chile) CCD Task...
Transcript of Content-Based Video Copy Detection: PRISMA at TRECVID 2010PRISMA (University of Chile) CCD Task...
Content-Based Video Copy Detection:PRISMA at TRECVID 2010
Juan Manuel Barrios and Benjamin Bustos
PRISMA Research GroupDepartment of Computer Science
University of Chile{jbarrios,bebustos}@dcc.uchile.cl
November 17, 2010
PRISMA (University of Chile) CCD Task November 17, 2010 1 / 25
PRISMA System Overview
Copy Detection System developed for TRECVID 2010.
Three Global descriptors.
No Audio information.
Pivot-based index with approximate search.
Voting algorithm for copy localization.
Implemented in C with OpenCV library.
System divided in five tasks/steps.
PRISMA (University of Chile) CCD Task November 17, 2010 2 / 25
PRISMA System Overview
Feature Extraction
Frame Sampling
Preprocessing
Similarity Search
Copy Localization
QueryVideos
ReferenceVideos
DetectionResult
1
2
3
4
5
PRISMA (University of Chile) CCD Task November 17, 2010 3 / 25
System Tasks
1 Preprocessing:
Skip irrelevant frames.Remove black borders.Inverse transformations for Camcording, PIP and Flip.
Query videos increased from 1,608 to 5,378.Reference videos kept in 11,524.
PRISMA (University of Chile) CCD Task November 17, 2010 4 / 25
System Tasks
2 Frame Sampling:
Divides each video in groups of similar consecutive frames (GF).Uniform subsampling of 3 frames per second.Similarity between frames defined as maximum difference betweenintensity of pixels.
Query Videos are divided into 1,000,000 groups.Reference Videos are divided into 4,000,000 groups.
PRISMA (University of Chile) CCD Task November 17, 2010 5 / 25
System Tasks
2 Frame Sampling:
Divides each video in groups of similar consecutive frames (GF).Uniform subsampling of 3 frames per second.Similarity between frames defined as maximum difference betweenintensity of pixels.
Query Videos are divided into 1,000,000 groups.Reference Videos are divided into 4,000,000 groups.
GF1 GF2 GF3 GF4 GF5 GF6 GF7 GF8 GF9 GF10 GF11 GF13GF12
PRISMA (University of Chile) CCD Task November 17, 2010 5 / 25
System Tasks
3 Feature Extraction:
Descriptor of a group is the average of descriptors for each frame.Extracts three global visual descriptors :
EH: Edge Histogram (4× 4× 10 = 160 dimensions)GH: Gray Histogram (3× 3× 20 = 180 dimensions)CH: RGB Histogram (2× 2× 48 = 192 dimensions)(1 byte per dimension)
( )
...EH ( )
... ( )
... ( )
... ( )
... ( )... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
( )
...GH ( )
... ( )
... ( )
... ( )
... ( )... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
( )
...CH ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
GF1 GF2 GF3 GF4 GF5 GF6 GF7 GF8 GF9 GF10 GF11 GF13GF12
PRISMA (University of Chile) CCD Task November 17, 2010 6 / 25
System Tasks
4 Similarity Search:
Compares descriptors from query groups with descriptors fromreference groups.DIST (Gi, Gj) is a distance function that measures the similaritybetween groups Gi and Gj .DIST is defined as a combination of two descriptors:
Run ehdNgryhst: DIST combines EH and GH.Run ehdNclrhst: DIST combines EH and CH.
PRISMA (University of Chile) CCD Task November 17, 2010 7 / 25
Similarity Search Task
Distance between groups is a static weighted combination ofdistance between descriptors (γ):
δ(Gi, Gj) = w1 × γ1(Gi, Gj) + w2 × γ2(Gi, Gj)
We defined γ as L1 (Manhattan) distance for EHD, GH and CHvectors:
L1(x, y) =
d∑i=0
|xi − yi|
Final distance between groups is the average of δ between threeconsecutive groups:
DIST (Gi, Gj) =δ(Gi−1, Gj−1) + δ(Gi, Gj) + δ(Gi+1, Gj+1)
3
DIST requires more than 1,000 operations to be evaluated.
PRISMA (University of Chile) CCD Task November 17, 2010 8 / 25
Similarity Search Task
We set weights for each descriptor using a histogram of distancesbetween pairs of vectors.
Weights normalize to 100 the distance that covers 0.01% of pairson each histogram: 100
1469 = 0.068 1001106 = 0.090 100
660 = 0.152
ehdNgryhst: δ = 0.068× EH + 0.090×GH
ehdNclrhst: δ = 0.068× EH + 0.152× CH
PRISMA (University of Chile) CCD Task November 17, 2010 9 / 25
Similarity Search Task
The intrinsic dimensionality µ2
2σ2 quantifies how hard is to searchon a metric space [Chavez et al, 2001].
Move w2 to a value that locally maximizes intrinsic dimensionalityof δ.
Iterative algorithm that converged to:
ehdNgryhst: δ = 0.068× EH + 0.090×GHehdNclrhst: δ = 0.068× EH + 0.045× CH
PRISMA (University of Chile) CCD Task November 17, 2010 10 / 25
Similarity Search Task
The output of the Similarity Search task is a Nearest-NeighborsTable with most similar reference groups for each query group.
Query1Group1Query1Group2Query1Group3Query1Group4Query1Group5Query1Group6Query1Group7Query1Group8
Vid07_Grp54 distVid09_Grp13 distVid07_Grp34 distVid09_Grp15 distVid01_Grp88 distVid09_Grp54 distVid01_Grp45 distVid09_Grp19 dist
Vid08_Grp73 distVid02_Grp34 distVid03_Grp54 distVid02_Grp13 distVid01_Grp12 distVid09_Grp17 distVid03_Grp43 distVid01_Grp12 dist
Vid01_Grp68 distVid02_Grp33 distVid09_Grp14 distVid03_Grp65 distVid07_Grp58 distVid07_Grp59 distVid03_Grp20 distVid07_Grp61 dist
... ... ... ...
Query NN 1 NN 2 NN 3
A naive approach would evaluate 1,000,000 × 4,000,000 timesDIST (this takes about 11 month!).
PRISMA (University of Chile) CCD Task November 17, 2010 11 / 25
Similarity Search Task
DIST complies with metric properties: Reflexivity,Non-Negativity, Symmetry, and Triangle Inequality.
Let q be a group of frames from a query video, and v be a group offrames from a reference video.
A lower bound for DIST (q, v) can be calculated with pivots:
PRISMA (University of Chile) CCD Task November 17, 2010 12 / 25
Similarity Search Task
DIST complies with metric properties: Reflexivity,Non-Negativity, Symmetry, and Triangle Inequality.
Let q be a group of frames from a query video, and v be a group offrames from a reference video.
A lower bound for DIST (q, v) can be calculated with pivots:
Lower Bound: DIST (q, v) ≥ |DIST (p, q)−DIST (p, v)|
PRISMA (University of Chile) CCD Task November 17, 2010 12 / 25
Similarity Search Task
DIST complies with metric properties: Reflexivity,Non-Negativity, Symmetry, and Triangle Inequality.
Let q be a group of frames from a query video, and v be a group offrames from a reference video.
A lower bound for DIST (q, v) can be calculated with pivots:
Let S = {p1, ..., pm} be a set of pivots, then:DIST (q, v) ≥ maxp∈S {|DIST (p, q)−DIST (p, v)|}
PRISMA (University of Chile) CCD Task November 17, 2010 12 / 25
Similarity Search Task
Index creation:The system selects 4 sets of 9 pivots with the incremental SSSalgorithm [Bustos et al, 2008].
Each set requires a table with 9 × 4,000,000 distances.
The system compares the 4 sets and selects the set that has thegreatest average lower bound and discards the others [Zezula et al,2005].
PRISMA (University of Chile) CCD Task November 17, 2010 13 / 25
Similarity Search Task
Index creation:The system selects 4 sets of 9 pivots with the incremental SSSalgorithm [Bustos et al, 2008].
Each set requires a table with 9 × 4,000,000 distances.
The system compares the 4 sets and selects the set that has thegreatest average lower bound and discards the others [Zezula et al,2005].
PRISMA (University of Chile) CCD Task November 17, 2010 13 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
Similarity Search Task
Similarity search for a query group q:For every pivot p evaluate DIST (q, p).For every reference group v calculate a lower bound for DIST (q, v)
Only 9 operations to calculate each lower bound.
Select 4,000 objects (0.1%) with lowest lower bounds.Calculate actual DIST (q, v) just for the 4,000 objects and selectthe NNs between them.
PRISMA (University of Chile) CCD Task November 17, 2010 14 / 25
System Tasks
5 Copy Localization:
Takes NNs table and searches for chains of groups belonging to asame reference video with temporal coherence.Voting algorithm based on NN rank, NN distance and spread ofvotes in chain.Copy localization set as start/end of chain.
Query1Group1Query1Group2Query1Group3Query1Group4Query1Group5Query1Group6Query1Group7Query1Group8
Vid07_Grp54 distVid09_Grp13 distVid07_Grp34 distVid09_Grp15 distVid01_Grp88 distVid09_Grp54 distVid01_Grp45 distVid09_Grp19 dist
Vid08_Grp73 distVid02_Grp34 distVid03_Grp54 distVid02_Grp13 distVid01_Grp12 distVid09_Grp17 distVid03_Grp43 distVid01_Grp12 dist
Vid01_Grp68 distVid02_Grp33 distVid09_Grp14 distVid03_Grp65 distVid07_Grp58 distVid07_Grp59 distVid03_Grp20 distVid07_Grp61 dist
... ... ... ...
Query NN 1 NN 2 NN 3
PRISMA (University of Chile) CCD Task November 17, 2010 15 / 25
System Tasks
5 Copy Localization:
Takes NNs table and searches for chains of groups belonging to asame reference video with temporal coherence.Voting algorithm based on NN rank, NN distance and spread ofvotes in chain.Copy localization set as start/end of chain.
Query1Group1Query1Group2Query1Group3Query1Group4Query1Group5Query1Group6Query1Group7Query1Group8
Vid07_Grp54 distVid09_Grp13 distVid07_Grp34 distVid09_Grp15 distVid01_Grp88 distVid09_Grp54 distVid01_Grp45 distVid09_Grp19 dist
Vid08_Grp73 distVid02_Grp34 distVid03_Grp54 distVid02_Grp13 distVid01_Grp12 distVid09_Grp17 distVid03_Grp43 distVid01_Grp12 dist
Vid01_Grp68 distVid02_Grp33 distVid09_Grp14 distVid03_Grp65 distVid07_Grp58 distVid07_Grp59 distVid03_Grp20 distVid07_Grp61 dist
... ... ... ...
Query NN 1 NN 2 NN 3
score Vid07= 2.2
PRISMA (University of Chile) CCD Task November 17, 2010 15 / 25
System Tasks
5 Copy Localization:
Takes NNs table and searches for chains of groups belonging to asame reference video with temporal coherence.Voting algorithm based on NN rank, NN distance and spread ofvotes in chain.Copy localization set as start/end of chain.
Query1Group1Query1Group2Query1Group3Query1Group4Query1Group5Query1Group6Query1Group7Query1Group8
Vid07_Grp54 distVid09_Grp13 distVid07_Grp34 distVid09_Grp15 distVid01_Grp88 distVid09_Grp54 distVid01_Grp45 distVid09_Grp19 dist
Vid08_Grp73 distVid02_Grp34 distVid03_Grp54 distVid02_Grp13 distVid01_Grp12 distVid09_Grp17 distVid03_Grp43 distVid01_Grp12 dist
Vid01_Grp68 distVid02_Grp33 distVid09_Grp14 distVid03_Grp65 distVid07_Grp58 distVid07_Grp59 distVid03_Grp20 distVid07_Grp61 dist
... ... ... ...
Query NN 1 NN 2 NN 3
score Vid07= 2.2score Vid09= 3.7
PRISMA (University of Chile) CCD Task November 17, 2010 15 / 25
Results
RESULTS
PRISMA (University of Chile) CCD Task November 17, 2010 16 / 25
Results
Submitted Runs:
balanced.ehdNgryhst: δ = 0.068× EH + 0.090×GHbalanced.ehdNclrhst: δ = 0.068× EH + 0.045× CHnofa.ehdNgryhst: equal to balanced.ehdNgryhst with strictervoting algorithm.nofa.ehdNghT10: equal to nofa.ehdNgryhst but with a differentthreshold.
Analysis focused on Optimal NDCR.
EH+GH slightly better than EH+CH.
Better results in NOFA profile than in Balanced profile.
PRISMA (University of Chile) CCD Task November 17, 2010 17 / 25
Results nofa.ehdNgryhst
Optimal NDCR:Lower NDCR than median for each transformation.Better results for Insertion of Pattern and Strong Reencoding.
0.001
0.010
0.100
1.000
10.000
100.000
1000.000
9999.999
min
imal
ND
CR
PRISMA (University of Chile) CCD Task November 17, 2010 18 / 25
Results nofa.ehdNgryhst
Optimal NDCR:Lower NDCR than median for each transformation.Better results for Insertion of Pattern and Strong Reencoding.
0.001
0.010
0.100
1.000
10.000
100.000
1000.000
9999.999
min
imal
ND
CR
Camcording
PIP
Insertio
no
f Pattern
Stro
ng
reenco
din
g
Changegam
ma
Decreasequality
Postproduction
Randomcom
bination
PRISMA (University of Chile) CCD Task November 17, 2010 18 / 25
Results nofa.ehdNgryhst
Optimal F1:Good localization for PIP and bad localization for Camcording andChange in gamma.
0.0
0.2
0.4
0.6
0.8
1.0
Op
tim
al m
ean
F1
for
TP
s
Cam
cording
PIP
Insertionof P
attern
Strong
reencoding
Change
gamm
a
Decrease
quality
Postproduction
Random
combination
PRISMA (University of Chile) CCD Task November 17, 2010 19 / 25
Results nofa.ehdNgryhst
Mean Time:
Slightly higher than the median, specially for camcording and PIP.
0.001
0.010
0.100
1.000
10.000
100.000
1000.000
9999.999
mea
n p
roce
ssin
g ti
me
(s)
Cam
cording
PIP
Insertionof P
attern
Strong
reencoding
Change
gamm
a
Decrease
quality
Postproduction
Random
combination
PRISMA (University of Chile) CCD Task November 17, 2010 20 / 25
Comparison
Comparison with Optimal NDCR averaged between alltransformations.
22 teams, 41 submitted runs for balanced profile and 37 for nofaprofile.
Run Avg Opt NDCR global rank video-only rank
balanced.ehdNgryhst 0.597 14th of 41 1st of 15
balanced.ehdNclrhst 0.658 16th of 41 3rd of 15
nofa.ehdNgryhst 0.611 10th of 37 1st of 14
nofa.ehdNghT10 0.611 11th of 37 2nd of 14
Run Avg Opt F1 global rank video-only rank
balanced.ehdNgryhst 0.820 15th of 41 2nd of 15
balanced.ehdNclrhst 0.820 16th of 41 3rd of 15
nofa.ehdNgryhst 0.828 14th of 37 1st of 14
nofa.ehdNghT10 0.828 15th of 37 2nd of 14
PRISMA (University of Chile) CCD Task November 17, 2010 21 / 25
Comparison
1.0
0.8
0.6
0.4
0.2
0
Ave
rag
e O
pti
mal
F1
0.1 1.0 10 100 1000Average Optimal NDCR
PRISMA
Video only Audio only Audio+VideoNo False Alarms Profile
(logarithmic scale)
PRISMA (University of Chile) CCD Task November 17, 2010 22 / 25
Comparison
1.0
0.8
0.6
0.4
0.2
0
Ave
rag
e O
pti
mal
F1
0.5 1.0 1.5 2.0Average Optimal NDCR
PRISMA
Video only Audio only Audio+VideoBalanced Profile
PRISMA (University of Chile) CCD Task November 17, 2010 23 / 25
Conclusions
Acceptable overall results:
Global descriptors can achieve competitive results with TRECVIDtransformations.Pivot-based approximation enables to discard 99.9% of distancecomputations and still have good effectiveness.
Two novel techniques:
Set weights maximizing intrinsic dimensionality.Calculate actual distance just for 0.1% lowest lower bounds.
Future work:
Improve the efficiency of preprocessing task.Test other distances for descriptors instead of L1 (in particularsome non-metric similarity measure).Test the inclusion of audio information and local descriptors.
PRISMA (University of Chile) CCD Task November 17, 2010 24 / 25
Thank you!
Feature Extraction
Frame Sampling
Preprocessing
Similarity Search
Copy Localization
QueryVideos
ReferenceVideos
DetectionResult
1
2
3
4
5
Query1Group1Query1Group2Query1Group3Query1Group4Query1Group5Query1Group6Query1Group7Query1Group8
Vid07_Grp54 distVid09_Grp13 distVid07_Grp34 distVid09_Grp15 distVid01_Grp88 distVid09_Grp54 distVid01_Grp45 distVid09_Grp19 dist
Vid08_Grp73 distVid02_Grp34 distVid03_Grp54 distVid02_Grp13 distVid01_Grp12 distVid09_Grp17 distVid03_Grp43 distVid01_Grp12 dist
Vid01_Grp68 distVid02_Grp33 distVid09_Grp14 distVid03_Grp65 distVid07_Grp58 distVid07_Grp59 distVid03_Grp20 distVid07_Grp61 dist
... ... ... ...
Query NN 1 NN 2 NN 3
score Vid07= 2.2score Vid09= 3.7
1.0
0.8
0.6
0.4
0.2
0
Ave
rag
e O
pti
mal
F1
0.5 1.0 1.5 2.0Average Optimal NDCR
PRISMA
Video only Audio only Audio+VideoBalanced Profile
( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
... ( )
...
GF1 GF2 GF3 GF4 GF5 GF6 GF7 GF8 GF9 GF10 GF11 GF13GF12
Thank you!PRISMA (University of Chile) CCD Task November 17, 2010 25 / 25