The 2012 ICSI/Berkeley Video Location Estimation System
Uploaded by mediaeval2012
The 2012 ICSI / Berkeley Location Estimation System
Jaeyoung Choi, Venkatesan Ekambaram, Gerald Friedland and Kannan Ramchandran
ICSI / UC Berkeley, USA, October 4th, 2012
1Thursday, October 4, 12
Agenda
• Baseline Approach
• Drawbacks
• Graphical Model Framework
• Result
Baseline Approach
• Investigate the ‘spatial variance’ of each feature:
• spatial variance is small: the feature is likely location-indicative
• spatial variance is large: the feature is likely not location-indicative
Example tag        Matches in training set   Spatial variance
pavement           2                         5.739
ucberkeley         4                         0.132
berkeley           14                        68.138
greek              0                         N/A
greektheatre       0                         N/A
spitonastranger    0                         N/A
live               91                        6453.109
video              2967                      6735.844
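The slides do not define spatial variance precisely. The sketch below assumes it is the sum of the variances of the latitudes and longitudes (in squared degrees) of the training videos carrying a tag, reports N/A (`None`) for tags with no training matches, and picks the test tag with the smallest variance as the most location-indicative one. The coordinates in `TRAINING` and all helper names are hypothetical.

```python
from statistics import pvariance

# Hypothetical training data: tag -> (lat, lon) of training videos with that tag.
TRAINING = {
    "ucberkeley": [(37.872, -122.259), (37.870, -122.260),
                   (37.875, -122.257), (37.871, -122.262)],
    "berkeley": [(37.871, -122.273), (37.869, -122.251), (42.36, -71.06)],
    "video": [(37.77, -122.42), (51.50, -0.12), (35.68, 139.69)],
}

def spatial_variance(coords):
    """Assumed definition: var(lat) + var(lon) in squared degrees.
    Returns None (N/A) when the tag has no training matches."""
    if not coords:
        return None
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return pvariance(lats) + pvariance(lons)

def most_indicative_tag(tags, training):
    """Hypothetical baseline step: among test tags with at least one
    training match, return the one with the smallest spatial variance."""
    scored = [(spatial_variance(training[t]), t) for t in tags if training.get(t)]
    if not scored:
        return None
    return min(scored)[1]

test_tags = ["pavement", "ucberkeley", "berkeley", "greek", "video"]
print(most_indicative_tag(test_tags, TRAINING))  # ucberkeley
```

Tags such as "greek" that never occur in the training set are simply skipped, which is exactly the sparsity problem discussed next.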
Problem: Sparsity arising from a biased dataset
The effect of sparsity
[Figure: bar chart of the percentage [%] of test videos per distance-error bin, where e is the distance between ground truth and estimate in km (bins from 0≤e<1 up to 10000≤e); series labeled >6400, 6400, 1600, 400, 100]
* A test video from a dense area has a higher chance of being estimated with a lower distance error.
Geo-tagging: an estimation-theoretic viewpoint
[Figure: four example images with tag sets {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}]
Observations:
Images: the four example images
Tags: $\{t_k^1\}, \{t_k^2\}, \{t_k^3\}, \{t_k^4\}$
Estimate:
Geolocations: $x_1, x_2, x_3, x_4$
Interpreting traditional approaches
Locations are random variables: $\{x_1, x_2, \ldots, x_N\}$
Traditional approaches estimate the probability of location given tags:
$p(x_i \mid \{t_k^i\}) \propto \prod_k p(x_i \mid t_k^i)$
where $p(x_i \mid t_k^i)$ is obtained from the training set.
Example: the distribution for the tag “washington” is depicted here.
Location estimate: $\hat{x}_i = \int x_i \, p(x_i \mid \{t_k^i\}) \, dx_i$
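A minimal sketch of the traditional estimate, assuming locations are discretized into a few grid cells (`CELLS`, hypothetical) and each $p(x_i \mid t_k^i)$ is a histogram of training counts. The add-one smoothing is an assumption not stated on the slide. The posterior is the normalized product of the per-tag distributions, and the estimate is its mean.

```python
# Hypothetical grid cells, each given by its (lat, lon) center:
# Berkeley, Washington D.C., Seattle.
CELLS = [(37.87, -122.26), (38.90, -77.04), (47.60, -122.33)]

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def tag_distribution(counts, alpha=1.0):
    """p(x | t) over CELLS from training counts, with add-alpha smoothing
    (an assumption, so that one unseen cell does not zero out the product)."""
    return normalize([c + alpha for c in counts])

def posterior(tag_dists):
    """p(x_i | {t_k^i}), proportional to the product over k of p(x_i | t_k^i)."""
    prod = [1.0] * len(CELLS)
    for dist in tag_dists:
        prod = [p * q for p, q in zip(prod, dist)]
    return normalize(prod)

def posterior_mean(post):
    """Discrete version of the integral of x_i p(x_i | {t_k^i}) dx_i.
    Averages lat/lon directly, ignoring longitude wraparound."""
    lat = sum(w * cell[0] for w, cell in zip(post, CELLS))
    lon = sum(w * cell[1] for w, cell in zip(post, CELLS))
    return lat, lon

# Hypothetical training counts per cell for the tags "washington" and "seattle".
post = posterior([tag_distribution([0, 30, 10]), tag_distribution([0, 0, 20])])
print([round(p, 3) for p in post], posterior_mean(post))
```

With these counts the tag "washington" alone is split between D.C. and Seattle; multiplying in "seattle" moves about 88% of the posterior mass onto the Seattle cell.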
Drawbacks
Data sparsity: Not all tags in the test set are available in the training set. Hence the estimate of $p(x_i \mid t_k^i)$ can be bad.
Sub-optimality: The approaches are suboptimal given the data, since each location is estimated from that image's own tags only.
What we ideally want: $p(x_1, x_2, \ldots, x_N \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$
The mean of the above distribution gives the best (minimum mean-squared-error) estimate of the locations, i.e. for each image we want
$p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$
Traditional algorithms only give: $p(x_i \mid \{t_k^i\})$
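Why the mean is the "best" estimate, in the mean-squared-error sense: a short derivation, writing $\mathcal{T}$ as shorthand (introduced here) for all tag sets $\{t_k^1\}, \ldots, \{t_k^N\}$, $a$ for any candidate estimate, and $\mu_i = E[x_i \mid \mathcal{T}]$.

```latex
\begin{aligned}
E\left[\|x_i - a\|^2 \mid \mathcal{T}\right]
  &= E\left[\|(x_i - \mu_i) + (\mu_i - a)\|^2 \mid \mathcal{T}\right] \\
  &= E\left[\|x_i - \mu_i\|^2 \mid \mathcal{T}\right]
     + 2\,(\mu_i - a)^\top \underbrace{E\left[x_i - \mu_i \mid \mathcal{T}\right]}_{=\,0}
     + \|\mu_i - a\|^2 \\
  &= E\left[\|x_i - \mu_i\|^2 \mid \mathcal{T}\right] + \|\mu_i - a\|^2
\end{aligned}
```

The error is minimized at $a = \mu_i$, the mean of the marginal $p(x_i \mid \mathcal{T})$. This is why each image needs its marginal given all tags, not only $p(x_i \mid \{t_k^i\})$.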
Bayesian graphical framework
[Figure: graph whose nodes are the four example images, tagged {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}]
Node: Geolocation of the image
Edge: Correlated locations (e.g. common tag)
Edge potential: Strength of an edge (e.g. posterior distribution of locations given common tags), $p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$
Node potentials: $p(x_i \mid \{t_k^i\})$, $p(x_j \mid \{t_k^j\})$
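The graph construction can be sketched for the four example images: nodes are image indices (numbered from 0 here), and an edge joins two images whose tag sets intersect, labeled with their common tags. The helper name is hypothetical.

```python
from itertools import combinations

# The four example images from the slide, indexed 0..3.
TAG_SETS = [
    {"berkeley", "sathergate", "campanile"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]

def build_edges(tag_sets):
    """Return {(i, j): common_tags} for every pair of images sharing a tag."""
    edges = {}
    for i, j in combinations(range(len(tag_sets)), 2):
        common = tag_sets[i] & tag_sets[j]
        if common:
            edges[(i, j)] = common
    return edges

for (i, j), common in sorted(build_edges(TAG_SETS).items()):
    print(i, j, sorted(common))
# prints:
# 0 1 ['berkeley']
# 0 2 ['campanile']
# 0 3 ['campanile']
# 1 3 ['haas']
# 2 3 ['campanile']
```

Only images 1 and 2 ({berkeley, haas} and {campanile}) are left unconnected.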
Cooperative geo-tagging
Intuition: Images in the training set having common tags have correlated geolocations, captured by the joint distribution.
Joint probability modeling:
$p(x_1, x_2, \ldots, x_N \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\}) \propto \prod_i p(x_i \mid \{t_k^i\}) \prod_{(i,j)} p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$
$p(x_i \mid \{t_k^i\})$ is obtained from the training set as before.
$p(x_i, x_j \mid \{t_k^i\} \cap \{t_k^j\})$, the pairwise distribution given at least one common tag, is modeled as an indicator function $\mathbb{I}(x_i = x_j)$ if the common tag has low spatial variance or occurs infrequently. E.g., if the common tag is “haas”, it is very likely that the locations are the same.
Question: How do we estimate the optimal marginal distribution $p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$?
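Under the indicator model, images joined by an indicator edge must share one location, so the groups of images constrained together can be found with union-find. In this sketch the statistics in `TAG_STATS`, both thresholds, and the rule that edges failing the threshold carry no potential are assumptions; a tag with no training matches counts as infrequent.

```python
from itertools import combinations

TAG_SETS = [
    {"berkeley", "sathergate", "campanile"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]
# Hypothetical statistics per tag: (matches in training set, spatial variance).
TAG_STATS = {"berkeley": (14, 68.138), "sathergate": (3, 0.01),
             "campanile": (40, 2.5), "haas": (2, 0.02)}
MAX_VARIANCE = 0.1  # hypothetical hard threshold on spatial variance
MAX_MATCHES = 5     # hypothetical threshold for "occurs infrequently"

def is_indicator_tag(tag):
    """True if a shared tag should induce the indicator potential I(x_i = x_j)."""
    matches, var = TAG_STATS.get(tag, (0, None))
    return (var is not None and var < MAX_VARIANCE) or matches <= MAX_MATCHES

def find(parent, i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i

def indicator_groups(tag_sets):
    """Group images linked (directly or transitively) by indicator edges."""
    parent = list(range(len(tag_sets)))
    for i, j in combinations(range(len(tag_sets)), 2):
        if any(is_indicator_tag(t) for t in tag_sets[i] & tag_sets[j]):
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(tag_sets)):
        groups.setdefault(find(parent, i), []).append(i)
    return sorted(groups.values())

print(indicator_groups(TAG_SETS))  # [[0], [1, 3], [2]]
```

Here only "haas" passes the thresholds, so images 1 and 3 are forced to a common location while the other images keep separate ones.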
Belief propagation updates
Iterative algorithm to approximate the posterior distribution $p(x_i \mid \{t_k^1\}, \{t_k^2\}, \ldots, \{t_k^N\})$
Gaussian modeling: $p(x_i \mid \{t_k^i\}) \sim \mathcal{N}(\mu_i, \sigma_i^2)$
At iteration 0, each node calculates $(\mu_i, \sigma_i^2)$.
At iteration $t$, each node updates its location as a weighted mean of its previous location and those of its neighbors $N(i)$:
$\mu_i^{(t)} = (\sigma_i^{(t)})^2 \left( \frac{\mu_i^{(t-1)}}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{\mu_k^{(t-1)}}{(\sigma_k^{(t-1)})^2} \right)$
$\frac{1}{(\sigma_i^{(t)})^2} = \frac{1}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}$
The weights reflect the confidence in each measurement: the higher the spatial variance, the lower the weight.
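A scalar sketch of these update rules. Applying it to each coordinate (latitude, longitude) separately is an assumption. All nodes update synchronously from their iteration t-1 values, and `neighbors[i]` plays the role of N(i). The example chain and its beliefs are hypothetical.

```python
def bp_iterations(beliefs, neighbors, num_iters):
    """Run the precision-weighted update rules for num_iters iterations.

    beliefs:   list of (mu, var) per node at iteration 0 (one scalar coordinate).
    neighbors: neighbors[i] lists the node indices in N(i).
    """
    for _step in range(num_iters):
        updated = []
        for i, (mu_i, var_i) in enumerate(beliefs):
            nbrs = [beliefs[k] for k in neighbors[i]]
            # 1/var_i(t) = 1/var_i(t-1) + sum over k in N(i) of 1/var_k(t-1)
            precision = 1.0 / var_i + sum(1.0 / var_k for _, var_k in nbrs)
            # mu_i(t) = var_i(t) * (mu_i/var_i + sum of mu_k/var_k), all at t-1
            weighted = mu_i / var_i + sum(mu_k / var_k for mu_k, var_k in nbrs)
            updated.append((weighted / precision, 1.0 / precision))
        beliefs = updated
    return beliefs

# Hypothetical chain 0 - 1 - 2; node 2 has a large spatial variance.
initial = [(0.0, 1.0), (4.0, 1.0), (10.0, 100.0)]
chain = [[1], [0, 2], [1]]
for mu, var in bp_iterations(initial, chain, 1):
    print(round(mu, 3), round(var, 3))
```

After one iteration node 2, the least confident node, moves from 10 to about 4.06, while node 1 barely feels node 2's pull. Because every added precision term is positive, the variance of a node with neighbors decreases at each iteration under this update.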
Belief propagation
[Figure: nodes carrying Gaussian beliefs $(\mu_1, \sigma_1^2)$, $(\mu_2, \sigma_2^2)$, $(\mu_3, \sigma_3^2)$]
Posterior mean and variance assuming Gaussian beliefs
Audio-visual features are incorporated in modeling the edge and node potentials.
Incorporating Audio-Visual features
• GIST features are extracted for the images.
• MFCC features are extracted for the audio.
• These are incorporated into the node and edge potentials as exponential distributions:
$p(x_i, x_j \mid a_i, a_j) \propto \exp\left(-\frac{\|x_i - x_j\|}{\|a_i - a_j\|}\right)$
where $a_i$ are the audio features associated with image $i$.
The intuition is that the closer the audio features are, the higher the probability that the geolocations are close. The visual features can be included in the same way, and similar terms can be added to the node potentials.
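A sketch of the audio edge potential in Python. It treats locations as planar coordinates and uses short hypothetical vectors in place of real MFCC features. The handling of identical feature vectors, where the formula divides by zero, is an assumption added here.

```python
import math

def audio_edge_potential(x_i, x_j, a_i, a_j):
    """Unnormalized edge potential exp(-||x_i - x_j|| / ||a_i - a_j||).

    x_i, x_j: locations as planar coordinates (assumption).
    a_i, a_j: audio feature vectors (hypothetical stand-ins for MFCCs).
    """
    loc_dist = math.dist(x_i, x_j)
    feat_dist = math.dist(a_i, a_j)
    if feat_dist == 0.0:
        # Limit as ||a_i - a_j|| -> 0: all mass on x_i == x_j (assumption).
        return 1.0 if loc_dist == 0.0 else 0.0
    return math.exp(-loc_dist / feat_dist)

def preference_for_nearby(a_i, a_j):
    """How much more likely locations 1 unit apart are than 5 units apart."""
    near = audio_edge_potential((0.0, 0.0), (1.0, 0.0), a_i, a_j)
    far = audio_edge_potential((0.0, 0.0), (5.0, 0.0), a_i, a_j)
    return near / far

similar_audio = preference_for_nearby((0.0, 0.0), (0.0, 1.0))      # e^4, about 54.6
dissimilar_audio = preference_for_nearby((0.0, 0.0), (0.0, 10.0))  # e^0.4, about 1.49
print(similar_audio, dissimilar_audio)
```

Similar audio features make the potential decay quickly with location distance, strongly favoring nearby locations; dissimilar features leave it nearly flat, so the edge carries little information.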
Result
• Percentage of test videos (out of 4182 videos) correctly estimated within the distances given in the top row, measured from the ground-truth location.
– run1 - baseline approach without using gazetteer
– run2 - graphical model based approach with gazetteer
– run3 - baseline approach with gazetteer
– run4 - k-NN with GIST visual feature
• The graphical model approach with gazetteer outperforms the baseline approaches in the range above 1 km.
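The metric above can be sketched as follows. Measuring error as great-circle (haversine) distance with a mean Earth radius of 6371 km is an assumption about the evaluation, and the thresholds and coordinates below are illustrative, not the MediaEval 2012 data or results.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed distance model)

def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def percent_within(estimates, truths, thresholds_km):
    """Percentage of videos whose estimate lies within each threshold of the truth."""
    errors = [haversine_km(e, t) for e, t in zip(estimates, truths)]
    return {d: 100.0 * sum(err < d for err in errors) / len(errors)
            for d in thresholds_km}

# Hypothetical example: two test videos, one exact and one about 73 km off.
truths = [(37.87, -122.26), (48.85, 2.35)]
estimates = [(37.87, -122.26), (48.85, 3.35)]
print(percent_within(estimates, truths, [1, 10, 100, 1000]))
# {1: 50.0, 10: 50.0, 100: 100.0, 1000: 100.0}
```

Each run would be scored this way over all 4182 test videos, one percentage per distance threshold.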
Conclusion
• The graphical model framework can improve performance over the baseline approach by incorporating results from the test data.
• Various issues remain to be explored:
– the modeling of the edge potential
• text: hard threshold (current) --> soft
• visual/audio features
– the assumption of conditional independence of the location distribution given multiple tags
Thank You!
Questions?
Joint work with: Venkatesan Ekambaram, Kannan Ramchandran, Giulia Fanti, Howard Lei, Adam Janin, and Gerald Friedland
http://mmle.icsi.berkeley.edu