The 2012 ICSI/Berkeley Video Location Estimation System

The 2012 ICSI / Berkeley Location Estimation System. Jaeyoung Choi, Venkatesan Ekambaram, Gerald Friedland and Kannan Ramchandran. ICSI / UC Berkeley, USA. Thursday, October 4th, 2012.

Transcript of The 2012 ICSI/Berkeley Video Location Estimation System

Page 1: The 2012 ICSI/Berkeley Video Location Estimation System

The 2012 ICSI / Berkeley Location Estimation System

Jaeyoung Choi, Venkatesan Ekambaram, Gerald Friedland and Kannan Ramchandran

ICSI / UC Berkeley, USA

October 4th, 2012


Page 2: The 2012 ICSI/Berkeley Video Location Estimation System

Agenda

• Baseline Approach

• Drawbacks

• Graphical Model Framework

• Result


Page 3: The 2012 ICSI/Berkeley Video Location Estimation System

Baseline Approach

• Investigate the ‘spatial variance’ of each feature:

– spatial variance is small: feature is likely location-indicative

– spatial variance is large: feature is likely not indicative
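The slides do not define how spatial variance is computed, so the following is only a minimal sketch under stated assumptions: it takes the spatial variance of a tag to be the sum of the population variances of the latitudes and longitudes of the training items carrying that tag. The function name `spatial_variance_by_tag` and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def spatial_variance_by_tag(training_items):
    """Return {tag: (match_count, spatial_variance)}.

    training_items: iterable of (tags, (lat, lon)) pairs.
    Assumption: spatial variance = var(latitudes) + var(longitudes).
    """
    coords = defaultdict(list)
    for tags, latlon in training_items:
        for tag in set(tags):  # count each tag once per item
            coords[tag].append(latlon)

    stats = {}
    for tag, points in coords.items():
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        stats[tag] = (len(points), pvariance(lats) + pvariance(lons))
    return stats


# Made-up training items, for illustration only.
items = [
    ({"ucberkeley"}, (37.87, -122.26)),
    ({"ucberkeley", "video"}, (37.88, -122.25)),
    ({"video"}, (48.85, 2.35)),
]
tag_stats = spatial_variance_by_tag(items)
```

In this toy data the tag "ucberkeley" is geographically concentrated and gets a much smaller variance than "video", which matches the intuition in the bullets above.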


Page 4: The 2012 ICSI/Berkeley Video Location Estimation System

Example

Tag              Matches in training set   Spatial variance
pavement         2                         5.739
ucberkeley       4                         0.132
berkeley         14                        68.138
greek            0                         N/A
greektheatre     0                         N/A
spitonastranger  0                         N/A
live             91                        6453.109
video            2967                      6735.844

Page 5: The 2012 ICSI/Berkeley Video Location Estimation System

Problem: Sparsity coming from biased dataset


Page 6: The 2012 ICSI/Berkeley Video Location Estimation System

The effect of sparsity

[Figure: bar chart. x-axis: distance error (e) between ground truth and estimation [km], binned from 0≤e<1 to 10000≤e; y-axis: percentage [%], 0 to 60; legend series: >6400, 6400, 1600, 400, 100.]

* A test video from a dense area has a higher chance of being estimated with a lower distance error.

Page 7: The 2012 ICSI/Berkeley Video Location Estimation System

Geo-tagging: an estimation-theoretic viewpoint

Observations:

Images: (shown on the slide)

Tags: $\{t_1^k\}$, $\{t_2^k\}$, $\{t_3^k\}$, $\{t_4^k\}$, e.g. {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}

Estimate:

Geolocations: $x_1$, $x_2$, $x_3$, $x_4$

Page 11: The 2012 ICSI/Berkeley Video Location Estimation System

Interpreting traditional approaches

Locations are random variables: $\{x_1, x_2, \ldots, x_N\}$

Traditional approaches estimate the probability of location given tags:

$$p(x_i \mid \{t_i^k\}) \propto \prod_k p(x_i \mid t_i^k)$$

where $p(x_i \mid t_i^k)$ is obtained from the training set

Example: the distribution for the tag “washington” (depicted in a figure on the slide)

Location estimate: $\int x_i \, p(x_i \mid \{t_i^k\}) \, dx_i$
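As a rough illustration of the traditional estimate, the sketch below multiplies per-tag location distributions and takes the mean of the normalized product. It assumes locations are discretized into a shared set of candidate cells, so the integral becomes a sum; the function names and the toy numbers are hypothetical, not taken from the system.

```python
def combine_tag_distributions(per_tag_probs):
    """Multiply per-tag distributions p(cell | tag) and renormalize.

    per_tag_probs: list of dicts {cell: probability}, all over the same
    cells, where a cell is a (lat, lon) tuple. This discretization is an
    assumption made for the sketch.
    """
    cells = list(per_tag_probs[0].keys())
    scores = {c: 1.0 for c in cells}
    for probs in per_tag_probs:
        for c in cells:
            scores[c] *= probs[c]
    total = sum(scores.values())
    if total == 0.0:
        raise ValueError("per-tag distributions have no common support")
    return {c: s / total for c, s in scores.items()}


def posterior_mean(posterior):
    """Discrete analogue of the integral of x * p(x | tags)."""
    lat = sum(p * c[0] for c, p in posterior.items())
    lon = sum(p * c[1] for c, p in posterior.items())
    return (lat, lon)


# Toy example: two candidate cells and two tags.
cell_a, cell_b = (0.0, 0.0), (10.0, 0.0)
posterior = combine_tag_distributions([
    {cell_a: 0.5, cell_b: 0.5},   # an uninformative tag
    {cell_a: 0.9, cell_b: 0.1},   # a location-indicative tag
])
estimate = posterior_mean(posterior)
```

Here the uninformative tag leaves the product unchanged, so the estimate is driven entirely by the location-indicative tag.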

Page 12: The 2012 ICSI/Berkeley Video Location Estimation System

Drawbacks

Data sparsity: Not all tags in the test set are available in the training set. Hence the estimate of $p(x_i \mid t_i^k)$ can be bad.

Sub-optimality: The approaches are suboptimal given the data.

What we ideally want: $p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$

The mean of the above distribution gives the best estimate of the locations (in the mean-squared-error sense), i.e. for each image we want $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$

Traditional algorithms only give: $p(x_i \mid \{t_i^k\})$

Page 13: The 2012 ICSI/Berkeley Video Location Estimation System

Bayesian graphical framework

Example tag sets: {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, {campanile, haas}

Node: Geolocation of the image

Edge: Correlated locations (e.g. common tag)

Edge potential: Strength of an edge (e.g. posterior distribution of locations given common tags)

[Figure labels: nodes $p(x_i \mid \{t_i^k\})$ and $p(x_j \mid \{t_j^k\})$, joined by an edge $p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$]
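The graph structure described above can be sketched as follows, assuming the simplest edge rule from the slide: two images are connected whenever they share at least one tag. The helper name `build_tag_graph` is hypothetical; the tag sets are the four from the example.

```python
from itertools import combinations


def build_tag_graph(tag_sets):
    """Return {(i, j): common_tags} for every image pair sharing a tag."""
    edges = {}
    for i, j in combinations(range(len(tag_sets)), 2):
        common = tag_sets[i] & tag_sets[j]
        if common:
            edges[(i, j)] = common
    return edges


tag_sets = [
    {"berkeley", "sathergate", "campanile"},
    {"berkeley", "haas"},
    {"campanile"},
    {"campanile", "haas"},
]
edges = build_tag_graph(tag_sets)
for (i, j), common in sorted(edges.items()):
    print(i, j, sorted(common))
```

With these tag sets every pair of images is connected except the second and third, which share no tag.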

Page 17: The 2012 ICSI/Berkeley Video Location Estimation System

Cooperative geo-tagging

Intuition: Images in the training set having common tags have correlated geo-locations, captured by the joint distribution

Joint probability modeling:

$$p(x_1, x_2, \ldots, x_N \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\}) \propto \prod_i p(x_i \mid \{t_i^k\}) \prod_{(i,j)} p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$$

$p(x_i \mid \{t_i^k\})$ is obtained from the training set as before

$p(x_i, x_j \mid \{t_i^k\} \cap \{t_j^k\})$ is the pairwise distribution given at least one common tag. It is modeled as an indicator function $I(x_i = x_j)$ if the common tag has low spatial variance or occurs infrequently, e.g. if the common tag is “haas”, it is very likely the locations are the same

Question: How to estimate the optimal marginal distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$?
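One way to read the indicator-function rule is as a hard threshold on the statistics of the common tags. The sketch below is an assumption-laden illustration: the thresholds `max_variance` and `max_count` are made-up values, the function name is hypothetical, and tags unseen in training are treated as occurring zero times. The tag statistics reuse the (matches, spatial variance) pairs from the example table earlier in the deck.

```python
def is_hard_edge(common_tags, tag_stats, max_variance=1.0, max_count=5):
    """Decide whether a pair of images is tied by I(x_i = x_j).

    tag_stats: {tag: (match_count, spatial_variance)} from the training set.
    An edge is kept if any common tag is rare or spatially concentrated.
    Both thresholds are illustrative assumptions, not values from the slides.
    """
    for tag in common_tags:
        count, variance = tag_stats.get(tag, (0, None))
        if count <= max_count:
            return True  # tag occurs infrequently
        if variance is not None and variance <= max_variance:
            return True  # tag has low spatial variance
    return False


tag_stats = {
    "ucberkeley": (4, 0.132),
    "berkeley": (14, 68.138),
    "video": (2967, 6735.844),
}
```

Under these thresholds a shared "ucberkeley" or "haas" tag ties two images together, while a shared "berkeley" or "video" tag does not.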

Page 18: The 2012 ICSI/Berkeley Video Location Estimation System

Belief propagation updates

Iterative algorithm to approximate the posterior distribution $p(x_i \mid \{t_1^k\}, \{t_2^k\}, \ldots, \{t_N^k\})$

Gaussian modeling: $p(x_i \mid \{t_i^k\}) \sim \mathcal{N}(\mu_i, \sigma_i^2)$

At iteration 0 each node calculates $(\mu_i, \sigma_i^2)$

At iteration t each node updates its location as a weighted mean of its previous location and those of its neighbors $N(i)$:

$$\mu_i^{(t)} = (\sigma_i^{(t)})^2 \left[ \frac{1}{(\sigma_i^{(t-1)})^2}\,\mu_i^{(t-1)} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}\,\mu_k^{(t-1)} \right]$$

$$\frac{1}{(\sigma_i^{(t)})^2} = \frac{1}{(\sigma_i^{(t-1)})^2} + \sum_{k \in N(i)} \frac{1}{(\sigma_k^{(t-1)})^2}$$

The weights reflect the confidence in the measurements, i.e. the higher the spatial variance, the lower the weight
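A minimal sketch of one such update, under assumptions not stated on the slide: locations are treated as scalars for brevity, all nodes update synchronously from the previous iteration's values, and each weight is the inverse variance (precision) of the corresponding belief. The function name `bp_step` and the data layout are hypothetical.

```python
def bp_step(mu, var, neighbors):
    """One synchronous precision-weighted update of Gaussian beliefs.

    mu, var: lists of means and variances from the previous iteration.
    neighbors: {node index: list of neighbor indices}.
    Returns the new (mu, var) lists.
    """
    new_mu, new_var = [], []
    for i in range(len(mu)):
        # New precision: own precision plus the neighbors' precisions.
        precision = 1.0 / var[i] + sum(1.0 / var[k] for k in neighbors[i])
        # Precision-weighted sum of the own mean and the neighbors' means.
        weighted = mu[i] / var[i] + sum(mu[k] / var[k] for k in neighbors[i])
        new_var.append(1.0 / precision)
        new_mu.append(weighted / precision)
    return new_mu, new_var


# Two connected nodes: node 0 is confident, node 1 is not.
mu1, var1 = bp_step([0.0, 10.0], [1.0, 4.0], {0: [1], 1: [0]})
```

In this toy case both nodes move to 2.0, much closer to the confident node's initial mean of 0.0 than to the uncertain node's 10.0, which is the weighting behaviour the slide describes.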

Page 19: The 2012 ICSI/Berkeley Video Location Estimation System

Belief propagation

[Figure: graph whose nodes carry the beliefs $(\mu_1, \sigma_1^2)$, $(\mu_2, \sigma_2^2)$, $(\mu_3, \sigma_3^2)$]

Posterior mean and variance assuming Gaussian beliefs

Audio-visual features are incorporated in modeling the edge and node potentials

Page 20: The 2012 ICSI/Berkeley Video Location Estimation System

Incorporating audio-visual features

• GIST features are extracted for the images.

• MFCC features are extracted for the audio.

• These are now incorporated into the node and edge potentials as exponential distributions.

$$p(x_i, x_j \mid a_i, a_j) \propto \exp\left( -\frac{\|x_i - x_j\|}{\|a_i - a_j\|} \right)$$

$a_i$ are the audio features associated with image $i$

The intuition is that the closer the audio features are, the higher the probability that the geo-locations are closer. Similarly, this can be included in the node potentials as well as for the visual features.
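The audio-based edge potential can be sketched as an unnormalized score. The assumptions here are that both norms are Euclidean, that locations are plain coordinate tuples, and that a small `eps` is added to the denominator to avoid division by zero when two feature vectors coincide; `eps` and the function name are not from the slides. `math.dist` requires Python 3.8 or later.

```python
import math


def audio_edge_potential(x_i, x_j, a_i, a_j, eps=1e-9):
    """Unnormalized exp(-||x_i - x_j|| / ||a_i - a_j||).

    x_i, x_j: candidate locations as coordinate tuples.
    a_i, a_j: audio feature vectors (e.g. MFCC-based) of equal length.
    eps guards against identical feature vectors (an added assumption).
    """
    location_distance = math.dist(x_i, x_j)
    feature_distance = math.dist(a_i, a_j)
    return math.exp(-location_distance / (feature_distance + eps))


# Same pair of far-apart locations, with similar vs dissimilar audio.
similar_audio = audio_edge_potential((0, 0), (3, 4), (0.0,), (0.1,))
dissimilar_audio = audio_edge_potential((0, 0), (3, 4), (0.0,), (1.0,))
```

For a fixed pair of distant locations, similar audio yields a lower potential than dissimilar audio, so similar-sounding items are pushed towards nearby locations.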

Page 21: The 2012 ICSI/Berkeley Video Location Estimation System

Result

• Percentage of test videos (out of 4182 videos) correctly estimated within the distances in the top row from the ground-truth location.

[Table: results per run and distance threshold; values not recoverable from the transcript.]

– run1 - baseline approach without using gazetteer

– run2 - graphical model based approach with gazetteer

– run3 - baseline approach with gazetteer

– run4 - k-NN with GIST visual feature

• Graphical model approach with gazetteer outperforms baseline approaches in the range above 1 km.
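The evaluation metric can be illustrated with a short sketch. It assumes the distance error is the great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km, and that a video counts as correct when its error is at most the threshold. The thresholds in the usage lines are placeholders, not the ones from the results table.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical model)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def percent_within(errors_km, thresholds_km):
    """Percentage of errors at or below each threshold."""
    n = len(errors_km)
    return [100.0 * sum(e <= t for e in errors_km) / n for t in thresholds_km]


# Placeholder errors and thresholds, for illustration only.
errors = [0.5, 5.0, 50.0, 500.0]
percentages = percent_within(errors, [1, 10, 100])
```

Applied to real data, `errors` would hold `haversine_km(estimate, ground_truth)` for each test video.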

Page 22: The 2012 ICSI/Berkeley Video Location Estimation System

Conclusion

• Graphical model framework can achieve performance improvement over the baseline approach by incorporating results from test data

• Various issues remain to be explored

– the modeling of edge potential

  • text: hard threshold (current) --> soft

  • visual/audio features

– assumption of conditional independence of location distribution given multiple tags

Page 23: The 2012 ICSI/Berkeley Video Location Estimation System

Thank You!

Questions?

Work together with: Venkatesan Ekambaram, Kannan Ramchandran, Giulia Fanti, Howard Lei, Adam Janin, and Gerald Friedland

http://mmle.icsi.berkeley.edu
