A Compact and Discriminative Face Track Descriptorvgg/publications/2014/... · Very simple yet...

Omkar M Parkhi, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman A Compact and Discriminative Face Track Descriptor


Page 1

Omkar M Parkhi, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

A Compact and Discriminative Face Track Descriptor

Page 2

Recognising and verifying faces in videos

[Figure: recognition and verification examples; verification pairs are labelled "same" or "different"]

Page 3

VF2: a new compact face track descriptor

▶ Discriminative
▶ Useful for different tasks (recognition, verification)
▶ Extremely compact

Face track: a sequence of face detections in consecutive frames.

[Figure: a face track and its face track descriptor]

Page 4

Large scale face retrieval

▶ Example of a typical target dataset
▶ 5 years of evening news programs
▶ 10,000 hours of broadcast
▶ 20 million frames
▶ 2.1 million face tracks
▶ Real time performance

http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/

▶ 30 frames per track on average
▶ Typical 4000D descriptor → 1 TB
▶ Our descriptor → 270 MB
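The storage figures above can be checked with a rough calculation. This is only a sketch: it assumes (the slide does not say so) that the typical 4000-D descriptor is stored once per frame as 32-bit floats, and that the proposed descriptor takes 128 bytes per track.

```python
# Back-of-the-envelope storage estimate for the news dataset.
# Assumptions (not stated on the slide): float32 storage and one
# 4000-D descriptor per frame for the "typical" baseline.
num_tracks = 2_100_000
frames_per_track = 30
bytes_per_float = 4

typical_bytes = num_tracks * frames_per_track * 4000 * bytes_per_float
ours_bytes = num_tracks * 128  # one 128-byte descriptor per track

print(f"typical: {typical_bytes / 1e12:.2f} TB")  # about 1 TB
print(f"ours:    {ours_bytes / 1e6:.1f} MB")      # about 270 MB
```

Under these assumptions the two estimates land close to the 1 TB and 270 MB quoted on the slide.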

Page 5

Outline

1. Dense feature computation
2. Fisher Vector encoding
3. Video and jittered pooling
4. Compression by metric learning
5. Binarisation
6. Results

Page 6

1. Dense feature computation

▶ Input: a face track
  ▶ Aligned or unaligned
  ▶ No facial landmarks required (eyes, nose, etc.)

▶ Output: a set of local features
  ▶ Extracted from all frames
  ▶ Dense RootSIFT at multiple scales
  ▶ 64-D PCA
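A minimal sketch of the RootSIFT and PCA steps, assuming the dense SIFT descriptors are already available as a non-negative (N, 128) array. The random input, the function names and the SVD-based PCA are illustrative choices, not the authors' code.

```python
import numpy as np

def root_sift(desc, eps=1e-12):
    """L1-normalise each SIFT descriptor, then take the element-wise square root."""
    desc = np.asarray(desc, dtype=np.float64)
    desc = desc / (np.abs(desc).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(desc)

def fit_pca(X, dim=64):
    """Return the mean and the top `dim` principal directions of X (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:dim]

rng = np.random.default_rng(0)
# Random stand-in for dense SIFT descriptors (hypothetical data).
sift = rng.integers(0, 256, size=(500, 128)).astype(np.float64)
rs = root_sift(sift)
mean, components = fit_pca(rs, dim=64)
reduced = (rs - mean) @ components.T
print(reduced.shape)  # (500, 64)
```

After the square root, each RootSIFT vector has unit L2 norm, so Euclidean distances between them correspond to the Hellinger kernel on the original descriptors.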

Page 8

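2. Fisher Vector encoding

[Perronnin et al. ECCV 2012]

Dense SIFT features x_i → GMM with Gaussians (μ_k, Σ_k) → FV encoding (first and second order statistics) + sqrt-L2 normalisation

Hard assignment γ_k(x_i) of each feature x_i to a Gaussian k.

\[
\Phi = \begin{bmatrix} v_1 \\ u_1 \\ v_2 \\ u_2 \\ \vdots \\ v_K \\ u_K \end{bmatrix},
\qquad
v_k = \frac{1}{M\sqrt{\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i)\, \frac{x_i - \mu_k}{\sigma_k},
\qquad
u_k = \frac{1}{M\sqrt{2\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i) \left[ \left( \frac{x_i - \mu_k}{\sigma_k} \right)^{2} - 1 \right]
\]

Here M is the number of local features, π_k is the mixture weight and σ_k the standard deviation of Gaussian k (diagonal Σ_k).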
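The encoding above can be sketched in NumPy as follows. This assumes a diagonal-covariance GMM whose parameters are already known; the toy parameters, the feature dimension and the function name are hypothetical, and the code is an illustration of the standard Fisher vector formulas rather than the authors' implementation.

```python
import numpy as np

def fisher_vector(X, weights, mu, sigma):
    """Hard-assignment Fisher vector with signed square-root and L2 normalisation.

    X: (M, D) local features; weights: (K,) mixture weights;
    mu, sigma: (K, D) means and standard deviations of a diagonal GMM.
    """
    M = X.shape[0]
    K = weights.shape[0]
    Z = (X[:, None, :] - mu[None]) / sigma[None]              # (M, K, D) whitened offsets
    # Log-density under each Gaussian, up to a constant shared by all components.
    log_p = (np.log(weights)[None, :]
             - np.log(sigma).sum(axis=1)[None, :]
             - 0.5 * (Z ** 2).sum(axis=2))
    gamma = np.zeros((M, K))
    gamma[np.arange(M), log_p.argmax(axis=1)] = 1.0           # hard assignment
    v = (gamma[:, :, None] * Z).sum(axis=0) / (M * np.sqrt(weights)[:, None])
    u = (gamma[:, :, None] * (Z ** 2 - 1)).sum(axis=0) / (M * np.sqrt(2 * weights)[:, None])
    phi = np.stack([v, u], axis=1).reshape(-1)                # order: v1, u1, v2, u2, ...
    phi = np.sign(phi) * np.sqrt(np.abs(phi))                 # signed square root
    return phi / max(np.linalg.norm(phi), 1e-12)              # L2 normalisation

rng = np.random.default_rng(0)
K, D, M = 4, 8, 200                       # toy sizes (hypothetical)
weights = np.full(K, 1.0 / K)
mu = rng.standard_normal((K, D))
sigma = np.ones((K, D))
X = rng.standard_normal((M, D))
phi = fisher_vector(X, weights, mu, sigma)
print(phi.shape)  # (64,)
```

The descriptor has 2KD dimensions, which is why a separate compression step is needed before it becomes compact.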

Page 9

2. Fisher Vector Encoding

Gaussian components as part detectors

Spatial (x, y) augmentation:

\[
\begin{bmatrix} \dfrac{x}{W} - \dfrac{1}{2} \\[4pt] \dfrac{y}{H} - \dfrac{1}{2} \end{bmatrix}
\]
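A short sketch of the spatial augmentation, assuming that the two normalised coordinates are appended to each 64-D local descriptor and that W and H are the width and height of the face image. Both assumptions, and the function name, are illustrative.

```python
import numpy as np

def augment_with_location(desc, xy, width, height):
    """Append (x/W - 1/2, y/H - 1/2) to each local descriptor."""
    xy = np.asarray(xy, dtype=np.float64)
    loc = np.column_stack([xy[:, 0] / width - 0.5, xy[:, 1] / height - 0.5])
    return np.hstack([desc, loc])

desc = np.zeros((3, 64))                          # placeholder descriptors
xy = np.array([[0, 0], [80, 60], [160, 120]])     # feature centres in pixels
aug = augment_with_location(desc, xy, width=160, height=120)
print(aug.shape)  # (3, 66)
```

With the location included in the feature, each Gaussian of the GMM can specialise to a region of the face, which is one way to read "Gaussian components as part detectors".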

Page 11

3. Video and jittered pooling

▶ Typically each frame is pooled independently
▶ Complex inference procedures combining multiple descriptors
▶ Large memory footprint

[Sivic et al. CVPR 2009, Everingham et al. IVC 2009, Wolf et al. CVPR 2011]

Page 12

3. Video and jittered pooling

▶ Single descriptor per track
▶ Smaller memory footprint
▶ Easy to use
▶ Improved performance

[Application to Action Recognition: Oneata, Verbeek, Schmid ICCV 2013]
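The difference between per-frame pooling and video pooling can be sketched as below. The encoder here is a deliberately trivial placeholder (a mean over local features) standing in for the Fisher vector, and the track is random data; only the pooling structure is the point.

```python
import numpy as np

def toy_encode(features):
    """Placeholder encoder (mean of local features); a Fisher vector would be used in practice."""
    return features.mean(axis=0)

def image_pool(track):
    """One descriptor per frame: storage grows with the track length."""
    return np.stack([toy_encode(f) for f in track])

def video_pool(track):
    """One descriptor per track: the features of all frames are encoded together."""
    return toy_encode(np.vstack(track))

rng = np.random.default_rng(0)
# Hypothetical track: 30 frames with 50 local 66-D features each.
track = [rng.standard_normal((50, 66)) for _ in range(30)]
print(image_pool(track).shape)  # (30, 66)
print(video_pool(track).shape)  # (66,)
```

Comparing two tracks then needs a single distance computation rather than a procedure that combines many per-frame comparisons.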

Page 13

3. Video and jittered pooling

▶ Data augmentation
▶ Data augmentation without training set increase
▶ Improvement in performance

[Paulin et al. CVPR 2014]
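One way to picture jittered pooling: features from perturbed copies of each frame are pooled into the same track descriptor, so the number of descriptors does not grow. The sketch below assumes horizontal flipping as the only jitter (the slide does not specify the transformations) and uses toy feature extraction and encoding functions that are purely illustrative.

```python
import numpy as np

def toy_extract(frame):
    """Placeholder local features: each image row is treated as one feature."""
    return frame.reshape(frame.shape[0], -1)

def toy_encode(features):
    """Placeholder encoder (mean of local features)."""
    return features.mean(axis=0)

def jittered_pool(track):
    """Pool features from every frame and from its jittered (flipped) copy."""
    feats = []
    for frame in track:
        feats.append(toy_extract(frame))
        feats.append(toy_extract(np.fliplr(frame)))  # jittered copy (assumed: horizontal flip)
    return toy_encode(np.vstack(feats))

rng = np.random.default_rng(0)
track = [rng.random((8, 8)) for _ in range(5)]       # hypothetical 8x8 frames
d = jittered_pool(track)
d_flipped = jittered_pool([np.fliplr(f) for f in track])
print(d.shape)                    # (8,)
print(np.allclose(d, d_flipped))  # True
```

The track still yields one descriptor, and in this toy setting the descriptor no longer changes when the input is flipped.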

Page 15

4. Metric Learning

Learn to discriminate faces

[Simonyan, Parkhi, Vedaldi, Zisserman BMVC 2013]

A learnt projection W maps a Fisher vector x to the descriptor z = Wx. Distances are measured after projection:

\[ d_W^2(x, y) = \| W x - W y \|^2 \]

Same person (x, y): \( d_W^2(x, y) = \| W x - W y \|^2 < b \)

Different people (u, v): \( d_W^2(u, v) = \| W u - W v \|^2 > b \)
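The verification rule can be written in a few lines. In this sketch W is a random stand-in for the learnt projection, and the dimensions and the threshold b are hypothetical values chosen only to make the example run; learning W and b from labelled pairs is not shown.

```python
import numpy as np

def squared_distance(W, x, y):
    """d_W^2(x, y) = ||Wx - Wy||^2."""
    diff = W @ x - W @ y
    return float(diff @ diff)

def same_person(W, x, y, b):
    """Declare a match when the projected squared distance is below the threshold b."""
    return squared_distance(W, x, y) < b

rng = np.random.default_rng(0)
D, p = 1000, 128                                  # hypothetical input and projected dimensions
W = rng.standard_normal((p, D)) / np.sqrt(D)      # random stand-in for a learnt projection
x = rng.standard_normal(D)
y = x + 0.01 * rng.standard_normal(D)             # near-duplicate of x
u = rng.standard_normal(D)                        # unrelated vector
b = 1.0                                           # hypothetical threshold

print(same_person(W, x, y, b))  # True
print(same_person(W, x, u, b))  # False
```

Because W has far fewer rows than columns, the same matrix both compresses the Fisher vector and defines the metric.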

Page 17

5. Binarisation

Parseval Tight Frame

▶ Low-dimensional real-valued descriptor → high-dimensional binary
▶ 4x decrease in memory footprint (128D real → 1024D binary)
▶ Fast distance computation
▶ Alternative binarisation methods could be used

[Jégou et al. ICASSP 2012, Simonyan et al. PAMI 2014]

The m-dimensional real-valued descriptor z is multiplied by a q × m matrix U (columns from a random rotation matrix) and thresholded:

\[ \operatorname{sign}(U z) \in \{-1, +1\}^{q} \]

The result is stored in q bits only, e.g. 0101010.
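A sketch of the binarisation step with m = 128 and q = 1024, matching the sizes on the slide. The random orthogonal matrix is obtained here from a QR decomposition of a Gaussian matrix, which is an assumption about how such a matrix might be generated; the function names are illustrative.

```python
import numpy as np

def make_frame(q, m, rng):
    """First m columns of a random q x q orthogonal matrix, so that U.T @ U = I."""
    Q, _ = np.linalg.qr(rng.standard_normal((q, q)))
    return Q[:, :m]

def binarise(U, z):
    """sign(Uz), stored as packed bits (q / 8 bytes)."""
    return np.packbits(U @ z > 0)

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(0)
m, q = 128, 1024
U = make_frame(q, m, rng)
z1, z2 = rng.standard_normal(m), rng.standard_normal(m)   # stand-ins for real descriptors
c1, c2 = binarise(U, z1), binarise(U, z2)
print(c1.nbytes)         # 128
print(hamming(c1, c1))   # 0
print(hamming(c1, c2))
```

A 128-D float32 descriptor takes 512 bytes, so the 1024-bit code (128 bytes) gives the 4x reduction quoted above, and distances reduce to XOR and bit counting.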

Page 19

Face Verification

YouTube Faces Dataset

▶ Face verification in videos
▶ 3,425 videos of 1,595 celebrities
▶ Videos collected from the internet
▶ Wide pose, expression and illumination variation
▶ 10 splits of 600 pairs of videos

▶ Restricted setting: use provided pairs
▶ Unrestricted setting: free to form own pairs

[Figure: example video pairs labelled "same" and "different"]

[Wolf, Hassner, Maoz CVPR 2011]

Page 20

Face Verification

YouTube Faces Dataset

Method                                   Error (%)
Image Pool (soft assignment FV)          17.3
Video Pool (soft assignment FV)          15
Video Pool (hard assignment FV)          16.2
Video Pool + Jittered Pool               14.2
Video Pool + Binar. 1024 bit + jitt.     13.4
Video Pool + Joint sim. + jitt.          12.3

Page 21

Face Verification

YouTube Faces Dataset

Method                       Error (%)
MBGS & SVM-                  21.2
APEM FUSION                  21.4
STFRD & PMML                 19.9
VSOF & OSS (Adaboost)        20
DDML (Combined)              18.5
VF 1024D (binary)            13.4
VF 256D                      12.3
Deep Face (facebook.com)     8.6

Note: methods marked on the original chart require additional training data.

Page 22

Weakly supervised face classification

Oxford Buffy Dataset

▶ “Buffy The Vampire Slayer”
▶ Face tracks from 7 episodes of season 5
▶ Both frontal and profile detections
▶ Weak supervision from transcript and subtitles
▶ Multi-class classification for every episode

[Everingham et al. IVC 2009, Sivic et al. CVPR 2009]

Page 23

Weakly supervised classification

Oxford Buffy Dataset

Method                                          Avg. AP
Sivic et al. (HOG RBF MKL)                      0.812
VF (GMMs trained on Buffy)                      0.81
VF (GMMs trained on YTF)                        0.8
VF (GMMs trained on YTF) + Jitt. Pool 1024D     0.86
VF (GMMs trained on YTF, 2048b)                 0.82

Page 24

Recap

Very simple yet powerful face track descriptor

▶ Track descriptor in 128 bytes
▶ Face landmarks and alignment not required
▶ One descriptor per track

▶ State of the art/comparable results on multiple tasks
  ▶ YouTube Faces Dataset
  ▶ Oxford Buffy Dataset

▶ Can be trained with a very small amount of data
▶ Extremely easy to compute

▶ Code online soon.

Questions?