Character Identification in Feature-Length Films Using Global Face-Name Matching IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009 Yi-Fan Zhang, Student Member, IEEE, Changsheng Xu, Senior Member, IEEE, Hanqing Lu, Senior Member, IEEE, and Yeh-Min Huang, Member, IEEE


Character Identification in Feature-Length Films Using Global Face-Name Matching

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 11, NO. 7, NOVEMBER 2009

Yi-Fan Zhang, Student Member, IEEE, Changsheng Xu, Senior Member, IEEE, Hanqing Lu, Senior Member, IEEE,

and Yeh-Min Huang, Member, IEEE


Outline

• Introduction
• Face Clustering
• Face-Name Association
• Applications
• Experiment
• Conclusions


Introduction

• In a film, the interactions among the characters assemble them into a relationship network, so a film can be treated as a small society.

• In the video, faces can stand for characters and co-occurrence of the faces in a scene can represent an interaction between characters.

• In the film script, the spoken lines of different characters appearing in the same scene also represent an interaction.

Script format (figure labels): scene title; brief description (environment, actions); speaker name; spoken line


Introduction

Framework overview (figure labels): speaking face tracks; face affinity network; name affinity network; a graph matching method; an EMD-based measure of face track distance; leading characters & cliques

To stay as consistent as possible with the name statistics in the script, we select only the speaking face tracks to build the face affinity network, which is based on the co-occurrence of the speaking face tracks.


Face Clustering

face detection: detect faces in each frame of the video

face track (the same person): store the face position, the scale, and the start and end frame numbers of the track

video scene segmentation:
1. Scene segmentation points are inserted at the boundaries between two shots that have a high degree of discontinuity.
2. To align with the scene partition in the film script, we adjust the discontinuity threshold so that the video has the same number of scenes as the script.

(Figure: scene segmentation points separating scene 5, scene 6, and scene 7.)
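The threshold adjustment described for scene segmentation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes one discontinuity score per shot boundary, and the function name `segment_scenes` and the sample scores are hypothetical. Picking the (N-1)-th largest score as the threshold yields N scenes, provided there are no ties at that value.

```python
def segment_scenes(discontinuity, num_script_scenes):
    """Pick a threshold so the video is cut into num_script_scenes scenes.

    discontinuity[i] is an assumed score for the boundary after shot i.
    Returns (threshold, indices of boundaries chosen as scene cuts).
    """
    cuts_needed = num_script_scenes - 1
    if cuts_needed <= 0:
        return float("inf"), []
    if cuts_needed > len(discontinuity):
        raise ValueError("script has more scenes than the video has shots")
    ranked = sorted(discontinuity, reverse=True)
    threshold = ranked[cuts_needed - 1]
    # Every boundary at or above the threshold becomes a scene cut.
    cut_points = [i for i, d in enumerate(discontinuity) if d >= threshold]
    return threshold, cut_points


# Hypothetical scores for five shot boundaries; the script has three scenes.
threshold, cuts = segment_scenes([0.1, 0.8, 0.3, 0.9, 0.2], 3)
print(threshold, cuts)  # 0.8 [1, 3]
```

If several boundaries share the threshold value, this sketch returns more cuts than requested, so a real system would need a tie-breaking rule.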

speaking face track detection:
1. The mouth ROI is located.
2. SIFT
3. normalized sum of absolute difference (NSAD)
4. If more than 10% of the frames in a face track are labeled as speaking, the track is determined to be a speaking face track.
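The 10% rule in step 4 can be sketched as below. Only that rule comes from the slide; the normalization inside `nsad` (division by 255 times the pixel count), the per-frame threshold `nsad_threshold`, and the representation of a mouth ROI as a flat list of gray values are assumptions made for illustration. The SIFT step is not modeled here.

```python
def nsad(roi_a, roi_b):
    """Normalized sum of absolute differences between two equal-size ROIs.

    Assumed normalization: divide by 255 * number of pixels, giving [0, 1].
    """
    total = sum(abs(p - q) for p, q in zip(roi_a, roi_b))
    return total / (255.0 * len(roi_a))


def is_speaking_track(mouth_rois, nsad_threshold=0.05, min_ratio=0.10):
    """Label a track as speaking if more than min_ratio of frames move."""
    labels = [
        nsad(prev, cur) > nsad_threshold
        for prev, cur in zip(mouth_rois, mouth_rois[1:])
    ]
    if not labels:
        return False
    return sum(labels) / len(labels) > min_ratio


still = [[10, 10, 10, 10]] * 20                       # mouth never changes
moving = [[0, 0, 0, 0], [255, 255, 255, 255]] * 10    # mouth changes every frame
print(is_speaking_track(still), is_speaking_track(moving))  # False True
```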


Face Clustering

face representation by locally linear embedding (LLE):
1. It is a dimensionality reduction technique.
2. It projects high-dimensional face features into an embedding space that still preserves the neighborhood relationships.

extract the dominant clusters:
1. We employ spectral clustering to cluster all the faces in the LLE space.
2. The number of clusters K is set by prior knowledge derived from the film script.

(Figure: spectral clustering produces the K dominant clusters.)


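Face Clustering

earth mover's distance (EMD): It is a metric to evaluate the dissimilarity between two distributions.

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i}\sum_{j} d_{ij} f_{ij}}{\sum_{i}\sum_{j} f_{ij}}$$

Example (figure, EMD as a similarity measure): $\mathrm{EMD} = \frac{1 \times 1 + 2 \times 2}{3} \approx 1.67$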

measure face track distance by EMD:

represent face track: $P = \{(c_{p_1}, w_{p_1}), \ldots, (c_{p_m}, w_{p_m})\}$

$c_{p_i}$: the cluster center
$w_{p_i}$: the number of faces belonging to this cluster

(Figure: the faces of face track P, numbered 1 to 9, distributed over the dominant clusters cluster1 to cluster5, with the resulting signature of P.)

$$\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n} d_{ij} f_{ij}}{\sum_{i=1}^{m}\sum_{j=1}^{n} f_{ij}}$$

$d_{ij}$: the ground distance between cluster centers $c_{p_i}$ and $c_{q_j}$
$f_{ij}$: the flow between $c_{p_i}$ and $c_{q_j}$
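A small sketch of the two ingredients above: building a face track signature (cluster, face count) and evaluating the EMD ratio, i.e. total work divided by total flow. The flow is taken as given here; finding the optimal flow is a transportation problem that this sketch does not solve. The function names, the sample cluster labels, and the ground distances are hypothetical.

```python
from collections import Counter


def track_signature(face_cluster_labels):
    """Summarize a face track as sorted (cluster id, number of faces) pairs."""
    return sorted(Counter(face_cluster_labels).items())


def emd_from_flow(ground_dist, flow):
    """Evaluate sum(d_ij * f_ij) / sum(f_ij) for a given flow matrix."""
    total_flow = sum(sum(row) for row in flow)
    if total_flow <= 0:
        raise ValueError("flow must carry positive total mass")
    work = sum(
        d * f
        for d_row, f_row in zip(ground_dist, flow)
        for d, f in zip(d_row, f_row)
    )
    return work / total_flow


# Nine faces of one track, each labeled with its dominant cluster (made up).
signature = track_signature([1, 1, 1, 2, 2, 3, 5, 5, 5])
print(signature)  # [(1, 3), (2, 2), (3, 1), (5, 3)]

# One unit moved over distance 1 and two units over distance 2.
value = emd_from_flow([[1.0, 4.0], [3.0, 2.0]], [[1.0, 0.0], [0.0, 2.0]])
print(round(value, 2))  # 1.67
```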


Face Clustering

constrained K-Means clustering:
1. K-Means clustering is performed to group the scattered face tracks.
2. Two face tracks that share common frames cannot be clustered together.
3. The target number of clusters for the face tracks is the same K that we set for spectral clustering on the faces.
4. We also ignore characters who have fewer than three spoken lines in the script.
5. To clean the noise from the clustering results, a pruning method is employed in the next step.

(Figure: speaking face track clusters cluster1 to cluster5, showing face tracks and noise face tracks.)

cluster pruning: We refine the clustering results by pruning the marginal points that have low confidence of belonging to the current cluster.

$$C(F) = \frac{k_{in}}{k}\, e^{-D(F, F_0)}$$

$D(F, F_0)$: the EMD between the face track $F$ and its cluster center $F_0$
$k$: the number of K-nearest neighbors of $F$
$k_{in}$: the number of K-nearest neighbors which belong to the same cluster as $F$

All the marginal points: We do a re-classification which incorporates the speaker voice features for enhancement.

$$S(F, C_k) = e^{-D(F, F_{C_k})}\, P(X \mid C_k)$$

$P(X \mid C_k)$: the likelihood of $C_k$'s voice model for $X$
$X$: the feature vector of the corresponding audio clip

1. To clean noise, we set a threshold $Th_{score}$.
2. The face track will be classified into the cluster whose score $S(F, C_k)$ is maximal.
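The pruning step can be illustrated with a short sketch. It assumes the confidence has the form (k_in / k) * exp(-D(F, F_0)), where the negative sign in the exponent is a reading of the garbled slide formula, and it assumes a track is marginal when its confidence falls below a threshold. The names `confidence`, `split_by_confidence`, and `th_conf`, as well as the sample tracks, are hypothetical.

```python
import math


def confidence(dist_to_center, neighbor_clusters, own_cluster):
    """Assumed form: (k_in / k) * exp(-D), D = EMD to the cluster center."""
    k = len(neighbor_clusters)
    if k == 0:
        return 0.0
    k_in = sum(1 for c in neighbor_clusters if c == own_cluster)
    return (k_in / k) * math.exp(-dist_to_center)


def split_by_confidence(tracks, th_conf):
    """Separate confident tracks from marginal ones (confidence < th_conf)."""
    kept, marginal = [], []
    for name, dist, neighbors, cluster in tracks:
        if confidence(dist, neighbors, cluster) >= th_conf:
            kept.append(name)
        else:
            marginal.append(name)
    return kept, marginal


tracks = [
    # (track id, EMD to center, clusters of its K nearest neighbors, own cluster)
    ("t1", 0.1, ["c1", "c1", "c1", "c1"], "c1"),  # close, neighbors agree
    ("t2", 1.5, ["c1", "c2", "c2", "c2"], "c1"),  # far, neighbors disagree
]
kept, marginal = split_by_confidence(tracks, th_conf=0.5)
print(kept, marginal)  # ['t1'] ['t2']
```

Raising `th_conf` moves more tracks into the marginal set, which would then go through the voice-based re-classification.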


Face-Name Association

We use named entity recognition software to extract every name in front of the spoken lines and in the scene titles.

name occurrence matrix: $O_{name} = [o_{ik}]_{m \times n}$
m: the number of names
n: the number of scenes
$o_{ik}$: the name count of the ith character in the kth scene

name affinity matrix: $R_{name} = [r_{ij}]_{m \times m}$, where $r_{ij} = \sum_{k=1}^{n} \min(o_{ik}, o_{jk})$
The affinity value between two names is represented by their co-occurrence.

face affinity matrix: $R_{face} = [r_{ij}]_{m \times m}$
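The affinity formula r_ij = sum over scenes k of min(o_ik, o_jk) is easy to check on a toy occurrence matrix. The sketch below is illustrative only: the function name `affinity_matrix` and the counts are made up. The same function would apply to a face occurrence matrix built from speaking face track clusters.

```python
def affinity_matrix(occurrence):
    """Compute r_ij = sum_k min(o_ik, o_jk) from an m x n occurrence matrix."""
    m = len(occurrence)
    return [
        [
            sum(min(a, b) for a, b in zip(occurrence[i], occurrence[j]))
            for j in range(m)
        ]
        for i in range(m)
    ]


# Rows are characters, columns are scenes (hypothetical name counts).
occurrence = [
    [2, 0, 3],
    [1, 1, 0],
    [0, 4, 2],
]
affinity = affinity_matrix(occurrence)
print(affinity)  # [[5, 1, 2], [1, 2, 1], [2, 1, 6]]
```

With this formula a diagonal entry r_ii is simply the total count of character i, and the matrix is symmetric.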


Face-Name Association

vertices matching between two graphs: The name affinity network and the face affinity network can both be represented as undirected, weighted graphs: $G_{name} = \{V_n, E_n, W_n\}$ and $G_{face} = \{V_f, E_f, W_f\}$, respectively.

For each assignment $a = (i, i')$, we can find a measurement $M(a, a)$ of how well $i$ matches $i'$.

For each pair of assignments $(a, b)$, where $a = (i, i')$ and $b = (j, j')$, we can find a measurement $M(a, b)$ of how compatible the two assignments are.

We use the spectral matching method to find the final results of name-face association.


Face-Name Association

Spectral matching method: It is commonly used for finding consistent correspondences between two sets of features.

(Figure: two example graphs, one with vertices A, B, C, D and one with vertices 1, 2, 3, 4.)

$P = \{A, B, C, D\}$, $Q = \{1, 2, 3, 4\}$

assignment $a = (i, i')$, where $i \in P$ and $i' \in Q$

example assignment: $\{(A,3), (B,1), (C,4), (D,2)\}$

$a = (i, i')$, $b = (j, j')$

M(a,a): It measures how well the feature i matches the feature i'.
ex: M(A,3)=4, M(A,1)=1

M(a,b): It measures how well the edge (i,j) matches the edge (i',j').
ex: M((A,3),(B,1))=4, M((A,3),(B,4))=0


Rayleigh's ratio theorem: the unit-norm $x^{*}$ that maximizes $x^{T} M x$ is the principal eigenvector of M, corresponding to its largest eigenvalue.

Greedy algorithm: It is used for finding the solution to the correspondence problem.

Matrix M (rows and columns indexed by the candidate assignments A1 to D4):

     A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 D1 D2 D3 D4
A1    2  0  0  0  0  1  4  3  0  3  1  2  0  2  3  4
A2    0  3  0  0  1  0  3  1  3  0  2  4  2  0  4  2
A3    0  0  4  0  4  3  0  1  1  2  0  4  3  4  0  2
A4    0  0  0  1  3  1  1  0  2  4  4  0  4  2  2  0
B1    0  2  4  3  4  0  0  0  0  3  3  4  0  4  2  3
B2    2  0  3  1  0  3  0  0  3  0  4  2  4  0  3  3
B3    4  3  0  1  0  0  2  0  3  4  0  2  2  3  0  3
B4    3  1  1  0  0  0  0  3  4  2  2  0  3  3  3  0
C1    0  3  1  2  0  3  3  4  3  0  0  0  0  3  1  2
C2    3  0  2  4  3  0  4  2  0  2  0  0  3  0  2  4
C3    1  2  0  4  3  4  0  2  0  0  1  0  1  2  0  4
C4    2  4  4  0  4  2  2  0  0  0  0  4  2  4  4  0
D1    0  3  3  4  0  4  2  2  0  3  1  2  3  0  0  0
D2    3  0  4  2  4  0  3  3  3  0  2  4  0  4  0  0
D3    3  4  0  2  2  3  0  3  1  2  0  4  0  0  3  0
D4    4  2  2  0  2  3  3  0  2  4  4  0  0  0  0  2

1. The correspondence problem now reduces to finding the cluster C of assignments (i,i') that maximizes the score

$$S = \sum_{a, b \in C,\ a \neq b} M(a, b) + \sum_{a \in C} M(a, a)$$

2. We can represent a cluster C by an indicator vector x, such that $x(a) = 1$ if $a \in C$ and zero otherwise. Then

$$S = \sum_{a, b \in C,\ a \neq b} M(a, b) + \sum_{a \in C} M(a, a) = x^{T} M x$$

3. The optimal solution is

$$x^{*} = \arg\max_{x} \, x^{T} M x$$

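(Figure: the two example graphs, with vertices A, B, C, D and 1, 2, 3, 4.)

Indicator vector x for the assignment {(A,3), (B,1), (C,4), (D,2)}: the entries A3, B1, C4, and D2 are 1; all other entries (A1, A2, A4, B2, B3, B4, C1, C2, C3, D1, D3, D4) are 0.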
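The spectral matching pipeline (build M, take its principal eigenvector, then discretize greedily) can be sketched in plain Python. This is an illustration under stated assumptions, not the paper's exact formulation: the pairwise score 1 / (1 + |r_name(i,j) - r_face(i',j')|) is a hypothetical choice, the unary terms M(a,a) are set to zero, the eigenvector is approximated by power iteration, and the toy affinities are made up so that the face graph is a relabeled copy of the name graph.

```python
import math


def build_compatibility(r_name, r_face):
    """Build M over all candidate assignments (name i, face i')."""
    m = len(r_name)
    candidates = [(i, ip) for i in range(m) for ip in range(m)]
    size = len(candidates)
    M = [[0.0] * size for _ in range(size)]
    for a, (i, ip) in enumerate(candidates):
        for b, (j, jp) in enumerate(candidates):
            if i != j and ip != jp:
                # Hypothetical score: 1 when the two edge weights agree.
                M[a][b] = 1.0 / (1.0 + abs(r_name[i][j] - r_face[ip][jp]))
    return candidates, M


def principal_eigenvector(M, iterations=100):
    """Approximate the principal eigenvector of M by power iteration."""
    n = len(M)
    x = [1.0] * n
    for _ in range(iterations):
        y = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    return x


def greedy_discretize(x, candidates):
    """Accept assignments by decreasing x, skipping one-to-one conflicts."""
    order = sorted(range(len(x)), key=lambda a: x[a], reverse=True)
    used_names, used_faces, chosen = set(), set(), []
    for a in order:
        i, ip = candidates[a]
        if x[a] <= 0.0 or i in used_names or ip in used_faces:
            continue
        chosen.append((i, ip))
        used_names.add(i)
        used_faces.add(ip)
    return chosen


# Toy data: name i truly corresponds to face cluster perm[i].
r_name = [[0, 5, 1], [5, 0, 3], [1, 3, 0]]
perm = [2, 0, 1]
r_face = [[0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        r_face[perm[i]][perm[j]] = r_name[i][j]

candidates, M = build_compatibility(r_name, r_face)
x = principal_eigenvector(M)
matches = sorted(greedy_discretize(x, candidates))
print(matches)  # [(0, 2), (1, 0), (2, 1)]
```

In this toy case the three correct assignments support each other with the maximal pairwise score, so they receive the largest eigenvector entries and the greedy pass recovers the permutation.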


Applications

• So far, we have associated a name with each speaking face track cluster. The non-speaking face tracks can also be classified into the nearest speaking face track clusters according to the EMD.

Character Relationship Mining
• The relationship mining is conducted on the name affinity network.
• The leading character can be considered as the one who has high centrality in the name affinity network. The centrality of a character is defined as .


Applications

• For clique detection, we use agglomerative hierarchical clustering.

and are the numbers of the

characters in and

We classify the resulting cliques into dyads (two members), triads (three members), and large cliques.


Applications

Character-Centered Browsing


Experiment

film information

speaking face track detection


Experiment

The higher the value of $Th_{conf}$, the more speaking face tracks will be pruned.

(Figure: precision/recall curves of face track clustering.)


Experiment

name-face association

relationship mining


Conclusions

• A graph matching method has been utilized to build name-face association between the name affinity network and the face affinity network.

• As an application, we have mined the relationship between characters and provided a platform for character-centered film browsing.