ICML2011: recognizing human-object interaction activities


Transcript of ICML2011: recognizing human-object interaction activities

Page 1: ICML2011: recognizing human-object interaction activities

1

Recognizing Human-Object Interaction Activities

Bangpeng Yao, Aditya Khosla and Li Fei-Fei

Computer Science Department, Stanford University

{bangpeng,feifeili}@cs.stanford.edu

• Action Classification
• Action Retrieval

Page 2: ICML2011: recognizing human-object interaction activities

2

B. Yao and L. Fei-Fei. “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities.” CVPR 2010.

B. Yao, A. Khosla, and L. Fei-Fei. “Classifying Actions and Measuring Action Similarity by Modeling the Mutual Context of Objects and Human Poses.” ICML 2011.

Page 3: ICML2011: recognizing human-object interaction activities

Visual Recognition

3

Page 4: ICML2011: recognizing human-object interaction activities

Visual Recognition

Focus on Humans

6

Page 5: ICML2011: recognizing human-object interaction activities

Human images are everywhere:

Why are humans important?

7

Page 6: ICML2011: recognizing human-object interaction activities

Why are humans important?

Top 3 most popular synsets in ImageNet:

Deng et al, 2009

http://www.image-net.org/

8

Page 7: ICML2011: recognizing human-object interaction activities

Human Action Recognition

9

Robots interact with objects

Automatic sports commentary

Security – Drunk people detection

Page 8: ICML2011: recognizing human-object interaction activities

Human Action Recognition
Human-Object Interaction

B. Yao and L. Fei-Fei. "Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities." IEEE Computer Vision and Pattern Recognition (CVPR). 2010.

B. Yao, A. Khosla, and L. Fei-Fei. Classifying Actions and Measuring Action Similarity by Modeling the Mutual Context of Objects and Human Poses. International Conference on Machine Learning (ICML). 2011.

Robots interact with objects

Automatic sports commentary

Security – Drunk people detection

10

Page 9: ICML2011: recognizing human-object interaction activities

11

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 10: ICML2011: recognizing human-object interaction activities

12

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 11: ICML2011: recognizing human-object interaction activities

• Felzenszwalb & Huttenlocher, 2005
• Ren et al, 2005
• Ramanan, 2006
• Ferrari et al, 2008
• Yang & Mori, 2008
• Andriluka et al, 2009
• Eichner & Ferrari, 2009

Difficult part appearance

Self-occlusion

Image region looks like a body part

Human pose estimation & Object detection

13

Human pose estimation is challenging.

Page 12: ICML2011: recognizing human-object interaction activities

Human pose estimation & Object detection

14

Human pose estimation is challenging.

• Felzenszwalb & Huttenlocher, 2005
• Ren et al, 2005
• Ramanan, 2006
• Ferrari et al, 2008
• Yang & Mori, 2008
• Andriluka et al, 2009
• Eichner & Ferrari, 2009

Page 13: ICML2011: recognizing human-object interaction activities

Human pose estimation & Object detection

15

Facilitate

Given the object is detected.

Page 14: ICML2011: recognizing human-object interaction activities

• Viola & Jones, 2001
• Lampert et al, 2008
• Divvala et al, 2009
• Vedaldi et al, 2009

Small, low-resolution, partially occluded

Image region similar to detection target

Human pose estimation & Object detection

16

Object detection is challenging

Page 15: ICML2011: recognizing human-object interaction activities

Human pose estimation & Object detection

17

Object detection is challenging

• Viola & Jones, 2001
• Lampert et al, 2008
• Divvala et al, 2009
• Vedaldi et al, 2009

Page 16: ICML2011: recognizing human-object interaction activities

Human pose estimation & Object detection

18

Facilitate

Given the pose is estimated.

Page 17: ICML2011: recognizing human-object interaction activities

Human pose estimation & Object detection

19

Mutual Context

Page 18: ICML2011: recognizing human-object interaction activities

20

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 19: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

21

Activity classes:
– Croquet shot
– Volleyball smash
– Tennis forehand

[Model diagram: activity A, image evidence I]

[Yao et al, 2011]

Page 20: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

22

Human pose as layout of body parts.

[Model diagram: activity A, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 21: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

23

Volleyball smashing

Cricket bowling

Tennis forehand

Human pose as layout of body parts.

Atomic poses – pose dictionary.

[Model diagram: activity A, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 22: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

24

List of objects:

Humans interact with any number of objects:

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 23: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

25

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 24: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

26

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 25: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

27

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

Compatibility between actions, objects, and human poses:

φ1(A, O, H) = Σ_{i=1..N_h} Σ_{m=1..M} Σ_{j=1..N_o} Σ_{k=1..N_a} α_{i,j,k} · 1(H = h_i) · 1(O_m = o_j) · 1(A = a_k)

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 26: ICML2011: recognizing human-object interaction activities

Mutual Context Model Representation

28

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

Modeling actions:

φ2(A, I) = Σ_{k=1..N_a} 1(A = a_k) · γ_k^T s(I)

where s(I) is the N_a-dimensional output of an action classifier.

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 27: ICML2011: recognizing human-object interaction activities

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

Mutual Context Model Representation

29

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

Modeling objects:

φ3(O, I) = Σ_{m=1..M} Σ_{j=1..N_o} 1(O_m = o_j) · η_j^T g(O_m) + Σ_{m≠m′} Σ_{j,j′} 1(O_m = o_j) · 1(O_{m′} = o_{j′}) · η_{j,j′}^T b(O_m, O_{m′})

where g(O_m) are object detection scores and b(O_m, O_{m′}) encodes the spatial relationship between two object windows.

[Yao et al, 2011]

Page 28: ICML2011: recognizing human-object interaction activities

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

Mutual Context Model Representation

30

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

Modeling human poses:

φ4(H, I) = Σ_{i=1..N_h} Σ_{l=1..L} 1(H = h_i) · [ δ_{i,l}^T f^l(I, x^l) + ω_{i,l}^T p(x^l | h_i) ]

where f^l(I, x^l) is the detection score of the l-th body part and p(x^l | h_i) is the location of the l-th body part under the prior of atomic pose h_i.

[Yao et al, 2011]

Page 29: ICML2011: recognizing human-object interaction activities

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

Mutual Context Model Representation

31

Conditional Random Field:

Ψ(A, O, H, I) = φ1(A, O, H) + φ2(A, I) + φ3(O, I) + φ4(H, I) + φ5(O, H)

Modeling pose–object spatial relationships:

φ5(H, O) = Σ_{i=1..N_h} Σ_{m=1..M} Σ_{j=1..N_o} Σ_{l=1..L} 1(H = h_i) · 1(O_m = o_j) · λ_{i,j,l}^T b(x^l, O_m)

where b(x^l, O_m) encodes the spatial relationship between the l-th body part and the m-th object window.
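The five potentials above are summed into a single score and maximized. A schematic Python sketch, with toy random weights and with φ3–φ5 summarized as precomputed scalars (all names and values here are illustrative, not the paper's code):

```python
import numpy as np

# Schematic sketch of the mutual-context CRF score
# Psi(A, O, H, I) = phi1 + phi2 + phi3 + phi4 + phi5.
# Weights and feature values below are made-up toy numbers;
# the real model learns them from data.

Na, No, Nh = 3, 2, 4          # numbers of action, object, and atomic-pose labels
rng = np.random.default_rng(0)

alpha = rng.normal(size=(Nh, No, Na))   # co-occurrence weights for phi1

def phi1(a, obj_labels, h):
    """Compatibility between action a, detected object labels, and atomic pose h."""
    return sum(alpha[h, o, a] for o in obj_labels)

def phi2(a, action_scores, gamma):
    """Action-classifier term: weight the Na-dim classifier output for label a."""
    return gamma[a] @ action_scores

def psi(a, obj_labels, h, action_scores, gamma,
        obj_score=0.0, pose_score=0.0, pose_obj_score=0.0):
    # phi3, phi4, phi5 are summarized here as precomputed scalars
    return (phi1(a, obj_labels, h) + phi2(a, action_scores, gamma)
            + obj_score + pose_score + pose_obj_score)

gamma = rng.normal(size=(Na, Na))
s_I = np.array([0.2, 0.7, 0.1])         # toy classifier output s(I)

# Inference over (A, H) for fixed objects: enumerate all label pairs.
best = max(((a, h) for a in range(Na) for h in range(Nh)),
           key=lambda ah: psi(ah[0], [0, 1], ah[1], s_I, gamma))
print(best)
```

Enumerating (A, H) pairs like this is exactly what makes the joint label update tractable: both label spaces are small and discrete.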

[Yao et al, 2011]

Page 30: ICML2011: recognizing human-object interaction activities

32

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 31: ICML2011: recognizing human-object interaction activities

Mutual Context Model Learning

33

• Obtaining atomic poses

Annotating

Clustering
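The annotate-then-cluster step can be sketched as follows, assuming annotated poses are represented as vectors of 2-D body-part coordinates and clustered with plain k-means (the paper's exact clustering procedure may differ):

```python
import numpy as np

# Sketch of obtaining "atomic poses": cluster annotated body-part layouts
# (here, random toy vectors of L joint coordinates) into a small pose dictionary.

rng = np.random.default_rng(1)
L = 6                                  # body parts
X = rng.normal(size=(200, 2 * L))      # 200 annotated poses, (x, y) per part

def kmeans(X, k, iters=50):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each pose to the nearest cluster center
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers as cluster means
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

atomic_poses, assignment = kmeans(X, k=4)
print(atomic_poses.shape)
```

Each cluster center plays the role of one atomic pose h_i in the dictionary; the per-image assignment gives the discrete pose label used by the CRF.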

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 32: ICML2011: recognizing human-object interaction activities

Mutual Context Model Learning

34

• Obtaining atomic poses
• Potentials

– Object & body part detection

One detector for each object or body part

Deformable part model [Felzenszwalb et al, 2008]

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 33: ICML2011: recognizing human-object interaction activities

Mutual Context Model Learning

35

• Obtaining atomic poses
• Potentials
– Object & body part detection
– Action classification

Spatial pyramid model [Lazebnik et al, 2006]

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

Human pose

[Yao et al, 2011]

Page 34: ICML2011: recognizing human-object interaction activities

Mutual Context Model Learning

36

• Obtaining atomic poses
• Potentials
– Object & body part detection
– Action classification
– Spatial relationships

Bin function [Desai et al, 2009]

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 35: ICML2011: recognizing human-object interaction activities

Mutual Context Model Learning

37

• Obtaining atomic poses
• Potentials
– Object & body part detection
– Action classification
– Spatial relationships
• Model parameter estimation

Standard conditional random field learning: belief propagation [Pearl, 1988]

[Model diagram: activity A, objects O1…OM, human pose H, body parts P1…PL, image evidence I]

[Yao et al, 2011]

Page 36: ICML2011: recognizing human-object interaction activities

Model Learning Result

38

Activity classes:

Atomic poses:

Objects:

Page 37: ICML2011: recognizing human-object interaction activities

Model Learning Result

39

Activity classes:

Atomic poses:

Objects:

Tennis Serving

Page 38: ICML2011: recognizing human-object interaction activities

Model Learning Result

40

Activity classes:

Atomic poses:

Objects:

Tennis Serving

Volleyball Smash

Page 39: ICML2011: recognizing human-object interaction activities

41

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 40: ICML2011: recognizing human-object interaction activities

42

Model Inference for Pose Estimation, Object Detection, and Action Classification

• Initialization
• Iteratively optimize Ψ(A, O, H, I):
– Updating the layout of human body parts
– Updating the object detections
– Updating the action and atomic pose labels
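The loop above can be sketched as coordinate ascent over a toy scoring function (the continuous body-part layout update is omitted for brevity; all names and weights here are illustrative):

```python
# Schematic coordinate-ascent inference: alternately update the object
# detections and the (action, atomic-pose) labels, each time keeping the
# other variables fixed. `psi` is a stand-in scoring function over toy
# discrete states, not the paper's actual potentials.

import itertools, random

random.seed(0)
ACTIONS, POSES, OBJECT_SETS = range(3), range(4), [(), (0,), (1,), (0, 1)]
weights = {s: random.random()
           for s in itertools.product(ACTIONS, POSES, OBJECT_SETS)}

def psi(a, h, objs):
    return weights[(a, h, objs)]

a, h, objs = 0, 0, ()           # initialization
for _ in range(5):              # iterate the update steps until stable
    objs = max(OBJECT_SETS, key=lambda o: psi(a, h, o))      # update objects
    a, h = max(itertools.product(ACTIONS, POSES),            # update labels
               key=lambda ah: psi(ah[0], ah[1], objs))
print(a, h, objs)
```

Each step can only increase Ψ, so the loop converges to a (local) maximum of the joint score.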

Page 41: ICML2011: recognizing human-object interaction activities

43

Model Inference for Pose Estimation, Object Detection, and Action Classification

• Initialization
• Iteratively optimize Ψ(A, O, H, I):
– Updating the layout of human body parts

Mixture model over atomic poses, e.g. p(H = h1) = 0.51, p(H = h2) = 0.06, p(H = h3) = 0.04, …

Re-estimate human pose [Felzenszwalb et al, 2005] [Sapp et al, 2010]

Page 42: ICML2011: recognizing human-object interaction activities

44

Model Inference for Pose Estimation, Object Detection, and Action Classification

• Initialization
• Iteratively optimize Ψ(A, O, H, I):
– Updating the layout of human body parts
– Updating the object detections

Start from no objects in the image; evaluate the contribution of each detection window to increasing Ψ(A, O, H, I) separately.

Page 43: ICML2011: recognizing human-object interaction activities

45

Model Inference for Pose Estimation, Object Detection, and Action Classification

• Initialization
• Iteratively optimize Ψ(A, O, H, I):
– Updating the layout of human body parts
– Updating the object detections
– Updating the action and atomic pose labels

Enumerate all possible A and H values to maximize Ψ(A, O, H, I).

Page 44: ICML2011: recognizing human-object interaction activities

46 [Gupta et al, 2009]

Cricket batting Cricket bowling Croquet shot

Tennis forehand Tennis serve Volleyball smash

Sport data set: 6 classes, 180 training (supervised with object and body part locations) & 120 testing images

Action Classification Experiment

Page 45: ICML2011: recognizing human-object interaction activities

47

Action Classification Results

[Bar charts: classification accuracy (y-axis 0.5–1.0) for cricket batting, cricket bowling, croquet shot, tennis forehand, tennis serving, volleyball smash, and overall. Methods: Lazebnik et al. (2006); Yao & Fei-Fei (2010); our method (Yao et al., 2011). Overall accuracy: 83% (Yao & Fei-Fei, 2010) vs. 87% (our method).]

Page 46: ICML2011: recognizing human-object interaction activities

48 [Gupta et al, 2009]

Cricket batting Cricket bowling Croquet shot

Tennis forehand Tennis serve Volleyball smash

Object Detection and Pose Estimation

Sport data set: 6 classes, 180 training (supervised with object and body part locations) & 120 testing images

Page 47: ICML2011: recognizing human-object interaction activities

49

Object Detection Results

Method           Felzenszwalb et al. (2010)   Desai et al. (2009)   Yao et al. (2011)
cricket bat      .17                          .18                   .20
cricket ball     .24                          .27                   .32
cricket stump    .77                          .78                   .77
croquet mallet   .29                          .32                   .34
croquet ball     .50                          .52                   .58
croquet hoop     .15                          .17                   .22
tennis racket    .33                          .31                   .37
tennis ball      .42                          .46                   .49
volleyball       .64                          .65                   .67
volleyball net   .04                          .06                   .09
overall          .36                          .37                   .41

Page 48: ICML2011: recognizing human-object interaction activities

50

Object Detection Results


Page 49: ICML2011: recognizing human-object interaction activities

51

Object Detection Results


Page 50: ICML2011: recognizing human-object interaction activities

52

Human Pose Estimation Results

Method                  Yao & Fei-Fei (2010)   Andriluka et al. (2009)   Yao et al. (2011)
head                    .58                    .71                       .76
torso                   .66                    .69                       .77
left/right upper arms   .44 / .40              .44 / .40                 .52 / .45
left/right lower arms   .27 / .29              .35 / .36                 .39 / .37
left/right upper legs   .43 / .39              .58 / .63                 .63 / .61
left/right lower legs   .44 / .34              .59 / .71                 .60 / .77
overall                 .42                    .55                       .59

Page 51: ICML2011: recognizing human-object interaction activities

53

Human Pose Estimation Results


Page 52: ICML2011: recognizing human-object interaction activities

54

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 53: ICML2011: recognizing human-object interaction activities

Action Recognition as Classification

Cricket batting

Tennis Forehand

Volleyball Smashing

Playing Bassoon

Playing Guitar

Playing Erhu

Running

Gupta et al (2009)
Yao & Fei-Fei (2010)

PASCAL VOC (2010)

Reading

Ikizler-Cinbis et al, 2009
Desai et al, 2010
Yang et al, 2010
Delaitre et al, 2011
Maji et al, 2011

55

Page 54: ICML2011: recognizing human-object interaction activities

Is Classification the End?

stand run

Actions in a continuous space

56

Page 55: ICML2011: recognizing human-object interaction activities

Is Classification the End?

Same action, different meanings

57

Page 56: ICML2011: recognizing human-object interaction activities

Is Classification the End?

More than one action at the same time

Shopping

Calling

58

Page 57: ICML2011: recognizing human-object interaction activities

59

Retrieval Instead of Classification
Retrieval as Similarity Ranking

[Images ordered by decreasing similarity to a reference action]

Page 58: ICML2011: recognizing human-object interaction activities

60

Ref.

Retrieval as Similarity Ranking

Decreasing similarity value

Page 59: ICML2011: recognizing human-object interaction activities

61

Retrieval as Similarity Ranking

Ref.

Decreasing similarity value

Page 60: ICML2011: recognizing human-object interaction activities

62

Ref.

Retrieval as Similarity Ranking

• Challenges:
– How to obtain the ground truth?
– How to perform automatic retrieval?
– How to evaluate a retrieval system?

Decreasing similarity value

Page 61: ICML2011: recognizing human-object interaction activities

63

Action Retrieval: Obtaining Ground Truth

• Human annotation experiment:– Eight human subjects, the same set of 252 trials.

One trial:

Comparison images

Reference image

Page 62: ICML2011: recognizing human-object interaction activities

64

• Human annotation experiment:– Eight human subjects, the same set of 252 trials.

One trial:

Reference image

Comparison images

Reference image

Action Retrieval: Obtaining Ground Truth

Page 63: ICML2011: recognizing human-object interaction activities

65

• Human annotation experiment:– Eight human subjects, the same set of 252 trials.

One trial:


Reference image

Comparison images

Action Retrieval: Obtaining Ground Truth

Page 64: ICML2011: recognizing human-object interaction activities

66

• Human annotation experiment:– Eight human subjects, the same set of 252 trials.

One trial:


Reference image

Comparison images

Action Retrieval: Obtaining Ground Truth

Page 65: ICML2011: recognizing human-object interaction activities

67

• Human annotation experiment:– Eight human subjects, the same set of 252 trials.

[Histogram: percentage of trials (0–0.6) for each degree of consistency of human annotations: 8:0, 7:1, 6:2, 5:3, 4:4]

Action Retrieval: Obtaining Ground Truth

Page 66: ICML2011: recognizing human-object interaction activities

68

• From pairwise annotation to overall similarity:

Ref.

Pairwise human annotation Sim(·, ·), e.g.: -1 0 1 0 0 0 1 0 0 -1

Similarity vector: s = (s1, s2, …, sN), s.t. s ≥ 0, ||s||2 = 1
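One possible aggregation from pairwise votes to the normalized similarity vector, assuming each vote is +1/-1/0 and wins are counted then normalized (the exact scheme in the paper may differ):

```python
import numpy as np

# Sketch: turn pairwise "which comparison image is more similar?" votes
# into an overall similarity vector s with s >= 0 and ||s||_2 = 1.
# The aggregation rule (count wins, clip, normalize) is an illustration.

N = 5  # comparison images
# votes[i, j] = +1 if image i was judged more similar to the reference
# than image j, -1 if less similar, 0 if not compared / tied
votes = np.array([
    [ 0,  1,  1,  1,  1],
    [-1,  0,  1,  1,  1],
    [-1, -1,  0,  1, -1],
    [-1, -1, -1,  0,  1],
    [-1, -1,  1, -1,  0],
])

wins = np.clip(votes, 0, None).sum(axis=1).astype(float)  # win counts >= 0
s = wins / np.linalg.norm(wins)                           # unit L2 norm

print(np.round(s, 3))
```

Image 0 wins every comparison here, so it gets the largest entry of s.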

Page 67: ICML2011: recognizing human-object interaction activities

69

Action Retrieval: Obtaining Ground Truth

Page 68: ICML2011: recognizing human-object interaction activities

70

Ref.

0.260 0.227 0.145 0.135 0.112

0.085 0.075 0.041 0.012 0.006

0.002 0.000 0.000 0.000 0.000

• From pairwise annotation to overall similarity:

Action Retrieval: Obtaining Ground Truth

Page 69: ICML2011: recognizing human-object interaction activities

Action Retrieval: Our Approach


Action class

Human pose

Object

71

Page 70: ICML2011: recognizing human-object interaction activities


• Distance between two images I and I′:

D(I, I′) = D(p(A|I), p(A|I′)) + D(p(H|I), p(H|I′)) + D(p(O|I), p(O|I′))

Total variation: D_T(p, q) = Σ_i |p_i − q_i|

Chi-square statistic: D_χ²(p, q) = Σ_i (p_i − q_i)² / (p_i + q_i)
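The two histogram distances can be implemented directly:

```python
import numpy as np

# The two distances used to compare posterior distributions p(.|I) and
# p(.|I') over actions, poses, and objects.

def total_variation(p, q):
    # D_T(p, q) = sum_i |p_i - q_i|
    return np.abs(p - q).sum()

def chi_square(p, q):
    # D_chi2(p, q) = sum_i (p_i - q_i)^2 / (p_i + q_i), skipping empty bins
    mask = (p + q) > 0
    return ((p[mask] - q[mask]) ** 2 / (p[mask] + q[mask])).sum()

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
print(total_variation(p, q))   # ≈ 0.6
print(chi_square(p, q))
```

Both operate on normalized histograms, so the per-component distances for A, H, and O can simply be summed into one image distance.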

72

Action Retrieval: Our Approach

Page 71: ICML2011: recognizing human-object interaction activities

73

Action Retrieval: Evaluation Metric

Ref.

• Ranking from an algorithm: I_re^1, I_re^2, …, I_re^n
• Ranking by ground-truth similarity: I_gt^1, I_gt^2, …, I_gt^n

Accuracy (plotted against the number of neighborhoods):

Acc(n) = [ Σ_{i=1..n} s(I_re^i, I_ref) ] / [ Σ_{i=1..n} s(I_gt^i, I_ref) ]

where s is the ground-truth similarity.
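A direct implementation of this accuracy measure (the function and variable names are illustrative):

```python
import numpy as np

# Retrieval accuracy at n: the summed ground-truth similarity of the top-n
# images returned by the algorithm, divided by the best achievable sum
# (the top-n images under the ground-truth ranking).

def retrieval_accuracy(sim_gt, retrieved_order, n):
    """sim_gt[i]: ground-truth similarity of image i to the reference."""
    ideal = np.sort(sim_gt)[::-1][:n].sum()
    got = sim_gt[retrieved_order[:n]].sum()
    return got / ideal

sim_gt = np.array([0.26, 0.23, 0.15, 0.13, 0.11, 0.08, 0.04])
retrieved = np.array([1, 0, 3, 2, 5, 4, 6])  # a toy algorithm ranking
acc = retrieval_accuracy(sim_gt, retrieved, n=3)
print(round(acc, 3))
```

The measure is 1.0 exactly when the algorithm's top-n set matches the ground-truth top-n set, and degrades smoothly as lower-similarity images sneak into the top ranks.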

Page 72: ICML2011: recognizing human-object interaction activities

74

Action Retrieval: Result

MC: Mutual Context

[Plots: average precision (0.5–0.8) vs. number of retrieved images / number of neighbors (10–40). Legend: MC overall, MC action only, MC object only, MC pose only, and SPM baseline, each under the χ² and total-variation distances.]

Page 73: ICML2011: recognizing human-object interaction activities

75

Action Retrieval: Result

MC: Mutual Context

SPM: spatial pyramid matching (Lazebnik et al, 2006)

• Use the confidence scores of SPM output to evaluate the action similarity.

[Plot: average precision (0.5–0.8) vs. number of retrieved images (10–40). Legend: MC overall, MC action only, MC object only, MC pose only, and SPM baseline, each under the χ² and total-variation distances.]

Page 74: ICML2011: recognizing human-object interaction activities

76

Action Retrieval: Result

[Plot: average precision (0.5–0.8) vs. number of retrieved images (10–40). Legend: MC overall, MC action only, MC object only, MC pose only, and SPM baseline, each under the χ² and total-variation distances.]

Page 75: ICML2011: recognizing human-object interaction activities

77

Action Retrieval: Result

[Plot: average precision (0.5–0.8) vs. number of neighbors (10–40), highlighting MC overall against the SPM baseline.]

Page 76: ICML2011: recognizing human-object interaction activities

78

Action Retrieval: Result

[Plot: average precision (0.5–0.8) vs. number of neighbors (10–40), highlighting MC overall against the SPM baseline.]

Page 77: ICML2011: recognizing human-object interaction activities

79

Outline

• Mutual context model for Action Recognition
– Motivation
– Model representation
– Model learning
• Recognition I: Action Classification, Object Detection, and Pose Estimation
• Recognition II: Action Retrieval by Matching Action Similarity
• Conclusion

Page 78: ICML2011: recognizing human-object interaction activities

80

Conclusion

Human action as human-object interaction:

• Action classification:

• Matching action similarity:


Croquet shot

Tennis forehand

Cricket bowling

Page 79: ICML2011: recognizing human-object interaction activities

81

Acknowledgment

• Stanford Vision Lab reviewers:– Jia Deng– Jia Li