RECOGNIZING HUMAN-OBJECT INTERACTIONS IN STILL IMAGES BY MODELING THE MUTUAL CONTEXT OF OBJECTS AND HUMAN POSES
Date: 2013/05/27
Instructor: Prof. Wang, Sheng-Jyh
Student: Hung, Fei-Fan
Paper: Yao, B., and Fei-Fei, L., IEEE Transactions on PAMI (2012)




Outline

• Introduction
  • Intuition and goal
• Model Representation
• Model Learning
  • Obtaining Atomic Poses
  • Training Detectors and Classifiers
  • Estimating Model Parameters
• Model Inference
• Experimental Results
• Conclusion


Why use context in computer vision?

• Simple images vs. human activities

[Figure: comparison with context vs. without context, annotated "~3-4%"; example panels labeled "With mutual context" and "Without context"]


Challenges in Human Pose Estimation

• Human pose estimation is challenging
  • Difficult part appearance
  • Self-occlusion
  • Image regions that look like a body part
• Object detection facilitates human pose estimation


Challenges in Object Detection

• Object detection is challenging
  • Small, low-resolution, partially occluded objects
  • Image regions similar to the detection target
• Human pose estimation facilitates object detection


The Goal

• To build a mutual context model for Human-Object Interaction (HOI) activities


Model representation

• Modeling the mutual context of objects and human poses

A: activity (e.g., croquet shot, volleyball smash, tennis forehand)
O: objects (e.g., tennis ball, croquet mallet, volleyball, tennis racket); M: number of object bounding boxes
H: atomic pose (an activity A can contain more than one atomic pose H)
P: body parts


Model representation

• 𝝓1: co-occurrence compatibility between A, O, H
• 𝝓2: spatial relationship between O, H
• 𝝓3, 𝝓4, 𝝓5: modeling the image evidence with detectors or classifiers

[Figure: graphical model connecting the activity A, the human pose H with body parts P1, P2, ..., PL, and the objects O1, O2]
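
Read together with the potential slides that follow, the model can be summarized as one score over the activity A, the objects O, the human pose H, and the image I. The line below is a sketch only: it assumes the five potentials are simply added, and the symbol \Psi and the argument lists are notation chosen here from the slide descriptions rather than copied from the slides.

```latex
\Psi(A, O, H, I) = \phi_1(A, O, H) + \phi_2(O, H) + \phi_3(O, I) + \phi_4(H, I) + \phi_5(A, I)
```

Under this reading, \phi_1 and \phi_2 carry the mutual context, while \phi_3, \phi_4, and \phi_5 tie each variable to the image evidence.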


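𝝓1: Co-occurrence context

• Co-occurrence between all A, O, H
• One weight per (atomic pose, object, activity class) triple: the strength of the co-occurrence interaction between them
• Indicator functions select the triple that matches the current labels
• The sums run over the total number of atomic poses, the total number of objects, and the total number of activity classes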
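
The indicator-function sum amounts to a table lookup: for the current labels, add up the weight of each (atomic pose, object, activity) triple over the objects in the image. Below is a minimal sketch, assuming the weights are stored in a dictionary; the function name, the label strings, and the weight values are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical weight table keyed by (atomic pose, object, activity) labels.
CooccurrenceWeights = Dict[Tuple[str, str, str], float]


def cooccurrence_score(
    activity: str,
    objects: List[str],
    atomic_pose: str,
    weights: CooccurrenceWeights,
) -> float:
    """Sum the co-occurrence weights of the current labels over all objects.

    Looking up only the current (pose, object, activity) triple has the same
    effect as multiplying every table entry by indicator functions and summing.
    Triples absent from the table are treated as weight 0.
    """
    return sum(
        weights.get((atomic_pose, obj, activity), 0.0) for obj in objects
    )


# Made-up weights for illustration only.
weights = {
    ("forehand_pose", "tennis racket", "tennis forehand"): 1.5,
    ("forehand_pose", "tennis ball", "tennis forehand"): 0.5,
}
```

With these toy weights, a tennis forehand image containing a racket and a ball scores higher than the same pose paired with an unrelated activity, which is the kind of compatibility this potential is meant to reward.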


𝝓2: Spatial context

• Spatial relationship between all objects O and the different atomic poses H
• A sparse binary vector shows the relative location of an object with respect to the human body
• A weight vector scores that binary vector
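
A sparse binary vector for relative location can be pictured as a one-hot encoding of a coarse spatial bin. The sketch below is an illustration under stated assumptions: the 3x3 grid of bins, the 20-pixel `near` threshold, and both function names are hypothetical, and the slides do not say which binning the authors use.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward


def relative_location_feature(ref: Point, obj: Point, near: float = 20.0) -> List[int]:
    """One-hot vector over a 3x3 grid of coarse offsets of obj relative to ref.

    Columns are left / center / right, rows are above / center / below.
    """

    def bin_index(delta: float) -> int:
        if delta < -near:
            return 0
        if delta > near:
            return 2
        return 1

    col = bin_index(obj[0] - ref[0])
    row = bin_index(obj[1] - ref[1])
    feature = [0] * 9
    feature[row * 3 + col] = 1  # exactly one active entry, hence sparse
    return feature


def spatial_score(feature: List[int], weights: List[float]) -> float:
    """Dot product of the binary feature with its weight vector."""
    return sum(w * f for w, f in zip(weights, feature))
```

Because only one entry is active, the dot product simply picks out the weight learned for that spatial configuration.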


𝝓3: Modeling objects

• Model the objects O in the image I using object detection scores
• For each object O:
  • a vector of detection scores for the object
  • a weight vector for those scores
• Between each pair of objects Om and Om':
  • a binary feature vector
  • a weight vector for that pair


𝝓4: Modeling human pose

• Model the atomic pose that H belongs to and its likelihood
• A Gaussian likelihood function measures how well the pose fits the atomic pose
• A vector of detection scores for each body part in the image


𝝓5: Modeling activity

• Model the HOI activity by training an activity classifier
• Feature: the output of a one-versus-all (OVA) discriminative classifier that takes the image as input, with one dimension per activity class
• A weight vector for that feature


Model Properties

• Spatial context between O and H
• Object detection and human pose estimation facilitate each other
• Objects and body parts that are unreliable can be ignored
• Flexible to extend to large-scale datasets and other activities
• The joint model can share all objects and atomic poses


Model Learning

• Assign each human pose to an atomic pose
• Train detectors and classifiers
• Estimate parameters by maximum likelihood


Obtaining Atomic Poses

• Use clustering to obtain atomic poses
• Normalize the annotations
• Fill in missing parts
  • Use the nearest visible neighbor
• Obtain a set of atomic poses
  • Hierarchical clustering with the maximum linkage measure
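
Hierarchical clustering with maximum (complete) linkage can be sketched in a few lines. This is an illustration, not the authors' code: it assumes each pose is a flat tuple of normalized body part coordinates and uses plain Euclidean distance, since the distance measure on the slide did not survive; the names `complete_linkage` and `cluster_poses` are hypothetical.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence

Pose = Sequence[float]  # flattened, normalized body part coordinates (assumed layout)


def complete_linkage(a: List[Pose], b: List[Pose]) -> float:
    """Maximum linkage: the largest distance between any two members."""
    return max(dist(p, q) for p in a for q in b)


def cluster_poses(poses: List[Pose], num_clusters: int) -> List[List[Pose]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[p] for p in poses]  # start with one cluster per pose
    while len(clusters) > num_clusters:
        # Pick the pair of clusters with the smallest maximum-linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: complete_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


# Toy 2-D "poses" for illustration only.
toy_poses = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
atomic_poses = cluster_poses(toy_poses, num_clusters=2)
```

Each resulting cluster plays the role of one atomic pose; maximum linkage tends to keep clusters compact because a merge is judged by its two most distant members.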


Training Detectors and Classifiers

• Object detector, used in 𝝓3
• Human body part detector, used in 𝝓4
• Overall activity classifier, used in 𝝓5

Detectors: deformable part model
Activity classifier: spatial pyramid matching (SPM), SIFT + 3-level image pyramid
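
The spatial pyramid representation behind the activity classifier can be illustrated with a small sketch. It assumes local descriptors such as SIFT have already been quantized into visual-word indices, uses coordinates normalized to [0, 1), and omits the per-level weights and the matching kernel of the full SPM method; the function name and the toy features are hypothetical.

```python
from typing import List, Tuple

# (x, y, visual word index), with x and y normalized to [0, 1).
Feature = Tuple[float, float, int]


def spatial_pyramid_histogram(
    features: List[Feature], vocab_size: int, levels: int = 3
) -> List[int]:
    """Concatenate visual-word histograms over 1x1, 2x2, and 4x4 grids."""
    histogram: List[int] = []
    for level in range(levels):
        cells = 2 ** level  # the grid is cells x cells at this level
        level_hist = [0] * (cells * cells * vocab_size)
        for x, y, word in features:
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            level_hist[(row * cells + col) * vocab_size + word] += 1
        histogram.extend(level_hist)
    return histogram


# Three made-up features with a two-word vocabulary.
toy_features = [(0.1, 0.1, 0), (0.9, 0.9, 1), (0.6, 0.2, 0)]
pyramid = spatial_pyramid_histogram(toy_features, vocab_size=2)
```

With three levels the vector has (1 + 4 + 16) cells times the vocabulary size, so coarse levels capture what appears in the image and fine levels capture roughly where.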


Estimating Model Parameters

• Estimate the model parameters using a maximum likelihood (ML) approach with a zero-mean Gaussian prior
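
One common way to write an estimate of this kind is shown below. It is a sketch under assumed notation: \theta stands for all model weights, \sigma for the standard deviation of the Gaussian prior, and (A_n, O_n, H_n, I_n) for the n-th annotated training image; none of these symbols are taken from the slide.

```latex
\theta^{*} = \arg\max_{\theta} \; \sum_{n=1}^{N} \log p(A_n, O_n, H_n \mid I_n; \theta) \; - \; \frac{\lVert \theta \rVert^{2}}{2\sigma^{2}}
```

The zero-mean Gaussian prior appears as the quadratic penalty on \theta, which discourages large weights in the same way as L2 regularization.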


Learning result


Model Inference

• Input: new image
• Initialize with learned results
• Then update in turn:
  • Human body parts
  • Object detection results
  • A and H labels


Initialization

• Input: new image
• Initialize with learned results:
  • A (activity classification): SPM classification
  • O (object detection): object detection
  • H (human pose estimation): pictorial structure model


Update model inference: human body parts

• Compute the marginal distribution of the human pose
• Use a mixture of Gaussians to refine the prior of the body parts


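Update model inference: object detection results

• Greedy forward search method:
  • Initialize with no object in any bounding box
  • Select the bounding box and object label that increase the score the most
  • Label that box with the selected object
  • Update the score
• Stop when the best increase is < 0

Potentials that involve O: 𝝓1 (O, A, H), 𝝓2 (O, H), 𝝓3 (O, I)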
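
The greedy forward search can be sketched as follows. This is an illustrative reading of the steps on the slide, not the authors' implementation: `score` stands in for whatever part of the model depends on the object labels, and the function name, the box ids, and the toy scores are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Labeling = Dict[str, Optional[str]]  # bounding box id -> object label, or None


def greedy_forward_search(
    boxes: List[str],
    labels: List[str],
    score: Callable[[Labeling], float],
) -> Labeling:
    """Label boxes one at a time, always taking the best-scoring next step."""
    labeling: Labeling = {box: None for box in boxes}  # start with no objects
    current = score(labeling)
    while True:
        best = None  # (gain, box, label) of the best single labeling step
        for box in boxes:
            if labeling[box] is not None:
                continue  # this box already has a label
            for label in labels:
                gain = score({**labeling, box: label}) - current
                if best is None or gain > best[0]:
                    best = (gain, box, label)
        # Stop when every box is labeled or the best step lowers the score.
        if best is None or best[0] < 0:
            return labeling
        labeling[best[1]] = best[2]
        current += best[0]


# Made-up per-box scores for illustration only.
toy_unary = {
    ("b1", "ball"): 2.0,
    ("b1", "racket"): 0.5,
    ("b2", "ball"): -1.0,
    ("b2", "racket"): -0.5,
}


def toy_score(labeling: Labeling) -> float:
    return sum(toy_unary[(b, l)] for b, l in labeling.items() if l is not None)


result = greedy_forward_search(["b1", "b2"], ["ball", "racket"], toy_score)
```

In the toy run, box b1 is labeled as a ball, and b2 stays unlabeled because every label for it would lower the score.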


Update model inference: A and H labels

• Enumerate all possible A and H labels
• Keep the pair that optimizes the model score
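
Because the sets of activity classes and atomic poses are small and discrete, this update can be an exhaustive search. A minimal sketch, assuming a `score(activity, atomic_pose)` callable that evaluates the model with the objects and body parts held fixed; the function name and the toy score table are hypothetical.

```python
from itertools import product
from typing import Callable, List, Tuple


def best_activity_and_pose(
    activities: List[str],
    atomic_poses: List[str],
    score: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Try every (activity, atomic pose) pair and keep the best-scoring one."""
    return max(
        product(activities, atomic_poses),
        key=lambda pair: score(pair[0], pair[1]),
    )


# Made-up scores for illustration only.
toy_table = {
    ("tennis forehand", "pose_1"): 0.2,
    ("tennis forehand", "pose_2"): 0.9,
    ("croquet shot", "pose_1"): 0.4,
    ("croquet shot", "pose_2"): 0.1,
}

best = best_activity_and_pose(
    ["tennis forehand", "croquet shot"],
    ["pose_1", "pose_2"],
    lambda a, h: toy_table[(a, h)],
)
```

The cost is the number of activity classes times the number of atomic poses, which stays manageable for label sets of the size used in sports datasets.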


Experimental Results (Sports Dataset)

• Activity classification


Experimental Results (PPMI Dataset)


Conclusion

• Mutual context can significantly improve performance in difficult visual recognition problems
• The joint model can share all the information
• Training requires annotating all the human body parts and objects in the training images


References

• B. Yao and L. Fei-Fei, “Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012.

• B. Yao and L. Fei-Fei, “Modeling Mutual Context of Object and Human Pose in Human-Object Interaction Activities,” Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.

• B. Sapp, A. Toshev, and B. Taskar, “Cascade Models for Articulated Pose Estimation,” Proc. European Conf. Computer Vision, 2010.

• S. Lazebnik, C. Schmid, and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories,” Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2006.

• http://en.wikipedia.org/wiki/Hierarchical_clustering