Pose Estimation - Virginia Tech · Convolutional Pose Machines 1. Capture local appearance with...

Pose Estimation, Sujay Yadawadkar, Virginia Tech, ECE 6554: Advanced Computer Vision


Page 1:

Pose Estimation

Sujay Yadawadkar, Virginia Tech, ECE 6554: Advanced Computer Vision

Page 2:

Agenda:

• Pose Estimation
• Part-Based Models for Pose Estimation
• Pose Estimation with Convolutional Neural Networks (DeepPose)
• Pose Estimation with Sequential Prediction (Pose Machines)

Page 3:

Estimating Articulated Poses: Localizing Body Joints from Monocular Images

Our goal is…monocular (single view)

Page 4:

Page 5:

Estimating Articulated Poses from Monocular Images (single view)

Why is it hard?

• Large variance
• Occlusion
• L/R ambiguity

Page 6:

Estimating Articulated Poses from Monocular Images

Direct Mapping

Page 7:

Part-based Models: Recognizing Local Appearance

[Figure: a part detector for the wrist produces confidence maps; example scores shown: 0.999, 0.001, 0.001]

Page 8:

Non-parametric Uncertainty on Confidence Maps

[Figure label: right wrist]

Page 9:

Extracting Features from Confidence Maps

• Hand-crafted context feature [Ramakrishna14]
• Regress to displacement [Carriera16]
• Peak candidates for graphical models [Chen14] [Pishchulin16] [Tompson15]

Loses uncertainty

Page 10:

Pose Estimation with CNN

• Treat pose estimation as a regression problem.
• Loss function: L2 distance between the ground-truth pose vector and the estimated pose vector.
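The regression loss above can be sketched in a few lines. This is a minimal illustration, assuming the pose vector is the flattened list of (x, y) joint coordinates and using the squared L2 distance; the function name `l2_pose_loss` and the 3-joint example values are hypothetical, not taken from the slides.

```python
import numpy as np

def l2_pose_loss(pred, target):
    """Squared L2 distance between two flattened pose vectors."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    return float(np.sum((pred - target) ** 2))

# Hypothetical 3-joint pose, one (x, y) row per joint, normalized coordinates.
gt = np.array([[0.50, 0.20], [0.45, 0.40], [0.55, 0.60]])
est = np.array([[0.50, 0.25], [0.45, 0.40], [0.50, 0.60]])

# Two coordinates are off by 0.05 each; all others match.
loss = l2_pose_loss(est, gt)
print(loss)
```

Because the loss is computed on a single coordinate vector, the network must commit to one location per joint, which is the contrast with the confidence-map approaches on the following slides.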

Page 11:

Convolutional Pose Machines

1. Capture local appearance with CNNs
2. CNNs on confidence maps to capture long-range part dependencies (preserve uncertainty)
3. Iteratively refine confidence maps with global cues
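The three steps can be illustrated with a toy stage loop. This is only a structural sketch under strong assumptions: the learned CNNs are replaced by fixed stand-ins (a box blur and an elementwise product), and every name here (`box_blur`, `later_stage`, `run_cpm`) is hypothetical. What it does show is the data flow: stage 1 sees only local evidence, and each later stage sees local evidence plus the full confidence maps from the previous stage rather than a single peak.

```python
import numpy as np

def box_blur(m, k):
    """Average over a k x k window with zero padding (stand-in for a learned conv)."""
    pad = k // 2
    padded = np.pad(m, pad)
    out = np.zeros_like(m, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + m.shape[0], dx:dx + m.shape[1]]
    return out / (k * k)

def normalize(maps):
    """Rescale each part's map so that it sums to 1."""
    maps = np.clip(maps, 0.0, None)
    return maps / maps.sum(axis=(1, 2), keepdims=True)

def stage1(local_evidence):
    # Step 1: confidence maps from local appearance only.
    return normalize(local_evidence)

def later_stage(local_evidence, prev_beliefs, k=5):
    # Step 2: use the whole previous maps (uncertainty preserved) as context.
    total = prev_beliefs.sum(axis=0)
    out = np.empty_like(prev_beliefs)
    for p in range(prev_beliefs.shape[0]):
        context = box_blur(total - prev_beliefs[p], k)  # cues from the other parts
        out[p] = local_evidence[p] * (1e-3 + context)
    return normalize(out)

def run_cpm(local_evidence, num_stages=3):
    # Step 3: iterate, refining the maps stage by stage.
    beliefs = stage1(local_evidence)
    history = [beliefs]
    for _ in range(num_stages - 1):
        beliefs = later_stage(local_evidence, beliefs)
        history.append(beliefs)
    return history

rng = np.random.default_rng(0)
evidence = rng.random((2, 16, 16))  # hypothetical detector scores: 2 parts, 16x16 grid
history = run_cpm(evidence, num_stages=3)
```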

Page 12:

Convolutional Pose Machines (CPMs)

[Figure: Stage 1, Stage 2, …, Stage T, each with a block labeled P]

Page 13:

Convolutional Pose Machines: Capturing Local Appearance by FCNN

[Figure: CPM stages (Stage 1, Stage 2, …, Stage T) with blocks labeled P; Stage 1 shown in detail]

Page 14:

Convolutional Pose Machines: Learning Image-dependent Spatial Model

[Figure: CPM stages (Stage 1, Stage 2, …, Stage T) with blocks labeled P; Stage 1 and Stage 2 shown in detail]

Page 15:

Convolutional Pose Machines: Large Receptive Field

[Figure: CPM stages (Stage 1, Stage 2, …, Stage T) with blocks labeled P; Stage 1 and Stage 2 shown in detail]
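How a stack of layers reaches a large receptive field can be worked out with the standard recurrence: each layer adds (kernel − 1) × (product of earlier strides) input pixels. The sketch below applies that recurrence; the layer list is a hypothetical example chosen for illustration and is not the CPM architecture from the slides.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, in order from the input.

    Returns the side length, in input pixels, of the region seen by one
    output unit.
    """
    rf, jump = 1, 1  # jump = spacing of this layer's units measured in input pixels
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

# Hypothetical stack: 9x9 conv, 2x2 pool, 9x9 conv, 2x2 pool, 5x5 conv.
example_layers = [(9, 1), (2, 2), (9, 1), (2, 2), (5, 1)]
print(receptive_field(example_layers))   # 44
print(receptive_field([(3, 1)] * 3))     # 7
```

Pooling (stride 2) doubles the contribution of every later kernel, which is why a few large kernels after downsampling cover far more of the image than the same kernels at full resolution.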

Page 16:

Convolutional Pose Machines: Overall Architecture

[Figure: Stage 1, Stage 2, …, Stage T, with blocks labeled P]

Page 17:

Iteratively Refined Confidence Maps

[Figure: input image, then confidence maps for the right elbow and right wrist after the 1st, 2nd, and 3rd stages]

Page 18:

Iteratively Refined Confidence Maps: Recover from False Negative

[Figure: confidence maps after the 1st, 2nd, and 3rd stages]

Page 19:

Iteratively Refined Confidence Maps: Recover from False Negative

[Figure: 1st stage output, 2nd stage, 3rd stage]

Page 20:

Iteratively Refined Confidence Maps

Page 21:

Training CPMs

Overall loss

Ideal Confidence Maps for Intermediate Supervision

[Figure: Stage 1, Stage 2, Stage T]
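A minimal sketch of this training objective, assuming each ideal confidence map is a Gaussian peak centered on the ground-truth joint and the overall loss is the sum over stages of the squared L2 distance between predicted and ideal maps. The value of `sigma`, the grid size, and the joint positions are hypothetical choices for illustration.

```python
import numpy as np

def ideal_confidence_map(height, width, joint_xy, sigma=1.5):
    """Gaussian peak centered on the ground-truth joint location (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    x0, y0 = joint_xy
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

def stage_loss(pred_maps, ideal_maps):
    """Squared L2 distance summed over all parts and all locations."""
    return float(np.sum((pred_maps - ideal_maps) ** 2))

def overall_loss(per_stage_preds, ideal_maps):
    """Intermediate supervision: every stage is compared with the same ideal maps."""
    return sum(stage_loss(p, ideal_maps) for p in per_stage_preds)

# Hypothetical ground truth: two joints on a 16x16 grid, given as (x, y).
joints = [(4, 5), (10, 12)]
ideal = np.stack([ideal_confidence_map(16, 16, j) for j in joints])

# Hypothetical predictions from three stages, improving stage by stage.
stage_preds = [np.zeros_like(ideal), 0.5 * ideal, ideal]
total = overall_loss(stage_preds, ideal)
```

Because each stage contributes its own term, gradient reaches early layers directly from the nearest stage loss instead of only from the final output.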

Page 22:

Training CPMs: Joint Training with Intermediate Supervision

[Figure: Stage 1, Stage 2, Stage T]

Page 23:

Training CPMs: Intermediate Supervision Resolves Vanishing Gradients

[Figure: Stage 1, Stage 2, and Stage 3, shown with intermediate supervision and without intermediate supervision]

Page 24:

Convolutional Pose Machines: Overall Architecture with Shared Image Features

[Figure: Stage 1, Stage 2, …, Stage T]

Page 25:

Training CPMs: Data Preparation and Augmentation

Page 26:

Analysis and Results

Page 27:

Benchmark Datasets

Dataset   Size                             Type           Annotation
FLIC      3987 training, 1016 testing      movie scenes   upper body
MPII      29116 training, 11823 testing    diverse        full body w/ truncation
LSP       11000 training, 1000 testing     sports         full body

Page 28:

Number of Stages

[Plot labels: PCK 0.2, error tolerance]
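The PCK metric used in these plots counts a predicted joint as correct when it lies within a fraction (the error tolerance, here 0.2) of a per-person reference length from the ground truth. The sketch below assumes the reference length is supplied by the caller; it is commonly the torso size for PCK and the head segment length for PCKh. The function name and example values are hypothetical.

```python
import numpy as np

def pck(pred, gt, ref_lengths, alpha=0.2):
    """Fraction of joints whose error is at most alpha * reference length.

    pred, gt: arrays of shape (N, J, 2) with (x, y) per joint.
    ref_lengths: array of shape (N,), one reference length per person.
    """
    dists = np.linalg.norm(pred - gt, axis=-1)             # (N, J) pixel errors
    thresholds = alpha * np.asarray(ref_lengths)[:, None]  # (N, 1)
    return float(np.mean(dists <= thresholds))

# Hypothetical person with 4 joints and a reference length of 100 pixels.
gt_pose = np.zeros((1, 4, 2))
offsets = np.array([[5.0, 0.0], [19.0, 0.0], [21.0, 0.0], [50.0, 0.0]])
pred_pose = gt_pose + offsets

score_02 = pck(pred_pose, gt_pose, [100.0], alpha=0.2)  # threshold 20 px
score_05 = pck(pred_pose, gt_pose, [100.0], alpha=0.5)  # threshold 50 px
```

With a 20-pixel threshold only the joints with errors of 5 and 19 pixels count, so `score_02` is 0.5; with a 50-pixel threshold all four count.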

Page 29:

Training Methods

[Plot labels: PCK 0.2, error tolerance]

Page 30:

Quantitative Results: FLIC Upper Body with Observer-Centric (OC) Annotations

[Plot labels: PCK 0.2, PCK 0.1]

Page 31:

Quantitative Results: LSP Dataset with Person-Centric (PC) Annotations

PCK 0.2

90.5%

Page 32:

Quantitative Results: MPII Dataset with PC Annotations

90.1%

PCKh 0.5

LSP

Page 33:

Quantitative Results: MPII Dataset: Viewpoints

Page 34:

Page 35:

Failure Cases

• Rare viewpoint
• L/R confusion
• Severe occlusion
• Rare pose

[Figure label: right wrist]

Page 36:

Summary

• Monocular human pose estimation is becoming reliable.
• CPMs capture complex long-range part dependencies by iteratively refining confidence maps with preserved uncertainty.
• CPMs naturally avoid the vanishing-gradient problem through intermediate supervision.

Page 37:

What’s Next?

Page 38:

From Single to Multi-Person

Challenge: Identifying the number of people and part-person association

Page 39:

Multi-person Human Pose Estimation: Naive Two-phase Method

CPM with P = 1 (person detector)

Credit: Zhe Cao
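One way to read the two-phase method: a single-map (P = 1) detector yields a person confidence map, its peaks give person locations, and a single-person pose estimator is then run on a crop around each peak. The sketch below covers only the peak-finding and cropping step, under assumptions: the threshold, the crop size, and the names `find_peaks` and `crop_boxes` are hypothetical, and the confidence map is synthetic.

```python
import numpy as np

def find_peaks(conf_map, threshold=0.5):
    """Return (row, col) of local maxima (8-neighborhood) at or above threshold."""
    h, w = conf_map.shape
    padded = np.pad(conf_map, 1, constant_values=-np.inf)
    peaks = []
    for r in range(h):
        for c in range(w):
            window = padded[r:r + 3, c:c + 3]  # 3x3 neighborhood centered on (r, c)
            if conf_map[r, c] >= threshold and conf_map[r, c] == window.max():
                peaks.append((r, c))
    return peaks

def crop_boxes(peaks, half_size, shape):
    """Square boxes (top, left, bottom, right) around each peak, clipped to the image."""
    h, w = shape
    return [(max(0, r - half_size), max(0, c - half_size),
             min(h, r + half_size + 1), min(w, c + half_size + 1))
            for r, c in peaks]

# Synthetic person-confidence map with two confident people and one weak response.
person_map = np.zeros((10, 10))
person_map[2, 3] = 0.9
person_map[7, 6] = 0.8
person_map[5, 5] = 0.3  # below threshold, ignored

people = find_peaks(person_map, threshold=0.5)
boxes = crop_boxes(people, half_size=3, shape=person_map.shape)
# Phase two (not shown): run a single-person pose estimator on each box.
```

The weakness the slide calls "naive" is visible here: a missed or merged peak in phase one cannot be recovered in phase two.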

Page 40:

Pose Estimation at Finer Scales: Faces

Source: Tomas Simon

Page 41:

Pose Estimation at Finer Scales: Hands

Page 42:

CMU Panoptic Studio: 500 Synced Cameras

Page 43:

Multiple Views for 3D Reconstruction

[Figure label: Right Wrist]

Page 44:

Multiple Views for 3D Reconstruction

Source: Hanbyul Joo
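The slides do not say how the 3D points are computed from the views. As one standard possibility, the sketch below uses linear (DLT) triangulation: given 2D detections of the same joint in several calibrated cameras, it solves for the 3D point whose projections match them. The two camera matrices and the wrist position are hypothetical values chosen for illustration.

```python
import numpy as np

def triangulate(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    projections: list of 3x4 camera projection matrices.
    points_2d: list of (u, v) image coordinates, one per camera.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]              # right singular vector for the smallest singular value
    return X[:3] / X[3]     # homogeneous -> Euclidean coordinates

def project(P, X):
    """Project a 3D point with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two hypothetical cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

wrist = np.array([0.3, -0.2, 4.0])  # hypothetical 3D joint position
detections = [project(P1, wrist), project(P2, wrist)]
estimate = triangulate([P1, P2], detections)
```

With many synchronized cameras, each extra view adds two rows to the linear system, which is what makes per-view 2D detections usable for 3D reconstruction even when some views are occluded.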

Page 45:

Multiple Views for 3D Reconstruction

Projected Full Poses

Page 46:

Live Demo!

Page 47:

Real-time CPMs

Page 48:

Real-time CPMs: Confidence Map of Right Wrist

Page 49:

Future Directions

- Analysis of failure cases and data distribution
- One-shot multi-person pose estimation
- Direct 3D reasoning
- Temporal CPMs

Page 50:

Thank you

Questions?

Check out our paper, GitHub, and YouTube channel!