1.Introduction 2.Article [1] Real Time Motion Capture Using a Single TOF Camera (2010) 3.Article [2]...


Human Pose Recognition


Contents

1. Introduction

2. Article [1]: Real Time Motion Capture Using a Single TOF Camera (2010)

3. Article [2]: Real Time Human Pose Recognition in Parts from Single Depth Images (2011)


1.1 What Is Pose Recognition?

Fig. from [2]: input image with body-part labels (arm, torso, head).


1.2 Motivation

Why do we need this?

Robotics

Smart surveillance

virtual reality

motion analysis

Gaming - Kinect


Kinect – Project Natal

Microsoft Xbox 360 console

“You are the controller”

Launched: 4 November 2010

Sold over 8M units in its first 60 days on the market (Guinness world record)

http://www.youtube.com/watch?v=p2qlHoxPioM


1.3 Challenges

Real Time???

Full Solution??

Cheap???

Occlusions?

Light?

Shadows?

Clothes?

What is the problem???


1.4 Previous Technology

Mocap using markers: expensive.

Multi-view camera systems: limited applicability.

Monocular: simplified problems.


1.4 New Technology

Time-of-Flight (TOF) camera

Dense depth

High frame rate (100 Hz)

Robust to:

Lighting

shadows

other problems.


2. Article [1]

Real Time Motion Capture Using a Single Time-of-Flight Camera

(V. Ganapathi et al., CVPR 2010)


Article Contents

2.1 previous work

2.2 What’s new?

2.3 Overview

2.4 results

2.5 limitations & future work

2.6 Evaluation


2.1 Previous work

Many, many articles (the survey by Moeslund et al., 2006, covered 350 articles).

[Figures: examples of earlier work, dated 2006, 2006, and 1998]


2.2 What’s new?

TOF technology

Propagating information up the kinematic chain.

Probabilistic model using the unscented transform.

Multiple GPUs.


2.3 Overview

1. Probabilistic Model

2. Algorithm Overview:

Model Based Hill Climbing Search

Evidence Propagation

Full Algorithm


1. Probabilistic Model

15 body parts

DAG: Directed Acyclic Graph

DBN: Dynamic Bayesian Network

$X_t = \{X_t^i\}_{i=1}^{N}$: pose

$V_t$: speed

$z_t$: range scan


1. Probabilistic Model

Dynamic Bayesian network (DBN)

Fig. from [1]

Assumptions:

The pose update is deterministic: $X_t^i$ follows from $V_t^i$ and $X_{t-1}^i$ with probability 1.

$V_t \mid V_{t-1} \sim \mathcal{N}(V_{t-1}, \Sigma)$

Use ray casting to evaluate the distance from each measurement $z^k$.

Goal: find the most likely states given the previous frame (MAP), i.e.:

$\hat{X}_t, \hat{V}_t = \arg\max_{X_t, V_t} \left[ \log P(z_t \mid X_t, V_t) + \log P(X_t, V_t \mid \hat{X}_{t-1}, \hat{V}_{t-1}) \right]$


2. Algorithm Overview

1. Hill climbing search (HC)

2. Evidence propagation (EP)


2.1 Hill Climbing Search (HC)

Fig. from [1] (grid dimensions annotated as 0.05 m)

Sample $V_t^i$ from a grid around $P(V_t^i \mid V_{t-1}^i)$.

Calculate $X_t$ from $V_t$ and $\hat{X}_{t-1}$.

Evaluate the likelihood and choose the best point.

Coarse-to-fine grids.
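As a rough illustration of the hill-climbing idea on this slide (evaluate a grid of candidates around the current estimate, move to the best one, then refine the grid), here is a minimal Python sketch. It is not the implementation from [1]: the function name `hill_climb`, the toy quadratic log-likelihood, the three refinement levels, and the halving schedule are all assumptions chosen for illustration; only the 0.05 m starting step is taken from the slide.

```python
import itertools
import numpy as np

def hill_climb(score, start, step=0.05, levels=3, max_iters=50):
    """Coarse-to-fine grid hill climbing that maximizes `score`.

    `score` is any callable mapping a vector to a float (a stand-in for
    the log-likelihood of a candidate velocity). All names are hypothetical.
    """
    best = np.asarray(start, dtype=float)
    best_score = score(best)
    # Every combination of -1, 0, +1 per dimension: a 3^n grid around `best`.
    offsets = [np.array(o) for o in
               itertools.product((-1.0, 0.0, 1.0), repeat=best.size)]
    for _ in range(levels):
        for _ in range(max_iters):
            candidates = [best + step * o for o in offsets]
            scores = [score(c) for c in candidates]
            k = int(np.argmax(scores))
            if scores[k] <= best_score:
                break  # local optimum at this grid resolution
            best, best_score = candidates[k], scores[k]
        step /= 2.0  # refine the grid (coarse to fine)
    return best, best_score

# Toy likelihood with a single peak at `target` (assumption for the demo).
target = np.array([0.30, -0.20, 0.10])
toy_log_likelihood = lambda v: -float(np.sum((v - target) ** 2))
best, best_score = hill_climb(toy_log_likelihood, start=np.zeros(3))
```

With a single-peaked score the search walks straight to the peak. A multi-modal score would expose the local-optimum weakness that the next slide lists.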


2.1 Hill Climbing Search (HC)

The good:

Simple

Fast

Runs in parallel on GPUs

The bad:

Local optima

Ridges, plateaus, alleys

Can lose track when motion is fast or occlusions occur.


2.2 Evidence Propagation

Also has 3 stages:

1. Body part detection (C. Plagemann et al 2010)

2. Probabilistic Inverse Kinematics

3. Data association and inference


2.2.1 Body Part Detection

Bottom up approach:

1. Locate interest points with AGEX (Accumulative Geodesic Extrema).

2. Find orientation.

3. Classify the head, feet, and hands using local shape descriptors.

Fig From [3]


2.2.1 Body Part Detection

Results:

Fig From [3]


2.2.2 Probabilistic inverse kinematics (EP)

$\{p_i\}_{i=1}^{5}$: head, hands, and legs of $X$

$\hat{p}_j \; (j = 1, \dots, N)$: detected body parts

Assume a correspondence $p_i \leftrightarrow \hat{p}_j$.

Need a new MAP estimate conditioned on $\hat{X}_{t-1}, \hat{p}_j$.

Problem: $p_i(V_t, \hat{X}_{t-1}, \hat{V}_{t-1})$ isn’t linear!

Solution: linearize with the unscented Kalman filter.

Then $P(V_t \mid \hat{V}_{t-1}, \hat{X}_{t-1}, \hat{p}_j)$ is easy to determine.
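The slide relies on the unscented transform to handle the nonlinear map from velocities to body-part positions. The sketch below shows the textbook unscented transform (2n + 1 sigma points with a single scaling parameter `kappa`), not the specific filter configuration of [1], which the slides do not give. The two-link arm function `fingertip` and all numeric values are made up purely as a stand-in for a nonlinear kinematic map.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate a Gaussian (mean, cov) through f using 2n + 1 sigma points."""
    mean = np.asarray(mean, dtype=float)
    n = mean.size
    # Rows of `offsets` are the columns of the Cholesky factor of (n + kappa) * cov.
    offsets = np.linalg.cholesky((n + kappa) * np.asarray(cov, dtype=float)).T
    sigma_points = np.vstack([mean, mean + offsets, mean - offsets])
    weights = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    ys = np.array([f(p) for p in sigma_points])
    y_mean = weights @ ys                          # weighted mean of mapped points
    diffs = ys - y_mean
    y_cov = (weights[:, None] * diffs).T @ diffs   # weighted outer products
    return y_mean, y_cov

def fingertip(angles):
    """Hypothetical planar two-link arm with unit link lengths."""
    a, b = angles
    return np.array([np.cos(a) + np.cos(a + b), np.sin(a) + np.sin(a + b)])

mu = np.array([0.3, 0.5])
sigma = np.diag([0.01, 0.02])
tip_mean, tip_cov = unscented_transform(mu, sigma, fingertip)
```

For a linear map the transform reproduces the exact mean and covariance, which is a convenient sanity check. For a nonlinear map such as `fingertip` it gives a Gaussian approximation without computing Jacobians.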


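2.3 Full Algorithm

Inputs: previous MAP estimate, depth image.

1. HC from the previous MAP gives $X_{best}$.

2. Part detection on the depth image.

3. Remove suggestions already explained by $X_{best}$.

4. Correspond by body parts: $\{(p_i, \hat{p}_j)\}$.

5. EP gives a candidate $X'$, which is refined by HC.

6. If $X' > X_{best}$, then $X'$ replaces $X_{best}$.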
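One way to read the flowchart on this slide is as the per-frame loop sketched below. This is an interpretation, not code from [1]: every callable passed to `track_frame` is a hypothetical placeholder, the comparison "X' > Xbest" is assumed to mean a higher likelihood, and the toy one-dimensional stand-ins exist only to make the control flow runnable.

```python
def track_frame(prev_map, depth_image, hill_climb, detect_parts,
                is_explained, propagate, log_likelihood):
    """One frame of HC + EP; all arguments are placeholder callables."""
    x_best = hill_climb(prev_map, depth_image)           # HC from previous MAP
    for part in detect_parts(depth_image):               # bottom-up detections
        if is_explained(x_best, part):
            continue                                     # drop explained suggestions
        x_prime = propagate(x_best, part)                # EP: pull pose toward detection
        x_prime = hill_climb(x_prime, depth_image)       # refine the candidate with HC
        if log_likelihood(x_prime, depth_image) > log_likelihood(x_best, depth_image):
            x_best = x_prime                             # keep the better hypothesis
    return x_best

# Toy 1-D stand-ins (assumptions): a "pose" is a float, the "image" is the true pose.
target = 1.0
hc = lambda x, z: x + max(-0.1, min(0.1, z - x))   # local search: moves at most 0.1
ll = lambda x, z: -(x - z) ** 2
hc_only = hc(0.0, target)
result = track_frame(0.0, target, hc,
                     detect_parts=lambda z: [0.9, 0.0],
                     is_explained=lambda x, p: abs(x - p) < 0.05,
                     propagate=lambda x, p: p,
                     log_likelihood=ll)
```

In the toy run, hill climbing alone stays far from the target because each search is local, while a detection near the target lets the combined loop jump to a better hypothesis.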


2.4 Results

Experiments:

28 real depth image sequences.

Ground truth: tracking markers.

$\epsilon_{avg} = \frac{1}{M} \sum_{i=1}^{M} \| m_i - \hat{m}_i \|$

$m_i$: real marker position

$\hat{m}_i$: estimated position

$\epsilon_{avg} \approx 0.1$ m: perfect tracks.

$\epsilon_{avg} \approx 0.3$ m: faulty tracking.

Compared 3 algorithms: EP, HC, HC+EP.
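The error measure on this slide is the mean Euclidean distance between the M real marker positions and their estimates. A small sketch of that computation follows; the function name `average_marker_error` and the two example markers are invented for illustration.

```python
import numpy as np

def average_marker_error(true_markers, estimated_markers):
    """Mean Euclidean distance between matching rows of two (M, 3) arrays."""
    true_markers = np.asarray(true_markers, dtype=float)
    estimated_markers = np.asarray(estimated_markers, dtype=float)
    distances = np.linalg.norm(true_markers - estimated_markers, axis=1)
    return float(np.mean(distances))

# Two made-up markers, off by 0.1 m and 0.3 m respectively.
true_m = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
est_m = np.array([[0.0, 0.0, 0.1], [1.0, 0.3, 0.0]])
err = average_marker_error(true_m, est_m)  # (0.1 + 0.3) / 2
```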


2.4 Results

Best: HC+EP; worst: EP.

Runs close to real time.

HC: 6 frames per second.

HC+EP: 4-6 frames per second.

Fig. from [1] (annotation: the difference is bigger on harder sequences)


2.4 Results

Extreme case: sequence 27.

Fig. from [1]: HC vs. HC+EP (annotation: lose track)


2.5 Limitations & Future work

Limitations:

Manual initialization.

Cannot track more than one person at a time.

Uses temporal data: consumes more time and creates a reinitialization problem.

Future work:

improving the speed.

combining with color cameras

fully automatic model initialization.

Track more than 1 person.


2.6 Evaluation

Well written

Self contained

Novel combination of existing parts

New technology

Achieving goals (real time)

Missing examples on probabilistic model.

Not clear how $X_0$ is defined.

Extensively validated:

Data set and code available

Not enough visual examples in article

No comparison to different algorithms


3. Article [2]

Real Time Human Pose Recognition in Parts from Single Depth Images

(Shotton et al., Microsoft Research & Xbox Incubation, 2011)


Article Contents

2.1 previous work

2.2 What’s new?

2.3 Overview

2.4 results

2.5 limitations & future work

2.6 Evaluation


2.1 Previous work

Same as Article [1].


2.2 What’s new?

Using no temporal information: robust and fast (200 frames per second).

Object recognition approach.

Per-pixel classification.

Large and highly varied training dataset.

Fig From [2]


2.3 Overview

1. Database construction

2. Body part inference and joint proposals:

Goals:

computational efficiency and robustness


1. Database

Pose estimation research has often had to overcome a lack of training data. Why?

Huge color and texture variability.

Computer simulation doesn’t produce the range of volitional motions of a human subject.


1. Database

Fig. from [2]: 100k mocap frames; synthetic rendering pipeline


1. Database

Real data

Synthetic data

Which is real???

Fig From [2]


2. Body part inference

1. Body part labeling

2. Depth image features

3. Randomized decision forests

4. Joint position proposals


2.1 Body part labeling

31 body parts labeled.

The problem can now be solved by efficient classification algorithms.

Fig. from [2] (example labels: Head Up Right, Head Up Left)


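2.2 Depth comparison features

Simple depth comparison features:

$f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$    (1)

$d_I(x)$: depth at pixel $x$ in image $I$; $\theta = (u, v)$: offsets.

Normalizing the offsets by $1 / d_I(x)$ makes the feature depth invariant.

Computational efficiency: no preprocessing.

Fig. from [2]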
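The depth comparison feature of [2] takes the difference between the depths at two probe positions, x + u / d_I(x) and x + v / d_I(x), so the offsets shrink as the person moves away from the camera. Below is a minimal sketch of that feature on a NumPy depth map. The helper names, the (row, column) pixel convention, rounding probes to the nearest pixel, and the exact value of `BACKGROUND_DEPTH` are assumptions; [2] only says that background and off-image probes receive a large positive constant.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # large constant for off-image probes (value is an assumption)

def depth_at(depth, pos):
    """Depth at the pixel nearest to `pos` = (row, col), or the background constant."""
    r, c = int(round(float(pos[0]))), int(round(float(pos[1])))
    if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
        return float(depth[r, c])
    return BACKGROUND_DEPTH

def depth_feature(depth, x, u, v):
    """f_theta(I, x) = d(x + u / d(x)) - d(x + v / d(x)) with theta = (u, v)."""
    x = np.asarray(x, dtype=float)
    d = depth_at(depth, x)
    probe_u = x + np.asarray(u, dtype=float) / d   # offsets scale with 1 / depth
    probe_v = x + np.asarray(v, dtype=float) / d
    return depth_at(depth, probe_u) - depth_at(depth, probe_v)

# Made-up 5x5 depth map: everything at 2 m except one pixel at 4 m.
depth = np.full((5, 5), 2.0)
depth[2, 4] = 4.0
value = depth_feature(depth, x=(2, 2), u=(0, 4), v=(0, -4))
off_image = depth_feature(depth, x=(2, 2), u=(0, 20), v=(0, 0))
```

Each feature needs only three depth lookups and a few arithmetic operations, which is what makes per-pixel evaluation cheap.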


2.3 Randomized Decision forests

How does it work?

Node = feature $f_\theta$ and a threshold $\tau$

Classify pixel $x$:

$P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$

Fig. from [2] (pixel $x$)
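To make the forest prediction concrete: each tree routes the pixel from the root to a leaf by comparing a feature response with a threshold, and the forest averages the leaf distributions of its T trees. The sketch below uses nested dictionaries as a made-up tree encoding. The "go left when the response is below the threshold" convention and the `responses` lookup standing in for the feature evaluation are assumptions.

```python
import numpy as np

def tree_posterior(node, feature_fn):
    """Walk one tree to a leaf and return its class distribution."""
    while "leaf" not in node:
        go_left = feature_fn(node["theta"]) < node["tau"]   # split test (assumed direction)
        node = node["left"] if go_left else node["right"]
    return np.asarray(node["leaf"], dtype=float)

def forest_posterior(trees, feature_fn):
    """Average the per-tree distributions: P(c|I,x) = (1/T) * sum_t P_t(c|I,x)."""
    return np.mean([tree_posterior(t, feature_fn) for t in trees], axis=0)

# Two hand-written single-split trees over two classes (illustrative only).
trees = [
    {"theta": "a", "tau": 0.0,
     "left": {"leaf": [0.9, 0.1]}, "right": {"leaf": [0.2, 0.8]}},
    {"theta": "b", "tau": 1.0,
     "left": {"leaf": [0.6, 0.4]}, "right": {"leaf": [0.1, 0.9]}},
]
responses = {"a": -0.5, "b": 2.0}   # pretend feature responses at one pixel
posterior = forest_posterior(trees, lambda theta: responses[theta])
```

Here the first tree votes strongly for class 0 and the second for class 1, so the averaged posterior is split evenly between the two.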


2.3 Randomized Decision forests

Training algorithm: 1M images, 2000 pixels per image

$\phi = (\theta, \tau)$

$\phi^* = \arg\max_{\phi} G(\phi)$

$G(\phi) = H(Q) - \sum_{s \in \{l, r\}} \frac{|Q_s(\phi)|}{|Q|} H(Q_s(\phi))$

$H$: entropy

Training 3 trees to depth 20 from 1M images takes about 1 day (1000-core cluster).

1M images × 2000 pixels × 2000 $\theta$ × 50 $\tau$ = $2 \cdot 10^{14}$ $f_\theta$ computations
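The training criterion picks, at each node, the split that maximizes the information gain: the entropy of the examples reaching the node minus the size-weighted entropies of the left and right subsets. The following sketch scores candidate thresholds for a single feature on four made-up examples. The function names and the "left if response < tau" rule are assumptions, and real training would also loop over the candidate features.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy (bits) of the class histogram of `labels`."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, responses, tau, n_classes):
    """G = H(Q) - sum over {left, right} of |Q_s| / |Q| * H(Q_s)."""
    left = labels[responses < tau]
    right = labels[responses >= tau]
    weighted = sum(len(s) / len(labels) * entropy(s, n_classes)
                   for s in (left, right))
    return entropy(labels, n_classes) - weighted

def best_threshold(labels, responses, candidate_taus, n_classes):
    """Return (tau, gain) for the candidate threshold with the largest gain."""
    gains = [information_gain(labels, responses, t, n_classes)
             for t in candidate_taus]
    k = int(np.argmax(gains))
    return candidate_taus[k], gains[k]

# Four made-up training pixels: two classes, one feature response each.
labels = np.array([0, 0, 1, 1])
responses = np.array([-1.0, -0.5, 0.5, 1.0])
best_tau, best_gain = best_threshold(labels, responses, [-0.75, 0.0, 0.75], n_classes=2)

# The cost estimate on the slide: images x pixels x features x thresholds.
total_evaluations = 1_000_000 * 2000 * 2000 * 50
```

The threshold at 0.0 separates the two classes perfectly, so it removes the full 1 bit of entropy. The last line reproduces the slide's estimate of 2 x 10^14 feature evaluations.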


2.3 Randomized Decision forests

Fig From [2]

Trained tree:


2.4 Joint Position Proposal

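Local mode finding approach based on mean shift with a weighted Gaussian kernel.

Density estimator:

$f_c(\hat{x}) \propto \sum_{i=1}^{N} w_{ic} \exp\left( - \left\| \frac{\hat{x} - \hat{x}_i}{b_c} \right\|^2 \right)$

$w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$

Fig. from [4] (labels: outliers, center of mass)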
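The joint proposal step finds a mode of a weighted Gaussian kernel density over the 3D points of one body part, with each point weighted by its class probability times its squared depth. The sketch below runs plain mean shift on a handful of invented points with one outlier. The function name, the bandwidth of 0.1, the starting point, and the stopping tolerance are illustrative assumptions rather than values from [2].

```python
import numpy as np

def mean_shift_mode(points, weights, start, bandwidth, n_iters=100, tol=1e-6):
    """Climb the weighted Gaussian kernel density from `start` to a local mode."""
    x = np.asarray(start, dtype=float)
    for _ in range(n_iters):
        # Kernel weight of each point: w_i * exp(-||(x - x_i) / b||^2).
        k = weights * np.exp(-np.sum(((points - x) / bandwidth) ** 2, axis=1))
        x_new = (k[:, None] * points).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Made-up 3D points for one body part: a tight cluster plus one outlier.
points = np.array([[0.00, 0.00, 2.0], [0.02, 0.00, 2.0], [-0.02, 0.00, 2.0],
                   [0.00, 0.02, 2.0], [0.00, -0.02, 2.0], [1.00, 1.00, 2.0]])
prob = np.ones(len(points))            # P(c | I, x_i), all 1 in this toy case
weights = prob * points[:, 2] ** 2     # w_ic = P(c | I, x_i) * d_I(x_i)^2
mode = mean_shift_mode(points, weights, start=[0.05, 0.05, 2.0], bandwidth=0.1)
center_of_mass = np.average(points, axis=0, weights=weights)
```

The outlier drags the center of mass away from the cluster, while the mode stays on the cluster, which is the contrast the figure on this slide illustrates.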


2.4 Results

Experiments:

8800 frames of real depth images.

5000 synthetic depth images.

Also evaluated on the Article [1] dataset.

Measures:

1. Classification accuracy: confusion matrix.

2. Joint accuracy: mean Average Precision (mAP); a result within D = 0.1 m of the ground truth counts as a true positive (TP).


Fault

Fig From [2]


2.4 Results - Classification accuracy

High correlation between real and synthetic.

Depth of tree: most effective

Fig From [2]


2.4 Results - Joint Prediction

Comparing the algorithm on:

real set (red): mAP 0.731

ground truth set (blue): mAP 0.914

mAP 0.984: upper body

Fig From [2]


2.4 Results - Joint Prediction

Comparing the algorithm to ideal Nearest Neighbor matching and to realistic NN (Chamfer NN).

Fig From [2]


2.4 Results - Joint Prediction

Comparison to Article [1]:

Run on the same dataset

Better results (even without temporal data)

Runs 10x faster.

Fig From [2]


2.4 Results - Joint Prediction

Full rotations and multiple people

Right-left ambiguity

mAP of 0.655 (good for our uses)

Result video

Fig. from [2]


2.5 Limitations & Future work

Future work:

Better synthesis pipeline

Is there an efficient approach that directly regresses joint positions? (Already done in follow-up work: "Efficient offset regression of body joint positions")


2.6 Evaluation

Well written

Self contained

Novel combination of existing parts

New technology

Achieving goals (real time)

Extensively validated:

Used in real console

Many results graphs and examples (another PDF of supplementary material)

Broad comparison to other algorithms

Data set and code not available


References

[1] Real Time Motion Capture Using a Single Time-of-Flight Camera (V. Ganapathi et al., 2010)

[2] Real Time Human Pose Recognition in Parts from Single Depth Images (Shotton et al. & Xbox Incubation, 2011)

[3] Real time identification and localization of body parts from depth images (C. Plagemann et al., 2010)

[4] Computer Graphics course (046746), Technion.


Questions?