Human Pose Recognition
Contents
1. Introduction
2. Article [1]
Real Time Motion Capture Using a
Single TOF Camera (2010)
3. Article [2]
Real Time Human Pose Recognition In
Parts From Single Depth Images (2011)
1.1 What Is Pose Recognition?
Fig. from [2]: an input depth image labeled into body parts (head, arm, torso).
1.2 Motivation
Why do we need this?
Robotics
Smart surveillance
Virtual reality
Motion analysis
Gaming – Kinect
Kinect – Project Natal
Microsoft Xbox 360 console
"You are the controller"
Launched 04/11/10
In its first 60 days on the market it sold over 8M units (Guinness World Record)!
http://www.youtube.com/watch?v=p2qlHoxPioM
1.3 Challenges
What is the problem?
Real time?
Full solution?
Cheap?
Occlusions? Light? Shadows? Clothes?
1.4 Previous Technology
Mocap using markers – expensive.
Multi-view camera systems – limited applicability.
Monocular – simplified problems.
1.5 New Technology: Time-of-Flight (TOF) Camera
Dense depth.
High frame rate (100 Hz).
Robust to lighting, shadows, and other problems.
2. Article [1]: Real Time Motion Capture Using a Single Time-of-Flight Camera
(V. Ganapathi et al., CVPR 2010)
Article Contents
2.1 Previous work
2.2 What's new?
2.3 Overview
2.4 Results
2.5 Limitations & future work
2.6 Evaluation
2.1 Previous Work
Many, many articles… (the survey of Moeslund et al. 2006 alone covered over 350 articles.)
2.2 What's New?
TOF technology.
Propagating information up the kinematic chain.
Probabilistic model using the unscented transform.
Multiple GPUs.
2.3 Overview
1. Probabilistic Model
2. Algorithm Overview:
Model Based Hill Climbing Search
Evidence Propagation
Full Algorithm
1. Probabilistic Model
15 body parts.
DAG – Directed Acyclic Graph.
$X_t = \{X_t^i\}_{i=1}^N$ – pose; $V_t$ – speed; $z_t$ – range scan.
DBN – Dynamic Bayesian Network.
Assumptions:
Transition factors $P(X_t^i \mid V_t^i, X_{t-1}^i)$ and $V_t \mid V_{t-1} \sim \mathcal{N}(V_{t-1}, \Sigma)$.
Use ray casting to evaluate the distance from the measurement.
Goal: find the most likely state, given the previous frame (MAP), i.e.:
$\hat{X}_t, \hat{V}_t = \arg\max_{X_t, V_t} \log P(z_t \mid X_t, V_t) + \log P(X_t, V_t \mid \hat{X}_{t-1}, \hat{V}_{t-1})$
Fig. from [1]
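The pieces of this model can be sketched in code. The Gaussian velocity transition follows the slide; the isotropic noise (a scalar sigma) and the toy measurement log-likelihood passed in as a callable are assumptions of this illustration, not the article's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_transition(v_prev, sigma):
    """Velocity transition of the DBN: V_t | V_{t-1} ~ N(V_{t-1}, sigma^2 I)
    (isotropic noise assumed here for simplicity)."""
    return v_prev + sigma * rng.standard_normal(v_prev.shape)

def log_posterior(z_loglik, x_t, v_t, x_prev, v_prev, sigma):
    """MAP objective from the slide:
    log P(z_t | X_t, V_t) + log P(X_t, V_t | X_{t-1}, V_{t-1}).
    Only the Gaussian velocity term of the transition is written out;
    additive constants are dropped since they do not affect the arg max."""
    transition = -0.5 * np.sum((v_t - v_prev) ** 2) / sigma ** 2
    return z_loglik(x_t, v_t) + transition
```

In the article the measurement term comes from ray casting against the body model; here it is left abstract.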
1. Probabilistic Model (cont.)
Fig. from [1]: evaluating a measurement ray $z_k$ against the model via ray casting.
2. Algorithm Overview
1. Hill climbing search (HC)
2. Evidence propagation (EP)
2.1 Hill Climbing Search (HC)
Sample $V_t^i$ from $P(V_t^i \mid V_{t-1}^i)$.
Build a grid (0.05 m spacing) around $V_t^i$; compute $X_t$ from $V_t$ and $\hat{X}_{t-1}$.
Evaluate the likelihood at each grid point and choose the best one!
Coarse-to-fine grids.
Fig. from [1]
2.1 Hill Climbing Search (HC)
The good:
Simple.
Fast.
Runs in parallel on GPUs.
The bad:
Local optima.
Ridges, plateaus, alleys.
Can lose track when motion is fast or occlusions occur.
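A minimal coordinate-wise, coarse-to-fine version of the HC idea can be sketched as follows. The 0.05 m initial step matches the slide; the shrink schedule, number of levels, and toy scoring function are illustrative assumptions.

```python
import numpy as np

def hill_climb(score, x0, step=0.05, levels=3, shrink=0.5):
    """Coordinate-wise local search with coarse-to-fine step sizes:
    try +/-step moves on each coordinate, keep any improvement, and
    halve the step once no move helps (a sketch of the HC idea)."""
    x = np.asarray(x0, dtype=float)
    best = score(x)
    for _ in range(levels):
        improved = True
        while improved:
            improved = False
            for i in range(x.size):
                for d in (-step, step):
                    cand = x.copy()
                    cand[i] += d
                    s = score(cand)
                    if s > best:
                        x, best, improved = cand, s, True
        step *= shrink
    return x, best

# Toy likelihood peaked at (1, 2); HC walks there from the origin.
x_opt, s_opt = hill_climb(lambda p: -np.sum((p - np.array([1.0, 2.0])) ** 2),
                          [0.0, 0.0])
```

The same structure also shows the failure mode listed on the slide: a greedy local search like this stops at the nearest local optimum.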
2.2 Evidence Propagation
Has 3 stages:
1. Body part detection (C. Plagemann et al. 2010)
2. Probabilistic inverse kinematics
3. Data association and inference
2.2.1 Body Part Detection
Bottom-up approach:
1. Locate interest points with AGEX – Accumulative Geodesic Extrema.
2. Find orientation.
3. Classify the head, feet and hands using local shape descriptors.
Fig. from [3]
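The AGEX step can be illustrated on a small surface graph: starting from a seed (e.g. the body centroid), repeatedly pick the node with the maximum geodesic distance to everything chosen so far. This is a sketch of the idea only; a real implementation runs on the mesh built from the depth image.

```python
from collections import deque

def geodesic_dists(adj, sources):
    """Multi-source BFS: geodesic (hop-count) distance to the nearest source."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def agex(adj, centroid, k):
    """Accumulative Geodesic Extrema: repeatedly pick the node farthest
    (geodesically) from everything selected so far, seeded at the centroid."""
    chosen = [centroid]
    for _ in range(k):
        dist = geodesic_dists(adj, chosen)
        chosen.append(max(dist, key=dist.get))
    return chosen[1:]  # the k interest points, excluding the seed

# Toy "body": a path graph 0-1-2-3-4 with the centroid in the middle;
# the two extrema are the endpoints (think hands/feet vs. torso).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
extrema = agex(path, centroid=2, k=2)
```

Geodesic (surface) distance rather than Euclidean distance is what makes the extrema land on the extremities even when a hand is folded near the torso.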
2.2.1 Body Part Detection
Results:
Fig From [3]
2.2.2 Probabilistic Inverse Kinematics (EP)
Model parts $\{p_i\}_{i=1}^{5} = \{Head, Hands, Legs\}$ of $X$; detections $\hat{p}_j\ (j = 1, \dots, N)$.
Assume a correspondence $p_i \leftrightarrow \hat{p}_j$.
Need a new MAP conditioned on $\hat{X}_{t-1}$ and $\hat{p}_j$: $P(V_t \mid V_{t-1}, \hat{X}_{t-1}, \hat{p}_j)$.
Problem – $p_i(V_t, \hat{X}_{t-1}, \hat{V}_{t-1})$ isn't linear!
Solution: linearize with the unscented Kalman filter, which makes the conditional easy to determine.
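The unscented transform behind this linearization can be sketched as follows: instead of differentiating the nonlinear kinematics, it pushes deterministically chosen sigma points through the function and re-estimates the output mean and covariance. The weights follow the standard Julier-Uhlmann scheme; the parameter values are conventional defaults, not taken from the article, and the full UKF update (cross-covariance, Kalman gain) is omitted.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=0.5, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    via 2n+1 sigma points (standard Julier/Uhlmann weighting)."""
    n = mean.size
    lam = alpha ** 2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * cov)          # sigma-point spread
    pts = ([mean] + [mean + S[:, i] for i in range(n)]
                  + [mean - S[:, i] for i in range(n)])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))  # mean weights
    wc = wm.copy()                                    # covariance weights
    wm[0] = lam / (n + lam)
    wc[0] = wm[0] + (1.0 - alpha ** 2 + beta)
    ys = np.array([f(p) for p in pts])
    y_mean = wm @ ys
    d = ys - y_mean
    y_cov = (wc[:, None] * d).T @ d
    return y_mean, y_cov
```

For a linear map the transform is exact, which is a useful sanity check; for the nonlinear forward kinematics $p_i(\cdot)$ it gives a second-order-accurate Gaussian approximation.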
2.3 Full Algorithm
Depth image + previous MAP → HC → X'.
Part detection; remove suggestions already explained by X'.
Correspond the rest to body parts: $\{(p_i, \hat{p}_j)\}$; run EP, then HC → X'.
If X' > Xbest, set Xbest = X'.
2.4 Results
Experiments:
28 real depth image sequences.
Ground truth – tracking markers.
Error metric: $\epsilon_{avg} = \frac{1}{M}\sum_{i=1}^{M} \|m_i - \hat{m}_i\|$, where $m_i$ is the real marker position and $\hat{m}_i$ the estimated position.
$\epsilon_{avg} \le 0.1\,$m – perfect tracks; $\epsilon_{avg} \ge 0.3\,$m – fault tracking.
Compared 3 algorithms: EP, HC, HC+EP.
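The error metric and the 0.1 m / 0.3 m thresholds from the slide are straightforward to compute; a sketch follows (the "intermediate" label for the gap between the two thresholds is an addition of this sketch):

```python
import numpy as np

def avg_marker_error(real, est):
    """eps_avg = (1/M) * sum_i ||m_i - m_hat_i|| over the M markers."""
    real, est = np.asarray(real, dtype=float), np.asarray(est, dtype=float)
    return float(np.mean(np.linalg.norm(real - est, axis=1)))

def classify_track(eps_avg):
    """Slide thresholds: <= 0.1 m is a perfect track, >= 0.3 m is a fault."""
    if eps_avg <= 0.1:
        return "perfect"
    if eps_avg >= 0.3:
        return "fault"
    return "intermediate"
```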
2.4 Results
Best – HC+EP; worst – EP.
Runs close to real time: HC at 6 frames per second, HC+EP at 4-6 frames per second.
Fig. from [1]: the harder the sequence, the bigger the difference between the algorithms.
2.4 Results
Extreme case – sequence 27: HC loses track, HC+EP does not.
Fig. from [1]
2.5 Limitations & Future Work
Limitations:
Manual initialization.
Cannot track more than one person at a time.
Using temporal data – consumes more time; reinitialization problem.
Future work:
Improving the speed.
Combining with color cameras.
Fully automatic model initialization.
Tracking more than one person.
2.6 Evaluation
Well written.
Self contained.
Novel combination of existing parts.
New technology.
Achieves its goals (real time).
Missing examples on the probabilistic model.
Not clear how $X_0$ is defined.
Extensively validated:
Data set and code available.
Not enough visual examples in the article.
No comparison to different algorithms.
3. Article [2]: Real Time Human Pose Recognition In Parts From Single Depth Images
(Shotton et al., Microsoft Research & Xbox Incubation, 2011)
Article Contents
2.1 Previous work
2.2 What's new?
2.3 Overview
2.4 Results
2.5 Limitations & future work
2.6 Evaluation
2.1 Previous work Same as Article [1].
2.2 What's New?
Uses no temporal information – robust and fast (200 frames per second).
Object recognition approach.
Per-pixel classification.
Large and highly varied training dataset.
Fig. from [2]
2.3 Overview
1. Database construction
2. Body part inference and joint proposals:
Goals:
computational efficiency and robustness
1. Database
Pose estimation often has to overcome a lack of training data… why?
Huge color and texture variability.
Computer simulation doesn't produce the range of volitional motions of a human subject.
1. Database (cont.)
100k mocap frames → synthetic rendering pipeline.
Fig. from [2]
1. Database
Real data vs. synthetic data – which is real?
Fig. from [2]
2. Body Part Inference
1. Body part labeling
2. Depth image features
3. Randomized decision forests
4. Joint position proposals
2.1 Body Part Labeling
31 body parts labeled (e.g. "head up right", "head up left").
The problem can now be solved by efficient classification algorithms.
Fig. from [2]
2.2 Depth Comparison Features
Simple depth comparison features:
$f_\theta(I, x) = d_I\left(x + \frac{u}{d_I(x)}\right) - d_I\left(x + \frac{v}{d_I(x)}\right)$   (1)
$d_I(x)$ – depth at pixel x in image I; $\theta = (u, v)$ – offsets.
Normalization of the offsets by $d_I(x)$ makes the feature depth invariant.
Computational efficiency: no preprocessing.
Fig. from [2]
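Feature (1) is cheap to evaluate directly on the depth map; a sketch follows. Treating out-of-image probes as a large background depth mirrors the common convention for such features, but the specific background value here is an assumption of this illustration.

```python
import numpy as np

def depth_feature(d, x, u, v, background=1e6):
    """Eq. (1): f_theta(I, x) = d_I(x + u/d_I(x)) - d_I(x + v/d_I(x)).
    Scaling the offsets by 1/d_I(x) makes the response depth invariant;
    probes landing outside the image read a large background depth
    (the background constant is an assumption of this sketch)."""
    d = np.asarray(d, dtype=float)
    x = np.asarray(x)

    def probe(offset):
        p = np.round(x + np.asarray(offset) / d[tuple(x)]).astype(int)
        if (p < 0).any() or (p >= np.array(d.shape)).any():
            return background
        return d[tuple(p)]

    return probe(u) - probe(v)
```

Note there is no image preprocessing at all: each feature is two depth lookups and a subtraction, which is what makes per-pixel evaluation across a whole forest feasible in real time.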
2.3 Randomized Decision Forests
How does it work?
Node = a feature $f_\theta$ and a threshold $\tau$.
Classify pixel x by traversing each tree to a leaf and averaging over the $T$ trees:
$P(c \mid I, x) = \frac{1}{T} \sum_{t=1}^{T} P_t(c \mid I, x)$
Fig. from [2]
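Forest classification is then just tree traversal plus averaging; a sketch with trees as nested dicts (a representation chosen for illustration, not the paper's data structure):

```python
import numpy as np

def classify_pixel(trees, I, x):
    """P(c | I, x) = (1/T) * sum_t P_t(c | I, x): traverse each tree to a
    leaf and average the leaf class distributions. Internal nodes hold
    'feature', 'threshold', 'left', 'right'; leaves hold 'dist'."""
    def leaf_dist(node):
        while "dist" not in node:
            branch = "left" if node["feature"](I, x) < node["threshold"] else "right"
            node = node[branch]
        return np.asarray(node["dist"], dtype=float)
    return sum(leaf_dist(t) for t in trees) / len(trees)
```

Averaging over independently randomized trees is what gives the forest its robustness: any single tree overfits its random feature pool, but the mean posterior is smooth.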
2.3 Randomized Decision Forests
Training algorithm: 1M images, 2000 pixels per image.
Each split $\phi$ is chosen to maximize the information gain:
$G(\phi) = H(Q) - \sum_{s \in \{l, r\}} \frac{|Q_s(\phi)|}{|Q|} H(Q_s(\phi))$
where $H$ is the Shannon entropy.
Training 3 trees of depth 20 on 1M images takes ~1 day on a 1000-core cluster:
1M images × 2000 pixels × 2000 candidate features × 50 thresholds = $2 \times 10^{14}$ computations…
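The split-selection criterion on this slide reduces to a few lines: Shannon entropy of the label set, and the gain of a candidate binary threshold split. A sketch (one candidate feature at a time; a trainer would loop this over all candidate features and thresholds):

```python
import numpy as np

def entropy(labels):
    """Shannon entropy H(Q) of a multiset of body-part labels."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gain(labels, feature_values, threshold):
    """G(phi) = H(Q) - sum_{s in {l,r}} (|Q_s|/|Q|) H(Q_s) for one candidate
    split phi = (feature, threshold); degenerate splits score 0."""
    labels = np.asarray(labels)
    fv = np.asarray(feature_values)
    left, right = labels[fv < threshold], labels[fv >= threshold]
    if len(left) == 0 or len(right) == 0:
        return 0.0
    n = len(labels)
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)
```

A split that separates the classes perfectly scores the full parent entropy; a split that leaves the class mixture unchanged scores 0, which is why greedy maximization of G drives the tree toward pure leaves.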
2.3 Randomized Decision Forests
Trained tree: Fig. from [2]
2.4 Joint Position Proposals
Local mode finding approach based on mean shift with a weighted Gaussian kernel.
Density estimator:
$\hat{f}_c(x) \propto \sum_{i=1}^{N} w_{ic} \exp\left(-\left\|\frac{x - \hat{x}_i}{b_c}\right\|^2\right)$, with weights $w_{ic} = P(c \mid I, x_i) \cdot d_I(x_i)^2$.
Fig. from [4]: mean shift finds the dense mode, while outliers drag the plain center of mass.
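A minimal mean-shift mode seek with this weighted Gaussian kernel can be sketched as follows; the iteration cap and convergence tolerance are illustrative choices, and the weights stand in for the per-pixel posteriors $w_{ic}$.

```python
import numpy as np

def mean_shift_mode(points, weights, bandwidth, start, iters=50):
    """One mean-shift mode seek under a weighted Gaussian kernel.
    Low-weight outliers barely pull the iterate, so the mode gravitates
    to the dense, confidently-classified cluster rather than to the
    plain (unweighted) center of mass."""
    x = np.asarray(start, dtype=float)
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    for _ in range(iters):
        k = w * np.exp(-np.sum(((pts - x) / bandwidth) ** 2, axis=1))
        x_new = (k[:, None] * pts).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < 1e-6:
            break
        x = x_new
    return x
```

Started from the center of mass of a cluster plus one far outlier, the iterate slides off the outlier and converges onto the cluster, which is exactly the behavior the slide's figure illustrates.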
2.4 Results
Experiments:
8800 frames of real depth images.
5000 synthetic depth images.
Also evaluated on the Article [1] dataset.
Measures:
1. Classification accuracy – confusion matrix.
2. Joint accuracy – mean Average Precision (mAP); a prediction within D = 0.1 m of the ground truth counts as a true positive, otherwise as a fault.
Fig. from [2]
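The D = 0.1 m criterion can be sketched as a simple per-joint precision. Note this is a simplification: the paper's mAP also penalizes false-positive joint proposals per body part before averaging, which this sketch omits.

```python
import numpy as np

def joint_precision(pred, truth, D=0.1):
    """Fraction of predicted joints within D meters of ground truth:
    a prediction is a true positive iff ||pred - truth|| <= D."""
    pred, truth = np.asarray(pred, dtype=float), np.asarray(truth, dtype=float)
    dists = np.linalg.norm(pred - truth, axis=1)
    return float(np.mean(dists <= D))
```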
2.4 Results – Classification Accuracy
High correlation between real and synthetic data.
Depth of tree – the most effective parameter.
Fig. from [2]
2.4 Results – Joint Prediction
Comparing the algorithm on:
Real test set (red) – mAP 0.731.
Ground truth set (blue) – mAP 0.914 (0.984 on the upper body).
Fig. from [2]
2.4 Results – Joint Prediction
Comparing the algorithm to ideal nearest neighbor (NN) matching and to realistic NN – chamfer NN.
Fig. from [2]
2.4 Results – Joint Prediction
Comparison to Article [1]:
Run on the same dataset.
Better results (even without temporal data).
Runs 10x faster.
Fig. from [2]
2.4 Results – Joint Prediction
Full rotations and multiple people: right-left ambiguity; mAP of 0.655 (good for our uses).
Result video – Fig. from [2]
2.5 Limitations & Future Work
Future work:
Better synthesis pipeline.
Is there an efficient approach that directly regresses joint positions? (Already done in follow-up work: efficient offset regression of body joint positions.)
2.6 Evaluation
Well written.
Self contained.
Novel combination of existing parts.
New technology.
Achieves its goals (real time).
Extensively validated:
Used in a real console.
Many result graphs and examples (plus a PDF of supplementary material).
Broad comparison to other algorithms.
Data set and code not available.
References
[1] Real Time Motion Capture Using a Single Time-of-Flight Camera (V. Ganapathi et al., CVPR 2010)
[2] Real-Time Human Pose Recognition in Parts from Single Depth Images (J. Shotton et al., Microsoft Research & Xbox Incubation, 2011)
[3] Real-Time Identification and Localization of Body Parts from Depth Images (C. Plagemann et al., 2010)
[4] Computer Graphics course (046746), Technion.
Questions?