Model-Based Human Pose Tracking
![Page 1: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/1.jpg)
Model-Based Human Pose Tracking
David J Fleet
Department of Computer Science
University of Toronto
with M. Brubaker (Toronto), P. Fua (EPFL), A. Hertzmann (Toronto), N. Lawrence (Sheffield), R. Urtasun (MIT), J. Wang (Toronto)
IEEE Workshop on Evaluation of Articulated Human Motion and Pose Estimation, June 2007, Minneapolis [Keynote Presentation]
![Page 2: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/2.jpg)
Introduction
We need to get a lot of things right to successfully infer 3D pose and motion from monocular video:
body size and shape
pose and motion
appearance (foreground and background)
lighting and occlusion
image measurement
search and detection
…
![Page 3: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/3.jpg)
Introduction
[Figure panels: articulated model; motion capture]
Human pose and motion data are high-dimensional, and difficult to obtain.
Sparseness of data, over-fitting and generalization are significant issues.
![Page 4: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/4.jpg)
Introduction
[Elgammal and Lee `04]
![Page 5: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/5.jpg)
Introduction
[Sminchisescu and Jepson `04]
![Page 6: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/6.jpg)
Introduction
[Agarwal and Triggs `06]
![Page 7: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/7.jpg)
Introduction
[Sigal et al, `06]
![Page 8: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/8.jpg)
Introduction
[Sminchisescu et al. `06]
![Page 9: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/9.jpg)
Gaussian process latent variable models (GPLVM)
Nonlinear generalization of probabilistic PCA [Lawrence `05].
![Page 10: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/10.jpg)
Gaussian processes
[Figure: GP regression plots of target (y) versus input (x), showing data points, the true curve, the GP mean, and the GP 2σ band]
Model averaging (marginalization of the parameters) helps to avoid problems due to over-fitting and under-fitting with small data sets.
![Page 11: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/11.jpg)
Gaussian processes
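Output $y$ is modelled as a function of input $x$.
If $y$ is linear in weights that have a zero-mean Gaussian prior, then $y$ is zero-mean Gaussian, with covariance given by a covariance function $k(x, x')$.
A Gaussian process is fully specified by a covariance function and its hyper-parameters.
Common covariance functions: linear and RBF.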
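As a rough illustration of the regression plots on these slides, the NumPy sketch below computes a GP posterior mean and a 2σ band. The particular RBF form, the hyper-parameter names (`variance`, `lengthscale`, `noise_var`), and the toy sine data are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def rbf_kernel(a, b, variance=1.0, lengthscale=0.5):
    """RBF covariance between 1D input arrays a (n,) and b (m,)."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Return the GP posterior mean and standard deviation at x_test."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)            # (n, m) cross-covariance
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Toy data (assumed): a few noisy samples of a sine curve.
rng = np.random.default_rng(0)
x_train = np.linspace(-2.0, 2.0, 8)
y_train = np.sin(2.0 * x_train) + 0.1 * rng.standard_normal(8)
x_test = np.linspace(-3.0, 3.0, 61)

mean, std = gp_predict(x_train, y_train, x_test)
lower, upper = mean - 2.0 * std, mean + 2.0 * std   # the 2-sigma band
```

The band widens away from the training inputs, which is the behaviour the slides rely on when they later plot (log) variance over a latent space.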
![Page 12: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/12.jpg)
Gaussian process latent variable models
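Joint likelihood of vector-valued data $\mathbf{Y} = [\mathbf{y}_1, \ldots, \mathbf{y}_N]^T$, given the latent positions $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^T$:
$$p(\mathbf{Y} \mid \mathbf{X}) = \prod_{d=1}^{D} \mathcal{N}(\mathbf{Y}_{:,d};\, \mathbf{0},\, \mathbf{K})$$
where $\mathbf{Y}_{:,d}$ denotes the $d$th dimension of the training data, and the kernel matrix $\mathbf{K}$ has elements $K_{ij} = k(\mathbf{x}_i, \mathbf{x}_j)$, and is shared by all data dimensions.
Learning: Maximize the log likelihood (or MAP) to find latent positions and kernel hyper-parameters, given an initial guess (e.g., using PCA).
GPDM: For time-series data one can include a Gaussian process prior on latent state sequences [Wang et al ’06].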
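The learning objective described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the RBF kernel with a diagonal noise term, the hyper-parameter names, the helper names (`rbf_kernel_matrix`, `gplvm_neg_log_likelihood`, `pca_init`), and the toy data are all hypothetical, and the optimizer itself is left out.

```python
import numpy as np

def rbf_kernel_matrix(X, variance=1.0, lengthscale=1.0, noise_var=1e-2):
    """Kernel matrix over latent positions X (N, q), with noise on the diagonal."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dist = np.maximum(sq_dist, 0.0)           # guard against round-off
    K = variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)
    return K + noise_var * np.eye(len(X))

def gplvm_neg_log_likelihood(X, Y, **hyper):
    """-log p(Y | X): D independent GPs that share one kernel matrix K."""
    N, D = Y.shape
    L = np.linalg.cholesky(rbf_kernel_matrix(X, **hyper))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return 0.5 * (D * N * np.log(2.0 * np.pi) + D * log_det + np.sum(Y * K_inv_Y))

def pca_init(Y, q=2):
    """Initial latent positions: scores on the top-q principal directions."""
    Yc = Y - Y.mean(axis=0)
    U, S, _ = np.linalg.svd(Yc, full_matrices=False)
    return U[:, :q] * S[:q]

# Toy "poses" (assumed): a periodic signal embedded in 10 dimensions.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
basis = np.stack([np.sin(t), np.cos(t), np.sin(2.0 * t)], axis=1)
Y = basis @ rng.standard_normal((3, 10))
Y = Y - Y.mean(axis=0)

X_pca = pca_init(Y, q=2)
X_rand = rng.standard_normal(X_pca.shape)
nll_pca = gplvm_neg_log_likelihood(X_pca, Y)
nll_rand = gplvm_neg_log_likelihood(X_rand, Y)
```

A gradient-based optimizer would start from `X_pca` and adjust both the latent positions and the hyper-parameters to reduce this objective.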
![Page 13: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/13.jpg)
Conditional (predictive) distribution
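Given a model, the distribution over the data $\mathbf{y}$ conditioned on a latent position, $\mathbf{x}$, is Gaussian:
$$\mathbf{y} \mid \mathbf{x} \sim \mathcal{N}\!\left(\boldsymbol{\mu}(\mathbf{x}),\, \sigma^2(\mathbf{x})\, \mathbf{I}\right)$$
where
$$\boldsymbol{\mu}(\mathbf{x}) = \mathbf{Y}^T \mathbf{K}^{-1} \mathbf{k}(\mathbf{x}), \qquad \sigma^2(\mathbf{x}) = k(\mathbf{x}, \mathbf{x}) - \mathbf{k}(\mathbf{x})^T \mathbf{K}^{-1} \mathbf{k}(\mathbf{x})$$
Here $\mathbf{Y}$ is the training data, $\mathbf{K}$ is the kernel matrix over the training latent positions $\mathbf{x}_i$, and $\mathbf{k}(\mathbf{x})$ is the vector with elements $k(\mathbf{x}, \mathbf{x}_i)$.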
![Page 14: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/14.jpg)
Conditional (predictive) distribution
mean pose
log variance
![Page 15: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/15.jpg)
3D B-GPDM for walking
GPDM: sample trajectories
GPDM: log reconstruction variance
6 walking subjects, 1 gait cycle each, on treadmill at same speed, with a 20 DOF joint parameterization.
[Urtasun et al, `06]
![Page 16: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/16.jpg)
People tracking with GPDM
State: global pose, joint angles, and latent coordinates
Image measurements: trajectories of 2D patches, estimated with the WSL tracker [Jepson et al. `03]
Posterior: combines an image likelihood term with a GPDM prediction term
Inference: MAP estimation over windowed state sequences, by minimizing the negative log posterior (hill climbing on the posterior)
[Urtasun et al, `06]
![Page 17: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/17.jpg)
Occlusion
3D model overlaid on video
3D animated characters
![Page 18: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/18.jpg)
Exaggerated gait
3D model overlaid on video
3D animated characters
![Page 19: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/19.jpg)
… but wait …
you’re thinking …
these models won’t scale
they won’t handle different styles of motion
efficiency is a major issue
the amount of data required for training is daunting
![Page 20: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/20.jpg)
Multiple motions often produce poor models
GPDM with MAP learning
4 walking subjects, 2 gait cycles each, 50 DOFs
![Page 21: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/21.jpg)
Multiple motions often produce poor models
Marginalize latent positions, and solve with HMC-EM [Wang et al, ‘06]
4 walking subjects, 2 gait cycles each, 50 DOFs
![Page 22: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/22.jpg)
Multiple motions often produce poor models
But there is more valuable information in the training data, and prior knowledge about human motion that can be used to influence the structure of the model.
GPLVMs do not ensure that the map from pose to the latent space is smooth, i.e., that nearby poses map to nearby latent positions.
With sparse mocap data, different motions may be modeled as arising from different distributions.
![Page 23: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/23.jpg)
Topological constraints
[Urtasun et al. ’07]
Exploit prior knowledge to control the topology of the latent space, and promote smoothness.
Topological constraints and smoothness can be encouraged with back-constraints, in which the latent coordinates are parameterized as a function of the poses.
Smoothness “priors” are expressed in terms of the latent positions of “similar” poses, where weights are chosen (or optimized) to represent likely latent positions in terms of neighboring positions (or desired predictors).
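One way such a smoothness prior could look is an LLE-style penalty: weights that reconstruct each pose from its nearest neighbours are reused to penalize latent positions that are not reconstructed by the same weights. The sketch below is an assumption about the general form, not the formulation on the slide; the function names, the regularizer `reg`, and the toy data are hypothetical.

```python
import numpy as np

def lle_weights(Y, n_neighbors=4, reg=1e-3):
    """Weights W (N, N): row i reconstructs pose i from its neighbours; rows sum to 1."""
    N = Y.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        dist = np.linalg.norm(Y - Y[i], axis=1)
        dist[i] = np.inf                          # exclude the pose itself
        nbrs = np.argsort(dist)[:n_neighbors]
        Z = Y[nbrs] - Y[i]                        # neighbours centred on pose i
        C = Z @ Z.T                               # local Gram matrix (k, k)
        C += reg * (np.trace(C) + 1e-12) * np.eye(n_neighbors)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()
    return W

def smoothness_penalty(X, W):
    """Sum over i of ||x_i - sum_j w_ij x_j||^2 for latent positions X (N, q)."""
    residual = X - W @ X
    return float(np.sum(residual ** 2))

# Toy check (assumed data): poses that are a linear image of a circle.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
X_good = np.stack([np.cos(t), np.sin(t)], axis=1)
Y = X_good @ rng.standard_normal((2, 6))
W = lle_weights(Y)

X_bad = X_good[rng.permutation(len(t))]           # latent layout ignoring pose similarity
penalty_good = smoothness_penalty(X_good, W)
penalty_bad = smoothness_penalty(X_bad, W)
```

In a full model this penalty would be added to the negative log likelihood, so that latent layouts in which similar poses sit far apart are discouraged.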
![Page 24: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/24.jpg)
“Cylindrical” topology
9 walk cycles, 10 run cycles, different speeds, different subjects
Back-constraints: first 2 latent coordinates map to vicinity of unit circle
Locally-linear prior: similar poses map to nearby latent positions
Transitions: poses with left (right) foot on the ground map to similar phases (around unit circle)
[Figure labels: run, walk]
[Urtasun et al. ’07]
![Page 25: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/25.jpg)
“Cylindrical” topology
Simulations that transition from running to walking
[Urtasun et al. ’07]
![Page 26: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/26.jpg)
Style-content separation
[Diagram: factors (gait, phase, identity, gender, ...) determine pose]
Nonlinear basis functions [Elgammal and Lee ‘04]
[Diagram: data depends on factor 1, factor 2, …, factor N]
Multilinear style-content models [Tenenbaum and Freeman ’00; Vasilescu and Terzopoulos ‘02]
![Page 27: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/27.jpg)
Multifactor GPLVM
[Wang et al. ’07]
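Suppose $y$ depends linearly on latent style parameters $\mathbf{s}$, and nonlinearly on $\mathbf{x}$:
$$y = \sum_i s_i\, f_i(\mathbf{x}) + \varepsilon, \quad \text{where } f_i(\mathbf{x}) = \mathbf{w}_i^T \boldsymbol{\phi}(\mathbf{x})$$
If $\mathbf{w}_i \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and $\varepsilon \sim \mathcal{N}(0, \beta^{-1})$, then $y$ is zero-mean Gaussian, with covariance
$$E[y\, y'] = (\mathbf{s}^T \mathbf{s}')\, k(\mathbf{x}, \mathbf{x}') + \beta^{-1} \delta, \quad \text{where } k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}')$$
and $\delta$ is 1 when $y$ and $y'$ are the same observation, and 0 otherwise.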
![Page 28: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/28.jpg)
Multifactor locomotion model
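Three-factor latent model with factors $\mathbf{s}$, $\mathbf{g}$, and $\mathbf{x}$:
$\mathbf{s}$: identity of the subject performing the motion
$\mathbf{g}$: gait of the motion (walk, run, stride)
$\mathbf{x}$: current state of motion (evolves w.r.t. time)
Covariance function:
$$k([\mathbf{s}, \mathbf{g}, \mathbf{x}],\, [\mathbf{s}', \mathbf{g}', \mathbf{x}']) = (\mathbf{s}^T \mathbf{s}')\, (\mathbf{g}^T \mathbf{g}')\, \exp\!\left(-\tfrac{\gamma}{2}\, \|\mathbf{x} - \mathbf{x}'\|^2\right) + \beta^{-1} \delta$$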
[Wang et al. ’07]
![Page 29: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/29.jpg)
Multifactor locomotion model
Training data: 6 motions, 314 poses in total
![Page 30: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/30.jpg)
Generating new motions
[Figure: grid of motions; rows: stride, run, walk; columns: subject 1, subject 2, subject 3]
The GP model provides a Gaussian prediction for new motions. We use the mean to generate motions with different styles.
Each training motion is a sequence of poses, sharing the same combination of subject ($\mathbf{s}$) and gait ($\mathbf{g}$).
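To make the idea of generating an unseen subject and gait combination concrete, here is a small NumPy sketch. It assumes a product kernel (linear in the subject and gait vectors, RBF in the state), uses fixed non-orthogonal style vectors as stand-ins for the learned latent factors, and trains on toy sinusoidal "poses"; all names, values, and data are hypothetical.

```python
import numpy as np

def multifactor_kernel(S1, G1, X1, S2, G2, X2, gamma=1.0):
    """(s.s') (g.g') exp(-gamma/2 ||x - x'||^2) for all pairs of rows."""
    sq = np.sum(X1 ** 2, 1)[:, None] + np.sum(X2 ** 2, 1)[None, :] - 2.0 * X1 @ X2.T
    return (S1 @ S2.T) * (G1 @ G2.T) * np.exp(-0.5 * gamma * np.maximum(sq, 0.0))

def predict_mean(train, Y, query, noise_var=1e-2, gamma=1.0):
    """GP predictive mean at the query (subject, gait, state) rows."""
    S, G, X = train
    K = multifactor_kernel(S, G, X, S, G, X, gamma) + noise_var * np.eye(len(Y))
    K_star = multifactor_kernel(*query, S, G, X, gamma)      # (M, N)
    return K_star @ np.linalg.solve(K, Y)

# Stand-in style vectors (assumed): non-orthogonal so that styles share information.
subject_vecs = np.eye(3) + 0.5
gait_vecs = np.eye(3) + 0.5                    # rows: walk, run, stride
phase = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
state = np.stack([np.cos(phase), np.sin(phase)], axis=1)    # (20, 2)

def toy_motion(subject, gait):
    """Toy 3-DOF 'pose' sequence whose shape depends on subject and gait."""
    amp, speed = 1.0 + 0.3 * subject, 1.0 + 0.5 * gait
    return np.stack([amp * np.sin(phase), speed * np.cos(phase),
                     amp * speed * np.sin(2.0 * phase)], axis=1)

# Six of the nine subject/gait combinations are used for training.
train_pairs = [(0, 0), (0, 1), (1, 0), (1, 2), (2, 1), (2, 2)]
S = np.concatenate([np.tile(subject_vecs[s], (20, 1)) for s, g in train_pairs])
G = np.concatenate([np.tile(gait_vecs[g], (20, 1)) for s, g in train_pairs])
X = np.tile(state, (len(train_pairs), 1))
Y = np.concatenate([toy_motion(s, g) for s, g in train_pairs])

# Generate the held-out combination: subject 0 with the "stride" gait.
query = (np.tile(subject_vecs[0], (20, 1)), np.tile(gait_vecs[2], (20, 1)), state)
generated = predict_mean((S, G, X), Y, query)
```

Because the kernel factorizes, training motions that share either the subject or the gait with the query contribute to the predicted mean.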
![Page 31: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/31.jpg)
Generating new motions
[Videos: subject 1, walk; subject 1, stride (generated)]
![Page 32: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/32.jpg)
Generating new motions
[Videos: subject 2, walk; subject 2, stride (generated)]
![Page 33: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/33.jpg)
Compositionality
Hierarchical GPLVM [Lawrence and Moore ’07]
[Diagram: hierarchy with three levels]
entire body
upper/lower body
parts: left arm, right arm, head, abdomen, left leg, right leg
![Page 34: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/34.jpg)
Compositionality
Data: 1 walk cycle, 1 run cycle
Initialization: PCA
Learning: joint ML optimization of latent coordinates and hyper-parameters at all layers.
![Page 35: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/35.jpg)
… and beyond
Scaling / Efficiency (e.g., [Quinonero-Candela and Rasmussen `05])
Switching models for modeling activity transitions (e.g., [Pavlovic et al `00; Li et al `02; Oh et al `05; Li et al. `06])
…
Tracking applications
![Page 36: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/36.jpg)
Interactions are important
![Page 37: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/37.jpg)
Implausible motion
[Poon and Fleet, 01]
Kinematic Model: damped second-order Markov model with Beta process noise and joint angle limits
Observations: steerable pyramid coefficients (image edges)
Inference: hybrid Monte Carlo particle filter
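For intuition about this kind of kinematic prior, the sketch below simulates joint angles under a damped second-order Markov model with hard joint limits. The update rule, the use of a centred symmetric Beta(2, 2) draw as the noise, and all parameter values are assumptions for illustration; the exact model in [Poon and Fleet, 01] may differ.

```python
import numpy as np

def simulate_joint_angles(n_steps, n_joints, damping=0.8, noise_scale=0.05,
                          limits=(-1.0, 1.0), seed=0):
    """Sample a joint-angle trajectory (n_steps, n_joints) from the assumed prior."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_steps, n_joints))
    for t in range(2, n_steps):
        velocity = theta[t - 1] - theta[t - 2]               # second-order term
        noise = noise_scale * (rng.beta(2.0, 2.0, size=n_joints) - 0.5)
        proposal = theta[t - 1] + damping * velocity + noise
        theta[t] = np.clip(proposal, *limits)                # enforce joint limits
    return theta

trajectory = simulate_joint_angles(n_steps=200, n_joints=3)
```

Samples from such a prior are smooth and respect joint limits, but nothing in the model ties the motion to balance or ground contact, which is consistent with the slide's "implausible motion" label.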
![Page 38: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/38.jpg)
Implausible motion
[Urtasun et al. `05]
Kinematic Model: non-linear latent model of the pose manifold, with second-order Gauss-Markov model for temporal evolution
Observations: tracked 2D patches on body (WSL tracker)
Inference: MAP estimation (hill-climbing)
![Page 39: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/39.jpg)
Implausible motion
Kinematic Model: Gaussian process dynamical model (GPDM)
Observations: tracked 2D patches on body (WSL tracker)
Inference: MAP estimation (hill climbing) with sliding window
[Urtasun et al. `06]
![Page 40: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/40.jpg)
Can learning scale?
Problem: Learning kinematic models that incorporate dependence on the environment and other bodies from motion capture data may be untenable.
![Page 41: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/41.jpg)
Physics-based models
Physics specifies the motions of bodies and their interactions in terms of inertial descriptions and forces, and generalizes naturally to account for:
balance and body lean (e.g., on hills)
sudden accelerations (e.g., collisions)
static contact (e.g., avoiding footskate)
variations in style due to speed and mass distribution (e.g., carrying an object)
…
![Page 42: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/42.jpg)
Modeling full-body dynamics is difficult
[Liu et al. `06] [Kawada Ind. HRP-2, Robodex 2003]
![Page 43: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/43.jpg)
Biomechanics
[McGeer `90; Kuo `01,`02]
Anthropomorphic Walker:
2D model with rigid bodies for the torso and each leg
forces can be added with a spring between the legs and an impulsive toe-off
Properties:
walks efficiently (passively down an incline)
when powered, exhibits a human-like preferred speed-step length relationship
invariant to total mass and leg length (approximately)
[Figure label: Impulse]
![Page 44: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/44.jpg)
Physics-based model of lower-body dynamics
Development of physics-based motion models for tracking:
equations of motion
prior distribution over forces that model natural 2D locomotion with different speeds, step lengths, …
a 3D pose model consistent with underlying dynamics
[Diagram labels: control parameters; 2D dynamics; 3D kinematics, given dynamics]
[Brubaker et al. `07]
![Page 45: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/45.jpg)
Tracking results
Approximate MAP trajectory
![Page 46: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/46.jpg)
Tracking results
Approximate MAP trajectory in 3D
![Page 47: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/47.jpg)
Limitations (future work)
Knees and torso are needed to help account for bipedal locomotion on stairs, hills, etc.
3D models would allow body lean and foot placement in turning, and variations in upper-body moments of inertia
Extend dynamics to capture standing (both feet in contact) and running (no contact during flight)
Learning parameters of physics-based models from mocap
Conditional kinematics
Our work thus far has just scratched the surface
![Page 48: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/48.jpg)
Modeling appearance: Shape
[Plankers and Fua, 2003]
![Page 49: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/49.jpg)
Modeling appearance: Shape
[Allen et al. 2003]
[Balan et al. 2007]
![Page 50: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/50.jpg)
Modeling appearance: Clothing
[Rosenhahn et al. `07]
![Page 51: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/51.jpg)
Modeling appearance: Lighting
[de la Gorce et al. `07]
![Page 52: Model-Based Human Pose Tracking](https://reader033.fdocuments.net/reader033/viewer/2022051318/58667c261a28ab8c408b504d/html5/thumbnails/52.jpg)
Conclusions
We need to get a lot of things right to successfully infer 3D pose and motion from monocular video:
body size and shape
pose and motion
appearance (foreground and background)
lighting and occlusion
image measurement
search and detection
…