Transcript of: Lattice Particle Filters (robotics.stanford.edu/~ormoneit/publications/uai2001-lattice.pdf), 8 pages.

Lattice Particle Filters

Dirk Ormoneit

Computer Science Dept., Stanford University, Stanford, CA 94305

Christiane Lemieux

Mathematics and Statistics Dept., University of Calgary, Alberta, Canada

David J. Fleet

Xerox PARC, 3333 Coyote Hill Rd, Palo Alto, CA 94304

Abstract

A promising approach to approximate inference in state-space models is particle filtering. However, the performance of particle filters often varies significantly due to their stochastic nature. We present a class of algorithms, called lattice particle filters, that circumvent this difficulty by placing the particles deterministically according to a Quasi-Monte Carlo integration rule. We describe a practical realization of this idea, discuss its theoretical properties, and its efficiency. Experimental results with a synthetic 2D tracking problem show that the lattice particle filter is equivalent to a conventional particle filter that has between 10 and 60% more particles, depending on their "sparsity" in the state-space. We also present results on inferring 3D human motion from moving light displays.

1 Introduction

The Particle Filter (PF) has become a popular method for approximate inference in dynamical systems, with applications such as visual tracking [1, 11, 15, 18, 20] and robot localization [8]. The PF approximates a marginal probability distribution over unknown state variables with a weighted particle set, and thereby provides a convenient approach to dealing with multi-modal distributions, and nonlinear dynamics and observation equations [6, 9, 11, 14]. But despite its successes, the PF can be unstable. Statistically speaking, even though the PF produces properly weighted samples, the random variation of predictions based on these samples may be excessive, and therefore the predictions may be unreliable. For visual tracking, this results in poor estimates of object location; in some situations it causes the algorithm to lose track of the object altogether.

This paper proposes the Lattice Particle Filter (LPF) as an alternative, where particles are placed deterministically according to a lattice rule. Lattice rules are a subclass of Quasi-Monte Carlo (QMC) methods, which are used for high-dimensional integration in computer graphics and finance [2, 12, 17]. An important theoretical advantage of QMC methods is that for N samples the error converges at the rate O(N^{-1} log^s N), where s is the state-space dimension, versus O(N^{-1/2}) for conventional Monte Carlo (MC). From a practical viewpoint, randomized QMC methods can be used to construct unbiased estimators that have smaller variance than MC estimators.

After a brief introduction to the PF, we introduce QMC methods and then describe the lattice particle filter. We show quantitative results on two problems, namely, tracking 2D image patterns, and the inference of 3D human pose from a 2D binocular sequence of projected limb positions.

2 Previous Work

In Bayesian filtering we are interested in computing a probability distribution over the unknown state variable x_t at time t, conditioned on the observation history, Y_t ≡ (y_t, …, y_1). This distribution, denoted p(x_t | Y_t), is called the filtering distribution. The PF is a Bayesian filtering method in which p(x_t | Y_t) is approximated by a set of weighted states (or particles). This approximation is updated recursively from one time step to the next as new observations become available. Gordon et al. [9] provide a clear description of the method, which they call the Bootstrap algorithm. In computer vision it is often called the Condensation Algorithm [11]. Other descriptions of the method, along with some interesting extensions, are given by Liu et al. [14], Doucet et al. [6], and Pitt and Shepard [19].

The goal of the particle filter is to approximate the filtering distribution p(x_t | Y_t) with a set of samples. In many applications it is difficult to sample directly from p(x_t | Y_t). Instead, using Bayes' rule and a Markov assumption, one can use the fact that p(x_t | Y_t) is proportional to the product of a likelihood function, p(y_t | x_t), and a prediction distribution that summarizes the information from previous filtering steps, p(x_t | Y_{t-1}). The PF draws random proposals (states) from the prediction distribution and evaluates the likelihood function at each proposal; likelihood values serve as importance weights to account for the discrepancy between the true filtering distribution and the proposal from which states were sampled.

In many applications the evaluation of the likelihood function dominates the computational cost of the algorithm. As a consequence, the number of particles, and usually the dimensionality of the state space, should be kept relatively small. For example, in estimating 3D human motion, it has been useful to design low-dimensional subspaces of the state-space within which to do the filtering [20]. One can also learn better models of the body's dynamics. Of course, low-dimensional representations may not exist for unconstrained motions, in which case the principal way to improve the performance of the PF is the efficient placement of particles, e.g., with importance sampling [6, 19, 15] or with MCMC sampling [3].

This paper examines a Quasi-Monte Carlo (QMC) method as a general way to further reduce the number of samples. Similar approaches have only recently been suggested for filtering [7, 18]; in [18], a non-randomized QMC method is applied locally at each time step, and in [7], a form of stratified sampling having a QMC flavor is used, but it can only be applied in low dimensions. Here we use a randomized QMC method, which allows us to obtain properly weighted samples that can be used to construct unbiased estimators in the same way the PF does; therefore the same kind of statistical analysis as for the PF can be performed. In addition, our choice of QMC point set has the property that the quality of the sampling remains the same throughout time.

3 Bayesian Filtering

Before describing the lattice particle filter, we review the particle filter and the filtering equations. Let x_t denote the unknown state variable at time t, and let y_t denote the image observation at time t. The transition model characterizes the state dynamics; with a first-order Markov model it takes the form p(x_t | x_{t-1}). The observation model specifies the probabilistic relation between the state and the observations, providing a likelihood function of the form p(y_t | x_t). Then, the filtering distribution p(x_t | Y_t) is given by

p(x_t | Y_t) = ∫ … ∫ p(x_t, …, x_0 | Y_t) dx_1 … dx_{t-1}
             ∝ ∫ … ∫ ∏_{s=1}^{t} p(y_s | x_s) p(x_s | x_{s-1}) dx_s.    (1)

The direct evaluation of (1) is difficult in practice because it involves an integral whose dimension grows with t. Hence it is advisable to solve (1) recursively using the prediction and filtering equations:

p(x_t | Y_{t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | Y_{t-1}) dx_{t-1}    (2)

p(x_t | Y_t) ∝ p(y_t | x_t) p(x_t | Y_{t-1}).    (3)

If the transition and observation models are linear with Gaussian noise, then (2) and (3) are also Gaussian and can be updated using the Kalman filter [10]. Otherwise, the representation and the computation of (2) and (3) can be difficult. One approach is to approximate (3) with a weighted set of particles,

S_t = {(x_{t,i}, w_{t,i})}_{i=1}^{N}    (4)

such that S_t approximates p(x_t | Y_t) as well as possible. This idea can be made precise by requiring that S_t be properly weighted [14] in the sense that

∑_{i=1}^{N} w_{t,i} f(x_{t,i})  →  E[f(x_t) | Y_t]  as N → ∞    (5)

for arbitrary integrable functions f.

The PF generates properly weighted samples [14], and it encourages the exploration of paths (x_{1,i}, …, x_{t-1,i}) that appear likely given Y_{t-1}, discarding particles with small weights from (4). The PF is described by:

a(i) ~ multinomial(w_{t-1,1}, …, w_{t-1,N}),    (6)

x_{t,i} := g(u_{t,i}, x_{t-1,a(i)}),    (7)

w_{t,i} := p(y_t | x_{t,i}) / ∑_{k=1}^{N} p(y_t | x_{t,k}).    (8)

The index sequence generated in step (6) specifies the "surviving" particles, x_{t-1,a(i)}, i = 1, …, N. These particles are propagated forward in step (7), and step (8) specifies the weights associated with each new particle. The points u_{t,i} are independent, and g(u_{t,i}, x_{t-1,i}) is a transformation function that maps the uniform vector u_{t,i} and the previous state x_{t-1,i} onto a sample from p(x_t | x_{t-1,i}). Such transformations are a component of any computer simulation, since random variate generation for many continuous distributions is based on transformations of uniform random numbers [5]. In the case of a univariate standard normal transition model, g(u_{t,i}, 0) would be the inverse standard normal distribution function.
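As a sketch, one PF update following (6)-(8) can be written in a few lines of Python. The Gaussian random-walk transition and the `log_likelihood` callback are stand-ins of ours, not the paper's models:

```python
import numpy as np

def pf_step(particles, weights, y, log_likelihood, rng, sigma_d=1.0):
    """One PF update following (6)-(8).  A Gaussian random walk stands in
    for g(u, x); `log_likelihood(y, x)` is the application's observation model."""
    N = len(particles)
    # (6) resample particle indices in proportion to the previous weights
    a = rng.choice(N, size=N, p=weights)
    # (7) propagate the survivors; the uniform->Gaussian transform g is
    # folded into drawing the Gaussian innovation directly
    new_particles = particles[a] + sigma_d * rng.standard_normal(particles.shape)
    # (8) reweight by the likelihood and normalize (in the log domain for stability)
    logw = np.array([log_likelihood(y, x) for x in new_particles])
    w = np.exp(logw - logw.max())
    return new_particles, w / w.sum()
```

Working in the log domain for step (8) avoids underflow when likelihood values are tiny.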

The LPF provides an improved way of executing the filtering steps (6)-(7). The idea is to draw the points u_{t,i} from a lattice rule instead of using random sampling, as we explain in the next section.


4 Lattice Particle Filter

The PF described in Sec. 3 produces properly weighted samples that approximate the filtering distribution (1). However, the PF's performance may vary considerably in practice due to the random nature of the sampling. This is sometimes particularly harmful in the context of filtering because, even though the approximation error due to sampling at each step may be small, errors can accumulate over time in an exponential manner.

To see this, consider a simple example where the transition density p(x_t | x_{t-1}) is uniform on the state-space [0, 1) and independent of x_{t-1}. Let the observations be binary, with P(y_t = 1 | x_t) being 1 if x_t < 0.2 and 0 otherwise. That is, the system states are placed independently on the unit interval, but only those states in [0, 0.2) trigger the response y_t = 1 in the observations. Suppose furthermore that the true state trajectory of x_t happened to evolve entirely within [0, 0.2), so that y_t = 1 for all t, and that we use a PF with N = 10 particles to recover this trajectory. At each time step the number of particles within [0, 0.2) is binomially distributed with parameters 10 and 0.2. Hence the probability of having no particles within [0, 0.2) in at least one of k time steps is 1 − (1 − 0.8^10)^k, which converges to one exponentially with k. In other words, after a sufficiently long time there almost certainly occurs a step at which the PF loses track of the object completely.
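This failure probability is easy to evaluate directly; the small script below (ours, not part of the paper) reproduces the calculation 1 − (1 − 0.8^10)^k:

```python
# Probability that, in at least one of k steps, none of N = 10 i.i.d. uniform
# particles lands in [0, 0.2): the per-step miss probability is 0.8**10.
def prob_lost_by_step(k, N=10, p_hit=0.2):
    miss = (1.0 - p_hit) ** N          # no particle in [0, 0.2) at a single step
    return 1.0 - (1.0 - miss) ** k     # at least one total miss within k steps

for k in (10, 100, 1000):
    print(k, prob_lost_by_step(k))
```

Already for moderate k the probability is close to one, matching the exponential argument above.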

In this example, an alternative strategy would have been to choose only one particle at random at each time, and to place the remaining particles equidistantly around that particle. Then there would always be two particles in [0, 0.2). This deterministic placement of particles according to a low-discrepancy rule is a special case of a QMC method. We next provide some general background on QMC methods and lattice rules, and then describe the LPF.

4.1 Quasi-Monte Carlo Methods

Consider a generic integration problem with respect to the uniform measure over the domain [0, 1)^s. There are several ways to approach this task. MC integration averages the values of the integrand at N random, uniform, independent points in [0, 1)^s. Its error has a convergence rate of O(N^{-1/2}) that is independent of s. In many practical problems, this convergence rate is too slow. A deterministic alternative might be to use a Cartesian product of one-dimensional point sets, as shown in Figure 1 (left) for s = 2. The problem with this approach is that the number of points N needed to preserve a constant level of integration accuracy grows exponentially fast with the dimension s. This is caused by the poor projections of such point sets onto lower-dimensional subspaces; for example, note that in Figure 1 (left), the 256 points project to only 16 points

Figure 1: Left: Rectangle rule. Right: Korobov lattice rule with N = 256 and a = 25. The projection of the points on each axis results in N distinct values in this case.

along each axis to integrate the x_1 and x_2 components of an integrand of the form f(x_1, x_2).

By contrast, QMC methods aim to create point sets whose different projections onto coordinate subspaces always contain N distinct points. An overview of random number generation and QMC methods is given in the book by Niederreiter [16]. An example of a 2D QMC point set, called a Korobov lattice rule, is shown in Figure 1 (right).

A second goal of QMC methods is to create point sets that are as close as possible to a uniform distribution. More precisely, QMC methods are based on so-called low-discrepancy point sets, which give convergence rates of O(N^{-1} log^s N) for the integration error, in contrast to the O(N^{-1/2}) rate of the MC method. In practice, the performance of QMC integration is intimately related to the effective dimension of the integration problem, i.e., the ability to approximate the integrand by a sum of low-dimensional functions [2, 17]. Integrands having a small effective dimension can be integrated accurately by QMC point sets that have good projections over subspaces of low dimension. Superior performance of QMC over MC has been demonstrated in numerous applications [2, 12, 17].

As a special case of QMC methods we consider lattice rules here [13, 21]. By comparison to the Sobol and Niederreiter sequences [16] used in [18], lattice rules are easy to implement (only one parameter, the generator, is required for a given sample size), and they can be built so that their projections onto low-dimensional subspaces are not only well distributed but also of equivalent quality for different subspaces. They can also be randomized to produce unbiased estimators.

To build the LPF, we use a special construction of lattice rule called a shifted Korobov lattice rule, which contains N points U_1, …, U_N given by

U_i = ( (i − 1)/N · (1, a, …, a^{s−1}) + Δ ) mod 1,    (9)


log2 N      4      5      6      7      8
a_l         3      5     11     13     25
a_h         3      5      5     11     75

log2 N      9     10     11     12     13
a_l        55     43    259    307    699
a_h        51    139    519   1081   1289

log2 N     14     15     16     17     18
a_l      2087   7243  11035  27891  18373
a_h      2961   2149  21553  27383   3597

log2 N     19     20     21
a_l     21643 201579 431119
a_h    120079 172565 232501

Table 1: Generators a_l for problems of dimension up to 8, and a_h for problems of dimension up to 32.

for i = 1, …, N, where the modulo 1 is applied component-wise. The integer a ∈ {1, …, N−1} is called the generator of the rule; its choice is crucial for the performance. A table of suitable values for a, depending on the sample size N and the state dimension, is given in Table 1 (see [13] for more details). The vector Δ is a shift that is uniformly distributed over [0, 1)^s, which implies that the points U_i are also uniform, but dependent. This shift is important as a means of obtaining error estimates via multiple simulations [4], and it is necessary to guarantee that the resulting approximations are unbiased.
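As an illustration, the shifted Korobov rule (9) can be generated as follows; the function name and the use of a^j mod N (equivalent modulo 1) are our choices:

```python
import numpy as np

def shifted_korobov(N, s, a, shift=None, rng=None):
    """Shifted Korobov lattice rule (9):
    U_i = ((i-1)/N * (1, a, ..., a^{s-1}) + Delta) mod 1, for i = 1, ..., N."""
    if rng is None:
        rng = np.random.default_rng()
    if shift is None:
        shift = rng.random(s)                      # Delta ~ uniform on [0, 1)^s
    # Reducing a^j mod N first is equivalent modulo 1 and avoids huge integers.
    g = np.array([pow(a, j, N) for j in range(s)], dtype=float)
    i = np.arange(N).reshape(-1, 1)                # i - 1 = 0, ..., N-1
    return (i * g / N + shift) % 1.0

# e.g. the 2D Korobov rule of Figure 1 (right): N = 256, a = 25
pts = shifted_korobov(256, 2, 25, shift=np.zeros(2))
```

With gcd(a, N) = 1, each coordinate of the point set projects onto N distinct values, which is the property illustrated in Figure 1 (right).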

4.2 LPF Algorithm

The LPF algorithm propagates N particles following the rules (6), (7), and (8). However, the uniform numbers u_{t,i} in the forward propagation (7) become components of the (shifted) lattice rule (9).

More precisely, we start with a shifted lattice rule P_N = {U_i, i = 1, …, N} in nT dimensions, where n is the dimension of the state at a single time, and T is the number of time steps. We decompose P_N into components of the form u_{t,1}, u_{t,2}, …, u_{t,N} for each time t. Note that, due to (9), these components can be computed recursively for t = 1, …, T, without storing P_N explicitly. Next, we use u_{t,1}, u_{t,2}, …, u_{t,N} for forward propagation in step (7). An important issue is the assignment of a particular uniform number u_{t,j} to the uniform variables u_{t,i} in (7). There is a danger of assigning components to particles in a way that depends on the outcome of the resampling, hence introducing a serious bias. Our approach is to generate at each time t a uniform and random permutation π_t = (ψ_{t,1}, …, ψ_{t,N}) of the integers [1 … N], and then assign the point u_{t,ψ_{t,i}} to particle i. This implies that the trajectory x_{1,i}, …, x_{t,i} of the i-th particle from time 1 to t is generated by the point

Ũ_{t,i} = (u_{1,ψ_{1,i}}, u_{2,ψ_{2,i}}, …, u_{t,ψ_{t,i}}).    (10)

Note that the point set {Ũ_{t,1}, …, Ũ_{t,N}} does not form a lattice rule, which would be necessary to obtain the O(N^{-1} log^s N) convergence rate globally. However, the point set used at each time t does form a lattice rule. In addition, it can be shown [13] that when gcd(a, N) = 1, the point sets used at each time step differ in the implementation only because of the random shift. Hence the quality of the sampling remains the same throughout time, which is generally not the case for other types of QMC point sets, such as those used in [18].

In summary, the LPF implementation differs from the PF only in the points u_{t,i} used to generate the state x_{t,i}. Instead of calling a pseudorandom number generator n times to define u_{t,i}, the LPF generates one random shift Δ, one random permutation π_t = (ψ_{t,1}, …, ψ_{t,N}) of the integers from 1 to N, and then uses the point

u_{t,i} = ( (ψ_{t,i} − 1)/N · (1, a, …, a^{n−1}) + Δ ) mod 1

to generate the state x_{t,i} of the i-th particle. More details and pseudo-code are provided in a technical report accompanying this paper, and we will make a version of the C code available on the internet.
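A minimal sketch of one LPF time step, combining (6)-(8) with the permuted lattice point above. The Gaussian random-walk transition via the inverse normal CDF and all helper names are our assumptions, not the authors' C implementation:

```python
import numpy as np
from statistics import NormalDist

def lpf_step(particles, weights, y, log_likelihood, a, shift, rng, sigma_d=1.0):
    """One LPF update: as the PF steps (6)-(8), but the uniforms in (7) come
    from a randomly permuted, shifted Korobov lattice (n = state dimension)."""
    N, n = particles.shape
    # (6) resample exactly as in the PF
    idx = rng.choice(N, size=N, p=weights)
    survivors = particles[idx]
    # lattice point for particle i: ((psi_i - 1)/N * (1, a, ..., a^{n-1}) + shift) mod 1
    psi = rng.permutation(N) + 1                   # random assignment to particles
    g = np.array([pow(a, j, N) for j in range(n)], dtype=float)
    u = ((psi - 1).reshape(-1, 1) * g / N + shift) % 1.0
    # (7) transform uniforms into Gaussian increments via the inverse normal CDF
    eps = np.clip(u, 1e-12, 1 - 1e-12)             # keep inv_cdf's argument in (0, 1)
    z = np.vectorize(NormalDist().inv_cdf)(eps)
    new_particles = survivors + sigma_d * z
    # (8) reweight by the likelihood and normalize
    logw = np.array([log_likelihood(y, x) for x in new_particles])
    w = np.exp(logw - logw.max())
    return new_particles, w / w.sum()
```

Note how the only change relative to the plain PF step is the source of the uniforms; resampling and reweighting are untouched.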

5 Theoretical Aspects

In this section we discuss some theoretical results and outline differences and similarities between the LPF and the ordinary PF. In terms of search methods, one can view the ordinary PF as an informed, random search where different nodes of a search tree are expanded or truncated at different time steps. Conversely, the LPF corresponds to an (almost) deterministic search algorithm. The observation model is used as a heuristic function to determine which "node" (i.e., branch of the filtering tree) is expanded next, and the transition model specifies how to carry out this expansion. In the LPF the expansion is designed to search the state space as evenly as possible.

From a mathematical perspective, we can show that the samples generated by the LPF are properly weighted in the sense of (5). This is suggested by the fact that the updating of the weights w_{t,i} and the sampling with replacement in the LPF are done exactly as in the PF. The only difference is that the particles x_{t,1}, …, x_{t,N} are not independent. We formalize this intuition in the following theorem, which covers the LPF as a special case:

Theorem 1. If each particle x_{t,i} is generated by a point u_{t,i} that has a uniform distribution over [0, 1)^n and such that u_{t,i} and u_{s,j} are independent for 1 ≤ s < t ≤ T, 1 ≤ i, j ≤ N, and if the resampling step at time t is done by sampling with replacement using weights w_{t,i} proportional to p(y_t | x_{t,i}), then S_t = {(x_{t,i}, w_{t,i})}_{i=1}^{N} is a properly weighted sample, for t = 1, …, T.


The proof is omitted due to space limitations; it proceeds by showing that each x_{t,i} follows the distribution p(x_t | Y_{t-1}) as in (2), and then that E(h(x_{t,i}) p(y_t | x_{t,i})) = E(p(y_t | x_{t,i})) E(h(x_t)); from there, applying the ergodic theorem yields the result. Hence, at least asymptotically, the LPF produces a correct approximation of the filtering distribution (3), and furthermore, the quantity ∑_{i=1}^{N} w_{t,i} f(x_{t,i}) / N is an unbiased estimator of E(f(x_t) | Y_t). Note that the random shift in (9) is essential to this result and is a principal difference between the LPF and previous work on QMC-based filtering where samples are generated deterministically, e.g. [18].

6 Experiments

To test the LPF we compare it to a conventional PF using residual sampling [14]. Both filters had the same observations, the same temporal dynamics, and the same likelihood function. Our goal is to compare how well the different filters approximate the filtering distribution (3). The quality of the approximation is measured by computing the error covariance in the estimation of the mean of the filtering distribution across many runs of the filters. By applying the filters with varying numbers of particles we can also analyze the relationship between estimation errors and the computation time required by both algorithms. Computational requirements of both algorithms scale similarly with the number of particles.

6.1 Disk Tracking

The first experiment involves a circular disk undergoing a random walk. The disk position at time t, x_t ∈ ℝ², is given by

x_t = x_{t-1} + η_t,    (11)

where η_t is 2D i.i.d. zero-mean Gaussian noise; i.e., η_t ~ N(0, σ_x² I_2), where I_2 is the 2×2 identity matrix. The standard deviation was σ_x = 3 pixels.

The effective scene model m(s, t) is 1 if pixel s is within a disk of radius 16 pixels, centered at x_t:

m(s, t) ≡ 1 for ||s − x_t|| ≤ 16, and 0 otherwise.    (12)

The image observation is obtained by adding i.i.d. zero-mean Gaussian noise η_{t,s} ~ N(0, σ_η²), with σ_η = 0.25, to m(s, t):

I(s, t) ≡ m(s, t) + η_{t,s}.    (13)
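The generative model (11)-(13) can be simulated directly; the sketch below is ours, and the 64x64 image size is an assumption the text leaves unspecified:

```python
import numpy as np

def simulate_disk_sequence(T=40, size=64, radius=16, sigma_x=3.0,
                           sigma_eta=0.25, rng=None):
    """Simulate the disk model: random-walk center (11), binary disk
    template (12), noisy image (13).  Image size is our choice."""
    if rng is None:
        rng = np.random.default_rng()
    ys, xs = np.mgrid[0:size, 0:size]
    x = np.array([size / 2.0, size / 2.0])       # x_0: start at the image center
    frames, centers = [], []
    for _ in range(T):
        x = x + rng.normal(0.0, sigma_x, size=2)                            # (11)
        m = ((xs - x[0]) ** 2 + (ys - x[1]) ** 2 <= radius ** 2).astype(float)  # (12)
        frames.append(m + rng.normal(0.0, sigma_eta, size=m.shape))         # (13)
        centers.append(x.copy())
    return np.array(frames), np.array(centers)
```

The returned `centers` give the ground-truth trajectory against which a filter's estimates can be scored.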

The transition density used is

p(x_t | x_{t-1}) ≡ N(x_t − x_{t-1}; σ_d² I_2).    (14)

We misspecified the model (14) by setting σ_d = 5, which is about 60% larger than the standard deviation

Figure 2: Root mean-squared errors (RMSE) in the estimated disk location for a PF with residual sampling and for the LPF. (TOP) RMSE with confidence levels based on 64 samples. (BOTTOM) RMSE for PFs with different sample sizes; with 50% more samples, the PF produces errors similar in magnitude to the LPF.

in the "true" transition model (11). This is because such model parameters are typically unknown in practice, and hence a relatively large value must be chosen to search a sufficiently large neighborhood of x_t.

We created 1000 image sequences, each with 40 frames, by simulating (11), (12), and (13). We applied the PF and the LPF to each sequence, initialized by setting x_0 to the true disk location at time 0. The disk location was estimated by taking the weighted mean of the particles, according to (5). The error at each time step is the Euclidean distance between the estimated and the true disk location. As a summary statistic, we compute the root mean-squared prediction error (RMSE) over the 1000 trials at each time step.
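The summary statistic can be computed as follows (our sketch; the array layout is an assumption):

```python
import numpy as np

def rmse_per_step(estimates, truths):
    """RMSE over trials at each time step.  `estimates` and `truths` are
    arrays of shape (trials, T, 2) holding estimated / true disk centers."""
    err2 = np.sum((estimates - truths) ** 2, axis=-1)   # squared Euclidean error
    return np.sqrt(err2.mean(axis=0))                   # average over trials, per step
```

The result is a length-T curve of the kind plotted in Figure 2.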

The RMSE results of an experiment with 64 particles are shown in Figure 2 as a function of time. The circles represent the RMSE of the LPF, and the other curves represent the performance of the ordinary PF for different numbers of particles. Figure 2 (top) compares the LPF to the PF with identical numbers of

particles, showing RMSE and standard error bars that provide a measure of our confidence in this expected deviation from the true disk location. The LPF produces location errors that are approximately 20% smaller than those of the PF with the same number of samples. This supports our hypothesis that the deterministic placement of particles improves the average performance of the estimate. In addition, although difficult to discern from this plot, the standard error bars for the LPF are also smaller.

# particles        16   32   64   128  256  512
Var. difference    20%  21%  19%  11%  11%  10%
Efficiency gain    50%  60%  50%  15%  20%  20%

Table 2: Experimental results in the disk tracking task.

Figure 2 (bottom) shows the same RMSE results for the LPF, but compared against RMSE results for PFs with different numbers of particles. This allows one to assess the computational gains due to the LPF. One can see that the number of particles, and hence the computation time, of the PF must be increased by approximately 50% to lower the expected magnitude of the errors to that obtained with the LPF. Even with 50% more particles, the performance on each individual trial may still be worse for the PF due to the larger error bars.

These results hold over a wide range of particle counts. Figure 3 shows further experiments with 16 to 512 particles. The experimental setup is identical to that in Figure 2. Note that the average prediction error of both methods, as well as the variability of the predictions, decreases with an increasing number of particles. A summary of the results from the complete set of experiments is given in Table 2.

Note that the LPF consistently outperforms the PF by a margin of at least 10%. The performance improvement is less pronounced in the cases where we have relatively many particles (128, 256, and 512). This is to be expected, because an optimized particle placement is particularly relevant when there are relatively few particles. Regarding the percentage of additional particles needed by the PF to match the performance of the LPF, reported in the third row of the table, the maximum difference of 60% occurs at a sample size of 32 particles for the LPF. Again, this difference becomes less pronounced as the overall number of particles increases.

6.2 Human Motion Tracking

In our second set of experiments, we apply both the LPF and the PF using residual sampling to a human motion tracking problem, where the task is to recover the lower portion of a human body from 2D projections. Rather than using camera images as observations, we used 2D projections of body markers (on the joints) obtained from a commercial motion capture system. This filtering problem is challenging because the lower body (legs and hips) is described in a 10D space. Tracking occurs directly in this space, without first applying algorithms for dimensionality reduction

Figure 3: Root mean-squared errors (RMSE) in the estimated disk location for PFs with residual sampling and for the LPF. The number of samples for the different filters is shown in the legends in each plot.

like principal component analysis [20].

The image observations are labeled 2D positions corresponding to markers on a human subject, the 3D locations of which were found with a commercial motion capture system. The 6 markers used for tracking the lower body are shown in Fig. 4. The observation model involves the perspective projection of the 3D points onto the image plane plus additive Gaussian noise. Given a camera center at (X_c, Y_c, Z_c), with the optical axis parallel to the Y-axis, a marker point (X_m, Y_m, Z_m) produces the observation

d = ( (X_m − X_c)/(Y_m − Y_c) , (Z_m − Z_c)/(Y_m − Y_c) ) + η,    (15)

where η is isotropic 2D zero-mean Gaussian noise with variance σ². In the experiments below we used σ = 0.002, which is approximately 2% of the length of an upright spine projected into the image plane.
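The observation model (15) translates directly into code; the function name is ours:

```python
import numpy as np

def project_marker(m, c, sigma=0.002, rng=None):
    """Observation model (15): perspective projection of marker m = (Xm, Ym, Zm)
    onto the image plane of a camera at c = (Xc, Yc, Zc) whose optical axis is
    parallel to the Y-axis, plus isotropic zero-mean Gaussian noise."""
    if rng is None:
        rng = np.random.default_rng()
    depth = m[1] - c[1]                                   # Ym - Yc
    d = np.array([(m[0] - c[0]) / depth, (m[2] - c[2]) / depth])
    return d + rng.normal(0.0, sigma, size=2)
```

Setting `sigma=0.0` gives the noiseless projection, which is useful for generating the predicted observations d̂_j(s).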

The body is modeled as an articulated model. The state vector for the lower body has 10 angular degrees of freedom (DOF): 2 for the pelvis, 3 for each hip, and 1 for each knee. Given a state, s, and the known spine location, one can traverse the kinematic tree, from the spine to the pelvis and then to each leg, to generate the 3D marker locations. These 3D locations are then projected into the image to generate predicted observations from the model. Let d̂_j(s) be the 2D location for the j-th marker that is predicted by


Figure 4: Locations of 6 markers on the lower body (left/right hip, left/right knee, left/right heel).

state s. Then the likelihood function for state s and observations D = {d_j}_{j=1}^{6} is

p(D | s) = ( 1/(2πσ²) )^6 exp( − ∑_{j=1}^{6} ||d_j − d̂_j(s)||² / (2σ²) ),    (16)

where σ² is the variance of the image noise.
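For numerical stability one would evaluate (16) in the log domain; the following sketch (ours) does so:

```python
import numpy as np

def marker_log_likelihood(observed, predicted, sigma):
    """Log of the likelihood (16): `observed` holds the 2D marker observations
    d_j and `predicted` the model predictions d_hat_j(s), both as (6, 2) arrays."""
    sq = np.sum((observed - predicted) ** 2)              # sum of squared 2D errors
    return -6.0 * np.log(2.0 * np.pi * sigma ** 2) - sq / (2.0 * sigma ** 2)
```

Working with log-likelihoods avoids underflow when σ is as small as 0.002, and log values can be exponentiated after subtracting the maximum, as in the weight update (8).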

The experiments simulate a binocular observer that views the person from approximately 2.5 m, with 6 cm between the two eyes. Assuming independent noise in the two image views, the joint likelihood is the product of the individual likelihoods. For a temporal prior over these state variables, we make the simplistic assumption that the state changes slowly over time. In particular, the temporal prior for state s_t given s_{t-1} is an isotropic, mean-zero Gaussian with variance σ_a². Here, we fix σ_a = 0.1 rad (almost 6 deg).

Note that we do not model limits on how far joints can rotate in practice. The tracker may therefore draw high-likelihood proposals that are anatomically impossible but are consistent with the 2D observations. Also, limb lengths are determined from the dimensions of the human subject before tracking begins and are then fixed during tracking. This is a source of error, as real joints are more complicated than our model, causing some parts, such as the pelvis, to vary in width over time.

Comparing the performance of the LPF and the PF is more difficult in this case. In particular, the mean of the filtering distribution is not always equal to the true state. As the filters are attempting to approximate the posterior distribution, an appropriate measure of performance is the difference between the true posterior mean state and the mean states computed using the approximations provided by the LPF and the PF.

To extract the mean of the true posterior, we ran the PF on the input sequences with 10³ times more samples than we used in the experiments. We then took the mean to be the sample mean from this large run. We then ran the LPF and the PF 200 times on the same input,

with di�erent random seeds on every run. The meanson each run were computed according to (5).

Figure 5 summarizes the results for individual state variables of the left leg, namely, two angles from the left hip and one for the left knee. Notice from Fig. 5 (top) that the average of the mean estimates obtained over the 200 trials is very close to the posterior mean in both cases. This is not surprising, since both the LPF and the PF produce unbiased estimates. The error bars in Fig. 5 (top) show the standard deviation of the ensemble of means over the 200 runs, which is where we expect the LPF to show smaller variability of the estimates about the true posterior mean. As in the disk tracker results, the standard deviation for the LPF is usually 5%-20% below that for the particle filter, with similar gains in computational efficiency.

Figure 5 (bottom) shows the expected (absolute) difference between the true posterior mean and the mean estimates obtained with the LPF and the PF. The error bars are larger than in Fig. 2 because we have only 200 instead of 1000 trials. The lower errors with the lattice method are clear nonetheless. Finally, note that although the variance reduction in individual states is relatively modest, reducing the variance in each of 10 dimensions can significantly reduce the volume of the 10D search space that must be explored by the filter at each time step.
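The closing point can be made concrete with a line of arithmetic: a modest per-dimension reduction in spread compounds multiplicatively across the state dimensions. The 10% figure below is purely illustrative, not a number from the experiments.

```python
# If each of the 10 state dimensions has its spread cut by about 10%,
# the effect compounds multiplicatively across the 10D search volume:
per_dim_reduction = 0.10
volume_ratio = (1.0 - per_dim_reduction) ** 10
print(f"remaining search volume: {volume_ratio:.3f}")  # prints 0.349
```

So even a per-dimension gain that looks small in Fig. 5 shrinks the effective 10D search volume to roughly a third.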

7 Conclusions

We presented an algorithm, the lattice particle filter, to improve the reliability of particle filters in visual tracking. Specifically, the method places the particles deterministically according to a randomly shifted lattice rule, which reduces the variance of the particle-based approximation. Experiments demonstrate that the practical improvement from using the LPF amounts to savings of between 20% and 60% in the number of particles. The size of this effect depends on the "sparsity" of the particles in the state space, being more pronounced when there are relatively few particles, as is typical of real applications. In our experiments involving a 10D human motion tracker, the lattice particle filter also led to similar reductions in variance.
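For reference, the point placement the conclusion describes can be sketched as a rank-1 Korobov lattice with a Cranley-Patterson random shift [4, 13]. This is an illustrative sketch only: the parameters n=1021, a=76 below are assumed to be of the kind tabulated in [13], and in the LPF the resulting uniform points would still be mapped through the inverse CDF of the proposal distribution [5].

```python
import numpy as np

def korobov_lattice(n, s, a):
    """Rank-1 Korobov lattice: the n points {(i/n) * (1, a, a^2, ...) mod 1}."""
    g = np.array([pow(a, j, n) for j in range(s)])   # generating vector mod n
    return (np.arange(n)[:, None] * g % n) / n       # n points in [0, 1)^s

def shifted_lattice(n, s, a, rng):
    """Cranley-Patterson randomization: add one uniform shift to every point,
    modulo 1, so the estimator stays unbiased while keeping lattice structure."""
    return (korobov_lattice(n, s, a) + rng.random(s)) % 1.0

# 1021 deterministically placed points in the unit square, randomly shifted.
pts = shifted_lattice(n=1021, s=2, a=76, rng=np.random.default_rng(0))
```

Because a single uniform shift is shared by all points, the randomized set remains a shifted copy of one lattice, which is what gives the variance reduction over i.i.d. sampling.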

For future work, we plan alternative versions of the LPF that are based on a global lattice rule in nT dimensions. We believe that the global lattice property will be central to obtaining exact convergence rates and to optimizing the practical performance of the LPF. However, it involves the creation of particles satisfying a global lattice constraint. We will also explore alternative resampling methods that could, e.g., depend on the variability of the likelihoods obtained at each time step, such as those in [6, 14].


Figure 5: Summary of 200 trials of PF and LPF tracking of the lower body. (Top) Individual state variables for the left leg, for which the LPF means are generally closer to the posterior mean and have significantly less variance. (Bottom) Expected difference between the posterior mean and the means computed by the PF and the LPF.

References

[1] M.J. Black and D.J. Fleet. Probabilistic detection and tracking of motion discontinuities. Int. J. Comp. Vis., 38:229-243, 2000.

[2] R. Caflisch, W. Morokoff, and A. Owen. Valuation of mortgage-backed securities using Brownian bridges to reduce effective dimension. J. Comp. Fin., 1:27-46, 1997.

[3] K. Choo and D.J. Fleet. People tracking with Hybrid Monte Carlo. Proc. IEEE Int. Conf. Comp. Vis., Vancouver, July 2001 (to appear).

[4] R. Cranley and T. N. L. Patterson. Randomization of number theoretic methods for multiple integration. SIAM J. Num. Anal., 13:904-914, 1976.

[5] L. Devroye. Non-Uniform Random Variate Generation. Springer-Verlag, New York, 1986.

[6] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer-Verlag, Berlin, 2001.

[7] P. Fearnhead. Sequential Monte Carlo Methods in Filter Theory. PhD thesis, University of Oxford, 1998.

[8] D. Fox, W. Burgard, F. Dellaert, and S. Thrun. Monte Carlo localization: Efficient position estimation for mobile robots. Proc. AAAI, pp. 343-349, 1999.

[9] N.J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. Radar, Sonar and Navig., 140:107-113, 1993.

[10] J. D. Hamilton. Time Series Analysis. Princeton University Press, 1994.

[11] M. Isard and A. Blake. Condensation: conditional density propagation for visual tracking. Int. J. Comp. Vis., 29:2-28, 1998.

[12] A. Keller. Quasi-Monte Carlo methods in computer graphics: The global illumination problem. Lectures in Appl. Math., 32:455-469, 1996.

[13] P. L'Ecuyer and C. Lemieux. Variance reduction via lattice rules. Management Sci., 46:1214-1235, 2000.

[14] J.S. Liu and R. Chen. Sequential Monte Carlo methods for dynamic systems. JASA, 93:1032-1044, 1998.

[15] J. MacCormick and M. Isard. Partitioned sampling, articulated objects, and interface-quality hand tracking. Proc. Euro. Conf. Comp. Vis., Vol. II, pp. 3-19, Dublin, 2000.

[16] H. Niederreiter. Random Number Generation and Quasi-Monte Carlo Methods, Vol. 63 of SIAM CBMS-NSF Regional Conf. Series in Appl. Math. SIAM, Philadelphia, 1992.

[17] S. Paskov and J. Traub. Faster valuation of financial derivatives. J. Portfolio Management, 22:113-120, 1995.

[18] V. Philomin, R. Duraiswami, and L. Davis. Quasi-random sampling for condensation. Proc. Euro. Conf. Comp. Vis., Vol. II, pp. 134-149, Dublin, 2000.

[19] M. K. Pitt and N. Shephard. Filtering via simulation: Auxiliary particle filters. JASA, 94:590-599, 1999.

[20] H. Sidenbladh, M. Black, and D. Fleet. Stochastic tracking of 3D human figures using 2D image motion. Proc. Euro. Conf. Comp. Vis., Vol. II, pp. 702-718, Dublin, 2000.

[21] I. H. Sloan and S. Joe. Lattice Methods for Multiple Integration. Clarendon Press, Oxford, 1994.