Grid Based Sequential Inference - Otago · 2014. 11. 27. · University of Otago Dunedin, New...



  • 21st Electronics New Zealand Conference (ENZCon) University of Waikato, Hamilton, 20-21 November, 2014

    Oral Session 3 (Measuring Things)

    Grid Based Sequential Inference

    Malcolm Morrison, Colin Fox

    Electronics Group, Physics Department

    University of Otago

    Dunedin, New Zealand

    Email: [email protected]

    Abstract: Sequential inference and filtering methods often make approximations and simplifications to the system dynamics in order to be as efficient as possible. This limits accuracy and can even give wrong results when non-linearities dominate the dynamics of the system. In this paper we acknowledge that computers have vastly improved since methods such as the Unscented Kalman Filter (UKF) were devised and create a new filtering method that takes full advantage of this fact. This is done by discretizing the system of interest and evolving the resultant continuity equation with a PDE solver, while updating the state with observations by directly applying Bayes' Theorem. We use a simple finite volume method on a square grid-discretized pendulum and show through comparison that this method is superior to the UKF, provided computational time is not an issue.

    I. INTRODUCTION

    Sequential inference methods have been dominated by the Kalman filter [1] and, from the late 1990s on, the unscented Kalman filter (UKF) [2]. These filters have been used to great effect in prediction and guidance systems, such as earthquake early warning [3] and vision tracking [4] systems. However, the Kalman filter only works well for approximately linear systems with zero-mean Gaussian noise [5]. While the UKF represents dynamics and probability distributions of up to second order perfectly [2], it too breaks down in situations where the higher order dynamics dominate, as will be shown here.

    In this paper we propose an alternative that, while lacking the elegance and speed of the UKF, has in theory no limit to its accuracy. This is done by looking directly at the properties of probability to construct a partial differential equation (PDE), which is then solved directly using a grid based method. Hence the accuracy is entirely dependent on the solver that is used and can be increased by devoting more computational power to it. This method is well suited to problems that have no time constraints and whose dynamics are highly non-linear.

    For the purposes of this paper we use a very simple PDE solver as an example of how this may be done and then compare it with the UKF. We focus on the dynamics of a pendulum, as this ranges from approximately linear, where the small angle approximation holds, to highly non-linear when completely vertical.

    II. THE METHOD

    We perform inference on a system by evolving the probability in phase-space directly, through solving the continuity equation:

    $$\frac{\partial \rho}{\partial t} = -\nabla \cdot (\rho f) \tag{1}$$

    for some quantity ρ on a velocity field f. This equation is often used in areas such as fluid dynamics, or any field that deals with the transport of some quantity over space. In layman's terms it says that if something enters or leaves a volume, then the amount of stuff left must change proportionately. Here ρ is the probability density function, and this is the equation that we shall solve to evolve a probability distribution in time.

    Using a finite volume method to solve this is a sensible choice, as the method is derived from this exact equation. Here are the details and approximations of our solver:

    • Our space will be divided up into a square grid of cells of width Δx, ranging from −π to π.

    • This discretization must be fine enough that the values of ρ and f are approximately constant in a cell and across a boundary, respectively.

    • We will use the method of up-streaming to calculate the flux at the boundary between cell (i, j) and (i + 1, j). That is:

    $$\rho\,|_{x_{i+1/2}} = \frac{P_{i,j}}{\Delta x^2} \quad \text{if } f \cdot n \ge 0, \tag{2}$$

    $$\rho\,|_{x_{i+1/2}} = \frac{P_{i+1,j}}{\Delta x^2} \quad \text{if } f \cdot n < 0, \tag{3}$$

    where $P_{i,j}$ is the total probability in cell (i, j), and the notation $x_{i+1/2}$ refers to the boundary between cell (i, j) and (i + 1, j) with associated outward pointing normal vector n.

    • We will be using a simple Euler step for the partial derivative in time:

    $$\partial_t P_{i,j} \approx \frac{P^{k+1}_{i,j} - P^{k}_{i,j}}{\Delta t} \tag{4}$$

    with time step Δt.

    • The time step size is going to be limited by:

    $$\Delta t \le \frac{\Delta x}{2N \max(|f \cdot n|)} \tag{5}$$

    in order to preserve positivity, where n is the outward pointing normal vector of a cell and N is the number of cells neighbouring a single cell, 4 in our case.


    This all leads to the equation that takes each cell from time k to k + 1:

    $$P^{k+1}_{i,j} = P^{k}_{i,j} - \Delta t\,(F_{v-1/2} + F_{v+1/2} + F_{x-1/2} + F_{x+1/2}) \tag{6}$$

    where the F's refer to the outward fluxes through each boundary and the subscripts x and v refer to the space and velocity directions respectively.

    If we create a vector that contains all the $P_{i,j}$'s, this can then be summed up in the matrix equation:

    $$P^{k+1} = (I - \Delta t A) P^{k}, \tag{7}$$

    where I is the identity and A is a matrix that contains the up-streaming information plus the associated velocities for each cell.
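
    As a rough illustration of equations (2) to (7), the sketch below performs one up-streamed finite volume step for a pendulum-like field f = (v, −sin x). It is a minimal NumPy sketch and not the authors' solver: the function names (`make_grid`, `time_step_limit`, `upwind_step`), the periodic boundary in x and the closed (zero-flux) boundary in v are all assumptions made for the example. The matrix A of equation (7) is never formed; the same update is applied face by face.

```python
import numpy as np

def make_grid(n):
    """Cell centres and cell width for n cells spanning [-pi, pi)."""
    dx = 2.0 * np.pi / n
    centres = -np.pi + dx * (np.arange(n) + 0.5)
    return centres, dx

def time_step_limit(x, v, dx, n_neighbours=4):
    """Largest time step allowed by equation (5) for f = (v, -sin x)."""
    max_speed = max(np.max(np.abs(v)), np.max(np.abs(np.sin(x))))
    return dx / (2.0 * n_neighbours * max_speed)

def upwind_step(P, x, v, dx, dt):
    """One Euler step of equation (6); P[i, j] is the probability in cell (x_i, v_j)."""
    # Density is P / dx^2 and a face has length dx, so the rate of
    # probability crossing a face is (normal velocity) * P_upwind / dx.

    # Faces in the x direction: the normal velocity is v_j.
    u = v[np.newaxis, :]
    P_next_x = np.roll(P, -1, axis=0)          # P[i+1, j]; periodic in x (assumption)
    Fx = np.where(u >= 0, u * P, u * P_next_x) / dx   # through face i+1/2

    # Faces in the v direction: the normal velocity is -sin(x_i).
    a = -np.sin(x)[:, np.newaxis]
    Fv = np.zeros_like(P)                      # Fv[:, j] is face j+1/2; outer faces closed (assumption)
    Fv[:, :-1] = np.where(a >= 0, a * P[:, :-1], a * P[:, 1:]) / dx

    # Each face flux leaves one cell and enters its neighbour.
    P_new = P - dt * (Fx - np.roll(Fx, 1, axis=0))
    P_new -= dt * Fv
    P_new[:, 1:] += dt * Fv[:, :-1]
    return P_new
```

    Because every face flux is subtracted from one cell and added to its neighbour, this step conserves total probability, and with the time step returned by `time_step_limit` no cell can lose more than it holds.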

    We can now use Bayes' Theorem [5] to update our probability from an observation through direct element-wise multiplication of the two vectors:

    $$P_{\text{new}} \propto P_{\text{observation}} \cdot P_{\text{old}} \tag{8}$$

    The two sides differ only by a normalization term.
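
    The update in equation (8) can be sketched on the grid as follows. This is only an illustration under stated assumptions: the observation z is taken to be a direct, noisy measurement of (x, v) with isotropic Gaussian noise of variance `var_z`, the angular difference is wrapped into [−π, π), and the function name `bayes_update` is hypothetical.

```python
import numpy as np

def bayes_update(P_old, x, v, z, var_z):
    """Multiply the prior by a Gaussian likelihood on the grid and renormalise."""
    # Wrap the angular difference into [-pi, pi) (assumption: x is an angle).
    dxz = (z[0] - x + np.pi) % (2.0 * np.pi) - np.pi
    dvz = z[1] - v
    likelihood = np.exp(
        -0.5 * (dxz[:, np.newaxis] ** 2 + dvz[np.newaxis, :] ** 2) / var_z
    )
    P_new = likelihood * P_old      # equation (8), cell by cell
    return P_new / P_new.sum()      # the normalization term
```

    With a flat prior the posterior is simply the normalised likelihood, so its peak sits in the cell nearest to the observation.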

    A. Kullback-Leibler Divergence

    In order to have a measure of how close we are to the true solution we use the Kullback-Leibler divergence [6]. It is a way of comparing how different one probability distribution is from another. The divergence of the discrete probability distribution Q from P is:

    $$D_{KL}(P \,\|\, Q) = \sum_i \log\left(\frac{P(i)}{Q(i)}\right) P(i). \tag{9}$$

    Note that this is not symmetric. Again, this is the divergence of Q from P and not the other way around.

    This only works for cases where Q(i) = 0 ⟹ P(i) = 0. Terms with P(i) = 0 contribute nothing, as $\lim_{x \to 0} x \log(x) = 0$; hence we avoid infinities. Also note that both probabilities must have the same discretization.
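
    A minimal sketch of the discrete divergence in equation (9) is given below. The function name `kl_divergence` is an assumption made for the example; cells with P(i) = 0 are skipped, and the function returns infinity when Q(i) = 0 in a cell where P(i) > 0.

```python
import numpy as np

def kl_divergence(P, Q):
    """Divergence of Q from P, equation (9), for arrays on the same discretization."""
    P = np.asarray(P, dtype=float).ravel()
    Q = np.asarray(Q, dtype=float).ravel()
    if P.shape != Q.shape:
        raise ValueError("P and Q must have the same discretization")
    support = P > 0                      # cells with P(i) = 0 contribute nothing
    if np.any(Q[support] == 0):
        return np.inf                    # Q(i) = 0 while P(i) > 0
    return float(np.sum(P[support] * np.log(P[support] / Q[support])))
```

    Swapping the arguments generally gives a different number, which is the asymmetry noted above.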

    In the cases where the analytical solution exists, we shall calculate the divergence of our numerical solution from the analytical one. For the majority of cases, where this does not exist, we will use the fact that as our discretization becomes finer and finer, the numerical solution converges to the true solution. If we take our “true” solution, P, to be another simulation run with a finer grid for the same time, then we should get an estimate of how close we are to the real true solution.

    For the purposes of this paper we shall be comparing a simulation with one where each cell is divided into 16 smaller ones. For later analysis we shall refer to the “fineness” of a grid by the number of cells per dimension, n. For the most part we will be comparing simulations of n cells per dimension with one of 4n.

    We also need the 4n case to have the same discretization as the n case. So for each cell in the n case we find the cells of the 4n case that it contains, then sum them before calculating the divergence. This then gives:

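    $$D_{KL}(P_n \,\|\, P_{4n}) = \sum_{i}^{n^2} \log\left(\frac{P_n(i)}{\sum_{j}^{16} P_{4n}(j)}\right) P_n(i), \tag{10}$$

    where the subscript 4n refers to the 4× finer grid, and the inner sum runs over the 16 fine cells contained in coarse cell i.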
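
    The cell matching in equation (10) can be sketched with a block sum. This assumes both grids cover the same square domain and are aligned, so that coarse cell (i, j) contains exactly the fine cells with indices 4i to 4i + 3 and 4j to 4j + 3; the function names `coarsen` and `kl_against_finer_grid` are hypothetical.

```python
import numpy as np

def coarsen(P_fine, factor=4):
    """Sum each factor-by-factor block of fine cells into one coarse cell."""
    n = P_fine.shape[0] // factor
    return P_fine.reshape(n, factor, n, factor).sum(axis=(1, 3))

def kl_against_finer_grid(P_n, P_4n):
    """Equation (10): compare an n-by-n grid with a finer run of the same state."""
    factor = P_4n.shape[0] // P_n.shape[0]
    Q = coarsen(P_4n, factor)            # the inner sum over 16 fine cells
    support = P_n > 0                    # empty coarse cells contribute nothing
    return float(np.sum(P_n[support] * np.log(P_n[support] / Q[support])))
```

    The reshape splits each axis into (coarse index, position inside the block), so summing over the two inner axes adds up the 16 fine cells of every coarse cell in one call.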

    III. SIMULATIONS

    Fig. 1. The final snapshot of the simulations on a pendulum in phase-space for (a) x0 = 0.2π and (b) x0 = 0.6π, showing both the n = 100 and the n = 400 simulation that is used to calculate the divergence $D_{KL}$.

    For the following simulations the initial state is a Gaussian in phase-space with covariance Σ = 0.2I and mean μ = (x0, 0). These are then run until t = 2π and the relevant values are calculated. All other constants are set to unity, so this would be a full period for a simple harmonic oscillator.

    The velocity field that we are using is of the form:

    $$f = (v, -\sin(x)), \tag{11}$$

    where x and v are the angular displacement and velocity respectively.
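
    As an illustration of this setup, the sketch below evaluates the field of equation (11) at the cell centres and builds a Gaussian initial state with covariance 0.2I and mean (x0, 0) on the grid. The function names, the wrapping of the angular distance and the normalisation by the grid sum are assumptions made for the example, not details given in the paper.

```python
import numpy as np

def pendulum_field(x, v):
    """Equation (11) at the cell centres: returns (dx/dt, dv/dt) as 2-D arrays."""
    X, V = np.meshgrid(x, v, indexing="ij")
    return V, -np.sin(X)

def gaussian_initial_state(x, v, x0, var=0.2):
    """Cell probabilities for a Gaussian with mean (x0, 0) and covariance var * I."""
    # Wrap the angular distance into [-pi, pi) so that x0 = pi is handled (assumption).
    dxw = (x - x0 + np.pi) % (2.0 * np.pi) - np.pi
    w = np.exp(-0.5 * (dxw[:, np.newaxis] ** 2 + v[np.newaxis, :] ** 2) / var)
    return w / w.sum()                   # total probability on the grid is 1
```

    For example, `gaussian_initial_state(x, v, 0.2 * np.pi)` with `x` and `v` holding the cell centres gives the kind of starting distribution described for figure 1(a).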

    Figure 1 shows two examples of what these simulations look like. Figure 1(a) is approximately Gaussian because, for a small angular displacement, the dynamics are approximately linear, so a Gaussian keeps its shape. The shape of the PDF in figure 1(b) makes it clear that the dynamics there are highly non-linear.

    Fig. 2. The divergence after t = π using the 4× finer grid as comparison for different starting positions. All are exponential decays, slowing down the further from x0 = 0 we go, excluding x0 = π. That one grows slightly from n = 50 to 100, then drops off ever so slightly at 400.

    Fig. 3. Computing the divergence of all of the t = π states starting at x0 = π from one with n = 3200 produces an exponential decay, which shows that the method does indeed converge.

    A. Rate Of Convergence

    The first thing to note is that the up-streaming that we have implemented causes an exaggerated spreading out of the distribution. This is particularly noticeable in the lower grid resolution simulations, like the top plots in figure 1.

    Calculating the divergence from the 4× finer grid as discussed earlier, we see in figure 2 that most of the initial conditions lead to a roughly exponential decay. However, the further away from 0 we go, the slower this decay is. For the case where x0 = π there doesn't seem to be any decay whatsoever.

    In figure 3 we look at the x0 = π case, but rather than taking the 4× finer grid to be the “true” value we take one with n = 3200. Here we do see a decay, so the solutions do get closer as we increase the fineness of the grid. The behaviour displayed in figure 2 can be explained by the fact that the lower values of n do not produce the convergent behaviour used to justify the 4× finer grid method when run for t = π. However, this is not a problem, because we will be making observations on our system more often than once every π seconds. For the simulations shown in this paper we shall be making 30 observations every π seconds, hence the PDE solver only needs to run accurately for π/30 seconds. For this, even as low as n = 100 is sufficient to demonstrate the advantages of our inference method.

    Fig. 4. Expectation values over time for position and velocities of some finite volume simulations with observations on a pendulum with grid size n. The solid line is the expectation value, the dotted lines are the variance from the expected value. The plus signs are the observations made based on the true value (the dash-dotted line). This simulation is for x0 = 0.2π with n = 100 and 20 observations were made at regular intervals.

    Fig. 5. As in figure 4, only with n = 400.

    B. Comparison With UKF

    For all of the simulations presented here the initial conditions are a Gaussian with covariance matrix Σ = 0.2I as above, with all observations being Gaussian distributed variables with covariance Σz = 0.1I.

    Figures 4 and 5 show our finite volume method's position expectation values with initial condition x0 = 0.2π for n = 100 and n = 400 respectively. Note the closer fit to the true value in the latter.
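
    Expectation values and variances of the kind plotted in these figures can be read off the grid directly. The sketch below is one simple way to do it and is not taken from the paper: the function name `grid_moments` is hypothetical, and x is treated as an ordinary number rather than an angle, which is an assumption that becomes poor when the distribution straddles ±π.

```python
import numpy as np

def grid_moments(P, x, v):
    """Means and variances of x and v from cell probabilities P[i, j]."""
    px = P.sum(axis=1)                   # marginal distribution of x
    pv = P.sum(axis=0)                   # marginal distribution of v
    mean_x = float(np.sum(px * x))
    mean_v = float(np.sum(pv * v))
    var_x = float(np.sum(px * (x - mean_x) ** 2))
    var_v = float(np.sum(pv * (v - mean_v) ** 2))
    return (mean_x, mean_v), (var_x, var_v)
```

    Since the full distribution is stored, other summaries (for example the probability of lying in a given region) can be computed from the same array in the same way.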

    Fig. 6. Expectation values over time for position and velocities from the UKF for a pendulum. x0 = 0.2π with 30 observations.


    Fig. 7. As in figure 6, but we have told the filter that the dynamics are noisy (σx = 0.01) for a better result. x0 = 0.2π with 30 observations.

    We now compare this with the UKF run for the same length of time in figure 6. Here we see that the path predicted by the filter overestimates the true path at the points of greatest curvature. When run for longer times this forms a positive feedback, and the overestimate increases with every oscillation. The variance decreases quickly for the first few observations and stays small despite the estimate being far away from the observed data and the true path.

    In figure 7 we have introduced a fudge factor in the form of process noise. A simple pendulum has no noise in the dynamics, but this does lead to a closer fit to the path.

    Figures 8 and 9 show the finite volume method run for larger initial positions. The initial position of x0 = 0.8π in figure 8 leads to non-linear dynamics, and yet the true data stays within the variances of the filter. Taking this to the extreme, figure 9 is run from x0 = π, which is an unstable equilibrium. Even in this situation, after a brief run-in time, the filter keeps the true values within the variances.

    In stark contrast, figures 10 and 11 show the values predicted by the UKF wandering off from the true data completely. This is lessened when observations are made more regularly, but even this is limited, as shown by figure 11, where even 300 observations do not keep it from drifting away.

    Fig. 8. Expectation values over time for position and velocities from the FV method for a pendulum. x0 = 0.8π with 30 observations.

    Fig. 9. Finite Volume simulation with x0 = π and 30 observations.

    Running the same conditions with the dynamic noise as above produces figures 12 and 13. Figure 12 is an adequate fit, but does tend to drift away from the true values briefly. However, for our unstable equilibrium in figure 13 the expectation value moves around the true value, but the variance is underestimated as earlier.

    Fig. 10. x0 = 0.8π with 30 observations on the UKF.

    Fig. 11. x0 = π and 300 observations on the UKF.

    Fig. 12. x0 = 0.8π and 30 observations on the UKF with fudge factor σx = 0.01.

    Fig. 13. x0 = 0.8π and 30 observations on the UKF with fudge factor σx = 0.01.

    IV. CONCLUSION

    In this paper we have demonstrated a new way to perform inference on a dynamic system. We have shown through comparison that it has several advantages over the unscented Kalman filter:

    • While the UKF is far more computationally efficient, it tends to underestimate the uncertainties for the pendulum, both in the non-linear and even the mostly linear areas.

    • Even in the approximately linear dynamics, the UKF needs there to be some process noise in order to accurately estimate the state of the system. There is no process noise in a simple pendulum.

    • Our grid-based method can accurately evolve the system for some time without performing an observation, whereas the UKF must perform an observation every time step. This can lead to the true value being well outside the estimated statistics when observations are infrequent.

    • For both methods, the more non-linearities in the dynamics, the less accurate the result. However, for the grid based method one can counter this by devoting more computational power to the problem.

    These differences lead us to conclude that our method is best suited for inference problems where the system is highly non-linear, accuracy is valued over speed, and observations are limited or obtained sporadically. This method also stands out if one cares about more than just the statistics. In this grid based method one can gain arbitrary information from the state at any point in time, not only at the observation steps.

    ACKNOWLEDGMENT

    The authors would like to thank Richard Norton for helping polish the rough edges off the finite volume solver, and the Otago Electronics group in general for their support.

    REFERENCES

    [1] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems," Journal of Basic Engineering 82 (1960): 35–45.

    [2] S. J. Julier and J. K. Uhlmann, "A New Extension of the Kalman Filter to Nonlinear Systems," Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defense Sensing, Simulations and Controls, 1997.

    [3] Y. Bock, B. Crowell, F. Webb, S. Kedar, R. Clayton, B. Miyahara, "Fusion of High-Rate GPS and Seismic Data: Applications to Early Warning Systems for Mitigation of Geological Hazards," American Geophysical Union, 2008.

    [4] M. M. Shaikh, Wook Bahn, Changhun Lee, Kwang-soo Kim, Tae-jae Lee, Kwang-soo Kim, Dongil Cho, "Mobile robot vision tracking system using Unscented Kalman Filter," System Integration (SII), 2011 IEEE/SICE International Symposium on.

    [5] Zhe Chen, "Bayesian filtering: From Kalman filters to particle filters, and beyond," Statistics 182.1 (2003): 1–69.

    [6] S. Kullback and R. A. Leibler, "On Information and Sufficiency," Annals of Mathematical Statistics 22 (1) (1951): 79–86.