Fast Gauss Transform and Applications


Page 1: Fast Gauss Transform and Applications

Fast Gauss Transform and Applications

Changjiang Yang, Ramani Duraiswami,

Nail A. Gumerov, Larry Davis

Page 2: Fast Gauss Transform and Applications

Outline

Introduction to the fast Gauss transform
A new multivariate Taylor expansion
K-center algorithm for space subdivision
Error bound analysis
Kernel density estimation
Mean-shift algorithm
Experimental results
Conclusions

Page 3: Fast Gauss Transform and Applications

Fast Gauss Transform (FGT)

Originally proposed by Greengard and Strain (1991) to efficiently evaluate the weighted sum of Gaussians

G(y_j) = \sum_{i=1}^{N} q_i e^{-\|y_j - x_i\|^2 / h^2}, \quad j = 1, \ldots, M,

where the q_i are the weights, the x_i are the source points, the y_j are the target points, and h is the bandwidth.

The FGT is an important variant of the fast multipole method (Greengard & Rokhlin 1987, Greengard & Strain 1990).

The FGT is a counterpart of the fast Fourier transform, except that the FFT requires the data to lie on a regular grid.

Widely used in mathematical physics, computational chemistry, and mechanics. Recently applied to RBF interpolation, acoustic scattering problems, and computer vision (Elgammal et al. 2001).

Page 4: Fast Gauss Transform and Applications

Fast Gauss Transform (cont’d)

The weighted sum of Gaussians

G(y_j) = \sum_{i=1}^{N} q_i e^{-\|y_j - x_i\|^2 / h^2}

is equivalent to the matrix-vector product G q, where G_{ji} = e^{-\|y_j - x_i\|^2 / h^2}.

Direct evaluation of the sum of Gaussians requires O(N^2) operations.

Fast Gauss transform reduces the cost to O(N log N) operations.

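For concreteness, here is a minimal direct-evaluation sketch in Python/NumPy, the O(N^2) baseline the FGT is designed to beat (the function name and array layout are illustrative, not from the paper):

import numpy as np

def gauss_transform_direct(sources, targets, weights, h):
    """Directly evaluate G(y_j) = sum_i q_i exp(-||y_j - x_i||^2 / h^2).

    sources: (N, d), targets: (M, d), weights: (N,). O(MN) work,
    one Gaussian per source-target pair -- the cost the FGT removes.
    """
    diff = targets[:, None, :] - sources[None, :, :]   # (M, N, d)
    sq_dist = np.sum(diff * diff, axis=-1)             # (M, N)
    return np.exp(-sq_dist / h**2) @ weights           # (M,)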

Page 5: Fast Gauss Transform and Applications

Fast Gauss Transform (cont’d)

The factorization or separation of variables

Space subdivision scheme

Error bound analysis

Page 6: Fast Gauss Transform and Applications

Factorization of Gaussian

Factorization is one of the key parts of the FGT. Factorization breaks the entanglement between the sources and targets: the contributions of the sources are accumulated at the centers, then distributed to each target.

[Figure: direct source-target interactions versus interactions accumulated at a center and then redistributed to the targets.]

Page 7: Fast Gauss Transform and Applications

Hermite Expansion in 1D

The Gaussian kernel is factorized into Hermite functions:

e^{-(y_j - x_i)^2 / h^2} = \sum_{n=0}^{p-1} \frac{1}{n!} \left( \frac{x_i - x_*}{h} \right)^n h_n\!\left( \frac{y_j - x_*}{h} \right) + \text{error},

where the Hermite function h_n(x) is defined by

h_n(x) = (-1)^n \frac{d^n}{dx^n} e^{-x^2}.

Exchanging the order of the summations gives

G(y_j) = \sum_{n=0}^{p-1} A_n h_n\!\left( \frac{y_j - x_*}{h} \right),

where A_n (a.k.a. the moment) is defined by

A_n = \frac{1}{n!} \sum_{i=1}^{N} q_i \left( \frac{x_i - x_*}{h} \right)^n.
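A runnable 1D sketch of this factorization, assuming a single expansion center and no space subdivision (all names are illustrative):

import math
import numpy as np

def hermite_functions(t, p):
    """h_n(t) = (-1)^n d^n/dt^n exp(-t^2) for n = 0..p-1, via the
    recurrence h_{n+1}(t) = 2 t h_n(t) - 2 n h_{n-1}(t)."""
    vals = np.empty((p,) + np.shape(t))
    vals[0] = np.exp(-t**2)
    if p > 1:
        vals[1] = 2 * t * vals[0]
    for n in range(1, p - 1):
        vals[n + 1] = 2 * t * vals[n] - 2 * n * vals[n - 1]
    return vals

def hermite_expansion_1d(sources, targets, weights, h, center, p):
    """Evaluate sum_i q_i exp(-(y_j - x_i)^2 / h^2) via the truncated
    Hermite expansion about `center`: O(Np) to form the moments A_n,
    then O(Mp) to distribute them to the targets."""
    t_src = (sources - center) / h
    A = np.array([np.sum(weights * t_src**n) / math.factorial(n)
                  for n in range(p)])
    t_tgt = (targets - center) / h
    return A @ hermite_functions(t_tgt, p)

The truncation error shrinks as |x_i - x_*|/h shrinks, which is why the full algorithm subdivides space before expanding.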

Page 8: Fast Gauss Transform and Applications

Hermite Expansion in Higher Dimensions

The higher-dimensional Hermite expansion is the product of 1D Hermite expansions along each dimension.

The total number of terms is O(p^d), where p is the number of truncation terms.

The number of operations in one factorization is O(p^d).

[Figure: tensor-product structure of the expansion; in 2D the terms are the products h_m h_n for 0 ≤ m, n < p, and the grid of terms grows exponentially with the dimension D.]

Page 9: Fast Gauss Transform and Applications

Space Subdivision in FGT

The FGT subdivides the space into uniform boxes and assigns the source points and target points into boxes.

For each box the FGT maintains a neighbor list.

The number of the boxes increases exponentially with the dimensionality.

[Figure: uniform box subdivision in D = 1, 2, 3, and higher dimensions.]

Page 10: Fast Gauss Transform and Applications

FGT in Higher Dimensions

The FGT was originally designed to solve problems in mathematical physics (the heat equation, vortex methods, etc.), where the dimension is at most 3.

The higher-dimensional Hermite expansion is the product of univariate Hermite expansions along each dimension. The total number of terms is O(p^d).

The space subdivision scheme in the FGT uses uniform boxes. The number of boxes grows exponentially with the dimension.

The exponential dependence on the dimension makes the FGT extremely inefficient in higher dimensions.

Page 11: Fast Gauss Transform and Applications

Multivariate Taylor Expansions

The Gaussian is first shifted to an expansion center x_*:

e^{-\|y_j - x_i\|^2 / h^2} = e^{-\|\Delta y_j\|^2 / h^2} \, e^{-\|\Delta x_i\|^2 / h^2} \, e^{2 \Delta y_j \cdot \Delta x_i / h^2},

where \Delta y_j = y_j - x_* and \Delta x_i = x_i - x_*. The first two factors can be computed independently.

The Taylor expansion of the last factor is

e^{2 \Delta y_j \cdot \Delta x_i / h^2} = \sum_{\alpha \ge 0} \frac{2^{|\alpha|}}{\alpha!} \left( \frac{\Delta x_i}{h} \right)^\alpha \left( \frac{\Delta y_j}{h} \right)^\alpha,

where \alpha = (\alpha_1, \ldots, \alpha_d) is a multi-index.

The multivariate Taylor expansion about the center x_* is then

G(y_j) = \sum_{|\alpha| \le p-1} C_\alpha \, e^{-\|\Delta y_j\|^2 / h^2} \left( \frac{\Delta y_j}{h} \right)^\alpha,

where the coefficients C_\alpha are given by

C_\alpha = \frac{2^{|\alpha|}}{\alpha!} \sum_{i=1}^{N} q_i \, e^{-\|\Delta x_i\|^2 / h^2} \left( \frac{\Delta x_i}{h} \right)^\alpha.
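A single-center sketch of this factorization in Python (hypothetical helper names; the full algorithm adds the k-center subdivision and error-based cutoff described below):

import math
import numpy as np

def multi_indices(d, p):
    """All multi-indices alpha in N^d with |alpha| <= p - 1."""
    if d == 1:
        return [(n,) for n in range(p)]
    return [(n,) + rest
            for n in range(p)
            for rest in multi_indices(d - 1, p - n)]

def ifgt_single_center(sources, targets, weights, h, center, p):
    """Evaluate the Gaussian sum via the truncated multivariate
    Taylor expansion about one center."""
    dx = (sources - center) / h                       # (N, d)
    dy = (targets - center) / h                       # (M, d)
    gx = np.exp(-np.sum(dx**2, axis=1))               # exp(-||dx_i||^2)
    gy = np.exp(-np.sum(dy**2, axis=1))               # exp(-||dy_j||^2)
    out = np.zeros(len(targets))
    for a in multi_indices(sources.shape[1], p):
        coeff = 2.0**sum(a) / math.prod(math.factorial(ai) for ai in a)
        mono_x = np.prod(dx**np.array(a), axis=1)     # (dx_i)^alpha
        mono_y = np.prod(dy**np.array(a), axis=1)     # (dy_j)^alpha
        C = coeff * np.sum(weights * gx * mono_x)     # coefficient C_alpha
        out += C * gy * mono_y
    return out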

Page 12: Fast Gauss Transform and Applications

Reduction from Taylor Expansions

The idea of a Taylor expansion of the Gaussian kernel was first introduced in the course CMSC878R by Ramani Duraiswami and Nail Gumerov.

The number of terms in multivariate Hermite expansion is O(pd).

The number of terms in the multivariate Taylor expansion is \binom{p+d}{d}, asymptotically O(d^p) for fixed p, a big reduction for large d.

[Figure: number of expansion terms (log scale) for the Hermite vs. Taylor expansions. Left: fix p = 10, vary d = 1:20. Right: fix d = 10, vary p = 1:20. The Hermite count grows exponentially with d while the Taylor count grows only polynomially.]

Page 13: Fast Gauss Transform and Applications

Monomial Orders

Let \alpha = (\alpha_1, \ldots, \alpha_n) and \beta = (\beta_1, \ldots, \beta_n). Three standard monomial orders:

Lexicographic order, or "dictionary" order: \alpha \prec_{lex} \beta iff the leftmost nonzero entry of \alpha - \beta is negative.

Graded lexicographic order: \alpha \prec_{grlex} \beta iff \sum_{i=1}^{n} \alpha_i < \sum_{i=1}^{n} \beta_i, or (\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i and \alpha \prec_{lex} \beta).

Graded reverse lexicographic order: \alpha \prec_{grevlex} \beta iff \sum_{i=1}^{n} \alpha_i < \sum_{i=1}^{n} \beta_i, or (\sum_{i=1}^{n} \alpha_i = \sum_{i=1}^{n} \beta_i and the rightmost nonzero entry of \alpha - \beta is positive).

Example: let f(x,y,z) = xy^5z^2 + x^2y^3z^3 + x^3. W.r.t. lex, f is: f = x^3 + x^2y^3z^3 + xy^5z^2. W.r.t. grlex, f is: f = x^2y^3z^3 + xy^5z^2 + x^3. W.r.t. grevlex, f is: f = xy^5z^2 + x^2y^3z^3 + x^3.
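A quick Python sanity check: graded lex is just sorting exponent tuples by total degree, then lexicographically, which reproduces the grlex ordering of f above:

def grlex_key(alpha):
    """Sort key for graded lexicographic order."""
    return (sum(alpha), alpha)

# Exponent tuples of f(x,y,z) = xy^5z^2 + x^2y^3z^3 + x^3:
terms = [(1, 5, 2), (2, 3, 3), (3, 0, 0)]
print(sorted(terms, key=grlex_key, reverse=True))
# [(2, 3, 3), (1, 5, 2), (3, 0, 0)]  ->  x^2y^3z^3 + xy^5z^2 + x^3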

Page 14: Fast Gauss Transform and Applications

Horner's Rule

Horner's rule (W. G. Horner 1819) recursively evaluates the polynomial p(x) = a_n x^n + \cdots + a_1 x + a_0 as

p(x) = (\cdots((a_n x + a_{n-1})x + a_{n-2})x + \cdots)x + a_0.

It costs n multiplications and n additions, with no extra storage.

We evaluate the multivariate polynomial iteratively using the graded lexicographic order. It costs n multiplications, n additions, and n storage.

[Figure: the monomials of x = (a, b, c) organized by degree x^0, x^1, x^2, x^3; each higher-degree monomial is obtained from a stored lower-degree one by a single multiplication.]
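A minimal 1D Horner sketch; the multivariate version walks the graded-lex monomial list the same way, reusing each stored lower-degree monomial:

def horner(coeffs, x):
    """Evaluate a_n x^n + ... + a_1 x + a_0 with n multiplies and
    n adds. coeffs = [a_n, ..., a_1, a_0]."""
    result = 0.0
    for a in coeffs:
        result = result * x + a
    return result

assert horner([2, 0, -1, 5], 3) == 56   # 2*3^3 - 3 + 5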

Page 15: Fast Gauss Transform and Applications

An Example of Taylor Expansion

Suppose x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3). The truncated expansion is then a termwise product of three aligned arrays, each listed in graded lexicographic order:

Monomials in x: 1; x_1, x_2, x_3; x_1^2, x_1x_2, x_1x_3, x_2^2, x_2x_3, x_3^2; x_1^3, x_1^2x_2, x_1^2x_3, x_1x_2^2, x_1x_2x_3, x_1x_3^2, x_2^3, x_2^2x_3, x_2x_3^2, x_3^3.

Monomials in y: the same 20 monomials in y_1, y_2, y_3.

Constant coefficients 2^{|\alpha|}/\alpha!: 1; 2, 2, 2; 2, 4, 4, 2, 4, 2; 4/3, 4, 4, 4, 8, 4, 4/3, 4, 4, 4/3.

Each array is evaluated in 19 ops with the graded-lex Horner scheme, versus 29 ops for direct evaluation.

Page 16: Fast Gauss Transform and Applications

An Example of Taylor Expansion (Cont’d)

The same 20-entry arrays are reused on the source side and the target side. With the graded-lex Horner scheme, the constant coefficient array costs 19 ops (direct: 29 ops), accumulating the source-side monomials over all N sources costs 21N ops (direct: 31N ops), and evaluating the target-side monomials at a target costs 20 ops (direct: 30 ops).

Page 17: Fast Gauss Transform and Applications

Space Subdivision Scheme

The space subdivision scheme in the original FGT uses uniform boxes. The number of boxes grows exponentially with the dimensionality.

The desirable space subdivision should adaptively fit the density of the points.

The cells should be as compact as possible, and the algorithm should preferably be progressive. Based on the above considerations, we model the task as a k-center problem.

Page 18: Fast Gauss Transform and Applications

K-center Algorithm

The k-center problem is defined to seek the “best” partition of a set of points into clusters (Gonzalez 1985, Hochbaum and Shmoys 1985, Feder and Greene 1988).

The k-center problem is NP-hard but there exists a simple 2-approximation algorithm.

Given a set of points and a predefined number k, k-center clustering finds a partition S = S_1 \cup S_2 \cup \cdots \cup S_k that minimizes \max_{1 \le i \le k} radius(S_i), where radius(S_i) is the radius of the smallest disk that covers all points in S_i.

Page 19: Fast Gauss Transform and Applications

Farthest-Point Algorithm

The farthest-point algorithm (a.k.a. the k-center algorithm) is a 2-approximation of the optimal solution (Gonzalez 1985).

The total running time is O(kn), where n is the number of points. It can be reduced to O(n log k) using a slightly more complicated algorithm (Feder and Greene 1988).
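A sketch of Gonzalez's farthest-point heuristic (illustrative names), which achieves the O(kn) bound:

import numpy as np

def farthest_point_clustering(points, k):
    """2-approximate k-center: repeatedly add the point farthest
    from the current centers. O(kn) time for n points."""
    centers = [0]                                    # arbitrary first center
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))                   # farthest remaining point
        centers.append(nxt)
        dist = np.minimum(dist,
                          np.linalg.norm(points - points[nxt], axis=1))
    # Assign each point to its nearest center; dist.max() is the radius.
    labels = np.argmin(
        np.linalg.norm(points[:, None, :] - points[centers][None, :, :],
                       axis=2), axis=1)
    return points[centers], labels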

Page 20: Fast Gauss Transform and Applications

A Demo of k-center Algorithm

k = 4

Page 21: Fast Gauss Transform and Applications

Results of k-center Algorithm

The results of the k-center algorithm: 40,000 points are divided into 64 clusters in 0.48 s on a 900 MHz Pentium III PC.

Page 22: Fast Gauss Transform and Applications

Fast Gauss Transform Algorithm
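The transcript omits the algorithm listing from this slide. Under the assumptions of the earlier sketches, the pieces assemble roughly as follows (hypothetical helpers from above; the real algorithm chooses k, p, and the cutoff from the error bound on the next slide):

import numpy as np

def ifgt(sources, targets, weights, h, k, p, cutoff):
    """Cluster sources with k-center, expand each cluster about its
    center, and evaluate each target only against nearby clusters."""
    centers, labels = farthest_point_clustering(sources, k)
    out = np.zeros(len(targets))
    for c in range(k):
        members = labels == c
        near = np.linalg.norm(targets - centers[c], axis=1) <= cutoff
        if near.any():
            out[near] += ifgt_single_center(sources[members], targets[near],
                                            weights[members], h,
                                            centers[c], p)
    return out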

Page 23: Fast Gauss Transform and Applications

Error Bound of FGT

The total error from the series truncation and from cutting off sources outside the neighborhood of the targets is bounded in terms of the cluster radius r_x and the cutoff radius r_y.

[Figure: truncation error is controlled by the radius r_x of the source clusters; cutoff error is controlled by the cutoff radius r_y around the targets.]

Page 24: Fast Gauss Transform and Applications

Error Bound Analysis

Increasing the number of truncation terms p decreases the error bound.

As the k-center algorithm progresses, the radius r_x of the source clusters decreases, until the error bound is less than a given precision.

With respect to the cutoff radius r_y, the error bound first decreases and then increases.

Page 25: Fast Gauss Transform and Applications

Kernel Density Estimation (KDE)

Kernel density estimation (a.k.a. the Parzen window method; Rosenblatt 1956, Parzen 1962) is an important nonparametric technique.

Given a set of observations {x_1, \ldots, x_n}, an estimate of the density function is

\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\!\left( \frac{x - x_i}{h} \right),

where K is the kernel function, h is the bandwidth, and d is the dimensionality.

[Figure: some commonly used kernel functions — rectangular, triangular, Epanechnikov, and Gaussian — and the resulting density estimates.]

Page 26: Fast Gauss Transform and Applications

KDE and FGT

In practice, the most widely used kernel is the Gaussian

The density estimate using the Gaussian kernel is

\hat{f}(x) = \frac{1}{n (\sqrt{\pi} h)^d} \sum_{i=1}^{n} e^{-\|x - x_i\|^2 / h^2},

which is exactly a weighted sum of Gaussians with uniform weights.

The computational requirement for large datasets is high, especially in higher dimensions. For N points, the computational cost is O(N^2).

The fast Gauss transform can reduce the cost to O(N log N).
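In code, Gaussian KDE is a single weighted Gaussian sum with uniform weights, so the direct routine sketched earlier (or an FGT replacement) drops straight in; the normalization below assumes the exp(-||x - x_i||^2 / h^2) kernel convention used above:

import numpy as np

def gaussian_kde(points, queries, h):
    """KDE with the Gaussian kernel exp(-||x - x_i||^2 / h^2); replace
    gauss_transform_direct with an FGT routine for large datasets."""
    n, d = points.shape
    weights = np.full(n, 1.0 / (n * (np.sqrt(np.pi) * h)**d))
    return gauss_transform_direct(points, queries, weights, h)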

Page 27: Fast Gauss Transform and Applications

Mean-Shift Algorithm

The mean-shift algorithm is a general clustering technique based on kernel density estimation. It was originally proposed by Fukunaga and Hostetler in 1975 and first applied to computer vision by Comaniciu and Meer.

Given N data points, a kernel function k(x), and a bandwidth h, for each data point x the mean-shift algorithm iteratively performs:

• Compute the mean-shift vector

m(x) = \frac{\sum_{i=1}^{N} x_i \, g(\|(x - x_i)/h\|^2)}{\sum_{i=1}^{N} g(\|(x - x_i)/h\|^2)} - x, \quad g = -k'.

• Update the current position: x \leftarrow x + m(x).
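For the Gaussian kernel, g = -k' is itself a Gaussian, so each iteration reduces to Gaussian sums. A direct O(N^2)-per-iteration sketch (illustrative names):

import numpy as np

def mean_shift_gaussian(points, h, n_iter=50):
    """Move every point to the Gaussian-weighted mean of the data;
    each iteration is exactly the kernel sums the FGT accelerates."""
    x = points.copy()
    for _ in range(n_iter):
        diff = x[:, None, :] - points[None, :, :]         # (N, N, d)
        w = np.exp(-np.sum(diff**2, axis=-1) / h**2)      # Gaussian weights
        x = (w @ points) / w.sum(axis=1, keepdims=True)   # weighted means
    return x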

Page 28: Fast Gauss Transform and Applications

Mean-Shift with FGT

Each step of the mean-shift algorithm requires kernel density evaluations whose complexity is quadratic.

The quadratic computational complexity can be reduced to linear order using the FGT.

The mean-shift algorithm is also slow because it is a steepest-ascent method, which is very inefficient, especially for complex density functions.

Second-order information is necessary for the mean-shift algorithm to reach the nearest stationary point at an optimal convergence rate.

Page 29: Fast Gauss Transform and Applications

Mean-Shift with Quasi-Newton Methods

An exact Newton step requires computation of the Hessian matrix of the density, which is computationally expensive.

Quasi-Newton methods avoid this by using the function and gradient to approximate the Hessian matrix.

Update direction: d_k = -B_k^{-1} \nabla f(x_k).

Update Hessian (BFGS): B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}, where s_k = x_{k+1} - x_k and y_k = \nabla f(x_{k+1}) - \nabla f(x_k).

Page 30: Fast Gauss Transform and Applications

Experimental Result (1)

The speedup of the fast Gauss transform in 4, 6, 8, 10 dimensions (h=1.0).

Page 31: Fast Gauss Transform and Applications

Experimental Result (2)

The comparison between the direct evaluation, the original fast Gauss transform, and our improved version of the fast Gauss transform (h=0.2). Relative error is under 2%.

[Figure: CPU time (log-log scale) vs. N for the direct method, the original FGT, and the improved FGT (IFGT).]

Page 32: Fast Gauss Transform and Applications

Experimental Result (3)

Image segmentation results of the mean-shift algorithm with the Gaussian kernel.

Size: 432×294, time: 7.984 s

Size: 481×321, time: 12.359 s

Page 33: Fast Gauss Transform and Applications

Experimental Result (4)

The results of the mean-shift with BFGS algorithm

Page 34: Fast Gauss Transform and Applications

Experimental Result (5)

The results of the mean-shift with BFGS algorithm

Page 35: Fast Gauss Transform and Applications

Adaptive Bandwidth Selection

KDE is crucially dependent on the choice of bandwidth.

Data-driven bandwidth selection methods are preferred for the choice of the bandwidth.

Rules of thumb (Deheuvels 1977, Silverman 1986)
Least-squares cross-validation (Bowman 1984, Rudemo 1982)
Biased cross-validation (Scott and Terrell 1987)
Solve-the-equation plug-in approach (Hall 1980, Sheather 1983, 1986)
Smoothed bootstrap (Faraway and Jhun 1990, Taylor 1989)

Adaptive bandwidth selection requires a local estimation of the point distribution.

K-center algorithm gives a rough estimation of the point distribution (similar to the histogram).

Page 36: Fast Gauss Transform and Applications

K-center and Adaptive Bandwidth Selection

The objective function of k-center is to find k spheres as small as possible to enclose all points.

Each partition of the k-center algorithm can be used to estimate the bandwidth locally.

Page 37: Fast Gauss Transform and Applications

Conclusions

The fast Gauss transform is very effective for speeding up the weighted sums of Gaussians.

The exponential dependence on the dimensionality makes the FGT inefficient in higher dimensions.

Three techniques are applied to speed up the FGT: a new Taylor expansion scheme, a pyramid-like data structure to store and manipulate the expansions, and an adaptive space subdivision technique based on the k-center algorithm.

The FGT is applied to the kernel density estimation. The mean-shift algorithm is chosen as a case study.