Fast Gauss Transform and Applications
description
Transcript of Fast Gauss Transform and Applications
Fast Gauss Transform and Applications
Changjiang YangRamani Duraiswami
Nail A. GumerovLarry Davis
Outline
Introduction to fast Gauss transform A new multivariate Taylor expansion K-center algorithm for space subdivision Error bound analysis Kernel density estimation Mean-shift algorithm Experimental results Conclusions
Fast Gauss Transform (FGT)
Originally proposed by Greengard and Strain (1991) to efficiently evaluate the weighted sum of Gaussians:
FGT is an important variant of Fast Multipole Method (Greengard & Rolklin 1987, Greengard and Strain 1990).
FGT is a counterpart of fast Fourier transform, except that FFT is for the data on regular grid.
Widely used in mathematical physics, computational chemistry, mechanics. Recently applied to RBF interpolation, acoustic scatter problem, and computer vision (Elgammal et al. 2001).
Fast Gauss Transform (cont’d)
The weighted sum of Gaussians
is equivalent to the matrix-vector product:
Direct evaluation of the sum of Gaussians requires O(N2) operations.
Fast Gauss transform reduces the cost to O(N log N) operations.
SourcesTargets
Fast Gauss Transform (cont’d)
The factorization or separation of variables
Space subdivision scheme
Error bound analysis
Factorization of Gaussian
Factorization is one of the key parts of the FGT. Factorization breaks the entanglements between
the sources and targets. The contributions of the sources are
accumulated at the centers, then distributed to each target.
Sources Targets Sources Targets Sources Targets
Center
Hermite Expansion in 1D
The Gaussian kernel is factorized into Hermite functions
where Hermite function hn(x) is defined by
Exchange the order of the summations
where An is defined bya.k.a. Moment
Hermite Expansion in Higher Dimensions
The higher dimensional Hermite expansion is the product of 1D Hermite expansion along each dimension.
Total number of terms is O(pd), p is the number of truncation terms.
The number of operations in one factorization is O(pd).
h0 h1 h2
h0h0 h0h1 h0h2
h1h0 h1h1 h1h2
h2h0 h2h1 h2h2
D=1 D=2 D=3 D>3
Space Subdivision in FGT
The FGT subdivides the space into uniform boxes and assigns the source points and target points into boxes.
For each box the FGT maintain a neighbor list.
The number of the boxes increases exponentially with the dimensionality.
D=1 D=2 D=3 D>3
FGT in Higher Dimensions
The FGT is originally designed to solve the problems in mathematical physics (heat equation, vortex methods, etc), where the dimension is up to 3.
The higher dimensional Hermite expansion is the product of univariate Hermite expansion along each dimension. Total number of terms is O(pd).
The space subdivision scheme in the FGT is uniform boxes. The number of boxes grows exponentially with the dimension.
The exponential dependence on the dimension makes the FGT extremely inefficient in higher dimensions.
Multivariate Taylor Expansions
The Taylor expansion of the Gaussian function:
The first two terms can be computed independently.
The Taylor expansion of the last term is:
where =(1, , d) is multi-index.
The multivariate Taylor expansion about center x*:
where coefficients C are given by
Reduction from Taylor Expansions
The idea of Taylor expansion of Gaussian kernel is first introduced in the course CMSC878R by Ramani and Nail.
The number of terms in multivariate Hermite expansion is O(pd).
The number of terms in multivariate Taylor expansion is , asymptotically O(dp), a big reduction for large d.
Fix p = 10, vary d = 1:20 Fix d = 10, vary p = 1:20
0 2 4 6 8 10 12 14 16 18 2010
0
102
104
106
108
1010
1012
1014
1016
1018
1020
d
Num
ber
of t
erm
s
Hermite ExpansionTaylor Expansion
0 2 4 6 8 10 12 14 16 18 2010
0
102
104
106
108
1010
1012
1014
p
Num
ber
of t
erm
s
Hermite ExpansionTaylor Expansion
Monomial Orders
Let =(1, , n), =(1, , n), then three standard monomial orders:
Lexicographic order, or “dictionary” order: Álex iff the leftmost nonzero entry in - is negative.
Graded lexicographic order: Ágrlex iff 1 · i · n i < 1 · i · n i or (1 · i · n i = 1 · i · n i and
Álex ). Graded reverse lexicographic order:
Ágrevlex iff 1 · i · n i < 1 · i · n i or (1 · i · n i = 1 · i · n i and the rightmost nonzero entry in - is positive).
Example: Let f(x,y,z) = xy5z2 + x2y3z3 + x3, then w.r.t. lex f is: f(x,y,z) = x3 + x2y3z3 + xy5z2; w.r.t. grlex f is: f(x,y,z) = x2y3z3 + xy5z2 + x3; w.r.t. grevlex f is: f(x,y,z) = xy5z2 + x2y3z3 + x3.
The Horner’s Rule The Horner’s rule (W.G.Horner 1819) is to recursively
evaluate the polynomial p(x) = an xn + + a1 x + a0 as:
p(x) = (((an x + an-1)x+)x + a0.
It costs n multiplications and n additions, no extra storage. We evaluate the multivariate polynomial iteratively using
the graded lexicographic order. It costs n multiplications and n additions, and n storage.
x=(a,b,c)x0
x1
x2
x3
An Example of Taylor Expansion
Suppose x = (x1, x2, x3) and y = (y1, y2, y3), then
1
x1 x2 x3
x12 x1 x2 x1 x3 x2
2 x2x3 x32
x13 x1
2x2 x12x3 x1x2
2 x1x2x
3
x1x32 x2
3 x22x3 x2x3
2 x33
×1
y1 y2 y3
y12 y1 y2 y1 y3 y2
2 y2y3 y32
y13 y1
2y2 y12y3 y1y2
2 y1y2y
3
y1y32 y2
3 y22y3 y2y3
2 y33
1
2 2 2
2 4 4 2 4 2
4/3 4 4 4 8 4 4/3 43 4 4/3
×≈
Constant19 opsDirect: 29ops
19 opsDirect:29ops
19 opsDirect: 29 ops
An Example of Taylor Expansion (Cont’d)
1
x1 x2 x3
x12 x1 x2 x1 x3 x2
2 x2x3 x32
x13 x1
2x2 x12x3 x1x2
2 x1x2x
3
x1x32 x2
3 x22x3 x2x3
2 x33
×1
y1 y2 y3
y12 y1 y2 y1 y3 y2
2 y2y3 y32
y13 y1
2y2 y12y3 y1y2
2 y1y2y
3
y1y32 y2
3 y22y3 y2y3
2 y33
1
2 2 2
2 4 4 2 4 2
4/3 4 4 4 8 4 4/3 4 4 4/3
×≈
×
×
Constant19 opsDirect: 29ops
21N opsDirect:31Nops
20 opsDirect:30ops
Space Subdivision Scheme
The space subdivision scheme in the original FGT is uniform boxes. The number of boxes grows exponentially with the dimensionality.
The desirable space subdivision should adaptively fit the density of the points.
The cell should be as compact as possible. The algorithm is better to be a progressive one. Based on the above considerations, we model
the task as k-center problem.
K-center Algorithm
The k-center problem is defined to seek the “best” partition of a set of points into clusters (Gonzalez 1985, Hochbaum and Shmoys 1985, Feder and Greene 1988).
The k-center problem is NP-hard but there exists a simple 2-approximation algorithm.
Given a set of points and a predefined number k, k-center clustering is to find a partition S = S1 [ S2 [ [ Sk that minimizes max1 · i · k radius(Si), where radius(Si) is the radius of the smallest disk that covers all points in Si.
Farthest-Point Algorithm
The farthest-point algorithm (a.k.a. k-center algorithm) is a 2-approximation of the optimal solution (Gonzales 1985).
The total running time is O(kn), n is the number of points. It can be reduced to O(n log k) using a slightly more complicated algorithm (Feder and Greene 1988).
A Demo of k-center Algorithm
k = 4
Results of k-center Algorithm
The results of k-center algorithm. 40000 points are divided into 64 clusters in 0.48 sec on a 900MHZ PIII PC.
Fast Gauss Transform Algorithm
Error Bound of FGT
The total error from the series truncation and the cutoff outside of the neighborhood of targets is bounded by
rx
rxry
Truncation error
Cutoff error
Sources
Targets
Error Bound Analysis
Increase the number of truncation terms p, the error bound will decrease.
With the progress of k-center algorithm, the radius of the source points rx will decrease, until the error bound is less than a given precision.
The error bound first decreases, then increases with respect to the cutoff radius ry.
Kernel Density Estimation (KDE)
Kernel density estimation (a.k.a Parzen method, Rosenblatt 1956, Parzen 1962) is an important nonparametric technique.
Given a set of observations {x1, , xn}, an estimate of density function is
Some commonly used kernel functionsBandwidth
Kernel function
Dimensionality
-2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5-0.05
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
-5 -4 -3 -2 -1 0 1 2 3 4 50
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
Rectangular Triangular Epanechnikov Gaussian
KDE and FGT
In practice, the most widely used kernel is the Gaussian
The density estimate using the Gaussian kernel:
The computational requirement for large datasets is high, especially in higher dimensions. For N points, the computational cost is O(N2).
Fast Gauss transform can reduce the cost to O(N logN).
Mean-Shift Algorithm
Mean-shift algorithm is a general clustering technique based on the kernel density estimation. Originally proposed by Fukunaga and Hostetler in 1975, first applied to computer vision applications by Comaniciu.
Given N data points, a kernel function k(x) and a bandwidth h, for each data point x, the mean-shift algorithm iteratively performs:• Compute the mean-shift vector
• Update the current position
Mean-Shift with FGT
Each step of the mean-shift algorithm requires kernel density evaluations whose complexity is quadratic.
The quadratic computational complexity can be reduced to linear order using the FGT.
Mean-shift algorithm is slow, also due to the fact that the steepest ascent method used in mean-shift is very inefficient, especially for the complex density functions.
Second order information is necessary for the mean-shift algorithm to reach the nearest stationary point in a global optimal convergence rate.
Mean-Shift with Quasi-Newton Methods
Exact Newton step requires computation of the Hessian matrix of the density which is computationally expensive.
Quasi-Newton methods avoid this by using the function and gradient to approximate the Hessian matrix.
(Update direction)
(Update Hessian)
Experimental Result (1)
The speedup of the fast Gauss transform in 4, 6, 8, 10 dimensions (h=1.0).
Experimental Result (2)
The comparison between the direct evaluation, the original fast Gauss transform, and our improved version of the fast Gauss transform (h=0.2). Relative error is under 2%.
102
103
104
105
100
101
102
103
104
105
106
N
CP
U t
ime
direct methodFGTIFGT
Experimental Result (3)
Image segmentation results of the mean-shift algorithm with the Gaussian kernel.
Size: 432X294Time: 7.984 s
Size: 481X321Time: 12.359 s
Experimental Result (4)
The results of the mean-shift with BFGS algorithm
Experimental Result (5)
The results of the mean-shift with BFGS algorithm
Adaptive Bandwidth Selection
KDE is crucially dependent on the choice of bandwidth.
Data-driven bandwidth selection methods are preferred for the choice of the bandwidth.
Rules of thumb (Deheuvels 1977, Silverman 1986) Least-squares cross-validation (Bownman 1984, Redemo
1982) Biased cross-validation (Scott and Terrell 1987) Solved-the-equation plug-in approach (Hall 1980, Sheather
1983,86) Smoothed bootstrap (Faraway and Jhun 1990, Taylor 1989)
Adaptive bandwidth selection requires a local estimation of the point distribution.
K-center algorithm gives a rough estimation of the point distribution (similar to the histogram).
K-center and Adaptive Bandwidth Selection
The objective function of k-center is to find k spheres as small as possible to enclose all points.
Each partition of the k-center algorithm can be used to estimate the bandwidth locally.
Conclusions
The fast Gauss transform is very effective for speeding up the weighted sums of Gaussians.
The exponential dependence on the dimensionality makes the FGT inefficient in higher dimensions.
Three techniques are applied to speed up the FGT: a new Taylor expansion scheme, a pyramid-like data structure to store and manipulate the expansions, and an adaptive space subdivision technique based on the k-center algorithm.
The FGT is applied to the kernel density estimation. The mean-shift algorithm is chosen as a case study.