Nearest-Neighbor Searching Under Uncertainty
Wuzhou Zhang
Department of Computer Science, Duke University
PODS, May 23, 2012
Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman
Nearest-Neighbor Searching
Applications
• Databases, information retrieval
• Statistical classification, clustering
• Pattern recognition, data compression
• Computer vision, etc.
$P$: a set of $n$ points in $\mathbb{R}^2$
$q$: any query point in $\mathbb{R}^2$
Find the closest point to $q$ in $P$
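Voronoi Diagram
Voronoi cell of a point $p_i \in P$: $\{x : \|x - p_i\| \le \|x - p_j\| \text{ for all } j\}$
Voronoi diagram $\mathrm{VD}(P)$: decomposition induced by the Voronoi cells
Preprocessing time: $O(n \log n)$
Space: $O(n)$
Query time: $O(\log n)$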
Data Uncertainty
Location of data is imprecise: sensor databases, face recognition, mobile data, etc.
What is the "nearest neighbor" of $q$ now?
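Our Model and Problem Statement
Uncertain point $P_i$: represented as a probability density function (pdf) $f_i$
Expected distance: $\mathrm{Ed}(P_i, q) = \int d(x, q)\, f_i(x)\, dx$
Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P_i \in \mathcal{P}} \mathrm{Ed}(P_i, q)$
Or an $\varepsilon$-ENN: any $P_i$ with $\mathrm{Ed}(P_i, q) \le (1 + \varepsilon) \min_{P_j \in \mathcal{P}} \mathrm{Ed}(P_j, q)$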
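To make the definitions concrete, here is a minimal sketch that evaluates the expected distance and finds the ENN by a linear scan. It assumes each uncertain point is a discrete distribution (a list of locations with probabilities) and uses the Euclidean distance; the function names are hypothetical and this is a brute-force baseline, not the index from the talk.

```python
import math

def expected_distance(locations, probs, q):
    """Expected Euclidean distance from q to a discrete uncertain point."""
    return sum(w * math.dist(p, q) for p, w in zip(locations, probs))

def expected_nn(uncertain_points, q):
    """Index of the uncertain point with the smallest expected distance to q."""
    return min(
        range(len(uncertain_points)),
        key=lambda i: expected_distance(*uncertain_points[i], q),
    )

def is_eps_enn(uncertain_points, q, i, eps):
    """True if point i is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(locs, pr, q) for locs, pr in uncertain_points)
    return expected_distance(*uncertain_points[i], q) <= (1 + eps) * best

# Each uncertain point is (locations, probabilities); an assumed toy input.
points = [
    ([(0.0, 0.0), (2.0, 0.0)], [0.5, 0.5]),  # P_0: two equally likely locations
    ([(5.0, 5.0)], [1.0]),                   # P_1: a certain point
]
q = (1.0, 0.0)
print(expected_nn(points, q))
```

The scan costs time linear in the total number of locations per query, which is the baseline the indexes in this talk are meant to beat.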
Previous work
Uncertain data
ENN
• The ENN under $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• No bounds on the running time
Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.]
Uncertain query
ENN
• Discrete uniform distribution: both exact and $O(1)$-factor approximation [Li2011, Sharifzadeh2010, etc.]
• No bounds on the running time
Our contribution
Bounds on preprocessing time, space, and query time for each distance function and setting:
• Squared Euclidean distance: uncertain data, uncertain query
• $L_1$ metric: uncertain data, uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data, uncertain query
Results in $\mathbb{R}^2$, extends to higher dimensions
First nontrivial methods for ENN queries with provable performance guarantees!
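Expected Voronoi Diagram
Expected Voronoi cell of $P_i$: $\{x : \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x) \text{ for all } j\}$
Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the expected Voronoi cells
An example in $L_1$ metric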
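Squared Euclidean distance
Uncertain data
$\bar{p}_i$: the centroid of $P_i$
Lemma: $\mathrm{Ed}(P_i, q) = \|q - \bar{p}_i\|^2 + \sigma_i^2$, where $\sigma_i^2 = \int \|x - \bar{p}_i\|^2 f_i(x)\, dx$ and $f_i$ is the pdf of $P_i$
• $\mathrm{EVD}(\mathcal{P})$ is the same as the weighted Voronoi diagram WVD of the centroids $\bar{p}_i$ with weights $\sigma_i^2$
Preprocessing time
Space
Query time
Remarks: Works for any distribution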
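The lemma for squared Euclidean distance (expected distance equals squared distance to the centroid plus the variance) can be checked numerically. The sketch below assumes discrete distributions and hypothetical helper names; the linear scan over (centroid, variance) pairs stands in for a query on the weighted Voronoi diagram.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(locs, probs):
    """Probability-weighted mean of the locations."""
    dim = len(locs[0])
    return tuple(sum(w * p[k] for p, w in zip(locs, probs)) for k in range(dim))

def variance(locs, probs):
    """Expected squared distance from the centroid."""
    c = centroid(locs, probs)
    return sum(w * sq_dist(p, c) for p, w in zip(locs, probs))

def expected_sq_dist(locs, probs, q):
    """Expected squared distance to q, computed directly from the definition."""
    return sum(w * sq_dist(p, q) for p, w in zip(locs, probs))

def enn_sq(summaries, q):
    """ENN from (centroid, variance) pairs only: minimize ||q - c||^2 + var."""
    return min(
        range(len(summaries)),
        key=lambda i: sq_dist(summaries[i][0], q) + summaries[i][1],
    )

P0 = ([(0.0, 0.0), (4.0, 0.0)], [0.5, 0.5])  # centroid (2, 0), variance 4
P1 = ([(3.0, 1.0)], [1.0])                   # centroid (3, 1), variance 0
q = (2.0, 0.0)

summaries = [(centroid(*P), variance(*P)) for P in (P0, P1)]
direct = [expected_sq_dist(*P, q) for P in (P0, P1)]
via_lemma = [sq_dist(c, q) + v for c, v in summaries]
print(direct, via_lemma, enn_sq(summaries, q))
```

In this toy input the query sits exactly on the centroid of P0, yet P1 is the ENN because P0 carries a larger variance term. That additive weight is what turns the ordinary Voronoi diagram of the centroids into a weighted one.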
$L_1$ metric
Uncertain data
Size of $\mathrm{EVD}(\mathcal{P})$: bound involving $\alpha(\cdot)$, the inverse Ackermann function
Lower bound construction
Remarks: Extends to $L_\infty$ metric
$L_1$ metric
Uncertain data (cont.)
A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
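Euclidean metric ($\varepsilon$-ENN)
Uncertain data
Approximate $\mathrm{Ed}(P_i, q)$ using a grid:
Outside the grid:
Inside the grid:
Total # of cells:
Remarks: Extends to any metric
Figure: grid around $\bar{p}$, labeled with $8\,\mathrm{Ed}(P_i, \bar{p})/\varepsilon$ and the cell size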
Euclidean metric ($\varepsilon$-ENN)
Uncertain data (cont.)
A linear-size approximate $\mathrm{EVD}(\mathcal{P})$!
Preprocessing time
Space
Query time
Conclusion and future work
Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large
Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs
THANKS
Squared Euclidean distance
Uncertain query
$\bar{q}$: the centroid of $Q$
Preprocessing
• Compute the Voronoi diagram $\mathrm{VD}(P)$
Query
• Given $Q$, compute $\bar{q}$, then query $\mathrm{VD}(P)$ with $\bar{q}$
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions and works for any distribution
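The reduction for an uncertain query works because the expected squared distance from a data point to the query equals its squared distance to the query's centroid plus the query's variance, and that variance is the same for every data point. A minimal sketch, assuming a discrete query distribution and hypothetical names; the linear scan stands in for point location in the Voronoi diagram.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(locs, probs):
    """Probability-weighted mean of the query locations."""
    dim = len(locs[0])
    return tuple(sum(w * p[k] for p, w in zip(locs, probs)) for k in range(dim))

def enn_uncertain_query(data, q_locs, q_probs):
    """ENN under squared Euclidean distance: ordinary NN of the query centroid."""
    q_bar = centroid(q_locs, q_probs)
    return min(range(len(data)), key=lambda i: sq_dist(data[i], q_bar))

def enn_direct(data, q_locs, q_probs):
    """Same answer from the definition, averaging over all query locations."""
    def ed(p):
        return sum(w * sq_dist(p, x) for x, w in zip(q_locs, q_probs))
    return min(range(len(data)), key=lambda i: ed(data[i]))

data = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]         # certain data points
q_locs, q_probs = [(2.0, 0.0), (4.0, 0.0)], [0.5, 0.5]  # uncertain query
print(enn_uncertain_query(data, q_locs, q_probs), enn_direct(data, q_locs, q_probs))
```

Once the centroid is known, the query cost no longer depends on the number of query locations.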
Rectilinear metric
Uncertain query
Similarly, linear pieces
Preprocessing time
Space
Query time
Euclidean metric ($\varepsilon$-ENN)
Uncertain query
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
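$L_1$ metric
Uncertain data (cont.)
A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$
Linear pieces!
Around a location $p = (x_p, y_p)$, the $L_1$ distance to $q = (x_q, y_q)$ is one of four linear functions, one per quadrant:
$-(x_p - x_q) + (y_p - y_q)$
$-(x_p - x_q) - (y_p - y_q)$
$(x_p - x_q) + (y_p - y_q)$
$(x_p - x_q) - (y_p - y_q)$
Linear!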
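The "linear pieces" observation can be illustrated with a short sketch. It assumes a discrete uncertain point and hypothetical names: within a cell where the query stays on the same side of every location's horizontal and vertical line, each absolute value has a fixed sign, so the expected $L_1$ distance is a single linear function $a x + b y + c$ of the query.

```python
def l1(p, q):
    """Rectilinear (L1) distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def expected_l1(locs, probs, q):
    """Expected L1 distance, computed directly from the definition."""
    return sum(w * l1(p, q) for p, w in zip(locs, probs))

def sign(v):
    return 1.0 if v >= 0 else -1.0

def linear_piece(locs, probs, q):
    """Coefficients (a, b, c) such that the expected L1 distance equals
    a*x + b*y + c for every query (x, y) in the same cell as q."""
    a = b = c = 0.0
    for (px, py), w in zip(locs, probs):
        sx, sy = sign(px - q[0]), sign(py - q[1])
        # Inside the cell: |px - x| + |py - y| = sx*(px - x) + sy*(py - y)
        a -= w * sx
        b -= w * sy
        c += w * (sx * px + sy * py)
    return a, b, c

locs, probs = [(0.0, 0.0), (4.0, 2.0)], [0.25, 0.75]
a, b, c = linear_piece(locs, probs, (1.0, 1.0))

# Two queries in the same cell (0 < x < 4, 0 < y < 2) share one linear piece.
for x, y in [(1.0, 1.0), (2.0, 1.5)]:
    print(a * x + b * y + c, expected_l1(locs, probs, (x, y)))
```

Queries on opposite sides of a location's horizontal or vertical line generally get different coefficients, which is why the expected distance is piecewise linear rather than linear overall.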