PODS, May 23, 2012
Nearest-Neighbor Searching Under Uncertainty
Wuzhou Zhang
Department of Computer Science, Duke University
PODS, May 23, 2012
Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman
2
Nearest-Neighbor Searching
$S$: a set of $n$ points in $\mathbb{R}^2$
$q$: any query point in $\mathbb{R}^2$
Find the closest point $p^*$ to $q$
Applications
• Databases, Information Retrieval
• Statistical Classification, Clustering
• Pattern Recognition, Data Compression
• Computer Vision, etc.
3
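Voronoi Diagram
Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 : d(x, p_i) \le d(x, p_j) \ \forall j\}$
Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the cells $\mathrm{Vor}(p_i)$
Preprocessing time: $O(n \log n)$
Space: $O(n)$
Query time: $O(\log n)$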
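As a baseline for the index-based approach, the nearest-neighbor query itself can be sketched as a linear scan. This is only an illustration: the names `nearest_neighbor`, `S`, and `q` are hypothetical, and a real index would replace the scan with point location in the Voronoi diagram.

```python
import math

def nearest_neighbor(points, q):
    """Return the point of `points` closest to q under the Euclidean metric.

    A linear scan costs O(n) per query; an index built on the Voronoi
    diagram answers the same query faster after preprocessing.
    """
    return min(points, key=lambda p: math.dist(p, q))

# Hypothetical sample data.
S = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
q = (3.0, 1.0)
p_star = nearest_neighbor(S, q)  # q lies in the Voronoi cell of p_star
```

The point returned for `q` is exactly the site whose Voronoi cell contains `q`, which is why locating `q` in the diagram answers the query.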
4
Data Uncertainty
Location of data is imprecise: sensor databases, face recognition, mobile data, etc.
What is the “nearest neighbor” of $q$ now?
5
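Our Model and Problem Statement
Uncertain point $P$: represented as a probability density function (pdf) $f_P$
Expected distance: $\mathrm{Ed}(P, q) = \int f_P(p)\, d(p, q)\, dp$
Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$
Or an $\varepsilon$-ENN: a point $Q \in \mathcal{P}$ with $\mathrm{Ed}(Q, q) \le (1+\varepsilon) \min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$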
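The model can be made concrete with a small sketch, assuming each uncertain point has a discrete distribution given as a list of (location, probability) pairs. That representation and all function names here are illustrative assumptions; the linear scan is a brute-force reference, not the index from the talk.

```python
import math

def expected_distance(P, q, dist=math.dist):
    """Ed(P, q) for a discrete uncertain point P = [(location, prob), ...]."""
    return sum(w * dist(p, q) for p, w in P)

def expected_nn(points, q, dist=math.dist):
    """Index of the uncertain point minimizing Ed(P, q), by linear scan."""
    return min(range(len(points)),
               key=lambda i: expected_distance(points[i], q, dist))

def is_eps_enn(points, i, q, eps, dist=math.dist):
    """True if points[i] is within a (1 + eps) factor of the best Ed."""
    best = min(expected_distance(P, q, dist) for P in points)
    return expected_distance(points[i], q, dist) <= (1 + eps) * best

# Hypothetical data: P0 is at (0,0) or (2,0) with equal probability,
# P1 is certain.
P0 = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
P1 = [((1.0, 3.0), 1.0)]
uncertain_points = [P0, P1]
q = (1.0, 0.0)
enn_index = expected_nn(uncertain_points, q)
```

Here `Ed(P0, q) = 1` and `Ed(P1, q) = 3`, so `P0` is the ENN, and `P1` is not an $\varepsilon$-ENN for $\varepsilon = 0.5$.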
6
Previous work
Uncertain data
ENN
• The ENN under the $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• No bounds on the running time
Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.]
Uncertain query
ENN
• Discrete uniform distribution: both exact and O(1)-factor approximation [Li2011, Sharifzadeh2010, etc.]
• No bounds on the running time
7
Our contribution
Settings covered, each with bounds on preprocessing time, space, and query time:
• Squared Euclidean distance: uncertain data, uncertain query
• Rectilinear ($L_1$) metric: uncertain data, uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data, uncertain query
Results in $\mathbb{R}^2$; extends to higher dimensions
First nontrivial methods for ENN queries with provable performance guarantees!
8
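Expected Voronoi Diagram
Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{q \in \mathbb{R}^2 : \mathrm{Ed}(P_i, q) \le \mathrm{Ed}(P_j, q) \ \forall j\}$
Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the cells $\mathrm{EVor}(P_i)$
An example in the $L_1$ metric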
9
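Squared Euclidean distance
Uncertain data
$\hat{p}$: the centroid of $P$
Lemma: $\mathrm{Ed}(P, q) = \|q - \hat{p}\|^2 + \sigma_P^2$, where $\sigma_P^2$ is the expected squared distance from $P$ to $\hat{p}$ (the variance of $P$)
• The expected Voronoi diagram is the same as the weighted Voronoi diagram WVD
Preprocessing time
Space
Query time
Remarks: Works for any distribution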
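The lemma can be checked numerically. The sketch below assumes discrete distributions given as (location, probability) pairs; all names are hypothetical. It also shows why each uncertain point can be summarized by a (centroid, variance) pair, which is what turns the problem into a weighted Voronoi query. The linear scan in `enn_squared` stands in for that weighted-diagram lookup.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between 2D points a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(P):
    """Centroid of a discrete uncertain point P = [(location, prob), ...]."""
    return (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))

def expected_sq_distance(P, q):
    """Ed(P, q) under the squared Euclidean distance."""
    return sum(w * sq_dist(p, q) for p, w in P)

def variance(P):
    """Expected squared distance from P to its own centroid."""
    return expected_sq_distance(P, centroid(P))

def enn_squared(summaries, q):
    """ENN index from (centroid, variance) summaries, by linear scan."""
    return min(range(len(summaries)),
               key=lambda i: sq_dist(summaries[i][0], q) + summaries[i][1])

# Hypothetical data.
P1 = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
P2 = [((1.0, 2.5), 1.0)]
summaries = [(centroid(P), variance(P)) for P in (P1, P2)]
q = (1.0, 2.0)
lhs = expected_sq_distance(P1, q)               # direct expectation
rhs = sq_dist(centroid(P1), q) + variance(P1)   # lemma's right-hand side
```

For this data both sides equal 5, and the query touches only the precomputed summaries, never the underlying distributions.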
10
$L_1$ metric
Uncertain data
Size of the expected Voronoi diagram
Lower bound construction
$\alpha(\cdot)$: the inverse Ackermann function
Remarks: Extends to the $L_\infty$ metric
11
$L_1$ metric
Uncertain data (cont.)
A near-linear size index exists despite the size of the expected Voronoi diagram
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
12
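Euclidean metric ($\varepsilon$-ENN)
Uncertain data
Approximate $\mathrm{Ed}(P, q)$ by $g_P(q)$
Outside the grid:
Inside the grid:
Total # of cells:
Remarks: Extends to any metric
Figure labels: $\hat{p}$; $8\,\mathrm{Ed}(P, \hat{p})/\varepsilon$; cell size: $\varepsilon$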
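One way to see why a coarse approximation can suffice far from an uncertain point: under the Euclidean metric, $\|q - \hat{p}\| \le \mathrm{Ed}(P, q) \le \|q - \hat{p}\| + \mathrm{Ed}(P, \hat{p})$ (convexity of the norm for the lower bound, the triangle inequality for the upper bound). The sketch below checks this numerically for a discrete distribution. It is an illustration under that assumption only: it does not reproduce the grid construction or the constants on the slide, and all names are hypothetical.

```python
import math

def centroid(P):
    """Centroid of a discrete uncertain point P = [(location, prob), ...]."""
    return (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))

def expected_distance(P, q):
    """Ed(P, q) under the Euclidean metric."""
    return sum(w * math.dist(p, q) for p, w in P)

def centroid_bounds(P, q):
    """Return (lower, upper) bounds on Ed(P, q) from the centroid alone."""
    c = centroid(P)
    proxy = math.dist(c, q)           # distance from q to the centroid
    spread = expected_distance(P, c)  # Ed(P, p_hat)
    return proxy, proxy + spread

# Hypothetical data: two equally likely locations, query far away.
P = [((-1.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
q = (0.0, 20.0)
eps = 0.1
lower, upper = centroid_bounds(P, q)
exact = expected_distance(P, q)
```

Whenever $\|q - \hat{p}\| \ge \mathrm{Ed}(P, \hat{p})/\varepsilon$, the upper bound is at most $(1+\varepsilon)\|q - \hat{p}\|$, so the centroid distance alone is within a $(1+\varepsilon)$ factor of the true expected distance.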
13
Euclidean metric ($\varepsilon$-ENN)
Uncertain data (cont.)
A linear-size approximate expected Voronoi diagram!
Preprocessing time
Space
Query time
Figure labels: $g_{P_1}$, $g_{P_2}$, $q$
14
Conclusion and future work
Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large
Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs
THANKS
15
Squared Euclidean distance
Uncertain query
$\hat{q}$: the centroid of $Q$
Preprocessing
• Compute the Voronoi diagram VD
Query
• Given $Q$, compute $\hat{q}$, then query VD with $\hat{q}$
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions and works for any distribution
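The two-step query can be sketched as follows, assuming a discrete query distribution given as (location, probability) pairs. A linear scan stands in for the Voronoi diagram lookup, and all names are hypothetical. The brute-force function is included only to confirm that querying with the centroid gives the same answer as minimizing the expected squared distance directly.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between 2D points a and b."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(Q):
    """Centroid of a discrete uncertain query Q = [(location, prob), ...]."""
    return (sum(w * p[0] for p, w in Q), sum(w * p[1] for p, w in Q))

def enn_uncertain_query(S, Q):
    """Reduce to an ordinary NN query at the centroid of Q.

    The scan below is a stand-in for a Voronoi diagram query.
    """
    q_hat = centroid(Q)
    return min(S, key=lambda p: sq_dist(p, q_hat))

def enn_brute_force(S, Q):
    """Minimize the expected squared distance directly, for comparison."""
    return min(S, key=lambda p: sum(w * sq_dist(p, x) for x, w in Q))

# Hypothetical data.
S = [(0.0, 0.0), (5.0, 0.0), (2.0, 4.0)]
Q = [((1.0, 0.0), 0.25), ((3.0, 0.0), 0.25), ((2.0, 2.0), 0.5)]
answer = enn_uncertain_query(S, Q)
```

The reduction works because the expected squared distance from a fixed point to $Q$ equals its squared distance to $\hat{q}$ plus the variance of $Q$, and the variance term is the same for every candidate in $S$.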
16
Rectilinear metric
Uncertain query
Similarly, linear pieces
Preprocessing time
Space
Query time
17
Euclidean metric ($\varepsilon$-ENN)
Uncertain query
Preprocessing time
Space
Query time
Remarks: Extends to higher dimensions
18
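$L_1$ metric
Uncertain data (cont.)
A near-linear size index exists despite the size of the expected Voronoi diagram
Linear pieces!
For a location $p_{ij}$ of $P_i$, the distance $|x_{p_{ij}} - x_q| + |y_{p_{ij}} - y_q|$ is linear in $q$ within each quadrant around $p_{ij}$:
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$ for $x_q \ge x_{p_{ij}}$, $y_q \le y_{p_{ij}}$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$ for $x_q \ge x_{p_{ij}}$, $y_q \ge y_{p_{ij}}$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$ for $x_q \le x_{p_{ij}}$, $y_q \le y_{p_{ij}}$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$ for $x_q \le x_{p_{ij}}$, $y_q \ge y_{p_{ij}}$
Linear!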
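The piecewise linearity can be illustrated numerically, assuming a discrete distribution given as (location, probability) pairs (names are hypothetical). The horizontal and vertical lines through the locations of $P_i$ cut the plane into cells, and inside a cell the expected $L_1$ distance is a weighted sum of linear functions, hence linear. The sketch tests this with a midpoint check: a linear function takes the average of its endpoint values at the midpoint of any segment.

```python
import math

def expected_l1(P, q):
    """Expected L1 distance from discrete P = [(location, prob), ...] to q."""
    return sum(w * (abs(p[0] - q[0]) + abs(p[1] - q[1])) for p, w in P)

def midpoint_gap(P, a, b):
    """f(midpoint) minus the average of f(a) and f(b); zero if f is linear on ab."""
    mid = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    return expected_l1(P, mid) - (expected_l1(P, a) + expected_l1(P, b)) / 2

# Hypothetical data: locations (0,0) and (2,2), so the cell boundaries are
# the lines x = 0, x = 2, y = 0, y = 2.
P = [((0.0, 0.0), 0.25), ((2.0, 2.0), 0.75)]

# Both endpoints inside the cell 0 < x < 2, 0 < y < 2: the gap vanishes.
gap_same_cell = midpoint_gap(P, (0.5, 0.5), (1.5, 1.0))

# The segment crosses the line x = 0: the function bends there.
gap_across = midpoint_gap(P, (-1.0, 1.0), (1.0, 1.0))
```

The gap is zero within a cell and nonzero across a boundary, which matches the picture of each expected-distance function as a collection of linear pieces.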