PODS, May 23, 2012


Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

Department of Computer Science, Duke University

PODS, May 23, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman


Nearest-Neighbor Searching

$S$: a set of $n$ points in $\mathbb{R}^2$

$q$: any query point in $\mathbb{R}^2$

Find the closest point $p^*$ of $S$ to $q$

Applications
• Databases, Information Retrieval
• Statistical Classification, Clustering
• Pattern Recognition, Data Compression
• Computer Vision, etc.


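Voronoi Diagram

Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 \mid d(x, p_i) \le d(x, p_j)\ \forall j\}$

Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the cells $\mathrm{Vor}(p_i)$

Preprocessing time: $O(n \log n)$

Space: $O(n)$

Query time: $O(\log n)$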


Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the “nearest neighbor” of $q$ now?


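Our Model and Problem Statement

Uncertain point $P$: represented as a probability density function (pdf) $f_P$

Expected distance: $\mathrm{Ed}(P, q) = \int f_P(x)\, d(x, q)\, dx$

Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} \mathrm{Ed}(P, q)$

Or an $\varepsilon$-ENN: any $P \in \mathcal{P}$ with $\mathrm{Ed}(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} \mathrm{Ed}(P', q)$

[Figure labels: $q$, $Q$]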
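The definitions on this slide can be illustrated with a minimal sketch, assuming each uncertain point has a discrete pdf given as a list of (location, probability) pairs. The function names and the sample data are hypothetical, and the linear scan is only a baseline, not the indexing method of the talk.

```python
import math

def expected_distance(P, q):
    """Ed(P, q) for a discrete pdf given as [(location, probability), ...]."""
    return sum(w * math.dist(p, q) for p, w in P)

def expected_nn(points, q):
    """Index of the uncertain point with the smallest Ed(P, q), by linear scan."""
    return min(range(len(points)), key=lambda i: expected_distance(points[i], q))

def is_eps_enn(points, i, q, eps):
    """True if points[i] is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(P, q) for P in points)
    return expected_distance(points[i], q) <= (1 + eps) * best

# Hypothetical data: P1 is at (0, 0) or (4, 0) with equal probability, P2 is certain.
P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
P2 = [((3.0, 0.0), 1.0)]
q = (2.0, 0.0)
uncertain_points = [P1, P2]
enn_index = expected_nn(uncertain_points, q)
```

Here the query sits exactly on the mean location of P1, yet P2 is the ENN, because P1 is never actually close to the query.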


Previous work

Uncertain data

ENN
• The ENN under $L_1$ metric: ε-approximation [Ljosa2007]
• No bounds on the running time

Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc]

Uncertain query

ENN
• Discrete uniform distribution: both exact and O(1) factor approximation [Li2011, Sharifzadeh2010, etc]
• No bounds on the running time


Our contribution

Table columns: Distance function, Settings, Preprocessing time, Space, Query time
• Squared Euclidean distance: uncertain data; uncertain query
• $L_1$ metric: uncertain data; uncertain query
• Euclidean metric ($\varepsilon$-ENN): uncertain data; uncertain query

Results in $\mathbb{R}^2$, extends to higher dimensions

First nontrivial methods for ENN queries with provable performance guarantees!


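Expected Voronoi Diagram

Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{x \in \mathbb{R}^2 \mid \mathrm{Ed}(P_i, x) \le \mathrm{Ed}(P_j, x)\ \forall j\}$

Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: decomposition induced by the cells $\mathrm{EVor}(P_i)$

An example in $L_1$ metric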


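Squared Euclidean distance: Uncertain data

$\bar{p}$: the centroid of $P$; $\sigma_P^2 = \int f_P(x)\, \|x - \bar{p}\|^2\, dx$

Lemma: $\mathrm{Ed}(P, q) = \|q - \bar{p}\|^2 + \sigma_P^2$
• The expected Voronoi diagram is the same as the weighted Voronoi diagram WVD

Preprocessing time

Space

Query time

Remarks: Works for any distribution

[Figure labels: $P \in \mathcal{P}$, $q$, $\bar{p}$, $\sigma^2$]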
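The lemma can be checked numerically. This is a minimal sketch, assuming discrete pdfs given as (location, probability) pairs; all names and data are hypothetical. Each uncertain point is reduced to its centroid and variance, so an ENN query under squared Euclidean distance becomes a weighted nearest-neighbor query on those summaries.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(P):
    """Probability-weighted mean location of a discrete pdf."""
    return tuple(sum(w * p[d] for p, w in P) for d in range(2))

def variance(P):
    """Expected squared distance from the centroid."""
    c = centroid(P)
    return sum(w * sq_dist(p, c) for p, w in P)

def expected_sq_distance(P, q):
    """Ed(P, q) under squared Euclidean distance, computed from the definition."""
    return sum(w * sq_dist(p, q) for p, w in P)

def enn_by_summary(summaries, q):
    """ENN via the lemma: minimize ||q - centroid||^2 + variance (linear scan)."""
    return min(range(len(summaries)),
               key=lambda i: sq_dist(summaries[i][0], q) + summaries[i][1])

# Hypothetical data.
P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]   # centroid (2, 0), variance 4
P2 = [((3.0, 1.0), 1.0)]                       # centroid (3, 1), variance 0
q = (2.0, 0.0)
summaries = [(centroid(P), variance(P)) for P in (P1, P2)]
lemma_value = sq_dist(summaries[0][0], q) + summaries[0][1]
enn_index = enn_by_summary(summaries, q)
```

The query coincides with the centroid of P1, but the variance term makes P2 the expected nearest neighbor.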


$L_1$ metric: Uncertain data

Size of the expected Voronoi diagram:

Lower bound construction

$\alpha(\cdot)$: the inverse Ackermann function

Remarks: Extends to $L_\infty$ metric


$L_1$ metric: Uncertain data (cont.)

A near-linear size index exists despite the size of the expected Voronoi diagram

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions


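Euclidean metric ($\varepsilon$-ENN): Uncertain data

Approximate $\mathrm{Ed}(P, q)$ by $g_P(q)$

Outside the grid:

Inside the grid:

Total # of cells:

Remarks: Extends to any metric

[Figure labels: $8\,\mathrm{Ed}(P, \bar{p})/\varepsilon$, $\bar{p}$; cell size: $\varepsilon$]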
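A short derivation suggests why a simple approximation can work far from the centroid. It uses only the triangle inequality. Reading the figure label $8\,\mathrm{Ed}(P, \bar{p})/\varepsilon$ as the side length of a grid centered at $\bar{p}$ is an assumption, as is the choice of $\|q - \bar{p}\|$ as the approximating value outside the grid.

```latex
\begin{align*}
\mathrm{Ed}(P, q) &= \int f_P(x)\,\|x - q\|\,dx
  \le \int f_P(x)\,\bigl(\|x - \bar{p}\| + \|\bar{p} - q\|\bigr)\,dx
  = \mathrm{Ed}(P, \bar{p}) + \|q - \bar{p}\|, \\
\mathrm{Ed}(P, q) &\ge \int f_P(x)\,\bigl(\|\bar{p} - q\| - \|x - \bar{p}\|\bigr)\,dx
  = \|q - \bar{p}\| - \mathrm{Ed}(P, \bar{p}).
\end{align*}
% Hence | Ed(P, q) - ||q - \bar{p}|| | <= Ed(P, \bar{p}) for every q.
% Assumption: q outside the grid implies ||q - \bar{p}|| >= 4 Ed(P, \bar{p}) / \varepsilon. Then
\[
\bigl|\mathrm{Ed}(P, q) - \|q - \bar{p}\|\bigr|
  \le \frac{\varepsilon}{4}\,\|q - \bar{p}\|
  \le \frac{\varepsilon}{4 - \varepsilon}\,\mathrm{Ed}(P, q)
  < \varepsilon\,\mathrm{Ed}(P, q) \quad (0 < \varepsilon < 1).
\]
```

Under that assumption, only queries inside the grid need a finer, cell-by-cell approximation.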


Euclidean metric ($\varepsilon$-ENN): Uncertain data (cont.)

A linear size approximate !

Preprocessing time

Space

Query time

[Figure labels: $g_{P_1}$, $g_{P_2}$, $q$]


Conclusion and future work

Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large

Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs

THANKS


Squared Euclidean distance: Uncertain query

$\bar{q}$: the centroid of $Q$

Preprocessing
• Compute the Voronoi diagram VD

Query
• Given $Q$, compute $\bar{q}$, then query VD with $\bar{q}$

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions and works for any distribution
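The reduction to an ordinary nearest-neighbor query on the centroid can be sketched as follows. This is an illustration only: the names and data are hypothetical, the uncertain query is assumed to be a discrete pdf, and a linear scan stands in for the Voronoi-diagram point location. It works because the expected squared distance from a point $p$ to $Q$ equals $\|p - \bar{q}\|^2$ plus the variance of $Q$, and the variance does not depend on $p$.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(Q):
    """Probability-weighted mean location of a discrete uncertain query."""
    return tuple(sum(w * x[d] for x, w in Q) for d in range(2))

def enn_via_centroid(S, Q):
    """Nearest data point to the centroid of Q (stand-in for a VD point location)."""
    c = centroid(Q)
    return min(S, key=lambda p: sq_dist(p, c))

def enn_from_definition(S, Q):
    """Data point minimizing the expected squared distance to Q, by definition."""
    return min(S, key=lambda p: sum(w * sq_dist(p, x) for x, w in Q))

# Hypothetical data: three certain points and one uncertain query.
S = [(0.0, 0.0), (5.0, 0.0), (2.0, 3.0)]
Q = [((1.0, 0.0), 0.25), ((3.0, 0.0), 0.25), ((2.0, 4.0), 0.5)]
q_bar = centroid(Q)
answer = enn_via_centroid(S, Q)
```

Both routines return the same point; the first one touches the query distribution only once, to compute its centroid.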


Rectilinear metric: Uncertain query

Similarly, linear pieces

Preprocessing time

Space

Query time


Euclidean metric ($\varepsilon$-ENN): Uncertain query

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions


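$L_1$ metric: Uncertain data (cont.)

A near-linear size index exists despite the size of the expected Voronoi diagram

linear pieces!

For each point $p_{ij}$ of $P_i$, the distance $d(p_{ij}, q)$ is linear in $q$ within each of the four quadrants around $p_{ij}$:
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$

Linear!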
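The piecewise-linear structure can be made concrete with a minimal sketch, assuming a discrete pdf given as (location, probability) pairs; the names and data are hypothetical. For a query $q$, fixing the sign of each coordinate difference gives coefficients $(a, b, c)$ such that the expected $L_1$ distance equals $a x_q + b y_q + c$ throughout the cell cut out by the horizontal and vertical lines through the locations of $P_i$.

```python
def expected_l1(P, q):
    """Expected L1 distance from q to a discrete uncertain point P."""
    return sum(w * (abs(p[0] - q[0]) + abs(p[1] - q[1])) for p, w in P)

def linear_piece(P, q):
    """Coefficients (a, b, c) of the linear piece of Ed(P, .) on the cell containing q."""
    a = b = c = 0.0
    for (px, py), w in P:
        sx = 1.0 if q[0] >= px else -1.0   # sign that opens |x_q - px|
        sy = 1.0 if q[1] >= py else -1.0   # sign that opens |y_q - py|
        a += w * sx
        b += w * sy
        c -= w * (sx * px + sy * py)
    return a, b, c

def evaluate(piece, q):
    """Value of a linear piece at q."""
    a, b, c = piece
    return a * q[0] + b * q[1] + c

# Hypothetical uncertain point with two equally likely locations.
P = [((0.0, 0.0), 0.5), ((2.0, 2.0), 0.5)]
inside = linear_piece(P, (1.0, 0.5))    # cell [0, 2] x [0, 2]
outside = linear_piece(P, (3.0, 1.0))   # cell x >= 2, 0 <= y <= 2
```

In this example the expected distance is constant (equal to 2) on the cell between the two locations and grows with slope 1 in $x$ to the right of both of them.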