Transcript
Page 1: PODS, May 23, 2012

Nearest-Neighbor Searching Under Uncertainty

Wuzhou Zhang

Department of Computer Science, Duke University

PODS, May 23, 2012

Joint work with Pankaj K. Agarwal, Alon Efrat, and Swaminathan Sankararaman

Page 2: PODS, May 23, 2012

Nearest-Neighbor Searching

Applications
• Databases, Information Retrieval
• Statistical Classification, Clustering
• Pattern Recognition, Data Compression
• Computer Vision, etc.

$S$: a set of $n$ points in $\mathbb{R}^2$

$q$: any query point in $\mathbb{R}^2$

Find the closest point $p^* \in S$ to $q$

Page 3: PODS, May 23, 2012

Voronoi Diagram

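Voronoi cell: $\mathrm{Vor}(p_i) = \{x \in \mathbb{R}^2 : d(x, p_i) \le d(x, p_j)\ \forall j\}$

Voronoi diagram $\mathrm{VD}(S)$: decomposition induced by the cells $\mathrm{Vor}(p_i)$

Preprocessing time: $O(n \log n)$

Space: $O(n)$

Query time: $O(\log n)$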

Page 4: PODS, May 23, 2012

Data Uncertainty

Location of data is imprecise: sensor databases, face recognition, mobile data, etc.

What is the “nearest neighbor” of $q$ now?

Page 5: PODS, May 23, 2012

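Our Model and Problem Statement

Uncertain point $P$: represented as a probability density function (pdf) $f_P$

Expected distance: $Ed(P, q) = \int f_P(p)\, d(p, q)\, dp$

Find the expected nearest neighbor (ENN) of $q$: $\arg\min_{P \in \mathcal{P}} Ed(P, q)$

Or an $\varepsilon$-ENN: a point $P \in \mathcal{P}$ with $Ed(P, q) \le (1 + \varepsilon) \min_{P' \in \mathcal{P}} Ed(P', q)$

Figure labels: $q$, $Q$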
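To make the problem statement concrete, here is a minimal sketch that evaluates the expected distance and finds the ENN by brute force. It assumes each uncertain point is a discrete distribution given as a list of (location, probability) pairs, so the integral becomes a weighted sum; the function names and the sample data are illustrative and do not come from the talk.

```python
import math

def expected_distance(P, q, dist=math.dist):
    """Ed(P, q) for a discrete uncertain point P = [(location, probability), ...]."""
    return sum(w * dist(p, q) for p, w in P)

def enn_brute_force(points, q, dist=math.dist):
    """Index of the uncertain point minimizing Ed(P, q), by a linear scan."""
    return min(range(len(points)),
               key=lambda i: expected_distance(points[i], q, dist))

def is_eps_enn(points, i, q, eps, dist=math.dist):
    """True if points[i] is within a (1 + eps) factor of the best expected distance."""
    best = min(expected_distance(P, q, dist) for P in points)
    return expected_distance(points[i], q, dist) <= (1 + eps) * best

# Two hypothetical uncertain points, each supported on two locations.
P1 = [((0.0, 0.0), 0.5), ((4.0, 0.0), 0.5)]
P2 = [((1.0, 3.0), 0.5), ((1.0, 5.0), 0.5)]
q = (2.0, 0.0)

winner = enn_brute_force([P1, P2], q)
```

The scan costs time proportional to the total support size per query, which is the baseline that an index with provable bounds is meant to beat.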

Page 6: PODS, May 23, 2012

Previous work

Uncertain data

ENN
• The ENN under $L_1$ metric: $\varepsilon$-approximation [Ljosa2007]
• No bounds on the running time

Most likely NN
• Heuristics [Cheng2008, Kriegel2007, Cheng2004, etc.]

Uncertain query

ENN
• Discrete uniform distribution: both exact and $O(1)$-factor approximation [Li2011, Sharifzadeh2010, etc.]
• No bounds on the running time

Page 7: PODS, May 23, 2012

Our contribution

Distance function                       Settings          Preprocessing time   Space   Query time
Squared Euclidean distance              Uncertain data
                                        Uncertain query
$L_1$ metric                            Uncertain data
                                        Uncertain query
Euclidean metric ($\varepsilon$-ENN)    Uncertain data
                                        Uncertain query

Results in $\mathbb{R}^2$, extends to higher dimensions

First nontrivial methods for ENN queries with provable performance guarantees!

Page 8: PODS, May 23, 2012

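Expected Voronoi Diagram

Expected Voronoi cell: $\mathrm{EVor}(P_i) = \{q : Ed(P_i, q) \le Ed(P_j, q)\ \forall j\}$

Expected Voronoi diagram $\mathrm{EVD}(\mathcal{P})$: induced by the cells $\mathrm{EVor}(P_i)$

An example in $L_1$ metric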

Page 9: PODS, May 23, 2012

Squared Euclidean distance
Uncertain data

$\hat{p}$: the centroid of $P$; $\sigma^2$: the variance of $P$ about $\hat{p}$

Lemma: $Ed(P, q) = \|q - \hat{p}\|^2 + \sigma^2$

• $\mathrm{EVD}(\mathcal{P})$ same as the weighted Voronoi diagram WVD

Preprocessing time

Space

Query time

Remarks: Works for any distribution
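The lemma on this slide states that, under squared Euclidean distance, the expected distance to an uncertain point is the squared distance to its centroid plus its variance. The sketch below checks that identity numerically and uses it to answer ENN queries from (centroid, variance) summaries alone. It assumes discrete distributions given as (location, probability) pairs, and a linear scan stands in for point location in the weighted Voronoi diagram; all names and data are illustrative.

```python
import random

def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(P):
    return (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))

def variance(P):
    c = centroid(P)
    return sum(w * sq_dist(p, c) for p, w in P)

def expected_sq_dist(P, q):
    """Ed(P, q) under squared Euclidean distance, straight from the definition."""
    return sum(w * sq_dist(p, q) for p, w in P)

def enn_via_centroids(summaries, q):
    """ENN from (centroid, variance) pairs; the scan stands in for a WVD query."""
    return min(range(len(summaries)),
               key=lambda i: sq_dist(summaries[i][0], q) + summaries[i][1])

def random_uncertain_point(k=5):
    weights = [random.random() for _ in range(k)]
    total = sum(weights)
    return [((random.uniform(-10, 10), random.uniform(-10, 10)), w / total)
            for w in weights]

random.seed(0)
points = [random_uncertain_point() for _ in range(20)]
summaries = [(centroid(P), variance(P)) for P in points]
q = (1.5, -2.0)

# The lemma: Ed(P, q) equals ||q - centroid||^2 + variance for every P.
lemma_holds = all(
    abs(expected_sq_dist(P, q) - (sq_dist(c, q) + v)) < 1e-9
    for P, (c, v) in zip(points, summaries)
)
```

Because each uncertain point collapses to a centroid with an additive weight, the query never needs to look at the full distribution, which is consistent with the remark that the result works for any distribution.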

Page 10: PODS, May 23, 2012

$L_1$ metric
Uncertain data

Size of $\mathrm{EVD}(\mathcal{P})$:

Lower bound construction

$\alpha(\cdot)$: the inverse Ackermann function

Remarks: Extends to $L_\infty$ metric

Page 11: PODS, May 23, 2012

$L_1$ metric
Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

Page 12: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)
Uncertain data

Approximate $Ed(P, q)$ by $g_P(q)$

Outside the grid:

Inside the grid:

Total # of cells:

Remarks: Extends to any metric

Figure labels: $\hat{p}$, $8\,Ed(P, \hat{p})/\varepsilon$

Cell size: $\varepsilon$
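The formulas for the two cases did not survive in this transcript, so the following sketch only illustrates why a far-field approximation is plausible. It assumes (this is an assumption, not the slide's stated construction) that outside a region of radius $8\,Ed(P, \hat{p})/\varepsilon$ around the centroid $\hat{p}$, the distance $\|q - \hat{p}\|$ is used in place of $Ed(P, q)$. By the triangle inequality, $|Ed(P, q) - \|q - \hat{p}\|| \le Ed(P, \hat{p})$, so the relative error in that region is at most $(\varepsilon/8)/(1 - \varepsilon/8) < \varepsilon$. The data and names are illustrative.

```python
import math
import random

def expected_distance(P, q):
    """Ed(P, q) under the Euclidean metric for a discrete uncertain point."""
    return sum(w * math.dist(p, q) for p, w in P)

def centroid(P):
    return (sum(w * p[0] for p, w in P), sum(w * p[1] for p, w in P))

def far_field_relative_error(P, q):
    """Relative error of using ||q - centroid|| in place of Ed(P, q)."""
    exact = expected_distance(P, q)
    return abs(exact - math.dist(centroid(P), q)) / exact

random.seed(1)
# Hypothetical uncertain point: ten equally likely locations in a unit box.
P = [((random.uniform(-1, 1), random.uniform(-1, 1)), 0.1) for _ in range(10)]
eps = 0.25
c = centroid(P)
radius = 8 * expected_distance(P, c) / eps  # assumed far-field threshold

# Sample queries on and beyond the threshold circle and record the worst error.
worst = 0.0
for degrees in range(360):
    angle = math.radians(degrees)
    for scale in (1.0, 2.0, 10.0):
        q = (c[0] + scale * radius * math.cos(angle),
             c[1] + scale * radius * math.sin(angle))
        worst = max(worst, far_field_relative_error(P, q))
```

Inside the threshold the centroid distance can be a poor estimate, which is presumably why the slide treats that region separately with a grid.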

Page 13: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)
Uncertain data (cont.)

A linear-size approximate $\mathrm{EVD}(\mathcal{P})$!

Preprocessing time

Space

Query time

Figure labels: $g_{P_1}$, $g_{P_2}$, $q$

Page 14: PODS, May 23, 2012

Conclusion and future work

Conclusion
• First nontrivial methods for answering exact or approximate ENN queries with provable performance guarantees
• ENN is not a good indicator when the variance is large

Future work
• Linear-size index for most likely NN queries in sublinear time
• Index for returning the probability distribution of NNs

THANKS

Page 15: PODS, May 23, 2012

Squared Euclidean distance
Uncertain query

$\hat{q}$: the centroid of $Q$

Preprocessing
• Compute the Voronoi diagram VD

Query
• Given $Q$, compute $\hat{q}$, then query VD with $\hat{q}$

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions and works for any distribution
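The query procedure on this slide works because, under squared Euclidean distance, $Ed(p, Q) = \|p - \hat{q}\|^2 + \sigma_Q^2$, and the variance term $\sigma_Q^2$ is the same for every data point $p$. The sketch below compares that shortcut against the definition. It assumes a discrete query distribution given as (location, probability) pairs, and a linear scan stands in for the Voronoi diagram lookup; the names and sample data are illustrative.

```python
def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def centroid(Q):
    return (sum(w * x[0] for x, w in Q), sum(w * x[1] for x, w in Q))

def expected_sq_dist_to_query(p, Q):
    """Ed(p, Q) under squared Euclidean distance, straight from the definition."""
    return sum(w * sq_dist(p, x) for x, w in Q)

def nearest(S, q):
    """Ordinary nearest neighbor; the scan stands in for a Voronoi diagram query."""
    return min(range(len(S)), key=lambda i: sq_dist(S[i], q))

# Hypothetical certain data points and an uncertain query with three locations.
S = [(0.0, 0.0), (5.0, 1.0), (2.0, 4.0), (-3.0, 2.0)]
Q = [((1.0, 3.0), 0.2), ((3.0, 5.0), 0.3), ((4.0, 2.0), 0.5)]

via_centroid = nearest(S, centroid(Q))
via_definition = min(range(len(S)),
                     key=lambda i: expected_sq_dist_to_query(S[i], Q))
```

Only the centroid of $Q$ is needed at query time, so the index itself is an ordinary Voronoi diagram of the data points.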

Page 16: PODS, May 23, 2012

Rectilinear metric
Uncertain query

Similarly, linear pieces

Preprocessing time

Space

Query time

Page 17: PODS, May 23, 2012

Euclidean metric ($\varepsilon$-ENN)
Uncertain query

Preprocessing time

Space

Query time

Remarks: Extends to higher dimensions

Page 18: PODS, May 23, 2012

$L_1$ metric
Uncertain data (cont.)

A near-linear size index exists despite the size of $\mathrm{EVD}(\mathcal{P})$

Linear pieces! Around each point $p_{ij}$ of $P_i$, the $L_1$ distance to $q$ is one of four linear functions of $q$, one per quadrant:
• $-(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $-(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) + (y_{p_{ij}} - y_q)$
• $(x_{p_{ij}} - x_q) - (y_{p_{ij}} - y_q)$

Linear!