Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational...
Computational Geometry
Search in High dimension and kd-trees
Ioannis Emiris
Dept Informatics & Telecoms, National Kapodistrian U. Athens
ATHENA Research & Innovation Center, Greece
Spring 2018
I.Emiris (Athens, Greece) Computational Geometry Spring 2018 1 / 57
Contents
1 Orthogonal Range Search
   2D orthogonal range search
2 Nearest Neighbors
   Approximate nearest neighbor
3 Trees
   kd-trees
   Randomized kd-trees
   Balanced Box-Decomposition trees
Range query interpreted geometrically
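[Figure: records plotted as points with axes date of birth and salary; the query is the rectangle [19500000 : 19559999] × [3000 : 4000], which contains the record of G. Ometer (born Aug 19, 1954; salary $3200).]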
Range query in 3 dimensions
[Figure: the same query in three dimensions, with axes date of birth (19500000 to 19559999), salary (3000 to 4000) and children (2 to 4).]
The geometric approach
We are interested in answering queries on d fields of the records in our database.
Transform the records to points in d-dimensional space.
The transformed range query asks for all points inside a d-dimensional axis-parallel box (may be unbounded).
Such a query is called a “rectangular” or “orthogonal” range query.
1-Dimensional Range Search
Problem
Preprocess a set of points P = {p1, p2, ..., pn} ⊂ R so as to answer queries efficiently:
Which points lie inside a query interval [x : x′]?
Arrays
Sorted array: O(n) space, O(n log n) preprocessing, O(k + log n) query.
But arrays do not generalize to higher dimensions,
and do not allow efficient updates: O(n) per update.
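The array solution can be sketched in a few lines of Python. This is only an illustration under the assumption that the points are plain numbers; the function names `build_sorted_array` and `range_query_1d` are hypothetical and not part of the slides. Two binary searches locate the ends of the answer, and the slice between them costs O(k).

```python
from bisect import bisect_left, bisect_right

def build_sorted_array(points):
    """Preprocess: sort once, O(n log n) time and O(n) space."""
    return sorted(points)

def range_query_1d(sorted_pts, lo, hi):
    """Report all points in the closed interval [lo, hi] in O(k + log n)."""
    i = bisect_left(sorted_pts, lo)    # first index whose value is >= lo
    j = bisect_right(sorted_pts, hi)   # first index whose value is > hi
    return sorted_pts[i:j]             # the k reported points

pts = build_sorted_array([49, 3, 105, 19, 62, 10, 89, 23, 70, 30, 100, 37, 80, 59])
print(range_query_1d(pts, 18, 77))
```

Inserting or deleting a point requires shifting array entries, which is the O(n) update cost noted above.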
Balanced Binary Search Trees (BBST)
The leaves of T store the points of P,
internal nodes store splitting values that guide the search.
E.g. red-black trees, AVL trees.
A search with the interval [18 : 77]
[Figure: balanced binary search tree on the points 3, 10, 19, 23, 30, 37, 49, 59, 62, 70, 80, 89, 100, 105; the searches for 18 and 77 end at leaves µ and µ′.]
A search with the interval [x , x ′]
Search for x and x′ in T. The searches end at leaves µ and µ′.
Report all points stored at leaves between µ and µ′ plus, possibly, the points stored at µ and µ′.
Remark
The leaves to be reported are the ones of subtrees that are rooted at nodes whose parents are on the search paths to µ and µ′.
The selected subtrees
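[Figure: the selected subtrees hanging between the search paths from root(T) to µ and µ′; the node vsplit is marked on the paths.]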
Correctness and Performance
Any reported point lies in the query range.
Any point in the range is reported.
O(n) storage.
O(n log n) preprocessing.
O(log n) update.
Θ(n) worst-case query cost.
O(k + log n) output-sensitive query cost: O(k) to report the points plus O(log n) to follow the paths to x, x′.
kd-Trees in the plane
Problem
Preprocess points P = {p1, p2, ..., pn} ⊂ R^2, to answer queries efficiently:
Which points lie inside a query rectangle [x : x′] × [y : y′]?
p = (px, py) lies in the rectangle iff px ∈ [x, x′] and py ∈ [y, y′].
kd-trees
Generalize BBST: they split the current point set at the median value, but use a different coordinate at each level.
The left subtree contains half (or one less) of the points, those with smaller coordinate, and the point with the median value.
Points shall correspond to leaves (and plane regions).
The way the plane is subdivided
[Figure: the plane subdivided by the splitting lines l1, ..., l9 into cells containing the points p1, ..., p10.]
The corresponding binary tree
[Figure: the kd-tree of this subdivision; internal nodes store the splitting lines l1, ..., l9 and leaves store the points p1, ..., p10.]
BuildKdTree(P, depth)
if P contains only one point then
    return a leaf storing this point
else
    if depth is even then
        split P with vertical line ℓ through median x-coord. of points in P
        P1 ← points left of ℓ or on ℓ
        P2 ← points right of ℓ
    else {depth is odd}
        split P with horizontal line ℓ through median y-coord. of points in P
        P1 ← points below ℓ or on ℓ
        P2 ← points above ℓ
    end if
    vleft ← BuildKdTree(P1, depth + 1); vright ← BuildKdTree(P2, depth + 1)
    create a node v storing ℓ
    lc(v) → vleft; rc(v) → vright
    return v
end if
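A minimal Python sketch of the construction above, for illustration only. The names `KdNode`, `build_kd_tree` and `leaves` are hypothetical. The sketch assumes a non-empty list of points with pairwise distinct x- and y-coordinates, and it re-sorts at every level for brevity, which costs O(n log² n) rather than the O(n log n) obtained with linear-time median selection or presorting.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]

@dataclass
class KdNode:
    point: Optional[Point] = None     # set only at leaves
    axis: int = 0                     # 0: vertical splitting line, 1: horizontal
    split: float = 0.0                # coordinate of the splitting line
    left: Optional["KdNode"] = None   # points left of / below the line, or on it
    right: Optional["KdNode"] = None  # points right of / above the line

def build_kd_tree(points, depth=0):
    if len(points) == 1:
        return KdNode(point=points[0])
    axis = depth % 2                  # even depth: split on x, odd depth: split on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2         # index of the median; it goes to the left part
    return KdNode(axis=axis, split=pts[mid][axis],
                  left=build_kd_tree(pts[:mid + 1], depth + 1),
                  right=build_kd_tree(pts[mid + 1:], depth + 1))

def leaves(node):
    """Collect the points stored at the leaves, left to right."""
    if node.point is not None:
        return [node.point]
    return leaves(node.left) + leaves(node.right)

pts = [(1, 9), (2, 3), (3, 7), (4, 1), (5, 8), (6, 2), (7, 6), (8, 4), (9, 10), (10, 5)]
root = build_kd_tree(pts)
print(root.split, len(leaves(root)))
```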
Building time and storage
Remarks
Split at the (n/2)-th smallest (median) coordinate: O(n) time,
or preprocess by sorting both on x- and y-coordinates.
The building time satisfies the recurrence:
$$T(n) = \begin{cases} O(1) & \text{if } n = 1 \\ O(n) + 2\,T(n/2) & \text{if } n > 1 \end{cases}$$
T(n) = O(n log n), which subsumes sorting.
O(n) storage: points stored at leaves, leaf contains ≥ 1 points (alternatively stored at internal/splitting nodes).
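The bound can be checked by unfolding the recurrence. The following is a sketch under two simplifying assumptions that are not stated on the slide: n is a power of 2, and the O(n) term is written as cn for some constant c.

```latex
\begin{align*}
T(n) &= cn + 2\,T(n/2) \\
     &= cn + 2\left(c\,\tfrac{n}{2} + 2\,T(n/4)\right) = 2cn + 4\,T(n/4) \\
     &= \dots = i\,cn + 2^{i}\,T(n/2^{i}) \\
     &= cn\log_2 n + n\,T(1) = O(n \log n) \qquad (i = \log_2 n).
\end{align*}
```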
Nodes in a kd-tree and regions in the plane
[Figure: a kd-tree with splitting lines l1, l2, l3 and a node v, shown next to the corresponding region(v) in the plane.]
Regions and the query algorithm
Internal nodes of a kd-tree correspond to rectangular regions of the plane, which can be unbounded on one or more sides.
Regions of all nodes at a specific level partition the plane.
region(root(T)) is the whole plane.
A point is stored at (a leaf of) the subtree rooted at v iff it lies in region(v).
Search the subtree of v only if the query rectangle intersects region(v).
A query on a kd-tree
[Figure: a query rectangle drawn on the subdivision by l1, ..., l9 with points p1, ..., p10, next to the corresponding kd-tree with the nodes visited by the query.]
Algorithm
SearchKdTree(v, R)
if v is a leaf then
    report point stored at v if in R
else
    if region(lc(v)) is fully contained in R then
        ReportSubtree(lc(v))
    else
        if region(lc(v)) intersects R then
            SearchKdTree(lc(v), R)
        end if
    end if
    if region(rc(v)) is fully contained in R then
        ReportSubtree(rc(v))
    else
        if region(rc(v)) intersects R then
            SearchKdTree(rc(v), R)
        end if
    end if
end if
Input: root v, range R.
Works for any query R, e.g. disk, triangle.
O(k) to report k points.
How many other nodes v are visited? I.e., for how many v does the query range intersect region(v)?
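The query procedure can be sketched in Python for axis-parallel rectangles. This is an illustration, not the lecture's code: the names `Node`, `build`, `child_regions`, `report_subtree` and `search_kd_tree` are hypothetical, and regions are assumed to be closed rectangles `(xlo, xhi, ylo, yhi)` that are computed on the way down instead of being stored in the nodes.

```python
import math

class Node:
    def __init__(self, point=None, axis=0, split=0.0, left=None, right=None):
        self.point, self.axis, self.split = point, axis, split
        self.left, self.right = left, right

def build(points, depth=0):
    """Median split, alternating x and y; points end up at the leaves."""
    if len(points) == 1:
        return Node(point=points[0])
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return Node(axis=axis, split=pts[mid][axis],
                left=build(pts[:mid + 1], depth + 1),
                right=build(pts[mid + 1:], depth + 1))

PLANE = (-math.inf, math.inf, -math.inf, math.inf)   # region(root)

def contains(R, reg):
    """True if region reg is fully contained in rectangle R."""
    return R[0] <= reg[0] and reg[1] <= R[1] and R[2] <= reg[2] and reg[3] <= R[3]

def intersects(R, reg):
    return R[0] <= reg[1] and reg[0] <= R[1] and R[2] <= reg[3] and reg[2] <= R[3]

def child_regions(v, reg):
    lo, hi = list(reg), list(reg)
    lo[2 * v.axis + 1] = v.split   # left/below child: upper bound becomes the split
    hi[2 * v.axis] = v.split       # right/above child: lower bound becomes the split
    return tuple(lo), tuple(hi)

def report_subtree(v, out):
    """Report every point stored in the subtree of v."""
    if v.point is not None:
        out.append(v.point)
    else:
        report_subtree(v.left, out)
        report_subtree(v.right, out)

def search_kd_tree(v, R, reg=PLANE, out=None):
    out = [] if out is None else out
    if v.point is not None:
        x, y = v.point
        if R[0] <= x <= R[1] and R[2] <= y <= R[3]:
            out.append(v.point)
        return out
    for child, creg in zip((v.left, v.right), child_regions(v, reg)):
        if contains(R, creg):
            report_subtree(child, out)            # whole subtree lies in R
        elif intersects(R, creg):
            search_kd_tree(child, R, creg, out)   # partial overlap: recurse
    return out

pts = [(1, 9), (2, 3), (3, 7), (4, 1), (5, 8), (6, 2), (7, 6), (8, 4), (9, 10), (10, 5)]
print(sorted(search_kd_tree(build(pts), (2, 7, 2, 8))))
```

For another query shape such as a disk or a triangle, only the point test, `contains` and `intersects` would need to change.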
Query time analysis
Any vertical line intersects region(lc(root(T))) or region(rc(root(T))) but not both.
If a vertical line intersects region(lc(root(T))) it always intersects the regions corresponding to both children of lc(root(T)).
Query time analysis
The number of regions intersected by a vertical line in a kd-tree storing n points satisfies the recurrence:
$$Q(n) = \begin{cases} O(1) & \text{if } n = 1 \\ 2 + 2\,Q(n/4) & \text{if } n > 1 \end{cases}$$
Q(n) = O(√n) ⇒ time = O(√n + k) for a rectangular query.
The analysis is rather pessimistic: in many practical situations the query range is small and will intersect much fewer regions.
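The O(√n) bound follows by unfolding the recurrence. The sketch below assumes that n is a power of 4, which the slide does not state. Two levels below a node there are four subtrees of about n/4 points each, and a vertical line meets the regions of only two of them.

```latex
\begin{align*}
Q(n) &= 2 + 2\,Q(n/4) \\
     &= 2 + 4 + 4\,Q(n/16) \\
     &= \dots = \sum_{j=1}^{i} 2^{j} + 2^{i}\,Q(n/4^{i}) \\
     &= \left(2^{i+1} - 2\right) + 2^{i}\,Q(1), \qquad i = \log_4 n, \\
     &= O\!\left(2^{\log_4 n}\right) = O\!\left(n^{1/2}\right) = O(\sqrt{n}).
\end{align*}
```

The same bound holds for a horizontal line by symmetry. The boundary of a query rectangle consists of four axis-parallel segments, so it intersects O(√n) regions in total.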
Introduction
Given a distance function/metric:
Preprocess: set of points/objects P = {p1, . . . , pn} in d dimensions.
Query: Given a d-dimensional query point/object q, report the closest p ∈ P to q.
Motivation
Points model general objects (e.g. handwritten digits)
Distance between points is inversely related to the similarity of the objects they model.
Several applications
Machine Learning: clustering/classification.
Pattern Recognition and Classification
Searching multimedia databases.
NN in R
Sort/store the n points, use binary search for queries, then:
Preprocessing in O(n log n) time
Data structure requiring O(n) space
Answer the query in O(log n) time
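A short Python sketch of the one-dimensional case, assuming a non-empty, already sorted list of numbers; the function name `nearest_1d` is hypothetical. Binary search finds the insertion position of q, and the nearest neighbor is one of the at most two points adjacent to that position.

```python
from bisect import bisect_left

def nearest_1d(sorted_pts, q):
    """Nearest neighbor of q in a sorted, non-empty list, in O(log n)."""
    i = bisect_left(sorted_pts, q)                # first index whose value is >= q
    candidates = sorted_pts[max(i - 1, 0):i + 1]  # predecessor and successor, if present
    return min(candidates, key=lambda p: abs(p - q))

pts = sorted([30, 3, 23, 10, 19])   # preprocessing: O(n log n)
print(nearest_1d(pts, 20))
```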
NN in R^2
Preprocessing: Voronoi diagram in O(n log n).
Storage = O(n).
Given query q, find the cell it belongs to (point location) in O(log n).
NN = site of the cell containing q.
Exact NN in Rd
Is it faster than linear-time?
Curse of Dimensionality:
Complexity of Voronoi diagram grows rapidly = $O(n^{\lceil d/2 \rceil})$.
Planar point location methods do not extend to higher dimensions.
The volume of the space increases so fast that data becomes sparse
State of the art:
kd-trees: Sp = O(n), Query = $O(d \cdot n^{1-1/d})$. Most practical for d ≪ log n: O(log n) expected for “random” points.
Randomized [Clarkson’88]: Sp = $O(n^{\lceil d/2 \rceil + \delta})$, Q ≃ log n · exp(d).
n hyperplanes: point location $O(d^5 \log n)$, Sp = $O(n^{d+\delta})$ [Meiser’93].
Nearest Neighbor in high dimension
Exact NN
Given set P in d dimensions, and query point q, its NN is a point p0 ∈ P such that:
dist(p0, q) ≤ dist(p, q), ∀p ∈ P.
Approximate NN
Given set P in d dimensions, approximation factor 1 > ε > 0, and query point q, an ε-NN, or ANN, is any point p0 ∈ P such that:
dist(p0, q) ≤ (1 + ε) dist(p, q), ∀p ∈ P.
[Figure: a point set with the query point q, its nearest neighbor NN, two marked points x*1 and x*2, and the radii r and (1 + ε)r.]
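The two definitions can be checked by brute force, as in the following Python sketch. It assumes the Euclidean distance, which is only one possible choice of metric, and the names `exact_nn` and `is_eps_nn` are hypothetical. It needs Python 3.8 or later for `math.dist`.

```python
import math

def dist(p, q):
    return math.dist(p, q)   # Euclidean distance (an assumption of this sketch)

def exact_nn(P, q):
    """Exact NN by linear scan: O(d * n)."""
    return min(P, key=lambda p: dist(p, q))

def is_eps_nn(P, q, candidate, eps):
    """True if candidate is within (1 + eps) times the exact NN distance r."""
    r = dist(exact_nn(P, q), q)
    return dist(candidate, q) <= (1 + eps) * r

P = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
q = (0.4, 0.0)
print(exact_nn(P, q), is_eps_nn(P, q, (1.0, 0.0), 0.6))
```

Here r = 0.4, so with ε = 0.6 any point within distance 0.64 of q is an acceptable answer.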
Approximate NN in Rd
BBD tree [Arya, Mount et al. ’98]: optimal query for d = O(1).
In practice like kd-trees:
– cgal offers “lazy” kd-trees
– ann [Mount] for d ≤ 60
– flann [Lowe-Muja], kd-geraf [E-Samaras]: randomized
Locality sensitive hashing (LSH) for ε-NN: Sp = $O(dn^{1+\rho})$, Q = $O(dn^{\rho})$, $\rho = 1/(1+\varepsilon)^2$. [Indyk, Motwani ’98] [Panigrahy ’06] [Andoni, Indyk ’06]
Dimensionality reduction [Anagnostopoulos, E, Psarros ’15 ’17]: Sp = $O^*(dn)$, Q = $O^*(dn^{\rho})$, $\rho = 1 + \varepsilon^2/\log\varepsilon < 1$.
NN formulations
Standard computational geometry: the space is Euclidean R^d, for constant d.
Complex data: treat d as an asymptotic quantity and seek solutions having no exponential dependence on d.
Wish to treat arbitrary metric (nonvector) spaces.
Structure, especially for metric spaces: may assume a growth-limiting property, e.g. constant doubling dimension: every ball can be covered by a constant number of balls of half its radius (true for Euclidean space of constant dimension).
Grid for Uniform points
n uniformly distributed points in [0, 1]^d.
Cell structure (array) using parameter c = O(1):
– Expected #points per box = c (high c increases box search).
– Expected n/c boxes (high c reduces array size).
– Each box of volume = c/n, edge length $(c/n)^{1/d} < 1$.
Query lands in a box in O(1), checks points in box:
– Given current best distance, check $3^d - 1$ adjacent boxes.
– In expectation, O(1) boxes visited, in time O(c).
– Expected query time = $O(c + 3^d)$.
[Bentley-W.-Yao’80, Bentley’90].
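A rough Python sketch of such a grid, for illustration only. The class name `UniformGrid` and its methods are hypothetical, and a dictionary stands in for the array of boxes. The sketch assumes that all points and queries lie in [0, 1]^d and that d is small, since each ring of boxes is enumerated naively. It scans rings of boxes around the query box and stops once no unscanned box can hold a closer point.

```python
import itertools
import math
from collections import defaultdict

class UniformGrid:
    """Bucket grid over [0, 1]^d with about n/c boxes (c = expected points per box)."""

    def __init__(self, points, c=2.0):
        self.d = len(points[0])
        # m boxes per axis, so m**d is roughly n/c and the edge is about (c/n)**(1/d)
        self.m = max(1, round((len(points) / c) ** (1.0 / self.d)))
        self.h = 1.0 / self.m
        self.boxes = defaultdict(list)
        for p in points:
            self.boxes[self._box(p)].append(p)

    def _box(self, p):
        # Integer box index per coordinate; clamp so that x = 1.0 falls in the last box
        return tuple(min(int(x / self.h), self.m - 1) for x in p)

    def nearest(self, q):
        home = self._box(q)
        best, best_dist = None, math.inf
        for r in range(self.m):
            # Scan the ring of boxes at Chebyshev index distance exactly r from home
            for off in itertools.product(range(-r, r + 1), repeat=self.d):
                if max(abs(o) for o in off) != r:
                    continue
                box = tuple(b + o for b, o in zip(home, off))
                for p in self.boxes.get(box, ()):
                    dist = math.dist(p, q)
                    if dist < best_dist:
                        best, best_dist = p, dist
            # Any point not yet scanned is at least r * h away from q
            if best_dist <= r * self.h:
                break
        return best

grid = UniformGrid([(0.1, 0.1), (0.5, 0.5), (0.9, 0.2), (0.3, 0.8)], c=1.0)
print(grid.nearest((0.45, 0.55)))
```

Unless the query coincides with a data point, ring r = 1 is always scanned, which corresponds to checking the 3^d − 1 adjacent boxes.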
kd-trees
Assuming d > 2 but small.
Iterate through splitting coordinates; various strategies to pick them
Leaves contain ≥ 1 points; bound #levels.
NN search
Procedure NN(node), given query q
if node is leaf then
    Search all points in node, update best-dist
else {internal node}
    if split-coor(q) ≤ node’s split-value then
        NN(left-child) // standard branch
        if split-coor(q) + best-dist > node’s split-value then
            NN(right-child) // recurse to check
        end if
    else {split-coor(q) > node’s split-value}
        NN(right-child)
        if split-coor(q) − best-dist ≤ node’s split-value then
            NN(left-child)
        end if
    end if {left/right}
end if {internal node}
Overall topdown algorithm: NN(root).
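The procedure can be sketched in Python as follows. The names `Node`, `build`, `nn` and `nearest` are hypothetical, and the tree construction (cycling through the coordinates, median split, at most `leaf_size` points per leaf) is just one of the possible strategies. The Euclidean distance is assumed, and `math.dist` needs Python 3.8 or later.

```python
import math

class Node:
    def __init__(self, points=None, axis=0, split=0.0, left=None, right=None):
        self.points = points               # list of points at a leaf, None otherwise
        self.axis, self.split = axis, split
        self.left, self.right = left, right

def build(points, depth=0, leaf_size=2):
    """Median split, cycling through the d coordinates (leaf_size >= 1)."""
    if len(points) <= leaf_size:
        return Node(points=points)
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = (len(pts) - 1) // 2
    return Node(axis=axis, split=pts[mid][axis],
                left=build(pts[:mid + 1], depth + 1, leaf_size),
                right=build(pts[mid + 1:], depth + 1, leaf_size))

def nn(node, q, best):
    """best = [best-dist, best point], updated in place."""
    if node.points is not None:            # leaf: scan its points
        for p in node.points:
            d = math.dist(p, q)
            if d < best[0]:
                best[0], best[1] = d, p
        return
    qc = q[node.axis]                      # split-coor(q)
    if qc <= node.split:
        nn(node.left, q, best)             # standard branch
        if qc + best[0] > node.split:      # current ball crosses the splitting plane
            nn(node.right, q, best)
    else:
        nn(node.right, q, best)
        if qc - best[0] <= node.split:
            nn(node.left, q, best)

def nearest(root, q):
    best = [math.inf, None]
    nn(root, q, best)
    return best[1], best[0]

pts = [(0.1, 0.2, 0.3), (0.8, 0.1, 0.5), (0.4, 0.9, 0.7), (0.6, 0.6, 0.1), (0.2, 0.7, 0.9)]
print(nearest(build(pts), (0.5, 0.5, 0.5)))
```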
Complexity
Sp = O(d · n).
Construction of balanced tree: O(d · n log n) by sorting per dimension, O(n log n) by linear-time median computation.
(Few) Insert/delete operations in balanced kd-tree = O(log n).
Exact range query = $O(d \cdot n^{1-1/d} + k)$.
In practice, ANN ≃ O(log n) when d = O(1), since O(1) expected neighbors for random (e.g. uniform) distribution. See also BBD-trees: $O((d/\varepsilon)^d \log n)$.
Extension
k Nearest Neighbors
Store k current best points.
Current ball encloses k current best points.
Eliminate sibling if none of its points can be closer than the farthest of the k current best points, i.e. if the sibling region lies outside the current ball.
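A possible sketch of the k-NN extension, assuming k ≥ 1 and a simple tree encoding chosen for this example only (a leaf is a list of points, an internal node is a tuple `(dim, split, left, right)`). A max-heap of size k plays the role of the current ball.

```python
import heapq
import math

def build(points, depth=0, leaf_size=4):
    """Leaf = list of points; internal node = (dim, split, left, right)."""
    if len(points) <= leaf_size:
        return list(points)
    dim = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[dim])
    split = pts[(len(pts) - 1) // 2][dim]
    left = [p for p in pts if p[dim] <= split]
    right = [p for p in pts if p[dim] > split]
    if not right:                      # ties only: stop splitting
        return pts
    return (dim, split,
            build(left, depth + 1, leaf_size),
            build(right, depth + 1, leaf_size))

def knn(node, q, k, heap=None):
    """heap stores (-distance, point) for the k current best points."""
    if heap is None:
        heap = []
    if isinstance(node, list):         # leaf: update the k current best
        for p in node:
            d = math.dist(p, q)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, p))
        return heap
    dim, split, left, right = node
    near, far = (left, right) if q[dim] <= split else (right, left)
    knn(near, q, k, heap)
    # Radius of the current ball: distance to the farthest of the k best so far.
    radius = -heap[0][0] if len(heap) == k else math.inf
    if abs(q[dim] - split) <= radius:  # sibling region may intersect the ball
        knn(far, q, k, heap)
    return heap
```

The sibling is skipped exactly when its half-space lies outside the ball around q that encloses the k current best points.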
![Page 55: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/55.jpg)
Splitting at max spread
median of set closest to box centre
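One way to read this rule, as a small sketch: take the coordinate with the largest spread (max minus min) and split at the median of the points in that coordinate. The function name and the use of the lower median are assumptions for the example.

```python
import statistics

def max_spread_split(points):
    """Return (split dimension, split value) for a list of equal-length tuples."""
    d = len(points[0])
    # Spread of each coordinate = max - min over the pointset.
    spreads = [max(p[i] for p in points) - min(p[i] for p in points)
               for i in range(d)]
    dim = max(range(d), key=lambda i: spreads[i])
    # Lower median, so the split value is an actual data coordinate.
    value = statistics.median_low([p[dim] for p in points])
    return dim, value

print(max_spread_split([(0, 0), (1, 10), (2, 5), (3, 7)]))  # (1, 5)
```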
![Page 57: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/57.jpg)
Randomization
Construct:
Create r kd-trees s.t. searches are largely independent.
Find O(1) coord’s maximizing variance: Pick one randomly
May sample the data; split it about the mean of the sample.
Use bounded #levels; buckets contain several points.
Principal Component Analysis finds moment axes: rotate to align them with the coordinate axes. Or, random rotation.
Search:
Upper bound on total #nodes to be searched.
Priority queue stores candidates across r trees.
Similar effect to r lower-dim projections [Silpa-Anan,Hartley’08]
![Page 58: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/58.jpg)
FLANN: Fast Library for ANN
Typically r ≤ 6 independent trees.
Target d = 128, n > 10^4 (SIFT encoding of images).
Given data: Automatic choice of configuration, and algorithm (Randomized kd-trees, Hierarchical k-means trees)
[Lowe:IJCV04], software [Lowe,Muja]
![Page 59: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/59.jpg)
kd-GeRaF
Implement k-ANN
Simultaneous search, no backtracking.
Quickselect algorithm to find median in O(n)
Accelerated distance computations (dot product, see below)
Public domain C++: https://github.com/gsamaras/kd_GeRaF
(WebApp: 195.134.67.90:8080)
[Avrithis,E,Samaras’15]
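For reference, a generic Quickselect sketch that finds the k-th smallest value in expected linear time. This is the textbook algorithm with a random pivot, not code taken from kd-GeRaF; the function name and signature are assumptions.

```python
import random

def quickselect(values, k, rng=random):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(values)
    while True:
        pivot = rng.choice(items)
        lows = [v for v in items if v < pivot]
        pivots = [v for v in items if v == pivot]
        if k < len(lows):
            items = lows                      # answer is among the smaller values
        elif k < len(lows) + len(pivots):
            return pivot                      # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)      # answer is among the larger values
            items = [v for v in items if v > pivot]

coords = [7, 1, 5, 3, 9, 3, 8]
print(quickselect(coords, len(coords) // 2))  # 5, the median coordinate
```

Each round discards the side that cannot contain the answer, which is why the expected cost is linear rather than the O(n log n) of a full sort.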
![Page 60: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/60.jpg)
Randomization
Rotation: Every tree uses a randomly rotated pointset, thus using a different set of dimensions/coordinates.
Split dimension: Pick t dimensions of highest variance. Choose one randomly at every node while building the tree.
Split value: The pointset’s median in split dimension plus uniformly distributed δ ∈ [−3Δ/√d, 3Δ/√d], Δ = diameter of pointset.
Shuffling: The split value may be witnessed in several points; instead of always picking the same point, shuffle them to break ties.
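A hedged sketch of the split-dimension and split-value rules. It assumes the shift interval is [−3Δ/√d, 3Δ/√d] and, to keep the example cheap, substitutes the bounding-box diagonal for the true diameter Δ; both the function name and that substitution are assumptions, not the library's code.

```python
import math
import random
import statistics

def randomized_split(points, t, rng):
    """Pick a random dimension among the t of highest variance and a shifted median."""
    d = len(points[0])
    variances = [statistics.pvariance([p[i] for p in points]) for i in range(d)]
    top = sorted(range(d), key=lambda i: variances[i], reverse=True)[:t]
    dim = rng.choice(top)
    median = statistics.median([p[dim] for p in points])
    # Assumption: bounding-box diagonal as an upper estimate of the diameter.
    lo = [min(p[i] for p in points) for i in range(d)]
    hi = [max(p[i] for p in points) for i in range(d)]
    shift = 3 * math.dist(lo, hi) / math.sqrt(d)
    return dim, median + rng.uniform(-shift, shift)

rng = random.Random(0)
pts = [(float(i), 1.0, float(3 * i)) for i in range(10)]
dim, value = randomized_split(pts, 2, rng)   # dim is 0 or 2, never the constant dim 1
```

Different seeds give different trees over the same pointset, which is what makes searches across the forest largely independent.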
![Page 61: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/61.jpg)
Performance
Parameters (auto or manual)
r Number of trees in forest (points are stored once)
t Number of hi-variance dimensions used for splits
Maximum number of points-per-leaf
c Maximum number of leaves to be checked during search
ε Determine search accuracy
Practical complexity
Automatic parameter configuration yields fastest preprocessing, successful trade-off between accuracy and speed.
Most competing methods suffer from slow parameter configuration, running out of memory, unstable search behaviour.
![Page 62: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/62.jpg)
Implementation
Search
Descend every tree to leaf, store unvisited branch nodes in min-priority queue Q.
Examine nodes in Q, until c/(1 + ε) leaves are checked.
On descending a tree:
– at leaf: update currently best distance.
– at node: if query in the left half-space: insert right child to Q, descend to left child; or vice versa.
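The search over the forest might look roughly like this. It is a simplified sketch under assumptions: trees are encoded as nested tuples/lists, split dimensions are drawn uniformly at random, the queue key is the distance from the query to the splitting hyperplane, and `max_leaves` stands for the leaf budget c/(1 + ε).

```python
import heapq
import itertools
import math
import random

def build(points, rng, leaf_size=4):
    """Leaf = list of points; internal node = (dim, split, left, right)."""
    if len(points) <= leaf_size:
        return list(points)
    dim = rng.randrange(len(points[0]))      # assumption: random split dimension
    pts = sorted(points, key=lambda p: p[dim])
    split = pts[(len(pts) - 1) // 2][dim]
    left = [p for p in pts if p[dim] <= split]
    right = [p for p in pts if p[dim] > split]
    if not right:
        return pts
    return (dim, split, build(left, rng, leaf_size), build(right, rng, leaf_size))

def forest_search(trees, q, max_leaves):
    """Best-first search over all trees with one shared min-priority queue."""
    best = (math.inf, None)
    tie = itertools.count()                  # tie-breaker: nodes are never compared
    queue = [(0.0, next(tie), root) for root in trees]
    checked = 0
    while queue and checked < max_leaves:
        _, _, node = heapq.heappop(queue)
        while isinstance(node, tuple):       # descend, queueing the unvisited branch
            dim, split, left, right = node
            near, far = (left, right) if q[dim] <= split else (right, left)
            heapq.heappush(queue, (abs(q[dim] - split), next(tie), far))
            node = near
        for p in node:                       # at leaf: update currently best distance
            d = math.dist(p, q)
            if d < best[0]:
                best = (d, p)
        checked += 1
    return best
```

For example, with hypothetical values c = 64 and ε = 0.5 one would call `forest_search(trees, q, int(64 / 1.5))`. There is no backtracking inside a tree: unvisited branches are revisited only through the queue, and only while the leaf budget lasts.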
Distance computation
‖x − q‖² = ‖x‖² + ‖q‖² − 2q · x, where the first two terms can be stored. Offers up to 10% speedup.
Project idea: ‖x − q‖² − ‖y − q‖² reduces to 2q · (y − x) plus the stored difference ‖x‖² − ‖y‖².
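A small sketch of both identities, ‖x − q‖² = ‖x‖² + ‖q‖² − 2q · x and ‖x − q‖² − ‖y − q‖² = ‖x‖² − ‖y‖² + 2q · (y − x), with the squared norms computed once and passed in. The helper names are assumptions for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_norm(v):
    return dot(v, v)

def sq_dist_precomputed(x, x_norm2, q, q_norm2):
    """Squared distance using stored squared norms: one dot product per point."""
    return x_norm2 + q_norm2 - 2 * dot(q, x)

def sq_dist_difference(x, x_norm2, y, y_norm2, q):
    """||x - q||^2 - ||y - q||^2 without computing either distance."""
    return x_norm2 - y_norm2 + 2 * dot(q, [b - a for a, b in zip(x, y)])

x, y, q = (1, 2, 3), (0, 0, 1), (4, 6, 8)
print(sq_dist_precomputed(x, sq_norm(x), q, sq_norm(q)))      # 50
print(sq_dist_difference(x, sq_norm(x), y, sq_norm(y), q))    # -51, so x is closer to q
```

The sign of the difference is enough to decide which of two candidates is closer to the query.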
![Page 63: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/63.jpg)
Experiments
Faster than ann/bbd, flann for d ≥ 1,000 (up to 10,000), n ≤ 10^6.
(i) SIFT images: n = 10^6, d = 128, BBD out of memory.
GIST: d = 960 (ii) n = 10^5 (iii) n = 10^6, query < 1s, 90% exact.
![Page 64: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/64.jpg)
Experiments on Oxford set, CroW features (neural nets)
n = 5062 images, d = 512. Brute force: 5.22 sec. Build takes 2 sec for kd-GeRaF.

| points per leaf | trees no | t | max leaf check | miss(%) | time(ms) |
|---|---|---|---|---|---|
| 1 | 1 | 4 | 2 | 4 | 0.2 |
| 1 | 1 | 4 | 4 | 0 | 0.3 |
| 1 | 4 | 4 | 4 | 0 | 0.5 |

Search with "Noisy" queries.

| points per leaf | trees no | t | max leaf check | miss(%) | time(s) |
|---|---|---|---|---|---|
| 16 | 8 | 32 | 32 | 3.6 | 0.01 |
| 16 | 32 | 64 | 64 | 0 | 0.03 |
| 16 | 64 | 64 | 4 | 0 | 0.02 |

Search with Oxford queries.
![Page 66: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/66.jpg)
BBD-trees
Box: set theoretic difference of two boxes, one enclosed in the other (inner is optional)
"Empirical runtimes for most distributions show little/no practical advantage over kd-trees" [Arya,Mount,et al’94,98].
Complexity:
O(1) points per leaf, space = O(dn).
Height = O(log n): every 4 levels reduce #points by > 2/3.
Construct = O(dn log n).
k ε-NNs in time O(((d/ε)^d + k) log n).
Dynamic: point insertion/deletion = O(log n).
![Page 67: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/67.jpg)
Construction
Tree constructed by applying 2 operations, when cell contains > 1 points:
(Fair) Split:
– by hyperplane parallel to a coordinate plane, through midpoint.
– If inner box exists, do not intersect it.
– Exponential decrease of region size (quadtree).
Shrink:
– partitions box into inner and outer boxes.
– If inner box exists, it lies inside new inner box.
– Exponential decrease in number of points per cell (kd-tree).
Two strategies:
– Splits and Shrinks alternate.
– Split until both children have < 2/3 of parent’s points, then Shrink.
![Page 68: Computational Geometry - Search in High dimension and kd-trees · 2018-05-15 · Computational Geometry Search in High dimension and kd-trees Ioannis Emiris Dept Informatics & Telecoms,](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78147edd50ce1f666c0535/html5/thumbnails/68.jpg)
ANN search
Algorithm
1 Find leaf that contains query q; min-dist δ from q to points in leaf
2 Order leaves in increasing distance from q (priority search).
3 Find closest leaf to q, compute min-distance δ from q to points in cell
4 While distance of next closest leaf < δ/(1 + ε), compute min-distance between q and points in cell: if < δ, this distance becomes δ.
Time Complexity
Point location = O(log n).
#cells explored = c < (1 + 6d/ε)^d.
ANN query = O(cd log n).
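To get a feel for the constant, the bound c < (1 + 6d/ε)^d on the number of explored cells can be evaluated directly. The helper below is only arithmetic on that formula; its name is an assumption.

```python
def cell_bound(d, eps):
    """Upper bound (1 + 6d/eps)^d on the number of cells explored by an ANN query."""
    return (1 + 6 * d / eps) ** d

print(cell_bound(2, 0.5))   # 625.0
print(cell_bound(3, 1.0))   # 6859.0
print(cell_bound(16, 1.0))  # astronomically large: the bound is exponential in d
```

The query time O(cd log n) is therefore logarithmic in n only for fixed, small d.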