1
Efficient Algorithms for Non-parametric Clustering With Clutter
Weng-Keen Wong, Andrew Moore
(In partial fulfillment of the speaking requirement)
2
Problems From the Physical Sciences
Minefield detection
(Dasgupta and Raftery 1998)
Earthquake faults
(Byers and Raftery 1998)
3
Problems From the Physical Sciences
(Pereira 2002) (Sloan Digital Sky Survey 2000)
4
A Simplified Example
5
Clustering with Single Linkage Clustering
(Figure panels: Single Linkage Clustering MST; Clusters)
6
Clustering with Mixture Models
(Figure panels: Mixture of Gaussians with a Uniform Background Component; Resulting Clusters)
7
Clustering with CFF
(Figure panels: Original Dataset; Cuevas-Febrero-Fraiman)
8
Related Work
(Dasgupta and Raftery 98) Mixture model approach: a mixture of Gaussians for the features, a Poisson process for the clutter
(Byers and Raftery 98) K-nearest-neighbour distances for all points are modeled as a mixture of two gamma distributions, one for the clutter and one for the features
Classify each data point based on the component it was most likely generated from
9
Outline
1. Introduction: Clustering and Clutter
2. The Cuevas-Febrero-Fraiman Algorithm
3. Optimizing Step One of CFF
4. Optimizing Step Two of CFF
5. Results
10
The CFF Algorithm Step One
Find the high-density datapoints
11
The CFF Algorithm Step Two
Cluster the high-density points using Single Linkage Clustering
Stop when link length > ε
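As a reference point, Step Two on its own can be written by brute force. The sketch below is a minimal Python illustration, not the authors' code: it assumes the high-density points are given as coordinate tuples and that the stopping threshold (written ε here, `eps` in code) is inclusive, so a link of length exactly ε still merges. Merging every pair within ε with union-find gives the same clusters as single linkage stopped once the next link exceeds ε, but it costs O(N²) distance evaluations, which is what the rest of the talk sets out to avoid. The function name `eps_single_linkage` is hypothetical.

```python
import math


def eps_single_linkage(points, eps):
    """Label points by single-linkage clusters, stopping at link length eps.

    Brute force: every pair at distance <= eps is merged with union-find,
    which gives the same clusters as single linkage cut at eps.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel the union-find roots as consecutive cluster ids 0, 1, 2, ...
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]
```

For example, `eps_single_linkage([(0, 0), (0.5, 0), (1.0, 0), (5, 5), (5.4, 5)], 0.6)` chains the first three points together through links of length 0.5 and returns `[0, 0, 0, 1, 1]`.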
12
The CFF Algorithm
Originally intended to estimate the number of clusters
Can also be used to find clusters against a noisy background
13
Step One: Density Estimators
Finding high density points requires a density estimator
Want to make as few assumptions about the underlying density as possible
Use a non-parametric density estimator
14
A Simple Non-Parametric Density Estimator
A datapoint is a high-density datapoint if the number of datapoints within a hypersphere of radius h is > threshold c
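A direct O(N²) rendering of this test is shown below purely as an illustration; the function name `high_density_points` is hypothetical. Two details are assumptions the slide does not settle: the point itself counts toward its own neighbourhood, and a point at distance exactly h counts as inside the hypersphere.

```python
import math


def high_density_points(points, h, c):
    """Return indices of points with more than c datapoints within radius h.

    Brute force over all pairs. Assumption: the point itself is counted
    (its distance to itself is 0 <= h).
    """
    keep = []
    for idx, p in enumerate(points):
        count = sum(1 for q in points if math.dist(p, q) <= h)
        if count > c:
            keep.append(idx)
    return keep
```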
15
Speeding up the Non-Parametric Density Estimator
Addressed in a separate paper (Gray and Moore 2001)
Two basic ideas:
1. Use a dual-tree algorithm (Gray and Moore 2000)
2. Cut the search off early without computing exact densities (Moore 2000)
16
Step Two: Euclidean Minimum Spanning Trees (EMSTs)
Traditional MST algorithms assume you are given all the pairwise distances
This implies $O(N^2)$ memory usage
We want to use a Euclidean Minimum Spanning Tree algorithm instead
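To illustrate the memory point, the sketch below (hypothetical helper `prim_emst`) runs dense Prim's algorithm directly on coordinates and computes each distance only when it is needed. This is a standard textbook technique, not something from the talk: it shows that the O(N²) distance table can be avoided, but it still performs O(N²) distance computations, which is why a geometric EMST algorithm is attractive for large N.

```python
import math


def prim_emst(points):
    """Euclidean MST via dense Prim's algorithm.

    Distances are computed on the fly, so memory is O(N) rather than the
    O(N^2) needed to store every pairwise distance; time is still O(N^2 d).
    Returns a list of (i, j, length) edges.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest known link from each vertex to the tree
    link = [-1] * n        # tree vertex that achieves best[v]
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        if link[u] >= 0:
            edges.append((link[u], u, best[u]))
        # Relax the links from u to every vertex outside the tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return edges
```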
17
Optimizing Clustering Step
Exploit recent results in computational geometry for efficient EMSTs
Involves a modification to the GeoMST2 algorithm (Narasimhan et al. 2000)
GeoMST2 is based on Well-Separated Pairwise Decompositions (WSPDs) (Callahan 1995)
Our optimizations give nearly an order of magnitude speedup, especially in higher dimensions
18
Outline for Optimizing Step Two
1. High-level overview of GeoMST2
2. Properties of a WSPD
3. How to create a WSPD
4. More detailed description of GeoMST2
5. Our optimizations
19
Intuition behind GeoMST2
20
Intuition behind GeoMST2
21
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
Well-Separated Pairwise
Decomposition
22
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
Well-Separated Pairwise
Decomposition
Each Pair (Ai,Bi) represents a possible edge in the MST
23
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
1. Create the Well-Separated Pairwise Decomposition
2. Take the pair (Ai,Bi) that corresponds to the shortest edge
3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2.
24
A Well-Separated Pair (Callahan 1995)
Let A and B be point sets in $\mathbb{R}^d$
Let $R_A$ and $R_B$ be their respective bounding hyper-rectangles
Define MargDistance(A,B) to be the minimum distance between $R_A$ and $R_B$
25
A Well-Separated Pair (Cont.)
The point sets A and B are considered to be well-separated if:
$\text{MargDistance}(A,B) \geq \max\{\text{Diam}(R_A), \text{Diam}(R_B)\}$
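Both quantities in this test can be computed from the bounding boxes alone. The sketch below assumes the condition reads MargDistance(A,B) ≥ max{Diam(R_A), Diam(R_B)}, with Diam taken to be the length of a rectangle's main diagonal; the helper names are illustrative, not from GeoMST2. Note that two distinct single points always pass, since both diameters are 0.

```python
import math


def bounding_box(points):
    """Return the (lo, hi) corner vectors of the bounding hyper-rectangle."""
    dims = range(len(points[0]))
    lo = [min(p[k] for p in points) for k in dims]
    hi = [max(p[k] for p in points) for k in dims]
    return lo, hi


def diam(box):
    """Diameter of a hyper-rectangle: the length of its main diagonal."""
    lo, hi = box
    return math.dist(lo, hi)


def marg_distance(box_a, box_b):
    """Minimum distance between two axis-aligned hyper-rectangles."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    # Per-dimension gap; 0 where the two intervals overlap.
    gaps = [max(lo_b[k] - hi_a[k], lo_a[k] - hi_b[k], 0.0) for k in range(len(lo_a))]
    return math.hypot(*gaps)


def is_well_separated(points_a, points_b):
    """Assumed test: MargDistance(A, B) >= max(Diam(R_A), Diam(R_B))."""
    ra, rb = bounding_box(points_a), bounding_box(points_b)
    return marg_distance(ra, rb) >= max(diam(ra), diam(rb))
```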
26
Interaction Product
The interaction product between two point sets A and B is defined as:
$A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$
27
Interaction Product
The interaction product between two point sets A and B is defined as:
$A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$
This is the set of all distinct pairs with one element
in the pair from A and the other element from B
28
Interaction Product Definition
The interaction product between two point sets A and B is defined as:
$A \otimes B = \{\{p,p'\} \mid p \in A,\ p' \in B,\ p \neq p'\}$
For example:
A = {1,2,3}, B = {4,5}
$A \otimes B$ = {{1,4}, {1,5}, {2,4}, {2,5}, {3,4}, {3,5}}
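In code, the interaction product is a set comprehension over unordered pairs. In this minimal sketch (the function name `interaction_product` is hypothetical) each pair is stored as a `frozenset`, so {1,4} and {4,1} are the same element, and pairs with p = p' are skipped as the definition requires.

```python
from itertools import product


def interaction_product(a, b):
    """All distinct unordered pairs {p, p'} with p in A, p' in B, p != p'."""
    return {frozenset((p, q)) for p, q in product(a, b) if p != q}
```

Calling `interaction_product(range(5), range(5))` returns the 10 edges of the complete undirected graph on five vertices, one per unordered pair of distinct points.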
29
Interaction Product
Now let A and B be the same point set, i.e. A = B = {0,1,2,3,4}
$A \otimes B$ = {{0,1}, {0,2}, {0,3}, {0,4},
{1,2}, {1,3}, {1,4},
{2,3}, {2,4},
{3,4}}
30
Interaction Product
Now let A and B be the same point set, i.e. A = B = {0,1,2,3,4}
$A \otimes B$ = {{0,1}, {0,2}, {0,3}, {0,4},
{1,2}, {1,3}, {1,4},
{2,3}, {2,4},
{3,4}}
Think of this as all possible edges in a complete, undirected graph with {0,1,2,3,4} as the vertices
31
A Well-Separated Pairwise Decomposition
Pair #1:
([0],[1])
Pair #2:
([0,1], [2])
Pair #3:
([0,1,2],[3,4])
Pair #4:
([3], [4])
Claim:
The set of pairs {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} forms a Well-Separated Pairwise Decomposition.
32
Interaction Product Properties
If P is a point set in $\mathbb{R}^d$, then a WSPD of P is a set of pairs $(A_1,B_1),\ldots,(A_k,B_k)$ with the following properties:
1. $A_i \subseteq P$ and $B_i \subseteq P$ for all $i = 1,\ldots,k$
2. $A_i \cap B_i = \emptyset$ for all $i = 1,\ldots,k$
With P = {0,1,2,3,4}, the pairs {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} clearly satisfy Properties 1 and 2
33
Interaction Product Property 3
3. $(A_i \otimes B_i) \cap (A_j \otimes B_j) = \emptyset$ for all $i, j$ such that $i \neq j$
From {([0],[1]), ([0,1],[2]), ([0,1,2],[3,4]), ([3],[4])} we get the following interaction products:
$A_1 \otimes B_1$ = {{0,1}}
$A_2 \otimes B_2$ = {{0,2},{1,2}}
$A_3 \otimes B_3$ = {{0,3},{1,3},{2,3},{0,4},{1,4},{2,4}}
$A_4 \otimes B_4$ = {{3,4}}
These interaction products are all disjoint
34
Interaction Product Property 4
4. $P \otimes P = \bigcup_{i=1}^{k} (A_i \otimes B_i)$
$P \otimes P$ = {{0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}}
$A_1 \otimes B_1$ = {{0,1}}
$A_2 \otimes B_2$ = {{0,2},{1,2}}
$A_3 \otimes B_3$ = {{0,3},{1,3},{2,3},{0,4},{1,4},{2,4}}
$A_4 \otimes B_4$ = {{3,4}}
The union of the above interaction products gives back $P \otimes P$
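Properties 3 and 4 can be checked mechanically for the five-point example. The sketch below is a hypothetical checker, not part of GeoMST2; it treats the points as plain labels, so it cannot test Property 5, which depends on the actual coordinates.

```python
from itertools import combinations, product


def interaction_product(a, b):
    """All distinct unordered pairs {p, p'} with p in A, p' in B, p != p'."""
    return {frozenset((p, q)) for p, q in product(a, b) if p != q}


def check_properties_3_and_4(points, pairs):
    """Return (disjoint, covers) for a candidate WSPD given as (A_i, B_i) pairs."""
    products = [interaction_product(a, b) for a, b in pairs]
    # Property 3: no unordered pair appears in two different interaction products.
    disjoint = all(not (x & y) for x, y in combinations(products, 2))
    # Property 4: together the interaction products give back P (x) P.
    covers = set().union(*products) == interaction_product(points, points)
    return disjoint, covers


example = [([0], [1]), ([0, 1], [2]), ([0, 1, 2], [3, 4]), ([3], [4])]
print(check_properties_3_and_4(range(5), example))  # (True, True)
```

Dropping the last pair ([3],[4]) leaves the products disjoint but no longer covering, so the checker then returns `(True, False)`.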
35
Interaction Product Property 5
5. $A_i$ and $B_i$ are well-separated for all $i = 1,\ldots,k$
36
Two Points to Note about WSPDs
Two distinct points are considered to be well-separated
For any data set of size n, there is a trivial WSPD of size $\binom{n}{2}$: one pair of singletons ({p}, {q}) for every two distinct points p and q
37
A Well-Separated Pairwise Decomposition (Continued)
If there are n points in P, a WSPD of P with $O(n)$ pairs can be constructed in $O(n \log n)$ time using a fair split tree (Callahan 1995)
38
A Fair Split Tree
39
Creating a WSPD
Are the nodes outlined in yellow well-separated? No.
40
Creating a WSPD
Recurse on children of node with widest dimension
41
Creating a WSPD
Recurse on children of node with widest dimension
42
Creating a WSPD
Recurse on children of node with widest dimension
43
Creating a WSPD
And so on…
44
Base Case
Eventually you will find a well-separated pair of nodes. Add this pair to the WSPD.
45
Another Example of the Base Case
46
Creating a WSPD
FindWSPD(W, NodeA, NodeB)
    if IsWellSeparated(NodeA, NodeB)
        AddPair(W, NodeA, NodeB)
    else
        if MaxHrectDimLength(NodeA) < MaxHrectDimLength(NodeB)
            Swap(NodeA, NodeB)
        FindWSPD(W, NodeA->Left, NodeB)
        FindWSPD(W, NodeA->Right, NodeB)
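A runnable Python version of FindWSPD needs a tree to recurse on. The sketch below rests on assumptions that are not in the slides: the points are distinct coordinate tuples; each node is split at the midpoint of the widest side of its bounding box (one simple way to get a fair split tree); well-separation is MargDistance ≥ max diameter, with Diam taken as the box diagonal; and, following Callahan's construction, the recursion is started on the two children of every internal node. It shows the recursion only and makes no attempt to meet Callahan's complexity bounds.

```python
import math


class Node:
    """Fair split tree node: its points, bounding box (lo, hi) and children."""

    def __init__(self, pts):
        self.pts = pts
        dims = range(len(pts[0]))
        self.lo = [min(p[k] for p in pts) for k in dims]
        self.hi = [max(p[k] for p in pts) for k in dims]
        self.left = self.right = None
        if len(pts) > 1:
            # Split the widest side of the bounding box at its midpoint.
            k = max(dims, key=lambda j: self.hi[j] - self.lo[j])
            mid = (self.lo[k] + self.hi[k]) / 2
            left = [p for p in pts if p[k] <= mid]
            right = [p for p in pts if p[k] > mid]
            if left and right:
                self.left, self.right = Node(left), Node(right)

    def max_hrect_dim_length(self):
        return max(hi - lo for lo, hi in zip(self.lo, self.hi))

    def diam(self):
        return math.dist(self.lo, self.hi)


def marg_distance(a, b):
    """Minimum distance between the bounding boxes of nodes a and b."""
    gaps = [max(b.lo[k] - a.hi[k], a.lo[k] - b.hi[k], 0.0) for k in range(len(a.lo))]
    return math.hypot(*gaps)


def is_well_separated(a, b):
    return marg_distance(a, b) >= max(a.diam(), b.diam())


def find_wspd(w, a, b):
    """FindWSPD from the slide; appends (A_i, B_i) point lists to w."""
    if is_well_separated(a, b):
        w.append((a.pts, b.pts))
        return
    if a.max_hrect_dim_length() < b.max_hrect_dim_length():
        a, b = b, a  # recurse on the node with the widest dimension
    find_wspd(w, a.left, b)
    find_wspd(w, a.right, b)


def build_wspd(root):
    """Start FindWSPD on the two children of every internal node."""
    w, stack = [], [root]
    while stack:
        node = stack.pop()
        if node.left is not None:
            find_wspd(w, node.left, node.right)
            stack.extend((node.left, node.right))
    return w
```

With distinct points, every unordered pair of points ends up in exactly one returned (A_i, B_i), which is Properties 3 and 4 together. The swap never selects a leaf when the pair fails the test, because failing requires a positive diameter on at least one side.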
47
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
1. Create the Well-Separated Pairwise Decomposition
2. Take the pair (Ai,Bi) that corresponds to the shortest edge
3. If the vertices of that edge are not in the same connected component, add the edge to the MST. Repeat Step 2
48
Bichromatic Closest Pair Distance
Given two sets (Ai,Bi), the Bichromatic Closest Pair (BCP) distance is the smallest distance from a point in Ai to a point in Bi
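A brute-force BCP takes |Ai|·|Bi| distance evaluations. The helper below is a hypothetical illustration that also returns the two endpoints, since building the MST needs the edge itself and not only its length.

```python
import math


def bcp(points_a, points_b):
    """Bichromatic closest pair by brute force.

    Returns (distance, p, q) with p from A and q from B.
    """
    return min(
        ((math.dist(p, q), p, q) for p in points_a for q in points_b),
        key=lambda t: t[0],
    )
```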
49
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
1. Create the Well-Separated Pairwise Decomposition
2. Take the pair (Ai,Bi) with the shortest BCP distance
3. If Ai and Bi are not already connected, add the edge to the MST. Repeat Step 2.
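The three steps above can be sketched as a Kruskal-style loop. This is a deliberately simplified illustration, not the GeoMST2 implementation: it takes the WSPD as a list of (Ai, Bi) index lists, computes every BCP eagerly by brute force (GeoMST2 computes them lazily), keeps them in a heap, and uses union-find for the connectivity test in Step 3. Assuming well-separation means MargDistance ≥ max diameter and all pairwise distances are distinct, a cycle-property argument shows each EMST edge is the BCP edge of its own pair, so the loop returns the EMST. The function name `mst_from_wspd` is hypothetical.

```python
import heapq
import math


def mst_from_wspd(points, pairs):
    """Kruskal-style loop over the BCP edges of a WSPD (simplified GeoMST2).

    points: list of coordinate tuples.
    pairs:  list of (A_i, B_i), each a list of indices into points.
    Returns MST edges as (p, q, length) in the order they are accepted.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Step 2 queue: the brute-force BCP of every pair, keyed on its length.
    heap = [
        min((math.dist(points[p], points[q]), p, q) for p in a for q in b)
        for a, b in pairs
    ]
    heapq.heapify(heap)

    mst = []
    while heap and len(mst) < len(points) - 1:
        length, p, q = heapq.heappop(heap)
        rp, rq = find(p), find(q)
        if rp != rq:  # Step 3: only link two different components
            parent[rp] = rq
            mst.append((p, q, length))
    return mst
```

For the five-point WSPD example, with the points placed on a line at x = 0, 1, 3, 10, 11 (coordinates chosen here so that all four pairs are well-separated), the loop accepts the edges (0,1), (3,4), (1,2) and (2,3), in that order.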
50
GeoMST2 Example Start
Current MST
51
GeoMST2 Example Iteration 1
Current MST
52
GeoMST2 Example Iteration 2
Current MST
53
GeoMST2 Example Iteration 3
Current MST
54
GeoMST2 Example Iteration 4
Current MST
55
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
1. Create the Well-Separated Pairwise Decomposition
2. Take the pair (Ai,Bi) with the shortest BCP distance
3. If Ai and Bi are not already connected, add the edge to the MST. Repeat Step 2.
Modification for CFF:
If BCP distance > ε, terminate
56
Optimizations
We don't need the EMST; we just need to cluster all points that are within distance ε or less of each other
This allows two optimizations to the GeoMST2 code
57
High Level Overview of GeoMST2
(A1,B1)
(A2,B2)
. . .(Am,Bm)
1. Create the Well-Separated Pairwise Decomposition
2. Take the pair (Ai,Bi) with the shortest BCP distance
3. If Ai and Bi are not already connected, add the edge to the MST. Repeat Step 2.
Optimizations take place in Step 1
58
Recall: How to Create the WSPD
59
Optimization 1 Illustration
60
Optimization 1
Ignore all links that are > ε
Every pair (Ai, Bi) in the WSPD becomes an edge unless it joins two already connected components
If MargDistance(Ai,Bi) > ε, then an edge of length ≤ ε cannot exist between a point in Ai and a point in Bi
Don't include such a pair in the WSPD
61
Optimization 2 Illustration
62
Optimization 2
Join all elements that are within distance ε of each other
If the maximum distance between the bounding hyper-rectangles of Ai and Bi is ≤ ε, then join all the points in Ai and Bi if they are not already connected
Do not add such a pair (Ai,Bi) to the WSPD
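Optimizations 1 and 2 both reduce to distance bounds between bounding boxes. The sketch below is an assumed formulation with hypothetical function names: boxes are (lo, hi) corner lists, a candidate pair is dropped when even the closest points of the two boxes are more than ε apart (Optimization 1), joined outright when even the farthest points are within ε (Optimization 2), and otherwise kept in the WSPD.

```python
import math


def min_box_distance(box_a, box_b):
    """MargDistance: smallest distance between two hyper-rectangles (lo, hi)."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    return math.hypot(*(max(lo_b[k] - hi_a[k], lo_a[k] - hi_b[k], 0.0) for k in range(len(lo_a))))


def max_box_distance(box_a, box_b):
    """Largest distance between a point of one hyper-rectangle and a point of the other."""
    (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
    return math.hypot(*(max(hi_b[k] - lo_a[k], hi_a[k] - lo_b[k]) for k in range(len(lo_a))))


def classify_pair(box_a, box_b, eps):
    """Decide what the epsilon-clustering variant does with a candidate pair."""
    if min_box_distance(box_a, box_b) > eps:
        return "drop"  # Optimization 1: no link <= eps can cross this pair
    if max_box_distance(box_a, box_b) <= eps:
        return "join"  # Optimization 2: every cross link is <= eps, so union A and B
    return "keep"      # otherwise the pair goes into the WSPD as usual
```

The "join" case is safe because every point of Ai is then within ε of every point of Bi, so all of Ai ∪ Bi falls into one cluster regardless of the links inside Ai or Bi.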
63
Implications of the optimizations
Reduce the amount of time spent in creating the WSPD
Reduce the number of pairs in the WSPD, thereby speeding up the GeoMST2 algorithm by reducing the size of the priority queue
64
Results
Ran the Step Two algorithms on subsets of the Sloan Digital Sky Survey
7 attributes: 4 colors, 2 sky coordinates, 1 redshift value
Compared Kruskal, GeoMST2, and ε-clustering
65
Results (GeoMST2 vs ε-Clustering vs Kruskal in 4D)
66
Results (GeoMST2 vs ε-Clustering in 3D)
67
Results (GeoMST2 vs ε-Clustering in 4D)
68
Results (Change in Time as ε Changes for 4D Data)
69
Results (Increasing Dimensions vs Time)
70
Future Work
More accurate, faster non-parametric density estimator
Use ball trees instead of the fair split tree
Optimize the algorithm for the case where h is kept constant but c and ε vary
71
Conclusions
ε-clustering outperforms GeoMST2 by nearly an order of magnitude in higher dimensions
Combining the optimizations in both steps will yield an efficient algorithm for clustering against clutter on massive data sets