26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University...

26
/26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi Rice University

Transcript of 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University...

Page 1: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/261

FLANNFast Library for Approximate

Nearest NeighborsMarius Muja and David G. Lowe

University of British Columbia

Presented byMohammad Sadegh Riazi

Rice University

Page 2: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 3: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 4: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. System for automatically choosing the best algorithm and optimum parameters depending on the dataset.

What is FLANN?

Page 5: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Which Programming Languages does it support?

Written in C++Contains a binding for:• C• MATLAB• Python

Page 6: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 7: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Applications

• Cluster analysis• Pattern Recognition• Statistical classification• Computational Geometry• Data compression• Database

Page 8: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 9: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

ApproachesMultiple Randomized k-d Tree Algorithm

• Searching multiple trees in parallel• Splitting dimension is randomly chosen from top ND dimensions with highest variance• ND is fixed to 5• Usually 20 trees is used • Best performance in most of data sets

Page 10: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

ApproachesThe priority search K-Means Tree algorithm

• Partitioning data into K distinct regions • Recursively partitioning each zone until the leaf node which has no more than K items• Pick up the initial centers using random selection Gonzales’ algorithm• I max is number of iterations of making regions • Better performance than k-d tree for higher precisions

Page 11: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

ApproachesComplexity Comparison

Page 12: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 13: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Experiments Data Dimensionality

• Has a great impact on the nearest neighbor matching performance • The decrease or increase in performance is highly correlated with the type of data• For Random data samples it will highly decreases

Page 14: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Experiments Data Dimensionality

• However for Image Patches and real life data, the performance will increases as dimensionality increases. It can be explained by the fact that each dimension gives us some information about the other dimensions so with few search iterations it’s more likely to find the exact NN

Page 15: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Experiments Search Precision

• The desired search precision determines the degree of speedup • Accepting precision as low as 60% we can achieve a speedup of three orders of magnitude

Page 16: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Experiments Automatic Selection of the Optimal Algorithm

Algorithm is a parameter itself Each algorithm can have different performance with different data sets Each algorithm has some internal parameters Dimensionality has a great impact Size and Structure of data (Correlation?) Desired Precision k-means tree & randomized Kd-trees have best performances in most data sets

Page 17: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Experiments How to find the best internal parameters

• First using Global Grid Search to find a zone on parameter plane in order to achieve better performance• Then Local optimizing using Nelder-Mead downhill simplex method• Can choose optimizing on all data sets or portion of it• Do this for all available algorithms Find the best algorithms with its internal parameters

Page 18: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 19: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Scaling Nearest Neighbor Search

• We can achieve better performance using larger scale data sets• Problem: NOT possible to load into single memory • Solutions: • Dimension Reduction • Keeping data on a disk and loading into memory (poor performance)• Distributing ←

Page 20: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Scaling Nearest Neighbor Search

• Distribute NN matching among N machines using Map-Reduce like algorithm• Each machine will only have to index and search 1/N of the whole data• The final result of NNS is obtained by merging the partial results from all the machines in the cluster once they have completed the search• Using Message Passing Interface (MPI) specification• The query is sent from a client to one of the computers in MPI cluster

(Master server) • The master server broadcasts the query to all of the processes in the cluster• Each process run NNS in parallel on its own fraction of the data • When the search is complete an MPI reduce operation is used to merge the

results back to master process and the final results is returned to the client

Page 21: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Scaling Nearest Neighbor Search

• Implementing using Message Passing Interface (MPI)

Page 22: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Outline Introduction• What is FLANN?• Which programming languages does it support?

Applications Approaches • Randomized k-d Tree Algorithm• The Priority Search K-Means Tree Algorithm

Experiments • Data Dimensionality • Search Precision • Automatic Selection of the Optimal Algorithm

Scaling Nearest Neighbor Search What we are going to do References

Page 23: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

What we are going to do

• Developing and simulating an approach for pre processing the input queries to get the better performance • Try to group the input data so that we do not need to search though all data in the tree• Trade off between Throughput and Latency

Page 24: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

References

[1] Marius Muja and David G. Lowe: "Scalable Nearest Neighbor Algorithms for High Dimensional Data". Pattern Analysis and Machine Intelligence (PAMI), Vol. 36, 2014. [PDF] [BibTeX]

[2] Marius Muja and David G. Lowe: "Fast Matching of Binary Features". Conference on Computer and Robot Vision (CRV) 2012. [PDF] [BibTeX]

[3] Marius Muja and David G. Lowe, "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration", in International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009 [PDF] [BibTeX]

Page 25: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Thank you for your attention

Page 26: 26 1 FLANN Fast Library for Approximate Nearest Neighbors Marius Muja and David G. Lowe University of British Columbia Presented by Mohammad Sadegh Riazi.

/26

Questions ?