What Is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?

Masakazu Iwamura, Tomokazu Sato and Koichi Kise (Osaka Prefecture University, Japan)

ICCV’2013

Sydney, Australia

Finding similar data
A basic but important problem in information processing.
Possible applications include near-duplicate detection, object recognition, document image retrieval, character recognition, face recognition, and gait recognition.
A typical solution: Nearest Neighbor (NN) search

Finding similar data by NN search
Desired properties:
Fast and accurate
Applicable to large-scale data

The paper presents a way to realize faster approximate nearest neighbor search at a given accuracy.
Benefit from improvements in computing power.

Contents
1. NN and approximate NN search
2. Performance comparison
3. Keys to improve performance

Nearest Neighbor (NN) Search
A problem in which the true NN must always be found.
In a naive approach, the more data there is, the more time is required.
[Figure: data points, a query, and the query's NN]
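The naive approach can be sketched as a linear scan. This is a minimal illustration, assuming Euclidean distance and small in-memory tuples; the function name `nearest_neighbor` is hypothetical and not from the slides. The loop visits every data point, which is why the cost grows with the size of the data.

```python
import math

def nearest_neighbor(data, query):
    """Return (index, distance) of the point in data closest to query."""
    best_index, best_dist = -1, math.inf
    for i, point in enumerate(data):
        d = math.dist(point, query)  # one distance calculation per data point
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

data = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
index, dist = nearest_neighbor(data, (0.9, 1.2))
print(index)  # 2
```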

Nearest Neighbor (NN) Search
Finding the nearest neighbor efficiently

Before the query is given:
1. Index the data

After the query is given:
1. Select search regions
2. Calculate distances to the selected data

The true NN must be contained in the selected search regions. Ensuring this takes a long time.
[Figure: query, NN, and search regions]

Approximate Nearest Neighbor Search
Finding the nearest neighbor more efficiently
[Figure: query, NN, and a smaller set of search regions; much faster]
"Approximate" means that the true NN is not guaranteed to be retrieved.
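The index, select, and calculate steps can be sketched as follows. This is an illustration only, assuming the search regions are the cells of a few fixed centroids; the names `build_index`, `approximate_nn`, and `n_probe` are hypothetical. The example is built so that probing a single region misses the true NN, which is what "approximate" allows.

```python
import math
from collections import defaultdict

def build_index(data, centroids):
    """Before the query: assign each data point to its nearest centroid."""
    buckets = defaultdict(list)
    for i, p in enumerate(data):
        c = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        buckets[c].append(i)
    return buckets

def approximate_nn(data, centroids, buckets, query, n_probe=1):
    """After the query: select the n_probe closest regions, then scan them."""
    order = sorted(range(len(centroids)),
                   key=lambda j: math.dist(query, centroids[j]))
    candidates = [i for j in order[:n_probe] for i in buckets.get(j, [])]
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(data[i], query))

centroids = [(0.0, 0.0), (10.0, 0.0)]
data = [(4.9, 0.0), (9.0, 0.0), (1.0, 0.0)]
buckets = build_index(data, centroids)
query = (5.1, 0.0)
print(approximate_nn(data, centroids, buckets, query, n_probe=1))  # 1 (true NN missed)
print(approximate_nn(data, centroids, buckets, query, n_probe=2))  # 0 (true NN found)
```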


ANN search on 100M SIFT features (selected results)
[Figure: performance plot with the BAD and GOOD directions marked. Methods compared: IMI (Babenko 2012), IVFADC (Jegou 2011), and BDH (proposed method). Speed-up annotations on the plot: 2.0 times, 4.5 times, 9.4 times, 2.9 times.]

The novelty of BDH was reduced by IMI before we succeeded in publishing it. (For more detail, check out the Wakate program on Aug. 1.)

So-called binary coding is suited to saving memory, not to fast retrieval.


Keys to improve performance
1. Select search regions in subspaces
2. Find the closest ones in the original space efficiently

Select search regions in subspaces
In past methods (IVFADC, Jegou 2011 & VQ-index, Tuncel 2002)
Indexed by vector quantization (k-means clustering)
[Figure: query and search regions]
Pros: proven to give the least quantization error.
Cons: takes a very long time to select the search regions.

Select search regions in subspaces
In the past state of the art (IMI, Babenko 2012)
Indexed by product quantization: feature vectors are divided into two or more parts, and each subspace is indexed by k-means clustering.
Distances are calculated in the subspaces, and the regions are selected in the original space.
Pros: much less processing time.
Cons: less accurate (more quantization error).
Pros > Cons: a better ratio is realized.
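Why distances in subspaces are enough to rank regions in the original space can be sketched as below. The sketch assumes squared Euclidean distance, for which the distance to a concatenated centroid is the sum of the per-subspace distances. The codebooks, the query, and the helper names are hypothetical. With k centroids per subspace, 2k distance calculations describe all k × k regions.

```python
from itertools import product

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(vector, n_parts):
    """Divide a vector into n_parts equal pieces (length assumed divisible)."""
    size = len(vector) // n_parts
    return [vector[i * size:(i + 1) * size] for i in range(n_parts)]

def subspace_tables(query, codebooks):
    """Distance from each query part to every centroid of its subspace."""
    parts = split(query, len(codebooks))
    return [[sq_dist(part, c) for c in book]
            for part, book in zip(parts, codebooks)]

def region_distance(tables, cell):
    """Distance in the original space = sum of the subspace distances."""
    return sum(table[idx] for table, idx in zip(tables, cell))

codebooks = [
    [(0.0, 0.0), (4.0, 0.0)],  # centroids in subspace 1
    [(0.0, 0.0), (0.0, 3.0)],  # centroids in subspace 2
]
query = (1.0, 0.0, 0.0, 1.0)
tables = subspace_tables(query, codebooks)
for cell in product(range(2), range(2)):
    print(cell, region_distance(tables, cell))
```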



Find the closest search regions in the original space
In the past state of the art (IMI, Babenko 2012)
[Figure: a grid of centroids in the original space, formed from the centroids in subspace 1 and subspace 2. The axes show the distances in subspace 1 (1, 3, 8, 15) and in subspace 2 (1, 2, 5, 11); each cell shows the distance in the original space as their sum, for example 1 + 1 = 2 and 3 + 1 = 4.]
Search regions are selected in ascending order of their distances in the original space.
This can be done more efficiently with the branch and bound method. It does not consider the order of selecting buckets.
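Selecting regions in ascending order can be sketched with a priority queue. This is a simplified illustration in the spirit of the ordered traversal described above, not the IMI authors' code. The function name is hypothetical, and the subspace distances 1, 3, 8, 15 and 1, 2, 5, 11 are taken from the figure. Both lists are assumed to be sorted in ascending order.

```python
import heapq

def cells_in_ascending_order(d1, d2):
    """Yield (distance, i, j) in ascending order of d1[i] + d2[j]."""
    heap = [(d1[0] + d2[0], 0, 0)]
    seen = {(0, 0)}
    while heap:
        dist, i, j = heapq.heappop(heap)
        yield dist, i, j
        # Only the right and lower neighbours can be the next smallest cells.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(d1) and nj < len(d2) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (d1[ni] + d2[nj], ni, nj))

d1 = [1, 3, 8, 15]  # distances in subspace 1
d2 = [1, 2, 5, 11]  # distances in subspace 2
ordered = list(cells_in_ascending_order(d1, d2))
print([dist for dist, _, _ in ordered[:6]])  # [2, 3, 4, 5, 6, 8]
```

Every selection costs a heap operation, which is the ordering overhead that the branch and bound approach avoids.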

Find the closest search regions in the original space efficiently
In the proposed method
[Figure: animation over the grid of centroids, with distances 1, 3, 8, 15 in subspace 1 and 1, 2, 5, 11 in subspace 2. Assume that the upper limit is set to 8 (marked "Max 8" in the figure).]
The upper and lower bounds are increased step by step until a sufficient number of data points has been selected.
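The bounded selection can be sketched as a depth-first enumeration that prunes a branch as soon as the partial sum exceeds the upper bound. This is a rough illustration under stated assumptions, not the authors' implementation: the function names, the `bucket_sizes` dictionary, and the fixed `step` schedule are hypothetical, and distances are assumed to be non-negative and sorted in ascending order per subspace.

```python
def cells_within(dists, lower, upper, partial=0, prefix=()):
    """Yield (cell, distance) for cells with lower < distance <= upper."""
    if not dists:
        if partial > lower:  # partial <= upper already holds here
            yield prefix, partial
        return
    for idx, d in enumerate(dists[0]):
        if partial + d > upper:
            break  # bound: every later entry in this sorted list is larger
        yield from cells_within(dists[1:], lower, upper,
                                partial + d, prefix + (idx,))

def select_candidates(dists, bucket_sizes, needed, step):
    """Widen (lower, upper] step by step until enough data are selected."""
    selected, count = [], 0
    lower, upper = float("-inf"), step
    max_total = sum(d[-1] for d in dists)
    while count < needed and lower < max_total:
        for cell, _ in cells_within(dists, lower, upper):
            selected.append(cell)
            count += bucket_sizes.get(cell, 0)
        lower, upper = upper, upper + step
    return selected

dists = [[1, 3, 8, 15], [1, 2, 5, 11]]
first = [cell for cell, _ in cells_within(dists, float("-inf"), 8)]
print(first)  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

With the upper limit set to 8, the cells come out in enumeration order rather than distance order (distance 6 before distance 4), so no sorting or priority queue is needed.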
