Large-Scale Image Retrieval with Attentive Deep Local Features

25
Large-Scale Image Retrieval with Attentive Deep Local Features 2021.04.27 Sebin Lee ICCV 2017

Transcript of Large-Scale Image Retrieval with Attentive Deep Local Features

Page 1: Large-Scale Image Retrieval with Attentive Deep Local Features

Large-Scale Image Retrieval with Attentive Deep Local Features

2021.04.27Sebin Lee

ICCV 2017

Page 2: Large-Scale Image Retrieval with Attentive Deep Local Features

Contents

2

• Background & Motivation

• Our Approach

• Results

• Summary

• Quiz

Page 3: Large-Scale Image Retrieval with Attentive Deep Local Features

Recap: Keypoint

3picture from Sung-eui Yoon slide, originally made by Kristen Grauman and David Lowe

• Important point of image (e.g., corner)• The keypoint itself is not useful for image retrieval.• e.g., Harris corner detector, Local maxima of DoG

Background & Motivation

< Harris corner detector >

Scale

< Local maxima of DoG >

Page 4: Large-Scale Image Retrieval with Attentive Deep Local Features

Recap: Local Descriptor

4

• The local descriptor is used because the keypoint itself is not useful.• The local descriptor means a compact representation of the image

patch centered on the extracted keypoint.• e.g., SIFT

< Gradients of image patch > < SIFT descriptor >

Background & Motivation

Page 5: Large-Scale Image Retrieval with Attentive Deep Local Features

Classical Keypoint & Local Descriptor Properties

5picture from Sung-eui Yoon slide, originally made by Denis Simakov

• Repeatable• Distinctive• Invariant• Adequate Quantity

How about deep feature?

Sparse keypoint & descriptor

< Keypoints of Image>

Background & Motivation

Page 6: Large-Scale Image Retrieval with Attentive Deep Local Features

Deep Feature

6

• CNNs generate uniformly dense grid feature map.• We can regard the dense grid feature map as a grid of local descriptors.

localdescriptor

Background & Motivation

RGB Image(3, H, W)

Dense grid feature(C, H', W')

CNN layers

receptive field of local descriptor= local image patch

= Densely extracted local descriptors

Page 7: Large-Scale Image Retrieval with Attentive Deep Local Features

Motivation

7picture from Sung-eui Yoon slide, originally made by Denis Simakov

• Unlike classical methods, the CNNs generate uniformly dense grid local descriptors.• Therefore, local descriptors are extracted from regions that have no value as keypoints.

(e.g., texture-less region)• Many local descriptors containing unnecessary ones hinder search and making codebook.

Background & Motivation

Query DB

< Query feature contains unnecessary local descriptors (people) >

∴We needs keypoint selection that selects only helpful keypointsfor efficient and accurate image retrieval.

Page 8: Large-Scale Image Retrieval with Attentive Deep Local Features

Contents

8

• Background & Motivation

• Our Approach

• Results

• Summary

• Quiz

Page 9: Large-Scale Image Retrieval with Attentive Deep Local Features

Goal

9

• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.

Our Approach

Local descriptor network

< Overall Pipeline of Image Retrieval >

Our goal

Page 10: Large-Scale Image Retrieval with Attentive Deep Local Features

Goal

10

Our Approach

Local descriptor network

< Overall Pipeline of Image Retreival >

Our goal

• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.

Page 11: Large-Scale Image Retrieval with Attentive Deep Local Features

Goal

11

Our Approach

Local descriptor network

< Overall Pipeline of Image Retreival >

Our goal

• Train a local descriptor network using keypoint selection for efficient and accurate image retrieval.

Page 12: Large-Scale Image Retrieval with Attentive Deep Local Features

Additionallayers only for

training

Train Local Descriptor Network by using Classification task

12

Our Approach

Input image

Grid local descriptors

LabelC.ELoss

• Backbone: ResNet50 trained on ImageNet• Fine tune the local descriptor network on Google Landmarks dataset.

(Image Retrieval dataset)

• Use only cross-entropy loss (loss of classification task)• However, this method doesn’t perform keypoint selection.

Local Descriptor Network

Page 13: Large-Scale Image Retrieval with Attentive Deep Local Features

Training for Keypoint Selection

13

Our Approach

Input image

Grid local descriptors

Attention score (channel: 1)

Sparse local descriptors by keypoint selection

• Keypoint selection can be performed by attention.• Attention module is used to calculate attention score.• Attention score is close to 0: no keypoint• Attention score is close to 1: keypoint

LabelC.ELoss

Local Descriptor Network

∴We can train the local descriptor network performing keypoint selection by end-to-end manner.

Additionallayers only for

training

Page 14: Large-Scale Image Retrieval with Attentive Deep Local Features

Comparison with classical local descriptor

14

• Classical Local descriptor① keypoint selection② local descriptor extraction

• Ours① local descriptor extraction② keypoint selection

• The order of process is different, but the results are similar.

Our Approach

Page 15: Large-Scale Image Retrieval with Attentive Deep Local Features

Image retrieval pipeline with ours

15

• Overall image retrieval pipeline with our local descriptor network

Our Approach

Local descriptor network

Local descriptor network

Grid local descriptors

Sparse local descriptors by keypoint selection

Page 16: Large-Scale Image Retrieval with Attentive Deep Local Features

Contents

16

• Background & Motivation

• Our Approach

• Results

• Summary

• Quiz

Page 17: Large-Scale Image Retrieval with Attentive Deep Local Features

Precision & Recall Result

17

Results

• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention for keypoint selection• QE: Use Query Expansion

Absolute Recall(#true positives w.r.t all queries)

Prec

isio

n

PR curve @ Google Landmarks dataset

Our full model

Our Backbone

Page 18: Large-Scale Image Retrieval with Attentive Deep Local Features

mAP Result

18

Results

Mean average precision: mAP(%)

• DELF: Backbone trained on ImageNet• FT: Fine Tune with Google Landmarks• ATT: Use Attention• QE: Use Query Expansion• DIR: Use global descriptor

Page 19: Large-Scale Image Retrieval with Attentive Deep Local Features

Keypoint Selection Result(Attention Result)

19

Results

DELF DELF+FT DELF+FT+ATT

DELF DELF+FT+ATT

DELF+FT

• Keypoint selection effectively disregards clutter.• Attention activates on important pixels for image retrieval.

DELF: Backbone trained on ImageNetFT: Fine Tune with Google LandmarksATT: Use Attention

Page 20: Large-Scale Image Retrieval with Attentive Deep Local Features

Contents

20

• Background & Motivation

• Our Approach

• Results

• Summary

• Quiz

Page 21: Large-Scale Image Retrieval with Attentive Deep Local Features

Contributions

21

Summary

• Keypoint selection using the attention module• Unnecessary region descriptors are suppressed. (e.g., texture-less region) • Only sparse local descriptors that are useful for image retrieval are extracted.∴Search efficiency ↑, Accuracy ↑

< Result of Keypoint Selection>

Page 22: Large-Scale Image Retrieval with Attentive Deep Local Features

Strengths & Weaknesses

22

• Strengths• Proposed method can extract both local descriptors and keypoints via one forward pass.• Efficient and accurate image retrieval can be performed by keypoint selection using attention.

• Weaknesses• This paper trains the descriptor network using only classification task.

(Do not use other metric learning methods)• The image label is required to train the network.

Summary

Page 23: Large-Scale Image Retrieval with Attentive Deep Local Features

Contents

23

• Background

• Related work

• Our Approach

• Results

• Quiz

Page 24: Large-Scale Image Retrieval with Attentive Deep Local Features

Quiz

24

• Please submit this google form.

Link will be posted in the regular zoom meeting session.

Quiz

Page 25: Large-Scale Image Retrieval with Attentive Deep Local Features

25

THANK YOU