DrillDown: Interactive Retrieval of Complex Scenes Using Natural Language Queries


Page 1: DrillDown: Interactive Retrieval of Complex Scenes Using ...ft3ex/projects/drilldown/DrillDown_slides.pdfImage Retrieval with Multiple Rounds Queries Drill-down: Interactive Retrieval

DrillDown: Interactive Retrieval of Complex Scenes Using Natural Language Queries

Page 2:

When we’d like to retrieve an image of a complex scene

Difficult to describe the whole scene in one sentence

Page 3:

Image Search Engine

Single sentence as query
No refinement (no interaction)

Page 4:

Find a specific image in our photo gallery or in an online image collection

Page 5:

Image Retrieval with Multiple Rounds Queries

Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan, Paola Cascante-Bonilla, Xiaoxiao Guo, Hui Wu, Song Feng, Vicente Ordonez.
Conf. on Neural Information Processing Systems (NeurIPS 2019). Vancouver, Canada. December 2019.

Page 6:

Previous efforts on Image-Text Matching

Two women sitting on the sofa

Woman in white shirt holding a dog

Woman in yellow shirt holding a cat

[Figure: a CNN encodes the image and an RNN encodes each sentence into a shared 1D feature space]

[1] DeViSE: A Deep Visual-Semantic Embedding Model. Andrea Frome, Greg S. Corrado, Jonathon Shlens, Samy Bengio, Jeffrey Dean, Marc’Aurelio Ranzato, Tomas Mikolov. NIPS 2013.
[2] Deep Fragment Embeddings for Bidirectional Image Sentence Mapping. Andrej Karpathy, Armand Joulin, Li Fei-Fei. NIPS 2014.
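The 1D matching setup on this slide can be sketched in a few lines: once images and sentences live in the same vector space, retrieval reduces to ranking gallery images by their similarity to the query vector. The 3-d embeddings, the image ids, and the use of cosine similarity are illustrative assumptions, not the encoders or scoring functions of the cited works.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_images(query_vec, image_vecs):
    """Return image ids sorted from most to least similar to the query vector."""
    scores = {img_id: cosine(query_vec, vec) for img_id, vec in image_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-d embeddings standing in for CNN (image) and RNN (sentence) outputs.
gallery = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.9, 0.2],
    "img_c": [0.0, 0.2, 0.9],
}
query = [0.2, 0.8, 0.1]  # a hypothetical embedded sentence

print(rank_images(query, gallery))  # ['img_b', 'img_a', 'img_c']
```

Every sentence and every image is a single vector here, which is exactly the 1D setting whose limits the next slides discuss.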

Page 7:

Previous efforts on Image-Text Matching

[3] Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations. Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma. CVPR 2019.

Page 8:

Observations

[Figure: image feature map with axes “Feature channels” and “Spatial dimensions”]

2D image representation can help distinguish instances sharing the same feature subspace
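Here is a minimal sketch of what the 2D image representation buys. It assumes a toy feature map stored as nested lists indexed [channel][y][x], with three hypothetical channels (“person”, “white”, “yellow”). Keeping the spatial dimensions yields one vector per location, so two people who share the “person” subspace stay separate. Global average pooling instead folds them into a single vector.

```python
def feature_map_to_regions(fmap):
    """Turn a C x H x W feature map (nested lists) into H*W region vectors of length C."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def global_average_pool(fmap):
    """Average over the spatial dimensions, leaving a single C-dim (1D) vector."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in fmap
    ]


# Hypothetical 3-channel, 1 x 2 map: a woman in white (left), a woman in yellow (right).
fmap = [
    [[1.0, 1.0]],  # "person" channel
    [[1.0, 0.0]],  # "white" channel
    [[0.0, 1.0]],  # "yellow" channel
]

print(feature_map_to_regions(fmap))  # [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
print(global_average_pool(fmap))     # [1.0, 0.5, 0.5]
```

The pooled vector records that “person”, “white”, and “yellow” features are present. It no longer records which person wears which color.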

Page 9:

Observations

[Figure: feature map with axes “Feature channels” and “Spatial dimensions”]

Two women sitting on the sofa

Woman in white shirt holding a dog

Woman in yellow shirt holding a cat

1D sentence representation can NOT distinguish instances sharing the same feature subspace
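The failure mode on this slide can be reproduced with a deliberately simplified encoder. In this sketch, each described instance becomes a multi-hot vector over a hypothetical five-word vocabulary, and the instances are summed into one 1D vector. This is an assumption for illustration, not the RNN from the talk. Swapping the pets between the two women produces exactly the same vector.

```python
CONCEPTS = ["woman", "white", "yellow", "dog", "cat"]


def instance_vector(words):
    """Toy multi-hot encoding of one described instance over CONCEPTS."""
    return [1.0 if c in words else 0.0 for c in CONCEPTS]


def pool_1d(instances):
    """Sum instance vectors into a single 1D vector (additive stand-in for one-vector encoding)."""
    return [sum(values) for values in zip(*instances)]


scene_1 = [instance_vector({"woman", "white", "dog"}), instance_vector({"woman", "yellow", "cat"})]
scene_2 = [instance_vector({"woman", "white", "cat"}), instance_vector({"woman", "yellow", "dog"})]

print(pool_1d(scene_1))                      # [2.0, 1.0, 1.0, 1.0, 1.0]
print(pool_1d(scene_1) == pool_1d(scene_2))  # True: both scenes collapse to one vector
```

Real recurrent encoders are not additive. Still, the toy shows how squeezing several instances into one vector can lose which attribute belongs to which instance.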

Page 10:

Observations

[Figure: feature map with axes “Feature channels” and “Spatial dimensions”]

Two women sitting on the sofa

Woman in white shirt holding a dog

Woman in yellow shirt holding a cat

2D sentence representation

[Figure: Instance1, Instance2, and Instance3 shown against the “person”, “dog”, and “cat” subspaces]
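Keeping one vector per instance, as in the 2D sentence representation, removes that ambiguity. This sketch uses a hypothetical multi-hot instance encoder and compares scenes as unordered collections of instance vectors. Both the encoder and the comparison rule are illustrative assumptions, not the model's learned representation.

```python
CONCEPTS = ["woman", "white", "yellow", "dog", "cat"]


def instance_vector(words):
    """Toy multi-hot encoding of one described instance over CONCEPTS."""
    return [1.0 if c in words else 0.0 for c in CONCEPTS]


def as_instance_set(instances):
    """Order-independent 2D representation: a sorted list of per-instance tuples."""
    return sorted(tuple(v) for v in instances)


scene_1 = [instance_vector({"woman", "white", "dog"}), instance_vector({"woman", "yellow", "cat"})]
scene_2 = [instance_vector({"woman", "white", "cat"}), instance_vector({"woman", "yellow", "dog"})]

print(as_instance_set(scene_1) == as_instance_set(scene_2))                  # False: scenes differ
print(as_instance_set(scene_1) == as_instance_set(list(reversed(scene_1))))  # True: mention order is irrelevant
```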

Page 11:

We still want compact representations

Especially for retrieval applications

Sentence 1 → Feature vector 1
Sentence 2 → Feature vector 2
Sentence 3 → Feature vector 3
...

Page 12:

Text input

Pre-allocated state vectors

Page 13:

Text feature

Page 14:

Action: which state vector to update
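The model treats the choice of which state vector to update as an action. The rule below is only a hand-written stand-in for that learned decision, and `choose_slot` and `threshold` are hypothetical names. It refines the closest state vector when the new text feature is similar enough. Otherwise it fills an unused (all-zero) pre-allocated slot.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_slot(states, text_feat, threshold=0.5):
    """Hypothetical slot-selection rule standing in for the learned action."""
    sims = [cosine(s, text_feat) for s in states]
    best = max(range(len(states)), key=lambda i: sims[i])
    if sims[best] >= threshold:
        return best  # the query refines something already described
    for i, s in enumerate(states):
        if not any(s):  # an all-zero vector marks an unused pre-allocated slot
            return i
    return best  # every slot is in use: fall back to the closest one


states = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(choose_slot(states, [0.9, 0.1, 0.0]))  # 0: close to the existing state
print(choose_slot(states, [0.0, 1.0, 0.0]))  # 1: new content goes to the first free slot
```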

Page 15:

Update the state vector
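Once a slot is chosen, its state vector absorbs the new text feature. This sketch assumes a fixed-gate blend, state = (1 - g) * state + g * text, as a simple stand-in for a learned recurrent update. `update_state` and `gate` are hypothetical names, and an unused slot just copies the feature.

```python
def update_state(states, slot, text_feat, gate=0.5):
    """Return new state vectors with states[slot] blended toward text_feat (fixed gate)."""
    old = states[slot]
    g = 1.0 if not any(old) else gate  # an unused slot simply takes the new feature
    new = [(1.0 - g) * o + g * t for o, t in zip(old, text_feat)]
    return [new if i == slot else list(s) for i, s in enumerate(states)]


states = [[0.0, 0.0], [0.0, 0.0]]
states = update_state(states, 0, [1.0, 0.0])  # first turn fills slot 0
states = update_state(states, 0, [0.0, 1.0])  # a later turn refines slot 0
print(states)  # [[0.5, 0.5], [0.0, 0.0]]
```

Returning a new list keeps earlier turns' states intact, which is convenient when comparing retrieval results turn by turn.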

Page 16:

Pairwise alignment between state vectors and image regions
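Pairwise alignment compares every state vector with every image region. One plausible way to turn that matrix into an image score is to let each used state vector pick its best-matching region by cosine similarity and then average those maxima. This aggregation is an assumption, not taken from the slides. Images would then be ranked by this score.

```python
import math


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def alignment_matrix(states, regions):
    """M x R matrix of cosine similarities between state vectors and region features."""
    return [[cosine(s, r) for r in regions] for s in states]


def image_score(states, regions):
    """Average, over used state vectors, of the similarity to each one's best region."""
    used = [s for s in states if any(s)]
    if not used:
        return 0.0
    return sum(max(row) for row in alignment_matrix(used, regions)) / len(used)


states = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # third slot unused
image_1 = [[1.0, 0.0], [0.0, 1.0]]  # both described instances present
image_2 = [[1.0, 0.0], [1.0, 0.0]]  # only the first one present
print(image_score(states, image_1), image_score(states, image_2))  # 1.0 0.5
```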

Page 17:

Simulated queries built from region-phrase annotations at training time

Human queries
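At training time, a multi-turn query can be simulated by sampling region phrases from an image's annotations, one phrase per turn. The helper below is a hypothetical sketch of that idea. The name `simulate_queries`, the uniform sampling without replacement, and the seed argument are all assumptions. The example phrases are the captions from the earlier slides.

```python
import random


def simulate_queries(region_phrases, num_turns, seed=None):
    """Sample up to num_turns region phrases (without replacement) as one simulated query."""
    rng = random.Random(seed)
    k = min(num_turns, len(region_phrases))
    return rng.sample(region_phrases, k)


phrases = [
    "two women sitting on the sofa",
    "woman in white shirt holding a dog",
    "woman in yellow shirt holding a cat",
]
for turn, phrase in enumerate(simulate_queries(phrases, 2, seed=0), start=1):
    print(f"turn {turn}: {phrase}")
```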

Page 18:

Quantitative evaluation on a test set of 10000 images

Although more state vectors give better results,

Page 19:

Although more state vectors give better results, we could have an even more compact representation

Quantitative evaluation on a test set of 10000 images

Page 20:

Quantitative evaluation on a test set of 10000 images
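The slides do not state the metric behind this evaluation. A common choice for target-image retrieval is the rank of the target among all test images, summarized as recall@K, the fraction of queries whose target appears in the top K. Treat that choice as an assumption. A minimal sketch, with hypothetical helper names:

```python
def target_rank(scores, target_id):
    """1-based rank of the target when images are sorted by descending score.

    Ties with the target are resolved in the target's favor (optimistic).
    """
    target_score = scores[target_id]
    return 1 + sum(1 for img, s in scores.items() if img != target_id and s > target_score)


def recall_at_k(ranks, k):
    """Fraction of queries whose target image is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


scores = {"img_a": 0.9, "img_b": 0.7, "target": 0.8}
print(target_rank(scores, "target"))  # 2
print(recall_at_k([1, 2, 15], k=10))  # 0.6666666666666666
```

With 10,000 test images, `scores` would simply hold one entry per image for each query.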

Page 21:

Target

Page 22:

Target

Page 23:

Target

Page 24:

Target

Page 25:

Target

Page 26:

Target

Page 27:

Target

Page 28:

Future work: an instance-aware text encoder for dialog-based applications?

Potential challenges:
● Named entity detection
● Coreference resolution
● Negation
● ...

Page 29:

Q&A