Visual Memorability for Egocentric Cameras

Marc Carné Herrera

Advisors: Xavier Giró-i-Nieto and Cathal Gurrin

Acknowledgements


Petia Radeva, Maite Garolera, Albert Gil, Josep Pujal

Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


Introduction

“Brain is designed to forget in order to survive”

● Lifelogger → a person who captures their daily life in order to create a virtual, digital memory.

● Wearable cameras → capture first-person vision.
● Big data → 1,400-2,000 images/day.
● Challenge → retrieval!

What do we want to remember?

● Cognitive therapy → Alzheimer’s patients, reminiscence therapy.


Motivation


Diagram: from an image set, select relevant images using low-level features from a CNN, using content (object detection, face detection, …), or using visual memorability.


[Isola, CVPR 2011]

Visual memorability


[Isola, CVPR 2011]

Example images ranked from more memorable to less memorable.

Domain adaptation


Example images: human-taken vs. egocentric.

[Khosla, ICCV 2015]

Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


Why an annotation tool?


Diagram: a convolutional neural network maps an input (image) to an output (label); image + label pairs are needed to train the model.

● Inspired by MIT research work [1]
● Visual memory game (a minimal sketch follows):

○ Simple task → press ‘d’ when a repeated image is found
○ Duration: 9 minutes
○ Output: text file with detections
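A minimal sketch of such a game loop, assuming OpenCV for display; the real tool is browser-based, and names such as `show_time_ms` and `detections.txt` are illustrative only:

```python
import time
import cv2  # OpenCV is an assumption here; the actual tool runs in a browser

def run_memory_game(image_paths, show_time_ms=1000, log_path="detections.txt"):
    """Show images one by one and log a detection whenever the user
    presses 'd' to flag an image as repeated."""
    with open(log_path, "w") as log:
        for idx, path in enumerate(image_paths):
            cv2.imshow("memory game", cv2.imread(path))
            key = cv2.waitKey(show_time_ms)  # wait during the display interval
            if key != -1 and key & 0xFF == ord("d"):
                log.write(f"{time.time():.3f}\t{idx}\t{path}\n")
    cv2.destroyAllWindows()
```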


[1] Understanding and Predicting Image Memorability at a Large Scale, A. Khosla, A. S. Raju, A. Torralba and A. Oliva. ICCV 2015

Annotation tool for visual memorability

Annotation

Image sources: UTEgocentric dataset and the Insight Center for Data Analytics [Khosla, ICCV 2015]

Annotation tool


● Docker:

○ Container with the operating system and required software.
○ Always runs the same in any environment.

● Simple implementation → Dockerfile

Annotation tool implementation: why use Docker?

First Docker implementation in GPI for research.

● Memorability score → [0,1]

● Result:

○ Dataset → 50 annotated images (25 users)

Annotation tool results
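One plausible way to turn the game’s detections into a score in [0,1] is the per-image hit rate across users, in the spirit of the MIT memory game; this helper is an illustrative sketch, not the tool’s actual code:

```python
import numpy as np

def memorability_scores(detections):
    """detections: dict mapping image_id -> list of 0/1 flags, one per user,
    with 1 meaning the user pressed 'd' when the image repeated (a hit).
    The memorability score is the hit rate, which lies in [0, 1]."""
    return {img: float(np.mean(hits)) for img, hits in detections.items()}

# Example: an image whose repeat was caught by 20 of 25 users scores 0.8.
scores = memorability_scores({"img_001": [1] * 20 + [0] * 5})
```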

Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


Convolutional neural network: definition

● Automatic learning paradigm inspired by how the human brain works
● Interconnected neurons that work together to generate an output stimulus or activation



Convolutional neural network: layers

Diagram: convolutional layers vs. fully connected layers.

● MemNet → CNN for memorability prediction

○ 5 convolutional layers + 2 fully connected layers + linear regression


EgoMemNet: visual memorability adaptation to egocentric images

MemNet CNN [Khosla, ICCV 2015]

Structure: AlexNet

CNN fine-tuning


Diagram: MemNet [Khosla, ICCV 2015] fine-tuned on the Insight dataset (egocentric images) yields EgoMemNet.
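A sketch of this kind of fine-tuning in PyTorch. The torchvision ImageNet AlexNet stands in for the real MemNet weights (originally released as a Caffe model), and the frozen layers and learning rate are assumptions, not the thesis settings:

```python
import torch
import torch.nn as nn
from torchvision import models

# AlexNet backbone as a stand-in for MemNet (same architecture family).
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
model.classifier[6] = nn.Linear(4096, 1)  # 1000-way classifier -> one memorability score

# Freeze the convolutional layers; fine-tune only the fully connected ones.
for param in model.features.parameters():
    param.requires_grad = False

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4, momentum=0.9
)
criterion = nn.MSELoss()  # Euclidean loss on the predicted score

def train_step(images, scores):
    """One fine-tuning step on a batch of egocentric images and their scores."""
    optimizer.zero_grad()
    loss = criterion(model(images).squeeze(1), scores)
    loss.backward()
    optimizer.step()
    return loss.item()
```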

● No augmentation
● Spatial data augmentation → common method
● Temporal data augmentation → specific to egocentric data


Data augmentation strategies

Examples: spatial data augmentation vs. temporal data augmentation.
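A sketch of both strategies; the crop size, flip recipe, and `window` parameter are assumptions for illustration:

```python
from torchvision import transforms

# Spatial augmentation: the common random crop-and-flip recipe.
spatial = transforms.Compose([
    transforms.RandomResizedCrop(227),  # AlexNet-style input size
    transforms.RandomHorizontalFlip(),
])

def temporal_augment(frames, index, window=2):
    """Temporal augmentation, specific to lifelogging: frames captured just
    before and after an annotated image become extra training samples that
    share its memorability score."""
    lo = max(0, index - window)
    hi = min(len(frames), index + window + 1)
    return frames[lo:hi]
```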


Quantitative results

Spearman’s rank correlation

Measures the similarity between the positions of items in two ranked lists.

Diagram: predicted memorability rank vs. ground-truth rank.
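Computing it is a one-liner with SciPy; the numbers below are made up for illustration:

```python
from scipy.stats import spearmanr

# rho = 1 means identical rankings, 0 means no monotonic relation.
predicted = [0.81, 0.64, 0.90, 0.42, 0.73]     # model scores
ground_truth = [0.78, 0.60, 0.95, 0.50, 0.70]  # annotated scores
rho, p_value = spearmanr(predicted, ground_truth)
print(f"Spearman's rho = {rho:.3f}")
```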


Qualitative results


Memorability maps

● Heat maps that highlight the most memorable regions.
● Methods:

○ Grid-and-forward → obtain a memorability score per patch (sketched below)
○ EgoMemNet → fully convolutional version

Example maps: grid-and-forward vs. EgoMemNet.
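A sketch of the grid-and-forward pass, assuming `predict_score` wraps the CNN and maps an image crop to a float in [0,1]; the patch and stride sizes are illustrative:

```python
import numpy as np

def grid_and_forward_map(image, predict_score, patch=64, stride=32):
    """Slide a patch over the image, score every patch with the memorability
    CNN, and average the overlapping patch scores into a heat map."""
    h, w = image.shape[:2]
    heat = np.zeros((h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - patch + 1, stride):
        for x in range(0, w - patch + 1, stride):
            score = predict_score(image[y:y + patch, x:x + patch])
            heat[y:y + patch, x:x + patch] += score
            count[y:y + patch, x:x + patch] += 1
    return heat / np.maximum(count, 1)
```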

Memorability maps: grid-and-forward pass


Memorability maps: EgoMemNet


[Zhou, CVPR 2016]


Memorability vs. saliency maps

Panels: original image; saliency map (SalNet CNN) [Pan, CVPR 2016]; memorability map (EgoMemNet* CNN).

In green, regions shared between the saliency and memorability maps. In blue, memorable regions that are not salient. In red, salient regions that are not memorable.

Binarized maps with a learned threshold.
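A sketch of that comparison, assuming both maps are arrays in [0,1] and that the thresholds have already been learned:

```python
import numpy as np

def compare_maps(saliency, memorability, t_sal, t_mem):
    """Binarize both maps with their thresholds and split the image into the
    three regions described above. Returns boolean masks."""
    sal = saliency >= t_sal
    mem = memorability >= t_mem
    shared = sal & mem     # green: both salient and memorable
    mem_only = mem & ~sal  # blue: memorable but not salient
    sal_only = sal & ~mem  # red: salient but not memorable
    return shared, mem_only, sal_only
```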

Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


The Insight dataset

● Multimodal homemade dataset:

○ Images
○ Memorability score
○ Heart rate value (during image acquisition)
○ Galvanic skin response (during image acquisition)


Publicly available!

Heart rate correlation


Plot: memorability scores quantized into 8 bins (x-axis) vs. mean heart rate per bin (y-axis).

Galvanic skin response


Plot: memorability scores quantized into 8 bins (x-axis) vs. mean GSR per bin (y-axis).
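The binning behind both plots can be sketched as follows, assuming equal-width bins over [0,1]:

```python
import numpy as np

def mean_signal_per_bin(scores, signal, n_bins=8):
    """Quantize memorability scores into n_bins equal-width bins over [0, 1]
    and return the mean physiological signal (heart rate or GSR) per bin."""
    scores, signal = np.asarray(scores), np.asarray(signal)
    bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
    return np.array([
        signal[bins == b].mean() if np.any(bins == b) else np.nan
        for b in range(n_bins)
    ])
```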

Physiological signals for memorability prediction


SNAP!

Detect snap points
● Prior approach → efficient capture without image processing

Diagram: physiological signals → linear regression.

Adding physiological signals for memorability prediction

● Post approach

Diagram: EgoMemNet score + physiological signals → linear regression → memorability score.
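A sketch of the post approach with scikit-learn; the feature layout and the toy numbers are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy stand-ins for the real per-image measurements.
egomemnet = np.array([0.71, 0.55, 0.88, 0.47])   # CNN predictions
heart_rate = np.array([72.0, 65.0, 80.0, 69.0])  # bpm during acquisition
gsr = np.array([0.31, 0.22, 0.45, 0.27])         # galvanic skin response
annotated = np.array([0.75, 0.50, 0.90, 0.45])   # ground-truth scores

# One feature vector per image: CNN score plus physiological signals.
X = np.column_stack([egomemnet, heart_rate, gsr])
reg = LinearRegression().fit(X, annotated)
refined = reg.predict(X)  # memorability estimates refined by physiology
```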

New feature: EEG signals

● EEG → electroencephalographic signals
● Hands-free visual memory game


EEG data extraction


Plot: ERP at electrode Pz with a P3 peak around 400 ms; the P3@Pz feature is the average over the 350-600 ms window.
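Extracting that feature reduces to averaging the Pz channel over the post-stimulus window; a sketch, with the sampling rate and onset index as assumed inputs:

```python
import numpy as np

def p3_at_pz(eeg_pz, fs, onset_idx, window=(0.350, 0.600)):
    """Mean amplitude of the Pz channel in the 350-600 ms post-stimulus
    window, where the P3 component peaks (around 400 ms here).
    eeg_pz: 1-D signal, fs: sampling rate in Hz, onset_idx: stimulus onset."""
    start = onset_idx + int(window[0] * fs)
    stop = onset_idx + int(window[1] * fs)
    return float(np.mean(eeg_pz[start:stop]))
```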

Outline

➔ Introduction
➔ Contributions

◆ Annotation tool for visual memorability
◆ EgoMemNet: visual memorability adaptation to egocentric images
◆ Visual memorability and physiological signals

➔ Conclusions


Conclusions

● A new annotation tool enables the creation of novel datasets for egocentric memorability.

● Egocentric (first-person vision) dataset containing 50 annotated images.

● EgoMemNet, a model adapted to predict the memorability of egocentric images, outperforms MemNet, a convolutional neural network trained on human-taken images.

● Physiological signals can support memorability prediction.


Extended abstract

Carné-Herrera M, Giró-i-Nieto X, Gurrin C. EgoMemNet: Visual Memorability Adaptation to Egocentric Images. 4th Workshop on Egocentric (First-Person) Vision, CVPR 2016, Las Vegas, NV, USA.


Spotlight

Full spotlight on YouTube!

https://www.youtube.com/watch?v=qwM5NNW37YE

Poster presentation


Open Research

Dataset · Model · Annotation tool

http://imatge-upc.github.io/memory-2016-fpv/


Hope this presentation has been memorable!

Thanks for your attention!