Transcript of: Multi-Output Learning for Camera Relocalization (32 slides)

Page 1:

Multi-Output Learning for Camera Relocalization

Abner Guzmán-Rivera, UIUC

Pushmeet Kohli, Ben Glocker, Jamie Shotton, Toby Sharp, Andrew Fitzgibbon, Shahram Izadi

Microsoft Research

Page 2:

Camera Relocalization from RGB-D images

World: known 3D model

RGB-Depth: observe a single frame

Where is the camera?

6D camera pose H (rotation and translation)
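One standard way to make this concrete (the parameterization is implied by "rotation and translation" but not spelled out on the slide) is to write the pose as a rigid-body transform:

    H = \begin{bmatrix} R & t \\ 0^{\top} & 1 \end{bmatrix} \in SE(3), \qquad R \in SO(3)\ \text{(rotation)}, \quad t \in \mathbb{R}^{3}\ \text{(translation)}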

Page 3:

Applications: Large-scale 3D model reconstruction

Page 4:

Applications: Vehicle, robot, etc. localization

Page 5:

Applications: Augmented Reality

Page 6:

Other Approaches to Localization

Sparse key-point matching:

– Detectors: [Rosten et al. PAMI’10], [Holzer et al. ECCV’12]

– Descriptors: [Winder and Brown CVPR’07], [Calonder et al. ECCV’10], [Rublee et al. ICCV’11]

– Matching: [Lepetit and Fua PAMI’06], [Nistér and Stewénius CVPR’06], [Schindler et al. CVPR’07]

– Pose estimation: [Irschara et al. CVPR’09], [Dong et al. ICCV’09], [Yi et al. ECCV’10], [Baatz et al. IJCV’11], [Sattler et al. ICCV’11]

Whole key-frame matching: [Klein and Murray ECCV’08], [Gee and Mayol-Cuevas BMVC’12]

Epitomic location recognition: [Ni et al. PAMI’09]

Page 7:

Relocalization as an Inverse Problem

Find the pose H* minimizing the error in a rendering of the model

[Diagram: input RGB-D frame, view “renderer”, 3D model of scene, rendering error]
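Written out, the inverse problem reads roughly as follows (my formalization of the slide's statement, with E the rendering error, \mathcal{M} the known 3D model, and I the input RGB-D frame):

    H^{*} = \arg\min_{H}\; E\big(\mathrm{render}(\mathcal{M}, H),\; I\big)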

Page 8:

Inverse Problem

Discriminative Predictor

Page 9:

Inverse Problem

Page 10:

Single Predictor Not Powerful Enough

Limited expressivity

The mapping is one-to-many

Input frame

Page 11:

Approx. Inverse Problem: Stage 1

Portfolio of Discriminative Predictors

Want complementary or “diverse” predictions

Page 12:

Approx. Inverse Problem: Stage 2

Page 13:

How to train such a portfolio of complementary predictors?

Page 14:

Discriminative Predictor [Shotton et al. CVPR’13]

Page 15:

Scene Coordinate Regression Forests

[Shotton et al. CVPR’13]

Pixel comparison features (Depth and RGB) at the split nodes; (x,y,z) world coordinate at the leaves

Regression tree:

Regression forest

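As a rough illustration of the depth pixel-comparison features (a sketch following the general recipe in Shotton et al. CVPR'13; boundary handling and the exact offset normalization are simplifications on my part):

    import numpy as np

    def depth_difference_feature(depth, p, d1, d2):
        """Depth-difference pixel comparison at pixel p = (row, col).
        Offsets d1, d2 are divided by the depth at p so the probe
        pattern is (approximately) invariant to camera distance."""
        z = depth[p]
        if z <= 0:
            return 0.0                      # missing depth at the reference pixel
        h, w = depth.shape
        def probe(d):
            r, c = int(p[0] + d[0] / z), int(p[1] + d[1] / z)
            if 0 <= r < h and 0 <= c < w and depth[r, c] > 0:
                return depth[r, c]
            return 1e3                      # large constant for invalid probes
        return probe(d1) - probe(d2)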

Page 16:

Scene Coordinate Regression Forests

[Shotton et al. CVPR’13]

Inliers for several hypotheses from RANSAC (H1 ... H6)

Forest predicts 3D world coordinates

Sample pixels from input RGB-D frame
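To make the hypothesis-generation step concrete: a pose can be fit to a small set of pixel-to-world correspondences (camera-space points from the depth map, world coordinates from the forest) with the Kabsch algorithm. The sketch below reduces the RANSAC loop of Shotton et al. CVPR'13 to its two core pieces, fitting and inlier counting; sampling and refinement are omitted:

    import numpy as np

    def kabsch_pose(cam_pts, world_pts):
        """Rigid 4x4 transform H mapping camera-space points to world space.
        cam_pts, world_pts: Nx3 arrays of corresponding 3D points (N >= 3)."""
        cc, wc = cam_pts.mean(axis=0), world_pts.mean(axis=0)
        A = (cam_pts - cc).T @ (world_pts - wc)         # 3x3 cross-covariance
        U, _, Vt = np.linalg.svd(A)
        d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        H = np.eye(4)
        H[:3, :3], H[:3, 3] = R, wc - R @ cc
        return H

    def count_inliers(H, cam_pts, world_pts, thresh=0.1):
        """Correspondences whose transformed camera point lands within
        `thresh` meters of the forest-predicted world coordinate."""
        pred = cam_pts @ H[:3, :3].T + H[:3, 3]
        return int(np.sum(np.linalg.norm(pred - world_pts, axis=1) < thresh))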

Page 17:

Learning a portfolio of predictors

We would like to train a set of predictors to output a set of hypotheses that:

1. Are relevant, i.e., approximate local minimizers
2. Summarize the output space well

Page 18:

Learning a portfolio: previous work

Multiple Choice Learning

[Guzman-Rivera et al. NIPS’12, AISTATS’14]

Set min-loss: an oracle penalizes the portfolio for the error in the best prediction in the output

– The portfolio is NOT penalized for being diverse
– Set min-loss applies to standard datasets
– Iterative training of a fixed-size portfolio

Standard task-loss
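The set min-loss referred to here can be written as follows (the standard MCL objective, with ℓ the standard task-loss labeled on the slide and f_1, ..., f_M the portfolio); only the best prediction per example j is charged, which is why no explicit diversity term appears:

    \mathcal{L}(f_1,\dots,f_M) \;=\; \sum_{j}\; \min_{m \in \{1,\dots,M\}} \ell\big(y_j,\, f_m(x_j)\big)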

Page 19:

Learning a portfolio of predictors

Portfolio of predictors, each a CVPR’13 SCoRe forest

We already have the objective to optimize and propose to approximate (1) by

Page 20:

– The portfolio is NOT penalized for being diverse
– The learning procedure is able to tune the portfolio to the reconstruction error to be used at test-time
– Next we describe one way to achieve diversity

[Equation labels on the slide: Multi-Output Loss, Standard task-loss]
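One plausible reading of the multi-output loss, given the set min-loss above and the statement that the portfolio is tuned to the test-time reconstruction error (this is my reconstruction, not transcribed from the slide):

    \min_{f_1,\dots,f_M}\; \sum_{j}\; \min_{m}\; E\big(\mathrm{render}(\mathcal{M},\, f_m(x_j)),\; I_j\big)

i.e., the standard task-loss ℓ is replaced by the rendering error E of the best hypothesis for each training frame j.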

Page 21:

Training Algorithm

Page 22:

Loss to Example Weights

Diversity parameter (“variance” of the weights)

Multi-output loss for example j

Intuition: we want the next predictor to emphasize accuracy on examples that have been difficult thus far
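One way to realize this intuition in a greedy training loop (a sketch, not the authors' exact algorithm; the exponential weighting and the use of the diversity parameter as a temperature are my assumptions):

    import numpy as np

    def loss_to_weights(losses, diversity):
        """Map per-example multi-output losses to training weights.
        Larger `diversity` concentrates weight on the hardest examples,
        i.e., it increases the "variance" of the weights."""
        w = np.exp(diversity * (losses - losses.max()))   # stabilized exponent
        return w / w.sum()

    def train_portfolio(examples, train_predictor, multi_output_loss, M, diversity):
        """Greedy sketch: train predictors one at a time, reweighting examples
        by how poorly the portfolio trained so far handles them."""
        portfolio, n = [], len(examples)
        weights = np.full(n, 1.0 / n)
        for _ in range(M):
            portfolio.append(train_predictor(examples, weights))
            losses = np.array([multi_output_loss(x, portfolio) for x in examples])
            weights = loss_to_weights(losses, diversity)
        return portfolio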

Page 23:

Rendering Error

Page 24:

L1 Rendering Error

Input frame

1. Raycast depth frame for some hypothesis

2. Evaluate L1 distance between input depth and raycast depth
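A minimal sketch of this error, assuming a depth image has already been raycast from the 3D model under the hypothesis (the validity mask is my addition; real depth images contain missing pixels):

    import numpy as np

    def l1_rendering_error(input_depth, raycast_depth):
        """Mean absolute depth difference over pixels valid in both images."""
        valid = (input_depth > 0) & (raycast_depth > 0)   # ignore missing depth
        if not np.any(valid):
            return np.inf
        return float(np.mean(np.abs(input_depth[valid] - raycast_depth[valid])))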

Page 25:

Results

Page 26:

7-Scenes Dataset

[Shotton et al. CVPR’13, Glocker et al. ISMAR’13]

Page 27:

Metric: Proportion Correct (single prediction)

– Correct if translational error ≤ 5 cm AND rotational error ≤ 5°
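In code, the per-frame correctness test might look like this (a sketch; poses are assumed to be 4x4 rigid transforms, and the conventions of the original evaluation scripts may differ):

    import numpy as np

    def pose_is_correct(H_pred, H_gt, t_thresh=0.05, r_thresh_deg=5.0):
        """Correct if translational error <= 5 cm AND rotational error <= 5 degrees."""
        t_err = np.linalg.norm(H_pred[:3, 3] - H_gt[:3, 3])
        R_rel = H_pred[:3, :3] @ H_gt[:3, :3].T           # relative rotation
        cos_a = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
        r_err_deg = np.degrees(np.arccos(cos_a))
        return t_err <= t_thresh and r_err_deg <= r_thresh_deg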

Competing Approaches

CVPR13: Scene Coordinate Regression Forests [Shotton et al. CVPR’13]

CVPR13 + M-Best
– Take M-Best RANSAC hypotheses

Page 28:

Office

Input frame

Multiple predictions:

Ground-truth (white), prediction (magenta):

Page 29:

Stairs

Input frame

Multiple predictions:

Ground-truth (white), prediction (magenta):

Page 30:

All Scene Average

[Plot: Proportion Correct (y-axis, 0.66–0.80) vs. Size of Portfolio (x-axis, 1–10) for CVPR13, CVPR13 + M-Best, and Multi-Output]

Page 31:

All Scene Average

[Plot: same axes as the previous slide (Proportion Correct, 0.66–0.80, vs. Size of Portfolio, 1–10) for CVPR13, CVPR13 + M-Best, and Multi-Output, using aggregation]

Page 32:

Summary

Camera relocalization as an inverse problem

Portfolio of complementary discriminative predictors

Method to learn such a portfolio

State-of-the-art camera relocalization