Learning-based Localization · 2020. 3. 31. · PoseNet Discussion 29 [1]“PoseNet: A...

Learning-based Localization Eric Brachmann ECCV 2018 Tutorial on Visual Localization - Feature-based vs. Learned Approaches Torsten Sattler, Eric Brachmann


Learning-based Localization Eric Brachmann

ECCV 2018 Tutorial on Visual Localization - Feature-based vs. Learned Approaches

Torsten Sattler, Eric Brachmann

Roadmap

• Machine Learning Basics [10min]
  • Convolutional Neural Networks
  • Random Forests
• Camera Pose Regression [15min]
  • Pose Parametrization
  • Pose Loss Functions
  • Results
• Scene Coordinate Regression [35min]
  • Scene Coordinate Regression Forests
  • Scene Coordinate Regression CNNs
  • End-to-End Learning
  • Results


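Machine Learning

[Figure: a Training Set of images labeled Monday, Sunday, Wednesday, Saturday; a learned function 𝐲 = 𝑓(𝐼, 𝐰) maps an image 𝐼 to a label 𝐲 using parameters 𝐰 and predicts Wednesday, Saturday, Sunday for further images]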


Convolutional Neural Networks

Example 2×2 filter weights:

𝐰1 =
   0.2  -0.3
   0.7   0.1

𝐰2 =
   0.6  -0.4
  -0.1   0.0

[Figure: plot of 𝑓(𝑥) over 𝑥]

Stacked layers: $f_2(f_1(I, \mathbf{w}_1), \mathbf{w}_2)$

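Convolutional Neural Networks

[Figure: network predicting the labels Monday, Wednesday, Saturday, Sunday for input images]

Network output: $\mathbf{y} = f_3(f_2(f_1(I, \mathbf{w}_1), \mathbf{w}_2), \mathbf{w}_3)$

Loss for a training image labeled Sunday: $\ell(\mathbf{y})$

Gradient with respect to the first-layer weights (chain rule):

$\frac{\partial \ell}{\partial \mathbf{w}_1} = \frac{\partial \ell}{\partial f_3} \, \frac{\partial f_3}{\partial f_2} \, \frac{\partial f_2}{\partial f_1} \, \frac{\partial f_1}{\partial \mathbf{w}_1}$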
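The chain rule on this slide can be checked numerically on a toy network. The sketch below is only an illustration: the three scalar "layers", the tanh nonlinearity, and the squared loss are assumptions chosen for brevity and are not taken from the slides.

```python
import math

def forward(I, w1, w2, w3):
    """Toy three-layer network with scalar input and scalar weights (hypothetical)."""
    f1 = w1 * I                  # layer 1: linear
    f2 = math.tanh(w2 * f1)      # layer 2: linear followed by tanh
    f3 = w3 * f2                 # layer 3: linear, network output y
    return f1, f2, f3

def loss(y, target):
    """Squared error between the output and the training label."""
    return (y - target) ** 2

def grad_w1(I, w1, w2, w3, target):
    """d loss / d w1 as a product of the four chain-rule factors."""
    f1, f2, f3 = forward(I, w1, w2, w3)
    dl_df3 = 2.0 * (f3 - target)
    df3_df2 = w3
    df2_df1 = (1.0 - f2 ** 2) * w2   # derivative of tanh(w2 * f1) w.r.t. f1
    df1_dw1 = I
    return dl_df3 * df3_df2 * df2_df1 * df1_dw1

def numeric_grad_w1(I, w1, w2, w3, target, eps=1e-6):
    """Central finite difference, used only to verify grad_w1."""
    lp = loss(forward(I, w1 + eps, w2, w3)[2], target)
    lm = loss(forward(I, w1 - eps, w2, w3)[2], target)
    return (lp - lm) / (2.0 * eps)
```

Comparing `grad_w1` with `numeric_grad_w1` for a few inputs is a cheap sanity check of the kind deep learning frameworks automate.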

Convolutional Neural Networks

[Figure: network with weights 𝐰, from Long et al., “Fully convolutional networks for semantic segmentation”, CVPR15]

[Figure: network with weights 𝐰, image from the 12 Scenes Dataset: http://graphics.stanford.edu/projects/reloc/]

[Figure from Kokkinos, “UberNet: Training a Universal Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory”, CVPR17]

Convolutional Neural Networks

• Advantages:
  • Expensive but Parallelizable
  • Powerful due to End-To-End Training
• Disadvantages:
  • Interpretability
  • Training Data

Classical pipeline: Image 𝐼 → Feature Extraction (hand-crafted or learned) → Classification (hand-crafted or learned) → Output

End-to-end pipeline: Image 𝐼 → Learned → Output


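Random Forests

𝐲 = 𝑓(𝐼, 𝐰)

Feature responses for the example image:

• $g_{awake}(\mathbf{w}_1) = 0.5$
• $g_{smile}(\mathbf{w}_2) = -0.7$
• $g_{hat}(\mathbf{w}_3) = 0.0$

[Figure: decision tree with the split tests $g_{awake} > 0$ and $g_{smile} > 0$ and the leaf labels Sunday, Saturday, Monday]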

Random Forests

[Figure: training a decision tree. The labeled training images (Monday, Saturday, Sunday, Wednesday) are split first by the test $g_{awake} > 0$ and then by the test $g_{smile} > 0$]

Random Forests

[Shotton et al., “Real-Time Human Pose Recognition in Parts from Single Depth Images”, CVPR11]

Random Forests

• Advantages
  • Fast
  • Good Performance
  • Need less training data
• Disadvantages
  • No end-to-end training*
  • Hand-crafted features*
  • Interpretability

* Not in [Kontschieder et al., “Deep Neural Decision Forests”, ICCV15]

Rule 34 (of computer science after 2012): “There is a paper about it, no exceptions.”

(Andrej Karpathy, t-SNE embeddings of IMAGENET)


Camera Pose Regression

Input: RGB Image → Feed-Forward CNN → Output: Pose

$\hat{\mathbf{h}} = (r_1, r_2, r_3, t_1, t_2, t_3)^\top$

Annotated Training Data

[7 Scenes Dataset: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/]


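Pose Parametrization

$\hat{\mathbf{h}} \in SE(3)$ with $SE(3) = \left\{ \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix} \,\middle|\, R \in SO(3),\ \mathbf{t} \in \mathbb{R}^3 \right\}$

$SO(3) = \{ R \in \mathbb{R}^{3 \times 3} \mid R R^\top = I,\ \det R = 1 \}$

More information about the wonderful world of SO(3) in e.g. [Hartley et al., “Rotation Averaging”, IJCV13]

Options for parametrizing the rotation:

• Rotation Matrix: $R = \begin{pmatrix} r_1 & r_2 & r_3 \\ r_4 & r_5 & r_6 \\ r_7 & r_8 & r_9 \end{pmatrix}$. Enforce/Map: $R R^\top = I$, $\det R = 1$
• Unit Quaternion, e.g. [1]: $\mathbf{q} = (c, v_1, v_2, v_3)$. Enforce/Map: $\|\mathbf{q}\| = 1$
• Log Unit Quaternion, e.g. [3]: $\log \mathbf{q} = \tfrac{1}{2} \log R$ [4]. Enforce/Map: − (no constraint)
• Axis-Angle, e.g. [2]: $\log R = \theta \hat{\mathbf{u}} = (u_1', u_2', u_3')$. Enforce/Map: − (no constraint)

[1] Kendall et al., “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization”, ICCV15
[2] Brachmann et al., “DSAC - Differentiable RANSAC for Camera Localization”, CVPR17
[3] Brahmbhatt et al., “Geometry-Aware Learning of Maps for Camera Localization”, CVPR18
[4] Sola, “Quaternion kinematics for the error-state Kalman filter”, 2017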
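To make the parametrizations concrete, the following NumPy sketch maps a (possibly unnormalized) quaternion and an axis-angle vector to a rotation matrix and computes the log of a unit quaternion. The function names are hypothetical, and the quaternion convention q = (c, v1, v2, v3) with scalar part first is assumed from the slide.

```python
import numpy as np

def quat_to_rotmat(q):
    """Map a quaternion (c, v1, v2, v3) to a rotation matrix.
    The quaternion is normalized first, which enforces ||q|| = 1."""
    q = np.asarray(q, dtype=float)
    c, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - c * z),     2 * (x * z + c * y)],
        [2 * (x * y + c * z),     1 - 2 * (x * x + z * z), 2 * (y * z - c * x)],
        [2 * (x * z - c * y),     2 * (y * z + c * x),     1 - 2 * (x * x + y * y)],
    ])

def axis_angle_to_rotmat(u):
    """Rodrigues' formula: u = theta * unit_axis, no constraint on u."""
    u = np.asarray(u, dtype=float)
    theta = np.linalg.norm(u)
    if theta < 1e-12:
        return np.eye(3)
    k = u / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_quat(q):
    """Log of a unit quaternion as a 3-vector: (theta / 2) * unit_axis."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    c, v = q[0], q[1:]
    n = np.linalg.norm(v)
    if n < 1e-12:
        return np.zeros(3)
    return np.arctan2(n, c) * v / n
```

Because the log quaternion carries half the rotation angle, `axis_angle_to_rotmat(2 * log_quat(q))` reproduces `quat_to_rotmat(q)`.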


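Pose Loss Functions

How to measure rotation error?

Pose $\mathbf{h} = (R, \mathbf{t})$ with translation loss $\ell(\mathbf{t}, \mathbf{t}^*) = \|\mathbf{t} - \mathbf{t}^*\|$

• Quaternion Distance [2]: $\|\mathbf{q} - \mathbf{q}^*\|$
• Angle-Axis Distance, Log Quaternion Distance [3]: $\|\log R - \log R^*\|$
• Angular Distance [1]: $\theta(R, R^*) = \|\log(R^* R^\top)\|$

[Figure: the angle $\theta$ between $R$ and $R^*$, and the distance $\|\mathbf{q} - \mathbf{q}^*\|$ between the quaternions $\mathbf{q}$ and $\mathbf{q}^*$]

Note: $\|\log R - \log R^*\| \neq \|\log TR - \log TR^*\|$ [4], i.e. this distance changes when both rotations are transformed by the same $T$.

[1] Brachmann et al., “DSAC - Differentiable RANSAC for Camera Localization”, CVPR17
[2] Kendall et al., “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization”, ICCV15
[3] Brahmbhatt et al., “Geometry-Aware Learning of Maps for Camera Localization”, CVPR18
[4] Hartley et al., “Rotation Averaging”, IJCV13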
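The distances above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from any of the cited papers: the helper `rot_z` and all function names are hypothetical, and the angular distance is computed from the trace of $R^* R^\top$, which equals $\|\log(R^* R^\top)\|$ for angles in [0, π].

```python
import numpy as np

def rot_z(angle):
    """Hypothetical helper: rotation about the z-axis by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def translation_error(t, t_star):
    """l(t, t*) = ||t - t*||."""
    return float(np.linalg.norm(np.asarray(t, float) - np.asarray(t_star, float)))

def angular_distance(R, R_star):
    """theta(R, R*): rotation angle of R* R^T, from trace = 1 + 2 cos(theta)."""
    cos_theta = (np.trace(R_star @ R.T) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

def quaternion_distance(q, q_star):
    """||q - q*|| for unit quaternions. Note that q and -q encode the same
    rotation; this plain distance does not account for that."""
    q = np.asarray(q, float) / np.linalg.norm(q)
    q_star = np.asarray(q_star, float) / np.linalg.norm(q_star)
    return float(np.linalg.norm(q - q_star))
```

Unlike the difference of logs, the angular distance is unchanged when both rotations are multiplied by the same rotation T.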

Pose Loss Functions

How to combine rotation error and translation error?

• Hand-Tuned [2]: $\ell_\beta(\mathbf{h}, \mathbf{h}^*) = \ell(\mathbf{t}, \mathbf{t}^*) + \beta\,\ell(R, R^*)$
• Self-Tuned [3]: $\ell_{\sigma^2}(\mathbf{h}, \mathbf{h}^*) = \ell(\mathbf{t}, \mathbf{t}^*) \exp(-s_t) + s_t + \ell(R, R^*) \exp(-s_R) + s_R$, with $s = \log \sigma^2$
• Implicit/Metric [1]: $\ell(\mathbf{h}, \mathbf{h}^*) = \ell(\mathbf{t}, \mathbf{t}^*) + \ell(R, R^*)$ or $\max(\ell(\mathbf{t}, \mathbf{t}^*), \ell(R, R^*))$

[1] Brachmann et al., “DSAC - Differentiable RANSAC for Camera Localization”, CVPR17
[2] Kendall et al., “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization”, ICCV15
[3] Kendall and Cipolla, “Geometric Loss Functions for Camera Pose Regression with Deep Learning”, CVPR 2017
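The three ways of combining the errors translate directly into code. The sketch below takes already computed scalar errors as inputs; the function names are hypothetical, and in the self-tuned variant `s_t` and `s_R` would be trainable parameters (here they are plain arguments).

```python
import math

def loss_hand_tuned(l_t, l_R, beta):
    """l_beta = l(t, t*) + beta * l(R, R*), with beta chosen by hand."""
    return l_t + beta * l_R

def loss_self_tuned(l_t, l_R, s_t, s_R):
    """l_sigma2 with s = log(sigma^2); s_t and s_R are learned with the network."""
    return l_t * math.exp(-s_t) + s_t + l_R * math.exp(-s_R) + s_R

def loss_metric_sum(l_t, l_R):
    """Implicit weighting: errors are added in their metric units."""
    return l_t + l_R

def loss_metric_max(l_t, l_R):
    """Implicit weighting: the larger of the two errors dominates."""
    return max(l_t, l_R)
```

For a fixed error value l, the term l·exp(−s) + s is minimized at s = log l, so the learned s tracks the typical magnitude of each error and balances the two terms.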

Pose Loss Functions

How to combine rotation error and translation error?

Measure the reprojection error [1]:

$\ell_\pi(\mathbf{h}, \mathbf{h}^*) = \sum_{\mathbf{v} \in \mathcal{M}} \|\pi(\mathbf{h}, \mathbf{v}) - \pi(\mathbf{h}^*, \mathbf{v})\|$

[1] Kendall and Cipolla, “Geometric Loss Functions for Camera Pose Regression with Deep Learning”, CVPR 2017
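A minimal NumPy sketch of this loss follows. It assumes a pinhole camera with a 3×3 camera matrix `C`, poses given as `(R, t)` that map scene points into the camera frame, and the Euclidean norm on pixel differences; these conventions and all names are assumptions for illustration.

```python
import numpy as np

def project(R, t, C, points):
    """pi(h, v): project 3D scene points (N, 3) to pixels (N, 2) with pose (R, t)."""
    cam = points @ R.T + t          # scene -> camera coordinates
    pix = cam @ C.T                 # apply the camera matrix
    return pix[:, :2] / pix[:, 2:3] # perspective division

def reprojection_loss(pose, pose_star, C, points):
    """Sum over scene points of the pixel distance between both projections."""
    R, t = pose
    R_star, t_star = pose_star
    diff = project(R, t, C, points) - project(R_star, t_star, C, points)
    return float(np.linalg.norm(diff, axis=1).sum())
```

The loss is expressed in pixels, so rotation and translation errors are weighted implicitly by the scene geometry rather than by a hand-tuned factor.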


PoseNet Discussion

2015
• PoseNet [1]: ~45cm, 10° (GoogLeNet, Quaternions, $\ell_\beta$)

2017
• Spatial LSTM [2]: ~31cm, 10° (GoogLeNet+LSTM, Quaternions, $\ell_\beta$)
• PoseNet [3]: ~23cm, 8° (GoogLeNet, Quaternions, $\ell_{\sigma^2}$)
• PoseNet [3]: ~23cm, 8° (GoogLeNet, Quaternions, $\ell_\pi$)

2018
• PoseNet [4]: ~23cm, 8° (ResNet, Quaternions, $\ell_\pi$)
• PoseNet [4]: ~22cm, 8° (ResNet, log Quat., $\ell_{\sigma^2}$)
• MapNet+ [4]: ~19cm, 7° (ResNet, log Quat., $\ell_{\sigma^2} + \ell_{Rel}$)

For comparison, Sparse Features: ~5cm, 2°

Advantages PoseNet:

• Simple
• End-To-End Learning
• Fast

[1] “PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization”, Kendall et al., ICCV 2015
[2] “Image-Based Localization with Spatial LSTMs”, Walch et al., ICCV 2017
[3] “Geometric Loss Functions for Camera Pose Regression with Deep Learning”, Kendall and Cipolla, CVPR 2017
[4] “Geometry-Aware Learning of Maps for Camera Localization”, Brahmbhatt et al., CVPR18
[5] “Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization”, Sattler et al., PAMI 2017

[7 Scenes Dataset: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/]

Side Comment: Visual Odometry

• Pose Regression seems to work very well for estimating relative poses [1]
• Additional Input: Pose estimate of last frame (≈ Tracking)
• Accuracy: ~5cm, 4°

[1] “Deep Auxiliary Learning For Visual Localization And Odometry”, Valada et al., ICRA 2018

Feature-Based Pipeline

Image 𝐼 → Feature Extraction → Feature Matching → Estimate Pose → 6D Pose $\hat{\mathbf{h}}$

Feature Matching yields correspondences between Image Coordinates $\mathbf{p}_i$ and Scene Coordinates $\mathbf{y}_i$.

Estimate Pose: $\hat{\mathbf{h}} = \mathrm{PnP}(\{\mathbf{p}_i, \mathbf{y}_i\})$

[7 Scenes Dataset: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/]

Feature-Based Pipeline

Image 𝐼 → Feature Extraction → Feature Matching → RANSAC → 6D Pose $\hat{\mathbf{h}}$

Feature Matching yields correspondences between Image Coordinates $\mathbf{p}_i$ and Scene Coordinates $\mathbf{y}_i$.

RANSAC:
• Estimate Poses (PnP)
• Score Poses (Inliers)

[7 Scenes Dataset: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/]

Task Knowledge: Local Invariance

[Figure: one Training Image with known pose $\hat{\mathbf{h}}$ next to several Test Images whose poses are unknown (marked “?”)]

[7 Scenes Dataset: https://www.microsoft.com/en-us/research/project/rgb-d-dataset-7-scenes/]

Task Knowledge: The Camera Model

$\mathbf{p}_i = C\,\mathbf{h}\,\mathbf{y}_i$

• $\mathbf{p}_i$: Image Coordinate
• $C$: Camera Matrix
• $\mathbf{h}$: Pose (relates the Scene Coordinate System to the Camera Coordinate System)
• $\mathbf{y}_i$: Scene Coordinate

Estimate Pose: $\hat{\mathbf{h}} = \mathrm{PnP}(\{\mathbf{p}_i, \mathbf{y}_i\})$


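Scene Coordinate Regression [Sho13]

Image 𝐼 (RGB-D) → Scene Coordinate Regression → RANSAC → 6D Pose $\hat{\mathbf{h}}$

Scene Coordinate Regression predicts a scene coordinate $\mathbf{y}_i \in \mathbb{R}^3$ for each pixel $\mathbf{p}_i \in \mathbb{R}^2$.

RANSAC:
• Estimate Poses (Kabsch)
• Score Poses (Inliers)

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13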
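With RGB-D input, each pixel has a 3D point in camera coordinates (from depth) and a predicted 3D scene coordinate, so the pose can be estimated by rigid 3D-3D alignment. The following is a sketch of the standard Kabsch algorithm in NumPy, under the assumption that `src` holds camera-space points and `dst` the corresponding scene coordinates; the function name and argument layout are illustrative.

```python
import numpy as np

def kabsch(src, dst):
    """Find R, t minimizing sum_i ||R @ src[i] + t - dst[i]||^2.
    src, dst: arrays of shape (N, 3) with N >= 3 non-collinear points."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t
```

Inside RANSAC, this would be called on minimal sets of three correspondences to generate pose hypotheses, and again on all inliers for refinement.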

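Scene Coordinate Regression Forest

Split test at each tree node: $f(\mathbf{p}) < \tau$

Features: Depth Adapted Pixel Differences

$f_{DA\text{-}RGB}(\mathbf{p}) = I_{RGB}\!\left(\mathbf{p} + \frac{\boldsymbol{\delta}_1}{I_D(\mathbf{p})}, c_1\right) - I_{RGB}\!\left(\mathbf{p} + \frac{\boldsymbol{\delta}_2}{I_D(\mathbf{p})}, c_2\right)$

$f_{DA\text{-}D}(\mathbf{p}) = I_D\!\left(\mathbf{p} + \frac{\boldsymbol{\delta}_1}{I_D(\mathbf{p})}\right) - I_D\!\left(\mathbf{p} + \frac{\boldsymbol{\delta}_2}{I_D(\mathbf{p})}\right)$

Split Score: Reduction of Spatial Variance

$V(S_L, S_R) = |S_L| \sum_{\mathbf{y} \in S_L} \|\mathbf{y} - \bar{\mathbf{y}}_L\| + |S_R| \sum_{\mathbf{y} \in S_R} \|\mathbf{y} - \bar{\mathbf{y}}_R\|$

Here $S_L$ and $S_R$ are the sample sets sent to the left and right child, and $\bar{\mathbf{y}}_L$, $\bar{\mathbf{y}}_R$ are their mean scene coordinates.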
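The depth-adapted depth feature can be sketched as follows. Dividing the pixel offsets by the depth at 𝐩 makes the probe pattern shrink for distant surfaces. The nearest-pixel rounding and the clamping of probes to the image border are assumptions of this sketch, as are the function names.

```python
import numpy as np

def depth_adapted_depth_feature(depth, p, delta1, delta2):
    """f_DA-D(p) = I_D(p + delta1 / I_D(p)) - I_D(p + delta2 / I_D(p)).
    depth: (H, W) depth image; p: pixel (x, y); delta1, delta2: 2D offsets."""
    d = depth[p[1], p[0]]

    def probe(delta):
        # Scale the offset by the inverse depth and round to the nearest pixel.
        q = np.round(np.asarray(p, float) + np.asarray(delta, float) / d).astype(int)
        # Assumption: probes outside the image are clamped to the border.
        x = np.clip(q[0], 0, depth.shape[1] - 1)
        y = np.clip(q[1], 0, depth.shape[0] - 1)
        return depth[y, x]

    return float(probe(delta1) - probe(delta2))

def goes_left(depth, p, delta1, delta2, tau):
    """Split test f(p) < tau evaluated at one tree node."""
    return depth_adapted_depth_feature(depth, p, delta1, delta2) < tau
```

Each tree node stores its own offsets and threshold, so evaluating a tree for one pixel costs only a handful of image lookups.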

Scene Coordinate Regression Forest

[Figure: plots with axes 𝑋 and 𝑓𝑆; the prediction $\mathbf{y}_{pred}$ is obtained with Mean-Shift]

Scene Coordinate Regression Results

% Pose Err. < 5cm, 5°

Input    Method             Accuracy
RGB      Sparse Features    38.6%
RGB      [Bra16]            55.2%
RGB-D    [Sho13]            72.6%
RGB-D    [Val15]            91.3%
RGB-D    [Cav17]            90.7%

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13
[Val15] “Exploiting Uncertainty in Regression Forests for Accurate Camera Relocalization”, Valentin et al., CVPR’15
[Cav17] “On-the-Fly Adaptation of Regression Forests for Online Camera Localization”, Cavallari et al., CVPR’17
[Bra16] “Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image”, Brachmann et al., CVPR’16

Scene Coordinate Regression Forests

• Advantages:
  • Fast
  • Good Results
  • Training On-The-Fly
• Disadvantages:
  • Needs 3D Model for Training
  • No End-To-End Training

Training objective (split score):

$V(S_L, S_R) = |S_L| \sum_{\mathbf{y} \in S_L} \|\mathbf{y} - \bar{\mathbf{y}}_L\| + |S_R| \sum_{\mathbf{y} \in S_R} \|\mathbf{y} - \bar{\mathbf{y}}_R\|$


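Scene Coordinate Regression with a CNN

Image 𝐼 → Scene Coordinate Regression (CNN) → RANSAC → 6D Pose $\hat{\mathbf{h}}$

The CNN predicts a scene coordinate $\mathbf{y}_i \in \mathbb{R}^3$ for each pixel $\mathbf{p}_i \in \mathbb{R}^2$.

RANSAC:
• Estimate Poses (PnP)
• Score Poses (Inliers)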

Random Forest vs. CNN

[Figure: scene coordinate predictions compared with the Ground Truth. Forest Prediction: Pose Estimation Succeeds (< 5cm, 5°). CNN Prediction: Pose Estimation Fails (> 5cm, 5°)]

Random Forest vs. CNN

Method           % Obj. Coord. Err. < 10cm    % Pose Err. < 5cm, 5°
Random Forest    ~15%                          55.2%
CNN              ~25%                          17.2%

[Figure: histogram of the Object Coordinate Error (mm, 0 to 1000) against Frequency (0 to 0.14) for Random Forest and CNN, with the Inlier Threshold, the Inlier Peaks, and the Forest Outliers marked]

CNN optimized with: $\ell_{L2}(\mathbf{y}, \mathbf{y}^*) = \|\mathbf{y} - \mathbf{y}^*\|^2$

Scene Coordinate Loss Functions

[Figure: plots of the loss functions $\ell_{L1}$, $\ell_{L2}$, $\ell_{Huber}$, $\ell_{Tukey}$]

Method           % Obj. Coord. Err. < 10cm    % Pose Err. < 5cm, 5°
Random Forest    ~15%                          55.2%
CNN (𝐿2)         ~25%                          17.2%
CNN (𝐿1)         ~45%                          55.9%
CNN (Huber)      ~45%                          54.4%
CNN (Tukey)      ~45%                          52.1%

[Figure: histogram of the Object Coordinate Error (mm, 0 to 1000) against Frequency (0 to 0.14) for Random Forest, CNN (𝐿2), and CNN (𝐿1), with the Inlier Threshold marked]
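The four losses compared above differ in how strongly they penalize large residuals. The sketch below uses the textbook definitions applied to a residual r = ‖𝐲 − 𝐲*‖; the parameter defaults (`delta`, `c`) are common textbook choices and not values from the slides.

```python
import numpy as np

def l2_loss(r):
    """Squared error: large outliers dominate the objective."""
    return np.square(r)

def l1_loss(r):
    """Absolute error: grows only linearly with the residual."""
    return np.abs(r)

def huber_loss(r, delta=1.0):
    """Quadratic near zero, linear beyond delta."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def tukey_loss(r, c=4.685):
    """Tukey's biweight: saturates at c^2 / 6, so far outliers stop mattering."""
    a = np.abs(r)
    inside = (c ** 2 / 6.0) * (1.0 - (1.0 - (a / c) ** 2) ** 3)
    return np.where(a <= c, inside, c ** 2 / 6.0)
```

A residual of 10 costs 100 under 𝐿2 but only 10 under 𝐿1, which is one way to read why the 𝐿2-trained CNN spends capacity on outliers instead of sharpening the inlier peak.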

Scene Coordinate Regression Nets

• Advantages:
  • Fast
  • Better Results
• Disadvantages:
  • Needs 3D Model for Training
  • End-To-End Training?

Compared with Scene Coordinate Regression Forests:

• Advantages:
  • Fast
  • Good Results
• Disadvantages:
  • Needs 3D Model for Training
  • No End-To-End Training

Even Better Loss Function?

Image 𝐼 → Scene Coordinate Regression (CNN) → RANSAC → 6D Pose $\hat{\mathbf{h}}$

RANSAC:
• Estimate Poses (PnP)
• Score Poses (Inliers)

Optimized: $\ell_{L1}(\mathbf{y}, \mathbf{y}^*) = \|\mathbf{y} - \mathbf{y}^*\|$ on the scene coordinates, versus $\ell(\hat{\mathbf{h}}, \mathbf{h}^*)$ on the final pose

DSAC – Differentiable RANSAC [Bra17]

[Bra17] “DSAC – Differentiable RANSAC for Camera Localization”, Brachmann et al., CVPR 2017


What is RANSAC?

Task: estimate a model $\mathbf{h} = f(\{\mathbf{y}_i\})$ from data points $\mathbf{y}_i$.

[Figure: data points $\mathbf{y}_i$ with the hypotheses $\mathbf{h}_1$, $\mathbf{h}_2$, $\mathbf{h}_3$]

1) Sample multiple hypotheses $\mathbf{h}_j$: $\mathbf{h}_1 = f(\mathbf{y}_1, \mathbf{y}_4)$, $\mathbf{h}_2 = f(\mathbf{y}_2, \mathbf{y}_3)$, $\mathbf{h}_3 = f(\mathbf{y}_1, \mathbf{y}_5)$

2) Score each hypothesis, $s(\mathbf{h}_j)$: $s(\mathbf{h}_1) = 2$, $s(\mathbf{h}_2) = 2$, $s(\mathbf{h}_3) = 4$

3) Take best one (and refine): $\hat{\mathbf{h}} = \operatorname{argmax}_{\mathbf{h}_j} s(\mathbf{h}_j)$, here $\hat{\mathbf{h}} = \mathbf{h}_3$
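The three steps can be sketched on a toy problem. The slides do not say which model 𝑓 the example fits; the code below assumes 2D line fitting from two-point minimal sets, with an inlier count as score. All names and default parameters are illustrative.

```python
import numpy as np

def fit_line(p, q):
    """Hypothesis from a minimal set: the line through p and q,
    stored as (unit normal n, offset d) with n @ x == d on the line."""
    direction = q - p
    n = np.array([-direction[1], direction[0]])
    n = n / np.linalg.norm(n)
    return n, float(n @ p)

def inlier_count(line, points, tau):
    """Score: number of points closer than tau to the line."""
    n, d = line
    return int(np.sum(np.abs(points @ n - d) < tau))

def ransac_line(points, num_hypotheses=50, tau=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best, best_score = None, -1
    for _ in range(num_hypotheses):
        # 1) Sample a hypothesis from two distinct random points.
        i, j = rng.choice(len(points), size=2, replace=False)
        h = fit_line(points[i], points[j])
        # 2) Score the hypothesis.
        s = inlier_count(h, points, tau)
        # 3) Keep the best one (refinement omitted in this sketch).
        if s > best_score:
            best, best_score = h, s
    return best, best_score
```

In the localization pipeline the same loop runs with PnP (or Kabsch) in place of `fit_line` and reprojection errors in place of point-to-line distances.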

Our Pipeline

Image 𝐼 → Scene Coordinate Regression (CNN) → RANSAC → 6D Pose $\hat{\mathbf{h}}$

RANSAC:
• Estimate Poses (PnP)
• Score Poses

[Figure: pipeline stages Input RGB → Scene Coordinate Regression (weights 𝐰) → Hypothesis Sampling ($\mathbf{h}_1, \dots, \mathbf{h}_4$) → Scoring ($s(\mathbf{h}_1), \dots, s(\mathbf{h}_4)$, shown with the Reprojection Errors of $\mathbf{h}_2$) → Hypothesis Selection → Result $\hat{\mathbf{h}}$]

Required for end-to-end training: $\frac{\partial}{\partial \mathbf{w}} \ell(\hat{\mathbf{h}}, \mathbf{h}^*)$

End-to-End Learning: How?

• argmax Selection (non-differentiable, hard decision): $\mathbf{h}_{AM} = \operatorname{argmax}_{\mathbf{h}_j} s(\mathbf{h}_j)$
• Soft argmax Selection (differentiable, soft decision): $\mathbf{h}_{SoftAM} = \sum_j \frac{\exp(s(\mathbf{h}_j))}{\sum_k \exp(s(\mathbf{h}_k))} \mathbf{h}_j$
• Probabilistic Selection (differentiable, hard decision): $\mathbf{h}_{DSAC} = \mathbf{h}_j$, where $j \sim \frac{\exp(s(\mathbf{h}_j))}{\sum_k \exp(s(\mathbf{h}_k))}$

[Figure: hypotheses $\mathbf{h}_1, \dots, \mathbf{h}_4$ with scores $s(\mathbf{h}_1), \dots, s(\mathbf{h}_4)$ (Reprojection Errors of $\mathbf{h}_2$ shown), network weights 𝐰, and the selected pose $\hat{\mathbf{h}}$]
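The three selection strategies can be placed side by side in NumPy. In this sketch, hypotheses are treated as plain parameter vectors (rows of `hyps`), which is a simplifying assumption; averaging real poses would need care with the rotation part. Function names are hypothetical.

```python
import numpy as np

def softmax(scores):
    """exp(s_j) / sum_k exp(s_k), computed in a numerically stable way."""
    z = np.exp(scores - np.max(scores))
    return z / z.sum()

def select_argmax(hyps, scores):
    """Hard decision; the index choice has no useful gradient."""
    return hyps[int(np.argmax(scores))]

def select_soft_argmax(hyps, scores):
    """Soft decision: score-weighted average of all hypotheses."""
    return softmax(scores) @ hyps

def select_dsac(hyps, scores, rng):
    """Hard decision, but made by sampling j from the softmax distribution."""
    j = int(rng.choice(len(hyps), p=softmax(scores)))
    return hyps[j], j
```

The soft argmax returns a blend that may not coincide with any single hypothesis, whereas the probabilistic selection always returns one of the sampled hypotheses.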

DSAC – Differentiable RANSAC

Probabilistic Selection: $\mathbf{h}_{DSAC} = \mathbf{h}_j$, where $j \sim P(j \mid \mathbf{w}) = \frac{\exp(s(\mathbf{h}_j))}{\sum_k \exp(s(\mathbf{h}_k))}$

Differentiation via the expected task loss:

$\frac{\partial}{\partial \mathbf{w}} \mathbb{E}_{j \sim P(j \mid \mathbf{w})}\left[\ell(\mathbf{h}_j, \mathbf{h}^*)\right] = \mathbb{E}_{j \sim P(j \mid \mathbf{w})}\left[\ell(\mathbf{h}_j, \mathbf{h}^*) \frac{\partial}{\partial \mathbf{w}} \log P(j \mid \mathbf{w}) + \frac{\partial}{\partial \mathbf{w}} \ell(\mathbf{h}_j, \mathbf{h}^*)\right]$

• First term: Derivative of the selection probability
• Second term: Derivative of the task loss

[Figure: the weights 𝐰 enter the pipeline through Scene Coordinate Regression and Scoring]
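The identity above can be verified numerically on a toy problem. Everything in the sketch below is hypothetical: a single scalar weight `w`, three hypotheses with scores $s_j(w) = A_j w$, and losses $\ell_j(w) = (B_j w - C_j)^2$. The analytic gradient follows the two-term formula and is compared against a finite difference in the test.

```python
import numpy as np

# Hypothetical toy setup with a scalar weight w and three hypotheses.
A = np.array([1.0, -0.5, 2.0])   # scores:  s_j(w) = A[j] * w
B = np.array([0.3, 1.0, -0.7])   # losses:  l_j(w) = (B[j] * w - C[j]) ** 2
C = np.array([1.0, 0.0, 0.5])

def probs(w):
    """Selection probabilities P(j | w): softmax over the scores."""
    s = A * w
    z = np.exp(s - s.max())
    return z / z.sum()

def losses(w):
    """Task loss of every hypothesis."""
    return (B * w - C) ** 2

def expected_loss(w):
    """E_{j ~ P(j|w)} [ l_j(w) ]."""
    return float(probs(w) @ losses(w))

def expected_loss_grad(w):
    """E_j [ l_j * d/dw log P(j|w) + d/dw l_j ]."""
    p = probs(w)
    l = losses(w)
    dlogp = A - p @ A                 # derivative of log-softmax w.r.t. w
    dl = 2.0 * (B * w - C) * B        # derivative of each task loss
    return float(p @ (l * dlogp + dl))
```

In training, the expectation is approximated from sampled hypotheses, so no gradient has to flow through the discrete selection itself.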


Accuracy on 7-Scenes [Sho13]

Error < 5cm, 5°

Method                       Accuracy
Sparse Features [Sho13]      38.6%
Brachmann et al. [Bra16]     55.2%
Ours [Bra17], RANSAC         61.0% (not end-to-end)
Ours [Bra17], Soft argmax    57.8% (end-to-end)
Ours [Bra17], DSAC           66.2% (end-to-end)

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13
[Bra16] “Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image”, Brachmann et al., CVPR’16
[Bra17] “Learning to Predict Dense Correspondences For 6D Pose Estimation”, Brachmann, Thesis, 2017

Side Comment: Learning Scores

• [Sho13] Inlier Count: $s(\mathbf{h}) = \sum_i \mathbb{1}(\tau - r_i(\mathbf{h}, \mathbf{w}))$, not differentiable
• [Bra18] Soft Inlier Count: $s(\mathbf{h}) = \sum_i \mathrm{sig}(\tau - \beta\, r_i(\mathbf{h}, \mathbf{w}))$, differentiable
• [Bra17] Score Regression: learned $s(\mathbf{h})$, hard to regularize, overfits

[Figure: Training Set (2 Images Total) and a Test Image. Estimation with Learned Score [Bra17]: Estimated Camera Poses and 3D Model Overlay. Estimation with Soft Inlier Count [Bra18]: 3D Model Overlay]

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13
[Bra17] “DSAC – Differentiable RANSAC for Camera Localization”, Brachmann et al., CVPR’17
[Bra18] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18
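The hard and soft inlier counts differ only in the function applied to each residual. A minimal sketch, taking the residuals $r_i$ as a precomputed array and following the slide's form sig(τ − β·r); the function names and test values are illustrative.

```python
import numpy as np

def inlier_count(residuals, tau):
    """Hard count: step function of (tau - r_i); piecewise constant,
    so its gradient with respect to the residuals is zero almost everywhere."""
    return int(np.sum(residuals < tau))

def soft_inlier_count(residuals, tau, beta):
    """Soft count: sigmoid of (tau - beta * r_i), as written on the slide.
    Smooth in the residuals, so gradients can flow back to the network."""
    return float(np.sum(1.0 / (1.0 + np.exp(-(tau - beta * residuals)))))
```

With β = 1, a residual exactly at the threshold contributes 0.5 to the soft count, and contributions fall smoothly toward 0 as the residual grows.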

Accuracy on 7-Scenes [Sho13]

66

Share of test frames with pose error < 5cm, 5°:

• Sparse Features [Sho13]: 38.6%

• Brachmann et al. [Bra16]: 55.2%

• Ours [Bra17], RANSAC: 61.0% (not end-to-end)

• Ours [Bra17], Soft argmax: 57.8% (end-to-end)

• Ours [Bra17], DSAC: 66.2% (end-to-end)

• DSAC++ [Bra18]: 76.1% (end-to-end)

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13

[Bra16] “Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image”, Brachmann et al., CVPR’16

[Bra17] “Learning to Predict Dense Correspondences For 6D Pose Estimation”, Brachmann, Thesis, 2017

[Bra18] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18

Scene Coordinate Regression Nets

67

• Advantages:

• Fast

• End-To-End Training

• Better Best Results

• Disadvantages:

• Needs 3D Model for Training

• Advantages:

• Fast

• Good Results

• Disadvantages:

• Needs 3D Model for Training

• No End-To-End Training

Training without a 3D Model

68

• End-to-end training needs no 3D model, but it does need a good initialization

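• [Bra17] Initialization: Minimize 3D distance to ground truth scene coordinates: $\min_{\mathbf{w}} \sum_i \| \mathbf{y}_i(\mathbf{w}) - \mathbf{y}_i^* \|$

• [Bra18] Initialization: Minimize 2D reprojection error: $\min_{\mathbf{w}} \sum_i \| \pi(\mathbf{y}_i(\mathbf{w}), \mathbf{h}^*) - \mathbf{p}_i \|$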
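
The two initialization objectives above can be compared with a small NumPy sketch. Several things here are assumptions made for illustration: the ground truth pose h* is represented as a rotation `R` and translation `t` that map scene coordinates into the camera frame, π is a pinhole projection with focal length `f` and principal point `c`, and all function names are hypothetical. The network is left out; `y_pred` stands in for the predicted scene coordinates y_i(w).

```python
import numpy as np

def scene_coordinate_loss(y_pred, y_gt):
    """3D objective: sum over pixels of ||y_i(w) - y_i*|| (needs ground truth scene coordinates)."""
    return float(np.sum(np.linalg.norm(y_pred - y_gt, axis=1)))

def project(y, R, t, f, c):
    """Assumed pinhole projection pi(y, h): scene frame -> camera frame -> image plane."""
    cam = y @ R.T + t                       # (N, 3) points in the camera frame
    return f * cam[:, :2] / cam[:, 2:3] + c  # (N, 2) pixel positions

def reprojection_loss(y_pred, R_gt, t_gt, pixels, f, c):
    """2D objective: sum over pixels of ||pi(y_i(w), h*) - p_i|| (needs only the ground truth pose)."""
    return float(np.sum(np.linalg.norm(project(y_pred, R_gt, t_gt, f, c) - pixels, axis=1)))

# Toy setup: identity ground truth pose and three scene points in front of the camera.
R_gt, t_gt = np.eye(3), np.zeros(3)
f, c = 500.0, np.array([320.0, 240.0])
y_gt = np.array([[0.1, 0.2, 2.0],
                 [-0.3, 0.1, 3.0],
                 [0.0, 0.0, 1.5]])
pixels = project(y_gt, R_gt, t_gt, f, c)  # pixel positions p_i of the three points

# A prediction that lies on the correct viewing rays but at twice the depth.
y_pred = 2.0 * y_gt
print(scene_coordinate_loss(y_pred, y_gt))
print(reprojection_loss(y_pred, R_gt, t_gt, pixels, f, c))
```

In this toy setup the prediction has a large 3D error but essentially zero reprojection error, which illustrates that the 2D objective only constrains each prediction to its viewing ray and therefore requires no ground truth scene coordinates.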

[Bra17] “DSAC – Differentiable RANSAC for Camera Localization”, Brachmann et al., CVPR’17

[Bra18] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18

Accuracy on 7-Scenes [Sho13]

69

Share of test frames with pose error < 5cm, 5°:

• Sparse Features [Sho13]: 38.6%

• Brachmann et al. [Bra16]: 55.2%

• Ours [Bra17], RANSAC: 61.0% (not end-to-end)

• Ours [Bra17], Soft argmax: 57.8% (end-to-end)

• Ours [Bra17], DSAC: 66.2% (end-to-end)

• DSAC++ [Bra18], with 3D model: 76.1% (end-to-end)

• DSAC++ [Bra18], w/o 3D model (w/o M): 60.4% (end-to-end)

[Sho13] “Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images”, Shotton et al., CVPR’13

[Bra16] “Uncertainty-Driven 6D Pose Estimation of Objects and Scenes from a Single RGB Image”, Brachmann et al., CVPR’16

[Bra17] “Learning to Predict Dense Correspondences For 6D Pose Estimation”, Brachmann, Thesis, 2017

[Bra18] “Learning Less is More - 6D Camera Localization via 3D Surface Regression”, Brachmann and Rother, CVPR’18

Conclusion

71

Pose Regression

• fast

• coarse estimates

Scene Coordinate Regression

• with RFs: fast and accurate

• with CNNs: best accuracy, no 3D models

Learning pose estimation works! High accuracy possible.

Learning pose estimation is fun! Interesting mixture of learning and geometry.

Code of many methods online:

PoseNet: http://mi.eng.cam.ac.uk/projects/relocalisation/

MapNet: https://research.nvidia.com/publication/2018-06_Geometry-Aware-Learning-of

SCoRF [Bra16]: https://hci.iwr.uni-heidelberg.de/vislearn/research/scene-understanding/pose-estimation/#CVPR16

VLocNet: http://deeploc.cs.uni-freiburg.de/

DSAC: https://github.com/cvlab-dresden/DSAC

DSAC++: https://github.com/vislearn/LessMore