Learning Dense Correspondence via 3D-guided Cycle …tinghuiz/slides/cvpr16_cycle.pdf30% 15% 0...



Page 1

Learning Dense Correspondence via 3D-guided Cycle Consistency

Tinghui Zhou1, Philipp Krähenbühl1, Mathieu Aubry2, Qixing Huang3, Alexei A. Efros1

UC Berkeley1, ENPC ParisTech2, TTI-Chicago3

Page 2

The Unreasonable Effectiveness of Deep Learning?

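[Chart: performance gain of deep learning over traditional methods (y-axis 0 to 60%) for object detection, semantic segmentation, human pose, intrinsic images, video segmentation and dense matching, annotated along the x-axis from "lots of direct labels" to "very few direct labels".]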

Page 3


Dense Semantic Correspondence

Page 4


Dense Semantic Correspondence

Page 5


Traditional Pairwise Methods

• SIFT flow: Liu et al., ECCV 2008
• Generalized PatchMatch: Barnes et al., ECCV 2010
• Deformable Spatial Pyramid: Kim et al., CVPR 2013

[Diagram: hand-crafted features are extracted from each image separately and then passed to feature matching.]
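To make the pairwise pipeline concrete, here is a minimal sketch of its simplest form: brute-force nearest-neighbour matching of per-pixel descriptors, producing one displacement per source pixel. It assumes the descriptors are already computed (the tuples stand in for something like dense SIFT) and deliberately omits the smoothness terms and search strategies that SIFT flow, Generalized PatchMatch and DSP rely on; `nearest_neighbor_flow` is a hypothetical name.

```python
from typing import List, Tuple

Descriptor = Tuple[float, ...]
Flow = List[List[Tuple[int, int]]]  # flow[y][x] = (dx, dy)


def sq_dist(a: Descriptor, b: Descriptor) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_neighbor_flow(src: List[List[Descriptor]],
                          tgt: List[List[Descriptor]]) -> Flow:
    """For every source pixel, find the target pixel with the closest
    descriptor and return the displacement (dx, dy) to it.

    Brute force over all target pixels, O((H*W)^2) comparisons,
    with no smoothness prior (a simplification for illustration).
    """
    h, w = len(tgt), len(tgt[0])
    flow: Flow = []
    for y, row in enumerate(src):
        flow_row = []
        for x, d in enumerate(row):
            ty, tx = min(((j, i) for j in range(h) for i in range(w)),
                         key=lambda p: sq_dist(d, tgt[p[0]][p[1]]))
            flow_row.append((tx - x, ty - y))
        flow.append(flow_row)
    return flow


# 1x2 toy images whose descriptors are swapped between source and target.
print(nearest_neighbor_flow([[(0.0,), (1.0,)]], [[(1.0,), (0.0,)]]))
```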

Page 6

Collection Correspondence

• Congealing: Learned-Miller, PAMI 2006
• Collection Flow: Kemelmacher-Shlizerman et al., CVPR 2012
• Object discovery and segmentation: Rubinstein et al., CVPR 2013
• Compositional Image Model: Mobahi et al., CVPR 2014
• Object discovery and localization: Cho et al., CVPR 2015
• FlowWeb: T. Zhou et al., CVPR 2015
• Multi-image Matching: X. Zhou et al., ICCV 2015

Page 7

Labels for CNN Training?

[Diagram: CNN]

Dense correspondence is infeasible to label at large scale.

Page 8

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero

Page 9

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero
• 2-cycle consistency: $F_{i,j} \circ F_{j,i} = 0$

Page 10

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero
• 2-cycle consistency: $F_{i,j} \circ F_{j,i} = 0$

• 3-cycle consistency: $F_{i,k} \circ F_{k,j} \circ F_{j,i} = 0$
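A minimal sketch of what "composite flow along a cycle" means, assuming integer-valued flow fields stored as `flow[y][x] = (dx, dy)` and the left-to-right order used in the slides (follow $F_{i,j}$ first, then $F_{j,i}$). The helper names `compose` and `cycle_residual` are hypothetical, and using the mean vector length as the amount of inconsistency is an illustrative choice, not necessarily the authors' measure.

```python
from typing import List, Optional, Tuple

Vec = Tuple[int, int]
Flow = List[List[Optional[Vec]]]  # flow[y][x] = (dx, dy); None = no valid match


def compose(f_ab: Flow, f_bc: Flow) -> Flow:
    """Follow f_ab, then f_bc, adding up the displacement vectors.

    Left-to-right order as in the slides:
    (f_ab o f_bc)(p) = f_ab(p) + f_bc(p + f_ab(p)).
    Integer flows are assumed, so no interpolation is needed.
    """
    h, w = len(f_bc), len(f_bc[0])
    out: Flow = []
    for y, row in enumerate(f_ab):
        out_row: List[Optional[Vec]] = []
        for x, v in enumerate(row):
            if v is None:
                out_row.append(None)
                continue
            qx, qy = x + v[0], y + v[1]
            inside = 0 <= qx < w and 0 <= qy < h
            u = f_bc[qy][qx] if inside else None
            out_row.append(None if u is None else (v[0] + u[0], v[1] + u[1]))
        out.append(out_row)
    return out


def cycle_residual(*flows: Flow) -> float:
    """Compose the flows around a cycle and return the mean length of the
    resulting displacement (0.0 means perfectly cycle-consistent).
    Pixels whose path leaves the image or hits a None are ignored."""
    total = flows[0]
    for f in flows[1:]:
        total = compose(total, f)
    lengths = [(v[0] ** 2 + v[1] ** 2) ** 0.5
               for row in total for v in row if v is not None]
    return sum(lengths) / len(lengths) if lengths else 0.0


# 1x3 image: i -> j shifts right by one pixel, j -> i shifts back.
f_ij = [[(1, 0), (1, 0), None]]
f_ji = [[None, (-1, 0), (-1, 0)]]
print(cycle_residual(f_ij, f_ji))  # 2-cycle: 0.0
```

A 3-cycle is checked the same way, e.g. `cycle_residual(f_ik, f_kj, f_ji)`.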

Page 11

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero
• 2-cycle consistency: $F_{i,j} \circ F_{j,i} = 0$

• 3-cycle consistency: $F_{i,k} \circ F_{k,j} \circ F_{j,i} = 0$

Page 12

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero
• 2-cycle consistency: $F_{i,j} \circ F_{j,i} = 0$

• 3-cycle consistency: $F_{i,k} \circ F_{k,j} \circ F_{j,i} = 0$

[Diagram: CNN → amount of inconsistency]

Page 13

Cycle Consistency in Vision

Shape matching (Huang et al., SGP’13), SfM (Zach et al., CVPR’10), co-segmentation (Wang et al., ICCV’13)

Collection Correspondence

Zhou et al., CVPR’15; Zhou et al., ICCV’15

Page 14

Could be consistent but wrong…

[Figure: three flow fields whose entries are all zero; such a trivial solution satisfies every cycle constraint.]

Need an anchor edge!
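Why an anchor is needed can be written out in one line, assuming the per-pixel reading of composition as accumulating displacement vectors along the path (an illustrative assumption):

```latex
F_{i,j}(p) = 0 \quad \forall\, i, j, p
\;\Longrightarrow\;
(F_{i,k} \circ F_{k,j} \circ F_{j,i})(p) = 0 + 0 + 0 = 0 .
```

If instead one edge of a cycle is pinned to a known flow $\tilde{F}$, the other edges must compose to $\tilde{F}$, and the all-zero solution fails wherever $\tilde{F}(p) \neq 0$.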

Page 15

Synthetic Correspondence as the Anchor

[Diagram: 3D CAD model + viewpoint → renderer → correspondence from renderer]
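A minimal sketch of why a renderer yields correspondence for free, assuming a pinhole camera orbiting the CAD model (viewpoint reduced to an azimuth angle) and ignoring occlusion; a real renderer would rasterise the mesh and keep only visible surface points. Because the same 3D point is projected under both viewpoints, the two image positions correspond by construction. All names (`rotate_y`, `project`, `render_correspondence`) and the camera parameters are hypothetical.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def rotate_y(p: Point3, angle_deg: float) -> Point3:
    """Rotate a 3D point about the vertical (y) axis, i.e. change azimuth."""
    a = math.radians(angle_deg)
    x, y, z = p
    return (math.cos(a) * x + math.sin(a) * z, y, -math.sin(a) * x + math.cos(a) * z)


def project(p: Point3, focal: float, distance: float) -> Point2:
    """Pinhole projection; the camera sits at z = -distance looking toward +z."""
    x, y, z = p
    depth = z + distance
    return (focal * x / depth, focal * y / depth)


def render_correspondence(vertices: List[Point3], az1: float, az2: float,
                          focal: float = 1.0,
                          distance: float = 5.0) -> List[Tuple[Point2, Point2]]:
    """For each model point, return (position in view 1, flow vector to view 2).
    Occlusion is ignored: every point is assumed visible in both views."""
    result = []
    for v in vertices:
        p1 = project(rotate_y(v, az1), focal, distance)
        p2 = project(rotate_y(v, az2), focal, distance)
        result.append((p1, (p2[0] - p1[0], p2[1] - p1[1])))
    return result


for pos, flow in render_correspondence([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], 0.0, 90.0):
    print(pos, flow)
```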

Page 16

3D-guided Cycle Consistency

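[Diagram: a 4-cycle linking synthetic renders s1, s2 and real images r1, r2 through the flows $F_{s_1,r_1}$, $F_{r_1,r_2}$, $F_{r_2,s_2}$; the edge $\tilde{F}_{s_1,s_2}$ is the ground truth.]

$\tilde{F}_{s_1,s_2} = F_{s_1,r_1} \circ F_{r_1,r_2} \circ F_{r_2,s_2}$ (accumulate flow vectors along the path)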
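Reading "accumulate flow vectors" literally, the composite can be expanded per pixel $p$ of $s_1$ as below; evaluating each flow at the point reached so far is our illustrative reading, and non-integer points would need interpolation.

```latex
\begin{aligned}
p_1 &= p + F_{s_1,r_1}(p), \\
p_2 &= p_1 + F_{r_1,r_2}(p_1), \\
(F_{s_1,r_1} \circ F_{r_1,r_2} \circ F_{r_2,s_2})(p) &= F_{s_1,r_1}(p) + F_{r_1,r_2}(p_1) + F_{r_2,s_2}(p_2),
\end{aligned}
```

and the constraint asks this sum to equal the renderer's $\tilde{F}_{s_1,s_2}(p)$ at every pixel $p$ where it is defined.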

Page 17

TRAINING TIME

3D-guided Cycle Consistency

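[Diagram: the same 4-cycle synthetic s1 → real r1 → real r2 → synthetic s2, with $\tilde{F}_{s_1,s_2}$ as ground truth.]

$\min \sum_{\langle s_1,s_2,r_1,r_2 \rangle} \mathcal{L}\left(\tilde{F}_{s_1,s_2},\; F_{s_1,r_1} \circ F_{r_1,r_2} \circ F_{r_2,s_2}\right)$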
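The objective can be evaluated for a single quartet as sketched below. The slides do not specify $\mathcal{L}$; this sketch assumes the mean endpoint error (Euclidean distance between flow vectors) over pixels where both flows are defined, integer flows, and the slides' left-to-right composition order. Actual training would need a differentiable composition (for example with bilinear sampling) so that gradients reach the network; `compose` and `cycle_loss` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

Vec = Tuple[int, int]
Flow = List[List[Optional[Vec]]]  # flow[y][x] = (dx, dy); None = path left the image


def compose(f_ab: Flow, f_bc: Flow) -> Flow:
    """(f_ab o f_bc)(p) = f_ab(p) + f_bc(p + f_ab(p)), integer flows assumed."""
    h, w = len(f_bc), len(f_bc[0])
    out: Flow = []
    for y, row in enumerate(f_ab):
        out_row: List[Optional[Vec]] = []
        for x, v in enumerate(row):
            if v is None:
                out_row.append(None)
                continue
            qx, qy = x + v[0], y + v[1]
            u = f_bc[qy][qx] if 0 <= qx < w and 0 <= qy < h else None
            out_row.append(None if u is None else (v[0] + u[0], v[1] + u[1]))
        out.append(out_row)
    return out


def cycle_loss(f_s1r1: Flow, f_r1r2: Flow, f_r2s2: Flow, gt_s1s2: Flow) -> float:
    """Hypothetical L: mean endpoint error between the renderer's ground-truth
    flow s1 -> s2 and the predicted composite s1 -> r1 -> r2 -> s2."""
    pred = compose(compose(f_s1r1, f_r1r2), f_r2s2)
    errors = [((p[0] - g[0]) ** 2 + (p[1] - g[1]) ** 2) ** 0.5
              for prow, grow in zip(pred, gt_s1s2)
              for p, g in zip(prow, grow)
              if p is not None and g is not None]
    return sum(errors) / len(errors) if errors else 0.0


zeros = [[(0, 0), (0, 0), (0, 0)]]
shift = [[(1, 0), (1, 0), None]]
print(cycle_loss(zeros, shift, zeros, [[(1, 0), (1, 0), None]]))  # 0.0
```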

Page 18

Network Architecture

[Figure: two encoder branches with weight sharing take the 128×128×3 source and target images down to 8×8×256 features (intermediate widths from 16 to 256 channels); a decoder upsamples the concatenated 8×8×512 features back to a 128×128×2 flow field.]
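As a sanity check on the figure, the sketch below traces feature-map shapes through the network. The pairing of spatial sizes with channel counts is our reading of the figure's numbers, and merging the two 8×8×256 branch outputs by concatenation is inferred from 256 + 256 = 512; layer types, kernel sizes and strides are not given and are not modelled.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)

# Numbers read off the architecture figure; pairing sizes with channels is an assumption.
ENC_SIZES = [128, 64, 64, 32, 32, 16, 16, 8]
ENC_CHANNELS = [16, 32, 32, 64, 64, 128, 128, 256]
DEC_SIZES = [8, 16, 16, 32, 32, 64, 64, 128, 128]
DEC_CHANNELS = [512, 256, 256, 128, 128, 64, 64, 32, 2]


def encoder_shapes(input_shape: Shape) -> List[Shape]:
    """Shapes produced by one encoder branch (source and target share weights)."""
    if input_shape != (128, 128, 3):
        raise ValueError("the figure shows 128x128 RGB inputs")
    return [input_shape] + [(s, s, c) for s, c in zip(ENC_SIZES, ENC_CHANNELS)]


def network_shapes() -> List[Shape]:
    """Decoder shapes, starting from the merged source/target features."""
    src = encoder_shapes((128, 128, 3))
    tgt = encoder_shapes((128, 128, 3))  # same weights, hence same shapes
    h, w, ch = src[-1]
    merged = (h, w, ch + tgt[-1][2])  # assumed channel-wise concatenation
    if merged != (DEC_SIZES[0], DEC_SIZES[0], DEC_CHANNELS[0]):
        raise ValueError("branch outputs do not match the decoder input")
    return [merged] + [(s, s, c) for s, c in zip(DEC_SIZES[1:], DEC_CHANNELS[1:])]


for shape in network_shapes():
    print(shape)  # ends with (128, 128, 2): a 2-channel flow field
```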

Page 19

Matchability Prediction

[Diagram: source and target → CNN → flow field]

Page 20

Matchability Prediction

[Diagram: source and target → CNN → flow field]

Background: ✗ (no valid match)

Page 21

Matchability Prediction

[Diagram: source and target → CNN → flow field]

Background: ✗ (no valid match)
Occlusion: ✗ (no valid match)

Page 22

Matchability Prediction

[Diagram: source and target → CNN → flow field and matchability]
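One way to obtain matchability labels for rendered pairs, sketched under assumptions: a source pixel is marked unmatchable if it is background in the source render, if its match falls outside the target, or if its 3D point is hidden in the target render (a standard z-buffer test). The inputs `corr_depth` (depth of each source pixel's 3D point as seen from the target camera) and `tgt_zbuf` are hypothetical renderer outputs; this is not necessarily how the authors produced matchability supervision.

```python
from typing import List, Tuple


def matchability_labels(src_fg: List[List[bool]],
                        flow: List[List[Tuple[int, int]]],
                        corr_depth: List[List[float]],
                        tgt_zbuf: List[List[float]],
                        tol: float = 1e-3) -> List[List[int]]:
    """Return 1 where a source pixel has a visible match in the target, else 0.

    src_fg:     True where the source render shows the object (not background).
    flow:       integer source -> target displacement per source pixel.
    corr_depth: depth of the source pixel's 3D point in the target camera.
    tgt_zbuf:   depth buffer of the target render (closest surface per pixel).
    """
    h, w = len(tgt_zbuf), len(tgt_zbuf[0])
    labels = []
    for y, row in enumerate(src_fg):
        out_row = []
        for x, fg in enumerate(row):
            dx, dy = flow[y][x]
            tx, ty = x + dx, y + dy
            # Background, out-of-frame, or occluded (a closer surface wins) -> 0.
            visible = (fg and 0 <= tx < w and 0 <= ty < h
                       and abs(corr_depth[y][x] - tgt_zbuf[ty][tx]) <= tol)
            out_row.append(1 if visible else 0)
        labels.append(out_row)
    return labels


# Pixel 0 visible, pixel 1 occluded (something at depth 1.0 is in front), pixel 2 background.
print(matchability_labels([[True, True, False]], [[(0, 0), (0, 0), (0, 0)]],
                          [[2.0, 3.0, 0.0]], [[2.0, 1.0, 5.0]]))
```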

Page 23

Training Set Construction

PASCAL 3D (bbox + viewpoint): Xiang et al., WACV’14
ShapeNet (synthetic rendering): Chang et al., arXiv’15

Page 24

Training Set Construction

PASCAL 3D (bbox + viewpoint): Xiang et al., WACV’14
ShapeNet (synthetic rendering): Chang et al., arXiv’15

Page 25

Training Set Construction

Single view reconstruction via joint analysis of image and shape collections, Huang et al., SIGGRAPH 2015

Image-to-shape retrieval

Page 26

Training Set Construction

One training example

• ~80,000 examples per category
• A single network for all 12 PASCAL3D categories (aero, boat, bus, car, chair, etc.)
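A sketch of assembling training quartets $\langle s_1, s_2, r_1, r_2\rangle$, under stated assumptions: two real images of the same category are paired, one CAD model is retrieved (here from $r_1$ alone, an assumption), and that single model is rendered at the annotated viewpoints of $r_1$ and $r_2$, so the renderer knows the ground-truth flow between $s_1$ and $s_2$. Viewpoints are reduced to azimuth, `retrieve` stands in for the image-to-shape retrieval step, and all class and function names are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RealImage:
    image_id: str
    category: str
    azimuth: float  # viewpoint annotation (e.g. from PASCAL 3D), reduced to azimuth


@dataclass(frozen=True)
class Render:
    model_id: str
    azimuth: float


Quartet = Tuple[Render, Render, RealImage, RealImage]  # <s1, s2, r1, r2>


def make_quartet(r1: RealImage, r2: RealImage,
                 retrieve: Callable[[RealImage], str]) -> Quartet:
    """Render ONE retrieved CAD model at the viewpoints of r1 and r2."""
    if r1.category != r2.category:
        raise ValueError("r1 and r2 must show the same category")
    model_id = retrieve(r1)  # assumption: retrieval driven by r1 alone
    return Render(model_id, r1.azimuth), Render(model_id, r2.azimuth), r1, r2


def sample_quartets(images: List[RealImage], retrieve: Callable[[RealImage], str],
                    n: int, seed: int = 0) -> List[Quartet]:
    """Sample n quartets from same-category pairs of real images."""
    rng = random.Random(seed)
    by_cat: Dict[str, List[RealImage]] = {}
    for im in images:
        by_cat.setdefault(im.category, []).append(im)
    cats = sorted(c for c, ims in by_cat.items() if len(ims) >= 2)
    if not cats:
        raise ValueError("need at least two images of some category")
    out = []
    for _ in range(n):
        r1, r2 = rng.sample(by_cat[rng.choice(cats)], 2)
        out.append(make_quartet(r1, r2, retrieve))
    return out


images = [RealImage("a", "car", 0.0), RealImage("b", "car", 90.0), RealImage("c", "bus", 30.0)]
for quartet in sample_quartets(images, lambda im: "cad_" + im.category, n=2):
    print(quartet)
```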

Page 27

RESULTS

Page 28

Image Warping Visualization

[Figure: source, target, and warping results from SIFT flow and from our method]

Page 29

Image Warping Visualization

[Figure: source, target, and warping results from SIFT flow and from our method]
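Warping visualizations like these can be produced by backward warping: each target pixel fetches the source pixel its flow points to. The sketch assumes a flow defined on target pixels and pointing into the source, and uses nearest-neighbour rounding rather than bilinear interpolation; `warp_to_target` is a hypothetical name.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def warp_to_target(source: Sequence[Sequence[T]],
                   flow_t2s: Sequence[Sequence[Tuple[float, float]]],
                   fill: Optional[T] = None) -> List[List[Optional[T]]]:
    """out[y][x] = source at (x + dx, y + dy), rounded to the nearest pixel,
    where (dx, dy) = flow_t2s[y][x]. Out-of-bounds lookups get `fill`."""
    h, w = len(source), len(source[0])
    out: List[List[Optional[T]]] = []
    for y, row in enumerate(flow_t2s):
        out_row: List[Optional[T]] = []
        for x, (dx, dy) in enumerate(row):
            sx, sy = round(x + dx), round(y + dy)
            out_row.append(source[sy][sx] if 0 <= sx < w and 0 <= sy < h else fill)
        out.append(out_row)
    return out


# Every target pixel looks one pixel to the right in the source.
print(warp_to_target([[1, 2, 3]], [[(1, 0), (1, 0), (1, 0)]]))  # [[2, 3, None]]
```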

Page 30

Keypoint Transfer

[Figure: keypoints transferred from source to target by SIFT flow and by our method]

Accuracy (PCK)

Category   SIFT flow   Ours
Mean       19.6        24.0
Car        22.4        33.3
Bus        28.6        40.3
Bottle     28.3        40.3
TV         42.9        51.1
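PCK (percentage of correct keypoints) counts a transferred keypoint as correct when it lands within a threshold distance of its ground-truth location. The slide does not state the threshold or how it is normalised, so this sketch assumes $\alpha \cdot \max(w, h)$ with $\alpha = 0.1$; it returns a fraction, to be multiplied by 100 for a percentage.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def pck(pred: Sequence[Point], gt: Sequence[Point],
        img_w: int, img_h: int, alpha: float = 0.1) -> float:
    """Fraction of predicted keypoints within alpha * max(img_w, img_h)
    pixels of their ground-truth positions (assumed normalisation)."""
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if not gt:
        return 0.0
    thr = alpha * max(img_w, img_h)
    hits = sum(1 for (px, py), (gx, gy) in zip(pred, gt)
               if math.hypot(px - gx, py - gy) <= thr)
    return hits / len(gt)


# Threshold is 10 px for a 100x100 image: the first keypoint is 2 px off, the second 30 px.
print(pck([(12, 10), (80, 50)], [(10, 10), (50, 50)], 100, 100))  # 0.5
```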

Page 31

Matchability Prediction

[Figure: source, target, our predicted matchability, and ground truth]

Accuracy: SIFT flow 64.5, Ours 72.0

Page 32

t-SNE Feature Visualization

[Figure: the architecture extended with a second decoder: weight-sharing source and target encoders feed one decoder that outputs the flow field (512 down to 2 channels) and one that outputs matchability (256 down to 2 channels); the features visualized here are labelled "Global image features".]

Page 33

t-SNE Feature Visualization

[t-SNE plot with clusters labelled: side views, 45° views, frontal views]

Page 34

Application: Cross-domain Dense Label Transfer

[Figure columns: source, target, Dense CRF, SIFT flow, ours]
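Dense label transfer can reuse the predicted flow and matchability: each target pixel copies the label of the source pixel it maps to, and pixels predicted as unmatchable are left unlabelled. This is a sketch under assumptions: integer flows and matchability scores defined on target pixels (for instance by running the network with the target as its first input), a 0.5 threshold, and `-1` as the "unknown" label; `transfer_labels` is a hypothetical name.

```python
from typing import List, Tuple


def transfer_labels(src_labels: List[List[int]],
                    flow_t2s: List[List[Tuple[int, int]]],
                    match_t: List[List[float]],
                    threshold: float = 0.5,
                    unknown: int = -1) -> List[List[int]]:
    """Copy each target pixel's label from the source pixel its flow points to,
    unless matchability is below `threshold` or the match leaves the image."""
    h, w = len(src_labels), len(src_labels[0])
    out = []
    for y, row in enumerate(flow_t2s):
        out_row = []
        for x, (dx, dy) in enumerate(row):
            sx, sy = x + dx, y + dy
            ok = match_t[y][x] >= threshold and 0 <= sx < w and 0 <= sy < h
            out_row.append(src_labels[sy][sx] if ok else unknown)
        out.append(out_row)
    return out


# Pixel 0 copies label 8; pixel 1 is unmatchable; pixel 2 points outside the source.
print(transfer_labels([[7, 8, 9]], [[(1, 0), (0, 0), (5, 0)]], [[0.9, 0.2, 0.9]]))
```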

Page 35

Conclusion

TRAINING TIME

[Diagram: the 3D-guided cycle synthetic s1 → real r1 → real r2 → synthetic s2, with $\tilde{F}_{s_1,s_2}$ as ground truth]

• Cycle consistency is effective when direct labels are not available
• ‘Meta’-supervision: supervising the behavior of the data

Thank you!