Slides: …tinghuiz/slides/cvpr16_cycle.pdf

Learning Dense Correspondence via 3D-guided Cycle Consistency

Tinghui Zhou¹, Philipp Krähenbühl¹, Mathieu Aubry², Qixing Huang³, Alexei A. Efros¹

¹UC Berkeley, ²ENPC ParisTech, ³TTI-Chicago

The Unreasonable Effectiveness of Deep Learning?

[Bar chart: performance gain over traditional methods (y-axis, 0 to 60%) for object detection, semantic seg., human pose, intrinsic image, video seg., and dense matching. Legend: "Lots of direct labels" and "Very few direct labels".]

Dense Semantic Correspondence


Traditional Pairwise Methods

• SIFT flow: Liu et al., ECCV 2008
• Generalized PatchMatch: Barnes et al., ECCV 2010
• Deformable Spatial Pyramid: Kim et al., CVPR 2013

[Diagram: hand-crafted features are computed for each of the two images, followed by feature matching.]

Collection Correspondence

• Congealing: Learned-Miller, PAMI 2006
• Collection Flow: Kemelmacher-Shlizerman et al., CVPR 2012
• Object discovery and segmentation: Rubinstein et al., CVPR 2013
• Compositional Image Model: Mobahi et al., CVPR 2014
• Object discovery and localization: Cho et al., CVPR 2015
• FlowWeb: T. Zhou et al., CVPR 2015
• Multi-image Matching: X. Zhou et al., ICCV 2015

Labels for CNN Training?

CNN: infeasible to label at large scale

Cycle-consistency as Supervision

• Composite flows along a cycle should be zero
• 2-cycle consistency: $F_{i,j} \circ F_{j,i} = 0$
• 3-cycle consistency: $F_{i,k} \circ F_{k,j} \circ F_{j,i} = 0$

[Diagram: the amount of inconsistency is fed back to the CNN.]
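The cycle-consistency bullets above can be made concrete with a small sketch. It assumes a flow is stored as rows of (dx, dy) pixel displacements and that composition means "follow the first flow, then add the second flow at the landing pixel", using nearest-pixel lookup with clamping at the image border. These storage and sampling choices, and the names `compose`, `cycle_inconsistency`, and `uniform_flow`, are illustrative assumptions and are not taken from the slides.

```python
import math

def compose(f_ab, f_bc):
    """Composite flow a->c: follow f_ab, then add f_bc at the landing pixel.

    A flow is a list of rows; each entry is a (dx, dy) displacement in pixels.
    """
    h, w = len(f_ab), len(f_ab[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = f_ab[y][x]
            # Landing pixel in image b: nearest pixel, clamped to the image.
            xb = min(max(round(x + dx), 0), w - 1)
            yb = min(max(round(y + dy), 0), h - 1)
            ex, ey = f_bc[yb][xb]
            row.append((dx + ex, dy + ey))
        out.append(row)
    return out

def cycle_inconsistency(*flows):
    """Mean length of the composite flow around a closed cycle (0 = consistent)."""
    total = flows[0]
    for f in flows[1:]:
        total = compose(total, f)
    lengths = [math.hypot(dx, dy) for row in total for dx, dy in row]
    return sum(lengths) / len(lengths)

def uniform_flow(h, w, dx, dy):
    """Toy flow in which every pixel moves by the same (dx, dy)."""
    return [[(dx, dy)] * w for _ in range(h)]

f_ij = uniform_flow(8, 8, 2.0, 1.0)
f_ji = uniform_flow(8, 8, -2.0, -1.0)
consistent = cycle_inconsistency(f_ij, f_ji)    # the two flows cancel exactly
inconsistent = cycle_inconsistency(f_ij, f_ij)  # the two flows add up instead
```

Passing three flows to `cycle_inconsistency` gives the corresponding 3-cycle residual.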

Cycle Consistency in Vision

• Shape matching: Huang et al., SGP 2013
• Co-segmentation: Wang et al., ICCV 2013
• SfM: Zach et al., CVPR 2010
• Collection correspondence: Zhou et al., CVPR 2015; Zhou et al., ICCV 2015

Could be consistent but wrong…

[Figure: three all-zero matrices, each of the form $\begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$.]

Need an anchor edge!
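A toy calculation illustrates why consistency alone is not enough. To keep it short, the sketch assumes every flow is a single global translation, so composing flows reduces to adding displacement vectors. The shift values and the names `compose` and `length` are hypothetical.

```python
import math

def compose(*shifts):
    """Compose global translations: the displacements simply add."""
    return (sum(s[0] for s in shifts), sum(s[1] for s in shifts))

def length(v):
    return math.hypot(v[0], v[1])

true_ij = (3.0, 0.0)  # hypothetical true shift from image i to image j
pred_ij = (0.0, 0.0)  # degenerate prediction: every pixel maps to itself
pred_ji = (0.0, 0.0)

# The all-zero prediction is perfectly cycle-consistent ...
cycle_error = length(compose(pred_ij, pred_ji))
# ... yet it is wrong with respect to the true correspondence.
true_error = length((pred_ij[0] - true_ij[0], pred_ij[1] - true_ij[1]))

# Once one edge of the cycle is known (the anchor), the degenerate
# prediction on the remaining edge no longer closes the cycle.
anchor_ij = true_ij
anchored_error = length(compose(anchor_ij, pred_ji))
```

Here `cycle_error` is zero while `true_error` and `anchored_error` are not, which is the motivation for anchoring the cycle.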

Synthetic Correspondence as the Anchor

3D CAD Model

Viewpoint Renderer

Correspondence from renderer
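One way to see how a renderer yields correspondence for free: the same 3D surface point is projected into two views, and the difference of the two pixel positions is the flow vector. The sketch below assumes an orthographic camera that only rotates about the vertical axis, and it ignores visibility (a real renderer would also check that the point is unoccluded in both views). The function names, the scale, and the image center are hypothetical.

```python
import math

def project(point, azimuth_deg, scale=50.0, center=(64.0, 64.0)):
    """Rotate a 3D point about the vertical (y) axis, then project orthographically."""
    x, y, z = point
    a = math.radians(azimuth_deg)
    xr = math.cos(a) * x + math.sin(a) * z
    # Image y grows downward, so the model's up axis is flipped.
    return (center[0] + scale * xr, center[1] - scale * y)

def rendered_correspondence(points, azimuth_1, azimuth_2):
    """For each surface point: its pixel in view 1 and the flow vector to view 2."""
    matches = []
    for p in points:
        u1 = project(p, azimuth_1)
        u2 = project(p, azimuth_2)
        matches.append((u1, (u2[0] - u1[0], u2[1] - u1[1])))
    return matches

# Three points of a hypothetical CAD model, rendered from two azimuths.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
matches = rendered_correspondence(points, azimuth_1=0.0, azimuth_2=90.0)
```

A point on the rotation axis, such as the second one, does not move between the two views, so its flow vector is zero.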

3D-guided Cycle Consistency

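[Diagram, training time: a cycle over synthetic s1, synthetic s2, real r1, and real r2. The edges $F_{s_1,r_1}$, $F_{r_1,r_2}$, and $F_{r_2,s_2}$ are composed by accumulating flow vectors; the edge $\tilde{F}_{s_1,s_2}$ is the ground truth.]

$\tilde{F}_{s_1,s_2} = F_{s_1,r_1} \circ F_{r_1,r_2} \circ F_{r_2,s_2}$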

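[Diagram: the same cycle, with real r1, real r2, and the ground-truth edge $\tilde{F}_{s_1,s_2}$.]

$\min \sum_{\langle s_1, s_2, r_1, r_2 \rangle} \mathcal{L}\left( \tilde{F}_{s_1,s_2},\ F_{s_1,r_1} \circ F_{r_1,r_2} \circ F_{r_2,s_2} \right)$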
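The training objective can be sketched for a single quartet of two synthetic and two real images. The sketch assumes flows stored as rows of (dx, dy) displacements, nearest-pixel composition with clamping, and mean endpoint error as the loss; the slides do not specify the loss, so that choice, like the names `compose`, `cycle_loss`, and `uniform`, is an assumption.

```python
import math

def compose(f_ab, f_bc):
    """Follow f_ab, then add f_bc sampled at the (clamped, nearest) landing pixel."""
    h, w = len(f_ab), len(f_ab[0])
    out = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = f_ab[y][x]
            xb = min(max(round(x + dx), 0), w - 1)
            yb = min(max(round(y + dy), 0), h - 1)
            ex, ey = f_bc[yb][xb]
            out[y][x] = (dx + ex, dy + ey)
    return out

def cycle_loss(gt_s1_s2, f_s1_r1, f_r1_r2, f_r2_s2):
    """Mean endpoint error between the ground-truth flow and the composite flow."""
    composite = compose(compose(f_s1_r1, f_r1_r2), f_r2_s2)
    errors = [
        math.hypot(g[0] - c[0], g[1] - c[1])
        for g_row, c_row in zip(gt_s1_s2, composite)
        for g, c in zip(g_row, c_row)
    ]
    return sum(errors) / len(errors)

def uniform(dx, dy, h=6, w=6):
    """Toy flow in which every pixel moves by the same (dx, dy)."""
    return [[(dx, dy)] * w for _ in range(h)]

gt = uniform(3.0, 0.0)  # stands in for the renderer's ground-truth flow
loss_good = cycle_loss(gt, uniform(1.0, 0.0), uniform(2.0, 1.0), uniform(0.0, -1.0))
loss_bad = cycle_loss(gt, uniform(0.0, 0.0), uniform(0.0, 0.0), uniform(0.0, 0.0))
```

In this toy case the three predicted flows of `loss_good` add up to the ground truth, while the all-zero prediction of `loss_bad` is penalised by the full length of the ground-truth flow.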

Network Architecture

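[Architecture diagram: Source and Target each pass through an encoder, with weight sharing between the two encoders; a decoder outputs the flow field. Numeric layer labels on each encoder: 128, 8, 3; 128 64 64 32 32 16 16; 16 32 32 64 64 128 128 256. Numeric layer labels on the decoder: 8 16 16 32 32 64 64 128 128; 512 256 256 128 128 64 64 32 2.]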

Matchability Prediction

[Diagram: Source, Target → CNN → Flow field]

• Background: ✗
• Occlusion: ✗

[Diagram: Source, Target → CNN → Flow field + Matchability]

Training Set Construction

• PASCAL 3D (Bbox + Viewpoint): Xiang et al., WACV 2014
• ShapeNet (Synthetic Rendering): Chang et al., arXiv 2015

• Image-to-shape retrieval: "Single view reconstruction via joint analysis of image and shape collections", Huang et al., SIGGRAPH 2015

[Figure: one training example]

• ~80,000 examples per category
• A single network for all 12 PASCAL3D categories (aero, boat, bus, car, chair, etc.)

RESULTS

Image Warping Visualization

[Figure: source and target images, with warping results for SIFT flow and ours.]

Keypoint Transfer

[Figure: source and target images, with keypoint transfer results for SIFT flow and ours.]

Accuracy (PCK):

          SIFT flow   Ours
Mean      19.6        24.0
…
Car       22.4        33.3
Bus       28.6        40.3
Bottle    28.3        40.3
TV        42.9        51.1
…
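For readers unfamiliar with PCK (percentage of correct keypoints), here is a minimal sketch of the metric. It assumes the common convention that a transferred keypoint counts as correct when it lies within alpha * max(height, width) of the annotated location. The slides do not state the threshold or the normalisation, so `alpha=0.1`, the image size, and the keypoint coordinates below are all assumptions.

```python
import math

def pck(predicted, ground_truth, height, width, alpha=0.1):
    """Fraction of keypoints whose transfer error is within alpha * max(height, width)."""
    threshold = alpha * max(height, width)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(ground_truth)

# Hypothetical transferred keypoints and annotations in a 128 x 128 image.
predicted = [(10.0, 10.0), (50.0, 52.0), (90.0, 30.0), (20.0, 100.0)]
ground_truth = [(12.0, 11.0), (50.0, 50.0), (60.0, 30.0), (20.0, 80.0)]
score = pck(predicted, ground_truth, height=128, width=128)
```

With a threshold of 12.8 pixels, the first two keypoints are counted as correct and the last two are not.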

Matchability Prediction

[Figure: source, target, ours, ground truth]

Accuracy: SIFT flow 64.5, Ours 72.0

t-SNE Feature Visualization

[Architecture diagram: Source and Target encoders with weight sharing, and two decoders that output the flow field and matchability. The label "Global image features" appears in the diagram. Numeric layer labels on the flow decoder: 8 16 16 32 32 64 64 128 128; 512 256 256 128 128 64 64 32 2. Numeric layer labels on the matchability decoder: 8 16 16 32 32 64 64 128 128; 256 128 128 64 64 32 32 16 2.]

[Figure: t-SNE embedding, with regions labeled side views, 45° views, and frontal views.]

Application: Cross-domain Dense Label Transfer

[Figure: source, target, and label transfer results for Dense CRF, SIFT flow, and ours.]
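Dense label transfer can be sketched as a lookup through the flow field: each target pixel copies the label of the source pixel it is matched to. The sketch assumes the flow is stored per target pixel and points into the source image, with nearest-pixel lookup and clamping at the border. The flow direction, the function name `transfer_labels`, and the toy label map are illustrative assumptions.

```python
def transfer_labels(source_labels, flow_target_to_source):
    """Give each target pixel the label of its matched source pixel."""
    h, w = len(source_labels), len(source_labels[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow_target_to_source[y][x]
            # Matched source pixel: nearest pixel, clamped to the image.
            xs = min(max(round(x + dx), 0), w - 1)
            ys = min(max(round(y + dy), 0), h - 1)
            row.append(source_labels[ys][xs])
        out.append(row)
    return out

# Toy source segmentation: label 1 marks the object, 0 the background.
source_labels = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
# Every target pixel matches the source pixel one column to its left.
flow = [[(-1.0, 0.0)] * 4 for _ in range(4)]
target_labels = transfer_labels(source_labels, flow)
```

In the transferred map the object appears shifted one column to the right, as the flow prescribes.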

Conclusion

[Diagram, training time: the cycle over real r1 and real r2 with edges $F_{s_1,r_1}$, $F_{r_1,r_2}$, $F_{r_2,s_2}$, and the ground-truth edge $\tilde{F}_{s_1,s_2}$.]

• Cycle consistency effective when direct labels not available
• 'Meta'-supervision: supervising the behavior of the data

Thank you!
