Automatic Portrait Segmentation and Matting

Xiaoyong Shen, The Chinese University of Hong Kong

[email protected]

Research on CV

• Pixel based (low-level / early vision)
  • Filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.

• Region/patch based (middle-level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.

• Object/semantic based (high-level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

My Research on CV

• Pixel based (low-level vision)
  • Filtering, restoration, denoising, enhancement, deblurring, editing, dehazing, etc.

• Region/patch based (middle-level vision)
  • Matching, optical flow, stereo matching, tracking, segmentation, etc.

• Object based (high-level vision)
  • Semantic segmentation, object detection, image classification, recognition, etc.

Multi-Spectral Image Restoration

• Input
  • Noisy RGB image I0 (e.g. captured at night)
  • Clean guidance image G (e.g. dark-flashed NIR or flashed RGB images)

• Output
  • Denoised image I
    • Structures are as clear as in the guidance G.
    • Appearance is the same as in image I0.
    • Shadows and highlights do not affect the result.

[TPAMI 2015]

Scale Map

• Given $I^*$, the expected ground-truth noise-free image, our scale map $s$ is defined under the following condition:

$$\min_s \lVert \nabla I^* - s \cdot \nabla G \rVert$$

• It adapts the structures of $G$ to those of $I^*$.

• It is an ideal ratio map between $\nabla G$ and $\nabla I^*$.
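To make the "ideal ratio map" reading concrete, here is a minimal sketch that computes a per-pixel least-squares scale between the horizontal gradients of G and of a known I*. The function name, the forward-difference gradient, and the `eps` regularizer are illustrative assumptions. Since I* is only the expected ground truth, this illustrates the definition rather than the restoration method itself.

```python
import numpy as np

def ideal_scale_map(i_star, guide, eps=1e-6):
    """Per-pixel s minimizing (dI* - s*dG)^2 + eps*s^2 along the x direction."""
    grad_i = np.diff(i_star, axis=1)   # horizontal forward differences of I*
    grad_g = np.diff(guide, axis=1)    # horizontal forward differences of G
    # Regularized ratio; eps (an assumption) avoids division by zero where G is flat.
    return (grad_i * grad_g) / (grad_g ** 2 + eps)

guide = np.tile(np.linspace(0.0, 1.0, 8), (4, 1))  # toy guidance: a horizontal ramp
i_star = 2.0 * guide                                # toy target with doubled contrast
s = ideal_scale_map(i_star, guide)                  # close to 2 everywhere
```

In this toy case the map is nearly constant because the two images differ only by a contrast factor. Wherever structures in G have no counterpart in I*, the same ratio would drop toward zero.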


Result

[Figure panels: Input Noisy Image, Input NIR Image, Our Result, Ground Truth]

[Figure sequence: RGB Input I, NIR Input G, BM3D, Our Result]

Mutual-Structure Filter

[ICCV 2015 Oral Presentation]

Depth/RGB Restoration

[Figure panels: Noisy Depth, Noisy RGB Image, Ground Truth, Ours (PSNR = 37.19)]

Rolling Guidance Filter

One line of code only: $I_{t+1} = JF(I_0, I_t)$

[ECCV 2014 Oral Presentation]
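The iteration can be sketched as follows, assuming that JF is a joint bilateral filter that smooths the input I_0 using range weights taken from the current result I_t. The function names, the constant initial guide, and all parameter values are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def joint_filter(src, guide, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Joint bilateral filter: average `src` with range weights taken from `guide`."""
    h, w = src.shape
    src_p = np.pad(src, radius, mode="edge")
    guide_p = np.pad(guide, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ys = slice(radius + dy, radius + dy + h)
            xs = slice(radius + dx, radius + dx + w)
            spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
            rng = np.exp(-(guide_p[ys, xs] - guide) ** 2 / (2 * sigma_r ** 2))
            num += spatial * rng * src_p[ys, xs]
            den += spatial * rng
    return num / den  # den >= 1 because the centre pixel always has weight 1

def rolling_guidance(image, iterations=4, **filter_args):
    """Repeat I_{t+1} = JF(I_0, I_t), starting from a constant guide."""
    current = np.zeros(image.shape)  # constant guide: the first pass is a plain blur
    for _ in range(iterations):
        current = joint_filter(image, current, **filter_args)
    return current

noisy = np.random.default_rng(0).random((16, 16))
smoothed = rolling_guidance(noisy, iterations=3)
```

The first pass removes small structures, and later passes let the large edges that survive in the guide sharpen again.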

Texture Removal

Halftone Image

De-Filter

One line of code only: $I_{t+1} = I_t + (I_0 - F(I_t))$
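A minimal sketch of this fixed-point iteration, with I_0 taken to be the filtered image and the iteration started at I_0. The 3-tap circular blur is a stand-in chosen only so that the example converges. Whether the iteration converges for a given filter F is not established by this sketch.

```python
import numpy as np

def blur(x):
    """Stand-in filter F: circular 3-tap smoothing (an illustrative choice)."""
    return 0.2 * np.roll(x, 1) + 0.6 * x + 0.2 * np.roll(x, -1)

def defilter(filtered, f, iterations=100):
    """Iterate I_{t+1} = I_t + (I_0 - F(I_t)), starting from I_0 = filtered."""
    current = filtered.copy()
    for _ in range(iterations):
        current = current + (filtered - f(current))
    return current

original = np.array([0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.5])
recovered = defilter(blur(original), blur)
```

Each step adds back the residual between the observed filtered image and the re-filtered estimate, so only the filter itself is needed and not its inverse.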

Reverse Skin Retouch

Retouched input


Reversed


Before retouch

Multi-Spectral Matching

• Match general multi-spectral images with significant displacement and obvious structure inconsistency

[Figure panels: Different Exposures, RGB/Depth, RGB/NIR, Flash/No-flash]

Result

• Match RGB/NIR image pair

[Figure panels: Inputs, Our Result, Blended]

Applications

• HDR construction

[Figure panels: Without Alignment, With Alignment, Constructed HDR]

Internet Image Matching

Reference Input

Dense Correspondences?

Correspondence Exists

No Correspondence

[SIGGRAPH ASIA 2016]

Our Motivation

Reference Input

Dense Correspondences?

Foremost Region Matching

Time-lapse Generation

Automatic Morphing

Object-based Matching

• Achieves higher accuracy with the help of the object (person)

Object-based Matching

State-of-the-art Ours

Classification and Segmentation

• Fine-grained classification
  • DeepLAC (CVPR 2015)

• Text detection and recognition

• Semantic object segmentation
  • Portrait segmentation and matting
  • VOC challenge

Automatic Portrait Segmentation

Motivation

• Abundant portraits in smartphone photos
  • Portrait 30%, others 70% (Samsung UK)
  • Portrait 90%, others 10% (Symon Whitehorn from HTC)

Portrait Post-processing

Foreground Selection

Quick Selection

Automatic Segmentation

Automatic?

Challenges

[Figure panels: Similar Color, Complex Background, Various Accessories, Low Contrast, Diverse Pose, Complicated Edges]

Possible Solutions

• Graph-cut with face tracker

• CNNs for semantic segmentation

Most Related Work

• Interactive Image Selection
  • Lazy snapping [Li et al. 2004]
  • Grabcut [Rother et al. 2004]
  • Paint Selection [Li et al. 2009]

• CNNs for Semantic Object Segmentation
  • FCN [Long et al. 2014]
  • DeepLab [Chen et al. 2014]
  • CRFasRNN [Zheng et al. 2015]

• Image Matting
  • Bayesian matting [Chuang et al. 2001]
  • Closed-form matting [Levin et al. 2008]
  • KNN matting [Chen et al. 2013]

Our Approach

PortraitFCN and PortraitFCN+

Our System

[Diagram: Detector → PortraitFCN model (Conv, ReLU, Pooling, ..., DeConv layers [Long et al. 2015]) → Mask. Input: RGB channels; 2 outputs.]

PortraitFCN

• Fine-tuned from the original FCN-8s model

Portrait Knowledge

PortraitFCN+

[Diagram: Detector → PortraitFCN+ model (Conv, ReLU, Pooling, ..., DeConv layers [Long et al. 2015]) → Mask. Input: RGB + shape + position channels; 2 outputs.]

Shape Channel

[Diagram: Labeled Masks → Align → Canonical Pose → Mean; the mean mask is aligned with the Test Image to give the Shape Channel]

$$M = \frac{\sum_i w_i \circ T_i(M_i)}{\sum_i w_i}$$
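The weighted mean can be sketched as below. It assumes the masks have already been warped to the canonical pose (the T_i(M_i) terms) and that each w_i is a per-pixel validity map, for example 1 where the warped mask is defined and 0 elsewhere. That reading of w_i, and the function name, are assumptions.

```python
import numpy as np

def mean_mask(aligned_masks, weights):
    """M = sum_i w_i * T_i(M_i) / sum_i w_i, computed per pixel."""
    aligned = np.asarray(aligned_masks, dtype=np.float64)  # T_i(M_i), shape (n, h, w)
    w = np.asarray(weights, dtype=np.float64)              # w_i, same shape
    total = w.sum(axis=0)
    # Pixels covered by no mask get 0 instead of a division by zero.
    return np.divide((w * aligned).sum(axis=0), total,
                     out=np.zeros_like(total), where=total > 0)

masks = [[[1, 1], [0, 0]],
         [[1, 0], [0, 0]]]
valid = [[[1, 1], [1, 0]],
         [[1, 1], [1, 0]]]
M = mean_mask(masks, valid)
```

The result is a soft mask in [0, 1] that can be warped onto a test image and fed to the network as an extra input channel.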

Position Channel

[Diagram: x-coordinate and y-coordinate maps in the Canonical Pose are aligned with the Test Image to give the Position channels]
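As a rough sketch of what normalized x and y channels could look like, the code below builds coordinate maps that are centred on a detected face and scaled by its size. The alignment to the canonical pose is replaced here by a simple translation and scale. The function name and the `face_center` and `face_size` parameters are hypothetical.

```python
import numpy as np

def position_channels(height, width, face_center, face_size):
    """Return x and y coordinate maps, centred on the face and scaled by its size."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    cx, cy = face_center
    x_channel = (xs - cx) / face_size
    y_channel = (ys - cy) / face_size
    return x_channel, y_channel

# Toy example: a 6x8 image with the face centred at column 4, row 3.
x_ch, y_ch = position_channels(6, 8, face_center=(4, 3), face_size=2.0)
```

Each pixel then carries its position relative to the face, which gives the network a cue that does not depend on where the portrait sits in the frame.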

Effectiveness

[Figure sequence: Input, PortraitFCN, PortraitFCN+]

Experiments and Applications

Our Dataset

• 1,800 portraits from Flickr with labeled masks
  • 1,500 portraits as the training data
  • 300 for testing

• Large variations in portrait types
  • Age, color, background, clothing, accessories, head position, hair style, lighting, etc.

Training

• Fine-tune the model starting from FCN-8s
  • Synthesize more data with different transforms
  • Use the person class and background weights

• Find the best learning rate
  • Loss
  • Accuracy

Find the Best LR

Evaluation

Methods                                        Mean IoU (%)
Graph-cut                                      80.02
FCN (Person Class)                             73.09
PortraitFCN                                    94.20
PortraitFCN+ (Only with Mean Mask)             94.89
PortraitFCN+ (Only with Normalized x and y)    94.61
PortraitFCN+                                   95.91

IoU = area(output ∩ ground truth) / area(output ∪ ground truth)
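For reference, the IoU metric used in this evaluation can be computed on binary masks as in the sketch below. The function name and the convention for two empty masks are assumptions.

```python
import numpy as np

def iou(output, ground_truth):
    """Intersection over union of two binary masks."""
    out = np.asarray(output, dtype=bool)
    gt = np.asarray(ground_truth, dtype=bool)
    union = np.logical_or(out, gt).sum()
    if union == 0:
        return 1.0  # convention (assumption): two empty masks agree perfectly
    return float(np.logical_and(out, gt).sum() / union)

prediction = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
score = iou(prediction, truth)  # 1 shared pixel out of 2 in the union
```

The mean IoU reported per method would then be the average of this score over the test images, expressed as a percentage.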



Comparisons

[Figure sequence: Input, Ground Truth, Graph-cut, FCN-8s (Person), PortraitFCN, PortraitFCN+]

[Figure: per-image comparisons, each shown with Input and Ground Truth; IoU of each result]

Example    FCN-8s    Graph-cut    Ours
1          0.83      0.42         0.99
2          0.91      0.85         0.98
3          0.77      0.95         0.98
4          0.38      0.84         0.98
5          0.83      0.53         0.99
6          0.81      0.89         0.98

Robustness

[Figure panels: Color, Scale, Rotation, Occlusion]

User Study

• Our result provides a very good initialization for further refinement

Segmentation Is Not Enough: Automatic Portrait Matting

Portrait Matting

[Figure panels: Input Image, Alpha Matte; applications: Color Transform, Depth-of-Field, Portrait Stylization, Cartoon, Background Edit]

Problem Definition

$$I = \alpha F + (1 - \alpha) B$$

where $I$ is the image, $\alpha$ is the alpha (foreground opacity), $F$ is the foreground, and $B$ is the background.
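The compositing equation translates directly into code. The sketch below assumes an alpha matte of shape (h, w) with values in [0, 1] and color images of shape (h, w, 3). The function name is illustrative.

```python
import numpy as np

def composite(alpha, foreground, background):
    """I = alpha * F + (1 - alpha) * B, with alpha broadcast over color channels."""
    a = np.asarray(alpha, dtype=np.float64)[..., None]
    return a * foreground + (1.0 - a) * background

alpha = np.array([[0.0, 0.5, 1.0]])   # one row: background, mixed, foreground
fg = np.ones((1, 3, 3))               # white foreground
bg = np.zeros((1, 3, 3))              # black background
image = composite(alpha, fg, bg)
```

Once an alpha matte has been estimated for a portrait, the same function can place the subject on a new background by swapping `background`.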

Natural Image Matting

• Color sampling methods
  • Given a manually labeled trimap
  • Bayesian Matting [Y-Y Chuang, 2001], etc.

[Figure panels: Image, Trimap, Alpha Matte]

Natural Image Matting

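• Propagation approaches
  • Given manually labeled strokes and trimap
  • Closed-form Matting [Levin, 2008], etc.

$$\alpha = \arg\min_{\alpha} \; \alpha^T L \alpha + \lambda (\alpha - b_s)^T D (\alpha - b_s)$$

where $L$ is the matting Laplacian, $b_s$ holds the user-provided strokes, and $D$ is the diagonal stroke mask.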
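Setting the gradient of this quadratic objective to zero gives the linear system (L + λD)α = λD b_s. The sketch below solves it on a toy problem, using the Laplacian of a 1-D chain graph as a stand-in for the real matting Laplacian, which is built from local color statistics. The helper names and the value of λ are assumptions.

```python
import numpy as np

def chain_laplacian(n):
    """Toy stand-in for the matting Laplacian: Laplacian of a 1-D chain graph."""
    lap = np.zeros((n, n))
    for i in range(n - 1):
        lap[i, i] += 1.0
        lap[i + 1, i + 1] += 1.0
        lap[i, i + 1] -= 1.0
        lap[i + 1, i] -= 1.0
    return lap

def solve_alpha(laplacian, stroke_values, stroke_mask, lam=1e6):
    """Solve (L + lam * D) alpha = lam * D b_s for alpha."""
    d = np.diag(np.asarray(stroke_mask, dtype=np.float64))
    return np.linalg.solve(laplacian + lam * d, lam * d @ stroke_values)

b_s = np.array([0.0, 0.0, 0.0, 0.0, 1.0])  # stroke values: background left, foreground right
mask = np.array([1, 0, 0, 0, 1])            # only the two end pixels are labeled
alpha = solve_alpha(chain_laplacian(5), b_s, mask)
```

With a large λ the labeled pixels keep their stroke values, and the smoothness term propagates alpha to the unlabeled pixels in between.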

Motivation

• It is very hard to specify a trimap or strokes

[Figure panels: Input, Labeled Strokes, Closed-form Matting (with errors marked)]

[Figure panels: Input, Labeled Trimap, Closed-form Matting (with errors marked)]

• Usually the trimap has to be refined many times to get a good alpha matte

Segmentation to Matting

Learning for Automatic Matting

• Challenges
  • Data preparation
  • Learning framework

• We propose end-to-end Convolutional Neural Networks (CNNs) for portrait matting

Learning Data Collection

• 2000 portraits from Flickr with large variation
  • Keywords…
  • Different age, gender, pose, hairstyle, background…
  • Different camera type…

• Data example

Data Labeling

• Apply closed-form matting and robust matting
  • Gradually refine the input trimap
  • Choose the best one from closed-form or robust matting

• User interface

• Ground truth example

Learn Automatic Matting

Our Method

Trimap labeling
• Input: RGB image
• Output: trimap
• Network: fine-tuned from FCN

Image Matting Layer
• Input: trimap
• Output: alpha matte
• Newly designed structure

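Our Method

Image Matting Layer
• Feed-forward:

$$\min_A \; \lambda A^T B A + \lambda (A - 1)^T F (A - 1) + A^T L A$$

• Backward:

$$\frac{\partial f}{\partial B} = -\lambda D^{-1} \operatorname{diag}(D^{-1} F)$$

$$\frac{\partial f}{\partial F} = \frac{\partial f}{\partial B} + D^{-1}$$

$$\frac{\partial f}{\partial \lambda} = -\lambda D^{-1} \operatorname{diag}(F + B) D^{-1} F$$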
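The feed-forward step has a closed-form solution. Reading B and F in the objective as diagonal matrices of the background and foreground probabilities, setting the gradient to zero gives (λB + λF + L)A = λF. The sketch below assumes that the D in the backward formulas denotes this system matrix, and it uses a 1-D chain Laplacian as a toy stand-in for the matting Laplacian L. The helper names and the value of λ are illustrative.

```python
import numpy as np

def chain_laplacian(n):
    """Toy stand-in for the matting Laplacian: Laplacian of a 1-D chain graph."""
    lap = np.zeros((n, n))
    for i in range(n - 1):
        lap[i, i] += 1.0
        lap[i + 1, i + 1] += 1.0
        lap[i, i + 1] -= 1.0
        lap[i + 1, i] -= 1.0
    return lap

def matting_layer_forward(fg_prob, bg_prob, laplacian, lam=100.0):
    """Solve (lam*diag(B) + lam*diag(F) + L) A = lam * F for the alpha matte A."""
    d = lam * np.diag(bg_prob) + lam * np.diag(fg_prob) + laplacian
    return np.linalg.solve(d, lam * fg_prob)

F = np.array([1.0, 1.0, 0.0, 0.0, 0.0, 0.0])  # foreground probability per pixel
B = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])  # background probability per pixel
alpha = matting_layer_forward(F, B, chain_laplacian(6))
```

Pixels that are confidently foreground or background stay close to 1 or 0, while the unknown pixels in the middle receive intermediate alpha values from the Laplacian term.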

Our Method

Image Matting Layer
• Loss function:

$$L(A, A^{gt}) = \sum_i w(A_i^{gt}) \, \lvert A_i - A_i^{gt} \rvert, \qquad w(A_i^{gt}) = -\log\big(p(A = A_i^{gt})\big)$$
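A minimal sketch of this weighted loss follows. It assumes that p is estimated from a histogram of ground-truth alpha values, so that rare values, typically the fractional ones along hair and edges, receive larger weights. The binning, the `eps` guard, and the use of a single matte to estimate p are all assumptions.

```python
import numpy as np

def alpha_weights(alpha_gt, bins=10, eps=1e-8):
    """w = -log p(alpha), with p taken from a histogram of ground-truth values."""
    hist, edges = np.histogram(alpha_gt, bins=bins, range=(0.0, 1.0))
    prob = hist / hist.sum()
    # Map every value to its histogram bin (inner edges only, so 1.0 falls in the last bin).
    idx = np.clip(np.digitize(alpha_gt, edges[1:-1]), 0, bins - 1)
    return -np.log(prob[idx] + eps)

def matting_loss(alpha_pred, alpha_gt, bins=10):
    """Sum over pixels of w(A_gt) * |A - A_gt|."""
    w = alpha_weights(alpha_gt, bins)
    return float(np.sum(w * np.abs(alpha_pred - alpha_gt)))

# Toy ground truth: mostly background, some foreground, very few fractional pixels.
gt = np.array([0.0] * 90 + [1.0] * 8 + [0.55] * 2)
weights = alpha_weights(gt)
loss = matting_loss(np.clip(gt + 0.1, 0.0, 1.0), gt)
```

In this toy case the two fractional pixels get the largest weight, so errors on them count more than errors on the abundant background pixels.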

Model Training

• Data augmentation
  • 4 scales: {0.6, 0.8, 1.2, 1.5}
  • 4 rotations: {-45, -22, 22, 45} degrees
  • 4 gamma values: {0.5, 0.8, 1.2, 1.5}

• Network initialization
  • Fine-tuned from the FCN-8s model [J. Long, 2015]
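The augmentation settings listed above can be enumerated as in the sketch below. Whether the three kinds of transform are applied separately or combined is not stated, so the sketch lists one setting per value. Only the gamma adjustment is implemented, and the function names are illustrative.

```python
import numpy as np

SCALES = (0.6, 0.8, 1.2, 1.5)
ROTATIONS_DEG = (-45, -22, 22, 45)
GAMMAS = (0.5, 0.8, 1.2, 1.5)

def augmentation_settings():
    """One setting per listed value (12 in total); combining them is an open choice."""
    settings = [{"scale": s} for s in SCALES]
    settings += [{"rotation_deg": r} for r in ROTATIONS_DEG]
    settings += [{"gamma": g} for g in GAMMAS]
    return settings

def apply_gamma(image, gamma):
    """Gamma-adjust an image whose values lie in [0, 1]."""
    return np.clip(image, 0.0, 1.0) ** gamma

settings = augmentation_settings()
brightened = apply_gamma(np.array([0.25, 1.0]), 0.5)
```

Scaling and rotation would have to be applied to the image and its ground-truth alpha matte together, whereas a gamma change affects only the image.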

Experiments

• Running time
  • Training time: 20k iterations, one day on a Titan X GPU
  • Testing time: 0.6 s for a 600×800 color image

• Comparisons
  • Graph-cut
  • FCN baseline: direct FCN segmentation followed by closed-form matting

Results

[Figure panels: Input, Graph-cut, FCN, Ours]


Failure Cases

[Figure panels: Input, Alpha Matte (two failure examples)]

Applications

Input Stylization PS GS Stick PS Fresco Stylization

Input Stylization Depth-of-Field PS Fresco Stylization

Input Stylization PS Palette Knife PS GS Stick PS Sketch

Input PS Oil Paint Depth-of-Field PS GS Stick Stylization

Input Stylization PS Palette Knife Depth-of-Field Stylization

Input Stylization PS Palette Knife PS Dark Stroke PS Paint Daubs

Conclusions

• High-accuracy automatic portrait segmentation and matting approach
  • A novel CNN framework
  • Training and testing dataset
  • Benefits many applications

• Future work
  • Video segmentation
  • Human segmentation
  • Single portrait image depth estimation
  • Weakly supervised version

Q & A

Thanks
