DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision...

1
IEEE 2016 Conference on Computer Vision and Pattern Recognition (a) (b) 0 10000 20000 30000 40000 50000 DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations Ziwei Liu 1 Ping Luo 1 Shi Qiu 2 Xiaogang Wang 1 Xiaoou Tang 1 1. The Chinese University of Hong Kong 2. SenseTime Group Ltd. 1. Motivation Taskclothes recognition and retrieval Landmarks improve fine-grained recognition Massive attributes better partition feature space Photo pairs bridge the cross-domain gap We provide Comprehensiveness: 50 categories, 1,000 attributes, 4~8 landmarks, 300K photo pairs Scale: over 800K real-life clothing images Jeans Cutoffs Skirt Dress Kimono Tee Top Sweater Blazer Hoodie Chinos (a) (b) WTBI [1] DARN [2] DeepFashion # image 78,958 182,780 >800,000 # attributes 11 179 1050 # pairs 39,479 91,390 >300,000 localization bbox N/A 4~8 landmarks 2. DeepFashion Dataset Data Source Search engines, online stores, user posts. Quality Control Duplicate removal, fast screening, double checking Annotation Assessment: Sample Images Attributes Statistics Landmarks and Pairs Texture Palm Colorblock Fabric Leather Tweed Shape Crop Midi Part Bow-F Fringed-H Style Mickey Baseball Category Ramper Hoodie 3. FashionNet Network Architecture FashionNet jointly predicts landmarks and attributes to unify global and local feature learning. Landmark Pooling Layer Landmark pooling layer pools and gates features from estimated landmark locations. Multi-task Learning Cross-entropy loss for attributes, Euclidean loss for landmarks, triplet loss for pairs. 4 . Benchmarks Category & Attribute Prediction Metric: top-3 recall rate In-shop Clothes Retrieval Metric: top-k retrieval accuracy Consumer-to-shop Clothes Retrieval Metric: top-k retrieval accuracy Further Analysis How different variations affect performance? (b) feature maps of conv4 feature maps of pool5_local landmark visibility landmark visibility landmark location landmark location max-pooling max-pooling . . . x y landmark location landmark visibility category attributes triplet conv5_pose fc6_pose fc7_fusion fc7_pose conv5_global fc6_local fc6_global pool5_local 1 2 3 (b) Categories (%) Attributes (%) WTBI [1] 43.73 27.46 DARN [2] 59.48 42.35 FashionNet 82.58 45.52 attribute positive negative Label accuracies (%) 97.0 99.4 0.2 0.3 0.4 0.5 0.6 0.7 0.8 Top-5 Attribute Recall FashionNet (Ours) DARN WTBI Download Dataset: http://mmlab.ie.cuhk.edu.hk/projects/ DeepFashion.html References [1] M. H. Kiapour, et al. Where to buy it: Matching street clothing photos in online shops. In ICCV, 2015. [2] J. Huang, et al. Cross-domain image retrieval with a dual attribute-aware ranking network. In ICCV, 2015

Transcript of DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision...

Page 1: DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision and Pattern Recognition (a) (b) 0 10000 20000 30000 40000 50000 es 0 40000 80000

IEEE 2016 Conference on

Computer Vision and Pattern

Recognition

(a) (b)0

10000

20000

30000

40000

50000

# I

ma

ges

0

40000

80000

120000

160000

# I

ma

ges

(a) (b)

DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich AnnotationsZiwei Liu1 Ping Luo1 Shi Qiu2 Xiaogang Wang1 Xiaoou Tang1

1. The Chinese University of Hong Kong 2. SenseTime Group Ltd.

1. Motivation

Task: clothes recognition and retrieval

• Landmarks improve fine-grained recognition

• Massive attributes better partition feature space

• Photo pairs bridge the cross-domain gap

We provide

• Comprehensiveness: 50 categories, 1,000

attributes, 4~8 landmarks, 300K photo pairs

• Scale: over 800K real-life clothing images

Jeans

Cutoffs

SkirtDress

Kimono

Tee

Top

Sweater

Blazer

Hoodie

Chinos

(a)

(b)

WTBI[1] DARN[2] DeepFashion

# image 78,958 182,780 >800,000

# attributes 11 179 1050

# pairs 39,479 91,390 >300,000

localization bbox N/A 4~8 landmarks

2. DeepFashion Dataset

Data Source

Search engines, online stores, user posts.

Quality Control

Duplicate removal, fast screening, double checking

Annotation Assessment:

Sample Images

Attributes Statistics

Landmarks and Pairs

TexturePalm Colorblock

FabricLeather Tweed

ShapeCrop Midi

PartBow-F Fringed-H

StyleMickey Baseball

CategoryRamper Hoodie

3. FashionNet

Network Architecture

FashionNet jointly predicts landmarks and attributes

to unify global and local feature learning.

Landmark Pooling Layer

Landmark pooling layer pools and gates features

from estimated landmark locations.

Multi-task Learning

Cross-entropy loss for attributes, Euclidean loss for

landmarks, triplet loss for pairs.

4. Benchmarks

Category & Attribute Prediction

Metric: top-3 recall rate

In-shop Clothes Retrieval

Metric: top-k retrieval accuracy

Consumer-to-shop Clothes Retrieval

Metric: top-k retrieval accuracy

Further Analysis

How different variations affect performance?

(a) (b)

feature maps

of conv4

feature maps

of pool5_local

landmark

visibility

landmark

visibility

landmark

location landmark

location

max-pooling

max-pooling

……...

x

y

landmark

location

landmark

visibilitycategory attributes triplet

conv5_pose

fc6_pose

fc7_fusion fc7_pose

conv5_global

fc6_localfc6_global

pool5_local…

1

2

3 (a) (b)(a) (b)

Categories (%) Attributes (%)

WTBI[1] 43.73 27.46

DARN[2] 59.48 42.35

FashionNet 82.58 45.52

attribute positive negative

Label accuracies (%) 97.0 99.4

0.2

0.3

0.4

0.5

0.6

0.7

0.8

To

p-5

Attrib

ute R

ecall

FashionNet (Ours) DARN WTBI

Download Dataset:

http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

References[1] M. H. Kiapour, et al. Where to buy it: Matching street clothing photos in online shops. In ICCV, 2015.

[2] J. Huang, et al. Cross-domain image retrieval with a dual attribute-aware ranking network. In ICCV, 2015