DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision...

IEEE 2016 Conference on

Computer Vision and Pattern

Recognition

(a) (b)0

10000

20000

30000

40000

50000

# I

ma

ges

0

40000

80000

120000

160000

# I

ma

ges

(a) (b)

DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich AnnotationsZiwei Liu1 Ping Luo1 Shi Qiu2 Xiaogang Wang1 Xiaoou Tang1

1. The Chinese University of Hong Kong 2. SenseTime Group Ltd.

1. Motivation

Task： clothes recognition and retrieval

• Landmarks improve fine-grained recognition

• Massive attributes better partition feature space

• Photo pairs bridge the cross-domain gap

We provide

• Comprehensiveness: 50 categories, 1,000

attributes, 4~8 landmarks, 300K photo pairs

• Scale: over 800K real-life clothing images

Jeans

Cutoffs

SkirtDress

Kimono

Tee

Top

Sweater

Blazer

Hoodie

Chinos

(a)

(b)

WTBI[1] DARN[2] DeepFashion

# image 78,958 182,780 >800,000

# attributes 11 179 1050

# pairs 39,479 91,390 >300,000

localization bbox N/A 4~8 landmarks

2. DeepFashion Dataset

Data Source

Search engines, online stores, user posts.

Quality Control

Duplicate removal, fast screening, double checking

Annotation Assessment:

Sample Images

Attributes Statistics

Landmarks and Pairs

TexturePalm Colorblock

FabricLeather Tweed

ShapeCrop Midi

PartBow-F Fringed-H

StyleMickey Baseball

CategoryRamper Hoodie

3. FashionNet

Network Architecture

FashionNet jointly predicts landmarks and attributes

to unify global and local feature learning.

Landmark Pooling Layer

Landmark pooling layer pools and gates features

from estimated landmark locations.

Multi-task Learning

Cross-entropy loss for attributes, Euclidean loss for

landmarks, triplet loss for pairs.

4. Benchmarks

Category & Attribute Prediction

Metric: top-3 recall rate

In-shop Clothes Retrieval

Metric: top-k retrieval accuracy

Consumer-to-shop Clothes Retrieval

Metric: top-k retrieval accuracy

Further Analysis

How different variations affect performance?

(a) (b)

feature maps

of conv4

feature maps

of pool5_local

landmark

visibility

landmark

visibility

landmark

location landmark

location

max-pooling

max-pooling

……...

x

y

landmark

location

landmark

visibilitycategory attributes triplet

conv5_pose

fc6_pose

fc7_fusion fc7_pose

conv5_global

fc6_localfc6_global

pool5_local…

1

2

3 (a) (b)(a) (b)

Categories (%) Attributes (%)

WTBI[1] 43.73 27.46

DARN[2] 59.48 42.35

FashionNet 82.58 45.52

attribute positive negative

Label accuracies (%) 97.0 99.4

0.2

0.3

0.4

0.5

0.6

0.7

0.8

To

p-5

Attrib

ute R

ecall

FashionNet (Ours) DARN WTBI

Download Dataset:

http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

References[1] M. H. Kiapour, et al. Where to buy it: Matching street clothing photos in online shops. In ICCV, 2015.

[2] J. Huang, et al. Cross-domain image retrieval with a dual attribute-aware ranking network. In ICCV, 2015

http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html

DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision...

Documents

Transcript of DeepFashion: Powering Robust Clothes Recognition and ... · IEEE 2016 Conference on Computer Vision...