
Fashion Analysis: Current Techniques and Future Directions

Si Liu, Luoqi Liu, and Shuicheng Yan, National University of Singapore
Cees Snoek, University of Amsterdam

Driven by the huge profit potential in the

fashion industry, intelligent fashion anal-

ysis based on techniques for clothing and make-

over analysis is receiving a great deal of

attention in the multimedia and computer

vision literature. In China, total retail sales in the clothing market exceeded 977.8 billion yuan in 2012, and its apparel market is one of the “fastest growing markets in the world.”1 Such a huge market greatly motivates clothing-related research. Many cosmetics companies, such as Gucci and L’Oréal, have also seen huge economic gains.

L’Oréal reported that the world cosmetics market grew steadily from 2005 to 2012, and the company’s sales surpassed 22 billion euros that year.2

As Figure 1 shows, fashion analysis contains

two components: clothing analysis and facial

beauty (including makeup and hairstyle) analy-

sis. This article surveys the state of the art in

both areas. We first provide an overview of the

techniques for clothing modeling3 and then

introduce the representative works on clothing

recognition.4 Based on the results of clothing

recognition, clothing parsing can be conducted.

(The first clothing parsing work was conducted

by Kota Yamaguchi and his colleagues.5 Later,

several researchers proposed different methods

for clothing parsing with various inputs.6)

These clothing-related techniques serve as the

foundation for two clothing-related applica-

tions: clothing retrieval and recommendation.

Our survey also provides several example imple-

mentation systems from this field of research,

including the Street-to-Shop system7 and the

Magic Closet.8

In facial beauty research, some research

works focus on the internal facial regions (such

as facial shape, eyes, and mouth). Others focus

on external facial regions, such as hair. We first

survey several research directions related to

facial and hair beauty. Then, we introduce the

Beauty e-Experts system, which is a complete

system with both automatic makeover recom-

mendation and synthesis.

Figure 1. Research on fashion analysis mainly studies two aspects: clothing and makeovers. Clothing analysis tasks include clothing recommendation, retrieval, and parsing, whereas makeover analysis tasks involve attractiveness prediction, makeup synthesis, and makeover recommendations.

Clothing Analysis Techniques

There is already a large body of research on

clothing modeling, recognition, parsing,

retrieval, and recommendations. As a brief

overview, here we introduce the representative

works of each category.

Clothing Modeling

Hong Chen and his colleagues presented a con-

text-sensitive grammar in an AND-OR graph

representation that can produce a large set of

composite graphical templates to account for

the variability of cloth configurations.3 In the

supervised learning phase, an artist is asked to

draw clothing sketches that are decomposed

into body components. Each component has a

number of distinct subtemplates. These sub-

templates serve as leaf nodes in an AND-OR

graph. The AND-nodes represent a decomposi-

tion of the graph into subconfigurations with

Markov relations for context and constraints,

and the OR-nodes are switches for choosing

one out of a set of alternative AND-nodes.
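To make the AND-OR structure concrete, the sketch below samples one composite template from a toy grammar. The node names and the uniform random choice at OR-nodes are our own illustrative assumptions, and the Markov context relations that Chen and his colleagues place between sibling parts are omitted here.

import random
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    kind: str                      # "AND" = decompose into parts, "OR" = choose one alternative, "LEAF" = subtemplate
    children: List["Node"] = field(default_factory=list)

def sample_template(node):
    # Expand the graph into one composite template (a list of leaf subtemplates).
    if node.kind == "LEAF":
        return [node.name]
    if node.kind == "OR":          # OR-node: switch among alternative sub-configurations
        return sample_template(random.choice(node.children))
    parts = []                     # AND-node: compose all constituent parts
    for child in node.children:
        parts.extend(sample_template(child))
    return parts

# Toy grammar: an upper-body garment decomposes into a collar and sleeves,
# each of which switches among artist-drawn subtemplates.
collar = Node("collar", "OR", [Node("v_collar", "LEAF"), Node("round_collar", "LEAF")])
sleeve = Node("sleeve", "OR", [Node("long_sleeve", "LEAF"), Node("short_sleeve", "LEAF")])
upper = Node("upper_body", "AND", [collar, sleeve])
print(sample_template(upper))      # e.g. ['v_collar', 'short_sleeve']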

The work done by Chen and his colleagues

mainly involves constructing a 2D model of the

clothing.3 Several researchers have also proposed methods for estimating a 3D clothing model and have used it to infer the 3D shape of a person from images of that person wearing clothing.

Clothing Recognition

Ming Yang and Kai Yu presented a complete sys-

tem for tagging clothing categories in real time.4

Specifically, the system takes advantage of face

detection and tracking to locate human figures,

and the authors also developed an efficient

clothing segmentation method utilizing Voro-

noi images to select seeds for region growing.
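A heavily simplified sketch of that kind of pipeline is shown below, with OpenCV's Haar-cascade face detector and a flood fill standing in for the tracking and Voronoi-based seed selection of the actual system; the box geometry, color tolerances, and the file name are arbitrary assumptions.

import cv2
import numpy as np

def torso_box(bgr):
    # Detect a face and place a coarse clothing box below it.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY), 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    return (max(0, x - w), y + h,                      # left, top
            min(bgr.shape[1], x + 2 * w),              # right
            min(bgr.shape[0], y + 4 * h))              # bottom

def grow_region(bgr, seed, tol=12):
    # Seeded region growing via flood fill with a per-channel color tolerance.
    h, w = bgr.shape[:2]
    mask = np.zeros((h + 2, w + 2), np.uint8)
    flags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)
    cv2.floodFill(bgr, mask, seed, 0, (tol,) * 3, (tol,) * 3, flags)
    return mask[1:-1, 1:-1]                            # binary mask of the grown clothing region

img = cv2.imread("person.jpg")                         # placeholder input image
box = torso_box(img)
if box is not None:
    left, top, right, bottom = box
    seed = ((left + right) // 2, (top + bottom) // 2)  # one seed at the box center
    clothing_mask = grow_region(img, seed)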


Other researchers follow almost the same

pipeline for recognizing and classifying human

clothing. The main difference is the use of dif-

ferent kinds of classifiers, such as random forest.

Clothing Parsing

Kota Yamaguchi and his colleagues developed an

effective method for parsing clothing in fashion

photographs, which is an extremely challenging

problem because of the large number of possible

garment items and the variations in configura-

tion, garment appearance, layering, and occlu-

sion.5 In addition, to enable future research on

clothing parsing, they provided a large novel

dataset and tools for labeling garment items.

Later, Yamaguchi and his colleagues further

tackled the clothing parsing problem using a

retrieval-based approach. For a query image,

they found similar styles from a large database

of tagged fashion images and used these exam-

ples to parse the query. Their approach com-

bines parsing from pretrained global clothing

models, local clothing models learned on the

fly from retrieved examples, and transferred

parse masks (paper doll item transfer) from

retrieved examples.

Jian Dong and his colleagues used parselets

as the building blocks of a parsing model.6 Par-

selets are a group of parsable segments that can

generally be obtained by low-level over-seg-

mentation algorithms. They built a deformable

mixture parsing model (DMPM) for human

parsing to simultaneously handle the deforma-

tion and multimodalities of parselets. Their

model has two unique characteristics: the possi-

ble modalities of parselet ensembles are exhib-

ited as the AND-OR structure of subtrees and

the visibility properties are directly modeled at

some leaf nodes.

Clothing Retrieval

The early works on clothing retrieval mainly

tackled this problem by leveraging low-level

features and high-level attributes of clothes.

Typical clothing retrieval systems may utilize a

content-based image retrieval approach first, in

which a codebook is constructed from extracted

dominant color patches. A reranking approach

is then used to improve the search quality by

exploiting clothing attributes.
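As a generic illustration of this two-stage idea (not the code of any particular published system), one might build a color codebook with k-means and then rerank an initial result list by attribute overlap; all names and weights below are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

def dominant_color_codebook(patch_colors, k=64):
    # Cluster dominant colors extracted from clothing patches into a k-word codebook.
    # patch_colors: (N, 3) array of per-patch mean colors.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(patch_colors)

def encode(garment_patch_colors, codebook):
    # Bag-of-colors histogram for one garment.
    words = codebook.predict(garment_patch_colors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

def rerank(candidates, query_attrs, attr_weight=0.3):
    # candidates: list of (item_id, color_distance, attribute_set); smaller score is better.
    def score(item):
        _, dist, attrs = item
        overlap = len(query_attrs & attrs) / max(len(query_attrs), 1)
        return dist - attr_weight * overlap
    return sorted(candidates, key=score)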

Si Liu and her colleagues addressed the practi-

cal problem of cross-scenario clothing retrieval:

given a photo of a person captured in a general

environment, such as on the street, they

attempted to find similar clothing in online

shops, where the photos were captured more

professionally and with clean backgrounds.7 In

this case, there are large discrepancies between

the daily photo scenario and the online shop-

ping scenario. The authors proposed a solution

that includes two key components: human/

clothing parts alignment to handle human pose

variations and bridging cross-scenario discrep-

ancies with an auxiliary daily photo dataset.

Figure 2 shows their proposed framework.

Figure 2. Framework of the Street-to-Shop clothing retrieval system. The proposed solution includes two key components: human/clothing parts alignment to handle human pose variations and bridging cross-scenario discrepancies with an auxiliary daily photo dataset. (This illustration originally appeared in previous work.7)

First, given a daily photo, 20 upper body parts

and 10 lower body parts are located. Second,

each human part in the daily photo, such as the


zoomed-in right shoulder (y in Figure 2), is linearly reconstructed from the collected auxiliary set Y with a sparse coefficient vector α. Because both y and Y are daily photos, the precision of this online within-scenario reconstruction can be guaranteed. The auxiliary set Y is in turn collaboratively reconstructed offline from the online shopping (OS) dataset X, and the sparse reconstruction coefficient matrix Z is obtained by multitask sparse representation, with the constraint that the reconstruction errors corresponding to the same spatial position (several rows of the error term E) are activated or turned off simultaneously.

Finally, the feature representation of the daily photo y is refined by integrating the online-computed within-scenario sparse coefficient α and the offline cross-scenario collaborative reconstruction coefficient matrix Z as y′ = XZα. Consequently, a nearest-neighbor search over the OS dataset can be performed on the reconstructed feature representation y′.
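Written out, the two reconstruction steps take roughly the following form (a schematic summary of the description above, with simplified regularizers; the exact objective in the Street-to-Shop paper may differ):

% Online, within-scenario step: code the daily-photo part y over the auxiliary set Y.
\min_{\alpha}\;\lVert y - Y\alpha\rVert_2^2 + \lambda_1\lVert\alpha\rVert_1

% Offline, cross-scenario step: reconstruct Y from the online-shopping set X,
% with row-structured sparsity so that rows of E switch on or off together.
\min_{Z,E}\;\lVert Y - XZ - E\rVert_F^2 + \lambda_2\lVert Z\rVert_1 + \lambda_3\lVert E\rVert_{2,1}

% Refined feature used for nearest-neighbor search over the online-shopping items.
y' = XZ\alpha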

Clothing Recommendation

Few existing works target the clothing recom-

mendation task. Some websites (such as www.

dresscodeguide.com) can provide recommen-

dations for the most suitable clothing for a spe-

cific occasion. Some recommendation systems consider the scenario during the recommendation process: given the hair, eye, and skin color of the input person, plus a wardrobe of clothing items, an outfit synthesis system suggests a set of outfits subject to a particular dress code.

The Magic Closet is one such system.8 It

mainly targets two clothing recommendation

scenarios. The first scenario is clothing suggestion.

As the top panel in Figure 3 shows, a user speci-

fies an occasion, and the most suitable suit or combination of two separate clothing items (such as a t-shirt and a pair of trousers) is suggested from the

user’s photo album. For this function, the Magic

Closet can be implemented as a mobile applica-

tion. The second scenario is clothing pairing. The

bottom panel in Figure 3 shows that a user

inputs an occasion and one reference clothing

item (such as a t-shirt the user wants to pair

with), and the best-matching clothing from online shopping websites is returned (such as a

skirt). The returned clothing should aestheti-

cally pair with the reference clothing and be

suitable for the specified occasion. For this func-

tion, the Magic Closet system can serve as a

plug-in application in any online shopping web-

site for shopping recommendations.

Figure 3. Two typical clothing recommendation scenarios for the Magic Closet. (Top panel) Given a specified occasion, the system suggests the most suitable clothing combinations from a photo album; the suggested clothing can be one or two pieces. (Bottom panel) For clothing pairing, given an occasion and a reference piece of clothing, the clothing most suitable for the occasion and most matched with the reference clothing is recommended from online shopping websites. (This illustration originally appeared in previous work.8)

Two key principles were considered when

designing the Magic Closet. One is wearing prop-

erly, which means putting on decent, suitable

clothing. Such recommendations must con-

form to applicable dress codes, which are writ-

ten and unwritten rules with regard to clothing

and common sense. The other is wearing aesthetically, which refers to aesthetic rules that need to be followed when someone pairs upper and

lower body clothing items. For example, it may

not be appropriate to wear a red coat and a pair

of green pants together.

In the model learning process, to narrow the

semantic gap between the low-level visual fea-

tures of clothing and the high-level occasion categories, middle-level clothing

attributes are utilized. Attributes have proven


useful in many computer vision tasks. The

Magic Closet utilizes seven defined multivalue

clothing attributes, including the category

attribute (such as jeans or skirt) and detail attrib-

utes, which describe certain properties of the

clothing (such as the color or pattern). The

clothing recommendation model is learned

through a unified latent support vector machine

(SVM) framework. The model integrates four

potential terms:

• visual features versus attributes,

• visual features versus occasion,

• attributes versus occasion, and

• attribute versus attribute.

Here the first three concern clothing-occa-

sion matching and the last one describes the

clothing-clothing matching. Embedding these

matching rules into the latent SVM model

explicitly ensures that the recommended cloth-

ing satisfies the wearing properly and wearing

aesthetically requirements simultaneously.
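Schematically (our notation, not the exact parameterization of the Magic Closet paper), the latent SVM scores a candidate upper-body item x_u and lower-body item x_l for an occasion o by maximizing over the latent attributes a_u and a_l:

S(x_u, x_l, o) = \max_{a_u, a_l}\Big[
      \underbrace{w_{vo}^{\top}\,\phi(x_u, x_l, o)}_{\text{features vs. occasion}}
    + \underbrace{w_{va}^{\top}\,\phi(x_u, a_u) + w_{va}^{\top}\,\phi(x_l, a_l)}_{\text{features vs. attributes}}
    + \underbrace{w_{ao}^{\top}\,\psi(a_u, o) + w_{ao}^{\top}\,\psi(a_l, o)}_{\text{attributes vs. occasion}}
    + \underbrace{w_{aa}^{\top}\,\psi(a_u, a_l)}_{\text{attribute vs. attribute}}
\Big]

The recommended suit or pairing is then the candidate combination with the highest score for the queried occasion.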

Future Directions of Clothing Analysis

The key techniques for clothing analysis—

human/object detection, human pose estima-

tion, and image semantic segmentation—can

also be directly applied to the analysis of other

fashion items. Thus, it is likely that more works

will be conducted on the analysis of other fash-

ion items, such as bags, shoes, and earrings,

because there is a huge market for these fashion

items as well. However, these items are rela-

tively small compared with clothes, so tailored

features are needed for item representation.

Another likely area of future research is vir-

tual fitting, which builds on clothing analysis techniques such as retrieval and recommendation. In real life, customers expect to try on recommended clothing, and a virtual fitting room would let them preview the effect. In 2013, some large fashion

brands, including Adidas and Hugo Boss, imple-

mented online virtual fitting rooms in a bid to

reduce return rates and improve customer satis-

faction. However, the virtual fitting problem is

still far from solved. For example, it is challeng-

ing to virtually change the texture and pattern

of a t-shirt while maintaining correct clothes

deformation and shading. Another topic worth

studying is how to use an image-based approach

with a multicamera setup to transfer clothes.

Lastly, online clothing shopping is becom-

ing increasingly popular. In many online shop-

ping websites, such as Amazon.com, eBay.com,

and shopstyle.com, customers can conven-

iently find their favorite clothing by typing


keywords, such as “black, sleeveless cocktail

dress.” Also, some online websites, such as

www.dresscodeguide.com, www.chictopia.com,

www.mogujie.com, and www.meilishuo.com,

attempt to teach girls how to dress well. How-

ever, many challenges still exist when deploy-

ing current academic demo systems into real

application scenarios. In the coming years, we

expect to see more interaction between aca-

demia and industry, with practical, deployable systems as the common goal.

Makeover Analysis

Research on makeover analysis typically in-

volves studies looking at facial and hair beauty.

Facial beauty studies include facial attractive-

ness prediction, facial geometry beautification,

and facial makeup synthesis, whereas hair

beauty research consists of hair segmentation

and modeling. After surveying the recent stud-

ies on facial and hair beauty as well as makeover

recommendation and synthesis systems, we

discuss future directions in makeover analysis.

Facial Beauty Study

Facial attractiveness largely depends on a per-

son’s precisely controlled symmetry, height of

the internal features, relative luminance of

different facial features, and the quality of the

skin. There are several major theories about

facial attractiveness, such as the composite

faces theory, the symmetry theory, the skin and

texture theory, the (geometric) facial feature

theory, the golden ratio theory, the facial thirds

theory, the facial fifths theory, the juvenilized

face theory, and the frontal versus lateral view

theory. For example, the composite faces theory

suggests that an averaged (composite) face is more attractive than most individual faces, but that a very beautiful

face may not be close to the average face.

Facial Attractiveness Prediction. There have

already been numerous works evaluating facial

attractiveness. For example, Douglas Gray and

his colleagues built a system to learn female

facial beauty and produce human-like predic-

tors.9 Instead of an absolute rating, they used a

pairwise rating to accelerate the rating process.

Active learning is introduced into the system to

avoid heavy manual labeling. A hierarchical

feed-forward model is learned directly from the

raw pixels to train a human-like predictor.
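The pairwise-rating idea can be illustrated with a margin ranking loss: train a scoring function so that, for every labeled pair, the image judged more attractive gets the higher score. The small sketch below is a generic stand-in (random features, a toy scorer), not Gray et al.'s hierarchical feed-forward model.

import torch
import torch.nn as nn

# Hypothetical scorer: maps a precomputed 512-D image feature to a scalar beauty score.
scorer = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1))
loss_fn = nn.MarginRankingLoss(margin=1.0)   # hinge on (score_a - score_b) * target
optimizer = torch.optim.Adam(scorer.parameters(), lr=1e-3)

# One toy batch of 32 labeled pairs; target = +1 if image A was preferred, -1 otherwise.
feats_a, feats_b = torch.randn(32, 512), torch.randn(32, 512)
target = (torch.rand(32) > 0.5).float() * 2 - 1

loss = loss_fn(scorer(feats_a).squeeze(1), scorer(feats_b).squeeze(1), target)
loss.backward()
optimizer.step()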

Tam Nguyen and his colleagues presented a

comprehensive study on how multiple modal-

ities (face, dress, and voice) jointly affect the

sense of female attractiveness (see Figure 4).10

Figure 4. Comprehensive study of sensing human beauty via multimodality cues: face, dress, and voice.10 The attractiveness scores are given by k-wise preferences from participants. The attribute model and the attractiveness model are jointly learned in a dual-supervised model.


They extended the pairwise rating to a more

user-friendly k-wise rating for large-scale attrac-

tiveness annotation. They further integrated

the attribute model and the attractiveness

model into a dual-supervised model to jointly

learn their relations. The commonalities and

differences of the Eastern and Western sensing

of attractiveness were also analyzed.

Facial Geometry Beautification. Facial geom-

etry beautification mainly aims to enhance the

attractiveness of facial shapes and components.

This kind of technology is helpful for plastic

surgery. Before a plastic surgery operation is per-

formed on patients, it is important for them to

fully understand the change in the facial

appearance after the operation. Digital facial

beautification makes it possible for a patient to

preview the likely outcome at an early stage.

Tommer Leyvand and his colleagues pro-

posed a data-driven approach to optimize facial

shapes, while maintaining a close similarity

between the beautified and original faces.11

They trained a beautification engine on a data-

base of 92 Caucasian male and female faces and

rated the attractiveness score of each face.

Given a new face, 84 facial landmark points are

first detected, and a set of distances between

those landmarks are calculated to define a high-

dimensional face geometry space. Then the

nearby face geometry with a higher attractiveness rating is found by a k-nearest-neighbor search or an SVM regressor in the face space and serves as the beautification target.

Beautification is applied by calculating a 2D

warp field mapping from the original face to

the beautified face according to the correspond-

ing landmark locations.
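A toy sketch of the search step (omitting the warping itself): describe each face by its vector of pairwise landmark distances and, among more attractive gallery faces, take a distance-weighted combination of the k nearest geometries as the warp target. The weighting scheme is our own simplification, not the paper's exact KNN/SVM formulation.

import numpy as np
from itertools import combinations

def distance_features(landmarks):
    # landmarks: (84, 2) array of facial landmark points -> vector of all pairwise distances.
    return np.array([np.linalg.norm(landmarks[i] - landmarks[j])
                     for i, j in combinations(range(len(landmarks)), 2)])

def beautification_target(query_feat, gallery_feats, gallery_scores, query_score, k=5):
    # Among faces rated more attractive than the query, average the k nearest geometries,
    # weighting closer faces more heavily.
    d = np.linalg.norm(gallery_feats - query_feat, axis=1)
    candidates = np.flatnonzero(gallery_scores > query_score)
    if candidates.size == 0:
        return query_feat                      # nothing more attractive nearby; leave unchanged
    nearest = candidates[np.argsort(d[candidates])[:k]]
    w = 1.0 / (d[nearest] + 1e-6)
    return (w[:, None] * gallery_feats[nearest]).sum(axis=0) / w.sum()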

Facial Makeup Synthesis. As opposed to facial

geometry beautification, digital facial makeup

synthesis mainly focuses on changing the facial

appearance. As a result of the huge market

potential, a variety of software applications have

been built for online cosmetic shops to help

women simulate the effects of makeup, such as

Taaz (www.taaz.com). Some image processing

software, such as Photoshop, can also be used to

synthesize the effects of makeup, but most of

them require heavy manual interaction work.

Most research works try to utilize before-

and-after image pairs created by professional

makeup artists to extract makeup. Their cos-

metic-transfer procedure defines makeup as

ratios of appearance change on before-and-after

image pairs.

Unlike these works, the approach proposed

by Dong Guo and Terence Sim only requires

after-makeup photos.12 This is more conven-

ient for real applications because the before-

and-after pairs are often difficult to obtain in

practice. Guo and Sim decomposed the test and

example images into face structure, skin detail,

and color layers, and they assumed that makeup

mainly exists in the skin detail and color layers.

Then the test face and the example face are

aligned by a thin plate spline (TPS) referring to

the 83 detected facial landmark points. Makeup

is transferred from the example image to the

test image, layer by layer, before being compos-

ited to generate the final effect.
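The layer idea can be roughed out as follows, using a bilateral filter on the lightness channel as a stand-in for the edge-preserving decomposition Guo and Sim actually use, and assuming the example face has already been warped into pixel alignment with the subject; the blending weights are arbitrary.

import cv2
import numpy as np

def decompose(bgr):
    # Split a face image into structure, skin-detail, and color layers in Lab space.
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    L, a, b = cv2.split(lab)
    structure = cv2.bilateralFilter(L, 9, 40, 9)   # edge-preserving smooth base layer
    detail = L - structure                         # residual skin texture
    return structure, detail, a, b

def transfer_makeup(subject_bgr, aligned_example_bgr, color_w=0.8, detail_w=0.5):
    # Keep the subject's face structure; blend in the example's detail and color layers.
    s_struct, s_detail, s_a, s_b = decompose(subject_bgr)
    _, e_detail, e_a, e_b = decompose(aligned_example_bgr)
    out_L = s_struct + (1 - detail_w) * s_detail + detail_w * e_detail
    out_a = (1 - color_w) * s_a + color_w * e_a
    out_b = (1 - color_w) * s_b + color_w * e_b
    lab = cv2.merge([out_L, out_a, out_b]).clip(0, 255).astype(np.uint8)
    return cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)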

Hair Beauty Study

A person’s overall attractiveness depends not only on the internal facial traits but also on external regions such as hair. Still in its early stages, research on hair beauty mainly focuses on hair segmentation and modeling.

Hair Segmentation. Hair segmentation is a

task that may benefit other problems, such as

hair style retrieval, virtual makeovers, and hair

sketching. The difficulty in modeling shape var-

iations makes it challenging. Nan Wang and his

colleagues proposed a part-based model to iden-

tify local parts.13 They proposed a measurable

statistic, called subspace clustering dependency

(SC-Dependency), to estimate the co-occur-

rence probabilities between local shapes. Then,

a Markov random field is utilized to formulate

this part identification problem and optimize

the effectiveness of the potential functions.

Hair Modeling. Hong Chen and Song-Chun

Zhu presented a generative sketch model for

hair analysis and synthesis.14 They proposed a

2D view-based method that treats hair images as

2D piecewise smooth vector (flow) fields. The

generative model is divided into three levels: the

high-frequency band of the hair image; the mid-

dle level of piecewise smooth vector field for the

hair orientation, gradient strength, and growth

directions; and the top level of attribute sketch

graph for representing the discontinuities in the

vector field. This model can then synthesize real-

istic hair images and stylistic drawings from a

sketch graph and Gaussian basis.

Makeover Recommendation and Synthesis

The Beauty e-Experts system15 is the first make-

over recommendation and synthesis system


that helps users to select hairstyles and makeup

automatically and produces the synthesized

visual effects (see Figure 5). The main challenge

is determining how to model the complex rela-

tionships among different beauty and beauty-

related attributes for reliable recommendation

and natural synthesis.

Figure 5. Beauty e-Experts system.15 Based on the user’s facial and clothing characteristics, the Beauty e-Experts system automatically recommends suitable hairstyle and makeup products to the user and then produces the synthesized visual effects.

To address this challenge, a large Beauty

e-Experts dataset was constructed that contains

1,505 images of beautiful female figures se-

lected from professional fashion websites. The

beauty attributes are labeled for each image

in the whole dataset. These beauty attributes,

including different hairstyles and makeup

types, are all adjustable. Their specific combina-

tion is considered the recommendation objec-

tive of the Beauty e-Experts system. To narrow

the gap between the high-level beauty attrib-

utes and low-level image features, a set of mid-

level beauty-related attributes, such as facial

traits and clothing properties, are also anno-

tated for the dataset.

Based on all these attributes, a multiple tree-

structured super-graph model is learned to

explore the complex relationships among these

attributes. As a generalization of a graph, a

super-graph can theoretically characterize any

type of relationships among different attributes

and thus provide a powerful recommendation.

A multiple tree-structured approximation is

proposed to preserve the most important

relationships and make the inference procedure

tractable. Based on the recommended results,

an effective and efficient facial image synthesis

module is designed to seamlessly synthesize the

recommended results into a facial image for the

user.
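To see why restricting the super-graph to trees keeps inference tractable, the sketch below runs a generic max-sum (Viterbi-style) pass over a small tree of discrete attribute nodes; the node names, labels, and score functions are placeholders, not the potentials actually learned by the Beauty e-Experts system.

def tree_map_inference(root, children, labels, unary, pairwise):
    # MAP labeling of a tree-structured model by bottom-up dynamic programming (max-sum).
    best, back = {}, {}

    def up(node):
        for child in children.get(node, []):
            up(child)
        best[node] = {}
        for lbl in labels[node]:
            score = unary(node, lbl)
            for child in children.get(node, []):
                child_scores = {cl: pairwise(node, lbl, child, cl) + best[child][cl]
                                for cl in labels[child]}
                cl_star = max(child_scores, key=child_scores.get)
                back[(node, lbl, child)] = cl_star
                score += child_scores[cl_star]
            best[node][lbl] = score

    def down(node, lbl, out):
        out[node] = lbl
        for child in children.get(node, []):
            down(child, back[(node, lbl, child)], out)

    up(root)
    root_lbl = max(best[root], key=best[root].get)
    assignment = {}
    down(root, root_lbl, assignment)
    return assignment

# Toy example: hairstyle is the root attribute, lipstick its child.
labels = {"hair": ["long_curls", "bob"], "lipstick": ["red", "nude"]}
children = {"hair": ["lipstick"], "lipstick": []}
unary = lambda n, l: {"long_curls": 1.0, "bob": 0.3, "red": 0.2, "nude": 0.5}[l]
pairwise = lambda p, pl, c, cl: 0.4 if (pl, cl) == ("long_curls", "red") else 0.0
print(tree_map_inference("hair", children, labels, unary, pairwise))
# -> {'hair': 'long_curls', 'lipstick': 'red'}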

Future Directions of Makeover Analysis

One area of possible future research is personal-

ized makeover recommendation. Different users

may have different makeover preferences. In

addition to personal tastes, different occasions

require different makeover effects. For example,

a woman attending school classes will likely

require different makeup than when she attends

a banquet. These differences also vary largely for

different groups. Asian women, for example,

may not like large eye shadow shapes because

their face shapes tend to be flatter than those of Europeans.

Preferences may also vary with age. For exam-

ple, many seniors may not like colorful and

heavy makeup, so light makeup and simple facial treatment may be enough for them. Thus, personal prefer-

ence, race, occasion, and age should all be

considered in makeover recommendations.

To satisfy such personal preferences, one pos-

sible solution is to consider using history infor-

mation about a person’s makeover preference.

People may share some of their own makeover

photos in Facebook and other photo-sharing

websites, and an improved recommendation


model could be learned from such photos. In

addition, race-, occasion-, and age-oriented

makeover recommendations can also be mod-

eled by integrating with race, occasion, and age

information in the recommendation model.

Overall, the personalized makeover analysis is

particularly important for real applications, and

new models and learning methods need be

developed for this purpose.

Current research on makeover analysis is

mainly based on relatively small-scale datasets.

However, given the billions of photos available

on the Web, a scalable recommendation model

must be built. Thus, more efforts in benchmark

database construction are necessary, and more-

over, scalable models and learning methods

also need to be developed to satisfy the speed

requirements of real applications.

Lastly, makeover preferences are dynamic,

and new makeover-related photos are easily

available from the Web. An interesting direc-

tion may involve automatically mining make-

over trends from online content to intelligently

and dynamically serve users.

Conclusion

The promising future directions discussed in

both clothing and makeover analysis will need

to bridge the gap between academic research

and real industry demand. Given the huge

profit potential in the ever-growing consumer

fashion industry, the representative intelligent

fashion analysis techniques surveyed here are

just the beginning of this expanding research

field. MM

References

1. “China’s Apparel Market, 2013,” research report,

Fung Business Intelligence Center, 4 Dec. 2013;

http://economists-pick-research.hktdc.com/

business-news/article/Economic-Forum/China-s-

Apparel-Market-2013/ef/en/1/1X000000/

1X09VKMQ.htm.

2. “2012 Annual Report,” L’Oréal, 2012; www.

loreal-finance.com/eng/annual-report.

3. H. Chen et al., “Composite Templates for Cloth

Modeling and Sketching,” Proc. 2006 IEEE Conf.

Computer Vision and Pattern Recognition, 2006,

pp. 943–950.

4. M. Yang and K. Yu, “Real-Time Clothing Recogni-

tion in Surveillance Videos,” Proc. IEEE Int’l Conf.

Image Processing, 2011, pp. 2937–2940.

5. K. Yamaguchi et al., “Parsing Clothing in Fashion

Photographs,” Proc. 2012 IEEE Conf. Computer

Vision and Pattern Recognition, 2012,

pp. 3570–3577.

6. J. Dong et al., “A Deformable Mixture Parsing

Model with Parselets,” Proc. Int’l Conf. Computer

Vision, 2013, pp. 3408–3415.

7. S. Liu et al., “Street-to-Shop: Cross-Scenario Cloth-

ing Retrieval via Parts Alignment and Auxiliary

Set,” Proc. 2012 IEEE Conf. Computer Vision and

Pattern Recognition, 2012, pp. 3330–3337.

8. S. Liu et al., “Hi, Magic Closet, Tell Me What to

Wear!” Proc. ACM Int’l Conf. Multimedia, 2012,

pp. 619–628.

9. D. Gray et al., “Predicting Facial Beauty without

Landmarks,” Proc. European Conf. Computer Vision,

part VI, 2010, pp. 434–447.

10. T.V. Nguyen et al., “Sense Beauty via Face, Dress-

ing, and/or Voice,” Proc. 20th ACM Int’l Conf.

Multimedia, 2012, pp. 239–248.

11. T. Leyvand et al., “Data-Driven Enhancement of

Facial Attractiveness,” ACM Trans. Graphics, vol.

27, no. 3, 2008, article no. 38.

12. D. Guo and T. Sim, “Digital Face Makeup by Exam-

ple,” Proc. 2009 IEEE Conf. Computer Vision and Pat-

tern Recognition, 2009, pp. 73–79.

13. N. Wang, H. Ai, and F. Tang, “What Are Good Parts

for Hair Shape Modeling?” Proc. 2012 IEEE Conf.

Computer Vision and Pattern Recognition, 2012,

pp. 662–669.

14. H. Chen and S.-C. Zhu, “A Generative Sketch

Model for Human Hair Analysis and Synthesis,”

IEEE Trans. Pattern Analysis and Machine Intelligence,

vol. 28, no. 7, 2006, pp. 1025–1040.

15. L. Liu et al., “Wow! You Are So Beautiful Today!” Proc. ACM Int’l Conf. Multimedia, 2013.

Si Liu is a research fellow in the Learning and Vision

Research Group, Department of Electrical and Com-

puter Engineering, at National University of Singa-

pore. Contact her at [email protected].

Luoqi Liu is a doctoral student in the Learning and

Vision Research Group, Department of Electrical and

Computer Engineering, at National University of

Singapore. Contact him at [email protected].

Shuicheng Yan is an associate professor in the

Department of Electrical and Computer Engineering

at National University of Singapore and the found-

ing lead of the Learning and Vision Research Group.

Contact him at [email protected].
