Deep learning for e-commerce: current status and future prospects

Béranger Dumont, Rakuten Institute of Technology, Rakuten, Inc. (Oct. 28, 2017)



Introduction

‣ Spectacular success of deep learning techniques since 2012

‣ What can we use for e-commerce and how?

[A. Karpathy, L. Fei-Fei, CVPR 2015]

[Steve James, CC BY-NC-ND 2.0]

[image courtesy of McCown, http://weekendblitz.com]


Simple representations: categorical variables, text

‣ The simplest representations can be uninformative / limiting / hard to manipulate

‣ One-hot representation

‣ Bag-of-words representation
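
A minimal sketch of both representations in plain Python. The category list and vocabulary are hypothetical toy values, not taken from the talk. Two distinct one-hot vectors always have a dot product of zero, so this representation carries no notion of similarity between categories. A bag of words discards word order and silently ignores out-of-vocabulary words such as "leather".

```python
from collections import Counter

def one_hot(value, categories):
    """Return a vector with a 1 at the index of `value` and 0 elsewhere."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

def bag_of_words(text, vocabulary):
    """Count occurrences of each vocabulary word; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical toy category list and vocabulary, for illustration only.
categories = ["shoes", "wine", "furniture"]
vocabulary = ["black", "boot", "red", "wine"]

print(one_hot("wine", categories))                           # [0, 1, 0]
print(bag_of_words("Black boot black leather", vocabulary))  # [2, 1, 0, 0]
```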


Simple representations: images

‣ The simplest representations can be uninformative / limiting / hard to manipulate

‣ Image RGB representation

Example: a 4 × 4 pixel image (width: 4 pixels, height: 4 pixels) with 3 color channels, i.e. 48 numbers:

Channel 1:
 4  6  1  3
 0  2 16 53
 2  3  7 10
 1  8  1  3

Channel 2:
 9  7  3  2
26 22 16 53
15  3  7 10
 8  8  1  3

Channel 3:
35 19 25  6
13 22 16 53
 4  3  7 10
 0  8  1  3
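
The same example image as a NumPy array. This is a sketch that assumes a (height, width, channels) layout; frameworks differ, and some put the channels first. The slide does not say which grid is red, green or blue, so the channels are simply numbered. Flattening gives the 48-number vector a classic model would receive, in which the 2-D neighborhood of each pixel is no longer explicit.

```python
import numpy as np

# The three 4x4 channels of the example image (values copied from the slide).
channel_1 = [[4, 6, 1, 3], [0, 2, 16, 53], [2, 3, 7, 10], [1, 8, 1, 3]]
channel_2 = [[9, 7, 3, 2], [26, 22, 16, 53], [15, 3, 7, 10], [8, 8, 1, 3]]
channel_3 = [[35, 19, 25, 6], [13, 22, 16, 53], [4, 3, 7, 10], [0, 8, 1, 3]]

# Stack along the last axis: shape (height, width, channels).
image = np.stack([channel_1, channel_2, channel_3], axis=-1).astype(np.uint8)
print(image.shape)          # (4, 4, 3)
print(image[0, 0].tolist()) # top-left pixel across the 3 channels: [4, 9, 35]

# A "simple representation": one long vector of raw pixel values.
flat = image.reshape(-1)
print(flat.shape)           # (48,)
```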


Learning representations

‣ Rule-based systems: Input → Hand-designed program → Output

‣ Classic machine learning: Input → Hand-designed program → Mapping from features → Output

‣ Simple representation learning: Input → Features → Mapping from features → Output

‣ Deep learning: Input → Simple features → More complex features → Mapping from features → Output

adapted from [Deep Learning, I. Goodfellow, Y. Bengio, A. Courville, MIT Press 2016]


Deep Convolutional Neural Networks

Most spectacular success of deep learning algorithms

[Deep Learning, I. Goodfellow et al., MIT Press 2016]


Recurrent Neural Networks

Deep learning for dealing with sequences (ordered data of variable length)

‣ Application to a wide range of data:
  - text
  - audio
  - video
  - browsing / purchase history
  - …

‣ Notable example: machine translation

Long short-term memory (LSTM) cell

[https://cs224d.stanford.edu/] [http://colah.github.io/]
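
A minimal NumPy sketch of one step of a standard LSTM cell. It uses the common formulation with forget, input and output gates. The weights are random and untrained, and the sizes are arbitrary. Running the step over a sequence turns inputs of any length into a fixed-size hidden state, which is why RNNs suit ordered data of variable length.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, hidden + input) and b has shape (4 * hidden,);
    the four row blocks belong to the forget, input, candidate and output parts.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:hidden])                 # forget gate: how much of c_prev to keep
    i = sigmoid(z[hidden:2 * hidden])        # input gate: how much new content to write
    g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell content
    o = sigmoid(z[3 * hidden:4 * hidden])    # output gate: how much of the cell to expose
    c = f * c_prev + i * g                   # additive update of the cell state
    h = o * np.tanh(c)
    return h, c

def run_lstm(sequence, W, b, hidden):
    """Feed a sequence of any length through the cell; return the last hidden state."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in sequence:
        h, c = lstm_step(x, h, c, W, b)
    return h

rng = np.random.default_rng(0)
input_size, hidden = 3, 5
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_size))  # random, untrained
b = np.zeros(4 * hidden)

short = rng.normal(size=(2, input_size))     # a 2-step sequence
long_seq = rng.normal(size=(7, input_size))  # a 7-step sequence
print(run_lstm(short, W, b, hidden).shape, run_lstm(long_seq, W, b, hidden).shape)  # (5,) (5,)
```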


Applications on e-commerce: selected topics

‣ Improving the catalog of products

‣ Product recommendations

[image courtesy of G. Agis, http://blog.guillaumeagis.eu]


Catalog of products

‣ Very important for e-commerce, but very challenging for a marketplace

‣ A good catalog has well-organized products, i.e. structured information (category, attributes) that is available and accurate for every product

‣ Essential for:
  - browsing experience
  - SEO, visibility on Google Shopping
  - detailed market analyses
  - downstream tasks (e.g. recommendation)

Example of taxonomy

Examples of product attributes (bottle of wine)


Improving the catalog of products

‣ Available data:
  - text: product title, description, user reviews
  - image: product picture(s)
  - browsing patterns of users, search queries

‣ Goals:
  - build / match / normalize taxonomies
  - categorize products
  - predict product attributes


Improving the catalog of products: attribute prediction from text

[G. Lample et al., NAACL 2016]

‣ Approach: tag entities in the text (e.g. location, color, product type)

‣ Deep learning with a bidirectional LSTM-CRF:
  - no feature engineering
  - character-based word representations

Example: “A jewel in the [living room: LOCATION], this [golden-yellow: COLOR] [armchair: TYPE] is very comfortable.”
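
In a BiLSTM-CRF tagger, the BiLSTM produces a score for each (token, tag) pair, and the CRF layer adds tag-to-tag transition scores. Decoding then finds the best-scoring tag sequence with the Viterbi algorithm. The sketch below covers only this decoding step. Its scores are hand-set and hypothetical, and the tag set is simplified: no B-/I- prefixes and no start/stop transitions. It is not the training procedure of Lample et al.

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best-scoring tag sequence under a linear-chain CRF.

    emissions[t, k]: score of tag k for token t (e.g. from a BiLSTM);
    transitions[j, k]: score of tag j being followed by tag k.
    """
    n_steps, _ = emissions.shape
    score = emissions[0].copy()
    backpointers = []
    for t in range(1, n_steps):
        # candidate[j, k]: best path ending in tag j at t-1, then tag k at t
        candidate = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(candidate.argmax(axis=0))
        score = candidate.max(axis=0)
    # Start from the best final tag and follow the back-pointers.
    path = [int(score.argmax())]
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return path[::-1]

tags = ["O", "COLOR", "TYPE"]
tokens = ["this", "golden-yellow", "armchair"]
# Hypothetical, hand-set scores (a trained model would learn both tables).
emissions = np.array([
    [2.0, 0.5, 0.1],   # this
    [0.3, 1.5, 0.4],   # golden-yellow
    [0.2, 1.0, 0.9],   # armchair
])
transitions = np.array([
    [0.0, 0.0, 0.0],   # from O
    [0.0, -1.0, 1.0],  # from COLOR: hand-set bonus for COLOR -> TYPE
    [0.0, 0.0, -1.0],  # from TYPE
])

greedy = [tags[k] for k in emissions.argmax(axis=1)]
crf = [tags[k] for k in viterbi(emissions, transitions)]
print(list(zip(tokens, greedy)))  # armchair -> COLOR
print(list(zip(tokens, crf)))     # armchair -> TYPE
```

With emissions alone, “armchair” would be tagged COLOR. Adding the transition scores makes O, COLOR, TYPE the best sequence overall, which illustrates why a CRF layer on top of the BiLSTM helps.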

Improving the catalog of products: categorization from text and image

[MUTAN, H. Ben-Younes et al., arXiv:1705.06676]

‣ Basic idea: combine individual results from text and image classification

‣ More interesting: joint learning from text and image

Example: product title “totes Men's Stadium Black Size 9 M”, category: boot

[https://www.rakuten.com/shop/shoe-pulse/product/4900674Black/]
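
A sketch contrasting the two ideas on this slide. The probabilities are made up, the weights are random and untrained, and the label set and feature sizes are hypothetical. Late fusion averages the outputs of two separately trained classifiers. The simplest form of joint learning concatenates the text and image features and trains one classifier on top. MUTAN itself uses a richer multimodal fusion (a Tucker-decomposed bilinear interaction) instead of concatenation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

categories = ["boot", "sneaker", "sandal"]  # hypothetical label set

# Basic idea (late fusion): combine outputs of two separately trained classifiers.
p_text = np.array([0.40, 0.45, 0.15])   # hypothetical text-classifier output
p_image = np.array([0.80, 0.15, 0.05])  # hypothetical image-classifier output
alpha = 0.5                             # fusion weight, would be tuned on validation data
p_late = alpha * p_text + (1 - alpha) * p_image
print(categories[int(p_late.argmax())])  # boot

# Joint learning (simplest form): one classifier over concatenated features,
# so interactions between text and image features can be learned end to end.
rng = np.random.default_rng(0)
text_features = rng.normal(size=16)    # stand-in for a text encoder output
image_features = rng.normal(size=32)   # stand-in for a CNN image embedding
W = rng.normal(scale=0.1, size=(len(categories), 16 + 32))  # untrained weights
p_joint = softmax(W @ np.concatenate([text_features, image_features]))
print(p_joint.shape)  # (3,)
```

In this toy case the text classifier alone would pick “sneaker”. The image evidence tips the fused prediction to “boot”, similar in spirit to the product example above, whose title never says “boot”.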


Visual recommendations and search

‣ Capturing visual similarity enables:
  - recommendation
  - visual search from a user picture

‣ Interesting solution: triplet networks

Examples of visual similarity challenges

[D. Shankar et al., arXiv:1703.02344]

[image courtesy of B. Amos, http://bamos.github.io/]
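
A triplet network passes three items through the same embedding network: an anchor, a positive (a similar item) and a negative (a dissimilar item). The loss is positive unless the anchor is closer to the positive than to the negative by at least a margin. Below is a minimal sketch of that loss on hypothetical, hand-set 2-D embeddings. The margin value is an arbitrary choice.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + margin) on embedding vectors."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def l2_normalize(v):
    return v / np.linalg.norm(v)

# Hypothetical embeddings, as if produced by one shared-weight network.
anchor = l2_normalize(np.array([1.0, 0.0]))    # e.g. a user's query picture
positive = l2_normalize(np.array([0.9, 0.1]))  # e.g. a shop picture of the same item
negative = l2_normalize(np.array([0.0, 1.0]))  # e.g. an unrelated product

print(triplet_loss(anchor, positive, negative))  # 0.0: constraint already satisfied
print(triplet_loss(anchor, negative, positive))  # > 0: training would push embeddings apart
```

Once such a network is trained, visual search amounts to embedding the user's picture and returning the catalog items with the nearest embeddings.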


Challenges

‣ Relevant annotated data may be scarce if:
  - the catalog taxonomy is new or has been modified
  - annotations exist only for a source domain that differs from the target domain (e.g. in-the-wild vs. shop pictures)

[Figure: adversarial domain adaptation in three stages]

‣ Pre-training: source images + labels → Source CNN → Classifier → class label

‣ Adversarial adaptation: source images → Source CNN and target images → Target CNN, both feeding a Discriminator → domain label

‣ Testing: target image → Target CNN → Classifier → class label
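
A sketch of the two adversarial objectives in the adaptation stage, in the GAN-style form with inverted labels described by Tzeng et al. The discriminator learns to tell source features from target features, and the target CNN is trained to fool it. The logistic discriminator, the 2-D features and their distributions are hypothetical stand-ins for real CNN outputs. Only the losses are shown, not the alternating optimization.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def discriminator_loss(d_source, d_target, eps=1e-12):
    """Cross-entropy for the domain discriminator: predict 1 ("source")
    on source features and 0 ("target") on target features."""
    return -np.mean(np.log(d_source + eps)) - np.mean(np.log(1.0 - d_target + eps))

def target_encoder_loss(d_target, eps=1e-12):
    """GAN loss with inverted labels for the target CNN: it is rewarded when
    the discriminator labels its features as "source"."""
    return -np.mean(np.log(d_target + eps))

# Hypothetical stand-ins: 2-D "features" and a fixed logistic discriminator.
rng = np.random.default_rng(0)
source_feats = rng.normal(loc=[1.0, 0.0], size=(100, 2))   # from the source CNN
target_feats = rng.normal(loc=[-1.0, 0.0], size=(100, 2))  # from the target CNN
w, b = np.array([2.0, -1.0]), 0.0                          # discriminator weights

d_s = sigmoid(source_feats @ w + b)  # P(source) for source features
d_t = sigmoid(target_feats @ w + b)  # P(source) for target features
print(discriminator_loss(d_s, d_t), target_encoder_loss(d_t))
```

Training would alternate between two updates. The discriminator is updated to lower `discriminator_loss`, and the target CNN is updated to lower `target_encoder_loss`. This continues until target features are hard to distinguish from source features. At test time only the target CNN and the classifier are used.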

Use human annotators

‣ 👍 solves the problem; 👎 expensive, does not scale well

‣ Can be made clever:
  - selection of the data to annotate
  - online learning with a human-in-the-loop process
  - weak supervision using cheap labels? (quantity vs. quality vs. cost)

Transfer learning

‣ Plenty of labels in the target domain? → fine-tune a pre-trained network

‣ Few or no labels in the target domain? → e.g. adversarial domain adaptation

[E. Tzeng et al., arXiv:1702.05464]
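
One common way to implement “selection of the data to annotate” is uncertainty sampling, a form of active learning. Annotators receive the items on which the current model is least certain. This technique is an illustrative choice, not something stated in the talk, and the probabilities below are hypothetical.

```python
import numpy as np

def entropy(probs, eps=1e-12):
    """Predictive entropy of each row of class probabilities (high = unsure)."""
    return -np.sum(probs * np.log(probs + eps), axis=1)

def select_for_annotation(probs, budget):
    """Uncertainty sampling: indices of the `budget` items with highest entropy."""
    return np.argsort(-entropy(probs))[:budget]

# Hypothetical predicted category probabilities for 4 unlabeled products.
probs = np.array([
    [0.98, 0.01, 0.01],  # confident: a human label adds little
    [0.34, 0.33, 0.33],  # almost uniform: very uncertain
    [0.70, 0.20, 0.10],
    [0.45, 0.45, 0.10],  # torn between two categories
])
print(select_for_annotation(probs, budget=2))  # [1 3]
```

In a human-in-the-loop setup, the model would be retrained after each batch of new labels and the selection repeated.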

Recommender systems

‣ Ubiquitous and very important for e-commerce

‣ Require a deep understanding of the users and of the products

[S. Zhang, L. Yao, A. Sun, arXiv:1707.07435]


Example: Deep Neural Networks for YouTube Recommendations

‣“increased the watch time dramatically on recently uploaded videos in A/B testing”

Candidate generation network

[P. Covington, J. Adams, E. Sargin, RecSys 2016]
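
A heavily simplified sketch of the candidate generation idea, loosely following the description in Covington et al. It averages the embeddings of the watched videos and appends a dense feature (here only “example age”). A ReLU layer turns the result into a user vector, and videos are scored by a dot product with per-video vectors, keeping the top k. All sizes and weights are hypothetical and untrained, and this is not the paper's code. The real network uses more features and layers and is trained as a very large softmax classifier with negative sampling. It serves candidates with an approximate nearest-neighbor search.

```python
import numpy as np

rng = np.random.default_rng(0)
n_videos, emb_dim, user_dim = 1000, 16, 8  # hypothetical sizes

# Hypothetical, untrained parameters; in the real system they are learned jointly.
watch_embeddings = rng.normal(size=(n_videos, emb_dim))   # input embedding per video
W1 = rng.normal(scale=0.1, size=(user_dim, emb_dim + 1))  # a single ReLU layer
video_vectors = rng.normal(size=(n_videos, user_dim))     # per-video softmax weights

def user_vector(watch_history, example_age):
    """Average the embeddings of watched videos (any history length gives a
    fixed-size vector), append a dense feature and apply a ReLU layer."""
    avg = watch_embeddings[watch_history].mean(axis=0)
    x = np.concatenate([avg, [example_age]])
    return np.maximum(W1 @ x, 0.0)

def candidates(watch_history, example_age=0.0, k=5):
    """Score all videos by dot product with the user vector and keep the top k."""
    scores = video_vectors @ user_vector(watch_history, example_age)
    return np.argsort(-scores)[:k]

print(candidates([3, 42, 512]))  # indices of 5 candidate videos
```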


Conclusions and outlook

‣ Deep learning = sequence of progressively more abstract representations; most spectacular success on image, text and audio data (CNNs & RNNs)

‣ Challenge: amount of annotated data → learning techniques that are not fully supervised?
  - unsupervised (no labels)
  - semi-supervised (few labels)
  - weakly supervised (labels carry less information than the task requires)

‣ Impact on e-commerce is already significant and will continue to grow:
  - improvement of the catalogs of products
  - better recommendations
  - and much more!

THANK YOU


Improving the catalog of products: categorization from image


[Figure: architectures of VGG-19, a 34-layer plain network and a 34-layer residual network (stacks of 3x3 conv layers with 64 to 512 filters; output size shrinking from 224 to 7, then fc 1000)]

[ResNet, K. He et al., arXiv:1512.03385]

‣ Image classification: essentially a solved problem*, thanks to deep Convolutional Neural Networks

* given hundreds of thousands of relevant training images and recent GPUs
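
A minimal sketch of the basic residual block of He et al. Dense layers stand in for the two 3x3 convolutions, and batch normalization is omitted for brevity. The block outputs relu(F(x) + x), so its layers only have to learn the residual F(x). If the best mapping is close to the identity, driving F toward zero suffices, which is the motivation given in the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """Basic residual block, y = relu(F(x) + x) with F(x) = W2 @ relu(W1 @ x).

    Dense layers stand in for the two 3x3 convolutions; batch norm is omitted.
    """
    residual = W2 @ relu(W1 @ x)   # F(x): what the block's layers have to learn
    return relu(residual + x)      # identity shortcut adds the input back

rng = np.random.default_rng(0)
dim = 8
x = relu(rng.normal(size=dim))  # non-negative, like the output of a previous block

# With all weights of F at zero, the block passes its (non-negative) input through
# unchanged: adding blocks does not make the identity mapping harder to represent.
W_zero = np.zeros((dim, dim))
print(np.allclose(residual_block(x, W_zero, W_zero), x))  # True

# With small random weights the block computes a perturbation of the identity.
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))
y = residual_block(x, W1, W2)
print(y.shape)  # (8,)
```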