Find it! Nail it! Boosting e-commerce search conversions with machine learning at scale

October 28, 2017

Giuseppe “Pino” Di Fabbrizio

Rakuten Institute of Technology – Boston

• Motivations

• Traditional information retrieval models

• Learning-to-rank models

• Relevance

• Ranking Metrics

• Algorithms

• Ranking optimization

• Use cases

• Summary

• What is next?

Disclaimer: Unless otherwise specified, images in this presentation comply with the Creative Commons (CC) publishing license

• E-commerce growing faster than traditional brick-and-mortar market ($4.06T by 2020)

• Mobile shopping adoption increasing worldwide (46% of shoppers in Asia and 28% in North America)

• Online catalogs offering broader selections and competitive products

• Electronic money transactions gaining more consumers’ trust

• Massive data collected during web and mobile interactions providing foundation for machine learning-driven optimizations

1.61B Shoppers

$1.86T Sales

$150B* Revenues

*2016 combined revenues for Amazon, Otto Group, and Rakuten

https://www.statista.com/topics/871/online-shopping/

250M+ Products

40k+ Categories

How do we find

the most relevant

products for a

search query?

www.rakuten.com

Oct 10, 2017

Query

Ranking function

Documents

www.rakuten.com

Nov 2016

• Relevance is estimated by lexical matches of query terms with document terms

• Examples:

• Boolean models

• Vector space models

• Latent semantic indexing

• Okapi BM25

Indexer

Documents

Scoring

Top-n retrieved documents

On-line

Off-line

www.rakuten.com

Oct 10, 2017

Query (Q)

Document 1 (D1)

Document 2 (D2)

Term counts (vector space model):

      iphone   7   case
Q     1        1   1
D1    2        2   2
D2    3        1   0
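As a rough illustration of how a vector space model would score the term-count vectors above, the sketch below computes cosine similarity between the query and each document. Cosine similarity is one common choice; the slides do not say which similarity function was used, so treat that as an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two term-count vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Guard against empty (all-zero) vectors.
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Term-count vectors copied from the slide.
q = [1, 1, 1]
d1 = [2, 2, 2]
d2 = [3, 1, 0]

sim_d1 = cosine(q, d1)  # D1 points in the same direction as Q
sim_d2 = cosine(q, d2)  # D2 repeats one term and misses another
```

D1 scores higher than D2 even though D2 repeats one query term more often, because cosine similarity rewards matching the overall term distribution rather than raw counts.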

• Basic ideas

• Lexical similarity metrics

• Penalizing repeated occurrences of the same term

• Penalizing term frequency for longer documents

• Only a few features

• Feature weights hand-tuned based on heuristics

• Cannot include important search signals such as user’s feedback, product popularity, purchase history, etc.

• Fast and scalable
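The two "penalizing" ideas listed above for lexical models correspond to term-frequency saturation and document-length normalization in Okapi BM25. The sketch below uses a common textbook form of the formula; the parameter values `k1=1.2` and `b=0.75`, the smoothed IDF variant, the whitespace tokenizer, and the sample documents are all assumptions, not taken from the slides.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a textbook BM25 variant."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated occurrences; b penalizes longer documents.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini-catalog for illustration only.
docs = ["iphone 7 case", "iphone iphone iphone 7", "leather wallet"]
scores = bm25_scores("iphone 7 case", docs)
```

In this toy catalog the second document repeats "iphone" three times but still scores below the first, which matches every query term once.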

• Data-driven approach

• Directly optimizes product ranking based on relevance (different from classification and regression ML tasks)

• Handle thousands of features

• Robust to noisy data

• Handle personalization

• Industry & research state-of-the-art (Amazon, eBay, Microsoft, Yahoo!, Yandex, etc.)

A document is relevant if it contains the information the user was looking for when they submitted the query

Relevance is subjective and depends on many factors:

• context (what is displayed and how)

• task (purchase, search info, answer, etc.)

• novelty (unexpected data, ads, etc.)

• time and user’s effort involved

www.rakuten.com

Nov 2016

[Figure labels: buy, click, add]

www.rakuten.com

Nov 2016

• Clickthrough data (user’s implicit feedback) as source of relevance for search query / document pairs

• Pros

• Abundant and easy to harvest

• Always fresh

• Unbiased

• Cons

• Noisy

• Long tail queries

• Simple relevance mapping:

• score = 0 (not relevant), score = 3 (highly relevant)

• Purchase > cart > click > impression

Score   User’s implicit feedback
3       Product purchased
2       Product added to the shopping cart
1       Product clicked
0       No clicks
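The mapping above can be sketched as a small labeling step over a clickthrough log. The event names and the log layout `(query, product, action)` are hypothetical; only the 0 to 3 scale and the ordering purchase > cart > click > impression come from the slides.

```python
# Relevance scale from the slide; the action names are assumed for illustration.
RELEVANCE = {"purchase": 3, "cart": 2, "click": 1, "impression": 0}

def label_pairs(events):
    """Keep the strongest signal observed for each (query, product) pair."""
    labels = {}
    for query, product, action in events:
        key = (query, product)
        labels[key] = max(labels.get(key, 0), RELEVANCE[action])
    return labels

# Hypothetical log entries.
events = [
    ("40inch tv", "tv-123", "impression"),
    ("40inch tv", "tv-123", "click"),
    ("40inch tv", "tv-123", "purchase"),
    ("40inch tv", "stand-9", "impression"),
    ("40inch tv", "tv-456", "click"),
    ("40inch tv", "tv-456", "cart"),
]
labels = label_pairs(events)
```

Taking the maximum per pair is one simple aggregation choice; a production system might instead weight by counts or recency.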

[Figure: browser viewport over the search results page, separating seen products, potentially seen products, and unseen products]

www.rakuten.com

Aug 2017

Normalized Discounted Cumulative Gain (NDCG)

[Figure: documents at rank positions 1 to 10]
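The formula from this slide did not survive extraction, so the sketch below uses a widely used formulation of NDCG: gain `2^rel - 1`, discount `log2(position + 1)`, normalized by the DCG of the ideal ordering. Whether the talk used exactly this gain and discount is an assumption.

```python
import math

def dcg(rels, k=None):
    """Discounted cumulative gain of relevance labels listed in ranked order."""
    rels = rels[:k] if k is not None else rels
    # enumerate() is 0-based, so position p = i + 1 and the discount is log2(p + 1).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels, k=None):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True), k)
    return dcg(rels, k) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])   # labels already in ideal order
reversed_ = ndcg([0, 1, 2, 3])  # most relevant product shown last
```

With the 0 to 3 implicit-feedback scale, a purchased product (gain 7) placed near the top contributes far more than a clicked product (gain 1) at the same position.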

• Tree ensemble method

• Handle sparse data

• Handle missing values and various value types

• Robust to outliers

• Learn higher-order feature interactions

• Invariant to feature scaling

• Highly scalable and optimized open source implementation (XGBoost)

Point-wise

• Input: single documents / Output: class labels or scores

• Classify each document as relevant or non-relevant.

• Adjust w to reduce classification errors

Pairwise ranking

• Input: document pairs / Output: partial order preferences

• Classify pairs of documents – D1 > D2?

• Adjust w to reduce discordant pairs

List-wise ranking

• Input: document collections / Output: ranked document list

• Score permutations -- Is {D1,D2,…} > {D1’,D2’,…} ?

• Adjust w to directly maximize ranking measure of interest (NDCG)

Q: Di > Dj

Q: Di > Dj > Dk

Green = relevant

Gray = not-relevant

Blue arrows = boost for pair-wise loss function

Red arrows = boost for list-wise loss function

(a) is the perfect ranking;

(b) is ranking with 10 pairwise errors;

(c) is ranking with 8 pairwise errors
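The figure itself is not preserved, so the sketch below uses hypothetical rankings of 14 documents with two relevant ones, with positions chosen only so that the pairwise error counts come out to 10 and 8 as in the caption. It shows why a pairwise objective and a list-wise metric such as DCG can disagree about which ranking is better.

```python
import math

def pairwise_errors(rels):
    """Count pairs where a less relevant document is ranked above a more relevant one."""
    return sum(1 for j in range(len(rels)) for i in range(j) if rels[i] < rels[j])

def dcg(rels):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 position discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ranking(n, relevant_positions):
    """Binary relevance list of length n with 1s at the given 1-based positions."""
    return [1 if pos in relevant_positions else 0 for pos in range(1, n + 1)]

# Hypothetical layouts, not the positions from the original figure.
b = ranking(14, {1, 12})  # one relevant document at the top, one far down
c = ranking(14, {4, 7})   # both relevant documents in the middle

errors_b, errors_c = pairwise_errors(b), pairwise_errors(c)
dcg_b, dcg_c = dcg(b), dcg(c)
```

Under these assumed positions, ranking (c) has fewer pairwise errors, yet ranking (b) has the higher DCG because it places a relevant document first. This is the motivation for optimizing the ranking metric directly.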

• Relevance: User’s behavior signals

• Ranking Metrics: NDCG

• Machine Learning Algorithm: Gradient Tree Boosting

• Ranking optimization: List-wise with NDCG metrics
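A minimal sketch of how these choices might be expressed with the XGBoost scikit-learn wrapper. The slides name XGBoost and NDCG but do not show configuration, so the objective string `rank:ndcg` and the depth / estimator values (borrowed from the hyperparameter table later in the deck) are assumptions about the setup, and `build_ranker` is a hypothetical helper.

```python
# Assumed configuration; not confirmed by the slides.
RANKER_PARAMS = {
    "objective": "rank:ndcg",  # XGBoost ranking objective that targets NDCG
    "max_depth": 3,            # tree depth
    "n_estimators": 500,       # number of boosted trees
}

def build_ranker(params=RANKER_PARAMS):
    """Create an XGBRanker; xgboost is imported lazily so the config is usable without it."""
    import xgboost as xgb
    return xgb.XGBRanker(**params)
```

Training would then look like `build_ranker().fit(X, y, group=group_sizes)`, where `y` holds the 0 to 3 relevance labels and `group_sizes` lists how many consecutive rows belong to each query.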

Indexer

Documents

Scoring

Scores

Features

Training

Learning to rank

Re-ranking

Top-n ranked documents (n > 1M)

Top-m re-ranked documents (m < 1k)

On-line

Off-line

Relevance
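The diagram labels above describe a two-stage pipeline: a fast lexical scorer retrieves the top-n documents, and a learning-to-rank model re-ranks them down to the top-m. The sketch below illustrates that flow with toy stand-ins; the catalog, both scoring functions, and the function names are hypothetical.

```python
import heapq

def retrieve_top_n(query, catalog, lexical_score, n):
    """Stage 1: cheap lexical scoring over the whole catalog."""
    return heapq.nlargest(n, catalog, key=lambda doc: lexical_score(query, doc))

def rerank_top_m(query, candidates, ltr_score, m):
    """Stage 2: a more expensive learned score applied only to the candidates."""
    return heapq.nlargest(m, candidates, key=lambda doc: ltr_score(query, doc))

# Hypothetical catalog with a made-up popularity signal.
catalog = [
    {"title": "40 inch tv", "purchases": 120},
    {"title": "tv stand for 40 inch tv", "purchases": 5},
    {"title": "40 inch led tv", "purchases": 300},
    {"title": "garden hose", "purchases": 50},
]

def lexical_score(query, doc):
    """Toy lexical match: count query-term occurrences in the title."""
    words = doc["title"].split()
    return sum(words.count(term) for term in query.split())

def ltr_score(query, doc):
    """Stand-in for a trained model: rank by purchase count only."""
    return doc["purchases"]

candidates = retrieve_top_n("40 inch tv", catalog, lexical_score, n=3)
results = rerank_top_m("40 inch tv", candidates, ltr_score, m=2)
titles = [doc["title"] for doc in results]
```

In this toy run the lexical stage ranks the TV stand first because it repeats "tv", while the re-ranking stage pushes it out of the final list in favor of the products users actually buy.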

www.rakuten.com

Mar 2017

Search Query: “40inch tv”

Regular text search

Search with user’s signals and learning-to-rank models

Not relevant

Conversion Rate (Simulation)

                NDCG     CTR     Simulated queries
Relative gain   15.58%   7.50%   10,000

Depth / Estimators         5 / 500   3 / 500   10 / 500   3 / 500
NDCG                       0.687     0.688     0.685      0.689
Relative gain              15.14%    15.41%    14.92%     15.58%
Training time (56 cores)   2:45:48   1:20:57   35:25:44   1:58:07

Automatic Speech Recognition: 2011

Computer Vision: 2013

Natural Language Processing: 2013-2015

Information Retrieval: 2017?

Bhaskar Mitra, Fernando Diaz, and Nick Craswell. 2017. Learning to Match using Local and Distributed Representations of Text for Web Search. In Proceedings of the 26th International Conference on World Wide Web (WWW '17).

• Traditional IR methods do not scale to modern e-commerce needs

• User’s implicit feedback is a proxy for the relevance of search query / document pairs

• Learning-to-rank (LTR) methods scale to thousands of features and are robust to data noise

• LTR with a list-wise loss function substantially improves search relevance (15.6% NDCG increase on e-commerce data)

• NDCG improvements directly correlate with conversion rates (7.5% CTR increase on e-commerce data)

• DNN methods for IR are starting to outperform traditional ML methods
