Damien Lefortier, Senior Machine Learning Engineer and Tech Lead in the Prediction Machine Learning...

Post on 15-Apr-2017

Copyright © 2016 Criteo

ML for Display Advertising @ Scale

Damien Lefortier

MLconf NYC

2016-04-15

Copyright © 2015 Criteo

Outline

• Introduction to the AdTech / Criteo

• Deep dive into our ML algorithms

• Offline and online evaluation

• Future areas of research

AdTech / Criteo

[Figure: Advertiser, Publisher]

Our Engine is trying to answer 3 questions

Common objective: maximize the client’s value

1. How much should we bid for a given ad space?

2. What products should we recommend / show?

3. What is the best look & feel of the banner?

Physical infrastructure

7 in-house data centers on 3 continents

~ 15000 servers; largest Hadoop cluster in Europe

More than 35 PB of data storage

Traffic

800k HTTP requests / sec (peak activity)

29000 impressions / sec (peak activity)

< 10 ms to process a bidding request

< 100 ms to render the ad (if we win)

Bidding

• Should we bid?

• At which price?

Recommendation

• Which products should we display?

Look & Feel

• Big image vs small image

• Background color, ...

Prediction

• Generic prediction engine

• Specific models trained on TBs of data

Bidding strategy (1)

• As we sell performance, Criteo’s and our clients’ interests are aligned.

• In a 2nd-price auction, the cost of a display is no higher than the bid and independent of it (we pay the 2nd price or the floor), so we should bid the maximum value the client is willing to pay.

• We use adjustments for 1st price auctions.
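
A small sketch of why bidding the full value is reasonable under a 2nd-price auction: the price paid depends only on the best competing bid and the floor, never on our own bid. The function name, the tie-breaking rule (ties are lost), and the numbers are illustrative assumptions, not Criteo’s bidder.

```python
def second_price_outcome(our_bid, best_competing_bid, floor):
    """Return (won, price_paid) for a 2nd-price auction with a floor."""
    price_to_beat = max(best_competing_bid, floor)
    if our_bid > price_to_beat:  # assumption: ties are lost
        return True, price_to_beat
    return False, 0.0

# Raising our bid changes whether we win, not what we pay when we win.
low = second_price_outcome(2.0, 1.2, 0.5)
high = second_price_outcome(5.0, 1.2, 0.5)
lost = second_price_outcome(1.0, 1.2, 0.5)
```

Because the price paid is the same for both winning bids, underbidding only risks losing displays that were worth buying.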

Bidding strategy (2)

• This value depends on the predicted performance and the client’s objective.

• Some examples:

• Click optimized campaign: bid = maxCPC × pClick

• CR optimized campaign: bid = maxCPO × pCR

• …
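
Reading the two examples above as products (an assumption about the slide notation), the bid is the expected value of a display for the client. A minimal sketch, with hypothetical function names and made-up numbers:

```python
def bid_click_campaign(max_cpc, p_click):
    """Expected value of a display when the client pays per click."""
    return max_cpc * p_click

def bid_conversion_campaign(max_cpo, p_cr):
    """Expected value of a display when the client pays per order."""
    return max_cpo * p_cr

# Illustrative values only: 0.50 per click at a 1% predicted click rate,
# and 20.00 per order at a 0.01% predicted conversion rate.
click_bid = bid_click_campaign(0.50, 0.01)
order_bid = bid_conversion_campaign(20.0, 0.0001)
```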

We train our prediction models on our historical displays

[Figure: historical displays, variables, and machine learning algorithms. Legend: clicked displays; converted displays (size = order value)]

Variables

• Level of engagement of the user

• Quality of inventory

• User fatigue

• For travel: time to check-in and number of nights

Our ability to predict relies greatly on the relevance of the variables we consider

Recommend products for a user

• What we want: reco(user) = products

• 1B users x 3B products!

• But we need to scale and keep it fresh

• What we can do:

• Pre-select products offline

• Refine scoring online to get final candidates

Bob saw orange shoes

Some candidate products

• Historical

• Similar

• Complementary

• Most viewed

Products delivering the best performance are displayed

[Figure: historical displays and machine learning algorithms. Legend: clicked products; converted products (size = order value)]

Variables

• Products seen by the user

• Time since product event

• Level of similarity

• Product features

Products are selected based on their CTR, CR or OV
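
The two-step approach (pre-select offline, refine scoring online) can be sketched as follows. This is a toy illustration under stated assumptions: the function names, the candidate sources used (historical and similar only), and the scores are hypothetical, and a real system would score with a trained model.

```python
def preselect_offline(view_history, similar_products):
    """Offline step: build a small candidate list per user."""
    candidates = {}
    for user, seen in view_history.items():
        pool = list(seen)  # "historical" candidates
        for product in seen:
            pool.extend(similar_products.get(product, []))  # "similar" candidates
        candidates[user] = list(dict.fromkeys(pool))  # deduplicate, keep order
    return candidates

def refine_online(user, candidates, score, k=2):
    """Online step: score the pre-selected candidates and keep the top k."""
    ranked = sorted(candidates.get(user, []),
                    key=lambda product: score(user, product), reverse=True)
    return ranked[:k]

view_history = {"bob": ["orange_shoes"]}
similar_products = {"orange_shoes": ["red_shoes", "orange_sandals"]}
toy_scores = {"orange_shoes": 0.2, "red_shoes": 0.5, "orange_sandals": 0.1}

candidates = preselect_offline(view_history, similar_products)
shown = refine_online("bob", candidates, lambda user, p: toy_scores[p])
```

Splitting the work this way keeps the online step cheap: it only scores a handful of candidates per user rather than the full catalog.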

We train our prediction models on our historical displays

[Figure: historical displays (color = look & feel) and machine learning algorithms. Legend: clicked displays; converted displays (size = order value)]

Variables

Some of which we control:

• How user interacts with banner

• Organization of information

• Colorset

Some of which we don’t:

• Zone format

• Publisher

Look and feel will be selected based on its CTR, CR or OV

Many models to learn

• We have different ML models for each task (bidding, recommendation, …) and each campaign objective. We use logistic regression in many places.

• Each model is trained independently & refreshed as often as possible.

• Three main sources of features: user, ad, page (mostly categorical).

Learn on huge volumes of data

10 000 displays lead to 50 clicks, which lead to 1 sale
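
The rates implied by these figures show how rare positive examples are, which is why large volumes of displays are needed. A short worked computation (variable names are ours):

```python
displays, clicks, sales = 10_000, 50, 1

click_through_rate = clicks / displays   # clicks per display
conversion_rate = sales / clicks         # sales per click
sales_per_display = sales / displays     # sales per display

# Displays needed, on average, to observe a given number of sales.
displays_for_1000_sales = 1000 / sales_per_display
```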

Quadratic features

• Outer product between 2 features (similar to a polynomial kernel of degree 2).

• Example between site and advertiser:

[Figure: feature hierarchies. Publisher side: Publisher network, Publisher, Site, Url. Advertiser side: Advertiser network, Ad, Campaign, Advertiser.]
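
One common way to realize a quadratic feature on categorical data is to concatenate the two values into a new categorical feature, so that each (site, advertiser) pair gets its own weight. The sketch below assumes that representation; the function name, separator, and example values are hypothetical.

```python
def cross_feature(features, left, right):
    """Build one crossed categorical feature from two existing ones."""
    name = f"{left}_x_{right}"
    value = f"{features[left]}|{features[right]}"
    return name, value

display = {"site": "news.example", "advertiser": "shoes.example"}
name, value = cross_feature(display, "site", "advertiser")
display[name] = value  # the model can now learn a weight for this exact pair
```

The number of distinct crossed values can approach the product of the two cardinalities, which is one motivation for reducing dimensionality afterwards.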

Hashing trick

• Standard representation of categorical features: “one-hot” encoding

• Dimensionality equal to the number of different values…

• Hashing to reduce dimensionality (made popular by John Langford in VW)

• Dimensionality now independent of number of values

• Using:
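
As an illustration of the hashing trick (not necessarily the exact scheme used in production), each "name=value" pair is hashed to an index in a fixed-size vector. The choice of MD5, the bucket count, and the function names are assumptions made for this sketch.

```python
import hashlib

def hash_index(name, value, num_buckets):
    """Map a categorical (name, value) pair to a bucket in [0, num_buckets)."""
    key = f"{name}={value}".encode("utf-8")
    digest = hashlib.md5(key).digest()  # deterministic across runs
    return int.from_bytes(digest[:8], "little") % num_buckets

def hash_features(features, num_buckets):
    """Sparse hashed vector: bucket index -> count of features landing there."""
    vector = {}
    for name, value in features.items():
        index = hash_index(name, value, num_buckets)
        vector[index] = vector.get(index, 0.0) + 1.0
    return vector

display = {"site": "news.example", "advertiser": "shoes.example"}
hashed = hash_features(display, num_buckets=2 ** 20)
```

The vector size is fixed by `num_buckets` regardless of how many distinct values appear; the price is occasional collisions between unrelated values.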

In-house Machine Learning library -- IRMA

• We have our own large-scale distributed machine learning library on top of Hadoop used for all our models.

• From an ML perspective, we rely in most cases on an L-BFGS solver initialized with SGD (see, e.g., A. Agarwal et al., A Reliable Effective Terascale Linear Learning System).
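
A single-machine sketch of the idea: one cheap SGD pass gives a rough starting point, then L-BFGS refines it on L2-regularized logistic regression. This is a toy pure-Python illustration, not IRMA’s code; all function names, hyperparameters, and the simple backtracking line search are assumptions.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def loss_and_grad(w, data, l2):
    """Mean logistic loss plus an L2 penalty, and its gradient."""
    n = len(data)
    loss = 0.5 * l2 * dot(w, w)
    grad = [l2 * wi for wi in w]
    for x, y in data:  # y is 0 or 1
        z = dot(w, x)
        loss += (max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z) / n
        err = sigmoid(z) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / n
    return loss, grad

def sgd(data, dim, l2, lr=0.1, epochs=1, seed=0):
    """One cheap pass of SGD to get a rough starting point."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = sigmoid(dot(w, x)) - y
            w = [wi - lr * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
    return w

def lbfgs(f, w, iters=50, memory=5, tol=1e-6):
    """Basic L-BFGS with a backtracking (Armijo) line search."""
    history = []  # (s, y, rho) triples, oldest first
    loss, g = f(w)
    for _ in range(iters):
        if math.sqrt(dot(g, g)) < tol:
            break
        # Two-loop recursion: q approximates (inverse Hessian) * gradient.
        q, alphas = list(g), []
        for s, y, rho in reversed(history):
            a = rho * dot(s, q)
            alphas.append(a)
            q = [qi - a * yi for qi, yi in zip(q, y)]
        if history:
            s, y, _ = history[-1]
            scale = dot(s, y) / dot(y, y)
            q = [scale * qi for qi in q]
        for (s, y, rho), a in zip(history, reversed(alphas)):
            b = rho * dot(y, q)
            q = [qi + (a - b) * si for qi, si in zip(q, s)]
        direction = [-qi for qi in q]
        slope, step = dot(g, direction), 1.0
        while True:
            w_new = [wi + step * di for wi, di in zip(w, direction)]
            loss_new, g_new = f(w_new)
            if loss_new <= loss + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        s = [a - b for a, b in zip(w_new, w)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 0.0:  # keep only pairs with positive curvature
            history = (history + [(s, y, 1.0 / dot(s, y))])[-memory:]
        w, loss, g = w_new, loss_new, g_new
    return w, loss
```

Typical use is `w0 = sgd(data, dim, l2)` followed by `lbfgs(lambda w: loss_and_grad(w, data, l2), w0)`, where `data` is a list of `(features, label)` pairs.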

Distribution of L-BFGS & SGD

• L-BFGS, being a batch algorithm, is easy to distribute.

• SGD is a bit trickier: we use parameter averaging across machines, and Hogwild! to multi-thread on each machine.

• We use Hadoop AllReduce:
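
A toy, in-process simulation of what an AllReduce provides: every worker contributes a vector and every worker receives the element-wise sum. Parameter averaging for SGD is then that sum divided by the number of workers, and a batch gradient for L-BFGS is the sum of per-shard gradients. The function names are hypothetical and nothing here talks to Hadoop.

```python
def allreduce_sum(worker_vectors):
    """Each worker gets back the element-wise sum of all workers' vectors."""
    total = [sum(column) for column in zip(*worker_vectors)]
    return [list(total) for _ in worker_vectors]

def average_parameters(worker_weights):
    """Parameter averaging: every worker ends up with the mean weights."""
    n = len(worker_weights)
    return [[v / n for v in summed] for summed in allreduce_sum(worker_weights)]

# Two workers that ran SGD independently on their own data shards.
averaged = average_parameters([[1.0, 2.0], [3.0, 4.0]])

# Two workers that each computed the gradient on their shard.
full_gradient = allreduce_sum([[0.5, -1.0], [0.25, 1.0]])[0]
```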

A word on more advanced techniques

• IRMA is not only about vanilla logistic regression with L2 regularization…

• It contains more advanced techniques such as transfer learning, factorization machines, learning to rank, cost-sensitive learning, …

• For example, we use cost-sensitive learning for bidding.

Offline & online evaluation

Usual two-step process:

• Offline testing is fast, cheap, and efficient for wide exploration.

• Online testing is expensive but has the final word.

Offline metrics (bidding case)

• We use classical metrics: LLH, RMSE, … (which focus on the prediction and ignore the bidding system where we use these models).

• Utility metric from “Offline Evaluation of Response Prediction in Online Advertising Auctions” by O. Chapelle (WWW’15).
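
For reference, the two classical metrics named above can be computed as follows. This sketch assumes binary labels, predicted probabilities, and a per-example average for the log-likelihood (LLH); the clipping constant is an implementation detail of ours.

```python
import math

def log_likelihood(labels, predictions, eps=1e-15):
    """Average log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, predictions):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def rmse(labels, predictions):
    """Root mean squared error between labels and predicted probabilities."""
    squared = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    return math.sqrt(squared / len(labels))

labels = [1, 0, 0, 1]
predictions = [0.9, 0.2, 0.1, 0.6]
llh = log_likelihood(labels, predictions)  # higher (closer to 0) is better
error = rmse(labels, predictions)          # lower is better
```

Both metrics score the predictions alone; neither knows which displays the bidder would actually have won.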

Online metrics (bidding case)

• RevExTac = Revenue Excluding Traffic Acquisition Costs

• Cost, Revenue, …

Some statistics on evaluation

• 100K+ offline tests per year

• 1K+ A/B tests per year

• Many people

• We developed a platform and processes that enable very fast testing and improvement

Some examples of future areas of research

• Counterfactual evaluation (offline A/B tests)

• Embeddings for recommendation

• Policy learning

Counterfactual evaluation

• Estimate the business metric directly (clicks, sales, …).

• Using the production model + randomization.

• Good results on clicks already.
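
One standard way to do this kind of estimation is inverse propensity scoring (IPS), sketched below. The slides do not say which estimator is used, so treat IPS, the log format, and the function names as assumptions. The randomization in production provides the logging probabilities that the estimator divides by.

```python
def ips_estimate(logs, new_policy_prob):
    """Estimate the average reward a new policy would have obtained.

    Each log entry is (context, action, reward, logging_prob), where
    logging_prob is the probability with which production chose the action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)

# Production picked a banner color uniformly at random (probability 0.5 each).
logs = [
    ("u1", "red", 1.0, 0.5),
    ("u2", "blue", 0.0, 0.5),
    ("u3", "red", 1.0, 0.5),
    ("u4", "blue", 1.0, 0.5),
]

def always_red(context, action):
    return 1.0 if action == "red" else 0.0

estimated_clicks = ips_estimate(logs, always_red)
```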

Embeddings for recommendation

• Can embeddings (for example, à la word2vec) help us compute similarities between, e.g., different products or users?
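
If products are represented as embedding vectors, a similarity can be computed with, for instance, cosine similarity. The vectors below are made up for illustration; learning them is the open question the slide raises.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors (0.0 if one is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional product embeddings.
embeddings = {
    "orange_shoes": [0.9, 0.1, 0.0],
    "red_shoes": [0.8, 0.2, 0.1],
    "blender": [0.0, 0.1, 0.9],
}

shoe_similarity = cosine_similarity(embeddings["orange_shoes"], embeddings["red_shoes"])
cross_similarity = cosine_similarity(embeddings["orange_shoes"], embeddings["blender"])
```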

Policy learning – example on Look & Feel optimization

• Classical supervised machine learning approach: learn a pClick model and sort by predicted values for each possible value (e.g., each color).

• This is a hard problem and may be overkill!

• Really, we only want to know which color is the best according to some business metric (e.g., sales).
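
The classical approach described in the first bullet can be sketched as scoring every candidate color with a pClick model and sorting. The model here is a hypothetical lookup table standing in for a trained predictor.

```python
def rank_colors(predict_pclick, context, colors):
    """Sort candidate colors by predicted click probability, best first."""
    return sorted(colors, key=lambda color: predict_pclick(context, color),
                  reverse=True)

# Stand-in for a trained pClick model (made-up values).
toy_model = {"red": 0.012, "blue": 0.015, "green": 0.009}

def predict_pclick(context, color):
    return toy_model[color]

ranking = rank_colors(predict_pclick, {"publisher": "news.example"}, list(toy_model))
best_color = ranking[0]
```

Only `best_color` is actually needed for the decision, which is the sense in which estimating an accurate probability for every color may be more than the problem requires.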

Academic research @ Criteo

• Our 1st public dataset is online: http://bit.ly/1vgw2XC

• New 1TB dataset released last year.

• Some recent publications:

Offline Evaluation of Response Prediction in Online Advertising Auctions. O. Chapelle, WWW’15.

Sources of Variability in Large-scale Machine Learning Systems. D. Lefortier, A. Truchet, and M. de Rijke, NIPS workshop on ML systems, 2015.

Cost-sensitive Learning for Bidding in Online Advertising Auctions. F. Vasile and D. Lefortier, NIPS workshop on ML for e-Commerce, 2015.

Questions

d.lefortier@criteo.com