Fast Data Driving Personalization - Nick Gorski

Fast Data Driving Personalization, Nick Gorski, August 7, 2014

hakkalabs.co

Transcript of Fast Data Driving Personalization - Nick Gorski

Page 1: Fast Data Driving Personalization - Nick Gorski

Fast Data Driving Personalization

Nick Gorski
August 7, 2014

Page 2: Fast Data Driving Personalization - Nick Gorski

Fast data driving personalization

• Fast data paradigm
  – There are common big data / distributed systems patterns
  – Lambda architecture
  – Grounded example: features of user behavior

• Deep dive into TellApart personalization secret sauce
  – Retargeting bidder
  – Predictions built on top of fast data architecture

Page 3: Fast Data Driving Personalization - Nick Gorski


Fast data?

Page 4: Fast Data Driving Personalization - Nick Gorski

Fast data

• Big data
• All-time data
• New events readily available
• Served with low latency

Page 5: Fast Data Driving Personalization - Nick Gorski

Fast data driving personalization

• Predictive marketing platform
  – Ingest diverse events over all-time
  – Indexed by user
  – Reflecting events that occurred seconds ago
  – Served in real time (ms latency)

Page 6: Fast Data Driving Personalization - Nick Gorski


Bidding for Transactional Retargeting

Page 7: Fast Data Driving Personalization - Nick Gorski

Transactional retargeting

• E-marketing
• Display advertising
• Site retargeting
• Personalization for marketing
• Transactional retargeting

Page 8: Fast Data Driving Personalization - Nick Gorski


It’s 2014, you know what marketing is

Page 9: Fast Data Driving Personalization - Nick Gorski

Display advertising funnel

[Figure: funnel diagram. Stages: Awareness, Interest, Intent, Consideration, Purchase. Tactics shown alongside: Brand, Demographic targeting, Paid search, Site retargeting.]

Page 10: Fast Data Driving Personalization - Nick Gorski


It’s 2014, you know what retargeting is

Page 11: Fast Data Driving Personalization - Nick Gorski


TellApart transactional retargeting

Page 12: Fast Data Driving Personalization - Nick Gorski


Aligned incentives

Page 13: Fast Data Driving Personalization - Nick Gorski


Secret sauce

TellApart’s bidder for transactional retargeting

Page 14: Fast Data Driving Personalization - Nick Gorski

TellApart’s bidder: our secret sauce

[Figure: bid flow diagram. Steps shown: Receive real-time bidding (RTB) request; Get features of user; Predict expected revenue; Predict auction environment; Calibrate bid with real-time data; Submit RTB response. Timing labels: 100ms, 40ms, 30ms, 30ms.]
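The bid flow on this slide can be sketched as a chain of stages run under a deadline. This is a minimal sketch, not TellApart's implementation: `handle_bid_request` and the four stage callables are hypothetical names, and the 100 ms default assumes the slide's "100ms" label is the overall response budget. The transcript does not say which stage each of the smaller timing labels belongs to, so the sketch only enforces the total.

```python
import time


def handle_bid_request(request, get_features, predict_revenue,
                       predict_auction, calibrate,
                       budget_ms=100.0, clock=time.monotonic):
    """Run the bid stages in order; return a bid, or None if out of time.

    All stage callables are hypothetical stand-ins for the boxes in the
    bid flow diagram. `clock` is injectable so the deadline can be tested.
    """
    deadline = clock() + budget_ms / 1000.0

    # Stage 1: look up features of the user.
    features = get_features(request["user_id"])
    if clock() > deadline:
        return None

    # Stage 2: predict the expected revenue of showing an ad.
    value = predict_revenue(features, request)
    if clock() > deadline:
        return None

    # Stage 3: turn the value into a bid given the auction environment.
    bid = predict_auction(value, request)
    if clock() > deadline:
        return None

    # Stage 4: calibrate the bid with real-time data.
    return calibrate(bid)
```

Returning `None` stands for "no bid": an exchange ignores responses that arrive after its deadline, so skipping the bid is the cheapest failure mode.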

Page 15: Fast Data Driving Personalization - Nick Gorski

Getting features of a user’s behavior

• We’d like to predict a user’s value to a merchant
  – Lifetime value
  – Today
  – If we showed them an ad right now

• Proxy used throughout ad-tech: CTR
  – What is [formula lost in extraction]
  – What is [formula lost in extraction]

Page 16: Fast Data Driving Personalization - Nick Gorski

Informative features

Merchant events
• recency
• product views
• added items to cart
• purchases

TellApart events
• ad views
• ad clicks

Real-time (RTB)
• publisher
• TOD
• vertical
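As a rough illustration of the merchant and TellApart event features listed above, the following sketch summarizes one user's event history into a feature dictionary. The event type strings, the tuple layout, and the `user_features` function are assumptions made for the example; the slides do not describe the actual event schema.

```python
from collections import Counter


def user_features(events, now):
    """Summarize one user's events into simple count and recency features.

    `events` is a list of (timestamp_seconds, event_type) tuples.
    The event type names below are hypothetical.
    """
    counts = Counter(event_type for _, event_type in events)
    last_seen = max((ts for ts, _ in events), default=None)
    return {
        # Seconds since the most recent event, or None for an unseen user.
        "recency_s": None if last_seen is None else now - last_seen,
        "product_views": counts["product_view"],
        "cart_adds": counts["add_to_cart"],
        "purchases": counts["purchase"],
        "ad_views": counts["ad_view"],
        "ad_clicks": counts["ad_click"],
    }
```

The real-time features (publisher, TOD, vertical) would come from the bid request itself rather than from stored history, so they are left out here.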

Page 17: Fast Data Driving Personalization - Nick Gorski

Special purpose fast data

[Figure: pipeline diagram. Labels: Events, Event logs, MapReduce, Feature vectors, Model training, Model, P(click), User events, Application, Features of user.]

Page 18: Fast Data Driving Personalization - Nick Gorski


Lambda architecture for general fast data

http://lambda-architecture.net/

Page 19: Fast Data Driving Personalization - Nick Gorski

Fast data for machine learning

[Figure: lambda architecture adapted to machine learning. Components: Event logs, Kafka topics, Feature batch layer, Feature speed layer, Feature serving layer, Feature API, Model batch layer, Model serving layer, Model API, Predictions about users. Edge labels: Extract, Summarize, Training, Materialization, Registration, Feature topology.]
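The split between a feature batch layer, a feature speed layer, and a feature serving layer can be illustrated with a toy in-memory store. This is a sketch of the general lambda-architecture idea only; the `FeatureStore` class and its methods are hypothetical, and it assumes that each batch run covers every event seen so far, which lets the speed view be reset after the batch view is loaded.

```python
from collections import Counter, defaultdict


class FeatureStore:
    """Toy lambda-architecture store for per-user event counts."""

    def __init__(self):
        # Batch view: user_id -> Counter, recomputed from the full event log.
        self.batch_view = {}
        # Speed view: user_id -> Counter of events since the last batch run.
        self.speed_view = defaultdict(Counter)

    def on_event(self, user_id, event_type):
        """Speed layer: fold in one event as soon as it arrives."""
        self.speed_view[user_id][event_type] += 1

    def load_batch(self, all_events):
        """Batch layer: rebuild counts from the whole log, then reset
        the speed view (assumes the log includes every event so far)."""
        view = defaultdict(Counter)
        for user_id, event_type in all_events:
            view[user_id][event_type] += 1
        self.batch_view = dict(view)
        self.speed_view.clear()

    def get(self, user_id):
        """Serving layer: merge the batch and speed views for one user."""
        batch = self.batch_view.get(user_id, Counter())
        recent = self.speed_view.get(user_id, Counter())
        return batch + recent
```

Because both layers compute the same summary (counts), the merge is a simple addition; features that do not combine this way need a merge step of their own.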

Page 20: Fast Data Driving Personalization - Nick Gorski

Case study: Freshplum

• Challenge:
  – Port existing feature extraction and model training to lambda architecture

• Dynamic offers
  – Who should receive an offer?
  – Gradient boosted decision trees
  – Session-based features

• Same AUC, similar performance
• Week of dev time

[Figure labels: Freshplum infrastructure, Lambda architecture]

Page 21: Fast Data Driving Personalization - Nick Gorski

Fast data for fast features

• Lambda architecture applied to feature extraction
  – Unified offline and online extraction
  – Robust and fault tolerant
  – Feature engineering is fast
  – Supports features that are otherwise expensive to deploy

Page 22: Fast Data Driving Personalization - Nick Gorski

TellApart’s bidder: our secret sauce

[Figure: bid flow diagram, repeated from Page 14. Timing label shown: 40ms.]

Page 23: Fast Data Driving Personalization - Nick Gorski

Modeling is hard, modeling is easy

• Building a retargeting bidding strategy is hard

• Effective valuation strategies for retargeting:
  – Pass the buck to your client
  – Bid infinity
  – Bidding proportional to lifetime user value
  – Bidding proportional to P(click)
  – Formulating as an MDP and learning the optimal policy

“Smart scientists don’t just solve big, hard problems; they also have a knack for making big hard problems small.”
- DJ Patil

Page 24: Fast Data Driving Personalization - Nick Gorski

Modeling is hard, modeling is easy

• CPM – ads are valuable when they are displayed
• CPC – ads are valuable when they are clicked
• CPA – ads are valuable when they lead to an action

• TellApart bills on a CPA basis, charging a revenue share for each click conversion
  – Proxy for true value
  – Auditable by merchants

Page 25: Fast Data Driving Personalization - Nick Gorski

The value of a TellApart ad

• The value of a TellApart ad is the click conversion revenue it drives

• Decompose into a chain of simple models

• Further decompose probabilities
  – Train model
  – Calibrate with offline data
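The slide's formulas did not survive in the transcript, so the following is only one plausible reading of "a chain of simple models": expected revenue per impression as the product of a click probability, a conversion probability given a click, an expected order value, and the revenue share. The function name, the parameter names, and the decomposition itself are assumptions, not the author's stated model.

```python
def expected_ad_revenue(p_click, p_conv_given_click,
                        expected_order_value, revenue_share):
    """Expected revenue of one impression under an assumed model chain.

    value = P(click) * P(conversion | click)
            * E[order value | conversion] * revenue share
    Each factor would come from its own simple model.
    """
    for name, p in (("p_click", p_click),
                    ("p_conv_given_click", p_conv_given_click),
                    ("revenue_share", revenue_share)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")
    if expected_order_value < 0:
        raise ValueError("expected_order_value must be non-negative")
    return p_click * p_conv_given_click * expected_order_value * revenue_share


# Illustrative numbers only: 1% click rate, 5% click-to-conversion rate,
# an $80 expected order, and a 10% revenue share.
value_per_impression = expected_ad_revenue(0.01, 0.05, 80.0, 0.10)
value_cpm = value_per_impression * 1000  # value per thousand impressions
```

Splitting the value this way lets each factor be trained and calibrated separately, which fits the slide's "Train model" and "Calibrate with offline data" steps.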

Page 26: Fast Data Driving Personalization - Nick Gorski


Does it work?

• 85-90% of clicks are made by bids in top quartile

• 96% of conversions made by bids in top quartile

• 80% of conversions made by bids in top decile

Page 27: Fast Data Driving Personalization - Nick Gorski

TellApart’s bidder: our secret sauce

[Figure: bid flow diagram, repeated from Page 14. Timing label shown: 40ms.]

Page 28: Fast Data Driving Personalization - Nick Gorski

RTB auction

[Figure: auction diagram. Labels: Publisher A, Publisher B, Publisher C, ...; RTB exchange; Exchange ?; Bidder, Bidder, Bidder, ...]

Page 29: Fast Data Driving Personalization - Nick Gorski

Auctions and equilibria

• RTB auctions are second-price, sealed bid
  – Vickrey with stable Nash equilibria
  – Not repeated, but not one-shot either
    • Impressions shown milliseconds apart
    • Multiple exchanges
    • Information leakage
  – Sliding fees violate second-price assumption
  – Multiple slots (generalized second-price), but pay per impression
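For reference, a single-slot second-price sealed-bid (Vickrey) auction can be sketched in a few lines. The `second_price_auction` function and its `reserve` parameter are illustrative; real exchanges add fees, floors, and multiple slots, which is exactly where the slide says the textbook assumptions break down.

```python
def second_price_auction(bids, reserve=0.0):
    """Resolve a single-slot second-price sealed-bid auction.

    `bids` maps bidder name -> bid price. The highest bidder wins and
    pays the second-highest eligible bid, or the reserve if no other
    bid clears it. Returns (winner, price), or None if nobody clears
    the reserve. Ties are broken by bidder name, an arbitrary choice.
    """
    eligible = sorted(
        ((price, name) for name, price in bids.items() if price >= reserve),
        reverse=True,
    )
    if not eligible:
        return None
    _, winner = eligible[0]
    clearing_price = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, clearing_price
```

In this idealized setting the winner's payment does not depend on its own bid, which is why bidding one's true value is the standard strategy.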

Page 30: Fast Data Driving Personalization - Nick Gorski

Value and bid price across auctions

• Bidding infinity maximizes revenue
  – Win every impression, get every possible click and drive every possible conversion (naively true)

• Bidding our true value maximizes profit

• Winning at our true value maximizes affordable revenue

• Win at true value on average (minimize variance in mean)

Page 31: Fast Data Driving Personalization - Nick Gorski


Winning at a target clearing price

• Goal: win at true value on average, minimizing variance of mean

• How do we do that in a second-price auction?

• Model the competition

Page 32: Fast Data Driving Personalization - Nick Gorski


All publishers are not created equal

Page 33: Fast Data Driving Personalization - Nick Gorski

Predicting the markets

• Given a win price and features of environment, predict the bid price that will clear at that win price.

• Modeling this
  – The easy way
  – The good way

Page 34: Fast Data Driving Personalization - Nick Gorski

The easy way: bid to win

• Local linear isotonic regression

[Figure: plot with axes Win CPM and Bid CPM.]
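To make the idea concrete, here is a minimal sketch of plain isotonic regression using the pool-adjacent-violators algorithm, with linear interpolation between fitted points at prediction time. It is not necessarily the "local linear" variant the slide names, and the function names and the sample win/bid CPM values in the usage note are made up for illustration.

```python
import bisect


def isotonic_fit(xs, ys):
    """Fit a non-decreasing step function with pool-adjacent-violators.

    Returns (sorted_xs, fitted_ys), where fitted_ys is non-decreasing.
    """
    pairs = sorted(zip(xs, ys))
    blocks = []  # each block is [sum_of_y, count]
    for _, y in pairs:
        blocks.append([y, 1])
        # Merge backwards while the previous block's mean exceeds the last.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [x for x, _ in pairs], fitted


def isotonic_predict(xs, fitted, x):
    """Linearly interpolate the fitted values; clamp outside the range."""
    if x <= xs[0]:
        return fitted[0]
    if x >= xs[-1]:
        return fitted[-1]
    i = bisect.bisect_right(xs, x)  # xs[i-1] <= x < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = fitted[i - 1], fitted[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

With observed (win CPM, bid CPM) pairs as `xs` and `ys`, `isotonic_predict(xs, fitted, target_win_cpm)` gives a bid estimate for a target win price. The monotonic constraint encodes the expectation that clearing at a higher price requires bidding at least as much.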

Page 35: Fast Data Driving Personalization - Nick Gorski

The right way: bid to win

• Mixture of Gaussians
  – Identify clusters
  – Share information to leverage trends in features across clusters

• EM may be the right way, but MapReduce EM is not the easy way

• Big impact to revenue and performance
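For readers unfamiliar with the technique, the following is a small single-machine sketch of expectation-maximization (EM) for a one-dimensional mixture of Gaussians. It only illustrates the E-step and M-step; it is not the MapReduce implementation the slide alludes to, it does not use auction features, and the function name, initial variances, and iteration count are arbitrary choices for the example.

```python
import math


def gmm_em_1d(data, initial_means, n_iter=50):
    """Fit a 1-D Gaussian mixture with EM from the given initial means.

    Returns (weights, means, variances), one entry per component.
    """
    k = len(initial_means)
    means = list(initial_means)
    variances = [1.0] * k          # arbitrary starting variance
    weights = [1.0 / k] * k        # start with equal mixing weights

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [
                w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) + 1e-300  # guard against division by zero
            resp.append([d / total for d in dens])

        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid collapse

    return weights, means, variances
```

Each iteration needs a full pass over the data to accumulate the weighted sums, which hints at why running EM as a sequence of MapReduce jobs "is not the easy way".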

Page 36: Fast Data Driving Personalization - Nick Gorski

TellApart’s bidder: our secret sauce

[Figure: bid flow diagram, repeated from Page 14. Timing label shown: 40ms.]

Page 37: Fast Data Driving Personalization - Nick Gorski

Calibrating overall performance

• Building models is great, but the real world is messy
  – Non-stationary adversarial environment
  – Biased data and imperfect models

• We bid to maximize affordable revenue
  – If we spend too little, we sacrifice top-line revenue for profit
  – If we spend too much, we can’t afford the revenue that we drive

Page 38: Fast Data Driving Personalization - Nick Gorski


PID control

Page 39: Fast Data Driving Personalization - Nick Gorski

Calibrated bidding

[Figure: feedback control loop. Labels: target, Difference, error, Gain, control signal, Bids, bid $, Production (taba), real-time spend $, revenue $.]
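The feedback loop in this diagram follows the shape of a textbook PID controller: compare a target with a measurement, and turn the error into a control signal. The sketch below is a generic discrete PID controller, not TellApart's tuning or code. The class name, the gains, and the choice of measured quantity (for example, spend relative to revenue) are assumptions.

```python
class PIDController:
    """Generic discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, target, measured, dt=1.0):
        """Return the control signal for one time step of length `dt`."""
        error = target - measured
        # Integral term accumulates past error.
        self.integral += error * dt
        # Derivative term is zero on the first step (no previous error).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

A positive signal would be read as "spend more" and a negative one as "spend less"; how that signal is applied to individual bids is a separate decision.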

Page 40: Fast Data Driving Personalization - Nick Gorski

Controlling bids strategically

• Control signal says “spend more” or “spend less”

• When we spend less
  – Don’t bid less than true value
  – Instead, threshold low-value bids

• When we spend more
  – Bid more across the board

[Figure: plot of Bid $ against Bid rank.]
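The asymmetric policy on this slide can be sketched as follows. The sketch assumes the control signal is a multiplier, with values above 1 meaning "spend more" and values below 1 meaning "spend less", and it approximates thresholding by keeping the top fraction of a batch of bids. Both choices, and the `apply_control` name, are assumptions for illustration; a production bidder would apply a value threshold per request rather than rank a batch.

```python
def apply_control(bids, control):
    """Adjust true-value bids according to a multiplicative control signal.

    control >= 1: raise every bid by the same factor.
    control <  1: keep the highest-value bids at their true value and
                  drop the rest (0.0 means "do not bid").
    """
    if control >= 1.0:
        return [b * control for b in bids]

    # Spend less: keep roughly the top `control` fraction of bids.
    keep = int(round(len(bids) * control))
    if keep == 0:
        return [0.0] * len(bids)
    threshold = sorted(bids, reverse=True)[keep - 1]
    # Bids tied with the threshold are all kept.
    return [b if b >= threshold else 0.0 for b in bids]
```

Dropping the lowest-value bids instead of shading every bid keeps the remaining bids at true value, so the impressions that are still won are the ones most likely to pay for themselves.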

Page 41: Fast Data Driving Personalization - Nick Gorski

Thresholding increases efficiency

[Figure: plot of Percent of Total against Ranked bids. Series: Spend (legacy), Spend, Revenue, Bids.]

Page 42: Fast Data Driving Personalization - Nick Gorski

Control is hard

• Given limited time, we prefer improving our models
• Reasons for control
  – Unpredictable market dynamics
  – Predictable market dynamics
  – Inaccurate user value models
• Pushing responsibilities up to models makes bidder more efficient

Page 43: Fast Data Driving Personalization - Nick Gorski

TellApart’s bidder: our secret sauce

[Figure: bid flow diagram, repeated from Page 14. Timing label shown: 40ms.]

Page 44: Fast Data Driving Personalization - Nick Gorski

And that’s just bidding!

• Identity
• Products shown in ads
  – Strategies to select viewed products
  – Recommended products
• Data platform

Page 45: Fast Data Driving Personalization - Nick Gorski

TellApart retargeting by the numbers

• Handle 5.3B requests per day, at peak 100K QPS
• Lifts online revenue 10% or more
• As of December 2013, $100M ARR
• Drove 1% of Cyber Monday e-commerce in 2013
• TellApart has won every head to head test with performance

Page 46: Fast Data Driving Personalization - Nick Gorski

TellApart’s data philosophy

• Infrastructure dictates product, so build good infrastructure
• EV[work] = EV[impact] * P(works)
• Simple models, chained together
• Find simple changes with big impact
• Data wins arguments
• Transparent and aligned objective functions

Page 47: Fast Data Driving Personalization - Nick Gorski


Greenfields

Page 48: Fast Data Driving Personalization - Nick Gorski


TellApart Identity Network

Page 49: Fast Data Driving Personalization - Nick Gorski


Email

Page 50: Fast Data Driving Personalization - Nick Gorski


Onsite personalization

Page 51: Fast Data Driving Personalization - Nick Gorski


Performance-based personalized marketing

Page 52: Fast Data Driving Personalization - Nick Gorski


Fast Data. Hard Problems. Insanely Great Team.

http://www.tellapart.com/careers/

Page 53: Fast Data Driving Personalization - Nick Gorski

Thanks!

[email protected]
@nicholasgorski
