Probabilistic Methods for Targeted Advertising

Probabilistic Methods for Targeted Advertising Max Chickering Microsoft Research

Page 1: Probabilistic Methods for Targeted Advertising

Probabilistic Methods for Targeted Advertising

Max Chickering

Microsoft Research

Outline

• Targeted Mailing

To whom should you send a solicitation?

• Targeted Advertising on the Web

How should you display banner ads to maximize click-through?

Targeted Mailing

• Given a population of potential customers:

Person   X1   X2    …   Xn
1        0    0     …   red
2        0    3.4   …   blue
⋮        ⋮    ⋮         ⋮
m        1    7     …   green

• Sending an advertisement costs money:

- Postage
- Possible discount

Which potential customers do you solicit?

Motivating Application

Advertisement:

MSN subscription

Potential customers:

People who registered Windows 95

Known variables:

15 from questionnaire (e.g. gender, RAM size)

Naïve Solutions

• Mail to those customers most likely to subscribe to MSN

Can waste money by targeting customers who would subscribe anyway

• Mail to everyone

Even worse!

Response Behaviors

Will the potential customer buy the product?

                    Mail   Don't Mail
Always buyer        Yes    Yes
Persuadable         Yes    No
Anti-persuadable    No     Yes
Never buyer         No     No

We only make money from mailing to the persuadable potential customers

Expected Profit for a Population

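Population of N potential customers: $N_{alw}$ always buyers, $N_{per}$ persuadables, $N_{anti}$ anti-persuadables, $N_{nev}$ never buyers

Cost of mailing: $c$

Solicited and unsolicited revenue: $r$

Expected profit from mailing (per potential customer):

$$\frac{N_{alw} + N_{per}}{N}\, r - c$$

Profit from not mailing (per potential customer):

$$\frac{N_{alw} + N_{anti}}{N}\, r$$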

Lift in Profit From Mailing

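Profit from mailing - Profit from not mailing:

$$\text{Lift} = -c + \left[\frac{N_{alw} + N_{per}}{N} - \frac{N_{alw} + N_{anti}}{N}\right] r = -c + \frac{N_{per} - N_{anti}}{N}\, r$$

For any set of potential customers, we should only mail if the lift is positive.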
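Below is a minimal Python sketch of the two formulas above. It assumes we somehow knew the four behavior counts; in practice they cannot be observed for individual customers. The helper name `mailing_lift` and the example counts are hypothetical. `Fraction` keeps the arithmetic exact, so the bracketed form and the simplified form can be compared directly.

```python
from fractions import Fraction


def mailing_lift(n_alw, n_per, n_anti, n_nev, c, r):
    """Per-customer profits and lift for one population (hypothetical helper).

    n_alw, n_per, n_anti, n_nev: counts of always buyers, persuadables,
    anti-persuadables and never buyers; c: mailing cost; r: revenue.
    """
    n = n_alw + n_per + n_anti + n_nev
    profit_mail = Fraction(n_alw + n_per, n) * r - c   # (N_alw + N_per)/N * r - c
    profit_no_mail = Fraction(n_alw + n_anti, n) * r   # (N_alw + N_anti)/N * r
    return profit_mail, profit_no_mail, profit_mail - profit_no_mail


pm, pn, lift = mailing_lift(10, 20, 5, 65, c=Fraction(1, 2), r=9)
print(pm, pn, lift)  # 11/5 27/20 17/20
```

With N = 100, c = 0.50 and r = 9, the lift is 17/20 = 0.85 per customer. This matches the simplified form -c + r (N_per - N_anti)/N = -0.5 + 9 × 15/100.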

Learning Expected Lift

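$S \in \{s_0, s_1\}$ (did not subscribe, did subscribe)

$M \in \{m_0, m_1\}$ (did not mail, did mail)

$$p(S = s_1 \mid M = m_1) = \frac{N_{alw} + N_{per}}{N} \qquad p(S = s_1 \mid M = m_0) = \frac{N_{alw} + N_{anti}}{N}$$

Identifiable if S and M are known in the training data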

Lift : -c + [ p(S=s1|M=m1) – p(S=s1|M=m0) ] r
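Because S and M are both recorded, the lift can be estimated directly by counting subscribers separately among mailed and unmailed customers. The Python sketch below assumes the records arrive as hypothetical `(mailed, subscribed)` boolean pairs. It uses plain frequency estimates, since the slides do not prescribe an estimator, and each group needs at least one record.

```python
def estimate_lift(records, c, r):
    """Estimate -c + [p(S=s1|M=m1) - p(S=s1|M=m0)] * r from (mailed, subscribed) pairs."""
    subscribed = {True: 0, False: 0}  # subscribers per mailing decision
    total = {True: 0, False: 0}       # customers per mailing decision
    for mailed, did_subscribe in records:
        subscribed[mailed] += int(did_subscribe)
        total[mailed] += 1
    p_mailed = subscribed[True] / total[True]        # estimate of p(S=s1 | M=m1)
    p_not_mailed = subscribed[False] / total[False]  # estimate of p(S=s1 | M=m0)
    return -c + (p_mailed - p_not_mailed) * r


# Hypothetical sample: 3 of 10 mailed customers subscribed, 1 of 10 unmailed did.
records = [(True, True)] * 3 + [(True, False)] * 7 + [(False, True)] + [(False, False)] * 9
print(round(estimate_lift(records, c=0.42, r=5), 2))  # 0.58
```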

Controlled Experiment: Identify Profitable Sub-Populations

1. Choose a small sample of the potential customers

2. Randomly divide those customers into a “treatment group” (M = m1) and a “control group” (M = m0)

3. Wait a specified period of time, and record S = s0 or S = s1 for each customer
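Steps 1 and 2 can be sketched as follows. The even 50/50 split, the function name, and the `seed` parameter are assumptions for illustration; the slides only require that assignment to M be random. Step 3, waiting and recording S, happens outside the code.

```python
import random


def controlled_experiment_groups(customers, sample_size, seed=None):
    """Steps 1-2: sample customers, then split them at random into two groups."""
    rng = random.Random(seed)
    sample = rng.sample(customers, sample_size)  # step 1: random sample, in random order
    half = sample_size // 2
    treatment = sample[:half]  # to be mailed (M = m1)
    control = sample[half:]    # not mailed (M = m0)
    return treatment, control


treatment, control = controlled_experiment_groups(list(range(1000)), 100, seed=0)
print(len(treatment), len(control))  # 50 50
```

`random.Random.sample` already returns the chosen customers in random order, so cutting the list in half yields a random partition.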

Controlled Experiment

Person   X1   X2    …   Xn      M    S
1        0    0     …   red     m1   s0
2        0    3.4   …   blue    m0   s1
⋮        ⋮    ⋮         ⋮       ⋮    ⋮
m        1    7     …   green   m1   s1

Use machine-learning techniques to identify sub-populations with high positive lift, and then target those customers.

Lift(sub-population corresponding to Xn = blue) = -c + [ p(S=s1|M=m1, Xn=blue) – p(S=s1|M=m0, Xn=blue) ] r
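The sketch below applies the per-sub-population estimate to the simplest partition: one sub-population per value of a single attribute. It assumes rows are hypothetical dicts holding the attribute plus `M` and `S` keys, and the function name is likewise hypothetical. The decision trees on the following slides generalize this idea to partitions built from several attributes.

```python
from collections import defaultdict


def subpopulation_lifts(rows, attribute, c, r):
    """Estimate the lift of each sub-population defined by one attribute value.

    rows: dicts with the attribute plus "M" ("m0"/"m1") and "S" ("s0"/"s1").
    Values observed in only one of the two groups are skipped.
    """
    # counts[value][m] = [number with S = s1, number of rows]
    counts = defaultdict(lambda: {"m0": [0, 0], "m1": [0, 0]})
    for row in rows:
        cell = counts[row[attribute]][row["M"]]
        cell[0] += int(row["S"] == "s1")
        cell[1] += 1
    lifts = {}
    for value, groups in counts.items():
        (s1_mailed, n_mailed), (s1_unmailed, n_unmailed) = groups["m1"], groups["m0"]
        if n_mailed and n_unmailed:
            lifts[value] = -c + (s1_mailed / n_mailed - s1_unmailed / n_unmailed) * r
    return lifts


rows = [
    {"Xn": "blue", "M": "m1", "S": "s1"}, {"Xn": "blue", "M": "m1", "S": "s1"},
    {"Xn": "blue", "M": "m0", "S": "s1"}, {"Xn": "blue", "M": "m0", "S": "s0"},
    {"Xn": "red", "M": "m1", "S": "s0"}, {"Xn": "red", "M": "m0", "S": "s0"},
]
lifts = subpopulation_lifts(rows, "Xn", c=0.5, r=9)
print(lifts)  # {'blue': 4.0, 'red': -0.5}
print([v for v, lift in lifts.items() if lift > 0])  # ['blue']
```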

Identify Profitable Sub-Populations

Partitions of X define sub-populations, and a statistical model for p(S|M,X) defines the lift

Approach: Use Decision Trees

Known distinctions in our data: X = {X1, …, Xn}, S, M

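[Figure: a decision tree that partitions X into four sub-populations (X1 > 10, X4 = 2; X1 > 10, X4 ≠ 2; X1 < 10, X12 = false; X1 < 10, X12 = true), each labeled with its own lift (Lift 1 to Lift 4).]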

Probabilistic Decision Trees

Example query: p(S | M=m0, X1=1, X2=2)

[Figure: a probabilistic decision tree for p(S | M, X1, X2). The tree splits on X2 (branches 2 and 1, 3) and on X1 (branches 1 and 2), and every path ends in a split on M (mailed / not mailed). The leaves hold p(S=subscribed) / p(S=not subscribed) = 0.6 / 0.4, 0.5 / 0.5, 0.4 / 0.6, 0.2 / 0.8, 0.7 / 0.3, and 0.3 / 0.7.]

Calculating Lift

Potential customer with {X1=1, X2=2}; assume c = 0.50, r = 9. For this customer the tree gives p(S=subscribed) = 0.4 if mailed and 0.2 if not mailed.

Lift = -0.5 + (0.4 – 0.2) × 9 = 1.3

Mail to this person!

Traditional Learning Algorithm

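[Figure: greedy tree growth. Each candidate split X1, X2, …, Xn is scored (Score1(Data), …, Scoren(Data)); the best one (here X2) is applied, and the search repeats at each new leaf with the remaining candidates (X1, X3, …, Xn).]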
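Here is a Python sketch of one greedy step. It rests on assumptions the slides leave open. Each candidate split is multiway on the variable's values. Score(Data) is taken to be the maximum-likelihood log-likelihood of S in the resulting leaves; a real learner would use a penalized or Bayesian score, because raw likelihood always rewards more splits. M is simply one more candidate variable here, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def leaf_log_likelihood(labels):
    """Log-likelihood of the labels under their own observed frequencies."""
    n = len(labels)
    return sum(k * math.log(k / n) for k in Counter(labels).values())


def split_score(rows, var, target="S"):
    """Stand-in for Score(Data): leaf log-likelihood after a multiway split on var."""
    children = defaultdict(list)
    for row in rows:
        children[row[var]].append(row[target])
    return sum(leaf_log_likelihood(labels) for labels in children.values())


def best_split(rows, candidates, target="S"):
    """One greedy step: the candidate whose split scores highest."""
    return max(candidates, key=lambda var: split_score(rows, var, target))


rows = [
    {"X1": 0, "X2": "a", "M": "m1", "S": 1},
    {"X1": 1, "X2": "a", "M": "m0", "S": 1},
    {"X1": 0, "X2": "b", "M": "m1", "S": 0},
    {"X1": 1, "X2": "b", "M": "m0", "S": 0},
]
print(best_split(rows, ["X1", "X2", "M"]))  # X2
```

The full algorithm would apply the winning split and repeat the same step inside each new leaf.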

Lift-Aware Learning Algorithm

Traditional Learning Algorithm

Identify a tree that represents p(S|M,X) well

Lift-Aware

Would like the tree to be good at modeling the difference:

p(S=s1|M=m1,X=x) - p(S=s1|M=m0,X=x)

A Heuristic

Only consider decision trees (for S) with the last split on M

[Figure: the same greedy search, except that each leaf of every candidate tree is split on M (mailed / not mailed) before the candidate is scored (Score1(Data), Score2(Data), …, Scoren(Data)).]
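The heuristic changes only what gets scored. Every child of a candidate split receives its own estimate of p(S) for mailed and for not-mailed customers, so the score rewards splits that separate the effect of mailing. The Python sketch below assumes multiway splits and uses the maximum-likelihood log-likelihood as a stand-in for Score(Data); the slides specify neither, and the names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def log_likelihood(labels):
    """Log-likelihood of the labels under their own observed frequencies."""
    n = len(labels)
    return sum(k * math.log(k / n) for k in Counter(labels).values())


def forced_m_score(rows, var):
    """Score a split on var whose every child is then split on M.

    Each (value of var, value of M) cell gets its own estimate of p(S), i.e.
    the candidate tree models p(S | M, var) with M as the last split.
    """
    cells = defaultdict(list)
    for row in rows:
        cells[(row[var], row["M"])].append(row["S"])
    return sum(log_likelihood(labels) for labels in cells.values())


def best_forced_m_split(rows, candidates):
    """Greedy step; M is not a candidate because it is reserved for the last split."""
    return max((v for v in candidates if v != "M"), key=lambda v: forced_m_score(rows, v))


# Mailing helps only when X1 = 0; X2 merely mirrors M in this toy sample.
rows = [
    {"X1": 0, "X2": "a", "M": "m1", "S": 1},
    {"X1": 0, "X2": "b", "M": "m0", "S": 0},
    {"X1": 1, "X2": "a", "M": "m1", "S": 0},
    {"X1": 1, "X2": "b", "M": "m0", "S": 0},
]
print(best_forced_m_split(rows, ["X1", "X2", "M"]))  # X1
```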

Experiment: Real-world Dataset

Product of interest: MSN subscription

Potential customers: Windows 95 registrants

Known variables (X): 15 from questionnaire (e.g., gender, RAM size)

Cost to mail: 42 cents

Subscription revenue: varied from 1 to 15 dollars

Data: sample of ~110,000 potential customers (70% train, 30% test)

Compared our algorithm (FORCE) with the unconstrained greedy algorithm (NORMAL) for various revenues

Results on Test Data: Per-person improvement over Mail-to-All

[Figure: per-person improvement over mail-to-all in dollars (0 to 0.25) versus benefit in dollars (1 to 25), for FORCE and NORMAL.]

Conclusions / Future Work

Marginal improvement over the standard decision-tree algorithm:

Almost every path in the “standard” trees contained a split on M. We expect a larger difference in other domains.

Algorithm works for discounted prices:

Expected profit from mailing:

$$\frac{N_{alw} + N_{per}}{N}\, r_{discount} - c$$

Profit from not mailing:

$$\frac{N_{alw} + N_{anti}}{N}\, r$$
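Subtracting the two expressions gives the lift with a discount. This short derivation takes $r_{discount}$ to be the discounted revenue from a subscriber who was mailed, and shows that the lift can still be written with the probabilities the tree models:

```latex
\text{Lift} = -c + \frac{N_{alw} + N_{per}}{N}\, r_{discount} - \frac{N_{alw} + N_{anti}}{N}\, r
            = -c + p(S = s_1 \mid M = m_1)\, r_{discount} - p(S = s_1 \mid M = m_0)\, r
```

Only the revenue weights change, so the same model of p(S | M, X) still determines whom to mail.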

Part II: Targeted Advertising on the Web

Given information about a visitor, how do you choose which advertisement to display?

???

Goals of Targeted Advertising

Maximize $$$

• Maximize Clicks

• Brand Presence

Naïve Targeting Scheme

Step 1: Cluster / segment users (Cluster 1, …, Cluster m)

Possible cluster attributes:

• Current page category

• Pages the user has visited on the site

• Known demographics

• Inferred demographics

• Previous advertisement clicks

Naïve Targeting Scheme

Step 2: Advertiser books ads into clusters

Step 3: Measure click probabilities

Step 4: Show best ad to each cluster

Problems (inventory management):

• Ad quotas

• Cluster overbooking
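Steps 3 and 4 of the naive scheme fit in a few lines. This sketch assumes hypothetical per-(ad, cluster) impression and click counts and uses the observed click-through rate as the probability estimate; the helper name is hypothetical. Nothing stops one ad from winning every cluster, which is exactly the quota and overbooking problem listed above.

```python
def naive_schedule(impressions, clicks):
    """Steps 3-4 of the naive scheme (hypothetical helper).

    impressions[i][j], clicks[i][j]: counts for ad i shown in cluster j.
    Returns estimated click probabilities and the best ad for each cluster.
    """
    n_ads, n_clusters = len(impressions), len(impressions[0])
    p = [[clicks[i][j] / impressions[i][j] if impressions[i][j] else 0.0
          for j in range(n_clusters)]
         for i in range(n_ads)]
    best = [max(range(n_ads), key=lambda i: p[i][j]) for j in range(n_clusters)]
    return p, best


p, best = naive_schedule([[100, 100], [100, 100]], [[5, 4], [3, 2]])
print(best)  # [0, 0]: ad 0 wins both clusters, whatever its quota
```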

Advertisement Allocation

         Cluster 1   Cluster 2   …   Cluster m
Ad 1     x11         x12         …   x1m
Ad 2     x21         x22         …   x2m
⋮        ⋮           ⋮               ⋮
Ad n     xn1         xn2         …   xnm

xij = Number of times to show advertisement i to user cluster j

Maximize Expected Clicks

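         Cluster 1   Cluster 2   …   Cluster m
Ad 1     p11 x11     p12 x12     …   p1m x1m
Ad 2     p21 x21     p22 x22     …   p2m x2m
⋮        ⋮           ⋮               ⋮
Ad n     pn1 xn1     pn2 xn2     …   pnm xnm

pij = expected number of clicks from showing ad i to a user in cluster j

$$E(\#\text{Clicks for } X) = \sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij}$$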
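Evaluating this objective for a given schedule is a double sum. In the sketch below, `p` and `x` are nested lists indexed [ad][cluster]; that layout and the function name are illustrative choices, not taken from the slides.

```python
def expected_clicks(p, x):
    """E(#Clicks for X) = sum_i sum_j p[i][j] * x[i][j]; both indexed [ad][cluster]."""
    return sum(p_ij * x_ij
               for p_row, x_row in zip(p, x)
               for p_ij, x_ij in zip(p_row, x_row))


p = [[0.49, 0.51], [0.51, 0.49]]
print(expected_clicks(p, [[0, 100], [100, 0]]))   # about 102 expected clicks
print(expected_clicks(p, [[50, 50], [50, 50]]))   # about 100 expected clicks
```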

Inventory-Management Constraints

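[Figure: row i of X (xi1, …, xim) holds the showings of ad i; column j holds the showings in cluster j.]

Ad quota $q_i$ for each ad i:

$$\sum_{j=1}^{m} x_{ij} \le q_i$$

Capacity $c_j$ for each cluster j:

$$\sum_{i=1}^{n} x_{ij} \le c_j$$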
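A schedule can be checked against the inventory constraints directly. This sketch assumes both constraints are upper bounds (≤) and that schedules are nested lists indexed [ad][cluster]; the function name is hypothetical.

```python
def satisfies_inventory(x, q, c):
    """True if x >= 0, every ad i stays within q[i], and every cluster j within c[j]."""
    nonnegative = all(x_ij >= 0 for row in x for x_ij in row)
    within_quotas = all(sum(row) <= q_i for row, q_i in zip(x, q))
    within_capacity = all(sum(col) <= c_j for col, c_j in zip(zip(*x), c))
    return nonnegative and within_quotas and within_capacity


print(satisfies_inventory([[0, 100], [100, 0]], q=[100, 100], c=[100, 100]))  # True
print(satisfies_inventory([[100, 100], [0, 0]], q=[100, 100], c=[100, 100]))  # False
```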

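Linear Program

Find the schedule X that maximizes

$$\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij}$$

subject to

$$\sum_{j=1}^{m} x_{ij} \le q_i \;\; (i = 1, \dots, n), \qquad \sum_{i=1}^{n} x_{ij} \le c_j \;\; (j = 1, \dots, m), \qquad x_{ij} \ge 0$$

Solve using (e.g.) the simplex algorithm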
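The slides suggest the simplex algorithm. To stay dependency-free, the sketch below swaps in a different exact method. Reading both constraints as upper bounds (≤), this LP is a transportation problem. It can therefore be solved as a min-cost flow from a source, through ads and clusters, to a sink, with cost -p_ij per impression. The flow is found by successive shortest paths, with Bellman-Ford handling the negative costs. With integer q_i and c_j the resulting schedule is integral. This is a small-instance sketch rather than a production scheduler, and `max_click_schedule` is a hypothetical name.

```python
import math


def max_click_schedule(p, q, c):
    """Maximize sum p[i][j]*x[i][j] s.t. sum_j x[i][j] <= q[i], sum_i x[i][j] <= c[j], x >= 0.

    Min-cost flow: source -> ad i (capacity q[i]) -> cluster j (cost -p[i][j])
    -> sink (capacity c[j]), solved by successive shortest paths.
    """
    n, m = len(p), len(c)
    source, sink = 0, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # edge = [to, residual capacity, cost, index of reverse edge]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        add_edge(source, 1 + i, q[i], 0.0)
        for j in range(m):
            add_edge(1 + i, 1 + n + j, min(q[i], c[j]), -p[i][j])
    for j in range(m):
        add_edge(1 + n + j, sink, c[j], 0.0)

    while True:
        # Bellman-Ford: cheapest source -> sink path in the residual graph.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)  # (node, edge index) used to reach each node
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + cost, (u, k)
                        updated = True
            if not updated:
                break
        if dist[sink] >= -1e-12:  # no path adds expected clicks: optimal
            break
        # Push as much flow as the path's bottleneck allows.
        push, v = math.inf, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u

    # Flow on ad -> cluster edges (held by their reverse edges) is the schedule.
    x = [[0] * m for _ in range(n)]
    for i in range(n):
        for v, _, _, rev in graph[1 + i]:
            if 1 + n <= v <= n + m:
                x[i][v - 1 - n] = graph[v][rev][1]
    return x


print(max_click_schedule([[0.49, 0.51], [0.51, 0.49]], q=[100, 100], c=[100, 100]))
# [[0, 100], [100, 0]]
```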

A Simple Targeting System

• Estimate probabilities

• Find the optimal schedule

• Serve ads to cluster j via

$$p(\text{Serve } i) = \frac{x_{ij}}{\sum_{i'} x_{i'j}}$$
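Serving then amounts to sampling an ad with probability proportional to its scheduled count in the visitor's cluster. The sketch uses Python's `random.choices`, and `serve_ad` is a hypothetical name. A cluster whose column is all zeros has nothing to serve; recent Python versions raise `ValueError` in that case.

```python
import random


def serve_ad(x, j, rng=random):
    """Choose ad i for a visitor in cluster j with probability x[i][j] / sum_i' x[i'][j]."""
    weights = [row[j] for row in x]  # scheduled impressions of each ad in cluster j
    return rng.choices(range(len(x)), weights=weights, k=1)[0]


schedule = [[0, 100], [100, 0]]
print(serve_ad(schedule, 0), serve_ad(schedule, 1))  # 1 0
```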

Sensitivity to Estimates

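Probabilities:

         Cluster 1   Cluster 2
Ad 1     0.49        0.51
Ad 2     0.51        0.49

Optimal schedule (q1 = q2 = c1 = c2 = k):

         Cluster 1   Cluster 2
Ad 1     0           k
Ad 2     k           0

Estimates this close (0.49 vs. 0.51) still push the schedule to the extremes, so a small estimation error can reverse the whole allocation.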

Solution: Buckets

Probabilities (after bucketing):

         Cluster 1   Cluster 2
Ad 1     0.5         0.5
Ad 2     0.5         0.5

Optimal schedule (q1 = q2 = c1 = c2 = k):

         Cluster 1   Cluster 2
Ad 1     a           b
Ad 2     c           d

Optimal schedules: a + b + c + d = 2k

Secondary (linear) optimization: ads are shown as close to uniformly across all clusters as possible
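The slides do not say how buckets are formed. One simple assumption is to round every estimate to the nearest multiple of a fixed bucket width before solving the LP. Nearly equal estimates then become exactly equal, which leaves the secondary optimization free to spread impressions evenly. The width of 0.05 and the function names are hypothetical.

```python
def bucket(p, width=0.05):
    """Round a click-probability estimate to the nearest multiple of width."""
    return round(p / width) * width


def bucket_matrix(p, width=0.05):
    """Apply bucket() to every entry of a [ad][cluster] probability matrix."""
    return [[bucket(p_ij, width) for p_ij in row] for row in p]


print(bucket_matrix([[0.49, 0.51], [0.51, 0.49]]))  # [[0.5, 0.5], [0.5, 0.5]]
```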

Passive Experiment: MSNBC (December 1998)

Example page groups: Sports, News, Health, Opinion

Clusters defined by the current page group

Manual approach: advertisers buy impressions on page groups

Passive Experiment: MSNBC (December 1998)

• ~20 clusters

• ~500 advertisements

• ~1.6 million impressions / day

Data from day 1:

• Estimate pij (avg. ~4K data points per probability)

• Find optimal schedule (less than 1 minute, no buckets)

Data from day 2:

• Re-estimate pij

• Evaluate schedule: $\sum_{i=1}^{n} \sum_{j=1}^{m} p_{ij}\, x_{ij}$

Result: 20–30% increase over manual schedule

Active Experiment on MSNBC (May 1999)

Particular advertiser: 5 ads

Data from weekend 1:

• Estimate pij (~15K data points per probability)

• Find optimal schedule (less than 1 second using buckets)

Rearrange advertisements for weekend 2

Data from weekend 2:

• Count the number of clicks and compare to weekend 1

Active Experiment Results

[Figure: clicks for the advertiser vs. a control, weekend 1 (pre-targeting) and weekend 2 (post-targeting).]

30% increase for the advertiser, negligible increase for others

Predicted a 20% increase on MSNBC

Extensions

Problem:

Increasing total expected clicks across the site may decrease clicks for a particular advertiser

Solution:

Add a (linear) constraint that the advertiser's expected clicks cannot decrease

Passive experiment: MSNBC overall increase still ~20%
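Written out, the extra constraint is linear in the schedule. In this sketch, A is the set of the advertiser's ads and $x^{0}$ is a reference schedule, for example the current manual one. Both symbols are assumptions, because the slides do not say what the advertiser's clicks are compared against.

```latex
\sum_{i \in A} \sum_{j=1}^{m} p_{ij}\, x_{ij} \;\ge\; \sum_{i \in A} \sum_{j=1}^{m} p_{ij}\, x^{0}_{ij}
```

The right-hand side is a constant, so adding this constraint keeps the problem a linear program.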

Extensions

Focus of talk: pij = expected #clicks from showing ad i to user j

In general: uij = expected utility from showing ad i to user j

Expected utility of X =

$$\sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij}\, x_{ij}$$

Alternative uij choices:

• Weighted probabilities: wi pij

• Probability of purchase

• Increase in brand awareness

• Expected revenue

My Home Page

http://research.microsoft.com/~dmax/

Results on Test Data: Per-person improvement over Mail-to-All

To evaluate test case given a model:

• Evaluate the lift given X (ignoring M and S)

• Recommend Mail if and only if Lift > 0

• If recommendation matches M from the test case, add r to the total revenue. Otherwise, ignore.
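Below is a sketch of this evaluation loop. Two readings are assumptions: `lift_fn` stands for any trained model's lift function, and the revenue r is counted only when the matched case actually subscribed. The function also returns the number of mailings among matched cases, so mailing costs (c per mailing) can be subtracted afterwards. The name `evaluate_on_test` is hypothetical.

```python
def evaluate_on_test(test_cases, lift_fn, r):
    """Match-based evaluation sketch.

    test_cases: (x, mailed, subscribed) triples; lift_fn(x): a model's lift for x.
    Returns (revenue, mailings, matched) over the cases whose actual mailing
    decision M agrees with the model's recommendation.
    """
    revenue, mailings, matched = 0, 0, 0
    for x, mailed, subscribed in test_cases:
        recommend_mail = lift_fn(x) > 0   # mail if and only if lift > 0
        if recommend_mail != mailed:
            continue                      # recommendation disagrees with M: ignore
        matched += 1
        mailings += int(mailed)
        if subscribed:                    # assumption: r is earned only on subscription
            revenue += r
    return revenue, mailings, matched


cases = [("blue", True, True), ("blue", False, True), ("red", False, True), ("red", True, False)]
print(evaluate_on_test(cases, lambda x: 1.0 if x == "blue" else -1.0, r=9))  # (18, 1, 2)
```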