
Experiments in the Internet age: A modern Bayesian look at the multi-armed bandit

Steven L. Scott

June 21, 2016


Summary

- The economics of running an experiment in the online service economy are different from those of manufacturing or agriculture.

- Costs have shifted from production to opportunity cost.

- Multi-armed bandits minimize the opportunity cost of running an experiment.

- Explicitly minimizing cost is hard, but a simple Bayesian heuristic known as Thompson sampling produces good outcomes.

Steven L. Scott (Google) Bayesian Bandits June 21, 2016 2 / 46


Multi-armed bandits

[Images: (a) a "one-armed bandit" slot machine; (b) a multi-armed bandit.]

- Sequential experimental design.

- Produce the highest reward under an uncertain payoff distribution.


Outline

Motivation and background

Thompson sampling

When to stop experimenting

Incorporating good ideas from DOE


Motivation and background

Classical experiments: Fisher arrives at Rothamsted in 1919

[Photos: R. A. Fisher; the Rothamsted station.]


The classical experiment playbook

- Steps:
  - Decide which factors you want to test. Think about possible interactions.
  - Create a "design matrix" that will allow you to estimate the effects that are likely to be important.
  - Power analysis or "optimal design" for sample size at each design point (row of the design matrix).
  - Maybe "block" on contextual factors (men / women) to reduce variance.
  - Do the experiment (collect the data).
  - Analyze the results.

- Get it right in the beginning, because you've only got one shot!

- All the probability calculations are done before seeing any data, because the experiment had to be planned before any data were observed.


Experiments through the ages

- Agricultural
  - Experiments take a long time (seasons).
  - Inputs (land, labor, machinery) are expensive.

- Industrial
  - Time requirements are typically less.
  - Inputs are still expensive (factory retooling, lost units).

- Service
  - Experiments are fast (real time).
  - Inputs are cheap (programmer time).
  - "All" cost is opportunity cost.


Experiments today: outsource the details
Experiments can be run through online optimization frameworks (e.g., Google Optimize 360).


Website optimization


Service experiments

The economics are favorable.

- No cost to acquire experimental units.

- Minimal cost to change the way you treat them.

- Automated experimental frameworks.

Why not experiment with everything, all the time?

- Some do (especially big tech companies).

- For many, the opportunity cost is a barrier.


Stats education can do real damage here
From a misguided blogger focused on type-I errors:

"If you run experiments: the best way to avoid repeated significance testing errors is to not test significance repeatedly.

Decide on a sample size in advance and wait until the experiment is over before you start believing the 'chance of beating original' figures that the A/B testing software gives you.

'Peeking' at the data is OK as long as you can restrain yourself from stopping an experiment before it has run its course.

I know this goes against something in human nature, so perhaps the best advice is: no peeking!"

- Type I errors cost (effectively) zero. Type II errors are expensive.
- This is terrible advice (he later suggests Bayesian design).


Thompson sampling

Outline

Motivation and background

Thompson sampling
  The binomial bandit

When to stop experimenting

Incorporating good ideas from DOE


Multi-armed bandit: Problem statement

- Rewards y are generated from a probability distribution f_a(y | θ).

- Goal: choose action a to maximize your total reward.

- Challenge: you don't know θ, so you need to experiment.

- Trade-off:
  - Exploit: take the action your model says is best.
  - Explore: do something else, in case your model is wrong.

- Uncertainty about θ is the critical piece.


Example reward distributions

Binomial: Rewards are independent 0/1. θ is a vector of distinct success probabilities, one for each arm.

Logistic: Rewards are independent 0/1. θ is a set of logistic regression coefficients for a design matrix, which may include both experimental and contextual factors.

Restless bandits: Rewards accrue in a time series where the rules are changing. θ_t is a set of logistic regression coefficients modeled as θ_t = θ_{t−1} + ε_t.

Hierarchical: Experiments share structure, perhaps because of between-group heterogeneity.

Etc.: Continuous rewards, continuous action spaces, your problem here.


The Thompson heuristic [Thompson(1933)]

Thompson sampling: randomly allocate units in proportion to the probability that each arm is "best."
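As a concrete sketch of the heuristic for the binomial bandit, where each arm's success probability has a Beta posterior under a uniform prior, the allocation rule is a single posterior draw per unit. The true rates below are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_step(successes, failures):
    """Draw one success probability per arm from its Beta posterior
    (uniform prior) and play the arm whose draw is largest."""
    draws = rng.beta(successes + 1, failures + 1)
    return int(np.argmax(draws))

# Simulate a two-armed binomial bandit with hypothetical true rates.
true_rates = np.array([0.05, 0.15])
successes = np.zeros(2)
failures = np.zeros(2)
for _ in range(5000):
    arm = thompson_step(successes, failures)
    reward = rng.random() < true_rates[arm]
    successes[arm] += reward
    failures[arm] += 1 - reward
```

Because each unit is allocated by a fresh posterior draw, arms attract traffic in proportion to their current probability of being best, so the better arm ends up with the bulk of the plays.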


Computing “optimal arm probabilities”

The slow way:

- Given data y, simulate θ^(1), θ^(2), . . . , θ^(N) from p(θ | y).

- For each draw, compute v_a(θ), the value of action a conditional on θ.

- Let I_a(θ) be the indicator that action a has the largest value. (Break ties at random.)

- Set

  w_a = (1/N) ∑_{g=1}^{N} I_a(θ^(g))

The fast way:

- Draw a single θ ∼ p(θ | y).

- Choose a to maximize v_a(θ).
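Both computations are a few lines for the binomial bandit, where the value of an arm is its success probability and the posterior is Beta under a uniform prior; the counts below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: successes and trials for three arms.
successes = np.array([20, 25, 30])
trials = np.array([100, 100, 100])

def optimal_arm_probs(successes, trials, n_draws=100_000):
    """The slow way: Monte Carlo estimate of w_a, the posterior
    probability that each arm has the highest value."""
    theta = rng.beta(successes + 1, trials - successes + 1,
                     size=(n_draws, len(successes)))
    best = np.argmax(theta, axis=1)  # I_a(theta) for each draw
    return np.bincount(best, minlength=len(successes)) / n_draws

def thompson_choice(successes, trials):
    """The fast way: one posterior draw; its argmax is a single
    sample from the optimal-arm distribution, which is all
    Thompson sampling needs to allocate the next unit."""
    theta = rng.beta(successes + 1, trials - successes + 1)
    return int(np.argmax(theta))

w = optimal_arm_probs(successes, trials)
```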


Computing the probability that one arm beats another.

[Figure: two scatterplots of joint posterior draws of (θ1, θ2), with Arm 1's success probability on the horizontal axis and Arm 2's on the vertical axis.]

Left panel: θ1 ∼ p(θ1 | 20 successes, 50 trials), θ2 ∼ p(θ2 | 2 successes, 3 trials); P(θ1 > θ2 | y) ≈ .16.
Right panel: θ1 ∼ p(θ1 | 20 successes, 50 trials), θ2 ∼ p(θ2 | 20 successes, 30 trials); P(θ1 > θ2 | y) ≈ .009.
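The probabilities on this slide can be checked by Monte Carlo, sampling from independent Beta posteriors (uniform priors assumed here):

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_arm1_beats_arm2(s1, n1, s2, n2, n_draws=500_000):
    """Monte Carlo estimate of P(theta1 > theta2 | y) under
    independent Beta(s + 1, n - s + 1) posteriors."""
    theta1 = rng.beta(s1 + 1, n1 - s1 + 1, n_draws)
    theta2 = rng.beta(s2 + 1, n2 - s2 + 1, n_draws)
    return float((theta1 > theta2).mean())

p_left = prob_arm1_beats_arm2(20, 50, 2, 3)     # roughly .16
p_right = prob_arm1_beats_arm2(20, 50, 20, 30)  # roughly .009
```

Arm 2's wide posterior in the left panel leaves substantial probability that arm 1 is better; the tighter posterior on the right nearly eliminates it.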


Some nice features of Thompson sampling

1. Easily understood principle

   Arms attract observations in proportion to their probability of being optimal.

2. Easy to implement
   - It can be applied very generally.
   - You just need to be able to sample from p(θ | y).
   - It is free of arbitrary tuning constants:
     - no "exploration fraction" to set (as in ε-learning),
     - no decay schedule,
     - no "discount factor".
   - It can be used with batch updates of the posterior.

3. Nearly optimal performance
   - Obviously suboptimal arms are quickly dropped, increasing reward.
   - This increases the sample size for picking the winner from the "good" arms.
   - Allocations are attracted to both performance and uncertainty.


Other methods

Several other well-known methods and heuristics exist [Sutton and Barto(1998)]:

- Equal allocation
- Greedy
- ε-greedy
- ε-decreasing
- Softmax learning
- Upper confidence bound
- Gittins index
- Dynamic programming


Upper confidence bound (UCB)
Optimism in the face of uncertainty.

[Figure: interval estimates of value for arms A, B, and C; UCB picks the arm with the highest upper bound.]

- Pick the arm with the highest upper confidence bound.

- These are not the usual bounds from normal approximations: there are extra factors of log n in the SE numerator.

- [Auer et al.(2002)] showed that UCB satisfies the optimal rate of exploration [Lai and Robbins(1985)].

- UCB can be effective, but some skill is needed to find the "right" confidence set.

- Replacing confidence sets with posterior distributions finds the "right" set automatically [Russo and Van Roy(2014)].


Thompson vs. UCB
Results from [Chapelle and Li(2011)].


A timeline of recent literature

- [Scott(2010)] and [Chapelle and Li(2011)] present empirical results suggesting Thompson sampling is a good idea.

- [May et al.(2012)] prove asymptotic convergence:
  - All arms are visited "infinitely often."
  - Almost all time is spent on the optimal arm.

- Several authors provide finite-time regret bounds in the case of independent Bernoulli arms, including [Kaufmann et al.(2012)], [Bubeck and Liu(2013a), Bubeck and Liu(2013b)], and others.

- [Russo and Van Roy(2014)] is the current state of the art: regret bounds for nearly arbitrary reward distributions.


Thompson sampling: the binomial bandit

How bad is equal allocation?

- Consider two arms: θ1 = .04 and θ2 = .05.

- Plan a classical experiment to detect this change with 95% power at 5% significance.

  > power.prop.test(p1 = .04, p2 = .05, power = .95)
  n = 11165.99
  NOTE: n is number in *each* group

- We need over 22,000 observations.

- Regret is 11,166 × .01 ≈ 112 lost conversions.

- At 100 visits per day, the experiment will take over 220 days.
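The R call can be reproduced from the standard normal-approximation sample-size formula for a two-sided test of two proportions (the approximation power.prop.test uses by default, without continuity correction); a sketch:

```python
from math import sqrt
from statistics import NormalDist

def n_per_group(p1, p2, power=0.95, sig_level=0.05):
    """Per-group sample size for a two-sided two-proportion test,
    pooling variances under the null but not under the alternative."""
    z_alpha = NormalDist().inv_cdf(1 - sig_level / 2)
    z_beta = NormalDist().inv_cdf(power)
    pbar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * pbar * (1 - pbar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

n = n_per_group(0.04, 0.05)  # roughly 11166, matching the slide
```

The same function with sig_level=.01 reproduces the Bonferroni-corrected figure used in the 6-arm example.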


Two-armed experiment
Bandit shown 100 visits per "day".


Two-armed experiment
Savings vs. equal allocation in terms of time and conversions.

Source: https://support.google.com/analytics/answer/2844870?hl=en


Bandits’ advantage grows with experiment size

Now consider 6 arms (formerly the limit of GA Content Experiments).

- Compare the original arm to the "best" competitor.

- A Bonferroni correction says to divide the significance level by 5.

  > power.prop.test(p1 = .04, p2 = .05, power = .95,
                    sig.level = .01)
  n = 15307.8
  NOTE: n is number in *each* group

- In theory we only need this sample size in the largest arm, but we don't know ahead of time which arm that will be.

- The experiment needs 6 × 15,308 = 91,848 observations.

- At 100 per day, that is about 2.5 years.


6-arm experiment
Still 100 observations per day.


Huge savings vs. equal allocation
Partly due to ending early, and partly due to lower cost per day.


Daily cost diminishes as inferior arms are downweighted


When to stop experimenting

Outline

Motivation and background

Thompson sampling

When to stop experimenting

Incorporating good ideas from DOE


Value remaining
A better name than "per-play regret."

- Industrial experiments happen because people want to improve something.

- You can stop experimenting when you've squeezed all the value out of the experiment.

- Let v_{i*} = the value, in posterior draw i, of the best overall arm.

- Let v*_i = the value of the best arm in draw i.

- v*_i − v_{i*} is a draw of the value remaining in the experiment.
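For the binomial bandit this is a few lines of posterior simulation. A sketch, assuming a uniform prior and taking the "best overall arm" to be the arm most often optimal across the draws; the counts echo the example on the following slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_remaining_draws(successes, trials, n_draws=100_000):
    """Posterior draws of v*_i - v_{i*}: the best value in draw i
    minus the value, in that draw, of the current champion arm."""
    s = np.asarray(successes)
    n = np.asarray(trials)
    theta = rng.beta(s + 1, n - s + 1, size=(n_draws, len(s)))
    champion = np.bincount(np.argmax(theta, axis=1),
                           minlength=len(s)).argmax()
    return theta.max(axis=1) - theta[:, champion]

def potential_value_remaining(successes, trials, q=0.95):
    """A high quantile of the value-remaining distribution."""
    return float(np.quantile(value_remaining_draws(successes, trials), q))

pvr_small = potential_value_remaining([30, 5, 20], [100, 50, 80])
pvr_large = potential_value_remaining([120, 20, 80], [400, 200, 320])
```

With four times the data at the same empirical rates, the potential value remaining shrinks, which is the stopping signal.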


“Potential value remaining” shrinks as n grows.

Arm            1      2      3
successes     30      5     20
trials       100     50     80
w_a         0.76   0.00   0.24

Arm            1      2      3
successes    120     20     80
trials       400    200    320
w_a         0.93   0.00   0.07


Features of PVR

- Nice:
  - If a subset of arms tie, the experiment can still end. This can easily happen in multi-factor experiments.
  - If any single optimal-arm probability exceeds .95, then PVR = 0 (assuming the .95 quantile).
  - Units are meaningful to the experimenter, allowing the experiment to end based on "practical significance."

- PVR is not a hypothesis test:
  - PVR makes no attempt to control the "type I error rate."
  - If two (or more) arms tie, then no attempt is made to favor the original.
  - If switching costs are relevant, they need to be baked in.


Incorporating good ideas from DOE

Outline

Motivation and background

Thompson sampling

When to stop experimenting

Incorporating good ideas from DOE


Bringing in good ideas from DOE

- Experiments in the real world involve multiple factors.

- The "1-way" layout is a bad idea because the number of "arms" explodes with multiple experimental factors.

- Classic DOE uses fractional factorial experiments to control the combinatorial explosion.

- The fractional factorial idea remains really powerful in a sequential world.


The power of fractional factorial experiments
A single observation provides partial information on several configurations.

- 3 fonts × 5 themes × 5 colors = 75 configurations.

- A single run of configuration (Font=3, Color=4, Theme=2) gives partial information on all configurations with
  - Color=4, or
  - Theme=2, or
  - Font=3.

- Each observation teaches us something about 43 configurations.
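The count of 43 can be verified by enumerating the 75 configurations directly (the level coding here is only for illustration):

```python
from itertools import product

# All 3 fonts x 5 themes x 5 colors = 75 configurations.
configs = list(product(range(1, 4), range(1, 6), range(1, 6)))

# The single run described above: Font=3, Theme=2, Color=4.
font, theme, color = 3, 2, 4

# Configurations sharing at least one factor level with that run.
informed = [c for c in configs
            if c[0] == font or c[1] == theme or c[2] == color]
```

Equivalently, 75 minus the 2 × 4 × 4 = 32 configurations that share no level leaves 43.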


Fractional factorial experiments

The classic way of doing a fractional factorial experiment:

- Imagine that you will be analyzing the results of your experiment with a linear regression model.

- Choose a set of interactions that you're willing to live without.

- Allocate observations so that the remaining coefficients are "estimated as accurately as possible."
  - X-optimal design, with X = D, G, A, etc.
  - Heavily dependent on Gaussian linear models (so that the mean and variance are distinct, and the variance depends only on X).

- Usually restricted to a small subset of potential design points (rows in the design matrix).


Fractional factorial bandits

How to do it with bandits: Thompson sampling with logistic regression.

- This is really "full factorial" randomization but "fractional factorial" modeling.

- It is easy enough to fractionate the design matrix if there is reason to do so (e.g., to eliminate problematic rows).


How to take advantage of "free" replications
Regression model with dummy variables.

Int F2 F3 C2 C3 C4 C5 T2 T3 T4 T5
  1  0  0  0  0  0  0  0  0  0  0
  1  1  0  0  0  0  0  0  0  0  0
  1  0  1  0  0  0  0  0  0  0  0
  1  0  0  1  0  0  0  0  0  0  0
  1  0  0  0  0  0  0  1  0  0  0
  1  1  0  1  0  0  0  1  0  0  0
  1  0  0  1  0  0  0  1  0  0  0
  1  0  0  0  1  0  0  0  0  0  0
  1  0  1  0  1  0  0  0  0  0  0
  1  0  1  0  1  0  0  0  1  0  0
  1  0  0  0  1  0  0  0  1  0  0
  1  0  1  0  0  1  0  0  0  0  0
  1  0  1  0  0  0  0  0  0  1  0
  1  0  0  0  0  1  0  0  0  1  0
  1  0  0  0  0  1  0  0  0  0  0
...

(75 rows, 11 columns)
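A sketch of how the full matrix can be generated, with level 1 of each factor as the baseline so the columns match Int, F2-F3, C2-C5, T2-T5:

```python
from itertools import product

def dummy_row(font, color, theme):
    """One design-matrix row: intercept plus treatment-coded
    dummies, with level 1 of each factor as the baseline."""
    row = [1]                                        # Int
    row += [int(font == f) for f in (2, 3)]          # F2 F3
    row += [int(color == c) for c in (2, 3, 4, 5)]   # C2 C3 C4 C5
    row += [int(theme == t) for t in (2, 3, 4, 5)]   # T2 T3 T4 T5
    return row

X = [dummy_row(f, c, t)
     for f, c, t in product(range(1, 4), range(1, 6), range(1, 6))]
```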


Interactions

Interaction         # coef     Interpretation
Font:Theme               8     Helvetica bold has a greater impact with Theme 1 than Theme 2.
Font:Color               8     Orange increases CTR, but only with Palatino italic.
Color:Theme             16     ...
Font:Color:Theme        32     The incremental benefit of Orange with Palatino italic is muted in Theme 3, but magnified in Theme 4.
Main effects     2 + 4 + 4
Intercept                1
Total                   75

High-order interactions tend to be Small, Noisy, and Confusing.
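The coefficient counts in the table follow directly from the factor sizes; a saturated model has exactly one parameter per configuration:

```python
# 3 fonts, 5 colors, 5 themes; treatment coding drops one level each.
F, C, T = 3, 5, 5

intercept = 1
main_effects = (F - 1) + (C - 1) + (T - 1)       # 2 + 4 + 4
font_theme = (F - 1) * (T - 1)                   # 8
font_color = (F - 1) * (C - 1)                   # 8
color_theme = (C - 1) * (T - 1)                  # 16
font_color_theme = (F - 1) * (C - 1) * (T - 1)   # 32

total = (intercept + main_effects + font_theme + font_color
         + color_theme + font_color_theme)
```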


The FF bandit has many fewer parameters to learn, so it learns much faster and wastes fewer impressions.

[Figure: fractional factorial (probit) bandit vs. binomial bandit; 100 impressions per update cycle.]


Controlling for context
In a simulation with important interactions between experimental and contextual factors.

Interactions with the environment confound the binomial and FF bandits.


Controlling for context
In a simulation with an important environmental effect, but no interactions.

Lower variance for the environmental bandit.


Conclusion

- The economics of the modern service economy are different from those of agriculture and industry.

- The world improves rapidly by experimenting with everything all the time, but experiments have to be cost-effective.

- The cost of experiments can be reduced by
  - fractional factorial designs that test several factors at once, and
  - sequential methods that down-weight clearly underperforming arms.

- Thompson sampling is an effective, robust method of running a sequential experiment that lets you focus on getting the model right.


References


Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 235–256.

Bubeck, S. and Liu, C.-Y. (2013a). A note on the Bayesian regret of Thompson sampling with an arbitrary prior. arXiv preprint arXiv:1304.5758.

Bubeck, S. and Liu, C.-Y. (2013b). Prior-free and prior-dependent regret bounds for Thompson sampling. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, eds., Advances in Neural Information Processing Systems 26, 638–646. Curran Associates, Inc.

Chapelle, O. and Li, L. (2011). An empirical evaluation of Thompson sampling. In Neural Information Processing Systems (NIPS).

Kaufmann, E., Korda, N., and Munos, R. (2012). Thompson sampling: An asymptotically optimal finite-time analysis. In Algorithmic Learning Theory, 199–213. Springer.

Lai, T.-L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6, 4–22.

May, B. C., Korda, N., Lee, A., and Leslie, D. S. (2012). Optimistic Bayesian sampling in contextual-bandit problems. The Journal of Machine Learning Research 13, 2069–2106.


Russo, D. and Van Roy, B. (2014). Learning to optimize via posterior sampling. Mathematics of Operations Research.

Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry 26, 639–658 (with discussion).

Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294.
