Bandit Algorithms
ELSYS 2014/2015
Vasil Kostov, Georgi Yosifov

Transcript of the slide deck.

Page 1: Georgi Yosifov, Vasil Kostov, ELSYS 2014/2015 (source: lubo.elsys-bg.org/wp-content/uploads/2015/01/Bandit-Algorithms.pdf)

Bandit Algorithms
ELSYS 2014/2015

Vasil Kostov
Georgi Yosifov

Page 2:
Page 3:

Deborah Knull’s problem

Page 4:

The Scientist and the Businessman

Page 5:

Cynthia the Scientist

● Exploration of all the possibilities
● Controlled experiments
● A/B testing
● A/B/C/D/E/F testing

Page 6:
Page 7:

Bob the Businessman

● “Cynthia’s a scientist. Of course she thinks that you should run lots of experiments.
● Her experiments have costs.
● You should try to maximize your site’s profits.
● Unless you really believe a change has the potential to be valuable, don’t try it at all.
● Going with your traditional logo is the best strategy.”

Page 8:

Oscar the Operations Researcher

“I entirely agree that you have to find a way to balance Cynthia’s interest in experimentation and Bob’s interest in profits. My colleagues and I call that the Explore-Exploit trade-off.”

Page 9:

Problems

● There is no universal solution to balancing your two goals: to learn which ideas are good or bad, you have to explore, at the risk of losing money and bringing in fewer profits.
● Seasonal effects (Halloween, Christmas) can distort the results of an experiment.
● How long should the test run?

Page 10:

What is success?

● Traffic (Did a change increase the amount of traffic to a site’s landing page?)
● Conversions (Did a change increase the number of one-time visitors who were successfully converted into repeat customers?)
● Sales (Did a change increase the number of purchases being made on a site by either new or existing customers?)
● CTRs (Did a change increase the number of times that visitors clicked on an ad?)

Page 11:

The epsilon-Greedy Algorithm

Page 12:

Describing Our Logo-Choosing Problem Abstractly

Page 13:

What is an arm?

For historical reasons, these options (the different logos) are typically referred to as arms, so we’ll talk about Arm 1 and Arm 2 and Arm N rather than Option 1, Option 2 or Option N.

Page 14:

What’s a Reward?

● A measure of success

Page 15:

What’s a Bandit Problem?

● We’re facing a complicated slot machine, called a bandit, that has a set of N arms that we can pull on.

● When pulled, any given arm will output a reward. But these rewards aren’t reliable, which is why we’re gambling: Arm 1 might give us 1 unit of reward only 1% of the time, while Arm 2 might give us 1 unit of reward only 3% of the time. Any specific pull of any specific arm is risky.

● Not only is each pull of an arm risky, we also don’t start off knowing what the reward rates are for any of the arms. We have to figure this out experimentally by actually pulling on the unknown arms.

Page 16:

● We only find out about the reward that was given out by the arm we actually pulled. Whichever arm we pull, we miss out on information about the other arms that we didn’t pull. Just like in real life, you only learn about the path you took and not the paths you could have taken.

● Every time we experiment with an arm that isn’t the best arm, we lose reward because we could, at least in principle, have pulled on a better arm.
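The epsilon-Greedy algorithm named earlier handles this trade-off in the simplest possible way: with probability epsilon it explores (pulls a random arm), and the rest of the time it exploits (pulls the arm with the best observed average reward). A minimal sketch of that idea, assuming Bernoulli arms; the reward rates and epsilon value here are made up for illustration:

```python
import random

def epsilon_greedy(reward_rates, epsilon=0.1, n_pulls=10000):
    """Simulate epsilon-Greedy on Bernoulli arms with the given reward rates."""
    n_arms = len(reward_rates)
    counts = [0] * n_arms     # how many times each arm has been pulled
    values = [0.0] * n_arms   # running average reward per arm
    total_reward = 0
    for _ in range(n_pulls):
        if random.random() < epsilon:
            arm = random.randrange(n_arms)    # explore: pick a random arm
        else:
            arm = values.index(max(values))   # exploit: best arm seen so far
        reward = 1 if random.random() < reward_rates[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total_reward += reward
    return counts, values, total_reward

random.seed(0)
counts, values, total = epsilon_greedy([0.01, 0.03, 0.10])
# With a small epsilon, most pulls should concentrate on the best (10%) arm.
```

Note the cost the slides describe: every exploratory pull of a worse arm gives up reward we could have earned from the best arm.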

Page 17:
Page 18:

Monte Carlo method

● Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results.

● Typically one runs simulations many times over in order to obtain the distribution of an unknown probabilistic entity.
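Applied to our bandit, the Monte Carlo idea means estimating an arm's unknown reward rate by simply pulling it many times and averaging the outcomes. A small sketch under the same Bernoulli-arm assumption as before; the 3% rate is an illustrative value, not from the slides:

```python
import random

def pull(rate):
    """One pull of a Bernoulli arm: reward 1 with probability `rate`, else 0."""
    return 1 if random.random() < rate else 0

def monte_carlo_estimate(rate, n_samples=100000):
    """Estimate an arm's unknown reward rate by repeated random sampling."""
    return sum(pull(rate) for _ in range(n_samples)) / n_samples

random.seed(1)
estimate = monte_carlo_estimate(0.03)
# By the law of large numbers, the estimate converges to the true rate
# as n_samples grows.
```

The BanditsBook repository in the references uses the same approach at a larger scale: it runs each algorithm through many simulated bandit problems and averages the results to compare performance.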

Page 19:
Page 20:

References

● Bandit Algorithms for Website Optimization, John Myles White, O'Reilly Media, December 2012
● https://github.com/johnmyleswhite/BanditsBook
● http://en.wikipedia.org/wiki/Monte_Carlo_method