Low-regret Online Decision-making Via Bellman Inequalities

Joint work with Sid Banerjee and Itai Gurvich

Relaxations and Regret Bounds for Online Problems

● Must make decisions upon request
● Uncertain process
● Statistical information available
● Goal: develop practical near-optimal algorithms

Our Results

Meta-Theorem: For different resource allocation problems, we give a practical policy, based on re-solving an optimization program, with bounded regret.

The bound is independent of the horizon and capacities.

● Applications: Dynamic posted pricing, Online Knapsack, Network Revenue Management (Online Packing), Online Matching, Online Probing, Contextual Bandits

● Challenges: define a benchmark and use it to design an algorithm

Why Constant Regret?

● Algorithms are different
● Not worst case, but parametric

Case Study: edge weighted online matching

Problem 1: Online Knapsack

● Finite set of types:

● Known reward distribution and weight:

● Initial budget and horizon:

● Arrival process:

● Objective: collect as much reward as possible
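To make the setup concrete, here is a minimal Python sketch of an online knapsack instance together with the hindsight optimum that a Prophet, who sees all arrivals in advance, could collect. The function names are hypothetical, and the sketch assumes deterministic per-type rewards, integer weights, and i.i.d. arrivals; the slide only states that a reward distribution, a weight, a budget, a horizon, and an arrival process are given.

```python
import random


def sample_arrivals(horizon, probs, rng):
    """Draw one arrival type per period, i.i.d. with probabilities `probs` (assumption)."""
    return rng.choices(range(len(probs)), weights=probs, k=horizon)


def prophet_value(arrivals, rewards, weights, budget):
    """Hindsight optimum: a 0/1 knapsack DP over the realized arrivals.

    Assumes integer weights and a deterministic reward per type.
    """
    best = [0.0] * (budget + 1)  # best[b] = max reward using at most b units of budget
    for j in arrivals:
        w, r = weights[j], rewards[j]
        # Iterate budgets downward so each arrival is used at most once.
        for b in range(budget, w - 1, -1):
            best[b] = max(best[b], best[b - w] + r)
    return best[budget]


if __name__ == "__main__":
    rng = random.Random(0)
    arrivals = sample_arrivals(10, [0.5, 0.5], rng)
    print(prophet_value(arrivals, rewards=[3.0, 2.0], weights=[2, 1], budget=4))
```

An online algorithm sees the same arrivals one at a time; its shortfall relative to `prophet_value` on the same sample path is the quantity the talk aims to keep bounded in expectation.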

Types of Benchmark

[Figure: reward of the Algorithm, the Optimal (DP) policy, and the Prophet, with the Regret marked; annotation: number of arrivals of each type]
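The relationship between the three quantities can be written compactly. The notation below is an assumption introduced for illustration ($V^{\mathrm{alg}}$, $V^{\mathrm{DP}}$, and $V^{\mathrm{prophet}}$ for the total reward of the algorithm, the optimal online policy, and the Prophet), and it assumes regret is measured against the Prophet:

```latex
\[
\mathbb{E}\big[V^{\mathrm{alg}}\big]
\;\le\; \mathbb{E}\big[V^{\mathrm{DP}}\big]
\;\le\; \mathbb{E}\big[V^{\mathrm{prophet}}\big],
\qquad
\mathrm{Regret} \;=\; \mathbb{E}\big[V^{\mathrm{prophet}}\big] - \mathbb{E}\big[V^{\mathrm{alg}}\big].
\]
```

Because the Prophet does at least as well as any online policy, a bound on this regret also bounds the gap to the optimal (DP) policy.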

Online Packing

Theorem: A natural policy achieves constant expected regret for online packing problems. The regret is independent of the horizon and capacities.

In particular, the regret depends only on

Generalizes to multiple resources and other arrival processes.

Similar results in a recent work for restricted cases [Bumpensanti & Wang]
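As an illustration of what a re-solving policy can look like for a single resource, the sketch below re-solves a fluid (expected-value) LP at every arrival and accepts the arriving type when the LP accepts at least half of that type's expected remaining arrivals. This is a generic heuristic written under stated assumptions, not necessarily the policy analyzed in the talk: the function names, the 1/2 threshold, deterministic rewards, positive weights, and i.i.d. arrivals with known probabilities are all assumptions.

```python
def fluid_accept_fractions(rewards, weights, probs, budget, periods_left):
    """Greedy solution of the one-constraint fluid LP (a fractional knapsack).

    Returns, for each type, the fraction of its expected remaining
    arrivals that the LP accepts. Assumes strictly positive weights.
    """
    n = len(rewards)
    frac = [0.0] * n
    remaining = float(budget)
    # Fill the budget in decreasing order of reward per unit of weight.
    order = sorted(range(n), key=lambda j: rewards[j] / weights[j], reverse=True)
    for j in order:
        demand = probs[j] * periods_left * weights[j]  # expected budget use of type j
        if demand <= 0:
            continue
        frac[j] = max(0.0, min(1.0, remaining / demand))
        remaining -= frac[j] * demand
    return frac


def resolve_policy(arrivals, rewards, weights, probs, budget):
    """Re-solve the fluid LP at each arrival; accept if the LP takes >= 1/2 of the type."""
    total = 0.0
    horizon = len(arrivals)
    for t, j in enumerate(arrivals):
        if weights[j] > budget:
            continue  # the arrival does not fit in the remaining budget
        frac = fluid_accept_fractions(rewards, weights, probs, budget, horizon - t)
        if frac[j] >= 0.5:  # threshold is an assumption of this sketch
            budget -= weights[j]
            total += rewards[j]
    return total


if __name__ == "__main__":
    print(resolve_policy([1, 0, 1, 0], [2.0, 1.0], [1, 1], [0.5, 0.5], 2))
```

Re-solving with the current budget and remaining horizon lets the policy correct for past randomness, which is the intuition behind regret that does not grow with the horizon.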

Overview of the General Framework

Goal: Handle more general problems

Intuition

Bellman Loss (computational)

Information Loss (estimation)

Given the additional information, Prophet wants to solve a DP

Knapsack RABBI

Problem 2: Dynamic Posted Pricing

● Stream of T customers with i.i.d. rewards

● Each customer wants one of our identical items

● We can post any fare from the set

● Objective: collect as much reward as possible

Prophet solves:

?

Pricing RABBI

Fraction of customers that would buy when the fare is
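The purchase fractions feed naturally into a fluid pricing program. The sketch below is a standard fluid relaxation written under assumptions, not the RABBI policy itself: it chooses how many periods to post each fare so as to maximize expected revenue, subject to expected sales not exceeding the inventory and the total number of periods not exceeding the horizon. The names `fluid_pricing`, `buy_frac`, `inventory`, and `periods` are hypothetical. With only two constraints, an optimal vertex uses at most two fares, so the sketch enumerates single fares and pairs instead of calling an LP solver.

```python
def fluid_pricing(fares, buy_frac, inventory, periods):
    """Solve: max sum_f fare_f * q_f * x_f
       subject to sum_f q_f * x_f <= inventory, sum_f x_f <= periods, x >= 0,
    where q_f = buy_frac[f] is the fraction of customers that buy at fare f.

    Returns (optimal value, {fare: number of periods to post it}).
    """
    best_value, best_plan = 0.0, {}

    def consider(value, plan):
        nonlocal best_value, best_plan
        if value > best_value:
            best_value, best_plan = value, plan

    n = len(fares)
    # Vertices with one fare: post it until periods or inventory run out.
    for i in range(n):
        q = buy_frac[i]
        x = periods if q * periods <= inventory else inventory / q
        consider(fares[i] * q * x, {fares[i]: x})
    # Vertices with two fares: both constraints hold with equality.
    for i in range(n):
        for k in range(i + 1, n):
            qi, qk = buy_frac[i], buy_frac[k]
            if qi == qk:
                continue
            xi = (inventory - qk * periods) / (qi - qk)
            xk = periods - xi
            if xi < 0 or xk < 0:
                continue
            value = fares[i] * qi * xi + fares[k] * qk * xk
            consider(value, {fares[i]: xi, fares[k]: xk})
    return best_value, best_plan


if __name__ == "__main__":
    print(fluid_pricing([1.0, 2.0], [1.0, 0.4], inventory=5, periods=8))
```

A re-solving policy would call such a routine at each period with the current inventory and remaining horizon, then post a fare suggested by the resulting plan.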

Dynamic Posted Pricing

Theorem: A natural policy achieves constant expected regret for Dynamic Posted Pricing. The regret is independent of the horizon and capacities.

In particular, the regret depends only on .

Fraction that buys at

The Algorithm is Practical

Bound via Bellman Inequalities

Definition: Given filtration , is a relaxed value w.r.t. if

1) Initial Ordering:

2) Monotonicity:

Conclusions and Extensions

● Framework based on constructing tractable benchmarks
● Bellman Loss: computational
● Information Loss: estimation
● Applications: NRM, Probing, Contextual Bandits, AdWords, Dynamic Pricing, and other Resource Allocation Problems

Related Work

● Prophet: worst-case distribution (competitive ratio) for the maximum of i.i.d. random variables [Hill & Kertz], best possible [Correa et al.], matroid constraints [Kleinberg & Weinberg]

● Constant regret in NRM: [Arlotto & Gurvich]

[Talluri & Van Ryzin], [Reiman & Wang], [Jasin & Kumar], [Bumpensanti & Wang]

● Online matching, resource allocation, AdWords [Manshadi et al.], [Legrain & Jaillet]

● Probing: competitive ratio (linear regret) [Gupta & Nagarajan], [Singla], [Chugg & Maehara]

● Information Relaxation [Balseiro & Brown], [Brown, Smith, & Sun]

● Approximate Dynamic Programming [Powell]