Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016....

43
Approximate Optimality with Bounded Regret in Dynamic Matching Models Ana Buˇ si´ c Inria Paris CS Department of ´ Ecole normale sup´ erieure Joint work with Sean Meyn University of Florida and Ivo Adan, Varun Gupta, Jean Mairesse, and Gideon Weiss Real-Time Decision Making Simons Institute, Berkeley, Jun. 27 - Jul. 1, 2016 1 / 19

Transcript of Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016....

Page 1: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Approximate Optimality with Bounded Regret in

Dynamic Matching Models

Ana Busic

Inria ParisCS Department of Ecole normale superieure

Joint work withSean Meyn

University of Florida

andIvo Adan, Varun Gupta, Jean Mairesse, and Gideon Weiss

Real-Time Decision MakingSimons Institute, Berkeley, Jun. 27 - Jul. 1, 2016

1 / 19

Page 2: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Outline

1 Background

2 Bipartite matching model

3 OptimizationAverage Cost CriterionWorkloadWorkload RelaxationAsymptotic optimality

4 Final remarks

2 / 19

Page 3: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Background

Bipartite Matching

(D,S,E ) bipartite graph

D(s) = {d ∈ D : (d , s) ∈ E}S(d) = {s ∈ S : (d , s) ∈ E}

1 32

2’ 1’3’

D

S

xi number of elements of type i ∈ D ∪ S

Perfect matching: m ∈ NE such that:

xd =∑

s∈S(d)

mds , ∀d ∈ D, xs =∑

d∈D(s)

mds , ∀s ∈ S

Hall’s marriage theorem (1935)

∃ perfect matching if and only if:∑d∈U xc ≤

∑s∈S(U) xs , ∀U ⊂ D∑

s∈V xs ≤∑

d∈D(V ) xd , ∀V ⊂ S

3 / 19

Page 4: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Background

Matching in Health-care

Kidney paired donation

Who can join this program?For recipients: If you are eligible

for a kidney transplant and are

receiving care at a transplant

center in the United States, you

can join ... You must have a living

donor who is willing and medically

able to donate his or her kidney ...

For donors: You must also be

willing to take part ...

U N I T E D N E T W O R K F O R O R G A N S H A R I N G

TA L K I N G A B O U T T R A N S P L A N TAT I O N

Need for Dynamic Matching Models

4 / 19

Page 5: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Background

Matching in Health-care

Kidney paired donation

Who can join this program?For recipients: If you are eligible

for a kidney transplant and are

receiving care at a transplant

center in the United States, you

can join ... You must have a living

donor who is willing and medically

able to donate his or her kidney ...

For donors: You must also be

willing to take part ...

U N I T E D N E T W O R K F O R O R G A N S H A R I N G

TA L K I N G A B O U T T R A N S P L A N TAT I O N

Need for Dynamic Matching Models

4 / 19

Page 6: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Background

Dynamic Matching Model: FCFS

Another Example

Boston area public housing (some 25 years ago):

Model

Two independent infinite sequences of items.Demand / supply i.i.d.

FCFS matching policy admits product-form invariant distribution

Caldentey, Kaplan, Weiss 2009Adan, Weiss 2012Adan, B., Mairesse, Weiss 2015

5 / 19

Page 7: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Bipartite Matching Model

Multiclass queueing model – Supply/Demand play symmetric roles

Discrete time queueing model with twotypes of arrival: “supply” and “demand”.

Discrete time: at each time step there isone customer and one server that arriveinto the system, independently of the past.

Instantaneous matchings according to abipartite matching graph.Unmatched supply/demand stored in abuffer.

Given by: a matching graph, a joint probability measure µ for arrivals ofdemand/supply and a matching policy.

6 / 19

Page 8: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Bipartite Matching Model

Multiclass queueing model – Supply/Demand play symmetric roles

Discrete time queueing model with twotypes of arrival: “supply” and “demand”.

Discrete time: at each time step there isone customer and one server that arriveinto the system, independently of the past.

Instantaneous matchings according to abipartite matching graph.Unmatched supply/demand stored in abuffer.

Given by: a matching graph, a joint probability measure µ for arrivals ofdemand/supply and a matching policy.

6 / 19

Page 9: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Bipartite Matching Model

Multiclass queueing model – Supply/Demand play symmetric roles

Discrete time queueing model with twotypes of arrival: “supply” and “demand”.

Discrete time: at each time step there isone customer and one server that arriveinto the system, independently of the past.

Instantaneous matchings according to abipartite matching graph.Unmatched supply/demand stored in abuffer.

Given by: a matching graph, a joint probability measure µ for arrivals ofdemand/supply and a matching policy.

6 / 19

Page 10: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Bipartite Matching Model

Multiclass queueing model – Supply/Demand play symmetric roles

Discrete time queueing model with twotypes of arrival: “supply” and “demand”.

Discrete time: at each time step there isone customer and one server that arriveinto the system, independently of the past.

Instantaneous matchings according to abipartite matching graph.Unmatched supply/demand stored in abuffer.

Given by: a matching graph, a joint probability measure µ for arrivals ofdemand/supply and a matching policy.

6 / 19

Page 11: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Matching Model: Stability

For the dynamic model with i.i.d. arrivals, when is the Markovian model stable?(positive recurrent)

Necessary condition: generalization of Hall’s marriage theorem

Under this condition, certain policies are stabilizing, such as MaxWeight

Under this condition, other policies are not stabilizing

B., Gupta, Mairesse 2013.

7 / 19

Page 12: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Dynamic Matching Model: Approximate Optimality

Subject of this talk:

How to define ‘heavy traffic’? This requires a formulation of ‘network load’

What is the structure of an optimal policy for the model in heavy traffic?

How do we use this structure for policy design?

B., Meyn 2016

8 / 19

Page 13: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Necessary stability conditions

Assumption: matching graph (D,S,E ) is connected.

Necessary conditions: If the model is stable then the marginals of µ satisfy

NCond :

{µD(U) < µS(S(U)), ∀U ( DµS(V ) < µD(D(V )), ∀V ( S

Sufficient conditions: If NCond holds, then there exists a policy that is stabilizing

Prop. Given [(D,S,E ), µ], there exists an algorithm of time complexityO((|D|+ |S|)3) to decide if NCond is satisfied.

9 / 19

Page 14: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Necessary stability conditions

Assumption: matching graph (D,S,E ) is connected.

Necessary conditions: If the model is stable then the marginals of µ satisfy

NCond :

{µD(U) < µS(S(U)), ∀U ( DµS(V ) < µD(D(V )), ∀V ( S

Sufficient conditions: If NCond holds, then there exists a policy that is stabilizing

Prop. Given [(D,S,E ), µ], there exists an algorithm of time complexityO((|D|+ |S|)3) to decide if NCond is satisfied.

9 / 19

Page 15: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Bipartite matching model

Necessary stability conditions

Assumption: matching graph (D,S,E ) is connected.

Necessary conditions: If the model is stable then the marginals of µ satisfy

NCond :

{µD(U) < µS(S(U)), ∀U ( DµS(V ) < µD(D(V )), ∀V ( S

Sufficient conditions: If NCond holds, then there exists a policy that is stabilizing

Prop. Given [(D,S,E ), µ], there exists an algorithm of time complexityO((|D|+ |S|)3) to decide if NCond is satisfied.

9 / 19

Page 16: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Average Cost Criterion

Optimization

Cost function c on buffer levels.

Average-cost: η = lim supN→∞

1

N

N−1∑t=0

E[c(Q(t))

]

Queue dynamics: Q(t + 1) = Q(t)− U(t) + A(t) , t ≥ 0Input process U represents the sequence of matching activities. Input space:

U� ={∑e∈E

neue : ne ∈ Z+

}with ue = 1i + 1j for e = (i , j) ∈ E .

X (t) = Q(t) + A(t) the state process of the MDP model,

X (t + 1) = X (t)− U(t) + A(t + 1)

The state space X� = {x ∈ Z`+ : ξ0 · x = 0} with ξ0 = (1, . . . , 1,−1, . . . ,−1).

10 / 19

Page 17: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Average Cost Criterion

Optimization

Cost function c on buffer levels.

Average-cost: η = lim supN→∞

1

N

N−1∑t=0

E[c(Q(t))

]Queue dynamics: Q(t + 1) = Q(t)− U(t) + A(t) , t ≥ 0

Input process U represents the sequence of matching activities. Input space:

U� ={∑e∈E

neue : ne ∈ Z+

}with ue = 1i + 1j for e = (i , j) ∈ E .

X (t) = Q(t) + A(t) the state process of the MDP model,

X (t + 1) = X (t)− U(t) + A(t + 1)

The state space X� = {x ∈ Z`+ : ξ0 · x = 0} with ξ0 = (1, . . . , 1,−1, . . . ,−1).

10 / 19

Page 18: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Average Cost Criterion

Optimization

Cost function c on buffer levels.

Average-cost: η = lim supN→∞

1

N

N−1∑t=0

E[c(Q(t))

]Queue dynamics: Q(t + 1) = Q(t)− U(t) + A(t) , t ≥ 0Input process U represents the sequence of matching activities. Input space:

U� ={∑e∈E

neue : ne ∈ Z+

}with ue = 1i + 1j for e = (i , j) ∈ E .

X (t) = Q(t) + A(t) the state process of the MDP model,

X (t + 1) = X (t)− U(t) + A(t + 1)

The state space X� = {x ∈ Z`+ : ξ0 · x = 0} with ξ0 = (1, . . . , 1,−1, . . . ,−1).

10 / 19

Page 19: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Average Cost Criterion

Optimization

Cost function c on buffer levels.

Average-cost: η = lim supN→∞

1

N

N−1∑t=0

E[c(Q(t))

]Queue dynamics: Q(t + 1) = Q(t)− U(t) + A(t) , t ≥ 0Input process U represents the sequence of matching activities. Input space:

U� ={∑e∈E

neue : ne ∈ Z+

}with ue = 1i + 1j for e = (i , j) ∈ E .

X (t) = Q(t) + A(t) the state process of the MDP model,

X (t + 1) = X (t)− U(t) + A(t + 1)

The state space X� = {x ∈ Z`+ : ξ0 · x = 0} with ξ0 = (1, . . . , 1,−1, . . . ,−1).

10 / 19

Page 20: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload

For any D ⊂ D, corresponding workload vector ξD defined so that

ξD · x =∑i∈D

xDi −

∑j∈S(D)

xSj

Necessary and sufficient condition for a stabilizing policy:

NCond: δD :=−ξD · α > 0 for each Dα = E[A(t)] arrival rate vector.

Why is this workload? Consistent with routing/scheduling models:

Fluid model,d

dtx(t) = −u(t) + α

The minimal time to reach the origin from x(0) = x : T ∗(x) = maxDξD ·xδD

Heavy-traffic: δD ∼ 0 for one or more D

11 / 19

Page 21: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload

For any D ⊂ D, corresponding workload vector ξD defined so that

ξD · x =∑i∈D

xDi −

∑j∈S(D)

xSj

Necessary and sufficient condition for a stabilizing policy:

NCond: δD :=−ξD · α > 0 for each Dα = E[A(t)] arrival rate vector.

Why is this workload?

Consistent with routing/scheduling models:

Fluid model,d

dtx(t) = −u(t) + α

The minimal time to reach the origin from x(0) = x : T ∗(x) = maxDξD ·xδD

Heavy-traffic: δD ∼ 0 for one or more D

11 / 19

Page 22: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload

For any D ⊂ D, corresponding workload vector ξD defined so that

ξD · x =∑i∈D

xDi −

∑j∈S(D)

xSj

Necessary and sufficient condition for a stabilizing policy:

NCond: δD :=−ξD · α > 0 for each Dα = E[A(t)] arrival rate vector.

Why is this workload? Consistent with routing/scheduling models:

Fluid model,d

dtx(t) = −u(t) + α

The minimal time to reach the origin from x(0) = x : T ∗(x) = maxDξD ·xδD

Heavy-traffic: δD ∼ 0 for one or more D

11 / 19

Page 23: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload

For any D ⊂ D, corresponding workload vector ξD defined so that

ξD · x =∑i∈D

xDi −

∑j∈S(D)

xSj

Necessary and sufficient condition for a stabilizing policy:

NCond: δD :=−ξD · α > 0 for each Dα = E[A(t)] arrival rate vector.

Why is this workload? Consistent with routing/scheduling models:

Fluid model,d

dtx(t) = −u(t) + α

The minimal time to reach the origin from x(0) = x : T ∗(x) = maxDξD ·xδD

Heavy-traffic: δD ∼ 0 for one or more D

11 / 19

Page 24: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload Dynamics

Fix one workload vector ξD ; denote (ξ, δ) for (ξD , δD).

Workload W (t) = ξ · X (t)

can be positive or negative.Dynamics as in other queueing models,

E[W (t + 1)−W (t) | X (t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

12 / 19

Page 25: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload Dynamics

Fix one workload vector ξD ; denote (ξ, δ) for (ξD , δD).

Workload W (t) = ξ · X (t) can be positive or negative.

Dynamics as in other queueing models,

E[W (t + 1)−W (t) | X (t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

12 / 19

Page 26: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload Dynamics

Fix one workload vector ξD ; denote (ξ, δ) for (ξD , δD).

Workload W (t) = ξ · X (t) can be positive or negative.Dynamics as in other queueing models,

E[W (t + 1)−W (t) | X (t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

12 / 19

Page 27: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload Dynamics

Fix one workload vector ξD ; denote (ξ, δ) for (ξD , δD).

Workload W (t) = ξ · X (t) can be positive or negative.Dynamics as in other queueing models,

E[W (t + 1)−W (t) | X (t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

12 / 19

Page 28: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload

Workload Dynamics

Fix one workload vector ξD ; denote (ξ, δ) for (ξD , δD).

Workload W (t) = ξ · X (t) can be positive or negative.Dynamics as in other queueing models,

E[W (t + 1)−W (t) | X (t), U(t)] ≥ −δ

Achieved ⇐⇒ S(D) matches with D only.

Workload relaxation: take this as the model for control.

12 / 19

Page 29: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload Relaxation

Relaxations

A workload relaxation takes this as the model for control:One Dimensional Workload relaxation,

W (t + 1) = W (t)− δ + I (t)︸︷︷︸Idleness ≥ 0

+ ∆(t + 1)︸ ︷︷ ︸Zero mean

Effective cost c : R→ R+: Given a cost function c for Q,

c(w) = min{c(x) : ξ · x = w}

Piecewise linear if c is linear: c(w) = max(c+w ,−c−w).

Conclusions

Control of the relaxation = inventory model of Clark & Scarf (1960)

Hedging policy, with threshold τ∗: Idling is not permitted unless W (t) < −τ∗

Heavy-traffic: For average-cost optimal control, τ∗ ∼ 12

σ2∆

δlog(1 + c+/c−)

13 / 19

Page 30: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload Relaxation

Relaxations

A workload relaxation takes this as the model for control:One Dimensional Workload relaxation,

W (t + 1) = W (t)− δ + I (t)︸︷︷︸Idleness ≥ 0

+ ∆(t + 1)︸ ︷︷ ︸Zero mean

Effective cost c : R→ R+: Given a cost function c for Q,

c(w) = min{c(x) : ξ · x = w}

Piecewise linear if c is linear: c(w) = max(c+w ,−c−w).

Conclusions

Control of the relaxation = inventory model of Clark & Scarf (1960)

Hedging policy, with threshold τ∗: Idling is not permitted unless W (t) < −τ∗

Heavy-traffic: For average-cost optimal control, τ∗ ∼ 12

σ2∆

δlog(1 + c+/c−)

13 / 19

Page 31: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload Relaxation

Relaxations

A workload relaxation takes this as the model for control:One Dimensional Workload relaxation,

W (t + 1) = W (t)− δ + I (t)︸︷︷︸Idleness ≥ 0

+ ∆(t + 1)︸ ︷︷ ︸Zero mean

Effective cost c : R→ R+: Given a cost function c for Q,

c(w) = min{c(x) : ξ · x = w}

Piecewise linear if c is linear: c(w) = max(c+w ,−c−w).

Conclusions

Control of the relaxation = inventory model of Clark & Scarf (1960)

Hedging policy, with threshold τ∗: Idling is not permitted unless W (t) < −τ∗

Heavy-traffic: For average-cost optimal control, τ∗ ∼ 12

σ2∆

δlog(1 + c+/c−)

13 / 19

Page 32: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload Relaxation

Relaxations

A workload relaxation takes this as the model for control:One Dimensional Workload relaxation,

W (t + 1) = W (t)− δ + I (t)︸︷︷︸Idleness ≥ 0

+ ∆(t + 1)︸ ︷︷ ︸Zero mean

Effective cost c : R→ R+: Given a cost function c for Q,

c(w) = min{c(x) : ξ · x = w}

Piecewise linear if c is linear: c(w) = max(c+w ,−c−w).

Conclusions

Control of the relaxation = inventory model of Clark & Scarf (1960)

Hedging policy, with threshold τ∗: Idling is not permitted unless W (t) < −τ∗

Heavy-traffic: For average-cost optimal control, τ∗ ∼ 12

σ2∆

δlog(1 + c+/c−)

13 / 19

Page 33: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Workload Relaxation

Relaxations

A workload relaxation takes this as the model for control:One Dimensional Workload relaxation,

W (t + 1) = W (t)− δ + I (t)︸︷︷︸Idleness ≥ 0

+ ∆(t + 1)︸ ︷︷ ︸Zero mean

Effective cost c : R→ R+: Given a cost function c for Q,

c(w) = min{c(x) : ξ · x = w}

Piecewise linear if c is linear: c(w) = max(c+w ,−c−w).

Conclusions

Control of the relaxation = inventory model of Clark & Scarf (1960)

Hedging policy, with threshold τ∗: Idling is not permitted unless W (t) < −τ∗

Heavy-traffic: For average-cost optimal control, τ∗ ∼ 12

σ2∆

δlog(1 + c+/c−)

13 / 19

Page 34: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Asymptotic optimality

Asymptotic optimality

Family of arrival processes {Aδ(t)}, parameterized by δ ∈ [0, δ•], δ• ∈ (0, 1).Additional assumptions:

(A1) For one set D ( D we have ξD · αδ = −δ, where αδ denotes the mean ofAδ(t).Moreover, there is a fixed constant δ > 0 such that ξD

′ · αδ ≤ −δ for anyD ′ ( D, D ′ 6= D, and δ ∈ [0, δ•].

(A2) The distributions are continuous at δ = 0, with linear rate: For someconstant b,

E[‖Aδ(t)− A0(t)‖] ≤ bδ.

(A3) Graph structure for arrivals and for feasible matches independent of δ ≥ 0=⇒ The matching graph is connected even for δ = 0.Moreover, there exists i0 ∈ S(D), j0 ∈ Dc , and pI > 0 such that

P{Aδi0 (t) ≥ 1 and Aδj0 (t) ≥ 1} ≥ pI , 0 ≤ δ ≤ δ•.

14 / 19

Page 35: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Asymptotic optimality

Asymptotic optimality

h-MWT (h-MaxWeight with threshold) policy:For a differentiable function h : R` → R+, and a threshold τ ≥ 0,

φ(x) = argmax u · ∇h (x)

subject to u feasible

and I (t) ≤ max(−W (t)− τ, 0), when X (t) = x and U(t) = u.

Thm (Asymptotic Optimality With Bounded Regret) [B., Meyn ’16]

There is an h-MWT policy with finite average cost η, satisfying

η∗ ≤ η∗ ≤ η ≤ η∗ + O(1)

where η∗ is the optimal average cost for the MDP model, η∗ is the optimalaverage cost for the workload relaxation, and the term O(1) does not dependupon δ.

15 / 19

Page 36: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Asymptotic optimality

Asymptotic optimality

h-MWT (h-MaxWeight with threshold) policy:For a differentiable function h : R` → R+, and a threshold τ ≥ 0,

φ(x) = argmax u · ∇h (x)

subject to u feasible

and I (t) ≤ max(−W (t)− τ, 0), when X (t) = x and U(t) = u.

Thm (Asymptotic Optimality With Bounded Regret) [B., Meyn ’16]

There is an h-MWT policy with finite average cost η, satisfying

η∗ ≤ η∗ ≤ η ≤ η∗ + O(1)

where η∗ is the optimal average cost for the MDP model, η∗ is the optimalaverage cost for the workload relaxation, and the term O(1) does not dependupon δ.

15 / 19

Page 37: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Asymptotic optimality

Asymptotic optimality

The average cost for the relaxation satisfies the uniform bound,

η∗ = η∗∗ + O(1)

where η∗∗ is the optimal cost for the diffusion approx. for the relaxation:

η∗∗ = τ∗ c− = 12

σ2N

δc− log

(1 +

c+

c−

)

h(x) = h(ξ · x) + hc(x)

- hc is introduced to penalize deviations between c(x) and c(ξ · x).- The first term h is a function of workload. For w ≥ −τ∗, it solves the

second-order differential equation,

−δh′ (w) + 12σ2

∆h′′ (w) = −c(w) + η∗∗, (1)

There is a solution that is convex and increasing on [−τ∗,∞), withh′(−τ∗) = h′′(−τ∗) = 0. Then extended to get a convex C 2 function on R.

16 / 19

Page 38: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Optimization Asymptotic optimality

Asymptotic optimality

The average cost for the relaxation satisfies the uniform bound,

η∗ = η∗∗ + O(1)

where η∗∗ is the optimal cost for the diffusion approx. for the relaxation:

η∗∗ = τ∗ c− = 12

σ2N

δc− log

(1 +

c+

c−

)h(x) = h(ξ · x) + hc(x)

- hc is introduced to penalize deviations between c(x) and c(ξ · x).- The first term h is a function of workload. For w ≥ −τ∗, it solves the

second-order differential equation,

−δh′ (w) + 12σ2

∆h′′ (w) = −c(w) + η∗∗, (1)

There is a solution that is convex and increasing on [−τ∗,∞), withh′(−τ∗) = h′′(−τ∗) = 0. Then extended to get a convex C 2 function on R.

16 / 19

Page 39: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Examples

Example

Example

Cost: c(x) = xD1 + 2xD

2 + 3xD3 + 3xS

1 + 2xS2 + xS

3

=⇒ Effective Cost: c(w) = 4|w |

e1 e2 e3 e4 e5

xD1 xD

2 xD3

xS1xS

2xS3

Dem

and

Supp

ly

Q QD1

D2 QD

3

Q QS1

S2 QS

3

0 5 10 15 20 25 30 r

r∗ = 14.9

62

64

66

68

70

72

74

76

Average Cost Estimated in Simulation:Average Cost Comparisons:

50

100

150

Priority

MaxWeight

Threshold (15)

1 2 3 4 5x 106

T

W (t) = QD3 (t)− QS

1 (t)

Matching of Supply 1 and Demand 2allowed only if W (t) < −τ∗

Workload Relaxation:

QS1 (t) = QS

2 (t) = 0 if W (t) > 0

QD2 (t) = QD

3 (t) = 0 if W (t) < 0

Simulation with τ = 14.9

17 / 19

Page 40: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Examples

Example

Example

Cost: c(x) = xD1 + 2xD

2 + 3xD3 + 3xS

1 + 2xS2 + xS

3

=⇒ Effective Cost: c(w) = 4|w |

e1 e2 e3 e4 e5

xD1 xD

2 xD3

xS1xS

2xS3

Dem

and

Supp

ly

Q QD1

D2 QD

3

Q QS1

S2 QS

3

0 5 10 15 20 25 30 r

r∗ = 14.9

62

64

66

68

70

72

74

76

Average Cost Estimated in Simulation:

Average Cost Comparisons:

50

100

150

Priority

MaxWeight

Threshold (15)

1 2 3 4 5x 106

T

W (t) = QD3 (t)− QS

1 (t)

Matching of Supply 1 and Demand 2allowed only if W (t) < −τ∗

Workload Relaxation:

QS1 (t) = QS

2 (t) = 0 if W (t) > 0

QD2 (t) = QD

3 (t) = 0 if W (t) < 0

Simulation with τ = 14.9

17 / 19

Page 41: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Examples

Example

Example

Cost: c(x) = xD1 + 2xD

2 + 3xD3 + 3xS

1 + 2xS2 + xS

3

=⇒ Effective Cost: c(w) = 4|w |

e1 e2 e3 e4 e5

xD1 xD

2 xD3

xS1xS

2xS3

Dem

and

Supp

ly

Q QD1

D2 QD

3

Q QS1

S2 QS

30 5 10 15 20 25 30 r

r∗ = 14.9

62

64

66

68

70

72

74

76

Average Cost Estimated in Simulation:

Average Cost Comparisons:

50

100

150

Priority

MaxWeight

Threshold (15)

1 2 3 4 5x 106

T

W (t) = QD3 (t)− QS

1 (t)

Matching of Supply 1 and Demand 2allowed only if W (t) < −τ∗

Workload Relaxation:

QS1 (t) = QS

2 (t) = 0 if W (t) > 0

QD2 (t) = QD

3 (t) = 0 if W (t) < 0

Simulation with τ = 14.9

17 / 19

Page 42: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

Final remarks

Final remarks

Performance bounds?

Approximate optimal control for relaxations in higher dimensions?

More general arrival assumptions. Admission control? Abandonnements?

Optimization for non-bipartite matching?

Applications?

18 / 19

Page 43: Approximate Optimality with Bounded Regret in Dynamic …busic/slides/DynamicMatching... · 2016. 11. 2. · Sean Meyn University of Florida and Ivo Adan,Varun Gupta,Jean Mairesse,

References

References

Dynamic bipartite matching models

Caldentey, Kaplan, Weiss, FCFS infinite bipartite matching of servers andcustomers. Adv. Appl. Probab. 2009.

Adan & Weiss, Exact FCFS matching rates for two infinite multi-type sequences.Operations Research, 2012.

Busic, Gupta, Mairesse, Stability of the bipartite matching model. Adv. Appl.Probab. 2013.

Mairesse, Moyal, Stability of the stochastic matching model. ArXiv. 2014.

Adan, Busic, Mairesse, Weiss, Reversibility and further properties of FCFS infinitebipartite matching. ArXiv. 2015.

Busic, Meyn, Approximate optimality with bounded regret in dynamic matchingmodels. ArXiv. 2016.

Workload relaxations

Meyn, Control Techniques for Complex Networks. Cambridge Uni. Press, 2007.

Meyn, Stability and asymptotic optimality of generalized MaxWeight policies.SIAM J. Control Optim., 2009.

Gurvich, Ward, On the dynamic control of matching queues, Stoch. Systems, 2014.

19 / 19