
Stochastic Dynamic Teams and Games with Asymmetric Information

TAMER BAŞAR
ECE, CAS, CSL, ITI and MechSE, University of Illinois at U-C
[email protected]

IMA-Distributed Control and DM over Networks
Sept 28 – Oct 2, 2015, Univ Minnesota, Minneapolis
September 29, 2015

OUTLINE
•  Introduction to decision making in stochastic environments
•  Dynamic teams, games, role of information, asymmetry, non-classical IS
•  Some caveats and counter-examples on existence and characterization of solutions
•  Existence and characterization of team-optimal policies with non-classical IS
•  Existence and characterization of Nash equilibrium policies under asymmetry**
•  Conclusions

General Framework: INGREDIENTS
•  Uncertainty (uncertain environment)
•  Decision makers (DMs) (players, agents)
•  Perceptions of DMs on uncertainty
•  Objective(s)
•  Description of interactions (underlying network)
•  Common information / Private information


General Framework
•  Uncertainty (uncertain environment); decision makers (DMs) (players, agents); perceptions of DMs on uncertainty; objective(s); description of interactions (underlying network); common information
•  DMs pick policies (decision laws, strategies) leading to actions that evolve over time
•  Policies are constructed based on information received (active as well as passive) and guided by individual utility or cost functions over the DM horizon
   –  Single DM => stochastic control
   –  Single objective => stochastic teams
   –  Otherwise ZS or NZS games, with NE

Coupling of Information and Actions
•  Is the quality of active and relevant information received by a player affected by the actions of other players?
   –  If no => the underlying game (team) is generally “simple”
   –  If yes => it is generally “difficult”

Asymmetric Information

A stochastic decision problem is one with asymmetric information if different decision units (players, agents, decision makers) acting at the same time instant do not have access to the same information (of relevance).

Non-classical Information

A stochastic decision problem is one with non-classical information if a decision unit, B, that “follows” another one, A, does not have all the information acquired and used by A.

[Diagram: agents A, B, C with respective observations yA, yB, yC of the state x; each agent i incurs cost Qi(x, uA, uB, uC), i = A, B, C.]

[Diagram: the same three-agent cost structure Qi(x, uA, uB, uC), versus an alternative information structure over A, B, C.]



Questions / Challenges
•  Existence of team-optimal (T-O) solutions
•  Existence of Nash equilibrium (NE) policies (in dynamic stochastic games)
•  Characterization of T-O solutions
•  Characterization of NE policies

Challenges:
•  Possibility of non-existence with asymmetry
•  Triple role (caution, probing, signaling)
•  Signaling (deception/threat) through action
•  Tension between signaling and optimization

NEXT: An example exhibiting the subtleties, such as the significance of the failure of continuity of the information content of channels in the limit of a sequence of discretized problems.

Non-classical (limited memory)

[Block diagram: agent A applies γ0 to x, producing u0; agent B observes y = u0 + w and applies γ1, producing u1]

x, w ~ independent random variables; u0, u1 are real-valued actions
J(γ0, γ1) = E[Q(x, u0, u1) | γ0, γ1]
J* = min_γ0 min_γ1 J(γ0, γ1)

Non-classical (2-point pmf’s)

x ~ ±1 with equal probability; likewise w
J(γ0, γ1) = E[k (u0 − x)² + (u0 − u1)² | γ0, γ1], k > 0
J* = inf_γ0 inf_γ1 J(γ0, γ1)


Infimizing policies

J* = inf inf J(γ0, γ1) = 0 – the lowest possible value.
Consider the policies:
γ0(x) = x + ε sgn(x)
γ1(y) = 1 + ε if y = 2 + ε or y = ε;  γ1(y) = −1 − ε if y = −2 − ε or y = −ε
⇒ J = k ε² → 0 as ε → 0

But, as ε → 0, the limiting policies do not lead to J = 0! Loss of crucial information in the limit!
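A quick numerical check of this construction (a minimal sketch; the enumeration below is my own, not from the slides) confirms that the ε-policies achieve J = k ε², while the ε → 0 limit of these policies leaves B unable to tell x = +1 from x = −1 when y = 0:

```python
# Enumerate the four equally likely (x, w) pairs of the 2-point example,
# with cost Q(x, u0, u1) = k (u0 - x)^2 + (u0 - u1)^2 and y = u0 + w.
import itertools

k, eps = 1.0, 0.1

def gamma0(x):
    return x + eps * (1 if x > 0 else -1)        # gamma0(x) = x + eps*sgn(x)

def gamma1(y):
    # equals the table policy on the realizable values {2+eps, eps, -eps, -2-eps}
    return (1 + eps) if y > 0 else -(1 + eps)

def gamma1_limit(y):
    # pointwise limit as eps -> 0; y = 0 (which now occurs with prob 1/2) is ambiguous
    return 1.0 if y >= 0 else -1.0               # an arbitrary tie-break at y = 0

def avg_cost(g0, g1):
    total = 0.0
    for x, w in itertools.product([-1, 1], [-1, 1]):   # each pair has probability 1/4
        u0 = g0(x)
        u1 = g1(u0 + w)
        total += k * (u0 - x) ** 2 + (u0 - u1) ** 2
    return total / 4

print("J with eps-policies  :", avg_cost(gamma0, gamma1))                    # = k * eps**2
print("J with limit policies:", avg_cost(lambda x: float(x), gamma1_limit))  # stays away from 0
```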


Another Example (Bismut, TAC’72)

J(γ0, γ1) = E[(u0 − x)² + α (u0 + u1)² + (u1)² | γ0, γ1]
α takes values 1 and 2 with equal probability; x ~ uniform on [−√3, √3]
u0 = γ0(α, x) and u1 = γ1(u0)

If u1 also knew α, the optimal cost would be E[(1/(2+α)) x²] = 7/24, with u0* = (1/(2+α)) x.

With the original IS, agent 0 can truncate u0* to some n decimal places and use the (n+1)’th decimal to transmit the true value of α, coming to within ε of 7/24. Again, the limit is not well defined.
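The digit-embedding idea can be made concrete; the sketch below (my own construction and choice of n, with hypothetical helper names) shows that α can be written into, and read back from, the (n+1)’th decimal while u0 is perturbed from u0* by only about 10^−n. It illustrates the encoding mechanism only, not the resulting team cost.

```python
import random

def encode(u0_star, alpha, n=6):
    # keep n decimal places of u0*, then place alpha (1 or 2) in the (n+1)'th decimal
    return round(u0_star, n) + alpha * 10 ** -(n + 1)

def decode_alpha(u0, n=6):
    # read alpha back off the (n+1)'th decimal of the received action
    return round(u0 * 10 ** (n + 1)) % 10

random.seed(0)
for _ in range(5):
    alpha = random.choice([1, 2])
    x = random.uniform(-3 ** 0.5, 3 ** 0.5)
    u0_star = x / (2 + alpha)                    # the benchmark policy from the slide
    u0 = encode(u0_star, alpha)
    print(alpha, decode_alpha(u0), abs(u0 - u0_star) < 1e-5)
```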


The “simplest” “difficult” problem (Witsenhausen, 1968)

u0 = γ0(x), y = u0 + w, u1 = γ1(y), as in the block diagram above
x ~ N(0, σx²), w ~ N(0, σw²)
QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²

An optimal team solution exists, but its structure is not known; affine policies are not optimal.

Note that if u0 = γ0(x), the best γ1 is u1 = γ1(y) = E[γ0(x) | y]. Hence the optimization is with respect to the probability distribution of u0.

The original existence proof given by Witsenhausen involved calculus-of-variations type arguments, and does not extend to more general scenarios.


The “simplest” difficult problem

QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²

A policy pair that beats the best linear one:
u0 = γ0(x) = ε sgn(x) + λ x
u1 = γ1(y) = E[ε sgn(x) + λ x | y]
Optimize with respect to ε and λ.
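For a concrete feel, here is a minimal Monte Carlo sketch (my own parameter choices k = 0.04, σx = 5, σw = 1, which are not from the slides) comparing the best affine pair with the member ε = σx, λ = 0 of the above family; the nonlinear pair comes out well below the best affine cost.

```python
import numpy as np

rng = np.random.default_rng(0)
k, sigx, sigw, N = 0.04, 5.0, 1.0, 200_000     # my parameter choices, not from the slides
x = rng.normal(0.0, sigx, N)
w = rng.normal(0.0, sigw, N)

# Best affine pair: u0 = lam*x, u1 = E[u0|y] (linear); the cost has a closed form.
lams = np.linspace(0.0, 1.5, 3001)
J_aff = k * (lams - 1) ** 2 * sigx**2 + \
        lams**2 * sigx**2 * sigw**2 / (lams**2 * sigx**2 + sigw**2)
print("best affine cost     :", J_aff.min())

# Two-point member of the family (eps = sigx, lam = 0): u0 = sigx*sgn(x);
# the best second stage is u1 = E[u0|y] = sigx * tanh(sigx*y/sigw^2).
u0 = sigx * np.sign(x)
y = u0 + w
u1 = sigx * np.tanh(sigx * y / sigw**2)
print("two-point policy cost:", np.mean(k * (u0 - x) ** 2 + (u0 - u1) ** 2))
```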


The “simplest” difficult problem

QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²

A more general policy pair with improved performance:
u0 = γ0(x) = ε Quant(x) + λ x
u1 = γ1(y) = E[ε Quant(x) + λ x | y]
Optimize with respect to ε and λ for different Quant(ization) schemes (Bansal, TB ’87).


The “simplest” difficult problem

QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²
u0 = γ0(x) = ε Quant(x) + λ x
u1 = γ1(y) = E[ε Quant(x) + λ x | y]

The best performance under linear policies can be arbitrarily bad relative to the best performance under Quant:
sup_{k,σ} [QW_linear* / QW_Quant*] = ∞  (Mitter, Sahai ’99)

Digression: However, with Conflicting Objectives

QG(x, u0, u1) = − k0 (u0 − x)² + (u0 − u1)²

J* = min_γ1 max_γ0 J(γ0, γ1)

Unique saddle-point solution; the control laws are linear:
γ0*(x) = − [k0 / (k0 − (λ* − 1)²)] x,   γ1*(y) = λ* y
where λ* uniquely solves the polynomial equation
f(λ) = (σw² / σx²) λ [k0 − (λ − 1)²]² − k0² (1 − λ) = 0
in the open interval (max(0, 1 − √k0), 1).
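A minimal sketch (with my own parameter values k0 = σx = σw = 1, and relying on the slide’s claim that the root in the stated interval is unique) that computes λ* by bisection and forms the saddle-point pair:

```python
import math

k0, sigx, sigw = 1.0, 1.0, 1.0       # my own parameter choices

def f(lam):
    return (sigw**2 / sigx**2) * lam * (k0 - (lam - 1.0) ** 2) ** 2 - k0**2 * (1.0 - lam)

lo, hi = max(0.0, 1.0 - math.sqrt(k0)) + 1e-9, 1.0 - 1e-9
for _ in range(200):                 # bisection on the interval (max(0, 1-sqrt(k0)), 1)
    mid = 0.5 * (lo + hi)
    if f(lo) * f(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
lam = 0.5 * (lo + hi)
print("lambda* ~", lam)
print("gamma0*(x) = c x with c =", -k0 / (k0 - (lam - 1.0) ** 2))
print("gamma1*(y) =", lam, "* y")
```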


Coming back to Witsenhausen (1968)

Is the solution piecewise constant (or rather piecewise affine)?


NO. It is strictly increasing, with a real analytic left inverse (Wu, Verdú, CDC’11).

Uses optimal transport theory: given probability measures P and Q and a cost function c: R² → R, find the infimum, over all joint distributions of X and Y with marginals P and Q, of E[c(X,Y)] (Monge-Kantorovich).


The “simplest” difficult problem (Witsenhausen, 1968)

QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²

An optimal team solution exists, but its structure is not known; affine policies are not optimal.

E[c(x, u0)] = E[QW] = k W2(P, Q)² + mmse(Q), where P and Q are the distributions of x and u0, and W2 is the quadratic Wasserstein distance between P and Q. For any P, there exists a minimizing Q, with “smooth” γ0.
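This distributional view can be probed numerically; the sketch below (my own parameters, with σw = 1) evaluates k W2(P,Q)² + mmse(Q) for a two-point Q and checks it against a direct Monte Carlo evaluation of E[QW] under the corresponding monotone map γ0 (which is W2-optimal in one dimension) and the conditional-mean γ1:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
k, sigx, a, N = 0.04, 5.0, 5.0, 500_000      # my own parameters; Q uniform on {-a, +a}

# W2(P, Q)^2 for P = N(0, sigx^2) via the 1-D quantile formula
nd = NormalDist(0.0, sigx)
u = (np.arange(20_000) + 0.5) / 20_000
Fp_inv = np.array([nd.inv_cdf(t) for t in u])
Fq_inv = np.where(u < 0.5, -a, a)
W2sq = np.mean((Fp_inv - Fq_inv) ** 2)

# mmse(Q): estimating u0 ~ Q from y = u0 + w, w ~ N(0,1); here E[u0 | y] = a * tanh(a*y)
u0 = rng.choice([-a, a], N)
y = u0 + rng.normal(0.0, 1.0, N)
mmse = np.mean((u0 - a * np.tanh(a * y)) ** 2)

# Direct E[QW] under the monotone map gamma0(x) = a*sgn(x) and gamma1(y) = E[u0 | y]
x = rng.normal(0.0, sigx, N)
u0d = a * np.sign(x)
yd = u0d + rng.normal(0.0, 1.0, N)
u1d = a * np.tanh(a * yd)
print("k*W2^2 + mmse(Q):", k * W2sq + mmse)
print("direct E[QW]    :", np.mean(k * (u0d - x) ** 2 + (u0d - u1d) ** 2))
```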

A message to take, e.g., on a variation of LQG

[Block diagram: plant P with dynamics dx/dt = Ax + Bu + Dw and plant uncertainty w; sensor output y = Hx + v with sensor noise v; controller C, given by µ, generates u from y]

w, v independent Gaussian white noise
Cost: (1/T) ∫ (|z|² + |u|²_R)
u(t) = µ(y(t)) -- a memoryless controller: a difficult problem; the solution is not known



Implications for Distributed Control

[Block diagram: plant P driven by disturbance w, with measurement noise v, controlled by decentralized controllers C1, C2, …, CM]

In any optimal operation, controllers with decentralized information have to talk to (signal) each other through the plant.

Earlier general set-up with a different Q

u0 = γ0(x), y = u0 + w, u1 = γ1(y), as before
x ~ N(0, σx²), w ~ N(0, σw²); u0, u1 are real-valued actions
J(γ0, γ1) = E[Q(x, u0, u1) | γ0, γ1]
Q(x, u0, u1) = k (u0)² + (u1 − x)²

Gaussian Test Channel

Q(x, u0, u1) = k (u0)² + (u1 − x)²

An optimal control law (encoder/decoder) exists, and it is linear.

Generalized Gaussian Test Channel

Q(x, u0, u1) = k (u0)² + (u1 − x)² + b0 u0 x

An optimal control law (encoder/decoder) exists, and it is linear.

Generalized Gaussian Test Channel (Proof of existence)

Q(x, u0, u1) = k (u0)² + (u1 − x)² + b0 u0 x

Write α = E[(u0)²] and β = E[(u1 − x)²]. Then
E[Q] = F(γ0, γ1) ≥ k α + β + inf_γ0 b0 E[γ0(x) x] ≥ k α + β − |b0| σx √α

Data processing theorem (DPT): I(X; Y) ≥ I(X; U1)

C(α) := (1/2) log(1 + α/σw²) ≥ I(X; Y) ≥ I(X; U1) ≥ (1/2) log(σx²/β) =: R(β)

⇒ β ≥ σw² σx² / (σw² + α)

The inequalities are tight with γ0(x) = − sgn(b0) (√α / σx) x. Hence
E[Q] = F(γ0, γ1) ≥ k α + β − |b0| σx √α ≥ k α + σw² σx² / (σw² + α) − |b0| σx √α

Obtain the α that minimizes the bound → α*. Then
γ0*(x) = − sgn(b0) (√α* / σx) x,   γ1*(y) = E[x | y]

Generalized Gaussian Test Channel

Q(x, u0, u1) = k (u0)² + (u1 − x)² + b0 u0 x

One of the few instances where static/causal coding (linear, in this case) leads to attainment of equality in C(α) ≥ R(β).
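A minimal numerical sketch of the argument (my own parameters k = 1, b0 = 1, σx = 2, σw = 1): minimize the bound kα + σw²σx²/(σw² + α) − |b0| σx √α over a grid of α, then check by Monte Carlo that the linear pair above attains it; setting b0 = 0 recovers the plain Gaussian test channel.

```python
import numpy as np

rng = np.random.default_rng(2)
k, b0, sigx, sigw, N = 1.0, 1.0, 2.0, 1.0, 1_000_000    # my own parameter choices

# Lower bound L(alpha) = k*alpha + sigw^2*sigx^2/(sigw^2 + alpha) - |b0|*sigx*sqrt(alpha)
alphas = np.linspace(1e-6, 10.0, 200_001)
L = k * alphas + sigw**2 * sigx**2 / (sigw**2 + alphas) - abs(b0) * sigx * np.sqrt(alphas)
a_star = alphas[np.argmin(L)]
print("alpha* ~", a_star, "   lower bound ~", L.min())

# The linear pair gamma0(x) = -sgn(b0) (sqrt(alpha*)/sigx) x,  gamma1(y) = E[x | y]
c = -np.sign(b0) * np.sqrt(a_star) / sigx
x = rng.normal(0.0, sigx, N)
y = c * x + rng.normal(0.0, sigw, N)
u0 = c * x
u1 = (c * sigx**2 / (c**2 * sigx**2 + sigw**2)) * y     # E[x | y] for jointly Gaussian (x, y)
print("Monte Carlo cost of the linear pair ~",
      np.mean(k * u0**2 + (u1 - x) ** 2 + b0 * u0 * x))
```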


Recap

QW = k0 (u0 − x)² + (u0 − u1)²   -- conflicting roles
QG = − k0 (u0 − x)² + (u0 − u1)²   -- aligned roles
QTC = k0 (u0)² + (u1 − x)²   -- aligned roles

Not only the information structure but also the cost function is a determining factor.

Noise Corrupted Source

u0 = γ0(x + v), y = u0 + w, u1 = γ1(y)
x ~ N(0, σx²), w ~ N(0, σw²), v ~ N(0, σv²)
J(γ0, γ1) = E[Q(x, u0, u1) | γ0, γ1]

⇒ Similar structural results – linear policies

Jammer Corrupted Channel

[Block diagram: encoder γ0 observes x + v; the channel output is corrupted by the Gaussian noise wN and by the jammer’s input wJ; decoder γ1 maps y to u1]

x ~ N(0, σx²), wN ~ N(0, σw²), v ~ N(0, σv²); wJ unknown, controlled by the jammer
J(γ0, γ1; γJ) = E[Q(x, u0, u1; wJ) | γ0, γ1, γJ]

⇒ Saddle-point (SP) policies are linear (TB, TIT’83)

Even with x and the channel noise non-Gaussian, under some matching conditions on spectral densities, SP policies are linear (Akyol-Rose-TB, TIT Aug’15).

Relay Channels: Multiple Serial Decision Units

[Block diagram: γ0 observes x + v; its output passes through a cascade of relays γ1, …, γm, with additive noises w0, …, w(m−1) between successive stages; the final stage produces um]

x ~ N(0, σx²), wi ~ N(0, σw²), v ~ N(0, σv²)
J(γ0, γ[1,m]) = E[Q(x, u0, u[1,m]) | γ0, γ[1,m]]
Q(x, u0, u[1,m]) = (um − x)² + Σ_{i=1}^{m} (u_{i−1})²

⇒ For m > 3, affine laws are no longer optimal for this extended GTC model (Lipsa-Martins ’10).


⇒ Also for m = 3, affine laws are not optimal for this extended GTC model (Yüksel et al., 2011).

Does a general unifying existence proof exist?

One that would apply to
•  the Witsenhausen counter-example
•  the Gaussian test channel
•  their multi-dimensional versions
•  relay channels
•  stochastic control problems with no memory or limited memory
•  LQG teams with asymmetric information
•  etc.

Answer is YES (Gupta, Yüksel, TB, CDC’14; Gupta, Yüksel, TB & Langbort, SICON’15)

Assumptions on the dynamic M-agent, T-stage stochastic team problem:
•  Sequential decision problem: agent i observes and recalls at time t the information y(i,t), and constructs u(i,t) = γ(i,t)(y(i,t))
•  inf J(γ) is finite (e.g., use zero control)
•  Witsenhausen’s static reduction holds


Static reduction (Witsenhausen ’88):
•  Convert the M-agent team to an MT-agent one, where each agent acts only once
•  A measure and cost transformation turns the dynamic problem into a static one, with independent measurements (the measurements now enter into the cost)

Static Reduction for the C-example

u0 = γ0(x), y = u0 + w, u1 = γ1(y)
x ~ N(0, σx²), w ~ N(0, σw²)
QW(x, u0, u1) = k (u0 − x)² + (u0 − u1)²

Reduced form:
∫ P(dx) P(dy) QW(x, u0, u1) · exp(u0 (2y − u0)/2),  with x ~ N(0, σx²) and y ~ N(0, 1) independent
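The change of measure can be verified directly by simulation. In the sketch below (my own check, with σw = 1 so that the displayed likelihood ratio exp(u0(2y − u0)/2) is exact, and an arbitrary nonlinear policy pair), the dynamic-model cost and the reweighted static-model cost agree up to Monte Carlo error:

```python
import numpy as np

rng = np.random.default_rng(3)
k, sigx, N = 0.2, 1.5, 2_000_000            # my own parameters; sigw = 1 assumed

gamma0 = lambda x: np.tanh(x)               # an arbitrary (bounded, nonlinear) first stage
gamma1 = lambda y: 0.5 * np.sign(y)         # an arbitrary second stage
QW = lambda x, u0, u1: k * (u0 - x) ** 2 + (u0 - u1) ** 2

x = rng.normal(0.0, sigx, N)
u0 = gamma0(x)

# Dynamic model: y = u0 + w with w ~ N(0, 1)
J_dyn = np.mean(QW(x, u0, gamma1(u0 + rng.normal(0.0, 1.0, N))))

# Static reduction: y ~ N(0, 1) independent of x; cost reweighted by the likelihood ratio
y = rng.normal(0.0, 1.0, N)
J_static = np.mean(QW(x, u0, gamma1(y)) * np.exp(u0 * (2.0 * y - u0) / 2.0))

print(J_dyn, J_static)                      # the two estimates agree up to MC error
```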

Challenges in the Proof
•  Extracting a convergent subsequence of the infimizing sequence
•  Showing lower semi-continuity of J
•  Making sure that informational constraints are preserved in the limit


Approach to Overcome Challenges in the Proof
•  With the static reduction, redefine the optimization over randomized strategies
•  Because of the independence of measurements, information constraints are preserved
•  Identify a compact set that contains the optimal solution (Markov’s inequality)
•  Use Blackwell’s irrelevant information theorem to go to deterministic policies
•  Invoke the equivalence of the reduced static and dynamic teams


Assumptions & Approach Apply to Various Settings
•  Multi-dimensional Witsenhausen C-E
•  Multi-dimensional Gaussian Test Channel
•  Relay channel and its multi-dimensional version
•  LQG with static output feedback (the solution is generally nonlinear)
•  Countable/quantized observation spaces in static teams with observation sharing IS (uses a lifting technique)

Next Question

Does there exist a computational scheme that would generate policies with costs arbitrarily close to the optimum?

Selected Prior Work with no Performance Guarantees
•  Lee-Lao-Ho, TAC’01
•  Baglietto-Parisini-Zoppoli, TAC’01
•  Li-Marden-Shamma, CDC’09
•  Gnecco-Sanguinetti, INOC’09, OL’12
•  Gnecco-Sanguinetti-Gaggero, SICON’12
•  McEneaney-Han, Automatica’15
NOTE: As an optimization problem, these are NP-complete (Papadimitriou-Tsitsiklis, SICON’86).

Recent Answer

Quantized policies are asymptotically optimal -- those that use uniformly quantized measurements (restricted to a compact domain). Saldı-Yüksel-Linder (Sept 2015)

Recent Answer (Specifics)
•  The cost is bounded and continuous in the control
•  The action space of each agent is a convex subset of a locally convex vector space
•  The measurement space of each agent is compact
•  Static reduction holds, satisfying the above
⇒ On each ε-net of uniformly quantized measurements there exists a team-optimal policy, leading to an asymptotically optimal value for the team cost as the net gets arbitrarily dense.
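A toy illustration of the flavor of this result, not the construction of the paper: in the Witsenhausen setup, fix the first-stage policy and let the second stage see only a uniformly quantized version of y on a compact interval, mapping each bin to the empirical conditional mean of u0; as the quantizer is refined, the cost approaches that of the unquantized conditional-mean policy. The parameters and the fixed first stage are my own choices.

```python
import numpy as np

rng = np.random.default_rng(4)
k, sigx, sigw, Lc, N = 0.04, 5.0, 1.0, 15.0, 1_000_000   # my parameters; quantize y on [-Lc, Lc]

x = rng.normal(0.0, sigx, N)
u0 = x                                         # a fixed (here affine) first-stage policy
y = u0 + rng.normal(0.0, sigw, N)
u1_best = (sigx**2 / (sigx**2 + sigw**2)) * y  # unquantized best response E[u0 | y]
J_best = np.mean(k * (u0 - x) ** 2 + (u0 - u1_best) ** 2)

for nbins in (2, 8, 32, 128):
    edges = np.linspace(-Lc, Lc, nbins + 1)
    idx = np.clip(np.digitize(y, edges) - 1, 0, nbins - 1)     # uniform quantizer bin of y
    # second-stage policy on the finite quantized-measurement space:
    # each bin is mapped to the empirical conditional mean of u0 within that bin
    bin_mean = np.array([u0[idx == b].mean() if np.any(idx == b) else 0.0
                         for b in range(nbins)])
    J_q = np.mean(k * (u0 - x) ** 2 + (u0 - bin_mean[idx]) ** 2)
    print(f"{nbins:4d} bins: cost {J_q:.4f}   (unquantized: {J_best:.4f})")
```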


Comprehensive coverage of the analysis, optimization and performance of stochastic networked systems is in Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints, Yüksel & TB, Birkhäuser, 2013.

Stochastic Dynamic Games with Asymmetric Information

“Lifting” to a symmetric game utilizing the common information of the players

Dynamic Stochastic Games with Asymmetric Information (A. Nayyar & TB, CDC’12)

•  Game G1: local state process x^i for player P^i and global state process z:
   z_{t+1} = f^0_t(z_t, u_t, w^0_t),   x^i_{t+1} = f^i_t(z_t, u_t, w^i_t)
•  u^i_t = μ^i_t(x^i_{[1,t]}, z_{[1,t]}, u_{[1,t-1]}),  i = 1, …, n
•  J^i(μ^1, ..., μ^n) = E[ Σ_{t=1}^T c^i(x^i_t, z_t, u_t) | μ ]
•  Interested in Nash equilibrium μ*



Game G2:  u^i_t = μ^i_t(x^i_t, z_{[t-1,t]}, u_{t-1})
⇒ If μ* is a NE for G2, it is also a NE for G1.


Game G3: Replace P^i with FP^i, who has access to (z_{[t-1,t]}, u_{t-1}) and selects a prescription Γ^i_t: X^i → U^i according to φ^i_t (his strategy): Γ^i_t = φ^i_t(z_{[t-1,t]}, u_{t-1}).
The control action is then u^i_t = Γ^i_t(x^i_t).
Cost: L^i(φ^1, ..., φ^n) = E[ Σ_{t=1}^T c^i(x^i_t, z_t, u_t) | φ ]


Theorem: Let μ be a NE for G2. Define φ^i_t(z_{[t-1,t]}, u_{t-1}) = μ^i_t(·, z_{[t-1,t]}, u_{t-1}). Then φ is a NE for G3. Conversely, if φ is a NE for G3, then μ defined as above is a NE for G2.
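The content of the lifting can be seen already in a one-stage toy version (entirely my own construction, with made-up costs and no dynamics): each player has an independent, uniformly distributed binary private type, and equilibrium is sought over prescriptions, i.e., maps from private type to action, exactly as the virtual players FP^i do above. Enumerating the 4 × 4 “lifted” game over prescription pairs finds its pure Nash equilibria:

```python
import itertools
import numpy as np

types, actions = [0, 1], [0, 1]
# A prescription maps a private type to an action; there are 4 per player.
prescriptions = [dict(zip(types, a)) for a in itertools.product(actions, repeat=2)]

def cost(i, x1, x2, u1, u2):
    # made-up one-stage costs: match your own type, but also coordinate
    if i == 1:
        return (u1 - x1) ** 2 + (u1 - u2) ** 2
    return 2 * (u2 - x2) ** 2 + (u1 - u2) ** 2

def expected_cost(i, g1, g2):
    # average over independent, uniform binary private types
    return np.mean([cost(i, x1, x2, g1[x1], g2[x2]) for x1 in types for x2 in types])

n = len(prescriptions)
C1 = np.array([[expected_cost(1, prescriptions[a], prescriptions[b]) for b in range(n)]
               for a in range(n)])
C2 = np.array([[expected_cost(2, prescriptions[a], prescriptions[b]) for b in range(n)]
               for a in range(n)])

# Pure Nash equilibria of the lifted game, found by enumeration over prescription pairs
for a in range(n):
    for b in range(n):
        if C1[a, b] <= C1[:, b].min() and C2[a, b] <= C2[a, :].min():
            print("NE prescriptions:", prescriptions[a], prescriptions[b],
                  "  expected costs:", C1[a, b], C2[a, b])
```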


What does this lead to?
•  Since G3 is a symmetric, full-information game, its NE can be obtained by backward induction (strongly time-consistent, sub-game perfect, etc.), albeit with optimization over a function space {Γ^i_t}
•  For the ZS version, (z_{[t-1,t]}, u_{t-1}) can be replaced with z_{[t-1,t]}, with some caveats regarding existence of SPE
•  Extends to teams against teams in the ZS context
•  Generalization to noisy channels (next)
•  Other contexts where a SDG with asymmetric information can be transformed to one with symmetric information (through “lifting”)

Two recent papers:
•  Nayyar, Gupta, Langbort, TB (TAC, 59(3), 2014), “Common information based Markov perfect equilibria for stochastic games with asymmetric information: Finite games”
•  Gupta, Nayyar, Langbort, TB (SICON, 52(5), 2014), “Common information based Markov perfect equilibria for Linear-Gaussian games with asymmetric information”


LQG Games with Asymmetric Information
•  Standard set-up with independent measurement(s) and system noises
•  Some sharing of past measurements and control actions by the (two) players
•  Player i’s policy at time t: g(i,t): I(i,t) → U(i,t)
•  Standard stage-additive T-horizon cost
•  Decomposition into common and private information: Ct = I(1,t) ∩ I(2,t),  P(i,t) = I(i,t) \ Ct


Assumptions / Approach
•  Common information Ct does not contract
•  Common information based common belief is independent of the policies of the players
-----------------------------------------------
•  Introduce a common information based game (G2) on lifted strategy spaces, as before
•  An MPE of G2 is a CIMPE for G1

Relationship between G1 & G2

[Figure: in game G2, virtual players VP1 and VP2 observe the common information Ct and select prescriptions Γ^1_t: P^1_t → U^1_t and Γ^2_t: P^2_t → U^2_t; in game G1, controllers 1 and 2 evaluate these prescriptions at their private information P^1_t, P^2_t to generate U^1_t, U^2_t. There is a bijection between NE(G1) and NE(G2).]

Assumptions / Approach / Result

Under the same assumptions and approach as above:
⇒ The CIMPE is unique, and the CIMPE strategies are affine; their computation involves solving linear equations.


Recap
•  We now have a general theory of existence of team-optimal solutions in stochastic dynamic teams with asymmetric and non-classical IS
•  Caveats in computations based on discretization/quantization, but feasible with uniform quantization of measurements
•  Lifting of SGs with asymmetric information to ones with symmetric information, at the expense of an increase in complexity
•  Common information based MPE offers computational advantages


THANKS !