Transcript of: adityam/talks/queens-2011.pdf
Separation result for delayedsharing information structures
Aditya Mahajan
McGill University
Joint work with: Ashutosh Nayyar and Demos Teneketzis, University of Michigan
Queen's University, July 12, 2011
Common theme: multi-stage multi-agent decision making under uncertainty
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed sharing information structure: A “simple” model for multi-agent decision making
History of the problem
Our approach
Main results
Conclusion
Modeling multi-stage decision making under uncertainty

Model of uncertainty
Stochastic dynamics: X_{t+1} = f_t(X_t, U_t, W_t)
Noisy observations: Y_t = h_t(X_t, V_t)
The state disturbance {W_t} and observation noise {V_t} are i.i.d. stochastic processes with known distributions.
The system dynamics f_t and observation function h_t are known.

Model of information
Single DM with perfect recall: U_t = g_t(Y_{1:t}, U_{1:t-1})

Model of objective
Cost at time t: c_t(X_t, U_t)
Objective: minimize the expected total cost E[ ∑_{t=1}^T c_t(X_t, U_t) ]
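The model above can be made concrete with a small simulation. This is a minimal sketch under assumed dynamics: a hypothetical two-state system with Bernoulli disturbance and noise; the functions f and h, the noise probabilities, and the cost are all illustrative, not from the talk.

```python
import random

# Hypothetical two-state instance of the model above (all numbers assumed):
# dynamics X_{t+1} = f(X_t, U_t, W_t), noisy observations Y_t = h(X_t, V_t),
# with {W_t}, {V_t} i.i.d. Bernoulli, and per-stage cost c(x, u) = 1{x != 0}.

def f(x, u, w):
    # Action u = 1 resets the state to 0; the disturbance w then flips it.
    x_next = 0 if u == 1 else x
    return 1 - x_next if w else x_next

def h(x, v):
    # The observation is the state, corrupted by the noise v.
    return 1 - x if v else x

def simulate(policy, horizon, seed=0):
    rng = random.Random(seed)
    x, total_cost = 0, 0.0
    for _ in range(horizon):
        y = h(x, rng.random() < 0.1)      # V_t ~ Bernoulli(0.1)
        u = policy(y)
        total_cost += 1.0 if x != 0 else 0.0
        x = f(x, u, rng.random() < 0.05)  # W_t ~ Bernoulli(0.05)
    return total_cost

cost = simulate(lambda y: y, horizon=100)  # myopic policy: act on current observation
```

The myopic policy here ignores the history; the single-agent theory in the talk justifies replacing the full history by a belief over the state.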
More general setups

Model of uncertainty
Non-i.i.d. dynamics (Markov, ergodic, etc.)
Unknown distribution
Unknown model, unknown cost, etc.

Model of information
Fixed memory/complexity at the decision maker
More than one decision maker

Model of objective
Worst-case performance (instead of expected performance)
Minimize regret (instead of minimizing total cost)
Remain in a desirable set (rather than minimize total cost)
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed sharing information structure: A “simple” model for multi-agent decision making
History of the problem
Our approach
Main results
Conclusion
Single-agent decision making

Design difficulties
U_t = g_t(Y_{1:t}, U_{1:t-1}): the domain of the control laws increases with time.
min_{g_1, ..., g_T} E[ ∑_{t=1}^T c_t(X_t, U_t) ]: the search for an optimal control policy is a functional optimization problem.

Structural results (estimation)
Define the information state: π_t = ℙ(X_t | Y_{1:t}, U_{1:t-1})
Then there is no loss of optimality in restricting attention to control laws of the form U_t = g_t(π_t).

Dynamic programming (control)
The following recursive equations provide an optimal control policy:
V_t(π_t) = min_{u_t} E[ c_t(X_t, u_t) + V_{t+1}(π_{t+1}) | π_t, u_t ]

π_t is policy independent. Each step of the DP is a parameter optimization.
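The information-state recursion and the dynamic program above can be sketched for a two-state, two-observation, two-action example. Everything numeric here (transition matrices P, observation kernel Q, cost c, horizon) is an illustrative assumption, not from the talk.

```python
# Hypothetical finite model: P[u][x][x2] = transition probability,
# Q[x][y] = observation probability, per-stage cost c(x) = 1{x = 1}.
P = {0: [[0.95, 0.05], [0.05, 0.95]],
     1: [[0.05, 0.95], [0.95, 0.05]]}
Q = [[0.9, 0.1], [0.1, 0.9]]
c = [0.0, 1.0]

def belief_update(pi, u, y):
    # pi_{t+1} = P(X_{t+1} | y_{1:t+1}, u_{1:t}): predict, then Bayes-correct.
    pred = [sum(pi[x] * P[u][x][x2] for x in (0, 1)) for x2 in (0, 1)]
    post = [pred[x2] * Q[x2][y] for x2 in (0, 1)]
    z = sum(post)
    return [p / z for p in post]

def dp_value(pi, t, horizon):
    # V_t(pi) = min_u E[c(X_t) + V_{t+1}(pi_{t+1}) | pi, u]: each step is a
    # minimization over the parameter u, not over a function of the history.
    exp_cost = sum(pi[x] * c[x] for x in (0, 1))
    if t == horizon:                      # terminal cost E[c(X_T) | pi]
        return exp_cost, None
    best_val, best_u = float("inf"), None
    for u in (0, 1):
        val = exp_cost
        for y in (0, 1):
            pred = [sum(pi[x] * P[u][x][x2] for x in (0, 1)) for x2 in (0, 1)]
            prob_y = sum(pred[x2] * Q[x2][y] for x2 in (0, 1))
            if prob_y > 0:
                val += prob_y * dp_value(belief_update(pi, u, y), t + 1, horizon)[0]
        if val < best_val:
            best_val, best_u = val, u
    return best_val, best_u

v_opt, u_opt = dp_value([0.5, 0.5], 0, 3)
```

Here belief_update is the policy-independent filter for π_t, and dp_value carries out the per-stage parameter optimization that the separation result makes possible.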
(One-way) separation between estimation and control

In single-agent decision making, estimation is separated from control. This separation is critical for decomposing the search for an optimal control policy into a sequence of parameter optimization problems.
Does this separation extendto multi-agent decision making?
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed sharing information structure: A “simple” model for multi-agent decision making
History of the problem
Our approach
Main results
Conclusion
Delayed-sharing information structure

Model of uncertainty
Stochastic dynamics: X_{t+1} = f_t(X_t, U_t^{1:2}, W_t)
Noisy observations: Y_t^i = h_t^i(X_t, V_t^i)

Model of information
Each DM has perfect recall.
Each DM observes the n-step delayed information (observations and actions) of the other DMs:
U_t^i = g_t^i( Y^i_{1:t}, U^i_{1:t-1}, Y^j_{1:t-n}, U^j_{1:t-n} ), j ≠ i

Model of objective
Cost at time t: c_t(X_t, U_t^{1:2})
Objective: minimize E[ ∑_{t=1}^T c_t(X_t, U_t^{1:2}) ]
Delayed-sharing information structure

Some notation
U_t^i = g_t^i( Y^i_{1:t}, U^i_{1:t-1}, Y^j_{1:t-n}, U^j_{1:t-n} )
      = g_t^i( Y^i_{t-n+1:t}, U^i_{t-n+1:t-1}, Y^{1:2}_{1:t-n}, U^{1:2}_{1:t-n} )
Thus, U_t^i = g_t^i( C_t, L_t^i ), where
common info C_t = ( Y^{1:2}_{1:t-n}, U^{1:2}_{1:t-n} )
local info L_t^i = ( Y^i_{t-n+1:t}, U^i_{t-n+1:t-1} )
Same design difficulties as in the single-agent case
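The common/local split above is just bookkeeping over time indices, which a small helper can make explicit. The function name, argument names, and dictionary keys below are illustrative, not from the talk.

```python
def split_information(t, n):
    """Index sets for the n-step delayed-sharing split at time t.

    Common info C_t: observations and actions of both DMs up to time t - n.
    Local info L_t^i: DM i's own recent observations Y^i_{t-n+1:t} and
    actions U^i_{t-n+1:t-1}.
    """
    common = {"Y^1:2": list(range(1, t - n + 1)),
              "U^1:2": list(range(1, t - n + 1))}
    local = {"Y^i": list(range(max(1, t - n + 1), t + 1)),
             "U^i": list(range(max(1, t - n + 1), t))}
    return common, local

common, local = split_information(t=5, n=2)
```

For t = 5 and n = 2, both DMs share everything up to time 3, while DM i privately holds its observations at times 4 and 5 and its action at time 4.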
Literature overview

(Witsenhausen, 1971): Proposed the delayed-sharing information structure. Asserted a structure of the optimal control law (without proof).

(Varaiya and Walrand, 1978): Proved Witsenhausen's assertion for delay n = 1. Showed via a counterexample that the assertion is false for delay n > 1.

(Nayyar, Mahajan, and Teneketzis, 2011): Prove two alternative structures of the optimal control law. Also obtain a recursive algorithm to find optimal control laws; at each step, a functional optimization problem must be solved.
Structure of optimal control law

Original setup: U_t^i = g_t^i( C_t, L_t^i )

W71 assertion: U_t^i = g_t^i( ℙ(X_{t-n} | C_t), L_t^i )

NMT11 first result: U_t^i = g_t^i( ℙ(X_{t-n}, L_t^{1:2} | C_t), L_t^i )

NMT11 second result: U_t^i = g_t^i( ℙ(X_{t-n} | C_t), L_t^i, λ^{1:2}_{t-n+1:t-1} )

Contrast the dependence on the policy across the different results.
Importance of the problem

Applications (of one-step delayed sharing)
Power systems: Altman et al., 2009
Queueing theory: Kuri and Kumar, 1995
Communication networks: Grizzle et al., 1982
Stochastic games: Papavassilopoulos, 1982; Chang and Cruz, 1983
Economics: Li and Wu, 1991

Conceptual significance
Understanding the design of networked control systems
A bridge between centralized and decentralized systems
Insights for the design of general decentralized systems
Orthogonal search does not work due to the presence of signaling: how does controller 1 figure out how controller 2 will interpret his (controller 1's) actions?
Proof outline
1. Construct a coordinated system
2. Show that any policy of the coordinated system is implementable in the original system and vice versa; hence, the two systems are equivalent.
3. The optimal design of the coordinated system is a single-agent multi-stage decision problem. Find a solution for the coordinated system.
4. Translate this solution back to the original system.
Step 1: The coordinated system

[Block diagram: a coordinator observes the common information C_t and sends prescriptions to controllers 1 and 2, which also see their local information L_t^1, L_t^2.]

Define the partially evaluated control law: λ_t^i = g_t^i( C_t, · )
The coordinator prescribes ( λ_t^1, λ_t^2 ) to the controllers as
( λ_t^1, λ_t^2 ) = ψ_t( C_t, λ^1_{1:t-1}, λ^2_{1:t-1} )
Step 2: Equivalence

For any policy (g_1, ..., g_T) of the original system, we can construct a policy (ψ_1, ..., ψ_T) of the coordinated system such that the system variables { X_t, Y_t^{1:2}, U_t^{1:2}, t = 1, ..., T } have the same realization along all sample paths in both cases:
ψ_t( C_t, λ^{1:2}_{1:t-1} ) = ( λ_t^1, λ_t^2 ) = ( g_t^1( C_t, · ), g_t^2( C_t, · ) )

For any policy (ψ_1, ..., ψ_T) of the coordinated system, we can construct a policy (g_1, ..., g_T) of the original system such that the system variables { X_t, Y_t^{1:2}, U_t^{1:2}, t = 1, ..., T } have the same realization along all sample paths in both cases:
At time 1, both controllers know C_1. Choose g_1^i( C_1, · ) = ψ_1^i( C_1 ).
At time 2, both controllers know C_2, λ_1^1, and λ_1^2. Choose g_2^i( C_2, · ) = ψ_2^i( C_2, λ_1^1, λ_1^2 ). Continue in this manner.
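Both directions of this equivalence rest on partial evaluation: fixing the common-information argument of a control law g_t^i(C_t, ·) yields exactly the prescription λ_t^i that the coordinator sends. A minimal sketch; the control law g and its arguments are hypothetical, chosen only to show that the two systems compute the same action.

```python
from functools import partial

# Hypothetical control law g_t^i(C_t, L_t^i); the rule itself is arbitrary.
def g(common, local):
    return sum(common) + sum(local)

C_t, L_t = (1, 0, 1), (0, 1)  # illustrative realizations of C_t and L_t^i

# Original system: controller i evaluates g on (C_t, L_t^i) directly.
u_direct = g(C_t, L_t)

# Coordinated system: the coordinator, who sees only C_t, partially
# evaluates g to the prescription lambda_t^i = g(C_t, .); controller i
# then applies the prescription to its local information.
prescription = partial(g, C_t)
u_coordinated = prescription(L_t)
```

The coordinator merely refactors who evaluates which argument of the control law, which is why the system variables have the same realization in both systems.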
Step 3: Solve the coordinated system

By construction, the coordinated system has a single decision maker with perfect recall.

Use the result for single-agent decision making. Define
π_t = ℙ( “current state” | past history )
Then there is no loss of optimality in restricting attention to control laws of the form
control action = Fn( π_t )

What is the state (for the input-output mapping) of the system?
State for the coordinated system

The state (for the input-output mapping) of the coordinated system is ( X_{t-n}, L_t^1, L_t^2 ).
Structure of optimal control law

Define π_t = ℙ( X_{t-n}, L_t^1, L_t^2 | C_t, λ^1_{1:t-1}, λ^2_{1:t-1} )
Then there is no loss of optimality in restricting attention to coordination laws of the form
( λ_t^1, λ_t^2 ) = ψ_t( π_t )
The following recursive equations provide an optimal coordination policy:
V_t( π_t ) = min_{λ_t^1, λ_t^2} E[ c_t(X_t, U_t^{1:2}) + V_{t+1}( π_{t+1} ) | π_t, λ_t^1, λ_t^2 ]
Step 4: Translate the solution

For a system with the delayed-sharing information structure, there is no loss of optimality in restricting attention to control laws of the form
U_t^i = g_t^i( π_t, L_t^i )
Optimal control laws can be obtained from the solution of the following recursive equations:
V_t( π_t ) = min_{g_t^1(π_t, ·), g_t^2(π_t, ·)} E[ c_t(X_t, U_t^{1:2}) + V_{t+1}( π_{t+1} ) | π_t, g_t^1(π_t, ·), g_t^2(π_t, ·) ]
Features of the solution
The space of realizations of π_t = ℙ( X_{t-n}, L_t^1, L_t^2 | C_t, λ^1_{1:t-1}, λ^2_{1:t-1} ) is time-invariant. Thus, the domain of the control laws g_t^i( π_t, L_t^i ) is time-invariant.

π_t is not policy independent! Estimation is not separated from control. This is always the case when signaling is present.

In each step of the dynamic program, we choose the partially evaluated control laws g_t^1( π_t, · ), g_t^2( π_t, · ). Choosing partially evaluated functions (instead of values) allows us to write a dynamic program even in the presence of signaling.
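A single step of this dynamic program can be brute-forced for a tiny instance to make the point concrete: the minimization ranges over prescription pairs, i.e., over maps from local information to actions. The belief, cost, and alphabets below are illustrative assumptions, not from the talk.

```python
from itertools import product

local_obs = (0, 1)   # possible realizations of L_t^i (illustrative)
actions = (0, 1)

# A prescription for one controller is a map local_obs -> actions,
# encoded as the tuple (lam(0), lam(1)); there are 4 per controller.
prescriptions = list(product(actions, repeat=len(local_obs)))

# Hypothetical coordinator belief Pi_t over (X_{t-n}, L^1, L^2),
# flattened as {(x, l1, l2): probability}.
Pi = {(0, 0, 0): 0.4, (0, 1, 1): 0.1, (1, 0, 1): 0.2, (1, 1, 0): 0.3}

def stage_cost(x, u1, u2):
    # Illustrative cost: each controller pays 1 for mismatching the state.
    return (u1 != x) + (u2 != x)

def expected_cost(lam1, lam2):
    # E[c(X, U^1, U^2) | Pi, lam1, lam2] with U^i = lam_i(L^i).
    return sum(p * stage_cost(x, lam1[l1], lam2[l2])
               for (x, l1, l2), p in Pi.items())

# One DP step: minimize over prescription PAIRS (functions), not actions.
best = min(product(prescriptions, prescriptions),
           key=lambda pair: expected_cost(*pair))
best_cost = expected_cost(*best)
```

With a continuation value V_{t+1} added inside expected_cost, this minimization becomes one full stage of the coordinator's recursion; the essential feature is that its decision variables are functions.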
Outline of this talk
What is the conceptual difficulty with multi-agent decision making? How to resolve it?
Modeling decision making under uncertainty
Overview of single-agent decision making
Delayed sharing information structure: A “simple” model for multi-agent decision making
History of the problem
Our approach
Main results
Conclusion
Summary
A simple methodology to resolve a 40-year-old open question:

Find the common information at each time.

Look at the problem from the point of view of a coordinator that observes this common information and chooses partially evaluated functions.

Find an information state for the problem at the coordinator:
ℙ( state for input-output mapping | common information )
= ( ℙ( past state | common information ), past partial control laws )

This methodology is also applicable to systems with more general information structures (Mahajan, Nayyar, Teneketzis, 2008).
Salient Features
The size of the information state is time-invariant
The methodology is also applicable to infinite horizon problems
Each step of DP is a functional optimization problem
The form of the DP is similar to that of a POMDP
Can borrow numerical approaches from the POMDP literature