Post on 30-Sep-2020
Partitioned Linear Programming Approximations for MDPs
Branislav Kveton, Intel Research, Santa Clara, CA
Milos Hauskrecht, Department of Computer Science, University of Pittsburgh
Overview
• Introduction
– Factored Markov decision processes
– Approximate linear programming
– Solving ALP formulations
• Partitioned linear programming approximations
– Formulation, theory, and insights
• Experiments
• Conclusions and future work
Factored Markov Decision Processes
• A factored Markov decision process (MDP) is a 4-tuple M = (X, A, P, R):
– X is a set of state variables
– A is a set of actions
– P is a transition function represented by a dynamic Bayesian network (DBN)
– R is a reward model:
  R(x, a) = Σ_j R_j(x_j, a_j)
[Figure: two-slice DBN over time steps t and t+1, with actions A1–A3, state variables X1–X3, next-state variables X'1–X'3, and local reward functions R1, R2.]
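As an illustration of the factored form above, a minimal sketch of a factored MDP with local rewards (all names and numbers here are illustrative, not from the paper):

```python
# Minimal sketch of a factored MDP: binary state variables, per-variable
# actions, and a global reward that decomposes into local reward functions.
X_VARS = ["X1", "X2", "X3"]   # state variables (e.g. machine j is up/down)
ACTIONS = [0, 1]              # e.g. 1 = reboot the machine

# Local reward function R_j(x_j, a_j): reward 1 if machine j is up.
def R_local(x_j, a_j):
    return float(x_j == 1)

# Global reward is the sum of local rewards (the factored form).
def R(x, a):
    return sum(R_local(x[j], a[j]) for j in range(len(X_VARS)))

x = (1, 0, 1)   # machines 1 and 3 up
a = (0, 1, 0)   # reboot machine 2
print(R(x, a))  # 2.0
```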
Linear Value Function Approximations
• The quality of a policy is measured by the infinite horizon discounted reward:
• The optimal value function V* is a fixed point of the Bellman equation:
• A compact representation of an MDP may not guarantee a compact form of the optimal value function V*
• Approximation of V* by a linear combination of basis functions [Bellman et al. 1963, Van Roy 1998]:
  Discounted reward:    E_π [ Σ_{t=0}^∞ γ^t R(x_t, π(x_t)) ]
  Bellman equation:     V*(x) = max_a [ R(x, a) + γ Σ_{x'} P(x' | x, a) V*(x') ]
  Linear approximation: V^w(x) = Σ_i w_i f_i(x),  where the f_i are local feature functions
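A linear value function approximation of this form can be sketched as follows (the basis functions are illustrative choices, not the paper's):

```python
# Sketch: a linear value function approximation V^w(x) = sum_i w_i f_i(x)
# with simple indicator-style local basis functions.
def f0(x): return 1.0             # constant basis function
def f1(x): return float(x[0])     # local feature of X1
def f2(x): return float(x[1])     # local feature of X2

basis = [f0, f1, f2]

def V(w, x):
    return sum(w_i * f_i(x) for w_i, f_i in zip(w, basis))

print(V([0.5, 2.0, 3.0], (1, 0)))  # 0.5*1 + 2.0*1 + 3.0*0 = 2.5
```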
Approximate Linear Programming
• Optimization of the linear value function approximation can be restated as an approximate linear program (ALP):
• The linear value function approximation combined with the structure of factored MDPs induces a structure in ALP:
  minimize_w   E_ψ [ V^w ]
  subject to   V^w(x) ≥ (T V^w)(x)   for all x ∈ X

  minimize_w   Σ_i w_i α_i
  subject to   Σ_i w_i F_i(x, a) − R(x, a) ≥ 0   for all x ∈ X, a ∈ A

where α_i = Σ_x ψ(x) f_i(x) and F_i(x, a) = f_i(x) − γ Σ_{x'} P(x' | x, a) f_i(x').
Constraint space of an ALP represented by a cost network
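For a small explicit MDP, the ALP above can be written out directly. A sketch using SciPy's LP solver, with one indicator basis function per state (so the LP recovers V* exactly); the MDP numbers are made up:

```python
import numpy as np
from scipy.optimize import linprog

gamma = 0.9
# Toy 2-state, 2-action MDP (illustrative numbers).
R = np.array([[1.0, 0.0],         # R[s, a]
              [0.0, 2.0]])
P = np.array([[[1, 0], [0, 1]],   # P[s, a, s']
              [[0, 1], [1, 0]]], dtype=float)
psi = np.array([0.5, 0.5])        # state-relevance weights

# ALP with an indicator basis per state: the variables are V(s).
# Constraint V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s') becomes
# (gamma * P[s,a,:] - e_s) @ V <= -R[s,a].
A_ub, b_ub = [], []
for s in range(2):
    for a in range(2):
        A_ub.append(gamma * P[s, a] - np.eye(2)[s])
        b_ub.append(-R[s, a])

res = linprog(psi, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * 2, method="highs")
V_alp = res.x

# Sanity check against value iteration.
V_vi = np.zeros(2)
for _ in range(500):
    V_vi = (R + gamma * P @ V_vi).max(axis=1)
print(V_alp, V_vi)   # both approximately [10, 11]
```

With a full (per-state) basis the ALP constraints pin down V* exactly; the approximation only becomes interesting when there are fewer basis functions than states.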
State-of-the-Art Methods for ALP
• Exact methods
– Rewrite constraint space compactly (Guestrin et al. 2001)
– Cutting plane method (Schuurmans & Patrascu 2002):
– Problem: Exponential in the treewidth of the dependency graph that represents the constraint space in ALP
• Approximate methods
– Monte Carlo constraint sampling (de Farias & Van Roy 2004)
– Markov Chain Monte Carlo (MCMC) constraint sampling (Kveton & Hauskrecht 2005)
– Problem: Stochastic nature and slow convergence in practice
  (x, a)^{(t)} = argmax_{x, a} [ R(x, a) − Σ_i w_i F_i(x, a) ]
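The separation step of the cutting plane method (finding the most violated constraint for the current weights) can be sketched by brute-force enumeration; a real implementation exploits the cost network structure instead. The reward and features below are made-up stand-ins so the search itself can be demonstrated:

```python
import itertools

def R(x, a):
    return float(sum(x))              # illustrative reward: machines up

def F(i, x, a):
    # Stand-in for F_i(x,a) = f_i(x) - gamma * E[f_i(x') | x, a];
    # not a real transition model, just enough to run the search.
    return float(x[i]) - 0.9 * float(a[i])

def most_violated(w, n=3):
    # Maximize R(x, a) - sum_i w_i * F_i(x, a) over all (x, a) pairs.
    best, best_val = None, float("-inf")
    for x in itertools.product([0, 1], repeat=n):
        for a in itertools.product([0, 1], repeat=n):
            val = R(x, a) - sum(w[i] * F(i, x, a) for i in range(n))
            if val > best_val:
                best, best_val = (x, a), val
    return best, best_val

(x, a), v = most_violated([1.0, 1.0, 1.0])
print(x, a, v)
```

If the returned value is positive, the corresponding constraint is violated and gets added to the LP; the method terminates when no violated constraint remains.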
Partitioned ALP Approximations
• Decompose the ALP constraint space (with a large treewidth) into a set of constraint subspaces (with small treewidths)
[Figure: the ALP constraint space, represented by a cost network of treewidth 2, decomposed into constraint subspaces #1–#3, each of treewidth 1.]
Partitioned ALP Approximations
• Partitioned ALP (PALP) formulation with K constraint spaces is given by a linear program:
  minimize_w   Σ_i w_i α_i
  subject to   D M^w(x, a)^T ≥ 0   for all x ∈ X, a ∈ A

where D = [d_{k,j}] is the partitioning matrix (one row per constraint subspace, illustrated here with K = 3) and M^w(x, a)^T is the column vector of instantiated cost network terms.

[Figure: constraint subspaces #1–#3.]
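The key feasibility property behind the formulation can be sketched numerically: if each partition's partial sum of cost-network terms is nonnegative, the total is a sum of nonnegative numbers, so the original ALP constraint holds. The terms and the 0/1 matrix D below are made up for illustration:

```python
import numpy as np

# One ALP constraint at (x, a) is sum_j m_j >= 0, where m = M^w(x, a)
# collects the instantiated cost-network terms. PALP replaces it with
# the stronger per-subspace constraints D @ m >= 0.
m = np.array([0.4, -0.1, 0.3, -0.2, 0.5, 0.1])   # instantiated terms

D = np.array([[1, 1, 0, 0, 0, 0],    # constraint subspace #1
              [0, 0, 1, 1, 0, 0],    # constraint subspace #2
              [0, 0, 0, 0, 1, 1]],   # constraint subspace #3
             dtype=float)

palp_ok = np.all(D @ m >= 0)         # each subspace's partial sum >= 0
alp_ok = m.sum() >= 0                # the original ALP constraint

# When the rows of D cover every term with nonnegative weights that sum
# to 1 per column (a convex decomposition), PALP feasibility implies ALP
# feasibility: the total is a sum of the nonnegative partial sums.
print(palp_ok, alp_ok)   # True True
```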
Partitioned ALP Approximations
• Partitioned ALP (PALP) formulation with K constraint spaces is given by a linear program:
• When the decomposition D is convex, a solution to the PALP formulation is feasible in the corresponding ALP formulation
• The PALP formulation is feasible if the set of basis functions includes a constant basis function f0(x) ≡ 1
Interpreting PALP Approximations
• PALP can be viewed as solving K MDPs with overlapping state and action spaces, and shared value functions:
[Figure: MDPs #1–#3 induced by the rows of the partitioning matrix D.]

  minimize_w   Σ_i d_{1,i} w_i α_i + Σ_i d_{2,i} w_i α_i + Σ_i d_{3,i} w_i α_i
  subject to   D M^w(x, a)^T ≥ 0   for all x ∈ X, a ∈ A

Each block of constraints defines one of the K MDPs; the MDPs share the weights w.
Partitioning Matrix D
• To achieve high quality and tractable approximations, the K constraint spaces should preserve critical dependencies in the MDP and have a small treewidth
• How to generate the best PALP approximation within a given complexity limit is an open question
• In the experimental section, we build a constraint space for every node in the ALP cost network and its neighbors
[Figure: constraint subspaces #1–#3.]
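The neighborhood heuristic from the last bullet (one constraint space per node of the ALP cost network plus its neighbors) can be sketched as follows; the 4-node ring graph is illustrative:

```python
# Sketch: one constraint subspace per node of the cost network, containing
# the node and its neighbors (the heuristic used in the experiments).
graph = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}   # a 4-node ring

subspaces = {v: {v} | graph[v] for v in graph}
for v, nodes in sorted(subspaces.items()):
    print(v, sorted(nodes))
# e.g. the subspace for node 1 covers nodes {1, 2, 4}
```

Each such subspace has a small treewidth by construction, while still preserving the dependencies between a node and its direct neighbors.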
Solving PALP Approximations
• PALP formulations can be solved by exact methods for solving ALP formulations
• In the experimental section, we use the cutting plane method for solving linear programs
Theoretical Analysis
• PALP value functions are upper bounds on the optimal value function V*
• PALP minimizes the L1-norm error between the optimal value function V* and our value function approximation
• The quality of PALP solutions can be bounded as follows:
• PALP generates a close approximation to the optimal value function V* if V* lies in the span of basis functions and the penalty δ for partitioning the ALP constraint space is small
  || V* − V^{w̃} ||_{1,ψ} ≤ (2 / (1 − γ)) min_w || V* − V^w ||_∞ + K δ / (1 − γ)

  || V* − V^{w̃} ||_{1,ψ} … the L1-norm error of the PALP value function
  min_w || V* − V^w ||_∞ … the minimum max-norm error of the linear value function approximation
  δ … the hardness of making an ALP solution feasible in the PALP formulation
Experiments
• Demonstrate the quality and scale-up potential of partitioned ALP approximations
• Comparison to exact and Monte Carlo ALP approximations on three topologies of the network administration problem
[Figure: ring, ring-of-rings, and grid topologies of the network administration problem. The treewidth of the grid topology grows with the number of computers.]
Experimental Results
• Evaluation by the quality of policies (relative to the reward of ALP policies) and computation time
[Figure: reward relative to ALP policies (top row) and computation time in seconds (bottom row) for the ring, ring-of-rings, and grid topologies; problem sizes n = 6–30 (grid: 2x2–10x10); methods compared: ALP, PALP, and MC ALP.]
Experimental Results
• The quality of PALP policies is almost as high as the quality of ALP policies
Experimental Results
• Magnitudes of ALP and PALP weights differ, but the weights exhibit similar trends
[Figure: ALP weights w_i (left) and PALP weights w_i (right) versus basis function index i, for the 7x7 grid topology.]
Experimental Results
• PALP policies can be computed significantly faster than ALP policies
[Figure: same plots as above; the computation-time panels show an exponential speedup of PALP over ALP on the grid topology.]
Experimental Results
• PALP policies are superior to ALP policies obtained by Monte Carlo constraint sampling
Conclusions and Future Work
• Conclusions
– A novel approach to ALP that allows for satisfying ALP constraints without an exponential dependence on their treewidth
– Natural tradeoff between the quality and computation time of ALP solutions
– Bounds on the quality of learned policies
– Evaluation on a challenging synthetic problem
• Future work
– Learning of a good partitioning matrix D and the problem of exact inference in Bayesian networks with a large treewidth
– Evaluate PALP on a large-scale real-world planning problem