© 2009 Ilya O. Ryzhov · © 2008 Warren B. Powell

Optimal Learning on a Graph

INFORMS Annual Meeting, October 11, 2009
Ilya O. Ryzhov, Warren B. Powell
Princeton University
Motivation: learning on a graph
» Need to quickly plan the fastest (least congested) travel route
» GPS-enabled smartphones in the area can provide an estimate of local congestion
» We can make a small number of queries before we have to recommend a route
» Which areas should we measure in the limited time available?
» We are solving a problem on a graph, but we can measure any individual component of the graph at any time
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
If the edge lengths μ_ij, (i,j) ∈ E, were deterministic, the problem would have a simple solution
» Algorithms by Bellman, Bellman-Ford, Dijkstra…
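The deterministic baseline can be sketched with Dijkstra's algorithm using only the standard library; the adjacency-list encoding and function name here are illustrative, not from the slides:

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path with non-negative edge lengths.

    graph: dict mapping node -> list of (neighbor, length) pairs.
    Returns (length, path) or (inf, None) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            # Walk predecessor links back to the source.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None
```

Running it on a four-node example returns both the length and the node sequence of the shortest path.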
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
If the edge lengths μ_ij were stochastic with known distribution:
» We could run a deterministic shortest-path algorithm with edge lengths given by E[μ_ij]
» We could compute or approximate the distribution of the stochastic shortest path (Kulkarni 1986, Fan et al. 2005, Peer & Sharma 2007)
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
In the problem of learning on a graph, the edge lengths μ_ij are stochastic, with unknown distribution
We use Bayesian statistics to learn the distributions sequentially
Information collection on a graph

At first, we believe that μ_ij ~ N(θ⁰_ij, 1/β⁰_ij)
But we measure this edge and observe μ̂¹_ij ~ N(μ_ij, 1/β^ε)
Our beliefs change:

    β¹_ij = β⁰_ij + β^ε
    θ¹_ij = (β⁰_ij θ⁰_ij + β^ε μ̂¹_ij) / β¹_ij

Thus, our beliefs about the edge lengths are gradually improved over measurements
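The updating equations above are the standard normal-normal conjugate update, and they translate directly into code; a minimal sketch (function and variable names are illustrative):

```python
def update_belief(theta, beta, obs, beta_eps):
    """One Bayesian update of a normal belief N(theta, 1/beta).

    obs is a measurement observed with precision beta_eps.
    Returns the posterior mean and precision.
    """
    # Precisions add; the posterior mean is a precision-weighted average.
    beta_new = beta + beta_eps
    theta_new = (beta * theta + beta_eps * obs) / beta_new
    return theta_new, beta_new
```

With equal prior and measurement precision, the posterior mean lands halfway between the prior mean and the observation.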
Information collection on a graph

After n measurements, our beliefs about the entire graph are encoded in the knowledge state: s^n = (θ^n, β^n)
We can solve a deterministic shortest-path problem with edge lengths given by θ^n_ij
This gives us a path p^n that seems to be the shortest, based on our beliefs
» The length of this path is believed to be V^n(s^n) = min_p Σ_{(i,j)∈p} θ^n_ij
This is not necessarily the real shortest path
» The true length of path p^n is Σ_{(i,j)∈p^n} μ_ij
» The true length of the real shortest path is V = min_p Σ_{(i,j)∈p} μ_ij
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs

The black path is the path p^n, with time-n length V^n(s^n).
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure

The black path is the path p^n, with time-n length V^n(s^n).
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure
» The best path according to our new beliefs
» How do we decide which links to measure?

The black path is the path p^{n+1}, with time-(n+1) length V^{n+1}(s^{n+1}).
Learning policies

Let X^n be a function that takes the knowledge state s^n and gives us an edge to measure: X^n(s^n) ∈ E
A learning policy π is a set of such functions (X^{π,0}, …, X^{π,N-1})
Simple examples of learning policies:
» Pure exploitation: find the time-n shortest path p^n, then measure the shortest edge on that path:

      X^{Exp,n}(s^n) = argmin_{(i,j)∈p^n} θ^n_ij

» Variance-exploitation: find the time-n shortest path p^n, and then measure the edge that we are least certain about:

      X^{ExpV,n}(s^n) = argmin_{(i,j)∈p^n} β^n_ij
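The two example rules above can be sketched in a few lines, assuming the current best path p^n is given as a list of edges and the beliefs θ^n, β^n are dicts keyed by edge; reading "shortest edge" as smallest estimated length and "least certain" as smallest precision is an interpretation of the slide, not a quote:

```python
def pure_exploitation(path_edges, theta):
    """Measure the edge with the smallest estimated length on the current best path."""
    return min(path_edges, key=lambda e: theta[e])

def variance_exploitation(path_edges, beta):
    """Measure the edge on the current best path whose length we know least
    precisely (smallest precision beta)."""
    return min(path_edges, key=lambda e: beta[e])
```

Both rules only ever look at edges on the believed-best path, which is exactly why they can get stuck: they never explore off-path edges.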
Implementation policies

The problem consists of two phases:
» Learning (n < N): measuring individual edges
» Implementation (n = N): choosing a path
An implementation policy is a single function which maps the final state s^N to some path p(s^N)
Simple examples of implementation policies:
» Find the path p^N: solve a deterministic shortest-path problem with edge lengths given by θ^N_ij
» α-percentile: solve a deterministic shortest-path problem with edge lengths given by θ^N_ij + z_α / √β^N_ij
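The percentile rule amounts to inflating each edge estimate by a risk margin before solving the deterministic problem. A sketch, assuming the risk-adjusted length takes the form θ^N_ij + z_α/√β^N_ij (the exact formula was garbled in extraction, so this form is an assumption):

```python
import math

def percentile_lengths(theta, beta, z_alpha):
    """Risk-adjusted edge lengths: estimate plus z_alpha standard deviations.

    theta, beta: dicts of posterior means and precisions keyed by edge.
    The posterior standard deviation of an edge is 1/sqrt(beta).
    """
    return {e: theta[e] + z_alpha / math.sqrt(beta[e]) for e in theta}
```

These adjusted lengths can then be fed to any deterministic shortest-path solver; imprecisely known edges are penalized most.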
Objective function

Choose a measurement policy π and an implementation policy ρ to minimize the true length of the path chosen by the implementation policy

Objective:  inf_π inf_ρ E^π Σ_{(i,j)∈ρ(s^N)} μ_ij
Learning policies

Theorem.  inf_π inf_ρ E^π Σ_{(i,j)∈ρ(s^N)} μ_ij = inf_π E^π V^N(s^N)

The best possible implementation policy is the one that finds the path p^N
This result eliminates the problem of finding an implementation policy
We only have to find a learning policy that makes our estimate V^N(s^N) of the shortest path small
The KG decision rule: one-period look-ahead

The KG rule chooses an edge to maximize the expected one-period improvement in our estimate of the shortest path:

    X^{KG,n}(s^n) = argmax_{(i,j)} E^n[ V^n(s^n) − V^{n+1}(s^{n+1}) | X^n = (i,j) ]
                  = argmax_{(i,j)} ν^{KG,n}_ij
Learning using knowledge gradients

Proposition. If we measure the edge (i,j) at time n, then the best path at time n+1 (the path p^{n+1} that achieves V^{n+1}(s^{n+1})) will be either
» The best time-n path containing the edge (i,j), or
» The best time-n path not containing the edge (i,j).

At time n, we know that the best time-(n+1) path can only be one of two things
Computation of the knowledge gradient

The best path containing the edge (i,j)
Computation of the knowledge gradient

The best path not containing the edge (i,j)
Main result: KG formula

It can be shown that

    ν^{KG,n}_ij = σ̃^n_ij · f( −| V̄^n_ij − V̲^n_ij | / σ̃^n_ij )

where

    σ̃^n_ij = √( 1/β^n_ij − 1/(β^n_ij + β^ε) )
    f(z) = z Φ(z) + φ(z)

Φ, φ - standard normal cdf and pdf
V̄^n_ij - time-n length of the best path containing (i,j)
V̲^n_ij - time-n length of the best path not containing (i,j)

The marginal value of a measurement is bigger if these values are closer together
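The formula is cheap to evaluate once the two path lengths are known; a sketch using only the standard library, with Φ built from math.erf (the form of σ̃ is reconstructed from the garbled slide, so treat it as an assumption):

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kg_factor(v_with, v_without, beta, beta_eps):
    """KG factor of one edge.

    v_with / v_without: time-n lengths of the best paths containing /
    not containing the edge; beta: current precision of the edge belief;
    beta_eps: measurement precision.
    """
    # Predictive reduction in standard deviation from one measurement.
    sigma_tilde = math.sqrt(1.0 / beta - 1.0 / (beta + beta_eps))
    z = -abs(v_with - v_without) / sigma_tilde
    # f(z) = z * Phi(z) + phi(z)
    return sigma_tilde * (z * normal_cdf(z) + normal_pdf(z))
```

Note how the factor is largest when the two path lengths tie (z = 0) and decays quickly as they separate, matching the remark on the slide.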
Asymptotic optimality property

Jensen's inequality gives a global lower bound on the value of any policy:

    E^π V^N(s^N) ≥ E V

Theorem. If the number of measurements is infinite, the KG policy attains the global lower bound:

    lim_{N→∞} E^{KG} V^N(s^N) = E V

If we have infinitely many measurements, then the KG policy will find the true shortest path.
Asymptotic optimality property

The proof is technical, but the key detail is:
The KG factor of an edge is zero if and only if the length of that edge is known perfectly (with infinite precision): ν^{KG,n}_ij = 0 iff β^n_ij = ∞
It can be shown that the KG factor is continuous in β^n_ij
The precision β^n_ij always increases when we measure (i,j)
As we measure (i,j) more often, we have ν^{KG,n}_ij → 0
Since we measure the edge with the largest KG factor, eventually we will switch over to another edge
Asymptotic optimality property
There are many simple methods that are asymptotically optimal
» If we have infinitely many measurements, we could just measure every edge in a round-robin fashion
However, KG is also myopically optimal» If N=1, KG allocates the sole measurement optimally
KG is the only stationary method that is both myopically and asymptotically optimal
This suggests that KG may yield good performance for general finite time horizons
Knowledge gradient on a graph

Consider a simple layered graph (14 nodes, 24 edges)
The true shortest path is highlighted in black
The path that we think is the shortest is highlighted in blue
Let's see how the KG method changes our beliefs about the best path
Knowledge gradient on a graph

Edge measured by KG: (5,8)
Our estimate of this edge has changed enough to change our beliefs about the best path!
Knowledge gradient on a graph

Edge measured by KG: (1,5)
Our estimate of this edge has changed enough to change our beliefs about the best path!
Knowledge gradient on a graph

Edge measured by KG: (2,7)
Not every measurement changes our beliefs about the best path…
Knowledge gradient on a graph

Edge measured by KG: (7,10)
Notice how we always measure edges that are close to the blue path, but not always on it
Knowledge gradient on a graph

Edges measured: (1,2), (5,8), (1,5), (2,7), (7,10)
We have found the best path!
Experimental results

» Ten layered graphs (22 nodes, 50 edges)
» Ten larger layered graphs (38 nodes, 102 edges)
Conclusion
We have defined a new class of optimal learning problems, beyond the scope of the traditional literature
We have derived a one-period look-ahead method for the problem of learning on a graph
The method produces an easily computable decision rule and has certain theoretical advantages
» Optimal for N=1 by design: if we have only one measurement, we get as much value out of it as possible
» Asymptotic optimality: if we have infinitely many measurements, we find the true shortest path
Experimental evidence shows that KG performs well for values of N in between