
Optimal Learning On A Graph

INFORMS Annual Meeting, October 11, 2009

Ilya O. Ryzhov, Warren Powell

Princeton University

© 2009 Ilya O. Ryzhov, Princeton University


Motivation: learning on a graph

» Need to quickly plan the fastest (least congested) travel route

» GPS-enabled smartphones in the area can provide an estimate of local congestion

» We can make a small number of queries before we have to recommend a route

» Which areas should we measure in the limited time available?

» We are solving a problem on a graph, but we can measure any individual component of the graph at any time


Information collection on a graph

We have a shortest-path problem on a graph $G = (V, E)$:

If the edge lengths $\mu_{ij}$, $(i,j) \in E$, were deterministic, the problem would have a simple solution

» Algorithms by Bellman, Bellman-Ford, Dijkstra…
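Since the deterministic problem is the building block for everything that follows, a minimal sketch of Dijkstra's algorithm may help. The dict-of-dicts graph representation and the function name are illustrative choices, not from the talk:

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path with deterministic, nonnegative edge lengths.

    graph: {node: {neighbor: edge_length}}.
    Returns (node list of the shortest path, its total length).
    """
    dist = {source: 0.0}      # best known distance to each node
    prev = {}                 # predecessor on the best known path
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:  # walk predecessors back to the source
        path.append(prev[path[-1]])
    return path[::-1], dist[target]
```

For example, `dijkstra({1: {2: 1.0, 3: 4.0}, 2: {3: 1.0}}, 1, 3)` returns `([1, 2, 3], 2.0)`.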


Information collection on a graph

We have a shortest-path problem on a graph:

If the edge lengths $\mu_{ij}$ were stochastic with known distribution:

» We could run a deterministic shortest-path algorithm with edge lengths given by $\mathbb{E}\,\mu_{ij}$

» We could compute or approximate the distribution of the stochastic shortest path (Kulkarni 1986, Fan et al. 2005, Peer & Sharma 2007)


Information collection on a graph

We have a shortest-path problem on a graph:

In the problem of learning on a graph, the edge lengths $\mu_{ij}$ are stochastic, with unknown distribution

We use Bayesian statistics to learn the distributions sequentially


Information collection on a graph

At first, we believe that $\mu_{ij} \sim N(\mu^0_{ij}, 1/\beta^0_{ij})$

But we measure this edge and observe $\hat{\mu}^1_{ij} \sim N(\mu_{ij}, 1/\beta^\varepsilon)$

Our beliefs change:

$\beta^1_{ij} = \beta^0_{ij} + \beta^\varepsilon, \qquad \mu^1_{ij} = \frac{\beta^0_{ij}\,\mu^0_{ij} + \beta^\varepsilon\,\hat{\mu}^1_{ij}}{\beta^1_{ij}}$

Thus, our beliefs about the rewards are gradually improved over measurements
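The conjugate normal update above takes only a few lines. The variable names are mine; `beta_eps` stands for the measurement precision $\beta^\varepsilon$:

```python
def update_beliefs(mu, beta, observation, beta_eps):
    """Conjugate normal update for one edge's length belief.

    mu, beta: prior mean and precision (1/variance) for the edge.
    observation: the measured edge length, with noise precision beta_eps.
    Returns the posterior (mean, precision).
    """
    beta_post = beta + beta_eps  # precisions add
    # posterior mean is the precision-weighted average of prior and observation
    mu_post = (beta * mu + beta_eps * observation) / beta_post
    return mu_post, beta_post
```

With equal prior and noise precision the posterior mean lands halfway: `update_beliefs(10.0, 1.0, 12.0, 1.0)` gives `(11.0, 2.0)`.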


Information collection on a graph

After n measurements, our beliefs about the entire graph are encoded in the knowledge state $s^n = (\mu^n, \beta^n)$

We can solve a deterministic shortest-path problem with edge lengths given by $\mu^n_{ij}$

This gives us a path $p^n$ that seems to be the shortest, based on our beliefs

» The length of this path is believed to be $V^n(s^n) = \min_p \sum_{(i,j) \in p} \mu^n_{ij}$

This is not necessarily the real shortest path

» The true length of path $p^n$ is $\sum_{(i,j) \in p^n} \mu_{ij}$

» The true length of the real shortest path is $V = \min_p \sum_{(i,j) \in p} \mu_{ij}$
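The believed and true lengths of $p^n$ are sums along the same node list, just with different edge-length dictionaries. A small helper (names illustrative) makes the distinction concrete:

```python
def path_length(path, lengths):
    """Sum edge lengths along a path given as a list of nodes.

    lengths: {(i, j): length} -- pass the estimates mu^n for the
    believed length, or the true means mu for the true length.
    """
    return sum(lengths[(i, j)] for i, j in zip(path[:-1], path[1:]))
```

With the same path, `path_length(p, mu_n)` and `path_length(p, mu_true)` can differ; that gap is exactly what the measurements are meant to shrink.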


Information collection on a graph

Optimal routing over a graph:

» The best path according to our beliefs

The black path is the path $p^n$, with time-n length $V^n(s^n)$.


Information collection on a graph

Optimal routing over a graph:

» The best path according to our beliefs
» The edge we measure

The black path is the path $p^n$, with time-n length $V^n(s^n)$.


Information collection on a graph

Optimal routing over a graph:

» The best path according to our beliefs
» The edge we measure
» The best path according to our new beliefs
» How do we decide which links to measure?

The black path is the path $p^{n+1}$, with time-(n+1) length $V^{n+1}(s^{n+1})$.


Learning policies

Let $X^n(s^n) \in E$ be a function that takes the knowledge state and gives us an edge to measure

A learning policy $\pi$ is a set of such functions $(X^{\pi,0}, \ldots, X^{\pi,N-1})$

Simple examples of learning policies:

» Pure exploitation: find the time-n shortest path $p^n$, then measure the shortest edge on that path:
$X^{Exp,n}(s^n) = \arg\min_{(i,j) \in p^n} \mu^n_{ij}$

» Variance-exploitation: find the time-n shortest path $p^n$, and then measure the edge that we are least certain about:
$X^{VExp,n}(s^n) = \arg\min_{(i,j) \in p^n} \beta^n_{ij}$
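Both baseline policies reduce to an argmin over the edges of the current estimated best path. A sketch, assuming the path is a node list and the beliefs are dicts keyed by edge:

```python
def path_edges(path):
    """Consecutive node pairs along a path."""
    return list(zip(path[:-1], path[1:]))

def pure_exploitation(path_n, mu_n):
    """Measure the shortest edge (smallest estimated mean) on the path."""
    return min(path_edges(path_n), key=lambda e: mu_n[e])

def variance_exploitation(path_n, beta_n):
    """Measure the least-certain edge (smallest precision) on the path."""
    return min(path_edges(path_n), key=lambda e: beta_n[e])
```

Note that both policies only ever look at edges on $p^n$, which is exactly why they can get stuck; the KG rule introduced later does not have this restriction.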


Implementation policies

The problem consists of two phases:

» Learning ($n < N$): measuring individual edges
» Implementation ($n = N$): choosing a path

An implementation policy is a single function $p(s^N)$ which maps the final state $s^N$ to some path

Simple examples of implementation policies:

» Find the path $p^N$: solve a deterministic shortest-path problem with edge lengths given by $\mu^N_{ij}$

» $\alpha$-percentile: solve a deterministic shortest-path problem with edge lengths given by $\mu^N_{ij} + z_\alpha \sigma^N_{ij}$
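The $\alpha$-percentile lengths are straightforward to compute from the final beliefs, using $\sigma^N_{ij} = 1/\sqrt{\beta^N_{ij}}$. The function name is mine:

```python
from statistics import NormalDist

def percentile_lengths(mu_N, beta_N, alpha):
    """Edge lengths mu + z_alpha * sigma for the alpha-percentile policy.

    mu_N, beta_N: final means and precisions, keyed by edge (i, j).
    """
    z_alpha = NormalDist().inv_cdf(alpha)  # standard normal alpha-quantile
    return {e: mu_N[e] + z_alpha * beta_N[e] ** -0.5 for e in mu_N}
```

Choosing $\alpha = 0.5$ recovers the plain means (so the policy coincides with finding $p^N$), while $\alpha > 0.5$ penalizes uncertain edges.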


Objective function

Choose a measurement policy $\pi$ and an implementation policy $\rho$ to minimize the true length of the path chosen by the implementation policy

Objective: $\inf_\pi \inf_\rho \mathbb{E} \sum_{(i,j) \in p^\rho(s^N)} \mu_{ij}$.


Learning policies

Theorem. The best possible implementation policy is the one that finds the path $p^N$:

$\inf_\pi \inf_\rho \mathbb{E}\, V^\rho = \inf_\pi \mathbb{E}\, V^N(s^N)$

This result eliminates the problem of finding an implementation policy

We only have to find a learning policy that makes our estimate $V^N(s^N)$ of the shortest path small


The KG decision rule: one-period look-ahead

The KG rule chooses an edge to maximize the expected one-period improvement in our estimate of the shortest path:

$X^{KG}(s^n) = \arg\max_{(i,j)} \nu^{KG,n}_{ij} = \arg\max_{(i,j)} \mathbb{E}^n\left[ V^n(s^n) - V^{n+1}(s^{n+1}) \right]$


Learning using knowledge gradients

Proposition. If we measure the edge $(i,j)$ at time n, then the best path at time n+1 (the path $p^{n+1}$ that achieves $V^{n+1}(s^{n+1})$) will be either

» The best time-n path containing the edge $(i,j)$, or
» The best time-n path not containing the edge $(i,j)$.

At time n, we know that the best time-(n+1) path can only be one of two things


Computation of the knowledge gradient

[Figure: the best path containing the edge $(i,j)$, highlighted on the graph]


Computation of the knowledge gradient

ji,

The best path not containing the edge

nij


Main result: KG formula

It can be shown that

$\nu^{KG,n}_{ij} = \tilde{\sigma}^n_{ij}\, f\!\left( -\frac{\left| V^n_{(i,j)} - V^n_{-(i,j)} \right|}{\tilde{\sigma}^n_{ij}} \right)$

where

$\tilde{\sigma}^n_{ij} = \sqrt{\frac{1}{\beta^n_{ij}} - \frac{1}{\beta^n_{ij} + \beta^\varepsilon}}$

and

$f(z) = z\,\Phi(z) + \varphi(z)$

$\Phi, \varphi$ — standard normal cdf and pdf

$V^n_{(i,j)}$ — time-n length of the best path containing $(i,j)$

$V^n_{-(i,j)}$ — time-n length of the best path not containing $(i,j)$

The marginal value of a measurement is bigger if these values are closer together
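The formula above transcribes directly into code using only the standard library. Argument names are mine; `beta_eps` is the measurement precision, and the two path lengths are the values defined above:

```python
from math import erf, exp, pi, sqrt

def kg_factor(v_with, v_without, beta, beta_eps):
    """Knowledge-gradient factor for one edge.

    v_with / v_without: time-n lengths of the best path containing /
    not containing the edge; beta: the edge's current precision.
    """
    # reduction in standard deviation from one more measurement
    sigma_tilde = sqrt(1.0 / beta - 1.0 / (beta + beta_eps))
    z = -abs(v_with - v_without) / sigma_tilde
    Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal cdf
    phi = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal pdf
    return sigma_tilde * (z * Phi + phi)      # sigma_tilde * f(z)
```

Consistent with the last remark, `kg_factor(10.0, 10.5, 1.0, 1.0)` exceeds `kg_factor(10.0, 12.0, 1.0, 1.0)`: the closer the two path lengths, the more a measurement is worth.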


Asymptotic optimality property

Jensen's inequality gives a global lower bound on the value of any policy:

$\mathbb{E}\, V^N(s^N) \geq \mathbb{E}\, V$

Theorem. If the number of measurements is infinite, the KG policy attains the global lower bound:

$\lim_{N \to \infty} \mathbb{E}\, V^N(s^{N,KG}) = \mathbb{E}\, V$

If we have infinitely many measurements, then the KG policy will find the true shortest path.


Asymptotic optimality property

The proof is technical, but the key details are:

» The KG factor $\nu^{KG,n}_{ij}$ of an edge is zero if and only if the length of that edge is known perfectly (with infinite precision)
» It can be shown that the KG factor is continuous in the knowledge state
» The precision $\beta^n_{ij}$ always increases when we measure $(i,j)$
» As we measure $(i,j)$ more often, we have $\nu^{KG,n}_{ij} \to 0$
» Since we measure the edge with the largest KG, eventually we will switch over to another edge


Asymptotic optimality property

There are many simple methods that are asymptotically optimal

» If we have infinitely many measurements, we could just measure every edge in a round-robin fashion

However, KG is also myopically optimal:

» If N = 1, KG allocates the sole measurement optimally

KG is the only stationary method that is both myopically and asymptotically optimal

This suggests that KG may yield good performance for general finite time horizons


Knowledge gradient on a graph

Consider a simple layered graph (14 nodes, 24 edges)

The true shortest path is highlighted in black

The path that we think is the shortest is highlighted in blue

Let's see how the KG method changes our beliefs about the best path

[Figure: layered graph with nodes 1–14]



Knowledge gradient on a graph

Edge measured by KG: (5,8)

Our beliefs about this edge have increased enough to change our beliefs about the best path!



Knowledge gradient on a graph

Edge measured by KG: (1,5)

Our beliefs about this edge have increased enough to change our beliefs about the best path!



Knowledge gradient on a graph

Edge measured by KG: (2,7)

Not every measurement changes our beliefs about the best path…



Knowledge gradient on a graph

Edge measured by KG: (7,10)

Notice how we always measure edges that are close to the blue path, but not always on it



Knowledge gradient on a graph

Edges measured: (1,2), (5,8), (1,5), (2,7), (7,10)

We have found the best path!



Experimental results

Ten layered graphs (22 nodes, 50 edges)

Ten larger layered graphs (38 nodes, 102 edges)


Conclusion

We have defined a new class of optimal learning problems, beyond the scope of the traditional literature

We have derived a one-period look-ahead method for the problem of learning on a graph

The method produces an easily computable decision rule and has certain theoretical advantages

» Optimal for N=1 by design: if we have only one measurement, we get as much value out of it as possible

» Asymptotic optimality: if we have infinitely many measurements, we find the true shortest path

Experimental evidence shows that KG performs well for values of N in between