© 2009 Ilya O. Ryzhov · © 2008 Warren B. Powell

Optimal Learning on a Graph

INFORMS Annual Meeting, October 11, 2009
Ilya O. Ryzhov, Warren B. Powell
Princeton University
Motivation: learning on a graph
» Need to quickly plan the fastest (least congested) travel route
» GPS-enabled smartphones in the area can provide an estimate of local congestion
» We can make a small number of queries before we have to recommend a route
» Which areas should we measure in the limited time available?
» We are solving a problem on a graph, but we can measure any individual component of the graph at any time
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
If the edge lengths μ_ij, (i,j) ∈ E, were deterministic, the problem would have a simple solution
» Algorithms by Bellman, Bellman-Ford, Dijkstra…
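The deterministic baseline can be sketched with Dijkstra's algorithm using only the standard library; the adjacency-list encoding and function name here are illustrative, not from the slides:

```python
import heapq

def dijkstra(graph, source, target):
    """Shortest path with non-negative edge lengths.

    graph: dict mapping node -> list of (neighbor, length) pairs.
    Returns (length, path) or (inf, None) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            # Walk predecessor links back to the source.
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, length in graph.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None
```

Running it on a four-node example returns both the length and the node sequence of the shortest path.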
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
If the edge lengths μ_ij were stochastic with known distribution:
» We could run a deterministic shortest-path algorithm with edge lengths given by E[μ_ij]
» We could compute or approximate the distribution of the stochastic shortest path (Kulkarni 1986, Fan et al. 2005, Peer & Sharma 2007)
Information collection on a graph

We have a shortest-path problem on a graph G = (V, E)
In the problem of learning on a graph, the edge lengths μ_ij are stochastic, with unknown distribution
We use Bayesian statistics to learn the distributions sequentially
Information collection on a graph

At first, we believe that μ_ij ~ N(θ⁰_ij, 1/β⁰_ij)
But we measure this edge and observe μ̂¹_ij ~ N(μ_ij, 1/β^ε)
Our beliefs change:

    β¹_ij = β⁰_ij + β^ε
    θ¹_ij = (β⁰_ij θ⁰_ij + β^ε μ̂¹_ij) / β¹_ij

Thus, our beliefs about the edge lengths are gradually improved over measurements
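The updating equations above are the standard normal-normal conjugate update, and they translate directly into code; a minimal sketch (function and variable names are illustrative):

```python
def update_belief(theta, beta, obs, beta_eps):
    """One Bayesian update of a normal belief N(theta, 1/beta).

    obs is a measurement observed with precision beta_eps.
    Returns the posterior mean and precision.
    """
    # Precisions add; the posterior mean is a precision-weighted average.
    beta_new = beta + beta_eps
    theta_new = (beta * theta + beta_eps * obs) / beta_new
    return theta_new, beta_new
```

With equal prior and measurement precision, the posterior mean lands halfway between the prior mean and the observation.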
Information collection on a graph

After n measurements, our beliefs about the entire graph are encoded in the knowledge state: s^n = (θ^n, β^n)
We can solve a deterministic shortest-path problem with edge lengths given by θ^n_ij
This gives us a path p^n that seems to be the shortest, based on our beliefs
» The length of this path is believed to be V^n(s^n) = min_p Σ_{(i,j)∈p} θ^n_ij
This is not necessarily the real shortest path
» The true length of path p^n is Σ_{(i,j)∈p^n} μ_ij
» The true length of the real shortest path is V = min_p Σ_{(i,j)∈p} μ_ij
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs

The black path is the path p^n, with time-n length V^n(s^n).
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure

The black path is the path p^n, with time-n length V^n(s^n).
Information collection on a graph

Optimal routing over a graph
» The best path according to our beliefs
» The edge we measure
» The best path according to our new beliefs
» How do we decide which links to measure?

The black path is the path p^{n+1}, with time-(n+1) length V^{n+1}(s^{n+1}).
Learning policies

Let X^n be a function that takes the knowledge state s^n and gives us an edge to measure: X^n(s^n) ∈ E
A learning policy π is a set of such functions (X^{π,0}, …, X^{π,N-1})
Simple examples of learning policies:
» Pure exploitation: find the time-n shortest path p^n, then measure the shortest edge on that path:

      X^{Exp,n}(s^n) = argmin_{(i,j)∈p^n} θ^n_ij

» Variance-exploitation: find the time-n shortest path p^n, and then measure the edge that we are least certain about:

      X^{ExpV,n}(s^n) = argmin_{(i,j)∈p^n} β^n_ij
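The two example rules above can be sketched in a few lines, assuming the current best path p^n is given as a list of edges and the beliefs θ^n, β^n are dicts keyed by edge; reading "shortest edge" as smallest estimated length and "least certain" as smallest precision is an interpretation of the slide, not a quote:

```python
def pure_exploitation(path_edges, theta):
    """Measure the edge with the smallest estimated length on the current best path."""
    return min(path_edges, key=lambda e: theta[e])

def variance_exploitation(path_edges, beta):
    """Measure the edge on the current best path whose length we know least
    precisely (smallest precision beta)."""
    return min(path_edges, key=lambda e: beta[e])
```

Both rules only ever look at edges on the believed-best path, which is exactly why they can get stuck: they never explore off-path edges.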
Implementation policies

The problem consists of two phases:
» Learning (n < N): measuring individual edges
» Implementation (n = N): choosing a path
An implementation policy is a single function which maps the final state s^N to some path p(s^N)
Simple examples of implementation policies:
» Find the path p^N: solve a deterministic shortest-path problem with edge lengths given by θ^N_ij
» α-percentile: solve a deterministic shortest-path problem with edge lengths given by θ^N_ij + z_α / √β^N_ij
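The percentile rule amounts to inflating each edge estimate by a risk margin before solving the deterministic problem. A sketch, assuming the risk-adjusted length takes the form θ^N_ij + z_α/√β^N_ij (the exact formula was garbled in extraction, so this form is an assumption):

```python
import math

def percentile_lengths(theta, beta, z_alpha):
    """Risk-adjusted edge lengths: estimate plus z_alpha standard deviations.

    theta, beta: dicts of posterior means and precisions keyed by edge.
    The posterior standard deviation of an edge is 1/sqrt(beta).
    """
    return {e: theta[e] + z_alpha / math.sqrt(beta[e]) for e in theta}
```

These adjusted lengths can then be fed to any deterministic shortest-path solver; imprecisely known edges are penalized most.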
Objective function

Choose a measurement policy π and an implementation policy ρ to minimize the true length of the path chosen by the implementation policy

Objective:  inf_π inf_ρ E^π Σ_{(i,j)∈ρ(s^N)} μ_ij
Learning policies

Theorem.  inf_π inf_ρ E^π Σ_{(i,j)∈ρ(s^N)} μ_ij = inf_π E^π V^N(s^N)

The best possible implementation policy is the one that finds the path p^N
This result eliminates the problem of finding an implementation policy
We only have to find a learning policy that makes our estimate V^N(s^N) of the shortest path small
The KG decision rule: one-period look-ahead

The KG rule chooses an edge to maximize the expected one-period improvement in our estimate of the shortest path:

    X^{KG,n}(s^n) = argmax_{(i,j)} E^n[ V^n(s^n) − V^{n+1}(s^{n+1}) | X^n = (i,j) ]
                  = argmax_{(i,j)} ν^{KG,n}_ij
Learning using knowledge gradients

Proposition. If we measure the edge (i,j) at time n, then the best path at time n+1 (the path p^{n+1} that achieves V^{n+1}(s^{n+1})) will be either
» The best time-n path containing the edge (i,j), or
» The best time-n path not containing the edge (i,j).

At time n, we know that the best time-(n+1) path can only be one of two things
Computation of the knowledge gradient

The best path containing the edge (i,j)
Computation of the knowledge gradient

The best path not containing the edge (i,j)
Main result: KG formula

It can be shown that

    ν^{KG,n}_ij = σ̃^n_ij · f( −| V̄^n_ij − V̲^n_ij | / σ̃^n_ij )

where

    σ̃^n_ij = √( 1/β^n_ij − 1/(β^n_ij + β^ε) )
    f(z) = z Φ(z) + φ(z)

Φ, φ - standard normal cdf and pdf
V̄^n_ij - time-n length of the best path containing (i,j)
V̲^n_ij - time-n length of the best path not containing (i,j)

The marginal value of a measurement is bigger if these values are closer together
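The formula is cheap to evaluate once the two path lengths are known; a sketch using only the standard library, with Φ built from math.erf (the form of σ̃ is reconstructed from the garbled slide, so treat it as an assumption):

```python
import math

def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def kg_factor(v_with, v_without, beta, beta_eps):
    """KG factor of one edge.

    v_with / v_without: time-n lengths of the best paths containing /
    not containing the edge; beta: current precision of the edge belief;
    beta_eps: measurement precision.
    """
    # Predictive reduction in standard deviation from one measurement.
    sigma_tilde = math.sqrt(1.0 / beta - 1.0 / (beta + beta_eps))
    z = -abs(v_with - v_without) / sigma_tilde
    # f(z) = z * Phi(z) + phi(z)
    return sigma_tilde * (z * normal_cdf(z) + normal_pdf(z))
```

Note how the factor is largest when the two path lengths tie (z = 0) and decays quickly as they separate, matching the remark on the slide.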
Asymptotic optimality property

Jensen's inequality gives a global lower bound on the value of any policy:

    E^π V^N(s^N) ≥ E V

Theorem. If the number of measurements is infinite, the KG policy attains the global lower bound:

    lim_{N→∞} E^{KG} V^N(s^N) = E V

If we have infinitely many measurements, then the KG policy will find the true shortest path.
Asymptotic optimality property

The proof is technical, but the key detail is:
The KG factor of an edge is zero if and only if the length of that edge is known perfectly (with infinite precision): ν^{KG,n}_ij = 0 iff β^n_ij = ∞
It can be shown that the KG factor is continuous in β^n_ij
The precision β^n_ij always increases when we measure (i,j)
As we measure (i,j) more often, we have ν^{KG,n}_ij → 0
Since we measure the edge with the largest KG factor, eventually we will switch over to another edge
Asymptotic optimality property
There are many simple methods that are asymptotically optimal
» If we have infinitely many measurements, we could just measure every edge in a round-robin fashion
However, KG is also myopically optimal» If N=1, KG allocates the sole measurement optimally
KG is the only stationary method that is both myopically and asymptotically optimal
This suggests that KG may yield good performance for general finite time horizons
Knowledge gradient on a graph

Consider a simple layered graph (14 nodes, 24 edges)
The true shortest path is highlighted in black
The path that we think is the shortest is highlighted in blue
Let's see how the KG method changes our beliefs about the best path
Knowledge gradient on a graph

Edge measured by KG: (5,8)
Our estimate of this edge has changed enough to change our beliefs about the best path!
Knowledge gradient on a graph

Edge measured by KG: (1,5)
Our estimate of this edge has changed enough to change our beliefs about the best path!
Knowledge gradient on a graph

Edge measured by KG: (2,7)
Not every measurement changes our beliefs about the best path…
Knowledge gradient on a graph

Edge measured by KG: (7,10)
Notice how we always measure edges that are close to the blue path, but not always on it
Knowledge gradient on a graph

Edges measured: (1,2), (5,8), (1,5), (2,7), (7,10)
We have found the best path!
Experimental results

» Ten layered graphs (22 nodes, 50 edges)
» Ten larger layered graphs (38 nodes, 102 edges)
Conclusion
We have defined a new class of optimal learning problems, beyond the scope of the traditional literature
We have derived a one-period look-ahead method for the problem of learning on a graph
The method produces an easily computable decision rule and has certain theoretical advantages
» Optimal for N=1 by design: if we have only one measurement, we get as much value out of it as possible
» Asymptotic optimality: if we have infinitely many measurements, we find the true shortest path
Experimental evidence shows that KG performs well for values of N in between