
Transcript of Knowledge-Gradient Methods for Efficient Information Collection

Page 1

Knowledge-Gradient Methods for Efficient Information Collection

Peter Frazier
Presenting joint work with Warren Powell, Savas Dayanik, and Diana Negoescu

Department of Operations Research and Financial Engineering
Princeton University

Tuesday, February 3, 2009
Operations Research and Information Engineering
Cornell University

Page 2

Outline

1 Overview of Information Collection Applications

2 Global Optimization of Expensive Continuous Functions
   Problem Description
   Knowledge-Gradient Policy
   Application: Simulation Model Calibration (Schneider)
   Application: Drug Discovery

3 KG Policies for General Offline Problems
   Problem Description and KG Policy
   Convergence

Page 3

Outline

1 Overview of Information Collection Applications

2 Global Optimization of Expensive Continuous Functions
   Problem Description
   Knowledge-Gradient Policy
   Application: Simulation Model Calibration (Schneider)
   Application: Drug Discovery

3 KG Policies for General Offline Problems
   Problem Description and KG Policy
   Convergence

Page 4

Information Collection

We consider information collection problems, in which we must decide how much and of what type of information to collect.

We focus our interest on sequential Bayesian information collection problems.

In making such decisions we trade the benefit of information (the ability to make better decisions in the future) against its cost (money, time, or opportunity cost).

We propose the knowledge-gradient (KG) method as a general way to make information collection decisions.

Page 5

Application: Simulation Optimization

We would like to choose a staffing policy in a hospital to minimize patient waiting time, subject to a cost constraint.

Hospital dynamics under a particular staffing policy cannot be evaluated analytically, but we can estimate them via simulation.

To find a good staffing policy to implement in our hospital, we adaptively choose which sequence of policies to learn about with our simulator.

[Figure: page excerpt from Shi, Chen, and Yucesan (1999) describing optimal computing budget allocation (OCBA) and a stochastic resource allocation problem: buffer allocation in a 10-node supply chain network with two customer classes (arrivals C1: Unif[2,18], C2: Exp(0.12)), class-dependent routing, and finite buffers.]

Source: Shi, Chen, and Yucesan 1999

Page 6

Application: AIDS Treatment and Prevention

We would like to treat and prevent AIDS in Africa.

We are uncertain about the effectiveness of untried prevention methods, but we can learn about them by using them in practice.

To which prevention methods should we allocate our resources?

How should we balance using tried and true methods with using untried methods that may be better?

Page 7

Application: Product Pricing

We would like to dynamically price products to maximize revenue.

We learn about product demand from sales and the prices at which those sales were made.

The information collected depends on the price:

If we price very high, we sell nothing and learn only an upper bound on what people are willing to pay.
If we price very low, we sell to every vaguely interested party, but learn little about how much they are willing to pay.

Page 8

More Information Collection Applications

Design a sequence of focus groups to effectively choose features for a new product to be developed.

Choose which items in a retail store should carry RFID tags.

Decide whether to adopt a new technology now, or to wait and gather more information about how well it works.

Manage a supply chain when demand distributions are uncertain, and demand lost due to stockout is unobserved.

Design an adaptive data collection strategy that will quickly and effectively identify the source and extent of radiation contamination in an emergency.

Page 9

Example: Ranking and Selection

Assume we have five choices, with uncertainty in our belief about how well each one will perform. Imagine we can make a single measurement, after which we have to make a choice about which one is best. What should we do?

[Figure: five alternatives, labeled 1 through 5, shown with our uncertain beliefs about how well each will perform.]

Page 10

Example: Ranking and Selection

Assume we have five choices, with uncertainty in our belief about how well each one will perform. Imagine we can make a single measurement, after which we have to make a choice about which one is best. What should we do?

[Figure: the same five alternatives after a measurement that does not change which alternative looks best.]

No improvement

Page 11

Example: Ranking and Selection

Assume we have five choices, with uncertainty in our belief about how well each one will perform. Imagine we can make a single measurement, after which we have to make a choice about which one is best. What should we do?

[Figure: the same five alternatives after a measurement that changes which alternative looks best.]

New solution

The value of learning is that it may change your decision.

Page 12

The Knowledge-Gradient Policy for Ranking and Selection

The knowledge-gradient policy values each potential measurement x according to

value of measuring x = E[best we can do with the measurement] − (best we can do without the measurement).

We call this value the KG factor. It then performs the measurement with the largest KG factor.

[Figure: basic principle. Assume you can make only one measurement, after which you have to make a final choice (the implementation decision). What choice would you make now to maximize the expected value of the implementation decision? The plot shows a change in the estimate of the value of option 5 due to the measurement, a change which produces a change in the decision.]

Page 13

Outline

1 Overview of Information Collection Applications

2 Global Optimization of Expensive Continuous Functions
   Problem Description
   Knowledge-Gradient Policy
   Application: Simulation Model Calibration (Schneider)
   Application: Drug Discovery

3 KG Policies for General Offline Problems
   Problem Description and KG Policy
   Convergence

Page 14

Global Optimization of Expensive Continuous Functions

We have a function whose global maximum we would like to find.

We can evaluate the function with noise via some black box, but cannot obtain gradients or other information.

Evaluating the function is expensive, justifying the use of a sophisticated algorithm to choose evaluation points.

Page 15

Bayesian Prior on the Function to be Optimized

We begin with a Gaussian process prior on f. Under this prior, our prior belief on the values that f takes on any finite set of points x_1, ..., x_M is multivariate normal:

(f(x_1), ..., f(x_M)) ∼ N(µ^0, Σ^0),

where µ^0 and Σ^0 are functions of x_1, ..., x_M.
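For concreteness, here is a minimal sketch of sampling from such a prior; the squared-exponential kernel and its hyperparameters are illustrative choices, not specified in the talk.

```python
import numpy as np

M = 100
xs = np.linspace(0.0, 100.0, M)
mu0 = np.zeros(M)
length_scale = 10.0  # illustrative hyperparameter

# Squared-exponential covariance between every pair of points.
Sigma0 = np.exp(-0.5 * ((xs[:, None] - xs[None, :]) / length_scale) ** 2)

# Three sample paths of (f(x_1), ..., f(x_M)) ~ N(mu0, Sigma0).
jitter = 1e-10 * np.eye(M)  # numerical stabilizer for the covariance factorization
paths = np.random.multivariate_normal(mu0, Sigma0 + jitter, size=3)
```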

[Figure: three sample paths drawn from this Gaussian process prior, plotted for x from 0 to 100.]

Page 16

Updating the Prior

For computational reasons, we will restrict ourselves to a finite collection of points x_1, ..., x_M. The time-n posterior belief on f(x_1), ..., f(x_M) is N(µ^n, Σ^n), where µ^n and Σ^n can be computed recursively from

the parameters of the previous belief, µ^{n−1} and Σ^{n−1},

the location of the time-n measurement, x^n,

and the measurement's value, y^{n+1}. (A sketch of this update follows.)
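A minimal sketch of this recursion, assuming the observation is f at the measured point plus Gaussian noise of known variance; this is the standard conjugate multivariate-normal update, with variable names of our choosing.

```python
import numpy as np

def update_posterior(mu, Sigma, x, y, noise_var):
    """One recursive update of the belief N(mu, Sigma) on (f(x_1), ..., f(x_M)).

    x is the index of the measured point, y the observed value, and
    noise_var the (assumed known) variance of the Gaussian measurement noise.
    """
    Sigma_x = Sigma[:, x]              # covariances with the measured point
    denom = noise_var + Sigma[x, x]
    mu_new = mu + (y - mu[x]) / denom * Sigma_x
    Sigma_new = Sigma - np.outer(Sigma_x, Sigma_x) / denom
    return mu_new, Sigma_new
```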

[Figure: posterior fits after successive measurements, panels t = 1 (x = 0) through t = 7 (x = 1), shown for two independent runs.]

Page 17

Measurement as a Stochastic Optimization Problem

Our goal is to choose measurements to maximize our ability to choose a high-value alternative at implementation time.

The reward received is max_i µ^N_i.

The optimal solution satisfies Bellman's recursion with a state variable of S^n = (n, µ^n, Σ^n):

V^N(S^N) = max_i µ^N_i,

V^n(S^n) = max_x E^n[ V^{n+1}(S^{n+1}) | x^n = x ].

However, the state space has O(M^2) dimensions, which makes actually solving Bellman's recursion impossible and justifies the search for good heuristic policies.

Page 18

Knowledge-Gradient Policy

The KG policy assigns a value or KG factor ν^n_x to each potential measurement x. It then performs the measurement with the largest KG factor. The KG factor is

ν^n_x = E[best we can do with the measurement] − (best we can do without the measurement).

[Figure: left panel shows the beliefs µ^n_i over alternatives i, with a measurement at x = 49 and its observation y^{n+1}; right panel plots the best posterior mean max_i µ^{n+1}_i against the observation y^{n+1}, with the prior µ^n_i and the posterior µ^{n+1}_i marked.]

Page 19

Knowledge-Gradient Policy

The KG factor is

ν^n_x = E[best we can do with the measurement] − (best we can do without the measurement)

      = E^n[ max_i µ^{n+1}_i | x^n = x ] − max_i µ^n_i.
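Before the exact formula given later, a minimal Monte Carlo sketch of this expectation under the same normal belief with known noise variance; the function name and defaults are ours.

```python
import numpy as np

def kg_factor_mc(mu, Sigma, x, noise_var, n_samples=100_000, seed=0):
    """Monte Carlo estimate of nu_x^n = E^n[max_i mu_i^{n+1} | x^n = x] - max_i mu_i^n.

    Simulates y^{n+1} ~ N(mu_x, Sigma_xx + noise_var), applies the
    conjugate posterior-mean update for each draw, and averages the max.
    """
    rng = np.random.default_rng(seed)
    denom = noise_var + Sigma[x, x]
    ys = mu[x] + np.sqrt(denom) * rng.standard_normal(n_samples)
    # Posterior means after each simulated observation (one row per draw).
    mu_next = mu[None, :] + ((ys - mu[x]) / denom)[:, None] * Sigma[:, x][None, :]
    return mu_next.max(axis=1).mean() - mu.max()
```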

[Figure: same as on the previous page: the beliefs µ^n_i with a measurement at x = 49, and the best posterior mean max_i µ^{n+1}_i plotted against the observation y^{n+1}.]

Page 20

Other Approaches

Many other derivative-free, noise-tolerant global optimization methods exist, e.g.,

pattern search, e.g., Nelder-Mead;
stochastic approximation, e.g., SPSA [Spall 1992];
evolutionary algorithms, simulated annealing, tabu search;
response surface methods.

The KG method is a Bayesian global optimization (BGO) method because it places a Bayesian prior distribution on the underlying but unknown function.

BGO methods require more computation to decide where to evaluate next, but often require fewer evaluations to find global extrema [Huang et al. 2006].

Page 21

Computing the KG Factor

[Figure: left panel shows the beliefs µ^n_i over alternatives i, with a measurement at x = 49 and its observation y^{n+1}; right panel plots the best posterior mean max_i µ^{n+1}_i against the observation y^{n+1}, with the prior µ^n_i and the posterior µ^{n+1}_i marked. The figure is animated step by step across pages 21-58.]

Page 59

Computing the KG Factor

[Figure: the best posterior mean max_i µ^{n+1}_i, as a function of the observation y^{n+1}, is the pointwise maximum of lines a_1 + b_1 y, a_2 + b_2 y, a_3 + b_3 y, a_4 + b_4 y.]

The KG factor ν^n_x for measuring alternative x is

ν^n_x = E^n[ max_i µ^{n+1}_i | x^n = x ] − max_i µ^n_i

      = Σ_i (b_{i+1} − b_i) f( −|a_{i+1} − a_i| / (b_{i+1} − b_i) ),

where f(z) = ϕ(z) + zΦ(z), ϕ is the normal pdf, and Φ is the normal cdf.
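A minimal sketch of this formula, assuming the lines a_i + b_i y are already sorted by strictly increasing slope with dominated lines (those never attaining the maximum) removed.

```python
import numpy as np
from scipy.stats import norm

def f(z):
    """f(z) = phi(z) + z * Phi(z), with phi/Phi the normal pdf/cdf."""
    return norm.pdf(z) + z * norm.cdf(z)

def kg_from_lines(a, b):
    """KG factor from the lines a_i + b_i * y whose pointwise maximum
    gives the best posterior mean as a function of the observation.

    Assumes a, b are ordered by strictly increasing slope b, with
    dominated lines removed.
    """
    a, b = np.asarray(a, float), np.asarray(b, float)
    db = np.diff(b)
    return float(np.sum(db * f(-np.abs(np.diff(a)) / db)))
```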

Page 60

Maximizing the KG Factor

We compute the KG factor for each candidate measurement, and choose the measurement with the largest.

[Figure: top panel shows µ^n_i ± √Σ^n_ii over alternatives i; bottom panel shows log(KG factor) for each alternative.]

Page 61

Example

[Figure: an example run of the policy, animated over three frames (pages 61-63).]

Page 64

Simulation Model Calibration at Schneider National

The logistics company Schneider National uses a large simulation-based optimization model to try "what if" scenarios.

The model has several input parameters that must be tuned to make its behavior match reality before it can be used.


Page 65

Simulation Model Calibration at Schneider National

Current company practice gets company drivers home 2 times per month, and independent contractors 1.7 times per month, on average.

The optimization model awards a bonus to itself each time it brings a truck driver home.

Goal: adjust the bonuses to make the optimal solution found by the model match current practice.

Running the simulator to convergence for one set of bonuses takes 3 days, and the full calibration takes 1-2 weeks when done by hand.

Page 66

Simulation Model Calibration Results

[Figure: four panels over the calibration run: "Mean of Posterior, µ^n", "Std. Dev. of Posterior", and "log(KG Factor)", each plotted over Bonus 1 and Bonus 2 in [0, 3]; and "Best Fit", plotting log10(Best Fit) against n from 0 to 8. The figure is animated across pages 66-69.]

Page 70

Simulation Model Calibration Results

The KG method calibrates the model in approximately 3 days, compared to 7-14 days when tuned by hand.

The calibration is automatic, freeing the human calibrator to do other work.

Current practice uses the year's calibrated bonuses for each new "what if" scenario, but to enforce the constraint on driver at-home time it would be better to recalibrate the model for each scenario. Automatic calibration with the KG method makes this feasible.

Page 71

Drug Discovery

We are working with a medical group at Georgetown University hospital to improve upon a small molecule they believe can treat Ewing's sarcoma.

As test cases, we use other families of molecules for which data has been collected and published, including the benzomorphan family at right.

We use the Free-Wilson model, under which a molecule's value is the sum of the values of its substituent-location pairs.
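A minimal sketch of the Free-Wilson model as a linear model with one indicator per substituent-location pair; the molecules, substituents, and activities below are hypothetical, not from the published data.

```python
import numpy as np

# Hypothetical molecules: each maps a location to the substituent placed there.
molecules = [
    {"R1": "CH3", "R2": "H"},
    {"R1": "H",   "R2": "OH"},
    {"R1": "CH3", "R2": "OH"},
]
activities = np.array([1.2, 0.7, 1.6])  # hypothetical measured values

# One indicator feature per (location, substituent) pair.
features = sorted({(loc, sub) for m in molecules for loc, sub in m.items()})
X = np.array([[1.0 if m.get(loc) == sub else 0.0 for loc, sub in features]
              for m in molecules])

# Least-squares estimate of each substituent-location contribution;
# a molecule's predicted value is the sum of its pairs' contributions.
theta, *_ = np.linalg.lstsq(X, activities, rcond=None)
```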

[Image: data table from the Journal of Medicinal Chemistry, 1977, Vol. 20, No. 11 (Katz, Osborne, Ionescu).]

Source: Katz, Osborne, Ionescu 1977

Benzomorphan molecule, with locations R1-R5 available for substitution.

Page 72

Drug Discovery: Numerical Results

[Figure: "KG vs Pure Exploration using new prior". Quality of the best molecule, log(kg/mol), is plotted against the number of measurements (0 to 200) for the KG policy, a pure exploration policy, and the best value.]

Page 73

Outline

1 Overview of Information Collection Applications

2 Global Optimization of Expensive Continuous Functions
   Problem Description
   Knowledge-Gradient Policy
   Application: Simulation Model Calibration (Schneider)
   Application: Drug Discovery

3 KG Policies for General Offline Problems
   Problem Description and KG Policy
   Convergence

Page 74

The General Offline Information Collection Problem

1 We begin with a prior distribution on some unknown truth θ.

2 We make a sequence of measurements, deciding which types of measurement to make as we go.

3 After N measurements, we choose an implementation decision i and earn a reward R(θ, i).

In the global optimization problem previously discussed,

θ is the function whose optimum we seek.

This function's domain is the same as both the spaces of possible measurement types x and possible implementation decisions i.

R(θ, i) = θ(i). (A sketch of this framework follows.)
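A minimal sketch of this loop in a simple special case, assuming θ is a finite vector, each measurement is a noisy sample of one coordinate with known noise variance, beliefs are independent normal, and R(θ, i) = θ[i]; choose_measurement stands in for any measurement policy, e.g. KG.

```python
import numpy as np

def run_offline(theta, prior_mu, prior_var, noise_var, N, choose_measurement, rng):
    """Generic offline information collection: measure N times, then implement."""
    mu, var = prior_mu.astype(float).copy(), prior_var.astype(float).copy()
    for n in range(N):
        x = choose_measurement(mu, var)                       # measurement decision
        y = theta[x] + np.sqrt(noise_var) * rng.standard_normal()
        # Conjugate normal update of the measured coordinate.
        var_new = 1.0 / (1.0 / var[x] + 1.0 / noise_var)
        mu[x] = var_new * (mu[x] / var[x] + y / noise_var)
        var[x] = var_new
    i = int(np.argmax(mu))          # implementation decision
    return i, theta[i]              # reward R(theta, i) = theta[i]
```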


Page 75

The KG Policy for General Offline Problems

The KG policy for any problem from this general framework is

arg max_x E^n[ max_i µ^{n+1}_i | x^n = x ] − max_i µ^n_i.

µ^n_i = E^n[R(θ; i)] is the expected value of implementation decision i given what we know at time n. µ^{n+1} is defined similarly.

Evaluating this expression for the KG decision is often computationally intensive.
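One case where it is cheap: independent normal beliefs with known measurement noise, where the expectation has a closed form. A sketch, with σ̃_x the standard deviation of the one-step change in µ_x; this closed form is from the ranking-and-selection literature rather than this slide.

```python
import numpy as np
from scipy.stats import norm

def kg_factors_rs(mu, var, noise_var):
    """KG factors for independent normal ranking and selection."""
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    # Std. dev. of the one-step change in mu_x from measuring x once.
    sigma_tilde = var / np.sqrt(var + noise_var)
    f = lambda z: norm.pdf(z) + z * norm.cdf(z)
    nu = np.empty_like(mu)
    for x in range(len(mu)):
        best_other = np.max(np.delete(mu, x))
        nu[x] = sigma_tilde[x] * f(-abs(mu[x] - best_other) / sigma_tilde[x])
    return nu

# The KG policy then measures x^n = argmax_x nu_x at each step.
```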


Page 76

Optimality and Convergence Results

The KG policy is myopically optimal in general (optimal when N = 1).

In certain special cases (e.g., independent normal ranking and selection on 2 alternatives) the KG policy is optimal for all N.

In many problems, the KG policy is provably convergent.

Convergence means that the alternative we think is best, arg max_i E^N[R(θ, i)], converges to the one that is actually the best, arg max_i R(θ, i).

In the global optimization of expensive functions, convergence means the KG policy always finds the global maximum when given enough measurements.

KG is in some sense a myopic policy, and so convergence is important because it shows that myopia does not mislead KG into getting stuck, measuring one alternative over and over.

Page 77

Conclusion

Knowledge-gradient policies form a broadly applicable class of information collection policies with several appealing properties:

KG policies are myopically optimal in general.

KG policies are convergent in a broad class of problems.

KG policies perform well numerically against other existing policies in several problems.

KG policies are flexible and may be computed easily in a broad class of problems.

Page 78

Thank You

Thank you.

Any questions?
