A Hybridized Planner for Stochastic Domains


Page 1: A Hybridized Planner for Stochastic Domains

A Hybridized Planner for Stochastic Domains

Mausam and Daniel S. Weld, University of Washington, Seattle

Piergiorgio Bertoli, ITC-IRST, Trento

Page 2: A Hybridized Planner for Stochastic Domains

Planning under Uncertainty

(ICAPS’03 Workshop)

Qualitative (disjunctive) uncertainty

Which real problem can you solve?

Quantitative (probabilistic) uncertainty

Which real problem can you model?

Page 3: A Hybridized Planner for Stochastic Domains

The Quantitative View

Markov Decision Process

models uncertainty with probabilistic outcomes

general decision-theoretic framework

algorithms are slow

do we need the full power of decision theory?

is an unconverged partial policy any good?

Page 4: A Hybridized Planner for Stochastic Domains

The Qualitative View

Conditional Planning

models uncertainty as a logical disjunction of outcomes

exploits classical planning techniques: FAST

ignores probabilities: poor solutions

how bad are pure qualitative solutions?

can we improve the qualitative policies?

Page 5: A Hybridized Planner for Stochastic Domains

HybPlan: A Hybridized Planner

combines probabilistic + disjunctive planners

produces good solutions in intermediate times

anytime: makes effective use of resources

bounds termination with a quality guarantee

Quantitative view: completes the partial probabilistic policy by using qualitative policies in some states

Qualitative view: improves the qualitative policies in the more important regions

Page 6: A Hybridized Planner for Stochastic Domains

Outline

Motivation

Planning with Probabilistic Uncertainty (RTDP)

Planning with Disjunctive Uncertainty (MBP)

Hybridizing RTDP and MBP (HybPlan)

Experiments

Conclusions and Future Work

Page 7: A Hybridized Planner for Stochastic Domains

Markov Decision Process

< S, A, Pr, C, s0, G >

S : a set of states

A : a set of actions

Pr : prob. transition model

C : cost model

s0 : start state

G : a set of goals

Find a policy (S → A) that minimizes the expected cost to reach a goal, over an indefinite horizon, for a fully observable Markov decision process.

Optimal cost function J* yields the optimal policy (acting greedily with respect to J*)
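The deck does not write the equation out; for reference, the standard cost-minimization form of the Bellman optimality equation behind J*, from which the optimal policy is read off greedily, is (in LaTeX notation):

    % J*(g) = 0 for every goal state g; for every other state s:
    J^*(s) = \min_{a \in A} \Big[ C(s,a) + \sum_{s' \in S} \Pr(s' \mid s, a)\, J^*(s') \Big],
    \qquad
    \pi^*(s) = \arg\min_{a \in A} \Big[ C(s,a) + \sum_{s' \in S} \Pr(s' \mid s, a)\, J^*(s') \Big]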

Page 8: A Hybridized Planner for Stochastic Domains

Example

[Grid-world figure: start state s0 and a Goal; annotations mark a longer path, a wrong direction from which the goal is still reachable, and a region where all states are dead-ends]

Page 9: A Hybridized Planner for Stochastic Domains

Optimal State Costs

[Grid-world figure: the optimal cost of each state, with the Goal marked]

Page 10: A Hybridized Planner for Stochastic Domains

Optimal Policy

[Grid-world figure: the optimal action at each state, leading to the Goal]

Page 11: A Hybridized Planner for Stochastic Domains

Bellman Backup: Create better approximation to cost function @ s

Page 12: A Hybridized Planner for Stochastic Domains

Bellman Backup: Create better approximation to cost function @ s

Trial=simulate greedy policy & update visited states

Page 13: A Hybridized Planner for Stochastic Domains

Bellman Backup: Create better approximation to cost function @ s

Real Time Dynamic Programming

(Barto et al. ’95; Bonet & Geffner ’03)

Repeat trials until cost function converges

Trial=simulate greedy policy & update visited states
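As an illustration only, here is a hedged explicit-state sketch of one RTDP trial with Bellman backups (not the cited implementations; the helpers succ, cost, and A, and the plain-dict cost table J, are assumptions):

    import random

    def bellman_backup(s, A, succ, cost, J):
        # Set J[s] to the best one-step lookahead value and return the greedy action.
        # succ(s, a) -> list of (probability, next_state); cost(s, a) -> immediate cost.
        best_a, best_q = None, float("inf")
        for a in A(s):
            q = cost(s, a) + sum(p * J[t] for p, t in succ(s, a))
            if q < best_q:
                best_a, best_q = a, q
        J[s] = best_q
        return best_a

    def rtdp_trial(s0, goals, A, succ, cost, J, max_steps=1000):
        # One trial: follow the greedy policy from s0, backing up every visited state.
        # J could be a collections.defaultdict(float), i.e. a zero (admissible) heuristic.
        s = s0
        for _ in range(max_steps):
            if s in goals:
                break
            a = bellman_backup(s, A, succ, cost, J)   # update J at s, pick the greedy action
            r, acc = random.random(), 0.0             # sample the next state stochastically
            for p, t in succ(s, a):
                acc += p
                if r <= acc:
                    s = t
                    break

Repeating such trials drives J toward J* on the states that matter for the greedy policy, which is the convergence the slide refers to.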

Page 14: A Hybridized Planner for Stochastic Domains

Planning with Disjunctive Uncertainty

< S, A, T, s0, G >

S : a set of states

A : a set of actions

T : disjunctive transition model

s0 : the start state

G : a set of goals

Find a strong-cyclic policy (S → A) that guarantees reaching a goal, over an indefinite horizon, for a fully observable planning problem.

Page 15: A Hybridized Planner for Stochastic Domains

Model Based Planner (Bertoli et al.)

States, transitions, etc. are represented logically

Uncertainty: multiple possible successor states

Planning algorithm: iteratively removes "bad" states

Bad = states that don't reach anywhere or reach other bad states
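MBP itself works symbolically over BDD-encoded state sets; purely to illustrate the "remove bad states" fixed point described above, here is a hedged explicit-state sketch of computing a strong-cyclic policy (the function and its arguments are hypothetical, not MBP's actual algorithm or API):

    def strong_cyclic_policy(states, actions, T, goals):
        # Explicit-state sketch; T[s][a] = set of possible successors (disjunctive model).
        goals = set(goals)
        candidates = set(states) - goals
        while True:
            # "Safe" state-action pairs: every possible outcome stays inside the
            # still-candidate region or the goals, so execution never gets stranded.
            safe = {(s, a): T[s][a] for s in candidates for a in actions
                    if a in T.get(s, {})
                    and all(t in candidates or t in goals for t in T[s][a])}
            # Backward reachability through safe pairs, recording the action that
            # first connects each state towards the goal.
            policy, reached = {}, set(goals)
            changed = True
            while changed:
                changed = False
                for (s, a), succs in safe.items():
                    if s not in policy and succs & reached:
                        policy[s] = a
                        reached.add(s)
                        changed = True
            if set(policy) == candidates:
                return policy              # fixed point: every remaining state is solved
            candidates = set(policy)       # drop the "bad" states and repeat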

Page 16: A Hybridized Planner for Stochastic Domains

MBP Policy

Sub-optimal solution

[Grid-world figure: MBP's policy, with the Goal marked]

Page 17: A Hybridized Planner for Stochastic Domains

Outline

Motivation

Planning with Probabilistic Uncertainty (RTDP)

Planning with Disjunctive Uncertainty (MBP)

Hybridizing RTDP and MBP (HybPlan)

Experiments

Conclusions and Future Work

Page 18: A Hybridized Planner for Stochastic Domains

HybPlan Top-Level Code

0. run MBP to find a solution to the goal

1. run RTDP for some time

2. compute the partial greedy policy (rtdp)

3. compute the hybridized policy (hyb) by
   hyb(s) = rtdp(s) if visited(s) > threshold
   hyb(s) = mbp(s) otherwise

4. clean hyb by removing
   1. dead-ends
   2. probability-1 cycles

5. evaluate hyb

6. save the best policy obtained so far

repeat steps 1-6 until
   1) resources are exhausted, or
   2) a satisfactory policy is found
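Read as a hedged Python sketch of this loop (solve_with_mbp, run_rtdp_for, greedy_partial_policy, clean, evaluate, and time_left are hypothetical stand-ins for the components named above, not the authors' code):

    def hybplan(problem, threshold, rtdp_time, max_time, target_error):
        mbp = solve_with_mbp(problem)                    # 0. disjunctive (MBP) policy
        best_policy, best_cost = None, float("inf")
        J, visits = {}, {}                               # RTDP cost table and visit counts
        while time_left(max_time):
            run_rtdp_for(problem, J, visits, rtdp_time)  # 1. a few RTDP trials
            rtdp = greedy_partial_policy(problem, J)     # 2. greedy policy on visited states
            hyb = {}                                     # 3. hybridize the two policies
            for s in rtdp.keys() | mbp.keys():
                if visits.get(s, 0) > threshold and s in rtdp:
                    hyb[s] = rtdp[s]                     #    trust RTDP where it has experience
                elif s in mbp:
                    hyb[s] = mbp[s]                      #    otherwise fall back on MBP
            clean(hyb, mbp)                              # 4. remove dead-ends / prob.-1 cycles
            cost, error = evaluate(problem, hyb)         # 5. expected cost of hyb from s0
            if cost < best_cost:                         # 6. keep the best policy seen so far
                best_policy, best_cost = hyb, cost
            if error <= target_error:
                break
        return best_policy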

Page 19: A Hybridized Planner for Stochastic Domains

[Grid-world figure: state costs initialized to 0 everywhere; Goal marked]

First RTDP Trial

1. run RTDP for some time

Page 20: A Hybridized Planner for Stochastic Domains

[Grid-world figure: a Bellman backup is performed at the current state]

Q1(s,N) = 1 + 0.5 × 0 + 0.5 × 0 = 1

Q1(s,S) = Q1(s,W) = Q1(s,E) = 1

J1(s) = 1

Let greedy action be North

Bellman Backup

1. run RTDP for some time

Page 21: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the greedy action is simulated from the current state]

Simulation of Greedy Action

1. run RTDP for some time

Page 22: A Hybridized Planner for Stochastic Domains

[Grid-world figure: cost updates continue along the trial]

Continuing First Trial

1. run RTDP for some time

Page 23: A Hybridized Planner for Stochastic Domains

[Grid-world figure: further cost updates along the trial]

Continuing First Trial

1. run RTDP for some time

Page 24: A Hybridized Planner for Stochastic Domains

[Grid-world figure: cost updates as the trial finishes]

Finishing First Trial

1. run RTDP for some time

Page 25: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the cost function after the first trial, with nonzero values along the visited path]

Cost Function after First Trial

1. run RTDP for some time

Page 26: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the greedy policy over the states visited so far]

Partial Greedy Policy

2. compute greedy policy (rtdp)

Page 27: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the hybridized policy, combining the greedy rtdp policy with the MBP policy]

Construct Hybridized Policy w/ MBP

3. compute hybridized policy (hyb)

(threshold = 0)

Page 28: A Hybridized Planner for Stochastic Domains

After first trial

[Grid-world figure: the hybridized policy evaluated back from the Goal]

J(hyb) = 5

Evaluate Hybridized Policy

5. evaluate hyb

6. store hyb
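Step 5 can be done, for instance, by iterative policy evaluation restricted to the states the hybridized policy covers; a hedged sketch (succ, cost, and the assumption that the cleaned policy is closed are mine, not the talk's):

    def evaluate(policy, succ, cost, goals, s0, sweeps=1000, tol=1e-6):
        # Iterative (Gauss-Seidel) evaluation of a fixed policy.
        # succ(s, a) -> list of (probability, next_state); cost(s, a) -> immediate cost.
        J = {s: 0.0 for s in policy}
        for _ in range(sweeps):
            delta = 0.0
            for s, a in policy.items():
                v = cost(s, a) + sum(p * (0.0 if t in goals else J.get(t, 0.0))
                                     for p, t in succ(s, a))
                delta = max(delta, abs(v - J[s]))
                J[s] = v
            if delta < tol:
                break
        return J[s0]       # J(hyb) at the start state, e.g. 5 in the example above

For a proper policy (no dead-ends, no probability-1 cycles) these sweeps converge to the expected cost of hyb from s0.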

Page 29: A Hybridized Planner for Stochastic Domains

[Grid-world figure: cost updates during the second RTDP trial]

Second Trial

Page 30: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the partial greedy policy after the second trial]

Partial Greedy Policy

Page 31: A Hybridized Planner for Stochastic Domains

Absence of MBP Policy

MBP policy doesn't exist! No path to the goal.

[Grid-world figure: a state from which the Goal is unreachable, so MBP has no policy there]

Page 32: A Hybridized Planner for Stochastic Domains

[Grid-world figure: cost updates during the third RTDP trial]

Third Trial


Page 33: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the partial greedy policy after the third trial]

Partial Greedy Policy

Page 34: A Hybridized Planner for Stochastic Domains

[Grid-world figure: the partial greedy policy contains a probability-1 cycle]

Probability 1 Cycles

repeat
   find a state s in the cycle
   hyb(s) = mbp(s)
until the cycle is broken

Page 38: A Hybridized Planner for Stochastic Domains

Probability 1 Cycles

[Grid-world figure: the cycle after substituting MBP's action; Goal marked]

repeat
   find a state s in the cycle
   hyb(s) = mbp(s)
until the cycle is broken

Page 39: A Hybridized Planner for Stochastic Domains

After 1st trial

[Grid-world figure: the hybridized policy and its costs]

J(hyb) = 5

Error Bound

J*(s0) ≤ 5, J*(s0) ≥ 1  ⇒  Error(hyb) = 5 - 1 = 4
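In general terms, assuming RTDP's cost table is initialized with an admissible (lower-bound) heuristic so that its current values never overestimate J*, the bound the slide computes is (in LaTeX notation):

    J^{rtdp}(s_0) \le J^*(s_0) \le J^{hyb}(s_0)
    \quad\Longrightarrow\quad
    \mathrm{Error}(hyb) = J^{hyb}(s_0) - J^*(s_0) \le J^{hyb}(s_0) - J^{rtdp}(s_0)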

Page 40: A Hybridized Planner for Stochastic Domains

Termination

when a policy with the required error bound is found

when the planning time is exhausted

when the available memory is exhausted

Properties

outputs a proper policy

anytime algorithm (once MBP terminates)

HybPlan = RTDP, if infinite resources are available

HybPlan = MBP, if resources are extremely limited

HybPlan = better than both, otherwise

Page 41: A Hybridized Planner for Stochastic Domains

Outline

Motivation

Planning with Probabilistic Uncertainty (RTDP)

Planning with Disjunctive Uncertainty (MBP)

Hybridizing RTDP and MBP (HybPlan)

Experiments
   Anytime Properties
   Scalability

Conclusions and Future Work

Page 42: A Hybridized Planner for Stochastic Domains

Domains

NASA Rover domain

Factory domain

Elevator domain

Page 43: A Hybridized Planner for Stochastic Domains

Anytime Properties

[Plot: anytime performance; RTDP curve shown for comparison]

Page 44: A Hybridized Planner for Stochastic Domains

Anytime Properties

[Plot: anytime performance; RTDP curve shown for comparison]

Page 45: A Hybridized Planner for Stochastic Domains

Scalability

Problem    Time before memory exhausts    J(rtdp)    J(mbp)     J(hyb)
Rov5       ~1100 sec                      55.36      67.04      48.16
Rov2       ~800 sec                       ∞          65.22      49.91
Mach9      ~1500 sec                      143.95     66.50      48.49
Mach6      ~300 sec                       ∞          71.56      71.56
Elev14     ~10000 sec                     ∞          46.49      44.48
Elev15     ~10000 sec                     ∞          233.07     87.46

Page 46: A Hybridized Planner for Stochastic Domains

Conclusions

First algorithm that integrates disjunctive and probabilistic planners.

Experiments show that HybPlan

is anytime

scales better than RTDP

produces better-quality solutions than MBP

can interleave planning and execution

Page 47: A Hybridized Planner for Stochastic Domains

Hybridized Planning: A General Notion

Hybridize other pairs of planners:

an optimal or close-to-optimal planner

a sub-optimal but fast planner

to yield a planner that produces good-quality solutions in intermediate running times

Examples

POMDPs: RTDP/PBVI with POND/MBP/BBSP

Oversubscription planning: A* with greedy solutions

Concurrent MDPs: sampled RTDP with single-action RTDP