Learning Probabilistic Hierarchical Task Networks to Capture User Preferences


Page 1: Learning Probabilistic Hierarchical Task Networks to Capture User Preferences

LEARNING PROBABILISTIC HIERARCHICAL TASK NETWORKS TO CAPTURE USER PREFERENCES
Nan Li, Subbarao Kambhampati, and Sungwook Yoon
School of Computing and Informatics, Arizona State University, Tempe, AZ 85281 USA
[email protected], [email protected], [email protected]
Special thanks to William Cushing

A riddle for you:

What is the magic idea in planning that is at once more efficient and has higher complexity than vanilla planners?

Page 2

TWO TALES OF HTN PLANNING

Most prior work:
• Abstraction
• Efficiency
• Top-down learning

Our work:
• Preference handling
• Quality
• Bottom-up learning

Page 3

LEARNING USER PLAN PREFERENCES

Hitchhike? No way!

Observed plans:
Pbus (observed 2 times): Getin(bus, source), Buyticket(bus), Getout(bus, dest)
Ptrain (observed 8 times): Buyticket(train), Getin(train, source), Getout(train, dest)
Phike (observed 0 times): Hitchhike(source, dest)

Page 4

LEARNING USER PREFERENCES AS PHTNS

Given a set O of plans executed by the user, find the generative model Hl that maximizes the likelihood of O:

Hl = argmax_H p(O | H)

Probabilistic Hierarchical Task Networks(pHTNs)

S  → 0.2, A1 B1
S  → 0.8, A2 B2
B1 → 1.0, A2 A3
B2 → 1.0, A1 A3
A1 → 1.0, Getin
A2 → 1.0, Buyticket
A3 → 1.0, Getout
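The grammar above can be exercised with a small sketch (illustrative code, not the authors' implementation): encode the pHTN as a dictionary of weighted productions and sample plans top-down; the sampled plan frequencies then reflect the 0.2/0.8 preference for the train plan over the bus plan.

```python
import random

# The example pHTN as a probabilistic grammar: each non-terminal maps to
# a list of (probability, right-hand side) choices; anything not in the
# dictionary is a primitive action.
PHTN = {
    "S":  [(0.2, ["A1", "B1"]), (0.8, ["A2", "B2"])],
    "B1": [(1.0, ["A2", "A3"])],
    "B2": [(1.0, ["A1", "A3"])],
    "A1": [(1.0, ["Getin"])],
    "A2": [(1.0, ["Buyticket"])],
    "A3": [(1.0, ["Getout"])],
}

def sample_plan(symbol="S"):
    """Top-down sampling: expand non-terminals until only actions remain."""
    if symbol not in PHTN:          # terminal = primitive action
        return [symbol]
    r, acc = random.random(), 0.0
    for prob, rhs in PHTN[symbol]:
        acc += prob
        if r <= acc:
            return [a for part in rhs for a in sample_plan(part)]
    return []                       # unreachable if probabilities sum to 1

random.seed(0)
plans = [tuple(sample_plan()) for _ in range(1000)]
bus = plans.count(("Getin", "Buyticket", "Getout"))
train = plans.count(("Buyticket", "Getin", "Getout"))
print(bus, train)                   # roughly 200 vs 800
```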

Page 5

LEARNING pHTNs

HTNs can be seen as providing a grammar of desired solutions:
Actions ↔ Words
Plans ↔ Sentences
HTNs ↔ Grammar
HTN learning ↔ Grammar induction

pHTN learning via probabilistic context-free grammar (pCFG) induction. Assumptions: parameter-less, unconditional schemas.

S  → 0.2, A1 B1
S  → 0.8, A2 B2
B1 → 1.0, A2 A3
B2 → 1.0, A1 A3
A1 → 1.0, Getin
A2 → 1.0, Buyticket
A3 → 1.0, Getout

Page 6

A TWO-STEP ALGORITHM

• Greedy Structure Hypothesizer: hypothesizes the schema structure
• Expectation-Maximization (EM) phase: refines schema probabilities, removes redundant schemas

Generalizes the Inside-Outside algorithm (Lari & Young, 1990)

Page 7

GREEDY STRUCTURE HYPOTHESIZER

Structure learning: bottom-up; prefers recursive schemas to non-recursive ones.
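One simplified way to picture bottom-up structure hypothesizing (a hypothetical sketch, not the paper's exact GSH algorithm): repeatedly introduce a fresh non-terminal for the most frequent adjacent symbol pair, until each observed plan collapses to a single symbol.

```python
from collections import Counter

def hypothesize(plans, max_rules=10):
    """Greedily build binary schemas bottom-up from observed plans."""
    plans = [list(p) for p in plans]
    rules, next_id = {}, 1
    for _ in range(max_rules):
        # Count every adjacent symbol pair across all (partially reduced) plans.
        pairs = Counter()
        for p in plans:
            pairs.update(zip(p, p[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:               # nothing frequent enough to abstract
            break
        nt = f"N{next_id}"          # fresh non-terminal for the pair
        next_id += 1
        rules[nt] = (a, b)
        for p in plans:             # rewrite every occurrence of the pair
            i = 0
            while i < len(p) - 1:
                if (p[i], p[i + 1]) == (a, b):
                    p[i:i + 2] = [nt]
                i += 1
        if all(len(p) == 1 for p in plans):
            break
    return rules

# The bus/train observations from the earlier slide.
obs = [("Getin", "Buyticket", "Getout")] * 2 + \
      [("Buyticket", "Getin", "Getout")] * 8
rules = hypothesize(obs)
print(rules)
```

Each learned rule is binary (one non-terminal covering two symbols), mirroring the two-symbol right-hand sides of the example grammar; the EM phase would then attach and refine probabilities.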

Page 8

EM PHASE

E step: plan parse tree computation (the most probable parse tree for each plan)
M step: update the selection probability p of each schema s: a_i → p, a_j a_k
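The M-step re-estimation can be sketched as follows (an assumed form, not the paper's exact code): given the schema-use counts produced by the E-step's parse trees, each schema's selection probability becomes its use count divided by the total expansions of its head symbol.

```python
from collections import Counter

# Hypothetical E-step output: schema uses counted over the most probable
# parse trees, keyed by (head, right-hand side). Numbers match the
# 2-bus / 8-train example.
uses = Counter({
    ("S", ("A1", "B1")): 2,
    ("S", ("A2", "B2")): 8,
    ("B1", ("A2", "A3")): 2,
    ("B2", ("A1", "A3")): 8,
})

def m_step(uses):
    """p(s: a_i -> a_j a_k) = count(s used) / count(a_i expanded)."""
    head_totals = Counter()
    for (head, _), n in uses.items():
        head_totals[head] += n
    return {rule: n / head_totals[rule[0]] for rule, n in uses.items()}

probs = m_step(uses)
print(probs[("S", ("A1", "B1"))])   # 0.2
print(probs[("B1", ("A2", "A3"))])  # 1.0
```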

Page 9

EVALUATION

Ideal evaluation: user studies (too hard). Our approach:
• Assume H* represents the user's preferences
• Generate observed plans O using H* (H* → O)
• Learn Hl from O (O → Hl)
• Compare H* and Hl via the plan distributions they induce (H* → T*, Hl → Tl)

Syntactic similarity is not important; only the distribution is.

Use the KL divergence between the distributions T* and Tl (KL divergence measures the distance between two distributions).
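The evaluation metric can be sketched in a few lines (the example numbers below are illustrative, not from the paper): compute D(T* || Tl) over the plans, with a small epsilon guarding plans the learned model assigns zero probability.

```python
import math

def kl_divergence(t_star, t_learned, eps=1e-9):
    """D(T* || Tl) = sum_x T*(x) * log(T*(x) / Tl(x)); eps guards zeros."""
    return sum(p * math.log(p / max(t_learned.get(x, 0.0), eps))
               for x, p in t_star.items() if p > 0)

# Illustrative distributions over the three travel plans.
t_star    = {"bus": 0.2, "train": 0.8, "hitchhike": 0.0}
t_learned = {"bus": 0.25, "train": 0.75}
print(round(kl_divergence(t_star, t_learned), 4))  # ≈ 0.007
```

A divergence near 0 means the learned model reproduces the user's plan distribution almost exactly, which is how the 0.04 and 0.52 scores on the later slides should be read.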

Domains: randomly generated domains, Logistics Planning, Gold Miner

[Diagram: H* generates observed plans P1, P2, …, Pn; the learner induces Hl from them.]

Page 10

RATE OF LEARNING AND CONCISENESS

Rate of learning: more training plans yield better schemas.

Conciseness:
• Small domains: only 1 or 2 extra non-primitive actions
• Large domains: many more non-primitive actions
• Possible remedy: refine structure learning

Randomly Generated Domains

Page 11

EFFECTIVENESS OF EM

• Compare greedy schemas with learned schemas
• The EM step is very effective in capturing user preferences

Randomly Generated Domains

Page 12

“BENCHMARK” DOMAINS

Logistics Planning
H*: move by plane or truck; prefer plane; prefer fewer steps
KL divergence: 0.04
Recovers plane > truck and fewer steps > more steps

Gold Miner
H*: get the laser cannon; shoot rock until adjacent to the gold; get a bomb; use the bomb to remove the last wall
KL divergence: 0.52
Reproduces the basic strategy

Page 13

CONCLUSIONS & EXTENSIONS

• Learn user plan preferences: the learned HTNs capture preferences rather than domain abstractions
• Evaluate predictive power: compare distributions rather than structure
• Extension, preference obfuscation: a poor graduate student who prefers to travel by plane usually travels by car
• Learning user plan preferences obfuscated by feasibility constraints (ICAPS'09)