Transcript of poster: Learning to Rank for Synthesizing Planning Heuristics

Page 1: web.mit.edu/caelan/www/presentations/ijcai_poster_2016.pdf

Learning to Rank for Synthesizing Planning Heuristics
Caelan Reed Garrett, Leslie Pack Kaelbling, Tomás Lozano-Pérez
MIT CSAIL, {caelan,lpk,tlp}@csail.mit.edu

Models for Heuristic Learning

Learn a linear model for the heuristic function, f(x) = φ(x)^T w. Choice of loss function:

• Kendall Rank Correlation Coefficient (τ ∈ [−1, 1]): monotonic correlation (i.e., the normalized number of correctly ranked pairs)
  - Only the heuristic's ordering matters in a greedy search
  - Solve using a Rank Support Vector Machine (RSVM): equivalent to an SVM on ranking pairs
  - Only penalize pairs of examples from the same plan
  - Can use non-negativity constraints (NN)
• Root Mean Squared Error (RMSE)
  - Solve using Ridge Regression (RR), with cross-validation to select λ
  - Sacrifices correct orderings to produce predictions close to the outputs
  - Extremely sensitive to noisy data, imperfect feature representations, a limited function class, and the scaling of problems

RSVM objective, with a slack variable ξ_ijk for each within-plan ranking pair:

    \min_w \|w\|^2 + C \sum_{i=1}^{n} \sum_{j=1}^{m_i} \sum_{k=j+1}^{m_i} \xi_{ijk}
    \text{s.t. } \phi(x_j^i)^T w \ge \phi(x_k^i)^T w + 1 - \xi_{ijk}, \quad \forall y_j^i \ge y_k^i, \forall i
    \xi_{ijk} \ge 0, \quad \forall i, j, k

Equivalently, as an SVM on the ranking-pair difference vectors:

    (\phi(x_j^i) - \phi(x_k^i))^T w \ge 1 - \xi_{ijk}, \quad \forall y_j^i \ge y_k^i, \forall i
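The QP above can be solved with an off-the-shelf linear SVM on the difference vectors. A minimal sketch, assuming a hypothetical `plans` list of (Phi, y) pairs holding each training plan's feature matrix and distance-to-go vector; the non-negativity (NN) variant would additionally require w ≥ 0, which LinearSVC does not support and would need a custom QP solver:

```python
import numpy as np
from sklearn.svm import LinearSVC

def rank_svm(plans, C=1.0):
    """Fit w by reducing within-plan ranking pairs to a linear SVM."""
    diffs, labels = [], []
    for Phi, y in plans:                      # only pair examples from the same plan
        m = len(y)
        for j in range(m):
            for k in range(j + 1, m):
                if y[j] == y[k]:
                    continue                  # ties impose no ordering constraint
                d = Phi[j] - Phi[k]
                s = 1.0 if y[j] > y[k] else -1.0
                diffs.extend([s * d, -s * d]) # mirrored pair keeps the classes balanced
                labels.extend([1.0, -1.0])
    # A hinge-loss SVM on difference vectors with no intercept solves the QP above.
    svm = LinearSVC(C=C, loss="hinge", fit_intercept=False, dual=True)
    svm.fit(np.array(diffs), np.array(labels))
    return svm.coef_.ravel()                  # w for f(x) = phi(x)^T w
```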

Ridge Regression objective:

    \min_w \|\phi(X) w - Y\|^2 + \lambda \|w\|^2
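A matching sketch of the RR baseline on the same hypothetical `plans` layout; sklearn calls the regularizer λ `alpha`, and cross-validation over a small grid selects it as the poster describes:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def ridge_heuristic(plans):
    """Fit w by ridge regression, selecting lambda (sklearn's alpha) by CV."""
    Phi = np.vstack([P for P, _ in plans])
    Y = np.concatenate([y for _, y in plans])
    model = RidgeCV(alphas=np.logspace(-3, 3, 13), fit_intercept=False)
    return model.fit(Phi, Y).coef_            # minimizes ||phi(X)w - Y||^2 + lambda ||w||^2
```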

Root Mean Squared Error, averaged over problems:

    \mathrm{RMSE} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{\frac{1}{m_i} \sum_{j=1}^{m_i} \left( f(x_j^i) - y_j^i \right)^2}
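Both metrics can then be computed per plan and averaged, which is how the scatter plots in the Results section are labeled. A minimal sketch, reusing the `plans` layout assumed above:

```python
import numpy as np
from scipy.stats import kendalltau

def evaluate(w, plans):
    """Mean per-problem RMSE and mean Kendall tau of f(x) = phi(x)^T w."""
    rmses, taus = [], []
    for Phi, y in plans:
        f = Phi @ w                                   # learned heuristic values
        rmses.append(np.sqrt(np.mean((f - y) ** 2)))  # inner root-mean-square per problem
        tau, _ = kendalltau(f, y)                     # in [-1, 1]; 1 = all pairs ranked correctly
        taus.append(tau)
    return np.mean(rmses), np.mean(taus)
```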

Results

[Figures: total instances solved (0-90) per configuration (Original, RR Single, RR Pair, RSVM Single, RSVM Pair, NN RSVM Pair) for the FF and CEA heuristics; scatter plots of RMSE vs. instances solved and τ vs. instances solved for transport (solved 0-35, RMSE 0-200, τ 0.84-1)]

• 2014 IPC learning track domains: elevators, transport, parking, no-mystery (90 of the largest testing problems)
• 6 configurations of deferred greedy best-first search for the FF and CEA heuristics
  - Feature representations (Single, Pair)
  - Learning techniques (Original, RR, RSVM, NN RSVM)
• Scatter plots of the learned heuristics' RMSE and τ vs. number solved for transport
• RMSE positively correlated with number solved implies a bad loss function; τ positively correlated implies a good loss function

Background

• Learn a heuristic function for greedy best-first search to improve the coverage and efficiency of domain-specific planning
• Given a distribution of deterministic planning problems {Π_1, ..., Π_n}
• Generate training examples ⟨x_j^i, y_j^i⟩ from each solvable problem
• Use states on a plan to generate the supervised pairs
• Inputs x_j^i = ⟨s_j^i, Π_i⟩ are states s_j^i along with their problem Π_i
• Outputs y_j^i are distances-to-go
• Training plans are often prohibitively noisy: locally smooth plans with plan neighborhood graph search [Nakhost, 2010]
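A minimal sketch of this example-generation step; `problem` and `plan_states` are hypothetical stand-ins for a planning problem and the state sequence of one of its plans, and the smoothing via plan neighborhood graph search [Nakhost, 2010] is omitted:

```python
def training_examples(problem, plan_states):
    """Turn one solved problem and its plan's state sequence into <x, y> pairs."""
    m = len(plan_states) - 1                  # plan length
    return [((s, problem), m - j)             # x_j^i = <s_j^i, Pi_i>, y_j^i = distance-to-go
            for j, s in enumerate(plan_states)]
```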

Feature Representation

• Learning traditionally needs a function φ(x) that embeds the input
• Obtain features from existing domain-independent heuristics
• Some heuristics produce approximate partially-ordered plans: FastForward (FF), Context-Enhanced Add (CEA), …
• Single actions: count instances of each action schema; unable to capture approximations in the approximate plan (~7 features)
• Pairwise actions: count partial orders along with interacting effects and preconditions (~40 features); see the sketch below
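A sketch of both representations under an assumed encoding of the approximate plan as (schema, preconditions, effects) triples plus partial-order edges between step indices; the feature names and structure are illustrative, not the authors' exact definition:

```python
from collections import Counter

def single_features(relaxed_plan):
    # Count instances of each action schema, e.g. {"Pick": 3, "Stack": 2}.
    return Counter(schema for schema, _, _ in relaxed_plan)

def pairwise_features(relaxed_plan, orders):
    # Count ordered action pairs together with the fluent linking an effect of
    # the earlier action to a precondition of the later one.
    feats = Counter()
    for a, b in orders:                       # partial-order edge: step a before step b
        schema_a, _, effects_a = relaxed_plan[a]
        schema_b, pre_b, _ = relaxed_plan[b]
        for fluent in set(effects_a) & set(pre_b):
            feats[(schema_a, fluent, schema_b)] += 1   # e.g. Pick-Hold-Stack
    return feats
```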

[Figure: Blocksworld start and goal states over blocks A, B, C, Q, and the FF approximate plan Pick(Q), Pick(B), Stack(B,C), Pick(A), Stack(A,B) with its partial orders; also three small diagrams of states x_1, x_2 between a start s_0 and goal g]

Blocksworld start and goal (stack A on B on C): d = 6, h_FF = 5

• Simple: φ(x) = [Pick: 3, Stack: 2]
• Pairwise: φ(x) = [Pick-Hold-Stack: 2, Pick-Clear-Stack: 1, …]
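For context, this is how a learned weight vector would slot into search; a minimal sketch of plain greedy best-first search ordered by f(x) = φ(x)^T w, where `phi`, `successors`, and `is_goal` are hypothetical planner-supplied callables (the experiments use a deferred-evaluation variant, not shown):

```python
import heapq

def greedy_best_first(start, successors, is_goal, phi, w):
    """Greedy best-first search ordered only by the learned f(x) = phi(x)^T w."""
    frontier = [(float(phi(start) @ w), 0, start)]
    seen, tie = {start}, 1                    # assumes states are hashable
    while frontier:
        _, _, state = heapq.heappop(frontier)
        if is_goal(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (float(phi(nxt) @ w), tie, nxt))
                tie += 1                      # unique tiebreaker: states never compared
    return None                               # exhausted without reaching a goal
```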

1D case study: 2 problems, each with 2 examples

[Figure: φ(x) vs. y for examples x_1^1, x_2^1 on plan Π_1 and x_1^2, x_2^2 on plan Π_2, with the fitted lines f_RR(x) and f_RSVM(x); the pair differences φ(x_a^i) − φ(x_b^i) appear as + for (a, b) = (2, 1) and − for (a, b) = (1, 2)]

• The RR heuristic has a negative slope, pushing the search away from the goal
• RSVM visualized as a classification task on pairs: the RSVM heuristic has a positive slope and is thus able to correctly rank both example pairs
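A numeric rendering of this case study with invented coordinates (and a fitted intercept, purely for illustration): within each plan φ and y increase together, yet the pooled least-squares slope is negative, while any positive slope satisfies both RSVM ranking constraints:

```python
import numpy as np

# Invented 1D coordinates: Pi_1 sits at low phi / high distance-to-go,
# Pi_2 at high phi / low distance-to-go, mimicking the poster figure.
phi = np.array([1.0, 2.0, 8.0, 9.0])   # x_1^1, x_2^1, x_1^2, x_2^2
y   = np.array([9.0, 10.0, 1.0, 2.0])  # distances-to-go
pairs = [(1, 0), (3, 2)]               # within-plan pairs (farther, nearer)

A = np.column_stack([phi, np.ones_like(phi)])          # fit slope + intercept
slope, _ = np.linalg.lstsq(A, y, rcond=None)[0]
print(slope)                                           # -1.1: negative pooled slope

for far, near in pairs:
    print(slope * (phi[far] - phi[near]) > 0)          # False, False: RR mis-ranks both
    # phi[far] - phi[near] = +1 here, so any slope w > 0 (the RSVM solution)
    # ranks both pairs correctly.
```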

Conclusions

• Pairwise features are able to encode more information
• τ is generally correlated with planner performance
• RankSVM improves heuristic performance by optimizing τ