Planning to Gather Information
Richard Dearden, University of Birmingham
Joint work with Moritz Göbelbecker (ALU), Charles Gretton, Bramley Merton (NOC), Zeyn Saigol, Mohan Sridharan (Texas Tech), Jeremy Wyatt
Underwater Vent Finding
• AUV used to find vents
• Can detect the vent itself (reliably) and the plume of fresh water it emits
• Problem: where to go to collect data so as to find the vents as efficiently as possible
• Hard because plume detection is unreliable, and we can't easily assign 'blame' for the detections we do make
Vision Algorithm Planning
Goal: Answer queries and execute commands.
• Is there a red triangle in the scene?
• Move the mug to the right of the blue circle.
Our operators: colour, shape, SIFT identification, viewpoint change, zoom etc.
Problem: Build a plan to achieve the goal with high confidence
Assumptions
• The visual operators are unreliable
• Reliability can be represented by a confusion matrix, computed from data
• Speed of response and answering the query correctly are what really matter
• We want to build the fastest plan that is 'reliable enough'
• We should include planning time in our performance estimate too
Example shape confusion matrix (rows: actual, columns: observed):

Actual \ Observed   Square   Circle   Triangle
Square               0.85     0.10     0.05
Circle               0.10     0.80     0.10
Triangle             0.10     0.05     0.85
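As a concrete illustration, a matrix like this plugs straight into an observation model. A minimal sketch, with the numbers taken from the table above (the function and variable names are our own, not from the system):

```python
import numpy as np

# Shape confusion matrix from the table above: rows = actual, columns = observed.
SHAPES = ["Square", "Circle", "Triangle"]
CONFUSION = np.array([
    [0.85, 0.10, 0.05],   # actual Square
    [0.10, 0.80, 0.10],   # actual Circle
    [0.10, 0.05, 0.85],   # actual Triangle
])

def p_obs_given_actual(observed: str, actual: str) -> float:
    """P(observed shape | actual shape), read off the confusion matrix."""
    return float(CONFUSION[SHAPES.index(actual), SHAPES.index(observed)])

print(p_obs_given_actual("Circle", "Square"))  # -> 0.1
```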
POMDPs
Partially Observable Markov Decision Problems

Markov Decision Problem (MDP):
• (Discrete) states, stochastic actions, reward
• Maximise expected (discounted) long-term reward
• Assumption: the state is completely observable

POMDPs: MDPs with observations
• Infer the state from a (sequence of) observations
• Typically maintain a belief state and plan over that
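The belief-state bookkeeping mentioned above is the standard Bayes filter for POMDPs. A minimal sketch, assuming tabular transition and observation models (all names are illustrative):

```python
import numpy as np

def belief_update(b, T_a, O_az):
    """One step of the standard POMDP belief update after taking action a
    and receiving observation z:  b'(s') ∝ O(s', a, z) · Σ_s T(s, a, s') b(s).

    b    : current belief over states, shape (|S|,)
    T_a  : transition matrix for a, T_a[s, s'] = P(s' | s, a)
    O_az : observation likelihood vector, O_az[s'] = P(z | s', a)
    """
    predicted = b @ T_a              # predict step: marginalise over s
    unnorm = O_az * predicted        # correct step: weight by P(z | s', a)
    return unnorm / unnorm.sum()     # renormalise to a proper distribution
```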
POMDP Formulation
States: Cartesian product of individual state vectors
Actions: $A = \{Colour, Shape, SIFT, \text{terminal actions}\}$
Observations: $Z = \bigcup_{a \in A} Z_a$, $a \in \{Colour, Shape, SIFT\}$, with
• colour observations $Z_c = \{R, G, B, E, U\}$ (red, green, blue, empty, unknown)
• shape observations $Z_s = \{C, T, S, E, U\}$ (circle, triangle, square, empty, unknown)
Transition function: $T: S \times A \times S \to [0,1]$
Observation function: $O: S \times A \times Z \to [0,1]$, given by the confusion matrices
Reward: $R: S \times A \to \mathbb{R}$; time cost of actions, large +ve/−ve rewards on terminal actions
Maintain a belief over states and the likelihood of action outcomes
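For concreteness, the tuple above can be bundled into a single structure. A sketch of such a container; the field names are our own choice, not the paper's:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class POMDP:
    """Container for the tuple defined above; field names are ours."""
    states: Sequence[str]         # Cartesian product of colour × shape, + terminal
    actions: Sequence[str]        # Colour, Shape, SIFT, terminal Say... actions
    observations: Sequence[str]   # red, green, blue, circle, ..., empty, unknown
    T: Callable[[str, str, str], float]   # T(s, a, s') in [0, 1]
    O: Callable[[str, str, str], float]   # O(s', a, z) in [0, 1], from confusion matrices
    R: Callable[[str, str], float]        # time cost of a, ± reward on terminal actions
```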
POMDP Formulation
For a broad query ('what is that?'), for each ROI:
• 26 states (5 colours × 5 shapes + terminal)
• 12 actions (2 operators, 10 terminal actions: SayBlueSquare, SayRedTriangle, SayUnknown, …)
• 8 observations
For n ROIs:
• $25^n + 1$ states
• Impractical for even a very small number of ROIs
BUT: there's lots of structure. How can we exploit it?
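The blow-up is easy to check numerically:

```python
# Joint state space over n ROIs: 25^n + 1 states.
for n in range(1, 6):
    print(n, 25 ** n + 1)   # 26, 626, 15626, 390626, 9765626
```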
A Hierarchical POMDP
Proposed solution: Hierarchical Planning in POMDPs (HiPPo)
• One LL-POMDP for planning the actions in each ROI
• A higher-level POMDP to choose which LL-POMDP to use at each step
• Significantly reduces the complexity of the state-action-observation space
• Model creation and policy generation are automatic, based on the input query
[Diagram: the HL POMDP chooses which region to process; the chosen LL POMDP decides how to process it]
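A sketch of the resulting two-level control loop, assuming each LL-POMDP execution returns a definite found/not-found label (the black-box treatment described later); all names are illustrative:

```python
def hippo_run(hl_policy, ll_policies, hl_update, belief):
    """Two-level HiPPo control loop (sketch; all names are illustrative).

    hl_policy   : maps the HL belief to an HL action ("DoR1", ..., or "Say...")
    ll_policies : per-ROI callables that execute the LL-POMDP policy and
                  return a definite found / not-found label
    hl_update   : updates the HL belief given (belief, roi, label)
    """
    while True:
        action = hl_policy(belief)
        if action.startswith("Say"):        # terminal HL action: commit to an answer
            return action
        label = ll_policies[action]()       # run the chosen LL-POMDP to completion
        belief = hl_update(belief, action, label)
```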
Low-level POMDP
• The LL-POMDP is the same as the flat POMDP
• Only ever operates on a single ROI
• 26 states, 12 actions
• Reward combines a time-based cost for actions with answer quality
$A = \{Colour, Shape, \ldots\}$
$Z = \bigcup_{a \in A} Z_a$, $a \in \{Colour, Shape\}$, with
$Z_c = \{R, G, B, E, U\}$ and $Z_s = \{C, T, S, E, U\}$ as before
$O: S \times A \times Z \to [0,1]$
$R: S \times A \to \mathbb{R}$
Terminal actions are answering the query for this region
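One way such a reward might look; a sketch with made-up numbers (real costs would be proportional to measured operator runtimes):

```python
# Illustrative reward of the kind described: time-based operator costs,
# large ± rewards on terminal answers. The numbers here are invented.
OPERATOR_COST = {"Colour": -2.0, "Shape": -3.0}

def reward(state: str, action: str) -> float:
    if action in OPERATOR_COST:
        return OPERATOR_COST[action]       # time cost of running the operator
    if action.startswith("Say"):
        correct = action == "Say" + state  # e.g. SayBlueCircle in state BlueCircle
        return 100.0 if correct else -100.0
    return 0.0
```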
Example
Query: ‘where is the blue circle?’
State space: {RedCircle, RedTriangle, BlueCircle, BlueTriangle, …, Terminal}
Actions: {Colour, Shape, …, SayFound, …}
Observations: {Red, Blue, NoColour, UnknownColour, Triangle, Circle, NoShape, UnknownShape, …}
Observation probabilities given by confusion matrix
Policy
Policy tree for a uniform-prior initial belief state
We limit all LL policies to a fixed maximum number of steps
[Figure: depth-limited policy tree. The root applies Colour; each colour observation branch (R, B, …) leads to a Shape node, whose observations (T, C, …) lead to the terminal actions sFound / sNotFound.]
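A sketch of how a depth-limited policy tree like this could be represented and executed; the structure and names are ours:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class PolicyNode:
    action: str                                   # "Colour", "Shape", or a terminal
    children: Dict[str, "PolicyNode"] = field(default_factory=dict)  # keyed by observation

def execute(node: PolicyNode, observe, max_steps: int) -> Optional[str]:
    """Walk the tree, branching on observations, for at most max_steps steps."""
    for _ in range(max_steps):
        if not node.children:        # leaf: terminal action (sFound / sNotFound)
            return node.action
        z = observe(node.action)     # run the operator and read its observation
        node = node.children[z]
    return node.action if not node.children else None  # step limit reached
```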
High-level POMDP
• The state space consists of the regions the object of interest is in
• Actions are regions to process
• Observations are whether the object of interest was found in a particular region
• We derive the observation function and action costs for the HL-POMDP from the policy tree for the LL-POMDP
$A_H = \{u_1, u_2\} \cup A_s$ (process ROI $R_1$ or $R_2$, plus the terminal Say actions $A_s$)
$Z = \{FR_i \mid R_i \in ROIs\}$ for $a \in \{u_1, u_2\}$
$O: S \times A \times Z \to [0,1]$
$R: S \times A \to \mathbb{R}$
Treat the LL-POMDP as a black box that returns definite labels (not belief densities)
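The slides derive the HL observation function and action costs from the LL policy tree itself; as a rough stand-in, the same quantities can be estimated by simulating LL episodes. A sketch under that assumption (all names are illustrative):

```python
def hl_model_from_ll(run_ll, trials: int = 10_000):
    """Monte Carlo stand-in for deriving the HL-POMDP observation function and
    action cost from the LL policy tree (the exact computation walks the tree).

    run_ll(present) -> (label, time_cost): one simulated LL episode with the
    object of interest present or absent in the ROI.
    """
    def p_found(present: bool) -> float:
        return sum(run_ll(present)[0] == "sFound" for _ in range(trials)) / trials

    mean_cost = sum(run_ll(True)[1] for _ in range(trials)) / trials
    return p_found(True), p_found(False), mean_cost  # O(z|s) entries, HL action cost
```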
Example
Query: ‘where is the blue circle?’
State space: $\{R_1 R_2,\ R_1 \bar{R}_2,\ \bar{R}_1 R_2,\ \bar{R}_1 \bar{R}_2\}$ (blue circle in both regions, in $R_1$ only, in $R_2$ only, or in neither)
Actions: {DoR1, DoR2, SayR1, SayR2, SayR1∧R2, SayNo}
Observations:{FoundR1, ¬FoundR1, FoundR2, ¬FoundR2}
Observation probabilities are computed from the LL-POMDP
Vent Finding Approach
• Assume mapping using an occupancy grid
• Rewards only for visiting cells with vents in them
• The state space is again too large to solve the POMDP
• Instead, do a fixed-length lookahead in belief space
• Reasoning in belief space allows us to account for the value of information gained from observations
• Use P(vent | all observations so far) as the heuristic value at the end of the lookahead
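A sketch of the fixed-length lookahead described above, with the heuristic applied at the ends of the lookahead (all names are illustrative):

```python
def lookahead(belief, depth, actions, simulate, heuristic):
    """Fixed-depth search in belief space (sketch; names are illustrative).

    simulate(belief, a) -> [(prob, reward, next_belief), ...] enumerates the
    possible observation outcomes of action a; heuristic(belief) is e.g. the
    P(vent | all observations so far) value used at the end of the lookahead.
    Returns (value, best_action).
    """
    if depth == 0:
        return heuristic(belief), None
    best_value, best_action = float("-inf"), None
    for a in actions:
        value = sum(p * (r + lookahead(b2, depth - 1, actions, simulate, heuristic)[0])
                    for p, r, b2 in simulate(belief, a))
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```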
What we’re working on now
Most of these POMDPs are too big to solve
• Take a domain and problem description in a very general language and generate a classical planning problem for it
• Assume we can observe any variable we care about
• For each such observation, use a POMDP planner to determine the value of the variable with high confidence