DARPA Mobile Autonomous Robot Software. Leslie Pack Kaelbling, March 2001
Adaptive Intelligent Mobile Robotics
Leslie Pack Kaelbling
Artificial Intelligence Laboratory
MIT
Pyramid
• Addressing the problem at multiple levels:
  • Planning
  • Built-in Behaviors
  • Learning
Built-in Behaviors
Goal: general-purpose, robust visually guided local navigation
• optical flow for depth information
• finding the floor
  • optical flow information
  • Horswill’s ground-plane method
• build local occupancy grids
• navigate given the grid
  • reactive methods
  • dynamic programming
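The dynamic-programming navigation step above can be sketched as distance propagation over a local occupancy grid followed by greedy descent. The grid encoding, 4-connected moves, and BFS formulation here are illustrative assumptions, not the system's actual implementation:

```python
import numpy as np
from collections import deque

def grid_distances(occ, goal):
    """Dynamic-programming distance-to-goal over a local occupancy grid.

    occ  : 2D bool array, True = obstacle
    goal : (row, col) of the goal cell
    Returns an array of step counts to the goal (inf where unreachable).
    """
    dist = np.full(occ.shape, np.inf)
    dist[goal] = 0
    q = deque([goal])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < occ.shape[0] and 0 <= nc < occ.shape[1]
                    and not occ[nr, nc] and dist[nr, nc] == np.inf):
                dist[nr, nc] = dist[r, c] + 1
                q.append((nr, nc))
    return dist

def next_step(dist, pos):
    """Greedy descent on the distance field: move to the neighbour nearest the goal."""
    r, c = pos
    best = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if (0 <= nr < dist.shape[0] and 0 <= nc < dist.shape[1]
                and dist[nr, nc] < dist[best]):
            best = (nr, nc)
    return best
```

Recomputing the field as the occupancy grid is updated gives a simple reactive/deliberative blend: the grid is local and cheap to replan over.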
Reactive Obstacle Avoidance
Standard method in mobile robotics is to use potential fields
• attractive force toward goal
• repulsive forces away from obstacles
• robot moves in direction given by resultant force
New method for non-holonomic robots: move the center of the robot so that the front point is holonomic
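The standard potential-field scheme can be sketched in a few lines. The gain values and influence radius below are illustrative placeholders, not parameters from the talk:

```python
import numpy as np

def potential_field_step(pos, goal, obstacles, k_att=1.0, k_rep=1.0, influence=2.0):
    """One step of standard potential-field obstacle avoidance.

    The attractive force pulls toward the goal; each obstacle within
    `influence` distance pushes away, more strongly as the robot nears it.
    Returns a unit direction vector to move along.
    """
    pos, goal = np.asarray(pos, float), np.asarray(goal, float)
    force = k_att * (goal - pos)                      # attractive component
    for obs in obstacles:
        diff = pos - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0 < d < influence:
            # repulsive component, growing as 1/d near the obstacle
            force += k_rep * (1.0 / d - 1.0 / influence) * diff / d**3
    n = np.linalg.norm(force)
    return force / n if n > 0 else force
```

For the non-holonomic variant mentioned above, the same resultant force would be applied at a reference point ahead of the axle, which can be steered freely, rather than at the robot's center.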
Human Obstacle Avoidance
Control law based on visual angle and distance to goal and obstacles
Parameters set based on experiments with humans in a large free-walking VR environment
Humans are Smooth!
Behavior Learning
Typical RL methods require far too much data to be practical in an online setting. We address the problem with:
• strong generalization techniques
  • locally weighted regression
  • “skeptical” Q-learning
• bootstrapping from human-supplied policy
  • need not be optimal and might be very wrong
  • shows learner “interesting” parts of the space
  • “bad” initial policies might be more effective
Two Learning Phases
LearningSystem
SuppliedControlPolicy
Environment
Phase One
AR O
[Diagram, phase two: the same components, but now the learning system chooses the actions itself; the supplied control policy no longer drives the robot.]
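The two-phase scheme — learning first by watching the supplied policy, then from the learner's own actions — can be sketched on a toy problem. The tabular chain MDP, the random "very wrong" supplied policy, and all constants below are illustrative stand-ins for the visual-navigation task (the real system uses locally weighted regression, not a table):

```python
import random

def q_learning_two_phase(n_states=6, episodes=200, alpha=0.5, gamma=0.9, seed=0):
    """Two-phase Q-learning on a toy chain MDP, bootstrapped from a supplied policy.

    Phase one: the supplied policy controls the agent while the learner
    updates Q-values from the experience it generates.
    Phase two: the learner takes over, acting mostly greedily.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right

    def step(s, a):
        s2 = max(0, min(goal, s + (1 if a == 1 else -1)))
        return s2, (10.0 if s2 == goal else -1.0)

    def run_episode(policy):
        s = 0
        for _ in range(50):
            a = policy(s)
            s2, r = step(s, a)
            # off-policy update: bootstrap from the greedy value of s2
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
            if s == goal:
                break

    supplied = lambda s: rng.choice([0, 1])           # a "very wrong" supplied policy
    greedy = lambda s: 0 if Q[s][0] > Q[s][1] else 1  # learner's own policy

    for _ in range(episodes):                          # phase one: watch the trainer
        run_episode(supplied)
    for _ in range(episodes):                          # phase two: learner in control
        run_episode(lambda s: supplied(s) if rng.random() < 0.1 else greedy(s))
    return Q, greedy
```

Because Q-learning is off-policy, even the random supplied policy steers the learner through the interesting parts of the space, and the greedy policy extracted afterwards can be much better than its trainer.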
New Results
Drive to goal, avoiding obstacles in visual field
Inputs (6 dimensions):
• heading and distance to goal
• image coordinates of two obstacles

Output:
• steering angle

Reward:
• +10 for getting to goal; -5 for running over obstacle

Training: simple policy that avoids one obstacle
Robot’s View
Local Navigation
[Plot: average steps to goal (0–250) vs. training runs, comparing JAQL, Optimal, and Trainer, spanning phase 1 and phase 2.]
Map Learning
Robot learns high-level structure of environment
• topological maps appropriate for large-scale structure
• low-level behaviors induce topology
• based on previous work using sonar
• vision changes problem dramatically
  • no more problems with many states looking the same
  • now same state always looks different!
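The topological-map idea can be sketched as incremental graph construction: nodes are places, edges record that a low-level behavior carried the robot from one place to another. The signature sequence and the `match` predicate below are hypothetical stand-ins for the stored-image place representation:

```python
def build_topological_map(observations, match):
    """Incremental topological map from a stream of place signatures.

    observations : sequence of place signatures seen by the robot
    match(a, b)  : True if two signatures correspond to the same place
    Returns (places, edges): the list of distinct place signatures and a
    set of directed (place_index, place_index) behaviour transitions.
    """
    places, edges = [], set()
    prev = None
    for obs in observations:
        # find an existing place this view matches, else create a new node
        node = next((i for i, p in enumerate(places) if match(p, obs)), None)
        if node is None:
            places.append(obs)
            node = len(places) - 1
        if prev is not None and prev != node:
            edges.add((prev, node))
        prev = node
    return places, edges
```

The hard part in practice is exactly the `match` function — deciding when two views are the same place — which is why the slides stress place representation and matching features.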
Sonar-Based Map Learning
[Figure: raw sonar data alongside the true model of the environment.]
Current Issues in Map Learning
• segmenting space into “rooms”
• detecting doors and corridor openings
• representation of places
  • stored images
  • gross 3D structure
  • features for image and structure matching
Large Simulation Domain
Use for learning and large-scale experimentation that is impractical on a real robot
• built using video-game engine
• large multi-story building
• packages to deliver
• battery power management
• other agents (to survey)
• dynamically appearing items to collect
• general Bayes-net specification so it can be used widely as a test bed
Hierarchical MDP Planning
The large simulated domain has unspeakably many primitive states. Use a hierarchical representation for planning:
• logarithmic improvement in planning times
• some loss of optimality of plans

Existing work on planning and learning given a hierarchy:
• temporal abstraction: macro actions
• spatial abstraction: aggregated states

Where does the hierarchy come from?
• combined spatial and temporal abstraction
• top-down splitting approach
Region-Based Hierarchies
Divide state space into regions
• each region is a single abstract state at the next level
• policies for moving through regions are abstract actions at the next level
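The lifting step can be sketched directly: given the low-level transition graph and an assignment of states to regions, every boundary-crossing transition induces an abstract action between two abstract states. The edge-list representation here is an illustrative simplification:

```python
def abstract_transitions(edges, region_of):
    """Lift a low-level transition graph to the region level.

    edges     : iterable of (state, state) low-level transitions
    region_of : function mapping a state to its region id
    Returns the set of (region, region) abstract transitions, i.e. the
    pairs of regions connected by at least one boundary-crossing edge.
    """
    abstract = set()
    for s, s2 in edges:
        r, r2 = region_of(s), region_of(s2)
        if r != r2:
            abstract.add((r, r2))
    return abstract
```

Each abstract transition would then be realized by a macro action: a policy, computed within the source region, for reaching that particular boundary.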
Choosing Macros
Given a choice of a region, what is a good set of macro actions for traversing it?
• existing approaches guarantee optimality, but with a number of macros exponential in the number of exit states
• our method is approximate, but works well when there are no large rewards inside the region
Point-Source Rewards
• Compute a value function for each possible exit state, offline
• Given a new valuation of all exit states online, quickly combine value functions to determine a near-optimal action
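One plausible reading of the online combination step is: treat each exit as an independent point source, scale its precomputed value function by the exit's new value, and take the best exit at each state. This specific max-of-scaled-functions rule is an assumption for illustration, not necessarily the talk's exact construction:

```python
import numpy as np

def combine_exit_values(per_exit_V, exit_values):
    """Point-source combination of precomputed per-exit value functions.

    per_exit_V  : array (n_exits, n_states); per_exit_V[e][s] is the value of
                  state s when exit e alone is worth 1 (computed offline)
    exit_values : array (n_exits,); the new online valuation of each exit
    Returns the approximate value of every state, taking for each state the
    best single exit to head for.
    """
    per_exit_V = np.asarray(per_exit_V, float)
    exit_values = np.asarray(exit_values, float)
    # scale each point-source value function by its exit's worth, then take
    # the best exit per state
    return (per_exit_V * exit_values[:, None]).max(axis=0)
```

The combination is a handful of array operations, so revaluing the exits online is far cheaper than re-solving the region — which is where the approximation buys its speed.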
Approximation is Good
[Plot: value (0–700) vs. distance between point sources (0–1000), comparing the Optimal solution with the Point Source approximation.]
How to Use the Hierarchy
Off-line:
• decompose environment into abstract states
• compute macro operators

On-line:
• given new goal, assign values to exits at highest level
• propagate values at each level
• in current low-level region, choose action
What Makes a Decomposition Good?
Trade off:
• decrease in off-line planning time
• decrease in on-line planning time
• decrease in value of actions
We can articulate this criterion formally but…
… we can’t solve it
Current research on reasonable approximations
Next Steps
Low-level:
• apply JAQL to tune obstacle avoidance behaviors

Map learning:
• landmark selection and representation
• visual detection of openings

Hierarchy:
• algorithm for constructing decomposition
• test hierarchical planning on huge simulated domain