Learning from Observation Using Primitives
Darrin Bentivegna
Bentivegna Thesis, July 2004
Outline
- Motivation
- Test environments
- Learning from observation
- Learning from practice
- Contributions
- Future directions
Motivation
- Reduce the learning time needed by robots.
- Quickly learn skills from observing others.
- Improve performance through practice.
- Adapt to environment changes.
- Create robots that can interact with and learn from humans in a human-like way.
Real World Marble Maze
Real World Air Hockey
Research Strategy
- Domain knowledge: a library of primitives.
- Manually defining primitives is a natural way to specify domain knowledge.
- The focus of this research is on how to use a fixed library of primitives.

[Figure: Marble Maze primitives: Roll Off Wall, Roll To Corner, Guide, Roll From Wall, Leave Corner]
Primitives in Air Hockey
- Right Bank Shot
- Straight Shot
- Left Bank Shot
- Defend Goal
- Static Shot
- Idle
Take home message
- Learning using primitives greatly speeds up learning and allows robots to perform more complex tasks.
- Memory-based learning makes learning from observation easy.
- I created a way to do memory-based reinforcement learning; since there is no fixed set of parameters to adjust, the system learns by adjusting the distance function.
- I present algorithms that learn from both observation and practice.
Observe Critical Events in Marble Maze
[Figures: raw observed data from a maze run, with wall contacts inferred from the marble's trajectory]
Observe Critical Events in Air Hockey
[Figures: shots made by the human, human paddle movement, and puck movement; plots of paddle X/Y and puck X/Y positions over time]
Learning From Observation
- Memory-based learner: learn by storing experiences.
- Primitive selection: k-nearest neighbor.
- Sub-goal generation: kernel regression (distance-weighted averaging) based on remembered primitives of the appropriate type.
- Action generation: learned or fixed policy.
Three Level Structure
- Primitive Selection
- Sub-goal Generation
- Action Generation
Learning from Observation Framework
[Diagram: the three-level structure (Primitive Selection, Sub-goal Generation, Action Generation), with learning from observation feeding all three levels]
Observe Primitives Performed by a Human
[Plot: maze board with each observed primitive marked; legend: ◊ Guide, ○ Roll To Corner, □ Roll Off Wall, * Roll From Wall, X Leave Corner]
Primitive Database
Create a data point for each observed primitive:
- The primitive type performed: TYPE.
- State of the environment at the start of the primitive performance: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
- State of the environment at the end of the primitive performance: $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.

Stored data point: $(TYPE, M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$, where $M$ is the marble state, $B$ the board angles, and the $E$ prefix marks end-of-primitive values.
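As a concrete illustration of this layout, a minimal Python sketch of a database entry; the class and field names are hypothetical, not from the thesis, and the numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class PrimitiveDataPoint:
    """One observed primitive: its type plus start and end states."""
    ptype: str    # primitive type, e.g. "roll_to_corner" (hypothetical label)
    start: tuple  # (Mx, My, dMx, dMy, Bx, By) at the start of the primitive
    end: tuple    # (EMx, EMy, EdMx, EdMy, EBx, EBy) at the end

# The primitive database is simply a growing list of observations.
database: list = []
database.append(PrimitiveDataPoint(
    ptype="roll_to_corner",
    start=(12.0, 18.5, 0.3, -0.1, 0.02, 0.01),   # illustrative values
    end=(13.1, 19.4, 0.0, 0.0, 0.00, 0.00),
))
```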
Marble Maze Example
[Diagram: the three-level structure instantiated for the marble maze]
Primitive Type Selection
- Look up using the environment state.
- Weighted nearest neighbor: $d(x, q) = \sum_j w_j (x_j - q_j)^2$, with query $q = (M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
- Many ways to select a primitive type:
  - Use the closest point.
  - Use the n nearest points to vote, by highest frequency or weighted by distance from the query point (a sketch of this follows below).
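A sketch of distance-weighted voting under the distance above; the function names are mine, and it reuses PrimitiveDataPoint from the earlier sketch.

```python
from collections import defaultdict

def distance(x, q, w):
    """Weighted distance d(x, q) = sum_j w_j * (x_j - q_j)^2."""
    return sum(wj * (xj - qj) ** 2 for wj, xj, qj in zip(w, x, q))

def select_primitive_type(database, q, w, n=5):
    """Pick a primitive type by distance-weighted vote of the n nearest points."""
    nearest = sorted(database, key=lambda p: distance(p.start, q, w))[:n]
    votes = defaultdict(float)
    for p in nearest:
        # Closer data points get a larger vote for their primitive type.
        votes[p.ptype] += 1.0 / (distance(p.start, q, w) + 1e-9)
    return max(votes, key=votes.get)
```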
Sub-goal Generation
- Locally weighted average over nearby primitives (data points) of the same type.
- Use a kernel function to control the influence of nearby data points: $K(d) = e^{-d^2}$.

$$\hat{y}(q) = \frac{\sum_{i=1}^{n} y_i \, K(d(x_i, q))}{\sum_{i=1}^{n} K(d(x_i, q))}$$

The sub-goal $\hat{y}(q)$ is the kernel-weighted average of the stored end states $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$; a sketch follows below.
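A minimal implementation of this kernel regression, reusing distance and PrimitiveDataPoint from the sketches above:

```python
import math

def subgoal(database, q, w, ptype):
    """Kernel-weighted average of end states over data points of one type."""
    # Assumes at least one stored primitive of this type.
    points = [p for p in database if p.ptype == ptype]
    kernel = [math.exp(-distance(p.start, q, w) ** 2) for p in points]  # K(d) = exp(-d^2)
    total = sum(kernel)
    dim = len(points[0].end)
    return tuple(
        sum(k * p.end[j] for k, p in zip(kernel, points)) / total
        for j in range(dim)
    )
```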
Action Generation
- Provides the action (motor command) to perform at each time step.
- LWR, neural networks, a physical model, etc.
- Maps the current state and sub-goal to a board-angle command: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y) \rightarrow (B_x, B_y)$.
Creating an Action Generation Module (Roll to Corner)
Record at each time step from the beginning to the end of the primitive:
- Environment state: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$
- Actions taken: $(B_x, B_y)$
- End state: $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$

[Figure: observed environment states from the start to the end of the Roll To Corner primitive, with the incoming velocity vector]
Transform to a Local Coordinate Frame
- Global information: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y) \rightarrow (B_x, B_y)$
- Primitive-specific local information: $(X, \dot{M}_x, B_x, B_y, E\dot{M}_x, EB_x, EB_y) \rightarrow (B_x, B_y)$, where $X$ is the distance to the end of the primitive, measured in a local +X/+Y frame anchored at a reference point.
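A sketch of such a transform, assuming the frame sits at a reference point with its +X axis along the wall; the angle convention here is my assumption, not the thesis's.

```python
import math

def to_local_frame(mx, my, dmx, dmy, ref_x, ref_y, theta):
    """Rotate/translate global marble position and velocity into a local
    frame at (ref_x, ref_y) whose +X axis points along direction theta."""
    c, s = math.cos(-theta), math.sin(-theta)
    lx = c * (mx - ref_x) - s * (my - ref_y)   # local position
    ly = s * (mx - ref_x) + c * (my - ref_y)
    lvx = c * dmx - s * dmy                    # local velocity
    lvy = s * dmx + c * dmy
    return lx, ly, lvx, lvy
```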
Learning the Maze Task from Only Observation
Related Research: Primitive Recognition
- Survey of research in human motion analysis and recognizing human activities from image sequences: Aggarwal and Cai.
- Recognize over time: HMMs, Brand, Oliver, and Pentland; template matching, Davis and Bobick.
- Discover primitives: Fod, Mataric, and Jenkins.
Related Research: Primitive Selection
- Predefined sequence: virtual characters, Hodgins et al., Faloutsos et al., and Mataric et al.; mobile robots, Balch et al. and Arkin et al.
- Learn from observation: assembly, Kuniyoshi, Inaba, Inoue, and Kang.
- Use a planning system: assembly, Thomas and Wahl; RL, Ryan and Reid.
Related Research: Primitive Execution
- Predefined execution policy: virtual characters, Mataric et al. and Hodgins et al.; mobile robots, Brooks et al. and Arkin.
- Learn while operating in the environment: mobile robots, Mahadevan and Connell; RL, Kaelbling, Dietterich, and Sutton et al.
- Learn from observation: mobile robots, Larson and Voyles, Hugues and Drogoul, and Grudic and Lawrence; high-DOF robots, Aboaf et al., Atkeson, and Schaal.
Review
[Diagram: the three-level structure (Primitive Selection, Sub-goal Generation, Action Generation) learned from observation]
Using Only Observed Data
- Tries to mimic the teacher.
- Cannot always perform primitives as well as the teacher.
- Sometimes selects the wrong primitive type for the observed state.
- Does not know what to do in states it has not observed, and has no way to know it should try something different.
- Solution: learning from practice.
Improving Primitive Selection and Sub-goal Generation from Practice
[Diagram: the three-level structure, now fed by both learning from observation and learning from practice]
Improving Primitive Selection and Sub-goal Generation Through Practice
- Need task specification information to create a reward function.
- Learn by adjusting the distance to the query: scale the distance function by the value of using a data point, $d'(x, q) = d(x, q) \cdot f(x, q)$ with $d(x, q) = \sum_j w_j (x_j - q_j)^2$.
- $f(\text{data point location}, \text{query location})$ is related to the Q value: $1/Q$ or $\exp(-Q)$.
- Scale values are associated with each data point; they must be stored and learned. A sketch follows below.
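A minimal sketch of the scaled distance, using the $\exp(-Q)$ form from the slide and reusing the distance function above; the names are mine.

```python
import math

def scaled_distance(p, q, w, q_value):
    """d'(x, q) = d(x, q) * f, with f = exp(-Q): data points that earned a
    high Q value look closer to the query and so win more selections."""
    return distance(p.start, q, w) * math.exp(-q_value)
```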
Store Values in Function Approximator
- Look-up table: fixed size.
- Locally Weighted Projection Regression (LWPR), Schaal et al.
- Create a model for each data point, indexed by the difference between the query point and the data point's state (the delta-state).
Learn Values Using a Reinforcement Learning Strategy
- State: the delta-state.
- Action: using this data point.
- Reward assignment (a sketch follows below):
  - Positive: making progress through the maze.
  - Negative: falling into a hole, going backwards through the maze, or taking time performing the primitive.
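One way to encode that reward assignment; the weights are illustrative, not taken from the thesis.

```python
def reward(progress, fell_in_hole, went_backwards, steps):
    """Reward for one primitive execution in the maze."""
    r = 1.0 * progress      # positive: progress made through the maze
    if fell_in_hole:
        r -= 10.0           # negative: falling into a hole
    if went_backwards:
        r -= 2.0            # negative: going backwards through the maze
    r -= 0.01 * steps       # negative: time spent performing the primitive
    return r
```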
Learning the Value of Choosing a Data Point (Simulation)
[Figure: an observed Roll Off Wall primitive with its incoming velocity vector, and the scale values computed over a testing area near (12.9, 18.8), shaded from BAD to GOOD, obtained by querying the LWPR model associated with that primitive at marble positions with the incoming velocity shown]
Maze Learning from Practice
[Plots: cumulative failures per meter over trials for the look-up table, LWPR, and observation-only learners, in both the real world and simulation]
Learning New Strategies
[Videos: a human demonstration, and learning from observation after learning from practice]
Learning Action Generation from Practice
[Diagram: the three-level structure with learning from observation and learning from practice, now focused on Action Generation]
Improving Action Generation Through Practice
- The environment changes over time.
- Need to compensate for structural modeling error.
- Cannot learn everything from only observing others.
Knowledge for Making a Hit
After the hit location has been determined, making a shot requires models of:
- Puck movement
- Puck-paddle collision
- Paddle placement
- Paddle movement timing

[Figure: the path of the incoming puck to the hit location on the hit line, the target line to the target location, and the absolute post-hit velocity]

[Diagram: Target Location → (Puck Motion) → Outgoing Puck Velocity → (Impact) → Incoming Paddle Velocity → (Robot) → Robot Trajectory]
Results of Learning Straight Shots (Simulation)
- Observed 44 straight shots made by the human.
- Practiced 100 shots.
- Results are a running average of 5 shots; there is too much noise in hardware sensing.

[Plot: target error (meters) versus number of shots taken]
Robot Model (Real World)
[Diagram: the shot-knowledge chain with the Robot model (incoming paddle velocity to robot trajectory) highlighted]
Obtaining Proper Robot Movement
- Six preset robot configurations; joint angles for a paddle position P(x, y) are computed by interpolating between the four surrounding configurations.
- Paddle command: the desired end location and time of the trajectory, (x, y, t).
- The trajectory follows a fifth-order polynomial with zero start and end velocity and acceleration (a sketch follows below).
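The thesis does not give its exact coefficients, but any fifth-order polynomial meeting these rest-to-rest boundary conditions reduces to the standard profile s(tau) = 10 tau^3 - 15 tau^4 + 6 tau^5; a minimal sketch:

```python
def quintic_position(p0, p1, T, t):
    """Position at time t along a fifth-order polynomial from p0 to p1 over
    duration T, with zero velocity and acceleration at both endpoints."""
    tau = min(max(t / T, 0.0), 1.0)              # normalized time in [0, 1]
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5   # s(0)=0, s(1)=1, s'=s''=0 at ends
    return p0 + (p1 - p0) * s

# Example: paddle x-coordinate halfway through a 0.4 s move from 0.2 m to 0.35 m.
x_mid = quintic_position(0.2, 0.35, 0.4, 0.2)
```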
Robot Model
Given the desired state of the puck at hit time:
1. Compute the movement command (x, y, t).
2. Apply a pre-set time delay.
3. Generate the robot trajectory.

[Plot: a generated robot trajectory from the starting location to the hit point]
Robot Movement Errors
[Plot: the desired trajectory versus the observed path of the paddle, marking the desired hit location and the location of highest paddle velocity]

Movement accuracy is determined by many factors:
- Speed of the movement.
- Friction between the paddle and the board.
- Hydraulic pressure applied to the robot.
- Operating within the designed performance parameters.
Robot Model
- Learn to properly place the paddle.
- Learn the timing of the paddle.
- Observe its own actions:
  - Actual hit point (the highest-velocity point).
  - Time from when the command is given to when the paddle is observed at the hit position.
Improving the Robot Model
Given the desired state of the puck at hit time:
1. Compute the movement command (x, y, t), corrected by an LWPR model of where the robot actually hits.
2. Replace the pre-set time delay with a delay predicted by an LWPR model from observed timing information.
3. Generate the robot trajectory.

A stand-in sketch of this self-correction follows below.
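This sketch substitutes a k-nearest-neighbor average for the LWPR models, and all names are mine; it shows the shape of the correction, not the thesis's implementation.

```python
import math

class RobotModelCorrection:
    """Correct commanded hit position and time delay from observed hits."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # (commanded (x, y), position error (ex, ey), delay)

    def record(self, cmd_xy, observed_xy, observed_delay):
        """Store one observed execution of a movement command."""
        ex = cmd_xy[0] - observed_xy[0]
        ey = cmd_xy[1] - observed_xy[1]
        self.memory.append((cmd_xy, (ex, ey), observed_delay))

    def corrected_command(self, cmd_xy, default_delay=0.0):
        """Shift the command by the average nearby past error; predict delay."""
        if not self.memory:
            return cmd_xy, default_delay
        near = sorted(self.memory, key=lambda m: math.dist(m[0], cmd_xy))[:self.k]
        ex = sum(m[1][0] for m in near) / len(near)
        ey = sum(m[1][1] for m in near) / len(near)
        delay = sum(m[2] for m in near) / len(near)
        return (cmd_xy[0] + ex, cmd_xy[1] + ey), delay
```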
Using the Improved Robot Model
[Plots: the desired trajectory versus the observed path of the paddle after learning, marking the desired hit location and the location of highest paddle velocity]
Real-World Air Hockey
Major Contributions
- A framework has been created as a tool with which to perform research in learning from observation using primitives. Its flexible structure allows the use of various learning algorithms, and it can also learn from practice.
- Presented learning methods that learn quickly from observed information and can also increase performance through practice.
- Created a unique algorithm that gives a robot the ability to learn the effectiveness of data points in a database and then use that information to change its behavior as it operates in the environment.
- Presented a method of breaking the learning problem into small learning modules; individual modules have more opportunities to learn and generalize.
Some Future Directions
- Automatically defining primitive types.
- Explore how to represent learned information so it can be used in other tasks and environments.
- Can robots learn about the world from playing these games?
- Explore other ways to select primitives and sub-goals.
- Use the observed information to create a planner.
- Investigate methods of exploration at primitive selection and sub-goal generation.
- Simultaneously learn primitive selection and action generation.