Muhammad Al-Nasser, Mohammad Shahab
Stochastic Optimization of Bipedal Walking using Gyro Feedback and
Phase Resetting
King Fahd University of Petroleum and Minerals
March 2008 COE584: Robotics
COE 584/484: Robotics
Outline
1. Problem Definition
2. Physical Description
3. Humanoid Walking System
4. Feedback
   1. Gyroscope
   2. Phase Resetting
5. Stochastic Optimization
   1. PGRL
6. Experimentation
7. Comments
Problem Definition
Authors: Felix Faber & Sven Behnke, University of Freiburg, Germany
Problem Statement: “to optimize the walking pattern of a
humanoid robot for forward speed using suitable metaheuristics”
First Humanoid Robot!
• 1206 AD
• Ibn Ismail Ibn al-Razzaz Al-Jazari
• A boat with four programmable automatic musicians that floated on a lake to entertain guests at royal drinking parties!!
Problem Definition
• Problems?
  – Nonlinear dynamics: i.e. a complex system to control
  – Sensor noise: camera, gyroscope, ultrasonic, force, …
  – Environment disturbances: unknown surface, …
  – Inaccurate actuators: motors, …
Physical Description
• Jupp, team NimbRo
• 60 cm, 2.3 kg
• Pocket PC
Physical Description
• Pitch joint to bend trunk
• Each leg
  – 3-DOF hip
  – Knee
  – 2-DOF ankle
• Each arm
  – 2-DOF shoulder
  – Elbow
Humanoid Walking System
• One Approach: Model-Based (Geometric Model)
  – Accurate model
  – Solving the motion equations for all joints (offline)
  – 19 degrees of freedom
  – Nonlinear model equations
  – Computational complexity
[Diagram: leg motion trajectory, controller, joint motor positions; the robot walks]
Humanoid Walking System
• 2nd Approach: Central Pattern Generators (CPG)
  – Sinusoidal joint trajectories are generated
  – Bio-inspired
  – No need for a model
[Diagram: controller, joint motor positions]
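A minimal sketch of the idea of a sinusoidal, model-free joint trajectory. The function name, the amplitude, the frequency, and the 10 ms sampling period are illustrative assumptions, not values from the slides or the paper.

```python
import math


def cpg_joint_angle(t, amplitude, frequency_hz, phase_offset=0.0, center=0.0):
    """Open-loop sinusoidal joint target (radians) at time t (seconds).

    All arguments are hypothetical tuning values chosen for illustration.
    """
    return center + amplitude * math.sin(
        2.0 * math.pi * frequency_hz * t + phase_offset
    )


# Sample one second of a joint trajectory at an assumed 10 ms period.
trajectory = [
    cpg_joint_angle(k * 0.01, amplitude=0.3, frequency_hz=1.5)
    for k in range(100)
]
```

No dynamic model of the robot appears anywhere in the generator; only the waveform parameters need tuning.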
Humanoid Walking System
• Open-loop (no feedback) Gait
• Mechanism
  1. Shifting weight from one leg to the other
  2. Shortening the leg that is not needed for support
  3. Leg motion in the forward direction
Humanoid Walking System
Open-loop Gait
• Clock-driven, with the trunk phase being central
• The clock drives the trunk phase (at the ‘foot step frequency’)
• Right leg motion phase = trunk phase + π/2
• Left leg motion phase = trunk phase − π/2
[Plot: phases over time]
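The clock-driven phase relationship can be sketched as follows. This is only an illustration: it assumes the trunk phase advances linearly with the clock, that phases are wrapped into [-π, π), and that the leg offsets are ±π/2. The function names and the frequency argument are hypothetical.

```python
import math


def wrap_phase(phi):
    """Wrap an angle into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def leg_phases(t, step_frequency_hz):
    """Return (trunk, right_leg, left_leg) phases at clock time t (seconds).

    Assumption: the trunk phase grows linearly with the clock; each leg
    phase is the trunk phase shifted by +pi/2 (right) or -pi/2 (left).
    """
    trunk = wrap_phase(2.0 * math.pi * step_frequency_hz * t)
    right = wrap_phase(trunk + math.pi / 2.0)
    left = wrap_phase(trunk - math.pi / 2.0)
    return trunk, right, left
```

With these offsets the two legs are always half a cycle apart, which is what lets one leg swing while the other supports the robot.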
Humanoid Walking System
• (continued)
Kinematic Mapping
[Diagram: for each of the left and right legs, the mapping uses the leg angles (rLeg, pLeg, yLeg) and the foot angles (rFoot, pFoot)]
r: roll, p: pitch, y: yaw
“Human-Like Walking using Toes Joint and Straight Stance Leg” by Behnke
[Legend: the symbol with subscript ‘Swing’ denotes the leg swing amplitude; a second symbol denotes the leg extension]
Feedback
• Overall Control System
[Diagram: controller, mapping, joint motor positions]
1. Gyroscope: the gyro reading is the angular velocity of the robot’s inclination (balance)
2. Force-sensing resistors: trigger when a foot touches the ground (‘High’ or ‘Low’)
Feedback
• Gyroscope
  – A device for measuring orientation, based on the principle of conservation of angular momentum
  – Remember Physics 101!
Feedback
• P-Control
• Increasing gyro reading = robot is falling
• Proportional control
  – Reactive action proportional to the ‘error’ (error = sensor value − desired value)
  – Desired values = zero (i.e. no inclination)
• Other: proportional-integral control
  – Action proportional to the ‘error’ and to the accumulated ‘error’
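[Diagram: the pitch (p) and roll (r) gyro readings are each scaled by a gain K and used to update the foot angle from its old value (FootOld) to its new value (FootNew), which sets the joint motor positions]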
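The proportional correction can be sketched as below. This is a hedged illustration: it assumes the scaled error is added to the previous foot angle (the sign convention and the exact form are not recoverable from the slide), and the function names, gains, and time step are hypothetical.

```python
def p_control_foot_angle(foot_angle_old, gyro_rate, gain, desired_rate=0.0):
    """Return a corrected foot angle using proportional feedback.

    Assumption: error = sensor value - desired value, and the correction
    gain * error is added to the old foot angle.
    """
    error = gyro_rate - desired_rate
    return foot_angle_old + gain * error


def pi_control_foot_angle(foot_angle_old, gyro_rate, kp, ki, integral, dt,
                          desired_rate=0.0):
    """Proportional-integral variant; returns (new_angle, new_integral)."""
    error = gyro_rate - desired_rate
    integral = integral + error * dt  # accumulated error
    return foot_angle_old + kp * error + ki * integral, integral


# One controller per axis: pitch and roll use separate (hypothetical) gains.
new_pitch = p_control_foot_angle(foot_angle_old=0.10, gyro_rate=0.20, gain=0.5)
new_roll = p_control_foot_angle(foot_angle_old=0.00, gyro_rate=-0.10, gain=0.3)
```

With a desired value of zero, no correction is applied while the gyro reads zero, which matches the "no inclination" target.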
Feedback
• Overall System
[Diagram: mapping, P-control, joint motor positions]
Feedback
• Overall System
[Diagram: controller, joint motor positions, online adaptation (stochastic optimization)]
• Adaptive control
  – Online tuning of the ‘parameters’ of the controller
Stochastic Optimization Approach
• Goal:
  – Adjust the parameters to achieve a faster and more stable walk.
• A fitness function (cost function) expresses the optimization goals (i.e. speed and robustness):
  $f(\cdot): \mathbb{R}^N \to \mathbb{R}$, where $N$ is the number of parameters of interest, evaluated as $f(x)$
Stochastic Optimization Approach
• The parameters are those of the kinematic mapping (Behnke paper)
Stochastic Optimization Approach
• We evaluate f at a given set of parameters
  – x = [x1, x2, ..., xN] (Table 1)
• Now, how do we find the parameter values that give the highest fitness value?
  – Use a metaheuristic method called PGRL
[Equation: expression involving the condition d < d_exp]
Policy Gradient Reinforcement Learning (PGRL)
• An optimization method to maximize the walking speed
• It automatically searches a set of possible parameters aiming to find the fastest walk that can be achieved
Policy Gradient Reinforcement Learning
• How does PGRL work?
• 1st: randomly generate B test policies {x1, x2, …, xB} around an initially given parameter vector xπ
  – where x = [x1, x2, …, xN]
  – Each parameter j in a given test policy xi is randomly set to $x^\pi_j - \varepsilon$, $x^\pi_j$, or $x^\pi_j + \varepsilon$
  – where 1 ≤ i ≤ B and 1 ≤ j ≤ N
  – ε is a small constant value
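The first step can be sketched as follows. It assumes each parameter independently receives a perturbation of −ε, 0, or +ε with equal probability (the slide only says "randomly"); the function name and the example values are hypothetical.

```python
import random


def generate_test_policies(x_pi, epsilon, num_policies, rng=None):
    """Generate test policies around the parameter vector x_pi.

    Returns (policies, signs), where signs[i][j] in {-1, 0, +1} records
    how parameter j of policy i was perturbed.
    Assumption: the three perturbations are equally likely.
    """
    rng = rng or random.Random()
    policies, signs = [], []
    for _ in range(num_policies):
        row = [rng.choice((-1, 0, 1)) for _ in x_pi]
        policies.append([x + s * epsilon for x, s in zip(x_pi, row)])
        signs.append(row)
    return policies, signs


# Example: B = 5 test policies around a two-parameter vector.
policies, signs = generate_test_policies([1.0, 2.0], epsilon=0.1,
                                         num_policies=5,
                                         rng=random.Random(0))
```

Recording the perturbation signs alongside the policies makes the later grouping step straightforward.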
Policy Gradient Reinforcement Learning
• 2nd: each test policy is evaluated by the ‘fitness function’.
• For each parameter j, the test policies are grouped into three categories, $S_{-\varepsilon,j}$, $S_{0,j}$, and $S_{+\varepsilon,j}$, depending on whether the jth parameter was modified by −ε, 0, or +ε
Policy Gradient Reinforcement Learning
• Next, 3rd: construct the vector a = [a1, a2, …, aN]
• where the A’s are the averages of each category
Policy Gradient Reinforcement Learning
• Then 4th (finally), adjust xπ as follows
where η is a scalar step size
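The four steps can be put together in a sketch of one PGRL iteration. The formulas for the vector a and for the update are not recoverable from the slides, so this sketch assumes the commonly cited Kohl and Stone formulation: a_j is zero when the unperturbed category has the best average, otherwise the difference between the +ε and −ε averages, and xπ moves by η along the normalized a. The function names and the toy fitness function are hypothetical.

```python
import math
import random


def pgrl_step(x_pi, fitness, epsilon, eta, num_policies, rng):
    """One PGRL iteration; returns the adjusted parameter vector."""
    n = len(x_pi)
    # 1st: random test policies, each parameter shifted by -eps, 0, or +eps.
    signs = [[rng.choice((-1, 0, 1)) for _ in range(n)]
             for _ in range(num_policies)]
    # 2nd: evaluate every test policy with the fitness function.
    scores = [fitness([x + s * epsilon for x, s in zip(x_pi, row)])
              for row in signs]
    # 3rd: build a from the per-category average fitness (assumed rule).
    a = []
    for j in range(n):
        groups = {-1: [], 0: [], 1: []}
        for row, score in zip(signs, scores):
            groups[row[j]].append(score)
        if not groups[-1] or not groups[1]:
            a.append(0.0)  # assumption: no estimate without both sides
            continue
        avg_minus = sum(groups[-1]) / len(groups[-1])
        avg_plus = sum(groups[1]) / len(groups[1])
        avg_zero = (sum(groups[0]) / len(groups[0])
                    if groups[0] else -math.inf)
        if avg_zero > avg_plus and avg_zero > avg_minus:
            a.append(0.0)
        else:
            a.append(avg_plus - avg_minus)
    # 4th: move x_pi by step size eta along the normalized vector a.
    norm = math.sqrt(sum(v * v for v in a))
    if norm == 0.0:
        return list(x_pi)
    return [x + eta * v / norm for x, v in zip(x_pi, a)]


def toy_fitness(x, target=(1.0, -1.0)):
    """Hypothetical stand-in for a walking trial: peak fitness at target."""
    return -sum((xi - ti) ** 2 for xi, ti in zip(x, target))


rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(100):
    x = pgrl_step(x, toy_fitness, epsilon=0.05, eta=0.05,
                  num_policies=30, rng=rng)
```

On the real robot each fitness evaluation is a walking trial, so the number of test policies per iteration directly trades estimate quality against experiment time.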
Extension to PGRL
• Adaptive step size after g steps:
  – where s is the number of fitness function evaluations
  – S is the maximum allowed value of s
Overall
• Overall System
[Diagram: PGRL (xπ), controller, joint motor positions]
Experiment
Results
• Initial: speed is 21.3 cm/s, fitness is 1.36
• After 1000 iterations: speed is 34.0 cm/s, fitness is 1.52
• Speed increase: about 60%
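As a quick arithmetic check, the relative improvements can be computed from the reported initial and final values. The variable names are illustrative; the numbers are the ones on the slide.

```python
initial_speed, final_speed = 21.3, 34.0      # cm/s, from the slide
initial_fitness, final_fitness = 1.36, 1.52  # from the slide

speed_gain = (final_speed - initial_speed) / initial_speed
fitness_gain = (final_fitness - initial_fitness) / initial_fitness

print(f"speed gain:   {speed_gain:.1%}")
print(f"fitness gain: {fitness_gain:.1%}")
```

The speed gain comes out at just under 60%, consistent with the 60% figure shown with the results.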
Parameters
Glossary
• Stance leg:
  – The leg that is on the floor during the walk.
• Swing leg:
  – The leg that is moving during the walk.
• Single support:
  – The case where the robot touches the floor with one leg.
• Double support:
  – The case where the robot touches the floor with both legs.