Stochastic Optimization of Bipedal Walking using Gyro Feedback and Phase Resetting

Muhammad Al-Nasser, Mohammad Shahab


King Fahd University of Petroleum and Minerals

March 2008, COE 584/484: Robotics

Outline
1. Problem Definition
2. Physical Description
3. Humanoid Walking System
4. Feedback
   1. Gyroscope
   2. Phase Resetting
5. Stochastic Optimization
   1. PGRL
6. Experimentation
7. Comments

Problem Definition

Authors: Felix Faber & Sven Behnke, University of Freiburg, Germany

Problem Statement: “to optimize the walking pattern of a humanoid robot for forward speed using suitable metaheuristics”

First Humanoid Robot!

• 1206 AD

• Ibn Ismail Ibn al-Razzaz Al-Jazari

• A boat with four programmable automatic musicians that floated on a lake to entertain guests at royal drinking parties!!

Problem Definition

• Problems?

– Nonlinear Dynamics: i.e. a complex system to control
– Sensor Noise: Camera, Gyroscope, Ultrasonic, Force, …
– Environment Disturbances: Unknown surface, …
– Inaccurate Actuators: Motors, …

Physical Description

• Jupp, team NimbRo

• 60 cm, 2.3 kg

• Pocket PC

Physical Description

• Pitch joint to bend trunk

• Each leg
– 3DOF hip
– Knee
– 2DOF ankle
• Each arm
– 2DOF shoulder
– Elbow

Humanoid Walking System
• One Approach
• Model-Based (Geometric Model)
– Accurate model
– Solving motion equations for all joints (offline)
– 19 Degrees of Freedom
– Nonlinear model equations
– Computational complexity
[Diagram: Leg Motion Trajectory → Controller → joint motor positions → Robot walks!]

Humanoid Walking System
• 2nd Approach
• Central Pattern Generators (CPG)
– Sinusoidal joint trajectories are generated
– Bio-inspired
– No need for a model
[Diagram: Controller → joint motor positions]

Humanoid Walking System
• Open-loop (no feedback) Gait
• Mechanism
1. Shifting weight from one leg to the other
2. Shortening the leg that is not needed for support
3. Leg motion in the forward direction

Humanoid Walking System
Open-loop Gait
Clock-driven, with the trunk phase being central
• Clock → Trunk phase (advancing with the ‘foot step frequency’)
• Right leg motion phase = Trunk phase + $\pi/2$
• Left leg motion phase = Trunk phase − $\pi/2$
[Plot: trunk phase over time]
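The clock-driven phase relationship above can be sketched in a few lines of Python. This is only an illustration: the function names, the wrapping of phases into [−π, π), and the assumption that the trunk phase advances by one full cycle per step period are not taken from the slides.

```python
import math

def wrap_phase(phi):
    """Wrap an angle into the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi

def gait_phases(t, step_frequency_hz):
    """Return (trunk, right_leg, left_leg) phases at time t in seconds.

    Assumption: the trunk phase grows linearly with time at the
    given step frequency; the legs are offset by +pi/2 and -pi/2.
    """
    trunk = wrap_phase(2.0 * math.pi * step_frequency_hz * t)
    right = wrap_phase(trunk + math.pi / 2.0)
    left = wrap_phase(trunk - math.pi / 2.0)
    return trunk, right, left

if __name__ == "__main__":
    for t in (0.0, 0.1, 0.2, 0.3):
        print(t, gait_phases(t, step_frequency_hz=1.5))
```

With these offsets the two legs are always half a cycle apart, which is what produces the alternating left/right stepping.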

Humanoid Walking System

• (continued)

Kinematic Mapping
[Diagram: left and right legs, labeled with leg angles (yLeg, pLeg, rLeg) and foot angles (pFoot, rFoot)]
r: Roll, p: Pitch, y: Yaw

“Human-Like Walking using Toes Joint and Straight Stance Leg” by Behnke

[Diagram labels: Swing is the leg swing amplitude; a second label marks the leg extension]

Feedback
• Overall Control System
[Diagram: Controller → Mapping → joint motor positions]
1. Gyroscope: Gyro = inclination (balance) angular velocity

2. Force Sensing Resistors: foot touch ground trigger (‘High’ or ‘Low’)

Feedback
• Gyroscope
– A device for measuring orientation, based on the principle of conservation of angular momentum

– Remember Physics 101!

http://en.wikipedia.org/wiki/Image:3D_Gyroscope.png

Feedback P-Control

Gyro reading increases = robot is falling
• Proportional Control
– Reactive action proportional to the ‘error’ (Error = sensor value − desired value)
– Desired values = zero (i.e. no inclination)
• Other: Proportional-Integral Control
– Action proportional to the ‘error’ and to the accumulation of the ‘error’
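The two control laws described above can be written as a minimal Python sketch. The function and class names, the gain values, and the use of a fixed time step `dt` are hypothetical and chosen only for illustration; the slides do not give the robot's actual gains or sign conventions.

```python
def p_control(sensor_value, gain, desired_value=0.0):
    """Proportional control: action = gain * error."""
    error = sensor_value - desired_value
    return gain * error

class PIController:
    """Proportional-integral control: adds a term for accumulated error."""

    def __init__(self, kp, ki, desired_value=0.0):
        self.kp = kp                  # proportional gain (hypothetical value)
        self.ki = ki                  # integral gain (hypothetical value)
        self.desired_value = desired_value
        self.integral = 0.0           # running accumulation of the error

    def update(self, sensor_value, dt):
        error = sensor_value - self.desired_value
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral

if __name__ == "__main__":
    # Desired gyro value is zero, so the error equals the gyro reading.
    print(p_control(sensor_value=0.2, gain=-0.5))
    pi = PIController(kp=1.0, ki=2.0)
    print(pi.update(0.5, dt=0.1), pi.update(0.5, dt=0.1))
```

With a constant error the proportional output stays the same, while the PI output keeps growing, which is the practical difference between the two schemes.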


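[Diagram: P-control block. For the pitch (p) and roll (r) axes, the new foot angle (FootNew) is the old foot angle (FootOld) corrected by a gain K times the corresponding gyro reading]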

Feedback
• Overall System
[Diagram: P-Control → Mapping → joint motor positions]

Feedback
• Overall System
[Diagram: Controller → joint motor positions, with Online Adaptation (Stochastic Optimization)]
• Adaptive Control
– Online tuning of the ‘parameters’ of the controller

Stochastic Optimization Approach

• Goal:
– Adjust parameters to achieve a faster and more stable walk.
• A fitness function (cost function) is used to express the optimization goals (i.e. speed & robustness)
$f(\cdot): \mathbb{R}^N \to \mathbb{R}$, N: number of parameters of interest
$f(x)$


• The parameters are those of the Kinematic Mapping (Behnke paper)


• We evaluate f at a given set of parameters
– x = [x1, x2, ..., xN] (Table 1)
• Now, how do we find the parameter values that result in the highest fitness value?
– Use a metaheuristic method called PGRL

?+1

d <dexp

Policy Gradient Reinforcement Learning (PGRL)

• An optimization method to maximize the walking speed

• It automatically searches a set of possible parameters aiming to find the fastest walk that can be achieved

Policy Gradient Reinforcement Learning

• How does PGRL work?
1st: randomly generate B test policies {x1, x2, …, xB} around an initially given parameter vector xπ
• (where x = [x1, x2, …, xN])
– Each parameter in a given test policy xi is randomly set to one of
$x_j^{\pi} - \epsilon$, $x_j^{\pi}$, or $x_j^{\pi} + \epsilon$
• where 1 ≤ i ≤ B and 1 ≤ j ≤ N
• ε is a small constant value
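The first step can be illustrated with a short Python sketch. The function name `sample_test_policies`, the choice to return the applied perturbations alongside each policy, and the example values are assumptions made for illustration only.

```python
import random

def sample_test_policies(x_pi, epsilon, num_policies, rng=None):
    """Generate test policies around x_pi.

    Each parameter is independently shifted by -epsilon, 0 or +epsilon.
    Returns a list of (policy, deltas) pairs so that later steps can
    tell which perturbation was applied to each parameter.
    """
    rng = rng or random.Random()
    policies = []
    for _ in range(num_policies):
        deltas = [rng.choice((-epsilon, 0.0, epsilon)) for _ in x_pi]
        policy = [x + d for x, d in zip(x_pi, deltas)]
        policies.append((policy, deltas))
    return policies

if __name__ == "__main__":
    # Hypothetical 3-parameter policy, B = 4 test policies.
    for policy, deltas in sample_test_policies([1.0, 0.5, 2.0], 0.05, 4,
                                               rng=random.Random(0)):
        print(policy, deltas)
```

Keeping the perturbations next to each policy makes the later grouping into the −ε, 0 and +ε categories a simple lookup.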


• 2nd:
– Each test policy is evaluated by the ‘fitness function’.
• For each parameter j, the test policies are grouped into 3 categories,
$S_j^{-\epsilon}$, $S_j^{0}$, or $S_j^{+\epsilon}$,
depending on whether the jth parameter is modified by −ε, 0, or +ε


• Next, 3rd: construct the vector a = [a1, a2, …, aN]
• Its entries are computed from the average fitness of each category


• Then 4th (finally), adjust xπ as follows

where η is a scalar step size
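The grouping, averaging and update steps can be sketched as follows. The exact formulas for the entries of a and for the update are not reproduced in the slide text, so this sketch assumes the commonly cited policy gradient formulation: an entry is zero when the unperturbed category has the best average, otherwise it is the difference between the +ε and −ε averages, and xπ moves by η along the normalized vector a. The function name and the handling of empty categories are also assumptions.

```python
import math

def pgrl_update(x_pi, deltas_per_policy, fitnesses, eta):
    """One assumed PGRL update; returns (new_x_pi, a)."""
    a = []
    for j in range(len(x_pi)):
        # Group fitness values by the sign of the j-th perturbation.
        groups = {-1: [], 0: [], 1: []}
        for deltas, fit in zip(deltas_per_policy, fitnesses):
            sign = (deltas[j] > 0) - (deltas[j] < 0)
            groups[sign].append(fit)
        avg = {k: (sum(v) / len(v) if v else None) for k, v in groups.items()}
        if avg[1] is None or avg[-1] is None:
            a.append(0.0)  # assumption: no estimate without both categories
        elif avg[0] is not None and avg[0] > avg[1] and avg[0] > avg[-1]:
            a.append(0.0)  # leaving the parameter unchanged looks best
        else:
            a.append(avg[1] - avg[-1])
    norm = math.sqrt(sum(v * v for v in a))
    if norm == 0.0:
        return list(x_pi), a
    return [x + eta * v / norm for x, v in zip(x_pi, a)], a

if __name__ == "__main__":
    deltas = [[0.1, 0.0], [-0.1, 0.0], [0.1, 0.0], [-0.1, 0.0]]
    print(pgrl_update([0.0, 0.0], deltas, [2.0, 1.0, 2.0, 1.0], eta=0.5))
```

Normalizing a means that η alone sets how far xπ moves in each iteration, regardless of the scale of the fitness values.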

Extension to PGRL
• Adaptive step size
after g steps:
where
s: the number of fitness function evaluations
S: the maximum allowed number of s

Overall
• Overall System
[Diagram: PGRL supplies xπ to the Controller, which outputs joint motor positions]

Experiment

Results
• Initial: speed is 21.3 cm/s, fitness is 1.36
• After 1000 iterations: speed is 34.0 cm/s, fitness is 1.52
• Speed increase: about 60%
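As a quick check of the reported numbers, the relative improvement can be computed directly. This sketch assumes that the 60% figure refers to the increase in walking speed from the initial value to the value after 1000 iterations; the function name is chosen for illustration.

```python
def relative_increase(initial, final):
    """Relative change from initial to final, as a fraction."""
    return (final - initial) / initial

speed_gain = relative_increase(21.3, 34.0)    # speeds in cm/s
fitness_gain = relative_increase(1.36, 1.52)  # fitness values

if __name__ == "__main__":
    print(f"speed: +{speed_gain:.1%}, fitness: +{fitness_gain:.1%}")
```

The speed ratio comes out just under 0.6, which matches the 60% shown on the slide, while the fitness value rises by roughly 12%.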

Parameters

Glossary

• Stance leg:
– The leg which is on the floor during the walk.
• Swing leg:
– The leg which is moving during the walk.
• Single support:
– The case where the robot is touching the floor with one leg.
• Double support:
– The case where the robot is touching the floor with both legs.