
  • UTA Research Institute (UTARI), The University of Texas at Arlington, USA

    F.L. Lewis

    Optimal Control Introduction

    Supported by NSF, ARO, AFOSR

    He who exerts his mind to the utmost knows nature’s pattern.

    The way of learning is none other than finding the lost mind.

    Meng Tzu, 500 BC

    Man’s task is to understand patterns in nature and society.

    Mencius

  • Optimal Control

    Reinforcement learning

    Control of Robotic Devices

    Adaptive Learning of Human-Robot Interfaces

    Dynamic Games for HRI

    Importance of Feedback Control

    Darwin - FB and natural selection
    Volterra - FB and fish population balance
    Adam Smith - FB and the international economy
    James Watt - FB and the steam engine
    FB and cell homeostasis

    The resources available to most species for their survival are meager and limited

    Nature uses Optimal control

    [Diagram: a SYSTEM block with inputs, processing, and outputs]

    Alfred North Whitehead

    Von Bertalanffy, Systems Theory, 1920s

  • Relevance- Machine Feedback Control

    [Diagram: tank gun turret control. Moving tank platform; turret with backlash and compliant drive train; compliant coupling; barrel flexible modes qf; azimuth/elevation (Az/El) angles qr1, qr2; desired trajectory qd; terrain and vehicle vibration disturbances d(t); output is the barrel tip position.]

    [Diagram: Single-Wheel/Terrain System with Nonlinearities. Vehicle mass m; series damper + suspension + wheel with stiffness k and damping c; parallel damper mc with active damping uc (if used), kc, cc; vibratory modes qf(t); forward speed y(t); vertical motion z(t); wheel force w(t); surface roughness input.]

    High-Speed Precision Motion Control with unmodeled dynamics, vibration suppression, disturbance rejection, friction compensation, deadzone/backlash control

    Vehicle Suspension

    Industrial Machines

    Military Land Systems

    Aerospace

    Fixed Controller Topologies

    [Diagram, Indirect Scheme: a controller drives the plant with control u(t) to track the desired output $y_d(t)$; a system identifier produces the estimated output $\hat y(t)$, and the identification error $y(t) - \hat y(t)$ tunes the identifier.]

    [Diagram, Direct Scheme: a controller drives the plant with control u(t); the tracking error $y_d(t) - y(t)$ tunes the controller directly.]

    [Diagram, Feedback/Feedforward Scheme: controller #1 acts on the tracking error $y_d(t) - y(t)$ while controller #2 acts on the desired output $y_d(t)$; both contribute to the control u(t).]

  • Cell Homeostasis The individual cell is a complex feedback control system. It pumps ions across the cell membrane to maintain homeostasis, and has only limited energy to do so.

    Cellular Metabolism

    Permeability control of the cell membrane

    http://www.accessexcellence.org/RC/VL/GG/index.html

    Optimality in Biological Systems

    Optimality in Control Systems Design - R. Kalman, 1960

    Rocket Orbit Injection

    http://microsat.sm.bmstu.ru/e-library/Launch/Dnepr_GEO.pdf

    Dynamics (radius r, radial velocity w, tangential velocity v, thrust F, mass m, thrust angle $\phi$):

    $\dot r = w$
    $\dot w = \dfrac{v^2}{r} - \dfrac{\mu}{r^2} + \dfrac{F}{m}\sin\phi$
    $\dot v = -\dfrac{vw}{r} + \dfrac{F}{m}\cos\phi$

    Objectives:
    Get to orbit in minimum time
    Use minimum fuel

  • Optimal Control is Effective for:
    Aircraft Autopilots
    Vehicle engine control
    Aerospace Vehicles
    Ship Control
    Industrial Process Control
    Robot Control

    Optimal Control:
    Minimum time
    Minimum fuel
    Minimum energy
    Constrained control

    Optimal control solutions are found by offline solution of matrix design equations. A full dynamical model of the system is needed.

    Optimality and Games

    Optimal Control: Linear Quadratic Regulator (LQR)

    System: $\dot x = Ax + Bu$

    Performance Index: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau = x^T(t) P x(t)$

    Leibniz’s formula - the differential equivalent to the PI is the Bellman equation, written with the Hamiltonian function:
    $0 = H\left(x, u, \frac{\partial V}{\partial x}\right) = x^T Q x + u^T R u + \left(\frac{\partial V}{\partial x}\right)^T (Ax + Bu) = x^T Q x + u^T R u + 2x^T P(Ax + Bu)$

    Stationarity Condition: $\frac{\partial H}{\partial u} = 2Ru + 2B^T P x = 0$

    Optimal Control is SVFB: $u = -R^{-1} B^T P x = -Kx$

    Algebraic Riccati equation: $0 = A^T P + PA + Q - P B R^{-1} B^T P$

    Full system dynamics must be known. Off-line solution.
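    To make this design flow concrete, here is a minimal numerical sketch using SciPy's solve_continuous_are; the double-integrator matrices are an illustrative assumption, not from the slides.

```python
# Minimal LQR design sketch (illustrative double-integrator example).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])       # system matrix
B = np.array([[0.0],
              [1.0]])            # input matrix
Q = np.eye(2)                    # state weighting
R = np.array([[1.0]])            # control weighting

# Solve the ARE: 0 = A'P + PA + Q - P B R^{-1} B' P
P = solve_continuous_are(A, B, Q, R)

# SVFB gain: u = -Kx with K = R^{-1} B' P
K = np.linalg.solve(R, B.T @ P)

# Sanity check: the ARE residual should be numerically zero
res = A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)
print("K =", K, " ARE residual:", np.linalg.norm(res))
```

    This is exactly the off-line, model-based solution the slide describes: both A and B must be known before any control is applied.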

  • Optimal Control: Linear Quadratic Regulator

    System model: $\dot x = Ax + Bu$

    Performance Function: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau = x^T(t) P x(t)$

    Optimal Control is $u = -R^{-1} B^T P x = -Kx$

    Algebraic Riccati equation: $0 = A^T P + PA + Q - P B R^{-1} B^T P$

    MATLAB Control Systems Toolbox: [K,P]=lqr(A,B,Q,R)

    Full system dynamics must be known. Off-line solution. Cannot change performance objectives.

    Optimal Control: Linear Quadratic Regulator

    System: $\dot x = Ax + Bu$

    Cost: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau = x^T(t) P x(t)$

    Leibniz’s formula - the differential equivalent is the Bellman equation:
    $0 = H\left(x, u, \frac{\partial V}{\partial x}\right) = x^T Q x + u^T R u + 2x^T P(Ax + Bu)$

    Given any stabilizing FB policy $u = -Kx$, the cost value is found by solving the Lyapunov equation:
    $0 = (A - BK)^T P + P(A - BK) + Q + K^T R K$

    Optimal Control is found from the stationarity condition $\frac{\partial H}{\partial u} = 2Ru + 2B^T P x = 0$:
    $u = -R^{-1} B^T P x = -Kx$

    Algebraic Riccati equation: $0 = A^T P + PA + Q - P B R^{-1} B^T P$

    Full system dynamics must be known. Off-line solution.
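    The Lyapunov-equation step above suggests an iterative design (Kleinman-style policy iteration). A hedged sketch, with illustrative matrices and assuming SciPy's solve_continuous_lyapunov:

```python
# Model-based policy iteration (Kleinman-style) sketch: alternate the
# Lyapunov-equation policy evaluation with the gain update K = R^{-1}B'P.
# Example matrices are illustrative.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

K = np.zeros((1, 2))   # initial stabilizing gain (this A is stable)
for k in range(20):
    Ac = A - B @ K
    # Policy evaluation: solve 0 = Ac'P + P Ac + Q + K'RK for P
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement: K <- R^{-1} B'P
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        break
    K = K_new
print("Converged gain K =", K)
```

    Each iterate remains stabilizing and the sequence of P matrices converges to the ARE solution; the reinforcement learning methods later in the talk implement the same two steps from data.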

  • Many Successful Design Applications of Optimal Control

  • Optimal Control: Linear Quadratic Regulator

    System: $\dot x = Ax + Bu$

    Cost function: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau = \int_t^\infty r(x, u)\,d\tau = x^T(t) P x(t)$

    Optimal Control is $u = -R^{-1} B^T P x = -Kx$

    Algebraic Riccati equation: $0 = A^T P + PA + Q - P B R^{-1} B^T P$

    MATLAB Control Systems Toolbox: [K,P]=lqr(A,B,Q,R)

    Full system dynamics must be known. Off-line solution.

    Chop off the tail of the cost function:
    $V(x(t)) = \int_t^{t+T} r(x, u)\,d\tau + \int_{t+T}^\infty r(x, u)\,d\tau$

    Bellman Equation: $V(x(t)) = \int_t^{t+T} r(x, u)\,d\tau + V(x(t+T))$

    Update Control using Hamiltonian Physics: $u = -R^{-1} B^T P x = -Kx$

    Reinforcement Learning‐ Policy Iteration

    Online Learning Feedback Control: Synthesis of Robot Control and Biologically Inspired Learning

  • Neural Network Properties

    Learning

    Recall

    Function approximation

    Generalization

    Classification

    Association

    Pattern recognition

    Clustering

    Robustness to single node failure

    Repair and reconfiguration

    Nervous system cell. http://www.sirinet.net/~jgjohnso/index.html

    Dynamic Neural Networks and Control Learning

    [Diagram: NN robot controller. An outer tracking loop with PD gain Kv acts on the filtered tracking error r; a nonlinear inner learning loop contains a NN with tunable weights that estimates f(x); a robust control term v(t) is added; a feedforward loop is driven by the desired trajectory qd; tracking error e = qd - q; output drives the Robot System.]

  • Theorem 1 (NN Weight Tuning for Stability)

    Let the desired trajectory $q_d(t)$ and its derivatives be bounded. Let the initial tracking error be within a certain allowable set $U$. Let $Z_M$ be a known upper bound on the Frobenius norm of the unknown ideal weights $Z$. Take the control input as

    $\tau = \hat W^T \sigma(\hat V^T x) + K_v r - v$,  with  $v(t) = -K_Z (\|\hat Z\|_F + Z_M)\, r$.

    Let weight tuning be provided by

    $\dot{\hat W} = F \hat\sigma r^T - F \hat\sigma' \hat V^T x\, r^T - \kappa F \|r\| \hat W$
    $\dot{\hat V} = G x (\hat\sigma'^T \hat W r)^T - \kappa G \|r\| \hat V$

    with any constant matrices $F = F^T > 0$, $G = G^T > 0$, and scalar tuning parameter $\kappa > 0$. Initialize the weight estimates as $\hat W = 0$, $\hat V$ random.

    Then the filtered tracking error $r(t)$ and NN weight estimates $\hat W$, $\hat V$ are uniformly ultimately bounded. Moreover, arbitrarily small tracking error may be achieved by selecting large control gains $K_v$.

    NEW NN Tuning Laws for Control Learning
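    A hedged, Euler-discretized sketch of the two tuning laws in Theorem 1; the layer sizes, gains, sigmoid activation, and test signals are illustrative assumptions, not values from the slides.

```python
# Sketch of the Theorem 1 NN weight-tuning laws (illustrative sizes/gains).
import numpy as np

n_in, n_hid, n_out = 5, 10, 2                # hypothetical layer dimensions
F = 10.0 * np.eye(n_hid)                     # F = F' > 0
G = 10.0 * np.eye(n_in)                      # G = G' > 0
kappa, dt = 0.01, 0.001                      # robustifying gain, Euler step

W_hat = np.zeros((n_hid, n_out))             # initialize W_hat = 0
V_hat = 0.1 * np.random.randn(n_in, n_hid)   # initialize V_hat random

def sigma(z):                                # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def tune_step(x, r, W_hat, V_hat):
    """One Euler step of the tuning laws for NN input x and
    filtered tracking error r."""
    s = sigma(V_hat.T @ x)                   # sigma_hat, shape (n_hid,)
    sp = np.diag(s * (1.0 - s))              # sigma_hat' Jacobian (diagonal)
    rn = np.linalg.norm(r)
    # W_hat_dot = F s r' - F sp V_hat' x r' - kappa F ||r|| W_hat
    W_dot = F @ (np.outer(s, r) - sp @ np.outer(V_hat.T @ x, r)) \
            - kappa * rn * (F @ W_hat)
    # V_hat_dot = G x (sp' W_hat r)' - kappa G ||r|| V_hat
    V_dot = G @ np.outer(x, sp.T @ W_hat @ r) - kappa * rn * (G @ V_hat)
    return W_hat + dt * W_dot, V_hat + dt * V_dot

# One illustrative update with made-up signals
x, r = np.random.randn(n_in), np.array([0.1, -0.05])
W_hat, V_hat = tune_step(x, r, W_hat, V_hat)
```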

    Force Control

    Flexible pointing systems - Simis labs, Inc. and US Army

    Vehicle active suspension - Leo Davis Technol.

    SBIR Contracts

    Learning NN Controller

  • Applications at Boeing Defense Space & Security

    Kevin Wise and Eugene Lavretsky

    Highly reliable adaptive uncertainty approximation compensators for flight control applications:

    unmanned aircraft – Phantom Ray

    Tailkit adaptive control systems for Joint Direct Attack Munition (JDAM) munitions:

    Mk-82, Mk-84, and Laser-guided variants
    Fielded for defense in Iraq and Afghanistan.

    BUT - Nature is More than Stable. Nature is OPTIMAL and conserves energy.

  • We want to learn optimal control solutions:
    Online, in real time
    Using adaptive control techniques
    Without knowing the full dynamics

    For Adaptive Man/Machine systems-

    Reinforcement Learning

    Every living organism improves its control actions based on rewards received from the environment

    The resources available to living organisms are usually meager. Nature uses optimal control.

    1. Apply a control. Evaluate the benefit of that control.
    2. Improve the control policy.

    RL finds optimal policies by evaluating the effects of suboptimal policies

    Optimality in Biological Systems

  • Different methods of learning

    [Diagram: adaptive learning system. An Actor applies control inputs to the System/environment; the outputs are compared with the desired performance by a Critic, which generates a reinforcement signal used to tune the Actor.]

    Reinforcement learning - Ivan Pavlov, 1890s

    Actor-Critic Learning

    We want OPTIMAL performance - ADP, Approximate Dynamic Programming

    Sutton & Barto book

    Adaptive Critic structure

    Slow learning Improvement loop

    Reinforcement learning

    Fast inner control loops

  • Optimal Control: Linear Quadratic Regulator

    System: $\dot x = Ax + Bu$

    Cost function: $V(x(t)) = \int_t^\infty (x^T Q x + u^T R u)\,d\tau = \int_t^\infty r(x, u)\,d\tau = x^T(t) P x(t)$

    Optimal Control is $u = -R^{-1} B^T P x = -Kx$

    Algebraic Riccati equation: $0 = A^T P + PA + Q - P B R^{-1} B^T P$

    MATLAB Control Systems Toolbox: [K,P]=lqr(A,B,Q,R)

    Full system dynamics must be known. Off-line solution.

    Chop off the tail of the cost function:
    $V(x(t)) = \int_t^{t+T} r(x, u)\,d\tau + \int_{t+T}^\infty r(x, u)\,d\tau$

    Bellman Equation: $V(x(t)) = \int_t^{t+T} r(x, u)\,d\tau + V(x(t+T))$

    Update Control using Hamiltonian Physics: $u = -R^{-1} B^T P x = -Kx$

    Reinforcement Learning‐ Policy Iteration

    CT Policy Iteration - How to implement online?
    Linear Systems, Quadratic Cost - LQR

    Value function is quadratic: $V(x(t)) = x^T(t) P x(t)$

    Policy evaluation - solve the IRL Bellman Equation:
    $x^T(t) P_k x(t) = \int_t^{t+T} x^T (Q + K_k^T R K_k) x\, d\tau + x^T(t+T) P_k x(t+T)$

    Equivalently:
    $x^T(t) P_k x(t) - x^T(t+T) P_k x(t+T) = \int_t^{t+T} x^T (Q + K_k^T R K_k) x\, d\tau$

    In regression form:
    $p_k^T \left[\bar x(t) - \bar x(t+T)\right] = \int_t^{t+T} x^T (Q + K_k^T R K_k) x\, d\tau$

    where $p_k$ stacks the unknown entries of $P_k$ and $\bar x$ is the quadratic regression vector; the right-hand side is the reinforcement on the time interval [t, t+T].

    Same form as standard System ID problems

    Solve using RLS or batch LS

    Union of Reinforcement Learning and Adaptive Control

    Value Function Approximation
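    The quadratic parameterization is what turns policy evaluation into a linear regression; a tiny self-checking sketch of the idea for a 2-state system (names illustrative):

```python
# x'Px as a linear regression p . phi(x): the value-function
# approximation step behind the RLS/batch-LS solution.
import numpy as np

def quad_basis(x):
    """Basis so that x'Px = p @ quad_basis(x), with p = [P11, P12, P22]."""
    return np.array([x[0]**2, 2.0 * x[0] * x[1], x[1]**2])

P = np.array([[2.0, 0.5],
              [0.5, 1.0]])                   # a symmetric test matrix
p = np.array([P[0, 0], P[0, 1], P[1, 1]])    # stacked unknown entries
x = np.random.randn(2)
assert np.isclose(x @ P @ x, p @ quad_basis(x))
```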

  • Integral Reinforcement Learning (IRL)

    1. Select initial control policy $K_0$.

    2. Find associated cost: solve the IRL Bellman equation
    $p_k^T \left[\bar x(t) - \bar x(t+T)\right] = \int_t^{t+T} x^T (Q + K_k^T R K_k) x\, d\tau = \rho(t, t+T)$
    This solves the Lyapunov equation without knowing the dynamics.

    3. Improve control: $K_{k+1} = R^{-1} B^T P_k$.

    On each interval [t, t+T]: apply $u_k(t) = -K_k x(t)$; observe x(t), the cost integral, and x(t+T); update P; do RLS until convergence to $P_k$; then update the control gain to $K_{k+1}$.

    Data set at time [t, t+T): $\{x(t), \rho(t, t+T), x(t+T)\}$, where $\rho(t, t+T)$ is the observed cost integral.

    The A matrix is not needed anywhere. This is a data-based approach that uses measurements of x(t) and u(t) instead of the plant dynamical model.
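    Putting steps 1-3 together, here is a hedged end-to-end sketch of the IRL loop. The plant matrices live only inside the simulator (standing in for the real system); the learner touches B only in the improvement step and never uses A. All numbers are illustrative.

```python
# End-to-end IRL policy iteration sketch: policy evaluation from data
# via batch LS, then policy improvement with B only.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # hidden inside the "plant"
B = np.array([[0.0], [1.0]])
Q, R, T = np.eye(2), np.array([[1.0]]), 0.1

def phi(x):                                 # quadratic basis for x'Px
    return np.array([x[0]**2, 2*x[0]*x[1], x[1]**2])

def rollout(x0, K):
    """Simulate one reinforcement interval [t, t+T] under u = -Kx;
    return x(t+T) and the observed cost integral rho(t, t+T)."""
    def f(t, z):
        x = z[:2]
        u = -K @ x
        return np.concatenate([A @ x + B @ u, [x @ Q @ x + u @ R @ u]])
    z = solve_ivp(f, [0, T], np.concatenate([x0, [0.0]])).y[:, -1]
    return z[:2], z[2]

K = np.zeros((1, 2))                        # initial stabilizing policy
for i in range(10):
    # Policy evaluation: batch LS on the IRL Bellman equation
    Phi, rho, x = [], [], 2.0 * np.random.randn(2)
    for _ in range(30):
        xn, r = rollout(x, K)
        Phi.append(phi(x) - phi(xn))
        rho.append(r)
        x = xn
    p, *_ = np.linalg.lstsq(np.array(Phi), np.array(rho), rcond=None)
    P = np.array([[p[0], p[1]], [p[1], p[2]]])
    K = np.linalg.solve(R, B.T @ P)         # policy improvement
print("IRL gain K =", K, "- compare with lqr(A, B, Q, R)")
```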

    Motor control 200 Hz

    Oscillation is a fundamental property of neural tissue

    Brain has multiple adaptive clocks with different timescales

    theta rhythm, hippocampus and thalamus, 4-10 Hz - sensory processing, memory, and voluntary control of movement

    gamma rhythms, 30-100 Hz, hippocampus and neocortex - high cognitive activity

    • consolidation of memory
    • spatial mapping of the environment - place cells

    The high frequency processing is due to the large amounts of sensorial data to be processed

    Spinal cord

    D. Vrabie, F. Lewis, and Dan Levine- RL for Continuous-Time Systems

  • Doya, Kimura, Kawato 2001

    Summary of Motor Control in the Human Nervous System

    [Diagram: a hierarchy of multiple parallel loops through the limbic system, cerebral cortex motor areas, thalamus and basal ganglia, cerebellum, brainstem, and spinal cord; interoceptive and exteroceptive receptors feed in, and the output is muscle contraction and movement. Annotations: reflex (spinal cord); supervised learning (cerebellum; eye movement, inferior olive); reinforcement learning - dopamine (basal ganglia); unsupervised learning (hippocampus, limbic system); memory functions, long term and short term; motor control 200 Hz; theta rhythms 4-10 Hz; gamma rhythms 30-100 Hz. Picture by E. Stingu and D. Vrabie.]

  • Adaptive Critic structure

    Theta waves 4-8 Hz

    Reinforcement learning

    Motor control 200 Hz

    F.L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits & Systems Magazine, Invited Feature Article, pp. 32-50, Third Quarter 2009.

    “Reinforcement learning and feedback control,” IEEE Control Systems Magazine, Dec. 2012.

  • D. Vrabie, K. Vamvoudakis, and F.L. Lewis, Optimal Adaptive Control and Differential Games by Reinforcement Learning Principles, IET Press, 2012, to appear.

    Books

    F.L. Lewis, D. Vrabie, and V. Syrmos, Optimal Control, third edition, John Wiley and Sons, New York, 2012.

    New Chapters on:
    Reinforcement Learning
    Differential Games

    Direct Optimal Adaptive Controller

    A hybrid continuous/discrete dynamic controller whose internal state is the observed cost over the interval

    Draguna Vrabie

    [Diagram: a dynamic control system with MEMORY. The Actor applies u to the system $\dot x = Ax + Bu$; the Critic samples the utility $x^T Q x + u^T R u$ through a ZOH with period T and accumulates the observed cost V, which is the controller's internal state.]

    Run RLS or use batch LS to identify the value of the current control. Update the FB gain after the Critic has converged.

    The reinforcement interval T can be selected online, on the fly - it can change.

    Solves the Riccati equation online without knowing the A matrix.

  • Optimal Control Design Allows a Lot of Design Freedom

    $V(x(t)) = \int_t^\infty r(x, u)\, d\tau$

    Tailor the controls design by choosing the utility r(x, u):
    Minimum energy
    Minimum fuel
    Minimum time
    Control actuator constraints

    Optimal Control for Constrained Input Systems

    This is a quasi-norm:
    $\|u\|_q^2 = 2\int_0^u \phi^{-1}(v)^T q\, dv$

    Weaker than a norm - the homogeneity property is replaced by the weaker symmetry property $\|-x\|_q = \|x\|_q$.

    (Used by Lyshevsky for H2 control)

    Control constrained by saturation function tanh(p):
    [Plot: tanh(p), saturating at +1 and -1]

    Encode the constraint into the Value function:
    $J(x, u) = \int_0^\infty \left( Q(x) + 2\int_0^u \phi^{-1}(v)^T R\, dv \right) dt$

    Then $u = -R^{-1} g^T(x) \frac{\partial V}{\partial x}$ is BOUNDED.

    Murad Abu-Khalaf
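    For the tanh saturation with unit limits, the inner integral has a closed form. A small sketch (scalar u and R = 1 are assumptions for illustration) showing why this penalty keeps the minimizing control inside the bound:

```python
# Nonquadratic input penalty W(u) = 2R * integral_0^u arctanh(v) dv,
# which encodes the |u| <= 1 constraint into the value function.
import numpy as np

def input_penalty(u, R=1.0):
    """Closed form: 2R [u*arctanh(u) + 0.5*ln(1 - u^2)], valid for |u| < 1."""
    return 2.0 * R * (u * np.arctanh(u) + 0.5 * np.log(1.0 - u**2))

# W stays finite on (-1, 1), but its slope dW/du = 2R*arctanh(u) grows
# without bound as |u| -> 1, so the stationarity condition always
# yields a control strictly inside the saturation limit.
for u in [0.0, 0.5, 0.9, 0.99]:
    print(f"u = {u:4.2f}   W(u) = {input_penalty(u):.4f}")
```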

  • [Plot: state evolution (x1, x2) for both controllers - the switching surface method and the nonquadratic functionals method.]

    Near Minimum-Time Control - encode into the Value Function:
    $V = \int_0^\infty \left( x^T Q x + 2\int_0^u \tanh^{-1}(v)^T R\, dv \right) dt$

    Murad Abu-Khalaf

  • Force Control - Intelligent Automation, Inc.

    Flexible pointing systems - Simis labs, Inc. and US Army

    Vehicle active suspension - Leo Davis Technol.

    SBIR Contracts - SBA Tibbets Award 1996 (led by TMAC)

    Learning NN Controller

    A. Yesildirek and F.L. Lewis, "Method for feedback linearization of neural networks and neural network incorporating same," U.S. Patent 5,943,660, awarded 24 August 1999.

    S. Jagannathan and F.L. Lewis, "Discrete-time tuning of neural network controllers for nonlinear dynamical systems," U.S. Patent 6,064,997, awarded 16 May 2000.

    R. Selmic, F.L. Lewis, A.J. Calise, and M.B. McFarland, "Backlash Compensation Using Neural Network," U.S. Patent 6,611,823, awarded 26 Aug. 2003.

    J. Campos and F.L. Lewis, "Method for Backlash Compensation Using Discrete-Time Neural Networks," U.S. Patent 7,080,055, awarded July 2006.

    Patents

    K. Vamvoudakis, D. Vrabie, and F.L. Lewis, “Control methodology for online adaptation to optimal feedback controller using integral reinforcement learning,” provisional patent, filed March 2012.

  • Applications of our Algorithms at Boeing Defense Space & Security

    Kevin Wise and Eugene Lavretsky

    Highly reliable adaptive uncertainty approximation compensators for flight control applications:

    Unmanned aircraft – Phantom Ray

    Tailkit adaptive control systems for Joint Direct Attack Munition (JDAM) munitions:

    Mk-82, Mk-84, and Laser-guided variants

    Fielded for defense in Iraq and Afghanistan. Saved lives.

    Optimal engine controllers based on RL for Caterpillar, Ford (Zetec engine), and GM

    Applications of Our Algorithms to Auto Engine Control

    Student S. Jagannathan - 11 US patents

    8-10% improvement in fuel efficiency and a drastic reduction in NOx (90%), HC (30%) and CO (15%) by operating with adaptive exhaust gas recirculation.

    Savings to Caterpillar were over $65,000 per component.

  • A Q-Learning Based Adaptive Optimal Controller Implementation for a Humanoid Robot Arm

    Said Ghani Khan, Guido Herrmann, Tony Pipe, Chris Melhuish

    Bristol Robotics Laboratory, University of Bristol and University of the West of England, Bristol, UK

    Conference on Decision and Control (CDC) 2011, Orlando, 11 December 2011

    The cost function is modified to include constraints

    Introducing Constraints 

    The new cost function....

    $q_L$ is the joint limit.

  • BRL BERT II ARM Bristol Robotics Lab

    Natural Human-Like Motion

    An Approximate Dynamic Programming Based Controller for an Underactuated 6DoF Quadrotor

    Emanuel Stingu, Frank Lewis

    Automation & Robotics Research Institute (ARRI), University of Texas at Arlington

    Supported by ARO grant W91NF-05-1-0314 and NSF grant ECCS-0801330

  • 3 control loops

    The quadrotor has 17 states and only 4 control inputs, so it is very under-actuated. Three control loops with dynamic inversion are used to generate the 4 control signals.

    [Diagram: body axes x, y, z; amount of tilt; direction of tilt; yaw.]

    Momentum theory applied to the propeller

    [Diagram: cascaded control structure. The reference trajectory and the position and velocity errors feed the translation-and-yaw controller; dynamic inversion converts accelerations to the desired attitude for the attitude controller, and rotation accelerations to the desired forces for the motor and propeller controller; altitude control sets the thrust force, and dynamic inversion maps thrust forces to motor voltages, producing the motor commands.]

    [Plots: results using the standard controller vs. using the reinforcement learning controller.]

  • Potential New Applications in Prosthetics

    With Sesh Commuri, Oklahoma University, and Muthu Wijesundara

    Adaptive Reinforcement Learning of Human/Prosthetic Ankle Interactions

    Current methods use stored gait information

  • What about Adaptive Human-Robot Interfaces (HRI)?

    Adaptive Learning

    HRI

    Different users
    Different robotic devices

    Interface learns so as to improve Human/Robot team performance

    Tune parameters such as stiffness, speed of response, dynamic motion

    There are 3 components:
    Human
    Robot
    The Interface

    They should all be intelligent

    We want verifiable performance and straightforward tuning algorithms

  • Several Approaches to Adaptive HRI

    1. Learning of Coordinate Transforms

    2. Dynamic Motion Primitives to Modify Task Parameters

    3. Adaptive Impedance Control

    Current and Expected Funding

    F.L. Lewis, “Adaptive Dynamic Programming for Real-Time Cooperative Multi-Player Games and Graphical Games,” NSF Grant, $272,000 for 3 years, July 2011.

    D. Popa, Z. Celik-Butler, D. Butler, and F.L. Lewis, “NRI: Multi-Modal Skin and Garments for Healthcare and Home Robots,” NSF Grant, $1.3M for 4 years, Sept. 2012.

    F.L. Lewis, “Games and Learning for Cooperative Nonlinear Systems and Internal Structure of Coalitions on Graphs,” TARDEC/NAC grant, $81,000 for 1 year, Jan. 2013 expected. IF NO FISCAL CLIFF!

    Robot inverse kinematics: the desired user-space trajectory $y_d(t)$ is mapped through the robot inverse Jacobian to the joint-space trajectory $q_d(t)$:
    $\dot q_d = J^{-1} \dot y_d$

    The inverse Jacobian is hard to find accurately:
    Need a dynamics model of the robot
    Does not generalize to different robots

    Learning of Coordinate Transforms: user coordinates to robot coordinates

    Standard robot controller

  • Robot Inverse Jacobian applied to $\dot y_d(t)$:
    Hard to find accurately
    Need a dynamics model of the robot
    Does not generalize to different robots

    1. Learning of Coordinate Transforms

    User coordinates to robot coordinates

    Standard robot controller

    Learning the Interface Mapping - what is it exactly?

    $u \rightarrow f(u) \rightarrow y$

    What can we do to get f(u)? The simplest way is to obtain a set of inputs and a set of outputs and calculate the relationship (Curve Fitting).

    [Diagram: inputs u and outputs y feed a Curve Fitting Algorithm that produces f.]

    Static nonlinear map with tunable weights:
    $y = f(u) = w_0 + \sum_{i=0}^{M} \sum_{j=0}^{P} w_{ij}\, u_i^j$

    Dan Popa

    Reinforcement Learning

  • NN Training

    Log sigmoid function is used as the neuron activation function: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

    Train using MATLAB NN Toolbox very easily if input/output pairs are known.

    Dan Popa

    Static Mapping Approach: f(u) is a coordinate transformation, found by training a neural network with known input/output pairs.

    Dan Popa
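    A hedged plain-NumPy analogue of that training step (the slides use the MATLAB NN Toolbox; the one-hidden-layer network, the gradient-descent settings, and the target map f(u) below are illustrative assumptions):

```python
# Fit a one-hidden-layer log-sigmoid network to input/output pairs of a
# made-up static coordinate map f(u), by batch gradient descent.
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(-1, 1, size=(200, 2))            # sampled inputs u
Y = np.stack([U[:, 0] + 0.3 * U[:, 1]**2,        # hypothetical map f(u)
              np.sin(U[:, 1])], axis=1)

H = 20                                           # hidden units
V = rng.normal(0, 1.0, (2, H))                   # input-to-hidden weights
W = rng.normal(0, 0.1, (H, 2))                   # hidden-to-output weights
b = np.zeros(H)

sig = lambda z: 1.0 / (1.0 + np.exp(-z))         # log-sigmoid activation

lr = 0.3
for epoch in range(2000):
    S = sig(U @ V + b)                           # hidden activations
    Yhat = S @ W                                 # network output
    E = Yhat - Y                                 # output error
    dW = S.T @ E / len(U)                        # backprop gradients
    dS = (E @ W.T) * S * (1 - S)
    dV = U.T @ dS / len(U)
    db = dS.mean(axis=0)
    W -= lr * dW; V -= lr * dV; b -= lr * db
print("final MSE:", np.mean(E**2))
```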

  • Reinforcement Learning

    [Diagram: inputs u and outputs y feed a Curve Fitting Algorithm that produces f(u).]

    What if we cannot specify the desired output trajectory directly ? 

    Reinforcement Learning - “Learning by interacting with an environment”

    With RL we do not have to specify a desired trajectory.

    Instead, a Reward Function is used

    Dan Popa

    How to find the Reward Function?

    Model of Task

    A task is a profile in velocity/force space - different curve for different tasks.

    2. Learn and Modify Task Parameters

  • Dynamic Motion Primitives for Characterizing the Task

    $\ddot x = K(g - x) - D\dot x - K(g - x_0)\,s + K\,W^T \sigma(V^T s)\,s$

    Learning the task:
    Kinesthetic demonstration to learn D, K, and initial weights
    Reward-based learning to tune the weights

    Phase variable s makes DMP independent of time

    Tunable NN weights

    Different NN weights give different task trajectories
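    A hedged discrete-time sketch of a DMP of the form assumed above; the gains, Gaussian basis, and zero weight vector are illustrative (with reward-based learning, the weights w would be tuned rather than left at zero):

```python
# Minimal 1-D DMP: spring-damper toward goal g plus a phase-gated,
# weight-tunable forcing term. Phase s decays to 0, making the motion
# independent of time and the forcing term vanish at the goal.
import numpy as np

K, D, tau = 25.0, 10.0, 1.0          # stiffness, damping, time scale
g, x0 = 1.0, 0.0                     # goal and start
dt, alpha_s = 0.01, 4.0              # step size, phase decay rate

w = np.zeros(10)                     # tunable weights (zero = plain reach)
c = np.linspace(0, 1, 10)            # basis centers over the phase s
psi = lambda s: np.exp(-50.0 * (s - c)**2)   # Gaussian basis

x, xd, s = x0, 0.0, 1.0
traj = []
for _ in range(int(3.0 / dt)):
    f = (psi(s) @ w) / (psi(s).sum() + 1e-9) * s   # NN-like forcing term
    xdd = (K * (g - x) - D * xd - K * (g - x0) * s + K * f) / tau
    xd += xdd * dt
    x += xd * dt
    s += -alpha_s * s * dt / tau     # phase variable: s -> 0
    traj.append(x)
print("final position:", traj[-1], "(goal:", g, ")")
```

    Different weight vectors w reshape the transient trajectory without changing the start or goal, which is exactly what makes the weights suitable task parameters to learn.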

  • Build an adaptive interface system that allows a single operator to manipulate a robot with multiple degrees of freedom and/or multiple robots, as sketched above.
  • The interface system should be intuitive, easy to use, quick to learn, and applicable to different robot configurations.

    Robot System

    Operator

    Performance Evaluation

    Output

    Interface System - input

    Reinforcement Learning for Human-Robot Interfaces - Dan Popa

    What to learn?
    How to evaluate Performance?

    Adaptive Impedance Control

  • 3. Adaptive Impedance Control

    [Diagram: an Impedance Model with inner-loop FB controllers sits between human intent and robot performance; a performance evaluation block tunes the impedance parameters.]

    Tune interface so robot learns to control the human model to improve task performance

    Model of Human Behavior

    For a given task, a skilled human operator adapts to learn:

    1. An inverse model of the robot system to cancel nonlinearities

    2. A feedforward predictive control based on the task

    Frequency Crossover Theory - the human operator adapts to make the overall closed-loop man-machine system look like a high-bandwidth frequency response, and to remain invariant to a wide range of task variations and changes in operating condition.

    Suzuki & Furuta 2012

  • For a given task, after learning, the human model has three parts:
    1. Time delay due to vision processing and reaction time
    2. A first-order filter due to neuromuscular dynamics
    3. A PD controller in the cerebellum

    Adaptive Impedance Control as a 2-Player Game

    [Diagram: interface parameters sit between human intent and robot performance; a Critic Agent for Performance Evaluation tunes the interface parameters.]

    A 2-Player Dynamic Game

    Player 1- the human

    Player 2- the Interface!

  • Sun Tz bin fa 孙子兵法 (Sun Tzu, The Art of War)

    Graphical Coalitional Games

    500 BC

    Optimal Control is Effective for:
    Aircraft Autopilots
    Vehicle engine control
    Aerospace Vehicles
    Ship Control
    Industrial Process Control
    Robot Control

    Multi-player Games Occur in:
    Economics
    Control Theory - disturbance rejection
    Team games
    International politics
    Sports strategy

    Optimality and Games

  • Actor‐Critic Game structure ‐ three time scales

    System: $\dot x = Ax + B_2 u + B_1 w$, $x(0) = x_0$

    Controller/Player 1: $u = -B_2^T P_u^{(i,k+1)} x$

    Interface/Player 2: $w = B_1^T P^i x$

    [Diagram: Critic learning procedure. The Critic measures x and a utility $\hat V$ built from $x^T C^T C x$ and $u^T u$ at the first iteration (i = 1), and from $w^T w$ and $u^T u$ thereafter (i > 1); it updates the kernel matrices $P_u^{(i,k)}$ by least squares on regression data $Z_u^{(i,k)}$, which in turn updates both players.]

    Our revels now are ended. These our actors, as I foretold you, were all spirits, and are melted into air, into thin air.

    The cloud-capped towers, the gorgeous palaces, the solemn temples, the great globe itself, yea, all which it inherit, shall dissolve, and, like this insubstantial pageant faded, leave not a rack behind.

    We are such stuff as dreams are made on, and our little life is rounded with a sleep.

    Prospero, in The Tempest, act 4, sc. 1, l. 152-6, Shakespeare

  • The way that can be told is not the Constant WayThe name that can be named is not the Constant Name

    For nameless is the true wayBeyond the myriad experiences of the world

    To experience without intention is to sense the world

    All experience is an archwherethrough gleams that untravelled landwhose margins fade forever as we move

    Dao ke dao feichang dao
    Ming ke ming feichang ming