Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles James Doebbler, Monish D. Tandale, John Valasek Texas A&M University Andrew J. Meade Rice University AIAA Infotech@Aerospace Conference 26-29 Sep 2005 Washington, DC


Improved Adaptive-Reinforcement Learning Control for Morphing Unmanned Air Vehicles

James Doebbler, Monish D. Tandale, John Valasek
Texas A&M University

Andrew J. Meade
Rice University

AIAA Infotech@Aerospace Conference
26-29 Sep 2005
Washington, DC


Doebbler, Tandale, Valasek, & Meade AIAA-2005-7159-2

Overview

Brief Introduction to Morphing Air Vehicles
Adaptive-Reinforcement Learning Control Architecture
Simplified Model of a Morphing Vehicle
The Old Way
– Reinforcement Learning Module
– Structured Adaptive Model Inversion Control
– Numerical Example
The New Way
– Numerical Example
Results & Conclusions
Future Work


Student Research Team, 2005 - 2006

Aerospace Vehicle Systems Technology Program

[Figure: technology roadmap. Vertical axis: Technology Progress. Horizontal axis: Time.]

SOA Metal Supercritical Wing

AE Tailored Wet Composite Wing

Composite Blended Wing

Composite Wing & Fuselage VLA

Active Aero & Structures

Low-Boom Supersonic Transport(s)

Intelligent Personal Air Vehicles

Bio/Nano Self-Optimizing Aircraft

Advanced Concept Evolution


Revolutionary Vehicles – Technologies

Today’s Challenges:

• Develop light, strong, and structurally efficient air vehicles.

• Improve aerodynamic efficiency.

• Design fuel-efficient, low-emission propulsion systems.

• Develop safe, fault-tolerant vehicle systems.

Technology Solutions:

• Nanostructures: 100 times stronger than steel at 1/6 the weight

• Active flow control

• Distributed propulsion

• Electric propulsion, advanced fuel cells, high-efficiency electric motors

• Integrated advanced control systems and information technology

• Central “nervous system” and adaptive vehicle control

[Figure panels: Fuel Cell Propulsion, Active Flow Control, Adaptive Control, Nanotube. Fuel cell diagram labels: Electric Circuit, Anode Catalyst, Cathode Catalyst, Polymer Electrolyte Membrane, Fuel, Air, Exhaust.]


Big Picture Research Goals

WHEN to reconfigure

HOW to reconfigure

LEARNING to reconfigure


Which Morphing?

Morphing for Mission Adaptation
– Large scale, relatively slow, in-flight shape change to enable a single vehicle to perform multiple diverse mission profiles

as opposed to:

Morphing for Control
– In-flight physical or virtual shape change to achieve multiple control objectives (maneuvering, flutter suppression, load alleviation, active separation control)

[Figure labels: Mission Adaptation; Control]

John Davidson, NASA Langley, AFRL Morphing Controls Workshop – Feb 2004


Our Approach

Make Full Use Of The Physical Knowledge Of The Problem

Adaptive Control: adapting Parameters in a Known Functional Relationship

Machine Learning: learning the Reconfiguration Policy

Learn When to Reconfigure With Machine Learning


Control Architecture

[Block diagram labels:]

Synthetic Jets for Virtual Shaping and Separation Control

Multi-Sensor MEMS Arrays for Flow Control Feedback

Adaptive Controller

Sensed Information Aggregation

Control Information Distribution

Reconfiguration Command Generation

System Performance Evaluation

Knowledge Base

Environment


Adaptive-Reinforcement Learning Control (A-RLC)

Conceptual Control Architecture for Reconfigurable Aircraft

SAMI: Structured Adaptive Model Inversion (Traditional Control)
– Flight controller to handle wide variation in dynamic properties due to shape change

RL: Reinforcement Learning (Intelligent Control)
– Learn the morphing dynamics and the optimal shape at every flight condition in real-time


Morphing Vehicle Model

Lockheed Martin

NextGen


Morphing Vehicle Evolution

2003: 2-D Plate
2004: Rectangular Block
2005: Ellipsoid
2006: Delta Wing
Final Objective


Morphing Vehicle - TiiMY

Shape
– Ellipsoidal shape with varying axis dimensions.
– Constant volume (V) during morphing
– 2 independent variables: y and z; dependent dimension: $x = \dfrac{6V}{\pi y z}$

Morphing Dynamics
– Smart material: carbon nano-tubes or shape memory alloy
– Morphing dynamics: simple nonlinear differential equations, one for the y-dimension driven by the voltage $V_y$ and one for the z-dimension driven by the voltage $V_z$

[Equations: y-dimension and z-dimension morphing dynamics]
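The constant-volume constraint can be checked numerically. The sketch below assumes that x, y, and z are the full axis lengths of the ellipsoid, so that its volume is V = πxyz/6, which is the reading implied by the dependent-dimension relation x = 6V/(πyz). The function names are illustrative and do not come from the presentation.

```python
import math


def dependent_x(volume: float, y: float, z: float) -> float:
    """Return the x dimension that keeps an ellipsoid at the given volume.

    Assumes x, y, z are full axis lengths, so V = pi * x * y * z / 6.
    """
    if volume <= 0 or y <= 0 or z <= 0:
        raise ValueError("volume and dimensions must be positive")
    return 6.0 * volume / (math.pi * y * z)


def ellipsoid_volume(x: float, y: float, z: float) -> float:
    """Volume of an ellipsoid with full axis lengths x, y, z."""
    return math.pi * x * y * z / 6.0


# Morphing y and z changes x, but the volume stays constant.
V = 10.0
x_a = dependent_x(V, 2.0, 3.0)
x_b = dependent_x(V, 4.0, 2.5)
```

Shrinking y and z lengthens x, which is the sense in which only two of the three dimensions are independent.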


Shape Morphing Animation: TiiMY


Morphing Time Histories


Optimal Shapes at Various Flight Conditions

Optimality is defined by identifying a cost function: J = J(Current shape, Flight condition)

$J = J_y + J_z = \left(y - S_y(F)\right)^2 + \left(z - S_z(F)\right)^2$

$S_y(F) = 3 + \cos\left(\dfrac{\pi}{2} F\right)$ and $S_z(F) = 2 + 2e^{-0.5F}$
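A minimal sketch of this cost function follows. It assumes the slide's equations read J = (y − S_y(F))² + (z − S_z(F))² with S_y(F) = 3 + cos(πF/2) and S_z(F) = 2 + 2e^(−0.5F); the function names are illustrative only.

```python
import math


def optimal_shape(F: float) -> tuple:
    """Optimal (y, z) dimensions at flight condition F, per the assumed S_y and S_z."""
    s_y = 3.0 + math.cos(math.pi * F / 2.0)
    s_z = 2.0 + 2.0 * math.exp(-0.5 * F)
    return s_y, s_z


def cost(y: float, z: float, F: float) -> float:
    """Total cost J = J_y + J_z for the current shape (y, z) at flight condition F."""
    s_y, s_z = optimal_shape(F)
    return (y - s_y) ** 2 + (z - s_z) ** 2


s_y0, s_z0 = optimal_shape(0.0)   # optimal shape at F = 0
j_opt = cost(s_y0, s_z0, 0.0)     # zero cost when the shape is optimal
j_off = cost(3.0, 3.0, 0.0)       # positive cost away from the optimum
```

Under this reading both optimal dimensions stay inside [2, 4] for F in [0, 5], which matches the state ranges used in the numerical example.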


6-DOF Mathematical Model for Dynamic Behavior

Variables

$p_c = [d_x \; d_y \; d_z]^T, \quad v_c = [u \; v \; w]^T$

$\sigma = [\phi \; \theta \; \psi]^T, \quad \omega = [p \; q \; r]^T$

Nonlinear 6–DOF Equations
– Kinematic level: $\dot{p}_c = J_l v_c; \quad \dot{\sigma} = J_a \omega$
– Acceleration level:

$m\dot{v}_c + \tilde{\omega}\, m v_c = F + F_d$

$I\dot{\omega} + \dot{I}\omega + \tilde{\omega} I \omega = M + M_d$

The term $\dot{I}\omega$ represents the additional dynamics due to morphing.

Drag Force
– Function of air density, square of velocity along axis, and projected area of the vehicle perpendicular to the axis


Reinforcement Learning


Reinforcement Learning: self training

1. Actor takes action based upon states and preference function
2. Critic updates state value function, and evaluates action
3. Actor updates preference function

Learning is done repetitively, by subjecting the agent to different scenarios


Reinforcement Learning

Actor-critic method
– On-policy TD method
– The actor selects actions
    Preference of an action: $p(s,a)$
    Greedy policy: $\pi_t(s,a) = \arg\max_a p(s,a)$
    Gibbs softmax policy: $\pi_t(s,a) = \Pr\{a_t = a \mid s_t = s\} = \dfrac{e^{p(s,a)}}{\sum_b e^{p(s,b)}}$
– The critic criticizes the actions
    State value function: $V(s_t)$
    TD error: $\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$
– Strengthen or weaken the tendency to select an action: $p(s_t,a_t) \leftarrow p(s_t,a_t) + \beta\delta_t$

[Diagram: the Environment supplies States (F, y, z) and Cost (J_y, J_z); the Actor (Policy) issues Actions (V_y, V_z); the Critic (Value Function) sends the TD error to the Actor.]
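The actor-critic update can be sketched in tabular form. This is an illustration under stated assumptions, not the presentation's implementation: states and actions are small integer indices, and the critic step size `alpha` is an added parameter, since the slide gives only the preference update with step size β.

```python
import math


def gibbs_policy(preferences):
    """Gibbs softmax: action probabilities proportional to exp(preference)."""
    m = max(preferences)                      # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(V, p, s, a, r, s_next, gamma=0.9, alpha=0.1, beta=0.1):
    """One tabular actor-critic update; returns the TD error.

    V: list of state values; p: list of per-state action preference lists.
    alpha (critic step size) is an assumption not shown on the slide.
    """
    delta = r + gamma * V[s_next] - V[s]      # TD error
    V[s] += alpha * delta                     # critic update
    p[s][a] += beta * delta                   # actor: strengthen or weaken the action
    return delta


V = [0.0, 0.0]
p = [[0.0, 0.0], [0.0, 0.0]]
probs_before = gibbs_policy(p[0])
delta = actor_critic_step(V, p, s=0, a=1, r=1.0, s_next=1)
probs_after = gibbs_policy(p[0])
```

A positive TD error raises the preference for the action just taken, so the softmax policy selects it more often afterwards.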


Q-Learning

Only has to learn the action value function Q(s,a)
– How well agent performs action a in state s under policy π

Q-learning is an off-policy temporal difference method
Proven convergence

Q-Learning()
    Initialize Q(s,a) arbitrarily
    Repeat (for each episode):
        Initialize s
        Repeat (for each step of episode):
            Choose a from s using policy derived from Q (e.g., ε-greedy)
            Take action a, observe r, s′
            Q(s,a) ← Q(s,a) + α[r + γ max_{a′} Q(s′,a′) − Q(s,a)]
            s ← s′
        until s is terminal
    return Q(s,a)
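The Q-learning loop can be sketched as runnable code. The five-state chain environment (`chain_step`) is a hypothetical toy problem introduced only for illustration, and the random tie-breaking in the greedy choice is a design choice of this sketch; neither comes from the presentation.

```python
import random


def chain_step(state, action, n_states=5):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1.
    """
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(step, n_states, n_actions, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]   # arbitrary initialization (zeros)
    for _ in range(episodes):
        s = 0                                          # initialize s
        for _ in range(max_steps):
            if rng.random() < epsilon:                 # epsilon-greedy exploration
                a = rng.randrange(n_actions)
            else:                                      # greedy, ties broken at random
                best = max(Q[s])
                a = rng.choice([i for i, q in enumerate(Q[s]) if q == best])
            s_next, r, done = step(s, a)
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])      # off-policy TD update
            s = s_next
            if done:
                break
    return Q


Q = q_learning(chain_step, n_states=5, n_actions=2)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

The update bootstraps from the best next action regardless of which action the exploring policy takes next, which is what makes the method off-policy.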


Function Approximation: K Nearest Neighbor Policy Iteration (KNNPI)

The shape of the vehicle is on continuous domains
Use K-nearest neighbors method to approximate the action-value function Q(s,a)
– Collect a set of state-action pair samples
– Compute optimal action-values of these samples using Q-learning
– The action-value of a new state-action pair is the interpolation of those of its K nearest neighbors.
– Takes a weighted average of the K nearest neighbors in the sampled state space
Susceptible to absence of accurate information near the desired point

KNNPI
    Collect Sample $\{(s_i, a_i)\}$
    Learn $Q_{sample}(s_i, a_i)$
    For $(s_0, a_0)$, find $\{(s_i, a_i)\}$, the set of K nearest neighbors to $(s_0, a_0)$
    Compute $Q(s_0, a_0)$ as the average of $Q_{sample}(s_i, a_i)$ over $i = 1, \dots, K$, weighted using $\text{Distance}(s_0, a_0, s_i, a_i)$
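The interpolation step can be sketched as follows. The exact weighting used in the presentation is not recoverable from the slide, so normalized inverse-distance weights are assumed here, along with Euclidean distance over the concatenated state-action vector. The function name and the `eps` guard are illustrative.

```python
import math


def knn_q(query, samples, k=3, eps=1e-9):
    """Estimate Q at `query` from its k nearest sampled state-action pairs.

    query:   tuple of floats (state components followed by action components)
    samples: list of (point, q_value) pairs with points shaped like `query`
    Weights are normalized inverse Euclidean distances (an assumption).
    """
    nearest = sorted(samples, key=lambda item: math.dist(query, item[0]))[:k]
    weights = [1.0 / (math.dist(query, point) + eps) for point, _ in nearest]
    return sum(w * q for w, (_, q) in zip(weights, nearest)) / sum(weights)


# One-dimensional example: two samples with known action-values.
samples = [((0.0,), 0.0), ((1.0,), 1.0)]
q_mid = knn_q((0.5,), samples, k=2)    # halfway between the two samples
q_near = knn_q((0.0,), samples, k=2)   # on top of the first sample
```

The estimate is only as good as the nearby samples, which is the weakness the slide points out: with no accurate samples near the query point, the average is dominated by distant neighbors.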


Structured Adaptive Model Inversion Control


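Adaptive Control: model reference adaptive control

[Block diagram: the reference input r drives the Controller and the Reference model; the Controller output u drives the Plant, whose output is y; the error e between y and the reference model output y_m drives the Adaptation Law, which supplies Estimated parameters to the Controller.]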


Structured Model Reference Adaptive Control

Akella, Schaub, Junkins (Texas A&M)

Dynamics: 2nd order differential equations
– Exact kinematic relationship between position and velocity: $\dot{x} = v$
– Acceleration level relationships between forces and system parameters: $\dot{v} = a = \dfrac{F}{m}$, that is, $F = ma$


Structured Adaptive Model Inversion
Subbarao, Junkins (Texas A&M)

Features
– Dynamic inversion inner-loop, with an MRAC outer-loop to handle system uncertainties.
– Controls are solved for explicitly:
    System model: $\dot{x} = A f(x) + B u$
    Reference trajectory: $x_r, \dot{x}_r$
    Control law: $u = B^{-1}\left(\dot{x}_r - A f(x) - \lambda e\right)$, so that the error dynamics become $\dot{e} = -\lambda e$
– Undesirable dynamics are cancelled and replaced with user specified desired dynamics.
– Easily applicable to nonlinear systems.
– Error dynamics can be specified.
– Shown to be very effective for a wide variety of systems.
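A scalar sketch shows how the inversion control law produces the specified error dynamics. All specifics here are assumptions chosen for illustration: the nonlinearity f(x) = x², the constants a, b, and λ, the sinusoidal reference, and forward-Euler integration. The parameters are treated as known, so this shows only the inversion step, not the adaptive outer loop.

```python
import math


def simulate_dynamic_inversion(a=1.0, b=2.0, lam=2.0, x0=1.0,
                               dt=1e-3, t_end=10.0):
    """Track x_r(t) = sin(t) for the scalar plant x_dot = a*f(x) + b*u.

    With e = x - x_r, the control u = (x_r_dot - a*f(x) - lam*e) / b
    cancels the plant dynamics and leaves e_dot = -lam * e.
    Returns the tracking error at t_end.
    """
    f = lambda x: x ** 2              # illustrative nonlinearity
    x, t = x0, 0.0
    for _ in range(int(round(t_end / dt))):
        x_r, x_r_dot = math.sin(t), math.cos(t)
        e = x - x_r
        u = (x_r_dot - a * f(x) - lam * e) / b   # model inversion
        x += dt * (a * f(x) + b * u)             # forward-Euler plant step
        t += dt
    return x - math.sin(t)


final_error = simulate_dynamic_inversion()
```

The initial error of 1 decays at the rate set by λ, independent of the plant nonlinearity, because the controller replaces the plant dynamics with the desired ones.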


Structured Adaptive Model Inversion Features

Trajectory Tracking for Dynamic Systems

Plant:
– Nonlinear in states, affine in control, uncertain parameters appear linearly.

Control:
– Dynamic Inversion and Sliding Mode Control.
– Dynamic Inversion requires knowledge of system parameters, which are inherently uncertain.

Adaptive Learning Parameters:
– Updated in real-time, and used for the Dynamic Inversion

Adaptation Mechanism:
– Driven by the error between the actual plant trajectory and the reference trajectory

Stability Analysis:
– Guarantees that the plant trajectory asymptotically converges to the reference trajectory in the presence of parametric uncertainties and initial condition errors.
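The adaptation mechanism can be illustrated with a scalar example. This is a simplified sketch, not the SAMI controller itself: the plant ẋ = θx + u, the gains, the sinusoidal reference, and the gradient-type update law are assumptions chosen so that a standard Lyapunov argument (V = e²/2 + (θ − θ̂)²/(2γ)) gives V̇ = −λe².

```python
import math


def simulate_adaptive_inversion(theta=2.0, lam=5.0, gain=10.0, x0=0.5,
                                dt=1e-3, t_end=20.0):
    """Track x_r(t) = sin(t) for x_dot = theta*x + u with theta unknown.

    The controller inverts the model using the estimate theta_hat, and the
    estimate is driven by the tracking error e = x - x_r.
    Returns (final tracking error, final parameter estimate).
    """
    x, theta_hat, t = x0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        x_r, x_r_dot = math.sin(t), math.cos(t)
        e = x - x_r
        u = x_r_dot - theta_hat * x - lam * e   # inversion with estimated parameter
        x_dot = theta * x + u                   # true plant (theta not used by controller)
        theta_hat += dt * gain * x * e          # adaptation law driven by the error
        x += dt * x_dot
        t += dt
    return x - math.sin(t), theta_hat


final_error, theta_estimate = simulate_adaptive_inversion()
```

The estimate starts at zero, far from the true value, yet the tracking error still converges; the estimate also approaches the true parameter here because the sinusoidal reference keeps the state excited.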


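Structured Model & Minimal Parameterization

Kinematic Level Model

$\dot{p}_c = J_l v_c, \quad \dot{\sigma} = J_a \omega$

Acceleration Level Model

$m\dot{v}_c + \tilde{\omega}\, m v_c = F + F_d$

$I\dot{\omega} + \dot{I}\omega + \tilde{\omega} I \omega = M + M_d$

Attitude Control

$I_a^*(\sigma)\,\ddot{\sigma} + C_a^*(\sigma, \dot{\sigma})\,\dot{\sigma} = P_a^T(\sigma)\, M$

Minimal Parametrization of the Inertia Matrix

$\begin{bmatrix} I_{11} & I_{12} & I_{13} \\ I_{12} & I_{22} & I_{23} \\ I_{13} & I_{23} & I_{33} \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} = \begin{bmatrix} a_1 & 0 & 0 & a_2 & a_3 & 0 \\ 0 & a_2 & 0 & a_1 & 0 & a_3 \\ 0 & 0 & a_3 & 0 & a_1 & a_2 \end{bmatrix} \begin{bmatrix} I_{11} \\ I_{22} \\ I_{33} \\ I_{12} \\ I_{13} \\ I_{23} \end{bmatrix}$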
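The minimal parametrization rewrites the product of the symmetric inertia matrix with any vector a as a 3×6 regressor matrix, built from the components of a, times the six independent inertia elements ordered (I11, I22, I33, I12, I13, I23). A short numerical check, with illustrative helper names and arbitrary test values:

```python
def inertia_matrix(theta):
    """Symmetric 3x3 inertia matrix from theta = (I11, I22, I33, I12, I13, I23)."""
    i11, i22, i33, i12, i13, i23 = theta
    return [[i11, i12, i13],
            [i12, i22, i23],
            [i13, i23, i33]]


def regressor(a):
    """3x6 matrix L(a) such that I @ a == L(a) @ theta."""
    a1, a2, a3 = a
    return [[a1, 0.0, 0.0, a2, a3, 0.0],
            [0.0, a2, 0.0, a1, 0.0, a3],
            [0.0, 0.0, a3, 0.0, a1, a2]]


def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


theta = [1.0, 2.0, 3.0, 0.1, 0.2, 0.3]   # arbitrary inertia parameters
a = [4.0, 5.0, 6.0]                      # arbitrary vector
lhs = matvec(inertia_matrix(theta), a)   # I a
rhs = matvec(regressor(a), theta)        # L(a) theta
```

Because the unknown inertia elements now appear linearly in a parameter vector, an adaptive law can estimate six parameters rather than nine matrix entries.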


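Control Law & Update Law

Using Minimal Parametrization of the Inertia Matrix

$I_a^*(\sigma)\,\ddot{\sigma} + C_a^*(\sigma, \dot{\sigma})\,\dot{\sigma} = Y_a(\sigma, \dot{\sigma}, \ddot{\sigma})\,\theta$

where $\theta$ is the vector of unknown parameters.

With the control law

$M = P_a^{-T}\left\{ Y_a(\sigma, \dot{\sigma}, \dot{\sigma}_r, \ddot{\sigma}_r)\,\hat{\theta} - C_{da}\,\dot{\varepsilon} - K_{da}\,\varepsilon \right\}$

the closed loop dynamics take the form

$I_a^*\,\ddot{\varepsilon} + \left\{ C_{da} + C_a^*(\sigma, \dot{\sigma}) \right\}\dot{\varepsilon} + K_{da}\,\varepsilon = Y_a(\sigma, \dot{\sigma}, \dot{\sigma}_r, \ddot{\sigma}_r)\,\tilde{\theta}$

where $\tilde{\theta}$ is the parameter estimation error, and the adaptive law

$\dot{\hat{\theta}} = -\Gamma\, Y_a^T(\sigma, \dot{\sigma}, \dot{\sigma}_r, \ddot{\sigma}_r)\,\dot{\varepsilon}$

guarantees asymptotic stability of the tracking errors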


Adaptive–Reinforcement Learning Control


A-RLC Architecture


Numerical Example

[Figure: Flight Condition (0 to 6) versus Flight Path Distance (0 to 100)]

Objective
– Demonstrate optimal shape morphing for multiple specified flight conditions

Method
– For every flight condition, learn optimal policy that commands voltage producing the optimal shape
– Minimize total cost over the entire flight trajectory
– Evaluate the learning performance after 200 learning episodes

RL Module is Completely Ignorant of Optimality Relations and Morphing Control Functions: It Must Learn On Its Own, From Scratch


Numerical Example: reinforcement learning definitions

Agent: Morphing Air Vehicle Reinforcement Learning Module

Environment: Various flight conditions

Goal: Fly in optimal shape that minimizes cost

States: Flight condition; shape of vehicle
– State set: $S = \left\{ (y, F) \in [2,4] \times [0,5]; \; (z, F) \in [2,4] \times [0,5] \right\}$

Actions: Discrete voltages applied to change shape of vehicle
– Action set: $A = \left\{ (V_y, V_z) : V_y \in [0, 0.5, 1, \dots, 5]; \; V_z \in [0, 0.5, 1, \dots, 4] \right\}$

Rewards: Determined by cost functions

Optimal control policy: Mapping of the state to the voltage leading to the optimal shape


Learning Process

[Figure: Error in the Action Preference Function after 20, 60, 100, and 200 Episodes. Each panel plots Error V_y against F (0 to 5) and the y dimension (2 to 4).]


Example: Old Way


Comparison of True Optimal Shape and Learned Shape

KNN learns poorly for several flight conditions


Time Histories of Angular States


Time Histories of Adaptive Parameters


Trajectory Tracking Controls


Morphing Control Voltages


What Happened? (1)

F Still Equals d(mv)/dt
– Stiffness, frequency, damping ratio, and cost function have a major effect on learning times and learning performance.
– Slowest element in system is critical

Some Assembly Required: Tuning
– Flight condition transitions, morphing dynamics, learning performance, and adaptive control must be balanced to achieve good performance.
– Coordination and timing is everything


What Happened? (2)

Function Approximation
– Errors remained which could not be eliminated with additional training.
– Use Galerkin-based Sequential Function Approximation (SFA) to approximate the action-value function Q(s,a)


Galerkin-Based Sequential Function Approximation (SFA)

Ideal for approximating sparse multi-dimensional scattered data
Adaptive, no ad-hoc user parameters to adjust
Provides information on the sensitivity to the inputs
Matrix construction and evaluations are avoided
Computational cost is reduced while efficiency is enhanced by the low-dimensional unconstrained optimization


Example: New Way

SFA learns optimal shape well


Results

Normalized RMS error

        Y dimension      Z dimension
KNN     1.42             0.821
SFA     1.27             0.661
        10% reduction    20% reduction
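The quoted reductions follow from the tabulated errors as (KNN − SFA) / KNN. A short check, with an illustrative helper name:

```python
def percent_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


y_reduction = percent_reduction(1.42, 1.27)    # Y dimension, KNN -> SFA
z_reduction = percent_reduction(0.821, 0.661)  # Z dimension, KNN -> SFA
```

The results are about 10.6% for the Y dimension and 19.5% for the Z dimension, which the slide rounds to 10% and 20%.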


Conclusions

Improved function approximation provided large improvement in performance
– Galerkin-based Sequential Function Approximation method learns optimal shape well

Shape Changes for Mission Morphing can be treated as piecewise constant parameter changes
– SAMI is a favorable method for trajectory tracking control

Morphing for Control will require different control strategy
– Piecewise constant approximation no longer valid


Future Research 1

Modify the morphing dynamics to represent SMA actuators.
– Hysteretic behavior

Investigate control methodologies to handle faster shape changes
– Linear Parameter Varying (LPV) control

Investigate other function approximation methods
– Radial Basis Functions (GLO-MAP)

Modify the simulation to include a more advanced aircraft model
– Wing-Body, Wing-Body-Empennage, etc.


Future Research 2

Morphing Space Based Antenna

STATE OF THE ART: Multiple antennas must be mounted on spacecraft to accommodate various ground station signals

RESEARCH GOALS: Demonstrate feasibility of a reconfigurable antenna design that utilizes Reinforcement Learning to independently achieve optimal shape through use of SMA actuators

BENEFITS: A single antenna capable of altering its geometry to achieve world-wide compatibility between receivers and transmitters


Future Research 3

Directly learn voltage inputs required to achieve certain position states with time dependency.
– Skips mathematical modeling and computer simulation steps
– Avoids modeling and simulation errors


Questions?


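Reinforcement Learning: function approximation

Parameterized functional form

$V_t^{\pi}(s) = \sum_{j=1}^{N} \theta_{v_t, j}\, \Phi_j(s) \qquad p_t^{a}(s) = \sum_{j=1}^{N} \theta_{p_t^{a}, j}\, \Phi_j(s)$

Gradient descent learning

$\theta_{v_t} = \theta_{v_{t-1}} + \alpha\left(\tilde{y}_t - H_t\,\theta_{v_{t-1}}\right) H_t^T = \theta_{v_{t-1}} + \alpha\,\delta_t\, H_t^T$

$\theta_{p_t^{a}} = \theta_{p_{t-1}^{a}} + \alpha\left(p_t^{a}(s_t) + \beta\delta_t - H_t\,\theta_{p_{t-1}^{a}}\right) H_t^T = \theta_{p_{t-1}^{a}} + \alpha\beta\,\delta_t\, H_t^T$

The Temporal Difference error drives all learning in both actor and critic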
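The gradient-descent updates can be sketched for a linear-in-parameters approximator. The reading assumed here is that H_t is the row vector of basis-function values Φ_j(s_t), so the prediction is H_t θ and the critic update is θ ← θ + α δ_t H_tᵀ, with the actor update scaled by an extra β. The function names and numerical values are illustrative.

```python
def predict(theta, features):
    """Linear approximation: sum_j theta_j * Phi_j(s)."""
    return sum(t * h for t, h in zip(theta, features))


def critic_update(theta_v, features, target, alpha):
    """theta_v <- theta_v + alpha * (target - H theta_v) * H^T; returns (theta, delta)."""
    delta = target - predict(theta_v, features)
    return [t + alpha * delta * h for t, h in zip(theta_v, features)], delta


def actor_update(theta_p, features, delta, alpha, beta):
    """theta_p <- theta_p + alpha * beta * delta * H^T."""
    return [t + alpha * beta * delta * h for t, h in zip(theta_p, features)]


features = [1.0, 0.5]                     # basis values Phi_j(s_t) at the current state
theta_v, delta = critic_update([0.0, 0.0], features, target=1.0, alpha=0.1)
theta_p = actor_update([0.0, 0.0], features, delta, alpha=0.1, beta=0.5)
```

Both parameter vectors move along the same feature direction, scaled by the same error signal, so a single TD error drives the actor and the critic together.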