

International Journal of Mechanical Engineering and Technology (IJMET)

Volume 9, Issue 2, February 2018, pp. 445–460 Article ID: IJMET_09_02_046

Available online at http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2

ISSN Print: 0976-6340 and ISSN Online: 0976-6359

© IAEME Publication Scopus Indexed

ON-LINE LEARNING OF ROBOT INVERSE DYNAMICS WITH CEREBELLAR MODEL CONTROLLER IN FEEDFORWARD CONFIGURATION

Lavdim Kurtaj, Vjosa Shatri* and Ilir Limani

Faculty of Electrical and Computer Engineering,

University of Prishtina “Hasan Prishtina”, 10000 Prishtina, Kosovo.

*Corresponding Author

ABSTRACT

The performance of robot control in trajectory tracking can be improved considerably if the robot inverse dynamics model is known. The model may be used in a feedforward or in a computed torque configuration. Cerebellar model controllers can be used to acquire the robot inverse dynamics model on-line. In this paper we explore different structural aspects of a cerebellar controller in feedforward configuration for improving robot control performance. The cerebellar controller is used beside a conventional proportional-derivative controller, and it learns by using the output of the latter as its teaching signal. The effects of a cerebellar controller whose input space has lower dimensionality than the problem to be learned are explored. The influence of fully coupled Albus overlays with uniform population coding of the input dimensions, at different numbers, shapes and widths of receptive fields, on the accuracy of the acquired model is investigated. The root-mean-square of position and speed error is used as the measure of control performance. How normalization of receptive fields affects cerebellar control performance is explored by using receptive fields with the self-normalization property and receptive fields without it. A Simulink model of the cerebellar controller that preserves its layered organization is used, along with a robot plant model built in SimMechanics.

Key words: Cerebellar model controller, robot inverse dynamics, on-line learning,

feedforward robot controller, receptive fields.

Cite this Article: Lavdim Kurtaj, Vjosa Shatri and Ilir Limani, On-line learning of

robot inverse dynamics with Cerebellar Model Controller in feedforward

configuration, International Journal of Mechanical Engineering and Technology 9(2),

2018. pp. 445–460.

http://www.iaeme.com/IJMET/issues.asp?JType=IJMET&VType=9&IType=2



1. INTRODUCTION

From a control point of view, robots have a number of actuators that must be driven in a coordinated manner. Actuators drive joints that connect links, forming a structural arrangement able to perform the tasks it is intended for. Typically each joint actuator is treated as a separate, independent control system [1] with its own proportional-integral-derivative (PID) controller. However, most robotic structures are characterized by inherent dynamic interactions, or couplings, between joints. These couplings manifest as disturbances for the independent joint controllers, and one relies on the ability of each joint controller to suppress them to a satisfactory level. When actuators are equipped with high-reduction-ratio gearboxes, as is the case for low-speed operation, disturbances at the output axle are highly attenuated and have little influence on the control performance of independent joint actuators [2]. The use of constant-parameter controllers of PID type is justified in this case. In more demanding applications, more advanced controllers have to be used. A broad range of them rely on knowledge of the robot dynamics and use this information to improve control performance. With an accurate robot dynamics model this leads to perfect control [3]. Even though perfect control cannot be obtained in practice, it serves as a guide for approaching it. The control problem is thereby converted into the problem of finding a robot dynamics model [4] that is as accurate as possible.

An attractive way of finding the plant model while avoiding physical modeling is to learn it with some type of artificial neural network (ANN) [5]. The part of the brain thought to be highly involved in the coordination of multi-joint movements is the cerebellum [6]. How the neuronal structure of the cerebellum, and of other parts of the brain [7], is related to the function it is involved in is a question that has attracted much attention from the research community. For the cerebellum, relating neuronal connectivity to function has been covered by two theories of cerebellar function, by Marr in 1969 [8] and by Albus in 1971 [9]. Both theories assume a physiological mechanism of learning at specific sites of connectivity between neurons, namely synapses: in the form of long-term potentiation (LTP) for the former and of long-term depression (LTD) for the latter. About a decade later, in 1982, Ito [10] found that the physiological mechanism that aids learning at the specified site takes the form of LTD. The theories of cerebellar functioning were followed by a computational model by Albus in 1975 [11]. Based on its main assumed functionality in articulating multi-joint movements, it was named the Cerebellar Model Articulation Controller, or CMAC for short.

At one extreme of its implementation, the CMAC neural network is treated as a lookup table [12, 13], where binary equivalents of the inputs serve as the address of a memory location where information is stored during learning or retrieved when it is used for control. Each location exclusively represents a part of the multi-dimensional input space in the form of a hypercube (multi-dimensional receptive field) one quantum wide (binary receptive field one quantum wide) in each direction, with the same value over the whole hypercube. Only one memory location is updated during learning, and the content of only one memory location determines the output. The result of learning is a stepped approximation of the hyper-surface describing the input-output relationship of a multi-input-single-output (MISO) process. For multi-output (MIMO) processes each addressed location would be a vector with as many elements as there are outputs. This form of learning is able to learn any MISO (and MIMO) mapping with the desired accuracy [14], but uniform division of the multi-dimensional input space into hypercubes can result in an enormous and impractical number of memory locations. To attain the desired accuracy, the quantum width must be determined from the steepest part of the hyper-surface.
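The lookup-table extreme of CMAC described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `LookupTableCMAC`, the uniform quantization and the tuple-index addressing are assumptions made for the example. Each input is quantized to an integer quantum index, the tuple of indices addresses exactly one memory cell (a hypercube one quantum wide), and that single cell stores one value per output.

```python
import numpy as np


class LookupTableCMAC:
    """Hypothetical binary CMAC with one-quantum-wide cells: one memory cell per input hypercube."""

    def __init__(self, lows, highs, quanta, n_outputs):
        self.lows = np.asarray(lows, dtype=float)
        self.highs = np.asarray(highs, dtype=float)
        self.quanta = int(quanta)  # quanta per input dimension
        # quanta ** n_inputs cells, each holding one value per output (MIMO case).
        shape = (self.quanta,) * len(self.lows) + (int(n_outputs),)
        self.table = np.zeros(shape)

    def address(self, x):
        """Quantize every input to an integer index; the tuple of indices is the address."""
        u = (np.asarray(x, dtype=float) - self.lows) / (self.highs - self.lows)
        idx = np.clip(np.floor(u * self.quanta).astype(int), 0, self.quanta - 1)
        return tuple(int(i) for i in idx)

    def output(self, x):
        # Only the single addressed cell determines the output (stepped approximation).
        return self.table[self.address(x)]

    def learn(self, x, target, rate=1.0):
        # Only the single addressed cell is updated.
        a = self.address(x)
        self.table[a] += rate * (np.asarray(target, dtype=float) - self.table[a])


cmac = LookupTableCMAC(lows=[-1.0, -1.0], highs=[1.0, 1.0], quanta=10, n_outputs=2)
cmac.learn([0.12, -0.55], target=[3.0, -1.0])
print(cmac.output([0.15, -0.51]))  # same hypercube as the training point
print(cmac.output([0.25, -0.51]))  # neighbouring hypercube: nothing learned there
```

Memory grows as `quanta ** n_inputs`: with 72 quanta per input (the quantization used in Section 3), a two-input table already needs 72² = 5184 cells and a three-input table 72³ = 373 248 cells.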



The number of partitions can be decreased by using wider receptive fields, which also has a positive influence on generalization, or by creating non-uniform partitions (receptive fields of different widths) per dimension, or over the same dimension [18, 16]. One issue that deserves consideration, when assuming adaptability at the input layer, is whether it is compatible with the distributed use (by cerebellar parallel fibers [6]) of the same information, as present in the cerebellar neuronal connectivity.

All approaches suffer from the curse of dimensionality [18], i.e. an exponential increase in the number of higher-dimensional hypercubes (or hyper-parallelepipeds, with non-uniform partitions of the input dimensions) and of the corresponding weights where learning takes place. Approaches that apply equally to the original model and to all of the aforementioned modifications, and that help decrease this impractical storage space, are higher-order receptive fields [19], Albus overlays [11] and hashing [11]. The first approach, besides decreasing the number of hypercubes, provides a smoother approximation instead of the stepped one of the standard CMAC [20]. Albus overlays lose some of the functionality, but they contribute to a lower number of hypercubes when wider receptive fields are used, because for a given input only one higher-dimensional receptive field per overlay layer is used, out of all the higher-dimensional receptive fields that cover that input. Hashing is based on the simple fact that for higher-dimensional input spaces only a small fraction of the space is used under normal working conditions, leaving most of the hypercubes (higher-dimensional receptive fields) unused. Hashing is a many-to-one mapping that maps this high number of hypercubes onto a much lower number of memory locations where the learned information is stored. Collisions of storage locations may happen (if not intentionally resolved) and manifest as noise, but with proper design they seem to be practically acceptable. Higher-order receptive fields are biologically plausible. Albus overlays may also be considered biologically plausible if one assumes random connectivity between the input information and the processing units (neurons) that generate the higher-order receptive fields, especially under space constraints. Hashing reduces storage space in the computational model, but would not decrease the number of processing units biologically.
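Albus overlay addressing and hashing can be illustrated together. The sketch below is a simplified assumption, not the authors' library: layer `j` is displaced by `j` quanta along every dimension (the simplest uniform displacement; the original CMAC also allows other displacement patterns), each layer has hypercubes `C` quanta wide, and Python's built-in `hash` stands in for a hashing function. The function names `albus_active_cells` and `hashed_addresses` are hypothetical.

```python
import numpy as np


def albus_active_cells(q, C):
    """One active hypercube per overlay layer for an integer-quantized input q.

    Layer j is displaced by j quanta along every dimension and its hypercubes are
    C quanta wide, so exactly C cells are active for any input (instead of C**n
    when all combinations of C-wide one-dimensional fields are used).
    """
    q = np.asarray(q, dtype=int)
    return [(j,) + tuple(int(v) for v in (q + j) // C) for j in range(C)]


def hashed_addresses(cells, memory_size):
    """Many-to-one mapping of virtual cell labels onto a small physical memory.

    Python's built-in hash of an int tuple is used purely for illustration;
    distinct cells may collide and then share a weight.
    """
    return [hash(cell) % memory_size for cell in cells]


a = albus_active_cells([10, 3], C=4)
b = albus_active_cells([11, 3], C=4)
print(len(a), len(set(a) & set(b)))  # 4 active cells, 3 shared with the neighbouring input
print(hashed_addresses(a, memory_size=64))
```

Neighbouring inputs share most of their active cells, which is the source of generalization, while inputs `C` or more quanta apart share none.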

In this paper we explore the influence of receptive field shape and width, used for coding the cerebellar model input signals, on the quality of the robot inverse dynamics acquired during on-line learning in feedforward configuration. Only the Albus fully overlaid CMAC is considered. A multiplication operator is used to generate higher-order receptive fields from the one-dimensional receptive fields used for input signal coding. In the rest of the paper, Section 2 gives a short overview of the use of the robot inverse dynamics model in feedforward and feedback control structures. It is followed by a presentation of the neuronal circuit of the cerebellum and of typical information processing by cerebellar models. Simulation results of controlling the robot with a cerebellar model that uses receptive fields of different shapes and widths, while it learns the robot inverse dynamics on-line, are presented in Section 3. The paper ends with a Conclusions section.

2. METHODS

Two main control structures in which the robot inverse dynamics can be used to improve control performance are presented. A cerebellum-based model is used for on-line learning of the robot inverse dynamics.

2.1. Model of Robot Inverse Dynamics in Control

An inverse dynamics model of the robot can be used to improve control performance. With an accurate model, performance up to perfect control can theoretically be obtained [3]. Since the controller will be implemented on a digital controller with a limited number of calculations per second, perfect control is not achievable even with an ideal robot inverse dynamics model. Another factor that can prevent perfect control is friction, for which creating an ideal model is almost impossible. Inaccuracies in the robot model itself or in its implementation make the presence of a conventional controller indispensable. Figure 1 shows two standard configurations in which an inverse dynamics model can be incorporated in the joint controller. Figure 1(a) represents a robot (joint) control structure with a conventional PD controller augmented with a feedforward controller utilizing the inverse dynamics model. If the robot model is accurate, all control action is generated by the feedforward controller. Unmodelled dynamics are handled by the conventional controller, and if there are none, its output is zero. The use of the inverse dynamics controller in a computed torque control structure is shown in Figure 1(b). In this case the actual joint positions and speeds and the desired joint accelerations serve as inputs to the block that computes the necessary actuating torques. For smooth reference trajectories with continuous joint positions, speeds and accelerations, the performance of both structures is the same [21]. The second configuration is sometimes implemented similarly to the structure in Figure 1(a), by preserving the standard connectivity of the conventional controller and by using actual values for joint positions and accelerations.
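The two structures of Figure 1 can be written out for the single-link pendulum used later in Section 3. This is a hand-derived sketch, not the SimMechanics model used by the authors: it assumes a uniform rod plus a point-mass gripper, viscous friction as the only velocity term, g = 9.81 m/s², and q = 0 with the link hanging straight down. The same numerical PD gains are reused in both structures only for illustration; in Figure 1(b) they scale an acceleration and would normally be chosen separately.

```python
import math

# Pendulum data from Section 3 of the text; G_ACC and the angle convention
# (q = 0 with the link hanging straight down) are assumptions for this sketch.
M_LINK, L_LINK, M_GRIP, B_FRICTION, G_ACC = 1.0, 1.0, 0.2, 0.35, 9.81
KP, KD = 30.0, 4.0  # PD gains reported in Section 3


def inertia(q):
    # Uniform rod about its joint plus point-mass gripper at the tip; constant for one joint.
    return M_LINK * L_LINK**2 / 3.0 + M_GRIP * L_LINK**2


def velocity_terms(q, dq):
    # One joint has no Coriolis/centripetal coupling; only viscous friction remains.
    return B_FRICTION * dq


def gravity(q):
    return (M_LINK * L_LINK / 2.0 + M_GRIP * L_LINK) * G_ACC * math.sin(q)


def inverse_dynamics(q, dq, ddq):
    """tau = D(q) * ddq + C(q, dq) + G(q)."""
    return inertia(q) * ddq + velocity_terms(q, dq) + gravity(q)


def feedforward_pd(qd, dqd, ddqd, q, dq):
    """Figure 1(a): model fed with desired signals; the PD output is a torque."""
    tau_ff = inverse_dynamics(qd, dqd, ddqd)
    tau_pd = KP * (qd - q) + KD * (dqd - dq)
    return tau_ff + tau_pd


def computed_torque(qd, dqd, ddqd, q, dq):
    """Figure 1(b): the PD output is an acceleration; the model uses actual q and dq."""
    a_cmd = ddqd + KP * (qd - q) + KD * (dqd - dq)
    return inertia(q) * a_cmd + velocity_terms(q, dq) + gravity(q)


# On the reference trajectory the PD terms vanish and both structures agree.
qd, dqd, ddqd = 0.5, 1.0, -2.0
print(feedforward_pd(qd, dqd, ddqd, q=qd, dq=dqd))
print(computed_torque(qd, dqd, ddqd, q=qd, dq=dqd))
```

Off the trajectory the two differ: in (a) the PD term adds torque directly, while in (b) it is multiplied by the inertia D(q), which is why the PD output carries different physical units in the two structures.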

Figure 1 Robot controllers based on inverse dynamics model $D(q)\ddot{q} + C(q,\dot{q}) + G(q)$. (a) Robot control structure with conventional PD controller augmented with feedforward controller utilizing robot inverse dynamics model. (b) Robot computed torque controller

Notice that the physical meaning of the output signal of the conventional PD controller differs between the two control structures in Figure 1: it is a torque (or voltage) in the first one, Figure 1(a), and an acceleration in the second one, Figure 1(b).

2.2. Neuronal Circuit of the Cerebellum and Information Processing

The cerebellum is the part of the brain that contains more than half of the total number of neurons while occupying only a fraction of the total volume. It is attributed a main role in coordinating multi-joint movement. Being well separated from the rest of the brain, yet resembling it, it is also known as the little brain. Despite its huge number of neurons, its organization seems simple and regular, as shown in Figure 2. The cerebellar cortex is organized in three layers of neurons. The inner layer is the granular layer, where mainly granule cells (the smallest and most numerous type of neuron) and Golgi cells reside, GrC and GoC in Figure 2. The middle layer, namely the Purkinje cell layer, contains only the bodies of Purkinje cells arranged in a single-cell-thick layer, PC in Figure 2. The third, outer layer is the molecular layer. It contains stellate cells and basket cells, StC and BaC in Figure 2, and the remarkable organization of the almost flat dendrite trees of Purkinje cells with the parallel fibers (axons of granule cells, PF in Figure 2) passing through them at right angles. Information enters the cerebellum through mossy fibers (mf1, mf2 and mf3 in Figure 2) and through climbing fibers (cf in Figure 2). In the mature cerebellum only one climbing fiber targets a given Purkinje cell, and climbing fibers are assumed to carry a teaching signal in the form of an error. All climbing fibers originate from the inferior olivary nucleus (IO in Figure 2). The only output from the cerebellar cortex is through Purkinje cell axons, which target the deep cerebellar nuclei (DCN in Figure 2), with the axons of the latter being the only output from the cerebellum.

Figure 2 Neuronal circuit of the Cerebellum. Green lines ending with a circle are excitatory

connections; red lines ending with rhombs are inhibitory connections; climbing fibers with blue lines

ending with circles are also excitatory. mf1, mf2 and mf3: mossy fibers; gl: glomeruli; GrC: granule

cells; GoC: Golgi cells; aa: ascending axon; PF: parallel fibers; StC: stellate cells; BaC: basket cells;

PC: Purkinje cells; DCN: deep cerebellar nuclei; cf: climbing fiber; IO: inferior olive

Information carried by mossy fibers is population coded with (assumed) one-dimensional receptive fields of a specific form, the more common being square (0/1 or binary), triangular and Gaussian, shown in Figure 2 as 1-dim RF. It is processed expansively by the arrangement of granule cells and Golgi cells, generating several orders of magnitude more parallel fibers than the original mossy fibers. Theoretically, all possible combinations between mossy fibers of different input dimensions are formed. Parallel fibers are axons of granule cells that rise vertically from the granule cell layer (ascending axon part, aa in Figure 2) through the Purkinje cell layer toward the molecular layer, where they split in a T-shaped form, creating the parallel fibers. They are assumed to carry higher-dimensional information (n-dim RF in Figure 2) that is distributed through several hundred Purkinje cell dendrite trees, without necessarily contacting all of them. One Purkinje cell may make contacts (synapses) with hundreds of thousands (species dependent) of parallel fibers, out of orders of magnitude more passing through it. These contacts are assumed to be the main learning site [8, 9] where plasticity is present, and are represented by the synaptic weights wPF-PC in Figure 2. Co-activation of PF and cf induces LTD [10]. Hundreds of Purkinje cell axons target a deep cerebellar neuron, which is also targeted by mossy fibers and climbing fibers and generates one excitatory output from the cerebellum. DCN neurons also inhibit IO neurons, the source of climbing fibers, shown in Figure 2 as a recurrent loop between DCN and IO. Stellate cells and basket cells also use information from parallel fibers and create inhibitory connections with Purkinje cells: stellate cells target the dendrite tree, whereas basket cells target the dendrite tree and form a specific basket arrangement around the Purkinje cell body, where the axon originates. Most models based on CMAC [11] do not use the DCN neuron and use only one PC per output, assuming a simple summation function for DCN neurons and the same learning signal for all PCs that target a given DCN neuron. Also, cerebellar models do not necessarily respect the number of inputs that target a specific neuron or the amount of divergence of the outputs from cerebellar neurons. The Purkinje cell and its synaptic weights are modeled as a simple perceptron with a linear activation function. The functionality of the granule cell layer with GrCs and GoCs is modeled as a logical AND or multiplication operator that generates multi-dimensional receptive fields at the PFs from one-dimensional receptive fields at the mossy fibers. The first operator can be used if the input receptive fields are logical (0/1), and the latter in the general case, with population-coded receptive fields of any form.
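The granule-layer and Purkinje-cell abstraction just described can be sketched with NumPy: the outer product of the one-dimensional codes forms every combination across input dimensions, and the Purkinje cell is a linear unit over those parallel-fiber activities. The function names `granule_layer` and `purkinje_output` are hypothetical and are not taken from the authors' Simulink library. For binary fields the product reproduces the logical AND.

```python
import numpy as np


def granule_layer(rf_codes):
    """Multiply one-dimensional receptive-field activities of all inputs, all combinations.

    rf_codes: list of 1-D arrays, one per input dimension (mossy-fiber population code).
    Returns the flattened n-dimensional receptive-field activities (parallel fibers).
    """
    pf = np.ones(1)
    for code in rf_codes:
        pf = np.outer(pf, code).ravel()
    return pf


def purkinje_output(pf, w):
    """Purkinje cell as a linear unit: weighted sum of parallel-fiber activities."""
    return float(pf @ w)


# Binary (0/1) fields: the product coincides with logical AND.
x1, x2 = np.array([0.0, 1.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])
pf_bin = granule_layer([x1, x2])
print(np.array_equal(pf_bin > 0, np.logical_and.outer(x1 > 0, x2 > 0).ravel()))

# Graded fields: the product gives graded n-dimensional receptive fields.
pf = granule_layer([np.array([0.25, 0.75]), np.array([0.5, 0.5])])
print(pf, purkinje_output(pf, np.arange(4.0)))
```

Because the sum over all combinations factorizes into the product of the per-input sums, one-dimensional codes that each sum to one yield an n-dimensional code that also sums to one.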

Generation of the population-coded information in the mossy fibers is supposed to happen outside the cerebellum. Most cerebellar models use transformations that convert the input signal of each dimension into a group of signals with compact-support receptive fields of a specific form, like those seen at the input in Figure 2. They can be arranged in groups of non-overlapping receptive fields (Albus layers), as in the standard Albus CMAC model [11], or in a single layer of overlapping receptive fields, as in later CMAC models. In this paper, three shapes (square, triangular and Gaussian) of different widths are explored for the receptive fields used for population coding of the input signals. Figure 3 shows one layer of one-dimensional overlapping receptive fields used for coding input signal xi.

Figure 3 One layer of overlapping triangular receptive fields. Layer i has ni receptive fields, Bi,1 to Bi,ni. The width of a receptive field is CBi,4. All layers are similar to the one shown, but may differ in the number and width of their receptive fields

Only one layer of triangular receptive fields, with a number of receptive fields of given width, is shown for population coding of input signal xi. Bi,k is the basis function of a receptive field and CBi,4 is its width. Different inputs may use different receptive field shapes and/or widths. The higher-order receptive fields carried by parallel fibers are generated by applying the multiplication operator to the one-dimensional receptive fields used for population coding of the input signals carried by mossy fibers.
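One layer of one-dimensional receptive fields of the three shapes can be sketched as below. The exact parameterization in the authors' library (for example how Bi,k and CBi,4 are defined) is not given here, so the uniform centre placement, the reading of `width` as full support for square and triangular fields, and the Gaussian standard deviation of `width / 4` are all assumptions; `rf_layer` is a hypothetical name. With triangular fields whose centre spacing equals half their width, the activities sum to one everywhere inside the range (self-normalization), whereas the Gaussian layer does not.

```python
import numpy as np


def rf_layer(x, lo, hi, n_rf, width, shape="triangular"):
    """Population-code scalar x with n_rf fields whose centres are spread uniformly on [lo, hi].

    width is the full support for square and triangular fields; for the Gaussian it
    is (by assumption) mapped to a standard deviation of width / 4.
    """
    centres = np.linspace(lo, hi, n_rf)
    d = np.abs(x - centres)
    if shape == "square":
        return (d < width / 2).astype(float)
    if shape == "triangular":
        return np.maximum(0.0, 1.0 - d / (width / 2))
    if shape == "gaussian":
        return np.exp(-0.5 * (d / (width / 4)) ** 2)
    raise ValueError(f"unknown receptive-field shape: {shape}")


for s, w in (("square", 1.0), ("triangular", 2.0), ("gaussian", 2.0)):
    code = rf_layer(1.3, 0.0, 4.0, n_rf=5, width=w, shape=s)
    print(s, np.round(code, 3), "sum =", round(float(code.sum()), 3))
```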

The model of the cerebellar processing units, including the processing from input signals to the specific type of coding, is implemented as a cerebellar Simulink library [22, 23], which is also used for performing the simulations (MATLAB and Simulink are registered trademarks of The MathWorks, Inc., www.mathworks.com/trademarks). The robot plant used for the simulations is the same as the one presented in [23].



3. RESULTS AND DISCUSSIONS

The receptive fields used for coding the input signals are one of the factors that determine the ability of the cerebellar model to acquire the robot model and the quality of the acquired model. Their number, shape and width are of importance. Also, when seeking models with fewer processing elements, some input signals may be left out, resulting in a cerebellar model that is unable to acquire specific dynamical effects of the robot.

The robot from [23], with a single rotary joint in pendulum configuration, is used. It has one link of length 1 m and mass 1 kg, with the link mass uniformly distributed over the link length. The gripper is considered as a point mass of 0.2 kg situated at the link end. The friction coefficient is set to 0.35 Nm/(rad/s). The fixed gains of the conventional proportional-derivative (PD) controller are set to 30 for the proportional and 4 for the derivative constant. The update rate of the PD controller is set to 1000 Hz and that of the cerebellar controller to 100 Hz. The reference trajectory for learning and testing is an ideal sinusoidal trajectory for joint position, joint speed and joint acceleration, with amplitude equal to 3π/2 rad. The quality of the acquired model is evaluated with the maximum joint position and joint speed errors, and with the root mean square (RMS) of the errors. The discrete RMS is calculated from signal samples over one period of the reference signal.
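The reference trajectory and the discrete RMS measure can be sketched as follows. The amplitude, the 10 s period (given in Section 3.1) and the 100 Hz sampling rate come from the text; the sine phase (zero position at t = 0) and the names `reference` and `rms_last_period` are assumptions for illustration.

```python
import numpy as np

A = 3 * np.pi / 2   # position amplitude [rad], from the text
T = 10.0            # reference period [s] (Section 3.1)
FS = 100.0          # cerebellar-controller rate used for RMS sampling [Hz]


def reference(t):
    """Desired position, speed and acceleration of the sinusoidal reference (sine phase assumed)."""
    w = 2 * np.pi / T
    return A * np.sin(w * t), A * w * np.cos(w * t), -A * w**2 * np.sin(w * t)


def rms_last_period(samples, fs=FS, period=T):
    """Discrete RMS over the most recent full reference period."""
    n = int(round(fs * period))
    window = np.asarray(samples, dtype=float)[-n:]
    return float(np.sqrt(np.mean(window**2)))


t = np.arange(int(3 * T * FS)) / FS          # 30 s of samples at 100 Hz
qd, dqd, ddqd = reference(t)
err = np.where(t < 2 * T, 1.0, 0.05)         # hypothetical error: large early, small later
print(rms_last_period(err))                  # only the last 10 s (1000 samples) count
```

Using a sliding one-period window is also why, as described in the text, any change in an RMS trace reflects the behavior over the preceding reference period.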

3.1. Two-Dimensional CMAC for Three-Dimensional Robot Problem

The idealized inverse dynamics model of the robot plant, neglecting actuator dynamics (or considering the actuator idealized as a torque drive with a corresponding controller), is a three-dimensional problem [23] if dynamic friction is taken into consideration. For the cerebellar model in feedforward configuration, given in Figure 1(a), the input signals are the desired joint position, desired joint speed and desired joint acceleration, but in this first case a lower-dimensional input space is used by leaving out joint speed. Without friction, the two remaining inputs would suffice to represent the inverse dynamics of the robot in question.
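Why a two-input CMAC (desired position and acceleration) cannot capture speed-dependent friction can be checked numerically on the sinusoidal reference. For a sine, the instants t and T/2 − t have the same position and the same acceleration but opposite speeds, so one cell addressed by (q, q̈) must serve two different torque demands. The sketch reuses the hand-derived pendulum model (uniform rod, point-mass gripper, g = 9.81 m/s² assumed), not the authors' SimMechanics plant; `two_input_residual` is a hypothetical helper.

```python
import math

A, T = 3 * math.pi / 2, 10.0   # reference amplitude [rad] and period [s] from the text
W = 2 * math.pi / T
D = 1.0 / 3.0 + 0.2            # link (uniform rod) + tip-mass inertia [kg m^2], assumed model
GRAV = (0.5 + 0.2) * 9.81      # gravity torque gain [Nm], g = 9.81 m/s^2 assumed


def desired(t):
    return A * math.sin(W * t), A * W * math.cos(W * t), -A * W**2 * math.sin(W * t)


def required_torque(q, dq, ddq, bd):
    return D * ddq + bd * dq + GRAV * math.sin(q)


def two_input_residual(t, bd):
    """Instants t and T/2 - t share (q, ddq) but have opposite speeds.

    A cell addressed only by (q, ddq) holds one torque; at best it stores the mean
    of the two demands, and this residual is left for the PD controller.
    """
    taus = [required_torque(*desired(s), bd) for s in (t, T / 2 - t)]
    return abs(taus[0] - taus[1]) / 2


print(two_input_residual(1.0, bd=0.0), two_input_residual(1.0, bd=0.35))
```

Without friction the residual vanishes; with friction it equals the friction torque Bd·|q̇|, which only an input carrying speed information could remove.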

The input range of each input signal, -2π rad to 2π rad for joint position and $-2\pi(2\pi \cdot 0.2)^2$ rad/s² to $2\pi(2\pi \cdot 0.2)^2$ rad/s² for joint acceleration, is divided into 72 quanta of equal width [11, 23]. Each input signal is population coded with 73 triangular receptive fields, 8 quanta wide. The period of the reference sinusoidal signal is set to 10 s. The output of the PD controller serves as the teaching signal. Signals for the RMS calculation are sampled at the cerebellar controller rate of 100 Hz. The Simulink model of the control system, with the robot plant and the PD controller augmented with the cerebellar-model inverse dynamics feedforward controller, is shown in Figure 4. The robot model, implemented in SimMechanics, is inside the 1D_PosOri_mL_Ovar_vInt block. The load mass (mL) and the orientation angles relative to the base (alfab) are set to zero. The control action of the PD controller is at the output of the ZOH (Zero-Order Hold) block, which is also where the update rate of the PD controller is set. The Cerebellar controller block contains a two-dimensional CMAC, shown in Figure 5, with two blocks for coding the input signals by population code, blocks 1L_1D_TBF_x1 and 1L_1D_TBF_x1.

Figure 4 Control system with robot plant and PD controller, augmented with cerebellar model inverse dynamics feedforward controller

Block 4D_BF (GrC-GoC) can create up to four-dimensional receptive fields from up to four input signals; currently only two inputs are used, and the other two are set to constant 1 with Dim_x3x4, effectively becoming unused. The Purkinje cell and weights are implemented inside the PC block, where the learning gain and learning rate are set. The update rate of the cerebellar controller is set in ZOH_CMAC (zero-order hold for the cerebellar controller). Two switches are used to select the initial values of the weights and to set learning manually on or off. Automatic control of the learning state is done with the block named Time Profile for Error. It blocks learning for two reference signal periods at the beginning of the simulation in order to obtain the reference performance of PD control. It also blocks learning for two reference signal periods at the end of the simulation run, to create conditions for testing the quality of the robot inverse dynamics model acquired by the cerebellar controller. Numbers next to the signal lines in Figure 4 and Figure 5 indicate the dimensionality of the corresponding signal. The two lines at the mossy fibers have dimensionality 73 each, indicating that each input signal has 73 receptive fields, while the number 5329 next to the parallel fibers is the number of two-dimensional receptive fields formed from all combinations of input receptive fields (73•73 = 5329).
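The input coding of this two-dimensional CMAC can be reproduced approximately in a few lines. The ranges, the 72 quanta, the 73 fields and the 8-quanta width come from the text; placing the centres one quantum apart starting at the lower range limit and reading "8 quanta wide" as the full support width are assumptions, and `triangular_code` is a hypothetical name. Under these assumptions 7 or 8 fields are active per input and, inside the range, their activities sum to 4 rather than 1, which is relevant to the normalization question raised in the abstract.

```python
import numpy as np

N_QUANTA, N_RF, WIDTH_Q = 72, 73, 8  # from the text; WIDTH_Q read as full support width


def triangular_code(x, lo, hi):
    """73 triangular fields with centres one quantum apart (assumed layout)."""
    quantum = (hi - lo) / N_QUANTA
    centres = lo + quantum * np.arange(N_RF)
    half_width = 0.5 * WIDTH_Q * quantum
    return np.maximum(0.0, 1.0 - np.abs(x - centres) / half_width)


q_max = 2 * np.pi                           # joint position range [rad]
a_max = 2 * np.pi * (2 * np.pi * 0.2) ** 2  # joint acceleration range [rad/s^2]

code_q = triangular_code(0.3, -q_max, q_max)
code_a = triangular_code(-1.0, -a_max, a_max)
parallel_fibers = np.outer(code_q, code_a).ravel()

print(parallel_fibers.size)  # 5329 two-dimensional receptive fields
print(np.count_nonzero(code_q), np.count_nonzero(parallel_fibers))
print(code_q.sum())          # constant inside the range for this layout
```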

The trend of the RMS position error during the three phases of robot control, with the PD controller augmented with the feedforward cerebellar controller, at different learning gains, is shown in Figure 6. The lowest trace marks the automatic learning-state control mentioned in the previous paragraph. The learning gains are given in the inset of the same figure. During the first phase, lasting 20 seconds (PD Only Control), the weights are initialized to zero and learning is off, so the PD controller alone handles all control action. All traces coincide during this phase; the rising part during the first 10 seconds is the initialization of the RMS calculation blocks. It is followed by a constant value during the next 10 seconds, marking the performance of control with the PD controller only. Any change in a trace reflects the cumulative behavior over the one input signal period preceding the current time.
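The learning schedule and the role of the PD output as teaching signal can be sketched as follows, in the spirit of feedback-error learning. The exact normalization behind gains such as 0.1/(16*8) in the PC block is not spelled out in the text, so `gain` below is a plain step size; `learning_enabled` and `cmac_update` are hypothetical names, and the static loop idealizes the PD controller as supplying exactly the torque the CMAC still misses.

```python
import numpy as np

T_REF = 10.0  # reference period [s]


def learning_enabled(t, t_end, periods_off=2, period=T_REF):
    """Learning off for the first and last two reference periods, on in between
    (mimicking the described Time Profile for Error block)."""
    return periods_off * period <= t < t_end - periods_off * period


def cmac_update(w, pf, tau_pd, gain):
    """LMS-style update: the PD output plays the role of the error (climbing-fiber) signal."""
    return w + gain * tau_pd * pf


# Idealized static check at one input: the PD output equals the remaining model error.
pf = np.array([0.0, 0.5, 0.5, 0.0])  # hypothetical parallel-fiber activities
w = np.zeros_like(pf)
target = 2.0                         # hypothetical required torque [Nm]
for _ in range(100):
    tau_pd = target - float(pf @ w)
    w = cmac_update(w, pf, tau_pd, gain=0.5)
print(float(pf @ w))                 # approaches the target torque
print([learning_enabled(t, 100.0) for t in (10.0, 20.0, 79.9, 80.0)])
```

In this idealized setting the error shrinks by the factor (1 − gain·‖pf‖²) per update, so it converges for 0 < gain·‖pf‖² < 2. If part of the dynamics cannot be represented (such as the friction the two-input CMAC cannot capture), the PD output never vanishes and the weights keep moving, which matches the adaptive-controller-like behavior discussed next.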



Figure 5 Cerebellar controller with two inputs

The second phase (Learning Phase) starts with learning being switched on automatically. While the cerebellar controller acquires the inverse dynamics model, the RMS errors (and the corresponding position and speed errors) mainly decrease. The slope is steeper with a higher learning gain, and control performance improves. All traces converge, or would converge for a longer learning phase, to an RMS position error of around 0.058 rad. For a sinusoid-like error shape, this value would correspond to a maximal position error of 0.041 rad. The inability to reach zero control error is caused by the controller being structurally unable to learn all dynamical effects present in the robot plant. In this case the cerebellar controller cannot acquire the friction dynamics, which are speed dependent, since speed information is not present as an input dimension. The persistent error has another adverse effect on the quality of the acquired model when transiting from the learning-on to the learning-off phase. It causes permanent oscillation of the weights, similar to adaptive controllers, possibly with higher control errors after the learning phase, when learning is switched off. This is visible in the third phase of the traces in Figure 6 (Testing Phase). For traces that reached the learning limit, the RMS error increases when learning is switched off, and the increase is larger the higher the learning rate, signifying more pronounced adaptive-controller behavior. Variations during the first 10 seconds of this phase, from 80 s to 90 s, are caused by the RMS calculation time window; they pass after one period and settle to a constant value, since the CMAC control is then static. Lower learning gains still manifest this effect, but with lower amplitudes relative to the settled limit.

Figure 6 RMS of position error for three phases of robot control, with PD controller augmented with feedforward cerebellar controller, at different learning gains

With lower learning gains the controller takes longer to acquire the model, up to the modeling capability of the given controller, but shows a less pronounced error increase when learning is turned off. This is shown in Figure 7 for 100 periods of the reference signal (1000 seconds) with learning rate 0.1/(16*8), equivalent to the trace with the same mark in Figure 6. Gross learning happens during the first 100 seconds (80 seconds of learning). After that, small adjustments are made, mainly decreasing the speed error, with little effect on the maximum and RMS position error. They manifest as a smoothing of the position control and a smoother control action by the CMAC controller, as seen by comparing the corresponding insets at the beginning and end phases of Figure 7(a). The RMS speed error contains a jump shortly after the first 10 seconds. It is caused by the discontinuity of the desired speed at the beginning, followed by intense action of the PD controller (the initial sharp peak is clipped in the second plot of Figure 7(a)). The system quickly enters a dynamic steady state, indicated by the flat part of the RMS errors until the end of the initial 20 seconds. This initial transient is also visible in the RMS position error, but it is relatively low; with proper trajectory planning it can be highly attenuated. The relative increase in error when learning stops is low and barely noticeable in the main plots of Figure 7(b), but the right insets, with the last 40 seconds magnified for better visibility, show the details. The RMS position error increases by 1.147%, from about 0.05665 rad to about 0.0573 rad. The maximum position error increases from 0.0345 rad to 0.0385 rad, the theoretical value being 0.03454 rad.

Figure 7 Takeover of control by the feedforward cerebellar inverse dynamics controller during learning, and RMS error trends. (a) Position error, speed error, and torques from the PD and CMAC controllers. (b) RMS of position error, RMS of speed error, and RMS of PD control action. All subfigures have two insets. Left insets are zoomed plots of the first 100 seconds of the corresponding plot, with the same vertical axis scaling. Right insets show the last 40 seconds (960 s to 1000 s) of the corresponding plot zoomed, with a best-fit vertical axis. From 0 s to 20 s learning is off and only the PD controller generates the control action. From then up to 980 s learning is on, with the CMAC controller acquiring the inverse dynamics model. Learning is off again during the last 20 seconds (980 s to 1000 s), showing the success of the model acquired by the CMAC. Right insets cover the transition from learning on to learning off, 20 seconds on each side

3.2. CMAC with Complete Modeling Capability would Zero Control Error

When all significant signals (dimensions) are used as inputs to the cerebellar controller, it is expected to be able to learn the complete model of the plant. In the ideal case the control error would be zero. In practical situations, the quality of the acquired model is determined by other design parameters, which help make the control error lower but not zero. For example, for the problem of the previous section, complete modeling capability can be tested in two ways: by making friction zero (which can be done in simulation), or by increasing the input space dimensionality by adding joint speed as a third input. If friction is zero, the resulting model is two-dimensional (2D), while for nonzero friction the model is three-dimensional (3D). Only linear dynamic (viscous) friction is considered, with coefficient Bd equal to 0 or 0.35. Each input signal is population coded by 17 triangular receptive fields (RF), each two quanta wide. In the 2D case there are 17•17 = 289 2D RFs, and for the 3D input space there are 17•17•17 = 4913 3D RFs. The learning gain is 0.1/(1*1) for all simulations of this section. The models are compared on the achieved control performance, with a learning course similar to the one shown in Figure 6. The maximal position error, RMS of position error, maximal speed error, and RMS of speed error are compared during the last period of the learning phase (70 s to 80 s) and during the last 10 seconds (90 s to 100 s), when learning is off.

Figure 8 shows the results of four simulation runs. The first two bars of each group correspond to 2D CMAC and the other two to 3D CMAC; within each subgroup, the left bar is for friction coefficient equal to 0 and the adjacent bar for friction coefficient equal to 0.35. The maximal position error and the RMS of position error show that all models are able to learn the inverse dynamics model, with the exception of 2D CMAC when the friction coefficient is not zero (2D CMAC used for a 3D problem). 3D CMAC learning is not much influenced by the value of the friction coefficient, as seen from the almost equal heights of the two bars in the right pair of every group. It is assumed that the ranges of the input signals are covered properly. Individual behaviors follow the same trend as shown in the previous section (Figure 6 and Figure 7), such as the difference between values at the end of the learning phase and in the steady state of the testing phase, which can also be seen in Figure 8. It is highly pronounced for 2D CMAC in the presence of friction, corresponding to the black trace with learning rate 0.1/(16*1) in Figure 6: the error becomes lower while learning, but increases considerably when learning is switched off (brown and dark-blue bars in Figure 8). This is caused by a large learning gain in the presence of persistent error. The behavior of 3D CMAC with learning on and off is similar to that of learning with a lower learning gain, relative to the CMAC structure.

Figure 8 Position and speed maximal and RMS errors for two-dimensional (2D) and three-dimensional (3D) CMAC, for problems without (Bd = 0) and with friction (Bd = 0.35). The two left bars of each group correspond to 2D CMAC and the other two bars to 3D CMAC. The left bar of each subgroup of two is for the problem without friction
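Why a 2D CMAC cannot absorb friction can be illustrated with a deliberately crude stand-in for CMAC: the best piecewise-constant lookup table over the chosen inputs, whose least-squares value in each cell is the mean target in that cell. The sketch assumes a hypothetical one-joint plant with unit inertia, no gravity and the linear friction torque Bd times joint speed, a sinusoidal reference, and that the 2D inputs are joint position and acceleration; none of these choices is taken from the paper. Along a sinusoid, the same (position, acceleration) pair occurs with speeds of both signs, so no table indexed by those two inputs alone can represent the friction torque, while adding speed as a third input removes that ambiguity.

```python
import numpy as np

def best_table_rms(inputs, target, n_bins=17):
    """RMS residual of the best piecewise-constant table indexed by `inputs`.

    Each input is split into n_bins equal bins; the least-squares value of a
    table cell is the mean target in that cell, so the residual is the part of
    the target that no function of these inputs alone (at this resolution)
    can reproduce.
    """
    keys = np.zeros(target.shape, dtype=np.int64)
    for x in inputs:
        lo, hi = x.min(), x.max()
        idx = np.floor((x - lo) / (hi - lo) * n_bins).astype(np.int64)
        keys = keys * n_bins + np.clip(idx, 0, n_bins - 1)   # mixed-radix cell id
    residual = np.empty_like(target)
    for k in np.unique(keys):
        cell = keys == k
        residual[cell] = target[cell] - target[cell].mean()
    return float(np.sqrt(np.mean(residual ** 2)))

# Hypothetical 1-DOF joint on a sinusoidal reference (unit inertia, no gravity)
t = np.linspace(0.0, 2 * np.pi, 2000, endpoint=False)
q, qd, qdd = np.sin(t), np.cos(t), -np.sin(t)

def torque(Bd):
    """Inverse dynamics with linear viscous friction: tau = qdd + Bd * qd."""
    return qdd + Bd * qd

for Bd in (0.0, 0.35):
    r2 = best_table_rms([q, qdd], torque(Bd))       # 2D: position, acceleration
    r3 = best_table_rms([q, qd, qdd], torque(Bd))   # 3D: speed added
    print(f"Bd = {Bd}: 2D residual {r2:.3f}, 3D residual {r3:.3f}")
```

With Bd = 0.35 the 2D residual is dominated by Bd times the RMS joint speed, while the 3D residual stays at the quantization level; with Bd = 0 both tables show only quantization error.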

3.3. Number, Shape and Width of Receptive Fields

These parameters are explored with 2D CMAC on problems without friction, so that frictional effects do not mask the structural behavior. All evaluation cases follow the same learning and testing phases as shown in Figure 6.

The trend of the RMS position error for a number of simulations with different numbers of triangular receptive fields (RF) is shown in Figure 9. Two marks are given for every simulation run: a red dot for the error at the end of the learning phase and a blue circle for the testing-phase error. The duration of learning is the same for all simulations, six input-signal periods (cycles). Normally errors can be made smaller with longer training phases, if there is no inherent limitation. For digital control systems, the update period of the conventional and cerebellar controllers will be among the limiting factors.

Figure 9 Number of triangular receptive fields (RF) and RMS of position error for feedforward CMAC in two phases of learning. The RMS error for each number of RF is given for two phases: when learning is on (red point) and at the dynamic steady state when learning is off (blue circle). Increasing the number of RF decreases the RMS error. In practical situations there may be inherent limitations that prevent this
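The trend in Figure 9 (more RF, lower error) has a simple approximation counterpart, sketched here under stated assumptions: a single set of n uniformly spaced triangular RF on [0, 1], weights chosen by least squares (a stand-in for a fully converged learner, not the paper's on-line learning), and a hypothetical smooth target sin(2πx). Because the RF counts used (5, 9, 17, 33) give nested grids, each finer set can represent everything the coarser one can, so the best achievable RMS error cannot increase.

```python
import numpy as np

def tri_design(x, n):
    """Activations of n uniformly spaced triangular RFs on [0, 1] (rows: samples)."""
    centres = np.linspace(0.0, 1.0, n)
    quantum = centres[1] - centres[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - centres[None, :]) / quantum)

def best_rms(n, f=lambda x: np.sin(2 * np.pi * x)):
    """Least-squares (best achievable) RMS error of an n-RF triangular layer."""
    x = np.linspace(0.0, 1.0, 4001)
    phi = tri_design(x, n)
    w, *_ = np.linalg.lstsq(phi, f(x), rcond=None)
    return float(np.sqrt(np.mean((f(x) - phi @ w) ** 2)))

errors = {n: best_rms(n) for n in (5, 9, 17, 33)}
for n, e in errors.items():
    print(f"{n:2d} RF: RMS error {e:.5f}")
```

For non-nested counts such as 12 and 13 RF neither set contains the other, so this argument gives no monotonicity guarantee there. The sketch also says nothing about learning time or the controller update-period limits mentioned above.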

Results of on-line learning for three shapes of RF under different numbers and widths of RF are shown in Figure 10. Three RF shapes were tested: square (SBF), triangular (TBF) and Gaussian (RBF) basis functions. Results of learning for different learning gains at a selected RF shape are given in three columns of three subplots each. Rows correspond to a given number of RF of a given width. One pair of bars corresponds to one simulation run, with the coefficient marked under it determining the learning gain. Blue bars represent the RMS of position error at the end of the learning phase (as in Figure 6), while brown bars represent the dynamic steady-state RMS of position error during the testing phase. The general trend is similar for all nine subplots. When learning gains are larger (the left pairs of bars in the subplots), errors with learning on are lower than with learning off. These learning conditions have a more pronounced adaptive tracking behavior that helps lower the tracking error, but the model acquired by the cerebellar neural network is less accurate, which manifests as a larger control error when learning is turned off. Several pairs at the right of each subplot show the opposite behavior, with the RMS error lower when learning is off than when learning was on. Since cerebellar controller performance cannot change once learning stops, this only indicates a decreasing cumulative trend during the last RMS calculation period. Final performance is obtained after one RMS calculation period has passed, and the RMS error is constant thereafter (for a periodic reference signal), as shown during the last 10 seconds of Figure 6. For some pair around the middle of each subplot, at a certain learning gain, the two bars have about the same error. For gains lower than this (right pairs), learning may proceed proportionally more slowly, but the cerebellar controller acquires a better model. Learning gains larger than this limit may provide faster performance improvement with a less accurate model learned by the cerebellar controller, accompanied by the risk of making control unstable.
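The trade-off between learning gain, speed of improvement and accuracy of the acquired model can be reproduced with a normalized LMS update on a one-dimensional triangular-RF layer. This is a hypothetical sketch, not the paper's learning rule or gain parametrization (gains such as 0.1/(16*1) are not mapped onto beta here): the target is sin(2πx) corrupted by noise that stands in for a persistent error, and beta is the normalized step size, for which the normalized LMS update is stable only for 0 < beta < 2.

```python
import numpy as np

N_RF = 17
CENTRES = np.linspace(0.0, 1.0, N_RF)
QUANTUM = CENTRES[1] - CENTRES[0]

def tri_act(x):
    """Triangular RF activations on [0, 1], two quanta wide, summing to 1."""
    return np.maximum(0.0, 1.0 - np.abs(x - CENTRES) / QUANTUM)

def train(beta, steps=20000, noise=0.1, seed=0):
    """Normalised-LMS training on noisy samples of sin(2*pi*x).

    Returns the RMS error of the learned model against the noise-free target,
    or inf if the weights blow up.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(N_RF)
    for _ in range(steps):
        x = rng.uniform()
        phi = tri_act(x)
        y = np.sin(2 * np.pi * x) + noise * rng.standard_normal()
        w += beta * (y - phi @ w) * phi / (phi @ phi)   # normalised LMS step
        if np.abs(w).max() > 1e6:
            return float("inf")                          # diverged
    xs = np.linspace(0.0, 1.0, 1001)
    model = np.maximum(0.0, 1.0 - np.abs(xs[:, None] - CENTRES) / QUANTUM) @ w
    return float(np.sqrt(np.mean((np.sin(2 * np.pi * xs) - model) ** 2)))

for beta in (0.05, 0.8, 2.5):
    print(f"beta = {beta}: after 300 samples {train(beta, steps=300):.3f}, "
          f"after 20000 samples {train(beta):.3f}")
```

With the same sample sequence, the large gain is ahead after a few hundred samples but ends with a noisier, less accurate model than the small gain, and a step size above 2 diverges.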

Figure 10 CMAC learning with receptive fields (RF) of different shapes. Columns show results of learning for different learning gains at a selected RF shape. Rows are for a given number of RF of a given width. One pair of bars corresponds to one simulation run, with the coefficient marked under it determining the learning gain. Blue bars represent the RMS of position error at the end of the learning phase (as in Figure 6), while brown bars represent the dynamic steady-state RMS of position error during the testing phase. SBF: RF with square basis function (BF); TBF: RF with triangular BF; RBF: RF with Gaussian BF



In all cases, after gross learning (the learning phase of Figure 6), the RMS errors reach about the same level independent of the learning gain, a level determined by structural parameters. Note that the vertical scale of the first row (0.09) is almost three times that of the other two rows (0.035). Better performance with a higher number of RF is noticed in each column by looking from top to bottom. Increasing the order of the RF is expected to improve the modeling performance of the cerebellar controller. This can be seen by looking along the rows from left to right, where only the shape (order) of the RF changes (increases), leading to lower RMS errors and better control performance. While the behavior of SBF and TBF is as expected, the behavior of RBF is somewhat different. First, the error with the lower number of RF (9) is slightly lower than that with the higher number of RF (17), in the first and second subplots of the third column. This may be due to a more favorable match of RF centers and widths to the problem being learned. The opposite of this may be seen in Figure 9 with triangular RF, in the range from 12 to 15 RF. Second, increasing the number of RBF RF from the middle to the bottom subplots (17 RF to 25 RF) shows no improvement in RMS error. The cause of this behavior may lie in the normalization of two-dimensional RF. The first two basis functions have the self-normalization property, but this is not the case for the Gaussian RF used in the third case. The normalization problem can be overcome by adding a normalization stage, similar to fuzzy neural networks, or by using basis functions that have normalization as an inbuilt property, such as B-splines (whose first two members are SBF and TBF).
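The self-normalization property discussed above can be checked numerically. The sketch below is a simplification under stated assumptions: one layer of 17 uniformly spaced RF on [-1, 1], square RF one spacing wide tiling the range (no Albus overlays), triangular RF two spacings wide, and Gaussian RF with a hypothetical standard deviation of half a spacing. It prints the range of the summed activation for each shape, applies the normalization stage mentioned above by dividing each activation vector by its sum, and shows that for fully coupled 2D RF the summed activation is the product of the 1D sums, so the Gaussian fluctuations compound with dimension.

```python
import numpy as np

CENTRES = np.linspace(-1.0, 1.0, 17)
QUANTUM = CENTRES[1] - CENTRES[0]

def square_rf(x):
    """One layer of square RFs, one quantum wide, tiling the range (half-open cells)."""
    d = x[:, None] - CENTRES
    return ((d >= -QUANTUM / 2) & (d < QUANTUM / 2)).astype(float)

def triangular_rf(x):
    """Triangular RFs, two quanta wide."""
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - CENTRES) / QUANTUM)

def gaussian_rf(x, sigma=0.5 * QUANTUM):
    """Gaussian RFs; sigma is a hypothetical choice."""
    return np.exp(-0.5 * ((x[:, None] - CENTRES) / sigma) ** 2)

def normalise(act):
    """Normalisation stage: divide each activation vector by its sum."""
    return act / act.sum(axis=1, keepdims=True)

x = np.linspace(-1.0, 1.0, 1001)
sums = {
    "SBF": square_rf(x).sum(axis=1),
    "TBF": triangular_rf(x).sum(axis=1),
    "RBF": gaussian_rf(x).sum(axis=1),
    "RBF normalised": normalise(gaussian_rf(x)).sum(axis=1),
}
for name, s in sums.items():
    print(f"{name:15s} summed activation in [{s.min():.3f}, {s.max():.3f}]")

# Fully coupled 2-D RFs: summing all 17*17 products gives the product of the
# two 1-D sums, so any fluctuation of the Gaussian sum is compounded.
s2d = np.outer(sums["RBF"], sums["RBF"])
print(f"2-D RBF         summed activation in [{s2d.min():.3f}, {s2d.max():.3f}]")
```

The square and triangular sums are constant at 1, the raw Gaussian sum varies with the input (most strongly near the range ends), and the normalized Gaussian sum is constant again.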

4. CONCLUSION

A cerebellar controller in feedforward configuration was used as an augmentation to a conventional proportional-derivative controller for a path-tracking problem in robotics. The cerebellar controller had the structure of a cerebellar model articulation controller (CMAC) with fully coupled Albus overlays. The controller acquired the robot inverse dynamics during on-line learning. Different structural aspects and their influence on the accuracy of the acquired model were explored. A Simulink model of the control system, including the CMAC and the robot plant, was used for simulations, with a cerebellar Simulink model that preserves the layered structure. The cerebellar controller quickly learns the inverse dynamics within the scope of its modeling capability and takes over control from the conventional proportional-derivative controller. It was shown that using an input space of lower dimension than that of the problem may limit the ability of the cerebellar controller to acquire the inverse dynamics model, and may result in an inability to decrease the control error. Increasing the number of receptive fields for coding input signals decreases the error, but there may be cases that do not follow this trend regularly, with negative or positive effects. Usually this issue can be bypassed with input-layer adaptation, where the centers and widths of the receptive fields for coding input signals are determined adaptively. We evaluated only a uniform distribution of receptive fields, with the same width per dimension, treating them as an information source for distributed processing with an overall result of uniform variation. Other aspects explored were the shape and width of the receptive fields. Increasing the order of the receptive fields increased the accuracy of the acquired model for the same number of receptive fields and the same length of the training phase. For learning gains lower than some value, control performance in the RMS sense does not degrade when passing from the learning phase to the phase of control with the acquired model. Learning gains above that value may provide a faster decrease of control error, accompanied by a less accurate model acquired by the cerebellar controller, but may risk making the control system unstable. Basis functions without the normalization property showed adverse effects, failing to increase the accuracy of the acquired inverse dynamics model because of activity fluctuations of the resulting higher-dimensional receptive fields that could not be overcome by learning within the selected time span. The general trend of RMS position error in relation to learning gain was the same for all tested shapes, widths and numbers of receptive fields.
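The feedforward configuration summarized above, with the PD output used as the teaching signal, can be sketched for a hypothetical single joint. Everything here is an illustrative assumption rather than the paper's Simulink model: unit inertia, linear friction Bd = 0.35, a sinusoidal reference with a 10 s period, PD gains Kp = 25 and Kd = 10, a 3D CMAC replaced by a fully coupled lattice of 17 triangular RF per input (desired position, speed and acceleration) without Albus overlays, the update rule w += eta·dt·tau_pd·phi with eta = 10, and semi-implicit Euler integration. Learning is on for 80 s and off afterwards; maximal and RMS position errors are computed over the 70 s to 80 s and 90 s to 100 s windows used in the text.

```python
import numpy as np

N_RF = 17

def tri_act(x, lo, hi):
    """Triangular RF activations (two quanta wide) of a scalar on [lo, hi]."""
    centres = np.linspace(lo, hi, N_RF)
    quantum = centres[1] - centres[0]
    return np.maximum(0.0, 1.0 - np.abs(np.clip(x, lo, hi) - centres) / quantum)

def simulate(eta, t_end=100.0, t_learn=80.0, dt=0.01, Bd=0.35, Kp=25.0, Kd=10.0):
    """Hypothetical 1-DOF joint with PD feedback plus a learned feedforward term.

    The feedforward weights are updated with the PD output as teaching signal
    while t < t_learn and frozen afterwards. Returns time and position error.
    """
    om = 2 * np.pi / 10.0                     # reference: sin(om*t), 10 s period
    w = np.zeros(N_RF ** 3)
    q, qd = 0.0, om                           # start on the reference trajectory
    t_grid = np.arange(0.0, t_end, dt)
    err = np.empty_like(t_grid)
    for k, t in enumerate(t_grid):
        qr, qdr, qddr = np.sin(om * t), om * np.cos(om * t), -om ** 2 * np.sin(om * t)
        phi = np.einsum("i,j,k->ijk",
                        tri_act(qr, -1.0, 1.0),
                        tri_act(qdr, -om, om),
                        tri_act(qddr, -om ** 2, om ** 2)).ravel()
        err[k] = qr - q
        tau_pd = Kp * (qr - q) + Kd * (qdr - qd)
        tau_ff = phi @ w                      # cerebellar feedforward torque
        if t < t_learn:
            w += eta * dt * tau_pd * phi      # PD output as teaching signal
        qdd = tau_pd + tau_ff - Bd * qd       # unit inertia, linear friction
        qd += qdd * dt                        # semi-implicit Euler step
        q += qd * dt
    return t_grid, err

def window_stats(t, e, t0, t1):
    """Maximal and RMS absolute error over the window t0 <= t < t1."""
    m = (t >= t0) & (t < t1)
    return float(np.max(np.abs(e[m]))), float(np.sqrt(np.mean(e[m] ** 2)))

t, e_pd = simulate(eta=0.0)      # PD only (no learning)
_, e_ff = simulate(eta=10.0)     # PD + learned feedforward
print("PD only, 90-100 s:       max %.4f, RMS %.4f" % window_stats(t, e_pd, 90, 100))
print("learning on, 70-80 s:    max %.4f, RMS %.4f" % window_stats(t, e_ff, 70, 80))
print("learning off, 90-100 s:  max %.4f, RMS %.4f" % window_stats(t, e_ff, 90, 100))
```

The exact numbers depend on every assumed gain, so only the qualitative pattern should be read from this sketch: the learned feedforward term takes over most of the torque, and the position error stays well below the PD-only level after learning is frozen.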



