A holistic framework for planning, real-time control and ...A holistic framework for planning,...

A holistic framework for planning, real-time control and model learningfor high-speed ground vehicle navigation over rough 3D terrain.

Nima Keivan1, Steven Lovegrove1 and Gabe Sibley1

Abstract— This paper describes a local planning, control andlearning framework enabling high-speed autonomous ground-vehicle traversal of rough 3D terrain replete with bumps,berms, banked-turns and even jumps. We propose an approachbased on fast physical simulation and prediction, which wefind offers numerous benefits: first, it takes advantage of thefull expressiveness of the inherently non-linear, highly dynamicsystems involved; second, it allows for the fusion of localplanning and model-based feedback control all within a singleframework; third, it allows vehicle model learning. The finaland most important reason to use physical simulation as aunifying framework is that it works well in practice. The systemis experimentally validated on a high speed nonholonomicremotely controlled vehicle on undulating terrain using ascanned 3D ground model and motion capture ground-truthdata. Parameter reduction is achieved with the use of cubiccurvature control primitives and a fast precomputed lookuptable.

I. INTRODUCTIONRecent developments in path planning and navigation have

enabled operation in increasingly challenging environments.The use of motion primitives [9] and stochastic searchmethods such as RRT and RRT* [8] [6] have resultingin algorithms that successfully navigate complex obstaclefields even in higher order configuration space. A majoradvantage of these methods is that they can employ nonlin-ear dynamics models thereby enabling physically accurateplanning in complex environments without approximationor linearization. However, this advantage comes at a perfor-mance price as stochastic methods invariably sample infeasi-ble trajectories. Conversely, optimization based methods [4]employ effective initial guesses and numerical or analyticaloptimization techniques to rapidly converge on optimal paths.However due to the reliance on the accuracy of the initialguess, these methods are susceptible to failure or suboptimalperformance depending on the quality of this guess.

The quality, optimality and methodology of the plansnotwithstanding, their open loop performance in real robotsis inevitably impaired by the existence of imperfections orextraneous inputs that may not have been included in thedynamics model. Therefore for real-life applications, someform of closed loop control is desired. Moreover, both theplanner and control systems rely on an accurate model inorder to properly control and plan for the robot. Due tothe difficulty of obtaining accurate model parameters, itis desirable to learn model parameters by observing theresponse of the robot to control inputs. Recent developments

1N. Keivan, S. Lovegrove and G. Sibley are with the Departmentof Computer Science, George Washington University, Washington DC,nima|slovegrove|[email protected]

Fig. 1. a) Local plan (red curve) generated between two points on a3D scanned quarter pipe ramp and the simulated vehicle tracking the planin open-loop mode. b) Motion captured test vehicle performing the samemanoeuvre.

in Model Predictive Control(MPC) [3] and Learning-basedModel Predictive Control (LBMPC) [2] [10] have strived toboth implement model-based control schemes and to improvethe underlying model parameters by observing the responseof the system to inputs. The advantage that these schemeshold over more traditional control methods is twofold: theincorporation of increasingly complex models, and the abil-ity to generate control policies over a predicted series oftimesteps into the future. The latter offers clear advantageswhen controlling an infeasible trajectory or one that wasmade using an inaccurate model.

An alternative to model predictive control, traditionalfeedback systems use static and/or dynamic feedback of thestate to determine the controls for the next time steps. Recentdevelopments in this field have resulted in methods allowingthe calculation of Lyapunov functions for nonlinear systems[5] and defining graphs of Lyapunov-stable region aroundstates as in the case of LQR-Trees [11]. Generally speaking,these methods rely on the linearization of the state transferfunction in order to analytically obtain control policies.

Considering that the planning, MPC and model learningsystems all utilise a model of the system, a unified systemcould be conceived to utilise the same model to performall three tasks. The main contribution of this paper issuch a system encompassing planning, control and onlinemodel learning using a unified, simulation-based model andoperating in real time. We use a singular boundary valuesolver in all three cases in conjunction with cubic curvaturepolynomials for parameter reduction [7] . This allows accu-rate planning and control in full 3D environments and allowsthe learning of physical model parameters such as wheelradius, steering angle ratios and friction coefficients. Ground-truthed experimental evidence is also presented showcasingthe results of the system planning over waypoints on un-dulating terrain and subsequently tracking the trajectory ona high speed nonholonomic robotic platform with on-linemodel learning.

II. METHODOLOGY

The different components of the planning and control sys-tem rely on a unified boundary value solver which producesa control law in order to navigate the robot between thestart and goal 6DOF poses. For the purposes of this paper,we have implemented a parameter reduction and boundaryvalue solver to plan for and control a nonholonomic remotecontrol robot through high speed trajectories on undulatingterrain. This formulation relies on a good initial guess for thesteering and acceleration commands between two waypointsw1, w2 each parametrized as [x, y, z, p, q, r, v] where v is thedesired velocity with which the robot should reach the 6DOFcoordinates of the waypoint. The optimization is facilitatedwith an initial guess utilising cubic curvatures for steering,and a linear velocity profile between waypoints.

A. Dynamics Model

The centrepiece of the system is the dynamics model. Weuse the Bullet Physics Engine [1] to simulate the dynamicsof a vehicle with nonholonomic constraints on 3D terrain.A multithreaded framework allows the full use of modernmulticore processes resulting in quick simulations for finite-difference based optimization. Traditionally, the state transferfunction is defined as:

x̌ = f(x, u, p)

Where x is the current state, u is the control input, and pdefines the model parameters. In the case of the numericallyintegrated Bullet Physics model, the state transition is definedas:

xt+1 = F (xt, u, p)

Where F (x, u, p) encompasses the entirety of the dynam-ics of the vehicle and interaction with the terrain. In generalterms, this function can be replaced with any simulationsystem resulting in an update in the state, given the controlinputs, previous state and model parameters.

Fig. 2. Integration of cubic curvature polynomials in Cartesian spacebetween [0, 0, 0, 0] and [2, 0.6, π, 0] with varying values of π

B. Parameter Reduction

The boundary value solver used relies on the reduction incontrol space dimensionality with the use of a control law.In the proposed system, we have employed cubic curvaturepolynomials [7] as a means to parameter reduction.Thetrajectory curvature is parametrized as a function of thetravelled distance in the following form:

κ = a+ bs+ cs2 + ds3 (1)

Where a is the starting curvature, b, c, and d are thecubic polynomial coefficients and s is the distance travelledalong the trajectory. Individual polynomials are constrainedusing the endpoints coordinates [x, y, θ, κ]. To obtain thecubic parameters necessary to reach the desired endpoint, aprecomputed lookup table is employed followed by a Gauss-Newton optimization using the analytical Jacobian of thepolynomial. Figure 2 shows an example of cubic curvaturepolynomials integrated in 2D Cartesian space. However sincethe planner operates in 3D space, we project the curvaturepolynomial onto a 2D plane, with a normal which is linearlyinterpolated between the normals of the two waypoints asshown in Figure 3. This allows the 2D curvature to betterestimate the control law that will guide the vehicle andresolves singularities from waypoints perpendicular to theground plane.

Fig. 3. The projected 2D trajectory (dotted blue) between two 3D waypointson a curved manifold. This serves to better estimate the trajectory betweenthe waypoints as well as eliminate singularities when projecting waypointswhich are perpendicular to the ground plane.

C. Model Compensation

The linear velocity profile used between waypoints em-ploys a constant acceleration model. However, due to theunderlying physics-based vehicle model, this simple accel-eration control law does not constitute a good initial guess.To improve upon this guess, several compensation factorsare utilised to mitigate the extraneous influences introducedby the terrain and vehicle dynamics. Compensations areapplied iteratively after each physics model update. Thisallows folding in detailed terrain information such as slopeand also simulated vehicle parameters such as suspensionforce and extension. Furthermore, the underlying constantacceleration model remains valid once all other factors arecompensated for.

1) Gravity Compensation: The constant accelerationmodel used between waypoints is by definition unable toaccount for terrain slope and undulation effects. We haveimplemented a simplified compensation model which ac-counts for the axial forces imparted by wheel interaction withinclined terrain (See Fig. 4a). The position of the wheelsas well as the corresponding contact normal is obtainedafter each simulation step, and used to compensate theacceleration model for the following step.

2) Steering Compensation: Figure 4b shows the axialforce component imparted as a result of the front wheeldeflection during cornering. This force results in significantdeceleration during tight turns and is compensated for in asimilar fashion to gravity compensation at the end of eachsimulation step.

3) Friction Compensation: Friction compensation is un-dertaken iteratively similar to previously discussed factors.At each timestep, the friction forces on each wheel arecalculated by the physics model. This is then used to offsetthe constant acceleration model accordingly. We have optedto use a simple friction model based on static/dynamiccoefficients of friction, and the normal forces imparted onthe springs. This information is readily available from thephysics-model at each simulation step.

D. Boundary Value Solver

The boundary value optimization is performed by min-imizing the trajectory cost C which we have defined asthe 6 dimensional residual between the destination waypointand the simulation endpoint. The optimization is performedby first solving a Gauss-Newton iteration with line search,and if the error norm is not reduced, a coordinate descentstep is performed if possible. The Jacobian of the forwardsimulation is defined as:

J =

∂c1∂p1

· · · ∂c1∂pn

.... . .

...∂cn∂p1

· · · ∂cn∂pn

(2)

Where pn is a control law parameter (such as a curvaturepolynomial coefficient) and cn is a cost parameter. In thepresented implementation, the cost is calculated as projectedback onto the 2D plane of the cubic curvature polynomial

g

n1

n2

a) b)

Fig. 4. a) Axial forces (in red) resulting from wheels on inclined terrain.b) axial forces (in red) resulting from front wheel steering deflection.

(See Fig. 3) and is parametrized as [x, y, θ, v]. Each col-umn ∂c1

∂pj· · · ∂cn∂pj

of J is calculated by pushing forward thedynamics model using a set of control parameters p withperturbations ±ε along dimension j. This computation isaccelerated by the use of a multithreaded forward physicsmodel, solving for all dimensions of the Jacobian simulta-neously. The Gauss-Newton delta (δp) is then calculated byCholesky factorization as follows:

JTJ → RTR

RT y = JT b

Rδp = y

Where b is the vector of residuals calculated by runningthe current parameters p and obtaining the endpoint error(s).The validity of the assumption of quadratic convergencemade by this optimization is dependent on many factorsincluding interactions with the terrain and the dynamicsmodel. After obtaining the Gauss-Newton δp, we performa multithreaded line-search step by pushing forward thephysics model simultaneously with several scaled values ofδp.

pn+1 = pn + λ(δp)

Where λ ≤ 1 the a scaling factor. If none of the scaledvalues of δp improve upon the error norm, we performa coordinate descent if possible, by using the best normobtained when calculating the Jacobian (Eq. 2) by finite-differences. The optimization ceases if the either the errornorm is improved past a certain threshold, or if we are ina local minimum as indicated by the inability to perform acoordinate step to reduce the error norm.

E. Real-Time Control

In this section we present an MPC-like real-time controlscheme based on fast replanning to account for inaccuraciesand extraneous influences. Similar to MPC based controlsystems, our approach is formulated by constantly optimizingthe trajectory ahead of the vehicle by solving new controlplans which provide a viable control law from the vehicle’scurrent position, to a point on the trajectory further ahead.As part of the holistic approach, we have used the sameboundary value solver previously described to plan betweenwaypoints, in creating the control plans. Due to the unified

lookahead

Fig. 5. Replanning-based control. The active control plan’s (red) initialcurvature matches that of the vehicle’s current path curvature. Each controlplan is optimized to plan to a point ahead on the trajectory based on apre-define lookahead time.

underlying steering control law, the control plans tend toconverge back onto the original trajectory, thereby avoidingthe pitfalls of follow-the-leader trajectory trackers which areprone to diverge if the target vehicle is set too far ahead, oroscillate if the target vehicle is too close. However, in orderto achieve this behaviour, the starting curvature (denoted bythe constant a in Eq. 1) of the control cubic curvatures has tomatch that of the vehicle’s instantaneous path curvature asshown in Figure 5. This is also a requirement for smoothsteering between control plans, as each will start with acurvature equivalent to that of the vehicle’s current path.Consistent starting curvature is guaranteed by setting theconstant a before the 2D optimization which solves Eq.2. However, if the initial curvature required is too high,the 2D cubic required to solve the control plan might beinfeasible. In these cases, the control system falls back toan initial curvature of zero to solve for a feasible controlplan. Ideally, an alternative control law formulation wouldenable control plans that converged rapidly with any choiceof initial curvature.

Furthermore, each control plan takes a small amount oftime to optimize via the boundary value solver. During thistime, the vehicle will be following the previous control plan.Our implementation includes a timestamp which is used tosmoothly interpolate between plans as they become available.The control plans also constitute an any-time algorithm, asthe optimization simply improves upon the quality of theinitial guess with each iteration. The more time given tothe algorithm, the higher the accuracy of the endpoint willbe. It must be noted that each control plan is a valid setof controls that should ideally converge the vehicle backonto the trajectory in open-loop mode, given an accuratevehicle and terrain model. Therefore real-time control isestablished if the time taken to optimize tractable controlplans is less than the lookahead time, allowing constantreplanning without ever exceeding the bounds of a singlecontrol plan.

F. Online Model Learning

The quality of the control and planning provided by theaforementioned system relies significantly on the quality ofthe underlying model. Since a full physics-based model isused in the optimization, the number of parameters which

segment time

trajectory time

Fig. 6. The observed trajectory (black) shown with segments and theirrespective simulation results (red). The disparity between the simulationendpoint and the segment endpoint is the residual which is minimized inthe optimization. If the residual is zero, the model perfectly matches thereal vehicle’s performance.

could be adjusted rules out the possibility of manual tuning.We propose an optimization based learning system whichis formulated almost identically to the control and planningsystems, and which serves to tune select parameters in themodel to match those of the real vehicle. The proposedmethodology involves observing the vehicle and also thecontrol commands given to it for a period of time, andoptimizing the underlying physics model to replicate theobserved behaviour, given the same control commands. Weformulate the optimization by splitting the observed trajec-tory into segments as shown in Figure 6. Each segment isthen simulated and a Jacobian formed as per Eq. 2 but wherepn is a model parameter which is changed by ±ε. Due tothe nature of the optimization, we can obtain JTJ and JT bfor the trajectory directly as follows:

JTJ =

n∑i=0

JTi Ji

Jb =

n∑i=0

JTi bi

Where Ji is the Jacobian if the ith segment, n is the totalnumber of segments, and bi is the error vector of the ithsegment which we have defined as the disparity between theobserved trajectory and the final position of the simulatedvehicle (See Fig. 6). We then apply the resulting δp to themodel and repeat the process. As per the planning and controloptimization formulations, the learning system implements aGauss-Newton and line search stage which is followed bya coordinate descent stage if needed. The optimization endswhen either a sufficiently small norm is obtained, or if a localminimum is detected. The learning system is implemented asan on-line algorithm allowing continued refinement of modelparameters.

III. RESULTS

The planner, control and learning systems were experi-mentally validated in a motion captured environment, usinga terrain model which was 3D scanned using a MicrosoftKinect sensor combined with the motion capture system anda fusion algorithm. Due to the simulation-based model, anymethod of obtaining 3D terrain data could be used to simu-late the dynamics. This includes real-time acquisition using

cameras, laser scanners and/or dense tracking and mapping(DTAM) methods. In our experiments the waypoints weremanual placed over the terrain in order to put the vehiclethrough desired manoeuvres including straight tracks, curves,steep inclines and jumps.

−5

0

5

10

Acc

eler

atio

n (m

/s2 )

27.4 27.6 27.8 28 28.2 28.4 28.60

0.2

0.4

0.6

0.8

Stee

ring

(rad

)

Time (s)

Fig. 7. Acceleration (red) and steering (blue) commands generated by theplanner after optimization for the ramp manoeuvre depicted in Fig. 1. Notethe gravity compensation during the uphill and downhill sections resultingin acceleration and braking forces.

−4 −2 0 2 4

−10

12

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

x (m)y (m)

z (m

)

Fig. 8. Motion capture results (blue) of the control system running on aplanned trajectory (red) with a small jump (center) and quarter-pipe rampmanoeuvre (left). The model was tuned using a combination of the learningsystem and manual adjustments.

A. Boundary Value Solver

Due to the quality of the initial guess, the boundaryvalue solver successfully resolves feasible trajectories be-tween waypoints. Furthermore the underlying model followsthese trajectories precisely in open-loop control. Howeverthe choice of waypoints heavily influences the success orfailure of the planner, as is expected. Figure 1 shows a plangenerated between two waypoints on a scanned 3D modelof a quarter-pipe ramp and Figure 7 shows the resultingacceleration and steering commands. Gravity compensationcan be seen in Figure 7 and is vital to the feasibility of thisplan as the vehicle needs to accelerate uphill and deceleratedownhill in order to maintain velocity. The local planner canfail if the waypoints are poorly positioned or their velocitieschosen improperly, for example if two waypoints are placedeither side of a wall.

B. Real-Time Control

The real-time controller was tested on a trajectory includ-ing a small jump and sharp turn over a quarter-pipe ramp.Figure 8 shows the resulting vehicle path (blue) comparedto the planned trajectory (red). It must be noted that themodel used in this experiment was tuned using a combinationof the learning system and manual adjustments. Manualadjustments were necessary as model-learning was not per-formed in high acceleration scenarios such as jumping, andsmall adjustments to the learned parameters was required tooptimize the performance. The results obtained show that thevehicle is capable of accurately tracking the trajectory evenin challenging manoeuvres, however as seen in Figure 8 ,divergence can still be observed in the case of jumps wherethe steering is ineffective. This has the tendency to disrupt thereal-time controller, as the boundary value Jacobian becomesinvalid if changes in control input do not adequately perturbthe endpoint of the control plan. This is the case when thevehicle is airborne.

−2 −1 0 1 2 3 4

−1

0

1

2

3

x (m)

y (m

)

ReferenceTime < 50sTime < 90sTime < 120s

20 40 60 80 100 120 1400

0.2

0.4

0.6

0.8

Time (s)

Valu

e (s

ee le

gend

)

Wheel Base(m)Friction Coefficient (unit−less)

Fig. 9. Motion capture results of the control system (top graph) runningon a planned figure-eight trajectory (red) while performing online modellearning. The trajectories at various times show the improvement in trackingas the model parameters converge (bottom graph) to stable values.

C. Online Model Learning

The learning system was tested on a figure-eight trajectoryover two model parameters: friction coefficient and wheelbase. These model parameters, while not being all encom-passing in their influence over the model have significantimpact on the steering and acceleration response of the vehi-cle. They were chosen to test the learning system’s responseto deviations from the trajectory. While these parametershave underlying physical units, it is not expected that thevalues obtained from the model learning will reflect thesevalues. This is due to the existence of unknown factorssuch as servo command coefficients, that will ultimatelychange the physical bases of the parameters. Nevertheless,the learned wheel-base parameter was observed to convergereliably to a value close to the actual vehicle wheelbase of0.28m. However, it is expected that learned values will bringthe underlying model closer to the real-world vehicle, and

therefore improve planning and control. Figure 9 shows theresults of model-learning on a flat figure eight trajectoryat different time intervals. It is evident that as the modelparameters for wheel base and friction change, the adherenceof the vehicle to the intended trajectory is significantlyimproved.

0

0.2

0.4

0.6

0.8

1

Erro

r (se

e le

gend

)

Pose Error (compound)Velocity Error (m/s)

30 40 50 60 70 80 90 100 110 1200.1

0.2

0.3

0.4

0.5

Time (s)

Valu

e (s

ee le

gend

)

Wheel Base(m)Friction Coefficient

Fig. 10. Aggregated 6 DOF pose (blue) and velocity (green) error asa function of time performed on the figure-eight trajectory of Figure 9.The effect of the convergence of parameters can be seen on the repeatedtrajectory error pattern.

20 40 60 80 100 120 140 1600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Time (s)

Valu

e (s

ee le

gend

)

Wheel Base(m)Friction Coefficient (unit−less)

Fig. 11. Model parameter evolution during on-line learning for multipleexperiments on the figure-eight trajectory of Figure 9.

Figure 10 shows the aggregated position and velocity errorfor a single run on the figure-eight trajectory, while perform-ing model learning. It can be seen that the repeated errorpattern around the trajectory monotonically decreases as theparameters converge. This also corresponds to a significantvisible reduction in overshoot and improved tracking. Figure11 shows the parameter convergence over multiple runs onthe figure-eight trajectory and with different starting valuesfor the parameters. It can be seen that the parameters, whilenot perfectly converging to the same point every time, tendto arrive at similar values.

IV. CONCLUSIONS

We have presented a holistic solution to local planning,real-time control and model learning which uses a unifiedsimulation-based underlying physics model, folding in com-plex vehicle and terrain dynamics. The presented solutionuses cubic curvature control laws to reduce the dimensionof the control space while employing iterative compensationto deal with extraneous effects such as friction, terrain slopeand steering deceleration. The solution was experimentallyvalidated on a motion-captured vehicle and shown to executemanoeuvres on banked terrain and small jumps in real-time.However the real-time control system is still susceptible todisruption over jumps, and the model learning system hasnot been fully validated with a large number of parame-ters. Further experiments should also evaluate the abilityto produce physically accurate model parameters. However,both systems have been shown to work well in practice andhave produced good results in a challenging setting. Futurework will be aimed towards testing the system on morechallenging terrain, validating the model-learning with moreparameters and implementing global planning solutions toplace waypoints using the same holistic approach.

REFERENCES

[1] Bullet physics engine. http://bulletphysics.org. [Online;accessed July-2012].

[2] Anil Aswani, Patrick Bouffard, and Claire Tomlin. Extensions oflearning-based model predictive control for real-time application toa quadrotor helicopter. In ACC 2012, to appear, Montreal, Canada,June 2012.

[3] Thomas Howard, Colin Green, and Alonzo Kelly. Receding horizonmodel-predictive control for mobile robot navigation of intricate paths.In Proceedings of the 7th International Conferences on Field andService Robotics, July 2009.

[4] Thomas M. Howard and Alonzo Kelly. Optimal rough terraintrajectory generation for wheeled mobile robots. Int. J. Rob. Res.,26(2):141–166, February 2007.

[5] Tor A. Johansen and N-Trondheim Norway. Computation of lyapunovfunctions for smooth nonlinear systems using convex optimization.Automatica, 36:1617–1626, 1999.

[6] S. Karaman and E. Frazzoli. Incremental sampling-based algorithmsfor optimal motion planning. In Robotics: Science and Systems (RSS),Zaragoza, Spain, June 2010.

[7] Alonzo Kelly. Trajectory generation for car-like robots using cubiccurvature polynomials. Pittsburgh, PA, June 2001.

[8] Steven M. Lavalle. Rapidly-exploring random trees: A new tool forpath planning. Technical report, 1998.

[9] Mihail Pivtoraiko and Alonzo Kelly. Kinodynamic motion planningwith state lattice motion primitives. In Proceedings of the IEEE/RSJInternational Conference on Intelligent Robots and Systems, 2011.

[10] Neal Seegmiller, Forrest Rogers-Marcovitz, and Alonzo Kelly. Onlinecalibration of vehicle powertrain and pose estimation parameters usingintegrated dynamics. In IEEE International Conference on Roboticsand Automation, May 2012.

[11] R. Tedrake. LQR-trees: Feedback motion planning on sparse random-ized trees. In Proceedings of Robotics: Science and Systems, Seattle,USA, June 2009.

A holistic framework for planning, real-time control and ...A holistic framework for planning,...

Documents

Transcript of A holistic framework for planning, real-time control and ...A holistic framework for planning,...