Schaeffer Cain Ode Book


Copyright © 2012 by David G. Schaeffer and John W. Cain

All rights reserved



ODE: A BRIDGE BETWEEN UNDERGRAD AND GRADUATE MATH

by

David G. Schaeffer and John W. Cain



Contents

1 Introduction 1

1.1 Some simple ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 Descriptive concepts . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Solutions of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Examples and discussion . . . . . . . . . . . . . . . . . . . . . 3

1.2.2 Geometric interpretation of solutions . . . . . . . . . . . . . . 5

1.3 Three solution techniques from the elementary theory . . . . . . . . . 6

1.3.1 Linear equations with constant coefficients . . . . . . . . . . . 6

1.3.2 First-order linear equations . . . . . . . . . . . . . . . . . . . 7

1.3.3 Separable equations . . . . . . . . . . . . . . . . . . . . . . . . 7

1.4 Examples of physically based ODEs . . . . . . . . . . . . . . . . . . . 9

1.4.1 Mechanical systems . . . . . . . . . . . . . . . . . . . . . . . . 9

1.4.2 Physical equations with nonmechanical origins . . . . . . . . . 16

1.5 Systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.6 Topics covered in this book . . . . . . . . . . . . . . . . . . . . . . . 20

1.6.1 General remarks . . . . . . . . . . . . . . . . . . . . . . . . . 20

1.6.2 Qualitative behavior of some predator-prey models . . . . . . 21

1.7 Software for numerical solution of the IVP . . . . . . . . . . . . . . . 24

1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

1.8.1 Exercises to consolidate your understanding . . . . . . . . . . 27




1.8.2 Exercises referenced elsewhere in this book . . . . . . . . . . . 29

1.8.3 Computational Exercises . . . . . . . . . . . . . . . . . . . . . 31

1.8.4 Exercises of independent interest . . . . . . . . . . . . . . . . 33

1.9 Additional notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

1.9.1 Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

1.9.2 The concept “generic” . . . . . . . . . . . . . . . . . . . . . . 35

2 Linear systems with constant coefficients 36

2.1 Preview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

2.2 Definition and properties of the matrix exponential . . . . . . . . . . 38

2.2.1 Preliminaries about norms . . . . . . . . . . . . . . . . . . . . 38

2.2.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

2.2.3 The main theorem . . . . . . . . . . . . . . . . . . . . . . . . 43

2.3 Calculation of the matrix exponential . . . . . . . . . . . . . . . . . . 45

2.3.1 The role of similarity . . . . . . . . . . . . . . . . . . . . . . . 45

2.3.2 Two problematic cases . . . . . . . . . . . . . . . . . . . . . . 48

2.3.3 Use of the Jordan form . . . . . . . . . . . . . . . . . . . . . . 50

2.4 Large-time behavior of solutions of homogeneous linear systems . . . 52

2.4.1 The main results . . . . . . . . . . . . . . . . . . . . . . . . . 52

2.4.2 Tests for negative eigenvalues . . . . . . . . . . . . . . . . . . 54

2.5 Solution of inhomogeneous problems . . . . . . . . . . . . . . . . . . 55

2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

2.6.1 Routine exercises . . . . . . . . . . . . . . . . . . . . . . . . . 55






3 Nonlinear systems: local theory 61

3.1 Two counterexamples . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

3.2 The existence theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 63

3.2.1 Statement of the theorem . . . . . . . . . . . . . . . . . . . . 63

3.2.2 Differentiability implies Lipschitz continuity . . . . . . . . . . 64

3.2.3 Reformulation of the IVP as an integral equation . . . . . . . 66

3.2.4 The contraction-mapping principle . . . . . . . . . . . . . . . 66

3.2.5 Proof of the existence theorem . . . . . . . . . . . . . . . . . . 68

3.2.6 An illustrative example . . . . . . . . . . . . . . . . . . . . . . 70

3.2.7 Concluding remark . . . . . . . . . . . . . . . . . . . . . . . . 71

3.3 The uniqueness theorem . . . . . . . . . . . . . . . . . . . . . . . . . 71

3.3.1 Gronwall’s Lemma . . . . . . . . . . . . . . . . . . . . . . . . 71

3.3.2 More on Lipschitz functions . . . . . . . . . . . . . . . . . . . 73

3.3.3 The uniqueness theorem . . . . . . . . . . . . . . . . . . . . . 74

3.4 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3.4.1 Nonautonomous systems . . . . . . . . . . . . . . . . . . . . . 76

3.4.2 Linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . 77

3.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77


3.5.2 Exercises used elsewhere in this book . . . . . . . . . . . . . . 80

3.5.3 Computational exercise . . . . . . . . . . . . . . . . . . . . . . 81






4 Nonlinear systems: global theory 84

4.1 The maximal interval of existence . . . . . . . . . . . . . . . . . . . . 84

4.2 Two sufficient conditions for global existence . . . . . . . . . . . . . . 86

4.2.1 Linear growth of the RHS . . . . . . . . . . . . . . . . . . . . 86

4.2.2 Trapping regions . . . . . . . . . . . . . . . . . . . . . . . . . 87

4.3 Nullclines and trapping regions . . . . . . . . . . . . . . . . . . . . . 91

4.3.1 An activator-inhibitor system . . . . . . . . . . . . . . . . . . 92

4.3.2 Lotka-Volterra with a logistic modification . . . . . . . . . . . 93

4.3.3 van der Pol’s equation . . . . . . . . . . . . . . . . . . . . . . 95

4.3.4 The torqued pendulum and ODEs on manifolds . . . . . . . . 97

4.3.5 Michaelis-Menten kinetics . . . . . . . . . . . . . . . . . . . . 100

4.4 Continuous dependence of the solution . . . . . . . . . . . . . . . . . 102

4.4.1 The main result . . . . . . . . . . . . . . . . . . . . . . . . . . 102

4.4.2 Some associated formalism . . . . . . . . . . . . . . . . . . . . 103

4.4.3 Continuity with respect to the equation . . . . . . . . . . . . . 104

4.5 Differentiable dependence on initial data . . . . . . . . . . . . . . . . 104

4.5.1 Formulation of the main result . . . . . . . . . . . . . . . . . . 104

4.5.2 The order notation . . . . . . . . . . . . . . . . . . . . . . . . 106

4.5.3 Proof of Theorem 4.5.1 . . . . . . . . . . . . . . . . . . . . . . 107

4.5.4 Further discussion . . . . . . . . . . . . . . . . . . . . . . . . . 109

4.5.5 Generalizations . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111





4.6.2 Exercises referenced elsewhere in this book . . . . . . . . . . . 119

4.6.3 Computational exercises . . . . . . . . . . . . . . . . . . . . . 120



4.8 Appendix: Euler’s method . . . . . . . . . . . . . . . . . . . . . . . . 124

4.8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

4.8.2 Theoretical basis for the approximation . . . . . . . . . . . . . 125

4.8.3 Convergence of the numerical solution . . . . . . . . . . . . . 127

5 Trajectories near equilibria 130

5.1 Stability of equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

5.1.1 The main theorem . . . . . . . . . . . . . . . . . . . . . . . . 131

5.1.2 An illustrative example: . . . . . . . . . . . . . . . . . . . . . 133

5.2 An orgy of terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 133

5.2.1 Description of behavior near equilibria . . . . . . . . . . . . . 133

5.2.2 Classification of eigenvalues of 2 × 2 Jacobians . . . . . . . . . 136

5.2.3 Two-dimensional equilibria and slopes of nullclines . . . . . . 137

5.2.4 The Hartman-Grobman Theorem . . . . . . . . . . . . . . . . 138

5.3 Activator-inhibitor systems and the Turing instability . . . . . . . . . 138

5.3.1 Equilibria of the activator-inhibitor system . . . . . . . . . . . 139

5.3.2 The Turing instability: Destabilization by diffusion . . . . . . 141

5.4 Liapunov functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

5.4.1 The main result . . . . . . . . . . . . . . . . . . . . . . . . . . 143




5.4.2 LaSalle’s invariance principle . . . . . . . . . . . . . . . . . . . 145

5.4.3 Construction of Liapunov functions . . . . . . . . . . . . . . . 146

5.5 Stable and unstable manifolds . . . . . . . . . . . . . . . . . . . . . . 147

5.5.1 A preparatory example . . . . . . . . . . . . . . . . . . . . . . 147

5.5.2 Statement of the main result . . . . . . . . . . . . . . . . . . . 151

5.5.3 Proof of Theorem 5.5.1 . . . . . . . . . . . . . . . . . . . . . . 152

5.5.4 Global behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 157

5.5.5 Section 1.6 revisited . . . . . . . . . . . . . . . . . . . . . . . 159

5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160


5.6.2 Exercises referenced elsewhere in the book . . . . . . . . . . . 166

5.6.3 Computational exercises . . . . . . . . . . . . . . . . . . . . . 167


5.7 Ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

6 Oscillations in ODEs 171

6.1 Periodic Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

6.1.1 Basic issues and examples . . . . . . . . . . . . . . . . . . . . 171

6.1.2 Contents of this chapter . . . . . . . . . . . . . . . . . . . . . 177

6.2 Special behavior in two dimensions . . . . . . . . . . . . . . . . . . . 179

6.2.1 The Poincaré-Bendixson Theorem: minimal version . . . . . . 179

6.2.2 Application to the van der Pol equation . . . . . . . . . . . . 179

6.2.3 Limit sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

6.2.4 The Poincaré-Bendixson Theorem: strong version . . . . . . . 183




6.2.5 Dulac’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . 183

6.3 Limit cycles in the van der Pol equation for small β . . . . . . . . . . 184

6.3.1 Two illustrative examples of perturbation theory . . . . . . . . 185

6.3.2 Application to the van der Pol equation . . . . . . . . . . . . 189

6.4 Limit cycles in the van der Pol equation for large β . . . . . . . . . . 191

6.4.1 Setting up the problem . . . . . . . . . . . . . . . . . . . . . . 191

6.4.2 The limit-cycle solution . . . . . . . . . . . . . . . . . . . . . 191

6.4.3 Relaxation oscillations in the van der Pol equation . . . . . . . 194

6.5 Stability of periodic orbits: the Poincaré map . . . . . . . . . . . . . 195

6.5.1 The basic construction . . . . . . . . . . . . . . . . . . . . . . 195

6.5.2 Discrete dynamical systems . . . . . . . . . . . . . . . . . . . 198

6.5.3 Application of the Poincaré-map criterion . . . . . . . . . . . 199

6.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202

6.7 Appendix: Index theory in two dimensions . . . . . . . . . . . . . . . 206

7 Bifurcation from equilibria 213

7.1 Example 1: Pitchfork bifurcation . . . . . . . . . . . . . . . . . . . . 213

7.2 An outline of this chapter . . . . . . . . . . . . . . . . . . . . . . . . 219

7.3 Example 2: Transcritical bifurcation . . . . . . . . . . . . . . . . . . 220

7.4 Example 3: Saddle-node bifurcation . . . . . . . . . . . . . . . . . . . 221

7.5 Theory for steady-state bifurcation: the Liapunov-Schmidt reduction 224

7.5.1 Bare bones of the reduction . . . . . . . . . . . . . . . . . . . 224

7.5.2 Stability issues . . . . . . . . . . . . . . . . . . . . . . . . . . 225

7.5.3 Exploration of one-dimensional bifurcation problems . . . . . 226




7.5.4 Symmetry and the pitchfork bifurcation . . . . . . . . . . . . 230

7.5.5 The two-cell Turing instability . . . . . . . . . . . . . . . . . . 230

7.5.6 Imperfect bifurcation . . . . . . . . . . . . . . . . . . . . . . . 231

7.5.7 A bifurcation theorem . . . . . . . . . . . . . . . . . . . . . . 232

7.6 Example 4: The Hopf bifurcation . . . . . . . . . . . . . . . . . . . . 233

7.7 Hopf bifurcation: theory . . . . . . . . . . . . . . . . . . . . . . . . . 238

7.8 Bifurcation in the FitzHugh-Nagumo equations . . . . . . . . . . . . 245

7.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247

7.10 Ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249

8 Global bifurcations 250

8.1 Mutual annihilation of two limit cycles . . . . . . . . . . . . . . . . . 250

8.1.1 An academic example . . . . . . . . . . . . . . . . . . . . . . . 250

8.1.2 The FitzHugh-Nagumo equations . . . . . . . . . . . . . . . . 250

8.1.3 Phase locking in coupled oscillators . . . . . . . . . . . . . . . 251

8.2 Saddle-node bifurcation points on a limit cycle . . . . . . . . . . . . . 251


8.2.2 The overdamped torqued pendulum . . . . . . . . . . . . . . . 251

8.2.3 Other examples . . . . . . . . . . . . . . . . . . . . . . . . . . 252

8.3 Homoclinic bifurcation . . . . . . . . . . . . . . . . . . . . . . . . . . 252

8.3.1 van der Pol with nonlinearity in the restoring force . . . . . . 252

8.3.2 The torqued pendulum with small damping . . . . . . . . . . 252

8.3.3 The Lotka-Volterra model with logistic growth and the Allee effect 253

8.3.4 Other examples . . . . . . . . . . . . . . . . . . . . . . . . . . 253




8.4 Hopf-like bifurcation to an invariant torus . . . . . . . . . . . . . . . 253


8.4.2 The forced van der Pol equation . . . . . . . . . . . . . . . . . 254

8.5 Period doubling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254


8.5.2 Rössler’s equation . . . . . . . . . . . . . . . . . . . . . . . . . 255

8.6 Appendix: ODEs on a torus . . . . . . . . . . . . . . . . . . . . . . . 255

8.7 Appendix: What is chaos? . . . . . . . . . . . . . . . . . . . . . . . . 255

8.8 Ideas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

A Guide to Commonly Used Notation 256

B Notions from Advanced Calculus 257

B.0.1 Regions with smooth boundaries . . . . . . . . . . . . . . . . 265

C Notions from Linear Algebra 267

C.1 Appendix: A compendium of results from linear algebra . . . . . . . 267

C.1.1 How to compute Jordan normal forms . . . . . . . . . . . . . 267

C.1.2 The Routh-Hurwitz criterion . . . . . . . . . . . . . . . . . . . 271

C.1.3 Continuity of eigenvalues of a matrix with respect to its entries 273

C.1.4 Fast-slow systems . . . . . . . . . . . . . . . . . . . . . . . . . 275

C.1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 276

D Nondimensionalization and Scaling 277

D.1 Classes of equations in applications . . . . . . . . . . . . . . . . . . . 277

D.1.1 Mechanical models . . . . . . . . . . . . . . . . . . . . . . . . 277

D.1.2 Electrical models . . . . . . . . . . . . . . . . . . . . . . . . . 278




D.1.3 “Bathtub” models . . . . . . . . . . . . . . . . . . . . . . . . 278

D.2 Scaling and nondimensionalization . . . . . . . . . . . . . . . . . . . . 279

D.2.1 Duffing’s equation . . . . . . . . . . . . . . . . . . . . . . . . . 280

D.2.2 Lotka-Volterra with logistic limits . . . . . . . . . . . . . . . . 282

D.2.3 Michaelis-Menten kinetics . . . . . . . . . . . . . . . . . . . . . 284

D.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287

Bibliography 288




Chapter 1

Introduction

1.1 Some simple ODEs

1.1.1 Examples

An ordinary differential equation (ODE) is an equation involving an unknown function of one variable and some of its derivatives. We hasten to assure the reader that this bland phrase has meaning for us primarily through examples, so let’s proceed to these immediately.

Most simply we have the equation for exponential growth or decay,

\[ x' = \alpha x, \tag{1.1} \]

where α is a constant (let’s say real), x(t) is the unknown function, and x′ denotes the derivative of x with respect to t. The logistic equation modifies this equation, in case α > 0, by inclusion of a negative term that limits growth as x becomes large:

\[ x' = \alpha x - \beta x^2, \tag{1.2} \]

where β is a positive constant.

The equation

\[ x'' + x = 0 \tag{1.3} \]

describes what is often called simple harmonic motion. In physical terms, which will be introduced in Section 1.4, equation (1.3) describes the motion of a mass pulled back towards equilibrium by a frictionless spring. Here of course x′′ denotes the second derivative. A useful point of comparison for (1.3) is

\[ x'' + \sin x = 0, \tag{1.4} \]

which describes the motion of a pendulum under gravity under some simplifying assumptions about units (also discussed in Section 1.4). Two other modifications of (1.3) are

\[ \text{(a)}\ \ x'' + tx = 0 \quad\text{and}\quad \text{(b)}\ \ x'' + (\alpha + \beta\cos t)\,x = 0, \tag{1.5} \]

known as Airy’s equation and Mathieu’s equation, respectively.

We conclude our first round of examples with a Riccati equation

\[ x' = x^2 - t \tag{1.6} \]

and a purely pedagogical example

\[ 1 + (x')^2 = x^2. \tag{1.7} \]

1.1.2 Descriptive concepts

The most basic concept used in describing ODEs is order, which refers to the order of the highest derivative that appears in the equation. Thus equations (1.1), (1.2), (1.6), and (1.7) are first order, while (1.3), (1.4), and (1.5) are second order. Here is an example of a third-order equation:

\[ \frac{d^3 y}{dx^3} = \frac{\alpha y + \beta}{y^3}, \tag{1.8} \]

where y(x) is the unknown function of the variable x. This example also illustrates the following three points: (i) Usually the independent variable in the ODEs we study is time, but other choices also occur; in this equation x represents a spatial coordinate. (The dependent variable y(x) represents the thickness of a thin film as a function of position.) (ii) We have written the derivative using the d/dx-notation rather than with primes; no mathematical significance should be attached to this choice, which is only a matter of taste as to which notation seems more appropriate to us in a given situation. (iii) An ODE need not be defined for all values of either the dependent or independent variable. For example, (1.8) is not defined for y = 0. Incidentally, much information can be gained by focusing on such exceptional points of an equation, called singularities in the usual terminology.

Normally we will solve for the highest derivative of the dependent variable as a function of lower-order derivatives and of t: i.e., for an equation of order n,

\[ x^{(n)} = f(x, x', \ldots, x^{(n-1)}, t). \tag{1.9} \]

The value of this convention is illustrated by equation (1.7), which may be rewritten

\[ x' = \pm\sqrt{x^2 - 1}. \tag{1.10} \]

Two problems in (1.7) become evident from rewriting the equation in this way: (i) Really two different ODEs are hidden in (1.7); to get a specific ODE we need to choose between the plus and minus signs in (1.10). (ii) For some values of x, specifically |x| < 1, equation (1.7) has no real-valued solutions. The restriction on values of x is not a problem in itself (as noted above, an ODE need not be defined for all values of its variables), but there is yet another problem in (1.7) that is less obvious but more serious. It relates to the fact that $\sqrt{x^2 - 1}$ is nondifferentiable at |x| = 1, the boundary of its natural domain. Specifically, Exercise 3(c) shows that equation (1.7) suffers from a nonuniqueness pathology.

Next we define the very important notion of linearity. We shall call an ODE linear¹ if it may be written in the form

\[ x^{(n)} = a_1(t)\,x^{(n-1)} + a_2(t)\,x^{(n-2)} + \cdots + a_{n-1}(t)\,x' + a_n(t)\,x + f(t); \tag{1.11} \]

i.e., the unknown function x and its derivatives appear only raised to the first power. Thus equations (1.1), (1.3), and (1.5) are linear. Equations (1.2), (1.6), and (1.7) are obviously nonlinear; (1.4) is also nonlinear because

\[ \sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots \]

has many higher powers of x hidden in it. Likewise

\[ x'' = x' x \]

is also nonlinear because of the product on the RHS of the equation.

The linear equation (1.11) is called homogeneous if f(t) ≡ 0, and it is said to have constant coefficients if all the functions $a_j(t)$, j = 1, . . . , n, are actually independent of t. The latter concept for nonlinear equations has a different name: (1.9) is called autonomous if the function f does not depend on t.

1.2 Solutions of ODEs

1.2.1 Examples and discussion

Here is a definition even more vapid than the definition of an ODE: A function x(t) is called a solution of (1.9) if the two sides of the equation become equal when this function is substituted into the equation. Let’s proceed to examples.

For any constant C, $x(t) = Ce^{\alpha t}$ is a solution of (1.1). (In Exercise 1 we show the reader how to prove that this is the most general solution of (1.1).) The general solution of (1.2) will be determined in Section 1.3.3 below, using a technique introduced in that section.

¹ More formally, we say that equation (1.9) is linear if the function f is linear in its first n arguments; no restriction on the t-dependence is implied.

For any constants $C_1, C_2 \in \mathbb{R}$,

\[ x(t) = C_1 \cos t + C_2 \sin t \tag{1.12} \]

is a solution of (1.3). This solution provides an instance of the principle of linear superposition for a linear, homogeneous ODE. Specifically, if $x_1(t)$ and $x_2(t)$ are solutions of (1.11) (with f(t) ≡ 0), then for any constants $C_1, C_2$, the linear combination

\[ C_1 x_1(t) + C_2 x_2(t) \]

is also a solution. In the language of linear algebra, the set of solutions of a homogeneous linear ODE forms a vector space. As we shall see in Chapter 2, any solution of (1.3) can be written in the form (1.12); this means that the set of solutions of (1.3) is a two-dimensional vector space for which {cos t, sin t} is a basis.
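As a quick numerical sanity check of the superposition principle, the following Python sketch evaluates the residual x′′ + x for the combination (1.12), approximating the second derivative by a central difference. The step size h, the sample times, and the constants C1 = 2, C2 = −3 are arbitrary choices made for illustration; none of them comes from the text.

```python
import math

def x(t, c1, c2):
    """Candidate solution (1.12): x(t) = C1 cos t + C2 sin t."""
    return c1 * math.cos(t) + c2 * math.sin(t)

def residual(t, c1, c2, h=1e-3):
    """Approximate x'' + x at time t with a central second difference."""
    x_pp = (x(t + h, c1, c2) - 2.0 * x(t, c1, c2) + x(t - h, c1, c2)) / (h * h)
    return x_pp + x(t, c1, c2)

# Largest residual over the sample times t = 0, 0.1, ..., 9.9.
worst = max(abs(residual(0.1 * k, 2.0, -3.0)) for k in range(100))
print(worst)
```

The printed residual is small but not exactly zero, because the second derivative is only approximated by a finite difference.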

The above examples of solutions illustrate one of the most fundamental points in the whole subject: ODEs have infinitely many solutions. Thus, some auxiliary information must be given to pick out exactly one solution from the infinite set of solutions. The most common such auxiliary information is an initial condition. For example, if one seeks a solution of (1.1) subject to the auxiliary condition

\[ x(0) = b, \]

where b is a real constant, then $x(t) = be^{\alpha t}$ is the unique solution of this more specific problem. (To show it’s unique, we need to know that $Ce^{\alpha t}$ is the general solution of (1.1), as is proved in Exercise 1.) Similarly, given real constants $b_0, b_1$ there is a unique function (1.12) that satisfies (1.3) plus the initial condition

\[ x(0) = b_0, \quad x'(0) = b_1. \]

For a general problem, given constants $b_0, b_1, \ldots, b_{n-1}$, one seeks a solution of (1.9) such that

\[ x(0) = b_0, \quad x'(0) = b_1, \quad \ldots, \quad x^{(n-1)}(0) = b_{n-1}. \tag{1.13} \]

We shall call this the initial-value problem for (1.9). The initial condition may be imposed at any point $t = t_0$, but we will usually impose this condition at t = 0 as in (1.13). Of course the general case is easily reduced to (1.13) by the shift of variable $t \mapsto t - t_0$.

Warning: The symbol x may refer to a generic real variable, or it may refer to a function x(t) that satisfies an ODE. The first usage occurs in (the italicized part of) the phrase “The equation x′ = f(x), where f is a differentiable function of x, is a general first-order autonomous ODE”, while the second occurs in “The general solution x of (1.1) is given by the function $x(t) = Ce^{\alpha t}$.”

[Figure 1.1: Direction field for the Riccati equation (1.6), with several solution trajectories corresponding to different choices of initial condition x(0). Axis labels: t, x; marked value: b*.]

In general, it is a rare and pleasant occurrence when one is able to find explicit solutions of an ODE. In Section 1.3 we shall describe three solution techniques from the elementary theory. Although much information about solutions of equations such as (1.4), (1.5), and (1.6) has been obtained through intensive study, simple formulas like the solutions of (1.1) and (1.3) are not readily available.

1.2.2 Geometric interpretation of solutions

The geometric interpretation of ODEs is an essential part of the subject. The interpretation is clearest for first-order equations. We illustrate this with the help of Figure 1.1, which shows the direction field for (1.6): i.e., imagine that at every point in the half-plane {(t, x) : t > 0}, a line segment whose slope is $x^2 - t$ is drawn. The basic geometrical fact is that a function x(t) is a solution of (1.6) iff at every point (t, x(t)) its graph is tangent to the line segment at that point. This interpretation makes it seem natural that ODEs have many solutions and that a unique solution may be selected by specifying a starting point for the curve (t, x(t)) at t = 0.
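The tangency picture can be imitated numerically. The sketch below, a minimal illustration rather than anything prescribed by the text, tabulates the slope x² − t on a coarse grid and then follows the direction field from the starting point (0, b) by taking many short straight steps along the local line segment (Euler's method). The grid, the step size h, the final time, and the starting value b = 0 are all assumptions chosen for illustration.

```python
def slope(t, x):
    """Slope of the direction field of (1.6) at the point (t, x)."""
    return x * x - t

def euler_path(b, t_end=4.0, h=0.001):
    """Follow the direction field from (0, b) with Euler steps of size h."""
    t, x = 0.0, b
    path = [(t, x)]
    for _ in range(int(round(t_end / h))):
        x += h * slope(t, x)   # move a short distance along the local segment
        t += h
        path.append((t, x))
    return path

# Slopes on a coarse grid of points (t, x), as one might draw them by hand.
field = {(t, x): slope(t, x) for t in (0.0, 1.0, 2.0) for x in (-1.0, 0.0, 1.0)}

path = euler_path(b=0.0)
t_last, x_last = path[-1]
print(field[(1.0, 1.0)], t_last, x_last)
```

Changing b in the call to `euler_path` selects a different curve through the same field, which is the numerical counterpart of choosing an initial condition.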

Much qualitative information about solutions of an ODE may be obtained from its direction field. For example, from Figure 1.1 we make a conjecture regarding the behavior of solutions of (1.6) as t → ∞: i.e., for all initial data x(0) = b such that b is negative, the solution of (1.6) asymptotes to a curve in the half-plane x < 0, while if b is large and positive, the solution grows without bound. Moreover, there is a special initial condition x(0) = b∗ that marks the threshold between these vastly different long-term behaviors. In Exercise 13 we invite the reader to explore this conjecture with the numerical software introduced in Section 1.7 below.
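One hedged way to explore the conjectured threshold, without the software of Section 1.7, is to combine a numerical integrator with bisection on the initial value b. The sketch below uses the classical fourth-order Runge-Kutta method, which is our choice and not a method taken from the text. The final time, the step size, the cutoff `big` used to declare that a solution "grows without bound", and the bracketing interval [0, 2] are all assumptions made for illustration.

```python
def f(t, x):
    """Right-hand side of the Riccati equation (1.6)."""
    return x * x - t

def grows(b, t_end=6.0, h=0.001, big=100.0):
    """Heuristic test: does the solution with x(0) = b appear to grow without bound?

    Integrates with RK4 up to t_end. Returns True if x exceeds `big` along the
    way, or if x is still positive at t_end.
    """
    t, x = 0.0, b
    for _ in range(int(round(t_end / h))):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        if x > big:
            return True
    return x > 0.0

# Bisection on b: `lo` stays in the bounded regime, `hi` in the growing one.
lo, hi = 0.0, 2.0
for _ in range(30):
    mid = 0.5 * (lo + hi)
    if grows(mid):
        hi = mid
    else:
        lo = mid
b_star = 0.5 * (lo + hi)
print(b_star)
```

The printed value is only a numerical estimate of b∗, and the classification in `grows` is a heuristic: a solution starting extremely close to the threshold may not have revealed its fate by the final time.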

1.3 Three solution techniques from the elementary theory

In this section we introduce three methods for finding explicit solutions of certain ODEs that come from the elementary theory. In this book we shall assume the reader can use these three techniques. No other prerequisites from the elementary theory of ODEs will be assumed.

1.3.1 Linear equations with constant coefficients

If the coefficients $a_j$ in (1.11) are actually independent of t, then one may find explicit solutions of this equation using exponentials. This method will be developed extensively in Chapter 2, so here we limit ourselves to applying it in a simple example:

\[ x'' + \beta x' + x = 0. \tag{1.14} \]

(This equation differs from (1.3) by the first-order term βx′. As discussed in Section 1.4, this new term represents friction, and normally β > 0.) Let us look for solutions of (1.14) of the form $x(t) = e^{\lambda t}$. Substituting into (1.14) we see that $e^{\lambda t}$ satisfies (1.14) if

\[ (\lambda^2 + \beta\lambda + 1)\,e^{\lambda t} = 0. \]

In other words, because the derivative of the exponential is a multiple of itself, finding an exponential solution of (1.14) reduces to solving the algebraic equation $\lambda^2 + \beta\lambda + 1 = 0$, which of course has solutions

\[ \lambda_\pm = \frac{-\beta \pm \sqrt{\beta^2 - 4}}{2}. \]

Thus, $e^{\lambda_\pm t}$ is a solution of (1.14), and by linear superposition, for any constants $C_+, C_-$,

\[ C_+ e^{\lambda_+ t} + C_- e^{\lambda_- t} \]

is also a solution. As we shall show in Chapter 2, this is the general solution of (1.14).
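The reduction to an algebraic equation is easy to try out. The following sketch computes the roots λ± for a few values of β with Python's complex square root and substitutes them back into λ² + βλ + 1. The function names and the sample values of β are illustrative assumptions.

```python
import cmath

def char_roots(beta):
    """Roots of lambda^2 + beta*lambda + 1 = 0, returned as complex numbers."""
    disc = cmath.sqrt(beta * beta - 4.0)   # complex square root covers beta^2 < 4
    return (-beta + disc) / 2.0, (-beta - disc) / 2.0

def char_poly(lam, beta):
    """Left-hand side of the algebraic equation for a trial exponent lam."""
    return lam * lam + beta * lam + 1.0

for beta in (0.0, 1.0, 2.0, 3.0):
    lam_plus, lam_minus = char_roots(beta)
    print(beta, lam_plus, lam_minus,
          abs(char_poly(lam_plus, beta)), abs(char_poly(lam_minus, beta)))
```

For β = 1 the two roots come out as a complex-conjugate pair, while for β = 3 both are real and negative.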

If the friction coefficient β is positive, then the solutions $e^{\lambda_\pm t}$ decay as t increases. If 0 < β < 2, then the roots $\lambda_\pm$ are complex. In this case we may separate real and imaginary parts of the roots,

\[ \lambda_\pm = -\beta/2 \pm i\sqrt{1 - \beta^2/4}, \]

and use Euler’s formula $e^{i\theta} = \cos\theta + i\sin\theta$ to rewrite the solution

\[ e^{\lambda_+ t} = e^{-\beta t/2}\left[\cos\left(\sqrt{1 - \beta^2/4}\;t\right) + i\sin\left(\sqrt{1 - \beta^2/4}\;t\right)\right] \]

and similarly for $e^{\lambda_- t}$. Moreover we may form linear combinations

\[ e^{-\beta t/2}\cos\left(\sqrt{1 - \beta^2/4}\;t\right), \qquad e^{-\beta t/2}\sin\left(\sqrt{1 - \beta^2/4}\;t\right) \]

to choose a different basis for the set of solutions of (1.14) whose elements are real-valued.
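The passage from the complex exponential to the real basis can be checked numerically. In the sketch below, the real and imaginary parts of e^{λ+ t} are compared with the damped cosine and sine at a set of sample times. The value β = 1 and the sample times are arbitrary illustrative choices.

```python
import cmath
import math

beta = 1.0                                  # any value with 0 < beta < 2
omega = math.sqrt(1.0 - beta * beta / 4.0)  # imaginary part of lambda_+
lam_plus = complex(-beta / 2.0, omega)

def real_basis(t):
    """The two real-valued solutions: damped cosine and damped sine."""
    decay = math.exp(-beta * t / 2.0)
    return decay * math.cos(omega * t), decay * math.sin(omega * t)

# Compare Re and Im of exp(lambda_+ t) with the real basis at t = 0, 0.2, ..., 9.8.
max_gap = 0.0
for k in range(50):
    t = 0.2 * k
    z = cmath.exp(lam_plus * t)
    c, s = real_basis(t)
    max_gap = max(max_gap, abs(z.real - c), abs(z.imag - s))
print(max_gap)
```

The printed gap is at the level of rounding error, as Euler's formula predicts.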

This discussion illustrates a tension that exists in this text: usually we are interested in real-valued solutions of an ODE, but often it is convenient to consider complex-valued solutions in order to take advantage of the complex exponential. In general, as here, a complex exponent indicates oscillatory behavior of real-valued solutions of an ODE.

1.3.2 First-order linear equations

For a first-order linear ODE, say

\[ x' + a(t)\,x = f(t), \tag{1.15} \]

one can find explicit solutions even if the coefficients are variable. In Exercise 1(b) we ask the reader to verify the following claim: Let $A(t) = \int_0^t a(s)\,ds$; then for any constant C,

\[ x(t) = Ce^{-A(t)} + \int_0^t e^{A(s) - A(t)} f(s)\,ds \tag{1.16} \]

satisfies (1.15). Moreover, since x(0) = C, (1.16) also provides a solution to the initial-value problem.

For the reader seeking a deeper understanding, the derivation of (1.16) from the equation using an integrating factor is given in most introductory differential equations textbooks; see, for example, [4].
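Formula (1.16) can also be evaluated numerically when the integrals are not available in closed form. The sketch below does so with a composite Simpson rule, writing A(t) for the integral of a from 0 to t, and compares the result with a case we can work out by hand: for a(t) = 2t and f(t) = t one has A(t) = t², and (1.16) reduces to x(t) = 1/2 + (C − 1/2)e^{−t²}. The test case, the quadrature rule, and all function names are our own illustrative assumptions.

```python
import math

def simpson(g, lo, hi, n=200):
    """Composite Simpson rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(lo + k * h)
    return total * h / 3.0

def solution_1_16(a, f, C, t):
    """Evaluate formula (1.16) at time t by numerical quadrature."""
    A = lambda u: simpson(a, 0.0, u)          # A(u) = integral of a from 0 to u
    A_t = A(t)
    integral = simpson(lambda s: math.exp(A(s) - A_t) * f(s), 0.0, t)
    return C * math.exp(-A_t) + integral

# Hand-computed check case: a(t) = 2t, f(t) = t, so A(t) = t^2.
C, t = 3.0, 1.5
approx = solution_1_16(lambda s: 2.0 * s, lambda s: s, C, t)
exact = 0.5 + (C - 0.5) * math.exp(-t * t)
print(approx, exact)
```

The nested quadrature is wasteful but keeps the code close to the formula; a more careful implementation would accumulate A(s) once.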

1.3.3 Separable equations

A first-order ODE is called separable if the RHS may be factored:

\[ x' = f(x)\,g(t). \tag{1.17} \]

For example, equations (1.2) and (1.7) are separable, where in both cases the factor g(t) is trivial. Let us illustrate how to exploit this property by solving (1.2). In the following derivation, we temporarily suspend all concerns of rigor: we shall freely perform manipulations that might be problematic in order to obtain a formula for the solution. After it has been derived, we may verify that the formula actually does provide a solution. Given such a verification, there is no need to justify intermediate steps.

We write the LHS of (1.2) using the notation x′ = dx/dt, and we treat dx and dt as separate factors. Let us bring all x-dependence in the equation to the LHS and all t-dependence to the RHS, obtaining

\[ \frac{dx}{\alpha x - \beta x^2} = dt. \tag{1.18} \]

The LHS of (1.18) may be expanded in partial fractions:

\[ \frac{1}{\alpha x - \beta x^2} = \frac{1}{\alpha x} + \frac{\beta/\alpha}{\alpha - \beta x}. \]

Then multiplying both sides of (1.18) by α and integrating, we derive

\[ \ln x - \ln(\alpha - \beta x) = \alpha t + C, \]

where C is an arbitrary constant of integration. Exponentiation of this equation yields

\[ \frac{x}{\alpha - \beta x} = C' e^{\alpha t}, \]

where $C' = e^C$, and this relation may be solved for x(t),

\[ x(t) = \frac{C'\alpha e^{\alpha t}}{1 + C'\beta e^{\alpha t}}. \tag{1.19} \]

In Exercise 1(c) we ask the reader to verify that the above, rather formal, manipulations actually produce solutions of (1.2).

Regarding the IVP, we seek a value of C′ in (1.19) that will satisfy the initial condition x(0) = b. The reader may check that for any b ≠ α/β, the initial condition is satisfied if and only if

\[ C' = \frac{b}{\alpha - \beta b}. \tag{1.20} \]

It is interesting that (1.19) fails to provide a solution to the IVP precisely in the case when the original ODE has the trivial solution x(t) ≡ α/β. This behavior arises from one of the gaps in rigor in the above derivation: if x(t) ≡ α/β, then the term dx/(α − βx) in the derivation is undefined and hence cannot be integrated. This behavior reminds us that solutions obtained using separability always need to be checked.
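In the spirit of checking solutions obtained by separation, the sketch below compares formula (1.19), with C′ taken from (1.20), against a direct numerical integration of (1.2). The integrator (classical fourth-order Runge-Kutta), the parameter values α = 1, β = 0.5, b = 0.5, and the final time are illustrative assumptions, not values from the text.

```python
import math

def logistic_exact(t, alpha, beta, b):
    """Formula (1.19) with C' given by (1.20); requires b != alpha/beta."""
    c_prime = b / (alpha - beta * b)
    e = math.exp(alpha * t)
    return c_prime * alpha * e / (1.0 + c_prime * beta * e)

def logistic_rk4(t_end, alpha, beta, b, n=1000):
    """Integrate x' = alpha*x - beta*x^2 from x(0) = b with n RK4 steps."""
    rhs = lambda x: alpha * x - beta * x * x
    h = t_end / n
    x = b
    for _ in range(n):
        k1 = rhs(x)
        k2 = rhs(x + h / 2 * k1)
        k3 = rhs(x + h / 2 * k2)
        k4 = rhs(x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

alpha, beta, b = 1.0, 0.5, 0.5      # illustrative values; here alpha/beta = 2
gap = abs(logistic_exact(5.0, alpha, beta, b) - logistic_rk4(5.0, alpha, beta, b))
print(logistic_exact(0.0, alpha, beta, b), gap)
```

Agreement between the two computations is evidence, though of course not a proof, that the formal manipulations produced a genuine solution.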

Note that if C′ < 0, the solution (1.19) is not defined for all t, because the denominator $1 + C'\beta e^{\alpha t}$ vanishes at a finite time. This warns us that an IVP may have a solution only for a limited time. (We explore this issue in a more serious way in Chapter 3.)

[Figure 1.2: Direction field for (1.2), with solution trajectories corresponding to several different choices of initial condition x(0). Axis labels: t, x; marked values: 0, α/β.]
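The limited lifetime of a solution can be made concrete. The denominator of (1.19) vanishes when e^{αt} = −1/(C′β), which is possible only if C′ < 0. The sketch below computes that time from the initial value b via (1.20). This elaboration is our own and assumes α > 0, β > 0, and b ≠ α/β; the function name is hypothetical.

```python
import math

def blow_up_time(alpha, beta, b):
    """Time at which the denominator of (1.19) vanishes, or None if it never does.

    Assumes alpha > 0, beta > 0 and b != alpha/beta.
    """
    c_prime = b / (alpha - beta * b)            # formula (1.20)
    if c_prime >= 0.0:
        return None                             # denominator stays positive
    return math.log(-1.0 / (c_prime * beta)) / alpha

t_neg = blow_up_time(1.0, 1.0, -1.0)   # negative initial value
t_big = blow_up_time(1.0, 1.0, 3.0)    # initial value above alpha/beta
t_mid = blow_up_time(1.0, 1.0, 0.5)    # initial value between 0 and alpha/beta
print(t_neg, t_big, t_mid)
```

In this example a negative initial value gives a positive blow-up time, while an initial value above α/β gives a negative one, meaning that the solution fails to exist only at an earlier time.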

Despite the above non-existence problem, (1.2) gives acceptable predictions regarding the future evolution of a population, for which of course x ≥ 0. Specifically, in Exercise 1(c) we ask the reader to show that if b ≥ 0, then the solution of the IVP (obtained from (1.19) if b ≠ α/β and from x(t) ≡ α/β if b = α/β) exists for all t ≥ 0. The reader may also verify that, for any initial condition b > 0, the solution x(t) tends to α/β as t → ∞, as may be anticipated from considering the direction field shown in Figure 1.2.

Here is another point illustrated by the logistic equation: If x(t) is a solution of (1.2) for some time interval, then for any constant $t_0$, the shifted function $\tilde{x}(t) = x(t - t_0)$ is also a solution of (1.2) on the appropriate translated interval. Such translational invariance will occur whenever the governing equation is autonomous.

1.4 Examples of physically based ODEs

1.4.1 Mechanical systems

The motion of spring-mass systems provides invaluable insight into many phenomena involving ODEs. Such systems are governed by Newton’s second law of motion,

mass × acceleration = sum of all forces,

or more compactly and more famously, F = ma. Consider for example the system illustrated in Figure 1.3. The mass is constrained to move along a single axis. If we let x measure the displacement of the mass from a reference position, then the acceleration is simply the second derivative $d^2x/dt^2$. There are two forces acting on the mass: (i) the restoring force $F_{\text{spring}}$ from the spring and (ii) friction $F_{\text{friction}}$. The restoring force opposes any displacement from the equilibrium position. The simplest assumption, called Hooke’s law, is that the force is proportional to the displacement from equilibrium. If we measure displacements relative to the equilibrium position, we have the simple formula for the restoring force $F_{\text{spring}} = -kx$, where k is the constant of proportionality. Regarding friction, the simplest assumption is that this force opposes the motion of the mass with a strength proportional to its speed: in symbols, $F_{\text{friction}} = -\beta\,dx/dt$, where β ≥ 0. Truth to tell, this formula is a rather poor approximation of dry friction²; i.e., friction of a mass sliding over a dry surface. Despite this inaccuracy, friction is widely approximated by such a term because of the appealing fact that this leads to a linear ODE.

[Figure 1.3: Schematic diagram of the mass-spring-dashpot system corresponding to (1.21). Labels in the figure: $F_{\text{spring}} = -kx$, $F_{\text{friction}} = -\beta x'$, x, m, equilibrium.]

Combining the above forces in Newton’s equation we get the ODE for the motion of the mass

mx′′ = −βx′ − kx. (1.21)

Equation (1.14) is a special case of this equation³. As for (1.14), the general solution of (1.21) is a linear combination of exponentials. Substituting into the equation, we

² This formula is more typical of the drag from a viscous fluid at moderate velocities; see [?]. For that reason in Figure 1.3 we have represented friction by a “dashpot”, i.e., a piston sliding through a viscous fluid.

³ In fact, (1.21) can be reduced to (1.14) by scaling. Specifically, if we let τ = √(k/m) t and divide

find that e^{λt} is a solution of (1.21) if

$$\lambda = \frac{-\beta \pm \sqrt{\beta^2 - 4mk}}{2m}. \tag{1.22}$$

This formula for the roots contains interesting information about how friction changes the behavior of the system. If there is no friction (i.e., β = 0), the roots (1.22) are pure imaginary and the solutions of (1.21) are trigonometric functions with (angular) frequency ω = √(k/m); i.e., oscillations continue forever. As β increases from zero, the solutions retain their oscillatory character, with some decrease in the frequency, but are confined within a decaying exponential envelope. This behavior continues as β increases, the decay becoming more rapid, until β² = 4mk. After this point both roots (1.22) are real, and the solution x(t) will cross x = 0 at most once in the course of its decay. One calls the cases β² < 4mk, β² = 4mk, and β² > 4mk underdamped, critically damped, and overdamped, respectively.
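The three damping regimes can be explored numerically. The following is a minimal Python sketch, not part of the original text: the function names `characteristic_roots` and `damping_regime` are illustrative choices, and the exact comparison with zero in the critical case is only meaningful for parameter values that are represented exactly in floating point.

```python
import cmath

def characteristic_roots(m, beta, k):
    """Roots of m*lam**2 + beta*lam + k = 0, i.e. formula (1.22)."""
    disc = cmath.sqrt(beta**2 - 4 * m * k)  # complex square root handles beta**2 < 4mk
    return (-beta + disc) / (2 * m), (-beta - disc) / (2 * m)

def damping_regime(m, beta, k):
    """Classify the system by the sign of the discriminant beta**2 - 4mk."""
    d = beta**2 - 4 * m * k
    if d < 0:
        return "underdamped"
    if d == 0:
        return "critically damped"
    return "overdamped"

# Example parameter values (hypothetical): m = k = 1 with three friction levels.
for beta in (0.0, 2.0, 3.0):
    print(beta, damping_regime(1.0, beta, 1.0), characteristic_roots(1.0, beta, 1.0))
```

With m = k = 1 the frictionless roots are ±i, matching the angular frequency ω = √(k/m) = 1, and the transition to real roots occurs at β = 2.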

These ideas have a practical consequence in the automotive world. The shock absorbers of a car can be crudely modeled by (1.21). As the name suggests, one wants shock absorbers to have a lot of damping: i.e., to be overdamped. This gives rise to the following quick test for whether shock absorbers are worn out. Depress the car and release it from rest. If the car returns monotonically to equilibrium, then the shock absorbers are OK. If on the other hand, the car oscillates up and down in its return to equilibrium, then the shock absorbers need to be replaced.

It is easy to imagine springs in which the restoring force is not exactly proportional to the displacement; indeed exact linearity is the unlikely behavior. For example, although the restoring force is gravity rather than a spring, consider a pendulum as illustrated in Figure 1.4. The mass is confined to move on a circle by a rigid (massless) arm, say of length ℓ, and its position is specified by a single coordinate, an angle rather than a displacement. If the angle x that the pendulum makes with the vertical is measured in radians, then ℓx equals the displacement of the mass along the circumference, ℓ dx/dt equals its velocity, and ℓ d²x/dt² equals its tangential acceleration. The tangential component of gravity is F_tang = −mg sin x. Thus, if there is no friction, Newton’s equation of motion may be written

x′′ + (g/ℓ)sin x = 0, (1.23)

where we have divided both terms by mℓ. This derivation explains the origin of

(1.21) by k, we obtain

$$\frac{d^2x}{d\tau^2} + \tilde\beta\,\frac{dx}{d\tau} + x = 0$$

where β̃ = β/√(mk). Mathematically, such scalings can greatly simplify an ODE, and physically, much insight can be gained by relating such scalings to the units of the parameters in the equation. We shall develop these ideas systematically in Appendix D.

Figure 1.4: Schematic diagram of the pendulum corresponding to (1.4). The figure labels the angle x, the mass m, the gravitational force mg, and its tangential component F_tang = −mg sin x.

(1.4), but our purpose here is to illustrate the deviation of this restoring force from linearity. If x is small, then sin x ≈ x so in this range the force is approximately linear, but as x increases, the force falls behind this linear growth (see Figure 1.5). Incidentally, (1.4), along with elaborations that include other effects, is one of the examples we recall throughout this book to illustrate the theory.

Similarly, it is also possible for the force to grow more rapidly than linearly. A more extreme deviation from Hooke’s law is illustrated by the cantilever beam (i.e., supported at one end only) placed between two magnets as in Figure 1.6. If the beam bends so that its tip is displaced slightly to the right, the beam is closer to the magnet on the right and hence more strongly attracted to it than to the magnet on the left. If the magnetic forces dominate the bending resistance of the beam then, following a small displacement, the net force on the beam will be to the right: i.e., the beam is pulled away from the centerline rather than towards it. However, if the beam moves beyond the magnet to the right, then both magnetic forces and the bending resistance all pull the beam back towards the centerline. Suppose we naively describe the bending of the beam by a single variable x, say the displacement of the tip. The simplest force law⁴ reproducing the above behavior is

$$F(x) = +k_1 x - k_2 x^3 \tag{1.24}$$

where both k₁, k₂ are positive. Despite its crudeness as a physical model, (1.24) is often used in applications, and mathematically it provides a useful illustrative

⁴ See Exercise 11 for another system exhibiting qualitatively similar behavior.

Figure 1.5: Comparison of the tangential component F_tang = −mg sin x with its linear approximation F_lin = −mgx, plotted as force against x for −π ≤ x ≤ π.

Figure 1.6: Schematic diagram of a cantilever beam between two magnets; x denotes the displacement of the tip.



example. We shall call the equation

$$x'' + \beta x' - x + x^3 = 0 \tag{1.25}$$

Duffing’s equation⁵. The potential associated with the force law (1.24) is called a double-well potential, which leads us to the concept of potential energy, which we now discuss.

The potential energy V(x) associated with a force law F(x) is defined as the work that must be done against the force to move the mass from a reference position, typically equilibrium, to the position specified by x, in symbols

$$V(x) = -\int_0^x F(s)\,ds. \tag{1.26}$$

The potential functions for Hooke’s law, for the pendulum, and for (1.24) are graphed in Figure 1.7; the figure explains the name double-well potential for (1.24). Of course (1.26) is equivalent, apart from an additive constant, to F(x) = −(∂V/∂x)(x), so the equation of motion of a particle moving in a force field⁶ with potential energy V is, assuming linear friction,

$$m x'' + \beta x' + \frac{\partial V}{\partial x}(x) = 0. \tag{1.27}$$

Note that this equation is nonlinear except in the special case when V (x) is quadratic.
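As a worked illustration of (1.26), the three force laws discussed above can be integrated directly. This is a sketch added for concreteness; for the pendulum we integrate F_tang = −mg sin x with respect to the angle x, which is the normalization used in the labels of Figure 1.7.

```latex
\begin{align*}
F(x) = -kx
  &\;\Longrightarrow\; V(x) = \int_0^x k s\,ds = \tfrac12 k x^2,\\
F(x) = -mg\sin x
  &\;\Longrightarrow\; V(x) = \int_0^x mg\sin s\,ds = mg\,(1-\cos x),\\
F(x) = k_1 x - k_2 x^3
  &\;\Longrightarrow\; V(x) = -\int_0^x \bigl(k_1 s - k_2 s^3\bigr)\,ds
      = -\tfrac12 k_1 x^2 + \tfrac14 k_2 x^4 .
\end{align*}
```

Only the first of these potentials is quadratic, so of the three examples only Hooke’s law makes (1.27) linear.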

One may attempt to visualize solutions of (1.27) as the motion of a marble rolling in the x, z-plane along a curve given by the equation z = V(x). As discussed in the Notes at the end of the chapter, this analogy is quantitatively inaccurate, but it makes useful qualitative predictions nonetheless⁷. For example, a particle moving according to (1.25), which has double-well potential

$$V(x) = -\tfrac12 x^2 + \tfrac14 x^4, \tag{1.28}$$

will indeed come to rest at the bottom of one of the wells, just as a rolling marble would do.

One technique for proving the previous statement, which will be studied in Chapter 5, is based on energy considerations, and we now introduce this important concept. The total energy of a mass is the sum of its potential energy and its kinetic

⁵ Strictly speaking (1.25) should be called Duffing’s equation without forcing. Duffing’s equation with forcing would include a nonzero inhomogeneous term f(t) on the RHS of (1.25). Some effects of forcing are studied in Exercise 6.

⁶ Although we introduced forces in connection with spring-mass systems, we want to consider more general force laws than can reasonably be associated with any spring. For that reason we adopt the language of “a particle in a force field”.

⁷ One is reminded of the aphorism, “A simple lie may be more useful than a complicated truth.”

Figure 1.7: The potential functions for (a) Hooke’s law, V = ½kx²; (b) the pendulum, V = mg(1 − cos x); and (c) the double-well potential, V = −(k₁/2)x² + (k₂/4)x⁴.



energy, in symbols

$$E = \frac{m}{2}(x')^2 + V(x). \tag{1.29}$$

We ask the reader to compute that if x(t) satisfies (1.27) then energy is dissipated at the rate

$$\frac{dE}{dt} = -\beta (x')^2. \tag{1.30}$$

To interpret: if there is no friction then energy is conserved (i.e., remains constant as time evolves), while if β > 0 energy decreases at a rate given by (1.30).
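The energy picture can be checked numerically for Duffing’s equation (1.25), taking m = 1 and the potential (1.28). The sketch below is an illustration under stated assumptions rather than the book’s own software: it uses a hand-written classical Runge-Kutta (RK4) integrator, and the friction coefficient β = 0.5, the initial data (x, x′) = (2, 0), the step size, and the final time are arbitrary example choices.

```python
def duffing_rhs(x, v, beta):
    """First-order form of (1.25): x' = v, v' = -beta*v + x - x**3."""
    return v, -beta * v + x - x**3

def energy(x, v):
    """Total energy (1.29) with m = 1 and the double-well potential (1.28)."""
    return 0.5 * v**2 - 0.5 * x**2 + 0.25 * x**4

def rk4_step(x, v, beta, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1x, k1v = duffing_rhs(x, v, beta)
    k2x, k2v = duffing_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, beta)
    k3x, k3v = duffing_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, beta)
    k4x, k4v = duffing_rhs(x + dt * k3x, v + dt * k3v, beta)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6)

def simulate(x0, v0, beta, dt=0.01, t_end=60.0):
    """Integrate from (x0, v0); return the final state and the energy history."""
    x, v = x0, v0
    energies = [energy(x, v)]
    for _ in range(int(round(t_end / dt))):
        x, v = rk4_step(x, v, beta, dt)
        energies.append(energy(x, v))
    return x, v, energies

x_final, v_final, energies = simulate(2.0, 0.0, beta=0.5)
# Largest step-to-step increase in energy; (1.30) predicts none beyond round-off.
largest_increase = max(b - a for a, b in zip(energies, energies[1:]))
print(x_final, energies[0], energies[-1], largest_increase)
```

In this run the computed energy should decrease from its initial value toward −1/4, the value of V at the bottom of either well, and the final position should lie close to x = 1 or x = −1.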

1.4.2 Physical equations with nonmechanical origins

We have assumed β ≥ 0 in the discussion of spring-mass systems since friction normally dissipates energy. However, in certain electrical circuits what amounts to negative friction can arise over a limited region of state space. The most famous equation exhibiting such behavior is van der Pol’s equation

x′′ + β (x2 − 1)x′ + x = 0 (1.31)

where x measures a voltage in a circuit. If x is small (specifically, |x| < 1), the coefficient of x′ is negative, leading to the increase of energy, while if x is large this coefficient has the usual positive sign. We will analyze this equation in detail in Chapter 6. Historically van der Pol’s equation arose in modeling circuits with vacuum tubes, but of much greater current interest, it also arises in modeling semiconductor circuits [?].

By way of background, a linear equation of the form (1.21) arises from the description of an electrical circuit containing linear elements: an inductor (L), a resistor (R), and a capacitor (C). Specifically, as may be derived from Kirchhoff’s laws [?], the voltage x(t) across the capacitor in Figure 1.8 at time t satisfies⁸

$$L x'' + \frac{L}{RC}\,x' + \frac{1}{C}\,x = 0. \tag{1.32}$$

The van der Pol equation arises if the linear resistor in the figure is replaced by an appropriate nonlinear element in which current depends non-monotonically on voltage. (It may appear that, with “negative friction”, energy is being created out of nowhere, but the derivation [?] explains how the system is consistent with conservation of energy.) Electrical circuits are perhaps further from everyday intuition than spring-mass systems, and we do not develop the theory here.

We shall also consider ODEs derived from other applications, for example the predator-prey population model (1.33) below.

⁸ A circuit with the elements in series, rather than parallel, is probably more familiar to most readers. We consider the parallel circuit since this configuration leads to van der Pol’s equation.

Figure 1.8: An inductor (L), a resistor (R), and a capacitor (C) in parallel. The voltage x across any of the elements satisfies (1.32). If the linear resistor is replaced by a suitably chosen “nonlinear resistor”, then x satisfies van der Pol’s equation (1.31).

1.5 Systems of ODEs

All of the examples of ODEs considered above contained a single unknown function. It will be crucial to also study systems of ODEs, i.e., several simultaneous equations involving several unknown functions.

Some physical or biological systems⁹ are most naturally modeled by systems of ODEs. One of the best known such systems is the Lotka-Volterra model

$$\begin{aligned} x' &= \alpha x - \beta xy \\ y' &= \gamma xy - \delta y \end{aligned} \tag{1.33}$$

where all parameters are positive. Let us describe the physical assumptions underlying (1.33) since, in our view, such understanding is an essential part of acquiring facility with ODEs. This system describes the evolution of two interacting populations, a predator (say foxes, represented by y) and a prey (say rabbits, represented by x). In the absence of predators (i.e., y = 0), the prey population satisfies x′ = αx, the equation for exponential growth. However, their growth rate is reduced by predation, which is assumed to occur at a rate proportional to each population¹⁰. Similarly,

⁹ There is an unfortunate conflict between different fields in the use of the word system. Here we mean system in the biological sense “a group of interacting, interrelated, or interdependent elements forming a complex whole”, while later in this same sentence we mean system in its more restricted mathematical meaning, “several simultaneous equations”.

¹⁰ For maximal realism, the underlying process should be modeled probabilistically. An ODE model provides a useful approximation for the evolution of average populations provided the populations are large. The rate term proportional to xy may be derived from the probability that members of the two species encounter one another. In chemical kinetics, the corresponding approximation is called the Law of Mass Action.

the predator equation for y represents a balance between two effects: the predator population is increased by a term proportional to the amount of food the predators consume, and their population is decreased by a “death” term proportional to their population. Remarkably, in the full equation for the evolution of y, these two effects are simply added! This is an instance of a very general phenomenon: when several effects occur in a physical system, typically the ODE describing its evolution is obtained simply by adding the contributions of each effect in the ODE. This is the source of the power of ODEs. Of course, although the equations may be simple to formulate, solving them is anything but simple, and that’s what these notes are about.

Besides arising naturally, systems of ODEs also arise as a mathematical convenience. For example, we claim that van der Pol’s equation (1.31) is equivalent to the 2 × 2 first-order system

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= -\beta (y_1^2 - 1)\,y_2 - y_1. \end{aligned} \tag{1.34}$$

To see this, suppose x(t) is a solution of (1.31). Then let y₁(t) = x(t) and y₂(t) = x′(t); it is easily seen that the two-component vector y(t) satisfies (1.34). Conversely, if (y₁(t), y₂(t)) satisfies (1.34), then a trivial calculation shows that x(t) = y₁(t) satisfies (1.31).

This construction is quite general. Specifically, the nth-order ODE (1.9) is equivalent to the n × n system for functions y₁(t), . . . , yₙ(t)

$$\begin{aligned} y_1' &= y_2 \\ y_2' &= y_3 \\ &\;\;\vdots \\ y_{n-1}' &= y_n \\ y_n' &= f(y_1, y_2, \ldots, y_n, t). \end{aligned} \tag{1.35}$$

The proof of this statement is completely analogous to the above calculation with van der Pol’s equation.
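The reduction (1.35) is mechanical enough to automate. The following Python sketch is illustrative only: the helper name `as_first_order_system` is hypothetical, and f is assumed to take its arguments in the order (y₁, . . . , yₙ, t) used in (1.35).

```python
def as_first_order_system(f, n):
    """Build F(y, t) for the system (1.35) from x^(n) = f(x, x', ..., x^(n-1), t)."""
    def F(y, t):
        if len(y) != n:
            raise ValueError("state vector must have length n")
        # The first n-1 components shift down (y_j' = y_{j+1});
        # the last component is given by the original ODE.
        return list(y[1:]) + [f(*y, t)]
    return F

# Example: van der Pol's equation (1.31) with beta = 1, rewritten as (1.34).
beta = 1.0
van_der_pol = as_first_order_system(
    lambda x, xp, t: -beta * (x**2 - 1) * xp - x, 2)

print(van_der_pol([2.0, 0.0], 0.0))
print(van_der_pol([0.5, 1.0], 0.0))
```

Numerical integrators are typically written for first-order systems, which is one practical reason this rewriting is so common.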

It turns out that it is more convenient to study first-order systems of ODEs than to study a single, higher-order equation. Among other reasons, this convenience derives from the geometric interpretation of ODEs: although (1.35) requires working in more dimensions, the equation may still be interpreted in a fashion analogous to Figure 1.1. Indeed, geometric language is fundamental to the advanced theory.

The presentation of the theory is also simplified by using vector notation. Thus, for example, if we write y = (y₁, y₂, . . . , yₙ), then (1.35) can be written compactly y′ = F(y, t) where the vector-valued function F(y, t) has the components on the RHS

Figure 1.9: The vector field associated with (1.34) with β = 1, drawn in the y₁, y₂-plane. One sample solution trajectory is shown. Like all other non-equilibrium solutions, it converges to the periodic solution of (1.34).

of (1.35).

Let us illustrate this geometric interpretation for van der Pol’s equation (1.34), where the two-dimensional geometry¹¹ simplifies visualization. Figure 1.9 shows the vector field

$$\mathbf{F}(\mathbf{y}) = \begin{pmatrix} y_2 \\ -\beta (y_1^2 - 1)\,y_2 - y_1 \end{pmatrix} \tag{1.36}$$

defined by the RHS of (1.34). A curve y(t), T₁ < t < T₂, is a solution of (1.34) iff for every t the tangent to the curve at the point y(t) equals F(y(t)). Despite the transparency of this interpretation, it is not at all easy to deduce global behavior of solutions from this local information. The curve shown in the figure is a typical solution of (1.34): any non-zero solution converges to a periodic trajectory, i.e., a solution for which there exists a time T > 0 such that y(t + T) = y(t) for all t. We invite the reader to use the software of Section 1.7 to verify this claim numerically. Considerable theory, which we will develop in Chapter 6, is needed in order to verify it analytically.
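For readers without XPP at hand, the convergence claim can also be probed with a few lines of Python. This is a rough sketch under stated assumptions, not the book’s code: it uses a hand-written RK4 integrator with β = 1, and the two starting points, the step size, and the observation window are arbitrary choices. It compares the late-time amplitude of y₁ for a trajectory that starts near the origin with one that starts far from it.

```python
def van_der_pol(y, beta=1.0):
    """Right-hand side of the first-order system (1.34)."""
    y1, y2 = y
    return (y2, -beta * (y1**2 - 1) * y2 - y1)

def rk4_step(F, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = F(y)."""
    k1 = F(y)
    k2 = F(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k1)))
    k3 = F(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k2)))
    k4 = F(tuple(yi + dt * ki for yi, ki in zip(y, k3)))
    return tuple(yi + dt * (a + 2 * b + 2 * c + d) / 6
                 for yi, a, b, c, d in zip(y, k1, k2, k3, k4))

def late_amplitude(y0, dt=0.01, t_end=100.0, window=10.0):
    """Largest |y1| observed during the final `window` time units."""
    steps = int(round(t_end / dt))
    tail_start = steps - int(round(window / dt))
    y, amp = y0, 0.0
    for i in range(steps):
        y = rk4_step(van_der_pol, y, dt)
        if i >= tail_start:
            amp = max(amp, abs(y[0]))
    return amp

amp_inside = late_amplitude((0.1, 0.0))   # starts close to the equilibrium
amp_outside = late_amplitude((3.0, 3.0))  # starts well outside the cycle
print(amp_inside, amp_outside)
```

Both runs should report essentially the same amplitude, slightly above 2, which is consistent with convergence to a single periodic trajectory. A numerical experiment of this kind is evidence, not a proof.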

Let us conclude this section with some terminology. A system y′ = F(y, t) is called linear if F has the special form F(y, t) = A(t)y + a(t) where A(t) and a(t) are matrix-valued and vector-valued functions of time, respectively. It is called homogeneous if a(t) ≡ 0. A system y′ = F(y, t), linear or nonlinear, is called autonomous if F is actually independent of t.

¹¹ The phrase phase plane is used to describe graphs like Figure 1.9 that show trajectories of a two-dimensional autonomous first-order system of ODEs.



Given an autonomous system y′ = F(y), say of dimension d, we call a point b∗ ∈ ℝᵈ an equilibrium of this system if F(b∗) = 0. In particular, the constant function y(t) ≡ b∗ is one solution of this system.

Regarding geometrical interpretation, let us distinguish between trajectory and orbit. Both terms refer to the curve traced out by a solution of an autonomous ODE, say

y′ = F (y). (1.37)

By trajectory, we mean the curve with parametrization by time,

t ↦ (y₁(t), . . . , yₙ(t))

where y(t) satisfies (1.37). By contrast, orbit refers to the point set,

{(y₁(t), . . . , yₙ(t)) : t ∈ (a, b)}

where we assume the solution exists for times with a < t < b. The orbit is independent of any specific parametrization of the curve; for example, in the case of a two-dimensional system, an orbit could be written as the level set of some function Φ(y) of two variables, say

{(y₁, y₂) : Φ(y₁, y₂) = const}. (1.38)

1.6 Topics covered in this book

1.6.1 General remarks

In a first course in ODEs the focus is on finding explicit formulas to represent solutions of equations. This is a fascinating subject that offers boundless opportunities for ingenuity; it would be an interesting digression to describe the application of such methods just to equations (1.2), (1.5), (1.6). However, the sad fact is that for most equations explicit solutions cannot be found. Approximate solutions, obtained either from numerical computations or asymptotic analysis, frequently provide an adequate substitute for explicit formulas. We will touch briefly on both kinds of approximate solutions, but neither they nor explicit solutions are the main focus in this book.

In Part I of this book (Chapters 2–4) we address the “holy trinity” of theoretical questions regarding the initial value problem:

• Existence of solutions (local in Section 3.2, global in Section 4.2),

• Uniqueness of solutions (Section 3.3), and

• Continuous dependence on the initial data (Section 4.4).

The first two phrases are probably self-explanatory; we shall wait till Chapter 4 to flesh out the third. Our treatment in these chapters is completely rigorous.

In Part II (Chapters 5–8) we develop, with less concern about rigor, the qualitative theory of ODEs, especially bifurcation theory. The qualitative theory studies what can be said analytically about solutions of ODEs in the absence of explicit formulas. A central question in the theory is to characterize the asymptotic behavior of solutions as t → ∞. In the next unit we illustrate typical answers to this question by considering the Lotka-Volterra equation (1.33) as well as some of its elaborations.

1.6.2 Qualitative behavior of some predator-prey models

The large-time behavior of solutions of the Lotka-Volterra system is easily described. Let us simplify the Lotka-Volterra equations to

$$\begin{aligned} x' &= x - xy \\ y' &= \rho(xy - y) \end{aligned} \tag{1.39}$$

where ρ is a positive constant; as we will show in Appendix D, (1.39) can be derived from (1.33) by scaling. One particular solution of (1.39) is the constant, equilibrium, solution x(t) ≡ 1, y(t) ≡ 1, which describes a steady balance between the two species. Every other solution in the open first quadrant¹² {x > 0, y > 0} circles this equilibrium point in a periodic fashion, as indicated in Figure 1.10. Indeed, the orbits are level sets of the function

L(x, y) = ρ(x − ln x) + y − ln y, (1.40)

which has a global minimum at (1, 1). (In Exercise 5 we discuss how to derive this conclusion using the solution technique of separability.)
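That L is constant along solutions can be seen directly from the chain rule: dL/dt = ρ(1 − 1/x)x′ + (1 − 1/y)y′ = ρ(x − 1)(1 − y) + ρ(y − 1)(x − 1) = 0. The Python sketch below checks this numerically. It is an illustration only: the RK4 integrator is hand-written, the function name `invariant` stands for L in (1.40), and the starting point (0.5, 0.5), ρ = 1, and the step size are arbitrary example values.

```python
import math

def lotka_volterra(x, y, rho):
    """Right-hand side of the scaled system (1.39)."""
    return x - x * y, rho * (x * y - y)

def invariant(x, y, rho):
    """The function L(x, y) of (1.40), whose level sets are the orbits."""
    return rho * (x - math.log(x)) + y - math.log(y)

def rk4_step(x, y, rho, dt):
    """One classical fourth-order Runge-Kutta step of size dt."""
    k1x, k1y = lotka_volterra(x, y, rho)
    k2x, k2y = lotka_volterra(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, rho)
    k3x, k3y = lotka_volterra(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, rho)
    k4x, k4y = lotka_volterra(x + dt * k3x, y + dt * k3y, rho)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def max_drift(x0, y0, rho=1.0, dt=0.001, t_end=20.0):
    """Largest deviation of L from its initial value along the computed orbit."""
    x, y = x0, y0
    initial = invariant(x, y, rho)
    drift = 0.0
    for _ in range(int(round(t_end / dt))):
        x, y = rk4_step(x, y, rho, dt)
        drift = max(drift, abs(invariant(x, y, rho) - initial))
    return drift

drift = max_drift(0.5, 0.5)
print(drift)
```

A drift that is tiny compared with L itself is what one expects if the computed trajectory stays on a single level set, up to discretization error.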

However, the Lotka-Volterra model is far too simplistic for realistic modeling. Acknowledging that fact, let us also examine the large-time behavior of two of the many modifications of it that have been studied. Specifically, we correct two unsatisfactory consequences of the linear growth rate in the prey-only equation x′ = x:

• Solutions of this equation grow indefinitely large as time evolves. As we saw in (1.2), a more realistic equation is x′ = x(1 − x/K) where K is the carrying capacity of the environment.

• No matter how small x(0) may be, the prey population never goes extinct. This defect may be corrected in an ad hoc manner by assuming the growth rate¹³ equals x(x − ε)/(x + ε). Then for x < ε the growth rate is negative; thus

¹² At the boundary of the first quadrant, there are solutions with y ≡ 0, in which case x grows exponentially, and solutions with x ≡ 0, in which case y decays exponentially.

¹³ A growth rate that depends on the population size is called the Allee effect.

Figure 1.10: Several solution curves of the Lotka-Volterra system (1.39) in the x, y-plane, for 0 ≤ x, y ≤ 3. (Here we have chosen ρ = 1.) All non-equilibrium solutions are periodic and encircle the equilibrium at (1, 1).

if x(0) < ε, the prey will die out. On the other hand, for large x the growth rate is close to x, as in the original equations.

Inserting both of these modifications¹⁴ of the prey growth rate into the system (1.39) gives us the equations

$$\begin{aligned} \text{(a)}\quad x' &= x\left(\frac{x-\varepsilon}{x+\varepsilon}\right)\left(1 - \frac{x}{K}\right) - xy \\ \text{(b)}\quad y' &= \rho(xy - y). \end{aligned} \tag{1.41}$$

If ε = 0 and K = ∞, then we obtain the unmodified Lotka-Volterra equations (1.39).

To begin the discussion of (1.41), let us find the equilibrium solutions of this system. From (1.41b), we find that ρ(xy − y) = 0 if either y = 0 or x = 1. Substituting y = 0 into (1.41a) gives the first three equilibria listed in Part (a) of Table 1.1, and substituting x = 1 gives the fourth, (1, y∗) where

y∗ = (1 − 1/K )(1 − ε)/(1 + ε). (1.42)
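The four equilibria can be verified by substituting them back into (1.41). The following Python sketch does so for one parameter choice; the function names and the values ε = 0.2, K = 2, ρ = 1 are illustrative assumptions, chosen to satisfy 0 < ε < min{K, 1}.

```python
def rhs(x, y, eps, K, rho=1.0):
    """Right-hand side of the modified Lotka-Volterra system (1.41)."""
    dx = x * (x - eps) / (x + eps) * (1 - x / K) - x * y
    dy = rho * (x * y - y)
    return dx, dy

def equilibria(eps, K):
    """The four equilibria: extinction, threshold, prey-only, co-existence."""
    y_star = (1 - 1 / K) * (1 - eps) / (1 + eps)  # formula (1.42)
    return [(0.0, 0.0), (eps, 0.0), (K, 0.0), (1.0, y_star)]

eps, K = 0.2, 2.0
for (x, y) in equilibria(eps, K):
    print((x, y), rhs(x, y, eps, K))
```

Each printed right-hand side should vanish up to round-off. Note that y∗ is positive only when K > 1, so the co-existence equilibrium lies in the open first quadrant only in that case.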

In studying (1.41) we assume that

0 < ε < min{K, 1}: (1.43)

we want ε < K so that the carrying capacity exceeds the threshold for extinction, and when K > 1, we want ε < 1 so that the prey population at coexistence exceeds

¹⁴ Although (1.41) is physically motivated, we are not claiming that it is a realistic model for population growth.

Part (a)
Equilibrium    Description
(0, 0)         Extinction
(ε, 0)         Extinction threshold
(K, 0)         Prey-only equilibrium
(1, y∗)        Co-existence equilibrium

Part (b)
Region    Characterizing inequalities             Generic long-time behavior
I         (1 + 2ε − ε²)/(2ε) < K and ε < 1         Converges to extinction
II        1 < K < (1 + 2ε − ε²)/(2ε)               Converges to extinction or the co-existence equilibrium
III       ε < K < 1                                Converges to extinction or the prey-only equilibrium

Table 1.1: Part (a): Equilibria of (1.41), the Lotka-Volterra system augmented by logistic growth and the Allee effect. At the co-existence equilibrium y∗ is given by (1.42). Part (b): Generic long-term behavior of solutions of (1.41), depending on ε, K. Regions I, II, and III refer to Figure 1.11(a). (“Generic” is a rough synonym for “typical”; see Additional Notes, Section 1.8.)



the threshold for extinction.

In contrast to (1.39), solutions of the perturbed equation (1.41) are almost never periodic. Rather, as illustrated in Figure 1.11(b)-(d), solutions converge to one of the equilibria of (1.41) as t → ∞. In Figure 1.11(a) we have identified three regions in the subset of the ε, K-plane defined by (1.43). In Figure 1.11(b)-(d) we show typical trajectories that occur for ε, K in each of the three regions, and in Table 1.1 we describe their behavior as t → ∞ in words.

Note that the following surprising behavior is contained in the above summary. Imagine starting with ε, K in Region II and initial conditions such that the solution of (1.41) converges to the co-existence equilibrium. Now increase K so that (ε, K) moves into Region I. Although increasing the carrying capacity seems like it should promote the overall health of the system, this parameter change leads to a worse fate: total extinction!
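The region boundaries quoted in Table 1.1 and Figure 1.11 are simple enough to encode. The following is a small Python sketch; the function name `region` and the treatment of boundary points are hypothetical choices made here for illustration.

```python
def region(eps, K):
    """Return 'I', 'II', or 'III' for parameters satisfying 0 < eps < min(K, 1)."""
    if not (0 < eps < min(K, 1)):
        raise ValueError("parameters must satisfy 0 < eps < min(K, 1), as in (1.43)")
    threshold = (1 + 2 * eps - eps**2) / (2 * eps)  # curve separating Regions I and II
    if K > threshold:
        return "I"
    if 1 < K < threshold:
        return "II"
    if K < 1:
        return "III"
    return "boundary"  # K lies exactly on one of the dividing curves

# Parameter values used in Figure 1.11, panels (b), (c), and (d).
for K in (4.0, 2.0, 0.4):
    print(K, region(0.2, K))
```

With ε = 1/5 the dividing curve sits at K = 3.4, so raising the carrying capacity from K = 2 to K = 4 crosses it, which corresponds to passing from panel (c) to panel (b) of Figure 1.11.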

1.7 Software for numerical solution of the IVP

Previously, we mentioned that it is rarely possible to produce explicit solutions of initial value problems. This raises the question: how might we describe solutions of differential equations that are resistant to analytical techniques such as those in Section 1.3? Two of the most common approaches are

• the qualitative approach: Given an analytically intractable DE, produce a linear, constant-coefficient DE (Section 1.3.1) which has qualitatively similar dynamics to the original DE (at least locally¹⁵); and

• the numerical approach: Use computer software/algorithms to approximate the solution of an initial value problem over some time interval.

This section concerns the latter approach and, although numerical methods are not the central theme of this textbook, we want the reader to be aware of their importance in the study of DEs.

There is a myriad of software available for numerical solution of IVPs, some commercially available and some freely available. One free program we urge you to download and install is XPP, which was developed by Professor G. Bard Ermentrout of the University of Pittsburgh. The purpose of XPP is to numerically solve differential equations, difference equations, delay equations, functional equations, boundary value problems, and stochastic equations. Because XPP is bundled with another program called AUTO (a tool for exploring bifurcations; see Chapters 7 and 8), XPP is also known as XPPAUT.

We have developed a website for readers who wish to supplement our textbook with the XPP software: Please visit

¹⁵ This approach will be developed in detail in Chapter 5.

Figure 1.11: Panel (a): Three regions in ε-K parameter space for which solutions of (1.41) have different dynamical behavior. The boundaries of the three regions are formed by the curves K = ε, K = 1, and K = (1 + 2ε − ε²)/(2ε). Panels (b) through (d): Solution trajectories of (1.41) in the x, y-plane with ρ = 1, ε = 1/5 and three different carrying capacities K. Each panel shows trajectories corresponding to two choices of initial conditions: x(0) = y(0) = 0.5 and x(0) = y(0) = 2.0. Panel (b): With K = 4, both species go extinct. Growing oscillations in prey population ultimately lead to the prey’s extinction as a result of the Allee effect. Panel (c): With K = 2, the initial conditions determine whether both species go extinct or whether their populations experience transient oscillation en route to the coexistence equilibrium. Panel (d): If K = 2/5, the predators go extinct, but the prey may either go extinct or equilibrate to K depending upon the initial conditions. In Panels (c) and (d) the prey-only equilibrium at (K, 0) is indicated, but in Panel (b) this equilibrium lies outside the range of x in the figure.

http://www.math.duke.edu/∼jcain/book/main.html

for (i) instructions on downloading and installing XPP as well as a link to the official XPP website; (ii) XPP code that we have written to solve DEs that appear in this textbook (including exercises); and (iii) step-by-step instructions on how to use our XPP code to explore solutions of initial value problems. For example, we have included XPP code for solving some of the equations in this chapter, including the Riccati equation (1.6), the Duffing equation (1.25), the van der Pol equation (1.31), the modified Lotka-Volterra system (1.41), and the vibrated pendulum equation (1.51).

A few tips, disclosures, and disclaimers that you should be aware of, some of which are specific to XPP:

1. The instructions and code that we posted on the above website assume that the reader will run XPP under the Windows® operating system. If you install and run XPP under a different operating system, there may be slight discrepancies between our instructions and what you see on your computer screen. We’ll rely on you to adapt and improvise as needed.

2. Be sure to try out our first two examples of XPP code under the “Chapter 1” link on the website (Riccati and van der Pol equations).

3. You will find that most of the syntax in XPP is easy, but there are a few conventions and quirks that you should be aware of. Examples of conventions: DEs that are second-order or higher must be written as systems of first-order DEs, and the default variable name for the independent variable is t. Examples of quirks: XPP has a couple of default settings that cause it to stop computing if either (i) a variable ever exceeds 100 in magnitude or (ii) more than 5000 data points are generated. Dealing with such issues is very straightforward as long as you are aware that they exist, and we have compiled a list of advice under the “XPP Installation and Syntax” link on the above website.

4. Caveat emptor! When using any software for numerical solution of initial value problems, you need to be careful. Just because software uses mathematical algorithms that are intended to approximate solutions does not guarantee that those approximations will be satisfactory. You may be alarmed to learn that numerical methods can perform poorly even for DEs that are not “rigged” for pathological behavior. In the Chapter 4 exercises, you will see that the simplest numerical method (Euler’s method) for approximating the solution of an IVP can be an utter disaster if applied to the seemingly innocent problem x′ = −Mx, x(0) = 1, where M is a large constant. Conversely, you should be relieved to know that numerical methods work beautifully for most IVPs, and are a wonderful tool for gaining intuition regarding a system’s behavior. Even if an IVP does not admit an explicit solution, it is usually possible to tune

a numerical method so as to generate an approximation of the true solution that is accurate to within a user-specified error tolerance. The trade-off is that the less error you are willing to accept, the longer it will take for the software to generate the approximate solution. Readers interested in these and other issues should consult texts in numerical analysis, such as [1].
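To make the warning about Euler’s method concrete, here is a short Python sketch of the forward Euler iteration applied to x′ = −Mx, x(0) = 1. It is only an illustration: the function name `euler_final` and the values M = 100, h = 0.05, and h = 0.001 are arbitrary choices made here. Each Euler step multiplies x by (1 − Mh), so the iterates decay only when |1 − Mh| < 1, that is, when h < 2/M.

```python
import math

def euler_final(M, h, t_end=1.0):
    """Forward Euler for x' = -M*x, x(0) = 1; returns the approximation to x(t_end)."""
    x = 1.0
    for _ in range(int(round(t_end / h))):
        x += h * (-M * x)  # one Euler step: x_{n+1} = (1 - M*h) * x_n
    return x

M = 100.0
exact = math.exp(-M * 1.0)          # true solution at t = 1
too_coarse = euler_final(M, 0.05)   # h > 2/M: iterates grow in magnitude
fine_enough = euler_final(M, 0.001) # h < 2/M: iterates decay, as they should
print(exact, too_coarse, fine_enough)
```

With the coarse step the computed value is astronomically large even though the true solution is nearly zero; with the fine step the approximation is at least qualitatively right. More sophisticated solvers choose the step size adaptively or use implicit methods for such problems.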

1.8 Exercises

1.8.1 Exercises to consolidate your understanding

1. Supply details omitted in the text.

(a) Show that every solution of x′ = αx is of the form Ce^{αt} for some constant C. (Hint: Show that, for a solution x(t), the derivative of e^{−αt}x is zero.)

(b) Verify that formula (1.16) solves the first-order linear equation (1.15).

(c) Prove the following claims about the logistic equation made in the text.

• Check that (1.19) satisfies (1.2).

• Verify that the choice (1.20) satisfies the initial condition x(0) = b.

• Show the logistic equation has a solution for all positive time if the initial datum b is positive.

(d) Derive (1.30), the equation for energy dissipation in (1.27).

2. Construct ODEs with the following properties. The ODEs should be in the standard form where the highest derivative has been “solved for”.

(a) A third-order scalar ODE that is nonlinear and nonautonomous.

(b) A fifth-order, linear, homogeneous scalar equation with constant coefficients such that every solution tends to zero as t → ∞. (Hint: Start by making up a fifth-order polynomial with all its roots in the left half-plane.)

(c) A nonlinear autonomous three-dimensional system¹⁶.

3. Find solutions for the following equations or IVPs using separability. (See also Exercise 5 below.)

¹⁶ Perhaps the most famous such system is the Lorenz equations, (7.5).

(a) The Gompertz model for tumor growth, in which the center is starved for oxygen (see p. 217 of Edelstein-Keshet []):

dN/dt = µe^{−αt}N.

(b) The logistic equation with constant harvesting:

x′ = x(1 − x) − µ

where µ is a positive constant.

Discussion: The cases 0 < µ < 1/4, µ = 1/4, and 1/4 < µ must be treated separately. Think about the equilibrium equation x(1 − x) − µ = 0 to understand why the behavior of this equation changes at µ = 1/4.

In this equation it is assumed that constant harvesting continues even as x → 0, which of course is unsustainable. This faulty assumption is related to the fact that this equation can predict negative populations.
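To make the role of µ = 1/4 concrete, here is a minimal sketch in Python (our choice of language; the text does not prescribe one). It counts the real roots of the equilibrium equation x(1 − x) − µ = 0, rewritten as x² − x + µ = 0, through its discriminant 1 − 4µ. The function name `harvest_equilibria` and the sample values of µ are illustrative assumptions only.

```python
import math

def harvest_equilibria(mu):
    """Real roots of x(1 - x) - mu = 0, i.e. of x^2 - x + mu = 0."""
    disc = 1.0 - 4.0 * mu          # discriminant of x^2 - x + mu
    if disc < 0:
        return []                  # no equilibria when mu > 1/4
    if disc == 0:
        return [0.5]               # a single (double) root when mu = 1/4
    root = math.sqrt(disc)
    return [(1.0 - root) / 2.0, (1.0 + root) / 2.0]

# Illustrative values, one from each of the three cases in the Discussion
for mu in (0.1, 0.25, 0.3):
    print(mu, harvest_equilibria(mu))
```

The number of equilibria drops from two to one to zero as µ passes through 1/4, which is why the three cases must be treated separately when separating variables.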

(c) The pedagogical example, (1.7), manipulated into standard form:

\[ x' = \sqrt{x^2 - 1}, \qquad x(0) = 1. \tag{1.44} \]

Discussion: You will most likely find the solution x(t) = cosh t. However, x(t) ≡ 1 is also a solution! In Chapter 3 we will give conditions that guarantee that the initial-value problem has a unique solution. In the meantime, you may want to ponder what misbehavior of $\sqrt{x^2 - 1}$ leads to this nonuniqueness.

(d) x′ = −1/x, x(0) = 1.

Discussion: The RHS of this equation is singular at x = 0. Be alert to what behavior results from this singularity.

(e) A system that will be used as an illustration in later chapters:

\[
\begin{aligned}
x' &= x - y - (x^2 + y^2)x, & x(0) &= b_1, \\
y' &= x + y - (x^2 + y^2)y, & y(0) &= b_2.
\end{aligned}
\]

Hint: Solving this system would be hopeless except for the fact that it may be rewritten in polar coordinates

\[ r' = r(1 - r^2), \qquad \theta' = 1, \]

in which the two equations are uncoupled.
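The polar form in the hint can be spot-checked numerically before it is derived by hand. The sketch below (Python, standard library only) uses the identities r r′ = x x′ + y y′ and r² θ′ = x y′ − y x′, which hold away from the origin; the function names and the sample points are hypothetical choices made for illustration.

```python
import math

def rhs_cartesian(x, y):
    """Right-hand side of the planar system in Cartesian coordinates."""
    r2 = x * x + y * y
    return x - y - r2 * x, x + y - r2 * y

def polar_rates(x, y):
    """Convert (x', y') into (r', theta') via r r' = x x' + y y' and
    r^2 theta' = x y' - y x' (valid away from the origin)."""
    dx, dy = rhs_cartesian(x, y)
    r = math.hypot(x, y)
    return (x * dx + y * dy) / r, (x * dy - y * dx) / (r * r)

# Arbitrary sample points: compare r' with r(1 - r^2) and theta' with 1
for x, y in [(0.3, 0.4), (-1.2, 0.5), (2.0, -1.0)]:
    r = math.hypot(x, y)
    r_dot, theta_dot = polar_rates(x, y)
    print(f"r'={r_dot:.6f}  r(1-r^2)={r * (1 - r * r):.6f}  theta'={theta_dot:.6f}")
```

Agreement at a few points is of course not a proof, but it is a quick way to catch sign errors in the hand computation.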

4. (a) Show that if $D_1, D_2 \in \mathbb{C}$, then

\[ x(t) = D_1 e^{it} + D_2 e^{-it} \tag{1.45} \]

is a complex-valued solution of (1.3). Also show that if $D_1 = \overline{D_2}$, where the bar indicates complex conjugation, then (1.45) is real-valued.

(b) Show that for any solution x(t) of the form (1.12), there exist real constants C, δ such that

\[ x(t) = C \sin(t + \delta), \tag{1.46} \]

and conversely.

Remark: This exercise illustrates that other representations of solutions of (1.3) are possible.

1.8.2 Exercises referenced elsewhere in this book

5. In this exercise the reader develops evidence that nonconstant solutions of the Lotka-Volterra equations are periodic.

(a) Although it is not possible to solve the Lotka-Volterra equations for x and y as functions of t, it is possible to eliminate time and derive an implicit relation between x and y for the orbits. We derive an ODE for y as a function of x by the chain rule

\[ \frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{\rho(xy - y)}{x - xy}, \tag{1.47} \]

where we have substituted (1.39) for the second equality. Let L(x, y) be defined by (1.40). Derive

\[ L(x, y) = \text{const} \tag{1.48} \]

as an implicit solution of (1.47), preferably working directly from (1.47) using the fact that this equation is separable, or alternatively simply by differentiating (1.40).

(b) Verify that the level sets of L(x, y) are closed curves.

Discussion: Combining (a) and (b), we see that the trajectories of (1.39) are contained in closed curves. To complete the proof that every nonconstant trajectory is periodic, we would have to rule out the possibility that a trajectory might not complete the circuit around the closed curve. Although this is not beyond our present capabilities, such arguments will be much easier after we have developed more theory, so we leave this gap open for the time being. (Cf. Chapter 6.)

6. (a) Consider an inhomogeneous, linear scalar ODE of order n, say

\[ x^{(n)} + a_1(t)x^{(n-1)} + a_2(t)x^{(n-2)} + \ldots + a_{n-1}(t)x' + a_n(t)x = f(t). \tag{1.49} \]

Let $x_{\text{partic}}(t)$ be some solution of (1.49). (Such a solution is called a particular solution, which provides the mnemonic for the subscript.) Show that any solution x(t) of (1.49) can be written in the form

\[ x(t) = x_{\text{partic}}(t) + x_{\text{homog}}(t) \]

where $x_{\text{homog}}(t)$ satisfies the homogeneous equation: i.e., (1.49) with the inhomogeneous term f(t) set equal to zero.

Remark: This idea (i.e., particular solution plus homogeneous solution) is taught in all elementary courses on ODE.

(b) Consider periodic forcing of a spring-mass system

mx′′ + βx′ + kx = C cos ωt. (1.50)

Find a particular solution of this equation by looking for a solution in the form $x_{\text{partic}}(t) = A \cos \omega t + B \sin \omega t$.

(c) Show that, provided β > 0, every solution of (1.50) tends to $x_{\text{partic}}(t)$ as t → ∞.

(d) Graph the amplitude $\sqrt{A^2 + B^2}$ in $x_{\text{partic}}(t)$ as a function of ω, both for large β and small β. Note that in the latter case the amplitude has quite a large spike if ω is close to the frequency $\sqrt{k/m}$ of the undamped oscillator.

Remark: This is our first encounter with the phenomenon of resonance.

7. Consider the two-dimensional system

\[
\begin{aligned}
\varepsilon x' &= 1 - x, \\
y' &= x - y,
\end{aligned}
\]

where ε is a small positive parameter, subject to initial conditions

x(0) = a, y(0) = b.

(a) Note that the x-equation does not involve y. Use this observation to solve the above initial-value problem, treating the x-equation and the y-equation sequentially.

(b) Given that ε ≪ 1, it is tempting to consider an approximation setting ε = 0. This approximation has the alarming effect of transforming the differential equation εx′ = 1 − x into a purely algebraic equation, x = 1. Ignoring the warning bells that such a violent approximation sets off, nevertheless substitute x ≡ 1 into the y-equation and solve the IVP

y′ = x − y, y(0) = b.

Discussion: Observe that, apart from an initial transient during which x tends rapidly to 1, the two solutions closely track one another. This is the first instance of a theme that appears frequently in applied math. The original system has two widely separated time scales: the x-equation tends to an equilibrium in a short time on the order of ε, while the y-equation evolves more slowly. The approximation is to let the rapid variable “proceed to equilibrium”, which results in a simpler problem for the remaining variable(s). This is an exceedingly useful approximation in many contexts, but caution is required in using it.
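The tracking described in the Discussion can also be observed numerically. The following is a rough sketch, not the software of Section 1.7: it integrates the full system and the reduced equation y′ = 1 − y with a hand-written classical Runge-Kutta step. The values ε = 0.01, a = b = 0, the final time 2, and the step size 0.001 are all assumptions chosen for illustration (the step is kept well below ε so that the explicit method remains stable on the fast x-equation).

```python
def rk4_step(f, t, state, dt):
    """One classical fourth-order Runge-Kutta step for state' = f(t, state)."""
    k1 = f(t, state)
    k2 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k1)])
    k3 = f(t + dt / 2, [s + dt / 2 * k for s, k in zip(state, k2)])
    k4 = f(t + dt, [s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate(f, state, t_end, dt):
    """Integrate from t = 0 to t_end with a fixed step; return the final state."""
    t = 0.0
    for _ in range(round(t_end / dt)):
        state = rk4_step(f, t, state, dt)
        t += dt
    return state

EPS = 0.01          # illustrative value of the small parameter
A0, B0 = 0.0, 0.0   # illustrative initial data x(0) = a, y(0) = b

def full_system(t, s):
    x, y = s
    return [(1.0 - x) / EPS, x - y]

def reduced_system(t, s):
    (y,) = s
    return [1.0 - y]   # x has been replaced by its equilibrium value 1

x_full, y_full = integrate(full_system, [A0, B0], 2.0, 1e-3)
(y_reduced,) = integrate(reduced_system, [B0], 2.0, 1e-3)
print(x_full, y_full, y_reduced, abs(y_full - y_reduced))
```

Repeating the run with smaller ε (and a correspondingly smaller step) is one way to see how the gap between the two values of y shrinks; evaluating at a very early time instead shows the initial transient where the approximation is poor.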

1.8.3 Computational Exercises

In addition to the explicit exercises below, we invite the reader to use the software to check any statements made in the text. For example, although we prove that every solution of (1.39) (in the first quadrant) is periodic, it may be reassuring to see this behavior in computed solutions. Incidentally, Exercise 13 also has a computational component.

8. Compare numerical solutions of the logistic equation (1.2) with the analytical solution (1.19).

Discussion: This exercise is more for practice in using the software than for any interesting math. One particular lesson is to see how the software behaves in case the solution (1.19) “blows up”, which happens for some positive time if the initial datum b is negative. The blowup may be seen in better detail if one plots y, or rather |y|, on a log scale.

9. This exercise is intended to show that typical solutions of the van der Pol equation (1.34) tend to periodic behavior as t → ∞.

Figure 1.12: Schematic of the vertically-vibrated pendulum of Exercise 11.

(a) Set β = 1 and solve the initial-value problem for several choices of initial conditions. As long as b ≠ 0, you will see the periodic solution in Figure 1.9 emerge.

(b) Choose some other values of β and repeat the above computation. You will again see periodic behavior, but the exact orbit depends on β.

10. For the augmented equation (1.41), verify the behavior claimed in Figure 1.11.

11. Discussion: Consider a pendulum whose supporting pin is vibrated vertically (see Figure 1.12). The next computation demonstrates the amazing fact that rapid vibration of the pin can make the “straight up” position of the pendulum stable! If the height of the pin is −A cos ωt and if friction is small, then the displacement x of the pendulum approximately satisfies an equation of the form

\[ x'' + \beta x' + [1 + \alpha\omega^2 \cos \omega t] \sin x = 0 \tag{1.51} \]

where α is proportional to A, the amplitude of the vibrations. This differs from (1.4) by two terms: βx′, which models friction, and the term proportional to the acceleration of the pin, $A\omega^2 \cos \omega t$.

(a) Write equation (1.51) as a first order system.

(b) Start with the pendulum at rest and nearly vertical, say x(0) = 3.1, and let α = 0.1. Solve the equations for various ω’s, say starting from ω = 1 and increasing it repeatedly. If ω >??, the pendulum will come to rest in the straight-up position!



1.8.4 Exercises of independent interest

12. A point b is called an equilibrium of x′ = f(x) if f(b) = 0. In such a case x(t) ≡ b is a solution of the equation. Considering only scalar equations, make an educated guess (don’t bother with a “proof”) which of the following two statements is associated with f′(b) > 0 and which with f′(b) < 0:

(a) If the initial datum x(0) is sufficiently close to b, then x(t) tends to b as t → ∞.

(b) No matter how close the initial datum x(0) may be to b, the solution x(t) moves further away from b as t increases.

Remark: This problem anticipates the concept of stability in Chapter 5.

13. (a) Verify numerically the behavior conjectured in Section 1.2.2 for the Riccati equation, (1.6). Specifically,

• Compute that for negative and small positive values of b the solution asymptotes to the parabola $x^2 - t = 0$.

• Compute that for large values of b the solution appears to blow up in finite time.

• Locate the initial datum b∗ that separates the two behaviors.

Discussion: In the remainder of this exercise we attempt to obtain analytical information about the asymptotic behavior as t → ∞ of solutions of (1.6).

(b) Show that there are formal series solutions of this equation,

\[ x_+(t) = \sqrt{t} + a_0 \frac{1}{t} + a_1 \frac{1}{t^{5/2}} + \ldots \quad \text{and} \quad x_-(t) = -\sqrt{t} + b_0 \frac{1}{t} + b_1 \frac{1}{t^{5/2}} + \ldots, \]

series in inverse powers of $t^{3/2}$.

Discussion: When we say formal series, we are allowing the possibility that the series may not converge. Thus to show that a formal series solution exists, you need only derive a recursion relation for successive coefficients in the series. You should also calculate the first few coefficients in each series.

The series $x_+$ separates the two asymptotic behaviors of solutions of (1.6); $x_-$ characterizes the asymptotic behavior of all solutions that remain in the lower-half plane x < 0 as t → ∞.

(c) Since the series are based on inverse powers, they are useful in the limit t → ∞. Compare, for large t, say t > 10, the sum of the first few terms of these series with numerical solutions of (1.6).

1.9 Additional notes

1.9.1 Miscellaneous

There is a general construction to reformulate a nonautonomous system of ODE in d dimensions, say

\[ y' = F(y, t), \]

where $y(t) = (y_1(t), \ldots, y_d(t))$ and $F : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$, as an autonomous system in one higher dimension. Form a new unknown $z(t) = (z_1(t), \ldots, z_d(t), z_{d+1}(t))$, a (d + 1)-dimensional vector, by appending an additional variable. Using the notation

\[ z(t) = (\tilde{z}(t), z_{d+1}(t)) \quad \text{where} \quad \tilde{z}(t) = (z_1(t), \ldots, z_d(t)), \]

we require that z(t) satisfy

\[ z' = G(z) \]

where $G : \mathbb{R}^{d+1} \to \mathbb{R}^{d+1}$ is defined by

\[ G(z) = \begin{pmatrix} F(\tilde{z}, z_{d+1}) \\ 1 \end{pmatrix}. \]

We may connect the two systems by observing that $z_{d+1}(t)$ is essentially equivalent to time. To see this, it may be helpful to write out both equations in components.
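The construction translates almost word for word into code. The sketch below, in Python, wraps a hypothetical right-hand side F(y, t) into an autonomous G(z) by appending the equation $z_{d+1}' = 1$. The names `autonomize` and `euler`, the sample equation y′ = −y + t, and the use of forward Euler are illustrative assumptions, not anything prescribed in the text.

```python
def autonomize(F):
    """Given F(y, t) returning a list of d rates, return G(z) for the
    augmented unknown z = (y_1, ..., y_d, z_{d+1})."""
    def G(z):
        y, tau = z[:-1], z[-1]            # tau plays the role of z_{d+1}
        return list(F(y, tau)) + [1.0]    # last component: z_{d+1}' = 1
    return G

def euler(G, z0, dt, steps):
    """Forward Euler for the autonomous system z' = G(z); illustration only."""
    z = list(z0)
    for _ in range(steps):
        z = [zi + dt * gi for zi, gi in zip(z, G(z))]
    return z

# Hypothetical nonautonomous example: y' = -y + t with y(0) = 1
F = lambda y, t: [-y[0] + t]
G = autonomize(F)

# Choosing z_{d+1}(0) = 0 makes the extra variable coincide with time
z = euler(G, [1.0, 0.0], 0.001, 1000)
print(z)
```

After integrating to t = 1 the last component of z is (up to rounding) equal to 1, which is the sense in which $z_{d+1}$ is equivalent to time; a different initial value for $z_{d+1}$ would simply shift the clock.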

In Section 1.3 we suggested that one might attempt to visualize solutions of (1.27), a particle moving in a potential V(x), as the motion of a marble rolling in the x, z-plane along a curve given by the equation z = V(x). Quantitatively this analogy fails badly. In the first place, rolling introduces a whole new level of complexity: one needs to distinguish between rolling with and without slipping, which requires examining friction between the marble and the surface. Suppose we completely ignore rolling; perhaps solutions of (1.27) are analogous to a particle sliding (with minimal friction) in the x, z-plane along a curve given by the equation z = V(x). However, this analogy is also flawed, even if one ignores the possibility of the marble moving so rapidly that it lifts off the curve. Specifically, for sliding along a curve, motion in the z-direction influences the x-component. Precise equations for sliding along a curve are most easily derived with the Lagrangian formulation of mechanics [?].

Equation 1.41 is a system of ODEs that contains several parameters, and the behavior of solutions depends on the parameters. Situations of this type will arise frequently in this book.

1.9.2 The concept “generic”

Let us explore the meaning of the term generic used in Table 1.6.2. This is our first encounter with this useful, but perhaps over-used, concept. Although the term is a rough synonym for “typical”, it carries a lot of mathematical associations that can be learned only by exposure over time. Let’s get started now.

Not every solution of (1.41) has long-time behavior as listed in the table. For example, no matter what ε, K may be, the threshold equilibrium solution (x(t), y(t)) ≡ (ε, 0) does not move away from its initial condition (x(0), y(0)); in particular it does not converge to any of the equilibria listed in Part (b) of Table 1.6.2. On the other hand, with the slightest perturbation of the initial conditions from (ε, 0), the solution will evolve in time and (probably) converge to some other equilibrium, which one depending on the perturbation. Robustness is one association of generic; thus, we dismiss the equilibrium solution (x(t), y(t)) ≡ (ε, 0) as non-generic.

Here is a more interesting example of a non-generic solution. (Although we describe it in words, the message is more vivid if you check what we say with your own computations.) In Figure 1.11(b) imagine a one-parameter family of initial conditions lying on the line y = 0.5, say (x(0), y(0)) = (b, 0.5) where 0 < b < 1. On the one hand, if b is close to 0, the solution will converge to extinction, like the upper trajectory in the figure. On the other hand, if b is close to 1, the solution will converge to the prey-only equilibrium, like the lower trajectory in the figure. By continuity, somewhere in between these extremes is an initial condition (b∗, 0.5) that separates these behaviors. As one might guess, the solution with initial condition (b∗, 0.5) in fact converges to the threshold equilibrium, (ε, 0), as t → ∞. However, we again dismiss this as non-generic since perturbing x(0) to either side of b∗ leads to qualitatively different behavior.

Yet another example of generic/non-generic behavior is contained in Figure 1.1. Generically, solutions of the Riccati equation (1.6) either blow up in finite time or asymptote to the parabola $x = -\sqrt{t}$. The dividing case with x(0) = b∗ is a non-generic solution.

In the preceding paragraphs we have spoken of non-generic solutions of one specific equation. One may also speak of a non-generic equation. Indeed, this term describes the Lotka-Volterra system (1.33) perfectly: every non-constant solution of this system is periodic, but an arbitrarily small perturbation of the equation, such as (1.41) with ε ≪ 1 and K ≫ 1, can completely change the phase plane.

Stay tuned for more occurrences of this concept, but not till Chapter 5.



Chapter 2

Linear systems with constant coefficients

2.1 Preview

The bulk of this chapter is devoted to homogeneous linear systems of ODEs with real constant coefficients. Such a system may be written

\[
\begin{aligned}
x_1' &= a_{11}x_1 + a_{12}x_2 + \ldots + a_{1d}x_d \\
x_2' &= a_{21}x_1 + a_{22}x_2 + \ldots + a_{2d}x_d \\
&\;\;\vdots \\
x_d' &= a_{d1}x_1 + a_{d2}x_2 + \ldots + a_{dd}x_d.
\end{aligned}
\tag{2.1}
\]

(From now on we shall use d for the dimension of our systems so that the index n is available for other uses.) The written-out system (2.1) is awkward to read or write, and we shall normally use the vastly more compact linear-algebra notation

x′ = Ax (2.2)

where $x = (x_1, x_2, \ldots, x_d)$ is a d-dimensional vector of unknown functions, A is a d × d matrix with real entries, and matrix multiplication is understood in writing Ax. In vector notation, an initial condition for (2.2) is

x(0) = b (2.3)

where $b \in \mathbb{R}^d$.

In Section 2.2 we show that the initial-value problem (2.2), (2.3) has the unique solution given by the formula

\[ x(t) = e^{At}b \tag{2.4} \]

where the exponential of a matrix is defined in complete analogy with the exponential of a scalar,

\[ e^{At} = I + At + \frac{1}{2!}(At)^2 + \frac{1}{3!}(At)^3 + \ldots. \tag{2.5} \]

Our first task, the goal of Section 2.2, is to show that the series (2.5) converges and to establish the basic properties of the matrix exponential. Note that each term in the series is a square matrix and hence, if it converges, the sum is a d × d matrix; thus the RHS of (2.4) is dimensionally consistent as a matrix product. Also note that, unlike for dimension one, the two factors in (2.4) must be written in the order in which they appear.
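To see the series (2.5) and formula (2.4) at work on a small case, here is a minimal Python sketch that sums the first few terms by brute force. The helper names (`mat_mul`, `expm_series`, `apply`), the truncation at 30 terms, and the sample matrix are our own illustrative choices. For the system x₁′ = x₂, x₂′ = −x₁ with x(0) = (1, 0), the solution is (cos t, −sin t), which gives something to compare against.

```python
import math

def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_series(A, t, terms=30):
    """Partial sum I + At + (At)^2/2! + ... + (At)^terms/terms!."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        term = [[entry / k for entry in row] for row in mat_mul(term, At)]  # (At)^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def apply(M, v):
    """Matrix-vector product M v (matrix on the left, as in (2.4))."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[0.0, 1.0], [-1.0, 0.0]]   # x1' = x2, x2' = -x1
b = [1.0, 0.0]
t = 1.0
x = apply(expm_series(A, t), b)
print(x, [math.cos(t), -math.sin(t)])
```

Summing the series directly is adequate for a demonstration like this one, but it is not how one would normally compute a matrix exponential in practice.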

It turns out that using the series (2.5) is rarely the most convenient way to compute $e^{At}$. In Section 2.3 we discuss how to compute the exponential of a matrix by finding its eigenvalues and eigenvectors. Some linear-algebra background for this is reviewed in Appendix C.

The following simple calculation motivates the appearance of the eigenvalue problem in solving linear systems. We ask whether, in analogy with a scalar linear ODE, there might be for some λ solutions of the vector equation (2.2) of the form $e^{\lambda t}$ times a constant. Of course the “constant” would have to be a vector in order to have a function of the appropriate dimension. Thus we refine our question to: are there any scalars λ and any vectors $v \in \mathbb{R}^d$ such that

\[ x(t) = e^{\lambda t}v \tag{2.6} \]

is a solution of (2.2)? Making this substitution, we calculate for the two sides of (2.2):

\[ x'(t) = \lambda e^{\lambda t}v, \qquad Ax(t) = e^{\lambda t}Av. \]

The two sides of (2.2) are equal iff

Av = λv

where we have canceled the exponential factor, which is nonzero. In other words, (2.6) is a solution of (2.2) if and only if v is an eigenvector of A with eigenvalue λ.

In the two final sections of the chapter, we discuss the asymptotic behavior of solutions of (2.2) as t → ∞, and we give formulas for solving an inhomogeneous equation.



2.2 Definition and properties of the matrix exponential

2.2.1 Preliminaries about norms

To proceed, we need to define what it means for a series of matrices like (2.5) to converge. We could define convergence of a series of matrices in terms of the convergence of each entry, considered as a sequence of real numbers. However, proofs are simpler if we introduce a metric on the set of matrices and use it to define convergence. The metric is based on how matrices act when multiplying vectors, so we begin with some concepts related to vectors.

For any vector $x \in \mathbb{R}^d$, the d-dimensional generalization of the Pythagorean Theorem suggests that we define the length of x, written |x|, by

\[ |x| = \sqrt{\sum_{j=1}^{d} x_j^2}. \tag{2.7} \]

In analysis it is more common to call the expression |x| the norm of x rather than its length, and we shall follow that usage. Note that the norm may be expressed as

\[ |x| = \sqrt{\langle x, x \rangle} \tag{2.8} \]

where $\langle \cdot, \cdot \rangle$ denotes the usual inner product on $\mathbb{R}^d$,

\[ \langle x, y \rangle = \sum_{j=1}^{d} x_j y_j. \tag{2.9} \]

In two and three dimensions it is known that

\[ \langle x, y \rangle = |x|\,|y| \cos\theta \tag{2.10} \]

where θ is the angle between x and y. In d dimensions, (2.10) is used to define the angle between two vectors. The following lemma, the Cauchy-Schwarz inequality, supports this definition by guaranteeing that cos θ computed from (2.10) is at most unity in absolute value.

Lemma 2.2.1. For any vectors $x, y \in \mathbb{R}^d$,

\[ |\langle x, y \rangle| \le |x|\,|y|. \]

Proof. If y = 0, both sides of the inequality vanish and the result is trivial, so we assume y ≠ 0. Consider choosing a constant c to minimize

\[ |x + cy|^2 = |x|^2 + 2c\langle x, y \rangle + c^2|y|^2. \tag{2.11} \]



Figure 2.1: Illustrating the triangle inequality.

To find the minimum, we differentiate (2.11) with respect to c, set the derivative equal to zero, and solve the resulting trivial linear equation for c, obtaining

\[ c_* = -\frac{\langle x, y \rangle}{|y|^2}. \]

On substituting into (2.11) and combining terms, we find

\[ |x + c_* y|^2 = |x|^2 - \frac{\langle x, y \rangle^2}{|y|^2}. \]

Since $|x + c_* y|^2 \ge 0$, the lemma follows.

In the next lemma we collect several simple but fundamental properties of the norm function.

Lemma 2.2.2. For any vectors $x, y \in \mathbb{R}^d$ and scalar $c \in \mathbb{R}$,

(i) |x| ≥ 0, and |x| = 0 iff x = 0.
(ii) |cx| = |c| |x|.
(iii) |x + y| ≤ |x| + |y|.

Inequality (iii) is called the triangle inequality, for reasons suggested by Figure 2.1.

Proof. We leave the derivation of properties (i) and (ii) to the reader; we merely call the reader’s attention to the fact that the vertical bars in |c| and in |x| (the absolute value of a scalar and the norm of a vector) are subtly different. Regarding (iii), observe that

\[ |x + y|^2 = |x|^2 + 2\langle x, y \rangle + |y|^2. \]

Using Lemma 2.2.1 to bound the middle term, we compute that

\[ |x + y|^2 \le |x|^2 + 2|x|\,|y| + |y|^2 = (|x| + |y|)^2. \]

The result follows on taking a square root.

The norm of a vector specifies its “size”. The analogous quantity measuring size for matrices, written with double bars $\|A\|$ and also called the norm, is defined in terms of the operation of a matrix on vectors; specifically, if A is a $d_1 \times d_2$ matrix, we define

\[ \|A\| = \max_{|x| \le 1} |Ax|. \tag{2.12} \]

In this expression, for Ax to be defined, x must be a $d_2$-dimensional vector, while Ax has dimension $d_1$. Thus if $d_1 \ne d_2$, |x| and |Ax| are computed with respect to different spaces, even though the notation does not indicate this explicitly. In Exercise 1 we ask the reader to justify that the maximum (2.12) actually exists. This follows from compactness, a topic covered in real analysis courses and discussed below in Appendix B. Even if these ideas may seem rather abstract when first encountered in a theoretical course, we hope that seeing them used in specific applications will demystify them. In any case, the reader needs to become comfortable with them.

Let us collect useful properties of the matrix norm.

Lemma 2.2.3. For any matrices A, B of appropriate dimensions, for any vector $x \in \mathbb{R}^d$, and for any scalar $c \in \mathbb{R}$,

(i) $\|A\| \ge 0$, and $\|A\| = 0$ iff A = 0.
(ii) $\|cA\| = |c|\,\|A\|$.
(iii) $\|A + B\| \le \|A\| + \|B\|$.
(iv) $|Ax| \le \|A\|\,|x|$.
(v) $\|AB\| \le \|A\|\,\|B\|$.
(vi) $\|A^2\| \le \|A\|^2$.

These properties are of vital importance, and in the Exercises we ask you to verify them. Although this task probably seems less than exciting, we urge the reader not to skip over it lightly because it helps develop proficiency with the use of norms.

The next result relates the norm of a vector or matrix to information about thesize of its entries.

Lemma 2.2.4. If $x \in \mathbb{R}^d$, then

\[ \max_{1 \le j \le d} |x_j| \le |x| \le \sum_{j=1}^{d} |x_j|. \]

If A is a $d_1 \times d_2$ matrix with entries $a_{jk}$, then

\[ \max_{1 \le j \le d_1} \max_{1 \le k \le d_2} |a_{jk}| \le \|A\| \le \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} |a_{jk}|. \]

We refer the reader to Exercise 1 for hints on how to prove the matrix part of this lemma.
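The matrix bounds in Lemma 2.2.4 can be illustrated on a single example. The sketch below is a crude Python experiment, not a method from the text: for a matrix with two columns it estimates the maximum in (2.12) by sampling unit vectors x = (cos s, sin s). Sampling only |x| = 1 suffices because, by property (ii) of Lemma 2.2.2, |A(cx)| = |c| |Ax|. The function name, the number of samples, and the sample matrix are all assumptions.

```python
import math

def op_norm_estimate(A, samples=3600):
    """Crude estimate of max |Ax| over |x| <= 1 for a matrix with two
    columns, obtained by sampling unit vectors x = (cos s, sin s)."""
    best = 0.0
    for i in range(samples):
        s = 2.0 * math.pi * i / samples
        x = (math.cos(s), math.sin(s))
        Ax = [row[0] * x[0] + row[1] * x[1] for row in A]
        best = max(best, math.sqrt(sum(c * c for c in Ax)))
    return best

A = [[1.0, 2.0], [3.0, 4.0]]                    # illustrative matrix
lower = max(abs(a) for row in A for a in row)   # largest entry in absolute value
upper = sum(abs(a) for row in A for a in row)   # sum of absolute values of entries
est = op_norm_estimate(A)
print(lower, est, upper)
```

Because the sampled maximum can only underestimate the true maximum, the estimate is guaranteed to respect the upper bound; the lower bound is respected here with plenty of room to spare.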

Note that properties (i–iii) in Lemmas 2.2.2 and 2.2.3 are the same. Indeed, any function $\|\cdot\|$ from a vector space to the non-negative reals satisfying these three properties is called a norm. Given a norm, one may define the distance between two vectors in the space.^1 Thus, for vectors in $\mathbb{R}^d$ and for matrices we define

\[ \operatorname{dist}(x, y) = |x - y|, \qquad \operatorname{dist}(A, B) = \|A - B\|, \tag{2.13} \]

respectively.

2.2.2 Convergence

Given the notion of distance, one may define convergence. Specifically, for matrices, we say that a sequence $A_n$ of matrices converges to a limit L if

\[ \lim_{n \to \infty} \|A_n - L\| = 0. \]

In the usual way we say that an infinite series of matrices $\sum_{n=0}^{\infty} A_n$ converges if the sequence of partial sums $\sum_{n=0}^{N} A_n$ converges as N → ∞. More carefully, $\sum_{n=0}^{\infty} A_n$ converges to a limit matrix L if, for every ε > 0 there is an integer^2 $N_0$ such that

\[ N > N_0 \implies \Bigl\| \sum_{n=0}^{N} A_n - L \Bigr\| < \varepsilon. \]

Lemma 2.2.5. A sequence of matrices $A_n$ converges if and only if each sequence of entries converges, and likewise for an infinite series $\sum_{n=0}^{\infty} A_n$.

Proof. This result is easily proved using Lemma 2.2.4; we leave the details to the reader.

^1 In the next chapter, we shall explore this construction in an infinite-dimensional context.

^2 Sometimes in definitions of this sort one appends a restriction like $N_0 > 0$. This is necessary, for example, in the usual ε, δ definition of the continuity of a function of a real variable (i.e., for every ε > 0 there is a δ > 0 such that $|x - x_0| < \delta$ implies that $|f(x) - f(x_0)| < \varepsilon$): if δ were negative, the implication would be valid by virtue of its hypothesis never being satisfied. In the present case, no such restriction on $N_0$ is needed to avoid trivialities.



We shall say that a series $\sum_{n=0}^{\infty} A_n$ of matrices is absolutely convergent if $\sum_{n=0}^{\infty} \|A_n\| < \infty$.

Lemma 2.2.6. If $\sum_{n=0}^{\infty} A_n$ is absolutely convergent, then the series converges; i.e., there exists a matrix L such that

\[ \lim_{N \to \infty} \sum_{n=0}^{N} A_n = L. \]

Moreover, for any integer M,

\[ \Bigl\| \sum_{n=0}^{M} A_n - L \Bigr\| \le \sum_{n=M+1}^{\infty} \|A_n\|. \tag{2.14} \]

Proof. In Exercise 1 we ask the reader to invoke the analogous result for scalars (Proposition B.0.8 from Appendix B) and to use Lemma 2.2.4 to reduce the proof of convergence of the matrix series to the scalar case. The inequality (2.14), a seemingly innocuous generalization of the triangle inequality to an infinite sum, actually requires a limiting argument that we outline in the Exercise.

Proposition 2.2.7. The series $\sum_{n=0}^{\infty} (At)^n/n!$ in (2.5) converges absolutely.

Proof. For the general term in (2.5) we have the estimate

\[ \Bigl\| \frac{(At)^n}{n!} \Bigr\| \le \frac{(|t|\,\|A\|)^n}{n!}, \tag{2.15} \]

and by comparison with the series representation of the ordinary exponential $e^{|t|\,\|A\|}$, we see that $\sum_{n=0}^{\infty} \|(At)^n/n!\| < \infty$.

We will write $e^{At}$ for the sum (2.5), which is guaranteed to exist by the proposition.

Corollary 2.2.8. For any real number M, the series $\sum_{n=0}^{\infty} (At)^n/n!$ converges uniformly on {t : |t| ≤ M}. I.e., given any M, for any ε > 0 there is an integer $N_0$ such that for any $N > N_0$ and all t satisfying |t| ≤ M,

\[ \Bigl\| \sum_{n=0}^{N} \frac{(At)^n}{n!} - e^{At} \Bigr\| < \varepsilon. \]

Proof. By (2.14) and (2.15),

\[ \Bigl\| \sum_{n=0}^{N} \frac{(At)^n}{n!} - e^{At} \Bigr\| \le \sum_{n=N+1}^{\infty} \frac{(|t|\,\|A\|)^n}{n!} \le \sum_{n=N+1}^{\infty} \frac{(M\|A\|)^n}{n!}. \]

Since the series for $e^{M\|A\|}$ converges, if N is sufficiently large, the RHS of this inequality can be made arbitrarily small.

2.2.3 The main theorem

Having slogged through a lot of rather dry material, and with more of the same ahead of us, let us reward ourselves by jumping ahead to the main result of this section. We return to the case where A is a square matrix, say d × d.

Theorem 2.2.9. For any $b \in \mathbb{R}^d$ the solution to the IVP

x′ = Ax, x(0) = b

is unique, and it is given by the formula

\[ x(t) = e^{At}b. \tag{2.16} \]

To prove the theorem, it’s back to the salt mines. First we must study the dependence of $e^{At}$ on t. If φ(t) is a matrix-valued function, we shall say that φ is continuous at t or that φ is differentiable at t with derivative L if

\[ \lim_{\Delta t \to 0} \phi(t + \Delta t) = \phi(t) \quad \text{or} \quad \lim_{\Delta t \to 0} \frac{\phi(t + \Delta t) - \phi(t)}{\Delta t} = L, \]

respectively. It is natural to interpret these limits using norms. Of course, by Lemma 2.2.4, φ is continuous or differentiable in this sense if and only if each entry of φ is continuous or differentiable. In the obvious notation, we shall write φ′(t) for the derivative of φ at t, and we shall call φ continuously differentiable on an interval if φ is differentiable at every point in the interval and φ′(t) is continuous with respect to t there.

Proposition 2.2.10. $e^{At}$ is a continuously differentiable function of t and

\[ \frac{d}{dt} e^{At} = Ae^{At} = e^{At}A. \]

Discussion: The proof of this result would be simple if we could differentiate an infinite series of functions term by term; in symbols,

\[ \frac{d}{dt} \sum_{n=0}^{\infty} f_n(t) = \sum_{n=0}^{\infty} \frac{df_n}{dt}(t). \tag{2.17} \]

However, while the derivative of a finite sum is the sum of the derivatives, this need not be true for an infinite sum. Corollary B.0.10 in Appendix B provides a sufficient condition justifying term-by-term differentiation; specifically (2.17) is valid if the series on the RHS converges uniformly. The appendix also contains a counterexample in which (2.17) fails.

Proof. Each term $(At)^n/n! = t^n(A^n/n!)$ in the series for $e^{At}$ is continuously differentiable; we have

\[ \frac{d}{dt} \frac{(At)^n}{n!} = nA\frac{(At)^{n-1}}{n!} = n\frac{(At)^{n-1}}{n!}A, \]

and we may simplify by observing that n/n! = 1/(n − 1)! provided n ≥ 1. Taking a finite sum, we have

\[ \frac{d}{dt} \sum_{n=0}^{N} \frac{(At)^n}{n!} = A \sum_{n=1}^{N} \frac{(At)^{n-1}}{(n-1)!} = \sum_{n=1}^{N} \frac{(At)^{n-1}}{(n-1)!} A. \]

Now

\[ \sum_{n=1}^{N} \frac{(At)^{n-1}}{(n-1)!} = \sum_{m=0}^{N-1} \frac{(At)^m}{m!}, \]

and as N → ∞ this series converges to $e^{At}$. Indeed, as in the proof of Corollary 2.2.8, the convergence is uniform for |t| ≤ M, so the proposition follows by applying Corollary B.0.10 in Appendix B.

In the next two results we suppress t in $e^{At}$ for brevity. Since A is an arbitrary matrix, we lose no generality by doing this.

Proposition 2.2.11. The exponential of the zero matrix is the identity; in symbols, $e^0 = I$. For any matrix A, $e^A$ is invertible and

\[ (e^A)^{-1} = e^{-A}. \]

Proof. It is readily seen from the series expansion (2.5) that $e^0 = I$. Regarding the claim about inverses, let $\phi(t) = e^{At}e^{-At}$. According to the previous proposition, each factor in φ is continuously differentiable. In Exercise 1 we ask the reader to prove that the product of two continuously differentiable matrix-valued functions is continuously differentiable and its derivative is given by Leibniz’ rule for differentiation of a product. Thus,

\[ \frac{d}{dt}\phi(t) = \Bigl(\frac{d}{dt}e^{At}\Bigr)e^{-At} + e^{At}\Bigl(\frac{d}{dt}e^{-At}\Bigr) = e^{At}(+A)e^{-At} + e^{At}(-A)e^{-At} = 0, \]

where we have applied Proposition 2.2.10. Hence φ(t) = φ(0) = I for all t, in particular for t = 1, and the result is proved.

One consequence of Proposition 2.2.10 is that A and $e^{At}$ commute. More generally, we have:



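Proposition 2.2.12. If AB = BA, then $Ae^B = e^BA$, $e^Ae^B = e^Be^A$, and

\[ e^{A+B} = e^Ae^B. \]

Proof. We prove only the displayed formula; the other two results are left for the Exercises. Let

\[ \phi(t) = e^{-t(A+B)}e^{tA}e^{tB}. \]

By Leibniz’ rule

\[ \frac{d}{dt}\phi(t) = e^{-t(A+B)}(-A - B)e^{tA}e^{tB} + e^{-t(A+B)}(A)e^{tA}e^{tB} + e^{-t(A+B)}e^{tA}(B)e^{tB}. \]

In the third term we may commute the middle two factors, $e^{tA}$ and B, and then all three terms add up to zero. Thus φ(t) = φ(0) = I for all t. Setting t = 1 and multiplying on the left by $e^{A+B}$, which inverts $e^{-(A+B)}$ by Proposition 2.2.11, gives the displayed formula.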
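The hypothesis AB = BA cannot simply be dropped, and a small computation makes this vivid. The Python sketch below sums the exponential series directly and compares $e^{A+B}$ with $e^Ae^B$ for one commuting pair and one non-commuting pair. All helper names, the truncation at 40 terms, and the particular matrices are illustrative assumptions.

```python
def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(P, Q):
    """Entrywise sum of two matrices."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def expm_series(A, terms=40):
    """Partial sum I + A + A^2/2! + ... + A^terms/terms!."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        term = [[entry / k for entry in row] for row in mat_mul(term, A)]
        total = mat_add(total, term)
    return total

def exponent_law_gap(A, B):
    """Largest entry, in absolute value, of e^{A+B} - e^A e^B."""
    lhs = expm_series(mat_add(A, B))
    rhs = mat_mul(expm_series(A), expm_series(B))
    return max(abs(p - q) for rp, rq in zip(lhs, rhs) for p, q in zip(rp, rq))

# A commuting pair (both are a multiple of I plus a multiple of the same matrix)
P = [[1.0, 2.0], [0.0, 1.0]]
Q = [[0.5, -1.0], [0.0, 0.5]]
# A non-commuting pair
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]

gap_commuting = exponent_law_gap(P, Q)
gap_noncommuting = exponent_law_gap(A, B)
print(gap_commuting, gap_noncommuting)
```

For the commuting pair the two sides agree to rounding error, while for the non-commuting pair they differ visibly, which is consistent with the step in the proof where $e^{tA}$ and B are interchanged.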

It is now a simple matter to prove the main result of this section:

Proof of Theorem 2.2.9. It is obvious that $x(t) = e^{At}b$ satisfies the initial condition. To show that it satisfies the equation, just differentiate and apply Proposition 2.2.10. To show uniqueness, suppose x(t) is one solution and let

\[ y(t) = e^{-At}x(t). \tag{2.18} \]

Differentiate (2.18) to show that

\[ \frac{d}{dt}y(t) = -e^{-At}Ax(t) + e^{-At}\frac{d}{dt}x(t). \]

Since x(t) satisfies the ODE, the two terms in this equation cancel, yielding dy(t)/dt = 0. Thus

y(t) = y(0) = x(0) = b,

and the result follows on multiplying (2.18) by $e^{At}$.

2.3 Calculation of the matrix exponential

2.3.1 The role of similarity

Suppose x(t) is a solution of the linear system

x′ = Ax. (2.19)

Let us consider a linear change of coordinates for the unknown functions $x_j(t)$; i.e., let S be a nonsingular matrix and define a new vector of unknown functions by y(t) = Sx(t). Then we may derive an ODE for y(t) as follows:

\[ y' = Sx'(t) = SAx = SAS^{-1}y. \]

In other words y also satisfies a linear homogeneous system of ODEs, and the coefficient matrix in the ODEs for y is $SAS^{-1}$, a matrix similar to A in the technical sense of linear algebra. The following proposition guarantees that the exponentials of similar matrices are themselves similar.

Proposition 2.3.1. If $B = SAS^{-1}$ then $e^{Bt} = Se^{At}S^{-1}$.

The reader is asked to prove this result in Exercise 1.

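To illustrate the value of this result, let us suppose that the matrix A in (2.19) is diagonalizable over $\mathbb{R}$. Specifically, suppose that $A = S\Lambda S^{-1}$ where $\Lambda = \operatorname{Diag}(\lambda_1, \lambda_2, \ldots, \lambda_d)$ is a diagonal matrix^3 with eigenvalues of A along its diagonal:

\[
\Lambda =
\begin{pmatrix}
\lambda_1 & 0 & 0 & \ldots & 0 \\
0 & \lambda_2 & 0 & \ldots & 0 \\
0 & 0 & \lambda_3 & \ldots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \ldots & \lambda_d
\end{pmatrix},
\]

where $\lambda_j \in \mathbb{R}$. Hence by Proposition 2.3.1

\[ e^{At} = Se^{\Lambda t}S^{-1}. \]

Now, for any power, $\Lambda^n$ is also diagonal, simply $\operatorname{Diag}(\lambda_1^n, \lambda_2^n, \ldots, \lambda_d^n)$. Thus the series for $e^{\Lambda t}$ converges to the diagonal matrix

\[ e^{\Lambda t} = \operatorname{Diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_d t}). \]

Hence

\[ e^{At} = S \operatorname{Diag}(e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_d t})\, S^{-1}. \tag{2.20} \]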
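Formula (2.20) can be compared against the defining series on a small example. In the Python sketch below we pick, purely for illustration, an invertible S and real numbers λ₁ = 3, λ₂ = 1, build A = SΛS⁻¹ (which works out to the symmetric matrix with rows (2, 1) and (1, 2)), and evaluate $e^{At}$ both ways at t = 0.5. The helper names, the 2 × 2 inverse formula, and the truncation at 40 terms are our own assumptions.

```python
import math

def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Inverse of a nonsingular 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def expm_series(A, t, terms=40):
    """Partial sum of the series (2.5) for e^{At}."""
    n = len(A)
    At = [[a * t for a in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms + 1):
        term = [[entry / k for entry in row] for row in mat_mul(term, At)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

S = [[1.0, 1.0], [1.0, -1.0]]      # illustrative similarity matrix
lam1, lam2 = 3.0, 1.0              # illustrative diagonal entries of Lambda
S_inv = inverse_2x2(S)
A = mat_mul(mat_mul(S, [[lam1, 0.0], [0.0, lam2]]), S_inv)

t = 0.5
exp_Lambda_t = [[math.exp(lam1 * t), 0.0], [0.0, math.exp(lam2 * t)]]
via_diag = mat_mul(mat_mul(S, exp_Lambda_t), S_inv)   # right-hand side of (2.20)
via_series = expm_series(A, t)                        # definition (2.5)
print(A)
print(via_diag)
print(via_series)
```

The diagonalization route needs only two scalar exponentials and two matrix products, whereas the series needs a matrix product per term; this is the practical payoff of (2.20) when S and Λ are available.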

To take advantage of (2.20), we need to be able to calculate the similarity matrix that diagonalizes A, and the next result tells us how to do this. In this proposition, the notation $\operatorname{Col}(v_1, v_2, \ldots, v_d)$ denotes the matrix whose columns are the specified vectors $v_1, v_2, \ldots, v_d$. If A is diagonalizable over $\mathbb{R}$, then there is a basis $v_1, v_2, \ldots, v_d$ for $\mathbb{R}^d$ consisting of eigenvectors of A.

^3 Unless Λ is a multiple of the identity, it is not unique because the eigenvalues may be enumerated in any order.



Proposition 2.3.2. Suppose A is diagonalizable over $\mathbb{R}$ with eigenvectors $v_1, v_2, \ldots, v_d$, and let $S = \operatorname{Col}(v_1, v_2, \ldots, v_d)$. Then^4

\[ S^{-1}AS = \Lambda, \]

where Λ is the diagonal matrix whose j, j-entry is the eigenvalue $\lambda_j$ of A associated with the eigenvector $v_j$.

Proof. The proof of this proposition relies on one of the interpretations of matrix multiplication: specifically (see Assertion 1D(ii) on p25 of Strang [5])

The jth column of AB = A times the jth column of B. (2.21)

Applying this interpretation to $S^{-1}S = I$, we deduce that

\[ e_j = S^{-1}v_j \]

where $e_j$ is the jth column of the identity, or the jth vector in the standard basis for $\mathbb{R}^d$. Next, applying this interpretation to $S^{-1}AS$, we have

\[ \text{jth column of } (S^{-1}AS) = (S^{-1}A)v_j. \]

By associativity of matrix multiplication,

\[ (S^{-1}A)v_j = S^{-1}(Av_j) = S^{-1}(\lambda_j v_j) = \lambda_j S^{-1}v_j = \lambda_j e_j. \]

In words, we have shown, column by column, that $S^{-1}AS = \Lambda$, as claimed.

In the Exercises the reader is asked to use this proposition to compute the exponentials of various matrices.

It is instructive to re-interpret changes of variable in linear ODEs, as introduced at the start of this subsection. Suppose $A$ is diagonalizable over $\mathbb{R}$, and choose $S$ as in Proposition 2.3.2 so that $S^{-1}AS = \Lambda$. Let us compare the ODE $x' = Ax$ with $y' = \Lambda y$ obtained by the substitution $y = S^{-1}x$ (indeed, $y' = S^{-1}Ax = S^{-1}AS\,y$). In the $x$-equation, the rate of change of $x_j$ depends on all the components of $x$, while in the $y$-equation, the rate of change of $y_j$, which equals $\lambda_j y_j$, depends only on the same component $y_j$. In other words, by diagonalizing $A$ we are performing a change of coordinates on $\mathbb{R}^d$ such that the new coordinates $y_j$ evolve uncoupled from one another.

⁴ Up to this point it would not have mattered whether we considered the basic equation expressing similarity as $B = S^{-1}AS$ or $B = SAS^{-1}$. Here, however, there is a difference: the columns of the matrix $S$ such that $S^{-1}AS$ is diagonal are the eigenvectors of $A$, while neither the columns nor rows of $S^{-1}$ are as easily described.




2.3.2 Two problematic cases

The hypothesis in Proposition 2.3.2 may fail in two ways (and both failures may occur together):

• A has multiple eigenvalues but not enough eigenvectors, or

• A has complex eigenvalues.

Let us consider simple examples of each case before dealing with the general case.

The following is the simplest example of a matrix that fails to have enough eigenvectors to span $\mathbb{R}^d$:

\[
A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}.
\]

It is readily seen that $\lambda = a$ is the only possible eigenvalue of $A$, but the eigenspace associated with this eigenvalue, $\ker(A - aI)$, is only one-dimensional. However, let us write

\[
A = aI + N, \quad \text{where } N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
\]

Since $I$ and $N$ commute, we have from Proposition 2.2.12 that

\[
e^{(aI+N)t} = e^{aIt}e^{Nt} = e^{at}e^{Nt}.
\]

Moreover, since $N^2 = 0$, the exponential series for $e^{Nt}$ truncates to just two terms,

\[
e^{Nt} = I + Nt = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}, \quad \text{so } e^{At} = e^{at}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}. \tag{2.22}
\]

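The following matrix has complex eigenvalues $\lambda_j = a \pm bi$:

\[
A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. \tag{2.23}
\]

We may again apply Proposition 2.2.12 to calculate the exponential of $A$. Specifically, we write

\[
A = aI + bJ, \quad \text{where } J = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.
\]

Since $I$ and $J$ commute,

\[
e^{(aI+bJ)t} = e^{at}e^{bJt}. \tag{2.24}
\]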




Now, as with nilpotent matrices, the exponential of $J$ can be computed conveniently using the power-series definition because $J^2 = -I$, so that

\[
J^n = \begin{cases}
I & \text{if } n \equiv 0 \pmod 4 \\
J & \text{if } n \equiv 1 \pmod 4 \\
-I & \text{if } n \equiv 2 \pmod 4 \\
-J & \text{if } n \equiv 3 \pmod 4.
\end{cases}
\]

Thus, grouping odd and even powers (this rearrangement of terms to be justified in Exercise 1), we see

\[
e^{bJt} = \left[ 1 - \frac{1}{2!}(bt)^2 + \frac{1}{4!}(bt)^4 - \cdots \right] I + \left[ bt - \frac{1}{3!}(bt)^3 + \frac{1}{5!}(bt)^5 - \cdots \right] J, \tag{2.25}
\]

where the power series for $\cos bt$ and $\sin bt$ can be recognized. On substituting this formula into (2.24), we obtain

\[
e^{At} = e^{at}\begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix}. \tag{2.26}
\]
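As a sanity check on (2.26), the closed form can be compared with a truncated power series for $e^{At}$. This is only a sketch in plain Python; the helper names and the particular values of $a$, $b$, and $t$ are assumptions chosen for illustration.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, t, terms=60):
    """Reference value: truncated series sum_{k < terms} (tA)^k / k!."""
    n = len(A)
    tA = [[t * a for a in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tA)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def exp_rotation_form(a, b, t):
    """Closed form (2.26) for A = [[a, -b], [b, a]]."""
    scale = math.exp(a * t)
    c, s = math.cos(b * t), math.sin(b * t)
    return [[scale * c, -scale * s], [scale * s, scale * c]]

# Arbitrary illustrative values (not from the text).
a, b, t = -0.3, 2.0, 1.7
A = [[a, -b], [b, a]]
closed = exp_rotation_form(a, b, t)
series = exp_series(A, t)
max_diff = max(abs(closed[i][j] - series[i][j])
               for i in range(2) for j in range(2))
print(max_diff)  # agreement up to round-off
```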

Incidentally, in Exercise 9 we ask the reader to prove, with hints, that every $2 \times 2$ matrix with nonreal eigenvalues is similar to (2.23) for some values of $a$, $b$. Equation (2.23) is called the real canonical form for $2 \times 2$ matrices with complex eigenvalues.

Let us show that the exponential of (2.23) may also be calculated by diagonalization. For this we need to work over the complex numbers, starting with the basic definitions. Temporarily, for a real vector $x$ or a real matrix $A$ we shall write $|x|_{\mathbb{R}}$ or $\|A\|_{\mathbb{R}}$ for the norms defined above. Generalizing to complex vectors, if $z \in \mathbb{C}^d$, we let

\[
|z|_{\mathbb{C}} = \sqrt{\sum_{j=1}^{d} |z_j|^2}. \tag{2.27}
\]

This norm may be calculated from the complex inner product, $|z|_{\mathbb{C}} = \sqrt{\langle z, z \rangle_{\mathbb{C}}}$, where

\[
\langle z, w \rangle_{\mathbb{C}} = \sum_{j=1}^{d} \bar{z}_j w_j \tag{2.28}
\]

with $\bar{z}_j$ denoting the complex conjugate of $z_j$. If $x \in \mathbb{R}^d$, then $|x|_{\mathbb{R}} = |x|_{\mathbb{C}}$. If $A$ is a matrix with complex entries, then let

\[
\|A\|_{\mathbb{C}} = \max_{|z|_{\mathbb{C}} \le 1} |Az|_{\mathbb{C}}. \tag{2.29}
\]




If $A$ has real entries, it is not obvious but still true that

\[
\|A\|_{\mathbb{R}} = \|A\|_{\mathbb{C}}, \tag{2.30}
\]

which we ask the reader to verify in Exercise 1. Because of (2.30) we will omit the subscript $\mathbb{R}$ or $\mathbb{C}$ in writing norms: if $A$ has complex entries, we understand $\|\cdot\|_{\mathbb{C}}$, and if $A$ has real entries, it doesn't matter which norm we choose. Moreover, in the Exercises we ask the reader to check that the various lemmas and propositions about norms all carry over to the complex case.

Now we calculate the exponential of (2.23) by diagonalization. Let

\[
S = \begin{bmatrix} 1 & 1 \\ -i & i \end{bmatrix}, \qquad S^{-1} = \frac{1}{2}\begin{bmatrix} 1 & i \\ 1 & -i \end{bmatrix}.
\]

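The columns of $S$ are eigenvectors of $A$, so by Proposition 2.3.2 we have $S^{-1}AS = \Lambda$ where

\[
\Lambda = \begin{bmatrix} a + bi & 0 \\ 0 & a - bi \end{bmatrix}.
\]

Therefore

\[
e^{At} = S e^{\Lambda t} S^{-1} = S \begin{bmatrix} e^{(a+bi)t} & 0 \\ 0 & e^{(a-bi)t} \end{bmatrix} S^{-1}.
\]

Recalling Euler's formula $e^{ibt} = \cos bt + i \sin bt$ and multiplying out the product, we obtain (2.26).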

2.3.3 Use of the Jordan form

By a Jordan block we mean a square matrix of the form

\[
B = \begin{bmatrix}
\lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & \cdots & 0 & 0 \\
0 & 0 & \lambda & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & 0 & \cdots & 0 & \lambda
\end{bmatrix}.
\]

In words, $B$ has entries $\lambda$ on the diagonal, 1 on the "superdiagonal", and zeros elsewhere. $B$ may be of any dimension, including $1 \times 1$, in which case $B$ is simply the scalar $\lambda$. The diagonal entry $\lambda$ is the only eigenvalue of $B$. No matter how large the dimension of $B$ may be, there is only one linearly independent eigenvector.

The Jordan normal-form theorem asserts that any square matrix $A$ is similar to a diagonal array of Jordan blocks; in symbols, $S^{-1}AS = J$, where

\[
J = \begin{bmatrix}
B_1 & 0 & 0 & \cdots & 0 \\
0 & B_2 & 0 & \cdots & 0 \\
0 & 0 & B_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & B_M
\end{bmatrix}.
\]

Here, for $m = 1, 2, \ldots, M$, the matrix $B_m$ is a $d_m \times d_m$ Jordan block, and $\sum_{m=1}^{M} d_m = d$, the dimension of $A$. To shorten the notation we shall generalize the notation for diagonal matrices and write $J$ as

\[
J = \mathrm{Diag}(B_1, B_2, \ldots, B_M).
\]

The Jordan normal-form theorem is ideally suited to computing the exponential of a matrix. First observe that

\[
\text{if } A = S \, \mathrm{Diag}(B_1, \ldots, B_M) \, S^{-1}, \text{ then } e^{At} = S \, \mathrm{Diag}(e^{B_1 t}, \ldots, e^{B_M t}) \, S^{-1}. \tag{2.31}
\]

The exponential of each Jordan block may be computed explicitly by the same method as was used for the $2 \times 2$ Jordan block in (2.22) above. Specifically, to exponentiate a $d \times d$ Jordan block,

1. Write $B = \lambda I + N$ where $N$ is the $d \times d$ nilpotent matrix with ones on the superdiagonal.

2. Observe that by Proposition 2.2.12, $e^{Bt} = e^{\lambda t}e^{tN}$.

3. Calculate $e^{tN}$ with the truncated power series

\[
I + tN + \frac{1}{2!}(tN)^2 + \cdots + \frac{1}{(d-1)!}(tN)^{d-1}.
\]

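This procedure yields

\[
e^{Bt} = e^{\lambda t}\begin{bmatrix}
1 & t & t^2/2! & \cdots & t^{d-2}/(d-2)! & t^{d-1}/(d-1)! \\
0 & 1 & t & \cdots & t^{d-3}/(d-3)! & t^{d-2}/(d-2)! \\
0 & 0 & 1 & \cdots & t^{d-4}/(d-4)! & t^{d-3}/(d-3)! \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & t \\
0 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}. \tag{2.32}
\]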
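The three-step recipe and the resulting formula (2.32) translate directly into a few lines of code. The sketch below, in plain Python, fills in the entry $e^{\lambda t}t^{j-i}/(j-i)!$ above the diagonal and compares the result with a truncated power series; the function names and the sample values of $\lambda$, $d$, and $t$ are illustrative assumptions.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(A, t, terms=60):
    """Reference value: truncated series sum_{k < terms} (tA)^k / k!."""
    n = len(A)
    tA = [[t * a for a in row] for row in A]
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tA)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def jordan_block(lam, d):
    """d x d Jordan block: lam on the diagonal, 1 on the superdiagonal."""
    return [[lam if i == j else (1.0 if j == i + 1 else 0.0)
             for j in range(d)] for i in range(d)]

def jordan_block_exp(lam, d, t):
    """Formula (2.32): entry (i, j) is e^{lam t} t^{j-i}/(j-i)! for j >= i."""
    scale = math.exp(lam * t)
    return [[scale * t ** (j - i) / math.factorial(j - i) if j >= i else 0.0
             for j in range(d)] for i in range(d)]

# Arbitrary illustrative values (not from the text).
lam, d, t = -0.5, 4, 2.0
closed = jordan_block_exp(lam, d, t)
series = exp_series(jordan_block(lam, d), t)
max_diff = max(abs(closed[i][j] - series[i][j])
               for i in range(d) for j in range(d))
print(max_diff)  # agreement up to round-off
```

With $d = 2$ and $\lambda = a$ the function reproduces (2.22).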




In order to use this method one needs to be able to find the Jordan normal form of a matrix. This may be done in a manner that generalizes Proposition 2.3.2, as is explained in Appendix C. We urge you to read that appendix now and to apply the method by doing Exercise ??? in the Appendix. We think you will find our approach to this topic refreshing. In particular, normal forms are determined with natural calculations finding generalized eigenvectors: although the minimal polynomial is needed to prove that transformation to the Jordan form is possible, it is not needed to calculate the Jordan form. Of course, in general both the Jordan normal form $J$ and the similarity matrix $S$ have complex entries.

The Jordan normal form is perfect for theoretical purposes because it exhibits the structure of the solution so clearly. However, this normal form is poorly suited to numerical computation because it is so sensitive to round-off errors. For example, consider the matrices

\[
A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix}, \qquad \tilde{A} = \begin{bmatrix} a & 1 \\ 0 & a + \varepsilon \end{bmatrix}. \tag{2.33}
\]

No matter how small $\varepsilon > 0$ may be, the Jordan normal forms of these two matrices have completely different structures: the first has a single $2 \times 2$ block, while the second is diagonalizable and hence has two $1 \times 1$ blocks. (In the Exercises we ask the reader to compare the exponentials of these matrices.)

2.4 Large-time behavior of solutions of homogeneous linear systems

2.4.1 The main results

We shall say that the origin in $\mathbb{R}^d$ is a sink (or attractor) for a linear system $x' = Ax$ if for every initial condition $b \in \mathbb{R}^d$

\[
\lim_{t \to \infty} e^{At}b = 0.
\]

The eigenvalues of $A$ provide an elegant test for such behavior.

Theorem 2.4.1. The origin is a sink for $x' = Ax$ iff

\[
\max_{1 \le j \le M} \Re\lambda_j < 0.
\]

The ideas underlying the proof of this theorem are clearest if $A$ is diagonalizable (over either the real or complex numbers), so we formulate a separate, stronger, result for that case.




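Proposition 2.4.2. If $A = S\Lambda S^{-1}$ where $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_d)$, then

\[
K^{-1}e^{\mu t} \le \|e^{At}\| \le Ke^{\mu t} \tag{2.34}
\]

where

\[
K = \|S\| \, \|S^{-1}\| \quad \text{and} \quad \mu = \max_{1 \le j \le M} \Re\lambda_j. \tag{2.35}
\]

Proof. Regarding the upper bound in (2.34), observe that

\[
\|e^{At}\| = \|Se^{\Lambda t}S^{-1}\| \le \|S\| \, \|e^{\Lambda t}\| \, \|S^{-1}\|.
\]

In Exercise 1 we ask the reader to show that

\[
\|e^{\Lambda t}\| = \max_{1 \le j \le d} |e^{\lambda_j t}|.
\]

Of course $|e^{\lambda_j t}| = e^{(\Re\lambda_j)t}$, so $\|e^{\Lambda t}\| = e^{\mu t}$, from which the upper bound in (2.34) follows.

Conversely, regarding the lower bound,

\[
e^{\mu t} = \|e^{\Lambda t}\| = \|S^{-1}e^{At}S\| \le K\|e^{At}\|,
\]

and the result follows on dividing by $K$.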

The next result will be used in proving Theorem 2.4.1.

Proposition 2.4.3. For any $\varepsilon > 0$ there is a constant⁵ $K$ such that

\[
\|e^{At}\| \le Ke^{(\mu+\varepsilon)t}, \tag{2.36}
\]

where $\mu$ is given by (2.35).

Proof. We may prove the proposition by examining the Jordan normal form of $A$. At first, the derivation of (2.36) exactly parallels the proof of Proposition 2.4.2:

\[
\|e^{At}\| \le \|S\| \, \|e^{\mathrm{Diag}(B_1, \ldots, B_M)t}\| \, \|S^{-1}\|
\]

and

\[
\|e^{\mathrm{Diag}(B_1, \ldots, B_M)t}\| \le \max_{1 \le m \le M} \|e^{B_m t}\|. \tag{2.37}
\]

However, because of the polynomial entries in (2.32), as $t$ tends to infinity, $\|e^{B_m t}\|$ may grow like $t^p e^{(\Re\lambda_m)t}$ for some power $p$. The increase in the exponent from $\mu$ to $\mu + \varepsilon$ in (2.36) compensates for this extra growth of $\|e^{B_m t}\|$, provided one also increases the constant $K$ by an appropriate factor. The reader is asked to supply the details of this argument in Exercise 1.

⁵ In an exercise in Chapter 5 we ask you to show that there is a matrix $B$ that is similar to $A$ and satisfies $\|e^{Bt}\| \le e^{(\mu+\varepsilon)t}$; i.e., (2.36) with the constant $K = 1$.

Proof of Theorem 2.4.1. It follows from Proposition 2.4.3 that if $\mu < 0$, then the origin is a sink for $x' = Ax$. The proof that if $\mu \ge 0$ then the origin cannot be a sink is similar to the derivation of the lower bound for $\|e^{At}\|$ in (2.34); details are left for the reader.

2.4.2 Tests for negative eigenvalues

Because of Theorem 2.4.1, it is useful to be able to test whether a matrix has all its eigenvalues in the left half plane without actually having to find the eigenvalues. For $2 \times 2$ and $3 \times 3$ matrices the following two results give a simple test. Please forgive us a homily:

Use these results! Generations of students have ignored them, wasting their time by calculating eigenvalues when it was not actually necessary.

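Proposition 2.4.4. If $A$ is a $2 \times 2$ matrix with real entries, then $\Re\lambda_j < 0$ for all $j$ iff

(i) $\operatorname{tr} A < 0$ and

(ii) $\det A > 0$.

Proposition 2.4.5. If $A$ is a $3 \times 3$ matrix with real entries, then $\Re\lambda_j < 0$ for all $j$ iff

(i) $\operatorname{tr} A < 0$,

(ii) $\frac{1}{2} \operatorname{tr} A \, [(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)] < \det A$, and

(iii) $\det A < 0$.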

The proof of Proposition 2.4.4 is left as an Exercise. A proof of Proposition 2.4.5 is given in the Appendix, Section 2.7.2. Regarding the latter proposition, Conditions (i) and (iii) are clearly necessary for the eigenvalues of $A$ to have negative real parts. In Exercise 11 we suggest a calculation that helps motivate Condition (ii).

The following result is sometimes useful to test for oscillatory behavior in a $2 \times 2$ system. It may be derived by examining the quadratic formulas for the eigenvalues of $A$.

Proposition 2.4.6. A real $2 \times 2$ matrix has complex eigenvalues if and only if

\[
(\operatorname{tr} A)^2 < 4 \det A.
\]
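The trace and determinant tests of Propositions 2.4.4, 2.4.5, and 2.4.6 are easy to code, which makes the point of the homily concrete: no eigenvalues need to be computed. The following plain-Python sketch uses hypothetical function names and example matrices chosen for illustration; the quadratic-formula eigenvalues are included only as a cross-check in the $2 \times 2$ case.

```python
import cmath

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def trace_of_square(A):
    """tr(A^2) = sum over i, k of a_ik * a_ki."""
    n = len(A)
    return sum(A[i][k] * A[k][i] for i in range(n) for k in range(n))

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

def is_sink_2x2(A):
    """Proposition 2.4.4: tr A < 0 and det A > 0."""
    return trace(A) < 0 and det2(A) > 0

def has_complex_eigenvalues_2x2(A):
    """Proposition 2.4.6: (tr A)^2 < 4 det A."""
    return trace(A) ** 2 < 4 * det2(A)

def is_sink_3x3(A):
    """Proposition 2.4.5, Conditions (i), (ii), (iii)."""
    tr, det = trace(A), det3(A)
    middle = 0.5 * tr * (tr ** 2 - trace_of_square(A))
    return tr < 0 and middle < det and det < 0

def eigenvalues_2x2(A):
    """Quadratic formula, used only to cross-check the tests above."""
    tr, det = trace(A), det2(A)
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2

# Hypothetical examples (not from the text).
spiral = [[-1.0, 2.0], [-2.0, -1.0]]        # eigenvalues -1 +/- 2i
saddle = [[1.0, 2.0], [3.0, -4.0]]          # eigenvalues 2 and -5
stable3 = [[-1.0, 1.0, 0.0], [0.0, -2.0, 1.0], [0.0, 0.0, -3.0]]
unstable3 = [[-1.0, 0.0, 0.0], [0.0, 0.1, -1.0], [0.0, 1.0, 0.1]]

print(is_sink_2x2(spiral), has_complex_eigenvalues_2x2(spiral))
print(is_sink_2x2(saddle), eigenvalues_2x2(saddle))
print(is_sink_3x3(stable3), is_sink_3x3(unstable3))
```

The matrix `unstable3` has eigenvalues $-1$ and $0.1 \pm i$; it satisfies Conditions (i) and (iii) but fails Condition (ii), in line with the calculation suggested in Exercise 11.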




2.5 Solution of inhomogeneous problems

In case $x$ satisfies an inhomogeneous linear equation, say

\[
x' = Ax + f(t), \tag{2.38}
\]

a solution to the IVP with $x(0) = b$ is given by

\[
x(t) = e^{At}b + \int_0^t e^{A(t-s)}f(s)\,ds. \tag{2.39}
\]

The derivation of this result, which will be used in Chapter 4, is included in Exercise 1.
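Formula (2.39) can also be tested numerically before it is derived. The sketch below, in plain Python, assumes the matrix has the form (2.23) so that $e^{At}$ is available in closed form from (2.26), approximates the integral by Simpson's rule, and then measures how well the result satisfies (2.38). The entries `alpha` and `beta`, the forcing term, and the initial condition are all hypothetical choices.

```python
import math

def exp_At(alpha, beta, t):
    """e^{At} for A = [[alpha, -beta], [beta, alpha]], from (2.26)."""
    scale = math.exp(alpha * t)
    c, s = math.cos(beta * t), math.sin(beta * t)
    return [[scale * c, -scale * s], [scale * s, scale * c]]

def mat_vec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def duhamel(alpha, beta, b0, f, t, n=2000):
    """x(t) from (2.39); the integral uses composite Simpson's rule (n even)."""
    h = t / n
    acc = [0.0, 0.0]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        g = mat_vec(exp_At(alpha, beta, t - k * h), f(k * h))
        acc = [acc[0] + weight * g[0], acc[1] + weight * g[1]]
    hom = mat_vec(exp_At(alpha, beta, t), b0)
    return [hom[0] + h / 3 * acc[0], hom[1] + h / 3 * acc[1]]

def residual(alpha, beta, b0, f, t, dt=1e-4):
    """Largest component of x'(t) - A x(t) - f(t), x' by centered difference."""
    xp = duhamel(alpha, beta, b0, f, t + dt)
    xm = duhamel(alpha, beta, b0, f, t - dt)
    x = duhamel(alpha, beta, b0, f, t)
    Ax = mat_vec([[alpha, -beta], [beta, alpha]], x)
    ft = f(t)
    return max(abs((xp[i] - xm[i]) / (2 * dt) - Ax[i] - ft[i])
               for i in range(2))

def forcing(s):
    # Hypothetical forcing term f(s), chosen only for illustration.
    return [math.cos(s), 1.0]

print(residual(-0.5, 2.0, [1.0, 0.0], forcing, t=1.5))  # small, limited by dt
print(duhamel(-0.5, 2.0, [1.0, 0.0], forcing, t=0.0))   # the initial condition
```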

The formula (2.39) answers the existence question for the inhomogeneous equation. Uniqueness follows from Theorem 2.2.9 because if $x_1$ and $x_2$ are two solutions of the IVP for (2.38), then their difference $y = x_1 - x_2$ satisfies a homogeneous equation $y' = Ay$ with initial condition $y(0) = 0$.

The similarity of (2.39) to the scalar case (1.16) is striking. However, no simple formula exists for solutions of a linear system of ODEs with variable coefficients,

\[
x' = A(t)x + f(t), \tag{2.40}
\]

unless $A(t_1)$ and $A(t_2)$ commute for all $t_1$, $t_2$. A counterexample demonstrating this fact is contained in Exercise 12.

2.6 Exercises

2.6.1 Routine exercises

1. Supply details omitted in the text:

(a) Verify that the maximum in Equation (2.12) exists and is finite.

(b) Prove Lemmas 2.2.2, 2.2.3, 2.2.4, 2.2.5, and 2.2.6.

Hint for Lemma 2.2.3: In the matrix part of the lemma, the lower bound for $\|A\|$ may be obtained by applying $A$ to a basis vector $e_k$. The upper bound may be obtained by writing $A$ as a sum of $d_1 d_2$ matrices, each of which has only one nonzero entry.

Hint for Lemma 2.2.6: Regarding (2.14), for any $N > M$ add and subtract terms $A_n$ for $n$ between $M$ and $N$ to write

\[
\sum_{n=0}^{M} A_n - L = \left( \sum_{n=0}^{N} A_n - L \right) - \sum_{n=M+1}^{N} A_n.
\]

For any $\varepsilon > 0$, $N$ may be chosen large enough so that the first term here is bounded by $\varepsilon$, and since the second term is a finite sum, it may be estimated using the triangle inequality.

(c) Prove the Leibniz rule for differentiation of the product of two matrix-valued functions of a scalar variable:

\[
\frac{d}{dt}\,\varphi(t)\psi(t) = \varphi'(t)\psi(t) + \varphi(t)\psi'(t),
\]

and prove the first two assertions in Proposition 2.2.12.

(d) Prove Proposition 2.3.1.


Remark: One may prove this result either by comparing terms in the series for the two sides of the equation or by starting from $t = 0$ and differentiating.

(e) Justify the rearrangement of terms in Equation (2.25).

(f) Establish Equation (2.30).

(g) Show that if $\Lambda = \mathrm{Diag}(\lambda_1, \ldots, \lambda_d)$, then

\[
\|e^{\Lambda t}\| = \max_{1 \le j \le d} |e^{\lambda_j t}|,
\]

a result used in proving Proposition 2.4.2. Generalize this result to a diagonal array of Jordan blocks as in (2.37).

(h) Complete the proofs of Theorem 2.4.1 and Proposition 2.4.3.

Discussion: Supplying the missing details in the proof of Proposition 2.4.3 is a skill-building exercise; do it! Here is a practice problem that may help in this task: Prove that for any positive power $p$,

\[
\max_{0 \le t < \infty} t^p e^{-t}
\]

exists and is finite. The exact value of the maximum can in fact be computed using calculus, but it is useful training to do this practice exercise merely with estimation, as follows: Argue that

\[
\lim_{t \to \infty} t^p e^{-t} = 0.
\]

Deduce that there is a constant $M$ such that $t^p e^{-t} < 1$ if $t > M$. Therefore

\[
\max_{0 \le t < \infty} t^p e^{-t} \le \max\left\{ \max_{0 \le t \le M} t^p e^{-t}, \; 1 \right\}
\]

and the maximum over $[0, M]$ is finite by compactness.

(i) Prove Propositions 2.4.4 and 2.4.6.

(j) Verify that (2.39) satisfies (2.38). One may derive (2.39) by multiplying (2.38) by $e^{-At}$ and manipulating the result, or one may just differentiate (2.39) and see that (2.38) is actually satisfied.

(k) Show that the theory of Sections 2.2 and 2.3 extends to matrices with complex entries. (Ugh!)

2. Suppose $A$ is a square matrix with at least one eigenvalue $\lambda$ such that $\Re\lambda < 0$. Show that the linear system $x' = Ax$ has at least one nonzero solution $x(t)$ such that

\[
\lim_{t \to \infty} x(t) = 0.
\]

3. Show that if $A$ is $d \times 1$ (a column vector) or $1 \times d$ (a row vector), then $\|A\|$ is just the norm of the vector. Also show that for an invertible matrix $S$, $\|S\| \, \|S^{-1}\| \ge 1$.

4. Rederive (2.22) by explicitly solving the ODE $x' = Ax$.

Hint: The $x_2$-equation does not involve $x_1$; solve this equation first and then attack the $x_1$-equation.

5. Compute $e^{tA}$ for the following matrices $A$:

(a)
\[
\begin{bmatrix} a & 1 \\ 0 & a + \varepsilon \end{bmatrix}
\]

Hint: Subtract $aI$ from this matrix, exponentiate the result, and then multiply by $e^{at}$.

Discussion: Recalling (2.33), you will want to compare your answer with (2.22), the exponential of the Jordan block.

(b)
\[
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 0 & -1 \end{bmatrix}
\]




6. Find the $2 \times 2$ matrix $A$ that has the indicated eigenvalues and eigenvectors:

e-value              e-vector
$-1/\varepsilon$     $(1, 1)$
$-1$                 $(1, 1 + \delta)$

Discussion: This exercise is easy if one makes use of Proposition 2.3.2. The point of the exercise is to observe that $\|A\|$ may become large if $\varepsilon$ and/or $\delta$ tends to zero. This behavior is not surprising if $\varepsilon \to 0$ since every eigenvalue of $A$ is bounded by $\|A\|$. It may be surprising for $\delta \to 0$, i.e., when the two eigenvectors of the matrix become nearly parallel. Question: The norm $\|A\|$ need not blow up as $\delta \to 0$ if $\varepsilon \approx 1$; explain this.


7. (a) Prove the following slight improvement of the upper bound in Lemma 2.2.4:

A

≤max

i j |

aij

|.

(b) Derive the following exact formula for the norm of a matrix:

\[
\|A\| = \sqrt{\lambda_{\max}(A^T A)}.
\]

Use this result to find the norm of

\[
\begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}.
\]

Compare this answer with the estimate of Part (a).

Hint for Part (b): Deduce from the definition of $\|A\|$ that

\[
\|A\|^2 = \max_{|x| \le 1} \langle A^T A x, x \rangle
\]

and invoke the spectral theorem for symmetric matrices to estimate $\langle A^T A x, x \rangle$.

8. Show that if for all $x \in \mathbb{R}^d$, $\langle x, Ax \rangle \le -\gamma |x|^2$, then

\[
\|e^{At}\| \le e^{-\gamma t}.
\]

Discussion: The hypothesis in this Exercise implies that $\Re\lambda_j(A) \le -\gamma$ (show this). Thus, the estimate (2.36) is available for $\|e^{At}\|$. The point of this exercise is that under the stronger hypothesis the constants $\varepsilon$ and $K$ in (2.36) are not needed. Incidentally, another approach in which $K$ is not needed is described in "Additional notes", Section 2.7.

Hint: For any $x \in \mathbb{R}^d$, let $u(t) = |e^{At}x|^2$. Use the hypothesis to show that

\[
\frac{d}{dt}u(t) \le -2\gamma u(t).
\]

Then estimate $u(t)$ by differentiating $e^{2\gamma t}u(t)$. Finally consider maximizing over $|x| \le 1$.

9. Suppose $A$ is a $2 \times 2$ real matrix with eigenvalues $a \pm ib$ where $b \ne 0$. Let $u = v + iw$ be an eigenvector of $A$ with eigenvalue $a + ib$; thus

\[
A(v + iw) = (a + ib)(v + iw). \tag{2.41}
\]

Let $S$ be the $2 \times 2$ matrix $\mathrm{Col}(v, w)$. Deduce from (2.41) that $S^{-1}AS = C$ where

\[
C = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}. \tag{2.42}
\]

Remark: As mentioned in the text, the matrix $C$ is called the real canonical form of $A$.

10. If $A$ is a $d \times d$ matrix with real entries, define the Euclidean norm of $A$,

\[
\|A\|_E = \left( \sum_{j,k=1}^{d} a_{jk}^2 \right)^{1/2}.
\]

Determine which of the following is true and prove it:

(a) For all $A$ different from zero, $\|A\| < \|A\|_E$.

(b) For all $A$, $\|A\| \le \|A\|_E$, with equality occurring for at least one nonzero $A$.

(c) There is a matrix $A$ such that $\|A\| > \|A\|_E$.

11. This exercise is intended to make Condition (ii) in Proposition 2.4.5 seem less mysterious. Consider this inequality applied to a matrix with eigenvalues $-a$, $\varepsilon \pm i\nu$, where $a > 0$. First show that if $\varepsilon = 0$, then the two sides of the inequality are in fact equal. Then extend your calculation to show that Condition (ii) holds if $\varepsilon < 0$ and is violated if $\varepsilon > 0$.




12. (a) Consider the variable-coefficient ODE (2.40), supposing that $A(t_1)$ and $A(t_2)$ commute for all $t_1$, $t_2$. Let

\[
\mathcal{A}(t) = \int_0^t A(s)\,ds.
\]

Show that

\[
x(t) = e^{\mathcal{A}(t)}b + \int_0^t e^{\mathcal{A}(t) - \mathcal{A}(s)}f(s)\,ds
\]

solves the IVP for (2.40) with initial condition $x(0) = b$.

(b) Solve (2.40) in case

\[
A(t) = \begin{bmatrix} 0 & t \\ 0 & 1 \end{bmatrix},
\]

and show that this solution differs from what the above construction produces.


We urge you to re-read the paragraph at the end of Section 2.3.1 and to take to heart its message: two systems of ODEs with similar coefficient matrices, say $x' = Ax$ and $y' = S^{-1}AS\,y$, differ only by the choice of coordinates on $\mathbb{R}^d$; they describe exactly the same phenomena.

There are numerous alternative norms used to measure the size of vectors and matrices. For vectors, two common choices are

\[
|x|_1 = \sum_{j=1}^{d} |x_j| \quad \text{and} \quad |x|_\infty = \max_{1 \le j \le d} |x_j|,
\]

which give rise to matrix norms

\[
\|A\|_1 = \max_{|x|_1 \le 1} |Ax|_1 \quad \text{and} \quad \|A\|_\infty = \max_{|x|_\infty \le 1} |Ax|_\infty.
\]

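If $S$ is an invertible matrix, the norms

\[
|x|_S = |Sx|, \qquad \|A\|_S = \max_{|x|_S \le 1} |Ax|_S = \|SAS^{-1}\|,
\]

where $|\cdot|$ is the usual mean-square norm (2.7), are sometimes useful. (The last equality follows from the substitution $y = Sx$.) Another choice for matrices is $\|A\|_E$, discussed in Exercise 10.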




Chapter 3

Nonlinear systems: local theory

3.1 Two counterexamples

In this chapter we formulate the main existence and uniqueness theorem for the IVP for systems of nonlinear ODEs. For the moment we consider only autonomous systems, say

\[
x' = F(x) \tag{3.1}
\]

where $F : \mathbb{R}^d \to \mathbb{R}^d$. More generally, we may assume that $F$ is defined only on an open subset $U \subset \mathbb{R}^d$. In Section 3.4, the theory is extended to nonautonomous systems.

To introduce the main theorems, we begin with two examples to emphasize what they do not say.

Example 1: Without special conditions on $F$, the RHS in (3.1), the IVP may possess a solution only for a finite, possibly very short, time. To see this, consider the IVP for a scalar unknown function $x(t)$

\[
x' = x^2, \qquad x(0) = 1. \tag{3.2}
\]

The equation may be solved using separability:

\[
\frac{dx}{x^2} = dt, \quad \text{which integrates to} \quad -\frac{1}{x} = t - C.
\]

Solving for $x$ and imposing the IC to deduce that $C = 1$, we obtain

\[
x(t) = \frac{1}{1 - t}. \tag{3.3}
\]

The reader may check that this formula satisfies both the equation and the initial condition, but this solution exists only for $t < 1$.

Strictly speaking, formula (3.3) makes sense provided $t \ne 1$, but in most modeling contexts, continuation of the solution beyond the blow-up time, i.e., to $t > 1$, is rejected on physical grounds. Suppose for example that $x$ represents a population; re-emergence of $x$ with large negative values after the singularity at $t = 1$ is nonsensical. We shall say that the solution ceases to exist at $t = 1$.

The blow-up of (3.3) at $t = 1$ may be understood as follows: From the equation we see that $x' > 0$, so the solution is always increasing. As the solution grows, the equation forces $x'$ to increase ever more quickly, and the growth accelerates out of control in a finite time. It is instructive to compare (3.2) with the linear equation $x' = x$, whose solutions also grow without bound, but in the latter case the cumulative growth up to time $t$ remains finite no matter how large $t$ may get. The key difference between these equations is that in (3.2) the RHS $x^2$ grows faster than linearly as $x \to \infty$; finite "blow-up" time is associated with such superlinear growth. (See the Exercises for more examples relating the growth of $F$ and blow-up in finite time.)
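The contrast between $x' = x^2$ and $x' = x$ shows up clearly in a numerical experiment. The sketch below, in plain Python, integrates both equations from $x(0) = 1$ with a classical fourth-order Runge-Kutta step (a standard method chosen here for illustration, not one discussed in the text) and prints the results beside the exact values $1/(1-t)$ and $e^t$. The function name `rk4` and the step counts are assumptions.

```python
import math

def rk4(f, x0, t_end, steps):
    """Integrate the autonomous scalar ODE x' = f(x) from t = 0 to t_end."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def quadratic(x):
    return x * x      # RHS of (3.2), superlinear growth

def linear(x):
    return x          # RHS of the comparison equation x' = x

for t in (0.5, 0.9):
    print(t,
          rk4(quadratic, 1.0, t, 10000), 1.0 / (1.0 - t),   # blows up as t -> 1
          rk4(linear, 1.0, t, 10000), math.exp(t))          # stays modest
```

As $t$ approaches 1 the first pair of columns grows without bound while the second stays below $e$, which is the finite-time blow-up described above.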

In Chapter 4 we give sufficient conditions to guarantee that the solution to an IVP exists for all time.

Example 2: Without special conditions on $F$, solutions of the IVP for (3.1) need not be unique. To see this, consider another scalar IVP¹

\[
x' = \sqrt{|x|}, \qquad x(0) = 0. \tag{3.4}
\]

Note that $x' \ge 0$, so that for positive $t$ we have $x(t) \ge 0$, and thus we may drop the absolute value in the equation. Again we may solve $x' = \sqrt{x}$ by separability to get the general solution

\[
x(t) = \left( \frac{t - C}{2} \right)^2.
\]

Choosing $C = 0$ to satisfy the IC, we get the solution $x(t) = t^2/4$. The reader may check that this formula satisfies both the equation and the initial condition.

However, note that $x(t) \equiv 0$ also solves both the equation and the initial condition. In other words, the solution to this IVP is not unique. Moreover, the situation is worse than is so far apparent: as we ask you to verify in the Exercises, for any constant $t_0 \ge 0$ the function

\[
x(t) = \begin{cases}
0 & \text{if } t \le t_0 \\
(t - t_0)^2/4 & \text{if } t > t_0
\end{cases} \tag{3.5}
\]

is a continuously differentiable solution of the equation that also satisfies the initial condition. In other words, there are infinitely many solutions of the IVP.

¹ Recall that in Exercise 1 of Chapter 1 we found multiple solutions to (1.44). The present example is simply a reworking of (1.44) to make the equation a little simpler and to make the RHS defined in a full neighborhood of the singularity at $x = 0$.

The problem with this example stems from the singularity of $\sqrt{x}$ at the origin. As we shall see below, uniqueness may be guaranteed if $F$ is continuously differentiable. One might wish that physical problems always led to nonsingular (e.g., continuously differentiable) equations. Unfortunately this is not true: in the Exercises we introduce a physically based ODE with a singularity like (3.4) and discuss non-uniqueness issues in this example.
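The non-uniqueness in Example 2 can be seen numerically: every member of the family (3.5) starts at $x(0) = 0$ and satisfies $x' = \sqrt{|x|}$. The following plain-Python sketch estimates $x'$ with a centered difference and reports the largest defect over a few sample points; the function names, the step `h`, and the sample values of $t$ and $t_0$ are illustrative assumptions.

```python
import math

def family_member(t, t0):
    """Member of the family (3.5): zero up to t0, then a shifted parabola."""
    return 0.0 if t <= t0 else (t - t0) ** 2 / 4

def defect(t, t0, h=1e-5):
    """|x'(t) - sqrt(|x(t)|)| with x' approximated by a centered difference."""
    derivative = (family_member(t + h, t0) - family_member(t - h, t0)) / (2 * h)
    return abs(derivative - math.sqrt(abs(family_member(t, t0))))

switch_times = (0.0, 1.0, 2.5)       # hypothetical choices of t0
sample_times = (0.5, 1.0, 2.0, 3.5)  # hypothetical evaluation points

worst = max(defect(t, t0) for t0 in switch_times for t in sample_times)
print(worst)                                              # small: all satisfy the ODE
print([family_member(0.0, t0) for t0 in switch_times])    # all start at 0
```

The defect is not exactly zero at $t = t_0$ because the second derivative of (3.5) jumps there, so the centered difference is only first-order accurate at that point.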

3.2 The existence theorem

3.2.1 Statement of the theorem

In this section we state and prove the fundamental existence theorem for the IVP for (3.1). We will assume that the function $F$ on the RHS of the ODE is continuous, and in fact we will impose a stronger condition that we now describe.

If $S$ is a subset of $\mathbb{R}^{d_1}$, a function $F : S \to \mathbb{R}^{d_2}$ is called Lipschitz continuous, or simply Lipschitz, if there is a constant $L$ such that for all points $x, y \in S$,

\[
|F(x) - F(y)| \le L|x - y|. \tag{3.6}
\]

Condition (3.6) is much more restrictive than mere continuity. For example, on the line $F(x) = \sqrt{|x|}$ is continuous but not Lipschitz continuous. In fact, a continuous function can exhibit much more pathological behavior than a square root, such as the Cantor function, also called the devil's staircase.

Let $U$ be an open subset of $\mathbb{R}^{d_1}$, and let $F : U \to \mathbb{R}^{d_2}$. We shall call $F$ locally Lipschitz if for every point $x_0 \in U$ there is a neighborhood $V$ of $x_0$ such that the restriction $F|_V$ is Lipschitz. For example, on the real line the function $F(x) = x^2$ is locally Lipschitz, even though it is not Lipschitz on the whole line. Sometimes, to emphasize the distinction with locally Lipschitz, we shall say that a function is globally Lipschitz on $S$ to mean that it is Lipschitz on $S$.
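The two scalar examples mentioned above can be made explicit with a short calculation (a worked illustration, with $R > 0$ an arbitrary radius introduced here). For $F(x) = x^2$ and $x, y \in [-R, R]$ the difference quotient is bounded, whereas for $F(x) = \sqrt{|x|}$ it is unbounded near the origin:

```latex
\[
|x^2 - y^2| = |x + y|\,|x - y| \le 2R\,|x - y|
\qquad \text{for } x, y \in [-R, R],
\]
\[
\frac{\bigl|\sqrt{|x|} - \sqrt{|0|}\bigr|}{|x - 0|} = \frac{1}{\sqrt{x}} \to \infty
\qquad \text{as } x \to 0^{+}.
\]
```

The first bound gives the Lipschitz constant $L = 2R$ on $[-R, R]$, which grows with $R$; hence $x^2$ is locally but not globally Lipschitz. The second computation shows that no constant $L$ can work for $\sqrt{|x|}$ on any neighborhood of $0$.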

We shall study (3.1) under the assumption that $F$ is locally Lipschitz. As Proposition 3.2.2 below shows, a $C^1$ function is locally Lipschitz; in fact, being locally Lipschitz is only slightly less restrictive than being $C^1$. In most applications we consider in later chapters, $F$ will actually be $C^1$; we consider the more general condition here only because, as we shall see, the local existence and uniqueness theory is so well suited to a Lipschitz condition.

Here is our fundamental existence theorem for the IVP: i.e., given $b \in \mathbb{R}^d$, to find a continuously differentiable function $x(t)$ such that

\[
x' = F(x), \qquad x(0) = b. \tag{3.7}
\]




Theorem 3.2.1. Let U ⊂ Rd be open and contain the initial data b, and let F : U → Rd be locally Lipschitz. Then there is an interval (−η, η) and a C1 function x : (−η, η) → U that satisfies (3.7).

The construction of $x$ will show that the solution is unique on the possibly-very-short interval $(-\eta, \eta)$ in the theorem. In fact, however, uniqueness holds in far greater generality, as we shall show in Section 3.3. Pending those stronger results, we ignore the information regarding uniqueness that may be obtained through proving Theorem 3.2.1.

Remark: As we discuss in the Exercises, the existence theorem may be easily generalized to nonautonomous equations, $x' = F(x, t)$.

We must develop substantial preliminaries before we are ready to prove Theorem 3.2.1. The results of the next subsection are not actually needed to prove the theorem, but they help elucidate Lipschitz continuity.

3.2.2 Differentiability implies Lipschitz continuity

Proposition 3.2.2. If $U \subset \mathbb{R}^{d_1}$ is open and $F : U \to \mathbb{R}^{d_2}$ is $C^1$, then $F$ is locally Lipschitz.

Incidentally, the converse of this result is not true: for example, on the real line, $F(x) = |x|$ is locally Lipschitz (globally Lipschitz, in fact) despite not being differentiable at $x = 0$. In Exercise ?? we propose a more dramatic example illustrating the same point.

We offer two proofs of the proposition, the first only for scalar-valued functions of one variable and the second for the general case. The second proof is greatly to be preferred. We offer the first proof only because it illustrates a common bad habit among beginning analysis students, over-reliance on the mean-value theorem, and we want to have an identified target to shoot down.

Proof 1 (Only for $d_1 = d_2 = 1$). Given $x_0 \in U$, choose a closed interval $I$ such that

\[
x_0 \in \operatorname{Int} I \subset I \subset U,
\]

where Int means interior, and let

\[
L = \max_{x \in I} |F'(x)|.
\]

Given $x, y \in I$, we have from the mean-value theorem that there is a point $\xi$ between $x$ and $y$ such that

\[
F(x) - F(y) = F'(\xi)(x - y),
\]

so by the choice of $L$

\[
|F(x) - F(y)| \le L|x - y|.
\]




This proof generalizes easily to all $d_1$, but there are problems with it if $d_2 > 1$. The difficulty is that one needs to apply the mean-value theorem separately to each of the $d_2$ components of $F$. This is not impossible, but it results in a clunky proof. The following is a far more elegant alternative, and we strongly urge you to absorb the idea in Lemma 3.2.3 on which the proof is based.

Proof 2 (For general $d_1$, $d_2$). Given $x_0 \in U$, we choose as an appropriate neighborhood of $x_0$ a closed ball $B(x_0, \varepsilon)$ that is contained in $U$. We show that $F$ is Lipschitz on this ball with Lipschitz constant

\[
L = \max_{z \in B(x_0, \varepsilon)} \|DF(z)\|. \tag{3.8}
\]

Let two points $x, y \in B(x_0, \varepsilon)$ be given. We isolate the following simple lemma as a separate result so that we can refer to it later.

Lemma 3.2.3. In the above notation,

\[
F(x) - F(y) = \left[ \int_0^1 DF(y + s(x - y))\,ds \right] (x - y). \tag{3.9}
\]

Proof. Note that the argument of $DF$ in the above integrand,

\[
\ell(s) = y + s(x - y), \qquad 0 \le s \le 1, \tag{3.10}
\]

defines the line segment from $y$ to $x$, which is entirely contained in $B(x_0, \varepsilon) \subset U$. Thus, the composition $F \circ \ell : [0, 1] \to \mathbb{R}^{d_2}$ is defined and $C^1$. By the Fundamental Theorem of Calculus²

\[
F(x) - F(y) = \int_0^1 \frac{d}{ds}[F \circ \ell](s)\,ds. \tag{3.11}
\]

According to the chain rule, $(d/ds)[F \circ \ell](s) = DF(\ell(s)) \cdot \ell'(s)$, and differentiation of (3.10) yields $\ell'(s) = x - y$. Thus (3.9) follows.

Proof 2 of Proposition 3.2.2, concluded. Recalling the definition (3.8), we estimate from (3.9) that

\[
|F(x) - F(y)| \le L|x - y| \int_0^1 ds,
\]

which yields the required bound.

² This equation involves vector-valued functions, which might worry you. Roughly speaking, calculus for vector-valued functions of a single real variable is no more complicated than the usual one-variable calculus. For example, each component of (3.11) is simply the standard Fundamental Theorem of Calculus. By contrast, calculus of functions of several real variables introduces many new complications, such as divergence, curl, multiple integrals, Green's theorem, etc. The few ideas from multi-variable calculus needed for this book, such as the chain rule, are developed in Appendix D.

Although we do not make this an explicit exercise, we ask you to verify that the exact same construction as in the above proof supports the following extension:

Corollary 3.2.4. If $U \subset \mathbb{R}^{d_1}$ is open, if $F : U \to \mathbb{R}^{d_2}$ is $C^1$, and if $K \subset U$ is convex, then $F|_K$ is Lipschitz.

3.2.3 Reformulation of the IVP as an integral equation

The proof of Theorem 3.2.1 is based on analyzing the equivalent integral equation (3.12) that appears in the following Proposition. The integral equation is more tractable than (3.7) because (i) integration is a much less singular operation than differentiation and (ii) the two separate equations in (3.7) are combined in a single integral equation.

In the Proposition, it suffices if $F$ is merely continuous, so we temporarily weaken our hypotheses. Also we consider the IVP on a general interval $(\alpha, \beta)$.

Proposition 3.2.5. Let $U \subset \mathbb{R}^d$ be open, and let $F : U \to \mathbb{R}^d$ be continuous. If $x \in C^1((\alpha, \beta), U)$ satisfies (3.7), then $x$ satisfies the integral relation

\[
x(t) = b + \int_0^t F(x(s))\,ds, \qquad \alpha < t < \beta. \tag{3.12}
\]

Conversely, if $x$ is continuous on $(\alpha, \beta)$ and satisfies (3.12), then $x$ is $C^1$ and satisfies (3.7).

The proof of this result is a straightforward application of the Fundamental Theorem of Calculus, and we leave it as an Exercise for the reader.

Despite its appearance, equation (3.12) is not a formula that tells us what the solution is because we need to know $x$ in order to evaluate the integral.
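Although (3.12) does not produce the solution, it does give a convenient test for a candidate solution. The plain-Python sketch below evaluates the defect $|x(t) - b - \int_0^t F(x(s))\,ds|$ by Simpson's rule for the IVP (3.2), once for the exact solution (3.3) and once for a function that is not a solution; the function names and the trial function $1 + t$ are illustrative assumptions.

```python
def simpson(g, a, b, n=2000):
    """Composite Simpson's rule for the integral of g over [a, b] (n even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3

def integral_equation_gap(x, F, b0, t):
    """|x(t) - (b0 + int_0^t F(x(s)) ds)|, the defect in (3.12)."""
    return abs(x(t) - (b0 + simpson(lambda s: F(x(s)), 0.0, t)))

def rhs(x):
    return x * x                 # F for the IVP (3.2)

def exact(t):
    return 1.0 / (1.0 - t)       # solution (3.3)

def not_a_solution(t):
    return 1.0 + t               # satisfies x(0) = 1 but not the ODE

gap_exact = integral_equation_gap(exact, rhs, 1.0, 0.5)
gap_wrong = integral_equation_gap(not_a_solution, rhs, 1.0, 0.5)
print(gap_exact)   # essentially zero
print(gap_wrong)   # clearly nonzero
```

Note that evaluating the integral required the whole function $x$ on $[0, t]$, which is exactly why (3.12) is an equation to be solved rather than an explicit formula.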

3.2.4 The contraction-mapping principle

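In Chapter 2 we encountered the concept of a norm on a vector space $X$: i.e., a function $\|\cdot\| : X \to [0, \infty)$ such that for any vectors $x, y \in X$ and scalar $c \in \mathbb{R}$

\[
\begin{aligned}
&\text{(i)} \quad \|x\| \ge 0, \text{ and } \|x\| = 0 \text{ iff } x = 0, \\
&\text{(ii)} \quad \|cx\| = |c| \, \|x\|, \\
&\text{(iii)} \quad \|x + y\| \le \|x\| + \|y\|.
\end{aligned} \tag{3.13}
\]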




There the norms were defined on finite dimensional spaces of vectors or matrices. In the present chapter we are interested in norms on infinite dimensional spaces, especially on $C([-\eta, \eta], \mathbb{R}^d)$, the set of continuous functions from the closed interval [−η, η] into $\mathbb{R}^d$, where for $x \in C([-\eta, \eta], \mathbb{R}^d)$ we define

$$\|x\| = \max_{-\eta \le t \le \eta} |x(t)|. \tag{3.14}$$

Since [−η, η] is compact, the maximum exists. In the Exercises, the reader is asked to show that (3.14) satisfies the axioms (3.13). Convergence of a sequence in $C([-\eta, \eta], \mathbb{R}^d)$ with respect to this norm is simply uniform convergence of a sequence of functions.
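A minimal numerical sketch of the norm (3.14), assuming d = 1 and η = 1, may help fix ideas. The maximum is approximated by sampling on a uniform grid; the sequence $x_n(t) = \sqrt{t^2 + 1/n}$ and the helper names `sup_norm`, `smoothed_abs`, and `distance_to_abs` are choices made for this illustration only. Since $x_n(t) - |t|$ is largest at t = 0, the distance from $x_n$ to the function |t| in this norm is $1/\sqrt{n}$, so $x_n$ converges uniformly to |t|.

```python
import math

def sup_norm(f, eta, num_points=2001):
    """Approximate max over [-eta, eta] of |f(t)| by sampling a uniform grid.

    With an odd number of points the grid contains t = 0 exactly.
    """
    ts = [-eta + 2.0 * eta * i / (num_points - 1) for i in range(num_points)]
    return max(abs(f(t)) for t in ts)

def smoothed_abs(n):
    """Return the function x_n(t) = sqrt(t^2 + 1/n), a smooth approximation of |t|."""
    return lambda t: math.sqrt(t * t + 1.0 / n)

def distance_to_abs(n, eta=1.0):
    """Sampled sup-norm distance between x_n and the function t -> |t|."""
    xn = smoothed_abs(n)
    return sup_norm(lambda t: xn(t) - abs(t), eta)

# The distances shrink like 1/sqrt(n), which is uniform convergence.
distances = {n: distance_to_abs(n) for n in (1, 4, 16, 64, 256)}
```

Note that a grid only gives a lower bound for the true maximum in general; here the maximizer t = 0 lies on the grid, so the sampled value agrees with the norm up to rounding.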

Let $(X, \|\cdot\|)$ be a normed linear space. A sequence $\{x_n\} \subset X$ is called Cauchy if for every ε > 0 there is an integer N such that

$$m, n > N \implies \|x_m - x_n\| < \varepsilon.$$

The space X is called complete if every Cauchy sequence converges: i.e., there exists an element $x_\infty$ such that

$$\lim_{n \to \infty} \|x_n - x_\infty\| = 0.$$

Both these concepts should be familiar to the reader from their application to the real numbers.

Proposition 3.2.6. The space C([−η, η],Rd) is complete.

This proposition is merely a restatement of the result (Theorem B.0.5 in Appendix B) that the uniform limit of a sequence of continuous functions is itself continuous.

Incidentally, a complete normed linear space is called a Banach space. Thus, in this terminology, Proposition 3.2.6 asserts that $C([-\eta, \eta], \mathbb{R}^d)$ is a Banach space.

After two more definitions, we will be ready to state and prove the contraction-mapping principle. Let Ω be a subset of a normed linear space X, and let T : Ω → Ω be some mapping of that set into itself. We use a Gothic letter for the mapping as a warning that it is a more complicated mathematical object than others we have encountered so far: if, for example, $X = C([-\eta, \eta], \mathbb{R}^d)$, then T needs a vector-valued function x(t), −η ≤ t ≤ η, as its argument, and the result of applying T to x, which we write as T[x] with square brackets, is also a function on [−η, η]. We shall call T a contraction if there is a constant C < 1 such that for all x, y ∈ Ω

$$\|T[x] - T[y]\| \le C\,\|x - y\|; \tag{3.15}$$

in words, for T to be a contraction, it must be Lipschitz continuous with a Lipschitz constant less than unity. (Although we originally defined Lipschitz continuity for functions on $\mathbb{R}^d$, the definition generalizes to any metric space, even an infinite-dimensional one as here.) Finally we shall call a point x ∈ Ω a fixed point of T if T[x] = x.

Theorem 3.2.7. If Ω is a closed subset of a Banach space X and if T : Ω → Ω is a contraction, then T has a unique fixed point in Ω.

Proof. Choose a vector $x_0 \in \Omega$ arbitrarily. Define a sequence inductively as follows: having chosen $x_0, x_1, \ldots, x_n$, let $x_{n+1} = T[x_n]$. Observe that

$$\|x_{n+1} - x_n\| = \|T[x_n] - T[x_{n-1}]\| \le C\,\|x_n - x_{n-1}\|.$$

Iterating this inequality, we deduce that

$$\|x_{n+1} - x_n\| \le C^n\,\|x_1 - x_0\|. \tag{3.16}$$

We ask the reader to show that, since C < 1, (3.16) implies that $\{x_n\}$ is Cauchy. Then since X is complete, we conclude that $\{x_n\}$ has a limit $x_\infty$ in X; moreover, since Ω is closed, $x_\infty \in \Omega$.

We claim that $x_\infty$ is a fixed point of T. To see this, observe that

$$T[x_\infty] = T[\lim x_n] = \lim T[x_n] = \lim x_{n+1} = x_\infty,$$

where we have used the continuity of T to pull the limit outside the argument of T.

To show that the fixed point is unique, suppose x, y are both fixed points of T. Then

$$\|x - y\| = \|T[x] - T[y]\| \le C\,\|x - y\|,$$

or

$$(1 - C)\,\|x - y\| \le 0.$$

Since 1 − C > 0, we deduce $\|x - y\| \le 0$. Then by (3.13i) we obtain x = y.
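The iteration used in this proof is easy to try in the simplest Banach space, X = R with the absolute value as norm. The following sketch is an illustration under assumed choices, not part of the proof: we take Ω = [0, 1] and T = cos, which maps [0, 1] into [cos 1, 1] ⊂ [0, 1] and satisfies |cos x − cos y| ≤ (sin 1)|x − y| there, so C = sin 1 < 1. The function name `iterate_to_fixed_point` and the stopping tolerance are likewise assumptions.

```python
import math

def iterate_to_fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = T(x_n) until successive iterates differ by less than tol.

    Returns the last iterate and the number of iterations used.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("iteration did not converge within max_iter steps")

# Two different starting points in Omega = [0, 1] lead to the same fixed point,
# as the uniqueness part of the theorem requires.
fixed_from_0, steps_from_0 = iterate_to_fixed_point(math.cos, 0.0)
fixed_from_1, steps_from_1 = iterate_to_fixed_point(math.cos, 1.0)
```

Stopping when successive iterates are close is justified by the contraction estimate: if $|x_{n+1} - x_n| < \text{tol}$, then the distance from $x_{n+1}$ to the fixed point is at most C·tol/(1 − C).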

3.2.5 Proof of the existence theorem

To prove Theorem 3.2.1 we will construct a solution of (3.7) by finding a fixed point of a mapping based on the integral equation (3.12), as follows: Choose a ball B(b, δ), a neighborhood of b in $\mathbb{R}^d$, such that (a) the closure $\overline{B(b, \delta)}$ is contained in the open set U and (b) the restriction of F to $\overline{B(b, \delta)}$ is Lipschitz: in symbols

$$\text{(a) } \overline{B(b, \delta)} \subset U \quad \text{and} \quad \text{(b) } F|_{\overline{B(b, \delta)}} \text{ is Lipschitz continuous.} \tag{3.17}$$

Let $\Omega \subset C([-\eta, \eta], \mathbb{R}^d)$ be defined by

$$\Omega = \{ x \in C([-\eta, \eta], \mathbb{R}^d) : \forall t \in [-\eta, \eta] \;\; |x(t) - b| \le \delta \};$$

in other notation, we could also write

$$\Omega = \{ x \in C([-\eta, \eta], \mathbb{R}^d) : \|x - b\| \le \delta \}$$

in which formula b denotes the constant function that, at every point t, equals the vector $b \in \mathbb{R}^d$. In words, B(b, δ) is the ball in the Euclidean space $\mathbb{R}^d$ of radius δ around the vector b, while Ω is the ball in the infinite dimensional space $C([-\eta, \eta], \mathbb{R}^d)$ of radius δ around the constant function b. Then for any x ∈ Ω, the integrand F(x(s)) in (3.12) makes sense and is a continuous function of s. Hence we may define a mapping from Ω into $C([-\eta, \eta], \mathbb{R}^d)$, in symbols $T : \Omega \to C([-\eta, \eta], \mathbb{R}^d)$, by the RHS of (3.12): i.e.,

$$T[x](t) = b + \int_0^t F(x(s))\,ds, \qquad -\eta \le t \le \eta. \tag{3.18}$$

This formula may involve a way of thinking unfamiliar to the reader since T is a mapping between subsets of (infinite-dimensional) function spaces. Thus the argument of T is a function, written x without any argument, and the result T[x] is also a function. To know what function T[x] is, we have to be told its value for every point t ∈ [−η, η], and that is what (3.18) gives us.

The following two claims will allow us to apply Theorem 3.2.7 to extract a fixed point of T in $C([-\eta, \eta], \mathbb{R}^d)$. Application of Proposition 3.2.5 above then shows that this fixed point is a solution of (3.7) on the open interval (−η, η). In fact our proof of Theorem 3.2.1 shows a little more than what is claimed: the solution we obtain is actually continuous on the closed interval [−η, η].

Claim 1: If η is sufficiently small then for any x ∈ Ω, the image T[x] belongs to Ω.

In other words, although as originally defined the range of T was $C([-\eta, \eta], \mathbb{R}^d)$, by reducing η if needed, we may regard T as a mapping into Ω.

Proof. We need to show that for any x ∈ Ω,

$$\|T[x] - b\| \le \delta.$$

From (3.12) we compute that

$$(T[x] - b)(t) = \int_0^t F(x(s))\,ds,$$

so

$$|T[x] - b|(t) \le \int_{[0,t]} |F(x(s))|\,ds.$$

(We replace limits of integration by the interval [0, t] to cover the case when t may be negative; even if t < 0, in our notation $\int_{[0,t]} ds = |t| > 0$.)

Let

$$K = \max_{z \in \overline{B(b,\delta)}} |F(z)|.$$

Since x ∈ Ω, the integrand above satisfies |F(x(s))| ≤ K. Thus observing that $\int_{[0,t]} ds \le \eta$, we conclude that $|T[x] - b|(t) \le \eta K$, so the claim will be satisfied provided η is chosen such that ηK ≤ δ.

Claim 2: If η is sufficiently small, then T is a contraction.

Proof. By (3.17b), there is a Lipschitz constant L for F over $\overline{B(b, \delta)}$. Let x, y ∈ Ω be given. From (3.12)

$$|T[x] - T[y]|(t) \le \int_{[0,t]} |F(x(s)) - F(y(s))|\,ds.$$

By the Lipschitz property,

$$|T[x] - T[y]|(t) \le L \int_{[0,t]} |x(s) - y(s)|\,ds.$$

Of course

$$|x(s) - y(s)| \le \|x - y\|,$$

and estimating $\int_{[0,t]} ds \le \eta$, we deduce that

$$\|T[x] - T[y]\| \le \eta L\,\|x - y\|.$$

Thus the claim follows if ηL < 1.
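To see how the two claims constrain η in practice, here is a short worked computation. The choices F(x) = x², b = 1 and δ = 1 are assumptions made for illustration; they are not taken from the proof.

```latex
% Assumed example: F(x) = x^2, b = 1, \delta = 1, so \overline{B(b,\delta)} = [0, 2].
\begin{align*}
K &= \max_{0 \le z \le 2} |z^2| = 4, \\
|x^2 - y^2| &= |x + y|\,|x - y| \le 4\,|x - y| \quad (x, y \in [0, 2]),
  \quad\text{so } L = 4, \\
\eta K \le \delta &\iff \eta \le \tfrac{1}{4} \qquad \text{(Claim 1)}, \\
\eta L < 1 &\iff \eta < \tfrac{1}{4} \qquad \text{(Claim 2)}.
\end{align*}
```

Thus any η < 1/4 satisfies both claims. By contrast, the explicit solution x(t) = 1/(1 − t) of this IVP exists for −∞ < t < 1, which shows that the interval delivered by the proof may be much shorter than the interval on which a solution actually exists.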

3.2.6 An illustrative example

Unscrambling the proof of the fixed point theorem, we see that the construction of the solution of the IVP ultimately comes down to the limit of an iterated sequence: $x_0$ is chosen arbitrarily (e.g., $x_0(t) \equiv b$) and subsequent x’s are chosen iteratively:

$$x_{n+1} = T[x_n]. \tag{3.19}$$

Let us compute the iterates for the simplest of scalar IVPs,

$$x' = x, \qquad x(0) = 1.$$

Equation (3.19) becomes

$$x_{n+1}(t) = 1 + \int_0^t x_n(s)\,ds.$$

If we choose $x_0(t) \equiv 1$, then we find

$$x_n(t) = 1 + t + \frac{1}{2!}t^2 + \cdots + \frac{1}{n!}t^n.$$

In other words, the nth iterate is just the polynomial approximation of degree n to the exponential $e^t$. Thus, the iteration works very well indeed for this simple example.
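The iterates for this example can be reproduced exactly on a computer, since each one is a polynomial. The following sketch stores a polynomial as its list of coefficients and uses exact rational arithmetic; the representation and the function names `picard_step`, `picard_iterates`, and `evaluate` are assumptions made for the illustration.

```python
import math
from fractions import Fraction

def picard_step(coeffs):
    """Map the polynomial x(t) to 1 + integral_0^t x(s) ds.

    coeffs[k] is the coefficient of t**k.
    """
    # Integrating t**k gives t**(k+1) / (k+1); the constant term starts at 0.
    integrated = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    integrated[0] += 1  # add the initial value x(0) = 1
    return integrated

def picard_iterates(n):
    """Return the coefficients of x_n, starting from x_0(t) = 1."""
    x = [Fraction(1)]
    for _ in range(n):
        x = picard_step(x)
    return x

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at the point t."""
    return float(sum(c * Fraction(t) ** k for k, c in enumerate(coeffs)))

# Coefficients of x_5 are 1/k! for k = 0, ..., 5.
x5 = picard_iterates(5)
```

For instance, `evaluate(picard_iterates(15), 1.0)` approximates e, in line with the observation that the nth iterate is the degree-n Taylor polynomial of the exponential.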

Some authors prove Theorem 3.2.1 directly by iteration of the integral equation (3.12), which is called Picard iteration. This approach avoids the abstractness of the contraction-mapping principle. However, in our view the contraction-mapping formalism clarifies the proof by isolating exactly what one needs to show to guarantee that the iteration may be continued indefinitely and converges, namely Claims 1 and 2 above.

3.2.7 Concluding remark

The following two results are mildly interesting in their own right, but more important, they are useful as tools in certain proofs below. In Exercise 1 you are given hints for proving both of them.

The first shows that, subject to a continuity hypothesis, two solutions of an ODE on adjacent intervals may be “pasted” together to give a solution on a larger interval.

Lemma 3.2.8. Suppose $x_1$, $x_2$ are solutions of an ODE such that

(i) $x_1$ is continuous on (α, β] and satisfies x′ = F(x) on (α, β)
(ii) $x_2$ is continuous on [β, γ) and satisfies x′ = F(x) on (β, γ).

If $x_1(\beta) = x_2(\beta)$, then the definition

$$x(t) = \begin{cases} x_1(t) & \text{if } \alpha < t \le \beta \\ x_2(t) & \text{if } \beta \le t < \gamma \end{cases}$$

yields a solution of the ODE on the combined interval (α, γ).

Corollary 3.2.9. If x is a C1 solution of an ODE in an interval (α, β ) that is continuous on the closed interval [α, β ], then x may be extended to a solution of the equation in a slightly larger interval (α − ε, β + ε).

3.3 The uniqueness theorem

3.3.1 Gronwall’s Lemma

Our first result, Gronwall’s Lemma, is a simple inequality, but it provides extremely useful estimates for solutions of an ODE. In particular, we will use it to derive the uniqueness result Theorem 3.3.4 below.



Lemma 3.3.1. Let g : [0, T] → R be continuous, and suppose there are non-negative constants C, K such that

$$g(t) \le C + K \int_0^t g(s)\,ds, \qquad 0 \le t \le T. \tag{3.20}$$

Then

$$g(t) \le C e^{Kt}, \qquad 0 \le t \le T. \tag{3.21}$$

Here is a corollary of Gronwall’s Lemma whose hypotheses are more intuitive: if g is differentiable and satisfies

$$g' \le K g, \qquad g(0) \le C, \tag{3.22}$$

then g is bounded by an exponential as in (3.21). This could easily be proved directly, and it follows from the lemma because integration of condition (3.22) yields condition (3.20). However, Gronwall’s inequality does not require that g be differentiable, and this makes application of the lemma much more flexible.

Proof of Lemma 3.3.1. We define a function by the RHS of (3.20),

$$G(t) = C + K \int_0^t g(s)\,ds.$$

The function G is $C^1$, and it satisfies

$$\text{(a) } g(t) \le G(t) \quad \text{and} \quad \text{(b) } G'(t) = K g(t). \tag{3.23}$$

Applying Leibniz’ Rule to differentiate the product $e^{-Kt} G(t)$, invoking (3.23b) to calculate G′, and recalling (3.23a), we compute

$$\frac{d}{dt}\left[ e^{-Kt} G(t) \right] = e^{-Kt} \cdot K g(t) - K e^{-Kt} \cdot G(t) = K e^{-Kt}\,[g(t) - G(t)] \le 0.$$

Thus, $e^{-Kt} G(t)$ is nonincreasing, so

$$e^{-Kt} G(t) \le G(0) = C. \tag{3.24}$$

From this it follows that

$$g(t) \le G(t) \le C e^{Kt},$$

where we have again invoked (3.23a) for the first inequality and multiplied (3.24) by $e^{Kt}$ for the second.
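A numerical sanity check of the lemma may be instructive. In the sketch below, the test function $g(t) = C \exp\bigl(K(t/2 - \tfrac14 \sin 2t)\bigr)$ is an assumed example: it satisfies $g' = K \sin^2(t)\, g \le K g$ and g(0) = C, hence also the integral hypothesis (3.20). The constants, the trapezoid rule, the tolerance, and the function name `check_gronwall` are likewise choices made for illustration.

```python
import math

C, K, T = 2.0, 1.5, 3.0

def g(t):
    """Assumed test function with g' = K sin^2(t) g <= K g and g(0) = C."""
    return C * math.exp(K * (t / 2.0 - math.sin(2.0 * t) / 4.0))

def check_gronwall(g, C, K, T, steps=3000, tol=1e-9):
    """Check hypothesis (3.20) and conclusion (3.21) on a uniform grid.

    The integral of g from 0 to t is accumulated with the trapezoid rule.
    Returns a pair of booleans (hypothesis_holds, conclusion_holds).
    """
    h = T / steps
    integral = 0.0
    hypothesis_holds = True
    conclusion_holds = True
    for i in range(steps + 1):
        t = i * h
        if i > 0:
            integral += 0.5 * h * (g(t - h) + g(t))
        if g(t) > C + K * integral + tol:
            hypothesis_holds = False
        if g(t) > C * math.exp(K * t) + tol:
            conclusion_holds = False
    return hypothesis_holds, conclusion_holds

hypothesis_ok, conclusion_ok = check_gronwall(g, C, K, T)
```

Such a check proves nothing, of course, but replacing `g` by other functions is a quick way to see that the exponential bound can fail once the hypothesis (3.20) does.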

Remark: It is a minor generalization of Gronwall’s Lemma to relax the hypotheses to assume that g is merely piecewise continuous. (See Exercise 5.)



3.3.2 More on Lipschitz functions

Proposition 3.3.2. Suppose $U \subset \mathbb{R}^{d_1}$ is open and $F : U \to \mathbb{R}^{d_2}$ is locally Lipschitz. Then for any compact set K ⊂ U, the restriction $F|_K$ is (globally) Lipschitz.

Proof. For each x ∈ K, choose a ball $B(x, \delta_x)$ such that (i) its closure $\overline{B(x, \delta_x)}$ is contained in U and (ii) F is Lipschitz continuous on $B(x, \delta_x)$. The collection

$$\{ B(x, \delta_x/2) : x \in K \}$$

is an open cover of K. (Note that the radii here have been halved.) Choose a finite subcover of K, say $B(x_j, \delta_j/2)$, j = 1, 2, . . . , J; let $\Lambda_j$ be a Lipschitz constant for F on $B(x_j, \delta_j)$ (radius not halved); and let

$$L_1 = \max_{j=1,\ldots,J} \Lambda_j, \qquad L_2 = 4 \max_{x \in K} |F(x)| \,\Big/\, \min_j \delta_j.$$

Let us show that F is Lipschitz over K with Lipschitz constant $L = \max\{L_1, L_2\}$. To prove this, suppose x, y ∈ K. The first point x belongs to one of the balls in the finite subcover, say $x \in B(x_k, \delta_k/2)$. We consider two cases: (i) If y belongs to the full-radius ball $B(x_k, \delta_k)$ with the same index, then by construction

$$|F(x) - F(y)| \le \Lambda_k |x - y| \le L_1 |x - y|.$$

(ii) If y lies outside $B(x_k, \delta_k)$, then it may be seen from Figure 3.1 that

$$|x - y| \ge \delta_k/2. \tag{3.25}$$

(In the Exercises we ask the reader to supply an analytical justification of this inequality.) Of course

$$|F(x) - F(y)| \le 2 \max_K |F|; \tag{3.26}$$

we multiply the RHS of (3.26) by $2|x - y|/\delta_k$, a quantity that by (3.25) is at least unity, to obtain

$$|F(x) - F(y)| \le 4 \max_K |F| \, \frac{|x - y|}{\delta_k} \le L_2 |x - y|.$$

For future use let us record a corollary of the compactness construction in the above argument.

Corollary 3.3.3. If $K \subset U \subset \mathbb{R}^d$ where K is compact and U is open, then there is a larger compact set K′ ⊂ U and a δ > 0 such that for every x ∈ K, the ball B(x, δ) is contained in K′.

Figure 3.1: Illustrating the validity of (3.25). (Labels in the figure: the center $x_k$, the radii $\delta_k/2$ and $\delta_k$, and the points x and y.)

Proof. Exercise.

3.3.3 The uniqueness theorem

It is convenient to have a uniqueness theorem with minimal hypotheses. For that reason we introduce the following, apparently weaker, notion of a solution of an IVP: By a solution of (3.7) in forward time we mean a continuous function x : [0, β) → U with x(0) = b that is continuously differentiable on the open interval (0, β) and satisfies the ODE there.

The greater generality of the above definition is only apparent: given a solution x in forward time, it follows from Corollary 3.2.9 that x has an extension to a solution on an open interval (−ε, β) that contains the origin. Thus, in particular, if x is a solution of the IVP in forward time, then the derivative x′ is continuous even at t = 0.

This information is useful for the integral equation characterizing solutions of the IVP,

$$x(t) = b + \int_0^t F(x(s))\,ds, \qquad 0 < t < \beta. \tag{3.27}$$

The example

$$x(t) = t \sin\frac{1}{t}, \qquad x'(t) = -\frac{1}{t}\cos\frac{1}{t} + \sin\frac{1}{t}$$

shows that, in general, if x is continuous on [0, β) and differentiable on (0, β), then the integral $\int_0^t x'(s)\,ds$ may actually be an improper integral. However, if x is a solution of x′ = F(x) in forward time, then x′ is continuous at t = 0, and the interpretation of the integral in (3.27) raises no such difficulties.

Theorem 3.3.4. Suppose that $F : U \to \mathbb{R}^d$ in (3.7) is locally Lipschitz. Let $x_1$, $x_2$ be two solutions in forward time of the initial value problem (3.7), say defined for $0 \le t < \beta_j$, j = 1, 2. Then for all t in the range $0 \le t < \min\{\beta_1, \beta_2\}$ where both solutions are defined, $x_1(t) = x_2(t)$.

Remark: Although we proved existence only for a short time interval, the uniqueness result applies to any interval over which the IVP happens to have a solution, no matter how long.

Proof. We want to apply Gronwall’s Lemma to the function³ $g(t) = |x_1(t) - x_2(t)|$. Both solutions satisfy integral equations as in (3.27). Subtracting these and canceling the constant terms, we have

$$x_1(t) - x_2(t) = \int_0^t [F(x_1(s)) - F(x_2(s))]\,ds, \qquad 0 \le t < \min\{\beta_1, \beta_2\}. \tag{3.28}$$

Thus

$$g(t) = |x_1(t) - x_2(t)| \le \int_0^t |F(x_1(s)) - F(x_2(s))|\,ds. \tag{3.29}$$

Temporarily we restrict t to an interval [0, T] where $T < \min\{\beta_1, \beta_2\}$. Since [0, T] is compact, so is the union of images, $K = x_1([0, T]) \cup x_2([0, T])$. Therefore by Proposition 3.3.2 there is a Lipschitz constant L for F on K. Hence for 0 ≤ s ≤ T, the integrand on the RHS of (3.29) may be estimated

$$|F(x_1(s)) - F(x_2(s))| \le L\,|x_1(s) - x_2(s)| = L g(s).$$

Substituting into (3.29) we see that

$$g(t) \le L \int_0^t g(s)\,ds. \tag{3.30}$$

Thus from Gronwall’s inequality with C = 0 we deduce that g(t) ≤ 0 for 0 ≤ t ≤ T. But g is non-negative, so g ≡ 0, and thus $x_1(t) = x_2(t)$ for 0 ≤ t ≤ T. Finally, we may take T arbitrarily close to $\min\{\beta_1, \beta_2\}$, so we have equality for all t where both solutions are defined.

³ Note that the absolute-value function is not differentiable, so it is possible that g is not differentiable. The weak hypothesis in Gronwall’s Lemma greatly simplifies the proof of the Theorem.

This is just one instance of how proofs in ODE, which is an old subject, have been polished over the years. Indeed, beware of reading through this and other proofs too quickly and missing the cleverness. For example, in this proof, before applying Gronwall’s Lemma, we prepare for it by (i) restricting t to a large, closed subinterval of [0, β) to obtain compactness and (ii) invoking Proposition 3.3.2 to derive a Lipschitz constant that works on all of [0, T]. In this way we are able to prove uniqueness for as long as a solution exists.

Remark 1: We could derive a uniqueness result for two-sided solutions of (3.7) by modifications in the above proof, but there is a trick that requires even less effort. Let us define a solution in backward time of an IVP by making the obvious modifications of the forward-time concept. Note that x(t) is a solution in backward time if and only if the function y(t) = x(−t) is a solution in forward time of the equation y′ = −F(y). Since −F is locally Lipschitz if F is, by applying Theorem 3.3.4 to the IVP for y′ = −F(y) in forward time, we may derive uniqueness for the original equation in backward time. Of course, uniqueness in both forward and backward time gives uniqueness of two-sided solutions.

Remark 2: Because of the uniqueness theorem, two solutions of x′ = F(x) can never cross one another.

Remark 3: The uniqueness theorem states that two solutions of x′ = F(x) that start from the same initial conditions coincide for as long as they both exist. Gronwall’s Lemma may also be used to show that two solutions of x′ = F(x) that start from nearby initial conditions diverge from one another at most exponentially fast. (Cf. Theorem 4.4.1.) We wait until Chapter 4 to pursue this since a more satisfactory result can be obtained using information about global existence that will be developed in Section 4.2.

3.4 Generalizations

3.4.1 Nonautonomous systems

Both the existence and uniqueness theorems generalize to nonautonomous IVPs, say

x′ = F(x, t), x(0) = b. (3.31)

Suppose F is defined on U × I where $U \subset \mathbb{R}^d$ is open and I ⊂ R is an open interval. To avoid trivialities, we assume that b ∈ U and 0 ∈ I. We shall say that F is locally uniformly Lipschitz if for every $(x_0, t_0) \in U \times I$ there is a neighborhood V × J of $(x_0, t_0)$ and a constant L such that

(∀x1, x2 ∈ V ) (∀t ∈ J ) |F(x1, t) − F(x2, t)| ≤ L|x1 − x2|. (3.32)

Theorem 3.4.1. If F : U × I → Rd is locally uniformly Lipschitz, then there is an interval (−η, η) and a C1 function x : (−η, η) → U that satisfies (3.31).

Theorem 3.4.2. Suppose that $F : U \times I \to \mathbb{R}^d$ is locally Lipschitz. Let $x_1$, $x_2$ be two solutions in forward time of the initial value problem (3.31), say defined for $0 \le t < \beta_j$, j = 1, 2. Then for all t in the range $0 \le t < \min\{\beta_1, \beta_2\}$, $x_1(t) = x_2(t)$.

Both these results may be proved by imitating the analogous proof for the autonomous case. Indeed, adapting the proofs of Theorems 3.2.1 and 3.3.4 to nonautonomous problems is probably a better way to understand the autonomous case than just reading the proofs given in the text. The Exercises invite you to perform this task.

Note that, to shorten the notation in (3.31), we have imposed the initial condition at t = 0. Unlike for autonomous equations, solutions of nonautonomous equations do not have translational invariance. Thus, imposition of an initial condition at a different time, say $x(t_0) = b$, is strictly speaking a different problem. However, no real generality is lost by assuming $t_0 = 0$ in (3.31) since the general case can easily be reduced to (3.31) by appropriate translation.

3.4.2 Linear systems

Stronger results are available for linear systems with variable coefficients, say an IVP

x′ = A(t)x + g(t), x(0) = b. (3.33)

Theorem 3.4.3. Suppose the coefficient matrix A(t) and the inhomogeneous term g(t) in (3.33) are continuous in the (possibly infinite) open interval $(T_1, T_2)$ which contains t = 0; then this IVP has a unique solution that exists for $T_1 < t < T_2$.

In particular, solutions of a linear system do not blow up in finite time, no matter how quickly $\|A(t)\|$ or |g(t)| may grow with t. Also note that Lipschitz continuity need not be explicitly assumed in the theorem.

Like Theorem 3.2.1, this result may be proved with Picard iteration. It turns out that, for a linear problem, the iteration converges for arbitrarily large times! We give hints for showing this in Exercise 4. We recommend the exercise since it provides a useful perspective that will enhance your understanding of the proof of Theorem 3.2.1.

3.5 Exercises



1. (a) Consider the scalar IVP

$$x' = |x|^p, \qquad x(0) = b,$$

where p > 1 and b > 0. By solving the problem explicitly, show that the solution blows up in finite time. What is the behavior of the solution as t → ∞ if b < 0?



(b) Verify that the functions (3.5) are C1 solutions of (3.4).

(c) Show that $F(x) = \sqrt{|x|}$ is not Lipschitz on R. Show that F(x) = |x| is (globally) Lipschitz on R but not differentiable. Show that $F(x) = x^2$ is locally Lipschitz but not globally Lipschitz.

(d) Prove Proposition 3.2.5.

(e) Show that the definition (3.14) satisfies the axioms (3.13).

(f) Show that, since C < 1, equation (3.16) implies that the sequence $\{x_n\}$ is Cauchy.

(g) Prove Lemma 3.2.8.

Hint: You may proceed in either of two ways: You can show that the one-sided limits of x′ are equal at β and proceed from there, or alternatively you can show that the integral equation (3.12) holds for all t ∈ (α, γ), including for t = β, and then invoke Proposition 3.2.5.

(h) Prove Corollary 3.2.9.

Hint: It suffices to consider the upper end-point t = β. Apply Theorem 3.2.1 to solve an IVP y′ = F(y) on β − η < t < β + η with initial condition y(β) = x(β). Use Lemma 3.2.8 to obtain a solution on [α, β + η).

(i) Give an analytical proof of (3.25).

(j) Prove Corollary 3.3.3

2. Are the following functions Lipschitz continuous on the indicated sets? (All of these examples are scalar-valued functions. Vector-valued functions would not pose any additional difficulties except the need to examine more components.)

(a) $(x + 2)^{1/3}$ on the interval [−1, 1]? On R?

(b) $(x^2 + 2)^{1/3}$ on the interval [−1, 1]? On R?

(c) $(x^2 + 2)^{-1/3} \sin(e^x)$ on the interval [−1, 1]? On R?

(d) $|x + y^2 - 2|$ on the square |x| ≤ 2, |y| ≤ 2? On $\mathbb{R}^2$?

Hint: Observe that the function is the composition of $x + y^2 - 2$ with the absolute value function. Show that the composition of two Lipschitz functions is Lipschitz continuous. Thus, $|x + y^2 - 2|$ is Lipschitz on the indicated set if $x + y^2 - 2$ is.

(e) $\sqrt{x^2 + 1}/(x^2 + y^2 - 1)$ on the bounded annulus $2 \le x^2 + y^2 \le 8$? On the unbounded annulus $2 \le x^2 + y^2 < \infty$?

Discussion: The indicated function is certainly not Lipschitz on $\mathbb{R}^2$; indeed because the denominator goes to zero on the unit circle, this function is not even continuous. Propositions 3.2.2 and 3.3.2 may be used to show this function is Lipschitz on the bounded annulus. More thought is required to analyze the unbounded annulus. Have fun!

3. (a) Prove Theorem 3.4.1.

(b) Prove Theorem 3.4.2.

Hint: Structure your proofs by following the proofs of Theorems 3.2.1 and 3.3.4. For Theorem 3.4.2 you will have to prove an analogue of Proposition 3.3.2 regarding the restriction of a locally uniformly Lipschitz function to a compact set. Please do these problems: they are more useful for understanding the proofs in the text than reading the text.

4. This exercise addresses the derivation of the special behavior of linear equations, Theorem 3.4.3. According to Proposition 3.2.5, a continuous function $x : (T_1, T_2) \to \mathbb{R}^d$ satisfies (3.33) if and only if it satisfies the integral equation

$$x(t) = b + \int_0^t [A(s)x(s) + g(s)]\,ds, \qquad T_1 < t < T_2. \tag{3.34}$$

We outline an argument that (3.34) has a unique solution for $t \in \mathcal{I}$, where $\mathcal{I}$ is an arbitrary compact subinterval of $(T_1, T_2)$, which will establish the desired result.

Let us extract a linear operator $\mathcal{L} : C(\mathcal{I}, \mathbb{R}^d) \to C(\mathcal{I}, \mathbb{R}^d)$ from (3.34),

$$\mathcal{L}[x](t) = \int_0^t A(s)x(s)\,ds.$$

We may then rewrite (3.34) as

$$(I - \mathcal{L})x = b + G$$

where $G(t) = \int_0^t g(s)\,ds$. The crux of the argument is to show that $I - \mathcal{L}$ is invertible on the space $C(\mathcal{I}, \mathbb{R}^d)$ by proving that the infinite series

$$(I - \mathcal{L})^{-1} = I + \mathcal{L} + \mathcal{L}^2 + \mathcal{L}^3 + \cdots$$

converges. For this, first show that

$$\mathcal{L}^n[x](t) = \int_0^t ds_1 \int_0^{s_1} ds_2 \cdots \int_0^{s_{n-1}} ds_n \, A(s_1)A(s_2) \cdots A(s_n)x(s_n).$$

Let $K = \max_{t \in \mathcal{I}} \|A(t)\|$. Estimating maxima in the above equation, conclude that as an operator on $C(\mathcal{I}, \mathbb{R}^d)$,

$$\|\mathcal{L}^n\| \le K^n \int_0^t ds_1 \int_0^{s_1} ds_2 \cdots \int_0^{s_{n-1}} ds_n \le K^n |\mathcal{I}|^n / n!$$

where $|\mathcal{I}|$ denotes the length of $\mathcal{I}$. Thus one may prove that the series for $(I - \mathcal{L})^{-1}$ converges by comparison with the series for $e^{K|\mathcal{I}|}$.

3.5.2 Exercises used elsewhere in this book

5. In this Exercise we derive two generalizations of Gronwall’s inequality, Lemma 3.3.1.

(a) Show that the estimate (3.21) continues to hold if g is assumed merely to be piecewise continuous.

Discussion: A function g is called piecewise continuous on an interval I if there is a finite set of points $\{a_j : j = 1, \ldots, J\}$ in I such that (i) g is continuous on $I \setminus \bigcup_j \{a_j\}$ and (ii) at each point $a_j$ the one-sided limits of g exist (and are finite).

(b) Show that if g : [0, T] → R is continuous and if there are non-negative constants C, B, K such that

$$g(t) \le C + Bt + K \int_0^t g(s)\,ds, \qquad 0 \le t \le T,$$

then

$$g(t) \le C e^{Kt} + B\,\frac{e^{Kt} - 1}{K}, \qquad 0 \le t \le T. \tag{3.35}$$

Hint: One approach for Part (b) is to imitate the proof of Lemma 3.3.1. A more elegant alternative, which does not require re-examining the proof of Lemma 3.3.1, involves applying Gronwall’s inequality to the function h(t) = g(t) + B/K.




3.5.3 Computational exercise

6. Consider the linear system

$$x' = y, \qquad y' = -(1/4 + \beta \cos t)\,x$$

which comes from writing Mathieu’s equation (1.5b) as a first-order system, assuming α = 1/4. Solve an IVP for this equation, say with β = 0.02, over a long time interval, say 0 ≤ t ≤ 1000. Exact initial conditions don’t matter greatly, but if you want a suggestion, try x(0) = 1, y(0) = 0.

Discussion: This exercise is intended as an antidote to complacency: Based on Theorem 3.4.3 one might regard linear equations as boring, but contrast the behavior you find here with that of a constant-coefficient analogue of Mathieu’s equation, x′′ + x/4 = 0. Incidentally, note that the coefficients in the above system are periodic functions of time. Linear systems with periodic coefficients are analyzed in Floquet theory, which we discuss in Chapter 6.
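One possible way to set up this computation is sketched below. It is only a starting point under stated assumptions: a hand-written, fixed-step classical Runge-Kutta method stands in for the ODE software discussed in Section 1.7, and the step size h = 0.01 and the names `mathieu_rhs` and `peak_amplitude` are our own choices. The function returns the largest value of |x(t)| encountered, which makes it easy to compare the cases β = 0.02 and β = 0.

```python
import math

def mathieu_rhs(t, x, y, beta, alpha=0.25):
    """Right-hand side of the system x' = y, y' = -(alpha + beta*cos t) x."""
    return y, -(alpha + beta * math.cos(t)) * x

def peak_amplitude(beta, t_end=1000.0, h=0.01, x0=1.0, y0=0.0):
    """Integrate with the classical fourth-order Runge-Kutta method.

    Returns the largest |x(t)| seen on the time grid 0, h, 2h, ..., t_end.
    """
    steps = int(round(t_end / h))
    x, y = x0, y0
    peak = abs(x)
    for i in range(steps):
        t = i * h
        k1x, k1y = mathieu_rhs(t, x, y, beta)
        k2x, k2y = mathieu_rhs(t + h / 2, x + h / 2 * k1x, y + h / 2 * k1y, beta)
        k3x, k3y = mathieu_rhs(t + h / 2, x + h / 2 * k2x, y + h / 2 * k2y, beta)
        k4x, k4y = mathieu_rhs(t + h, x + h * k3x, y + h * k3y, beta)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
        peak = max(peak, abs(x))
    return peak
```

Calling `peak_amplitude(0.0)` treats the constant-coefficient analogue, and `peak_amplitude(0.02)` the problem posed in the exercise. Plotting x(t) rather than recording only the peak is more informative, and it is worth repeating the run with a smaller step size to confirm that what you see is not a numerical artifact.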


7. (a) Find the general solution of the equations

$$t x' \pm x = 0$$

for t > 0. This may be solved either as a first-order linear equation or as a separable equation.

Discussion: Note that writing this equation in standard form yields x′ = F(x, t) where $F(x, t) = \mp x/t$ is singular at t = 0. The remainder of this exercise illustrates how badly such equations may behave.

(b) Deduce from your solution in Part (a) that the IVP

$$t x' + x = 0, \qquad x(0) = b$$

has no solutions if b ≠ 0.

(c) Deduce from your solution in Part (a) that the IVP

$$t x' - x = 0, \qquad x(0) = b$$

has no solutions if b ≠ 0 and has infinitely many solutions if b = 0.



8. Here is a physical situation that gives rise to the backwards version of our nonuniqueness example, (3.4). Consider a partially filled bucket that has a hole in its bottom. Under certain simplifying assumptions the height h(t) of the water in the bucket satisfies

$$\frac{dh}{dt} = -C\sqrt{h}.$$

We refer to p. 191 of Hubbard and West for details, but briefly the derivation is (i) dh/dt is proportional to the speed v with which the water emerges from the bucket and (ii) if friction is neglected, the kinetic energy (i.e., $v^2$) of the emerging water is proportional to the loss of potential energy (i.e., h). Without loss of generality we may scale time so that C = 1 in this equation.

(a) Apply separability to solve the IVP

$$\frac{dx}{dt} = -\sqrt{x}, \qquad x(0) = b \tag{3.36}$$

where b > 0. Observe that after some finite time, x(t) reaches zero. On physical grounds one knows x(t) ≡ 0 for all later times, and this of course satisfies the equation.

(b) Argue that from Theorem 3.3.4 the solution of (3.36) is unique in backwards time.

(c) Show that the solution of (3.36) is unique in forward time.

Hint for Part (c): On physical grounds we need consider only non-negative solutions of this problem. Suppose that x(t) and y(t) are both $C^1$, non-negative solutions of (3.36). Then

$$\frac{d}{dt}[x(t) - y(t)]^2 = 2[x(t) - y(t)]\left[-\sqrt{x(t)} + \sqrt{y(t)}\right] \le 0$$

where the inequality may be derived by considering x ≤ y and y ≤ x as two separate cases. Thus $[x(t) - y(t)]^2 \le [x(0) - y(0)]^2 = 0$.

Remark: The IVP (in forward time) for (3.4) may be interpreted in the present context as, “Suppose you come into the room and see the bucket is empty; how full was the bucket an hour ago?” In other words, of course the solution in backwards time is not unique.

9. Here is a more dramatic example that a Lipschitz function need not be continuously differentiable: Define the “saw-tooth” function as the periodic extension (with period 2) of

$$S(x) = |x| \quad \text{if } |x| \le 1.$$

Let

$$F(x) = \sum_{n=1}^{\infty} \frac{1}{n^3} S(nx).$$

Show that F is Lipschitz continuous with Lipschitz constant $L = \sum_{n=1}^{\infty} n^{-2}$.

Discussion: It is clear that F is not $C^1$; indeed F′ fails to exist at all rational values of x. It might seem like F is nowhere differentiable, but in fact this is not true: the limit defining the derivative F′ exists at all irrational points. Although it is not important for ODE, you might enjoy proving this.
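Before attempting the proof, it may help to examine the claim numerically. The sketch below is an illustration only: it truncates the series after an assumed number of terms (200), samples difference quotients on an arbitrarily chosen set of point pairs, and compares them with $\sum_{n \ge 1} n^{-2} = \pi^2/6$. The function names `sawtooth`, `partial_sum`, and `largest_quotient` are hypothetical.

```python
import math

def sawtooth(x):
    """Periodic extension (period 2) of S(x) = |x| for |x| <= 1."""
    y = x - 2.0 * math.floor((x + 1.0) / 2.0)  # shift x into [-1, 1)
    return abs(y)

def partial_sum(x, terms=200):
    """Truncation of F(x) = sum over n >= 1 of S(n x) / n^3."""
    return sum(sawtooth(n * x) / n**3 for n in range(1, terms + 1))

def largest_quotient(pairs, terms=200):
    """Largest sampled difference quotient |F(x) - F(y)| / |x - y|."""
    return max(
        abs(partial_sum(x, terms) - partial_sum(y, terms)) / abs(x - y)
        for x, y in pairs
    )

# Sample pairs (x, x + small offset) with x ranging over [-1, 1].
pairs = [(0.01 * i, 0.01 * i + 0.001 * (j + 1))
         for i in range(-100, 101) for j in range(5)]
bound = math.pi ** 2 / 6  # the value of the sum of n^(-2)
observed = largest_quotient(pairs)
```

The sampled quotients should not exceed `bound`, because the nth term of the series has Lipschitz constant $n \cdot n^{-3} = n^{-2}$; turning this observation into a proof for the infinite sum is the point of the exercise.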


Although we have assumed Lipschitz continuity of F to prove the existence of solutions of x′ = F(x), this is not necessary: in fact continuity of F suffices. This is proved, for example, in Birkhoff-Rota, Chapter 6, Section 14. (Regarding uniqueness, as we saw in Section 4.1, Lipschitz continuity is necessary.)

Incidentally, Lipschitz continuity is only slightly weaker than differentiability. In particular, a Lipschitz continuous function is differentiable “almost everywhere”; this concept has a precise mathematical meaning in terms of the theory of Lebesgue integration. (Cf. Problem 9.)



Chapter 4

Nonlinear systems: global theory

Theorem 3.2.1 proved that a solution to the IVP exists for what might be an extremely short time. Typically the ODEs that arise in physical applications possess solutions for much larger times than can be deduced with the contraction-mapping principle, and in this chapter we introduce methods for demonstrating this behavior. The technique is based on extending a short-time solution obtained from Theorem 3.2.1. The main tool for such extensions, Theorem 4.1.2, is proved in Section 4.1. Two results that guarantee that an IVP in fact has a solution for all time are given in Section 4.2. In Section 4.3 we introduce nullclines, which are an important technique in their own right and are also useful in applying the existence theorem from Section 4.2.

In Sections 4.4 and 4.5 we show that the solution of an IVP depends continuously and differentiably on its initial conditions. While theorems of this type could have been formulated in Chapter 3, by using Theorem 4.1.2 we are able to obtain much more satisfactory results.

4.1 The maximal interval of existence

Our first result asserts that there is a maximal interval for which the solution of an IVP exists, a sort of “gold standard” for solutions.

Proposition 4.1.1. Let $F : U \to \mathbb{R}^d$ be locally Lipschitz on $U \subset \mathbb{R}^d$. Given b ∈ U, there is a solution $x_* : (-\alpha_*, \beta_*) \to U$ of the IVP

$$x' = F(x), \qquad x(0) = b \tag{4.1}$$

that is maximal in the following sense: If x solves (4.1) for t in some open interval I, then

$$\text{(i) } I \subset (-\alpha_*, \beta_*) \quad \text{and} \quad \text{(ii) } x(t) = x_*(t) \text{ if } t \in I. \tag{4.2}$$



Remark: It often happens that either $\alpha_*$ or $\beta_*$, or both, equals infinity. For example, for the IVP (3.2), the maximal interval of existence is (−∞, 1). Note that the maximal interval of existence is always open, even if $\alpha_*$ or $\beta_*$ is finite.

Proof. We focus only on $\beta_*$ and t ≥ 0, leaving the analogous treatment of $\alpha_*$ and t ≤ 0 for the dedicated reader. Let

$$\beta_* = \sup\{ \beta : \text{(4.1) is solvable in forward time for } 0 \le t < \beta \}.$$

Of course by Theorem 3.2.1, $\beta_* > 0$. We consider separately the cases when $\beta_*$ is finite or infinite.

Case 1: $\beta_* < \infty$. For n = 1, 2, . . . choose solutions $x_n$ of (4.1) that exist for times $t \in [0, \beta_n)$ where $\beta_n > \beta_* - 1/n$. To define $x_*$, given $t \in [0, \beta_*)$ choose any n such that $\beta_n > t$ and let

$$x_*(t) = x_n(t). \tag{4.3}$$

By Theorem 3.3.4, the uniqueness result, the definition (4.3) does not depend on the choice of n, and moreover $x_*$ is a solution of (4.1). It is readily checked that any solution x of (4.1) on some interval I satisfies properties (i) and (ii) of (4.2).

Case 2: $\beta_* = \infty$. In this case one may choose a sequence of solutions existing for times $\beta_n > n$ and proceed as above.

Despite its somewhat “wimpy” appearance, the following result is the mainworkhorse in extending solutions to larger times. Of course there is an analogousresult for negative time.

Theorem 4.1.2. Suppose x∗, the maximal solution of (4.1), exists only for times

t < β ∗ where β ∗ < ∞. Then for any compact set K ⊂ U , there is a time t ∈ [0, β ∗)such that x∗(t) /∈ K.

Proof. We argue by contradiction: Suppose that x∗(t) ∈ K for all t ∈ [0, β∗). Recall from Proposition 3.2.5 that x∗ satisfies the integral equation

\[ x^*(t) = b + \int_0^t F(x^*(s))\,ds, \qquad 0 \le t < \beta^*. \]

The integrand F(x∗(s)) is bounded on [0, β∗) by max_K |F|. The existence of the one-sided limit

\[ I = \lim_{t \to (\beta^*)^-} \int_0^t F(x^*(s))\,ds \tag{4.4} \]

follows easily from this fact (see hint in Exercise 1a). Thus, the definition x∗(β∗) = b + I gives an extension of x∗ to the closed interval [0, β∗] that is continuous there. By Corollary 3.2.9, the IVP (4.1) has a solution on a larger interval [0, β∗ + η), contradicting the assumption that β∗ was maximal.




In the next result we strengthen the conclusion of Theorem 4.1.2 from “there exists a time . . . ” to “for all sufficiently large times . . . ”.

Corollary 4.1.3. Suppose x∗, the maximal solution of (4.1), exists only for times t < β∗ where β∗ < ∞. Then for any compact set K ⊂ U, there is a time β < β∗ such that x∗(t) ∉ K for all t with β < t < β∗. In particular, if F(x) is defined for all x ∈ Rd, then for the maximal solution |x∗(t)| tends to infinity as t → β∗.

In the Exercises we provide the reader hints for proving the corollary, but we challenge him/her to try to prove it without consulting the hints. In teaching the course, we have found that student efforts to do this are highly educational.
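To make the conclusion of Corollary 4.1.3 concrete, here is a minimal Python sketch for the scalar IVP x′ = x², x(0) = 1, whose maximal forward solution is x∗(t) = 1/(1 − t) on [0, 1). This IVP is our own illustrative choice (we assume, but cannot confirm from the text above, that it resembles the IVP (3.2) cited in the Remark), and the function names `x_star` and `exit_time` are hypothetical. For the compact set K = [−R, R], the sketch computes a time β < β∗ = 1 after which the solution stays outside K.

```python
def x_star(t):
    """Closed-form maximal solution of x' = x^2, x(0) = 1, valid for t < 1."""
    if t >= 1.0:
        raise ValueError("solution exists only for t < 1")
    return 1.0 / (1.0 - t)


def exit_time(R):
    """A time beta < 1 such that |x_star(t)| > R for all beta < t < 1."""
    # x_star is increasing with x_star(0) = 1, so for R >= 1 the exit time
    # solves 1 / (1 - t) = R; for R < 1 the solution is outside [-R, R] at once.
    return max(0.0, 1.0 - 1.0 / R)


for R in (10.0, 1e3, 1e6):
    beta = exit_time(R)
    t = 0.5 * (beta + 1.0)  # any time strictly between beta and beta* = 1
    print(f"R = {R:g}: exit time {beta:.6f}, x_star({t:.6f}) = {x_star(t):.3e}")
```

However large R is chosen, the exit time stays strictly below 1, which is the “for all sufficiently large times” statement of the corollary in this one example.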

4.2 Two sufficient conditions for global existence

4.2.1 Linear growth of the RHS

Our first result gives existence for all times, positive and negative.

Theorem 4.2.1. If F : Rd → Rd is locally Lipschitz and if there exist nonnegative constants B, K such that

\[ |F(x)| \le K|x| + B, \qquad x \in \mathbb{R}^d, \tag{4.5} \]

the solution x(t) of (4.1) exists for all time, −∞ < t < ∞, and moreover

\[ |x(t)| \le |b|\,e^{K|t|} + \frac{B}{K}\left(e^{K|t|} - 1\right), \qquad -\infty < t < \infty. \tag{4.6} \]

Proof. This proof will use the generalization of Gronwall’s Lemma given in Exercise 5 in Chapter 3. We consider only forward time, t ≥ 0. Suppose (4.1) has a solution for t ∈ [0, β), which of course satisfies the integral equation

\[ x(t) = b + \int_0^t F(x(s))\,ds, \qquad 0 \le t < \beta. \]

Defining g(t) = |x(t)|, we deduce from (4.5) that

\[ g(t) \le |b| + \int_0^t [K g(s) + B]\,ds, \qquad 0 \le t < \beta. \]

Hence by the generalized Gronwall Lemma, x satisfies the estimate (4.6) for its entire domain of existence, 0 ≤ t < β.

Now let x∗, β∗ be the maximal solution of (4.1), and suppose β∗ < ∞. According to (4.6), x∗(t) belongs to the compact ball

\[ \mathcal{K} = \left\{ z \in \mathbb{R}^d : |z| \le |b|\,e^{K\beta^*} + \frac{B}{K}\left(e^{K\beta^*} - 1\right) \right\} \]

for all t ∈ [0, β∗). This estimate contradicts Theorem 4.1.2, so we must have β∗ infinite.
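The estimate (4.6) can be checked numerically on a simple example. The sketch below is only an illustration under our own assumptions: the right-hand side F(x) = x cos x + 1 (which satisfies |F(x)| ≤ K|x| + B with K = B = 1), the initial value b = 1, and the hand-written RK4 integrator `rk4` are all choices made here, not taken from the text.

```python
import math


def rk4(f, x0, t_end, n_steps):
    """Integrate the scalar ODE x' = f(x) from t = 0 with classical RK4."""
    h = t_end / n_steps
    ts, xs = [0.0], [x0]
    x = x0
    for i in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        ts.append((i + 1) * h)
        xs.append(x)
    return ts, xs


def F(x):
    # Illustrative right-hand side with |F(x)| <= K|x| + B for K = 1, B = 1.
    return x * math.cos(x) + 1.0


K, B, b = 1.0, 1.0, 1.0


def bound(t):
    """Right-hand side of the estimate (4.6)."""
    return abs(b) * math.exp(K * abs(t)) + (B / K) * (math.exp(K * abs(t)) - 1.0)


ts, xs = rk4(F, b, 3.0, 3000)
worst_ratio = max(abs(x) / bound(t) for t, x in zip(ts, xs))
print(f"max |x(t)| / bound(t) on [0, 3] = {worst_ratio:.4f}")
```

The ratio equals 1 at t = 0, where (4.6) holds with equality, and is smaller afterwards; a numerical check of this kind illustrates the bound but of course does not prove it.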

Like many other results in ODE, this theorem has an analogue for a nonautonomous equation, x′ = F(x, t); see Exercise 4. Indeed, adapting the proof of Theorem 4.2.1 to the nonautonomous case may be a better way to understand the proof than simply reading it.

4.2.2 Trapping regions

Our second result gives global existence, but only in forward time, for an ODE that need be defined only on an open subset U of Rd. Let K be a compact subset of U whose boundary is a C1 surface, and for any x ∈ ∂K, let Nx be an inward normal to ∂K at x. We shall call K a trapping region for x′ = F(x) if

\[ (\forall x \in \partial K) \qquad \langle F(x), N_x \rangle \ge 0 \tag{4.7} \]

where ⟨·, ·⟩ is the inner product. In words, the direction of the flow of the ODE at ∂K is inward, or at worst tangential or zero. The following existence theorem explains the reason for the name “trapping region”.

Theorem 4.2.2. Suppose that F : U → Rd is C1 and that K is a trapping region for x′ = F(x). If the initial data b lies in the interior of K, then the solution x to equation (4.1) exists for all positive time and moreover lies in the interior of K.

In less formal language, we may restate the theorem: If in a picture it looks like the solution is confined to a region K, then it really is confined to that region. In the next section we use it to derive global existence for several specific ODEs. Other applications of the theorem are given in the Exercises. In Exercise 6 you will show that global existence is still obtained if b belongs to the boundary of a trapping region.

The proof is much easier if the inequality (4.7) is strict, so we begin by considering separately the special case with the augmented hypothesis

\[ (\forall x \in \partial K) \qquad \langle F(x), N_x \rangle > 0. \tag{4.8} \]

Breaking the proof into cases makes it longer but (we hope) more readable.

Case A: Proof of Theorem 4.2.2 assuming (4.8). Let

\[ t^* = \sup\{ t : (\forall s \le t)\ x(s) \in \operatorname{Int} K \} < \infty. \tag{4.9} \]

By the local existence theorem, t∗ > 0. We suppose t∗ < ∞ and look for a contradiction. We ask you to prove that x(t∗) ∈ ∂K. For brevity we write x∗ = x(t∗) and N∗ = N_{x∗}.

Figure 4.1: Schematic of a trapping region K and the inward normal vector N at a hypothetical exit point x∗ on the boundary of K. The neighborhood V and the function ψ of (4.17), whose graph x1 = ψ(x2) describes the boundary of K, are also shown.

Since ∂K is smooth, it may be described locally as the zero set of a smooth function. Thus, there is a neighborhood V of x∗ and a function φ on V such that

\[ \partial K \cap V = \{ x \in V : \phi(x) = 0 \}. \tag{4.10} \]

Now ∇φ is a normal to ∂K, and replacing φ by −φ if necessary, we may assume that ∇φ is an inward normal. Thus, φ is non-negative on K, or in symbols

\[ K \cap V = \{ x \in V : \phi(x) \ge 0 \}. \]

By continuity, there is an interval [t∗ − ε, t∗] such that for t in this interval x(t) lies in the neighborhood V of x∗. We derive a contradiction by considering the function

\[ g(t) = \phi(x(t)), \qquad t^* - \varepsilon \le t \le t^*. \]

On the one hand, g(t) ≥ 0 for t∗ − ε ≤ t < t∗ while g(t∗) = 0, and differentiating

\[ g'(t^*) = \lim_{\delta \to 0^+} \frac{g(t^* - \delta) - g(t^*)}{-\delta} \le 0. \tag{4.11} \]

On the other hand, by the chain rule

\[ g'(t^*) = \langle \nabla\phi(x^*), x'(t^*) \rangle = \langle \nabla\phi(x^*), F(x^*) \rangle. \tag{4.12} \]

Combining (4.11) and (4.12), we deduce that ⟨∇φ(x∗), F(x∗)⟩ ≤ 0, which contradicts (4.8).

If the hypothesis (4.8) is eliminated, the strategy of the proof remains the same, but we have to work harder to derive a contradiction. To prepare for the general case, let us first prove the result in one dimension.

Case B: Proof of Theorem 4.2.2 in one dimension assuming only (4.7). The hypothesis that a closed interval K = [a1, a2] is a trapping region for a scalar ODE x′ = F(x) means that

F (a1) ≥ 0 and F (a2) ≤ 0. (4.13)

In analogy with (4.9), we suppose that

\[ t^* = \sup\{ t : (\forall s \le t)\ a_1 < x(s) < a_2 \} < \infty \tag{4.14} \]

and look for a contradiction.

First we focus on a1. Observe from the Fundamental Theorem of Calculus that

\[ F(x) = F(a_1) + \int_{a_1}^{x} F'(y)\,dy. \]

Using (4.13) to drop F(a1), we deduce that

\[ F(x) \ge -K (x - a_1) \quad \text{for } x \in [a_1, a_2] \tag{4.15} \]

where K = max_{[a1,a2]} |F′|. Then considering the solution x(t) on the interval 0 ≤ t ≤ t∗, we compute with Leibniz’ rule that

\[ \frac{d}{dt}\left[ e^{Kt} (x(t) - a_1) \right] = e^{Kt} F(x(t)) + K e^{Kt} (x(t) - a_1) \ge 0, \]

where for the final inequality we have invoked (4.15) to estimate the first term. Therefore¹

\[ x(t^*) - a_1 \ge (x(0) - a_1)\,e^{-Kt^*} > 0. \]

Applying similar considerations near x = a2, we conclude that if t∗ < ∞, then

a1 < x(t∗) < a2.

This contradiction proves the one-dimensional result.

¹ This estimate has much in common with the proof of Gronwall’s Lemma (3.3.1), but note that here we are estimating a solution from below rather than from above.

Case C: Proof of Theorem 4.2.2 in d dimensions assuming only (4.7). As in Case A, suppose that t∗ defined by (4.9) satisfies t∗ < ∞. Again we characterize ∂K near x∗ as the zero set of a function as in (4.10). However, to carry out the necessary estimates it is desirable to massage φ into a special form as follows.

Relabeling the components of x and reflecting x1 if necessary, we may assume without loss of generality that the first component of the inward normal ∇φ(x∗) to ∂K at x∗ is positive (see Figure 4.1); in symbols

\[ \langle \nabla\phi(x^*), e_1 \rangle = \frac{\partial\phi}{\partial x_1}(x^*) > 0. \tag{4.16} \]

Then by the Implicit Function Theorem, we may solve the equation

φ(x1, x2, . . . , xd) = 0

for x1 as a function of the remaining coordinates: i.e., there is a neighborhood V of x∗, say with V ⊂ U, and a C1 function ψ(x2, . . . , xd) such that

\[ \partial K \cap V = \{ (x_1, \tilde{x}) \in V : x_1 = \psi(\tilde{x}) \} \tag{4.17} \]

where x̃ is a shorthand for (x2, . . . , xd). Thus, if ∇̃ indicates a (d − 1)-component vector of partial derivatives with respect to x2, . . . , xd, then N = (1, −∇̃ψ(x̃)) is normal to ∂K; and since by assumption the first component of the inward normal is positive, N is an inward normal. Moreover, the set K lies in the direction of increasing x1; hence

\[ (\operatorname{Int} K) \cap V = \{ (x_1, \tilde{x}) \in V : x_1 > \psi(\tilde{x}) \}. \]

Let

\[ g(t) = x_1(t) - \psi(\tilde{x}(t)), \qquad t^* - \varepsilon \le t \le t^* \tag{4.18} \]

where ε > 0 is chosen so that x(t) ∈ V for t in this interval. Then g(t) > 0 for t∗ − ε ≤ t < t∗ while g(t∗) = 0. Now it follows from the chain rule that

g′(t) = G(x(t)) (4.19)

where

\[ G(x) = F_1(x) - \sum_{j=2}^{d} \frac{\partial\psi}{\partial x_j}(\tilde{x})\,F_j(x). \tag{4.20} \]

Note that if x = (ψ(x̃), x̃) ∈ ∂K, then

\[ G(x) = \langle F(x), N_x \rangle \ge 0, \tag{4.21} \]

the latter inequality by the hypothesis (4.7).

Lemma 4.2.3. There is a constant K such that for all x ∈ K ∩ V

\[ G(x) \ge -K\,[x_1 - \psi(\tilde{x})]. \]

Proof. We add and subtract G(ψ(x̃), x̃) to G(x):

\[ G(x_1, \tilde{x}) = [G(x_1, \tilde{x}) - G(\psi(\tilde{x}), \tilde{x})] + G(\psi(\tilde{x}), \tilde{x}). \tag{4.22} \]

By (4.21) the second term here is non-negative; dropping it we have

\[ G(x_1, \tilde{x}) \ge G(x_1, \tilde{x}) - G(\psi(\tilde{x}), \tilde{x}). \tag{4.23} \]

By the fundamental theorem of calculus², the RHS of (4.23) may be rewritten

\[ G(x_1, \tilde{x}) - G(\psi(\tilde{x}), \tilde{x}) = \int_{\psi(\tilde{x})}^{x_1} \frac{\partial G}{\partial x_1}(s, \tilde{x})\,ds \ge -K\,[x_1 - \psi(\tilde{x})], \]

where K is a bound for the integrand for x ∈ V, from which the lemma follows.

Proof of Theorem 4.2.2, concluded: Combining (4.18), (4.19), and Lemma 4.2.3, we deduce that g′ ≥ −Kg. Differentiating e^{Kt} g we may show that

\[ g(t^*) \ge e^{-K\varepsilon}\,g(t^* - \varepsilon). \]

Since g(t∗ − ε) > 0, this contradicts the above construction in which g(t∗) = 0.

² Although G(x1, x̃), with x̃ = (x2, . . . , xd), is merely continuous with respect to x̃ (because of the partial derivatives of ψ), it may be seen from (4.20) that G(x1, x̃) is C1 with respect to x1. Indeed, we massaged φ into the form (4.17) precisely to obtain this extra modicum of smoothness.

Theorem 4.2.2 may be generalized by allowing trapping regions to have corners, which provides a much more versatile result. Restricting attention to two dimensions, for example, this generalization means there may be points where ∂K is defined by two different curves. At such a singular boundary point, we require that inequality (4.7) be satisfied for both normals. Sad to say, the proof of this generalization is neither trivial nor rewarding, but in the Exercises we shall explore it further in the context of specific examples.

Like many other results for autonomous equations, Theorem 4.2.2 may be generalized to non-autonomous equations. However, in practice such a result turns out not to be terribly useful, and we do not develop this idea.

4.3 Nullclines and trapping regions

We introduce nullclines through the following examples. After defining this term in the discussion of the first example, we apply nullclines to construct trapping regions.

Here and below, whenever feasible, we choose physically meaningful equations to illustrate the theory, either including or referencing an explanation of the origins of the equation. Most of these examples will recur in later chapters.

4.3.1 An activator-inhibitor system

Our first example is the 2 × 2 system

\[ \begin{aligned} \text{(a)}\quad x' &= \alpha\,\frac{1}{1+r}\,\frac{x^2}{1+x^2} - x \\ \text{(b)}\quad r' &= \gamma\left[\beta\,\frac{x^2}{1+x^2} - r\right] \end{aligned} \tag{4.24} \]

where α, β, γ are positive parameters. The variables x and r represent the concentration of two chemical species that evolve through their interaction inside a reaction vessel. In physical terms, x promotes its own production, or in mathematical terms, the factor x²/(1 + x²) in (4.24a) increases with x. At the same time x promotes the production of r which in turn inhibits the production of x. Of course the linear terms in each equation represent the decay of the two species. (See Problem ?? in Appendix D for more information about this system.)

In the context of (4.24), the term nullcline refers to a curve where one of the two components of the velocity vector (x′, r′) vanishes. By (4.24a), x′ vanishes if

\[ x = 0 \quad\text{or}\quad r = \frac{\alpha x}{1+x^2} - 1, \tag{4.25} \]

while by (4.24b), r′ vanishes if

\[ r = \frac{\beta x^2}{1+x^2}. \tag{4.26} \]

Both curves are graphed in Figure 4.2(a); we have shown the more interesting case when α is large enough (α² > 4(β + 1)) that the two nullclines have multiple intersections (bold dots in the figure). Such points (a, b) where the nullclines intersect represent equilibrium solutions of the ODE: i.e., points such that the constant function (x(t), r(t)) ≡ (a, b) is a solution. Thus, for the case of large α as shown in the figure, (4.24) has three equilibrium solutions, the trivial one (0, 0) and two nontrivial ones.
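The count of equilibria can be reproduced with a few lines of Python. Equating the second curve in (4.25) with (4.26) and clearing denominators gives the quadratic (β + 1)x² − αx + 1 = 0; this reduction is our own short calculation, and the function names `equilibria` and `velocity` as well as the value γ = 1 are assumptions made only for illustration. The sketch lists the equilibria for α = 4, β = 1 and checks that the velocity vanishes there.

```python
import math


def equilibria(alpha, beta):
    """Equilibria (x, r) of (4.24): intersections of the nullclines (4.25) and (4.26)."""
    points = [(0.0, 0.0)]  # the nullcline x = 0 meets (4.26) at the origin
    disc = alpha**2 - 4.0 * (beta + 1.0)  # discriminant of (beta+1) x^2 - alpha x + 1
    if disc > 0.0:  # two distinct nontrivial intersections
        for sign in (-1.0, 1.0):
            x = (alpha + sign * math.sqrt(disc)) / (2.0 * (beta + 1.0))
            points.append((x, beta * x**2 / (1.0 + x**2)))
    return points


def velocity(x, r, alpha, beta, gamma):
    """Right-hand side (x', r') of the activator-inhibitor system (4.24)."""
    dx = alpha / (1.0 + r) * x**2 / (1.0 + x**2) - x
    dr = gamma * (beta * x**2 / (1.0 + x**2) - r)
    return dx, dr


alpha, beta, gamma = 4.0, 1.0, 1.0  # alpha, beta as in Figure 4.2; gamma is arbitrary
for x, r in equilibria(alpha, beta):
    dx, dr = velocity(x, r, alpha, beta, gamma)
    print(f"(x, r) = ({x:.4f}, {r:.4f}),  (x', r') = ({dx:.1e}, {dr:.1e})")
```

The discriminant test in `equilibria` is exactly the condition α² > 4(β + 1) quoted above; the borderline case of a double root is deliberately left out of this sketch.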

To extract information from the nullclines, it is helpful to proceed in the three stages represented in Figures 4.2(a), 4.2(b), and 4.2(c). Whenever you need to graph nullclines, we urge you to proceed in these three stages; over the years we have found doing so organizes one’s thoughts for maximum efficiency.

• In Figure 4.2(a), vertical lines have been drawn along the curve (4.25) because the flow is vertical there: i.e., x′ = 0. Similarly, horizontal lines have been drawn along (4.26).

• Figure 4.2(b) augments the previous figure by specifying along (4.25) whether the flow is up or down; and along (4.26) whether the flow is to the right or left. Here is the thinking behind constructing this figure. The orientation of the flow along (4.25) changes from up to down whenever this curve crosses a nullcline of the other family. More particularly, on any segment of (4.25) that does not intersect (4.26), the orientation of the flow does not change. Often one can fill in these arrows starting from the determination of the sign of the flow in some extreme case. For example, it may be seen from (4.24b) that r′ is positive at the point where (4.25) crosses the positive x-axis (i.e., where r = 0), and thus the flow is upward at this point. The flow remains upward as one moves away from this point, as long as the other nullcline (4.26) is not crossed. More generally, all the remaining arrows along (4.25) in Figure 4.2(b) may be constructed from this starting case by reversing the direction every time (4.25) crosses (4.26). Similarly, by (4.24a), x′ is negative near the origin (in the first quadrant), and as before, all the arrows along (4.26) in Figure 4.2(b) may be constructed by building on this starting case.

• The nullclines partition the x, r-plane into regions. Within one region the flow F(x, r) points into one of the four quadrants, {±x > 0, ±r > 0}, and the quadrant remains the same if (x, r) moves within this region. The quadrant of the flow in each of the regions is indicated by a thick black arrow in Figure 4.2(c). Again, one can complete this figure by understanding an extreme case and making appropriate reversals on crossing nullclines.

Construction of a trapping region: Consider a rectangular region

\[ R = \{ (x, r) : 0 \le x \le A,\ 0 \le r \le B \} \tag{4.27} \]

as illustrated in Figure 4.3. Pictorially, it may be seen from the figure that the flow along each side of R is inward provided A and B are large enough. Analytically, in Exercise 1 we ask you to determine explicit estimates for A, B to ensure this.

We now claim that, for any initial conditions (a, b) with a > 0, b > 0, the IVP for (4.24) has a solution for all positive time that remains in the first quadrant. Given (a, b), we may choose A, B large enough so that (i) the flow is inward along the boundary of R and (ii) (a, b) belongs to R. The direct application of Theorem 4.2.2 to prove the claim is prevented by the fact that the boundary of R is only piecewise smooth, but in Exercise 7 we indicate how to overcome this difficulty.
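Before attempting the analytical estimates, one can test a candidate rectangle numerically. The following Python sketch samples points on the four sides of R and evaluates the inner product of the velocity with the inward normal, as in (4.7). It is a spot check under our own assumptions (the sample values A = 5, B = 2, γ = 1 and the helper name `inward_on_rectangle` are hypothetical), and sampling finitely many points is evidence, not a proof.

```python
def velocity(x, r, alpha, beta, gamma):
    """Right-hand side (x', r') of the activator-inhibitor system (4.24)."""
    dx = alpha / (1.0 + r) * x**2 / (1.0 + x**2) - x
    dr = gamma * (beta * x**2 / (1.0 + x**2) - r)
    return dx, dr


def inward_on_rectangle(A, B, alpha, beta, gamma, n=200):
    """Check <F, N> >= 0 at n + 1 sample points on each side of R = [0, A] x [0, B]."""
    for i in range(n + 1):
        x, r = A * i / n, B * i / n
        checks = (
            velocity(0.0, r, alpha, beta, gamma)[0],   # side x = 0, inward normal (1, 0)
            -velocity(A, r, alpha, beta, gamma)[0],    # side x = A, inward normal (-1, 0)
            velocity(x, 0.0, alpha, beta, gamma)[1],   # side r = 0, inward normal (0, 1)
            -velocity(x, B, alpha, beta, gamma)[1],    # side r = B, inward normal (0, -1)
        )
        if min(checks) < 0.0:
            return False
    return True


alpha, beta, gamma = 4.0, 1.0, 1.0
print("A = 5, B = 2:", inward_on_rectangle(5.0, 2.0, alpha, beta, gamma))
print("A = 1, B = 2:", inward_on_rectangle(1.0, 2.0, alpha, beta, gamma))
```

On the sides x = 0 and r = 0 the inner product can vanish (the flow is tangential or zero), which is why the test uses the non-strict inequality of (4.7) rather than (4.8).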

Figure 4.2: Nullclines of the activator-inhibitor equations (4.24) for the specific choice of parameters α = 4 and β = 1, in which case there are three equilibria (bold dots). Insets show a blowup of a region near the origin which contains two of the three equilibria. Panel (a): Horizontal and vertical segments along the nullclines indicate the orientation of the flow. Panel (b): Arrows along the nullclines indicate the direction of the flow. Panel (c): Several short, thick arrows indicate the direction of the flow in regions partitioned by the nullclines.

Figure 4.3: A rectangular trapping region, bounded by the lines x = 0, x = A, r = 0, and r = B, for solutions of the system (4.24) lying in the first quadrant.

4.3.2 Lotka-Volterra with a logistic modification

In Exercise 5 in Chapter 1 we saw that all trajectories of the Lotka-Volterra equations are periodic. In particular, solutions of the Lotka-Volterra equations (1.33) exist for all time. On the other hand, the only trapping regions are those bounded by the orbits themselves. Thus, trapping regions are not a useful technique for proving global existence for the Lotka-Volterra system.

Let us modify the Lotka-Volterra equations by limiting the growth of the prey to a finite carrying capacity:

\[ \begin{aligned} \text{(a)}\quad x' &= x(1 - x/K) - xy \\ \text{(b)}\quad y' &= \rho\,(xy - y). \end{aligned} \tag{4.28} \]

The nullclines of (4.28) consist of four straight lines in the xy-plane (Figure 4.4). Two of these come from equation (4.28a), namely x = 0 and y = 1 − x/K, and two from equation (4.28b): y = 0 and x = 1. Provided that K > 1, the nullclines intersect as shown in the left panel of Figure 4.4. Flow directions between the nullclines are also shown in the figure.

For (4.28) we are unable to take a rectangle as a trapping region, but it is possible to find a triangle. In the Exercises, the reader is asked to show that the triangular region in the right panel of Figure 4.4 forms a trapping region provided A and B are chosen appropriately large. Repeating the argument above, including the extension of Theorem 4.2.2 to allow ∂K to have corners, we may show that the IVP for (4.28) has a solution for all positive time for any initial conditions in the first quadrant.

Figure 4.4: Left panel: Nullclines of equations (4.28a) (thick, black lines) and (4.28b) (thinner, lighter-colored lines) assuming K > 1. Right panel: If A, B are chosen sufficiently large, the indicated trapezoid forms a trapping region.

4.3.3 van der Pol’s equation

We now consider a technically more difficult example, the van der Pol system,

\[ \begin{aligned} \text{(a)}\quad x' &= y \\ \text{(b)}\quad y' &= -\beta\,(x^2 - 1)\,y - x, \end{aligned} \tag{4.29} \]

which we encountered in Chapter 1. Let us first try to construct trapping regions without using nullclines. For starters, we check whether a level curve of an “energy function” like E = x²/2 + y²/2 (a circle) could form the boundary of a trapping region. However, a simple calculation shows that

\[ \frac{dE}{dt} = -\beta\,(x^2 - 1)\,y^2 : \]

i.e., the flow is not inward on the portion of the circle within the vertical strip −1 < x < 1.
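For readers who want the “simple calculation” spelled out, here is a short worked derivation. It uses only the chain rule and the two equations of the van der Pol system, x′ = y and y′ = −β(x² − 1)y − x, applied to E = x²/2 + y²/2.

```latex
\begin{aligned}
\frac{dE}{dt} &= x\,x' + y\,y' \\
              &= x\,y + y\,\bigl[-\beta\,(x^2 - 1)\,y - x\bigr] \\
              &= -\beta\,(x^2 - 1)\,y^2 .
\end{aligned}
```

For β > 0 the right-hand side is positive when |x| < 1 and y ≠ 0, so E increases there and the flow crosses the circle outward; it is negative when |x| > 1 and y ≠ 0.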

To accommodate this behavior, we attempt to construct a trapping region with a piecewise smooth boundary as shown in Figure 4.5(a): i.e., a line segment PQ with the equation

y = (β + 1)x + A, −2 ≤ x ≤ 2 (4.30)

where A is a large constant, joined onto two circles. From our calculation above, we know the flow is “inward” along the two circles³. By dividing the two equations in (4.29) we calculate that

\[ \frac{dy}{dx} = \beta\,(1 - x^2) - \frac{x}{y} \le \beta - \frac{x}{y}; \]

therefore the flow is also “inward” along PQ provided the constant A is chosen large enough so that |x/y| ≤ 1 along this line segment. However, we run into trouble at large negative y in the problematic strip −1 < x < 1: it is not possible to close up the figure in a way that maintains inward flow over the entire boundary. Moreover, this difficulty is not easily resolved by moving the centers of the circles or by adjusting their radii.

³ We put inward in quotes because we have not actually defined a closed region.

We now resort to nullclines, which are shown in Figure 4.5(b). Note that the direction of the flow changes rapidly when it crosses the x-axis at large values of x: it changes from vertical on the x-axis to horizontal at the nullcline where y ≈ −1/(βx). This fact gives us great latitude in choosing the boundary of a trapping region in the fourth quadrant. For our purposes it will suffice to have the estimate below the nullcline

\[ x > 1, \quad y < -\frac{x}{\beta\,(x^2 - 1)} \quad\Longrightarrow\quad x' < 0, \quad y' > 0. \tag{4.31} \]
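The implication (4.31) is easy to spot-check by random sampling. The Python sketch below draws points with x > 1 lying strictly below the nullcline y = −x/(β(x² − 1)) of the van der Pol system x′ = y, y′ = −β(x² − 1)y − x and tests the signs of x′ and y′. The sampling ranges, the seed, and the function names `vdp` and `check_estimate` are our own hypothetical choices; passing the check illustrates the estimate but does not replace the one-line proof.

```python
import random


def vdp(x, y, beta):
    """Right-hand side (x', y') of the van der Pol system (4.29)."""
    return y, -beta * (x**2 - 1.0) * y - x


def check_estimate(beta, n_samples=10000, seed=0):
    """Return True if x' < 0 and y' > 0 at every sampled point below the nullcline."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = 1.0 + rng.uniform(1e-3, 10.0)             # x > 1
        nullcline = -x / (beta * (x**2 - 1.0))
        y = nullcline - rng.uniform(1e-3, 10.0)       # strictly below the nullcline
        dx, dy = vdp(x, y, beta)
        if not (dx < 0.0 and dy > 0.0):
            return False
    return True


for beta in (0.1, 1.0, 5.0):
    print(f"beta = {beta}: estimate (4.31) holds on samples:", check_estimate(beta))
```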

We use (4.31) to fix the problem encountered in Figure 4.5(a). Specifically, as illustrated in Figure 4.5(c), we construct trapping regions with six sides, as follows. Note that the flow (4.29) is odd under the reflection (x, y) → (−x, −y); we construct a region that is invariant under this reflection. We begin with the line segment PQ as in (4.30). We continue with QR, a portion of a circle with center (0, 0) that extends from Q to a point R just below the x-axis where the circle intersects the nullcline. Then we proceed along the straight line from R to S, where S, which has coordinates (2, 2(β + 1) − A), is the reflection of P through the origin. The remainder of the boundary is the reflection of what we have already constructed. By previous construction the flow is inward along PQ and QR, and by (4.31) it is inward along RS.

To conclude, starting with A in (4.30) large enough, we may construct a trapping region that encloses arbitrary initial conditions. Hence we have proved global existence for all initial data.

We have guided the reader through the above construction because it is good training to work through such a complicated case. After the fact, we confess that there is an easier alternative based on a clever, nonstandard reduction of the scalar van der Pol equation to a first-order system (see Exercise 8b).

The above three examples provide an adequate introduction to using nullclines to find trapping regions for ODEs, and more examples are given in the Exercises. However, we present two more examples of constructing trapping regions because there are more lessons to be learned.

Figure 4.5: (a) An attempt to form a trapping region for the van der Pol system (4.29), as described in the text. The points P and Q have coordinates (−2, A − 2(β + 1)) and (2, A + 2(β + 1)), respectively; the segment PQ has slope β + 1 and y-intercept A. Although the flow is directed “inward” along the solid, black curve, it is not feasible to “close off” the curve to form a trapping region. (b) Nullclines of the van der Pol system (4.29), with arrows indicating the direction of the flow. (c) Trapping region (enclosed by bold curve) for the van der Pol system (4.29). The thin, solid curves are the nullclines, and the dashed curves (included for reference) are the vertical lines x = 0 and x = ±2 and a circle of sufficiently large radius centered at the origin. The boundary of the trapping region is a piecewise smooth curve consisting of six pieces as described in the text.

4.3.4 The torqued pendulum and ODEs on manifolds

Consider a pendulum, as illustrated in Figure 4.6, that is subjected to a “torque” µ, which tends to twist the unperturbed pendulum away from its stable equilibrium. If friction is modeled by linear damping, then this problem is described by the first-order system of ODE

\[ \begin{aligned} x' &= y \\ y' &= -\sin x - \beta y + \mu. \end{aligned} \tag{4.32} \]

Figure 4.6: Schematic diagram of the torqued pendulum.

Without loss of generality, it suffices to consider the case µ ≥ 0.

In the absence of torque, the energy E = y²/2 − cos x decreases along orbits. With torque one calculates that

\[ \frac{dE}{dt} = -\beta y^2 + \mu y; \]

this may have either sign, but if

|y| > µ/β (4.33)

then dE/dt is negative. Note that if E = E0 where E0 > µ²/(2β²) + 1, then y² = 2(E0 + cos x) ≥ 2(E0 − 1) > µ²/β², so (4.33) follows. Hence the flow along the boundary of the region

\[ K = \{ (x, y) : y^2/2 - \cos x \le E_0 \} \tag{4.34} \]

is everywhere inward. Indeed K would be a trapping region except for the fact that it is not bounded in x and therefore is not compact (see Figure 4.7). Thus, in principle, a solution of (4.32) might stay inside K while x marches off to infinity in finite time. In fact, this does not happen: The RHS of (4.32) satisfies the hypothesis (4.5) of Theorem 4.2.1, so the solution may grow exponentially but it exists for all t ∈ R. (Remark: In fact in positive time x grows at most linearly in t; see Exercises.)
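The claim that the right-hand side of the torqued pendulum system x′ = y, y′ = −sin x − βy + µ has linear growth can be verified in one line. The constants K = 1 + β and B = 1 + µ below are our own (non-optimal) choices, obtained from the triangle inequality with β > 0 and µ ≥ 0.

```latex
\begin{aligned}
|F(x, y)| &\le |y| + |-\sin x - \beta y + \mu| \\
          &\le |y| + 1 + \beta\,|y| + \mu \\
          &\le (1 + \beta)\,|(x, y)| + (1 + \mu).
\end{aligned}
```

Thus the linear growth hypothesis of Theorem 4.2.1 holds with K = 1 + β and B = 1 + µ, and the exponential bound of that theorem applies.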

The above “clunky” proof of global existence may be replaced by a much more elegant approach based on geometry. Note that the RHS of (4.32) is periodic in x. Rather than considering (4.32) as an ODE on the Euclidean space R × R, we regard it as an ODE on the cylinder S¹ × R, where S¹ is the circle. Then, considered as a subset of S¹ × R, the set K is compact (see right panel of Figure 4.7). Moreover Theorem 4.2.2 generalizes to the manifold S¹ × R, as one may prove by working with appropriate periodic functions. This technique establishes global existence in forward time for (4.32).

Figure 4.7: Sketch of the region K in (4.34). In the left panel, K is the unbounded region that lies between the two curves. The right panel is a rendering of the vertical strip 0 ≤ x < 2π, −∞ < y < ∞ wrapped onto a cylinder so that x = 0 is identified with x = 2π.

Occasionally it will be convenient to regard a system of ODEs as describing flow on a manifold. Although an extension of Theorem 4.2.2 is valid on general manifolds, we do not formulate such a result. For our purposes it suffices to have this extension for two manifolds, the cylinder S¹ × R and the torus T² = S¹ × S¹, for which the extension may be proved with simple, albeit ad hoc, considerations of periodic functions of a real variable.

4.3.5 Michaelis-Menten kinetics

Michaelis-Menten kinetics arises in modeling the concentrations of certain chemical species in an enzyme-mediated reaction. Applying suitable scaling and reasonable assumptions on the underlying chemistry (see Appendix D), the equations take the form

\[ \begin{aligned} \text{(a)}\quad r' &= -r(1 - c) + c \\ \text{(b)}\quad \varepsilon c' &= r(1 - c) - (1 + \kappa)\,c, \end{aligned} \tag{4.35} \]

where ε, κ are positive parameters. Typically ε is very small indeed.

Global existence for (4.35) is trivial. By forming a linear combination of the equations one sees that the derivative of r + εc is negative: adding (4.35a) and (4.35b) gives (r + εc)′ = −κc, which is negative for c > 0. Any triangular region bounded by the r-axis, the c-axis, and a line r + εc = A can serve as a trapping region.

However, the region between the nullclines

\[ c = \frac{r}{r + 1} \quad\text{and}\quad c = \frac{r}{r + 1 + \kappa} \]

is a much more interesting trapping region (see Figure 4.8). The rapid flow (4.35b) drives (r, c) into this trapping region, after which both variables tend slowly to zero within the trapping region.

Figure 4.8: Nullclines for the scaled Michaelis-Menten equations (4.35) form a narrow trapping region.

We ask the reader to recall Exercise 7 from Chapter 1, which also dealt with two variables evolving at radically different rates. In the spirit of that exercise, we consider the approximation of setting ε = 0 in (4.35b). In this approximation, we may then solve (4.35b), now an algebraic equation, to obtain c = r/(r + 1 + κ); substituting into (4.35a) we derive

\[ \frac{dr}{dt} = -\frac{\kappa\,r}{r + 1 + \kappa}. \tag{4.36} \]

This equation is the (scaled) Michaelis-Menten approximation for the enzymatic reaction rate arising from (4.35). One may worry about the violent approximation from which it is derived, which changes the differential equation (4.35b) to an algebraic equation, but the above analysis with nullclines shows that, apart from the initial transient, it does not lead us astray.
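A small numerical experiment illustrates how closely the reduced equation tracks the full system after the initial transient. The Python sketch below integrates the full system r′ = −r(1 − c) + c, εc′ = r(1 − c) − (1 + κ)c and the reduced equation r′ = −κr/(r + 1 + κ) with a hand-written RK4 scheme. The parameter values κ = 1, ε = 0.01, the initial data (r, c) = (1, 0), the final time, and the step size are all assumptions chosen here for illustration; a fixed-step explicit method is adequate only because the step is much smaller than ε.

```python
def rk4_step(f, state, h):
    """One classical RK4 step for the autonomous system state' = f(state)."""
    k1 = f(state)
    k2 = f([s + 0.5 * h * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * h * k for s, k in zip(state, k2)])
    k4 = f([s + h * k for s, k in zip(state, k3)])
    return [s + h * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def integrate(f, state, t_end, n_steps):
    """Advance state from t = 0 to t = t_end in n_steps equal RK4 steps."""
    h = t_end / n_steps
    for _ in range(n_steps):
        state = rk4_step(f, state, h)
    return state


kappa, eps = 1.0, 0.01  # illustrative parameter values


def full(state):
    """Scaled Michaelis-Menten system (4.35), solved for (r', c')."""
    r, c = state
    return [-r * (1.0 - c) + c, (r * (1.0 - c) - (1.0 + kappa) * c) / eps]


def reduced(state):
    """Reduced equation obtained by setting eps = 0 in (4.35b)."""
    (r,) = state
    return [-kappa * r / (r + 1.0 + kappa)]


r_full, c_full = integrate(full, [1.0, 0.0], 2.0, 20000)
(r_red,) = integrate(reduced, [1.0], 2.0, 20000)

print(f"full system at t = 2:   r = {r_full:.4f}, c = {c_full:.4f}")
print(f"reduced equation:       r = {r_red:.4f}")
print(f"c-nullcline value at r: {r_full / (r_full + 1.0 + kappa):.4f}")
```

In this run the state at t = 2 lies between the two nullclines, and the two values of r differ by an amount comparable to ε, consistent with the discussion of the trapping region above.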




4.4 Continuous dependence of the solution

4.4.1 The main result

Theorem 4.4.1. Suppose F : U → Rd is locally Lipschitz, and let x0(t), 0 ≤ t < β, be a solution in forward time of x0′ = F(x0) with initial condition x0(0) = b0. Then:

(i) For any positive T that satisfies T < β, there is a neighborhood V of b0 such that for any b ∈ V, the IVP

x′ = F(x), x(0) = b (4.37)

has a solution for 0 ≤ t < T.

(ii) Moreover, there is a constant L such that for all b ∈ V,

\[ |x(t) - x_0(t)| \le |b - b_0|\,e^{Lt}, \qquad 0 \le t < T. \tag{4.38} \]

In words, Conclusion (ii) asserts that if the initial data for an IVP are altered slightly, then the perturbed solution diverges from the original solution no faster than at a controlled exponential rate. Conclusion (i), which guarantees that the perturbed solution exists for nearly as long as x0, gives the estimate more significance. Incidentally, in Exercise 17 we give an example where x0 blows up in finite time but nearby solutions exist for much longer times; indeed, for all positive time.
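The exponential estimate in Conclusion (ii) can be observed numerically. The sketch below is a minimal illustration under our own assumptions: the scalar equation x′ = sin x (globally Lipschitz with constant L = 1), the initial data b0 = 1 and b = 1.01, and the helper `rk4_path` are choices made here. It compares the gap between the two solutions with the bound |b − b0| e^{Lt}.

```python
import math


def rk4_path(f, x0, t_end, n_steps):
    """Classical RK4 for the scalar ODE x' = f(x); returns values on a uniform grid."""
    h = t_end / n_steps
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        xs.append(x)
    return xs


L = 1.0              # global Lipschitz constant of sin, since |cos| <= 1
b0, b = 1.0, 1.01    # reference and perturbed initial data (illustrative values)
T, n = 5.0, 5000

x0_path = rk4_path(math.sin, b0, T, n)
x_path = rk4_path(math.sin, b, T, n)

# Largest ratio of the actual gap to the bound |b - b0| e^{Lt} over the time grid.
worst = max(
    abs(x - x0) / (abs(b - b0) * math.exp(L * (i * T / n)))
    for i, (x, x0) in enumerate(zip(x_path, x0_path))
)
print(f"max gap / bound on [0, {T}] = {worst:.4f}")
```

The ratio is 1 at t = 0 and falls below 1 afterwards, which shows how pessimistic the bound can be: it holds for every equation with the same Lipschitz constant, including ones whose solutions really do separate exponentially.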

Proof of Theorem 4.4.1. Let K0 be the image of [0, T] under x0, which is a compact subset of U. By Corollary 3.3.3, there is a larger compact subset K ⊂ U and a δ > 0 such that for every point x0(t) ∈ K0, the closed ball B(x0(t), δ) is contained in K; in symbols,

t ≤ T =⇒ B(x0(t), δ) ⊂ K. (4.39)

By Proposition 3.3.2, F|K is Lipschitz continuous, say with Lipschitz constant L > 0.

Let V = B(b0, e^{−LT} δ) be the neighborhood in the theorem. If b ∈ V, let x be the solution in forward time of (4.37), on whatever interval it exists, and define

\[ t^* = \sup\{ t \in [0, T] : (\forall s \le t)\ |x(s) - x_0(s)| < \delta \}. \tag{4.40} \]

Since |x(0) − x0(0)| < e^{−LT} δ < δ, we know t∗ > 0.

Let g(t) = |x(t) − x0(t)|. From subtracting the integral equations for x and x0 we deduce that

\[ g(t) \le |b - b_0| + \int_0^t |F(x(s)) - F(x_0(s))|\,ds. \tag{4.41} \]

If t ≤ t∗, both x(s) and x0(s) in the integrand belong to B(x0(s), δ) ⊂ K, so Lipschitz continuity gives us

|F(x(s)) − F(x0(s))| ≤ L|x(s) − x0(s)|.

Substituting into (4.41) we have

\[ g(t) \le |b - b_0| + L \int_0^t g(s)\,ds \]

and hence by Gronwall’s Lemma

\[ g(t) \le |b - b_0|\,e^{Lt}, \qquad 0 \le t \le t^*. \tag{4.42} \]

We claim that t∗ = T. Certainly t∗ ≤ T. But we have from (4.42) that

\[ g(t^*) \le |b - b_0|\,e^{Lt^*} < (e^{-LT}\delta)\,e^{Lt^*} = e^{-L(T - t^*)}\,\delta \le \delta. \]

If t∗ were strictly less than T, then since g(t∗) = |x(t∗) − x0(t∗)| < δ, continuity would give |x(s) − x0(s)| < δ for s slightly beyond t∗, contradicting the definition (4.40) of t∗. This completes the proof.

Corollary 4.4.2. In Theorem 4.4.1, if F : U → Rd is C1, then in the estimate (4.38) we may take the constant

\[ L = \max_{x \in K} \| DF(x) \| \]

where K is the compact subset of U chosen in the above proof.

As an Exercise, we ask the reader to review the proof and verify that this choice of L yields the desired estimate.

These results and related results below have analogues in backwards time, but we do not bother to formulate these.

4.4.2 Some associated formalism

Sometimes, when it is convenient to focus on how the solution of an IVP depends on the initial data, we shall use the flow notation. Specifically, we shall write

ϕ(t, b) = x(t) (4.43)

where x is the solution of the IVP (4.37). This solution operator or flow function is a mapping ϕ : Ω → U, where its domain is given by

\[ \Omega = \{ (t, b) \in [0, \infty) \times U : t \in \text{maximal interval of existence for (4.37)} \}. \tag{4.44} \]

It follows from Theorem 4.4.1 that ϕ is Lipschitz continuous with respect to its second argument b. We ask the reader to supply the trivial proof that it is Lipschitz continuous with respect to both arguments simultaneously.

The solution operator satisfies the following relation, which is known as the semigroup property.

Proposition 4.4.3. If (s, b) ∈ Ω and if (t, ϕ(s, b)) ∈ Ω, then (s + t, b) ∈ Ω and

ϕ(t, ϕ(s, b)) = ϕ(s + t, b). (4.45)

This result follows easily from Lemma 3.2.8.
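The semigroup property is easy to see in an example with an explicit flow. As an illustration of our own choosing, the logistic equation x′ = x(1 − x) has the closed-form flow ϕ(t, b) = b e^t / (1 − b + b e^t) for b ≥ 0 and t ≥ 0. The Python sketch below (the function name `flow` is hypothetical) compares the two sides of the semigroup identity ϕ(t, ϕ(s, b)) = ϕ(s + t, b).

```python
import math


def flow(t, b):
    """Closed-form flow of the logistic equation x' = x(1 - x), for b >= 0, t >= 0."""
    e = math.exp(t)
    return b * e / (1.0 - b + b * e)


s, t, b = 0.5, 0.7, 0.2
two_steps = flow(t, flow(s, b))   # flow for time s, then for time t
one_step = flow(s + t, b)         # flow for time s + t in one go
print(f"phi(t, phi(s, b)) = {two_steps:.12f}")
print(f"phi(s + t, b)     = {one_step:.12f}")
```

The two printed values agree up to rounding, as the identity predicts; the same comparison can be repeated for any other s, t ≥ 0 and b ≥ 0.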

4.4.3 Continuity with respect to the equation

In Theorem 4.4.1 we showed that the solution of an IVP depends continuously on its initial data. In the following result we show that the solution “depends continuously on the equation”. More precisely, we compare solutions of two IVPs

x′ = F(x), x(0) = b and y′ = G(y), y(0) = b. (4.46)

Theorem 4.4.4. Suppose F : U → Rd and G : U → Rd are locally Lipschitz, and suppose x(t), y(t) satisfy (4.46) for 0 ≤ t < β. Then for every T < β, there exist constants L and η > 0 such that if sup_U |F − G| < η then

\[ |x(t) - y(t)| \le \frac{e^{Lt} - 1}{L}\,\sup_U |F - G|, \qquad 0 \le t \le T. \tag{4.47} \]

In this result, the supremum over U could be replaced by a supremum over an appropriate compact subset of U. Note that the strength of the estimate (4.47) derives from the fact that sup_U |F − G| might be much smaller than η.

The estimate (4.47) is analogous to Conclusion (ii) of Theorem 4.4.1. We could formulate an existence result for the perturbed equation that was analogous to Conclusion (i), but this does not seem worth the bother.

The proof of this result, which involves just recycling ideas in the proof of Theorem 4.4.1, is left as an Exercise.
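As a sanity check on the estimate (4.47), the following Python sketch compares solutions of x′ = F(x) and y′ = G(y) for a pair of right-hand sides chosen by us for illustration: F(x) = −x and G(x) = −x + η cos x, so that sup |F − G| = η, with L = 1 taken as the Lipschitz constant of F. These functions, the values η = 0.05, b = 1, T = 2, and the helper `rk4_path` are all assumptions of the sketch, not part of the theorem.

```python
import math


def rk4_path(f, x0, t_end, n_steps):
    """Classical RK4 for the scalar ODE x' = f(x); returns values on a uniform grid."""
    h = t_end / n_steps
    xs = [x0]
    x = x0
    for _ in range(n_steps):
        k1 = f(x)
        k2 = f(x + 0.5 * h * k1)
        k3 = f(x + 0.5 * h * k2)
        k4 = f(x + h * k3)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        xs.append(x)
    return xs


eta = 0.05  # size of the perturbation of the equation


def F(x):
    return -x


def G(x):
    return -x + eta * math.cos(x)  # sup |F - G| = eta


L, b, T, n = 1.0, 1.0, 2.0, 2000
x_path = rk4_path(F, b, T, n)
y_path = rk4_path(G, b, T, n)


def bound(t):
    """Right-hand side of the estimate (4.47) with sup |F - G| = eta."""
    return (math.exp(L * t) - 1.0) / L * eta


# Grid indices where the computed gap exceeds the bound (with a small rounding tolerance).
violations = [
    i for i, (x, y) in enumerate(zip(x_path, y_path))
    if abs(x - y) > bound(i * T / n) + 1e-12
]
print("number of grid points violating (4.47):", len(violations))
```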

4.5 Differentiable dependence on initial data

4.5.1 Formulation of the main result

In the previous section we showed that the flow ϕ(t, b) is Lipschitz continuous in b. In this section we show that ϕ is in fact C¹, provided of course that F is C¹.

The above phrasing is how a pure mathematician would describe the results of this section. However, let us adopt the language of applied math since we believe this makes the discussion more intuitive. Thus we suppose x0(t), 0 ≤ t < β, is a solution in forward time of x′ = F(x) with initial condition x0(0) = b0, and we ask how the solution changes if the initial condition is perturbed: let x(t, ε) be the solution of

x′ = F(x), x(0) = b0 + εb1. (4.48)

We look for an expansion⁴ of the solution in powers of ε

x(t, ε) = x0(t) + εx1(t) + . . . . (4.49)

The size of the neglected terms, represented by the dots, will be estimated in Theorem 4.5.1. For the moment we proceed formally. Substituting (4.49) into (4.48) we obtain

x′0(t) + εx′1(t) + . . . = F(x0(t) + εx1(t) + . . .).

Using a Taylor series to expand the RHS of this equation in powers of ε, we calculate

x′0(t) + εx′1(t) + . . . = F(x0(t)) + εDF(x0(t)) · x1 + . . . . (4.50)

Equation (4.50) must hold for all values of ε: i.e., the two power series on either side of the equation define the same functions of ε. Thus the coefficients of each power of ε must be equal. Matching corresponding powers of ε in (4.50) we obtain

O(ε⁰) : x′0 = F(x0)
O(ε¹) : x′1 = DF(x0(t)) · x1.

The O(ε⁰)-equation merely repeats the equation for our original solution. The O(ε¹)-equation gives new information: i.e., an ODE for x1, which is a linear equation with time-dependent coefficients

x′1 = A(t)x1 (4.51)

where the coefficient matrix is given by A(t) = DF(x0(t)). Of course matching powers of ε in the initial conditions requires

x1(0) = b1. (4.52)

Theorem 3.4.3 guarantees that the IVP (4.51), (4.52) has a unique solution x1(t) for t in the same interval [0, β) on which x0 is defined.

Theorem 4.5.1. Let F : U → Rd be C¹. In the above notation

$$\lim_{\varepsilon \to 0} \frac{|x(t, \varepsilon) - x_0(t) - \varepsilon x_1(t)|}{\varepsilon} = 0, \qquad 0 \le t < \beta. \tag{4.53}$$

Moreover, for any T < β, the limit is uniform for 0 ≤ t ≤ T.

⁴ We may describe (4.49) as an ansatz: i.e., an assumed form for the solution of a problem. This term, which comes from German, is a useful one to add to your (mathematical) vocabulary.
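The limit (4.53) can be observed directly when x(t, ε), x0(t), and x1(t) are all available in closed form. The sketch below again uses the logistic equation x′ = x(1 − x) as an assumed illustration (it is not an example from this section). For it, DF(x) = 1 − 2x, and one can verify by differentiation that x1(t) = b1 eᵗ/(1 − b0 + b0 eᵗ)² solves (4.51), (4.52).

```python
import math

def flow(t, b):
    """Explicit flow of x' = x(1 - x), x(0) = b (illustrative example)."""
    return b * math.exp(t) / (1.0 - b + b * math.exp(t))

def x1(t, b0, b1):
    """Solution of x1' = (1 - 2*x0(t)) * x1, x1(0) = b1, with x0(t) = flow(t, b0)."""
    return b1 * math.exp(t) / (1.0 - b0 + b0 * math.exp(t)) ** 2

def remainder_ratio(t, b0, b1, eps):
    """The quotient |x(t, eps) - x0(t) - eps*x1(t)| / eps appearing in (4.53)."""
    g = abs(flow(t, b0 + eps * b1) - flow(t, b0) - eps * x1(t, b0, b1))
    return g / eps

ratios = [remainder_ratio(2.0, 0.2, 1.0, eps) for eps in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed quotients shrink roughly in proportion to ε. That is stronger than the theorem requires and reflects the fact that this particular F is smoother than C¹.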


4.5.2 The order notation

We first introduce the order notation (big-O and little-o) because this makes an otherwise messy proof fairly straightforward.

The basic use of big-O, the simpler concept, is as follows: Given a quantity that depends on a parameter, f(ε) where 0 < ε < ε0 and f may be either vector or scalar, we say that f is order-ε, written f(ε) = O(ε), if

(∃ε1 > 0)(∃C) such that 0 < ε < ε1 =⇒ |f(ε)| ≤ Cε.

(The formula f(ε) = O(ε) may also be read “f is big-O of ε.”) The same notation is used if ε can assume either sign: i.e., if f(ε) is defined for 0 < |ε| < ε0, we write f(ε) = O(|ε|) if |f(ε)| ≤ C|ε| provided |ε| is sufficiently small. The notation is also used to estimate quantities that depend on multiple parameters. For example, in Theorem 4.4.1 the solution x depends on the d parameters of the initial data, b1, . . . , bd, and we may paraphrase the conclusion of the theorem as

|x(t) − x0(t)| = O(|b − b0|). (4.54)

Indeed we will say that (4.54) holds uniformly for t ∈ [0, T] because the single constant e^{LT} makes the inequality work for all t in this interval.

Little-o is a more subtle concept. If f is defined for 0 < ε < ε0, we say that f is little-o of ε, written f = o(ε), if

(∀η > 0)(∃ε1 > 0) such that 0 < ε < ε1 =⇒ |f(ε)| ≤ ηε.

Of course this definition is equivalent to lim f(ε)/ε = 0. In Exercise 1 we ask the reader to show that for any constant C and any function φ(ε) ≥ 0,

(a) If f(ε) = o(ε), then Cf(ε) = o(ε).
(b) If f(δ) = o(δ) and if φ(ε) = O(ε), then f(φ(ε)) = o(ε).
(4.55)

This result illustrates the usefulness of the order notation: reinterpreting (4.55) in terms of the original C’s, ε’s and η’s obscures the basic simplicity of the behavior in question. The little-o concept and (4.55) also extend to functions defined for both positive and negative ε and to functions of multiple parameters. The proofs of these claims are left for the dedicated reader.
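A few concrete instances may help fix the two definitions. The functions below are our own illustrations, not examples from the text.

```latex
% Big-O but not little-o: take C = 4 and \varepsilon_1 = 1 in the definition.
f(\varepsilon) = 3\varepsilon + \varepsilon^2
  \;\Longrightarrow\; |f(\varepsilon)| \le 4\varepsilon \ \text{ for } 0 < \varepsilon < 1,
  \quad\text{so } f(\varepsilon) = O(\varepsilon),
  \quad\text{but } f(\varepsilon)/\varepsilon \to 3 \ne 0 .

% Little-o: given \eta > 0, take \varepsilon_1 = \eta.
f(\varepsilon) = \varepsilon^2
  \;\Longrightarrow\; |f(\varepsilon)| = \varepsilon \cdot \varepsilon \le \eta\,\varepsilon
  \ \text{ for } 0 < \varepsilon < \eta,
  \quad\text{so } f(\varepsilon) = o(\varepsilon).

% An instance of (4.55b): f(\delta) = \delta^2 = o(\delta), \ \varphi(\varepsilon) = 2\varepsilon = O(\varepsilon).
f(\varphi(\varepsilon)) = 4\varepsilon^2 = o(\varepsilon).
```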

To relate these concepts to Theorem 4.5.1, let g(t, ε) = |x(t, ε) − x0(t) − εx1(t)|; relation (4.53) simply asserts that

g(t, ε) = o(ε), (4.56)

uniformly in t. We shall show that there is a constant L such that

$$g(t, \varepsilon) \le o(\varepsilon) + L \int_0^t g(s, \varepsilon)\, ds, \qquad 0 \le t \le T, \tag{4.57}$$

where the o(ε)-term is uniformly small for 0 ≤ t ≤ T. In other words, (4.57) asserts that there is a function r(t, ε) ≥ 0, with r(t, ε)/ε tending to zero as ε → 0 uniformly for 0 ≤ t ≤ T, such that g(t, ε) is bounded by r(t, ε) plus the integral term. We ask the reader to combine Gronwall’s lemma with (4.55) to show that (4.53) follows from (4.57).

4.5.3 Proof of Theorem 4.5.1

Let K0 be the image of [0, T] under x0. By Corollary 3.3.3, there is a larger compact subset K ⊂ U and a δ > 0 such that

0 ≤ t ≤ T =⇒ B(x0(t), δ) ⊂ K. (4.58)

We bound the location of the solution x(t, ε) of (4.48) with Theorem 4.4.1 using the constant L of Corollary 4.4.2,

$$L = \max_{x \in K} \|DF(x)\|. \tag{4.59}$$

Note that |x(0, ε) − x0(0)| = ε|b1|. Thus, if |ε| < ε0, where ε0 = e^{−LT} δ/|b1|, then for all t ∈ [0, T], we have x(t, ε) ∈ B(x0(t), δ) ⊂ K.

Starting from the integral relations

$$\begin{aligned}
x(t, \varepsilon) &= b_0 + \varepsilon b_1 + \int_0^t F(x(s, \varepsilon))\, ds \\
x_0(t) &= b_0 + \int_0^t F(x_0(s))\, ds \\
x_1(t) &= b_1 + \int_0^t A(s)\, x_1(s)\, ds,
\end{aligned}$$

we form an appropriate linear combination in which the constant terms cancel to deduce that

$$g(t, \varepsilon) \le \int_0^t |F(x(s, \varepsilon)) - F(x_0(s)) - \varepsilon A(s) x_1(s)|\, ds.$$

Let us add and subtract A(s)(x(s, ε) − x0(s)) in the integral on the RHS and use the triangle inequality to obtain

g(t, ε) ≤ I1(t, ε) + I2(t, ε) (4.60)

where

$$\begin{aligned}
I_1(t, \varepsilon) &= \int_0^t |F(x(s, \varepsilon)) - F(x_0(s)) - A(s)(x(s, \varepsilon) - x_0(s))|\, ds \\
I_2(t, \varepsilon) &= \int_0^t |A(s)[x(s, \varepsilon) - x_0(s) - \varepsilon x_1(s)]|\, ds.
\end{aligned} \tag{4.61}$$

The following two claims verify (4.57), which will complete the proof.

Claim 1: I 1(t, ε) = o(ε), uniformly for t ∈ [0, T ).

Claim 2: If L is given by (4.59), then

$$I_2(t, \varepsilon) \le L \int_0^t g(s, \varepsilon)\, ds.$$

Proof of Claim 2. The integrand in the definition (4.61) of I2 may be estimated by ‖A(s)‖ g(s, ε). Since x0(s) ∈ K we have ‖A(s)‖ = ‖DF(x0(s))‖ ≤ L.

Proof of Claim 1. For any z1, z2 ∈ K such that the entire line segment from z1 to z2 belongs to U, it was shown in Lemma 3.2.3 that

$$F(z_1) - F(z_2) = \left[ \int_0^1 DF(z_2 + \alpha(z_1 - z_2))\, d\alpha \right] \cdot (z_1 - z_2).$$

Subtracting DF(z2) · (z1 − z2) from both sides of the equation, we deduce

$$F(z_1) - F(z_2) - DF(z_2) \cdot (z_1 - z_2) = \left[ \int_0^1 [DF(z_2 + \alpha(z_1 - z_2)) - DF(z_2)]\, d\alpha \right] \cdot (z_1 - z_2). \tag{4.62}$$

Now DF(z) is continuous on U, and its restriction to the compact set K is uniformly continuous. Therefore, for every η > 0, there is a δ > 0 such that

$$|z_1 - z_2| < \delta \implies \|DF(z_1) - DF(z_2)\| < \eta.$$

Substituting into (4.62) and observing that the distance between z2 + α(z1 − z2) and z2 is α|z1 − z2| ≤ |z1 − z2| we conclude that

$$|F(z_1) - F(z_2) - DF(z_2) \cdot (z_1 - z_2)| \le \eta |z_1 - z_2| = o(|z_1 - z_2|), \tag{4.63}$$

or more compactly,

F(z1) − F(z2) − DF(z2) · (z1 − z2) = o(|z1 − z2|).

Letting z1 = x(s, ε), z2 = x0(s), we deduce that the integrand of I1 in (4.61) satisfies

F(x(s, ε)) − F(x0(s)) − A(s) · (x(s, ε) − x0(s)) = o(|x(s, ε) − x0(s)|). (4.64)


Now we invoke Theorem 4.4.1:

|x(s, ε) − x0(s)| = O(|x(0, ε) − x0(0)|) = O(|b1|ε) = O(ε).

Substituting this estimate and (4.64) into (4.55) where φ(ε) = |x(s, ε) − x0(s)|, we see that

F(x(s, ε)) − F(x0(s)) − A(s)(x(s, ε) − x0(s)) = o(ε).

Of course I1 is bounded by T times the maximum of the integrand in (4.61), so we are done.

4.5.4 Further discussion

To start, we record the pure-mathematician’s version of Theorem 4.5.1.

Theorem 4.5.2. If F : U → Rd is C¹, then the flow ϕ : Ω → U is C¹ with respect to all its arguments.

Proof. We focus on the derivatives of ϕ with respect to b since the t derivative is easily handled. First we must show that the partial derivatives of ϕ exist; i.e., show the existence of the limits

$$\frac{\partial \phi}{\partial b_j}(t, b) = \lim_{h \to 0} \frac{\phi(t, b + h e_j) - \phi(t, b)}{h},$$

where ej is a unit vector in the jth-coordinate direction. This follows from Theorem 4.5.1, and in fact the limit equals the solution v(t) of a linear initial value problem

$$\frac{dv}{dt} = DF(\phi(t, b))\, v, \qquad v(0) = e_j. \tag{4.65}$$

By Theorem 4.4.4, the solution of (4.65) depends continuously on b and t. (Note that, as regards continuity with respect to t, the function v depends on t both indirectly through the coefficient matrix in (4.65) and directly as the solution of an ODE.) Therefore ϕ is C¹.

Equation (4.65) is a beautiful characterization of ∂ϕ/∂bj. It is used in theoretical analysis of ODE, for example in the study of stability of periodic solutions in Section 6.4 through the Poincaré map. Unfortunately, cases where (4.65) can be solved explicitly are rather rare, not least because ϕ cannot be found explicitly.

Here is one example where explicit solution is possible. Consider the IVP for the Lotka-Volterra equations,

(a) x′ = x − xy, x(0) = b1
(b) y′ = ρ(xy − y), y(0) = b2.
(4.66)


If b2 = 0, then (4.66) has the explicit solution ϕ(t, b1, 0) = (b1 e^t, 0). How is the solution changed if initially there is a small population of predators? To estimate this change we may calculate ∂ϕ/∂b2, which is the goal of Exercise 10. This interpretation of the derivative makes this primarily instructional exercise slightly less academic. Here we calculate ∂ϕ/∂b1 to prepare for the exercise.

The differential of (4.66) is

$$DF = \begin{pmatrix} 1 - y & -x \\ \rho y & \rho(x - 1) \end{pmatrix}.$$

Substituting ϕ(t, b1, 0) into this matrix we find that v = ∂ϕ/∂bj satisfies the linear system v′ = A(t)v where

$$A(t) = \begin{pmatrix} 1 & -b_1 e^t \\ 0 & \rho(b_1 e^t - 1) \end{pmatrix}.$$

For v = ∂ϕ/∂b1, the appropriate initial condition is v(0) = (1, 0), and solving the system we calculate that ∂ϕ/∂b1(t, b1, 0) = (e^t, 0). Of course this answer may be checked by direct differentiation of ϕ(t, b1, 0) = (b1 e^t, 0).
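The same answer can be recovered numerically by integrating the linear system v′ = A(t)v. The sketch below is only an illustration: the classical fourth-order Runge-Kutta stepper, the helper names, and the parameter values b1 = 0.5 and ρ = 2 are all our assumptions, not part of the text.

```python
import math

def A(t, b1, rho):
    """Coefficient matrix DF evaluated along phi(t, b1, 0) = (b1*e^t, 0)."""
    x = b1 * math.exp(t)
    return ((1.0, -x), (0.0, rho * (x - 1.0)))

def rhs(t, v, b1, rho):
    """Right-hand side A(t) v of the linear system."""
    (a11, a12), (a21, a22) = A(t, b1, rho)
    return (a11 * v[0] + a12 * v[1], a21 * v[0] + a22 * v[1])

def rk4(v0, t_end, steps, b1, rho):
    """Classical RK4 on [0, t_end] with a fixed step (hypothetical helper)."""
    h = t_end / steps
    t, v = 0.0, v0
    for _ in range(steps):
        k1 = rhs(t, v, b1, rho)
        k2 = rhs(t + h / 2, tuple(vi + h / 2 * ki for vi, ki in zip(v, k1)), b1, rho)
        k3 = rhs(t + h / 2, tuple(vi + h / 2 * ki for vi, ki in zip(v, k2)), b1, rho)
        k4 = rhs(t + h, tuple(vi + h * ki for vi, ki in zip(v, k3)), b1, rho)
        v = tuple(vi + h / 6 * (p + 2 * q + 2 * r + s)
                  for vi, p, q, r, s in zip(v, k1, k2, k3, k4))
        t += h
    return v

# Initial condition v(0) = e_1 = (1, 0), as appropriate for d(phi)/d(b1).
v = rk4((1.0, 0.0), t_end=1.0, steps=200, b1=0.5, rho=2.0)
print(v, math.e)
```

The first component should agree closely with e, and the second component stays at zero, matching ∂ϕ/∂b1(1, b1, 0) = (e, 0).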

We refer the reader to Exercise 10 for the more interesting calculation of ∂ϕ/∂b2. (See also Exercise 14 for another example where a derivative of a solution with respect to initial conditions may be computed.)

4.5.5 Generalizations

First let us extend Theorem 4.5.2 to nonautonomous IVP. Let ϕ(t, t0, b) denote the solution of

x′ = F(x, t), x(t0) = b. (4.67)

Theorem 4.5.3. If F : U × (T1, T2) → Rd is C¹, then the flow ϕ(t, t0, b) is C¹ with respect to all its arguments (in the domain where the IVP has a solution). Moreover, as a function of time, ∂ϕ/∂bj(t, t0, b) satisfies a linear system

$$\frac{dv}{dt} = DF(\phi(t, t_0, b), t)\, v, \qquad v(t_0) = e_j. \tag{4.68}$$

Remark: In (4.68), DF denotes the d × d matrix of partial derivatives ∂F/∂xj, not including the t derivative.

The proof of the theorem is posed for the dedicated reader in Exercise ??.

As in the discussion of continuity in Section 4, let us move on from differentiability with respect to initial conditions to differentiability “with respect to the equation”.

The benign interpretation of this concept involves a parametrized family of ODEs, say

x′ = F(x, α1, α2, . . . , αk). (4.69)


Hint: Invoke Corollary 3.3.3 to obtain a larger compact set K′ such that for every x ∈ K, B(x, δ) ⊂ K′. Apply Proposition 4.1.2 to K′. Find a lower bound for the time required for the solution to move from U ∼ K′ to K.

(c) If t∗ is defined by (4.9), show that x(t∗) ∈ ∂ K.

(d) Determine sufficient conditions on A, B that make the region in Figure 4.3 a trapping region for (4.24).

Remark: See Exercise 7 regarding the fact that ∂R is only piecewise smooth.

(e) Determine sufficient conditions on A, B that make the region in Figure 4.4b a trapping region for (4.28).

Hint: Choose A first; the condition on B depends on A.

(f) Verify the claim (4.31).

(g) Regarding (13), improve on Theorem 4.2.1 by proving that y remains bounded and |x| ≤ A + B|t| for some constants A, B.

Hint: Let K be the (noncompact) region (4.34). Combine and modify the proofs of Theorems 4.2.1 and 4.2.2.

(h) Prove Corollary 4.4.2.

(i) Show that the solution map ϕ defined in (4.43) is Lipschitz continuous with respect to all arguments.

(j) Prove Proposition 4.4.3.

(k) Prove Theorem 4.4.4.

(l) Verify the claims made in (4.55).

(m) Show that (4.53) follows from (4.57).

2. Regarding Theorem 4.1.2, give an example to show that the conclusion need not follow if β∗ = ∞.

3. Construct an IVP x′ = F(x), x(0) = b in one dimension for which F is bounded but whose solution has a maximal interval of existence (α∗, β∗) with both endpoints finite.


Discussion: We have seen examples where the solution of an IVP for x′ = F(x) does not exist for all time because the solution blows up in finite time. As Theorem 4.2.1 shows, this behavior may be traced to super-linear growth of F as x → ∞ and apparently cannot occur if F is bounded. However, if the domain of F is a proper subset U of Rd, then the solution of an IVP might cease to exist because the solution tends to ∂U, which is what underlies the present exercise.

4. Generalize Theorem 4.2.1 to nonautonomous equations: i.e., prove that if there exist nonnegative constants B, K such that

|F(x, t)| ≤ K|x| + B, x ∈ Rd and T1 < t < T2,

then solutions satisfy

$$|x(t)| \le |x(0)|\, e^{K|t|} + \frac{B}{K}\left(e^{K|t|} - 1\right), \qquad T_1 < t < T_2,$$

where either T 1 or T 2 (or both) may be infinite.

Discussion: Recall that Theorem 3.4.3 gave a complete existence theory for linear systems such as (3.33), but its proof required delving more deeply into the details of the integral-equation formulation of the IVP. This exercise is part of an alternative approach to linear equations that circumvents such analysis: i.e., start from the easily proved local existence result Exercise 3(a) in the previous chapter and use the present exercise to obtain a global solution.

Regarding perspective, it is instructive to recall Exercise 6 in which you solved the special case of Mathieu’s equation x′′ + (1/4 + β cos t)x = 0 (written as a linear first-order system with variable coefficients). Naively, based on a comparison with x′′ + x/4, one would not expect solutions of Mathieu’s equation to grow at all as t increased, but computation showed otherwise. The present result does guarantee that the growth is no worse than exponential.

5. Give an example of a locally Lipschitz function (or even C¹) that satisfies the linear-growth estimate (4.5) but is not (globally) Lipschitz.

6. Under the hypotheses of Theorem 4.2.2, show that if b ∈ ∂K, then the solution x to (4.1) exists for all positive time and moreover belongs to K.

This exercise is obsolete, given the rewriting of the proof of Theorem 4.2.2.


8. Construct trapping regions and thereby prove global existence for the following equations, possibly with restrictions on the initial data as indicated. Use nullclines if helpful.

(a) The Lotka-Volterra equations including both the Allee effect and logistic growth of the prey:

$$x' = x\, \frac{x - \varepsilon}{x + \varepsilon} \left(1 - \frac{x}{K}\right) - xy, \qquad y' = \rho(xy - y),$$

where x(0), y(0) ≥ 0.

(b) The FitzHugh-Nagumo equations,

$$x' = -y - \beta(x^3/3 - x), \qquad y' = x, \tag{4.70}$$

which admit a rectangular trapping region. Not true!

Discussion: Show that the function x obtained from a solution of (4.70) satisfies the second-order scalar van der Pol equation

x′′ + β(x² − 1)x′ + x = 0.

This alternative reduction of the van der Pol equation to a first-order system provides a simpler existence proof than the one given in Section 5.3.3.

This simpler existence proof is based on a very clever idea. It is easy to get cowed into thinking, “I never would have thought of that in a million years.” It is important to remember that a lot of clever people have been working in this field for a long time, and we are the beneficiaries of their efforts. Many of us could not have come up with this idea, either; and at the same time you might be surprised by how resourceful you become after studying a problem intensely over an extended period.

Incidentally, (4.70) is one of several similar equations that go by this name.

(c) Duffing’s equation without forcing, written as a first-order system:

x′ = y
y′ = −βy + x − x³.
(4.71)

Hint: Use the energy (kinetic plus potential) of this system,

E(x, y) = y²/2 − x²/2 + x⁴/4 (4.72)

to construct the trapping region.

(d) A bead sliding on a rotating loop,

x′ = y
ma y′ = −βy − mg sin x + m(a sin x)ω² cos x,
(4.73)

as shown in Figure 4.9. Here x is the angle the position of the bead makes with the vertical, m is the mass of the bead, a is the radius of the hoop, and ω is the angular speed of the hoop.

Discussion: These equations come from applying Newton’s second law to the motion, which is purely tangential. The tangential acceleration is ax′′. There are three forces acting on the bead: (i) gravity, whose projection onto the tangential direction is mg sin x; (ii) friction, which we model as −βx′; and (iii) centrifugal force from rotation about the vertical axis in a circle of radius equal to a sin x, whose tangential projection is m(a sin x)ω² cos x. The first two forces are exactly the same as for the pendulum without rotation; the third is new. Alternatively, one may regard (4.73) as another variant of the pendulum equation in which the pivot point rotates.

In this problem the energy to use in constructing a trapping region is

E = my²/2 + m(aω sin x)²/2 − mga cos x.

The first term here is the kinetic energy of motion around the hoop; the third is the gravitational potential energy; the middle term is another contribution to kinetic energy, coming from the rotation around the vertical axis.

It is quite easy to obtain global existence for (4.73) by constructing a trapping region as a subset of the cylinder S¹ × R. Challenge: Can you prove that, even if (4.73) is considered as an ODE on R², trajectories remain bounded? I.e., show that friction brings the bead to rest after a finite number of revolutions around the loop, no matter what the initial conditions.

(e) The Lorenz equations:

x′ = σ(y − x)
y′ = ρx − y − xz
z′ = −βz + xy.

Figure 4.9: Schematic of the bead on a rotating wire hoop in Exercise 8(d). (Labels in the figure: a, x, ω, mass m, mg, a sin(x).)

Hint: Show that a region of the form

K = {x² + y² + (z − ρ − σ)² ≤ A²}

is a trapping region if A is sufficiently large.

9. Consider the equations for growth of two symbiotic species:

$$x' = r_1 x \left(1 - \frac{x}{K_1} + \frac{y}{K_3}\right), \qquad y' = r_2 y \left(1 + \frac{x}{K_2} - \frac{y}{K_4}\right).$$

The constants Kj are all positive. For some parameter values, solutions of these equations with x(0), y(0) ≥ 0 exist for all time; for other values, solutions blow up in finite time. Examine the nullclines of this system and determine the condition on the parameters that separates the two cases. Prove that the solution exists for all time in the good case. Prove that the solution does blow up in finite time when the nullclines suggest this possibility.

Remark: Note that each species has logistic growth that is enhanced by the presence of the other.

10. For the Lotka-Volterra system (4.66), calculate ∂ ϕ/∂b2(t, b1, 0).

11. (a) Use separability to show that the solution of a scalar IVP

$$\frac{dx}{dt} = f(x), \qquad x(0) = b$$

satisfies

F(x) = t + F(b) (4.74)

where F(x) is an anti-derivative of 1/f(x).

(b) Differentiate (4.74) to show that

$$\frac{\partial x}{\partial b} = \frac{f(x)}{f(b)}.$$

(c) Show that your answer to Part (b) satisfies the appropriate IVP for (4.65); i.e.

v′ = f′(x(t)) v, v(0) = 1.

Discussion: This problem provides an independent check on (4.65). A comment about notation: it might be more precise to use the flow notation ϕ throughout the above problem, but when explicit calculations are involved we usually find it more intuitive to use x instead. Note that what we are calling x depends on both t and b, and we are inconsistent about how many, if any, of the arguments of x we choose to write. Again, we find this vagueness is helpful when finding explicit solutions.

12. (a) Solve the IVP for the logistic equation with constant harvesting

$$\frac{dx}{dt} = x(1 - x) - \mu, \qquad x(0) = b.$$

(b) Compute the derivative ∂x/∂µ as a function of t, µ, and b.

(c) Verify that Theorem 4.5.4 gives the same answer as Part (b).

(d) Allowing harvesting to be time-dependent,

$$\frac{dx}{dt} = x(1 - x) - \mu f(t), \qquad x(0) = b,$$

find ∂x/∂µ as a function of t for the special values µ = 0 and b = 1.

Discussion: Part (a) repeats Exercise 3(c) from Chapter 1. This exercise, which involves some messy calculations, has limited intrinsic interest, but it has some pedagogical value: it helps make Theorem 4.5.4 more concrete, it provides an independent check on the formula of the theorem, and Part (d) invites the reader to generalize the theorem to nonautonomous equations. As in the previous exercise, we avoid the use of the flow notation ϕ and are sloppy about indicating arguments of dependent variables.


13. The least painful way to prove Theorem 4.5.5 is to extend the order notation. For any exponent p, define the concepts f(ε) = O(ε^p) and f(ε) = o(ε^p) with the obvious modifications of the definitions in Section 4.5.2.

(a) Verify the generalization of (4.55):

(a) If f(ε) = O(ε^p) and g(ε) = o(ε^q), then f(ε)g(ε) = o(ε^{p+q}).
(b) If f(ε) = o(ε^p) and if φ(ε) = O(ε^q) where q > 0, then f(φ(ε)) = o(ε^{pq}).
(4.75)

(b) Show that a function of one variable f(t) is of class C^k iff

$$f(t + \Delta t) - \sum_{\ell=0}^{k} \frac{1}{\ell!} \frac{d^\ell f}{dt^\ell}(t)\, (\Delta t)^\ell = o((\Delta t)^k),$$

uniformly for t in compact sets. Extend this to an analogous result for functions of several variables.

(c) Prove Theorem 4.5.5.

Discussion: It may be that the principal obstacle to Part (c) of the Exercise is finding adequate notation for the higher derivatives of ϕ, especially with respect to the initial conditions b. Struggle with this part, or maybe with the whole exercise, for as long as you find it instructive to do so, but no longer.

4.6.2 Exercises referenced elsewhere in this book

14. Let r(t, b), θ(t, b) be the solution of the IVP

r′ = f (r, θ)(r − 1), r(0) = b

θ′ = 1, θ(0) = 0.

Calculate ∂r/∂b(2π, 1).

Discussion: Note that r(t, 1) ≡ 1 and that for all initial conditions b the second component satisfies θ(t, b) = t. The solution with initial conditions b = 1 traces out the unit circle (see Figure 4.10). The above derivative may be interpreted as an approximate answer to the following question: if a solution of this system starts out at a point on the x-axis near r = 1, what will the value of r be when the solution next crosses the (positive) x-axis? This apparently academic exercise will have some value in Chapter 6 illustrating the Poincaré map.


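of the iteration (advancing from yn, an approximation of x(nh), to yn+1) is a two-stage process:

$$\begin{aligned}
\text{(a)}\quad y_{n+1/2} &= y_n + \tfrac{h}{2} F(y_n) \\
\text{(b)}\quad y_{n+1} &= y_n + h F(y_{n+1/2}).
\end{aligned} \tag{4.77}$$

After $y_{n+1}$ is computed, $y_{n+1/2}$ is discarded.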

(b) Repeat Part (a) using the improved Euler method. Again plot the error of your numerical solution as a function of h.

Discussion: To motivate this method, as in Section 4.8.2, we focus on the first step of the iteration, calculation of y1 ≈ x(h). The improved Euler method is based on the integral-equation formulation,

$$x(h) = x(0) + \int_0^h F(x(s))\, ds.$$

Equation (4.77b) represents a one-term Riemann-sum approximation of the integral,

$$\int_0^h F(x(s))\, ds \approx h F(x(h/2)), \tag{4.78}$$

but with the integrand evaluated at the center of the interval. As illustrated in Figure 4.11 this midpoint rule is substantially more accurate than the left-end-point rule on which the (unimproved) Euler method is based. Equation (4.77a) makes an estimate for x(h/2) to be used in (4.78). Of course $y_{n+1/2} \ne x(h/2)$, and this is yet another source of error in the approximation. However, the key point is this: In (4.78), F(x(h/2)) is multiplied by the small factor h, and because of this, the errors generated by substitution of $y_{n+1/2}$ into (4.78) are no larger than those inherent in the midpoint-rule approximation.

It may be shown that, for small h, the improved Euler method satisfies the estimate

$$|x(nh) - y_n| \le C h^2 e^{Lnh}.$$

(For a proof, see Atkinson [1]. The proof does not comfortably fit the Gronwall-based proof of Theorem 4.8.1.) Notice that the error is ∼ h², a substantial improvement over the original Euler method if h is small. Based upon the exponents of h in these error estimates, we say that the [original] Euler’s method is first-order, and the improved Euler method is second-order. Variants of the more sophisticated numerical methods ode45 and Runge-Kutta-Fehlberg’s rkf45, both accessible in the Matlab® software package, are fourth-order (errors on the order of h⁴).

Figure 4.11: The areas of the shaded rectangles are the left endpoint (Euler) and midpoint approximations of $\int_0^h F(x(s))\, ds$.
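The difference between first-order and second-order accuracy is easy to observe in a few lines of code. The following minimal sketch applies both methods to the test problem x′ = x, x(0) = 1 on [0, 1] (our choice of test problem; the function names are hypothetical) and reports how the error at t = 1 changes when h is halved.

```python
import math

def euler(F, b, h, n_steps):
    """Euler's method: y_{n+1} = y_n + h F(y_n)."""
    y = b
    for _ in range(n_steps):
        y = y + h * F(y)
    return y

def improved_euler(F, b, h, n_steps):
    """Improved Euler method, following the two stages of (4.77)."""
    y = b
    for _ in range(n_steps):
        y_half = y + 0.5 * h * F(y)   # stage (4.77a): estimate at the midpoint
        y = y + h * F(y_half)         # stage (4.77b): full step using that estimate
    return y

def errors_at_one(N):
    """Errors |e - y_N| of both methods for x' = x, x(0) = 1, with h = 1/N."""
    F = lambda x: x
    h = 1.0 / N
    return (abs(math.e - euler(F, 1.0, h, N)),
            abs(math.e - improved_euler(F, 1.0, h, N)))

e_coarse, ie_coarse = errors_at_one(40)
e_fine, ie_fine = errors_at_one(80)
print("Euler error ratio:         ", e_coarse / e_fine)
print("improved Euler error ratio:", ie_coarse / ie_fine)
```

Halving h should cut the Euler error roughly in half and the improved Euler error roughly by a factor of four, consistent with the exponents of h in the two error estimates.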

16. Another weakness of Euler’s method is that it performs poorly for stiff ODEs. There is no rigorous definition of stiffness, but typically this behavior occurs when different components of a system of ODE evolve at radically different rates. Here is an academic example of such a system:

(a) x′ = y
(b) y′ = −x − (2x² − z)y
(c) z′ = −M(z − 1 − x²)
(4.79)

where M is a large number. We invite the reader to attack (4.79) numerically before we attempt to explain further.

(a) Solve (4.79) with Euler’s method for M = 10³ and M = 10⁵. Warning: You will find that h needs to be rather small for Euler’s method to perform appropriately.

Discussion: Some insight into stiffness may be gleaned from the scalar IVP x′ = −Mx, x(0) = 1, where M > 0 is large. The exact solution to this IVP is a rapidly-decaying exponential function x(t) = e^{−Mt}. Using Euler’s method to generate the approximations yn ≈ x(nh), we compute

yn+1 = yn + h(−Myn) = (1 − Mh)yn,

from which it follows that yn = (1 − Mh)^n for each n ≥ 0. Unless h < 2/M, the iterates yn do not tend to zero! Indeed, if h > 2/M, then the sequence {yn} grows exponentially with consecutive elements alternating in sign, behavior that is totally unrelated to the true solution. Euler’s method is guaranteed to converge if h is sufficiently small, but in this case, because of the rapid evolution in the ODE x′ = −Mx, h0 must be exceedingly small.
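The threshold h = 2/M is easy to see in a short experiment. The sketch below runs Euler’s method on x′ = −Mx, x(0) = 1 with one step size below the threshold and one above it; the values M = 1000, h = 0.5/M, and h = 2.5/M are arbitrary choices for illustration.

```python
def euler_iterates(M, h, n_steps):
    """Euler iterates for x' = -M x, x(0) = 1, i.e. y_{n+1} = (1 - M h) y_n."""
    ys = [1.0]
    for _ in range(n_steps):
        ys.append(ys[-1] + h * (-M * ys[-1]))
    return ys

M = 1000.0
stable = euler_iterates(M, h=0.5 / M, n_steps=50)     # h < 2/M: factor 1 - Mh = 0.5
unstable = euler_iterates(M, h=2.5 / M, n_steps=50)   # h > 2/M: factor 1 - Mh = -1.5

print("h = 0.5/M:", stable[-1])       # decays toward zero, like the true solution
print("h = 2.5/M:", unstable[:4], "...", unstable[-1])
```

With h = 2.5/M the iterates alternate in sign and grow geometrically, exactly as the formula yn = (1 − Mh)^n predicts.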

Stiffness occurs when, as in (4.79), an equation with rapid evolution accompanies others with modest evolution rates. Thus, the restriction on h required by (4.79c) means that prohibitively many steps are required in order to solve (4.79a,b) over an interval of any appreciable length.

Since M is large, we might consider the approximation M → ∞, in which case (4.79c) is transformed to an algebraic equation z = 1 + x², and substitution of this into (4.79b) reduces the original three-dimensional system to the van der Pol system, which is two-dimensional and has no stiffness problem.

(b) Compare your solution in Part (a) with solutions obtained from the approximation of M → ∞.

Give some idea of software that is effective on stiff problems? Maybe give an exercise on an implicit method?


17. Example of an ODE such that one solution blows up in finite time but perturbations exist for all time:

x′ = x² − y²
y′ = 2xy.

Orbits are the x-axis and circles x² + (y − C)² = C². Can prove in 2 ways. (1) Write equation as

2C = x²/y + y

and differentiate. (2) Solve the ODE

$$\frac{dy}{dx} = \frac{2xy}{x^2 - y^2}$$

to obtain the above equation, with C as a constant of integration. Include hint: Multiply equation by x² − y² and divide by y² to make exact. Give reference so interested student can follow up.


18. Prove global existence for

x′ = y
y′ = −βy − [1 + αω² cos ωt] sin x,

the model introduced in Exercise 11 for a pendulum vibrated at its base. Move this to “routine exercises”. Also, connect with computation.

Q: Which first, Notes or Euler?


While Theorem 4.4.1 gives control of the solution of an IVP for any finite time, it does not give control for infinite time. Indeed, as we will see in Chapter 5,

$$\lim_{t \to \infty} \phi(t, b)$$

may be discontinuous in b.

4.8 Appendix: Euler’s method

4.8.1 Introduction

In practice, it is rarely possible to produce explicit solutions of ODEs, so one often resorts to the use of numerical methods⁵ in order to approximate the solution of an IVP. Indeed, on several occasions above we have already used software for this purpose. In this section we introduce the simplest numerical method, known as Euler’s method. We do not propose to actually use this method to learn about solutions of ODEs: the software employs methods that are far more accurate than this, and their automated control of step size makes them a joy to use. Rather, we study Euler’s method for cultural reasons: i.e., it provides useful insight into numerical methods in general, while its simplicity allows the conceptual issues to come through more easily.

Euler’s method is an iterative process for approximating the solution of (4.1), say on a set of evenly-spaced⁶ t-values. Given h > 0, for n = 0, 1, . . . , we calculate the approximation yn for x(nh) recursively according to the rule

y0 = b; yn+1 = yn + hF(yn), n = 0, 1, . . . . (4.80)

Although yn depends on h, we follow the usual convention of not indicating this dependence.

⁵ Perturbation methods, some of which are discussed in Sections 6.3 and 6.4 (among other places), offer another valuable way of approximating solutions.

⁶ Actually, we assume equal spacing only for simplicity; this is not necessary.

As an illustration, consider the scalar IVP x′ = x, x(0) = 1, which has exact solution x(t) = e^t. Let us approximate the solution on the interval t ∈ [0, 1] with Euler’s method, say using a step size of h = 1/N where N is a positive integer. Starting from y0 = 1, we use (4.80) to generate the subsequent iterates recursively:

$$y_{n+1} = y_n + h y_n = \left(1 + \frac{1}{N}\right) y_n, \qquad (n = 0, 1, \ldots, N - 1)$$

so yn = (1 + 1/N)^n. In particular,

yN = (1 + 1/N)^N → e = x(1)

as N → ∞ (and thus h → 0). In other words, the approximation works as desired for the point t = 1. More generally, yn provides an approximation for e^t for all t ∈ [0, 1], but the formulation of this behavior is made slightly awkward by the following two issues: (i) the number of iterations needed to reach time t increases as 1/h as h → 0 and (ii) any specific time t need not belong to the set of grid points {nh : n = 0, 1, . . .} for which the approximations are computed. Specifically, the convergence result for this example guarantees that

$$\lim_{h \to 0}\ \max_{0 \le n \le h^{-1}} |e^{nh} - y_n| = 0.$$

A convergence result for the general case is given in Theorem 4.8.1 below.
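The convergence statement for this example can be checked directly. The sketch below computes the maximum of |e^{nh} − yn| over the grid 0 ≤ n ≤ N for h = 1/N; the function name and the particular values of N are our own choices.

```python
import math

def euler_max_error(N):
    """Max over grid points n = 0..N of |e^{nh} - y_n| for x' = x, x(0) = 1, h = 1/N."""
    h = 1.0 / N
    y, worst = 1.0, 0.0
    for n in range(N + 1):
        worst = max(worst, abs(math.exp(n * h) - y))
        y = y + h * y          # one step of (4.80) with F(y) = y
    return worst

errs = [euler_max_error(N) for N in (10, 100, 1000)]
for N, err in zip((10, 100, 1000), errs):
    print(N, err)
```

Each tenfold reduction in h reduces the maximum error by roughly a factor of ten, so the error tends to zero with h at a first-order rate.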

4.8.2 Theoretical basis for the approximation

We offer three motivations⁷ for Euler’s method. All three motivations begin with the limited goal of understanding the first step in (4.80), which we may rephrase as

x(h) ≈ x(0) + hF (x(0)). (4.81)

(For simplicity, we assume temporarily that (4.1) is a scalar equation.)

Method 1: (Tangent line) Interpreting the derivative geometrically (see Figure 4.12), we see from the ODE that the slope of the solution curve through (0, x(0)) equals F(x(0)). Thus, we may estimate x(h) by following the tangent line, resulting in the approximation (4.81).

⁷ As regards Euler’s method, all three motivations produce the same approximation, but different advanced numerical methods may result from starting with one or another of these three points of view.

Figure 4.12: Schematic illustration of one iteration of Euler’s method. Note the discrepancy between the exact solution x(h) and its Euler’s method approximation y1. (Labels in the figure: axes t and x, the exact solution curve, the starting value x(0) = y0, the slope F(y0), the step h, and the values x(h) and y1.)

Method 2: (Finite differences) Using the difference quotient approximation

$$\frac{x(h) - x(0)}{h} \approx x'(0) = F(x(0)),$$

we again obtain (4.81).

Method 3: (Integral equation) Reformulating the IVP as an integral equation,

$$x(h) = x(0) + \int_0^h F(x(s))\, ds,$$

we derive (4.81) from a one-term Riemann-sum approximation for the integral.

The continuation of Euler’s method may seem like an act of desperation. It is

extremely unlikely that the point (h, y1) will lie on the exact solution curve (seeFigure 4.12). Nevertheless, it is the best information we have about the solution.Therefore, we will use that point as the starting point for another iteration of Euler’smethod: i.e., we let y2 equal the Euler approximation to the solution of x′ = F (x)through the point (h, y1). All subsequent steps are derived similarly. One may wellwonder about an approximation in which each step is based on increasingly faultyinformation, especially since as h → 0 more and more steps are required to advancea finite time. In fact, as we show in the next section, the accumulated error in thenumerical solution tends to zero with h.




4.8.3 Convergence of the numerical solution

If $F$ is defined everywhere, then the definition (4.80) of $y_n$ remains meaningful for arbitrarily large $n$, even if the solution $x$ that is being approximated blows up in finite time. However, if $F$ is defined only on a subset $U \subset \mathbb{R}^d$, then some iterate $y_n$ may lie outside $U$, so the iteration would halt. This possibility is addressed in Conclusion (i) of the following theorem, which has much in common with Theorem 4.4.1.

Theorem 4.8.1. Suppose $F : U \to \mathbb{R}^d$ is locally Lipschitz, and let $x(t)$, $0 \le t < \beta$, be a solution in forward time of $x' = F(x)$ with initial condition $x(0) = b$. Then:

(i) For any positive $T$ that satisfies $T < \beta$, there exists a positive constant $h_0$ such that, if $h < h_0$, the iterates $y_n$ are defined for all $n$ such that $nh \le T$.

(ii) For any $\varepsilon > 0$, there are constants $C, L$ such that if $h < h_0$, then

\[ |x(nh) - y_n| \le Che^{Lnh}, \qquad \text{for } 0 \le nh \le T. \]

Remark: Note that Conclusion (ii) implies the uniform estimate $|x(nh) - y_n| \le Che^{LT}$.

Proof. Choose a compact subset $K \subset U$ and a constant $\delta > 0$ such that

\[ (\forall t \in [0, T]) \quad B(x(t), \delta) \subset K \subset U. \]

Regarding Conclusion (ii), we let $C = \max_K |F|$ and we let $L$ be a Lipschitz constant for $F|_K$; regarding Conclusion (i) we let $h_0 = e^{-LT}\delta/C$. We compute $y_n$ for as many iterations as $nh \le T$ and $y_n \in K$, say $n \le N$. Note that $y_N \in K$, so that it is possible to calculate at least one more iterate, $y_{N+1}$. We shall prove that if $(N+1)h \le T$, then $y_{N+1} \in K$. Thus, the iteration stops only because $nh$ exceeds $T$.

Now the solution $x$ satisfies the integral equation

\[ x(t) = b + \int_0^t F(x(s))\,ds. \]

In order to derive an analogous equation for the approximate solution, we construct a piecewise constant function that is defined for (continuous) $t \in [0, (N+2)h)$:

\[ y^{(h)}(t) = y_n \quad \text{for } nh \le t < (n+1)h. \]

We claim that at the grid points

\[ y^{(h)}(nh) = b + \int_0^{nh} F(y^{(h)}(s))\,ds, \qquad n = 0, 1, 2, \ldots, N+1, \]




which may be verified with induction since

\[ \int_{nh}^{(n+1)h} F(y^{(h)}(s))\,ds = hF(y_n), \]

the integrand being constant. More generally, between the grid points, if $nh \le t < (n+1)h$,

\[ y^{(h)}(t) = b + \int_0^t F(y^{(h)}(s))\,ds - \int_{nh}^t F(y^{(h)}(s))\,ds. \]

Suppose $(N+1)h \le T$. For $0 \le t \le (N+1)h$, let $g(t) = |x(t) - y^{(h)}(t)|$. Subtracting integrals we deduce that

\[ g(t) \le \int_0^t |F(x(s)) - F(y^{(h)}(s))|\,ds + \int_{nh}^t |F(y^{(h)}(s))|\,ds, \qquad 0 \le t \le (N+1)h, \]

where $n$ is the largest integer such that $nh \le t$. Note that in the integrands $x(s), y^{(h)}(s) \in K$. By the definition of $C$ the second term here satisfies

\[ \int_{nh}^t |F(y^{(h)}(s))|\,ds \le Ch, \]

and by Lipschitz continuity the first satisfies

\[ \int_0^t |F(x(s)) - F(y^{(h)}(s))|\,ds \le L\int_0^t g(s)\,ds. \]

Thus by Gronwall’s inequality (more properly, by the extension to piecewise continuous functions in Exercise 5 in Chapter 3)

\[ g(t) \le Che^{Lt}. \tag{4.82} \]

In particular, taking $t = (N+1)h$, we see that

\[ |y_{N+1} - x((N+1)h)| \le Che^{L(N+1)h} \le Che^{LT} < \delta, \]

so $y_{N+1} \in K$, which completes the proof.

Although the iterates produced by Euler’s method converge to the true solution of the IVP as $h \to 0$, the error estimate provided by Theorem 4.8.1 actually alludes to one of the method’s weaknesses: i.e., the error is on the order of $h$ to the first power. By contrast, the error in the Matlab® routine ode45 is of the order of $h^4$. Thus if the step size is halved, the error is decreased by a factor of 16! (Of course typically $h$ is chosen by the software, not the user, so this behavior is not readily apparent to the user.)
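The first-order behavior can be observed in a small experiment. The sketch below is an illustration under stated assumptions: the test problem $x' = -x$, $x(0) = 1$ on $[0, 1]$ and the helper names `euler_final` and `euler_error` are hypothetical choices, not taken from the text. It measures the error at $t = 1$ for three step sizes, each half the previous one.

```python
import math


def euler_final(F, x0, h, n_steps):
    """Apply n_steps Euler steps of size h to x' = F(x) and return the last iterate."""
    y = x0
    for _ in range(n_steps):
        y += h * F(y)
    return y


def euler_error(n_steps, T=1.0):
    """Error at time T for the hypothetical test problem x' = -x, x(0) = 1."""
    h = T / n_steps
    return abs(euler_final(lambda x: -x, 1.0, h, n_steps) - math.exp(-T))


errors = [euler_error(n) for n in (100, 200, 400)]
# Ratio of successive errors when h is halved; a first-order method gives roughly 2.
ratios = [errors[i] / errors[i + 1] for i in range(2)]
```

A ratio near 2, rather than near 16, is what an error proportional to $h$ predicts.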




Much effort has gone into devising highly accurate numerical methods for solving ODEs. While this subject lies outside the scope of the present text, Exercises 15 and 16 hint at ways Euler’s method can be improved. For a more thorough presentation, see Atkinson [1].




Chapter 5

Trajectories near equilibria

From here on in this book, unless otherwise stated, we shall assume in the ODE $x' = F(x)$ that the function $F$ is $C^1$.

5.1 Stability of equilibria

A point $b_* \in \mathbb{R}^d$ is called an equilibrium point for an ODE $x' = F(x)$ if at this point $F(b_*) = 0$. If $b_*$ is an equilibrium point for an ODE, then $x(t) \equiv b_*$ is a solution of the equation. For a linear homogeneous equation $x' = Ax$, the origin $b_* = 0$ is always an equilibrium, and it is the only equilibrium if $A$ is invertible. Regarding nonlinear equations, consider the 2 × 2 system derived from Duffing’s equation without forcing

\[ \begin{aligned} x' &= y \\ y' &= x - x^3 - \beta y, \end{aligned} \tag{5.1} \]

which has equilibrium points $(0, 0)$ and $(\pm 1, 0)$; these correspond to the local maximum and the two global minima, respectively, of the potential function $V(x) = -x^2/2 + x^4/4$. Similarly, we saw in Section 4.3.1 that the activator-inhibitor system

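\[ \begin{aligned} \text{(a)}\quad x' &= \alpha\,\frac{1}{1+r}\,\frac{x^2}{1+x^2} - x \\ \text{(b)}\quad r' &= \gamma\left[\beta\,\frac{x^2}{1+x^2} - r\right], \end{aligned} \tag{5.2} \]

has multiple equilibria if $\alpha^2 > 4(\beta + 1)$. Indeed, multiple equilibria is the more typical behavior for nonlinear equations.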




5.1.1 The main theorem

Recall from Theorem 2.4.1 that for a linear homogeneous equation $x' = Ax$, if the eigenvalues of the coefficient matrix satisfy

\[ \Re\lambda_j(A) < 0, \qquad j = 1, 2, \ldots, d \]

then every solution $x(t)$ of the equation decays to zero as $t \to \infty$. The following theorem, a major and quite beautiful result, asserts that one may deduce similar behavior for solutions of a nonlinear equation $x' = F(x)$ near an equilibrium point $b_*$, provided the eigenvalues of $DF(b_*)$, the differential of $F$ at the equilibrium, satisfy this condition. Here and below we shall abbreviate $DF(b_*)$ to $DF_*$.

Theorem 5.1.1. Suppose $b_*$ is an equilibrium point for $x' = F(x)$, where $F$ is $C^1$, and assume that

\[ \Re\lambda_j(DF_*) < 0, \qquad j = 1, 2, \ldots, d. \tag{5.3} \]

Then there is a neighborhood $V$ of $b_*$ in $\mathbb{R}^d$ such that for any initial data $b \in V$, the IVP

\[ x' = F(x), \qquad x(0) = b \tag{5.4} \]

has a solution for all $t \ge 0$ and moreover $\lim_{t\to\infty} x(t) = b_*$.

Remark. The theorem includes the linear case because if $F(x) = Ax$, then at any point $x$ the differential $DF(x) = A$. In general, if $b_*$ is an equilibrium of $x' = F(x)$ and if $A = DF(b_*)$, we shall call the equation $y' = Ay$ the linearization of $x' = F(x)$ at $b_*$.

Proof of Theorem 5.1.1. Making an appropriate translation in $\mathbb{R}^d$, we may assume without loss of generality that the equilibrium $b_*$ is located at the origin: i.e., $F(0) = 0$. Let’s expand $F$ in a Taylor series at the origin: $F(x) = Ax + r(x)$ where the constant term is missing since $F(0)$ vanishes, in the linear term $A = DF(0)$, and using the order-terminology of Section 4.5.2, the remainder $r = o(|x|)$. We may rewrite the IVP (5.4) as

\[ x' - Ax = r(x(t)), \qquad x(0) = b. \tag{5.5} \]

We interpret (5.5) as a linear equation with constant coefficients with the inhomogeneous term $r(x(t))$. As we saw in (2.39), a solution of (5.5) satisfies the integral equation

\[ x(t) = e^{At}b + \int_0^t e^{(t-s)A}\,r(x(s))\,ds. \tag{5.6} \]

Like (3.12), used in proving the existence theorem, (5.6) appears to give a formula for the solution of the IVP, but actually it is only an integral equation that characterizes




the solution. The advantage of (5.6) over (3.12) is that the assumption on the eigenvalues of $A$ implies that $e^{At}$ tends to zero as $t \to \infty$, assisting the convergence of the integral for large $t$. Specifically, according to Proposition 2.4.2 there are constants $K, \varepsilon$ where $K \ge 1$ and $\varepsilon > 0$ such that

\[ \|e^{At}\| \le Ke^{-\varepsilon t}, \qquad t \ge 0. \tag{5.7} \]

Choose a positive constant $\eta < \varepsilon/K$. Since $r(x) = o(|x|)$, there is a $\delta > 0$ such that if $|x| < \delta$ then $|r(x)| < \eta|x|$.

Claim: If $|x(0)| = |b| < \delta/K$, then the solution of (5.5) satisfies $|x(t)| < \delta$ for as long as it exists.

Proof. With the same $\varepsilon$ as in (5.7), let $g(t) = e^{\varepsilon t}|x(t)|$. We seek to control the growth of $g(t)$ as $t \to \infty$ in order to prove that $x$ decays. Now (5.6) implies that

\[ g(t) \le e^{\varepsilon t}|e^{At}b| + e^{\varepsilon t}\int_0^t |e^{(t-s)A}\,r(x(s))|\,ds. \]

Applying (5.7) to estimate the exponentials in each term we conclude that

\[ g(t) \le K|b| + K\int_0^t e^{\varepsilon s}|r(x(s))|\,ds. \tag{5.8} \]

Let us derive a contradiction by assuming that there is a time $t_*$ such that $|x(t)| < \delta$ for $t < t_*$ while $|x(t_*)| = \delta$. Then for $t \le t_*$ we may estimate the second term of (5.8)

\[ K\int_0^t e^{\varepsilon s}|r(x(s))|\,ds \le K\eta\int_0^t e^{\varepsilon s}|x(s)|\,ds = K\eta\int_0^t g(s)\,ds. \]

Hence by Gronwall’s Lemma, $g(t) \le K|b|e^{K\eta t}$. Recalling the definition of $g$, we conclude that

\[ |x(t_*)| = e^{-\varepsilon t_*}g(t_*) \le K|b|e^{(K\eta - \varepsilon)t_*}. \]

But $K|b| < \delta$, and $e^{(K\eta - \varepsilon)t_*} < 1$ by our choice of $\eta$. Thus $|x(t_*)| < \delta$, contradicting our assumption above, and this proves the claim.

Proof of Theorem 5.1.1 concluded. Let $V = \{b \in \mathbb{R}^d : |b| < \delta/K\}$. By the claim, the solution of (5.4) stays inside $B(0, \delta)$ for as long as it exists, and it follows from Theorem 4.1.2 that the solution exists for all positive time. Moreover $|x(t)| \le \delta e^{(K\eta - \varepsilon)t}$, so $x$ tends to zero as $t \to \infty$.

The following corollary of Theorem 5.1.1 makes the convergence to the equilibrium more quantitative.




Corollary 5.1.2. If in Theorem 5.1.1 the eigenvalues satisfy

\[ \Re\lambda_j(DF_*) < -\varepsilon_*, \qquad j = 1, 2, \ldots, d \tag{5.9} \]

where $\varepsilon_* > 0$, then $V$ may be chosen with the property that there is a constant $K$ such that for all $b \in V$, the solution of the IVP satisfies

\[ |x(t)| \le Ke^{-\varepsilon_* t}|b|, \qquad t \ge 0. \tag{5.10} \]

This result may be derived by exercising a little more care in the proof of Theorem 5.1.1, a task we ask you to complete in the Exercises.

5.1.2 An illustrative example:

Let us illustrate these ideas by applying them to Duffing’s equation (5.1), which describes motion in the double-well potential $V(x) = -x^2/2 + x^4/4$. At the two minima of $V$ we have

\[ DF(\pm 1, 0) = \begin{pmatrix} 0 & 1 \\ -2 & -\beta \end{pmatrix}. \]

To test whether eigenvalues have negative real parts, we compute

\[ \det DF = +2 > 0, \qquad \operatorname{tr} DF = -\beta < 0 \]

where we have assumed $\beta > 0$: i.e., normal friction. Hence by Proposition 2.4.4, the eigenvalues of $DF$ both have negative real parts, and so Theorem 5.1.1 applies to the equilibria $(\pm 1, 0)$. Of course the term “potential well” suggests this behavior; really the point of this example is more about checking the theorem than gaining new information.
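The trace and determinant test can also be checked numerically. The following sketch computes the eigenvalues of the Jacobian of (5.1) directly; the helper names `duffing_jacobian` and `eig2` and the friction value $\beta = 0.5$ are assumptions made for illustration, not part of the text.

```python
import cmath


def duffing_jacobian(x, beta):
    """Jacobian of (5.1), x' = y, y' = x - x^3 - beta*y; it does not depend on y."""
    return ((0.0, 1.0), (1.0 - 3.0 * x * x, -beta))


def eig2(m):
    """Eigenvalues of a 2x2 matrix, computed from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


beta = 0.5  # hypothetical friction coefficient, beta > 0
wells = eig2(duffing_jacobian(1.0, beta))    # same matrix at x = +1 and x = -1
origin = eig2(duffing_jacobian(0.0, beta))   # the equilibrium at the local maximum of V
```

With these values both eigenvalues at the wells have negative real part, while the Jacobian at the origin has one eigenvalue of each sign.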

Examples that are more interesting will be studied below.

5.2 An orgy of terminology

Researchers have introduced a bewildering variety of terminology related to the hypotheses and conclusion of Theorem 5.1.1. Let’s get through this, because these concepts will help us to describe the phenomena more precisely.

5.2.1 Description of behavior near equilibria

(a) The terminology: An equilibrium $b_*$ of a system $x' = F(x)$ is called Liapunov stable if for every neighborhood $V$ of $b_*$ in $\mathbb{R}^d$, there is a smaller neighborhood $V_1$ such that if initial data $b$ are restricted to belong to $V_1$, then the IVP (5.4) is solvable for all positive times and moreover $x(t) \in V$ for all $t \ge 0$. The equilibrium $b_*$ is called




asymptotically stable if it is Liapunov stable and if there is one neighborhood $V_*$ of $b_*$ such that for all initial data in $V_*$ the solution of (5.4) satisfies

\[ \lim_{t\to\infty} x(t) = b_*. \tag{5.11} \]

The results of Section 5.1 imply that $b_*$ is asymptotically stable if condition (5.3) is satisfied; note that we must invoke Corollary 5.1.2 to conclude that $b_*$ is Liapunov stable.

Unstable will mean the negation of Liapunov stability: i.e., there is some neighborhood $V$ of $b_*$ in $\mathbb{R}^d$ such that for every smaller neighborhood $V_1$ there are initial conditions $b \in V_1$ such that the solution to the IVP (5.4) leaves $V$ at some finite, positive time.

Two examples may explain the need for the careful language used in the definitions of stability and asymptotic stability. First, one might think that if property (5.11) held, then $b_*$ was surely Liapunov stable. This is false, as shown by the following 2 × 2 system, which we write in polar coordinates

\[ \begin{aligned} r' &= r - r^3 \\ \theta' &= 1 - \cos\theta. \end{aligned} \tag{5.12} \]

Trajectories for this system are illustrated in Figure 5.1. As may be seen from the figure, all solutions of this system converge to $(r, \theta) = (1, 0)$ as $t \to \infty$. However, a trajectory that starts at a point $(r, \theta) = (1, \varepsilon)$ where $\varepsilon$ is small and positive, proceeds all the way around the circle before it converges to $(r, \theta) = (1, 0)$; in particular, it leaves the ball around $(r, \theta) = (1, 0)$ of radius 1/2.
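The excursion around the circle is easy to reproduce numerically. The sketch below is a rough illustration, not a careful integration: it applies Euler steps to the $\theta$-equation of (5.12) for a trajectory started on the unit circle, where $r' = r - r^3 = 0$ so that $r$ stays equal to 1. The function name `simulate`, the step size, the final time, and the starting angle $\theta_0 = 0.1$ are all hypothetical choices.

```python
import math


def simulate(theta0, h=0.01, t_end=2000.0):
    """Euler steps for theta' = 1 - cos(theta) on the unit circle r = 1.

    Returns the largest and the final distance to the equilibrium (1, 0).
    """
    theta = theta0
    max_dist = 0.0
    dist = 0.0
    for _ in range(int(t_end / h)):
        theta += h * (1.0 - math.cos(theta))
        # Euclidean distance from (cos(theta), sin(theta)) to (1, 0)
        dist = math.hypot(math.cos(theta) - 1.0, math.sin(theta))
        max_dist = max(max_dist, dist)
    return max_dist, dist


max_dist, final_dist = simulate(theta0=0.1)
```

The trajectory starts within distance 0.1 of the equilibrium, travels to nearly the opposite side of the circle (distance close to 2), and only then creeps back toward $(1, 0)$; the return is slow because $\theta' $ vanishes quadratically at the equilibrium.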

Secondly, regarding the nested neighborhoods in the definition of Liapunov stable, consider the following linear 2 × 2 system:

\[ x' = \begin{pmatrix} -\varepsilon & -M^{-1} \\ M & -\varepsilon \end{pmatrix} x \]

where $M$ is a large constant and $\varepsilon > 0$ a small one. For any constant $C$,

\[ x(t) = Ce^{-\varepsilon t}\begin{pmatrix} \cos t \\ M\sin t \end{pmatrix} \]

is a solution of this system. Even though they spiral into the origin, these orbits are very elongated, as shown in Figure 5.2. Suppose we are given a circular neighborhood $V = \{b \in \mathbb{R}^2 : |b| < \varepsilon\}$ of the origin. We want to find another circular neighborhood $V_1 = \{b \in \mathbb{R}^2 : |b| < \delta\}$ such that if $x(0) \in V_1$, then the trajectory remains confined to $V$. We have to choose $\delta < \varepsilon/M$ because orbits are so elongated. If we were smart enough to choose $V_1$ with a perfect shape adjusted to the orbit, we could arrange that $x(t) \in V_1$ for all $t \ge 0$. In fact this is possible for linear systems (see Exercise ??)




Figure 5.1: The point (1, 0) is attracting but not Liapunov stable. (Axes: $x$ horizontal, $y$ vertical.)

but completely impractical in general.

(b) Unstable and borderline cases: In the Exercises we ask you to establish the following claim: If $b_*$ is an equilibrium of $x' = F(x)$ and if $\Re\lambda_j(DF_*) > 0$ for some $j$, then $b_*$ is unstable. (Often much more specific information is available; see Section 5.5.) For example, regarding the equilibrium at the origin of Duffing’s equation (5.1), we have $\det DF(0, 0) < 0$. Thus the eigenvalues of $DF$ have opposite signs, one of them being positive, so the origin is unstable. Of course this conclusion is entirely expected.

If at an equilibrium we have only $\Re\lambda_j(DF_*) \le 0$ (i.e., if the inequality (5.3) is not strict¹), then no information can be deduced from the linearization. To justify this statement, consider the scalar ODE

\[ x' = \pm x^3. \tag{5.13} \]

For either sign the origin is an equilibrium, and the lone eigenvalue of $DF(0)$ vanishes. However if the minus sign is chosen, the origin is asymptotically stable, while if the plus sign is chosen, it is unstable. Indeed, in the latter case, all solutions except $x(t) \equiv 0$ blow up in finite time, an extreme form of instability.
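Both claims about (5.13) can be confirmed by solving the equation explicitly. The following worked computation is a sketch using separation of variables, with $x_0 = x(0) \neq 0$ introduced here as notation for the initial value.

```latex
% Separate variables in x' = \pm x^3 with x(0) = x_0 \neq 0:
\[
  \int_{x_0}^{x(t)} \frac{d\xi}{\xi^{3}} = \pm t
  \quad\Longrightarrow\quad
  \frac{1}{x(t)^{2}} = \frac{1}{x_0^{2}} \mp 2t .
\]
% Minus sign in the ODE: the solution exists for all t \ge 0 and decays to zero.
\[
  x' = -x^{3}: \qquad x(t) = \frac{x_0}{\sqrt{1 + 2x_0^{2}\,t}} \;\longrightarrow\; 0
  \quad\text{as } t \to \infty .
\]
% Plus sign in the ODE: the solution blows up at the finite time t = 1/(2 x_0^2).
\[
  x' = +x^{3}: \qquad x(t) = \frac{x_0}{\sqrt{1 - 2x_0^{2}\,t}},
  \qquad 0 \le t < \frac{1}{2x_0^{2}} .
\]
```

Note that the decay in the stable case is only algebraic, like $t^{-1/2}$, in contrast with the exponential decay guaranteed by Corollary 5.1.2 when the eigenvalue condition is strict.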

In Duffing’s equation (5.1), if the friction coefficient $\beta = 0$ then the eigenvalues of $DF$ at $(\pm 1, 0)$ are pure imaginary. Thus, in this case one cannot determine the stability of these equilibria from theory based on linearization. In fact, the equilibria are Liapunov stable but not asymptotically stable, as can easily be shown using a Liapunov function, a technique we will introduce in Section 5.4.

1 For example, (5.12) suffers from this degeneracy.




Figure 5.2: Choosing large $M$ gives rise to an elongated spiral. (Axes: $b_1$ horizontal, $b_2$ vertical; the circles $\{b : |b| = \delta\}$ and $\{b : |b| = \varepsilon\}$ are marked.)

5.2.2 Classification of eigenvalues of 2 × 2 Jacobians

Many authors use the term hyperbolic to describe an equilibrium $b_*$ of an ODE $x' = F(x)$ such that

\[ \Re\lambda_j(DF_*) \neq 0, \qquad j = 1, \ldots, d. \tag{5.14} \]

We regard this terminology as unfortunate since the word hyperbolic already has so many uses in mathematics, but it is well established and we will also use it. The adjective “hyperbolic” derives from the simplest ODE with such an equilibrium,

\[ \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}, \]

whose solutions move along hyperbolas $x^2 - y^2 = C$ where $C$ is a constant; in general, however, orbits near a hyperbolic equilibrium have only a weak, qualitative resemblance to hyperbolas.

In two dimensions there is an extensive vocabulary describing the eigenvalues of the Jacobian at an equilibrium. In Table 5.1 we have listed many terms classifying such equilibria that are hyperbolic. It will be useful below to have these available while considering specific examples, and we recommend that, during some “captive time” like on a long flight, you commit them to memory. Along with the memorization, you should make phase-plane plots for a linear ODE with a node, with a focus, and with a saddle. (Strictly speaking, there are three qualitatively different cases for a node, according as (i) $\lambda_1 \neq \lambda_2$, (ii) $\lambda_1 = \lambda_2$ with Jacobian equal to $\lambda_1 I$, or (iii) $\lambda_1 = \lambda_2$ with the Jordan normal form of the Jacobian being a 2 × 2 block.)




Signs of ∂F_1/∂y, ∂F_2/∂y    Slope x-nullcline larger    Slope y-nullcline larger
Same                         det DF∗ < 0                 det DF∗ > 0
Opposite                     det DF∗ > 0                 det DF∗ < 0

Table 5.2: Types of equilibria and slopes of nullclines

5.2.4 The Hartman-Grobman Theorem

Suppose a two-dimensional system $x' = F(x)$ has a hyperbolic equilibrium of one of the above types. A little computation with examples offers convincing evidence that the flow of the nonlinear system near the equilibrium qualitatively resembles the flow of the linearization. The resemblance is made precise in the Hartman-Grobman Theorem, which states that the flow of $x' = F(x)$ is topologically conjugate to that of its linearization. More precisely:

Theorem 5.2.1. Suppose $b_*$ is a hyperbolic equilibrium point for $x' = F(x)$, let $\varphi$ be the flow map for this equation, and let $A = DF_*$. Then there exists a neighborhood $V$ of $b_*$ and a continuous map $\Psi : V \to \mathbb{R}^d$, which is a homeomorphism onto its range, such that

\[ \Psi(\varphi(t, b)) = e^{tA}\Psi(b) \tag{5.17} \]

for all $b \in V$ and all times such that $\varphi(t, b)$ is defined and belongs to $V$.

Note that the result asserts only the continuity of the homeomorphism. There are several different results giving additional conditions which guarantee that a smoother $\Psi$ exists. One of the more satisfying results is due to Hartman [reference needed]: if the equilibrium $b_*$ is asymptotically stable (i.e., $\Re\lambda_j(DF_*) < 0$), there is a $C^1$ diffeomorphism $\Psi$ such that (5.17) is satisfied, provided $F$ is $C^2$. However, all these issues are beyond the scope of this book.

References:

Perko proves the Hartman-Grobman theorem and states Hartman’s improvement.
Chicone proves the Hartman-Grobman theorem.
Meiss also proves the Hartman-Grobman theorem.
Wiggins refers to Arnold and to Palis-de Melo for the proof, and also gives an example to show that in Hartman’s theorem $\Psi$ cannot be $C^2$.

5.3 Activator-inhibitor systems and the Turing instability

In Section 4.3.1 we proved global existence in forward time for the system

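\[ \begin{aligned} \text{(a)}\quad x' &= \alpha\,\frac{1}{1+r}\,\frac{x^2}{1+x^2} - x \\ \text{(b)}\quad r' &= \gamma\left[\beta\,\frac{x^2}{1+x^2} - r\right] \end{aligned} \tag{5.18} \]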




where $\alpha, \beta, \gamma$ are positive parameters. (Also we briefly described the physical significance of the equations.) In the first subsection below we determine the equilibria of (5.18) and their stabilities. In the second, we present the Turing instability: i.e., we use Theorem 5.1.1 to show that, if the reaction takes place in several vessels, a stable equilibrium of (5.18) can be destabilized as a result of chemicals diffusing between vessels. This is a lovely application of the theorem, and it introduces some fascinating mathematics.

5.3.1 Equilibria of the activator-inhibitor system

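The origin $(0, 0)$ is an obvious equilibrium of (5.18). This system has the Jacobian

\[ DF = \begin{pmatrix} \dfrac{\alpha}{1+r}\,\dfrac{2x}{(1+x^2)^2} - 1 & -\dfrac{\alpha}{(1+r)^2}\,\dfrac{x^2}{1+x^2} \\[2ex] \beta\gamma\,\dfrac{2x}{(1+x^2)^2} & -\gamma \end{pmatrix}, \tag{5.19} \]

which at the origin equals

\[ DF_* = \begin{pmatrix} -1 & 0 \\ 0 & -\gamma \end{pmatrix}, \]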

so by Theorem 5.1.1 $(0, 0)$ is asymptotically stable.

Turning our attention to the other equilibria, we set the RHS of (5.18b) to zero and solve for $r$ to obtain

\[ r = \beta\,\frac{x^2}{1+x^2}, \tag{5.20} \]

and we process (5.18a) similarly: excluding the zero solution, we divide by $x$ and rewrite the equation as

\[ r = \frac{\alpha x}{1+x^2} - 1. \tag{5.21} \]

Now substitute (5.20) for $r$ in (5.21), clear $1 + x^2$ from the denominator, and rearrange, yielding

\[ (\beta + 1)x^2 - \alpha x + 1 = 0. \tag{5.22} \]

This relation is graphed in Figure 5.3. Thus, (5.18) has two nontrivial equilibria if

\[ \alpha > 2\sqrt{\beta + 1}, \tag{5.23} \]

and none if $\alpha < 2\sqrt{\beta + 1}$. We assume that the parameters satisfy (5.23).
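The steps from (5.20) to (5.23) translate directly into a short computation. The sketch below solves the quadratic (5.22) and recovers $r$ from (5.20); the function name `nontrivial_equilibria` is a hypothetical helper, and the parameter values are the ones quoted in the caption of Figure 5.4.

```python
import math


def nontrivial_equilibria(alpha, beta):
    """Solve (beta + 1) x^2 - alpha x + 1 = 0, i.e. (5.22), and recover r from (5.20).

    Returns a list of (x, r) pairs ordered by x; the list is empty when (5.22)
    has no real roots, i.e. when alpha < 2 sqrt(beta + 1).
    """
    disc = alpha * alpha - 4.0 * (beta + 1.0)
    if disc < 0.0:
        return []
    roots = [(alpha - math.sqrt(disc)) / (2.0 * (beta + 1.0)),
             (alpha + math.sqrt(disc)) / (2.0 * (beta + 1.0))]
    return [(x, beta * x * x / (1.0 + x * x)) for x in roots]


alpha, beta = 4.0, 2.5  # the values quoted in the caption of Figure 5.4
(x_minus, r_minus), (x_plus, r_plus) = nontrivial_equilibria(alpha, beta)
```

A useful consistency check is that each computed pair also satisfies (5.21), since an equilibrium must lie on both nullclines.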

Let us investigate the stabilities of the two equilibria (5.22), say $E_\pm = (x_\pm, r_\pm)$. As shown in Section 5.2.3, we may determine the sign of $\det DF_*$ by comparing the slopes of the two nullclines. Note from (5.19) that $\partial F_1/\partial r$ and $\partial F_2/\partial r$ always have the same sign, both negative. Thus, by Table 5.2, the equilibrium $E_-$ is a saddle point since, as we see in Figure 5.4, the x-nullcline has larger slope than the r-nullcline there. By contrast, $E_+$ is either a sink or a source; to determine which




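Figure 5.3: Nonzero equilibria of (5.18) for $\alpha \ge 2\sqrt{\beta + 1}$. (Axes: $\alpha$ horizontal, $x$ vertical, showing the branches $x_+$ and $x_-$, which meet at $\alpha = 2\sqrt{\beta + 1}$, $x = 1/\sqrt{\beta + 1}$.)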

we must compute the sign of tr DF∗.

The sign of $\operatorname{tr} DF_*$ depends on the parameters $\alpha, \beta, \gamma$. According to (5.22), given $\beta$, either quantity $\alpha$ or $x_+$ determines the other. The calculations are more convenient if we regard $x_+, \beta, \gamma$ as the independent parameters and determine $\alpha$ from (5.22)

\[ \alpha = (\beta + 1)x_+ + 1/x_+, \tag{5.24} \]

remembering from Figure 5.3 that the range of $x_+$ is bounded from below, $x_+ > 1/\sqrt{\beta + 1}$. Now

\[ \operatorname{tr} DF_* = a_{11} - \gamma = \frac{2}{1 + x_+^2} - 1 - \gamma \tag{5.25} \]

where we have invoked (5.21) in calculating $a_{11}$, the 1,1-entry of the Jacobian (5.19). Manipulating (5.25), we find that $E_+$ is asymptotically stable (i.e., a sink) if

\[ \sqrt{\frac{1 - \gamma}{1 + \gamma}} < x_+. \tag{5.26} \]

If $\gamma > 1$ (i.e., if the radical is complex), then $E_+$ is asymptotically stable for all $x_+$ in the physical range $x_+ > 1/\sqrt{\beta + 1}$. If (5.26) is not satisfied, then $E_+$ is a source.

In the next subsection we focus on the parameter range

\[ \min\left\{ \sqrt{\frac{1 - \gamma}{1 + \gamma}},\ \frac{1}{\sqrt{\beta + 1}} \right\} < x_+ < 1. \tag{5.27} \]

In this case (5.18) has three equilibria and the “top” equilibrium is a sink. Moreover, of particular interest to us, the inequality $x_+ < 1$ means that at this equilibrium the 1,1-entry of the Jacobian (5.19) is positive, as may be seen from the formula $a_{11} = 2/(1 + x_+^2) - 1$.




Figure 5.4: Nullclines and equilibria of (5.18) for $\alpha = 4$ and $\beta = 5/2$. The r- and x-nullclines are given by (5.20) and (5.21), respectively. Note that this figure differs from the nullclines in Figure 4.2 because the present figure is drawn assuming that the parameters satisfy (5.27). (Axes: $x$ horizontal, $r$ vertical; the equilibria $E_-$ and $E_+$ are marked at the intersections of the two nullclines.)

5.3.2 The Turing instability: Destabilization by diffusion

The Turing instability may arise if an activator and an inhibitor react in a spatially extended environment in which the chemicals may diffuse. It is believed that, in morphogenesis, this mechanism underlies the formation of periodic structures like hair on mammals, gills on fish, feathers on birds, etc. The full description of the Turing instability requires both space and time (i.e., a PDE), which is beyond the scope of this book. However, the essential phenomenon occurs in a toy model that we study in this section.

To develop intuition about the effect of diffusion, let us consider a hypothetical scalar reaction, say modeled by $r' = 1 - r$, that takes place in two reaction vessels coupled by diffusion. This situation is described by the equations

\[ \begin{aligned} r_1' &= 1 - r_1 + D(r_2 - r_1) \\ r_2' &= 1 - r_2 + D(r_1 - r_2). \end{aligned} \]

The diffusive terms cause reactant to move from the cell with higher concentration to the cell with lower concentration, at a rate proportional to the difference in concentration. The original scalar ODE has a unique equilibrium at $r = 1$, and it is asymptotically stable. The system with diffusion has the unique equilibrium $(1, 1)$; moreover, the eigenvalues of the 2 × 2 coefficient matrix are $-1$ and $-1 - 2D$, so this equilibrium is also asymptotically stable. Indeed, the extra eigenvalue $-1 - 2D$ is even more negative than the original one. Thus diffusion is usually regarded as a stabilizing effect.
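The two eigenvalues quoted above can be read off from the coefficient matrix of the two-vessel system. In the short computation below, the symbol $M$ for that matrix and the shifted variable $u = (r_1 - 1, r_2 - 1)$ are notation introduced only for this sketch.

```latex
% Deviations u = (r_1 - 1, r_2 - 1) from the equilibrium (1, 1) satisfy u' = M u with
\[
  M = \begin{pmatrix} -1 - D & D \\ D & -1 - D \end{pmatrix}.
\]
% The symmetric and antisymmetric vectors are eigenvectors of M:
\[
  M \begin{pmatrix} 1 \\ 1 \end{pmatrix} = -\begin{pmatrix} 1 \\ 1 \end{pmatrix},
  \qquad
  M \begin{pmatrix} 1 \\ -1 \end{pmatrix} = (-1 - 2D)\begin{pmatrix} 1 \\ -1 \end{pmatrix}.
\]
```

Equal perturbations of the two vessels decay at the rate of the uncoupled reaction, while a difference in concentration between the vessels decays faster because diffusion also acts to remove it.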




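However, let us consider two reaction vessels, each containing an activator-inhibitor system modeled by (5.18). Suppose the inhibitor is allowed to diffuse between the two cells, which leads to the four-dimensional system

\[ \begin{aligned}
\text{(a)}\quad x_1' &= \alpha\,\frac{1}{1+r_1}\,\frac{x_1^2}{1+x_1^2} - x_1 \\
\text{(b)}\quad r_1' &= \beta\,\frac{x_1^2}{1+x_1^2} - \gamma r_1 + D(r_2 - r_1) \\
\text{(c)}\quad x_2' &= \alpha\,\frac{1}{1+r_2}\,\frac{x_2^2}{1+x_2^2} - x_2 \\
\text{(d)}\quad r_2' &= \beta\,\frac{x_2^2}{1+x_2^2} - \gamma r_2 + D(r_1 - r_2)
\end{aligned} \tag{5.28} \]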

This system has the equal-concentration equilibrium $(x_+, r_+, x_+, r_+)$ where $(x_+, r_+)$ are the coordinates of the top equilibrium $E_+$ of (5.18). Let us apply Theorem 5.1.1 to determine the stability of this equilibrium. The Jacobian $DF_*$ of (5.28) at the equilibrium may be conveniently written in terms of the 2 × 2 blocks

\[ A = \begin{pmatrix} a & -b \\ c & -d \end{pmatrix}, \qquad B = D\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \]

where $A$ is the Jacobian of (5.18) at the equilibrium and $B$ includes the diffusion terms; specifically,

\[ DF_* = \begin{pmatrix} A - B & B \\ B & A - B \end{pmatrix}. \tag{5.29} \]

By applying a similarity transformation with

\[ S = \begin{pmatrix} I & -I \\ I & I \end{pmatrix} \]

where $I$ is the 2 × 2 identity matrix, we may reduce $DF_*$ to the block diagonal form

\[ S^{-1}\,DF_*\,S = \begin{pmatrix} A & 0 \\ 0 & A - 2B \end{pmatrix}, \]

whose eigenvalues are the two eigenvalues of $A$ and the two eigenvalues of $A - 2B$. Since $E_+$ is asymptotically stable, the eigenvalues of $A$ both have negative real parts, but let us consider those of

\[ A - 2B = \begin{pmatrix} a & -b \\ c & -d - 2D \end{pmatrix}. \]

The surprising behavior occurs when the parameters satisfy (5.27), which means that the 1,1-entry $a$ is positive. Note from (5.19) that $b, c, d$ are always positive. If $D$ is large enough, the determinant of this matrix is negative, in symbols

\[ -a(d + 2D) + bc < 0. \]




For such large $D$, one of the eigenvalues of $DF_*$ must be positive, meaning the equilibrium of the 4 × 4 system is unstable. This is the Turing instability: an otherwise stable equilibrium has been destabilized by diffusion!

What is the long-term behavior of solutions of (5.28) when the equal-concentration equilibrium is unstable? We can’t stop the determined reader from firing up his/her computer to answer this question right now, but let us mention that in Chapter 7 we will develop analytical methods to answer this question.

For reference when we return to this problem, we record the following information: One eigenvalue of the Jacobian $DF_*$ is positive if $D > D_{thr}$ where the threshold value of diffusion is

\[ D_{thr} = \frac{bc - ad}{2a}. \]

For this value of $D$, the equilibrium of (5.28) is non-hyperbolic. If $D = D_{thr}$, the null eigenvector $v$ of $DF_*$ is

\[ v = \begin{pmatrix} w \\ -w \end{pmatrix} \tag{5.30} \]

where $w$ spans the kernel of $A - 2B$.
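The block reduction makes the threshold easy to explore numerically, because only 2 × 2 eigenvalue problems are needed. The sketch below evaluates the entries of $A$ from (5.19) at $E_+$ and compares the spectrum of $A$ and $A - 2B$ below and above $D_{thr}$. The helper names and the parameter values $\beta = 2.5$, $\gamma = 0.5$, $\alpha = 4$ are hypothetical choices made so that (5.27) holds.

```python
import cmath
import math


def eig2(m):
    """Eigenvalues of a 2x2 matrix, computed from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0


def jacobian_entries(x, beta, gamma):
    """Entries a, b, c, d of A = [[a, -b], [c, -d]], read off from (5.19) at an equilibrium."""
    alpha = (beta + 1.0) * x + 1.0 / x        # (5.24)
    r = beta * x * x / (1.0 + x * x)          # (5.20)
    a = 2.0 / (1.0 + x * x) - 1.0             # 1,1-entry after using (5.21)
    b = alpha / (1.0 + r) ** 2 * x * x / (1.0 + x * x)
    c = beta * gamma * 2.0 * x / (1.0 + x * x) ** 2
    d = gamma
    return a, b, c, d


def max_real_part(x, beta, gamma, D):
    """Largest real part among the eigenvalues of A and of A - 2B."""
    a, b, c, d = jacobian_entries(x, beta, gamma)
    A = ((a, -b), (c, -d))
    A_minus_2B = ((a, -b), (c, -d - 2.0 * D))
    return max(lam.real for lam in eig2(A) + eig2(A_minus_2B))


# Hypothetical parameter values chosen so that (5.27) holds.
beta, gamma = 2.5, 0.5
x_plus = (4.0 + math.sqrt(2.0)) / 7.0         # larger root of (5.22) when alpha = 4
a, b, c, d = jacobian_entries(x_plus, beta, gamma)
D_thr = (b * c - a * d) / (2.0 * a)
below = max_real_part(x_plus, beta, gamma, 0.5 * D_thr)   # diffusion below threshold
above = max_real_part(x_plus, beta, gamma, 2.0 * D_thr)   # diffusion above threshold
```

For these values the largest real part is negative below the threshold and positive above it, which is the destabilization described in the text.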

5.4 Liapunov functions

5.4.1 The main result

Liapunov functions provide another approach to analyze the stability of an equilibrium $b_*$ of $x' = F(x)$ where $F : U \to \mathbb{R}^d$. Suppose $U_1 \subset U$ is an open neighborhood of $b_*$, and suppose $L : U_1 \to \mathbb{R}$ is continuous on $U_1$ and $C^1$ on $U_1 \setminus \{b_*\}$. We shall call $L$ a Liapunov function for $b_*$ if

(i) For all $x \in U_1 \setminus \{b_*\}$, $\langle \nabla L(x), F(x) \rangle \le 0$, and
(ii) For all $x \in U_1$, $L(x) \ge L(b_*)$, with equality only if $x = b_*$.
(5.31)

Condition (ii) requires that $b_*$ is a strict local minimum of $L$. Regarding Condition (i): by the chain rule, the derivative of $L(x)$ along a trajectory $x(t)$ of $x' = F(x)$ is given by

\[ \frac{d}{dt}L(x(t)) = \langle \nabla L(x(t)), x'(t) \rangle = \langle \nabla L(x(t)), F(x(t)) \rangle; \]

thus, the inequality in Condition (i) implies that $L(x)$ is nonincreasing along any trajectory of $x' = F(x)$.

Given a Liapunov function $L(x)$, if the interior of the level set

\[ \{x \in U_1 : L(x) \le c\} \tag{5.32} \]

is compact, then this set is a trapping region. Typically, this set is compact, at least




provided $c$ is sufficiently close to the minimum $L(b_*)$. Thus, usually a Liapunov function provides a one-parameter family of trapping regions surrounding $b_*$.

If in (5.31)(i), less-than-or-equal-to is replaced by strict inequality, $\langle \nabla L(x), F(x) \rangle < 0$, then $L$ is called a strict Liapunov function. In this case, of course, along any trajectory $L(x)$ is strictly decreasing.

Theorem 5.4.1. (a) If $x' = F(x)$ admits a Liapunov function $L(x)$ near $b_*$, then the equilibrium is Liapunov stable. (b) If the equation admits a strict Liapunov function near $b_*$, then the equilibrium is asymptotically stable.

The proof is straightforward but rather dry—sorry.

Proof. (a) Let $V \subset U$, a neighborhood of $b_*$, be given. Choose $\delta$ so small that $B(b_*, \delta) \subset V$. Let

\[ \alpha = \min_{|x - b_*| = \delta} L(x); \]

by (5.31ii) and compactness, $\alpha > L(b_*)$. Let

\[ V_1 = \{x \in B(b_*, \delta) : L(x) < \alpha\} \subset V. \]

Suppose $x(t)$ is a trajectory of $x' = F(x)$, defined for some interval of time, such that $x(0) \in V_1$. Then for $t \ge 0$

\[ L(x(t)) \le L(x(0)) < \alpha, \]

so $x(t)$ does not cross the boundary of this ball, $\{x : |x - b_*| = \delta\}$. Thus, by Theorem 4.1.2 the solution $x$ must exist for all positive time and moreover

\[ x(t) \in V_1 \subset V, \]

which shows that $b_*$ is stable.

(b) Let $L$ be a strict Liapunov function near $b_*$. By Part (a), if a trajectory starts near $b_*$, then it exists for all positive time and stays within a compact neighborhood of $b_*$. Suppose such a trajectory does not converge to $b_*$. Then there exists a sequence $\{t_n\}$ tending to infinity such that $\{x(t_n)\}$ is bounded away from $b_*$. By invoking compactness and passing to a subsequence, if necessary, we may assume without loss of generality that the sequence $\{x(t_n)\}$ has a limit, say $x(t_n) \to b$ for some point $b$. Since $L$ is continuous,

\[ \lim_{n\to\infty} L(x(t_n)) = L(b) = \lim_{t\to\infty} L(x(t)) \tag{5.33} \]

the latter equality because $L(x(t))$ is a decreasing function. Now consider the IVP for $x' = F(x)$ with initial condition $b$; we write this solution as $\varphi(s, b)$, using the




flow notation of Section 4.3. Since $L$ is a strict Liapunov function, for any $s > 0$

\[ L(\varphi(s, b)) < L(\varphi(0, b)) = L(b). \tag{5.34} \]

On the other hand, by Theorem 4.4.1 giving the continuity of $\varphi$, we have $\varphi(s, b) = \lim_{n\to\infty} \varphi(s, x(t_n))$, and by Proposition 4.4.3 giving the semigroup property, we have $\varphi(s, x(t_n)) = x(t_n + s)$. Combining these, we conclude that

\[ L(\varphi(s, b)) = \lim_{n\to\infty} L(x(t_n + s)) = L(b) \]

where we have invoked the second equality in (5.33). But this equation contradicts (5.34).

5.4.2 Lasalle’s invariance principle

To illustrate a common difficulty in applying Liapunov functions, let us attempt to show that the equilibria $(\pm 1, 0)$ of Duffing’s equation (5.1) are asymptotically stable. We propose the energy $E(x, y) = y^2/2 - x^2/2 + x^4/4$ as our Liapunov function. The equilibria $(\pm 1, 0)$ are local minima of $E$, and

\[ \frac{dE}{dt} = -\beta y^2 \le 0; \]

thus $E$ is indeed a Liapunov function. Unfortunately it is not a strict Liapunov function because $dE/dt$ vanishes along the x-axis. Thus Theorem 5.4.1 implies only that $(\pm 1, 0)$ are Liapunov stable, not asymptotically stable.
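The identity $dE/dt = -\beta y^2$ and the resulting monotone decay of the energy can be checked numerically. The sketch below is an illustration under assumptions: it uses a classical fourth-order Runge-Kutta step (a choice made here for accuracy, not a method from the text), and the friction value, step size, and initial point are hypothetical.

```python
def duffing(x, y, beta):
    """Right-hand side of (5.1)."""
    return y, x - x ** 3 - beta * y


def energy(x, y):
    """The proposed Liapunov function E(x, y) = y^2/2 - x^2/2 + x^4/4."""
    return y * y / 2.0 - x * x / 2.0 + x ** 4 / 4.0


def energy_rate(x, y, beta):
    """Inner product of grad E = (x^3 - x, y) with the vector field of (5.1)."""
    fx, fy = duffing(x, y, beta)
    return (x ** 3 - x) * fx + y * fy


def rk4_step(x, y, h, beta):
    """One classical Runge-Kutta step (a swapped-in integrator, assumed for this sketch)."""
    k1x, k1y = duffing(x, y, beta)
    k2x, k2y = duffing(x + 0.5 * h * k1x, y + 0.5 * h * k1y, beta)
    k3x, k3y = duffing(x + 0.5 * h * k2x, y + 0.5 * h * k2y, beta)
    k4x, k4y = duffing(x + h * k3x, y + h * k3y, beta)
    return (x + h * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0,
            y + h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0)


beta, h = 0.5, 0.01          # hypothetical friction coefficient and step size
x, y = 1.5, 0.0              # hypothetical initial point
energies = [energy(x, y)]
for _ in range(5000):
    x, y = rk4_step(x, y, h, beta)
    energies.append(energy(x, y))
```

Along the computed trajectory the energy never increases beyond rounding and truncation error, and it stays above the minimum value $E(\pm 1, 0) = -1/4$.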

The following result, known as Lasalle’s Invariance Principle, will allow us to extract the desired conclusion despite this difficulty. The analysis is based on information about the set where the Liapunov inequality fails to be strict,

\[ S = \{x \in U_1 \setminus \{0\} : \langle \nabla L(x), F(x) \rangle = 0\}. \tag{5.35} \]

Specifically, we require that:

No trajectory that starts in $S$ remains in $S$ for all positive time. (5.36)

Theorem 5.4.2. If, near an equilibrium $b_*$, $x' = F(x)$ has a Liapunov function $L$ that satisfies assumption (5.36), then $b_*$ is asymptotically stable.

Proof. The proof of this result closely follows that of Theorem 5.4.1(b). Suppose there is a trajectory $x(t)$ starting close to $b_*$ that does not converge to $b_*$. Then proceeding as in the previous proof, we may choose a sequence $\{t_n\}$ such that $x(t_n) \to b \in V$ and such that (5.33) holds. Again we consider the solution $\varphi(s, b)$ of the IVP, but with the following difference: Before we could guarantee that (5.34) held for any




$s > 0$. Now, however, if $b \in S$, it might happen that $L(\varphi(s, b)) = L(b)$ for a range of $s$, but the crucial point is this: by Assumption (5.36), the trajectory $\varphi(s, b)$ cannot remain in $S$ indefinitely, and hence there must be some value of $s$ such that (5.34) holds. The proof may now be completed as in the previous case.

In the Exercises we offer hints for how to use Lasalle’s Invariance Principle to obtain asymptotic stability for the equilibria $(\pm 1, 0)$ of Duffing’s equation.

5.4.3 Construction of Liapunov functions

The real mystery regarding Liapunov functions is not how to use them but how to find them. In mechanical problems, such as Duffing’s equation above, energy is anobvious candidate. In the Exercises we introduce you to two classes of equations—gradient systems and Hamiltonian systems—whose structure automatically providesa Liapunov function. Failing these special cases, one is forced to rely on ingenuity.Here’s one reason why you need not feel discouraged by this prospect—in studyingan equation over an extended period of time you will find that you develop intuitionabout it that may astonish those less familiar with the equation.

In support of these encouraging words, we present an example of a Liapunovfunction constructed by ingenuity, which builds on knowledge the student has alreadyacquired in this book. We consider the “logistic Lotka-Volterra” system (4.28), whichfor convenience we repeat here:

(a) x′ = x(1 − x/K − y)(b) y′ = ρy(x − 1).

(5.37)

We assume that K > 1 so that the coexistence equilibrium of (5.37), located at(x, y) = (1, 1 − 1/K ), It is readily verified that this equilibrium is asymptoticallystable, but nevertheless let us construct a Liapunov function. Note that setting

K = ∞ in (5.37) yields the original Lotka-Volterra equation (1.33). As we saw inChapter 1, the function

ρ(x − ln x) + y − ln y

is constant on the (periodic) orbits of (1.33). Let us try to modify this function to obtain a Liapunov function for (5.37), say

L(x, y) = Ax − B ln x + y − D ln y (5.38)

for some constants A, B, D. (By scaling the Liapunov function, we have assumed without loss of generality that the coefficient of y in (5.38) is unity.) To determine the coefficients in (5.38), we first require that this function assumes its minimum at the equilibrium of (5.37) for which (x, y) = (1, 1 − 1/K); this yields that A = B and

Figure 5.5: A level set (5.40) in the case C > 0. The level set decomposes into two orbits.

is contained in (but not necessarily equal to) a level set of the energy function,

$$
\{(x, y) : H(x, y) = C\} \tag{5.40}
$$

where H(x, y) = y²/2 + V(x). Below we will argue that, if for example C > 0, then (5.40) consists of exactly two distinct orbits (see Figure 5.5),

$$
y = \sqrt{C + x^2/2 + x^4/4} \quad\text{and}\quad y = -\sqrt{C + x^2/2 + x^4/4} \tag{5.41}
$$

where −∞ < x < ∞. By contrast, infinitely many trajectories are associated with the single orbit y = √(C + x²/2 + x⁴/4) because for any given solution (x(t), y(t)) of the ODE and any real t0, the shifted function (x(t − t0), y(t − t0)) is another solution, and this specifies a different trajectory. Thus, it is more convenient to enumerate the qualitatively different orbits of an ODE because this concept eliminates irrelevant redundancy in trajectories.

Another difference between orbits and trajectories is related to the fact that solutions of (5.39) blow up in finite time, both forwards and backwards. By contrast, the orbits (5.41) extend to infinity. This behavior reflects a general phenomenon: If F is defined for all x ∈ Rd, either an IVP is solvable for all time, or by Corollary 4.1.3 the orbit extends out to infinity.

It is instructive to interpret the orbits of (5.39) physically in terms of the analogy of a marble rolling in the x, z-plane over the hill given by z = V(x) (see Figure 5.6). In case C > 0, the orbit y = √(C + x²/2 + x⁴/4) in the upper half plane derives from a particle that at large negative times is far to the left of the origin and is moving to the right towards the top of the hill at x = 0 (see Figure 5.6); it slows down as it approaches the top of the hill, but it has enough energy to clear it; after it passes x = 0, it sails off to the far right, at ever increasing speeds. In focusing on the orbit, we suppress information regarding exactly when the particle moves over the hill. Similarly, the orbit in the lower half plane may be interpreted in terms of a particle moving from the right to the left that clears the hill.

Figure 5.6: Illustration of the rolling-marble analogy for (5.39) described in the text.

Figure 5.7: A level set (5.40) in case C < 0. The level set decomposes into two orbits.

If C < 0, (5.40) again consists of exactly two orbits (see Figure 5.7), one in the right half plane x > 0 and one in the left half plane x < 0. In the rolling-marble analogy, the orbit in the right half plane derives from a particle that at large negative times is far to the right of the origin and is moving towards the hill but does not have enough energy to clear it; thus the particle is turned around and sails back to the far right as time increases. The orbit in the left half plane is similarly described.

When C = 0, the level set (5.40) consists of two crossed curves (see Figure 5.8), y = ±x√(1 + x²). Let us show that this level set contains exactly five orbits, as follows:

$$
\begin{aligned}
&\text{(i)} & x &= y = 0, \\
&\text{(ii)} & y &= -x\sqrt{1 + x^2}, & 0 < x < \infty, \\
&\text{(iii)} & y &= -x\sqrt{1 + x^2}, & -\infty < x < 0, \\
&\text{(iv)} & y &= x\sqrt{1 + x^2}, & 0 < x < \infty, \\
&\text{(v)} & y &= x\sqrt{1 + x^2}, & -\infty < x < 0.
\end{aligned}
\tag{5.42}
$$

Figure 5.8: The level set (5.40) in the case that C = 0. The level set decomposes into five orbits, (5.42).

We continue to invoke the rolling-marble analogy. The equilibrium at the origin requires no comment. Orbit (ii) derives from a particle that at large negative times was far to the right and is moving to the left with just enough energy to converge to the top of the hill as t → ∞; this is a single orbit. Similarly orbit (iii) derives from a particle moving to the right that converges to the top of the hill. Mathematically, orbits (iv) and (v) are quite similar to orbits (ii) and (iii), but the description in words is a little harder to swallow: the particle “falls off” the hilltop at time minus infinity and is moving away from the equilibrium for all time, initially at infinitesimal speeds but continuously accelerating. It takes an infinite amount of time to fall off an equilibrium at time minus infinity, just as it takes an infinite amount of time to converge to equilibrium as t tends to plus infinity. The motion is to the right or left for orbits (iv) or (v), respectively.

The orbits (5.42) are special in that they are the only orbits of (5.39) that make contact with the equilibrium at the origin. In the next subsection we study analogous behavior near an arbitrary hyperbolic equilibrium.



5.5.2 Statement of the main result

We are concerned with the IVP

x′ = F(x), x(0) = b (5.43)

for initial conditions b close to a hyperbolic equilibrium b∗. The notation Ms in the following theorem is mnemonic for stable manifold. An unstable manifold Mu, having similar asymptotic properties as time tends to negative infinity, will be introduced below.

Theorem 5.5.1. Suppose that the first ds eigenvalues of the Jacobian DF(b∗) have negative real parts and that the remaining d−ds eigenvalues of DF(b∗) have positive real parts. Then there is a (bounded) neighborhood V of b∗ in Rd and a differentiable manifold Ms ⊂ V of dimension ds through b∗ such that:

(i) If b ∈ Ms, then the IVP (5.43) has a solution ϕ(t, b) for all positive time with ϕ(t, b) ∈ Ms, and moreover

$$
\lim_{t\to\infty} x(t) = b_*. \tag{5.44}
$$

(ii) If b ∈ V ∼ Ms, then ϕ(t, b) leaves V at some positive time.

First let us interpret the theorem for the saddle of (5.39). The neighborhood V may be chosen with great latitude; for definiteness let V = B(0, r) be a ball of some finite radius. You should verify that

$$
M_s = \{(x, y) \in V : y = -x\sqrt{1 + x^2}\} \tag{5.45}
$$

satisfies all the claims of the theorem. In words, Ms is the intersection of V with the union of orbits (5.42i, ii, iii) above. For (5.39), conservation of energy allowed us to derive an explicit formula for Ms. If we modified (5.39) slightly, for example by including a friction term, we could no longer parametrize Ms explicitly, but we could invoke the theorem to guarantee that a stable manifold still existed.

The theorem concerns what properly should be called a local stable submanifold.² For (5.39), the curve y = −x√(1 + x²), where −∞ < x < ∞, is a global stable manifold. We shall discuss global stable manifolds in the general case below.

If in the theorem ds = d, then the stable manifold is the entire neighborhood V of b∗. In this case, Theorem 5.5.1 is effectively just a restatement of Theorem 5.1.1.

² Some authors use a notation such as M_s^(loc) to indicate this idea explicitly. This notation seems unbearably heavy to us, and we shall resort to it only when being so completely explicit is necessary to avoid confusion.



To continue profitably reading this book you need to develop intuition about stable manifolds. This may be achieved, even without reading the proof of Theorem 5.5.1, through interpreting the conclusions of Theorem 5.5.1 in various examples, including (5.39), (5.56), the examples in the rest of Section 5, and a selection of exercises. About the proof of the theorem, there is good news and bad news. The bad news is that the proof is long and technical. The good news is that it does not involve new ideas and it may be omitted on a first reading without serious loss of continuity.

5.5.3 Proof of Theorem 5.5.1

The construction of Ms is facilitated by two reductions. First, we translate coordinates so that the equilibrium is at the origin, b∗ = 0, and second, we rotate coordinates to separate the eigenvectors of DF(b∗) associated with eigenvalues having positive and negative real parts. Specifically, regarding the second reduction, after performing an appropriate similarity transformation we may assume without loss of generality that DF(b∗) has block-diagonal form

$$
DF(b_*) = \begin{pmatrix} B & 0 \\ 0 & -C \end{pmatrix}
$$

where B and C are (square) matrices of dimension ds and d − ds, respectively, whose eigenvalues all have negative real parts. Thus, the eigenvalues of B are the eigenvalues of DF(b∗) with negative real parts, and the eigenvalues of C are negatives of the eigenvalues of DF(b∗) with positive real parts. Moreover, let us apply the conclusion of Exercise ??: i.e., at the expense of yet another similarity transformation we may assume without loss of generality that there is a constant ε > 0 such that

$$
\|e^{Bt}\| \le e^{-\varepsilon t}, \qquad \|e^{Ct}\| \le e^{-\varepsilon t}; \tag{5.46}
$$

note that there is no constant pre-factor on the RHSs of these estimates. Using these new coordinates, we may write x ∈ Rd as x = (y, z) where y = (x1, . . . , x_ds) contains the first ds components of x and z contains the last d − ds components. More formally, we decompose Rd = E^s ⊕ E^u where E^s = R^ds × {0} is the span of the eigenvectors of DF(b∗) whose eigenvalues have negative real parts and E^u = {0} × R^(d−ds) is the span of the eigenvectors with positive real parts.

The manifold Ms ⊂ E^s ⊕ E^u is tangent at the origin to E^s. Thus, we may describe Ms as the graph³ of a function ψ : E^s → E^u,

$$
M_s = \{(y, \psi(y)) : y \in E^s\} \tag{5.47}
$$

where ψ(0) = 0. (Strictly speaking, ψ(y) will be defined only for y near 0.) The construction of ψ below is based on a fixed-point argument. We shall not prove that ψ is differentiable nor of course verify the tangency condition Dψ(0) = 0.

We expand F(x) = Ax + r(x) in a Taylor series, where A = DF(0) and Dr(0) = 0. By continuity we may choose δ > 0 such that

$$
|x| \le \sqrt{2}\,\delta \implies |Dr(x)| \le \frac{\varepsilon}{3} \tag{5.48}
$$

where ε is the decay rate in (5.46). The following lemma is less of a distraction now than later.

Lemma 5.5.2. If |x1|, |x2| < √2 δ, then

$$
|r(x_1) - r(x_2)| \le \frac{\varepsilon}{3}\,|x_1 - x_2|. \tag{5.49}
$$

Proof. The estimate (5.49) follows from integrating Dr along the line from x1 to x2, as in Lemma 3.2.3, and invoking (5.48).
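Written out, the integration step in this proof may be sketched as follows. The parameter θ is introduced here only for illustration, and we assume (as seems to be intended in Lemma 3.2.3) that the norm of Dr is the operator norm compatible with |·|.

```latex
\begin{aligned}
r(x_1) - r(x_2)
  &= \int_0^1 \frac{d}{d\theta}\, r\bigl(x_2 + \theta (x_1 - x_2)\bigr)\, d\theta
   = \int_0^1 Dr\bigl(x_2 + \theta (x_1 - x_2)\bigr)\,(x_1 - x_2)\, d\theta, \\
|r(x_1) - r(x_2)|
  &\le \int_0^1 \bigl|Dr\bigl(x_2 + \theta (x_1 - x_2)\bigr)\bigr|\, d\theta \; |x_1 - x_2|
   \le \frac{\varepsilon}{3}\, |x_1 - x_2| .
\end{aligned}
```

The last inequality uses (5.48), which applies because the ball of radius √2 δ is convex: every point on the segment from x2 to x1 has norm less than √2 δ.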

Partial proof of Theorem 5.5.1: We start from the equivalent integral equation for solutions of the IVP (5.43),

$$
x(t) = e^{At} b + \int_0^t e^{(t-s)A}\, r(x(s))\, ds.
$$

Let us write this equation in components (y, z). Since A is block diagonal, multiplication of a vector by e^{At} does not mix components. Thus if we similarly decompose b = (c, d) and r(x) = (p(x), q(x)) into components, the integral equation may be rewritten

$$
\begin{pmatrix} y(t) \\ z(t) \end{pmatrix}
=
\begin{pmatrix}
e^{Bt} c + \int_0^t e^{(t-s)B}\, p(y(s), z(s))\, ds \\[4pt]
e^{-Ct} d + \int_0^t e^{-(t-s)C}\, q(y(s), z(s))\, ds
\end{pmatrix}.
\tag{5.50}
$$

Because C appears in (5.50) with a minus sign, it is desirable in the second component of this equation to change from an initial condition to a terminal condition. In Exercise 1 we ask you to show that if T > 0, any solution of the IVP (5.43) satisfies

$$
\begin{pmatrix} y(t) \\ z(t) \end{pmatrix}
=
\begin{pmatrix}
e^{Bt} c + \int_0^t e^{(t-s)B}\, p(y(s), z(s))\, ds \\[4pt]
e^{C(T-t)} z(T) - \int_t^T e^{(s-t)C}\, q(y(s), z(s))\, ds
\end{pmatrix}
\tag{5.51}
$$

for 0 ≤ t ≤ T.

³ The present representation of Ms is different from (5.45) since in (5.45) we have not performed the preliminary similarity transformation to make the eigenvectors of DF(b∗) parallel to the coordinate axes.

Such integral equations are never formulas for the solution; here this is even “more true” since z(T), about which we know nothing, appears on the RHS of (5.51). However, in the next proposition we show that this term disappears in the limit T → ∞.

Proposition 5.5.3. If x(t) satisfies (5.43) for all t ≥ 0 and if sup_{t≥0} |x(t)| < ∞, then

$$
x(t) =
\begin{pmatrix}
e^{Bt} c + \int_0^t e^{(t-s)B}\, p(x(s))\, ds \\[4pt]
-\int_t^\infty e^{(s-t)C}\, q(x(s))\, ds
\end{pmatrix}.
\tag{5.52}
$$

Proof. In (5.51) we hold t fixed and let T → ∞. Because x(t) is bounded, it follows from (5.46) that e^{C(T−t)} z(T) tends to zero. Similarly, the integral from t to ∞ is absolutely convergent. Thus (5.52) results from (5.51) by taking this limit.
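The two limits used in this proof can be spelled out as follows. This is only a sketch: the constants M and Q are names introduced here, with M = sup |x(t)| and Q = sup |q(x(s))|, the latter being finite because q is continuous and x is bounded.

```latex
\begin{aligned}
\bigl| e^{C(T-t)} z(T) \bigr|
  &\le \bigl\| e^{C(T-t)} \bigr\| \, |z(T)|
   \le e^{-\varepsilon (T-t)}\, M \;\longrightarrow\; 0
   \quad \text{as } T \to \infty, \\
\int_t^\infty \bigl| e^{(s-t)C}\, q(x(s)) \bigr| \, ds
  &\le Q \int_t^\infty e^{-\varepsilon (s-t)} \, ds
   = \frac{Q}{\varepsilon} < \infty .
\end{aligned}
```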

Let us use (5.52) to define a mapping on a space of functions as follows: Let V ⊂ E^s ⊕ E^u be the direct product of δ-balls in Euclidean space

$$
V = \{(y, z) \in \mathbb{R}^d : |y| < \delta,\ |z| < \delta\};
$$

note that V is contained in the ball of radius √2 δ, so (5.49) holds for points in V. Let Ω be the set of functions

$$
\Omega = \{x \in C([0, \infty), \mathbb{R}^d) : y(0) = c,\ (\forall t)\ x(t) \in \overline{V}\},
$$

where the overbar denotes closure. Define the map T : Ω → C([0, ∞), Rd) by

$$
\mathbf{T}[x](t) =
\begin{pmatrix} \mathbf{T}_s[x](t) \\ \mathbf{T}_u[x](t) \end{pmatrix}
=
\begin{pmatrix}
e^{Bt} c + \int_0^t e^{(t-s)B}\, p(x(s))\, ds \\[4pt]
-\int_t^\infty e^{(s-t)C}\, q(x(s))\, ds
\end{pmatrix}
\tag{5.53}
$$

where for convenience below we have indicated the decomposition of T[x] into “stable” and “unstable” components.

Claim 1: If |c| < δ, then for any x ∈ Ω, the image T[x] belongs to Ω.

Proof. It is clear that Ts[x](0) = c, so to show T[x] ∈ Ω we must estimate the components of T[x]. For the “unstable” component, this is easy: by (5.46),

$$
|\mathbf{T}_u[x](t)| \le \int_0^\infty e^{-\varepsilon s'}\, |q(x(t + s'))|\, ds'
$$

where we have made the substitution s′ = s − t in the integral. Restricting (5.49) to the case x2 = 0 we observe that

$$
|q(x(t + s'))| \le \frac{\varepsilon}{3}\, |x(t + s')| \le \frac{\varepsilon}{3} \sqrt{2}\, \delta.
$$

On substitution into the integral, we deduce that

$$
\|\mathbf{T}_u[x]\| \le \frac{\varepsilon}{3} \sqrt{2}\, \delta \int_0^\infty e^{-\varepsilon s'}\, ds' = \frac{\sqrt{2}}{3}\, \delta < \delta.
$$

For the “stable” component, this is a little more delicate; we have

$$
|\mathbf{T}_s[x](t)| \le e^{-\varepsilon t} |c| + \int_0^t e^{-\varepsilon s'}\, |p(x(t - s'))|\, ds'
$$

where we have made the substitution s′ = t − s in the integral. The first term is strictly less than e^{−εt} δ. For the second, invoking (5.49), we deduce that |p(x)| ≤ (ε/3)√2 δ and evaluate the integral, finding

$$
|\mathbf{T}_s[x](t)| < \left[ e^{-\varepsilon t} + \frac{\sqrt{2}}{3} \bigl(1 - e^{-\varepsilon t}\bigr) \right] \delta \le \delta.
$$

Remark: In fact we have shown that T[x](t) belongs to the open set V.

Claim 2: If |c| < δ, the mapping T : Ω → Ω is a contraction.

Proof. We estimate components separately. For the “unstable” components

$$
|\mathbf{T}_u[x_1] - \mathbf{T}_u[x_2]|(t) \le \int_0^\infty e^{-\varepsilon s}\, |q(x_1(t + s)) - q(x_2(t + s))|\, ds.
$$

Using (5.49) we conclude

$$
\|\mathbf{T}_u[x_1] - \mathbf{T}_u[x_2]\| \le \frac{\varepsilon}{3} \int_0^\infty e^{-\varepsilon s}\, ds\ \|x_1 - x_2\|,
$$

which integrates to ‖x1 − x2‖/3. Similarly

$$
\|\mathbf{T}_s[x_1] - \mathbf{T}_s[x_2]\| \le \|x_1 - x_2\|/3,
$$

and by adding estimates for the components we see that T has Lipschitz constant at most 2/3.

For completeness let us record that C([0, ∞), Rd) is a Banach space and that Ω is a closed subset of C([0, ∞), Rd). Therefore, if c ∈ E^s and |c| < δ, then T has a unique fixed point x_fix ∈ Ω, which we decompose into components (y_fix, z_fix). Define the mapping ψ in (5.47) by ψ(c) = z_fix(0).
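To make the fixed-point construction concrete, here is a minimal numerical sketch of iterating the map T of (5.53). It is not taken from the text: the test system y′ = −y, z′ = z + y² (so B = C = −1, ε = 1, p = 0, q = y²), the function name `iterate_T`, the grid, and the truncation of the infinite integral at `t_max` are all illustrative assumptions. For this system the stable manifold is known in closed form, z = −y²/3, so ψ(c) should come out close to −c²/3.

```python
import math

def iterate_T(c, p, q, t_max=20.0, n=2000, iterations=6):
    """Iterate the map T of (5.53) for scalar y, z with B = C = -1.

    Hypothetical helper: returns an approximation of psi(c) = z_fix(0),
    using the trapezoid rule on the grid t_i = i*h and truncating the
    integral over [t, infinity) at t_max.
    """
    h = t_max / n
    decay = math.exp(-h)              # e^{Bh} = e^{Ch} = e^{-h}
    ys = [c] * (n + 1)                # initial guess: x(t) = (c, 0) for all t
    zs = [0.0] * (n + 1)
    for _ in range(iterations):
        pv = [p(y, z) for y, z in zip(ys, zs)]
        qv = [q(y, z) for y, z in zip(ys, zs)]
        # Stable component: e^{Bt} c + int_0^t e^{(t-s)B} p(x(s)) ds.
        new_ys = [c]
        conv = 0.0
        for i in range(1, n + 1):
            conv = decay * conv + 0.5 * h * (decay * pv[i - 1] + pv[i])
            new_ys.append(math.exp(-i * h) * c + conv)
        # Unstable component: -int_t^infty e^{(s-t)C} q(x(s)) ds.
        new_zs = [0.0] * (n + 1)
        tail = 0.0
        for i in range(n - 1, -1, -1):
            tail = decay * tail + 0.5 * h * (qv[i] + decay * qv[i + 1])
            new_zs[i] = -tail
        ys, zs = new_ys, new_zs
    return zs[0]

c = 0.1
psi = iterate_T(c, p=lambda y, z: 0.0, q=lambda y, z: y * y)
exact = -c * c / 3                    # closed-form manifold z = -y^2/3
```

Because q does not depend on z in this toy system, the iteration settles after two passes; for a system with genuine coupling one would expect the geometric convergence promised by Claim 2.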



Now consider the IVP

x′ = F(x), x(0) = (c, d) (5.54)

where |c| < δ. Regarding Conclusion (i) of Theorem 5.5.1, if (c, d) ∈ Ms (i.e., if d = ψ(c)), then x_fix solves (5.54) and belongs to V for all time. In Exercise 1 we ask you to show that if b ∈ Ms and t ≥ 0, then ϕ(t, b) ∈ Ms.

On the other hand, if the solution of (5.54) belongs to V for all time, then by Proposition 5.5.3 this solution is a fixed point of T and by uniqueness equals x_fix. Thus, regarding Conclusion (ii), if d ≠ ψ(c), the solution of (5.54) cannot belong to V for all time.

We still need to verify (5.44), for which we introduce the following lemma.

Lemma 5.5.4. If |c| < δ and if x ∈ Ω, then

$$
\limsup_{t\to\infty} |\mathbf{T}[x](t)| \le \frac{2}{3} \limsup_{t\to\infty} |x(t)|.
$$

Proof. To begin, let us record a simple fact that we will use repeatedly: If 0 ≤ a ≤ b ≤ ∞, then

$$
\varepsilon \int_a^b e^{-\varepsilon s}\, ds \le 1. \tag{5.55}
$$

Let L = limsup_{t→∞} |x(t)|. For an arbitrary η > 0 there is a time t0 such that |x(t)| < L + η if t > t0. Choose τ such that, with δ as above, e^{−ετ} δ < η/6.

Now |T[x](t)| ≤ e^{−εt} δ + I1 + I2 where

$$
I_1 = \int_0^t e^{-\varepsilon (t-s)}\, |p(x(s))|\, ds, \qquad
I_2 = \int_t^\infty e^{-\varepsilon (s-t)}\, |q(x(s))|\, ds.
$$

We suppose that t ≥ t0 + τ, so for the first term e^{−εt} δ < η/6. We write I1 as the sum of integrals over [0, t − τ] and [t − τ, t]. On [0, t − τ], we have |p(x)| ≤ (ε/3)√2 δ < εδ. Substituting into the integral and using (5.55) we deduce that

$$
\int_0^{t-\tau} e^{-\varepsilon (t-s)}\, |p(x(s))|\, ds
\le e^{-\varepsilon \tau}\, \varepsilon \delta \int_0^{t-\tau} e^{-\varepsilon (t-\tau-s)}\, ds
< e^{-\varepsilon \tau} \delta < \eta/6.
$$

On [t − τ, t], we have |x| ≤ L + η so

$$
\int_{t-\tau}^{t} e^{-\varepsilon (t-s)}\, |p(x(s))|\, ds
\le (\varepsilon/3)(L + \eta) \int_{t-\tau}^{t} e^{-\varepsilon (t-s)}\, ds
\le (L + \eta)/3.
$$

Regarding I2, over the entire interval [t, ∞), we have |x| ≤ L + η so I2 ≤ (L + η)/3. Putting together the various pieces we conclude that |T[x](t)| ≤ η + 2L/3. Since η was arbitrary, we are done.
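For the record, the bookkeeping behind "putting together the various pieces" in the proof of Lemma 5.5.4 is the following sum, valid for t ≥ t0 + τ (a worked restatement of the four estimates obtained in that proof, not an additional argument):

```latex
|\mathbf{T}[x](t)|
  \;\le\; \underbrace{e^{-\varepsilon t}\delta}_{<\,\eta/6}
  \;+\; \underbrace{\int_0^{t-\tau} e^{-\varepsilon(t-s)}|p(x(s))|\,ds}_{<\,\eta/6}
  \;+\; \underbrace{\int_{t-\tau}^{t} e^{-\varepsilon(t-s)}|p(x(s))|\,ds}_{\le\,(L+\eta)/3}
  \;+\; \underbrace{I_2}_{\le\,(L+\eta)/3}
  \;<\; \frac{\eta}{3} + \frac{2(L+\eta)}{3}
  \;=\; \eta + \frac{2L}{3}.
```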



The claim (5.44) follows immediately from this lemma because, if x is a fixed point of T,

$$
\limsup_{t\to\infty} |x(t)| \le \frac{2}{3} \limsup_{t\to\infty} |x(t)|,
$$

so the lim sup, which is non-negative, must vanish.

This completes as much of the proof of Theorem 5.5.1 as we promised. See Meiss [] for the proof that ψ is differentiable and that Dψ(0) = 0.

Remark. Although we have not proved that Ms is tangent to E^s, nor even stated it in Theorem 5.5.1, this fact is one of the important properties of Ms that you should keep in mind as we use these manifolds to help understand the behavior of ODEs.

Under the hypotheses of Theorem 5.5.1 there is also an unstable manifold Mu of dimension d − ds through b∗. It is most easily described as the stable manifold of the time-reversed system x′ = −F(x). Thus the IVP (5.43) has a solution that belongs to V for all negative times if and only if the initial data b lies in Mu. Moreover if b ∈ Mu, then ϕ(t, b) ∈ Mu for all t ≤ 0 and tends to b∗ as t → −∞.

Incidentally, in the case of a saddle point in the plane (d = 2, ds = 1), the stable and unstable manifolds (which are simply curves) were traditionally called separatrices.

5.5.4 Global behavior

First let us correct a possible misunderstanding about what Conclusion (ii) in Theorem 5.5.1 asserts: although solutions with initial conditions in V ∼ Ms eventually leave V, it is possible for them to return at some later time. To explore this assertion, let’s consider Duffing’s equation (without friction)

$$
\begin{aligned}
x' &= y \\
y' &= x - x^3,
\end{aligned}
\tag{5.56}
$$

which differs from our example (5.39) above only in the sign of the nonlinear term in the force. The origin is an equilibrium of (5.56), and the Jacobian there is given by

$$
DF(0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},
$$

which has eigenvalues ±1. Thus, by the theorem, Ms is a curve through the origin tangent to (1, −1), the eigenvector of DF with eigenvalue −1, and Mu is a curve tangent to (1, 1).

Energy is conserved in (5.56), so orbits are contained in level sets of the energy.

Figure 5.9: Stable and unstable manifolds through the saddle point at the origin in (5.56), Duffing’s equation without friction. Both manifolds are contained in the zero-energy level set (5.57). The local manifolds Ms and Mu are shown in bold, and the global manifolds actually coincide.

Ms and Mu are contained in the zero-energy level set,

$$
S = \{(x, y) : y^2/2 - x^2/2 + x^4/4 = 0\}, \tag{5.57}
$$

the only level set to make contact with the origin (see Figure 5.9). In Theorem 5.5.1, let us take V = {(x, y) : |x| < 1, |y| < 1}. Solving the equation in (5.57) for y, we obtain the description

$$
M_s = \{(x, y) : y = -x\sqrt{1 - x^2/2},\ -1 < x < 1\}, \tag{5.58}
$$

and Mu is similarly described in terms of the function y = x√(1 − x²/2).

Now consider nonzero initial data b∗ for (5.56) in Mu; in particular b∗ ∈ V ∼ Ms. As time increases, the solution moves away from the origin along the level set (5.57). It does indeed leave V but, verifying the above claim, eventually it returns; in fact, it returns along Ms!
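The leave-and-return behavior just described can be checked numerically. The following is a minimal sketch, not part of the original text: the hand-written fourth-order Runge-Kutta stepper, the step size, and the starting point x = 0.1 on Mu are all illustrative choices. It integrates (5.56) from a point on Mu, records whether the solution leaves the square V = {|x| < 1, |y| < 1}, and stops when it re-enters V.

```python
import math

def duffing(x, y):
    """Right-hand side of (5.56): x' = y, y' = x - x^3."""
    return y, x - x**3

def rk4_step(f, x, y, h):
    """One classical Runge-Kutta step of size h for a planar system."""
    k1x, k1y = f(x, y)
    k2x, k2y = f(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = f(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = f(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def energy(x, y):
    """Conserved energy whose zero level set is (5.57)."""
    return y * y / 2 - x * x / 2 + x**4 / 4

def in_V(x, y):
    return abs(x) < 1 and abs(y) < 1

# Start on the local unstable manifold: y = x*sqrt(1 - x^2/2) with x = 0.1.
x, y = 0.1, 0.1 * math.sqrt(1 - 0.1**2 / 2)
left_V = False
returned = False
for _ in range(10000):                 # integrate up to t = 10
    x, y = rk4_step(duffing, x, y, 0.001)
    if not in_V(x, y):
        left_V = True
    elif left_V:
        returned = True                # back inside V after having left
        break

# Distance of the re-entry point from the curve (5.58) describing Ms.
gap_to_Ms = abs(y + x * math.sqrt(1 - x * x / 2))
```

At re-entry the point has y < 0 and lies on the branch y = −x√(1 − x²/2) up to integration error, which is the numerical counterpart of "it returns along Ms".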

As noted above, for the example (5.39), the local stable manifold M_s^(loc) of Theorem 5.5.1 is a subset of a global stable manifold

$$
M_s^{(\mathrm{glob})} = \{(x, y) \in \mathbb{R}^2 : y = -x\sqrt{1 + x^2}\}.
$$

Here are two properties of M_s^(glob) that are characteristic of global stable manifolds.

(i) M_s^(glob) is invariant⁴ under the flow.

⁴ A set S is called invariant if for every initial condition b ∈ S and for all t such that the IVP has a solution, ϕ(t, b) ∈ S. A set is invariant iff it is a union of orbits.

Figure 5.10: Global stable and unstable manifolds through the saddle point at (0, 0) in Duffing’s equation with friction (5.1) and β = 1/4.

We are now in a position to verify most of the claims made there. (See also Exercise 3 and Section 6.2.3.)

To begin, in Exercise 2 we ask you to verify the information in Table 5.3 about how the equilibria of (5.60) depend on the parameters in the equation. Note that the stability changes in the equilibria correlate with the three regions in the ε, K plane shown in Figure 1.11(a).

To visualize the phase portraits of (5.60), it is extremely helpful to plot the global stable manifold of the saddle point (ε, 0), as is done in Figure 5.11. Of course the behavior of M_s^(glob) as t → −∞ differs in Cases I, II, and III. (Cf. Figure 1.11(a) in Chapter 1.) In fact, two different behaviors occur within Case II, shown in panels (b) and (c) in the present figure. The precise location where the behavior shifts from IIa to IIb, which depends on ρ as well as on ε, K, can be located only using the computer. To compute the manifold, choose a starting point near the saddle along the stable eigenvector and integrate backwards in time.

5.6 Exercises



(a) Prove Corollary 5.1.2.

(b) Prove the claim made in Section 5.2.1: If b∗ is an equilibrium of x′ = F(x) and if ℜλj(DF∗) > 0 for some j, then b∗ is unstable.

Equilibrium   Description                Stability
(0, 0)        Extinction                 Always a stable node
(ε, 0)        Extinction threshold       Always a saddle
(K, 0)        Prey-only equilibrium      Stable node if K < 1
                                         Saddle if K > 1
(1, y∗)       Co-existence equilibrium   Unphysical saddle if K < 1
                                         Sink if 1 < K < (1 + 2ε − ε²)/2ε
                                         Source if (1 + 2ε − ε²)/2ε < K

Table 5.3: Equilibria of (5.60), the Lotka-Volterra system augmented by logistic growth and the Allee effect. In the co-existence equilibrium, y∗ = (1 − 1/K)(1 − ε)/(1 + ε). The stability determinations were made assuming that ε < K.

Figure 5.11: Global stable manifolds for the saddle point (ε, 0) of (5.60) for various choices of the parameter K. (Four panels (a)–(d) in the x, y plane, labeled Region I, Region IIa, Region IIb, and Region III; the values K = 0.4, 1.5, 3.35, and 5 appear among the panel labels.)



(c) Complete the analysis begun in Section 5.4.2: Use LaSalle’s invariance principle to prove that the equilibria (±1, 0) of (5.1) are asymptotically stable.

Hint: Calculate that the exceptional set (5.35) is the x-axis. Then examine the equation (5.1) to show that (5.36) is satisfied.

(d) Show that, as claimed in the text, solutions of (5.39) blow up in finite time.

(e) Verify equation (5.51).

(f) Verify that the set Ms constructed using ψ in the proof of Theorem 5.5.1 has the following key property of stable manifolds: If b ∈ Ms and t ≥ 0, then ϕ(t, b) ∈ Ms.

2. Miscellaneous applications of Theorem 5.5.1:

(a) The chemostat (ref’ce: Edelstein-Keshet) is described by the two ODEs:

$$
\begin{aligned}
\text{(a)}\quad x' &= \frac{kn}{1 + n}\, x - x \\
\text{(b)}\quad n' &= -\frac{n}{1 + n}\, x - n + \alpha
\end{aligned}
$$

where k, α are positive constants.

• Find conditions on the constants k, α so that these equations have an equilibrium with x > 0.

• If your condition is satisfied, show that the equilibrium with x > 0 is asymptotically stable.

(b) Recall from Exercise ?? in Chapter 4 the equations for the evolution of two symbiotic species. Show that, for values of the constants Kj such that this system has an equilibrium with both x and y positive, the equilibrium is asymptotically stable.

(b’) Competing species:

$$
x' = r_1 x \left( 1 - \frac{x}{K_1} - b\, \frac{y}{K_1} \right), \qquad
y' = r_2 y \left( 1 - \frac{y}{K_2} - c\, \frac{x}{K_2} \right).
$$

Figure out conditions for there to be an equilibrium in the open first quadrant. Determine the stabilities of all equilibria in all cases. (The equilibrium in the first quadrant may be unstable.)

(c) Recall from Exercise 8 in Chapter 4 the equations for a bead sliding on a rotating loop,

$$
\begin{aligned}
x' &= y \\
m a\, y' &= -\beta y - m g \sin x + m (a \sin x)\, \omega^2 \cos x.
\end{aligned}
\tag{5.61}
$$



7. Cases where the condition of Theorem 5.1.1 is not satisfied.

(a) Is the equilibrium x = 0 of the scalar ODE

x′ = −x²

asymptotically stable?

(b) Is the equilibrium x = 0 of the scalar ODE

x′ = −x³

asymptotically stable?

Remark: See Exercise 10 for another example where the condition of Theorem 5.1.1 is not satisfied.

8. Consider the three-dimensional system

$$
\begin{aligned}
x' &= r x (x - A) - p_1 x y - p_2 x z \\
y' &= \rho_1 x y - d_1 y \\
z' &= \rho_2 x z - d_2 z
\end{aligned}
\tag{5.62}
$$

where r, A, pj, ρj, dj are positive constants. These equations are analogous to the Lotka-Volterra equations with logistic growth of the prey except there are two species (y and z) that attack the third (x).

(a) Show that unless d1/ρ1 = d2/ρ2, (5.62) has no equilibria at which all three populations are non-zero.

Discussion: In the language of Chapter 1, the relation d1/ρ1 = d2/ρ2 is “non-generic”. In non-technical language, it could be satisfied only by accident, and even if it were satisfied, the slightest perturbation of the system would undo it. In ecology, this behavior is known as the Law of competitive exclusion: i.e., it is ecologically unstable for two species to compete for exactly the same resources.

(b) Find the equilibria for which one of the prey populations is zero, and determine their stabilities.



5.6.2 Exercises referenced elsewhere in the book

9. Let H(x1, . . . , xd, y1, . . . , yd) be a smooth function of 2d real variables. A system of the form

$$
\begin{aligned}
x_j' &= \frac{\partial H}{\partial y_j}, & j &= 1, \ldots, d \\
y_j' &= -\frac{\partial H}{\partial x_j}, & j &= 1, \ldots, d
\end{aligned}
$$

is called Hamiltonian.

(a) Show that the Hamiltonian H(x, y) is constant along trajectories of such a system.

(b) Consider the two-dimensional system (the torqued pendulum without friction)

$$
\begin{aligned}
x' &= y \\
y' &= -\sin x + \mu,
\end{aligned}
\tag{5.63}
$$

interpreted as an ODE on R2, not on the cylinder S1 × R as in Section 4.3.4. Show that this system is Hamiltonian with respect to the Hamiltonian function

$$
H(x, y) = y^2/2 - \cos x - \mu x. \tag{5.64}
$$

Discussion: An attempt to define H as a function on the cylinder S1 × R would produce a multi-valued function. Incidentally, note that the function (5.64) is not bounded from below. Thus, this function may provide a Liapunov function near an equilibrium (if |µ| < 1), but it is of no value in proving global existence.

(c) Consider a bead sliding on a rotating hoop without friction, and scale y in the equations of Exercise 8 so that they read

$$
\begin{aligned}
x' &= m y / a^2 \\
y' &= -m g a \sin x + m a^2 \omega^2 \sin x \cos x.
\end{aligned}
$$

Show that this system is Hamiltonian with the function

$$
H(x, y) = m y^2 / 2a^2 - m g a \cos x - m [a \omega \sin x]^2 / 2. \tag{5.65}
$$

Discussion: With the rescaling, y equals the angular momentum of the bead moving around the loop. The function (5.65) makes one think of the total energy of this system: the term my²/2a² represents the kinetic energy of motion around the hoop, the term −mga cos x represents the potential energy of the bead, and the term m[aω sin x]²/2 represents the kinetic energy of the bead being whirled around the rotation axis. However, note that the term m[aω sin x]²/2 is subtracted in (5.65), not added!

As the above examples hint, Hamiltonian systems are frictionless. Despite this fact, many mechanical systems can be understood as a Hamiltonian system perturbed by the addition of a frictional term. In such cases the Hamiltonian usually provides a Liapunov function, “for free”, so to speak.

Gradient systems provide another case in which a Liapunov function may be obtained “for free”: i.e., a system of the form

$$
x_j' = -\frac{\partial V}{\partial x_j}, \qquad j = 1, \ldots, d
$$

where V : U → R is a smooth function. Please check that (i) V decreases along orbits and (ii) a point b∗ is an equilibrium iff ∇V(b∗) = 0. Thus V provides a Liapunov function for an equilibrium if V has a local minimum there.

5.6.3 Computational exercises

10. Consider the activator-inhibitor system (5.2) in the borderline case when

α = 2

β + 1;

i.e., (5.23) is just barely violated. Determine whether or not the equilibrium at x = 2/α is asymptotically stable.

11. If (5.23) is satisfied, the activator-inhibitor system (5.18) has three equilibria, one of which is a saddle point. Compute the (global) unstable manifold through the saddle point of this system. Consider first the case when (5.26) is satisfied (so E+ is asymptotically stable) and then the case when (5.26) is violated.

12. Consider a modification of van der Pol’s equation (4.29) in which there is a quadratic non-linearity in the restoring force:

$$
\begin{aligned}
\text{(a)}\quad x' &= y \\
\text{(b)}\quad y' &= -\beta (x^2 - 1) y - x - \varepsilon x^2.
\end{aligned}
\tag{5.66}
$$

(a) Show that provided ε ≠ 0, (5.66) has an equilibrium that is a saddle point.

(b) Starting with small ε, find the global unstable manifold through the saddle point.

(c) Explore what happens to this unstable manifold if ε is increased (say to ε = 1).

13. Recall the torqued pendulum

$$
\begin{aligned}
x' &= y \\
y' &= -\sin x - \beta y + \mu.
\end{aligned}
\tag{5.67}
$$

Provided 0 ≤ µ < 1, this equation has two equilibria, solutions of sin x = µ.

(a) Show by calculating eigenvalues that the equilibrium with 0 < x < π/2 is asymptotically stable and that the equilibrium with π/2 < x < π is a saddle point.

(b) Using the Liapunov function (5.64), give an alternative proof that the equilibrium with 0 < x < π/2 is asymptotically stable.

(b) First assuming the friction coefficient β is small, find the (global) unstable manifold through the saddle point.

(c) Increase β and look for a qualitative change in this unstable manifold. Interpret what you see.

Discussion: This equation (interpreted as an ODE on R2, not on the cylinder S1 × R) has an energy function that decreases along orbits,

H (x, y) = y2/2 − cos x − µx. (5.68)

See Exercise 3(c) above.


14. Recall from the Exercises of Chapter 4 the FitzHugh-Nagumo equations,

$$
\begin{aligned}
x' &= x(1 - x^2) - y + I \\
y' &= \varepsilon (x - \gamma y)
\end{aligned}
\tag{5.69}
$$

where I, ε, γ are parameters with ε, γ positive. These equations will be discussed later, including where they come from.

(a) If γ < 1, show that for every I, equation (5.69) has a unique equilibrium solution.

(b) Deduce from Figure ?? and Table 5.2 that in such cases the unique equilibrium is either a sink or a source.



and has the following property: There is a neighborhood V of the equilibrium b∗ such that, for any initial condition b ∈ V, if ϕ(t, b) ∈ V, then ϕ(t, b) ∈ Mc. Indeed, center manifolds are an extremely useful tool in bifurcation theory. However, in this book we rely on more elementary techniques, even though this means we must leave some results unproved.




d × d matrix, then the linear system x′ = Ax has periodic solutions if and only if A has at least one complex-conjugate pair of pure imaginary eigenvalues.

The following lemma and its corollary will help in analyzing the next example.

Lemma 6.1.1. Let x(t) be a continuous function on a closed interval [t1, t2] that satisfies x′ = F(x) for t1 < t < t2 and moreover

x(t1) = x(t2). (6.2)

Then (i) the maximal interval of existence of x(t) is −∞ < t < ∞, and (ii) x(t) is periodic with period τ = t2 − t1.

Proof. Let us define x(t) for all t by extending x periodically: thus, for all t

x(t + τ ) = x(t).

By (6.2) this extension is unambiguously defined and continuous on R. Since the equation x′ = F(x) is autonomous, the extension satisfies the ODE on any translate of the original (open) interval, (t1 + nτ, t2 + nτ) where n is an integer. By Lemma 3.2.8, the extension in fact satisfies the equation everywhere. By uniqueness, this periodic extension equals the maximal solution derived from the original solution.

Corollary 6.1.2. If the trajectory of a solution x of x′ = F(x), defined for all t, is contained in a closed C1 curve Γ and if there are no equilibria of x′ = F(x) on Γ, then x is periodic.

Proof. Since Γ is compact and has no equilibria, the minimum speed along Γ (i.e., min_Γ |F(x)|) is nonzero. Thus x will complete the circuit of Γ in some time less than T, the length of Γ divided by this minimum speed. This shows that there is a time τ such that x(τ) = x(0), and we may therefore apply the lemma.

Example 2: (An equation with a conserved energy) Duffing’s equation (1.25), without friction or forcing, is the system

$$
x' = y, \qquad y' = x - x^3. \tag{6.3}
$$

As we have seen, the energy E(x, y) = y²/2 − x²/2 + x⁴/4 remains constant along solution trajectories. There are two exceptional values of C for the level sets E(x, y) = C: i.e., C = 0, the energy of the saddle point in Figure 6.1, and C = −1/4, the energy of the two minima. Apart from these, every level set E(x, y) = C is a closed curve (with two components if C < 0, but no matter). Applying the Corollary, we conclude that all solutions of (6.3) with energy different from 0 and −1/4 are periodic.

Figure 6.1: Double-well potential energy function E(x, y) for Duffing’s equation (6.3). Energy remains constant along solution trajectories.

Remark: The level set E(x, y) = 0 consists of three orbits, the equilibrium (0, 0) plus two loops (see Figure 5.9 in Chapter 5). As we saw in Section 5.5.3, the loops equal the (global) stable and/or unstable manifold of the saddle point, which in fact coincide. The loops represent a limiting case of a periodic orbit: i.e., a trajectory that closes up on itself, but only in an infinite amount of time. More formally, if x(t), −∞ < t < ∞, is a solution of an autonomous ODE such that x(t) converges to the same point as t → −∞ and t → ∞, then its orbit {x(t) : t ∈ R} is called a homoclinic orbit. Both loops of the level set E(x, y) = 0 are homoclinic orbits.

In both of the preceding examples, there are infinitely many periodic orbits. Of greater interest to us will be isolated periodic orbits known as limit cycles. (This name derives from the fact that, for planar systems, nearby trajectories approach the periodic orbit in one of the limits t → ±∞.) As our numerics show, van der Pol’s system (6.1) has such a periodic orbit, and we shall derive this analytically later in this chapter. In Chapter 7 we will see that, for certain parameter values, the augmented Lotka-Volterra system (1.41) and the activator-inhibitor system (4.24) also have such solutions. In the meantime, here are two examples for which, even with just our present techniques, we can show analytically that such an orbit exists.

Example 3: (An academic example) Recall the system considered in Exercise 3(e) from Chapter 1,

\[ x' = x - y - (x^2 + y^2)x, \qquad y' = x + y - (x^2 + y^2)y. \tag{6.4} \]

The circle of radius 1 is a limit cycle of this equation. Indeed, rewriting the system in polar coordinates,

\[ r' = r(1 - r^2), \qquad \theta' = 1, \tag{6.5} \]

we found explicit solutions of this system. Even without the explicit solutions, one may see from (6.5) that the angular variable θ increases at constant rate and, unless r(0) = 0, the radial variable r approaches 1 as t → ∞. Nearby trajectories are attracted to the periodic orbit r = 1 as t → ∞.
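As a quick numerical illustration of this attraction, the following Python sketch integrates the Cartesian system (6.4) with a hand-written fourth-order Runge-Kutta step and reports the distance from the origin at the final time. The step size, final time, starting points, and function names are our own illustrative choices, not taken from the text.

```python
import math

def rhs(x, y):
    """Right-hand side of system (6.4)."""
    s = x * x + y * y
    return x - y - s * x, x + y - s * y

def rk4_step(x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1x, k1y = rhs(x, y)
    k2x, k2y = rhs(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = rhs(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = rhs(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def final_radius(x0, y0, t_end=20.0, h=0.01):
    """Integrate from (x0, y0) up to t_end and return r = sqrt(x^2 + y^2)."""
    x, y = x0, y0
    for _ in range(int(round(t_end / h))):
        x, y = rk4_step(x, y, h)
    return math.hypot(x, y)

# Starting points inside and outside the unit circle (arbitrary choices).
for start in [(0.1, 0.0), (3.0, 0.0), (0.0, -0.5)]:
    print(start, final_radius(*start))
```

Each printed radius should be close to 1, whether the trajectory starts inside or outside the unit circle.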

Example 4: (The torqued pendulum) Recall from Section 4.3.4 the system describing the torqued pendulum,

\[ x' = y, \qquad y' = -\sin x - \beta y + \mu. \tag{6.6} \]

Suppose µ > 1; i.e., suppose the torque is large enough to overcome the pull of gravity, no matter what the angle x of the pendulum may be. Under this hypothesis, we will construct a solution $(x^*(t), y^*(t))$ of (6.6) such that the pendulum continues to rotate indefinitely in a periodic fashion. Strictly speaking, if (6.6) is regarded as an ODE on R × R, this solution is not a periodic function; rather it satisfies

\[ x^*(t + \tau) = x^*(t) + 2\pi, \qquad y^*(t + \tau) = y^*(t) \tag{6.7} \]

for an appropriate period τ. However, the RHS of (6.6) is periodic in x, and as in Section 4.3.4 we regard this equation as an ODE on the cylinder $S^1 \times R$. In this sense a solution that satisfies (6.7) is periodic: i.e., its orbit is a closed curve¹ on $S^1 \times R$.

To construct this solution, we first eliminate time from (6.6); y as a function of x satisfies the scalar ODE dy/dx = F(y, x), where

\[ F(y, x) = \frac{y'}{x'} = \frac{\mu - \sin x}{y} - \beta. \tag{6.8} \]

Choose constants ε and M such that

\[ 0 < \varepsilon < \frac{\mu - 1}{\beta}, \qquad M > \frac{\mu + 1}{\beta}. \]

Then F(ε, x) > 0 and F(M, x) < 0 for all x. Thus, as illustrated in Figure 6.2, if ε ≤ b ≤ M, then the solution of the IVP

\[ \frac{dy}{dx} = F(y, x), \qquad y(0) = b \tag{6.9} \]

is trapped between the lines y = ε and y = M.

Figure 6.2: Trajectories for (6.8) in the strip ε < y < M with β = 1/2, µ = 3/2, ε = 1/2, and M = 6. The bold trajectory is such that $y(2\pi) = y(0) = b^*$. (Horizontal axis: x from 0 to 2π; vertical axis: y from 0 to M.)

¹ Similar issues are implicit in the transformation of (6.4) to polar coordinates. If (6.5) were regarded as an ODE on (0, ∞) × R, the orbit r = 1 would not be periodic. Of course because (r, θ) represent polar coordinates on $R^2$, it is most natural to regard (6.5) as an ODE on the manifold $(0, \infty) \times S^1$.

More formally, we have:

Claim 1: If ε ≤ b ≤ M, then the solution ϕ(x, b) of the IVP (6.9) exists for all x ≥ 0 and moreover satisfies ε ≤ ϕ(x, b) ≤ M.

Regarding an analytical proof of the claim, it does not actually follow from any specific result we have articulated above. As a worthwhile review exercise, we invite you to construct a rigorous proof of the claim from techniques developed in Chapter 4.

Claim 2: For x > 0, the derivative ∂ϕ/∂b satisfies the estimate

\[ 0 < \frac{\partial \phi}{\partial b}(x, b) < 1. \]

Proof of Claim 2. According to Theorem ??, Chapter 4, the solution of (6.9) depends differentiably on the initial condition b, and moreover ∂ϕ/∂b(x, b) satisfies the linear IVP

\[ \frac{dv}{dx} = -\frac{\mu - \sin x}{\phi^2(x, b)}\, v, \qquad v(0) = 1, \]

where the RHS of the equation was obtained by differentiation of (6.8) with respect to y. The claim follows from the observation that the coefficient of v in this equation is negative: the solution v is the exponential of the integral of this coefficient, so it is positive and, for x > 0, strictly less than v(0) = 1.

Figure 6.3: Graph of the map P : [ε, M] → (ε, M) constructed from (6.6), using the same parameters as in Figure 6.2. (Horizontal axis: b; vertical axis: P(b). The graph crosses the line of slope 1 at $b = b^*$.)

Define a map P : [ε, M] → (ε, M) by the formula P(b) = ϕ(2π, b). (This map is a special case of what is called the Poincaré map, which we will study in Section 6.5.) As illustrated in Figure 6.3, by our claims above there is a unique point $b^* \in (\varepsilon, M)$ where the graph of P crosses the diagonal in [ε, M] × [ε, M]. Indeed, P(b) − b is positive at b = ε, negative at b = M, and strictly decreasing because 0 < P′(b) < 1.
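The construction of P and its fixed point can be explored numerically. The sketch below is only an illustration under stated assumptions: it uses the parameter values quoted in the caption of Figure 6.2 (β = 1/2, µ = 3/2, ε = 1/2, M = 6), approximates ϕ(2π, b) with a classical Runge-Kutta scheme of our choosing, and locates the fixed point by bisection on P(b) − b. The function names and the number of steps are hypothetical choices, not from the text.

```python
import math

BETA, MU = 0.5, 1.5        # parameter values from the caption of Figure 6.2
EPS, M = 0.5, 6.0          # strip boundaries from the same caption

def F(y, x):
    """Right-hand side of (6.8): dy/dx = (mu - sin x)/y - beta."""
    return (MU - math.sin(x)) / y - BETA

def poincare_map(b, n_steps=2000):
    """Approximate P(b) = phi(2*pi, b) by RK4 in the variable x."""
    h = 2.0 * math.pi / n_steps
    x, y = 0.0, b
    for _ in range(n_steps):
        k1 = F(y, x)
        k2 = F(y + 0.5 * h * k1, x + 0.5 * h)
        k3 = F(y + 0.5 * h * k2, x + 0.5 * h)
        k4 = F(y + h * k3, x + h)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def fixed_point(lo=EPS, hi=M, tol=1e-10):
    """Bisection on P(b) - b, which is positive at lo and negative at hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poincare_map(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b_star = fixed_point()
print("b* is approximately", b_star)
print("P(b*) - b* =", poincare_map(b_star) - b_star)
```

Bisection is used rather than fixed-point iteration simply because the sign change of P(b) − b on [ε, M] is exactly what the two claims guarantee; iterating P would also converge, since P is a contraction.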

Now let $(x^*(t), y^*(t))$ be the solution of (6.6) with initial conditions $x^*(0) = 0$, $y^*(0) = b^*$. It follows from Theorem 4.2.1 that this IVP has a solution for all t ∈ R. Since $b^*$ is a fixed point of P, there is a time τ such that $x^*(\tau) = 2\pi$, $y^*(\tau) = b^*$. Now

\[ x^*(t + \tau) - 2\pi, \qquad y^*(t + \tau) \tag{6.10} \]

also satisfies (6.6) and has the same initial conditions as $x^*(t), y^*(t)$. Thus, by uniqueness, (6.10) coincides with $x^*(t), y^*(t)$, which therefore provides our desired periodic solution of (6.6).

To conclude this introduction, let us record a couple of simple properties of periodic solutions of an ODE that almost don’t require proof.

Proposition 6.1.3. If x0(t) is a periodic solution of x′ = F(x), then (i) there are no equilibria on the orbit of x0 and (ii) x0 has a minimal period.

Proof. Regarding Claim (i), if $x_0(t^*) = b^*$ where $b^*$ is an equilibrium of x′ = F(x), then $y(t) \equiv b^*$ and $x_0(t)$ are two different solutions of the IVP

\[ x' = F(x), \qquad x(t^*) = b^*, \]

contradicting uniqueness. Regarding Claim (ii), let

\[ S = \{\tau > 0 : x_0(\tau) = x_0(0)\}. \]

Suppose that there were a sequence $\tau_n \in S$ converging to zero. Then

\[ x_0'(0) = \lim_{n \to \infty} \frac{x_0(\tau_n) - x_0(0)}{\tau_n} = 0; \]

in other words, $x_0(0)$ would be an equilibrium, so this assumption contradicts Claim (i). Hence the elements of S are bounded away from zero and, by continuity of $x_0$, S must contain a minimal element $\tau_{\min} > 0$. By Lemma 6.1.1, every element of S is a period of $x_0$, and $\tau_{\min}$ is a minimal period.

Remark: If τ is the minimal period of $x_0$, then the function $x_0 : [0, \tau] \to R^d$ defines a closed curve that has no self-intersections: it is closed because $x_0(\tau) = x_0(0)$, and it has no self-intersections because if $x_0(t_1) = x_0(t_2)$ for $0 \le t_1 < t_2 < \tau$, then by Lemma 6.1.1, $t_2 - t_1$ would be a smaller period of $x_0$. In other words, $x_0 : [0, \tau] \to R^d$ is a simple closed curve, what in complex analysis (where d = 2) is called a Jordan curve. The Jordan Curve Theorem is the basis of the special behavior of ODEs in two dimensions, which is discussed in Section 6.2.

6.1.2 Contents of this chapter

The qualitative theory of ODE, in particular the question of asymptotic behavior of solutions as t → ±∞, provides an instructive perspective on oscillatory solutions. The simplest asymptotic behavior as t → ∞ of a solution that remains bounded is to converge to an equilibrium point. Indeed, in the previous chapter we saw that near an asymptotically stable equilibrium every solution has this behavior, and that near a hyperbolic equilibrium solutions belonging to the stable manifold Ms have this behavior. Limit cycles represent the next level of complexity in possible asymptotic behavior of solutions.

Despite numerous analogies, limit cycles are more difficult to analyze than equilibria; even showing that they exist can be challenging. In Section 6.2, we introduce one of the two general analytical tools² in this book for proving existence of periodic solutions, namely the Poincaré-Bendixson theorem, and we use it to show that the van der Pol equation has a periodic solution. Since this theorem is ultimately based on the Jordan Curve Theorem, it applies only to two-dimensional systems. In more general terms, Section 6.2 explores special properties of two-dimensional systems, including a criterion for non-existence of periodic solutions.

² The other tool, which applies in any dimension, is the Hopf bifurcation theorem in Section 7.6.

Naturally, we also want to describe limit cycles as opposed to merely proving that they exist, which is the focus of Sections 6.3 and 6.4. There are three kinds of techniques for this task:

• Numerical computation

• Asymptotic perturbation theory

• Rigorous mathematical analysis.

Virtually any problem is amenable to numerical solution; the limitation of this technique is that one may solve equations only with specific values of the parameters in them, which can make it difficult to get an overview of the behavior of solutions. Asymptotics, which works by deriving simpler, approximate problems that can be solved explicitly, is applicable only if there is a small or large parameter that can be exploited; on the other hand, it often provides an excellent overview of the behavior of solutions. Rigorous analysis is the least general of the three methods: new arguments must be developed for each new problem, and many problems are too complicated for complete analysis. However, the attraction of complete rigor is irresistible for many.

Remark: It is instructive to ask yourself which method you find most appealing; your preference provides guidance about possible career choices, or at least specializations within mathematics. If you like numerics best, consider scientific computation; if you like asymptotics best, consider traditional applied mathematics; if you like rigorous methods best, consider mathematical analysis.

In this book, our approach to these three techniques is as follows: Rather than study the vast arsenal of numerical techniques that have been developed to solve ODEs, we rely on existing software; if you wish to explore this fascinating subject further, start for example with [1]. Likewise we slight rigorous analysis, the third technique; for the application of such methods to van-der-Pol-like equations, we refer you to [?, ?]. Regarding the second, in two sections of this chapter we illustrate the use of asymptotic methods to describe limit cycles in ODEs. Specifically, in Section 6.3 we study the van der Pol equation in the limit of small β, and in Section 6.4 we study a more general class of problems that includes the van der Pol equation in the opposite limit of large β. However, asymptotics is only a secondary focus for this book, and we merely scratch the surface; [3] is an appropriate reference going beyond our limited coverage.

As with equilibria, there is a notion of stability for limit cycles, driven by the question, “What happens if we start from initial conditions that are ‘close’ to a limit cycle?” In Section 6.5, we define stability notions for limit cycles and introduce a general theoretical technique for analyzing their stability: the Poincaré map. Reminiscent of Theorem 5.1.1, the stability or instability of a limit cycle may be determined from the eigenvalues of a certain matrix related to the Poincaré map. However, unlike Theorem 5.1.1, it is rarely possible actually to calculate this

Figure 6.4: Annular trapping region K of the sort used when applying Theorem 6.2.1 to prove existence of a periodic orbit. The inner boundary of the region encloses the equilibria (bold dots) so that none are contained in K itself.

and for definiteness we will let ε = 1/2 (see Figure 6.4). Now $\partial K = \partial K_0 \cup \Gamma$, where Γ = {r = ε} is the circle of radius ε. We know from Section 4.3 that the flow of (6.1) is inward along $\partial K_0$, the outer boundary of K. Let us parametrize Γ, the inner boundary of K, by θ. On Γ the inward normal (i.e., the normal pointing into K) is $N_\theta = (\cos\theta, \sin\theta)$, and we calculate that

\[ \langle F, N_\theta \rangle = \beta \left(1 - \varepsilon^2 \cos^2\theta\right) \varepsilon \sin^2\theta \ge 0. \]

Hence K is a trapping region for (6.1) that contains no equilibria, so there must be a periodic orbit of (6.1) inside K.

In fact, although we cannot conclude this from Theorem 6.2.1, there is a unique periodic orbit of the van der Pol equation inside K. We have observed this fact in computations; we shall derive it with asymptotics in the limit of large or small β; and we refer you to [?] for a rigorous proof for all β.

Remark: Because of the equilibrium of (6.1) at the origin, only an annular trapping region can be equilibrium-free. This is a general phenomenon: by Theorem ??, a periodic orbit of a planar system x′ = F(x) must enclose at least one equilibrium. Consequently, whenever we want to obtain a periodic orbit by applying Theorem 6.2.1, the trapping region K will need to have one or more holes in it.

6.2.3 Limit sets

We now introduce a concept used in the formulation of the strong version of the Poincaré-Bendixson Theorem. Unlike the rest of Section 6.2, this concept makes sense in arbitrary dimension.



Recall from Section 4.4.2 the flow notation ϕ(t, b) for the solution of an IVP

\[ x' = F(x), \qquad x(0) = b. \tag{6.12} \]

A point z is called an omega-limit point of b if ϕ(t, b) is defined for all t ≥ 0 and there exists a sequence $t_n$ of real numbers tending to infinity such that

\[ \lim_{n \to \infty} \phi(t_n, b) = z. \]

The set of all omega-limit points of b will be denoted ω(b). Incidentally, the alpha-limit set, consisting of points obtained in the limit as t → −∞, is defined analogously, but we will not make much use of this latter concept.

The omega-limit set certainly can be empty, as illustrated by the scalar ODE x′ = x with x(0) = b ≠ 0. Of course ω(b) is non-empty if the (forward) orbit through b is bounded. Here are some examples of omega-limit sets.

Example 1: (A single point) If $b^*$ is an asymptotically stable equilibrium of x′ = F(x), then there exists a neighborhood V of $b^*$ such that $\omega(b) = \{b^*\}$ for all b ∈ V. Similarly, if $b^*$ is a hyperbolic equilibrium and if b ∈ Ms, the stable manifold of $b^*$, then $\omega(b) = \{b^*\}$.

Example 2: (A limit cycle) Example 3 in Section 6.1 was the ODE (6.4), which in polar coordinates reads

\[ r' = r - r^3, \qquad \theta' = 1. \]

We saw that Γ, the unit circle, was a limit-cycle orbit of (6.4) and that every non-equilibrium solution of (6.4) approaches Γ. Thus, in the present terminology, ω(b) = Γ for all b ≠ 0. Similar behavior occurs for van der Pol’s equation.

Example 3: (Homoclinic cycles) Let us generalize the θ-equation in the preceding example to

\[ r' = r - r^3, \qquad \theta' = \phi(r, \theta). \tag{6.13} \]

If as in (5.12) we have ϕ(r, θ) = 1 − cos θ, then every non-zero trajectory converges to the equilibrium at (r = 1, θ = 0). However, addition of a term $(r - 1)^2$ to ϕ(r, θ) changes the omega-limits greatly. In Figure ?? we show the flow for

\[ \text{(a)}\ \ \phi(r, \theta) = 1 - \cos\theta + (r - 1)^2, \qquad \text{and} \qquad \text{(b)}\ \ \phi(r, \theta) = 1 - \cos 2\theta + (r - 1)^2. \tag{6.14} \]

In both cases ω(b) is the unit circle Γ if b ≠ 0 and |b| ≠ 1. However, for (6.14a), Γ consists of the equilibrium (r = 1, θ = 0) and a homoclinic orbit connected to it as t → ±∞, while for (6.14b), Γ consists of two equilibria, (r = 1, θ = 0) and (r = 1, θ = π), and heteroclinic orbits (i.e., different limits as t → ±∞) connecting these equilibria. A simple closed curve consisting of one or more equilibria of an ODE and orbits connecting these equilibria is called a homoclinic cycle. More interesting examples of this type of omega limits will arise naturally in Chapter 8.

Example 4: (Limit sets in higher dimensions) In the Exercises we ask you to construct a three-dimensional system for which the typical omega-limit set is a torus, the direct product of two circles. Actually, the omega-limit set can be far more complicated than this in three or more dimensions. In fact, in the 1960s mathematicians were so perplexed by the limit sets they observed that they coined the pejorative phrase “strange attractor”. We will get a chance to dig into this rich treasure in Chapter 8.

Primarily we define omega-limit sets in this chapter in order to formulate the strong Poincaré-Bendixson Theorem. Although it is a slight digression, let us pause to derive a couple of simple properties of such sets. Let ϕ(t, b) be the flow associated with an ODE x′ = F(x) where $F : U \to R^d$. We say that V ⊂ U is invariant with respect to the flow if ϕ(t, b) ∈ V for all b ∈ V and all t ∈ R. (Incidentally, positive invariance of a set V is a less restrictive concept, requiring only that ϕ(t, b) ∈ V for t ≥ 0.)

Proposition 6.2.2. Any omega-limit set ω(b) associated with a solution x(t) of an ODE is a closed, invariant subset of U.

Proof. If ω(b) is empty, the assertion is trivial. Suppose that $z_m$ is a sequence of points in ω(b) that converges to z. Then we must show that z ∈ ω(b): i.e., there exists a sequence $t_n$ tending to infinity such that $\lim_n x(t_n) = z$. Since $z_m \in \omega(b)$, there exist sequences $s^{(m)}_k$, all tending to infinity as k → ∞, such that $\lim_k x(s^{(m)}_k) = z_m$. For n = 1, 2, . . ., choose $t_n = s^{(n)}_{k(n)}$, with $t_n \ge n$, such that

\[ |x(t_n) - z_n| < \frac{1}{n}. \]

Then

\[ |x(t_n) - z| \le |x(t_n) - z_n| + |z_n - z|; \]

since both terms on the right tend to zero, the first claim in the proposition is proved.

Regarding invariance, suppose $z_0 \in \omega(b)$; thus there is a sequence $t_n$ tending to infinity such that $z_0 = \lim_n \phi(t_n, b)$. Given any point $z = \phi(t^*, z_0)$ on the trajectory through $z_0$, consider the sequence $\phi(t_n + t^*, b)$. If $t^* < 0$, then early elements $\phi(t_n + t^*, b)$ in the sequence might be undefined if $t_n + t^* < 0$, but let us restrict n ≥ N to exclude these problematic elements. By the semi-group property (Proposition 4.4.3),

\[ \phi(t^* + t_n, b) = \phi(t^*, \phi(t_n, b)), \]

and by continuity

\[ \lim_n \phi(t^*, \phi(t_n, b)) = \phi(t^*, \lim_n \phi(t_n, b)) = \phi(t^*, z_0) = z. \]

Hence z ∈ ω(b), as claimed.

6.2.4 The Poincare-Bendixson Theorem: strong version

Theorem 6.2.3. (Poincaré-Bendixson): Let $F : U \to R^2$ be $C^1$ on U, where U is positively invariant with respect to the flow and contains only finitely many equilibria. If b ∈ U, then ω(b) either

(i) consists of a single point, (ii) is a periodic orbit, or (iii) is a homoclinic cycle.

Examples 1–3 of Section 6.2.3 illustrate possibilities (i)–(iii) of the theorem.

Incidentally, the hypothesis that F has only finitely many equilibria in U is essential. A simple counterexample is provided by (6.13) with $\phi(r, \theta) = (r - 1)^2$. In this case most trajectories converge to the unit circle Γ, but Γ is an infinite union of equilibria. A more perverse example may be constructed using the following special case of a result from [?] (Malgrange, Ideals of Differentiable Functions): For any closed subset K ⊂ Γ, there is a non-negative, $C^\infty$ function ψ : Γ → R such that ψ(θ) = 0 iff θ ∈ K. Now consider (6.13) with $\phi(r, \theta) = (r - 1)^2 + \psi(\theta)$. Again most trajectories converge to Γ, but now Γ may be a horrible jumble of infinitely many equilibria plus orbits connecting them.

6.2.5 Dulac’s Theorem

We conclude our discussion of two-dimensional systems with a proposition that shows non-existence of periodic solutions, plus an application of that result.

Proposition 6.2.4. (Dulac). Suppose that $F : U \to R^2$ is $C^1$ on the open, simply connected set $U \subset R^2$. If there exists a $C^1$ function g : U → R such that the divergence ∇ · (gF) is non-negative and is not identically zero on any open subset of U, then the ODE x′ = F(x) has no periodic solutions lying entirely within U.

Remarks: (i) The same conclusion follows if ∇ · (gF) is non-positive and is not identically zero on any open subset of U.

(ii) The proof below does not shed much light on why the proposition is true. As we explore in Exercise ??, such intuition can be derived from considering how the area of regions evolves under the flow ϕ.

Proof of Proposition 6.2.4. Suppose to the contrary that there exists a simple, closed orbit Γ, and let Ω denote the interior of Γ. By Green’s Theorem,

\[ \iint_\Omega \nabla \cdot (gF)\, dA = \oint_\Gamma (gF) \cdot n\, ds, \]

where n, ds, and dA have their usual meanings. By our assumptions regarding ∇ · (gF), the double integral on the LHS is strictly positive. By contrast, the contour integral on the RHS is zero since the velocity vector F(x) = x′ is tangent to Γ and therefore orthogonal to the normal n.

As an interesting application of the two-dimensional theory, recall the modifications of the Lotka-Volterra model for a predator-prey system introduced in Section 1.6. Specifically, consider (1.41) with the carrying capacity K set equal to infinity:

\[ x' = x\,\frac{x - \varepsilon}{x + \varepsilon} - xy, \qquad y' = \rho(xy - y), \tag{6.15} \]

where ρ > 0 and ε ≥ 0. If ε = 0, trajectories of (6.15) in the (open) first quadrant are periodic. The seemingly innocent factor (x − ε)/(x + ε), which is approximately equal to 1 if x is large, changes the dynamics completely when ε > 0: we claim that, apart from the equilibria, all trajectories converge to total extinction, (x, y) = (0, 0).

The first step in proving the claim is to apply Dulac’s theorem with g(x, y) = 1/(xy) and F(x, y) given by the RHS of (6.15). We calculate

\[ \nabla \cdot (gF) = \frac{2\varepsilon}{y(x + \varepsilon)^2}, \]

which is strictly positive throughout the first quadrant when ε > 0. It follows that (6.15) has no periodic solutions in the biologically meaningful regime x > 0, y > 0.
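For readers who want to check the divergence, here is one way to organize the calculation (a sketch; the intermediate steps are ours). Multiplying the RHS of (6.15) by g = 1/(xy) and differentiating componentwise gives:

```latex
\begin{aligned}
gF &= \left( \frac{x - \varepsilon}{y\,(x + \varepsilon)} - 1,\ \ \rho\left(1 - \frac{1}{x}\right) \right), \\[4pt]
\nabla \cdot (gF)
  &= \frac{\partial}{\partial x}\left[ \frac{x - \varepsilon}{y\,(x + \varepsilon)} - 1 \right]
   + \frac{\partial}{\partial y}\left[ \rho\left(1 - \frac{1}{x}\right) \right] \\[4pt]
  &= \frac{1}{y} \cdot \frac{(x + \varepsilon) - (x - \varepsilon)}{(x + \varepsilon)^2} + 0
   = \frac{2\varepsilon}{y\,(x + \varepsilon)^2}.
\end{aligned}
```

The second component of gF does not depend on y, which is why the choice g = 1/(xy) leaves only the contribution of the factor (x − ε)/(x + ε).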

In Exercise ?? we help the reader complete the proof of the claim by combining the above information with the strong Poincaré-Bendixson Theorem.

6.3 Limit cycles in the van der Pol equation for small β

In perturbation theory, one calculates approximate solutions of problems whose exact solutions are not easily computed. When used with appropriate care, perturbation methods often produce approximations that are accurate well beyond what one has any right to expect. In this section we use perturbation theory to describe the limit cycle of the van der Pol equation in the limit of small β. In Section 6.3.1 we introduce perturbation theory through two examples, and in Section 6.3.2 we make the application to the van der Pol equation.

6.3.1 Two illustrative examples of perturbation theory

Example 1: Consider the one-parameter family of initial value problems,

\[ x' = -x + \varepsilon x^2, \qquad x(0) = 1, \tag{6.16} \]

where ε is a small parameter. If ε = 0, then (6.16) has the solution $x(t) = e^{-t}$. Even if ε ≠ 0, (6.16) may be solved exactly because the equation is separable. However, let us temporarily ignore this exact solution and use perturbation theory to obtain the approximation

\[ x(t) \approx e^{-t} + \varepsilon\left(e^{-t} - e^{-2t}\right). \tag{6.17} \]

In other words the small term $\varepsilon x^2$ changes the solution by $\varepsilon(e^{-t} - e^{-2t})$, at least approximately.

In perturbation theory, in attacking a one-parameter family of problems like (6.16), one considers all small values of ε simultaneously. To emphasize this point of view we write x(t; ε) for the solution, indicating the dependence on ε, and we suppose x(t; ε) has a power-series expansion:

\[ x(t; \varepsilon) = x_0(t) + \varepsilon x_1(t) + \varepsilon^2 x_2(t) + \dots . \tag{6.18} \]

Even if the series might not converge, each term in the series should be small compared to all terms that precede it. Inserting the expansion into (6.16) yields

\[ x_0' + \varepsilon x_1' + \varepsilon^2 x_2' + \dots = -\left[x_0 + \varepsilon x_1 + \varepsilon^2 x_2 + \dots\right] + \varepsilon\left[x_0 + \varepsilon x_1 + \varepsilon^2 x_2 + \dots\right]^2, \]

where the dots indicate terms that are of order $\varepsilon^3$ or higher. Expanding out the squared term we obtain

\[ x_0' + \varepsilon x_1' + \varepsilon^2 x_2' + \dots = -x_0 + \varepsilon\left[-x_1 + x_0^2\right] + \varepsilon^2\left[-x_2 + 2 x_1 x_0\right] + \dots . \]

For each t, the LHS and RHS of this equation are functions of ε, and for them to be the same functions, the coefficient of each power of ε on the left must equal the corresponding coefficient on the right. This principle may be used to calculate ODEs for every coefficient $x_n(t)$ in (6.18). In particular, matching terms of corresponding orders through $\varepsilon^2$ generates the ODEs

\[
\begin{aligned}
O(\varepsilon^0) &: \quad x_0' + x_0 = 0, \\
O(\varepsilon^1) &: \quad x_1' + x_1 = x_0^2, \\
O(\varepsilon^2) &: \quad x_2' + x_2 = 2 x_1 x_0.
\end{aligned}
\]

Since the initial condition x(0; ε) = 1 holds for all ε, it follows that

\[ x_0(0) = 1, \qquad x_1(0) = 0, \qquad x_2(0) = 0, \quad \dots . \]

We attack the equations sequentially. First, $x_0(t) = e^{-t}$ satisfies the $O(\varepsilon^0)$-IVP. Given $x_0(t)$, the O(ε)-problem is an inhomogeneous IVP whose solution is $x_1(t) = e^{-t} - e^{-2t}$. In the Exercises we ask you to solve the $O(\varepsilon^2)$-IVP for $x_2(t)$.
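For completeness, here is one way to obtain the O(ε) solution quoted above, using the integrating-factor method for first-order linear equations from Section 1.3.2 (a sketch; the arrangement of the steps is ours):

```latex
\begin{aligned}
x_1' + x_1 &= x_0^2 = e^{-2t}, \qquad x_1(0) = 0, \\
\frac{d}{dt}\left( e^{t} x_1 \right) &= e^{t} \cdot e^{-2t} = e^{-t}, \\
e^{t} x_1(t) &= \int_0^t e^{-s}\, ds = 1 - e^{-t}, \\
x_1(t) &= e^{-t} - e^{-2t}.
\end{aligned}
```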

The first two terms of (6.18) yield the approximation (6.17). To assess the accuracy of this approximation, we solve (6.16) explicitly via separation of variables, obtaining

\[ x(t, \varepsilon) = \frac{1}{\varepsilon + (1 - \varepsilon)e^{t}}. \]

Please check that

\[ x(t, \varepsilon) = x_0(t) + \varepsilon x_1(t) + O(\varepsilon^2), \tag{6.19} \]

as expected.
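One informal way to check (6.19) is numerical: if the error is O(ε²), then halving ε should reduce the maximum error by roughly a factor of four. The Python sketch below compares the exact solution with the two-term approximation (6.17) on a grid of times; the grid, the time window, the sample values of ε, and the function names are our own illustrative choices.

```python
import math

def exact(t, eps):
    """Exact solution of (6.16) obtained by separation of variables."""
    return 1.0 / (eps + (1.0 - eps) * math.exp(t))

def two_term(t, eps):
    """Two-term perturbation approximation (6.17)."""
    return math.exp(-t) + eps * (math.exp(-t) - math.exp(-2.0 * t))

def max_error(eps, t_max=20.0, n=2000):
    """Largest absolute error on a uniform grid of n + 1 times in [0, t_max]."""
    return max(abs(exact(t, eps) - two_term(t, eps))
               for t in (t_max * k / n for k in range(n + 1)))

for eps in (0.1, 0.05, 0.025):
    err = max_error(eps)
    # If the error is O(eps^2), the last column should be roughly constant.
    print(eps, err, err / eps ** 2)
```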

In point of fact, (6.19) holds uniformly for 0 ≤ t < ∞. Such uniformity is rare; one would expect errors in the approximation to accumulate as time increases. Thus, normally (6.19) would be uniform only over finite intervals, say 0 ≤ t ≤ T. Problem (6.16) was hand-picked so that all coefficients $x_n(t)$ in (6.18) decay as t → ∞, meaning that the estimate (6.19) is uniform over [0, ∞), but only because both sides tend to zero for large t.

Example 2: Our next example illustrates the accumulation of errors in a power-series approximation as t increases and how to cope with this difficulty. Consider the one-parameter family of initial value problems

\[ x'' + (1 + \varepsilon)x = 0, \qquad x(0) = b, \quad x'(0) = 0, \tag{6.20} \]

where b ≠ 0. The exact solution of the IVP is $x(t) = b\cos(\sqrt{1 + \varepsilon}\, t)$, which is periodic; in particular it does not decay as t → ∞. Ignoring the exact solution and seeking an approximation as above, we suppose $x(t, \varepsilon) = x_0(t) + \varepsilon x_1(t) + \dots$ and insert this expansion into the ODE,

\[ x_0'' + \varepsilon x_1'' + \dots + (1 + \varepsilon)\left[x_0 + \varepsilon x_1 + \dots\right] = 0. \]

Multiplying out the product and grouping like powers of ε we obtain ODEs

\[
\begin{aligned}
O(\varepsilon^0) &: \quad x_0'' + x_0 = 0, \\
O(\varepsilon^1) &: \quad x_1'' + x_1 = -x_0,
\end{aligned}
\]

subject to initial conditions

\[ x_0(0) = b, \quad x_0'(0) = 0, \qquad x_1(0) = x_1'(0) = 0. \]

The solution of the leading-order problem is $x_0(t) = b\cos t$. Substitution of $x_0(t)$ into the O(ε)-equation leads to an inhomogeneous ODE for $x_1$ with a resonant forcing term:

\[ x_1'' + x_1 = -b\cos t. \]

Imposing the initial conditions, we find

\[ x_1(t) = -\frac{b}{2}\, t \sin t, \]

which yields the two-term asymptotic approximation

\[ x(t, \varepsilon) \approx x_0(t) + \varepsilon x_1(t) = b\cos t - \frac{\varepsilon b t}{2} \sin t. \tag{6.21} \]

The error in (6.21) is $O(\varepsilon^2)$, uniformly for t in any finite interval 0 ≤ t ≤ T. However, this approximation fails miserably as t → ∞ (see Figure 6.5). Indeed, for large t, the supposedly small, first correction to $x_0(t)$ in fact becomes large compared to $x_0(t)$!

The problem with the simple ansatz above, $x(t, \varepsilon) = x_0(t) + \varepsilon x_1(t) + \dots$, is that both x(t, ε) and $x_0(t)$ are periodic, but they have different periods, with the period of x(t, ε) depending on ε. Because of this difference, the two functions cannot remain close to one another indefinitely, no matter how small ε may be; in hindsight, obviously the correction term $\varepsilon x_1(t)$ cannot stay small as t → ∞.

A bit of terminology: in an expansion $x(t, \varepsilon) = x_0(t) + \varepsilon x_1(t) + \dots$, a term in a coefficient $x_n(t)$ that grows without bound as t → ∞ is sometimes called a secular term. Such terms typically come from resonant forcing, as above.

When dealing with an IVP such as (6.20) that has a periodic³ solution, the Poincaré-Lindstedt method allows one to obtain an approximation that holds for arbitrarily large times. In this method one introduces a scaled time

\[ \tau(t, \varepsilon) = \omega(\varepsilon)t = \left(1 + \omega_1 \varepsilon + \omega_2 \varepsilon^2 + \dots\right)t \tag{6.22} \]

and seeks a power-series expansion of x(t, ε),

\[ x(t, \varepsilon) = x_0(\tau(t, \varepsilon)) + \varepsilon x_1(\tau(t, \varepsilon)) + \varepsilon^2 x_2(\tau(t, \varepsilon)) + \dots , \tag{6.23} \]

in which the coefficients $x_n(\tau)$ depend on the scaled time. If the scaling factor ω(ε) is chosen cleverly, it can compensate for the mismatch between the periods of the exact and approximate solutions. Of course the $64 question is, “how to choose the scaling factor?” In the calculation below we will see that by requiring at each order that no secular terms arise, both the undetermined coefficient $\omega_n$ in (6.22) and the next term $x_n(\tau)$ in the series are uniquely determined.

³ There is a more general approximation technique for nonperiodic problems: the method of multiple scales, a.k.a. two-timing. See [3] for more details, or better yet, take a course in asymptotics, which is not an easy subject to learn without guidance from a pro.

Figure 6.5: Comparison of the exact solution of (6.20) with its two-term regular perturbation expansion approximation (6.21) for b = 1 and ε = 0.1, plotted for 0 ≤ t ≤ 20π. To the naked eye, the Poincaré-Lindstedt approximation (6.24) is indistinguishable from the exact solution for this choice of ε and time window.

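Let’s get on with it. Invoking the chain rule d/dt = ω d/dτ, we may rewrite the ODE in (6.20) as

\[ \omega^2(\varepsilon)\, \frac{d^2 x}{d\tau^2} + (1 + \varepsilon)x = 0. \]

Inserting the expansions for ω(ε) and x(t, ε), we obtain

\[ \left(1 + 2\omega_1 \varepsilon + \dots\right) \left[ \frac{d^2 x_0}{d\tau^2} + \varepsilon \frac{d^2 x_1}{d\tau^2} + \dots \right] + (1 + \varepsilon)\left[x_0 + \varepsilon x_1 + \dots\right] = 0. \]

Expanding the products and grouping terms according to their order in ε, the leading-order and next-order correction terms obey the equations

\[
\begin{aligned}
O(\varepsilon^0) &: \quad \frac{d^2 x_0}{d\tau^2} + x_0 = 0, \\
O(\varepsilon^1) &: \quad \frac{d^2 x_1}{d\tau^2} + x_1 = -2\omega_1 \frac{d^2 x_0}{d\tau^2} - x_0.
\end{aligned}
\]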



Since τ is proportional to t, these functions must satisfy the initial conditions at τ = 0

\[ x_0(0) = b, \quad \frac{dx_0}{d\tau}(0) = 0, \qquad x_1(0) = \frac{dx_1}{d\tau}(0) = 0. \]

The solution of the leading-order equation is $x_0(\tau) = b\cos\tau$. Given $x_0(\tau)$, the O(ε)-equation becomes

\[ \frac{d^2 x_1}{d\tau^2} + x_1 = (2\omega_1 - 1)\, b\cos\tau. \]

Now here is the key point: to avoid a secular term in $x_1(\tau)$, we must require that $\omega_1 = 1/2$, so that the RHS of this equation vanishes. The solution of the (now homogeneous) IVP for $x_1$ is the trivial function $x_1(\tau) \equiv 0$. Thus, modulo errors that are of order $\varepsilon^2$ or higher, our approximation of the true solution $x(t) = b\cos(\sqrt{1 + \varepsilon}\, t)$ is given by

\[ x(t, \varepsilon) \approx b\cos\left[(1 + \varepsilon/2)t\right]. \tag{6.24} \]

This estimate is far more satisfactory than (6.21). As mentioned in the caption of Figure 6.5, there is no visible discrepancy between (6.24) and the true solution of (6.20) (at least not over the given viewing window and with b = 1 and ε = 0.1). In Exercise ?? we ask you to compare the errors in (6.21) and (6.24) analytically.

In general terms, the error in (6.21) is small if εt ≪ 1, while the error in (6.24) is small if $\varepsilon^2 t \ll 1$. Thus, (6.24) is accurate for a much larger, but still finite, range of t. By carrying the Poincaré-Lindstedt approximation to successively higher orders, one can obtain approximations that are accurate if $\varepsilon^n t \ll 1$ for any integer n.

Here is another perspective on (6.24): since $\sqrt{1 + \varepsilon} = 1 + \varepsilon/2 + O(\varepsilon^2)$, (6.24) may be derived from the exact solution $b\cos(\sqrt{1 + \varepsilon}\, t)$ by neglecting terms of order $\varepsilon^2 t$ inside the argument of the cosine.
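The contrast between the two approximations is easy to see numerically. The following Python sketch evaluates the exact solution of (6.20), the regular expansion (6.21), and the Poincaré-Lindstedt approximation (6.24), using the values b = 1 and ε = 0.1 from Figure 6.5. The time windows, the grid resolution, and the function names are our own choices for illustration.

```python
import math

def exact(t, eps, b=1.0):
    """Exact solution of (6.20)."""
    return b * math.cos(math.sqrt(1.0 + eps) * t)

def regular(t, eps, b=1.0):
    """Two-term regular perturbation expansion (6.21)."""
    return b * math.cos(t) - 0.5 * eps * b * t * math.sin(t)

def lindstedt(t, eps, b=1.0):
    """Poincare-Lindstedt approximation (6.24)."""
    return b * math.cos((1.0 + 0.5 * eps) * t)

def max_error(approx, eps, t_max, n=20000):
    """Largest absolute error of approx on a uniform grid in [0, t_max]."""
    return max(abs(exact(t, eps) - approx(t, eps))
               for t in (t_max * k / n for k in range(n + 1)))

eps = 0.1
for t_max in (math.pi, 10.0 * math.pi, 20.0 * math.pi):
    print(t_max,
          max_error(regular, eps, t_max),
          max_error(lindstedt, eps, t_max))
```

Over the longest window the error of (6.21) exceeds the amplitude of the solution itself, while the error of (6.24) stays a small fraction of it.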

A pessimist might complain that the above calculation seems a little mysterious, and we would agree. On the other hand, an optimist might exclaim how wonderfully it all works out, and we would again agree. Over time we have found that the mystery in asymptotics resolves itself, while the wonder remains and even grows.

6.3.2 Application to the van der Pol equation

Let us now apply the Poincaré-Lindstedt method to analyze the limit-cycle solution of a nonlinear ODE for which exact solutions are not available: the van der Pol equation

\[ x''(t) + \varepsilon\left(x^2(t) - 1\right)x'(t) + x(t) = 0, \tag{6.25} \]

where ε is small. In notable contrast with the linear equation in (6.20), for which all solutions are periodic, the periodic solution of (6.25) is unique up to translation. Without loss of generality we may perform a translation in t such that a local maximum of the periodic solution of (6.25) is located at t = 0. Then this periodic solution will satisfy initial conditions

\[ x(0) = b, \qquad x'(0) = 0, \tag{6.26} \]

where b > 0 must be determined along with the solution itself.

Here goes. Defining scaled time as in (6.22), we may rewrite (6.25) as

$$\omega^2(\varepsilon)\,\frac{d^2x}{d\tau^2} + \varepsilon\,(x^2 - 1)\,\omega(\varepsilon)\,\frac{dx}{d\tau} + x = 0.$$

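Inserting the expansions for ω(ε) and x into the equation, we obtain

$$\left[1 + 2\omega_1\varepsilon + \cdots\right]\left[\frac{d^2x_0}{d\tau^2} + \varepsilon\,\frac{d^2x_1}{d\tau^2} + \cdots\right] + \varepsilon\left[(x_0^2 - 1) + \cdots\right]\left[1 + \cdots\right]\left[\frac{dx_0}{d\tau} + \cdots\right] + \left[x_0 + \varepsilon x_1 + \cdots\right] = 0,$$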

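where we have retained only those terms that contribute to orders ε^0 or ε^1. Grouping terms of like order, we calculate the equations

$$O(\varepsilon^0):\quad \frac{d^2x_0}{d\tau^2} + x_0 = 0$$

$$O(\varepsilon^1):\quad \frac{d^2x_1}{d\tau^2} + x_1 = -2\omega_1\,\frac{d^2x_0}{d\tau^2} - (x_0^2 - 1)\,\frac{dx_0}{d\tau}$$

subject to the initial conditions

$$x_0(0) = b, \quad \frac{dx_0}{d\tau}(0) = 0, \quad\text{and}\quad x_1(0) = \frac{dx_1}{d\tau}(0) = 0.$$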

The solution of the lowest-order problem is x0(τ) = b cos τ, where b is yet to be determined. We substitute x0(τ) into the O(ε) equation to obtain

$$\frac{d^2x_1}{d\tau^2} + x_1 = 2\omega_1 b\cos\tau - \left(b^2\cos^2\tau - 1\right)(-b\sin\tau).$$

The problematic resonant forcing terms 2ω1 b cos τ and b sin τ are easy to spot, but there is another troublemaker lurking here as well. Indeed, by use of the trigonometric identity

$$\sin\theta\,\cos^2\theta = \frac{1}{4}\left[\sin\theta + \sin 3\theta\right],$$

the ODE for x1 can be rewritten:

$$\frac{d^2x_1}{d\tau^2} + x_1 = 2\omega_1 b\cos\tau + \left(\frac{b^3}{4} - b\right)\sin\tau + \frac{b^3}{4}\sin 3\tau.$$




The sin 3τ term is harmless: i.e., it has the particular solution (−b^3/32) sin 3τ, which is periodic. By contrast, to avoid secular terms, we must require that ω1 b = 0 and b^3/4 − b = 0. Since b > 0 by assumption, we conclude that ω1 = 0 and b = 2. Thus our calculation has shown that, to this order, (6.25), (6.26) has a periodic solution only if b = 2. Hence x0(τ) = 2 cos τ. In other words, to lowest order, the periodic solution of (6.25) is a trigonometric oscillation of amplitude 2 and period 2π.
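The prediction b = 2 can be compared against a direct simulation of (6.25). The sketch below is only an illustration under stated assumptions: it uses a hand-written classical Runge-Kutta step (not a method from the text), a hypothetical helper `limit_cycle_amplitude`, and an arbitrary starting amplitude b0 = 1; it discards a transient and then records the largest |x| seen.

```python
def vdp_rhs(x, v, eps):
    # van der Pol equation (6.25) written as a first-order system (x, v = x').
    return v, -eps * (x * x - 1.0) * v - x

def rk4_step(x, v, eps, dt):
    # One classical fourth-order Runge-Kutta step.
    k1x, k1v = vdp_rhs(x, v, eps)
    k2x, k2v = vdp_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, eps)
    k3x, k3v = vdp_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, eps)
    k4x, k4v = vdp_rhs(x + dt * k3x, v + dt * k3v, eps)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)

def limit_cycle_amplitude(eps, b0=1.0, t_transient=200.0, t_measure=20.0, dt=0.01):
    # Integrate past the transient, then record max |x| over a few periods.
    x, v = b0, 0.0
    for _ in range(int(t_transient / dt)):
        x, v = rk4_step(x, v, eps, dt)
    amplitude = 0.0
    for _ in range(int(t_measure / dt)):
        x, v = rk4_step(x, v, eps, dt)
        amplitude = max(amplitude, abs(x))
    return amplitude

print(limit_cycle_amplitude(0.1))
```

For small ε the printed amplitude should be close to 2, with a discrepancy that the lowest-order calculation does not resolve.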

In Exercise ?? we help you continue the expansion to the next order. You will find the O(ε^2)-correction to the period and calculate the O(ε)-distortion of the orbit from a perfect sine wave. Challenge: Can you predict whether the O(ε^2)-correction will make the period longer or shorter? Hint: It may be useful to reflect on the information in the next section about the large-β behavior of solutions.

6.4 Limit cycles in the van der Pol equation for large β

6.4.1 Setting up the problem

Consider the system

$$\text{(a)}\quad x' = -y - (x^3/3 - x), \qquad \text{(b)}\quad y' = \varepsilon x \tag{6.27}$$

where ε > 0 is a small parameter. In Exercise ?? we ask you to show that if x(t), y(t) satisfy this system, then with respect to the scaled time t̄ = √ε t the first component satisfies the van der Pol equation

$$\frac{d^2x}{d\bar t^{\,2}} + \beta\,(x^2 - 1)\,\frac{dx}{d\bar t} + x = 0 \quad\text{where } \beta = 1/\sqrt{\varepsilon}. \tag{6.28}$$

Thus, we may study solutions of the van der Pol equation for large β by analyzing solutions of (6.27) for small ε. This reduction of van der Pol’s equation to a first-order system is similar to that of Exercise 8, but the scaling in (6.27) is more convenient for analyzing the large-β limit. Plug Appendix on scaling?

In Exercise ?? we ask you to carry out the following steps: (i) Following the ideas in Exercise 8 in Chapter 4, construct a rectangular trapping region K0 for (6.27). (ii) Show that the origin is the only equilibrium of (6.27) in K0 (or in R^2), and it is a source. (iii) By removing a small ball B around the origin from K0, obtain a trapping region for (6.27) that contains no equilibria. (iv) Invoke the Poincaré-Bendixson Theorem to prove that (6.27) has a periodic solution in K0 ∼ B.

6.4.2 The limit-cycle solution

Since ε is small, (6.27) is a fast-slow system. Specifically, the x-equation evolves rapidly compared to the y-equation. With such systems, it is natural to make the approximation of letting the fast equation proceed to equilibrium. Here this means solving

$$y + (x^3/3 - x) = 0 \tag{6.29}$$

for x as a function of y and substituting the result into (6.27b). It might seem more convenient to solve (6.29) for y as a function of x, but conceptually it is clearer to have x as a function of y in order to substitute into an equation for the evolution of y. With (6.27) the fast-slow approximation faces a new difficulty, not present in previous instances of this approximation: i.e., solving (6.29) gives x as a multiple-valued function of y. For this reason it is more intuitive to consider (6.27) geometrically (i.e., through pictures) rather than analytically. In geometric terms, the fact that ε is small means that, as shown in Figure 6.6, the flow is nearly horizontal, except near the x-nullcline (6.29).

Figure 6.6: A compendium of data for (6.27) with ε = 1/100. The vertical segment {(1, b) : −1 < b < 0} is shown in bold. The solution trajectory starting from the initial conditions (x(0), y(0)) = (1, −0.2) is shown along with the graph of the x-nullcline (6.29).

To explore the implications of this geometry, let us consider specific initial conditions for (6.27), say (x(0), y(0)) = (1, b), where −1 < b < 0. Such initial conditions lie on the line below the local maximum of −x^3/3 + x at (x, y) = (1, 2/3). As indicated in Figure 6.6, the solution initially moves to the right, staying close to the horizontal line y = b, until it reaches the nullcline. After this, equation (6.27b) pushes the solution upward, but at a much slower rate. As y increases, x also evolves, keeping the solution close to the nullcline; any significant departure from the nullcline would be quickly counteracted by (6.27a). As long as x > 0, (6.27b) implies that y continues to increase. However, once the solution reaches (1, 2/3), there no longer is a nullcline to follow. At this point (6.27b) pushes the solution off the nullcline; it begins rapid motion, close to the horizontal line y = 2/3, towards the left branch of the nullcline.




After reaching the nullcline, similar behavior ensues. Specifically, the solution moves slowly down the left branch of the nullcline till it reaches (−1, −2/3), after which it moves rapidly to the right, approximately along the horizontal line y = −2/3. The key observation here is this: solutions with initial conditions (x(0), y(0)) = (1, b) may start out far from one another, but after their circuit around the origin they all return close to one another, clustered around the line y = −2/3.

We use this information to argue that (6.27) has a limit-cycle solution. Define a mapping P : [−1, 0] → [−1, 0] as follows. Given b ∈ [−1, 0], follow the solution with initial condition (x(0), y(0)) = (1, b) as it evolves and let P(b) be (the y-coordinate of) the point where the solution first crosses the line x = 1 after its circuit around the origin. As observed above, P(b) ≈ −2/3 so, as need figure makes clear, P will have a unique fixed point b∗ in [−1, 0]. Lemma 3.2.8 may be invoked to show that the solution with initial conditions (x(0), y(0)) = (1, b∗) is periodic.

This limit cycle is close to the piecewise smooth curve Γ specified below. In the following description, the point (2, −2/3) in the first bullet arises as the intersection of the line y = −2/3 with the x-nullcline (6.29). Γ has four pieces (see need figure), as follows:

• Phase 1: A horizontal piece from the local minimum of the x-nullcline at (−1, −2/3), intersecting the nullcline at (2, −2/3). Here the speed is O(1).

• Phase 2: A piece that follows the x-nullcline upward from (2, −2/3) to the local maximum of the nullcline at (1, 2/3). Here the speed is O(ε).

• Phase 3: A horizontal piece from (1, 2/3), intersecting the x-nullcline at (−2, 2/3). Here the speed is O(1).

• Phase 4: A piece that follows the x-nullcline downward from (−2, 2/3) returning to (−1, −2/3). Here the speed is O(ε).

Although our discussion has been purely heuristic, reference makes this analysis completely rigorous. However, this theory is not light reading!

Let us contrast the solution of (6.27) with previous instances of fast-slow systems we have encountered. During Phase 2, the solution of (6.27) is well described by the usual fast-slow approximation: i.e., a single scalar ODE obtained by letting the fast equation proceed to equilibrium. The new element here is that the fast-slow approximation predicts its own breakdown.

Let us argue that the period of the above limit cycle is approximately

$$(3 - 2\ln 2)\,\varepsilon^{-1}. \tag{6.30}$$

Most of the time required to complete the cycle around the origin is spent in Phases 2 and 4, and by symmetry both phases last the same time; thus

Period of Γ ≈ 2 × [Time spent in Phase 2].

To estimate the time spent in Phase 2, it is convenient to alter the description above of the fast-slow approximation. Here we propose to solve (6.29) for y as a function of x and substitute the result on the left-hand side of (6.27b), giving

$$\frac{d}{dt}\left(x - x^3/3\right) = \varepsilon x.$$

Differentiating, dividing by x, and separating variables we obtain

$$(x^{-1} - x)\,dx = \varepsilon\,dt.$$

Integrating this from x = 2 to x = 1, we deduce that the duration of Phase 2 is approximately

$$(3/2 - \ln 2)\,\varepsilon^{-1},$$

from which (6.30) follows.
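The integral behind this estimate is easy to check by quadrature. The sketch below is illustrative only: `simpson` is a hand-written composite Simpson rule and `phase2_duration` is a hypothetical helper name; it integrates (x^{-1} − x) dx from x = 2 to x = 1 and divides by ε.

```python
import math

def simpson(f, a, b, n=1000):
    # Composite Simpson rule on [a, b] with n (even) subintervals.
    # It also works when b < a, in which case the step h is negative.
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def phase2_duration(eps):
    # eps * dt = (1/x - x) dx, with x running from 2 down to 1 along the nullcline.
    return simpson(lambda x: 1.0 / x - x, 2.0, 1.0) / eps

eps = 0.01
print(phase2_duration(eps), (1.5 - math.log(2.0)) / eps)
print(2.0 * phase2_duration(eps), (3.0 - 2.0 * math.log(2.0)) / eps)
```

The two numbers on each printed line should agree to many digits, since the quadrature only reproduces the closed-form value 3/2 − ln 2.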

With a tour de force application of higher-order asymptotics, it has been shown that the period of the limit cycle equals

$$(3 - 2\ln 2)\,\varepsilon^{-1} + C\varepsilon^{-1/3} + O(|\ln\varepsilon|) \tag{6.31}$$

where the constant C can be expressed in terms of a zero of the Airy function. The O(ε^{−1/3})-term is perhaps surprising since it seems like the neglected durations of Phases 1 and 3 are only O(1) as ε → 0. The subtle point is that a substantial amount of time is required to make the transitions from Phase 2 to 3 and from Phase 4 to 1: i.e., the transitions at which the solution is pushed off the nullcline.

The estimate (6.31) represents a good-news-bad-news kind of situation. The good news is that such precise estimates can be obtained through the careful application of asymptotics. The bad news is that lengthy calculations are needed to derive (6.31), and sloppy analysis might easily miss the O(ε^{−1/3})-term altogether.

6.4.3 Relaxation oscillations in the van der Pol equation

Figure 6.7 shows the graph of the periodic solution of (6.28) for β = 10. As one would expect from the above piecewise smooth limit-cycle solution of (6.27), the figure shows long intervals of slow evolution of x separated by brief intervals in which x “relaxes” to a new metastable state. Such oscillations are described by the term relaxation oscillations. Undoing the scaling in (6.28), we see that the period of these oscillations is approximately (3 − 2 ln 2)β; in particular, the period gets large as β → ∞. Intuitively, the first-order term in (6.28) involves friction, and as friction gets large, all motion slows down.

Figure 6.7: Relaxation oscillations in the van der Pol equation (6.28) with β = 10.
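The leading-order period (3 − 2 ln 2)β can be set beside a period measured from a simulation of (6.28). This is a rough sketch under stated assumptions: the integrator is a hand-written Runge-Kutta step, `measured_period` is a hypothetical helper, the initial point (2, 0) and the step size are arbitrary choices, and the period is read off from successive upward zero crossings of x.

```python
import math

def vdp_rhs(x, v, beta):
    # van der Pol equation (6.28) as a first-order system (x, v = dx/dt).
    return v, -beta * (x * x - 1.0) * v - x

def rk4_step(x, v, beta, dt):
    # One classical fourth-order Runge-Kutta step.
    k1x, k1v = vdp_rhs(x, v, beta)
    k2x, k2v = vdp_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, beta)
    k3x, k3v = vdp_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, beta)
    k4x, k4v = vdp_rhs(x + dt * k3x, v + dt * k3v, beta)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)

def measured_period(beta, dt=0.002, t_transient=20.0, n_periods=2, t_max=200.0):
    # Average spacing of upward zero crossings of x after a transient.
    x, v, t = 2.0, 0.0, 0.0
    crossings = []
    while t < t_max and len(crossings) < n_periods + 1:
        x_new, v_new = rk4_step(x, v, beta, dt)
        if t >= t_transient and x < 0.0 <= x_new:
            # Linear interpolation for the crossing time within the step.
            crossings.append(t + dt * (-x) / (x_new - x))
        x, v, t = x_new, v_new, t + dt
    if len(crossings) < n_periods + 1:
        raise RuntimeError("not enough zero crossings before t_max")
    return (crossings[-1] - crossings[0]) / n_periods

beta = 10.0
print((3.0 - 2.0 * math.log(2.0)) * beta, measured_period(beta))
```

At β = 10 the two printed values need not agree closely; by (6.31) the relative discrepancy is expected to shrink as β grows.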

6.5 Stability of periodic orbits: the Poincaré map

6.5.1 The basic construction

The definition of stability for limit cycles is completely analogous to stability of equilibria. Thus, we say a limit cycle Γ of x′ = F(x) is Liapunov stable if for every neighborhood V of Γ, there is a smaller neighborhood V1 such that if initial data b are restricted to belong to V1, then the IVP is solvable for all positive times and moreover x(t) ∈ V for all t ≥ 0. Similarly, we say a limit cycle is asymptotically stable if it is Liapunov stable and if there is one neighborhood V∗ of Γ such that for all initial data in V∗ the solution of the IVP satisfies

$$\lim_{t\to\infty} \operatorname{dist}(x(t), \Gamma) = 0.$$

For completeness, let us record that the distance from a point x to a compact set K is defined by

$$\operatorname{dist}(x, K) = \min_{y\in K} |y - x|.$$

For example, the limit cycles in Examples 3 and 4 of Section 6.1 and the limit cycle in the van der Pol equation, for any value of β, are asymptotically stable. Intuitively this seems clear, but moving beyond intuition, in this section we introduce a rigorous technique for establishing the stability of a limit cycle: the Poincaré map. Similar to Theorem 5.1.1, which considers stability of an equilibrium point, the stability of a limit cycle is related to the eigenvalues of a certain matrix. Unfortunately, for limit cycles, this matrix is often difficult to calculate by hand.




It is customary to say the Poincaré map, but in fact there are many: the maps we are about to construct depend on arbitrary choices⁴, as follows:

• Choose a starting point on the trajectory, say its position γ(0) at time zero.

• Choose a small section Σ of any smooth, (d−1)-dimensional surface transverse to the periodic orbit at γ(0).

In symbols Σ may be written

$$\Sigma = \{\, b \in B(\gamma(0), \eta) : p(b) = 0 \,\} \tag{6.32}$$

where B(γ(0), η) is the ball in R^d of radius η with center γ(0) and p : B(γ(0), η) → R is a C^1 function such that

$$p(\gamma(0)) = 0 \quad\text{and}\quad \langle \nabla p(\gamma(0)),\ \gamma'(0) \rangle \neq 0. \tag{6.33}$$

Now for initial conditions b ∈ Σ, consider the IVP

$$x' = F(x), \qquad x(0) = b. \tag{6.34}$$

If b = γ(0), then the solution of (6.34) crosses Σ when t = T, the minimal period of γ(t), precisely at the point γ(0). The Poincaré map focuses on the question: starting from more general b ∈ Σ, when and especially where does the solution ϕ(t, b) of (6.34) next cross Σ? In mathematical terms the question “When does ϕ(t, b) cross Σ?” may be answered by solving

$$p(\varphi(t, b)) = 0 \tag{6.35}$$

for t, say t = τ(b); and the question “Where does ϕ(t, b) cross Σ?” is answered by the formula that defines the Poincaré map P,

$$P(b) = \varphi(\tau(b), b). \tag{6.36}$$

Theorem 6.5.1. There exists a neighborhood⁵ N ⊂ R^d of γ(0) such that: (i) if b ∈ Σ ∩ N, then (6.35) has a unique solution τ(b) ≈ T that is a C^1 function of b, and (ii) equation (6.36) defines a C^1 map P : Σ ∩ N → Σ.

Proof. (i) Let us apply the Implicit Function Theorem to (6.35). Of course p is differentiable, and we know from Theorem 4.5.1 that ϕ is also differentiable. By periodicity, ϕ(T, γ(0)) = γ(0), so if b = γ(0), then t = T solves (6.35). By the chain rule, the t-derivative of (6.35) at (T, γ(0)) equals

$$\langle \nabla p(\gamma(0)),\ \partial_t\varphi(T, \gamma(0)) \rangle; \tag{6.37}$$

by periodicity

$$\partial_t\varphi(T, \gamma(0)) = \partial_t\varphi(0, \gamma(0)) = \gamma'(0);$$

and by (6.33) the derivative (6.37) is nonzero. This proves Claim (i).

(ii) The map P is C^1 because it is the composition of differentiable functions.

4 “The” is justified in the sense that all of these mappings may be transformed to one another by appropriate changes of coordinates.

5 In many examples that we consider, in which γ is asymptotically stable, the neighborhood N may seem like an unnecessary complication. See Exercise ?? for a case where γ is unstable and N must be included.

Figure 6.8: If η is chosen too large, γ could cross Σ prematurely.

Remark: If the radius η of the Poincaré section (6.32) is sufficiently small, then τ(b) represents the first time at which the trajectory ϕ(t, b) returns to Σ. Thus, the Poincaré map is sometimes called the first-return map. If η were too large, the periodic orbit γ could cross Σ “prematurely” without completing a full cycle (see Figure 6.8). Even if η were too large, the requirement that τ(b) depends smoothly on b selects the right solution of (6.35), but we shall nevertheless suppose that η is appropriately small.

The Poincaré map allows us to recast questions regarding the stability of a periodic orbit γ. Suppose b0 ∈ Σ ∩ N is a point near γ(0), and follow the trajectory ϕ(t, b0) forward in time. At time t = τ(b0), the trajectory makes its first return to Σ, crossing the Poincaré section at the point b1 = P(b0). If b1 ∈ Σ happens to belong to Σ ∩ N, then the trajectory crosses Σ a second time at b2 = P(b1). Continuing for as long as these iterates remain in Σ ∩ N, we may recursively define a sequence of subsequent crossings, b_{n+1} = P(b_n). If b_n ∈ Σ ∩ N for all n and if the trajectory ϕ(t, b0) converges to the periodic orbit, then b_n → γ(0). Conversely, in Exercise ?? we ask you to show that if b_n ∈ Σ ∩ N for all n and if b_n → γ(0), then the trajectory ϕ(t, b0) converges to the periodic orbit.




and τ′(1) may be calculated by implicit differentiation of its defining equation (6.40). Now by Theorem 4.5.1, ∂ϕ/∂b(t, 1) satisfies a linear ODE with coefficients obtained from the differential of (6.5) along the periodic solution: i.e.,

$$\frac{d}{dt}\left(\frac{\partial\varphi_r}{\partial b}\right) = -2\,\frac{\partial\varphi_r}{\partial b}, \qquad \frac{d}{dt}\left(\frac{\partial\varphi_\theta}{\partial b}\right) = 0.$$

The theorem also provides initial conditions at t = 0:

$$\frac{\partial\varphi_r}{\partial b}(0, 1) = 1, \qquad \frac{\partial\varphi_\theta}{\partial b}(0, 1) = 0.$$

The solution of this IVP is

$$\text{(a)}\quad \frac{\partial\varphi_r}{\partial b}(t, 1) = e^{-2t}, \qquad \text{(b)}\quad \frac{\partial\varphi_\theta}{\partial b}(t, 1) \equiv 0. \tag{6.42}$$

Differentiating (6.40) and invoking (6.42b), we find that τ′(1) = 0. Substituting (6.42a) into (6.41) and recognizing from the original periodic solution that τ(1) = 2π, we calculate P′(1) = e^{−4π}, which of course agrees with our previous result.

Example 2: The torqued pendulum (6.6)

In Section 6.1 we constructed a periodic solution of (6.6) from a fixed point of a mapping P : (ε, M) → (ε, M). Adopting the framework of Poincaré maps, let us define a section

$$\Sigma = \{\, (x, y) \in S^1 \times \mathbb{R} : x = 0 \ (\mathrm{mod}\ 2\pi) \text{ and } \varepsilon < y < M \,\}.$$

In analyzing this equation, we showed there that if a trajectory of (6.6) started at a point (0, b) ∈ Σ, then P(b) equals the y-coordinate of the point where the trajectory next crosses Σ. Thus, if (ε, M) and Σ are identified, then P is just the Poincaré map of the periodic solution we constructed. Since we calculated that P′(b) < 1, we may conclude that the limit cycle is asymptotically stable.

Example 3: The van der Pol equation for small β

We will calculate an approximate Poincaré map with perturbation theory. Superficially the calculation resembles our calculation of the periodic solution in Section 6.3, but actually it is quite different, as we note below. Because the calculations are more convenient with a second-order scalar equation than with a first-order system, we use coordinates (x, x′) on the plane; thus, sometimes prime means differentiation, but it may also simply be a distinguishing mark for a second coordinate.

Define a section contained in the positive x-axis, say

$$\Sigma = \{\, (x, x') \in \mathbb{R}^2 : 1 < x < 3,\ x' = 0 \,\}.$$



Regarding the Poincaré map, given initial data (b, 0) ∈ Σ, consider the IVP

$$x'' + \varepsilon\,(x^2 - 1)\,x' + x = 0, \qquad x(0) = b, \quad x'(0) = 0$$

whose solution we will denote x(t, ε). First difference from Section 6.3: here b is arbitrary, not the special value that leads to a periodic solution. The Poincaré map is given by

$$P(b) = x(\tau(b), \varepsilon) \tag{6.43}$$

where τ(b) satisfies the equation

$$\frac{\partial x}{\partial t}(\tau(b), \varepsilon) = 0. \tag{6.44}$$

We look for an expansion of x(t, ε) with the usual form

$$x(t, \varepsilon) = x_0(t) + \varepsilon x_1(t) + \cdots.$$

Second difference from Section 6.3: we do not rescale time here because we want only to follow the solution for one loop around the origin; there is no issue of errors accumulating over long times. Substituting the series into the equation and repeating some calculations from Section 6.3, we find ODEs

$$O(\varepsilon^0):\quad \frac{d^2x_0}{dt^2} + x_0 = 0$$

$$O(\varepsilon^1):\quad \frac{d^2x_1}{dt^2} + x_1 = \left(\frac{b^3}{4} - b\right)\sin t + \frac{b^3}{4}\sin 3t$$

subject to initial conditions

$$x_0(0) = b, \quad \frac{dx_0}{dt}(0) = 0; \qquad x_1(0) = \frac{dx_1}{dt}(0) = 0.$$

Third difference from Section 6.3: the solution of the IVP for x1 will have secular terms, and in fact they are crucial: they are what make the solution converge to the circle r = 2 as t → ∞. The solutions of these IVPs are

$$x_0(t) = b\cos t$$

$$x_1(t) = \left(\frac{b^3}{4} - b\right)\left[-\frac{t}{2}\cos t + \frac{1}{2}\sin t\right] - \frac{b^3}{32}\left[\sin 3t - 3\sin t\right].$$

Substitution into (6.44) gives the equation

$$-b\sin\tau(b) + O(\varepsilon) = 0,$$

from which we conclude that τ(b) = 2π + O(ε). Then, substitution into (6.43) yields⁶

$$P(b) = x_0(2\pi) + \varepsilon x_1(2\pi) + O(\varepsilon^2) = b - \pi\varepsilon\left(\frac{b^3}{4} - b\right) + O(\varepsilon^2).$$

(At first blush one might expect an O(ε)-error in τ to contribute an O(ε)-error in x0(τ), but because x0′(2π) vanishes, this pushes the error to higher order.) Differentiating and substituting b = 2, we find

$$P'(2) = 1 - 2\pi\varepsilon + O(\varepsilon^2).$$

Hence P′(2) < 1 provided ε is sufficiently small, so the limit cycle is asymptotically stable.
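The first-order formula P(b) ≈ b − πε(b^3/4 − b) can be compared with a Poincaré map computed by direct integration. The following is a sketch, not the method of the text: it assumes a hand-written Runge-Kutta integrator, and the helper names `poincare_map` and `first_order_prediction` are introduced here purely for illustration. The return to the section is detected when x′ changes sign from positive to nonpositive at positive x.

```python
import math

def vdp_rhs(x, v, eps):
    # van der Pol equation in the coordinates (x, v = x').
    return v, -eps * (x * x - 1.0) * v - x

def rk4_step(x, v, eps, dt):
    # One classical fourth-order Runge-Kutta step.
    k1x, k1v = vdp_rhs(x, v, eps)
    k2x, k2v = vdp_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, eps)
    k3x, k3v = vdp_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, eps)
    k4x, k4v = vdp_rhs(x + dt * k3x, v + dt * k3v, eps)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            v + dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0)

def poincare_map(b, eps, dt=1e-3, t_max=50.0):
    # Start at (b, 0) and return x at the next crossing of x' = 0 with x > 0.
    x, v, t = b, 0.0, 0.0
    while t < t_max:
        x_new, v_new = rk4_step(x, v, eps, dt)
        if v > 0.0 >= v_new and x_new > 0.0:
            frac = v / (v - v_new)          # linear interpolation in the step
            return x + frac * (x_new - x)
        x, v, t = x_new, v_new, t + dt
    raise RuntimeError("no return to the section before t_max")

def first_order_prediction(b, eps):
    # P(b) = b - pi*eps*(b^3/4 - b), dropping the O(eps^2) remainder.
    return b - math.pi * eps * (b ** 3 / 4.0 - b)

eps = 0.02
for b in (1.5, 2.0, 2.5):
    print(b, poincare_map(b, eps), first_order_prediction(b, eps))
```

For small ε the two columns should differ only by an amount of order ε^2, and the iterates should move toward b = 2 from either side.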

Example 4: The van der Pol equation for large β

This example is similar to the torqued pendulum, considered above. Specifically, in Section 6.4 we defined a Poincaré map P before we knew there was a periodic solution, we obtained a periodic solution as a fixed point of P, and a posteriori we can see that P is the Poincaré map of the periodic solution so constructed. Moreover, P′ ≪ 1, so the solution is asymptotically stable.

Remark: No calculation possible for β between extremes.

6.6 Exercises

Rmk: If I ≤ 0 (and if ε is small), then FH/N is an excitable system. Explain. Also true if I > 2/3. Actually, you need to make an exercise to introduce the FH/N system.

Exercises verifying claimed limit sets in Section 6.2.3.

Also give example of “homoclinic cycle”.

Generalize Section 6.4 to larger class of FH/N eqn

6 Neglecting the O(ε^2)-term in this equation, we see that P(b) = b iff b = 2. This reproduces the Poincaré-Lindstedt result of Section 6.3 that the periodic solution has radius 2, perhaps with fewer technicalities. Indeed, one may ask why bother with the Poincaré-Lindstedt method at all? You may not find our answer terribly compelling: (i) it is traditional, and you will see it in books on asymptotics, and (ii) it is somewhat easier to determine the orbit to higher order with this method.



Example 6: The equation x′′ + βx′ + x = C cos ωt models a periodically-forced damped oscillator. As we showed in Exercise 6, as t → ∞ every solution of this equation tends to

$$x_{\mathrm{partic}}(t) = A\cos\omega t + B\sin\omega t, \tag{6.45}$$

where the coefficients A, B were calculated in the exercise.

With the usual constructions, the above second-order, non-autonomous ODE can be written as an autonomous first-order system with three variables:

$$x' = y, \qquad y' = -x - \beta y + C\cos z, \qquad z' = \omega. \tag{6.46}$$

Because the RHS of this system is periodic in z, we may regard it as an ODE on R^2 × S^1, a generalized cylinder. By comparison with (6.45), we deduce that the curve

$$\{\, (x, y, z) \in \mathbb{R}^2 \times S^1 : x = A\cos z + B\sin z,\ y = -\omega A\sin z + \omega B\cos z \,\}$$

is a solution orbit of (6.46) and moreover every solution tends towards this orbit as t → ∞. Make solution more explicit, or make an exercise to fill in details.

Example 3: (Serendipity) In Exercise 5 in Chapter 1, we saw that every solution of the (scaled) Lotka-Volterra equations

$$x' = x - xy, \qquad y' = \rho\,(xy - y) \tag{6.47}$$

in the open first quadrant of the x, y-plane was contained in a level set of the function L(x, y) = ρ(x − ln x) + y − ln y. It follows from the corollary that, except for the equilibrium solution (x(t), y(t)) ≡ (1, 1), these solutions are all periodic.

Make Exercise: Estimate the period of the relaxation oscillator. Parametrize eqn with xeq = µ, recalling that I(µ) = µ^3 − µ^2 + µ/γ. Along the x-nullcline y = x^2(1 − x) + I(µ) want to know how long it takes to move from (1, I) to (2/3, I + 4/27). The motion is driven by y-eqn, but the calculations are more tractable in terms of x. Along the branch we have

$$(2x - 3x^2)\,\frac{dx}{dt} = \frac{dy}{dt} = \gamma\varepsilon\left[x/\gamma - (x^2 - x^3 + I(\mu))\right].$$

Rework this to show the cubic on RHS equals

$$\mathrm{RHS} = \gamma\varepsilon\,(x - \mu)\,q(x)$$



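is given by

$$DF(x, y) = \begin{pmatrix} (1 + a) - 2ay^2 & -2xy \\ 2bxy & bx^2 + (1 - b) \end{pmatrix}.$$

In particular,

$$DF(0, 0) = \begin{pmatrix} 1 + a & 0 \\ 0 & 1 - b \end{pmatrix} \quad\text{and}\quad DF(1, 1) = \begin{pmatrix} 1 - a & -2 \\ 2b & 1 \end{pmatrix}.$$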

If, for example, a = −1 and b = 1, then 0 is a repeated eigenvalue of DF(0, 0), and therefore (0, 0) is an asymptotically stable fixed point. The eigenvalues of DF(1, 1) would be 3/2 ± (√15/2) i, and the fact that these have modulus larger than 1 implies that (1, 1) is unstable for this choice of a and b. Testing the stability of the other three fixed points is handled in a similar fashion.

Of course, the stability of the fixed points in the above example depends upon the choices of the parameters a and b and, for each fixed point, it is possible to characterize the ranges of a and b for which asymptotic stability occurs. Because DF(0, 0) is diagonal, the eigenvalues are the diagonal entries, and Theorem 6.5.3 implies that (0, 0) is asymptotically stable if a ∈ (−2, 0) and b ∈ (0, 2). The eigenvalues λ1, λ2 of the [non-triangular] matrix DF(1, 1) are less apparent, but the following counterpart to Proposition 2.4.4 allows us to test whether |λ1| < 1 and |λ2| < 1 without actually computing the eigenvalues. end of exercise

Q: Another exercise?

Proposition 6.6.1. If A is a 2 × 2 matrix, then its eigenvalues have modulus less than 1 if and only if

(i) tr A + det A > −1
(ii) tr A − det A < 1
(iii) det A < 1.

Proof. See Exercises.
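The three inequalities of Proposition 6.6.1 are easy to test against eigenvalues computed directly. The sketch below is illustrative only; `eigenvalues_2x2`, `satisfies_conditions`, and `moduli_below_one` are hypothetical helper names, and the sample matrix is DF(1, 1) with a = −1 and b = 1 from the example above.

```python
import cmath

def eigenvalues_2x2(a11, a12, a21, a22):
    # Roots of the characteristic polynomial lambda^2 - tr*lambda + det.
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0

def satisfies_conditions(a11, a12, a21, a22):
    # Conditions (i)-(iii) of Proposition 6.6.1.
    tr = a11 + a22
    det = a11 * a22 - a12 * a21
    return (tr + det > -1.0) and (tr - det < 1.0) and (det < 1.0)

def moduli_below_one(a11, a12, a21, a22):
    # Direct check that both eigenvalues lie strictly inside the unit circle.
    return all(abs(lam) < 1.0 for lam in eigenvalues_2x2(a11, a12, a21, a22))

# DF(1, 1) with a = -1, b = 1: trace 3 and determinant 6, so condition (iii) fails.
sample = (2.0, -2.0, 2.0, 1.0)
print(eigenvalues_2x2(*sample))
print(satisfies_conditions(*sample), moduli_below_one(*sample))
```

Both tests report instability for this matrix, in agreement with the eigenvalues 3/2 ± (√15/2) i quoted above; the trace-determinant test reaches that verdict without computing a square root.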

• Show that a scalar, autonomous ODE x′ = f(x) cannot have periodic solutions. To see why, suppose indirectly that there exists a periodic solution of x′ = f(x), and let p denote its period. Then, multiply both sides of the ODE by dx/dt and then integrate over an interval of length p and try to spot a contradiction.

• In Lotka-Volterra, all orbits are periodic

• Apply Dulac’s theorem for a Lotka-Volterra system in which predators can saturate:

$$x' = \text{growth} - \frac{xy}{1 + Sx}, \qquad y' = \rho\left(\frac{xy}{1 + Sx} - y\right).$$

• Prove first statement in Theorem 6.5.1.



• Prove that if A is a d × d matrix with eigenvalues λ1, λ2, . . . , λd, then lim_{n→∞} A^n = 0 if and only if |λi| < 1 for each i = 1, 2, . . . , d.

• Prove Theorem 6.5.3.

• Prove Proposition 6.6.1.

• Traveling wave of KdV (or maybe in earlier Chapter?)

6.7 Appendix: Index theory in two dimensions

The notion of the index of a closed curve Γ relative to a C^1 vector field F : U → R^2 will help us analyze global behavior of planar systems. In this section, f and g will denote the components of F, and we shall analyze systems of the form

$$x' = f(x, y) \quad\text{and}\quad y' = g(x, y). \tag{6.48}$$

Much of the theory in this section relies upon a famous result regarding Jordan curves in the plane:

Theorem 6.7.1. (Jordan Curve Theorem): Every Jordan curve Γ in the plane separates R^2 into two disjoint, open, connected sets, both of which have Γ as their boundary. One region is bounded and simply connected, while the other is neither bounded nor simply connected.

The Jordan Curve Theorem seems rather intuitive in that we might expect any simple, closed curve Γ to divide the plane into regions interior and exterior to Γ. However, the proof (ref) of Theorem 6.7.1 is surprisingly complicated, as Jordan curves can be very elaborate (e.g., labyrinths). ref, maybe to Olmstead Counterexamples in Analysis or Bill Ross’ survey paper?

Let F be as described above, and suppose that Γ is a Jordan curve whose graph contains no zeros of F. The index of Γ relative to F is an integer I_F(Γ) that measures the winding of the vector field F as Γ is traversed exactly once in the counterclockwise direction. More explicitly (see Figure 6.9), the angle⁷

$$\theta = \arctan\left(\frac{g(x, y)}{f(x, y)}\right) \tag{6.49}$$

formed by the vector F(x, y) and the positive x axis varies continuously as Γ is traversed. If ∆θ denotes the net change in θ over one counterclockwise cycle of Γ,

7 Here, θ is not necessarily confined to [0, 2π). For example, if the vector F(x, y) “spins” clockwise four times during one counterclockwise trip along Γ, then θ has decreased by 8π and the index of Γ is −4.



Examples corresponding to these three cases would be the vector fields F(x, y) = (−x, −y), F(x, y) = (x, y), and F(x, y) = (−x, y), respectively, for which the origin is the lone equilibrium.

Proposition 6.7.6. If Γ happens to be the orbit of a periodic solution of x′ = F(x), then I_F(Γ) = 1.

Proof. Here, we opt for a heuristic argument as opposed to a technical proof. At any point x on Γ, the vector F(x) is tangent to the graph of Γ. Therefore, during one counterclockwise trip around Γ, the vector F(x) must spin once counterclockwise, so that ∆θ = 2π and I_F(Γ) = 1.

Proposition 6.7.7. Suppose x1, x2, . . . , xn are isolated equilibria associated with a C^1 vector field F in the plane. If Γ is a Jordan curve containing these equilibria in its interior, then

$$I_F(\Gamma) = \sum_{i=1}^{n} I_F(x_i).$$

Proof. We sketch the proof for the special case of n = 2 equilibria, from which the general case follows immediately. Because the two equilibria are isolated and contained in the interior of Γ, it is possible to construct two disjoint circles centered at the equilibria and contained inside Γ (see Figure 6.10). Cut Γ along the dashed lines and circles, resulting in two piecewise smooth Jordan curves as illustrated in the figure. Let J_upper = Γ_u ∪ A_u ∪ · · · ∪ E_u denote the Jordan curve in the “upper half” of the figure, and let J_lower (defined analogously) denote the Jordan curve in the lower half of the figure. By Proposition 6.7.3, both J_upper and J_lower have index zero because they enclose no equilibria. Up to the factor 1/(2π), the indices of J_upper and J_lower are also equal to the sum of the changes in the angle θ over each of the smooth arcs whose unions form those curves:

$$\Delta\theta(J_{\mathrm{upper}}) = \Delta\theta(\Gamma_u) + \Delta\theta(A_u) + \Delta\theta(B_u) + \Delta\theta(C_u) + \Delta\theta(D_u) + \Delta\theta(E_u) = 0$$

$$\Delta\theta(J_{\mathrm{lower}}) = \Delta\theta(\Gamma_l) + \Delta\theta(A_l) + \Delta\theta(B_l) + \Delta\theta(C_l) + \Delta\theta(D_l) + \Delta\theta(E_l) = 0.$$

Now convince yourself of each of the following:

• ∆θ(Γ) = ∆θ(J_upper) + ∆θ(J_lower);

• ∆θ(A_u) = −∆θ(A_l) and similarly for the pairs C_u, C_l and E_u, E_l;

• Combining the preceding facts, the change in θ during one trip around Γ must equal the negative of the change in θ around the circular arcs formed by B_u, B_l, D_u, and D_l. The circles formed by B_u, B_l and by D_u, D_l are oriented clockwise and, as we complete one clockwise trip around each circle, ∆θ/2π measures the negative of the index of the equilibrium enclosed by the circle.

• Piecing everything together, the index of Γ must be the sum of the indices of the two equilibria.

Figure 6.10: Illustrating the proof of Proposition 6.7.7. Cut the Jordan curve Γ along the dashed lines to get the (U)pper and (L)ower pieces.

The previous two propositions combine to form a rather useful result.

Theorem 6.7.8. If γ(t) is a periodic solution of the planar system (6.48), then the interior of its orbit Γ must contain equilibria whose indices sum to 1.

Theorem 6.7.8 has a host of consequences. A periodic orbit Γ must enclose at least one equilibrium and, if the interior of Γ contains exactly one equilibrium, it must be a stable node or an unstable node (which have index 1) as opposed to a saddle (index −1). A periodic orbit cannot enclose an even number of hyperbolic equilibria.

Index theory can sometimes be used to prove non-existence of periodic orbits. Consider the system

$$x' = \alpha x - \gamma xy, \qquad y' = \beta y - \gamma xy, \tag{6.52}$$

where α, β, and γ are positive parameters. The system (6.52) can be interpreted as a crude model for population of two species in competition for the same food source. If species x is absent, then y grows exponentially with growth constant β, and if y = 0 then x grows exponentially with growth constant α. Both species are penalized equally by the term γxy, which is proportional to the product of the two populations. There are four nullclines in the phase plane: x = 0, x = β/γ, y = 0, and y = α/γ, and two equilibria, the origin and (x∗, y∗) = (β/γ, α/γ). The Jacobian matrices associated with these equilibria are

$$DF(0, 0) = \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}, \qquad DF(x^*, y^*) = \begin{pmatrix} 0 & -\beta \\ -\alpha & 0 \end{pmatrix}.$$

The origin is an unstable node since both eigenvalues of DF(0, 0) are real and positive, and (x∗, y∗) is a saddle because det DF(x∗, y∗) = −αβ < 0. It follows that I_F(0, 0) = 1 and I_F(x∗, y∗) = −1. If a periodic orbit (call it Γ) exists, Theorem 6.7.8 implies that Γ must enclose equilibria whose indices sum to 1, and the only way this is possible is if Γ encloses the origin but not (x∗, y∗). Certainly, such Γ would fail to be biologically relevant since it would include points with negative x and y coordinates. In fact, we claim that there can be no periodic orbits even if we allow the possibility that x < 0 or y < 0. Any trajectory Γ enclosing the origin would cross both coordinate axes, violating the existence and uniqueness theorem because the axes themselves form solution trajectories. It follows that (6.52) cannot have periodic solutions.
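The indices quoted for (6.52) can be checked numerically by accumulating the change in the angle of F around a small circle. This is a minimal sketch under stated assumptions: the parameter values α = β = γ = 1 and the circle radii are arbitrary choices for illustration, and `index_on_circle` and `competition_field` are hypothetical helper names. The angle is computed with `math.atan2` and unwrapped step by step so that it varies continuously.

```python
import math

def index_on_circle(F, center, radius, n=2000):
    # Net change of the angle of F over one counterclockwise trip, divided by 2*pi.
    cx, cy = center
    fx, fy = F(cx + radius, cy)
    previous = math.atan2(fy, fx)
    total = 0.0
    for k in range(1, n + 1):
        s = 2.0 * math.pi * k / n
        fx, fy = F(cx + radius * math.cos(s), cy + radius * math.sin(s))
        angle = math.atan2(fy, fx)
        delta = angle - previous
        # Unwrap the jump of atan2 so each increment lies in (-pi, pi].
        while delta > math.pi:
            delta -= 2.0 * math.pi
        while delta <= -math.pi:
            delta += 2.0 * math.pi
        total += delta
        previous = angle
    return round(total / (2.0 * math.pi))

def competition_field(alpha, beta, gamma):
    # Vector field of the competition model (6.52).
    def F(x, y):
        return alpha * x - gamma * x * y, beta * y - gamma * x * y
    return F

F = competition_field(1.0, 1.0, 1.0)
print(index_on_circle(F, (0.0, 0.0), 0.5))   # circle around the origin
print(index_on_circle(F, (1.0, 1.0), 0.5))   # circle around (x*, y*) = (1, 1)
print(index_on_circle(F, (0.5, 0.5), 2.0))   # circle enclosing both equilibria
```

The third value illustrates Proposition 6.7.7: a curve enclosing both equilibria has index equal to the sum of the two individual indices.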




Chapter 7

Bifurcation from equilibria

In Chapter 5 we studied the behavior of solutions of an ODE near a hyperbolic equilibrium point. In this chapter we address behavior near a nonhyperbolic point. Both for theoretical reasons and applications, it is natural to consider this problem in the context of a one-parameter family of ODEs, say

$$x' = F(x, \mu) \tag{7.1}$$

where F : U × I → R^d is a vector-valued function on an open subset of R^d × R. For example, suppose that for µ near some fixed value µ∗ in the interval I, (7.1) has a smoothly varying equilibrium xeq(µ) that is nonhyperbolic for µ = µ∗ but hyperbolic on either side of µ∗. Bifurcation theory seeks to characterize the behavior of solutions of (7.1) for µ near µ∗. Unlike near a hyperbolic point, in the present context the behavior of solutions depends crucially on nonlinear terms in the expansion of F at the equilibrium point.

In this chapter we study bifurcation phenomena, focusing especially on specificexamples taken from applications. After presenting the most familiar type of bifur-cation in Section 7.1, we pause to summarize the remainder of the chapter.

7.1 Example 1: Pitchfork bifurcation

(a) The rotating pendulum

In Exercise 8(d) of Chapter 4 we encountered the system

\[ \begin{aligned} x' &= y \\ m\ell y' &= -\beta y - mg \sin x + m(\ell \sin x)\,\omega^2 \cos x, \end{aligned} \tag{7.2} \]

which describes the motion of a pendulum of length ℓ that is rotating about a vertical axis with constant angular speed ω, as illustrated in Figure 7.1. We fit (7.2) into the context of (7.1) by identifying ω as the bifurcation parameter µ. For any value of ω, (7.2) has the obvious equilibrium x = y = 0, in which the bead is located at the bottom of the hoop. (We ignore the precarious equilibrium at x = π.) To investigate stability we compute the 2 × 2 Jacobian matrix¹ of (7.2) at (0, 0):

\[ DF(0, 0, \omega) = \begin{pmatrix} 0 & 1 \\ -g + \ell\omega^2 & -\beta/m \end{pmatrix}. \]

(Here the second equation has been divided by m, and the positive factor ℓ multiplying y′ has been suppressed; this affects neither the sign of the determinant nor the sign of the trace.) The determinant of this matrix, g − ℓω², vanishes if ω = √(g/ℓ), in which case the equilibrium is nonhyperbolic.

In fact, this example possesses additional structure that is typical for bifurcation problems. Specifically, as the reader may easily verify (Exercise), if ω < √(g/ℓ), then the equilibrium (0, 0) is asymptotically stable, while if ω > √(g/ℓ), it is unstable (more precisely, a saddle point). More colloquially, we say that the equilibrium loses its stability when ω crosses √(g/ℓ).

The central message of bifurcation theory is this: When an equilibrium loses stability as a parameter is varied, expect new solutions of some type to appear. Acting on this message, we look for steady-state solutions of (7.2). The first equation implies that y = 0, and the second then yields the condition

\[ (-g + \ell\omega^2 \cos x)\sin x = 0. \tag{7.3} \]

The sine factor vanishes if x = 0 or x = π: i.e., this factor gives the two equilibria noted above. The other factor vanishes if

\[ \cos x = \frac{g/\ell}{\omega^2}. \tag{7.4} \]

This equation has no real solutions if ω < √(g/ℓ), but two real solutions appear as soon as ω crosses the critical value √(g/ℓ). (Can you hear the spirit of bifurcation theory whispering smugly, “I told you so”?)
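To make the count of equilibria concrete, here is a minimal sketch that enumerates the solutions of (7.3) with −π < x < π for a given ω > 0. The function name and the default values g = 9.81 and ℓ = 1 are illustrative assumptions, not part of the text.

```python
import math


def pendulum_equilibria(omega, g=9.81, ell=1.0):
    """Equilibrium angles x in (-pi, pi) of (7.2) for omega > 0,
    i.e. the solutions of (7.3) other than x = pi."""
    equilibria = [0.0]
    ratio = g / (ell * omega ** 2)      # right-hand side of (7.4)
    if ratio < 1.0:                     # cos x = ratio has roots x != 0
        x = math.acos(ratio)
        equilibria += [x, -x]
    return equilibria


omega_crit = math.sqrt(9.81 / 1.0)      # critical speed sqrt(g/ell)
below = pendulum_equilibria(0.9 * omega_crit)
above = pendulum_equilibria(1.1 * omega_crit)
```

Below the critical speed the list contains only x = 0; above it, the two symmetric solutions of (7.4) are appended.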

Figure 7.2, known as a bifurcation diagram, shows a graph of these various equilibrium solutions in the x, ω-plane. Intervals of ω where the equilibria are stable are indicated by a solid curve; unstable, by a dotted curve. (In the Exercises we ask you to show that the new equilibria given by (7.4) are stable, as is indicated in the figure.)

¹ In the context of a general equation (7.1), the notation DF denotes the matrix of derivatives of F with respect to the state variables x_1, . . . , x_d only. We write out derivatives with respect to parameters explicitly, such as ∂F/∂ω in the case of (7.2).

Bifurcation diagrams are usually interpreted in the context of what is called quasistatic variation of parameters. Imagine that, starting from the equilibrium x = 0 with ω < √(g/ℓ), we increase ω by a small increment and wait until the system returns to equilibrium; then increase ω by another small increment and again wait for re-equilibration, etc. Nothing will happen as long as ω stays smaller than √(g/ℓ): the system will remain at its stable equilibrium at x = 0. However, when ω crosses √(g/ℓ), we expect the system to move away from this equilibrium. Strictly speaking, x = 0 is still an equilibrium when ω > √(g/ℓ), but since it is now unstable, if the system is subjected to the slightest bit of noise, the solution will evolve away from x = 0. It is natural to conjecture that, for ω > √(g/ℓ), the solution will tend to one of the equilibria (7.4). In fact, in Exercise ?? of Chapter 5 we already asked you to show this. The solution may evolve to either equilibrium, x = ± arccos(g/(ℓω²)), when ω first crosses √(g/ℓ); which case occurs depends on accidents in the initial conditions and the noise. However, once one of the two branches has been selected, the system will follow that branch under further quasistatic increases of ω.
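The quasistatic scenario can be imitated numerically. The sketch below is an illustration under stated assumptions rather than a definitive experiment: we take m = ℓ = g = β = 1 (so the critical speed is 1), replace the "noise" by a small positive kick in x after each increment of ω, and integrate (7.2) with a hand-written fourth-order Runge-Kutta step. All names and numerical settings are ours.

```python
import math


def pendulum_rhs(x, y, omega, g=1.0, ell=1.0, beta=1.0, m=1.0):
    """Right-hand side of (7.2), solved for x' and y'."""
    dy = (-beta * y - m * g * math.sin(x)
          + m * ell * math.sin(x) * omega ** 2 * math.cos(x)) / (m * ell)
    return y, dy


def rk4_step(x, y, omega, dt):
    """One classical Runge-Kutta step for the planar system."""
    k1x, k1y = pendulum_rhs(x, y, omega)
    k2x, k2y = pendulum_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, omega)
    k3x, k3y = pendulum_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, omega)
    k4x, k4y = pendulum_rhs(x + dt * k3x, y + dt * k3y, omega)
    return (x + dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0,
            y + dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0)


def quasistatic_sweep(omegas, kick=1e-3, wait=80.0, dt=0.02):
    """Raise omega step by step, waiting for re-equilibration each time."""
    x, y = 0.0, 0.0
    history = []
    for omega in omegas:
        x += kick                       # stand-in for a small disturbance
        for _ in range(int(round(wait / dt))):
            x, y = rk4_step(x, y, omega, dt)
        history.append((omega, x))
    return history


omegas = [0.5 + 0.1 * k for k in range(11)]   # from 0.5 up to 1.5
history = quasistatic_sweep(omegas)
```

Because the kick is always positive, this run selects the branch x > 0; a negative kick would select the other branch. The recorded values of x stay near zero while ω < 1 and then follow arccos(g/(ℓω²)).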

Various remarks: (i) The origin of the term bifurcation may be seen in Figure 7.2: as ω is increased the unique stable solution x = 0 is replaced by the two stable solutions, x = ± arccos(g/(ℓω²)). (The now-unstable equilibrium at x = 0 for ω > √(g/ℓ) is not included in the counting; thus one does not speak of “trifurcation”.) (ii) The particular bifurcation diagram in Figure 7.2 is known as a pitchfork, for obvious reasons. Pitchfork bifurcations are common in systems that exhibit reflectional symmetry. (See Section 7.5.4 for elaboration of this statement.) Note that solutions of the reduced equation (7.3) are unchanged by the reflection x → −x. In the original ODE, symmetry is expressed as the following property: If (x(t), y(t)) is a solution of (7.2), then so is (−x(t), −y(t)). (iii) It follows from general principles, which will be developed below, that the bifurcating solutions in Figure 7.2 are stable. One need not do a specific calculation to derive this fact, although for pedagogical reasons we ask you to perform this exercise.

(b) The Lorenz equations

As a second example of a pitchfork bifurcation, recall from Exercise 8(e) the Lorenz equations

\[ \begin{aligned} \text{(a)}\quad x' &= \sigma(y - x) \\ \text{(b)}\quad y' &= \rho x - y - xz \\ \text{(c)}\quad z' &= -\beta z + xy, \end{aligned} \tag{7.5} \]

where σ, ρ, and β are positive parameters. We reverse the order of presentation from the previous example: here we first look for equilibrium solutions of (7.5), and then we make the connection with a loss of stability. The first equation implies that x = y at equilibrium, the third equation then implies that z = y²/β, and substitution into the second yields the equation

\[ y(\rho - 1 - y^2/\beta) = 0. \tag{7.6} \]
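The elimination just described translates directly into a short routine. This is a sketch with names of our own choosing: it returns the equilibria of (7.5) implied by (7.6), and the companion function evaluates the right-hand side of (7.5) so that the result can be checked.

```python
import math


def lorenz_rhs(x, y, z, sigma, rho, beta):
    """Right-hand side of the Lorenz equations (7.5)."""
    return (sigma * (y - x), rho * x - y - x * z, -beta * z + x * y)


def lorenz_equilibria(rho, beta):
    """Equilibria of (7.5): x = y, z = y**2/beta, with y a root of (7.6)."""
    equilibria = [(0.0, 0.0, 0.0)]
    if rho > 1.0:                       # nonzero roots y = +/- sqrt(beta*(rho-1))
        y = math.sqrt(beta * (rho - 1.0))
        equilibria += [(y, y, y * y / beta), (-y, -y, y * y / beta)]
    return equilibria
```

For ρ ≤ 1 only the origin is returned; for ρ > 1 two further equilibria appear, which is the pitchfork structure of (7.6).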




Equilibrium       Description     Stability

(0, 0)            Extinction      A saddle for all K
(K, 0)            Prey-only       A stable node if K < 1; a saddle if K > 1
(1, 1 − 1/K)      Co-existence    An unphysical saddle if K < 1; a sink for K > 1

Table 7.1: Equilibria of (7.9), the Lotka-Volterra system augmented by logistic growth.

ing steady-state bifurcation. A full analysis of Hopf bifurcation is beyond the scope of this book, but Section 7.7 introduces some theoretical tools for studying these bifurcations.

Finally, in Section 7.8 we apply the tools of the preceding section to a Hopf bifurcation in a specific equation that arises in models for nerve cells, the FitzHugh-Nagumo equations.

7.3 Example 2: Transcritical bifurcation

The Lotka-Volterra equations with logistic limits

In this section we encounter another type of bifurcation. Recall from Section 1.6 the Lotka-Volterra model of a predator-prey system (1.41) modified to have logistic growth for the prey (but, for simplicity, not including the Allee effect):

\[ \begin{aligned} \text{(a)}\quad x' &= x(1 - x/K) - xy \\ \text{(b)}\quad y' &= \rho(xy - y). \end{aligned} \tag{7.9} \]

The variables x and y represent prey and predator populations, respectively, while ρ and K are positive parameters; we regard the carrying capacity K as the bifurcation parameter. The equilibria of (7.9), along with their stabilities, are listed in Table 7.1, which is taken from Table 5.3; and the equilibria are graphed in the bifurcation diagram of Figure 7.5.

The example provides another illustration of the fundamental phenomenon of bifurcation theory. Specifically, the prey-only equilibrium loses stability as K crosses 1, and the co-existence equilibria bifurcate from the prey-only equilibrium at precisely this point. This kind of bifurcation is known as a transcritical bifurcation because the bifurcating solutions exist for K both below and above the bifurcation point.
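The exchange of stability can be confirmed from the Jacobian of (7.9). The following sketch (function names and the default ρ = 1 are our own illustrative choices) tests each equilibrium for asymptotic stability using the trace-determinant criterion for 2 × 2 matrices.

```python
def jacobian_79(x, y, K, rho):
    """Jacobian of (7.9) at the point (x, y)."""
    return ((1.0 - 2.0 * x / K - y, -x),
            (rho * y, rho * (x - 1.0)))


def is_stable(J):
    """A 2x2 matrix has both eigenvalues in the left half plane
    exactly when its trace is negative and its determinant positive."""
    (a, b), (c, d) = J
    return (a + d) < 0 and (a * d - b * c) > 0


def stability_table(K, rho=1.0):
    """Map each equilibrium of (7.9) to True (stable) or False."""
    equilibria = {
        "extinction": (0.0, 0.0),
        "prey-only": (K, 0.0),
        "co-existence": (1.0, 1.0 - 1.0 / K),
    }
    return {name: is_stable(jacobian_79(x, y, K, rho))
            for name, (x, y) in equilibria.items()}
```

Evaluating `stability_table` at one value of K below 1 and one above 1 shows the prey-only and co-existence equilibria trading stability, while the extinction equilibrium is never stable.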

It is useful to articulate the behavior implied by Figure ?? if, starting from K < 1, this parameter is increased quasistatically. As long as K < 1, equation (7.9) predicts that the predators will die out; but when K > 1, then any solution with a nonzero prey population at t = 0 will converge to the co-existence equilibrium. We may say that the two equilibria in Figure ?? experience an “exchange of stability” at K = 1. This idea will be developed considerably in Section 7.6.

[Figure 7.5: bifurcation diagram plotting y versus K for 0 ≤ K ≤ 2, showing the branches y = 0 and y = 1 − 1/K, with segments labeled S or U.]

Figure 7.5: Transcritical bifurcation in (7.9) at K = 1. The equilibrium (x, y) = (K, 0) switches from (S)table to (U)nstable as K increases past 1, just as (x, y) = (1, 1 − 1/K) switches from unstable to stable.

7.4 Example 3: Saddle-node bifurcation

In pitchfork and transcritical bifurcations, a smoothly varying equilibrium solution of an ODE loses stability as a parameter varies. Saddle-node bifurcations, also known as limit-point bifurcations [] or blue-sky bifurcations [], differ in that a stable equilibrium “disappears” altogether.

(a) The torqued pendulum

The most easily visualized such bifurcation is the torqued pendulum, introduced in Section 4.3.4:

\[ \begin{aligned} x' &= y \\ y' &= -\sin x - \beta y + \mu. \end{aligned} \tag{7.10} \]

The bifurcation diagram for this equation (a graph of the equilibrium value of x vs. the bifurcation parameter µ) is shown in Figure 7.7. In the exercises we ask you to show that the solution satisfying 0 < x < π/2 is a stable node while the solution in (π/2, π) is a saddle, which of course is unstable. (In particular, this information motivates the name “saddle-node”.) In words, the stable equilibrium disappears when it and the unstable equilibrium annihilate one another.
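As a quick numerical companion, the sketch below lists the equilibria of (7.10) with 0 ≤ x ≤ π for a torque µ ≥ 0 and labels each one by the sign of the determinant of the Jacobian [[0, 1], [−cos x, −β]], which equals cos x. It assumes β > 0, so that the trace −β is negative; it does not distinguish a node from a focus. The function name is ours.

```python
import math


def torqued_equilibria(mu):
    """Equilibria (x, 0) of (7.10) with 0 <= x <= pi, for mu >= 0.
    Returns (x, label) pairs; assumes beta > 0."""
    if mu > 1.0:
        return []                       # sin x = mu has no solution
    if mu == 1.0:
        return [(math.pi / 2.0, "nonhyperbolic")]
    x_low = math.asin(mu)
    pairs = []
    for x in (x_low, math.pi - x_low):
        det = math.cos(x)               # determinant of the Jacobian
        pairs.append((x, "stable" if det > 0 else "saddle"))
    return pairs
```

As µ increases toward 1 the two returned angles approach π/2 from either side, and for µ > 1 the list is empty: the two equilibria have annihilated one another.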

Once again, the bifurcation diagram Figure ?? suggests a specific scenario under quasistatic increase of µ: While µ < 1, the system can follow its stable equilibrium in the interval (0, π/2), but when µ passes 1, the system evolves to states far removed from this equilibrium. Specifically, it converges to the periodic solution discussed in Example 4 of Section 6.1.

[Figure 7.6: sketch of a pendulum at angle x subject to a torque µ.]

Figure 7.6: Torqued pendulum.

(b) Activator-inhibitor systems

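Let’s show that a saddle-node bifurcation also appears in the activator-inhibitor system (??),

\[ \begin{aligned} \text{(a)}\quad x' &= \alpha\,\frac{1}{1+r}\,\frac{x^2}{1+x^2} - x \\ \text{(b)}\quad r' &= \gamma\left[\beta\,\frac{x^2}{1+x^2} - r\right]. \end{aligned} \tag{7.11} \]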

In Section 5.3 we were interested in cases where α was large and (7.11) had three equilibrium solutions, but here we remove such restrictions on α, which we regard as the bifurcation parameter. To enumerate equilibria of (7.11), first we solve (7.11b) to obtain

\[ r = \beta\,\frac{x^2}{1 + x^2}; \tag{7.12} \]

then, excluding the zero solution from (7.11a), we divide by x and rewrite this equation as r = αx/(1 + x²) − 1, substitute (7.12) for r in this formula, clear 1 + x² from the denominator, and rearrange, yielding

\[ (\beta + 1)x^2 - \alpha x + 1 = 0. \]

Thus, recalling the zero solution, we see that (7.11) has three equilibria if α > 2√(β + 1) and one if 0 < α < 2√(β + 1). This information is shown graphically in the bifurcation diagram of Figure 7.8.

Suppose, starting from α > 2√(β + 1) and assuming the system is in the top equilibrium in Figure 7.8, the bifurcation parameter is decreased quasistatically. While α > 2√(β + 1), the system can follow this top equilibrium branch, but when α passes 2√(β + 1), the system evolves to states far removed from this equilibrium. In this case, the system “collapses” to the state x = 0.

[Figure 7.7: bifurcation diagram plotting x (with π/2 and π marked) versus µ (with 0 and 1 marked); the branches are labeled S and U, and the point where they meet is labeled as the saddle-node bifurcation.]

Figure 7.7: Saddle-node bifurcation in the torqued pendulum equations (7.10).

[Figure 7.8: bifurcation diagram plotting x versus α; the branches are labeled S or U, and the saddle-node bifurcation is marked at α = 2√(β + 1).]

Figure 7.8: Saddle-node bifurcation in the Turing equations (7.11).

7.5 Theory for steady-state bifurcation: the Liapunov-Schmidt reduction

7.5.1 Bare bones of the reduction

With the Liapunov-Schmidt reduction, one may greatly reduce the number of variables in calculations of steady-state bifurcation. Indeed, in all the above examples, the reduced problem has only one state variable (plus of course the various parameters). For example, recall how we analyzed bifurcation in the Lorenz equations (7.5): first we solved (7.5a) to obtain x = y, using this we solved (7.5c) to obtain z = y²/β, and we finally substituted into (7.5b), yielding

\[ y[\rho - 1 - y^2/\beta] = 0. \]

This one-dimensional equation is the relation graphed in Figure 7.4. The general reduction proceeds in pretty much the same way.

To set the general context, consider a one-parameter family of ODEs, say

x′ = F(x, µ). (7.13)

Suppose that for µ = µ∗, equation (7.13) has an equilibrium solution x = x∗. If the equilibrium x∗ is hyperbolic, or even if the Jacobian matrix DF∗ merely satisfies

\[ \det DF^* \neq 0, \tag{7.14} \]

then by the implicit function theorem one may solve the equilibrium equation

\[ F(x, \mu) = 0 \tag{7.15} \]

uniquely near (x∗, µ∗) for x as a smooth function of µ: i.e., no bifurcation occurs.

Thus, to have bifurcation at (x∗, µ∗), (7.14) must be violated. The minimal failure of (7.14) occurs if

\[ 0 \text{ is a simple eigenvalue of } DF^*. \tag{7.16} \]

(Exercise: Check that (7.16) is satisfied in all the examples in Sections 7.1–3.) In this case, the Liapunov-Schmidt technique may be used to reduce (7.15) to a single scalar equation.

To see how this works, let us for simplicity rotate the coordinates so that the first coordinate vector

\[ e_1 = (1, 0, \ldots, 0) \text{ spans } \ker DF^*. \tag{7.17} \]




Then the first column of DF∗ vanishes, so we may write in block notation

\[ DF^* = \begin{pmatrix} 0 & v^T \\ 0 & A \end{pmatrix} \]

where v and 0 are (d − 1)-component vectors and A is the (d − 1) × (d − 1) matrix, the Jacobian of F_2, . . . , F_d with respect to x_2, . . . , x_d. By (7.16), the submatrix A is nonsingular. Thus by the implicit function theorem the last d − 1 equations of (7.15)

\[ \begin{aligned} F_2(x_1, x_2, \ldots, x_d, \mu) &= 0 \\ &\;\;\vdots \\ F_d(x_1, x_2, \ldots, x_d, \mu) &= 0 \end{aligned} \tag{7.18} \]

may be solved near (x∗, µ∗) for x_2, . . . , x_d as functions of x_1 and µ, say

\[ (x_2, \ldots, x_d) = X(x_1, \mu). \]

Define the scalar function g(x_1, µ) by substituting this formula for x_2, . . . , x_d into the first component of F:

\[ g(x_1, \mu) = F_1(x_1, X(x_1, \mu), \mu). \]

Here is the reduction: under the above hypotheses, there is a one-to-one correspondence between solutions near (x∗, µ∗) of the full problem (7.15) and the reduced problem

\[ g(x_1, \mu) = 0. \tag{7.19} \]

Specifically, if (x_1, µ) is a solution of (7.19), then (x_1, X(x_1, µ), µ) is a solution of (7.15), and every solution of (7.15) near (x∗, µ∗) arises in this way. Indeed, the claim follows from observing that (7.19) results merely from processing the d equations in (7.15) sequentially.

The best way to understand these ideas is to re-examine the bifurcation problems considered above and interpret them as specific examples of the reduction. (Exercise)

7.5.2 Stability issues

In fact, stability information may also be derived from the reduction, provided (7.16) is strengthened as follows:

\[ \lambda_1(DF^*) = 0, \qquad \Re\,\lambda_j(DF^*) < 0, \quad j = 2, \ldots, d. \tag{7.20} \]

(In words, (7.20) asserts that the condition for x∗ to be asymptotically stable “misses by one dimension”.) Let us put the reduced function (7.19) on the RHS of a scalar ODE

\[ x' = g(x, \mu). \tag{7.21} \]

Observe that if g(x, µ) = 0, i.e., if x is an equilibrium of (7.21), then this equilibrium is asymptotically stable for (7.21) provided the derivative g_x(x, µ) < 0 and unstable provided g_x(x, µ) > 0.

Theorem 7.5.1. Given a solution (x, µ) of g(x, µ) = 0, let $\mathbf{x} = (x, X(x, \mu))$ be the corresponding equilibrium of (7.13). Then regarding (7.13), $\mathbf{x}$ is

• asymptotically stable if g_x(x, µ) < 0,

• unstable but hyperbolic if g_x(x, µ) > 0, and

• nonhyperbolic if g_x(x, µ) = 0.

While this result can be rigorously proved, the notation is mind numbing, and we prefer to give an informal discussion. The stability of equilibria of (7.13) may be computed from the signs of the eigenvalues of DF∗. Given (7.20), we deduce from Theorem ?? (in the appendix on eigenvalues) that (a) only λ_1(DF∗) could become positive, i.e., be a stability breaker, and (b) λ_1(DF∗) is a smooth, real-valued function of (x, µ) near (x∗, µ∗). The crux of the proof is to show that at any equilibrium of (7.21), the sign of g_x is the same as the sign of λ_1(DF∗) at the corresponding equilibrium of (7.13). We refer to [?] for the details of this proof. In the Exercises we ask you to verify the conclusions of this theorem for the bifurcation problems considered above.
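For the Lorenz equations the conclusion of Theorem 7.5.1 can be spot-checked numerically. The sketch below is illustrative only: it takes the reduced function g(y, ρ) = y(ρ − 1 − y²/β) from the elimination described in Section 7.5.1, and compares the sign of g_y with a Routh-Hurwitz test applied to the full 3 × 3 Jacobian of (7.5). The parameter values σ = 10 and β = 8/3 and all function names are our assumptions, and since the theorem is local the comparison is meaningful only for ρ near 1.

```python
import math


def g_reduced_dy(y, rho, beta):
    """Derivative in y of the reduced function g = y*(rho - 1 - y**2/beta)."""
    return rho - 1.0 - 3.0 * y * y / beta


def lorenz_jacobian(x, y, z, sigma, rho, beta):
    """Jacobian of the Lorenz equations (7.5)."""
    return [[-sigma, sigma, 0.0],
            [rho - z, -1.0, -x],
            [y, x, -beta]]


def is_stable_3x3(M):
    """Routh-Hurwitz test for lambda^3 + a1*lambda^2 + a2*lambda + a3."""
    a1 = -(M[0][0] + M[1][1] + M[2][2])
    a2 = (M[0][0] * M[1][1] - M[0][1] * M[1][0]
          + M[0][0] * M[2][2] - M[0][2] * M[2][0]
          + M[1][1] * M[2][2] - M[1][2] * M[2][1])
    det = (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
           - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
           + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))
    a3 = -det
    return a1 > 0 and a3 > 0 and a1 * a2 > a3


def compare(rho, sigma=10.0, beta=8.0 / 3.0):
    """For each equilibrium, pair the prediction g_y < 0 with the
    stability of the full system."""
    ys = [0.0]
    if rho > 1.0:
        y = math.sqrt(beta * (rho - 1.0))
        ys += [y, -y]
    results = []
    for y in ys:
        predicted = g_reduced_dy(y, rho, beta) < 0
        actual = is_stable_3x3(
            lorenz_jacobian(y, y, y * y / beta, sigma, rho, beta))
        results.append((y, predicted, actual))
    return results
```

At ρ = 0.5 and ρ = 1.5 the two columns agree for every equilibrium: the trivial solution is stable before the bifurcation and unstable after it, while the bifurcating pair is stable.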

7.5.3 Exploration of one-dimensional bifurcation problems

In this section we use the Liapunov-Schmidt reduction to introduce a partial hierarchy of steady-state bifurcation problems that satisfy (7.20). Without loss of generality we can translate coordinates so that the bifurcation point is located at x = 0, µ = 0. Thus we consider the ODE (7.13) supposing that

\[ F(0, 0) = 0 \tag{7.22} \]

and that the Jacobian DF_0 satisfies condition (7.20). The reduced function g(x, µ), defined near (0, 0), then satisfies

\[ g(0, 0) = g_x(0, 0) = 0, \tag{7.23} \]

as follows from Theorem 7.5.1 since 0 is a non-hyperbolic equilibrium of x′ = F(x, 0). (Alternatively, in Exercise ?? we ask you to show directly that g_x(0, 0) = 0.)

One-dimensional bifurcation problems can be roughly classified by how many derivatives of g, beyond (7.23), vanish, as is done in Table 7.2. The phrase normal form, which appears in the table, refers to a particularly simple version of a bifurcation problem that captures the essential behavior of a class of problems. (We shall use this phrase informally, shying away from a precise, technical definition.)

Vanishing der.        Normal form        Name               Example

None                  x′ = −x² + µ       saddle-node        Activator-inhibitor network, (7.11)
g_µ = 0               x′ = −x² + µx      transcritical      Logistic Lotka-Volterra, (7.9)
g_xx = 0              x′ = x³ + µ        hysteresis point   CSTR, Exercise ??
g_µ = 0, g_xx = 0     x′ = ±x³ + µx      pitchfork          Lorenz equations, (7.5)

Table 7.2: Partial classification of one-dimensional bifurcation problems

To gain intuition, let us consider the construction of one of these normal forms, the pitchfork. A somewhat more general scalar ODE³ with a degenerate equilibrium (i.e., satisfying (7.23)) for which g_µ = 0, g_xx = 0 is

\[ x' = Ax^3 + B\mu x. \]

Suppose the coefficients A and B are nonzero. If we rescale $\tilde{x} = |A|^{1/2}x$ and $\tilde{\mu} = |B|\mu$, then (dropping the tildes) we may reduce this equation to

\[ x' = \pm x^3 \pm \mu x. \tag{7.24} \]

The different bifurcation diagrams for the four choices of sign are shown in Figure 7.9. We regard Cases 1 and 2, for which the cubic coefficient in (7.24) is negative, as essentially equivalent, for the following reasons. In Case 1 the trivial solution is stable for µ < 0 and unstable for µ > 0, while in Case 2 it is the other way around. However, in both cases, as the bifurcation parameter crosses zero, the trivial solution loses stability, to be replaced by two stable bifurcating solutions; the only difference is the reversal of the orientation of the parameter change that causes instability, a difference that does not seem important to us. (In the terminology of Section 7.1, the bifurcation of x′ = −x³ + µx is supercritical.) It is conventional to collapse these two cases into the one case −x³ + µx in the Table by orienting the bifurcation parameter so that the trivial solution loses stability as µ is increased⁴.

In Cases 3 and 4, which we similarly collapse into the normal form x³ + µx, the cubic coefficient in (7.24) is positive. This change in sign cannot be scaled away. Rather it indicates a real difference in behavior: in Cases 3 and 4 the bifurcation is subcritical.
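The four sign cases of (7.24) are easy to tabulate. In the sketch below (function name ours), an equilibrium of x′ = s_c x³ + s_l µx is labeled stable when the derivative of the right-hand side, 3 s_c x² + s_l µ, is negative; the routine is intended for µ ≠ 0, where every equilibrium is hyperbolic.

```python
import math


def pitchfork_branches(s_cubic, s_linear, mu):
    """Equilibria of x' = s_cubic*x**3 + s_linear*mu*x for mu != 0,
    returned as (x, is_stable) pairs; s_cubic and s_linear are +1 or -1."""
    def stable(x):
        return 3 * s_cubic * x * x + s_linear * mu < 0

    branches = [(0.0, stable(0.0))]
    x_squared = -s_linear * mu / s_cubic    # from s_cubic*x**2 + s_linear*mu = 0
    if x_squared > 0:
        x = math.sqrt(x_squared)
        branches += [(x, stable(x)), (-x, stable(-x))]
    return branches


case1 = pitchfork_branches(-1, +1, 1.0)     # Case 1 at mu = 1
case3 = pitchfork_branches(+1, +1, -1.0)    # Case 3 at mu = -1
```

In Case 1 the nontrivial equilibria exist where the trivial solution is unstable and are themselves stable (supercritical); in Case 3 they exist where the trivial solution is stable and are themselves unstable (subcritical).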

Note that we have not included any higher-order terms in the normal forms in

³ Below we will discuss examples with additional, higher-order, terms.

⁴ This convention is natural in that in applications it is more common for instability to appear as the parameters in the problem are increased, as occurred in the examples of Section 7.1.

[Figure 7.9: four bifurcation diagrams, each plotting x versus µ, one for each pair of signs in (7.24). Case 1: − +. Case 2: − −. Case 3: + +. Case 4: + −.]

Figure 7.9: Pitchfork bifurcations in (7.24) for each of the possible pairs of sign choices. Per our usual convention, stable equilibria are indicated with bold, solid curves and unstable equilibria are indicated with thinner, dashed curves.



Similar discussions for the other normal forms in the table are given in [?], but we shall not pursue this subject further.

7.5.4 Symmetry and the pitchfork bifurcation

By a reflection on R^d we mean a linear map R : R^d → R^d such that R² = I. We say that an ODE x′ = F(x) is symmetric⁵ under R if

\[ F(Rx) = RF(x) \quad \text{for all } x. \tag{7.29} \]

Alternatively, if an ODE is symmetric under R, then for any solution x(t), the reflected function Rx(t) is also a solution, and conversely. Recall that all three examples of pitchfork bifurcation in Section 7.1 possessed such symmetry; the following result indicates that such behavior is no accident.

Theorem 7.5.2. Suppose that: (i) for µ near µ∗ the ODE (7.27) is symmetric with respect to the reflection R; (ii) for µ near µ∗, (7.27) has an equilibrium solution x_eq(µ) that is invariant under R: i.e., R x_eq(µ) = x_eq(µ); (iii) the Jacobian DF∗ satisfies condition (7.20) at (x_eq(µ∗), µ∗); and (iv) the null eigenvector v of DF∗ satisfies Rv = −v. Then at the bifurcation point the reduced function satisfies

\[ g_{xx} = 0, \qquad g_\mu = 0. \]

Of course (7.20) implies that g = g_x = 0. Thus, provided g_xxx ≠ 0 and g_µx ≠ 0, it follows from the theorem that the bifurcation is a pitchfork.

This result is proved in [?]. More informative than reading the proof, however, is to verify that each of the examples in Section 7.1 satisfies the hypotheses of the theorem.

Remarks: (i) If the Liapunov-Schmidt reduction is performed in a manner that respects symmetry (and it would be perverse to do otherwise), then g is an odd function of x so all derivatives of even order in x must vanish at x = 0. (ii) General considerations imply that the null eigenvector v satisfies either Rv = +v or Rv = −v. In words, the fourth condition in the theorem requires that the bifurcation break the symmetry. By contrast, if Rv = +v, symmetry has no implications for the bifurcation.

7.5.5 The two-cell Turing instability

Let us return to the Turing instability with two interacting cells, introduced in Section 5.3.2. We consider (5.28) as a bifurcation problem with the diffusion coefficient D as the bifurcation parameter. From our calculations in that section we know:

⁵ The technical term used in [?] is equivariant.



• for all D, (5.28) has a trivial equilibrium (x+, r+, x+, r+) in which the concentrations are the same in both cells;

• there is a threshold value D_thr such that the trivial solution is stable if D < D_thr and unstable if D > D_thr. Moreover, at D = D_thr the bifurcation problem satisfies the minimal-degeneracy condition (7.20); and

• equation (5.28), as well as the trivial solution, is symmetric with respect to the interchange of concentrations in the two cells,

\[ R \cdot (x_1, r_1, x_2, r_2) = (x_2, r_2, x_1, r_1), \tag{7.30} \]

which is a reflection.

In Section 5.3.2 we left open the question of how solutions of (5.28) behave when D > D_thr. Now, however, Theorem 7.5.2 strongly suggests that the trivial solution undergoes a pitchfork bifurcation. To prove this (as well as to determine analytically whether the pitchfork is supercritical or subcritical) we would need to calculate g_xxx and g_µx and show that they are non-zero. Although this is possible, it is tedious, and we prefer to answer these questions by the simulation shown in Figure ??. Although the computations do not prove anything, who can see the figure and doubt that the trivial solution of (5.28) undergoes a supercritical pitchfork bifurcation at D = D_thr?

7.5.6 Imperfect bifurcation

An ODE such as (7.13) represents an idealized description of some physical system, but real systems will differ from the idealized description in myriad ways that are impossible to enumerate. For example, in the bead equation (7.2), let us suppose the axis of rotation of the ring is very slightly off-center, say by a distance ε as in Figure ??. In this case, the length of the rotation arm is slightly changed, from ℓ sin x to ℓ sin x + ε, and the equations of motion for the bead will read

\[ \begin{aligned} x' &= y \\ m\ell y' &= -\beta y - mg \sin x + m(\ell \sin x + \varepsilon)\,\omega^2 \cos x. \end{aligned} \tag{7.31} \]

This perturbation splits the bifurcation diagram into two connected pieces, in different ways depending on the sign of ε. (See Figure ??.) If ω is increased quasistatically, say with ε > 0, then the equilibrium of the bead will evolve smoothly from an equilibrium near the bottom of the loop to an equilibrium with x > 0; similarly if ε < 0. In other words, making ε nonzero removes the indeterminacy of the idealized, perfectly symmetric, problem.




single real eigenvalue of the Jacobian passed through zero. Here stability is lost as a pair of complex-conjugate eigenvalues crosses the imaginary axis⁶.

The bifurcation examples above lead us to expect some change in the set of solutions of (7.34) as µ crosses zero. However, no new equilibria appear: the only equilibrium of (7.34) is x = y = 0, no matter what the value of µ. To see what change does occur near µ = 0, let us rewrite the equations in polar coordinates

\[ \begin{aligned} r' &= \mu r - r^3 \\ \theta' &= 1. \end{aligned} \tag{7.35} \]

Note that r′ vanishes if r = 0 or if r² = µ. The former solution just represents the equilibrium x = y = 0. By contrast, the latter solution represents a new type of orbit that appears as µ crosses zero: i.e., r = √µ, θ = t where we have chosen the phase arbitrarily, or in Cartesian coordinates, x = √µ cos t, y = √µ sin t. To summarize, in Hopf bifurcation, periodic solutions appear when a pair of eigenvalues of the Jacobian crosses the imaginary axis.
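A short simulation makes the appearance of the periodic orbit visible. The sketch below integrates a Cartesian system whose polar form is exactly (7.35), namely x′ = µx − y − x(x² + y²), y′ = x + µy − y(x² + y²); we assume, but cannot confirm from the text at hand, that this is how (7.34) is written. The integrator is a hand-written Runge-Kutta step, and the step size, final time, and initial point are illustrative.

```python
import math


def hopf_rhs(x, y, mu):
    """Cartesian system whose polar form is r' = mu*r - r**3, theta' = 1."""
    r2 = x * x + y * y
    return mu * x - y - x * r2, x + mu * y - y * r2


def final_radius(mu, x=0.1, y=0.0, t_end=60.0, dt=0.01):
    """Integrate with classical RK4 and return the radius at time t_end."""
    for _ in range(int(round(t_end / dt))):
        k1x, k1y = hopf_rhs(x, y, mu)
        k2x, k2y = hopf_rhs(x + 0.5 * dt * k1x, y + 0.5 * dt * k1y, mu)
        k3x, k3y = hopf_rhs(x + 0.5 * dt * k2x, y + 0.5 * dt * k2y, mu)
        k4x, k4y = hopf_rhs(x + dt * k3x, y + dt * k3y, mu)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        y += dt * (k1y + 2 * k2y + 2 * k3y + k4y) / 6.0
    return math.hypot(x, y)


radius_after = final_radius(0.25)     # mu > 0: expect approach to sqrt(mu)
radius_before = final_radius(-0.25)   # mu < 0: expect decay to the origin
```

For µ = 0.25 the radius settles at √µ = 0.5, while for µ = −0.25 it decays toward zero.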

It is instructive to consider a more general, but still academic, example of Hopf bifurcation:

\[ \begin{aligned} r' &= \mu r - \alpha r^3 \\ \theta' &= 1 + \beta r^2 \end{aligned} \tag{7.36} \]

where α, β are parameters. If α > 0, then the bifurcating periodic solutions exist for µ > 0 and are stable (where the equilibrium is unstable) as in Figure 7.10; this case is called supercritical. On the other hand, if α < 0, then the periodic solutions exist for µ < 0 and are unstable, which is called subcritical. In the supercritical case, the bifurcating solutions describe the new behavior of solutions if µ is increased beyond zero. In the subcritical case, the bifurcating solutions constrict the domain of attraction of the equilibrium as µ tends to zero from below, but for µ > 0 the solution evolves to states far away from the equilibrium.

In fact, (7.36) has greater generality than may be apparent: indeed it is shown in [?] that under rather general circumstances, near a Hopf bifurcation of an equation such as (7.1), there is a coordinate transformation that reduces the general problem to the normal form (7.36), modulo higher-order corrections. Incidentally, note that the parameter β in (7.36) makes the period of the oscillations vary with their amplitude. (In the Exercises we study a nonlinear resonance phenomenon in which this parameter plays a key role.)
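The amplitude, period, and stability of the periodic orbit of (7.36) follow in closed form: on the orbit r² = µ/α, so θ′ = 1 + βµ/α, and the derivative of µr − αr³ there equals −2µ. The sketch below simply packages these formulas; the function name and return convention are ours.

```python
import math


def hopf_cycle(mu, alpha, beta):
    """Amplitude, period, and stability label of the periodic orbit of
    (7.36), or None when no periodic orbit exists."""
    if alpha == 0 or mu / alpha <= 0:
        return None                     # r**2 = mu/alpha must be positive
    r = math.sqrt(mu / alpha)
    theta_dot = 1.0 + beta * r * r      # angular speed on the orbit
    if theta_dot == 0:
        return None                     # the circle consists of equilibria
    period = 2.0 * math.pi / abs(theta_dot)
    label = "stable" if mu > 0 else "unstable"   # sign of -2*mu
    return r, period, label
```

With β = 0 the period is 2π for every amplitude; a nonzero β makes the period depend on µ/α, which is the amplitude dependence noted above.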

(b) The “repressilator”

Hopf bifurcations also occur in higher-dimensional ODEs. To illustrate this point, let us consider a gene network called the “repressilator”. The name tries to capture the idea that oscillations occur through the mutual repression, or inhibition, of three

⁶ In connection with this behavior, we recommend that you revisit Exercise 11 from Chapter 2.



[Figure 7.12: bifurcation diagram plotting x versus µ, showing the branch labeled “stable equilibrium”, the branch labeled “unstable equilibrium”, and the curves labeled “stable periodic orbits”.]

Figure 7.12: Bifurcation diagram for (7.37) assuming a Hill coefficient of n = 4. The steady-state value of x (denoted by x_eq in Equation (7.38)) loses stability via a Hopf bifurcation at µ = 2. This Hopf bifurcation spawns stable, periodic orbits, which are indicated in the diagram by plotting the minimum and maximum values of x for each periodic orbit; the gap between these values gives a visual representation of the amplitude of the periodic solutions as a function of µ.

[Figure 7.13: two panels plotting x (from 0.6 to 1.4) versus t (from 0 to 30), for µ = 1.8 (left) and µ = 2.2 (right).]

Figure 7.13: Sample trace of x versus t obtained by numerical solution of (7.37) with initial conditions x(0) = 0.6, y(0) = 0.4, z(0) = 0.2, and Hill coefficient n = 4. Left panel: With µ = 1.8, transient oscillations occur before the system settles to equilibrium. Right panel: With µ = 2.2, the solution trajectory approaches a limit cycle.



Moreover, this solution depends smoothly and monotonically on µ, and it tends to infinity as µ → ∞. The graph of this solution is the “backbone” of the bifurcation diagram of Figure 7.12.

How does the stability of this equilibrium depend on µ? To answer this question, we compute the Jacobian of the system:

\[ DF = \begin{pmatrix} -1 & -\alpha & 0 \\ 0 & -1 & -\alpha \\ -\alpha & 0 & -1 \end{pmatrix} \tag{7.39} \]

where α = µ n x^{n−1}/(1 + x^n)², and at equilibrium, we have from (7.38) that

\[ \alpha = \frac{n x_{eq}^n}{1 + x_{eq}^n}. \tag{7.40} \]

Note that DF = −I − αB where

\[ B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}. \]

The eigenvalues of B are the cube roots of unity, so the eigenvalues of DF are

\[ -\alpha - 1 \quad \text{and} \quad -\alpha\left(-\frac{1}{2} \pm i\,\frac{\sqrt{3}}{2}\right) - 1. \]

As α increases, the complex-conjugate roots cross the imaginary axis when α = 2, which by (7.40) corresponds to x_eq^n = 2/(n − 2) and

\[ \mu^* = \left(\frac{2}{n-2}\right)^{1/n} \frac{n}{n-2}. \]

Thus, the equilibrium (7.38) is stable if µ < µ∗ and unstable if µ > µ∗.

Calling on simulations, we find that the Hopf bifurcation at µ = µ∗ is supercritical: In Figure 7.13 periodic solutions appear after µ crosses µ∗.
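The threshold µ∗ can be checked against the eigenvalue formula. The sketch below relies on an assumption that we infer from (7.40) and the formula for µ∗ rather than read off directly: that the equilibrium (7.38) satisfies x(1 + xⁿ) = µ. It solves this relation by bisection, evaluates α from (7.40), and returns the real part α/2 − 1 of the complex-conjugate pair. Function names are ours.

```python
def equilibrium(mu, n):
    """Solve x*(1 + x**n) = mu for x > 0 by bisection.
    The relation itself is an ASSUMPTION about (7.38)."""
    lo, hi = 0.0, max(1.0, mu)          # the left side exceeds mu at hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * (1.0 + mid ** n) < mu:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def leading_real_part(mu, n):
    """Real part alpha/2 - 1 of the complex pair of eigenvalues of DF."""
    x = equilibrium(mu, n)
    alpha = n * x ** n / (1.0 + x ** n)     # formula (7.40)
    return alpha / 2.0 - 1.0


def hopf_threshold(n):
    """Critical value of mu for Hill coefficient n > 2."""
    return (2.0 / (n - 2.0)) ** (1.0 / n) * n / (n - 2.0)
```

For n = 4 the threshold evaluates to 2, matching the caption of Figure 7.12, and the real part changes sign between µ = 1.8 and µ = 2.2, the two values used in Figure 7.13.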

(c) The augmented Lotka-Volterra equations

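Another example of a Hopf bifurcation occurs in the Lotka-Volterra equations augmented to include logistic growth and the Allee effect for the prey, as considered in Section 1.6,

\[ \begin{aligned} \text{(a)}\quad x' &= x\,\frac{x - \varepsilon}{x + \varepsilon}\left(1 - \frac{x}{K}\right) - xy \\ \text{(b)}\quad y' &= \rho(xy - y). \end{aligned} \tag{7.41} \]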

Specifically, bifurcation occurs along the curve K = (1 + 2ε − ε²)/(2ε) that separates Regions II and III in Figure 1.11(a). To simplify the calculations to show this⁷, we define the function

\[ \phi(x) = \frac{x - \varepsilon}{x + \varepsilon}\left(1 - \frac{x}{K}\right) \tag{7.42} \]

so that the RHS of the first equation in (7.41) may be rewritten xφ(x) − xy. With this notation, the Jacobian of (7.41) at the co-existence equilibrium (1, φ(1)) equals

\[ DF = \begin{pmatrix} \phi'(1) & -1 \\ \rho\,\phi(1) & 0 \end{pmatrix}. \tag{7.43} \]

Now det DF > 0 provided K > 1, while

\[ \operatorname{tr} DF = \phi'(1) = \frac{2\varepsilon - (1 + 2\varepsilon - \varepsilon^2)/K}{(1 + \varepsilon)^2}. \]

If K = (1 + 2ε − ε²)/(2ε), then tr DF = 0, and hence ℜ λ(DF) = tr DF/2 vanishes. Hence, the system (7.41) undergoes a Hopf bifurcation as K increases through (1 + 2ε − ε²)/(2ε).
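The trace formula and the location of the Hopf curve can be verified numerically. In this sketch (names and the sample value ε = 0.1 are ours), the closed-form trace is compared with a centered finite-difference approximation of φ′(1), and its sign is examined on either side of the curve.

```python
def phi(x, K, eps):
    """The function phi of (7.42)."""
    return (x - eps) / (x + eps) * (1.0 - x / K)


def trace_at_coexistence(K, eps):
    """tr DF = phi'(1), using the closed form stated in the text."""
    return (2.0 * eps - (1.0 + 2.0 * eps - eps ** 2) / K) / (1.0 + eps) ** 2


def hopf_curve(eps):
    """Value of K at which the trace vanishes."""
    return (1.0 + 2.0 * eps - eps ** 2) / (2.0 * eps)


def phi_prime_numeric(K, eps, h=1e-5):
    """Centered finite-difference approximation of phi'(1)."""
    return (phi(1.0 + h, K, eps) - phi(1.0 - h, K, eps)) / (2.0 * h)


K_star = hopf_curve(0.1)
```

For ε = 0.1 the trace is negative for K slightly below `K_star` and positive slightly above it, so the co-existence equilibrium loses stability as K increases through the curve.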

Unlike the preceding two examples, this bifurcation is subcritical. This assertion is supported by the fact that when (K, ε) belongs to Region III, where the co-existence equilibrium is unstable, solutions of (7.41) evolve to states far from the equilibrium: specifically, as shown in Figure 1.11(d), both populations go extinct. We may observe the periodic solutions directly in Figure 7.14, which shows a simulation of (7.41) with “time run backwards”.

7.7 Hopf bifurcation: theory

The Liapunov-Schmidt reduction provides an effective tool for understanding steady-state bifurcation. By contrast, for Hopf bifurcation, no such simple tool is available. Thus, the following purely existential result, which is modeled on Theorem 7.5.3, assumes greater importance.

In the Hopf theorem we also consider a one-parameter family of ODEs (7.32) near a non-hyperbolic equilibrium, but (7.20) is altered as follows. Suppose that for µ = µ∗, the Jacobian DF∗ of this equation at (x_eq(µ∗), µ∗) satisfies:

\[ DF^* \text{ has simple eigenvalues } \pm i\omega_0, \text{ where } \omega_0 \neq 0, \text{ and has no other eigenvalues on the imaginary axis.} \tag{7.44} \]

In particular, zero is not an eigenvalue of DF∗, so by the implicit function theorem, for µ near µ∗, there is a smooth branch of equilibria x_eq(µ) passing through (x∗, µ∗). Let DF_µ be the linearization of (7.32) at these nearby equilibria. By Theorem ?? in

⁷ These calculations repeat work you may have already done in Exercise 2(d) in Chapter 5.



ommend that he/she check the conclusions of the theorems in the context of the example

\[ \begin{aligned} x_1' &= \mu x_1 - x_2 - \alpha x_1(x_1^2 + x_2^2) \\ x_2' &= x_1 + \mu x_2 - \alpha x_2(x_1^2 + x_2^2) \\ x_3' &= -x_3 \\ &\;\;\vdots \\ x_d' &= -x_d, \end{aligned} \tag{7.46} \]

which was constructed from (7.34) by inserting a general coefficient α in the cubic terms and appending d − 2 auxiliary variables whose evolution is trivial.

The bifurcation in (7.46) is supercritical if α > 0, and in this case the bifurcating solutions are stable. Likewise, if α < 0, the bifurcation is subcritical and the solutions are unstable. In the theorem the bifurcation is supercritical if in Item (ii) the quadratic coefficient µ_1 > 0; and subcritical if µ_1 < 0. Moreover, we have:

Theorem 7.7.2. In Theorem 7.7.1, the periodic solutions are stable if µ_1 > 0; unstable if µ_1 < 0.

To describe these bifurcating solutions more quantitatively, it is useful to perform a preliminary reduction of (??). By translating µ and subtracting the steady-state solution from x, we may arrange without loss of generality that (x∗, µ∗) = (0, 0) and in fact x_eq(µ) ≡ 0. Next we move the essential coordinates for the bifurcation to the x_1, x_2 position. It follows from (7.44) that there is a similarity matrix S such that S⁻¹DF∗S has the block structure

\[ S^{-1} DF^* S = \Omega = \begin{pmatrix} \Omega_0 & 0 \\ 0 & A^* \end{pmatrix} \quad \text{where} \quad \Omega_0 = \begin{pmatrix} 0 & -\omega_0 \\ \omega_0 & 0 \end{pmatrix} \tag{7.47} \]

and A∗ is a (d − 2) × (d − 2) matrix with no eigenvalues on the imaginary axis. If we replace x by Sx (without introducing a new name for the new unknown), then we have the transformed equation

\[ x' = S^{-1} F(Sx, \mu) = \Omega x + R(x, \mu) \tag{7.48} \]

where R(x, µ) = O(|x|², µ|x|).

Theorem 7.7.3. If in the above Theorem, the ODE has the form (7.48), then the bifurcating solutions have the form

$$
\gamma(t, a) = \begin{pmatrix} a\cos\omega(a)t \\ a\sin\omega(a)t \\ 0 \\ \vdots \\ 0 \end{pmatrix} + O(a^2)
$$

where the frequency ω(a) has the expansion

$$
\omega(a) = \omega_0 + \omega_1 a + \omega_2 a^2 + \cdots.
$$

Connect thm to stable manifold theorem

In general it is difficult to determine whether a Hopf bifurcation is subcritical or supercritical, but for a 2 × 2 system the following formula may be applied. Let us assume the system has been written in the form

$$
x' = (\Omega_0 + \mu L)x + \begin{pmatrix} f(x, \mu) \\ g(x, \mu) \end{pmatrix} \tag{7.49}
$$

where Ω0 is given by (7.47) and L is a 2 × 2 matrix. Let us write the nonlinear terms

$$
\begin{pmatrix} f(x, \mu) \\ g(x, \mu) \end{pmatrix} = Q(x) + C(x) + O(|x|^4, \mu|x|^2, \mu^2|x|) \tag{7.50}
$$

where Q and C are pure quadratic and cubic, respectively. We will write fx, fxx, etc. for derivatives of f evaluated at x = 0, and similarly for g. Thus for example

$$
Q(x) = \frac{1}{2}\begin{pmatrix} f_{xx}x^2 + 2f_{xy}xy + f_{yy}y^2 \\ g_{xx}x^2 + 2g_{xy}xy + g_{yy}y^2 \end{pmatrix}
$$

where to avoid a proliferation of subscripts, we write x = (x, y); a similar formula holds for C(x).

Theorem 7.7.4. The bifurcation of (7.49) is supercritical if

$$
f_{xxx} + f_{xyy} + g_{xxy} + g_{yyy} + \frac{1}{\omega_0}\left[ f_{xy}(f_{xx} + f_{yy}) - g_{xy}(g_{xx} + g_{yy}) - f_{xx}g_{xx} + f_{yy}g_{yy} \right] < 0 \tag{7.51}
$$

and subcritical if this expression is positive.
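Criterion (7.51) is easy to mechanize. The sketch below is an illustration only: it evaluates the left-hand side of (7.51) from a dictionary of derivative values at the origin (the dictionary keys and function names are assumptions made here, not notation from the text), and applies it to the planar part of example (7.46), for which f = −αx(x² + y²) and g = −αy(x² + y²).

```python
def hopf_criterion(d, omega0):
    """Left-hand side of (7.51).

    `d` maps names such as 'fxx' or 'gxxy' to derivative values at x = 0;
    any derivative not listed is treated as zero.
    """
    get = lambda key: d.get(key, 0.0)
    cubic = get('fxxx') + get('fxyy') + get('gxxy') + get('gyyy')
    quad = (get('fxy') * (get('fxx') + get('fyy'))
            - get('gxy') * (get('gxx') + get('gyy'))
            - get('fxx') * get('gxx')
            + get('fyy') * get('gyy'))
    return cubic + quad / omega0

def example_746(alpha):
    """Third derivatives at 0 of f = -alpha*x*(x^2+y^2), g = -alpha*y*(x^2+y^2)."""
    return {'fxxx': -6 * alpha, 'fxyy': -2 * alpha,
            'gxxy': -2 * alpha, 'gyyy': -6 * alpha}
```

For α = 1 and ω0 = 1 the expression evaluates to −16, which is negative, matching the statement that (7.46) is supercritical when α > 0.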

We shall prove result with calculation similar to calculation of periodic orbit for van der Pol. Can scale van der Pol calculation to make completely parallel.

Q: Make this an exercise with signposts?

Must revise proof, Calculate formula for µ1. This will involve trace of L. Maybe remark trace must be nonzero, maybe normalize so that it’s positive? Also reduce to the case where ω0 = 1.

Look for solution of (7.49) in a series expansion

$$
x(t, \mu) = \mu^{1/2}x_1(\tau) + \mu x_2(\tau) + \mu^{3/2}x_3(\tau) + \cdots \tag{7.52}
$$

where x_j = (x_j, y_j) is 2π-periodic in τ = ω(µ)t with

$$
\omega(\mu) = 1 + \mu^{1/2}\omega_1 + \mu\omega_2 + \cdots.
$$



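Substitute (7.52) into (7.49), collect powers of µ:

$$
\begin{aligned}
\text{(a)}\;\; O(\mu^{1/2}):&\quad \frac{dx_1}{d\tau} - \Omega x_1 = 0 \\
\text{(b)}\;\; O(\mu):&\quad \frac{dx_2}{d\tau} - \Omega x_2 = -\omega_1\frac{dx_1}{d\tau} + Q(x_1) \\
\text{(c)}\;\; O(\mu^{3/2}):&\quad \frac{dx_3}{d\tau} - \Omega x_3 = -\omega_1\frac{dx_2}{d\tau} - \omega_2\frac{dx_1}{d\tau} + Lx_1 + DQ(x_1)x_2 + C(x_1)
\end{aligned}
\tag{7.53}
$$

where DQ(x1) is the 2 × 2 matrix, the differential,

$$
DQ(x_1) = \begin{pmatrix} f_{xx}x_1 + f_{xy}y_1 & f_{xy}x_1 + f_{yy}y_1 \\ g_{xx}x_1 + g_{xy}y_1 & g_{xy}x_1 + g_{yy}y_1 \end{pmatrix} \tag{7.54}
$$

Solution of (7.53a):

$$
x_1(\tau) = a\begin{pmatrix} \cos\tau \\ \sin\tau \end{pmatrix} \tag{7.55}
$$

where a is to be determined from higher-order calculations.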

Solution of (7.53b): First calculate that

$$
Q(x_1) = \frac{a^2}{2}\begin{pmatrix} f_{xx}\cos^2\tau + 2f_{xy}\cos\tau\sin\tau + f_{yy}\sin^2\tau \\ g_{xx}\cos^2\tau + 2g_{xy}\cos\tau\sin\tau + g_{yy}\sin^2\tau \end{pmatrix}.
$$

Expressing cos² and sin² in terms of double angles, we write

$$
Q(x_1) = a^2\left[q_0 + q_1\cos(2\tau) + q_2\sin(2\tau)\right]
$$

where q_j is the constant vector given in the first column of Table 7.3.

For (7.53b) to have a solution, the RHS of this equation must satisfy an orthogonality condition; specifically, for any function w(τ) belonging to the kernel of the transpose operator, we must have

$$
\int_0^{2\pi} \langle \mathrm{RHS}(\tau), w(\tau)\rangle\, d\tau = 0.
$$

Now since Ωᵀ = −Ω, the transpose operator is

$$
-\frac{d}{d\tau} - \Omega^T = -\left(\frac{d}{d\tau} - \Omega\right);
$$

i.e., the transpose operator is just the negative of the original operator. Thus x1 and dx1/dτ span the kernel of the adjoint. Note that Q(x1) is orthogonal to both x1 and




the first term in (7.57) we calculate from (7.55) that

$$
\frac{1}{2\pi}\int_0^{2\pi} \langle Lx_1, x_1\rangle\, d\tau = \frac{a^2}{2}\,\mathrm{tr}\,L.
$$

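For the third term in (7.57) we calculate that

$$
\begin{aligned}
\frac{1}{2\pi}\int_0^{2\pi} \langle C(x_1), x_1\rangle\, d\tau
&= \frac{a^4}{2\pi}\int_0^{2\pi}\left[\frac{f_{xxx}}{6}\cos^3\tau + \frac{f_{xxy}}{2}\cos^2\tau\sin\tau + \frac{f_{xyy}}{2}\cos\tau\sin^2\tau + \frac{f_{yyy}}{6}\sin^3\tau\right]\cos\tau\, d\tau \\
&\quad + \frac{a^4}{2\pi}\int_0^{2\pi}\left[\frac{g_{xxx}}{6}\cos^3\tau + \frac{g_{xxy}}{2}\cos^2\tau\sin\tau + \frac{g_{xyy}}{2}\cos\tau\sin^2\tau + \frac{g_{yyy}}{6}\sin^3\tau\right]\sin\tau\, d\tau.
\end{aligned}
\tag{7.58}
$$

All integrals with an odd power of sin or cos vanish, and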

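$$
\frac{1}{2\pi}\int_0^{2\pi}\cos^4\tau\, d\tau = \frac{1}{2\pi}\int_0^{2\pi}\sin^4\tau\, d\tau = \frac{6}{16},
\qquad
\frac{1}{2\pi}\int_0^{2\pi}\cos^2\tau\sin^2\tau\, d\tau = \frac{2}{16}.
$$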
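These averages can be confirmed numerically. The following sketch uses the midpoint rule, which is essentially exact for trigonometric polynomials on a full period; the helper name `average` and the number of sample points are arbitrary choices made for this illustration.

```python
import math

def average(fn, n=10000):
    """Mean value of fn over [0, 2*pi], computed with the midpoint rule."""
    h = 2 * math.pi / n
    return sum(fn((k + 0.5) * h) for k in range(n)) / n

cos4 = average(lambda t: math.cos(t) ** 4)                      # expected 6/16
sin4 = average(lambda t: math.sin(t) ** 4)                      # expected 6/16
mixed = average(lambda t: math.cos(t) ** 2 * math.sin(t) ** 2)  # expected 2/16
odd = average(lambda t: math.cos(t) ** 3 * math.sin(t))         # odd power: expected 0
```

The last quantity illustrates the remark that terms containing an odd power of sin or cos average to zero.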

Hence (7.58) reduces to

$$
\frac{1}{2\pi}\int_0^{2\pi} \langle C(x_1), x_1\rangle\, d\tau = \frac{a^4}{16}\left(f_{xxx} + f_{xyy} + g_{xxy} + g_{yyy}\right). \tag{7.59}
$$

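For the middle term in (7.57), it is convenient to take a transpose:

$$
\frac{1}{2\pi}\int_0^{2\pi} \langle DQ(x_1)x_2, x_1\rangle\, d\tau = \frac{1}{2\pi}\int_0^{2\pi} \langle x_2, DQ^T(x_1)x_1\rangle\, d\tau.
$$

We have already calculated x2. Substituting into (7.54) we find

$$
DQ^T(x_1)x_1 = a^2\begin{pmatrix} f_{xx}\cos\tau + f_{xy}\sin\tau & f_{xy}\cos\tau + f_{yy}\sin\tau \\ g_{xx}\cos\tau + g_{xy}\sin\tau & g_{xy}\cos\tau + g_{yy}\sin\tau \end{pmatrix}^{T}\begin{pmatrix} \cos\tau \\ \sin\tau \end{pmatrix}.
$$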

Multiplying this out and using the double-angle formulas, we obtain

$$
DQ^T(x_1)x_1 = a^2\left[p_0 + p_1\cos(2\tau) + p_2\sin(2\tau)\right]
$$

where p_j is given in the third column of Table 7.3. Recalling (7.56) for x2 and performing the trivial trigonometric integrals, we see that the middle term in (7.57) equals

$$
\frac{1}{2\pi}\int_0^{2\pi} \langle x_2, DQ^T(x_1)x_1\rangle\, d\tau = \langle v_0, p_0\rangle + \langle v_1, p_1\rangle/2 + \langle v_2, p_2\rangle/2.
$$



Finally, taking v_j, p_j from Table 7.3 we may compute that

$$
\frac{1}{2\pi}\int_0^{2\pi} \langle x_2, DQ^T(x_1)x_1\rangle\, d\tau = \frac{1}{16}\left[f_{xy}(f_{xx} + f_{yy}) - g_{xy}(g_{xx} + g_{yy}) - f_{xx}g_{xx} + f_{yy}g_{yy}\right]. \tag{7.60}
$$

This calculation isn’t as bad as it might seem. Note that there are six independent coefficients in Q(x). In principle a quadratic form in these six coefficients might have 6 × 7/2 = 21 independent terms. However, note in the Table that in the inner product coefficients in the group fxx, gxy, fyy never multiply another coefficient in the same group, only coefficients from the complementary group gxx, fxy, gyy. Thus there are only 3 × 3 = 9 possible independent terms, and it may be seen from (??) that three of the nine vanish and the others all reduce to ±1/16.

Put the three terms together to get the desired result.

7.8 Bifurcation in the FitzHugh-Nagumo equations

Add something about saddle-node bif’n before discussion Hopf?

Let us recall the FitzHugh-Nagumo equations from Chapter 6:

$$
\begin{aligned}
x' &= x^2(1 - x) - y + I \\
y' &= \varepsilon(x - \gamma y).
\end{aligned}
\tag{7.61}
$$

We saw in that chapter that if γ < 3, then (7.61) has a unique equilibrium for every value of I. This equation exhibits relaxation oscillations if 0 < I < 2/3, provided ε is sufficiently small, and the periodic orbit encloses a repelling equilibrium. On the other hand, (7.61) has a globally attracting equilibrium if I < 0 or I > 2/27. Not proved that it’s globally attracting, just stable. As I varies, the transition between these different behaviors is effected through a Hopf bifurcation; see the numerical simulations presented in Figure ??. Here we study the bifurcation that occurs for I close to zero, and our main result is:

Main claim: Equation (7.61) undergoes a Hopf bifurcation at Ibif = ε/2 + O(ε²) that is supercritical if 0 < γ < 3/2 and subcritical if 3/2 < γ < 3.

A completely analogous bifurcation occurs for I close to 2/27. Incidentally, as Figure ?? shows, the periodic solutions bifurcate from an equilibrium that varies (smoothly) with I; this also happened above with (7.41).

Proof of claim: We may eliminate y from the equilibrium equations associated with (7.61), obtaining

$$
x_{eq}^3 - x_{eq}^2 + x_{eq}/\gamma = I. \tag{7.62}
$$

This equation defines the equilibrium value of x as a function of I, and its graph is the backbone of Figure ??. Actually it is more convenient to regard xeq as the independent variable and compute I from (7.62). Thus, we take µ, the equilibrium value of x, as the bifurcation parameter. Following (7.62) we define

$$
I(\mu) = \mu^3 - \mu^2 + \mu/\gamma \tag{7.63}
$$

and rewrite (7.61) as

$$
\begin{aligned}
x' &= x^2(1 - x) - y + I(\mu) \\
y' &= \varepsilon(x - \gamma y).
\end{aligned}
\tag{7.64}
$$

(7.64)

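In terms of incremental variables $\bar x = x - \mu$, $\bar y = y - \mu/\gamma$ we may rewrite (7.64) as

$$
\begin{aligned}
\bar x' &= (\bar x^2 + 2\mu\bar x) - (\bar x^3 + 3\mu\bar x^2 + 3\mu^2\bar x) - \bar y \\
\bar y' &= \varepsilon(\bar x - \gamma\bar y);
\end{aligned}
\tag{7.65}
$$

all terms that are independent of $\bar x$, $\bar y$ cancel, so for all µ, $\bar x = \bar y = 0$ is an equilibrium solution of (7.65). The differential of the equation at this equilibrium is

$$
DF = \begin{pmatrix} 2\mu - 3\mu^2 & -1 \\ \varepsilon & -\gamma\varepsilon \end{pmatrix}. \tag{7.66}
$$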

The determinant of this matrix is ε times γ(3µ² − 2µ) + 1; provided γ < 3 this quadratic in µ is positive for every µ; thus, det DF > 0. The system undergoes a Hopf bifurcation if

$$
\mathrm{tr}\, DF = -3\mu^2 + 2\mu - \gamma\varepsilon = 0 \tag{7.67}
$$

which may be solved to yield

$$
\mu_{bif} = \frac{1 - \sqrt{1 - 3\gamma\varepsilon}}{3} = \frac{\gamma\varepsilon}{2} + O(\varepsilon^2),
$$

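where we have chosen the minus sign, which gives the bifurcation near I = 0. We define $\bar\mu = \mu - \mu_{bif}$ and substitute into (7.65) to obtain, modulo terms that are $O(\bar\mu\bar x^2, \bar\mu^2\bar x)$,

$$
\begin{aligned}
\bar x' &= (2\mu_{bif} - 3\mu_{bif}^2)\bar x - \bar y + \bar\mu(2 - 6\mu_{bif})\bar x + (1 - 3\mu_{bif})\bar x^2 - \bar x^3 \\
\bar y' &= \varepsilon(\bar x - \gamma\bar y);
\end{aligned}
\tag{7.68}
$$

here we have listed the terms on the RHS in the order: linear in $\bar x, \bar y$; $O(\bar\mu\bar x)$; $O(\bar x^2)$; $O(\bar x^3)$.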

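In preparation for applying criterion (7.51), we rewrite (7.68) in matrix notation

$$
\begin{pmatrix} \bar x' \\ \bar y' \end{pmatrix} = (A + \bar\mu B)\begin{pmatrix} \bar x \\ \bar y \end{pmatrix} + \begin{pmatrix} (1 - 3\mu_{bif})\bar x^2 \\ 0 \end{pmatrix} + \begin{pmatrix} -\bar x^3 \\ 0 \end{pmatrix} \tag{7.69}
$$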



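where

$$
A = \begin{pmatrix} \gamma\varepsilon & -1 \\ \varepsilon & -\gamma\varepsilon \end{pmatrix},
\qquad
B = \begin{pmatrix} 2 - 6\mu_{bif} & 0 \\ 0 & 0 \end{pmatrix}.
$$

We need to reduce A to real canonical form as in (7.49). Thus we define

$$
S = \begin{pmatrix} 1 & 0 \\ \gamma\varepsilon & \sqrt{\varepsilon(1 - \gamma^2\varepsilon)} \end{pmatrix}
$$

Explain choice–want x-coordinate to be unchanged to remain simple. Compute

$$
S^{-1}AS = \begin{pmatrix} 0 & -\omega \\ \omega & 0 \end{pmatrix}
$$

where $\omega = \sqrt{\varepsilon(1 - \gamma^2\varepsilon)}$ and

$$
S^{-1}BS = \begin{pmatrix} 2 - 6\mu_{bif} & 0 \\ -\dfrac{\gamma\sqrt{\varepsilon}}{\sqrt{1 - \gamma^2\varepsilon}}\,(2 - 6\mu_{bif}) & 0 \end{pmatrix}.
$$

Thus application of this similarity transformation yields the normal form eqn (7.49) with Ω = S⁻¹AS, L = S⁻¹BS,

$$
Q = (1 - 3\mu_{bif})\begin{pmatrix} x^2 \\ -\dfrac{\gamma\sqrt{\varepsilon}}{\sqrt{1 - \gamma^2\varepsilon}}\, x^2 \end{pmatrix},
\qquad
C = \begin{pmatrix} -x^3 \\ \dfrac{\gamma\sqrt{\varepsilon}}{\sqrt{1 - \gamma^2\varepsilon}}\, x^3 \end{pmatrix}.
$$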

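Now $\omega = \sqrt{\varepsilon}\,(1 + O(\varepsilon))$, and using the notation of Section 7.6, we have

$$
f_{xx} = 2 + O(\varepsilon), \quad g_{xx} = -2\gamma\sqrt{\varepsilon}\,(1 + O(\varepsilon)), \quad f_{xxx} = -6, \quad g_{xxx} = O(\sqrt{\varepsilon}),
$$

while all other derivatives in (7.51) vanish. Substituting into this equation we compute

$$
f_{xxx} + g_{xxx} - \frac{1}{\omega} f_{xx}g_{xx} = -6 + 4\gamma + O(\sqrt{\varepsilon}),
$$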

which proves the claim.
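The sign change at γ = 3/2 can be checked for a concrete small ε. The sketch below is an illustration under stated assumptions: it uses the closed-form µbif and ω from the proof, takes fxx and gxx from the transformed quadratic term Q, and keeps only the two terms of (7.51) that are nonzero for this system. The function name and the sample values of γ and ε are hypothetical.

```python
import math

def fhn_hopf_coefficient(gamma, eps):
    """Evaluate the left-hand side of (7.51) for the transformed FitzHugh-Nagumo system.

    Requires 3*gamma*eps < 1 and gamma**2 * eps < 1 so the square roots are real.
    """
    mu_bif = (1 - math.sqrt(1 - 3 * gamma * eps)) / 3
    omega = math.sqrt(eps * (1 - gamma ** 2 * eps))
    k = gamma * eps / omega            # equals gamma*sqrt(eps)/sqrt(1 - gamma^2*eps)
    fxx = 2 * (1 - 3 * mu_bif)         # second derivative of first component of Q
    gxx = -k * fxx                     # second derivative of second component of Q
    fxxx = -6.0                        # from the cubic term -x^3
    # All other derivatives appearing in (7.51) vanish for this system.
    return fxxx - fxx * gxx / omega
```

For small ε the returned value should be close to −6 + 4γ, so it is negative for γ = 1 (supercritical) and positive for γ = 2 (subcritical).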

Concluding remark: Canard explosion. Also note: period is O(1/√ε) near bif’n point, grows to O(1/ε) in relaxation oscillations.

7.9 Exercises

Subcritical bif’n: Euler strut supported from side.

Q: introduce minimizing energy as a technique to derive eqlb eqn.




Bioswitch example, Ellner and Guckenheimer page 120, making equations symmetric in u and v:

$$
\begin{aligned}
x' &= \frac{\mu}{1 + y^n} - x \\
y' &= \frac{\mu}{1 + x^n} - y.
\end{aligned}
$$

Bistability, through supercrit bif’n if µ increases through positive values, through saddle-node bifurcation if µ decreases through large negative values.

For (7.11), verify the claimed value of α for the bifurcation point. To do this, write the equation for equilibria (with r eliminated) as

$$
\frac{[\beta/\gamma\, x^2] - [\alpha x - 1]}{1 + x^2} = 0.
$$

I.e., put 1 + x² in the denom. When there is one eqlb, num is positive for all x, at bif’n, num becomes a perfect square.

Rosenzweig-MacArthur (after scaling) Note: Different scaling used here, based on carrying capacity. Connect to “nonunique scaling” idea above.

$$
\begin{aligned}
v' &= v(1 - v) - \frac{xv}{1 + Sv} \\
x' &= \frac{Exv}{1 + Sv} - Dx.
\end{aligned}
$$

(change letters?) Find ss-sln: Forget (0, 0) Trivial solution (1, 0). Stable for large D, loses stability. Find nontrivial solution branch that bifurcates at point where trivial sln loses stability. Show stable nearby. Caveat.

Show there is a Hopf bifurcation for still smaller D.

Give alternate conditions for an eigenvalue to be simple: ker intersect range = zero, dim ker (A − λI)² is one.

Note there is energy function for rotating bead, provided you include the rotational energy.

the CSTR: continuous stirred-tank reactor (make exercise)

Start with dim’l eqn. Do dim’l analysis, scaling. Obtain scaled equations:

$$
\begin{aligned}
c' &= \mu(1 - c) - ce^{T} \\
T' &= -\mu T + Hce^{T}.
\end{aligned}
$$

Assume H is large. Find equilibria as function of µ. Show unique ss-sln for µ large or small, three ss-sln in between.

Rmk: If H = 4, there is a “hysteresis point”. If H > 4, have interval of µ with 3 sln; if H < 4, no such interval. Q: Make Exercise out of this?




Hopf bifurcation for equilibria of Lorenz at large ρ

Rossler ODE has saddle-node bifurcation before all the period-doubling action starts.

Hopf bif’n in repressilator. (See “junk” at end of TeX file.)

7.10 Ideas

• Other examples? E.g., Selkov’s glycolysis model? Nowak’s virus model?

• Appendix: Do eigenvalues of matrix depend continuously on entries?

• Reference for proofs.

What about degenerate bifurcation, “codim 2”? Combine a couple of the above? Ex: buckling of a “spring-beam”. Something in Morris-Lecar?

Terms to give intuition about: unfolding, robust, generic (=what produces only robust behavior)

Construct coordinate transf to show can push some terms out to higher order.

Give intuitive argument that hot (−x3, 0) or (x2, x2) stabilize the basic Hopf equation. Explain why quadratic more potent at high rotation frequency.




Chapter 8

Global bifurcations

In this chapter we present examples of five types of global bifurcations. Some of these bifurcations are actually present in equations considered above, and we begin with these. Unfortunately, with this order of presentation, the examples that exhibit more interesting behavior are pushed to later on the list. For most cases, we begin the discussion of a type of bifurcation with an academic (pedagogical?) example and then proceed to examples with more physical interest.

Overall remark: Even more dependent on examples and calculations than in previous chapters. Can’t prove much of anything.

8.1 Mutual annihilation of two limit cycles

8.1.1 An academic example

$$
\begin{aligned}
r' &= (r - 1)^2 + \mu \\
\theta' &= 1.
\end{aligned}
\tag{8.1}
$$

If −1 < µ < 0, there are two periodic orbits, r = 1 ± √(−µ), the inner one stable and the outer one unstable. If µ > 0, there are none. This bifurcation is completely analogous to saddle-node bifurcation of equilibria, even though it is “nonlocal” (explain why call it nonlocal).
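The orbit count and stability in (8.1) follow from the scalar radial equation alone, which the following sketch makes explicit. It is an illustration, with a hypothetical function name: a periodic orbit corresponds to a positive root of (r − 1)² + µ, and its stability is read off from the sign of the derivative 2(r − 1) at that root.

```python
import math

def periodic_orbits(mu):
    """Radii of the periodic orbits of r' = (r-1)^2 + mu, theta' = 1, with stability."""
    if mu > 0:
        return []                       # no roots: r' > 0 everywhere
    if mu == 0:
        return [(1.0, "semistable")]    # double root at the bifurcation point
    s = math.sqrt(-mu)
    orbits = []
    for r in (1 - s, 1 + s):
        if r <= 0:
            continue                    # not a valid radius (happens when mu <= -1)
        slope = 2 * (r - 1)             # derivative of (r-1)^2 + mu with respect to r
        orbits.append((r, "stable" if slope < 0 else "unstable"))
    return orbits
```

For instance, `periodic_orbits(-0.25)` reports a stable orbit at r = 0.5 and an unstable one at r = 1.5, and the two merge and vanish as µ increases through 0.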

8.1.2 The FitzHugh-Nagumo equations

If have subcritical Hopf bifurcation and bifurcating limit cycle “turns around”, an unstable cycle becoming stable. Refer to Chapter 7.




8.1.3 Phase locking in coupled oscillators

Consider equations on torus T², with coordinates θ, φ. (Could of course pose on R⁴, using polar coordinates on each copy of R².)

$$
\begin{aligned}
\theta' &= \omega_1 - K_1\sin(\varphi - \theta) \\
\varphi' &= \omega_2 - K_2\sin(\varphi - \theta).
\end{aligned}
$$

Interpret (from Strogatz): two runners on circular track. Each has natural speed, and these are different, but they place a value on running together.

Find solutions, phase locked if differences small enough. Note there is a M/A bifurcation at edge of inequality above.

Warning: If inequality is not satisfied, behavior is a lot more complicated than one might guess. See Appendix on ODEs on torus.

8.2 Saddle-node bifurcation points on a limit cycle


$$
\begin{aligned}
r' &= 1 - r \\
\theta' &= \mu - \cos\theta
\end{aligned}
$$

Bifurcation occurs at µ = 1. Describe

Q: Add (r − 1)² to θ-eqn? Connect with example of eqlb that is attracting but not stable.

8.2.2 The overdamped torqued pendulum

Recall the torqued pendulum from Chapter 7:

$$
\begin{aligned}
x' &= y \\
y' &= -\sin x - \beta y + \mu.
\end{aligned}
\tag{8.2}
$$

We showed that this system undergoes a saddle-node bifurcation at µ = 1: as µ crosses 1, the two solutions of

$$
\sin x = \mu
$$

merge and disappear into the complex domain. Let us suppose that β is large: i.e., there is a lot of damping. Argue that for µ < 1 the two halves of the unstable manifold from the saddle point fall into the stable equilibrium, one being very short and the other making a nearly complete revolution. Together these make up a homoclinic cycle. When µ crosses 1, the equilibria disappear and the homoclinic cycle becomes a limit cycle.




8.2.3 Other examples

Show exists in FitzHugh-Nagumo? In dumbed-down Morris-Lecar

$$
\begin{aligned}
v' &= C(v) - w + I \\
w' &= \varepsilon\left[e(\gamma v - v_0) - w\right]
\end{aligned}
$$

Here C is an appropriate cubic, perhaps C(v) = v²(1 − v) or C(v) = v(1 − v²), and e is an “elbow” function,

$$
e(v) = \left[\sqrt{v^2 + 1} + v\right]/2.
$$

The model has 4 parameters, I, ε, γ, v0.

8.3 Homoclinic bifurcation

8.3.1 van der Pol with nonlinearity in the restoring force

In this case there is a physical example that already is simple enough to exhibit the phenomenon, so there is no need to begin with an academic example.

$$
\begin{aligned}
x' &= y \\
y' &= -\beta(x^2 - 1)y - x - \varepsilon x^2.
\end{aligned}
\tag{8.3}
$$

Difference from van der Pol: the linear restoring force is perturbed by the nonlinear term εx².

If ε is small, the periodic orbit of van der Pol continues to exist, being only slightly perturbed. Argue or compute to show: If ε is increased sufficiently, the periodic orbit disappears. Argue that it disappears through a homoclinic bifurcation (must define this term).

Do dimension counting to show homoclinic requires special value of parameter.

8.3.2 The torqued pendulum with small damping

The torqued pendulum considered above, (8.2), also exhibits a homoclinic bifurcation as the torque µ is increased, provided the damping β is small. Show this. Maybe it’s clearer to consider decreasing µ, from situation in which periodic orbit exists to disappears. Rmk: Phase space is S¹ × R.

Q: At what point does transition between saddle-node on a limit cycle and homoclinic bifurcation occur? Show diagram in Strogatz of figure taken from paper by Levi.




8.3.3 The Lotka-Volterra model with logistic growth and the Allee effect

Strictly speaking should call this homoclinic-cycle bifurcation. (Recall homoclinic cycles in Poincare-Bendixson Theorem.)

Recall equations

Know unstable limit cycle appears as K decreases, but where does it go (not present at K = 1 when coexistence equilibrium meets prey-only equilibrium at (K, 0))? Show homoclinic cycle, involving connecting orbits between saddle points at (ε, 0) and (K, 0). Compute the value of K at the bifurcation.

8.3.4 Other examples

FitzHugh-Nagumo or dd-Morris-Lecar?

Appearance of periodic orbit in Lorenz–part of the onset of chaos.

Ex of bifurcation from saddle point to unstable limit cycle, like in Lorenz? important! (There is an example in my notes, between lectures 24 and 25.)

8.4 Hopf-like bifurcation to an invariant torus

Can describe with Poincare map, e-value = e±iα. Note analogy with Hopf bifurcationof equilibrium.


Pose on R³. Define coordinates (create name), starting from cylindrical coord r, θ, z. Specifically, θ unchanged, use polar coordinates to describe r, z-plane with origin at (r, z) = (1, 0):

$$
r - 1 = \rho\cos\varphi, \qquad z = \rho\sin\varphi,
$$

or the inverse

$$
\rho = \sqrt{(r - 1)^2 + z^2}, \qquad \varphi = \arctan\left(z/(r - 1)\right).
$$

Give figure. Specify range of variables.

Consider

$$
\begin{aligned}
\theta' &= 1 \\
\rho' &= \mu\rho - \rho^3 \\
\varphi' &= \omega + \beta\rho^2.
\end{aligned}
$$

Discuss behavior as µ crosses 0: the ρ, φ-subsystem undergoes a Hopf bifurcation. But orbits are located on a torus, i.e., θ evolves along with ρ, φ.

Discuss closed orbits and skew lines on T2.




Warning: Typically (generically) behavior of orbits on T² not so simple–see Appendix. For example if added a term −K sin(φ − θ) to the φ equation, would get the kind of behavior described in Appendix.

Rmk: Example of quasi-periodic behavior in Strogatz, motion in a central force field, Strogatz, problem 8.6.7 on p295.

8.4.2 The forced van der Pol equation

x′′ + β (x2 − 1)x′ + x = γ cos(ωt).

If forcing γ is large, there is a unique stable periodic orbit with period 2π/ω. (Cf. forcing in linear case.) Expect if γ decreases, periodicity of unperturbed system will play a role. Confirm with computations. (Use Section 2.1 of Guck-Holmes as guide in param choice, at least if weakly nonlinear.)

This system considered in much greater detail in the Appendix, ODEs on a torus. Is it really? Not just used to motivate idea of ODE on torus?

Incidentally, this type of bifurcation occurs in many fluid-mechanics problems (PDE). Part of the evolution towards turbulence at large Reynolds number. Ref’ce??

8.5 Period doubling

Note: can also describe with Poincare map, e-value = −1. (Strictly speaking there is a third case, in which e-value of Poincare map equals +1. Not very interesting. See Exercises.)


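Cylindrical coordinates, but still with r = 1, z = 0 as a periodic orbit.

$$
\begin{aligned}
\theta' &= 1 \\
\begin{pmatrix} (r - 1)' \\ z' \end{pmatrix} &= \left(\Omega - I + \mu A(\theta)\right)\begin{pmatrix} r - 1 \\ z \end{pmatrix}
\end{aligned}
$$

where I is the 2 × 2 identity matrix,

$$
\Omega = \begin{pmatrix} 0 & -1/2 \\ 1/2 & 0 \end{pmatrix},
\qquad
A(\theta) = \begin{pmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{pmatrix}.
$$

Explain how one rotating direction is amplified, the orthogonal direction is damped.

Write down explicit solutions. Phenomenon: new periodic solutions with period 4π.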




8.5.2 Rossler’s equation

Should have introduced this in Exercises in Chapter 7. It exhibits both saddle-node and Hopf bifurcations.

No physical basis, just ingenuity of Rossler.

$$
\begin{aligned}
x' &= -y - z \\
y' &= x + ay \\
z' &= b + z(x - c).
\end{aligned}
\tag{8.4}
$$

Numerical solutions. Period doubling after Hopf bifurcation. In fact, infinite sequence of these!

Q: What discussion of period doubling in 1-dim maps. Ex can handle explicitly

$$
x_{n+1} = -\mu\tanh(\mu x_n)
$$

where µ passes through 1. Mention period doubling in logistic map? Do something with alternans?
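The period doubling in this one-dimensional map is easy to observe by direct iteration. The sketch below is illustrative only; the function name, starting value, and iteration count are assumptions. For µ < 1 the iterates should collapse onto the fixed point 0, while for µ slightly above 1 they should settle onto a 2-cycle {x∗, −x∗} with x∗ = µ tanh(µ x∗).

```python
import math

def two_iterates(mu, x0=0.3, n=2000):
    """Iterate x -> -mu*tanh(mu*x) n times and return the last iterate and the next one."""
    x = x0
    for _ in range(n):
        x = -mu * math.tanh(mu * x)
    return x, -mu * math.tanh(mu * x)
```

Calling `two_iterates(0.9)` should return two values indistinguishable from zero, whereas `two_iterates(1.2)` should return a pair of equal magnitude and opposite sign, the signature of a period-2 orbit.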

8.6 Appendix: ODEs on a torus

Example: Forced van der Pol.

Discuss this problem through Poincare map.

Discuss Arnold’s example in Wiggins, Chapter 21. Get Arnold tongues. (Rmk: are boundaries of tongues examples of MA bifurcation?)

8.7 Appendix: What is chaos?

Mention Fourier analysis. Definitely discuss transition to chaos in Lorenz (here or somewhere else?) Mention chaos in forced Duffing eqn?

8.8 ideas

Bursting

What on fast-slow systems?

APPENDIX A:




Appendix A

Guide to Commonly Used Notation

Symbol       Usual Meaning
R^d          d-dimensional Euclidean space
α, β         real-valued parameters
t            independent variable (often corresponds to time)
x            vector of dependent variables for a system of ODEs
b            vector of initial conditions for a system of ODEs
A, B, S, J   matrices (usually square and of dimension d × d)
Λ            a diagonal matrix
N            a nilpotent matrix
tr(A)        trace of the square matrix A
det(A)       determinant of the square matrix A
‖A‖          norm of a matrix A; see Equation (2.12)
λ            eigenvalue of a square matrix
F            vector-valued function of a vector (often F : R^d → R^d)




Appendix B

Notions from Advanced Calculus

Supremum and infimum: If E ⊂ R, a number b is called an upper bound for E if x ≤ b for all x ∈ E. We say that b is the supremum of E if b is the least upper bound for E in the sense that (i) b is an upper bound for E, and (ii) if b′ is an upper bound for E, then b ≤ b′. The infimum of E is defined analogously as the greatest lower bound for E. The supremum and infimum of E ⊂ R are denoted by sup(E) and inf(E), respectively. If, for example, E = {1, 1/2, 1/3, 1/4, . . . } ⊂ R, then sup(E) = 1 and inf(E) = 0. Notice that in this example, sup(E) ∈ E whereas inf(E) ∉ E. When a set E contains its supremum, we typically write sup(E) = max(E), the maximum of the set E. Likewise, if inf(E) ∈ E we may write inf(E) = min(E), the minimum of the set E. If a < b, then the open interval (a, b) and the closed interval [a, b] have the same infimum and supremum (a and b, respectively), whereas only the closed interval has a maximum and a minimum.

Compactness: Let E ⊂ R^d. An open cover of E is any collection $\{\Omega_\alpha\}_{\alpha\in I}$ of open subsets of R^d with the property that

$$
E \subset \bigcup_{\alpha\in I}\Omega_\alpha.
$$

This union of sets need not be countable, and for that reason we have indexed the sets using real numbers α, drawn from some index set I. The set E is called compact if every open cover of E has a finite subcover. That is, E is compact if from every open cover of E, we may select finitely many open sets whose union still contains E.

The same definition of compactness applies to subsets of spaces that are more “abstract” than Euclidean space R^d. Luckily for us, spotting compact subsets of R^d is much easier than the above definition would suggest:

Theorem B.0.1. (Heine-Borel) A subset E ⊂ R^d is compact if and only if E is closed and bounded.

We emphasize that the Heine-Borel theorem is a luxury associated with working in Euclidean space. In any metric space, compactness always implies closedness and boundedness, but the reverse implication need not hold.

Many of the technical proofs throughout this text require us to estimate functions, their derivatives, or their integrals, over some given subset of R^d. Working with continuous functions over compact domains can facilitate such estimates, and the following two theorems are invaluable.

Theorem B.0.2. Suppose that E ⊂ R^{d1} is compact (i.e., closed and bounded) and F : E → R^{d2} is continuous. Then the image F(E) is a compact subset of R^{d2}.

Phrased more compactly (bad pun intended), images of compact sets under continuous functions are compact.

Theorem B.0.3. (Extreme Value Theorem) Suppose that E ⊂ R^d is compact and F : E → R is continuous. Then there exist points xmin, xmax ∈ E such that

$$
F(x_{min}) = \inf_{x\in E} F(x) \quad\text{and}\quad F(x_{max}) = \sup_{x\in E} F(x).
$$

The extreme value theorem guarantees that continuous functions F : E → R always achieve maximum and minimum values over compact sets E. Regarding the infimum and supremum in the statement of Theorem B.0.3, we typically refer to these values as the [absolute] minimum and maximum, respectively, of F over the compact set E.

For estimates involving continuous vector-valued functions, a slight adaptation of Theorem B.0.3 will aid us:

Corollary B.0.4. Suppose that E ⊂ R^{d1} is compact and F : E → R^{d2} is continuous. Then there exist points xmin, xmax ∈ E such that

$$
|F(x_{min})| = \inf_{x\in E} |F(x)| \quad\text{and}\quad |F(x_{max})| = \sup_{x\in E} |F(x)|.
$$

To prove the Corollary, define the function G : R^{d2} → R according to the rule G(y) = |y|. Since F and G are continuous, so is the composition (G ∘ F) : E → R. The corollary follows upon applying the extreme value theorem to (G ∘ F).

Sequences: Suppose a_n, n = 1, 2, 3, . . . is a sequence of points in R^d. We say that a_n converges to a limit L if for every ε > 0 there exists an integer N = N(ε) such that |a_n − L| < ε whenever n ≥ N.

There are many different notions of convergence for sequences of functions, two of which we single out for use throughout this text. Suppose that f_n, n = 1, 2, 3, . . . , is a sequence of functions defined over some set E ⊂ R^d. We say that f_n converges

• pointwise on E to a function f if for every ε > 0 and x ∈ E, there exists an integer N = N(x, ε) such that |f_n(x) − f(x)| < ε whenever n ≥ N;

• uniformly on E to a function f if for every ε > 0, there exists an integer N = N(ε) such that whenever n ≥ N, |f_n(x) − f(x)| < ε for all x ∈ E.

Take a moment to contrast the definitions of pointwise and uniform convergence. Importantly, notice that in the definition of pointwise convergence, the integer N can depend upon both ε and x, whereas in the definition of uniform convergence, N depends only upon ε. For uniform convergence, the same integer N has to work for all x ∈ E, and it is evident that uniform convergence automatically implies pointwise convergence. On the other hand, pointwise convergence of a sequence of functions does not imply uniform convergence. Consider, for example, the functions f_n : [0, 2] → R defined by

$$
f_n(x) = \frac{x^n}{1 + x^n}, \qquad (n = 1, 2, 3, \ldots).
$$

We claim that this sequence converges pointwise but not uniformly to the function

$$
f(x) = \begin{cases} 1 & \text{if } x \in (1, 2], \\ \tfrac{1}{2} & \text{if } x = 1, \\ 0 & \text{if } x \in [0, 1). \end{cases}
$$

First, suppose that x ∈ (0, 1), where we have excluded the endpoints because pointwise convergence is obvious at those two points. Given ε > 0, we must produce an integer N = N(x, ε) for which

$$
|f_n(x) - f(x)| = \frac{x^n}{1 + x^n} < \varepsilon \quad\text{for all } n \ge N.
$$

To motivate our choice of N, use algebra to rewrite this inequality as

$$
n > \frac{\ln\left(\frac{\varepsilon}{1 - \varepsilon}\right)}{\ln x}
$$

where¹ we have used the fact that ln x < 0 since x ∈ (0, 1). Choosing

$$
N(x, \varepsilon) = \left\lceil \frac{\ln\left(\frac{\varepsilon}{1 - \varepsilon}\right)}{\ln x} \right\rceil,
$$

where ⌈y⌉ denotes the least integer not less than y, we have established pointwise convergence over the interval 0 ≤ x ≤ 1. The same sort of algebraic manipulations can be used to prove that f_n(x) → f(x) pointwise for x ∈ (1, 2]. For x in that interval, the reader is encouraged to produce an integer N = N(x, ε) such that

$$
|f_n(x) - f(x)| = 1 - \frac{x^n}{1 + x^n} = \frac{1}{1 + x^n} < \varepsilon \quad\text{for all } n \ge N.
$$

(This time, keep in mind that ln x > 0.)

¹ We may as well assume ε < 1 since 0 < f_n(x) < 1 for all x ∈ (0, 1). This allows us to dodge the possibility of dividing by zero when we choose N(x, ε).

To see why the convergence is not uniform, refer to the above calculation for N(x, ε) for x ∈ (0, 1). Our choice of N = N(x, ε) is “best possible” in the sense that N is the least integer for which the inequality |f_n(x) − f(x)| < ε holds for each n ≥ N. Again assuming ε < 1 (see footnote), letting x approach 1 would force us to choose N larger and larger, implying that the convergence is not uniform.
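The growth of N(x, ε) as x approaches 1 can be seen directly by evaluating the formula. The sketch below is an illustration only; the function names `f` and `N` and the sample values of x and ε are choices made here, and the clamp to 1 simply reflects that the sequence starts at n = 1.

```python
import math

def f(x, n):
    """The n-th function of the sequence, f_n(x) = x^n / (1 + x^n)."""
    return x ** n / (1 + x ** n)

def N(x, eps):
    """Threshold N(x, eps) from the text, valid for 0 < x < 1 and 0 < eps < 1."""
    value = math.log(eps / (1 - eps)) / math.log(x)
    return max(1, math.ceil(value))
```

With ε = 0.01, the threshold is 7 at x = 0.5 and climbs steadily for x = 0.9, 0.99, 0.999, so no single N can serve every x in (0, 1).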

The limiting function f(x) in the preceding example is discontinuous even though the functions f_n(x) are continuous. The next theorem ensures that this cannot happen for uniformly convergent sequences of continuous functions.

Theorem B.0.5. If f_n is a sequence of continuous functions which converges uniformly to f(x) on a set E ⊂ R^d, then f(x) is continuous on E.

If the functions f_n in the statement of Theorem B.0.5 happen to be differentiable, we might wonder whether the sequence f′_n converges to f′. A cautionary example puts that line of questioning to rest: the sequence of differentiable functions

$$
f_n(x) = \frac{\arctan(nx)}{\sqrt{n}}
$$

converges uniformly to the constant function f(x) = 0 over the entire real line. However, the derivatives

$$
f_n'(x) = \frac{\sqrt{n}}{1 + n^2x^2}
$$

have the unfortunate property that f′_n(0) = √n → ∞ as n → ∞, whereas f′(0) = 0.

With slightly stronger conditions on the sequence f_n, we can draw conclusions regarding uniform limits of sequences of differentiable functions:

Theorem B.0.6. Suppose that f_n is a sequence of differentiable functions on [a, b] and that for some point x0 ∈ [a, b], the sequence f_n(x0) converges. If f′_n converges uniformly on [a, b], then f_n converges uniformly on [a, b] to a function f, and

$$
f'(x) = \lim_{n\to\infty} f_n'(x) \qquad (a \le x \le b).
$$

Under weaker hypotheses, an analogous result holds for integrals:



Theorem B.0.7. Suppose that f_n is a sequence of integrable functions on [a, b] and that f_n converges uniformly to f on [a, b]. Then f is integrable on [a, b] and

$$
\int_a^b f(x)\, dx = \lim_{n\to\infty}\int_a^b f_n(x)\, dx.
$$

Series: Suppose a_n ∈ R^d, n = 1, 2, 3, . . . and consider the infinite series $\sum_{n=1}^{\infty} a_n$. The mth partial sum of the series is the finite sum $S_m = \sum_{n=1}^{m} a_n$. We say that the infinite series converges to a limit L if the sequence of partial sums S_m converges to L, and in this case we shall write $\sum_{n=1}^{\infty} a_n = L$. The infinite series converges absolutely if the series

$$
\sum_{n=1}^{\infty} |a_n|,
$$

whose terms are non-negative scalars, converges.

Absolute convergence of a series is a stronger property than [ordinary] convergence, as we now state more precisely.

Proposition B.0.8. If the series

$$
\sum_{n=1}^{\infty} a_n \qquad (a_n \in R^d)
$$

converges absolutely, then the series converges.

Proof. Assuming the hypothesis of the proposition, let $\sum_{n=1}^{\infty} |a_n| = L$ (a scalar). Given ε > 0, choose an integer N∗ = N∗(ε) large enough that

$$
L - \sum_{n=1}^{N^*} |a_n| = \sum_{n=N^*+1}^{\infty} |a_n| < \varepsilon.
$$

We claim that the sequence of partial sums of $\sum_{n=1}^{\infty} a_n$ is Cauchy, and therefore convergent since R^d is complete. Choose integers M, N larger than N∗ and consider the partial sums

$$
S_M = \sum_{n=1}^{M} a_n \quad\text{and}\quad S_N = \sum_{n=1}^{N} a_n.
$$

Assuming without loss of generality that N > M, the estimate

$$
|S_N - S_M| = \left|\sum_{n=M+1}^{N} a_n\right| \le \sum_{n=M+1}^{N} |a_n| \le \sum_{n=N^*+1}^{\infty} |a_n| < \varepsilon
$$

shows that the sequence of partial sums is Cauchy, completing the proof.



Theorems B.0.5 through B.0.7 have immediate corollaries involving series of functions. If $\sum_{n=1}^{\infty} f_n(x)$ is an infinite series of functions defined over an interval [a, b], the mth partial sum is the finite sum $S_m(x) = \sum_{n=1}^{m} f_n(x)$. We say that the infinite series $\sum_{n=1}^{\infty} f_n(x)$ converges pointwise to S(x) on [a, b] if the sequence S_m(x) converges pointwise to S(x). Likewise, we say that the series converges uniformly to S(x) on [a, b] if the sequence S_m(x) converges uniformly to S(x).

Corollary B.0.9. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly to S(x) on a set E ⊂ R^d and that each term f_n(x) is continuous on E. Then S(x) is continuous on E.

The next two corollaries provide conditions under which we are justified in differentiating or integrating an infinite series term-by-term.

Corollary B.0.10. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ is a sum of differentiable functions on the interval [a, b] and suppose that the partial sums S_m(x) obey the hypotheses of Theorem B.0.6. That is, assume that the sequence S_m(x0) converges for some choice of x0 ∈ [a, b], and that S′_m(x) converges uniformly on [a, b]. Then S_m converges uniformly to a function S(x), and S′(x) = lim_{m→∞} S′_m(x). Equivalently,

$$
\left(\sum_{n=1}^{\infty} f_n(x)\right)' = \sum_{n=1}^{\infty} f_n'(x).
$$

Corollary B.0.11. Suppose that $\sum_{n=1}^{\infty} f_n(x)$ is a sum of integrable functions on the interval [a, b], and that $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on [a, b]. Then $\sum_{n=1}^{\infty} f_n(x)$ is integrable, and the order of integration and summation can be interchanged as

$$
\int_a^b \left(\sum_{n=1}^{\infty} f_n(x)\right) dx = \sum_{n=1}^{\infty}\left(\int_a^b f_n(x)\, dx\right).
$$

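Derivatives of vector-valued functions: Suppose $E \subset \mathbb{R}^{d_1}$ is open and that $\mathbf{F} : E \to \mathbb{R}^{d_2}$.

Definition B.0.12. We say that $\mathbf{F}$ is differentiable at $\mathbf{x} \in E$ if there exists a linear transformation $A$ from $\mathbb{R}^{d_1}$ into $\mathbb{R}^{d_2}$ such that

\[ \lim_{\mathbf{h} \to 0} \frac{|\mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) - A\mathbf{h}|}{|\mathbf{h}|} = 0. \]

We refer to $A$ as the derivative of $\mathbf{F}$ at $\mathbf{x}$, and write $A = D\mathbf{F}(\mathbf{x})$.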

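There are several equivalent ways to state this definition; e.g., setting $\mathbf{h} = \mathbf{y} - \mathbf{x}$ reveals that $\mathbf{F}$ is differentiable at $\mathbf{x}$ if and only if there exists a linear transformation $D\mathbf{F}(\mathbf{x}) : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ such that

\[ \lim_{\mathbf{y} \to \mathbf{x}} \frac{|\mathbf{F}(\mathbf{y}) - \mathbf{F}(\mathbf{x}) - D\mathbf{F}(\mathbf{x})(\mathbf{y} - \mathbf{x})|}{|\mathbf{y} - \mathbf{x}|} = 0. \]

Alternatively, $\mathbf{F}$ is differentiable at $\mathbf{x}$ if and only if there exists a linear transformation $D\mathbf{F}(\mathbf{x}) : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ such that

\[ \mathbf{F}(\mathbf{x} + \mathbf{h}) - \mathbf{F}(\mathbf{x}) = D\mathbf{F}(\mathbf{x})\mathbf{h} + \mathbf{r}(\mathbf{h}), \]

where the remainder $\mathbf{r}$ is “small” in the sense that

\[ \lim_{\mathbf{h} \to 0} \frac{|\mathbf{r}(\mathbf{h})|}{|\mathbf{h}|} = 0. \]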

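A function $\mathbf{F} : E \subset \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ can be written in component form as

\[ \mathbf{F}(x_1, x_2, \dots, x_{d_1}) = \begin{pmatrix} F_1(x_1, x_2, \dots, x_{d_1}) \\ F_2(x_1, x_2, \dots, x_{d_1}) \\ \vdots \\ F_{d_2}(x_1, x_2, \dots, x_{d_1}) \end{pmatrix}, \]

where each of the functions $F_1, F_2, \dots, F_{d_2}$ is scalar-valued and $(x_1, x_2, \dots, x_{d_1}) \in E$. If $\mathbf{F}$ is differentiable on $E$, then the partial derivatives

\[ \partial F_i / \partial x_j \quad (i = 1, 2, \dots, d_2 \text{ and } j = 1, 2, \dots, d_1) \]

exist and, for each $\mathbf{x} \in E$, the derivative $D\mathbf{F}(\mathbf{x})$ has the $d_2 \times d_1$ matrix representation

\[ \begin{pmatrix} \partial F_1/\partial x_1 & \partial F_1/\partial x_2 & \cdots & \partial F_1/\partial x_{d_1} \\ \partial F_2/\partial x_1 & \partial F_2/\partial x_2 & \cdots & \partial F_2/\partial x_{d_1} \\ \vdots & \vdots & \ddots & \vdots \\ \partial F_{d_2}/\partial x_1 & \partial F_{d_2}/\partial x_2 & \cdots & \partial F_{d_2}/\partial x_{d_1} \end{pmatrix} \tag{B.1} \]

with respect to the standard bases for $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$. The matrix in Equation (B.1) is called the Jacobian of $\mathbf{F}$.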

Remark: If $E$ is an open subset of $\mathbb{R}^{d_1}$ over which $\mathbf{F} : E \to \mathbb{R}^{d_2}$ is differentiable, then the Jacobian matrix exists for each $\mathbf{x} \in E$ and provides a convenient representation for $D\mathbf{F}(\mathbf{x})$. However, existence of the partial derivatives in the matrix (B.1) is not enough to conclude that $\mathbf{F}$ is differentiable. Consider, for example, the function $F : \mathbb{R}^2 \to \mathbb{R}$ defined by

\[ F(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 x_2 = 0 \\ 0 & \text{otherwise,} \end{cases} \]

which is discontinuous (and therefore non-differentiable) along both coordinate axes. In particular, $F$ is not differentiable at $(0, 0)$ despite the fact that both partial derivatives $\partial F/\partial x_1$ and $\partial F/\partial x_2$ exist (and are equal to 0) at that point. Such pathologies can be avoided if the partial derivatives in the Jacobian matrix are continuous on $E$:

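Theorem B.0.13. Suppose $E \subset \mathbb{R}^{d_1}$ is open and $\mathbf{F} : E \to \mathbb{R}^{d_2}$. If all partial derivatives $\partial F_i/\partial x_j$ in the Jacobian matrix (B.1) of $\mathbf{F}$ are continuous on $E$, then $\mathbf{F}$ is differentiable on $E$.

Compositions of differentiable functions are also differentiable, and the chain rule from single-variable calculus generalizes to higher dimensions as follows.

Theorem B.0.14. (Chain Rule): Suppose $E \subset \mathbb{R}^{d_1}$ is open and $\mathbf{F} : E \to \mathbb{R}^{d_2}$ is differentiable at $\mathbf{x}_0 \in E$. Further suppose that $\mathbf{G}$ maps an open set containing $\mathbf{F}(E)$ into $\mathbb{R}^{d_3}$ and that $\mathbf{G}$ is differentiable at $\mathbf{F}(\mathbf{x}_0)$. Then the composition $\mathbf{H}(\mathbf{x}) = (\mathbf{G} \circ \mathbf{F})(\mathbf{x}) = \mathbf{G}(\mathbf{F}(\mathbf{x}))$ mapping $E$ into $\mathbb{R}^{d_3}$ is differentiable at $\mathbf{x}_0$ and

\[ D\mathbf{H}(\mathbf{x}_0) = D\mathbf{G}(\mathbf{F}(\mathbf{x}_0)) \, D\mathbf{F}(\mathbf{x}_0). \]

Note: The composition $D\mathbf{G}(\mathbf{F}(\mathbf{x}_0)) D\mathbf{F}(\mathbf{x}_0)$ can be computed by multiplying the $d_3 \times d_2$ Jacobian matrix representation of $D\mathbf{G}(\mathbf{F}(\mathbf{x}_0))$ with the $d_2 \times d_1$ Jacobian matrix representation of $D\mathbf{F}(\mathbf{x}_0)$, resulting in a $d_3 \times d_1$ Jacobian matrix representation of $D\mathbf{H}(\mathbf{x}_0)$.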

The implicit function theorem: Under certain circumstances, an equation of the form $F(x, y) = 0$ implicitly defines a function $y = f(x)$. If the function $F(x, y)$ is scalar-valued, then it is not difficult to state conditions guaranteeing that the equation $F(x, y) = 0$ implicitly defines a function.

Theorem B.0.15. Suppose that $F(x, y)$ is continuously differentiable in the $xy$-plane. Then the equation $F(x, y) = 0$ can be solved for $y$ in terms of $x$ in a neighborhood of any point $(a, b)$ at which $F(a, b) = 0$ and $F_y(a, b) \neq 0$.

As an illustration, if $F(x, y) = x^2 + y^2 - 1$ then the equation $F(x, y) = 0$ implicitly defines a circle of radius 1 centered at the origin in the $xy$-plane. Since $F_y = 2y$ is nonzero unless $y = 0$, the only points in the $xy$-plane satisfying both $F(x, y) = 0$ and $F_y(x, y) = 0$ are $(x, y) = (\pm 1, 0)$. At those two points, the lines tangent to the graph of the circle $F(x, y) = 0$ are vertical. Near every other point $(x, y)$ on the graph, the equation $F(x, y) = 0$ implicitly defines a function $y = f(x)$.
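The circle illustration can also be explored numerically. The following is a minimal sketch, not part of the original text: it assumes Newton's method in the $y$ variable as the solver, and the names `F`, `F_y`, and `implicit_y` are hypothetical helpers introduced only for this example.

```python
import math


def F(x, y):
    """The unit circle written as F(x, y) = 0."""
    return x * x + y * y - 1.0


def F_y(x, y):
    """Partial derivative of F with respect to y."""
    return 2.0 * y


def implicit_y(x, y_guess, tol=1e-12, max_iter=50):
    """Solve F(x, y) = 0 for y by Newton's method in y, starting from y_guess."""
    y = y_guess
    for _ in range(max_iter):
        slope = F_y(x, y)
        if slope == 0.0:
            # This is exactly the situation excluded by the hypothesis F_y != 0.
            raise ZeroDivisionError("F_y vanished during the iteration")
        step = F(x, y) / slope
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("Newton iteration did not converge")


# Near (a, b) = (0.6, 0.8), where F_y = 1.6 is nonzero, y = f(x) is well defined.
for x in (0.5, 0.6, 0.7):
    print(x, implicit_y(x, y_guess=0.8), math.sqrt(1.0 - x * x))
```

The two printed columns should agree, since on the upper half of the circle the implicit function is $f(x) = \sqrt{1 - x^2}$. Starting the iteration at a point with $y = 0$ raises the error, mirroring the failure of the hypothesis at $(\pm 1, 0)$.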




Now let us establish notation that will aid in generalizing Theorem B.0.15 to higher dimensions. If $\mathbf{x} = (x_1, x_2, \dots, x_{d_1}) \in \mathbb{R}^{d_1}$ and $\mathbf{y} = (y_1, y_2, \dots, y_{d_2}) \in \mathbb{R}^{d_2}$, we shall write

\[ (\mathbf{x}, \mathbf{y}) = (x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) \in \mathbb{R}^{d_1 + d_2}. \]

Suppose that $E \subset \mathbb{R}^{d_1 + d_2}$ is open and $\mathbf{F} : E \to \mathbb{R}^{d_2}$. The relationship $\mathbf{F}(\mathbf{x}, \mathbf{y}) = 0$ can be written in component form as

\[ \begin{aligned} F_1(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) &= 0 \\ F_2(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) &= 0 \\ &\;\;\vdots \\ F_{d_2}(x_1, x_2, \dots, x_{d_1}, y_1, y_2, \dots, y_{d_2}) &= 0. \end{aligned} \]

The implicit function theorem provides criteria under which $(y_1, y_2, \dots, y_{d_2})$ is (at least locally) a function of $(x_1, x_2, \dots, x_{d_1})$. Of course, algebraically solving for the $y$ variables in terms of the $x$ variables is generally too much to hope for.

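Theorem B.0.16. (Implicit Function Theorem) Suppose that $E \subset \mathbb{R}^{d_1 + d_2}$ is open and that $\mathbf{F} : E \to \mathbb{R}^{d_2}$ is continuously differentiable. Let $(\mathbf{x}_0, \mathbf{y}_0) \in E$ be a point such that $\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) = 0$, and form the square matrix

\[ J = \begin{pmatrix} \partial F_1/\partial y_1 & \partial F_1/\partial y_2 & \cdots & \partial F_1/\partial y_{d_2} \\ \partial F_2/\partial y_1 & \partial F_2/\partial y_2 & \cdots & \partial F_2/\partial y_{d_2} \\ \vdots & \vdots & \ddots & \vdots \\ \partial F_{d_2}/\partial y_1 & \partial F_{d_2}/\partial y_2 & \cdots & \partial F_{d_2}/\partial y_{d_2} \end{pmatrix}, \]

where each partial derivative is evaluated at $(\mathbf{x}_0, \mathbf{y}_0)$. If $J$ is invertible, then there exist open neighborhoods $\Omega_1 \subset \mathbb{R}^{d_1}$, $\Omega_2 \subset \mathbb{R}^{d_2}$ containing $\mathbf{x}_0$ and $\mathbf{y}_0$ (respectively), and a unique, continuously differentiable function $\mathbf{G} : \Omega_1 \to \Omega_2$ such that

\[ \mathbf{F}(\mathbf{x}, \mathbf{G}(\mathbf{x})) = 0 \quad \text{for all } \mathbf{x} \in \Omega_1. \]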

B.0.1 Regions with smooth boundaries

Because we are interested only in trapping regions (see Chapter 4), we consider only closed regions. A closed set $K \subset \mathbb{R}^d$ is said to have a $C^1$-boundary if for every point $x \in \partial K$ there is a neighborhood $V$ of $x$ and a $C^1$-function $\varphi : V \to \mathbb{R}$ such that

\[ \text{(i) } (\forall x \in V) \; \nabla\varphi(x) \neq 0, \quad \text{and} \quad \text{(ii) } K \cap V = \{ x \in V : \varphi(x) \ge 0 \}. \tag{B.2} \]

Of course

\[ \partial K \cap V = \{ x \in V : \varphi(x) = 0 \} \]

and, for $x \in \partial K \cap V$, $\nabla\varphi(x)$ points along the inward normal at $x$.

Fact F: If $\partial\varphi/\partial x_j \neq 0$, then by the implicit function theorem, the equation $\varphi(x) = 0$ may be solved for $x_j$ as a function of the other variables.




Appendix C

Notions from Linear Algebra

C.1 Appendix: A compendium of results from linear algebra

C.1.1 How to compute Jordan normal forms

In a linear algebra text, one expects the author to prove that an arbitrary square matrix is similar to a Jordan canonical form, and this proof is a messy affair.¹ We assume the reader has seen the definitions and the statement of the theorem but not really followed the proof. Here we accept that a Jordan normal form exists, and we ask, more simply, how to compute it. We break this problem into two sub-questions, focusing more on examples than theory: given a matrix $A$,

Q1: How can we decide what the normal form of A is?

Q2: How can we find the similarity transformation that produces the normal form?

The first step in determining the normal form of $A$ is to find the eigenvalues of $A$. Of course finding eigenvalues analytically is an intractable problem in general. We work with hand-picked examples in which the eigenvalues are readily computed.

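Example 1:

\[ A = \begin{pmatrix} 5 & -2 \\ 2 & 1 \end{pmatrix}. \]

It is readily computed that $\det(A - \lambda I) = (\lambda - 3)^2$. Thus, there are two possible Jordan forms for $A$,

\[ J_1 = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix} \quad \text{and} \quad J_2 = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix}. \]

¹ One of the best treatments of Jordan forms of which we are aware is in Appendix B of Strang [5].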



If $A$ were similar to $J_1 = 3I$, then every vector in $\mathbb{R}^2$ would be an eigenvector. However, $v \in \mathbb{R}^2$ is an eigenvector iff $(A - 3I)v = 0$, or writing this out

\[ \begin{pmatrix} 2 & -2 \\ 2 & -2 \end{pmatrix} v = 0. \]

Obviously not every vector satisfies this equation, so $J_2$ must be the normal form for $A$. Indeed, in hindsight we may see that if a $2 \times 2$ matrix has equal eigenvalues but is not equal to a multiple of the identity, then its Jordan normal form must be a $2 \times 2$ block.

Higher-dimensional examples in which there are double eigenvalues, but none of higher multiplicity, do not pose any additional difficulties, as we illustrate in some of the Exercises. Let us turn our attention to eigenvalues of multiplicity three.

Example 2: Consider

\[ A_1 = \begin{pmatrix} a & 1 & 1 \\ 0 & a & 0 \\ 0 & 0 & a \end{pmatrix}, \quad A_2 = \begin{pmatrix} a & 1 & 1 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}, \quad A_3 = \begin{pmatrix} a & 0 & 1 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix}. \]

By inspection, $\lambda = a$ is the only eigenvalue of $A_j$. Thus the possible normal forms for $A_j$ are

\[ J_1 = \begin{pmatrix} a & & \\ & a & \\ & & a \end{pmatrix}, \quad J_2 = \begin{pmatrix} a & 1 & \\ 0 & a & \\ & & a \end{pmatrix}, \quad J_3 = \begin{pmatrix} a & 1 & 0 \\ 0 & a & 1 \\ 0 & 0 & a \end{pmatrix} \]

where, to facilitate visualization, entries that are zero but lie outside of any Jordan block are left blank. We distinguish between cases by examining the dimension of the eigenspaces. These dimensions may be computed most easily by applying the “rank-plus-nullity” theorem (see Strang [5]), which gives us

\[ \dim \ker(J_j - aI) = 3 - \operatorname{rank}(J_j - aI). \]

Thus $J_1, J_2, J_3$ have eigenspaces of dimension 3, 2, 1, respectively. Proceeding similarly, we find that $A_1, A_2, A_3$ have eigenspaces of dimension 2, 1, 2, respectively. Since the dimension of eigenspaces is preserved under similarity transformations, we conclude that $A_1, A_2, A_3$ have Jordan forms $J_2, J_3, J_2$, respectively.
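The eigenspace dimensions quoted in Example 2 can be checked mechanically. The sketch below is an illustration only: it fixes the arbitrary value $a = 2$, uses exact rational arithmetic, and the helper names `rank` and `eigenspace_dim` are hypothetical, not taken from the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def eigenspace_dim(matrix, eigenvalue):
    """dim ker(A - lambda I) = n - rank(A - lambda I), by rank-plus-nullity."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)


a = 2  # arbitrary choice; the dimensions do not depend on a
A1 = [[a, 1, 1], [0, a, 0], [0, 0, a]]
A2 = [[a, 1, 1], [0, a, 1], [0, 0, a]]
A3 = [[a, 0, 1], [0, a, 1], [0, 0, a]]

print([eigenspace_dim(A, a) for A in (A1, A2, A3)])  # [2, 1, 2]
```

Comparing with the dimensions 3, 2, 1 for $J_1, J_2, J_3$ reproduces the assignment $J_2, J_3, J_2$ stated above.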

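Example 3:

\[ A_1 = \begin{pmatrix} a & 0 & 0 & 1 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix}, \quad A_2 = \begin{pmatrix} a & 0 & 1 & 0 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix}, \quad A_3 = \begin{pmatrix} a & 1 & 0 & 0 \\ 0 & a & 0 & 1 \\ 0 & 0 & a & 0 \\ 0 & 0 & 0 & a \end{pmatrix}. \]

The possible Jordan forms are

\[ J_1 = \begin{pmatrix} a & & & \\ & a & & \\ & & a & \\ & & & a \end{pmatrix}, \quad J_2 = \begin{pmatrix} a & 1 & & \\ 0 & a & & \\ & & a & \\ & & & a \end{pmatrix}, \quad J_3 = \begin{pmatrix} a & 1 & 0 & \\ 0 & a & 1 & \\ 0 & 0 & a & \\ & & & a \end{pmatrix}, \]

\[ J_4 = \begin{pmatrix} a & 1 & 0 & 0 \\ 0 & a & 1 & 0 \\ 0 & 0 & a & 1 \\ 0 & 0 & 0 & a \end{pmatrix}, \quad J_5 = \begin{pmatrix} a & 1 & & \\ 0 & a & & \\ & & a & 1 \\ & & 0 & a \end{pmatrix}. \]

Proceeding as above, we compute that $J_1, J_2, J_3, J_4, J_5$ have eigenspaces of dimension 4, 3, 2, 1, 2, respectively. We can see potential trouble here in that $J_3$ and $J_5$ both have two-dimensional eigenspaces. Now $A_1, A_2, A_3$ have eigenspaces of dimension 3, 2, 2 respectively. Thus we may conclude that $A_1$ has $J_2$ as its normal form, but the dimension of the eigenspace does not distinguish between $J_3$ and $J_5$ for $A_2$ and $A_3$. For this task we turn to generalized eigenvectors: a vector $v \in \mathbb{R}^d$ is called a generalized eigenvector of a matrix $A$ with eigenvalue $\lambda$ if for some power $p$

\[ (A - \lambda I)^p v = 0. \]

Choosing $p = 2$, we compute that $(J_j - aI)^2$ has a three-dimensional null space for $j = 3$ and a four-dimensional null space for $j = 5$. On the other hand, $(A_j - aI)^2$ has a four-dimensional null space if $j = 2$ and a three-dimensional null space if $j = 3$. Thus the normal forms for $A_2, A_3$ are $J_5, J_3$, respectively.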
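The null-space dimensions of the powers $(A_j - aI)^p$ used in Example 3 can be tabulated in the same spirit. This is a sketch under stated assumptions: $a = 2$ is an arbitrary choice, arithmetic is exact, and `rank`, `matmul`, and `nullities` are hypothetical helper names introduced for illustration.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def nullities(matrix, eigenvalue, max_power):
    """Dimensions of ker (A - lambda I)^p for p = 1, ..., max_power."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
               for i in range(n)]
    power, result = shifted, []
    for _ in range(max_power):
        result.append(n - rank(power))
        power = matmul(power, shifted)
    return result


a = 2  # arbitrary choice of the repeated eigenvalue
A2 = [[a, 0, 1, 0], [0, a, 0, 1], [0, 0, a, 0], [0, 0, 0, a]]
A3 = [[a, 1, 0, 0], [0, a, 0, 1], [0, 0, a, 0], [0, 0, 0, a]]
J3 = [[a, 1, 0, 0], [0, a, 1, 0], [0, 0, a, 0], [0, 0, 0, a]]
J5 = [[a, 1, 0, 0], [0, a, 0, 0], [0, 0, a, 1], [0, 0, 0, a]]

for name, M in (("A2", A2), ("A3", A3), ("J3", J3), ("J5", J5)):
    print(name, nullities(M, a, 2))
```

The first entry of each list is the eigenspace dimension (2 in all four cases); the second entry separates the two candidates, matching $A_2$ with $J_5$ and $A_3$ with $J_3$.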

Now we turn to the second question above, finding the similarity matrix $S$ such that $S^{-1}AS$ produces the Jordan form of $A$. As we shall see, the columns of $S$ are generalized eigenvectors of $A$ (cf. Proposition 2.3.2).

Recall Example 1, where

\[ A = \begin{pmatrix} 5 & -2 \\ 2 & 1 \end{pmatrix}, \quad \text{with} \quad J = \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \]

its Jordan form. Observe that, with respect to the standard basis $e_1, e_2$ for $\mathbb{R}^2$, the matrix $J$ satisfies

\[ (J - 3I)e_1 = 0, \quad (J - 3I)e_2 = e_1. \]

To match this behavior for $A$, we need to find vectors $v_1, v_2$ such that

\[ (A - 3I)v_1 = 0, \quad (A - 3I)v_2 = v_1, \]

and then the matrix $S = \operatorname{Col}(v_1, v_2)$ will achieve the required transformation. (Note that $(A - 3I)^2 v_2 = 0$, so $v_2$ is a generalized eigenvector.) One possible choice is

\[ S = \begin{pmatrix} 1 & 1/2 \\ 1 & 0 \end{pmatrix}. \]

In the Exercises we ask the reader to check that this matrix performs the desired task. Incidentally, there is great latitude in the choice of $S$, more so than in the case of distinct eigenvalues.
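As a quick sanity check on the choice of $S$ for Example 1, one can carry out the similarity transformation in exact arithmetic. The sketch below is illustrative only; `matmul` and `inverse_2x2` are hypothetical helper names, and the check does not replace the hand calculation requested in the Exercises.

```python
from fractions import Fraction


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


A = [[Fraction(5), Fraction(-2)], [Fraction(2), Fraction(1)]]
S = [[Fraction(1), Fraction(1, 2)], [Fraction(1), Fraction(0)]]  # S = Col(v1, v2)

J = matmul(matmul(inverse_2x2(S), A), S)
print(J)  # entries equal to 3, 1 / 0, 3
```

The columns of `S` are $v_1 = (1, 1)^T$ and $v_2 = (1/2, 0)^T$; one can confirm directly that $(A - 3I)v_1 = 0$ and $(A - 3I)v_2 = v_1$.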

More subtle issues may arise in cases of higher multiplicity. Let $A$ be the first of the three matrices considered in Example 2, and let $J$ be its Jordan form. Observe that $J$ satisfies

\[ (J - aI)e_1 = 0, \quad (J - aI)e_2 = e_1, \quad (J - aI)e_3 = 0. \]

Thus we need to find vectors $v_1, v_2, v_3$ such that

\[ (A - aI)v_1 = 0, \quad (A - aI)v_2 = v_1, \quad (A - aI)v_3 = 0 \tag{C.1} \]

and let $S = \operatorname{Col}(v_1, v_2, v_3)$. Note that $v_1$ and $v_3$ are eigenvectors of $A$, but $v_1$ must be chosen with care in order that the middle equation in (C.1), which is inhomogeneous, has a solution. Now the eigenspace of $A$ is spanned by

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}. \]

Suppose $v_1$ is a linear combination of these vectors with coefficients $\alpha, \beta$. Writing out the middle equation in (C.1), we have

\[ \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} \alpha \\ \beta \\ -\beta \end{pmatrix}. \]

To have a solution we need $\beta = 0$; to avoid trivialities we need $\alpha \neq 0$. Thus

\[ S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & -1 \end{pmatrix}, \]

where we have chosen $\alpha = 1$, is one of the possible similarity matrices that transform $A$ to its Jordan form.

In the Exercises we ask the reader to carry out this procedure and check that it works for several of the matrices considered above.




C.1.2 The Routh-Hurwitz criterion

It is astonishingly easy to determine whether a polynomial with real coefficients has all its zeros in the left half plane. For example, for the two polynomials

\[ Q_1(\lambda) = \lambda^4 + 2\lambda^3 + 3\lambda^2 + 2\lambda + 1, \qquad Q_2(\lambda) = \lambda^5 + 2\lambda^4 + 3\lambda^3 + 3\lambda^2 + 2\lambda + 1, \]

the calculations in Table C.1 show that the first has all its zeros in $\Re\lambda < 0$ while the second has at least one zero in $\Re\lambda \ge 0$. Let us explain these calculations in the context of a general polynomial

\[ P(\lambda) = \lambda^n + c_1\lambda^{n-1} + c_2\lambda^{n-2} + \dots + c_{n-1}\lambda + c_n. \]

The algorithm is slightly different, depending on whether $n$ is even or odd. Reflecting this difference we define $\nu = [n/2]$ where $[\cdot]$ is the greatest-integer function: thus $n = 2\nu$ if $n$ is even and $n = 2\nu + 1$ if $n$ is odd. The algorithm forms an $(n+1) \times (\nu+1)$ matrix $A$ as follows. The first two rows of $A$ contain the coefficients of even and odd powers of $\lambda$:

\[ \begin{array}{lllll} a_{1l}: & 1 & c_2 & c_4 & \dots \\ a_{2l}: & c_1 & c_3 & c_5 & \dots \end{array} \]

(If $n$ is even, then 0 is inserted as the last entry of the second row, as in the table on the left.) Subsequent rows, $3, 4, \dots, n+1$, are calculated inductively from products that resemble $2 \times 2$ determinants

\[ a_{k+1,l} = a_{k,l}\, a_{k-1,l+1} - a_{k,l+1}\, a_{k-1,l}. \tag{C.2} \]

In words, computation of $a_{k+1,l}$ involves selecting entries from the two preceding rows and from the same column as $a_{k+1,l}$ and the one to the right. In calculating the last column ($l = \nu + 1$), entries $a_{k,\nu+2}$ or $a_{k-1,\nu+2}$ outside the appropriate range are assumed to be zero, as has been done in Table C.1. Then we have:

Theorem C.1.1. All the zeros of $P$ lie in the open left half plane iff all entries $a_{k1}$, $k = 1, \dots, n+1$, in the first column of the above matrix are positive.

If the calculation produces a zero row, as in the table on the right, then the calculation is stopped and there is at least one zero in the closed right half plane. Indeed, note that $Q_2(\pm i) = 0$. A root of $P$ on the imaginary axis will cause a zero row, but a zero row may arise under other circumstances, also.

This theorem is proved in Section 4.2 of Engelberg’s book. Although the proof requires careful reading, it is not terribly difficult, just clever. In cases where some of the zeros of $P(\lambda)$ lie in the right half plane, it is usually possible to deduce how many zeros lie there.
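The recursion (C.2) is short enough to transcribe directly. The following is a minimal sketch, assuming the polynomial is monic and is supplied as the list $[c_1, \dots, c_n]$; the function names `routh_hurwitz_array` and `all_zeros_in_left_half_plane` are hypothetical, and the handling of a zero row (stop and report failure) follows the convention described above.

```python
def routh_hurwitz_array(coeffs):
    """Rows of the array a_{kl} for lambda^n + c1 lambda^(n-1) + ... + cn.

    coeffs is the list [c1, ..., cn]. Rows are built with the recursion
    a_{k+1,l} = a_{k,l} a_{k-1,l+1} - a_{k,l+1} a_{k-1,l}; entries outside
    the array count as zero. Construction stops early if a zero row appears.
    """
    full = [1] + list(coeffs)          # full[j] multiplies lambda^(n - j)
    n = len(coeffs)
    width = n // 2 + 1                 # nu + 1 columns

    def padded(seq):
        return list(seq) + [0] * (width - len(seq))

    rows = [padded(full[0::2]), padded(full[1::2])]
    while len(rows) < n + 1:
        prev, cur = rows[-2], rows[-1]
        if all(entry == 0 for entry in cur):
            break                      # zero row: the calculation is stopped
        new_row = []
        for l in range(width):
            cur_right = cur[l + 1] if l + 1 < width else 0
            prev_right = prev[l + 1] if l + 1 < width else 0
            new_row.append(cur[l] * prev_right - cur_right * prev[l])
        rows.append(new_row)
    return rows


def all_zeros_in_left_half_plane(coeffs):
    """True iff the array is complete and its first column is strictly positive."""
    rows = routh_hurwitz_array(coeffs)
    return len(rows) == len(coeffs) + 1 and all(row[0] > 0 for row in rows)


Q1 = [2, 3, 2, 1]        # lambda^4 + 2 lambda^3 + 3 lambda^2 + 2 lambda + 1
Q2 = [2, 3, 3, 2, 1]     # lambda^5 + 2 lambda^4 + 3 lambda^3 + 3 lambda^2 + 2 lambda + 1

for row in routh_hurwitz_array(Q1):
    print(row)
print(all_zeros_in_left_half_plane(Q1), all_zeros_in_left_half_plane(Q2))  # True False
```

For $Q_1$ the printed rows reproduce the left table (first column 1, 2, 4, 4, 8); for $Q_2$ the construction stops at the zero fifth row, as in the right table.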

  k \ l    1    2    3            k \ l    1    2    3
  1        1    3    1            1        1    3    2
  2        2    2    0            2        2    3    1
  ---------------------           ---------------------
  3        4    2    0            3        3    3    0
  4        4    0    0            4        3    3    0
  5        8    0    0            5        0    0    0
                                  6        –    –    –

Table C.1: The matrices $a_{kl}$ in the Routh-Hurwitz calculations for $Q_1(\lambda) = \lambda^4 + 2\lambda^3 + 3\lambda^2 + 2\lambda + 1$ (left table) and $Q_2(\lambda) = \lambda^5 + 2\lambda^4 + 3\lambda^3 + 3\lambda^2 + 2\lambda + 1$ (right table). Values of $k$ from 1 to $n + 1$ appear in the first column of each table; values of $l$ from 1 to $\nu + 1$ appear in the top row. The two rows $a_{kl}$, $k = 1, 2$, which come directly from the coefficients of the polynomial, are separated from later rows that come from the calculation indicated in (C.2).

Reference: Shlomo Engelberg, A Mathematical Introduction to Control Theory, Series in Electrical and Computer Engineering, Vol. 2, Imperial College Press, London, 2005. Duke Catalogue QA402.3.E527.

If $A$ is a $d \times d$ matrix with real entries, then in principle one could calculate the characteristic polynomial of $A$ and apply the Routh-Hurwitz criterion to it to determine whether the eigenvalues of $A$ lie in the left half plane. However, calculating the characteristic polynomial of a moderately large matrix by hand is not a pleasant task. (One could of course resort to symbolic computations to obtain the characteristic polynomial, but if the computer is involved, one might as well compute eigenvalues directly.) Thus we refrain even from formulating an analogue of Theorem 2.4.5 for $4 \times 4$ matrices. However, let us use the Routh-Hurwitz criterion to handle the $3 \times 3$ case.

Proof of Proposition 2.4.5. Let $A$ be a $3 \times 3$ matrix with characteristic polynomial

\[ \det(A - \lambda I) = -[\lambda^3 + c_1\lambda^2 + c_2\lambda + c_3]. \]

The table applying the Routh-Hurwitz criterion to this polynomial is shown in Table C.2. Thus the roots of this polynomial are all in the left half plane iff

\[ \text{(a) } c_1 > 0, \quad \text{(b) } c_1 c_2 - c_3 > 0, \quad \text{(c) } c_3 > 0. \tag{C.3} \]

  k \ l    1                    2
  1        1                    c2
  2        c1                   c3
  3        c1 c2 − c3           0
  4        c3 (c1 c2 − c3)      0

Table C.2: The matrix $a_{kl}$ in the Routh-Hurwitz calculations for the general cubic $\lambda^3 + c_1\lambda^2 + c_2\lambda + c_3$.

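The coefficients $c_j$ are related to the eigenvalues of $A$ through

\[ \begin{aligned} c_1 &= -(\lambda_1 + \lambda_2 + \lambda_3) \\ c_2 &= \lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_3\lambda_1 \\ c_3 &= -\lambda_1\lambda_2\lambda_3. \end{aligned} \]

Thus it is apparent that (C.3a) and (c) are equivalent to Conditions (i) and (iii) of Theorem 2.4.5, and the equivalence of (C.3b) with Condition (ii) follows on observing that

\[ c_2 = \tfrac{1}{2}\bigl[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr]. \]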
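For readers who want the intermediate step, the identity for $c_2$ can be checked by expanding in terms of the eigenvalues. The short derivation below is a sketch that uses only the standard facts $\operatorname{tr} A = \lambda_1 + \lambda_2 + \lambda_3$ and $\operatorname{tr}(A^2) = \lambda_1^2 + \lambda_2^2 + \lambda_3^2$:

```latex
\begin{aligned}
(\operatorname{tr} A)^2
  &= (\lambda_1 + \lambda_2 + \lambda_3)^2 \\
  &= \lambda_1^2 + \lambda_2^2 + \lambda_3^2
     + 2(\lambda_1\lambda_2 + \lambda_2\lambda_3 + \lambda_3\lambda_1) \\
  &= \operatorname{tr}(A^2) + 2c_2 ,
\end{aligned}
\qquad\text{so}\qquad
c_2 = \tfrac{1}{2}\bigl[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr].
```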

C.1.3 Continuity of eigenvalues of a matrix with respect to its entries

Near simple eigenvalues, the dependence of eigenvalues of a matrix on its entries is as nice as one could wish. Specifically, we have the following

Proposition C.1.2. Let $\lambda_1$ be a simple eigenvalue of a $d \times d$ matrix $A_0$. There is an $\varepsilon > 0$ and a neighborhood $U$ of $A_0$ in $\mathbb{R}^{d^2}$ such that any matrix $A \in U$ has exactly one eigenvalue in the disk $|z - \lambda_1| < \varepsilon$, and moreover this eigenvalue is a differentiable function on $U$.

Proof. We prove this result by applying the implicit-function theorem to solve for $\lambda$ in the equation for eigenvalues,

\[ f(\lambda, A) = \det(A - \lambda I) = 0. \]

Now $f(\lambda, A_0)$ is the product $(\lambda_1 - \lambda) \cdots (\lambda_d - \lambda)$, where $\lambda_1, \dots, \lambda_d$ are the eigenvalues of $A_0$. Differentiation of this product with respect to $\lambda$ gives $d$ terms, but only one of them is nonzero at $\lambda = \lambda_1$:

\[ \frac{\partial f}{\partial \lambda}(\lambda_1, A_0) = -(\lambda_2 - \lambda_1) \cdots (\lambda_d - \lambda_1). \]

Since $\lambda_1$ is a simple eigenvalue, this product is nonzero, which completes the proof.

Suppose that $A_0$ has a simple eigenvalue $\lambda_1$, and consider a one-parameter family of perturbations, $A_0 + \varepsilon B$. It follows from the proposition that there is a smoothly varying eigenvalue $\lambda_1(A_0 + \varepsilon B)$ that equals $\lambda_1$ when $\varepsilon = 0$. In Exercise ?? we give a formula for calculating the derivative of this eigenvalue with respect to $\varepsilon$ at $\varepsilon = 0$.

Near multiple eigenvalues, the dependence of eigenvalues of a matrix on its entries is complicated by a difficulty familiar from complex function theory. For example, consider the matrix function

\[ A(\alpha, \beta) = \begin{pmatrix} 0 & 1 \\ \alpha + i\beta & 0 \end{pmatrix}, \]

which has eigenvalues $\lambda_j(\alpha, \beta) = \pm\sqrt{\alpha + i\beta}$. We claim it is impossible to define these square roots as continuous functions of $\alpha, \beta$ in a neighborhood of zero in $\mathbb{R}^2$. To see this, suppose otherwise that there are continuous eigenvalues $\lambda_j(\alpha, \beta)$. On the positive $\alpha$-axis, the eigenvalues are $\pm\sqrt{\alpha}$. Index the eigenvalues so that $\lambda_1$ is positive on the positive $\alpha$-axis, and let us restrict $\lambda_1$ to a small circle that encloses the origin: i.e., let

\[ \Lambda(\varphi) = \lambda_1(\varepsilon\cos\varphi, \varepsilon\sin\varphi), \quad \text{where } 0 \le \varphi < 2\pi. \]

Calculation then shows that

\[ \Lambda(\varphi) = \sqrt{\varepsilon}\, e^{i\varphi/2}. \tag{C.4} \]

If $\lambda_1$ were continuous we would have $\Lambda(2\pi) = \Lambda(0)$, but in fact (C.4) implies that $\Lambda(2\pi) = -\Lambda(0)$. This contradiction proves the claim.
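The sign flip can be observed numerically by following one eigenvalue continuously around the circle. The sketch below is an illustration, not part of the original argument: the radius, the number of steps, and the function name `track_eigenvalue` are arbitrary assumptions, and continuity is imitated by always choosing the square root closest to the previous value.

```python
import cmath
import math


def track_eigenvalue(radius=0.01, steps=360):
    """Follow one eigenvalue of [[0, 1], [alpha + i beta, 0]] once around a circle.

    At each step both square roots of z = radius * exp(i phi) are candidates;
    the one nearest the previously chosen value is kept, which mimics a
    continuous choice of eigenvalue along the path.
    """
    current = math.sqrt(radius)  # the positive root at phi = 0
    for k in range(1, steps + 1):
        phi = 2.0 * math.pi * k / steps
        root = cmath.sqrt(radius * cmath.exp(1j * phi))
        current = root if abs(root - current) <= abs(-root - current) else -root
    return current


start = math.sqrt(0.01)
end = track_eigenvalue()
print(start, end)  # end is (numerically) the negative of start
```

After a full circuit the tracked value returns as the other square root, which is the obstruction to a continuous single-valued choice described above.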

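The reader may protest that the matrix in this example has complex entries, but here is a $4 \times 4$ matrix with real entries that exhibits the same difficulty:

\[ A(\alpha, \beta) = \begin{pmatrix} 0 & I \\ B(\alpha, \beta) & 0 \end{pmatrix} \]

where 0 is the $2 \times 2$ zero matrix, $I$ is the $2 \times 2$ identity matrix, and

\[ B(\alpha, \beta) = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}. \]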

Although at a multiple eigenvalue, one cannot define individual eigenvalues continuously, nonetheless the group of eigenvalues does vary continuously, in the sense of the following

Proposition C.1.3. Let $\lambda_1$ be an eigenvalue of a $d \times d$ matrix $A_0$ of multiplicity $k$. There is an $\varepsilon_0 > 0$ with the property that for all $\varepsilon < \varepsilon_0$ there is a neighborhood $U$ of $A_0$ in $\mathbb{R}^{d^2}$ such that any matrix $A \in U$ has exactly $k$ eigenvalues (counted with multiplicity) in the disk $|z - \lambda_1| < \varepsilon$.

The conclusion of this result is a standard epsilon-delta characterization of continuity, with three differences: (i) a set of eigenvalues, rather than a single eigenvalue, is being bounded, (ii) an upper bound is needed on epsilon, and (iii) delta is replaced by the neighborhood $U$. Two remarks: (i) For $\varepsilon_0$ one may use any number less than the minimum separation between $\lambda_1$ and the other eigenvalues of $A_0$. (ii) The maximum diameter of $U$ scales like $\varepsilon^{1/k}$ as $\varepsilon \to 0$.

The proposition is easily proved with complex-function theory, but, since this subject is not a prerequisite for this text, we do not give the proof here.

Corollary C.1.4. Suppose all the eigenvalues of $A$ lie in the left half plane $\Re\lambda < 0$. For any perturbation matrix $B$, for sufficiently small $\varepsilon$ all the eigenvalues of $A + \varepsilon B$ also lie in the left half plane.

The reader is asked to derive this corollary in the Exercises.

If one restricts attention to symmetric matrices, then all eigenvalues are real, and one may define individual eigenvalues continuously by ordering them. For example, we may define $\lambda_1(A)$ to be the smallest eigenvalue of $A$; $\lambda_2(A)$ to be the next smallest eigenvalue; etc. (Ties do not matter for these definitions.) However, even though with this convention the eigenvalues are continuous, they are not differentiable: this is demonstrated by the matrix

\[ A(\alpha, \beta) = \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix}, \]

which has eigenvalues $\pm\sqrt{\alpha^2 + \beta^2}$.

C.1.4 Fast-slow systems

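Recall problem from Chapter 1. More general system:

\[ \begin{aligned} \varepsilon x' &= -x + b^T y \\ y' &= xc + By. \end{aligned} \]

Coefficient matrix:

\[ A = \begin{pmatrix} -\varepsilon^{-1} & \varepsilon^{-1} b^T \\ c & B \end{pmatrix}. \]

Compare two approaches:

1. Solve for $x = b^T y$. Substitute into the $y$-equation to get the reduced system

\[ y' = (B + cb^T)y. \]

2. Full system.

The full system has one eigenvalue approximately equal to $-\varepsilon^{-1}$. Show that the other eigenvalues are the same as those of the reduced system, modulo $\varepsilon$ to some fractional power. In particular, the reduced system is stable iff the full system is.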

C.1.5 Exercises

Prove Corollary C.1.4.

Suppose that $A_0$ has a simple eigenvalue $\lambda_1$, and let $\Lambda(\varepsilon)$ be the eigenvalue of $A_0 + \varepsilon B$ that equals $\lambda_1$ when $\varepsilon = 0$. In Exercise ?? we give a formula for calculating $d\Lambda/d\varepsilon(0)$.

1. In case $A_0$ has block-diagonal form, show that the derivative is the (1,1)-entry of $B$.

2. Suppose $S^{-1}A_0 S$ is block diagonal. Note that the first column of $S$ is an eigenvector, say $v$, of $A_0$, and the first row of $S^{-1}$ is an eigenvector, say $w$, of $A_0^T$. Argue that

\[ d\Lambda/d\varepsilon(0) = \langle w, Bv \rangle. \]



Appendix D

Nondimensionalization and Scaling

D.1 Classes of equations in applications

The applications in this text may be grouped into three broad classes: mechanical, electrical, and “bathtub” models. ODEs from all three classes appeared already in Chapter 1. In this Appendix we briefly describe each class, and then we discuss an important technique for analyzing such equations.

D.1.1 Mechanical models

Mechanical systems that we encountered in Chapter 1 include spring-mass systems (see (1.21)), the pendulum (see (1.4)), and Duffing’s equation (see (1.25)). The pendulum and models derived from it represent one of the three central examples in this text; extra effects added to the pendulum equation include (i) torque (??), (ii) rotation (??), and (iii) vertical vibration (??). The Lorenz system (see (7.5)) to be studied in Chapters 7 and 8 also has a basis in mechanics.

Usually the equations of motion for a mechanical system can be derived from a straightforward application of Newton’s second law: mass times acceleration equals the sum of the forces. Let us illustrate this by deriving the equations of motion of a rotating pendulum, an example that occurs repeatedly in the text. Although we speak of a rotating pendulum, we find it more intuitive to view this system as a bead sliding on a wire hoop that is rotating about a vertical axis, as illustrated in Figure 7.1. Let $m$ be the mass of the bead; $a$ the radius of the hoop; $\omega$ the (constant) speed of rotation. We apply Newton’s second law to describe the motion, which is purely tangential. If $x$ measures the angular position of the bead as a function of time (see figure), then the tangential acceleration is $ax''$. There are three forces acting tangentially on the bead: (i) gravity, whose projection onto the tangential direction is $-mg\sin x$; (ii) centrifugal force from rotation about the vertical axis in a circle of radius equal to $a\sin x$, whose tangential projection is $m(a\sin x)\omega^2\cos x$; and (iii) friction, which we model as $-\beta x'$. Expressing Newton’s law as a first-order system yields

\[ \begin{aligned} x' &= y \\ may' &= -\beta y - mg\sin x + m(a\sin x)\omega^2\cos x. \end{aligned} \tag{D.1} \]

Regarding global existence for ODEs describing mechanical systems, often a trapping region may be constructed as the region interior to a level set of the energy. We did this for the torqued pendulum in Section 4.3.4 and for the rotating pendulum in Section 5.4.?? Further examples are given in the Exercises.

D.1.2 Electrical models

Van der Pol’s equation (1.31) originally arose in modeling an electrical circuit. ODEs with an electrical basis also arise in models for nerve cells. Specifically, in this book we study the FitzHugh-Nagumo equations (see (5.69)) and an elaboration of these equations in Chapter 8 (??) intended to mimic the Morris-Lecar model (ref’ce).

Incidentally, as we show in chapter (ref’ce?), van der Pol’s equation (4.29) is a special case of the FitzHugh-Nagumo equations. In Section 4.3.3 we worked rather hard to construct a trapping region for van der Pol’s equation. Ironically, it is easier to construct a trapping region for the more general FitzHugh-Nagumo equations, which we ask the reader to do in Exercise ??.

Van der Pol’s equation and equations for other circuit models may be derived from Kirchhoff’s laws plus information about the current-voltage characteristics of the various devices in the circuit. For biologically based models, one needs to discuss physiological issues in order to understand current-voltage characteristics. We do not attempt to introduce this material in the text. (On web site?) For mathematicians not primarily interested in biology, the fast-slow analysis of Section 6.5 provides more useful intuition about the FitzHugh-Nagumo system than the derivation of the equations from fundamental laws.¹

At end of this TeX file there is derivation of v.d.Pol from FitzHugh-Nagumo, “junked”.

D.1.3 “Bathtub” models

We borrow the phrase “bathtub” model from Ellner-Guckenheimer []. This tongue-in-cheek phrase describes models in which a population (of organisms, chemicals, proteins, enzymes, etc.) is divided into various categories (bathtubs) and the ODEs track the flow of individuals (water) from one category to another. Strictly speaking the populations in such models are usually integers, but in the case of large populations one may approximate an integer variable by a continuous variable. Two of our three central examples belong to this class: (i) the Lotka-Volterra equations with elaborations (1.41), in which the population consists of animals and (ii) the activator-inhibitor equations (4.24), in which the population consists of chemicals (proteins?). Several other examples occur in the text and the Exercises. Q: Worth listing these?

¹ There is a reward for understanding the derivation of the van der Pol equation: this resolves the mystery of how this equation seems to involve the creation of energy out of nowhere. In biological processes, there is no issue: in fact energy is being consumed as time evolves.

Typically in bathtub models there are several processes causing motion from one category to another, and it is a wonderful simplification that equations describing the overall evolution may be obtained simply by adding the rates associated with the various processes. For example, consider the activator-inhibitor equations (4.24): each equation has two terms, the first representing a nonlinear production rate and the second, a linear decay, and these terms are merely added.

Regarding the name activator-inhibitor: The variable $x$ in (4.24) is called an activator since its production rate² increases with its own concentration. The production rate of $r$ also increases with $x$; this variable is called an inhibitor or repressor because a high concentration of $r$ suppresses the production of $x$ through the factor $1 + r$ in the denominator of the first term in (4.24a). Biologists represent these interactions in the schematic graph shown in Figure ??. The vertices of the graph enumerate the chemical species undergoing reaction. The edges of the graph indicate that one concentration influences the production rate of another chemical. An edge that terminates in an arrowhead indicates promotion; one that terminates in a short cross bar, inhibition.

For many bathtub models it is not difficult to prove global existence. For example, in (4.24) each population has a death term that grows with the population, and the growth terms are bounded. As we showed in Section 4.3.1, a sufficiently large “rectangle” will serve as a trapping region. We ask the reader to carry out these steps in the Exercises for other models.

D.2 Scaling and nondimensionalization

Scaling and nondimensionalization are techniques for simplifying ODEs that arise in applications, extracting the most important features from them. In the hands of a skilled user, they are extremely powerful, as is illustrated by the following anecdote (ref’ce) about G.I. Taylor, a distinguished twentieth-century British physicist/applied mathematician. Using this kind of analysis, he estimated the power of one of the early tests of the atomic bomb, based on only a series of photographs from the cover of Time Magazine of the mushroom cloud at several times closely following the explosion. His estimate was so accurate that it led to an investigation by the FBI!

² A function of the form $x^n/(1 + x^n)$ is called a Hill function (ref’ce to Alon). Although the shape of this function suggests a hill, the name actually refers to ?? Hill, a ??-century biologist??. Describe: starts at zero, saturates.

Q: Where make the point that in scaling want to have all quantities be O(1) or smaller?

We introduce these techniques by three examples.

D.2.1 Duffing’s equation

Duffing’s equation (1.25) provides a simple example in which to introduce the technique. With all quantities retaining their units, the equation is

\[ mx'' + \beta x' - k_1 x + k_2 x^3 = 0. \tag{D.2} \]

The technique focuses attention on two basic questions:

Q1: What dimensionless quantities can be constructed from the parameters in the problem by forming products of powers: i.e., quantities of the form

\[ m^a \beta^b k_1^c k_2^d \]

where the exponents $a, b, c, d$ may be chosen arbitrarily?

Q2: How can the form of the equations be simplified by introducing scaled variables,

\[ \bar{x} = x/L, \quad \tau = t/T \tag{D.3} \]

where the scale factors $L, T$ may be chosen arbitrarily?

To address these questions, let’s make a table of the dimensions of all the quantities in the equation, both variables and parameters. This information for (D.2) is given in Table D.1. To compile this information, one first determines the units of $t, x, m$ from the physical origins of the equation, and then one determines the other units by the requirement that the various terms in the equation must have the same dimensions. For example, we know that the units of the first term in (D.2) are mass times length divided by time squared; in symbols

\[ U(mx'') = \mathrm{m}\,\ell/\mathrm{t}^2 \]

where $U$ denotes “units of” and $\mathrm{m}$, $\ell$, $\mathrm{t}$ stand for the units of mass, length, and time. Thus, from requiring consistent units between terms we deduce that

\[ \mathrm{m}\,\ell/\mathrm{t}^2 = U(\beta x') = U(\beta)\,U(x') = U(\beta)\,\ell/\mathrm{t}, \]

from which the entry for the units of $\beta$ in the table follows.

  Variables     Description            Units
  t             time                   t
  x             length                 ℓ

  Parameters    Description            Units
  m             mass                   m
  β             friction               m/t
  k1            lin spring const       m/t^2
  k2            nonlin spring const    m/(ℓ^2 t^2)

Table D.1: Units of quantities in Duffing’s equation, (D.2).

Dimensionless quantities appear explicitly in the first question, and they also arise in the second one, in the following way. In Question 2, the most convenient choice for $L$ has the units of length, so $\bar{x}$ is dimensionless; likewise $\tau$. It would be difficult to overemphasize the importance of dimensionless quantities in understanding physical equations. For example, it is meaningless to speak of a quantity with nontrivial units being either large or small. Let us support this statement with an apparently outrageous claim: one normally thinks of the speed of light as a very large velocity, but in fact it is only about $2 \times 10^{-6}$, provided one mischievously measures velocity in astronomical units per millisecond. The speed of light is fast compared with velocities encountered in most circumstances, like the speed of sound or the speed of an automobile, which mathematically means that the dimensionless ratio of the speed of light divided by another speed is large.
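The arithmetic behind the speed-of-light remark is easy to check. The following is a minimal sketch, not part of the original text: the numerical constants (the defined values of the speed of light and the astronomical unit, and an assumed speed of sound of 343 m/s in air) and the helper name `speed_in_au_per_ms` are supplied here for illustration.

```python
# Speed of light expressed in astronomical units per millisecond,
# and the dimensionless ratio of the speed of light to the speed of sound.
C_M_PER_S = 299_792_458.0       # speed of light in m/s (exact by definition)
AU_M = 149_597_870_700.0        # astronomical unit in m (exact by definition)
SOUND_M_PER_S = 343.0           # assumed speed of sound in air, m/s

def speed_in_au_per_ms(v_m_per_s):
    """Convert a speed from m/s to au/ms (1 ms = 1e-3 s)."""
    return v_m_per_s / AU_M * 1e-3

c_au_ms = speed_in_au_per_ms(C_M_PER_S)
ratio = C_M_PER_S / SOUND_M_PER_S   # dimensionless, independent of units

print(f"c = {c_au_ms:.3e} au/ms")
print(f"c / (speed of sound) = {ratio:.3e}")
```

The first number depends entirely on the units chosen, while the second does not change if both speeds are converted to any other common unit.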

Turning to the first question, we find from the table that

\[ U(m^a \beta^b k_1^c k_2^d) = m^{a+b+c+d}\, t^{-b-2c-2d}\, \ell^{-2d}. \]

Requiring this to be dimensionless means that we must have

\[ \begin{aligned} a + b + c + d &= 0 \\ b + 2c + 2d &= 0 \\ d &= 0. \end{aligned} \]

We have three equations in four unknowns and the coefficient matrix has rank 3, so there is one linearly independent solution. We may take $b = 1$, $a = c = -1/2$, which gives the dimensionless quantity $\beta/\sqrt{m k_1}$.
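The answer to Question 1 is a null-space computation, which can be automated. The sketch below is an illustration rather than anything from the text: it encodes the exponents of mass, time and length for $m, \beta, k_1, k_2$ (read off from the table of units) as a matrix and finds its null space exactly with rational arithmetic. The function name `null_space` and the matrix layout are choices made here.

```python
from fractions import Fraction

# Rows: exponents of mass, time, length. Columns: m, beta, k1, k2.
DIM_MATRIX = [
    [1, 1, 1, 1],      # mass:    a + b + c + d
    [0, -1, -2, -2],   # time:   -b - 2c - 2d
    [0, 0, 0, -2],     # length: -2d
]

def null_space(rows):
    """Return a basis of the null space via exact Gauss-Jordan elimination."""
    mat = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(mat[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        pivot_row = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if pivot_row is None:
            continue                      # no pivot here: c is a free column
        mat[r], mat[pivot_row] = mat[pivot_row], mat[r]
        pivot = mat[r][c]
        mat[r] = [v / pivot for v in mat[r]]
        for i in range(len(mat)):
            if i != r and mat[i][c] != 0:
                factor = mat[i][c]
                mat[i] = [vi - factor * vr for vi, vr in zip(mat[i], mat[r])]
        pivots.append(c)
        r += 1
        if r == len(mat):
            break
    free_cols = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free_cols:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -mat[row_idx][f]
        basis.append(vec)
    return basis

basis = null_space(DIM_MATRIX)
# Normalize so that the exponent of beta equals 1, as in the text.
scaled = [v / basis[0][1] for v in basis[0]]
print("exponents (a, b, c, d):", [str(v) for v in scaled])
```

With the normalization $b = 1$ the single basis vector reproduces the exponents $a = c = -1/2$, $d = 0$ quoted above. The same routine applies to any table of units once the exponent matrix is written down.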

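To address the second question, let’s substitute (D.3) into (D.2); after dividing the equation by $mL/T^2$ we have

\[ \frac{d^2\bar{x}}{d\tau^2} + \frac{\beta T}{m}\frac{d\bar{x}}{d\tau} - \frac{k_1 T^2}{m}\bar{x} + \frac{k_2 L^2 T^2}{m}\bar{x}^3 = 0. \]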
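The substitution uses only the chain rule. As a worked intermediate step (added for reference, with $x = L\bar{x}$ and $t = T\tau$ as in (D.3)):

```latex
\[
x' = \frac{L}{T}\frac{d\bar{x}}{d\tau}, \qquad
x'' = \frac{L}{T^2}\frac{d^2\bar{x}}{d\tau^2},
\qquad\text{so that}\qquad
\frac{mL}{T^2}\frac{d^2\bar{x}}{d\tau^2} + \frac{\beta L}{T}\frac{d\bar{x}}{d\tau}
  - k_1 L\,\bar{x} + k_2 L^3\,\bar{x}^3 = 0.
\]
```

Multiplying through by $T^2/(mL)$ produces the coefficients $\beta T/m$, $k_1 T^2/m$ and $k_2 L^2 T^2/m$ of the scaled equation.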




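We choose $T = \sqrt{m/k_1}$ to make the coefficient of the term linear in $\bar{x}$ equal to unity in magnitude, so we have

\[ \frac{d^2\bar{x}}{d\tau^2} + \frac{\beta}{\sqrt{m k_1}}\frac{d\bar{x}}{d\tau} - \bar{x} + \frac{k_2 L^2}{k_1}\bar{x}^3 = 0. \]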

Next we choose $L = \sqrt{k_1/k_2}$, so that $k_2 L^2/k_1 = 1$, and the equation simplifies to

\[ \frac{d^2\bar{x}}{d\tau^2} + \frac{\beta}{\sqrt{m k_1}}\frac{d\bar{x}}{d\tau} - \bar{x} + \bar{x}^3 = 0, \]

which is the simple form of the equation introduced in Chapter 1. All coefficients in the equation are either simple numbers, i.e., ±1, or the dimensionless quantity constructed in answering Question 1.
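A numerical sanity check of the scaling is straightforward. The sketch below is an illustration under stated assumptions: the parameter values and initial data are hypothetical, the fixed-step RK4 integrator is a generic choice rather than the software discussed in Section 1.7, and the length scale is taken as $\sqrt{k_1/k_2}$, the value that makes the cubic coefficient equal to one. It integrates the dimensional equation (D.2) and the scaled equation, then undoes the scaling and compares.

```python
import math

def rk4(f, y0, t_end, n_steps):
    """Integrate y' = f(y) from 0 to t_end with classical fixed-step RK4."""
    h = t_end / n_steps
    y = list(y0)
    for _ in range(n_steps):
        s1 = f(y)
        s2 = f([yi + 0.5 * h * si for yi, si in zip(y, s1)])
        s3 = f([yi + 0.5 * h * si for yi, si in zip(y, s2)])
        s4 = f([yi + h * si for yi, si in zip(y, s3)])
        y = [yi + h / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]
    return y

# Hypothetical parameter values, chosen only for illustration.
m, beta, k1, k2 = 2.0, 0.3, 5.0, 40.0
T = math.sqrt(m / k1)            # time scale
L = math.sqrt(k1 / k2)           # length scale: makes k2 * L**2 / k1 equal to 1
mu = beta / math.sqrt(m * k1)    # the single dimensionless parameter

def duffing_dimensional(state):
    """(D.2) as a first-order system in (x, x')."""
    x, v = state
    return [v, (-beta * v + k1 * x - k2 * x**3) / m]

def duffing_scaled(state):
    """Scaled equation as a first-order system in (xbar, d xbar / d tau)."""
    xb, vb = state
    return [vb, -mu * vb + xb - xb**3]

x0, v0 = 0.1, 0.0                # dimensional initial displacement and velocity
t_end = 3.0
x_dim, v_dim = rk4(duffing_dimensional, [x0, v0], t_end, 6000)
xb, vb = rk4(duffing_scaled, [x0 / L, v0 * T / L], t_end / T, 6000)

# Undo the scaling: x = L * xbar and x' = (L / T) * d xbar / d tau.
print("x(t_end):  ", x_dim, "vs", L * xb)
print("x'(t_end): ", v_dim, "vs", L / T * vb)
```

The two computations agree to rounding error, even though the scaled one depends on the four parameters only through `mu`.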

The task of nondimensionalization is not complete until one has interpreted the dimensionless quantities constructed in answering Question 1 and the scale parameters $T$ and $L$ arising in answering Question 2. One convenient interpretation is based on comparing (D.2) with the equation for simple harmonic motion,

\[ m x'' + k_1 x = 0. \tag{D.4} \]

The time scale $T = \sqrt{m/k_1}$ derives from the period of $\cos(\sqrt{k_1/m}\,t)$, i.e., of the trigonometric solutions of (D.4); that period is $2\pi T$. The length scale $L = \sqrt{k_1/k_2}$ is the displacement for which the nonlinear and linear forces are of the same order of magnitude. Finally, the dimensionless parameter $\beta/\sqrt{m k_1}$ is a measure of the strength of friction in (D.2). More precisely, the linear equation $m x'' + \beta x' + k_1 x = 0$ has solutions of the form $e^{-\beta t/2m}$ times a trigonometric function (in the under-damped case); in the characteristic time $T$ this exponential has decayed by the factor

\[ e^{-\beta T/2m} = e^{-(\beta/\sqrt{m k_1})/2}. \]

Here is another perspective on the importance of dimensionless quantities: no one parameter in (D.2) by itself determines the behavior of solutions of this equation, in contrast with the dimensionless combination $\beta/\sqrt{m k_1}$. Problems for which $\beta/\sqrt{m k_1}$ is large and for which it is small are different systems, reflected in different behavior of solutions. For example, the difference between under-damped, over-damped, and critically damped depends on $\beta/\sqrt{m k_1}$.
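To make the last remark concrete: for the linear equation $m x'' + \beta x' + k_1 x = 0$ the characteristic roots are $\lambda = (-\beta \pm \sqrt{\beta^2 - 4 m k_1})/2m$, so the regime is decided by the sign of $\beta^2 - 4 m k_1$, i.e., by whether $\beta/\sqrt{m k_1}$ is below, equal to, or above 2. The following minimal sketch (function name and parameter values are hypothetical, chosen for illustration) classifies a system from that single number.

```python
import math

def damping_regime(m, beta, k1):
    """Classify m x'' + beta x' + k1 x = 0 using only mu = beta / sqrt(m k1)."""
    mu = beta / math.sqrt(m * k1)
    if math.isclose(mu, 2.0, rel_tol=1e-12):
        regime = "critically damped"
    elif mu < 2.0:
        regime = "under-damped"
    else:
        regime = "over-damped"
    return mu, regime

# Two hypothetical systems with very different parameters but the same mu.
print(damping_regime(1.0, 1.0, 1.0))
print(damping_regime(100.0, 0.5, 0.0025))
```

Both calls return the same value of `mu` and the same regime, although no single parameter of the first system matches the second.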

D.2.2 Lotka-Volterra with logistic limits

The fully dimensional form of the Lotka-Volterra equations with logistic limits on prey growth is

\[ \begin{aligned} x' &= \alpha x - \kappa x^2 - \beta x y \\ y' &= \gamma x y - \delta y \end{aligned} \tag{D.5} \]





Variables     Description                          Units
t             time                                 t
x             number of prey                       N_x
y             number of predators                  N_y

Parameters    Description                          Units
α             prey growth rate                     1/t
β             predation coefficient                1/(N_y t)
γ             growth coefficient from predation    1/(N_x t)
δ             predator death rate                  1/t
κ             nonlinear limit to prey growth       1/(N_x t)

Table D.2: Units of quantities in the Lotka-Volterra equation with logistic limits, (D.5).

Table D.2 lists the units of variables and parameters in (D.5). In the Exercises, we ask the reader, after checking the information in the table, to show that there are exactly two independent dimensionless combinations of the parameters in (D.5), which we may take to be

\[ \rho = \frac{\delta}{\alpha} \quad\text{and}\quad K = \frac{\alpha\gamma}{\kappa\delta}. \tag{D.6} \]

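If we substitute

\[ \bar{x} = x/X, \qquad \bar{y} = y/Y, \qquad \tau = t/T \]

into (D.5), we derive the equations

\[ \begin{aligned} \frac{d\bar{x}}{d\tau} &= \alpha T\,\bar{x} - \kappa T X\,\bar{x}^2 - \beta T Y\,\bar{x}\bar{y} \\ \frac{d\bar{y}}{d\tau} &= \gamma T X\,\bar{x}\bar{y} - \delta T\,\bar{y}. \end{aligned} \]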

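We choose $T = 1/\alpha$ to simplify the linear growth term in the $x$-equation. Similarly we choose $Y = \alpha/\beta$ to simplify the nonlinear predation term in the $x$-equation. Finally we choose $X = \delta/\gamma$ for reasons we will explain below. These substitutions yield the equations

\[ \begin{aligned} \frac{d\bar{x}}{d\tau} &= \bar{x}(1 - \bar{x}/K) - \bar{x}\bar{y} \\ \frac{d\bar{y}}{d\tau} &= \rho(\bar{x}\bar{y} - \bar{y}) \end{aligned} \tag{D.7} \]

where $\rho$, $K$ are defined by (D.6).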
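The scaling can be checked numerically at a single point of the phase plane. In the sketch below, which is an illustration and not part of the original text, the parameter values and the test point are hypothetical, and $K$ is given the value implied by the coefficient $\kappa T X$ of $\bar{x}^2$ in the scaled equations, namely $\kappa T X = 1/K$. The right-hand side of (D.5), multiplied by $T/X$ and $T/Y$, should equal the right-hand side of (D.7).

```python
import math

# Hypothetical parameter values for (D.5), for illustration only.
alpha, beta, gamma, delta, kappa = 1.5, 0.8, 0.4, 0.6, 0.05

T, X, Y = 1.0 / alpha, delta / gamma, alpha / beta   # scales chosen in the text
rho = delta / alpha
K = alpha * gamma / (kappa * delta)   # from kappa * T * X = 1 / K

def rhs_dimensional(x, y):
    """Right-hand side of (D.5)."""
    return (alpha * x - kappa * x**2 - beta * x * y,
            gamma * x * y - delta * y)

def rhs_scaled(xb, yb):
    """Right-hand side of (D.7)."""
    return (xb * (1.0 - xb / K) - xb * yb,
            rho * (xb * yb - yb))

# d(xbar)/d(tau) = (T/X) dx/dt and d(ybar)/d(tau) = (T/Y) dy/dt.
x, y = 2.3, 0.7
dx, dy = rhs_dimensional(x, y)
dxb, dyb = rhs_scaled(x / X, y / Y)
print(math.isclose(dxb, T / X * dx), math.isclose(dyb, T / Y * dy))
```

Five dimensional parameters enter the scaled right-hand side only through `rho` and `K`.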

It remains to interpret the dimensionless parameters and the various scales introduced in deriving (D.7). It is clear that $\rho$ is a ratio of time scales, specifically of prey growth to predator death. Likewise it is clear that $K$ is the nondimensional carrying capacity of the environment for the prey. The time scale $T$ is based on the prey growth rate. The predator scale $Y$, chosen to simplify the $x$-equation, may be interpreted as the population of predators that exactly balances the linear growth rate of the unperturbed prey. Finally, to interpret the prey scale $X$, we consider the steady-state solutions of (D.5). Provided the carrying capacity is large enough (in symbols, $\kappa\delta < \alpha\gamma$), the equations have a steady-state solution in which both populations are nonzero. In this steady state, $x = \delta/\gamma$, and we chose this value for our prey scale.
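The coexistence steady state referred to here can be written out explicitly. The following short derivation is added for reference and uses only (D.5); the starred symbols $x^*, y^*$ are notation introduced here.

```latex
\[
y' = 0,\ y \neq 0 \;\Longrightarrow\; x^* = \frac{\delta}{\gamma},
\qquad
x' = 0,\ x \neq 0 \;\Longrightarrow\;
y^* = \frac{\alpha - \kappa x^*}{\beta} = \frac{\alpha\gamma - \kappa\delta}{\beta\gamma}.
\]
```

Thus $y^* > 0$ exactly when $\kappa\delta < \alpha\gamma$, which is the condition stated in the text.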

Nondimensional reductions of an ODE are far from unique. In the first place, one might have chosen the prey scale $X = \alpha/\gamma$ so that the nonlinear term in the $y$-equation would have a coefficient of unity; our choice above is simply a matter of taste. On a more serious level, a different but equally natural nondimensionalization would result from choosing the prey scale based on the carrying capacity rather than predation: specifically $X = \alpha/\kappa$. (In the Exercises we ask the reader to determine the equations in this scaling.) What normalization is most convenient depends on what problem one is studying. In our discussion of the Lotka-Volterra equation above, we are comparing solutions of the equation with finite carrying capacity to periodic solutions of the original Lotka-Volterra equations, i.e., with infinite carrying capacity. While it is easy to let the carrying capacity tend to infinity in (D.7), it is of course impossible to do so if the carrying capacity is chosen as the prey scale.

Incidentally, if one measures both prey and predator populations in terms of their biomass, then the ratio $\beta/\gamma$ may be regarded as an additional dimensionless parameter. Indeed, this parameter may be interpreted in terms of conversion efficiency, i.e., the mass of prey that must be consumed to produce a unit mass of predators. However, as we have seen above, in fact this parameter may be scaled out of the Lotka-Volterra model. It is unusual for a dimensionless parameter to be of so little consequence, and in certain more complicated predator-prey models it does play a decisive role; see the Rosenzweig-MacArthur problem.

D.2.3 Michaelis-Menten kinetics

An enzyme is a catalyst in biochemical reactions: i.e., it facilitates the reaction but is not consumed itself. Michaelis-Menten kinetics arise as an approximation of certain reaction rates where an enzyme is involved. Consider a chemical reaction (see footnote 3) of the form

\[ \mathrm{R} + \mathrm{E} \longleftrightarrow [\mathrm{RE}] \longrightarrow \mathrm{P} + \mathrm{E}. \tag{D.8} \]

Here R is a reactant (often called a substrate), E is an enzyme, [RE] is a compound in which the reactant and the enzyme are bound to one another, and P is a product. The reaction R+E ←→ [RE] is reversible, while the production of the product is considered irreversible.

Footnote 3: Reaction (D.8) might seem to violate conservation of mass: otherwise, how could P be different from R? The product P might be an isomer of R, with the same chemical composition but a different spatial structure. It is also possible that [RE] → P+E represents a binary reaction [RE]+X → P+E where X is so plentiful that its concentration may be treated as constant during the reactions; its concentration could then be included in the reaction constant $k_2$.

According to the law of mass action, a binary reaction such as R+E → [RE] proceeds at a rate proportional to the product of the concentrations of R and E, while the unitary reactions [RE] → R+E and [RE] → P+E proceed at rates proportional to the concentration of [RE]. Let $r, e, c, p$ denote the concentrations of R, E, [RE], P, respectively (mnemonic: “c” for compound). We assume that all concentrations are uniform over some region in space so that the concentrations are described by ODEs (rather than PDEs), specifically by the equations (see footnote 4)

\[ \begin{aligned} r' &= -k_{+1} r e + k_{-1} c \\ e' &= -k_{+1} r e + k_{-1} c + k_2 c \\ c' &= k_{+1} r e - k_{-1} c - k_2 c \\ p' &= k_2 c \end{aligned} \tag{D.9} \]

where we use a standard, natural notation for the reaction constants $k_j$. Although this is a system of four equations, it may be reduced to two equations. In the first place, the equation for $p$ decouples from the other three equations, so it may be ignored until the other three have been solved. Less trivially, by adding the $e$ and $c$ equations we deduce that

Claim D.2.1. The sum $e + c$ is independent of time.
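The claim follows at once from (D.9), since the right-hand sides of the $e$ and $c$ equations are negatives of one another. It can also be observed numerically. The sketch below is an illustration only: the rate constants, initial concentrations and step size are hypothetical, and the fixed-step RK4 integrator is a generic choice. It integrates (D.9) and compares $e + c$ at the final time with its initial value.

```python
def rhs(state, kp1, km1, k2):
    """Right-hand side of (D.9) for state = (r, e, c, p)."""
    r, e, c, p = state
    bind = kp1 * r * e                 # rate of R + E -> [RE]
    return (-bind + km1 * c,
            -bind + km1 * c + k2 * c,
            bind - km1 * c - k2 * c,
            k2 * c)

def rk4_step(state, h, *rates):
    """One classical fourth-order Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    s1 = rhs(state, *rates)
    s2 = rhs(shifted(state, s1, 0.5 * h), *rates)
    s3 = rhs(shifted(state, s2, 0.5 * h), *rates)
    s4 = rhs(shifted(state, s3, h), *rates)
    return tuple(y + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for y, a, b, c, d in zip(state, s1, s2, s3, s4))

# Hypothetical rate constants and initial concentrations, for illustration only.
rates = (2.0, 1.0, 0.5)                # k_{+1}, k_{-1}, k_2
state = (10.0, 0.1, 0.0, 0.0)          # r, e, c, p at t = 0
total_enzyme = state[1] + state[2]     # initial value of e + c

for _ in range(2000):                  # integrate to t = 10
    state = rk4_step(state, 0.005, *rates)

r, e, c, p = state
print("e + c at t = 10:", e + c, " initial value:", total_enzyme)
print("r + c + p at t = 10:", r + c + p)
```

The second printed quantity, $r + c + p$, is also conserved by (D.9), as adding the $r$, $c$ and $p$ equations shows; it is included here only as an extra check.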

Let us denote by $E$ the constant value of $e + c$; for example, if initially none of the compound [RE] is present, then $E$ is the initial value $e(0)$. We may use the relation $e = E - c$ to eliminate $e$ from the equations. Thus, it suffices to consider the system

\[ \begin{aligned} r' &= -k_{+1} r (E - c) + k_{-1} c \\ c' &= k_{+1} r (E - c) - k_{-1} c - k_2 c. \end{aligned} \tag{D.10} \]

In the typical situation R is very abundant while the amount of E is rather limited: it is entirely possible that at some time E will be almost completely bound in the compound [RE] while the concentration of R is decreased only modestly. This difference in concentrations could arise because E is physically located only on a surface bounding a three-dimensional region which contains R; however, for simplicity we shall consider both R and E to be distributed over some region, with the concentration of E vastly smaller than that of R.

Footnote 4: Note that the same reaction constants appear in different equations. This simplification arises because we measure all concentrations in moles per unit volume (molar concentration, when the volume is measured in liters). This is like measuring concentration in number of molecules per unit volume, except that to avoid excessively large numbers we count molecules in units of Avogadro’s number, $N_A \approx 6.022 \times 10^{23}$. This requires more explanation. Q: Why can’t you measure predators and prey in terms of biomass and get an extra dimensionless constant in Lotka-Volterra?

Variables     Description      Units
t             time             t
r, c          concentration    molar

Parameters    Description      Units
k_{+1}        reaction rate    1/(t molar)
k_{-1}        reaction rate    1/t
k_2           reaction rate    1/t
E             concentration    molar

Table D.3: Units of quantities in (D.10), the reactions leading to Michaelis-Menten kinetics.

With nondimensionalization, one can systematically analyze the mathematical consequences of the great difference in the concentrations of R and E. Table D.3 lists the units of all quantities in (D.10). In the Exercises we ask the reader to show that two dimensionless combinations may be constructed from these parameters, which may be chosen to be

\[ \kappa = \frac{k_2}{k_{-1}} \quad\text{and}\quad \varepsilon = \frac{E k_{+1}}{k_{-1}}. \tag{D.11} \]

We claim the parameter $\varepsilon$ is exceedingly small. Now the concentration of the enzyme, which equals $E - c$, will never exceed $E$. On the other hand, $k_{-1}/k_{+1}$ is an appropriate scale for $r$: it may be interpreted as the concentration of R at which the forward and backward reactions in R+E ←→ [RE] proceed at equal rates if half of the enzyme is bound (i.e., $c = e = E/2$). Thus $\varepsilon = E/(k_{-1}/k_{+1})$ is the ratio of an upper bound on the concentration of the enzyme to the scale for the concentration of R, from which the claim follows.

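To nondimensionalize (D.10), we define

\[ \bar{r} = \frac{k_{+1}}{k_{-1}}\, r, \qquad \bar{c} = c/E, \qquad \bar{t} = \varepsilon k_{-1} t, \tag{D.12} \]

which yields

\[ \begin{aligned} d\bar{r}/d\bar{t} &= -\bar{r}(1 - \bar{c}) + \bar{c} \\ \varepsilon\, d\bar{c}/d\bar{t} &= \bar{r}(1 - \bar{c}) - (1 + \kappa)\bar{c}, \end{aligned} \tag{D.13} \]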

the equations (4.35) that we discussed above. In particular we confirmed the validity of the fast-reactions-to-completion approximation for these equations. Because of the importance of the Michaelis-Menten approximation, let us return to unscaled variables to express the rate at which the product P is produced:

\[ \frac{dp}{dt} = \frac{k r}{K + r} \tag{D.14} \]

where $k = k_2 E$ and $K = (k_{-1} + k_2)/k_{+1}$. The key feature of (D.14) is that the reaction rate saturates at large $r$.
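Formula (D.14) can be recovered from (D.10) by setting $c' = 0$, which gives $c = E r/(K + r)$ with $K = (k_{-1} + k_2)/k_{+1}$, and then using $p' = k_2 c$. The sketch below is an illustration with hypothetical constants and function names: it evaluates both expressions and shows the saturation, with the rate equal to $k/2$ at $r = K$ and approaching $k = k_2 E$ for large $r$.

```python
def quasi_steady_c(r, E, kp1, km1, k2):
    """Value of c at which c' = 0 in (D.10) for a given r."""
    return kp1 * r * E / (kp1 * r + km1 + k2)

def michaelis_menten_rate(r, E, kp1, km1, k2):
    """Right-hand side of (D.14): k r / (K + r)."""
    k = k2 * E
    K = (km1 + k2) / kp1
    return k * r / (K + r)

# Hypothetical constants, for illustration only.
E, kp1, km1, k2 = 0.01, 2.0, 1.0, 0.5
K = (km1 + k2) / kp1
k_max = k2 * E                         # limiting rate as r grows

for r in (0.1 * K, K, 10 * K, 1000 * K):
    rate = michaelis_menten_rate(r, E, kp1, km1, k2)
    from_c = k2 * quasi_steady_c(r, E, kp1, km1, k2)
    print(f"r = {r:9.3f}   dp/dt = {rate:.6f}   k2*c = {from_c:.6f}   "
          f"fraction of max = {rate / k_max:.4f}")
```

The last column makes the saturation visible: increasing $r$ a hundredfold beyond $10K$ changes the rate by less than ten percent.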

Q: Mention Hill function? Q: In an Exercise, give a reaction scheme that leads to an ODE with RHS like 1/(K + r)?

D.3 Exercises

List some exercises as “must-do”.

Extend the theory (existence, uniqueness, differentiability) to nonautonomous equations.

Scale activator-inhibitor.

Scale Lotka-Volterra with logistic limits using K = 1 to scale prey.

A long one: nondimensionalize the FitzHugh-Nagumo equations. Start with (??), which has eight constants, and reduce it to (4.70), which has three. You have five constants to play with in

\[ \bar{x} = \frac{x + a}{X}, \qquad \bar{y} = \frac{y + b}{Y}, \qquad \bar{t} = \frac{t}{T}. \]

Q: Why doesn’t multiplying the equation by constants do anything? Is it like for a linear second-order equation? I.e., although there are 3 possible scaling parameters, scaling x and scaling the equation do the same thing. Maybe need to put a warning when doing naive counting of parameters.

• Choose a to move the local min to the origin.

• Choose X to move the other root of the cubic to x = 1.

• Choose T to make coefficient of $x^3$ equal to unity.

• Choose Y to make coefficient of y in first eqn equal to unity.

• Choose b to make const term in second eqn go to zero.

Lesson to learn: in some problems you can translate a variable and get further simplification. The count of dimensionless parameters is misleading in such a case.




Bibliography

[1] K. E. Atkinson, An introduction to numerical analysis, 2nd ed., Wiley, New York,1989. 4, 15, 4.8.3, 6.1.2

[2] M. Golubitsky and D.G. Schaeffer, Singularities and groups in bifurcation theory ,Springer-Verlag, New York, 1988. 7.5.3

[3] M. H. Holmes, Introduction to perturbation methods , Springer-Verlag, New York,1995. 6.1.2, 3

[4] R. K. Nagle, E. B. Saff, and A. D. Snider, Fundamentals of differential equations and boundary value problems, 6th ed., Addison Wesley, 2011. 1.3.2

[5] G. Strang, Introduction to linear algebra, 4th ed., Wellesley Cambridge Press,Wellesley, MA, 2009. 2.3.1, 1, C.1.1

288



Index

absolutely convergent sequence, 42
Airy’s equation, 2
alpha limit set, 181
asymptotically stable equilibrium, 134
attractor, 52
autonomous, 3, 19

Banach space, 67
bifurcation diagram, 214
blow-up of solutions, 62

cantilever beam, 12
capacitor, 16
Cauchy sequence, 67
Cauchy-Schwarz inequality, 38
center manifold, 169
circuit, electrical, 16
complete, 67
constant-coefficient system, 36
continuous, 43
continuous dependence on initial data, 20
continuously differentiable, 43
contraction, 67
contraction mapping principle, 68
convergence of a sequence, 41

diagonalizable matrices, 46
differentiable, 43
diffusion, 141
direction field, 5
distance, 41
double well potential, 14
Duffing’s equation, 115, 130

eigenvalue problem, 37
energy, 16, 148
equilibrium point, 130
existence of solutions, 20, 64, 68, 76
existence, global, 86, 87
exponential of a matrix, 37

first return, 197
FitzHugh-Nagumo equations, 168
fixed point, 68
flow, 103
forward time, solution in, 74
fundamental existence theorem, 64, 68
fundamental existence theorem, nonautonomous case, 76

global existence, 86, 87
globally Lipschitz, 63
Gronwall’s Lemma, 71, 86

Hamiltonian, 166
homogeneous, 3, 19
Hooke’s Law, 10
hyperbolic equilibrium, 136

IC, see initial condition
index, 206
index of an equilibrium, 209
inductor, 16
inhomogeneous equation, 55
initial condition, 4
initial value problem, 4
inner product, 38
integral equation, 66
invariant, 158
IVP, see initial value problem

Jordan block, 50
Jordan curve, 177
Jordan Curve Theorem, 206
Jordan normal form, 50

kinetic energy, 16

LaSalle’s Invariance Principle, 145
Leibniz’ rule, 44
level set, 20
Liapunov function, 143
Liapunov stability theorem, 144
Liapunov stable equilibrium, 133
limit cycle, 173
linear ODE, 3
linear system, 19
linearization, 131
Lipschitz continuity, 63, 73
locally Lipschitz, 63
locally uniformly Lipschitz, 76
logistic equation, 1, 7
Lotka-Volterra model, 17

magnets, 12
Mathieu’s equation, 2
matrix exponential, 37
matrix norm, 40
maximal interval of existence, 84
Michaelis-Menten kinetics, 101

Newton’s second law of motion, 9
non-degenerate equilibrium, 209
non-diagonalizable matrices, 48
norm, 41, 66
norm (for matrices), 40
norm (for vectors), 38
nullcline, 92
numerical methods, 124

ODE, see ordinary differential equation
omega limit point, 181
omega limit set, 181
orbit, 20, 147
order of a numerical method, 121
order of an ODE, 2
ordinary differential equation, 1

pendulum equation, 2, 12
periodic solution, 171
perturbation methods, 124
phase plane, 19
Poincaré map, 195
Poincaré-Lindstedt method, 187
positively invariant set, 182
potential energy, 14
potential function, 147
predator-prey, 17
Pythagorean Theorem, 38

Rayleigh number, 218
real canonical form, 49
relaxation oscillations, 194
resistor, 16
Riccati equation, 2
Routh-Hurwitz criterion, 271
Runge-Kutta-Fehlberg method, 122

saddle-node bifurcation, 221
secular term, 187
semi-group property, 104
separable ODE, 7
separatrices, 157
similar matrices, 46
simple harmonic motion, 1
simple, closed curve, 177
singularity, 2
sink, 52
solution of an ODE, 3
solution operator, 103
spring-mass system, 9
stable manifold, 151
stiff ODEs, 122
strict Liapunov function, 144
superposition principle, 4
system of ODEs, 17

total energy, 14
trace-determinant criteria, 54
trajectory, 20, 147
transcritical bifurcation, 220
trapping region, 87
triangle inequality, 39

uniqueness, 62
uniqueness of solutions, 20, 74
uniqueness theorem, 75
uniqueness theorem, nonautonomous case, 76
