
Equivalence and bifurcations of finite order stochastic processes∗

C. Diks and F.O.O. Wagener†

25th April 2005

Abstract

This article presents an equivalence notion of finite order stochastic processes. Local dependence measures are defined in terms of ratios of joint and marginal probability densities. The dependence measures are classified topologically using level sets. The corresponding bifurcation theory is illustrated with some simple examples.

1 Introduction

Bifurcation theory has been an extremely successful tool to investigate the qualitative (or structural) properties of deterministic nonlinear systems. But in many practical situations, deterministic models fit the available data only imperfectly, and stochastic models are proposed to describe the behaviour of a given system; the stochastic components can model genuinely random events, but they can also be introduced for quantities of which not enough is known to describe them otherwise.

Motivated by the success of deterministic bifurcation theory, there have been several attempts to develop bifurcation theory for stochastic processes; however, finding a natural replacement for the notion of ‘topological equivalence’ has been the main problem. For at the base of any bifurcation theory, there is a notion of ‘form’ or ‘shape’, formalised as an equivalence relation between systems: two systems are said to be of the same form if they are in the same equivalence class. A meaningful bifurcation theory can only be developed if there are equivalence classes with non-empty interior; note that this presupposes a topology on the space of systems. Elements in the interior of an equivalence class are ‘structurally stable’: if a system parameter is changed slowly, the system will remain in the equivalence class and the form of the system does not change. All other elements, associated with changes of form, are called ‘bifurcating’.

In this article, we quickly review the previously proposed notions of phenomenological and dynamical bifurcation of stochastic processes. We introduce a new equivalence relation, based on the dependence structure of the process. This notion is related, and in some cases equal, to the better known copula density of dependent stochastic variables. We show that our equivalence relation has ‘many’ structurally stable elements and that it avoids some limitations of older notions. Finally, we illustrate it by giving several applications.

∗The authors wish to thank the participants of the workshop on Stochastic Bifurcation Theory, January 2003, in Leiden, for stimulating discussions. This research is supported by the Netherlands Organization for Scientific Research (NWO) under a MaGW-Pionier and a MaGW-Vernieuwingsimpuls grant.

†Center for Nonlinear Dynamics in Economics and Finance (CeNDEF), Universiteit van Amsterdam, Roetersstraat 11, 1018 WB Amsterdam, The Netherlands, [email protected], [email protected]

1.1 Phenomenological bifurcations

Without always stating it explicitly, we shall assume throughout this article that the stochastic processes considered are ergodic, and that they therefore have unique invariant probability distributions; we shall moreover assume that these invariant distributions are absolutely continuous with respect to a measure of the Lebesgue class, and that the corresponding probability density is a smooth differentiable function.

The natural first attempt to arrive at a classification of stochastic processes is to apply the Morse classification of real valued functions to invariant probability densities $p$ of processes [3, 15]. The corresponding equivalence relation is that of smooth coordinate transformations of domain and range of $p$, the stable elements being Morse functions with all critical values distinct from each other. For the purposes of this article, we shall call the equivalence relation P-equivalence, in analogy with the associated bifurcation notion, which has been called phenomenological bifurcation or P-bifurcation (see Arnold [2], p. 471-473).

The main problem of this approach, acknowledged in [15], is that the equivalence classes are not invariant under diffeomorphisms of the underlying space. For instance, let $X_t$ and $Y_t$ be two processes on $\mathbb{R}^m$ with invariant densities $p_X$ and $p_Y$, and let $\varphi$ be an invertible transformation of $\mathbb{R}^m$. If the processes are related by $Y_t = \varphi(X_t)$, then the invariant densities are related as

$$p_X(x) = p_Y(\varphi(x))\,|\det D\varphi(x)|;$$

we see that, in the language of physicists, the function value of the invariant density ‘depends on the coordinates’. It is clear that equivalence classes are preserved if only volume-preserving diffeomorphisms are admitted as coordinate changes, for which $|\det D\varphi(x)| = 1$ for all $x$. Note that these comprise the class of Riemannian isometries proposed in [15].

The restriction of admissible transformations to volume-preserving diffeomorphisms of the domain of the invariant density $p$ is necessary. For if all diffeomorphisms were admissible, then all processes on the real line would be equivalent, since there is always a coordinate transformation such that the invariant density of the transformed process is the uniform density on the unit interval. In the words of Zeeman [15]: ‘However, this [i.e. admitting all diffeomorphisms] would be exactly the wrong thing to do, because it would render the tool [i.e. P-equivalence] useless by making everything equivalent (...)’ (the comments in square brackets are our interpolations).

Underlying this difficulty is the fact that the probability density $p(x)$, unlike the measure $p(x)\,dx$, is not an invariant geometrical object. Consequently, P-equivalence is an inconvenient notion for practical applications: for instance, recording data on a linear or a logarithmic scale might point to different conclusions.
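The scale dependence can be made concrete with a small numerical sketch. The choice of an exponential distribution and all function names below are our own illustrative assumptions, not taken from the text: if $X$ has density $e^{-x}$ on $(0,\infty)$, the transformation law above with $\varphi = \ln$ gives $Y = \ln X$ the density $e^{y - e^{y}}$. The first density has no interior maximum, while the second has one, so the two are not P-equivalent although they describe the same variable on different scales.

```python
import math

def p_x(x):
    """Density of X ~ Exp(1) on (0, inf); an illustrative choice."""
    return math.exp(-x)

def p_y(y):
    """Density of Y = ln(X), obtained from p_X(x) = p_Y(ln x) * |1/x|."""
    return math.exp(y - math.exp(y))

def count_interior_maxima(density, grid):
    """Count strict local maxima of the density sampled on a grid."""
    values = [density(u) for u in grid]
    return sum(1 for i in range(1, len(values) - 1)
               if values[i - 1] < values[i] > values[i + 1])

x_grid = [0.01 * k for k in range(1, 1001)]         # linear scale, (0, 10]
y_grid = [-6.0 + 0.01 * k for k in range(0, 901)]   # logarithmic scale, [-6, 3]

maxima_linear = count_interior_maxima(p_x, x_grid)
maxima_log = count_interior_maxima(p_y, y_grid)
```

Here `maxima_linear` and `maxima_log` differ, which is the change of ‘form’ under a coordinate change that the paragraph above describes.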

1.2 Dynamical bifurcations

A second bifurcation notion for stochastic processes has been introduced by Arnold and his co-workers (see [2] for an extensive exposition). We shall try to sketch its main ideas briefly using the simple first-order process $X_t$ on $\mathbb{R}$ satisfying

$$X_{t+1} = g(X_t) + \varepsilon_t, \tag{1}$$

where $\varepsilon_t$ is a sequence of independent and identically distributed random variables. This process can be considered as a deterministic dynamical system on an infinite dimensional phase space $\Omega \times \mathbb{R}$ as follows. The elements of $\Omega$ are the possible realisations $\omega = (\varepsilon_0, \varepsilon_1, \cdots)$ of the process $\varepsilon_t$. Introducing the projection $\pi(\omega) = \varepsilon_0$ and the shift $\sigma(\omega) = (\varepsilon_1, \varepsilon_2, \cdots)$, we have for instance that $\varepsilon_t = \pi \circ \sigma^t(\omega)$. Define now the map $\Phi$ on $\Omega \times \mathbb{R}$ by

$$\Phi(\omega, x) = (\varphi_1(\omega), \varphi_2(\omega, x)) = (\sigma(\omega), g(x) + \pi(\omega)).$$

Note that this is a deterministic system; the stochastics are ‘hidden’ in the fact that the initial condition $\omega \in \Omega$ is unknown. The realisations $X_t$ of the process (1) are the values of the second component of $\Phi^t(\omega, x)$.

Note that $\Phi$ is a skew system: the shift dynamics $\varphi_1$ in the space $\Omega$ are driving the dynamics $\varphi_2$ in $\mathbb{R}$. For $\Phi$, a random fixed point is defined as a map $\xi : \Omega \to \mathbb{R}$ which satisfies the invariance condition

$$\varphi_2(\omega, \xi(\omega)) = \xi(\varphi_1(\omega))$$

for all (or almost all) $\omega$. Stability is now defined in the usual way: a random fixed point $\xi$ is stable if all nearby orbits converge to $\xi$. A random, or, following the terminology in [2], dynamical bifurcation or D-bifurcation of the process occurs for instance if a random fixed point loses stability.

At this point, a drawback of the notion of dynamical bifurcation becomes apparent: to determine stability of a random fixed point, two orbits of $\Phi$ with identical noise realisations have to be compared. This seems to make it rather difficult to apply the notion of D-bifurcation to practical problems (see however [4] and related literature). We therefore leave aside this theory, and try rather to improve on the notion of P-equivalence.
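The comparison of two orbits driven by one and the same noise realisation can be sketched as follows. The contraction $g(x) = x/2$, the Gaussian noise, the seed and the helper names are hypothetical choices made only for illustration; in a simulation the realisation $\omega$ is available, whereas in observed data it is not, which is the practical difficulty noted above.

```python
import random

def orbit(g, x0, noise):
    """Second component of Phi^t(omega, x0), t = 0..len(noise), for fixed omega = noise."""
    xs = [x0]
    for eps in noise:
        xs.append(g(xs[-1]) + eps)
    return xs

def g(x):
    # Hypothetical contraction; any map with |g'| < 1 behaves similarly.
    return 0.5 * x

rng = random.Random(0)
omega = [rng.gauss(0.0, 1.0) for _ in range(30)]   # one noise realisation

orbit_a = orbit(g, -3.0, omega)   # two initial conditions,
orbit_b = orbit(g, 4.0, omega)    # identical noise
gaps = [abs(a - b) for a, b in zip(orbit_a, orbit_b)]
```

For this particular `g` the gap halves at every step, so both orbits approach the same random fixed point regardless of the noise drawn.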

1.3 Local dependency structure

By considering stochastic analogues of concepts used in catastrophe theory, Hartelman et al. [12] arrived at a classification for stochastic differential equations of diffusion type on the real line that is invariant under invertible transformations. At the basis of this classification are level crossing statistics and derived quantities, which are invariant under monotone transformations of the underlying space. Although level crossing statistics can also be used for discrete time systems, the corresponding classification would be rather restrictive. The reason is that discrete time dynamical systems are ‘essentially richer’ than discretely sampled continuous time diffusions, because finite time transition densities induced by diffusions only represent a subclass of transition densities for discrete time dynamical systems.

Instead of level crossing statistics, we base our proposed equivalence relation on another function that can be associated with stochastic processes of finite order. For the purposes of this introduction, we explain the main ideas in the case of a first order process $X_t$ on the real line; general definitions will be given below. Assume therefore that the process $X_t$ is generated by

$$X_{t+1} = g(X_t) + \varepsilon_t, \tag{2}$$

where $\varepsilon_t$ is a sequence of independent, identically distributed (IID) innovations. Recall that equation (2) implies a transformation law for probability densities. Assume that $X_t$ is distributed with marginal probability density $p_t$, that is $P(a \le X_t < b) = \int_a^b p_t(x)\,dx$, and that $\varepsilon_t$ has density $\varphi$. Then the probability density $p_{t+1}$ of $X_{t+1}$ is given by

$$p_{t+1}(x_{t+1}) = \int \tau(x_{t+1} \mid x_t)\, p_t(x_t)\, dx_t,$$

where $\tau(x \mid y) = \varphi(x - g(y))$ is the transition probability density of the process.

Consider in particular the strictly stationary process $X_t$ where $X_1$, and hence every $X_t$, is distributed according to the invariant probability density $p_1 = p$. Denote the invariant joint probability density of $(X_t, X_{t+1})$ by $p_{t,t+1}$; this joint density does not depend on $t$ and it is therefore equal to $p_{1,2}$. Moreover, the measure $p_{1,2}(x_1,x_2)\,dx_1 dx_2$ is absolutely continuous with respect to $p(x_1)p(x_2)\,dx_1 dx_2$. By the Radon-Nikodym theorem, the following Jacobian exists:

$$f(x_1,x_2) \overset{\mathrm{def}}{=} \frac{p_{1,2}(x_1,x_2)\,dx_1 dx_2}{p(x_1)p(x_2)\,dx_1 dx_2} = \frac{p_{1,2}(x_1,x_2)}{p(x_1)p(x_2)} = \frac{\tau(x_2 \mid x_1)}{p(x_2)}.$$

The function $f$ is called the dependency ratio of the process. Note that $f$ is identically 1 if $X_t$ and $X_{t+1}$ are independent; the difference $|f(x_1,x_2) - 1|$ can therefore be seen as a measure of the local dependency structure of the process. Moreover, $f$ contains all essential information regarding the dependence structure of the process: for if coordinates are chosen such that $p(x) = 1$ for $x \in [0,1]$ and 0 otherwise, then $f(x_1,x_2) = p_{1,2}(x_1,x_2) = \tau(x_2 \mid x_1)$ determines the entire process. By construction, dependency ratios are invariant, as geometrical objects, under any invertible transformation of the underlying space of the process under consideration. Our notion of equivalent processes will be based on these ratios; invariance of the equivalence classes under nonlinear invertible transformations is then obtained automatically.
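The invariance can be checked numerically in a simple case. The sketch below assumes a Gaussian AR(1) process $X_{t+1} = aX_t + \varepsilon_t$ with $\varepsilon_t \sim N(0,\sigma^2)$, whose invariant density is $N(0, \sigma^2/(1-a^2))$, and the coordinate change $Y_t = \exp(X_t)$; the parameter values and function names are illustrative assumptions. The Jacobian factors of the transformed transition and invariant densities cancel in the ratio.

```python
import math

A, SIGMA = 0.6, 1.0                  # hypothetical AR(1) parameters
S2 = SIGMA**2 / (1.0 - A**2)         # variance of the invariant density

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean)**2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def tau_x(x2, x1):
    """Transition density tau(x2 | x1) of the AR(1) process."""
    return normal_pdf(x2, A * x1, SIGMA**2)

def p_x(x):
    """Invariant density of X_t."""
    return normal_pdf(x, 0.0, S2)

def f_x(x1, x2):
    """Dependency ratio of X_t: tau(x2 | x1) / p(x2)."""
    return tau_x(x2, x1) / p_x(x2)

# Densities of Y_t = exp(X_t); each picks up the Jacobian factor 1/y.
def tau_y(y2, y1):
    return tau_x(math.log(y2), math.log(y1)) / y2

def p_y(y):
    return p_x(math.log(y)) / y

def f_y(y1, y2):
    """Dependency ratio of Y_t, computed from its own densities."""
    return tau_y(y2, y1) / p_y(y2)

points = [(-1.0, 0.5), (0.0, 0.0), (1.3, -0.7)]
differences = [abs(f_x(a, b) - f_y(math.exp(a), math.exp(b))) for a, b in points]
```

The entries of `differences` vanish up to rounding, in line with $f_X(x_1,x_2) = f_Y(\varphi(x_1),\varphi(x_2))$ for $\varphi = \exp$.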

Several other local dependence measures have recently been described in the statistical literature (see e.g. [7], [8], and [10]). The dependence measures in this literature are various localised versions of the Pearson correlation coefficient, and as such are motivated entirely differently than our dependency ratio. In particular they do not share the invariance property with our dependency ratio.

The definition of our equivalence relation is a bit involved. We therefore postpone this definition to section 3. First, in section 2, we give the definition of dependency ratio for more general processes, and we show how this quantity is connected with other quantities like copula density, mutual information and entropy of a stochastic process. In section 3, the definition of our equivalence relation is given, together with topological properties of the associated equivalence classes. Applications are given in section 4.

2 Dependency ratios, copulas and information theory

Our attention will be restricted to stationary discrete time processes of finite order that are generated by equations like

$$X_t = g(X_{t-n}, \cdots, X_{t-1}, \varepsilon_t).$$

Here the variables $X_t$ take values in some $n$-dimensional orientable manifold $M$, and the $\varepsilon_t$ are identically and independently distributed random variables. In most cases $M$ will be equal to $\mathbb{R}^n$. We wish however to bring out the dependency of the process on the variables chosen; in order to achieve this, a coordinate-independent formulation is chosen. Considering the problem on a manifold will come at little extra cost.

2.1 Basic definitions

Recall that any stochastic process $X_t$ is given by the joint probabilities

$$P_{t_1,\cdots,t_\ell}(A_1 \times \cdots \times A_\ell) = P(X_{t_1} \in A_1, \cdots, X_{t_\ell} \in A_\ell),$$

which are defined for all $\ell$-tuples $(t_1, \cdots, t_\ell) \in \mathbb{Z}^\ell$. A stochastic process is called strictly stationary if its finite dimensional joint probabilities are time invariant,

$$P_{t_1,\cdots,t_\ell} = P_{t_1+h,\cdots,t_\ell+h},$$

for all $(t_1, \cdots, t_\ell, h)$. Two stochastic variables $X$ and $Y$ are said to be equivalent in distribution, written as $X \sim Y$, if $P(X \in A) = P(Y \in A)$ for every $P$-measurable set $A$. A strictly stationary process $X_t$ is said to be of order $n$ if, for all $k > n$, the conditional distribution of $X_t$ given $X_{t-k}, \ldots, X_{t-1}$ is a function of $(X_{t-n}, \ldots, X_{t-1})$ only:

$$X_t \mid X_{t-k}, \ldots, X_{t-1} \;\sim\; X_t \mid X_{t-n}, \ldots, X_{t-1}. \tag{3}$$

A measure on a manifold $M$ is of Lebesgue class if at every point of the manifold it is absolutely continuous with respect to the Lebesgue measure in some (and hence any) coordinate chart. If $M$ is orientable, there is a volume form on $M$: a differentiable $n$-form that is everywhere non-zero. It is clear that any volume form is of Lebesgue class. We assume that $M$ is orientable, and we fix a volume form on $M$, denoting it by $dx$. It will be assumed that the processes we are dealing with have all joint probabilities $P_{t_1,\cdots,t_\ell}$ of Lebesgue class, with twice differentiable probability densities $p_{t_1,\cdots,t_\ell}$.

The transition probability density $\tau$ of a process of order $n$ is given by

$$\tau(x_{n+1} \mid x_1, \cdots, x_n) = \frac{p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1})}{p_{1,\cdots,n}(x_1, \cdots, x_n)}. \tag{4}$$

Note that from the order property, it follows that for $k \ge n$

$$\tau(x_t \mid x_{t-k}, \cdots, x_{t-1}) = \tau(x_t \mid x_{t-n}, \cdots, x_{t-1}).$$

Given any initial probability density of $X_1, \cdots, X_n$, we can determine all joint probabilities by first determining

$$\begin{aligned}
p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1}) &= \tau(x_{n+1} \mid x_1, \cdots, x_n)\, p_{1,\cdots,n}(x_1, \cdots, x_n),\\
p_{1,\cdots,n+2}(x_1, \cdots, x_{n+2}) &= \tau(x_{n+2} \mid x_2, \cdots, x_{n+1})\, \tau(x_{n+1} \mid x_1, \cdots, x_n)\, p_{1,\cdots,n}(x_1, \cdots, x_n),\\
&\;\;\vdots
\end{aligned}$$

and then integrating out unwanted variables. In particular, the distribution of $X_2, \cdots, X_{n+1}$ is obtained by

$$p_{2,\cdots,n+1}(x_2, \cdots, x_{n+1}) = \int \tau(x_{n+1} \mid x_1, \cdots, x_n)\, p_{1,\cdots,n}(x_1, \cdots, x_n)\, dx_1.$$

The central notion of dependency ratio of a stochastic process is given in terms of joint probability densities.

Definition. The $n$'th order dependency ratio $f$ of a strictly stationary stochastic process $X_t$ is given by

$$f(x_1, \cdots, x_{n+1}) = \frac{p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1})}{p_{1,\cdots,n}(x_1, \cdots, x_n)\, p_1(x_{n+1})} = \frac{\tau(x_{n+1} \mid x_1, \cdots, x_n)}{p(x_{n+1})}.$$

Note that $f$ is the Jacobian of the measure $p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1})\,dx_1 \cdots dx_{n+1}$ with respect to the measure $p_{1,\cdots,n}(x_1, \cdots, x_n)\,dx_1 \cdots dx_n \cdot p_1(x_{n+1})\,dx_{n+1}$. In particular, $f$ is chart-independent.

Proposition 1. If for two processes of order $n$ the dependency ratios of order $n$ are equal, then for every $m \ge n$, the dependency ratios of order $m$ are equal as well.

Proof. We consider two processes $X_t$ and $Y_t$ of order $n$. It suffices to show that order-$m$ equality of the dependency ratio for any $m \ge n$ implies, and is implied by, order-$n$ equality. For $m \ge n$, we may write

$$\begin{aligned}
p_{1,\ldots,m+1}(x_1, \ldots, x_{m+1}) &= p_{1,\ldots,m}(x_1, \ldots, x_m)\, \tau(x_{m+1} \mid x_1, \ldots, x_m)\\
&= p_{1,\ldots,m}(x_1, \ldots, x_m)\, \tau(x_{m+1} \mid x_{m-n+1}, \ldots, x_m).
\end{aligned}$$

The dependency ratio of $X_t$, for $m \ge n$, can be written as a function of $n+1$ variables only:

$$\frac{p_{1,\ldots,m+1}(x_1, \ldots, x_{m+1})}{p_{1,\ldots,m}(x_1, \ldots, x_m)\, p(x_{m+1})} = \frac{\tau(x_{m+1} \mid x_{m-n+1}, \ldots, x_m)}{p(x_{m+1})} = \frac{p_{1,\ldots,n+1}(x_{m-n+1}, \ldots, x_{m+1})}{p_{1,\ldots,n}(x_{m-n+1}, \ldots, x_m)\, p(x_{m+1})},$$

which is nothing but the dependency ratio of order $n$ in terms of the last $n+1$ variables of the vector $(x_1, \ldots, x_{m+1})$. The dependency ratio of $Y_t$ can be reduced similarly. Clearly, order-$m$ equality for $m \ge n$ implies, and is implied by, order-$n$ equality.

2.2 Relation with copulas

For a strictly stationary real-valued time series $X_t$ with continuous joint cumulative distribution functions (CDF) $F_{t_1,\ldots,t_\ell}(x_{t_1}, \cdots, x_{t_\ell})$ and marginal CDF $F(x)$, the copula associated with a delay vector $(X_{t-n+1}, \ldots, X_t)$ is the quantity

$$C_n(u_1, \ldots, u_n) = F_{1,\ldots,n}(F^{-1}(u_1), \ldots, F^{-1}(u_n)),$$

where $u_j \in [0,1]$ for $j = 1, \cdots, n$. The corresponding copula density function is

$$c_n(u_1, \ldots, u_n) = \frac{\partial^n C_n(u_1, \ldots, u_n)}{\partial u_1 \cdots \partial u_n} = \frac{p_{1,\ldots,n}(F^{-1}(u_1), \ldots, F^{-1}(u_n))}{p(F^{-1}(u_1)) \cdots p(F^{-1}(u_n))},$$

or

$$c_n(F(x_1), \ldots, F(x_n)) = \frac{p_{1,\ldots,n}(x_1, \ldots, x_n)}{p(x_1) \cdots p(x_n)}.$$

In the case of a real valued first order time series, our definitions imply that the dependency ratio $f(x_1, x_2)$ is equal to the copula density function $c_2$ evaluated in the ‘standard’ coordinates $(u_1, u_2) = (F(x_1), F(x_2))$, which are equivariant under monotonically increasing transformations of $X$.

Although the $(n+1)$-dimensional copula $c(u_1, \ldots, u_{n+1})$ characterises the dependence structure within $n+1$ consecutive values $(X_{t-n}, \ldots, X_t)$, it does not take into account the ordering of the observations in time. In time series analysis one is often interested in the question of how $X_t$ depends on $(X_{t-n}, \ldots, X_{t-1})$. It is then more natural to take into account the ordering of the observations in time and to focus on the conditional distribution of $X_t$ given past values of $X_t$. In this way, we are led to the definition of dependency ratio given above. This allows, in principle, to distinguish between time reversals in processes of order higher than one. For orders larger than one the dependency ratio can be expressed as:

$$f(x_1, \ldots, x_{n+1}) = \frac{c_{n+1}(F(x_1), \ldots, F(x_{n+1}))}{c_n(F(x_1), \ldots, F(x_n))}.$$
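As a check on this last expression, the following short derivation (a sketch using only the copula density formula of this subsection and the definition of the dependency ratio in subsection 2.1) shows how the marginal factors cancel:

```latex
\begin{aligned}
\frac{c_{n+1}(F(x_1),\ldots,F(x_{n+1}))}{c_n(F(x_1),\ldots,F(x_n))}
 &= \frac{p_{1,\ldots,n+1}(x_1,\ldots,x_{n+1})}{p(x_1)\cdots p(x_{n+1})}
    \cdot \frac{p(x_1)\cdots p(x_n)}{p_{1,\ldots,n}(x_1,\ldots,x_n)} \\
 &= \frac{p_{1,\ldots,n+1}(x_1,\ldots,x_{n+1})}{p_{1,\ldots,n}(x_1,\ldots,x_n)\,p(x_{n+1})}
  = f(x_1,\ldots,x_{n+1}).
\end{aligned}
```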

2.3 Relation with information theory

Information theoretic dependence measures can often be expressed in terms of copulas. Well-known examples are mutual information and the redundancy, which both are special instances of Kullback-Leibler divergences (relative entropies).

The Kullback-Leibler divergence between two probability distributions with probability densities $p(x)$ and $q(x)$ is defined as

$$D(p, q) = \int p(x) \ln\left(\frac{p(x)}{q(x)}\right) dx.$$

The mutual information $I(X_1; X_2)$ between two random variables $X_1$ and $X_2$ is a dependence measure defined as the Kullback-Leibler divergence between their joint probability density function $p_{1,2}(x_1, x_2)$ and the product of their marginals $p(x_1)p(x_2)$ (we consider the case with identical marginals, appropriate for stationary time series). Since $p_{1,2}(x_1, x_2)$ denotes the joint probability density function of $(X_1, X_2)$, the integral over $p(x) = p_{1,2}(x_1, x_2)$ can be concisely expressed as an expectation, that is,

$$I(X_1; X_2) = \int\!\!\int p_{1,2}(x_1, x_2) \ln\left(\frac{p_{1,2}(x_1, x_2)}{p(x_1)p(x_2)}\right) dx_1 dx_2 = E\left[\ln\left(\frac{p_{1,2}(X_1, X_2)}{p(X_1)p(X_2)}\right)\right],$$

which is non-negative and equals zero if and only if $X_1$ and $X_2$ are independent. This can be seen as follows: the expected value of the random variable $Z = p(X_1)p(X_2)/p_{1,2}(X_1, X_2)$ is

$$E[Z] = \int\!\!\int p_{1,2}(x_1, x_2)\, \frac{p(x_1)p(x_2)}{p_{1,2}(x_1, x_2)}\, dx_1 dx_2 = 1.$$

By convexity of the function $\ln(z^{-1}) = -\ln(z)$, Jensen’s inequality gives

$$I(X_1; X_2) = E[\ln(Z^{-1})] \ge -\ln(E[Z]) = 0,$$

and equality holds if and only if $Z = 1$ almost surely.

The generalisation of the mutual information to the multivariate case is known as the redundancy:

$$R(X_1; \ldots; X_{n+1}) = E\left[\ln\left(\frac{p_{1,\ldots,n+1}(X_1, \cdots, X_{n+1})}{p(X_1) \cdots p(X_{n+1})}\right)\right],$$

which is zero if and only if $(X_1, \ldots, X_{n+1})$ are jointly independent, and positive otherwise. In analogy with the above discussion on copulas, a generalisation which is more suitable within a time series context is the entropy, given by

$$H(X_{n+1}; X_1, \ldots, X_n) = E\left[\ln\left(\frac{p_{1,\ldots,n+1}(X_1, \cdots, X_{n+1})}{p_{1,\ldots,n}(X_1, \ldots, X_n)\, p(X_{n+1})}\right)\right] = E\left[\ln f(X_1, \ldots, X_{n+1})\right].$$
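For a first order process the last expression reduces to the mutual information $I(X_1; X_2) = E[\ln f(X_1, X_2)]$. The sketch below evaluates this expectation by a midpoint rule for a standard bivariate normal pair with correlation $\rho$ (an illustrative assumption; a stationary Gaussian AR(1) process rescaled to unit variance is of this form) and compares it with the known closed form $-\tfrac12 \ln(1-\rho^2)$. The function names, grid size and truncation are our own choices.

```python
import math

def log_ratio(x1, x2, rho):
    """ln f(x1, x2) = ln p12 - ln p(x1) - ln p(x2) for unit-variance normal marginals."""
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (2.0 * (1.0 - rho * rho))
    return -0.5 * math.log(1.0 - rho * rho) - q + 0.5 * (x1 * x1 + x2 * x2)

def joint_pdf(x1, x2, rho):
    """Joint density p12 of the standard bivariate normal with correlation rho."""
    q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / (2.0 * (1.0 - rho * rho))
    return math.exp(-q) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))

def expected_log_ratio(rho, half_width=8.0, cells=300):
    """Midpoint-rule approximation of E[ln f] on [-half_width, half_width]^2."""
    h = 2.0 * half_width / cells
    total = 0.0
    for i in range(cells):
        x1 = -half_width + (i + 0.5) * h
        for j in range(cells):
            x2 = -half_width + (j + 0.5) * h
            total += joint_pdf(x1, x2, rho) * log_ratio(x1, x2, rho)
    return total * h * h

rho = 0.5
numeric = expected_log_ratio(rho)
closed_form = -0.5 * math.log(1.0 - rho * rho)
```

The two values agree to within the quadrature error, and both are zero when `rho` is zero, in line with the non-negativity argument above.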

3 Equivalence notions

In this section we introduce and motivate our notion of equivalence of stochastic processes and we give some of its fundamental properties.

3.1 Structural stability and bifurcations

We recall briefly the fundamentals of bifurcation theory. The two main ingredients of any such theory are a topological space $X$ and an equivalence relation between elements of $X$. An element $f$ of $X$ is structurally stable if there is a neighbourhood $N(f)$ such that all elements $g$ in that neighbourhood are equivalent to $f$; that is, $g \sim f$ for all $g \in N(f)$. Intuitively speaking, a structurally stable element $f$ can be ‘perturbed’ slightly without being pushed out of its equivalence class. Such an element is sometimes called ‘persistent’. Clearly, the equivalence class of any structurally stable element is an open set. A structurally stable equivalence class can be thought of as defining a set of elements of the same ‘shape’ or ‘form’ (see [14]): form remains ‘stable’ if perturbed slightly.

All elements of $X$ that are not structurally stable are called bifurcating. This notion is usually familiar from the context of parametrised families: if $\lambda$ is some $q$-dimensional parameter, and $\lambda \mapsto f_\lambda$ a family of elements of $X$, then $\lambda = \lambda_0$ is a bifurcating parameter value of the family if $f_{\lambda_0}$ is not structurally stable; it might be said that at bifurcating parameter values the ‘form’ of $f_\lambda$ changes. Since the set of structurally stable elements is open, the set of bifurcating elements, and therefore also the set of bifurcating parameter values in a parametrised family, is closed.

An equivalence relation will give rise to a useful bifurcation theory on $X$ only if structurally stable elements exist at all. The optimal situation is attained if the set of structurally stable elements, while not consisting of a single equivalence class, is ‘topologically big’, since then we will be able to associate a form with ‘most’ elements. In a topological space, a set is ‘big’ if it is open and dense, or if it is at least a countable intersection of open and dense sets (a so-called ‘generic’ or ‘second category’ set, see [11]).

3.2 Strong equivalence

A natural requirement to impose on an equivalence relation of stochastic processes on a manifold $M$ is that processes which only differ by a diffeomorphism of $M$, that is, which are the ‘same’ up to a coordinate change, fall in the same equivalence class. Let for instance $X_t$, $Y_t$ denote two first order processes on $M$. We will certainly want to call two processes equivalent if there is a diffeomorphism $\varphi : M \to M$ such that the variables $(Y_t, Y_{t-1})$ and $(\varphi(X_t), \varphi(X_{t-1}))$ are identically distributed. If this is the case, we call $X_t$ and $Y_t$ strongly equivalent. Let $f_X(x_1, x_2)$ and $f_Y(x_1, x_2)$ denote the dependency ratios of $X_t$ and $Y_t$ respectively; if the processes are strongly equivalent, it follows from the invariance of the dependency ratios under diffeomorphisms that

$$f_X(x_1, x_2) = f_Y(\varphi(x_1), \varphi(x_2)) \quad \text{for all } (x_1, x_2) \in M \times M. \tag{5}$$

If we took strong equivalence as our equivalence relation, in general we would obtain an uncountable infinity of equivalence classes, and no class would be a neighbourhood of any of its points, that is, no process would be structurally stable. To see this in a simple example, assume that $f_X$ and $f_Y$ are two dependency ratios defined on the square $(-1,1) \times (-1,1) \subset \mathbb{R}^2$, and that they are given as

$$f_X(x_1, x_2) = \frac{2-\mu}{3} + x_1^2 + \mu x_2^2, \qquad f_Y(x_1, x_2) = \frac{2-\nu}{3} + x_1^2 + \nu x_2^2.$$

Taking the invariant density in both cases to be $p(x) = \tfrac12 I_{[-1,1]}(x)$, where $I_A(x)$ denotes the indicator function, we have specified two stochastic processes. The point $(0,0)$ is the only non-degenerate critical point for both $f_X$ and $f_Y$; therefore, if $f_X$ and $f_Y$ are strongly equivalent, we should have that $\Phi(x_1, x_2) = (\varphi(x_1), \varphi(x_2))$ satisfies $\Phi(0,0) = (0,0)$. But there is no real-valued smooth diffeomorphism $\varphi$ such that (5) holds simultaneously with $\varphi(0) = 0$, for the values of $f_X$ and $f_Y$ at $(0,0)$ are different if $\mu \ne \nu$. We see that every value of $\mu$ defines a different equivalence class.

3.3 Topology of dependency ratios

In section 2 we have defined the dependency ratio $f$ of an $n$'th order process $X_t$ by

$$f(x_1, \cdots, x_{n+1}) = \frac{p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1})}{p_{1,\cdots,n}(x_1, \cdots, x_n)\, p(x_{n+1})};$$

this quantity is invariant under coordinate transformations, and it is therefore a characteristic of the stochastic process.

As the space of dependency ratios is infinite dimensional, this characteristic itself is too fine-grained to be useful to classify such processes. In the previous section we have seen that one way of extracting ‘coarser’ information from a dependency ratio $f$ is to determine the expected value of some monotone transformation of $f$. But defining equivalence of two ratios by equality of such expected values would not lead to structurally stable elements, for a very small perturbation of the process might already change the expected value.

Using topological information of the dependency ratios, however, turns out to give characteristics that are sufficiently coarse. To give a very simple example: we clearly want to call two functions defined on the same domain to be of different shape if they have a different number of nondegenerate critical points. The number of such points is a numerical characteristic of the ‘shape’ of a function, and it in fact defines an equivalence class. Moreover, if we choose a suitable topology on the set of functions, we find that the equivalence classes are open sets, and that their members are structurally stable. We can specify different equivalence relations by taking into account more detailed information; for instance, we can classify the critical points according to their topological type.

In any case, we need a function topology on the set of dependency ratios; we choose the $C^2$-topology, which is the ‘coarsest’ topology for which the number of nondegenerate critical points defines open equivalence classes. Recall that in the $C^2$-topology an $\varepsilon$-neighbourhood $N_\varepsilon(f)$ of a function $f : M \to \mathbb{R}$ defined on a compact manifold $M$ consists of all functions $g$ such that, with respect to a fixed Riemannian metric and the induced norm $|\cdot|_x$ on the tangent spaces $T_x M$, we have

$$|f(x) - g(x)|_x,\quad |Df(x) - Dg(x)|_x,\quad |D^2 f(x) - D^2 g(x)|_x < \varepsilon.$$

If $M$ is a noncompact manifold, the constants $\varepsilon > 0$ are replaced by positive functions $\varepsilon(x) > 0$ on $M$ in the above definition; in this context the topology obtained is called the ‘strong’ $C^2$-topology (see e.g. [6]).

As argued above, specifying an equivalence notion on the space of dependency ratios of stochastic processes induces a notion of stochastic bifurcation. In the following we shall sometimes use the words ‘process’ and ‘dependency ratio’ indiscriminately; in particular, a ‘structurally stable process’ will denote a stochastic process whose dependency ratio is a structurally stable element under the equivalence relation under consideration. A first rough formulation of our definition would be the following: we propose to call two dependency ratios of stochastic processes equivalent if every non-degenerate critical point of a certain type of the first dependency ratio can be mapped to a critical point of the same type of the second dependency ratio by a transformation of $M \times M$ induced by a diffeomorphism of $M$. In the next section, we shall make this more precise.

3.4 Ratio equivalence on compact manifolds

Let $M$ be an $m$-dimensional orientable compact manifold and $M^{n+1}$ the $(n+1)$-fold Cartesian product $M \times \cdots \times M$; denote by $\pi_\ell : M^{n+1} \to M$ the projection on the $\ell$'th component

$$\pi_\ell(x_1, \cdots, x_{n+1}) = x_\ell.$$

An $n$'th order dependency ratio is a real valued function defined on $M^{n+1}$.

Recall the following definitions (see e.g. [5], subsections 10.2 and 10.4, p. 79 and p. 86 respectively). If $f : U \to \mathbb{R}$ is a twice continuously differentiable function defined on an open set $U \subset \mathbb{R}^n$, a point $x \in U$ is a critical point of $f$ if the derivative of $f$ vanishes at $x$: $Df(x) = 0$. The value $f(x)$ of $f$ at a critical point $x$ is called the critical value of $f$ at $x$. A critical point $x$ is non-degenerate if the Hessian matrix $Hf(x)$ corresponding to the second derivative $D^2 f(x)$ of $f$ at $x$ is invertible. The number of negative eigenvalues of this matrix is called the (Morse) index of the critical point. Clearly, the notions of critical point, critical value, index and non-degeneracy carry over to functions defined on manifolds.

Our definition is based on the critical points of a given dependency ratio; this makes it necessary to restrict attention to twice differentiable ratios only, and to consider the $C^2$ function topology introduced above on the space of these ratios. For in any $C^0$-neighbourhood of a given function, there are other functions to be found with a different number of critical points; and the same is true for any $C^1$-neighbourhood of a function that has itself at least one critical point.

Definition. If $M$ is a manifold, a twice differentiable dependency ratio $f : M^{n+1} \to (0, \infty)$ is called regular if all its critical points are non-degenerate, if no two critical values are equal and if no two critical points have the same image under any projection $\pi_\ell$, for $\ell \in \{1, \cdots, n+1\}$.

Since the manifolds $M$ and $M^{n+1}$ are compact, a regular dependency ratio has only finitely many critical points $\xi_1, \cdots, \xi_k$; we assume that these are ordered such that the corresponding critical values $v_i = f(\xi_i)$ are in ascending order, that is, $v_i < v_j$ if $i < j$. We associate to the critical point $\xi_i$ its index $t_i$ (see subsection 3.2). Note that $0 \le t_i \le m(n+1)$. In this way we obtain the index sequence $t(f) = (t_1, \cdots, t_k)$ of a regular dependency ratio $f$.
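The construction of the index sequence can be sketched in a few lines. The function below is a hypothetical helper, not part of the paper: it assumes that the critical points have already been located, that each is known to be non-degenerate, and that $M$ is one-dimensional, so that a point of $M^{n+1}$ is a tuple of $n+1$ numbers. It checks the two remaining regularity conditions (distinct critical values, distinct images under each projection $\pi_\ell$) and returns the indices ordered by critical value.

```python
def index_sequence(critical_points, tol=1e-12):
    """Return the index sequence, or None if a regularity check fails.

    critical_points: list of (point, value, morse_index), where point is a
    tuple of n+1 coordinates. Non-degeneracy is assumed, not verified here.
    """
    pts = sorted(critical_points, key=lambda c: c[1])   # ascending critical values
    # No two critical values may coincide.
    for a, b in zip(pts, pts[1:]):
        if abs(a[1] - b[1]) <= tol:
            return None
    # No two critical points may share an image under any projection pi_l.
    n_coords = len(pts[0][0]) if pts else 0
    for l in range(n_coords):
        coords = sorted(c[0][l] for c in pts)
        for a, b in zip(coords, coords[1:]):
            if abs(a - b) <= tol:
                return None
    return tuple(c[2] for c in pts)

# Hypothetical critical-point data for a first order ratio (n = 1).
example = [((0.1, 0.7), 2.0, 2), ((0.4, 0.2), 0.5, 0), ((0.8, 0.9), 1.1, 1)]
sequence = index_sequence(example)
```

With the data above, the points are reordered by their values 0.5, 1.1, 2.0, and `sequence` lists the corresponding indices.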

Definition. Two order $n$ processes defined on a compact manifold $M$, with dependency ratios $f, g : M^{n+1} \to (0, \infty)$, are called ratio equivalent if either both $f$ and $g$ are non-regular, or if $f$ and $g$ are both regular and

1. their index sequences are equal;

2. there is a diffeomorphism $\varphi : M \to M$, homotopic to the identity mapping on $M$, such that the map $\Phi : M^{n+1} \to M^{n+1}$ defined as

$$\Phi(x_1, \cdots, x_{n+1}) = \big(\varphi(x_1), \cdots, \varphi(x_{n+1})\big) \tag{6}$$

maps the $i$'th critical point of $f$ to the $i$'th critical point of $g$.

It follows from the first point that the number of critical points of $f$ and $g$ is equal as well.

The following two propositions tell us that this definition has good properties: we can characterise all structurally stable processes, and these form an open and dense set in the space of all stochastic processes.

Proposition 2. On a compact manifold $M$, a dependency ratio is structurally stable under ratio equivalence if and only if it is regular.

Proposition 3. On a compact manifold M, the set of regular dependency ratios is everywhere dense.

The proofs of these propositions can be found in appendix A.

3.5 Ratio equivalence for non-compact manifolds

Though the results for compact manifolds are already useful in themselves, in practice most stochastic processes are defined on the non-compact manifold $\mathbb{R}^m$. The direct generalisation of the notion of ratio equivalence is given in the following definition.

Definition. Two processes on a manifold $M$ are weakly ratio equivalent if they are ratio equivalent on each compact set that is the closure of a bounded submanifold of $M^{n+1}$.

As the following example shows, this notion is unfortunately too weak for our purposes.

Example. Consider two first order processes $X_t$ and $Y_t$ on the interval $(-1, 1)$ with invariant densities $p(x) = \tfrac12 I_{[-1,1]}(x)$ and dependency ratios

$$f_X(x_1, x_2) = 1 - \tfrac12 x_1 x_2 + \tfrac14 x_1^3, \quad \text{and} \quad f_Y(x_1, x_2) = 1 + \tfrac12 x_1 x_2 - \tfrac14 x_1^3.$$

Both ratios have a unique critical point of index 1 at the origin, and hence they are ratio equivalent on compact sets. But if we consider the values of $f_X$ and $f_Y$ along the curve $\gamma(t) = (t, t)$ as $t \downarrow -1$, we note that $f_X \circ \gamma(t)$ approaches the infimum of $f_X$ on $(-1,1) \times (-1,1)$, while $f_Y \circ \gamma(t)$ approaches the supremum of $f_Y$ on the same space. That means in particular that if $X_t$ is close to $-1$, the probability is very low that $X_{t+1}$ is close to $-1$ as well, whereas if $Y_t$ is close to $-1$, the probability is rather high that $Y_{t+1}$ is close to $-1$ as well. Weak ratio equivalence is not sufficiently fine to distinguish between these processes.
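The claims of this example can be verified numerically. The sketch below is our own illustration: it computes the Morse index at the origin from a finite-difference Hessian (using the closed-form eigenvalues of a symmetric 2×2 matrix) and compares the values at the corner $(-1,-1)$, which $\gamma(t)$ approaches, with the extreme values over a grid on the closed square. The step size, grid resolution and names are arbitrary choices.

```python
import math

def f_x(x1, x2):
    return 1.0 - 0.5 * x1 * x2 + 0.25 * x1**3

def f_y(x1, x2):
    return 1.0 + 0.5 * x1 * x2 - 0.25 * x1**3

def morse_index(f, x1, x2, h=1e-3):
    """Count negative Hessian eigenvalues at (x1, x2), using central differences."""
    f11 = (f(x1 + h, x2) - 2.0 * f(x1, x2) + f(x1 - h, x2)) / h**2
    f22 = (f(x1, x2 + h) - 2.0 * f(x1, x2) + f(x1, x2 - h)) / h**2
    f12 = (f(x1 + h, x2 + h) - f(x1 + h, x2 - h)
           - f(x1 - h, x2 + h) + f(x1 - h, x2 - h)) / (4.0 * h**2)
    mean = 0.5 * (f11 + f22)
    radius = math.hypot(0.5 * (f11 - f22), f12)
    return sum(1 for lam in (mean - radius, mean + radius) if lam < 0.0)

# Grid on the closed square [-1, 1] x [-1, 1], corners included.
grid = [k / 100.0 - 1.0 for k in range(201)]
inf_f_x = min(f_x(a, b) for a in grid for b in grid)
sup_f_y = max(f_y(a, b) for a in grid for b in grid)

index_x = morse_index(f_x, 0.0, 0.0)
index_y = morse_index(f_y, 0.0, 0.0)
corner_x = f_x(-1.0, -1.0)   # limit of f_X along gamma(t) as t decreases to -1
corner_y = f_y(-1.0, -1.0)   # limit of f_Y along gamma(t) as t decreases to -1
```

Both indices equal 1, while `corner_x` coincides with the grid minimum of $f_X$ and `corner_y` with the grid maximum of $f_Y$: the two ratios agree in their critical point data but behave oppositely near the boundary.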

For our second generalisation, we restrict ourselves to processes defined on a subclass of manifolds which we call manifolds of ‘constant type’; these are defined as follows.

Let $M$ be an orientable $m$-dimensional manifold. If there exists a family $M_t$ of bounded open submanifolds-with-boundary of $M$ depending on a real parameter $t > T$, for which $M_t \subset M_{t'}$ if $t < t'$, $\bigcup_t M_t = M$ and for which $\partial M_t$ is a smooth boundary for every $t > T$, then we call $M_t$ an exhaustion of $M$. Moreover, we say that an exhaustion is of constant type if there is a constant $T > 0$ such that $\partial M_t$ and $\partial M_{t'}$ are diffeomorphic for all $t, t' > T$. A manifold $M$ that admits an exhaustion of constant type will itself be said to be of constant type. For instance, the plane is a manifold of constant type with exhaustion $\mathbb{R}^2_t = \{x \in \mathbb{R}^2 : \|x\|^2 < t\}$, whereas the plane with all points with integer coordinates removed is not a manifold of constant type.

A convenient way to define an exhaustion ofM is to take a differentiable real valued functionJon M such that for a fixed pointx0 ∈ M and a metricd on M we have thatJ(x) → ∞ asd(x0,x)→ ∞, andsetMt = x∈ M : J(x) < t. If all values ofJ larger thanT are regular, then it follows from Morse theorythat∂Mt is diffeomorphic to∂Mt ′ for t, t ′ > T, and consequently thatMt is an exhaustion ofM of constanttype.

Consider the set $(M_t)^{n+1} = M_t \times M_t \times \cdots \times M_t \subset M^{n+1}$.

This set can be decomposed into $2^{n+1}$ component manifolds $\{C^j_t\}_{j=1}^{N}$; each component is a Cartesian product of $k$ factors $\partial M_t$ and $n+1-k$ factors $M_t$, for $k = 0, \cdots, n+1$. We order these components such that a product of more factors $M_t$ precedes a component with fewer, and for components with equal numbers of factors $M_t$, the component with more factors $M_t$ in the first $\ell$ positions precedes a component with fewer, with $\ell$ taking the values $1, 2, \cdots, n+1$ consecutively. That is,

$$\partial M_t \times M_t \times M_t \quad\text{precedes}\quad M_t \times \partial M_t \times \partial M_t,$$

and

$$\partial M_t \times M_t \times \partial M_t \quad\text{precedes}\quad \partial M_t \times \partial M_t \times M_t.$$
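The ordering rule can be made concrete by enumerating the components for a small value of $n+1$. The sketch below is one reading of the rule, under the assumption that ties are broken lexicographically in $\ell$ (first compare the number of factors $M_t$ in position 1, then in the first two positions, and so on); the function name `ordered_components` and the labels `"M"` and `"dM"` are illustrative.

```python
from itertools import product

def ordered_components(n_plus_1):
    """List the 2**(n+1) components in the order described in the text.

    Each component is a tuple of factors: 'M' stands for M_t and 'dM' for
    its boundary. The tie-breaking over l = 1, ..., n+1 is an assumed
    (lexicographic) reading of the rule.
    """
    def key(component):
        interior = [1 if factor == "M" else 0 for factor in component]
        # Number of factors M_t among the first l positions, l = 1, ..., n+1.
        prefix_counts = []
        total = 0
        for flag in interior:
            total += flag
            prefix_counts.append(total)
        # More factors M_t come first; ties are broken by larger prefix counts.
        return (-sum(interior), [-c for c in prefix_counts])

    return sorted(product(("M", "dM"), repeat=n_plus_1), key=key)

components = ordered_components(3)
for j, component in enumerate(components, start=1):
    print(j, " x ".join(component))
```

For three factors this reproduces the two precedence relations displayed above, with the full product of factors $M_t$ first and the product of boundaries last.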

In the important special case that $M$ is one-dimensional, the component $(\partial M_t)^{n+1}$ consists of a finite number of points. By definition, we consider these points as non-degenerate critical points, associating the index 0 to them by default.

In the following three definitions, $M$ is a manifold of constant type with exhaustion $M_t$ and corresponding ordered decomposition $C^j_t$ of $(M_t)^{n+1}$; by $|\cdot|_x$ we denote a norm induced from a fixed Riemannian metric as in subsection 3.3. Moreover, the restriction of $f$ to $C^j_t$ is denoted by $f^j_t$, with the index $j$ always ranging from 1 to $2^{n+1}$.

Definition. A dependency ratio $f$ on $M^{n+1}$ is well-behaved at infinity if there are constants $c_t, T > 0$ such that for every $t > T$

1. for every $j$ there is a compact set $K^j_t \subset C^j_t$ such that $|Df^j_t(x)|_x > c_t$ if $x \in C^j_t \setminus K^j_t$, unless $C^j_t$ is 0-dimensional, and

2. $f^j_t$ is weakly ratio equivalent to $f^j_{t'}$ for all $t, t' > T$.

Definition. A dependency ratio $f$ on $M^{n+1}$ is well-behaved if $f$ is well-behaved at infinity and $f^j_t$ is regular on $C^j_t$ for every $j$ and every $t > T$.


Definition. If $M$ is a manifold of constant type, two well-behaved dependency ratios $f$ and $g$ are called ratio equivalent, if there is a value of $t$ such that $f^j_t$ and $g^j_t$ are weakly ratio equivalent for every $j$.

Note that if $f$ and $g$ are weakly ratio equivalent on each component $C^j_t$ for a single value $t > T$, they are in fact equivalent for all such values, since $f^j_t \sim f^j_{t'}$ for all $t, t' > T$.

Example. With this definition, we can distinguish between the two ratios $f_X$ and $f_Y$ introduced at the end of the previous subsection. Set $a_t = t/(t+1)$, and consider the exhaustion $I_t = (-a_t, a_t)$ of $(-1,1)$. Note that $\partial(I_t \times I_t)$ can be decomposed into

$$C^1_t = (-a_t,a_t)\times\{-a_t,a_t\}, \quad C^2_t = \{-a_t,a_t\}\times(-a_t,a_t), \quad C^3_t = \{-a_t,a_t\}\times\{-a_t,a_t\}.$$

Restricted to $C^1_t$ and $C^2_t$, neither $f_X$ nor $f_Y$ has any critical points. The set $C^3_t$ consists of four isolated points, which are critical by definition. The maximum of $f_X$ restricted to $C^3_t$ is assumed in the point $(a_t, -a_t)$, whereas $f_Y$ takes its minimum there. Since the only diffeomorphism of $C^3_t$ homotopic to the identity is the identity itself, corresponding critical points of $f_X$ and $f_Y$ cannot be mapped onto each other.
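The statement about the four corner points can be verified directly. The sketch below evaluates both ratios on the corners $(\pm a_t, \pm a_t)$ for one value of $t$; the function name `corner_extremes` and the choice $t = 3$ are illustrative assumptions.

```python
def f_X(x1, x2):
    """Dependency ratio of X_t from the example."""
    return 1 - 0.5 * x1 * x2 + 0.25 * x1 ** 3

def f_Y(x1, x2):
    """Dependency ratio of Y_t from the example."""
    return 1 + 0.5 * x1 * x2 - 0.25 * x1 ** 3

def corner_extremes(t):
    """Return a_t, the corner maximising f_X and the corner minimising f_Y."""
    a = t / (t + 1)
    corners = [(s1 * a, s2 * a) for s1 in (-1, 1) for s2 in (-1, 1)]
    arg_max_X = max(corners, key=lambda c: f_X(*c))
    arg_min_Y = min(corners, key=lambda c: f_Y(*c))
    return a, arg_max_X, arg_min_Y

a, arg_max_X, arg_min_Y = corner_extremes(3.0)
print("a_t =", a)
print("f_X is largest at", arg_max_X)
print("f_Y is smallest at", arg_min_Y)
```

Both extremes are attained at the same corner $(a_t, -a_t)$, so a map that fixes every corner cannot match a maximum of one ratio with a maximum of the other.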

The following propositions describe the topological properties of ratio equivalence. The results are weaker than in the compact case, as was to be expected; we obtain that well-behaved processes are stable elements of ratio equivalence. However, restricted to the space of processes that are well-behaved at infinity, the well-behaved processes form again an open and dense set.

The next proposition is a corollary to propositions 2 and 3.

Proposition 4. On a manifold M of constant type, a well-behaved dependency ratio is stable with respect to the strong topology under ratio equivalence.

The proof of this proposition is given in appendix A.

4 Examples

4.1 Stochastic dynamics on the circle

As an illustration of a stochastic process on a compact manifold, we consider the stochastic dynamical system on the unit circle $M = S^1$ defined by

$$X_{t+1} = X_t + a\sin(X_t) + 0.25\sin^2(X_t) + 0.25 + \varepsilon_{t+1} \bmod 2\pi, \tag{7}$$

with $\varepsilon_t$ IID and $N(0,\sigma^2)$ distributed. The state variable is taken modulo $2\pi$; we represent states by points on the interval $[-\pi,\pi)$. For the above system we fix $\sigma$ at the value 0.7 and consider qualitative changes in the stochastic dynamics as $a$ varies. The term $0.25(\sin^2(x_{t-1})+1)$ is added to break the $x \mapsto -x$ symmetry of the dynamics. In the symmetric case some particular additional properties arise which will be discussed in the next subsection.

Figure 1 shows a contour plot for the dependency ratio $f$ for values of $a$ decreasing from $-0.85$ to $-0.95$. For $a = -0.85$, the contour plot shows two extrema, a maximum and a minimum, together with two saddle points. This is the minimal number of critical points of each type that can be attained for a non-degenerate function $f$ on the torus $M^2 = S^1 \times S^1$. As the bifurcation parameter $a$ decreases, the system shows a stochastic bifurcation at which $f$ develops a new local extremum, together with a new saddle point.
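For readers who want to experiment with the system (7), the following is a minimal simulation sketch using only the Python standard library. The function names, the seed handling and the initial state are illustrative assumptions and not part of the original study.

```python
import math
import random

def wrap(x):
    """Map a real number to its representative in [-pi, pi)."""
    return (x + math.pi) % (2 * math.pi) - math.pi

def simulate_circle_map(a, sigma=0.7, n=1000, x0=0.0, seed=0):
    """Iterate X_{t+1} = X_t + a sin(X_t) + 0.25 sin^2(X_t) + 0.25 + eps mod 2 pi."""
    rng = random.Random(seed)
    x = wrap(x0)
    path = []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)  # IID N(0, sigma^2) noise
        x = wrap(x + a * math.sin(x) + 0.25 * math.sin(x) ** 2 + 0.25 + eps)
        path.append(x)
    return path

path = simulate_circle_map(a=-0.85)
pairs = list(zip(path[:-1], path[1:]))  # consecutive states (x_{t-1}, x_t)
print(len(path), "states simulated;", len(pairs), "consecutive pairs")
```

The consecutive pairs are the raw material from which a joint density, and hence a dependency ratio, could be estimated.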

Figure 1: Level sets for the map $X_t = X_{t-1} + a\sin(X_{t-1}) + 0.25\sin^2(X_{t-1}) + 0.25 + \varepsilon_t$ for decreasing values of $a$ ($a = -0.85$, $-0.9$, $-0.95$; top panels, with axes $x_{t-1}$ and $x_t$). The lower panels show the invariant probability density of $X_t$.

4.2 Antisymmetric dynamics

In applications dynamical systems are often symmetric. Though we leave the theoretical development of our equivalence relation for symmetric systems for later work, we want to make some remarks about this situation. We restrict to processes that are the sum of an antisymmetric deterministic part and a unimodal symmetric noise term (e.g. Gaussian). Surprisingly, it turns out that for these systems the ‘ratio bifurcation’ coincides with a P-bifurcation.

Consider the process

$$X_{t+1} = g(X_t) + \varepsilon_t,$$

where $g(x)$ is a smooth odd function, that is, for which $g(-s) = -g(s)$. The $\varepsilon_t$ are independent and identically distributed according to a symmetric unimodal distribution. It can be readily checked that the deterministic dynamics is equivariant under the transformation $x \mapsto -x$. Because this transformation affects both $X_{t-1}$ and $X_t$, the effect on the joint variable $(X_{t-1}, X_t)$ is a point reflection in the origin.

For the conditional density of $X_{t+1}$ given $X_t = x_1$, one may write

$$p_{X_{t+1}|X_t}(x_2|x_1) = \tau(x_2|x_1) = \frac{p_{1,2}(x_1,x_2)}{p(x_1)} = \frac{1}{\sigma}\, h\!\left(\frac{x_2 - g(x_1)}{\sigma}\right),$$

where $h(\cdot)$ is the probability density function of $\varepsilon_t$; we have that $h(s) = h(-s)$ and that $h$ has a unique local maximum at $s = 0$. The map $g(\cdot)$ as well as the probability density function $h(\cdot)$ are assumed to be twice continuously differentiable.

Moreover we assume that the process has an invariant density $p(x)$, which is unique and twice continuously differentiable. These conditions on $p(x)$ can easily be met by imposing some additional requirements on $g$ and $h$. For instance, a sufficient condition is that for each $x_0$, the support of $\frac{1}{\sigma} h\!\left(\frac{x_1 - g(x_0)}{\sigma}\right)$ is $M$ (strong mixing). If the invariant density is unique, it is necessarily an even function, since otherwise its mirror image $p(-x)$ would be a different invariant density.

Our aim is to examine the properties of the dependency ratio

$$f(x_1,x_2) = \frac{p_{1,2}(x_1,x_2)}{p(x_1)p(x_2)} = \frac{1}{\sigma p(x_2)}\, h\!\left(\frac{x_2 - g(x_1)}{\sigma}\right)$$

near the origin. We see that

$$f(-x_1,-x_2) = \frac{1}{\sigma p(-x_2)}\, h\!\left(\frac{-x_2 - g(-x_1)}{\sigma}\right) = \frac{1}{\sigma p(x_2)}\, h\!\left(-\frac{x_2 - g(x_1)}{\sigma}\right) = f(x_1,x_2).$$

It follows that the partial derivatives $\frac{\partial f}{\partial x_1}$ and $\frac{\partial f}{\partial x_2}$ vanish at the point $(x_1,x_2) = (0,0)$, so that the origin is always critical. The index of this critical point is determined by the Hessian matrix $Hf(0,0)$. If $g$ is odd and $h$ is even and unimodal, we have

$$p'(0) = 0, \quad g''(0) = 0, \quad h'(0) = 0, \quad\text{and}\quad h''(0) < 0.$$

After some algebra one finds that the Hessian evaluated at the origin reduces to

$$Hf(0,0) = \frac{h''(0)}{\sigma^3 p(0)} \begin{pmatrix} g'(0)^2 & -g'(0) \\ -g'(0) & 1 - \sigma^2 \dfrac{h(0)p''(0)}{h''(0)p(0)} \end{pmatrix}.$$

Since $h''(0)$ is negative, the Hessian matrix has a negative determinant (provided $g'(0) \neq 0$) if and only if the second derivative of the invariant density $p(x)$ satisfies

$$p''(0) < 0,$$

in which case the origin is a saddle point. If however $p''(0) > 0$, the determinant is positive, while the trace of the matrix is negative, and the dependency ratio $f(x_1,x_2)$ has a local maximum at the origin.

Apparently, the critical point at the origin changes from a saddle point to a local maximum if $p''(0)$ becomes positive. Because this is exactly the condition for which the local maximum of $p(x)$ at the origin changes to a local minimum with a pair of maxima bifurcating off, it follows that for antisymmetric maps with symmetric unimodal noise, the ‘ratio bifurcation’ coincides with a phenomenological bifurcation.
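The Hessian expression and the resulting classification can be checked by finite differences. The sketch below is purely a calculus check under stated assumptions: $h$ is the standard normal density, $g(x) = \tanh(ax)$, and $p$ is a hypothetical even density $p(x) \propto (1 + cx^2)e^{-x^2/2}$, chosen only because the sign of $p''(0)$ is controlled by $c$. It is not the invariant density of any particular process.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def hessian_formula(g1, p0, p2, sigma):
    """Closed-form Hessian of f at the origin for standard Gaussian h."""
    h0 = normal_pdf(0.0)    # h(0)
    h2 = -normal_pdf(0.0)   # h''(0) for the standard normal density
    c = h2 / (sigma ** 3 * p0)
    return [[c * g1 ** 2, -c * g1],
            [-c * g1, c * (1.0 - sigma ** 2 * h0 * p2 / (h2 * p0))]]

def hessian_numeric(f, step=1e-3):
    """Central finite-difference Hessian of f at (0, 0)."""
    f0 = f(0.0, 0.0)
    f11 = (f(step, 0.0) - 2.0 * f0 + f(-step, 0.0)) / step ** 2
    f22 = (f(0.0, step) - 2.0 * f0 + f(0.0, -step)) / step ** 2
    f12 = (f(step, step) - f(step, -step)
           - f(-step, step) + f(-step, -step)) / (4.0 * step ** 2)
    return [[f11, f12], [f12, f22]]

def classify(hess):
    det = hess[0][0] * hess[1][1] - hess[0][1] ** 2
    trace = hess[0][0] + hess[1][1]
    if det < 0:
        return "saddle"
    if det > 0:
        return "local maximum" if trace < 0 else "local minimum"
    return "degenerate"

def make_ratio(a, sigma, c):
    """f(x1, x2) for g = tanh(a x) and the hypothetical even density p."""
    z = math.sqrt(2.0 * math.pi) * (1.0 + c)      # normalising constant of p
    p = lambda x: (1.0 + c * x * x) * math.exp(-0.5 * x * x) / z
    f = lambda x1, x2: normal_pdf((x2 - math.tanh(a * x1)) / sigma) / (sigma * p(x2))
    p0, p2 = 1.0 / z, (2.0 * c - 1.0) / z          # p(0) and p''(0)
    return f, p0, p2

results = {}
for c in (0.0, 2.0):   # c = 0 gives p''(0) < 0, c = 2 gives p''(0) > 0
    f, p0, p2 = make_ratio(a=1.3, sigma=0.7, c=c)
    formula = hessian_formula(1.3, p0, p2, 0.7)    # g'(0) = a
    numeric = hessian_numeric(f)
    results[c] = (classify(formula), classify(numeric), formula, numeric)
    print("c =", c, "formula:", results[c][0], "finite differences:", results[c][1])
```

With a unimodal $p$ the origin is classified as a saddle, and with a locally convex $p$ as a local maximum, in agreement with the sign discussion above.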

4.2.1 On the circle

We illustrate this by figure 2, which shows the level sets of the dependency ratio, as well as the invariant density, for the map

$$X_{t+1} = a\bigl(0.5\sin(X_t) + 0.25\sin(2X_t)\bigr) + \varepsilon_{t+1},$$

where again $M$ is the unit circle $S^1$ and where $\varepsilon_{t+1} \sim N(0,\sigma^2_{t+1})$ with $\sigma^2_{t+1} = 0.6 - 0.12\cos(X_t)$. The noise variance is made state dependent to prevent the critical point at $(\pi,0)$ from bifurcating simultaneously with the bifurcation at the origin. As noted above, the local minimum in $p(x)$ at $x = 0$ occurs exactly when $f(x_1,x_2)$ develops a local maximum at the origin. This is related to the fact that the denominator of the dependency ratio contains a product of marginals which simultaneously develop a local minimum.

Figure 2: Level sets for the map $X_{t+1} = a\bigl(0.5\sin(X_t) + 0.25\sin(2X_t)\bigr) + \varepsilon_{t+1}$ with $\varepsilon_{t+1} \sim N(0,\sigma^2_{t+1})$ and $\sigma^2_{t+1} = 0.6 - 0.12\cos(X_t)$ for increasing values of $a$ ($a = 1.3$, $1.4$, $1.5$; upper panels, with axes $x_{t-1}$ and $x_t$) and the corresponding marginal density functions (lower panels). The levels are not uniformly spaced.

4.2.2 On the real line

When we derived the coincidence of a P-bifurcation and a copula bifurcation at the origin in the antisymmetric case, apart from the global requirement of symmetry of the invariant density, only local arguments were used. Therefore, provided that we confine ourselves to cases with symmetric invariant densities, the result that a P-bifurcation coincides with a (local) ratio bifurcation directly extends to stochastic dynamics on the real line.

As an example we consider the stochastic process on $\mathbb{R}$ defined by

$$X_{t+1} = \tanh(aX_t) + \varepsilon_{t+1}. \tag{8}$$

Figure 3 shows the level sets of the dependency ratio and the corresponding invariant probability density function for this map with $N(0,\sigma^2)$ distributed noise, taking $\sigma = 0.7$.

Note that the bifurcation parameter value differs from that of the analogous deterministic system ($\sigma = 0$): for the tanh map the stochastic analogue of the usual pitchfork bifurcation at $a = 1$ is shifted to a larger value of $a$. Apparently the value of the bifurcation depends on the noise level. A natural question, therefore, is whether for increasing noise levels the bifurcation parameter merely shifts, or whether the bifurcation can disappear altogether.

Intuitively, if the map is bounded and has a small range relative to the noise level, the dynamics is mainly governed by the noise and the deterministic part has little influence on the dynamics. In fact a simple argument shows that if the noise is fixed at a sufficiently large level, and if the family of odd maps $g_a$ is uniformly bounded, then there is no phenomenological bifurcation at $x = 0$, and therefore also no ratio bifurcation at $(x_1,x_2) = (0,0)$, for symmetric processes of the form

$$X_{t+1} = g_a(X_t) + \varepsilon_{t+1}. \tag{9}$$

Figure 3: Level sets for the map $X_{t+1} = \tanh(aX_t) + \varepsilon_{t+1}$ with $\varepsilon_{t+1} \sim N(0, 0.5^2)$ for increasing values of $a$ ($a = 0.9$, $1.3$, $1.7$; top panels, with axes $x_{t-1}$ and $x_t$) and corresponding marginal density function (lower panels).

The argument runs as follows. By stationarity the invariant density $p$ should satisfy

$$p(x) = \int \frac{1}{\sigma}\, h\!\left(\frac{x - g_a(s)}{\sigma}\right) p(s)\, ds.$$

A necessary condition for $p(x)$ to have a local minimum at $x = 0$ is that $p''(0) \ge 0$, where, differentiating twice under the integral sign,

$$p''(0) = \int \frac{1}{\sigma^3}\, h''\!\left(-\frac{g_a(s)}{\sigma}\right) p(s)\, ds.$$

Since $h(s)$ is a unimodal probability density function, its second derivative $h''(s)$ is negative in a neighbourhood of $s = 0$. It follows that, for $g_a$ uniformly bounded, for large $\sigma$ the integral on the right hand side of the last equation may remain negative as $a$ varies.
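The large-noise argument can be illustrated numerically. The sketch below approximates the invariant density of the tanh map (8) by iterating the stationarity equation on a grid, and then evaluates the integral for $p''(0)$ (differentiating the kernel twice in $x$ contributes a factor $1/\sigma^3$; only the sign matters here). The Gaussian noise, the grid, the number of iterations and all function names are assumptions made for illustration. With $\sigma = 1.5$ we have $|\tanh(as)|/\sigma < 1$ for every $s$, so $h''$ is negative at every point of the integrand.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def normal_pdf_dd(u):
    """Second derivative of the standard normal density."""
    return (u * u - 1.0) * normal_pdf(u)

def invariant_density(a, sigma, n=121, iterations=60):
    """Approximate the invariant density of X_{t+1} = tanh(a X_t) + eps on a grid."""
    half_width = 1.0 + 5.0 * sigma          # |tanh| <= 1 plus a few noise std devs
    step = 2.0 * half_width / (n - 1)
    grid = [-half_width + i * step for i in range(n)]
    # kernel[i][j] approximates the transition density from grid[j] to grid[i].
    kernel = [[normal_pdf((x - math.tanh(a * s)) / sigma) / sigma for s in grid]
              for x in grid]
    p = [normal_pdf(x) for x in grid]       # symmetric starting density
    for _ in range(iterations):
        p = [sum(k * q for k, q in zip(row, p)) * step for row in kernel]
        mass = sum(p) * step
        p = [q / mass for q in p]           # renormalise on the truncated grid
    return grid, p, step

def second_derivative_at_zero(a, sigma):
    """Riemann-sum approximation of p''(0) from the stationarity equation."""
    grid, p, step = invariant_density(a, sigma)
    total = sum(normal_pdf_dd(-math.tanh(a * s) / sigma) * q for s, q in zip(grid, p))
    return total * step / sigma ** 3

large_noise = {a: second_derivative_at_zero(a, sigma=1.5) for a in (0.5, 1.0, 3.0)}
for a, value in large_noise.items():
    print("a =", a, "p''(0) approx", value)
```

For this noise level the computed value stays negative for all three slopes, so no local minimum of $p$ can form at the origin. Rerunning `second_derivative_at_zero` with a smaller $\sigma$ is a way to explore when a sign change becomes possible.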

4.3 Estimated dependency ratios from time series

In order to see whether dependency ratios can be used for classification of processes of which only a time series is available, a common situation in empirical applications, we estimate dependency ratios from simulated time series. We generate relatively short series $X_t$ from the stochastic models considered earlier in this section; we estimate from these series bivariate invariant densities and use them to reconstruct the dependency ratios. It is well known [1, 13] that fixed bandwidth nonparametric kernel density estimates become rather poor in regions with only few observations. One way to avoid this would be to use a data driven adaptive bandwidth which depends on the density locally, becoming larger as fewer observations are present locally. Instead of using an adaptive bandwidth we suggest, for real valued time series, to transform the data using the probability integral transform, that is, we construct

$$U_t = F_X(X_t) = \frac{\text{rank of } X_t \text{ among } \{X_s\}_{s=1}^{N}}{N}.$$

This amounts to transforming the invariant distribution to a uniform distribution on the unit interval, which tends to stabilise the estimation of the dependency ratio as the marginals no longer need to be estimated. In the case of first order ratios, the estimated empirical dependency ratio is equal to the empirical copula density

$$\hat f(u_1,u_2) = \frac{1}{N-1} \sum_{t=1}^{N-1} K_b(u_1 - U_t,\, u_2 - U_{t+1}).$$

Here $K_b(u_1,u_2)$ is a bivariate probability kernel, which we take to be the commonly used Gaussian kernel, normalised to integrate to one:

$$K_b(u_1,u_2) = \frac{1}{2\pi b^2}\, e^{-(u_1^2 + u_2^2)/(2b^2)}.$$

To prevent ‘probability mass’ from leaving the unit square through this smoothing procedure, we impose periodic boundary conditions if $M = S^1$ and reflecting boundary conditions if $M = \mathbb{R}$.
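The estimation procedure can be sketched in a few lines. The code below simulates the tanh map, applies the rank transform and evaluates a kernel estimate with reflecting boundaries on a grid. It is a minimal sketch, not the authors' implementation: the reflection is realised by adding mirror images of each observation at 0 and 1, the bivariate Gaussian kernel is written as a product of two one-dimensional kernels so that it integrates to one, and the series length, grid and function names are illustrative assumptions.

```python
import math
import random

def simulate_tanh_map(a, sigma, n, seed=0):
    """Iterate X_{t+1} = tanh(a X_t) + eps with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x = math.tanh(a * x) + rng.gauss(0.0, sigma)
        path.append(x)
    return path

def rank_transform(xs):
    """U_t = (rank of X_t among X_1, ..., X_N) / N, with ranks 1, ..., N."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    u = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        u[i] = rank / len(xs)
    return u

def reflected_kernel(u, centre, b):
    """One-dimensional Gaussian kernel with mirror images at 0 and 1."""
    images = (centre, -centre, 2.0 - centre)
    norm = math.sqrt(2.0 * math.pi) * b
    return sum(math.exp(-0.5 * ((u - c) / b) ** 2) for c in images) / norm

def empirical_ratio(u, grid, b=0.07):
    """Estimate f(u1, u2) on grid x grid from consecutive pairs (U_t, U_{t+1})."""
    k = [[reflected_kernel(g, c, b) for g in grid] for c in u]
    n_pairs = len(u) - 1
    return [[sum(k[t][i] * k[t + 1][j] for t in range(n_pairs)) / n_pairs
             for j in range(len(grid))]
            for i in range(len(grid))]

xs = simulate_tanh_map(a=1.7, sigma=0.5, n=300)
us = rank_transform(xs)
cells = 20
grid = [(i + 0.5) / cells for i in range(cells)]   # cell midpoints in (0, 1)
estimate = empirical_ratio(us, grid)
total_mass = sum(sum(row) for row in estimate) / cells ** 2
print("estimated mass on the unit square:", total_mass)
```

Level sets of `estimate` can then be drawn with any contouring tool. The total mass is printed as a check that the reflection keeps the estimate normalised on the unit square.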

Figure 4: First 1000 values (top panels) of 4000 consecutive $X_t$-values generated by the map $X_{t+1} = \tanh(aX_t) + \varepsilon_{t+1}$ with $\varepsilon_{t+1} \sim N(0, 0.5^2)$, for $a = 0.9$ (left) and $a = 1.7$ (right). The lower panels show the corresponding empirical level sets (axes $u_{t-1}$ and $u_t$) estimated with a Gaussian kernel (bandwidth $b = 0.07$).

Figure 4 shows level sets of the empirical dependency ratio obtained from time series of length 4000 from the symmetric hyperbolic tangent map given in equation (8) for different parameter values. The dependency ratio is estimated by smoothing the empirical copula with a bivariate normal probability density function (bandwidth $b = 0.07$). The empirical dependency ratio clearly reflects the fine structure of the theoretical dependency ratio.

Figure 5 shows an attempt at performing a similar reconstruction for the asymmetric sine map given by equation (7). In this case the topology of the reconstructed level sets does not correspond with that obtained earlier; this is due to estimation error. Probably longer time series (along with smaller bandwidths for the smoothers) are required for this case. We consider the optimal estimation and the related issue of data requirements for estimating dependency ratios as an important area for future research.

Figure 5: First 1000 values (top panels) of 4000 consecutive $X_t$-values generated by the map $X_{t+1} = X_t + a\sin(X_t) + 0.25\sin^2(X_t) + 0.25 + \varepsilon_{t+1}$ with $\varepsilon_{t+1} \sim N(0, 0.7^2)$ for $a = -0.85$ (left) and $a = -0.95$ (right). The lower panels show the level sets of the corresponding empirical dependency ratio (axes $u_{t-1}$ and $u_t$).

References

[1] I.S. Abramson, On bandwidth variation in kernel estimates - a square root law, Annals of Statistics 10 (1982), 1217–1223.

[2] L. Arnold, Random dynamical systems, Springer, Heidelberg, 1998.

[3] L. Cobb, Stochastic catastrophe models and multimodal distributions, Behavioral Science 23 (1978), 360–374.

[4] M.A.H. Dempster, I.V. Evstigneev, and K.R. Schenk-Hoppe, Exponential growth of fixed-mix strategies in stationary asset markets, Finance and Stochastics 7 (2003), 263–276.

[5] B.A. Dubrovin, A.T. Fomenko, and S.P. Novikov, Modern Geometry — Methods and Applications. Part II: The Geometry and Topology of Manifolds, Graduate Texts in Mathematics, vol. 104, Springer, New York, 1985.

[6] M.W. Hirsch, Differential Topology, Graduate Texts in Mathematics, vol. 33, Springer, New York, 1976.

[7] P.W. Holland and Y.J. Wang, Dependence function for bivariate densities, Communications in Statistics A 16 (1987), 863–876.

[8] M.C. Jones, The local dependence function, Biometrika 83 (1996), 899–904.

[9] A. Lasota and M.C. Mackey, Chaos, fractals, and noise: Stochastic aspects of dynamics, Springer, Heidelberg, 1994, 2nd edition.

[10] S. Nadarajah, K. Mitov, and S. Kotz, Local dependence functions for extreme value distributions, Journal of Applied Statistics 30 (2003), 1081–1100.

[11] John C. Oxtoby, Measure and category. A survey of the analogies between topological and measure spaces. 2nd ed., Graduate Texts in Mathematics, vol. 2, Springer, 1980.

[12] A. Ploeger, H.L.J. van der Maas, and P. Hartelman, Catastrophe Analysis of Switches in the Perception of Apparent Motion, Psychonomic Bulletin & Review 9 (2002), 26–42.

[13] G.R. Terrell and D.W. Scott, Variable kernel density estimation, Annals of Statistics 20 (1992), 1236–1265.

[14] Rene Thom, Structural stability and morphogenesis. An outline of a general theory of models, W. A. Benjamin, Reading, Massachusetts, 1975.

[15] E.C. Zeeman, Stability of dynamical systems, Nonlinearity 1 (1988), 115–155.

A Proofs of the topological properties

In this appendix, the topological properties given in section 3 are proved.

A.1 Proofs of the propositions

We repeat the statements of propositions 2 and 3 for the convenience of the reader.

Proposition 2. On a compact manifold $M$, a dependency ratio is structurally stable under ratio equivalence if and only if it is regular.

Proposition 3. On a compact manifold $M$, the set of regular dependency ratios is everywhere dense.

Proof. These propositions are direct corollaries of the following two lemmas.

Lemma 1. If $M$ is compact and if $f : M^{n+1} \to \mathbb{R}$ is a regular dependency ratio, then there is a constant $\varepsilon > 0$ such that every $g \in N_\varepsilon(f)$ is regular and equivalent to $f$.

Lemma 2. If $M$ is compact, the set of regular dependency ratios is dense in the $C^2$-topology.

From lemma 1 we infer that regular ratios are structurally stable. If however $f$ is a structurally stable ratio, there is a neighbourhood $U = N_\varepsilon(f)$ such that every $g \in U$ is equivalent to $f$. But as the regular ratios are dense, according to lemma 2, there is a regular ratio in $U$, which is then equivalent to $f$. By definition of equivalence, the ratio $f$ itself has to be regular. The propositions follow.

For non-compact manifolds of constant type, the following result is essentially a corollary of the results for compact manifolds.

Proposition 4. On a manifold $M$ of constant type, a well-behaved dependency ratio is stable with respect to the strong topology under ratio equivalence.


Proof. Let $f$ be a well-behaved dependency ratio, and let $M_t$ be an exhaustion of $M$. Let moreover $T > 0$ and $c_t > 0$ be such that for every $t, t' > T$ we have that $\partial M_t$ and $\partial M_{t'}$ are diffeomorphic, for every component $C^j_t$ of $(M_t)^{n+1}$ the restriction $f^j_t$ of $f$ to $C^j_t$ is weakly ratio equivalent to $f^j_{t'}$ and there is a compact set $K^j_t$ such that $|Df^j_t(x)|_x > c_t$ if $x \in C^j_t \setminus K^j_t$.

For every $t > T$ and every $j$, there is then a constant $\varepsilon^j_t > 0$ such that for every $g \in N_{\varepsilon^j_t}(f)$ in the $C^2$-topology on $(M_t)^{n+1}$, the restriction $g^j_t$ of $g$ to $C^j_t$ is weakly ratio equivalent to $f^j_t$. Let $\varepsilon_t = \min_j \varepsilon^j_t$, and $\varepsilon(x) = \max\{\varepsilon_t \mid x \in (M_t)^{n+1}\}$. It follows that $N_\varepsilon(f)$ is an open neighbourhood of $f$ in the strong $C^2$-topology, such that every $g \in N_\varepsilon(f)$ is ratio equivalent to $f$.

A.2 Proofs of the lemmas

It remains to demonstrate the lemmas. For this, we first need the following technical result. In the statement of the lemma, a ball of radius $r$ around 0 is denoted by $B_r$, that is, $B_r = \{x \in \mathbb{R}^k \mid \|x\| < r\}$; also we have

$$\|f - g\|_{C^2(U)} = \max_{0 \le j \le 2}\, \max_{x \in U}\, |D^j f(x) - D^j g(x)|.$$

Lemma 3. Let $U \subset \mathbb{R}^k$ be a bounded open set, and let $f : U \to \mathbb{R}$ be a $C^2$ function with $Df(0) = 0$ and $Hf(0)$ non-degenerate. Then there exist constants $\delta_0, \eta_0 > 0$ such that $B_{\delta_0} \subset U$ and that for every $0 < \delta \le \delta_0$ and $0 < \eta \le \eta_0$ there is an $\varepsilon > 0$, such that every function $g$ satisfying $\|f - g\|_{C^2(U)} < \varepsilon$ has a unique non-degenerate critical point $y \in B_\delta$ with $|g(y) - f(0)| < \eta$, with $g$ having the same index at $y$ as $f$ at 0.

Proof. For a matrix $A$, let $\|A\| = \max_{\|x\|=1} \|Ax\|$ denote the matrix norm of $A$. Since $Hf(0)$ is non-degenerate, there is a constant $c > 0$ such that $\|Hf(0)^{-1}\| = c$. Moreover, by continuity there is then a $\delta_1 > 0$, such that

$$\|Hf(x)^{-1}\| < 2c \quad\text{for all } x \in B_{\delta_1}.$$

Introduce $\psi = g - f$ and $h_t = f + t(g - f) = f + t\psi$. Then $h_0 = f$ and $h_1 = g$. We shall solve the equation

$$Dh_t(x) = 0 \tag{10}$$

for $t \in [0,1]$, using the implicit function theorem. Note that

$$Hh_t(x) = Hf(x) + tH\psi(x) = Hf(x)\bigl(I + tHf(x)^{-1}H\psi(x)\bigr),$$

and consequently that

$$\|Hh_t(x)^{-1}\| \le \frac{\|Hf(x)^{-1}\|}{1 - t\|Hf(x)^{-1}\|\,\|H\psi(x)\|}.$$

Taking $|t| \le 1$, $x \in B_{\delta_1}$ and $\|\psi\|_{C^2} < (4c)^{-1}$, we obtain

$$\|Hh_t(x)^{-1}\| \le 4c.$$

In particular, we can apply the implicit function theorem to solve $x = x(t)$ from (10), first around $t = 0$, and then around every value of $t$ for which $x(t) \in B_{\delta_1}$. Note that since $Hh_t$ is non-degenerate everywhere, the index of the critical point cannot change.
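The bound on the inverse of the perturbed Hessian is the standard Neumann-series estimate. As a sanity check, the sketch below verifies it for one pair of symmetric $2 \times 2$ matrices, using the spectral norm; the matrices `hf` and `hpsi` are made-up stand-ins for $Hf(x)$ and $H\psi(x)$, and the helper names are illustrative.

```python
import math

def spectral_norm(m):
    """Largest singular value of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    tr = a * a + b * b + c * c + d * d      # trace of m^T m
    det = (a * d - b * c) ** 2              # determinant of m^T m
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4.0 * det, 0.0))) / 2.0)

def inverse(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def check_bound(hf, hpsi, t):
    """Compare ||(Hf + t Hpsi)^-1|| with ||Hf^-1|| / (1 - t ||Hf^-1|| ||Hpsi||)."""
    c_f = spectral_norm(inverse(hf))
    e = spectral_norm(hpsi)
    assert t * c_f * e < 1.0, "the estimate requires t ||Hf^-1|| ||Hpsi|| < 1"
    ht = [[hf[i][j] + t * hpsi[i][j] for j in range(2)] for i in range(2)]
    lhs = spectral_norm(inverse(ht))
    rhs = c_f / (1.0 - t * c_f * e)
    return lhs, rhs

hf = [[1.0, 0.5], [0.5, -2.0]]        # hypothetical non-degenerate Hessian (a saddle)
hpsi = [[0.1, -0.05], [-0.05, 0.08]]  # hypothetical small perturbation
checks = [check_bound(hf, hpsi, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)]
for lhs, rhs in checks:
    print("norm of inverse:", lhs, "bound:", rhs)
```

In every case the computed norm stays below the bound, which is what allows the implicit function theorem to be applied uniformly in $t$.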

Furthermore

$$Dh_t(x) = Df(0) + Df(x) - Df(0) + tD\psi(x) = \int_0^1 Hf(sx)\,x\,ds + tD\psi(x) = Hf(0)\,x + \int_0^1 \bigl(Hf(sx) - Hf(0)\bigr)x\,ds + tD\psi(x).$$

By the continuity of $Hf$, there exists $0 < \delta_2 < \delta_1$ such that $\|Hf(x) - Hf(0)\| < 1/(2c)$ for all $x \in B_{\delta_2}$. Recalling the estimate $\|Ax\| \ge \|A^{-1}\|^{-1}\|x\|$, it follows for $0 < \delta < \delta_2$ that if $|t| \le 1$, $\|x\| = \delta$ and $\|\psi\|_{C^2} < \delta/(2c)$, then

$$\|Dh_t(x)\| \ge \|Hf(0)^{-1}\|^{-1}\|x\| - \int_0^1 \|Hf(sx) - Hf(0)\|\,\|x\|\,ds - t\|D\psi(x)\| > \frac{\delta}{c} - \frac{\delta}{2c} - \frac{\delta}{2c} = 0.$$

We have obtained the a priori statement that $x(t) \notin \partial B_\delta$ for all values $t \in [0,1]$ for which $x(t)$ is defined. Therefore $x(t)$ can be continued to $t = 1$, yielding $y = x(1)$.

To show that $x(t)$ is the unique solution of (10) in $B_\delta$, take $y \in B_\delta$ such that $y \ne x(t)$, and compute (putting $y_s = x(t) + s(y - x(t))$):

$$Dh_t(y) = Dh_t(y) - Dh_t(x(t)) = \int_0^1 Hh_t(y_s)(y - x(t))\,ds = Hf(0)(y - x(t)) + \int_0^1 \bigl(Hf(y_s) - Hf(0)\bigr)(y - x(t))\,ds + t\int_0^1 H\psi(y_s)(y - x(t))\,ds.$$

It follows, as in the previous paragraph, that $\|Dh_t(y)\| > 0$ for all $t$. But then $y$ cannot be a critical point.

Finally, if $v(t) = h_t(x(t))$ denotes the critical value of $h_t$, we see by differentiating that

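$$\frac{dv}{dt} = \frac{\partial h_t}{\partial t}(x(t)) + Dh_t(x(t))\,\frac{dx}{dt} = \psi(x(t));$$

consequently

$$g(y) - f(0) = v(1) - v(0) = \int_0^1 \psi(x(t))\,dt,$$

and $|g(y) - f(0)| \le \max_{B_\delta} |\psi(x)|$.

Putting $\delta_0 = \delta_2$, $\eta_0 = 1$ and $\varepsilon = \min\{\tfrac{1}{4c}, \tfrac{\delta}{2c}, \eta\}$ yields the statement of the lemma.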

Lemma 1. If $M$ is compact and if $f : M^{n+1} \to \mathbb{R}$ is a regular dependency ratio, then there is a constant $\varepsilon > 0$ such that every $g \in N_\varepsilon(f)$ is regular and equivalent to $f$.

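Proof. Let $\mathbf{x} = (x_1, \cdots, x_{n+1})$ and $\mathbf{y} = (y_1, \cdots, y_{n+1})$ denote points in $M^{n+1}$. Note that then $\pi_\ell(\mathbf{x}) = x_\ell$ etc. For a metric $d$ on $M$, define

$$d_{n+1}(\mathbf{x},\mathbf{y}) = \max_{1 \le \ell \le n+1} d\bigl(\pi_\ell(\mathbf{x}), \pi_\ell(\mathbf{y})\bigr).$$

Then $d_{n+1}$ is a metric on $M^{n+1}$.

Let $\mathbf{x}_1, \cdots, \mathbf{x}_k$ be the critical points of $f$, ordered such that $v_i = f(\mathbf{x}_i) < f(\mathbf{x}_j) = v_j$ if $i < j$. Put $v_0 = 0$; then $v_0 < v_1$. Introduce

$$\zeta = \min_{0 \le i < j \le k} |v_i - v_j|, \qquad \sigma = \min_{1 \le \ell \le n+1}\, \min_{1 \le i < j \le k} d\bigl(\pi_\ell(\mathbf{x}_i), \pi_\ell(\mathbf{x}_j)\bigr);$$

then $\zeta$ is the smallest absolute difference of two critical values, and $\sigma$ is the smallest distance of two projections of critical points on $M$.

For every $i$, choose a neighbourhood $W_i$ of $\mathbf{x}_i$ and a coordinate chart $x_i : W_i \to \mathbb{R}^{m(n+1)}$, such that $x_i(\mathbf{x}_i) = 0$, and set $f_i = f \circ x_i^{-1}$. By assumption $Df_i(0)$ vanishes and $Hf_i(0)$ is nondegenerate. For every $i$, take $0 < \delta_i < \sigma$ such that $B_{\delta_i} \subset x_i(W_i)$ and such that 0 is the only critical point of $f_i$ in $B_{\delta_i}$.

By lemma 3, we can find $\varepsilon_i > 0$, such that every function $g_i$ defined on $x_i(W_i)$ with $\|f_i - g_i\|_{C^2} < \varepsilon_i$ has a unique nondegenerate critical point $y_i$ in $B_{\delta_i}$, with $|f_i(0) - g_i(y_i)| < \zeta/2$ and with $y_i$ having the same index as 0.

Introduce the open sets $U_i = x_i^{-1}(B_{\delta_i}) \subset M^{n+1}$, and let $C = M^{n+1} \setminus \bigcup_i U_i$; note that $C$ is compact, and that $df \ne 0$ on $C$. Therefore, there is $\zeta_0 > 0$, such that if $g \in N_{\zeta_0}(f)$, then $dg \ne 0$ on $C$ as well. Moreover, for every $i$ there is $\zeta_i$ such that if $g \in N_{\zeta_i}(f)$, then in the chart $x_i$ we have that $\|f_i - g_i\|_{C^2} < \varepsilon_i$. Set $\varepsilon = \min_{0 \le i \le k} \zeta_i$.

Finally, we have to provide a diffeomorphism $\varphi : M \to M$, homotopic to the identity, such that

$$\Phi(\mathbf{x}) = \bigl(\varphi \circ \pi_1(\mathbf{x}), \cdots, \varphi \circ \pi_{n+1}(\mathbf{x})\bigr)$$

maps $\mathbf{y}_i = x_i^{-1}(y_i)$ to $\mathbf{x}_i$.

Note that by the choice of $\delta_i$, no two projections of the sets $U_i$ on $M$ intersect:

$$\pi_{\ell_1}(U_{i_1}) \cap \pi_{\ell_2}(U_{i_2}) = \emptyset, \quad\text{for all } 1 \le i_1 < i_2 \le k,\ 1 \le \ell_1 < \ell_2 \le n+1.$$

Fix $i$ and $\ell$, and consider on $\pi_\ell(U_i)$ a differentiable curve $\gamma(t)$, defined for $0 \le t \le 1$, such that $\gamma(0) = \pi_\ell(\mathbf{x}_i)$ and $\gamma(1) = \pi_\ell(\mathbf{y}_i)$. Construct a vector field $X_{i\ell}$ on $M$ such that $\dot\gamma(t) = X_{i\ell}(\gamma(t))$ for $0 \le t \le 1$ and $X_{i\ell} = 0$ on $M \setminus \pi_\ell(U_i)$.

Let $X = \sum_{i,\ell} X_{i\ell}$. The time-1 map $\varphi = e^X$ has the required properties.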

Lemma 2. If $M$ is compact, the set of regular dependency ratios is dense in the $C^2$-topology.

Proof. Recall that the joint densities of an $n$-th order stochastic process propagate via the Perron-Frobenius operator (see e.g. [9]), giving the equation

$$p_{t-n+1,\cdots,t}(x_{t-n+1}, \cdots, x_t) = \int_M \tau(x_t | x_{t-n}, \cdots, x_{t-1})\, p_{t-n,\cdots,t-1}(x_{t-n}, \cdots, x_{t-1})\, dx_{t-n}.$$

If the process has a unique invariant density $p(x_1, \cdots, x_n)$, the process with the transition probability density

$$\tilde\tau(x_t | x_{t-n}, \cdots, x_{t-1}) = \tau(x_t | x_{t-n}, \cdots, x_{t-1}) + \frac{q(x_{t-n}, x_{t-n+1}, \cdots, x_t)}{p(x_{t-n}, \cdots, x_{t-1})}$$

has the same invariant density $p$, if $q$ is small enough, such that $\tilde\tau$ is indeed a probability density, and if $\int_M q\, dx_{t-(n+1)+j} = 0$ for every $1 \le j \le n+1$.

For every point $\xi \in M$, we can find a chart $x = (x^1, \cdots, x^m)$ on $M$, such that $x(\xi) = 0$. Take $\delta > 0$ such that $U = x^{-1}(B_\delta)$ and $V = x^{-1}(B_{2\delta})$ are in the domain of $x$. Let $\varphi, \psi : M \to \mathbb{R}$ be smooth functions such that $\varphi = 1$ on $U$, $\varphi = 0$ on $M \setminus V$, and $\psi = 0$ on $U \cup M \setminus V$ and $\int_M \psi\, dx > 0$.

For $1 \le j \le m$, let $\ell_j : M \to \mathbb{R}$ be defined by setting

$$\ell_j(x) = \begin{cases} x^j\varphi + \beta_j\psi & \text{on } V, \\ 0 & \text{on } M \setminus V. \end{cases}$$

Moreover, set $\ell_0(x) = \varphi + \beta_0\psi$. The constants $\beta_j$ are chosen such that $\int_M \ell_j\, dx = 0$ for all $j$.

For $\mathbf{x} \in M^{n+1}$, let $x = (x_1, \cdots, x_{n+1})$ be a chart such that $x(\mathbf{x}) = 0$. For $1 \le i \le n+1$, $1 \le j \le m$, and $p : M^{n+1} \to \mathbb{R}$ a function that is everywhere positive (this will be the invariant probability density $p_{n+1}(x_{n+1})$ later on), define

$$L^{\mathbf{x}}_{kj}(x) = \frac{\ell_{kj}(x_k)}{\ell_{k0}(x_k)} \cdot \frac{\prod_{i=1}^{n+1} \ell_{i0}(x_i)}{p(x)} = \frac{\ell_{10}(x_1) \cdot \ldots \cdot \ell_{kj}(x_k) \cdot \ldots \cdot \ell_{n+1,0}(x_{n+1})}{p(x)}.$$

Writing $x_k = (x_k^1, \cdots, x_k^m)$, we find that

$$\frac{\partial L^{\mathbf{x}}_{kj}}{\partial x^{j'}_{k'}}(0) = \begin{cases} \dfrac{1}{p(0)} & \text{if } k' = k,\ j' = j, \\ 0 & \text{otherwise.} \end{cases}$$

It follows that for $\delta > 0$ sufficiently small and $x(\mathbf{y}) \in B_\delta \times \cdots \times B_\delta$, the differentials of the functions $L^{\mathbf{x}}_{kj} \circ x$ are linearly independent vectors in $T^*_{\mathbf{y}} M^{n+1}$.

Choose for every $\mathbf{x} \in M^{n+1}$ such a value for $\delta$, and set $U_{\mathbf{x}} = x^{-1}(B_\delta \times \cdots \times B_\delta)$. Since $M^{n+1}$ is compact, it is covered by a finite number of the $U_{\mathbf{x}}$, say $U_{\mathbf{x}_1}, \cdots, U_{\mathbf{x}_K}$. Set

$$q^k_{ij} = p\, L^{\mathbf{x}_k}_{ij}.$$

Then $\{q^k_{ij}/p\}$ is a finite collection of functions on $M^{n+1}$ such that their differentials span $T^*_{\mathbf{x}} M^{n+1}$ at every point $\mathbf{x} \in M^{n+1}$. Moreover

$$\int_M q^k_{ij}\, \pi^*_\ell\, dx_\ell = 0$$

for all $\ell$.

Recall the remark made at the beginning of the proof; let the stochastic process defined by the transition probability $\tau(x_{n+1} | x_1, \cdots, x_n)$ have invariant probability densities $p_{1,\cdots,k}(x_1, \cdots, x_k)$ and dependency ratio

$$f(x_1, \cdots, x_{n+1}) = \frac{p_{1,\cdots,n+1}(x_1, \cdots, x_{n+1})}{p_{1,\cdots,n}(x_1, \cdots, x_n)\, p_{n+1}(x_{n+1})} = \frac{\tau(x_{n+1} | x_1, \cdots, x_n)}{p_{n+1}(x_{n+1})}.$$

Let moreover $a = (a^k_{ij})$ be such that

$$\tau(x_{n+1} | x_1, \cdots, x_n) + \sum_{ijk} a^k_{ij}\, \frac{q^k_{ij}(x_1, \cdots, x_{n+1})}{p_{1,\cdots,n}(x_1, \cdots, x_n)}$$

defines a parameterised joint probability density: this is always the case if the $|a^k_{ij}|$ are sufficiently small, since the transition probability density is assumed to be positive everywhere on the compact manifold $M^{n+1}$. Then the dependency ratio of the new process is given by

$$g(a, x) = f(x) + \sum_{ijk} a^k_{ij}\, \frac{q^k_{ij}(x)}{p_{n+1}(x_{n+1})},$$

where $a = (a^k_{ij}) \in A \subset \mathbb{R}^{Km(n+1)}$, and where $A$ is an open neighbourhood of 0.

Recall the definition of transversality (see e.g. [5], definition 10.3.1, p. 83): if $X$ and $Y$ are smooth manifolds, $W$ a smooth submanifold of $Y$, the map $f : X \to Y$ smooth, and $x \in X$, then $f$ intersects $W$ transversally at $x$, if either $f(x) \notin W$ or $f(x) \in W$ and $T_{f(x)}Y = T_{f(x)}W + df(x)(T_xX)$. More generally, we say that $f$ intersects $W$ transversally at $A \subset X$, if $f$ intersects $W$ transversally at $x$ for every $x \in A$.

We have the theorem that if $A$, $X$ and $Y$ are smooth manifolds, $W$ a smooth submanifold of $Y$ and $f : A \times X \to Y$ a smooth map which intersects $W$ transversally, then the set of points $a \in A$ for which $f_a = f(a, \cdot) : X \to Y$ intersects $W$ transversally is everywhere dense in $A$ (see [5], theorem 10.3.3, p. 85).

The derivative $df$ of a function $f : M \to \mathbb{R}$ on a manifold $M$ defines a section $s$ of the cotangent bundle $T^*M$; in a sufficiently small neighbourhood $U$ of a point in $M$, the restriction $T^*_U M$ of the bundle to $U$ is isomorphic to $U \times \mathbb{R}^m$, and the section takes the form $s(x) = (x, Df(x))$. The zero section $M_0$ of $T^*M$, which is isomorphic to $M$, is locally of the form $U \times \{0\}$.

The section $s$ is transversal to $M_0$ at a point $x \in M$, if either $s(x) \notin M_0$, or if

$$T_{(x,0)}T^*M = ds(x)\,T_xM + T_{(x,0)}M_0 = (I, Hf(x))\,\mathbb{R}^m + \mathbb{R}^m \times \{0\}.$$

Note that $s$ is transversal to $M_0$ everywhere if and only if the function $f$ has only nondegenerate critical points. Such a function is called a regular function or a Morse function.

Consider now the function $g : A \times M \to \mathbb{R}$ and the associated map $s : A \times M \to T^*M$ given by $s(a,x) = (x, d_x g(a,x))$. Note that $s$ is transversal to $M_0$, since in local coordinates

$$ds(a,x)\,T_{(a,x)}(A \times M) + T_{(x,0)}M_0 = \begin{pmatrix} 0 & I \\ d\dfrac{q^k_{ij}}{p_{n+1}} & H_x g(a,x) \end{pmatrix} \bigl(\mathbb{R}^{Km(n+1)} \times \mathbb{R}^m\bigr) + \mathbb{R}^m \times \{0\},$$

and since by construction the $d(q^k_{ij}/p_{n+1})$ span $\mathbb{R}^m$ everywhere on $M$. By the theorem mentioned above, the set of $a \in A$ for which $g_a = g(a, \cdot)$ is a regular function is everywhere dense in $A$.

For every $\varepsilon > 0$, we can choose $a$ so small that $g = g_a$ is a regular function and $g \in N_\varepsilon(f)$, where $N_\varepsilon(f)$ is a neighbourhood in the $C^\infty$ topology. It remains to show that by a second arbitrarily small perturbation, we can achieve regularity of the dependency ratio.

Note that since $g$ is a regular function, its critical points are isolated. Denote them by $\mathbf{x}_1, \cdots, \mathbf{x}_N$. Assume that the points $\mathbf{x}_1$ up to $\mathbf{x}_{k-1}$ have different critical values, and that they are such that $\pi_\ell(\mathbf{x}_i) \ne \pi_\ell(\mathbf{x}_j)$ if $1 \le i < j \le k-1$.

We choose a neighbourhood $U \subset M^{n+1}$ of $\mathbf{x}_k$ such that $U$ is contained in the domain of a chart $x$ for which $x(\mathbf{x}_k) = 0$, and such that $\mathbf{x}_k$ is the only critical point of $g$ in $U$. Let $a \in \mathbb{R}^{m(n+1)}$ be such that $\langle a, Hg(0)^{-1}a \rangle \ne 0$, where $\langle x, y \rangle$ denotes the inner product of the vectors $x$ and $y$; the inverse of $Hg(0)$ exists since $g$ is nondegenerate in 0; and the set of vectors $a$ that do not satisfy the condition forms a union of a smooth manifold of codimension 1 with the point 0.

Consider the function

$$h_t(x) = h(t,x) = g(x) - t\sum_{ij} a_{ij} L^{\mathbf{x}_k}_{ij}.$$

The critical points of $h_t$ are determined by the equation

$$0 = D_x h_t(x).$$

This equation can be solved using the implicit function theorem around $x = 0$ and $t = 0$ since $Hg(0)$ is invertible. For the solution $x = x(t)$, we find

$$\frac{dx}{dt}(0) = \frac{1}{p}\, Hg(0)^{-1} a. \tag{11}$$

Note that by the assumption on $a$, this derivative is nonzero. We restrict the possible choice of $a$ further by requiring that

$$\pi_{\ell*}\frac{dx}{dt}(0) = \pi_{\ell*}\frac{1}{p}\, Hg(0)^{-1} a \ne 0.$$

Moreover, if $v(t) = h_t(x(t))$, then

$$\frac{dv}{dt}(t) = -\sum_{ij} a_{ij} L^{\mathbf{x}_k}_{ij} + D_x h_t(x)\,\frac{dx}{dt} = -\sum_{ij} a_{ij} L^{\mathbf{x}_k}_{ij},$$

and

$$\frac{d^2v}{dt^2}(0) = -\frac{1}{p}\,\langle a, Hg(0)^{-1} a \rangle \ne 0. \tag{12}$$

Because of our choices, there are only finitely many values of $t$ for which $v(t)$ is equal to one of the critical values $g(\mathbf{x}_1), \cdots, g(\mathbf{x}_{k-1})$, or for which the projections $\pi_\ell(\mathbf{x}_k)$ and $\pi_\ell(\mathbf{x}_i)$ coincide for some $1 \le i < k$ and $1 \le \ell \le n+1$. From equations (11) and (12) it follows that the set of values of $t$ avoiding these special values is everywhere dense in a neighbourhood of $t = 0$. This finishes the proof of the lemma.
