
International Statistical Review, (1986), 54, 1, pp. 83-96. Printed in Great Britain

© International Statistical Institute

The Role of Differential Geometry in Statistical Theory

O.E. Barndorff-Nielsen,1 D.R. Cox2 and N. Reid3

1Department of Theoretical Statistics, Aarhus University, Aarhus, Denmark. 2Department of Mathematics, Imperial College, London, UK. 3Department of Statistics, University of British Columbia, Vancouver, B.C., Canada

Summary

There has been increasing emphasis recently on the use of differential geometry in statistical theory, especially in asymptotic theory. In this paper a brief relatively nontechnical account is given of some relevant ideas in differential geometry. Some of the early work applying differential geometry in statistics is then sketched. Recent developments are outlined and finally directions of current and possible future work are indicated.

Key words: Affine connection; Asymptotic theory; Curvature; Curved exponential family; Distance between distributions; Expected and observed information; Nonlinear regression; Riemannian space; Tensors.

1 Introduction

Differential geometry is a traditional yet currently very active branch of pure mathematics with applications notably in a number of areas of physics. Until recently applications in the theory of statistics were fairly limited, but within the last few years there has been intensive interest in the subject.

This paper is in four main parts. First a very brief outline is given of some ideas in differential geometry, followed by a sketch of some of the early statistical applications. Then a summary is given of more recent work applying differential geometry to the asymptotic theory of statistical inference. Finally, current work and possible directions for the future are discussed.

For a systematic account of many of the topics discussed here, see the recent monograph of Amari (1985).

2 Some introductory remarks on differential geometry

Differential geometry began as the study of the local properties of curves and surfaces in three-dimensional space. Such notions as the curvature and torsion (twistedness) of curves and the curvature of surfaces are fundamental. In the study of curves drawn on a surface, a key notion is that of a geodesic, a curve of minimum length joining two points. On the surface of the unit sphere, a geodesic is part of a great circle.

When these notions are extended to higher dimensions and more abstract settings, we consider a coordinate system that specifies a point P on an m-dimensional 'surface' by $(\omega^1, \ldots, \omega^m)$; note that the indices are identifiers, not powers. The squared 'distance' between two neighbouring points P and P + dP, with coordinates

$(\omega^1, \ldots, \omega^m)$ and $(\omega^1 + d\omega^1, \ldots, \omega^m + d\omega^m)$,

is assumed given by the quadratic form

$$g_{rs}\, d\omega^r\, d\omega^s, \qquad (2.1)$$


where:

(i) the $g_{rs}$, with $g_{rs} = g_{sr}$, are functions of position forming the so-called metric tensor;

(ii) the convention is adopted that if the same letter occurs in lower and upper positions, it is summed from $1, \ldots, m$, so that in full (2.1) is

$$\sum_{r,s=1}^{m} g_{rs}\, d\omega^r\, d\omega^s;$$

(iii) if the quadratic form (2.1) is positive-definite the surface is said to form a Riemannian space or Riemannian manifold.

For example, if the surface is the unit sphere, in Cartesian coordinates an arbitrary point has coordinates of the form $(\sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta)$ and the squared Euclidean distance between two adjacent points is

$$(d\theta)^2 + \sin^2\theta\, (d\phi)^2. \qquad (2.2)$$

Thus if $\omega^1 = \theta$, $\omega^2 = \phi$, expression (2.1) applies with $g_{11} = 1$, $g_{12} = 0$, $g_{22} = \sin^2\omega^1$.

In (2.2) we have considered the surface as embedded in a higher dimensional Euclidean space; the unit sphere is embedded in three-dimensional space. In general, however, (2.1) is regarded as defining a Riemannian manifold in its own right. Thus, in particular, curvature can be discovered and measured by exploration on the surface of the sphere without going into three dimensions.
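As a small numerical illustration (our own example, not part of the paper), the following Python sketch checks that, for a small coordinate step on the unit sphere, the quadratic form (2.1) with the metric just given agrees with the squared Euclidean distance between the two embedded points in three-dimensional space.

```python
# Illustrative sketch: the spherical metric of (2.2) versus the embedded
# Euclidean distance.  The point and the displacement are arbitrary choices.
import numpy as np

def embed(theta, phi):
    """Cartesian coordinates of the point (theta, phi) on the unit sphere."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

theta, phi = 0.7, 1.2           # an arbitrary point
dtheta, dphi = 1e-4, -2e-4      # a small coordinate displacement

# Squared 'distance' from the metric tensor, g_rs dw^r dw^s as in (2.1).
ds2_metric = dtheta**2 + np.sin(theta)**2 * dphi**2

# Squared Euclidean distance between the two embedded points.
ds2_embedded = np.sum((embed(theta + dtheta, phi + dphi) - embed(theta, phi))**2)

print(ds2_metric, ds2_embedded)   # agree up to higher-order terms in the step
```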

Genuinely geometric ideas should be independent of a coordinate system. Therefore it is important to examine behaviour under transformation of coordinates and the construction of invariants. Suppose that $(\bar\omega^1, \ldots, \bar\omega^m)$ are coordinates of P in a new coordinate system, that is, $\bar\omega^r$ is a function of $\omega^1, \ldots, \omega^m$. Then

$$d\bar\omega^r = \frac{\partial\bar\omega^r}{\partial\omega^s}\, d\omega^s, \qquad (2.3)$$

and, because (2.1) has an invariant meaning, the metric tensor $\bar g_{rs}$ in the new coordinate system satisfies

$$\bar g_{rs}\, d\bar\omega^r\, d\bar\omega^s = g_{rs}\, d\omega^r\, d\omega^s,$$

so that

$$\bar g_{rs} = \frac{\partial\omega^u}{\partial\bar\omega^r}\, \frac{\partial\omega^v}{\partial\bar\omega^s}\, g_{uv}. \qquad (2.4)$$

An important concept illustrated by (2.3) and (2.4) is the law of transformation for contravariant (upper index) and covariant (lower index) tensors. There is a very fully explored calculus for such quantities, a key role being played by the metric tensor $g_{rs}$ and its matrix inverse $g^{rs}$. For example, a covariant tensor $t_{rs}$ of rank two is transformed under change of coordinates $\omega \to \bar\omega$ according to the analogue of (2.4),

$$\bar t_{rs} = \frac{\partial\omega^u}{\partial\bar\omega^r}\, \frac{\partial\omega^v}{\partial\bar\omega^s}\, t_{uv},$$

and the contravariant version $t^{rs}$, which satisfies

$$t^{rs} = g^{ru} g^{sv} t_{uv}, \qquad t_{rs} = g_{ru} g_{sv} t^{uv},$$

obeys

$$\bar t^{rs} = \frac{\partial\bar\omega^r}{\partial\omega^u}\, \frac{\partial\bar\omega^s}{\partial\omega^v}\, t^{uv}.$$
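To make the transformation laws concrete, here is a short Python check (our own example): the metric components are transformed by the Jacobian as in (2.4), and the quadratic form (2.1) then takes the same value in both coordinate systems. The particular change of coordinates is an arbitrary choice.

```python
# Illustrative sketch: covariant transformation of the metric, eq. (2.4).
# Old coordinates w = (theta, phi) on the sphere; new coordinates
# wbar = (u, v) defined through theta = u + v, phi = u - v.
import numpy as np

theta = 0.7
g = np.diag([1.0, np.sin(theta)**2])     # g_rs in the (theta, phi) system

# Jacobian J[u_idx, r] = d w^{u_idx} / d wbar^r for theta = u + v, phi = u - v.
J = np.array([[1.0, 1.0],
              [1.0, -1.0]])

gbar = J.T @ g @ J                       # eq. (2.4) written as a matrix product

dwbar = np.array([1e-3, 2e-3])           # a displacement in the new coordinates
dw = J @ dwbar                           # the same displacement in the old ones

print(dw @ g @ dw, dwbar @ gbar @ dwbar) # the two quadratic forms coincide
```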


In applications Riemannian geometry is closely linked to the investigation of quantities obeying tensor laws of transformation.

One further key idea can be introduced via any one of the equivalent notions covariant differentiation, absolute differentiation and parallel transport of vectors. Technically the key concept is that of a connection. Consider initially a Riemannian space on which is defined a curve C along which $\omega^r = \omega^r(\lambda)$ $(r = 1, \ldots, m)$, where $\lambda$ is real. Suppose that at each point P in the m-dimensional manifold there is given an m-dimensional vector $v = v(P)$, with covariant components $v_r$. The set of all such vectors at point P can be regarded as forming a space, called the tangent space to the manifold at P. Now let P and P + dP be two adjacent points on C corresponding to $\lambda$ and $\lambda + d\lambda$. To introduce a geometrically meaningful notion of differentiation of v, we have to consider

{difference of v(P + dP) and v(P)}/$d\lambda$,

that is, to compare vectors in two different tangent spaces. We must thus link the frame of reference at P + dP with that at P, and in a general discussion there is no unique way of doing this. Without such a link the difference between v(P + dP) and v(P) is indeed not defined. A linkage may be achieved by introducing a set of quantities $\Gamma^t_{rs}$ at each point, called coefficients of connection. We define, in component form, the absolute derivative of v with respect to $\lambda$ along the curve C:

$$\frac{Dv_r}{D\lambda} = \frac{\partial v_r}{\partial\omega^s}\, \frac{d\omega^s}{d\lambda} - \Gamma^t_{rs}\, v_t\, \frac{d\omega^s}{d\lambda}. \qquad (2.5)$$

If we take the curve C to correspond to a coordinate line, say that of the coordinate $\omega^s$, we obtain as a special case of (2.5)

$$v_{r;s} = \frac{\partial v_r}{\partial\omega^s} - \Gamma^t_{rs}\, v_t. \qquad (2.6)$$

This is called the covariant derivative of $v_r$ with respect to $\omega^s$. Provided that the $\Gamma^t_{rs}$ change under reparameterization in a particular way, different from but closely related to the tensor laws, (2.5) and (2.6) have an invariant meaning.

Note that the definition

$$\frac{dv_r}{d\lambda} = \frac{\partial v_r}{\partial\omega^s}\, \frac{d\omega^s}{d\lambda} \qquad (2.7)$$

is unsatisfactory because (2.7) is not a tensor. For example, the vanishing of (2.7) does not have an invariant geometrical meaning independent of the coordinate system.

The physical meaning of $\Gamma^t_{rs}$ is best seen from (2.6). Thus $\Gamma^1_{23}$ determines the influence of $v_1$ on the change of $v_2$ when we move a small distance in the $\omega^3$ direction.

If as we move along the curve C

$$\frac{Dv_r}{D\lambda} = 0 \qquad (r = 1, \ldots, m), \qquad (2.8)$$

we say that the vector v is subject to parallel displacement along C, with respect to the connection coefficients $\Gamma^t_{rs}$. In this formulation a curve C for which the tangent at each point satisfies (2.8) is called a geodesic relative to the $\Gamma^t_{rs}$. Geodesics are analogues of straight lines in Euclidean space.

Now in this discussion the $\Gamma^t_{rs}$ are arbitrary, except for the requirement as to behaviour under transformation of coordinates, although for some purposes it is natural to impose the symmetry requirement $\Gamma^t_{rs} = \Gamma^t_{sr}$. We can establish a correspondence between neighbouring tangent spaces in many different ways. In specific applications it will often be sensible to choose the $\Gamma^t_{rs}$ so that curves of some special interest form geodesics.


In developing (2.5)-(2.8) the metric tensor is not used and indeed the definitions apply to manifolds in which the element (2.1) is not defined (or is disregarded). Such a space is said to be of affine connection, its properties being directly determined by the $\Gamma^t_{rs}$. If, however, the metric (2.1) is available, a particular connection is of special importance. Using this connection the length of tangent vectors and the angle between two tangent vectors are unchanged under parallel transport, and the geodesic connecting two points P and Q is a curve of minimum length joining the points. To achieve these properties, the special choice

$$\Gamma^t_{rs} = \tfrac12 g^{tu}\left(\frac{\partial g_{su}}{\partial\omega^r} + \frac{\partial g_{ru}}{\partial\omega^s} - \frac{\partial g_{rs}}{\partial\omega^u}\right) \qquad (2.9)$$

is made, called the Riemannian connection. For a manifold embedded in $R^n$, this corresponds to projecting the tangent space at P + dP onto that at P, orthogonally with respect to the metric $g_{rs}$. The quantities (2.9) are called the Christoffel symbols.

From the connection coefficients $\Gamma^t_{rs}$ in general can be defined an invariant measure of the curvature of the space. The geometrical interpretation is that a vector moved by parallel transport from P around an infinitesimal closed curve back to P is unchanged if and only if the curvature at P is zero. The space is said to be flat if the curvature vanishes at all points. Note particularly, though, that a manifold can be endowed with many different $\Gamma^t_{rs}$ and the measurement of the curvature of a manifold will depend on the choice of connection coefficients.
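The holonomy interpretation of curvature can be made concrete with a small numerical experiment (our own example, using the standard Christoffel symbols of the sphere, which follow from (2.9)): a vector parallel-transported once around the latitude circle $\theta = \theta_0$ on the unit sphere returns rotated by $2\pi(1 - \cos\theta_0)$, so it is changed unless the circle is the equator (a geodesic), reflecting the nonzero curvature of the sphere.

```python
# Illustrative sketch: parallel displacement, eq. (2.8), of a covariant vector
# around the latitude circle theta = theta0 on the unit sphere.  The nonzero
# Christoffel symbols of the metric diag(1, sin(theta)^2) are
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta),
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cot(theta).
import numpy as np

theta0 = 0.7
cot = 1.0 / np.tan(theta0)
sc = np.sin(theta0) * np.cos(theta0)

v = np.array([1.0, 0.0])           # covariant components (v_theta, v_phi)
n = 200_000
dphi = 2 * np.pi / n
for _ in range(n):                 # Euler steps of dv_r/dphi = Gamma^t_{r phi} v_t
    v = v + dphi * np.array([cot * v[1], -sc * v[0]])

# Rotation of the transported vector, measured in an orthonormal frame.
angle = np.arctan2(v[1] / np.sin(theta0), v[0])
print(angle, 2 * np.pi * (1 - np.cos(theta0)))   # approximately equal
```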

In recent years there has been increasing emphasis in differential geometry on coordinate-free approaches. This leads to deeper understanding of underlying concepts. At an introductory level and for handling specific problems the rather more concrete view outlined above is likely to be preferable. There is a close analogy with the role of the coordinate-free approach to linear models.

For an introductory account of differential geometry emphasizing three-dimensional geometrical concepts, see Stoker (1969). For an elementary 'old fashioned' account of Riemannian geometry and tensor calculus, see Weatherburn (1957); and for a much more modern yet quite concrete discussion see Richtmyer (1981, Ch. 26, 27). To savour the flavour of a modern coordinate-free approach, examine Boothby (1975). There are, however, a large number of books on the subject and choice between them is very much a matter of taste.

The key ideas of which at least some qualitative understanding is needed to appreciate what follows are the following: metric tensor, Riemannian geometry, tensor, space of affine connection, curvature.

3 Differential geometry and statistics: Some early work

Gauss was a pioneer in differential geometry and in statistics, as in so many areas of mathematical science, but so far as we know made no connection between these two topics. Motivated by statistical problems concerning correlations, nonlinear regression and periodogram analysis, Hotelling (1939), in a paper of some renown in the mathematical literature, see for instance Gray & Vanhecke (1982), derived the first few terms of expansions of the volumes of spheres and tubes in Riemannian manifolds. Jeffreys, in the first (1939) edition of his book (3rd ed., 1961, pp. 179-192), was probably the first to set out a statistical interpretation to (2.1). Starting with a parametric family of

probability densities, $p(y; \omega^1, \ldots, \omega^d) = p(y; \omega)$, say, Jeffreys considered measures of


the distance between the distributions corresponding to $\omega$ and $\omega'$, in particular

$$\int \log\frac{p(y; \omega)}{p(y; \omega')}\, \{p(y; \omega) - p(y; \omega')\}\, dy.$$

As $\omega' \to \omega$, Jeffreys showed that this leads to the quadratic form $i_{rs}\, d\omega^r\, d\omega^s$, where

$$i_{rs} = E\left\{\frac{\partial \log p(y; \omega)}{\partial\omega^r}\, \frac{\partial \log p(y; \omega)}{\partial\omega^s}\right\} \qquad (3.1)$$

is the expected or Fisher information, so that the metric tensor of (2.1) is the information tensor. Jeffreys then argued that to achieve invariance of the prior density of $\omega$ under reparameterization that density should be proportional to

$$\{\det (i_{rs})\}^{1/2}.$$
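The following short Python sketch (our own example; the exponential distribution and the numerical values are arbitrary choices) illustrates Jeffreys' construction: for the exponential family with rate $\omega$ the Fisher information is $i(\omega) = 1/\omega^2$, so the divergence between nearby rates should be close to $i(\omega)\, d\omega^2$, and the invariant prior is proportional to $1/\omega$.

```python
# Illustrative sketch: Jeffreys' divergence between nearby parameter values,
# compared with the quadratic form i(w) * dw^2, for p(y; w) = w * exp(-w * y).
import numpy as np
from scipy.integrate import quad

def pdf(y, w):
    return w * np.exp(-w * y)

w, dw = 2.0, 0.01

# Integral of log{p(y; w) / p(y; w + dw)} * {p(y; w) - p(y; w + dw)} dy.
J, _ = quad(lambda y: np.log(pdf(y, w) / pdf(y, w + dw))
            * (pdf(y, w) - pdf(y, w + dw)), 0, np.inf)

print(J, dw**2 / w**2)   # divergence versus i(w) * dw^2: approximately equal
print(1 / w)             # Jeffreys' (unnormalized) prior density at w
```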

Rao (1945, 1949) considered a parametric family of distributions and defined distance via the Fisher information matrix as in (3.1). He then studied the resulting Riemannian space and, in particular, determined the geodesic distance between distributions in some special cases; for a direct further development, see Atkinson & Mitchell (1981), who discuss the determination of geodesics in more detail, and Burbea & Rao (1982, 1984) and Skovgaard (1984). A small numerical illustration of such a geodesic distance is sketched below.

A superficially different line of development stems from Beale (1960), who considered nonlinear normal theory regression. The work of Bates & Watts (1980, 1981), Hamilton, Bates & Watts (1982) and Johansen (1983) develops this theme. See also Pázman (1982, 1984). In nonlinear regression the curvature and interesting geometric structure arise from the systematic part of the model, and the objective is to obtain simple measures of the extent to which the simple 'exact' results of linear model theory are distorted.
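As the forward reference above indicates, here is a small numerical illustration (our own example) of a geodesic distance of the kind studied by Rao: for a one-parameter family the Riemannian length element is $\sqrt{i(\omega)}\, d\omega$, and for the Poisson family, with $i(\lambda) = 1/\lambda$, the geodesic distance between means $\lambda_0$ and $\lambda_1$ is $2(\sqrt{\lambda_1} - \sqrt{\lambda_0})$.

```python
# Illustrative sketch: Rao's geodesic distance for the Poisson family,
# by numerical integration of sqrt(i(lam)) d(lam) and in closed form.
import numpy as np
from scipy.integrate import quad

lam0, lam1 = 1.5, 6.0
dist, _ = quad(lambda lam: np.sqrt(1.0 / lam), lam0, lam1)

print(dist, 2 * (np.sqrt(lam1) - np.sqrt(lam0)))   # approximately equal
```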

Much of the recent work, however, arises from interest in curved exponential families, i.e. exponential family distributions in which the dimension of the minimal sufficient statistic exceeds that of the parameter space, and especially from the paper of Efron (1975) and the accompanying discussion; for some results on full exponential families, see B.R. Rao (1960). Some aspects of this work will now be described in more depth.

4 Differential geometry and statistics

Geometry, and in particular differential geometry, is concerned with quantities and properties which are invariant in the sense that they do not depend on which coordinate system is chosen to represent the structure studied. On the other hand, in parametric statistics, procedures that are invariant under reparameterization are of particular interest. Now, as we shall discuss, a parametric statistical model $\mathcal{M}$ can be viewed as a differentiable manifold equipped with tensors and connections, derived by statistical considerations, and it turns out that under this viewpoint the two types of invariance, geometric and reparametric, agree.

Let the model $\mathcal{M}$ be specified by a family of probability density functions $p(y; \omega)$, where $\omega = (\omega^1, \ldots, \omega^d)$ varies in an open subset $\Omega$ of $R^d$. Let $l(\omega) = \log p(y; \omega)$ denote the log likelihood function and write $\partial_r$ for $\partial/\partial\omega^r$. The expected information $i(\omega)$ is then given by

$$i_{rs}(\omega) = E_\omega(\partial_r l\, \partial_s l) = -E_\omega(\partial_r \partial_s l) \qquad (4.1)$$

and, as mentioned in § 3, i can be taken as a metric tensor and thus determines $\mathcal{M}$ as a Riemannian manifold. The tensorial character of i is obvious from the first form of i in


(4.1). More generally, for any $m \ge 1$ the array $E_\omega(\partial_{r_1} l \cdots \partial_{r_m} l)$ is a covariant tensor of rank m. Of particular interest is the skewness tensor

$$T_{rst} = E_\omega(\partial_r l\, \partial_s l\, \partial_t l).$$

From this and the information tensor i a whole one-parameter family of connections on $\mathcal{M}$ is defined by

$${}^{\alpha}\Gamma^{t}_{rs} = {}^{0}\Gamma^{t}_{rs} - \tfrac{\alpha}{2}\, T_{rsu}\, i^{ut}. \qquad (4.2)$$

The $\alpha$-connections were introduced by Chentsov (1972), Dawid (1975, 1977) and Amari (1982a). They are of particular relevance in the study of the higher-order asymptotic theory of statistical inference for exponential models. The parameter $\alpha$ of this family is allowed to take any real value. Furthermore, in (4.2), ${}^{0}\Gamma$ denotes the Riemannian connection determined from i, as defined by (2.9), and $i^{rs}$ is the (r, s) element of the inverse of i. The connection ${}^{\alpha}\Gamma$ is termed the $\alpha$-connection of the statistical model $\mathcal{M}$.

Using (4.1) the defining equation (4.2) may be rewritten as

$${}^{\alpha}\Gamma_{rs,t} = E\{\partial_r \partial_s l\, \partial_t l\} + \tfrac12(1 - \alpha)\, E\{\partial_r l\, \partial_s l\, \partial_t l\}, \qquad (4.3)$$

where ${}^{\alpha}\Gamma_{rs,t} = {}^{\alpha}\Gamma^{u}_{rs}\, i_{ut}$.
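As a small check of (4.3) in the one-dimensional case (our own example; the Poisson family and the Monte Carlo sample size are arbitrary choices), the quantity $E(l''\,l') + \tfrac12(1 - \alpha)E(l'^3)$ can be estimated by simulation. For the Poisson family with mean $\omega$ one finds analytically $E(l''\,l') = -1/\omega^2$ and $E(l'^3) = 1/\omega^2$, so that the $\alpha$-connection equals $-(1 + \alpha)/(2\omega^2)$; in particular the family is $(-1)$-flat in its mean parameterization, in keeping with the discussion of mixture flatness below.

```python
# Illustrative sketch: Monte Carlo estimate of the one-dimensional
# alpha-connection of (4.3) for the Poisson family with mean w,
# compared with -(1 + alpha) / (2 w^2).
import numpy as np

rng = np.random.default_rng(0)
w, alpha, n = 3.0, 0.0, 1_000_000

x = rng.poisson(w, size=n)
l1 = x / w - 1.0                  # first derivative of log p(x; w) with respect to w
l2 = -x / w**2                    # second derivative

gamma_alpha = np.mean(l2 * l1) + 0.5 * (1 - alpha) * np.mean(l1**3)
print(gamma_alpha, -(1 + alpha) / (2 * w**2))   # approximately equal
```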

Consider an exponential family $\mathcal{M}$ with model function

$$\exp\{\theta \cdot x - K(\theta) - h(x)\}, \qquad (4.4)$$

where $\theta$ and x are vectors of dimension k and K is the cumulant function of x. Let $\mathcal{M}_0$ be a curved submodel of $\mathcal{M}$, specified by restricting the canonical parameter $\theta$ to be of the form $\theta(\omega)$ for some parameter $\omega$ of dimension $d < k$ and some sufficiently smooth function $\theta(\omega)$. We may consider $\mathcal{M}$ and $\mathcal{M}_0$ as differentiable manifolds, with $\mathcal{M}_0$ a submanifold of $\mathcal{M}$. These manifolds may be represented usefully in several ways. Thus $\mathcal{M}$ may be viewed as the domain of variation $\Theta$ for the canonical parameter $\theta$ of (4.4), and $\mathcal{M}_0$ is then correspondingly given as $\Theta_0 = \theta(\Omega)$. On the other hand, if we let $\tau = \tau(\theta) = E_\theta x$ denote the mean value of x under (4.4), the manifolds $\mathcal{M}$ and $\mathcal{M}_0$ can be represented as $T = \tau(\Theta)$ and $T_0 = \tau(\Theta_0)$, respectively. We assume that the canonical parameter domain $\Theta$ and the variation domain $\Omega$ of $\omega$ are open subsets of $R^k$ and $R^d$, respectively, and then T is an open subset of $R^k$. As mentioned above, the manifold $\mathcal{M}$ becomes a Riemannian manifold when equipped with the expected information tensor i. Under the canonical parameterization $\theta$ this information tensor is

$$i_{rs}(\theta) = K_{rs}(\theta),$$

where, if we now write $\partial_r$ for $\partial/\partial\theta^r$,

$$K_{rs}(\theta) = \partial_r \partial_s K(\theta).$$

Moreover, in accordance with K being the cumulant function of x, we have $i(\theta) = \Sigma$, where $\Sigma = V_\theta x$ is the variance matrix of x. If instead the information i is represented in the mean value parameterization $\tau$ we have $i(\tau) = \Sigma^{-1}$. The $\alpha$-connection ${}^{\alpha}\Gamma$ of $\mathcal{M}$, when expressed in the canonical parameterization $\theta$, is simply

$${}^{\alpha}\Gamma_{rs,t}(\theta) = \tfrac12(1 - \alpha)\, K_{rst},$$

where $K_{rst} = K_{rst}(\theta) = \partial_r \partial_s \partial_t K(\theta)$ is the third-order cumulant of x. In particular, the Riemannian connection induced by the information tensor $i_{rs}(\theta) = K_{rs}$ is ${}^{0}\Gamma_{rs,t}(\theta) = \tfrac12 K_{rst}$. For $\alpha = 1$ the connection ${}^{1}\Gamma_{rs,t}(\theta)$ is identically 0, which means that $\Theta$ (or $\mathcal{M}$) is flat in the ${}^{1}\Gamma$ geometry. Geodesics on $\mathcal{M}$ are linear in $\theta$. The geometry of $\mathcal{M}$ is not Euclidean, though, because the 1-connection coefficients are not compatible with the information


metric tensor. Dually, in the mean value parameterization $\tau$ we have, by the transformation law for connection symbols, see for instance Richtmyer (1981, p. 196),

$${}^{\alpha}\Gamma_{rs,t}(\tau) = -\tfrac12(1 + \alpha)\, K^{ru} K^{sv} K^{tw} K_{uvw},$$

and hence T (or $\mathcal{M}$) is flat in the ${}^{-1}\Gamma$ geometry. Linear hypotheses about the mean value parameter can thus be described as $\alpha$-geodesic for $\alpha = -1$. The ${}^{1}\Gamma$ and ${}^{-1}\Gamma$ connections are often denoted, alternatively, by ${}^{e}\Gamma$ and ${}^{m}\Gamma$ and are referred to as the exponential and the mixture connection, respectively; see for instance Amari (1982a).
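A concrete instance of these two flat geometries (our own example, using the Poisson family) is sketched below: the normalized density $p_0^{1-t} p_1^{t}$ lies on the exponential (+1) geodesic joining two Poisson distributions, and is again Poisson with geometrically interpolated mean, whereas the mixture (−1) geodesic interpolates the means arithmetically.

```python
# Illustrative sketch: (+1)- and (-1)-geodesics in the Poisson family.
# The exponential tilt p0^(1-t) * p1^t, renormalized, is Poisson with mean
# lam0^(1-t) * lam1^t (linear in the canonical parameter log lam), while the
# mean-value (-1) geodesic has mean (1-t)*lam0 + t*lam1.
import numpy as np
from scipy.stats import poisson

lam0, lam1, t = 1.0, 4.0, 0.5
x = np.arange(200)                              # support truncated for the check

tilt = poisson.pmf(x, lam0)**(1 - t) * poisson.pmf(x, lam1)**t
tilt /= tilt.sum()                              # normalized exponential tilt

lam_e = lam0**(1 - t) * lam1**t                 # +1 geodesic midpoint: 2.0
lam_m = (1 - t) * lam0 + t * lam1               # -1 geodesic midpoint: 2.5

print(np.max(np.abs(tilt - poisson.pmf(x, lam_e))))   # essentially zero
print(lam_e, lam_m)
```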

Quite generally, if $\mathcal{M}$ is a differentiable manifold equipped with a metric tensor g and a connection $\Gamma$, and if $\mathcal{M}_0$ is a submanifold of $\mathcal{M}$, then g and $\Gamma$ induce, in a canonical way, a metric tensor and a connection on $\mathcal{M}_0$. Furthermore, the 'curvature' of $\mathcal{M}$ in the geometry determined by $\Gamma$ is described in terms of a certain tensor R which is called the Riemann or Riemann-Christoffel curvature tensor and which is derived from the connection coefficients by

$$R^{t}{}_{urs} = \partial_r \Gamma^{t}_{su} - \partial_s \Gamma^{t}_{ru} + \Gamma^{t}_{rv}\, \Gamma^{v}_{su} - \Gamma^{t}_{sv}\, \Gamma^{v}_{ru},$$

where $\partial_r$ here denotes differentiation with respect to the coordinates $x^r$ considered for $\mathcal{M}$ and thus $\Gamma = \Gamma(x)$. Now let $\Gamma_0$ be a connection on $\mathcal{M}_0$, whether the one induced from $(\mathcal{M}, g, \Gamma)$ or not. The Riemann-Christoffel curvature determined by $\Gamma_0$ measures how $\mathcal{M}_0$, considered by itself, is curved, and this curvature is spoken of as the intrinsic curvature of $\mathcal{M}_0$. However, one may also ask how $\mathcal{M}_0$ curves considered as a submanifold of $\mathcal{M}$, relative to the geometric structures of $\mathcal{M}$ given by g and $\Gamma$. This latter type of curvature is called the imbedding curvature or the Euler-Schouten curvature of $\mathcal{M}_0$ relative to $(\mathcal{M}, g, \Gamma)$. If the coordinate system $(x^1, \ldots, x^k)$ for $\mathcal{M}$ is chosen so that $\mathcal{M}_0$ is given by $x^{d+1} = x^{d+2} = \cdots = x^k = 0$, and if the submanifolds of $\mathcal{M}$ obtained by fixing the values of $x^1, \ldots, x^d$ are orthogonal to $\mathcal{M}_0$ with respect to the inner product given by the metric tensor, that is, $g_{rt} = 0$ for $1 \le r \le d$, $d + 1 \le t \le k$, then the Euler-Schouten curvature is measured by the orthogonal components of the covariant derivatives of the tangent vectors to $\mathcal{M}_0$. The components of the imbedding curvature are given by the tensor $H_{rst}$, defined for $1 \le r \le d$, $1 \le s \le d$ and $d + 1 \le t \le k$ by

$$H_{rst} = \Gamma^{u}_{rs}\, g_{ut} = \Gamma_{rs,t}.$$
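The curvature tensor written out above can also be evaluated numerically. The following sketch (our own example, reusing the sphere of § 2 with its Riemannian connection) computes R from the Christoffel symbols by finite differences and recovers the Gaussian curvature $R_{1212}/\det(g) = 1$ of the unit sphere.

```python
# Illustrative sketch: the Riemann-Christoffel tensor from a connection,
#   R^t_{urs} = d_r Gamma^t_{su} - d_s Gamma^t_{ru}
#               + Gamma^t_{rv} Gamma^v_{su} - Gamma^t_{sv} Gamma^v_{ru},
# evaluated by central differences for the unit-sphere metric diag(1, sin^2).
import numpy as np

def christoffel(w):
    """G[t, r, s] = Gamma^t_{rs} for coordinates (theta, phi)."""
    theta = w[0]
    G = np.zeros((2, 2, 2))
    G[0, 1, 1] = -np.sin(theta) * np.cos(theta)
    G[1, 0, 1] = G[1, 1, 0] = 1.0 / np.tan(theta)
    return G

def riemann(w, h=1e-5):
    G = christoffel(w)
    dG = np.zeros((2, 2, 2, 2))          # dG[r, t, s, u] = d_r Gamma^t_{su}
    for r in range(2):
        e = np.zeros(2)
        e[r] = h
        dG[r] = (christoffel(w + e) - christoffel(w - e)) / (2 * h)
    R = np.zeros((2, 2, 2, 2))           # R[t, u, r, s] = R^t_{urs}
    for t in range(2):
        for u in range(2):
            for r in range(2):
                for s in range(2):
                    R[t, u, r, s] = (dG[r, t, s, u] - dG[s, t, r, u]
                                     + G[t, r, :] @ G[:, s, u]
                                     - G[t, s, :] @ G[:, r, u])
    return R

w = np.array([0.7, 1.2])
g = np.diag([1.0, np.sin(w[0])**2])
R = riemann(w)
R_1212 = g[0, 0] * R[0, 1, 0, 1]         # lower the first index (g is diagonal)
print(R_1212 / np.linalg.det(g))         # Gaussian curvature of the sphere: ~1.0
```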

In the setting outlined above, where $\mathcal{M}_0$ is a curved exponential submodel of an exponential model $\mathcal{M}$, as given by (4.4), the information metric i and the $\alpha$-connections ${}^{\alpha}\Gamma$ are useful in developing and interpreting higher-order asymptotic properties of estimators, tests and confidence regions (Amari, 1982a, b, 1983; Amari & Kumon, 1983; Kumon & Amari, 1983).

To illustrate this, suppose we have a sample $x_1, \ldots, x_n$ from the model $\mathcal{M}_0$ and let $\hat\omega$ be the maximum likelihood estimator of the (d-dimensional) parameter $\omega$ of $\mathcal{M}_0$. The conditional distribution of $\hat\omega$ given an, in general approximate, ancillary statistic may be expanded asymptotically as $n \to \infty$ in an Edgeworth series, and the terms of order $n^{-1/2}$ and $n^{-1}$ may to a certain extent be expressed and interpreted by means of the information metric i, the $\alpha$-connections ${}^{1}\Gamma$, ${}^{0}\Gamma$ and ${}^{-1}\Gamma$ and the associated intrinsic curvatures and imbedding curvatures of $\mathcal{M}_0$; see Amari & Kumon (1983).

As another, but related, example one has that, if $\tilde\omega$ is an arbitrary bias-corrected first-order efficient estimator of $\omega$, then the variance matrix of $\tilde\omega$ can be expanded as

$$E\{(\tilde\omega^r - E\tilde\omega^r)(\tilde\omega^s - E\tilde\omega^s)\} = i^{rs} n^{-1} + \{({}^{-1}\Gamma^2)^{rs} + 2({}^{1}H^2)^{rs} + ({}^{-1}H^2)^{rs}\}\, n^{-2} + O(n^{-3}), \qquad (4.5)$$


where

$$({}^{-1}\Gamma^2)^{rs} = {}^{-1}\Gamma^{r}_{tu}\, {}^{-1}\Gamma^{s}_{vw}\, i^{tv} i^{uw} \qquad (1 \le r, s, t, u, v, w \le d), \qquad (4.6)$$

$$({}^{1}H^2)^{rs} = {}^{1}H^{r}_{tu}\, {}^{1}H^{s}_{vw}\, i^{tv} i^{uw} \qquad (1 \le r, s, t, v \le d;\ d + 1 \le u, w \le k), \qquad (4.7)$$

$$({}^{-1}H^2)^{rs} = {}^{-1}H^{r}_{tu}\, {}^{-1}H^{s}_{vw}\, i^{tv} i^{uw} \qquad (1 \le r, s \le d;\ d + 1 \le t, u, v, w \le k), \qquad (4.8)$$

all quantities being expressed in a coordinate system on $\mathcal{M}$ having $\omega^1, \ldots, \omega^d$ as its first d components. This coordinate system is furthermore arranged so that $\tilde\omega$ is obtained by expressing $\bar x$ in the coordinate system and taking the first d components. For $d = 1$ the result (4.5) is due to Efron (1975). It was extended to arbitrary d by Reeds (1975), Madsen (1979) and Amari (1982a). The term (4.6) is the 'square' of the (−1)-connection on $\mathcal{M}_0$, sometimes called the 'naming curvature' of $\mathcal{M}_0$. There exists a parameterization of $\mathcal{M}_0$ for which this vanishes identically if and only if the Riemann-Christoffel curvature tensor determined by ${}^{-1}\Gamma$ vanishes; this is generally not the case. However, the naming curvature can be made 0 at any given point of $\mathcal{M}_0$ by choosing the parameterization $\omega$ to coincide with the (−1)-geodesic coordinates at that point. The term (4.7), sometimes referred to as the 'Efron excess', expresses how much $\Theta_0$ curves in $\Theta$; it vanishes if $\Theta_0 = \Theta \cap L$ for some affine subspace L of $R^k$, in which case $\mathcal{M}_0$ is not a genuinely curved subfamily of $\mathcal{M}$. While the terms (4.6) and (4.7) are common to all first-order efficient estimators $\tilde\omega$, the third term (4.8) depends on the choice of estimator. It vanishes if the estimator is the maximum likelihood estimator $\hat\omega$, because those $\tau \in T$ which yield one and the same value of $\hat\omega$ form an affine subset of T, that is a subset of the form $T \cap L$ for some affine subset L of $R^k$.

There is an interesting alternative way of expressing (4.7), noted by Pierce (1975) and

Reeds (1975); see also Madsen (1979). For notational simplicity let $l_r$ and $l_{rs}$ ($1 \le r, s \le d$) denote the first- and second-order derivatives of the log likelihood function l of $\mathcal{M}_0$, that is

$$l = l(\omega) = \theta(\omega) \cdot x - K(\theta(\omega))$$

and $l_r = \partial_r l$, $l_{rs} = \partial_r \partial_s l$, where now $\partial_r = \partial/\partial\omega^r$. Moreover, let $l^{*}_{rs}$ be the residual of $l_{rs}$ after linear regression on the score vector $(l_1, \ldots, l_d)$, and set

$$\chi_{rs,tu} = E\{(l^{*}_{rs} + i_{rs})(l^{*}_{tu} + i_{tu})\};$$

that is, $\chi$ is the variance matrix of the residuals $l^{*}_{rs}$. Then

$$({}^{1}H^2)^{rs} = \chi_{tu,vw}\, i^{rt} i^{sw} i^{uv}.$$

In particular, if $d = 1$ we have, writing $\gamma^2 = \chi/i^2$ with $\chi = \chi_{11,11}$, that ${}^{1}H^2 = \gamma^2/i$ and

$$\gamma^2 = \operatorname{var}\{j^{*}/i\}, \qquad (4.9)$$

where $j^{*} = -l^{*}_{11}$ is the residual of the observed information $j(\omega) = -l_{11}(\omega)$ after linear regression on $l_1$. Efron (1975) termed $\gamma^2$ the 'curvature of the model $\mathcal{M}_0$' and, besides showing its role in (4.5) for $d = 1$, he demonstrated that $\gamma^2$ determines the loss of information on $\omega$ caused by reducing the data from the minimal sufficient statistic $\bar x$ to the maximum likelihood estimator $\hat\omega$, in the sense that

$$n\{1 - i_{\hat\omega}/(ni)\} \to \gamma^2$$

as $n \to \infty$, where

$$i_{\hat\omega} = \operatorname{var}\{E(\partial l/\partial\omega \mid \hat\omega)\}$$

is the expected information on $\omega$ contained in $\hat\omega$.
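The relation (4.9) lends itself to a direct Monte Carlo check. In the following sketch (our own example) the curved exponential family has $x \sim N(\omega, 1)$ and $y \sim N(\omega^2, 1)$ independently, a curve $\theta(\omega) = (\omega, \omega^2)$ in a two-dimensional normal-means family; direct calculation gives $i(\omega) = 1 + 4\omega^2$ and $\gamma^2 = 4/(1 + 4\omega^2)^3$, and the simulation reproduces this from the residual of the observed information after regression on the score.

```python
# Illustrative sketch: Efron's statistical curvature via (4.9), by Monte Carlo,
# for the curved model x ~ N(w, 1), y ~ N(w^2, 1).
import numpy as np

rng = np.random.default_rng(1)
w, n = 0.5, 1_000_000

x = rng.normal(w, 1.0, size=n)
y = rng.normal(w**2, 1.0, size=n)

score = (x - w) + 2 * w * (y - w**2)       # l'(w)
j_obs = 1 + 4 * w**2 - 2 * (y - w**2)      # observed information j = -l''(w)
i = 1 + 4 * w**2                           # expected information

# Residual of j after linear regression on the score, then gamma^2 = var(j*)/i^2.
beta = np.cov(j_obs, score)[0, 1] / np.var(score)
j_star = j_obs - j_obs.mean() - beta * (score - score.mean())

print(np.var(j_star) / i**2, 4 / (1 + 4 * w**2)**3)   # approximately equal
```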


An important spin-off of these results of Efron was the observation (Pierce, 1975; Cox, 1975) that, again for $d = 1$, the statistic

$$j^{*}/(i\gamma) = (j/i - 1)/\gamma,$$

evaluated at $\hat\omega$, would seem to be a natural and useful approximate ancillary for inference on $\omega$. This has led to a series of investigations (Efron & Hinkley, 1978; Hinkley, 1978, 1980; Cox, 1980; Barndorff-Nielsen, 1980, 1983, 1984a, c, 1985; Amari, 1982b, 1983; Buehler, 1982; Amari & Kumon, 1983; Kumon & Amari, 1983; McCullagh, 1984a; Skovgaard, 1985) concerning conditional inference, ancillarity and sufficiency.

The relation (4.9) can be taken as defining the 'statistical curvature' of an arbitrary one-parameter model, whether (curved) exponential or not. For a discussion of the geometrical interpretation in this more general setting, see Efron (1975) and also appendix A1 of Amari (1982a). The reader is also referred to Efron (1975) for a number of examples and for a discussion of the role of $\gamma^2$ in hypothesis testing.

For an arbitrary parametric model $\mathcal{M}$, viewed as a differentiable manifold equipped with the connection ${}^{\alpha}\Gamma$ given by (4.2), one may ask whether there exists a parameterization $\omega$ of $\mathcal{M}$ for which

$${}^{\alpha}\Gamma^{t}_{rs}(\omega) = 0 \qquad (4.10)$$

for a given $\alpha$ and for all r, s, t. If (4.10) holds identically for $\omega \in \Omega$ the model $\mathcal{M}$ is said to be $\alpha$-flat. This can be achieved if and only if the Riemann-Christoffel curvature tensor ${}^{\alpha}R$ determined by ${}^{\alpha}\Gamma$ vanishes on $\Omega$. This is rarely the case when the dimension d of $\mathcal{M}$ is greater than 1. However, for $d = 1$ we always have ${}^{\alpha}R \equiv 0$. If $d = 1$ and $\mathcal{M}$ is curved exponential then for $\alpha = -1$, $-\tfrac13$ and 0 the parameterizations making ${}^{\alpha}\Gamma$ vanish on $\Omega$ are, respectively, the asymptotic bias-reducing, skewness-reducing and variance-stabilizing transformations, whereas the value $\alpha = \tfrac13$ produces the parameterization for which the expected third derivative of the log likelihood is 0; see Kass (1984), who found this differential-geometric interpretation of earlier work by Wedderburn and by Hougaard (1982).
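A small worked check (our own illustrative calculation, not from the paper) may clarify these values. Take the one-parameter Poisson family with mean $\lambda$ and the power parameterizations $\lambda = \omega^k$. Using $E(l_\lambda l_{\lambda\lambda}) = -1/\lambda^2$ and $E(l_\lambda^3) = 1/\lambda^2$, the one-dimensional version of (4.3) becomes

$${}^{\alpha}\Gamma(\omega) = E(l_\omega l_{\omega\omega}) + \tfrac12(1 - \alpha)E(l_\omega^3) = k^2\, \omega^{k-3}\{\tfrac12(1 - \alpha)k - 1\},$$

which vanishes identically exactly when $k = 2/(1 - \alpha)$. Thus $\alpha = -1$ gives $\omega = \lambda$ (the mean, for which the estimator $\bar x$ is exactly unbiased), $\alpha = -\tfrac13$ gives $\omega = \lambda^{2/3}$ (the classical symmetrizing transformation for the Poisson), $\alpha = 0$ gives $\omega = \lambda^{1/2}$ (variance-stabilizing, up to a constant factor), and $\alpha = \tfrac13$ gives $\omega = \lambda^{1/3}$, for which the expected third derivative of the log likelihood vanishes, in agreement with the values quoted above.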

It is always possible to choose the parameterization so that (4.10) is satisfied at any given point of $\mathcal{M}$, namely by choosing geodesic coordinates around that point. This fact can be used to construct second-order ancillary statistics for curved exponential models, using the $(-1)$-connection (Amari, 1982b).

Returning to the above setting of a curved submodel $\mathcal{M}_0$ of the exponential model $\mathcal{M}$ of (4.4), for any given element $\omega$ of $\Omega$ let us consider the set ${}^{\alpha}A(\omega)$ consisting of those elements of $\mathcal{M}$ which can be connected to $\omega$ by an $\alpha$-geodesic curve in $\mathcal{M}$ such that this curve is orthogonal to $\mathcal{M}_0$ at $\omega$ with respect to the information metric on $\mathcal{M}$. We suppose, for a fixed $\alpha$, that the sets ${}^{\alpha}A(\omega)$ are disjoint and fill all of $\mathcal{M}$. Since the observed mean $\bar x$ (at least for sufficiently large sample size n) can be considered as an element of $\mathcal{M}$, namely the element whose mean value parameter $\tau$ equals $\bar x$, we can define the $\alpha$-estimator of $\omega$ as the uniquely determined point $\tilde\omega \in \Omega$ for which $\bar x \in {}^{\alpha}A(\tilde\omega)$. Note that for $\alpha = -1$ we have $\tilde\omega = \hat\omega$, the maximum likelihood estimator of $\omega$. For $\alpha = 1$ the estimator is referred to as the dual maximum likelihood estimator. The well-known relation between maximum likelihood estimation and the discrimination, or Kullback-Leibler, information measure has counterparts in $\alpha$-estimation, $-1 \le \alpha \le 1$; see appendices A.2 and A.3 of Amari (1982a). In this connection, see also Eguchi (1983).

On a more formal level the concept of tensors and the notational conventions developed in differential geometry have turned out to be very convenient, for instance in the treatment of cumulants, Hermite polynomials and Edgeworth expansions (Amari & Kumon, 1983; McCullagh, 1984a, b). To exemplify, the Edgeworth expansion to order $n^{-1}$ for an r-dimensional statistic z, calculated from a sample $x_1, \ldots, x_n$ and having mean


value 0, can under rather broad regularity assumptions be written simply as

$$p(z) = \phi(z; \gamma)\{1 + H_1 + H_2\} + O(n^{-3/2}),$$

where $\gamma$ is the precision of z, that is, the inverse of the variance matrix of z, where $\phi(\cdot\,; \gamma)$ denotes the probability density function of the r-dimensional normal distribution with mean 0 and precision $\gamma$, and where $H_1$ and $H_2$, of order $n^{-1/2}$ and $n^{-1}$, respectively, are given by

$$H_1 = \tfrac16\, \kappa^{r,s,t}\, h_{rst}(z; \gamma), \qquad H_2 = \tfrac1{24}\, \kappa^{r,s,t,u}\, h_{rstu}(z; \gamma) + \tfrac1{72}\, \kappa^{r,s,t}\, \kappa^{u,v,w}\, h_{rstuvw}(z; \gamma).$$

In these expressions $\kappa^{r,s,t}$ and $\kappa^{r,s,t,u}$ are the third- and fourth-order joint cumulants of the r coordinates of z, while $h_{rst}(z; \gamma)$, $h_{rstu}(z; \gamma)$ and $h_{rstuvw}(z; \gamma)$ are tensorially defined versions of the multivariate Hermite polynomials. Written in this way the Edgeworth expansion looks no more complicated in the multivariate case than in the one-dimensional case.
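The one-dimensional case of this expansion is easily checked numerically (our own example, using the standardized mean of exponential variables; the sample size is an arbitrary choice): with third standardized cumulant $\kappa_3$ the term $H_1$ is $\kappa_3 h_3(z)/6$ with $h_3(z) = z^3 - 3z$.

```python
# Illustrative sketch: the Edgeworth expansion to order n^(-1/2) in one
# dimension, compared with the exact density of z = sqrt(n) * (xbar - 1) for a
# sample of n exponential(1) variables (so that sum(x) is Gamma(n, 1)).
import numpy as np
from scipy.stats import gamma, norm

n = 10
kappa3 = 2.0 / np.sqrt(n)                 # standardized third cumulant of z

z = np.linspace(-2.5, 3.5, 7)
exact = np.sqrt(n) * gamma.pdf(n + np.sqrt(n) * z, a=n)
edgeworth = norm.pdf(z) * (1 + kappa3 * (z**3 - 3 * z) / 6)

print(np.column_stack([z, exact, edgeworth]))   # columns agree quite closely
```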

An entirely different kind of application of advanced parts of differential geometry to statistical problems concerning shapes of random polygons and to positions of lines in three-dimensional space occurs in papers by Kendall (1981, 1984a,b, 1985), Kendall & Kendall (1980), Kendall & Young (1984). Recently, studies in the electrical engineering literature have shown that the theory of

Lie algebras has a bearing on problems of nonlinear filtering; see Marcus (1984).

5 Current work and some open problems

We now outline topics studied in the most recent work involving differential geometry and statistics.

(i) One approach to asymptotic inference for arbitrary parametric models $\mathcal{M}$ is to approximate $\mathcal{M}$ locally at each parameter value by an exponential family determined by the first few terms of a Taylor expansion of the log model function. This associates an exponential model with each parameter value, and the collection of such models forms what is called a fibre bundle over $\mathcal{M}$. This fibre bundle can itself be considered as a differentiable manifold. A more general approach still attaches a Hilbert space of random variables to each parameter value and treats the collection of these spaces as a fibre bundle over $\mathcal{M}$, called the Hilbert bundle. The Hilbert bundle can, in particular, be equipped with connections in generalization of the $\alpha$-connections. This makes it possible to define more refined exponential family approximations. See Amari (1984, 1985).

(ii) One may consider inference about a parameter of interest when the number of nuisance parameters tends to infinity with the number of observations (Kumon & Amari, 1984). Asymptotic properties are approached via estimating equations: it is possible that the differential geometric structure of the estimating equation approach for finite samples can be elucidated.

Further, the approach of Kumon and Amari may be useful for semiparametric inference. A completely nonparametric approach would require the infinite-dimensional manifold of all probability density functions, and this involves substantial technical difficulties.

(iii) Applications to time series models (Amari, 1984) and more generally to inference in stochastic processes raise special problems.

(iv) In wide generality, differentiable manifolds may be equipped with a Riemannian


metric and with a connection different from the Riemannian one. Equivalently one may consider a Riemannian manifold having an associated covariant tensor of rank 3, called in a statistical context the skewness tensor. Lauritzen (1984) discusses such manifolds and suggests terming them statistical manifolds. An important question is to what extent such manifolds encompass the essential structure of a statistical model.

(v) Observed conditional information metrics and $\alpha$-connections, using an ancillary statistic, can be used instead of geometric structures based on unconditional expectations (Barndorff-Nielsen, 1984b).

(vi) Transformation models and in particular exponential transformation models can be studied from the viewpoint of differential geometry (Barndorff-Nielsen, 1984b; Barndorff-Nielsen & Jupp, 1984b; Eriksen, 1984).

(vii) Inference via profile likelihoods (i.e. likelihood functions maximized over nuisance parameters) and marginal inference can be studied in relation to L-sufficiency (Rémon, 1984) and to the transfer of geometric structures by submersion (Barndorff-Nielsen & Jupp, 1984a, b).

(viii) Statistics that are parameterization invariant, such as the likelihood ratio statistic, can sometimes be expressed with insight as geometrical invariants. McCullagh & Cox (1985) use this approach to obtain a simplified form for asymptotic expansions associated with the likelihood ratio statistic.

(ix) Differential geometry can be used in the study of residuals for problems more general than the linear model; see Cook & Tsai (1985).

(x) A foliation of a manifold is a partition of the manifold into submanifolds, all of the same dimension. The notion of foliation can be used in connection with statistical models

and their properties, in particular independence properties (Barndorff-Nielsen, 1984b; Lauritzen, 1984).

Finally, we mention briefly some of the many open questions in this field.

(a) To what extent are the higher-order asymptotic statistical properties and inference procedures of a model $\mathcal{M}$ determined by the expected information metric and expected $\alpha$-connections, and to what extent by the corresponding observed conditional features? See (iv) and (v) above. For instance, in the Edgeworth expansion considered by Amari & Kumon (1983), as well as in an expansion associated with the distribution of maximum likelihood estimates (Barndorff-Nielsen, 1984b), most of the terms of order $n^{-1}$ have no obvious geometrical interpretation.

(b) What insight might be gained by detailed investigation of the difference between expected and observed conditional geometries?

(c) Are differential geometric notions, in particular that of Hilbert bundles mentioned in (i) above, useful in connection with prediction of future observations (as contrasted with inference about unknown parameters)?

(d) Are differential geometric notions fruitful in connection with behaviour under incorrect models, with robust estimation and with tests of separate families of hypotheses?

(e) What differential geometric concepts are useful in nonregular problems, such as estimating the support points of a density function? In these problems the information metric tensor is not defined.


(f) If conditional inference is applied, to what extent do Amari's $\alpha$-estimators lead to the same inferences, irrespective of the value of $\alpha$?

(g) Are there connections other than the $\alpha$-connections that are useful statistically (Eguchi, 1983)?

(h) What are the implications for 'large-sample' Bayesian theory, especially when there are many nuisance parameters?

(i) What are the specific implications of differential geometric notions in very particular statistical models, in particular for the calculation of confidence regions in normal theory nonlinear regression?

In summary, it seems to us that the advantages of using tensor and connection notation in suitable types of statistical calculations are now clearly established. While the introduction of more specifically geometrical notions has considerable potential, it remains a challenging task to introduce such ideas in a way that is statistically wholly natural.

Acknowledgement

In April 1984 a NATO Research Workshop on Differential Geometry in Statistics was held at the Department of Mathematics, Imperial College, London, with about 50 research workers taking part. The present paper owes much to the discussion on that occasion. We thank all the participants, and also the referees. Further, we are grateful to NATO for support of the Workshop and of the joint work of the first two authors.

References

Amari, S.-I. (1982a). Differential geometry of curved exponential families--curvatures and information loss. Ann. Statist. 10, 357-385.
Amari, S.-I. (1982b). Geometrical theory of asymptotic ancillarity and conditional inference. Biometrika 69, 1-17.
Amari, S.-I. (1983). Comparisons of asymptotically efficient tests in terms of geometry of statistical structures. Bull. Int. Statist. Inst., Proc. 44th Session, Book 2, 1.190-1.206.
Amari, S.-I. (1984). Differential geometry of statistics--towards new developments. Paper prepared for the NATO Advanced Research Workshop on "Differential Geometry in Statistical Inference", Imperial College, London, 9-11 April 1984.
Amari, S.-I. (1985). Differential Geometrical Methods in Statistics. Lecture Notes in Statistics. New York: Springer-Verlag.
Amari, S.-I. & Kumon, M. (1983). Differential geometry of Edgeworth expansion in curved exponential family. Ann. Inst. Statist. Math. 35, 1-24.
Atkinson, C. & Mitchell, A.F.S. (1981). Rao's distance measure. Sankhyā 43, 345-365.
Barndorff-Nielsen, O.E. (1980). Conditionality resolutions. Biometrika 67, 293-310.
Barndorff-Nielsen, O.E. (1983). On a formula for the distribution of the maximum likelihood estimator. Biometrika 70, 343-365.
Barndorff-Nielsen, O.E. (1984a). On conditionality resolution and the likelihood ratio for curved exponential families. Scand. J. Statist. 11, 157-170.
Barndorff-Nielsen, O.E. (1984b). Differential and integral geometry in statistical inference. Research Report 106, Dept. Theor. Statist., Aarhus University.
Barndorff-Nielsen, O.E. (1984c). Tests and confidence limits for interest parameters in exponential models. Research Report 114, Dept. Theor. Statist., Aarhus University.
Barndorff-Nielsen, O.E. (1985). Confidence limits from $c|\hat\jmath|^{1/2}\bar L$ in the single-parameter case. Scand. J. Statist. 12, 83-87.
Barndorff-Nielsen, O.E. & Jupp, P. (1984a). Differential geometry, profile likelihood and L-sufficiency. Research Report 118, Dept. Theor. Statist., Aarhus University.
Barndorff-Nielsen, O.E. & Jupp, P. (1984b). Profile likelihood, marginal likelihood and differential geometry of composite transformation models. Research Report 122, Dept. Theor. Statist., Aarhus University.
Bates, D.M. & Watts, D.G. (1980). Relative curvature measures of nonlinearity. J. R. Statist. Soc. B 42, 1-25.
Bates, D.M. & Watts, D.G. (1981). Parameter transformations for improved approximate confidence regions in nonlinear least squares. Ann. Statist. 9, 1152-1167.
Beale, E.M.L. (1960). Confidence regions in non-linear estimation (with discussion). J. R. Statist. Soc. B 22, 41-88.
Boothby, W.M. (1975). An Introduction to Differentiable Manifolds and Riemannian Geometry. New York: Academic Press.
Buehler, R.J. (1982). Some ancillary statistics and their properties (with discussion). J. Am. Statist. Assoc. 77, 581-594.
Burbea, J. & Rao, C.R. (1982). Entropy differential metric, distance and divergence measures in probability spaces--a unified approach. J. Mult. Anal. 12, 575-596.
Burbea, J. & Rao, C.R. (1984). Differential metrics in probability spaces. Prob. Math. Statist. 3, 241-258.
Chentsov, N.N. (1972). Statistical Decision Rules and Optimal Inference (in Russian). Moscow: Nauka. English translation (1982): Translations of Mathematical Monographs, Vol. 53. Providence, Rhode Island: American Mathematical Society.
Chentsov, N.N. (1980). On basic concepts of mathematical statistics. Math. Statist. Banach Center Publ. 6, pp. 85-94. Warsaw: Polish Scientific Publishers.
Cook, R.D. & Tsai, C.-L. (1985). Residuals in nonlinear regression. Biometrika 72, 23-29.
Cox, D.R. (1975). Discussion of paper by B. Efron. Ann. Statist. 3, 1221.
Cox, D.R. (1980). Local ancillarity. Biometrika 67, 279-286.
Dawid, A.P. (1975). Discussion of paper by B. Efron. Ann. Statist. 3, 1231-1234.
Dawid, A.P. (1977). Further comments on a paper by Bradley Efron. Ann. Statist. 5, 1249.
Efron, B. (1975). Defining the curvature of a statistical problem (with application to second order efficiency) (with discussion). Ann. Statist. 3, 1189-1242.
Efron, B. & Hinkley, D.V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information (with discussion). Biometrika 65, 457-487.
Eguchi, S. (1983). Second order efficiency of minimum contrast estimators in a curved exponential family. Ann. Statist. 11, 793-803.
Eriksen, P.S. (1984). Existence and uniqueness of the maximum likelihood estimator in exponential transformation models. Research Report 103, Dept. Theor. Statist., Aarhus University.
Gray, A. & Vanhecke, L. (1982). The volumes of tubes about curves in a Riemannian manifold. Proc. London Math. Soc. 44, 215-243.
Hamilton, D.C., Bates, D.M. & Watts, D.G. (1982). Accounting for intrinsic nonlinearity in nonlinear regression parameter inference regions. Ann. Statist. 10, 386-393.
Hinkley, D.V. (1978). Likelihood inference about location and scale parameters. Biometrika 65, 253-262.
Hinkley, D.V. (1980). Likelihood as approximate pivotal distribution. Biometrika 67, 287-292.
Hotelling, H. (1939). Tubes and spheres in n-spaces, and a class of statistical problems. Am. J. Math. 61, 440-460.
Hougaard, P. (1982). Parametrizations of non-linear models. J. R. Statist. Soc. B 44, 244-252.
Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford: Clarendon Press.
Johansen, S. (1983). Some topics in regression. Scand. J. Statist. 10, 161-194.
Kass, R.E. (1984). Canonical parameterizations and zero parameter-effects curvature. J. R. Statist. Soc. B 46, 86-92.
Kendall, D.G. (1981). The statistics of shape. In Interpreting Multivariate Data, Ed. V. Barnett, pp. 75-80. Chichester: Wiley.
Kendall, D.G. (1984a). Shape-manifolds, procrustean metrics, and complex projective spaces. Bull. London Math. Soc. 16, 81-121.
Kendall, D.G. (1984b). Statistics, geometry and the Cosmos (The Milne Lecture, 1983). Quart. J. R. Astron. Soc. 25, 147-156.
Kendall, D.G. (1985). Mathematical statistics in the humanities, and some related problems in astronomy. In A Celebration of Statistics: The ISI Centenary Volume, Ed. A.C. Atkinson and S.E. Fienberg, pp. 393-408. New York: Springer-Verlag.
Kendall, D.G. & Kendall, W.S. (1980). Alignments in two-dimensional random sets of points. Adv. Appl. Prob. 12, 380-424.
Kendall, D.G. & Young, G.A. (1984). Indirectional statistics and the significance of an asymmetry discovered by Birch. Mon. Not. R. Astron. Soc. 207, 637-647.
Kumon, M. & Amari, S.-I. (1983). Geometrical theory of higher-order asymptotics of test, interval estimator and conditional inference. Proc. R. Soc. A 387, 429-458.
Kumon, M. & Amari, S.-I. (1984). Estimation of a structural parameter in the presence of a large number of nuisance parameters. Biometrika 71, 445-459.
Lauritzen, S.L. (1984). Statistical manifolds. Research Report R 84-12, Inst. Electronic Syst., Aalborg University Center.
Madsen, L.T. (1979). The geometry of statistical models: A generalization of curvature. Research Report 79/1, Statist. Res. Unit, Univ. of Copenhagen.
Marcus, S.I. (1984). Algebraic and geometric methods in nonlinear filtering. SIAM J. Control Optim. 22, 817-844.
McCullagh, P. (1984a). Local sufficiency. Biometrika 71, 233-244.
McCullagh, P. (1984b). Tensor notation and cumulants of polynomials. Biometrika 71, 461-476.
McCullagh, P. & Cox, D.R. (1985). Invariants and likelihood ratio statistics. Unpublished paper.
Pázman, A. (1982). Geometry of Gaussian nonlinear regression--parallel curves and confidence intervals. Kybernetika 18, 376-396.
Pázman, A. (1984). Probability distribution of the multivariate nonlinear least squares estimate. Kybernetika 20, 209-230.
Pierce, D.A. (1975). Discussion of paper by B. Efron. Ann. Statist. 3, 1219-1221.
Rao, B.R. (1960). A formula for the curvature of the likelihood surface of a sample drawn from a distribution admitting sufficient statistics. Biometrika 47, 203-207.
Rao, C.R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81-91.
Rao, C.R. (1949). On the distance between two populations. Sankhyā 9, 246-248.
Reeds, J. (1975). Discussion of paper by B. Efron. Ann. Statist. 3, 1234-1238.
Rémon, M. (1984). On a concept of partial sufficiency: L-sufficiency. Int. Statist. Rev. 52, 127-136.
Richtmyer, R.D. (1981). Principles of Advanced Mathematical Physics, Vol. 2. New York: Springer.
Skovgaard, L.T. (1984). A Riemannian geometry of the multivariate normal model. Scand. J. Statist. 11, 211-223.
Skovgaard, I. (1985). A second order investigation of asymptotic ancillarity. Ann. Statist. 13, 534-551.
Stoker, J.J. (1969). Differential Geometry. New York: Wiley.
Weatherburn, C.E. (1957). Riemannian Geometry and the Tensor Calculus. Cambridge: Cambridge University Press.

Résumé

Recently there has been increasing emphasis on the use of differential geometry in statistical theory, particularly in asymptotic theory. In this paper some ideas of differential geometry relevant to statistics are described in a relatively short and nontechnical way. Part of the earlier work is then sketched. Recent developments are outlined and, finally, directions of current and possible future work are indicated.

[Received January 1985, revised October 1985]
