The Ergodic Hierarchy
Forthcoming in Stanford Encyclopedia of Philosophy
Joseph Berkovitz
Roman Frigg
Frederick Konz
1. Introduction
The so-called ergodic hierarchy (EH) is a central part of ergodic theory. It is a hierarchy of
properties that dynamical systems can possess. Its five levels are ergodicity, weak mixing,
strong mixing, Kolmogorov, and Bernoulli. Although EH is a mathematical theory, its
concepts have been widely used in the foundations of statistical physics, accounts of
randomness, and discussions about the nature of chaos. We introduce EH and discuss its
applications in these fields.
2. Dynamical Systems
The object of study in ergodic theory is a dynamical system. We first introduce the basic
concepts with a simple example, from which we abstract the general definition of a dynamical
system, a fundamental concept of modern ergodic theory. For a brief history of the modern
notion of a dynamical system and the associated concepts of EH see Appendix A.
A lead ball is hanging from the ceiling on a spring. We then pull it down a bit and let it go.
The ball begins to oscillate. The mechanical state of the ball is completely determined by a
specification of its position x and momentum p; that is, if we know x and p, then we know all
that there is to know about the ball. If we now conjoin x and p in one vector space we obtain
the so-called phase space of the system (sometimes also referred to as ‘state space’).1 This is
illustrated in Figure 1 for a two-dimensional phase space of the state of a ball moving up and
down (i.e. the phase space has one dimension for the ball’s position and one for its
momentum).
Figure 1: The motion of a ball on a spring.
Each point of X represents a state of the ball (because it gives the ball's position and
momentum). Accordingly, the time evolution of the ball's state is represented by a line in X, a so-
called phase space trajectory (from now on 'trajectory'), showing where in phase space the
system was at each instant of time. For instance, let us assume that at time t=0 the ball is
located at point x_1 and then moves to x_2, where it arrives at time t=5. This motion is
represented in X by the line segment connecting points x_1 and x_2. In other words, the motion of
the ball is represented in X by the motion of a point representing the ball’s (instantaneous)
state, and all the states that the ball is in over the course of a certain period of time jointly
1 The use of the term ‘space’ in physics might cause confusion. On the one hand the term is used in its ordinary
meaning to refer to the three-dimensional space of our everyday experience. On the other hand, an entire class of
mathematical structures are referred to as ‘spaces’ even though they have nothing in common with the space of
everyday experience (except some abstract algebraic properties, which is why these structures earned the title
‘spaces’ in the first place). Phase spaces are abstract mathematical spaces.
form a trajectory. The motion of this point has a name: it is the phase flow φ_t.2 The phase
flow tells us where the ball is at some later time t, if we specify where it is at t=0; or,
metaphorically speaking, φ_t drags the ball's state around in X so that the movement of the
state represents the motion of the real ball. In other words, φ_t is a mathematical representation
of the system's time evolution. The state of the ball at time t=0 is commonly referred to as the
initial condition. φ_t then tells us, for every point in phase space, how this point evolves if it is
chosen as an initial condition. In our concrete example point x_1 is the initial condition and we
have x_2 = φ_5(x_1). More generally, let us call the ball's initial condition x_0 and let x(t) be its
state at some later time t. Then we have x(t) = φ_t(x_0). This is illustrated in Figure 2a.
Figure 2: Evolution in phase space.
Since φ_t tells us for every point in X how it evolves in time, it also tells us how sets of points
move around. For instance, choose an arbitrary set A in X; then φ_t(A) is the image of A after t
time units under the dynamics of the system. This is illustrated in Figure 2b. Considering sets of
points rather than only points is important when we think about physical applications of this
mathematical formalism. We can never determine the exact initial condition of a ball
bouncing on a spring. No matter how precisely we measure x_0, there will always be some
measurement error. So what we really want to know in practical applications is not how a
2 Note that the time dimension of the ball’s motion is not an explicit part of the phase space.
precise mathematical point evolves, but rather how a set of points around the initial condition
x_0 evolves. In our example of the ball the evolution is 'tame', in that the set keeps its original
shape. As we will see below, this is not always the case.
An important feature of X is that it is endowed with a so-called measure μ. We are familiar
with measures in many contexts: from a mathematical point of view, the length that we
attribute to a part of a line, the surface we attribute to a part of a plane, and the volume we
attribute to a segment of space are measures. A measure is simply a device to attribute a ‘size’
to a part of a space (in everyday contexts one, two, or three dimensional). Although X is an
abstract mathematical space, the leading idea of a measure remains the same: it is a tool to
quantify the size of a set. So we say that the set A has measure μ(A) in much the same way
as we say that a certain collection of points of ordinary space (for instance the ones that lie on
the inside of a bottle) has a certain volume (for instance one litre).
There are many different measures in our daily lives: we can measure length in meters or in
yards; we can measure surfaces in square meters or acres; and we can measure volume in
litres or gallons. The same is the case in abstract mathematical spaces, where we can also
introduce many different measures. One of these measures is particularly important, namely
the so called Lebesgue measure. This measure has an intuitive interpretation: it is just a
precise formalisation of the measure we commonly use in geometry. The interval [0, 2] has
Lebesgue measure 2 and the interval [3, 4] has Lebesgue measure 1. In two dimensions, a
square whose sides are 2 long has Lebesgue measure 4; etc. Although this sounds simple, the
mathematical theory of measures is rather involved. We state the basics of measure theory in
Appendix B and avoid appeal to technical issues in measure theory in what follows.
The essential elements in the discussion so far were the phase space X, the time evolution φ_t,
and the measure μ. And these are also the ingredients for the definition of an abstract
dynamical system. An abstract dynamical system is a triple [X, μ, T_t], where
{T_t : t are all instants of time} is a family of automorphisms, i.e. a family of transformations of
X onto itself with the property that T_{t1+t2}(x) = T_{t1}(T_{t2}(x)) for all x ∈ X (Arnold and Avez
1968, 1); we say more about time below.3 In the above example X is the phase space of the
ball's motion, μ is the Lebesgue measure, and T_t is φ_t.
So far we have described tT as giving the time evolution of a system. Now let us look at this
from a more mathematical point of view: the effect of tT is that it assigns to every point in X
another point in X after t time units have elapsed. In the above example x_1 is mapped onto
x_2 under φ_t after t = 5 seconds. Hence, from a mathematical point of view the time
evolution of a system consists in a mapping of X onto itself, which is why the above
definition takes tT to be a family of mappings of X onto itself. Such a mapping is a
prescription that tells you for every point x in X on which other point in X it is mapped
(from now on we use x to denote any point in X , and it no longer stands, as in the above
example, for the position of the ball). A mapping that takes X onto itself is called an
automorphism of X.
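To make the idea of an automorphism concrete, here is a minimal sketch in Python. It uses the doubling map T(x) = 2x mod 1 on X = [0, 1), a standard textbook example (not taken from the text above) of a transformation that maps a phase space onto itself:

```python
# Illustrative sketch: the doubling map T(x) = 2x mod 1 on X = [0, 1),
# a standard example of a discrete-time transformation of a phase space
# onto itself.

def T(x):
    """Doubling map: sends each point of [0, 1) to another point of [0, 1)."""
    return (2 * x) % 1.0

# T maps X onto itself: every image is again a point in [0, 1).
points = [0.1, 0.25, 0.7, 0.999]
images = [T(x) for x in points]
assert all(0.0 <= y < 1.0 for y in images)
print(images)  # e.g. T(0.7) ≈ 0.4
```

The choice of map is purely illustrative; any measure-preserving transformation of X onto itself would serve equally well here.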
The systems studied in ergodic theory are forward deterministic. This means that if two
identical copies of that system are in the same state at one instant of time, then they must be in
the same state at all future instants of time. Intuitively speaking, this means that for any given
time there is only one way in which the system can evolve forward. For a discussion of
determinism see Earman (1986).
It should be pointed out that no particular interpretation is intended in an abstract dynamical
system. We have motivated the definition with an example from mechanics, but dynamical
systems are not tied to that context. They are mathematical objects in their own right, and as
such they can be studied independently of particular applications. This makes them a versatile
tool in many different domains. In fact, dynamical systems are used, among others, in fields
as diverse as physics, biology, geology, and economics. In population biology, for instance,
the points in X are taken to represent the number of animals in a population, and the map T
gives the change in number over time.
3 Sometimes a fourth component is mentioned in the definition: a sigma algebra Σ. Although in certain
circumstances it is convenient to add Σ, it is not strictly necessary since the main purpose of Σ is to provide a
basis to define the measure μ, and so Σ is always present in the background when there is a measure μ, and it
is not necessary to mention it explicitly. For a discussion of sigma algebras and measures see Appendix B.
There are many different kinds of dynamical systems. The three most important distinctions
are the following.
Discrete versus continuous time. We may consider discrete instants of time or a continuum of
instants of time. For ease of presentation, we shall say in the first case that time is discrete and
in the second case that time is continuous. This is just a convenient terminology that has no
implications for whether time is fundamentally discrete or continuous. In the above example
with the ball time was continuous (it was taken to be a real number). But often it is convenient
to regard time as discrete. If time is continuous, then t is a real number and the family of
automorphisms is {T_t : t ∈ R}, where R is the real numbers. If time is discrete, then t lies in
the set Z = {..., -2, -1, 0, 1, 2, ...}, and the family of automorphisms is {T_t : t ∈ Z}. In order to
indicate that we are dealing with a discrete family rather than a continuous one we sometimes
replace 'T_t' with 'T_n'; this is just a notational convention of no conceptual importance.4 In
such systems the progression from one instant of time to the next is also referred to as a 'step'.
In population biology, for instance, we often want to know how a population grows over a
typical breeding time (e.g. one month). In mathematical models of such a population the
points in X represent the size of a population (rather than the position and the momentum of
a ball, as in the above example), and the transformation T_n represents the growth of the
population after n time units. A simple example would be T_n(x) = x + n, where x now is just
a point in X (and not, as above, the position of a ball).
Discrete families of automorphisms have the interesting property that
they are generated by one mapping. As we have seen above, all automorphisms satisfy
T_{t1+t2}(x) = T_{t1}(T_{t2}(x)). From this it follows that T_n(x) = T_1^n(x), that is, T_n is the n-th iterate of
T_1. In this sense T_1 generates {T_t : t ∈ Z}; or, in other words, {T_t : t ∈ Z} can be 'reduced' to T_1.
For this reason one often drops the subscript '1', simply calls the map 'T', and writes the
dynamical system as the triple [X, μ, T], where it is understood that T = T_1.
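The generating property can be sketched in a few lines. The map below is an illustrative circle rotation (the angle a is an arbitrary choice, not from the text); T_n is built by iterating T_1, and the composition property holds for the iterates:

```python
# Sketch: the whole discrete family {T_n} is generated by iterating the
# single map T_1. Here T_1 is a rotation of [0, 1) by an illustrative angle a.
a = 0.123456

def T1(x):
    return (x + a) % 1.0

def Tn(x, n):
    """n-th iterate of T1, i.e. T_n = T_1 applied n times."""
    for _ in range(n):
        x = T1(x)
    return x

x = 0.3
# Composition property T_{t1+t2}(x) = T_{t1}(T_{t2}(x)) for iterates:
assert abs(Tn(x, 5) - Tn(Tn(x, 2), 3)) < 1e-12
```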
4 By using R and Z we assume that time extends to the past as well as to the future, and we also assume that the
time evolution is reversible. This need not be the case, and these assumptions can be relaxed in different ways.
Nothing in what follows depends on this.
For ease of presentation we use discrete transformations from now on. The definitions and
theorems we formulate below carry over to continuous time without further ado, and where this is
not the case we explicitly say so and treat the two cases separately.
Measure preserving versus non-measure preserving transformations. Roughly speaking, a
transformation is measure preserving if the size of a set (like set A in the above example) does
not change over the course of time: a set can change its form but it cannot shrink or grow
(with respect to the measure). Formally, T is a measure-preserving transformation on X if
and only if (iff) for all sets A in X: μ(T⁻¹(A)) = μ(A), where T⁻¹(A) is the set of points that
gets mapped onto A under T; that is, T⁻¹(A) = {x ∈ X : T(x) ∈ A}.5 From now on we also
assume that the transformations we consider are measure preserving.6
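Measure preservation can be probed numerically. The sketch below assumes the doubling map T(x) = 2x mod 1 with the Lebesgue measure on [0, 1) (a standard example of a measure-preserving map, not discussed in the text); the measure of the preimage T⁻¹(A) is estimated by the fraction of uniformly drawn points whose image lands in A:

```python
# Numerical sketch of measure preservation for the doubling map with the
# Lebesgue measure on [0, 1): μ(T⁻¹(A)) should equal μ(A).
import random

random.seed(0)

def T(x):
    return (2 * x) % 1.0

A = (0.2, 0.5)          # an interval A with Lebesgue measure 0.3
N = 200_000
hits = sum(1 for _ in range(N) if A[0] <= T(random.random()) < A[1])
mu_preimage = hits / N  # Monte Carlo estimate of μ(T⁻¹(A))
print(mu_preimage)      # should be close to μ(A) = 0.3
```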
Differentiable versus (merely) measurable dynamics. A further issue concerns the question of
where T comes from. In the case of classical mechanics, we obtain T as the solution of the
equation of motion. In the introductory example, for instance, we would start by writing down
the forces acting on the ball, then plug these into Newton’s equation of motion (a differential
equation), and then solve that equation. The solution to the equation is φ_t, and since it is the
solution to a differential equation it is differentiable. Systems whose flows are differentiable are
differentiable systems. Those systems that are not differentiable are (merely) measurable
since they have a measure defined on them (namely μ), which can be used to measure sets at
all times.7 Since being differentiable is a strong assumption that many systems don't satisfy,
in what follows we shall not assume that systems are differentiable (and when we do we shall
mention it explicitly).
5 Strictly speaking A has to be measurable. In what follows we always assume that the sets we consider are
measurable. This is a technical assumption that has no bearing on the issues that follow since the relevant sets
are always measurable.
6 First appearances notwithstanding, this is not a substantial restriction. Systems in statistical mechanics are all
measure preserving. Some systems in chaos theory are not measure preserving; but if such a system is
chaotic on a certain part of the phase space (which can be an attractor or an interval, for instance), then there is
an invariant measure on this part and EH is applicable with respect to that measure. For a discussion of this point
see Werndl (2009b).
7 There are also other types of systems, e.g. topological ones. These are not considered here since the concepts of
EH are essentially tied to there being a measure.
In sum, from now on, unless stated otherwise, we consider discrete measure preserving
transformations which can but need not be differentiable.
In order to introduce the concept of ergodicity we have to introduce the space average and the time
average of a function f on X. Mathematically speaking, a function assigns to each point in X a
number. If the numbers are always real the function is a real-valued function; and if the
numbers may be complex, then it is a complex-valued function. Intuitively we can think of
these numbers as physical quantities of interest. Recalling the example of the bouncing ball,
f could for instance assign to each point in the phase space X the kinetic energy the system
has at that point; in this case we would have f = p²/2m, where m is the mass of the ball.
For every function we can take two kinds of averages. The first is the infinite time average
f*. The general idea of a time average is familiar from everyday contexts. You play the
lottery on three consecutive Saturdays. On the first you win $10; on the second you win
nothing; and on the third you win $50. Your average gain is ($10 + $0 + $50)/3 = $20.
Technically speaking this is a time average. This simple idea can easily be put to use in a
dynamical system: follow the system’s evolution over time (and remember that we are now
assuming time is discrete), take the value of the relevant function at each step, add the values,
and then divide by the number of steps. This yields (1/k) Σ_{i=0}^{k-1} f(T_i(x_0)), where
Σ_{i=0}^{k-1} f(T_i(x_0)) is
just an abbreviation for f(x_0) + f(T_1(x_0)) + f(T_2(x_0)) + ... + f(T_{k-1}(x_0)). This is the finite time
average for f after k steps. If the system’s state continues to evolve infinitely and we keep
tracking the system forever, then we get the infinite time average:
f*(x_0) = lim_{k→∞} (1/k) Σ_{i=0}^{k-1} f(T_i(x_0)),
where the symbol 'lim' (from Latin 'limes', meaning border or limit) indicates that we are
letting time tend towards infinity (in mathematical symbols: k → ∞). One point deserves special
attention, since it will become crucial later on: the presence of x_0 in the above expression.
Time averages depend on where the system starts; i.e. they depend on the initial condition. If
the process starts in a different state, the time average may well be different.
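The finite time average can be sketched numerically. The example below is purely illustrative (not from the text): it uses a circle rotation T(x) = x + a mod 1 and the function f(x) = x, and shows that finite time averages computed from two different initial conditions need not coincide:

```python
# Sketch of the finite time average (1/k) Σ_{i=0}^{k-1} f(T_i(x_0)),
# for an illustrative rotation T(x) = x + a mod 1 and f(x) = x.
import math

a = math.sqrt(2) - 1    # an irrational rotation angle (illustrative choice)

def T(x):
    return (x + a) % 1.0

def time_average(f, x0, k):
    total, x = 0.0, x0
    for _ in range(k):
        total += f(x)
        x = T(x)
    return total / k

f = lambda x: x
# The finite time average depends on the initial condition x_0:
print(time_average(f, 0.0, 10), time_average(f, 0.5, 10))
```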
Next we have the space average f̄. Let us again start with a colloquial example: the average
height of the students in a particular school. This is easily calculated: just take each student's
height, add up all the numbers, and divide the result by the number of students we have.
Technically speaking this is a space average. In the example the students in the school
correspond to the points in X; and the fact that we count each student once (we don't, for
instance, take John's height into account twice and omit Jim's) corresponds to the choice of a
measure that gives equal 'weight' to each point in X. The transformation T has no counterpart
in our example, and this is deliberate: space averages have nothing to do with the dynamics of
the system (that’s what sets them off from time averages). The general mathematical
definition of the space average is as follows:

f̄ = ∫_X f(x) dμ,

where ∫_X denotes the integral over the phase space X.8 If the space consists of discrete elements,
like the students of the school (they are 'discrete' in that you can count them), then the
integral becomes equivalent to a sum like the one we have when we determine the average
height of a population. If X is continuous (as the phase space above) things are a bit more
involved.
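The continuous case can be sketched numerically via the cell sum described in footnote 8. The example below is illustrative: it takes X = [0, 1) with the Lebesgue measure and f(x) = x, for which the integral is 1/2:

```python
# Sketch of a space average f̄ = ∫_X f dμ, approximated by the cell sum
# f(x_1)μ(c_1) + ... + f(x_m)μ(c_m) on the illustrative phase space
# X = [0, 1) with the Lebesgue measure.
m = 100_000                 # number of cells
cell = 1.0 / m              # Lebesgue measure of each cell

f = lambda x: x
# evaluate f at the midpoint of each cell and weight by the cell's measure
space_avg = sum(f((i + 0.5) * cell) * cell for i in range(m))
print(space_avg)            # ∫_0^1 x dx = 1/2
```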
3. Ergodicity
With these concepts in place, we can now define ergodicity.9 A dynamical system [X, μ, T] is
ergodic iff
8 The basic idea of an integral is the following: slice up the space into m small cells c_1, ..., c_m (e.g. by putting a
grid on it), then choose a point in each cell and take the value of f for that point. Then multiply that value with
the size of the cell (its measure) and add them all up: f(x_1)μ(c_1) + ... + f(x_m)μ(c_m), where x_1 is a point in c_1,
etc. Now we start making the cells smaller (and as a result we need more of them to cover X) until they become
infinitely small (in technical terms, we take the limit). That is the integral. Put simply, the integral is just
f(x_1)μ(c_1) + ... + f(x_m)μ(c_m) for infinitely small cells.
9 The concept of ergodicity has a long and complex history. For an account of this history see Sklar (1993, Ch. 2).
f* = f̄
for all complex-valued Lebesgue integrable functions f almost everywhere, meaning for
almost all initial conditions. The first qualification, ‘for all complex-valued Lebesgue
integrable functions’, is usually satisfied for the functions that are of interest in science. The
second qualification, ‘almost everywhere’, is non-trivial and is the source of a famous
problem in the foundations of statistical mechanics, the so-called ‘measure zero problem’ (to
which we turn in Section 4). So it is worth unpacking carefully what this condition involves.
Not all sets have a positive size. In fact, there are sets of measure zero. This may sound abstract
but is very natural. Take a ruler and measure the length of certain objects. You will find, for
instance, that your pencil is 17cm long – in the language of mathematics this means that the
one-dimensional Lebesgue measure of the pencil is 17. Now measure a geometrical point and
answer the question: how long is the point? The answer is that such a point has no extension
and so its length is zero. In mathematical parlance: a set consisting of a geometrical point is a
measure zero set. The same goes for a set of two geometrical points: two geometrical points
together also have no extension and hence have measure zero. Another example is the
following: you have a device to measure the surface of objects in a plane. You find out that an
A4 sheet has a surface of 623.7 square centimetres. Then you are asked what the surface of a
line is. The answer is: zero. Lines don’t have surfaces. So with respect to the two dimensional
Lebesgue measure lines are measure zero sets.
In the context of ergodic theory, ‘almost everywhere’ means, by definition, ‘everywhere in X
except, perhaps, in a set of measure zero’. That is, whenever a claim is qualified as ‘almost
everywhere’ it means that it could be false for some points in X, but these taken together have
measure zero. Now we are in a position to explain what the phrase means in the definition of
ergodicity. As we have seen above, the time average (but not the space average!) depends on
the initial condition. If we say that f* = f̄ almost everywhere, we mean that those initial
conditions for which f* ≠ f̄ taken together form a set of
measure zero – they are like a line in the plane.
Armed with this understanding of the definition of ergodicity, we can now discuss some
important properties of ergodic systems. Consider a subset A of X . For instance, thinking
again about the example of the oscillating ball, take the left half of the phase space. Then
define the so-called characteristic function of A, f_A, as follows: f_A(x) = 1 for all x in A and
f_A(x) = 0 for all x not in A. Plugging this function into the definition of ergodicity yields:
f_A* = μ(A). This means that the proportion of time that the system's state spends in set A is
proportional to the measure of that set. To make this even more intuitive, assume that the
phase space is normalised: μ(X) = 1 (this is a very common and unproblematic assumption).
If we then choose A so that μ(A) = 1/2, then we know that the system spends half of the
time in A; if μ(A) = 1/4 it spends a quarter of the time in A; etc. As we will see below, this
property of ergodic systems plays a crucial role in certain approaches to statistical mechanics.
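This 'time spent in A' property can be sketched numerically. The example below assumes the irrational circle rotation T(x) = x + a mod 1, a standard example of an ergodic system (not discussed in the text): for A = [0, 1/2), with μ(A) = 1/2, a long orbit should spend about half its time in A.

```python
# Numerical sketch: an orbit of an ergodic system spends a fraction μ(A)
# of its time in the set A. Illustrative system: irrational rotation of [0, 1).
import math

a = math.sqrt(2) - 1

def T(x):
    return (x + a) % 1.0

x, k, in_A = 0.2, 100_000, 0
for _ in range(k):
    if x < 0.5:             # characteristic function f_A of A = [0, 1/2)
        in_A += 1
    x = T(x)
fraction = in_A / k
print(fraction)             # should be close to μ(A) = 1/2
```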
Since we are free to choose A as we wish, we immediately get another important result: a
system can be ergodic only if its trajectory may access all parts of X of positive measure, i.e.
if the trajectory passes arbitrarily close to any point in X infinitely many times as time tends
towards infinity. And this implies that the phase space of ergodic systems is what is called
‘irreducible’ or ‘inseparable’: every set invariant under T (i.e. every set that is mapped onto
itself under T ) has either measure 0 or 1. As a consequence, X cannot be divided into two or
more subspaces (of non-zero measure) that are invariant under T . Conversely a non-ergodic
system is reducible. A reducible system is schematically illustrated in Figure 3.
Fig. 3: Reducible system: no point in region P evolves into region Q and vice versa.
Finally, we would like to state a theorem that will become important in Section 5. One can
prove that a system is ergodic iff
lim_{n→∞} (1/n) Σ_{k=0}^{n-1} μ(T_k(B) ∩ A) = μ(A)μ(B)   (E)
holds for all subsets A and B of X. Although this condition does not have an immediate
intuitive interpretation, we will see later on that it is crucial in understanding the kind of
randomness we find in ergodic systems.
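Condition (E) can be sketched numerically for an illustrative ergodic system, the irrational rotation T(x) = x + a mod 1 with A = B = [0, 1/2) (an example of ours, not from the text). For this map μ(T_k(B) ∩ A) can be computed exactly: T_k(B) is the arc [c, c + 1/2) with c = k·a mod 1, and its overlap with A has length |1/2 - c|. The running average should approach μ(A)μ(B) = 1/4:

```python
# Numerical sketch of condition (E) for an irrational rotation with
# A = B = [0, 1/2): the Cesàro average of μ(T_k(B) ∩ A) tends to μ(A)μ(B).
import math

a = math.sqrt(2) - 1
n = 5000
total = 0.0
for k in range(n):
    c = (k * a) % 1.0
    total += abs(0.5 - c)       # exact value of μ(T_k(B) ∩ A) for this map
cesaro = total / n
print(cesaro)                   # should be close to 1/4
```

Note that the individual terms μ(T_k(B) ∩ A) oscillate between 0 and 1/2; only their average converges, which is exactly what (E) requires.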
4. The Ergodic Hierarchy
It turns out that ergodicity is only the bottom level of an entire hierarchy of dynamical
properties. This hierarchy is called the ergodic hierarchy, and the study of this hierarchy is the
core task of a mathematical discipline called ergodic theory. This choice of terminology is
somewhat misleading: ergodicity is only the bottom level of the hierarchy, so EH
contains much more than ergodicity, and the scope of ergodic theory stretches far beyond
ergodicity. Ergodic theory (thus understood) is part of dynamical systems theory, which
studies a wider class of dynamical systems than ergodic theory (in particular ones that have no
measure at all).
EH is a nested classification of dynamical properties. The hierarchy is typically represented as
consisting of the following five levels:
Bernoulli ⊂ Kolmogorov ⊂ Strong Mixing ⊂ Weak Mixing ⊂ Ergodic
The diagram is intended to indicate that all Bernoulli systems are Kolmogorov systems, all
Kolmogorov systems are strong mixing systems, and so on. Hence all systems in EH are
ergodic. However, the converse relations need not hold: not all ergodic systems are weak
mixing, and so on. A system that is ergodic but not weak mixing is referred to in what follows
as merely ergodic and similarly for the next three levels.10
10 Sometimes EH is presented as having another level, namely C-systems (also referred to as Anosov systems or
completely hyperbolic systems). Although interesting in their own right, C-systems are beyond the scope of this
review. They do not have a unique place in EH and their relation to other levels of EH depends on details, which
we cannot discuss here. Paradigm examples of C-systems are located between K- and B-systems; that is, they are
K-systems but not necessarily B-systems. The cat map, for instance, is a C-system that is also a K-system
(Lichtenberg & Liebermann, 1992, p. 307); but there are K-systems, such as the so-called stadium billiard, which
are not C-systems (Ott, 1993, p. 262). Some C-systems preserve a smooth measure (where 'smooth' in this
context means absolutely continuous with respect to the Lebesgue measure), in which case they are Bernoulli
systems. But not all C-systems have smooth measures. It is always possible to find other measures such as SRB
(Sinai, Ruelle, Bowen) measures. However, matters are more complicated in such cases, as such C-systems need
not be mixing and a fortiori they need not be K- or B-systems (Ornstein & Weiss, 1991, pp. 75–82).
Mixing can be intuitively explained by the following example, first used by Gibbs in
introducing the concept of mixing. Begin with a glass of water, then add a shot of scotch; this
is illustrated in Figure 4a. The volume of the cocktail C (scotch + water) is μ(C) and the volume
of scotch that was added to the water is μ(S), so that in C the concentration of scotch is
μ(S)/μ(C).
Fig. 4: Mixing.
Now stir. Mathematically, stirring is represented by the time evolution T, meaning that T(S)
is the region occupied by the scotch after one unit of mixing time. Intuitively we say that the
cocktail is thoroughly mixed if the concentration of scotch equals μ(S)/μ(C) not only with
respect to the whole volume of fluid, but with respect to any region V in that volume. Hence,
the drink is thoroughly mixed at time n if μ(T_n(S) ∩ V)/μ(V) = μ(S)/μ(C) for any volume V
(of non-zero measure). Now assume that the volume of the cocktail is one unit: μ(C) = 1
(which we can do without loss of generality since there is always a unit system in which the
volume of the glass is one). Then the cocktail is thoroughly mixed iff
μ(T_n(S) ∩ V)/μ(V) = μ(S) for any region V (of non-zero measure). But how large must n be
before the stirring ends with the cocktail well stirred? We now suppose that the bartender
takes infinitely long to thoroughly mix the drink, so that mixing is achieved just in the infinite
limit: lim_{n→∞} μ(T_n(S) ∩ V)/μ(V) = μ(S) for any region V (of non-zero measure). If we now
associate the glass with the phase space X and replace the scotch S and the volume V with two
arbitrary subsets A and B of X, then we get without further ado the general definition of what
is called strong mixing (often also referred to just as 'mixing'): a system is strong mixing iff

lim_{n→∞} μ(T_n(A) ∩ B) = μ(A)μ(B)   (S-M)
for all subsets A and B of X. This requirement for mixing can be relaxed a bit by allowing for
fluctuations. That is, instead of requiring that the cocktail reach a uniform state of being
mixed, we now only require that it be mixed on average. In other words, we allow that
bubbles of either scotch or water may crop up every now and then, but they do so in a way
that these fluctuations average out as time tends towards infinity. This translates into
mathematics in a straightforward way. The deviation from the ideally mixed state at some
time n is μ(T_n(A) ∩ B) - μ(A)μ(B). The requirement that the average of these deviations
vanishes inspires the notion of weak mixing. A system is weak mixing iff

lim_{n→∞} (1/n) Σ_{k=0}^{n-1} |μ(T_k(A) ∩ B) - μ(A)μ(B)| = 0   (W-M)
for all subsets A and B of X. The vertical strokes denote the so-called absolute value; for instance:
|5| = 5 and |-5| = 5. One can prove that there is a strict implication relation between the three
dynamical properties we have introduced so far: strong mixing implies weak mixing, but not
vice versa; and weak mixing implies ergodicity, but not vice versa. Hence, strong mixing is a
stronger condition than weak mixing, and weak mixing is a stronger condition than ergodicity.
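The strong mixing condition (S-M) can be probed numerically. The sketch below uses the Arnold cat map on the unit square (the 'cat map' mentioned in footnote 10), a standard example of a strongly mixing system; the sets A = B = {(x, y) : x < 1/2} are an illustrative choice of ours. Initially μ(A ∩ B) = 1/2, but μ(T_n(A) ∩ B) should approach μ(A)μ(B) = 1/4:

```python
# Numerical sketch of strong mixing (S-M) for the Arnold cat map
# T(x, y) = (2x + y mod 1, x + y mod 1), with A = B = {(x, y) : x < 1/2}.
import random

random.seed(1)

def T(x, y):
    return (2 * x + y) % 1.0, (x + y) % 1.0

def mu_TnA_cap_B(n, N=100_000):
    """Monte Carlo estimate of μ(T_n(A) ∩ B): fraction of sampled points
    that lie in A and whose n-step image lies in B (this equals the target
    measure because T preserves the Lebesgue measure)."""
    hits = 0
    for _ in range(N):
        x, y = random.random(), random.random()
        in_A = x < 0.5
        for _ in range(n):
            x, y = T(x, y)
        if in_A and x < 0.5:
            hits += 1
    return hits / N

print(mu_TnA_cap_B(0))   # ≈ μ(A ∩ B) = 0.5 before any stirring
print(mu_TnA_cap_B(10))  # ≈ μ(A)μ(B) = 0.25 after mixing
```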
The next higher level in EH are K-systems. Unlike in the cases of ergodic and mixing
systems, there is unfortunately no intuitive way of motivating the standard definition of such
systems, and the definition is such that one cannot read off from it what the characteristics of K-
systems are (we state this definition in Appendix C). The least unintuitive way to present K-
systems is via a theorem due to Cornfeld et al. (1982, 283), who prove that a dynamical
system is a K-system iff it is K-mixing. A system is K-mixing iff for any subsets
A_0, A_1, ..., A_r of X (where r is a natural number of your choice) the following condition holds:

lim_{n→∞} sup_{B ∈ σ(n,r)} |μ(B ∩ A_0) - μ(B)μ(A_0)| = 0,   (K-M)

where σ(n,r) is the minimal σ-algebra generated by the set {T_k(A_j) : k ≥ n; j = 1, ..., r}. It is far
from obvious what this so-called sigma algebra is and hence the content of this condition is
not immediately transparent. We will come back to this issue in Section 6 where we provide
an intuitive reading of this condition. What matters for the time being is its similarity to the
mixing condition. Strong mixing is, trivially, equivalent to
lim_{n→∞} |μ(T_n(A) ∩ B) - μ(A)μ(B)| = 0.
So we see that K-mixing adds something to strong mixing.
In passing we would like to mention another important property of K-systems: one can prove
that K-systems have positive Kolmogorov-Sinai entropy (KS-entropy); for details see
Appendix C. The KS-entropy itself does not have an intuitive interpretation, but it relates
three other concepts of dynamical systems theory in an interesting way, and these do have
intuitive interpretations. First, Lyapunov exponents are a measure of how fast two originally
nearby trajectories diverge on average, and they are often used in chaos theory to characterise
the chaotic nature of the dynamics of a system. Under certain circumstances (essentially, the
system has to be differentiable and ergodic) one can prove that a dynamical system has a
positive KS-entropy if and only if it has positive Lyapunov exponents (Lichtenberg and
Liebermann 1992, 304). In such a system initially arbitrarily close trajectories diverge
exponentially. This result is known as Pesin's theorem. Second, the algorithmic complexity
of a sequence is the length of the shortest computer programme needed to reproduce the
sequence. Some sequences are simple; e.g. a string of a million '1's is simple: the programme
needed to reproduce it is basically 'write "1" a million times', which is very short. Others are
complex: there is no pattern in the decimal expansion of the number π that one could exploit,
and so a programme reproducing that expansion essentially reads 'write 3.14…', which is as
long as the decimal expansion of π itself. In the discrete case a trajectory can be represented
as a sequence of symbols of this kind (it is basically a list of states). It is then the case that if a
system is a K-system, then its KS-entropy equals the algorithmic complexity of almost all its
trajectories (Brudno 1978). This is known as Brudno’s theorem (Alekseev and Yakobson
1981). Third, the Shannon entropy is a common measure for the uncertainty of a future
outcome: the higher the entropy the more uncertain we are about what is going to happen.
One can then prove that, given certain plausible assumptions, the KS-entropy is equivalent to
a generalised version of the Shannon entropy, and can hence be regarded as a measure for the
uncertainty of future events given past events (Frigg 2004).
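The contrast between simple and complex sequences can be made vivid with a computable stand-in: algorithmic complexity itself is uncomputable, but the length of a losslessly compressed string gives a rough, computable upper-bound proxy. A minimal sketch (not part of Brudno's theorem, just an illustration of the idea; the string lengths and the compressor are arbitrary choices):

```python
import random
import zlib

# A string of a million '1's is algorithmically simple: "write '1' a
# million times" suffices to reproduce it.
simple = b"1" * 1_000_000

# A random-looking string of the same length has no pattern to exploit.
random.seed(0)
complex_ = bytes(random.getrandbits(8) for _ in range(1_000_000))

# Compressed length serves as a crude, computable proxy for complexity.
len_simple = len(zlib.compress(simple))
len_complex = len(zlib.compress(complex_))

print(len_simple, len_complex)
# the simple string compresses vastly better than the random-looking one
assert len_simple < len_complex // 10
```

The simple string shrinks to a few thousand bytes at most, while the random string remains close to its original length, mirroring the distinction the text draws between a short programme and one as long as the sequence itself.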
Bernoulli systems mark the highest level in EH. In order to define Bernoulli systems we first have
to introduce the notion of a partition of X (sometimes also a ‘coarse graining of X’). A partition
of X is a division of X into different parts (so-called ‘atoms of the partition’) so that these
parts don’t overlap and jointly cover X (i.e. they are mutually exclusive and jointly
exhaustive). For instance, in Figure 1 there is a partition on the phase space that has two
atoms (the left and the right part). More formally, α = {α_1, …, α_m} is a partition of X (and
the α_i its atoms) iff (i) the intersection of any two atoms of the partition is the empty set, and (ii)
the union of all atoms is X. Furthermore it is important to notice that a partition remains a
partition under the dynamics of the system. That is, if α is a partition, then
T^n α := {T^n α_1, …, T^n α_m} is a partition too for all n.
There are of course many different ways of partitioning a phase space, and so there exist
different partitions. In what follows we are going to study how different partitions relate to
each other. An important concept in this connection is independence. Let α and β be two
partitions of X. By definition, these partitions are independent iff μ(α_i ∩ β_j) = μ(α_i) μ(β_j)
for all atoms α_i of α and all atoms β_j of β. We will explain the intuitive meaning of this
definition (and justify calling it ‘independence’) in Section 5; for the time being we just use it
as a formal definition.
With these notions in hand we can now define a Bernoulli transformation: a transformation T
is a Bernoulli transformation iff there exists a partition α of X so that the images of α under
T at different instants of time are independent; that is, the partitions …, T^-1 α, T^0 α, T^1 α, … are all
independent.11 In other words, T is a Bernoulli transformation iff
11 To be precise, a second condition has to be satisfied: α must be T-generating (Mañé 1983, 87). However,
what matters for our considerations is the independence condition.
μ(α_i ∩ α_j) = μ(α_i) μ(α_j)   (B)
for all atoms α_i of T^k α and all atoms α_j of T^l α, for all k ≠ l. We then refer to α as the
Bernoulli partition, and we call a dynamical system [X, μ, T] a Bernoulli system if T is a
Bernoulli automorphism, i.e. a Bernoulli transformation mapping X onto itself.
Let us illustrate this with a well-known example, the baker’s transformation (so named
because of its similarity to the kneading of dough). This transformation maps the unit square
onto itself. Using standard Cartesian coordinates the transformation can be written as follows:
T(x, y) = (2x, y/2) for 0 ≤ x < 1/2, and T(x, y) = (2x − 1, y/2 + 1/2) for 1/2 ≤ x ≤ 1.
In words, for all points (x, y) in the unit square that have an x-coordinate smaller than 1/2, the
transformation T doubles the value of x and halves the value of y. For all the points (x, y) that
have an x-coordinate greater than or equal to 1/2, T transforms x into 2x − 1 and y into y/2 + 1/2. This
is illustrated in Fig. 5a.
Fig 5a. The Baker’s transformation
Now regard the two areas shown above as the two atoms of a partition β = {β_1, β_2}. It is then
easy to see that β and Tβ are independent: μ(β_1 ∩ Tβ_2) = μ(β_1) μ(Tβ_2) = 1/4, and similarly
for all other atoms of β and Tβ. This is illustrated in Figure 5b.
Fig 5b. The independence of β and Tβ.
One can prove that independence holds for all other iterates of β as well. So the baker’s
transformation together with the partition β is a Bernoulli transformation.
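The independence claim can be checked numerically. The sketch below estimates the joint measures μ(β_i ∩ Tβ_j) by Monte Carlo sampling under the Lebesgue measure; the sample size and tolerance are choices made for this illustration:

```python
import random

def baker(x, y):
    """Baker's map on the unit square."""
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, y / 2 + 0.5

random.seed(1)
n = 200_000
counts = {(i, j): 0 for i in (1, 2) for j in (1, 2)}
for _ in range(n):
    x, y = random.random(), random.random()   # uniform = Lebesgue measure
    j = 1 if x < 0.5 else 2                   # atom of beta containing (x, y)
    tx, ty = baker(x, y)
    i = 1 if tx < 0.5 else 2                  # atom of beta containing T(x, y)
    # (x, y) in beta_j with T(x, y) in beta_i means (x, y) in beta_j ∩ T⁻¹beta_i;
    # since T preserves the measure, this set has measure mu(T beta_j ∩ beta_i).
    counts[(i, j)] += 1

# each product mu(beta_i) * mu(T beta_j) equals 1/2 * 1/2 = 1/4
for pair, c in counts.items():
    assert abs(c / n - 0.25) < 0.01, (pair, c / n)
print("beta and T(beta) are independent (up to sampling error)")
```

All four joint frequencies come out close to 1/4, matching the product of the marginal measures, which is exactly condition (B) for this pair of partitions.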
In the literature Bernoulli systems are often introduced using so-called shift maps (or
Bernoulli shifts). Those readers not familiar with these can skip the rest of this section without
loss for what follows; for the other readers we here briefly indicate how shift maps are related
to Bernoulli systems with the example of the baker’s transformation (for a more general
discussion see Appendix D). Choose a point in the unit square and write its x and y co-ordinates
as binary numbers: x = 0.a_1 a_2 a_3 … and y = 0.b_1 b_2 b_3 …, where all the a_i and b_i are either 0 or 1.
Now put both strings together back to back with a dot in the middle to form one infinite
string: S = … b_3 b_2 b_1 . a_1 a_2 a_3 …, which may represent the state of the system just as a ‘standard’
two-dimensional vector does. Some straightforward algebra then shows that
T(0.a_1 a_2 a_3 …, 0.b_1 b_2 b_3 …) = (0.a_2 a_3 a_4 …, 0.a_1 b_1 b_2 …). From this we see that in our ‘one string’
representation of the point the operation of T amounts to shifting the dot one position to the
right: TS = … b_3 b_2 b_1 a_1 . a_2 a_3 …. Hence, the baker’s transformation is equivalent to a shift on an
infinite string of zeros and ones.12
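At finite precision the shift behaviour can be verified directly. The sketch below extracts the first binary digits of a point's coordinates and confirms that applying the baker's map permutes them exactly as described; the sample point and the number of digits are arbitrary choices:

```python
def baker(x, y):
    # Baker's map on the unit square
    if x < 0.5:
        return 2 * x, y / 2
    return 2 * x - 1, y / 2 + 0.5

def bits(z, n=20):
    """First n binary digits of z = 0.z1 z2 z3 ..."""
    return [int(z * 2 ** k) % 2 for k in range(1, n + 1)]

x, y = 0.7125, 0.3875            # arbitrary point in the unit square
a, b = bits(x), bits(y)
x2, y2 = baker(x, y)
a2, b2 = bits(x2), bits(y2)

# T(0.a1 a2 a3..., 0.b1 b2 b3...) = (0.a2 a3 a4..., 0.a1 b1 b2...):
assert a2[:19] == a[1:20]            # digits of x shifted left by one
assert b2[:20] == [a[0]] + b[:19]    # a1 prepended to the digits of y
print("the baker's map acts as a shift on the binary string")
```

In the one-string picture this is precisely the dot moving one place to the right: the digit a_1 migrates from the head of the x-string to the head of the y-string.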
There are two further notions that are crucial to the theory of Bernoulli systems, the property
of being weak Bernoulli and very weak Bernoulli. These properties play a crucial role in
showing that certain transformations are in fact Bernoulli. The example of the baker’s
12 A formal isomorphism proof can be found in Cornfeld et al. (1982, 9-10).
transformation is one of the few examples that have a geometrically simple Bernoulli
partition, and so one often cannot prove directly that a system is a Bernoulli system. One then
shows that a certain geometrically simple partition is weak Bernoulli and uses a theorem due
to Ornstein to the effect that if a system is weak Bernoulli then there exists a Bernoulli
partition for that system. The mathematics of these notions and the associated proofs of
equivalence are intricate and a presentation of them is beyond the scope of this entry. The
interested reader is referred to Ornstein (1974) or Shields (1973).
5. The Ergodic Hierarchy and Statistical Mechanics
The concepts of EH, and in particular ergodicity itself, play important roles in the foundation
of statistical mechanics (SM). In this section we review what these roles are.
A discussion of statistical mechanics faces an immediate problem. Foundational debates in
many other fields of physics can take as their point of departure a generally accepted
formalism. Not so in SM. Unlike, say, quantum mechanics and relativity theory, SM has not
yet found a generally accepted theoretical framework, let alone a canonical formulation.13
What we find in SM is a plethora of different approaches and schools, each with its own
programme and mathematical apparatus.14 However, all these schools use (slight variants of)
either of two theoretical frameworks, one of which can be associated with Boltzmann (1877)
and the other with Gibbs (1902), and can thereby be classified as either ‘Boltzmannian’ or
‘Gibbsian’. For this reason we divide our presentation of SM into two parts, one for each of
these families of approaches.
Before delving into a discussion of these theories, let us briefly review the basic tenets of SM
by dint of a common example. Consider a gas that is confined to the left half of a box. Now
we remove the barrier separating the two halves of the box. As a result, the gas quickly
disperses, and it continues to do so until it homogeneously fills the entire box. The gas has
approached equilibrium. This raises two questions. First, how is equilibrium characterised?
13 A similar situation exists for quantum field theory, which has a number of inequivalent formulations including
(to name just a few) the canonical, algebraic, axiomatic, and path integral frameworks.
14 For detailed reviews of SM see Frigg (2008), Sklar (1993) and Uffink (2007). Those interested in the long and
intricate history of SM are referred to Brush (1976) and von Plato (1994).
That is, what does it take for a system to be in equilibrium? Second, how do we characterise
the approach to equilibrium? That is, what are the salient features of the approach to
equilibrium and what features of a system make it behave in this way? These questions are
addressed in two subdisciplines of SM: equilibrium SM and non-equilibrium SM.
There are two different ways of describing processes like the spreading of a gas.
Thermodynamics (TD) describes the system using a few macroscopic variables (in the case
of the gas pressure, volume and temperature) and characterises both equilibrium and the
approach to equilibrium in terms of the behaviour of these variables, while completely
disregarding the microscopic constitution of the gas. As far as TD is concerned matter could
be a continuum rather than consisting of particles – it just would not make any difference. For
this reason TD is a ‘macro theory’.
The cornerstone of TD is the so-called Second Law of TD. This law describes one of the
salient features of the above process: its unidirectionality. We see gases spread – i.e. we see
them evolving towards equilibrium – but we never observe gases spontaneously reverting to
the left half of a box – i.e. we never see them move away from equilibrium when left alone.
And this is not a specific feature of gases: all other macroscopic
systems behave in this way as well, irrespective of their specific makeup. This fact is enshrined in the
Second Law of TD, which, roughly, states that transitions from equilibrium to non-
equilibrium states cannot occur in isolated systems, which is the same as saying that entropy
cannot decrease in isolated systems (where a system is isolated if it has no interaction with its
environment: there is no heat exchange, no one is compressing the gas, etc.).
But there is an altogether different way of looking at that same gas. The gas consists of a large
number of gas molecules (a vessel on a laboratory table contains something like 10^23
molecules). These molecules bounce around under the influence of the forces exerted onto
them when they crash into the walls of the vessel and collide with each other. The motion of
each molecule under these forces is governed by laws of classical mechanics in the same way
as the motion of the bouncing ball. So rather than attributing some macro variables to the gas
and focussing on them, we could try to understand the gas’ behaviour by studying the
dynamics of its micro constituents.
This also raises the question of how the two ways of looking at the gas fit together. Since
neither the thermodynamic nor the mechanical approach is in any way privileged, both have
to lead to the same conclusions. Statistical mechanics is the discipline that addresses this task.
From a more abstract point of view we can therefore also say that SM is the study of the
connection between micro-physics and macro-physics: it aims to account for a system’s
macro behaviour in terms of the dynamical laws governing its microscopic constituents. The
term ‘statistical’ in its name is owed to the fact that, as we will see, a mechanical explanation
can only be given if we also introduce probabilistic elements into the theory.
5.1 Boltzmannian SM
We first introduce the main elements of the Boltzmannian framework and then turn to the use
of ergodicity in it. Every system can possess various macrostates M_1, …, M_k. These
macrostates are characterised by the values of macroscopic variables, in the case of a gas:
pressure, temperature, and volume.15 In the introductory example one macro-state corresponds
to the gas being confined to the left half, another one to it being spread out. In fact, these two
states have special status: the former is the gas’ initial state; the latter is the gas’ equilibrium
state. We label these states M_I and M_Eq respectively.
It is one of the fundamental posits of the Boltzmann approach that macrostates supervene on
microstates, meaning that a change in a system’s macrostate must be accompanied by a
change in its microstate (for a discussion of supervenience see McLaughlin and Bennett 2005,
and references therein). For instance, it is not possible to change the pressure of a system and
at the same time keep its micro-state constant. Hence, to every given microstate x there
corresponds exactly one macrostate. Let us refer to this macrostate as M(x). This
determination relation is not one-to-one; in fact many different x can correspond to the same
macrostate. We now group together all microstates x that correspond to the same macro-state,
which yields a partitioning of the phase space into non-overlapping regions, each corresponding
to a macro-state. For this reason we also use the same letters, M_1, …, M_k, to refer to macro-states
and the corresponding regions in phase space. Two macrostates are of particular
15 It is a common assumption in the literature on Boltzmannian SM that there is a finite number of macrostates
that a system can possess. We should point out, however, that this assumption is based on an idealisation if the
relevant macro variables are continuous. In fact, we obtain a finite number of macrostates only if we coarse-grain
the values of the continuous variables.
importance: the macrostate at the beginning of the process, which is also referred to as the
‘past state’, and the equilibrium state. For this reason we introduce special labels for them,
M_p and M_eq, respectively, and choose the numbering of the macrostates so that M_1 = M_p
and M_k = M_eq (which, trivially, we always can). This is illustrated in Figure 6a.
Figure 6: The macrostate structure of X.
We are now in a position to introduce the Boltzmann entropy. To this end recall that we have
a measure μ on the phase space that assigns to every set a particular volume, hence a fortiori
also to macrostates. With this in mind, the Boltzmann entropy of a macro-state M_j can be
defined as S_B(M_j) = k_B log[μ(M_j)], where k_B is the Boltzmann constant. The important
feature of the logarithm is that it is a monotonic function: the larger μ(M_j), the larger its
logarithm. From this it follows that the largest macro-state also has the highest entropy!
One can show that, at least in the case of dilute gases, the Boltzmann entropy coincides with
the thermodynamic entropy (in the sense that both have the same functional dependence on
the basic state variables), and so it is plausible to say that the equilibrium state is the macro-
state for which the Boltzmann entropy is maximal (since TD posits that entropy be maximal
for equilibrium states). By assumption the system begins in a low entropy state, the initial
state M_I (the gas being squeezed into the left half of the box). The problem of explaining the
approach to equilibrium then amounts to answering the question: why does a system
originally in M_I eventually move into M_Eq and then stay there? (see Figure 6b)
In the 1870s Boltzmann offered an important answer to this question.16 At the heart of his
answer lies the idea to assign probabilities to macrostates according to their size. So
Boltzmann adopted the following postulate: p(M_j) = c μ(M_j) for all j = 1, …, k, where c is a
normalisation constant assuring that the probabilities add up to one. Granted this postulate, it
follows immediately that the most likely state is the equilibrium state (since the equilibrium
state occupies the largest chunk of the phase space). From this point of view it seems natural
to understand the approach to equilibrium as the evolution from an unlikely macrostate to a
more likely macrostate and finally to the most likely macro-state. This, Boltzmann argued,
was a statistical justification of the Second Law of TD.
But Boltzmann knew that simply postulating p(M_j) = c μ(M_j) would not solve any problems
unless the postulate could be justified in terms of the dynamics of the system. This is where
ergodicity enters the scene. As we have seen above, ergodic systems have the property of
spending a fraction of time in each part of the phase space that is proportional to its size (with
respect to μ). As we have also seen, the equilibrium state is the largest macrostate. And what
is more, the equilibrium state in fact is much larger than the other states. So if we assume that
the system is ergodic, then it is in equilibrium most of the time! It is then natural to interpret
p(M_j) as a time average: p(M_j) is the fraction of time that the system spends in state M_j
over the course of time. Now all the elements of Boltzmann’s position are on the table: (a)
partition the phase space of the system in macrostates and show that the equilibrium state is
by far the largest state; (b) adopt a time average interpretation of probability; (c) assume that
the system in question is ergodic. It then follows that the system is most likely to be found in
equilibrium, which justifies the Second law.
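A hard-sphere gas is far beyond a quick simulation, but the core claim that an ergodic system spends a fraction of time in a region proportional to that region's measure can be illustrated with the simplest standard example of an ergodic map, the irrational rotation of the circle. The rotation angle, region, and iteration count below are choices made for this sketch:

```python
import math

# Irrational rotation of the circle [0, 1): x -> x + phi (mod 1). For
# irrational phi this map is ergodic (a toy system, not a gas!).
phi = (math.sqrt(5) - 1) / 2       # golden-ratio angle, irrational
region = (0.2, 0.5)                # a "macrostate" of measure 0.3

x = 0.0
steps = 200_000
time_in_region = 0
for _ in range(steps):
    if region[0] <= x < region[1]:
        time_in_region += 1
    x = (x + phi) % 1.0

frac = time_in_region / steps
# the fraction of time spent in the region approaches the region's measure
assert abs(frac - 0.3) < 0.01
print(frac)
```

Here the fraction of iterates falling in the region converges to its measure, 0.3: exactly the behaviour that, for an ergodic gas, would make the large equilibrium macrostate the one occupied most of the time.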
Three objections have been levelled against this line of thought. First, it is pointed out that
assuming ergodicity is too strong, in two ways. The first is that it turns out to be extremely
difficult to prove that the systems of interest really are ergodic. Contrary to what is sometimes
asserted, not even a system of n elastic hard balls moving in a cubic box with hard reflecting
walls has been proven to be ergodic for arbitrary n; it has been proven to be ergodic only for
n ≤ 4. To this charge one could reply that what looks like defeat to some is in fact just a
16 Uffink (2004, 2007) provides an overview of the tangled development of Boltzmann’s constantly changing
views. Frigg (2009a) discusses probabilities in Boltzmann’s account.
challenge and progress in mathematics will eventually resolve the issue, and there is at least
one recent result that justifies optimism: Simanyi (2004) shows that a system of n hard balls
on a torus of dimension 3 or greater is ergodic, for an arbitrary natural number n.
The second way in which ergodicity seems to be too strong is that even if eventually we can
come by proofs of ergodicity for the relevant systems, the assumption is too strong because
there are systems that are known not to be ergodic and yet they behave in accordance with the
Second Law. Bricmont (2001) investigates the Kac Ring Model and a system of n uncoupled
anharmonic oscillators of identical mass, and points out that both systems exhibit
thermodynamic behaviour and yet they fail to be ergodic. Hence, ergodicity is not necessary
for thermodynamic behaviour. But Earman and Redei (1996, p. 70) and van Lith (2001a, p.
585) argue that if ergodicity is not necessary for thermodynamic behaviour, then ergodicity
cannot provide a satisfactory explanation for this behaviour. Either there must be properties
other than ergodicity that explain thermodynamic behaviour in cases in which the system is
not ergodic, or there must be an altogether different explanation for the approach to
equilibrium even for systems which are ergodic.
In response to this objection, Vranas (1998) argues that most systems that fail to be ergodic
are ‘almost ergodic’ in a specifiable way, and this is good enough. We discuss Vranas’
approach below when discussing Gibbsian SM since that is the context in which he has put
forward his suggestion. Frigg (2009b) suggested exploiting the fact that almost all
Hamiltonian systems are non-integrable, and that these systems have so-called Arnold webs,
i.e. large regions of phase space on which the motion of the system is ergodic. Lavis (2005)
re-examined the Kac ring model and has pointed out that although the system is not ergodic, it
has an ergodic decomposition (roughly, there exists a partition of the system’s phase space
and the dynamics is ergodic in each atom of the partition), which is sufficient to guarantee the
approach to equilibrium. He has also challenged the assumption, implicit in the above
criticism, that providing an explanation for the approach to equilibrium amounts to identifying
one (and only one!) property that all systems have in common. In fact, it may be the case that
different properties are responsible for the approach to equilibrium in different systems, and
there is no reason to rule out such explanations. This squares well with Bricmont’s (2001)
own observation that what drives a system of anharmonic oscillators to equilibrium is some
kind of mixing in the individual degrees of freedom. In sum, the tenor of all these responses is
that even though ergodicity simpliciter does not have the resources to explain the approach to
equilibrium, a somewhat qualified use of EH does.
The second objection is that even if ergodicity obtains, this is not sufficient to give us what
we need. As we have seen above, ergodicity comes with the qualification ‘almost
everywhere’. This qualification is usually understood as suggesting that sets of measure zero
can be ignored without detriment. The idea is that points falling in a set of measure zero are
‘sparse’ and can therefore be neglected. The question of whether or not this move is
legitimate is known as the ‘measure zero problem’.
Simply neglecting sets of measure zero seems to be problematic for various reasons. First,
sets of measure zero can be rather ‘big’; for instance, the rational numbers have measure zero
within the real numbers. Moreover, a set of measure zero need not be (or even appear)
negligible if sets are compared with respect to properties other than their measures. For
instance, we can judge the ‘size’ of a set by its cardinality or Baire category rather than by its
measure, which leads us to different conclusions about the set’s size (Sklar 1993, pp. 182-88).
It is also a mistake to assume that an event with measure zero cannot occur. In fact, having
measure zero and being impossible are distinct notions. Whether or not the system at some
point was in one of the special initial conditions for which the space and time mean fail to be
equal is a factual question that cannot be settled by appeal to measures; pointing out that such
points are scarce in the sense of measure theory does not do much, because it does not imply
that they are scarce in the world as well.
In response two things can be said. First, discounting sets of measure zero is standard practice in
physics and the problem is not specific to ergodic theory. So unless there is a good reason to
suspect that specific measure zero states are in fact important, one might argue that the onus
of proof is on those who think that discounting them in this case is illegitimate. Second, the
fact that SM works in so many cases suggests that they indeed are scarce.
The third criticism is rarely explicitly articulated, but it is clearly in the background of
contemporary Boltzmannian approaches to SM such as Albert’s (2000), which rejects
Boltzmann’s starting point, namely the postulate p(M_j) = c μ(M_j). Instead Albert introduces
another postulate, essentially providing transition probabilities between two macrostates
conditional on the so-called Past Hypothesis, the posit that the universe came into existence in
a low entropy state (the Big Bang). Albert then argues that in such an account ergodicity
becomes an idle wheel, and hence he rejects it as completely irrelevant to the foundations of
SM. This, however, may well be too hasty. Although it is true that ergodicity simpliciter
cannot justify Albert’s probability postulate, some dynamical assumption is needed in order
for this postulate to be true (Frigg 2009a). We don’t know yet what this assumption is, but EH
may well be helpful in discussing the issue and ultimately formulating a suitable condition.
5.2 Gibbsian SM
At the beginning of the Gibbs approach stands a radical rupture with the Boltzmann
programme. The object of study for the Boltzmannians is an individual system, consisting of a
large but finite number of micro constituents. By contrast, within the Gibbs framework the
object of study is a so-called ensemble. An ensemble is an imaginary collection of infinitely
many copies of the same system (they are the same in that they have the same phase space,
dynamics and measure), but which happen to be in different states. An ensemble of gases, for
instance, consists of infinitely many copies of the same gas which are, however, in different
states: one is concentrated in the left corner of the box, one is evenly distributed, etc. It is
important to emphasise that ensembles are fictions, or ‘mental copies of the one system under
consideration’ (Schrödinger 1952, 3). Hence, it is important not to confuse ensembles with
collections of micro-objects such as the molecules of a gas!
The instantaneous state of one system of the ensemble is specified by one point in its phase
space. The state of the ensemble as a whole is therefore specified by a density function ρ on
the system's phase space. From a technical point of view ρ is a function just like the f that we
encountered in Section 2. We furthermore assume that ρ is a probability density, reflecting
the probability of finding the state of a system chosen at random from the entire ensemble in
region R: p(R) = ∫_R ρ dμ. To make this more intuitive consider the following simple
example. You play a special kind of darts: you fix a plank to the wall, which serves as your
dart board. For some reason you know that the probability of your dart landing at a particular
place on the board is given by the curve shown in Figure 7. You are then asked what the
probability is that your next dart lands in the left half of the board. The answer is 1/2 since
one half of the surface underneath the curve is on the left side. In SM R plays the role of a
particular part of the board (in the example here the left half), and p(R) is the probability, not
of a dart landing there, but of finding a system there.
Fig. 7 Dart board.
The importance of this is that it allows us to calculate expectation values. Assume that the
game is such that you get one Pound if the dart hits the left half and three Pounds if it lands on
the right half. What is your average gain? The answer is 1 × 1/2 Pound + 3 × 1/2 Pounds = 2
Pounds. This is the expectation value. The same idea is at work in SM. Physical magnitudes
like, for instance, pressure are associated with functions f on the phase space. We then
calculate the expectation value, which, in general, is given by ⟨f⟩ = ∫ f ρ dμ. In the context of
Gibbsian SM these expectation values are also referred to as phase averages or ensemble
averages. They are of central importance because it is a fundamental posit of Gibbsian SM
that these values are what we observe in experiments! So if you want to use the formalism to
make predictions, you first have to figure out what the probability distribution ρ is, then find
the function f corresponding to the physical quantity you are interested in, and then calculate
the phase average. None of these steps is easy in practice and working physicists spend
most of their time doing these calculations. However, these difficulties need not occupy us if
we are interested in the conceptual issues underlying this ‘recipe’.
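The darts ‘recipe’ is easy to sketch numerically. The code below assumes, purely for illustration, a one-dimensional board [0, 1], a triangular density ρ peaking at 1/2 (a stand-in for the curve in Figure 7, chosen so that each half carries probability 1/2, as in the example), and the payoff function f from the text; it then computes the phase average ∫ f ρ dμ by simple quadrature:

```python
def rho(x):
    # symmetric triangular probability density on [0, 1]; integrates to 1
    return 4 * x if x < 0.5 else 4 * (1 - x)

def f(x):
    return 1.0 if x < 0.5 else 3.0   # 1 Pound left half, 3 Pounds right half

# phase average <f> = integral of f * rho, by midpoint quadrature
n = 100_000
dx = 1.0 / n
expectation = sum(f((i + 0.5) * dx) * rho((i + 0.5) * dx) * dx for i in range(n))

# each half carries probability 1/2, so <f> = 1 * 1/2 + 3 * 1/2 = 2
assert abs(expectation - 2.0) < 1e-6
print(expectation)
```

The computed value reproduces the 2-Pound average gain from the text; swapping in a different density or observable changes only the two small functions at the top.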
By definition, a probability distribution is stationary if it does not change over time. Given
that observable quantities are associated with phase averages and that equilibrium is defined
in terms of the constancy of the macroscopic parameters characterising the system, it is
natural to regard the stationarity of the distribution as a necessary condition for equilibrium
because stationary distributions yield constant averages. For this reason Gibbs refers to
stationarity as the ‘condition of statistical equilibrium’.
Among all stationary distributions those satisfying a further requirement, the Gibbsian
maximum entropy principle, play a special role. The Gibbs entropy (sometimes also
‘ensemble entropy’) is defined as S_G(ρ) = −k_B ∫ ρ log(ρ) dμ. The Gibbsian maximum entropy
principle then requires that S_G(ρ) be maximal, given the constraints that are imposed on the
system.
The last clause is essential because different constraints single out different distributions. A
common choice is to keep both the energy and the particle number in the system fixed. One
can prove that under these circumstances S_G(ρ) is maximal for the so-called microcanonical
distribution (or microcanonical ensemble). If we choose to hold the number of particles
constant while allowing for energy fluctuations around a given mean value we obtain the so-
called canonical distribution; if we also allow the particle number to fluctuate around a given
mean value we find the so-called grand-canonical distribution.17
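The maximum entropy principle can be illustrated in a discrete toy setting (this is an analogue, not Gibbs's continuous microcanonical derivation): with no constraint beyond normalisation over a fixed set of cells, the uniform distribution maximises −Σ_i p_i log p_i. A minimal sketch, with the number of cells and the random trials chosen arbitrarily:

```python
import math
import random

# Discrete analogue of the Gibbs entropy: S(p) = -sum_i p_i log p_i
# (Boltzmann's constant set to 1 for convenience).
def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

cells = 4
uniform = [1.0 / cells] * cells

# With only the normalisation constraint, no distribution beats the uniform one.
random.seed(3)
for _ in range(1000):
    w = [random.random() for _ in range(cells)]
    p = [wi / sum(w) for wi in w]
    assert entropy(p) <= entropy(uniform) + 1e-9

print(entropy(uniform))   # log(4), the maximum
```

Adding constraints (fixed mean energy, say) would single out non-uniform maximisers, which is the discrete counterpart of how the canonical and grand-canonical distributions arise.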
This formalism is enormously successful in that correct predictions can be derived for a vast
class of systems. But the success of this formalism is rather puzzling. The first and most
obvious question concerns the relation of systems and ensembles. The probability distribution
in the Gibbs approach is defined over an ensemble, the formalism provides ensemble
averages, and equilibrium is regarded as a property of an ensemble. But what we are really
interested in is the behaviour of a single system! What could the properties of an ensemble – a
fictional entity consisting of infinitely many mental copies of the real system – tell us about
the one real system on the laboratory table? And more specifically, why do averages over an
ensemble coincide with the values found in measurements performed on an actual physical
system in equilibrium? There is no obvious reason why this should be so, and it turns out that
ergodicity plays a central role in answering these questions.
17 For details see, for instance, Tolman (1938, Chs. 3 and 4).
Common textbook wisdom justifies the use of phase averages as follows. As we have seen the
Gibbs formalism associates physical quantities with functions f on the system’s phase space.
Making an experiment measuring one of these quantities takes time. So what measurement
devices register is not the instantaneous value of the function in question, but rather its time
average over the duration of the measurement; hence time averages are what is empirically
accessible. Then, so the argument continues, although measurements take an amount of time
that is short by human standards, it is long compared to microscopic time scales on which
typical molecular processes take place. For this reason it is assumed that the measured finite
time average is approximately equal to the infinite time average of the measured function. If
we now assume that the system is ergodic, then time averages equal phase averages. The
latter can easily be obtained from the formalism. Hence we have found the sought-after
connection: the Gibbs formalism provides phase averages which, by ergodicity, are equal to
infinite time averages, and these are, to a good approximation, equal to the finite time
averages obtained from measurements.
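The step ‘time averages equal phase averages’ is exactly what ergodicity delivers, and it can be seen at work in a toy ergodic system. The sketch below uses an irrational rotation of the circle with the observable f(x) = cos(2πx), whose phase average ∫ f dμ is 0; the initial point and iteration count are arbitrary choices:

```python
import math

# An ergodic toy system: irrational rotation x -> x + phi (mod 1).
phi = math.sqrt(2) % 1.0

def f(x):
    # observable with phase average 0 over the circle
    return math.cos(2 * math.pi * x)

x = 0.1           # arbitrary initial condition
steps = 100_000
total = 0.0
for _ in range(steps):
    total += f(x)
    x = (x + phi) % 1.0

time_average = total / steps
# the long finite time average is close to the phase average, 0
assert abs(time_average) < 0.01
print(time_average)
```

For this system the agreement sets in quickly; the argument criticised in the text needs the further, separate assumptions that measurements yield time averages and that finite averages approximate infinite ones.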
This argument is problematic for at least two reasons. First, from the fact that measurements
take some time it does not follow that what is actually measured are time averages. For
instance, it could be the case that the value provided to us by the measurement device is
simply the value assumed by f at the last moment of the measurement, irrespective of what the
previous values of f were (e.g. it’s simply the last pointer reading registered). So we would
need an argument for the conclusion that measurements indeed produce time averages.
Second, even if we take it for granted that measurements do produce finite time averages,
equating these with infinite time averages is problematic. Even if the duration of the
measurement is long by experimental standards (which need not be the case), finite and
infinite averages may assume very different values. That is not to say that they necessarily
have to be different; they could coincide. But whether or not they do is an empirical question,
which depends on the specifics of the system under investigation. So care is needed when
replacing finite with infinite time averages, and one cannot identify them without further
argument.
These criticisms call for a different strategy. Two suggestions stand out. Space constraints
prevent a detailed discussion and so we will only briefly indicate what the main ideas are;
more extensive discussion can be found in references given in footnote 13.
Malament and Zabell (1980) respond to this challenge by suggesting a way of explaining the
success of equilibrium theory that still invokes ergodicity, but avoids altogether appeal to time
averages. This avoids the above mentioned problems, but suffers from the difficulty that many
systems that are successfully dealt with by the formalism of SM are not ergodic. To
circumvent this difficulty Vranas (1998) has suggested replacing ergodicity with what he calls
ε-ergodicity. Intuitively a system is ε-ergodic if it is ergodic not on the entire phase space,
but on a very large part of it (those parts on which it is not ergodic having measure ε, where
ε is very small). The leading idea behind his approach is to challenge the commonly held
belief that even if a system is just a ‘little bit’ non-ergodic, then it behaves in a completely
‘un-ergodic’ way. Vranas points out that there is a middle ground and then argues that this
middle ground actually provides us with everything we need. This is a promising proposal,
but it faces three challenges. First, it needs to be shown that all relevant systems really are
ε-ergodic. Second, the argument so far has only been developed for the microcanonical
ensemble, but one would like to know whether, and if so how, it works for the canonical and
the grandcanonical ensemble. Third, it is still based on the assumption that equilibrium is
characterised by a stationary distribution, which, as we will see below, is an obstacle when it
comes to formulating a workable Gibbsian non-equilibrium theory.
The second response begins with Khinchin's work. Khinchin (1949) pointed out that the
problems of the ergodic programme are due to the fact that it focuses on too general a class of
systems. Rather than studying dynamical systems at a general level, we should focus on those
cases that are relevant in statistical mechanics. This involves two restrictions. First, we only
have to consider systems with a large number of degrees of freedom; second, we only need to
take into account a special class of phase functions, the so-called ‘sum functions’. These are
functions that are a sum of one-particle functions, i.e. functions that take into account only the
position and momentum of one particle. Under these assumptions Khinchin proved that as n
becomes larger, the measure of those regions on the energy hypersurface18 where the time and
the space means differ by more than a small amount tends towards zero. Roughly speaking,
this result says that for large n the system behaves, for all practical purposes, as if it was
ergodic.
The problem with this result is that it is valid only for sum functions, and in particular only if
the energy function of the system is itself a sum function, which usually is not the case
18 An energy hypersurface is a hypersurface in the system’s phase space on which the energy is constant.
whenever particles interact. So the question is how this result can be generalised to more
realistic cases. This problem stands at the starting point of a research programme now known
as the thermodynamic limit, championed, among others, by Lanford, Mazur, Ruelle, and van
der Linden (see van Lith (2001) for a survey). Its leading question is whether one can still
prove ‘Khinchin-like’ results in the case of energy functions with interaction terms.19 Results
of this kind can be proven in the limit n → ∞, if the volume V of the system also tends
towards infinity in such a way that the number density n/V remains constant.
So far we have only dealt with equilibrium, and things get worse once we deal with non-
equilibrium. The main problem is that it is a consequence of the formalism that the Gibbs
entropy is a constant! This precludes a characterisation of the approach to equilibrium in
terms of increasing Gibbs entropy, which is what one would expect if we were to treat the
Gibbs entropy as the SM counterpart of the thermodynamic entropy. The standard way around
this problem is to coarse-grain the phase space, and then define the so-called coarse-grained
Gibbs entropy. Put simply, coarse-graining the phase space amounts to putting a grid on the
phase space and declaring that all points within one cell of the grid are indistinguishable. This
procedure turns a continuous phase space into a discrete collection of cells, and the state of
the system is then specified by saying in which cell it is. If we then define the Gibbs entropy on
this grid, it turns out (for purely mathematical reasons) that the entropy is no longer a constant
and can actually increase or decrease. If one then assumes that the system is mixing, it follows
from the so-called convergence theorem of ergodic theory that the coarse-grained Gibbs
entropy approaches a maximum. However, this solution is fraught with controversy, the two
main bones of contention being the justification of the coarse-graining and the assumption
that the system is mixing. Again, we refer the reader to the references given in Footnote 12 for
a detailed discussion of these controversies.
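The coarse-graining procedure is easy to sketch numerically. In the following illustration of our own, the tent map stands in for a mixing dynamics, and the grid size and ensemble are arbitrary choices: the coarse-grained Gibbs entropy of an initially concentrated ensemble rises towards its maximum log 16.

```python
import math
import random

random.seed(0)

# Our illustration: partition [0, 1] into 16 cells, start with an ensemble
# concentrated in one cell, evolve it with the tent map (a mixing dynamics),
# and watch the coarse-grained Gibbs entropy S = -sum_i p_i log p_i rise
# towards its maximum log(16).
def tent(x):
    return 2 * x if x < 0.5 else 2 * (1 - x)

def coarse_grained_entropy(points, n_cells=16):
    counts = [0] * n_cells
    for x in points:
        counts[min(int(x * n_cells), n_cells - 1)] += 1
    n = len(points)
    return -sum(c / n * math.log(c / n) for c in counts if c)

points = [random.random() / 16 for _ in range(50_000)]
s_initial = coarse_grained_entropy(points)   # everything in one cell: S = 0
for _ in range(8):
    points = [tent(x) for x in points]
s_final = coarse_grained_entropy(points)     # spread over all cells: S near log(16)
```

The fine-grained Gibbs entropy of this ensemble would remain constant; it is only the coarse-grained quantity that increases.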
In sum, ergodicity plays a central role in many attempts to justify the posits of SM. And even
where a simplistic use of ergodicity is eventually unsuccessful, somewhat modified notions
prove fruitful in an analysis of the problem and in the search for better solutions.
19 To be more precise: what we are after is a proof for cases where there are nontrivial interaction terms, meaning
those for which there does not exist a canonical transformation that effectively eliminates such terms.
6. The Ergodic Hierarchy and Randomness
EH is often presented as a hierarchy of increasing degrees of randomness in deterministic
systems: the higher up in this hierarchy a system is placed the more random its behaviour.20
However, the definitions of different levels of EH do not make explicit appeal to randomness;
nor does the usual way of presenting EH involve a specification of the notion of randomness
that is supposed to underlie the hierarchy. So there is a question about what notion of
randomness underlies EH and in what sense exactly EH is a hierarchy of random behaviour.
Berkovitz, Frigg and Kronz (2006) discuss this problem and argue that EH is best understood
as a hierarchy of random behaviour if randomness is explicated in terms of unpredictability,
where unpredictability is accounted for in terms of probabilistic relevance, and different
degrees of probabilistic relevance, in turn, are spelled out in terms of different types of decay
of correlation between a system’s states at different times. Let us introduce these elements one
at a time.
Properties of systems can be associated with different parts of the phase space. In the ball
example, for instance, the property having positive momentum is associated with the right half
of the phase space; that is, it is associated with the set {x ∈ X : p > 0}. Generalising this idea we
say that to every subset A of a system’s phase space there corresponds a property P_A so that
the system possesses that property at time t iff the system’s state x is in A at t. The subset A
may be arbitrary and the property corresponding to A may not be intuitive, unlike, for
example, the property of having positive momentum. But nothing in the analysis to follow
hangs on a property being ‘intuitive’. We then define the event A_t as the obtaining of P_A at
time t.
At every time t there is a matter of fact whether P_A obtains, which is determined by the
dynamics of the system. However, we may not know whether or not this is the case. We
therefore introduce epistemic probabilities expressing our uncertainty about whether P_A
obtains: p(A_t) reflects an agent’s degree of belief in P_A’s obtaining at time t. In the same way
we can introduce conditional probabilities: p(A_t | B_t1) is our uncertainty that the system has
20 See for instance Lichtenberg & Liebermann (1992); Ott (1993); Tabor (1989).
P_A at t given that it had P_B at an earlier time t1, where B is also a subset of the system’s
phase space. By the usual rule of conditional probability we have
p(A_t | B_t1) = p(A_t & B_t1)/p(B_t1). This can of course be generalised to more than one event:
p(A_t | B1_t1 & … & Br_tr) is our uncertainty that the system has P_A at t given that it had
P_B1 at t1, P_B2 at t2, …, and P_Br at tr, where B1, …, Br are subsets of the system’s
phase space (and r a natural number), and t1, …, tr are successive instants of time
(i.e. t1 < t2 < … < tr < t).
Intuitively, an event in the past is relevant to our making predictions if taking the past event
into account makes a difference to our predictions, or more specifically if it lowers or raises
the probability for a future event. In other words, p(A_t | B_t1) − p(A_t) is a measure for the
relevance of B_t1 to predicting A_t: B_t1 is positively relevant if p(A_t | B_t1) − p(A_t) > 0,
negatively relevant if p(A_t | B_t1) − p(A_t) < 0, and irrelevant if p(A_t | B_t1) − p(A_t) = 0. For
technical reasons it turns out to be easier to work with p(B_t1)[p(A_t | B_t1) − p(A_t)] – which is
equivalent to p(A_t & B_t1) − p(A_t)p(B_t1) – rather than with p(A_t | B_t1) − p(A_t), but this makes
no conceptual difference since the multiplication with p(B_t1) does not alter relevance
relations. Therefore we adopt the following definition. The relevance of B_t1 for A_t is

R(B_t1, A_t) = p(A_t & B_t1) − p(A_t)p(B_t1).   (R)

The generalisation of this definition to cases with more than one set B (as above) is
straightforward.
Relevance serves to explicate unpredictability. Intuitively, the less relevant past events are for
A_t, the less predictable the system is. This basic idea can then be refined in various ways.
First, the type of unpredictability we obtain depends on the type of events to which (R) is
applied. For instance, the degree of the unpredictability of A_t increases if its probability is
independent not only of B_t1 or other ‘isolated’ past events, but rather of the entire past. Second,
the unpredictability of an event A_t increases if the probabilistic dependence of that event on
past events B_t1 decreases rapidly with the increase of the temporal distance between the
events. Third, the probability of A_t may be independent of past events simpliciter, or it may
be independent of such events only ‘on average’. These ideas underlie the analysis of EH as a
hierarchy of unpredictability.
Before we can provide such an analysis, two further steps are needed. First, if the probabilities
are to be useful to understanding randomness in a dynamical system, the probability
assignment has to reflect the properties of the system. So we have to connect the above
probabilities to features of the system. The natural choice is the system’s measure μ.21 So we
postulate that the probability of an event A_t is equal to the measure of the set A:
p(A_t) = μ(A) for all t. This can be generalised to joint probabilities as follows:

p(A_t & B_t1) = μ(T_{t−t1}B ∩ A),   (P)

for all instants of time t1 ≤ t and all subsets A and B of the system’s phase space. T_{t−t1}B is the
image of the set B under the dynamics of the system from t1 to t. We refer to this postulate as
the Probability Postulate (P). This is illustrated in Figure 8. Again, this condition is naturally
generalised to cases of joint probabilities of A_t with multiple events B_ti. Granted (P) and its
generalization, (R) reflects the dynamical properties of systems.
21 Provided that μ is normalised, which is the case in most systems studied in ergodic theory. Due to their
connection to μ some may be inclined not to interpret the p(A_t) as epistemic probabilities; in fact, in particular
in the literature on ergodic theory μ is often interpreted as a time average and so one could insist that p(A_t) be
a time average as well. While this could be done, it is not conducive to our analysis. Our goal is to explicate
randomness in terms of degrees of unpredictability, and to this end one needs to assume that the p(A_t) are epistemic
probabilities. However, contra radical Bayesianism, we posit that the values of these probabilities be constrained
(in the spirit of Lewis’ Principal Principle) by objective facts about the system (here the measure μ). But this does
not make these probabilities objective.
Figure 8: Condition (P)
Before introducing the next element of the analysis let us mention that there is a question
about whether the association of probabilities with the measure of the system is reasonable.
Prima facie, a measure on a phase space can have a purely geometrical interpretation and need
not necessarily have anything to do with the quantification of uncertainty. For instance, we
can use a measure to determine the length of a table, but this measure need not have anything
to do with uncertainty. Whether or not such an association is legitimate depends on the cases
at hand and the interpretation of the measure. However, for systems of interest in statistical
physics it is natural and indeed standard to assume that the probability of the system’s state
being in a particular subset A of the phase space X is proportional to the measure of A.
The last element to be introduced is the notion of the correlation between two subsets A and B
of the system’s phase space, which is defined as follows:

C(A, B) = μ(A ∩ B) − μ(A)μ(B).   (C)
If the value of ),( BAC is positive (negative), there is positive (negative) correlation between
A and B; if it is zero, then A and B are uncorrelated. It then follows immediately from the
above that
R(B_t1, A_t) = C(T_{t−t1}B, A).   (RC)

(RC) constitutes the basis for the interpretation of EH as a hierarchy of objective randomness.
Granted this equation, the subjective probabilistic relevance of the event B_t1 for the event A_t
reflects objective dynamical properties of the system, since for different transformations T,
R(B_t1, A_t) will indicate different kinds of probabilistic relevance of B_t1 for A_t.
To put (RC) to use, it is important to notice that the equations defining the various levels of
EH above can be written in terms of correlations. Taking into account that we are dealing with
discrete systems (and hence we have T_{t−t1}B = T^k B, where k is the number of time steps it
takes to get from t1 to t), these equations read:

Ergodicity: lim_{n→∞} (1/n) Σ_{k=0}^{n−1} C(T^k B, A) = 0, for all A, B ⊆ X

Weak mixing: lim_{n→∞} (1/n) Σ_{k=0}^{n−1} |C(T^k B, A)| = 0, for all A, B ⊆ X

Strong mixing: lim_{n→∞} C(T^n B, A) = 0, for all A, B ⊆ X

K-mixing: lim_{n→∞} sup_{B ∈ σ(n,r)} C(B, A) = 0, for all A, A_1, …, A_r ⊆ X

Bernoulli: C(T^n B, A) = 0, for all A, B of the Bernoulli partition.
Applying (RC) to these expressions, we can explicate the nature of the unpredictability that
each of the different levels of EH involves.
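These correlation conditions lend themselves to direct numerical estimation. The following Monte Carlo sketch is our own illustration (not part of the entry): it estimates C(T^n B, A) for the doubling map T(x) = 2x mod 1, a strongly mixing (indeed Bernoulli) system, with the illustrative choice A = B = [0, 0.3); the correlations visibly decay as n grows.

```python
import random

random.seed(1)

# Our illustration: Monte Carlo estimate of the correlation
#   C(T^n B, A) = p(x in B and T^n x in A) - mu(B) mu(A)
# for the doubling map T(x) = 2x mod 1, with A = B = [0, 0.3).
N = 200_000
xs = [random.random() for _ in range(N)]
in_set = lambda x: x < 0.3            # indicator of A = B = [0, 0.3)
mu = 0.3                              # Lebesgue measure of A and of B

corr = {}
for n in (1, 2, 4, 8):
    hits = 0
    for x in xs:
        y = x
        for _ in range(n):
            y = (2 * y) % 1.0         # n steps of the doubling map
        hits += in_set(x) and in_set(y)
    corr[n] = hits / N - mu * mu      # decays towards 0 as n grows
```

The exact values can be computed by hand for this map (C is 0.06 at n = 1 and below 0.001 by n = 8), so the simulation tracks the strong mixing condition quantitatively.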
Let us start at the top of EH. In Bernoulli systems the probabilities of the present state are
totally independent of whatever happened in the past, even if the past is only one time step
back. So knowing the past of the system does not improve our predictive abilities in the least;
the past is simply irrelevant to predicting the future. This fact is often summarised in the
slogan that Bernoulli systems are as random as a coin toss. We should emphasise, however,
that this is true only for events in the Bernoulli partition; the characterisation of a Bernoulli
system is silent about what random properties partitions other than the Bernoulli partition
have.
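The coin-toss slogan can be checked directly. The following is our own illustrative sketch: for an i.i.d. fair coin, the relevance (R) of the previous outcome for the current one is zero up to sampling noise, just as in a Bernoulli system the past carries no predictive information.

```python
import random

random.seed(2)

# Our illustration: for an i.i.d. fair coin, the relevance of the previous
# outcome for the current one,
#   R = p(heads now & heads one step back) - p(heads) * p(heads),
# vanishes up to sampling noise: the past never helps prediction.
flips = [random.randint(0, 1) for _ in range(200_000)]
pairs = list(zip(flips[1:], flips[:-1]))
joint = sum(1 for now, prev in pairs if now == 1 and prev == 1) / len(pairs)
R = joint - 0.5 * 0.5
```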
K-mixing is more difficult to analyse. We now have to tackle the question of how to
understand σ(n,r), the minimal σ-algebra generated by the set {T^k A_j : k ≥ n; j = 1, …, r}, that
we sidestepped above. What matters for our analysis is that the following types of sets are
members of σ(n,r) (ibid., 669): T^k A_{j_0} ∩ T^{k+1} A_{j_1} ∩ T^{k+2} A_{j_2} ∩ …, where the indices j_i range
over 1, …, r. Since we are free to choose the sets A_1, …, A_r as we please, we can always
choose them so that they are the past history of the system: the system was in A_{j_0} k time steps
back, in A_{j_1} k+1 time steps back, etc. Call this the (coarse-grained) remote past of the system
– ‘remote’ because we only consider states that are more than k time steps back. The K-
mixing condition then says that the system’s entire remote past history becomes irrelevant to
predicting what happens in the future as time tends towards infinity. Typically Bernoulli
systems are compared with K-systems by focussing on the events in the Bernoulli partition.
With respect to that partition, K is weaker than Bernoulli. The difference concerns both the limit and
the remote history. In a Bernoulli system the future is independent of the entire past (not only
the remote past), and this is true without taking a limit (while in the case of K-mixing
independence obtains only in the limit). However, this only holds for the Bernoulli partition;
it may or may not hold for other partitions – the definition of a Bernoulli system says nothing
about that case.22
22 We would also like to mention that the analysis of randomness in Bernoulli and K-systems is based on
implications of the definitions of these systems, but they do not exhaust these definitions (or provide verbal
restatements of them) because there are parts of the definitions that have not been used (in the case of Bernoulli
systems the condition that there be a generating partition, and in the case of K-systems sets in σ(n,r) other than
ones of the form T^k A_{j_0} ∩ T^{k+1} A_{j_1} ∩ T^{k+2} A_{j_2} ∩ …). By contrast, the analyses of SM, WM, and E in the following
paragraphs exhaust the respective definitions. In the case of Bernoulli this has the consequence that the
The interpretation of strong mixing is now straightforward. It says that for any two sets A and
B, having been in B k time steps back becomes irrelevant to the probability of being in A some
time in the future if time tends towards infinity. In other words, past events B become
increasingly irrelevant for the probability of A as the temporal distance between A and B
becomes larger. This condition is weaker than K-mixing because it only says that the future
is independent of isolated events in the remote past, while K-mixing implies independence of
the entire remote past history.
In weakly mixing systems the past may be relevant to predicting the future, even in the remote
past. The weak mixing condition only says that this influence has to be weak enough for it to
be the case that the absolute value of the correlations between a future event and past events
vanishes on average; but, again, this does not mean that individual correlations vanish. So in
weakly mixing systems the past can remain relevant to the future.
Ergodicity, finally, implies no decay of correlation at all. The ergodicity condition only says
that the average of the correlations (and this time without an absolute value) of all past events
with the relevant future event is zero. But this is compatible with there being strong
correlations between every instant in the past and the future, provided that positive and
negative correlations average out. So in ergodic systems the past does not become irrelevant.
For this reason ergodic systems are not random at all (in the sense of randomness introduced
above). One could say that they mark, as it were, the zero point on the scale of randomness.23
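This contrast can be made concrete. The sketch below is our own illustration: for the irrational rotation of the circle (ergodic but not mixing), with the illustrative choice A = B = [0, 0.5), the correlation C(T^n B, A) can be computed exactly from the interval overlap. Individual correlations keep returning to ±0.25 forever, yet their running (Cesàro) average vanishes, which is exactly the ergodicity condition and nothing more.

```python
import math

# Our illustration: the rotation x -> x + alpha (mod 1) is ergodic but not
# mixing. With A = B = [0, 0.5), T^n B is the interval [s, s + 0.5) mod 1
# where s = n*alpha mod 1, so C(T^n B, A) = mu(T^n B intersect A) - mu(B)mu(A)
# can be computed exactly from the interval overlap.
alpha = (math.sqrt(5) - 1) / 2

def corr(n):
    s = (n * alpha) % 1.0                       # shift of B after n steps
    overlap = 0.5 - s if s <= 0.5 else s - 0.5  # mu(T^n B intersect A)
    return overlap - 0.25

N = 20_000
cesaro = sum(corr(k) for k in range(1, N + 1)) / N   # near 0: ergodicity
peak = max(abs(corr(k)) for k in range(1, N + 1))    # near 0.25: no decay
```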
7. The ‘no-application’ charge
How relevant are these insights to understanding the behaviour of actual systems? A
frequently heard objection (which we have already encountered in Section 5) is that EH and
characterisation given here also applies to some systems that are not ergodic. For a discussion of such cases see
Werndl (2009a, section 4.2.2).
23 They are not the only systems occupying the zero level. Periodic systems, for instance, are not random either.
We do not discuss other non-random systems here because they are not part of EH.
more generally ergodic theory are irrelevant since most systems (including those that we are
ultimately interested in) are not ergodic at all.24
This charge is less acute than it appears at first glance. First, it is important to emphasise that
it is not the sheer number of applications that makes a physical concept important, but whether
there are some important systems that are ergodic. And there are examples of such systems.
For example, the so-called ‘hard-ball systems’ (and some more sophisticated variants of them)
are effective idealizations of the dynamics of gas molecules, and these systems seem to be
ergodic (for details, see Berkovitz, Frigg and Kronz 2006, Section 4.2, and references
therein).
Furthermore, EH can be used to characterize randomness and chaos in systems that are not
ergodic. Even if a system as a whole is not ergodic (i.e. if it fails to be ergodic with respect to
the entire phase space X) there can be (and usually there are) subsets of X on which the
system is ergodic. This is what Lichtenberg and Liebermann (1992, p. 295) have in mind when
they observe that ‘[i]n a sense, ergodicity is universal, and the central question is to
define the subspace over which it exists’. In fact, non-ergodic systems may have subsets that
are not only ergodic, but even Bernoulli! It then becomes an interesting question what these
subsets are, of what measure they are, and what topological features they have. These are
questions studied in parts of dynamical systems theory, most notably KAM theory. Hence,
KAM theory does not demonstrate that ergodic theory is not useful in analyzing the
dynamical behavior of real physical systems (as is often claimed). Indeed, KAM systems have
regions in which the system manifests either merely ergodic or Bernoulli behaviour, and
accordingly EH is useful for characterising the dynamical properties of such systems
(Berkovitz, Frigg and Kronz 2006, Section 4). Further, as we have mentioned in Section 5,
almost all Hamiltonian systems are non-integrable, and accordingly they have large regions of
the phase space in which their motion is ergodic-like. So EH is a useful tool in studying the
dynamical properties of systems even if the system fails to be ergodic tout court.
Another frequently heard objection is that EH is irrelevant in practice because most levels of
EH (in fact, all except Bernoulli) are defined in terms of infinite time limits and hence remain
24 This bit of conventional wisdom is backed up by a theorem by Markus and Meyer (1974), which is based on
KAM theory, and basically says that generic Hamiltonian dynamical systems are not ergodic.
silent about what happens in finite time. But all we ever observe are finite times and so EH is
irrelevant to physics as practiced by actual scientists.
This charge can be dispelled by a closer look at the definition of a limit, which shows that
infinite limits in fact have important implications for the dynamical behaviour of the system in
finite times. The definition of a limit is as follows (where f is an arbitrary function of time):

lim_{t→∞} f(t) = c iff for every ε > 0 there exists a t′ > 0 so that for all t > t′ we have |f(t) − c| < ε.

In words, for every number ε, no matter how small, there is a finite time t′ after which the
values of f differ from c by less than ε. That is, once we are past t′ the values of f never move
more than ε away from c. With this in mind strong mixing, for instance, says that for a given
threshold ε there exists a finite time t_n (n units of time after the current time) after which
|C(T^n B, A)| is always smaller than ε. We are free to choose ε to be an empirically relevant
margin, and so we know that if a system is mixing, we should expect the correlations between
the states of the system after t_n and its current state to be below ε. The upshot is that in strong
mixing systems, being in a state B at some past time becomes increasingly irrelevant for its
probability of being in the state A now, as the temporal distance between A and B becomes
larger. Thus, the fact that a system is strong mixing clearly has implications for its dynamical
behaviour in finite times.
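The ε–t′ reading of the limit can be made concrete with a toy correlation function. In the sketch below, both the decay law C(n) = 0.5 · 0.8^n and the threshold ε are our own stand-ins, not properties of any particular system; the point is only that the limit statement yields a concrete finite time after which the correlations stay below the chosen margin.

```python
import math

# Toy illustration (our own stand-in): suppose a strongly mixing system's
# correlations decay as C(n) = 0.5 * 0.8**n, so lim C(n) = 0.
C = lambda n: 0.5 * 0.8 ** n
eps = 0.01                      # an empirically relevant margin, chosen freely

# the finite time after which C(n) stays below eps forever:
n_prime = math.ceil(math.log(eps / 0.5) / math.log(0.8))
assert all(C(n) < eps for n in range(n_prime, n_prime + 1000))
```

For these stand-in numbers the threshold is reached after 18 time steps; a smaller ε simply pushes the finite time t′ further out, exactly as the definition of the limit prescribes.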
Since different levels of EH correspond to different degrees of randomness, each explicated in
terms of a different type of asymptotic decay of correlations between states of systems at
different times, one might suspect that a similar pattern can be found in the rates of decay.
That is, one might be tempted to think that EH can equally be characterized as a hierarchy of
increasing rates of decay of correlations: a K-system, for instance, which exhibits exponential
divergence of trajectories, would be characterized by an exponential rate of decay of
correlations, while an SM-system would exhibit a polynomial rate of decay.
This, unfortunately, does not work. Natural as it may seem, EH cannot be interpreted as a
hierarchy of increasing rates of decay of correlations. It is a mathematical fact that there is no
particular rate of decay associated with each level of EH. For instance, one can construct K-
systems in which the decay is as slow as one wishes it to be. So the rate of decay is a feature
of certain properties of a system rather than of a level of EH.
8. The Ergodic Hierarchy and Chaos
The question of how to characterise chaos has been controversially discussed ever since the
inception of chaos theory; for a survey see Smith (1998, Ch. 10). An important family of
approaches defines chaos using EH. Belot and Earman (1997, 155) state that being strong
mixing is a necessary condition and being a K-system is a sufficient condition for a system to
be chaotic. The view that being a K-system is the mark of chaos and that any lower degree of
randomness is not chaotic is frequently motivated by two ideas. The first is the idea that
chaotic behaviour involves dynamical instability in the form of exponential divergence of
nearby trajectories. Thus, since a system involves an exponential divergence of nearby
trajectories only if it is a K-system, it is concluded that (merely) ergodic and mixing systems
are not chaotic whereas K- and B-systems are. It is noteworthy, however, that SM is
compatible with there being polynomial divergence of nearby trajectories and that such
divergence sometimes exceeds exponential divergence in the short run. Thus, if chaos is to be
closely associated with the rate of divergence of nearby trajectories, there seems to be no
good reason to deny that SM systems exhibit chaotic behaviour.
The second common motivation for the view that being a K-system is the mark of chaos is the
idea that the shift from zero to positive KS-entropy marks the transition from a ‘regular’ to
‘chaotic’ behaviour. This may suggest that having positive KS-entropy is both necessary and
sufficient condition for chaotic behaviour. Thus, since K-systems have positive KS-entropy
while SM systems don’t, it is concluded that K-systems are chaotic whereas SM-systems are
not. Why is KS-entropy a mark of chaos? There are three motivations, corresponding to three
different interpretations of KS-entropy. First, KS-entropy could be interpreted as entailing
dynamical instability in the sense of exponential divergence of nearby trajectories (see
Lichtenberg & Liebermann, 1992, p. 304). Second, KS-entropy could be connected to
algorithmic complexity (Brudno 1978). Yet, while such a complexity is sometimes mentioned
as an indication of chaos, it is more difficult to connect it to physical intuitions about chaos.
Third, KS-entropy could be interpreted as a generalized version of Shannon’s information
theoretic entropy (see Frigg 2004). According to this approach, positive KS-entropy entails a
certain degree of unpredictability, which is sufficiently high to deserve the title chaotic.25
25 We would like to point out that an analysis of chaos in terms of positive KS-entropy needs further
qualifications. A system whose dynamics is, intuitively speaking, chaotic only on a part of the phase space can
still have positive KS-entropy. A case in point is a system with X = [-1, 1] where dynamics on [-1,0) is the
In a recent paper Werndl (2009b) argues that a careful review of all systems that one
commonly regards as chaotic shows that strong mixing is the crucial criterion. So a system is
chaotic just in case it is strong mixing. As she is careful to point out, this claim needs to be
qualified: systems are rarely mixing on the entire phase space, but neither are they chaotic on
the entire phase space. The crucial move is to restrict attention to those regions of phase space
where the system is chaotic, and it then turns out that in these same regions the systems are
also strong mixing. Hence Werndl concludes that strong mixing is the hallmark of chaos. And
surprisingly this is true also of dissipative systems (i.e. systems that are not measure
preserving). These systems have attractors, and they are chaotic on their attractors rather than
on the entire phase space. The crucial point then is that one can define an invariant
(preserved) measure on the attractor and show that the system is strongly mixing with respect
to that measure. So strong mixing can define chaos in both conservative and dissipative
systems.
The search for necessary and sufficient conditions for chaos presupposes that there is a clear-
cut divide between chaotic and non-chaotic systems. EH may challenge this view, as every
attempt to draw a line somewhere to demarcate the chaotic from non-chaotic systems is bound
to be somewhat arbitrary. Ergodic systems are pretty regular, mixing systems are less regular
and the higher positions in the hierarchy exhibit still more haphazard behaviour. But is there
one particular point where the transition from ‘non-chaos’ to chaos takes place? Based on the
argument that EH is a hierarchy of increasing degrees of randomness and degrees of
randomness correspond to different degrees of unpredictability (see Section 6), Berkovitz,
Frigg and Kronz (2006, Section 5.3) suggest that chaos may well be viewed as a matter of
degree rather than an all-or-nothing affair. Bernoulli systems are very chaotic, K-systems are
slightly less chaotic, SM-systems are still less chaotic, and ergodic systems are non-chaotic.
This suggestion connects well with the idea that chaos is closely related to unpredictability.
9. Conclusion
identity function and the tent map on [0, 1]. This system has positive KS-entropy, but the dynamics of the entire
system is not chaotic (only the part on [0, 1] is). This problem can be circumvented by adding extra conditions,
for instance that the system be ergodic (which the above system clearly is not).
EH is often regarded as relevant for explicating the nature of randomness in deterministic
dynamical systems. It is not clear, however, what notion of randomness this claim invokes.
The formal definitions of EH do not make explicit appeal to randomness and the usual ways
of presenting EH do not involve any specification of the notion of randomness that is
supposed to underlie EH. As suggested in Section 6, EH can be interpreted as a hierarchy of
randomness if degrees of randomness are explicated in terms of degrees of unpredictability,
which in turn are explicated in terms of conditional degrees of belief. In order for these
degrees of belief to be indicative of the system’s dynamical properties, they have to be
updated according to a system’s dynamical law. The idea is then that the different levels of
EH, except for merely ergodic systems, correspond to different kinds of unpredictability,
which correspond to different patterns of decay of correlations between their past states and
present states. Merely ergodic systems seem to display no randomness, as the correlations
between their past and present states need not decay at all.
Ergodic theory in general, and EH in particular, play an important role in statistical physics. Most notably, they bear on the foundations of statistical mechanics (Section 5), and EH, or some modification of it, constitutes an important measure of randomness in both Hamiltonian and dissipative systems. It is frequently argued that EH is by and large irrelevant for physics because real physical systems are not ergodic. But this charge is unwarranted, and a closer look at non-ergodic systems reveals a rather different picture. Almost all Hamiltonian systems are non-integrable (in the sense that non-integrable systems are of the second Baire category in the class of all normalised and infinitely differentiable Hamiltonians), and therefore in large
regions of their phase space the motion is random in various ways well captured by EH.
Further, as Werndl (2009) argues, EH could also be used to characterize randomness and
chaos in dissipative systems. So EH is a useful tool to study the dynamical properties of
various ergodic and non-ergodic systems.
Appendices
A. The Conceptual Roots of Ergodic Theory
The notion of an abstract dynamical system is both concise and effective. It focuses on certain
structural and dynamical features that are deemed essential to understanding the nature of the
seemingly random behaviour of deterministically evolving physical systems. The selected
features were carefully integrated, and the end result is a mathematical construct that has
proven to be very effective in revealing deep insights that would otherwise have gone
unnoticed. This brief note will provide some understanding of the key developments that
served to influence the choice of features involved in constructing this concept.
In his well known analysis of causation, Hume claims that the terms efficacy, agency, power,
force, energy, necessity, connection, and productive quality are nearly synonymous, and he
regards it as an absurdity to employ one in defining the rest (Hume 1978), section 1.3.14.
These are powerful claims, and when they are combined with his sceptical arguments
concerning necessary connections that he develops in that section, the result is a
metaphysically austere vision of science. However, one may regard Hume’s claims as
restricted to moral philosophy (the science of human nature) as opposed to having a broad
application that extends to natural philosophy (physics, chemistry and biology); compare
(Stroud 1977), pp. 1-16. The narrower interpretation permits regarding Hume’s claim as
peacefully co-existing with the history of mechanics, where dynamic terms such as those
above and related terms (such as those that are more fundamental) have distinctive meanings
and uses that are crucially important.
One key development in the early history of mechanics was the realization of the need to
settle on a set of fundamental quantities. For example, Descartes regarded volume and speed
as fundamental; whereas, Newton regarded mass and velocity as such. Those quantities were
then respectively used by each of them to define other important quantities, such as the notion
of quantity of motion. Descartes defined it as size (volume) times speed ((Descartes 1644), paragraph 2.36); whereas, Newton defined it as mass times velocity ((Newton 1687), p. 1). (See
(Cohen 1966) for further discussion of the two views and the historical relationship between
them; also, see (Dijksterhuis 1986) for discussion of them in a broader historical context.)
Both Descartes and Newton regarded a force as that which brings about a change in the
quantity of motion; compare Descartes' third law of motion (Descartes 1644), paragraph 2.40,
and Newton’s second law of motion (Newton 1687), p. 13. However, these are quite distinct
notions, and one has deeper ontological significance and substantially greater utility than the
other. In (Garber 1992), there is an excellent discussion of some of the shortcomings of
Descartes’ physics.
Although Newton’s notion of force is extremely effective, questions arose as to whether it is
the most fundamental dynamical notion on which mechanics is to be based. Eventually, it was
realized that the notion of energy is more fundamental than the notion of force. Both are
derived notions, meaning that they are defined in terms of fundamental quantities. The crucial
question is how to distinguish the derived quantities that are the most fundamental, or at least
more fundamental than the others. The answer to that question is far from straightforward.
Sometimes such determinations are made on the basis of principles (such as the principle of
virtual work or the principle of least action), or because they prove more useful than others in
solving problems or in providing deeper insight. In the history of mechanics, it was eventually
realized that it is best to adopt Hamilton’s formulation of mechanics rather than Newton’s; see
(Dugas 1988) for further discussion of the development of mechanics after Newton. In
Hamilton’s formulation, the fundamental equations of motion are defined in terms of the total
energy (kinetic plus potential energies) of a system, by contrast with the Newtonian
formulation, which defines them in terms of the sum of the forces acting on the system. A
number of deeply important insights result from that choice, and some of those are crucial for
understanding and appreciating the elegant conciseness of the notion of an abstract dynamical
system.
One key innovation of Hamilton’s approach is the use of phase space, a 6N dimensional
mathematical space, where N is the number of particles constituting the system of interest.
The 6N dimensions are constituted by 3 spatial coordinates per particle and one “generalized”
momentum coordinate per spatial coordinate. For a single simple system (such as a particle
representing a molecule of a gas) the phase space has 6 dimensions. Each point x in phase
space represents a possible physical state (also known as a phase) of the classical dynamical
system; it is uniquely specified by an ordered list of 6 (more generally, 6N for an N particle
system) numerical values, meaning a 6 dimensional vector. Once the state is known, other
properties of the system can be determined; each property corresponds to a mathematical
function of the state (onto the set of possible property values). The time evolution of the state
of a system (and so of its properties) is governed by a special function, the Hamiltonian,
which can be determined in many cases from the forces that act on the system (and in other
ways). The Hamiltonian specifies the transformation of the state of the system over time by
way of Hamilton's equations of motion, which are a close analogue of Newton's equation, force equals mass times acceleration. It should perhaps be emphasized that the two formulations of classical mechanics are not completely equivalent; that is to say, for many but not all classical systems the corresponding mathematical representations are inter-translatable. For further discussion, see section 1.7 of (Lanczos 1986) and section 2.5.3 of (Torretti 1999).
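For reference (an added gloss), Hamilton's equations for a Hamiltonian H(q, p), the total energy as a function of the generalized coordinates and momenta, read:

```latex
\dot{q}_i = \frac{\partial H}{\partial p_i}, \qquad
\dot{p}_i = -\frac{\partial H}{\partial q_i}, \qquad i = 1, \dots, 3N .
```

For a single particle with H = p²/2m + V(q) these reduce to Newton's second law: the first equation gives the familiar relation between velocity and momentum, the second gives force equals the rate of change of momentum.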
The use of Hamilton’s formulation of the equations of motion leads to two immediate
consequences, the conservation of energy and the preservation of phase space volumes; for
more discussion, see sections 6.6 and 6.7 of (Lanczos 1986). These consequences are crucial
for understanding the foundations of ergodic theory. They are quite general, though not fully
general since some substantial assumptions (that need not be specified here) must be made to
derive them; however, a large class of important systems satisfy those assumptions. The
conservation of energy means that the system is restricted to a surface of constant energy in phase space; more important for the foundations of ergodic theory is the fact that most of these surfaces turn out to be compact manifolds. The time evolution of a phase space volume that is restricted to a compact manifold has an invariant measure that is bounded, meaning that it can be normalized to unity.
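In symbols (an added gloss; T_t denotes the Hamiltonian flow and μ the phase space volume measure):

```latex
\frac{dH}{dt} = 0 \ \text{ along solutions}, \qquad
\mu(T_t A) = \mu(A) \ \text{ for every measurable } A \text{ and all times } t .
```

The second identity is Liouville's theorem on the preservation of phase space volume.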
In light of the discussion above, important conceptual ties can be made to the elements that constitute the notion of an abstract dynamical system. As noted above (in the main body of this entry), those elements are a probability space [X, Σ, μ] and a measure-preserving transformation T on [X, Σ, μ]. The term X denotes an abstract mathematical space of points. It is the counterpart to the phase space of Hamiltonian mechanics; however, it abstracts away from the physical connections that the coordinate components have to spatial and kinematic elements (the generalized momenta) of a classical system. The term Σ denotes a σ-algebra of subsets of X, and it is the abstract counterpart to the set of all possible phase space volumes. The classical phase space volume is an important measure, and it is replaced by μ, a probability measure on Σ. The abstraction to a probability measure is ultimately related to the conservation of energy and to the resulting restriction (in many cases) of the time evolution to a compact manifold. In compact manifold cases, units may be chosen so that the total volume of the compact manifold X is unity, in which case the volume measure on the set of sub-volumes (the counterpart to Σ) is effectively a probability measure (the counterpart to μ). The phase-space-volume-preserving time evolution specified by Hamilton's equations is replaced by the abstract notion of a probability-measure-preserving transformation T on X.
To fully appreciate why it is the time evolution of volumes of phase space, rather than of points of phase space, that is of special interest in ergodic theory, it is necessary to relate the discussion above to developments in classical statistical mechanics. Classical statistical mechanics is typically used to model systems that consist of a large number of sub-systems, such as a volume of gas. A liter of gas at standard temperature and pressure has on the order of 10²⁰ molecules, which means that the corresponding phase space has 6 × 10²⁰ dimensions (leaving aside other features of the molecules, such as their geometric structure, which is often done for the sake of simplicity). For such systems, the Hamiltonian depends on inter-particle as well as external forces. As in classical mechanics, the total energy is conserved (given certain assumptions, as noted earlier) and the time evolution preserves phase space volumes.
One important innovation in classical statistical mechanics is the use of a new notion of state for physical systems, referred to as a macrostate (or ensemble density). This notion goes back
to Gibbs (1902), and has since been widely used (and we should emphasise that the Gibbsian
notion of a macrostate is different from the Boltzmannian, introduced in section 5.1). The
macrostate is sometimes interpreted as indicating what is known probabilistically about the
actual physical state of the system. Macrostates are represented by density functions, which
are characterized below. The actual state of the system is referred to as a microstate, and such
states are represented as phase space points (as in classical mechanics). The predominant
reason for introducing macrostates is the large number of sub-systems that constitute the
typical system of interest, such as a volume of gas; such numbers make it impossible in
practice to make a determination of the actual state of the system.
A density function is a function that is normalized to unity over the relevant space of states for the system (meaning a surface of constant total energy). If f(x) denotes the density function that describes the macrostate of a system, then f(x) may be used to calculate the probability that the system is in a given volume A of phase space by integrating the density function over the specified volume, ∫_A f(x)dx. Such probabilities are sometimes interpreted epistemically, meaning that they represent what is known probabilistically about the microstate of the system with regard to each volume of phase space. Subsets of phase space that can be assigned a volume are known as the Lebesgue measurable sets,26 and their abstract counterpart in ergodic theory is the σ-algebra Σ of subsets of X. The probability measure μ is the abstract counterpart to the product of the density function and the Lebesgue measure in classical statistical mechanics.

26 Not all subsets of phase space points are measurable; see (Royden 1968, pp. 52-65) for an explanation.
It turns out that the density function may also be used to obtain information about the average
value of each physical quantity of the system with respect to any given volume of phase
space. As already noted, each physical quantity of a classical system is represented by a
function on phase space. Such functions are similar to density functions in that they must be
Lebesgue integrable; however, they need not be normalized to unity. Suppose that f(x) is the macrostate of the system. If g(x) is one of its physical quantities, then ∫_A f(x)g(x)dx denotes the average value of g(x) over phase space volume A.
The time evolution of a macrostate is defined in terms of the time evolution of the microstates. Suppose that f(x) is the macrostate of the system for some chosen initial time, and let T_t be the time evolution operator associated with the Hamiltonian for the system, which governs its time evolution from the initial time to some other time t. During that time interval, f(x) evolves to some other density function f_t(x), since T_t is measure preserving. It turns out that the time evolved state f_t(x) corresponds to T_t f(x), which is by definition equal to f(T_t x). The probability that the system is in a given volume of phase space at a given time is determined by integrating the density function at the given time over the specified volume.
A brief discussion of some key developments in the foundations of statistical mechanics will
serve to provide a deeper appreciation for the notion of an abstract dynamical system and its
role in ergodic theory. Ergodic theory emerged as a new abstract field of mathematical physics
beginning with the ergodic theorems of von Neumann and Birkhoff in the early 1930s. The
theorems have their roots in Ludwig Boltzmann’s ergodic hypothesis, which was first
formulated in the late 1860s (Boltzmann 1868, 1871). Boltzmann introduced the hypothesis in
developing classical statistical mechanics; it was used to provide a suitable basis for
identifying macroscopic quantities with statistical averages of microscopic quantities, such as
the identification of gas temperature with the mean kinetic energy of the gas molecules.
Although ergodic theory was inspired by developments in classical mechanics, classical
statistical mechanics, and even to some extent quantum mechanics (as will be shown shortly),
it became of substantial interest in its own right and developed for the most part in an
autonomous manner.
Boltzmann’s hypothesis says that an isolated mechanical system, which is one in which total
energy is conserved, will pass through every point that lies on the energy surface
corresponding to the total energy of the system in phase space, the space of possible states of
the system. Strictly speaking, the hypothesis is false; that realization came about much later
with the development of measure theory. Nevertheless, the hypothesis is important due in part
to its conceptual connections with other key elements of classical statistical mechanics such as
its role in establishing the existence and uniqueness of an equilibrium state for a given total
energy, which is deemed essential for characterizing irreversibility, a central goal of the
theory. It is also important because it is possible to develop a rigorous formulation that is
strong enough to serve its designated role. Historians point out that Boltzmann was aware of
exceptions to the hypothesis; for more on that, see (von Plato 1992).
Over thirty years after Boltzmann’s formulation of the ergodic hypothesis, Henri Lebesgue
provided important groundwork for a rigorous formulation of the hypothesis in his development of measure theory, which is based on his theory of integration. About thirty years
after that, von Neumann developed his Hilbert space formulation of quantum mechanics, which he presented in a well-known series of papers published between 1927 and 1929. That inspired Bernard Koopman to develop a Hilbert space formulation of classical
statistical mechanics (Koopman 1931). In both cases, the formula for the time evolution of the
state of a system corresponds to a unitary operator that is defined on a Hilbert space; a unitary
operator is a type of measure-preserving transformation. Von Neumann then used Koopman’s
innovation to prove what is known as the mean ergodic theorem (von Neumann 1932).
Birkhoff then used von Neumann's theorem as the basis of inspiration for his ergodic theorem (Birkhoff 1931). That von Neumann's work influenced Birkhoff even though Birkhoff's paper was published before von Neumann's is explained in (Birkhoff and Koopman 1932). Birkhoff's paper provides a rigorous formulation and proof of Boltzmann's conjecture, which was put forth over sixty years earlier. The key difference is that Birkhoff's formulation is weaker than Boltzmann's, requiring only that almost all solutions visit any set of positive measure in phase space in the infinite time limit. What is of particular interest here is
not Birkhoff’s ergodic theorem per se, but the abstractions that inspired it and that ultimately
led to the development of ergodic theory. For further discussion of the historical roots of
ergodic theory, see pp. 93-114 of (von Plato 1994).
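Birkhoff's theorem can be illustrated numerically. The sketch below (an illustrative addition, not part of the original entry) uses the irrational rotation of the circle, a standard ergodic system: along a single orbit, the fraction of time spent in an interval converges to the interval's Lebesgue measure, so the time average agrees with the space average.

```python
import math

def rotation_orbit_average(alpha, x0, interval, n_steps):
    """Time average of the indicator of `interval` along the orbit of
    the circle rotation T(x) = (x + alpha) mod 1, started at x0."""
    a, b = interval
    x = x0
    hits = 0
    for _ in range(n_steps):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1.0  # one application of the rotation map
    return hits / n_steps

# Golden-ratio rotation: alpha is irrational, so the rotation is ergodic
# with respect to Lebesgue measure on the unit interval.
alpha = (math.sqrt(5) - 1) / 2
avg = rotation_orbit_average(alpha, x0=0.1, interval=(0.2, 0.5), n_steps=100_000)
print(avg)  # close to 0.3, the Lebesgue measure of [0.2, 0.5)
```

Note that the rotation is merely ergodic rather than mixing: time averages converge, but correlations between past and present states do not decay, matching the remark in the Conclusion that merely ergodic systems display no randomness.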
In the Koopman formulation of classical mechanics, a unitary operator T_t that is defined in terms of the Hamiltonian represents time evolution. It does so in its action on the state x ∈ X of the system: if the initial state of a system is x, then at time t its state is T_t x. It can be shown that the set of operators {T_t : t ∈ ℝ} for a given Hamiltonian constitutes a mathematical group. A set of elements with an operation ∘ is a group if the following three conditions are satisfied.

Associativity: A∘(B∘C) = (A∘B)∘C for all A, B, C.
Identity element: there is an I such that I∘A = A∘I = A for all A.
Inverse element: for each A there is a B such that A∘B = B∘A = I, where I is the identity element.
The strategy underlying ergodic theory is to focus on simple yet relevant models to obtain
deeper insights about notions that are pertinent to the foundations of statistical mechanics
while avoiding unnecessary technical complications. Ergodic theory abstracts away from
dynamical associations including forces, potential and kinetic energies, and the like.
Continuous time evolution is often replaced with discrete counterparts to further simplify matters. In the discrete case, a continuous group {T_t : t ∈ ℝ} is replaced by a discrete group {T_n : n ∈ ℤ} (and, as we have seen above, the evolution of x over n units of time corresponds to the n-th iterate of a map T, meaning that T_n x = Tⁿx). Other advantages of the strategy include facilitating conceptual connections with other branches of theorizing and providing easier
facilitating conceptual connections with other branches of theorizing and providing easier
access to generality. For example, the group structure may be replaced with a semi-group,
meaning that the inverse-element condition is eliminated to explore irreversible time
evolution, another characteristic feature that one hopes to capture via classical statistical
mechanics. This entry restricts attention to invertible maps, but the ease of generalizing to a
broader range of phenomena within the framework of ergodic theory is worth noting.
B. Measure Theory
A set Σ is an algebra of subsets of X if and only if the following conditions hold: the union of any pair of elements of Σ is in Σ, the complement of each element of Σ is in Σ, and the empty set is in Σ. In other words, for every A, B ∈ Σ: A ∪ B ∈ Σ and Ã ∈ Σ, where Ã denotes the set of all elements of X that are not in A. An algebra Σ of subsets of X is a σ-algebra if and only if Σ contains the union of every countable collection of its elements. In other words, if {Aᵢ} ⊆ Σ is countable, the countable union ∪ᵢAᵢ is in Σ.

By definition, μ is a probability measure on Σ if and only if the following conditions hold: μ assigns each element of Σ a value in the unit interval, μ assigns X the maximum value, and μ assigns to the union of a finite or countable collection of disjoint elements of Σ the same value as the sum of the values that it assigns to those elements. In other words, μ: Σ → [0,1], μ(X) = 1, μ(∅) = 0, and μ(∪ᵢBᵢ) = ∑ᵢ μ(Bᵢ) whenever {Bᵢ} is finite or countable and Bⱼ ∩ Bₖ = ∅ for each pair of distinct elements Bⱼ and Bₖ of {Bᵢ}. The probability measure μ is the abstract counterpart in ergodic theory to the density function in classical statistical mechanics.
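For a finite set X these conditions can be checked exhaustively. The following sketch (an illustrative addition; the point weights are arbitrary dyadic choices, so all sums are exact in floating point) takes Σ to be the full power set of X:

```python
from itertools import combinations

X = frozenset({1, 2, 3, 4})
# A measure on the power set of X, induced by point weights summing to 1.
weights = {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.125}

def mu(A):
    return sum(weights[x] for x in A)

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

sigma = powerset(X)  # every subset is measurable here

# Check the conditions defining a probability measure.
assert all(0.0 <= mu(A) <= 1.0 for A in sigma)   # values in the unit interval
assert mu(X) == 1.0 and mu(frozenset()) == 0.0   # mu(X) = 1, mu(empty set) = 0
for A in sigma:
    for B in sigma:
        if not (A & B):  # disjoint sets: additivity must hold
            assert mu(A | B) == mu(A) + mu(B)
print("all probability-measure conditions hold")
```

Countable additivity reduces to finite additivity here, since a finite space has no infinite collections of non-empty disjoint sets.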
C. K-Systems
The standard definition of a K-system is the following (see Arnold and Avez 1968, p. 32, and Cornfeld et al. 1982, p. 280): A dynamical system [X, Σ, μ, T] is a K-system if and only if there is a subalgebra Σ₀ ⊆ Σ such that the following three conditions hold:

\[
(1)\ \Sigma_0 \subseteq T\Sigma_0, \qquad
(2)\ \bigvee_{n=0}^{\infty} T^{n}\Sigma_0 = \Sigma, \qquad
(3)\ \bigcap_{n=0}^{\infty} T^{-n}\Sigma_0 = N .
\]

In this definition, TⁿΣ₀ is the σ-algebra containing the sets TⁿB (B ∈ Σ₀); N is the σ-algebra consisting solely of sets of measure one and measure zero; ⋁ TⁿΣ₀ (n = 0, 1, 2, …) is the smallest σ-algebra containing all the TⁿΣ₀; and ⋂ T⁻ⁿΣ₀ denotes the largest subalgebra of Σ which belongs to each T⁻ⁿΣ₀.
The Kolmogorov-Sinai entropy of an automorphism T is defined as follows. Let the function z be:

\[
z(x) := \begin{cases} -x \log x & \text{if } x > 0 \\ 0 & \text{if } x = 0 \end{cases}
\]

Now consider a partition α = {α₁, …, α_r} of the probability space [X, B, μ] and let the function h(α) be

\[
h(\alpha) := \sum_{i=1}^{r} z[\mu(\alpha_i)],
\]

the so-called 'entropy of the partition α'. Then, the KS-entropy of the automorphism T relative to the partition α is defined as

\[
h(T, \alpha) := \lim_{n \to \infty} \frac{1}{n}\, h(\alpha \vee T\alpha \vee \dots \vee T^{\,n-1}\alpha),
\]

and the (non-relative) KS-entropy of T is defined as

\[
h(T) := \sup_{\alpha} h(T, \alpha),
\]

where the supremum ranges over all finite partitions α of X.
One can now prove the following theorem (Walters 1982, p. 108; Cornfeld et al. 1982, p. 283): If [X, B, μ] is a probability space and T: X → X is a measure-preserving map, then T is a K-automorphism if and only if h(T, α) > 0 for all finite partitions α ≠ N (where N is a partition that consists only of sets of measure one and zero). Since the (non-relative) KS-entropy is defined as the supremum of h(T, α) over all finite partitions, it follows immediately that a K-automorphism has positive KS-entropy; i.e., h(T) > 0 (Cornfeld et al. 1982, p. 283; Walters 1982, p. 109). But notice that the converse is not true: there are automorphisms with positive KS-entropy that are not K-automorphisms.
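The definitions above are easy to compute with directly. The sketch below (an illustrative addition) computes the entropy of a partition from z; the final comment appeals to the standard Kolmogorov-Sinai result, not derived here, that for a Bernoulli scheme B(p₁, …, pₙ) the KS-entropy equals the entropy of the generating partition, -∑ᵢ pᵢ log pᵢ:

```python
import math

def z(x):
    """z(x) = -x log x for x > 0, and z(0) = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def partition_entropy(measures):
    """Entropy h(alpha) = sum_i z(mu(alpha_i)) of a finite partition,
    given the measures mu(alpha_i) of its cells."""
    assert abs(sum(measures) - 1.0) < 1e-12  # the cells must exhaust X
    return sum(z(m) for m in measures)

# Entropy of the two-cell partition of the fair-coin Bernoulli scheme B(1/2, 1/2).
h = partition_entropy([0.5, 0.5])
print(h)  # log 2, approximately 0.6931

# By the standard Kolmogorov-Sinai result mentioned above, this partition is
# generating for the shift, so the KS-entropy of B(1/2, 1/2) is exactly log 2 --
# positive, as required of a K-automorphism (Bernoulli systems are K-systems).
```

This makes concrete why Bernoulli systems sit at the top of EH: any non-trivial choice of the pᵢ yields strictly positive entropy.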
D. Bernoulli Systems
Let Y = {f₁, …, f_n} be a finite set of elements (sometimes also called the 'alphabet' of the system) and let μ(fᵢ) = pᵢ be a probability measure on Y: 0 ≤ pᵢ ≤ 1 for all 1 ≤ i ≤ n, and

\[
\sum_{i=1}^{n} p_i = 1 .
\]

Furthermore, let X be the direct product of infinitely many copies of Y: X = ∏ᵢYᵢ (i ∈ ℤ), where Yᵢ = Y for all i. The elements of X are doubly-infinite sequences x = {xᵢ}, where xᵢ ∈ Y for each i ∈ ℤ. As the σ-algebra C of X we choose the σ-algebra generated by all sets of the form {x ∈ X : xᵢ = kᵢ for m ≤ i ≤ m+n}, for all m ∈ ℤ, for all n ∈ ℕ, and for all kᵢ ∈ Y (the so-called 'cylinder sets'). As a measure on X we take the product measure μ = ∏ᵢμᵢ, that is,

\[
\mu(\{x_i\}) = \dots \mu(x_{-2})\,\mu(x_{-1})\,\mu(x_0)\,\mu(x_1)\,\mu(x_2)\dots
\]

The triple [X, C, μ] is stationary if the chance element is constant in time, that is, iff for all cylinder sets

\[
\mu(\{y : y_{i+1} = w_i,\ m \le i \le m+n\}) = \mu(\{y : y_i = w_i,\ m \le i \le m+n\})
\]

holds. An invertible measure-preserving transformation T: X → X, the so-called shift map, is naturally associated with every stationary stochastic process: Tx = {yᵢ}, where yᵢ = xᵢ₊₁ for all i ∈ ℤ. It is straightforward to see that the measure μ is invariant under T (i.e. that T is measure preserving) and that T is invertible. This construction is commonly referred to as a 'Bernoulli scheme' and denoted by 'B(p₁, …, p_n)'. From this it follows that the quadruple [X, C, μ, T] is a dynamical system.
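The construction can be simulated. In the sketch below (an illustrative addition; finite blocks stand in for doubly-infinite sequences, and cylinder-set measures are estimated by sampling), applying the shift map leaves the estimated measure of a cylinder set unchanged, illustrating stationarity:

```python
import random

random.seed(0)

P = {0: 0.5, 1: 0.5}  # the Bernoulli scheme B(1/2, 1/2) on the alphabet {0, 1}

def sample_block(length):
    """Draw a finite block of i.i.d. symbols distributed according to P."""
    return tuple(random.choices((0, 1), weights=(P[0], P[1]), k=length))

def shift(x):
    """The shift map on finite blocks: (Tx)_i = x_{i+1} (dropping x_0)."""
    return x[1:]

def cylinder_freq(blocks, word, offset):
    """Estimated measure of the cylinder {x : x starts with `word` at `offset`}."""
    n = len(word)
    return sum(1 for b in blocks if b[offset:offset + n] == word) / len(blocks)

blocks = [sample_block(6) for _ in range(20000)]
shifted = [shift(b) for b in blocks]

# Stationarity: the cylinder {x : x_0 = 0, x_1 = 1} has the same estimated
# measure before and after applying the shift map.
f_before = cylinder_freq(blocks, (0, 1), 0)
f_after = cylinder_freq(shifted, (0, 1), 0)
print(f_before, f_after)  # both close to 1/4 = p_0 * p_1
```

The product-measure formula is visible in the estimates: the cylinder fixing two symbols has measure p₀p₁ = 1/4, the product of the single-symbol probabilities.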
Bibliography

Albert, D. (2000). Time and Chance. Cambridge, MA and London: Harvard University Press.
Alekseev, V. M. and Yakobson, M. V. (1981). Symbolic dynamics and hyperbolic dynamical systems. Physics Reports 75, 287-325.
Argyris, J., Faust, G. and Haase, M. (1994). An Exploration of Chaos. Amsterdam: Elsevier.
Arnold, V. I. and Avez, A. (1968). Ergodic Problems of Classical Mechanics. New York: Wiley.
Belot, G. and Earman, J. (1997). Chaos out of order: Quantum mechanics, the correspondence principle and chaos. Studies in History and Philosophy of Modern Physics 28, 147-182.
Berkovitz, J., Frigg, R. and Kronz, F. (2006). The ergodic hierarchy, randomness and Hamiltonian chaos. Studies in History and Philosophy of Modern Physics 37, 661-691.
Birkhoff, G. D. (1931). Proof of a recurrence theorem for strongly transitive systems; Proof of the ergodic theorem. Proceedings of the National Academy of Sciences 17, 650-660.
Birkhoff, G. D. and Koopman, B. O. (1932). Recent contributions to the ergodic theory. Proceedings of the National Academy of Sciences 18, 279-282.
Boltzmann, L. (1868). Studien über das Gleichgewicht der lebendigen Kraft zwischen bewegten materiellen Punkten. Wiener Berichte 58, 517-560.
Boltzmann, L. (1871). Über das Wärmegleichgewicht zwischen mehratomigen Gasmolekülen. Wiener Berichte 63, 397-418.
Boltzmann, L. (1877). Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung resp. den Sätzen über das Wärmegleichgewicht. Wiener Berichte 76, 373-435. Reprinted in F. Hasenöhrl (ed.), Wissenschaftliche Abhandlungen, Vol. 2. Leipzig: J. A. Barth 1909, 164-223.
Bricmont, J. (2001). Bayes, Boltzmann, and Bohm: Probabilities in physics. In Bricmont et al. (2001), 4-21.
Brudno, A. A. (1978). The complexity of the trajectory of a dynamical system. Russian Mathematical Surveys 33, 197-198.
Brush, S. G. (1976). The Kind of Motion We Call Heat. Amsterdam: North Holland Publishing.
Cohen, I. B. (1966). Newton's second law and the concept of force in the Principia. Texas Quarterly 10.3, 127-157.
Cornfeld, I. P., Fomin, S. V. and Sinai, Y. G. (1982). Ergodic Theory. Berlin and New York: Springer.
Descartes, R. (1644). Principles of Philosophy. Edited by V. R. Miller and R. P. Miller. Dordrecht: D. Reidel Publishing Co. 1983.
Dijksterhuis, E. J. (1986) [1961]. The Mechanization of the World Picture. Princeton: Princeton University Press [Oxford: Oxford University Press].
Dugas, R. (1988) [1955]. A History of Mechanics. New York: Dover Publications [Neuchatel: Editions du Griffon].
Earman, J. (1986). A Primer on Determinism. Dordrecht: D. Reidel Publishing Company.
Earman, J. and Redei, M. (1996). Why ergodic theory does not explain the success of equilibrium statistical mechanics. British Journal for the Philosophy of Science 47, 63-78.
Frigg, R. (2004). In what sense is the Kolmogorov-Sinai entropy a measure for chaotic behaviour? Bridging the gap between dynamical systems theory and communication theory. British Journal for the Philosophy of Science 55, 411-434.
Frigg, R. (2008). A field guide to recent work on the foundations of statistical mechanics. In D. Rickles (ed.), The Ashgate Companion to Contemporary Philosophy of Physics. London: Ashgate, 99-196.
Frigg, R. (2009a). Probability in Boltzmannian statistical mechanics. Forthcoming in G. Ernst and A. Hüttemann (eds.), Time, Chance and Reduction: Philosophical Aspects of Statistical Mechanics. Cambridge: Cambridge University Press.
Frigg, R. (2009b). Typicality and the approach to equilibrium in Boltzmannian statistical mechanics. Forthcoming in Philosophy of Science (Supplement).
Garber, D. (1992). Descartes' physics. In J. Cottingham (ed.), The Cambridge Companion to Descartes. Cambridge: Cambridge University Press.
Gibbs, J. W. (1902). Elementary Principles in Statistical Mechanics. Woodbridge: Ox Bow Press 1981.
Hume, D. (1978). A Treatise of Human Nature. Edited by L. A. Selby-Bigge, with notes by P. H. Nidditch. Oxford: Oxford University Press.
Khinchin, A. I. (1949). Mathematical Foundations of Statistical Mechanics. Mineola, NY: Dover Publications 1960.
Koopman, B. O. (1931). Hamiltonian systems and Hilbert space. Proceedings of the National Academy of Sciences 17, 315-318.
Lanczos, C. (1986) [1970]. The Variational Principles of Mechanics. New York: Dover Publications [Toronto: University of Toronto Press].
Lavis, D. (2005). Boltzmann and Gibbs: An attempted reconciliation. Studies in History and Philosophy of Modern Physics 36, 245-273.
Lichtenberg, A. J. and Liebermann, M. A. (1992). Regular and Chaotic Dynamics (2nd ed.). Berlin and New York: Springer.
Malament, D. and Zabell, S. (1980). Why Gibbs phase averages work: The role of ergodic theory. Philosophy of Science 47, 339-349.
Mañé, R. (1983). Ergodic Theory and Differentiable Dynamics. Berlin and New York: Springer.
Markus, L. and Meyer, K. R. (1974). Generic Hamiltonian Dynamical Systems Are Neither Integrable nor Ergodic. Memoirs of the American Mathematical Society. Providence, RI: American Mathematical Society.
McLaughlin, B. and Bennett, K. (2008). Supervenience. The Stanford Encyclopedia of Philosophy (Fall 2008 Edition), Edward N. Zalta (ed.), URL = <http://plato.stanford.edu/archives/fall2008/entries/supervenience/>.
Newton, I. (1687). Mathematical Principles of Natural Philosophy. Edited by A. Motte, revised by F. Cajori. Berkeley: University of California Press 1934.
Ornstein, D. S. (1974). Ergodic Theory, Randomness, and Dynamical Systems. New Haven: Yale University Press.
Ott, E. (1993). Chaos in Dynamical Systems. Cambridge: Cambridge University Press.
Royden, H. L. (1968). Real Analysis (2nd ed.). New York: Macmillan.
Shields, P. (1973). The Theory of Bernoulli Shifts. Chicago: Chicago University Press.
Simanyi, N. (2004). Proof of the ergodic hypothesis for typical hard ball systems. Annales Henri Poincaré 5, 203-233.
Sklar, L. (1993). Physics and Chance: Philosophical Issues in the Foundations of Statistical Mechanics. Cambridge: Cambridge University Press.
Smith, P. (1998). Explaining Chaos. Cambridge: Cambridge University Press.
Stroud, B. (1977). Hume. London: Routledge and Kegan Paul.
Tabor, M. (1989). Chaos and Integrability in Nonlinear Dynamics: An Introduction. New York: Wiley.
Tolman, R. C. (1938). The Principles of Statistical Mechanics. Mineola, NY: Dover 1979.
Torretti, R. (1999). The Philosophy of Physics. Cambridge: Cambridge University Press.
Uffink, J. (2004). Boltzmann's work in statistical physics. The Stanford Encyclopedia of Philosophy (Winter 2004 Edition), URL = <http://plato.stanford.edu>.
Uffink, J. (2007). Compendium of the foundations of classical statistical physics. In J. Butterfield and J. Earman (eds.), Philosophy of Physics. Amsterdam: North Holland, 923-1047.
Van Lith, J. (2001a). Ergodic theory, interpretations of probability and the foundations of statistical mechanics. Studies in History and Philosophy of Modern Physics 32, 581-594.
Von Neumann, J. (1932). Proof of the quasi-ergodic hypothesis. Proceedings of the National Academy of Sciences 18, 70-82.
Von Plato, J. (1992). Boltzmann's ergodic hypothesis. Archive for History of Exact Sciences 44, 71-89.
Von Plato, J. (1994). Creating Modern Probability. Cambridge: Cambridge University Press.
Vranas, P. (1998). Epsilon-ergodicity and the success of equilibrium statistical mechanics. Philosophy of Science 68, 688-708.
Walters, P. (1982). An Introduction to Ergodic Theory. New York: Springer.
Werndl, C. (2009a). Justifying definitions in mathematics: Going beyond Lakatos. Philosophia Mathematica 17, 313-340.
Werndl, C. (2009b). What are the new implications of chaos for unpredictability? Forthcoming in British Journal for the Philosophy of Science.