EconoPhysics - Empirical Facts and Agent-based Models


arXiv:0909.1974v2 [q-fin.GN] 21 Jun 2010

Econophysics: Empirical facts and agent-based models

Anirban Chakraborti a) and Ioane Muni Toke b)
Laboratoire de Mathématiques Appliquées aux Systèmes, École Centrale Paris, 92290 Châtenay-Malabry, France

Marco Patriarca c)
National Institute of Chemical Physics and Biophysics, Ravala 10, 15042 Tallinn, Estonia and Instituto de Física Interdisciplinar y Sistemas Complejos (CSIC-UIB), E-07122 Palma de Mallorca, Spain

Frédéric Abergel d)
Laboratoire de Mathématiques Appliquées aux Systèmes, École Centrale Paris, 92290 Châtenay-Malabry, France

(Dated: 22 June 2010)

This article aims at reviewing recent empirical and theoretical developments usually grouped under the term Econophysics. Since its name was coined in 1995 by merging the words "Economics" and "Physics", this new interdisciplinary field has grown in various directions: theoretical macroeconomics (wealth distributions), microstructure of financial markets (order book modelling), econometrics of financial bubbles and crashes, etc. In the first part of the review, we begin with discussions on the interactions between Physics, Mathematics, Economics and Finance that led to the emergence of Econophysics. Then we present empirical studies revealing statistical properties of financial time series. We begin the presentation with the widely acknowledged "stylized facts" which describe the returns of financial assets (fat tails, volatility clustering, autocorrelation, etc.) and recall that some of these properties are directly linked to the way time is taken into account. We continue with the statistical properties observed on order books in financial markets. For the sake of illustrating this review, (nearly) all the stated facts are reproduced using our own high-frequency financial database. Finally, contributions to the study of correlations of assets such as random matrix theory and graph theory are presented. In the second part of the review, we deal with models in Econophysics through the point of view of agent-based modelling. Amongst a large number of multi-agent-based models, we have identified three representative areas. First, using previous work originally presented in the fields of behavioural finance and market microstructure theory, econophysicists have developed agent-based models of order-driven markets that are extensively presented here. Second, kinetic theory models designed to explain some empirical facts on wealth distribution are reviewed. Third, we briefly summarize game theory models by reviewing the now classic minority game and related problems.

Keywords: Econophysics; Stylized facts; Financial time series; Correlations; Order book models; Agent-based models; Wealth distributions; Game Theory; Minority Games; Pareto Law; Entropy maximization; Utility maximization.

PACS Nos.: 05.45.Tp, 02.50.Sk, 05.40.-a, 05.45.Ra, 89.75.Fb

    Part I

    I. INTRODUCTION

What is Econophysics? Fifteen years after the word "Econophysics" was coined by H. E. Stanley, by merging the words "Economics" and "Physics", at an international conference on Statistical Physics held in Kolkata in 1995, this is still a commonly asked question. Many still wonder how theories aimed at explaining the physical world in terms of particles could be applied to understand complex structures, such as those found in the social and economic behaviour of human beings.

In fact, physics as a natural science is supposed to be precise or specific; its predictive powers are based on the use of a few but universal properties of matter which are sufficient to explain many physical phenomena. But in social sciences, are there analogous precise universal properties known for human beings, who, unlike fundamental particles, are certainly not identical to each other in any respect? And what little amount of information would be sufficient to infer some of their complex behaviours? There has been a positive drive towards answering these questions. In the 1940s, Majorana had taken scientific interest in financial and economic systems. He wrote a pioneering paper on the essential analogy between statistical laws in physics and in social sciences (di Ettore Majorana (1942); Mantegna (2005, 2006)). However, during the following decades, only a few physicists, like Kadanoff (1971) or Montroll and Badger (1974), had an explicit interest in research on social or economic systems. It was not until the 1990s that physicists started turning to this interdisciplinary subject, and in the past years they have made many successful attempts to approach problems in various fields of social sciences (e.g. de Oliveira et al. (1999); Stauffer et al. (2006); Chakrabarti et al. (2006)). In particular, in Quantitative Economics and Finance, physics research has begun to be complementary to the most traditional approaches, such as mathematical (stochastic) finance. These various investigations, based on methods imported from or also used in physics, are the subject of the present paper.

a) Electronic mail: [email protected]
b) Electronic mail: [email protected]
c) Electronic mail: [email protected]
d) Electronic mail: [email protected]

    A. Bridging Physics and Economics

Economics deals with how societies efficiently use their resources to produce valuable commodities and distribute them among different people or economic agents (Samuelson (1998); Keynes (1973)). It is a discipline related to almost everything around us, starting from the marketplace through the environment to the fate of nations. At first sight this may seem a very different situation from that of physics, whose birth as a well-defined scientific theory is usually associated with the study of particular mechanical objects moving with negligible friction, such as falling bodies and planets. However, a deeper comparison shows many more analogies than differences. On a general level, both economics and physics deal with "everything around us", albeit from different perspectives. On a practical level, the goals of both disciplines can be either purely theoretical in nature or strongly oriented toward the improvement of the quality of life. On a more technical side, analogies often become equivalences. Let us give here some examples.

Statistical mechanics has been defined as "the branch of physics that combines the principles and procedures of statistics with the laws of both classical and quantum mechanics, particularly with respect to the field of thermodynamics. It aims to predict and explain the measurable properties of macroscopic systems on the basis of the properties and behaviour of the microscopic constituents of those systems."¹

The tools of statistical mechanics or statistical physics (Reif (1985); Pathria (1996); Landau (1965)), which include extracting the average properties of a macroscopic system from the microscopic dynamics of the system, are believed to prove useful for an economic system. Indeed, even though it is difficult or almost impossible to write down the microscopic equations of motion for an economic system with all the interacting entities, economic systems may be investigated at various size scales. Therefore, the understanding of the global behaviour of economic systems seems to need concepts such as stochastic dynamics, correlation effects, self-organization, self-similarity and scaling, and for their application we do not have to go into the detailed microscopic description of the economic system.

¹ In Encyclopædia Britannica. Retrieved June 11, 2010, from Encyclopædia Britannica Online.

Chaos theory has had some impact on Economics modelling, e.g. in the work by Brock and Hommes (1998) or Chiarella et al. (2006). The theory of disordered systems has also played a core role in Econophysics and the study of complex systems. The term "complex systems" was coined to cover the great variety of such systems, which include examples from physics, chemistry, biology and also social sciences. The concepts and methods of statistical physics turned out to be extremely useful in application to these diverse complex systems, including economic systems. Many complex systems in natural and social environments share the characteristics of competition among interacting agents for resources and their adaptation to a dynamically changing environment (Parisi (1999); Arthur (1999)). Hence, the concept of disordered systems helps, for instance, to go beyond the concept of the "representative agent", an approach prevailing in much of (macro)economics and criticized by many economists (see e.g. Kirman (1992); Gallegati and Kirman (1999)). Minority games and their physical formulations have been exemplary.

Physics models have also helped bring new theories explaining older observations in Economics. The Italian social economist Pareto investigated a century ago the wealth of individuals in a stable economy (Pareto (1897a)) by modelling them with the distribution P(>x) ∼ x^{-α}, where P(>x) is the number of people having income greater than or equal to x, and α is an exponent (now known as the Pareto exponent) which he estimated to be 1.5. To explain such empirical findings, physicists have come up with some very elegant and intriguing kinetic exchange models in recent times, and we will review these developments in the companion article. Though the economic activities of the agents are driven by various considerations like "utility maximization", the eventual exchanges of money in any trade can be simply viewed as money/wealth conserving two-body scatterings, as in the entropy maximization based kinetic theory of gases. This qualitative analogy seems to be quite old, and both economists and natural scientists have already noted it in various contexts (Saha et al. (1950)). Recently, an equivalence between the two maximization principles has been quantitatively established (Chakrabarti and Chakrabarti (2010)).
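To make the analogy concrete, here is a minimal sketch (not taken from the specific models reviewed in the companion article) of a money-conserving kinetic exchange simulation; the random-split rule and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def kinetic_exchange(n_agents=1000, n_steps=200_000, m0=1.0):
    """Random two-body, money-conserving exchanges between agents."""
    money = np.full(n_agents, m0)              # everyone starts with m0
    for _ in range(n_steps):
        i, j = rng.choice(n_agents, size=2, replace=False)
        eps = rng.random()                     # random split of the pair's total
        total = money[i] + money[j]
        money[i], money[j] = eps * total, (1.0 - eps) * total
    return money

money = kinetic_exchange()
print(money.sum())   # total wealth is conserved by construction
```

This simplest exchange rule relaxes to an exponential (Boltzmann-Gibbs) wealth distribution; Pareto-like power-law tails arise in variants with saving propensities, as reviewed in the companion article.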

Let us discuss another example of the similarities of interests and tools in Physics and Economics. The frictionless systems which mark the early history of physics were soon recognized to be rare cases: not only at the microscopic scale, where they obviously represent an exception due to the unavoidable interactions with the environment, but also at the macroscopic scale, where fluctuations of internal or external origin make a prediction of their time evolution impossible.


Thus equilibrium and non-equilibrium statistical mechanics, the theory of stochastic processes, and the theory of chaos became main tools for studying real systems, as well as an important part of the theoretical framework of modern physics. Very interestingly, the same mathematical tools have presided over the growth of classic modelling in Economics and more particularly in modern Finance. Following the works of Mandelbrot and Fama in the 1960s, physicists from 1990 onwards have studied the fluctuations of prices and universalities in the context of scaling theories, etc. These links open the way for the use of a physics approach in Finance, complementary to the widespread mathematical one.

    B. Econophysics and Finance

Mathematical finance has benefited a lot in the past thirty years from modern probability theory: Brownian motion, martingale theory, etc. Financial mathematicians are often proud to recall the most well-known source of the interactions between Mathematics and Finance: five years before Einstein's seminal work, the theory of Brownian motion was first formulated by the French mathematician Bachelier in his doctoral thesis (Bachelier (1900); Boness (1967); Haberman and Sibbett (1995)), in which he used this model to describe price fluctuations at the Paris Bourse. Bachelier had even given a course as a "free professor" at the Sorbonne University with the title "Probability calculus with applications to financial operations and analogies with certain questions from physics" (see the historical articles in Courtault et al. (2000); Taqqu (2001); Forfar (2002)).

Then Itô, following the works of Bachelier, Wiener, and Kolmogorov among many others, formulated the presently known Itô calculus (Itô and McKean (1996)). The geometric Brownian motion, belonging to the class of Itô processes, later became an important ingredient of models in Economics (Osborne (1959); Samuelson (1965)) and in the well-known theory of option pricing (Black and Scholes (1973); Merton (1973)). In fact, stochastic calculus of diffusion processes combined with classical hypotheses in Economics led to the development of the arbitrage pricing theory (Duffie (1996); Föllmer and Schied (2004)). The deregulation of financial markets at the end of the 1980s led to the exponential growth of the financial industry. Mathematical finance followed the trend: stochastic finance with diffusion processes and the exponential growth of financial derivatives have had intertwined developments. Finally, this relationship was carved in stone when the Nobel prize was given to M. S. Scholes and R. C. Merton in 1997 (F. Black died in 1995) for their contribution to the theory of option pricing and their celebrated Black-Scholes formula.

However, this whole theory is closely linked to classical economics hypotheses and has not been sufficiently grounded in empirical studies of financial time series. The Black-Scholes hypothesis of Gaussian log-returns of prices is in strong disagreement with empirical evidence. Mandelbrot (1960, 1963) was one of the first to observe a clear departure from Gaussian behaviour for these fluctuations. It is true that within the framework of stochastic finance and martingale modelling, more complex processes have been considered in order to take into account some empirical observations: jump processes (see e.g. Cont and Tankov (2004) for a textbook treatment) and stochastic volatility (e.g. Heston (1993); Gatheral (2006)) in particular. But recent events on financial markets and the succession of financial crashes (see e.g. Kindleberger and Aliber (2005) for a historical perspective) should lead scientists to re-think basic concepts of modelling. This is where Econophysics is expected to come into play. During the past decades, the financial landscape has been dramatically changing: deregulation of markets, growing complexity of products. From a technical point of view, the ever rising speed and decreasing costs of computational power and networks have led to the emergence of huge databases that record all transactions and order book movements up to the millisecond. The availability of these data should lead to models that are better empirically founded. Statistical facts and empirical models will be reviewed in this article and its companion paper. The recent turmoil on financial markets and the 2008 crash seem to plead for new models and approaches. The Econophysics community thus has an important role to play in future financial market modelling, as suggested by contributions from Bouchaud (2008), Lux and Westerhoff (2009) or Farmer and Foley (2009).

    C. A growing interdisciplinary field

The chronological development of Econophysics has been well covered in the book of Roehner (2002). Here it is worth mentioning a few landmarks. The first article on the analysis of finance data which appeared in a physics journal was that of Mantegna (1991). The first conference in Econophysics was held in Budapest in 1997 and has since been followed by numerous schools, workshops and the regular series of meetings: APFA (Application of Physics to Financial Analysis), WEHIA (Workshop on Economic Heterogeneous Interacting Agents), and Econophys-Kolkata, amongst others. In recent years the number of papers has increased dramatically; the community has grown rapidly and several new directions of research have opened. By now, renowned physics journals like Reviews of Modern Physics, Physical Review Letters, Physical Review E, Physica A, Europhysics Letters, European Physical Journal B, International Journal of Modern Physics C, etc. publish papers in this interdisciplinary area. Economics and mathematical finance journals, especially Quantitative Finance, receive contributions from many physicists. The interested reader can also follow the developments quite well from the preprint server (www.arxiv.org).


In fact, recently a new section called "quantitative finance" has been added to it. One can also visit the web sites of the Econophysics Forum (www.unifr.ch/econophysics) and Econophysics.Org (www.econophysics.org). The first textbook in Econophysics (Sinha et al. (2010)) is also in press.

    D. Organization of the review

This article aims at reviewing recent empirical and theoretical developments that use tools from Physics in the fields of Economics and Finance. In section II of this paper, empirical studies revealing statistical properties of financial time series are reviewed. We present the widely acknowledged "stylized facts" describing the distribution of the returns of financial assets. In section III we continue with the statistical properties observed on order books in financial markets. We reproduce most of the stated facts using our own high-frequency financial database. In the last part of this article (section IV), we review contributions on correlation on financial markets, among which the computation of correlations using high-frequency data, analyses based on random matrix theory and the use of correlations to build economic taxonomies. In the companion paper to follow, Econophysics models are reviewed through the point of view of agent-based modelling. Using previous work originally presented in the fields of behavioural finance and market microstructure theory, econophysicists have developed agent-based models of order-driven markets that are extensively reviewed there. We then turn to models of wealth distribution, where an agent-based approach also prevails. As mentioned above, Econophysics models help bring a new look at some Economics observations, and advances based on kinetic theory models are presented. Finally, a detailed review of game theory models and the now classic minority games composes the final part.

II. STATISTICS OF FINANCIAL TIME SERIES: PRICE, RETURNS, VOLUMES, VOLATILITY

Recording a sequence of prices of commodities or assets produces what is called a time series. Analysis of financial time series has been of great interest not only to practitioners (an empirical discipline) but also to theoreticians, for making inferences and predictions. The inherent uncertainty in financial time series and its theory makes it specially interesting to economists, statisticians and physicists (Tsay (2005)).

Different kinds of financial time series have been recorded and studied for decades, but the scale changed twenty years ago. The computerization of stock exchanges that took place all over the world in the mid 1980s and early 1990s has led to an explosion of the amount of data recorded. Nowadays, all transactions on a financial market are recorded tick-by-tick, i.e. every event on a stock is recorded with a timestamp defined up to the millisecond, leading to huge amounts of data. For example, as of today (2010), the Reuters Datascope Tick History (RDTH) database records roughly 25 gigabytes of data every trading day.

Prior to this improvement in recording market activity, statistics could be computed with daily data at best. Now scientists can compute intraday statistics at high frequency. This allows one to check known properties at new time scales (see e.g. section II B below), but also implies special care in their treatment (see e.g. the computation of correlation at high frequency in section IV A below).

It is a formidable task to make an exhaustive review of this topic, but we try to give a flavour of some of its aspects in this section.

    A. Stylized facts of financial time series

The concept of "stylized facts" was introduced in macroeconomics around 1960 by Kaldor (1961), who advocated that a scientist studying a phenomenon "should be free to start off with a stylized view of the facts". In his work, Kaldor isolated several statistical facts characterizing macroeconomic growth over long periods and in several countries, and took these robust patterns as a starting point for theoretical modelling.

This expression has thus been adopted to describe empirical facts that arose in statistical studies of financial time series and that seem to be persistent across various time periods, places, markets, assets, etc. One can find many different lists of these facts in several reviews (e.g. Bollerslev et al. (1994); Pagan (1996); Guillaume et al. (1997); Cont (2001)). We choose in this article to present a minimum set of facts now widely acknowledged, at least for the prices of equities.

    1. Fat-tailed empirical distribution of returns

Let p_t be the price of a financial asset at time t. We define its return over a period of time τ to be:

r_\tau(t) = \frac{p(t+\tau) - p(t)}{p(t)} \approx \log p(t+\tau) - \log p(t)   (1)

It has been largely observed, starting with Mandelbrot (1963) (see e.g. Gopikrishnan et al. (1999) for tests on more recent data), and it is the first stylized fact, that the empirical distributions of financial returns and log-returns are fat-tailed. On figure 1 we reproduce the empirical density function of normalized log-returns from Gopikrishnan et al. (1999), computed on the S&P 500 index. In addition, we plot similar distributions for unnormalized returns on a liquid French stock (BNP Paribas) with τ = 5 minutes. This graph is computed by sampling a set of tick-by-tick data from 9:05am till 5:20pm between January 1st, 2007 and May 30th, 2008, i.e. 356 days of trading. Except where mentioned otherwise in captions, this data set will be used for all empirical graphs in this section. On figure 2, the cumulative distribution in log-log scale from Gopikrishnan et al. (1999) is reproduced. We also show the same distribution in linear-log scale computed on our data for a larger time scale τ = 1 day, showing similar behaviour.

FIG. 1. (Top) Empirical probability density function of the normalized 1-minute S&P 500 returns between 1984 and 1996. Reproduced from Gopikrishnan et al. (1999). (Bottom) Empirical probability density function of BNP Paribas unnormalized log-returns over a period of time τ = 5 minutes.
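As an illustration of how such empirical densities are obtained, here is a minimal sketch; the Student-t random walk is a synthetic stand-in for a real tick-by-tick price series, and all names and parameter values are illustrative.

```python
import numpy as np

def log_returns(prices, lag=1):
    """Log-returns over `lag` sampling steps: log p(t+lag) - log p(t)."""
    logp = np.log(np.asarray(prices, dtype=float))
    return logp[lag:] - logp[:-lag]

def empirical_pdf(x, bins=100):
    """Histogram estimate of the probability density function of x."""
    density, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

# Synthetic stand-in for 5-minute sampled prices (fat-tailed increments):
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(0.001 * rng.standard_t(df=3, size=100_000)))
r = log_returns(prices)
x, f = empirical_pdf((r - r.mean()) / r.std())   # normalized log-returns
```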

Many studies obtain similar observations on different sets of data. For example, using two years of data on more than a thousand US stocks, Gopikrishnan et al. (1998) find that the cumulative distribution of returns asymptotically follows a power law F(r) ∼ |r|^{-α} with α > 2 (α ≈ 2.8-3). With α > 2, the second moment (the variance) is well-defined, excluding stable laws with infinite variance. There have been various suggestions for the form of the distribution: Student's-t, hyperbolic, normal inverse Gaussian, exponentially truncated stable, and others, but no general consensus exists on the exact form of the tails. Although it is the most widely acknowledged and the most elementary one, this stylized fact is not easily met by all financial modelling. Gabaix et al. (2006) or Wyart and Bouchaud (2007) recall that efficient market theory has difficulties in explaining fat tails. Lux and Sornette (2002) have shown that models known as "rational expectation bubbles", popular in economics, produce very fat-tailed distributions (α < 1) that are in disagreement with the statistical evidence.

FIG. 2. Empirical cumulative distributions of S&P 500 daily returns. (Top) Reproduced from Gopikrishnan et al. (1999), in log-log scale. (Bottom) Computed using official daily close prices between January 1st, 1950 and June 15th, 2009, i.e. 14956 values, in linear-log scale.

    2. Absence of autocorrelations of returns

On figure 3, we plot the autocorrelation of log-returns, defined as ρ(T) ∼ ⟨r_τ(t + T) r_τ(t)⟩, with τ = 1 minute and 5 minutes. We observe here, as is widely known (see e.g. Pagan (1996); Cont et al. (1997)), that there is no evidence of correlation between successive returns, which is the second stylized fact. The autocorrelation function decays very rapidly to zero, even for a few lags of 1 minute.
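A sketch of the sample autocorrelation estimator behind figures 3 and 4; `r` stands for any return series, e.g. the one built in the previous sketch, and the estimator is the standard biased sample ACF.

```python
import numpy as np

def autocorrelation(x, max_lag=90):
    """Biased sample autocorrelation function for lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

# acf_returns  = autocorrelation(r)          # stays near 0: no linear memory
# acf_absolute = autocorrelation(np.abs(r))  # decays slowly: volatility clustering
```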


FIG. 3. Autocorrelation function of BNPP.PA 1-minute and 5-minute returns.

FIG. 4. Autocorrelation function of BNPP.PA 1-minute and 5-minute absolute returns.

    3. Volatility clustering

The third stylized fact that we present here is of primary importance. The absence of correlation between returns must not be mistaken for a property of independence and identical distribution: price fluctuations are not identically distributed, and the properties of the distribution change with time.

In particular, absolute returns or squared returns exhibit a long-range, slowly decaying autocorrelation function. This phenomenon is widely known as "volatility clustering", and was formulated by Mandelbrot (1963) as "large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes".

On figure 4, the autocorrelation function of absolute returns is plotted for τ = 1 minute and 5 minutes. The levels of autocorrelation at the first lags vary wildly with the parameter τ. On our data, it is found to be maximum (more than 70% at the first lag) for returns sampled every five minutes. However, whatever the sampling frequency, the autocorrelation is still above 10% after several hours of trading. On this data, we can grossly fit a power law decay with exponent ≈ 0.4. Other empirical tests report exponents between 0.1 and 0.3 (Cont et al. (1997); Liu et al. (1997); Cizeau et al. (1997)).
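The decay exponent quoted above can be estimated with a least-squares fit in log-log coordinates, as in this sketch (the lag range is an illustrative choice, and the `autocorrelation` helper from the previous sketch is reused):

```python
import numpy as np

def acf_decay_exponent(acf, lag_range=(1, 60)):
    """Fit acf(k) ~ k**(-beta) by linear regression of log(acf) on log(k)."""
    lo, hi = lag_range
    k = np.arange(lo, hi + 1)
    y = np.asarray(acf[lo - 1:hi])
    mask = y > 0                      # the log requires positive values
    slope, _ = np.polyfit(np.log(k[mask]), np.log(y[mask]), 1)
    return -slope                     # beta, e.g. roughly 0.4 on our data

# beta = acf_decay_exponent(autocorrelation(np.abs(r), max_lag=60))
```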

    4. Aggregational normality

It has been observed that as one increases the time scale over which the returns are calculated, the fat-tail property becomes less pronounced and the distribution approaches the Gaussian form, which is the fourth stylized fact. This cross-over phenomenon is documented in Kullmann et al. (1999), where the evolution of the Pareto exponent of the distribution with the time scale is studied. On figure 5, we plot these standardized distributions for the S&P 500 index between January 1st, 1950 and June 15th, 2009. It is clear that the larger the time scale, the more Gaussian the distribution. The fact that the shape of the distribution changes with τ makes it clear that the random process underlying prices must have a non-trivial temporal structure.

FIG. 5. Distribution of log-returns of S&P 500 daily, weekly and monthly returns. Same data set as figure 2 bottom.
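One simple way to quantify this cross-over is to track the excess kurtosis of log-returns as the aggregation scale grows; values decreasing towards 0 (the Gaussian value) illustrate aggregational normality. A sketch, with illustrative lags:

```python
import numpy as np
from scipy.stats import kurtosis

def excess_kurtosis_by_scale(prices, lags=(1, 5, 30, 120)):
    """Excess kurtosis of log-returns at increasing aggregation scales."""
    logp = np.log(np.asarray(prices, dtype=float))
    return {lag: kurtosis(logp[lag:] - logp[:-lag]) for lag in lags}

# excess_kurtosis_by_scale(prices) -> values shrinking towards 0 as the lag grows
```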

    B. Getting the right time

    1. Four ways to measure time

In the previous section, all stylized facts have been presented in physical time, or calendar time, i.e. time series were indexed, as we expect them to be, in hours, minutes, seconds, milliseconds. Let us recall here that tick-by-tick data available on financial markets all over the world is time-stamped up to the millisecond, but the order of magnitude of the guaranteed precision is much larger, usually one second or a few hundreds of milliseconds.

Calendar time is the time usually used to compute statistical properties of financial time series. This means that computing these statistics involves sampling, which might be a delicate thing to do when dealing, for example, with several stocks with different liquidity. Therefore, three other ways to keep track of time may be used.

Let us first introduce event time. Using this count, time is increased by one unit each time an order is submitted to the observed market. This framework is natural when dealing with the simulation of financial markets, as will be shown in the companion paper. The main outcome of event time is its smoothing of data. In event time, intraday seasonality (lunch break) or bursts of activity following some news are smoothed in the time series, since we always have one event per time unit.

Now, when dealing with time series of prices, another count of time might be relevant, and we call it trade time or transaction time. Using this count, time is increased by one unit each time a transaction happens. The advantage of this count is that limit orders submitted far away in the order book, which may thus be of lesser importance with respect to the price series, do not increase the clock by one unit.

Finally, going on with focusing on important events to increase the clock, we can use tick time. Using this count, time is increased by one unit each time the price changes. Thus consecutive market orders that progressively eat liquidity until the first best limit is removed in the order book are counted as one time unit.

Let us finish by noting that, with these definitions, when dealing with mid prices, or bid and ask prices, a time series in event time can easily be extracted from a time series in calendar time. Furthermore, one can always extract a time series in trade time or in tick time from a time series in event time. However, one cannot extract a series in tick time from a series in trade time, as the latter ignores limit orders that are submitted inside the spread, and thus change mid, bid or ask prices without any transaction taking place.
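The sketch below shows how trade-time and tick-time price series could be extracted from an event-time record; the DataFrame layout (columns 'type' and 'price') is a hypothetical format, not that of our database.

```python
import pandas as pd

def to_trade_time(events: pd.DataFrame) -> pd.Series:
    """Price series in trade time: the clock ticks once per transaction."""
    trades = events[events["type"] == "trade"]
    return trades["price"].reset_index(drop=True)

def to_tick_time(events: pd.DataFrame) -> pd.Series:
    """Price series in tick time: the clock ticks only on price changes,
    so consecutive trades at the same price count as a single unit."""
    p = to_trade_time(events)
    return p[p.diff() != 0].reset_index(drop=True)

# Event time is simply the row index of `events`: one unit per submitted
# order, whatever its type ('trade', 'limit', 'cancel', ...).
```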

    2. Revisiting stylized facts with a new clock

Now, using the right clock might be of primary importance when dealing with statistical properties and estimators. For example, Griffin and Oomen (2008) investigate the standard realized variance estimator (see section IV A) in trade time and tick time. Muni Toke (2010) also recalls that the differences observed on a spread distribution in trade time and physical time are meaningful. In this section we compute some statistics complementary to the ones we have presented in the previous section II A and show the role of the clock in the studied properties.

FIG. 6. Distribution of log-returns of stock BNPP.PA, sampled in trade time (17 trades and 1049 trades) and in calendar time (1 minute and 1 hour), compared with a Gaussian. This empirical distribution is computed using data from 2007, April 1st until 2008, May 31st.

a. Aggregational normality in trade time We have seen above that when the sampling size increases, the distribution of log-returns tends to be more Gaussian. This property is much better seen using trade time. On figure 6, we plot the distributions of the log-returns for BNP Paribas stock using 2-month-long data in calendar time and trade time. Over this period, the average number of trades per day is 8562, so that 17 trades (resp. 1049 trades) correspond to an average calendar time step of 1 minute (resp. 1 hour). We observe that the distribution of returns sampled every 1049 trades is much more Gaussian than the one sampled every 17 trades (aggregational normality), and that it is also more Gaussian than the one sampled every 1 hour (quicker convergence in trade time). Note that this property appears to be valid in a multidimensional setting, see Huth and Abergel (2009).

b. Autocorrelation of trade signs in tick time It is well-known that the series of the signs of the trades on a given stock (usual convention: +1 for a transaction at the ask price, -1 for a transaction at the bid price) exhibits large autocorrelation. It has been observed in Lillo and Farmer (2004), for example, that the autocorrelation function of the signs of trades, ρ(n), decays slowly as n^{-γ}, with γ ≈ 0.5. We compute this statistic for the trades on BNP Paribas stock from 2007, January 1st until 2008, May 31st. We plot the result in figure 7. We find that the first values for short lags are about 0.3, and that the log-log plot clearly shows a power-law decay with γ ≈ 0.7.
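A sketch of this computation; classifying a trade by comparing its price with the mid-quote is a common simplification of the +1/-1 convention above, and the `autocorrelation` and `acf_decay_exponent` helpers from the earlier sketches are reused.

```python
import numpy as np

def trade_signs(trade_prices, best_asks, best_bids):
    """+1 for a trade at (or above) the mid-quote, -1 below, as a proxy
    for the at-ask / at-bid convention."""
    mid = 0.5 * (np.asarray(best_asks) + np.asarray(best_bids))
    return np.where(np.asarray(trade_prices) >= mid, 1, -1)

# signs = trade_signs(p_trades, a_quotes, b_quotes)
# acf   = autocorrelation(signs, max_lag=1000)
# gamma = acf_decay_exponent(acf, lag_range=(1, 1000))
```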

FIG. 7. Autocorrelation of trade signs for stock BNPP.PA, in trade time and in tick time.

A very plausible explanation of this phenomenon relies on the execution strategies of some major brokers on a given market. These brokers have large transactions to execute on the account of their clients. In order to avoid making the market move because of a single inconsiderately large order (see below section III F on market impact), they tend to split large orders into small ones. We think that these strategies explain, at least partly, the large autocorrelation observed. Using data on markets where orders are publicly identified and linked to a given broker, it can be shown that the autocorrelation function of the order signs of a given broker is even higher. See Bouchaud et al. (2009) for a review of these facts and some associated theories.

We present here further evidence supporting this explanation. We compute the autocorrelation function of order signs in tick time, i.e. taking into account only transactions that make the price change. Results are plotted on figure 7. We find that the first values for short lags are about 0.10, which is much smaller than the values observed with the previous time series. This supports the idea that many small transactions progressively eat the available liquidity at the best quotes. Note however that, even in tick time, the correlation remains positive for large lags.

    3. Correlation between volume and volatility

Investigating time series of cotton prices, Clark (1973) noted that "trading volume and price change variance seem to have a curvilinear relationship". Trade time allows us to have a better view on this property: Plerou et al. (2000) and Silva and Yakovenko (2007), among others, show that the variance of log-returns after N trades, i.e. over a time period of N in trade time, is proportional to N. We confirm this observation by plotting the second moment of the distribution of log-returns after N trades as a function of N for our data, as well as the average number of trades and the average volatility on a given time interval. The results are shown on figures 8 and 9.

FIG. 8. Second moment of the distribution of returns over N trades for the stock BNPP.PA, with a linear fit.

FIG. 9. Average number of trades and average volatility on a given time period for the stock BNPP.PA.

These results are to be put in relation to the ones presented in Gopikrishnan et al. (2000b), where the statistical properties of the number of shares traded Q_Δt for a given stock in a fixed time interval Δt are studied. They analyzed transaction data for the largest 1000 stocks for the two-year period 1994-95, using a database that recorded every transaction for all securities in three major US stock markets. They found that the distribution P(Q_Δt) displayed a power-law decay, as shown in Fig. 10, and that the time correlations in Q_Δt displayed long-range persistence. Further, they investigated the relation between Q_Δt and the number of transactions N_Δt in a time interval Δt, and found that the long-range correlations in Q_Δt were largely due to those of N_Δt. Their results are consistent with the interpretation that the large equal-time correlation previously found between Q_Δt and the absolute value of price change |G_Δt| (related to volatility) was largely due to N_Δt.

Therefore, studying the variance of price changes in trade time suggests that the number of trades is a good proxy for the unobserved volatility.
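A sketch of the variance-versus-N computation; `trade_prices` is a hypothetical array of transaction prices indexed in trade time, and the values of N are illustrative.

```python
import numpy as np

def variance_vs_trades(trade_prices, n_values=(50, 100, 500, 1000, 5000)):
    """Second moment of centred log-returns over N trades, for several N.
    Proportionality to N shows up as a straight line through the origin."""
    logp = np.log(np.asarray(trade_prices, dtype=float))
    out = {}
    for n in n_values:
        rn = logp[n:] - logp[:-n]
        out[n] = np.mean((rn - rn.mean()) ** 2)
    return out

# var = variance_vs_trades(trade_prices)
# slope, intercept = np.polyfit(list(var), list(var.values()), 1)
```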


FIG. 10. Distribution of the number of shares traded Q_Δt, for 1000 stocks; the tail follows a power-law decay with exponent 1 + λ = 2.7. Adapted from Gopikrishnan et al. (2000b).

    4. A link with stochastic processes: subordination

These empirical facts (aggregational normality in trade time, relationship between volume and volatility) reinforce the interest for models based on the subordination of stochastic processes, which had been introduced in financial modeling by Clark (1973).

Let us introduce it here. Assuming proportionality between the variance ⟨x²⟩ of the centred returns x and the number of trades N over a time period τ, we can write (with α the proportionality constant):

\langle x^2 \rangle = \alpha N.   (2)

Therefore, assuming normality in trade time, we can write the density function of log-returns after N trades as

f_N(x) = \frac{1}{\sqrt{2\pi\alpha N}} \, e^{-x^2/(2\alpha N)}.   (3)

Finally, denoting K_\tau(N) the probability density function of having N trades in a time period τ, the distribution of log-returns in calendar time can be written

P_\tau(x) = \int_0^\infty \frac{1}{\sqrt{2\pi\alpha N}} \, e^{-x^2/(2\alpha N)} \, K_\tau(N) \, dN.   (4)

This is the subordination of the Gaussian process x_N using the number of trades N as the directing process, i.e. as the new clock. With this kind of modelling, it is expected that, since f_N is Gaussian, the observed non-Gaussian behaviour will come from K_\tau(N). For example, some specific choices of directing processes may lead to a symmetric stable distribution (see Feller (1968)). Clark (1973) tests empirically a log-normal subordination with time series of prices of cotton. In a similar way, Silva and Yakovenko (2007) find that an exponential subordination with kernel (η being the mean trading rate)

K_\tau(N) = \frac{1}{\eta\tau} \, e^{-N/(\eta\tau)}   (5)

is in good agreement with empirical data. If the orders were submitted to the market in an independent way and at a constant rate η, then the distribution of the number of trades per time period τ should be Poissonian with intensity ητ. Therefore, the empirical fit of equation (5) is inconsistent with such a simplistic hypothesis for the distribution of arrival times of orders. We will suggest in the next section some possible distributions that fit our empirical data.
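The mechanism of equations (2)-(5) is easy to reproduce numerically: draw N from the exponential kernel, then draw the return from the conditional Gaussian. The mixture is fat-tailed even though each conditional law is Gaussian; parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def subordinated_returns(n_samples=100_000, alpha=1e-6, mean_trades=1000.0):
    """Gaussian returns subordinated by an exponential number of trades."""
    n_trades = rng.exponential(mean_trades, size=n_samples)   # kernel of eq. (5)
    return rng.normal(0.0, np.sqrt(alpha * n_trades))         # conditional eq. (3)

x = subordinated_returns()
excess_kurtosis = ((x - x.mean()) ** 4).mean() / x.var() ** 2 - 3.0
print(excess_kurtosis)   # ~3: a fat-tailed (Laplace) law, not a Gaussian
```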

    III. STATISTICS OF ORDER BOOKS

The computerization of financial markets in the second half of the 1980s provided empirical scientists with easier access to extensive data on order books. Biais et al. (1995) is an early study of the new data flows on the newly (at that time) computerized Paris Bourse. Variables crucial to a fine modeling of order flows and dynamics of order books are studied: time of arrival of orders, placement of orders, size of orders, shape of the order book, etc. Many subsequent papers offer complementary empirical findings and modeling, e.g. Gopikrishnan et al. (2000a), Challet and Stinchcombe (2001), Maslov and Mills (2001), Bouchaud et al. (2002), Potters and Bouchaud (2003). Before going further in our review of available models, we try to summarize some of these empirical facts.

For each of the enumerated properties, we present new empirical plots. We use Reuters tick-by-tick data on the Paris Bourse. We select four stocks: France Telecom (FTE.PA), BNP Paribas (BNPP.PA), Societe Generale (SOGN.PA) and Renault (RENA.PA). For any given stock, the data displays time-stamps, traded quantities, traded prices, the first five best-bid limits and the first five best-ask limits. From now on, we will denote a_i(t) (resp. b_j(t)) the price of the i-th limit at ask (resp. j-th limit at bid). Except when mentioned otherwise, all statistics are computed using all trading days from October 1st, 2007 to May 30th, 2008, i.e. 168 trading days. On a given day, orders submitted between 9:05am and 5:20pm are taken into account, i.e. the first and last minutes of each trading day are removed.

Note that we do not deal in this section with the correlations of the signs of trades, since statistical results on this fact have already been treated in section II B 2. Note also that although most of these facts are widely acknowledged, we will not describe them as new "stylized facts" for order books, since their ranges of validity are still to be checked among various products/stocks, markets and epochs, and strong properties need to be properly extracted and formalized from these observations. However, we will keep them in mind as we go through the new trend of empirical modeling of order books.

Finally, let us recall that the markets we are dealing with are electronic order books with no official market maker, in which orders are submitted in a double auction and executions follow price/time priority. This type of exchange is now adopted nearly all over the world, but this was not obvious as long as computerization was not complete. Different market mechanisms have been widely studied in the microstructure literature, see e.g. Garman (1976); Kyle (1985); Glosten (1994); O'Hara (1997); Biais et al. (1997); Hasbrouck (2007). We will not review this literature here (except Garman (1976) in our companion paper), as this would be too large a digression. However, such a literature is linked in many aspects to the problems reviewed in this paper.

    A. Time of arrivals of orders

As explained in the previous section, the choice of the time count might be of prime importance when dealing with stylized facts of empirical financial time series. When reviewing the subordination of stochastic processes (Clark (1973); Silva and Yakovenko (2007)), we have seen that the Poisson hypothesis for the arrival times of orders is not empirically verified.

We compute the empirical distribution of interarrival times, or durations, of market orders on the stock BNP Paribas using our data set described in the previous section. The results are plotted in figures 11 and 12, both in linear and log scale. It is clearly observed that the exponential fit is not a good one. We check however that the Weibull distribution fit is potentially a very good one. Weibull distributions have been suggested, for example, in Ivanov et al. (2004). Politi and Scalas (2008) also obtain good fits with q-exponential distributions.
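A sketch of such a fit with scipy; `durations` stands for the observed interarrival times (the stand-in below is synthetic). On Poisson order flow the fitted Weibull shape is close to 1; empirically one typically finds a shape below 1, i.e. an excess of short durations.

```python
import numpy as np
from scipy import stats

# Stand-in durations; replace with observed interarrival times (in seconds).
rng = np.random.default_rng(2)
durations = rng.exponential(1.0, size=20_000)

# Weibull fit with location fixed at zero, against the exponential benchmark.
shape, _, scale = stats.weibull_min.fit(durations, floc=0)
rate = 1.0 / durations.mean()          # maximum-likelihood exponential fit

print(shape, scale, rate)              # shape < 1 signals clustered arrivals
```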

In the Econometrics literature, these observations of non-Poissonian arrival times have given rise to a large trend of modelling of irregular financial data. Engle and Russell (1997) and Engle (2000) have introduced autoregressive conditional duration or intensity models that may help modelling these processes of order submission. See Hautsch (2004) for a textbook treatment.

Using the same data, we compute the empirical distribution of the number of transactions in a given time period τ. Results are plotted in figure 13. It seems that the log-normal and the gamma distributions are both good candidates; however, none of them really describes the empirical result, suggesting a complex structure of the arrival of orders. A similar result on Russian stocks was presented in Dremin and Leonidov (2005).

FIG. 11. Distribution of interarrival times for stock BNPP.PA in log-scale, with lognormal, exponential and Weibull fits.

FIG. 12. Distribution of interarrival times for stock BNPP.PA (main body, linear scale), with lognormal, exponential and Weibull fits.

    B. Volume of orders

Empirical studies show that the unconditional distribution of order size is very complex to characterize. Gopikrishnan et al. (2000a) and Maslov and Mills (2001) observe a power law decay with an exponent 1 + μ ≈ 2.3-2.7 for market orders and 1 + μ ≈ 2.0 for limit orders. Challet and Stinchcombe (2001) emphasize a clustering property: orders tend to have a "round" size in packages of shares, and clusters are observed around 100s and 1000s. As of today, no consensus emerges in proposed models, and it is plausible that such a distribution varies very wildly with products and markets.

In figure 14, we plot the distribution of the volume of market orders for the four stocks composing our benchmark. Quantities are normalized by their mean.


FIG. 13. Distribution of the number of trades in a given time period τ for stock BNPP.PA (τ = 1, 5, 15 and 30 minutes). This empirical distribution is computed using data from 2007, October 1st until 2008, May 31st.

The power-law coefficient is estimated by a Hill estimator (see e.g. Hill (1975); de Haan et al. (2000)). We find a power law with exponent 1 + μ ≈ 2.7, which confirms the studies previously cited. Figure 15 displays the same distribution for limit orders (of all available limits). We find an average value of 1 + μ ≈ 2.1, consistent with previous studies. However, we note that the power law is a poorer fit in the case of limit orders: data normalized by their mean collapse badly on a single curve, and computed coefficients vary with stocks.
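A minimal sketch of the Hill estimator; the number k of upper order statistics is an illustrative choice and in practice drives the bias-variance trade-off.

```python
import numpy as np

def hill_estimator(x, k=500):
    """Hill estimator of the tail exponent alpha from the k largest values:
    1/alpha is the mean of log(x_(i) / x_(k)) over the upper tail."""
    tail = np.sort(np.asarray(x, dtype=float))[-k:]
    return 1.0 / np.mean(np.log(tail / tail[0]))

# alpha = hill_estimator(volumes / volumes.mean())
# The tail then behaves as P(V > v) ~ v**(-alpha), i.e. a density
# exponent 1 + mu with mu = alpha.
```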

FIG. 14. Distribution of volumes of market orders for the four stocks, compared with a power law x^{-2.7} and an exponential e^{-x}. Quantities are normalized by their mean.

    C. Placement of orders

a. Placement of arriving limit orders Bouchaud et al. (2002) observe a broad power-law placement around the best quotes on French stocks, confirmed in Potters and Bouchaud (2003) on US stocks. Observed exponents are quite stable across stocks, but exchange dependent: 1 + μ ≈ 1.6 on the Paris Bourse, 1 + μ ≈ 2.0 on the New York Stock Exchange, 1 + μ ≈ 2.5 on the London Stock Exchange. Mike and Farmer (2008) propose to fit the empirical distribution with a Student distribution with 1.3 degrees of freedom.

FIG. 15. Distribution of normalized volumes of limit orders, compared with a power law x^{-2.1} and an exponential e^{-x}. Quantities are normalized by their mean.

We plot the distribution of the following quantity computed on our data set, i.e. using only the first five limits of the order book: Δp = b_0(t) - b(t) (resp. a(t) - a_0(t)) if a bid (resp. ask) order arrives at price b(t) (resp. a(t)), where b_0(t) (resp. a_0(t)) is the best bid (resp. ask) just before the arrival of this order. Results are plotted on figures 16 (in semilog scale) and 17 (in linear scale). These graphs being computed with incomplete data (five best limits), we do not observe a placement as broad as in Bouchaud et al. (2002). However, our data makes it clear that fat tails are observed. We also observe an asymmetry in the empirical distribution: the left side is less broad than the right side. Since the left side represents limit orders submitted inside the spread, this is expected. Thus, the empirical distribution of the placement of arriving limit orders is maximum at zero (same best quote). We then ask the question: how is this translated in terms of the shape of the order book?

b. Average shape of the order book Contrary to what one might expect, it seems that the maximum of the average offered volume in an order book is located away from the best quotes (see e.g. Bouchaud et al. (2002)). Our data confirms this observation: the average quantity offered on the five best quotes grows with the level. This result is presented in figure 18. We also compute the average price of these levels in order to plot a cross-sectional graph similar to the ones presented in Biais et al. (1995).


FIG. 16. Placement of limit orders using the same best quote reference, in semilog scale, with Gaussian and Student fits. Data used for this computation is the BNP Paribas order book from September 1st, 2007, until May 31st, 2008.

FIG. 17. Placement of limit orders using the same best quote reference, in linear scale, with a Gaussian fit. Data used for this computation is the BNP Paribas order book from September 1st, 2007, until May 31st, 2008.

Our result is presented for stock BNPP.PA in figure 19 and displays the expected shape. Results for other stocks are similar. We find that the average gap between two levels is constant among the five best bids and asks (less than one tick for FTE.PA, 1.5 ticks for BNPP.PA, 2.0 ticks for SOGN.PA, 2.5 ticks for RENA.PA). We also find that the average spread is roughly twice as large as the average gap (factor 1.5 for FTE.PA, 2 for BNPP.PA, 2.2 for SOGN.PA, 2.4 for RENA.PA).
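A sketch of how the average shape of figure 18 could be computed from order book snapshots; the (n_snapshots, 5) array layout is a hypothetical format.

```python
import numpy as np

def average_book_shape(bid_volumes, ask_volumes):
    """Average volume at each of the five best limits, normalized by the
    overall mean; rows are order book snapshots, columns are levels 1..5."""
    shape = np.concatenate([
        np.asarray(bid_volumes).mean(axis=0)[::-1],   # levels -5..-1 (bids)
        np.asarray(ask_volumes).mean(axis=0),         # levels +1..+5 (asks)
    ])
    return shape / shape.mean()

# A maximum away from levels -1/+1 reproduces the hump of figure 18.
```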

FIG. 18. Average quantity offered in the limit order book (normalized number of shares), by level of limit orders (negative levels: bids, positive levels: asks), for the four stocks.

FIG. 19. Average limit order book: price and depth (stock BNPP.PA).

D. Cancellation of orders

Challet and Stinchcombe (2001) show that the distribution of the average lifetime of limit orders fits a power law with exponent 1 + μ ≈ 2.1 for cancelled limit orders, and 1 + μ ≈ 1.5 for executed limit orders. Mike and Farmer (2008) find that in either case the exponential hypothesis (Poisson process) is not satisfied on the market.

We compute the average lifetime of cancelled and executed orders on our dataset. Since our data does not include a unique identifier for a given order, we reconstruct order lifetimes as follows: each time a cancellation is detected, we go back through the history of limit order submissions and look for a matching order with the same price and same quantity. If an order is not matched, we discard the cancellation from our lifetime data. Results are presented in figures 20 and 21. We observe a power law decay with coefficients 1 + μ ≈ 1.3-1.6 for both cancelled and executed limit orders, with little variation among stocks. These results are a bit different from the ones presented in previous studies: similar for executed limit orders, but our data exhibits a lower decay for cancelled orders. Note that the observed cut-off in the distribution for lifetimes above 20000 seconds is due to the fact that we do not take into account execution or cancellation of orders submitted on a previous day.
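A sketch of this matching heuristic; the event layout ('time', 'type', 'price', 'qty') and the last-in-first-out resolution of ambiguous matches are illustrative assumptions, not necessarily the exact rule used here.

```python
import pandas as pd

def estimate_lifetimes(events: pd.DataFrame) -> pd.Series:
    """Match each cancellation to a prior submission with the same price
    and quantity; unmatched cancellations are discarded."""
    pending = {}                                 # (price, qty) -> submission times
    lifetimes = []
    for row in events.sort_values("time").itertuples():
        key = (row.price, row.qty)
        if row.type == "submit":
            pending.setdefault(key, []).append(row.time)
        elif row.type == "cancel" and pending.get(key):
            lifetimes.append(row.time - pending[key].pop())   # LIFO match
    return pd.Series(lifetimes, name="lifetime")
```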

FIG. 20. Distribution of estimated lifetime of cancelled limit orders for the four stocks, compared with a power law x^{-1.4}.

FIG. 21. Distribution of estimated lifetime of executed limit orders for the four stocks, compared with a power law x^{-1.5}.

    E. Intraday seasonality

Activity on financial markets is of course not constant throughout the day. Figure 22 (resp. 23) plots the (normalized) number of market (resp. limit) orders arriving in a 5-minute interval. It is clear that a U-shape is observed (an ordinary least-squares quadratic fit is plotted): the observed market activity is larger at the beginning and the end of the day, and quieter around mid-day. Such a U-shaped curve is well-known, see Biais et al. (1995), for example. On our data, we observe that the number of orders in a 5-minute interval can vary by a factor of 10 throughout the day.
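A sketch of the binning and ordinary least-squares quadratic fit described above; the 5-minute bin width and the normalization follow the description of figures 22 and 23.

```python
import numpy as np

def intraday_profile(order_times, bin_seconds=300):
    """Normalized order counts per 5-minute bin, with a quadratic OLS fit
    whose positive curvature captures the U shape."""
    t = np.asarray(order_times, dtype=float)      # intraday timestamps (seconds)
    edges = np.arange(t.min(), t.max() + bin_seconds, bin_seconds)
    counts, _ = np.histogram(t, bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    counts = counts / counts.mean()               # normalize as in figure 22
    coeffs = np.polyfit(centers, counts, deg=2)   # a*t**2 + b*t + c
    return centers, counts, coeffs
```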

FIG. 22. Normalized average number of market orders in a 5-minute interval, for BNPP.PA and FTE.PA, with quadratic fits.

FIG. 23. Normalized average number of limit orders in a 5-minute interval, for BNPP.PA and FTE.PA, with quadratic fits.

Challet and Stinchcombe (2001) note that the average number of orders submitted to the market in a period T varies wildly during the day. The authors also observe that these quantities for market orders and limit orders are highly correlated. Such intraday variation of the global market activity is a well-known fact, already observed in Biais et al. (1995), for example.


    F. Market impact

The statistics we have presented may help to understand a phenomenon of primary importance for any financial market practitioner: the market impact, i.e. the relationship between the volume traded and the expected price shift once the order has been executed. To a first approximation, one understands that it is closely linked to many items described above: the volume of market orders submitted, the shape of the order book (how many pending limit orders are hit by one large market order), the correlation of trade signs (one may assume that large orders are split in order to avoid a large market impact), etc.

Many empirical studies are available. An empirical study of the price impact of individual transactions on 1000 stocks on the NYSE is conducted in Lillo et al. (2003). It is found that a proper rescaling makes all the curves collapse onto a single concave master curve. This function increases as a power law with an exponent of the order of 1/2 for small volumes, but then increases more slowly for large volumes. They obtain similar results in each year of the period 1995 to 1998.

We will not review the large literature on market impact any further, but rather refer the reader to the recent exhaustive synthesis proposed in Bouchaud et al. (2009), where different types of impact, as well as some theoretical models, are discussed.

    IV. CORRELATIONS OF ASSETS

The word correlation is defined as "a relation existing between phenomena or things or between mathematical or statistical variables which tend to vary, be associated, or occur together in a way not expected on the basis of chance alone"². When we talk about correlations in stock prices, what we are really interested in are relations between variables such as stock prices, order signs, transaction volumes, etc., and more importantly how these relations affect the nature of the statistical distributions and laws which govern the price time series. This section deals with several topics concerning linear correlation observed in financial data. The first part deals with the important issue of computing correlations at high frequency. As mentioned earlier, the computerization of financial exchanges has led to the availability of huge amounts of tick-by-tick data, and computing correlations using these intraday data raises many issues concerning the usual estimators. The second and third parts deal with the use of correlations to cluster assets, with potential applications in risk management problems.

    2 In Merriam-Webster Online Dictionary. Retrieved June 14, 2010,from http://www.merriam-webster.com/dictionary/correlations

    A. Estimating covariance on high-frequency data

Let us assume that we observe d time series of prices or log-prices p_i, i = 1, . . . , d, observed at times t_m, m = 0, . . . , M. The usual estimator of the covariance of prices i and j is the realized covariance estimator, which is computed as:

$$\hat{\Sigma}^{RV}_{ij}(t) = \sum_{m=1}^{M} \big(p_i(t_m) - p_i(t_{m-1})\big)\big(p_j(t_m) - p_j(t_{m-1})\big). \qquad (6)$$
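As a minimal illustration, assuming an (M+1) x d array of synchronously sampled log-prices (one column per asset), the estimator of Eq. 6 reads:

import numpy as np

def realized_covariance(prices):
    # prices: (M+1) x d array, row m holding p_i(t_m) for all assets i.
    increments = np.diff(prices, axis=0)     # p(t_m) - p(t_{m-1})
    return increments.T @ increments         # d x d realized covariance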

The problem is that high-frequency tick-by-tick data record changes of prices when they happen, i.e. at random times. Tick-by-tick data is thus asynchronous, contrary to daily close prices, for example, which are recorded at the same time for all the assets on a given exchange. Using standard estimators without caution could be one cause of the Epps effect, first observed in Epps (1979), which stated that "[c]orrelations among price changes in common stocks of companies in one industry are found to decrease with the length of the interval for which the price changes are measured". This has largely been verified since, e.g. in Bonanno et al. (2001) or Renò (2003). Hayashi and Yoshida (2005) show that the non-synchronicity of tick-by-tick data, and the necessary sampling of time series in order to compute the usual realized covariance estimator, partially explain this phenomenon. We very briefly review here two covariance estimators that do not need any synchronicity (hence, sampling) in order to be computed.

    1. The Fourier estimator

The Fourier estimator has been introduced by Malliavin and Mancino (2002). Let us assume that we have d time series of log-prices that are observations of Brownian semi-martingales p_i:

$$dp_i = \sum_{j=1}^{K} \sigma_{ij}\, dW_j + \mu_i\, dt, \qquad i = 1, \ldots, d. \qquad (7)$$

The coefficients of the covariance matrix are then written Σ_ij(t) = Σ_{k=1}^{K} σ_ik(t) σ_jk(t). Malliavin and Mancino (2002) show that the Fourier coefficients of Σ_ij(t) are, with n_0 a given integer:

$$a_k(\Sigma_{ij}) = \lim_{N\to\infty} \frac{\pi}{N+1-n_0} \sum_{s=n_0}^{N} \frac{1}{2}\big[a_s(dp_i)\,a_{s+k}(dp_j) + b_{s+k}(dp_i)\,b_s(dp_j)\big], \qquad (8)$$

$$b_k(\Sigma_{ij}) = \lim_{N\to\infty} \frac{\pi}{N+1-n_0} \sum_{s=n_0}^{N} \frac{1}{2}\big[a_s(dp_i)\,b_{s+k}(dp_j) - b_s(dp_i)\,a_{s+k}(dp_j)\big], \qquad (9)$$


where the Fourier coefficients a_k(dp_i) and b_k(dp_i) of dp_i can be directly computed on the time series. Indeed, rescaling the time window to [0, 2π] and using integration by parts, we have:

$$a_k(dp_i) = \frac{p_i(2\pi) - p_i(0)}{\pi} + \frac{k}{\pi}\int_0^{2\pi} \sin(kt)\, p_i(t)\, dt. \qquad (10)$$

This last integral can be discretized and approximately computed using the times t^i_m of observation of the process p_i. Therefore, fixing a sufficiently large N, one can compute an estimator Σ^F_ij of the covariance of the processes i and j. See Renò (2003) or Iori and Precup (2007) for examples of empirical studies using this estimator.
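A minimal sketch of this estimator for two assets is given below, assuming the Fourier coefficients of dp_i are discretized directly on the price increments and keeping only the zeroth harmonic (k = 0, n_0 = 1) of Eqs. 8-10; the overall normalization constant depends on the Fourier convention and should be treated as indicative rather than authoritative:

import numpy as np

def fourier_covariance(t_i, p_i, t_j, p_j, N=50):
    # t_i, t_j: observation times (possibly asynchronous); p_i, p_j: log-prices.
    t0 = min(t_i[0], t_j[0])
    t1 = max(t_i[-1], t_j[-1])

    def fourier_coeffs(t, p):
        # a_s(dp), b_s(dp) discretized on the increments of p,
        # after rescaling the observation window to [0, 2*pi].
        tau = 2 * np.pi * (np.asarray(t) - t0) / (t1 - t0)
        dp = np.diff(np.asarray(p))
        s = np.arange(1, N + 1)
        a = np.cos(np.outer(s, tau[:-1])) @ dp / np.pi
        b = np.sin(np.outer(s, tau[:-1])) @ dp / np.pi
        return a, b

    ai, bi = fourier_coeffs(t_i, p_i)
    aj, bj = fourier_coeffs(t_j, p_j)
    # Integrated covariance over the window, from the zeroth Fourier
    # coefficient of Sigma_ij (normalization assumed, see the caveat above).
    return (np.pi ** 2 / N) * np.sum(ai * aj + bi * bj)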

    2. The Hayashi-Yoshida estimator

Hayashi and Yoshida (2005) have proposed a simple estimator in order to compute covariance/correlation without any need for synchronicity of the time series. As in the Fourier estimator, it is assumed that the observed process is a Brownian semi-martingale. The time window of observation is partitioned into d families of intervals Π^i = (U^i_m), i = 1, . . . , d, where t^i_m = inf{U^i_{m+1}} is the time of the m-th observation of the process i. Let us denote Δp_i(U^i_m) = p_i(t^i_m) − p_i(t^i_{m−1}). The "cumulative covariance estimator", as the authors named it, or the Hayashi-Yoshida estimator as it has been largely referred to, is then built as follows:

$$\hat{\Sigma}^{HY}_{ij}(t) = \sum_{m,n} \Delta p_i(U^i_m)\, \Delta p_j(U^j_n)\, \mathbf{1}_{\{U^i_m \cap U^j_n \neq \emptyset\}}. \qquad (11)$$
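Eq. 11 translates almost directly into code. The following naive O(MN) Python sketch multiplies the increments of the two assets whenever their observation intervals overlap, with no prior synchronization of the two time grids:

import numpy as np

def hayashi_yoshida(t_i, p_i, t_j, p_j):
    # t_i, t_j: increasing observation times; p_i, p_j: matching prices.
    dpi = np.diff(p_i)
    dpj = np.diff(p_j)
    cov = 0.0
    for m in range(len(dpi)):
        for n in range(len(dpj)):
            # Add the product when the intervals (t_i[m], t_i[m+1]] and
            # (t_j[n], t_j[n+1]] overlap, i.e. U^i_m and U^j_n intersect.
            if t_i[m] < t_j[n + 1] and t_j[n] < t_i[m + 1]:
                cov += dpi[m] * dpj[n]
    return cov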

There is a large literature in Econometrics that tackles the new challenges posed by high-frequency data. We refer the reader wishing to go beyond this brief presentation to the econometrics reviews by Barndorff-Nielsen and Shephard (2007) or McAleer and Medeiros (2008), for example.

    B. Correlation matrix and Random Matrix Theory

Stock market data being essentially multivariate time series data, we construct the correlation matrix to study its spectrum and contrast it with random multivariate data from a coupled map lattice. It is known from previous studies that the empirical spectra of correlation matrices drawn from time series data, for the most part, follow random matrix theory (RMT, see e.g. Gopikrishnan et al. (2001)).

    1. Correlation matrix and Eigenvalue density

a. Correlation matrix If there are N assets with price P_i(t) for asset i at time t, then the logarithmic return of stock i is r_i(t) = ln P_i(t) − ln P_i(t − 1), which, for a certain consecutive sequence of trading days, forms the return vector r_i. In order to characterize the synchronous time evolution of stocks, the equal-time correlation coefficient between stocks i and j is defined as

$$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{\big[\langle r_i^2 \rangle - \langle r_i \rangle^2\big]\big[\langle r_j^2 \rangle - \langle r_j \rangle^2\big]}}\,, \qquad (12)$$

where ⟨...⟩ indicates a time average over the trading days included in the return vectors. These correlation coefficients form an N × N matrix with −1 ≤ ρ_ij ≤ 1. If ρ_ij = 1, the stock price changes are completely correlated; if ρ_ij = 0, the stock price changes are uncorrelated; and if ρ_ij = −1, then the stock price changes are completely anti-correlated.
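In code, Eq. 12 amounts to demeaning the return series and normalizing by the standard deviations; a minimal numpy version (equivalent to np.corrcoef on the transposed array) is:

import numpy as np

def correlation_matrix(returns):
    # returns: T x N array, column i holding the log-returns r_i(t).
    r = returns - returns.mean(axis=0)            # remove <r_i>
    std = np.sqrt((r ** 2).mean(axis=0))          # sqrt(<r_i^2> - <r_i>^2)
    return (r.T @ r) / len(r) / np.outer(std, std)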

b. Correlation matrix of spatio-temporal series from coupled map lattices Consider a time series of the form z(x, t), where x = 1, 2, . . . , n and t = 1, 2, . . . , p denote the discrete space and time, respectively. Here, the time series at every spatial point is treated as a different variable. We define the normalised variable as

$$\tilde{z}(x, t) = \frac{z(x, t) - \langle z(x) \rangle}{\sigma(x)}\,, \qquad (13)$$

where the brackets ⟨·⟩ represent temporal averages and σ(x) the standard deviation of z at position x. Then, the equal-time cross-correlation matrix that represents the spatial correlations can be written as

$$S_{x,x'} = \langle \tilde{z}(x, t)\, \tilde{z}(x', t) \rangle\,, \qquad x, x' = 1, 2, \ldots, n. \qquad (14)$$

The correlation matrix is symmetric by construction. In addition, a large class of processes is translation invariant, and the correlation matrix can then carry that additional symmetry too. We will use this property for our correlation models in the context of the coupled map lattice. In time series analysis, the averages ⟨·⟩ have to be replaced by estimates obtained from finite samples. As usual, we will use the maximum likelihood estimates, ⟨a(t)⟩ ≈ (1/p) Σ_{t=1}^{p} a(t). These estimates contain statistical uncertainties, which disappear for p → ∞. Ideally, one requires p ≫ n to have reasonably correct correlation estimates. See Chakraborti et al. (2007) for details of the parameters.

c. Eigenvalue Density The interpretation of the spectra of empirical correlation matrices should be done carefully if one wants to be able to distinguish between system-specific signatures and universal features. The former express themselves in the smoothed level density, whereas the latter are usually represented by the fluctuations on top of this smooth curve. In time series analysis, the matrix elements are not only prone to uncertainty such as measurement noise on the time series data, but also to statistical fluctuations due to finite sample effects. When characterizing time series data in terms of random matrix theory, one is not interested in these trivial sources of fluctuations, which are present in every


data set, but one would like to identify the significant features which would be shared, in principle, by an infinite amount of data without measurement noise. The eigenfunctions of the correlation matrices constructed from such empirical time series carry the information contained in the original time series data in a graded manner, and they also provide a compact representation for it. Thus, by applying an approach based on random matrix theory, one tries to identify non-random components of the correlation matrix spectra as deviations from random matrix theory predictions (Gopikrishnan et al. (2001)).

We will look at the eigenvalue density that has been studied in the context of applying random matrix theory methods to time series correlations. Let N(λ) be the integrated eigenvalue density, which gives the number of eigenvalues less than a given value λ. Then, the eigenvalue or level density is given by ρ(λ) = dN(λ)/dλ. This can be obtained assuming a random correlation matrix and is found to be in good agreement with the empirical time series data from stock market fluctuations. From random matrix theory considerations, the eigenvalue density for random correlations is given by

$$\rho_{\mathrm{rmt}}(\lambda) = \frac{Q}{2\pi\lambda} \sqrt{(\lambda_{\max} - \lambda)(\lambda - \lambda_{\min})}\,, \qquad (15)$$

where Q = N/T is the ratio of the number of variables to the length of each time series. Here, λ_max and λ_min, representing the maximum and minimum eigenvalues of the random correlation matrix respectively, are given by λ_max,min = 1 + 1/Q ± 2√(1/Q). However, due to the presence of correlations in the empirical correlation matrix, this eigenvalue density is often violated for a certain number of dominant eigenvalues, which often correspond to system-specific information in the data. In Fig. 24 we show

the eigenvalue density for S&P500 data and also for the chaotic data from a coupled map lattice. Clearly, both curves are qualitatively different. Thus, the presence or absence of correlations in data is manifest in the spectrum of the corresponding correlation matrices.
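The comparison between an empirical spectrum and the prediction of Eq. 15 can be sketched as follows, using the convention Q = N/T adopted in the text (C is an empirical correlation matrix, e.g. from Eq. 12):

import numpy as np

def rmt_density(lam, Q):
    # Eigenvalue density of Eq. 15, zero outside [lambda_min, lambda_max].
    lam = np.asarray(lam, dtype=float)
    lam_min = 1.0 + 1.0 / Q - 2.0 * np.sqrt(1.0 / Q)
    lam_max = 1.0 + 1.0 / Q + 2.0 * np.sqrt(1.0 / Q)
    rho = np.zeros_like(lam)
    inside = (lam > lam_min) & (lam < lam_max)
    rho[inside] = (Q / (2.0 * np.pi * lam[inside])) * np.sqrt(
        (lam_max - lam[inside]) * (lam[inside] - lam_min))
    return rho

Empirical eigenvalues for comparison are obtained as evals = np.linalg.eigvalsh(C); eigenvalues falling above lambda_max are the candidates for system-specific (non-random) information.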

    2. Earlier estimates and studies using Random MatrixTheory

Laloux et al. (1999) showed that results from random matrix theory were useful to understand the statistical structure of the empirical correlation matrices appearing in the study of price fluctuations. The empirical determination of a correlation matrix is a difficult task. If one considers N assets, the correlation matrix contains N(N − 1)/2 mathematically independent elements, which must be determined from N time series of length T. If T is not very large compared to N, then generally the determination of the covariances is noisy, and therefore the empirical correlation matrix is to a large extent random. The smallest eigenvalues of the matrix are the most sensitive to this noise, but the eigenvectors corresponding to these smallest eigenvalues determine the minimum-risk portfolios in Markowitz theory.

FIG. 24. The upper panel shows the spectral density ρ(λ) for multivariate spatio-temporal time series drawn from coupled map lattices. The lower panel shows the eigenvalue density for the return time series of the S&P500 stock market data (8938 time steps).

It is thus important to distinguish "signal" from "noise", or, in other words, to extract the eigenvectors and eigenvalues of the correlation matrix containing real information (those important for risk control) from those which do not contain any useful information and are unstable in time. It is useful to compare the properties of an empirical correlation matrix to a null hypothesis: a random matrix which arises, for example, from a finite time series of strictly uncorrelated assets. Deviations from the random matrix case might then suggest the presence of true information. The main result of their study was the remarkable agreement between the theoretical prediction (based on the assumption that the correlation matrix is random) and empirical data concerning the density of eigenvalues (shown in Fig. 25) associated with the time series of the different stocks of the S&P 500 (or other stock markets). Cross-correlations in financial data were also studied by Plerou et al. (1999, 2002). They analysed cross-correlations between price fluctuations of different stocks using methods of RMT. Using two large databases, they calculated cross-correlation matrices of returns constructed from (i) 30-min returns of 1000 US stocks for the 2-yr period 1994-95, (ii) 30-min returns of 881 US stocks for the 2-yr period 1996-97, and (iii) 1-day returns of 422 US stocks for the 35-yr period 1962-96. They also tested the statistics of the eigenvalues λ_i of cross-correlation matrices against a null hypothesis.


FIG. 25. Eigenvalue spectrum of the correlation matrices. Adapted from Laloux et al. (1999).

They found that a majority of the eigenvalues of the cross-correlation matrices were within the RMT bounds [λ_min, λ_max], as defined above for the eigenvalues of random correlation matrices. They also tested the eigenvalues of the cross-correlation matrices within the RMT bounds for universal properties of random matrices and found good agreement with the results for the Gaussian orthogonal ensemble (GOE) of random matrices, implying a large degree of randomness in the measured cross-correlation coefficients. Furthermore, they found that the distribution of eigenvector components for the eigenvectors corresponding to the eigenvalues outside the RMT bounds displayed systematic deviations from the RMT prediction, and that these deviating eigenvectors were stable in time. They analysed the components of the deviating eigenvectors and found that the largest eigenvalue corresponded to an influence common to all stocks. Their analysis of the remaining deviating eigenvectors showed distinct groups, whose identities corresponded to conventionally identified business sectors.

    C. Analyses of correlations and economic taxonomy

    1. Models and theoretical studies of financial correlations

Podobnik et al. (2000) studied how the presence of correlations in physical variables contributes to the form of probability distributions. They investigated a process with correlations in the variance generated by a Gaussian or a truncated Lévy distribution. For both Gaussian and truncated Lévy distributions, they found that, due to the correlations in the variance, the process dynamically generated power-law tails in the distributions, whose exponents could be controlled through the way the correlations in the variance were introduced. For a truncated Lévy distribution, the process could extend a truncated distribution beyond the truncation cutoff, leading to a crossover between a Lévy stable power law and their dynamically generated power law. It was also shown that the process could explain the crossover behavior observed in the S&P 500 stock index.

Noh (2000) proposed a model for correlations in stock markets in which the markets were composed of several groups, within which the stock price fluctuations were correlated. The spectral properties of empirical correlation matrices (Plerou et al. (1999); Laloux et al. (1999)) were studied in relation to this model, and the connection between the spectral properties of the empirical correlation matrix and the structure of correlations in stock markets was established.

The correlation structure of extreme stock returns was studied by Cizeau et al. (2001). It has been commonly believed that the correlations between stock returns increase in high-volatility periods. The authors investigated how much of these correlations could be explained within a simple non-Gaussian one-factor description with time-independent correlations. Using surrogate data with the true market return as the dominant factor, it was shown that most of these correlations, measured by a variety of different indicators, could be accounted for. In particular, their one-factor model could explain the level and asymmetry of empirical exceedance correlations. However, more subtle effects required an extension of the one-factor model, where the variance and skewness of the residuals also depend on the market return.

Burda et al. (2001) provided a statistical analysis of three S&P 500 covariances with evidence for raw tail distributions. They studied the stability of these tails against reshuffling for the S&P 500 data and showed that the covariance with the strongest tails was robust, with a spectral density in remarkable agreement with random Lévy matrix theory. They also studied the inverse participation ratio for the three covariances. The strong localization observed at both ends of the spectral density was analogous to the localization exhibited in the random Lévy matrix ensemble. They showed that the stocks with the largest scattering were the least susceptible to correlations and were the likely candidates for the localized states.

    2. Analyses using graph theory and economic taxonomy

Mantegna (1999) introduced a method for finding a hierarchical arrangement of stocks traded in a financial market, by studying the clustering of companies through the correlations of asset returns. With an appropriate metric, based on the correlation


matrix coefficients ρ_ij between all pairs of stocks i and j of the portfolio (computed in Eq. 12 by considering the synchronous time evolution of the difference of the logarithm of daily stock prices), a fully connected graph was defined in which the nodes are companies, or stocks, and the distances between them were obtained from the corresponding correlation coefficients. The minimum spanning tree (MST) was generated from the graph by selecting the most important correlations, and it was used to identify clusters of companies. The hierarchical tree of the sub-dominant ultrametric space associated with the graph provided information useful to investigate the number and nature of the common economic factors affecting the time evolution of the logarithm of the price of well-defined groups of stocks. Several other attempts have been made to obtain clusterings from the huge correlation matrix.

Bonanno et al. (2001) studied the high-frequency cross-correlation existing between pairs of stocks traded in a financial market, using a set of 100 stocks traded in US equity markets. A hierarchical organization of the investigated stocks was obtained by determining a metric distance between stocks and by investigating the properties of the sub-dominant ultrametric associated with it. A clear modification of the hierarchical organization of the set of stocks investigated was detected when the time horizon used to determine stock returns was changed. The hierarchical location of stocks of the energy sector was investigated as a function of the time horizon. The hierarchical structure explored by the minimum spanning tree also seemed to give information about the influential power of the companies.

It also turned out that the hierarchical structure of the financial market could be identified in accordance with the results obtained by an independent clustering method, based on Potts super-paramagnetic transitions, as studied by Kullmann et al. (2000), where the spins correspond to companies and the interactions are functions of the correlation coefficients determined from the time dependence of the companies' individual stock prices. The method is a generalization of the clustering algorithm by Blatt et al. (1996) to the case of anti-ferromagnetic interactions corresponding to anti-correlations. For the Dow Jones Industrial Average, no anti-correlations were observed in the investigated time period, and the previous results obtained by different tools were well reproduced. For the S&P 500, where anti-correlations occur, repulsion between stocks modified the cluster structure of the N = 443 companies studied, as shown in Fig. 26. The efficiency of the method is demonstrated by the fact that the figure matches well the corresponding result obtained by the minimal spanning tree method, including the specific composition of the clusters. For example, at the lowest level of the hierarchy (highest temperature in the super-paramagnetic phase), the different industrial branches can be clearly identified: oil, electricity, gold mining, etc. companies build separate clusters. The network of influence was investigated by means of a time-dependent correlation method by Kullmann et al. (2000).

FIG. 26. The hierarchical structure of clusters of the S&P 500 companies in the ferromagnetic case. In the boxes, the number of elements of each cluster is indicated; clusters consisting of single companies are not indicated. Adapted from Kullmann et al. (2000).

They studied the correlations as a function of the time shift between pairs of stock return time series from tick-by-tick data of the NYSE. They investigated whether any "pulling effect" between stocks existed, i.e. whether at any given time the return value of one stock influenced that of another stock at a different time. They found that, in general, two types of mechanisms generated significant correlation between any two given stocks. One was some kind of external effect (say, economic or political news) that influenced both stock prices simultaneously, so that the change in both prices appeared at the same time and the maximum of the correlation was at zero time shift. The second effect was that one of the companies had an influence on the other, indicating that one company's operation depended on the other, so that the price change of the influenced stock appeared later, because it required some time to react to the price change of the first stock, displaying a pulling effect. A weak but significant effect was found with the real data set, showing that in many cases the maximum correlation was at a non-zero time shift, indicating directions of influence between the companies; the characteristic time was of the order of a few minutes, which was compatible with the efficient market hypothesis. As regards the pulling effect, they found that, in general, more important companies (which were traded more) pulled the relatively smaller companies.

The time-dependent properties of the minimum spanning tree (introduced by Mantegna), called a "dynamic asset tree", were studied by Onnela et al. (2003b). The nodes of the tree were identified with stocks, and the distance between them was a unique function of the corresponding element of the correlation matrix. By using the concept of a central vertex, chosen as the most strongly connected node of the tree, the mean occupation layer was defined, which was an important characteristic of the tree. During crashes, the strong global correlation in the market manifested itself by a low value of the mean occupation layer.


The tree seemed to have a scale-free structure, where the scaling exponent of the degree distribution was different for "business as usual" and "crash" periods. The basic structure of the tree topology was very robust with respect to time. Let us discuss in more detail how the dynamic asset tree was applied to studies of economic taxonomy.

a. Financial Correlation matrix and constructing Asset Trees Two different sets of financial data were used. The first set, from the Standard & Poor's 500 index (S&P500) of the New York Stock Exchange (NYSE) from July 2, 1962 to December 31, 1997, contained 8939 daily closing values. The second set recorded the split-adjusted daily closure prices for a total of N = 477 stocks traded at the New York Stock Exchange (NYSE) over the period of 20 years, from 02-Jan-1980 to 31-Dec-1999. This amounted to a total of 5056 prices per stock, indexed by the time variable τ = 1, 2, . . . , 5056. For analysis and smoothing purposes, the data was divided time-wise into M windows t = 1, 2, . . . , M of width T, where T corresponded to the number of daily returns included in the window. Note that several consecutive windows overlap with each other, the extent of which is dictated by the window step length parameter δT, which describes the displacement of the window and is also measured in trading days. The choice of window width is a trade-off between too noisy and too smoothed data for small and large window widths, respectively. The results presented here were calculated from monthly stepped four-year windows, i.e. δT = 250/12 ≈ 21 days and T = 1000 days. A large scale of different values for both parameters was explored, and the cited values were found optimal (Onnela (2000)). With these choices, the overall number of windows is M = 195.

The earlier definition of the correlation matrix, given by Eq. 12, is used. These correlation coefficients form an N × N correlation matrix C^t, which serves as the basis for the trees discussed below. An asset tree is then constructed according to the methodology of Mantegna (1999). For the purpose of constructing asset trees, a distance is defined between a pair of stocks. This distance is associated with the edge connecting the stocks, and it is expected to reflect the level at which the stocks are correlated. A simple non-linear transformation d^t_ij = √(2(1 − ρ^t_ij)) is used to obtain distances with the property 2 ≥ d_ij ≥ 0, forming an N × N symmetric distance matrix D^t. So, if d_ij = 0, the stock price changes are completely correlated; if d_ij = 2, the stock price changes are completely anti-correlated. The trees for different time windows are not independent of each other, but form a series through time. Consequently, this multitude of trees is interpreted as a sequence of evolutionary steps of a single dynamic asset tree. An additional hypothesis is required about the topology of the metric space: the ultrametricity hypothesis. In practice, it leads to determining the minimum spanning tree (MST) of the distances, denoted T^t. The spanning tree is a simply connected acyclic (no cycles) graph that connects all N

nodes (stocks) with N − 1 edges such that the sum of all edge weights, Σ_{d^t_ij ∈ T^t} d^t_ij, is minimum. We refer to

the minimum spanning tree at time t by the notation T^t = (V, E^t), where V is a set of vertices and E^t is a corresponding set of unordered pairs of vertices, or edges. Since the spanning tree criterion requires all N nodes to be always present, the set of vertices V is time-independent, which is why the time superscript has been dropped from the notation. The set of edges E^t, however, does depend on time, as it is expected that edge lengths in the matrix D^t evolve over time, and thus different edges get selected in the tree at different times.
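A minimal sketch of the construction, from a correlation matrix C (Eq. 12) to the MST edges, can be written with scipy:

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def asset_tree(C):
    # Distance matrix d_ij = sqrt(2 (1 - rho_ij)), then Mantegna's MST.
    # Note: scipy treats explicit zeros as absent edges, so exactly zero
    # off-diagonal distances (rho_ij = 1), if any, would be dropped.
    D = np.sqrt(2.0 * (1.0 - np.asarray(C)))
    mst = minimum_spanning_tree(D)            # sparse matrix, N - 1 edges
    rows, cols = mst.nonzero()
    weights = np.asarray(mst[rows, cols]).ravel()
    return list(zip(rows, cols, weights))     # edges (i, j, d_ij)

The returned spanning tree minimizes the sum of the edge weights d_ij, as required by the criterion above.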

b. Market characterization We plot the distribution of (i) the distance elements d^t_ij contained in the distance matrix D^t (Fig. 27), and (ii) the distance elements d_ij contained in the asset (minimum spanning) tree T^t (Fig. 28). In both plots, but most prominently in Fig. 27, there appears to be a discontinuity in the distribution between roughly 1986 and 1990. The part that has been cut out, pushed to the left and made flatter, is a manifestation of Black Monday (October 19, 1987), and its length along the time axis is related to the choice of window width T (Onnela et al. (2003a,b)).

FIG. 27. Distribution of all N(N − 1)/2 distance elements d_ij contained in the distance matrix D^t as a function of time.

Also, note that in the distribution of tree edges in Fig. 28, most edges included in the tree seem to come from the area to the right of the value 1.1 in Fig. 27, and the largest distance element is d_max = 1.3549.

c. Tree occupation and central vertex Let us focus on characterizing the spread of nodes on the tree by introducing the quantity of mean occupation layer:

$$l(t, v_c) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lev}(v_i^t)\,, \qquad (16)$$


where lev(v_i) denotes the level of vertex v_i. The levels, not to be confused with the distances d_ij between nodes, are measured in natural numbers in relation to the central vertex v_c, whose level is taken to be zero. Here, the mean occupation layer indicates the layer on which the mass of the tree, on average, is conceived to be located. The central vertex is considered to be the parent of all other nodes in the tree, and is also known as the root of the tree. It is used as the reference point in the tree, against which the locations of all other nodes are relative. Thus all other nodes in the tree are children of the central vertex. Although there is an arbitrariness in the choice of the central vertex, it is proposed that the vertex is central in the sense that any change in its price strongly affects the course of events in the market as a whole. Three alternative definitions for the central vertex were proposed in the studies, all yielding similar and, in most cases, identical outcomes. The idea is to find the node that is most strongly connected to its nearest neighbours. For example, according to one definition, the central node is the one with the highest vertex degree, i.e. the number of edges which are incident with (neighbours of) the vertex. Also, one may have either (i) a static (fixed at all times) or (ii) a dynamic (updated at each time step) central vertex, but again the results do not seem to vary significantly. The variation of the topological properties and nature of the trees with time was also studied.
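As an illustration, the mean occupation layer of Eq. 16 can be computed from the MST edges (as returned by the asset_tree sketch above) with a breadth-first search, taking here the highest-vertex-degree definition of the central vertex, one of the alternatives just mentioned:

from collections import deque
import numpy as np

def mean_occupation_layer(edges, N):
    # edges: (i, j, weight) tuples of the MST; N: number of stocks.
    adjacency = [[] for _ in range(N)]
    for i, j, _ in edges:
        adjacency[int(i)].append(int(j))
        adjacency[int(j)].append(int(i))
    v_c = max(range(N), key=lambda v: len(adjacency[v]))  # central vertex
    levels = [-1] * N
    levels[v_c] = 0
    queue = deque([v_c])
    while queue:                        # breadth-first search from v_c
        v = queue.popleft()
        for w in adjacency[v]:
            if levels[w] < 0:
                levels[w] = levels[v] + 1
                queue.append(w)
    return float(np.mean(levels))       # l(t, v_c) of Eq. 16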

d. Economic taxonomy Mantegna's idea of linking stocks in an ultrametric space was motivated a posteriori by the property of such a space to provide a meaningful economic taxonomy (Onnela et al. (2002)). Mantegna examined the meaningfulness of the taxonomy by comparing the grou