The Fourth Quadrant - Technical Appendix, Nassim Nicholas Taleb, Edge - 09


The Role and Nature of High Impact Events (Black Swans): Technical Commentary and Empirical Data

N N Taleb

This is an appendix to the Edge piece. It is striking how a few simple tests (of the stability of the 4th moment and of failures of stress testing) can invalidate tens of thousands of research papers on prediction using "least squares", and those based on "standard deviation", "variance", "correlation", "GARCH", "VAR", etc. Indeed one or two tests can transform anything quantitative/statistical in social science (outside psychology) into a facade of knowledge.

Introduction:

Data: Note that the analysis here is exhaustive: it is done systematically on almost ALL transacted macro data, representing >98% of worldwide volume. I used interest rates, commodities (oil, agricultural), all available equity indices (US, UK, Continental Europe, Russia, Indonesia, Brazil), and the main traded currencies. I selected tradability because of its "cleanliness" compared to merely computed data. I also added some micro data: although indices encompass single equities, I processed >18 million pieces of single-stock daily data, and select industry data such as drug sales, movie returns, etc. (whatever "clean" data I could find). While we have a plethora of data on business variables, we don't have enough on epidemics, terrorism, wars, etc.

Logical and Mathematical Commentary

1) Telescope problem, insufficiency of data in the tails; consequence on left-skewed and right-skewed distributions;

2) Preasymptotics of probability distributions, classification of convergence, or why the central limit theorem is too Platonic;

What empirical data shows:

1) The severity of the fatness of the tails -- and our inability to say "how" fat (my central problem). Not only is kurtosis > 3 everywhere; it is also unstable. One single observation in 10,000 can represent 80% of the total fourth moment. Aside from the unpredictability, this means that notions based on the L2 norm (like variance, standard deviation, correlation) are meaningless as expressions of any of the attributes of the probability distribution.

2) The "atypicality" of moves discussed in the article, or why "stress testing" is dangerous -- and why the data cannot be captured by conventional "stress" tests or by a Poisson (except after the fact). Also, while we are certain of power laws, we just can't see the tails very clearly.

3) Past deviations (expressed in shortfalls) do not predict future deviations -- at any lag you use, collectively. There was no need to do it, but I tried anyway out of curiosity.
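The instability in point 1 is easy to reproduce. A minimal sketch (illustrative only; a Student t with 3 degrees of freedom stands in for a fat-tailed return series, and the sample size mirrors the 10,000 daily observations discussed below):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Fat-tailed sample: Student t, 3 degrees of freedom (4th moment does not exist).
x = rng.standard_t(3, n)
q = x ** 4
share_fat = q.max() / q.sum()   # share of the 4th moment owed to the single largest draw

# Thin-tailed benchmark: a Gaussian sample of the same size.
g = rng.standard_normal(n) ** 4
share_thin = g.max() / g.sum()

print(f"largest-observation share of 4th moment: t(3) {share_fat:.1%}, Gaussian {share_thin:.1%}")
```

A single draw routinely carries a large fraction of the measured kurtosis in the fat-tailed sample, while the Gaussian share stays negligible; rerunning with other seeds moves the t(3) number wildly, which is exactly the instability at issue.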

Acknowledgments: These went into three pieces of formal technical work: a paper in the journal Complexity (about the problem of separation of fractal power laws into two basins and the effect of the preasymptotics of fat-tailed processes), another in the International Journal of Forecasting (about the role of measurement error with fat tails and what to do about it), and another under review (about the problems of model selection when we only observe the data, not the process). The arguments were presented at a special panel at the American Statistical Association Joint Statistical Meeting on August 6, 2008, in Denver. I thank Peter Westfall, Aaron Brown, Stan Young, Donald Rubin, and Robert Lund for helpful discussions. I also thank Benoit Mandelbrot, David Freedman, and Philip Stark for comments, and my longtime colleague Pallop Angsupun for help with data. I thank David Shaywitz for help with payoffs from innovation in drugs. I also thank Scott Patterson for alerting me to the "great moderation" theories.

Mathematical Discussion

Definition of payoffs: where p(x) is the probability density, D the domain of the event, and f(x) the payoff function. Using continuous distributions to simplify, a payoff is evaluated as

∫D f(x) p(x) dx

Simple payoff: f(x) = 1 (a bet)

Complex payoff, expectation: f(x) = x

More complex payoff: f(x) nonlinear

The Telescope Problem or The Problem of Fat Tails

Incompleteness of Information about Tail Events. Let us call the "true" probabilities p*, the probabilities that would be obtained with full knowledge of the generating process. We do not observe them, but they are the ones from which, in a world where it could be designed, the data is sampled.

Assuming a unimodal distribution, with the probability p of a state and its associated payoff Δ, we are dealing with a "contribution" product p Δ as a rectangle that gets thinner as p becomes smaller (smaller probability, larger deviation), but whose area is more stochastic -- and possibly larger -- with the error in the estimation of the product p Δ getting larger as p gets smaller.

Figure 1 - The stochastic rectangle: probability times deviation shows the contribution of an event to the total properties. With low probabilities the rectangle becomes very unstable.

Technical Difference between Fat Tails and Thin Tails: Another way to recover power laws. To take again my metaphor of the stochastic rectangle, but complicating it by considering the mth power of the payoff, p Δ^m, there are two types of distributions -- two distinct basins:

1) those for which p Δ^m declines rapidly, so the terms become insignificant (as p becomes smaller) for all values of m. If you move to a continuous variable you get exponential decline as the solution: for large Δ, f(Δ) = K e^(-Δ), which brings us (thanks to a convolution) to the Gaussian as a special limiting case.

2) others for which these terms stay significant enough, so here you get as a unique solution, for large Δ, f(Δ) = K Δ^(-α). The value of m at which the rectangle p Δ^m explodes to infinity is set by the exponent of the power law.

In other words, when higher moments E[x^m] fail to converge, the usual expansion around values of X does not work -- higher-order terms increase in importance.
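A numerical sketch of the second basin (the exponent α = 2.5 here is an arbitrary choice): for a Pareto variable, the share of the m-th power sum owed to the single largest observation grows with m, exploding once m approaches the tail exponent.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 2.5                               # hypothetical tail exponent
x = rng.pareto(alpha, 100_000) + 1.0      # classical Pareto: P[X > x] = x^(-alpha), x >= 1

shares = {}
for m in (1, 2, 3, 4):
    xm = x ** m
    shares[m] = xm.max() / xm.sum()       # top observation's share of the m-th power sum
    print(f"m={m}: share of the m-th moment from the largest observation = {shares[m]:.4f}")
```

For m = 1 the top observation is a rounding error; by m = 4 (past the exponent) a single draw dominates the sum, which is the "rectangle exploding" in the text.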

The problem of lack of knowledge of the distribution. It is a rarely noticed fact that absence of knowledge of the parameters of the distribution generates different classes of fat tails (depending on the structure of the ignorance). Stochastic volatility models, for instance, can come out of simply not knowing the standard deviation of a Gaussian -- and having to estimate it. See the section on preasymptotics.

Preasymptotics of Platonic Distributions

Background: People discuss the central limit theorem: how the sum of N random variables (with finite variance and some independence) converges to the Gaussian basin. This is mathematically true but practically wrong: you converge -- but not at a reasonable speed, and not in the tails. Fat tails imply that the higher moments implode -- not just the 4th.

The additivity of the log of the characteristic function under convolution makes it easy to examine the speed of convergence to the Gaussian basin. Some distributions have strong asymptotics, others don't.

Table of Normalized Cumulants -- Speed of Convergence: take the log of the Fourier transform of the distribution, differentiate m times at 0, and scale by σ^m, where m is the order of the cumulant (and σ² the variance of the N-fold sum). You would observe convergence to the Gaussian when the scaled cumulants of order m > 2 go to 0 as N becomes large (in a way that facilitates the collapse of the higher orders of the distribution). We can see that some distributions reach the Gaussian easily (the scaled 4th cumulant of the exponential is 6/N and that, slower, of the Poisson is 1/(Nλ)) -- others (power laws under any parametrization) NEVER do so for some higher moment, finite or no finite variance. Later, looking at the data, I will examine the empirical cumulant (N from 1 to 50) and show how we typically observe NO convergence outside of the small-sample effect.
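The table's logic can be checked by simulation (a sketch, subject to Monte Carlo error): the measured excess kurtosis of a sum of N Exponential(1) variables decays like 6/N, while for sums of Student t(3) variables, whose fourth moment does not exist, the number never settles.

```python
import numpy as np

rng = np.random.default_rng(2)
trials = 200_000

def excess_kurtosis(s):
    s = s - s.mean()
    return (s ** 4).mean() / ((s ** 2).mean() ** 2) - 3.0

ek_expo, ek_t3 = {}, {}
for N in (1, 10, 50):
    # Sums of N iid Exponential(1): theory says the scaled 4th cumulant is 6/N.
    ek_expo[N] = excess_kurtosis(rng.exponential(1.0, (trials, N)).sum(axis=1))
    # Sums of N iid Student t(3): the 4th moment does not exist, so the
    # measured number is sample-dependent noise at any N.
    ek_t3[N] = excess_kurtosis(rng.standard_t(3, (trials, N)).sum(axis=1))
    print(f"N={N:2d}  exponential: {ek_expo[N]:.3f} (theory {6 / N:.3f})   t(3): {ek_t3[N]:.1f}")
```

The exponential column tracks 6/N closely; the t(3) column jumps around between reruns, with no tendency toward 0.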

Now Bouchaud and Potters showed the slowness of convergence for power laws (you converge to the Gaussian only within a central region of order ±√(N log N) standard deviations, meaning the tails stay heavy). Mandelbrot and I used extreme value theory to get the same result: the Extremum/Mean ratio stays significant until we hit a huge N.
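The Extremum/Mean point can be illustrated numerically (a sketch): for a Gaussian, the ratio of the maximum absolute deviation to the mean absolute deviation grows only logarithmically in N, while for a Student t(3) it keeps growing like a power of N.

```python
import numpy as np

rng = np.random.default_rng(7)
trials = 500
ratio = {}
for N in (100, 10_000):
    g = np.abs(rng.standard_normal((trials, N)))
    t = np.abs(rng.standard_t(3, (trials, N)))
    ratio["gauss", N] = (g.max(axis=1) / g.mean(axis=1)).mean()   # thin-tailed benchmark
    ratio["t3", N] = (t.max(axis=1) / t.mean(axis=1)).mean()      # fat-tailed case
    print(f"N={N:6d}: Max/Mean -- Gaussian {ratio['gauss', N]:5.1f}, t(3) {ratio['t3', N]:5.1f}")
```

Going from N = 100 to N = 10,000 barely moves the Gaussian ratio, while the t(3) ratio keeps climbing: the extremum stays significant relative to the mean at sample sizes where the Gaussian has long since "averaged out".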

Table 1: Behavior under convolution of common distributions in the Gaussian family (N-fold sums; the scaled cumulants are the standard values).

Normal(μ,σ): N-convoluted log characteristic N(iμz - σ²z²/2); scaled cumulants: 2nd = 1, 3rd = 0, 4th = 0.
Poisson(λ): N-convoluted log characteristic Nλ(e^(iz) - 1); scaled cumulants: 2nd = 1, 3rd = 1/√(Nλ), 4th = 1/(Nλ).
Exponential(λ): N-convoluted log characteristic -N log(1 - iz/λ); scaled cumulants: 2nd = 1, 3rd = 2/√N, 4th = 6/N.
Γ(a,b): N-convoluted log characteristic -aN log(1 - ibz); scaled cumulants: 2nd = 1, 3rd = 2/√(aN), 4th = 6/(aN).

Table 2: Behavior under convolution of more complicated distributions.

Mixed Gaussians (stoch-vol style): scaled cumulants: 2nd = 1, 3rd = 0, 4th finite (depends on the mixture), 6th finite.
Student T(3): scaled cumulants: 2nd = 1, 3rd = Ind, 4th = Ind, 6th = Ind.
Student T(4): scaled cumulants: 2nd = 1, 3rd finite, 4th = Ind, 6th = Ind.

We see from Table 2 a huge qualitative difference between stochastic volatility and the Student T.

Note: What do we mean by "infinite kurtosis" or "infinite moment"? It simply means that the number is unstable: it does not converge as the observation window lengthens; its measurement is sample-dependent. I typically use "indeterminate" (Ind).

Letting the Data Speak

Sampling Error of the Fourth Moment

Kurtosis in the normal framework implies "departure from the Gaussian". So can you imagine that people talk about "kurtosis" -- and measure it -- when one single observation in 40 years (10,000 data points) can represent 90% of its properties! The implication is that 1) most of the work about fat tails and 2) any measure of "volatility" in L2 are simply inoperative! I take here the maximum variable to the fourth power to see its contribution to the kurtosis. For a Gaussian, with N ~ 10,000, the number is expected to be ridiculously small, ~ .008.

Implication: we don't know how "fat" the tails are -- if we want to stay in the regular world of assuming that a distribution has three attributes: centrality, dispersion, symmetry. But we need a fourth dimension, a tail indicator, and power laws have it. So, again, we need to escape the L2 norm.

This also tells us that GARCH should not work -- indeed it DOES NOT work out of sample.

In the Gaussian world the measure has a small dispersion, around .008 for N = 10,000 (by simulation). Even then, one observation in 10,000 synthetic securities represented a max of ~ .037.
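The .008 figure can be checked by direct simulation (a sketch; 200 synthetic Gaussian series of n = 10,000 each):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 10_000, 200
shares = np.empty(trials)
for i in range(trials):
    q = rng.standard_normal(n) ** 4
    shares[i] = q.max() / q.sum()   # share of the 4th moment owed to the largest draw
print(f"mean max-quartic share, Gaussian, n={n}: {shares.mean():.4f}")
```

The mean share lands near .008, which makes the Max Quartic numbers in the table below (often .1 to .9) all the more striking.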

Saying "Fat Tails" Implies Difficulties with the Distribution

The instability of the fourth moment. KURT is the "raw" kurtosis for daily observations, KURT10 for biweekly ones, and KURT66 for 3-month observations of log changes in the macro variables. "Max Quartic" is the maximal contribution to the fourth moment coming from one single observation; the final column is the number of years in the sample.

KURT KURT10 KURT66 Max Quartic Years

Australian Dollar 6.3 3.8 2.9 0.12 22.

Australia TB 10y 7.5 6.2 3.5 0.08 25.

Australia TB 3y 7.5 5.4 4.2 0.06 21.

BeanOil 5.5 7. 4.9 0.11 47.

Bonds 30Y 5.6 4.7 3.9 0.02 32.

Bovespa 24.9 5. 2.3 0.27 16.

British Pound 6.9 7.4 5.3 0.05 38.

CAC40 6.5 4.7 3.6 0.05 20.

Canadian Dollar 7.4 4.1 3.9 0.06 38.

Cocoa NY 4.9 4. 5.2 0.04 47.

Coffee NY 10.7 5.2 5.3 0.13 37.

Copper 6.4 5.5 4.5 0.05 48.

Corn 9.4 8. 5. 0.18 49.

CrudeOil 29. 4.7 5.1 0.79 26.

CT 7.8 4.8 3.7 0.25 48.

DAX 8. 6.5 3.7 0.2 18.

Euro Bund 4.9 3.2 3.3 0.06 18.

Euro Curr 5.5 3.8 2.8 0.06 38.

Eurodollar Depo 1M 41.5 28. 6. 0.31 19.

Eurodollar Depo 3M 21.1 8.1 7. 0.25 28.

FTSE 15.2 27.4 6.5 0.54 25.

Gold 11.9 14.5 16.6 0.04 35.

Heating Oil 20. 4.1 4.4 0.74 31.

Hogs 4.5 4.6 4.8 0.05 43.

Jakarta Stock Index 40.5 6.2 4.2 0.19 16.

JGB 17.2 16.9 4.3 0.48 24.

Live Cattle 4.2 4.9 5.6 0.04 44.

Nasdaq 11.4 9.3 5. 0.13 21.

NatGas 6. 3.9 3.8 0.06 19.

Nikkei 52.6 4. 2.9 0.72 23.

Notes 5Y 5.1 3.2 2.5 0.06 21.

Russia RTSI 13.3 6. 7.3 0.13 17.

Short Sterling 851.8 93. 3. 0.75 17.

Silver 160.3 22.6 10.2 0.94 46.

smallcap 6.1 5.7 6.8 0.06 17.

SoyBeans 7.1 8.8 6.7 0.17 47.

SoyMeal 8.9 9.8 8.5 0.09 48.

sp500 38.2 7.7 5.1 0.79 56.

Sugar #11 9.4 6.4 3.8 0.3 48.

SwissFranc 5.1 3.8 2.6 0.05 38.

TY10Y Notes 5.9 5.5 4.9 0.1 27.

Wheat 5.6 6. 6.9 0.02 49.

Yen 9.7 6.1 2.5 0.27 38.

Behavior of the Fourth Moment under temporal aggregation

The discussion of the preasymptotics table shows the theoretical effect of the central limit theorem, if it worked. Yet we see NONE beyond the regular sampling error with "infinite" (i.e. nonexistent) moments.

With Δt as the lag in days (here the lag is 1 through 45):

A slight technicality: I avoid the notion of an ex post "mean" in the computation of kurtosis. Most of the data is continuous futures, with 0 expected mean.

Note: some data is "controlled", making it less wild, owing to circuit breakers (markets shut down if they move more than, say, 3 points), which causes an artificial thinning of the tails and lowers the maximum quartic contribution. For instance on Oct 20, 1987, the 30y bond moved 10 points in the real market, but only a move of 3 was registered as the circuit breakers were activated.

Longitudinal 4th moment: no sign of stability. Typical graph.

First Conclusion: Avoid the use of "variance" metrics. Mean-variance is inadequate.

Evidence of Scalability -- or Why Observed "Fat Tails" are not (Standard) Poisson and why there is no TYPICAL deviation

Thanks to the need for the probabilities to add up to 1 (something even economists seem to agree with), scalability in the tails is the sole possible model for such data. We may not be able to write the model for the full distribution -- but we know what it looks like in the tails, where it matters.

The Behavior of Conditional Averages: With a scalable (or "scale-free") distribution, when K is "in the tails" (say you reach the point where the density f(x, α) = C x^(-α), where C is a constant and α the power-law exponent), the relative conditional expectation of X (knowing that X > K) divided by K, that is E[X | X > K] / K, is a constant and does not depend on K. More precisely, it is α/(α - 1).

This provides a handy way to ascertain scalability: raise K and look at the conditional averages in the data.

Note further that, for a standard Poisson (the point is too obvious for a Gaussian), not only does the conditional expectation depend on K, but it "wanes", i.e. E[X | X > K] / K tends to 1 as K grows.
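Both behaviors can be seen in a short simulation (a sketch; the Pareto exponent α = 3 is an arbitrary choice): the conditional-average ratio sits near α/(α - 1) at every threshold for the scalable distribution, while for the absolute value of a Gaussian it wanes toward 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000
alpha = 3.0                                  # hypothetical tail exponent
pareto = rng.pareto(alpha, n) + 1.0          # P[X > x] = x^(-alpha), x >= 1
gauss = np.abs(rng.standard_normal(n))       # thin-tailed comparison

r_pareto, r_gauss = {}, {}
for K in (1.5, 2.0, 3.0):
    r_pareto[K] = pareto[pareto > K].mean() / K   # E[X | X > K] / K
    r_gauss[K] = gauss[gauss > K].mean() / K
    print(f"K={K}: Pareto {r_pareto[K]:.3f} (theory {alpha / (alpha - 1):.3f}), "
          f"|Gaussian| {r_gauss[K]:.3f}")
```

Raising K leaves the Pareto ratio pinned at 1.5 while the Gaussian ratio sinks toward 1, which is the diagnostic used on the data below.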

Other Decompositions: This result would of course invalidate representations such as the Duffie-Pan-Singleton model, which decomposes the generating process into a sum of jumps and a diffusion. Unless they allow an infinity of power-law-sized jumps, the conditional average would lose its scalability beyond the worst jump.

Calibrating Tail Exponents. In addition, we can calibrate power laws. Using K as the cross-over point, we get the α exponent above it -- the same as if we used the Hill estimator or ran a regression above some point.
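The calibration can be sketched numerically (the exponent 2.8 is a hypothetical value, in the range reported in the tables below): with r the conditional average over the threshold divided by K, the implied exponent r/(r - 1) recovers the true α at each cutoff.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha_true = 2.8                       # hypothetical "true" tail exponent
x = rng.pareto(alpha_true, 2_000_000) + 1.0   # P[X > x] = x^(-alpha_true)

implied = {}
for K in (2.0, 5.0, 10.0):
    r = x[x > K].mean() / K            # conditional average over the threshold, over K
    implied[K] = r / (r - 1.0)         # invert r = alpha/(alpha - 1)
    print(f"K={K:4}: r = {r:.3f}, implied alpha = {implied[K]:.2f}")
```

The estimate is stable across thresholds for a genuine power law; the noisier implied α at very high K in the empirical tables reflects the shrinking number of exceedances, not a change of regime.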

Individual Stocks Data

Stocks are interesting because there are so many. This test, using 12 million pieces of exhaustive single-stock returns, shows how equity prices do not have a characteristic scale. No method other than a Paretan tail, albeit of imprecise calibration, can characterize them.

Data: Pallop Angsupun ran the following test. We collected the most recent 10 years of daily prices for stocks (no survivorship-bias effect, as we included companies delisted up to the last trading day), n = 11,674,825, deviations expressed in logarithmic returns. We focused on negative deviations. For instance, in the table below, the average move below "10 standard deviations", -10, is -15.6 standard deviations, that is, a multiple of 1.56. We kept moving K up, to the equivalent of 100 "sigmas" (indeed) -- and we still had observations.

Note the tail estimator: writing r for the conditional average divided by K, the implied exponent is α = r/(r - 1).

Daily Returns (Stocks)

I normalized by STD (to communicate the result in the lingo), but we get the same results with MAD.

K (sigmas)   E[X | X < K]   n   E[X | X < K]/K   Implied α

-1. -1.74525 1.5242×10^6 1.74525 2.34183

-2. -3.01389 343952. 1.50695 2.9726

-3. -4.58148 99404. 1.52716 2.89696

-10. -15.6078 3528. 1.56078 2.78324

-20. -30.4193 503. 1.52096 2.91952

-50. -113.324 20. 2.26649 1.78958

-75. -180.84 9. 2.4112 1.70861

-100. -251.691 5. 2.51691 1.65923

Longer Window (Stocks)

A longer window, taking time-aggregates such as weeks and months, does not show any different result -- additional evidence of the failure of the Poisson. For instance, weekly tails exhibit thickening instead of flattening: the implied α drops!

K (sigmas)   E[X | X < K]   n   E[X | X < K]/K   Implied α

-1. -1.71473 270506. 1.71473 2.39914

-5. -6.88222 2638. 1.37644 3.65644

-10. -15.1321 190. 1.51321 2.9485

-15. -31.7716 34. 2.11811 1.89437

-20. -52.5833 14. 2.62916 1.61381

Macro Data

I used the macro set described above -- again the same pattern, particularly with the large deviations.

Positive Domain (Cond Exp is the expectation of the excess over a certain threshold K):

K   E[X | X > K]   n

1 2.01443 65958

2 3.0814 23450

3 4.19842 8355

4 5.33587 3202

5 6.52524 1360

6 7.74405 660

7 9.10917 340

8 10.3649 192

9 11.6737 120

10 13.8726 84

11 15.3832 65

12 19.3987 47

13 21.0189 36

14 21.7426 29

15 24.1414 21

16 25.1188 18

17 27.8408 13

18 31.2309 11

19 35.6161 7

20 35.9036 6

Negative Domain (drops below a certain threshold K):

K   E[X | X < K]   n

-20 -38.7657 11

-19 -35.5523 13

-18 -35.0807 14

-17 -33.6529 16

-16 -27.5269 20

-15 -25.7004 22

-14 -25.0956 27

-13 -21.353 38

-12 -19.5828 46

-11 -17.02 66

-10 -14.6851 95

-9 -13.158 133

-8 -11.0048 226

-7 -9.43672 392

-6 -7.95766 689

-5 -6.66288 1415

-4 -5.40792 3346

-3 -4.24303 8676

-2 -3.13423 23258

-1 -2.06689 62803

EuroDollars Front Month 1986-2006

n=4947

K (MADs)   E[X | X < K]   n   E[X | X < K]/K   Implied α

-1. -2.41323 969 2.41323 1.7076

-3. -5.16202 203 1.72067 2.38759

-5. -7.96752 69 1.5935 2.68491

-8. -11.4367 24 1.42959 3.32782

UK Rates 1990-2007

n=4143

K (MADs)   E[X | X > K]   n   E[X | X > K]/K   Implied α

1. 2.23822 806 2.23822 1.80761

3. 4.97319 140 1.65773 2.52038

5. 8.43269 36 1.68654 2.45658

7. 11.4763 16 1.63947 2.56381

Literally, you do not even reach a large number K at which scalability drops off from a small-sample effect.

USD-JPY (1971-2007) (Negative Domain)

K (MADs)   E[X | X < K]   n   E[X | X < K]/K   Implied α

-1 -2.14951 1674 2.14951 1.86993

-3. -4.38008 288 1.46003 3.17378

-5. -6.74883 66 1.34977 3.85906

-6. -7.92747 34 1.32125 4.11288

-8.75 -13.2717 6 1.51677 2.9351

We get scalability as far as the eye can see. Usually small-sample effects cause us not to observe much of the tails, with the consequence of "thinning" the upper bound. We do not even witness such an effect.

Past Shortfall Does Not Predict Future Shortfall -- at All Lags

The picture shows the predictability of a 7% shortfall, i.e. the expectation of X conditional on X < -.07. With discrete data, we see whether a given shortfall after a date t can be predicted from data before that date t. Here X = Log[P_t / P_(t-Δt)]. The result is presented in log space. Note that here Δt = 1 day and I lagged by 252 days. But the result does not change in a perceptible way when I change the observation period or vary the lag (next graph).

Lagging does not help. Except that one might, by data mining, be able to find some "rule" -- but these have failed out of sample. However, regular events tend to predict regular events.

The graph shows the predictability of mean deviation between one period (252 days) and the next.
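That asymmetry can be mimicked with a toy stochastic-volatility process (all parameters hypothetical; this is an illustration, not a fit to the data): window-level mean absolute deviation is persistent from one period to the next, while the window's worst single loss, dominated by one fat-tailed draw, is far noisier.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy returns with persistent volatility: log-volatility follows a slow AR(1),
# shocks are Student t(3). All constants are made up for illustration.
T, window = 252 * 400, 252
eps = rng.standard_normal(T)
log_sig = np.zeros(T)
for t in range(1, T):
    log_sig[t] = 0.997 * log_sig[t - 1] + 0.05 * eps[t]
r = np.exp(log_sig) * 0.01 * rng.standard_t(3, T)

k = T // window
w = r[: k * window].reshape(k, window)
mad = np.abs(w).mean(axis=1)     # per-window mean absolute deviation ("regular" events)
worst = w.min(axis=1)            # per-window worst single loss (tail events)

corr_mad = np.corrcoef(mad[:-1], mad[1:])[0, 1]
corr_worst = np.corrcoef(worst[:-1], worst[1:])[0, 1]
print(f"window-to-window correlation: MAD {corr_mad:.2f}, worst loss {corr_worst:.2f}")
```

In this toy setup the MAD correlation is typically much stronger than the worst-loss correlation: the ordinary dispersion carries the persistence, the extreme does not.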

A Brief Discussion of Drug and Movie Successes

I look at drug sales for existing drugs. The problem is that when the max is 167 STDs away from the mean, you have a problem. That number could double if I included some marginal drugs not in my sample, as these would affect the mean. I could not get a convincing tail exponent.

Max      1.34×10^10
Total    4.93777×10^11
Mean     3.98583×10^6
Max/STD  167.633
MAD      7.19047×10^6
STD      7.99363×10^7

With movies it is even worse. We don't know the baseline.

n        7985
Max      6.00788×10^8
Total    1.16288×10^11
Mean     1.45633×10^7
Max/STD  17.598
MAD      1.90327×10^7
STD      3.41395×10^7

But I can derive conclusions: there is a "potential" in the tails that I could fill in -- which would raise the expected mean considerably. But by how much? I don't know, and I don't want to play like the academic charlatans.

Created by Mathematica (September 21, 2008)