Transcript of Fat Tails and Fragility (7/30/2019)

    Nassim Nicholas Taleb

    Fat Tails and (Anti)fragility

    Lectures on Probability, Risk, and Decision Making in The Real World

    DRAFT VERSION, APRIL 2013


PART I - MODEL ERROR & METAPROBABILITY

This segment, Part I, corresponds to topics covered in The Black Swan, 2nd Ed. Part II will address those of Antifragile.

Note that all the topics in this book are discussed in those books in a verbal or philosophical form.

I am currently teaching a class with the absurd title "risk management and decision-making in the real world", a title I have selected myself; this is a total absurdity since risk management and decision-making should never have to justify being about the real world, and, what's worse, one should never be apologetic about it. In real disciplines, titles like "Safety in the Real World" or "Biology and Medicine in the Real World" would be lunacies. But in social science all is possible, as there is no exit from the gene pool for blunders, nothing to check the system. You cannot blame the pilot of the plane or the brain surgeon for being too practical, not philosophical enough; those who have done so have exited the gene pool. The same applies to decision making under uncertainty and incomplete information. The other absurdity is the common separation of risk and decision-making, as the latter cannot be treated in any way except under the constraint: in the real world.

And the real world is about incompleteness: incompleteness of understanding, representation, information, etc., what one does when one does not know what's going on, or when there is a non-zero chance of not knowing what's going on. It is based on focus on the unknown, not the production of mathematical certainties based on weak assumptions; rather, measure the robustness of the exposure to the unknown, which can be done mathematically through a metamodel (a model that examines the effectiveness and reliability of the model), what I call metaprobability, even if the meta-approach to the model is not strictly probabilistic.

This first section presents a mathematical approach for dealing with errors in conventional risk models, taking the bulls***t out of some, adding robustness, rigor and realism to others. For instance, if a "rigorously" derived model (say Markowitz mean-variance) gives a precise risk measure, but ignores the central fact that the parameters of the model don't fall from the sky, but need to be discovered with some error rate, then the model is not rigorous for risk management, decision making in the real world, or, for that matter, for anything. We need to add another layer of uncertainty, which invalidates some models (but not others). The mathematical rigor is shifted from a focus on asymptotic (but rather irrelevant) properties to making do with a certain set of incompleteness. Indeed there is a mathematical way to deal with incompleteness.

The focus is squarely on "fat tails", since risks and harm lie principally in the high-impact events, The Black Swan, and some statistical methods fail us there. The section ends with an identification of classes of exposures to these risks, the Fourth Quadrant idea, the class of decisions that do not lend themselves to modelization and need to be avoided. Modify your decisions. The reason decision-making and risk management are inseparable is that there are some exposures people should never take if the risk assessment is not reliable, something people understand in real life but not when modeling. About every rational person facing a plane ride with an unreliable risk model or a high degree of uncertainty about the safety of the aircraft would take a train instead; but the same person, in the absence of skin in the game, when working as "risk expert", would say: "well, I am using the best model we have" and use something not reliable, rather than be consistent with real-life decisions and subscribe to the straightforward principle: "let's only take those risks for which we have a reliable model".

Finally, someone recently asked me to give a talk at an unorthodox statistics session of the American Statistical Association. I refused: the approach presented here is about as orthodox as possible; much of the bone of contention this author raises comes precisely from enforcing rigorous standards of statistical inference on processes. Risk (and decisions) require more rigor than other applications of statistical inference.


1 Risk is Not in The Past (the Turkey Problem)

This is an introductory chapter outlining the turkey problem, showing its presence in data, and explaining why an assessment of fragility is more potent than data-based methods of risk detection.

    1.1 Introduction: Fragility, not Statistics

Fragility (Chapter x) can be defined as an accelerating sensitivity to a harmful stressor: this response plots as a concave curve and mathematically culminates in more harm than benefit from the disorder cluster [(i) uncertainty, (ii) variability, (iii) imperfect, incomplete knowledge, (iv) chance, (v) chaos, (vi) volatility, (vii) disorder, (viii) entropy, (ix) time, (x) the unknown, (xi) randomness, (xii) turmoil, (xiii) stressor, (xiv) error, (xv) dispersion of outcomes, (xvi) unknowledge].

Antifragility is the opposite, producing a convex response that leads to more benefit than harm. We do not need to know the history and statistics of an item to measure its fragility or antifragility, or to be able to predict rare and random ('black swan') events. All we need is to be able to assess whether the item is accelerating towards harm or benefit.

    The relation of fragility, convexity and sensitivity to disorder is thus mathematical and not derived from empirical data.

Figure 1.1. The risk of breaking of the coffee cup is not necessarily in the past time series of the variable; in fact surviving objects have to have had a rosy past.

The problem with risk management is that past time series can be (and actually are) unreliable. Some finance journalist (Bloomberg) was commenting on my statement in Antifragile about our chronic inability to get the risk of a variable from the past with economic time series. "Where is he going to get the risk from, since we cannot get it from the past? From the future?", he wrote. Not really; think about it: from the present, the present state of the system. This explains in a way why the detection of fragility is vastly more potent than that of risk, and much easier to do.


Asymmetry and Insufficiency of Past Data.

Our focus on fragility does not mean you can ignore the past history of an object for risk management; it is just accepting that the past is highly insufficient.

The past is also highly asymmetric. There are instances (large deviations) for which the past reveals extremely valuable information about the risk of a process. Something that broke once before is breakable, but we cannot ascertain that what did not break is unbreakable. This asymmetry is extremely valuable with fat tails, as we can reject some theories, and get to the truth by means of via negativa.

This confusion about the nature of empiricism, or the difference between empiricism (rejection) and naive empiricism (anecdotal acceptance), is not just a problem with journalism. Naive inference from time series is incompatible with rigorous statistical inference; yet many workers with time series believe that it is statistical inference. One has to think of history as a sample path, just as one looks at a sample from a large population, and continuously keep in mind how representative the sample is of the large population. While analytically equivalent, it is psychologically hard to take the outside view, given that we are all part of history, part of the sample so to speak.

General Principle To Avoid Imitative, Cosmetic Science. From Antifragile (2012):

There is such a thing as nonnerdy applied mathematics: find a problem first, and figure out the math that works for it (just as one acquires language), rather than study in a vacuum through theorems and artificial examples, then change reality to make it look like these examples.

    1.2 Turkey Problems

Turkey and Inverse Turkey (from the Glossary for Antifragile): The turkey is fed by the butcher for a thousand days, and every day the turkey pronounces with increased statistical confidence that the butcher "will never hurt it", until Thanksgiving, which brings a Black Swan revision of belief for the turkey. Indeed not a good day to be a turkey. The inverse turkey error is the mirror confusion, not seeing opportunities: pronouncing that one has evidence that someone digging for gold or searching for cures will "never find" anything because he didn't find anything in the past.

What we have just formulated is the philosophical problem of induction (more precisely, of enumerative induction).

1.3 Risk Estimator

Let us define a risk estimator that we will work with throughout the book.

Definition 1.3.1: Take, as of time T, a standard sequence of past observations X = {x_{t_0 + iΔt}}, 0 ≤ i ≤ n, and, for a subset A of the support and a function f, the estimator

(1.1) M_T^X(A, f) = ( Σ_{i=0}^{n} 1_A f(x_{t_0 + iΔt}) ) / ( Σ_{i=0}^{n} 1 )

where 1_A is the indicator function taking value 1 if x ∈ A and 0 otherwise.


The shortfall S ≜ M_T^X(A, f), with A = (−∞, K], f(x) = x:

(1.2) S = ( Σ_{i=0}^{n} 1_A x_{t_0 + iΔt} ) / ( Σ_{i=0}^{n} 1 )

An alternative method is to compute the conditional shortfall, S ≜ E[M | X < K] = M_T^X(A, f) · ( Σ_{i=0}^{n} 1 ) / ( Σ_{i=0}^{n} 1_A ):

(1.3) S = ( Σ_{i=0}^{n} 1_A x_{t_0 + iΔt} ) / ( Σ_{i=0}^{n} 1_A )

One of the uses of the indicator function 1_A, for observations falling into a subsection A of the distribution, is that we can actually derive the past actuarial value of an option with X as an underlying, struck at K, as M_T^X(A, f), with A = (−∞, K] for a put and A = [K, ∞) for a call, with f(x) = x.

Criterion: The measure M is considered to be an estimator over the interval [T − NΔt, T] if and only if it holds in expectation over the period X_{T + iΔt} for all i > 0, that is, across counterfactuals of the process, with a threshold ξ (a tolerated divergence that can be a bias), so that |E[M_{T+iΔt}^X(A, f)] − M_T^X(A, f)| < ξ. In other words, the estimator should have some stability around the "true" value of the variable and a lower bound on the tolerated bias.

We skip the notion of variance for an estimator and rely on absolute mean deviation, so ξ can be the absolute value for the tolerated bias. And note that we use mean deviation as the equivalent of a loss function; except that with matters related to risk, the loss function is embedded in the subset A of the estimator.

    This criterion is compatible with standard sampling theory. Actually, it is at the core of statistics. Let us rephrase:

Standard statistical theory doesn't allow claims on estimators made in a given set unless these are made on the basis that they can generalize, that is, reproduce out of sample, into the part of the series that has not taken place (or has not been seen), i.e., for time series, for t > T.

    This should also apply in full force to the risk estimator. In fact we need more, much more vigilance with risks.

For convenience, we are taking some liberties with the notation, depending on context: M_T^X(A, f) is held to be the estimator, or a conditional summation on data; but, given that such an estimator is sometimes called "empirical expectation", we will also be using the same symbol for the estimated variable in cases where M is the M-derived expectation operator E or E^P under the real-world probability measure P, that is, the expectation operator completely constructed from the past available n data points.
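The estimator of Definition 1.3.1 and the shortfalls of equations (1.2) and (1.3) are mechanical to compute; a minimal Python sketch on simulated Gaussian data (the function names and the simulated sample are mine, not the author's):

```python
import numpy as np

def M(x, A, f):
    """Plain estimator: sum of f over the observations falling in the
    subset A, divided by the total number of observations n."""
    x = np.asarray(x, dtype=float)
    mask = A(x)                       # indicator function 1_A
    return f(x[mask]).sum() / len(x)

def M_conditional(x, A, f):
    """Conditional version: divide by the count in A instead of by n."""
    x = np.asarray(x, dtype=float)
    mask = A(x)
    return f(x[mask]).sum() / mask.sum()

# Tail shortfall with A = (-inf, K], f(x) = x
rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
K = -2.0
S_uncond = M(x, lambda v: v <= K, lambda v: v)            # eq. (1.2)
S_cond = M_conditional(x, lambda v: v <= K, lambda v: v)  # eq. (1.3)
```

The conditional shortfall S_cond estimates E[X | X < K], which for a standard Gaussian and K = −2 sits near −2.37.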

    1.4 Fat Tails, the Finite Moment Case

Fat tails are not about the incidence of low probability events, but the contributions of events away from the center of the distribution to the total properties.

As a useful heuristic, take the ratio E[x²] / E[|x|] (or more generally M_T^X(A, x^n) / M_T^X(A, |x|)); the ratio increases with the fat-tailedness of the distribution (when the distribution has finite moments up to n).

Simply, x^n is a weighting operator that assigns a weight, x^{n−1}, large for large values of x, and small for smaller values.
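The heuristic ratio can be checked by simulation; a sketch comparing a Gaussian with a Student T with 3 degrees of freedom (sample sizes and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def moment_ratio(x, n=2):
    """E[|x|^n] / E[|x|]: rises with the fat-tailedness of the sample."""
    x = np.abs(np.asarray(x, dtype=float))
    return (x ** n).mean() / x.mean()

gauss = rng.standard_normal(1_000_000)
student3 = rng.standard_t(df=3, size=1_000_000)

r_gauss = moment_ratio(gauss)       # theory: 1/sqrt(2/pi) ~ 1.2533
r_student = moment_ratio(student3)  # fatter tails give a larger ratio
```

For the Gaussian, E[x²]/E[|x|] = sqrt(π/2) ≈ 1.25; the Student T with 3 degrees of freedom, whose moments are finite only up to order 2 (the ratio requires moments up to n), comes out well above that.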

Norms ℓ^p: More generally, the ℓ^p norm of a vector x = {x_i}_{i=1}^{n} is ( Σ_{i=1}^{n} |x_i|^p )^{1/p}, and for a ≥ 0 the normalized moments satisfy

(1.5) ( (1/n) Σ_{i=1}^{n} |x_i|^{p+a} )^{1/(p+a)} ≥ ( (1/n) Σ_{i=1}^{n} |x_i|^p )^{1/p}

One property quite useful with power laws with infinite moments:

(1.6) ||x||_∞ = max( {|x_i|}_{i=1}^{n} )

Infinite moments, say infinite variance, always manifest themselves as computable numbers in an observed sample, yielding a finite estimator M, simply because the sample is finite. A distribution, say, Cauchy, with infinite mean will always deliver a measurable mean in finite samples; but different samples will deliver completely different means.

The next two figures illustrate the drifting effect of M_T^X(A, f) with increasing information.

[Figure 1.3. The mean (left, M_T^X(A, x)) and standard deviation (right, M_T^X(A, x²)) of two series with infinite mean (Cauchy) and infinite variance (St(2)), respectively, plotted against sample size T up to 10,000.]
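The drift of the sample mean under infinite moments is easy to reproduce; a minimal sketch with the Cauchy (Student T with 1 degree of freedom), against a Gaussian control:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent Cauchy samples of the same length: their running
# means never settle, and different samples deliver different "means".
n = 10_000
m1 = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)
m2 = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)

# A Gaussian running mean of the same length is pinned near 0.
g = np.cumsum(rng.standard_normal(n)) / np.arange(1, n + 1)
```

The running mean of a Cauchy sample is itself Cauchy-distributed whatever the sample size, so no amount of data stabilizes it; the Gaussian control converges at rate 1/sqrt(n).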

A Simple Heuristic to Create Mildly Fat Tails

Since higher moments increase under fat tails, as compared to lower ones, it should be possible to simply increase fat tails without increasing lower moments.

Variance-preserving heuristic. Keep E[x²] constant and increase E[x⁴] by "stochasticizing" the variance of the distribution, since ⟨x⁴⟩ is itself analog to the variance of ⟨x²⟩ measured across samples. Chapter x will do the "stochasticizing" in a more involved way.

An effective heuristic to watch the effect of the fattening of tails is to simulate a random variable that we set to mean 0, with the following variance-preserving scheme: it follows a distribution N(0, σ√(1 − a)) with probability p = 1/2 and N(0, σ√(1 + a)) with the remaining probability 1/2, with 0 ≤ a < 1. The characteristic function is

(1.11) φ(t, a) = (1/2) e^{−(1/2)(1 + a) t² σ²} ( 1 + e^{a t² σ²} )

Odd moments are nil. The second moment is preserved, since

(1.12) M(2) = (−i)² ∂²_t φ(t) |_{t=0} = σ²

and the fourth moment is

(1.13) M(4) = (−i)⁴ ∂⁴_t φ(t) |_{t=0} = 3 (a² + 1) σ⁴

which puts the traditional kurtosis at 3(a² + 1). This means we can get an "implied a" from kurtosis. a is roughly the mean deviation of the stochastic volatility parameter ("volatility of volatility", or Vvol in a more fully parametrized form).

This heuristic is of weak power, as it can only raise kurtosis to twice that of a Gaussian, so it should be limited to getting some intuition about its effects. Section 1.10 will present a more involved technique.
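A quick simulation of this variance-preserving mixture (σ = 1 and the seed are my choices) recovers both the preserved second moment and the kurtosis 3(1 + a²):

```python
import numpy as np

rng = np.random.default_rng(42)

def mixed_gaussian(a, sigma=1.0, size=1_000_000):
    """N(0, sigma*sqrt(1-a)) w.p. 1/2, N(0, sigma*sqrt(1+a)) w.p. 1/2:
    the second moment stays sigma^2 while kurtosis rises to 3(1+a^2)."""
    lo = rng.normal(0.0, sigma * np.sqrt(1 - a), size)
    hi = rng.normal(0.0, sigma * np.sqrt(1 + a), size)
    pick = rng.random(size) < 0.5
    return np.where(pick, lo, hi)

a = 0.8
x = mixed_gaussian(a)
m2 = (x ** 2).mean()              # ~ 1.0, variance preserved
kurt = (x ** 4).mean() / m2 ** 2  # ~ 3 * (1 + 0.8**2) = 4.92
```

Even at a = 0.8 the kurtosis stays below 6, twice the Gaussian's 3, illustrating why the heuristic is called weak.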

    1.5 Scalable and Nonscalable, A Deeper View of Fat Tails

So far in the discussion on fat tails we stayed in the finite-moments case. For a certain class of distributions, those with finite moments, P_{>nK} / P_{>K} depends on n and K. For a scale-free distribution, with K in the tails (that is, large enough), P_{>nK} / P_{>K} depends on n, not K. These latter distributions lack a characteristic scale and will end up having a Paretan tail, i.e., for X large enough, P_{>X} = C X^{−α}, where α is the tail exponent and C is a scaling constant.

Table 1.1. The Gaussian case; by comparison, a Student T distribution with 3 degrees of freedom, reaching power laws in the tail; and a "pure" power law, the Pareto distribution, with an exponent α = 2.

K  | 1/P_{>K} (Gaussian) | P_{>K}/P_{>2K} (Gaussian) | 1/P_{>K} (St(3)) | P_{>K}/P_{>2K} (St(3)) | 1/P_{>K} (Pareto, α=2) | P_{>K}/P_{>2K} (Pareto)
2  | 44.0      | 7.2×10^2   | 14.4     | 4.97443 | 8.00     | 4
4  | 3.16×10^4 | 3.21×10^4  | 71.4     | 6.87058 | 64.0     | 4
6  | 1.01×10^9 | 1.59×10^6  | 216.     | 7.44787 | 216.     | 4
8  | 1.61×10^15 | 8.2×10^7  | 491.     | 7.67819 | 512.     | 4
10 | 1.31×10^23 | 4.29×10^9 | 940.     | 7.79053 | 1.00×10^3 | 4
12 | 5.63×10^32 | 2.28×10^11 | 1.61×10^3 | 7.85318 | 1.73×10^3 | 4
14 | 1.28×10^44 | 1.22×10^13 | 2.53×10^3 | 7.89152 | 2.74×10^3 | 4
16 | 1.57×10^57 | 6.6×10^14 | 3.77×10^3 | 7.91664 | 4.10×10^3 | 4
18 | 1.03×10^72 | 3.54×10^16 | 5.35×10^3 | 7.93397 | 5.83×10^3 | 4
20 | 3.63×10^88 | 1.91×10^18 | 7.32×10^3 | 7.94642 | 8.00×10^3 | 4
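The scalability contrast behind Table 1.1 can be explored directly from survival functions; a sketch using the standard library (the Student T with 3 degrees of freedom has a closed-form survival function; the Pareto here is the pure P_{>x} = x^{−2}, a simplifying assumption about scale):

```python
from math import erfc, atan, sqrt, pi

def norm_sf(x):
    """Survival function of the standard Gaussian."""
    return 0.5 * erfc(x / sqrt(2))

def t3_sf(x):
    """Survival function of the Student T with 3 degrees of freedom
    (closed form available for this particular case)."""
    return 0.5 - (atan(x / sqrt(3)) + sqrt(3) * x / (x * x + 3)) / pi

def pareto_sf(x, alpha=2.0):
    """'Pure' power law: P(X > x) = x^(-alpha), x >= 1."""
    return x ** -alpha

K = 2
ratios = {name: sf(K) / sf(2 * K)            # P(>K) / P(>2K)
          for name, sf in [("gaussian", norm_sf),
                           ("student3", t3_sf),
                           ("pareto", pareto_sf)]}
# Pareto: the ratio is exactly 2^alpha = 4, whatever K;
# Gaussian: the same ratio explodes as K grows.
```

Raising K shows the point of the table: the Pareto ratio never moves, the Student ratio drifts toward its asymptotic power-law value, and the Gaussian ratio runs away.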

Note: We can see from the scaling difference between the Student and the Pareto that the conventional definition of a power-law-tailed distribution is expressed more formally as P_{>X} = L(X) X^{−α}, where L(X) is a "slowly varying function", which satisfies lim_{X→∞} L(tX)/L(X) = 1 for all constants t > 0.

For X large enough, log(P_{>X}) / log(X) converges to a constant, the tail exponent −α. A scalable distribution should show the slope −α in the tails, as X → ∞.

[Figure 1.5. Three distributions (Gaussian, Lognormal, Student (3)) on a log-log plot of P_{>X} against X. As we hit the tails, the Student remains scalable while the Lognormal shows an intermediate position.]

So far this gives us the intuition of the difference between classes of distributions. Only the scalable ones have true fat tails, as the others turn into a Gaussian under summation. And the tail exponent is asymptotic; we may never get there, and what we may see is an intermediate version of it. The figure above drew from Platonic off-the-shelf distributions; in reality processes are vastly more messy, with switches between exponents.

Estimation issues: Note that there are many methods to estimate the tail exponent α from data, what is called a calibration. However, as we will see, the tail exponent is rather hard to guess, and its calibration is marred with errors, owing to the insufficiency of data in the tails. In general, the data will show a thinner tail than it should.

    We will return to the issue in Chapter 3.
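The noisiness of such calibrations can be seen by applying the Hill estimator (a standard choice of method, my selection for the sketch, not necessarily the author's) to repeated samples from a Pareto whose exponent is known exactly:

```python
import numpy as np

rng = np.random.default_rng(3)

def hill_alpha(x, k):
    """Hill estimator of the tail exponent from the k largest observations."""
    xs = np.sort(np.asarray(x, dtype=float))
    return k / np.sum(np.log(xs[-k:] / xs[-k - 1]))

true_alpha = 2.0
estimates = []
for _ in range(200):
    x = rng.random(1_000) ** (-1.0 / true_alpha)  # exact Pareto, alpha = 2
    estimates.append(hill_alpha(x, k=50))
# Even with the generating model known and the data exactly Paretan,
# the calibrated exponent scatters widely around 2.
```

With real data, where the model is unknown and the deep tail undersampled, the dispersion is considerably worse.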


The Black Swan Problem: It is not merely that events in the tails of the distributions matter, happen, play a large role, etc. The point is that these events play the major role and their probabilities are not computable, not reliable for any effective use.

Why do we use the Student T to simulate symmetric power laws? It is not that we believe that the generating process is Student T. Simply, the center of the distribution does not matter much for the properties. The lower the exponent, the less the center plays a role. The higher the exponent, the more the Student T resembles the Gaussian, and the more justified its use will be accordingly. More advanced methods involving the use of Levy laws may help in the event of asymmetry, but the use of two different Pareto distributions with two different exponents, one for the left tail and the other for the right one, would help.

Why power laws? There are a lot of theories on why things should be power laws, as sort of exceptions to the way things work probabilistically. But it seems that the opposite idea is never presented: power laws can be the norm, and the Gaussian a special case, as we will see in Chapter x, of concave-convex responses (a sort of dampening of fragility and antifragility, bringing robustness, hence thinning tails).

1.6 Different Approaches For Statistically Derived Estimators

There are broadly two separate ways to go about estimators: nonparametric and parametric.

The nonparametric approach: it is based on observed raw frequencies derived from sample size n. Roughly, it sets a subspace of events A and M_T^X(A, 1) (i.e., f(x) = 1), so we are dealing with the frequencies φ(A) = (1/n) Σ_{i=0}^{n} 1_A. Thus these estimates don't allow discussions on frequencies φ < 1/n, at least not directly. Further, the volatility of the estimator increases with lower frequencies. The error is a function of the frequency itself (or rather, the smaller of the frequency φ and 1 − φ). So if Σ_{i=0}^{n} 1_A = 30 and n = 1000, only 3 out of 100 observations are expected to fall into the subspace A, restricting the claims to too narrow a set of observations for us to be able to make a claim, even if the total sample n = 1000 is deemed satisfactory for other purposes. Some people introduce smoothing kernels between the various buckets corresponding to the various frequencies, but in essence the technique remains frequency-based. So if we nest subsets A₁ ⊇ A₂ ⊇ ... ⊇ A_z ⊇ ..., the expected "volatility" (as we will see later in the chapter, we mean mean absolute deviation) of M_T^X(A_z, f) satisfies E[ |M_T^X(A_z, f) − M_{>T}^X(A_z, f)| ] ≥ E[ |M_T^X(A_{<z}, f) − M_{>T}^X(A_{<z}, f)| ] for all functions f (proof via the law of large numbers).

The parametric approach: it allows extrapolation but imprisons the representation into a specific off-the-shelf probability distribution (which can itself be composed of more probability distributions); so M_T^X is an estimated parameter for use as input into a distribution or model.

Both methods make it difficult to deal with small frequencies. The nonparametric, for obvious reasons of sample insufficiency in the tails; the parametric, because small probabilities are very sensitive to parameter errors.

The problem of payoff

This is the central problem of model error seen in consequences, not in probability. The literature is used to discussing errors on probability, which should not matter much for small probabilities. But it matters for payoffs, as f can depend on x. Let us see how the problem becomes very bad when we consider f in the presence of fat tails. Simply, you are multiplying the error in probability by a large number, since fat tails imply that the probabilities p(x) do not decline fast enough for large values of x. Now the literature seems to have examined errors in probability, not errors in payoff.

Let M_T^X(A_z, f) be the estimator of a function of x in the small subspace A_z = (δ₁, δ₂) of the support of the variable.

i) Take ξ, the mean absolute error in the estimation of the probability in the small subspace A_z = (δ₁, δ₂), i.e., ξ = E[ |M_T^X(A_z, 1) − M_{>T}^X(A_z, 1)| ].

ii) Assume f(x) is either linear or convex (but not concave), of the form C + Λ x^β, with both Λ > 0 and β ≥ 1.

iii) Assume further that the distribution p(x) is expected to have fat tails (of any of the kinds seen in 1.4 and 1.5).

Then the estimation error of M_T^X(A_z, f) compounds the error in probability, thus giving us the lower bound in relation to ξ:

(1.14) E[ |M_T^X(A_z, f) − M_{>T}^X(A_z, f)| ] ≥ Λ |δ₁ − δ₂| β (δ₂ ∧ δ₁)^{β−1} E[ |M_T^X(A_z, 1) − M_{>T}^X(A_z, 1)| ]

since

E[ M_{>T}^X(A_z, f) ] / E[ M_{>T}^X(A_z, 1) ] = ( ∫_{δ₁}^{δ₂} f(x) p(x) dx ) / ( ∫_{δ₁}^{δ₂} p(x) dx )

The error on p(x) can be in the form of a parameter mistake that inputs into p, say σ (Chapter x and the discussion of metaprobability), or in the frequency estimation. Note now that if δ₁ → −∞, we may have an infinite error on M_T^X(A_z, f), the left-tail shortfall, although by definition the error on probability is necessarily bounded.

1.7 The Mother of All Turkey Problems: How Time Series Econometrics and Statistics Don't Replicate (Debunking a Nasty Type of Pseudo-Science)

Something Wrong With Econometrics, as Almost All Papers Don't Replicate. The next two reliability tests, one about parametric methods, the other about robust statistics, show that there is something wrong in econometric methods, fundamentally wrong, and that the methods are not dependable enough to be of use in anything remotely related to risky decisions.

Performance of Standard Parametric Risk Estimators, f(x) = x^n (Norm ℓ2)

With economic variables one single observation in 10,000, that is, one single day in 40 years, can explain the bulk of the "kurtosis", a measure of "fat tails", that is, both a measure of how much the distribution under consideration departs from the standard Gaussian, and of the role of remote events in determining the total properties. For the U.S. stock market, a single day, the crash of 1987, determined 80% of the kurtosis. The same problem is found with interest and exchange rates, commodities, and other variables. The problem is not just that the data had "fat tails", something people knew but sort of wanted to forget; it was that we would never be able to determine "how fat" the tails were within standard methods. Never.

The implication is that those tools used in economics that are based on squaring variables (more technically, the Euclidean, or ℓ2 norm), such as standard deviation, variance, correlation, regression, the kind of stuff you find in textbooks, are not valid scientifically (except in some rare cases where the variable is bounded). The so-called "p values" you find in studies have no meaning with economic and financial variables. Even the more sophisticated techniques of stochastic calculus used in mathematical finance do not work in economics except in selected pockets.

The results of most papers in economics based on these standard statistical methods are thus not expected to replicate, and they effectively don't. Further, these tools invite foolish risk taking. Neither do alternative techniques yield reliable measures of rare events, except that we can tell if a remote event is underpriced, without assigning an exact value.

From Taleb (2009), using log returns, X_t ≜ log( P(t) / P(t − iΔt) ), take the measure M_t^X( (−∞, ∞), X⁴ ) of the fourth noncentral moment

M_t^X( (−∞, ∞), X⁴ ) ≜ (1/N) Σ_{i=0}^{N} (X_{t − iΔt})⁴

and the N-sample maximum quartic observation Max{ (X_{t − iΔt})⁴ }_{i=0}^{N}.
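The dominance of a single observation in the quartic sum is easy to see in simulation; a sketch with Student T returns standing in for market data (an assumption; the original argument uses actual series such as the 1987 crash day):

```python
import numpy as np

rng = np.random.default_rng(11)

# Student T with 3 degrees of freedom as a stand-in for daily log returns.
x = rng.standard_t(df=3, size=10_000)
quartic = x ** 4
max_share = quartic.max() / quartic.sum()   # share of the single largest day

# For comparison, the same statistic on a Gaussian sample stays negligible.
g = rng.standard_normal(10_000)
gauss_share = (g ** 4).max() / (g ** 4).sum()
```

Under the fat-tailed generator one observation routinely carries a large fraction of the whole fourth moment, which is why the sample kurtosis of such series never stabilizes.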

Figure 4.4. Fitting a Fréchet distribution to the Student T generated with m = 3 degrees of freedom. The Fréchet distribution α = 3, β = 32 fits up to higher values of E. But the next two graphs show the fit more closely.


    Figure 4.5. Seen more closely

How Extreme Value Has a Severe Inverse Problem In the Real World

In the previous case we started with the distribution, with the assumed parameters, then obtained the corresponding values, as these risk modelers do. In the real world, we don't quite know the calibration, the α of the distribution, assuming (generously) that we know the distribution. So here we go with the inverse problem. The next table illustrates the different calibrations of P_K, the probabilities that the maximum exceeds a certain value K (as a multiple of β), under different values of K and α.

α    | 1/P_{>3β} | 1/P_{>10β} | 1/P_{>20β}
1.   | 3.52773 | 10.5083 | 20.5042
1.25 | 4.46931 | 18.2875 | 42.7968
1.5  | 5.71218 | 32.1254 | 89.9437
1.75 | 7.3507  | 56.7356 | 189.649
2.   | 9.50926 | 100.501 | 400.5
2.25 | 12.3517 | 178.328 | 846.397
2.5  | 16.0938 | 316.728 | 1789.35
2.75 | 21.0196 | 562.841 | 3783.47
3.   | 27.5031 | 1000.5  | 8000.5
3.25 | 36.0363 | 1778.78 | 16918.4
3.5  | 47.2672 | 3162.78 | 35777.6
3.75 | 62.048  | 5623.91 | 75659.8
4.   | 81.501  | 10000.5 | 160000.
4.25 | 107.103 | 17783.3 | 338359.
4.5  | 140.797 | 31623.3 | 715542.
4.75 | 185.141 | 56234.6 | 1.51319×10^6
5.   | 243.5   | 100001. | 3.2×10^6

Consider that the error in estimating the α of a distribution is quite large, often > 1/2, and typically overestimated. So we can see that we get the probabilities mixed up by more than an order of magnitude. In other words, the imprecision in the computation of the α compounds in the evaluation of the probabilities of extreme values.
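The table entries follow from the Fréchet exceedance probability P(max > K) = 1 − exp(−(K/β)^{−α}); a sketch that reproduces two of the rows and shows how an error on α moves the answer:

```python
from math import exp

def frechet_inv_exceed(K_over_beta, alpha):
    """1 / P(max > K) for a Frechet: P = 1 - exp(-(K/beta)^(-alpha))."""
    return 1.0 / (1.0 - exp(-(K_over_beta ** -alpha)))

p3 = frechet_inv_exceed(3, 1.0)     # ~ 3.53, first row of the table
p10 = frechet_inv_exceed(10, 3.0)   # ~ 1000.5, the alpha = 3 row

# A half-point mistake on alpha at 10*beta multiplies the computed
# rarity by a factor > 3; a full point, by roughly 10x.
err_half = frechet_inv_exceed(10, 3.5) / frechet_inv_exceed(10, 3.0)
err_one = frechet_inv_exceed(10, 4.0) / frechet_inv_exceed(10, 3.0)
```

This is the compounding the text describes: a tolerable-looking error in the exponent becomes an order-of-magnitude error in the probability of the extreme value.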

4.3 Using Power Laws Without Being Harmed by Mistakes

We can use power laws in the near tails for information, not risk management. That is, not pushing outside the tails, staying within a part of the distribution for which errors are not compounded.

I was privileged to get access to a database with cumulative sales for editions in print that had at least one unit sold that particular week (that is, conditional on the specific edition being still in print). I fit a power law with tail exponent α ≈ 1.3 for the upper 10% of sales (graph), with N = 30K. Using the Zipf variation for ranks of power laws, with r_x and r_y the ranks of book x and y, respectively, and S_x and S_y the corresponding sales:

(4.1) S_x / S_y = ( r_x / r_y )^{−1/α}

So for example if the rank of x is 100 and y is 1000, x sells (100/1000)^{−1/1.3} = 5.87 times what y sells.

Note this is only robust in deriving the sales of the lower-ranking edition (r_y > r_x) because of inferential problems in the presence of fat tails.
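Equation (4.1) in use; a minimal sketch of the rank-to-sales conversion (α = 1.3 as in the fit above):

```python
def sales_ratio(rank_x, rank_y, alpha=1.3):
    """Zipf variation: S_x / S_y = (r_x / r_y)^(-1/alpha)."""
    return (rank_x / rank_y) ** (-1.0 / alpha)

# A rank-100 book against a rank-1000 book, the worked example in the text.
r = sales_ratio(100, 1000)   # ~ 5.87
```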

[Figure 4.7. Log-log plot of the probability of exceeding X (book sales) against X, with the near tail fitted to α = 1.3.]

This works best for the top 10,000 books, but not quite the top 20 (because the tail is vastly more unstable). Further, the effective α for large deviations is lower than 1.3. But this method is robust as applied to rank within the near tail.


5 How To Tell True Fat Tails from Poisson Jumps

    5.1 Beware The Poisson

By the masquerade problem, any power law can be seen backward as a Gaussian plus a series of simple (that is, noncompound) Poisson jumps, the so-called jump-diffusion process. So the use of the Poisson is often just a backfitting problem, where the researcher fits a Poisson, happy with the "evidence".

The next exercise aims to supply convincing evidence of the scalability and non-Poisson-ness of the data (the Poisson here is assumed to be a standard Poisson). Thanks to the need for the probabilities to add up to 1, scalability in the tails is the sole possible model for such data. We may not be able to write the model for the full distribution, but we know what it looks like in the tails, where it matters.

The Behavior of Conditional Averages: With a scalable (or "scale-free") distribution, when K is "in the tails" (say you reach the point where the survival probability P_{>x} = C x^{−α}, where C is a constant and α the power-law exponent), the relative conditional expectation of X (knowing that X > K) divided by K, that is, E[X | X > K] / K, is a constant and does not depend on K. More precisely, it is α/(α − 1):

(5.1) ( ∫_K^∞ x f(x, α) dx ) / ( ∫_K^∞ f(x, α) dx ) = K α / (α − 1)

where f is the corresponding density.

This provides a handy way to ascertain scalability, by raising K and looking at the averages in the data. Note further that, for a standard Poisson with mean m (the point being too obvious for a Gaussian), not only does the conditional expectation depend on K, but it "wanes", i.e.,

(5.2) lim_{K→∞} ( Σ_{x>K} x m^x / x! ) / ( Σ_{x>K} m^x / x! ) · (1/K) = 1
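The diagnostic of (5.1) and (5.2) on simulated data (my simulation, not the author's dataset): Pareto draws keep E[X | X > K]/K flat at α/(α − 1), while a Poisson's ratio decays toward 1:

```python
import numpy as np

rng = np.random.default_rng(5)

def cond_ratio(x, K):
    """Scalability diagnostic: E[X | X > K] / K."""
    tail = x[x > K]
    return tail.mean() / K

alpha = 2.0
pareto = rng.random(2_000_000) ** (-1.0 / alpha)    # P(X > x) = x^-alpha, x >= 1
poisson = rng.poisson(2.0, 2_000_000).astype(float)

r2 = cond_ratio(pareto, 2.0)       # theory: alpha/(alpha - 1) = 2
r10 = cond_ratio(pareto, 10.0)     # same constant: independent of K
r_pois = cond_ratio(poisson, 8.0)  # thin tail: already close to 1
```

Raising K leaves the Pareto ratio untouched and pushes the Poisson ratio further toward 1, which is exactly the separation the exercise below exploits.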

Calibrating Tail Exponents. In addition, we can calibrate power laws. Using K as the cross-over point, we get the α exponent above it, the same as if we used the Hill estimator or ran a regression above some point.

We defined fat tails in the previous chapter as the contribution of the low frequency events to the total properties. But fat tails can come from different classes of distributions. This chapter will present the difference between two broad classes of distributions.

This brief test, using 12 million pieces of exhaustive returns, shows how equity prices (as well as short-term interest rates) do not have a characteristic scale. No possible method other than a Paretan tail, albeit of imprecise calibration, can characterize them.

5.2 Leave it to the Data

We tried the exercise with about every piece of data in sight: single stocks, macro data, futures, etc.

Equity Dataset: We collected the most recent 10 years (as of 2008) of daily prices for U.S. stocks (no survivorship bias effect, as we included companies that have been delisted up to the last trading day), n = 11,674,825, with deviations expressed in logarithmic returns. We scaled the data using various methods.

    The expression in "numbers of sigma" or standard deviations is there to conform to industry language (it does depend somewhat on thestability of sigma). In the "MAD" space test we used the mean deviation.

    Risk and (Anti)fragility - N N Taleb| 47


MAD(i) = Log( S_t^i / S_{t-1}^i ) / ( (1/N) Σ_{j=0}^N | Log( S_{t-j}^i / S_{t-j-1}^i ) | )

We focused on negative deviations. We kept moving K up to 100 MAD (indeed!) -- and we still had observations.

Implied α_K = E[X | X < K] / ( E[X | X < K] - K )

K       E[X|X<K]   n (for X<K)    E[X|X<K]/K   Implied α
-1.     -1.75202   1.32517×10^6   1.75202      2.32974
-2.     -3.02395   300,806        1.51197      2.95322
-5.     -7.96354   19,285         1.59271      2.68717
-10.    -15.3283   3,198          1.53283      2.87678
-15.    -22.3211   1,042          1.48807      3.04888
-20.    -30.2472   418            1.51236      2.95176
-25.    -40.8788   181            1.63515      2.57443
-50.    -101.755   24             2.0351       1.96609
-70.    -156.709   11             2.23871      1.80729
-75.    -175.422   9              2.33896      1.74685
-100.   -203.991   7              2.03991      1.96163
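The implied-α column can be reproduced on synthetic data. The sketch below is an illustration, not part of the original text: the symmetric power-law generator and the true exponent α = 2.5 are arbitrary assumptions; it applies the estimator α = m/(m - K), with m = E[X | X < K], to a sample rescaled into mean-absolute-deviation units:

```python
import random

def implied_alpha(sample, K):
    # Implied tail exponent from the conditional mean below K (in MAD units):
    # alpha = m / (m - K), with m = E[X | X < K]   (eq. 5.1 rearranged)
    tail = [x for x in sample if x < K]
    m = sum(tail) / len(tail)
    return m / (m - K), len(tail)

rng = random.Random(42)
alpha_true = 2.5
# symmetric power-law sample (x_min = 1), rescaled into MAD units
raw = [rng.choice((-1, 1)) * rng.random() ** (-1 / alpha_true) for _ in range(300_000)]
mad = sum(abs(x) for x in raw) / len(raw)
sample = [x / mad for x in raw]

for K in (-1, -2, -5, -10):
    a, n = implied_alpha(sample, K)
    print(f"K={K:4}  n={n:7}  implied alpha={a:.3f}")
```

As in the tables, the implied α stays in the neighborhood of the true exponent as K moves deeper into the tail (with growing sampling noise as n shrinks).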

Short-term Interest Rates: EuroDollars front month, 1986-2006, n = 4947

K      E[X|X<K]   n (for X<K)   E[X|X<K]/K   Implied α
-0.5   -1.8034    1520          3.6068       1.38361
-1.    -2.41323   969           2.41323      1.7076
-5.    -7.96752   69            1.5935       2.68491
-6.    -9.2521    46            1.54202      2.84496
-7.    -10.2338   34            1.46197      3.16464
-8.    -11.4367   24            1.42959      3.32782

UK Rates, 1990-2007, n = 4143

K     E[X|X>K]   n (for X>K)   E[X|X>K]/K   Implied α
0.5   1.68802    1270          3.37605      1.42087
1.    2.23822    806           2.23822      1.80761
3.    4.97319    140           1.65773      2.52038
5.    8.43269    36            1.68654      2.45658
6.    9.56132    26            1.59355      2.68477
7.    11.4763    16            1.63947      2.56381

Literally, there is no K large enough for scalability to drop off as a small-sample effect.

    Global Macroeconomic data


Taleb (2008), International Journal of Forecasting.


6 An Introduction to Metaprobability

    Ludic fallacy (or uncertainty of the nerd) : the manifestation of the Platonic fallacy in the study of uncertainty; basing studies of chanceon the narrow world of games and dice (where we know the probabilities ahead of time, or can easily discover them). A-Platonicrandomness has an additional layer of uncertainty concerning the rules of the game in real life. (in the Glossary of The Black Swan )

    Epistemic opacity : Randomness is the result of incomplete information at some layer. It is functionally indistinguishable from true orphysical randomness.

    Randomness as incomplete information: simply, what I cannot guess is random because my knowledge about the causes is incomplete, notnecessarily because the process has truly unpredictable properties.

    6.1 Metaprobability

The Effect of Estimation Error, General Case
The idea of model error from missed uncertainty attending the parameters (another layer of randomness) is as follows.

Most estimations in economics (and elsewhere) take, as input, an average or expected parameter ᾱ = ∫ α φ(α) dα, where α is φ-distributed (deemed to be so a priori or from past samples), and, regardless of the dispersion of α, build a probability distribution for X that relies on the mean estimated parameter, p(X) = p(X | ᾱ), rather than the more appropriate metaprobability-adjusted probability:

(6.1)  p(X) = ∫ p(X | α) φ(α) dα

In other words, if one is not certain about a parameter α, there is an inescapable layer of stochasticity; such stochasticity raises the expected (metaprobability-adjusted) probability if it is < 1/2, and lowers it otherwise. The uncertainty is fundamentally epistemic, and includes incertitude, in the sense of lack of certainty about the parameter.

The model bias becomes an equivalent of the Jensen gap (the difference between the two sides of Jensen's inequality), typically positive since probability is convex away from the center of the distribution. We get the bias ωA from the difference in the steps of integration:

(6.2)  ωA = ∫ p(X | α) φ(α) dα - p( X | ∫ α φ(α) dα )

With f(X) a function (f(X) = X for the mean, etc.), we get the higher-order bias ωA':

(6.3)  ωA' = ∫ f(X) ∫ p(X | α) φ(α) dα dX - ∫ f(X) p( X | ∫ α φ(α) dα ) dX

Now assume the distribution of α is discrete, with n states α ∈ {α_i}_{i=1..n}, each state carrying probability φ_i, Σ_{i=1}^n φ_i = 1. Then

(6.4)  p(X) = Σ_{i=1}^n p(X | α_i) φ_i


(This part will be expanded.)

6.2 Application to Powerlaws

In the presence of a layer of metaprobabilities (from uncertainty about the parameters), the asymptotic tail exponent for a powerlaw corresponds to the lowest possible tail exponent regardless of its probability. The problem explains Black Swan effects, i.e., why measurements tend to chronically underestimate tail contributions, rather than merely deliver imprecise but unbiased estimates.

    When the perturbation affects the standard deviation of a Gaussian or similar nonpowerlaw tailed distribution, the end product is the weightedaverage of the probabilities. However, a powerlaw distribution with errors about the possible tail exponent will bear the asymptotic propertiesof the lowest exponent, not the average exponent.

Now assume p(X) is a standard Pareto distribution with α, the tail exponent, being estimated: p(X | α) = α X^(-α-1) X_min^α, where X_min is the lower bound for X. Then

(6.5)  p(X) = Σ_{i=1}^n α_i X^(-α_i - 1) X_min^(α_i) φ_i

Taking it to the limit,

lim_{X→∞}  X^(α* + 1) Σ_{i=1}^n α_i X^(-α_i - 1) X_min^(α_i) φ_i = K

where K is a strictly positive constant and α* = min_{1≤i≤n} α_i. In other words, Σ_{i=1}^n α_i X^(-α_i - 1) X_min^(α_i) φ_i is asymptotically equivalent to a constant times X^(-α* - 1). The lowest parameter in the space of all possibilities becomes the dominant tail exponent.
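The dominance of the lowest exponent can be checked directly on the survival function of a two-state mixture. The sketch below is an illustration (the exponents 1.5 and 3 and the weights 1% / 99% are arbitrary assumed values, with x_min = 1): even a 1% weight on the low-α state eventually controls the tail.

```python
from math import log

def mixture_survival(x, alphas, weights):
    # P(X > x) for a mixture of Pareto survival functions with x_min = 1
    return sum(w * x ** (-a) for a, w in zip(alphas, weights))

def effective_alpha(x, alphas, weights, h=1e-4):
    # local log-log slope of the survival function: the effective tail exponent
    lo = log(mixture_survival(x, alphas, weights))
    hi = log(mixture_survival(x * (1 + h), alphas, weights))
    return -(hi - lo) / log(1 + h)

alphas, weights = (1.5, 3.0), (0.01, 0.99)   # rare low-alpha state
for x in (10, 100, 10_000, 10**6):
    print(x, round(effective_alpha(x, alphas, weights), 4))
```

The printed local exponent drifts from near the high value down to α* = 1.5 as x grows, exactly the asymptotic dominance described above.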

Figure 6.1: Log-log plot illustration of the asymptotic tail exponent with two states. The graph shows the different situations: a) p(X | ᾱ), b) Σ_{i=1}^n p(X | α_i) φ_i, and c) p(X | α*). We can see how b) and c) converge. (Horizontal axis: X from 5 to 1000; vertical axis: probability from 10^-7 to 0.1.)

The asymptotic Jensen gap ωA becomes p(X | α*) - p(X | ᾱ).

Implications:
1. Whenever we estimate the tail exponent from samples, we are likely to underestimate the thickness of the tails, an observation made about Monte Carlo generated α-stable variates and the estimated results (the Weron effect).
2. The higher the estimation variance, the lower the true exponent.
3. The asymptotic exponent is the lowest possible one. It does not even require estimation.
4. Metaprobabilistically, if one isn't sure about the probability distribution, and there is a probability that the variable is unbounded and could be powerlaw distributed, then it is powerlaw distributed, and of the lowest exponent.


The obvious conclusion: in the presence of powerlaw tails, focus on changing payoffs to clip tail exposures, limiting ωA' and robustifying the tail exposures, thereby making the computation problem go away.


7 Brownian Motion in the Real World (Under Path Dependence and Fat Tails)

Most of the work concerning martingales and Brownian motion is idealized to the point of lacking any match to reality, in spite of the sophisticated, rather complicated discussions. This section discusses the (consequential) differences.

7.1 Path Dependence and History as Revelation of Antifragility

The Markov property underpinning Brownian motion: conditioning X_N on the entire history {X_1, X_2, ..., X_{N-1}} is equivalent to conditioning it on X_{N-1} alone.


(This part will be expanded.)

7.2 Brownian Motion in the Real World

We mentioned in the discussion of the Casanova problem that stochastic calculus requires a certain class of distributions, such as the Gaussian. It is not, as one might expect, because of the convenience of the smoothness in squares (finite Δx²), but rather because the distribution is conserved across time scales. By the central limit theorem, a Gaussian remains a Gaussian under summation, that is, under sampling at longer time scales. But it also remains a Gaussian at shorter time scales.

    The problems are as follows:

1. The results in the literature are subjected to the constraint that the martingale M is a member of the subset (H²) of square-integrable martingales: sup_{t≤T} E[M_t²] < ∞.


    7.4 Finite Variance not Necessary for Anything Ecological (incl. quant finance)

(This part will be expanded.)


8 How Power-Laws Emerge From Recursive Epistemic Uncertainty

    8.1 The Opposite of Central Limit

With the Central Limit Theorem, we start with a distribution and end with a Gaussian. The opposite is more likely to be true. Recall how we fattened the tail of the Gaussian by stochasticizing the variance? Now let us use the same metaprobability method, but add additional layers of uncertainty.

The Regress Argument (Error about Error)
The main problem behind The Black Swan is the limited understanding of model (or representation) error and, for those who get it, a lack of understanding of second-order errors (about the methods used to compute the errors) and, by a regress argument, an inability to continuously reapply the thinking all the way to its limit (particularly when no reason is provided to stop). Again, there is no problem with stopping the recursion, provided it is accepted as a declared a priori that escapes quantitative and statistical methods.

Epistemic, not statistical, re-derivation of power laws: Note that previous derivations of power laws have been statistical (cumulative advantage, preferential attachment, winner-take-all effects, criticality), and the properties derived by Yule, Mandelbrot, Zipf, Simon, Bak, and others result from structural conditions or from breaking the independence assumptions in the sums of random variables, allowing for the application of the central limit theorem. This work is entirely epistemic, based on standard philosophical doubts and regress arguments.

8.2 Methods and Derivations

Layering Uncertainties

Take a standard probability distribution, say the Gaussian. The measure of dispersion, here σ, is estimated, and we need to attach some measure of dispersion around it: the uncertainty about the rate of uncertainty, so to speak, or the higher-order parameter, similar to what is called the "volatility of volatility" in the lingo of option operators (see Taleb, 1997; Derman, 1994; Dupire, 1994; Hull and White, 1997) -- here it would be the "uncertainty rate about the uncertainty rate". And there is no reason to stop there: we can keep nesting these uncertainties into higher orders, with the uncertainty rate of the uncertainty rate of the uncertainty rate, and so forth. There is no reason to have certainty anywhere in the process.

Higher-order integrals in the standard Gaussian case

We start with the case of a Gaussian and focus the uncertainty on the assumed standard deviation. Define φ(μ, σ, x) as the Gaussian PDF for value x with mean μ and standard deviation σ.

A 2nd-order stochastic standard deviation is the integral of φ across values of σ ∈ ]0, ∞[, under the measure f(σ̄, σ₁, σ), with σ₁ its scale parameter (our approach to track the error of the error), not necessarily its standard deviation; the expected value of σ₁ is σ̄₁.

(8.1)  f(x)₁ = ∫₀^∞ φ(μ, σ, x) f(σ̄, σ₁, σ) dσ

Generalizing to the Nth order, the density function f(x) becomes

(8.2)  f(x)_N = ∫₀^∞ ⋯ ∫₀^∞ φ(μ, σ, x) f(σ̄, σ₁, σ) f(σ₁, σ₂, σ₁) ⋯ f(σ_{N-1}, σ_N, σ_{N-1}) dσ dσ₁ dσ₂ ⋯ dσ_N

The problem is that this approach is parameter-heavy and requires the specification of the subordinated distributions (in finance, the lognormal


has been traditionally used for σ₂, or the Gaussian for the ratio Log[σ_t²/σ²], since the direct use of a Gaussian allows for negative values). We would need to specify a measure f for each layer of error rate. Instead, this can be approximated by using the mean deviation for σ, as we will see next.

Discretization using a nested series of two states for σ -- a simple multiplicative process

We saw in the last chapter a quite effective simplification to capture the convexity, the ratio of (or difference between) φ(μ, σ, x) and ∫₀^∞ φ(μ, σ, x) f(σ̄, σ₁, σ) dσ (the first-order standard deviation), by using a weighted average of values of σ; say, for a simple case of one-order stochastic volatility,

σ (1 ± a(1)),   0 ≤ a(1) < 1

where a(1) is the proportional mean absolute deviation for σ, in other words the measure of the absolute error rate for σ. We use 1/2 as the probability of each state. Unlike the earlier situation, we are not preserving the variance, but rather the standard deviation.

Thus the distribution using the first-order stochastic standard deviation can be expressed as

(8.3)  f(x)₁ = (1/2) { φ(μ, σ(1 + a(1)), x) + φ(μ, σ(1 - a(1)), x) }

In the more general case, with variable a where a(n) ≥ a(n-1) > 0, the moments explode:

A- Even the smallest value of a > 0 leads, since (1 + a²)^N is unbounded, to the second moment going to infinity (though not the first) as N → ∞. So something as small as a .001% error rate will still lead to the explosion of moments and the invalidation of the use of the class of ℒ² distributions.

B- In these conditions, we need to use power laws for epistemic reasons, or, at least, distributions outside the ℒ² norm, regardless of observations of past data.

Note that we need an a priori reason (in the philosophical sense) to cut off N somewhere, hence to bound the expansion of the second moment.
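Point A can be sketched in closed form (an illustration, assuming a constant error rate a at every layer under the two-state scheme of (8.3); the values a = 0.1 and the sample Ns are arbitrary): the second moment grows as σ²(1 + a²)^N and the fourth as 3σ⁴(1 + 6a² + a⁴)^N, so the kurtosis diverges with N.

```python
def moments_after_layers(a, N, sigma=1.0):
    # Closed-form moments after N layers of sigma -> sigma * (1 +/- a),
    # each branch with probability 1/2 (constant error rate a):
    #   M2 = sigma^2 (1 + a^2)^N,   M4 = 3 sigma^4 (1 + 6 a^2 + a^4)^N
    m2 = sigma ** 2 * (1 + a ** 2) ** N
    m4 = 3 * sigma ** 4 * (1 + 6 * a ** 2 + a ** 4) ** N
    return m2, m4, m4 / m2 ** 2          # third element: kurtosis

for N in (1, 10, 100, 1000):
    m2, m4, kurt = moments_after_layers(0.1, N)
    print(N, m2, kurt)
```

For N = 1 the second moment matches the direct two-state average (1/2)[(1+a)² + (1-a)²] σ² = (1 + a²) σ²; the kurtosis column then grows without bound as layers accumulate.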

Convergence to Properties Similar to Power Laws

We can see in the next log-log plot (Figure 1) how, at higher orders of stochastic volatility, with equally proportional stochastic coefficient (a(1) = a(2) = ... = a(N) = 1/10), the density approaches that of a power law (just like the lognormal distribution at higher variance), as shown by the flatter density on the log-log plot. The probabilities keep rising in the tails as we add layers of uncertainty, until they seem to reach the boundary of the power law, while ironically the first moment remains invariant.

Figure x: Log-log plot of the probability of exceeding x (Log Pr(x) against Log x), showing power-law-style flattening as N rises; N = 0, 5, 10, 25, 50, with all values of a = 1/10.

The same effect takes place as a increases toward 1: at the limit, the tail exponent of P>x approaches 1 but remains above 1.

    Effect on Small Probabilities

    Next we measure the effect on the thickness of the tails. The obvious effect is the rise of small probabilities.

Take the exceedance probability, that is, the probability of exceeding K, given N, for the parameter a held constant:

(8.9)  P_{>K|N} = Σ_{j=0}^{N} 2^(-N-1) C(N, j) erfc( K / ( √2 σ (1 + a)^j (1 - a)^(N-j) ) )

where C(N, j) is the binomial coefficient and erfc(.) is the complementary error function, erfc(z) = 1 - erf(z), with erf(z) = (2/√π) ∫₀^z e^(-t²) dt.
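Equation (8.9) can be evaluated directly with the standard library (a sketch; σ = 1 and the sample values of K, N, a are illustrative choices):

```python
from math import erfc, sqrt, comb

def p_exceed(K, N, a, sigma=1.0):
    # Eq. (8.9): P(X > K) after N two-state layers on sigma.
    # N = 0 recovers the plain Gaussian tail (1/2) erfc(K / (sigma sqrt(2))).
    return sum(
        2 ** (-N - 1) * comb(N, j)
        * erfc(K / (sqrt(2) * sigma * (1 + a) ** j * (1 - a) ** (N - j)))
        for j in range(N + 1)
    )

gaussian = p_exceed(3, 0, 0.01)
for N in (5, 10, 25):
    print(N, p_exceed(3, N, 0.01) / gaussian)   # convexity ratios, as tabulated below
```

The printed ratios grow with N, reproducing the behavior of the convexity-effect tables.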

Convexity effect: The next tables show the ratio of the exceedance probability under different values of N divided by the probability in the case of a standard Gaussian.

a = 1/100:

N    P>3,N / P>3,N=0    P>5,N / P>5,N=0    P>10,N / P>10,N=0
5    1.01724            1.155              7
10   1.0345             1.326              45
15   1.05178            1.514              221
20   1.06908            1.720              922
25   1.0864             1.943              3347

a = 1/10:

N    P>3,N / P>3,N=0    P>5,N / P>5,N=0    P>10,N / P>10,N=0
5    2.74               146                1.09×10^12
10   4.43               805                8.99×10^15
15   5.98               1980               2.21×10^17
20   7.38               3529               1.20×10^18
25   8.64               5321               3.62×10^18


8.4 Regime 2: Cases of Decaying Parameters a(n)

As we said, we may have (actually, we need to have) a priori reasons for the parameter a to decrease or for N to stop somewhere. When the higher-order a(i) decline, the moments tend to be capped (the inherited tails will come from the lognormality of σ).

Regime 2-a; First Method: Bleed of Higher-Order Error

Take a "bleed" of higher-order errors at the rate λ, 0 ≤ λ < 1, such that a(N) = λ a(N-1), hence a(N) = λ^(N-1) a(1), with a(1) the conventional intensity of stochastic standard deviation. Assume μ = 0.

With N = 2, the second moment becomes

(8.10)  M₂(2) = (a(1)² + 1) σ² (a(1)² λ² + 1)

With N = 3,

(8.11)  M₂(3) = σ² (1 + a(1)²) (1 + λ² a(1)²) (1 + λ⁴ a(1)²)

Finally, for the general N:

(8.12)  M₂(N) = (a(1)² + 1) σ² ∏_{i=1}^{N-1} ( a(1)² λ^(2i) + 1 )

We can re-express (8.12) using the q-Pochhammer symbol (a; q)_N = ∏_{i=0}^{N-1} (1 - a q^i):

(8.13)  M₂(N) = σ² ( -a(1)²; λ² )_N

which allows us to get to the limit

(8.14)  lim_{N→∞} M₂(N) = σ² ( -a(1)²; λ² )_∞

As to the fourth moment, by recursion:

(8.15)  M₄(N) = 3 σ⁴ ∏_{i=0}^{N-1} ( 6 a(1)² λ^(2i) + a(1)⁴ λ^(4i) + 1 )

(8.16)  M₄(N) = 3 σ⁴ ( (2√2 - 3) a(1)²; λ² )_N ( -(3 + 2√2) a(1)²; λ² )_N

(8.17)  lim_{N→∞} M₄(N) = 3 σ⁴ ( (2√2 - 3) a(1)²; λ² )_∞ ( -(3 + 2√2) a(1)²; λ² )_∞

So the limiting second moment for λ = .9 and a(1) = .2 is just 1.28 σ², a significant but relatively benign convexity bias. The limiting fourth moment is just 9.88 σ⁴, more than 3 times the Gaussian's (3 σ⁴), but still a finite fourth moment. For small values of a and values of λ close to 1, the fourth moment collapses to that of a Gaussian.
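The product formulas (8.12) and (8.15) can be cross-checked by brute force (a sketch; it assumes the bleed acts multiplicatively on σ through independent two-state factors (1 ± a(1) λ^i), each sign with probability 1/2, consistent with the moments above):

```python
from itertools import product

def moments_brute(N, a1, lam, sigma=1.0):
    # enumerate all 2^N sign tuples: sigma_eff = sigma * prod_i (1 + eps_i a1 lam^i);
    # conditionally Gaussian, so E[x^2 | s] = s^2 and E[x^4 | s] = 3 s^4
    m2 = m4 = 0.0
    for signs in product((1, -1), repeat=N):
        s = sigma
        for i, eps in enumerate(signs):
            s *= 1 + eps * a1 * lam ** i
        m2 += s ** 2 / 2 ** N
        m4 += 3 * s ** 4 / 2 ** N
    return m2, m4

def moments_product(N, a1, lam, sigma=1.0):
    # closed forms: M2 = sigma^2 prod (1 + a1^2 lam^(2i)),
    #               M4 = 3 sigma^4 prod (1 + 6 a1^2 lam^(2i) + a1^4 lam^(4i))
    m2 = sigma ** 2
    m4 = 3 * sigma ** 4
    for i in range(N):
        m2 *= 1 + a1 ** 2 * lam ** (2 * i)
        m4 *= 1 + 6 * a1 ** 2 * lam ** (2 * i) + a1 ** 4 * lam ** (4 * i)
    return m2, m4

print(moments_brute(3, 0.2, 0.9))
print(moments_product(3, 0.2, 0.9))
```

The two prints agree to machine precision: averaging each independent factor gives E[(1 ± a1 λ^i)²] = 1 + a1² λ^(2i) and E[(1 ± a1 λ^i)⁴] = 1 + 6 a1² λ^(2i) + a1⁴ λ^(4i), which is exactly where the products come from.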

Regime 2-b; Second Method: a Nonmultiplicative Error Rate

For N recursions,

σ ( 1 ± ( a(1) ( 1 ± ( a(2) ( 1 ± a(3) ( ... ) ) ) ) ) )

(8.18)  p(x, μ, σ, N) = (1/L) Σ_{i=1}^{L} φ( μ, σ (1 + (T^N . A^N)_i), x )

where (T^N . A^N)_i is the ith component of the (L × 1) dot product of T^N, the L × N matrix of tuples of ±1 signs (L the length of the matrix, L = 2^N), and A^N the vector of parameters

A^N = ( a^j )_{j=1,…,N}

So, for instance, for N = 3, A³ = (a, a², a³) and

T³ . A³ = ( a + a² + a³,
            a + a² - a³,
            a - a² + a³,
            a - a² - a³,
           -a + a² + a³,
           -a + a² - a³,
           -a - a² + a³,
           -a - a² - a³ )

The moments are as follows:

(8.19)  M₁(N) = μ
(8.20)  M₂(N) = μ² + 2 σ
(8.21)  M₄(N) = μ⁴ + 12 μ² σ + 12 σ² Σ_{i=0}^{N} a^(2i)

At the limit of N → ∞:

(8.22)  lim_{N→∞} M₄(N) = μ⁴ + 12 μ² σ + 12 σ² ( 1 / (1 - a²) )

which is very mild.

8.5 Conclusion

Something boring & something about epistemic opacity.

(This part will be expanded.)


9 On the Difference Between Binaries and Vanillas with Implications For Prediction Markets

This explains how and where prediction markets (or, more generally, discussions of betting matters) do not correspond to reality and have little to do with exposures to fat tails and Black Swan effects. Elementary facts, but with implications. This shows, for instance, how the long-shot bias is misapplied to real-life variables, and why political predictions are more robust than economic ones.

    This discussion is based on Taleb (1997) which focuses largely on the difference between a binary and a vanilla option.

9.1 Definitions

1- A binary bet (or just a binary, or a digital): an outcome with payoff 0 or 1 (or yes/no, -1/1, etc.). Examples: prediction markets, elections, most games and lottery tickets. Also called digital. Any statistic based on a YES/NO switch. Its estimator is M_T^X(A, f), which (recall 1.x) was Σ_{i=0}^n 1_A(X_{t₀ + iΔt}) / Σ_{i=0}^n 1, with f = 1 and A either (K, ∞) or its complement (-∞, K).

Binaries are effectively bets on probability, more specifically on cumulative probabilities or their complement. They are rarely ecological, except for political predictions. More technically, they are mapped by the Heaviside function θ(K): θ = 1 if x > K and 0 if x ≤ K.


(9.1)  M_T^X((L, H), x) ≡ lim_{ΔK→0} ( Σ_{i=1}^{(H-K)/ΔK} θ(K + i ΔK) ΔK - Σ_{i=1}^{(K-L)/ΔK} θ(K - i ΔK) ΔK + K )

and, for unbounded vanilla payoffs, L = 0,

(9.2)  M_T^X((0, ∞), x) ≡ lim_{H→∞} lim_{ΔK→0} ( Σ_{i=1}^{(H-K)/ΔK} θ(K + i ΔK) ΔK - Σ_{i=1}^{K/ΔK} θ(K - i ΔK) ΔK + K )

The Problem

The properties of binaries diverge from those of vanilla exposures. This note shows how the conflation of the two takes place: in prediction markets, and in the ludic fallacy (using the world of games to apply to real life).

1. They have diametrically opposite responses to skewness (a mean-preserving increase in skewness).
Proof TK
2. They respond differently to fat-tailedness (sometimes in opposite directions). Fat tails make binaries more tractable.
Proof TK
3. A rise in complexity lowers the value of the binary and increases that of the vanilla.
Proof TK

Some direct applications:

1- Studies of long-shot biases that typically apply to binaries should not port to vanillas.

2- Many are surprised that I find many econometricians total charlatans, while Nate Silver is immune to my problem. This explains why.

3- Why prediction markets provide very limited information outside specific domains.

4- Etc.

    The Elementary Betting Mistake

One can hold the belief that a variable can go lower, yet bet that it is going higher. Simply, the digital and the vanilla diverge: P(X > X₀) > 1/2, but E[X] < X₀. This is normal in the presence of skewness and extremely common with economic variables. Philosophers have a related problem called the lottery paradox, which in statistical terms is not a paradox.

    In Fooled by Randomness, a trader was asked:

"Do you predict that the market is going up or down?" "Up," he said, with confidence. Then the questioner got angry when he discovered that the trader was short the market, i.e., would benefit from the market going down. The questioner could not get the idea that the trader believed that the market had a higher probability of going up, but that, should it go down, it would go down a lot. So the rational position was to be short.

    This divorce between the binary (up is more likely) and the vanilla is very prevalent in real-world variables.
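The divergence can be sketched numerically (an illustration, not from the original text; the payoff values +1 / -20 and the 90/10 probabilities are arbitrary assumptions for a negatively skewed bet):

```python
import random

def binary_and_vanilla(sample, K=0.0):
    # binary: the estimated probability P(X > K);
    # vanilla: E[X], which weights each outcome by its magnitude
    binary = sum(1 for x in sample if x > K) / len(sample)
    vanilla = sum(sample) / len(sample)
    return binary, vanilla

rng = random.Random(1)
# small gain 90% of the time, large loss 10% of the time
sample = [1.0 if rng.random() < 0.9 else -20.0 for _ in range(100_000)]
b, v = binary_and_vanilla(sample)
print(b, v)   # binary says "up is likely"; vanilla says the bet loses money
```

The binary reads about 0.9 ("up"), while the expectation is decidedly negative: exactly the trader's position above.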

9.3 The Elementary Fat Tails Mistake

A slightly more difficult problem. To the question "What happens to the probability of a deviation > 1σ when you fatten the tail (while preserving other properties)?", almost all answer: it increases (so far all have made the mistake). Wrong. Fat tails mean the contribution of the extreme events to the total properties, and it is the pair probability × payoff that matters, not just the probability. This is the reason people thought that I meant (and reported) that Black Swans were more frequent; I meant that Black Swans are more consequential, more determining, but not more frequent.


I've asked variants of the same question. The Gaussian distribution spends 68.2% of the time between ±1 standard deviation. The real world has fat tails. In finance, how much time do stocks spend between ±1 standard deviations? The answer has been invariably "lower". Why? "Because there are more deviations." Sorry, there are fewer deviations: stocks spend between 78% and 98% of the time between ±1 standard deviations (computed from past samples).

    Some simple derivations

    Let x follow a Gaussian distribution ( m, s ). Assume m=0 for the exercise. What is the probability of exceeding one standard deviation?

P_{>1σ} = 1 - (1/2) erfc(-1/√2), where erfc is the complementary error function; P_{>1σ} = P_{<-1σ} ≈ 15.86%, and the probability of staying within the "stability tunnel" between ±1σ is ≈ 68.2%.

Let us fatten the tail in a variance-preserving manner, using the standard method of a linear combination of two Gaussians, with the two standard deviations separated by σ√(1 + a) and σ√(1 - a), where a is the "vvol" (this is variance-preserving; technically of no big effect here, as a standard-deviation-preserving spreading gives the same qualitative result). Such a method leads to an immediate raising of the kurtosis by a factor of (1 + a²), since E[x⁴]/E[x²]² = 3 (a² + 1). The probability of staying inside the tunnel becomes

P( -1σ < x < 1σ ) = 1 - (1/2) erfc( 1 / (√2 √(1 - a)) ) - (1/2) erfc( 1 / (√2 √(1 + a)) )

So, as we can see for different values of a, the probability of staying inside ±1 sigma increases as the tails fatten.
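A quick check of the claim, using the variance-preserving mixture above (a sketch with σ = 1; the sample values of a are arbitrary):

```python
from math import erfc, sqrt

def tunnel_probability(a):
    # P(-sigma < x < sigma) under the mixture
    # (1/2) N(0, 1+a) + (1/2) N(0, 1-a)   (variances; sigma = 1).
    # Each component puts erfc(1 / sqrt(2 * variance)) of mass outside the tunnel.
    p_outside = 0.5 * erfc(1 / sqrt(2 * (1 + a))) + 0.5 * erfc(1 / sqrt(2 * (1 - a)))
    return 1 - p_outside

for a in (0.0, 0.2, 0.5, 0.8):
    print(a, round(tunnel_probability(a), 4))
```

At a = 0 this recovers the Gaussian 68.27%, and the printed probability rises monotonically with a: fattening the tails increases the time spent inside ±1σ.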

Figure: Fatter and fatter tails for different values of a. We notice that the higher the peak, the lower the probability of leaving the ±1σ tunnel.

9.4 The More Advanced Fat Tails Mistake and the Great Moderation

Fatter tails increase the time spent between deviations, giving the illusion of an absence of volatility when in fact events are delayed and made worse (my critique of the "Great Moderation").

Stopping Time & Fattening of the Tails of a Brownian Motion: Consider the distribution of the time it takes for a continuously monitored Brownian motion S to exit from a "tunnel" with a lower bound L and an upper bound H. Counterintuitively, fatter tails make an exit (at some sigma) take longer. You are likely to spend more time inside the tunnel, since exits are far more dramatic. ψ is the distribution of the exit time t, where t ≡ inf{ t : S ∉ [L, H] }. From Taleb (1997) we have the following approximation:

ψ(t | σ) = ( π σ² e^(-(1/8) σ² t) / (log H - log L)² ) Σ_{n=1}^{∞} (-1)^n n e^( -n² π² σ² t / (2 (log H - log L)²) ) ( √(L/S) sin( n π (log L - log S) / (log H - log L) ) - √(H/S) sin( n π (log H - log S) / (log H - log L) ) )

and the fatter-tailed distribution from mixing Brownians with σ separated by a coefficient a:

ψ(t | σ, a) = (1/2) ψ(t | σ (1 - a)) + (1/2) ψ(t | σ (1 + a))

    This graph shows the lengthening of the stopping time between events coming from fatter tails.

Figures: left, the probability distribution of the exit time (exit time 2 to 8 on the horizontal axis, probability on the vertical); right, the expected exit time t as a function of the vvol v (v from 0.1 to 0.7 on the horizontal axis; expected t from 3 to 8 on the vertical).

More Complicated: MetaProbabilities

9.5 The Fourth Quadrant Mitigation (or Solution)

Let us return to M[A, f(x)] of Chapter 1. A quite significant result is that M[A, x^n] may not converge, in the case of, say, power laws with exponent α < n, but M[A, x^m] where m < α does converge.


1. M0, depending on the 0th moment, that is, "binary", or simple: as we saw, you just care whether something is true or false. Very true or very false does not matter. Someone is either pregnant or not pregnant. A statement is "true" or "false" with some confidence interval. (I call these M0 as, more technically, they depend on the zeroth moment, namely just on the probability of events, and not their magnitude; you just care about "raw" probability.) A biological experiment in the laboratory or a bet with a friend about the outcome of a soccer game belongs to this category.

2. M1+, complex, depending on the 1st or higher moments. You do not just care about the frequency, but about the impact as well, or, even more complex, some function of the impact. So there is another layer of uncertainty of impact. (I call these M1+ as they depend on higher moments of the distribution.) When you invest, you do not care how many times you make or lose, you care about the expectation: how many times you make or lose times the amount made or lost.

    Two types of probability structures:

There are two classes of probability domains, very distinct qualitatively and quantitatively: the first, thin-tailed, "Mediocristan"; the second, thick-tailed, "Extremistan".

Note the typo in the table: f(x) = 1 should be f(x) = x.


    The Map

Conclusion
The 4th Quadrant is mitigated by changes in exposures. And exposures in the 4th Quadrant can be to the negative or to the positive, depending on whether the domain subset A is exposed on the left or on the right.


PART II - NONLINEARITIES

This Segment, Part II, corresponds to topics covered in Antifragile.

The previous chapters dealt mostly with probability rather than payoff. The next section deals with fragility and nonlinear effects.


10 Nonlinear Transformations of Random Variables

10.1 The Conflation Problem: Exposures to X Confused With Knowledge About X

Exposure, not knowledge. Take X a random or nonrandom variable, and F(X) the exposure, payoff, the effect of X on you, the end bottom line. (To be technical, X is higher-dimensional, in ℝ^N, but let us assume for the sake of the examples in the introduction that it is a simple one-dimensional variable.)

The disconnect. As a practitioner and risk taker, I see the following disconnect: people (nonpractitioners) talk to me about X (with the implication that we practitioners should care about X in running our affairs), while I had been thinking about F(X), nothing but F(X). And the straight confusion, since Aristotle, between X and F(X) has been chronic. Sometimes people mention F(X) as utility but miss the full payoff. And the confusion is at two levels: first, simple confusion; second, in the decision-science literature, seeing the difference but not realizing that action on F(X) is easier than action on X.

Examples:

X is unemployment in Senegal; F₁(X) is the effect on the bottom line of the IMF, and F₂(X) is the effect on your grandmother (which I assume is minimal).

X can be a stock price, but you own an option on it, so F(X) is your exposure, an option value for X, or, even more complicated, the utility of the exposure to the option value.

X can be changes in wealth, and F(X) the convex-concave value function of Kahneman-Tversky: how these changes affect you. One can see that F(X) is vastly more stable or robust than X (it has thinner tails).

Figure: A convex and a linear function of a variable X. Confusing f(X) (on the vertical) and X (on the horizontal) is more and more significant when f(X) is nonlinear. The more convex f(X), the more the statistical and other properties of f(X) will be divorced from those of X. For instance, the mean of f(X) will be different from f(mean of X), by Jensen's inequality. But beyond Jensen's inequality, the difference in risks between the two will be more and more considerable. When it comes to probability, the more nonlinear f, the less the probabilities of X matter compared to the nonlinearity of f. Moral of the story: focus on f, which we can alter, rather than on the measurement of the elusive properties of X.


[Figure: Probability Distribution of x versus Probability Distribution of f(x).]

There is an infinity of functions F depending on a unique variable X.

    All utilities need to be embedded in F .

Limitations of knowledge. What is crucial: our limitations of knowledge apply to X, not necessarily to F(X). We have no control over X, some control over F(X). In some cases, a very, very large control over F(X).

This may seem naive, but people do confuse the two, as something is lost in the translation.

The danger with the treatment of the Black Swan problem is as follows: people focus on X ("predicting X"). My point is that, although we do not understand X, we can deal with it by working on F, which we can understand, while others work on predicting X, which we can't, because small probabilities are incomputable, particularly in fat-tailed domains. F(X) is how the end result affects you.

The probability distribution of F(X) is markedly different from that of X, particularly when F(X) is nonlinear. We need a nonlinear transformation of the distribution of X to get F(X). We had to wait until 1964 for a paper on convex transformations of random variables, Van Zwet (1964).

Bad news: F is almost always nonlinear, often S-curved, that is, convex-concave (for an increasing function).

The central point about what to understand: When F(X) is convex, say as in trial and error, or with an option, we do not need to understand X as much as our exposure to X. Simply, the statistical properties of X are swamped by those of F. That's the point of Antifragility, in which exposure is more important than the naive notion of "knowledge", that is, understanding X.

    Fragility and Antifragility :

    When F(X) is concave (fragile), errors about X can translate into extreme negative values for F. When F(X) is convex, one is immune fromnegative variations.
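The asymmetry can be made concrete with two toy payoffs of my own devising (not from the text): a concave exposure and a convex, option-like one, fed the same symmetric shocks in X.

```python
# Symmetric shocks in X produce asymmetric outcomes in F(X).
shocks = [-10, -1, 0, 1, 10]

fragile = [-(x * x) for x in shocks]   # concave: the large shock maps to an extreme loss
convex = [max(x, 0) for x in shocks]   # convex (option-like): downside capped, upside kept

print(fragile)   # [-100, -1, 0, -1, -100]
print(convex)    # [0, 0, 0, 1, 10]
```

The concave payoff turns the ±10 shock into a loss of 100 either way; the convex one never drops below zero.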

    The more nonlinear F the less the probabilities of X matter in the probability distribution of the final package F .

Most people confuse the probabilities of X with those of F. I am serious: the entire literature reposes largely on this mistake.

So, for now, ignore discussions of X that do not have F. And, for Baal's sake, focus on F, not X.

10.2. Transformations of Probability Distributions

Say x follows a distribution p(x) and z = f(x) follows a distribution g(z). Assume g(z) continuous, increasing, and differentiable for now.

The density p at point r is defined by use of the integral

$$D(r) \equiv \int_{-\infty}^{r} p(x) \, dx$$

hence

$$\int_{-\infty}^{r} p(x) \, dx = \int_{-\infty}^{f(r)} g(z) \, dz$$

In differential form,

$$g(z) \, dz = p(x) \, dx$$

Since $x = f^{(-1)}(z)$, we get

$$g(z) \, dz = p\left(f^{(-1)}(z)\right) df^{(-1)}(z)$$

Now, the derivative of an inverse function is $\frac{\partial f^{(-1)}(z)}{\partial z} = \frac{1}{f'\left(f^{(-1)}(z)\right)}$, which obtains the useful transformation heuristic:


$$g(z) = \frac{p\left(f^{(-1)}(z)\right)}{f'(u)} \Bigg|_{\,u = f^{(-1)}(z)} \tag{10.1}$$

In the event that g(z) is monotonic decreasing, then

$$g(z) = \frac{p\left(f^{(-1)}(z)\right)}{\left| f'(u) \right|} \Bigg|_{\,u = f^{(-1)}(z)} \tag{10.2}$$
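The heuristic (10.1) can be sanity-checked numerically. The particular choices below, p(x) = e⁻ˣ (exponential) and f(x) = x² on x ≥ 0, are my own illustrative assumptions, for which the heuristic gives g(z) = e^(−√z) / (2√z).

```python
import math, random

random.seed(1)

# x ~ Exponential(1); z = f(x) = x^2, increasing on x >= 0.
# Eq. (10.1): g(z) = p(f^{-1}(z)) / f'(f^{-1}(z)) = exp(-sqrt(z)) / (2 sqrt(z)).
def g(z):
    return math.exp(-math.sqrt(z)) / (2.0 * math.sqrt(z))

zs = [random.expovariate(1.0) ** 2 for _ in range(200_000)]

# Compare the Monte Carlo mass in the bin [a, b] with the integral of g (midpoint rule).
a, b = 1.0, 2.0
empirical = sum(a < z <= b for z in zs) / len(zs)
n = 1000
h = (b - a) / n
analytic = sum(g(a + (i + 0.5) * h) for i in range(n)) * h

print(empirical, analytic)   # both near exp(-1) - exp(-sqrt(2)) ~ 0.125
```

The simulated mass and the integral of the transformed density agree, as (10.1) requires.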

Where f is convex,

$$\frac{1}{2}\left(f(x - \Delta x) + f(\Delta x + x)\right) > f(x),$$

concave if $\frac{1}{2}\left(f(x - \Delta x) + f(\Delta x + x)\right) < f(x)$. Let us simplify with the sole condition, assuming f(.) twice differentiable, $\frac{\partial^2 f}{\partial x^2} > 0$ for all values of x in the convex case, and $< 0$ in the concave case.


Figure 1: Simulation, first. The distribution of the utility of changes of wealth, when the changes in wealth follow a power law with tail exponent α = 2 (5 million Monte Carlo simulations).

[Figure 1 plot: Distribution of V(x) versus Distribution of x, horizontal axis from −20 to 20, vertical axis (density) up to 0.35.]

Figure 2: The graph in Figure 1 derived analytically.

Fragility: as defined in the Taleb-Douady (2012) sense, on which more later, i.e., tail sensitivity below a level K: v(x) is less fragile than x.

[Figure 3 plot: Tail of x versus Tail of v(x), over the range −18 to −6.]

Figure 3: Left tail.


v(x) has thinner tails than x, hence is more robust.

ASYMPTOTIC TAIL: More technically, the asymptotic tail exponent for V(x) becomes $\frac{\alpha}{a}$ (i.e., for x and −x large, the exceedance probability for V, $P_{>x} \sim K x^{-\alpha/a}$, with K a constant), or, for the density,

$$z(x) \sim K x^{-\frac{\alpha}{a} - 1}$$

We can see that V(x) can easily have finite variance when x has an infinite one. The dampening of the tail has an increasingly consequential effect for lower values of a.
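The tail dampening can be exhibited by simulation. The setup below is my own sketch: x Pareto with tail exponent α = 2, and, assuming (as with the Kahneman-Tversky value function referenced above) a value function of the form V(x) = xᵃ for gains, a = 0.88, so V should show tail exponent α/a ≈ 2.27.

```python
import math, random

random.seed(3)
alpha, a = 2.0, 0.88   # tail exponent of x; value-function curvature exponent (a < 1)

# x Pareto(alpha) with minimum 1; V(x) = x**a dampens the tail exponent to alpha/a.
xs = [random.paretovariate(alpha) for _ in range(500_000)]
vs = [x ** a for x in xs]

def exceedance(sample, t):
    return sum(s > t for s in sample) / len(sample)

# Empirical log-log tail slope between t = 5 and t = 50, cf. P(V > t) ~ K t^(-alpha/a).
t1, t2 = 5.0, 50.0
slope = -(math.log(exceedance(vs, t2)) - math.log(exceedance(vs, t1))) / (
    math.log(t2) - math.log(t1))
print(slope)   # close to alpha / a = 2.27...
```

The fitted slope lands near α/a, fatter than thin-tailed but markedly thinner than the α = 2 of x itself.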

Case 2: Compare to the Monotone Concave of Classical Utility

Unlike the convex-concave shape in Kahneman-Tversky, classical utility is monotone concave. This leads to plenty of absurdities, but the worst is the effect on the distribution of utility.

Granted, the first (K-T) deals with changes in wealth, while the second is a function of wealth.

Take the standard concave utility function $g(x) = 1 - e^{-a x}$. With a = 1:

[Plot of $g(x) = 1 - e^{-a x}$ for a = 1, over x from −2 to 3.]

The distribution of v(x), with x Gaussian of mean μ and standard deviation σ, will be

$$v(x) = -\frac{e^{-\frac{(\mu + \log(1 - x))^2}{2 \sigma^2}}}{\sqrt{2 \pi}\, \sigma\, (x - 1)}$$
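The density above can be verified by pushing Gaussian draws through g. The parameters μ = 0, σ = 1 are my own illustrative choices; the bin [−1, 0] is arbitrary.

```python
import math, random

random.seed(5)
mu, sigma = 0.0, 1.0

# x ~ Normal(mu, sigma); utility u = g(x) = 1 - exp(-x) (a = 1, as in the text).
us = [1.0 - math.exp(-random.gauss(mu, sigma)) for _ in range(200_000)]

# The reconstructed density of the utility (valid for u < 1).
def density(u):
    return math.exp(-(mu + math.log(1.0 - u)) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * (1.0 - u))

# Compare Monte Carlo mass on the bin [a, b] with the integral of the density.
a, b = -1.0, 0.0
empirical = sum(a < u <= b for u in us) / len(us)
n = 2000
h = (b - a) / n
analytic = sum(density(a + (i + 0.5) * h) for i in range(n)) * h

print(empirical, analytic)   # both near 0.256
```

The simulated and analytic masses agree; note the long left tail of the utility, against a hard cap at 1 on the right.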

[Plot of the density v(x) over x from −10 to 2.]

    With such a distribution of utility it would be absurd to do anything.

    10.4 The effect of convexity on the distribution of f(x)

    Note the following property.


    Distributions that are skewed have their mean dependent on the variance (when it exists), or on the scale. In other words, more uncertaintyraises the expectation.

Demonstration 1: TK

[Figure: Probability versus Outcome for a skewed distribution, under Low Uncertainty and High Uncertainty.]

Example: the Lognormal Distribution has a term $\frac{\sigma^2}{2}$ in its mean, linear to the variance.

Example: the Exponential Distribution $1 - e^{-x \lambda}$, $x \geq 0$, has its mean a concave function of its variance, namely $\frac{1}{\lambda}$, the square root of its variance.

Example: the Pareto Distribution $\alpha\, L^{\alpha}\, x^{-1-\alpha}$, $x \geq L$, $\alpha > 2$, has mean $\frac{\alpha L}{\alpha - 1}$ and Standard Deviation $\frac{L}{\alpha - 1}\sqrt{\frac{\alpha}{\alpha - 2}}$, so the mean equals $\sqrt{\alpha(\alpha - 2)}$ times the Standard Deviation.
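The mean-scale links in these examples can be checked by simulation. The parameter values (λ = 2, α = 3, σ of 0.5 versus 1) are my own illustrative choices.

```python
import math, random, statistics

random.seed(11)
N = 300_000

# Exponential(lambda = 2): the mean 1/lambda equals the square root of the variance.
lam = 2.0
xs = [random.expovariate(lam) for _ in range(N)]
print(statistics.mean(xs), math.sqrt(statistics.variance(xs)))   # both near 0.5

# Pareto(alpha = 3, L = 1): mean alpha*L/(alpha-1), tied to the scale L.
alpha = 3.0
ys = [random.paretovariate(alpha) for _ in range(N)]
print(statistics.mean(ys), alpha / (alpha - 1))   # both near 1.5

# Lognormal: the sigma^2/2 term in the mean -- raising sigma raises the expectation.
mu, s1, s2 = 0.0, 0.5, 1.0
m1 = statistics.mean(random.lognormvariate(mu, s1) for _ in range(N))
m2 = statistics.mean(random.lognormvariate(mu, s2) for _ in range(N))
print(m1, m2)   # near exp(0.125) ~ 1.13 and exp(0.5) ~ 1.65
```

In each case more uncertainty (a larger scale) raises the expectation, as the skewness argument above predicts.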

    10.5 The Mistake of Using Regular Estimation Methods When the Payoff is Convex


11. Fragility Is Nonlinear Response

    11.1 Fragility, As Linked to Nonlinearity


[Three plots of harm (vertical axis) versus mean deviation (horizontal axis) for the exposures tabulated below: losses reaching about −700,000 over mean deviations 1 to 7, −2.5 × 10⁶ over 2 to 14, and −10⁷ over 5 to 20.]

Mean Dev        10 L         5 L       2.5 L           L    Nonlinear
       1    -100,000     -50,000     -25,000     -10,000       -1,000
       2    -200,000    -100,000     -50,000     -20,000       -8,000
       5    -500,000    -250,000    -125,000     -50,000     -125,000
      10  -1,000,000    -500,000    -250,000    -100,000   -1,000,000
      15  -1,500,000    -750,000    -375,000    -150,000   -3,375,000
      20  -2,000,000  -1,000,000    -500,000    -200,000   -8,000,000
      25  -2,500,000  -1,250,000    -625,000    -250,000  -15,625,000
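The table can be regenerated from two harm functions inferred from its numbers (a reconstruction, not formulas stated in the text): linear harm −10,000 · k · x at scale multiples k of L, and nonlinear harm −1,000 · x³.

```python
# Linear exposures at scales 10L, 5L, 2.5L, L versus a cubic (nonlinear) exposure;
# the functional forms are inferred from the table entries above.
def linear(x, k):
    return -10_000 * k * x

def nonlinear(x):
    return -1_000 * x ** 3

print(f"{'Mean Dev':>8} {'10 L':>12} {'5 L':>12} {'2.5 L':>12} {'L':>12} {'Nonlinear':>12}")
for x in (1, 2, 5, 10, 15, 20, 25):
    row = [linear(x, k) for k in (10, 5, 2.5, 1)] + [nonlinear(x)]
    print(f"{x:>8} " + " ".join(f"{v:>12,.0f}" for v in row))
```

The point of the comparison: for small deviations the nonlinear exposure looks harmless, yet it overtakes even the 10 L linear exposure once deviations grow, which is the nonlinear-response signature of fragility.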
