Page 1

THE EARLY HISTORY OF BAYESIAN STATISTICS
TOM LEONARD

REFERENCES:

A personal history of Bayesian Statistics (2014)

Wiley Interdisciplinary Reviews: Computational Statistics, 6:80-115

with link to remaining chapters (from 1972) on my website

www.thomashoskynsleonard.co.uk

 

Refers to technical material in my book

Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers

(1999, with John S.J. Hsu) Cambridge University Press

 

See also my academic life story The Life of a Bayesian Boy.

Self-published on my website

Slides prepared by Thomas Tallis

Page 2

OCCAM’S RAZOR (William of Ockham, c1287-1347)

Among competing (plausible) hypotheses, the hypothesis with the fewest assumptions should be selected. (WILLIAM OF OCKHAM)

In other words: Keep things simple, and cut out extraneous information.

FOR EXAMPLE: Use parameter-parsimonious sampling models which depend on low numbers of unknown parameters (e.g. models which minimise AIC or DIC).

Contrasts with: ‘A model should be as big as an elephant’ (Leonard ‘Jimmie’ Savage, 1954; Lindley, 1983)

Agrees with: ‘The greater the amount of information the less you actually know’ (Toby Mitchell, c 1980)

Related to: E.T. Jaynes’ extremely valuable idea (1957 and 1968) of choosing the ‘maximum entropy’ prior distribution when only p summaries of the prior information are specified.
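As a sketch of Jaynes’ recipe (notation mine, not from the slides): if the p prior summaries are expressed as expectation constraints, the maximum entropy prior is the exponential-family density that satisfies them.

\[
\max_{\pi}\; -\!\int \pi(\xi)\log\pi(\xi)\,d\xi
\quad\text{subject to}\quad
\int g_j(\xi)\,\pi(\xi)\,d\xi=\mu_j,\; j=1,\dots,p,\qquad
\int\pi(\xi)\,d\xi=1,
\]
\[
\text{is solved by}\qquad
\pi(\xi)\;\propto\;\exp\Big\{\sum_{j=1}^{p}\lambda_j\,g_j(\xi)\Big\},
\]
with the Lagrange multipliers \(\lambda_1,\dots,\lambda_p\) chosen to meet the p constraints. A single constraint fixing the prior mean of a positive parameter, for example, gives an exponential prior.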

Page 3

Blaise Pascal (1623-1662) formulated ‘Pascal’s Wager’ by reference to the notion of subjective probability.

Pascal corresponded with Pierre de Fermat about the potential development of probability theory.

In 1654, Pascal and Fermat (1601 or 1607-1665) together solved the problem of ‘points’, or ‘division of stakes’.

In 1657, Christiaan Huygens discussed the Pascal-Fermat debate in De ratiociniis in ludo aleae.

[Portraits of Pascal and Fermat]

Page 4

Daniel Bernoulli

Daniel Bernoulli (1700-1782) Swiss physician and mathematician.

Formalised subjective view of probability, decision making and risk.

Introduced concept of EXPECTED UTILITY in 1738 in historic paper published in St Petersburg

Used the St PETERSBURG PARADOX to justify maximising expected utility.

(where the expected reward from the specified betting scheme is infinite, but most punters would only want to place a small bet on the outcome because of the high probability of a low return)
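A short worked version of the paradox (a standard textbook formulation, not quoted from the slides): a fair coin is tossed until the first head appears, and the payoff is 2^k ducats if the first head occurs on toss k. Then

\[
E(\text{payoff})=\sum_{k=1}^{\infty}2^{-k}\,2^{k}=\sum_{k=1}^{\infty}1=\infty,
\]
yet under Bernoulli’s logarithmic utility the expected utility is finite,
\[
E\{\log(\text{payoff})\}=\sum_{k=1}^{\infty}2^{-k}\,k\log 2=2\log 2=\log 4,
\]
the utility of a certain payoff of only 4 ducats, which is why a punter should only stake a small amount.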

Page 5

David Hume F.R.S.E (1711-1776)

Educated (from age 12) at the University of Edinburgh between 1723 and 1725.

Sceptical views about causality in his 1739-41 trilogy (A Treatise of Human Nature).

Questionable cause fallacy: the false assumption that correlation proves causality.

Subjective probability discussed in Ch. 6 of his 1748 book (An Enquiry Concerning Human Understanding).

Author of the is-ought problem, or Hume’s guillotine.

Significant difference between descriptive statements (about what is) and prescriptive statements (about what ought to be).

Not obvious how to get from descriptive statements to prescriptive ones.

Hume’s Law: you can’t derive an ‘ought’ from an ‘is’.

Page 6

‘A midget on the shoulders of giants like Hume and Huygens’ (Tom Leonard, 2014)

Studied for Presbyterian Ministry at University of Edinburgh between 1719 and about 1722.

Probably derived the continuous version of ‘Bayes’ Theorem’ during the 1740s while a wealthy, well-connected minister in Tunbridge Wells, with a serious demeanour and happy disposition.

The Notebook of Thomas Bayes (1747-1760) contains a section on probabilities.

In his tract In defence of Isaac Newton (1736, printed by John Noon), sold for a shilling, Bayes writes,

To suspect Isaac Newton of the mean design of seeking reputation among the ignorant by venting unintelligible notions, and defending them by artful cunning and cunning artistry, is what no man is capable of doing.

Rev. Thomas Bayes (1701-1763)

Page 7

Moral philosopher, inductive thinker, and political activist in support of American Revolution.

In 1763, Richard Price published Bayes’ paper ‘An Essay towards solving a Problem in the Doctrine of Chances’, posthumously, in the Philosophical Transactions of the Royal Society of London.

Bayes solved a complicated ‘Ball tossing problem’ involving n non-independent trials, with applications in life assurance. His mathematical solution was brilliant, but counterintuitive.

Rev. Richard Price F.R.S. (1723-1791)

Page 8

He posed this as a special case of an obscurely worded general problem: given the number of times (n) an unknown event has happened and failed, REQUIRED the chance that the probability (ξ) of its happening in a single trial lies somewhere between any two degrees of probability that can be named.

A further special case (n=50 independent Bernoulli trials, see Bayes’ Appendix): If you fail to win a lottery on n=50 occasions, with equal chance ξ of winning on each occasion, then what is the chance that your probability ξ of winning it on the 51st attempt lies between 0.001 and 0.01?
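A worked sketch under Bayes’ uniform prior assumption (the numbers are my own calculation, not figures quoted from Bayes’ Appendix): with a uniform prior on ξ and 50 independent failures, the posterior is Beta(1, 51), so

\[
\pi(\xi\mid 50\text{ failures})=51\,(1-\xi)^{50},\qquad 0<\xi<1,
\]
\[
P(0.001<\xi<0.01\mid 50\text{ failures})=(1-0.001)^{51}-(1-0.01)^{51}\approx 0.950-0.599=0.351.
\]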

Page 9

VERY SPECIAL CASE (n=1): If a mother’s first baby is a girl, then what is the chance that the probability ξ that her second baby is a boy lies between 0.5 and 1?

Note that probability(girl on first birth, given ξ) = 1-ξ.

Therefore the LIKELIHOOD FUNCTION of ξ is L(ξ, given girl on first birth) = 1-ξ for 0 < ξ < 1.

In general, the likelihood of the unknown parameters is the assumed sampling density or probability mass function of the observations, expressed as a function of the unknown parameters at the observations actually observed.

A young Bayesette

Page 10

 

Initiated the ‘Savageous’ philosophy of Bayesian Statistics

THE BAYESIAN PARADIGM

Posterior Information = Prior Information + Sampling Information. ($$$)

A Bayesian is somebody who tries to represent his prior information about ξ by a probability distribution on ξ.

BAYES THEOREM (Continuous case):

POSTERIOR DENSITY = K x PRIOR DENSITY x LIKELIHOOD

where K can be calculated by noting that the posterior density integrates to unity across the parameter space.

However, in his 1763 paper, Bayes assumed a uniform prior distribution on (0,1) for ξ, in which case

POSTERIOR DENSITY = K x LIKELIHOOD

LEONARD ‘JIMMIE’ SAVAGE (1917-1971)
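In symbols (a standard statement of the continuous case, consistent with the slide above):

\[
\pi(\xi\mid x)\;=\;K\,\pi(\xi)\,L(\xi\mid x),
\qquad
K^{-1}=\int \pi(\xi)\,L(\xi\mid x)\,d\xi,
\]
so that the posterior density integrates to unity. Under Bayes’ uniform prior on (0,1), π(ξ) = 1 and the posterior is simply the normalised likelihood.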

Page 11

In the preceding very special case,

Posterior density of ξ, given girl on first birth, = 2(1-ξ) for 0 < ξ < 1 (*)

Posterior mean of ξ = predictive probability that next baby is a boy = 1/3, and P(0.5 < ξ < 1, given girl on first birth) = 1/4

If the first n babies are girls, then the predictive probability that the next baby is a boy is 1/(n+2).
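The arithmetic behind these figures (my own derivation from the uniform prior and the likelihood 1-ξ):

\[
\pi(\xi\mid\text{girl})=\frac{1-\xi}{\int_0^1(1-u)\,du}=2(1-\xi),
\qquad
E(\xi\mid\text{girl})=\int_0^1 2\xi(1-\xi)\,d\xi=\tfrac13,
\]
\[
P(0.5<\xi<1\mid\text{girl})=\int_{0.5}^{1}2(1-\xi)\,d\xi=\tfrac14.
\]
After n girls the posterior is Beta(1, n+1), whose mean 1/(n+2) is the predictive probability that the next baby is a boy (Laplace’s rule of succession with zero successes observed).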

[Figure: plot of the posterior density of ξ]

Page 12

Le Marquis Pierre-Simon de Laplace (1749-1827)

French Astronomer, Mathematician, and Politician. Minister in Napoleon’s Government.

FOUNDING FATHER OF BAYESIAN STATISTICS AND DATA ANALYSIS

In 1774, his Memoir on the Probability of the Causes of Events included a Bayesian analysis of the causes of events.

In 1812, his Analytic Theory of Probabilities contained a number of detailed statistical analyses.

He introduced a general version of Bayes’ theorem that includes the discrete and multiparameter cases. Applied it to ANALYZE DATA in celestial mechanics, MEDICAL STATISTICS, reliability and jurisprudence.

Developed LAPLACE’S APPROXIMATION to multi-dimensional integrals, and LAPLACE TRANSFORMS (moment generating functions).
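For reference, a standard statement of the approximation (notation mine): if h(θ) is smooth with an interior maximum at \(\hat\theta\) in d dimensions, then

\[
\int e^{\,n\,h(\theta)}\,d\theta
\;\approx\;
e^{\,n\,h(\hat\theta)}\,
\Big(\frac{2\pi}{n}\Big)^{d/2}
\big|-h''(\hat\theta)\big|^{-1/2},
\]
where \(h''(\hat\theta)\) is the Hessian at the maximum. Applied to the integral of prior times likelihood, it gives a normal approximation to the posterior centred at the posterior mode.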

Page 13

Scottish moral philosopher and leading political economist. The Wealth of Nations, 1776.

Rejected the idea that demand must be related to utility, i.e. the more useful a thing is, and the more satisfaction it gives, the more people would be willing to pay for it.

THE PARADOX OF DIAMONDS AND WATER

Water is necessary for life, and yet very cheap. Diamonds have little utility, and are yet very costly.

Smith thereby concluded that willingness to pay is not related to utility.

Adam Smith proposed using interval bounds for probabilities, rather than precisely specified subjective probabilities

Adam Smith (1723-1790)

Page 14

Jeremy Bentham (1748-1832)

British philosopher, jurist and social reformer.

Regarded by some as the father of modern utilitarianism, and by others, in the context of banking, insurance, and speculation, as the founder of the subjectivist, Bayesian approach to decision making. (Bentham’s approach to subjective probability is an earlier version of the exact, linear approach recommended as being rational by Tversky and Kahneman.)

Introduction to Principles of Morals and Legislation, 1780

GREATEST HAPPINESS PRINCIPLE: It is the greatest happiness of the greatest number which is the principle of right or wrong.

Classification of 12 pains and 14 pleasures by which we may test the happiness factor of any action.

Formalised set of criteria for measuring the extent of pain or pleasure that any decision will create.

Reviewed concept of punishment, and whether a particular punishment will create more pain or pleasure for society.

Bentham applied similar ideas to monetary economics.

Page 15

Augustus De Morgan (1806-71)

Anglo-Indian mathematician, statistician and spiritualist.

Appointed to the Chair of Mathematics at the University of London (later UCL) in 1828.

See his Essay on Probabilities (1838).

De Morgan further developed Bayes’s and Laplace’s approach to INVERSE PROBABILITY...

Posterior probabilities when the prior distribution is uniform.

Somewhat arbitrary: e.g. a uniform prior for a non-linear transformation of the parameter will give a different posterior. Uniform priors on a continuous unbounded parameter space are improper, but can, though not always, yield meaningful proper posteriors.

De Morgan sought to justify uniform prior by Laplace’s Principle of Insufficient Reason
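A minimal illustration of the arbitrariness mentioned above (my own example): a uniform prior on ξ over (0,1) does not stay uniform under a non-linear transformation. If φ = ξ², the change-of-variables formula gives

\[
p(\varphi)=p(\xi)\,\Big|\frac{d\xi}{d\varphi}\Big|=\frac{1}{2\sqrt{\varphi}},\qquad 0<\varphi<1,
\]
so ‘ignorance’ about ξ implies a decidedly non-uniform prior for φ, and a uniform prior placed directly on φ would lead to a different posterior from the same data.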

Page 16

For the remainder of the 19th century:

(A) Many statistical scientists (e.g. Gauss, Edgeworth, Galton) thought Bayesian.

(B) Inverse probabilities remained the main methodology for statistical inference. Fisher dabbled with them in the early 20th century and discarded them because of the arbitrariness in the choice of uniform prior.

(C) Emphasis seemed to shift somewhat to numerical and graphical summaries of data,

e.g. London Cholera epidemic map (1832) and Crimean War (Florence Nightingale, e.g. pie charts)

 Florence Nightingale (1820-1910) Nurse and statistician

Page 17

Sir Francis Galton (1822-1911)

English geneticist, statistician and polymath, a truly great man of science

In 1877 built machine called GALTON QUINCUNX

Used simulations while attempting to calculate posterior distribution

Galton encouraged use of Bayes Theorem

Informative conjugate analysis for normal distribution developed around that time.
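A sketch of the conjugate normal analysis referred to here (standard notation, not specific to Galton): for observations x_1, ..., x_n ~ N(θ, σ²) with σ² known and prior θ ~ N(μ₀, τ₀²),

\[
\theta\mid x_1,\dots,x_n \;\sim\; N(\mu_1,\tau_1^2),
\qquad
\mu_1=\frac{\mu_0/\tau_0^2+n\bar{x}/\sigma^2}{1/\tau_0^2+n/\sigma^2},
\qquad
\frac{1}{\tau_1^2}=\frac{1}{\tau_0^2}+\frac{n}{\sigma^2},
\]
so the posterior mean is a precision-weighted compromise between the prior mean and the sample mean.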

Page 18

Charles Sanders Peirce (1839-1914)

American philosopher, logician, mathematician and scientist. ‘The father of pragmatism’.

Emphasised that objective statistical conclusions can only be hoped for if the data result from a randomised experiment.

Was the first scientist to elicit subjective probabilities in experimental psychology.

Page 19

French Military Officer

1894 TRIAL OF THE MILLENNIUM: Dreyfus tried for treason.

Bizarrely justified subjective ‘probability’ of forgery. Falsely convicted of transmitting military secrets to Germany.

Probability related to possible coincidences concerning frequencies of symbols in the code.

‘SIMILAR PROBLEMS OCCUR TODAY WHENEVER STATISTICAL EVIDENCE AND SUBJECTIVE PROBABILITIES ARE INTRODUCED INTO EVIDENCE’

David H. Kaye, Minnesota Law Review (2007)

O.J. Simpson murder case, the Adams rape case, the Sally Clark cot death case

See also D.H. Kaye (2010), DNA Identification and the Threat to Civil Liberties, Yale University Press

Alfred Dreyfus (9 October 1859 – 12 July 1935)

Page 20

Frank Ramsey (1903-1930)

British mathematician, philosopher and economist.

1926 papers on subjective probability and utility were encouraged by the economist John Maynard Keynes.

His work on subjective probability and its elicitation satisfied Charles Peirce’s empirical test.

Used by experimental psychologists and recognised in 1944 by Von Neumann and Morgenstern, in their book The Theory of Games and Economic Behaviour

Famously used utility theory to judge ‘how much of its income a nation should save’

Close friend of philosopher Ludwig Wittgenstein whose works he translated

Never stay up on the barren heights of cleverness, but come down into the green valleys of silliness

Page 21

Highly eccentric English statistician, evolutionary biologist, geneticist and eugenicist.

One of the chief architects of the neo-Darwinian synthesis. Galton Professor of Eugenics at UCL (1933-43).

Argued with Karl Pearson, e.g. about who should teach which course.

Dabbled with Bayesian inference and inverse probability, then argued vehemently against it because of its dependence on the prior, e.g. the choice of a ‘vague’, so-called ignorance prior.

Introduced FIDUCIAL INFERENCE in a paper in the Annals of Eugenics (1935).

Disputed by Neyman, and shown by Lindley in 1958 to violate Kolmogorov’s addition laws of probability.

Sir Ronald Fisher (1890-1962)

Page 22

John Maynard Keynes (1883-1946)

Baron Keynes of Tilton. Cambridge Economist.

Employed expected utility in 1936 in Chapter 12 of The General Theory of Employment, Interest and Money.

Keynesian Economics has fundamentally affected the theory and practice of modern macroeconomics, and influenced the policies of governments until about 1979, when the ideas of Milton Friedman, who also used expected utility, took over.

Page 23

Sir Harold Jeffreys F.R.S. (1891-1989)

Cambridge-based Mathematician, Statistician, Geologist and Astronomer.

Theory of Probability (1939)

A precursor of the Anglo-American Bayesian Revival of the 1960s, led by Rudolf Kalman, Raiffa and Schlaifer, Mosteller and Wallace, Box and Tiao, John Aitchison F.R.S.E. and Dennis Lindley.

INCLUDED:

Invariance priors: vague priors which refer to the determinant of Fisher’s information and yield posterior distributions which are invariant under non-linear transformations of the parameters.

Approximate Bayes intervals (also approximate confidence intervals) centred on the maximum likelihood estimate, which also refer to the likelihood dispersion.
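The invariance prior referred to above is usually written (my notation) as

\[
\pi(\theta)\;\propto\;|\,I(\theta)\,|^{1/2},
\qquad
I(\theta)=-E\Big[\frac{\partial^2\log f(x\mid\theta)}{\partial\theta\,\partial\theta^{T}}\Big],
\]
the square root of the determinant of Fisher’s information. Under a smooth reparameterisation the Jacobian factors match, so the implied posterior is invariant under non-linear transformations of the parameters. For a binomial proportion this gives the Beta(1/2, 1/2) prior.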

Page 24

Andrey Kolmogorov (1903-1987)

Pre-eminent Russian Mathematician and Probabilist.

Introduced the concept of Bayesian sufficiency in his 1942 paper on the statistical estimation of the law of Gauss, in the URSS Bulletin of the Academy of Sciences.

Kolmogorov’s Extension Theorem constrains us to defining our probability distributions only on measurable subsets of the parameter space or sample space (i.e. those which are elements of an appropriate sigma-field, such as a Borel field).

Page 25

Alan Turing (1912-1954)  Irving Jack Good (1916-2009)

Alan Turing: Gay icon and martyr, father of machine intelligence, modern computer science and artificial intelligence. Also the father of modern Bayesian applied statistics.

Jack Good: cryptanalyst, mathematician, statistician and philosopher.

While solving the Nazi codes at Bletchley Park, Turing and Good used various pioneering, effectively Bayesian procedures, including:

• Empirical alternatives to Bayes factors as measures of evidence
• Effectively Bayesian sequential analysis and decision-tree analysis
• Shrinkage estimators for multinomial cell probabilities, which smooth the relative frequencies of the letters in the German code towards a common value (see the sketch below).
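A minimal sketch, in Python, of the kind of shrinkage estimator described in the last bullet (the flattening constant, the alphabet and the ciphertext fragment are illustrative assumptions, not Bletchley Park data):

import string
from collections import Counter

def shrink_frequencies(text, alphabet=string.ascii_uppercase, flatten=0.5):
    """Add-constant shrinkage estimates of multinomial cell probabilities.

    Smooths the raw relative frequencies of the symbols in `text`
    towards the common value 1/k, where k is the alphabet size:
        p_hat_i = (count_i + flatten) / (n + k * flatten).
    A larger `flatten` pulls harder towards uniformity; flatten = 0
    returns the raw relative frequencies.
    """
    counts = Counter(ch for ch in text.upper() if ch in alphabet)
    n = sum(counts.values())
    k = len(alphabet)
    return {ch: (counts[ch] + flatten) / (n + k * flatten) for ch in alphabet}

# Illustrative use on a short, made-up fragment of ciphertext
estimates = shrink_frequencies("WETTERBERICHTNULLSECHS", flatten=0.5)
print({ch: round(p, 3) for ch, p in estimates.items() if p > 0.02})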

Page 26

If we deduce that knowledge comes from irrationality and out of rationality comes rationality then we must also deduce that most of our conventional knowledge derives from the senses and that every rational saying is a pragmatic lie (Adam Logan, Farewell Halcyon Days, 2013)

Thomas Tallis (1988-NotDeadYet)  Adam Empirius Logan

"If Bayesians live to be a hundred they think they've got it made. Very few people die past that age."