A Subjective Foundation of Objective Probability · IntroductionStructureSubjective ergodic...

Post on 29-Sep-2020

13 views 0 download

Transcript of A Subjective Foundation of Objective Probability · IntroductionStructureSubjective ergodic...

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

A Subjective Foundation of ObjectiveProbability

Luciano De Castro, UIUC&

Nabil I. Al-Najjar, MEDS

RUD 2009

To obtain a copy of this paper and other related work, visit:http://www.kellogg.northwestern.edu/faculty/alnajjar/htm/index.htm

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Similarity as basis for probability judgements Skip slide

In his classic 1937 article, de Finetti wondered:“How should an insurance company evaluate the probabilitythat an individual dies in a given year?”

De Finetti’s view was:First choose a class of “similar” events

use the frequency as base-line estimate of the probability

e.g., : “death in a given year of an individual of the sameage [...] and living in the same country.”

The choice of a class of “similar” events is subjective

..maybe “not individuals of the same age and country, butthose of the same profession and town, . . . etc, where onecan find a sense of ‘similarity’ that is also plausible.”

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Three fundamental ideas Skip slide

1 Similarity is not optional:probability judgements are founded on similarity

2 Similarity is exchangeabilityexchangeability formalizes the notion of similarity

3 Similarity is subjectiveSimilarity is the decision maker’s model or theory of theworld.

“we cannot repeat an experiment and look for a coveringtheory; we must have at least a partial theory before weknow whether we have a repetition of the experiment.”

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Alternative title:

Similarity First!

Luciano De Castro, UIUC&

Nabil I. Al-Najjar, MEDS

RUD 2009

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

De Finetti’s ideas in action Skip slide

Sequence of experiments, each with outcome S

S is Polish (complete, separable metric space)

σ-algebra S

State space:Ω = S × S × · · ·

Generic element ω = (s1, s2, . . .)

Borel σ-algebra Σ on Ω

Decision problem is static, but an inter-temporalinterpretation may be suggestive.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Acts and permutations Skip slide

An act is any bounded, measurable function

f : Ω→ R

Interpret values of acts as utils

F set of all acts

Π is the set of finite permutations π

Given an act f and a permutation π, define f π by

f π(s1, s2, ...) = f (sπ(1), sπ(2), ...)

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Stylized example: coin tosses Skip slide

S = H,T; generic ω = H,T ,T ,H, . . .

You might be interested in betting on the i th coin turningHeads

Typical assumption:

“the experiment is repeated under identical conditions”

“Absurd!” says de Finetti !!

If the experiments were really identical, then the coinshould either always turns up H or always T

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Exchangeability as model of similarity Skip slide

The fact that the outcome varies means that there areimportant factors that I do not understand or care to model

But I judge the experiments as “similar”

..means: these factors affect experiments symmetrically

De Finetti’s exchangeability: for all f ∈ F , π ∈ Π

f ∼ f π.

In the stylized coin example, H is equally likely in allexperiments

Heads in toss i ∼ Heads in toss j

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

De Finetti’s Theorem Skip slide

De Finetti’s theorem: a preferencethat is

1 exchangeable and

2 subjective expected utility

.. has the representation

P(A) =

∫θ∈Θ

Pθ(A) dµ.

1 Parameters: i.i.d. distributions Pθ indexed by θ ∈ Θ ≡ ∆(S)

2 Bayesian belief µ over the parameters Θ

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

De Finetti’s Theorem and its discontents Skip slide

Contentment: Parameters formally capture the extent to whichexperiments are similar

experiments are different enough to have differentoutcomes, but similar enough to share a commondistribution Pθ

The discontents: de Finetti’s theorem confounds similaritywith the decision criterion. It simultaneously identifies

the parameters Θ

and that the decision maker has a prior about them:

P(A) =

∫θ∈Θ

Pθ(A) d µ.

..but how about classical statisticians, ambiguity, etc?Similarity is an overarching principle for all decision making andinference; Bayesianism is not.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

1 Introduction

2 Structure

3 Subjective ergodic theory

4 Sufficient statistics

5 Ergodicity

6 Ambiguity

7 Conclusions

8 Appendix

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Preorder and Monotonicity Skip slide

Assumption 1: < is reflexive and transitive.(may be incomplete)

Assumption 2: If f (ω) ≥ g(ω) for all ω ∈ Ω, then f < g.

Assumption 3: For every x , y ∈ R,

x > y =⇒ x y .

Old Assumption: For any α ∈ R, f < g implies

f + α < g + α.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Null sets Skip slide

An event E is <-null if any pair of acts that differ only on E areindifferent.

Assumption 4:1E ∼ 0 =⇒ E is null;

α1E < β1E for some β > α =⇒ E is null.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Continuity Skip slide

We require continuity with respect to:

f n → f ⇐⇒ω ∈ Ω : lim

nf n(ω) 6= f (ω)

< is-null

Assumption 5:f n → f and gn → g;

f n < gn for all n

|f n(ω)| ≤ b(ω) and |gn(ω)| ≤ b(ω), for all ω and someb ∈ F .

Then f < g.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Exchangeability Skip slide

De Finetti’s condition:f ∼ f π

has little force without expected utility. In our more generalsetting, we need a stronger condition.

DefinitionA preference < is exchangeable if for every act f ∈ F andpermutations π1, . . . , πn

f ∼ f π1 + · · ·+ f πn

n.

Stylized coin example:

Heads in toss i ∼ average of Heads in tosses 1, . . . ,n

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

To simplify the exposition, for the remainder of the paper:

restrict to the set of acts F1 thatdepend only on the firstcoordinate

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Shifting states Skip slide

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Shifting acts Skip slide

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Ergodic Theorem

If < is exchangeable then for every act f , the act

f ?(ω) ≡ limn→∞

1n

n−1∑j=0

f(

T jω)

is well-defined, except on a <-null event, and

f ? ∼ f .

Note that f ?, when it exists, is objective. Its existence isconsequence of subjective assessment of similarity

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Ergodic Decomposition Skip slide

Eθ(ω) ≡ (f ?)−1(ω) ≡ all states ω′ such that f ?(ω′) = f ?(ω)

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Ergodic Decomposition Skip slide

Starting from ω, T defines an orbit that remains in Eθ(ω)

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Ergodic Decomposition

There is a parametrization(

Eθ,Pθ)

θ∈Θsuch that:

the Eθ ’s partition Ω

Pθ is the unique exchangeable distribution supported byEθ:

Pθ(Eθ) = 1

For every exchangeable preference < and parameter θFor Pθ-almost all ω ∈ Eθ:

f ?(ω) =

∫Ω

f dPθ(ω).

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Ergodic Decomposition Skip slide

Starting with ω, we obtain same answer if we:compute the ‘objective’ frequentist limit: f ?(ω)

calculate the expected utility:∫

Ω f dPθ(ω)

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Quick note on ‘learning’ Skip slide

Infinite data =⇒ ω is observed =⇒ θ can be learned

But the model does not admit a mechanism to change theexchangeability relationship(

Eθ,Pθ)

is the decision maker’s window to the world!His way to organize information.

The decision maker interprets the sequence

H,T ,H,T ,H, ......

as confirming that θ = 0.5

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Sufficient statistics: Goal and summary Skip slide

We will show that the i.i.d. parametrization Θ is sufficient, in thesense of sufficient statistics, for the class of exchangeablepreferences.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Sufficient statistics Skip slide

In statistics, a parameter is sufficient for a family ofdistributions if it contains all “relevant information” aboutthat family

Formally, a σ-algebra Z is sufficient for a family ofdistributions P if

P(· | Z)

does not depend on P ∈ P.

Here we have preferences, not distributions, so this cannotmake sense for us

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Sufficient Statistics Skip slide

Define FΘ ⊂ F to be the set of acts measurable withrespect to Eθθ∈Θ.

Define the mapping

Φ : F → FΘ by Φ(f )(ω) =

∫Ω

f dPθ(ω).

We allow ambiguity about parameters, of course.

However, a parameter means that, once its value is know,acts are evaluated using expected utility.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Sufficient Statistic Theorem

The i.i.d. parametrization Θ is sufficient for the set ofexchangeable preferences:

For every f and g

f < g ⇐⇒ Φ(f ) < Φ(g).

The parameter-based act associated with f :

F (θ) ≡ Φ(f )(θ) ≡∫

Ωf dPθ.

< induces a preference <<< on parameter-based acts:

f < g ⇐⇒ F <<< G.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

The Classic de Finetti’s TheoremAn expected utility preference < is exchangeable if and only ifthere is a belief µ on Θ such that:

F <<< G ⇐⇒∫

ΘF dµ ≥

∫Θ

G dµ.

This confounds:

similarity-based, statistical constructs:

Pθ ′s

Bayesian criterion to resolve uncertainty about parameters

µ

with no statistical or similarity interpretation

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Inter-subjective consensus Skip slide

CorollaryLet < be any exchangeable preference.

If all exchangeable Bayesians prefer f to g, then so would <.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Ergodicity: Goals Skip slide

We introduce learning-style conditions that imply subjectiveexpected utility.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Ergodicity Skip slide

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Probabilities as Frequencies Skip slide

E is invariant if T (E) = E ;

< is ergodic if

1 it is exchangeable, and

2 E is invariant =⇒ either E or Ec is <-null.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Empirical distributions Skip slide

The empirical distribution at ω is the set-function whichassigns to each event A the value

ν(A, ω) ≡ 1?A(ω),

if this value exists, and is not defined otherwise.

this is just the empirical frequency of A along ω.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Subjective Probability from Frequencies

If < is ergodic, then:

There is an event Ω′ with <-null complement, such thatthe empirical distribution ν(·) = ν(·, ω) is a well-definedprobability distribution on S that is constant in ω ∈ Ω′; and

< restricted to F1 is an expected utility preference withsubjective probability ν.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Different derivations Skip slide

Savage’s theory:

normative axioms: P1, P2, P4, P6+

exchangeability

exchangeable subjective probability P

Our result:

Exchangeability & Ergodicity

Expected utility: subjective belief = empirical measure ν

thus, P1, P2, P4, P6 !!

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Statistical ambiguity: Goal Skip slide

We define and characterize unambiguous events purely basedon learning foundations.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Statistical ambiguity: Summary Skip slide

Assume our weak assumptions

Exchangeability(but no substantive ambiguity-flavor assumptions: eg incompleteness, ambiguity aversion..)

(a) Subjective set of priors driven by learning conditions

(b) expected utility on statistically unambiguous events

(c) no implications about the attitude towards ambiguity(no pessimism, fear of malevolent nature..)

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Statistical ambiguity Skip slide

Assume:

finite outcome space Sfocus on the set of acts F1 that depend only on the firstcoordinate.

Definition

Given a family of subsets C ⊂ 2S, a partial probability ν on C isa function ν : C → [0,1] such that there is a probabilitydistribution on S that agrees with ν on C.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Statistical ambiguity Skip slide

Definition

An event A ⊂ S is <-statistically unambiguous if there isΩ′ ∈ Σ with <-null complement, such that the frequency of A,ν(A) = ν(A, ω), is constant in ω ∈ Ω′.

The crucial part of the definition is the requirement that ν(A, ω)is independent of ω off a <-null set.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Characterization Skip slide

Theorem (Statistical Ambiguity)

Assume S is finite. For any exchangeable preference <

The set of <-statistically unambiguous events C ⊂ 2S is aλ-system, i.e., a family of sets closed under complementsand disjoint unions;

The empirical measure ν(·) is a partial probability on C; and

For every C-measurable acts f ,g ∈ F1:

f < g ⇐⇒∫

f dν ≥∫

g dν.

C is only a λ-system.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Concluding remarks Skip slide

De Finetti understood the distinction between

1 Defining subjective probability, and2 Formulating probability judgements

Current conception of subjective probability emphasizesthe first, and largely ignores the second

Exchangeability, as formal model of similarity, is the part ofthe preference on which probability judgements are based

This paper does what de Finetti’s theorem cannot do:

separate similarity judgements from the decision criterion

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Moving forward Skip slide

An exchangeability relationship is the decision maker’stheory, of what constitutes similar experiments

..but so far we have nothing to say about theory choice,theory change...

Given our results, we can potentially talk about

Normative criteria that guide theory or model choiceModel uncertaintyIntegrating classical and Bayesian methods

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

The End!!

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Skip slide

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Epstein-Seo, Part 1 Skip slide

Epstein-Seo. Part 1, strengthen:

f ∼ f π

to:αf + (1− α)f π ∼ f (∗)

Under our general assumptions, our condition

f ∼ f π1 + · · ·+ f πn

n.

neither implies nor is it implied by (*)

More important is the difference between our results

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Epstein-Seo, Part 1 Skip slide

Assume our weak assumptions plus exchangeability

Epstein-Seo, Part 1:

Gilboa-Schmeidler axioms (completeness, C-indep., ambiguityaversion)⇓

(1) Gilboa-Schmeidler set of priors is exchangeable;(b) pessimistic MEU criterion given this set

Our approach:

Various learning conditions⇓

(a) Subjective set of priors driven by learning conditions;(b) no implications about the attitude towards statistical

ambiguity

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Epstein-Seo, Part 2 Skip slide

Epstein-Seo second model maintains:

f ∼ f πbut allow:

αf + (1− α)f π f (∗)

Now, parameters are sets of measures; the representation is aprobability distribution over sets of measures

Examples of parameters:the set of all Dirac measures on Ω is a possible parameter

so is the set of all independent distributions

the set of all independent distributions with marginalsbelonging to .2, .6or to .43, .91, . . . etc.

Introduction Structure Subjective ergodic theory Sufficient statistics Ergodicity Ambiguity Conclusions Appendix

Epstein-Seo, Part 2 Skip slide

Important contribution because it shows the sort ofanomalies that might arise when exchangeability isweakened.

A parametrization where individual parameters are sets,including the set of all sample paths, seems far removedfrom the intuitive idea of parameters as useful devices tosummarize information.

It is difficult to imagine how statistical inference canproceed on this basis.