Is MDL Really So Bad?

or

On the Convergence Speed of MDL Predictions for Bernoulli Sequences

Jan Poland and Marcus Hutter

IDSIA, Lugano, Switzerland

Big Picture

[Diagram: MDL, Bayes, and other methods, e.g. PAC-Bayes]

Bernoulli Classes

• Set of parameters $\Theta = \{\theta_1, \theta_2, \ldots\} \subset [0,1]$
• Weights $w_\theta$ for each $\theta \in \Theta$
• Weights correspond to codes: $w_\theta = 2^{-\mathrm{length}(\mathrm{Code}_\theta)}$

θ      w_θ     Code
0      1/4     00
1/8    1/64    111000
1/4    1/16    1100
3/8    1/64    111001
1/2    1/4     10
5/8    1/64    111010
3/4    1/16    1101
7/8    1/64    111011
1      1/4     01

Code = 111 | 0 | 10, read as: 1 + #bits ones, then a stop bit 0, then the data bits
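
The code scheme on this slide can be sketched in a few lines. This is a minimal illustration, assuming the pattern visible in the table (a run of ones, a stop bit, then the data bits, with 0 and 1 as special cases) continues to finer dyadic fractions; the function names `code_for` and `weight_for` are hypothetical.

```python
from fractions import Fraction


def code_for(theta: Fraction) -> str:
    """Code word for a dyadic theta in [0, 1], following the slide's table."""
    if theta == 0:
        return "00"
    if theta == 1:
        return "01"
    den = theta.denominator
    if den & (den - 1) != 0 or not 0 < theta < 1:
        raise ValueError("theta must be a dyadic fraction in [0, 1]")
    bits = den.bit_length() - 2         # data bits: 0 for 1/2, 1 for k/4, 2 for k/8
    index = (theta.numerator - 1) // 2  # position among the odd numerators
    data = format(index, "b").zfill(bits) if bits else ""
    return "1" * (bits + 1) + "0" + data  # ones, stop bit, data


def weight_for(theta: Fraction) -> Fraction:
    """Weight w_theta = 2^(-length of the code word)."""
    return Fraction(1, 2 ** len(code_for(theta)))


if __name__ == "__main__":
    for k in range(9):
        theta = Fraction(k, 8)
        print(theta, weight_for(theta), code_for(theta))
```

Because each level of dyadic fractions gets half the weight of the level above it, the weights of the whole class sum to 1.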

Estimators

• Given observed sequence $x = x_1 x_2 \ldots x_n$
• Probability of $x$ given $\theta$: $p_\theta(x) = \theta^{\#\mathrm{ones}(x)} (1-\theta)^{n - \#\mathrm{ones}(x)}$
• Posterior weights: $w_\theta(x) = \frac{w_\theta \, p_\theta(x)}{\sum_{\theta'} w_{\theta'} \, p_{\theta'}(x)}$
• Bayes mixture: $\xi(x) = \sum_\theta w_\theta(x) \, \theta$
• MDL/MAP: $\theta^*(x) = \arg\max_\theta w_\theta(x)$
• Maximum Likelihood (ML): same as MAP, but with prior weights set to 1
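
The estimators on this slide can be written out directly. The following is a minimal sketch, assuming a small finite class; the function names and the class `EXAMPLE_WEIGHTS` are illustrative, and exact rational arithmetic via `fractions.Fraction` is a convenience choice rather than part of the talk.

```python
from fractions import Fraction

# Hypothetical example class: parameter -> prior weight (weights need not sum to 1).
EXAMPLE_WEIGHTS = {
    Fraction(0): Fraction(1, 4),
    Fraction(1, 4): Fraction(1, 16),
    Fraction(1, 2): Fraction(1, 4),
    Fraction(3, 4): Fraction(1, 16),
    Fraction(1): Fraction(1, 4),
}


def likelihood(theta, x):
    """p_theta(x) = theta^#ones * (1 - theta)^(n - #ones) for a 0/1 string x."""
    ones = x.count("1")
    return theta ** ones * (1 - theta) ** (len(x) - ones)


def posterior(weights, x):
    """Posterior weights w_theta(x), normalised over the class."""
    joint = {t: w * likelihood(t, x) for t, w in weights.items()}
    total = sum(joint.values())
    return {t: j / total for t, j in joint.items()}


def bayes_mixture(weights, x):
    """xi(x): posterior-weighted average of the parameters."""
    return sum(w * t for t, w in posterior(weights, x).items())


def map_estimate(weights, x):
    """theta*(x): parameter with the largest posterior weight (first one on ties)."""
    post = posterior(weights, x)
    return max(post, key=post.get)


def ml_estimate(weights, x):
    """Same as MAP but ignoring the prior weights (first one on ties)."""
    return max(weights, key=lambda t: likelihood(t, x))


if __name__ == "__main__":
    for x in ("", "0", "01", "010"):
        print(repr(x), bayes_mixture(EXAMPLE_WEIGHTS, x),
              ml_estimate(EXAMPLE_WEIGHTS, x), map_estimate(EXAMPLE_WEIGHTS, x))
```

The sketch assumes at least one parameter in the class gives the observed sequence positive probability; otherwise the normalisation in `posterior` divides by zero.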

An Example Process

Sequence x     Bayes mixture   ML estimate   MAP (MDL)
(no data)      0.5             0             0
0              0.21            0             0
1              0.5             0.5           0.5
0              0.45            0.34          0.5
0000011        0.4             5/16          0.5
...(32)...     0.27            0.25          0.25
...(640)...    0.3             5/16          5/16

True parameter $\theta_0 = \frac{5}{16} = 0.3125$

What We Know

• Let $\theta_0 \in \Theta$ be the true parameter with weight $w_0$
• $\xi$ converges to $\theta_0$ almost surely and fast, precisely $\sum_{t=0}^{\infty} E(\xi - \theta_0)^2 \le \ln(w_0^{-1})$
• $\theta^*$ converges to $\theta_0$ almost surely and in general slowly, precisely $\sum_{t=0}^{\infty} E(\theta^* - \theta_0)^2 \le O(w_0^{-1})$
• Even true for arbitrary non-i.i.d. (semi-)measures!
• The ML estimates converge to $\theta_0$ almost surely; no such assertion about convergence speed is possible

Is MDL Really So Bad?

• Bayes mixture bound is $\text{description length}(\theta_0)$
• MDL bound is $\exp(\text{description length}(\theta_0))$
• $\Rightarrow$ MDL is exponentially worse in general
• This is also a loss bound!
• How about simple classes?
• Deterministic classes: can show bound $\text{huge constant} \times (\text{description length}(\theta_0))^3$
• Simple stochastic classes, e.g. Bernoulli?
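
To make the gap between the two bounds concrete, the sketch below evaluates both for a true parameter whose code has a given length in bits, so that $w_0 = 2^{-\text{bits}}$. The function names are hypothetical, and the constant hidden in the $O(\cdot)$ of the MDL bound is ignored.

```python
import math


def bayes_bound(code_length_bits: int) -> float:
    """ln(1/w_0) for w_0 = 2^(-code_length_bits): linear in the description length."""
    return code_length_bits * math.log(2)


def mdl_bound(code_length_bits: int) -> float:
    """1/w_0 = 2^code_length_bits: exponential in the description length."""
    return 2.0 ** code_length_bits


if __name__ == "__main__":
    for bits in (2, 6, 10, 20):
        print(bits, round(bayes_bound(bits), 2), mdl_bound(bits))
```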

MDL Is Really So Bad!

$\sum_t E(\theta^* - \theta_0)^2 = O(w_0^{-1})$ in the following example:

N parameters, $w_\theta = \frac{1}{N}$ for all $\theta$, $\theta_0 = \frac{1}{2}$

[Figure: parameters at $\frac12$, ..., $\frac12 + \frac{1}{16}$, $\frac12 + \frac18$, $\frac12 + \frac14$, with braces marking the intervals between neighbouring parameters]

$\sum_t E(\theta^* - \theta_0)^2 \, \mathbb{1}_{\theta^* \in [\frac12 + \frac18, \frac12 + \frac14]} = O(1)$

$\sum_t E(\theta^* - \theta_0)^2 \, \mathbb{1}_{\theta^* \in [\frac12 + \frac{1}{16}, \frac12 + \frac18]} = O(1)$
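
The example can be explored numerically. The following is a rough Monte Carlo sketch, not the authors' experiment: it assumes the class consists of $\theta_0 = \frac12$ together with $\frac12 + 2^{-k}$ for $k = 2, \ldots, N$, and it uses the fact that MAP coincides with ML under uniform weights. The function names, step counts and seed are arbitrary choices.

```python
import math
import random


def cumulative_map_error(num_params: int, steps: int, rng: random.Random) -> float:
    """Sum over t of (theta* - theta_0)^2 along one sampled sequence."""
    theta0 = 0.5
    # theta_0 first, then 1/2 + 1/4, 1/2 + 1/8, ... (num_params parameters in total).
    thetas = [theta0] + [0.5 + 2.0 ** -k for k in range(2, num_params + 1)]
    log_lik = [0.0] * len(thetas)
    total = 0.0
    for _ in range(steps):
        # Uniform weights: the MAP estimate is the likelihood maximiser.
        # max() returns the first maximiser, so ties go to theta_0.
        best = max(range(len(thetas)), key=lambda i: log_lik[i])
        total += (thetas[best] - theta0) ** 2
        bit = 1 if rng.random() < theta0 else 0
        for i, t in enumerate(thetas):
            log_lik[i] += math.log(t if bit else 1.0 - t)
    return total


def mean_cumulative_error(num_params: int, steps: int, runs: int, seed: int = 0) -> float:
    """Average of cumulative_map_error over independent runs."""
    rng = random.Random(seed)
    return sum(cumulative_map_error(num_params, steps, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, mean_cumulative_error(n, steps=2000, runs=50))
```

Since a parameter at distance $2^{-k}$ from $\theta_0$ takes on the order of $4^k$ observations to rule out, `steps` has to grow with `num_params` for the later parameters to contribute.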

MDL Is Not That Bad!

• The instantaneous loss bound is good, precisely $E(\theta^* - \theta_0)^2 \le \frac{1}{n} O\big(\ln(w_0^{-1})\big)$
• This does not imply a finitely bounded cumulative loss!
• The cumulative loss bound is good for certain nice classes (parameters + weights)
• Intuitively: bound is good if parameters of equal weights are uniformly distributed
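
To see why the instantaneous bound alone does not bound the cumulative loss, sum it over the first $T$ steps. Here $C$ is a hypothetical stand-in for the constant hidden in the $O(\cdot)$ and $H_T$ is the $T$-th harmonic number:

```latex
\sum_{n=1}^{T} \frac{C \ln(w_0^{-1})}{n}
  \;=\; C \ln(w_0^{-1}) \, H_T
  \;\approx\; C \ln(w_0^{-1}) \, \ln T
  \;\xrightarrow{\;T \to \infty\;}\; \infty
```

The summed bound grows like $\ln T$, so it gives no finite limit on the total loss.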

Prepare Sharper Upper Bound

• Define interval construction $(I_k, J_k)$ which exponentially contracts to $\theta_0$
• Let $K(I_k)$ be the shortest description length of some $\theta \in I_k$

[Figure: the unit interval with ticks at 0, 1/8, 1/4, 3/8, 1/2, 5/8, 3/4, 7/8, 1; example $\theta_0 = \frac14$ with $J_0 = [0, \frac12)$ and $I_0 = [\frac12, 1]$, and the next-level intervals labelled $I_1$, $J_1$, $I_1$]

Sharper Upper Bound

• Let $K(J_k)$ be the shortest description length of some $\theta \in J_k$
• Let $\Delta(k) = \max\{K(I_k) - K(J_k), 0\}$
• Theorem: $\sum_t E(\theta^* - \theta_0)^2 \le O\Big(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)} \sqrt{\Delta(k)}\Big)$
• Corollaries: "Uniformly distributed weights $\Rightarrow$ good bounds"
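
The series in the theorem is easy to evaluate once the values $\Delta(k)$ are known. The sketch below takes them as a plain list; the function name and the example sequence $\Delta(k) = k$ are hypothetical and are not derived from any particular class.

```python
import math


def theorem_series(deltas) -> float:
    """Partial sum of 2^(-Delta(k)) * sqrt(Delta(k)) over the given Delta values."""
    return sum(2.0 ** -d * math.sqrt(d) for d in deltas)


if __name__ == "__main__":
    # Illustrative choice Delta(k) = k: the partial sums level off quickly.
    for terms in (5, 10, 50):
        print(terms, theorem_series(range(1, terms + 1)))
```

With $\Delta(k) = k$ the terms decay geometrically, so the series converges and the bound stays within a constant of $\ln w_0^{-1}$.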

The Universal Case

• $\Theta = \{\text{all computable } \theta \in [0,1]\}$
• $w_\theta = 2^{-K(\theta)}$, where $K$ denotes the prefix Kolmogorov complexity
• $\sum_k 2^{-\Delta(k)} \sqrt{\Delta(k)} = \infty$ $\Rightarrow$ Theorem not applicable
• Conjecture: $\sum_t E(\theta^* - \theta_0)^2 \le O\Big(\ln w_0^{-1} + \sum_{k=1}^{\infty} 2^{-\Delta(k)}\Big)$
• $\Rightarrow$ bound $\text{huge constant} \times \text{polynomial}$ holds for incompressible $\theta_0$
• Compare to deterministic case

Conclusions

• Cumulative and instantaneous bounds are incompatible
• Main positive result generalizes to arbitrary i.i.d. classes
• Open problem: good bounds for more general classes?
• Thank you!