The Mabinogion Sheep Problem - Cornell Universitypi.math.cornell.edu/~web6720/Kun_slides.pdf · K....

The Mabinogion Sheep Problem

Kun Dong

Cornell University

April 22, 2015

K. Dong (Cornell University) The Mabinogion Sheep Problem April 22, 2015 1 / 18

Introduction


Following Williams (1991), we are given a herd of black and white sheep at time t = 0. At each time t = 1, 2, …, a sheep is selected uniformly at random, and a sheep of the opposite color (if any remain) becomes the same color as the chosen sheep. Just after t = 0 and after each transition, we may remove any number of white sheep (if any remain) from the herd. The goal is to maximize the expected final number of black sheep.
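A minimal Monte Carlo sketch of these dynamics. It assumes a policy is supplied as a Python callable `policy(t, w, b)` that returns the number of white sheep to remove just after time $t$. This interface and the helper names `simulate`, `remove_all` and `do_nothing` are illustrative and do not come from the slides.

```python
import random
from typing import Callable

# Hypothetical policy interface: policy(t, w, b) -> number of white sheep to remove.
Policy = Callable[[int, int, int], int]


def simulate(w0: int, b0: int, policy: Policy, rng: random.Random) -> int:
    """Run one herd until it is a single color; return the final number of black sheep."""
    w, b, t = w0, b0, 0
    while True:
        # Just after time t we may remove white sheep (the request is clamped to [0, w]).
        w -= min(w, max(0, policy(t, w, b)))
        if w == 0 or b == 0:
            return b  # only one color is left, so nothing changes any more
        # Pick a sheep uniformly at random; one sheep of the other color takes its color.
        if rng.random() < b / (w + b):
            w, b = w - 1, b + 1  # a black sheep was chosen
        else:
            w, b = w + 1, b - 1  # a white sheep was chosen
        t += 1


def remove_all(t: int, w: int, b: int) -> int:
    return w  # discard every white sheep at the first opportunity


def do_nothing(t: int, w: int, b: int) -> int:
    return 0


rng = random.Random(2015)
print(simulate(5, 7, remove_all, rng))  # always 7, the starting number of black sheep
```

With `do_nothing` the herd size never changes, so every run ends with either 0 or $w_0 + b_0$ black sheep.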

A naive lower bound


Definition. A policy is a function $a : (\mathbb{Z}^*)^3 \to \mathbb{Z}^*$ for which $a(t, \#\text{white}, \#\text{black})$ is the number of white sheep removed at time $t$.

Suppose we start with $w_0$ white sheep and $b_0$ black sheep. One admissible policy is to remove all white sheep immediately at $t = 0$. This gives an easy lower bound for the optimal policy: the expected final number of black sheep is at least the starting number of black sheep, $b_0$.

Decision Process


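Let $(W_t, B_t)$ be the numbers of white and black sheep at time $t$, with the natural filtration $\mathcal{F}_t = \sigma((W_s, B_s)_{s \le t})$.

$$P\big((W_{t+1}, B_{t+1}) = (w, b) \,\big|\, \mathcal{F}_t\big) = p\big((W_t, B_t), a(t), (w, b)\big)$$

What if $a \equiv 0$? Then the total number of sheep stays at $b_0 + w_0$, and $B_t$ is a Markov chain with

$$p(b, b + 1) = \frac{b}{b_0 + w_0}, \qquad p(b, b - 1) = 1 - \frac{b}{b_0 + w_0}$$

for $0 < b < b_0 + w_0$.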


Transition probability


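For the Markov chain $B_t$ started with $k$ sheep of each color (so $2k$ sheep in total), the transition matrix on the states $b = 0, 1, \dots, 2k$ is

$$P = \begin{pmatrix}
1 & 0 & & & & & \\
\frac{2k-1}{2k} & 0 & \frac{1}{2k} & & & & \\
 & \ddots & \ddots & \ddots & & & \\
 & & \frac{2k-i}{2k} & 0 & \frac{i}{2k} & & \\
 & & & \ddots & \ddots & \ddots & \\
 & & & & \frac{1}{2k} & 0 & \frac{2k-1}{2k} \\
 & & & & & 0 & 1
\end{pmatrix}$$

$B_t$ is not irreducible.

$\{0, 2k\}$ are recurrent (absorbing), but all other states are transient.

Any distribution $\pi$ with $\pi(0) + \pi(2k) = 1$ is stationary, so the stationary distribution is not unique.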
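The matrix above can be built and checked numerically. This is a sketch assuming NumPy; `sheep_matrix` is a hypothetical helper whose row and column $b$ index the number of black sheep $0, \dots, 2k$.

```python
import numpy as np


def sheep_matrix(k: int) -> np.ndarray:
    """Transition matrix of B_t when no sheep are removed and there are 2k sheep."""
    n = 2 * k
    P = np.zeros((n + 1, n + 1))
    P[0, 0] = P[n, n] = 1.0  # all white / all black: absorbing
    for b in range(1, n):
        P[b, b + 1] = b / n      # a black sheep is chosen, a white one turns black
        P[b, b - 1] = 1 - b / n  # a white sheep is chosen, a black one turns white
    return P


k = 3
P = sheep_matrix(k)
pi = np.zeros(2 * k + 1)
pi[0], pi[2 * k] = 0.3, 0.7  # any split of mass between the two absorbing states
print(np.allclose(pi @ P, pi))  # True: pi is stationary
```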

Absorbing Markov Chain


We can divide the communicating classes into transient classes and ergodic classes.


We call a chain an absorbing Markov chain when all ergodic classes are singleton sets, i.e. absorbing states.

AMC cont.


Theorem. For any finite Markov chain, the probability that the process is in an ergodic class tends to 1 as $n \to \infty$.

Proof. From any transient state there is a positive probability of leaving the transient class, and hence a positive probability of reaching an ergodic class. Because there are finitely many states, there exist $N$ and $p > 0$ such that, from every transient state, the probability of entering an ergodic class within $N$ steps is at least $p$. Hence the probability of not having entered an ergodic class after $kN$ steps is at most $(1 - p)^k \to 0$ as $k \to \infty$.

Lemma. $P(B_t \in \{0, 2k\}) \to 1$ as $t \to \infty$.
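The lemma can be watched numerically by reading $P(B_t \in \{0, 2k\})$ off the $t$-step transition matrix. This is a sketch assuming NumPy; `sheep_matrix` and `absorbed_by` are hypothetical helpers, and the chain starts from $B_0 = k$.

```python
import numpy as np


def sheep_matrix(k: int) -> np.ndarray:
    """Transition matrix of B_t with 2k sheep and no removals."""
    n = 2 * k
    P = np.zeros((n + 1, n + 1))
    P[0, 0] = P[n, n] = 1.0
    for b in range(1, n):
        P[b, b + 1] = b / n
        P[b, b - 1] = 1 - b / n
    return P


def absorbed_by(k: int, t: int) -> float:
    """P(B_t in {0, 2k}) for the chain started at B_0 = k."""
    Pt = np.linalg.matrix_power(sheep_matrix(k), t)
    return float(Pt[k, 0] + Pt[k, 2 * k])


for t in (0, 10, 50, 200):
    print(t, absorbed_by(5, t))  # non-decreasing in t and tending to 1
```

The sequence is non-decreasing because mass that reaches $0$ or $2k$ never leaves.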

Fundamental Matrix


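We can put the transition matrix $P$ into the canonical form

$$P = \begin{pmatrix} S & 0 \\ R & Q \end{pmatrix}$$

For the sheep chain, order the states as $(0, 2k \mid 1, 2, \dots, 2k-1)$. Then $S = I_2$ and

$$R = \begin{pmatrix} \frac{2k-1}{2k} & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ 0 & \frac{2k-1}{2k} \end{pmatrix}, \qquad
Q = \begin{pmatrix}
0 & \frac{1}{2k} & & & \\
\frac{2k-2}{2k} & 0 & \frac{2}{2k} & & \\
 & \ddots & \ddots & \ddots & \\
 & & \frac{2}{2k} & 0 & \frac{2k-2}{2k} \\
 & & & \frac{1}{2k} & 0
\end{pmatrix},$$

where the row of $Q$ for state $i$ has $\frac{2k-i}{2k}$ in the column for $i - 1$ and $\frac{i}{2k}$ in the column for $i + 1$.

S - transitions within the ergodic classes $\tilde{T}$

Q - transitions within the transient classes $T$

R - transitions from the transient classes to the ergodic classes

The fundamental matrix is $N = (I - Q)^{-1} = I + Q + Q^2 + \cdots = \sum_{m=0}^{\infty} Q^m$.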
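A small numerical check that the inverse and the series agree. This is a sketch assuming NumPy; `transient_block` is a hypothetical helper that builds $Q$ for the sheep chain, and the series is truncated after 2000 terms.

```python
import numpy as np


def transient_block(k: int) -> np.ndarray:
    """Q: one-step transitions among transient states 1..2k-1 (row i <-> state i + 1)."""
    n = 2 * k
    Q = np.zeros((n - 1, n - 1))
    for b in range(2, n):
        Q[b - 1, b - 2] = 1 - b / n  # b -> b - 1 stays transient because b - 1 >= 1
    for b in range(1, n - 1):
        Q[b - 1, b] = b / n          # b -> b + 1 stays transient because b + 1 <= 2k - 1
    return Q


k = 3
Q = transient_block(k)
I = np.eye(2 * k - 1)
N = np.linalg.inv(I - Q)  # fundamental matrix N = (I - Q)^{-1}

# Truncated Neumann series I + Q + Q^2 + ... + Q^1999
S, term = np.zeros_like(Q), I.copy()
for _ in range(2000):
    S += term
    term = term @ Q
print(np.abs(N - S).max())  # tiny, since Q^m -> 0
```

Each entry $N_{ij}$ is the expected number of visits to transient state $j$ starting from $i$, so the diagonal is at least 1.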


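Remark 1. Let $n_j$ be the number of visits to a transient state $j$. Then $E_i[n_j] = \sum_{m=0}^{\infty} p^{(m)}_{ij} = N_{ij}$, where $p^{(m)}_{ij} = (Q^m)_{ij}$ is the $m$-step transition probability between transient states.

Remark 2. In a similar way we can also obtain $\operatorname{Var}_i(n_j)$, $E_i\big[\sum_{j \in T} n_j\big]$ and $\operatorname{Var}_i\big[\sum_{j \in T} n_j\big]$ from the fundamental matrix $N$.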

Theorem. Let $b_{ij}$ be the probability that the chain, started in transient state $i$, is absorbed in state $j$. Then $B = \{b_{ij}\} = NR$.

Proof. By the strong Markov property (conditioning on the first step),

$$b_{ij} = p_{ij} + \sum_{k \in T} p_{ik} b_{kj} \implies B = R + QB \implies (I - Q)B = R \implies B = NR.$$

The sheep chain

Figure: $E[B_\infty]$ as a function of $b_0$.

With no sheep removed and $2k$ sheep in total,

$$E_{b_0}[B_\infty] = (2k)\, b_{b_0, 2k} = 2k \, \frac{\sum_{i=1}^{b_0} \binom{2k-1}{i-1}}{\sum_{i=1}^{2k} \binom{2k-1}{i-1}} = \frac{2k}{2^{2k-1}} \sum_{i=1}^{b_0} \binom{2k-1}{i-1}.$$
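A cross-check of the closed form against $B = NR$. This is a sketch assuming NumPy and Python's `math.comb`; the function names are hypothetical, and only the column of $B$ for absorption at $2k$ is computed.

```python
import numpy as np
from math import comb


def expected_final_black_closed(k: int, b0: int) -> float:
    """2k * sum_{i=1}^{b0} C(2k-1, i-1) / 2^(2k-1), with no sheep removed."""
    return 2 * k * sum(comb(2 * k - 1, i - 1) for i in range(1, b0 + 1)) / 2 ** (2 * k - 1)


def expected_final_black_linear(k: int, b0: int) -> float:
    """Same quantity from (I - Q) b = r, where r is the column of R for state 2k."""
    n = 2 * k
    if b0 in (0, n):
        return float(b0)  # already absorbed
    Q = np.zeros((n - 1, n - 1))
    r = np.zeros(n - 1)
    for b in range(1, n):
        i = b - 1
        if b + 1 < n:
            Q[i, i + 1] = b / n
        else:
            r[i] = b / n  # one step from 2k - 1 to 2k
        if b > 1:
            Q[i, i - 1] = 1 - b / n
    col = np.linalg.solve(np.eye(n - 1) - Q, r)
    return n * col[b0 - 1]


print([round(expected_final_black_closed(5, b0), 3) for b0 in range(11)])
```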

Policy A


Policy A:

Do nothing if $W_t < B_t$;

Reduce $W_t$ to $B_t - 1$ if $W_t \ge B_t$.

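Definition. The value function $V(w, b)$ is the expected final number of black sheep under Policy A if we start with $w$ white sheep and $b$ black sheep.

By construction, $V$ has the following properties:

(a1) $V(0, b) = b$.

(a2) $V(w, b) = V(w - 1, b) = V(b - 1, b)$ whenever $w \ge b > 0$.

(a3) $V(w, b) = \frac{w}{w+b} V(w + 1, b - 1) + \frac{b}{w+b} V(w - 1, b + 1)$ whenever $b > w > 0$.

Claim. Under Policy A, $V(W_n, B_n)$ is a martingale with respect to $\{\mathcal{F}_n\}$. Removals under Policy A leave $V$ unchanged by (a2), and when $B_n > W_n > 0$, property (a3) gives

$$E[V(W_{n+1}, B_{n+1}) \mid \mathcal{F}_n] = \frac{W_n}{W_n + B_n} V(W_n + 1, B_n - 1) + \frac{B_n}{W_n + B_n} V(W_n - 1, B_n + 1) = V(W_n, B_n).$$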
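Properties (a1)-(a3) determine $V$ completely, so it can be computed herd size by herd size. This is a sketch assuming NumPy; `policy_a_values` is a hypothetical helper. Values for $w \ge b$ come from a smaller herd via (a2), and the "do nothing" states $b > w > 0$ of a fixed herd size $n$ form a small linear system given by (a3).

```python
import numpy as np


def policy_a_values(n_max: int) -> dict:
    """V[(w, b)] for every herd with w + b <= n_max, from properties (a1)-(a3)."""
    V = {}
    for n in range(n_max + 1):
        # States whose value is known directly or from a smaller herd.
        for w in range(n + 1):
            b = n - w
            if w == 0:
                V[(w, b)] = float(b)       # (a1)
            elif b == 0:
                V[(w, b)] = 0.0            # no black sheep left
            elif w >= b:
                V[(w, b)] = V[(b - 1, b)]  # (a2); herd of size 2b - 1 < n
        # "Do nothing" states b > w > 0: (a3) is a linear system within size n.
        interior = [w for w in range(1, n) if n - w > w]
        if not interior:
            continue
        idx = {w: i for i, w in enumerate(interior)}
        A = np.eye(len(interior))
        rhs = np.zeros(len(interior))
        for w in interior:
            b = n - w
            for (w2, b2), prob in (((w + 1, b - 1), w / n), ((w - 1, b + 1), b / n)):
                if w2 in idx:
                    A[idx[w], idx[w2]] -= prob  # unknown in the same system
                else:
                    rhs[idx[w]] += prob * V[(w2, b2)]  # already known
        for w, value in zip(interior, np.linalg.solve(A, rhs)):
            V[(w, n - w)] = float(value)
    return V


V = policy_a_values(10)
print(V[(3, 3)])  # about 3.7879, i.e. 125/33
```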


Supermartingale


Lemma 1. $V(w, b) \ge V(w - 1, b)$ whenever $w > 0$.

Lemma 2. $V(w, b) \ge \frac{w}{w+b} V(w + 1, b - 1) + \frac{b}{w+b} V(w - 1, b + 1)$ whenever $w, b > 0$.

Assuming these two lemmas hold, we obtain the following theorem.

Theorem. For any policy, $V(W_n, B_n)$ is a supermartingale.

Proof. By Lemma 1, removing white sheep never increases $V$; by Lemma 2, one random transition does not increase $V$ in conditional expectation. Hence

$$E[V(W_{n+1}, B_{n+1}) \mid \mathcal{F}_n] \le E[V(W_n, B_n) \mid \mathcal{F}_n] = V(W_n, B_n).$$
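The two lemmas can be checked numerically on small herds. This is a sketch assuming NumPy; it recomputes $V$ from (a1)-(a3) with a hypothetical helper `policy_a_values` and lists every state where either inequality fails beyond a rounding tolerance. A finite check like this supports the lemmas but is not a proof.

```python
import numpy as np


def policy_a_values(n_max: int) -> dict:
    """V[(w, b)] for all herds with w + b <= n_max, solved from (a1)-(a3)."""
    V = {}
    for n in range(n_max + 1):
        for w in range(n + 1):
            b = n - w
            if w == 0:
                V[(w, b)] = float(b)       # (a1)
            elif b == 0:
                V[(w, b)] = 0.0            # no black sheep left
            elif w >= b:
                V[(w, b)] = V[(b - 1, b)]  # (a2)
        interior = [w for w in range(1, n) if n - w > w]  # b > w > 0: (a3)
        if not interior:
            continue
        idx = {w: i for i, w in enumerate(interior)}
        A, rhs = np.eye(len(interior)), np.zeros(len(interior))
        for w in interior:
            b = n - w
            for (w2, b2), prob in (((w + 1, b - 1), w / n), ((w - 1, b + 1), b / n)):
                if w2 in idx:
                    A[idx[w], idx[w2]] -= prob
                else:
                    rhs[idx[w]] += prob * V[(w2, b2)]
        for w, value in zip(interior, np.linalg.solve(A, rhs)):
            V[(w, n - w)] = float(value)
    return V


def lemma_violations(V: dict, tol: float = 1e-9) -> list:
    """States where Lemma 1 or Lemma 2 fails numerically (beyond the tolerance)."""
    bad = []
    for (w, b), value in V.items():
        if w > 0 and value < V[(w - 1, b)] - tol:
            bad.append(("Lemma 1", w, b))
        if w > 0 and b > 0:
            step = w / (w + b) * V[(w + 1, b - 1)] + b / (w + b) * V[(w - 1, b + 1)]
            if value < step - tol:
                bad.append(("Lemma 2", w, b))
    return bad


print(lemma_violations(policy_a_values(30)))  # an empty list means no violation was found
```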

Martingale Convergence


Because $V(W_n, B_n)$ is a nonnegative supermartingale, it converges almost surely. Hence the chain must end up in an absorbing state in which all sheep are of one color. In this case, for deterministic $W_0$ and $B_0$,

$$E[B_\infty] = E[V(W_\infty, B_\infty)] \le V(W_0, B_0).$$

Therefore, for any initial number of black and white sheep, theexpected final number of black sheep under any policy is no morethan the expected final number of black sheep under Policy A. Weconclude that

Policy A is optimal

Long-term Limit


Theorem. $V(k, k) - \left(2k + \frac{\pi}{4} - \sqrt{\pi k}\right) \to 0$ as $k \to \infty$.

Remark. If we start with 1000 black sheep and 1000 white sheep,we expect to finish with 1945 black sheep.

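Proof. Let $v_k = V(k, k)$ and $p_k = 2^{-2k}\binom{2k}{k} \sim (\pi k)^{-1/2}$ by Stirling's formula. For $\rho_k = \frac{2p_k}{1 + p_k}$,

$$v_{k+1} = v_k + \rho_k\,(2k + 1 - v_k),$$

where $\rho_k$ is the probability that, starting from $k$ white and $k + 1$ black sheep, the herd becomes all black before the white sheep outnumber the black sheep.

Let $\alpha_k = v_k - \left(2k + \frac{\pi}{4} - \frac{1}{p_k}\right)$. Then

$$\alpha_{k+1} = (1 - \rho_k)\,\alpha_k + \rho_k c_k, \qquad c_k = \frac{(p_k - p_{k+1})(1 + p_k)}{2p_k^2\,p_{k+1}} - \frac{\pi}{4}.$$

Because $c_k \to 0$ and $\prod_k (1 - \rho_k) \to 0$, we get $\alpha_k \to 0$.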
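The recursion makes $v_k = V(k, k)$ cheap to compute for large $k$. This is a sketch using only Python's standard library; the helper names are hypothetical. `rho_direct` recomputes $\rho_k$ from the gambler's-ruin formula for the birth-death chain on $b$ (up-probability $b/n$, absorbed at $b = k$ or $b = 2k + 1$) as a check on $\rho_k = 2p_k/(1 + p_k)$, and $p_k$ is updated via $p_{k+1} = p_k \frac{2k+1}{2k+2}$ to avoid huge binomials.

```python
import math


def rho_formula(p: float) -> float:
    return 2 * p / (1 + p)


def rho_direct(k: int) -> float:
    """P(all black before white outnumbers black), from k white and k + 1 black sheep."""
    n = 2 * k + 1
    r, total = 1.0, 1.0
    for m in range(k + 1, n):
        r *= (n - m) / m  # ratio of down- to up-probability at state m
        total += r
    return 1.0 / total


def v_sequence(k_max: int) -> list:
    """v_k = V(k, k) for k = 0..k_max from v_{k+1} = v_k + rho_k (2k + 1 - v_k)."""
    v, p = [0.0], 1.0  # v_0 = V(0, 0) = 0, p_0 = 1
    for k in range(k_max):
        v.append(v[-1] + rho_formula(p) * (2 * k + 1 - v[-1]))
        p *= (2 * k + 1) / (2 * k + 2)  # p_{k+1} from p_k
    return v


def asymptotic(k: int) -> float:
    return 2 * k + math.pi / 4 - math.sqrt(math.pi * k)


v = v_sequence(1000)
print(v[1000], asymptotic(1000))  # both about 1944.7, i.e. about 1945 black sheep
```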

Continuous Case


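Let $B_t, W_t$ be the numbers of black and white sheep, but now for $t \in \mathbb{R}^*$. $A_t$, the cumulative amount of white sheep removed, is a continuous-time, non-decreasing process (heuristically $dA_t = a_t\,dt$ with $a_t \ge 0$).

$X_t = B_t + W_t$, $Y_t = B_t - W_t$, $T = \inf\{t \ge 0 : |Y_t| \ge X_t\}$.

$$dX_t = -dA_t, \qquad dY_t = \frac{Y_t}{X_t}\,dt + dA_t + \sqrt{2}\,d\beta_t,$$

where $\beta$ is a standard Brownian motion, and

$$V(x, y) = \max_A E_{(x,y)}\big[X_T \mathbf{1}_{\{Y_T \ge X_T\}}\big].$$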
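A rough Euler–Maruyama sketch of the diffusion, assuming NumPy and covering only the uncontrolled case $A \equiv 0$ (the optimal control is not computed here). With no removals $X_t = x_0$ stays constant, so only $Y$ is simulated until $|Y_t| \ge x_0$. The function `uncontrolled_value` and its parameters (`n_paths`, `dt`, `max_steps`, `seed`) are hypothetical choices.

```python
import numpy as np


def uncontrolled_value(x0: float, y0: float, n_paths: int = 2000, dt: float = 0.01,
                       max_steps: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[X_T 1{Y_T >= X_T}] when A = 0 (no removals).

    X_t = x0 stays constant and Y follows dY = (Y / X) dt + sqrt(2) d(beta).
    Paths still running after max_steps are counted as ending all white.
    """
    rng = np.random.default_rng(seed)
    y = np.full(n_paths, float(y0))
    alive = np.abs(y) < x0
    for _ in range(max_steps):
        if not alive.any():
            break
        dbeta = np.sqrt(dt) * rng.standard_normal(n_paths)  # Brownian increments
        y = np.where(alive, y + (y / x0) * dt + np.sqrt(2.0) * dbeta, y)  # freeze finished paths
        alive &= np.abs(y) < x0
    return x0 * float(np.mean(y >= x0))


print(uncontrolled_value(10.0, 0.0))  # close to 5 by the symmetry y -> -y
```

The drift $Y_t / X_t$ pushes $Y$ away from 0, so a herd that starts with a black majority ends all black with high probability even without removals.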

Application


An analogous idea appears in portfolio selection with transaction costs (Davis and Norman, 1990).

An investor chooses between a bank account paying a fixed interest rate and a stock whose price follows a log-normal diffusion.

The investor consumes from the bank account and tries to maximize the expected discounted utility of consumption, $E\int_0^\infty e^{-\delta t} u(c(t))\,dt$.

Let $s_0(t)$ be the holding in the bank and $s_1(t)$ the holding in the stock.

$$ds_0(t) = \big(r s_0(t) - c(t)\big)\,dt, \qquad ds_1(t) = \alpha s_1(t)\,dt + \sigma s_1(t)\,dB(t)$$

In this case, we want $\frac{s_0(t)}{s_1(t)} = \pi^*$ and $\frac{c(t)}{s_0(t) + s_1(t)} = C$.

References

Terence Chan, Some diffusion models for the Mabinogion sheep problem of Williams, Advances in Applied Probability (1996), 763–783.

Mark H. A. Davis and Andrew R. Norman, Portfolio selection with transaction costs, Mathematics of Operations Research 15 (1990), no. 4, 676–713.

John G. Kemeny and James Laurie Snell, Finite Markov chains: with a new appendix "Generalization of a fundamental matrix", Springer, 1976.

David Williams, Probability with martingales, Cambridge University Press, 1991.


Thank you!

