A Berry Esseen Theorem for the Lightbulb Processbcf.usc.edu/~larry/papers/pdf/ltb11.pdf · A Berry...

A Berry Esseen Theorem for the Lightbulb Process

Larry Goldstein and Haimeng ZhangUniversity of Southern California and Mississippi State University

January 4, 2010

Abstract

In the so called lightbulb process, on days r = 1, . . . , n, out of n lightbulbs, all initially off, exactly rbulbs, selected uniformly and independent of the past, have their status changed from off to on, or viceversa. With X the number of bulbs on at the terminal time n, an even integer, and µ = n/2, σ2 = Var(X),we have

supz∈R

∣∣∣∣P (X − µσ

)− P (Z ≤ z)

∣∣∣∣ ≤ n

2σ2∆0 + 1.64

n

σ3+

2

σ

where

∆0 ≤1

2√n

+1

2n+ e−n/2 for n ≥ 4,

yielding a bound of order O(n−1/2) as n→∞. A similar, though slightly larger bound holds for n odd.The results are shown using a version of Stein’s method for bounded, monotone size bias couplings. Theargument for even n depends on the construction of a variable Xs on the same space as X which has theX size bias distribution, that is, that satisfies

EXg(X) = µEg(Xs) for all bounded continuous g,

and for which there exists a B ≥ 0, in this case B = 2, such that X ≤ Xs ≤ X + B almost surely. Theargument for n odd is similar to that for n even, but one first couples X closely to V , a symmetrizedversion of X, for which a size bias coupling of V to V s can proceed as in the even case.

1 Introduction

The problem we consider here arises from a study in the pharmaceutical industry on the effects of dermalpatches designed to activate targeted receptors. An active receptor will become inactive, and an inactiveone active, if it receives a dose of medicine released from the dermal patch. Let the number of receptors, allinitially inactive, be denoted by n. On study day 1, 2, . . ., some number s of n randomly selected receptorswill each receive one dose of medicine, changing their statuses between the inactive and active states. Weadopt the following, somewhat more colorful, though equivalent, ‘lightbulb process’ formulation. Consider ntoggle switches, each being connected to a lightbulb. Pressing the toggle switch connected to a bulb changesits status from off to on and vice versa. The problem of determining the properties of X, the number oflight bulbs on at the end of day n, was first considered in [6] for the case where on each day s = 1, . . . , n,exactly s of the n switches are randomly pressed.

Consider the lightbulb process with some number k of stages where sr ∈ 0, . . . , n lightbulbs are toggledin stage r, for r = 1, . . . , k, and let s = (s1, . . . , sk). To track the status of some subset of b of the n bulbs,we define

λn,b,s =b∑t=0

(b

t

)(−2)t

(s)t(n)t

and λn,b,s =k∏r=1

λn,b,sr, (1)

0AMS 2010 subject classifications: Primary 62E17, 60C05, 60B15, 62P10.0Key words and phrases: Normal approximation, toggle switch, Stein’s method, size biasing.

1

where (n)k = n(n− 1) · · · (n− k+ 1) denotes the falling factorial, and the empty product is 1. In particular

λn,1,s = 1− 2sn

and λn,2,s = 1− 4sn

+4s(s− 1)n(n− 1)

for s = 1, . . . , n.

Generalizing the results in [6], writing Xs for the number of bulbs on at the terminal time when applyingsr switches in stages r = 1, . . . , n, the martingale method in Proposition 4 of [10] shows that if the chain isinitialized with all bulbs off, then

EXs =n

2(1− λn,1,s) and Var(Xs) =

n

4(1− λn,2,s) +

n2

4(λn,2,s − λ 2

n,1,s). (2)

In particular, the mean µ = EX and variance of σ2 = Var(X) of X as considered in [6] are given by (2)with s = [n] where [n] = 1, . . . , n. Other results in [6] include recursions for determining the exact finitesample distribution of X. Though approximations to the distribution of X, including by the normal, werealso considered, the quality of such approximations, and the asymptotic normality of X was left open.

The following theorem settles the question of the asymptotic distribution of X by providing a bound tothe normal which holds for all finite n, and which tends to zero at the rate n−1/2 as n tends to infinity. Forn = 2m+ 1 an odd number, let

sm,m = (1, . . . ,m− 1,m,m,m+ 2, . . . , n), sm+1,m+1 = (1, . . . ,m− 1,m+ 1,m+ 1,m+ 2, . . . , n) (3)

and

λn,b =12(λn,b,sm,m + λn,b,sm+1,m+1

).

We proceed in the odd case by coupling X to a more symmetric random variable V with mean and variancegiven respectively by

µV =n

2(1− λn,1

)and (4)

σ2V =

n2

16(λn,1,sm,m

− λn,1,sm+1,m+1)2 +n

4(1− λn,2) +

n2

4

(λn,2 −

λ2n,1,sm,m

+ λ2n,1,sm+1,m+1

2

).

Theorem 1.1 Let X be the number of bulbs on at the terminal time n in the lightbulb process, applyingsr = r switches in stages r = 0, 1, . . . , n. For all n even

supz∈R

∣∣∣∣P (X − n/2σ≤ z)− P (Z ≤ z)

∣∣∣∣ ≤ n

2σ2∆0 + 1.64

n

σ3+

2σ, (5)

where σ2 = Var(X) is given by (2) and

∆0 ≤1

2√n

+1

2n+ e−n/2, for n ≥ 6.

For all n = 2m+ 1 odd,

supz∈R

∣∣∣∣P (X − µVσV≤ z)− P (Z ≤ z)

∣∣∣∣ ≤ µVσ2V

∆1 + 3.28µVσ3V

+1σV

(2 +

1√2π

),

where µV and σ2V are given in (4) and

∆1 ≤1

2√n

+√

54n

+1

n√

2(n− 1)+√

2ne−n/4 + e−n/2, for n ≥ 7. (6)

2

In the even case, as λn,1,[n] and λn,2,[n] decay exponentially fast to zero, the variance σ2 is of order n andthe bound (5), therefore, of order 1/

√n; analogous remarks hold for the case where n is odd.

We now more formally describe the lightbulb process. With n ∈ N fixed and s = (s1, . . . , sn) forsr ∈ 0, . . . , n with r = 1, . . . , n, we will let Xs = Xrk : r = 0, 1, . . . , n, k = 1, . . . , n denote a collection ofBernoulli variables, and will write X for Xs when s = (1, . . . , n). For r ≥ 1 these ‘switch variables’ have theinterpretation that

Xrk =

1 if the status of bulb k is changed at stage r,0 otherwise.

The initial state of the bulbs is given deterministically by X0k, k = 1, . . . , n, which will be taken to bestate zero, that is, all bulbs off, unless specifically stated otherwise; in fact, non zero initial conditions areconsidered in Lemma 5.2. At stage r, sr of the n bulbs are chosen uniformly to have their status changed,and the stages are independent of each other. Hence, with e1, . . . , en ∈ 0, 1, the joint distribution ofXr1, . . . , Xrn is given by

P (Xr1 = e1, · · · , Xrn = en) =

(nsr

)−1 if e1 + · · ·+ en = sr,

0 otherwise,(7)

with the collections Xr1, . . . , Xrn independent for r = 1, . . . , n.

Clearly, for each stage r, the variables (Xr1, · · · , Xrn) are exchangeable, and the marginal distribution ofXrk is given by

P (Xrk = 1) =srn

and P (Xrk = 0) = 1− srn

for all r, k = 1, . . . , n.

For r, k = 1, . . . , n, the quantity (∑ri=1Xik) mod 2 is the indicator that bulb k is on at time r, so letting

Xk =

(n∑r=0

Xrk

)mod 2 and Xs =

n∑k=1

Xk, (8)

the variable Xk is the indicator that bulb k is on at the terminal time, and the random variable Xs is thenumber of bulbs on at the terminal time.

The lightbulb process, where the n individual states evolve according to the same marginal Markov chain,is a special case of a class of multivariate chains studied in [10]. As shown in [10], such chains admit explicitfull spectral decompositions, and in particular, the transition matrices for each stage of the lightbulb processcan be simultaneously diagonalized by a Hadamard matrix. These properties were, in fact, put to use in[6] in particular for the calculation of the moments needed to compute the mean and variance of X whensr = r, r = 0, . . . , n. Here we put these same properties to somewhat more arduous work, the calculation ofmoments of fourth order.

That no higher order moments are required for the proof of a finite sample bound holding for all n is onedistinct advantage of the technique we apply here, Stein’s method for the normal distribution, brought to lifein the seminal monograph [9]. By contrast, the method of moments requires the calculation and appropriateconvergence of moments of all orders, and obtains only convergence in distribution. Stein’s method for thenormal is based on the characterization of the normal distribution in [8], which states that Z ∼ N (0, 1), thatis, that Z is standard normal, if and only if

E[Zg(Z)] = E[g′(Z)] (9)

for all absolutely continuous functions g for which these expectations exist. The idea behind Stein’s methodis that if a mean zero, variance 1 random variable W is close in distribution to Z, then W will satisfy (9)approximately. Hence, to gauge the proximity of W to Z for a given test function h, one can evaluate thedifference Eh(W )−Nh, where Nh = Eh(Z), by solving the Stein equation

f ′(w)− wf(w) = h(w)−Nh

3

for f and evaluating E[f ′(W )−Wf(W )] instead.A priori it may appear that an evaluation of E[f ′(W )−Wf(W )] would more difficult than for Eh(W )−

Nh. However, the former may be handled though the use of couplings. Here we consider size bias couplingsin particular. Given a nonnegative random variable Y with positive finite mean µ = EY , we say Y s has theY size biased distribution if P [Y s ∈ dy] = (y/µ)P [Y ∈ dy], or more formally, if

E[Y g(Y )] = µEg(Y s) for bounded continuous functions g. (10)

The use of size-biased couplings in Stein’s method was introduced in [1], where it was used to developbounds of order σ−1/2 for the normal approximation to the number of local maxima Y of a random functionon a graph, where σ2 = Var(Y ). In [4] the method was extended to multivariate normal approximations,and the rate was improved to σ−1, for the expectation of smooth functions of a vector Y recording thenumber of edges with certain fixed degrees in a random graph. In [3] the method was used to give bounds inthe Kolmogorov distance of order σ−1 for various functions on graphs and permutations, and in [5] for twoproblems in the theory of coverage processes, with bounds of this same order.

Here we prove and apply Theorem 2.1, which requires that Y and a random variable Y s, having the Ysize-biased distribution, be constructed on a common space such that for some B ≥ 0,

Y ≤ Y s ≤ Y +B

with probability one, that is, the coupling must be monotone, and bounded. Loosely speaking, Theorem 2.1says that given any such coupling of Y and Y s on a common space, an upper bound on the Kolmogorovdistance between the distribution of Y and the normal can computed in terms of B and the quantity

∆ =√

Var(E(Y s − Y |Y )). (11)

Theorem 2.1 is based on a concentration type inequality, provided in Lemma 2.1For the lightbulb process, a size biased coupling of X to Xs is achieved in the even case by the con-

struction, for each i = 1, . . . , n, of a collection Xi from a given X as follows. If Xi = 1, that is, if bulbi is on at the terminal times, we set Xi = X. Otherwise, let J be uniformly chosen from all j for whichXn/2,j = 1−Xn/2,i and let Xi be the same as X but that the values of Xn/2,i and Xn/2,J are interchanged.Let Xi be the number of bulbs on at the terminal time when applying the switch variables Xi. Then, withI uniformly chosen from 1, . . . , n, the variable Xs = XI has the X size bias distribution, essentially due tothe fact, shown in Lemma 3.1, that

L(Xi) = L(X|Xi = 1).

Due to the parity issue, to handle the odd case, say n = 2m+ 1, we first construct a coupling of X to amore symmetric variable V . In particular, V is constructed from the same switch variables as X, but thatwith equal probability either one additional switch variable is applied in stage m, or one fewer in stage m+1.A size biased coupling of V to V s can be achieved as in the even case, thus yielding a bound to the normalfor X.

In Section 2 we present Theorem 2.1, which gives a bound to the normal when a bounded, monotonesize biased coupling can be constructed for a given X. Our coupling constructions and proof of the boundfor the even case of the lightbulb process are given in Section 3. Calculations of the bounds on the variance∆ in (11) require estimates on λn,b,s in (1). These estimates are based on the work of [10] which yieldsthe spectral decomposition of the underlying transition matrix. The estimates required, for both the evenand odd cases, are given in the Appendix. Symmetrization, that is, the construction of V from X, couplingconstructions for V , and the proof of the bound in the odd case are given in Section 4.

2 Bounded Monotone Couplings

Theorem 2.1 for bounded monotone size bias couplings depends on the following lemma, which is in somesense the size bias version of Lemma 2.1 of [7]. With Y having mean µ and variance σ2, both finite and

4

positive, with some slight abuse of notation in the definition of W s, we set

W =Y − µσ

and W s =Y s − µσ

. (12)

Lemma 2.1 Let Y be a nonnegative random variable with mean µ and variance σ2, both finite and positive,and let Y s be given on the same space as Y , having the Y size biased distribution and satisfying Y s ≥ Y .Then with W and W s given in (12), for any z ∈ R and a > 0,

µ

σE(W s −W )1W s−W≤a1z≤W≤z+a ≤ a.

Proof. For fixed z ∈ R let

f(w) =

−a w ≤ zw − z − a z ≤ w ≤ z + 2a

a w ≥ z + 2a.

Then, by (12) and (10)

a ≥ E(Wf(W ))

=1σE(Y − µ)f

(Y − µσ

)=

µ

σE (f(W s)− f(W ))

=µ

σE

∫ W s−W

0

f ′(W + t)dt

≥ µ

σE

∫ W s−W

0

10≤t≤a1z≤W≤z+af ′(W + t)dt.

Noting that f ′(W + t) = 1z≤W+t≤z+2a, so that

10≤t≤a1z≤W≤z+af ′(W + t) = 10≤t≤a1z≤W≤z+a,

we obtain

a ≥ µ

σE

∫ W s−W

0

10≤t≤a1z≤W≤z+adt

=µ

σE(min(a,W s −W )1z≤W≤z+a

)≥ µ

σE((W s −W )1W s−W≤a1z≤W≤z+a

).

Theorem 2.1 Let Y be a nonnegative random variable with mean µ and variance σ2, both finite and positive,and let Y s be given on the same space as Y , with the Y size biased distribution, satisfying Y ≤ Y s ≤ Y +Bfor some B > 0. Then with W and W s given in (12), we have

supz∈R|P (W ≤ z)− P (Z ≤ z)| ≤ µ

σ2∆ + 0.82

δ2µ

σ+ δ,

where

∆ =√

Var(E(Y s − Y |Y )) and δ = B/σ. (13)

5

Proof. For z ∈ R arbitrary, let h(w) = 1w≤z and let f(w) be the unique bounded solution to the Steinequation

f ′(w)− wf(w) = h(w)−Nh, (14)

where Nh = Eh(Z) with Z ∼ N (0, 1). Substituting W into (14) and using (12) and (10) yields

E (h(W )−Nh)= E (f ′(W )−Wf(W ))

= E(f ′(W )− µ

σ(f(W s)− f(W ))

)= E

(f ′(W )(1− µ

σ(W s −W ))− µ

σ

∫ W s−W

0

(f ′(W + t)− f ′(W ))dt

). (15)

We have the following bounds on the solution f from [9], Chapter II,

0 < f(w) <√

2π/4 and |f ′(w)| ≤ 1, (16)

and, as noted in [7], as a consequence of (16) and the mean value theorem, we obtain

|(w + t)f(w + t)− wf(w)| ≤ (|w|+√

2π/4)|t|. (17)

Noting that EY s = EY 2/µ by (10), we find

µ

σE(W s −W ) =

µ

σ2

(EY 2

µ− µ

)= 1,

and therefore, taking expectation by conditioning, and then applying (16), we may bound the first term in(15) as

|Ef ′(W )E

(1− µ

σ(W s −W )|W

)| ≤ µ

σ

√Var(E(W s −W |W )) =

µ

σ2∆.

To bound the remaining term of (15), using (14), we have

µ

σ

∫ W s−W

0

(f ′(W + t)− f ′(W )) dt

=µ

σ

∫ W s−W

0

[(W + t)f(W + t)−Wf(W )]dt+µ

σ

∫ W s−W

0

(1W+t≤z − 1W≤z)dt. (18)

Applying (17) to the first term in (18), and using 0 ≤ W s −W ≤ δ and EW 2 = 1, shows that the absolutevalue of the expectation of this term is bounded by

µ

σE

(∫ W s−W

0

(|W |+

√2π4

)tdt

)≤ µ

2σE

((W s −W )2(|W |+

√2π4

)

)≤ µ

2σδ2(1 +

√2π4

) ≤ 0.82δ2µ

σ.

Taking the expectation of the absolute value of the second term in (18), we obtain

µ

σE

∣∣∣∣∣∫ W s−W

0

(1W+t≤z − 1W≤z)dt

∣∣∣∣∣ =µ

σE

(∫ W s−W

0

1z−t<W≤zdt

)≤ µ

σE((W s −W )1z−δ<W≤z

)since 0 ≤ W s −W ≤ δ with probability 1. Lemma 2.1 with a = δ and z replaced by z − δ shows this termcan be no more than δ. Since z ∈ R was arbitrary the proof is complete.

6

3 Normal approximation of X: even case

In this section we provide the proof of Theorem 1.1 when is n even, starting with a coupling of X, the totalnumber of bulbs on at the terminal time n, to a variable Xs with the X size bias distribution. Let U(S)denote the uniform distribution over a finite set S.Theorem 3.1 With n ∈ N even, let X = Xrk : r, k = 1, . . . , n be a collection of switch variables withdistribution given by (7) with sr = r for r = 1, . . . , n. For every i = 1, . . . , n let Xi be given from Xas follows. If Xi = 1 then Xi = X. Otherwise, with J i ∼ Uj : Xn/2,j = 1 − Xn/2,i, independent ofXrk : r 6= n/2, k = 1, . . . , n, let Xi = Xi

rk : r, k = 1, . . . , n where

Xirk =

Xrk r 6= n/2Xn/2,k r = n/2, k 6∈ i, J iXn/2,Ji r = n/2, k = iXn/2,i r = n/2, k = J i,

and let Xi =∑nk=1X

ik where

Xik =

(n∑r=1

Xirk

)mod 2.

Then, with I uniformly chosen from 1, . . . , n and independent of all other variables, the mixture XI = Xs

has the X size biased distribution and satisfies

Xs −X = 21XI=0,XJI =0. (19)

In particular, X ≤ Xs ≤ X + 2.We prove Theorem 3.1 making use of a number of preliminary lemmas, and also of the following simple

observation. As the marginal distribution of the switch variables is given by

P (Xri = 1) =r

n, for all r, i = 1, . . . , n,

when n is even we have P (Xn/2,i = 1) = 1/2. Hence, by the independence of the switch variables overdifferent stages, for any e1, . . . , en ∈ 0, 1,

P (X1i = e1, . . . , Xn/2,i = en/2, . . . , Xni = en) = P (X1i = e1, . . . , Xn/2,i = 1− en/2, . . . , Xni = en).

Writing x ≡ y when x = ymod 2, we have in particular

P (Xi = 0) =∑

∑r er≡0

P (X1i = e1, . . . , Xn/2,i = en/2, . . . , Xni = en)

=∑

∑r er≡0

P (X1i = e1, . . . , Xn/2,i = 1− en/2, . . . , Xni = en)

=∑

∑r er≡1

P (X1i = e1, . . . , Xn/2,i = en/2, . . . , Xni = en)

= P (Xi = 1),

so

P (Xi = 0) = P (Xi = 1) =12

for all i = 1, . . . , n. (20)

Lemma 3.1 For all i = 1, . . . , n, the collections of random variables X and Xi as specified in Theorem 3.1satisfies

L(Xi) = L(X|Xi = 1).

7

Proof. Let i ∈ 1, . . . , n and ei = eirk : r, k = 1, . . . , n with eirk ∈ 0, 1 for r, k = 1, . . . , n. First note that

P (Xi = ei) =12P (Xi = ei|Xi = 1) +

12P (Xi = ei|Xi = 0)

=12P (X = ei|Xi = 1) +

12P (Xi = ei|Xi = 0),

so it suffices to prove

P (Xi = ei|Xi = 0) = P (X = ei|Xi = 1). (21)

As the construction of Xi preserves the number of switches in each stage r, that is, since∑kX

irk =∑

kXrk, we may assume∑k e

irk = r for all r, as otherwise both sides of (21) are zero. By construction we

have Xii = 1, so P (Xi = ei) = 0 whenever

∑r e

iri ≡ 0, in which case, again, both sides of (21) are zero.

Hence we need only verify (21) assumingn∑r=1

eiri ≡ 1. (22)

To look at the values of the switch variables for bulb k, k = 1, . . . , n over all stages, let

eik = (ei1k, . . . , eink), Xk = (X1k, . . . , Xnk) and Xi

k = (Xi1k, . . . , X

ink) for k = 1, . . . , n.

When (22) holds, writing J for J i for simplicity, we have

P (Xi = ei|Xi = 0) = 2P (Xi = ei, Xi = 0)= 2P (Xi

1 = ei1, . . . ,Xin = ein, Xi = 0)

= 2P (Xik = eik, k 6∈ i, J,Xi

i = eii,XiJ = eiJ , Xi = 0)

= 2n∑j=1

P (Xik = eik, k 6∈ i, j,Xi

i = eii,Xij = eij , Xi = 0, J = j)

= 2n∑j=1

P (Xk = eik, k 6∈ i, j,Xii = eii,X

ij = eij , Xi = 0, J = j).

On the event J = j the vectors Xii and Xi

j are equal to Xi and Xj , respectively, with their n/2nd coordinatesinterchanged, and the values at these coordinates are unequal. Hence, letting eiji and ejij equal eii and eij ,respectively, with their n/2nd coordinates interchanged, we have

P (Xi = ei|Xi = 0) = 2n∑j=1

P (Xk = eik, k 6∈ i, j,Xi = eiji ,Xj = ejij , Xi = 0, J = j).

Note that the probability above is zero for ein/2,i = ein/2,j , as when Ji = j the statuses of Xn/2,i and

Xn/2,j are different, and when ein/2,i 6= ein/2,j and Xi = eiji , then by (22) we have

Xi ≡n∑r=1

eijri ≡n∑r=1

eiri + 1 ≡ 0.

Hence,

P (Xi = ei|Xi = 0) = 2n∑j=1

P (Xk = eik, k 6∈ i, j,Xi = eiji ,Xj = ejij , J = j)

= 2n∑j=1

P (J = j)P (Xk = eik, k 6∈ i, j,Xi = eiji ,Xj = ejij |J = j).

8

Since J is independent of the variables in all stages r 6= n/2, the second probability factors, and the jth

summand equals∏r 6=n/2

P (Xrk = eirk, k 6∈ i, j, Xri = eijri, Xrj = ejirj)

× P (J = j)P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = eijn/2,i, Xn/2,j = ejin/2,j |J = j)

=∏r 6=n/2

P (Xrk = eirk, k = 1, . . . , n)

× P (J = j)P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i|J = j),

the last equality due to the fact that eijrk and ejirk equal eirk for r 6= n/2 and k 6∈ i, j, and that eijn/2,i = ein/2,jand ejin/2,j = ein/2,i. As the first product does not depend on j, taking it past the sum we obtain

P (Xi = ei|Xi = 0) =∏r 6=n/2

P (Xrk = eirk, k = 1, . . . , n) (23)

× 2n∑j=1

P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i, J = j).

Since for each j the jth summand in (23) equals zero when ein/2,i = ein/2,j , and also noting that thereare n/2 indices whose switch variable at state n/2 is of opposite status to that of the ith, the jth summandequals

P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i, J = j)1(ein/2,i 6= ein/2,j)

= P (J = j|Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i)1(ein/2,i 6= ein/2,j)

× P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i)

=2nP (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i)1(ein/2,i 6= ein/2,j).

These probabilities are equal for all the n/2 values of j for which ein/2,j 6= ein/2,i, and zero otherwise, sosumming over j in (23) yields, now taking j to be any index such that ein/2,j 6= ein/2,i,

P (Xi = ei|Xi = 0) =∏r 6=n/2

P (Xrk = eirk, k = 1, . . . , n)

× 2P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,j , Xn/2,j = ein/2,i)

=∏r 6=n/2

P (Xrk = eirk, k = 1, . . . , n)

× 2P (Xn/2,k = ein/2,k, k 6∈ i, j, Xn/2,i = ein/2,i, Xn/2,j = ein/2,j)

= 2n∏r=1

P (Xrk = eirk, k = 1, . . . , n) = 2P (X = ei),

where we have used exchangeability to obtain the second equality.But now, by (22),

P (Xi = ei|Xi = 0) = 2P (X = ei) = 2P (X = ei, Xi = 1) = P (X = ei|Xi = 1),

which is (21). The next lemma will be used to construct our even case coupling. The result is a special case of Lemma

2.1 of [4], but we give a short direct proof to make the paper more self-contained.

9

Lemma 3.2 Suppose X is a sum of nontrivial exchangeable Bernoulli variables X1, . . . , Xn, and that fori ∈ 1, . . . , n the variables Xi

1, . . . , Xin have joint distribution

L(Xi1, . . . , X

in) = L(X1, . . . , Xn|Xi = 1).

Then

Xi =n∑j=1

Xij ,

has the X size biased distribution Xs, as does the mixture XI when I is a random index with values in1, . . . , n, independent of all other variables.Proof. We need to show that Xi satisfies (10), that is, that E[X]Eg(Xi) = E[Xg(X)] holds for a givenbounded continuous g. Now, for such g

E[Xg(X)] =n∑j=1

E[Xjg(X)] =n∑j=1

P [Xj = 1]E[g(X)|Xj = 1].

As exchangeability implies that E[g(X)|Xj = 1] does not depend on j, we have

E[Xg(X)] =

n∑j=1

P [Xj = 1]

E[g(X)|Xi = 1] = E[X]E[g(Xi)],

demonstrating the first result. The second follows from

Ef(XI) =n∑i=1

E[f(XI), I = i] =n∑i=1

E[f(XI)|I = i]P (I = i) =n∑i=1

Ef(Xi)P (I = i)

=n∑i=1

Ef(Xs)P (I = i) = Ef(Xs)n∑i=1

P (I = i) = Ef(Xs).

We now present the proof of Theorem 3.1.

Proof. With Xi as given in the theorem, Lemma 3.1 yields that the hypotheses of Lemma 3.2 are satisfied,so we may conclude that the given Xs has the X size biased distribution. To prove (19), first note that ifXI = 1 then XI = X, hence in this case Xs = X. Otherwise, for XI = 0, recall that for the given I thecollection XI is constructed from X by interchanging the stage n/2, unequal, switch variables Xn/2,I andXn/2,JI . If XJI = 1 then after the interchange XI

I = 1 and XIJI = 0, yielding Xs = X. If XJI = 0 then

after the interchange XII = 1 and XI

JI = 1, yielding Xs = X + 2. Based on the spectral decomposition in the Appendix, we now provide an upper bound to the term (13)

in Theorem 2.1, for this case.Lemma 3.3 Let n be even and X and Xs given by Theorem 3.1. Then for n ≥ 6

∆0 ≤1

2√n

+1

2n+ e−n/2 where ∆0 =

√Var(E(Xs −X|X)).

Proof. Recall the construction of Theorem 3.1, and for notational simplicity let JI = J , so that we maywrite Xs −X = 21XI=0,XJ=0. Expanding the indicator over the possible values of I and J , and then overthe values of the relevant switch variables at stage n/2, we have

1XI=0,XJ=0 =n∑

i,j=1

1Xi=0,Xj=01I=i,J=j

10

=n∑

i,j=1

1Xi=0,Xj=0,Xn/2,i=01I=i,J=j +n∑

i,j=1

1Xi=0,Xj=0,Xn/2,i=11I=i,J=j

=∑i 6=j

1Xi=0,Xj=0,Xn/2,i=0,Xn/2,j=11I=i,J=j +n∑i 6=j

1Xi=0,Xj=0,Xn/2,i=1,Xn/2,j=01I=i,J=j

= 2∑i 6=j

1Xi=0,Xj=0,Xn/2,i=0,Xn/2,j=11I=i,J=j,

where the second to last equality holds almost surely, as the probability of the event I = i, J = j is zerowhenever Xn/2,i and Xn/2,j agree, and the last inequality holds as the final expression is the sum of twoterms which remain the same when interchanging i and j.

Letting F be the σ-algebra generated by X, the collection of all switch variables. The first term in thesum above, 1Xi=0,Xj=0,Xn/2,i=0,Xn/2,j=1, is measurable with respect to F , while for the second conditioningyields

P (I = i, J = j|F) =2n2

1Xn/2,i 6=Xn/2,j,

as for any i, there are n/2 choices for j satisfying the condition in the indicator above. Hence, recalling thatXs −X = 21XI=0,XJ=0, we have

E

(Xs −X

∣∣∣∣ F) = Un where Un =4n2

∑i 6=j

1Xi=0,Xj=0,Xn/2,i=0,Xn/2,j=1, (24)

and

∆20 = Var(Un).

Taking the expectation of Un in (24), by the exchangeability of the (n)2 terms in the sum and identity(58),

EUn =4n2

(n)2g1,1,n,[n],n/2 =14

(1− λn,2,[n]n/2

). (25)

Squaring (24) in order to obtain the second moment of Un, we obtain a sum over indices i1, i2, j1, j2 withi1 6= j1, i2 6= j2, and i1 6= j2, i2 6= j1, so |i1, i2, j1, j2| ∈ 2, 3, 4, and we may write

U2n = U2

n,2 + U2n,3 + U2

n,4 where (26)

U2n,p =

16n4

∑|ii,i2,j1,j2|=p

i1 6=j1,i2 6=j2,i1 6=j2,i2 6=j1

1Xi1=0,Xj1=0,Xn/2,i1=0,Xn/2,j1=11Xi2=0,Xj2=0,Xn/2,i2=0,Xn/2,j2=1.

Beginning the calculation with the main term U2n,4, where all four indices are distinct, taking expectation

and using exchangeability and (59) yields

EU2n,4 =

16n4

(n)4g2,2,n,[n],n/2 =(n− 2

4n

)2 (1− 2λn,2,[n]n/2

+ λn,4,[n]n/2

). (27)

With the inequalities over the summation in (26) in force, the event |i1, i2, j1, j2| = 3 can occur when

a) i1 6= i2, j1 = j2 or b) i1 = i2, j1 6= j2.

Using (60), case a) leads to a contribution of

16n4

(n)3g2,1,n,[n],n/2 =n− 24n2

(1 + λn,1,[n]n/2

− λn,2,[n]n/2− λn,3,[n]n/2

),

11

while in the same manner, using (61), the contribution from case b) is

16n4

(n)3g1,2,n,[n],n/2 =n− 24n2

(1− λn,1,[n]n/2

− λn,2,[n]n/2+ λn,3,[n]n/2

).

Totaling we find

EU2n,3 =

n− 22n2

(1− λn,2,[n]n/2

). (28)

With the inequalities over the summation in (26) in force, the event |i1, i2, j1, j2| = 2 can only occurwhen i1 = i2 and j1 = j2, hence, again using (58),

EU2n,2 =

16n4

(n)2g1,1,n,[n],n/2 =1n2

(1− λn,2,[n]n/2

). (29)

Summing (27), (28) and (29),

EU2n =

(n− 2

4n

)2 (1− 2λn,2,[n]n/2

+ λn,4,[n]n/2

)+

12n

(1− λn,2,[n]n/2

).

Hence, by (25),

Var(Un) =116

(1− 2

n

)2 (1− 2λn,2,[n]n/2

+ λn,4,[n]n/2

)+

12n

(1− λn,2,[n]n/2

)− 1

16

(1− λn,2,[n]n/2

)2

=116

(λn,4,[n]n/2

− λ2n,2,[n]n/2

)+

1− n4n2

(1− 2λn,2,[n]n/2

+ λn,4,[n]n/2

)+

12n

(1− λn,2,[n]n/2

)=

14n

+1

4n2+

116

(λn,4,[n]n/2

− λ2n,2,[n]n/2

)− 1

2n2λn,2,[n]n/2

+1− n4n2

λn,4,[n]n/2.

Now applying Lemma 5.3, for n ≥ 6,

Var(Un) ≤ 14n

+1

4n2+

116

(12e−n + e−2n

)+

12n2

e−n +1− n8n2

e−n

≤ 14n

+1

4n2+ e−n

(116

+1n2

+1

8n

),

and applying the inequality√a+ b ≤

√a+√b for nonnegative a and b yields the claim of the lemma.

With all ingredients at hand, we may now prove the bound for even n.Proof of Theorem 1.1, even case. The size biased coupling given in Theorem 3.1 satisfies the hypotheses ofTheorem 2.1 with B = 2, and the result for the even case follows by applying Theorem 2.1 with δ = 2/σand the bound in Lemma 3.3.

4 Normal Approximation of X: Odd case

Now we move to the case where n = 2m + 1 is odd. Instead of directly forming a size biased coupling ofXs to X, we first couple X closely to a more symmetrical random variable V , for which a coupling like theone in the even case may be applied. The variable V is constructed by randomizing stages m and m+ 1 asfollows. With equal probability we either add an additional switch at stage m or remove a switch at stagem+ 1. We prove a normal bound for V in the same way as for X in the even case, and may then derive thenormal bound for X in the odd case based on X’s proximity to V .

Formally, let X be given with distribution (7) with sr = r, r = 1, . . . , n and set

Jr = j : Xrj = 0 for r = m,m+ 1,

12

so that Jm and Jm+1 are the bulbs that do not get toggled in stages m and m + 1, respectively. LetN ∼ Um,m + 1, Bm ∼ U(Jm) and Bm+1 ∼ U(J cm+1), independent of X and of each other. Now let thecollection V be given by

Vrk =

Xrk r 6= NXmk N = m, k 6= Bm

1 N = m, k = BmXm+1,k N = m+ 1, k 6= Bm+1

0 N = m+ 1, k = Bm+1,

(30)

and

V =n∑k=1

Vk where Vk =

(n∑r=1

Vrk

)mod 2. (31)

In other words, in all stages other than stage N the switch variables that produce V are the ones from thegiven collection X. If N = m, then the switch variables for all bulbs but bulb Bm, chosen uniformly overall bulbs in that stage that were not toggled, are the ones given by X and the switch variable for Bm is setto 1. Similarly, if N = m+ 1, the switch variable of bulb Bm+1, uniformly selected from all the bulbs thatwere toggled in that stage, is no longer toggled.

Various other couplings are possible, with mixed effects. In particular, the same analysis as the onebelow can be applied to the scheme where in stage m with probability 1/2 one additional switch is used and,independently in stage m + 1, with probability 1/2, one fewer. As two variables are affected the bound of1 in (34) increases to 2, but, on the other hand, more symmetry results, and, in particular, the variable soconstructed has mean zero. See [2] where the latter coupling was used to obtain a moderate deviation boundfor X.

Recalling the definitions of sm,m and sm+1,m+1 in (3), we may express the distribution of the collectionof switch variables V succinctly as

V =d

Xsm,m

N = m+ 1Xsm+1,m+1 N = m,

(32)

where Xsm,m and Xsm+1,m+1 are switch variables with distribution (7), and =d denotes equality in distribu-tion. As N ∼ Um,m+ 1, the law L(V) of V is the equal mixture of

L(V|N = m+ 1) = L(Xsm,m) and L(V|N = m) = L(Xsm+1,m+1). (33)

Consequently, for r ∈ m,m+ 1 and r,¬r = m,m+ 1,

P (Vr1 = e1, . . . , Vrn = en|N = r) = P (X¬r,1 = e1, . . . , X¬r,n = en)P (Vr1 = e1, . . . , Vrn = en|N = ¬r) = P (Xr1 = e1, . . . , Xrn = en),

and

L(Vr1, . . . , Vrn) =12L(Xm1, . . . , Xmn) +

12L(Xm+1,1, . . . , Xm+1,n).

Arguing as for (20),

P (Vri = 1) =12

for r ∈ m,m+ 1 which implies P (Vi = 1) =12

for all i = 1, . . . , n.

The following theorem shows how to construct a monotone, bounded size bias coupling to V .

13

Theorem 4.1 With n = 2m + 1 let X have distribution (7) with sr = r, r = 1, . . . , n and let V = Vrk :r, k = 1, . . . , n be constructed from X as in (30). For every i = 1, . . . , n let Vi be given from V as follows.If Vi = 1 then let Vi = V. Otherwise, with M ∼ Um,m + 1 and J i = Uj : VM,j = 1 − VM,i letVi = V irk : r, k = 1, . . . , n where

V irk =

Vrk r 6= MVM,k r = M,k 6∈ i, J iVM,Ji r = M,k = iVM,i r = M,k = J i,

and let V i =∑nk=1 V

ik where

V ik =

r∑j=1

V ijk

mod 2.

Then, with I uniformly chosen from 1, . . . , n and independent of all other variables, the mixture V s = V I

has the V size biased distribution and satisfies

V s − V = 21VI=0,VJI =0,

so in particular V ≤ V s ≤ V + 2. In addition, with probability one,

|X − V | ≤ 1, (34)

where X, given by (8) with s = (1, . . . , n), is the number of bulbs on at the terminal time using switchvariables X.In other words, in stage m or stage m+ 1, as determined by M , when Vi = 0 the switch for bulb i at stageM is interchanged with one having opposite parity.

The proof of Theorem 4.1 follows from the Lemma 4.1, as in the even case.Lemma 4.1 With Vi constructed as in Theorem 4.1,

L(Vi) = L(V|Vi = 1).

Proof. We argue as in Lemma 3.1, highlighting the differences, and use parallel notation, such as writing Jfor J i. It again suffices to show

P (Vi = ei|Vi = 0) = P (V = ei|Vi = 1)

forn∑k=1

eirk =

r r 6∈ m,m+ 1m or m+ 1 r ∈ m,m+ 1,

andn∑r=1

eiri ≡ 1. (35)

Forming the vector of switch variables for bulb k,

Vk = (V1k, . . . , Vnk)

and defining vectors such as eik likewise, using that P (Vi = 0) = 1/2 and that Vik = Vk for k 6∈ i, J,

decomposing by the values taken on by J and M , we have

P (Vi = ei|Vi = 0) = 2m+1∑r=m

n∑j=1

P (Vk = eik, k 6∈ i, j,Vii = eii,V

ij = eij , Vi = 0, J = j,M = r).

14

On the event J = j,M = r the vectors Vii and Vi

j are equal to Vi and Vj with their rth coordinatesinterchanged, and the values at these coordinates are unequal. Hence, letting eijri and ejirj equal eii and eij ,respectively, with their rth coordinates interchanged, the above probability is

2m+1∑r=m

n∑j=1

P (Vk = eik, k 6∈ i, j,Vi = eijri ,Vj = ejirj , Vi = 0, J = j,M = r).

Since the probabilities above are zero if eiri = eirj , and that V ii = 1 implies Vi = 0 when eiri 6= eirj , the sumequals

2m+1∑r=m

n∑j=1

P (Vk = eik, k 6∈ i, j,Vi = eijri ,Vj = ejirj , J = j,M = r)

= 2m+1∑r=m

n∑j=1

P (J = j,M = r)P (Vk = eik, k 6∈ i, j,Vi = eijri ,Vj = ejirj |J = j,M = r).

By the independence of the switch variables in stages s 6∈ r of J = j,M = r, the r, jth summand may bewritten ∏

s6=r

P (Vsk = eisk, k 6∈ i, j, Vsi = eijrsi , Vsj = ejirsj )

× P (J = j,M = r)P (Vrk = eirk, k 6∈ i, j, Vri = eijrri , Vrj = ejirsi |J = j,M = r)

=∏s6=r

P (Vsk = eisk, k = 1, . . . , n)

× P (Vrk = eirk, k 6∈ i, j, Vri = eirj , Vrj = eiri, J = j,M = r),

the last equality due to the fact that eijrsk and ejirsk equal eisk for s 6= r, and otherwise are eirj and eirj ,respectively. As the first product does not depend on j, we may write

P (Vi = ei|Vi = 0) = 2m+1∑r=m

∏s6=r

P (Vsk = eisk, k = 1, . . . , n) (36)

×n∑j=1

P (Vrk = eirk, k 6∈ i, j, Vri = eirj , Vrj = eiri, J = j,M = r)

.

Further decomposing according to the value of N , the term in the final sum equals

P (Vrk = eirk, k 6∈ i, j, Vri = eirj , Vrj = eiri, J = j,M = r) (37)

= P (Vrk = eirk, k 6∈ i, j, Vri = eirj , Vrj = eiri, J = j,M = r,N 6= r)

+ P (Vrk = eirk, k 6∈ i, j, Vri = eirj , Vrj = eiri, J = j,M = r,N = r).

The two terms above must be handled by considering further cases, depending on the values of r, eiriand eirj . Note first that the two probabilities being summed are zero when eiri = eirj , so we consider onlyeiri 6= eirj . For instance, for the first term in (37), specializing to r = m and eimi = 0, eimj = 1 results in

P (Vmk = eimk, k 6∈ i, j, Vmi = 0, Vmj = 1, J = j,M = m,N = m+ 1)1(eimi = 0, eimj = 1)

= P (J = j|Vmk = eimk, k 6∈ i, j, Vmi = 0, Vmj = 1,M = m,N = m+ 1)× P (Vmk = eimk, k 6∈ i, j, Vmi = 0, Vmj = 1,M = m,N = m+ 1)1(eimi = 0, eimj = 1).

15

Note that due to the condition N = m + 1, the last probability above is 0 unless∑k e

imk = m. When∑

k eimk = m and Vmi = 0, any j with status 1 has probability of 1/m of being chosen as J . Using that

V = X when M = m and N = m+ 1, and the independence of M and N from X, the product above equals

1mP (Vmk = eimk, k 6∈ i, j, Vmi = eimj , Vmj = eimi,M = m,N = m+ 1)1(eimi = 0, eimj = 1)

=1mP (Xmk = eimk, k 6∈ i, j, Xmi = eimj , Xmj = eimi,M = m,N = m+ 1)1(eimi = 0, eimj = 1)

=1

4mP (Xmk = eimk, k 6∈ i, j, Xmi = eimj , Xmj = eimi)1(eimi = 0, eimj = 1).

For eimi = 1, eimj = 0, as now there are m + 1 indices k 6= i with Xmk = 0, any of which can be chosenby J , this case leads similarly to the term

14(m+ 1)

P (Xmk = eimk, k 6∈ i, j, Xmi = eimj , Xmj = eimi)1(eimi = 1, eimj = 0).

Summing over j as indicated in (36) leads to the following contribution from the event M = m,N = m+1,taking j below to be any index not equal to i,

14P (Xmk = eimk, k 6∈ i, j, Xmi = 0, Xmj = 1)1(eimi = 0)1

(∑k

eimk = m

)

+14P (Xmk = eimk, k 6∈ i, j, Xmi = 1, Xmj = 0)1(eimi = 1)1

(∑k

eimk = m

)

=14

(n

m

)−1 (1(eimi = 0) + 1(eimi = 1)

)1

(∑k

eimk = m

)

=14P (Xmk = eimk, k = 1, . . . , n).

Likewise, the contribution from M = m,N = m + 1 is (1/4)P (Xm+1,k = eimk, k = 1, . . . , n), yieldingthat the second sum (36), for r = m, equals (1/2)P (Vmk = eimk, k = 1, . . . , n). Arguing in like manner forr = m+ 1 to obtain the same expression, substitution into (36) yields

P (Vi = ei|Vi = 0)

= 2m+1∑r=m

∏s6=r

P (Vsk = eisk, k = 1, . . . , n)× 12P (Vmk = eimk, k = 1, . . . , n)

= 2∏s 6=m

P (Vsk = eisk, k = 1, . . . , n)P (Vmk = eimk, k = 1, . . . , n)

= 2n∏r=1

P (Vrk = eirk, k = 1, . . . , n)

= 2P (V = ei) = P (V = ei|Vi = 1),

by (35).The last claim (34) of the theorem is immediate upon noting that X and V differ by at most one switch

variable, the one indexed by M,BM . Hence Xk = Vk for all k 6= BM , so X and V can differ by at most 1. With the coupling now in hand, we prove a bound to the normal for V .

Theorem 4.2 If n = 2m+ 1, an odd number, and V is given by (31), then

supz∈R

∣∣∣∣P (V − µVσV≤ z)− P (Z ≤ z)

∣∣∣∣ ≤ µVσ2V

∆1 + 3.28µVσ3V

+2σV

,

16

where the mean EV and variance Var(V ) are given by µV and σ2V in (4), respectively, and ∆1 satisfies the

upper bound in (6).Proof. That EV = µV and Var(V ) = σ2

V follows from (2) and (32), and, for the latter equality, the conditionalvariance formula.

By Theorem 4.1, we may apply Theorem 2.1 with δ = 2/σV , and it only remains to prove the upper bound(6) on ∆1 =

√VarE(V s − V |V ), for which we parallel the calculation of Section 3. Again, for notational

simplicity, let J = JI . From Theorem 4.1,

V s − V = 21VI=0,VJ=0.

Decomposing based on the possible values of I, J ,

1VI=0,VJ=0

=n∑

i,j=1

1Vi=0,Vj=01I=i,J=j

=n∑

i,j=1

1Vi=0,Vj=0,VMi=0,I=i,J=j +n∑

i,j=1

1Vi=0,Vj=0,VMi=1,I=i,J=j

=∑i 6=j

1Vi=0,Vj=0,VMi=0,VMj=11I=i,J=j +∑i 6=j

1Vi=0,Vj=0,VMi=1,VMj=01I=i,J=j,

where for the last equality we have used that J always selects a variable with opposite status to VMi.Letting F be the σ-algebra generated by Vrk,M,N, the first terms in the sums above are measurable

with respect to F , while for the second, similar to the identity in the even case analysis,

P (I = i, J = j|F)

=1nm

(1VMi=1,VMj=0,N=m + 1VMi=0,VMj=1,N=m+1

)+

1n(m+ 1)

(1VMi=1,VMj=0,N=m+1 + 1VMi=0,VMj=1,N=m

).

For instance, for the first term there will be n choices for i, and then m choices for j if the variable indexedby i takes the value one and N = m, yielding m zeros from which to choose j, or if the variable indexed byi takes the value zero and N = m+ 1, yielding m ones from which to choose j.

Hence, taking conditional expectation we have

E

(V s − V

∣∣∣∣ F) = Un +Wn,

where

Un = Un,1 + Un,2

:=2nm

∑i 6=j

1Vi=0,Vj=0,VMi=0,VMj=1,N=m+1 +2

n(m+ 1)

∑i 6=j

1Vi=0,Vj=0,VMi=0,VMj=1,N=m

and

Wn = Wn,1 +Wn,2

:=2nm

∑i6=j

1Vi=0,Vj=0,VMi=1,VMj=0,N=m +2

n(m+ 1)

∑i 6=j

1Vi=0,Vj=0,VMi=1,VMj=0,N=m+1.

17

Calculating the expectation of Un,1, consider that for i 6= j,

P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1, N = m+ 1,M = m)= P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1, N = m+ 1|M = m)P (M = m)

=12P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1, N = m+ 1)

=14P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1|N = m+ 1)

=14g1,1,n,sm,m,m,

using (33), that is, that the conditional distribution of V given N = m + 1 is that of Xsm,m, followed by

(57). Considering the case M = m+ 1 similarly and averaging we obtain

P (Vi = 0, Vj = 0, VMi = 1, VMj = 0, N = m) =14g1,1,n,sm,m,m.

Hence, lettingtm = (1, . . . ,m− 1,m,m+ 2, . . . , n),

by (58) we have

EUn,1 =2(n)2nm

14g1,1,n,sm,m,m =

(n)2nm

18

(1− λn,2,tm)m(m+ 1)

(n)2=m+ 1

8n(1− λn,2,tm

).

Likewise, for Un,2, we have that

P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1, N = m,M = m) =14P (Vi = 0, Vj = 0, Vmi = 0, Vmj = 1|N = m)

=14g1,1,n,sm+1,m+1,m,

so withtm+1 = (1, . . . ,m− 1,m+ 1,m+ 2, . . . , n),

after averaging over M = m+ 1, which yields the same result, by (58) we have

EUn,2 =2(n)2

n(m+ 1)14g1,1,n,sm+1,m+1,m =

m

8n(1− λn,2,tm+1).

For Wn,1, on M = m we have

P (Vi = 0, Vj = 0, Vmi = 1, Vmj = 0, N = m,M = m) =14g1,1,n,sm+1,m+1,m,

with the same result on M = m+ 1, hence

EWn,1 =2(n)2nm

14g1,1,n,sm+1,m+1,m+1 =

m+ 18n

(1− λn,2,tm+1)

and likewise

EWn,2 =2(n)2

n(m+ 1)14g1,1,n,sm,m,m+1 =

m

8n(1− λn,2,tm

).

Summing all four contributions,

E(Un +Wn) =14

(1−

λn,2,tm+ λn,2,tm+1

2

),

18

and so

[E(Un +Wn)]2 =164

(λ2n,2,tm

+ 2λn,2,tmλn,2,tm+1 + λ2n,2,tm+1

)− 1

16(λn,2,tm + λn,2,tm+1

)+

116. (38)

To obtain the second moment of Un +Wn, note that

(Un +Wn)2 = U2n,1 + U2

n,2 +W 2n,1 +W 2

n,2

as the summands that define Un,1, Un,2,Wn,1 and Wn,2 are indicators of disjoint events. To calculate theexpectation of U2

n,1, as in the proof of Lemma 3.3, we may write

U2n,1 = U2

n,1,2 + U2n,1,3 + U2

n,1,4 where (39)

U2n,1,p =

4n2m2

∑|ii,i2,j1,j2|=p

i1 6=j1,i2 6=j2,i1 6=j2,i2 6=j1

1Vi1=0,Vj1=0,VMi1=0,VMj1=1,Vi2=0,Vj2=0,VMi2=0,VMj2=1,N=m+1,

with analogous definitions for Un,2,p and a parallel decomposition for Wn.When all four indices are distinct, calculating using (57) and (59), as for the mean above, we find

EU2n,1,4 =

4(n)4n2m2

14g2,2,n,sm,m,m =

m2 − 116n2

(1− 2λn,2,tm+ λn,4,tm

) . (40)

With the inequalities over the summation in (39) in force, the event |i1, i2, j1, j2| = 3 can occur in onlythe following two ways,

a) i1 6= i2, j1 = j2 and b) i1 = i2, j1 6= j2.

It is straightforward to see that the contribution from case a) is

4(n)3n2m2

14g2,1,n,sm,m,m =

m+ 18n2

(1 + λn,1,tm− λn,2,tm

− λn,3,tm) ,

and from case b) is

4(n)3n2m2

14g1,2,n,sm,m,m =

m2 − 18n2m

(1− λn,1,tm− λn,2,tm

+ λn,3,tm) ,

yielding

EU2n,1,3 =

m+ 18n2

(1 + λn,1,tm − λn,2,tm − λn,3,tm) +m2 − 18n2m

(1− λn,1,tm− λn,2,tm

+ λn,3,tm) .

In view of the inequalities in (39), |i1, i2, j1, j2| = 2 can only occur when i1 = i2 and j1 = j2, yielding

EU2n,1,2 =

4(n)2n2m2

14g1,1,n,sm,m,m =

m+ 14n2m

(1− λn,2,tm).

In a completely analogous manner, we find that

EU2n,2,4 =

4(n)4n2(m+ 1)2

14g2,2,n,sm+1,m+1,m+1 =

m2(m− 1)16n2(m+ 1)

(1− 2λn,2,tm+1 + λn,4,tm+1

), (41)

EU2n,2,3 =

m2

8n2(m+ 1)(1 + λn,1,tm+1 − λn,2,tm+1 − λn,3,tm+1

)+

m(m− 1)8n2(m+ 1)

(1− λn,1,tm+1 − λn,2,tm+1 + λn,3,tm+1

),

19

and that

EU2n,2,2 =

m

4n2(m+ 1)(1− λn,2,tm+1).

A parallel analysis yields

EW 2n,1,4 =

m2 − 116n2

(1− 2λn,2,tm+1 + λn,4,tm+1

), (42)

EW 2n,1,3 =

m+ 18n2

(1 + λn,1,tm+1 − λn,2,tm+1 − λn,3,tm+1

)+m2 − 18n2m

(1− λn,1,tm+1 − λn,2,tm+1 + λn,3,tm+1

),

EW 2n,1,2 =

m+ 14n2m

(1− λn,2,tm+1),

and that

EW 2n,2,4 =

m2(m− 1)16n2(m+ 1)

(1− 2λn,2,tm + λn,4,tm) , (43)

EW 2n,2,3 =

m2

8n2(m+ 1)(1 + λn,1,tm − λn,2,tm − λn,3,tm) +

m(m− 1)8n2(m+ 1)

(1− λn,1,tm − λn,2,tm + λn,3,tm) ,

and, lastly, that

EW 2n,2,2 =

m

4n2(m+ 1)(1− λn,2,tm

).

Collecting terms we obtain

E(Un +Wn)2

=2m3 + 6m2 + 5m+ 2

8mn2

+2m2 + 2m+ 18mn2(m+ 1)

(λn,1,tm− λn,3,tm

+ λn,1,tm+1 − λn,3,tm+1)

−(m2 +m+ 1

) (2m2 + 2m+ 1

)8m(m+ 1)n2

(λn,2,tm+ λn,2,tm+1) +

2m3 −m− 116n2(m+ 1)

(λn,4,tm+ λn,4,tm+1).

Assuming without further mention in the remainder of the proof that n ≥ 7, computing the difference ofE(Un+Wn)2− [E(Un+Wn)]2, with the squared expectation given in (38), firstly we have the constant term

2m3 + 6m2 + 5m+ 28m(2m+ 1)2

− 116

=8m2 + 9m+ 416m(2m+ 1)2

=4m(2m+ 1) + 5m+ 4

16m(2m+ 1)2=

14n

+5

16n2+

12(n− 1)n2

,

Except for one mixed term, we only consider bounds for the terms involving tm, as the same boundsresult for tm+1. For the coefficients of λn,1,tm

and λn,3,tm, noting these variables do not appear in (38), the

inequality2m2 + 2m+ 18mn2(m+ 1)

≤ 13n2

and Lemma 5.3 yield the contributions

2m2 + 2m+ 18mn2(m+ 1)

|λn,1,tm| ≤ 1

3n2e1−n/2 ≤ 1

n2e−n/2 and

∣∣∣∣−2m2 + 2m+ 18mn2(m+ 1)

∣∣∣∣ |λn,3,tm| ≤ 1

6n2e−n.

20

For the contribution from λn,2,tm , using Lemma 5.3,∣∣∣∣∣−(m2 +m+ 1

) (2m2 + 2m+ 1

)8m(m+ 1)n2

+116

∣∣∣∣∣ |λn,2,tm| ≤

(5

16n2+

5m+ 216m(m+ 1)n2

)|λn,2,tm

| ≤ 3e−n

8n2.

Lastly we bound the difference∣∣∣∣2m3 −m− 116n2(m+ 1)

λn,4,tm− 1

64(λ2n,2,tm

+ λn,2,tmλn,2,tm+1

)∣∣∣∣=

∣∣∣∣(2m3 −m− 116n2(m+ 1)

− 132

)λn,4,tm

+164(2λn,4,tm

− λ2n,2,tm

− λn,2,tmλn,2,tm+1

)∣∣∣∣≤ 8m2 + 7m+ 3

32(m+ 1)(2m+ 1)212e−n +

164(e−n + 2e−2n

)=

((8m2 + 4m) + 3(m+ 1)

32(m+ 1)(2m+ 1)2

)12e−n +

164(e−n + 2e−2n

)≤

(1

16n+

364n2

)e−n +

164(e−n + 2e−2n

)≤ 1

8ne−n +

164e−n +

132e−2n.

Now adding in the corresponding terms involving tm+1, we conclude that the variance Var(Un + Wn) isbounded by

14n

+5

16n2+

12(n− 1)n2

+ 2(

1n2e−n/2 +

16n2

e−n +3

8n2e−n +

18ne−n +

164e−n +

132e−2n

)≤ 1

4n+

516n2

+1

2(n− 1)n2+

2n2e−n/2 +

(132

+1

4n+

1312n2

+116e−n

)e−n.

Now the upper bound (6) follows from the inequality√a+ b ≤

√a+√b for any a, b ≥ 0.

We now provide a bound for the normal approximation of X in the odd case. We remark that fewer errorterms, and therefore a smaller bound, results when standardizing X as in Theorem 1.1, that is, not by itsown mean and variance but by the (exponentially close) mean and variance of the closely coupled V .Proof of Theorem 1.1: odd n. Letting W = (X − µV )/σV and WV = (V − µV )/σV , since |X − V | ≤ 1 by(34) of Theorem 4.1, we have

|W −WV | = |X − V | /σV ≤ 1/σV . (44)

Letting P (Z ≤ z) = Φ(z) and

CV =µVσ2V

∆1 + 3.28µVσ3V

+2σV

.

Using (44) and Theorem 4.2 we have

P (W ≤ z)− Φ(z) ≤ P (WV − 1/σV ≤ z)− Φ(z)= P (WV ≤ z + 1/σV )− Φ(z + 1/σV ) + Φ(z + 1/σV )− Φ(z)

≤ CV + 1/(σV√

2π).

Similarly,

P (W ≤ z)− Φ(z) ≥ P (WV + 1/σV ≤ z)− Φ(z)= P (WV ≤ z − 1/σV )− Φ(z − 1/σV ) + Φ(z − 1/σV )− Φ(z)

≤ CV + 1/(σV√

2π),

thus demonstrating the claim.

21

5 Appendix: Spectral Decomposition

In [10] the lightbulb chain was analyzed as a composition chain of multinomial type. Such chains in generalare based on a d × d Markov transition matrix P which describes the transition of a single particle in asystem of n identical particles, a subset of which is selected uniformly to undergo transition at each timestep according to P .

In the case of the lightbulb chain there are d = 2 states and the transition matrix P of a single bulb isgiven by

P =[

0 11 0

],

where we let e1 = (0, 1)t and e0 = (1, 0)t denote the 1 and 0 states of the bulb, for on and off, respectively.With b ∈ 0, 1, . . . , n let Pn,b,s be the 2b× 2b transition matrix of a subset of size b of the n total lightbulbswhen s of the n bulbs are selected uniformly to be switched. Letting Pn,0,s = 1 for all n and s, and I2 the2× 2 identity matrix, for n ≥ 1 the matrix Pn,b,s is given recursively by

Pn,b,s =s

n(P ⊗ Pn−1,b−1,s−1) + (1− s

n) (I2 ⊗ Pn−1,b−1,s) for b ∈ 1, . . . , n,

as any particular bulb among the b in the subset considered is selected with probability s/n to undergotransition according to P , leaving the s − 1 remaining switches to be distributed over the remaining b − 1of n − 1 bulbs, and with probability 1 − s/n the bulb is left unchanged, leaving all the s switches to bedistributed.

The transition matrix P is easily diagonalizable as

P = T−1ΓT where T =1√2

[1 1−1 1

]and Γ =

[1 00 −1

],

hence Pn,b,s is diagonalized by

Pn,b,s = ⊗bT−1Γn,b,s ⊗b T (45)

where Γn,0,s = 1, and is otherwise given by the recursion

Γn,b,s =s

n(Γ⊗ Γn−1,b−1,s−1) + (1− s

n) (I2 ⊗ Γn−1,b−1,s) for b ∈ 1, . . . , n. (46)

The next result describes the matrices Γn,b,s more explicitly in terms of a sequence defined through therecursion

ab = (ab−1,ab−1 + 1b−1) for b ≥ 2, with a1 = 0, (47)

where 1b = (1, . . . , 1), a vector of length 2b. For example,

a1 = (0, 1), a2 = (0, 1, 1, 2) and a3 = (0, 1, 1, 2, 1, 2, 2, 3).

Letting an be the nth term of the vector ab for any b satisfying 2b ≥ n results in a well defined sequencea1, a2, . . ..Lemma 5.1 For n ∈ 0, 1, . . . , and b, s ∈ 0, . . . , n, and λn,b,s given by (1), we have

Γn,b,s = diag(λn,a1,s, . . . , λn,a2b ,s).

In particular, with 02b−1 the vector of zeros of length 2b−1, for b ≥ 1

Γn,b,s = diag(λn,a1,s, . . . , λn,a2b−1 ,s,02b−1) + diag(02b−1 , λn,a1+1,s, . . . , λn,a2b−1+1,s).

22

For instance,

Γn,2,s = diag(λn,0,s, λn,1,s, λn,1,s, λn,2,s),

and

Γn,3,s = diag(λn,0,s, λn,1,s, λn,1,s, λn,2,s, λn,1,s, λn,2,s, λn,2,s, λn,3,s).

Proof. As λn,0,s = 1, the lemma is true for b = 0. For the inductive step, assuming the lemma is true forb− 1, by (46) it suffices to verify

s

nλn−1,a,s−1 + (1− s

n)λn−1,a,s = λn,a,s and

− snλn−1,a,s−1 + (1− s

n)λn−1,a,s = λn,a+1,s.

For the first equality, by (1), the left hand side equals

s

n

a∑t=0

(a

t

)(−2)t

(s− 1)t(n− 1)t

+ (1− s

n)

a∑t=0

(a

t

)(−2)t

(s)t(n− 1)t

=a∑t=0

(a

t

)(−2)t

(s

n

(s− 1)t(n− 1)t

+ (1− s

n)

(s)t(n− 1)t

)

=a∑t=0

(a

t

)(−2)t

((s)t+1 + (n− s)(s)t

(n)t+1

)

=a∑t=0

(a

t

)(−2)t

((s)t(s− t+ n− s)

(n)t+1

)

=a∑t=0

(a

t

)(−2)t

((s)t(n− t)

(n)t+1

)

=a∑t=0

(a

t

)(−2)t

(s)t(n)t

= λn,a,s.

The second equality can be shown in similar, though slightly more involved, fashion. We note that [10] expresses these eigenvalues in terms of the hypergeometric function.If the k stages of the process 1, . . . , k use switches s = (s1, . . . , sk), as (45) implies that the matrices

Pn,b,s, s ∈ 0, 1, . . . , n are simultaneously diagonalizable, the overall transition matrix Pn,b,s for a subset ofb bulbs is the product

k∏j=1

Pn,b,sj= ⊗bT−1Γn,b,s ⊗b T = ⊗bT−1diag(λn,a1,s, . . . , λn,a2b ,s)⊗b T, (48)

where

Γn,b,s =k∏j=1

Γn,b,sjand λn,a,s =

k∏j=1

λn,a,sj. (49)

Hence, if π is any permutation of 1, . . . , k, letting π(s) = (sπ(1), . . . , sπ(k)), from (49) we have Γn,b,s =Γn,b,π(s), and now from (48) that

Pn,b,s = Pn,b,π(s). (50)

23

As order is unimportant, we may replace (s1, . . . , sk) by the multiset s1, . . . , sk when convenient.The following lemma computes the probabilities that out of a group of 2r bulbs, starting with initial

conditions such as half the bulbs off and the other half on, after k transitions using s1, . . . , sk switches, allbulbs will be off.Lemma 5.2 With s = (s1, . . . , sk) let λn,a,s be given by (49). Then for any r ∈ 0, 1, . . . with 2r ≤ n,

P (Xi = 0, i = 1, . . . , 2r|X0,i ≡ imod 2, i = 1, . . . , 2r) =1

22r

r∑j=0

(−1)j(r

j

)λn,2j,s,

and for any r ∈ 0, 1, . . . , n,

P (Xi = 0, i = 1, . . . , r|X0,i = 0, i = 1, . . . , r − 1, X0,r = 1) =12r

r∑j=0

(1− 2j/r)(r

j

)λn,j,s, (51)

and

P (Xi = 0, i = 1, . . . , r|X0,i = 1, i = 1, . . . , r − 1, X0,r = 0) =12r

r∑j=0

(−1)j(1− 2j/r)(r

j

)λn,j,s. (52)

Proof. Letting e1 = (0, 1)′ and e0 = (1, 0)′, extracting the appropriate element of the transition matrix, wehave

P (Xi = 0, i = 1, . . . , 2r|X0,i ≡ imod 2, i = 1, . . . , 2r) = (e′1 ⊗ e′0)⊗rPn,2r,se⊗2r0 .

For j ∈ 0, 1, . . . let Ω2r,j be the 22r × 22r diagonal matrix in the variables xk, k ∈ 0, 1, . . . given by

Ω2r,j = diag(xa1+j , . . . , xa22r +j),

and

w2r = (e′0 ⊗ e′1)⊗r ⊗2r T−1 and u2r = ⊗2rTe⊗2r0 .

With Ω2r = Ω2r,0 we verify by induction that

w2rΩ2ru2r =r∑j=0

(−1)j(r

j

)x2j . (53)

Adopting the convention that a⊗0 = 1, sides of (53) equal x0 when r = 0, so we assume (53) holds fornonnegative integers less than r. As e′1T

−1 = (1, 1)/√

2, and e′0T−1 = (1,−1)/

√2 we have

w2r = (e′1 ⊗ e′0)⊗2 T−1 ⊗ w2r−2 =12

(w2r−2,−w2r−2, w2r−2,−w2r−2), (54)

and similarly, as Te0 = (1,−1)′/√

2

u2r =12

(u2r−2,−u2r−2,−u2r−2, u2r−2). (55)

Using (47), we may write

Ω2r =[

Ω2r−1,0 00 Ω2r−1,1

]=

Ω2r−2,0 0 0 0

0 Ω2r−2,1 0 00 0 Ω2r−2,1 00 0 0 Ω2r−2,2

. (56)

24

Now calculating the left hand side of (53) using (54), (55), and (56) yields

14

(w2r−2Ω2r−2,0u2r−2 + w2r−2Ω2r−2,1u2r−2 − w2r−2Ω2r−2,1u2r−2 − w2r−2Ω2r−2,2u2r−2)

=14

(w2r−2Ω2r−2,0u2r−2 − w2r−2Ω2r−2,2u2r−2) .

By the induction hypotheses, the first term is given by (53) with r replaced by r − 1, and the secondterm the same as the first term, but with x2j replaced by x2j+2. Hence, we obtain

w2rΩ2ru2r =14

122(r−1)

r−1∑j=0

(−1)j(r − 1j

)x2j −

122(r−1)

r−1∑j=0

(−1)j(r − 1j

)x2j+2

=

122r

r−1∑j=0

(−1)j(r − 1j

)x2j −

r−1∑j=0

(−1)j(r − 1j

)x2j+2

=

122r

r−1∑j=0

(−1)j(r − 1j

)x2j +

r∑j=1

(−1)j(r − 1j − 1

)x2j

=

122r

r∑j=0

(−1)j(r

j

)x2j ,

as desired.Now to prove the first claim of the lemma note that the diagonalization (45) yields

(e′0 ⊗ e′1)⊗rPn,2r,se⊗2r0 = w2rΓn,2r,su2r,

and apply (53). Equalities (51) and (52) can be shown in a similar, but simpler, manner. Suppose switches s = (s1, . . . , sn) are applied to n bulbs in stages 1, . . . , n, respectively. Then, for stage

l, with α and β nonnegative with α+ β ≤ n,

P (Xl1 = · · · = Xlα = 0, Xl,α+1 = · · · = Xl,α+β = 1) =(n− sl)α(sl)β

(n)α+β,

so we define

gα,β,n,s,l := P (X1 = · · · = Xα+β = 0, Xl1 = · · · = Xlα = 0, Xl,α+1 = · · · = Xl,α+β = 1) (57)

= P (X1 = · · · = Xα+β = 0|Xl1 = · · · = Xlα = 0, Xl,α+1 = · · · = Xl,α+β = 1)(n− sl)α(sl)β

(n)α+β.

By (50), that is, the fact that the switch variables can be applied in any order, the conditional probabilityin (57) is the same as that for the lightbulb process with initial condition Xl1 = · · · = Xlα = 0, Xl,α+1 =· · · = Xl,α+β = 1 which skips stage l. Hence, for given s and l ∈ 1, . . . , n, letting

sl = (s1, . . . , sl−1, sl+1, . . . , sn),

by Lemma 5.2,

gr,r,n,s,l =1

22r

r∑j=0

(−1)j(r

j

)λn,2j,sl

× (n− sl)r(sl)r(n)2r

and

gr−1,1,n,s,l =12r

r∑j=0

(1− 2j/r)(r

j

)λn,j,sl

× (n− sl)r−1(sl)1(n)r

g1,r−1,n,s,l =12r

r∑j=0

(−1)j(1− 2j/r)(r

j

)λn,j,sl

× (n− sl)1(sl)r−1

(n)r.

25

Specializing further, we obtain

g1,1,n,s,l =14

(1− λn,2,sl)sl(n− sl)

(n)2, (58)

g2,2,n,s,l =116

(1− 2λn,2,sl+ λn,4,sl

)(sl)2(n− sl)2

(n)4. (59)

g2,1,n,s,l =18

(1 + λn,1,sl− λn,2,sl

− λn,3,sl)(n− sl)2sl

(n)3. (60)

and

g1,2,n,s,l =18

(1− λn,1,sl− λn,2,sl

+ λn,3,sl)(n− sl)(sl)2

(n)3. (61)

In order to obtain bounds on∆ =

√Var(E(Xs −X|X))

as required by Theorem 1.1, we study products of the eigenvalues of the chain.Lemma 5.3 For all even n ≥ 6 and s = (1, . . . , n/2− 1, n/2 + 1, . . . , n),

|λn,2,s| ≤ e−n and |λn,4,s| ≤12e−n. (62)

For all odd n = 2m+ 1 ≥ 7 and s equal to

tm = (1, . . . ,m− 1,m+ 1, . . . , n) or tm+1 = (1, . . . ,m− 1,m,m+ 2, . . . , n),

both inequalities in (62) hold, as do

|λn,1,s| ≤ e1−n/2 and |λn,3,s| ≤12e−n. (63)

Proof. In the even case we define m by n = 2m, and in the odd case we consider only tm, the argument fortm+1 being essentially identical. We follow and slightly generalize the arguments of [6].

Starting with the first claim in (62), for n ≥ 2 consider the second degree polynomial

f2(x) = 1− 4xn

+4(x)2(n)2

, 0 ≤ x ≤ n.

It is simple to verify that f2(x) achieves its global minimum value of −1/(n− 1) at n/2, and that f2(x) hasexactly two roots, at (n +

√n)/2 and (n −

√n)/2. Hence, as f2(x) ≤ 0 for all x between these roots, and

additionally, as (x− 1)/(n− 1) ≤ x/n for all x ∈ [0, n], we obtain the bound

|f2(x)| ≤

1

n−1 for x ∈ [n−√n

2 , n+√n

2 ](1− 2x

n

)2 for x ∈ [0, n], |x− n/2| >√n/2.

(64)

For x ∈ R let bxc and dxe denote the greatest integer less than or equal to x, and the smallest integergreater than or equal to x. Further, recalling that m is given by n = 2m and n = 2m + 1 in the even andodd cases, respectively, let

t =dn−

√n

2e, · · · , bn+

√n

2c\ m

26

Now, in both the even and odd cases we have

|λn,2,s| =

bn−√

n2 c∏s=0

|f2(s)|

(∏s∈t|f2(s)|

) n∏s=dn+

√n

2 e

|f2(s)|

,

noting additionally that if either (n−√n)/2 or (n+

√n)/2 is an integer then equality holds as both expressions

above are zero. We may henceforth assume neither value is an integer, with similar remarks applying to theproducts (68) and (69).

Applying the bound (64), and that bn/2− xc+ dn/2 + xe = n, yields

|λn,2,s| ≤

bn−√

n2 c∏s=0

(1− 2s

n

)2

(∏s∈t

1n− 1

) n∏s=dn+

√n

2 e

(1− 2s

n

)2

=

bn−√

n2 c∏s=0

(1− 2s

n

)4(

1n− 1

)|t|,

where |t| is the cardinality of t.Using 1− x ≤ e−x for x ≥ 0 and that bxc ≥ x− 1 on the first product, we obtain the bound

|λn,2,s| ≤(e−

2n (bn−

√n

2 c)(bn−√

n2 c+1)/2

)4

e−|t| log(n−1) ≤ e−(n−2√n−1+2/

√n+|t| log(n−1)). (65)

To control |t|, note that as dxe ≤ x+ 1, we have

|t| = bn+√n

2c − dn−

√n

2e ≥√n− 2.

As log 35 ≥ 3.5, for n ≥ 36 we have

−2√n− 1 + 2/

√n+ |t| log(n− 1) ≥ −2

√n− 1 + 3.5(

√n− 2) =

32√n− 8 ≥ 0,

and hence, from (65), that

|λn,2,s| ≤ e−n for n ≥ 36.

Using the given choices for s in the even and odd cases, one may verify directly that λn,2,s satisfies this samebound for all even integers 6 ≤ n ≤ 34, and all odd integers 7 ≤ n ≤ 35, completing the proof for all claimson λn,2,s.

Now turning to λn,4,s, for n ≥ 4 consider the fourth degree polynomial

f4(x) = 1− 8xn

+24(x)2(n)2

− 32(x)3(n)3

+16(x)4(n)4

, 0 ≤ x ≤ n.

It can be checked that the four roots of f4(x) are given by

x1± =n±

√√2√

3n2 − 9n+ 8 + 3n− 42

and x2± =n±

√−√

2√

3n2 − 9n+ 8 + 3n− 42

,

and that additionally the three roots to the cubic equation f ′4(x) = 0 occur at

y1 = n/2 and y2± = (n±√

3n− 4)/2.

27

These roots satisfy

0 < x1− < y2− < x2− < y1 < x2+ < y2+ < x1+ < n.

To obtain a bound over the interval [x1−, x1+], evaluating f4(x) at its critical values we obtain

f4(y1) =3

(n− 1)(n− 3)≤ 3

(n− 3)2and

f4(y2±) = − 2(3n2 − 9n+ 8)n(n− 1)(n− 2)(n− 3)

≥ − 6(n− 3)2

.

To bound f4(x) by f22 (x) in the remaining part of [0, n], write

f22 (x)− f4(x) =

16x(n− x)p(x)(n− 1)2n2 (n2 − 5n+ 6)

where

p(x) = (4n− 6)x2 + (6n− 4n2)x+ n3 − 2n2 + n.

The roots of the quadratic p(x) are given by

z± =n

2± 1

2

√n(n− 2)2n− 3

.

As 5n2 − 15n+ 12 ≥ 0 for all n, we have (2n− 3)(3n− 4) ≥ n(n− 2), and therefore(√2√

3n2 − 9n+ 8 + 3n− 4)

(2n− 3) ≥ n(n− 2).

Dividing by 2n− 3 and taking square roots demonstrates that

x1− ≤ z− < z+ < x1+.

Hence p(x) is nonnegative on the complement of [z−, z+], and we obtain

|f4(x)| ≤ 6

(n−3)2 for x ∈ [x1−, x1+]f22 (x) for x 6∈ [x1−, x1+], x ∈ [0, n].

(66)

Now write for short

C(n) =√√

2√

3n2 − 9n+ 8 + 3n− 4 so that x1− =n− C(n)

2.

Using (66), (64) and that |s − n/2| >√n/2 whenever s ≤ (n − C(n))/2, 1 − x ≤ e−x for x ≥ 0 and finally

bxc ≥ x− 1, we obtain

bn−C(n)2 c∏s=0

|λn,4,s| ≤bn−C(n)

2 c∏s=0

f22 (s) ≤

bn−C(n)2 c∏s=0

(1− 2s

n

)4

≤ e−(n−2(C(n)+1)+C(n)2/n+2C(n)/n). (67)

Now let

u =dn− C(n)

2e, · · · , bn+ C(n)

2c\ m.

28

Using λn,4,s = λn,4,n−s, (66), |u| ≥ C(n)− 2 and (67), we have, in both the even and odd cases, that

|λn,4,s|

=bn−C(n)

2 c∏s=0

|λn,4,s|∏s∈u|λn,4,s|

n∏s=dn+C(n)

2 e

|λn,4,s| ≤

bn−C(n)2 c∏s=0

λ2n,4,s

( 6(n− 3)2

)C(n)−2

(68)

≤ e−2(n−2(C(n)+1)+C(n)2/n+2C(n)/n)+(C(n)−2)(log(6)−2 log(n−3))

= e−n · e−(n−4(C(n)+1)+2C(n)2/n+4C(n)/n+(C(n)−2)(2 log(n−3)−log(6))).

Since for n ≥ 96 we have

n− 4(C(n) + 1) ≥ log(2) and 2 log(n− 3)− log(6) ≥ 0,

we conclude that |λn,4,s| ≤ e−n/2 over this range. One can verify directly that this same bound holds in theeven case for integers 6 ≤ n ≤ 94, and the odd case for integers 7 ≤ n ≤ 95, completing the proof of theclaims on λn,4,s.

Now we consider the bounds (63) for odd n = 2m+1, starting with λn,1,s. Applying the same inequalitiesas before,

m∏s=0

(1− 2s

n

)≤

m∏s=0

e−2s/n = e−2n

∑ms=0 s = e−m(m+1)/n,

and, as (1− 2s/n) = −(1− 2(n− s)/n), we have that

2m+1∏s=m+2

(1− 2s

n

)= (−1)m

m−1∏s=0

(1− 2s

n

)and hence

2m+1∏s=m+2

∣∣∣∣1− 2sn

∣∣∣∣ ≤ e−m(m−1)/n.

Therefore,

|λn,1,tm | ≤ e−2m2/n = e−(n−1)2/(2n) ≤ e−(n/2−1).

for all n ≥ 1.Now, to obtain the claimed bound for λn,3,tm consider the third degree polynomial

f3(x) = 1− 6xn

+12(x)2(n)2

− 8(x)3(n)3

, 0 ≤ x ≤ n.

The three roots of f3(x) are given by

x1± =n±√

3n− 22

and x2 =n

2.

Additionally, f3(x) achieves its local extreme values at

y1± =3n±

√3√

3n− 26

for which |f3(y1±)| = 2(n− 2/3)3/2

n(n− 1)(n− 2)≤ 2

(n− 2)3/2.

Clearly x1− < y1− < x2 < y1+ < x1+, so |f3(x)| ≤ |f3(y1±)| over [x1−, x1+].To bound |f3(x)| over the remaining portion of [0, n], we have(

1− 2xn

)3

− f3(x) =4(3n− 2)x(n− 2x)(n− x)

n3 (n2 − 3n+ 2),

29

demonstrating that f3(x) ≤ (1 − 2x/n)3 for all x ∈ [0, n/2], and hence |f3(x)| ≤ |(1 − 2x/n)3| for allx ∈ [0, x1−]. By the odd symmetry of f3(x) and (1− 2x/n) about x = n/2, we now obtain

|f3(x)| ≤

2

(n−2)3/2 for x ∈ [x1−, x1+]∣∣1− 2xn

∣∣3 for x 6∈ [x1−, x1+], x ∈ [0, n].

Hence, arguing as before,

bx1−c∏s=0

|λn,3,s| ≤

bx1−c∏s=0

(1− 2s

n

)3

≤ e− 34 (n−2

√3n−2+(n−2)/n+2

√3n−2/n),

with the same bound holding for∏ns=dx1+e |λn,3,s|. Therefore, letting

u =dn−

√3n− 22

e, · · · , bn+√

3n− 22

c\ m

we have |u| ≥√

3n− 2− 2 and

|λn,3,tm |

=bx1−c∏s=1

|λn,3,s|∏s∈u|λn,3,s|

n∏s=dx1+e

|λn,3,s| ≤

bx1−c∏s=1

λ2n,3,s

( 2(n− 2)3/2

)√3n−2−2

(69)

≤ e−32 (n−2

√3n−2+(n−2)/n+2

√3n−2/n)+(

√3n−2−2)(log(2)− 3

2 log(n−2))

= e−n · e−( 12n−3

√3n−2+ 3

2 (n−2)/n+3√

3n−2/n+(√

3n−2−2)( 32 log(n−2)−log(2))).

Since for n ≥ 111 we have

12n− 3

√3n− 2 ≥ log(2) and

32

log(n− 2)− log(2) ≥ 0,

we obtain that |λn,3,s| ≤ e−n/2 over this range. Verifying directly that the claims of the lemma hold for odd7 ≤ n ≤ 111 completes the proof.

Bibliography

1. Baldi, P., Rinott, Y. and Stein, C. (1989). A normal approximation for the number of local maxima ofa random function on a graph, In Probability, Statistics and Mathematics, Papers in Honor of SamuelKarlin. T. W. Anderson, K.B. Athreya and D. L. Iglehart eds., Academic Press , 59-81.

2. Gosh, S. and Goldstein, L. (2009). Concentration of measures via size biased couplings. ProbabilityTheory and Related Fields, to appear

3. Goldstein, L. (2005). Berry-Esseen bounds for combinatorial central limit theorems and pattern oc-currences, using zero and size biasing. J. Appl. Probab. 42, 661–683.

4. Goldstein, L. and Rinott, Y. (1996). On multivariate normal approximations by Stein’s method andsize bias couplings. J. Appl. Prob. 33, 1-17.

5. Goldstein, L. and Penrose, M. (2009). Normal approximation for coverage models over binomial pointprocesses. Ann. Appl. Prob, to appear.

6. Rao, C., Rao, M., and Zhang, H. (2007). One Bulb? Two Bulbs? How many bulbs light up? Adiscrete probability problem involving dermal patches. Sankya, 69, pp. 137-161.

30

7. Shao, Q. M. and Su, Z. (2005). The Berry-Esseen bound for character ratios. Proceedings of theAmerican Mathematical Society 134, 2153-2159.

8. Stein, C. (1981) Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9,1135-1151.

9. Stein, C. (1986). Approximate Computation of Expectations. IMS, Hayward, California.

10. Zhou, H. and Lange, K. (2009). Composition Markov chains of multinomial type. Advances in AppliedProbability, 41, pp. 270 - 291.

31

A Berry Esseen Theorem for the Lightbulb Processbcf.usc.edu/~larry/papers/pdf/ltb11.pdf · A Berry...

Documents

Transcript of A Berry Esseen Theorem for the Lightbulb Processbcf.usc.edu/~larry/papers/pdf/ltb11.pdf · A Berry...