Local Central Limit Theorems, the High Order Correlations ... · Local Central Limit Theorems, the

Local Central Limit Theorems, theHigh Order Correlations of Rejective

Sampling, and Applications toConditional Logistic Likelihood

Asymptotics ∗†

Richard Arratia, Larry Goldstein and Bryan LangholzUniversity of Southern California

November 15, 2001

Abstract

Let I1, . . . , In be independent but not necessarily identically dis-tributed Bernoulli random variables, and let Xn =

∑nj=1 Ij . For

ν in a bounded region, a local central limit theorem expansion ofPI (Xn = EI Xn + ν) is developed to any given degree. By condition-ing, this expansion provides information on the high order correlationstructure of dependent, weighted sampling schemes of a population E(a special case of which is simple random sampling) where a set r ⊂ Eis sampled with probability proportional to

∏A∈r xA, where xA are

positive weights associated with individuals A ∈ E. These results areused to derive the asymptotic information of the conditional logisticlikelihood for unmatched case-control study designs in which sets ofcontrols of the same size are sampled with equal probability.

∗AMS 2000 subject classifications. 62N02, 62D05 60F05.†Key words and phrases: local central limit theorem, rejective sampling, high order

correlation, sampling in epidemiology, conditional logistic likelihood.

1

1 Introduction

Our work is motivated by the asymptotic analysis of sampling schemes incase-control studies used in epidemiologic research. In particular, we obtainthe value of the limiting information, and further higher order propertiesof interest, for a likelihood analysis under various cohort sampling designs.Our analysis depends on the properties of the weighted sampling schemeSE,η where a set r of fixed size η is sampled from the population E withprobability proportional to

xr =∏A∈r

xA, (1)

where xA is a non-negative weight associated with individual A ∈ E in thepopulation.

For the sampling scheme SE,η we need estimates that individual A, orindividuals A and B, etc., will be included in the sample. The first, carriedout in Section 2, is to derive a high order local central limit theorem, Theo-rem 2.2, for independent but not necessarily identically distributed Bernoullirandom variables having success probability pj, j = 1, 2, . . ., which is of in-dependent interest.

Next, in Section 3 we extend Theorem 2.2 by showing that this same localcentral limit theorem expansion for the independent Bernoulli variables withsuccess probability

pA,θ =θxA

1 + θxA

holds uniformly for all θ in an interval bounded away from zero and infinity,under some asymptotic stability conditions on the weights xA, A ∈ E. Theconnection between the independent Bernoulli variables and the dependent,weighted scheme SE,η is the content of Lemma 3.5, which shows that con-ditioning the independent variables in the population E to have exactly ηsuccesses results in SE,η for any θ > 0. Choosing the θ so that the expectednumber of successes of the independent variables in E equals η allows forthe application of the local central limit theorem, and the derivation of theneeded inclusion probabilities under SE,η.

In Section 4, we use the properties of the inclusion probabilities to derivethe high order correlation structure of SE,η, given in Theorem 4.1.

In Section 5 all the foregoing is applied to derive the asymptotic infor-mation of the sampling schemes of interest, in Theorem 5.1.

2

1.1 The Statistical Motivation in Detail

Here is a brief overview of the motivating problem. Using notation similarto Langholz and Goldstein (2001), with I an indicator of disease status,we assume that the probability of disease given covariate vector Z has aproportional odds (logistic) form:

PI λ0,β0(I = 1|Z) =λ0x(Z; β0)

1 + λ0x(Z; β0)=

λ0x

1 + λ0x; (2)

where x(Z,0) = x(0, β) = 1, for any β in the parameter space (e.g.,[3, 2]).The parameter λ0 is therefore the baseline odds and x(Z, β0) is the odds ratioassociated with Z. The parameter β0 which quantifies the association be-tween the covariate values and disease outcome is usually the one of interest.

Now consider a “study base” R = 1, . . . , N of N individuals and let(ZA, IA) for A ∈ R be independent copies of (Z, I). Let S be the (random)set of indices of diseased subjects.

Then for r ⊂ R

PI λ0,β0(S = r) =∏A∈r

λ0 xA

1 + λ0 xA

∏A∈R\r

1

1 + λ0 xA

= λ|r|0 xr QR,

where xr =∏

A∈r xA as in (1) and QR =∏

A∈R(1 + λ0 xA)−1.When covariate values for all study base subjects are available, estimation

of the unknown β0 (and λ0) can be achieved by maximizing the likelihoodPI λ,β(S). But when the study base is large or the collection of the full set ofcovariate values is expensive or impractical, it is natural to sample subjects toform a sampled study base E of size |E| < N , and use the collected covariatesin this sample for the estimation of parameters. In particular, in studies ofrare diseases such as cancers, E often consists of all cases and a randomsample of controls. For the typically used methods of control selection suchas frequency matching, fixed size sampling, Bernoulli trial sampling, andcase-base sampling (e.g.,[9, 2, 13, 10]), the probability that the set of casesis r given that the sampled study base is E and the number of cases S is ηis given by ([3, 12, §2.1]

PI β0(S = r∣∣∣E, |S| = η) =

xr∑u⊂E:|u|=η xu

I(r ⊂ E, |r| = η), (3)

with xr as in (1).

3

The conditional logistic likelihood is constructed in the usual way basedon this probability as

L(β) = PI β(S∣∣∣E, |S| = η), (4)

considered as a function of β. With x′ the column vector of derivatives withrespect to β, let

ZA = x′Ax−1A ; (5)

we have ZA = ZA in the model where xA = exp(βT0 ZA). Letting

pA = EI β(IA∣∣∣E, |S| = η) and pAB = EI β(IAIB

∣∣∣E, |S| = η),

the score function for this likelihood is

U(β) =∑A∈S

ZA −∑A∈E

ZApA,

and, with pA + qA = 1, the (expected [12]) information

I(β) =∑A∈E

Z ⊗2A pAqA +

∑A,B∈E,A6=B

ZAZTB(pAB − pApB), (6)

where Z ⊗2 = ZZT. Note that

pAB − pApB = EI β[(IA − pA)(IB − pB)∣∣∣E, |S| = η], (7)

the (conditional) correlation of the joint inclusion of A and B.When the standard likelihood type argument is applied to study the

asymptotic properties of the estimator β which maximizes (4), then to showconsistency we need that the information I(β0) normalized by |E| convergesin probability. Since the information is a double sum over A and B, to havesuch convergence we require that the inclusion correlations (7) decay at rate1/|E|. The remainder term in the Taylor expansion of the log likelihoodcontains a triple sum of terms multiplied by the third order correlation,

EI β[(IA − pA)(IB − pB)(IC − pC)∣∣∣E, |S| = η].

For consistency we also need that 1/|E| times the (triple sum) remainder bebounded in probability; hence the third order correlation above needs to bego to zero as 1/|E|2.

4

Because of the dependence in SE,η created by having the probability ofa set proportional to the product of its individuals weights, a problematicstep in the asymptotic theory is to establish the orders of these correlations,as well as to compute the limiting second order correlation needed to derivethe asymptotic information. This problem has been explored only undervery restrictive situations ([7, 4]). While full treatment of the asymptotictheory for β, and its limiting distribution in particular, will be undertakenelsewhere, the results here provide the order of all high degree correlations(Section 4) as well as the means to compute the limiting information of theconditional logistic likelihood (4) for a large class of case-control samplingdesigns (Section 5.)

1.2 The Probabilistic Setup

Motivated by (3), we define a probability distribution SE,η on the size ηsubsets of E, given by

SE,η(S = r) =xr∑

u⊂E,|u|=η xu

, if r ⊂ E and |r| = η,

where xr is defined in (1). For a random variable V = V (S) let SE,η(V )denote the expectation of V under SE,η, let IA = 1(A ∈ S) be the indicatorthat A is included in the sample S, and set pA = SE,η(IA). We study thehigh order correlations of the form

Corr(H) = SE,η

( ∏A∈H

(IA − pA)

),

for any H ⊂ E; when H = A,B, a set of size 2, Corr(H) = pAB − pApB isthe usual correlation of IA and IB.

When xA = 1 for all A ∈ E (corresponding to β0 = 0 in (2),) SE,η reducesto simple random sampling. In this case, when there exists τ ∈ (0, 1/2] suchthat the sampling fraction η/|E| ∈ [τ, 1− τ ], then as |E| → ∞,

SE,η[(IA − pA)(IB − pB)] =−η(|E| − η)

|E|2(|E| − 1)= Oτ (|E|−1), (8)

andSE,η[(IA − pA)(IB − pB)(IC − pC)] = Oτ (|E|−2), (9)

5

showing that simple random sampling has the rates needed for the stability ofthe information and the control on the remainder in our likelihood analysis;the exact meaning of Oτ is given in Definition 2.1.

For simple random sampling a straightforward calculation shows that

Corr(H) =|H|∑j=0

(|H|j

)(|E|−jη−j

)(|E|η

) (−η|E|

)|H|−j.

For simple random sampling we may write Corr(k) for the common valueof Corr(H) for all H of size k , and we have that as |E|, η → ∞, withη/|E| → f ∈ (0, 1),

|E|Corr(2) → f(f − 1)

|E|2Corr(3) → 2f(f − 1)(2f − 1)

|E|2Corr(4) → 3(f(f − 1))2

|E|3Corr(5) → 20(f(f − 1))2(2f − 1)

|E|3Corr(6) → 15(f(f − 1))3

In fact [1], with N a standard normal variate,

lim|E|→∞

|E|k/2Corr(k) = EI N k(f(f − 1))k/2 for k even,

and

lim|E|→∞

|E|(k+1)/2Corr(k) =1

3(k−1)EIN k+1(f(f−1))(k−1)/2(2f−1) for k odd.

In particular, we see that

Corr(H) = O|H|,τ (|E|−(|H|+|H|mod2)/2), (10)

generalizing (8) and (9). Theorem 4.1 shows that the orders in (10) areobtained for the weighted sampling schemes which motivated this work.

1.3 Rejective Sampling

Simple random sampling is the most ubiquitous of all statistical methods.However, in some cases it is not possible to take a simple random sample, for

6

instance, when the inclusion of the population member A is influenced by acertain non-negative “size” xA associated with the unit A, where the largerthe size of an item the easier it is to locate, and the higher the probabilityof its inclusion.

Sampling proportional to the product xr is termed rejective sampling byHajek [8]; SE,η may be achieved by sampling η individuals independently withreplacement, and rejecting those samples in which the η individuals are notdistinct. Hajek considers the inclusion probabilities, second order correlationsand asymptotic normality of sums obtained by rejective sampling.

Other schemes where objects are sequentially sampled proportional totheir size have been extensively studied, (e.g, [11] or [6]). However, rejectivesampling differs from sampling sequentially proportional to size when η ≥ 2,as can be seen by comparison of the general probability that a sample of sizeη = 2 results in the units A and B. Both schemes reduce to simple randomsampling when the weights are constant.

1.4 Outline

First, a local central limit theorem for Bernuolli variables is provided in Sec-tion 2, where a high order expansion for the number of sampled elements isprovided. By conditioning (Lemma 3.5), in Section 3, we relate the indepen-dent and the weighted sampling schemes, and then apply the local centrallimit expansion for the sum of independent Bernoulli’s to derive a high orderexpansion, Theorem 3.1, for the inclusion of a sampling unit in the weightedscheme. In Section 4, the high order correlation of the weighted scheme isgiven in terms of these inclusion probabilities. In Section 5, we use our resultsto address our motivating question and derive the asymptotic information ofthe conditional logistic likelihood (4) for a large class of case-control samplingdesigns.

2 A High Order Local Central Limit Theo-

rem

The main result of this section is Theorem 2.2, a local central limit theo-rem expansion for the distribution of Xn, the sum of independent but notnecessarily identically distributed indicator random variables. The first step,Lemma 2.1, is to obtain an expression for the characteristic function of the

7

centered sum, Xn−EI Xn. In the following, we write Ω for a complex constant,not necessarily the same at each occurrence, such that |Ω| ≤ 1.

Lemma 2.1 Let

Xn =n∑

j=1

Ij,

where Ij, j = 1, . . . , n are independent Bernoulli variables with EI Ij = pj =1− qj; let

v2n =

n∑j=1

pjqj and w2n =

n∑j=1

pjqj(pj − qj). (11)

Then, denoting the characteristic function of Xn − EI Xn by φn(t), for alln = 1, 2, . . . and |t| ≤ 1,

φn(t) = exp

(−t

2v2n

2+ i

t3w2n

6+t4

10nΩ

). (12)

Furthermore, for all t ∈ [−π, π]

|φn(t)| ≤ exp(−t2v2

n/6). (13)

Proof: The characteristic function of an indicator I which has been centeredby subtraction of its mean p is

EI eit(I−p) = e−itp(q + peit) = qe−itp + peitq.

We have for all t,

qe−itp = q

(1− itp− t2

2p2 + i

t3

6p3 +

t4

24p4Ω

)

peitq = p

(1 + itq − t2

2q2 − i

t3

6q3 +

t4

24q4Ω

),

and adding, we obtain

EI eit(I−p) = 1− t2

2pq + i

t3

6pq(p− q) + Ω

t4

24. (14)

Using that pq(p− q) ≤√

3/18 ≤ 1/9, we have for |t| ≤ 1,

| − t2

2pq + i

t3

6pq(p− q) + Ω

1

24t4| ≤ t2

2

1

4+|t|3

6

1

9+t4

24

≤ 5

27t2 ≤ 1

2.

8

Applying the estimate

log(1 + x) = x+ Ωx2 ∀|x| ≤ 1

2,

we obtain that for |t| ≤ 1,

log(EI eit(I−p)) = −t2

2pq + i

t3

6pq(p− q) + Ω

1

24t4 + Ω

(5

27t2)2

= −t2

2pq + i

t3

6pq(p− q) +

t4

10Ω,

and now summing

log φn(t) = −t2

2

n∑j=1

pjqj + it3

6

n∑j=1

pjqj(pj − qj) + nt4

10Ω.

Exponentiating gives (12).To prove (13), observe that

|φn(t)| =n∏

j=1

|EI eit(I−pj)| =n∏

j=1

(p2

j + q2j + 2pjqj cos(t)

)1/2

=

n∏j=1

(1− 2(1− cos(t))pjqj)

1/2

≤ exp

−(1− cos(t))n∑

j=1

pjqj

.Using (11) and that 1− cos(t) ≥ t2/6 for all −π ≤ t ≤ π, we obtain (13).

Definition 2.1 For a possibly empty set of parameters λ, we will write fn =Oλ(gn) if there exists a constant Cλ and an integer nλ, both depending onlyon λ, such that

|fn| ≤ Cλ|gn| for all n ≥ nλ; (15)

we write fn = oλ(gn) if for every ε > 0 there exists nλ such that (15) holdswith Cλ replaced by ε. We write fn = Θλ(gn) if fn = Oλ(gn) and gn = Oλ(fn).

In the remainder of this section, recalling v2n =

∑nj=1 pjqj, we will assume:

Condition 2.1 There exists ε > 0 and nε such that v2n ≥ εn for all n ≥ nε.

9

Lemma 2.2 Under Condition 2.1, for any C > 0 and N a standard normalrandom variable,

1

2π

∫|t|≤√

C log n/n|t|j exp(−t

2v2n

2)dt =

v−(j+1)n√

2π

(EI |N |j + oε,j(1)

)(16)

= Θε,j(n−(j+1)/2).

Proof: By the change of variable z = vnt, the left hand side of (16) becomes

v−(j+1)n√

2π

∫|z|≤

√Cv2

n log n/n|z|j

exp(− z2

2)√

2πdz =

v−(j+1)n√

2π

(EI |N |j − 2EI N j1(N >

√Cv2

n log n/n)),

but

EI N j1(N >√Cv2

n log n/n) ≤ EI N j1(N >√Cε log n)

= oε,j(1),

as n→∞, by the Dominated Convergence Theorem.For a bounded function on [−π, π], define

||f ||∞ = sup|t|≤π

|f(t)|.

Lemma 2.3 Under Condition 2.1, for any K > 0 and f(t) a bounded mea-surable function on [−π, π], setting

an =√C log n/n with C ≥ 6ε−1K, (17)

we have ∫an<|t|≤π

f(t)φn(t)dt = ||f ||∞O(n−K).

Proof: Using Lemma 2.1,

|∫

an<|t|≤πf(t)φn(t)dt| ≤ ||f ||∞

∫an<|t|≤π

|φn(t)|dt

≤ ||f ||∞∫

an<|t|≤πe−nεt2/6dt

≤ 2π||f ||∞e−nεa2n/6 = 2π||f ||∞n−K .

10

Theorem 2.1 Let φn(t) be the characteristic function of the sum of n inde-pendent centered Bernoulli variables, and suppose that Condition 2.1 holds.Define for j ≥ 0,

In,j =1

2π

∫|t|≤π

tjφn(t)dt and I0n,j =

In,j

In,0

. (18)

Then for j even

In,j ∈ Θε,j(n−(j+1)/2) and nj/2I0

n,j =

(n

v2n

)j/2

EN j + oε,j(1), (19)

and for j oddIn,j ∈ Oε,j(n

−(j+2)/2); (20)

in particular, for all j,

I0n,j ∈ Oε,j(n

−(j+jmod2)/2) for j ≥ 0. (21)

Proof: Let an =√C log n/n with C = 6ε−1 ((j + 3)/2). Lemma 2.3 yields∫

an<|t|≤πtjφn(t)dt = πjO(n−(j+3)/2), (22)

so it suffices to consider the region |t| ≤ an. Take nε,j so that for n ≥ nε,j

an ≤ 1 and na3n ≤ 3. Since |t| ≤ an ≤ 1, (12) of Lemma 2.1 gives

φn(t) = exp

(−t

2v2n

2+ i

t3w2n

6+ nt4

1

10Ω

)

= exp

(−t

2v2n

2

)exp

(it3w2

n

6+ nt4

1

10Ω

).

For n ≥ nε,j we see that na4n/10 ≤ na3

n/6 ≤ 1/2. Therefore, for |t| ≤ an,|x| ≤ 1 where x = it3w2

n/6 + nt4Ω/10. Now using that for |x| ≤ 1,

ex = 1 + x+O(x2),

we have

φn(t) = exp

(−t

2v2n

2

)(1 + i

t3w2n

6

)+ exp

(−t

2v2n

2

)O(nt4 + t6w4

n).

11

Lemma 2.2 shows that the second term contributes a term of orderOε,j(n

−(j+3)/2), since

n∫|t|≤an

|t|j+4 exp

(−t

2v2n

2

)= nΘε,j(n

−(j+5)/2) = Θε,j(n−(j+3)/2)

and

w4n

∫|t|≤an

|t|j+6 exp

(−t

2v2n

2

)= Oε,j(n

2n−(j+7)/2) = Oε,j(n−(j+3)/2).

Now focusing on the contribution from the first term, using v2n = Θ(n) by

Condition 2.1, symmetry and Lemma 2.2 for j even we have

vj+1n In,j =

vj+1n

2π

∫|t|≤an

tj exp

(−t

2v2n

2

)(1 + i

t3

6w2

n

)dt+Oε,j(n

−1)

=vj+1

n

2π

∫|t|≤an

tj exp

(−t

2v2n

2

)dt+Oε,j(n

−1)

=1√2π

(EI N j + oε,j(1)

),

yielding (19).For j odd, again using symmetry,

n(j+2)/2In,j = n(j+2)/2 1

2π

∫|t|≤an

tj exp

(−t

2v2n

2

)(1 + i

t3w2n

6

)dt+Oε,j(n

−1/2)

= n(j+2)/2 iw2n

12π

∫|t|≤an

tj+3 exp

(−t

2v2n

2

)dt+Oε,j(n

−1/2)

=i(w2

n/n)

6√

2π(n

v2n

)(j+4)/2(EI Zj+3 + oτ,j(1)

)+Oε,j(n

−1/2);

the right hand side is now seen to be Oε,j(1).For EI Xn + ν an integer define

fn,ν = PI (Xn = EI Xn + ν). (23)

The following Theorem gives a high order local central limit for the proba-bilities of such deviations from the mean EI Xn.

12

Theorem 2.2 Let I1, I2, . . . independent Bernoulli variables with pj = EIjsatisfying Condition 2.1. For non-negative integers s, define

mν(s) =s∑

j=0

(−iν)j

j!In,j. (24)

Then for given κ and even s,

fn,ν = mν(s) + Θε,κ,s

(n−(s+3)/2

)for all |ν| ≤ κ with EI Xn + ν ∈ N.

Proof: Let

Rn,ν = fn,ν −s∑

j=0

(−iν)j

j!In,j,

g(x) = ex −s∑

j=0

xj

j!,

and an =√C log n/n with C = 6ε−1(s+ 2)/2. By the inversion formula

fn,ν =1

2π

∫|t|≤π

e−itνφn(t)dt,

so

2πRn,ν =∫|t|≤π

g(−itν)φn(t)dt

=∫|t|≤an

g(−itν)φn(t)dt+∫

an<|t|≤πg(−itν)φn(t)dt.

Since |φn(t)| ≤ 1, |ν| ≤ κ, and |g(x)| ≤ Cs|x|s+1, for |x| ≤ π, the first integralmay be bounded by

∫|t|≤an

|g(−itν)|dt ≤ 2an sup|t|≤an

|g(−itν)| ≤ 2Csas+2n κs+1 = Oε,nuappa,s

( log(n)

n

)(s+2)/2 .

Since sup|t|≤π |g(−iνt)| ≤ Cs(πκ)s+1, Lemma 2.3 shows that the second in-

tegral is Oκ,s(n−(s+2)/2) ⊂ Oε,κ,s((log n/n)(s+2)/2). Consequently for all s we

obtain

fn,ν =s+2∑j=0

(−iν)j

j!In,j +Oε,κ,s

( log n

n

)(s+4)/2 .

13

When s is even, we have by Theorem 2.1,

In,s+1 = Oε,s(n−(s+3)/2) and In,s+2 = Θε,s(n

−(s+3)/2);

we now obtain the result by observing

Oε,κ,s

( log n

n

)(s+4)/2 ⊂ Oε,κ,s(n

−(s+3)/2).

3 Finite Population Sampling and Inclusion

Probabilities

To extend the results of Section 2, let there be given for allA ∈ N = 0, 1, . . .a ‘weight’ xA ≥ 0, and for θ > 0 let Tθ be the measure under which IA forA ∈ N are independent indicator variables with success probability

pA,θ =θxA

1 + θxA

. (25)

We will assume that the xA weights are ‘asymptotically stable’ in the follow-ing sense.

Condition 3.1 For all δ ∈ (0, 1), there exists ε ∈ (0, 1] and n ≥ 1 such thatfor any finite E with |E| ≥ n,

1

|E|∑A∈E

1(xA ∈ [ε, 1/ε]) ≥ 1− δ. (26)

The case considered in Section 2 corresponds to θ = 1 and xj = pj/qj.For any finite E ⊂ N, with pA,θ + qA,θ = 1, let

XE =∑A∈E

IA, v2E,θ =

∑A∈E

pA,θqA,θ, (27)

and with Tθ(XE) denoting the expectation of XE with respect to Tθ andTθ(XE) + ν an integer,

fE,θ,ν = Tθ(XE = Tθ(XE) + ν). (28)

In this section, we will provide a local central limit theorem expansion forthe probabilities in (28) which holds uniformly for θ in an interval bounded

14

away from zero and infinity. Conditioning Tθ to have exactly η successesover E yields SE,η (Lemma 3.5), and then by selecting the θ which yieldsTθ(XE) = η, we obtain a high order expansion for the probability that A isincluded by SE,η.

With fE,ν a real valued function defined on finite subsets E ⊂ N andν ∈ R, for a possibly empty collection of parameters λ we say

fE,ν ∈ Oλ(gE)

if there exists Cλ and nλ such that

|fE,ν | ≤ Cλ|gE| for all |E| ≥ nλ.

We say fE,ν ∈ Θλ(gE) when fE,ν ∈ Oλ(gE) and gE ∈ Oλ(fE,ν). Note thatif H and G are any fixed subsets of N then fE,ν = Oλ(|E|−a) if and only iffE,ν = Oλ(|(E \H) ∪G|−a).

To see that Condition 3.1 implies Condition 2.1 in Section 2 uniformlyfor θ in an interval bounded away from zero and infinity, we have

Lemma 3.1 Let Condition 3.1 hold and γ ∈ (0, 1]. Then there exists εγ > 0and nγ such that

v2E,θ ≥ εγ|E| for all θ ∈ [γ, 1/γ] and |E| ≥ nγ. (29)

Proof: Letting δ ∈ (0, 1), ε ∈ (0, 1] and n be any values satisfying (26), set

εγ =(1− δ)γε

(1 + γε)(1 + γ−1ε−1)and nγ = n.

Then, for any |E| ≥ nγ, and θ ∈ [γ, 1/γ],

v2E,θ =

∑A∈E

θxA

1 + θxA

1

1 + θxA

≥∑A∈E

γxA

1 + γxA

1

1 + γ−1xA

≥ εγ|E|.

Now let φE,θ be the characteristic function of XE − Tθ(XE) under themeasure Tθ, and in parallel to (18) and (24), write

IE,θ,j =1

2π

∫|t|≤π

tjφE,θ(t)dt and mE,θ,ν(s) =s∑

j=0

(−iν)j

j!IE,θ,j. (30)

15

Lemma 3.2 Let Condition 3.1 be satisfied and γ ∈ (0, 1]. Then for allθ ∈ [γ, 1/γ], for j even

IE,θ,j ∈ Θγ,j(|E|−(j+1)/2) and |E|j/2I0E,θ,j =

(|E|v2

E,θ

)j/2

EI N j + oγ,j(1),

(31)and for j odd,

IE,θ,j ∈ Θγ,j(|E|−(j+2)/2); (32)

in particular, for all j,

I0E,θ,j ≡

IE,θ,j

IE,θ,0

∈ Oγ,j(|E|−(j+jmod2)/2) for j ≥ 0. (33)

Further, for given κ and even s, for Tθ(XE) + ν ∈ N

fE,θ,ν = mE,θ,ν(s) + Θγ,κ,s(|E|−(s+3)/2) for all |ν| ≤ κ. (34)

Proof: Lemma 3.1 in conjunction with Theorem 2.1 gives (31), (32) and(33), and in conjunction with Theorem 2.2 gives (34).

Now letΨtfE,ν = fE,ν−t for t = 0, 1, . . .,

and

∆0fE,ν = fE,ν , ∆fE,ν = fE,ν − fE,ν−1, and ∆t+1 = ∆∆t, t ≥ 0. (35)

For q a non-negative integer, the following classes Gqλ of functions fE,ν

will play a crucial role:

Gqλ = fE,ν : ∀t ≥ 0,∆tfE,ν = Oλ,t,ν(|E|−(t+q+(t+q)mod2)/2). (36)

Lemma 3.3 Let p ≤ q be non-negative integers, and suppose that fE,ν ∈ Gpλ

and gE,ν ∈ Gqλ. Then

Gpλ ⊃ Gq

λ (37)

afE,ν , fE,ν + gE,ν ∈ Gpλ (38)

∀t ≥ 0, ∆tfE,ν ∈ Gt+pλ (39)

∀t ≥ 0, fE,ν − fE,ν−t ∈ Gp+1λ (40)

∀t ≥ 0, ΨtfE,ν ∈ Gpλ (41)

fE,ν gE,ν ∈ Gp+qλ (42)

16

Proof: Without loss of generality take λ = ∅. Equation (37) follows sincep + j + (p + j) mod 2 is increasing in p. Equation (38) follows by (37) andthe linearity of ∆. Equation (39) follows from the definition of Gq.

For (40), write

fE,ν − fE,ν−t =t−1∑j=0

fE,ν−j − fE,ν−j−1 =t−1∑j=0

∆fE,ν−j;

by (39) the summands are in Gp+1, and hence by (38) so is the sum itself,proving (40). Now ΨfE,ν = fE,ν−1 = fE,ν −∆fE,ν ∈ Gp by (39) and (38); thecase for general t in (41) follows by induction.

The verification of equation (42) requires the following product rule whichcan be easily proved by induction;

∆t(fE,νgE,ν) =∑

0≤j≤t

(t

j

)(Ψt−j∆jfE,ν)(∆

t−jgE,ν).

Note that ∆tΨj = Ψj∆t for all non-negative j, t as

∆(Ψfν) = ∆fν−1 = fν−1 − fν−2 = Ψ(fν − fν−1) = Ψ(∆fν),

thus, using (39) and (41) we have that the sum above has order∑0≤j≤t

Oj,ν(|E|−(p+j+(p+j)mod2)/2)Ot−j,ν(|E|−(t+q−j+(t+q−j)mod2)/2)

= Ot,ν(|E|−(t+p+q)/2)∑

0≤j≤t

O(|E|−((p+j)mod2+(t+q−j)mod2)/2).

If t + p + q is odd, then for all 0 ≤ j ≤ t, one of p + j and t + q − j is oddand the other even, and

(p+ j)mod2 + (t+ q − j)mod2 = 1 for all 0 ≤ j ≤ t,

giving the rate Ot,ν(|E|−(t+p+q+1)/2) = Ot,ν(|E|−(t+p+q+(t+p+q)mod2)/2). Fort+ p+ q even, we have always that

(p+ j)mod2 + (t+ q − j)mod2 ≥ 0 for all 0 ≤ j ≤ t,

giving the order Ot,ν(|E|−(t+p+q)/2) = Ot,ν(|E|−(t+p+q+(t+p+q)mod2)/2).

17

Remark: For t + p + q odd, the worst case rate is always obtained, butnot for t+p+ q even. In particular, when t+p+ q is even and p is even thent+ q must be even, and when j = 0 we have

pmod2 + (t+ q)mod2 = 0,

achieving the worst case rate ofOt,ν(|E|−(t+p+q)/2) = Ot,ν(|E|−(t+p+q+(t+p+q)mod2)/2).However, if p is odd then t+ q must be odd, and we achieve the same worstcase rate at j = 1 only when t ≥ 1. That is, for p and q odd we actuallyhave (for the case t = 0) that fE,νgE,ν ∈ Ot,ν(|E|(p+q+2)/2).

For notational ease, we suppress the variable s in the quantity m0E,θ,ν

defined below. Lemmas 3.3 and 3.2 have the following consequence.

Lemma 3.4 Let Condition 3.1 hold and γ ∈ (0, 1]. Then for all θ ∈ [γ, 1/γ],

m0E,θ,ν ≡

s∑j=0

(−iν)j

j!I0

E,θ,j ∈ G0γ,s, (43)

and for any number a, defining

n0E,θ,ν = m0

E,θ,ν − a∆m0E,θ,ν , (44)

we haven0

E,θ,ν − 1 ∈ Oγ,s,ν(|E|−1), (45)

andn0

E,θ,ν − 1 ∈ G0γ,s. (46)

Proof: Note that ∆tνj = 0 for j < t, and hence for 0 ≤ t ≤ s,

∆tm0E,θ,ν =

s∑j=t

(−i)j∆tνj

j!I0

E,θ,j

=s∑

j=t

(−i)j∆tνj

j!Oγ,j(|E|(j+jmod2)/2)

= Oγ,s,t,ν(|E|(j+jmod2)/2).

For t > s ∆tm0E,θ,ν = 0. This proves (43).

Note that since I0E,θ,0 = 1 by definition,

m0E,θ,ν = 1 +Oγ,s,ν(|E|−1);

18

combining this fact with

∆m0E,θ,ν = Oγ,s,ν(|E|−1)

we see (45) follows from the definition (44) of n0E,θ,ν and (38) of Lemma 3.3.

Lemma 3.3 and mE,θ,ν ∈ G0γ,s gives ∆mE,θ,ν ∈ G1

γ,s and a∆mE,θ,ν ∈ G1γ,s,

therefore nE,θ,ν = mE,θ,ν − a∆mE,θ,ν ∈ G0γ,s. Since 1 ∈ G0

γ,s, (38) of Lemma3.3 gives (46).

Let E be a finite subset of N, and recall the definition of xr given in (1).Given an integer 0 ≤ η ≤ |E|, define the probability distribution SE,η of theset S ⊂ E by

SE,η(S = r) =xr∑

u⊂E:|u|=η xu

, if r ⊂ E and |r| = η. (47)

For convenience, we will write, for instance, SE,η(A) in place of SE,η(A ∈ S).Now recall the product measure T with marginals given by (25), such thatfor all r ⊂ E,

Tθ(A ∈ E : IA = 1 = r) = θ|r|(∏

A∈E

1

1 + θxA

)xr. (48)

The following Lemma provides a key relation between Tθ and SE,η; the quan-tity XE is as in (27).

Lemma 3.5 For any (E, η) with 0 ≤ η ≤ |E|, r ⊂ E with |r| = η and θ > 0,

SE,η(r) = Tθ(A ∈ E : IA = 1 = r|XE = η),

and when 0 < η ≤ |E|, for A ∈ E and F = E \ A,

SE,η(A) =pA,θTθ(XF = η − 1)

pA,θTθ(XF = η − 1) + qA,θTθ(XF = η). (49)

Proof: Summing (48) over sets u ⊂ E with |u| = η gives

Tθ(|A ∈ E : IA = 1| = η) = θη

(∏A∈E

1

1 + θxA

) ∑u⊂E,|u|=η

xu, (50)

and now, since |r| = η, division of (48) by (50) yields (47). Next,

SE,η(A) = Tθ(IA = 1|XE = η) =Tθ(IA = 1, XE = η)

Tθ(XE = η)

=Tθ(IA = 1, XF = η − 1)

Tθ(IA = 1, XF = η − 1) + Tθ(IA = 0, XF = η),

19

and (49) now follows using the independence of the variables IA and XF

under Tθ.In the following, for τ ∈ (0, 1/2], let

Eτ = (E, η) : τ ≤ η/|E| ≤ 1− τ.

Lemma 3.6 Suppose that Condition 3.1 is satisfied. Then for all τ ∈ (0, 1/2]there exists γτ ∈ (0, 1] and nτ depending only on τ such that for all (E, η) ∈Eτ with |E| ≥ nτ there exists a unique solution, θ(E, η), to the equation

hE(θ) =η

|E|where hE(θ) =

1

|E|∑A∈E

θxA

1 + θxA

= η, (51)

withθ(E, η) ∈ [γτ , 1/γτ ]. (52)

Proof: Let δ = (1/2) minτ, 1 − 2τ and take ε and nτ = n satisfying (26)for this δ. Then for all |E| ≥ nτ and θ > 0,

(1− δ)θε

1 + θε≤ 1

|E|∑A∈E

θxA

1 + θxA

≤ (1− δ)θε−1

1 + θε−1+ δ. (53)

Hence hE(θ), continuous and strictly increasing on [0,∞) as a functionof θ, satisfies

limθ→0

hE(θ) ≤ δ and limθ→∞

hE(θ) ≥ 1− δ.

Since δ < τ ≤ η/|E| ≤ 1− τ < 1− δ, there exists a unique value θ(E, η) in(0,∞) for which hE(θ) takes on the value η/|E|.

Since η/|E| ∈ [τ, 1− τ ] and θ(E, η) solves (51), by (53)

(1− δ)θ(E, η)ε

1 + θ(E, η)ε≤ 1− τ, and τ ≤ (1− δ)

θ(E, η)ε−1

1 + θ(E, η)ε−1+ δ,

yielding respectively

θ(E, η) ≤ 1− τ

ε(τ − δ)and θ(E, η) ≥ ε(τ − δ)

1− τ.

Verifying that 0 < (τ − δ)/(1− τ) ≤ 1 completes the proof of claim (52).

20

Theorem 3.1 Suppose that Condition 3.1 is satisfied and let be given τ ∈(0, 1/2], κ ∈ N and s even. Then there exists nτ such that for all (E, η) ∈ Eτ

with |E| ≥ nτ , ϑ = θ(E, η) exists, and for all |k| ≤ κ,

SE,η+k(A) = pA,ϑm0F,ϑ,ν−1

s/2∑l=0

(−1)l(n0F,ϑ,ν − 1)l + Θτ,κ,s(|E|−(s+2)/2), (54)

for all A ∈ E, where F = E \ A, ν = k + pA,ϑ, and m0F,θ,ν and n0

F,θ,ν aredefined in (43) and (44). In particular, for all |ν| ≤ κ,

SE,η+k(A) = pA,ϑ +Oτ,κ(|E|−1) and (55)

∆SE,η+k(A) = pA,ϑqA,ϑI0F,ϑ,2 +Oτ,κ(|E|−2).

Proof: By Lemma 3.6, the solutions ϑ = θ(E, η) exists for all (E, η) ∈ Eτ

with |E| ≥ nτ , and lie in an interval [γτ , 1/γτ ] for some γτ > 0 dependingonly on τ .

Hence, first applying Lemma 3.5,

SE,η+k(A) =pA,θTθ(XF = η + k − 1)

pA,θTθ(XF = η + k − 1) + qA,θTθ(XF = η + k)for all θ > 0

=pA,ϑTϑ(XF = η + k − 1)

pA,ϑTϑ(XF = η + k − 1) + qA,ϑTϑ(XF = η + k)upon setting θ = ϑ.

Since η + k = Tϑ(XE) + k = Tϑ(TF ) + pA,ϑ + k = Tϑ(TF ) + ν,

SE,η+k(A) =pA,ϑfF,ϑ,ν−1

pA,ϑfF,ϑ,ν−1 + qA,ϑfF,ϑ,ν

.

Letting Θ(s/2) = Θτ,κ,s(|E|−s/2) for short and applying Lemma 3.2, thequantity above equals

pA,ϑmF,ϑ,ν−1 + Θ((s+ 3)/2)

pA,ϑmF,ϑ,ν−1 + qA,ϑmF,ϑ,ν + Θ((s+ 3)/2)

=pA,ϑm

0F,ϑ,ν−1 + Θ((s+ 2)/2)

pA,ϑm0F,ϑ,ν−1 + qA,ϑm0

F,ϑ,ν + Θ((s+ 2)/2)since IF,ϑ,0 = Θτ (|E|−1/2),

=pA,ϑm

0F,ϑ,ν−1 + Θ((s+ 2)/2)

n0F,ϑ,ν + Θ((s+ 2)/2)

=pA,ϑm

0F,ϑ,ν−1

n0F,ϑ,ν

+ Θ((s+ 2)/2)

=pA,ϑm

0F,ϑ,ν−1

1 + (n0F,ϑ,ν − 1)

+ Θ((s+ 2)/2).

21

Equation (45) of Lemma 3.4 gives n0F,ϑ,ν − 1 ∈ Oτ,ν,s(|E|−1), hence a Taylor

expansion in x of the quotient 1/(1 + x) to order s/2 yields an error term oforder (n0

F,ϑ,ν − 1)s/2+1 ∈ Oτ,ν,s(|E|−(s+2)/2), and therefore (54).Using s = 2 in (54) and collecting terms of order Oτ,ν(|E|−2), we obtain

SE,η+k(A) = pA,ϑ

(1 + qA,ϑ(iI0

F,ϑ,1 + (ν − 1/2)I0F,ϑ,2) +Oτ,ν(|E|−2)

), (56)

proving (55).Under the hypotheses of Theorem 3.1 we have the following

Corollary 3.1SE,η+k(A) ∈ G0

τ . (57)

Proof: For t = 0, ∆0SE,η+k(A) = SE,η+k(A) = Oτ,k(1). Given arbitraryt ≥ 1 take

s = t+ tmod2− 2 and κ = t.

Since m0F,ϑ,ν ∈ G0

τ and (46) of Lemma 3.4 gives n0F,ϑ,ν − 1 ∈ G0

τ repeatedapplication of Lemma 3.3 shows

pA,ϑmF,ϑ,ν

s/2∑l=1

(n0F,ϑ,ν − 1)l ∈ G0

τ .

For the error term, by the choice s+ 2 = t+ tmod2,

∆tOτ,t(|E|−(s+2)/2) = ∆tOτ,t(|E|−(t+tmod2)/2) = Oτ,t(|E|−(t+tmod2)/2).

Therefore∆tSE,η+k(A) = Oτ,t,k(|E|−(t+tmod)/2)

for all t ≥ 0, and (57) follows.Standard manipulations shows that (54) can be written

SE,η+k(A) = pA,ϑ

1− qA,ϑ∆m0F,ϑ,ν

s/2−1∑l=0

(−1)l(n0F,ϑ,ν − 1)l + Θτ,κ,s(|E|−(s+2)/2)

,from which it is easier to see (56), and also that, although SE,η+k(A) ∈ G0

τ ,we actually have ∆SE,η+k(A) ∈ G2

τ .

22

4 High Order Weighted Sampling Correla-

tions

For E ⊂ N and sets (u ∪ v) ⊂ E with u ∩ v = ∅ and η ≥ |u|, we define the(conditonal) measure Su,v

E,η supported on sets r ⊂ E of size η with r ⊃ u andr ∩ v = ∅ by

Su,vE,η(S = r) = SE,η(S = r|S ⊃ u, S ∩ v = ∅); (58)

that is, Su,vE,η is the measure SE,η conditioned to contain every element of

u but none of the elements of v. The measures considered in the previoussections were unconditioned, and therefore represent the special case

SE,η = S∅,∅E,η;

the ∅, ∅ superscript may be omitted. We define the (commutative) difference∆B on the measure Su,v

E,η for B ∈ E \ (u ∪ v) by

∆BSu,vE,η = Su∪B,v

E,η − Su,v∪BE,η .

For k ∈ N the operators ∆k will continue to be used in accordance with (35).The following Lemma gives some key properties of the conditional mea-

sure Su,vE,η , including a useful relation to its unconditional version SE,η.

Lemma 4.1 Let u,v be disjoint subsets of E. For H ⊂ E \ (u ∪ v),

∆HSu,vE,η =

∑α∪β=H,α∩β=∅

(−1)|β|Su∪α,v∪βE,η . (59)

For r ⊂ E such that r ⊃ u and r ∩ v = ∅,

Su,vE,η(r) = SE\(u∪v),η−|u|(r \ u), (60)

and for A 6∈ (u ∪ v) and H ⊂ E \ (u ∪ v ∪ A),

∆HSu,vE,η(A) = (−1)|H|∆|H|Su,v

E\H,η(A). (61)

Proof: Relation (59) can be shown by induction. By definition (47) andthat of conditional probability, using u ⊂ r ⊂ E \ v we have

Su,vE,η(r) =

xr∑w:u⊂w⊂E,w∩v=∅,|w|=η xw

=xr\u∑

w:u⊂w⊂E,w∩v=∅,|w|=η xw\u

23

since both r and w contain u and the factor xu, which appears in bothxr = xr\uxu and xw = xw\uxu, can be cancelled. Furthermore, because u,vare disjoint, w ∩ v = ∅ if and only if (w \ u) ∩ v = ∅, and when u ⊂ w and|w| = η then |w \ u| = η − |u|. Hence,

w \ u : u ⊂ w ⊂ E,w ∩ v = ∅, |w| = η= w \ u : w \ u ⊂ E \ u, (w \ u) ∩ v = ∅, |w \ u| = η − |u|= w : w ⊂ E \ u, w ∩ v = ∅, |w| = η − |u|= w : w ⊂ E \ (u ∪ v), |w| = η − |u|

and Su,vE,η(r) equals

xr\u∑w:w⊂E\(u∪v), |w|=η−|u| xw

= SE\(u∪v),η−|u|(r \ u).

This proves (60).It suffices to prove (61) for (u,v) = (∅, ∅). First, note for A 6∈ α ∪ β

Sα,βE,η(A) =

∑r3A,α⊂r⊂E,r∩β=∅

Sα,βE,η(r)

=∑

r3A,α⊂r⊂E,r∩β=∅SE\(α∪β),η−|α|

=∑

r3A,r⊂E\(α∪β)

SE\(α∪β),η−|α|(r)

= SE\(α∪β),η−|α|(A); (62)

hence since A 6∈ H,

∆HSE,η(A) =∑

α∪β=H,α∩β=∅(−1)|β|Sα,β

E,η(A) by (59)

=∑

α∪β=H,α∩β=∅(−1)|β|SE\(α∪β),η−|α|(A) by (62)

=|H|∑j=0

∑α∪β=H,α∩β=∅,|α|=j

(−1)|H|−jSE\H,η−j(A)

=|H|∑j=0

(|H|j

)(−1)|H|−jSE\H,η−j(A)

= (−1)|H||H|∑j=0

(|H|j

)(−Ψ)jSE\H,η(A)

24

= (−1)|H|(1−Ψ)|H|SE\H,η(A)

= (−1)|H|∆|H|SE\H,η(A).

In parallel to definition (36), for functions fE for which versions fα,βE are

defined (such as fE = Su,vE,η), for G ⊂ N let

Γqλ(G) = fE : ∆HfE = Oλ,|H|(|E|−(|H|+q+(|H|+q)mod2)/2) for all H ∩G = ∅.

In parallel to Lemma 3.3, we have

Lemma 4.2 Let p ≤ q be non-negative integers, and suppose that fF ∈Γp

λ(P ) and gF ∈ Γqλ(Q). Then

If P ⊂ Q, Γpλ(P ) ⊃ Γq

λ(Q) (63)

afF ∈ Γpλ(P ), fF + gF ∈ Γp

λ(P ∪Q) (64)

If P ∩H = ∅, then ∆HfF ∈ Γp+|H|λ (P ∪H) (65)

f(F\H)∪G ∈ Γpλ(P ) (66)

fF gF ∈ Γp+qλ (P ∪Q) (67)

The proof, being parallel to that of Lemma 3.3, is omitted.

Lemma 4.3 Let Condition 3.1 hold and τ ∈ (0, 1/2]. For (E, η) ∈ Eτ ,G ⊃ (u ∪ v) and G ∩ A = ∅,

Su,vE,η(A) ∈ Γ0

τ (G ∪ A) (68)

Su,vE,η(A)Su,v

E,η(A) ∈ Γ0τ (G ∪ A) (69)

and Su,vE,η(A)− SE,η(A) ∈ Γ1

τ (G ∪ A) (70)

Proof: For H ⊂ E \ (G ∪ A), by (61) and (60)

∆HSu,vE,η(A) = (−1)|H|∆|H|Su,v

E\H,η(A)

= (−1)|H|∆|H|SE\(H∪u∪v),η−|u|(A).

The result (68) now follows by (57) of Corollary 3.1. Since 1 ∈ Γ0τ (G∪A) we

have Su,vE,η(A) = 1− Su,v

E,η(A) ∈ Γ0τ (G ∪ A), and hence (69) using Lemma 4.2.

25

Next, if B ∈ v 6= ∅,

Su,vE,η(A)− S

u∪B,v\BE,η (A) = −∆BS

u,v\BE,η (A),

which is in Γ1τ (G∪A) by (68). Iterating over all elements in v and using that

Γ1τ (G ∪ A) is closed under addition, we obtain

Su,vE,η(A)− Su∪v,∅

E,η (A) ∈ Γ1τ (G ∪ A). (71)

Next, for B ∈ u ∪ v

SE,η(A) = SE,η(B)SB,∅E,η(A) + SE,η(B)S∅,BE,η(A)

= (1− SE,η(B))SB,∅E,η(A) + SE,η(B)S∅,BE,η(A)

= SB,∅E,η(A)− SE,η(B)(SB,∅

E,η(A)− S∅,BE,η(A)).

Rearranging,SB,∅

E,η(A)− SE,η(A) = SE,η(B)∆BSE,η(A).

Since SE,η(B) ∈ Γ0τ (G ∪ A) and ∆BSE,η(A) ∈ Γ1

τ (G ∪ A) their product is in

∆BSE,η(A) ∈ Γ1τ (G∪A) by (4.2) and therefore SB,∅

E,η(A)−SE,η(A) ∈ Γ1τ (G∪A).

Iterating over all B ∈ u∪v and using that Γ1τ (G∪A) is closed under addition,

we haveSu∪v,∅

E,η (A)− SE,η(A) ∈ Γ1τ (G ∪ A),

and now by (71) and the closure property of Γ1τ (G ∪ A), (70) follows.

For short, write

pu,vA = Su,v

E,η(A) and qu,vA = 1− pu,v

A ,

as usual, for (u,v) = (∅, ∅) we omit the superscripts.

Lemma 4.4 For any V and A 6∈ u ∪ v,

Su,vE,η ((IA − pA)V ) = ((pu,v

A − pA) + pu,vA qu,v

A ∆A)Su,vE,η(V ). (72)

Proof: Adding and subtracting pu,vA we have,

Su,vE,η ((IA − pA)V ) = Su,v

E,η ((IA − pu,vA )V ) + (pu,v

A − pA)Su,vE,η (V )

and

Su,vE,η ((IA − pu,v

A )V ) = pu,vA Su∪A,v

E,η ((1− pu,vA )V ) + (1− pu,v

A )Su,v∪AE,η (−pu,v

A V )

= pu,vA qu,v

A

(Su∪A,v

E,η (V )− Su,v∪AE,η (V )

)= pu,v

A qu,vA ∆ASu,v

E,η(V ).

26

Theorem 4.1 Let Condition 3.1 hold, τ ∈ (0, 1/2], (E, η) ∈ Eτ and u,v besubsets of E with G ⊃ (u ∪ v). Then if

Su,vE,η(V ) ∈ Γq

τ (G),

then for G ∩ A = ∅,

Su,vE,η((IA − pA)V ) ∈ Γq+1

τ (G ∪ A),

and for G ∩H = ∅,

Su,vE,η(

∏A∈H

(IA − pA)V ) ∈ Γq+|H|τ (G ∪H).

In particular, when V = 1 and G = u = v = ∅, since 1 ∈ Γ0τ (∅), we have

Corr(H) ≡ SE,η(∏

A∈H

(IA − pA)) ∈ Γ|H|τ (H),

and therefore in particular,

Corr(H) = Oτ,|H|(|E|−(|H|+|H|mod2)/2).

Proof: By Lemma 4.2

∆ASu,vE,η(V ) ∈ Γq+1

τ (G ∪ A);

since pu,vA qu,v

A ∈ Γ0τ (G), using Lemma 4.2 again yields

pu,vA qu,v

A ∆ASu,vE,η(V ) ∈ Γq+1

τ (G ∪ A).

Lastly, sincepu,v

A − pA ∈ Γ1τ (G ∪ A)

we also have that

(pu,vA − pA)Su,v

E,η(V ) ∈ Γq+1τ (G ∪ A).

The first result now follows from Lemma 4.4, and the second result by thefirst and induction.

27

5 Application

We now return to the motivating problem for this work, and in particularcalculate the limiting variance and information of the conditional logisticlikelihood discussed in Section 1. Recalling the model given in the Introduc-tion, we assume that under the probability measure PI having expectation EI ,the covariate vector and failure indicator (ZA, IA) are independent copies of(Z, I) for A ∈ R, where the dependence of failure on the covariate is specifiedin (2). We also assume that EI |ZA|2 <∞ where ZA is given in (5). In muchof what follows, we will suppress the dependence of the weights x = x(Z, β0)on Z and β0. We assume that xA lies with probability one in (0,∞);

PI (0 < xA <∞) = 1. (73)

We assume also that the asymptotic proportion of cases η = |S| in thesampled study base |E| converges in probability:

η

|E|p→ f ∈ (0, 1). (74)

Setting M = 1/f , this limit yields an asymptotic ratio of 1 case (I = 0) toevery M−1 controls (I = 0). We are led therefore to consider the probabilitymeasure PI M defined by its associated expectation EI M through

EI M [V ] =1

MEI [V |I = 1] +

M − 1

MEI [V |I = 0],

for any bounded measurable random variable V . Under PI M we continue tohave EI M |ZA|2 < ∞, that ZA are iid ,and that the weights xA satisfy (73).Note that in contrast to the previous Sections, we now have weights xA whichare random rather than fixed. Probability statements in this section, such asconvergence in probability, are with respect to PI M .

We will say the functions gE(θ) converges uniformly to g(θ) in probabilityon [0,∞) as |E| → ∞ if

supθ∈[0,∞)

|gE(θ)− g(θ)| p→ 0 as |E| → ∞.

Recalling that pA,θ = θxA/(1 + θxA) and qA,θ + pA,θ = 1, define fork = 0, 1, 2,

hE(θ) =1

|E|∑A∈E

pA,θ and ek,E(θ) =1

|E|∑A∈E

Z ⊗kA pA,θqA,θ.

28

By the law of large numbers we have

∀θ ∈ [0,∞) hE(θ)p→ h(θ) and ek,E(θ)

p→ ek(θ) as |E| → ∞ (75)

where

h(θ) = EI M

[θxA

1 + θxA

]and ek(θ) = EI M

[Z ⊗kθxA

(1 + θxA)2

]. (76)

The functions h(θ) and ek(θ) are continuous by the dominated conver-gence theorem. Since the weight xA is not degenerate by (73), h(θ) strictlyincreases from 0 to 1 as θ increases from 0 to∞, and hence for every f ∈ (0, 1)there exists a unique θf ∈ (0,∞) such that

h(θf ) = f. (77)

The proof of the following Lemma is modeled after Theorem 16(a) inFerguson, and appears at the end of this Section.

Lemma 5.1 Let xA be iid and satisfy (73). Then the functions hE(θ) andek,E(θ) converge uniformly in probability on [0,∞) to h(θ) and ek(θ) respec-tively as |E| → ∞.

Letting τ = (1/2) min(f, 1− f), under (74),

(E, η) ∈ Eτ with probability tending to 1.

We can now show the following proposition:

Proposition 5.1 If xA are iid and satisfy (73) and (74) holds, then

θ(E, η)p→ θf and ek,E(θ(E, η))

p→ ek(θf ) as |E| → ∞. (78)

Proof: On ΩE, the set where (E, η) ∈ Eτ , θ(E, η) exists and hE(θ(E, η)) =η/|E|. Hence, by uniformity of the convergence of hE to h and (74), as|E| → ∞,

hE(θ(E, η))− h(θ(E, η))−(η

|E|− f

)p→ 0 on ΩE.

But since h(θf ) = f on ΩE and PI (ΩE) ↑ 1,

h(θ(E, η))p→ h(θf ).

29

Since h(θ) is continuous and strictly increasing we have

θ(E, η)p→ θf .

The second part of (78) follows by the first part of (78) and the uniformconvergence of ek,E(θ) to ek(θ), given in Lemma 5.1.

Under the i.i.d. assumption on xA and (73), the following “in probability”form of Condition 3.1 is implied by the weak law of large numbers:

Condition 5.1 For all δ ∈ (0, 1), there exists ε ∈ (0, 1] and n ≥ 1 such thatfor any finite E with |E| ≥ n,

1

|E|∑A∈E

1(xA ∈ [ε, 1/ε]) ≥ 1− δ with probability at least 1− δ. (79)

With the hypotheses of say, Theorem 2.2 holding in probability, the con-clusion holds where the order is also in probability, where we say that gE isOλ(hE) in probability if for every δ ∈ (0, 1] there exists Cλ,δ and nλ,δ suchthat for all |E| ≥ nλ,δ,

|gE| ≤ Cλ,δ|hE| with probability at least 1− δ.

We now have all the necessary ingredients to calculate the limiting in-formation (inverse variance) for the conditional logistic likelihood for a largeclass of study base sampling designs.

Theorem 5.1 Consider a study base R of N individuals with covariates anddisease indicators (ZA, IA) for A ∈ R, i.i.d replications of (Z, I), where theconditional disease probability PI (I = 1|Z) is given by the proportional oddsmodel (2). Let

p = EI

[λ0x(Z; β0)

1 + λ0x(Z; β0)

],

the asymptotic proportion of diseased subjects in the study base.Consider a case-control sampling design with conditional logistic likeli-

hood of the form (4) and assume (74) holds. Then the scaled informationN−1I(β0) converges in probability as

N−1I(β0)p→Mp

[e2(θf )−

e ⊗21 (θf )

e0(θf )

],

where θf is the unique solution to h(θf ) = f , and h(θ) and ek(θ) are definedin (76).

30

Proof: By the law of large number η/N → p in probability, and therefore,by (74), the limit in probability of (1/N)I(β0) equals Mp times the limit inprobability of (1/|E|)I(β0). By (6),

1

|E|I(β0) =

1

|E|∑A∈E

Z ⊗2A pAqA +

1

|E|2∑

A,B∈E,A6=B

ZAZTB|E|(pAB − pApB).

For the first sum, fixing κ ≥ 2 and letting ϑ = θ(E, η), by (55) we have

pA = pA,ϑ +Oτ (|E|−1), (80)

with this and all following error orders in probability. Hence

1

|E|∑A∈E

Z ⊗2A pAqA =

1

|E|∑A∈E

Z ⊗2A

(pA,ϑ qA,ϑ +Oτ (|E|−1)

)= e2,E(ϑ) + oτ (1)p→ e2(θf )

by Proposition 5.1. For the second term, now letting F = E \ (A ∪ B), wefirst write

pAB − pApB = SE,η(IA − pA)(IB − pB)

= pAqA∆ASE,η(IB − pB) by Lemma 4.4

= pAqA∆ASE,η(B) since Su,vE,η(pB) = pB for all u, v

= −pAqA∆SE\A,η(B) by (61) of Lemma 4.1

= −pAqApB,ϑqB,ϑI0F,ϑ,2 +Oτ (|E|−2) by (55) of Theorem 3.1.

Since I0F,ϑ,2 = Oτ (|E|−1) and pA = pA,ϑ +Oτ (|E|−1), we may replace pA and

qA by pA,ϑ and qA,ϑ without affecting the order term to obtain

pApB − pAB = −pA,ϑqA,ϑpB,ϑqB,ϑI0F,ϑ,2 +Oτ (|E|−2),

so that

|E|(pApB − pAB) = −pA,ϑqA,ϑpB,ϑqB,ϑ|E|I0F,ϑ,2 + oτ (1).

Relation (19) of Theorem 2.1 gives

|E|I0F,ϑ,2 = |F |I0

F,ϑ,2 + oτ (1)

= |F |v−2F,ϑ + oτ (1) = |E|v−2

E,ϑ + oτ (1) = e−10,E(ϑ) + oτ (1),

31

and so,

1

|E|2∑

A,B∈E,A6=B

ZAZTB |E|(pAB − pApB)

= − 1

|E|2∑

A,B∈E,A6=B

ZAZTBpA,ϑqA,ϑpB,ϑqB,ϑe0,E(ϑ)−1 + oτ (1)

= −(

1

|E|∑A∈E

ZApA,ϑqA,ϑ

)⊗2

e0,E(ϑ)−1 + oτ (1)

= −e1,E(ϑ)⊗2/e0,E(ϑ) + oτ (1)p→ −e1(θf )

⊗2/e0(θf ).

A remarkable and immediate consequence of this theorem is summarizedin the following corollary:

Corollary 5.1 For case-control sampling designs that meet the conditionsof Theorem 5.1, the asymptotic information only depends on the asymptoticproportion of subjects (and therefore controls) in the sample.

For instance, Bernoulli trial sampling, simple random sampling of con-trols, and case-base sampling (random sampling of the study base withoutregard to case-control status) all have the same asymptotic information aslong they have the same asymptotic proportion of controls.

5.1 Proof of Lemma 5.1

We argue for hE(θ); the proof for ek,E(θ) is similar.We first show that it suffices to prove that the convergence is uniform

on a compact interval [0, b] ⊂ [0,∞). Let δ ∈ (0, 1] be arbitrary. InvokingCondition 5.1, there exists ε > 0 and n ≥ 1 such that for |E| ≥ n,

PI

(1

|E|∑A∈E

1(xA ≥ ε) ≥ 1− δ/2

)≥ 1− δ.

Since the limiting function h(θ) is increasing in θ, and by (73) has limit1 at ∞, we may chose b such that for

1 ≥ h(θ) ≥ bε

1 + bε(1− δ/2) ≥ 1− δ and h(b) ≥ 1− δ.

32

Now, for θ ≥ b and |E| ≥ n, with probability at least 1− δ we have

1 ≥ hE(θ) ≥ θε

1 + θε(1− δ/2) ≥ bε

1 + bε(1− δ/2) ≥ 1− δ. (81)

Hence for |E| ≥ n

PI

(supθ≥b

|hE(θ)− h(θ)| ≤ δ

)≥ 1− δ.

Since δ > 0 is arbitrary, supθ≥b |hE(θ)−h(θ)| → 0 in probability as |E| → ∞.It remains to show uniform convergence in probability on [0, b]. Define

the function

U(A, θ) =θxA

1 + θxA

− h(θ),

which is continuous in θ for all A. Since U is the difference of terms whichlie in [0, 1],

|U(A, θ)| ≤ 1 for all A and θ ∈ [0, b].

For ρ > 0, letψ(A, θ, ρ) = sup

|θ′−θ|<ρU(A, θ′),

which is also bounded by 1 in absolute value. The function ψ(A, θ, ρ) ismeasurable for all θ and ρ by the continuity of U(A, θ) in θ (take supremumover a countably dense collection of θ’s) and for all A and θ,

ψ(A, θ, ρ) → U(A, θ) as ρ→ 0.

Hence, by the dominated convergence theorem, for all A and θ,

EI ψ(A, θ, ρ) → EI U(A, θ) = 0 as ρ→ 0.

Given ε > 0, let ρθ > 0 be such that for 0 < ρ ≤ ρθ, EI ψ(A, θ, ρ) ≤ ε. Thecollection of open sets S(θ, ρθ) = θ′ : |θ′ − θ| < ρθ for θ ∈ [0, b] form anopen cover of [0, b], and by compactness, [0, b] is covered by S(θ, ρθj

) as jranges over a finite set of indices J .

From the definition of ψ, for θ ∈ S(θj, ρθj),

1

|E|∑A∈E

U(A, θ) ≤ 1

|E|∑A∈E

ψ(A, θj, ρθj),

33

and hence

supθ∈[0,b]

1

|E|∑A∈E

U(A, θ) ≤ supj∈J

1

|E|∑A∈E

ψ(A, θj, ρθj).

By the law of large numbers, for every j ∈ J

PI (1

|E|∑A∈E

ψ(A, θj, ρθj) ≤ ε) → 1 as |E| → ∞,

and so, since J is finite,

PI (supj∈J

1

|E|∑A∈E

ψ(A, θj, ρθj) ≤ ε) → 1,

and therefore

PI

(sup

θ∈[0,b]

1

|E|∑A∈E

U(A, θ) ≤ ε

)= PI

(sup

θ∈[0,b](hE(θ)− h(θ)) ≤ ε

)→ 1.

Applying this same argument to −U(A, θ) gives

PI

(sup

θ∈[0,b]

(h(θ)− hE(θ)) ≤ ε

)→ 1.

Using

supθ∈[0,b]

|hE(θ)− h(θ)| = max supθ∈[0,b]

(hE(θ)− h(θ)), supθ∈[0,b]

(h(θ)− hE(θ))

gives uniform convergence in probability on [0, b].

References

[1] R. Arratia, L. Goldstein, B. Langholz Simple Random Sampling Corre-lations and Stirling Numbers. 2001.

[2] N. E. Breslow and N. E. Day. Statistical Methods in Cancer Research.Volume I – The Design and Analysis of Case-Control Studies, volume 32of IARC Scientific Publications. International Agency for Research onCancer, Lyon, 1980.

34

[3] D.R. Cox and E.J. Snell. Analysis of Binary Data. Monographs onStatistics and Applied Probability 32. Chapman and Hall, New York,second edition, 1989.

[4] V.T. Farewell. Some results on the estimation of logistic models basedon retrospective data. Biometrika, 66:27–32, 1979.

[5] T. Ferguson. A Course in Large Sample Theory. Chapman Hall, Textsin Statistical Science. 1996

[6] L. Gordon. Successive sampling in large finite populations. Annals ofMathematical Statistics, 11(2):702–706, 1983.

[7] W.L. Harkness. Properties of the extended hypergeometric distribution.Annals of Mathematical Statistics, 36(3):938–945, 1965.

[8] J. Hajek. Asymptotic theory of rejective sampling with varying probabili-ties from a finite population. Annals of Mathematical Statistics, 35:1491–1523, 1964.

[9] L.L. Kupper, A.J. McMichael, and R. Spirtas. A hybrid epidemiologicstudy design useful in estimating relative risk. J Am Statistical Associa-tion, 70:524–528, 1975.

[10] B. Langholz and L. Goldstein. Conditional logistic analysis of case-control studies with complex sampling. Biostatistics, 2:63–84, 2001.

[11] B. Rosen. Asymptotic theory for successive sampling with varying prob-abilities without replacement. I, II. Annals of Mathematical Statistics,43:373–397,43:748–776, 1972

[12] D.C. Thomas. General relative -risk models for survival time andmatched case-control analysis. Biometrics, 37:673–686, 1981.

[13] S. Wacholder, D.R. Silverman, J.K. McLaughlin, and J.S. Mandel. Se-lection of controls in case-control studies: III. Design options. AmericanJournal of Epidemiology, 135:1042–1050, 1992.

35

Local Central Limit Theorems, the High Order Correlations ... · Local Central Limit Theorems, the

Documents

Transcript of Local Central Limit Theorems, the High Order Correlations ... · Local Central Limit Theorems, the