
University of Alberta

ESTIMATION OF MEDIAN FOR UNEQUAL PROBABILITY SAMPLING OVER TWO OCCASIONS

by

Shu Jing Gu

A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of

Master of Science

in

Statistics

Department of Mathematical and Statistical Sciences

©Shu Jing Gu

Fall, 2011 Edmonton, Alberta

Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms.

The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

Library and Archives Canada

Published Heritage Branch

395 Wellington Street, Ottawa ON K1A 0N4, Canada

ISBN: 978-0-494-81301-0

NOTICE:

The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.


The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.


In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.


While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.



Abstract

The main concern in repeated surveys is non-response, which arises because the same individuals are sampled repeatedly. A solution to this problem is a partial replacement (rotation) sampling scheme, in which, after each sampling occasion, a fraction of the units observed on that occasion is rotated out of the sample and replaced by a new sub-sample from the population. Here we consider estimation of the population median for sampling over two occasions where unequal probability sampling is used on both occasions. Recent attempts to estimate the population median for sampling over two occasions assume that simple random sampling is used on both occasions. Moreover, these existing methods require density estimation of the underlying characteristics. This thesis presents a new approach to estimating the population median, based on estimating equations, for unequal probability sampling on both occasions. The proposed method also avoids the problem of density estimation.

Acknowledgements

I am heartily thankful to my supervisor, Dr. Narasimha Prasad, who has guided me throughout my thesis with his patience and knowledge. I have been extremely fortunate to have Dr. Prasad as my supervisor for my master's studies, and without him this thesis would not have been completed.

I also offer my sincere gratitude to my exam committee members, Dr. Peng Zhang and Dr. Irina Dinu, for spending their valuable time reading and evaluating my thesis.

Dr. Peter Hooper provided me with funding during my studies, and I deeply thank him for the support.

Finally, I thank my dear family and friends for supporting and helping me throughout all my studies.

Table of Contents

1. Introduction
2. Population total estimation in PPS sampling on two occasions
   2.1 Notations
   2.2 Des Raj Scheme
   2.3 Ghangurde-Rao Scheme
   2.4 Chotai Scheme
   2.5 Prasad-Graham Scheme
3. Estimation of median in PPS sampling on two occasions
   3.1 Estimating equations
   3.2 Estimation of median for Des Raj scheme
   3.3 Estimation of median for Ghangurde-Rao scheme
   3.4 Estimation of median for Prasad-Graham scheme
4. A simulation study based on generated populations
   4.1 Description of two sets of generated populations based on model 1
   4.2 Description of two sets of generated populations based on model 2
   4.3 Computations on generated populations
   4.4 Numerical comparisons
      4.4.1 Comparisons of results for the two sets of generated populations based on model 1
      4.4.2 Comparisons of results for the two sets of generated populations based on model 2
5. A simulation study based on real data
   5.1 Description of data sets
   5.2 Computations on real data
   5.3 Numerical comparisons
6. Conclusion and future work
Bibliography
Appendix A Derivation
   A.1 Des Raj Scheme
      A.1.1 $\mathrm{Var}(T_{2u})$ in equation (3.7)
      A.1.2 $\mathrm{Var}(T_{2m})$ in equation (3.8)
   A.2 Ghangurde-Rao Scheme
      A.2.1 $\mathrm{Var}(T_{2u})$ in equation (3.14)
      A.2.2 $\mathrm{Var}(T_{2m})$ in equation (3.15)
   A.3 Prasad-Graham Scheme
      A.3.1 $\mathrm{Var}(T_{2m})$ in equation (3.19)
Appendix B R code

List of Tables

4.1 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 1: number of simulations = 1000

4.2 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 2: number of simulations = 1000

5.1 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set A: number of simulations = 1000

5.2 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set B: number of simulations = 1000

Chapter 1

Introduction

Partial replacement sampling from a finite population is commonly used in repeated surveys because it reduces the response burden and, as a result, improves the efficiency of estimation. If the same units are sampled repeatedly, respondents may become unwilling to supply the same information again, and the responding sample becomes less representative as time proceeds; the precision of estimation then suffers greatly. Partial replacement sampling, on the other hand, reduces this non-response bias. According to Jessen (1942), who first introduced the problem of sampling on two successive occasions, the estimates for the current (second) occasion may be improved by replacing only part of the sample from the previous (first) occasion. That is, after the first sampling occasion, only a proportion of the units observed on that occasion is retained, and the remaining units are replaced by a fresh selection from the entire population. The freshly selected (unmatched) units are then observed on the second sampling occasion along with the retained (matched) units. In this way, the efficiency and precision of the estimates can be improved.

Sampling over two occasions has been studied by various authors under different sampling schemes, and a particularly important case is sampling with unequal probabilities. An unequal probability sampling scheme is usually considered when the sample designer has access to an auxiliary variable or size measure $x$ that is correlated with the variable of interest $y$ for each unit in the population. Since the use of auxiliary information at the estimation stage increases the accuracy of estimates for the variable of interest, the selection probability for each unit is set proportional to its size measure. In most cases, values of the auxiliary variable $x$ are available in advance for the entire population because they are relatively cheap to obtain. For example, a survey that aims to estimate the area under wheat in a village may use the total (cultivated) area of each farm as an auxiliary variable.

There is an extensive literature on the estimation of the population total or mean for PPS (probability proportional to size) sampling over two occasions. For instance, Prasad and Graham (1994) discussed several sampling and estimation procedures for a finite population total with PPS sampling on two occasions. Their approach provides the best estimate of the current population total by optimizing the weights given to the estimates based on the matched and unmatched units on the second occasion. However, many surveys are conducted not only to estimate the population total but also to estimate quantiles, in particular for variables such as income. Recently, the problem of estimating finite population quantiles in successive sampling on two occasions has been considered. For example, Singh, H. P., Tailor, Singh, S. and Kim (2007) developed procedures for quantile estimation for a finite population. Their study, however, is restricted to simple random sampling over two occasions, and estimates of probability density functions are required to achieve optimum estimation of the quantiles, which makes the approach very complicated.

In view of this, it is of great interest to see how quantile estimation for a finite population works when PPS sampling is used on both occasions. The purpose of this thesis is to present a new approach for the estimation of the population median for unequal probability sampling over two occasions. Prasad and Graham (1994) discussed several schemes for estimating the population total when unequal probability sampling is used over two occasions. The new approach, which estimates the median through estimating equations, is considered in this thesis for the sampling schemes discussed in Prasad and Graham (1994).

Chapter 2 provides an overview of the sampling schemes and associated estimation methods discussed in Prasad and Graham (1994). It introduces several sampling schemes with probability proportional to size over two successive occasions: the Des Raj (1965) scheme, the Ghangurde-Rao (1969) scheme, the Chotai (1974) scheme and the Prasad-Graham (1994) scheme, which was developed as a modification of the Chotai scheme. For each scheme, the estimation of a finite population total is described.

In Chapter 3, the estimating equation approach is used to estimate the population median for the schemes described in Chapter 2.

A simulation study based on four sets of synthetic populations is carried out in Chapter 4. The relative bias and relative mean squared error of the proposed estimators are evaluated to illustrate the present approach for estimating the population median.

Chapter 5 presents a further simulation study in which finite populations are generated from real data sets published in the literature.

Conclusions and further research projects are discussed in Chapter 6.

Appendix A contains derivations of the results given in Chapter 3. All numerical computations in this thesis were produced using R 2.12.1 running on a Windows XP platform. Appendix B gives the R code used in this thesis.

Chapter 2

Population total estimation in PPS

sampling on two occasions

In this chapter, the sampling schemes and underlying estimation methods for estimating the population total given in Prasad and Graham (1994) are discussed. Prasad and Graham considered four different sampling schemes: the Des Raj (1965) scheme, the Ghangurde-Rao (1969) scheme, the Chotai (1974) scheme and the Prasad-Graham scheme, which is a modified version of the Chotai scheme. Each of the four schemes is described in this chapter.

2.1 Notations

Consider a finite population of $N$ units with characteristic values $y_i$ ($i = 1, 2, \ldots, N$) whose total $Y = y_1 + y_2 + \cdots + y_N$ is to be estimated. Let us denote:

$x_i$ = value of the auxiliary variable (size measure) for the $i$-th unit

$y_{1i}$ = value of $y$ for the $i$-th unit observed on the first occasion

$y_{2i}$ = value of $y$ for the $i$-th unit observed on the second occasion

$Y_1$ = population total observed on the first occasion

$Y_2$ = population total observed on the second occasion

$m$ = number of units matched

$u$ = number of units unmatched

$\hat{Y}_{2m}$ = estimate of the population total based on the matched units observed on the second occasion

$\hat{Y}_{2u}$ = estimate of the population total based on the unmatched units observed on the second occasion

For each of the following schemes, the size measure $x_i$ ($i = 1, \ldots, N$) is assumed to be known for all $N$ units in the population before sampling, and successive sampling over two occasions is used.

2.2 Des Raj Scheme

Raj (1965) considered the following scheme of sampling over two occasions:

• On the first occasion:

A sample $s$ of size $n$ is selected from the entire population with PPSWR (probability proportional to the size measure $x_i$, with replacement); that is, the probability of selecting each unit from the population is $p_i = x_i / \sum_{i=1}^{N} x_i$.

• On the second occasion:

(1) A simple random sample $s_1$ of size $m = \lambda n$ ($0 < \lambda < 1$) is selected from sample $s$ without replacement (SRSWOR); therefore, the probability of selecting sample $s_1$ is $1/\binom{n}{m}$.

(2) An independent sample $s_2$ of size $u = n - m$ is selected with PPSWR from the whole population; the selection method is the same as for sample $s$, and the probability of selecting each unit from the population is again $p_i = x_i / \sum_{i=1}^{N} x_i$.
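The selection steps of the Des Raj scheme can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `des_raj_sample`, the rounding of $\lambda n$ to an integer, and the use of Python's standard `random` module are choices made here, not part of the thesis (whose own computations were done in R).

```python
import random

def des_raj_sample(x, n, lam, seed=None):
    """Draw index lists (s, s1, s2) following the two-occasion scheme above.

    x   : size measures x_i for all N population units (hypothetical input)
    n   : first-occasion sample size
    lam : matched fraction lambda, so that m = lam * n
    """
    rng = random.Random(seed)
    total = sum(x)
    p = [xi / total for xi in x]              # selection probabilities p_i
    m = round(lam * n)                        # number of matched units (assumed rounding)
    u = n - m                                 # number of unmatched units
    units = range(len(x))
    s = rng.choices(units, weights=p, k=n)    # first occasion: PPSWR
    s1 = rng.sample(s, m)                     # matched part: SRSWOR from s
    s2 = rng.choices(units, weights=p, k=u)   # unmatched part: independent PPSWR
    return p, s, s1, s2
```

Because the first-occasion sample is drawn with replacement, `s` may contain repeated unit labels; the sketch treats each draw as a separate element when sub-sampling.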

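Unbiased estimates of the population total for the first and second occasions are then obtained as

$$\hat{Y}_1 = \sum_{i \in s} \frac{y_{1i}}{n p_i} \tag{2.1}$$

and

$$\hat{Y}_2 = Q \hat{Y}_{2u} + (1 - Q) \hat{Y}_{2m}, \tag{2.2}$$

where $Q$ is a weight ($0 \le Q \le 1$), and

$$\hat{Y}_{2u} = \sum_{i \in s_2} \frac{y_{2i}}{u p_i}, \tag{2.3}$$

$$\hat{Y}_{2m} = \sum_{i \in s} \frac{y_{1i}}{n p_i} + \sum_{i \in s_1} \frac{y_{2i} - y_{1i}}{m p_i}. \tag{2.4}$$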
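As a rough illustration of how the Des Raj estimators of the two totals combine, the sketch below evaluates the first-occasion estimate, the unmatched and matched second-occasion estimates, and their weighted average for given samples. The function name `des_raj_totals` and the argument layout (population-indexed lists `y1`, `y2`, `p` and index lists `s`, `s1`, `s2`) are assumptions made for this example.

```python
def des_raj_totals(y1, y2, p, s, s1, s2, Q):
    """Return (Y1_hat, Y2u_hat, Y2m_hat, Y2_hat) for the Des Raj scheme."""
    n, m, u = len(s), len(s1), len(s2)
    # First-occasion estimate from the full PPSWR sample s.
    Y1_hat = sum(y1[i] / (n * p[i]) for i in s)
    # Second-occasion estimate from the unmatched sample s2.
    Y2u_hat = sum(y2[i] / (u * p[i]) for i in s2)
    # Matched estimate: first-occasion estimate plus the estimated change on s1.
    Y2m_hat = Y1_hat + sum((y2[i] - y1[i]) / (m * p[i]) for i in s1)
    # Composite estimate with weight Q in [0, 1].
    Y2_hat = Q * Y2u_hat + (1 - Q) * Y2m_hat
    return Y1_hat, Y2u_hat, Y2m_hat, Y2_hat
```

When $y$ is exactly proportional to the size measure, every term $y_i/p_i$ equals the population total, so each estimate reproduces the total whatever sample is drawn; this is the usual motivation for PPS selection.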

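Our primary interest is to find the best estimate of the current population total $Y_2$, which is obtained by optimizing the weight $Q$ and the fraction of matched units $\lambda$. The optimal values of $Q$ and $\lambda$ are those that minimize the variance of $\hat{Y}_2$. For the composite estimator $\hat{Y}_2$ defined in (2.2),

$$V(\hat{Y}_2) = Q^2 V(\hat{Y}_{2u}) + (1 - Q)^2 V(\hat{Y}_{2m}) + 2Q(1 - Q)\,\mathrm{Cov}(\hat{Y}_{2u}, \hat{Y}_{2m}), \tag{2.5}$$

where $\mathrm{Cov}(\hat{Y}_{2u}, \hat{Y}_{2m}) = 0$ because sample $s_1$ is a subset of $s$, which is independent of sample $s_2$. Setting the derivative of (2.5) with respect to $Q$ to zero shows that the optimal weight has the form

$$Q = \frac{V(\hat{Y}_{2m})}{V(\hat{Y}_{2u}) + V(\hat{Y}_{2m})} \tag{2.6}$$

and let us assume that the following two variances are the same:

$$V_1 = \sum_{i=1}^{N} \left(\frac{y_{1i}}{p_i} - Y_1\right)^2 p_i = V_2 = \sum_{i=1}^{N} \left(\frac{y_{2i}}{p_i} - Y_2\right)^2 p_i = V. \tag{2.7}$$

Then, using the optimal values of $Q$ and $\lambda$, the minimum variance of $\hat{Y}_2$ is found to be

$$V_{\min}(\hat{Y}_2) = \frac{V\left(1 + \sqrt{1 - \delta^2}\right)}{2n}, \quad \text{if } \delta < 1, \tag{2.8}$$

where

$$\delta = \frac{1}{V} \sum_{i=1}^{N} \left(\frac{y_{1i}}{p_i} - Y_1\right)\left(\frac{y_{2i}}{p_i} - Y_2\right) p_i \tag{2.9}$$

is the correlation coefficient between $y_{1i}/p_i$ and $y_{2i}/p_i$.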
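To make the quantities in this section concrete, the following sketch computes the two variances, the correlation $\delta$ and a minimum variance for a fully known population. It rests on assumptions: the correlation is computed in its general form (covariance divided by $\sqrt{V_1 V_2}$, which reduces to the definition above when $V_1 = V_2 = V$), the minimum variance is taken to be $V(1+\sqrt{1-\delta^2})/(2n)$ with $V$ set to $V_2$, and the function name is hypothetical.

```python
import math

def des_raj_min_variance(y1, y2, p, n):
    """Return (V1, V2, delta, v_min) for a known population.

    Assumes V1 and V2 are non-zero and that v_min has the form
    V * (1 + sqrt(1 - delta^2)) / (2n), with V taken as V2.
    """
    Y1, Y2 = sum(y1), sum(y2)
    V1 = sum((a / pi - Y1) ** 2 * pi for a, pi in zip(y1, p))
    V2 = sum((b / pi - Y2) ** 2 * pi for b, pi in zip(y2, p))
    cov = sum((a / pi - Y1) * (b / pi - Y2) * pi for a, b, pi in zip(y1, y2, p))
    delta = cov / math.sqrt(V1 * V2)
    # Guard against tiny negative values caused by floating-point rounding.
    root = math.sqrt(max(0.0, 1.0 - delta ** 2))
    v_min = V2 * (1.0 + root) / (2 * n)
    return V1, V2, delta, v_min
```

With perfectly correlated occasions the square-root term vanishes and the value halves relative to the uncorrelated case, which is the gain that partial matching is designed to capture.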

2.3 Ghangurde-Rao Scheme

The procedure proposed by Ghangurde and Rao (1969) modifies the Des Raj (1965) scheme in the selection of sample $s$ and sample $s_2$. For simplicity we assume $N/n$ and $N/u$ to be integers.

• On the first occasion:

The $N$ population units are divided at random into $n$ groups, each of size $N/n$; then, a sample $s$ of size $n$ is selected by drawing one unit from each of the $n$ groups independently with PP$p_i$WOR (probability proportional to $p_i$, without replacement). The probability of selecting unit $i$ from its random group is therefore $p_i/P_i$, where $P_i$ denotes the total of the $p_i$ values for the group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s$.

• On the second occasion:

(1) A sample $s_1$ of size $m = \lambda n$ ($0 < \lambda < 1$) is drawn from $s$ using the same method as described in the Des Raj scheme.

(2) For the independent sample $s_2$ of size $u = n - m$, the $N$ units are first split at random into $u$ groups, each of size $N/u$; one unit is then drawn from each of the $u$ groups independently with PP$p_i$WOR. The probability of selecting unit $i$ from its random group is $p_i/P_i^*$, where $P_i^*$ denotes the total of the $p_i$ values for the group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_2$.

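The population totals for the first and second occasions are then unbiasedly estimated by

$$\hat{Y}_1' = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} \tag{2.10}$$

and

$$\hat{Y}_2' = Q' \hat{Y}_{2u}' + (1 - Q') \hat{Y}_{2m}', \tag{2.11}$$

where $Q'$ is a weight ($0 \le Q' \le 1$), and

$$\hat{Y}_{2u}' = \sum_{i \in s_2} \frac{y_{2i} P_i^*}{p_i}, \tag{2.12}$$

$$\hat{Y}_{2m}' = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} + \frac{n}{m} \sum_{i \in s_1} \frac{(y_{2i} - y_{1i}) P_i}{p_i}. \tag{2.13}$$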
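The random-group selection used in the Ghangurde-Rao scheme, and the estimator of a total that goes with it, can be sketched as follows. The names `rhc_sample` and `rhc_total` are hypothetical, the group size is assumed to divide the population size exactly (as in the text), and each sampled unit is stored together with its group total so that the weight $P_i/p_i$ can be formed later.

```python
import random

def rhc_sample(p, n, rng):
    """Split the N units at random into n equal groups and draw one unit per group.

    Within a group, a unit is drawn with probability p_i / P_i, where P_i is
    the sum of the p values in that group. Returns a list of (unit, P_i) pairs.
    """
    units = list(range(len(p)))
    rng.shuffle(units)                       # random allocation to groups
    size = len(p) // n                       # assumes N / n is an integer
    sample = []
    for g in range(n):
        group = units[g * size:(g + 1) * size]
        P_g = sum(p[j] for j in group)       # group total P_i
        i = rng.choices(group, weights=[p[j] for j in group], k=1)[0]
        sample.append((i, P_g))
    return sample

def rhc_total(y, p, sample):
    """Estimate a population total as the sum of y_i * P_i / p_i over the sample."""
    return sum(y[i] * P_g / p[i] for i, P_g in sample)
```

The same two functions cover both the first-occasion sample (with $n$ groups) and the independent unmatched sample (with $u$ groups); only the number of groups changes.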

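Under the assumption in (2.7) and using the optimal values of $Q'$ and $\lambda$, the minimum variance of $\hat{Y}_2'$ is given by

$$V_{\min}(\hat{Y}_2') = \frac{NV}{2n(N-1)} \left[ 1 - \frac{n}{N} + \sqrt{(1 - \delta^2)\left(1 + \gamma \frac{n}{N}\right)} \right], \quad \text{if } \delta < 1, \tag{2.14}$$

where $V$ and $\delta$ are as defined in (2.7) and (2.9) respectively, and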

7 (1-S)V

with N , N

and t = l i=l

N

NV P = T7777 ^(Vu ~ Yi)(yx ~ Y2) i = l

which is the correlation coefficient between $y_{1i}$ and $y_{2i}$.

2.4 Chotai Scheme

Under the additional assumption that $n/m$ is also an integer, Chotai (1974) introduced a sampling design that modifies the Ghangurde-Rao (1969) scheme in the selection of sample $s_1$ on the second occasion.

• On the first occasion:

A sample $s$ of size $n$ is chosen by the same procedure as in the Ghangurde-Rao scheme.

• On the second occasion:

(1) The $n$ units in sample $s$ are divided at random into $m = \lambda n$ ($0 < \lambda < 1$) groups, each of size $n/m$; one unit is then drawn from each of the $m$ groups independently with PP$P_i$WOR (probability proportional to $P_i$, without replacement). The selected $m$ units compose sample $s_1$, and the probability of choosing unit $i$ from its random group is $P_i/P_i^+$, where $P_i$ is as defined in the Ghangurde-Rao scheme, and $P_i^+$ denotes the total of the $P_i$ values for the random group of $s$ containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_1$.

(2) The selection of the independent sample $s_2$ of size $u = n - m$ is also the same as described in the Ghangurde-Rao scheme.

The population totals for the first and second occasions are then unbiasedly estimated by

$$\hat{Y}_1^C = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} \tag{2.15}$$

and

$$\hat{Y}_2^C = Q^C \hat{Y}_{2u}^C + (1 - Q^C) \hat{Y}_{2m}^C, \tag{2.16}$$

where $Q^C$ is a weight ($0 \le Q^C \le 1$) and

$$\hat{Y}_{2u}^C = \sum_{i \in s_2} \frac{y_{2i} P_i^*}{p_i}, \tag{2.17}$$

$$\hat{Y}_{2m}^C = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} + \sum_{i \in s_1} \frac{(y_{2i} - y_{1i}) P_i^+}{p_i}. \tag{2.18}$$

In equations (2.17) and (2.18), both $P_i$ and $P_i^*$ are as defined in the Ghangurde-Rao scheme, and $P_i^+$ is as defined in the description of the Chotai sampling design in this section. The minimum variance of $\hat{Y}_2^C$, obtained by using the optimal values of $Q^C$ and $\lambda$ under the assumption in (2.7), is:

$$V_{\min}(\hat{Y}_2^C) = \frac{NV}{2n(N-1)} \left[ 1 - \frac{n}{N} + \sqrt{1 - \delta^2} \right], \quad \text{if } \delta < 1. \tag{2.19}$$

So far, the assumption in (2.7) has been used in the estimation procedures. If it is dropped, the estimator of the population total based on the unmatched units on the second occasion is unchanged, but the estimator based on the matched units on the second occasion is different. Under the Chotai scheme but without assumption (2.7), a composite estimator of $Y_2$ is:

$$\hat{Y}_2^{CM} = Q^{CM} \hat{Y}_{2u}^C + (1 - Q^{CM}) \hat{Y}_{2m}^{CM}, \tag{2.20}$$

where $\hat{Y}_{2u}^C$ is defined in (2.17), $Q^{CM}$ is a weight ($0 \le Q^{CM} \le 1$) and

$$\hat{Y}_{2m}^{CM} = \sum_{i \in s_1} \frac{(y_{2i} - \beta y_{1i}) P_i^+}{p_i} + \beta \sum_{i \in s} \frac{y_{1i} P_i}{p_i} \tag{2.21}$$

with

$$\beta = \delta \sqrt{\frac{\sum_{i=1}^{N} (y_{2i}/p_i - Y_2)^2 p_i}{\sum_{i=1}^{N} (y_{1i}/p_i - Y_1)^2 p_i}}, \tag{2.22}$$

where $\delta$ is the correlation between $y_{1i}/p_i$ and $y_{2i}/p_i$. Notice that $\delta$ as defined in (2.9) is not used here, since the assumption in (2.7) is not made; instead $\delta$ is defined as

$$\delta = \mathrm{corr}\left(\frac{y_{1i}}{p_i}, \frac{y_{2i}}{p_i}\right) = \frac{\sum_{i=1}^{N} (y_{1i}/p_i - Y_1)(y_{2i}/p_i - Y_2) p_i}{\sqrt{V_1 V_2}},$$

where $V_1 = \sum_{i=1}^{N} (y_{1i}/p_i - Y_1)^2 p_i$ and $V_2 = \sum_{i=1}^{N} (y_{2i}/p_i - Y_2)^2 p_i$. The minimum variance of $\hat{Y}_2^{CM}$ without assumption (2.7), obtained by using the optimal values of $Q^{CM}$ and $\lambda$, is given by

$$V_{\min}(\hat{Y}_2^{CM}) = \frac{N V_2}{2n(N-1)} \left( 1 + \sqrt{1 - \delta^2} - \frac{n}{N} \right), \quad \text{if } \delta < 1. \tag{2.23}$$

It should be noted that the value of $\beta$ is required for the use of $\hat{Y}_2^{CM}$; however, the actual value is usually unavailable in practice, and estimating $\beta$ from the available sample may introduce bias into the estimation. A modification of the Chotai scheme that avoids this is discussed in the next section.

2.5 Prasad-Graham Scheme

This section describes the alternative sampling and estimation procedure that Prasad and Graham (1994) introduced for the Chotai (1974) scheme, which does not need the value of $\beta$ defined in (2.22) to be known in advance. Under the Prasad-Graham (1994) scheme, $N/n$, $N/u$ and $n/m$ are all assumed to be integers, as in Chotai. In this sampling scheme, the information collected on the first occasion is used in selecting the sample $s_1$ on the second occasion. The new approach is:

• On the first occasion:

A sample $s$ of size $n$ is selected by the same method as in the Ghangurde-Rao (1969) scheme, and after the selection, each unit of $s$ is observed on the characteristic $y$; the observed values are denoted $y_{1i}$ ($i = 1, \ldots, n$).

• On the second occasion:

(1) The $n$ units in sample $s$ are split at random into $m = \lambda n$ ($0 < \lambda < 1$) groups, each of size $n/m$; one unit is then selected from each of the $m$ groups independently with PP$p_i^*$WOR (probability proportional to $p_i^*$, without replacement). The selected $m$ units form sample $s_1$, and the probability of choosing unit $i$ from its random group is $p_i^*/\tilde{P}_i$, where

$$p_i^* = \frac{y_{1i} P_i}{p_i}, \tag{2.24}$$

which involves the information observed on the first occasion, $P_i$ is as defined in the Ghangurde-Rao scheme, and $\tilde{P}_i$ denotes the total of the $p_i^*$ values for the group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_1$.

(2) The selection of the independent sample $s_2$ of size $u = n - m$ is also the same as described in the Ghangurde-Rao scheme.
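The second-occasion selection of the matched sample under the Prasad-Graham scheme can be sketched as below. This is a sketch under assumptions: the first-occasion sample is passed in as a list of (unit, group total) pairs, the first-occasion values are strictly positive so that they can serve as selection weights, $n/m$ is an integer, and the function name and return layout are hypothetical.

```python
import random

def prasad_graham_matched(sample, y1, p, m, rng):
    """Select the matched sample s1 from the first-occasion sample.

    sample : list of (unit, P_i) pairs from the first occasion
    Returns a list of (unit, p_star, group_total) triples, where
    p_star = y1_i * P_i / p_i and group_total is the sum of p_star
    over the random group from which the unit was drawn.
    """
    order = list(sample)
    rng.shuffle(order)                         # random allocation of s into m groups
    size = len(order) // m                     # assumes n / m is an integer
    matched = []
    for g in range(m):
        group = order[g * size:(g + 1) * size]
        p_star = [y1[i] * P_i / p[i] for i, P_i in group]
        total = sum(p_star)
        k = rng.choices(range(len(group)), weights=p_star, k=1)[0]
        matched.append((group[k][0], p_star[k], total))
    return matched
```

A useful check is that the group totals add up to the first-occasion estimate of the total, since each $p_i^*$ is exactly the contribution of unit $i$ to that estimate.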

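Under the Prasad-Graham scheme, the population total for the first occasion is unbiasedly estimated by $\hat{Y}_1^C$ as defined in (2.15), and a composite estimator of $Y_2$ for the second occasion is:

$$\hat{Y}_2^* = Q^* \hat{Y}_{2u}^C + (1 - Q^*) \hat{Y}_{2m}^*, \tag{2.25}$$

where $\hat{Y}_{2u}^C$ is defined in (2.17), $Q^*$ is a weight ($0 \le Q^* \le 1$) and

$$\hat{Y}_{2m}^* = \sum_{i \in s_1} \frac{y_{2i}^* \tilde{P}_i}{p_i^*} \tag{2.26}$$

with

$$y_{2i}^* = \frac{y_{2i} P_i}{p_i}, \tag{2.27}$$

and $\tilde{P}_i$ the total of the $p_i^*$ values for the random group containing the $i$-th unit when selecting $s_1$.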

The minimum variance of $\hat{Y}_2^*$, obtained by using the optimal values of $Q^*$ and $\lambda$, is

$$V_{\min}(\hat{Y}_2^*) = \frac{N V_2}{2n(N-1)} \left[ 1 - \frac{n}{N} + \sqrt{h} \right], \tag{2.28}$$

where

$$h = \frac{V^*}{V_2}, \tag{2.29}$$

with

$$V_2 = \sum_{i=1}^{N} \left(\frac{y_{2i}}{p_i} - Y_2\right)^2 p_i$$

and

$$V^* = \sum_{i=1}^{N} \left(\frac{y_{2i}}{y_{1i}} Y_1 - Y_2\right)^2 \frac{y_{1i}}{Y_1}. \tag{2.30}$$

The value of $h$ measures the efficiency of the estimator using $p_i$ as initial selection probabilities relative to the estimator using $y_{1i}/Y_1$ as initial selection probabilities in estimating the current population total; therefore, a relatively small value of $h$ indicates that the Prasad-Graham scheme, which uses the information obtained on the previous occasion in selecting the sample on the current occasion, outperforms the Chotai scheme and the Ghangurde-Rao scheme.
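The efficiency measure $h$ can be evaluated directly for a known population. The sketch below assumes that $h$ is the ratio of two PPS-type variances of the second-occasion values: one using $y_{1i}/Y_1$ as selection probabilities (numerator) and one using $p_i$ (denominator). The function name is hypothetical, and the first-occasion values are assumed strictly positive.

```python
def efficiency_h(y1, y2, p):
    """Ratio of the variance under y1-proportional probabilities to that under p.

    Assumes every y1 value is positive and that the denominator is non-zero.
    """
    Y1, Y2 = sum(y1), sum(y2)
    # Variance-type quantity with p_i as selection probabilities.
    V2 = sum((b / pi - Y2) ** 2 * pi for b, pi in zip(y2, p))
    # Same quantity with y1_i / Y1 as selection probabilities.
    V_star = sum((b * Y1 / a - Y2) ** 2 * a / Y1 for a, b in zip(y1, y2))
    return V_star / V2
```

Two limiting cases help interpretation: if the second-occasion values are exactly proportional to the first-occasion values the ratio is zero, and if $p_i$ already equals $y_{1i}/Y_1$ the ratio is one.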


Chapter 3

Estimation of median in PPS

sampling on two occasions

The problem of quantile estimation is often considered when study variables, such as income, have skewed distributions. This is because, unlike the population total, quantiles are not affected by extreme values. In this chapter, the estimation procedures discussed in Chapter 2 are extended to the situation where the median of a finite population is estimated on each of the two occasions, with the current estimate still of chief interest. The extension of the procedures is not straightforward. Since the population median is a nonlinear function of the population values, we consider the estimating equation approach (see Binder and Patak (1994) and Thompson (1997)). In the following sections we discuss the estimating equation approach for unequal probability sampling.

3.1 Estimating equations

According to Binder and Patak (1994) and Thompson (1997), the population median $\theta_N$ can be defined as the solution of the population estimating equation

$$\sum_{i=1}^{N} \left[ I(y_i \le \theta_N) - \frac{1}{2} \right] = 0, \tag{3.1}$$

and then $\hat{\theta}$, an estimator of the population median, can be defined as the solution of the sample estimating equation

$$\sum_{i \in s} \frac{I(y_i \le \hat{\theta}) - \frac{1}{2}}{\pi_i} = 0, \tag{3.2}$$

where $I(\cdot)$ is the indicator function taking the value 1 when the condition is satisfied and 0 otherwise, $s$ is the set of population units in the sample, and $\pi_i$ denotes the probability of inclusion of the $i$-th unit. That is,

$$\pi_i = \sum_{s \ni i} p(s). \tag{3.3}$$

For any sampling design, (3.2) is unbiased for (3.1) at any fixed value of $\theta$, and one may expect $\hat{\theta}$ to be close to $\theta_N$ for large samples. The next section deals with estimation of the median based on estimating equations for the Des Raj (1965) sampling scheme.
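Because the sample estimating function is a step function of $\theta$, it rarely equals zero exactly, so a convention is needed for "the solution". The sketch below takes the smallest observed value at which the weighted sum becomes non-negative, which amounts to a weighted median with weights $1/\pi_i$. This convention, and the function name `solve_median`, are assumptions made for illustration and are not stated in the thesis.

```python
def solve_median(y, weights):
    """Smallest observed theta with sum_i w_i * (I(y_i <= theta) - 1/2) >= 0.

    y       : observed values for the sampled units
    weights : positive design weights, e.g. 1 / pi_i
    """
    pairs = sorted(zip(y, weights))
    half = 0.5 * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        # The estimating function is non-negative once the cumulative
        # weight reaches half of the total weight.
        if cumulative >= half:
            return value
    raise ValueError("empty sample")
```

For example, `solve_median([3, 1, 2], [1, 1, 1])` returns 2, the ordinary sample median, while unequal weights shift the solution towards the values carried by heavily weighted units.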

3.2 Estimation of median for Des Raj scheme

All the notation introduced in Section 2.1, together with the following, will be used throughout this chapter:

$\hat{\theta}_1$ = estimate of the population median on the first occasion

$\hat{\theta}_2$ = estimate of the population median on the second occasion

$\hat{\theta}_{2u}$ = estimate of the population median for unmatched units on the second occasion

$\hat{\theta}_{2m}$ = estimate of the population median for matched units on the second occasion

The estimation procedure for the population total under the Des Raj (1965) scheme, which was described in Section 2.2, is now extended to estimate the population median by incorporating the idea of estimating equations introduced in Section 3.1.


• On the first occasion:

Suppose we want to calculate $\hat{\theta}_1$, the estimate of the population median on the first occasion. To do this, replace $y_{1i}$ in (2.1) by the corresponding indicator function $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$, and denote the new formula by $T_1$:

$$T_1 = \sum_{i \in s} \frac{I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}}{n p_i}, \tag{3.4}$$

and $\hat{\theta}_1$ is such that $T_1 = 0$.

• On the second occasion:

(1) To obtain $\hat{\theta}_{2u}$, the estimate of the population median for unmatched units on the second occasion, replace $y_{2i}$ in (2.3) by the corresponding indicator function $I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}$, and denote the new formula by $T_{2u}$:

$$T_{2u} = \sum_{i \in s_2} \frac{I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}}{u p_i}, \tag{3.5}$$

and $\hat{\theta}_{2u}$ is such that $T_{2u} = 0$.

(2) To obtain $\hat{\theta}_{2m}$, the estimate of the population median for matched units on the second occasion, replace $y_{1i}$ and $y_{2i}$ in (2.4) by the corresponding indicator functions $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$ and $I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}$ respectively, and denote the new formula by $T_{2m}$:

$$T_{2m} = \sum_{i \in s} \frac{I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}}{n p_i} + \sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)}{m p_i}. \tag{3.6}$$

To find $\hat{\theta}_{2m}$, we set this formula to zero; the solution is the estimate of the current population median for matched units. Notice that the first term in the formula is the same as $T_1$ in (3.4), which has already been set to zero on the first occasion. Therefore, we only need to set the second term in the formula to zero to obtain $\hat{\theta}_{2m}$. That is:

$$\hat{\theta}_{2m} \text{ is such that } \sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)}{m p_i} = 0.$$

Thus, $\hat{\theta}_{2u}$ and $\hat{\theta}_{2m}$ independently estimate the population median for the second occasion ($\theta_2$). One could obtain a composite estimator of $\theta_2$ as a weighted average of these two estimators; that is, $\hat{\theta}_2 = Q\hat{\theta}_{2u} + (1 - Q)\hat{\theta}_{2m}$. With optimal weights this is an optimal estimator of $\theta_2$, but the optimal weights are functions of the variances of the two estimators, which are difficult to evaluate because they require density estimation. To overcome this problem, we instead form an optimal estimating equation as a weighted average of the two estimating functions $T_{2u}$ and $T_{2m}$, and use it to obtain a better estimator of the population median $\theta_2$ based on the unmatched and matched samples. That is, consider the following estimating function

$$T_2 = Q T_{2u} + (1 - Q) T_{2m}$$

with

$$\mathrm{Var}(T_2) = Q^2\, \mathrm{Var}(T_{2u}) + (1 - Q)^2\, \mathrm{Var}(T_{2m})$$

because $T_{2u}$ and $T_{2m}$ are independent and their covariance is zero. Here $Q$ is a weight ($0 \le Q \le 1$). The optimal value of $Q$ is the value that minimizes the variance of $T_2$, and it has the form

$$Q = \frac{\mathrm{Var}(T_{2m})}{\mathrm{Var}(T_{2u}) + \mathrm{Var}(T_{2m})}$$

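where $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$ are derived in Appendix A, and their formulas are:

$$\mathrm{Var}(T_{2u}) = \frac{1}{u} \left( \sum_{i=1}^{N} \frac{w_i^2}{p_i} - W^2 \right), \tag{3.7}$$

with

$$w_i = I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}, \quad W = \sum_{i=1}^{N} w_i,$$

and

$$\mathrm{Var}(T_{2m}) = \frac{1}{n} \left( \sum_{i=1}^{N} \frac{w_i'^2}{p_i} - W'^2 \right) + \frac{n - m}{mn} \left( \sum_{i=1}^{N} \frac{w_i^{*2}}{p_i} - W^{*2} \right), \tag{3.8}$$

with

$$w_i' = I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}, \quad W' = \sum_{i=1}^{N} w_i',$$

$$w_i^* = I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1), \quad W^* = \sum_{i=1}^{N} w_i^*.$$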

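Using $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$, the optimal weight $Q$ is obtained, and the estimate of the current population median is then calculated as follows:

$$\hat{\theta}_2 \text{ is such that } \tilde{T}_2 = Q \tilde{T}_{2u} + (1 - Q) \tilde{T}_{2m} = 0,$$

where

$$\tilde{T}_{2u} = \sum_{i \in s_2} \frac{I(y_{2i} \le \hat{\theta}_2) - \frac{1}{2}}{u p_i} \tag{3.9}$$

and

$$\tilde{T}_{2m} = \sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_2) - I(y_{1i} \le \hat{\theta}_1)}{m p_i}. \tag{3.10}$$

Notice that $\tilde{T}_{2u}$ is obtained by replacing $\hat{\theta}_{2u}$ in (3.5) by $\hat{\theta}_2$, and $\tilde{T}_{2m}$ is obtained by replacing $\hat{\theta}_{2m}$ in (3.6) by $\hat{\theta}_2$ and discarding the first term, which is zero.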
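The whole Des Raj median procedure (variances, optimal weight, composite estimating equation) can be sketched for a simulation setting in which the population values are known. Several points are assumptions of this sketch rather than statements from the thesis: the second variance term is given the coefficient $1/m - 1/n$, which is what the usual two-phase argument yields; the composite equation is solved by taking the smallest observed second-occasion value at which the step function becomes non-negative; and all function names are hypothetical.

```python
def indicator(value, theta):
    return 1.0 if value <= theta else 0.0

def spread(z, p):
    # sum_i z_i^2 / p_i - (sum_i z_i)^2 over the whole population
    return sum(zi * zi / pi for zi, pi in zip(z, p)) - sum(z) ** 2

def optimal_weight(y1, y2, p, n, m, theta1, theta2u, theta2m):
    """Weight Q = Var(T2m) / (Var(T2u) + Var(T2m)) from population values."""
    u = n - m
    w = [indicator(b, theta2u) - 0.5 for b in y2]
    w_prime = [indicator(b, theta2m) - 0.5 for b in y2]
    w_star = [indicator(b, theta2m) - indicator(a, theta1) for a, b in zip(y1, y2)]
    var_u = spread(w, p) / u
    # Assumed coefficient (1/m - 1/n) on the matched-difference term.
    var_m = spread(w_prime, p) / n + (1.0 / m - 1.0 / n) * spread(w_star, p)
    return var_m / (var_u + var_m)

def composite_median(y1, y2, p, s1, s2, theta1, Q):
    """Solve Q * T2u(theta) + (1 - Q) * T2m(theta) = 0 over observed values."""
    u, m = len(s2), len(s1)

    def T2(theta):
        t_u = sum((indicator(y2[i], theta) - 0.5) / (u * p[i]) for i in s2)
        t_m = sum((indicator(y2[i], theta) - indicator(y1[i], theta1)) / (m * p[i])
                  for i in s1)
        return Q * t_u + (1 - Q) * t_m

    candidates = sorted({y2[i] for i in s1} | {y2[i] for i in s2})
    for theta in candidates:          # T2 is a non-decreasing step function of theta
        if T2(theta) >= 0:
            return theta
    return candidates[-1]
```

Searching only over observed values is sufficient because the estimating function changes only at those points; no density estimate is needed at any stage, which is the practical advantage claimed for the estimating equation approach.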

3.3 Estimation of median for Ghangurde-Rao scheme

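In Section 2.3, the estimation of the population total for the Ghangurde-Rao (1969) scheme was discussed; it is now extended to estimate the population median.

• On the first occasion:

To obtain $\hat{\theta}_1$, the estimate of the population median on the first occasion, replace $y_{1i}$ in (2.10) by the corresponding indicator function $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$, and denote the new formula by $T_1$:

$$T_1 = \sum_{i \in s} \frac{\left[ I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2} \right] P_i}{p_i}, \tag{3.11}$$

and $\hat{\theta}_1$ is such that $T_1 = 0$.

• On the second occasion:

(1) To calculate $\hat{\theta}_{2u}$, the estimate of the population median for unmatched units on the second occasion, replace $y_{2i}$ in (2.12) by the corresponding indicator function $I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}$, and denote the new formula by $T_{2u}$:

$$T_{2u} = \sum_{i \in s_2} \frac{\left[ I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2} \right] P_i^*}{p_i}, \tag{3.12}$$

and $\hat{\theta}_{2u}$ is such that $T_{2u} = 0$.

(2) To obtain $\hat{\theta}_{2m}$, the estimate of the population median for matched units on the second occasion, replace $y_{1i}$ and $y_{2i}$ in (2.13) by the corresponding indicator functions $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$ and $I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}$ respectively, and denote the new formula by $T_{2m}$:

$$T_{2m} = \sum_{i \in s} \frac{\left[ I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2} \right] P_i}{p_i} + \frac{n}{m} \sum_{i \in s_1} \frac{\left[ I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1) \right] P_i}{p_i}. \tag{3.13}$$

Setting this formula to zero gives $\hat{\theta}_{2m}$. The first term in the formula has already been set to zero on the first occasion because it is the same as $T_1$ in (3.11). Therefore, only the second term needs to be set to zero, and then

$$\hat{\theta}_{2m} \text{ is such that } \frac{n}{m} \sum_{i \in s_1} \frac{\left[ I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1) \right] P_i}{p_i} = 0.$$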

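The estimate $\hat{\theta}_2$ of the current population median for the Ghangurde-Rao scheme is obtained in the same way as for the Des Raj scheme in Section 3.2:

$$\hat{\theta}_2 \text{ is such that } \tilde{T}_2 = Q \tilde{T}_{2u} + (1 - Q) \tilde{T}_{2m} = 0,$$

where

$$Q = \frac{\mathrm{Var}(T_{2m})}{\mathrm{Var}(T_{2u}) + \mathrm{Var}(T_{2m})}$$

is the optimal weight, calculated from $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$, which are derived in Appendix A. Their formulas are:

$$\mathrm{Var}(T_{2u}) = \frac{N - u}{(N - 1)u} \left( \sum_{i=1}^{N} \frac{w_i^2}{p_i} - W^2 \right), \tag{3.14}$$

where

$$w_i = I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}, \quad W = \sum_{i=1}^{N} w_i,$$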

and

Var(T2m) = ^ ( | ^ _ ^

+

(N-

n—m mn(N-l)

N (jV-2n + £ ) £ ^ - + ( n - l ) A W

i=i

21

*2 (3.15)

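where

$$w_i' = I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}, \quad W' = \sum_{i=1}^{N} w_i',$$

$$w_i^* = I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1), \quad W^* = \sum_{i=1}^{N} w_i^*.$$

Besides, $\tilde{T}_{2u}$ is obtained by replacing $\hat{\theta}_{2u}$ in (3.12) by $\hat{\theta}_2$, and $\tilde{T}_{2m}$ is obtained by replacing $\hat{\theta}_{2m}$ in (3.13) by $\hat{\theta}_2$ and discarding the first term, which is zero. That is,

$$\tilde{T}_{2u} = \sum_{i \in s_2} \frac{\left[ I(y_{2i} \le \hat{\theta}_2) - \frac{1}{2} \right] P_i^*}{p_i}, \tag{3.16}$$

and

$$\tilde{T}_{2m} = \frac{n}{m} \sum_{i \in s_1} \frac{\left[ I(y_{2i} \le \hat{\theta}_2) - I(y_{1i} \le \hat{\theta}_1) \right] P_i}{p_i}. \tag{3.17}$$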

3.4 Estimation of median for Prasad-Graham scheme

The estimation of population total for the Prasad-Graham (1994) scheme was dis

cussed in Section 2.5, and an extension of this approach will be applied to estimate

population median in this section.


If we want to calculate the estimate of population median on the first occa

sion 9\, we can replace yu in (2.15) by the corresponding indicator function

I{yu ^ #1) ~~ §> a nd denote the new formula as T\. Here, T\ is actually the

same as defined in (3.11) of the Ghangurde-Rao (1969) scheme, and &i can

also be calculated by solving the equation Ti = 0.


22

(1) To obtain the estimate of the population median for unmatched units on the second occasion, $\hat\theta_{2u}$, one can replace $y_{2i}$ in (2.17) by the corresponding indicator function $I(y_{2i}\le\hat\theta_{2u})-\frac{1}{2}$ and denote the new formula by $\hat T_{2u}$. Here $\hat T_{2u}$ is as defined in (3.12) for the Ghangurde-Rao scheme, and $\hat\theta_{2u}$ is again the solution of $\hat T_{2u}=0$.

(2) To obtain the estimate of the population median for matched units on the second occasion, $\hat\theta_{2m}$, first plug equation (2.27) into (2.26) to get

(V2iPi/Pi)Pi *Zn = £

iesi Pi

then replace $y_{2i}$ by the corresponding indicator function $I(y_{2i}\le\hat\theta_{2m})-\frac{1}{2}$, so that the new formula, denoted by $\hat T_{2m}$, becomes:

^ [(/(to < e2m) - 1 ) PI Pi Pi

•* 2m / . „ Pi

where p* is defined in (2.24) as

* VuPi r>- = .

(3.18)

' Pi

Note that one cannot replace $y_{1i}$ in $p^*_i$ by the corresponding indicator function $I(y_{1i}\le\hat\theta_1)-\frac{1}{2}$. If we did so, the denominator of $\hat T_{2m}$ would be built from $[I(y_{1i}\le\hat\theta_1)-\frac{1}{2}]P_i/p_i$, which is the same as $\hat T_1$ and has already been set to zero on the first occasion, so $\hat T_{2m}$ would be undefined. Therefore, we keep the value $y_{1i}$ observed on the first occasion in the estimation procedure. To obtain $\hat\theta_{2m}$, we set $\hat T_{2m}=0$; the solution of this equation is the estimate for matched units on the second occasion.


Now let us calculate the estimate of the current population median $\hat\theta_2$ by solving the estimating equation for $\hat T_2$, a composite of $\hat T_{2u}$ and $\hat T_{2m}$. The approach is similar to that discussed in Section 3.2 for the Des Raj (1965) scheme and in Section 3.3 for the Ghangurde-Rao (1969) scheme:

$$\hat\theta_2 \text{ is such that } \hat T_2 = Q\hat T_{2u} + (1-Q)\hat T_{2m} = 0,$$

where $Q$ is the optimal weight calculated by

$$Q=\frac{\mathrm{Var}(\hat T_{2m})}{\mathrm{Var}(\hat T_{2u})+\mathrm{Var}(\hat T_{2m})},$$

with $\mathrm{Var}(\hat T_{2u})$ the same as (3.14) for the Ghangurde-Rao scheme, and $\mathrm{Var}(\hat T_{2m})$ derived in Appendix A. The form of $\mathrm{Var}(\hat T_{2m})$ is

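$$\mathrm{Var}(\hat T_{2m})=\frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i}-W'^2\right)+\frac{N(n-m)}{mn(N-1)}\left(\sum_{i=1}^{N}\frac{w_i'^2\,Y_1}{y_{1i}}-W'^2\right), \tag{3.19}$$

where

$$w'_i=I(y_{2i}\le\hat\theta_{2m})-\frac{1}{2},\qquad W'=\sum_{i=1}^{N}w'_i,$$

and $Y_1=\sum_{i=1}^{N}y_{1i}$ is the population total on the first occasion.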

Moreover, $\hat T_{2u}$ is the same as defined in (3.16) for the Ghangurde-Rao scheme, and

(/(to < k) - l ) Pi/P: Pi T2m = E -^ 3 T ^ j— (3.20)

P-i£si

One can see that $\hat T_{2m}$ has the form obtained by replacing $\hat\theta_{2m}$ in (3.18) by $\hat\theta_2$. We will compare the performance of the Des Raj, Ghangurde-Rao, and Prasad-Graham schemes in estimating population medians in the simulation study.
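The weight $Q$ used for all three schemes has the form of the weight that minimizes the variance of a linear combination of two estimators. The short derivation below is a sketch under the assumption that $\hat T_{2u}$ and $\hat T_{2m}$ are treated as uncorrelated. That assumption is suggested by the form of $Q$ and is not stated in this chapter. The shorthand $V_u$ and $V_m$ is introduced here only for illustration.

```latex
% Sketch: V_u = Var(\hat T_{2u}), V_m = Var(\hat T_{2m}), assumed uncorrelated.
\begin{align*}
\mathrm{Var}\bigl(Q\hat T_{2u} + (1-Q)\hat T_{2m}\bigr)
  &= Q^2 V_u + (1-Q)^2 V_m, \\
\frac{d}{dQ}\Bigl[Q^2 V_u + (1-Q)^2 V_m\Bigr]
  &= 2QV_u - 2(1-Q)V_m = 0
  \;\Longrightarrow\; Q = \frac{V_m}{V_u + V_m}, \\
\min_Q \mathrm{Var}
  &= \frac{V_u V_m}{V_u + V_m}.
\end{align*}
```

Under this assumption the composite puts more weight on whichever component has the smaller variance, and its minimized variance is no larger than either $V_u$ or $V_m$.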


Chapter 4

A simulation study based on generated populations

To compare the proposed estimation method under the Des Raj (1965), Ghangurde-Rao (1969), and Prasad-Graham (1994) schemes in estimating the median of a finite population, we conducted a simulation study based on four sets of randomly generated populations. In this chapter, we consider two models for simulating finite populations.

4.1 Description of two sets of generated populations based on model 1

In model 1, we first constructed two fixed finite populations of the size measure $x_i$ ($i=1,2,\dots,N$) with $N=500$ and $720$ units, respectively. The two populations of $x_i$ were generated from a normal distribution with mean $\mu_x=25$ and standard deviation $\sigma_x=5$. Using these $x$ values, the population values for the first occasion $y_{1i}$ were simulated by the model $y_{1i}=500+0.5x_i+e_1$, where the error term $e_1$ was generated from a normal distribution with mean $\mu=0$ and standard deviation $\sigma=8\sqrt{x_i}$. Then we constructed the population values for the second occasion $y_{2i}$ from the generated $y_{1i}$ by the model $y_{2i}=600+5.1y_{1i}+e_2$, where the error term $e_2$ was also generated from a normal distribution with mean $\mu=0$ and standard deviation $\sigma=8\sqrt{x_i}$. We now have two sets of populations of $x_i$, $y_{1i}$ and $y_{2i}$ with $N=500$ and $720$, respectively, based on model 1. Another two sets of populations are simulated in a different way in the next section.

4.2 Description of two sets of generated populations based on model 2

In model 2, we also simulated two fixed finite populations of the size measure $x_i$ ($i=1,2,\dots,N$) with $N=500$ and $720$ units, respectively, but the two populations of $x_i$ were generated from an exponential distribution with mean $\mu_x=0.1$ instead. After that, we constructed the population values for the first occasion $y_{1i}$ from the generated $x$ values by the model $y_{1i}=25+1.5x_i+e_1$, where the error term $e_1$ was generated from a normal distribution with mean $\mu=0$ and standard deviation $\sigma=5\sqrt{0.5x_i}$. Using these $y_1$ values, the population values for the second occasion $y_{2i}$ were simulated by the model $y_{2i}=1.3y_{1i}+e_2$, where the error term $e_2$ was generated from a normal distribution with mean $\mu=0$ and standard deviation $\sigma=2$. We then have two sets of populations of $x_i$, $y_{1i}$ and $y_{2i}$ with $N=500$ and $720$, respectively, based on model 2.


4.3 Computations on generated populations

For each set of populations with $N=500$ based on model 1 and model 2, we considered two choices of $n$ and $m$: $n=100$, $m=50$ and $n=250$, $m=125$. We applied the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes to the two cases, and calculated both the estimate of the first-occasion median $\hat\theta_1$ and the estimate of the current median $\hat\theta_2$ for each scheme. This whole process was repeated $R=1000$ times.

For each set of populations with $N=720$ based on model 1 and model 2, we also considered two choices of $n$ and $m$: $n=180$, $m=60$ and $n=180$, $m=90$. As before, the Des Raj, Ghangurde-Rao, and Prasad-Graham schemes were applied to the two cases, and both $\hat\theta_1$ and $\hat\theta_2$ were obtained for each scheme. This whole process was also repeated $R=1000$ times.
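One replicate of the sample selection can be sketched as follows for the Des Raj scheme, using the design described in Appendix A.1: PPSWR for the first-occasion sample $s$ and for the unmatched sample $s_2$, and SRSWOR for the matched subsample $s_1$. This is an illustrative sketch, not the code used for the thesis (which is given in Appendix B); the seed and the variable names are assumptions.

```r
# Sketch of one replicate of Des Raj sample selection (names are illustrative).
set.seed(2)
N <- 500; n <- 100; m <- 50
u <- n - m                       # number of unmatched units

x <- rnorm(N, 25, 5)             # size measure, as in model 1
stopifnot(all(x > 0))            # selection probabilities must be positive
p <- x / sum(x)                  # p_i proportional to size

# First occasion: n draws with probability proportional to size, with replacement
s <- sample(N, n, replace = TRUE, prob = p)
# Matched subsample: simple random sample of m of the n draws, without replacement
s1 <- s[sample.int(n, m)]
# Unmatched sample: u fresh PPSWR draws from the entire population
s2 <- sample(N, u, replace = TRUE, prob = p)
```

Sampling positions with `sample.int(n, m)` rather than unit labels keeps the subsample a simple random sample of the draws even when a unit is drawn more than once.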

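The goal of this simulation study is to compare the performance of the three sampling schemes in estimating population medians. To this end, we computed the relative biases and relative mean squared errors of $\hat\theta_1$ and $\hat\theta_2$ for each scheme. The relative bias (Rel.bias) of $\hat\theta_1$ in percentage was calculated as

$$\text{Rel.bias1\%}=\frac{\mathrm{bias}(\hat\theta_1)}{\theta_1}\times 100\%=\frac{E(\hat\theta_1)-\theta_1}{\theta_1}\times 100\%, \tag{4.1}$$

where $\theta_1$ denotes the true value of the population median on the first occasion, and $E$ denotes expectation with respect to the design, computed as the average over the $R=1000$ runs. For example, $E(\hat\theta_1)=\sum_{r=1}^{R}\hat\theta_1(r)/R$. Similarly, the relative bias of $\hat\theta_2$ in percentage was calculated as

$$\text{Rel.bias2\%}=\frac{\mathrm{bias}(\hat\theta_2)}{\theta_2}\times 100\%=\frac{E(\hat\theta_2)-\theta_2}{\theta_2}\times 100\%, \tag{4.2}$$

where $\theta_2$ denotes the true value of the population median on the second occasion.

The relative mean squared errors (Rel.MSE) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were computed as

$$\text{Rel.MSE1\%}=\frac{\mathrm{MSE}(\hat\theta_1)}{\theta_1}\times 100\%=\frac{E(\hat\theta_1-\theta_1)^2}{\theta_1}\times 100\%, \tag{4.3}$$

and

$$\text{Rel.MSE2\%}=\frac{\mathrm{MSE}(\hat\theta_2)}{\theta_2}\times 100\%=\frac{E(\hat\theta_2-\theta_2)^2}{\theta_2}\times 100\%, \tag{4.4}$$

where MSE denotes the mean squared error over the $R=1000$ runs. For instance,

$$\mathrm{MSE}(\hat\theta_1)=\sum_{r=1}^{R}\left[\hat\theta_1(r)-\theta_1\right]^2/R.$$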
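The relative bias and relative MSE defined above can be computed from the vector of replicate estimates with two short helper functions. This is a sketch with illustrative names (`rel_bias_pct`, `rel_mse_pct`), not the thesis code. It assumes, as the definitions above read, that the MSE is divided by the true median itself and not by its square.

```r
# Sketch: relative bias and relative MSE (in percent) over simulation runs.
# est: vector of estimates from the R runs; theta: true population median.
rel_bias_pct <- function(est, theta) (mean(est) - theta) / theta * 100
rel_mse_pct  <- function(est, theta) mean((est - theta)^2) / theta * 100

# Toy example with four "runs" and true value 10
est <- c(9, 11, 10, 12)
rel_bias_pct(est, 10)   # 5
rel_mse_pct(est, 10)    # 15
```

Because the MSE is not divided by $\theta^2$, Rel.MSE depends on the scale of the study variable, so its values are not comparable across populations with different medians.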

Recall that $h$, defined in equation (2.29), measures the efficiency of the estimator using $x_i$ as a size measure relative to the estimator using $y_{1i}$ as a size measure in estimating the current population total, and $\delta$ is the correlation between $y_{1i}/p_i$ and $y_{2i}/p_i$. According to Prasad and Graham (1994), when $h$ is relatively small, specifically for populations with $h<1-\delta^2$, the Prasad-Graham scheme, which uses the information obtained on the previous occasion, is superior to the Chotai and Ghangurde-Rao schemes in estimating the current population total. We would like to see whether this also holds when estimating the population median. Therefore, we calculated the ratio $(1-\delta'^2)/h'$ for each set of generated populations; if the ratio is greater than 1, we expect the Prasad-Graham scheme to perform better than the Ghangurde-Rao scheme. To obtain $\delta'$, let $z_{1i}=I(y_{1i}\le\theta_1)-\frac{1}{2}$ and $z_{2i}=I(y_{2i}\le\theta_2)-\frac{1}{2}$, where $\theta_1$ and $\theta_2$ are the true values of the first-occasion median and the current median, and replace $y_{1i}$ by $z_{1i}$ and $y_{2i}$ by $z_{2i}$ in $\delta$. To obtain $h'$, we replace $y_{2i}$ by $z_{2i}$ in $h$. That is,

$$\delta'=\mathrm{corr}\left(\frac{z_{1i}}{p_i},\frac{z_{2i}}{p_i}\right), \tag{4.5}$$

and

$$h'=\frac{V_3'}{V_2'}, \tag{4.6}$$

with

$$V_2'=\sum_{i=1}^{N}\frac{z_{2i}^2}{p_i}-Z_2^2,\qquad V_3'=\sum_{i=1}^{N}\frac{z_{2i}^2\,Y_1}{y_{1i}}-Z_2^2,$$

where

$$Y_1=\sum_{i=1}^{N}y_{1i},\qquad Z_2=\sum_{i=1}^{N}z_{2i}.$$

The results obtained from each set of generated populations are compared for

the three sampling schemes in the next section.

4.4 Numerical comparisons

The bias of an estimator is the difference between the estimator's expectation and the true value of the parameter being estimated, and the MSE is the expected squared difference between the estimator and the true value. The best estimating scheme is therefore the one that provides the smallest bias (or relative bias) and the smallest MSE (or relative MSE).

4.4.1 Comparisons of results for the two sets of generated populations based on model 1

The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for the two sets of generated populations based on model 1 are presented in Table 4.1, and we compared the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each set of populations.

Table 4.1: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 1: number of simulations = 1000

Scheme  N    (1-δ'^2)/h'  n    m    Rel.bias1%  Rel.bias2%  Rel.MSE1%  Rel.MSE2%
DR      500  0.1713       100  50   0.8360      0.7863      5.7012     28.5914
GR      500  0.1713       100  50   0.7591      0.7314      4.8029     24.8262
PG      500  0.1713       100  50   0.7591      0.8426      4.8029     31.9417
DR      500  0.1713       250  125  0.5300      0.5589      2.3827     14.7048
GR      500  0.1713       250  125  0.3774      0.4614      1.2001      9.7248
PG      500  0.1713       250  125  0.3774      0.5331      1.2001     13.2954
DR      720  0.2799       180  60   0.5838      0.3758      2.9114      8.1422
GR      720  0.2799       180  60   0.4944      0.3661      2.1836      7.5581
PG      720  0.2799       180  60   0.4944      0.4240      2.1836      9.6897
DR      720  0.2799       180  90   0.5838      0.3937      2.9114      9.0180
GR      720  0.2799       180  90   0.4944      0.3514      2.1836      6.9015
PG      720  0.2799       180  90   0.4944      0.4017      2.1836      8.8674

For the set of populations with $N=500$, case 1 is $n=100$, $m=50$, and case 2 is $n=250$, $m=125$. From case 1 to case 2, the sampling fraction $n/N$ increases, but the proportion of matched units $m/n$ stays the same. All the relative biases and relative MSEs decrease as the sampling fraction increases from case 1 to case 2. In both cases, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, because the sampling procedures on the first occasion are the same for these two schemes. The Des Raj scheme has larger Rel.bias1 and Rel.MSE1 since it is under the PPSWR framework while the Ghangurde-Rao scheme is under the PPSWOR framework. For example, when $n=100$, $m=50$, Rel.MSE1 is 4.8029% for the Prasad-Graham and Ghangurde-Rao schemes, but 5.7012% for the Des Raj scheme. Next, we compared the relative bias and relative MSE of the current estimates for the three schemes. For case 1, Rel.bias2 and Rel.MSE2 are smallest for the Ghangurde-Rao scheme and largest for the Prasad-Graham scheme. For case 2, Rel.bias2 and Rel.MSE2 are again smallest for the Ghangurde-Rao scheme, but largest for the Des Raj scheme.

For the set of populations with $N=720$, case 1 is $n=180$, $m=60$, and case 2 is $n=180$, $m=90$. From case 1 to case 2, the sampling fraction $n/N$ stays the same, but the proportion of matched units $m/n$ increases. Since $N$ and $n$ are the same for the two cases, Rel.bias1 and Rel.MSE1 do not change from case 1 to case 2. The conclusion drawn on Rel.bias1 and Rel.MSE1 is the same as for the set of populations with $N=500$. That is, in both cases, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, and the Des Raj scheme has larger values. For instance, when $n=180$, $m=60$, Rel.MSE1 is 2.1836% for the Prasad-Graham and Ghangurde-Rao schemes, but 2.9114% for the Des Raj scheme. When comparing Rel.bias2 and Rel.MSE2 for the three schemes, the comparison for case 1 is the same as for case 1 of the set of populations with $N=500$: the Ghangurde-Rao scheme has the smallest Rel.bias2 and Rel.MSE2, whereas the Prasad-Graham scheme has the largest. For case 2, the Ghangurde-Rao scheme still has the smallest Rel.bias2 and Rel.MSE2, and the Prasad-Graham scheme has the largest Rel.bias2; however, the scheme with the largest Rel.MSE2 is now the Des Raj scheme.

In summary, the Ghangurde-Rao scheme is better than the Des Raj scheme for estimating both the previous and current population medians, since the Ghangurde-Rao scheme is under the PPSWOR framework whereas the Des Raj scheme is under the PPSWR framework. For estimating the previous population median, the Prasad-Graham scheme performs the same as the Ghangurde-Rao scheme because they use the same sampling and estimation procedures on the first occasion. However, for estimating the current population median, the Prasad-Graham scheme does not outperform the Ghangurde-Rao scheme. One possible reason is that $h'>1-\delta'^2$: the ratio $(1-\delta'^2)/h'$ is smaller than 1 for both sets of populations, with values 0.1713 for $N=500$ and 0.2799 for $N=720$.

4.4.2 Comparisons of results for the two sets of generated populations based on model 2

The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for the two sets of generated populations based on model 2 are presented in Table 4.2, and we compared the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each set of populations.

Table 4.2: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 2: number of simulations = 1000

Scheme  N    (1-δ'^2)/h'  n    m    Rel.bias1%  Rel.bias2%  Rel.MSE1%  Rel.MSE2%
DR      500  2.7471       100  50   0.5343      1.7677      0.1107     1.9233
GR      500  2.7471       100  50   0.4569      1.5216      0.0802     1.3593
PG      500  2.7471       100  50   0.4569      1.4764      0.0802     1.2916
DR      500  2.7471       250  125  0.3524      1.3079      0.04816    0.8766
GR      500  2.7471       250  125  0.2324      0.8962      0.0227     0.4077
PG      500  2.7471       250  125  0.2324      0.8915      0.0227     0.4167
DR      720  1.9124       180  60   0.4811      1.2552      0.0885     0.8974
GR      720  1.9124       180  60   0.4152      1.0779      0.0683     0.6439
PG      720  1.9124       180  60   0.4152      0.9943      0.0683     0.5542
DR      720  1.9124       180  90   0.4811      1.2482      0.0885     0.8846
GR      720  1.9124       180  90   0.4152      1.0353      0.0683     0.6113
PG      720  1.9124       180  90   0.4152      0.9795      0.0683     0.5528

We considered the same cases as in the previous section. The conclusion drawn from the comparisons of Rel.bias1 and Rel.MSE1 is the same as for the two sets of generated populations based on model 1. That is, the Prasad-Graham and Ghangurde-Rao schemes have the same Rel.bias1 and Rel.MSE1, but the Des Raj scheme has larger values than these two schemes. For example, when $N=500$, $n=100$ and $m=50$, Rel.MSE1 is 0.0802% for the Prasad-Graham and Ghangurde-Rao schemes, but 0.1107% for the Des Raj scheme; when $N=720$, $n=180$ and $m=60$, Rel.MSE1 is 0.0683% for the Prasad-Graham and Ghangurde-Rao schemes, but 0.0885% for the Des Raj scheme. Next, the relative bias and relative MSE of the current estimates for the three schemes were compared. The Des Raj scheme has the largest Rel.bias2 and Rel.MSE2 in all cases, and the Prasad-Graham scheme has the smallest Rel.bias2 and Rel.MSE2 in all cases except $N=500$, $n=250$ and $m=125$. In this particular case, the Prasad-Graham scheme still has the smallest Rel.bias2, but its Rel.MSE2 (0.4167%) is slightly greater than that of the Ghangurde-Rao scheme (0.4077%).

In summary, the Des Raj scheme gives the largest bias and MSE in estimating both the previous and current medians among the three schemes compared. The Prasad-Graham and Ghangurde-Rao schemes perform the same in estimating the previous population median. For the estimation of the current population median, the Prasad-Graham scheme is superior to the Ghangurde-Rao scheme in almost all situations. A possible reason is that $h'$ is now relatively small, with $h'<1-\delta'^2$: the ratio $(1-\delta'^2)/h'$ is greater than 1 for both sets of populations, with values 2.7471 for $N=500$ and 1.9124 for $N=720$.

Chapter 5

A simulation study based on real data

In the previous chapter, we conducted a simulation study based on four sets of randomly generated populations. In this chapter, another simulation study based on two real data sets is carried out to compare the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in estimating the median of a finite population.

5.1 Description of data sets

We used two real data sets, A and B. Data set A is from Murthy (1967). It gives the area under wheat in 1964 and in 1963, and the total cultivated area in 1961, for 34 villages in India. The cultivated area in 1961 is taken as the size measure $x$, the area under wheat in 1963 as the value observed on the first occasion $y_1$, and the area under wheat in 1964 as the value observed on the second occasion $y_2$. For the sampling schemes we compared, $N/n$, $N/u$ and $n/m$ are all assumed to be integers, so with $N=34$ observations it is difficult to choose values for $n$ and $m$. Therefore, we deleted the two smallest and two largest observations based on the values of $y_2$, so that the total number of observations becomes $N=30$. Data set B is from Sukhatme (1970). It gives the area under wheat in 1937 and in 1936, and the cultivated area in 1930, for 34 villages in India. The cultivated area in 1930 is the size measure $x$, the area under wheat in 1936 is the value observed on the first occasion $y_1$, and the area under wheat in 1937 is the value observed on the second occasion $y_2$. To make $N=30$, the two smallest and two largest observations based on the values of $y_2$ were also deleted.

5.2 Computations on real data

For both data sets A and B, we chose $n=15$ and $m=5$. We applied the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes to each data set, and calculated both the estimate of the first-occasion median $\hat\theta_1$ and the estimate of the current median $\hat\theta_2$ for each scheme. This whole process was also repeated $R=1000$ times. Then, as in the computations on generated populations presented in Section 4.3, we calculated the relative biases and relative mean squared errors of $\hat\theta_1$ and $\hat\theta_2$ for each of the three schemes. The relative biases (Rel.bias) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were obtained as defined in (4.1) and (4.2), respectively. The relative mean squared errors (Rel.MSE) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were calculated as defined in (4.3) and (4.4), respectively. Finally, we computed the ratio $(1-\delta'^2)/h'$ for each of the two data sets, where $\delta'$ is as defined in (4.5) and $h'$ is as defined in (4.6).

The results obtained from each data set are compared for the three sampling

schemes in the next section.


5.3 Numerical comparisons

The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for data set A and data set B are presented in Table 5.1 and Table 5.2, respectively. We compared the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each data set.

Table 5.1: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set A: number of simulations = 1000

Scheme  N   (1-δ'^2)/h'  n   m  Rel.bias1%  Rel.bias2%  Rel.MSE1%  Rel.MSE2%
DR      30  0.0684       15  5  31.05       28.30       2213.383   2195.072
GR      30  0.0684       15  5  24.46       22.72       1388.473   1502.715
PG      30  0.0684       15  5  24.46       25.78       1388.473   1837.650

Table 5.2: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set B: number of simulations = 1000

Scheme  N   (1-δ'^2)/h'  n   m  Rel.bias1%  Rel.bias2%  Rel.MSE1%  Rel.MSE2%
DR      30  0.0600       15  5  30.19       28.10       2132.899   2191.323
GR      30  0.0600       15  5  25.69       22.34       1528.973   1452.867
PG      30  0.0600       15  5  25.69       27.66       1528.973   2063.229

For both data sets A and B, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, but the Des Raj scheme has larger Rel.bias1 and Rel.MSE1. For example, in data set A, Rel.MSE1 is 1388.473% for the Prasad-Graham and Ghangurde-Rao schemes and 2213.383% for the Des Raj scheme. When comparing Rel.bias2 and Rel.MSE2 for the three schemes, both measures are smallest for the Ghangurde-Rao scheme and largest for the Des Raj scheme in both data sets. For instance, in data set A, Rel.MSE2 is 1502.715% for the Ghangurde-Rao scheme, 1837.650% for the Prasad-Graham scheme, and 2195.072% for the Des Raj scheme; in data set B, Rel.MSE2 is 1452.867% for the Ghangurde-Rao scheme, 2063.229% for the Prasad-Graham scheme, and 2191.323% for the Des Raj scheme.

In summary, the Des Raj scheme gives the largest bias and MSE in estimating both the previous and current medians among the three schemes compared. The Prasad-Graham and Ghangurde-Rao schemes perform the same in estimating the previous population median. These conclusions are the same as those drawn for the two sets of generated populations discussed in Section 4.4.2. However, for the estimation of the current population median, the Prasad-Graham scheme is not superior to the Ghangurde-Rao scheme. There are two possible reasons: one is that $h'>1-\delta'^2$, since the ratio $(1-\delta'^2)/h'$ is smaller than 1 for both data sets; the other is that the population size $N$ and the sample sizes $n$ and $m$ are too small for the results to be reliable. The ratios $(1-\delta'^2)/h'$ for data sets A and B are 0.0684 and 0.0600, respectively. Comparing the two ratios, data set A has the larger ratio, that is, a relatively smaller value of $h'$, so we may expect the Prasad-Graham scheme to perform better for data set A than for data set B. To check whether this expectation is reasonable, we calculated the difference in Rel.MSE2 between the Prasad-Graham and Ghangurde-Rao schemes for both data sets. The difference for data set A is (1837.650 - 1502.715)% = 334.935%, and for data set B it is (2063.229 - 1452.867)% = 610.362%. Since the difference for data set A is smaller, the Prasad-Graham scheme indeed performs better, relative to the Ghangurde-Rao scheme, for data set A than for data set B.

Chapter 6

Conclusion and future work

Partial replacement sampling schemes are now common in repeated surveys because they reduce the response burden and improve the efficiency of estimation. After the first of the two successive sampling occasions, part of the units observed on that occasion are rotated out of the sample and replaced by a fresh selection from the entire population. These unmatched units are then observed on the second sampling occasion together with the remaining set of matched units. An important case of successive sampling over two occasions is sampling with probability proportional to size (PPS). This particular case of unequal probability sampling uses auxiliary information to compute the initial selection probabilities, since auxiliary information is usually relatively cheap to obtain and is often available in advance for the entire population. There is extensive work on the estimation of the population total or mean for PPS sampling over two occasions; however, no effort has been made to estimate population quantiles for unequal probability sampling over two occasions. The methods for quantile estimation in successive sampling over two occasions available in the literature to date apply only to simple random sampling, and they require density estimation to obtain estimates of quantiles. In this thesis, we present a new method of estimating the population median (the second quartile, or 50% quantile) for unequal probability sampling over two occasions based on estimating equations, and the proposed method avoids the need for density estimation. Three sampling schemes with unequal probabilities are considered for estimating the population median, and the three schemes are compared. The proposed approach can also be used to estimate other quantiles, such as the first quartile (25% quantile) and the third quartile (75% quantile). In future work, variance estimation for the proposed median estimates will be studied.

Bibliography

[1] Chotai, J. (1974). A Note on the Rao-Hartley-Cochran Method for PPS Sampling Over Two Occasions. The Indian Journal of Statistics, 36, 173-180.

[2] Binder, D. A. and Patak, Z. (1994). Use of Estimating Functions for Estimation from Complex Surveys. Journal of the American Statistical Association, 89(427), 1035-1043.

[3] Ghangurde, P. D. and Rao, J. N. K. (1969). Some Results on Sampling Over Two Occasions. Sankhyā, Series A, 31, 463-472.

[4] Hansen, M. H. and Hurwitz, W. N. (1943). On the Theory of Sampling from Finite Populations. Annals of Mathematical Statistics, 14, 333-362.

[5] Jessen, R. J. (1942). Statistical Investigation of a Sample Survey for Obtaining Farm Facts. Iowa Agricultural Experiment Station Research Bulletin, 304, 1-104.

[6] Murthy, M. N. (1967). Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.

[7] Prasad, N. G. N. and Graham, J. E. (1994). PPS Sampling over Two Occasions. Survey Methodology, 20(1), 59-64.

[8] Raj, D. (1965). On Sampling Over Two Occasions with Probabilities Proportional to Size. Annals of Mathematical Statistics, 36, 327-330.

[9] Singh, H. P., Tailor, R., Singh, S. and Kim, J. M. (2007). Quantile Estimation in Successive Sampling. Journal of the Korean Statistical Society, 36(4), 543-556.

[10] Sukhatme, P. V. and Sukhatme, B. V. (1970). Sampling Theory of Surveys With Applications. Ames, Iowa: Iowa State University Press.

[11] Thompson, M. E. (1997). Theory of Sample Surveys. London: Chapman & Hall, pp. 94-95.

Appendix A

Derivation

A.1 Des Raj Scheme

A.1.1 Var(T2u) in equation (3.7)

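To derive $\mathrm{Var}(\hat T_{2u})$, assume that $\theta_2$ is known in $\hat T_{2u}$ and then replace $\theta_2$ by $\hat\theta_{2u}$ in the derived expression. That is, let

$$\hat T_{2u}=\sum_{i\in s_2}\frac{I(y_{2i}\le\theta_2)-\frac{1}{2}}{u\,p_i}=\sum_{i\in s_2}\frac{w_i}{u\,p_i},$$

where

$$w_i=I(y_{2i}\le\theta_2)-\frac{1}{2}. \tag{A.1}$$

Since sample $s_2$ of $u$ units is selected with PPSWR (probability proportional to the size measure $x_i$, with replacement) from the entire population of $N$ units, $\hat T_{2u}$ is a Hansen-Hurwitz (1943) estimator. Following the variance formula for a Hansen-Hurwitz estimator, the variance of $\hat T_{2u}$ is

$$\mathrm{Var}(\hat T_{2u})=\frac{1}{u}\sum_{i=1}^{N}\left(\frac{w_i}{p_i}-W\right)^2p_i,$$

where

$$W=\sum_{i=1}^{N}w_i. \tag{A.2}$$

In the expression for $\mathrm{Var}(\hat T_{2u})$ above,

$$\sum_{i=1}^{N}\left(\frac{w_i}{p_i}-W\right)^2p_i=\sum_{i=1}^{N}\frac{w_i^2}{p_i}-2W\sum_{i=1}^{N}w_i+W^2\sum_{i=1}^{N}p_i=\sum_{i=1}^{N}\frac{w_i^2}{p_i}-W^2,$$

because $\sum_{i=1}^{N}p_i=1$. Then we get the variance of $\hat T_{2u}$:

$$\mathrm{Var}(\hat T_{2u})=\frac{1}{u}\left(\sum_{i=1}^{N}\frac{w_i^2}{p_i}-W^2\right), \tag{A.3}$$

where in $w_i$, $\theta_2$ is replaced by $\hat\theta_{2u}$.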
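The algebraic step leading to (A.3) can be checked numerically. The sketch below uses a small artificial population; the seed, the population size, and the choice of a threshold other than the median (so that $W\neq 0$) are assumptions made only for this illustration. It compares the two forms of the Hansen-Hurwitz variance.

```r
# Sketch: numerical check that
#   (1/u) * sum((w/p - W)^2 * p)  ==  (1/u) * (sum(w^2/p) - W^2)
set.seed(1)
N <- 20
u <- 5
x <- runif(N, 1, 10)          # artificial size measure
p <- x / sum(x)               # selection probabilities, sum to 1
y2 <- rnorm(N)                # artificial second-occasion values
theta2 <- sort(y2)[8]         # a threshold chosen so that W is not zero
w <- as.numeric(y2 <= theta2) - 1/2
W <- sum(w)

lhs <- sum((w / p - W)^2 * p) / u
rhs <- (sum(w^2 / p) - W^2) / u
all.equal(lhs, rhs)
```

The two expressions agree up to floating-point error because the cross term and the $W^2\sum p_i$ term combine to $-W^2$ when the $p_i$ sum to one.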

A.1.2 Var(T2m) in equation (3.8)

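To derive $\mathrm{Var}(\hat T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $\hat T_{2m}$ and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let

$$\hat T_{2m}=\sum_{i\in s}\frac{I(y_{1i}\le\theta_1)-\frac{1}{2}}{n\,p_i}+\sum_{i\in s_1}\frac{I(y_{2i}\le\theta_2)-I(y_{1i}\le\theta_1)}{m\,p_i}.$$

Since sample $s_1$ of $m$ units is a subset of sample $s$, the variance of $\hat T_{2m}$ is obtained from

$$\mathrm{Var}(\hat T_{2m})=\mathrm{Var}[E(\hat T_{2m}\mid s)]+E[\mathrm{Var}(\hat T_{2m}\mid s)]. \tag{A.4}$$

First let us find $E(\hat T_{2m}\mid s)$. Sample $s_1$ of $m$ units is a simple random sample selected without replacement (SRSWOR) from sample $s$ of $n$ units, so

$$E(\hat T_{2m}\mid s)=\sum_{i\in s}\frac{I(y_{2i}\le\theta_2)-\frac{1}{2}}{n\,p_i}=\sum_{i\in s}\frac{w'_i}{n\,p_i},$$

where

$$w'_i=I(y_{2i}\le\theta_2)-\frac{1}{2}. \tag{A.5}$$

Now, let

$$W'=\sum_{i=1}^{N}w'_i, \tag{A.6}$$

then

$$\mathrm{Var}[E(\hat T_{2m}\mid s)]=\frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i}-W'^2\right), \tag{A.7}$$

because sample $s$ of $n$ units is chosen with PPSWR from the entire population of $N$ units, and $E(\hat T_{2m}\mid s)$ is a Hansen-Hurwitz (1943) estimator. Its variance is found in the same way as (A.3).

Next we find $\mathrm{Var}(\hat T_{2m}\mid s)$. Let

$$w^*_i=I(y_{2i}\le\theta_2)-I(y_{1i}\le\theta_1), \tag{A.8}$$

and

$$W^*=\sum_{i=1}^{N}w^*_i,$$

then

$$\mathrm{Var}(\hat T_{2m}\mid s)=\mathrm{Var}\left(\frac{1}{m}\sum_{i\in s_1}\frac{w^*_i}{p_i}\,\Big|\,s\right)=\frac{n-m}{nm}\cdot\frac{1}{n-1}\sum_{i\in s}\left(\frac{w^*_i}{p_i}-\frac{1}{n}\sum_{j\in s}\frac{w^*_j}{p_j}\right)^2, \tag{A.9}$$

since sample $s_1$ is a simple random sample selected from sample $s$ without replacement (SRSWOR), and $\left(\sum_{i\in s_1}w^*_i/p_i\right)/m$ is the mean of $s_1$. After some simplification, the above equation becomes:

$$\mathrm{Var}(\hat T_{2m}\mid s)=\frac{n-m}{(n-1)m}\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}}{p_i^2}-\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_i}{p_i}\right)^2\right].$$

Then,

$$E[\mathrm{Var}(\hat T_{2m}\mid s)]=\frac{n-m}{(n-1)m}\left\{E\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}}{p_i^2}\right]-E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_i}{p_i}\right)^2\right]\right\}, \tag{A.10}$$

where

$$E\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}}{p_i^2}\right]=\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i},$$

and

$$E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_i}{p_i}\right)^2\right]=\mathrm{Var}\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_i}{p_i}\right)+\left[E\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_i}{p_i}\right)\right]^2=\frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}-W^{*2}\right)+W^{*2}.$$

Now, plugging the above two equations into (A.10), we get

$$E[\mathrm{Var}(\hat T_{2m}\mid s)]=\frac{n-m}{mn}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}-W^{*2}\right). \tag{A.11}$$

Therefore, combining (A.4), (A.7) and (A.11), $\mathrm{Var}(\hat T_{2m})$ is calculated:

$$\mathrm{Var}(\hat T_{2m})=\frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i}-W'^2\right)+\frac{n-m}{mn}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}-W^{*2}\right), \tag{A.12}$$

where in $w'_i$ and $w^*_i$, $\theta_1$ and $\theta_2$ are replaced by $\hat\theta_1$ and $\hat\theta_{2m}$, respectively.

A.2 Ghangurde-Rao Scheme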

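A.2.1 Var(T2u) in equation (3.14)

To derive $\mathrm{Var}(\hat T_{2u})$, assume that $\theta_2$ is known in $\hat T_{2u}$ and then replace $\theta_2$ by $\hat\theta_{2u}$ in the derived expression. That is, let

$$\hat T_{2u}=\sum_{i\in s_2}\frac{\left[I(y_{2i}\le\theta_2)-\frac{1}{2}\right]P_i}{p_i}=\sum_{i\in s_2}\frac{w_iP_i}{p_i},$$

where $w_i$ is defined in (A.1). Following equation (25) of Ghangurde and Rao (1969), the variance of $\hat T_{2u}$ is given by

$$\mathrm{Var}(\hat T_{2u})=\frac{N-u}{(N-1)u}\sum_{i=1}^{N}\left(\frac{w_i}{p_i}-W\right)^2p_i=\frac{N-u}{(N-1)u}\left(\sum_{i=1}^{N}\frac{w_i^2}{p_i}-W^2\right), \tag{A.13}$$

where in $w_i$, $\theta_2$ is replaced by $\hat\theta_{2u}$, and $W$ is as defined in (A.2).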


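A.2.2 Var(T2m) in equation (3.15)

To derive $\mathrm{Var}(\hat T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $\hat T_{2m}$ and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let

$$\hat T_{2m}=\sum_{i\in s}\frac{\left[I(y_{1i}\le\theta_1)-\frac{1}{2}\right]P_i}{p_i}+\frac{n}{m}\sum_{i\in s_1}\frac{\left[I(y_{2i}\le\theta_2)-I(y_{1i}\le\theta_1)\right]P_i}{p_i}.$$

The variance of $\hat T_{2m}$ is obtained from equation (A.4). First we find $E(\hat T_{2m}\mid s)$:

$$E(\hat T_{2m}\mid s)=\sum_{i\in s}\frac{\left[I(y_{1i}\le\theta_1)-\frac{1}{2}\right]P_i}{p_i}+\sum_{i\in s}\frac{\left[\left(I(y_{2i}\le\theta_2)-\frac{1}{2}\right)-\left(I(y_{1i}\le\theta_1)-\frac{1}{2}\right)\right]P_i}{p_i}=\sum_{i\in s}\frac{\left[I(y_{2i}\le\theta_2)-\frac{1}{2}\right]P_i}{p_i}=\sum_{i\in s}\frac{w'_iP_i}{p_i},$$

where $w'_i$ is defined in (A.5). Then,

$$\mathrm{Var}[E(\hat T_{2m}\mid s)]=\frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i}-W'^2\right), \tag{A.14}$$

with $W'$ as defined in (A.6).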


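To obtain $E[\mathrm{Var}(\hat T_{2m}\mid s)]$, we first need to find $\mathrm{Var}(\hat T_{2m}\mid s)$:

$$\mathrm{Var}(\hat T_{2m}\mid s)=\mathrm{Var}\left(\frac{n}{m}\sum_{i\in s_1}\frac{w^*_iP_i}{p_i}\,\Big|\,s\right)=n^2\,\mathrm{Var}\left(\frac{1}{m}\sum_{i\in s_1}\frac{w^*_iP_i}{p_i}\,\Big|\,s\right)=n^2\,\frac{n-m}{nm}\cdot\frac{1}{n-1}\sum_{i\in s}\left(\frac{w^*_iP_i}{p_i}-\frac{1}{n}\sum_{j\in s}\frac{w^*_jP_j}{p_j}\right)^2,$$

because $\left(\sum_{i\in s_1}w^*_iP_i/p_i\right)/m$ is the mean of the simple random sample $s_1$ selected from sample $s$ without replacement. Then $\mathrm{Var}(\hat T_{2m}\mid s)$ can be simplified to

$$\mathrm{Var}(\hat T_{2m}\mid s)=n^2\,\frac{n-m}{(n-1)m}\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}P_i^2}{p_i^2}-\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_iP_i}{p_i}\right)^2\right].$$

Therefore, we obtain $E[\mathrm{Var}(\hat T_{2m}\mid s)]$ as

$$E[\mathrm{Var}(\hat T_{2m}\mid s)]=n^2\,\frac{n-m}{(n-1)m}\left\{E\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}P_i^2}{p_i^2}\right]-E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_iP_i}{p_i}\right)^2\right]\right\}, \tag{A.15}$$

where the two expectations in the above equation need to be computed separately. For the second expectation,

$$E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_iP_i}{p_i}\right)^2\right]=\mathrm{Var}\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_iP_i}{p_i}\right)+\left[E\left(\frac{1}{n}\sum_{i\in s}\frac{w^*_iP_i}{p_i}\right)\right]^2=\frac{1}{n^2}\cdot\frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}-W^{*2}\right)+\frac{W^{*2}}{n^2}=\frac{N-n}{(N-1)n^3}\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}+\frac{N(n-1)}{(N-1)n^3}W^{*2}.$$

The first expectation in (A.15) is calculated in the following way. Since the probability of selecting unit $i$ ($i=1,2,\dots,N$) in sample $s$ depends on which group it belongs to, let us consider: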


$$\delta_{ij}=\begin{cases}1, & \text{if } u_j\in s \text{ given that } u_j\in \text{group } G_i,\\ 0, & \text{otherwise,}\end{cases}$$

where $i$ ($i=1,2,\dots,n$) indexes the random groups, and $j$ ($j=1,2,\dots,N$) indexes the population units. Let $k=N/n$ denote the number of units in each random group (the inverse of the sampling fraction). Recall that $P_i$, defined in Section 2.3 for the Ghangurde-Rao (1969) scheme, is

$$P_i=\sum_{j\in G_i}p_j, \tag{A.16}$$

then, 1 ,„*2p2 1 N

?„*2

sE^HEE^-tfv (Air) eGs ' r t sGs j = l •?


Therefore, by using (A. 16) and (A. 17),

wfP2

n*-? Pi

= E E n Z ^ WfP?

n~r p; Gi, • • • , Gre

£fesS^t N »„*2

*;EE^« i 6 s j =7 ft

= ^ ( S E E ? E R

k2 r-^ = -YE n ^

z€s

n

J 6 G ,

„*2

*2

ft ^Eft

^ E ? iE ^ f t

j e G ,

ft j e G .

n lA/

(N-l)kN^\p3 N^p3

+

= k2

If ^ ^ f t / 1

AT-fc 1 (jV-l)A;Ar

JV

N Eft

JV

ft iV Eft .7 = 1

N r> N in*2 1 W w*2\ 1 w

Z ^ WJ JV Z ^ „. ^ AT2 Z ^ „. J T AJ-2 Z ^ w

*2

7 = 1 A ^ ft ' ^ 2 ^ ft ^2tr ft JV „ JV * 2 JV » 2 \ JV yw*2_lyT^ + JLyTi-)+±y z^w3 i v ^ n ^ /v2z^ „. r n J ^

" < jV(n - 1)

N „„*2

U = l

AT(n - 1)

(AT - l )n 2 uc w"2 + 4 E

7 = 1 * J

! 2 ( n - l ) | n

51 AT N(N - 1)

Then, we convert j to i in the above equation without loss of generality to get

E ly<3\ = N(n-Vw« \ n i t ft' / (N~l)n2

Now, plug the above two expectations in (A. 15), to obtain

1 v ^ wf L 2(n - 1) n - 1 J \ _L_ i i L J n2ft ft L i V _ 1 N(N~ 1).

£[Var(T2m|s)] n — m mn(iY - 1)

w *2 (AT - 2n + ^ ) ] T ^ - + (ra - l)NW

i=l ft

*2

(A18)

Therefore, variance of T2m is computed by using (A.4), (A.15) and (A.18):

JV / 2

Var(T2m) = ^ - ( g a ! - ^ r JV

+- (jV-2n+#)£^- + (n-l)AW i= l

*2

(A19)

rara(JV-l)

where in $w'_i$ and $w^*_i$, $\theta_1$ and $\theta_2$ are replaced by $\hat\theta_1$ and $\hat\theta_{2m}$, respectively.

A.3 Prasad-Graham Scheme


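A.3.1 Var(T2m) in equation (3.19)

To derive $\mathrm{Var}(\hat T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $\hat T_{2m}$ and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let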

r V^ W(y2i < ft) - \) Pi/Pi] Pi _ y ^ « * £ -'2m - 2 ^ ^ _ Z^

ies i ft iesi ft

where

«;„• = [/(to < ft) " |]Pi

Pi (A20)

The variance of $T_{2m}$ is again obtained from equation (A.4). We first find $E(T_{2m} \mid s)$:

svr n ^ . , * V [1{y2i ^ °2) ~ 2]Pl - V W'A

E(T2m\s) = 2^Wi=l^ ^ ~ 2w i£s i£s Pi i£s Pi

which is the same as in the Ghangurde-Rao scheme (1969), and $w_i'$ is as defined in (A.5). Hence, $\mathrm{Var}[E(T_{2m} \mid s)]$ is the same as the expression obtained in (A.14).
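The conditional expectation above is the random-group estimator of the first occasion: the population is split at random into n groups of equal size, one unit is drawn from each group with probability proportional to its selection probability within the group, and each drawn value is weighted by the group total divided by its own probability. A minimal R sketch of that selection step follows. The helper name `rhc_sample` and the toy sizes are hypothetical and are not taken from Appendix B, which implements the same idea with explicit cumulative probabilities.

```r
# Sketch: random-group selection (one unit per group, probability p_j / P_i).
# rhc_sample() is an illustrative helper, not a function from the thesis.
rhc_sample <- function(p, n) {
  N <- length(p)
  k <- N / n
  stopifnot(k == round(k))            # N/n is assumed to be an integer
  perm <- sample(N)                   # random permutation of the units
  group <- rep(seq_len(n), each = k)  # group label of each permuted position
  idx <- integer(n)
  P <- numeric(n)
  for (i in seq_len(n)) {
    members <- perm[group == i]
    P[i] <- sum(p[members])           # group total P_i
    # draw one unit from the group with probability p_j / P_i
    idx[i] <- members[sample.int(k, 1, prob = p[members] / P[i])]
  }
  list(index = idx, P = P, p = p[idx])
}

set.seed(1)
sizes <- c(2, 5, 3, 8, 4, 6, 1, 7, 9, 5, 3, 7)  # made-up size measures
p <- sizes / sum(sizes)
s <- rhc_sample(p, n = 4)

# If y is exactly proportional to p, the estimator sum(y * P / p)
# reproduces the population total for every possible sample.
y <- 10 * p
est <- sum(y[s$index] * s$P / s$p)
```

The proportional case is used here only because it gives a check that does not depend on the random draw; with real data the estimator varies from sample to sample.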

Next, we need to find Var(T2m|s):

(n-l)mj£\p*i J

with

Then, after some simplification, we find

Biv-pi-wi = ^ 4 E (** - w')2y" mn(N - 1) f r ' \ t o /

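and it can be further simplified to

$$E[\mathrm{Var}(T_{2m} \mid s)] = \frac{N(n-m)}{mn(N-1)} \left( \sum_{i=1}^{N} \frac{w_i'^2 \, Y_1}{y_{1i}} - W'^2 \right) \tag{A.22}$$

where

$$Y_1 = \sum_{i=1}^{N} y_{1i}$$

is the population total on the first occasion. By combining (A.4), (A.14) and (A.22), $\mathrm{Var}(T_{2m})$ is obtained as

$$\mathrm{Var}(T_{2m}) = \frac{N-n}{n(N-1)} \left( \sum_{i=1}^{N} \frac{w_i'^2}{p_i} - W'^2 \right) + \frac{N(n-m)}{mn(N-1)} \left( \sum_{i=1}^{N} \frac{w_i'^2 \, Y_1}{y_{1i}} - W'^2 \right)$$

where, in $w_i'$, $\theta_2$ is replaced by $\hat\theta_{2m}$.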
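For the Prasad-Graham scheme, the second variance component depends on the first-occasion values rather than on the selection probabilities. The following R sketch evaluates both components on a toy population, mirroring the constants `a` and `b` used in the Appendix B listing. The function name `var_T2m_pg` and the toy data are assumptions introduced for illustration.

```r
# Sketch: Var(T2m) under the Prasad-Graham scheme.
# var_T2m_pg() is an illustrative helper, not part of the thesis code.
var_T2m_pg <- function(w_prime, p, y1, n, m) {
  N <- length(p)
  stopifnot(length(w_prime) == N, length(y1) == N, all(y1 > 0), m < n, n < N)
  W_prime <- sum(w_prime)                # assumed: W' is the population sum of w'
  a <- (N - n) / ((N - 1) * n)
  b <- N * (n - m) / (m * n * (N - 1))
  var_E <- a * (sum(w_prime^2 / p) - W_prime^2)              # same form as (A.14)
  E_var <- b * (sum(w_prime^2 * sum(y1) / y1) - W_prime^2)   # same form as (A.22)
  c(var_E = var_E, E_var = E_var, total = var_E + E_var)
}

# Toy population (made-up values, for illustration only)
p <- c(0.1, 0.2, 0.3, 0.4)
y1 <- c(1, 3, 4, 2)
y2 <- c(5, 2, 6, 1)
w_prime <- as.numeric(y2 <= median(y2)) - 1/2
res_pg <- var_T2m_pg(w_prime, p, y1, n = 2, m = 1)
print(res_pg)
```

The check that all first-occasion values are positive reflects the fact that they act as size measures in the second-stage selection.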


Appendix B

R code

####N/n, N/u and n/m are assumed to be integers
N=500
n=100
m=50
num.sample=1000
u=n-m
k=N/n
r=N/u
t=n/m
set.seed(10)
X.popu=rnorm(N,25,5)
Y.popu=500+0.5*X.popu+rnorm(N,0,8*sqrt(X.popu))
Y.popu2=600+5.1*Y.popu+rnorm(N,0,8*sqrt(X.popu))
##X.popu=rexp(N,10)
##Y.popu=25+1.5*X.popu+rnorm(N,0,5*sqrt(0.5*X.popu))
##Y.popu2=1.3*Y.popu+rnorm(N,0,2)
##data=read.table("DataA.txt",header=T)
##X.popu=data[,1]
##Y.popu=data[,2]
##Y.popu2=data[,3]

sizes=X.popu

p.popu=X.popu/sum(X.popu)

z1i=as.numeric(Y.popu<=median(Y.popu))-1/2
z2i=as.numeric(Y.popu2<=median(Y.popu2))-1/2
delta.prime=cor(z1i/p.popu,z2i/p.popu)
V2.prime=sum(z2i^2/p.popu)-sum(z2i)^2
V3.prime=sum(z2i^2*sum(Y.popu)/Y.popu)-sum(z2i)^2
h.prime=V3.prime/V2.prime
(1-delta.prime^2)/h.prime

########Des Raj Scheme########

sample_indices.S=matrix(0,num.sample, n)

Y.S=matrix(0,num.sample,n)

p.S=matrix(0,num.sample,n)

sample_indices.S2=matrix(0,num.sample,u)

Y.S2=matrix(0,num.sample, u)

p.S2=matrix(0,num.sample,u)

sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
Y.S12=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
theta1_hat=vector(mode="numeric",length=num.sample)

theta2u_hat=vector(mode="numeric",length=num.sample)

theta2m_hat=vector(mode="numeric",length=num.sample)

theta2_hat=vector(mode="numeric",length=num.sample)

T2u=vector(mode="numeric",length=num.sample)

T2m=vector(mode="numeric",length=num.sample)

W=vector(mode="numeric",length=num.sample)

W.prime=vector(mode="numeric",length=num.sample)

W.star=vector(mode="numeric",length=num.sample)

var_T2u=vector(mode="numeric",length=num.sample)

var_T2m=vector(mode="numeric",length=num.sample)

var.E_T2m=vector(mode="numeric",length=num.sample)

E.var_T2m=vector(mode="numeric",length=num.sample)

Q=vector(mode="numeric",length=num.sample)

T2u.hat=vector(mode="numeric",length=num.sample)

T2m.hat=vector(mode="numeric",length=num.sample)

library (pps)

for (i in 1:num.sample)

{

####1st occasion: sample S of n units is selected
####from entire population
sample_indices.S[i,]=ppswr(sizes,n)
Y.S[i,]=Y.popu[sample_indices.S[i,]]
p.S[i,]=p.popu[sample_indices.S[i,]]
f=function(theta1)
{
sum((as.numeric(Y.S[i,]<=theta1)-1/2)/(n*p.S[i,]))
}
theta1_hat[i]=uniroot(f,c(-5000,5000))$root

####2nd occasion: sample S2 of u units is selected
####from entire population
sample_indices.S2[i,]=ppswr(sizes,u)
Y.S2[i,]=Y.popu2[sample_indices.S2[i,]]
p.S2[i,]=p.popu[sample_indices.S2[i,]]
f=function(theta2u)
{
sum((as.numeric(Y.S2[i,]<=theta2u)-1/2)/(u*p.S2[i,]))
}
theta2u_hat[i]=uniroot(f,c(-5000,5000))$root
T2u[i]=sum((as.numeric(Y.S2[i,]<=theta2u_hat[i])-
1/2)/(u*p.S2[i,]))
W[i]=sum(as.numeric(Y.popu2<=theta2u_hat[i])-1/2)
var_T2u[i]=1/u*(sum((as.numeric(Y.popu2<=
theta2u_hat[i])-1/2)^2/p.popu)-W[i]^2)

####2nd occasion: sample S1 of m units is selected
####from sample S
sample_indices.S1[i,]=
sample(sample_indices.S[i,],m,replace=FALSE)
Y.S1[i,]=Y.popu2[sample_indices.S1[i,]]
Y.S12[i,]=Y.popu[sample_indices.S1[i,]]
p.S1[i,]=p.popu[sample_indices.S1[i,]]
f=function(theta2m)
{
sum((as.numeric(Y.S1[i,]<=theta2m)-
as.numeric(Y.S12[i,]<=theta1_hat[i]))/(m*p.S1[i,]))
}
theta2m_hat[i]=uniroot(f,c(-5000,5000))$root
T2m[i]=sum((as.numeric(Y.S[i,]<=theta1_hat[i])-
1/2)/(n*p.S[i,]))+sum((as.numeric(Y.S1[i,]<=
theta2m_hat[i])-as.numeric(Y.S12[i,]<=
theta1_hat[i]))/(m*p.S1[i,]))

W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
W.star[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-
as.numeric(Y.popu<=theta1_hat[i]))
var.E_T2m[i]=1/n*(sum((as.numeric(Y.popu2<=
theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
E.var_T2m[i]=(n-m)/(m*n)*(sum((as.numeric(Y.popu2<=
theta2m_hat[i])-as.numeric(Y.popu<=
theta1_hat[i]))^2/p.popu)-W.star[i]^2)
var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
####Optimal weight Q can be obtained in terms of
####Var(T2u) and Var(T2m):
Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])

f=function(theta2)
{
T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=
theta2)-1/2)/(u*p.S2[i,]))
T2m.hat[i]=sum((as.numeric(Y.S1[i,]<=theta2)-
as.numeric(Y.S12[i,]<=theta1_hat[i]))/(m*p.S1[i,]))
Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
}
theta2_hat[i]=uniroot(f,c(-5000,5000))$root

}

theta1_hat
theta2_hat
Rel.bias1=mean(abs((theta1_hat-
median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2

########Ghangurde-Rao Scheme########

permutation=matrix(0,num.sample,N)

permutation2=matrix(0,num.sample,N)


p.permu=matrix(0,num.sample,N)

p.permu2=matrix(0,num.sample, N)

P.permu=matrix(0,num.sample,N)

P_star.permu2=matrix(0,num.sample,N)

prob.permu=matrix(0,num.sample,N)

prob.permu2=matrix(0,num.sample, N)

prob.cumul=matrix(0,num.sample,N)

prob.cumul2=matrix(0,num.sample,N)

rand=matrix(0,num.sample,n)

rand2=matrix(0,num.sample,u)

sample_indices.S=matrix(0,num.sample,n)

Y.S=matrix(0,num.sample,n)

p.S=matrix(0,num.sample,n)

P.S=matrix(0,num.sample,n)

sample_indices.S2=matrix(0,num.sample, u)

Y.S2=matrix(0,num.sample,u)

p.S2=matrix(0,num.sample,u)

P_star.S2=matrix(0,num.sample,u)


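sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
Y.S12=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
P.S1=matrix(0,num.sample,m)
theta1_hat=vector(mode="numeric",length=num.sample)
theta2u_hat=vector(mode="numeric",length=num.sample)
theta2m_hat=vector(mode="numeric",length=num.sample)
theta2_hat=vector(mode="numeric",length=num.sample)
T2u=vector(mode="numeric",length=num.sample)
T2m=vector(mode="numeric",length=num.sample)
W=vector(mode="numeric",length=num.sample)
W.prime=vector(mode="numeric",length=num.sample)
W.star=vector(mode="numeric",length=num.sample)
var_T2u=vector(mode="numeric",length=num.sample)
var_T2m=vector(mode="numeric",length=num.sample)
var.E_T2m=vector(mode="numeric",length=num.sample)
E.var_T2m=vector(mode="numeric",length=num.sample)
Q=vector(mode="numeric",length=num.sample)
T2u.hat=vector(mode="numeric",length=num.sample)
T2m.hat=vector(mode="numeric",length=num.sample)
for (i in 1:num.sample)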
{

####1st occasion: sample S of n units is selected
####from entire population
#N units are divided randomly into
#n groups
permutation[i,]=sample(N)
p.permu[i,]=p.popu[permutation[i,]]
for(j in 1:N)
{
#First and last units for each of the n permuted
#groups
first=ceiling(j/k)*k-k+1
last=ceiling(j/k)*k
P.permu[i,j]=sum(p.permu[i,][first:last])
prob.permu[i,j]=p.permu[i,j]/P.permu[i,j]
prob.cumul[i,j]=sum(prob.permu[i,][first:j])
}

#Generate n random probabilities between

#0 and 1 for each sample

rand[i,]=runif (n,0,1)

for(j in 1:n)

{

first=j*k-k+1

last=j*k

sample_indices.S[i,j]=permutation[i,][first+

sum(as.numeric(rand[i,j]>=prob.cumul[i, ] [first:last]))]

Y.S[i,j]=Y.popu[sample_indices.S[i, j]]

p.S[i, j]=p.popu[sample_indices.S[i,j]]

P.S[i,j]=P.permu[i,][first]

}

f=function(theta1)
{
sum((as.numeric(Y.S[i,]<=theta1)-1/2)*P.S[i,]/p.S[i,])
}
theta1_hat[i]=uniroot(f,c(-5000,5000))$root

####2nd occasion: sample S2 of u units is selected
####from entire population
#N units are divided randomly into u groups

permutation2[i,]=sample(N)

p.permu2[i,]=p.popu[permutation2[i, ] ]

for(j in 1:N)

{

first=ceiling(j/r)*r-r+1

last=ceiling(j/r)*r

P_star.permu2[i,j]=sum(p.permu2[i, ] [first:last])

prob.permu2[i,j]=p.permu2[i,j]/P_star.permu2[i,j]

prob.cumul2[i,j]=sum(prob.permu2[i,][first:j])

}

#Generate u random probabilities between 0 and 1

#for each sample

rand2[i,]=runif (u,0,1)

for(j in 1:u)

{

first=j*r-r+1

last=j*r

sample_indices.S2[i,j]=permutation2[i,][first+
sum(as.numeric(rand2[i,j]>=
prob.cumul2[i,][first:last]))]
Y.S2[i,j]=Y.popu2[sample_indices.S2[i,j]]
p.S2[i,j]=p.popu[sample_indices.S2[i,j]]
P_star.S2[i,j]=P_star.permu2[i,][first]

}

f=function(theta2u)

{

sum((as.numeric(Y.S2[i,]<=theta2u)-
1/2)*P_star.S2[i,]/p.S2[i,])

}

theta2u_hat[i]=uniroot(f,c(-5000, 5000))$root

T2u[i]=sum((as.numeric(Y.S2[i,]<=theta2u_hat[i])-1/2)*

P_star.S2[i,]/p.S2[i,])

W[i]=sum(as.numeric(Y.popu2<=theta2u_hat[i])-1/2)

var_T2u[i]=(N-u)/((N-1)*u)*
(sum((as.numeric(Y.popu2<=theta2u_hat[i])-
1/2)^2/p.popu)-W[i]^2)


####2nd occasion: sample S1 of m units is selected
####from sample S
sample_indices.S1[i,]=
sample(sample_indices.S[i,],m,replace=FALSE)
Y.S1[i,]=Y.popu2[sample_indices.S1[i,]]
Y.S12[i,]=Y.popu[sample_indices.S1[i,]]
p.S1[i,]=p.popu[sample_indices.S1[i,]]
for(j in 1:m)
{
P.S1[i,j]=P.S[i,sample_indices.S[i,]==
sample_indices.S1[i,j]]
}
f=function(theta2m)
{
n/m*sum((as.numeric(Y.S1[i,]<=theta2m)-
as.numeric(Y.S12[i,]<=
theta1_hat[i]))*P.S1[i,]/p.S1[i,])
}
theta2m_hat[i]=uniroot(f,c(-5000,5000))$root

T2m[i]=sum((as.numeric(Y.S[i,]<=theta1_hat[i])-1/2)*
P.S[i,]/p.S[i,])+n/m*sum((as.numeric(Y.S1[i,]<=
theta2m_hat[i])-as.numeric(Y.S12[i,]<=
theta1_hat[i]))*P.S1[i,]/p.S1[i,])
W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
W.star[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-
as.numeric(Y.popu<=theta1_hat[i]))
var.E_T2m[i]=(N-n)/((N-1)*n)*(sum((as.numeric(Y.popu2<=
theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
E.var_T2m[i]=(n-m)/(m*n*(N-1))*
((N-2*n+n/N)*sum((as.numeric(Y.popu2<=theta2m_hat[i])-
as.numeric(Y.popu<=theta1_hat[i]))^2/p.popu)+
(n-1)*N*W.star[i]^2)
var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
####Optimal weight Q can be obtained in terms of
####Var(T2u) and Var(T2m):
Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])

f=function(theta2)
{
T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=theta2)-1/2)*
P_star.S2[i,]/p.S2[i,])
T2m.hat[i]=n/m*sum((as.numeric(Y.S1[i,]<=theta2)-
as.numeric(Y.S12[i,]<=theta1_hat[i]))*P.S1[i,]/p.S1[i,])
Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
}
theta2_hat[i]=uniroot(f,c(-5000,5000))$root

}

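theta1_hat
theta2_hat
Rel.bias1=mean(abs((theta1_hat-
median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2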

########Prasad-Graham Scheme########

permutation3=matrix(0,num.sample, n)

p_star.permu3=matrix(0,num.sample,n)

P_tao.permu3=matrix(0,num.sample, n)

prob.permu3=matrix(0,num.sample,n)

prob.cumul3=matrix(0,num.sample,n)

rand3=matrix(0,num.sample,m)

sample_indices.permu3=matrix(0,num.sample,m)


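sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
P.S1=matrix(0,num.sample,m)
p_star.S1=matrix(0,num.sample,m)
P_tao.S1=matrix(0,num.sample,m)
theta2m_hat=vector(mode="numeric",length=num.sample)
theta2_hat=vector(mode="numeric",length=num.sample)
T2m=vector(mode="numeric",length=num.sample)
W.prime=vector(mode="numeric",length=num.sample)
var_T2m=vector(mode="numeric",length=num.sample)
var.E_T2m=vector(mode="numeric",length=num.sample)
E.var_T2m=vector(mode="numeric",length=num.sample)
Q=vector(mode="numeric",length=num.sample)
T2u.hat=vector(mode="numeric",length=num.sample)
T2m.hat=vector(mode="numeric",length=num.sample)
for (i in 1:num.sample)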

{


####2nd occasion: sample S1 of m units is selected
####from sample S
#n units are divided randomly into m groups
permutation3[i,]=sample(n)
p_star.permu3[i,]=Y.S[i,][permutation3[i,]]*
P.S[i,][permutation3[i,]]/p.S[i,][permutation3[i,]]
for(j in 1:n)
{
first=ceiling(j/t)*t-t+1
last=ceiling(j/t)*t
P_tao.permu3[i,j]=sum(p_star.permu3[i,][first:last])
prob.permu3[i,j]=p_star.permu3[i,j]/P_tao.permu3[i,j]
prob.cumul3[i,j]=sum(prob.permu3[i,][first:j])
}

#Generate m random probabilities between
#0 and 1 for each sample
rand3[i,]=runif(m,0,1)
for(j in 1:m)
{
first=j*t-t+1
last=j*t
sample_indices.permu3[i,j]=permutation3[i,][first+
sum(as.numeric(rand3[i,j]>=
prob.cumul3[i,][first:last]))]
sample_indices.S1[i,j]=
sample_indices.S[i,][sample_indices.permu3[i,j]]
Y.S1[i,j]=Y.popu2[sample_indices.S1[i,j]]
p.S1[i,j]=p.popu[sample_indices.S1[i,j]]
P.S1[i,j]=P.S[i,sample_indices.S[i,]==
sample_indices.S1[i,j]]
p_star.S1[i,j]=p_star.permu3[i,][first+
sum(as.numeric(rand3[i,j]>=
prob.cumul3[i,][first:last]))]
P_tao.S1[i,j]=P_tao.permu3[i,][first]
}

f=function(theta2m)
{
sum((as.numeric(Y.S1[i,]<=theta2m)-
1/2)*P.S1[i,]/p.S1[i,]*
P_tao.S1[i,]/p_star.S1[i,])
}
theta2m_hat[i]=uniroot(f,c(-5000,5000))$root
T2m[i]=sum((as.numeric(Y.S1[i,]<=theta2m_hat[i])-
1/2)*P.S1[i,]/p.S1[i,]*P_tao.S1[i,]/p_star.S1[i,])
W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
a=(N-n)/((N-1)*n)
b=(N*(n-m))/(m*n*(N-1))
var.E_T2m[i]=a*(sum((as.numeric(Y.popu2<=
theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
E.var_T2m[i]=b*(sum((as.numeric(Y.popu2<=
theta2m_hat[i])-1/2)^2*
sum(Y.popu)/Y.popu)-W.prime[i]^2)
var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
####Optimal weight Q can be obtained in terms of
####Var(T2u) and Var(T2m):
Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])

f=function(theta2)
{
T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=theta2)-1/2)*
P_star.S2[i,]/p.S2[i,])
T2m.hat[i]=sum((as.numeric(Y.S1[i,]<=theta2)-1/2)*
P.S1[i,]/p.S1[i,]*P_tao.S1[i,]/p_star.S1[i,])
Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
}
theta2_hat[i]=uniroot(f,c(-5000,5000))$root

}

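theta1_hat
theta2_hat
Rel.bias1=mean(abs((theta1_hat-
median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2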
