University of Alberta
ESTIMATION OF MEDIAN FOR UNEQUAL PROBABILITY SAMPLING OVER TWO OCCASIONS
by
Shu Jing Gu
A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of
Master of Science
in
Statistics
Department of Mathematical and Statistical Sciences
©Shu Jing Gu
Fall, 2011 Edmonton, Alberta
Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is
converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms.
The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or
otherwise reproduced in any material form whatsoever without the author's prior written permission.
Library and Archives Canada
Published Heritage Branch
395 Wellington Street, Ottawa ON K1A 0N4, Canada
Your file: ISBN 978-0-494-81301-0
Our file: ISBN 978-0-494-81301-0
NOTICE:
The author has granted a nonexclusive license allowing Library and Archives Canada to reproduce, publish, archive, preserve, conserve, communicate to the public by telecommunication or on the Internet, loan, distribute and sell theses worldwide, for commercial or noncommercial purposes, in microform, paper, electronic and/or any other formats.
The author retains copyright ownership and moral rights in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
In compliance with the Canadian Privacy Act some supporting forms may have been removed from this thesis.
While these forms may be included in the document page count, their removal does not represent any loss of content from the thesis.
Abstract
The main concern in repeated surveys is non-response, which arises because the same individuals are sampled repeatedly. A solution to this problem is to use a partial replacement (rotation) sampling scheme, in which, after each sampling occasion, a fraction of the units observed on that occasion is rotated out of the sample and replaced by a new sub-sample from the population. Here we consider estimation of the population median for sampling over two occasions when unequal probability sampling is used on both occasions. Recent attempts to estimate the population median for sampling over two occasions have assumed simple random sampling on both occasions; moreover, these existing methods require density estimation of the underlying characteristics. This thesis presents a new approach to estimating the population median, based on estimating equations, for unequal probability sampling on both occasions. The proposed method also avoids the problem of density estimation.
Acknowledgements
I am heartily thankful to my supervisor, Dr. Narasimha Prasad, who has guided me throughout my thesis with his patience and knowledge. I have been extremely fortunate to have Dr. Prasad as my supervisor for my master's studies; without him, this thesis would not have been completed.
I also offer my sincere gratitude to my examining committee members, Dr. Peng Zhang and Dr. Irina Dinu, for spending their valuable time reading and evaluating my thesis.
Dr. Peter Hooper provided me with funding during my studies, and I deeply thank him for his support.
Finally, I thank my dear family and friends for supporting and helping me throughout all my studies.
Table of Contents
1. Introduction
2. Population total estimation in PPS sampling on two occasions
2.1 Notations
2.2 Des Raj Scheme
2.3 Ghangurde-Rao Scheme
2.4 Chotai Scheme
2.5 Prasad-Graham Scheme
3. Estimation of median in PPS sampling on two occasions
3.1 Estimating equations
3.2 Estimation of median for Des Raj scheme
3.3 Estimation of median for Ghangurde-Rao scheme
3.4 Estimation of median for Prasad-Graham scheme
4. A simulation study based on generated populations
4.1 Description of two sets of generated populations based on model 1
4.2 Description of two sets of generated populations based on model 2
4.3 Computations on generated populations
4.4 Numerical comparisons
4.4.1 Comparisons of results for the two sets of generated populations based on model 1
4.4.2 Comparisons of results for the two sets of generated populations based on model 2
5. A simulation study based on real data
5.1 Description of data sets
5.2 Computations on real data
5.3 Numerical comparisons
6. Conclusion and future work
Bibliography
Appendix A Derivation
A.1 Des Raj Scheme
A.1.1 $\mathrm{Var}(T_{2u})$ in equation (3.7)
A.1.2 $\mathrm{Var}(T_{2m})$ in equation (3.8)
A.2 Ghangurde-Rao Scheme
A.2.1 $\mathrm{Var}(T_{2u})$ in equation (3.14)
A.2.2 $\mathrm{Var}(T_{2m})$ in equation (3.15)
A.3 Prasad-Graham Scheme
A.3.1 $\mathrm{Var}(T_{2m})$ in equation (3.19)
Appendix B R code
List of Tables
4.1 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 1: number of simulations = 1000
4.2 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 2: number of simulations = 1000
5.1 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set A: number of simulations = 1000
5.2 Rel.bias and Rel.MSE of $\hat{\theta}_1$ and $\hat{\theta}_2$ under Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set B: number of simulations = 1000
Chapter 1
Introduction
Partial replacement sampling from a finite population is commonly used in repeated surveys because it reduces the response burden and, as a result, improves the efficiency of estimation. If the same population units are sampled time after time, people may become unwilling to provide the same information again, and the sample becomes less representative as time proceeds; the precision of estimation is then greatly affected. Partial replacement sampling, on the other hand, reduces this non-response bias. Jessen (1942), who first introduced the problem of sampling on two successive occasions, showed that the estimates for the current (second) occasion may be improved by replacing only part of the sample from the previous (first) occasion. That is, after the first sampling occasion, only a proportion of the units observed on that occasion is retained, and the remaining units are replaced by a fresh selection from the entire population. These fresh (unmatched) units are then observed on the second sampling occasion along with the matched units. In this way, the efficiency and precision of the estimates can be improved.
Sampling over two occasions has been studied by various authors under different sampling schemes, and a particularly important case is sampling with unequal probabilities. An unequal probability sampling scheme is usually considered when the sample designer has access, for each unit in the population, to an auxiliary variable or size measure x that is correlated with the variable of interest y. Since exploiting auxiliary information increases the accuracy of the estimates for the variable of interest, the selection probability of each unit is set proportional to its size measure. In most cases, values of the auxiliary variable x are available in advance for the entire population because they are relatively cheap to obtain. For example, a survey estimating the area under wheat in a village may use the total (cultivated) area of each farm as the auxiliary variable.
There is an extensive literature on the estimation of the population total or mean for PPS (probability proportional to size) sampling over two occasions. For instance, Prasad and Graham (1994) discussed several sampling and estimation procedures for the finite population total with PPS sampling on two occasions. Their approach provides the best estimate of the current population total by optimizing the weights given to the estimates based on the matched and unmatched units on the second occasion. However, many surveys are conducted not only to estimate the population total but also to estimate quantiles, in particular for variables such as income. Recently, the problem of estimating finite population quantiles in successive sampling on two occasions has been considered. For example, Singh, H. P., Tailor, Singh, S. and Kim (2007) developed procedures for quantile estimation in a finite population. Nonetheless, that study is restricted to simple random sampling over two occasions, and estimates of probability density functions are required to achieve the optimum estimation of quantiles, which makes the approach very complicated.
In view of this, it is of great interest to see how quantile estimation for a finite population can be carried out when PPS sampling is used on both occasions. The purpose of this thesis is to present a new approach for estimating the population median under unequal probability sampling over two occasions. Prasad and Graham (1994) discussed several schemes for estimating the population total when unequal probability sampling is used over two occasions. In this thesis, the new approach for estimating the median, based on estimating equations, is considered for the sampling schemes discussed in Prasad and Graham (1994).
Chapter 2 provides an overview of the sampling schemes and associated estimation methods discussed in Prasad and Graham (1994). It introduces several schemes for sampling with probability proportional to size over two successive occasions: the Des Raj (1965) scheme, the Ghangurde-Rao (1969) scheme, the Chotai (1974) scheme and the Prasad-Graham (1994) scheme, which was developed as a modification of the Chotai scheme. For each scheme, the estimation of a finite population total is described.
In Chapter 3, the estimating equation approach is used to estimate the population median for the Des Raj, Ghangurde-Rao and Prasad-Graham schemes described in Chapter 2.
A simulation study based on four sets of synthetic populations is carried out in Chapter 4. The relative bias and relative mean squared error of the proposed estimators are evaluated to illustrate the present approach for estimating the population median.
Chapter 5 presents a further simulation study in which finite populations are generated from real data sets published in the literature.
Conclusions and further research projects are discussed in Chapter 6.
Appendix A contains derivations of the results given in Chapter 3. All numerical computations in this thesis were carried out in R 2.12.1 running on a Windows XP platform. Appendix B gives the R code used in this thesis.
Chapter 2
Population total estimation in PPS
sampling on two occasions
In this chapter, the sampling schemes and the associated methods for estimating the population total given in Prasad and Graham (1994) are reviewed. Prasad and Graham considered four sampling schemes: the Des Raj (1965) scheme, the Ghangurde-Rao (1969) scheme, the Chotai (1974) scheme and the Prasad-Graham scheme, which is a modified version of the Chotai scheme. Each of the four schemes is described in this chapter.
2.1 Notations
Consider a finite population of $N$ units with characteristic values $y_i$ ($i = 1, 2, \ldots, N$) whose total $Y = y_1 + y_2 + \cdots + y_N$ is to be estimated. Let us denote:
$x_i$ = value of the auxiliary variable (size measure) for the $i$-th unit
$y_{1i}$ = value of $y$ for the $i$-th unit observed on the first occasion
$y_{2i}$ = value of $y$ for the $i$-th unit observed on the second occasion
$Y_1$ = population total on the first occasion
$Y_2$ = population total on the second occasion
$m$ = number of matched units
$u$ = number of unmatched units
$\hat{Y}_{2m}$ = estimator of the population total on the second occasion based on the matched units
$\hat{Y}_{2u}$ = estimator of the population total on the second occasion based on the unmatched units
For each of the following schemes, the size measure $x_i$ ($i = 1, \ldots, N$) is assumed to be known for all $N$ units in the population before sampling, and successive sampling over two occasions is used.
2.2 Des Raj Scheme
Raj (1965) considered the following scheme of sampling over two occasions:
• On the first occasion:
A sample $s$ of size $n$ is selected from the entire population by PPSWR (probabilities proportional to the size measure $x_i$, with replacement); that is, the probability of selecting unit $i$ on each draw is $p_i = x_i / \sum_{j=1}^{N} x_j$.
• On the second occasion:
(1) A simple random sample $s_1$ of size $m = \lambda n$ ($0 < \lambda < 1$) is selected from sample $s$ without replacement (SRSWOR); therefore, the probability of selecting any particular sample $s_1$ is $1/\binom{n}{m}$.
(2) An independent sample $s_2$ of size $u = n - m$ is selected by PPSWR from the whole population; this selection method is the same as that used for sample $s$, and the probability of selecting unit $i$ on each draw is again $p_i = x_i / \sum_{j=1}^{N} x_j$.
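The selection steps of the Des Raj scheme can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the R code of Appendix B: the names `pps_probabilities` and `des_raj_sample` are hypothetical, units are indexed from 0, and $m = \lambda n$ is rounded to an integer.

```python
import random

def pps_probabilities(x):
    """Selection probabilities p_i = x_i / sum_j x_j from the size measures."""
    total = sum(x)
    return [xi / total for xi in x]

def des_raj_sample(x, n, lam, rng):
    """Draw the Des Raj samples s, s1 and s2 as lists of unit indices.

    s  : n independent PPSWR draws on the first occasion (repeats allowed)
    s1 : m = round(lam * n) of the n draws in s, taken by SRSWOR
    s2 : u = n - m fresh PPSWR draws from the whole population
    """
    p = pps_probabilities(x)
    units = range(len(x))
    m = round(lam * n)
    u = n - m
    s = rng.choices(units, weights=p, k=n)   # PPSWR, first occasion
    s1 = rng.sample(s, m)                    # SRSWOR of the n draws in s
    s2 = rng.choices(units, weights=p, k=u)  # independent PPSWR
    return p, s, s1, s2

rng = random.Random(2011)
p, s, s1, s2 = des_raj_sample([3, 1, 4, 1, 5, 9, 2, 6], n=6, lam=0.5, rng=rng)
```

Because `s` may contain repeated units, `rng.sample(s, m)` treats each draw as a separate position, which matches sub-sampling the $n$ draws without replacement.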
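Then, unbiased estimators of the population totals on the first and second occasions are
$$\hat{Y}_1 = \sum_{i \in s} \frac{y_{1i}}{n p_i} \qquad (2.1)$$
and
$$\hat{Y}_2 = Q \hat{Y}_{2u} + (1 - Q) \hat{Y}_{2m}, \qquad (2.2)$$
where $Q$ is a weight ($0 \le Q \le 1$), and
$$\hat{Y}_{2u} = \sum_{i \in s_2} \frac{y_{2i}}{u p_i}, \qquad (2.3)$$
$$\hat{Y}_{2m} = \sum_{i \in s} \frac{y_{1i}}{n p_i} + \sum_{i \in s_1} \frac{y_{2i} - y_{1i}}{m p_i}. \qquad (2.4)$$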
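The estimators (2.1) to (2.4) translate directly into code. The sketch below is illustrative only; `des_raj_totals` is a hypothetical name, and the samples are assumed to be given as lists of unit indices such as those produced by a PPSWR draw.

```python
def des_raj_totals(y1, y2, p, s, s1, s2, Q):
    """Return (Y1_hat, Y2u_hat, Y2m_hat, Y2_hat) from (2.1)-(2.4).

    y1, y2 and p are population lists indexed by unit; s, s1 and s2 are
    lists of unit indices (repeats allowed, as in PPSWR).
    """
    n, m, u = len(s), len(s1), len(s2)
    Y1_hat = sum(y1[i] / (n * p[i]) for i in s)                        # (2.1)
    Y2u_hat = sum(y2[i] / (u * p[i]) for i in s2)                      # (2.3)
    Y2m_hat = Y1_hat + sum((y2[i] - y1[i]) / (m * p[i]) for i in s1)  # (2.4)
    Y2_hat = Q * Y2u_hat + (1 - Q) * Y2m_hat                           # (2.2)
    return Y1_hat, Y2u_hat, Y2m_hat, Y2_hat
```

A quick sanity check of the weights: when $y_i$ is exactly proportional to $p_i$, every term $y_i/p_i$ is constant and each estimator returns the true total for any sample.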
Our primary interest is to find the best estimator of the current population total $Y_2$, which is obtained by optimizing the weight $Q$ and the fraction of matched units $\lambda$. The optimal values of $Q$ and $\lambda$ are those that minimize the variance of $\hat{Y}_2$. For the composite estimator $\hat{Y}_2$ defined in (2.2),
$$V(\hat{Y}_2) = Q^2 V(\hat{Y}_{2u}) + (1 - Q)^2 V(\hat{Y}_{2m}) + 2Q(1 - Q)\,\mathrm{Cov}(\hat{Y}_{2u}, \hat{Y}_{2m}), \qquad (2.5)$$
where $\mathrm{Cov}(\hat{Y}_{2u}, \hat{Y}_{2m}) = 0$ because sample $s_1$ is a subset of $s$, which is selected independently of sample $s_2$. The optimal weight has the form
$$Q_{\mathrm{opt}} = \frac{V(\hat{Y}_{2m})}{V(\hat{Y}_{2u}) + V(\hat{Y}_{2m})}. \qquad (2.6)$$
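The form of the optimal weight follows from differentiating (2.5), with the covariance term equal to zero, and setting the derivative to zero. This routine step is written out below for completeness; it is standard calculus rather than a formula quoted from Prasad and Graham (1994).
$$\frac{\partial V(\hat{Y}_2)}{\partial Q} = 2Q\,V(\hat{Y}_{2u}) - 2(1 - Q)\,V(\hat{Y}_{2m}) = 0 \;\Longrightarrow\; Q_{\mathrm{opt}} = \frac{V(\hat{Y}_{2m})}{V(\hat{Y}_{2u}) + V(\hat{Y}_{2m})},$$
$$V(\hat{Y}_2)\big|_{Q = Q_{\mathrm{opt}}} = \frac{V(\hat{Y}_{2u})\,V(\hat{Y}_{2m})}{V(\hat{Y}_{2u}) + V(\hat{Y}_{2m})} = \left[\frac{1}{V(\hat{Y}_{2u})} + \frac{1}{V(\hat{Y}_{2m})}\right]^{-1}.$$
Since the second derivative, $2[V(\hat{Y}_{2u}) + V(\hat{Y}_{2m})]$, is positive, this stationary point is a minimum.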
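We further assume that the following two variances are equal:
$$V_1 = \sum_{i=1}^{N} \left(\frac{y_{1i}}{p_i} - Y_1\right)^2 p_i = V_2 = \sum_{i=1}^{N} \left(\frac{y_{2i}}{p_i} - Y_2\right)^2 p_i = V. \qquad (2.7)$$
Then, using the optimal values of $Q$ and $\lambda$, the minimum variance of $\hat{Y}_2$ is found to be
$$V_{\min}(\hat{Y}_2) = \frac{V}{2n}\left[1 + \sqrt{2(1 - \delta)}\right], \quad \text{if } \delta > \tfrac{1}{2}, \qquad (2.8)$$
where
$$\delta = \frac{1}{V} \sum_{i=1}^{N} \left(\frac{y_{1i}}{p_i} - Y_1\right)\left(\frac{y_{2i}}{p_i} - Y_2\right) p_i \qquad (2.9)$$
is the correlation coefficient between $y_{1i}/p_i$ and $y_{2i}/p_i$.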
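The closed form $V_{\min} = \frac{V}{2n}[1 + \sqrt{2(1-\delta)}]$ can be checked numerically. The sketch below rests on an assumption we derive here rather than quote from the source: under PPSWR with SRSWOR sub-sampling and $V_1 = V_2 = V$, $V(\hat{Y}_{2u}) = V/u$ and $V(\hat{Y}_{2m}) = V/n + 2V(1 - \delta)(1/m - 1/n)$. Minimizing the optimally weighted variance over $\lambda$ then gives $\lambda_{\mathrm{opt}} = \sqrt{c}/(1 + \sqrt{c})$ with $c = 2(1 - \delta)$, which is a minimum only when $\delta > 1/2$. All function names are hypothetical.

```python
import math

def des_raj_parameters(y1, y2, p):
    """Population V_1, V_2 from (2.7) and the correlation delta."""
    Y1, Y2 = sum(y1), sum(y2)
    V1 = sum((a / q - Y1) ** 2 * q for a, q in zip(y1, p))
    V2 = sum((b / q - Y2) ** 2 * q for b, q in zip(y2, p))
    C = sum((a / q - Y1) * (b / q - Y2) * q for a, b, q in zip(y1, y2, p))
    return V1, V2, C / math.sqrt(V1 * V2)  # equals (2.9) when V1 = V2 = V

def optimal_variance(V, delta, n, lam):
    """V(Y2_hat) at the optimal Q for a matched fraction 0 < lam < 1."""
    m, u = lam * n, (1 - lam) * n
    v_u = V / u
    v_m = V / n + 2 * V * (1 - delta) * (1 / m - 1 / n)
    return v_u * v_m / (v_u + v_m)

def des_raj_optimum(V, delta, n):
    """Optimal lambda and minimum variance; requires delta > 1/2."""
    if delta <= 0.5:
        raise ValueError("matching does not help unless delta > 1/2")
    c = 2 * (1 - delta)
    return math.sqrt(c) / (1 + math.sqrt(c)), V / (2 * n) * (1 + math.sqrt(c))
```

A fine grid search over $\lambda$ with `optimal_variance` agrees with `des_raj_optimum`, which is a useful guard when reconstructing formulas of this kind.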
2.3 Ghangurde-Rao Scheme
The procedure proposed by Ghangurde and Rao (1969) modifies the Des Raj (1965) scheme in the selection of samples $s$ and $s_2$. For simplicity, we assume $N/n$ and $N/u$ to be integers.
• On the first occasion:
The population of $N$ units is divided at random into $n$ groups, each of size $N/n$; then sample $s$ of size $n$ is selected by drawing one unit from each of the $n$ groups independently with PP$_p$WOR (probabilities proportional to $p_i$, without replacement). Thus the probability of selecting unit $i$ from its random group is $p_i/P_i$, where $P_i$ denotes the total of the $p$ values for the group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s$.
• On the second occasion:
(1) A sample $s_1$ of size $m = \lambda n$ ($0 < \lambda < 1$) is drawn from $s$ using the same method as in the Des Raj scheme.
(2) For the independent sample $s_2$ of size $u = n - m$, the $N$ units are first split at random into $u$ groups, each of size $N/u$; then one unit is drawn from each of the $u$ groups independently with PP$_p$WOR. The probability of selecting unit $i$ from its random group is $p_i/P_i^{*}$, where $P_i^{*}$ denotes the total of the $p$ values for the group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_2$.
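The random-group selection used on the first occasion of the Ghangurde-Rao scheme (a Rao-Hartley-Cochran type procedure) can be sketched as below, together with an estimator of the form (2.10). This is an illustrative Python sketch assuming $N/n$ is an integer; `rhc_sample` and `rhc_total` are hypothetical names. Called with $u$ groups instead of $n$, the same routine would produce $s_2$ and the group totals $P_i^{*}$.

```python
import random

def rhc_sample(p, n, rng):
    """Random-group PPS selection as on the first occasion of Ghangurde-Rao.

    The N units are split at random into n groups of size N/n (assumed to
    be an integer) and one unit is drawn from each group with probability
    p_i / P_g, where P_g is the total of p over that group.
    Returns a list of (unit, P_g) pairs.
    """
    N = len(p)
    if N % n:
        raise ValueError("N/n must be an integer")
    size = N // n
    order = list(range(N))
    rng.shuffle(order)                    # random split into groups
    sample = []
    for g in range(n):
        group = order[g * size:(g + 1) * size]
        P_g = sum(p[i] for i in group)    # group total of p
        unit = rng.choices(group, weights=[p[i] for i in group], k=1)[0]
        sample.append((unit, P_g))
    return sample

def rhc_total(y, p, sample):
    """Estimator of the form (2.10): sum of y_i * P_i / p_i over the sample."""
    return sum(y[i] * P_g / p[i] for i, P_g in sample)
```

Because the groups partition the population, the group totals $P_i$ of the selected units always add up to one, and the estimator is exact whenever $y_i$ is proportional to $p_i$.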
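Then, the population totals on the first and second occasions are unbiasedly estimated, respectively, by
$$\hat{Y}_1' = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} \qquad (2.10)$$
and
$$\hat{Y}_2' = Q' \hat{Y}_{2u}' + (1 - Q') \hat{Y}_{2m}', \qquad (2.11)$$
where $Q'$ is a weight ($0 \le Q' \le 1$), and
$$\hat{Y}_{2u}' = \sum_{i \in s_2} \frac{y_{2i} P_i^{*}}{p_i}, \qquad (2.12)$$
$$\hat{Y}_{2m}' = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} + \frac{n}{m} \sum_{i \in s_1} \frac{(y_{2i} - y_{1i}) P_i}{p_i}. \qquad (2.13)$$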
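Under assumption (2.7) and using the optimal values of $Q'$ and $\lambda$, the minimum variance of $\hat{Y}_2'$ is given by
$$V_{\min}(\hat{Y}_2') = \frac{NV}{2n(N - 1)}\left[1 - \frac{n}{N} + \sqrt{2(1 - \delta)\left(1 + \gamma\frac{n}{N}\right)}\right], \quad \text{if } \delta > \tfrac{1}{2}, \qquad (2.14)$$
where $V$ and $\delta$ are as defined in (2.7) and (2.9), respectively, and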
7 (1-S)V
with N , N
and t = l i=l
N
NV P = T7777 ^(Vu ~ Yi)(yx ~ Y2) i = l
which is the correlation coefficient between yu and y2i-
2.4 Chotai Scheme
Under the additional assumption that $n/m$ is also an integer, Chotai (1974) introduced a sampling design that modifies the Ghangurde-Rao (1969) scheme in the selection of sample $s_1$ on the second occasion.
• On the first occasion:
A sample $s$ of size $n$ is chosen by the same procedure as in the Ghangurde-Rao scheme.
• On the second occasion:
(1) The $n$ units in sample $s$ are divided at random into $m = \lambda n$ ($0 < \lambda < 1$) groups, each of size $n/m$; then one unit is drawn from each of the $m$ groups independently with PP$_P$WOR (probabilities proportional to $P$, without replacement). The selected $m$ units compose sample $s_1$, and the probability of choosing unit $i$ from its random group is $P_i/P_i^{+}$, where $P_i$ is as defined in the Ghangurde-Rao scheme, and $P_i^{+}$ denotes the total of the $P$ values for the random group of $s$ containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_1$.
(2) The independent sample $s_2$ of size $u = n - m$ is selected as in the Ghangurde-Rao scheme.
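The population totals on the first and second occasions are then unbiasedly estimated, respectively, by
$$\hat{Y}_1^{C} = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} \qquad (2.15)$$
and
$$\hat{Y}_2^{C} = Q^{C} \hat{Y}_{2u}^{C} + (1 - Q^{C}) \hat{Y}_{2m}^{C}, \qquad (2.16)$$
where $Q^{C}$ is a weight ($0 \le Q^{C} \le 1$) and
$$\hat{Y}_{2u}^{C} = \sum_{i \in s_2} \frac{y_{2i} P_i^{*}}{p_i}, \qquad (2.17)$$
$$\hat{Y}_{2m}^{C} = \sum_{i \in s} \frac{y_{1i} P_i}{p_i} + \sum_{i \in s_1} \frac{(y_{2i} - y_{1i}) P_i^{+}}{p_i}. \qquad (2.18)$$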
In equations (2.17) and (2.18), $P_i$ and $P_i^{*}$ are as defined in the Ghangurde-Rao scheme, and $P_i^{+}$ is as defined in the description of the Chotai design in this section. The minimum variance of $\hat{Y}_2^{C}$, obtained by using the optimal values of $Q^{C}$ and $\lambda$ under assumption (2.7), is
$$V_{\min}(\hat{Y}_2^{C}) = \frac{NV}{2n(N - 1)}\left[1 - \frac{n}{N} + \sqrt{2(1 - \delta)}\right], \quad \text{if } \delta > \tfrac{1}{2}. \qquad (2.19)$$
So far, assumption (2.7) has been used in the estimation procedures. If it is dropped, the estimator of the population total based on the unmatched units on the second occasion stays the same, but the estimator based on the matched units changes. Under the Chotai scheme without assumption (2.7), a composite estimator of $Y_2$ is
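$$\hat{Y}_2^{CM} = Q^{CM} \hat{Y}_{2u}^{C} + (1 - Q^{CM}) \hat{Y}_{2m}^{CM}, \qquad (2.20)$$
where $\hat{Y}_{2u}^{C}$ is defined in (2.17), $Q^{CM}$ is a weight ($0 \le Q^{CM} \le 1$) and
$$\hat{Y}_{2m}^{CM} = \sum_{i \in s_1} \frac{(y_{2i} - \beta y_{1i}) P_i^{+}}{p_i} + \beta \sum_{i \in s} \frac{y_{1i} P_i}{p_i}, \qquad (2.21)$$
with
$$\beta = \delta \sqrt{\frac{\sum_{i=1}^{N} (y_{2i}/p_i - Y_2)^2 p_i}{\sum_{i=1}^{N} (y_{1i}/p_i - Y_1)^2 p_i}}, \qquad (2.22)$$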
where $\delta$ is the correlation between $y_{1i}/p_i$ and $y_{2i}/p_i$. Notice that we do not use $\delta$ as defined in (2.9), since assumption (2.7) is not made here; instead, $\delta$ is defined as
$$\delta = \mathrm{corr}\left(\frac{y_{1i}}{p_i}, \frac{y_{2i}}{p_i}\right) = \frac{\sum_{i=1}^{N} (y_{1i}/p_i - Y_1)(y_{2i}/p_i - Y_2) p_i}{\sqrt{V_1 V_2}},$$
where $V_1 = \sum_{i=1}^{N} (y_{1i}/p_i - Y_1)^2 p_i$ and $V_2 = \sum_{i=1}^{N} (y_{2i}/p_i - Y_2)^2 p_i$. The minimum variance of $\hat{Y}_2^{CM}$ without assumption (2.7), obtained by using the optimal values of $Q^{CM}$ and $\lambda$, is given by
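$$V_{\min}(\hat{Y}_2^{CM}) = \frac{NV_2}{2n(N - 1)}\left(1 + \sqrt{1 - \delta^2} - \frac{n}{N}\right), \quad \text{if } \delta > \tfrac{1}{2}. \qquad (2.23)$$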
It should be noted that the value of $\beta$ is required for the use of $\hat{Y}_2^{CM}$; however, its actual value is usually unavailable in practice, and estimating $\beta$ from the available sample may introduce some bias into the estimation. This motivates a modification of the Chotai scheme, which is discussed in the next section.
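In a simulation study, where the whole population is known, $\beta$ in (2.22) can be computed directly; the sketch below does so, and it also shows why $\beta$ is out of reach in a real survey, since it needs every $y_{1i}$ and $y_{2i}$. It is a hypothetical helper, not part of the source, and it uses the fact that $\delta\sqrt{V_2/V_1}$ equals the slope from regressing $y_{2i}/p_i$ on $y_{1i}/p_i$ with weights $p_i$.

```python
import math

def chotai_beta(y1, y2, p):
    """Return (beta, delta) as in (2.22) from full population values."""
    Y1, Y2 = sum(y1), sum(y2)
    V1 = sum((a / q - Y1) ** 2 * q for a, q in zip(y1, p))
    V2 = sum((b / q - Y2) ** 2 * q for b, q in zip(y2, p))
    C = sum((a / q - Y1) * (b / q - Y2) * q for a, b, q in zip(y1, y2, p))
    delta = C / math.sqrt(V1 * V2)      # correlation of y1/p and y2/p
    beta = delta * math.sqrt(V2 / V1)   # algebraically equal to C / V1
    return beta, delta
```

If $y_{2i} = 2y_{1i}$ for every unit, the function returns $\beta = 2$ and $\delta = 1$, as expected for an exact linear relation.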
2.5 Prasad-Graham Scheme
In this section, we describe the alternative to the Chotai (1974) sampling and estimation procedure introduced by Prasad and Graham (1994), which does not require the value of $\beta$ defined in (2.22) to be known in advance. Under the Prasad-Graham (1994) scheme, $N/n$, $N/u$ and $n/m$ are all assumed to be integers, as in the Chotai scheme. In this scheme, the information collected on the first occasion is used in selecting the sample $s_1$ on the second occasion. The procedure is:
• On the first occasion:
A sample $s$ of size $n$ is selected by the same method as in the Ghangurde-Rao (1969) scheme; after the selection, each unit of $s$ is observed on the characteristic $y$, giving the values $y_{1i}$ ($i \in s$).
• On the second occasion:
(1) The $n$ units in sample $s$ are split at random into $m = \lambda n$ ($0 < \lambda < 1$) groups, each of size $n/m$; then one unit is selected from each of the $m$ groups independently with PP$_{p^*}$WOR (probabilities proportional to $p^{*}$, without replacement). The selected $m$ units form sample $s_1$, and the probability of choosing unit $i$ from its random group is $p_i^{*}/P_i^{*+}$, where
$$p_i^{*} = \frac{y_{1i} P_i}{p_i}, \qquad (2.24)$$
which involves the information observed on the first occasion, $P_i$ is as defined in the Ghangurde-Rao scheme, and $P_i^{*+}$ denotes the total of the $p^{*}$ values for the random group containing the $i$-th unit ($i = 1, 2, \ldots, N$) when selecting $s_1$.
(2) The independent sample $s_2$ of size $u = n - m$ is selected as in the Ghangurde-Rao scheme.
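Under the Prasad-Graham scheme, the population total on the first occasion is unbiasedly estimated by $\hat{Y}_1^{C}$ as defined in (2.15), and a composite estimator of $Y_2$ on the second occasion is
$$\hat{Y}_2^{*} = Q^{*} \hat{Y}_{2u}^{C} + (1 - Q^{*}) \hat{Y}_{2m}^{*}, \qquad (2.25)$$
where $\hat{Y}_{2u}^{C}$ is defined in (2.17), $Q^{*}$ is a weight ($0 \le Q^{*} \le 1$) and
$$\hat{Y}_{2m}^{*} = \sum_{i \in s_1} \frac{y_{2i}^{*} P_i^{*+}}{p_i^{*}}, \qquad (2.26)$$
with $P_i^{*+}$ the total of the $p^{*}$ values for the random group containing unit $i$, and
$$y_{2i}^{*} = \frac{y_{2i} P_i}{p_i}. \qquad (2.27)$$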
The minimum variance of $\hat{Y}_2^{*}$, obtained by using the optimal values of $Q^{*}$ and $\lambda$, is
$$V_{\min}(\hat{Y}_2^{*}) = \frac{NV_2}{2n(N - 1)}\left[1 - \frac{n}{N} + \sqrt{h}\right], \qquad (2.28)$$
where
$$h = \frac{V^{*}}{V_2}, \qquad (2.29)$$
with $V_2 = \sum_{i=1}^{N} (y_{2i}/p_i - Y_2)^2 p_i$ and
$$V^{*} = \sum_{i=1}^{N} \left(\frac{y_{2i}}{y_{1i}/Y_1} - Y_2\right)^2 \frac{y_{1i}}{Y_1}. \qquad (2.30)$$
The value of $h$ measures the efficiency of the estimator that uses $p_i$ as initial selection probabilities relative to the estimator that uses $y_{1i}/Y_1$ as initial selection probabilities in estimating the current population total. Therefore, a relatively small value of $h$ indicates that the Prasad-Graham scheme, which uses the information obtained on the previous occasion in selecting the sample on the current occasion, outperforms the Chotai and Ghangurde-Rao schemes.
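The quantities $p_i^{*}$ and $h$ are straightforward to compute once the population is known. The sketch below uses hypothetical helper names and assumes all $y_{1i} > 0$, so that $y_{1i}/Y_1$ is a valid set of selection probabilities; it evaluates $h = V^{*}/V_2$ as in (2.29) and (2.30), and $p_i^{*} = y_{1i}P_i/p_i$ as in (2.24) for a unit of $s$.

```python
def prasad_graham_h(y1, y2, p):
    """h = V*/V2; V* uses y1/Y1 as selection probabilities (y1 > 0)."""
    Y1, Y2 = sum(y1), sum(y2)
    V2 = sum((b / q - Y2) ** 2 * q for b, q in zip(y2, p))
    V_star = sum((b / (a / Y1) - Y2) ** 2 * (a / Y1) for a, b in zip(y1, y2))
    return V_star / V2

def p_star(y1_i, P_i, p_i):
    """Second-occasion size measure p*_i = y1_i * P_i / p_i for a unit of s."""
    return y1_i * P_i / p_i
```

Two limiting cases illustrate the interpretation: if $y_{2i}$ is proportional to $y_{1i}$, then $V^{*} = 0$ and $h = 0$; if $y_{1i}/Y_1$ coincides with $p_i$, then $h = 1$ and the first-occasion values bring no extra information.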
Chapter 3
Estimation of median in PPS
sampling on two occasions
The problem of quantile estimation is often considered when study variables have skewed distributions, such as income. This is because, unlike the population total, quantiles are robust to extreme values. In this chapter, the estimation procedures discussed in Chapter 2 are extended to the situation where the median of a finite population is estimated on each of the two occasions, and the current estimate is still of chief interest. The extension of the procedures is not straightforward. Since the population median is a nonlinear function of the population values, we consider the estimating equation approach (see Binder and Patak (1994) and Thompson (1997)). In the following sections we discuss the estimating equation approach for unequal probability sampling.
3.1 Estimating equations
According to Binder and Patak (1994) and Thompson (1997), the population median $\theta_N$ can be defined as the solution of the population estimating equation
$$\sum_{i=1}^{N} \left[I(y_i \le \theta_N) - \frac{1}{2}\right] = 0, \qquad (3.1)$$
and an estimator $\hat{\theta}$ of the population median can be defined as the solution of the sample estimating equation
$$\sum_{i \in s} \frac{I(y_i \le \hat{\theta}) - \frac{1}{2}}{\pi_i} = 0, \qquad (3.2)$$
where $I(\cdot)$ is the indicator function taking the value 1 when the condition is satisfied and 0 otherwise, $s$ is the set of population units in the sample, and $\pi_i$ denotes the inclusion probability of the $i$-th unit; that is,
$$\pi_i = \sum_{s \ni i} p(s). \qquad (3.3)$$
For any sampling design, the left-hand side of (3.2), evaluated at a fixed value $\theta$, is design-unbiased for the left-hand side of (3.1), and one may expect $\hat{\theta}$ to be close to $\theta_N$ for large samples. The next section deals with estimation of the median based on estimating equations for the Des Raj (1965) sampling scheme.
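Because the left-hand side of (3.2) is a non-decreasing step function of $\hat{\theta}$, an exact root may not exist in a finite sample. A common convention, assumed here rather than taken from the thesis, is to take the smallest sample value at which the left-hand side becomes non-negative; this is a weighted sample median with weights $1/\pi_i$. A minimal Python sketch, with the hypothetical name `solve_median_equation`:

```python
def solve_median_equation(y, w):
    """Smallest y value at which sum_i w_i [I(y_i <= theta) - 1/2] >= 0.

    y are sample values and w the design weights (1/pi_i in (3.2)).
    """
    half = 0.5 * sum(w)
    cum = 0.0
    for value, weight in sorted(zip(y, w)):
        cum += weight            # weighted count of y_i <= value
        if cum >= half:
            return value
    return max(y)                # reached only through floating-point rounding
```

For example, with values 10, 20, 30 and weights 1, 1, 5 the heavily weighted unit dominates and the solution is 30, whereas equal weights would give 20.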
3.2 Estimation of median for Des Raj scheme
All the notation introduced in Section 2.1, as well as the following, will be used throughout this chapter:
$\hat{\theta}_1$ = estimate of the population median on the first occasion
$\hat{\theta}_2$ = estimate of the population median on the second occasion
$\hat{\theta}_{2u}$ = estimate of the population median from the unmatched units on the second occasion
$\hat{\theta}_{2m}$ = estimate of the population median from the matched units on the second occasion
The estimation procedure for the population total under the Des Raj (1965) scheme, described in Section 2.2, is now extended to estimate the population median by incorporating the estimating equation idea introduced in Section 3.1.
• On the first occasion:
To calculate the estimate $\hat{\theta}_1$ of the population median on the first occasion, replace $y_{1i}$ in (2.1) by the corresponding indicator function $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$, and denote the new expression by $T_1$:
$$T_1 = \sum_{i \in s} \frac{I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}}{n p_i}, \qquad (3.4)$$
and $\hat{\theta}_1$ is such that $T_1 = 0$.
• On the second occasion:
(1) To obtain the estimate $\hat{\theta}_{2u}$ of the population median from the unmatched units on the second occasion, replace $y_{2i}$ in (2.3) by the corresponding indicator function $I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}$, and denote the new expression by $T_{2u}$:
$$T_{2u} = \sum_{i \in s_2} \frac{I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}}{u p_i}, \qquad (3.5)$$
and $\hat{\theta}_{2u}$ is such that $T_{2u} = 0$.
(2) To obtain the estimate $\hat{\theta}_{2m}$ of the population median from the matched units on the second occasion, replace $y_{1i}$ and $y_{2i}$ in (2.4) by the corresponding indicator functions $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$ and $I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}$, respectively, and denote the new expression by $T_{2m}$:
$$T_{2m} = \sum_{i \in s} \frac{I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}}{n p_i} + \sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)}{m p_i}. \qquad (3.6)$$
To find $\hat{\theta}_{2m}$, we set this expression to zero; the solution is the estimate of the current population median from the matched units. Notice that the first term is the same as $T_1$ in (3.4), which has already been set to zero on the first occasion. Therefore, we only need to set the second term to zero to obtain $\hat{\theta}_{2m}$; that is, $\hat{\theta}_{2m}$ is such that
$$\sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)}{m p_i} = 0.$$
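The condition defining $\hat{\theta}_{2m}$ says that the $1/p_i$-weighted count of matched units with $y_{2i} \le \hat{\theta}_{2m}$ should equal the weighted count with $y_{1i} \le \hat{\theta}_1$; in other words, $\hat{\theta}_{2m}$ is a quantile of the second-occasion values in $s_1$ at a level fixed by the first occasion. The Python sketch below is hypothetical: it uses the smallest-non-negative-root convention (an assumption, as for (3.2)) and drops the constant factor $1/m$, which does not affect the root.

```python
def des_raj_theta2m(y1_s1, y2_s1, p_s1, theta1):
    """Solve sum over s1 of [I(y2 <= theta) - I(y1 <= theta1)] / p = 0.

    Returns the smallest y2 value in s1 at which the sum becomes >= 0.
    """
    target = sum(1 / p for a, p in zip(y1_s1, p_s1) if a <= theta1)
    tol = 1e-12 * max(target, 1.0)   # guards against rounding in the sums
    cum = 0.0
    for b, p in sorted(zip(y2_s1, p_s1)):
        cum += 1 / p                 # weighted count of y2 <= b
        if cum >= target - tol:
            return b
    return max(y2_s1)
```

With four equally weighted matched units and $\hat{\theta}_1$ between the second and third first-occasion values, the function returns the second smallest second-occasion value, as the weighted-count interpretation predicts.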
Thus, $\hat{\theta}_{2u}$ and $\hat{\theta}_{2m}$ are two independent estimators of the population median on the second occasion, $\theta_2$. One could form a composite estimator of $\theta_2$ as a weighted average of these two estimators, $\hat{\theta}_2 = Q\hat{\theta}_{2u} + (1 - Q)\hat{\theta}_{2m}$. With the optimal weight this would be an optimal estimator of $\theta_2$, but the optimal weight is a function of the variances of the two estimators, which are difficult to evaluate because they require density estimation. To overcome this problem, we instead combine the two estimating equations $T_{2u}$ and $T_{2m}$ by a weighted average, and use the combined equation to obtain a better estimator of the population median $\theta_2$ based on both the unmatched and matched samples. That is, consider the estimating equation
$$T_2 = QT_{2u} + (1 - Q)T_{2m}$$
with
$$\mathrm{Var}(T_2) = Q^2\,\mathrm{Var}(T_{2u}) + (1 - Q)^2\,\mathrm{Var}(T_{2m}),$$
since $T_{2u}$ and $T_{2m}$ are independent and their covariance is zero. Here $Q$ is a weight ($0 \le Q \le 1$). The optimal value of $Q$ is the value that minimizes the variance of $T_2$, and it has the form
$$Q = \frac{\mathrm{Var}(T_{2m})}{\mathrm{Var}(T_{2u}) + \mathrm{Var}(T_{2m})},$$
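where $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$ are derived in Appendix A and are given by
$$\mathrm{Var}(T_{2u}) = \frac{1}{u}\left(\sum_{i=1}^{N} \frac{w_i^2}{p_i} - W^2\right), \qquad (3.7)$$
with
$$w_i = I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}, \qquad W = \sum_{i=1}^{N} w_i,$$
and
$$\mathrm{Var}(T_{2m}) = \frac{1}{n}\left(\sum_{i=1}^{N} \frac{w_i'^2}{p_i} - W'^2\right) + \frac{n - m}{mn}\left(\sum_{i=1}^{N} \frac{w_i^{*2}}{p_i} - W^{*2}\right), \qquad (3.8)$$
with
$$w_i' = I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}, \qquad W' = \sum_{i=1}^{N} w_i',$$
$$w_i^{*} = I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1), \qquad W^{*} = \sum_{i=1}^{N} w_i^{*}.$$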
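In a simulation study the variances (3.7) and (3.8), and hence $Q$, can be evaluated from the full population. The Python sketch below does this; the function name is hypothetical, and plugging fixed values (for example the known population medians) into the indicator functions is an assumption made for illustration.

```python
def des_raj_weight(y1, y2, p, theta1, theta2u, theta2m, n, m):
    """Return (Var(T2u), Var(T2m), Q) using (3.7), (3.8) and the optimal Q."""
    u = n - m
    w = [(b <= theta2u) - 0.5 for b in y2]                                # w_i
    w1 = [(b <= theta2m) - 0.5 for b in y2]                               # w'_i
    ws = [int(b <= theta2m) - int(a <= theta1) for a, b in zip(y1, y2)]  # w*_i

    def kernel(z):
        # sum_i z_i^2 / p_i - (sum_i z_i)^2, the PPSWR variance kernel
        return sum(zi * zi / q for zi, q in zip(z, p)) - sum(z) ** 2

    var_u = kernel(w) / u                                      # (3.7)
    var_m = kernel(w1) / n + (n - m) / (m * n) * kernel(ws)    # (3.8)
    return var_u, var_m, var_m / (var_u + var_m)
```

When $y_{1i} = y_{2i}$ and all thresholds coincide, every $w_i^{*}$ is zero, so $\mathrm{Var}(T_{2m})$ reduces to its first term and $Q = u/(u + n)$.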
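Using $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$, the optimal weight $Q$ is obtained, and the estimate of the current population median is then calculated as follows:
$\hat{\theta}_2$ is such that $\tilde{T}_2 = Q\tilde{T}_{2u} + (1 - Q)\tilde{T}_{2m} = 0$,
where
$$\tilde{T}_{2u} = \sum_{i \in s_2} \frac{I(y_{2i} \le \hat{\theta}_2) - \frac{1}{2}}{u p_i} \qquad (3.9)$$
and
$$\tilde{T}_{2m} = \sum_{i \in s_1} \frac{I(y_{2i} \le \hat{\theta}_2) - I(y_{1i} \le \hat{\theta}_1)}{m p_i}. \qquad (3.10)$$
Notice that $\tilde{T}_{2u}$ is obtained by replacing $\hat{\theta}_{2u}$ in (3.5) by $\hat{\theta}_2$, and $\tilde{T}_{2m}$ by replacing $\hat{\theta}_{2m}$ in (3.6) by $\hat{\theta}_2$ and discarding the first (zero) term.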
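Since $Q\tilde{T}_{2u} + (1 - Q)\tilde{T}_{2m}$ is non-decreasing in $\hat{\theta}_2$, it can be solved by scanning the pooled second-occasion sample values. The sketch below is hypothetical: the function name and data layout are assumptions, $Q$ is taken as given (for example from the variances (3.7) and (3.8)), and the root is again taken as the smallest candidate at which the function becomes non-negative.

```python
def des_raj_theta2(s1_data, s2_data, theta1, Q):
    """Solve Q*T2u(theta) + (1 - Q)*T2m(theta) = 0 using (3.9) and (3.10).

    s1_data: (y1, y2, p) triples for the m matched units
    s2_data: (y2, p) pairs for the u unmatched units
    theta1:  first-occasion estimate solving T1 = 0
    """
    m, u = len(s1_data), len(s2_data)

    def T2(theta):
        t2u = sum(((b <= theta) - 0.5) / (u * p) for b, p in s2_data)
        t2m = sum((int(b <= theta) - int(a <= theta1)) / (m * p)
                  for a, b, p in s1_data)
        return Q * t2u + (1 - Q) * t2m

    candidates = sorted({b for _, b, _ in s1_data} | {b for b, _ in s2_data})
    for theta in candidates:
        if T2(theta) >= -1e-12:   # small tolerance for rounding
            return theta
    return candidates[-1]
```

Setting `Q = 1` reduces the problem to $\tilde{T}_{2u} = 0$ alone, and `Q = 0` to $\tilde{T}_{2m} = 0$, which makes the two extremes easy to check.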
3.3 Estimation of median for Ghangurde-Rao scheme
In Section 2.3, the estimation of the population total for the Ghangurde-Rao (1969) scheme was discussed; it is now extended to estimate the population median.
• On the first occasion:
To obtain the estimate $\hat{\theta}_1$ of the population median on the first occasion, replace $y_{1i}$ in (2.10) by the corresponding indicator function $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$, and denote the new expression by $T_1$:
$$T_1 = \sum_{i \in s} \frac{\left[I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}\right] P_i}{p_i}, \qquad (3.11)$$
and $\hat{\theta}_1$ is such that $T_1 = 0$.
• On the second occasion:
(1) To calculate the estimate $\hat{\theta}_{2u}$ of the population median from the unmatched units on the second occasion, replace $y_{2i}$ in (2.12) by the corresponding indicator function $I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}$, and denote the new expression by $T_{2u}$:
$$T_{2u} = \sum_{i \in s_2} \frac{\left[I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}\right] P_i^{*}}{p_i}, \qquad (3.12)$$
and $\hat{\theta}_{2u}$ is such that $T_{2u} = 0$.
(2) To obtain the estimate $\hat{\theta}_{2m}$ of the population median from the matched units on the second occasion, replace $y_{1i}$ and $y_{2i}$ in (2.13) by the corresponding indicator functions $I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}$ and $I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}$, respectively, and denote the new expression by $T_{2m}$:
$$T_{2m} = \sum_{i \in s} \frac{\left[I(y_{1i} \le \hat{\theta}_1) - \frac{1}{2}\right] P_i}{p_i} + \frac{n}{m} \sum_{i \in s_1} \frac{\left[I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)\right] P_i}{p_i}. \qquad (3.13)$$
Setting this expression to zero gives $\hat{\theta}_{2m}$. The first term is the same as $T_1$ in (3.11) and has already been set to zero on the first occasion. Therefore, only the second term needs to be set to zero, and $\hat{\theta}_{2m}$ is such that
$$\frac{n}{m} \sum_{i \in s_1} \frac{\left[I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1)\right] P_i}{p_i} = 0.$$
The estimate $\hat{\theta}_2$ of the current population median for the Ghangurde-Rao scheme is obtained in the same way as for the Des Raj scheme in Section 3.2:
$\hat{\theta}_2$ is such that $\tilde{T}_2 = Q\tilde{T}_{2u} + (1 - Q)\tilde{T}_{2m} = 0$,
where
$$Q = \frac{\mathrm{Var}(T_{2m})}{\mathrm{Var}(T_{2u}) + \mathrm{Var}(T_{2m})}$$
is the optimal weight, calculated from $\mathrm{Var}(T_{2u})$ and $\mathrm{Var}(T_{2m})$, which are derived in Appendix A. Their formulas are
$$\mathrm{Var}(T_{2u}) = \frac{N - u}{(N - 1)u}\left(\sum_{i=1}^{N} \frac{w_i^2}{p_i} - W^2\right), \qquad (3.14)$$
where
$$w_i = I(y_{2i} \le \hat{\theta}_{2u}) - \frac{1}{2}, \qquad W = \sum_{i=1}^{N} w_i,$$
Var(T2m) = ^ ( | ^ _ ^
+
(N-
n—m mn(N-l)
N (jV-2n + £ ) £ ^ - + ( n - l ) A W
i=i
*2 (3.15)
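where
$$w_i' = I(y_{2i} \le \hat{\theta}_{2m}) - \frac{1}{2}, \qquad W' = \sum_{i=1}^{N} w_i',$$
$$w_i^{*} = I(y_{2i} \le \hat{\theta}_{2m}) - I(y_{1i} \le \hat{\theta}_1), \qquad W^{*} = \sum_{i=1}^{N} w_i^{*}.$$
Moreover, $\tilde{T}_{2u}$ is obtained by replacing $\hat{\theta}_{2u}$ in (3.12) by $\hat{\theta}_2$, and $\tilde{T}_{2m}$ by replacing $\hat{\theta}_{2m}$ in (3.13) by $\hat{\theta}_2$ and discarding the first (zero) term. That is,
$$\tilde{T}_{2u} = \sum_{i \in s_2} \frac{\left[I(y_{2i} \le \hat{\theta}_2) - \frac{1}{2}\right] P_i^{*}}{p_i}, \qquad (3.16)$$
and
$$\tilde{T}_{2m} = \frac{n}{m} \sum_{i \in s_1} \frac{\left[I(y_{2i} \le \hat{\theta}_2) - I(y_{1i} \le \hat{\theta}_1)\right] P_i}{p_i}. \qquad (3.17)$$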
3.4 Estimation of median for Prasad-Graham scheme
The estimation of population total for the Prasad-Graham (1994) scheme was dis
cussed in Section 2.5, and an extension of this approach will be applied to estimate
population median in this section.
• On the first occasion:
If we want to calculate the estimate of population median on the first occa
sion 9\, we can replace yu in (2.15) by the corresponding indicator function
I{yu ^ #1) ~~ §> a nd denote the new formula as T\. Here, T\ is actually the
same as defined in (3.11) of the Ghangurde-Rao (1969) scheme, and &i can
also be calculated by solving the equation Ti = 0.
• On the second occasion:
(1) To estimate the population median for unmatched units on the second occasion, $\hat\theta_{2u}$, we replace $y_{2i}$ in (2.17) by the indicator function $I(y_{2i} \le \hat\theta_{2u}) - \frac{1}{2}$ and denote the new formula by $T_{2u}$. Here, $T_{2u}$ is as defined in (3.12) for the Ghangurde-Rao scheme, and $\hat\theta_{2u}$ is again the solution of $T_{2u} = 0$.
(2) To estimate the population median for matched units on the second occasion, $\hat\theta_{2m}$, we first substitute equation (2.27) into (2.26). This gives the estimator

$$\sum_{i\in s_1}\frac{(y_{2i}P_i/p_i)\,P_i^*}{p_i^*}.$$

We then replace $y_{2i}$ by the indicator function $I(y_{2i} \le \hat\theta_{2m}) - \frac{1}{2}$. The new formula, denoted by $T_{2m}$, becomes

$$T_{2m} = \sum_{i\in s_1}\frac{\left[\left(I(y_{2i}\le\hat\theta_{2m}) - \frac{1}{2}\right)P_i/p_i\right]P_i^*}{p_i^*}, \tag{3.18}$$

where $p_i^*$ is defined in (2.24) as

$$p_i^* = \frac{y_{1i}P_i/p_i}{\sum_{j\in s} y_{1j}P_j/p_j}.$$

Note that $y_{1i}$ in $p_i^*$ cannot be replaced by the indicator function $I(y_{1i} \le \hat\theta_1) - \frac{1}{2}$. If it were, the denominator of $p_i^*$ would become $\sum_{j\in s}[I(y_{1j} \le \hat\theta_1) - \frac{1}{2}]P_j/p_j$. This is exactly $T_1$, which has been set to zero on the first occasion, so $p_i^*$, and hence $T_{2m}$, would be undefined. Therefore, we keep the values $y_{1i}$ observed on the first occasion in the estimation procedure. Setting $T_{2m} = 0$ and solving the equation gives $\hat\theta_{2m}$, the estimate for matched units on the second occasion.
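The remark about $p_i^*$ can be checked numerically. The sketch below assumes the form $p_i^* = (y_{1i}P_i/p_i)/\sum_{j\in s} y_{1j}P_j/p_j$. It computes $p_i^*$ from the first-occasion values and the ratios $P_i/p_i$ of the units in $s$.

It then substitutes the indicator $I(y_{1i} \le \hat\theta_1) - \frac{1}{2}$ for $y_{1i}$. The denominator then equals $T_1$, which vanishes when $\hat\theta_1$ solves $T_1 = 0$. The function name `p_star` and the toy values are hypothetical.

```python
def p_star(values, ratios):
    """p*_i = (v_i * P_i/p_i) / sum_j (v_j * P_j/p_j) over the n units of s."""
    weighted = [v * r for v, r in zip(values, ratios)]
    total = sum(weighted)
    if total == 0:
        raise ZeroDivisionError("the denominator of p*_i is zero")
    return [x / total for x in weighted]


ratios = [1.0, 1.0, 1.0, 1.0]   # toy values of P_i / p_i for the n = 4 units of s
y1 = [3.0, 1.0, 4.0, 2.0]       # first-occasion values y_1i
print(p_star(y1, ratios))       # [0.3, 0.1, 0.4, 0.2]

theta1_hat = 2.0                # two units at or below, two above, so T_1 = 0
z1 = [(1.0 if y <= theta1_hat else 0.0) - 0.5 for y in y1]
try:
    p_star(z1, ratios)
except ZeroDivisionError as err:
    print(err)                  # the denominator of p*_i is zero
```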
We now obtain the estimate $\hat\theta_2$ of the current population median by solving an estimating equation that combines $T_{2u}$ and $T_{2m}$. The approach is similar to that of Section 3.2 for the Des Raj (1965) scheme and Section 3.3 for the Ghangurde-Rao (1969) scheme: $\hat\theta_2$ is such that

$$\hat T_2 = Q\hat T_{2u} + (1-Q)\hat T_{2m} = 0,$$

where $Q$ is the optimal weight

$$Q = \frac{\mathrm{Var}(T_{2m})}{\mathrm{Var}(T_{2u}) + \mathrm{Var}(T_{2m})}.$$

Here $\mathrm{Var}(T_{2u})$ is the same as (3.14) for the Ghangurde-Rao scheme, and $\mathrm{Var}(T_{2m})$ is derived in Appendix A. The form of $\mathrm{Var}(T_{2m})$ is
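$$\mathrm{Var}(T_{2m}) = \frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i} - W'^2\right) + \frac{N(n-m)}{mn(N-1)}\left(\sum_{i=1}^{N}\frac{w_i'^2Y_1}{y_{1i}} - W'^2\right), \tag{3.19}$$

where $w_i' = I(y_{2i} \le \hat\theta_{2m}) - \frac{1}{2}$ and $W' = \sum_{i=1}^{N} w_i'$. $Y_1 = \sum_{i=1}^{N} y_{1i}$ is the population total on the first occasion.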
Moreover, $\hat T_{2u}$ is the same as defined in (3.16) for the Ghangurde-Rao scheme, and

$$\hat T_{2m} = \sum_{i\in s_1}\frac{\left[\left(I(y_{2i}\le\hat\theta_2) - \frac{1}{2}\right)P_i/p_i\right]P_i^*}{p_i^*}. \tag{3.20}$$

That is, $\hat T_{2m}$ is obtained by replacing $\hat\theta_{2m}$ in (3.18) by $\hat\theta_2$. The simulation study compares the performance of the Des Raj, Ghangurde-Rao and Prasad-Graham schemes in estimating population medians.
Chapter 4
A simulation study based on generated populations
A simulation study is conducted to compare the proposed median estimation method under the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes. It is based on four sets of randomly generated finite populations. In this chapter, we consider two models for simulating the finite populations.
4.1 Description of two sets of generated populations based on model 1
In model 1, we first constructed two fixed finite populations of the size measure $x_i$ $(i = 1, 2, \ldots, N)$ with $N = 500$ and $720$ units, respectively. Both populations of $x_i$ were generated from a normal distribution with mean $\mu_x = 25$ and standard deviation $\sigma_x = 5$.

Using these $x$ values, the population values $y_{1i}$ for the first occasion were simulated from the model $y_{1i} = 500 + 0.5x_i + e_{1i}$. The error term $e_{1i}$ was generated from a normal distribution with mean $0$ and standard deviation $8\sqrt{x_i}$.

We then constructed the population values $y_{2i}$ for the second occasion from the generated $y_{1i}$ by the model $y_{2i} = 600 + 5.1y_{1i} + e_{2i}$. The error term $e_{2i}$ was also generated from a normal distribution with mean $0$ and standard deviation $8\sqrt{x_i}$.

This gives two sets of populations of $x_i$, $y_{1i}$ and $y_{2i}$ based on model 1, with $N = 500$ and $720$ respectively. Two further sets of populations are simulated in a different way in the next section.
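The sketch below is a minimal version of the model 1 generator and mirrors the R code in Appendix B. It uses Python's `random` module, so it does not reproduce the thesis's populations, because R's `set.seed(10)` stream differs. The function name and the seed handling are assumptions.

Like the R code, it takes the error standard deviation to be $8\sqrt{x_i}$. This requires $x_i > 0$, which is virtually certain when $x_i \sim N(25, 5^2)$.

```python
import math
import random


def generate_model1(N, seed=10):
    """Return (x, y1, y2) for model 1: x ~ N(25, 5^2), y1 = 500 + 0.5 x + e1,
    y2 = 600 + 5.1 y1 + e2, where e1 and e2 are N(0, (8 sqrt(x))^2)."""
    rng = random.Random(seed)
    x = [rng.gauss(25, 5) for _ in range(N)]
    y1 = [500 + 0.5 * xi + rng.gauss(0, 8 * math.sqrt(xi)) for xi in x]
    y2 = [600 + 5.1 * y1i + rng.gauss(0, 8 * math.sqrt(xi)) for xi, y1i in zip(x, y1)]
    return x, y1, y2


x, y1, y2 = generate_model1(500)
total_x = sum(x)
p = [xi / total_x for xi in x]  # selection probabilities p_i proportional to x_i
```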
4.2 Description of two sets of generated populations based on model 2
In model 2, we again simulated two fixed finite populations of the size measure $x_i$ $(i = 1, 2, \ldots, N)$ with $N = 500$ and $720$ units respectively. This time, however, the $x_i$ were generated from an exponential distribution with mean $\mu_x = 0.1$.

We then constructed the first-occasion values $y_{1i}$ from the generated $x$ values by the model $y_{1i} = 25 + 1.5x_i + e_{1i}$. The error term $e_{1i}$ was generated from a normal distribution with mean $0$ and standard deviation $5\sqrt{0.5x_i}$.

Using these $y_{1i}$ values, the second-occasion values $y_{2i}$ were simulated by the model $y_{2i} = 1.3y_{1i} + e_{2i}$. The error term $e_{2i}$ was generated from a normal distribution with mean $0$ and standard deviation $2$.

This gives two sets of populations of $x_i$, $y_{1i}$ and $y_{2i}$ based on model 2, with $N = 500$ and $720$ respectively.
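The sketch below is a corresponding model 2 generator, again modelled on the R code in Appendix B. `random.Random.expovariate(10)` draws from an exponential distribution with rate 10, and hence mean 0.1, matching `rexp(N,10)`. The function name and seed handling are hypothetical, and the draws will not match those used in the thesis.

```python
import math
import random


def generate_model2(N, seed=10):
    """Return (x, y1, y2) for model 2: x ~ Exponential(mean 0.1),
    y1 = 25 + 1.5 x + e1 with e1 ~ N(0, (5 sqrt(0.5 x))^2), y2 = 1.3 y1 + e2 with e2 ~ N(0, 2^2)."""
    rng = random.Random(seed)
    x = [rng.expovariate(10) for _ in range(N)]  # rate 10, mean 1/10 = 0.1
    y1 = [25 + 1.5 * xi + rng.gauss(0, 5 * math.sqrt(0.5 * xi)) for xi in x]
    y2 = [1.3 * y1i + rng.gauss(0, 2) for y1i in y1]
    return x, y1, y2


x, y1, y2 = generate_model2(720)
```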
4.3 Computations on generated populations
For each set of populations with $N = 500$ under models 1 and 2, we considered two choices of $n$ and $m$: $n = 100$, $m = 50$ and $n = 250$, $m = 125$. We applied the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in both cases. For each scheme, we computed the estimate $\hat\theta_1$ of the first-occasion median and the estimate $\hat\theta_2$ of the current median. This whole process was repeated $R = 1000$ times.
For each set of populations with $N = 720$ under models 1 and 2, we also considered two choices of $n$ and $m$: $n = 180$, $m = 60$ and $n = 180$, $m = 90$. As before, the three schemes were applied in both cases, and the estimates $\hat\theta_1$ and $\hat\theta_2$ were obtained for each scheme. This whole process was also repeated $R = 1000$ times.
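The goal of this simulation study is to compare the performance of the three sampling schemes in estimating population medians. To this end, we computed the relative biases and relative mean squared errors of $\hat\theta_1$ and $\hat\theta_2$ for each scheme. The relative bias (Rel.bias) of $\hat\theta_1$ in percentage was calculated as

$$\text{Rel.bias1\%} = \frac{\mathrm{bias}(\hat\theta_1)}{\theta_1}\times 100\% = \frac{E(\hat\theta_1) - \theta_1}{\theta_1}\times 100\%, \tag{4.1}$$

where $\theta_1$ denotes the true value of the population median on the first occasion. $E$ denotes expectation with respect to the design, that is, the average value over $R = 1000$ runs; for example, $E(\hat\theta_1) = \sum_{r=1}^{R}\hat\theta_1(r)/R$. Similarly, the relative bias of $\hat\theta_2$ in percentage was calculated as

$$\text{Rel.bias2\%} = \frac{\mathrm{bias}(\hat\theta_2)}{\theta_2}\times 100\% = \frac{E(\hat\theta_2) - \theta_2}{\theta_2}\times 100\%, \tag{4.2}$$

where $\theta_2$ denotes the true value of the population median on the second occasion. The relative mean squared errors (Rel.MSE) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were computed as

$$\text{Rel.MSE1\%} = \frac{\mathrm{MSE}(\hat\theta_1)}{\theta_1}\times 100\% = \frac{E(\hat\theta_1 - \theta_1)^2}{\theta_1}\times 100\% \tag{4.3}$$

and

$$\text{Rel.MSE2\%} = \frac{\mathrm{MSE}(\hat\theta_2)}{\theta_2}\times 100\% = \frac{E(\hat\theta_2 - \theta_2)^2}{\theta_2}\times 100\%, \tag{4.4}$$

where MSE denotes the mean squared error over $R = 1000$ runs; for instance, $\mathrm{MSE}(\hat\theta_1) = \sum_{r=1}^{R}[\hat\theta_1(r) - \theta_1]^2/R$.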
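Given the $R$ replicate estimates of a median, the two summary measures reduce to a few lines. The sketch below follows (4.1)-(4.4) as written, where the MSE is divided by $\theta$. The function name is hypothetical.

```python
def rel_bias_and_mse(estimates, theta):
    """Return (Rel.bias %, Rel.MSE %) of R replicate estimates of a median theta:
    100 * (mean - theta) / theta and 100 * MSE / theta, as in (4.1)-(4.4)."""
    R = len(estimates)
    mean_est = sum(estimates) / R
    mse = sum((e - theta) ** 2 for e in estimates) / R
    return 100 * (mean_est - theta) / theta, 100 * mse / theta


print(rel_bias_and_mse([9.0, 11.0, 12.0, 8.0], theta=10.0))  # (0.0, 25.0)
```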
Recall that $h$, defined in equation (2.29), measures the efficiency of the estimator using $x_i$ as a size measure relative to the estimator using $y_{1i}$ as a size measure, in estimating the current population total. Recall also that $\delta$ is the correlation between $y_{1i}/p_i$ and $y_{2i}/p_i$.

According to Prasad and Graham (1994), the Prasad-Graham scheme is superior to the Chotai scheme and the Ghangurde-Rao scheme for populations with relatively small $h$, specifically $h < 1 - \delta^2$. This scheme uses the information obtained on the previous occasion in estimating the current population total. We would like to see whether this also holds when estimating the population median. Therefore, we calculated the ratio $(1 - \delta'^2)/h'$ for each generated population. If the ratio is greater than 1, we expect the Prasad-Graham scheme to perform better than the Ghangurde-Rao scheme.

To obtain $\delta'$, let $z_{1i} = I(y_{1i} \le \theta_1) - \frac{1}{2}$ and $z_{2i} = I(y_{2i} \le \theta_2) - \frac{1}{2}$, where $\theta_1$ and $\theta_2$ are the true first-occasion and current medians. We then replace $y_{1i}$ by $z_{1i}$ and $y_{2i}$ by $z_{2i}$ in $\delta$. To obtain $h'$, we replace $y_{2i}$ by $z_{2i}$ in $h$. That is,

$$\delta' = \mathrm{corr}\left(\frac{z_{1i}}{p_i}, \frac{z_{2i}}{p_i}\right), \tag{4.5}$$

$$h' = \frac{V_3'}{V_2'}, \tag{4.6}$$

with

$$V_2' = \sum_{i=1}^{N} p_i\left(\frac{z_{2i}}{p_i} - Z_2\right)^2 = \sum_{i=1}^{N}\frac{z_{2i}^2}{p_i} - Z_2^2$$

and

$$V_3' = \sum_{i=1}^{N}\frac{y_{1i}}{Y_1}\left(\frac{z_{2i}Y_1}{y_{1i}} - Z_2\right)^2 = \sum_{i=1}^{N}\frac{z_{2i}^2Y_1}{y_{1i}} - Z_2^2,$$

where $Y_1 = \sum_{i=1}^{N} y_{1i}$ and $Z_2 = \sum_{i=1}^{N} z_{2i}$.
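The sketch below computes $\delta'$, $h'$ and the ratio $(1 - \delta'^2)/h'$ from population data, following the same steps as the R code in Appendix B. It takes the true medians from `statistics.median`, which, like R's `median`, averages the two middle values when $N$ is even. It also assumes every $y_{1i} > 0$. The function names are hypothetical.

```python
import math
import statistics


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def pg_criterion(x, y1, y2):
    """Return (delta', h', (1 - delta'^2) / h') as in (4.5)-(4.6)."""
    total_x = sum(x)
    p = [xi / total_x for xi in x]
    theta1, theta2 = statistics.median(y1), statistics.median(y2)
    z1 = [(1.0 if v <= theta1 else 0.0) - 0.5 for v in y1]
    z2 = [(1.0 if v <= theta2 else 0.0) - 0.5 for v in y2]
    delta = pearson([a / pi for a, pi in zip(z1, p)], [b / pi for b, pi in zip(z2, p)])
    Z2, Y1 = sum(z2), sum(y1)
    V2 = sum(b * b / pi for b, pi in zip(z2, p)) - Z2 ** 2
    V3 = sum(b * b * Y1 / v for b, v in zip(z2, y1)) - Z2 ** 2
    h = V3 / V2
    return delta, h, (1 - delta ** 2) / h


# If y1 is exactly proportional to x, both size measures coincide and h' = 1.
print(pg_criterion([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], [3, 1, 4, 1, 5, 9]))
```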
The results obtained from each set of generated populations are compared for
the three sampling schemes in the next section.
4.4 Numerical comparisons
The bias of an estimator is the difference between its expectation and the true value of the parameter being estimated. Its MSE is the expected squared difference between the estimator and that true value. The best estimating scheme is therefore the one that provides the smallest bias (or relative bias) and the smallest MSE (or relative MSE).
4.4.1 Comparisons of results for the two sets of generated populations based on model 1
The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for the two sets of generated populations based on model 1 are presented in Table 4.1. We compare the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each set of populations.
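Table 4.1: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 1; number of simulations = 1000

| N | n | m | (1-δ'²)/h' | Scheme | Rel.bias1% | Rel.bias2% | Rel.MSE1% | Rel.MSE2% |
|---|---|---|---|---|---|---|---|---|
| 500 | 100 | 50 | 0.1713 | DR | 0.8360 | 0.7863 | 5.7012 | 28.5914 |
| 500 | 100 | 50 | 0.1713 | GR | 0.7591 | 0.7314 | 4.8029 | 24.8262 |
| 500 | 100 | 50 | 0.1713 | PG | 0.7591 | 0.8426 | 4.8029 | 31.9417 |
| 500 | 250 | 125 | 0.1713 | DR | 0.5300 | 0.5589 | 2.3827 | 14.7048 |
| 500 | 250 | 125 | 0.1713 | GR | 0.3774 | 0.4614 | 1.2001 | 9.7248 |
| 500 | 250 | 125 | 0.1713 | PG | 0.3774 | 0.5331 | 1.2001 | 13.2954 |
| 720 | 180 | 60 | 0.2799 | DR | 0.5838 | 0.3758 | 2.9114 | 8.1422 |
| 720 | 180 | 60 | 0.2799 | GR | 0.4944 | 0.3661 | 2.1836 | 7.5581 |
| 720 | 180 | 60 | 0.2799 | PG | 0.4944 | 0.4240 | 2.1836 | 9.6897 |
| 720 | 180 | 90 | 0.2799 | DR | 0.5838 | 0.3937 | 2.9114 | 9.0180 |
| 720 | 180 | 90 | 0.2799 | GR | 0.4944 | 0.3514 | 2.1836 | 6.9015 |
| 720 | 180 | 90 | 0.2799 | PG | 0.4944 | 0.4017 | 2.1836 | 8.8674 |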
For the set of populations with $N = 500$, case 1 is $n = 100$, $m = 50$, and case 2 is $n = 250$, $m = 125$. From case 1 to case 2, the sampling fraction $n/N$ increases, while the proportion of matched units $m/n$ stays the same. All the relative biases and relative MSEs decrease as the sampling fraction increases from case 1 to case 2.

In both cases, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, because the two schemes use the same sampling procedure on the first occasion. The Des Raj scheme has larger Rel.bias1 and Rel.MSE1, because it operates under the PPSWR framework, whereas the Ghangurde-Rao scheme operates under the PPSWOR framework. For example, when $n = 100$, $m = 50$, Rel.MSE1 is 4.8029% for the Prasad-Graham and Ghangurde-Rao schemes but 5.7012% for the Des Raj scheme.

Next, we compare the relative bias and relative MSE of the current estimates for the three schemes. In case 1, Rel.bias2 and Rel.MSE2 are smallest for the Ghangurde-Rao scheme and largest for the Prasad-Graham scheme. In case 2, they are again smallest for the Ghangurde-Rao scheme, but largest for the Des Raj scheme.

For the set of populations with $N = 720$, case 1 is $n = 180$, $m = 60$, and case 2 is $n = 180$, $m = 90$. From case 1 to case 2, the sampling fraction $n/N$ stays the same, but the proportion of matched units $m/n$ increases. Since $N$ and $n$ are the same in the two cases, Rel.bias1 and Rel.MSE1 do not change from case 1 to case 2.

The conclusion drawn from Rel.bias1 and Rel.MSE1 is the same as for the populations with $N = 500$. In both cases, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, and the Des Raj scheme has larger values. For instance, when $n = 180$, $m = 60$, Rel.MSE1 is 2.1836% for the Prasad-Graham and Ghangurde-Rao schemes, but 2.9114% for the Des Raj scheme.

Comparing Rel.bias2 and Rel.MSE2 across the three schemes, the ranking in case 1 is the same as for the populations with $N = 500$. The Ghangurde-Rao scheme has the smallest Rel.bias2 and Rel.MSE2, whereas the Prasad-Graham scheme has the largest. In case 2, the Ghangurde-Rao scheme still has the smallest Rel.bias2 and Rel.MSE2, and the Prasad-Graham scheme has the largest Rel.bias2. However, the scheme with the largest Rel.MSE2 is now the Des Raj scheme.
In summary, the Ghangurde-Rao scheme is better than the Des Raj scheme for estimating both the previous and current population medians. This is because the Ghangurde-Rao scheme operates under the PPSWOR framework, whereas the Des Raj scheme operates under the PPSWR framework.

For estimating the previous population median, the Prasad-Graham scheme performs the same as the Ghangurde-Rao scheme, because the two schemes use the same sampling and estimation procedures on the first occasion. For estimating the current population median, however, the Prasad-Graham scheme does not outperform the Ghangurde-Rao scheme. One possible reason is that $h' > 1 - \delta'^2$: the ratio $(1 - \delta'^2)/h'$ is smaller than 1 for both sets of populations, being 0.1713 for $N = 500$ and 0.2799 for $N = 720$.
4.4.2 Comparisons of results for the two sets of generated populations based on model 2
The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for the two sets of generated populations based on model 2 are presented in Table 4.2. We compare the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each set of populations.
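Table 4.2: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for the generated populations based on model 2; number of simulations = 1000

| N | n | m | (1-δ'²)/h' | Scheme | Rel.bias1% | Rel.bias2% | Rel.MSE1% | Rel.MSE2% |
|---|---|---|---|---|---|---|---|---|
| 500 | 100 | 50 | 2.7471 | DR | 0.5343 | 1.7677 | 0.1107 | 1.9233 |
| 500 | 100 | 50 | 2.7471 | GR | 0.4569 | 1.5216 | 0.0802 | 1.3593 |
| 500 | 100 | 50 | 2.7471 | PG | 0.4569 | 1.4764 | 0.0802 | 1.2916 |
| 500 | 250 | 125 | 2.7471 | DR | 0.3524 | 1.3079 | 0.04816 | 0.8766 |
| 500 | 250 | 125 | 2.7471 | GR | 0.2324 | 0.8962 | 0.0227 | 0.4077 |
| 500 | 250 | 125 | 2.7471 | PG | 0.2324 | 0.8915 | 0.0227 | 0.4167 |
| 720 | 180 | 60 | 1.9124 | DR | 0.4811 | 1.2552 | 0.0885 | 0.8974 |
| 720 | 180 | 60 | 1.9124 | GR | 0.4152 | 1.0779 | 0.0683 | 0.6439 |
| 720 | 180 | 60 | 1.9124 | PG | 0.4152 | 0.9943 | 0.0683 | 0.5542 |
| 720 | 180 | 90 | 1.9124 | DR | 0.4811 | 1.2482 | 0.0885 | 0.8846 |
| 720 | 180 | 90 | 1.9124 | GR | 0.4152 | 1.0353 | 0.0683 | 0.6113 |
| 720 | 180 | 90 | 1.9124 | PG | 0.4152 | 0.9795 | 0.0683 | 0.5528 |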
We considered the same cases as in the previous section. The conclusion drawn from the comparison of Rel.bias1 and Rel.MSE1 is the same as for the two sets of generated populations based on model 1. The Prasad-Graham and Ghangurde-Rao schemes have the same Rel.bias1 and Rel.MSE1, and the Des Raj scheme has larger values than these two schemes.

For example, when $N = 500$, $n = 100$ and $m = 50$, Rel.MSE1 is 0.0802% for the Prasad-Graham and Ghangurde-Rao schemes but 0.1107% for the Des Raj scheme. When $N = 720$, $n = 180$ and $m = 60$, the Prasad-Graham and Ghangurde-Rao schemes have a Rel.MSE1 of 0.0683%, while the Des Raj scheme has 0.0885%.

Next, we compare the relative bias and relative MSE of the current estimates for the three schemes. The Des Raj scheme has the largest Rel.bias2 and Rel.MSE2 in all cases. The Prasad-Graham scheme has the smallest Rel.bias2 and Rel.MSE2 in all cases except $N = 500$, $n = 250$, $m = 125$. In that case, the Prasad-Graham scheme still has the smallest Rel.bias2, but its Rel.MSE2 (0.4167%) is slightly greater than that of the Ghangurde-Rao scheme (0.4077%).

In summary, among the three schemes, the Des Raj scheme gives the largest bias and error in estimating both the previous and current medians. The Prasad-Graham and Ghangurde-Rao schemes perform the same in estimating the previous population median. For estimating the current population median, the Prasad-Graham scheme is superior to the Ghangurde-Rao scheme in almost all situations. A possible reason is that $h'$ is now relatively small, with $h' < 1 - \delta'^2$: the ratio $(1 - \delta'^2)/h'$ is greater than 1 for both sets of populations, being 2.7471 for $N = 500$ and 1.9124 for $N = 720$.
Chapter 5
A simulation study based on real data
In the previous chapter, we conducted a simulation study based on four sets of randomly generated populations. In this chapter, another simulation study based on two real data sets is carried out. It compares the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in estimating the median of a finite population.
5.1 Description of data sets
We used two real data sets, A and B.

Data set A is from Murthy (1967). It gives the area under wheat in 1964 and in 1963, and the total area of each farm (cultivated area) in 1961, for 34 villages in India. The cultivated area in 1961 is taken as the size measure $x$. The area under wheat in 1963 is taken as the value $y_1$ observed on the first occasion, and the area under wheat in 1964 as the value $y_2$ observed on the second occasion.

For the sampling schemes compared, $N/n$, $N/u$ and $n/m$ are all assumed to be integers, so with $N = 34$ observations it is difficult to choose values for $n$ and $m$. Therefore, we deleted the two smallest and the two largest observations according to the values of $y_2$, so that the total number of observations becomes $N = 30$.

Data set B is from Sukhatme (1970). It gives the area under wheat in 1937 and in 1936, and the cultivated area in 1930, for 34 villages in India. The cultivated area in 1930 is the size measure $x$. The area under wheat in 1936 is the value $y_1$ observed on the first occasion, and the area under wheat in 1937 is the value $y_2$ observed on the second occasion. To obtain $N = 30$, the two smallest and two largest observations according to the values of $y_2$ were also deleted.
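A short check makes the integer requirement concrete. The sketch below is a hypothetical helper, not part of the thesis code. It lists every design $(n, m)$ with $1 \le m < n \le N$ for which $N/n$, $N/u$ and $n/m$ are all integers, where $u = n - m$.

For $N = 34$, the only such design with $n < N$ is $n = 2$, $m = 1$. By contrast, $N = 30$ admits $n = 15$, $m = 5$, and the designs used in Chapter 4 also pass the check.

```python
def admissible_designs(N):
    """All (n, m) with 1 <= m < n <= N such that N/n, N/u and n/m are integers, where u = n - m."""
    designs = []
    for n in range(2, N + 1):
        for m in range(1, n):
            u = n - m
            if N % n == 0 and N % u == 0 and n % m == 0:
                designs.append((n, m))
    return designs


print([d for d in admissible_designs(34) if d[0] < 34])  # [(2, 1)]
print((15, 5) in admissible_designs(30))                  # True
```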
5.2 Computations on real data
For both data sets A and B, we chose $n = 15$ and $m = 5$. We applied the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes to each data set. For each scheme, we calculated the estimate $\hat\theta_1$ of the first-occasion median and the estimate $\hat\theta_2$ of the current median. This whole process was again repeated $R = 1000$ times.

Then, as in the computations on the generated populations in Section 4.3, we calculated the relative biases and relative mean squared errors of $\hat\theta_1$ and $\hat\theta_2$ for each of the three schemes. The relative biases (Rel.bias) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were obtained as defined in (4.1) and (4.2), respectively. The relative mean squared errors (Rel.MSE) of $\hat\theta_1$ and $\hat\theta_2$ in percentage were obtained as defined in (4.3) and (4.4), respectively.

Finally, we also computed the ratio $(1 - \delta'^2)/h'$ for each of the two data sets, where $\delta'$ is as defined in (4.5) and $h'$ is as defined in (4.6).
5.3 Numerical comparisons
The Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ for data sets A and B are presented in Table 5.1 and Table 5.2, respectively. We compare the results for the Des Raj (1965), Ghangurde-Rao (1969) and Prasad-Graham (1994) schemes in each data set.
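Table 5.1: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set A; number of simulations = 1000

| N | n | m | (1-δ'²)/h' | Scheme | Rel.bias1% | Rel.bias2% | Rel.MSE1% | Rel.MSE2% |
|---|---|---|---|---|---|---|---|---|
| 30 | 15 | 5 | 0.0684 | DR | 31.05 | 28.30 | 2213.383 | 2195.072 |
| 30 | 15 | 5 | 0.0684 | GR | 24.46 | 22.72 | 1388.473 | 1502.715 |
| 30 | 15 | 5 | 0.0684 | PG | 24.46 | 25.78 | 1388.473 | 1837.650 |

Table 5.2: Rel.bias and Rel.MSE of $\hat\theta_1$ and $\hat\theta_2$ under the Des Raj (DR), Ghangurde-Rao (GR) and Prasad-Graham (PG) schemes for real data set B; number of simulations = 1000

| N | n | m | (1-δ'²)/h' | Scheme | Rel.bias1% | Rel.bias2% | Rel.MSE1% | Rel.MSE2% |
|---|---|---|---|---|---|---|---|---|
| 30 | 15 | 5 | 0.0600 | DR | 30.19 | 28.10 | 2132.899 | 2191.323 |
| 30 | 15 | 5 | 0.0600 | GR | 25.69 | 22.34 | 1528.973 | 1452.867 |
| 30 | 15 | 5 | 0.0600 | PG | 25.69 | 27.66 | 1528.973 | 2063.229 |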
For both data sets A and B, Rel.bias1 and Rel.MSE1 are the same for the Prasad-Graham and Ghangurde-Rao schemes, but the Des Raj scheme has larger Rel.bias1 and Rel.MSE1. For example, for data set A, Rel.MSE1 is 1388.473% for the Prasad-Graham and Ghangurde-Rao schemes and 2213.383% for the Des Raj scheme.

When Rel.bias2 and Rel.MSE2 are compared across the three schemes, both measures are smallest for the Ghangurde-Rao scheme and largest for the Des Raj scheme, in both data sets. For instance, data set A has a Rel.MSE2 of 1502.715% for the Ghangurde-Rao scheme, 1837.650% for the Prasad-Graham scheme and 2195.072% for the Des Raj scheme. Data set B has a Rel.MSE2 of 1452.867% for the Ghangurde-Rao scheme, 2063.229% for the Prasad-Graham scheme and 2191.323% for the Des Raj scheme.

In summary, among the three schemes, the Des Raj scheme gives the largest bias and error in estimating both the previous and current medians. The Prasad-Graham and Ghangurde-Rao schemes perform the same in estimating the previous population median. These conclusions are the same as those drawn for the two sets of generated populations discussed in Section 4.4.2. However, for estimating the current population median, the Prasad-Graham scheme is not superior to the Ghangurde-Rao scheme.

There are two possible reasons. First, $h' > 1 - \delta'^2$ may hold, since the ratio $(1 - \delta'^2)/h'$ is smaller than 1 for both data sets. Second, the population size $N$ and the sample sizes $n$ and $m$ may be too small for the results to reflect complete information.

The ratios $(1 - \delta'^2)/h'$ for data sets A and B are 0.0684 and 0.0600, respectively. Comparing the two ratios, data set A has a relatively smaller value of $h'$, so we may expect the Prasad-Graham scheme to perform better for data set A than for data set B. To check whether this expectation is reasonable, we calculated the difference in Rel.MSE2 between the Prasad-Graham and Ghangurde-Rao schemes for both data sets. The difference is $(1837.650 - 1502.715)\% = 334.935\%$ for data set A and $(2063.229 - 1452.867)\% = 610.362\%$ for data set B. Since the difference for data set A is smaller, the Prasad-Graham scheme does indeed perform better for data set A than for data set B.
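The Rel.MSE2 comparison above is simple arithmetic on Tables 5.1 and 5.2. For completeness, the snippet below reproduces it. The dictionary layout is only for illustration.

```python
rel_mse2 = {
    "A": {"GR": 1502.715, "PG": 1837.650},  # Table 5.1
    "B": {"GR": 1452.867, "PG": 2063.229},  # Table 5.2
}
# Difference in Rel.MSE2 (percentage points) between the Prasad-Graham and Ghangurde-Rao schemes.
diff = {data_set: round(v["PG"] - v["GR"], 3) for data_set, v in rel_mse2.items()}
print(diff)  # {'A': 334.935, 'B': 610.362}
```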
Chapter 6
Conclusion and future work
Partial replacement sampling schemes are now quite common in repeated surveys, because they reduce the response burden and improve the efficiency of estimation. After the first of two successive sampling occasions, part of the units observed on that occasion are rotated out of the sample and replaced by a fresh selection from the entire population. These unmatched units are then observed on the second occasion together with the remaining set of matched units.

An important case of successive sampling over two occasions is sampling with probability proportional to size (PPS). This particular case of unequal probability sampling uses auxiliary information to compute the initial selection probabilities, because auxiliary information is usually relatively cheap to obtain and often available in advance for the entire population.

There is extensive work on estimating the population total or mean for PPS sampling over two occasions. However, no effort has been made to estimate population quantiles for unequal probability sampling over two occasions. The methods for quantile estimation in successive sampling over two occasions that are available in the literature to date apply only to simple random sampling, and they require density estimation to obtain estimates of quantiles.

In this thesis, we present a new method, based on estimating equations, for estimating the population median (the second quartile, or 50% quantile) under unequal probability sampling over two occasions. The proposed method avoids the need for density estimation. Three sampling schemes with unequal probabilities are considered for estimating the population median, and the three schemes are compared. The proposed approach can also be used to estimate other quantiles, such as the first quartile (25% quantile) and the third quartile (75% quantile). In future work, variance estimation for the proposed median estimators will be studied.
Bibliography
[1] Chotai, J. (1974). A Note on the Rao-Hartley-Cochran Method for PPS Sampling Over Two Occasions. Sankhyā: The Indian Journal of Statistics, 36, 173-180.
[2] Binder, D. A. and Patak, Z. (1994). Use of Estimating Functions for Estimation from Complex Surveys. Journal of the American Statistical Association, 89(427), 1035-1043.
[3] Ghangurde, P. D. and Rao, J. N. K. (1969). Some Results on Sampling Over Two Occasions. Sankhyā, Series A, 31, 463-472.
[4] Hansen, M. H. and Hurwitz, W. N. (1943). On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14, 333-362.
[5] Jessen, R. J. (1942). Statistical investigation of a sample survey for obtaining farm facts. Iowa Agricultural Experiment Station Research Bulletin, 304, 1-104.
[6] Murthy, M. N. (1967). Sampling Theory and Methods. Calcutta, India: Statistical Publishing Society.
[7] Prasad, N. G. N. and Graham, J. E. (1994). PPS Sampling over Two Occasions. Survey Methodology, 20(1), 59-64.
[8] Raj, D. (1965). On sampling over two occasions with probabilities proportional to size. Annals of Mathematical Statistics, 36, 327-330.
[9] Singh, H. P., Tailor, R., Singh, S. and Kim, J. M. (2007). Quantile Estimation in Successive Sampling. Journal of the Korean Statistical Society, 36(4), 543-556.
[10] Sukhatme, P. V. and Sukhatme, B. V. (1970). Sampling Theory of Surveys With Applications. Ames, Iowa: Iowa State University Press.
[11] Thompson, M. E. (1997). Theory of Sample Surveys. London: Chapman & Hall, pp. 94-95.
Appendix A
Derivation
A.1 Des Raj Scheme
A.1.1 Var($T_{2u}$) in equation (3.7)
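To derive $\mathrm{Var}(T_{2u})$, assume that $\theta_2$ is known in $T_{2u}$, and then replace $\theta_2$ by $\hat\theta_{2u}$ in the derived expression. That is, let

$$T_{2u} = \sum_{i\in s_2}\frac{I(y_{2i}\le\theta_2) - \frac{1}{2}}{u\,p_i} = \sum_{i\in s_2}\frac{w_i}{u\,p_i},$$

where

$$w_i = I(y_{2i}\le\theta_2) - \frac{1}{2}. \tag{A.1}$$

The sample $s_2$ of $u$ units is selected from the entire population of $N$ units with PPSWR (probability proportional to the size measure $x_i$, with replacement). Hence $T_{2u}$ is a Hansen-Hurwitz (1943) estimator. Using the variance of a Hansen-Hurwitz estimator, the variance of $T_{2u}$ is

$$\mathrm{Var}(T_{2u}) = \frac{1}{u}\sum_{i=1}^{N}\left(\frac{w_i}{p_i} - W\right)^2 p_i,$$

where

$$W = \sum_{i=1}^{N} w_i. \tag{A.2}$$

In the expression for $\mathrm{Var}(T_{2u})$ above,

$$\sum_{i=1}^{N}\left(\frac{w_i}{p_i} - W\right)^2 p_i = \sum_{i=1}^{N}\frac{w_i^2}{p_i} - 2W\sum_{i=1}^{N}w_i + W^2\sum_{i=1}^{N}p_i = \sum_{i=1}^{N}\frac{w_i^2}{p_i} - W^2,$$

because $\sum_{i=1}^{N} p_i = 1$. Then the variance of $T_{2u}$ is

$$\mathrm{Var}(T_{2u}) = \frac{1}{u}\left(\sum_{i=1}^{N}\frac{w_i^2}{p_i} - W^2\right), \tag{A.3}$$

where $\theta_2$ in $w_i$ is replaced by $\hat\theta_{2u}$.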
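The algebra leading to (A.3) can be checked numerically. The sketch below takes a toy population of $N = 4$ units and enumerates every ordered sequence of $u$ PPSWR draws. It computes the exact variance of $T_{2u} = \sum_{i\in s_2} w_i/(u\,p_i)$ and compares it with (A.3). The toy values of $w_i$ and $p_i$ and the function names are hypothetical.

```python
from itertools import product


def hh_var_formula(w, p, u):
    """(A.3): (1/u) * (sum w_i^2 / p_i - W^2)."""
    W = sum(w)
    return (sum(wi * wi / pi for wi, pi in zip(w, p)) - W * W) / u


def hh_var_exact(w, p, u):
    """Exact variance of sum over u PPSWR draws of w_i / (u p_i), enumerating all N^u draw sequences."""
    mean = second = 0.0
    for draws in product(range(len(w)), repeat=u):
        prob = 1.0
        for i in draws:
            prob *= p[i]
        t = sum(w[i] / (u * p[i]) for i in draws)
        mean += prob * t
        second += prob * t * t
    return second - mean * mean


w = [0.5, 0.5, -0.5, -0.5]  # toy values of w_i = I(y_2i <= theta_2) - 1/2
p = [0.1, 0.2, 0.3, 0.4]
print(hh_var_formula(w, p, 2), hh_var_exact(w, p, 2))  # equal up to rounding
```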
A.1.2 Var(T2m) in equation (3.8)
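To derive $\mathrm{Var}(T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $T_{2m}$, and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let

$$T_{2m} = \sum_{i\in s}\frac{I(y_{1i}\le\theta_1) - \frac{1}{2}}{n\,p_i} + \frac{1}{m}\sum_{i\in s_1}\frac{I(y_{2i}\le\theta_2) - I(y_{1i}\le\theta_1)}{p_i}.$$

Since the sample $s_1$ of $m$ units is a subset of the sample $s$, the variance of $T_{2m}$ is obtained from

$$\mathrm{Var}(T_{2m}) = \mathrm{Var}[E(T_{2m}\mid s)] + E[\mathrm{Var}(T_{2m}\mid s)]. \tag{A.4}$$

First, we find $E(T_{2m}\mid s)$. The sample $s_1$ of $m$ units is a simple random sample selected without replacement (SRSWOR) from the sample $s$ of $n$ units, so

$$E(T_{2m}\mid s) = \sum_{i\in s}\frac{I(y_{2i}\le\theta_2) - \frac{1}{2}}{n\,p_i} = \sum_{i\in s}\frac{w_i'}{n\,p_i},$$

where

$$w_i' = I(y_{2i}\le\theta_2) - \frac{1}{2}. \tag{A.5}$$

Now, let

$$W' = \sum_{i=1}^{N} w_i'; \tag{A.6}$$

then

$$\mathrm{Var}[E(T_{2m}\mid s)] = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i} - W'^2\right), \tag{A.7}$$

because the sample $s$ of $n$ units is chosen with PPSWR from the entire population of $N$ units. Hence $E(T_{2m}\mid s)$ is a Hansen-Hurwitz (1943) estimator, and its variance is found in the same way as (A.3).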
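Next, we find $\mathrm{Var}(T_{2m}\mid s)$. Let

$$w_i^* = I(y_{2i}\le\theta_2) - I(y_{1i}\le\theta_1), \tag{A.8}$$

and

$$W^* = \sum_{i=1}^{N} w_i^*;$$

then

$$\mathrm{Var}(T_{2m}\mid s) = \mathrm{Var}\left(\frac{1}{m}\sum_{i\in s_1}\frac{w_i^*}{p_i}\mid s\right) = \frac{n-m}{n}\cdot\frac{1}{m}\cdot\frac{1}{n-1}\sum_{i\in s}\left(\frac{w_i^*}{p_i} - \frac{1}{n}\sum_{j\in s}\frac{w_j^*}{p_j}\right)^2, \tag{A.9}$$

since $s_1$ is a simple random sample selected from $s$ without replacement (SRSWOR), and $(\sum_{i\in s_1} w_i^*/p_i)/m$ is the mean of $s_1$. After some simplification, the above equation becomes

$$\mathrm{Var}(T_{2m}\mid s) = \frac{n-m}{(n-1)m}\left[\frac{1}{n}\sum_{i\in s}\left(\frac{w_i^*}{p_i}\right)^2 - \left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*}{p_i}\right)^2\right].$$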
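The simplification following (A.9) is the usual variance of an SRSWOR sample mean, $(1 - m/n)S^2/m$ with $S^2 = \sum_{i\in s}(a_i - \bar a)^2/(n-1)$. It is rewritten here in terms of $a_i = w_i^*/p_i$. A brute-force check over all $\binom{n}{m}$ subsamples confirms it. The values in `a` are arbitrary toy numbers, and the function names are hypothetical.

```python
from itertools import combinations


def srswor_var_exact(a, m):
    """Exact variance of the mean of an SRSWOR subsample of size m from the n values in a."""
    means = [sum(sub) / m for sub in combinations(a, m)]
    mu = sum(means) / len(means)
    return sum((x - mu) ** 2 for x in means) / len(means)


def srswor_var_simplified(a, m):
    """(n - m) / ((n - 1) m) * [(1/n) sum a_i^2 - ((1/n) sum a_i)^2]."""
    n = len(a)
    abar = sum(a) / n
    return (n - m) / ((n - 1) * m) * (sum(x * x for x in a) / n - abar ** 2)


a = [1.0, 4.0, 2.5, -3.0, 7.0]  # toy values of w_i^* / p_i for the n = 5 units of s
print(srswor_var_exact(a, 2), srswor_var_simplified(a, 2))  # equal up to rounding
```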
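Then,

$$E[\mathrm{Var}(T_{2m}\mid s)] = \frac{n-m}{(n-1)m}\left\{E\left[\frac{1}{n}\sum_{i\in s}\left(\frac{w_i^*}{p_i}\right)^2\right] - E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*}{p_i}\right)^2\right]\right\}, \tag{A.10}$$

where

$$E\left[\frac{1}{n}\sum_{i\in s}\left(\frac{w_i^*}{p_i}\right)^2\right] = \sum_{i=1}^{N}\frac{w_i^{*2}}{p_i}$$

and

$$E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*}{p_i}\right)^2\right] = \mathrm{Var}\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*}{p_i}\right) + \left[E\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*}{p_i}\right)\right]^2 = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} - W^{*2}\right) + W^{*2}.$$

Now, plugging the above two equations into (A.10), we get

$$E[\mathrm{Var}(T_{2m}\mid s)] = \frac{n-m}{mn}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} - W^{*2}\right). \tag{A.11}$$

Therefore, combining (A.4), (A.7) and (A.11), $\mathrm{Var}(T_{2m})$ is

$$\mathrm{Var}(T_{2m}) = \frac{1}{n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i} - W'^2\right) + \frac{n-m}{mn}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} - W^{*2}\right), \tag{A.12}$$

where $\theta_1$ and $\theta_2$ in $w_i'$ and $w_i^*$ are replaced by $\hat\theta_1$ and $\hat\theta_{2m}$, respectively.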
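(A.12) can be verified exactly on a toy population by enumerating every PPSWR sample $s$ (as $n$ ordered draws) and every SRSWOR subsample $s_1$ of $m$ draw positions. The sketch below follows the derivation in treating $\theta_1$ and $\theta_2$ as fixed and known. It uses $w_i' = [I(y_{1i}\le\theta_1) - \frac{1}{2}] + w_i^*$, which equals $I(y_{2i}\le\theta_2) - \frac{1}{2}$ as in (A.5). The toy values and function names are hypothetical.

```python
from itertools import combinations, product


def des_raj_terms(y1, y2, theta1, theta2):
    """Return z1_i = I(y_1i <= theta1) - 1/2 and w*_i = I(y_2i <= theta2) - I(y_1i <= theta1), as in (A.8)."""
    i1 = [1.0 if v <= theta1 else 0.0 for v in y1]
    i2 = [1.0 if v <= theta2 else 0.0 for v in y2]
    return [a - 0.5 for a in i1], [b - a for a, b in zip(i1, i2)]


def var_t2m_formula(z1, w_star, p, n, m):
    """(A.12), with w'_i = z1_i + w*_i."""
    w_prime = [a + b for a, b in zip(z1, w_star)]
    Wp, Ws = sum(w_prime), sum(w_star)
    part1 = (sum(w * w / pi for w, pi in zip(w_prime, p)) - Wp ** 2) / n
    part2 = (n - m) / (m * n) * (sum(w * w / pi for w, pi in zip(w_star, p)) - Ws ** 2)
    return part1 + part2


def var_t2m_exact(z1, w_star, p, n, m):
    """Exact variance of T_2m = sum_s z1_i/(n p_i) + (1/m) sum_{s1} w*_i/p_i."""
    subsets = list(combinations(range(n), m))
    mean = second = 0.0
    for draws in product(range(len(p)), repeat=n):
        prob = 1.0
        for i in draws:
            prob *= p[i]
        first = sum(z1[i] / (n * p[i]) for i in draws)
        for sub in subsets:
            t = first + sum(w_star[draws[k]] / p[draws[k]] for k in sub) / m
            q = prob / len(subsets)  # each subsample of positions is equally likely
            mean += q * t
            second += q * t * t
    return second - mean * mean


z1, w_star = des_raj_terms([1, 2, 3, 4], [3, 1, 2, 4], 2.5, 2.5)
p = [0.1, 0.2, 0.3, 0.4]
print(var_t2m_formula(z1, w_star, p, 3, 2), var_t2m_exact(z1, w_star, p, 3, 2))
```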
A.2 Ghangurde-Rao Scheme
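A.2.1 Var($T_{2u}$) in equation (3.14)
To derive $\mathrm{Var}(T_{2u})$, assume that $\theta_2$ is known in $T_{2u}$, and then replace $\theta_2$ by $\hat\theta_{2u}$ in the derived expression. That is, let

$$T_{2u} = \sum_{i\in s_2}\frac{\left[I(y_{2i}\le\theta_2) - \frac{1}{2}\right]P_i}{p_i} = \sum_{i\in s_2}\frac{w_iP_i}{p_i},$$

where $w_i$ is defined in (A.1). Following equation (25) of Ghangurde and Rao (1969), the variance of $T_{2u}$ is given by

$$\mathrm{Var}(T_{2u}) = \frac{N-u}{(N-1)u}\sum_{i=1}^{N}p_i\left(\frac{w_i}{p_i} - W\right)^2 = \frac{N-u}{(N-1)u}\left(\sum_{i=1}^{N}\frac{w_i^2}{p_i} - W^2\right), \tag{A.13}$$

where $\theta_2$ in $w_i$ is replaced by $\hat\theta_{2u}$, and $W$ is as defined in (A.2).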
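Equation (A.13) is the Rao-Hartley-Cochran variance with $u$ random groups of equal size $N/u$. The sketch below checks it exactly on a toy population. It enumerates every random split, taking all permutations of the units as equally likely. For each split, it then enumerates every choice of one unit per group, where unit $j$ is drawn with probability $p_j/P_g$ and $P_g$ is the group total of the $p_j$. As in the thesis, $N/u$ is assumed to be an integer. Names and toy values are hypothetical.

```python
from itertools import permutations, product


def rhc_var_formula(w, p, u):
    """(A.13): (N - u) / ((N - 1) u) * (sum w_i^2 / p_i - W^2)."""
    N, W = len(w), sum(w)
    return (N - u) / ((N - 1) * u) * (sum(wi * wi / pi for wi, pi in zip(w, p)) - W * W)


def rhc_var_exact(w, p, u):
    """Exact variance of sum over groups g of w_j P_g / p_j under random grouping plus PPS selection within groups."""
    N = len(w)
    k = N // u
    perms = list(permutations(range(N)))  # every random split is equally likely
    mean = second = 0.0
    for perm in perms:
        groups = [perm[g * k:(g + 1) * k] for g in range(u)]
        totals = [sum(p[j] for j in grp) for grp in groups]  # P_g
        for choice in product(*groups):
            prob = 1.0 / len(perms)
            t = 0.0
            for P, j in zip(totals, choice):
                prob *= p[j] / P
                t += w[j] * P / p[j]
            mean += prob * t
            second += prob * t * t
    return second - mean * mean


w = [0.5, 0.5, -0.5, -0.5]
p = [0.1, 0.2, 0.3, 0.4]
print(rhc_var_formula(w, p, 2), rhc_var_exact(w, p, 2))  # equal up to rounding
```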
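A.2.2 Var($T_{2m}$) in equation (3.15)
To derive $\mathrm{Var}(T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $T_{2m}$, and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let

$$T_{2m} = \sum_{i\in s}\frac{\left[I(y_{1i}\le\theta_1) - \frac{1}{2}\right]P_i}{p_i} + \frac{n}{m}\sum_{i\in s_1}\frac{\left[I(y_{2i}\le\theta_2) - I(y_{1i}\le\theta_1)\right]P_i}{p_i}.$$

The variance of $T_{2m}$ is again obtained from equation (A.4). First, we find $E(T_{2m}\mid s)$:

$$E(T_{2m}\mid s) = \sum_{i\in s}\frac{\left[I(y_{1i}\le\theta_1) - \frac{1}{2}\right]P_i}{p_i} + \sum_{i\in s}\frac{\left[\left(I(y_{2i}\le\theta_2) - \frac{1}{2}\right) - \left(I(y_{1i}\le\theta_1) - \frac{1}{2}\right)\right]P_i}{p_i} = \sum_{i\in s}\frac{\left[I(y_{2i}\le\theta_2) - \frac{1}{2}\right]P_i}{p_i} = \sum_{i\in s}\frac{w_i'P_i}{p_i},$$

where $w_i'$ is defined in (A.5). Then

$$\mathrm{Var}[E(T_{2m}\mid s)] = \frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i} - W'^2\right), \tag{A.14}$$

with $W'$ as defined in (A.6).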
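To obtain $E[\mathrm{Var}(T_{2m}\mid s)]$, we first find $\mathrm{Var}(T_{2m}\mid s)$:

$$\mathrm{Var}(T_{2m}\mid s) = \mathrm{Var}\left(\frac{n}{m}\sum_{i\in s_1}\frac{w_i^*P_i}{p_i}\mid s\right) = n^2\,\mathrm{Var}\left(\frac{1}{m}\sum_{i\in s_1}\frac{w_i^*P_i}{p_i}\mid s\right) = n^2\cdot\frac{n-m}{n}\cdot\frac{1}{m}\cdot\frac{1}{n-1}\sum_{i\in s}\left(\frac{w_i^*P_i}{p_i} - \frac{1}{n}\sum_{j\in s}\frac{w_j^*P_j}{p_j}\right)^2,$$

because $(\sum_{i\in s_1} w_i^*P_i/p_i)/m$ is the mean of the simple random sample $s_1$ selected from $s$ without replacement. Then $\mathrm{Var}(T_{2m}\mid s)$ simplifies to

$$\mathrm{Var}(T_{2m}\mid s) = \frac{n^2(n-m)}{(n-1)m}\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}P_i^2}{p_i^2} - \left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*P_i}{p_i}\right)^2\right].$$

Therefore, we obtain

$$E[\mathrm{Var}(T_{2m}\mid s)] = \frac{n^2(n-m)}{(n-1)m}\left\{E\left[\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}P_i^2}{p_i^2}\right] - E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*P_i}{p_i}\right)^2\right]\right\}, \tag{A.15}$$

where the two expectations need to be computed separately.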
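For the second expectation,

$$E\left[\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*P_i}{p_i}\right)^2\right] = \mathrm{Var}\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*P_i}{p_i}\right) + \left[E\left(\frac{1}{n}\sum_{i\in s}\frac{w_i^*P_i}{p_i}\right)\right]^2 = \frac{1}{n^2}\cdot\frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} - W^{*2}\right) + \frac{W^{*2}}{n^2} = \frac{N-n}{(N-1)n^3}\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} + \frac{N(n-1)}{(N-1)n^3}W^{*2}.$$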
The first expectation in (A.15) is calculated as follows. The probability of selecting a unit into sample $s$ depends on the random group to which it belongs, so consider

$$\delta_{ij} = \begin{cases}1, & \text{if } u_j \in s \text{ given that } u_j \in \text{group } G_i,\\ 0, & \text{otherwise,}\end{cases}$$

where $i$ $(i = 1, 2, \ldots, n)$ indexes the random groups and $j$ $(j = 1, 2, \ldots, N)$ indexes the population units. Let $k = N/n$ denote the number of units in each random group, which is the reciprocal of the sampling fraction $n/N$. Recall that $P_i$, defined in Section 2.3 for the Ghangurde-Rao (1969) scheme, is

$$P_i = \sum_{j\in G_i} p_j; \tag{A.16}$$

then

$$\frac{1}{n}\sum_{i\in s}\frac{w_i^{*2}P_i^2}{p_i^2} = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{N}\delta_{ij}\frac{w_j^{*2}P_i^2}{p_j^2}. \tag{A.17}$$
Therefore, by using (A. 16) and (A. 17),
wfP2
n*-? Pi
= E E n Z ^ WfP?
n~r p; Gi, • • • , Gre
£fesS^t N »„*2
*;EE^« i 6 s j =7 ft
= ^ ( S E E ? E R
k2 r-^ = -YE n ^
z€s
n
J 6 G ,
„*2
*2
ft ^Eft
^ E ? iE ^ f t
j e G ,
ft j e G .
n lA/
(N-l)kN^\p3 N^p3
+
= k2
If ^ ^ f t / 1
AT-fc 1 (jV-l)A;Ar
JV
N Eft
JV
ft iV Eft .7 = 1
N r> N in*2 1 W w*2\ 1 w
Z ^ WJ JV Z ^ „. ^ AT2 Z ^ „. J T AJ-2 Z ^ w
*2
7 = 1 A ^ ft ' ^ 2 ^ ft ^2tr ft JV „ JV * 2 JV » 2 \ JV yw*2_lyT^ + JLyTi-)+±y z^w3 i v ^ n ^ /v2z^ „. r n J ^
" < jV(n - 1)
N „„*2
U = l
AT(n - 1)
(AT - l )n 2 uc w"2 + 4 E
7 = 1 * J
! 2 ( n - l ) | n
51 AT N(N - 1)
Then, we convert j to i in the above equation without loss of generality to get
E ly<3\ = N(n-Vw« \ n i t ft' / (N~l)n2
Now, plugging the above two expectations into (A.15), we obtain
1 v ^ wf L 2(n - 1) n - 1 J \ _L_ i i L J n2ft ft L i V _ 1 N(N~ 1).
$$E[\mathrm{Var}(T_{2m}\mid s)] = \frac{n-m}{mn(N-1)}\left[\left(N - 2n + \frac{N}{n}\right)\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} + (n-1)NW^{*2}\right]. \tag{A.18}$$
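Therefore, the variance of $T_{2m}$ is computed by using (A.4), (A.14) and (A.18):

$$\mathrm{Var}(T_{2m}) = \frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i} - W'^2\right) + \frac{n-m}{mn(N-1)}\left[\left(N - 2n + \frac{N}{n}\right)\sum_{i=1}^{N}\frac{w_i^{*2}}{p_i} + (n-1)NW^{*2}\right], \tag{A.19}$$

where $\theta_1$ and $\theta_2$ in $w_i'$ and $w_i^*$ are replaced by $\hat\theta_1$ and $\hat\theta_{2m}$, respectively.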
A.3 Prasad-Graham Scheme
A.3.1 Var($T_{2m}$) in equation (3.19)
To derive $\mathrm{Var}(T_{2m})$, assume that $\theta_1$ and $\theta_2$ are known in $T_{2m}$, and then replace $\theta_1$ by $\hat\theta_1$ and $\theta_2$ by $\hat\theta_{2m}$ in the derived expression. That is, let

$$T_{2m} = \sum_{i\in s_1}\frac{\left[\left(I(y_{2i}\le\theta_2) - \frac{1}{2}\right)P_i/p_i\right]P_i^*}{p_i^*} = \sum_{i\in s_1}\frac{w_i^{**}P_i^*}{p_i^*},$$

where

$$w_i^{**} = \frac{\left[I(y_{2i}\le\theta_2) - \frac{1}{2}\right]P_i}{p_i}. \tag{A.20}$$

The variance of $T_{2m}$ is also obtained from equation (A.4), and we first find $E(T_{2m}\mid s)$:

$$E(T_{2m}\mid s) = \sum_{i\in s} w_i^{**} = \sum_{i\in s}\frac{\left[I(y_{2i}\le\theta_2) - \frac{1}{2}\right]P_i}{p_i} = \sum_{i\in s}\frac{w_i'P_i}{p_i}.$$

This is the same as for the Ghangurde-Rao (1969) scheme, with $w_i'$ as defined in (A.5). Hence, $\mathrm{Var}[E(T_{2m}\mid s)]$ is the same as (A.14).
Next, we need to find Var(T2m|s):
(n-l)mj£\p*i J
with
Then, after some simplication, we found
Biv-pi-wi = ^ 4 E (** - w')2y" mn(N - 1) f r ' \ t o /
and this can be further simplified to
$$E[\mathrm{Var}(T_{2m}\mid s)] = \frac{N(n-m)}{mn(N-1)}\left(\sum_{i=1}^{N}\frac{w_i'^2\,Y_1}{y_{1i}}-W'^2\right), \qquad (A.22)$$
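where
$$Y_1=\sum_{i=1}^{N}y_{1i}$$
is the population total on the first occasion. By combining (A.4), (A.14) and (A.22), $\mathrm{Var}(T_{2m})$ is obtained as
$$\mathrm{Var}(T_{2m}) = \frac{N-n}{(N-1)n}\left(\sum_{i=1}^{N}\frac{w_i'^2}{p_i}-W'^2\right)+\frac{N(n-m)}{mn(N-1)}\left(\sum_{i=1}^{N}\frac{w_i'^2\,Y_1}{y_{1i}}-W'^2\right),$$
where, in $w_i'$, $\theta_2$ is replaced by $\hat\theta_{2m}$.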
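To see the two pieces of the Prasad-Graham variance side by side, the R sketch below evaluates (A.14) plus (A.22) from population vectors. It also checks numerically that the bracket in (A.22) equals $\sum_{i=1}^{N}(y_{1i}/Y_1)(Y_1w_i'/y_{1i}-W')^2$, an identity that follows by expanding the square and shows that the bracket is non-negative. The function name var_T2m_pg and all inputs are hypothetical; the script in Appendix B computes the same quantities inline as var.E_T2m and E.var_T2m.

```r
# Hypothetical helper (not in the thesis code): Var(T2m) for the
# Prasad-Graham scheme as the sum of (A.14) and (A.22).
#   w_prime: w'_i = I(y_2i <= theta_2) - 1/2 for all N population units
#   p:       selection probabilities p_i
#   y1:      first-occasion values y_1i (assumed positive)
var_T2m_pg <- function(w_prime, p, y1, n, m) {
  N <- length(p)
  W_prime <- sum(w_prime)
  Y1 <- sum(y1)
  var_E <- (N - n) / ((N - 1) * n) * (sum(w_prime^2 / p) - W_prime^2)
  E_var <- N * (n - m) / (m * n * (N - 1)) *
    (sum(w_prime^2 * Y1 / y1) - W_prime^2)
  var_E + E_var
}

# Toy check with N = 4, equal p_i and equal y_1i
var_T2m_pg(c(0.5, 0.5, -0.5, -0.5), rep(0.25, 4), rep(1, 4), n = 2, m = 1)  # 4

# The bracket of (A.22) as a weighted sum of squares, on arbitrary values
set.seed(1)
y1 <- runif(8, 1, 10)
w_prime <- sample(c(-0.5, 0.5), 8, replace = TRUE)
Y1 <- sum(y1)
W_prime <- sum(w_prime)
lhs <- sum(w_prime^2 * Y1 / y1) - W_prime^2
rhs <- sum((y1 / Y1) * (Y1 * w_prime / y1 - W_prime)^2)
all.equal(lhs, rhs)  # TRUE
```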
Appendix B
R code
####N/n, N/u and n/m are assumed to be integers
N=500
n=100
m=50
num.sample=1000
u=n-m
k=N/n
r=N/u
t=n/m
set.seed(10)
X.popu=rnorm(N, 25, 5)
Y.popu=500+0.5*X.popu+rnorm(N,0,8*sqrt(X.popu))
Y.popu2=600+5.1*Y.popu+rnorm(N,0,8*sqrt(X.popu))
##X.popu=rexp(N,10)
##Y.popu=25+1.5*X.popu+rnorm(N,0,5*sqrt(0.5*X.popu))
##Y.popu2=1.3*Y.popu+rnorm(N,0,2)
##data=read.table("DataA.txt",header=T)
##X.popu=data[,1]
##Y.popu=data[,2]
##Y.popu2=data[,3]
sizes=X.popu
p.popu=X.popu/sum(X.popu)
z1i=as.numeric(Y.popu<=median(Y.popu))-1/2
z2i=as.numeric(Y.popu2<=median(Y.popu2))-1/2
delta.prime=cor(z1i/p.popu,z2i/p.popu)
#V2.prime and V3.prime are the brackets of (A.14) and (A.22)
#evaluated at the population median on the 2nd occasion
V2.prime=sum(z2i^2/p.popu)-sum(z2i)^2
V3.prime=sum(z2i^2*sum(Y.popu)/Y.popu)-sum(z2i)^2
h.prime=V3.prime/V2.prime
(1-delta.prime^2)/h.prime
########Des Raj Scheme########
sample_indices.S=matrix(0,num.sample, n)
Y.S=matrix(0,num.sample,n)
p.S=matrix(0,num.sample,n)
sample_indices.S2=matrix(0,num.sample,u)
Y.S2=matrix(0,num.sample, u)
p.S2=matrix(0,num.sample,u)
sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
Y.S12=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
theta1_hat=vector(mode="numeric",length=num.sample)
theta2u_hat=vector(mode="numeric",length=num.sample)
theta2m_hat=vector(mode="numeric",length=num.sample)
theta2_hat=vector(mode="numeric",length=num.sample)
T2u=vector(mode="numeric",length=num.sample)
T2m=vector(mode="numeric",length=num.sample)
W=vector(mode="numeric",length=num.sample)
W.prime=vector(mode="numeric",length=num.sample)
W.star=vector(mode="numeric",length=num.sample)
var_T2u=vector(mode="numeric",length=num.sample)
var_T2m=vector(mode="numeric",length=num.sample)
var.E_T2m=vector(mode="numeric",length=num.sample)
E.var_T2m=vector(mode="numeric",length=num.sample)
Q=vector(mode="numeric",length=num.sample)
T2u.hat=vector(mode="numeric",length=num.sample)
T2m.hat=vector(mode="numeric",length=num.sample)
library(pps)
for (i in 1:num.sample)
{
  ####1st occasion: sample S of n units is selected
  ####from entire population (PPS with replacement)
  sample_indices.S[i,]=ppswr(sizes,n)
  Y.S[i,]=Y.popu[sample_indices.S[i,]]
  p.S[i,]=p.popu[sample_indices.S[i,]]
  #Estimating function for the median on the 1st occasion;
  #uniroot() locates its sign change
  f=function(theta1)
  {
    sum((as.numeric(Y.S[i,]<=theta1)-1/2)/(n*p.S[i,]))
  }
  theta1_hat[i]=uniroot(f,c(-5000,5000))$root
  ####2nd occasion: sample S2 of u units is selected
  ####from entire population
  sample_indices.S2[i,]=ppswr(sizes,u)
  Y.S2[i,]=Y.popu2[sample_indices.S2[i,]]
  p.S2[i,]=p.popu[sample_indices.S2[i,]]
  f=function(theta2u)
  {
    sum((as.numeric(Y.S2[i,]<=theta2u)-1/2)/(u*p.S2[i,]))
  }
  theta2u_hat[i]=uniroot(f,c(-5000,5000))$root
  T2u[i]=sum((as.numeric(Y.S2[i,]<=theta2u_hat[i])-
    1/2)/(u*p.S2[i,]))
  W[i]=sum(as.numeric(Y.popu2<=theta2u_hat[i])-1/2)
  var_T2u[i]=1/u*(sum((as.numeric(Y.popu2<=
    theta2u_hat[i])-1/2)^2/p.popu)-W[i]^2)
  ####2nd occasion: sample S1 of m units is selected
  ####from sample S
  sample_indices.S1[i,]=
    sample(sample_indices.S[i,],m,replace=FALSE)
  Y.S1[i,]=Y.popu2[sample_indices.S1[i,]]
  Y.S12[i,]=Y.popu[sample_indices.S1[i,]]
  p.S1[i,]=p.popu[sample_indices.S1[i,]]
  f=function(theta2m)
  {
    sum((as.numeric(Y.S1[i,]<=theta2m)-
      as.numeric(Y.S12[i,]<=theta1_hat[i]))/(m*p.S1[i,]))
  }
  theta2m_hat[i]=uniroot(f,c(-5000,5000))$root
  T2m[i]=sum((as.numeric(Y.S[i,]<=theta1_hat[i])-
    1/2)/(n*p.S[i,]))+sum((as.numeric(Y.S1[i,]<=
    theta2m_hat[i])-as.numeric(Y.S12[i,]<=
    theta1_hat[i]))/(m*p.S1[i,]))
  W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
  W.star[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-
    as.numeric(Y.popu<=theta1_hat[i]))
  var.E_T2m[i]=1/n*(sum((as.numeric(Y.popu2<=
    theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
  E.var_T2m[i]=(n-m)/(m*n)*(sum((as.numeric(Y.popu2<=
    theta2m_hat[i])-as.numeric(Y.popu<=
    theta1_hat[i]))^2/p.popu)-W.star[i]^2)
  var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
  ####Optimal weight Q can be obtained in terms of
  ####Var(T2u) and Var(T2m):
  Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])
  #Combined estimating function on the 2nd occasion
  f=function(theta2)
  {
    T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=
      theta2)-1/2)/(u*p.S2[i,]))
    T2m.hat[i]=sum((as.numeric(Y.S1[i,]<=theta2)-
      as.numeric(Y.S12[i,]<=theta1_hat[i]))/(m*p.S1[i,]))
    Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
  }
  theta2_hat[i]=uniroot(f,c(-5000,5000))$root
}
theta1_hat
theta2_hat
#Summaries over the num.sample replicates
Rel.bias1=mean(abs((theta1_hat-
  median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
  median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
  median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
  median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2
########Ghangurde-Rao Scheme########
permutation=matrix(0,num.sample,N)
permutation2=matrix(0,num.sample,N)
p.permu=matrix(0,num.sample,N)
p.permu2=matrix(0,num.sample, N)
P.permu=matrix(0,num.sample,N)
P_star.permu2=matrix(0,num.sample,N)
prob.permu=matrix(0,num.sample,N)
prob.permu2=matrix(0,num.sample, N)
prob.cumul=matrix(0,num.sample,N)
prob.cumul2=matrix(0,num.sample,N)
rand=matrix(0,num.sample,n)
rand2=matrix(0,num.sample,u)
sample_indices.S=matrix(0,num.sample,n)
Y.S=matrix(0,num.sample,n)
p.S=matrix(0,num.sample,n)
P.S=matrix(0,num.sample,n)
sample_indices.S2=matrix(0,num.sample, u)
Y.S2=matrix(0,num.sample,u)
p.S2=matrix(0,num.sample,u)
P_star.S2=matrix(0,num.sample,u)
sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
Y.S12=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
P.S1=matrix(0,num.sample,m)
theta1_hat=vector(mode="numeric",length=num.sample)
theta2u_hat=vector(mode="numeric",length=num.sample)
theta2m_hat=vector(mode="numeric",length=num.sample)
theta2_hat=vector(mode="numeric",length=num.sample)
T2u=vector(mode="numeric",length=num.sample)
T2m=vector(mode="numeric",length=num.sample)
W=vector(mode="numeric",length=num.sample)
W.prime=vector(mode="numeric",length=num.sample)
W.star=vector(mode="numeric",length=num.sample)
var_T2u=vector(mode="numeric",length=num.sample)
var_T2m=vector(mode="numeric",length=num.sample)
var.E_T2m=vector(mode="numeric",length=num.sample)
E.var_T2m=vector(mode="numeric",length=num.sample)
Q=vector(mode="numeric",length=num.sample)
T2u.hat=vector(mode="numeric",length=num.sample)
T2m.hat=vector(mode="numeric",length=num.sample)
for (i in 1:num.sample)
{
  ####1st occasion: sample S of n units is selected
  ####from entire population
  #N units are divided randomly into
  #n groups of k units
  permutation[i,]=sample(N)
  p.permu[i,]=p.popu[permutation[i,]]
  for(j in 1:N)
  {
    #First and last units of each of the n permuted
    #groups
    first=ceiling(j/k)*k-k+1
    last=ceiling(j/k)*k
    P.permu[i,j]=sum(p.permu[i,][first:last])
    prob.permu[i,j]=p.permu[i,j]/P.permu[i,j]
    prob.cumul[i,j]=sum(prob.permu[i,][first:j])
  }
  #Generate n random probabilities between
  #0 and 1 for each sample
  rand[i,]=runif(n,0,1)
  for(j in 1:n)
  {
    first=j*k-k+1
    last=j*k
    #Select one unit from group j with probability p/P
    sample_indices.S[i,j]=permutation[i,][first+
      sum(as.numeric(rand[i,j]>=prob.cumul[i,][first:last]))]
    Y.S[i,j]=Y.popu[sample_indices.S[i,j]]
    p.S[i,j]=p.popu[sample_indices.S[i,j]]
    P.S[i,j]=P.permu[i,][first]
  }
  f=function(theta1)
  {
    sum((as.numeric(Y.S[i,]<=theta1)-1/2)*P.S[i,]/p.S[i,])
  }
  theta1_hat[i]=uniroot(f,c(-5000,5000))$root
  ####2nd occasion: sample S2 of u units is selected
  ####from entire population
  #N units are divided randomly into u groups
  permutation2[i,]=sample(N)
  p.permu2[i,]=p.popu[permutation2[i,]]
  for(j in 1:N)
  {
    first=ceiling(j/r)*r-r+1
    last=ceiling(j/r)*r
    P_star.permu2[i,j]=sum(p.permu2[i,][first:last])
    prob.permu2[i,j]=p.permu2[i,j]/P_star.permu2[i,j]
    prob.cumul2[i,j]=sum(prob.permu2[i,][first:j])
  }
  #Generate u random probabilities between 0 and 1
  #for each sample
  rand2[i,]=runif(u,0,1)
  for(j in 1:u)
  {
    first=j*r-r+1
    last=j*r
    sample_indices.S2[i,j]=permutation2[i,][first+
      sum(as.numeric(rand2[i,j]>=
      prob.cumul2[i,][first:last]))]
    Y.S2[i,j]=Y.popu2[sample_indices.S2[i,j]]
    p.S2[i,j]=p.popu[sample_indices.S2[i,j]]
    P_star.S2[i,j]=P_star.permu2[i,][first]
  }
  f=function(theta2u)
  {
    sum((as.numeric(Y.S2[i,]<=theta2u)-
      1/2)*P_star.S2[i,]/p.S2[i,])
  }
  theta2u_hat[i]=uniroot(f,c(-5000,5000))$root
  T2u[i]=sum((as.numeric(Y.S2[i,]<=theta2u_hat[i])-1/2)*
    P_star.S2[i,]/p.S2[i,])
  W[i]=sum(as.numeric(Y.popu2<=theta2u_hat[i])-1/2)
  var_T2u[i]=(N-u)/((N-1)*u)*
    (sum((as.numeric(Y.popu2<=theta2u_hat[i])-
    1/2)^2/p.popu)-W[i]^2)
  ####2nd occasion: sample S1 of m units is selected
  ####from sample S
  sample_indices.S1[i,]=
    sample(sample_indices.S[i,],m,replace=FALSE)
  Y.S1[i,]=Y.popu2[sample_indices.S1[i,]]
  Y.S12[i,]=Y.popu[sample_indices.S1[i,]]
  p.S1[i,]=p.popu[sample_indices.S1[i,]]
  #Group totals P of the matched units, from the 1st occasion
  for (j in 1:m)
  {
    P.S1[i,j]=P.S[i,sample_indices.S[i,]==
      sample_indices.S1[i,j]]
  }
  f=function(theta2m)
  {
    n/m*sum((as.numeric(Y.S1[i,]<=theta2m)-
      as.numeric(Y.S12[i,]<=
      theta1_hat[i]))*P.S1[i,]/p.S1[i,])
  }
  theta2m_hat[i]=uniroot(f,c(-5000,5000))$root
  T2m[i]=sum((as.numeric(Y.S[i,]<=theta1_hat[i])-1/2)*
    P.S[i,]/p.S[i,])+n/m*sum((as.numeric(Y.S1[i,]<=
    theta2m_hat[i])-as.numeric(Y.S12[i,]<=
    theta1_hat[i]))*P.S1[i,]/p.S1[i,])
  W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
  W.star[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-
    as.numeric(Y.popu<=theta1_hat[i]))
  #Var[E(T2m|s)], cf. (A.14)
  var.E_T2m[i]=(N-n)/((N-1)*n)*(sum((as.numeric(Y.popu2<=
    theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
  #E[Var(T2m|s)], cf. (A.18)
  E.var_T2m[i]=(n-m)/(m*n*(N-1))*
    ((N-2*n+n/N)*sum((as.numeric(Y.popu2<=theta2m_hat[i])-
    as.numeric(Y.popu<=theta1_hat[i]))^2/p.popu)+
    (n-1)*N*W.star[i]^2)
  var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
  ####Optimal weight Q can be obtained in terms of
  ####Var(T2u) and Var(T2m):
  Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])
  #Combined estimating function on the 2nd occasion
  f=function(theta2)
  {
    T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=theta2)-1/2)*
      P_star.S2[i,]/p.S2[i,])
    T2m.hat[i]=n/m*sum((as.numeric(Y.S1[i,]<=theta2)-
      as.numeric(Y.S12[i,]<=theta1_hat[i]))*P.S1[i,]/p.S1[i,])
    Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
  }
  theta2_hat[i]=uniroot(f,c(-5000,5000))$root
}
theta1_hat
theta2_hat
Rel.bias1=mean(abs((theta1_hat-
  median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
  median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
  median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
  median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2
########Prasad-Graham Scheme########
permutation3=matrix(0,num.sample, n)
p_star.permu3=matrix(0,num.sample,n)
P_tao.permu3=matrix(0,num.sample, n)
prob.permu3=matrix(0,num.sample,n)
prob.cumul3=matrix(0,num.sample,n)
rand3=matrix(0,num.sample,m)
sample_indices.permu3=matrix(0,num.sample,m)
sample_indices.S1=matrix(0,num.sample,m)
Y.S1=matrix(0,num.sample,m)
p.S1=matrix(0,num.sample,m)
P.S1=matrix(0,num.sample,m)
p_star.S1=matrix(0,num.sample,m)
P_tao.S1=matrix(0,num.sample,m)
theta2m_hat=vector(mode="numeric",length=num.sample)
theta2_hat=vector(mode="numeric",length=num.sample)
T2m=vector(mode="numeric",length=num.sample)
W.prime=vector(mode="numeric",length=num.sample)
var_T2m=vector(mode="numeric",length=num.sample)
var.E_T2m=vector(mode="numeric",length=num.sample)
E.var_T2m=vector(mode="numeric",length=num.sample)
Q=vector(mode="numeric",length=num.sample)
T2u.hat=vector(mode="numeric",length=num.sample)
T2m.hat=vector(mode="numeric",length=num.sample)
for (i in 1:num.sample)
{
  ####2nd occasion: sample S1 of m units is selected
  ####from sample S (the Ghangurde-Rao sample above)
  #n units are divided randomly into m groups
  permutation3[i,]=sample(n)
  #Size measure y1*P/p for the units of S, in permuted order
  p_star.permu3[i,]=Y.S[i,][permutation3[i,]]*
    P.S[i,][permutation3[i,]]/p.S[i,][permutation3[i,]]
  for(j in 1:n)
  {
    first=ceiling(j/t)*t-t+1
    last=ceiling(j/t)*t
    P_tao.permu3[i,j]=sum(p_star.permu3[i,][first:last])
    prob.permu3[i,j]=p_star.permu3[i,j]/P_tao.permu3[i,j]
    prob.cumul3[i,j]=sum(prob.permu3[i,][first:j])
  }
  #Generate m random probabilities between
  #0 and 1 for each sample
  rand3[i,]=runif(m,0,1)
  for(j in 1:m)
  {
    first=j*t-t+1
    last=j*t
    sample_indices.permu3[i,j]=permutation3[i,][first+
      sum(as.numeric(rand3[i,j]>=
      prob.cumul3[i,][first:last]))]
    sample_indices.S1[i,j]=
      sample_indices.S[i,][sample_indices.permu3[i,j]]
    Y.S1[i,j]=Y.popu2[sample_indices.S1[i,j]]
    p.S1[i,j]=p.popu[sample_indices.S1[i,j]]
    P.S1[i,j]=P.S[i,sample_indices.S[i,]==
      sample_indices.S1[i,j]]
    p_star.S1[i,j]=p_star.permu3[i,][first+
      sum(as.numeric(rand3[i,j]>=
      prob.cumul3[i,][first:last]))]
    P_tao.S1[i,j]=P_tao.permu3[i,][first]
  }
  f=function(theta2m)
  {
    sum((as.numeric(Y.S1[i,]<=theta2m)-
      1/2)*P.S1[i,]/p.S1[i,]*
      P_tao.S1[i,]/p_star.S1[i,])
  }
  theta2m_hat[i]=uniroot(f,c(-5000,5000))$root
  T2m[i]=sum((as.numeric(Y.S1[i,]<=theta2m_hat[i])-
    1/2)*P.S1[i,]/p.S1[i,]*P_tao.S1[i,]/p_star.S1[i,])
  W.prime[i]=sum(as.numeric(Y.popu2<=theta2m_hat[i])-1/2)
  a=(N-n)/((N-1)*n)
  b=(N*(n-m))/(m*n*(N-1))
  var.E_T2m[i]=a*(sum((as.numeric(Y.popu2<=
    theta2m_hat[i])-1/2)^2/p.popu)-W.prime[i]^2)
  E.var_T2m[i]=b*(sum((as.numeric(Y.popu2<=
    theta2m_hat[i])-1/2)^2*
    sum(Y.popu)/Y.popu)-W.prime[i]^2)
  var_T2m[i]=var.E_T2m[i]+E.var_T2m[i]
  ####Optimal weight Q can be obtained in terms of
  ####Var(T2u) and Var(T2m); var_T2u is reused from
  ####the Ghangurde-Rao run above
  Q[i]=var_T2m[i]/(var_T2u[i]+var_T2m[i])
  f=function(theta2)
  {
    T2u.hat[i]=sum((as.numeric(Y.S2[i,]<=theta2)-1/2)*
      P_star.S2[i,]/p.S2[i,])
    T2m.hat[i]=sum((as.numeric(Y.S1[i,]<=theta2)-1/2)*
      P.S1[i,]/p.S1[i,]*P_tao.S1[i,]/p_star.S1[i,])
    Q[i]*T2u.hat[i]+(1-Q[i])*T2m.hat[i]
  }
  theta2_hat[i]=uniroot(f,c(-5000,5000))$root
}
theta1_hat
theta2_hat
Rel.bias1=mean(abs((theta1_hat-
  median(Y.popu))/median(Y.popu)))*100
Rel.bias2=mean(abs((theta2_hat-
  median(Y.popu2))/median(Y.popu2)))*100
Rel.MSE1=mean((theta1_hat-
  median(Y.popu))^2)/median(Y.popu)*100
Rel.MSE2=mean((theta2_hat-
  median(Y.popu2))^2)/median(Y.popu2)*100
Rel.bias1
Rel.bias2
Rel.MSE1
Rel.MSE2
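The Rao-Hartley-Cochran selection loop appears three times in the script above, and each median estimate on the first occasion and on the unmatched sample is found by calling uniroot() on a step function over c(-5000, 5000). The R sketch below factors both into small helpers. rhc_draw() repeats the random-group selection for a probability vector p. root_weighted_median() returns the exact jump point of $\theta \mapsto \sum (I(y\le\theta)-1/2)\,wt$ for positive weights, which is where uniroot() converges to within its tolerance, provided no partial sum of weights equals exactly half the total. Both function names and interfaces are hypothetical, not part of the thesis code, and root_weighted_median() does not apply to the matched-sample functions, whose terms can be negative.

```r
# Hypothetical refactoring sketch (not part of the thesis code).

# One Rao-Hartley-Cochran draw with n_groups equal groups: the population is
# split at random (as with sample(N)), then one unit is drawn from each group
# with probability p_j / P_g, where P_g is the group total of p.
rhc_draw <- function(p, n_groups) {
  N <- length(p)
  k <- N / n_groups                       # group size, assumed integer
  perm <- sample(N)
  units <- integer(n_groups)
  group_total <- numeric(n_groups)
  for (g in seq_len(n_groups)) {
    members <- perm[((g - 1) * k + 1):(g * k)]
    group_total[g] <- sum(p[members])
    units[g] <- members[sample.int(k, 1, prob = p[members] / group_total[g])]
  }
  list(units = units, group_total = group_total)
}

# Exact root of theta -> sum((I(y <= theta) - 1/2) * wt) for positive weights:
# the smallest y at which the cumulative weight reaches half the total.
root_weighted_median <- function(y, wt) {
  o <- order(y)
  cum <- cumsum(wt[o])
  y[o][which(cum >= sum(wt) / 2)[1]]
}

# Possible use with the script's objects (Ghangurde-Rao, 1st occasion):
#   d <- rhc_draw(p.popu, n)
#   root_weighted_median(Y.popu[d$units], d$group_total / p.popu[d$units])
```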