Centre for Efficiency and Productivity Analysis
Working Paper Series No. WP05/2015
Bootstrap-based testing for network DEA: Some Theory and Applications
Kelly D.T. Trinh, Valentin Zelenyuk
Date: May 2015
School of Economics University of Queensland
St. Lucia, Qld. 4072 Australia
ISSN No. 1932-4398
Bootstrap-based testing for network DEA: Some Theory and Applications
Kelly D.T. Trinh*, Valentin Zelenyuk
School of Economics, The University of Queensland, Brisbane, QLD, 4072, Australia
Abstract
Traditional data envelopment analysis (DEA) views a production technology process as a ‘black
box’, while network DEA allows a researcher to look into the ‘black box’, to evaluate the over-
all performance and the performance of each sub-process of the system. The technical efficiency
scores calculated from these approaches can be slightly, or sometimes vastly, different. Our aim
is to develop two bootstrap-based algorithms to test whether any observed difference between the
results from the two approaches is statistically significant, or whether it is due to sampling and
estimation noise. We focus on testing the equality of the first moment (i.e., the mean) and of the
entire distribution of the technical efficiency scores. The bootstrap-based procedures can also be
used for pairwise comparison between two network DEA models to perform sensitivity analysis of
the resulting estimates across various network structures. In our empirical illustration of non-life
insurance companies in Taiwan, both algorithms provide fairly robust results. We find statistical
evidence suggesting that the first moment and the entire distribution of the overall technical ef-
ficiencies are significantly different between the DEA and network DEA models. However, the
differences are not statistically significant for the two sub-processes across these models.
Keywords: DEA, Network DEA, Subsampling Bootstrap.
*Corresponding author. Email addresses: [email protected] (Kelly D.T. Trinh), [email protected] (Valentin Zelenyuk)
1. Introduction
Since the seminal works of Farrell (1957) and Charnes et al. (1978), data envelopment analysis
(DEA) has been extensively used to measure the efficiency of a set of decision making units
(DMUs) (e.g., firms, industries, etc.). The approach views production technology as a ‘black box’
in which inputs are transformed to produce final outputs. In reality, however, many production
technology processes are network technologies, where outputs of one sub-process are used as
inputs of another sub-process. Network DEA (hereafter NDEA), first discussed in Charnes et al.
(1986), is an approach allowing a researcher to look into the ‘black box’, to take into account
the internal structures of a technology process, and to evaluate the overall performance and the
performance of each sub-process of each DMU.
Unlike conventional DEA, NDEA does not have a standard form, as it depends upon the specific network structure of a technology process. Färe and Grosskopf (1996, 2000) develop several static and dynamic NDEA models within the envelopment-based framework, while Tone and Tsutsui (2009, 2010, 2014) develop models within the slacks-based measure framework. Kao and Hwang (2008) and Kao (2009a,b) add variants to the literature, modifying and generalizing a number of NDEA models using a multiplier representation (see Kao (2014) for a recent survey of NDEA works and related references). These NDEA models have recently received considerable attention in many empirical studies, in areas such as health economics (Färe et al. (1996)), agriculture (Färe and Whittaker (1995)), banking services (Avkiran (2009)), and electricity services (Tsutsui and Goto (2009)).
A natural question is whether the difference between the resulting technical efficiency scores calculated from the DEA and NDEA models is statistically significant, or whether the difference is only due to sampling and estimation variation. A researcher might be interested in these questions for
several reasons. For instance, it can become costly to collect information about intermediate prod-
ucts for a large sample, but may be easier (or already exist) for a smaller sample. An appropriate
statistical test verifying the statistical difference between these two models can provide guidance
for whether one should proceed to collect the information for a larger sample. Another example
is the case where data for both approaches exist, and both methods show ‘superficially similar’
results. In this case, a researcher advocating the NDEA approach might need to present statistical
evidence to show that it provides statistically different results from those obtained using DEA. Otherwise, if the difference is statistically insignificant, one might prefer to use the better-known and simpler approach.
A number of studies in the literature discuss the discrepancies of the results between DEA
and NDEA models, but they often do not provide a formal statistical test. One of the exceptions
is the work of Kao and Hwang (2008) in the study of non-life insurance companies in Taiwan.
They investigate the similarities between the DEA and NDEA models by ranking the insurance
companies according to the estimated individual efficiency scores, and then use Spearman’s rank
correlation test to investigate whether the two models provide the same rankings.
In this paper, we test for the equivalence of NDEA and DEA models using bootstrap tech-
niques, which are widely used in DEA framework to perform statistical inferences of technical
efficiencies (e.g., confidence intervals, standard errors, etc.).1 The proposed algorithms can also
be used to perform pairwise comparison of NDEA models to explore the sensitivity of NDEA
estimates across various network structures or assumptions imposed in a production process.
To test the equivalence of the two models, we consider hypothesis tests regarding the equality
of the first moments (i.e., means) and of the entire distributions of technical efficiency scores
between DEA and NDEA models. We illustrate the proposed procedures using the empirical
study investigated by Kao and Hwang (2008).
The paper is structured as follows. Section 2 describes the measurement of efficiency scores
in conventional DEA and NDEA models. Section 3 discusses hypotheses for testing the means
and distributions of the technical efficiencies between the DEA and NDEA models. Section 4
elaborates the bootstrap-based algorithms. Section 5 presents some empirical results and Section
6 concludes the paper.
1See Simar and Wilson (2000a,b) for an early survey of bootstrap methods used in DEA.
2. Efficiency measurement
Given a vector of $q$ inputs ($x \in \mathbb{R}^q_+$) and a vector of $p$ outputs ($y \in \mathbb{R}^p_+$), the set of feasible combinations of the input and output vectors is characterized by the production technology set:
$$T \equiv \{(x, y) \in \mathbb{R}^{q+p}_+ : x \in \mathbb{R}^q_+ \text{ can produce } y \in \mathbb{R}^p_+\}. \qquad (2.1)$$
Equivalently, the technology can also be described by the output correspondence set:
$$P(x) \equiv \{y \in \mathbb{R}^p_+ : (x, y) \in T\}, \quad x \in \mathbb{R}^q_+. \qquad (2.2)$$
The attainable set, defined in equation (2.2), is required to meet some standard regularity conditions such as free disposability, convexity, returns to scale, etc. (see Färe and Primont (1995) for further details). Under these assumptions, the technology set $T$ can be characterized by the Shephard (1970) output distance function:
$$D_o(x, y) \equiv \inf\{\theta > 0 : (x, y/\theta) \in T\}. \qquad (2.3)$$
The output distance function is particularly convenient as a criterion for the technical efficiency of a DMU; it measures the distance from a point $y$ in $T$ to the upper boundary of $T$. Such an efficiency criterion is often expressed in another form, the reciprocal of (2.3), denoted as $TE(x, y) = 1/D_o(x, y)$, which is well known as the Farrell (1957) output-oriented measure of technical efficiency.
If we let the technological frontier be the ‘upper’ boundary of $T$,
$$\partial T = \{(x, y) \in \mathbb{R}^{q+p}_+ : (x, y) \in T,\ (x, \theta y) \notin T,\ \forall \theta \in (1, \infty)\}, \qquad (2.4)$$
then for technically feasible allocations, $TE \in [1, \infty)$; $TE = 1$ implies that a DMU is technically efficient (i.e., on the frontier $\partial T$), whereas $TE > 1$ means that a DMU is not technically efficient.
In practice, we observe neither the true technology ($T$) nor its boundary ($\partial T$). However, we can estimate $T$ from an observed sample via various approaches such as DEA or NDEA.
2.1. Conventional DEA model
The seminal paper of Charnes et al. (1978), inspired by Farrell (1957), introduced the nonparametric DEA approach, which uses linear programming techniques to estimate a convex frontier set. In this framework, the best practice frontier is estimated by an empirical analogue of equation (2.4):
$$\partial \hat{T}^D = \{(x, y) \in \mathbb{R}^{q+p}_+ : (x, y) \in \hat{T}^D,\ (x, \theta y) \notin \hat{T}^D,\ \forall \theta \in (1, \infty)\}, \qquad (2.5)$$
where the technology $\hat{T}^D$ is characterized as:
$$\hat{T}^D = \Big\{(x, y) : \sum_{k=1}^{n} \lambda_k y_{kl} \ge y_l,\ l = 1, \dots, p;\quad \sum_{k=1}^{n} \lambda_k x_{ks} \le x_s,\ s = 1, \dots, q;\quad \lambda_k \ge 0,\ k = 1, \dots, n\Big\}. \qquad (2.6)$$
The non-negative $\lambda_k$ are the intensity variables determining the production frontier. The constraint $\lambda_k \ge 0$ implies constant returns to scale (CRS), and it can easily be relaxed in the style of the Banker et al. (1984) model.² Given this construction, the DEA technical efficiency estimate at a fixed point $(x, y)$ is the solution to the linear programming problem:
$$\widehat{TE}^D \equiv \widehat{TE}^D(x, y) = \max_{\theta, \lambda_1, \dots, \lambda_n}\{\theta : (x, \theta y) \in \hat{T}^D\}. \qquad (2.7)$$
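For illustration, the linear programme (2.7) can be solved with any standard LP solver. The following is a minimal Python sketch using `scipy.optimize.linprog`; the function name `dea_output_te` and its interface are ours, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_te(x0, y0, X, Y):
    """Output-oriented CRS DEA estimate (2.7): Farrell efficiency >= 1.

    X: (n, q) inputs and Y: (n, p) outputs of the n reference DMUs;
    (x0, y0): the evaluated point.  Solves
        max theta  s.t.  sum_k lam_k y_kl >= theta * y0_l,
                         sum_k lam_k x_ks <= x0_s,   lam_k >= 0.
    """
    n, q = X.shape
    p = Y.shape[1]
    # decision vector [theta, lam_1, ..., lam_n]; linprog minimises, so use -theta
    c = np.r_[-1.0, np.zeros(n)]
    # output rows: theta * y0_l - sum_k lam_k y_kl <= 0
    A_out = np.hstack([y0.reshape(-1, 1), -Y.T])
    # input rows: sum_k lam_k x_ks <= x0_s
    A_in = np.hstack([np.zeros((q, 1)), X.T])
    res = linprog(c,
                  A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.r_[np.zeros(p), x0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]
```

An efficient DMU returns 1; values above 1 measure the feasible radial expansion of its outputs.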
2.2. Network DEA model
Unlike conventional DEA, NDEA views a whole production process as a network technology: the
inputs can be directly used by all sub-processes, and the outputs of each sub-process can be either
the intermediate products, used by other sub-processes for the production, or final outputs of the
system. This type of technology process is depicted in Figure 2.1. We note that the illustration
in Figure 2.1 is only one specific network technology among many others, and that the proposed
bootstrap-based algorithms are not restricted to this specific structure of network technology. In
Figure 2.1, total available inputs are denoted as $x = (x^{(1)}, x^{(2)}) \in \mathbb{R}^q_+$, where $x^{(r)} \in \mathbb{R}^q_+$ $(r = 1, 2)$ is the portion of total inputs used in sub-process $r$. Final outputs are denoted as $y = (y^{(1)}, y^{(2)}) \in \mathbb{R}^p_+$, where $y^{(1)} \in \mathbb{R}^{p_1}_+$ and $y^{(2)} \in \mathbb{R}^{p_2}_+$ ($p = p_1 + p_2$) are the final outputs produced in sub-process 1 and sub-process 2, respectively. An intermediate product, denoted as $z \in \mathbb{R}^m_+$, is the output produced in sub-process 1 and used as an input in sub-process 2.

[Figure 2.1: A Production Technology Process. Total inputs $x = (x^{(1)}, x^{(2)})$ are split between sub-process 1 and sub-process 2; sub-process 1 produces final outputs $y^{(1)}$ and the intermediate product $z$, which feeds sub-process 2, producing final outputs $y^{(2)}$.]

²For example, variable returns to scale can be imposed by adding the constraint $\sum_{k=1}^{n} \lambda_k = 1$.
Under the assumptions of free disposability, convexity and CRS, the technology $\hat{T}^N$ can be characterized in the NDEA framework as follows:
$$\hat{T}^N = \{(x, z, y) :$$

Sub-process 1:³
$$y^{(1)}_{l_1} \le \sum_{k=1}^{n} \lambda^{(1)}_k y^{(1)}_{k l_1}, \quad l_1 = 1, \dots, p_1; \qquad (2.8)$$
$$z_t \le \sum_{k=1}^{n} \lambda^{(1)}_k z_{kt}, \quad t = 1, \dots, m; \qquad (2.9)$$
$$x^{(1)}_s \ge \sum_{k=1}^{n} \lambda^{(1)}_k x^{(1)}_{ks}, \quad s = 1, \dots, q; \qquad (2.10)$$
$$\lambda^{(1)}_k \ge 0, \quad k = 1, \dots, n. \qquad (2.11)$$

Sub-process 2:
$$y^{(2)}_{l_2} \le \sum_{k=1}^{n} \lambda^{(2)}_k y^{(2)}_{k l_2}, \quad l_2 = 1, \dots, p_2; \qquad (2.12)$$
$$z_t \ge \sum_{k=1}^{n} \lambda^{(2)}_k z_{kt}, \quad t = 1, \dots, m; \qquad (2.13)$$
$$x^{(2)}_s \ge \sum_{k=1}^{n} \lambda^{(2)}_k x^{(2)}_{ks}, \quad s = 1, \dots, q; \qquad (2.14)$$
$$x_s \ge x^{(1)}_s + x^{(2)}_s, \quad s = 1, \dots, q; \qquad (2.15)$$
$$\lambda^{(2)}_k \ge 0, \quad k = 1, \dots, n\}. \qquad (2.16)$$

The overall NDEA technical efficiency estimate at a fixed point $(x, z, y)$ is the solution to the linear programming problem:
$$\widehat{TE}^N \equiv \widehat{TE}^N(x, z, y) = \max_{\theta, \lambda^{(1)}_1, \dots, \lambda^{(1)}_n, \lambda^{(2)}_1, \dots, \lambda^{(2)}_n}\{\theta : (x, \theta y) \in \hat{T}^N\}. \qquad (2.17)$$

³We thank an anonymous referee for pointing out that constraints (2.9) and (2.13) can be written as $\sum_{k=1}^{n} (\lambda^{(1)}_k - \lambda^{(2)}_k) z_{kt} \ge 0$, $t = 1, \dots, m$, which is similar to Model (8) in Chen et al. (2010, p. 141).
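The programme (2.17) is again a linear programme, now with two sets of intensity variables. A minimal Python sketch follows, assuming the input split $(x^{(1)}, x^{(2)})$ and the intermediate product $z$ are observed data (so constraint (2.15) is satisfied by construction); the function name and toy interface are ours.

```python
import numpy as np
from scipy.optimize import linprog

def ndea_output_te(x1_0, x2_0, z0, y1_0, y2_0, X1, X2, Z, Y1, Y2):
    """Overall two-stage NDEA estimate in the spirit of (2.8)-(2.17).

    Only final outputs are radially scaled by theta, as in (2.17).
    Decision vector: [theta, lam1 (n), lam2 (n)].
    """
    n = X1.shape[0]
    nv = 1 + 2 * n
    c = np.zeros(nv)
    c[0] = -1.0                      # maximise theta (linprog minimises)

    def row(th, l1, l2):
        r = np.zeros(nv)
        r[0] = th
        r[1:1 + n] = l1
        r[1 + n:] = l2
        return r

    rows, rhs = [], []
    for l in range(Y1.shape[1]):     # (2.8): theta*y1 <= Y1' lam1
        rows.append(row(y1_0[l], -Y1[:, l], 0.0)); rhs.append(0.0)
    for l in range(Y2.shape[1]):     # (2.12): theta*y2 <= Y2' lam2
        rows.append(row(y2_0[l], 0.0, -Y2[:, l])); rhs.append(0.0)
    for t in range(Z.shape[1]):      # (2.9): z0 <= Z' lam1
        rows.append(row(0.0, -Z[:, t], 0.0)); rhs.append(-z0[t])
    for t in range(Z.shape[1]):      # (2.13): Z' lam2 <= z0
        rows.append(row(0.0, 0.0, Z[:, t])); rhs.append(z0[t])
    for s in range(X1.shape[1]):     # (2.10): X1' lam1 <= x1_0
        rows.append(row(0.0, X1[:, s], 0.0)); rhs.append(x1_0[s])
    for s in range(X2.shape[1]):     # (2.14): X2' lam2 <= x2_0
        rows.append(row(0.0, 0.0, X2[:, s])); rhs.append(x2_0[s])

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * nv)
    return res.x[0]
```

Setting `lam1 == lam2` (by replacing the two blocks with one) would collapse this to the conventional DEA programme, illustrating the degeneracy noted below.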
It is important to note that if the two sub-processes are not differentiated, meaning the non-negative intensities of sub-process 1 and sub-process 2 are equivalent ($\lambda^{(1)}_k = \lambda^{(2)}_k$, $k = 1, \dots, n$), then the NDEA model is equivalent to the conventional DEA model.⁴ Thus, the DEA model can be viewed as a restricted, or degenerate, version of this NDEA model.
3. Testing for the equality of means and distributions of technical efficiency
scores
To measure the overall technical efficiency, a researcher can either use (2.7) in DEA or (2.17) in
NDEA. In this case, DEA does not take into account the intermediate product z. A researcher, in
some circumstances, might prefer to include the intermediate product z in the estimation of the
individual efficiency scores in the DEA framework by treating it as an input or a final output. DEA, in this case, misspecifies the technology process. Further discussion of the similarities and discrepancies between NDEA and DEA models can be found in Kao (2014).

⁴Another potential alternative for testing the equivalence of the NDEA and DEA models is to test the equality of $\lambda^{(1)}_k = \lambda^{(2)}_k$, $k = 1, \dots, n$. Investigating this alternative is a research question in itself and so is beyond the scope of this paper. We are grateful to an anonymous referee for the insightful suggestion.
If a particular network structure of the technology is not essential, then both the DEA and NDEA models should, up to a certain level of statistical noise, yield the same or similar results. Otherwise, DEA becomes restrictive, whereas NDEA is the appropriate approach, and so the resulting technical efficiency scores should differ between the approaches in a statistical sense. Our goal is to develop formal hypothesis tests to verify the equivalence of the estimates obtained from the two models. In the following sub-sections, we describe the hypothesis tests for the equality of the means and of the distributions of the technical efficiencies.
3.1. Testing the equality of the mean technical efficiency scores between NDEA and DEA models
The first hypothesis we consider is:
$$H_0 : E(\widehat{TE}^N) = E(\widehat{TE}^D) \quad \text{against} \quad H_1 : E(\widehat{TE}^N) \ne E(\widehat{TE}^D),$$
where $E(\widehat{TE}^N)$ and $E(\widehat{TE}^D)$ stand for the means of the estimated efficiency scores calculated from the NDEA and DEA models, respectively.
We are interested in how far, in a statistical sense, the quantity $RD = E(\widehat{TE}^N)/E(\widehat{TE}^D)$ deviates from unity. We can infer it from its estimator
$$\widehat{RD} = \frac{\frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^N(x_k, z_k, y_k)}{\frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^D(x_k, y_k)},$$
where $n$ is the number of observations.⁵ For simplicity, we henceforth refer to $\widehat{TE}^N_k$ and $\widehat{TE}^D_k$ as the individual ($k$) technical efficiencies, computed from NDEA ($\widehat{TE}^N(x_k, z_k, y_k)$) and DEA ($\widehat{TE}^D(x_k, y_k)$), respectively.
The hypothesis test is developed in the spirit of Simar and Wilson (2002) for testing hypotheses of returns to scale in the nonparametric context, and of Simar and Zelenyuk (2007) in the context of testing the equality of aggregate technical efficiency scores. The null hypothesis is rejected if the bootstrap-based p-value is smaller than the chosen level of significance $\alpha$ (e.g., 5%).
⁵Other possible test statistics include the ratio of medians or the ratio of trimmed means.
3.2. Testing the equality of the density distributions of technical efficiency scores between NDEA and
DEA models
The second hypothesis we consider is:
$$H_0 : f^N(v) = f^D(v) \quad \text{against} \quad H_1 : f^N(v) \ne f^D(v) \text{ on a set of positive measure},$$
where $f^N(\cdot)$ and $f^D(\cdot)$ are the probability density functions of the technical efficiency scores from NDEA and DEA, respectively.
To test this hypothesis, Li (1996, 1999) considered the integrated squared difference criterion
$$I \equiv \int \big(f^N(v) - f^D(v)\big)^2\, dv = \int f^N(v)\, dF^N(v) + \int f^D(v)\, dF^D(v) - \int f^D(v)\, dF^N(v) - \int f^N(v)\, dF^D(v), \qquad (3.1)$$
where $f^j(\cdot)$ and $F^j(\cdot)$ are the unknown density and distribution functions for each approach $j$ ($j = N, D$).
Using a central limit theorem from Hall (1984), Li (1996) shows that a consistent and asymptotically normal estimator of $I$ is obtained by replacing the unknown distributions $F^j(\cdot)$ and the unknown densities $f^j(\cdot)$ with the corresponding empirical distribution functions $\hat{F}^j(\cdot)$ and nonparametric kernel density estimators $\hat{f}^j(\cdot)$, respectively:
$$\hat{F}^j(v) \equiv \frac{1}{n}\sum_{k=1}^{n} I(v^j_k \le v), \quad j = N, D, \qquad (3.2)$$
and
$$\hat{f}^j(v) \equiv \frac{1}{n h_j}\sum_{k=1}^{n} K\Big(\frac{v - v^j_k}{h_j}\Big), \qquad (3.3)$$
where $I(A)$ is the indicator function, which yields 1 if the statement $A$ is true and 0 otherwise; $h_j$ is a bandwidth such that $h_j \to 0$ and $n h_j \to \infty$ as $n \to \infty$; $K$ is an appropriate kernel function; and $v$ is a point at which the density is estimated. The choice of the kernel is not as crucial as the choice of the bandwidth. Here we choose the Gaussian kernel and select the bandwidth by the least squares cross-validation proposed by Bowman (1984).
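The kernel estimator (3.3) and the least squares cross-validation bandwidth can be sketched as follows; this is a minimal grid-search implementation under the Gaussian-kernel choice above, with illustrative function names of our own.

```python
import numpy as np

def gauss(u):
    """Standard normal (Gaussian) kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(v, points, h):
    """Kernel density estimator fhat evaluated at `points`, eq. (3.3)."""
    v = np.asarray(v, float)
    return gauss((points[:, None] - v[None, :]) / h).mean(axis=1) / h

def lscv_bandwidth(v, grid=None):
    """Least squares cross-validation (Bowman 1984): minimise
    int fhat^2 - (2/n) * sum_i fhat_{-i}(v_i) over a grid of h."""
    v = np.asarray(v, float)
    n = len(v)
    d = v[:, None] - v[None, :]
    if grid is None:
        s = v.std(ddof=1)
        grid = np.linspace(0.1 * s, 2.0 * s, 60)
    best_h, best = grid[0], np.inf
    for h in grid:
        # integral of fhat^2: two Gaussian kernels convolve to a N(0, 2h^2) density
        term1 = gauss(d / (np.sqrt(2.0) * h)).sum() / (n**2 * h * np.sqrt(2.0))
        # leave-one-out cross term (zero the i = k pairs)
        off = gauss(d / h)
        np.fill_diagonal(off, 0.0)
        term2 = 2.0 * off.sum() / (n * (n - 1) * h)
        if term1 - term2 < best:
            best, best_h = term1 - term2, h
    return best_h
```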
Li (1996) notes that, in most Monte Carlo experiments, the statistic $\hat{I}$ without the diagonal terms performs better than the statistic with the diagonal terms. The statistic without the diagonal terms is defined as:
$$\hat{I}^{nd}_{n,h} = \frac{1}{h n (n-1)}\sum_{i=1}^{n}\sum_{\substack{k=1 \\ k \ne i}}^{n} \Big[ K\Big(\frac{v^N_i - v^N_k}{h}\Big) + K\Big(\frac{v^D_i - v^D_k}{h}\Big) - K\Big(\frac{v^D_i - v^N_k}{h}\Big) - K\Big(\frac{v^N_i - v^D_k}{h}\Big) \Big], \qquad (3.4)$$
where the bandwidth $h = \min\{h_N, h_D\}$.
Li (1996, p. 265) shows that the limiting distribution of $\hat{I}^{nd}_{n,h}$ in (3.4) is standard normal after appropriate ‘standardization’:
$$\hat{L}^{nd}_{n,h} \equiv \frac{n h^{1/2}\, \hat{I}^{nd}_{n,h}}{\hat{\sigma}_h} \xrightarrow{d} N(0, 1), \qquad (3.5)$$
where
$$\hat{\sigma}^2_h = \frac{2}{h n^2}\sum_{i=1}^{n}\sum_{k=1}^{n} \Big[ K\Big(\frac{v^N_i - v^N_k}{h}\Big) + K\Big(\frac{v^D_i - v^D_k}{h}\Big) + K\Big(\frac{v^D_i - v^N_k}{h}\Big) + K\Big(\frac{v^N_i - v^D_k}{h}\Big) \Big] \times \int K^2(v)\, dv.$$
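The statistic (3.4), together with the standardization (3.5), is straightforward to compute directly. A minimal Python sketch, specialised to the Gaussian kernel chosen above (for which $\int K^2(v)\,dv = 1/(2\sqrt{\pi})$); the function names are ours.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def li_statistic(vN, vD, h):
    """Standardised Li statistic, eqs. (3.4)-(3.5), with a Gaussian kernel.

    vN, vD: two samples of (smoothed) efficiency scores of equal size n;
    h: bandwidth, e.g. h = min{h_N, h_D}.
    """
    vN, vD = np.asarray(vN, float), np.asarray(vD, float)
    n = len(vN)

    def ksum(a, b, drop_diag):
        K = gauss((a[:, None] - b[None, :]) / h)
        if drop_diag:
            np.fill_diagonal(K, 0.0)   # remove the i = k terms
        return K.sum()

    # eq. (3.4): statistic without the diagonal terms
    I_nd = (ksum(vN, vN, True) + ksum(vD, vD, True)
            - ksum(vD, vN, True) - ksum(vN, vD, True)) / (h * n * (n - 1))
    # variance estimate; for the Gaussian kernel, int K^2 = 1/(2*sqrt(pi))
    s = (ksum(vN, vN, False) + ksum(vD, vD, False)
         + ksum(vD, vN, False) + ksum(vN, vD, False)) / (h * n**2)
    sigma2 = 2.0 * s / (2.0 * np.sqrt(np.pi))
    # eq. (3.5): n * h^(1/2) * I / sigma_h  ->  N(0, 1) under H0
    return n * np.sqrt(h) * I_nd / np.sqrt(sigma2)
```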
For the application of the Li test in our context, one of the main concerns is a discontinuity issue: there is ‘spurious’ mass at unity because, by the construction of DEA, at least one observation lies on the frontier, while in reality the probability of such an event is zero.⁶ To address this issue, we follow Algorithm II suggested by Simar and Zelenyuk (2006), where the observations equal to unity are smoothed away from the boundary by adding a small noise of an order of magnitude smaller than the noise of estimation. The order of magnitude is suggested by the rate of convergence of the DEA estimator. The smoothing is made as follows:
$$\widehat{TE}^{*j}_k = \begin{cases} \widehat{TE}^j_k + \epsilon^j_k & \text{if } \widehat{TE}^j_k = 1, \\ \widehat{TE}^j_k & \text{otherwise}, \end{cases} \qquad (3.6)$$
where $j = N, D$; $\epsilon^j_k \sim U(0, \min\{n^{-2/(p+q+1)}, a - 1\})$; and $a$ is the $\alpha$ quantile of the empirical distribution of $\{\widehat{TE}^j_k > 1\}$.

⁶The discontinuity issue in NDEA might not be as serious as it is in DEA because its constraints are less stringent than those of the conventional DEA model.
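In code, the smoothing step (3.6) might look like the following minimal sketch (function name ours; the guard for the degenerate case of no scores above unity is our addition):

```python
import numpy as np

def smooth_unit_scores(te, p, q, alpha=0.05, rng=None):
    """Smoothing (3.6): scores equal to 1 receive uniform noise whose upper
    bound is of smaller order than the DEA estimation noise (rate n^(-2/(p+q+1)))."""
    rng = np.random.default_rng() if rng is None else rng
    te = np.asarray(te, float).copy()
    n = len(te)
    above = te[te > 1.0]
    # a: alpha quantile of the scores strictly above unity
    a = np.quantile(above, alpha) if above.size else 1.0 + n ** (-2.0 / (p + q + 1))
    upper = min(n ** (-2.0 / (p + q + 1)), a - 1.0)
    mask = te == 1.0
    te[mask] = 1.0 + rng.uniform(0.0, upper, mask.sum())
    return te
```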
The null hypothesis is rejected if the bootstrap-based p-value is less than the chosen level of significance $\alpha$. The bootstrap-based procedures for the p-values are described in Section 4.
4. Bootstrap-based algorithms
Bootstrap techniques, first proposed by Efron (1979), have been applied in many studies across various disciplines. In productivity and efficiency analysis, bootstrap techniques were first introduced
by Simar (1992). Two main bootstrap approaches are the smoothed bootstrap and the subsampling
bootstrap. For about a decade, the smoothed bootstrap (Simar and Wilson (1998, 2000a,b)) was
the dominant approach in practice. Although no formal proofs were given at the time, it was an
important tool for researchers to perform statistical inference on estimated efficiencies such as
confidence intervals, standard errors, etc. About a decade later, Kneip et al. (2008) provided a
formal proof for the consistency of the subsampling bootstrap (m out of n subsampling bootstrap,
m < n), and that the smoothed bootstrap is its approximation. We therefore use the subsampling
bootstrap here, which is consistent for DEA and therefore also consistent for NDEA under the null
hypotheses.
The bootstrap-based tests are designed in the spirit of the works of Simar and Wilson (2002), Simar and Zelenyuk (2006) and Simar and Zelenyuk (2007). We carefully adapt the existing statistical methods to the new context of the NDEA framework, while encouraging future research to deliver the formal proofs (e.g., consistency under the alternative hypotheses).
To facilitate further discussion, let us assume that a data set $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$ is generated from a data generating process (DGP), which is completely characterized by the knowledge of the technology set $P(x)$ and of the probability density function $f(\mathbf{x}, y)$, $\mathbf{x} = (x, z)$. Let $\mathcal{P}$ denote the DGP; then $\mathcal{P} = (P(x), f(\cdot, \cdot))$, with estimate $\hat{\mathcal{P}} = (\hat{P}(x), \hat{f}(\cdot, \cdot))$. In general, to bootstrap an unknown parameter of interest $\beta$, which in this paper is $RD$ or $L^{nd}_{n,h}$, we first estimate $\beta$ from the original sample $S_n$ and denote the estimate $\hat{\beta}$. Under the null hypotheses, we randomly draw (with replacement) a new bootstrap sample $S^*_m = \{(x^*_k, z^*_k, y^*_k) : k = 1, \dots, m;\ m \le n\}$ from $S_n$, and then compute the bootstrap estimate of $\beta$, denoted $\hat{\beta}^*$, from the bootstrap sample $S^*_m$ using the same formula as for the estimator $\hat{\beta}$. For the case of DEA, the naive bootstrap is inconsistent (see Simar and Wilson (2000a, p. 786) for further discussion), while a subsampling bootstrap is consistent (under the null hypotheses) for any subsample size $m = n^{\kappa}$, $0 < \kappa < 1$.

If the bootstrap is consistent, then the relationship between the bootstrap estimates ($\hat{\beta}^*$) and the original estimates ($\hat{\beta}$) will mimic the relationship between the original estimates ($\hat{\beta}$) and the true unobserved values ($\beta$). In our case, this implies that:
$$(\hat{\beta}^* - \hat{\beta}) \mid \hat{\mathcal{P}} \overset{asy}{\sim} (\hat{\beta} - \beta) \mid \mathcal{P}.$$
In order to preserve the structure of NDEA, we draw a bootstrap sample simultaneously across
all sub-processes of DMUs. That is, we resample on the ‘triples’ (x, z, y) across the n DMUs, and
separate the drawn observations back into sub-processes for each DMU.
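The ‘triple’ resampling just described can be implemented in a few lines; the following sketch (function name ours) keeps each DMU's inputs, intermediates and outputs aligned so the network structure is preserved:

```python
import numpy as np

def draw_subsample(X, Z, Y, m, rng):
    """Draw m 'triples' (x_k, z_k, y_k) with replacement across the n DMUs.

    X: (n, q), Z: (n, m_z), Y: (n, p).  The same row index is used for all
    three arrays, so a drawn DMU carries its sub-process data together.
    """
    idx = rng.integers(0, X.shape[0], size=m)
    return X[idx], Z[idx], Y[idx]
```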
Here we provide two bootstrap algorithms (Algorithm A and Algorithm B) to obtain the bootstrap-based p-value under the null hypotheses. Algorithm A is the more comprehensive procedure, as the bootstrap estimates of the statistic $\beta$ are computed from bootstrap samples randomly drawn from the original sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$. In Algorithm B, on the other hand, the bootstrap estimates of the statistic $\beta$ are computed from bootstrap samples drawn from the estimated technical efficiency scores computed from the DEA or NDEA models for the original sample $S_n$. In other words, Algorithm A requires solving the linear programming problems to obtain the individual technical efficiencies at each bootstrap iteration, while Algorithm B does not.
4.1. Bootstrap algorithms for testing the equality of the mean technical efficiency scores between
NDEA and DEA models
[1.] For each observation in the sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$, compute the individual technical efficiency scores using (2.7) and (2.17), and label them as $\{\widehat{TE}^D_k \equiv \widehat{TE}^D(x_k, y_k) : k = 1, \dots, n\}$ and $\{\widehat{TE}^N_k \equiv \widehat{TE}^N(x_k, z_k, y_k) : k = 1, \dots, n\}$, respectively.

[2.] Construct the test statistic $\widehat{RD} = \hat{E}(\widehat{TE}^N)/\hat{E}(\widehat{TE}^D)$, where $\hat{E}(\widehat{TE}^j) = \frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^j_k$, $j = N, D$.

[3.] Algorithm A:

• Draw two bootstrap samples of size $m$ out of $n$ ($m < n$) by drawing triples $(x_k, z_k, y_k)$ independently and with replacement from the original sample $S_n$, and label them as $S^{*D}_{mb} = \{(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $S^{*N}_{mb} = \{(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, where $b$ denotes a bootstrap iteration, $b = 1, \dots, B$.

• Compute the bootstrap estimates of $\widehat{TE}^D_k$ and $\widehat{TE}^N_k$ via DEA using equation (2.7) for the bootstrap samples $S^{*D}_{mb}$ and $S^{*N}_{mb}$, and label them as $\{\widehat{TE}^{*D}_{kb} \equiv \widehat{TE}^{*D}(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $\{\widehat{TE}^{*N}_{kb} \equiv \widehat{TE}^{*N}(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, respectively.

Algorithm B: Draw two samples of size $n$ out of $n$ by resampling from $\{\widehat{TE}^D_k : k = 1, \dots, n\}$, which are calculated in step [1.], and label them as $\{\widehat{TE}^{*D}_{kb} : k = 1, \dots, n\}$ and $\{\widehat{TE}^{*N}_{kb} : k = 1, \dots, n\}$.

[4.] Construct the bootstrap estimate of $\widehat{RD}$ from the bootstrap samples $\{\widehat{TE}^{*D}_{kb}\}$ and $\{\widehat{TE}^{*N}_{kb}\}$, and denote it $\widehat{RD}^*_b = \hat{E}(\widehat{TE}^{*N}_b)/\hat{E}(\widehat{TE}^{*D}_b)$, where $\hat{E}(\widehat{TE}^{*j}_b) = \frac{1}{\tilde{n}}\sum_{k=1}^{\tilde{n}} \widehat{TE}^{*j}_{kb}$, $j = N, D$, with $\tilde{n} = m < n$ for Algorithm A and $\tilde{n} = n$ for Algorithm B.

[5.] Repeat steps [3.]-[4.] $B$ times, $b = 1, \dots, B$.

[6.] Construct the bootstrap-based p-value for the two-tailed hypothesis test:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{|\widehat{RD}^*_b| > |\widehat{RD}|\}, \qquad (4.1)$$
where $I$ is an indicator function, which yields the value of 1 if $|\widehat{RD}^*_b| > |\widehat{RD}|$ is true, and 0 otherwise.
If a researcher is interested in performing a one-tailed hypothesis test (e.g., $H_0 : RD = 1$ against $H_1 : RD > 1 \iff E(\widehat{TE}^N) > E(\widehat{TE}^D)$), then the bootstrap-based p-value is:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{\widehat{RD}^*_b > \widehat{RD}\}. \qquad (4.2)$$
Also, one can use other statistics (e.g., the absolute difference, the mean of squared deviations, etc.) to test for the equivalence of the models; the proposed algorithms are easily adapted to this context.

In addition, rather than using a bootstrap-based p-value, one can use a bootstrap-based confidence interval for the $\widehat{RD}$ statistic, in a similar fashion to Simar and Zelenyuk (2007). The null hypothesis is rejected if the bootstrap-based confidence interval does not include unity.
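For illustration, Algorithm B for the mean test can be sketched as follows (function name ours). Note that the two bootstrap samples are both drawn from the DEA scores of step [1.], which imposes the null of equivalence; Algorithm A would instead re-solve the linear programmes on each $m$-out-of-$n$ subsample of the data triples.

```python
import numpy as np

def rd_pvalue_algB(teN, teD, B=1000, rng=None):
    """Algorithm B for the two-tailed mean test, eq. (4.1).

    teN, teD: individual efficiency scores from step [1.].
    Returns the bootstrap-based p-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    teN, teD = np.asarray(teN, float), np.asarray(teD, float)
    n = len(teD)
    rd_hat = teN.mean() / teD.mean()          # step [2.]
    exceed = 0
    for _ in range(B):                        # steps [3.]-[5.]
        # both n-out-of-n resamples come from the DEA scores (null imposed)
        rd_b = (teD[rng.integers(0, n, n)].mean()
                / teD[rng.integers(0, n, n)].mean())
        if abs(rd_b) > abs(rd_hat):           # step [6.], eq. (4.1)
            exceed += 1
    return exceed / B
```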
4.2. Bootstrap algorithm for testing the equality of distributions of efficiency scores between NDEA
and DEA models
[1.] For each observation in the sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$, compute the individual technical efficiency scores using (2.7) and (2.17), and label them as $\{\widehat{TE}^D_k \equiv \widehat{TE}^D(x_k, y_k) : k = 1, \dots, n\}$ and $\{\widehat{TE}^N_k \equiv \widehat{TE}^N(x_k, z_k, y_k) : k = 1, \dots, n\}$, respectively.

[2.] Smooth $\{\widehat{TE}^j_k : k = 1, \dots, n\}$, $j = N, D$, by adding small noise to the estimated technical efficiency scores equal to unity in the following way:
$$\widehat{TE}^{*j}_k = \begin{cases} \widehat{TE}^j_k + \epsilon^j_k & \text{if } \widehat{TE}^j_k = 1, \\ \widehat{TE}^j_k & \text{otherwise}, \end{cases} \qquad (4.3)$$
where $\epsilon^j_k \sim U(0, \min\{n^{-2/(p+q+1)}, a - 1\})$ and $a$ is the $\alpha$ quantile of the empirical distribution of $\{\widehat{TE}^j_k > 1\}$.

[3.] Compute the Li (1996) statistic $\hat{L}^{nd}_{n,h}$ via equation (3.5) using the data $\widehat{TE}^{*j}_k$ from step [2.], with the bandwidth chosen such that $h = \min\{h_N, h_D\}$.

[4.] Algorithm A:

• Draw two bootstrap samples of size $m$ out of $n$ ($m < n$) by drawing triples $(x_k, z_k, y_k)$ independently and with replacement from the original sample $S_n$, and label them as $S^{*D}_{mb} = \{(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $S^{*N}_{mb} = \{(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, where $b$ denotes a bootstrap iteration, $b = 1, \dots, B$.

• Compute the bootstrap estimates of $\widehat{TE}^D_k$ and $\widehat{TE}^N_k$ via DEA (i.e., equation (2.7)) for the bootstrap samples $S^{*D}_{mb}$ and $S^{*N}_{mb}$, and label them as $\{\widehat{TE}^{*D}_{kb} \equiv \widehat{TE}^{*D}(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $\{\widehat{TE}^{*N}_{kb} \equiv \widehat{TE}^{*N}(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, respectively.

Algorithm B: Draw two samples of size $n$ out of $n$ by resampling from $\{\widehat{TE}^D_k : k = 1, \dots, n\}$, which are calculated in step [1.], and label them as $\{\widehat{TE}^{*D}_{kb} : k = 1, \dots, n\}$ and $\{\widehat{TE}^{*N}_{kb} : k = 1, \dots, n\}$.⁷

[5.] Smooth $\widehat{TE}^{*j}_{kb}$ if $\widehat{TE}^{*j}_{kb} = 1$, $j = N, D$, by adding small noise as in (4.3), and denote the results $\{\widehat{TE}^{**D}_{kb} : k = 1, \dots, \tilde{n}\}$ and $\{\widehat{TE}^{**N}_{kb} : k = 1, \dots, \tilde{n}\}$, with $\tilde{n} = m < n$ for Algorithm A and $\tilde{n} = n$ for Algorithm B.

[6.] Compute the bootstrap-based Li statistic via (3.5), with $h^*_b = \min\{h^*_{Nb}, h^*_{Db}\}$, using the data $\widehat{TE}^{**N}_{kb}$, $\widehat{TE}^{**D}_{kb}$ obtained in step [5.], and label it $\hat{L}^{*nd}_{n,h^*_b}$.

[7.] Repeat steps [4.]-[6.] $B$ times, $b = 1, \dots, B$.

[8.] Compute the bootstrap-based p-value:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{|\hat{L}^{*nd}_{n,h^*_b}| > |\hat{L}^{nd}_{n,h}|\},$$
where $I$ is an indicator function, which yields the value of 1 if $|\hat{L}^{*nd}_{n,h^*_b}| > |\hat{L}^{nd}_{n,h}|$ is true, and 0 otherwise.

⁷Li (1999, pp. 204-205) concludes from Monte Carlo simulations that the Li test performs better when drawing from one sample than when drawing from a pooled sample.
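Algorithm B for the distribution test can be sketched in the same spirit (function names ours). For brevity the sketch uses a fixed bandwidth $h$ and skips the re-smoothing of step [5.]; the procedure above re-selects $h^*_b = \min\{h^*_{Nb}, h^*_{Db}\}$ and re-smooths on each draw.

```python
import numpy as np

def _gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def _li(vN, vD, h):
    # standardised Li statistic without diagonal terms, eqs. (3.4)-(3.5)
    n = len(vN)
    def ksum(a, b, drop):
        K = _gauss((a[:, None] - b[None, :]) / h)
        if drop:
            np.fill_diagonal(K, 0.0)
        return K.sum()
    I_nd = (ksum(vN, vN, True) + ksum(vD, vD, True)
            - ksum(vD, vN, True) - ksum(vN, vD, True)) / (h * n * (n - 1))
    s = (ksum(vN, vN, False) + ksum(vD, vD, False)
         + ksum(vD, vN, False) + ksum(vN, vD, False)) / (h * n**2)
    sigma2 = 2.0 * s / (2.0 * np.sqrt(np.pi))   # times int K^2 = 1/(2*sqrt(pi))
    return n * np.sqrt(h) * I_nd / np.sqrt(sigma2)

def li_pvalue_algB(teN, teD, h, B=500, rng=None):
    """Algorithm B for the distribution test (step [8.])."""
    rng = np.random.default_rng() if rng is None else rng
    teN, teD = np.asarray(teN, float), np.asarray(teD, float)
    n = len(teD)
    L_hat = _li(teN, teD, h)                     # step [3.]
    exceed = 0
    for _ in range(B):                           # steps [4.]-[7.]
        # both n-out-of-n resamples come from the DEA scores (null imposed)
        bN = teD[rng.integers(0, n, n)]
        bD = teD[rng.integers(0, n, n)]
        if abs(_li(bN, bD, h)) > abs(L_hat):
            exceed += 1
    return exceed / B
```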
5. Empirical illustration
For the empirical illustration, we use the data from Kao and Hwang (2008) in the study of non-life
insurance companies in Taiwan. The data include two primary inputs (operation and insurance
expenses), two intermediate products (direct written and reinsurance premiums), and two final
outputs (underwriting and investment profit) for 24 non-life insurance companies. The whole
production process of non-life insurance can be divided into two sub-processes. In the first sub-
process, premium acquisition, clients pay direct written premiums and other insurance companies
pay reinsurance premiums. In the second sub-process, profit generation, premiums are invested in
a portfolio to earn profit.
Conventional DEA can be used to measure the technical efficiencies for each sub-process as
well as the overall technical efficiencies. The estimation of the overall technical efficiency scores,
however, does not take into account the performance of the two sub-processes. Kao and Hwang
(2008) accounted for the internal structures in the estimation of the overall technical efficiency
scores by using a version of NDEA, which they called the ‘relational two-stage’ DEA. The ques-
tion is whether these efficiency scores calculated using NDEA are statistically different from those
calculated using a conventional DEA.
5.1. Hypothesis test for the equality of the means of technical efficiency scores between NDEA and
DEA models
The hypothesis tested here is:
$$H_0 : E(\widehat{TE}^N) = E(\widehat{TE}^D) \quad \text{against} \quad H_1 : E(\widehat{TE}^N) \ne E(\widehat{TE}^D).$$
The results of the two-tailed hypothesis test are summarized in Table 5.1.⁸ The computed RD statistics are all greater than 1 (i.e., 1.63 for the overall efficiency, 1.09 for sub-process I, and 1.17 for sub-process II). These results indicate that the efficiency scores obtained from the NDEA and DEA models are not equivalent. Furthermore, our statistical testing results suggest that only the means of the overall technical efficiencies are statistically different between the NDEA and DEA models, while this is not the case for the two sub-processes (p-values > 0.05).

⁸The results of the one-tailed hypothesis test are presented in Table A.1.

Table 5.1: Summary results of the two-tailed hypothesis test for the equality of means of efficiency scores between NDEA and DEA models

                                    Algorithm A                       Algorithm B
  Process             RD statistic  p-value   p-value   p-value    p-value
                                    (m = 12)  (m = 14)  (m = 16)   (m = 24)
  The whole process       1.63       0.01      0.01      0.01       0.00
  Sub-process I           1.09       0.08      0.07      0.07       0.06
  Sub-process II          1.17       0.21      0.20      0.18       0.10

  Note: The number of bootstrap iterations is 1000 for each algorithm and for each hypothesis test in the empirical illustration.
For Algorithm A, we try subsample sizes $m \in \{12, 14, 16\}$ to investigate the sensitivity to the choice of $m$.⁹ The conclusions are robust to the choice of the subsampling bootstrap size $m$ and across the two algorithms, at the significance level $\alpha = 5\%$.
5.2. Hypothesis test for the equality of distributions of technical efficiency scores between NDEA and
DEA models
We now turn to investigating the entire probability density function, which contains more infor-
mation regarding the distribution of the efficiency scores. These density functions of the overall
process and its sub-processes are depicted in Figure A.1. Here we use the Silverman (1986) re-
flection method to account for the bounded support of the distribution of the technical efficiency
scores (see Simar and Zelenyuk (2006)).
Panel (a) of Figure A.1 shows that the distributions of the overall technical efficiencies appear
distinct between the two models. The higher mass at unity for the DEA model is due to the
9For simplicity of presentation, we use various choices of m to investigate the sensitivity, since the paper is not
about the choice of m. To choose the optimal m, one can use the approach suggested by Politis et al. (2001) adapted
to DEA by Simar and Wilson (2011).
more stringent constraints of this approach relative to NDEA. In addition, the spread of the technical inefficiencies in the NDEA model is much larger than that in the DEA model. While the NDEA technical efficiencies are in the range $[1, 7]$, most of the DEA estimates are concentrated in the range $[1, 3]$, with only a few observations yielding inefficiencies greater than 3. Panels (b) and (c), on the other hand, show that the probability density functions of the two sub-processes appear rather similar.
The significance of these differences can be assessed with the following hypothesis test:
H0 : f^N(v) = f^D(v) against H1 : f^N(v) ≠ f^D(v) on a set of positive measure.
The estimated Li statistics and their bootstrap-based p-values are presented in Table 5.2. We
find statistical evidence suggesting that the density functions of the overall technical efficiency
scores are statistically different, whereas those of the sub-processes are not (bootstrap-based
p-values are larger than 0.1).10
Table 5.2: Summary results of the hypothesis test for the equality of probability density
functions of efficiency scores between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             Li statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       2.78       0.04      0.05      0.05       0.07
Sub-process I           1.18       0.34      0.37      0.32       0.22
Sub-process II          0.17       0.82      0.82      0.86       0.81
Note: Gaussian kernel density estimation is used, with the bandwidth chosen by the least
squares cross-validation of Bowman (1984).
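For intuition, the core of a Li-type test is a kernel estimate of the integrated squared difference between the two densities, calibrated here by a generic pooled-sample bootstrap. This sketch deliberately omits the studentization of the full Li (1996) statistic (harmless when p-values come from the bootstrap, which is scale-invariant) and is not the paper's exact Algorithm A or B:

```python
import numpy as np

def _gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def integrated_sq_diff(x, y, h):
    """Kernel estimate of the integrated squared difference between the
    densities of samples x and y (equal size n); this is the core of a
    Li-type statistic, with normalizing constants deliberately dropped."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    kxx = _gauss((x[:, None] - x[None, :]) / h).sum()
    kyy = _gauss((y[:, None] - y[None, :]) / h).sum()
    kxy = _gauss((x[:, None] - y[None, :]) / h).sum()
    return (kxx + kyy - 2.0 * kxy) / (n**2 * h)

def li_type_pvalue(x, y, h, B=200, seed=0):
    """Generic pooled-sample bootstrap p-value for H0: f_x = f_y."""
    rng = np.random.default_rng(seed)
    t0 = integrated_sq_diff(x, y, h)
    pooled = np.concatenate([x, y])
    n = len(x)
    hits = sum(
        integrated_sq_diff(rng.choice(pooled, n), rng.choice(pooled, n), h) >= t0
        for _ in range(B)
    )
    return hits / B
```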
We also consider the hypothesis test of the equality of cumulative distributions using the
Kolmogorov-Smirnov (KS) test. The hypothesis tested is:
H0 : F^N(v) = F^D(v) against H1 : F^N(v) ≠ F^D(v).
10We also use the Sheather and Jones (1991) bandwidth, and the conclusions remain the same.
The KS statistic is defined as KS_T = (n/2)^{1/2} sup_v |F̂^N(v) − F̂^D(v)|, where the empirical
distribution F̂^j(·), j = N, D, is computed as (1/n) Σ_{i=1}^{n} I(x_i ≤ v), and I(A) is an indicator
function which takes the value 1 if the statement A is true, and 0 otherwise. An appealing
property of the KS test is that its test statistic is independent of the distribution F^j(v).
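The empirical CDFs and the KS statistic can be computed in a few lines; since both empirical CDFs are step functions, the supremum only needs to be checked at the pooled data points:

```python
import numpy as np

def ks_statistic(x, y):
    """KS_T = (n/2)^(1/2) * sup_v |F_hat^N(v) - F_hat^D(v)| for two
    samples of equal size n; the sup is attained at a pooled data point."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    n = x.size
    grid = np.concatenate([x, y])                     # candidate sup points
    fx = np.searchsorted(x, grid, side="right") / n   # F_hat^N at each point
    fy = np.searchsorted(y, grid, side="right") / n   # F_hat^D at each point
    return np.sqrt(n / 2.0) * np.abs(fx - fy).max()
```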
A direct application of the KS test to our context may not be appropriate, as there is a dis-
continuity issue here. Analogous to the adapted Li test in Section 4, we smooth the observations
yielding technical efficiencies equal to one by adding small noise, as in (3.6), and then use
Algorithms A and B to obtain the bootstrap-based p-values. The results are presented in Table
5.3. The same conclusions are drawn for the cumulative distribution functions of the overall
process and its sub-processes.
Table 5.3: Summary results of the hypothesis test for the equality of cumulative distributions of efficiency scores
between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             KS statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       1.44       0.03      0.05      0.04       0.06
Sub-process I           1.01       0.40      0.32      0.52       0.30
Sub-process II          0.72       0.71      0.80      0.62       0.70
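The pre-smoothing step applied before the Li and KS tests can be sketched as follows. The exact perturbation rule is given by equation (3.6) of the paper; the uniform draw on (ε/2, ε) below is only an illustrative stand-in:

```python
import numpy as np

def smooth_ties_at_one(scores, eps=1e-6, seed=0):
    """Break the point mass at TE = 1 by perturbing only the efficient
    units, keeping all scores strictly above the boundary afterwards.
    Stand-in for the paper's smoothing rule (3.6)."""
    rng = np.random.default_rng(seed)
    s = np.asarray(scores, float).copy()
    at_one = np.isclose(s, 1.0)                      # units estimated as efficient
    s[at_one] = 1.0 + rng.uniform(eps / 2.0, eps, at_one.sum())
    return s
```

Because the perturbation is of order ε, it removes the ties at unity (and hence the discontinuity in the empirical CDF) without materially altering the efficiency estimates.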
Overall, we can conclude that the first moment and the entire distributions of the overall tech-
nical efficiency scores are statistically different between the NDEA and DEA models in the study
of non-life insurance companies in Taiwan. There is no evidence suggesting that the differences
of the technical efficiencies of the two sub-processes are statistically significant across the NDEA
and DEA models. The NDEA model seems to provide more meaningful results of the overall
technical efficiencies than the DEA model. For instance, both the DEA and NDEA models show
that the Chung Kuo company does not perform efficiently in sub-process II (its technical
efficiency scores are greater than 1 in both models; see Table A.2). Yet the DEA model suggests
that the overall process is perfectly efficient (a score equal to 1), whereas the NDEA model
indicates that some inefficiency exists, since the overall efficiency score is 1.6.
6. Conclusion
In this paper, we developed two bootstrap algorithms for testing the equality of the first moment
and of the entire distributions of technical efficiency scores estimated using the DEA and NDEA
approaches. The algorithms can also be adapted to perform a pairwise comparison between two
NDEA models to explore the sensitivity of the NDEA estimates across various network structures
or assumptions applied to production technology.
The proposed algorithms, when applied to the non-life insurance companies in Taiwan, pro-
vided fairly robust results. Specifically, all tests implied that the overall technical efficiencies
between the NDEA and DEA models are statistically different with regard to their means and
distributions, but this is not the case for the two sub-processes (premium acquisition and profit
generation).
Acknowledgement
We thank the editor and an anonymous referee for their constructive comments and suggestions
that helped to improve this paper substantially.
References
Avkiran, N. K., 2009. Opening the black box of efficiency analysis: An illustration with UAE banks. Omega 37 (4),
930 – 941.
Banker, R. D., Charnes, A., Cooper, W. W., 1984. Some models for estimating technical and scale inefficiencies in
data envelopment analysis. Management Science 30 (9), 1078–1092.
Bowman, A. W., 1984. An alternative method of cross-validation for the smoothing of density estimates. Biometrika
71 (2), 353–360.
Charnes, A., Cooper, W., Golany, B., Halek, R., Klopp, G., Schmitz, E., Thomas, D., 1986. Two phase data envelop-
ment analysis approach to policy evaluation and management of army recruiting activities: Tradeoffs between joint
services and army advertising. Research Report CCS No. 532, Center for Cybernetic Studies, The University of
Texas at Austin.
Charnes, A., Cooper, W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of
Operational Research 2 (6), 429 – 444.
Chen, Y., Cook, W. D., Zhu, J., 2010. Deriving the DEA frontier for two-stage processes. European Journal of Oper-
ational Research 202 (1), 138 – 142.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7 (1), 1–26.
Färe, R., Grosskopf, S., 1996. Intertemporal production frontiers: With dynamic DEA. New York: Kluwer Academic
Publishers.
Färe, R., Grosskopf, S., 2000. Network DEA. Socio-Economic Planning Sciences 34 (1), 35–49.
Färe, R., Grosskopf, S., Roos, P., 1996. Network and production models of Swedish pharmacies. Tech. rep., Mimeo.
Färe, R., Primont, D., 1995. Multi-output production and duality: Theory and applications. New York: Kluwer Aca-
demic Publishers.
Färe, R., Whittaker, G., 1995. An intermediate input model of dairy production using complex survey data. Journal of
Agricultural Economics 46 (2), 201–213.
Farrell, M. J., 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society. Series A
(General) 120 (3), 253–290.
Hall, P., 1984. Central limit theorem for integrated square error of multivariate nonparametric density estimators.
Journal of Multivariate Analysis 14 (1), 1 – 16.
Kao, C., 2009a. Efficiency decomposition in network data envelopment analysis: A relational model. European Jour-
nal of Operational Research 192 (3), 949 – 962.
Kao, C., 2009b. Efficiency measurement for parallel production systems. European Journal of Operational Research
196 (3), 1107 – 1112.
Kao, C., 2014. Network data envelopment analysis: A review. European Journal of Operational Research 239 (1), 1 –
16.
Kao, C., Hwang, S.-N., 2008. Efficiency decomposition in two-stage data envelopment analysis: An application to
non-life insurance companies in Taiwan. European Journal of Operational Research 185 (1), 418 – 429.
Kneip, A., Simar, L., Wilson, P. W., 2008. Asymptotics and consistent bootstraps for DEA estimators in nonparametric
frontier models. Econometric Theory 24 (6), 1663–1697.
Li, Q., 1996. Nonparametric testing of closeness between two unknown distribution functions. Econometric Reviews
15 (3), 261–274.
Li, Q., 1999. Nonparametric testing the similarity of two unknown density functions: local power and bootstrap
analysis. Journal of Nonparametric Statistics 11 (1-3), 189–213.
Politis, D. N., Romano, J. P., Wolf, M., 2001. On the asymptotic theory of subsampling. Statistica Sinica 11, 1105–
1124.
Sheather, S. J., Jones, M. C., 1991. A reliable data-based bandwidth selection method for kernel density estimation.
Journal of the Royal Statistical Society. Series B (Methodological) 53 (3), 683–690.
Shephard, R. W., 1970. Theory of cost and production functions. Princeton, NJ: Princeton University Press.
Silverman, B. W., 1986. Density estimation for statistics and data analysis. London: Chapman and Hall.
Simar, L., 1992. Estimating efficiencies from frontier models with panel data: A comparison of parametric, non-
parametric and semi-parametric methods with bootstrapping. Journal of Productivity Analysis 3 (1-2), 171–203.
Simar, L., Wilson, P. W., 1998. Sensitivity analysis of efficiency scores: How to bootstrap in nonparametric frontier
models. Management Science 44 (1), 49–61.
Simar, L., Wilson, P. W., 2000a. A general methodology for bootstrapping in non-parametric frontier models. Journal
of Applied Statistics 27 (6), 779–802.
Simar, L., Wilson, P. W., 2000b. Statistical inference in nonparametric frontier models: The state of the art. Journal of
Productivity Analysis 13 (1), 49–78.
Simar, L., Wilson, P. W., 2002. Non-parametric tests of returns to scale. European Journal of Operational Research
139 (1), 115 – 132.
Simar, L., Wilson, P. W., 2011. Inference by the m out of n bootstrap in nonparametric frontier models. Journal of
Productivity Analysis 36 (1), 33–53.
Simar, L., Zelenyuk, V., 2006. On testing equality of distributions of technical efficiency scores. Econometric Reviews
25 (4), 497–522.
Simar, L., Zelenyuk, V., 2007. Statistical inference for aggregates of Farrell-type efficiencies. Journal of Applied
Econometrics 22 (7), 1367–1394.
Tone, K., Tsutsui, M., 2009. Network DEA: A slacks-based measure approach. European Journal of Operational
Research 197 (1), 243 – 252.
Tone, K., Tsutsui, M., 2010. Dynamic DEA: A slacks-based measure approach. Omega 38 (3-4), 145 – 156.
Tone, K., Tsutsui, M., 2014. Dynamic DEA with network structure: A slacks-based measure approach. Omega 42 (1),
124 – 131.
Tsutsui, M., Goto, M., 2009. A multi-division efficiency evaluation of U.S. electric power companies using a weighted
slacks-based measure. Socio-Economic Planning Sciences 43 (3), 201–208.
Appendix A. Figures and Tables
[Figure A.1 about here: kernel density estimates of the DEA and NDEA efficiency scores, shown in
three panels: (a) The Whole Process, (b) Sub-process I, (c) Sub-process II.]
Figure A.1: Kernel density estimation of the overall technical efficiencies and of the technical efficiencies of sub-
process I and sub-process II.
Table A.1: Summary results of the one-tailed hypothesis test for the equality of means
of efficiency scores between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             RD statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       1.63       0.01      0.01      0.01       0.00
Sub-process I           1.09       0.08      0.07      0.07       0.06
Sub-process II          1.17       0.21      0.21      0.20       0.10
Note: The hypothesis tested here is:
H0 : E(TE^N) = E(TE^D) against H1 : E(TE^N) > E(TE^D),
where TE^j, j = N, D, denotes the estimated technical efficiency score under the respective model.
Table A.2: The overall and sub-process technical efficiency scores of the 24 non-life insurance companies in Taiwan
across the NDEA and DEA models
                      NDEA       NDEA             NDEA              DEA        DEA              DEA
Company             (Overall)  (Sub-process I)  (Sub-process II)  (Overall)  (Sub-process I)  (Sub-process II)
Taiwan Fire 1.43 1.01 1.42 1.02 1.01 1.40
Chung Kuo 1.60 1.00 1.60 1.00 1.00 1.59
Tai Ping 1.45 1.45 1.00 1.01 1.45 1.00
China Mariners 3.29 1.38 2.38 2.05 1.38 2.31
Fubon 1.30 1.20 1.08 1.00 1.19 1.00
Zurich 2.57 1.04 2.47 1.68 1.04 2.47
Taian 3.62 1.49 2.42 2.13 1.33 1.86
Ming Tai 3.63 1.51 2.41 2.41 1.38 1.96
Central 4.48 1.00 4.48 3.06 1.00 3.43
The First 2.15 1.16 1.85 1.28 1.16 1.48
Kuo Hua 6.10 1.55 3.95 3.54 1.35 3.06
Union 1.32 1.00 1.32 1.00 1.00 1.32
Shingkong 4.81 1.49 3.23 2.84 1.23 1.84
South China 3.46 1.49 2.32 2.13 1.38 1.93
Cathay Century 1.63 1.00 1.63 1.02 1.00 1.42
Allianz President 3.12 1.13 2.77 2.12 1.10 2.60
Newa 2.78 1.59 1.74 1.58 1.38 1.00
AIU 3.86 1.26 3.07 2.34 1.26 2.68
North America 2.43 1.00 2.43 1.22 1.00 2.41
Federal 1.83 1.07 1.71 1.07 1.07 1.11
Royal and Sunalliance 4.98 1.37 3.65 3.00 1.33 3.58
Asia 1.70 1.70 1.00 1.00 1.70 1.00
AXA 2.38 1.19 2.00 1.67 1.18 1.79
Mitsui Sumitomo 7.42 2.33 3.18 3.89 1.00 2.98
Mean 3.06 1.31 2.30 1.88 1.21 1.97