Centre for Efficiency and Productivity Analysis
Working Paper Series No. WP05/2015
Bootstrap-based testing for network DEA: Some Theory and Applications
Kelly D.T. Trinh, Valentin Zelenyuk
Date: May 2015
School of Economics University of Queensland
St. Lucia, Qld. 4072 Australia
ISSN No. 1932-4398
Bootstrap-based testing for network DEA: Some Theory and Applications
Kelly D.T. Trinh*, Valentin Zelenyuk
School of Economics, The University of Queensland, Brisbane, QLD, 4072, Australia
Abstract
Traditional data envelopment analysis (DEA) views a production technology process as a ‘black
box’, while network DEA allows a researcher to look into the ‘black box’, to evaluate the over-
all performance and the performance of each sub-process of the system. The technical efficiency
scores calculated from these approaches can be slightly, or sometimes vastly, different. Our aim
is to develop two bootstrap-based algorithms to test whether any observed difference between the
results from the two approaches is statistically significant, or whether it is due to sampling and
estimation noise. We focus on testing the equality of the first moment (i.e., the mean) and of the
entire distribution of the technical efficiency scores. The bootstrap-based procedures can also be
used for pairwise comparison between two network DEA models to perform sensitivity analysis of
the resulting estimates across various network structures. In our empirical illustration of non-life
insurance companies in Taiwan, both algorithms provide fairly robust results. We find statistical
evidence suggesting that the first moment and the entire distribution of the overall technical ef-
ficiencies are significantly different between the DEA and network DEA models. However, the
differences are not statistically significant for the two sub-processes across these models.
Keywords: DEA, Network DEA, Subsampling Bootstrap.
*Corresponding author. Email addresses: [email protected] (Kelly D.T. Trinh), [email protected] (Valentin Zelenyuk)
1. Introduction
Since the seminal works of Farrell (1957) and Charnes et al. (1978), data envelopment analysis
(DEA) has been extensively used to measure the efficiency of a set of decision making units
(DMUs) (e.g., firms, industries, etc.). The approach views production technology as a ‘black box’
in which inputs are transformed to produce final outputs. In reality, however, many production
technology processes are network technologies, where outputs of one sub-process are used as
inputs of another sub-process. Network DEA (hereafter NDEA), first discussed in Charnes et al.
(1986), is an approach allowing a researcher to look into the ‘black box’, to take into account
the internal structures of a technology process, and to evaluate the overall performance and the
performance of each sub-process of each DMU.
Unlike conventional DEA, NDEA does not have a standard form, as it depends upon the specific network structure of a technology process. Färe and Grosskopf (1996, 2000) develop several static and dynamic NDEA models within the envelopment-based framework, while Tone and Tsutsui (2009, 2010, 2014) develop models within the slacks-based measure framework. Kao and Hwang (2008) and Kao (2009a,b) add variants to the literature, modifying and generalizing a number of NDEA models using a multiplier representation (see Kao (2014) for a recent survey of NDEA works and related references). These NDEA models have recently received considerable attention in many empirical studies, in areas such as health economics (Färe et al. (1996)), agriculture (Färe and Whittaker (1995)), banking services (Avkiran (2009)), and electricity services (Tsutsui and Goto (2009)).
A natural question is whether the difference between the resulting technical efficiency scores calculated from the DEA and NDEA models is statistically significant, or whether the difference is only due to sampling and estimation variation. A researcher might be interested in these questions for
several reasons. For instance, it can become costly to collect information about intermediate prod-
ucts for a large sample, but may be easier (or already exist) for a smaller sample. An appropriate
statistical test verifying the statistical difference between these two models can provide guidance
for whether one should proceed to collect the information for a larger sample. Another example
is the case where data for both approaches exist, and both methods show ‘superficially similar’
results. In this case, a researcher advocating the NDEA approach might need to present statistical
evidence to show that it provides statistically different results from those obtained using DEA. Otherwise, if the difference is statistically insignificant, one might prefer to use the better-known and simpler approach.
A number of studies in the literature discuss the discrepancies of the results between DEA
and NDEA models, but they often do not provide a formal statistical test. One of the exceptions
is the work of Kao and Hwang (2008) in the study of non-life insurance companies in Taiwan.
They investigate the similarities between the DEA and NDEA models by ranking the insurance
companies according to the estimated individual efficiency scores, and then use Spearman’s rank
correlation test to investigate whether the two models provide the same rankings.
In this paper, we test for the equivalence of NDEA and DEA models using bootstrap tech-
niques, which are widely used in DEA framework to perform statistical inferences of technical
efficiencies (e.g., confidence intervals, standard errors, etc.).1 The proposed algorithms can also
be used to perform pairwise comparison of NDEA models to explore the sensitivity of NDEA
estimates across various network structures or assumptions imposed in a production process.
To test the equivalence of the two models, we consider hypothesis tests regarding the equality
of the first moments (i.e., means) and of the entire distributions of technical efficiency scores
between DEA and NDEA models. We illustrate the proposed procedures using the empirical
study investigated by Kao and Hwang (2008).
The paper is structured as follows. Section 2 describes the measurement of efficiency scores
in conventional DEA and NDEA models. Section 3 discusses hypotheses for testing the means
and distributions of the technical efficiencies between the DEA and NDEA models. Section 4
elaborates the bootstrap-based algorithms. Section 5 presents some empirical results and Section
6 concludes the paper.
1See Simar and Wilson (2000a,b) for an early survey of bootstrap methods used in DEA.
2. Efficiency measurement
Given a vector of $q$ inputs ($x \in \mathbb{R}^q_+$) and a vector of $p$ outputs ($y \in \mathbb{R}^p_+$), the set of feasible combinations of the input and output vectors is characterized by the production technology set:
$$T \equiv \{(x, y) \in \mathbb{R}^{q+p}_+ : x \in \mathbb{R}^q_+ \text{ can produce } y \in \mathbb{R}^p_+\}. \qquad (2.1)$$
Equivalently, the technology can also be described by the output correspondence set:
$$P(x) \equiv \{y \in \mathbb{R}^p_+ : (x, y) \in T\}, \quad x \in \mathbb{R}^q_+. \qquad (2.2)$$
The attainable set, defined in equation (2.2), is required to meet some standard regularity conditions such as free disposability, convexity, returns to scale, etc. (see Färe and Primont (1995) for further details). Under these assumptions, the technology set $T$ can be characterized by the Shephard (1970) output distance function:
$$D_o(x, y) \equiv \inf\{\theta > 0 : (x, y/\theta) \in T\}. \qquad (2.3)$$
The output distance function is particularly convenient as a criterion for the technical efficiency of a DMU; it measures the distance from a point $y$ in $T$ to the upper boundary of $T$. Such an efficiency criterion is often expressed in another form, the reciprocal of (2.3), denoted as $TE(x, y) = 1/D_o(x, y)$, which is well known as the Farrell (1957) output-oriented measure of technical efficiency.
If we let the technological frontier be the ‘upper’ boundary of $T$,
$$\partial T = \{(x, y) \in \mathbb{R}^{q+p}_+ : (x, y) \in T,\ (x, \theta y) \notin T,\ \forall \theta \in (1, \infty)\}, \qquad (2.4)$$
then for technically feasible allocations, $TE \in [1, \infty)$; $TE = 1$ implies that a DMU is technically efficient (i.e., on the frontier $\partial T$), whereas $TE > 1$ means that a DMU is not technically efficient.
In practice, we observe neither the true technology ($T$) nor its boundary ($\partial T$). However, we can estimate $T$ from an observed sample via various approaches such as DEA or NDEA.
2.1. Conventional DEA model
The seminal paper of Charnes et al. (1978), inspired by Farrell (1957), introduced the nonparametric DEA approach, which uses linear programming techniques to estimate a convex frontier set. In this framework, the best practice frontier is estimated by an empirical analogue of equation (2.4):
$$\partial \hat{T}^D = \{(x, y) \in \mathbb{R}^{q+p}_+ : (x, y) \in \hat{T}^D,\ (x, \theta y) \notin \hat{T}^D,\ \forall \theta \in (1, \infty)\}, \qquad (2.5)$$
where the technology $\hat{T}^D$ is characterized as:
$$\hat{T}^D = \Big\{(x, y) : \sum_{k=1}^{n} \lambda_k y_{kl} \ge y_l,\ l = 1, \dots, p;\quad \sum_{k=1}^{n} \lambda_k x_{ks} \le x_s,\ s = 1, \dots, q;\quad \lambda_k \ge 0,\ k = 1, \dots, n\Big\}. \qquad (2.6)$$
The non-negative $\lambda_k$ are the intensity variables determining the production frontier. The constraint $\lambda_k \ge 0$ implies constant returns to scale (CRS), and it can easily be relaxed in the style of the Banker et al. (1984) model.² Given this construction, the DEA technical efficiency estimate at a fixed point $(x, y)$ is the solution to the linear programming problem:
$$\widehat{TE}^D \equiv \widehat{TE}^D(x, y) = \max_{\theta, \lambda_1, \dots, \lambda_n}\{\theta : (x, \theta y) \in \hat{T}^D\}. \qquad (2.7)$$
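For illustration, the linear programme (2.7) can be solved with any standard LP solver. The following is a minimal Python sketch using `scipy.optimize.linprog`; the function name `dea_output_te` and its interface are ours, not from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def dea_output_te(x0, y0, X, Y):
    """Output-oriented CRS DEA estimate (2.7): Farrell efficiency >= 1.

    X: (n, q) inputs and Y: (n, p) outputs of the n reference DMUs;
    (x0, y0): the evaluated point.  Solves
        max theta  s.t.  sum_k lam_k y_kl >= theta * y0_l,
                         sum_k lam_k x_ks <= x0_s,   lam_k >= 0.
    """
    n, q = X.shape
    p = Y.shape[1]
    # decision vector [theta, lam_1, ..., lam_n]; linprog minimises, so use -theta
    c = np.r_[-1.0, np.zeros(n)]
    # output rows: theta * y0_l - sum_k lam_k y_kl <= 0
    A_out = np.hstack([y0.reshape(-1, 1), -Y.T])
    # input rows: sum_k lam_k x_ks <= x0_s
    A_in = np.hstack([np.zeros((q, 1)), X.T])
    res = linprog(c,
                  A_ub=np.vstack([A_out, A_in]),
                  b_ub=np.r_[np.zeros(p), x0],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]
```

An efficient DMU returns 1; values above 1 measure the feasible radial expansion of its outputs.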
2.2. Network DEA model
Unlike conventional DEA, NDEA views a whole production process as a network technology: the
inputs can be directly used by all sub-processes, and the outputs of each sub-process can be either
the intermediate products, used by other sub-processes for the production, or final outputs of the
system. This type of technology process is depicted in Figure 2.1. We note that the illustration
in Figure 2.1 is only one specific network technology among many others, and that the proposed
bootstrap-based algorithms are not restricted to this specific structure of network technology. In
Figure 2.1, total available inputs are denoted as $x = (x^{(1)}, x^{(2)}) \in \mathbb{R}^q_+$, where $x^{(r)} \in \mathbb{R}^q_+$ $(r = 1, 2)$ is the portion of total inputs used in sub-process $r$. Final outputs are denoted as $y = (y^{(1)}, y^{(2)}) \in \mathbb{R}^p_+$, where $y^{(1)} \in \mathbb{R}^{p_1}_+$ and $y^{(2)} \in \mathbb{R}^{p_2}_+$ ($p = p_1 + p_2$) are the final outputs produced in sub-process 1 and sub-process 2, respectively. An intermediate product, denoted as $z \in \mathbb{R}^m_+$, is the output produced in sub-process 1 and used as an input in sub-process 2.

[Figure 2.1: A Production Technology Process. Total inputs $x = (x^{(1)}, x^{(2)})$ are split between sub-process 1 and sub-process 2; sub-process 1 produces final outputs $y^{(1)}$ and the intermediate product $z$, which feeds sub-process 2, producing final outputs $y^{(2)}$.]

²For example, variable returns to scale can be imposed by adding the constraint $\sum_{k=1}^{n} \lambda_k = 1$.
Under the assumptions of free disposability, convexity and CRS, the technology $\hat{T}^N$ can be characterized in the NDEA framework as follows:
$$\hat{T}^N = \{(x, z, y) :$$

Sub-process 1:³
$$y^{(1)}_{l_1} \le \sum_{k=1}^{n} \lambda^{(1)}_k y^{(1)}_{k l_1}, \quad l_1 = 1, \dots, p_1; \qquad (2.8)$$
$$z_t \le \sum_{k=1}^{n} \lambda^{(1)}_k z_{kt}, \quad t = 1, \dots, m; \qquad (2.9)$$
$$x^{(1)}_s \ge \sum_{k=1}^{n} \lambda^{(1)}_k x^{(1)}_{ks}, \quad s = 1, \dots, q; \qquad (2.10)$$
$$\lambda^{(1)}_k \ge 0, \quad k = 1, \dots, n. \qquad (2.11)$$

Sub-process 2:
$$y^{(2)}_{l_2} \le \sum_{k=1}^{n} \lambda^{(2)}_k y^{(2)}_{k l_2}, \quad l_2 = 1, \dots, p_2; \qquad (2.12)$$
$$z_t \ge \sum_{k=1}^{n} \lambda^{(2)}_k z_{kt}, \quad t = 1, \dots, m; \qquad (2.13)$$
$$x^{(2)}_s \ge \sum_{k=1}^{n} \lambda^{(2)}_k x^{(2)}_{ks}, \quad s = 1, \dots, q; \qquad (2.14)$$
$$x_s \ge x^{(1)}_s + x^{(2)}_s, \quad s = 1, \dots, q; \qquad (2.15)$$
$$\lambda^{(2)}_k \ge 0, \quad k = 1, \dots, n\}. \qquad (2.16)$$

The overall NDEA technical efficiency estimate at a fixed point $(x, z, y)$ is the solution to the linear programming problem:
$$\widehat{TE}^N \equiv \widehat{TE}^N(x, z, y) = \max_{\theta, \lambda^{(1)}_1, \dots, \lambda^{(1)}_n, \lambda^{(2)}_1, \dots, \lambda^{(2)}_n}\{\theta : (x, \theta y) \in \hat{T}^N\}. \qquad (2.17)$$

³We thank an anonymous referee for pointing out that constraints (2.9) and (2.13) can be written as $\sum_{k=1}^{n} (\lambda^{(1)}_k - \lambda^{(2)}_k) z_{kt} \ge 0$, $t = 1, \dots, m$, which is similar to Model (8) in Chen et al. (2010, p. 141).
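The programme (2.17) is again a linear programme, now with two sets of intensity variables. A minimal Python sketch follows, assuming the input split $(x^{(1)}, x^{(2)})$ and the intermediate product $z$ are observed data (so constraint (2.15) is satisfied by construction); the function name and toy interface are ours.

```python
import numpy as np
from scipy.optimize import linprog

def ndea_output_te(x1_0, x2_0, z0, y1_0, y2_0, X1, X2, Z, Y1, Y2):
    """Overall two-stage NDEA estimate in the spirit of (2.8)-(2.17).

    Only final outputs are radially scaled by theta, as in (2.17).
    Decision vector: [theta, lam1 (n), lam2 (n)].
    """
    n = X1.shape[0]
    nv = 1 + 2 * n
    c = np.zeros(nv)
    c[0] = -1.0                      # maximise theta (linprog minimises)

    def row(th, l1, l2):
        r = np.zeros(nv)
        r[0] = th
        r[1:1 + n] = l1
        r[1 + n:] = l2
        return r

    rows, rhs = [], []
    for l in range(Y1.shape[1]):     # (2.8): theta*y1 <= Y1' lam1
        rows.append(row(y1_0[l], -Y1[:, l], 0.0)); rhs.append(0.0)
    for l in range(Y2.shape[1]):     # (2.12): theta*y2 <= Y2' lam2
        rows.append(row(y2_0[l], 0.0, -Y2[:, l])); rhs.append(0.0)
    for t in range(Z.shape[1]):      # (2.9): z0 <= Z' lam1
        rows.append(row(0.0, -Z[:, t], 0.0)); rhs.append(-z0[t])
    for t in range(Z.shape[1]):      # (2.13): Z' lam2 <= z0
        rows.append(row(0.0, 0.0, Z[:, t])); rhs.append(z0[t])
    for s in range(X1.shape[1]):     # (2.10): X1' lam1 <= x1_0
        rows.append(row(0.0, X1[:, s], 0.0)); rhs.append(x1_0[s])
    for s in range(X2.shape[1]):     # (2.14): X2' lam2 <= x2_0
        rows.append(row(0.0, 0.0, X2[:, s])); rhs.append(x2_0[s])

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * nv)
    return res.x[0]
```

Setting `lam1 == lam2` (by replacing the two blocks with one) would collapse this to the conventional DEA programme, illustrating the degeneracy noted below.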
It is important to note that if the two sub-processes are not differentiated, meaning the non-negative intensities of sub-process 1 and sub-process 2 are equivalent ($\lambda^{(1)}_k = \lambda^{(2)}_k$, $k = 1, \dots, n$), then the NDEA model is equivalent to the conventional DEA model.⁴ Thus, the DEA model can be viewed as a restricted, or degenerate, version of this NDEA model.
3. Testing for the equality of means and distributions of technical efficiency
scores
To measure the overall technical efficiency, a researcher can either use (2.7) in DEA or (2.17) in
NDEA. In this case, DEA does not take into account the intermediate product z. A researcher, in
some circumstances, might prefer to include the intermediate product z in the estimation of the
individual efficiency scores in the DEA framework by treating it as an input or a final output. DEA, in this case, misspecifies the technology process. Further discussion of the similarities and discrepancies between NDEA and DEA models can be found in Kao (2014).

⁴Another potential alternative for testing the equivalence of the NDEA and DEA models is to test the equality of $\lambda^{(1)}_k = \lambda^{(2)}_k$, $k = 1, \dots, n$. Investigating this alternative is a research question in itself and so is beyond the scope of this paper. We are grateful to an anonymous referee for the insightful suggestion.
If a particular network structure of the technology is not essential, then both the DEA and NDEA models should, up to a certain level of statistical noise, yield the same or similar results. Otherwise, DEA becomes restrictive, whereas NDEA is the appropriate approach, and so the resulting technical efficiency scores should differ between the approaches in a statistical sense. Our goal is to develop formal hypothesis tests to verify the equivalence of the estimates obtained from the two models. In the following sub-sections, we describe the hypothesis tests for the equality of the means and of the distributions of the technical efficiencies.
3.1. Testing the equality of the mean technical efficiency scores between NDEA and DEA models
The first hypothesis we consider is:
$$H_0 : E(\widehat{TE}^N) = E(\widehat{TE}^D) \quad \text{against} \quad H_1 : E(\widehat{TE}^N) \ne E(\widehat{TE}^D),$$
where $E(\widehat{TE}^N)$ and $E(\widehat{TE}^D)$ stand for the means of the estimated efficiency scores calculated from the NDEA and DEA models, respectively.
We are interested in how far, in a statistical sense, the quantity $RD = E(\widehat{TE}^N)/E(\widehat{TE}^D)$ deviates from unity. We can infer it from its estimator
$$\widehat{RD} = \frac{\frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^N(x_k, z_k, y_k)}{\frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^D(x_k, y_k)},$$
where $n$ is the number of observations.⁵ For simplicity, we henceforth refer to $\widehat{TE}^N_k$ and $\widehat{TE}^D_k$ as the individual ($k$) technical efficiencies, computed from NDEA ($\widehat{TE}^N(x_k, z_k, y_k)$) and DEA ($\widehat{TE}^D(x_k, y_k)$), respectively.
The hypothesis test is developed in the spirit of Simar and Wilson (2002) for testing hypotheses of returns to scale in the nonparametric context, and of Simar and Zelenyuk (2007) in the context of testing the equality of aggregate technical efficiency scores. The null hypothesis is rejected if the bootstrap-based p-value is smaller than the chosen level of significance $\alpha$ (e.g., 5%).
⁵Other possible test statistics include the ratio of medians or the ratio of trimmed means.
3.2. Testing the equality of the density distributions of technical efficiency scores between NDEA and
DEA models
The second hypothesis we consider is:
$$H_0 : f^N(v) = f^D(v) \quad \text{against} \quad H_1 : f^N(v) \ne f^D(v) \text{ on a set of positive measure},$$
where $f^N(\cdot)$ and $f^D(\cdot)$ are the probability density functions of the technical efficiency scores from NDEA and DEA, respectively.
To test this hypothesis, Li (1996, 1999) considered the integrated squared difference criterion
$$I \equiv \int \big(f^N(v) - f^D(v)\big)^2\, dv = \int f^N(v)\, dF^N(v) + \int f^D(v)\, dF^D(v) - \int f^D(v)\, dF^N(v) - \int f^N(v)\, dF^D(v), \qquad (3.1)$$
where $f^j(\cdot)$ and $F^j(\cdot)$ are the unknown density and distribution functions for each approach $j$ ($j = N, D$).
Using a central limit theorem from Hall (1984), Li (1996) shows that a consistent and asymptotically normal estimator of $I$ is obtained by replacing the unknown distributions $F^j(\cdot)$ and the unknown densities $f^j(\cdot)$ with the corresponding empirical distribution functions $\hat{F}^j(\cdot)$ and nonparametric kernel density estimators $\hat{f}^j(\cdot)$, respectively:
$$\hat{F}^j(v) \equiv \frac{1}{n}\sum_{k=1}^{n} I(v^j_k \le v), \quad j = N, D, \qquad (3.2)$$
and
$$\hat{f}^j(v) \equiv \frac{1}{n h_j}\sum_{k=1}^{n} K\Big(\frac{v - v^j_k}{h_j}\Big), \qquad (3.3)$$
where $I(A)$ is the indicator function, which yields 1 if the statement $A$ is true and 0 otherwise; $h_j$ is a bandwidth such that $h_j \to 0$ and $n h_j \to \infty$ as $n \to \infty$; $K$ is an appropriate kernel function; and $v$ is a point at which the density is estimated. The choice of the kernel is not as crucial as the choice of the bandwidth. Here we choose the Gaussian kernel and select the bandwidth by the least squares cross-validation proposed by Bowman (1984).
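The kernel estimator (3.3) and the least squares cross-validation bandwidth can be sketched as follows; this is a minimal grid-search implementation under the Gaussian-kernel choice above, with illustrative function names of our own.

```python
import numpy as np

def gauss(u):
    """Standard normal (Gaussian) kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(v, points, h):
    """Kernel density estimator fhat evaluated at `points`, eq. (3.3)."""
    v = np.asarray(v, float)
    return gauss((points[:, None] - v[None, :]) / h).mean(axis=1) / h

def lscv_bandwidth(v, grid=None):
    """Least squares cross-validation (Bowman 1984): minimise
    int fhat^2 - (2/n) * sum_i fhat_{-i}(v_i) over a grid of h."""
    v = np.asarray(v, float)
    n = len(v)
    d = v[:, None] - v[None, :]
    if grid is None:
        s = v.std(ddof=1)
        grid = np.linspace(0.1 * s, 2.0 * s, 60)
    best_h, best = grid[0], np.inf
    for h in grid:
        # integral of fhat^2: two Gaussian kernels convolve to a N(0, 2h^2) density
        term1 = gauss(d / (np.sqrt(2.0) * h)).sum() / (n**2 * h * np.sqrt(2.0))
        # leave-one-out cross term (zero the i = k pairs)
        off = gauss(d / h)
        np.fill_diagonal(off, 0.0)
        term2 = 2.0 * off.sum() / (n * (n - 1) * h)
        if term1 - term2 < best:
            best, best_h = term1 - term2, h
    return best_h
```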
Li (1996) notes that, in most Monte Carlo experiments, the statistic $\hat{I}$ without the diagonal terms performs better than the statistic with the diagonal terms. The statistic without the diagonal terms is defined as:
$$\hat{I}^{nd}_{n,h} = \frac{1}{h n (n-1)}\sum_{i=1}^{n}\sum_{\substack{k=1 \\ k \ne i}}^{n} \Big[ K\Big(\frac{v^N_i - v^N_k}{h}\Big) + K\Big(\frac{v^D_i - v^D_k}{h}\Big) - K\Big(\frac{v^D_i - v^N_k}{h}\Big) - K\Big(\frac{v^N_i - v^D_k}{h}\Big) \Big], \qquad (3.4)$$
where the bandwidth $h = \min\{h_N, h_D\}$.
Li (1996, p. 265) shows that the limiting distribution of $\hat{I}^{nd}_{n,h}$ in (3.4) is standard normal after appropriate ‘standardization’:
$$\hat{L}^{nd}_{n,h} \equiv \frac{n h^{1/2}\, \hat{I}^{nd}_{n,h}}{\hat{\sigma}_h} \xrightarrow{d} N(0, 1), \qquad (3.5)$$
where
$$\hat{\sigma}^2_h = \frac{2}{h n^2}\sum_{i=1}^{n}\sum_{k=1}^{n} \Big[ K\Big(\frac{v^N_i - v^N_k}{h}\Big) + K\Big(\frac{v^D_i - v^D_k}{h}\Big) + K\Big(\frac{v^D_i - v^N_k}{h}\Big) + K\Big(\frac{v^N_i - v^D_k}{h}\Big) \Big] \times \int K^2(v)\, dv.$$
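The statistic (3.4), together with the standardization (3.5), is straightforward to compute directly. A minimal Python sketch, specialised to the Gaussian kernel chosen above (for which $\int K^2(v)\,dv = 1/(2\sqrt{\pi})$); the function names are ours.

```python
import numpy as np

def gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def li_statistic(vN, vD, h):
    """Standardised Li statistic, eqs. (3.4)-(3.5), with a Gaussian kernel.

    vN, vD: two samples of (smoothed) efficiency scores of equal size n;
    h: bandwidth, e.g. h = min{h_N, h_D}.
    """
    vN, vD = np.asarray(vN, float), np.asarray(vD, float)
    n = len(vN)

    def ksum(a, b, drop_diag):
        K = gauss((a[:, None] - b[None, :]) / h)
        if drop_diag:
            np.fill_diagonal(K, 0.0)   # remove the i = k terms
        return K.sum()

    # eq. (3.4): statistic without the diagonal terms
    I_nd = (ksum(vN, vN, True) + ksum(vD, vD, True)
            - ksum(vD, vN, True) - ksum(vN, vD, True)) / (h * n * (n - 1))
    # variance estimate; for the Gaussian kernel, int K^2 = 1/(2*sqrt(pi))
    s = (ksum(vN, vN, False) + ksum(vD, vD, False)
         + ksum(vD, vN, False) + ksum(vN, vD, False)) / (h * n**2)
    sigma2 = 2.0 * s / (2.0 * np.sqrt(np.pi))
    # eq. (3.5): n * h^(1/2) * I / sigma_h  ->  N(0, 1) under H0
    return n * np.sqrt(h) * I_nd / np.sqrt(sigma2)
```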
For the application of the Li test in our context, one of the main concerns is a discontinuity issue: there is ‘spurious’ mass at unity because, by the construction of DEA, at least one observation lies on the frontier, while in reality the probability of such an event is zero.⁶ To address this issue, we follow Algorithm II suggested by Simar and Zelenyuk (2006), where the observations equal to unity are smoothed away from the boundary by adding a small noise of an order of magnitude smaller than the noise of estimation. The order of magnitude is suggested by the rate of convergence of the DEA estimator. The smoothing is made as follows:
$$\widehat{TE}^{*j}_k = \begin{cases} \widehat{TE}^j_k + \epsilon^j_k & \text{if } \widehat{TE}^j_k = 1, \\ \widehat{TE}^j_k & \text{otherwise}, \end{cases} \qquad (3.6)$$
where $j = N, D$; $\epsilon^j_k \sim U(0, \min\{n^{-2/(p+q+1)}, a - 1\})$; and $a$ is the $\alpha$ quantile of the empirical distribution of $\{\widehat{TE}^j_k > 1\}$.

⁶The discontinuity issue in NDEA might not be as serious as it is in DEA because its constraints are less stringent than those of the conventional DEA model.
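In code, the smoothing step (3.6) might look like the following minimal sketch (function name ours; the guard for the degenerate case of no scores above unity is our addition):

```python
import numpy as np

def smooth_unit_scores(te, p, q, alpha=0.05, rng=None):
    """Smoothing (3.6): scores equal to 1 receive uniform noise whose upper
    bound is of smaller order than the DEA estimation noise (rate n^(-2/(p+q+1)))."""
    rng = np.random.default_rng() if rng is None else rng
    te = np.asarray(te, float).copy()
    n = len(te)
    above = te[te > 1.0]
    # a: alpha quantile of the scores strictly above unity
    a = np.quantile(above, alpha) if above.size else 1.0 + n ** (-2.0 / (p + q + 1))
    upper = min(n ** (-2.0 / (p + q + 1)), a - 1.0)
    mask = te == 1.0
    te[mask] = 1.0 + rng.uniform(0.0, upper, mask.sum())
    return te
```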
The null hypothesis is rejected if the bootstrap-based p-value is less than the chosen level of significance $\alpha$. The bootstrap-based procedures for the p-values are described in Section 4.
4. Bootstrap-based algorithms
Bootstrap techniques, first proposed by Efron (1979), have been applied in many studies across various disciplines. In productivity and efficiency analysis, bootstrap techniques were first introduced
by Simar (1992). Two main bootstrap approaches are the smoothed bootstrap and the subsampling
bootstrap. For about a decade, the smoothed bootstrap (Simar and Wilson (1998, 2000a,b)) was
the dominant approach in practice. Although no formal proofs were given at the time, it was an
important tool for researchers to perform statistical inference on estimated efficiencies such as
confidence intervals, standard errors, etc. About a decade later, Kneip et al. (2008) provided a
formal proof for the consistency of the subsampling bootstrap (m out of n subsampling bootstrap,
m < n), and that the smoothed bootstrap is its approximation. We therefore use the subsampling
bootstrap here, which is consistent for DEA and therefore also consistent for NDEA under the null
hypotheses.
The bootstrap-based tests are designed in the spirit of the works of Simar and Wilson (2002), Simar and Zelenyuk (2006) and Simar and Zelenyuk (2007). We carefully adapt the existing statistical methods to the new context of the NDEA framework, while encouraging future research to deliver the formal proofs (e.g., consistency under the alternative hypotheses).
To facilitate further discussion, let us assume that a data set $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$ is generated from a data generating process (DGP), which is completely characterized by the knowledge of the technology set $P(x)$ and of the probability density function $f(\mathbf{x}, y)$, $\mathbf{x} = (x, z)$. Let $\mathcal{P}$ denote the DGP; then $\mathcal{P} = (P(x), f(\cdot, \cdot))$, with estimate $\hat{\mathcal{P}} = (\hat{P}(x), \hat{f}(\cdot, \cdot))$. In general, to bootstrap an unknown parameter of interest $\beta$, which in this paper is $RD$ or $L^{nd}_{n,h}$, we first estimate $\beta$ from the original sample $S_n$ and denote the estimate $\hat{\beta}$. Under the null hypotheses, we randomly draw (with replacement) a new bootstrap sample $S^*_m = \{(x^*_k, z^*_k, y^*_k) : k = 1, \dots, m;\ m \le n\}$ from $S_n$, and then compute the bootstrap estimate of $\beta$, denoted $\hat{\beta}^*$, from the bootstrap sample $S^*_m$ using the same formula as for the estimator $\hat{\beta}$. For the case of DEA, the naive bootstrap is inconsistent (see Simar and Wilson (2000a, p. 786) for further discussion), while a subsampling bootstrap is consistent (under the null hypotheses) for any subsample size $m = n^{\kappa}$, $0 < \kappa < 1$.

If the bootstrap is consistent, then the relationship between the bootstrap estimates ($\hat{\beta}^*$) and the original estimates ($\hat{\beta}$) will mimic the relationship between the original estimates ($\hat{\beta}$) and the true unobserved values ($\beta$). In our case, this implies that:
$$(\hat{\beta}^* - \hat{\beta}) \mid \hat{\mathcal{P}} \overset{asy}{\sim} (\hat{\beta} - \beta) \mid \mathcal{P}.$$
In order to preserve the structure of NDEA, we draw a bootstrap sample simultaneously across
all sub-processes of DMUs. That is, we resample on the ‘triples’ (x, z, y) across the n DMUs, and
separate the drawn observations back into sub-processes for each DMU.
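The ‘triple’ resampling just described can be implemented in a few lines; the following sketch (function name ours) keeps each DMU's inputs, intermediates and outputs aligned so the network structure is preserved:

```python
import numpy as np

def draw_subsample(X, Z, Y, m, rng):
    """Draw m 'triples' (x_k, z_k, y_k) with replacement across the n DMUs.

    X: (n, q), Z: (n, m_z), Y: (n, p).  The same row index is used for all
    three arrays, so a drawn DMU carries its sub-process data together.
    """
    idx = rng.integers(0, X.shape[0], size=m)
    return X[idx], Z[idx], Y[idx]
```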
Here we provide two bootstrap algorithms (Algorithm A and Algorithm B) to obtain the bootstrap-based p-value under the null hypotheses. Algorithm A is the more comprehensive procedure, as the bootstrap estimates of the statistic $\beta$ are computed from bootstrap samples randomly drawn from the original sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$. In Algorithm B, on the other hand, the bootstrap estimates of the statistic $\beta$ are computed from bootstrap samples drawn from the estimated technical efficiency scores computed from the DEA or NDEA models for the original sample $S_n$. In other words, Algorithm A requires solving the linear programming problems to obtain the individual technical efficiencies at each bootstrap iteration, while Algorithm B does not.
4.1. Bootstrap algorithms for testing the equality of the mean technical efficiency scores between
NDEA and DEA models
[1.] For each observation in the sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$, compute the individual technical efficiency scores using (2.7) and (2.17), and label them as $\{\widehat{TE}^D_k \equiv \widehat{TE}^D(x_k, y_k) : k = 1, \dots, n\}$ and $\{\widehat{TE}^N_k \equiv \widehat{TE}^N(x_k, z_k, y_k) : k = 1, \dots, n\}$, respectively.

[2.] Construct the test statistic $\widehat{RD} = \hat{E}(\widehat{TE}^N)/\hat{E}(\widehat{TE}^D)$, where $\hat{E}(\widehat{TE}^j) = \frac{1}{n}\sum_{k=1}^{n} \widehat{TE}^j_k$, $j = N, D$.

[3.] Algorithm A:

• Draw two bootstrap samples of size $m$ out of $n$ ($m < n$) by drawing triples $(x_k, z_k, y_k)$ independently and with replacement from the original sample $S_n$, and label them as $S^{*D}_{mb} = \{(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $S^{*N}_{mb} = \{(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, where $b$ denotes a bootstrap iteration, $b = 1, \dots, B$.

• Compute the bootstrap estimates of $\widehat{TE}^D_k$ and $\widehat{TE}^N_k$ via DEA using equation (2.7) for the bootstrap samples $S^{*D}_{mb}$ and $S^{*N}_{mb}$, and label them as $\{\widehat{TE}^{*D}_{kb} \equiv \widehat{TE}^{*D}(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $\{\widehat{TE}^{*N}_{kb} \equiv \widehat{TE}^{*N}(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, respectively.

Algorithm B: Draw two samples of size $n$ out of $n$ by resampling from $\{\widehat{TE}^D_k : k = 1, \dots, n\}$, which are calculated in step [1.], and label them as $\{\widehat{TE}^{*D}_{kb} : k = 1, \dots, n\}$ and $\{\widehat{TE}^{*N}_{kb} : k = 1, \dots, n\}$.

[4.] Construct the bootstrap estimate of $\widehat{RD}$ from the bootstrap samples $\{\widehat{TE}^{*D}_{kb}\}$ and $\{\widehat{TE}^{*N}_{kb}\}$, and denote it $\widehat{RD}^*_b = \hat{E}(\widehat{TE}^{*N}_b)/\hat{E}(\widehat{TE}^{*D}_b)$, where $\hat{E}(\widehat{TE}^{*j}_b) = \frac{1}{\tilde{n}}\sum_{k=1}^{\tilde{n}} \widehat{TE}^{*j}_{kb}$, $j = N, D$, with $\tilde{n} = m < n$ for Algorithm A and $\tilde{n} = n$ for Algorithm B.

[5.] Repeat steps [3.]-[4.] $B$ times, $b = 1, \dots, B$.

[6.] Construct the bootstrap-based p-value for the two-tailed hypothesis test:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{|\widehat{RD}^*_b| > |\widehat{RD}|\}, \qquad (4.1)$$
where $I$ is an indicator function, which yields the value of 1 if $|\widehat{RD}^*_b| > |\widehat{RD}|$ is true, and 0 otherwise.
If a researcher is interested in performing a one-tailed hypothesis test (e.g., $H_0 : RD = 1$ against $H_1 : RD > 1 \iff E(\widehat{TE}^N) > E(\widehat{TE}^D)$), then the bootstrap-based p-value is:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{\widehat{RD}^*_b > \widehat{RD}\}. \qquad (4.2)$$
Also, one can use other statistics (e.g., the absolute difference, the mean of squared deviations, etc.) to test for the equivalence of the models; the proposed algorithms are easily adapted to this context.

In addition, rather than using a bootstrap-based p-value, one can use a bootstrap-based confidence interval for the $\widehat{RD}$ statistic, in a similar fashion to Simar and Zelenyuk (2007). The null hypothesis is rejected if the bootstrap-based confidence interval does not include unity.
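For illustration, Algorithm B for the mean test can be sketched as follows (function name ours). Note that the two bootstrap samples are both drawn from the DEA scores of step [1.], which imposes the null of equivalence; Algorithm A would instead re-solve the linear programmes on each $m$-out-of-$n$ subsample of the data triples.

```python
import numpy as np

def rd_pvalue_algB(teN, teD, B=1000, rng=None):
    """Algorithm B for the two-tailed mean test, eq. (4.1).

    teN, teD: individual efficiency scores from step [1.].
    Returns the bootstrap-based p-value.
    """
    rng = np.random.default_rng() if rng is None else rng
    teN, teD = np.asarray(teN, float), np.asarray(teD, float)
    n = len(teD)
    rd_hat = teN.mean() / teD.mean()          # step [2.]
    exceed = 0
    for _ in range(B):                        # steps [3.]-[5.]
        # both n-out-of-n resamples come from the DEA scores (null imposed)
        rd_b = (teD[rng.integers(0, n, n)].mean()
                / teD[rng.integers(0, n, n)].mean())
        if abs(rd_b) > abs(rd_hat):           # step [6.], eq. (4.1)
            exceed += 1
    return exceed / B
```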
4.2. Bootstrap algorithm for testing the equality of distributions of efficiency scores between NDEA
and DEA models
[1.] For each observation in the sample $S_n = \{(x_k, z_k, y_k) : k = 1, \dots, n\}$, compute the individual technical efficiency scores using (2.7) and (2.17), and label them as $\{\widehat{TE}^D_k \equiv \widehat{TE}^D(x_k, y_k) : k = 1, \dots, n\}$ and $\{\widehat{TE}^N_k \equiv \widehat{TE}^N(x_k, z_k, y_k) : k = 1, \dots, n\}$, respectively.

[2.] Smooth $\{\widehat{TE}^j_k : k = 1, \dots, n\}$, $j = N, D$, by adding small noise to the estimated technical efficiency scores equal to unity in the following way:
$$\widehat{TE}^{*j}_k = \begin{cases} \widehat{TE}^j_k + \epsilon^j_k & \text{if } \widehat{TE}^j_k = 1, \\ \widehat{TE}^j_k & \text{otherwise}, \end{cases} \qquad (4.3)$$
where $\epsilon^j_k \sim U(0, \min\{n^{-2/(p+q+1)}, a - 1\})$ and $a$ is the $\alpha$ quantile of the empirical distribution of $\{\widehat{TE}^j_k > 1\}$.

[3.] Compute the Li (1996) statistic $\hat{L}^{nd}_{n,h}$ via equation (3.5) using the data $\widehat{TE}^{*j}_k$ from step [2.], with the bandwidth chosen such that $h = \min\{h_N, h_D\}$.

[4.] Algorithm A:

• Draw two bootstrap samples of size $m$ out of $n$ ($m < n$) by drawing triples $(x_k, z_k, y_k)$ independently and with replacement from the original sample $S_n$, and label them as $S^{*D}_{mb} = \{(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $S^{*N}_{mb} = \{(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, where $b$ denotes a bootstrap iteration, $b = 1, \dots, B$.

• Compute the bootstrap estimates of $\widehat{TE}^D_k$ and $\widehat{TE}^N_k$ via DEA (i.e., equation (2.7)) for the bootstrap samples $S^{*D}_{mb}$ and $S^{*N}_{mb}$, and label them as $\{\widehat{TE}^{*D}_{kb} \equiv \widehat{TE}^{*D}(x^{*D}_{kb}, y^{*D}_{kb}) : k = 1, \dots, m\}$ and $\{\widehat{TE}^{*N}_{kb} \equiv \widehat{TE}^{*N}(x^{*N}_{kb}, z^{*N}_{kb}, y^{*N}_{kb}) : k = 1, \dots, m\}$, respectively.

Algorithm B: Draw two samples of size $n$ out of $n$ by resampling from $\{\widehat{TE}^D_k : k = 1, \dots, n\}$, which are calculated in step [1.], and label them as $\{\widehat{TE}^{*D}_{kb} : k = 1, \dots, n\}$ and $\{\widehat{TE}^{*N}_{kb} : k = 1, \dots, n\}$.⁷

[5.] Smooth $\widehat{TE}^{*j}_{kb}$ if $\widehat{TE}^{*j}_{kb} = 1$, $j = N, D$, by adding small noise as in (4.3), and denote the results $\{\widehat{TE}^{**D}_{kb} : k = 1, \dots, \tilde{n}\}$ and $\{\widehat{TE}^{**N}_{kb} : k = 1, \dots, \tilde{n}\}$, with $\tilde{n} = m < n$ for Algorithm A and $\tilde{n} = n$ for Algorithm B.

[6.] Compute the bootstrap-based Li statistic via (3.5), with $h^*_b = \min\{h^*_{Nb}, h^*_{Db}\}$, using the data $\widehat{TE}^{**N}_{kb}$, $\widehat{TE}^{**D}_{kb}$ obtained in step [5.], and label it $\hat{L}^{*nd}_{n,h^*_b}$.

[7.] Repeat steps [4.]-[6.] $B$ times, $b = 1, \dots, B$.

[8.] Compute the bootstrap-based p-value:
$$p = \frac{1}{B}\sum_{b=1}^{B} I\{|\hat{L}^{*nd}_{n,h^*_b}| > |\hat{L}^{nd}_{n,h}|\},$$
where $I$ is an indicator function, which yields the value of 1 if $|\hat{L}^{*nd}_{n,h^*_b}| > |\hat{L}^{nd}_{n,h}|$ is true, and 0 otherwise.

⁷Li (1999, pp. 204-205) concludes from Monte Carlo simulations that the Li test performs better when drawing from one sample than when drawing from a pooled sample.
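Algorithm B for the distribution test can be sketched in the same spirit (function names ours). For brevity the sketch uses a fixed bandwidth $h$ and skips the re-smoothing of step [5.]; the procedure above re-selects $h^*_b = \min\{h^*_{Nb}, h^*_{Db}\}$ and re-smooths on each draw.

```python
import numpy as np

def _gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def _li(vN, vD, h):
    # standardised Li statistic without diagonal terms, eqs. (3.4)-(3.5)
    n = len(vN)
    def ksum(a, b, drop):
        K = _gauss((a[:, None] - b[None, :]) / h)
        if drop:
            np.fill_diagonal(K, 0.0)
        return K.sum()
    I_nd = (ksum(vN, vN, True) + ksum(vD, vD, True)
            - ksum(vD, vN, True) - ksum(vN, vD, True)) / (h * n * (n - 1))
    s = (ksum(vN, vN, False) + ksum(vD, vD, False)
         + ksum(vD, vN, False) + ksum(vN, vD, False)) / (h * n**2)
    sigma2 = 2.0 * s / (2.0 * np.sqrt(np.pi))   # times int K^2 = 1/(2*sqrt(pi))
    return n * np.sqrt(h) * I_nd / np.sqrt(sigma2)

def li_pvalue_algB(teN, teD, h, B=500, rng=None):
    """Algorithm B for the distribution test (step [8.])."""
    rng = np.random.default_rng() if rng is None else rng
    teN, teD = np.asarray(teN, float), np.asarray(teD, float)
    n = len(teD)
    L_hat = _li(teN, teD, h)                     # step [3.]
    exceed = 0
    for _ in range(B):                           # steps [4.]-[7.]
        # both n-out-of-n resamples come from the DEA scores (null imposed)
        bN = teD[rng.integers(0, n, n)]
        bD = teD[rng.integers(0, n, n)]
        if abs(_li(bN, bD, h)) > abs(L_hat):
            exceed += 1
    return exceed / B
```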
5. Empirical illustration
For the empirical illustration, we use the data from Kao and Hwang (2008) in the study of non-life
insurance companies in Taiwan. The data include two primary inputs (operation and insurance
expenses), two intermediate products (direct written and reinsurance premiums), and two final
outputs (underwriting and investment profit) for 24 non-life insurance companies. The whole
production process of non-life insurance can be divided into two sub-processes. In the first sub-
process, premium acquisition, clients pay direct written premiums and other insurance companies
pay reinsurance premiums. In the second sub-process, profit generation, premiums are invested in
a portfolio to earn profit.
Conventional DEA can be used to measure the technical efficiencies for each sub-process as
well as the overall technical efficiencies. The estimation of the overall technical efficiency scores,
however, does not take into account the performance of the two sub-processes. Kao and Hwang
(2008) accounted for the internal structures in the estimation of the overall technical efficiency
scores by using a version of NDEA, which they called the ‘relational two-stage’ DEA. The ques-
tion is whether these efficiency scores calculated using NDEA are statistically different from those
calculated using a conventional DEA.
5.1. Hypothesis test for the equality of the means of technical efficiency scores between NDEA and
DEA models
The hypothesis tested here is:
$$H_0 : E(\widehat{TE}^N) = E(\widehat{TE}^D) \quad \text{against} \quad H_1 : E(\widehat{TE}^N) \ne E(\widehat{TE}^D).$$
The results of the two-tailed hypothesis test are summarized in Table 5.1.⁸ The computed RD statistics are all greater than 1 (i.e., 1.63 for the overall efficiency, 1.09 for sub-process I, and 1.17 for sub-process II). These results indicate that the efficiency scores obtained from the NDEA and DEA models are not equivalent. Furthermore, our statistical testing results suggest that only the means of the overall technical efficiencies are statistically different between the NDEA and DEA models, while this is not the case for the two sub-processes (p-values > 0.05).

⁸The results of the one-tailed hypothesis test are presented in Table A.1.

Table 5.1: Summary results of the two-tailed hypothesis test for the equality of means of efficiency scores between NDEA and DEA models

                                    Algorithm A                       Algorithm B
  Process             RD statistic  p-value   p-value   p-value    p-value
                                    (m = 12)  (m = 14)  (m = 16)   (m = 24)
  The whole process       1.63       0.01      0.01      0.01       0.00
  Sub-process I           1.09       0.08      0.07      0.07       0.06
  Sub-process II          1.17       0.21      0.20      0.18       0.10

  Note: The number of bootstrap iterations is 1000 for each algorithm and for each hypothesis test in the empirical illustration.
For Algorithm A, we try subsample sizes $m \in \{12, 14, 16\}$ to investigate the sensitivity to the choice of $m$.⁹ The conclusions are robust to the choice of the subsampling bootstrap size $m$ and across the two algorithms, at the significance level $\alpha = 5\%$.
5.2. Hypothesis test for the equality of distributions of technical efficiency scores between NDEA and
DEA models
We now turn to investigating the entire probability density function, which contains more infor-
mation regarding the distribution of the efficiency scores. These density functions of the overall
process and its sub-processes are depicted in Figure A.1. Here we use the Silverman (1986) re-
flection method to account for the bounded support of the distribution of the technical efficiency
scores (see Simar and Zelenyuk (2006)).
Panel (a) of Figure A.1 shows that the distributions of the overall technical efficiencies appear
distinct between the two models. The higher mass at unity for the DEA model is due to the
9For simplicity of presentation, we use various choices of m to investigate the sensitivity, since the paper is not
about the choice of m. To choose the optimal m, one can use the approach suggested by Politis et al. (2001) adapted
to DEA by Simar and Wilson (2011).
more stringent constraints of this approach relative to NDEA. In addition, the spread of the technical inefficiencies in the NDEA model is much larger than that in the DEA model. While the NDEA technical efficiencies are in the range $[1, 7]$, most of the DEA estimates are concentrated in the range $[1, 3]$, with only a few observations yielding inefficiencies greater than 3. Panels (b) and (c), on the other hand, show that the probability density functions of the two sub-processes appear rather similar.
The significance of these differences can be assessed with the following hypothesis test:
H0 : f^N(v) = f^D(v) against H1 : f^N(v) ≠ f^D(v) on a set of positive measure.
The estimated Li statistics and their bootstrap-based p-values are presented in Table 5.2. We
find statistical evidence suggesting that the density functions of the overall technical efficiency
scores are statistically different, whereas those of the sub-processes are not (bootstrap-based
p-values are larger than 0.1).10
Table 5.2: Summary results of the hypothesis test for the equality of probability density
functions of efficiency scores between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             Li statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       2.78       0.04      0.05      0.05       0.07
Sub-process I           1.18       0.34      0.37      0.32       0.22
Sub-process II          0.17       0.82      0.82      0.86       0.81
Note: Gaussian kernel density estimation is used, with the bandwidth chosen by the least
squares cross-validation of Bowman (1984).
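For intuition, the core of a Li-type test is a kernel estimate of the integrated squared difference between the two densities, calibrated here by a generic pooled-sample bootstrap. This sketch deliberately omits the studentization of the full Li (1996) statistic (harmless when p-values come from the bootstrap, which is scale-invariant) and is not the paper's exact Algorithm A or B:

```python
import numpy as np

def _gauss(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def integrated_sq_diff(x, y, h):
    """Kernel estimate of the integrated squared difference between the
    densities of samples x and y (equal size n); this is the core of a
    Li-type statistic, with normalizing constants deliberately dropped."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    kxx = _gauss((x[:, None] - x[None, :]) / h).sum()
    kyy = _gauss((y[:, None] - y[None, :]) / h).sum()
    kxy = _gauss((x[:, None] - y[None, :]) / h).sum()
    return (kxx + kyy - 2.0 * kxy) / (n**2 * h)

def li_type_pvalue(x, y, h, B=200, seed=0):
    """Generic pooled-sample bootstrap p-value for H0: f_x = f_y."""
    rng = np.random.default_rng(seed)
    t0 = integrated_sq_diff(x, y, h)
    pooled = np.concatenate([x, y])
    n = len(x)
    hits = sum(
        integrated_sq_diff(rng.choice(pooled, n), rng.choice(pooled, n), h) >= t0
        for _ in range(B)
    )
    return hits / B
```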
We also consider the hypothesis test of the equality of cumulative distributions using the
Kolmogorov-Smirnov (KS) test. The hypothesis tested is:
H0 : F^N(v) = F^D(v) against H1 : F^N(v) ≠ F^D(v).
10We also use the Sheather and Jones (1991) bandwidth, and the conclusions remain the same.
The KS statistic is defined as KS_T = (n/2)^{1/2} sup_v |F̂^N(v) − F̂^D(v)|, where the empirical
distribution F̂^j(·), j = N, D, is computed as (1/n) Σ_{i=1}^{n} I(x_i ≤ v), and I(A) is an indicator
function which takes the value 1 if the statement A is true, and 0 otherwise. An appealing
property of the KS test is that its test statistic is independent of the distribution F^j(v).
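The empirical CDFs and the KS statistic can be computed in a few lines; since both empirical CDFs are step functions, the supremum only needs to be checked at the pooled data points:

```python
import numpy as np

def ks_statistic(x, y):
    """KS_T = (n/2)^(1/2) * sup_v |F_hat^N(v) - F_hat^D(v)| for two
    samples of equal size n; the sup is attained at a pooled data point."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    n = x.size
    grid = np.concatenate([x, y])                     # candidate sup points
    fx = np.searchsorted(x, grid, side="right") / n   # F_hat^N at each point
    fy = np.searchsorted(y, grid, side="right") / n   # F_hat^D at each point
    return np.sqrt(n / 2.0) * np.abs(fx - fy).max()
```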
A direct application of the KS test to our context may not be appropriate, as there is a dis-
continuity issue here. Analogous to the adapted Li test in Section 4, we smooth the observations
yielding technical efficiencies equal to one by adding small noise, as in (3.6), and then use
Algorithms A and B to obtain the bootstrap-based p-values. The results are presented in Table
5.3. The same conclusions are drawn for the cumulative distribution functions of the overall
process and its sub-processes.
Table 5.3: Summary results of the hypothesis test for the equality of cumulative distributions of efficiency scores
between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             KS statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       1.44       0.03      0.05      0.04       0.06
Sub-process I           1.01       0.40      0.32      0.52       0.30
Sub-process II          0.72       0.71      0.80      0.62       0.70
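The pre-smoothing step applied before the Li and KS tests can be sketched as follows. The exact perturbation rule is given by equation (3.6) of the paper; the uniform draw on (ε/2, ε) below is only an illustrative stand-in:

```python
import numpy as np

def smooth_ties_at_one(scores, eps=1e-6, seed=0):
    """Break the point mass at TE = 1 by perturbing only the efficient
    units, keeping all scores strictly above the boundary afterwards.
    Stand-in for the paper's smoothing rule (3.6)."""
    rng = np.random.default_rng(seed)
    s = np.asarray(scores, float).copy()
    at_one = np.isclose(s, 1.0)                      # units estimated as efficient
    s[at_one] = 1.0 + rng.uniform(eps / 2.0, eps, at_one.sum())
    return s
```

Because the perturbation is of order ε, it removes the ties at unity (and hence the discontinuity in the empirical CDF) without materially altering the efficiency estimates.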
Overall, we can conclude that the first moment and the entire distributions of the overall tech-
nical efficiency scores are statistically different between the NDEA and DEA models in the study
of non-life insurance companies in Taiwan. There is no evidence suggesting that the differences
of the technical efficiencies of the two sub-processes are statistically significant across the NDEA
and DEA models. The NDEA model seems to provide more meaningful results of the overall
technical efficiencies than the DEA model. For instance, both the DEA and NDEA models show
that the Chung Kuo company does not perform efficiently in sub-process II (its technical
efficiency scores are greater than 1 in both models; see Table A.2). Yet the DEA model suggests
that the overall process is perfectly efficient (a score equal to 1), whereas the NDEA model
indicates that some inefficiency exists, since the overall efficiency score is 1.6.
6. Conclusion
In this paper, we developed two bootstrap algorithms for testing the equality of the first moment
and of the entire distributions of technical efficiency scores estimated using the DEA and NDEA
approaches. The algorithms can also be adapted to perform a pairwise comparison between two
NDEA models to explore the sensitivity of the NDEA estimates across various network structures
or assumptions applied to production technology.
The proposed algorithms, when applied to the non-life insurance companies in Taiwan, pro-
vided fairly robust results. Specifically, all tests implied that the overall technical efficiencies
between the NDEA and DEA models are statistically different with regard to their means and
distributions, but this is not the case for the two sub-processes (premium acquisition and profit
generation).
Acknowledgement
We thank the editor and an anonymous referee for their constructive comments and suggestions
that helped to improve this paper substantially.
References
Avkiran, N. K., 2009. Opening the black box of efficiency analysis: An illustration with UAE banks. Omega 37 (4),
930 – 941.
Banker, R. D., Charnes, A., Cooper, W. W., 1984. Some models for estimating technical and scale inefficiencies in
data envelopment analysis. Management Science 30 (9), 1078–1092.
Bowman, A. W., 1984. An alternative method of cross-validation for the smoothing of density estimates. Biometrika
71 (2), 353–360.
Charnes, A., Cooper, W., Golany, B., Halek, R., Klopp, G., Schmitz, E., Thomas, D., 1986. Two phase data envelop-
ment analysis approach to policy evaluation and management of army recruiting activities: Tradeoffs between joint
services and army advertising. Research Report CCS No. 532, Center for Cybernetic Studies, The University of
Texas at Austin.
Charnes, A., Cooper, W., Rhodes, E., 1978. Measuring the efficiency of decision making units. European Journal of
Operational Research 2 (6), 429 – 444.
Chen, Y., Cook, W. D., Zhu, J., 2010. Deriving the DEA frontier for two-stage processes. European Journal of Oper-
ational Research 202 (1), 138 – 142.
Efron, B., 1979. Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7 (1), 1–26.
Färe, R., Grosskopf, S., 1996. Intertemporal production frontiers: With dynamic DEA. New York: Kluwer Academic
Publishers.
Färe, R., Grosskopf, S., 2000. Network DEA. Socio-Economic Planning Sciences 34 (1), 35–49.
Färe, R., Grosskopf, S., Roos, P., 1996. Network and production models of Swedish pharmacies. Tech. rep., Mimeo.
Färe, R., Primont, D., 1995. Multi-output production and duality: Theory and applications. New York: Kluwer Aca-
demic Publishers.
Färe, R., Whittaker, G., 1995. An intermediate input model of dairy production using complex survey data. Journal of
Agricultural Economics 46 (2), 201–213.
Farrell, M. J., 1957. The measurement of productive efficiency. Journal of the Royal Statistical Society. Series A
(General) 120 (3), 253–290.
Hall, P., 1984. Central limit theorem for integrated square error of multivariate nonparametric density estimators.
Journal of Multivariate Analysis 14 (1), 1 – 16.
Kao, C., 2009a. Efficiency decomposition in network data envelopment analysis: A relational model. European Jour-
nal of Operational Research 192 (3), 949 – 962.
Kao, C., 2009b. Efficiency measurement for parallel production systems. European Journal of Operational Research
196 (3), 1107 – 1112.
Kao, C., 2014. Network data envelopment analysis: A review. European Journal of Operational Research 239 (1), 1 –
16.
Kao, C., Hwang, S.-N., 2008. Efficiency decomposition in two-stage data envelopment analysis: An application to
non-life insurance companies in Taiwan. European Journal of Operational Research 185 (1), 418 – 429.
Kneip, A., Simar, L., Wilson, P. W., 2008. Asymptotics and consistent bootstraps for DEA estimators in nonparametric
frontier models. Econometric Theory 24 (6), 1663–1697.
Li, Q., 1996. Nonparametric testing of closeness between two unknown distribution functions. Econometric Reviews
15 (3), 261–274.
Li, Q., 1999. Nonparametric testing the similarity of two unknown density functions: local power and bootstrap
analysis. Journal of Nonparametric Statistics 11 (1-3), 189–213.
Politis, D. N., Romano, J. P., Wolf, M., 2001. On the asymptotic theory of subsampling. Statistica Sinica 11, 1105–
1124.
Sheather, S. J., Jones, M. C., 1991. A reliable data-based bandwidth selection method for kernel density estimation.
Journal of the Royal Statistical Society. Series B (Methodological) 53 (3), 683–690.
Shephard, R. W., 1970. Theory of cost and production functions. Princeton, NJ: Princeton University Press.
Silverman, B. W., 1986. Density estimation for statistics and data analysis. London: Chapman and Hall.
Simar, L., 1992. Estimating efficiencies from frontier models with panel data: A comparison of parametric, non-
parametric and semi-parametric methods with bootstrapping. Journal of Productivity Analysis 3 (1-2), 171–203.
Simar, L., Wilson, P. W., 1998. Sensitivity analysis of efficiency scores: How to bootstrap in nonparametric frontier
models. Management Science 44 (1), 49–61.
Simar, L., Wilson, P. W., 2000a. A general methodology for bootstrapping in non-parametric frontier models. Journal
of Applied Statistics 27 (6), 779–802.
Simar, L., Wilson, P. W., 2000b. Statistical inference in nonparametric frontier models: The state of the art. Journal of
Productivity Analysis 13 (1), 49–78.
Simar, L., Wilson, P. W., 2002. Non-parametric tests of returns to scale. European Journal of Operational Research
139 (1), 115 – 132.
Simar, L., Wilson, P. W., 2011. Inference by the m out of n bootstrap in nonparametric frontier models. Journal of
Productivity Analysis 36 (1), 33–53.
Simar, L., Zelenyuk, V., 2006. On testing equality of distributions of technical efficiency scores. Econometric Reviews
25 (4), 497–522.
Simar, L., Zelenyuk, V., 2007. Statistical inference for aggregates of Farrell-type efficiencies. Journal of Applied
Econometrics 22 (7), 1367–1394.
Tone, K., Tsutsui, M., 2009. Network DEA: A slacks-based measure approach. European Journal of Operational
Research 197 (1), 243 – 252.
Tone, K., Tsutsui, M., 2010. Dynamic DEA: A slacks-based measure approach. Omega 38 (3-4), 145 – 156.
Tone, K., Tsutsui, M., 2014. Dynamic DEA with network structure: A slacks-based measure approach. Omega 42 (1),
124 – 131.
Tsutsui, M., Goto, M., 2009. A multi-division efficiency evaluation of U.S. electric power companies using a weighted
slacks-based measure. Socio-Economic Planning Sciences 43 (3), 201–208.
Appendix A. Figures and Tables
[Figure A.1 about here: kernel density estimates of the DEA and NDEA efficiency scores, shown in
three panels: (a) The Whole Process, (b) Sub-process I, (c) Sub-process II.]
Figure A.1: Kernel density estimation of the overall technical efficiencies and of the technical efficiencies of sub-
process I and sub-process II.
Table A.1: Summary results of the one-tailed hypothesis test for the equality of means
of efficiency scores between NDEA and DEA models
                               Algorithm A                      Algorithm B
Process             RD statistic  p-value   p-value   p-value    p-value
                                  (m=12)    (m=14)    (m=16)     (m=24)
The whole process       1.63       0.01      0.01      0.01       0.00
Sub-process I           1.09       0.08      0.07      0.07       0.06
Sub-process II          1.17       0.21      0.21      0.20       0.10
Note: The hypothesis tested here is:
H0 : E(TE^N) = E(TE^D) against H1 : E(TE^N) > E(TE^D),
where TE^j, j = N, D, denotes the estimated technical efficiency score under the respective model.
Table A.2: The overall and sub-process technical efficiency scores of the 24 non-life insurance companies in Taiwan
across the NDEA and DEA models
                      NDEA       NDEA             NDEA              DEA        DEA              DEA
Company             (Overall)  (Sub-process I)  (Sub-process II)  (Overall)  (Sub-process I)  (Sub-process II)
Taiwan Fire 1.43 1.01 1.42 1.02 1.01 1.40
Chung Kuo 1.60 1.00 1.60 1.00 1.00 1.59
Tai Ping 1.45 1.45 1.00 1.01 1.45 1.00
China Mariners 3.29 1.38 2.38 2.05 1.38 2.31
Fubon 1.30 1.20 1.08 1.00 1.19 1.00
Zurich 2.57 1.04 2.47 1.68 1.04 2.47
Taian 3.62 1.49 2.42 2.13 1.33 1.86
Ming Tai 3.63 1.51 2.41 2.41 1.38 1.96
Central 4.48 1.00 4.48 3.06 1.00 3.43
The First 2.15 1.16 1.85 1.28 1.16 1.48
Kuo Hua 6.10 1.55 3.95 3.54 1.35 3.06
Union 1.32 1.00 1.32 1.00 1.00 1.32
Shingkong 4.81 1.49 3.23 2.84 1.23 1.84
South China 3.46 1.49 2.32 2.13 1.38 1.93
Cathay Century 1.63 1.00 1.63 1.02 1.00 1.42
Allianz President 3.12 1.13 2.77 2.12 1.10 2.60
Newa 2.78 1.59 1.74 1.58 1.38 1.00
AIU 3.86 1.26 3.07 2.34 1.26 2.68
North America 2.43 1.00 2.43 1.22 1.00 2.41
Federal 1.83 1.07 1.71 1.07 1.07 1.11
Royal and Sunalliance 4.98 1.37 3.65 3.00 1.33 3.58
Asia 1.70 1.70 1.00 1.00 1.70 1.00
AXA 2.38 1.19 2.00 1.67 1.18 1.79
Mitsui Sumitomo 7.42 2.33 3.18 3.89 1.00 2.98
Mean 3.06 1.31 2.30 1.88 1.21 1.97