An extended data envelopment analysis for the decision-making

Based on the CCR model, we propose an extended data envelopment analysis to evaluate the efficiency of decision making units with historical input and output data. The contributions of the work are threefold. First, the input and output data of the evaluated decision making unit vary over time, and a time series method is used to analyze and predict them. Second, there are many sample decision making units, which are divided into several ordered sample standards in terms of production strategy, and the constraint condition consists of one of the sample standards. The efficiency is then determined by the efficiency relationship between the evaluated decision making unit and the sample decision making units in the constraint condition. Third, to reduce the computational complexity, we introduce an algorithm based on the binary search tree to choose the sample standard whose behavior is similar to that of the evaluated decision making unit. Finally, we provide two numerical examples to illustrate the proposed model.


Introduction
In conventional data envelopment analysis (DEA) models, such as the CCR model, named after Charnes et al. [], and the BCC model, proposed by Banker et al. [], the inputs and outputs are assumed to be precise. In addition, the constraint condition consists of the evaluated decision making units (DMUs).
In practical studies, the input and output data of the evaluated DMUs frequently vary over multiple time periods (time series data), and it is important to analyze the change of efficiency over time. For example, in the evaluation of travel agencies, transportation, ticket price, accommodation, and labor are usually regarded as the inputs, whereas profits and tourist satisfaction are the outputs. The inputs and outputs are affected by various influential factors, such as tourism policy, infrastructure investment, the level of starred hotels, annual per-capita income, and the level of economic development. Since these influential factors vary over time, the inputs, outputs, and efficiencies of travel agencies vary over time accordingly. Given the current upsurge of interest in DEA, it is surprising that dynamic DEA has attracted very little attention. The only methods we know of in this area are the Malmquist Productivity Index (MPI) and window analysis. MPI was originally proposed by Caves et al. [] to estimate changes in the overall productivity growth of each DMU over a two-year period by calculating the efficiency value. To deal with the productivity changes of DMUs over time, Färe et al. [] constructed a DEA-based MPI by combining the efficiency measurement of Farrell [] with the productivity measurement of Caves et al. Window analysis, proposed by Charnes et al. [], is adopted to overcome the constraint of limited DMUs and is useful for detecting the trend of DMUs over a long period with large inputs and outputs. Since then, some improved approaches to the DEA-based MPI or window analysis have been proposed [-]. However, both the DEA-based MPI and window analysis models share one shortcoming: neither predicts the efficiency of the evaluated DMU.
In many practical evaluation problems, efficiency of every evaluated DMU in a particular period may not be contrasted with the evaluated DMUs, but rather with sample standards determined by manufacturing parameters. The purpose of the contrast is not only to evaluate efficiency, but also to locate the standard with which the evaluated DMU has similar behavior. For instance, there are many grade standards for the evaluation of travel agencies.
Travel agencies from the same region can be evaluated by the same standards separately, and those from different regions should not be evaluated by the same standards because of regional disparities. The standards should be formulated by the regional parameters.
Taking outbound tourism as an example, it is an important part for travel agency business in developed regions, but it may not be contained in the travel agency business in some developing regions. Clearly, it is unreasonable that the outbound tourism is included in input measures to evaluate the travel agencies from different regions, and then grade standards in different regions should be formulated in terms of different manufacturing parameters.
With these preparations, we could then use different standards to evaluate the level of travel agencies. However, in the existing DEA models, the constraint condition consists of the evaluated DMUs. Furthermore, we categorize DEA models into two types. Without such considerations, scholars will not be tempted to invest the effort in analyzing and predicting the development trend of the DMUs by contrasting them with grade standards. In fact, managers can analyze and predict the development trend of input and output data based on historical data and then determine the level by contrasting with sample standards. Furthermore, to maximize profit and ensure proper resource allocation, efforts can be made to improve the influential factors. This scenario is therefore worth considering.
The rest of this paper is organized as follows. Section  introduces the CCR model and the time series method. In Section , an extended DEA model is proposed. In Section , the relationship between DEA efficiency and the production frontier is illustrated. In Section , the algorithm to determine sample standards is described. In Section , two numerical examples are given to illustrate the proposed model. At the end of the paper, some conclusions are drawn.

CCR model
As one of the most frequently used DEA models, the CCR model (Charnes et al. []) supposes that there are n DMUs and that each DMU consumes the same types of inputs and produces the same types of outputs. Let m and s be the numbers of inputs and outputs, respectively. All inputs and outputs are assumed to be positive. The multiple inputs and multiple outputs of each DMU are aggregated into a single virtual input and a single virtual output. The efficiency of the evaluated DMU is obtained as the ratio of its virtual output to its virtual input, subject to the condition that this ratio is not greater than 1 for every DMU. The corresponding model is as follows:

$$\max \frac{u^T y_{j_0}}{v^T x_{j_0}} \quad \text{s.t.} \quad \frac{u^T y_j}{v^T x_j} \le 1, \; j = 1, \ldots, n, \quad u \ge 0, \; v \ge 0,$$

where $x_j = (x_{1j}, \ldots, x_{mj})^T$ and $y_j = (y_{1j}, \ldots, y_{sj})^T$ are the input and output vectors of the jth DMU, DMU$_{j_0}$ is the evaluated DMU, and u and v are the weight column vectors of outputs and inputs, respectively. The constraint condition consists of all the evaluated DMUs. By applying the Charnes-Cooper transformation (Charnes and Cooper []) to the fractional model above, the following equivalent linear model is obtained:

$$\max \mu^T y_{j_0} \quad \text{s.t.} \quad \omega^T x_j - \mu^T y_j \ge 0, \; j = 1, \ldots, n, \quad \omega^T x_{j_0} = 1, \quad \omega \ge 0, \; \mu \ge 0,$$

where $\omega$ and $\mu$ denote the transformed weight vectors of inputs and outputs. The optimal objective values of both models fall into the range (0, 1]. The relationship between DEA efficiency and the optimal objective value (Cooper et al. []) can be stated as follows.
Definition (DEA efficient) If the optimal objective value of the evaluated DMU is equal to 1 and there is at least one optimal solution in which the optimal weight vectors of inputs and outputs are greater than 0, then the evaluated DMU is DEA efficient.
Definition (weak DEA efficient) If the optimal objective value of the evaluated DMU is equal to 1 and there is no optimal solution in which the optimal weight vectors of inputs and outputs are greater than 0, then the evaluated DMU is weak DEA efficient.
Definition (DEA inefficient) If the optimal objective value of the evaluated DMU is less than 1, then the evaluated DMU is DEA inefficient.
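As a small illustration of the ratio form, consider the special case of one input and one output, which is an assumption made here only to keep the example free of a linear programming solver. In that case the constraint forces u/v to be at most the smallest x_j / y_j, so the optimal value reduces to the productivity of the evaluated DMU divided by the best productivity among all DMUs. The function name and the data below are hypothetical.

```python
def ccr_efficiency_1x1(dmus, j0):
    """CCR efficiency of DMU j0 when every DMU has one input x and one output y.

    dmus is a list of (x, y) pairs with x > 0 and y > 0.
    With scalar weights, max u*y0/(v*x0) subject to u*yj/(v*xj) <= 1 for all j
    equals (y0/x0) divided by the largest ratio yj/xj.
    """
    best_ratio = max(y / x for x, y in dmus)
    x0, y0 = dmus[j0]
    return (y0 / x0) / best_ratio


# Hypothetical data: (input, output) for three DMUs.
dmus = [(2.0, 1.0), (4.0, 4.0), (5.0, 3.0)]
scores = [ccr_efficiency_1x1(dmus, j) for j in range(len(dmus))]
```

Because the evaluated DMU is itself one of the constraints, no score can exceed 1, which matches the range stated above. With scalar weights the optimal weights are always positive, so the weak efficient case cannot arise in this simplified setting.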

Time series method
A discrete ordered set of observed data that changes over time is called a time series and is denoted by $y(t) = \{y(t_1), y(t_2), \ldots, y(t_i), \ldots\}$, where $y(t_i)$ is the observed data at the moment $t_i$. Time series models can be divided into nonparametric and parametric models. The nonparametric model estimates the covariance or the spectrum without assuming that the process has a particular structure. By contrast, the parametric model assumes that the underlying stationary stochastic process has a certain structure. The time series model is used to extract meaningful statistics and other characteristics of the observed data and then to predict the development trend. It is usually composed of three parts, namely,

$$Y(t) = f(t) + p(t) + X(t),$$

where f(t) is the trend term, which reflects the changing trend of Y(t), p(t) is the periodic term, which reflects the cyclical change of Y(t), and X(t) is the stochastic term, which reflects the influence of random factors on Y(t). Here we assume that X(t) is a normal stationary stochastic process (Chatfield [], Gershenfeld []).
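One common parametric choice is an autoregressive model, and the first example later in the paper fits a predicting equation with two lagged terms. The following is a minimal sketch of such a fit, assuming lags 1 and 2 and ordinary least squares via the normal equations; the function names, the coefficients (1.0, 0.5, -0.2), and the generated series are made up for illustration and are not the paper's data or estimation procedure.

```python
def solve_linear_system(a, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_ar2(series):
    """Least-squares fit of x(t) = c + a1*x(t-1) + a2*x(t-2); returns [c, a1, a2]."""
    rows = [(1.0, series[t - 1], series[t - 2]) for t in range(2, len(series))]
    targets = [series[t] for t in range(2, len(series))]
    # Normal equations: (R^T R) beta = R^T y.
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(gram, rhs)


def forecast_next(series, coeffs):
    """One-step-ahead prediction from the two most recent observations."""
    c, a1, a2 = coeffs
    return c + a1 * series[-1] + a2 * series[-2]


# Hypothetical history generated from known coefficients (no noise).
true_c, true_a1, true_a2 = 1.0, 0.5, -0.2
history = [1.0, 2.0]
for _ in range(10):
    history.append(true_c + true_a1 * history[-1] + true_a2 * history[-2])

coeffs = fit_ar2(history)
prediction = forecast_next(history, coeffs)
```

On real data the fit would be followed by significance checks on the estimated parameters, as the first example describes; this sketch omits them.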

An extended DEA model
In this section, based on the fundamental CCR model, we propose an extended DEA model. In the model, the input and output data of the evaluated DMUs are predicted by the time series method based on the historical data. The constraint condition consists of one of the sample standards determined by the production strategy. There are many sample DMUs, which are further divided into several ordered sample standards in terms of manufacturing parameters. Moreover, sample DMUs in the same standard have similar behavior. It is important to stress here that the evaluated DMU does not belong to the set of sample DMUs. The extended DEA model is as follows:

$$\max \frac{u^T y_E(t)}{v^T x_E(t)} \quad \text{s.t.} \quad \frac{u^T \bar{y}_{kh}(s)}{v^T \bar{x}_{kh}(s)} \le 1, \; h = 1, \ldots, \bar{n}_k, \quad u \ge 0, \; v \ge 0,$$

where $x_E(t)$ and $y_E(t)$ are the input and output vectors of the evaluated DMU at the moment t, every element of $x_E(t)$ and $y_E(t)$ is nonnegative, and $\bar{x}_{kh}(s)$ and $\bar{y}_{kh}(s)$, which are determined in terms of the manufacturing parameter s, are the input and output vectors of the hth sample DMU in the kth standard. There are $\bar{m}$ standards, and the kth ($k = 1, \ldots, \bar{m}$) standard is composed of $\bar{n}_k$ sample DMUs. The efficiency of the evaluated DMU is obtained as the maximum of the ratio of weighted outputs to weighted inputs, and the ratio is less than or equal to 1 for every sample DMU from the standard regarded as the constraint condition. The corresponding linear programming model is

$$\max \mu^T y_E(t) \quad \text{s.t.} \quad \omega^T \bar{x}_{kh}(s) - \mu^T \bar{y}_{kh}(s) \ge 0, \; h = 1, \ldots, \bar{n}_k, \quad \omega^T x_E(t) = 1, \quad \omega \ge 0, \; \mu \ge 0,$$

where $\omega$ and $\mu$ denote the input and output weight vectors after the Charnes-Cooper transformation. It is easy to see that the evaluated DMU is not contained in the constraint condition. The optimal objective values $u^T y_E(t) / v^T x_E(t)$ and $\mu^T y_E(t)$ of the two models above vary in $(0, +\infty)$. The superefficiency definition of the proposed model is given as follows.
Definition (DEA superefficient) An evaluated DMU is DEA superefficient if its optimal objective value is higher than 1 and there is at least one optimal solution in which the optimal weight vectors of inputs and outputs are greater than 0.
To determine the efficiency of the evaluated DMUs in the proposed model, the following theorems are given by considering the relationship between DEA efficiency and the optimal objective value.
Theorem. If the evaluated DMU is DEA superefficient by the kth standard, then the optimal objective value is greater than 1.
Theorem. The evaluated DMU is DEA efficient by all the combinations of sample DMUs in the kth standard if and only if there exists an optimal objective value that is equal to 1 and the optimal weight vectors of inputs and outputs are greater than 0.
Theorem. The evaluated DMU is weak DEA efficient by the kth standard if and only if the optimal objective value is equal to 1 and there does not exist any optimal solution in which the optimal weight vectors of inputs and outputs are greater than 0.

Theorem. The evaluated DMU is DEA inefficient by all the combinations of sample DMUs in the kth standard if and only if all optimal objective values are less than 1.
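To make the definitions concrete, the following sketch evaluates a DMU against one sample standard in the special case of two inputs and one output, with all data assumed positive. After the normalisation of the input weights, those weights lie on a segment described by one parameter lam in [0, 1], and the objective becomes the lower envelope of one line per sample DMU, so the optimum can be found by checking the endpoints and the pairwise intersections instead of calling a linear programming solver. The function names, the tolerance, and the data are hypothetical, and the whole standard is used as the constraint condition rather than the combinations of sample DMUs discussed in the theorems.

```python
from itertools import combinations


def evaluate_against_standard(eval_dmu, sample_dmus, tol=1e-9):
    """Return (optimal value, whether an optimal solution has positive weights).

    eval_dmu and each sample DMU are (input1, input2, output), all positive.
    """
    x1e, x2e, ye = eval_dmu
    a1, a2 = x1e / ye, x2e / ye  # inputs per unit of output
    # Sample DMU h contributes the line lam * p + (1 - lam) * q.
    lines = [((x1 / y) / a1, (x2 / y) / a2) for x1, x2, y in sample_dmus]

    def envelope(lam):
        return min(lam * p + (1.0 - lam) * q for p, q in lines)

    candidates = [0.0, 1.0]
    for (p1, q1), (p2, q2) in combinations(lines, 2):
        denom = (p1 - q1) - (p2 - q2)
        if abs(denom) > tol:
            lam = (q2 - q1) / denom
            if 0.0 < lam < 1.0:
                candidates.append(lam)

    values = [envelope(lam) for lam in candidates]
    best = max(values)
    optimal = [lam for lam, v in zip(candidates, values) if v >= best - tol]
    interior = any(0.0 < lam < 1.0 for lam in optimal)
    both_ends = 0.0 in optimal and 1.0 in optimal
    return best, interior or both_ends


def classify(value, has_positive_weights, tol=1e-9):
    """Map the optimal value and the weight condition to the definitions."""
    if value < 1.0 - tol:
        return "inefficient"
    if value > 1.0 + tol:
        if has_positive_weights:
            return "superefficient"
        return "above 1, weights not all positive"
    return "efficient" if has_positive_weights else "weak efficient"


# Hypothetical sample standard: (input1, input2, output).
standard = [(1.0, 4.0, 1.0), (2.0, 2.0, 1.0), (4.0, 1.0, 1.0)]
labels = {
    name: classify(*evaluate_against_standard(dmu, standard))
    for name, dmu in {
        "A": (1.0, 1.0, 1.0),
        "B": (2.0, 2.0, 1.0),
        "C": (1.0, 8.0, 1.0),
        "D": (4.0, 4.0, 1.0),
    }.items()
}
```

In this toy data DMU A uses half the inputs of the best sample DMU, B coincides with a sample DMU, C matches the smallest input 1 but wastes input 2, and D uses twice the inputs of B, which is why the four labels differ.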

The relationship between DEA efficiency and the production frontier
In this section, we consider the case of two inputs and a single output to show the relationship between DEA efficiency and the production frontier. DEA efficiency is unaffected when the inputs and output of a DMU are changed by the same proportion, so we can rescale the inputs and output of each DMU until the output data of the evaluated DMUs and sample DMUs are equal. Next, the coordinate system is established with input 1 and input 2 as the x and y coordinate axes. In this coordinate system, the closer a DMU is to the coordinate origin, the higher its efficiency.
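The scaling argument can be written out in one line. The symbol c for the common scaling factor is introduced here only for illustration; with a single output, choosing c = 1/y maps every DMU to a point in the input plane with unit output.

```latex
\[
\frac{u^{T}(c\,y)}{v^{T}(c\,x)} = \frac{c\,u^{T}y}{c\,v^{T}x} = \frac{u^{T}y}{v^{T}x},
\qquad c > 0,
\qquad\text{so}\qquad
(x_{1}, x_{2}, y) \;\longmapsto\; \left(\frac{x_{1}}{y},\, \frac{x_{2}}{y},\, 1\right).
\]
```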

DEA efficiency and the production frontier in the conventional DEA models
In the CCR model, the constraint condition consists of all the DMUs, and the production frontier is spanned by efficient DMUs and weak efficient DMUs. As shown in Figure , the production frontier is spanned by DMUs S  , S  , S  , E  , and E  . DMUs S  , S  , S  , and E  are DEA efficient, DMU E  is weak DEA efficient, and DMU E is DEA inefficient. In the superefficiency model, the constraint condition consists of all the DMUs except the evaluated DMU, and the production frontier is spanned by all the corresponding DMUs without the DMU under evaluation. If the evaluated DMU is located on the weak production frontier, then it is weak efficient (Yu et al. [], Wei et al. []). If the evaluated DMU is located on the efficient production frontier, then it is efficient; that is, there exist positive optimal weight vectors of inputs and outputs such that the efficiency of the evaluated DMU is equal to the efficiency of a certain sample DMU and the optimal objective value is equal to 1 (Doyle and Green [], Salo and Punkka []). If the evaluated DMU is located in the production possibility set but is not located on the production frontier, then it is inefficient. Otherwise, the evaluated DMU is superefficient. For example, the evaluated DMU S  is superefficient in Figure (a), the evaluated DMUs E  and E  are efficient and weak efficient, respectively, in Figure (b), and the evaluated DMU E is inefficient in Figure (c).

DEA efficiency and the production frontier in the proposed model
Unlike conventional DEA models, in the proposed model, the constraint condition consists of one of the sample standards, and the production frontier is spanned by different combinations of sample DMUs from the constraint condition. To illustrate this, now we suppose that there are seven evaluated DMUs E  -E  and that the kth standard is the constraint condition consisting of nine sample DMUs S  -S  .
The consequence of all the combinations of sample DMUs in the kth standard is easily understood in terms of Figure . The production frontier of sample DMUs S  -S  is shown in the shaded portion. We can see that the most efficient production frontier is spanned by sample DMUs S  -S  , the least efficient production frontier is spanned by sample DMUs S  -S  , and the other production frontiers that are spanned by different combinations of sample DMUs S  -S  are located between the most and least efficient production frontiers.
The evaluated DMU E  is closer to the coordinate origin than the most efficient production frontier, so the efficiency of DMU E  is higher than that of every sample DMU in the constraint condition. In such a case, the constraint condition consists of all sample DMUs of the kth standard, the optimal objective value is greater than 1, and the evaluated DMU E  is DEA superefficient.
If the evaluated DMU is located between the most and least efficient production frontiers, then there is at least one optimal objective value equal to 1 for the evaluated DMU, as for DMUs E  , E  , E  , E  , and E  . In Figure , it is easy to see that DMU E  is DEA superefficient by the least efficient production frontier S  -S  and DEA inefficient by the most efficient production frontier S  -S  ; hence DMU E  is DEA efficient (i.e., the optimal objective value of the evaluated DMU E  is equal to 1, and the optimal weight vectors of inputs and outputs are greater than 0) by a combination of the kth standard. Similarly, there is an optimal objective value of DMU E  equal to 1, and the optimal weight vectors of inputs and outputs are greater than 0. Next, we consider the evaluated DMU E  , which is weak DEA efficient relative to the least efficient production frontier, so there is an optimal objective value of DMU E  equal to 1. A similar analysis applies to the evaluated DMU E  ; there is also an optimal objective value equal to 1. Finally, the evaluated DMU E  can be expressed as a linear combination of DMUs S  and S  ; thus it is DEA efficient by the least efficient production frontier, and there is an optimal objective value equal to 1. The evaluated DMU E  is located in the production possibility set of the least efficient production frontier but not on that frontier. Hence DMU E  is DEA inefficient, and its optimal objective value is less than 1. In fact, in the proposed model, the determined production frontier is spanned by the difference between the production possibility sets of the most and least efficient production frontiers.

Algorithm
Sample DMUs are divided into $\bar{m}$ ordered sample standards, and it is important to stress that $\bar{m}$ may be very large. In such a case, the computational complexity is high if every standard is checked one by one. Unlike the published works that address the problem of reducing computational complexity in DEA (e.g., Dulá [], Dulá and Thrall []), we introduce an algorithm based on a binary search tree in the proposed model to determine the sample standard with which the evaluated DMU has similar behavior.
If the evaluated DMU is superefficient by the tth standard, then the constraint condition should turn to the standard with higher efficiency. If the evaluated DMU is weak efficient or inefficient by the tth standard, then the constraint condition should turn to the standard with lower efficiency. Otherwise, the evaluated DMU is located in the tth standard, that is, the evaluated DMU has similar behavior with the tth standard. Let [x] denote the greatest integer not greater than x. The algorithm is summarized as follows.
Step : Star with dividing the sample DMUs intom ordered sample standards. Let t  = , t  =m.
Step : Use the t  th and t  th standards to evaluate the evaluated DMU.
  If the evaluated DMU is DEA efficient by the t  th (or t  th) standard, then stop: the evaluated DMU has similar behavior with the t  th (or t  th) standard.
  If the evaluated DMU is DEA superefficient (or inefficient) by both the t  th and the t  th standards, then stop: the evaluated DMU does not have similar behavior with any of the sample standards.
  Else turn to Step .
Step :
  If the evaluated DMU is DEA superefficient by the t  th standard and inefficient by the t  th standard, then
    If the evaluated DMU is DEA inefficient by the t  th standard, then t  ← t  ; turn to Step .
    If the evaluated DMU is DEA superefficient by the t  th standard, then t  ← t  ; turn to Step .
    Else turn to Step .
  If the evaluated DMU is DEA superefficient by the t  th standard and weak efficient by the t  th standard, then stop: the evaluated DMU has similar behavior with the (t  -)th standard.
  If the evaluated DMU is DEA inefficient by the t  th standard and superefficient by the t  th standard, then
    If the evaluated DMU is DEA superefficient by the t  th standard, then t  ← t  ; turn to Step .
    If the evaluated DMU is DEA inefficient by the t  th standard, then t  ← t  ; turn to Step .
    Else turn to Step .
  If the evaluated DMU is DEA weak efficient by the t  th standard and superefficient by the t  th standard, then stop: the evaluated DMU has similar behavior with the (t  + )th standard.
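The halving idea behind the steps can be sketched as an ordinary binary search. This is a simplified reading, not the step list above: it assumes the standards are numbered in order of increasing efficiency, moves up on a superefficient verdict, moves down on a weak efficient or inefficient verdict, and stops on an efficient verdict. The `verdict` callback stands in for solving the extended DEA model against one standard; the band-based toy verdict and its numbers are hypothetical and exist only to make the sketch runnable.

```python
def locate_standard(num_standards, verdict):
    """Binary search over standards 1..num_standards.

    verdict(k) returns "superefficient", "efficient", "weak efficient",
    or "inefficient" for the evaluated DMU against the kth standard.
    Returns (index, number of evaluations); index is None if nothing matches.
    """
    low, high = 1, num_standards
    evaluations = 0
    while low <= high:
        mid = (low + high) // 2  # greatest integer not greater than the midpoint
        evaluations += 1
        result = verdict(mid)
        if result == "efficient":
            return mid, evaluations
        if result == "superefficient":
            low = mid + 1   # turn to a standard with higher efficiency
        else:
            high = mid - 1  # weak efficient or inefficient: turn lower
    return None, evaluations


def make_band_verdict(bands, productivity):
    """Toy verdict: standard k covers productivity in [lower, upper)."""
    def verdict(k):
        lower, upper = bands[k - 1]
        if productivity >= upper:
            return "superefficient"
        if productivity < lower:
            return "inefficient"
        return "efficient"
    return verdict


# Hypothetical ordered standards described by productivity bands.
bands = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0), (4.0, 5.0)]
found, used = locate_standard(len(bands), make_band_verdict(bands, 3.5))
missing, _ = locate_standard(len(bands), make_band_verdict(bands, 7.0))
```

Each pass discards about half of the remaining standards, so the number of DEA models solved grows with the logarithm of the number of standards rather than linearly.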

Illustrative examples
In this section, we present two numerical examples to illustrate the proposed model. For simplicity, "sample DMU" will be abbreviated to "SDMU".

The first example
In this example, the data of the evaluated DMU in the last  years are provided in Table .
There are  sample DMUs with two inputs and a single output listed in Table , and the sample DMUs SDMU i , SDMU i , and SDMU i (i = , , , , ) are located in the ith standard. By the proposed model the production status of the evaluated DMU in the th year is analyzed. Firstly, time series method is used to analyze the inputs and outputs. In Figure , all the p values of t-statistics are less than ., and then the original hypothesis whose parameter is  should be rejected, and the three unknown parameters are considered to be significant. The AR() model is suitable for fitting the Input  sequence, and the predicting equation is as follows: x  (t) = . + .x  (t -) -.x  (t -).
Similarly, predicting equations are obtained for Input  and Output, and the DEA model is then formulated from the predicted values.

Table 2 The sample standards

Secondly, the production status of the evaluated DMU in the th year is evaluated by the sample standards. Since the outputs in the constraint condition are interval values, it is impossible to evaluate every value in each interval. The two endpoints of each interval are therefore defined as the pessimistic and optimistic values, respectively (Wang et al. []). For example, a is the pessimistic value, and b is the optimistic value in the interval (a, b] or [a, b]. Since all the points of an interval lie between the endpoints (i.e., the pessimistic and optimistic values), it is reasonable to replace the interval values by the pessimistic and optimistic values. Each sample DMU in the constraint condition is divided into two corresponding sample DMUs based on the pessimistic and optimistic values. The process is given in Table , and the evaluated DMU is located in the fourth sample standard in the next year.
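The endpoint substitution can be sketched as follows: every sample DMU with an interval output is replaced by two sample DMUs that share the same inputs, one carrying the pessimistic endpoint and one the optimistic endpoint. The function name and the numbers are hypothetical and are not taken from the paper's tables.

```python
def split_interval_dmu(inputs, output_interval):
    """Split one sample DMU with interval output (a, b) into two point DMUs."""
    pessimistic_value, optimistic_value = output_interval
    pessimistic = (tuple(inputs), pessimistic_value)
    optimistic = (tuple(inputs), optimistic_value)
    return pessimistic, optimistic


# Hypothetical standard: (inputs, (lower output, upper output)).
interval_standard = [
    ((3.0, 2.0), (10.0, 12.0)),
    ((4.0, 1.5), (10.0, 12.0)),
]

expanded = []
for inputs, interval in interval_standard:
    expanded.extend(split_interval_dmu(inputs, interval))
```

The expanded list then serves as the constraint condition, so a standard with n interval-valued sample DMUs contributes 2n constraints.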

The second example
Strategic groups are widely used in the strategic management of insurance companies; companies in the same group have similar business models or similar combinations of strategies. By analyzing strategic groups, an insurance company can identify its major competitors, assess the competitive situation, and then formulate its production strategy [].

Dividing the sample insurance companies into ordered strategic groups
In this example, we study only property insurance companies. The Appendix gives the overall production status of sample property insurance companies from  to  by the averaging method (the data are collected from Yearbooks of China's Insurance). We assume that the formation of strategic groups is determined according to the following five manufacturing parameters: total number of employees (TNE), fixed assets (FA), sales tax and extra charges (STEC), earned insurance premiums (EIP), and expenses of payments (EP). The first three manufacturing parameters are inputs, and the others are outputs. FA, STEC, EIP, and EP are given in units of million Yuan RMB. As shown in Table , the sample DMUs are divided into six standards (i.e., six sample strategic groups).

Predicting production status of Samsung Fire & Marine Insurance (China) Company Ltd
In this example, the evaluated DMU is Samsung Fire & Marine Insurance (China) Company Ltd. (Samsung F&M). The data of inputs and outputs are shown in Table . Based on the historical production status from  to , the predicted production status of  is given in Table . It is worth noting that Samsung F&M was established in  in China, and thus the data of production status are limited.

Table 7 The evaluation processes and results (columns: Step; Using predicted production status; Using actual production status)

Conclusions
In the conventional DEA model, the inputs and outputs are known exactly, and the constraint condition consists of the evaluated DMUs. However, in many real applications, the observed data of the evaluated DMUs vary over time. The efficiency of every evaluated DMU in a particular period may not be contrasted with the evaluated DMUs, but with sample standards determined by production strategy. Moreover, the development trend of the evaluated DMU, which is an important index for budgetary decision-making and management, often needs to be predicted. In this paper, we proposed an extended DEA model to evaluate the efficiency of DMUs with historical observed data of inputs and outputs. Firstly, based on the historical observed data, we introduced the time series method to analyze and predict the development trend of the evaluated DMUs. Secondly, in the proposed model, there are many sample DMUs, which are divided into several ordered sample standards in terms of manufacturing parameters, and the constraint condition consists of one of the sample standards. Finally, we employed an algorithm based on a binary search tree to determine the constraint condition in order to reduce the computational complexity. An appealing feature of the proposed model is that it applies to decision-making regardless of whether the evaluated DMUs are hospitals, universities, branches of a bank, or other organizations.