Piecewise Linear Virtual Inputs/Outputs in Interval DEA

Author(s):  
Yiannis G. Smirlis ◽  
Dimitris K. Despotis

A recent development in data envelopment analysis (DEA) concerns the introduction of a piecewise linear representation of the virtual inputs and/or outputs as a means to model situations where the marginal value of an output (input) is assumed to diminish (increase) as the output (input) increases. Currently, this approach is limited to crisp data sets. In this paper, the authors extend the piecewise linear approach to interval DEA, i.e. to cases where the input/output data are only known to lie within intervals with given bounds. They define appropriate interval segmentations to implement the piecewise linear forms in conjunction with the interval bounds of the input/output data, and they propose new models compliant with the interval DEA methodology. Finally, they illustrate their developments with an artificial data set.
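The core idea of a piecewise linear virtual output with diminishing marginal value can be sketched in a few lines of Python. The breakpoints and multipliers below are illustrative assumptions, not taken from the paper's models (which determine such multipliers via linear programming):

```python
def virtual_output(y, breakpoints, multipliers):
    """Evaluate a piecewise linear virtual output: each successive segment
    of y is weighted by a smaller multiplier, so the marginal value of the
    output diminishes as the output increases."""
    value, lower = 0.0, 0.0
    for upper, u in zip(breakpoints, multipliers):
        segment = min(y, upper) - lower      # portion of y in this segment
        if segment <= 0:
            break
        value += u * segment
        lower = upper
    return value

def interval_virtual_output(y_lo, y_hi, breakpoints, multipliers):
    """With interval data [y_lo, y_hi], the virtual output is itself an
    interval, obtained by evaluating the piecewise form at both bounds."""
    return (virtual_output(y_lo, breakpoints, multipliers),
            virtual_output(y_hi, breakpoints, multipliers))
```

For example, with segments ending at 10, 20, 30 and non-increasing multipliers 3, 2, 1, an output of 15 has virtual value 3·10 + 2·5 = 40, and an interval output [5, 25] maps to the virtual interval [15, 55].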

2017 ◽  
Vol 24 (4) ◽  
pp. 1052-1064 ◽  
Author(s):  
Yong Joo Lee ◽  
Seong-Jong Joo ◽  
Hong Gyun Park

Purpose – The purpose of this paper is to measure the comparative efficiency of 18 Korean commercial banks in the presence of negative observations and to examine performance differences among them by grouping them according to their market conditions.

Design/methodology/approach – The authors employ two data envelopment analysis (DEA) models: a Banker, Charnes, and Cooper (BCC) model and a modified slacks-based measure of efficiency (MSBM) model, both of which can handle negative data. The BCC model is proven to be translation invariant for inputs or outputs depending on output or input orientation, while the MSBM model is unit invariant in addition to translation invariant. The authors compare results from both models and choose one for interpreting the results.

Findings – Most Korean banks recovered from their worst performance in 2011 and showed similar performance in recent years. Among the three groups (national banks, regional banks, and special banks), the special banks demonstrated superior performance across models and years. In particular, the performance difference between the special banks and the regional banks was statistically significant. The authors conclude that the high performance of the special banks is due to their nationwide market access and ownership type.

Practical implications – This study demonstrates how to analyze and measure the efficiency of entities when variables contain negative observations, using a data set for Korean banks. The authors tried two major DEA models that are able to handle negative data and propose a practical direction for future studies.

Originality/value – Although there are research papers measuring the performance of banks in Korea, all of them have studied efficiency or productivity using positive data sets. However, variables such as net income and growth rates frequently include negative observations in bank data sets. This is the first paper to investigate the efficiency of bank operations in the presence of negative data in Korea.
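Translation invariance is what allows negative observations to be shifted into positive territory before running a model such as BCC. A minimal sketch of such a shift, where the choice of offset is an assumption for illustration and not the authors' procedure:

```python
def translate_nonnegative(values, margin=1.0):
    """Shift a variable containing negative observations by a constant so
    that all values become strictly positive.  For a translation-invariant
    DEA model (e.g. BCC in the appropriate orientation, or MSBM), the
    efficiency scores are unaffected by this shift."""
    shift = max(0.0, -min(values)) + margin  # offset chosen for illustration
    return [v + shift for v in values], shift
```

For instance, net income values [-3, 0, 5] become [1, 4, 9], with the shift of 4 recorded so results can be mapped back to the original scale.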


2011 ◽  
Vol 21 (03) ◽  
pp. 247-263 ◽  
Author(s):  
J. P. FLORIDO ◽  
H. POMARES ◽  
I. ROJAS

In function approximation problems, one of the most common ways to evaluate a learning algorithm is to partition the original data set (input/output data) into two sets: a learning set, used for building models, and a test set, used for genuine out-of-sample evaluation. When the partition into learning and test sets does not take into account the variability and geometry of the original data, it may lead to unbalanced and unrepresentative learning and test sets and, thus, to wrong conclusions about the accuracy of the learning algorithm. How the partitioning is made is therefore a key issue, and it becomes more important when the data set is small, due to the need to reduce the pessimistic effects caused by the removal of instances from the original data set. Thus, in this work, we propose a deterministic data mining approach for distributing a data set (input/output data) into two representative and balanced sets of roughly equal size, taking the variability of the data set into consideration, with the purpose of allowing both a fair evaluation of the learning algorithm's accuracy and reproducible machine learning experiments, which are usually based on random distributions. The sets are generated using a combination of a clustering procedure, especially suited for function approximation problems, and a distribution algorithm which distributes the data set into two sets within each cluster based on a nearest-neighbor approach. In the experiments section, the performance of the proposed methodology is reported in a variety of situations through an ANOVA-based statistical study of the results.
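A toy one-dimensional version of the within-cluster distribution step might look as follows. The real method works on input/output vectors and is preceded by a clustering procedure suited to function approximation, so everything here is a simplified illustration:

```python
def split_cluster(points):
    """Distribute the points of one cluster into two balanced, representative
    sets: repeatedly pair each point with its nearest unassigned neighbour
    and send one member of the pair to each set.  Fully deterministic, so
    the resulting experiments are reproducible."""
    remaining = sorted(points)            # deterministic processing order
    set_a, set_b = [], []
    while len(remaining) >= 2:
        p = remaining.pop(0)
        q = min(remaining, key=lambda r: abs(r - p))   # nearest neighbour
        remaining.remove(q)
        set_a.append(p)
        set_b.append(q)
    if remaining:                         # odd cluster size: leftover point
        set_a.append(remaining.pop())
    return set_a, set_b
```

Because nearest neighbours end up in opposite sets, both halves cover the same regions of the input space; e.g. [1, 2, 10, 11] splits into [1, 10] and [2, 11].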


2015 ◽  
Vol 5 (2) ◽  
pp. 137-148 ◽  
Author(s):  
Jeremy N.V Miles ◽  
Priscillia Hunt

Purpose – In applied psychology research settings, such as criminal psychology, missing data are to be expected. Missing data can cause problems with both biased estimates and lack of statistical power. The paper aims to discuss these issues.

Design/methodology/approach – Recently, sophisticated methods for appropriately dealing with missing data, so as to minimize bias and maximize power, have been developed. In this paper the authors use an artificial data set to demonstrate the problems that can arise with missing data, and make naïve attempts to handle data sets where some data are missing.

Findings – With the artificial data set, and a data set comprising the results of a survey investigating prices paid for recreational and medical marijuana, the authors demonstrate the use of multiple imputation and maximum likelihood estimation for obtaining appropriate estimates and standard errors when data are missing.

Originality/value – Missing data are ubiquitous in applied research. This paper demonstrates that techniques for handling missing data are accessible and should be employed by researchers.
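The logic of multiple imputation, fill the gaps several times, analyze each completed data set, and pool the results, can be sketched as follows. This crude version draws imputations from the observed values (hot-deck style) rather than from a fitted model as the authors do, and pools only the point estimate:

```python
import random
import statistics

def pooled_mean_by_multiple_imputation(data, m=20, seed=0):
    """Impute each missing value (None) m times with a random draw from the
    observed values, compute the mean of each completed data set, and pool
    the m estimates by averaging (the point-estimate half of Rubin's rules).
    Purely illustrative; proper multiple imputation also pools variances."""
    rng = random.Random(seed)
    observed = [x for x in data if x is not None]
    estimates = []
    for _ in range(m):
        completed = [x if x is not None else rng.choice(observed)
                     for x in data]
        estimates.append(statistics.mean(completed))
    return statistics.mean(estimates)
```

Unlike listwise deletion, every observed value contributes to each of the m analyses, which is where the gains in power come from.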


2017 ◽  
Author(s):  
Bernardo A. Mello ◽  
Yuhai Tu

Deciphering molecular mechanisms in biological systems from system-level input-output data is challenging, especially for complex processes that involve interactions among multiple components. Here, we study regulation of the multi-domain (P1-5) histidine kinase CheA by the MCP chemoreceptors. We develop a network model to describe the dynamics of the system, treating the receptor complex with CheW and the P3P4P5 domains of CheA as a regulated enzyme with two substrates, P1 and ATP. The model enables us to search the hypothesis space systematically for the simplest possible regulation mechanism consistent with the available data. Our analysis reveals a novel dual-regulation mechanism wherein, besides regulating ATP binding, the receptor activity has to regulate one other key reaction: either P1 binding or phosphotransfer between P1 and ATP. Furthermore, our study shows that the receptors only control the kinetic rates of the enzyme without changing its equilibrium properties. Predictions are made for future experiments to distinguish between the two remaining dual-regulation mechanisms. This systems-biology approach of combining modeling with a large input-output data set should be applicable to studying other complex biological processes.


2013 ◽  
Vol 46 (3) ◽  
pp. 1054-1066 ◽  
Author(s):  
Núria Macià ◽  
Ester Bernadó-Mansilla ◽  
Albert Orriols-Puig ◽  
Tin Kam Ho

1993 ◽  
Vol 72 (3) ◽  
pp. 1036-1038 ◽  
Author(s):  
Mark E. Stedman

A procedure for involving introductory statistics students in data production is described. The procedure was designed to provide a data set with which students would be intimately familiar and thus able to manipulate. Students reported improved understanding of the data compared with artificial data sets.


2013 ◽  
Vol 13 (11) ◽  
pp. 5533-5550 ◽  
Author(s):  
B. Hassler ◽  
P. J. Young ◽  
R. W. Portmann ◽  
G. E. Bodeker ◽  
J. S. Daniel ◽  
...  

Abstract. Climate models that do not simulate changes in stratospheric ozone concentrations require the prescription of ozone fields to accurately calculate UV fluxes and stratospheric heating rates. In this study, three different global ozone time series that are available for this purpose are compared: the data set of Randel and Wu (2007) (RW07), Cionni et al. (2011) (SPARC), and Bodeker et al. (2013) (BDBP). All three data sets represent multiple-linear regression fits to vertically resolved ozone observations, resulting in a spatially and temporally continuous stratospheric ozone field covering at least the period from 1979 to 2005. The main differences among the data sets result from the regression models, which use different observations and include different basis functions. The data sets are compared against ozonesonde and satellite observations to assess how they represent concentrations, trends and interannual variability. In the Southern Hemisphere polar region, RW07 and SPARC underestimate the springtime ozone depletion seen in ozonesonde measurements. A piecewise linear trend regression is performed to estimate the 1979–1996 ozone decrease globally, covering a period of extreme depletion in most regions. BDBP overestimates Arctic and tropical ozone depletion over this period relative to the available measurements, whereas the depletion is underestimated in RW07 and SPARC. While the three data sets yield ozone concentrations that are within the range of the different observations, there is a large spread in their respective ozone trends. One consequence of this is differences of almost a factor of four in the calculated stratospheric ozone radiative forcing between the data sets (RW07: −0.038 W m−2, SPARC: −0.033 W m−2, BDBP: −0.119 W m−2), which is important in assessing the contribution of stratospheric ozone depletion to the total anthropogenic radiative forcing.
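A piecewise linear trend regression of the kind used to estimate the pre-turnaround decrease amounts to fitting an intercept, a linear trend, and a slope change at a turnaround year by least squares. A pure-Python sketch via the normal equations; the turnaround year and the synthetic data in the usage example are illustrative assumptions:

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            M[r] = [M[r][k] - f * M[col][k] for k in range(4)]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][k] * x[k] for k in range(r + 1, 3))) / M[r][r]
    return x

def piecewise_linear_trend(years, ozone, turnaround=1997.0):
    """Least-squares fit of ozone = a + b*(t - t_start) + c*max(0, t - turnaround):
    b is the pre-turnaround trend, c the slope change afterwards."""
    t_start = years[0]
    X = [[1.0, t - t_start, max(0.0, t - turnaround)] for t in years]
    n = len(years)
    A = [[sum(X[i][p] * X[i][q] for i in range(n)) for q in range(3)]
         for p in range(3)]
    rhs = [sum(X[i][p] * ozone[i] for i in range(n)) for p in range(3)]
    return solve3(A, rhs)              # [intercept, trend, slope change]
```

On an exactly piecewise-linear series the fit recovers the generating coefficients, which makes the sketch easy to sanity-check before applying it to noisy observations.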


2013 ◽  
Vol 373-375 ◽  
pp. 1212-1219
Author(s):  
Afrias Sarotama ◽  
Benyamin Kusumoputro

A good model is necessary in order to design a controller for a system off-line. This is especially beneficial in the implementation of new advanced control schemes for Unmanned Aerial Vehicles (UAVs). Considering the safety and benefits of off-line tuning of UAV controllers, this paper identifies a dynamic MIMO nonlinear UAV system based on a collection of input-output data taken from test flights (36,250 data samples). These input-output sample flight data are grouped into two flight data sets. The first data set, a chirp signal, is used for training the neural network in order to determine the parameters (weights) of the network. Validation of the network is performed using the second data set, which is not used for training and represents a circular UAV flight movement. An artificial neural network is trained using the training data set, and thereafter the network is excited by the inputs of the second data set. The outputs predicted by the proposed neural network model are similar to the desired outputs (roll, pitch, and yaw) produced by the real UAV system.


2007 ◽  
Vol 1 (2) ◽  
pp. 175-190 ◽  
Author(s):  
Kiyoshi Yoneda

Accurate traffic data are the basis for group control of elevators and for its performance evaluation by trace-driven simulation. Present practice estimates a time series of inter-floor passenger traffic based on commonly available elevator sensor data. The method requires that the sensor data be transformed into sets of passenger input-output data which are consistent in the sense that the transportation preserves the number of passengers. Since observation involves various behavioral assumptions, which may actually be violated, as well as measurement errors, it has been necessary to apply data adjustment procedures to secure this consistency. This paper proposes an alternative algorithm which reconstructs elevator passenger origin-destination tables from inconsistent passenger input-output data sets, thus eliminating the ad hoc data adjustment.
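To make the reconstruction problem concrete: for a single upward run with consistent boarding/alighting counts, even a simple behavioral assumption pins down an origin-destination table. The first-in-first-out rule below is purely illustrative, not the paper's algorithm, and it breaks exactly where the paper's contribution lies, namely when the counts are inconsistent:

```python
def fifo_od_table(boardings, alightings):
    """Reconstruct an origin-destination table for one upward elevator run
    from per-floor boarding and alighting counts, assuming passengers leave
    in the order they boarded (FIFO).  Requires consistent counts:
    passengers alight only after boarding, and the totals match."""
    n = len(boardings)
    od = [[0] * n for _ in range(n)]
    onboard = []                         # (origin floor, passenger count)
    for floor in range(n):
        out = alightings[floor]
        while out > 0:                   # drain the oldest boardings first
            origin, count = onboard[0]
            drop = min(count, out)
            od[origin][floor] += drop
            out -= drop
            if drop == count:
                onboard.pop(0)
            else:
                onboard[0] = (origin, count - drop)
        if boardings[floor]:
            onboard.append((floor, boardings[floor]))
    return od
```

For example, with boardings [2, 1, 0, 0] and alightings [0, 0, 2, 1], FIFO sends the two floor-0 passengers to floor 2 and the floor-1 passenger to floor 3.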


2012 ◽  
Vol 12 (10) ◽  
pp. 26561-26605 ◽  
Author(s):  
B. Hassler ◽  
P. J. Young ◽  
R. W. Portmann ◽  
G. E. Bodeker ◽  
J. S. Daniel ◽  
...  

Abstract. Climate models that do not simulate changes in stratospheric ozone concentrations require ozone input fields to accurately calculate UV fluxes and stratospheric heating rates. In this study, three different global ozone time series that are available for this purpose are compared: the data set of Randel and Wu (2007) (RW07), Cionni et al. (2011) (SPARC), and Bodeker et al. (2012) (BDBP). The latter is a very recent data set, based on the comprehensive ozone measurement database described by Hassler et al. (2008). All three data sets represent multiple-linear regression fits to vertically resolved ozone observations, resulting in a spatially and temporally continuous stratospheric ozone field covering at least the period from 1979 to 2005. The main differences between the data sets result from the use of different observations and the inclusion of different basis functions in the regression model fits. These three regression-based data sets are compared against observations from ozonesondes and satellites to assess how they represent concentrations, trends, and interannual variability. In the Southern Hemisphere polar region, RW07 and SPARC underestimate the springtime ozone depletion seen in ozonesonde measurements. A piecewise linear trend regression is performed to estimate the 1979–1996 ozone decrease globally, covering a period of extreme depletion in most regions. BDBP appears to somewhat overestimate Arctic and tropical ozone loss over this period relative to the available measurements, whereas these losses appear to be underestimated in RW07 and SPARC. In most regions, the three data sets yield ozone values that are within the range of the different observations that serve as input to the regressions. However, the differences among the three suggest that there are large uncertainties in ozone trends. These result in differences of almost a factor of four in radiative forcing, which is important for the resulting climate changes.

