Automating Data Envelopment Analysis in Python: Functional Comparison with XlDEA/XIDEA and Methodological Assessment of Second-Stage Inference
DOI:
https://doi.org/10.67294/j87y4t65
Keywords:
Data Envelopment Analysis, Python Workflow, Technical Efficiency, Bootstrap Inference, Truncated Regression
Abstract
This study evaluated whether a reproducible Python workflow can strengthen Data Envelopment Analysis (DEA) in industrial efficiency studies when compared with spreadsheet-based tools such as XlDEA/XIDEA. A methodological, documentary, and computational comparative design was applied to these two implementation environments. Data were collected through a structured comparison matrix that assessed four dimensions: methodological coverage, automation and scalability, reproducibility and auditability, and second-stage inferential robustness. The analytical procedure reviewed input-oriented CCR estimation, bootstrap inference, Tobit modeling of inefficiency, truncated regression with double bootstrap, and automated report generation. The main result indicates that Python provides a more scalable and auditable architecture for repeated analysis, especially when monthly data, multiple decision-making units, and standardized outputs are required. Spreadsheet tools nevertheless remain useful for exploratory applications because they are more accessible at the outset to users who do not program. The study concludes that Python is preferable for production-grade efficiency analysis, and that truncated regression with double bootstrap should guide future second-stage inference when contextual determinants of efficiency are analyzed.
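As an illustration of the first step in the analytical procedure, the input-oriented CCR model can be written as one linear program per decision-making unit: minimize the contraction factor theta subject to a nonnegative combination of observed units using no more than theta times the unit's inputs while producing at least its outputs. The sketch below is not the workflow evaluated in the study. It is a minimal example that assumes strictly positive data and uses `scipy.optimize.linprog`; the function name `ccr_input_efficiency` and the three-unit toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def ccr_input_efficiency(X, Y):
    """Input-oriented CCR (constant returns to scale) efficiency scores.

    X: array of shape (n_dmus, n_inputs); Y: array of shape (n_dmus, n_outputs).
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, m = X.shape
    s = Y.shape[1]
    scores = np.empty(n)
    for o in range(n):
        # Decision vector: [theta, lambda_1, ..., lambda_n]; minimise theta.
        c = np.zeros(n + 1)
        c[0] = 1.0
        # Input rows: sum_j lambda_j * x_ij - theta * x_io <= 0
        A_in = np.hstack([-X[o][:, None], X.T])
        b_in = np.zeros(m)
        # Output rows: -sum_j lambda_j * y_rj <= -y_ro
        A_out = np.hstack([np.zeros((s, 1)), -Y.T])
        b_out = -Y[o]
        res = linprog(
            c,
            A_ub=np.vstack([A_in, A_out]),
            b_ub=np.concatenate([b_in, b_out]),
            bounds=[(0, None)] * (n + 1),
            method="highs",
        )
        if not res.success:
            raise RuntimeError(f"LP failed for DMU {o}: {res.message}")
        scores[o] = res.fun
    return scores


# Toy data (assumed): one input, one output, three units.
X = [[2.0], [4.0], [8.0]]
Y = [[2.0], [2.0], [4.0]]
scores = ccr_input_efficiency(X, Y)
print(np.round(scores, 3))  # first unit is on the frontier; the others score 0.5
```

Because each unit is solved independently, the same function can be applied to every monthly panel in a loop, which is the kind of repetition that is harder to audit in a spreadsheet. Bootstrap bias correction and second-stage truncated regression would be built on top of these scores and are not shown here.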
License
Copyright (c) 2026 Marlon Stalin Taco Arias (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.