Research Article

A Comparison of Software Defect Prediction Metrics Using Data Mining Algorithms

Year 2020, Volume 4, Issue 1, 11 - 21, 15.06.2020
https://doi.org/10.38088/jise.693098

Abstract

Data mining is an interdisciplinary field that draws on machine learning, artificial intelligence, statistics, and deep learning. Classification is one of its most widely used techniques. In the literature, software defect prediction has generally relied on statistical methods or machine learning algorithms such as Decision Trees, Fuzzy Logic, Genetic Programming, Random Forest, Artificial Neural Networks, and Logistic Regression. The performance of these classifiers is examined with measures such as Accuracy, Precision, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE). In this paper, four data sets from the PROMISE repository (JM1, KC1, CM1, and PC1), created within the scope of NASA's publicly available Metrics Data Program, are examined, as in other software defect prediction studies in the literature. These data sets include Halstead, McCabe method-level, and some other class-level metrics. The data sets are analyzed with the Waikato Environment for Knowledge Analysis (WEKA) data mining tool. Using this tool, classification algorithms such as Naive Bayes, SMO, K*, AdaBoostM1, J48, and Random Forest were applied to the NASA defect data sets in the PROMISE repository, and their accuracy rates were compared. The highest accuracy, 94.13%, was obtained by the Bagging algorithm on the PC1 data set.
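As an illustration of the four performance measures named above, the following minimal sketch computes them for a binary defect classifier. It is not the procedure used in the paper, which relies on WEKA. The function name `evaluate`, the 0.5 decision threshold, the toy labels, and the choice to measure MAE and RMSE against the predicted probability of the defective class are all assumptions made for this example.

```python
import math


def evaluate(y_true, y_prob, threshold=0.5):
    """Return accuracy, precision, MAE and RMSE for binary defect labels.

    y_true: actual labels (1 = defective, 0 = not defective).
    y_prob: predicted probability that each module is defective.
    threshold: hypothetical cut-off used to turn probabilities into labels.
    """
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("y_true and y_prob must be non-empty and equal in length")

    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in y_prob]

    # Counts needed for accuracy and precision.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    accuracy = correct / n
    # Precision is undefined when nothing is predicted defective; report 0.0 here.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0

    # Error measures on the predicted probabilities (an assumption of this sketch).
    mae = sum(abs(t - p) for t, p in zip(y_true, y_prob)) / n
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / n)

    return {"accuracy": accuracy, "precision": precision, "mae": mae, "rmse": rmse}


# Toy data, invented for illustration only (not taken from the NASA data sets).
scores = evaluate([1, 0, 0, 1, 0], [0.9, 0.2, 0.6, 0.4, 0.1])
print(scores)
```

Accuracy and precision depend only on the thresholded labels, whereas MAE and RMSE also reflect how confident the classifier was, so two classifiers with the same accuracy can still differ on the error measures.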


Keywords: Software Defect Prediction, McCabe, Halstead, Data Mining, Accuracy, Random Forest


Cite this paper as:
GÜVEN AYDIN, Z.B., ŞAMLI, R. (2020). A Comparison of Software Defect Prediction Metrics Using Data Mining Algorithms. Journal of Innovative Science and Engineering. 4(1): 11-21.

*Corresponding author: Zeynep Behrin GÜVEN AYDIN
E-mail: zeynepguven@maltepe.edu.tr


Received Date: 24/02/2020
Accepted Date: 05/05/2020
© Copyright 2020 by Bursa Technical University. Available online at http://jise.btu.edu.tr/


The works published in Journal of Innovative Science and Engineering (JISE) are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Supporting Institution

TÜBİTAK

Project Number

118E682.

Thanks

This research work was supported by The Scientific and Technological Research Council of Turkey (TÜBİTAK), Project Number: 118E682. Also, we are thankful to the PROMISE software engineering repository for providing free and easy access to the NASA defect data sets for use in our research.


Details

Primary Language English
Subjects Engineering
Journal Section Research Articles
Authors

Zeynep Behrin Güven Aydın 0000-0002-4121-8220

Rüya Şamlı 0000-0002-8723-1228

Project Number 118E682.
Publication Date June 15, 2020
Published in Issue Year 2020, Volume 4, Issue 1


