Analyzing and improving information gain of metrics used in software defect prediction in decision trees

Aydilek, İbrahim Berkan


Pamukkale Univ Muh Bilim Derg. 2018; 24(5): 906-914 | DOI: 10.5505/pajes.2018.93584

İbrahim Berkan Aydilek
Harran University Engineering Faculty, Department of Computer Engineering, Şanlıurfa

McCabe and Halstead method-level metrics are among the best-known and most widely used quantitative software metrics for measuring software quality concretely. Software defect prediction estimates which sub-module or sub-modules of the software under development are likely to be more defect-prone, so that losses of labor and time can be avoided. The datasets used for software defect prediction usually have an imbalanced class distribution, because records of the defective class can be fewer than records of the non-defective class, and this imbalance adversely affects the results of machine learning methods. Information gain is used in decision trees, in decision-tree-based rule classifiers, and in attribute selection methods. In this study, the software metrics that provide important information for software defect prediction were investigated, and the CM1, JM1, KC1 and PC1 datasets from NASA's PROMISE software repository were balanced with the SMOTE synthetic over-sampling algorithm and thereby improved in terms of information gain. As a result, software defect prediction datasets with higher classification performance in decision trees, and software metrics with an increased information gain ratio, were obtained.

Keywords: Software defect prediction, decision trees, information gain ratio
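
As an illustration of the information gain ratio named in the abstract, the following is a minimal sketch of the standard textbook definitions (entropy, information gain, and gain ratio as used in C4.5-style decision trees). It is not the study's own implementation. The function names and the toy data are hypothetical, and the attribute is assumed to be already discretized, whereas real McCabe and Halstead metrics are numeric and would first be split at a threshold.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on a discrete attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class labels within each attribute value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:  # attribute takes a single value: no usable split
        return 0.0
    return information_gain(values, labels) / split_info


# Hypothetical toy data: a discretized complexity metric and a defect label.
complexity = ["high", "high", "high", "low", "low", "low", "low", "low"]
defective = [True, True, False, False, False, False, False, False]
print(gain_ratio(complexity, defective))
```

Dividing by the split entropy penalizes attributes that fragment the data into many small groups, which is why gain ratio rather than raw information gain is commonly used for ranking attributes.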

Yazılım hata tahmininde kullanılan metriklerin karar ağaçlarındaki bilgi kazançlarının incelenmesi ve iyileştirilmesi

İbrahim Berkan Aydilek
Harran Üniversitesi Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü, Şanlıurfa

Yazılım kalitesinin somut bir şekilde ölçülebilmesi için kullanılan sayısal yazılım metrikleri içinde bilinen ve yaygın şekilde kullanılanlar arasında McCabe ve Halstead yöntem-seviye metrikleri bulunmaktadır. Yazılım hata tahmini, geliştirilecek olan yazılımda bulunan alt modüllerin hangisi veya hangilerinin daha çok hataya meyilli olabileceğini konusunda öngörüde bulunabilmektedir. Böylece işgücü ve zaman konusundaki kayıpların önüne geçilebilmektedir. Yazılım hata tahmini için kullanılan veri kümelerinde, hata var sınıflı kayıt sayısı, hata yok sınıflı kayıt sayısına göre daha az sayıda olabildiğinden bu veri kümeleri genellikle dengeli olmayan bir sınıf dağılımına sahip olmakta ve makine öğrenme yöntemlerinin sonuçlarını olumsuz etkilemektedir. Bilgi kazancı, karar ağaçları ve karar ağacı temeline dayanan kural sınıflayıcı, nitelik seçimi gibi algoritma ve yöntemlerde kullanılmaktadır. Bu çalışmada, yazılım hata tahmini için önemli bilgiler sunan yazılım metrikleri incelenmiş, NASA'nın PROMISE yazılım veri deposundan CM1, JM1, KC1 ve PC1 veri kümeleri sentetik veri artırım Smote algoritması ile daha dengeli hale getirilerek bilgi kazancı yönünden iyileştirilmiştir. Sonuçta karar ağaçlarında sınıflama başarı performansı daha yüksek yazılım hata tahmini veri kümeleri ve bilgi kazanç oranı yükseltilmiş yazılım metrik değerleri elde edilmiştir.

Anahtar Kelimeler: Yazılım hata tahmini, karar ağaçları, bilgi kazanç oranı

İbrahim Berkan Aydilek. Analyzing and improving information gain of metrics used in software defect prediction in decision trees. Pamukkale Univ Muh Bilim Derg. 2018; 24(5): 906-914

Corresponding Author: İbrahim Berkan Aydilek, Türkiye
Manuscript Language: Turkish
