Scopus Indexed Publications

Paper Details


Title
Training data selection using ensemble dataset approach for software defect prediction
Author
Md. Fahimuzzman Sohan, Md Alamgir Kabir, Md. Mostafijur Rahman, S. M. Hasan Mahmud, Touhid Bhuiyan
Email
sohan35-1284@diu.edu.bd
Abstract

Cross-project defect prediction (CPDP) is used because of the limitations of within-project defect prediction (WPDP) in Software Defect Prediction (SDP) research. CPDP trains a machine learning model on data from one project to predict defects in another project. Because the source and target projects differ in the CPDP setting, and their data can be structured differently, a given source-target pair is not always a good combination. This study presents a categorical data set ensemble technique, in which multiple data sets are aggregated to form the source data instead of using a single data set. The method was evaluated on nine data sets taken from a publicly accessible repository, using two performance indicators. The data set ensemble approach improves prediction performance in over 65% of the combinations compared with traditional CPDP models. The results also show that train-test data set pairs from the same category (homogeneous) give high performance, whereas prediction performance mostly collapses when the data sets come from different categories. Therefore, the proposed scheme is recommended as an alternative defect prediction approach that improves prediction in most cases compared with traditional cross-project SDP models.
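The abstract describes the data set ensemble only at a high level. The Python sketch below illustrates the training data selection step under stated assumptions; it is not the authors' implementation. It assumes each data set is a list of (metric vector, defect label) rows and that each project carries a category label; pooling only same-category projects (`same_category_only=True`) is an assumption drawn from the homogeneous-pair finding. The function names, project names, categories and metric values are all hypothetical, and no classifier or performance indicator is shown because the abstract does not name them.

```python
from typing import Dict, List, Tuple

# A row is (feature_vector, defect_label); a data set is a list of rows.
Row = Tuple[List[float], int]
Dataset = List[Row]


def traditional_cpdp_sources(datasets: Dict[str, Dataset], target: str) -> List[str]:
    """Traditional CPDP (hypothetical helper): every other project is a
    candidate single source, each used on its own to train a model."""
    return [name for name in sorted(datasets) if name != target]


def ensemble_training_set(
    datasets: Dict[str, Dataset],
    categories: Dict[str, str],
    target: str,
    same_category_only: bool = True,
) -> Dataset:
    """Data set ensemble (hypothetical helper): pool several source data
    sets into one training set for `target`.

    The target project is always excluded, so none of its rows leak into
    training. With same_category_only=True (an assumption), only projects
    sharing the target's category are pooled.
    """
    pooled: Dataset = []
    for name in sorted(datasets):
        if name == target:
            continue
        if same_category_only and categories[name] != categories[target]:
            continue
        pooled.extend(datasets[name])
    return pooled


# Hypothetical toy data: names, categories and metric values are invented.
datasets: Dict[str, Dataset] = {
    "projA": [([10.0, 2.0], 1), ([3.0, 1.0], 0)],
    "projB": [([8.0, 4.0], 1)],
    "projC": [([2.0, 0.5], 0), ([9.0, 3.0], 1)],
    "projD": [([1.0, 0.2], 0)],
}
categories = {"projA": "cat1", "projB": "cat1", "projC": "cat1", "projD": "cat2"}

print(traditional_cpdp_sources(datasets, "projA"))  # ['projB', 'projC', 'projD']
print(len(ensemble_training_set(datasets, categories, "projA")))  # 3
print(len(ensemble_training_set(datasets, categories, "projA", same_category_only=False)))  # 4
```

In this toy setup, traditional CPDP would train three separate models for `projA`, one per source, while the ensemble trains a single model on the pooled rows of `projB` and `projC`. A project that is alone in its category (here `projD`) gets an empty pooled set under the same-category assumption, so a real pipeline would need a fallback for that case.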

Keywords
Software defect prediction, Cross-project defect prediction, Training data selection, Data set ensemble
Journal or Conference Name
Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
Publication Year
2020
Indexing
Scopus