Scopus Indexed Publications

Paper Details


Title
Selectively oversampling difficult positive samples from imbalanced data for preprocessing
Author
Ms. Lamia Rukhsara
Email
lamia.cse@diu.edu.bd
Abstract
Oversampling has traditionally been applied when training machine learning classifiers to improve performance in the presence of class imbalance. This work suggests a new insight for oversampling imbalanced data. In the literature, oversampling focuses mainly on Borderline samples. However, because the positive class contains few samples, a large percentage of them can be labeled as Rare or Outlier. These samples are often overlooked by traditional oversampling methods, or their nearest negative samples are removed to increase the positive prediction rate at the expense of the negative prediction rate. This work demonstrates that by oversampling only the Borderline, Rare and Outlier samples, each at a different rate, better performance can be achieved than with the other pre-processing methods compared. The proposed method is applied to four datasets (Abalone, CMC, Solar Flare and Seismic Bump) collected from the UCI digital library and compared with four traditional pre-processing methods (ADASYN, SMOTE, and Borderline-SMOTE 1 and 2) from the imbalanced-learn toolkit for Python. The result analysis shows that, with fine-tuning, better performance can be achieved on all of the reported performance measures: Accuracy, True Positive Rate, True Negative Rate, Geometric Mean, Area Under the Curve and F-measure.
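
The idea of oversampling Borderline, Rare and Outlier positives at different rates can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how samples are categorized or how new samples are generated. The sketch assumes a common neighbourhood-based convention (with k = 5, a positive sample with 4 or 5 positive neighbours is Safe, 2 or 3 is Borderline, 1 is Rare, 0 is Outlier) and substitutes plain duplication for whatever generation scheme the paper uses. The function names, the toy data and the per-category copy counts are all hypothetical.

```python
import math
from collections import Counter

def neighbour_labels(X, y, i, k=5):
    """Labels of the k nearest neighbours of sample i (brute force, Euclidean)."""
    order = sorted((math.dist(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [y[j] for _, j in order[:k]]

def categorize_positives(X, y, positive=1, k=5):
    """Map the index of each positive sample to a neighbourhood category.

    Assumed thresholds for k=5 (not taken from the abstract): 4-5 positive
    neighbours -> safe, 2-3 -> borderline, 1 -> rare, 0 -> outlier.
    """
    categories = {}
    for i, label in enumerate(y):
        if label != positive:
            continue
        same = neighbour_labels(X, y, i, k).count(positive)
        if same >= 4:
            categories[i] = "safe"
        elif same >= 2:
            categories[i] = "borderline"
        elif same == 1:
            categories[i] = "rare"
        else:
            categories[i] = "outlier"
    return categories

def selective_oversample(X, y, extra_copies, positive=1, k=5):
    """Duplicate each positive sample according to its category.

    extra_copies maps a category name to the number of extra copies per
    sample; categories missing from the mapping (e.g. "safe") get none.
    Duplication is a stand-in for the paper's sample generation step.
    """
    X_new, y_new = list(X), list(y)
    for i, cat in categorize_positives(X, y, positive, k).items():
        for _ in range(extra_copies.get(cat, 0)):
            X_new.append(X[i])
            y_new.append(positive)
    return X_new, y_new

# Hypothetical toy data: a tight positive cluster, a negative cluster,
# and one positive sample sitting inside the negative cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10.2, 10.4),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (9.5, 10)]
y = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

categories = categorize_positives(X, y)
X_res, y_res = selective_oversample(
    X, y, {"borderline": 1, "rare": 2, "outlier": 3}  # hypothetical rates
)
print(Counter(categories.values()))
print(Counter(y), "->", Counter(y_res))
```

Leaving Safe samples untouched and giving harder categories more copies mirrors the abstract's claim that only the difficult positives need oversampling; the specific rates would have to be tuned per dataset.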

Keywords
Imbalanced data, Oversampling, Positive samples, Negative samples, Safe, Borderline, Rare, Outlier
Journal or Conference Name
22nd International Conference on Computer and Information Technology, ICCIT 2019
Publication Year
2019
Indexing
Scopus