Scopus Indexed Publications

Paper Details


Title
Clustering-Based Under-Sampling with Normalization in Class-Imbalanced Data
Author
Tanmoy Mondal
Abstract
Some real-world data sets are class-imbalanced: one class (the minority class) has only a limited number of data points, while the other (the majority class) has a large number. Even with state-of-the-art machine learning approaches, it is extremely challenging to build an effective model unless data preparation is used to balance the data set. Numerous studies have used random under-sampling to ensure that each class has the same number of data points. In the data preparation phase, this study instead experiments with under-sampling techniques based on clustering, combined with normalization, on an imbalanced data set consisting of a majority class and a minority class. Both classes contain individual data points. The study first uses k-fold cross-validation to split the imbalanced data set into training and testing sets. After normalizing the data, it separates the training data into a majority-class subset and a minority-class subset. The number of majority-class samples is then reduced using a clustering-based under-sampling method. The minority-class subset is combined with the reduced majority-class subset to create an updated training set, and the data are normalized once more at that point. Finally, the classifier is trained and evaluated separately using the updated training and testing sets.
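
The training-set preparation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the clustering algorithm, the normalization scheme, or the rule for choosing which majority samples to keep, so the sketch assumes k-means, min-max scaling, and keeping the real majority sample nearest each centroid. All function names are hypothetical, and the k-fold split and the classifier are left out.

```python
import random


def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0.
    (Min-max scaling is an assumption; the abstract only says 'normalization'.)"""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering method); returns the centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(X, y, minority_label, seed=0):
    """Normalize, shrink the majority class to the minority size, re-normalize."""
    X = min_max_normalize(X)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    remaining = [(x, label) for x, label in zip(X, y) if label != minority_label]
    # One cluster per minority sample, so both classes end up the same size.
    centroids = k_means([x for x, _ in remaining], k=len(minority), seed=seed)
    kept = []
    for c in centroids:
        # Assumption: keep the real majority sample closest to each centroid.
        best = min(remaining, key=lambda pair: sq_dist(pair[0], c))
        remaining.remove(best)
        kept.append(best)
    X_new = minority + [x for x, _ in kept]
    y_new = [minority_label] * len(minority) + [label for _, label in kept]
    return min_max_normalize(X_new), y_new


if __name__ == "__main__":
    # Toy training fold: six majority samples (label 0), two minority (label 1).
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0],
         [5.1, 4.9], [4.9, 5.2], [10.0, 0.0], [9.0, 1.0]]
    y = [0, 0, 0, 0, 0, 0, 1, 1]
    X_bal, y_bal = cluster_undersample(X, y, minority_label=1)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a full pipeline this function would be applied to the training portion of each cross-validation fold only, with the classifier then fitted on the balanced output and scored on the held-out fold.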

Keywords
Imbalanced Data-sets, Under-sampling, Clustering, Classification, Normalization
Journal or Conference Name
Proceedings of 2022 IEEE International Conference on Current Development in Engineering and Technology, CCET 2022
Publication Year
2022
Indexing
Scopus