Scopus Indexed Publications

Paper Details


Title
Adaptive Feature Selection and Classification of Colon Cancer From Gene Expression Data: an Ensemble Learning Approach
Author
Eshtiak Ahmed, Faisal Arafat
Email
eshtiak.cse@diu.edu.bd
Abstract

Cancer research is one of the most significant areas of medical research. A substantial amount of work has been carried out in this area, and several methods have been employed. However, the accuracy of cancer prediction has yet to approach perfection, as conventional classification methods have several limitations. Recently, microarray gene expression data have been used to predict cancer with significant accuracy. Gene expression data are usually high-dimensional and comprise a relatively small number of samples, which makes them difficult to classify. To achieve higher accuracy, ensemble methods, which combine multiple classification methods, can be deployed. In this study, we used the public colon cancer gene expression data set, which consists of 62 instances with 2,000 attributes. An adaptive pre-processing procedure, including Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA), was applied to cope with the high dimensionality of the data. This was followed by building an ensemble learning model with k-Nearest Neighbors (kNN), Random Forest (RF), Kernel Support Vector Machines (KSVM), eXtreme Gradient Boosting (XGBoost), and Bayes Generalized Linear Model (GLM). Compared with other classifiers, our ensemble learning model gives higher accuracy than previously employed classification techniques. The obtained accuracy is 91.67%, with precision, recall, and Matthews correlation coefficient (MCC) values of 0.75, 1.00, and 0.85, respectively.
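
As a rough illustration of how an ensemble can combine several classifiers and how the reported metrics are defined, the sketch below applies hard (majority) voting to the label predictions of three models and then computes accuracy, precision, recall, and MCC. The abstract does not state the combination rule used in the paper, so majority voting is an assumption here. The function names, the three model outputs, and the toy labels are all hypothetical and are not taken from the colon cancer data set.

```python
from collections import Counter
from math import sqrt


def majority_vote(predictions_per_model):
    """Hard voting: pick the most frequent label per sample across models."""
    combined = []
    for votes in zip(*predictions_per_model):
        # With an odd number of binary voters there can be no tie.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and MCC from binary label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "mcc": mcc,
    }


# Hypothetical toy labels (1 = tumour, 0 = normal); NOT the paper's data.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
model_a = [1, 1, 0, 0, 1, 1, 0, 0]
model_b = [1, 0, 0, 0, 1, 1, 0, 0]
model_c = [1, 1, 0, 1, 0, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])
scores = binary_metrics(y_true, ensemble)
print(ensemble)
print(scores)
```

In this toy run the vote corrects the individual errors of the second and third models on samples where the other two agree. A full pipeline along the lines the abstract describes would place dimensionality reduction (for example PCA) before the base classifiers.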

Keywords
Journal or Conference Name
ACM International Conference Proceeding Series
Publication Year
2020
Indexing
Scopus