Biomarker-based cancer identification and
classification tools are widely used in bioinformatics and machine
learning fields. However, the high dimensionality of microarray gene
expression data poses a challenge for identifying important genes in
cancer diagnosis. Many feature selection algorithms aim to improve cancer
diagnosis by selecting a small subset of informative genes. This article proposes an
ensemble rank-based feature selection method (EFSM) and an ensemble
weighted average voting classifier (VT) to overcome this challenge. The
EFSM uses a rank aggregation scheme that combines the feature rankings of
individual selection methods to efficiently discover the most relevant and useful
features. The VT combines support vector machine, k-nearest neighbor,
and decision tree algorithms to create an ensemble model. The proposed
method was tested on three benchmark datasets and compared to existing
built-in ensemble models. The results show that our model achieved
higher accuracy than these baselines, with 100% on the leukaemia dataset,
94.74% on the colon cancer dataset, and 94.34% on the 11-tumor dataset.
This study concludes by identifying a
subset of the most important cancer-causing genes and demonstrating
their significance compared to the original data. The proposed approach
surpasses existing strategies in accuracy and stability and detects vital
genes with higher precision, which can advance the development of ML-based
gene analysis.
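
As an illustration only, the two ideas summarized above (rank-based aggregation of several feature selectors, and weighted average voting over several classifiers) can be sketched in a few lines of plain Python. The abstract does not state the exact aggregation rule or the classifier weights, so the mean-rank rule, the weights, the gene names, and the probability values below are all assumptions chosen for the example, and the function names `aggregate_ranks` and `weighted_average_vote` are hypothetical.

```python
from collections import defaultdict


def aggregate_ranks(rankings, top_k):
    """Mean-rank aggregation over several feature rankings (best first).

    Assumption: every ranking covers the same set of features.
    """
    rank_sums = defaultdict(float)
    for ranking in rankings:
        for position, feature in enumerate(ranking, start=1):
            rank_sums[feature] += position
    mean_rank = {f: s / len(rankings) for f, s in rank_sums.items()}
    # Lower mean rank is better; ties are broken by name for determinism.
    ordered = sorted(mean_rank, key=lambda f: (mean_rank[f], f))
    return ordered[:top_k]


def weighted_average_vote(probabilities, weights):
    """Weighted average of per-classifier class probabilities for one sample.

    Returns (predicted class index, averaged probability per class).
    """
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Hypothetical rankings from three selectors (gene names are placeholders).
rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
selected = aggregate_ranks(rankings, top_k=2)

# Hypothetical class probabilities for one sample from an SVM, a k-NN
# classifier, and a decision tree; the weights are assumed, not reported.
probs = [[0.6, 0.4], [0.3, 0.7], [0.8, 0.2]]
label, averaged = weighted_average_vote(probs, weights=[2, 1, 1])
```

In practice the voting step would more likely be delegated to a library; for example, scikit-learn's `VotingClassifier` with `voting="soft"` and a `weights` argument performs the same weighted averaging of predicted class probabilities.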