DoR

Scopus Indexed Publications

Paper Details

Title: MCNN-LSTM: Combining CNN and LSTM to Classify Multi-Class Text in Imbalanced News Data

Author: Sidratul Montaha

Email

Abstract

"Searching, retrieving, and arranging text in ever-larger document collections necessitate more efficient information processing algorithms. Document categorization is a crucial component of various information processing systems for supervised learning. As the quantity of documents grows, the performance of classic supervised classifiers has deteriorated because of the number of document categories. Assigning documents to a predetermined set of classes is called text classification. It is utilized extensively in a wide range of data-intensive applications. However, real-world implementations of these models are plagued with shortcomings, which calls for more investigation. Imbalanced datasets hinder the most prevalent high-performance algorithms. In this paper, we propose an approach named multi-class Convolutional Neural Network (MCNN)-Long Short-Term Memory (LSTM), which combines two deep learning techniques, Convolutional Neural Network (CNN) and Long Short-Term Memory, for text classification in news data. CNNs are used as feature extractors for the LSTMs on text input data that has the spatial structure of words in a sentence, paragraph, or document. The dataset is also imbalanced, and we use the Tomek-Link algorithm to balance the dataset and then apply our model, which shows better performance in terms of F1-score (98%) and Accuracy (99.71%) than the existing works. The combination of deep learning techniques used in our approach is ideal for the classification of imbalanced datasets with underrepresented categories. Hence, our method outperformed other machine learning algorithms in text classification by a large margin. We also compare our results with traditional machine learning algorithms in terms of imbalanced and balanced datasets."
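Note: the abstract names the Tomek-Link algorithm but does not describe how it was applied. As a rough illustration only, the sketch below shows the general idea in plain Python: a Tomek link is a pair of samples from different classes that are each other's nearest neighbour, and a common undersampling choice is to drop the majority-class member of each pair. The function names, the Euclidean distance, the toy points, and the "drop majority only" rule are all assumptions made for this example, not details taken from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def tomek_links(points, labels):
    """Return index pairs (i, j) with i < j that form Tomek links.

    A pair is a Tomek link when the two samples are mutual nearest
    neighbours and carry different class labels.
    """
    n = len(points)
    nearest = []
    for i in range(n):
        # Nearest neighbour of sample i among all other samples.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(points[i], points[k]))
        nearest.append(j)
    return [(i, j) for i, j in enumerate(nearest)
            if i < j and nearest[j] == i and labels[i] != labels[j]]


def drop_majority_in_links(points, labels, majority_label):
    """Remove the majority-class sample of every Tomek link (assumed policy)."""
    drop = {idx
            for pair in tomek_links(points, labels)
            for idx in pair
            if labels[idx] == majority_label}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Hypothetical 2-D toy data: class "a" is the majority class.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.3), (1.0, 1.0), (1.05, 1.0), (3.0, 3.0)]
labels = ["a", "a", "a", "a", "b", "b"]

links = tomek_links(points, labels)
cleaned_points, cleaned_labels = drop_majority_in_links(points, labels, "a")
```

In this toy data the only mutual nearest neighbours with different labels are samples 3 and 4, so only the majority-class sample 3 is removed. For real text data the samples would be feature vectors, and a maintained implementation such as the `TomekLinks` class in the imbalanced-learn package would normally be preferable to this quadratic-time sketch.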

Keywords: Big data, text classification, imbalanced data, machine learning, MCNN-LSTM

Journal or Conference Name: IEEE Access

Publication Year: 2023

Indexing: Scopus