DoR

Scopus Indexed Publications

Paper Details

Title: Approach of Different Classification Algorithms to Compare in N-gram Feature Between Bangla Good and Bad Text Discourses

Author: Abu Kowshir Bitto, Dr. Imran Mahmud, Khalid Been Badruzzaman Biplob, Mr. Md. Hasan Imam Bijoy, Saima Khan

Abstract: Bangla Natural Language Processing (BNLP) is a relatively new challenge in Artificial Intelligence. With the rapid expansion of the Bangla language, it is now used on a variety of platforms, including social media, communication platforms, and news media. Classifying text documents is an important step in addressing the challenges of information organization and knowledge management. This study uses five supervised classification methods to explore the categorization of Bangla text discourse using N-gram (unigram, bigram, and trigram) features. To accomplish the research goal, Bangla text discourse is collected from different platforms such as social media, personal Bangla blogs, and people's utterances. After data collection comes preprocessing, the most difficult part of working with the Bangla language, which includes adding contractions, removing punctuation, encoding, and a variety of other operations. Of the 1499 text documents initially collected, 1459 Bangla text discourses remained after preprocessing. The text is converted into tokens using N-gram feature methods with TF-IDF-Vectorizer. During the experiment phase, Logistic Regression (LR), Decision Tree Classifier (DTC), Random Forest (RF), Multinomial Naive Bayes (MNB), and K-Nearest Neighbors (KNN) models are applied to the dataset with unigram, bigram, and trigram features. With unigram and bigram features, MNB outperformed all other classifiers, with the highest accuracies of 89.31% and 86.94%, respectively. With trigram features, KNN achieves the maximum accuracy of 84.25%. The proposed model can classify a Bangla text document as Good or Bad Discourse.
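
Illustrative sketch (not the authors' code): the abstract describes turning text into unigram, bigram, and trigram TF-IDF features and training classifiers on them. The following minimal Python example, using only the standard library, shows one way that idea could look for a single classifier, a small multinomial Naive Bayes. Everything in it is an assumption made for illustration: the function names (ngrams, train, predict), the smoothed IDF form ln((1+N)/(1+df)) + 1, the Laplace smoothing value alpha=1.0, and the four English placeholder sentences, which stand in for the Bangla dataset that is not reproduced here.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Return the word-level n-grams of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train(docs, labels, n, alpha=1.0):
    """Fit smoothed IDF weights and a multinomial Naive Bayes model."""
    counts = [Counter(ngrams(d, n)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter(g for c in counts for g in c)
    idf = {g: math.log((1 + len(docs)) / (1 + k)) + 1 for g, k in df.items()}

    # Sum the TF-IDF weight of every n-gram per class.
    weights = defaultdict(lambda: defaultdict(float))
    for c, y in zip(counts, labels):
        for g, tf in c.items():
            weights[y][g] += tf * idf[g]

    model = {"n": n, "idf": idf, "prior": {}, "loglik": {}}
    for y, w in weights.items():
        # Laplace-smoothed log-likelihood of each n-gram given class y.
        denom = sum(w.values()) + alpha * len(idf)
        model["prior"][y] = math.log(labels.count(y) / len(labels))
        model["loglik"][y] = {
            g: math.log((w.get(g, 0.0) + alpha) / denom) for g in idf
        }
    return model


def predict(model, text):
    """Return the class with the highest log-score for the given text."""
    tf = Counter(ngrams(text, model["n"]))
    scores = {}
    for y, prior in model["prior"].items():
        # N-grams never seen in training are skipped.
        scores[y] = prior + sum(
            k * model["idf"][g] * model["loglik"][y][g]
            for g, k in tf.items() if g in model["idf"]
        )
    return max(scores, key=scores.get)


# Hypothetical placeholder data; the study itself uses Bangla text discourses.
docs = [
    "thank you very much friend",
    "you are very kind friend",
    "you are very stupid fool",
    "shut up you stupid fool",
]
labels = ["good", "good", "bad", "bad"]

for n in (1, 2, 3):  # unigram, bigram, trigram
    model = train(docs, labels, n)
    print(n, predict(model, "very kind friend"), predict(model, "you stupid fool"))
```

The loop trains one model per N-gram size, mirroring the unigram, bigram, and trigram comparison in the abstract. A real experiment would also hold out test data and report accuracy for each classifier, which this sketch omits.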

Keywords: Bangla text classification, Bangla Natural Language Processing, N-gram feature, Sentence categorization, Multinomial Naive Bayes

Journal or Conference Name: Lecture Notes in Electrical Engineering

Publication Year: 2023

Indexing: Scopus