Scopus Indexed Publications

Paper Details


Title
Advancing Bangla NLP: A Systematic Evaluation of Preprocessing Techniques for Improved Text Classification

Author
Md. Istiak Tanvir, Asma Akter, Md. Sajib Ahammad, Md. Tahmid, Shumona Akter Sraboni, Tanvirul Islam

Email

Abstract

Bangla NLP faces several challenges arising from the language's rich morphology, diverse dialects, and metaphorical expressions. Traditional preprocessing methods offer simple text normalization but fall short in addressing these language-specific problems. This work fills that gap by proposing and comprehensively analyzing a new preprocessing pipeline comprising six techniques: word correction, word splitting, metaphor detection, dialect identification, synonym replacement, and Bangla number-to-text conversion. A comparative evaluation combined the techniques to test whether they improve text classification on a dataset of 67,564 samples containing 19,490,647 words. The findings show that this preprocessing pipeline yields a 14.3% accuracy gain for the Bangla BERT model over raw, untreated text. The results are useful indicators for optimizing Bangla text processing and support meaningful progress in Bangla NLP applications.
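
The idea of evaluating combinations of preprocessing techniques can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `build_pipeline`, `step_combinations`, and `spell_digits` are hypothetical, and `spell_digits` spells numbers digit by digit, which is a deliberate simplification of Bangla number-to-text conversion. The other five techniques would be registered as further string-to-string functions.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Iterator, Tuple

Step = Callable[[str], str]

# Bangla digits mapped to their spoken words (used digit by digit below).
BANGLA_DIGIT_WORDS = {
    "০": "শূন্য", "১": "এক", "২": "দুই", "৩": "তিন", "৪": "চার",
    "৫": "পাঁচ", "৬": "ছয়", "৭": "সাত", "৮": "আট", "৯": "নয়",
}


def spell_digits(text: str) -> str:
    """Replace each Bangla digit with its word (simplified stand-in step)."""
    pieces = []
    for ch in text:
        if ch in BANGLA_DIGIT_WORDS:
            # Pad with spaces so adjacent digits become separate words.
            pieces.append(f" {BANGLA_DIGIT_WORDS[ch]} ")
        else:
            pieces.append(ch)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join("".join(pieces).split())


def build_pipeline(steps: Iterable[Step]) -> Step:
    """Chain steps so that each one receives the previous step's output."""
    ordered = list(steps)

    def run(text: str) -> str:
        for step in ordered:
            text = step(text)
        return text

    return run


def step_combinations(
    registry: Dict[str, Step],
) -> Iterator[Tuple[Tuple[str, ...], Step]]:
    """Yield every subset of the registered steps with its pipeline.

    The empty subset corresponds to the raw-text baseline.
    """
    names = list(registry)
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            yield combo, build_pipeline([registry[name] for name in combo])
```

With six registered techniques, `step_combinations` yields 64 pipelines (including the raw-text baseline), each of which could be applied to the dataset before training and scoring a classifier.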


Keywords

Journal or Conference Name
Lecture Notes in Networks and Systems

Publication Year
2026

Indexing
Scopus