Scopus Indexed Paper

Paper Details

Suffix Based Automated Parts of Speech Tagging for Bangla Language
Natural language processing (NLP) is the technique by which we process the human language with the computer. Parts-of-Speech (POS) tagging is one of the fundamental requirements for some NLP applications. It is considered as a solved problem for some foreign languages, such as English, Chinese, due to higher accuracy (97%), where it is still an unsolved problem for Bangla because of its ambiguity. Although making a POS tagger for Bangla is not a new work, but each one of available POS taggers has different kinds of limitations. We choose to develop an unsupervised system rather than a supervised system, because a supervised system needs a huge data resource for training purpose and available resources in Bangla is really poor. Here we develop a POS tagger mainly based on Bangla grammar especially suffixes. Because Bangla is a very inflectional language, where a single word has many variants based on their suffixes. In this POS tagger, we assign 8 base POS tags, where some rules, based on Bangla grammar and suffix, are applied to identify POS tags with the cooperation of verb root dataset. To handle non-suffix words, a dataset of almost 14500 Bangla words, with having their default POS tags, is added with the system, which helps to increase the efficiency of this POS tagger. A modified version of previously used algorithm for suffix analysis is applied, which result in a satisfactory level of about 94.2%.
Tagging, Dictionaries, Grammar, Natural language processing, Hidden Markov models, Speech processing, Training
Monjoy Kumar Roy, Pinto Kumar Paull, Sheak Rashed Haider Noori, S M Hasan Mahmud
Journal or Conference Name
2nd International Conference on Electrical, Computer and Communication Engineering, ECCE 2019
Publish Year