Scopus Indexed Publications

Paper Details


Title
Lemmatization Algorithm Development for Bangla Natural Language Processing
Author
Nusrat Jahan Prottasha
Abstract
Natural language processing (NLP) has numerous applications in autonomous communication, and lemmatization, which reduces a word to its root word (lemma), is an essential preprocessing technique in NLP. However, effective algorithms for Bangla NLP are scarce, which motivated us to develop a useful Bangla lemmatization tool. Because a dedicated Bangla lemmatization tool is lacking, rule-based stemming usually takes the place of lemmatization in Bangla language processing. In this paper, we propose a Bangla lemmatization framework that uses three lemmatization techniques based on data structures and dynamic programming. We use a Trie algorithm and develop a mapping algorithm named “Dictionary Based Search by Removing Affix (DBSRA)”, which is based on data structures. We apply both Trie and DBSRA lemmatization and select the better result by considering the Levenshtein distance between the lemma and the original word. We then experimented with Bangla lemmatization using all three techniques and the framework. Among the three proposed techniques, DBSRA performed best, with an accuracy of 93.1 percent. The framework, developed by fusing the three algorithms, achieved the highest efficiency, 95.89 percent. Contribution: This paper presents the development of three lemmatization algorithms and their fusion into a framework for Bangla natural language processing.
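
The abstract names the three building blocks (a Trie lookup, a dictionary search after affix removal, and a Levenshtein-distance comparison) but not their details. The following is only a minimal sketch of how such a pipeline could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the longest-prefix rule for the Trie, the suffix-only stripping in `strip_affix_lookup`, the choice of the candidate with the smallest edit distance, and the toy English lexicon and suffix list, which stand in for a Bangla dictionary and affix list.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


class Trie:
    """Character trie; the key None marks the end of a dictionary word."""

    def __init__(self, words):
        self.root = {}
        for word in words:
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node[None] = True

    def longest_prefix(self, word):
        """Longest dictionary entry that is a prefix of `word`, or None."""
        node, best = self.root, None
        for i, ch in enumerate(word, start=1):
            if ch not in node:
                break
            node = node[ch]
            if None in node:
                best = word[:i]
        return best


def strip_affix_lookup(word, lexicon, suffixes):
    """Hypothetical stand-in for DBSRA: strip one suffix, then look up the stem."""
    if word in lexicon:
        return word
    stems = [word[:-len(s)] for s in suffixes
             if s and word.endswith(s) and word[:-len(s)] in lexicon]
    # Assumption: prefer the stem that removes the least material.
    return max(stems, key=len) if stems else None


def lemmatize(word, lexicon, trie, suffixes):
    """Run both lookups and keep the candidate closest to the input word."""
    found = (trie.longest_prefix(word), strip_affix_lookup(word, lexicon, suffixes))
    candidates = {c for c in found if c}
    if not candidates:
        return word  # no dictionary match: return the word unchanged
    # Assumption: "better" means the smaller Levenshtein distance.
    return min(candidates, key=lambda c: (levenshtein(word, c), c))


# Toy data (hypothetical); a real system would load a Bangla lexicon.
LEXICON = {"play", "car", "care", "go", "gold"}
SUFFIXES = ("ing", "ed", "s")
TRIE = Trie(LEXICON)

for token in ("playing", "cared", "xyz"):
    print(token, "->", lemmatize(token, LEXICON, TRIE, SUFFIXES))
```

In this toy setup the two lookups can disagree: for `cared` the Trie proposes `care` while suffix stripping proposes `car`, and the distance comparison decides between them. Python strings are Unicode, so the same code accepts Bangla input, although real Bangla text would likely need Unicode normalization first.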

Keywords
Bangla NLP, lemmatization, Trie, DBSRA, corpus
Journal or Conference Name
2020 Joint 9th International Conference on Informatics, Electronics & Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision & Pattern Recognition (icIVPR)
Publication Year
2020
Indexing
Scopus