Scopus Indexed Publications

Paper Details


Title
Bidirectional LSTMs - CRFs networks for bangla POS tagging
Author
Sheak Rashed Haider Noori
Email
drnoori@daffodilvarsity.edu.bd
Abstract
Part-of-speech (POS) information is one of the fundamental components of the natural language processing pipeline; it helps in extracting higher-level information such as named entities, discourse, and the syntactic structure of a sentence. For some languages, such as English, Dutch, and Chinese, POS tagging is considered a solved problem because of the high accuracy (97%) of existing systems. Significant efforts have been made for such languages, both in making data publicly accessible and in organizing evaluation campaigns. By comparison, far fewer efforts have been made for Bangla (ethnonym: Bangla; exonym: Bengali). In this paper, we present a knowledge-poor approach to POS tagging, which we evaluated using a publicly accessible dataset from LDC. The motivation for our approach is that we did not want to rely on existing resources, such as a lexicon or a named entity recognizer, because they are not publicly available and are difficult to develop. We did not use any handcrafted features; instead, we employed distributed representations of words and characters. We designed the model using Long Short-Term Memory (LSTM) neural networks followed by Conditional Random Fields (CRFs), and incorporated a pre-trained word embedding model. We obtained promising results, with an accuracy of 86.0%.
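
As a rough illustration of the CRF stage mentioned in the abstract, the sketch below runs Viterbi decoding over per-token tag scores. It is not the authors' implementation: the function name `viterbi_decode`, the two-tag set, and all score values are hypothetical. In an LSTM-CRF model the emission scores would presumably come from the LSTM outputs and the transition scores would be learned; here both are hand-written numbers.

```python
# Minimal sketch of linear-chain CRF decoding (Viterbi).
# Assumption: tags and scores are invented for illustration only.

def viterbi_decode(emissions, transitions, tags):
    """Return (best_tag_sequence, best_score).

    emissions:   list of dicts, one per token, mapping tag -> score
    transitions: dict mapping (previous_tag, tag) -> score
    tags:        list of all tag names
    """
    if not emissions:
        return [], 0.0
    # best[t] = score of the best path that ends in tag t at the current token
    best = {t: emissions[0][t] for t in tags}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for t in tags:
            # Pick the previous tag that maximizes path score plus transition
            prev = max(tags, key=lambda p: best[p] + transitions[(p, t)])
            new_best[t] = best[prev] + transitions[(prev, t)] + scores[t]
            pointers[t] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag to recover the path
    last = max(tags, key=lambda t: best[t])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical two-token example with two tags
TAGS = ["NOUN", "VERB"]
EMISSIONS = [
    {"NOUN": 2.0, "VERB": 0.5},
    {"NOUN": 1.25, "VERB": 1.0},
]
TRANSITIONS = {
    ("NOUN", "NOUN"): 0.0,
    ("NOUN", "VERB"): 0.75,
    ("VERB", "NOUN"): 0.25,
    ("VERB", "VERB"): -0.5,
}

if __name__ == "__main__":
    print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))
```

With these made-up numbers, choosing the highest emission score per token independently would yield NOUN, NOUN, whereas the decoder returns NOUN, VERB because the NOUN-to-VERB transition score outweighs the small emission difference. That is the kind of sequence-level decision a CRF layer adds on top of per-token scores.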

Keywords
Bangla, Deep Learning, POS tagging
Journal or Conference Name
2016 19th International Conference on Computer and Information Technology (ICCIT)
Publication Year
2017
Indexing
Scopus