Natural language processing (NLP) is the set of techniques by which computers process human language. Part-of-speech (POS) tagging is one of the fundamental requirements for some NLP
applications. It is considered a solved problem for some foreign languages, such as English and Chinese, because of the high accuracy achieved (97%), whereas it is still an unsolved problem for Bangla because of the language's ambiguity.
Although building a POS tagger for Bangla is not new work, each of the available POS taggers has its own limitations. We choose
to develop an
unsupervised system rather
than a supervised
system, because a supervised system needs a large amount of training data, and the resources available for Bangla are very limited. Here
we develop a POS tagger based mainly on Bangla grammar, especially suffixes, because Bangla is a highly inflectional language in which a single word has many variants depending on its suffix. In this POS tagger,
we assign 8 base POS tags, applying rules based on Bangla grammar and suffixes to identify the POS tag of each word with the help of a verb-root
dataset. To handle words that carry no suffix, a dataset of almost 14,500 Bangla words with their default POS tags is added to the system, which helps to increase the efficiency of this POS tagger. A modified version of
a previously used algorithm for suffix analysis is applied, which results in a satisfactory accuracy of about 94.2%.
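
The pipeline described above (default-tag lookup, then suffix rules checked against a verb-root dataset) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the romanized toy words, the suffix lists, the tag names, and the function names `tag_word` and `tag_sentence` are hypothetical and are not the rules, the 8 base tags, or the datasets used in this work.

```python
# Toy, romanized data: hypothetical stand-ins, not the actual resources.
DEFAULT_LEXICON = {"ebong": "CONJ", "ami": "PRON"}  # suffix-less words with default tags
VERB_ROOTS = {"kor", "ja"}                          # stand-in for the verb-root dataset
VERB_SUFFIXES = ["chhi", "lam", "be"]               # illustrative verb inflections
NOUN_SUFFIXES = ["gulo", "ti", "ra"]                # illustrative noun markers


def tag_word(word):
    """Return a POS tag for one word using lexicon lookup, then suffix rules."""
    # Step 1: words listed in the default lexicon get their stored tag.
    if word in DEFAULT_LEXICON:
        return DEFAULT_LEXICON[word]

    # Step 2: a verb suffix counts only if the remaining stem is a known verb root.
    for suffix in sorted(VERB_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and word[:-len(suffix)] in VERB_ROOTS:
            return "VERB"

    # Step 3: a noun suffix attached to a non-empty stem marks a noun.
    for suffix in sorted(NOUN_SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return "NOUN"

    # No rule fired: leave the word untagged rather than guess.
    return "UNK"


def tag_sentence(sentence):
    """Tag every whitespace-separated token of a sentence."""
    return [(token, tag_word(token)) for token in sentence.split()]


if __name__ == "__main__":
    print(tag_sentence("ami korchhi"))
```

Checking the longest suffix first keeps a short suffix from matching before a longer one that contains it, and requiring the stem to appear in the verb-root set is one simple way to reduce false verb matches; a real system would need many more rules and a far larger lexicon.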