Automatic caption generation
from images has grown into an active research topic that combines
Natural Language Processing (NLP) and Computer Vision (CV) to understand
an input image and describe it in text. Such systems can assist visually
impaired people in perceiving their surroundings through generated text
captions. In this study, we present a Long Short-Term Memory (LSTM)
based Recurrent Neural Network (RNN) approach that generates a natural
language description of an image. A dataset containing 8,000 images
with a total of 37,611 captions is utilized to train our model. In
addition, VGG16 is employed to extract features from the images.
Finally, the model's performance is evaluated, showing an accuracy of
66% and BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.40, 0.18, 0.11,
and 0.03, respectively.
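BLEU scores of this kind can be computed from generated captions with
NLTK's corpus_bleu, as in the short sketch below; the tokenized captions
here are toy placeholders, not data from the study.

    from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

    # One list of tokenized reference captions per image, and one
    # tokenized generated caption per image (toy placeholders).
    references = [[["a", "dog", "runs", "on", "the", "grass"],
                   ["a", "brown", "dog", "is", "running", "outside"]]]
    candidates = [["a", "dog", "is", "running"]]

    # Cumulative n-gram weights give BLEU-1 through BLEU-4; smoothing
    # avoids zero scores when a higher-order n-gram never matches.
    smooth = SmoothingFunction().method1
    for n, w in enumerate([(1.0, 0, 0, 0), (0.5, 0.5, 0, 0),
                           (1/3, 1/3, 1/3, 0), (0.25, 0.25, 0.25, 0.25)], 1):
        score = corpus_bleu(references, candidates, weights=w,
                            smoothing_function=smooth)
        print("BLEU-%d: %.2f" % (n, score))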