DoR

Abstract: Automatic caption generation from images has become an active research topic that combines Natural Language Processing (NLP) and Computer Vision (CV) to interpret an input image and describe it in text. Such captions can help visually impaired people understand their surroundings. In this study, we present a Long Short-Term Memory (LSTM) based Recurrent Neural Network (RNN) approach that generates natural-language captions for an image. A dataset containing 8,000 images and a total of 37,611 captions is used to train the model, and VGG16 is employed to extract features from the images. The evaluation shows an accuracy of 66% and BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.40, 0.18, 0.11, and 0.03, respectively.
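As an illustration of how BLEU-n scores like those reported in the abstract can be computed, the following is a minimal sketch of sentence-level BLEU with uniform n-gram weights and no smoothing. It is not the authors' evaluation code: the function names (`ngrams`, `modified_precision`, `bleu`) and the toy captions are assumptions made for this example, and the paper may have used a library implementation or corpus-level scoring instead.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the highest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip candidate counts so repeated words cannot inflate the score.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Cumulative BLEU-max_n: brevity penalty times the geometric mean of precisions."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    c = len(candidate)
    # Reference length closest to the candidate length (ties go to the shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


# Hypothetical captions, tokenized by whitespace.
references = ["a dog runs across the grass".split()]
candidate = "a dog runs on the grass".split()
for n in range(1, 5):
    print(f"BLEU-{n}: {bleu(candidate, references, n):.2f}")
```

Because each higher-order score multiplies in a stricter n-gram precision, cumulative BLEU computed this way cannot increase as n grows for a given caption, which matches the decreasing pattern from BLEU-1 to BLEU-4 in the reported results.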

Keywords: Computer vision, Deep learning, Encoder-Decoder, Image captioning, LSTM

Journal or Conference Name: Proceedings of 2022 IEEE International Women in Engineering (WIE) Conference on Electrical and Computer Engineering, WIECON-ECE 2022