DoR

Abstract: Automatic image captioning is one of the most popular research topics in computer vision. Image captioning is the process of generating a textual description of an image based on the objects and actions it contains. A significant amount of effort has gone into generating image captions in English. However, only a few previous works address image captioning in Bangla, and the solutions they propose have not produced satisfactory results. This study therefore proposes a deep learning-based model for Bangla image captioning. The model uses ResNet50, a deep learning-based feature extractor, to extract feature vectors from images, and an LSTM network to generate text captions from these features. Two datasets were used: the BanglaLekhaImageCaptions dataset, which contains 9000 images, and the Flickr dataset, which contains 8000 images. After training and testing on these datasets, the model was found to outperform all previous Bangla image captioning models.
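
The encoder-decoder flow described in the abstract (an image feature vector goes in, a caption comes out one word at a time) can be illustrated with a minimal sketch. This is not the paper's implementation. The greedy decoding strategy, the `<start>`/`<end>` marker tokens, the function names `greedy_caption` and `toy_scores`, the 2048-length feature vector (the usual size of ResNet50's pooled output) and the English toy vocabulary are all assumptions made for illustration. The trained LSTM decoder is replaced by a hypothetical lookup table so that the example runs without any deep learning library.

```python
from typing import Callable, Dict, List, Sequence

# Assumed marker tokens; the abstract does not say which ones the paper uses.
START, END = "<start>", "<end>"


def greedy_caption(
    features: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Build a caption token by token, always taking the top-scoring next token."""
    tokens = [START]
    for _ in range(max_len):
        # In a real system this call would run the decoder (e.g. an LSTM)
        # on the image features and the tokens generated so far.
        scores = next_token_scores(features, tokens)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start marker


def toy_scores(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder.

    It ignores the image features and scores the next token from a fixed
    table keyed on the previous token.
    """
    table = {
        START: {"a": 0.9, "dog": 0.1},
        "a": {"dog": 0.8, END: 0.2},
        "dog": {"runs": 0.7, END: 0.3},
        "runs": {END: 1.0},
    }
    return table[tokens[-1]]


# Placeholder feature vector standing in for the extractor's output.
caption = greedy_caption([0.0] * 2048, toy_scores)
print(" ".join(caption))
```

In an actual captioning system, `toy_scores` would be replaced by the trained decoder's probability distribution over a Bangla vocabulary, and beam search could be substituted for the greedy choice.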

Journal or Conference Name: Proceedings of 2022 IEEE International Conference on Current Development in Engineering and Technology, CCET 2022