DoR - Division of Research

Scopus Indexed Publications

Paper Details

Title: BanglaSarc3: A benchmark dataset for Bangla sarcasm detection from social media to advance Bangla NLP

Author: Susmoy Biswas, Md. Hasan Imam Bijoy, Md. Minhazul Abedin, Md. Mostafizur Rahman Zahid, Md. Sadekur Rahman, Mst. Taposi Rabeya

Abstract: Sarcasm is a form of sentiment often used for comedic effect. Its widespread use contributes to frequent misinterpretation of humour-based comments among native Bengali speakers. The growing prevalence of sarcasm in the Bengali language necessitates further study using natural language processing, as detecting Bengali sarcasm remains particularly challenging. To address this, the study introduces BanglaSarc3, a ternary-class dataset comprising 12,089 Facebook comments categorised as sarcastic (4,012 instances), neutral (4,056 instances), or non-sarcastic (4,021 instances). The dataset aims to tackle humour misinterpretation, which often leads to digital conflicts, and provides a valuable resource for improving sarcasm detection in Bengali NLP research. It serves as a benchmark for evaluating NLP models on Bengali sarcasm classification, fostering linguistic diversity and inclusive language models while ensuring balanced category representation. To enhance data quality, pre-processing steps such as anonymisation and duplicate removal were applied. Three native Bengali speakers independently assessed the text labels to ensure reliability. Designed to advance NLP research, BanglaSarc3 supports applications in sarcasm detection, sarcastic text classification, language modelling, and education. By providing a robust foundation for temporal analysis in Bengali, it enhances the development of precise, context-aware NLP models. The dataset is openly available for academic and research purposes, promoting collaboration and innovation within the Bengali NLP community.

Journal or Conference Name: Data in Brief

Publication Year: 2025

Indexing: Scopus