DoR - Division of Research

Title: BRADS and BRWDS: Multipurpose audio and text datasets for automatic Bangla regional speech recognition

Abstract: This paper presents BRADS and BRWDS, multipurpose audio and text datasets for automatic Bangla regional speech recognition. Although Bangla is the seventh most spoken native language globally, it remains underrepresented in speech recognition research. The dataset contains 298 frequently used Bangla words: 233 regional words and 65 standard Bangla words. These words, which cover various regional pronunciations and meanings, were collected from native speakers in Dhaka, Chattogram, Barisal, Mymensingh, Rajshahi, Sylhet, Rangpur, and Khulna. The 2439 audio segments in the dataset were contributed voluntarily by 85 native speakers and assessed by ten university students. The resource is intended for researchers building automatic Bangla regional speech recognition systems, with an emphasis on capturing regional differences in pronunciation and vocabulary. Because background noise can be incorporated, researchers can recreate real-world conditions during model training. The dataset's modular construction also allows it to be extended with new regional words. This multipurpose dataset addresses a critical gap in Bangla speech recognition research and has the potential to advance natural language processing (NLP), particularly with regard to linguistic diversity in Bangladesh.