DoR - Division of Research

Title: Retrieving Top k% Relevant Patterns for Relation Extraction in Bangla using Distant Supervision

Abstract: Information extraction is a pivotal area of natural language processing (NLP) that aims to distill structured information, such as entity types and their relationships, from diverse textual sources. Relation extraction, a core task within this field, automatically identifies and classifies relationships between entities mentioned in unstructured text. This paper explores distant supervision (DS) methods tailored for Bangla, addressing the language's linguistic nuances and the challenge of data scarcity. DS leverages knowledge bases to label a corpus automatically, although this can introduce noise. Our location mnemonics help reduce some noisy patterns related to location-based relations. Additionally, our proposed method retrieves the top K% relevant patterns using conflict scores and further refines the selection with probability scores. Setting K to 80% yields an F1 score of 91%, demonstrating the efficacy of our approach.

Journal or Conference Name: IEEE International Conference on Signal Processing, Information, Communication and Systems, SPICSCON 2024 - Proceedings