Bengali is a script-rich language with complex characters and ligatures, yet it remains underexplored in the field of font generation. Existing font generation methods have achieved good results on Chinese, English, and other scripts; however, due to the complexity of Bengali characters, recent methods such as FontDiffuser fail to produce high-quality Bengali fonts. We propose BengaliDiff, a novel generative model that combines a diffusion-based architecture, style-content fusion, and adversarial supervision to synthesize Bengali characters in a target font style. We adopt an image-to-image translation methodology, which improves font generation by preserving character structure while rendering characters with a uniform style across different fonts. Building on FontDiffuser, we use a dual aggregation cross-attention scheme that injects content and style features into the reverse denoising process at the channel and spatial levels, respectively. In addition, we embed an adversarial discriminator that promotes stylistically coherent and perceptually accurate generations. Experiments on a predefined set of Bengali fonts show that BengaliDiff outperforms existing baselines in both content preservation and style consistency. To the best of our knowledge, our method is the first to apply a diffusion model to the Bengali font generation task. We also release a publicly available Bengali font dataset and a pre-trained model to support digital publishing, handwritten text recognition, and custom typography.
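The abstract mentions injecting content and style features at the channel and spatial levels but does not detail the mechanism. The following is a minimal numpy sketch of that general idea, not the paper's actual architecture: all shapes, function names, and the gating/attention formulas are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_aggregation(content, style):
    # Channel-level branch (illustrative): gate each content channel by
    # a sigmoid of the product of its global statistics with the style's
    # channel statistics, in the spirit of squeeze-and-excite gating.
    c_vec = content.mean(axis=(1, 2))                # (C,)
    s_vec = style.mean(axis=(1, 2))                  # (C,)
    gate = 1.0 / (1.0 + np.exp(-(c_vec * s_vec)))    # (C,) sigmoid gates
    return content * gate[:, None, None]

def spatial_aggregation(content, style):
    # Spatial-level branch (illustrative): cross-attention in which
    # content locations query style locations, so each spatial position
    # aggregates style features from the whole style map.
    C, H, W = content.shape
    q = content.reshape(C, H * W).T                  # (HW, C) queries
    k = style.reshape(C, H * W).T                    # (HW, C) keys
    v = k                                            # values = style features
    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)    # (HW, HW) attention map
    out = attn @ v                                   # (HW, C)
    return out.T.reshape(C, H, W)

# Toy feature maps standing in for denoiser activations.
rng = np.random.default_rng(0)
content = rng.standard_normal((8, 4, 4))
style = rng.standard_normal((8, 4, 4))

# Fuse the two branches before feeding the result back into the
# reverse denoising step (fusion by addition is an assumption here).
fused = channel_aggregation(content, style) + spatial_aggregation(content, style)
print(fused.shape)
```

The point of the two branches is that channel gating reweights *what* kind of features pass through, while spatial cross-attention decides *where* style information is drawn from; combining them lets the denoiser condition on both.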