The widespread use of deepfake technology poses serious risks to workplace security, eroding trust and enabling malicious activities such as identity theft, data breaches, and disinformation. Although current detection techniques employ sophisticated deep learning models, they often fail in practical settings because of limited generalisability, high computational cost, and weak robustness to adversarial attacks. To improve detection accuracy and liveness verification, this study presents a novel hybrid detection framework that combines facial and speech features through feature-level fusion and multimodal biometric analysis. The proposed system integrates convolutional neural networks (CNNs) for feature extraction with a Bi-LSTM-based classifier for temporal analysis, making it effective at identifying subtle deepfake manipulations. In addition, the framework adopts a multi-task learning approach that detects and localises deepfake artefacts simultaneously, improving both performance and interpretability. Evaluated on large-scale datasets, including DeepfakeTIMIT and AVSpeech, the system achieved a detection accuracy of 98.7%, surpassing state-of-the-art approaches. The approach shows strong promise for deployment in workplace settings, offering robust defence against emerging threats while preserving user privacy and operational scalability. Future work will explore real-time deployment and edge computing technologies for large-scale workplace applications.
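To make the pipeline concrete, the following is a minimal sketch of the feature-level fusion and multi-task output stages described above. All names, embedding dimensions, and the linear projection heads are illustrative assumptions, not details from the paper; the actual system uses CNN extractors and a Bi-LSTM classifier, which are stubbed out here with random features and simple projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes: T aligned video/audio frames, with per-frame face and
# speech embeddings produced by the (omitted) CNN extractors.
T = 16            # number of aligned frames (illustrative)
D_FACE = 128      # face embedding size (illustrative)
D_SPEECH = 64     # speech embedding size (illustrative)

face_feats = rng.standard_normal((T, D_FACE))      # stand-in for CNN face features
speech_feats = rng.standard_normal((T, D_SPEECH))  # stand-in for CNN speech features

# Feature-level fusion: concatenate the two modalities frame by frame,
# yielding one joint sequence for the downstream temporal classifier.
fused = np.concatenate([face_feats, speech_feats], axis=1)

# Multi-task heads (hypothetical linear projections standing in for the
# Bi-LSTM outputs): a sequence-level real/fake score plus a per-frame
# artefact-localisation map, trained jointly in the real framework.
W_cls = rng.standard_normal(D_FACE + D_SPEECH)
W_loc = rng.standard_normal(D_FACE + D_SPEECH)

fake_score = float(1.0 / (1.0 + np.exp(-(fused.mean(axis=0) @ W_cls))))
artefact_map = 1.0 / (1.0 + np.exp(-(fused @ W_loc)))

print(fused.shape, artefact_map.shape)
```

The key design point sketched here is that fusion happens at the feature level (one joint representation per frame) rather than at the decision level, so the temporal classifier can exploit cross-modal inconsistencies such as mismatched lip movements and speech.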