Perception of complex road environments is a critical factor in autonomous driving and has become a research focus in intelligent vehicles. In this paper, a real-time front-vehicle detection system is proposed to ensure safe driving in complex environments, particularly in congested megacities. The system is based on the YOLO model, which effectively detects and classifies various vehicles in both images and videos, and it improves detection accuracy by modifying the feature-extraction backbone. To the best of the authors' knowledge, this is the first time vehicle detection has been implemented on the recently published DhakaAI dataset. Compared to other available object-detection datasets, such as KITTI, the DhakaAI dataset presents a complex environment with numerous vehicles spanning 21 different types. Experimental results demonstrate that the proposed system outperforms state-of-the-art object detectors: on this dataset, its mAP (mean average precision) and FPS (frames per second) exceed those of RetinaNet, SSD, and Faster R-CNN by 2.97% and 1.47, 4.64% and 5.57, and 4.75% and 3.02, respectively.