Diabetic retinopathy (DR) is the foremost cause of blindness in people with diabetes worldwide, and early diagnosis is essential for effective treatment. Unfortunately, the present DR screening method requires the skill of ophthalmologists and is time-consuming. In this study, we present an automated system for DR severity classification employing the fine-tuned Compact Convolutional Transformer (CCT) model to overcome these issues. We assembled five datasets to generate a more extensive dataset containing 53,185 raw images. Various image pre-processing techniques and 12 types of augmentation procedures were applied to improve image quality and create a massive dataset. A new DR-CCTNet model is proposed. It is a modification of the original CCT model to address training time concerns and work with a large amount of data. Our proposed model delivers excellent accuracy even with low-pixel images and still has strong performance with fewer images, indicating that the model is robust. We compare our model’s performance with transfer learning models such as VGG19, VGG16, MobileNetV2, and ResNet50. The test accuracy of the VGG19, ResNet50, VGG16, and MobileNetV2 were, respectively, 72.88%, 76.67%, 73.22%, and 71.98%. Our proposed DR-CCTNet model to classify DR outperformed all of these with a 90.17% test accuracy. This approach provides a novel and efficient method for the detection of DR, which may lower the burden on ophthalmologists and expedite treatment for patients.