A novel automated multi-class classification approach is proposed for detecting lung abnormalities from chest X-ray and CT images. The study uses a publicly accessible dataset whose images are both too few and unevenly distributed across classes; a deep convolutional generative adversarial network (DCGAN) is employed as a data augmentation technique to balance the dataset. Several preprocessing steps are applied to enhance features and reduce noise in the lung images. The base model is the compact convolutional transformer (CCT), which combines a vision transformer with convolutions. To determine the best model configuration, an ablation study is performed on the original CCT model using a CT scan dataset with image dimensions of 32×32. The resulting model is then trained on the X-ray dataset to evaluate its performance on an entirely different modality. Its performance is compared with that of six pre-trained models on 32×32 images. While the traditional models achieved modest performance, with test accuracies ranging from 43% to 77% and from 49% to 73% and lengthy training times, the proposed model performed exceptionally well, obtaining test accuracies of 99.77% and 95.37% for CT and X-ray, respectively, with short training durations of 10–12 and 40–42 seconds per epoch. Robustness is demonstrated by progressively reducing the number of training images; the findings indicate that the model maintains good performance even on a reduced dataset. The explainable AI technique Grad-CAM is used to produce color visualizations that explain the model's decisions and help health specialists make quick, confident decisions. Overall, this study combines image preprocessing and deep learning techniques to detect lung anomalies while addressing the challenges of training time and computational complexity.
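The robustness check described above, retraining on progressively smaller training sets, can be illustrated with a minimal sketch. It is not the authors' procedure: the helper `stratified_subset`, the toy class names, the class sizes, and the fractions 1.0, 0.75, 0.5, and 0.25 are all assumptions chosen for illustration. The sketch only shows how a training set might be shrunk while keeping the class balance intact.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction, seed=0):
    """Return sorted indices keeping about `fraction` of every class.

    Hypothetical helper for illustration; not taken from the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        # Keep at least one image per class so that no class disappears.
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))
    return sorted(kept)


# Toy balanced dataset (assumed): 3 placeholder classes with 100 images each.
labels = [name for name in ("class_a", "class_b", "class_c") for _ in range(100)]

# Assumed reduction schedule; the study's actual fractions are not given here.
subset_sizes = {
    fraction: len(stratified_subset(labels, fraction))
    for fraction in (1.0, 0.75, 0.5, 0.25)
}
print(subset_sizes)
```

In a real experiment, the model would be retrained on each subset and evaluated on the same untouched test set, so that any drop in accuracy can be attributed to the smaller training set alone.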