Potato production faces serious threats from the continuing spread of leaf diseases, which can severely disrupt plant growth and development. These diseases are common and often destructive, which makes early detection and diagnosis crucial. Several automated methods have been developed to detect leaf diseases using various imaging techniques. Building on the proven success of transformer-based models in image analysis, particularly the Vision Transformer (ViT), this study proposes a ViT-based model for classifying, detecting, and localizing potato leaf diseases from visual data. The experiments evaluate model performance in terms of accuracy, precision, recall, and F1-score while varying the hyperparameters and the image patch configuration. The best results were achieved with an image size of 48×48 and a patch size of 6, yielding 99% validation accuracy and 98% test accuracy, supported by confusion matrix and F1-score evaluation.
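
To make the reported patch configuration concrete, the following minimal sketch computes how many patches a ViT would see for a 48×48 image with a patch size of 6. It assumes square, non-overlapping patches and a 3-channel (RGB) input; neither is stated in the abstract, and the function name `patch_grid` is hypothetical, chosen only for illustration.

```python
def patch_grid(image_size: int, patch_size: int, channels: int = 3) -> dict:
    """Patch statistics for a square image split into square, non-overlapping patches.

    channels=3 (RGB) is an assumption; the abstract does not state the channel count.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return {
        "patches_per_side": per_side,                     # patches along one edge
        "num_patches": per_side * per_side,               # token count before any class token
        "patch_dim": patch_size * patch_size * channels,  # flattened length of one patch
    }


# Configuration reported as best in the abstract: 48x48 image, patch size 6.
config = patch_grid(48, 6)
print(config)
```

Under these assumptions the image is split into an 8×8 grid, that is, 64 patches of 108 values each. A smaller patch size on the same image would lengthen the token sequence, which is one reason the patch configuration is worth tuning alongside the other hyperparameters.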