Description
In this study, a deep learning model based on the Vision Transformer (ViT-B/16) architecture was employed to classify cauliflower leaf diseases automatically from image data. The VegNet dataset, which comprises four classes (Black Rot, Downy Mildew, Bacterial Spot Rot, and Healthy), was used to evaluate the effectiveness of the proposed method. All images were manually divided into training, validation, and test subsets to ensure a balanced and controlled experimental setup. Data augmentation techniques, including random rotation and horizontal flipping, were applied only to the training and validation sets to enhance model generalization, while the test images were kept in their original form to provide an unbiased performance assessment.
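As a concrete illustration of this preprocessing setup, the sketch below uses PyTorch/torchvision; the rotation range, input size, and normalization statistics are illustrative assumptions, not values reported in the study.

```python
# Minimal sketch of the augmentation setup described above (PyTorch/torchvision).
# The rotation range, resize target, and normalization statistics are
# illustrative assumptions, not values reported in the study.
from torchvision import transforms

IMG_SIZE = 224  # ViT-B/16 expects 224x224 inputs

# Augmented pipeline, applied to the training and validation images
train_val_tf = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.RandomRotation(degrees=15),    # random rotation (assumed range)
    transforms.RandomHorizontalFlip(p=0.5),   # horizontal flipping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

# Test images are kept in their original form: only resizing and normalization,
# so the reported accuracy reflects unaugmented inputs.
test_tf = transforms.Compose([
    transforms.Resize((IMG_SIZE, IMG_SIZE)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```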
The pretrained ViT-B/16 model, originally trained on the ImageNet dataset, was fine-tuned by replacing its classification head with a task-specific output layer suitable for four-class prediction. This approach enabled the model to adapt its learned global representations to the domain-specific task of plant disease identification. Experimental results demonstrate that the proposed model achieved a test accuracy of 99.05%, indicating its strong capability to distinguish visually similar disease patterns. Confusion matrix analysis and class-wise ROC curves further confirmed the model's robustness and reliability across all disease categories.
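The head replacement described above can be sketched as follows using torchvision's pretrained ViT-B/16; the weight enum, optimizer, and learning rate are assumptions for illustration, not the study's reported configuration.

```python
# Minimal sketch of four-class fine-tuning with torchvision's ViT-B/16.
# The weight enum and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 4  # Black Rot, Downy Mildew, Bacterial Spot Rot, Healthy

# Load the ImageNet-pretrained backbone
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Replace the 1000-way ImageNet classification head with a 4-class output
# layer, keeping the pretrained transformer encoder intact.
in_features = model.heads.head.in_features
model.heads.head = nn.Linear(in_features, NUM_CLASSES)

# Standard fine-tuning setup (assumed, not taken from the paper)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```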
Overall, the findings highlight that Vision Transformer-based architectures offer a highly effective and competitive alternative to traditional CNN-based approaches for plant disease classification. The global self-attention mechanism of ViT allows it to capture complex spatial relationships within leaf images, making it a promising solution for agricultural decision-support systems and automated disease monitoring applications.
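For readers unfamiliar with the mechanism, the sketch below shows generic scaled dot-product self-attention over a sequence of patch embeddings, the operation that gives ViT its global receptive field; the tensor shapes and weight initialization are illustrative and not taken from the study.

```python
# Generic scaled dot-product self-attention over a sequence of patch tokens:
# every image patch attends to every other patch, which is what allows ViT to
# capture global spatial relationships. Shapes here are illustrative.
import math
import torch

def self_attention(x: torch.Tensor, w_q, w_k, w_v) -> torch.Tensor:
    """x: (batch, num_patches, dim) patch embeddings."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (B, N, N) patch-to-patch similarity scores, scaled by sqrt(dim)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    attn = scores.softmax(dim=-1)  # each patch weights all patches globally
    return attn @ v

# Example: 196 patches (a 14x14 grid of 16x16 patches from a 224x224 image)
dim = 768
x = torch.randn(1, 196, dim)
w = [torch.randn(dim, dim) / math.sqrt(dim) for _ in range(3)]
out = self_attention(x, *w)  # (1, 196, 768)
```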
| Keywords | Classification, Cauliflower Disease, Vision Transformer, VegNet |
|---|---|