ORIGINAL RESEARCH article

Front. Artif. Intell.

Sec. AI in Food, Agriculture and Water

Volume 8 - 2025 | doi: 10.3389/frai.2025.1498025

MoSViT: A Lightweight Vision Transformer Framework for Efficient Disease Detection via Precision Attention Mechanism

Provisionally accepted
元奇 陈*, 爱萍 王, 子扬 刘, 杰 岳, Zhang Enxu, 飞 李, 宁 张*
  • Xijing University, Xi'an, China

The final, formatted version of the article will be published soon.

    Maize, a staple food crop worldwide, suffers substantial yield losses from disease. Traditional diagnostic methods are often inefficient and subjective, which makes timely and accurate disease management difficult. To address these issues, this paper proposes MoSViT, a classification model built on machine learning and computer vision techniques. Building on the MobileViT V2 framework, we integrate the CLA focus mechanism, the DRB module, the MoSViT Block, and the LeakyRelu6 activation function to improve feature extraction accuracy while reducing the complexity of analyzing all possible features. Trained on a dataset of 3,850 images covering Blight, Common Rust, Gray Leaf Spot, and Healthy conditions, MoSViT achieves accuracy, precision, recall, and F1 score of 98.75%, 98.73%, 98.72%, and 98.72%, respectively. These metrics surpass those of other leading models, including Swin Transformer V2, DenseNet121, and EfficientNet V2, and MoSViT also uses its parameters more efficiently. In addition, heat maps of the model's focus areas improve its interpretability and give insight into its decision-making process. Testing on small-sample datasets further confirms the model's generalization capability and its potential for small-sample detection scenarios.
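
For readers unfamiliar with the four reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and F1 score can be computed from a four-class confusion matrix. It assumes macro averaging (the unweighted mean of per-class scores); the abstract does not state which averaging the authors used. The function name `classification_metrics` and the toy confusion matrix are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Class names as listed in the abstract.
CLASSES = ["Blight", "Common Rust", "Gray Leaf Spot", "Healthy"]


def classification_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j. Macro averaging is an assumption.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy data (10 samples per class), not results from the paper.
toy_confusion = [
    [9, 0, 1, 0],   # true Blight
    [0, 10, 0, 0],  # true Common Rust
    [1, 0, 9, 0],   # true Gray Leaf Spot
    [0, 0, 0, 10],  # true Healthy
]
metrics = classification_metrics(toy_confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With weighted or micro averaging the precision, recall, and F1 values would generally differ from the macro-averaged ones, so the averaging choice matters when comparing against the figures reported above.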

    Keywords: precision attention, maize disease detection, deep learning, MobileViT V2, parallel attention mechanism, few-shot object detection

    Received: 18 Sep 2024; Accepted: 27 Feb 2025.

    Copyright: © 2025 陈, 王, 刘, 岳, Enxu, 李 and 张. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence:
    元奇 陈, Xijing University, Xi'an, China
    宁 张, Xijing University, Xi'an, China

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
