ORIGINAL RESEARCH article

Front. Plant Sci.
Sec. Technical Advances in Plant Science
Volume 15 - 2024 | doi: 10.3389/fpls.2024.1360113
This article is part of the Research Topic Artificial Intelligence and Internet of Things for Smart Agriculture

Synthetic Data at Scale: A Development Model to Efficiently Leverage Machine Learning in Agriculture

Provisionally accepted
  • 1 King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  • 2 Department of Computer Science, Faculty of Engineering, University of Kiel, Kiel, Schleswig-Holstein, Germany
  • 3 Adam Mickiewicz University, Poznań, Greater Poland, Poland

The final, formatted version of the article will be published soon.

    The rise of artificial intelligence (AI), and in particular of modern machine learning (ML) algorithms, during the last decade has been met with great interest in the agricultural industry. While undisputedly powerful, these algorithms have one main drawback: the need for sufficient and diverse training data. The collection and annotation of real datasets are the main cost drivers of ML development, and while promising results with synthetically generated training data have been shown, generating such data is not without difficulties of its own. In this paper, we present a development model for the iterative, cost-efficient generation of synthetic training data. Its application is demonstrated by developing a low-cost early disease detector for tomato plants (Solanum lycopersicum) using synthetic training data. A neural classifier is trained exclusively on synthetic images, whose generation process is iteratively refined to obtain optimal performance. In contrast to other approaches that rely on a human assessment of similarity between real and synthetic data, we introduce a structured, quantitative approach. Our evaluation shows superior generalization results when compared to using non-task-specific real training data and a higher cost efficiency of development compared to traditional synthetic training data.

    Keywords: artificial intelligence, data generation and annotation, disease detection, greenhouse farming, machine learning, synthetic data, tomato plants

    Received: 22 Dec 2023; Accepted: 12 Aug 2024.

    Copyright: © 2024 Klein, Waller, Pirk, Pałubicki, Tester and Michels. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Jonathan Klein, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.