AUTHOR=Jiang Shufan, Cormier Stéphane, Angarita Rafael, Rousseaux Francis
TITLE=Improving text mining in plant health domain with GAN and/or pre-trained language model
JOURNAL=Frontiers in Artificial Intelligence
VOLUME=6
YEAR=2023
URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1072329
DOI=10.3389/frai.2023.1072329
ISSN=2624-8212
ABSTRACT=The Bidirectional Encoder Representations from Transformers (BERT)-based architecture proposes an objective engineering paradigm for Natural Language Processing that consists of: a) pre-training a language model for contextualized feature extraction, and b) fine-tuning on downstream tasks. Various pre-trained language models have proven to be a promising technology for domain-specific text mining. However, we still face a lack of sufficient labeled data in certain applications, such as plant health hazard detection from individuals' observations. GAN-BERT extends fine-tuning with unlabeled data in a Generative Adversarial Network (GAN) and obtains better performance on several text classification tasks. In this paper, we study the combination of a GAN-based model with further pre-training. First, we discuss whether this combination improves classification tasks for plant health hazard detection; second, we examine whether the choice of pre-trained language model impacts training in a GAN setting.
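
To make the setup the abstract describes concrete, below is a minimal sketch of GAN-BERT-style semi-supervised fine-tuning (after Croce et al., 2020), where a generator produces fake sentence representations and a discriminator classifies real representations into k task classes plus one extra "fake" class. This is an illustration under stated assumptions, not the authors' implementation: the model name (`bert-base-uncased`, which could be swapped for a further pre-trained domain model), the class count, and all hyperparameters are placeholders.

```python
# Sketch of one GAN-BERT-style training step; names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-uncased"  # assumption: swap in a further pre-trained domain model
HIDDEN, NOISE_DIM, NUM_CLASSES = 768, 100, 2  # NUM_CLASSES is task-specific

tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)

# Generator: noise -> fake representation with the same size as BERT's [CLS] vector.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, HIDDEN), nn.LeakyReLU(0.2), nn.Linear(HIDDEN, HIDDEN))

# Discriminator: representation -> k real classes + 1 "fake" class.
disc_body = nn.Sequential(nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(0.2))
disc_head = nn.Linear(HIDDEN, NUM_CLASSES + 1)

opt_d = torch.optim.AdamW(
    list(encoder.parameters()) + list(disc_body.parameters())
    + list(disc_head.parameters()), lr=2e-5)
opt_g = torch.optim.AdamW(generator.parameters(), lr=2e-5)

def encode(texts):
    """Return the [CLS] representation for a batch of raw texts."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]

def train_step(labeled_texts, labels, unlabeled_texts):
    # --- Discriminator (and BERT) update ---
    real = encode(labeled_texts + unlabeled_texts)
    fake = generator(torch.randn(real.size(0), NOISE_DIM))
    logits_real = disc_head(disc_body(real))
    logits_fake = disc_head(disc_body(fake.detach()))
    n = len(labeled_texts)
    # Supervised loss on the labeled examples, over the k real classes only.
    loss_sup = F.cross_entropy(logits_real[:n, :NUM_CLASSES], torch.tensor(labels))
    # Unsupervised loss: real examples (labeled or not) should avoid the
    # "fake" class, while generated representations should land in it.
    p_fake_real = F.softmax(logits_real, dim=-1)[:, -1]
    p_fake_fake = F.softmax(logits_fake, dim=-1)[:, -1]
    loss_unsup = (-torch.log(1 - p_fake_real + 1e-8).mean()
                  - torch.log(p_fake_fake + 1e-8).mean())
    opt_d.zero_grad(); (loss_sup + loss_unsup).backward(); opt_d.step()

    # --- Generator update: fool the discriminator + feature matching ---
    real_feats = disc_body(encode(labeled_texts + unlabeled_texts)).detach()
    fake_feats = disc_body(generator(torch.randn(real_feats.size(0), NOISE_DIM)))
    p_fake = F.softmax(disc_head(fake_feats), dim=-1)[:, -1]
    loss_g = (-torch.log(1 - p_fake + 1e-8).mean()
              + (real_feats.mean(0) - fake_feats.mean(0)).pow(2).mean())
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because unlabeled texts enter only the unsupervised term, this setup lets scarce labeled plant health observations be supplemented with abundant unlabeled ones, which is the motivation for combining GAN-BERT with a further pre-trained encoder in the paper's two research questions.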