AUTHOR=Jiang Shufan, Cormier Stéphane, Angarita Rafael, Rousseaux Francis

TITLE=Improving text mining in plant health domain with GAN and/or pre-trained language model

JOURNAL=Frontiers in Artificial Intelligence

VOLUME=Volume 6 - 2023

YEAR=2023

URL=https://www.frontiersin.org/journals/artificial-intelligence/articles/10.3389/frai.2023.1072329

DOI=10.3389/frai.2023.1072329

ISSN=2624-8212

ABSTRACT=The Bidirectional Encoder Representations from Transformers (BERT)-based architecture
proposes an objective engineering paradigm for Natural Language Processing that consists
of (a) pre-training a language model for contextualized feature extraction and (b) fine-tuning it on
downstream tasks. Different pre-trained language models have proven to be a promising
technology for domain-specific text mining. However, labeled data remain scarce in certain
applications, such as plant health hazard detection from individuals’ observations.
GAN-BERT extends the fine-tuning with unlabeled data in a Generative Adversarial Network
(GAN) and obtains better performance in several text classification tasks. In this paper, we study
the combination of a GAN-based model and further pre-training. First, we examine whether
this combination improves classification for plant health hazard detection. Second, we examine
whether the pre-trained language model affects training in a GAN setting.