ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Natural Language Processing
Volume 7 - 2024 | doi: 10.3389/frai.2024.1469197

Investigating Generative AI Models and Detection Techniques: Impacts of Tokenization and Dataset Size on Identification of AI-Generated Text

Provisionally accepted
Haowei Hua 1* and Jiayu Yao 2
  • 1 The Culver Academies, Culver, United States
  • 2 Anhui Polytechnic University, Anhui, China

The final, formatted version of the article will be published soon.

    Generative AI models, including ChatGPT, Gemini, and Claude, are increasingly significant in enhancing K-12 education, offering support across various disciplines. These models provide sample answers for humanities prompts, solve mathematical equations, and brainstorm novel ideas. Despite their educational value, ethical concerns have emerged regarding their potential to mislead students into copying answers directly from AI when completing assignments, assessments, or research papers. Current detectors, such as GPT-Zero, struggle to identify modified AI-generated texts and show reduced reliability for English as a Second Language learners. This study investigates the detection of academic cheating involving generative AI in high-stakes writing assessments. Classical machine learning models, including logistic regression, XGBoost, and support vector machines, are used to distinguish between AI-generated and student-written essays. Additionally, large language models, including BERT, RoBERTa, and Electra, are examined and compared with the traditional machine learning models. The analysis focuses on prompt 1 from the ASAP Kaggle competition. To evaluate the effectiveness of various detection methods and generative AI models, we include ChatGPT, Claude, and Gemini in their base, pro, and latest versions. Furthermore, we examine the impact of paraphrasing tools such as GPT-Humanizer and QuillBot and introduce a new method that uses synonym information to detect humanized AI texts. Additionally, the relationship between dataset size and model performance is explored to inform data collection in future research.
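    The classical detection approach described above can be illustrated with a minimal sketch: a bag-of-words logistic regression trained to separate two text classes. Everything here is an assumption for illustration only — the toy vocabulary, the invented training sentences, and the hand-rolled gradient-descent trainer are not the authors' actual pipeline, features, or data.

```python
import math
from collections import Counter

# Hypothetical marker vocabulary: formal connectives vs. informal fillers.
# The real study would learn features from the ASAP essay corpus instead.
VOCAB = ["furthermore", "moreover", "really", "like"]

def featurize(text):
    """Count occurrences of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(texts, labels, lr=0.5, epochs=200):
    """Fit logistic regression weights by stochastic gradient descent."""
    w, b = [0.0] * len(VOCAB), 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            x = featurize(text)
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            grad = p - y  # gradient of the log-loss w.r.t. the logit
            w = [wj - lr * grad * xj for wj, xj in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(text, w, b):
    """Return the estimated probability that the text is AI-generated."""
    x = featurize(text)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy training data (label 1 = AI-like, 0 = student-like).
texts = [
    "furthermore the evidence supports this claim",
    "moreover the analysis demonstrates the point",
    "i really liked this book a lot",
    "it was like super interesting to me",
]
labels = [1, 1, 0, 0]
w, b = train(texts, labels)
```

On this linearly separable toy set, the learned weights assign positive scores to the formal connectives and negative scores to the informal fillers, so `predict("furthermore the data confirms it", w, b)` exceeds 0.5 while an informal sentence falls below it.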

    Keywords: GenAI, Generative artificial intelligence, machine learning, Natural Language Processing, writing assessment, ChatGPT, Claude, text classification

    Received: 06 Sep 2024; Accepted: 28 Oct 2024.

    Copyright: © 2024 Hua and Yao. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Haowei Hua, The Culver Academies, Culver, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.