BRIEF RESEARCH REPORT article

Front. Educ.
Sec. STEM Education
Volume 9 - 2024 | doi: 10.3389/feduc.2024.1452570

Evaluating ChatGPT-4 and ChatGPT-4o: Performance Insights from NAEP Mathematics Problem Solving

Provisionally accepted
  • Digital Promise, Redwood City, United States

The final, formatted version of the article will be published soon.

    This study assesses the capabilities of OpenAI's ChatGPT-4 and ChatGPT-4o in solving mathematics problems from the National Assessment of Educational Progress (NAEP) across grades 4, 8, and 12. Results indicate that ChatGPT-4o slightly outperforms ChatGPT-4 and that both models generally surpass U.S. students' performance across all grades, content areas, item types, and difficulty levels. However, both models perform worse on geometry and measurement than on algebra, and both struggle more with high-difficulty items. This investigation highlights the strengths and limitations of AI as a supplementary educational tool, pinpointing areas for improvement in spatial intelligence and complex mathematical problem-solving. These findings suggest that while AI has the potential to support instruction in specific mathematical areas such as algebra, careful integration and teacher-mediated strategies remain necessary in areas where AI is less effective.

    Keywords: artificial intelligence, ChatGPT-4, ChatGPT-4o, NAEP, mathematics education

    Received: 21 Jun 2024; Accepted: 26 Aug 2024.

    Copyright: © 2024 Wei. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Xin Wei, Digital Promise, Redwood City, United States

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.