AUTHOR=Wei Xin TITLE=Evaluating chatGPT-4 and chatGPT-4o: performance insights from NAEP mathematics problem solving JOURNAL=Frontiers in Education VOLUME=9 YEAR=2024 URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1452570 DOI=10.3389/feduc.2024.1452570 ISSN=2504-284X ABSTRACT=
This study assesses the capabilities of OpenAI’s ChatGPT-4 and ChatGPT-4o in solving mathematics problems from the National Assessment of Educational Progress (NAEP) across grades 4, 8, and 12. Results indicate that ChatGPT-4o slightly outperforms ChatGPT-4 and that both models generally surpass U.S. students’ performance across all grades, content areas, item types, and difficulty levels. However, both models perform worse on geometry and measurement than on algebra and have more difficulty with high-difficulty mathematics items. This investigation highlights the strengths and limitations of AI as a supplementary educational tool, pinpointing areas for improvement in spatial intelligence and complex mathematical problem-solving. These findings suggest that while AI has the potential to support instruction in specific mathematical areas like algebra, there remains a need for careful integration and teacher-mediated strategies in areas where AI is less effective.