ORIGINAL RESEARCH article

Front. Artif. Intell.
Sec. Machine Learning and Artificial Intelligence
Volume 7 - 2024 | doi: 10.3389/frai.2024.1408817
This article is part of the Research Topic "ChatGPT and Other Generative AI Tools".

David vs. Goliath: Comparing conventional machine learning and a large language model for assessing students' concept use in a physics problem

Provisionally accepted
Fabian Kieser 1*, Paul Tschisgale 2, Sophia Rauh 3, Xiaoyu Bai 3, Holger Maus 2, Stefan Petersen 2, Manfred Stede 3, Knut Neumann 2, Peter Wulff 1
  • 1 Heidelberg University of Education, Heidelberg, Germany
  • 2 Department of Physics Didactics, Leibniz Institute for Science and Mathematics Education, Faculty of Mathematics and Natural Sciences, University of Kiel, Kiel, Schleswig-Holstein, Germany
  • 3 University of Potsdam, Potsdam, Brandenburg, Germany

The final, formatted version of the article will be published soon.

    Large language models have been shown to excel at many different tasks across disciplines and research contexts. They offer novel opportunities to enhance educational research and instruction, for example in assessment. However, these models have also been shown to have fundamental limitations, relating, among others, to hallucinated knowledge, the explainability of model decisions, and resource expenditure. Consequently, more conventional machine learning algorithms might be preferable for specific research problems because they give researchers greater control over their research. Yet, the circumstances under which either conventional machine learning or large language models are the preferable choice are not well understood. This study seeks to answer the question of the extent to which either conventional machine learning algorithms or a recently advanced large language model performs better in assessing students' concept use in a physics problem-solving task. We found that conventional machine learning algorithms, used in combination, outperformed the large language model. We then analyzed model decisions through closer examination of the models' classifications. We conclude that in specific contexts, conventional machine learning can supplement large language models, especially when labeled data are available.

    Keywords: Large language models, machine learning, Natural Language Processing, Problem Solving, Explainable AI

    Received: 15 Apr 2024; Accepted: 30 Aug 2024.

    Copyright: © 2024 Kieser, Tschisgale, Rauh, Bai, Maus, Petersen, Stede, Neumann and Wulff. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

    * Correspondence: Fabian Kieser, Heidelberg University of Education, Heidelberg, Germany

    Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.