AUTHOR=Meissner Roy, Pögelt Alexander, Ihsberner Katja, Grüttmüller Martin, Tornack Silvana, Thor Andreas, Pengel Norbert, Wollersheim Heinz-Werner, Hardt Wolfram TITLE=LLM-generated competence-based e-assessment items for higher education mathematics: methodology and evaluation JOURNAL=Frontiers in Education VOLUME=9 YEAR=2024 URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1427502 DOI=10.3389/feduc.2024.1427502 ISSN=2504-284X ABSTRACT=

In this article, we explore the transformative impact of advanced, parameter-rich Large Language Models (LLMs) on the production of instructional materials in higher education, focusing on the automated generation of both formative and summative assessments for learners in mathematics. We introduce a novel LLM-driven process and application, called ItemForge, tailored specifically to the automatic generation of e-assessment items in mathematics. The approach is closely aligned with the levels and hierarchy of cognitive learning objectives developed by Anderson and Krathwohl, and incorporates specific mathematical concepts from the courses under consideration. The quality of the generated free-text items and their corresponding answers (sample solutions), as well as their appropriateness to the designated cognitive level and subject matter, was evaluated in a small-scale study in which three mathematics experts reviewed a total of 240 generated items, providing a comprehensive analysis of their effectiveness and relevance. Our findings demonstrate that the tool is proficient at producing high-quality items that align with the chosen concepts and targeted cognitive levels, indicating its potential suitability for educational purposes. However, the provided answers (sample solutions) were occasionally inaccurate or incomplete, signalling a need for further refinement of the tool's processes.