1 Learning and Teaching with Media, Department of Education, Universität der Bundeswehr München, Neubiberg, Germany
2 Robinson Technologies, Kyoto, Japan
Advancements in the field of generative AI have enabled the development of powerful educational avatars. These avatars embody a human and can, for instance, listen to users’ spoken input, generate an answer utilizing a large language model (LLM), and reply by speaking with a synthetic voice. A theoretical introduction summarizes essential steps in developing AI-based educational avatars and explains how they differ from previously available educational technologies. Moreover, we introduce GPTAvatar, an open-source, state-of-the-art AI-based avatar. We then discuss the benefits of using AI-based educational avatars, which include, among other things, individualized and contextualized instruction. Afterward, we highlight the challenges of using AI-based educational avatars. Major problems concern the provision of incorrect and inaccurate information and insufficient data protection. In the discussion, we provide an outlook by addressing advances in educational content and educational technology and by identifying three crucial open questions for research and practice.
1 Introduction
The vision of creating AI-based educational avatars began with research on chatbots. Early chatbot programs like ELIZA (Weizenbaum, 1983) allowed interaction through text input and relied on keyword analysis and decision rules. Even chatbots following such simple algorithms interacted skillfully enough to convince many people that they were talking to a human being. Another critical step was the release of the chatbot A.L.I.C.E. in 1995, considered the first artificial intelligence-powered chatbot (AbuShawar and Atwell, 2015). Because this chatbot had an editable knowledge base, large corpora of natural language could be imported into it. This step reinforced the impression among users that the chatbot understood their questions and expressed itself like a human. Despite these early successes, most chatbots, especially in the education sector, remained text-based and relied on simple algorithms in the following years (Smutny and Schreiberova, 2020). In 2012, AlexNet was published (Krizhevsky et al., 2017), which is often considered the precursor to modern large language models (LLMs). This model, a deep convolutional neural network trained with backpropagation, achieved excellent results in image classification tasks. In the years that followed, more and more chatbots relying on neural networks were created, and LLMs with similar architectures became established (Smutny and Schreiberova, 2020). Then, in 2023, the LLM GPT-4 and its chat-based interface, ChatGPT, were widely adopted by the public. Considerable investment followed, which triggered further innovations in AI. Current LLMs like GPT-4 can interpret various types of unstructured data, browse the web, and perform well in a range of cognitive tasks (Kung et al., 2023; OpenAI et al., 2023; Orrù et al., 2023). Empirical results on the effectiveness of chatbots in education are promising: Alemdag (2023) reports that conventional chatbots, not including recent LLMs, foster knowledge acquisition and self-regulation skills with medium effect sizes.
Simultaneously, pedagogical agents were created that can also be seen as predecessors of today’s AI-based avatars. Herman the Bug (Lester et al., 1997) was a non-humanlike pedagogical agent in an anatomy and physiology learning environment. This pedagogical agent provided support and feedback depending on user actions and talked to learners to motivate them. STEVE (Rickel and Johnson, 1999) was a human-like pedagogical agent integrated into a VR learning environment to model and explain naval tasks and team collaboration. STEVE already possessed some abilities to listen and talk to users. Several years later, the pedagogical agent AutoTutor was published (Graesser et al., 2005). AutoTutor was embedded in an intelligent tutoring system that allowed the learner to manipulate variables. It responded to text-based user input by classifying speech acts and interpreting learner actions. In the following years, automatic speech recognition and text-to-speech technologies for transcribing and producing human speech, as well as natural language processing technologies, improved significantly. The first more advanced pedagogical agents that could interpret and respond in natural language using these technologies emerged by 2016 (Johnson and Lester, 2016). One example is Marni (Ward et al., 2013), a science tutor who listens to users’ spoken utterances, interprets them using natural language processing, and answers with synthetic speech. Such pedagogical agents can foster learning as “guides, mentors, and teammates” (Rickel, 2001, p. 15), for instance, by demonstrating actions, conveying knowledge, and learning together with the user. Consequently, researchers conducted many empirical studies to investigate their effectiveness. Despite high hopes for pedagogical agents, meta-analyses and literature reviews indicated that their effects are relatively small for knowledge acquisition (Heidig and Clarebout, 2011; Schroeder et al., 2013; Castro-Alonso et al., 2021) and affective outcomes (Wang et al., 2023).
Recent technological breakthroughs now allow the creation of AI-based educational avatars. LLMs or other generative AI models drive these avatars, which embody a human, can act in a shared virtual world with the user, and follow educational prompts. Most of these functions were already technically available in the past. However, the underlying AI technologies have made significant progress and are now more reliable, faster, and easier to integrate. Thus, AI-based avatars have taken an essential evolutionary step, making it possible to harness the combined advantages of chatbots and pedagogical agents. The second author of this paper (Robinson, 2023) developed a state-of-the-art AI-based avatar, GPTAvatar, which records user input via a microphone and converts speech to text using automatic speech recognition. GPTAvatar uses an LLM as a backend to generate answers. Text-to-speech then processes these responses to generate a realistic synthetic human voice that speaks to the user. The resulting audio is also processed to generate matching lip movements on the 3D avatar. Dynamically blending animations to match the current situation (listening vs. speaking, etc.) contributes to the authenticity of the avatar, which is placed in a 3D virtual world that can be adapted to fit the desired theme. Figure 1A visualizes the software architecture of GPTAvatar, including the technologies used. Figure 1B shows a language-learning avatar created with this software. The user can set the avatar’s personality, the educational scenario, and how the LLM responds to user requests in a configuration file; see Figure 1C. Developed with the Unity game engine, GPTAvatar is open-source software that can be used to create custom AI-based avatars.
Figure 1. Software architecture (A), screenshot of the avatar (B), and config file (C) of GPTAvatar (Robinson, 2023).
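To make this pipeline concrete, the following minimal Python sketch chains automatic speech recognition, an LLM backend, and text-to-speech for a single conversational turn. GPTAvatar itself is a Unity (C#) project, so this sketch only approximates the loop in Figure 1A; the use of the OpenAI Python SDK, the model names, the file paths, and the system prompt are our illustrative assumptions.

```python
# Minimal sketch of an ASR -> LLM -> TTS loop, similar in spirit to GPTAvatar's
# architecture (Figure 1A). Model names and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# The system prompt stands in for the settings GPTAvatar reads from its
# configuration file (Figure 1C).
history = [{"role": "system",
            "content": "You are a friendly language tutor. Keep replies short."}]

def avatar_turn(audio_path: str) -> str:
    # 1) Automatic speech recognition: recorded audio -> user text.
    with open(audio_path, "rb") as f:
        user_text = client.audio.transcriptions.create(
            model="whisper-1", file=f).text
    history.append({"role": "user", "content": user_text})

    # 2) LLM backend: full conversation history -> tutor reply.
    reply = client.chat.completions.create(
        model="gpt-4o", messages=history).choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    # 3) Text-to-speech: tutor reply -> synthetic voice. A 3D engine would
    #    additionally lip-sync and animate the avatar from this audio.
    client.audio.speech.create(
        model="tts-1", voice="alloy", input=reply).write_to_file("reply.mp3")
    return reply
```

In the full application, the generated audio additionally drives lip synchronization and animation blending before being played back to the user.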
2 Benefits of using AI-based educational avatars
Like chatbots, AI-based avatars can fulfill three main educational roles: learning, assisting, and mentoring (Wollny et al., 2021). Learning refers to facilitating or testing competencies. Assisting can be defined as helping or simplifying tasks for the learner. Mentoring pertains to fostering the students’ individual development. In addition, AI-based avatars can be excellent interaction partners who can answer questions promptly and accurately, browse the web, and perform actions in the shared virtual world (Kasneci et al., 2023; OpenAI et al., 2023). By taking on these roles, AI-based avatars can foster individual outcomes (e.g., factual knowledge) and collaborative skills (e.g., negotiating with a partner). Next, we present five areas where AI-based educational avatars can be highly beneficial.
2.1 Individualized instruction
A significant advantage of AI-based avatars lies in individualized instruction. One particularly important application of AI-based avatars for individualized instruction is teaching foreign languages (Wollny et al., 2021). Current LLMs like GPT-4 can understand and respond in more than 50 languages, and automatic speech recognition of user input and text-to-speech for synthetic voice output are also available for many languages. Other key applications include teaching science and engineering (Chan et al., 2023; Dai et al., 2024) and tutoring for individual learning difficulties (Johnson and Lester, 2016). In these contexts, AI-based avatars could generate suitable tasks for learners and adapt the tasks’ difficulty to the learner’s level. They could provide incentives and reinforcement that strengthen the learner’s motivation during the learning process. AI-based avatars’ content, language, and teaching styles can be further customized (Mageira et al., 2022), and their personalities can be matched to the users’ personalities (Shumanov and Johnson, 2021). In general, individualization with AI-based avatars can take place in two major ways. First, learners can specify, either directly to the AI-based avatar or in a config file, precisely what the individualized lessons should look like (see the sketch below). Second, adaptive adjustments can be made before or during the lesson based on the learner’s level or progress. Further ideas on how and which content can be individualized based on generative AI and learning analytics algorithms using multimodal data can be found in Sailer et al. (2024).
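To illustrate the first, config-driven route, the following sketch renders a hypothetical lesson configuration into a system prompt for the avatar’s LLM backend. All field names are our own assumptions and do not reproduce GPTAvatar’s actual config format.

```python
# Hypothetical configuration for an individualized lesson; the field names are
# illustrative assumptions, not GPTAvatar's actual config format (Figure 1C).
config = {
    "persona": "patient Spanish tutor named Sofia",
    "target_language": "Spanish",
    "learner_level": "A2 (elementary)",
    "teaching_style": "conversational, with gentle corrections",
    "scenario": "ordering food at a restaurant in Madrid",
}

def build_system_prompt(cfg: dict) -> str:
    """Render the configuration into a system prompt for the LLM backend."""
    return (
        f"You are a {cfg['persona']}. Speak {cfg['target_language']} "
        f"at level {cfg['learner_level']}. Style: {cfg['teaching_style']}. "
        f"Role-play this scenario with the learner: {cfg['scenario']}. "
        "Correct errors briefly, then continue the conversation."
    )

print(build_system_prompt(config))
```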
2.2 Contextualized instruction
Contextualized instruction refers to different types of teaching in which skills and competencies are acquired in practical and real-world scenarios (Berns and Erickson, 2001). The goal is thus to make knowledge and skills more accessible and relevant. In a broader sense, contextualized instruction encompasses problem-based learning (Wood, 2003) and case-based learning (Kolodner, 1992). Until recently, considerable resources were required to create and implement practical, real-world learning environments that enabled contextualized instruction. For instance, sophisticated scenarios had to be developed for problem- and case-based learning. These scenarios were then implemented using rule-based virtual humans, role-plays, and trained actors (Fink et al., 2021). AI-based avatars are cost-effective and offer a level of interaction and response accuracy in presenting such stimuli that can compete with role-plays and trained actors and go beyond the possibilities of rule-based virtual humans. Studies from domains like medical education and teacher education already highlight how AI-based avatars can support contextualized instruction. For instance, Chheang et al. (2024) showed that AI-based avatars can serve as effective tutors in a case-based anatomy learning environment. Fecke et al. (2023) outlined how AI-based avatars can serve as interaction partners for role-plays that convey communicative competencies and which technical points must be considered in their development.
2.3 Immersive learning
Immersive learning refers to the use of virtual, augmented, and mixed reality to create deeply engaging and authentic learning experiences. A recent meta-analysis found that only 19 studies on AI-based avatars in the context of immersive learning are available (Dai et al., 2024), most of them conducted before the recent advances in generative AI. In immersive environments, participants can experience exceptionally high levels of engagement. However, immersive learning can also be associated with increased cognitive load (Makransky et al., 2019), and navigation can be difficult. AI-based educational avatars could provide cues that reduce cognitive load and help users find their way through immersive environments. Moreover, learners can struggle to use traditional learning strategies (Dunlosky et al., 2013) in immersive environments. AI-based avatars can help these learners by promoting learning strategies such as Fiorella and Mayer’s (2016) generative learning strategies: summarizing, creating concept maps, drawing, imagining, self-testing, self-explaining, teaching, and learning by enacting. Some of these strategies can be stimulated particularly well through interaction with an AI-based avatar, and some are more plausible in such interaction than when users learn purely individually or in environments with static interaction partners.
2.4 Scaffolding
Scaffolding aids learners by simplifying the learning materials or providing additional instructional support (Wood et al., 1976). Popular scaffolding methods include feedback, reflection phases, and prompting. To date, few empirical findings are available on adaptive scaffolding with modern AI-based avatars. Most of the available studies either used static pedagogical agents without authentic animations and voice output or did not employ current generative AI models (Chien et al., 2024; Dai et al., 2024; Wu and Yu, 2024). Therefore, we describe the results known for scaffolding in various forms of e-learning. According to a meta-analysis by Belland et al. (2017), computer-based scaffolds have a medium effect on several cognitive learning outcomes in STEM education. Other meta-analyses corroborated these findings in various domains and showed that the effects of different types of scaffolding vary depending on learner characteristics (Chernikova et al., 2020a,b). Some studies reported that adaptive scaffolding, such as individualized feedback created by AI or learning analytics, can bolster the effects of scaffolding further (e.g., Lim et al., 2023; Sailer et al., 2023). Other studies highlighted that pedagogical agents without generative AI models can supply (adaptive) scaffolding well (e.g., Azevedo et al., 2010; Dever et al., 2023). Based on these findings, adaptive scaffolding could be particularly beneficial when LLMs and learning analytics techniques accurately diagnose learners’ progress and misconceptions (Kasneci et al., 2023) and AI-based avatars present personalized scaffolding convincingly (a minimal sketch of such LLM-based diagnosis follows below). For this purpose, AI-based avatars could take on the role of peers or mentors who give learners feedback or instruct them to carry out an activity. This could make adaptive scaffolding appear more credible to learners, and be better accepted by them, than scaffolds provided without AI-based avatars.
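As a rough illustration of this idea, the following sketch asks an LLM to diagnose a learner’s answer and to return a hint whose explicitness depends on the diagnosis. The prompt wording, the diagnostic labels, and the model name are our assumptions, not a validated diagnostic procedure.

```python
# Sketch of LLM-based adaptive scaffolding: diagnose the learner's answer, then
# return a hint whose intensity depends on the diagnosis. Prompt wording,
# labels, and model name are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def scaffold(task: str, learner_answer: str) -> dict:
    prompt = (
        f"Task: {task}\nLearner answer: {learner_answer}\n"
        "Return JSON with keys 'diagnosis' (correct | minor_error | "
        "misconception) and 'hint' (one sentence; the weaker the answer, "
        "the more explicit the hint, but never give the full solution)."
    )
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # force parseable JSON
    ).choices[0].message.content
    return json.loads(reply)

print(scaffold("Simplify 6/8.", "6/8 = 3/4 because you halve both numbers."))
```

An avatar could then voice the returned hint in the role of a peer or mentor rather than displaying it as plain text.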
2.5 Fostering self-regulation, interest, and affect
Self-regulation training is successful when a tutor teaches strategies and then repeatedly encourages and reviews their application over a longer period (Dignath and Büttner, 2008). A study by Dever et al. (2023) evaluated MetaTutor, a pedagogical agent that teaches self-regulation without generative AI functions. Participants who received prompts and feedback from MetaTutor displayed improved self-regulation strategies compared to a control group. A related experiment by Karaoğlan Yılmaz et al. (2018) examined whether adding a pedagogical agent increases the effectiveness of digital self-regulation training. The intervention group that used a pedagogical agent achieved better self-regulation than the control group without such an agent. In addition, Ng et al. (2024) evaluated how the type of chatbot technology used affects self-regulation training in adults. In this study, chatbots powered by LLMs, such as ChatGPT, increased self-regulation through their recommendations more effectively than rule-based chatbots did. Considering these findings, we believe that AI-based avatars integrated into learning management systems can increasingly support self-regulation training. They are continually available and can repeatedly remind learners to apply strategies. AI-based avatars are also promising in terms of motivational and emotional effects. Krapp’s (2002) theory of interest development states that situational interests emerge first and later develop into manifest, stable interests, with social processes playing an important role in this progression. With the help of AI-based avatars, virtual tutors and interaction partners can be created that match learners and their personalities. In this way, interests can be developed effectively in a goal-directed manner. Moreover, AI-based avatars could invoke positive emotions like enjoyment and curiosity, which have been found to support learning (Loderer et al., 2020). This assumption is also supported by a study by Beege and Schneider (2023), which investigated stylized pedagogical agents without generative AI functions: enthusiastic pedagogical agents were associated with more positive perceived emotions than neutral ones.
3 Challenges of using AI-based educational avatars
When used for educational purposes, AI-based avatars clearly also come with unique challenges. We now discuss four of these challenges in detail.
3.1 Incorrect and inaccurate information
LLMs can produce incorrect and inaccurate information when replying to a query (Hughes, 2023). Generative AI has made significant progress, and the average rate of incorrect and inaccurate information is now below 5% for the best LLMs (Hughes, 2023). Custom LLMs trained for a specific purpose on user documents and data can further reduce this rate of false information. Incorrect and inaccurate information also frequently originates from the fact that LLMs have a limited context window (a token limit), which may cause the AI to forget things that happened earlier in the conversation (see the sketch below). LLM advancements continually allow larger token limits, so this limitation will likely become a non-issue soon. Nevertheless, even low percentages of incorrect and inaccurate information are an issue once LLMs are deployed in educational settings. Moreover, learners could find incorrectly and inaccurately conveyed information even more convincing and trustworthy when LLMs are embodied as AI-based avatars that build relationships, possess unique personalities, and act with realistic speech, facial expressions, and body language (Bente et al., 2014; Aseeri and Interrante, 2021).
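To make the context-window limitation concrete, the following sketch trims an over-long conversation history before each LLM call, which is exactly why earlier turns can be “forgotten.” The token budget and the use of the tiktoken library for counting are illustrative assumptions.

```python
# Sketch of why avatars "forget": the conversation history must fit into the
# LLM's context window, so the oldest turns are dropped. The token budget and
# the use of tiktoken for counting are illustrative assumptions.
import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def trim_history(history: list[dict], max_tokens: int = 8000) -> list[dict]:
    """Keep the system message plus as many recent turns as fit the budget."""
    system, turns = history[0], history[1:]
    kept, used = [], len(enc.encode(system["content"]))
    for msg in reversed(turns):            # walk from the newest turn backward
        used += len(enc.encode(msg["content"]))
        if used > max_tokens:
            break                          # everything older is forgotten
        kept.append(msg)
    return [system] + list(reversed(kept))
```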
3.2 Inadequate relationships with humans
When humans learn from AI-based avatars, they will sometimes form inadequate or unbeneficial relationships with them. The first reports of humans building relationships with chatbots come from the ELIZA project (Weizenbaum, 1983), in which a secretary increasingly engaged in personal interactions with the chatbot. Although current AI-based avatars do not yet match humans’ high social and emotional skills (Li et al., 2023; Sorin et al., 2023), some particularly advanced avatars can already display emotions through facial expressions and recognize affect from text input. In addition, developments in affective and social computing will further enhance AI-based avatars’ social and emotional skills. These developments will contribute to people increasingly building (inadequate) relationships with avatars. We see two particular dangers of building a relationship with AI-based educational avatars: learners can become dependent on the avatars or be manipulated into sharing excessive or sensitive information. These dangers exist particularly for commercial educational services, as their economic success depends on subscription fees or user data.
3.3 Inappropriate values and interactional styles
Another challenge with current AI-based educational avatars is that they may have inappropriate values. The LLMs driving AI-based educational avatars and chatbots have been trained on large text corpora that contain harmful, stereotypical, and racist material and views (Weidinger et al., 2021). As a result, earlier chatbots exhibited problematic behavior, like answering questions about building explosives (OpenAI et al., 2023). Even though current chatbots no longer exhibit such behavior due to technical safeguards, they may sometimes hold inappropriate values and lack the vetted, internalized values that most professional educators have.
In addition, professional educators develop interaction styles during their training that guide them and provide standards for raising and educating learners (Walker, 2008). AI-based educational avatars lack such self-developed interaction styles, which originate from practical experience and usually fit relatively well with the context in which the educational activities take place.
3.4 Insufficient data protection
The last significant challenge concerns insufficient data protection. AI-based avatars frequently combine LLMs, automatic speech recognition, and other cognitive services. These technologies are often cloud-based and come from companies in various countries. As a result, different data protection regulations apply, and multiple risks exist. For instance, users’ voice recordings are fed into the automatic speech recognition systems mentioned above. In the wrong hands, these recordings could be used to identify people or create deepfakes (Dash and Sharma, 2023). The information that AI-based avatars pass on to LLMs can also be problematic. Users could disclose personal or confidential information (Yao et al., 2024), especially if they interact with an avatar that they find interesting. Companies could then use this information for commercial purposes, or it could be leaked. Finally, educational avatars that counsel students or provide adaptive and individualized instruction are particularly at risk of generating sensitive information: they know learners’ problems, careers, strengths, and weaknesses, and this information should remain confidential. Consequently, professional educators should at least partially supervise avatars in critical settings and with young people.
4 Discussion
4.1 Advances in educational content and educational technology
We have seen that AI-based educational avatars are changing the way we learn and teach. Next, we look at potential advances in educational content and educational technology.
Advances in educational content will potentially include publishers and educational institutions developing AI-based avatars for specific products or courses. Unlike previous pedagogical agents, AI-based avatars require little effort to train on textbook excerpts or seminar content. This difference could lead to AI-based avatars gaining much wider adoption than pedagogical agents, which have not been widely used (Johnson and Lester, 2016). Creating a library of AI-based avatars for use in various educational settings, like immersive learning and massive open online courses, would be another desirable development. The AI-based avatars could play specific roles (e.g., a math tutor for a specific skill level) and even link to custom LLMs. Moreover, the avatars in this library could follow a joint framework specifying characteristics like teaching style or error handling. Educators could then adapt prompts for specific characteristics that affect the avatar’s traits and behaviors.
Recently, OpenAI (2024) announced GPT-4o. With this new LLM, AI-based avatars will be able to process a combination of images, videos, and audio sequences and respond using these various media. Moreover, GPT-4o can detect and express emotions and interpret camera feeds with advanced computer vision capabilities. Further advances like this are on the way in educational technology. Generative AI will likely increase accuracy and precision in many activities and tasks. These developments will allow learners to be instructed and supported in tasks specific to individual domains. In addition, generative AI will have access to more data. Various technologies, such as the Internet of Things, digital twins, and the capture of human sensor data, are currently being expanded. Access to these data sources will significantly expand the insights that AI-based avatars have into the physical world and users’ states. We also expect that AI-based avatars will gain higher socio-emotional capabilities. Researchers are already developing multi-agent systems in which different LLM-based agents interact in various roles (Li et al., 2023). AI-based avatars operated with such technologies could develop their own perspectives and exhibit social skills on par with humans. Without a doubt, further progress in artificial intelligence, the increased use of sensor data, and the advancing social capabilities of AI-based avatars will present us with further challenges; the next evolutionary step of AI-based avatars already looms on the horizon.
4.2 Open questions for research and practice
Our considerations raise three open questions for research and practice that we now need to answer.
The first question is how we can utilize AI-based educational avatars effectively. Let us briefly summarize the current and upcoming use cases of AI-based avatars. AI-based avatars can be employed in individual contexts as well as in settings that connect multiple users. Group scenarios can also be simulated by having multiple AI-based avatars work together in an orchestrated way (see the sketch below). As mentioned, AI-based avatars can develop a grasp of the physical world and users’ states when fed with data from digital twins, the real world, or game engines. While these capabilities are already technically feasible, they are not yet fully integrated into most available AI-based avatars. We now need to find the most appropriate applications where AI-based avatars can provide added value using these capabilities. Then, we have to identify the most suitable software architectures to create powerful AI-based avatars for these purposes.
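As a simple illustration of such orchestration, the following sketch lets two role-prompted, LLM-driven avatars take alternating turns in a simulated group discussion. The roles, prompts, and round-robin turn-taking scheme are our assumptions rather than an established architecture.

```python
# Sketch of orchestrating multiple LLM-driven avatars in a group scenario:
# two role-prompted agents take alternating turns over a shared transcript.
# Roles, prompts, and the simple round-robin scheme are illustrative.
from openai import OpenAI

client = OpenAI()

ROLES = {
    "Tutor": "You are a biology tutor guiding a group discussion. Be concise.",
    "Peer": "You are a fellow student who asks critical questions. Be concise.",
}

def agent_say(role: str, transcript: list[str]) -> str:
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": ROLES[role]},
                  {"role": "user", "content": "\n".join(transcript) +
                   f"\n{role}, it is your turn."}],
    ).choices[0].message.content
    transcript.append(f"{role}: {reply}")
    return reply

transcript = ["Learner: Why do plants need nitrogen?"]
for role in ["Tutor", "Peer", "Tutor"]:   # simple round-robin orchestration
    print(agent_say(role, transcript))
```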
The second question concerns what we should and should not do with AI-based educational avatars. Although AI-based avatars hold great potential, they also come with challenges, such as incorrect and inaccurate information and inappropriate values and interactional styles. These issues make AI-based avatars less suitable for unguided and unsupervised instruction, especially for vulnerable groups. Clarifying the applications and limitations of AI-based avatars in education requires a two-pronged approach: we should conduct experimental studies to investigate how different user groups perceive and interact with AI-based avatars across various applications, and we need to answer these questions normatively by referring back to pedagogical and ethical theories and critically reflecting on technological change. In terms of research topics, it seems particularly important to further investigate the trust that users place in AI-based avatars and the authenticity they perceive when interacting and building a relationship with them.
The third question relates to what we are allowed to do with AI-based educational avatars. Clear laws and guidelines are essential for the responsible use of generative AI and AI-based avatars in education. Key areas requiring regulation include the collection of sensitive data, the utilization in tasks with critical consequences, and use with vulnerable user groups. The answers to these regulatory questions will likely vary greatly depending on the country, the software, and the intended purposes. It is crucial for local authorities to develop policies now to prevent uncontrolled use. Research can contribute by providing neutral information about the potential impact of AI-based avatars in education, suggesting ideas for policies, and reporting on the international status of their use.
Data availability statement
The original contributions presented in this study are included in the article/supplementary material; further inquiries can be directed to MCF, maximilian.fink@unibw.de.
Author contributions
MCF: Conceptualization, Project administration, Visualization, Writing–original draft, Writing–review and editing. SAR: Software, Writing–review and editing. BE: Conceptualization, Funding acquisition, Supervision, Writing–review and editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of the article. We acknowledge financial support by Universität der Bundeswehr München. This research manuscript is funded by dtec.bw – Digitalization and Technology Research Center of the Bundeswehr [project RISK.twin]. dtec.bw is funded by the European Union – NextGenerationEU.
Acknowledgments
We are grateful to the research initiative Individuals and Organizations in a Digitalized Society (INDOR) of Universität der Bundeswehr München, which contributed new ideas to this manuscript. We thank Lukas Hart for interesting discussions on the topic and his support of the project. We also thank Kerstin Huber and Volker Eisenlauer for the conversations on educational technologies we had over the last years. MCF thanks his wife, Larissa Kaltefleiter, for inspiring discussions about generative AI. Please note that a preprint of a prior version of this article has been posted online at a repository (Fink et al., 2024).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The author(s) declared that they were an editorial board member of Frontiers at the time of submission. This had no impact on the peer review process and the final decision.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
AbuShawar, B., and Atwell, E. (2015). ALICE Chatbot: Trials and outputs. Comput. Sist. 19, 625–632. doi: 10.13053/cys-19-4-2326
Alemdag, E. (2023). The effect of chatbots on learning: A meta-analysis of empirical research. J. Res. Technol. Educ. 27, 1–23. doi: 10.1080/15391523.2023.2255698
Aseeri, S., and Interrante, V. (2021). The influence of avatar representation on interpersonal communication in virtual social environments. IEEE Trans. Visual. Comput. Graph. 27, 2608–2617. doi: 10.1109/TVCG.2021.3067783
Azevedo, R., Moos, D. C., Johnson, A. M., and Chauncey, A. D. (2010). Measuring cognitive and metacognitive regulatory processes during hypermedia learning: Issues and challenges. Educ. Psychol. 45, 210–223. doi: 10.1080/00461520.2010.515934
Beege, M., and Schneider, S. (2023). Emotional design of pedagogical agents: The influence of enthusiasm and model-observer similarity. Educ. Tech. Res. 71, 859–880. doi: 10.1007/s11423-023-10213-4
Belland, B. R., Walker, A. E., Kim, N. J., and Lefler, M. (2017). Synthesizing results from empirical research on computer-based scaffolding in STEM education: A meta-analysis. Rev. Educ. Res. 87, 309–344. doi: 10.3102/0034654316670999
Bente, G., Dratsch, T., Rehbach, S., Reyl, M., and Lushaj, B. (2014). “Do you trust my avatar? Effects of photo-realistic seller avatars and reputation scores on trust in online transactions,” in HCI in business, ed. F. F.-H. Nah (Cham: Springer), 461–470. doi: 10.1007/978-3-319-07293-7_45
Berns, R. G., and Erickson, P. M. (2001). Contextual teaching and learning: Preparing students for the new economy. Atlanta, GA: National Dissemination Center for Career and Technical Education.
Castro-Alonso, J. C., Wong, R. M., Adesope, O. O., and Paas, F. (2021). Effectiveness of multimedia pedagogical agents predicted by diverse theories: A meta-analysis. Educ. Psychol. Rev. 33, 989–1015. doi: 10.1007/s10648-020-09587-1
Chan, M. M., Amado-Salvatierra, H. R., Hernandez-Rizzardini, R., and De La Roca, M. (2023). “The potential role of AI-based chatbots in engineering education. Experiences from a teaching perspective,” in Proceedings of the 2023 IEEE Frontiers in education conference (FIE), (College Station, TX: IEEE), 1–5. doi: 10.1109/FIE58773.2023.10343296
Chernikova, O., Heitzmann, N., Stadler, M., Holzberger, D., Seidel, T., and Fischer, F. (2020b). Simulation-based learning in higher education: A meta-analysis. Rev. Educ. Res. 90, 499–541. doi: 10.3102/0034654320933544
Chernikova, O., Heitzmann, N., Fink, M. C., Timothy, V., Seidel, T., and Fischer, F. (2020a). Facilitating diagnostic competences in higher education – a meta-analysis in medical and teacher education. Educ. Psychol. Rev. 32, 157–196. doi: 10.1007/s10648-019-09492-2
Chheang, V., Sharmin, S., Márquez-Hernández, R., Patel, M., Rajasekaran, D., Caulfield, G., et al. (2024). “Towards anatomy education with generative AI-based virtual assistants in immersive virtual reality environments,” in Proceedings 2024 IEEE international conference on artificial intelligence and eXtended and virtual reality (AIxVR), (Los Angeles, CA: IEEE), 21–30. doi: 10.1109/AIxVR59861.2024.00011
Chien, C.-C., Chan, H.-Y., and Hou, H.-T. (2024). Learning by playing with generative AI: Design and evaluation of a role-playing educational game with generative AI as scaffolding for instant feedback interaction. J. Res. Technol. Educ. 28, 1–20. doi: 10.1080/15391523.2024.2338085
Dai, C.-P., Ke, F., Pan, Y., Moon, J., and Liu, Z. (2024). Effects of artificial intelligence-powered virtual agents on learning outcomes in computer-based simulations: A meta-analysis. Educ. Psychol. Rev. 36:31. doi: 10.1007/s10648-024-09855-4
Dash, B., and Sharma, P. (2023). Are ChatGPT and deepfake algorithms endangering the cybersecurity industry? A review. IJASE 10, 21–39.
Dever, D. A., Sonnenfeld, N. A., Wiedbusch, M. D., Schmorrow, S. G., Amon, M. J., and Azevedo, R. (2023). A complex systems approach to analyzing pedagogical agents’ scaffolding of self-regulated learning within an intelligent tutoring system. Metacogn. Learn. 18, 659–691. doi: 10.1007/s11409-023-09346-x
Dignath, C., and Büttner, G. (2008). Components of fostering self-regulated learning among students. A meta-analysis on intervention studies at primary and secondary school level. Metacogn. Learn. 3, 231–264. doi: 10.1007/s11409-008-9029-x
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., and Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychol. Sci. Public Interest 14, 4–58. doi: 10.1177/1529100612453266
Fecke, J., Afzal, E., and Braun, E. (2023). “A conceptual system design for teacher education: Role-play simulations to train communicative action with AI agents,” in ISLS annual meeting 2023 - proceedings of the third annual meeting of the international society of the learning sciences (ISLS), eds J. D. Slotta and E. S. Charles (Buffalo, NY: International Society of the Learning Sciences), 2197–2200.
Fink, M. C., Radkowitsch, A., Bauer, E., Sailer, M., Kiesewetter, J., Schmidmaier, R., et al. (2021). Simulation research and design: A dual-level framework for multi-project research programs. Educ. Tech. Res. 69, 809–841. doi: 10.1007/s11423-020-09876-0
Fink, M. C., Robinson, S. A., and Ertl, B. (2024). AI-based avatars are changing the way we learn and teach: Benefits and challenges. EdArXiv [Preprint]. doi: 10.35542/osf.io/jt83m
Fiorella, L., and Mayer, R. E. (2016). Eight ways to promote generative learning. Educ. Psychol. Rev. 28, 717–741. doi: 10.1007/s10648-015-9348-9
Graesser, A. C., Chipman, P., Haynes, B. C., and Olney, A. (2005). AutoTutor: An intelligent tutoring system with mixed-initiative dialogue. IEEE Trans. Educ. 48, 612–618. doi: 10.1109/TE.2005.856149
Heidig, S., and Clarebout, G. (2011). Do pedagogical agents make a difference to student motivation and learning? Educ. Res. Rev. 6, 27–54. doi: 10.1016/j.edurev.2010.07.004
Hughes, S. (2023). Cut the bull… Detecting hallucinations in large language models. Available online at: https://vectara.com/blog/cut-the-bull-detecting-hallucinations-in-large-language-models/ (accessed May 27, 2024).
Johnson, W. L., and Lester, J. C. (2016). Face-to-face interaction with pedagogical agents, twenty years later. Int. J. Artif. Intell. Educ. 26, 25–36. doi: 10.1007/s40593-015-0065-9
Karaoğlan Yılmaz, F. G., Olpak, Y. Z., and Yılmaz, R. (2018). The effect of the metacognitive support via pedagogical agent on self-regulation skills. J. Educ. Comput. Res. 56, 159–180. doi: 10.1177/0735633117707696
Kasneci, E., Sessler, K., Küchemann, S., Bannert, M., Dementieva, D., Fischer, F., et al. (2023). ChatGPT for good? On opportunities and challenges of large language models for education. Learn. Individ. Differ. 103:102274. doi: 10.1016/j.lindif.2023.102274
Kolodner, J. L. (1992). An introduction to case-based reasoning. Artif. Intell. Rev. 6, 3–34. doi: 10.1007/BF00155578
Krapp, A. (2002). Structural and dynamic aspects of interest development. Learn. Instr. 12, 383–409. doi: 10.1016/S0959-4752(01)00011-1
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90. doi: 10.1145/3065386
Kung, T. H., Cheatham, M., Medenilla, A., Sillos, C., De Leon, L., Elepaño, C., et al. (2023). Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digit. Health 2:e0000198. doi: 10.1371/journal.pdig.0000198
Lester, J. C., Converse, S. A., Kahler, S. E., Barlow, S. T., Stone, B. A., and Bhogal, R. S. (1997). “The persona effect: Affective impact of animated pedagogical agents,” in Proceedings of the ACM SIGCHI conference on human factors in computing systems, (Atlanta, GA: ACM), 359–366. doi: 10.1145/258549.258797
Li, Y., Zhang, Y., and Sun, L. (2023). MetaAgents: Simulating interactions of human behaviors for LLM-based task-oriented coordination via collaborative generative agents. arXiv [Preprint]. doi: 10.48550/ARXIV.2310.06500
Lim, L., Bannert, M., Van Der Graaf, J., Singh, S., Fan, Y., Surendrannair, S., et al. (2023). Effects of real-time analytics-based personalized scaffolds on students’ self-regulated learning. Comput. Hum. Behav. 139:107547. doi: 10.1016/j.chb.2022.107547
Loderer, K., Pekrun, R., and Lester, J. C. (2020). Beyond cold technology: A systematic review and meta-analysis on emotions in technology-based learning environments. Learn. Instr. 70:101162. doi: 10.1016/j.learninstruc.2018.08.002
Mageira, K., Pittou, D., Papasalouros, A., Kotis, K., Zangogianni, P., and Daradoumis, A. (2022). Educational AI chatbots for content and language integrated learning. Appl. Sci. 12:3239. doi: 10.3390/app12073239
Makransky, G., Terkildsen, T. S., and Mayer, R. E. (2019). Adding immersive virtual reality to a science lab simulation causes more presence but less learning. Learn. Instr. 60, 225–236. doi: 10.1016/j.learninstruc.2017.12.007
Ng, D. T. K., Tan, C. W., and Leung, J. K. L. (2024). Empowering student self-regulated learning and science education through ChatGPT: A pioneering pilot study. Br. J. Educ. Technol. 55:13454. doi: 10.1111/bjet.13454
OpenAI (2024). Hello GPT-4o. Available online at: https://openai.com/index/hello-gpt-4o/ (accessed May 27, 2024).
OpenAI, Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., et al. (2023). GPT-4 technical report. arXiv [Preprint]. arXiv:2303.08774.
Orrù, G., Piarulli, A., Conversano, C., and Gemignani, A. (2023). Human-like problem-solving abilities in large language models using ChatGPT. Front. Artif. Intell. 6:1199350. doi: 10.3389/frai.2023.1199350
Rickel, J. (2001). “Intelligent virtual agents for education and training: Opportunities and challenges,” in Intelligent Virtual Agents, eds A. De Antonio, R. Aylett, and D. Ballin (Berlin: Springer), 15–22. doi: 10.1007/3-540-44812-8_2
Rickel, J., and Johnson, W. L. (1999). Animated agents for procedural training in virtual reality: Perception, cognition, and motor control. Appl. Artif. Intell. 13, 343–382. doi: 10.1080/088395199117315
Robinson, S. A. (2023). GPTAvatar. Available online at: https://github.com/SethRobinson/GPTAvatar (accessed May 27, 2024).
Sailer, M., Bauer, E., Hofmann, R., Kiesewetter, J., Glas, J., Gurevych, I., et al. (2023). Adaptive feedback from artificial neural networks facilitates pre-service teachers’ diagnostic reasoning in simulation-based learning. Learn. Instr. 83:101620. doi: 10.1016/j.learninstruc.2022.101620
Sailer, M., Ninaus, M., Huber, S. E., Bauer, E., and Greiff, S. (2024). The end is the beginning is the end: The closed-loop learning analytics framework. Comput. Hum. Behav. 158:108305. doi: 10.1016/j.chb.2024.108305
Schroeder, N. L., Adesope, O. O., and Gilbert, R. B. (2013). How effective are pedagogical agents for learning? A meta-analytic review. J. Educ. Comput. Res. 49, 1–39. doi: 10.2190/EC.49.1.a
Shumanov, M., and Johnson, L. (2021). Making conversations with chatbots more personalized. Comput. Hum. Behav. 117:106627. doi: 10.1016/j.chb.2020.106627
Smutny, P., and Schreiberova, P. (2020). Chatbots for learning: A review of educational chatbots for the Facebook Messenger. Comput. Educ. 151:103862. doi: 10.1016/j.compedu.2020.103862
Sorin, V., Brin, D., Barash, Y., Konen, E., Charney, A., Nadkarni, G., et al. (2023). Large language models (LLMs) and empathy – a systematic review. medRxiv [Preprint]. Available online at: https://www.medrxiv.org/content/10.1101/2023.08.07.23293769v1.full.pdf (accessed May 27, 2024).
Walker, J. M. T. (2008). Looking at teacher practices through the lens of parenting style. J. Exp. Educ. 76, 218–240. doi: 10.3200/JEXE.76.2.218-240
Wang, Y., Gong, S., Cao, Y., Lang, Y., and Xu, X. (2023). The effects of affective pedagogical agent in multimedia learning environments: A meta-analysis. Educ. Res. Rev. 38:100506. doi: 10.1016/j.edurev.2022.100506
Ward, W., Cole, R., Bolaños, D., Buchenroth-Martin, C., Svirsky, E., and Weston, T. (2013). My science tutor: A conversational multimedia virtual tutor. J. Educ. Psychol. 105, 1115–1125. doi: 10.1037/a0031589
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., et al. (2021). Ethical and social risks of harm from language models. arXiv [Preprint]. doi: 10.48550/arXiv.2112.04359
Weizenbaum, J. (1983). ELIZA — a computer program for the study of natural language communication between man and machine. Commun. ACM 26, 23–28. doi: 10.1145/357980.357991
Wollny, S., Schneider, J., Di Mitri, D., Weidlich, J., Rittberger, M., and Drachsler, H. (2021). Are we there yet? A systematic literature review on chatbots in education. Front. Artif. Intell. 4:654924. doi: 10.3389/frai.2021.654924
Wood, D. F. (2003). ABC of learning and teaching in medicine: Problem based learning. BMJ 326, 328–330. doi: 10.1136/bmj.326.7384.328
Wood, D., Bruner, J. S., and Ross, G. (1976). The role of tutoring in problem solving. J. Child Psychol. Psychiatry 17, 89–100. doi: 10.1111/j.1469-7610.1976.tb00381.x
Wu, R., and Yu, Z. (2024). Do AI chatbots improve students learning outcomes? Evidence from a meta-analysis. Br. J. Educ. Technol. 55, 10–33. doi: 10.1111/bjet.13334
Keywords: pedagogical agent, artificial intelligence chatbot, computer-supported learning, collaborative learning, education, AI-based educational avatar, generative AI, large language models
Citation: Fink MC, Robinson SA and Ertl B (2024) AI-based avatars are changing the way we learn and teach: benefits and challenges. Front. Educ. 9:1416307. doi: 10.3389/feduc.2024.1416307
Received: 12 April 2024; Accepted: 27 June 2024;
Published: 16 July 2024.
Edited by: Miguel Morales-Chan, Galileo University, Guatemala
Reviewed by: Rosanda Pahljina-Reinić, University of Rijeka, Croatia
Copyright © 2024 Fink, Robinson and Ertl. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Maximilian C. Fink, maximilian.fink@unibw.de