Embracing LLM Feedback: the role of feedback providers and provider information for feedback effectiveness

Ruwe, Theresa; Mayweg-Paus, Elisabeth

doi:10.3389/feduc.2024.1461362

ORIGINAL RESEARCH article

Front. Educ., 16 October 2024

Sec. Digital Learning Innovations

Volume 9 - 2024 | https://doi.org/10.3389/feduc.2024.1461362

Theresa Ruwe*

Elisabeth Mayweg-Paus

Institute of Educational Studies, Digital Knowledge Management, Humboldt-Universität zu Berlin, Berlin, Germany

Feedback is an integral part of learning in higher education and is increasingly being provided to students via modern technologies like Large Language Models (LLMs). But students’ perception of feedback from LLMs vs. feedback from educators remains unclear even though it is an important facet of feedback effectiveness. Further, feedback effectiveness can be negatively influenced by various factors; for example, (not) knowing certain characteristics about the feedback provider may bias a student’s reaction to the feedback process. To assess perceptions of LLM feedback and mitigate the negative effects of possible biases, this study investigated the potential of providing information about feedback providers (provider-information). In a 2×2 between-subjects design with the factors feedback provider (LLM vs. educator) and provider-information (yes vs. no), 169 German students reported their perceptions of the feedback message and the feedback provider. Path analyses showed that the LLM was perceived as more trustworthy than an educator and that the provision of provider-information led to improved perceptions of the feedback. Furthermore, the effect of the provider and the feedback on perceived trustworthiness and fairness changed when provider-information was provided. Overall, our study highlights the importance of further research on feedback processes that include LLMs due to their influential nature and suggests practical recommendations for designing digital feedback processes.

1 Introduction

Feedback is a central component of learning in (higher) education (Hattie and Timperley, 2007; Wisniewski et al., 2020) and can influence different outcomes, such as a learner’s affective and motivational reactions as well as their performance (Henderson et al., 2019; Wisniewski et al., 2020). In this vein, feedback can promote the development of various skills, including written argumentation skills (Latifi et al., 2019; Fleckenstein et al., 2023). Argumentation skills are becoming increasingly important in our society, as they allow for knowledge construction and perspective-taking and can guide learners when dealing with modern technologies (Federal Ministry of Education and Research, 2023; Redecker, 2017). In this context, (student) teachers have a unique role: They need to acquire argumentation skills themselves and, at the same time, act as models and promote their students’ argumentation skills.

While feedback that helps learners hone argumentative skills is critical in our technological society, interestingly, the rise of certain modern technologies is also affecting feedback environments. For example, artificially intelligent (AI) systems, more specifically large language models (LLMs), enormously impact feedback environments, as they can be implemented to provide immediate feedback to learners (Brown et al., 2020; Chiu et al., 2023) and thereby save educators time and resources (Cavalcanti et al., 2021; Kasneci et al., 2023; Wilson et al., 2021).

Yet, a feedback process is rather complex and does not only comprise the feedback message and the feedback provider: As summarized by Panadero and Lipnevich (2022), feedback encompasses characteristics of the message, implementation, student, context, and agents. All these aspects can influence the effectiveness of feedback, which is complex in itself. Henderson et al. (2019) summarized three broad categories of feedback effectiveness: cognitive, affective, and relational. Thus, the feedback provider can potentially affect learners’ perceptions of the feedback and feedback provider and, in turn, affect learners’ performance. Because feedback processes are continuously developing and the complexity of feedback interactions partly explains why the effectiveness of feedback is highly variable (Panadero and Lipnevich, 2022; Wisniewski et al., 2020), the question arises as to whether differences exist in learners’ cognitive, affective, and/or relational reactions toward feedback provided by a human versus an AI-system, particularly an LLM.

Whether feedback is effective for a learner depends not only on the original feedback provider but also on the provider’s individual characteristics, like expertise (Lechermeier and Fassnacht, 2018; Lucassen and Schraagen, 2011; Winstone et al., 2017). To prevent learners from having socially biased responses to feedback providers, anonymous feedback processes are often considered. However, such anonymity does not always prevent bias (e.g., Panadero and Alqassab, 2019). Further, for educators or LLMs, providing feedback anonymously can be rather unrealistic, because learners know whose classes they take, and data protection laws require at least minimal information about algorithms and their implementation. Therefore, here we investigate the opposite of anonymity: We explored whether providing additional information about (the characteristics of) the feedback provider can promote feedback effectiveness.

Specifically, in a 2×2 between-subjects design with the factors feedback provider (educator vs. LLM) and provider information (yes vs. no), we investigated how to minimize learners’ biases related to the feedback provider by shedding light on the questions alluded to above: How might learners’ perceptions of feedback providers and the feedback itself vary based on (1) whether the provider is an educator or an LLM, and (2) whether information about the provider is available?

2 Theoretical Background

2.1 Defining and contextualizing feedback

Feedback is indispensable in educational contexts and is seen as a promising strategy to improve various learning outcomes (Hattie and Timperley, 2007; Wisniewski et al., 2020). Particularly, written and elaborated feedback (in contrast to corrective feedback) plays a crucial role in promoting higher-order learning outcomes like argumentation skills (Van Der Kleij et al., 2011).

As generally defined, feedback includes information about several components from several sources that works best if learners actively engage with it (Lipnevich and Panadero, 2021, p. 25). Thus, feedback is not only about the feedback message itself. As summarized by Panadero and Lipnevich (2022), feedback constitutes characteristics of the message, implementation, student, context, and agents involved. All these elements are crucial for the feedback process and influence its effectiveness. Due to this complexity, the factors that negatively influence feedback effectiveness are hard to pin down, even though they exist (Kluger and DeNisi, 1996; Winstone et al., 2017; Wisniewski et al., 2020).

In this study, we consider the feedback provider as a possible (negative) factor influencing feedback effectiveness, particularly because modern technology has made it so that feedback is now provided not only by humans, e.g., educators, but also by AI-systems (Chiu et al., 2023). Furthermore, feedback processes are inherently social (Ajjawi and Boud, 2017), and learners’ social biases, e.g., those stemming from the feedback provider, seem to be worth closer investigation, as they have not been thoroughly researched so far (Panadero and Lipnevich, 2022).

2.2 AI-systems as new feedback providers

One of the most important factors influencing the effectiveness of feedback seems to be the feedback provider (Ilgen et al., 1979; Panadero and Lipnevich, 2022; Winstone et al., 2017), which has hardly been researched (Lechermeier and Fassnacht, 2018; Panadero and Lipnevich, 2022). Traditionally, educators have been learners’ sole feedback providers, but next to practical reasons such as the lack of resources to provide adequate feedback to learners, educators’ hierarchy above learners, specifically related to their authority, expertise, and experience, can hinder the effectiveness of feedback. The hierarchy can, for example, prevent learners from asking for clarifications (Carless, 2006; Winstone et al., 2017), and educators’ feedback is seldom questioned (Lechermeier and Fassnacht, 2018), perhaps due to their expert status (Metzger et al., 2016). Overall, the interaction between the educator as the feedback provider and the learner as the feedback recipient seems to play a crucial role in feedback processes.

While educators are the traditional feedback sources for learners, (generative) AI-systems, e.g., LLMs, that offer feedback are emerging (Bozkurt, 2023; Chiu et al., 2023). Such modern technologies are increasingly implemented, for example, in educational administration, as chatbots, or even in assessment (Chiu et al., 2023). AI-systems have developed rapidly (Zawacki-Richter et al., 2019), and now, thanks to technological developments in natural language processing, LLMs can give feedback on (short) textual answers [e.g., ArgueTutor (Wambsganss et al., 2021), AcaWriter (Knight et al., 2020), or even ChatGPT (OpenAI, 2023)]. Automated feedback itself is not new, but thus far it was pre-programmed and less dialogic (e.g., Azevedo and Bernard, 1995). By comparison, LLMs are AI-systems that have been trained on huge amounts of data and are capable of analyzing existing patterns in language and of imitating as well as understanding human language (Brown et al., 2020; Kasneci et al., 2023). Users can thus easily interact with LLMs by using their own natural language (Kasneci et al., 2023). This interaction highly resembles a human-human, or educator-student, interaction. In this vein, building on Reeves and Nass (1996), even though LLMs are clearly non-human, people tend to ascribe them human characteristics (e.g., trust) based on similar cues (e.g., expertise). There is evidence that LLM- and instructor-feedback align (Dai et al., 2023). However, LLMs can be biased by their developers, the training data, and/or any learning that occurs during the LLM’s lifecycle (i.e., aspects determining the LLM’s competence). This can lead to the LLM’s output, i.e., the feedback, being false, biased, or non-transparent (see Chang et al., 2023). These flaws can affect the trust users have in the AI-system (Grassini, 2023; Kaur et al., 2022). All in all, the implementation of LLMs in (higher) education contexts has huge potential. In fact, educators are increasingly supported by LLMs to provide feedback to students, because these can, for example, promote self-regulated learning or save educators time and resources (Cavalcanti et al., 2021; Kasneci et al., 2023; Wilson et al., 2021).

One prerequisite of effective feedback interactions is the flawlessness of the human-computer/AI interaction. A flawless interaction requires AI-literate users (for more information see, for example, Long and Magerko, 2020) who trust the LLM and, at the same time, feel agentic and able to reflect on their use of the system (Khosravi et al., 2022). Additionally, to lead to progress, the interaction should be humanlike and empathetic (Grassini, 2023), which is a given in interactions with LLMs (see above).

When assessing texts to provide feedback, both humans and AI-systems rely on their experience, but their approaches differ: AI-systems have a clear statistical approach, e.g., they come to a decision on structure by counting paragraphs and length (e.g., Yang et al., 2023), which resembles a rubric. In this process, (most) AI-systems neglect the actual content and do not reliably provide correct information (Grassini, 2023). Humans, on the other hand, evaluate the text more intuitively, even if they use rubrics, which could make them less credible. Alongside investigating the functionality and reliability of AI-systems, research should examine the use of AI-systems in higher education (Chiu et al., 2023; Grassini, 2023), e.g., as providers of elaborated feedback, as this development has the potential to influence the effectiveness of feedback (Panadero and Lipnevich, 2022).

2.3 Feedback effectiveness

As mentioned above, feedback providers and their characteristics can influence the effectiveness of feedback. Similar to the feedback process itself, the effectiveness of feedback is complex: It encompasses cognitive, affective, and relational aspects as summarized by Henderson et al. (2019). Thus, effective feedback not only involves learners using it and improving their performance (i.e., cognitive aspect), but effective feedback also requires the provider to be seen as trustworthy and/or the feedback message to be perceived as fair (i.e., relational and affective aspects). This view on feedback effectiveness, in line with the definition of feedback, highlights the often overlooked fact that learners, as the recipients of feedback, are active recipients, in that they actively engage with the feedback (Lipnevich et al., 2016; Lipnevich et al., 2021; Tsai, 2022; Winstone et al., 2017). How learners perceive the various elements of the feedback process (i.e., the MISCA elements: message, implementation, student, context, and agents), like the feedback provider and the feedback itself, is thus crucial for their engagement with the feedback (Van der Kleij and Lipnevich, 2021; Strijbos et al., 2021).

2.3.1 The importance of feedback message perceptions

As learners are at the center of the feedback process (Panadero and Lipnevich, 2022), a crucial element that determines the effectiveness of such a process is how the learners perceive the feedback. Feedback message perceptions include cognitive, provider-cognitive, motivational, and/or affective reactions (Strijbos et al., 2021, p. 2) and, thus, ‘capture how students comprehend, perceive, and value a feedback message and how they experience and receive feedback’ (Van der Kleij and Lipnevich, 2021, p. 349). Feedback message perceptions are part of the effectiveness of feedback, as they determine how learners engage with the feedback (Van der Kleij and Lipnevich, 2021). As aspects of feedback effectiveness, feedback message perceptions can be influenced by any element of the process, i.e., the message, implementation, student, context, or agents (Panadero and Lipnevich, 2022). In this vein, learners’ perceptions of feedback can be influenced by the feedback provider and their characteristics (Van der Kleij and Lipnevich, 2021). For example, Dijks et al. (2018) investigated how the (peer) feedback provider’s perceived expertise affected the feedback recipients’ perceptions and found a positive link between the two, concluding that learners might be biased by their knowledge about the feedback provider. Similarly, Strijbos et al. (2010) found evidence that (peer) feedback providers’ expertise influences perceptions of feedback. Thus, it seems likely that the feedback provider as well as knowledge about their characteristics influence how learners perceive feedback.

2.3.2 The importance of trustworthy feedback providers

Because AI-systems will be implemented in educational contexts more and more to support educators and learners (Bozkurt, 2023; Chiu et al., 2023), it is crucial to examine how learners will react to these providers considering their different (power) relationships (see 2.2). Particularly important is understanding the extent to which learners trust the feedback provider (Carless, 2012; Davis and Dargusch, 2015; Ilgen et al., 1979; Winstone et al., 2017), as this is a prerequisite that helps learners decide whether to use the feedback (Boud and Molloy, 2013; Carless, 2006; Holmes and Papageorgiou, 2009; Carless, 2012; Davis and Dargusch, 2015). For example, feedback from trustworthy sources positively relates to feedback acceptance and motivation (see Lechermeier and Fassnacht, 2018).

For humans, trust constitutes perceived benevolence, integrity, and expertise (Hendriks et al., 2015). Since AI-systems are evaluated like humans, the same criteria can be applied to them (Reeves and Nass, 1996). Generally, the development of trust in the context of feedback is facilitated by the learner’s perception of certain characteristics about the feedback provider, such as their expertise, experience, or status (Hoff and Bashir, 2015; Lechermeier and Fassnacht, 2018; Lucassen and Schraagen, 2011; Van De Ridder et al., 2015). Further, trustworthiness can also be influenced by such observable cues (Kaplan et al., 2023).

While these concepts apply equally to humans and AI-systems, the development of trust in each of them differs (see Madhavan and Wiegmann, 2007). For humans, as the relationship between two people progresses, trustworthiness usually increases (Hoff and Bashir, 2015). By contrast, AI-systems often enjoy a positivity bias initially, meaning that people trust the system in the beginning due to, for example, its label (Langer et al., 2022) or assumed abilities and objectivity (Swiecki et al., 2022). In this vein, a previous study by Ruwe and Mayweg-Paus (2023) found that AI-systems were perceived as more trustworthy than humans after a first interaction. Learners’ interactions with AI-systems are also affected by the quality of the system (Cai et al., 2023), whereby students often have certain expectations toward AI-systems (like ChatGPT) which the system may or may not meet, thereby affecting learners’ behavior toward the system (Strzelecki, 2023). In education settings, both teachers and students have tended to meet AI-systems with skepticism (Chiu et al., 2023), as many are unsure about the benefits of AI (Clark-Gordon et al., 2019; Shin et al., 2020). In these cases, trust can develop with increasing interaction as learners get to know the system (Cai et al., 2023; Nazaretsky et al., 2022; Qin et al., 2020). Overall, the best-case scenario for such human-AI interactions is for people to have fruitful experiences with an AI-system and trust it just enough that they neither over- nor under-rely on the system’s decisions (Parasuraman and Riley, 1997).

In this vein, the concept of AI literacy (Ng et al., 2021) becomes important: To use an AI-system effectively, users need to understand the system, but this is difficult because the capabilities and functionalities of AI have developed faster than users’ comprehension of them (Zawacki-Richter et al., 2019). Since the ability to comprehend an AI-system is crucial for increasing trust (Nazaretsky et al., 2022; Qin et al., 2020), researchers have suggested that transparency and explainable AI (xAI) be employed (Cai et al., 2023; Khosravi et al., 2022; Memarian and Doleck, 2023), thereby improving trust and understanding and, in turn, interactions between humans and AI.

2.4 Providing information about the feedback provider

In (peer) feedback processes, anonymity is often considered beneficial to avoid social biases, but it does not always have the desired effects (e.g., Lu and Bol, 2007; Panadero and Alqassab, 2019). When working with educators or AI-systems, implementing anonymity is rather unrealistic: Learners know whose classes they take, and data protection laws require at least minimal information about algorithms. Furthermore, as mentioned above, users often want information about AI-systems for transparency reasons (Khosravi et al., 2022; Memarian and Doleck, 2023). Therefore, one approach might be to investigate the opposite of anonymity, namely whether giving additional information about a feedback provider actually promotes feedback effectiveness.

In the context of AI-systems, having information about the feedback provider (the AI-system) can be seen as an external aspect that affects situational trust (Hoff and Bashir, 2015) and is comparable to explainable AI (xAI). xAI aims at supporting users ‘to understand how, when, and why predictions are made’ (Kamath and Liu, 2021, p. 2), such that these explanations can establish trust and understanding as well as promote the use of AI-systems (Hoff and Bashir, 2015; Khosravi et al., 2022; Memarian and Doleck, 2023; Vössing et al., 2022). Therefore, the information about an AI-system can potentially influence whether and how the output of the system, e.g., feedback, is actually used by the user, e.g., the learner. However, providing information that is overwhelming or confusing reduces users’ understanding and can, thus, hamper transparency (Khosravi et al., 2022). One promising approach might be to offer users non-technical, global explanations about the general functioning, i.e., the competence, of the AI-system (Brdnik et al., 2023; Khosravi et al., 2022), such that learners can determine how the AI works, how it may affect them, and whether it is trustworthy (Holmes et al., 2021), even though there is no one-size-fits-all explanation in educational settings (Conijn et al., 2023).

Not only does having information about (the characteristics of) an AI-system increase trust in it, having information about human experts and their individual characteristics also increases trust in them and the information they provide, particularly in online contexts (Harris, 2012; Metzger, 2007; Flanagin et al., 2020). As outlined above, this is true for educators, whose perceived expertise (among other characteristics) influences the effectiveness of feedback (see 2.2 and 2.3.2).

2.5 Feedback literacy to improve feedback processes?

In the context of this study, feedback literacy might play an important role because feedback literacy is said to diminish biases in learners’ reception of feedback (Carless and Boud, 2018). Additionally, feedback literacy involves strong critical thinking and reflection skills, and these skills are also essential for effectively dealing with the challenges brought by modern technologies like AI-systems (Alqahtani et al., 2023; Casal-Otero et al., 2023; Federal Ministry of Education and Research, 2023; Ng et al., 2021). According to Carless and Boud (2018), feedback literacy encompasses appreciating feedback, making judgments based on feedback, managing affect resulting from feedback, and taking action from feedback. Thus, one could assume that feedback-literate learners might manage their affective reaction to the feedback provider and prevent negative influences from getting in the way of engaging with the feedback information.

2.6 Hypotheses

Building on the literature outlined above, we aimed to investigate the effect of providing provider-information as a way to avoid the pitfall of negative perceptions (in terms of feedback message and feedback provider perceptions) hindering feedback effectiveness, and we did this while also considering learners’ feedback literacy. We thus ask whether providing background information about the feedback provider (i.e., educator vs. AI-system) influences learners’ perceptions of the feedback as well as of the feedback provider, as these perceptions are known to influence the effectiveness of feedback. The framework underlying the study is illustrated in Figure 1. The study was pre-registered (see: https://aspredicted.org/N3H_JQ8).

Figure 1. Illustration of the conceptual framework of the study.

We outlined (see 2.4) that having information about feedback providers influences not only the learner’s perceived trustworthiness of the feedback provider, but it also influences their perception of the feedback information itself. Those arguments led to our first hypothesis:

H1: We assume there will be a main effect of having provider-information about the feedback provider on learners’ perceptions of (a) the feedback and (b) the feedback provider (i.e., trustworthiness) compared to when learners have no information about the providers.

In 2.3.2, we briefly outlined the differences and similarities between humans and AI-systems regarding trustworthiness. We assume that even though educators are the traditional sources of feedback and hold a certain status, AI-systems still benefit from positivity bias. Accordingly, we derived our second hypothesis:

H2: We assume that (a) feedback from an LLM will be perceived more positively than feedback from a human and that (b) learners’ perceptions of the LLM as a feedback provider will be more positive than their perceptions of a human as a feedback provider.

Since certain characteristics of the feedback provider also play a role (see 2.2), our third hypothesis involves the interaction between the feedback provider and their corresponding provider-information:

H3: We assume that having provider-information will more positively influence learners’ perceptions of (a) the feedback and (b) the feedback provider when this information is provided for the human than when it is provided for the LLM.

Lastly, some learner attributes influence whether and how much learners engage with the feedback process, such as feedback literacy (see 2.5). Because feedback literacy encompasses competences that allow a less biased evaluation of feedback, we explored whether feedback literacy affected learners’ reactions to the feedback and to the feedback provider.

3 Method

3.1 Participants and design

Using G*Power (Faul et al., 2009) for MANOVA Special Effects/Interactions with α = 0.05, power (1 − β) = 0.95, and a medium-sized effect, a required total sample of 65 participants was estimated. Participants were recruited via different channels (e.g., postings in the learning management system, using networks). In total, N = 462 German-speaking students (studying to be teachers) began participation in the study; N = 169 students finished the study, and their data were included in the analysis (included sample was 68.6% female, 0.18% diverse/not specified; M_age = 24.85, SD_age = 6.74; 54.4% at bachelor’s degree level; 95.9% German native speakers).

The 2×2 between-subjects study with the factors feedback provider (educator vs. AI-system) and provider-information (yes vs. no) was implemented online (unipark.com by Questback EFS Survey). Participants were randomly assigned to one of the four experimental conditions.
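The random assignment to the four cells and the dummy coding used later in the analyses can be illustrated with a minimal sketch. It assumes simple (unrestricted) randomization; the source does not state which randomization scheme the survey platform applied, and all names (`assign_conditions`, `provider_llm`, `info_given`) are hypothetical labels chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical labels for the two factors. The reference categories
# (educator, no provider-information) are coded 0.
PROVIDERS = {"educator": 0, "llm": 1}
INFO = {"no_info": 0, "info": 1}
CONDITIONS = [(p, i) for p in PROVIDERS for i in INFO]


def assign_conditions(n_participants, seed=None):
    """Randomly assign each participant to one of the four cells.

    Assumption: simple randomization with equal probability per cell,
    so cell sizes can differ by chance.
    """
    rng = random.Random(seed)
    rows = []
    for pid in range(1, n_participants + 1):
        provider, info = rng.choice(CONDITIONS)
        rows.append({
            "participant": pid,
            "provider": provider,
            "info": info,
            # Dummy variables as they would enter a regression/path model.
            "provider_llm": PROVIDERS[provider],
            "info_given": INFO[info],
        })
    return rows


assignments = assign_conditions(169, seed=1)
cell_sizes = Counter((r["provider"], r["info"]) for r in assignments)
print(cell_sizes)
```

Under simple randomization the four groups will usually be of unequal size; a blocked scheme would be needed to force equal cells.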

3.2 Measures

3.2.1 Trustworthiness

Participants’ perceptions of the feedback providers’ trustworthiness were assessed with the Muenster Epistemic Trustworthiness Inventory (METI; Hendriks et al., 2015). The 16 items of the METI can be summarized in three subscales: expertise (seven items, α ≥ 0.94), integrity (five items, α ≥ 0.82), and benevolence (four items, α ≥ 0.80). The items were assessed on a seven-point semantic differential scale using antonyms, e.g., 1 = professional vs. 7 = unprofessional (expertise), 1 = honest vs. 7 = dishonest (integrity), and 1 = responsible vs. 7 = irresponsible (benevolence).
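The reliability coefficients (α) reported for the subscales are Cronbach’s alpha values. As a rough illustration of how such a coefficient is obtained from item-level ratings, the following sketch implements the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the sum score). The function name and the ratings are made up for illustration and are not study data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item-score lists.

    Each inner list holds one respondent's ratings on the k items of a
    subscale. Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(responses[0])                 # number of items
    items = list(zip(*responses))         # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up ratings from five respondents on a four-item subscale (1-7).
toy = [
    [1, 2, 1, 2],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [6, 7, 6, 6],
    [2, 2, 3, 2],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Note that on a semantic differential such as the one described here, lower values correspond to the more positive pole, which matters when interpreting subscale means but not when computing alpha.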

3.2.2 Feedback message perceptions

To assess how participants evaluated the feedback, the Feedback Perceptions Questionnaire (FPQ; Strijbos et al., 2021) was applied. The 18 items were assessed on a 10-point bipolar scale ranging from 1 = fully disagree to 10 = fully agree. They are distributed across five subscales: fairness, α ≥ 0.87 (e.g., I would be satisfied with this feedback), usefulness, α ≥ 0.91 (e.g., I would consider this feedback useful), acceptance, α ≥ 0.83 (e.g., I would accept this feedback), willingness to improve, α ≥ 0.87 (e.g., I would be willing to improve my performance), and affect, α ≥ 0.85 (e.g., I would feel satisfied/content if I received this feedback on my revision).

3.2.3 Feedback literacy

For assessing participants’ feedback literacy, we employed five subscales of Zhan’s (2022) student feedback literacy scale: processing (e.g., ‘I am good at comprehending others’ comments’, α ≥ 0.76), enacting (e.g., ‘I am good at managing time to implement the useful suggestions of others’, α ≥ 0.68), appreciation (e.g., ‘I have realized that feedback from other people can make me recognize my learning strengths and weaknesses’, α ≥ 0.76), readiness (e.g., ‘I am always ready to receive hypercritical comments from others’, α ≥ 0.81), and commitment (e.g., ‘I am always willing to overcome hesitation to make revisions according to the comments I get’, α ≥ 0.70). The 20 items (four items each) were assessed on positively packed six-point Likert-type scales ranging from 1 (= strongly disagree) to 6 (= strongly agree).

3.2.4 Control variables

To control for potential further influences of the participants’ characteristics (particularly on perceptions of the AI-system), we included demographic variables and collected information about participants’ experience with AI-systems (Kaplan et al., 2023). We had participants estimate their competence, experience, expertise, performance, and previous interactions with AI-systems in educational contexts (e.g., ‘I have sufficient competences to use an AI-system in educational contexts’) on a five-point Likert scale ranging from 1 = completely disagree to 5 = completely agree.

Furthermore, we wanted to gain insights into a potential positivity bias on the participants’ side. Therefore, we developed four items in accordance with Hoff and Bashir’s (2015) model to assess whether participants have dispositional, learned, internal or external situational trust in AI-systems (α ≥ 0.79).

3.3 Procedure

After providing their informed consent and demographic information, participants received an introduction to the scenario: They saw a screenshot of a seemingly realistic interaction between an educator or an LLM and a learner depending on their experimental group (see Figures 2, 3). The screenshots in Figures 2, 3 show the educator and LLM conditions with provider-information. This information (on top of the screenshot) was missing in the no provider-information conditions. The human-human interactions were modeled on a common learning management system, while the human-AI interactions were modeled on ChatGPT (OpenAI, 2023). The feedback solely referred to structural, not contextual issues, and the feedback was designed to be neither positive nor negative. The fit between the text and the feedback was pilot tested (see below).

Figure 2. Screenshot of experimental manipulation (educator x provider-information).

Figure 3. Screenshot of experimental manipulation (LLM x provider-information).

Building on research on xAI (see 2.4), the provider-information for the LLM was solely textual (Conijn et al., 2023) and addressed the LLM’s performance while not being too overwhelming (Khosravi et al., 2022). The provider-information was based on Conijn et al.’s (2023) global explanations but was shortened and adapted. The provider-information about the human feedback provider described their expertise and was designed to align with the provider-information of the LLM.

After reviewing the screen with the feedback (and, for the corresponding conditions, the provider-information), participants were then asked to quantitatively and qualitatively give their perceptions of the feedback and the feedback provider as well as estimate their feedback literacy. More details on the procedure can be seen in Figure 4.

Figure 4. Illustration of the procedure of the study.

Prior to the experiment, we pilot tested the materials and the study with five people with educational backgrounds (60% female, M_age = 43.20, SD_age = 22.05). The pilot tests verified the comprehensibility and design of the study and all of its materials.

3.4 Ethics statement

The study complied with APA ethical standards for research with human subjects as well as the EC’s data protection act. All participants provided their informed consent and were debriefed about the purpose at the end of the survey. Dropping out was possible at any time without having to provide a reason. Participants were reimbursed with 10 €.

3.5 Statistical analyses

Using the lavaan package (Rosseel, 2012) in RStudio (R Core Team, 2022), we built saturated path models (equivalent to regression analyses) to test our theoretically grounded model as outlined above and illustrated in Figure 5. We included dummy variables for our independent variables, using educators and no provider-information as the reference categories. Furthermore, we used a robust estimator (MLM) to account for minimal skewness in our data (Rosseel, 2012). We built one model with the overall means and one with the subscales for each hypothesis and set an α level of 0.05. The descriptive values of the dependent variables across the experimental groups can be found in Figures 6, 7.

Figure 5. Illustration of the theoretical background of the path models based on the literature review.

Figure 6. Overview of descriptive values of trustworthiness across all experimental groups. + Info, meta-information provided; − Info, no meta-information provided; METI, Muenster epistemic trustworthiness inventory; Exp, expertise; Ben, benevolence; Int, integrity.

Figure 7. Overview of descriptive values of feedback message perceptions across all experimental groups. FPQ, feedback perceptions questionnaire; FA, fairness; AF, affect; AC, acceptance; WI, willingness to improve; US, usefulness.

Accordingly, we built three models to test our hypotheses as stated above (see 2.6): Model A included the feedback provider and the provider information as independent variables and the overall values of feedback message and feedback provider perceptions as the dependent variables (see Supplementary Appendix Table A1). Instead of looking at the overall values, model B included all subscales as dependent variables (see Supplementary Appendix Table A2). These subscales remained the dependent variables in model C while including the feedback provider as independent variable and adding provider information as moderating variable (see Supplementary Appendix Table A3).
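The moderation in model C can be read as an interaction term in a dummy-coded regression. The following minimal sketch illustrates that reading for a single outcome. It assumes ordinary unstandardized regression coefficients, and the cell means are hypothetical values chosen for illustration, not data from the study. The names `cell_means`, `saturated_coefficients`, and `predict` are likewise illustrative and do not reproduce the lavaan models reported here.

```python
# Hypothetical cell means for one outcome in a 2x2 design (illustrative only,
# NOT data from the study). Keys are (llm, info) dummy codes:
#   llm  = 0 -> educator (reference category), 1 -> LLM
#   info = 0 -> no provider-information (reference category), 1 -> provided
cell_means = {
    (0, 0): 5.0,
    (1, 0): 4.0,
    (0, 1): 5.5,
    (1, 1): 5.5,
}


def saturated_coefficients(means):
    """Coefficients of y = b0 + b1*llm + b2*info + b3*llm*info.

    With one parameter per cell, the dummy-coded model reproduces the
    four cell means exactly.
    """
    b0 = means[(0, 0)]                   # reference cell: educator, no info
    b1 = means[(1, 0)] - means[(0, 0)]   # provider effect without info
    b2 = means[(0, 1)] - means[(0, 0)]   # info effect for the educator
    # Interaction: how much the provider effect changes once info is added.
    b3 = (means[(1, 1)] - means[(0, 1)]) - (means[(1, 0)] - means[(0, 0)])
    return {"intercept": b0, "llm": b1, "info": b2, "llm_x_info": b3}


def predict(coefs, llm, info):
    """Predicted cell mean for a given combination of dummy codes."""
    return (coefs["intercept"]
            + coefs["llm"] * llm
            + coefs["info"] * info
            + coefs["llm_x_info"] * llm * info)


coefs = saturated_coefficients(cell_means)
print(coefs)
```

In this coding, the coefficient for the feedback provider is the LLM-educator difference in the reference condition (no provider-information), and the interaction coefficient is the change in that difference when provider-information is added.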

3.5.1 Manipulation checks

We assessed several other variables to get more insights into our setting and the experimental manipulations. The participants evaluated the setting as realistic (M = 2.36,¹ SD = 1.38), reasonable (M = 2.50, SD = 1.32), appropriate (M = 2.76, SD = 1.42), and suitable (M = 2.87, SD = 1.38). The majority of participants (82.1%) did not think about a specific or known person or AI-system. Participants who received the provider-information agreed that they found it rather helpful (M = 4.54,² SD = 1.59), clear (M = 4.44, SD = 1.58), and adequate (M = 4.29, SD = 1.48); they rather disagreed that the provider-information was overwhelming (M = 2.90, SD = 1.45), confusing (M = 2.92, SD = 1.64), and distracting (M = 2.78, SD = 1.43). This indicates a good quality of the explanations (Brdnik et al., 2023; Conijn et al., 2023; Khosravi et al., 2022).

Finally, we assessed participants’ content knowledge and their experience with, attitude toward, and trust in AI-systems in feedback contexts. Participants’ knowledge about the topic of the argumentative text was low (M = 1.61, SD = 0.84). According to participants’ self-evaluations, their competence (M = 2.82, SD = 1.73), knowledge (M = 3.15, SD = 1.86), and experience (M = 2.24, SD = 1.70) with AI-systems in feedback contexts were rather low, but they estimated their performance in such a setting as average (M = 3.93, SD = 1.34). Their expectations toward an AI-system in such a setting were rather high (M = 4.08, SD = 1.75).

There were no differences between the experimental groups except for experience [F(1, 167) = 4.19, p ≤ 0.05, R² = 0.02], where the educator group (M = 2.50) reported more experience with AI-systems in feedback settings than the AI group (M = 1.96). There were no differences between the experimental groups regarding their attitudes toward AI-systems in feedback settings, which were rather positive (M = 4.31, SD = 0.91). This could indicate a slight positivity bias toward AI-systems in feedback contexts.

4 Results

4.1 Hypothesis 1 – Main effect of provider-information

While there was no effect on trustworthiness, the availability of provider-information had a direct association with feedback message perceptions. We found that having provider-information improved participants’ feedback message perceptions overall (β = 0.53, p ≤ 0.05, model A, Figure 8) and positively influenced participants’ affective reactions to the feedback (β = 0.79, p ≤ 0.01, model B, Figure 9). The effect of provider-information and feedback providers was small according to Cohen (1988)³ (see Figures 8, 9).

Figure 8. Illustration of model A. Significant associations highlighted in black.

Figure 9. Illustration of model B. Only significant associations are displayed. Covariances are shown in Table 1.

Table 1. Covariances of significant outcome variables.

4.2 Hypothesis 2 – Main effect of feedback provider

There was no evidence of significant associations between the feedback provider and feedback message perceptions. Nonetheless, the feedback provider was associated with trustworthiness: The LLM was perceived as more trustworthy than the educator overall (β = 0.58, p ≤ 0.001, model A, Figure 8) as well as on all three subscales (model B, Figure 9) expertise (β = 0.74, p ≤ 0.001), benevolence (β = 0.62, p ≤ 0.001), and integrity (β = 0.39, p ≤ 0.05).

4.3 Hypothesis 3 – Interaction effect of feedback provider and provider-information

We did not find a significant interaction effect of feedback provider and provider-information on the overall scores of feedback message perceptions and trustworthiness. Looking at the subscales, we found that the effect of the feedback provider on fairness and on expertise changed when adding provider-information (see model C, Figure 10): The interaction of feedback provider and provider-information was significant regarding fairness (β = 1.24, p ≤ 0.05) and expertise (β = 1.24, p ≤ 0.05). Follow-up t-tests revealed that without provider-information, there was a significant difference between LLMs (M = 7.07, SD = 2.13) and educators (M = 8.06, SD = 2.10) regarding fairness (t = 2.52, p ≤ 0.05, d = 0.53). When adding provider-information, this effect vanished (M_LLM = 8.25, M_Educator = 8.06, t = −0.44, p = 0.66, d = 0.09). In addition, there was a significant difference in participants’ perceptions of the feedback providers’ expertise without provider-information (M_LLM = 3.34, M_Educator = 2.19, t = −4.46, p ≤ 0.001, d = 0.94), and this difference disappeared when provider-information was provided (M_LLM = 2.62, M_Educator = 2.31, t = −1.29, p = 0.201, d = 0.28).
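As a reading aid for the effect sizes of the follow-up comparisons, the sketch below shows one common way to compute Cohen's d (mean difference divided by the pooled standard deviation) and to label its size with Cohen's conventional thresholds of 0.2, 0.5, and 0.8. The article does not state which formula or group sizes were used, so the function `cohens_d` and its example inputs are assumptions for illustration only. Only the four d values in `reported_d` are taken from the text.

```python
import math


def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = (((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2)
                  / (n_a + n_b - 2))
    return (mean_a - mean_b) / math.sqrt(pooled_var)


def label_d(d):
    """Label |d| with Cohen's conventional benchmarks (0.2 / 0.5 / 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical inputs (NOT from the study): equal SDs of 2.0, 40 per group.
example_d = cohens_d(6.0, 2.0, 40, 5.0, 2.0, 40)

# Effect sizes as reported in the text for the follow-up t-tests.
reported_d = {
    "fairness, without provider-information": 0.53,
    "fairness, with provider-information": 0.09,
    "expertise, without provider-information": 0.94,
    "expertise, with provider-information": 0.28,
}
for comparison, d in reported_d.items():
    print(f"{comparison}: d = {d} ({label_d(d)})")
```

Read this way, the provider differences without provider-information fall in the medium-to-large range, whereas the differences with provider-information are small or negligible.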

Figure 10. Illustration of model C. Only significant associations are displayed.

Overall, the models illustrating the interaction of the feedback provider and provider-information explain more variance than those excluding the interaction: When including the moderation, 2.4% more variance of fairness could be explained, and 2.9% more variance of expertise could be explained. These indicate small effects (Cohen, 1988).

4.4 Exploratory analysis

Finally, we explored potential effects of feedback literacy on the associations under investigation. While neither model showed that feedback literacy significantly moderated the effects of feedback provider and provider-information on feedback message perceptions and trustworthiness, after running MANOVAs we found significant effects on overall feedback message perceptions and trustworthiness as well as on some subscales. Accordingly, with increases in feedback literacy, we found increases in feedback message perceptions [Figure 11; F(1, 167) = 10.61, p ≤ 0.01, R² = 0.05] but decreases in trustworthiness [Figure 12; F(1, 167) = 8.22, p ≤ 0.01, R² = 0.04]. A closer look at the subscales revealed the same pattern as in Figures 11, 12. The effects on expertise [F(1, 167) = 3.83, p = 0.052] and acceptance [F(1, 167) = 3.81, p = 0.053] were not significant, though, so there was no evidence that feedback literacy influenced these two subscales. Overall, the data show that feedback literacy does play a role in explaining learners’ perceptions of feedback and of feedback providers.
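The relation between the reported F statistics and R² values can be sketched as follows. The sketch assumes that each F(1, 167) stems from a linear model with a single predictor and an intercept, in which case R² = F·df1 / (F·df1 + df2). This is an assumption, as the article does not describe the computation. Under it, the adjusted values round to the reported 0.05 and 0.04, although the source does not say whether adjusted values were reported. The thresholds in `label_r2` are the benchmarks given in footnote 3, and all function names are illustrative.

```python
def r2_from_f(f_value, df_model, df_resid):
    """R-squared implied by the overall F-test of a linear model."""
    return (f_value * df_model) / (f_value * df_model + df_resid)


def adjusted_r2(r2, df_model, df_resid):
    """Adjusted R-squared, with n = df_model + df_resid + 1 observations."""
    n = df_model + df_resid + 1
    return 1 - (1 - r2) * (n - 1) / df_resid


def label_r2(r2):
    """Benchmarks from footnote 3: 0.02 small, 0.13 medium, 0.26 large."""
    if r2 >= 0.26:
        return "large"
    if r2 >= 0.13:
        return "medium"
    if r2 >= 0.02:
        return "small"
    return "below small"


# F values as reported for the effect of feedback literacy, each F(1, 167).
reported_f = {
    "feedback message perceptions": 10.61,
    "trustworthiness": 8.22,
}
for outcome, f_value in reported_f.items():
    r2 = r2_from_f(f_value, 1, 167)
    adj = adjusted_r2(r2, 1, 167)
    print(f"{outcome}: R2 = {r2:.3f}, adjusted R2 = {adj:.3f} "
          f"({label_r2(adj)})")
```

Both effects fall into the small range of the benchmarks.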

Figure 11. Plots of relationships between feedback literacy and feedback message perceptions. FPQ, feedback perceptions questionnaire; FA, fairness; AF, affect; AC, acceptance; US, usefulness; WI, willingness to improve.

Figure 12. Plots of relationships between feedback literacy and trustworthiness.

5 Discussion

To summarize, we found evidence that the type of feedback provider and the presence of provider-information influence the effectiveness of the feedback, which was operationalized as learners’ perceptions of the feedback and the feedback provider. Furthermore, we found evidence that feedback literacy plays an important role for feedback effectiveness. First, feedback providers and corresponding provider-information were directly associated with learners’ perceptions of the feedback and feedback providers. While an LLM was perceived as more trustworthy than educators, the presence of provider-information improved participants’ perceptions of the feedback. Second, the presence of provider-information influenced the effect of the feedback provider: Without provider-information, feedback from an educator was perceived as fairer than feedback from an LLM, but when provider-information was present, this difference was no longer significant, and the LLM’s feedback was descriptively rated as slightly fairer. Similarly, there was a significant difference in trustworthiness regarding expertise without provider-information, such that the LLM was perceived as more trustworthy. Yet, when provider-information was present, the LLM and educator were perceived as similarly trustworthy regarding their expertise. Finally, we explored the role of feedback literacy and found that it did influence the way learners reacted to the feedback process. We found that with increases in feedback literacy, participants’ perceptions of the feedback improved, while the perceived trustworthiness of the feedback providers decreased.

5.1 Hypothesis 1 – Main effect of provider-information

The first hypothesis was confirmed with respect to the effect of having provider-information on learners’ perceptions of the feedback message (H1a). In line with the literature, the type of feedback provider as well as having knowledge about them influenced how learners reacted to feedback and, thus, how they perceived it. In the complex feedback process, various elements influence the effectiveness of feedback, including having access to any information that improves the relationship between the learner and the feedback provider (Panadero and Lipnevich, 2022; Winstone et al., 2017). Another reason that feedback message perceptions were positively influenced by provider-information could be that the presence of the provider-information improved transparency. As a whole, provider-information might help learners understand the feedback message (see 2.4).

In contrast, the second part of the first hypothesis (H1b) was not confirmed: There was no effect of provider-information on participants’ perceptions of the feedback provider, i.e., trustworthiness. This is a surprising finding, particularly because provider-information moderated the effect of the feedback provider on trustworthiness (see Hypothesis 3). Presumably, provider-information that gives insights into the expertise and competence of the feedback provider should be beneficial for how the provider is perceived. Possibly, the provider-information presented in this study did not sufficiently emphasize the characteristics of the feedback provider that inform participants about their trustworthiness, such as expertise, experience, and status (Hoff and Bashir, 2015; Lechermeier and Fassnacht, 2018; Lucassen and Schraagen, 2011; Van De Ridder et al., 2015). In this vein, there is no one-size-fits-all provider-information (Conijn et al., 2023), which also becomes evident below in the discussion of Hypothesis 3. Consequently, the rather global information we provided may not be suitable for increasing trustworthiness. In addition, participants’ individual characteristics could help explain the lack of an effect. These characteristics are crucial in the context of trustworthiness (Kaplan et al., 2023). Keeping in mind the trust trajectories (see 2.3.2), the positivity bias we found (see Hypothesis 2) could be undermined by the additional information. With the additional information about the LLM as provider, trust in it may decrease, while this information adds to the trustworthiness of educators. The perceptions of the providers thus converge.

Overall, we can conclude that provider-information about the feedback provider influenced participants’ reactions to the feedback process, i.e., the effectiveness of feedback.

5.2 Hypothesis 2 – Main effect of feedback provider

In line with our assumption, the type of feedback provider influenced participants’ perceptions of them (H2b). More specifically, the LLM was perceived as more trustworthy than the educator, a finding that confirms the results of one of our previous studies (Ruwe and Mayweg-Paus, 2023). Overall, the literature agrees that the feedback provider influences the effectiveness of the feedback (Panadero and Lipnevich, 2022; Winstone et al., 2017). The fact that an LLM was again perceived as more trustworthy than a human can be explained by our sample’s characteristics: The participants in this study did not have much experience with AI-systems, but they had high expectations of them and rather positive attitudes. This could indicate a potential positivity bias, which our data confirm. According to the positivity bias, people have high initial trust in AI-systems due to various prejudices, and this trust decreases after repeated interactions with the system (see 2.3.2).

On the other hand, our assumption that the feedback provider influenced participants’ perceptions of the feedback (H2a) was not confirmed. In previous studies (e.g., Dijks et al., 2018; Strijbos et al., 2010) the expertise of the feedback provider, i.e., a key characteristic of the feedback provider, was found to influence feedback message perceptions. Even though a key characteristic of the feedback provider is closely related to the feedback provider themself, they are not the same thing, which could explain the absence of a significant association here. Particularly, against the background of provider-information having a moderating effect (H3a), this explanation might be true, as the purpose of the provider-information was to emphasize the characteristics of the feedback provider.

In conclusion, feedback providers influence the effectiveness of feedback in terms of learners’ reactions to the feedback process, and this finding agrees with the literature on characterizing the feedback provider as an important influence on feedback effectiveness (e.g., Panadero and Lipnevich, 2022; Winstone et al., 2017).

5.3 Hypothesis 3 – Interaction of provider-information and feedback provider

For fairness, we found evidence against the assumed relationship (H3a): When provider-information was present, the educator’s feedback was descriptively perceived as slightly less fair than feedback generated by an alleged LLM, although this difference was not significant. Without provider-information, feedback from the educator was perceived as fairer than that of the LLM. As outlined above, information about the specific characteristics of a feedback provider seems to influence how learners perceive the feedback (Dijks et al., 2018; Strijbos et al., 2010). Furthermore, this finding is in line with literature about xAI, which aims at increasing transparency and fairness (see 2.4). Still, it is interesting that the educator did not benefit from the presence of provider-information regarding the perceived fairness of their feedback. Possibly, the content of the provider-information could have affected whether the educator was seen as objective.

The assumed effect of provider-information on trustworthiness (H3b) was confirmed: Trustworthiness regarding the educator’s expertise increased when provider-information was present and, thus, approached the same level as trust in the LLM’s expertise. Thus, the presence of provider-information benefited the educator’s trustworthiness. This finding may be related to the positivity bias and the varying trust trajectories between humans and AI-systems: People are predisposed to trust an AI-system initially, but once they interact with it more, their trust decreases. In contrast, trust in humans increases with repeated interactions. In this vein, providing information about the educator could have sped up the process of getting to know the feedback provider and, subsequently, led to higher trust. For AI-systems, it would be interesting to compare different explanations and different systems, because the characteristics of the system, its training data, and its developers can influence how users perceive the system (Kaur et al., 2022).

5.4 Exploratory analysis on the role of feedback literacy

Our exploratory investigation of whether participants’ feedback literacy affected their perceptions revealed interesting insights. Potentially, feedback literacy might allow learners to neither over- nor under-rely on feedback (providers) and to accurately receive and engage with the feedback.

Increases in feedback literacy went hand in hand with increases in feedback message perceptions. Thus, feedback literacy did support learners in evaluating the feedback and planning their engagement with it. On the other hand, increases in feedback literacy also went hand in hand with decreases in trustworthiness of the feedback provider. This decrease in trustworthiness along with the increase in feedback message perceptions could mean that learners with higher feedback literacy engage with feedback in a way that is more independent of the feedback provider. Thus, these learners may rely less on potential biases related to the feedback provider and are more focused on the feedback itself.

5.5 Theoretical implications

Building on these findings, here we draw theoretical implications. Overall, the study confirmed that feedback processes are complex, and their elements are intertwined. This complexity does not disappear when AI-systems like LLMs are involved, and such systems bring their own complex backgrounds. It is worth diving deeper into feedback processes involving AI-systems and looking at the various relations in more detail.

Regarding provider-information, the results showed that its effect depends on the feedback provider. In this context, more research on what types of provider-information work for whom is important. We thus agree with authors like Conijn et al. (2023), who have stated that there is no one-size-fits-all explanation. In this vein, provider-information is closely related to the feedback provider, leading to interesting questions: Why do the effects of provider-information and feedback providers differ regarding feedback effectiveness? For future research, it would be interesting to investigate the underlying mechanisms and the value that learners place on the different elements of the feedback process.

Finally, we point toward the importance of feedback literacy. Further research should investigate how to promote feedback literacy or how to activate it in feedback processes to avoid biased feedback interactions. In this context, research on further provider-competences that allow reflective engagement with feedback (processes) seems promising and inevitable.

5.6 Practical implications

Our findings can also be used to infer practical implications. First, giving provider-information about feedback providers should be considered, as this information improves learners’ perceptions of the feedback and also influences the effect of the feedback provider. In this vein, it is important to consider that the effects of provider-information might differ with different content and for different feedback providers. Another aspect to strongly consider is how personal factors, e.g., the personal relationship between the learner and the feedback provider, influence trustworthiness. Second, positivity biases and stereotypes toward AI-systems should be considered when implementing them in feedback processes. No learner should blindly trust AI-systems. This leads directly to our third implication, which is that feedback literacy should be promoted to encourage learners to engage with the feedback (process) in a reflective manner. In conclusion, the importance of the relationship between students and feedback providers should be emphasized. Such relationships should be based on trust and enable an open and critical engagement with the feedback process.

5.7 Limitations

For various reasons, the results and implications derived from this study should be interpreted with caution. The sample is quite specific and encompassed only German (speaking) students training to be teachers. These participants had varying individual characteristics (like different levels of feedback literacy) that influenced their responses, and their culture also influenced their trustworthiness assessments, particularly in the context of AI-systems (Kaplan et al., 2023). The transferability of the findings to other feedback scenarios is further limited by the feedback message that was specifically related to the argumentative structure and not to the content. Furthermore, we mentioned that personal relationships are critically important in feedback processes, and their power should not be underestimated. Yet, since the data were collected in a fictional study where personal relationships were not relevant, field studies could derive different results. In this vein, another limitation related to the methodology is that the subscale ‘Enacting’ used for assessing feedback literacy may not have been reliable. Furthermore, this study only covered a small snippet of the feedback process, including the design of the process as well as the effectiveness of feedback. Since feedback environments are (increasingly) complex, it is important to look at effects of the specific aspects of individual feedback interactions.

5.8 Conclusion

To conclude, our study found differences in participants’ perceptions of the feedback and the feedback provider, and these differences at least partly depend on the feedback provider and whether provider-information was present. While provider-information was associated with increases in feedback message perceptions, we found a link between feedback providers and their trustworthiness in a way that benefitted LLMs. Furthermore, provider-information moderated the effect of the feedback provider on the effectiveness of feedback. Finally, our findings suggest that feedback literacy might play an important role in supporting learners when they reflectively engage with the feedback process. Based on these findings, we outlined many potential starting points for future research and also offered recommendations for practitioners regarding the design of feedback processes.

Data availability statement

The datasets presented in this article are not readily available because participants were informed their data remains confidential. Requests to access the datasets should be directed to theresa.ruwe@hu-berlin.de.

Ethics statement

The studies involving humans were approved by Humboldt-Universität zu Berlin. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

TR: Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft, Writing – review & editing. EM-P: Conceptualization, Funding acquisition, Project administration, Supervision, Writing – original draft, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This study was funded by the Einstein Centre Digital Future (ECDF).

Acknowledgments

We thank our colleagues for their support. We would also like to thank the reviewers for their valuable input which helped improve our manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2024.1461362/full#supplementary-material

Footnotes

1. The items were assessed as antonyms ranging from 1 = positive manifestation to 7 = negative manifestation.

2. The items were assessed on a scale from 1 = completely disagree to 7 = completely agree.

3. The benchmarks for R² are small effect = 0.02, medium effect = 0.13, large effect = 0.26.

References

Ajjawi, R., and Boud, D. (2017). Researching feedback dialogue: An interactional analysis approach. Assess. Eval. High. Educ. 42, 252–265. doi: 10.1080/02602938.2015.1102863