Preservice teachers’ evaluation of evidential support in causal arguments about educational topics

Lederer, Andreas; Thomm, Eva; Bauer, Johannes

doi:10.3389/feduc.2024.1379222

ORIGINAL RESEARCH article

Front. Educ. , 10 September 2024

Sec. Educational Psychology

Volume 9 - 2024 | https://doi.org/10.3389/feduc.2024.1379222

This article is part of the Research Topic (Ir)Relevance in Education: Individuals as Navigators of Dynamic Information Landscapes View all 8 articles

Preservice teachers’ evaluation of evidential support in causal arguments about educational topics

Andreas Lederer¹^*^†

Eva Thomm^1,2^†

Johannes Bauer^1,2^†

¹Educational Research and Methodology, Faculty of Education, University of Erfurt, Erfurt, Germany
²Institute for Planetary Health Behaviour, Erfurt, Germany

Many questions about educational topics—such as the effectiveness of teaching methods—are of causal nature. Yet, reasoning about causality is prone to widespread fallacies, such as mistaking correlation for causation. This study examined preservice teachers’ ability to evaluate how various types of evidence provide adequate support for causal claims, using psychology students as a comparison group. The experiment followed a 2 × 3 mixed design with the within-participant factor evidence type (i.e., anecdotal, correlational, experimental) and the between-participants factor study field (i.e., teacher education, psychology). Participants (N = 135) sequentially read short texts on three different educational topics, each presenting a claim and associated evidence. For each topic, participants indicated their claim agreement, and evaluated the convincingness of the argument and the strength of the evidential support. Results from mixed ANOVAs displayed main effects for evidence type on the convincingness of the argument and strength of evidential support, but not on individual claim agreement. Participants found experimental evidence to be more convincing and to provide stronger support for causal claims compared to anecdotal evidence. This pattern occurred similarly for both student groups and remained stable when controlling for cognitive and motivational covariates. Overall, preservice teachers seem to possess a basic understanding of different kinds of evidence and their differential strength in supporting causal arguments. Teacher education may build upon this foundational knowledge to enhance future teachers’ competencies in critically appraising evidence from educational research and relating it to school-related claims and issues.

1 Introduction

Many—if not most—questions about educational topics are inherently causal (e.g., Kvernbekk, 2016; Shavelson and Towne, 2002). This concerns not only research questions, such as the effectiveness of teaching methods or educational interventions, but also frequently encountered concerns among educational practitioners. For instance, teachers may wonder how to best explain a difficult topic to enhance students’ comprehension, boost engagement and motivation or support a struggling child. Though other types of questions, such as diagnostic ones, are certainly important (Shavelson and Towne, 2002), education and teaching, as goal-directed endeavours, naturally involve analyses of whether specific actions causally contribute to achieving desired outcomes.

Unfortunately, human reasoning about causality poses considerable challenges and is notoriously susceptible to biases (Bleske-Rechek et al., 2015; Cunningham, 2021; Hernán and Robbins, 2023). The common adage “correlation does not imply causation” underscores the highly prevalent fallacy of mistaking mere coincidence or correlation between events as evidence of a cause-and-effect relationship (Bleske-Rechek et al., 2015). Because of such biases, reasoning about what practices might be effective in a given context may pose some challenges for practitioners. For instance, teachers wishing to learn about the effects of utilising digital media in instruction on students’ learning may encounter a plethora of sources that present diverse evidence: colleagues’ experiences, media coverage, educational guidebooks and tutorials but also literature based on educational research. Although both research and professional experiences hold value for practitioners (Rousseau and Gunia, 2016), studies consistently indicate that teachers might lack fundamental research knowledge (Rochnia et al., 2023; Schmidt et al., 2023). This is not surprising, given that teacher education curricula typically offer less systematic methodological training compared with disciplines more oriented towards empirical science (e.g., psychology). In line with this, teachers often favour anecdotal evidence, such as personal or colleagues’ experiences, over research-based sources when forming conclusions about effective practices (Fischer, 2021; van Schaik et al., 2018). However, because of these biases, anecdotal evidence must be viewed as the least conclusive for supporting generalised causal claims. Moreover, even within research, studies involving randomised experiments are typically deemed more causally informative compared with observational study designs, such as surveys (Hernán and Robbins, 2023; Shavelson and Towne, 2002). Therefore, relying solely on experiences or anecdotes, as well as studies with lower internal validity, could foster ill-advised practices or pedagogical misconceptions (Asberger et al., 2021; Menz et al., 2021; Michal et al., 2021).

The present study aimed to investigate preservice teachers’ assessments of different types of evidence (i.e., anecdotal, correlational, experimental) in supporting causal claims on educational topics. The study thus contributes to the expanding body of literature on preservice teachers’ evidence-informed reasoning and engagement with research-based knowledge in teacher education (Kollar et al., 2023). Surprisingly, although research has examined preservice teachers’ reasoning abilities (e.g., Csanadi et al., 2021) or evidence evaluation (e.g., Reuter and Leuchter, 2023), studies have hardly focused on causal argumentation. This lack of research is noteworthy considering the pivotal role that causal issues play in educational practice (Kvernbekk, 2016).

1.1 Causal reasoning about educational topics

Reasoning about causal claims essentially involves constructing arguments (Hahn et al., 2017; Kuhn and Dean, 2004). Argumentation follows a basic structure that involves presenting an assertion (Claim, C) supported by pertinent evidence (E) that is connected by a logical-theoretical link (Warrant, W) (Toulmin, 2003). These elements enable an assessment of the claim’s validity (Toulmin, 2003; Moshman and Tarricone, 2016). Analysing whether a causal claim is justified often requires examining the pertinence and conclusiveness of the evidence. This is pivotal because, as discussed above, not all evidence can equally substantiate causal claims (Hernán and Robbins, 2023; Kvernbekk, 2016; Shavelson and Towne, 2002).

Although there are various ways to identify causal relationships (Cunningham, 2021; Hernán and Robbins, 2023; Pearl, 2009), anecdotal evidence (alone) will rarely provide sufficient support for a robust causal argument (Kuhn, 1991). For instance, although anecdotal evidence from teaching practice can be valuable in recognising important co-occurring events and inspiring hypotheses about putative reasons, it falls short in excluding alternative explanations (Bleske-Rechek et al., 2015; Kuhn, 1991). In contrast, typical causal claims in education, such as those regarding the effectiveness of instructional methods, require evidence from controlled experimental settings that have high internal validity (Shavelson and Towne, 2002). Randomised experiments are generally considered the simplest and gold standard method to establish causality within the methodological literature (Holland, 1986) and, thus, hold a high position in evidence hierarchies for evidence-based practice (Kvernbekk, 2016). Conversely, observational study designs typically have lower internal validity although, under specific circumstances, they may allow for the identification of causal effects (Cunningham, 2021; Hernán and Robbins, 2023). This is mainly because observational studies often lack adequate control over extraneous factors that may influence the outcome of interest. As a result, relying on observational evidence to support a causal claim frequently leads to flawed arguments (Kuhn, 1991; Kuhn and Dean, 2004).

In essence, evaluating causal arguments necessitates recognising the relative quality of available evidence supporting a claim and aligning the type of evidence with that claim (Kuhn, 1991). This requires to coordinate one’s theoretical understanding of the claim with the quality of the evidence to reach a valid conclusion (Kuhn and Dean, 2004; Kuhn, 2012). However, research indicates that aligning claims with evidence is a challenging task that is prone to fallacies (Bromme and Goldman, 2014; Kuhn, 2012; Kuhn and Modrek, 2022). For instance, people often find anecdotal evidence to be as compelling as experimental evidence when substantiating a causal theory (Kuhn, 1991; Hoeken and Hustinx, 2009). This tendency seems particularly prevalent in emotionally engaging situations (Freling et al., 2020) when information is presented in narrative form (Kuhn, 1991), aligns with preexisting beliefs (Schmidt et al., 2022) and has high plausibility (Michal et al., 2021). Even when the flawed nature of evidence is explicitly highlighted, biased reasoning can occur (Braasch et al., 2014; Steffens et al., 2014), such as drawing causal conclusions from observational data (Bleske-Rechek et al., 2015).

1.2 (Preservice) teachers’ abilities in claim–evidence coordination

Given the aforementioned widespread prevalence of causal fallacies in society (Bleske-Rechek et al., 2015; Kuhn, 2012; Seifert et al., 2022), it seems plausible that preservice teachers may also have difficulties in effectively coordinating claims with evidence. This expectation is also backed up by curricular analysis indicating that teacher education programmes typically do not offer the systematic methodological training that would provide a thorough foundation in scientific (causal) reasoning (Darling-Hammond, 2017; Engelmann et al., 2022; Rochnia et al., 2023; Pieschl et al., 2021). Moreover, although specific research on preservice teachers’ claim–evidence coordination is scarce, the broader literature on preservice and in-service teachers’ engagement with research highlights substantial motivational and skill-related barriers (e.g., Ferguson et al., 2023; Kiemer and Kollar, 2021; see for review van Schaik et al., 2018). We will elaborate on the discussed issues before delving into additional individual factors that might affect claim–evidence coordination.¹

1.2.1 Lack of methodological learning opportunities in teacher education

Despite the increasing recognition of teaching as a research-based profession and related developments in teacher education (Bauer and Prenzel, 2012; Darling-Hammond, 2017), study curricula commonly lack dedicated learning opportunities in research methods and statistics (Rochnia et al., 2023). Initial teacher education primarily aims to equip future teachers with a scientifically founded knowledge base essential for achieving high instructional quality and advancing student learning (Darling-Hammond, 2017; Rochnia et al., 2023). Although many countries, including Germany where the present study was conducted, emphasise fostering a scientific mindset, often through practitioner-research projects (Bock et al., 2024; Böttcher-Oschmann et al., 2021; Westbroek et al., 2022), comprehensive training in research methods akin to disciplines like psychology or sociology remains notably absent (Pieschl et al., 2021; Thomm et al., 2021b). Teacher education also seems to offer less methodological training compared with other profession-oriented study programmes, such as medicine (Rochnia et al., 2023). Empirical studies among preservice teachers and in-service teachers have frequently found that they tend to exhibit limited methodological knowledge and skills in scientific reasoning and argumentation (Groß Ophoff et al., 2017; Schmidt et al., 2023; Williams and Coles, 2007). These competencies are foundational for understanding and critically engaging with research (e.g., Joram et al., 2020; Niemi, 2008), including coordinating claims and evidence within (causal) argumentation.

1.2.2 Abilities for engagement with evidence and claim–evidence coordination

The expanding literature investigating teachers’ evidence-informed reasoning and engagement with educational research (e.g., Kollar et al., 2023; Thomm et al., 2021c) documents barriers linked to limited abilities (e.g., Thomm et al., 2021b; Williams and Coles, 2007) and dysfunctional motivational orientations, attitudes, and beliefs (e.g., Bråten and Ferguson, 2015; Merk et al., 2017; Voss, 2022). Research reception requires sufficient skills in finding, reading, evaluating, and applying relevant research knowledge and evidence (Thomm et al., 2021c). Previous research suggests that preservice and in-service teachers often show only insufficient skills to draw on and reason along with research findings or report low confidence in their abilities to do so (e.g., Duke and Ward, 2009; Ferguson et al., 2023; van Schaik et al., 2018; Wenglein, 2018). At the same time, teachers frequently tend to devaluate the relevance and applicability of educational research to inform their professional actions and decisions (e.g., Farley-Ripple et al., 2018; Thomm et al., 2021b; Voss, 2022). Hence, compared with students in other profession-oriented disciplines like medicine, preservice teachers seem to develop a research-oriented mindset to a lesser extent (Rochnia et al., 2023). Notably, preservice teachers exhibit a strong and persistent preference for anecdotal evidence sources, such as personal experiences or reports from colleagues (e.g., Ferguson et al., 2023; Kiemer and Kollar, 2021). As a result, active teachers rarely draw upon research-based knowledge to inform professional action and decision-making (van Schaik et al., 2018).

Concerning argumentation, preservice teachers often encounter difficulties in constructing evidence-based arguments unless they receive specific training (Iordanou and Constantinou, 2014; Uçar and Cevik, 2020). For instance, Wenglein et al. (2015) and Wenglein (2018) found that preservice teachers struggled to integrate scientific evidence they had been presented with to construct evidence-based arguments on educational topics, unless they received dedicated training. Instead, a substantial proportion of participants constructed weak arguments, relying only on anecdotal evidence or no evidence at all. Regarding evidence evaluation and coordination, some studies suggest that preservice teachers possess basic abilities to also differentiate different types of evidence (Reuter and Leuchter, 2023); however, this ability may primarily pertain to distinguishing between anecdotal and scientific evidence rather than discerning evidence derived from different scientific study types, such as observational versus experimental designs (see List et al., 2022). Although such research sheds light on the relevant abilities for causal argumentation, further investigation into how (preservice) teachers discern different evidence types and evaluate their respective strengths to support causal arguments is warranted.

1.2.3 Additionally relevant factors

Beyond the abilities described above, reasoning and arguing about educational issues can depend on perceptions of the specific topic at hand (Asberger et al., 2021). Specifically, teachers’ prior knowledge and interest in a topic can substantially shape the way they interpret the available evidence and use it for argumentation (Schmidt et al., 2022; Yang et al., 2015). People frequently assess the plausibility of a claim or its consistency with evidence based on their preexisting beliefs about the topic (Abendroth and Richter, 2023; Futterleib et al., 2022; Michal et al., 2021; Thomm et al., 2021a; Wolfe et al., 2009).

Furthermore, research suggests that engaging in epistemic activities, such as evaluating evidence, can depend on personal epistemic orientations that define an individual’s subjective understanding of what makes a valid argument and constitutes evidence (Fischer et al., 2014; Fives et al., 2017; Garrett and Weeks, 2017). For instance, individuals vary in their perceived need for supporting claims with valid evidence and in whether they consider intuition adequate support for establishing a claim’s truth, as opposed to seeking factual evidence (Chinn et al., 2014; Garrett and Weeks, 2017).

Given the potential role of these factors in preservice teachers’ abilities to differentiate between various types of evidence and coordinate their judgement with causal claims, we have included them as additional (exploratory) covariates in our study.

2 The present study

To address the abovementioned research gaps, the present study investigated preservice teachers’ ability to discern different types of evidence and its strength in supporting causal arguments about educational topics. The particular interest in substantiating causal arguments aligns with the causal nature of many—if not most—issues relevant to teachers’ work and schooling, such as the effectiveness of teaching methods (e.g., Kvernbekk, 2016; Shavelson and Towne, 2002). Given that anecdotal or correlational evidence provides insufficient support for causal arguments but is frequently preferred by preservice teachers (e.g., Ferguson et al., 2023), gaining a deeper understanding of their ability to judge evidential support seems crucial (see List et al., 2022). For this purpose, we conducted a repeated measures experiment to scrutinise whether and how preservice teachers can discern the evidential support provided by various types of evidence (i.e., anecdotal, correlational, and experimental).

To better gauge preservice teachers’ ability of judging evidential support, we compared them to psychology students as a benchmark. We chose the latter as a reference group, first, because both teacher education and psychology are pertinent to educational topics, such as issues of teaching and learning (cf. Asberger et al., 2020, 2021). This common ground aided a meaningful comparison. However, second, study programmes in teacher education and psychology are strikingly different regarding methodological learning opportunities. Psychology programmes contain comprehensive training in research methods and statistics, as required by established curricular standards [e.g., American Psychological Association (APA), 2023; Deutsche Gesellschaft für Psychologie (DGPs), 2014]. In contrast, as detailed above (see Section 1.2), systematic methodological training is rarely part of teacher education programmes. Moreover, psychology traditionally places a strong emphasis on the use of experimental methods for causal inference (e.g., Shadish et al., 2002). This specific methodological training can be expected to facilitate psychology students in recognising different types of evidence, such as correlational or experimental (Morling, 2014; Mueller and Coon, 2013; Seifert et al., 2022), and may foster skepticism towards unsupported beliefs and anecdotal evidence (Green and Hood, 2013; Leshowitz et al., 2002).

Before conducting the experiment, we preregistered the following hypotheses based on the theoretical reasoning and prior research outlined above.²

First, we expected that preservice teachers would agree more with a causal claim (H1a; claim agreement) and perceive it as more convincing (H1b; convincingness) when supported by anecdotal evidence rather than by correlational or experimental evidence, respectively. Additionally, we hypothesised that preservice teachers would attribute greater support strength to anecdotal evidence compared with correlational or experimental evidence, respectively (H1c; strength of evidential support). Second, we expected that psychology students would agree more with a causal claim (H2a) and find it more convincing (H2b) when supported by experimental evidence than by correlational or anecdotal evidence, respectively. Furthermore, we assumed that psychology students would attribute higher strength of evidential support to experimental evidence compared with correlational or anecdotal evidence, respectively (H2c).

In an additional exploratory analysis, we controlled for potential effects of the factors previously discussed: (a) familiarity with research methods and statistics, (b) topic-related prior knowledge and interest and (c) faith in intuition and the need for evidence, which represent crucial aspects of epistemic orientations. Although not central to the internal validity of testing our primary hypotheses, we considered investigating these covariates as promising for gaining a more nuanced understanding of the personal factors contributing to claim–evidence coordination.³

3 Methods

3.1 Data and materials availability

Open data, as well as all materials, are available on the Open Science Framework (OSF) under https://osf.io/xw64c.

3.2 Design and participants

The current study followed a 2 × 3 mixed design with evidence type (anecdotal vs. correlational vs. experimental evidence) as a within-participant factor and study field (preservice teachers vs. psychology students) as a between-participant factor. The participants evaluated three causal claims about different educational topics: the testing effect (Rowland, 2014), the effectiveness of advance organisers in teaching (Stone, 1983) and the benefits of self-regulated learning (Dent and Koenka, 2016). Each claim was supported by one of the three evidence types, with the assignment of evidence type being balanced across subjects and presented in randomised order. For details, see Section 3.4.

Based on an a priori power analysis (Faul et al., 2007), we aimed at a sample size of N = 142 participants to have sufficient statistical power (95%) for detecting medium-sized effects (f² = 0.25) in a mixed ANOVA with a significance level of α = 0.05. Undergraduate students were recruited online from mail distribution lists and university lecturers across multiple German universities. Participation was voluntary. The participants could either enter a lottery of vouchers or receive course credit as an incentive. We followed APA ethical standards, and the study was approved by the ethics committee of the University of Erfurt.

A total of N = 230 participants responded to the questionnaire and provided informed consent. In line with our preregistration, data of those participants who indicated they had not responded sincerely (n = 3), withdrew their consent at the end of the study (n = 5), were not enrolled in initial teacher training or psychology (n = 10) or missed responses on complete evidence conditions were excluded (n = 77). The final sample consisted of N = 135 university students (83.7% female, M = 22.48 years, SD = 3.88). Of this total, n = 61 were preservice teachers (78.7% female, M = 23.70 years, SD = 3.63), with 63.9% enrolled in a bachelor’s degree programme (M = 5.05 semesters, SD = 1.11) and 32.8% in a master’s degree programme (M = 2.94 semesters, SD = 1.11). Participants aimed to become teachers for either elementary school (63.9%), general and vocational secondary schools (19.6%), and special needs education (16.4%). The larger percentage of primary education is due to the specialization of the addressed universities in teacher education. Further, n = 74 participants studied psychology (87.8% female, M = 21.47 years, SD = 3.80). Most of them were enrolled in a bachelor’s degree programme (93.2%, M = 4.00 semesters, SD = 2.00), while some completed a master’s degree (6.8%, M = 3.00 semesters, SD = 1.58).

To ensure data quality, in a preliminary analysis, we checked for extreme outliers. As a criterion, we used z-values (z < 3.29; Field, 2012) for univariate outliers and Mahalanobis distance (MAH > 50; Tabachnick and Fidell, 2013) for multivariate outliers. This resulted in the identification of four outlying cases. Following our preregistration, we present the complete sample analysis. An additional analysis excluding outliers did not lead to different substantive conclusions compared with the complete sample. The results, excluding outliers, are available in Supplementary material S1.

3.3 Procedure

The study was conducted online. After introductions and providing informed consent, the participants were instructed that they were to read texts on the three educational topics mentioned above and answer related questions. For each of the three texts, the assessments followed the same sequence. First, the participants were told the topic and rated their prior knowledge and interest in it. Subsequently, they read the argument and assessed (a) their personal agreement with the claim (claim agreement), (b) how convincing they perceived the argument to be (argument convincingness), and (c) the strength of the evidential support provided in the text (strength of evidential support). Subsequently, they were asked for a brief written justification for their assessment of the evidential support. Having evaluated the arguments, we measured the participants’ familiarity with research methods/statistics, captured their need for evidence and faith in intuition and asked for demographic information (i.e., study field, study degree, number of semesters studied, gender, and age).

After completing the study, the participants could withdraw their participation, responded to a seriousness check and received a debriefing entailing information about the study’s goal, the experimental manipulation and additional scientific information about the presented topics.

3.4 Experimental manipulation

The participants read three short texts, each presenting an argument about the respective educational topic (see Table 1 for an example). We chose the following topics for their sound evidence base in educational-psychological research: the testing effect (e.g., Rowland, 2014), the effectiveness of advance organisers (e.g., Stone, 1983) and the benefits of self-regulated learning (e.g., Dent and Koenka, 2016).

Table 1

Table 1. Example of argument text (“testing effect”).

All texts followed a parallel structure. Each began with a sentence introducing the topic. Next, the text presented the causal claim of interest. The phrasing of the claim already pointed to the evidential support that followed. Evidential support was systematically varied: it briefly described either anecdotal evidence of an experienced in-service teacher, the design and result of a correlational study (scientific correlational evidence) or the design and result of an experimental study (scientific experimental evidence). The combination of evidential support and topic was balanced, and the order of the presented evidence was randomised to prevent sequence effects. Each text led to the same causal conclusion about the presented learning or teaching method.

The texts with nine possible combinations of topic and evidence were of a similar length (M = 147.33 words, SD = 2.87, range = 143 words to 154 words) and difficulty (Flesh Reading Index: M = 26.11, SD = 4.56). Before the experiment, we conducted cognitive interviews with 10 preservice teachers to examine the comprehensibility of the text materials and modified them by following the participants’ comments.

3.5 Measures

3.5.1 Dependent variables

All dependent variables (i.e., claim agreement, argument convincingness, and strength of evidential support) were measured using single items on a 7-point rating scale with higher numbers indicating higher prevalence. Item texts were as follows: “How much do you agree with the statement that [topic claim (e.g., learning processes are more effective when the content is actively recalled from memory after an initial learning phase through tests that accompany learning)]?”, (1 = do not agree at all; 7 = fully agree) for claim agreement. “How convincing do you find the argument given in the text that [topic claim]?”, (1 = not convincing at all; 7 = fully convincing) for argument convincingness. “In the text segment, the statement [Topic claim] is supported by [Evidence type]. How good does this evidence support the statement in your opinion?”, (1 = does not provide support at all; 7 = provides very good support) for the strength of evidential support.

3.5.2 Covariates

For exploratory purposes (see Section 2), we collected data on the covariates listed below. Unless stated otherwise, all used a 7-point rating scale similar to the dependent variables. For item texts, please see Supplementary material S2.

Prior knowledge (i.e., “I have comprehensive prior knowledge of the topic”) and topic interest (i.e., “I am very interested in this topic”) were assessed by single items. Familiarity with research methods/statistics was measured by an established scale asking the participants to rate their understanding of methodological concepts (e.g., “quasi-experiment”, “correlation”) on a 4-point scale ranging from 1 (= do not know the concept) to 4 (= understand the concept and could explain it to someone else) (7 items, α = 0.81; Mang et al., 2018; Thomm et al., 2021b). Finally, we collected data on two aspects of epistemic orientations, adopting scales from Garrett and Weeks (2017): the need for evidence (e.g., “Evidence is more important than whether something feels true”, 4 items, α =0.72) and faith in intuition (e.g., “I trust my gut to tell me what’s true and what’s not”, 4 items, α = 0.77).

3.6 Analyses

To examine H1 to H3, we used mixed ANOVAs and follow-up t-tests applying Bonferroni correction. The significance level was set to α < 0.05. We judged the effect sizes according to Cohen’s criteria (Cohen, 1988).

Prior to the analyses, we checked for potential topic differences. Preliminary ANOVA with follow-up t-tests did not reveal significant differences in the outcome measures across the three topics: claim agreement, F(2, 135) = 2.848, p = 0.060; argument convincingness, F(2, 135) = 2.683, p = 0.070; and strength of evidential support, F(2, 135) = 0.204, p = 0.815. Therefore, we considered them comparable and averaged across topics. Furthermore, we examined whether the data met the assumptions for conducting ANOVAs. Inspection of P–P plots, skewness and kurtosis pointed to a violation of the normal distribution assumption in all dependent variables. Following the recommendations of Tabachnick and Fidell (2013), we used square root transformations for moderate negative skewness to correct for these violations and reran all analyses. The results from these analyses led to identical conclusions, as with the untransformed data. Conforming with our preregistration, we will report the results from the analyses with untransformed data. The results for transformed data are available in Supplementary material S3.

In the exploratory analyses, we examined the potential effects of controlling for the mentioned covariates. To this end, we employed multilevel models (MLM) with repeated measures of the dependent variables at level one and individual-level covariates at level two. MLM were conducted following the procedures laid out by Field et al. (2012), using the nlme package (3.1–164) in R (4.2.3). We expanded the models stepwise⁴: Model 1 (M1) served as a baseline for further analyses and was set up analogously to the mixed ANOVA tests of H1 to H3. Hence, M1 can also be considered a robustness check for the ANOVA results (Field et al., 2012). Model 2 (M2) added the individual level covariates related to knowledge and topical content relation (i.e., prior knowledge, topic interest, and participants’ familiarity with research methods/statistics). Model 3 (M3) added individual faith in intuition and need for evidence as further individual level predictors.

4 Results

4.1 Examining the effects of evidence type and field of study

Regarding the hypotheses on claim agreement (H1a/H2a), the results from the mixed ANOVA indicated no statistically significant effects [evidence type, F(2,135) = 0.30, p = 0.739, part. η² = 0.00; study field, F(1,135) = 0.01, p = 0.938, part. η² = 0.00; interaction evidence type x study field, F(2,135) = 1.86, p = 0.157, part. η² = 0.01]. That is, in contrast to H1a and H2a, both preservice teachers and psychology students agreed equally high with the presented claim, regardless of the type of evidential support (see Figure 1).

Figure 1

Figure 1. Preservice teachers’ and psychology students’ claim agreement by evidence type.

Concerning argument convincingness (H1b/H2b), the mixed ANOVA revealed a significant main effect of evidence type, F(1.89,135) = 4.32, p = 0.016, part. η² = 0.03. In contrast, the main effect of study type, F(1,135) = 0.06, p = 0.815, part. η² = 0.00, and the interaction evidence type x study field, F(1.89, 135) = 1.54, p = 0.218, part. η² = 0.01, failed to reach significance (see Figure 2). Follow-up t-tests on the main effect of evidence type showed that, overall, the participants judged arguments supported by experimental evidence more convincing than those drawing on anecdotal evidence, t(134) = 2.56, p = 0.011, d = 0.22. However, experimental evidence was not perceived as more convincing than correlational evidence, t(134) = 0.61, p = 0.541, d = 0.05. Finally, the participants considered correlational evidence as more convincing than anecdotal, t(134) = 2.54, p = 0.012, d = 0.22. Regarding the interaction, there was a descriptive group difference in the expected direction, indicating that psychology students judged anecdotal evidence as less convincing than experimental evidence, but it was not statistically significant. In summary, these results provide support for H2b, even though H1b cannot be maintained.

Figure 2

Figure 2. Preservice teachers’ and psychology students’ assessment of argument convincingness by evidence type.

Regarding support strength (H3a/H3b), the mixed ANOVA showed a significant main effect of evidence type, F(1.92, 135) = 18.48, p < 0.001, part. η² = 0.12, but again, no further significant effects for study field, F(1,135) = 0.10, p = 0.753, part. η² = 0.00, or for their interaction, F(1.92,135) = 1.22, p = 0.295, part. η² = 0.01, were found (see Figure 3). Consistent with the findings on convincingness, follow-up t-tests for the main effect of evidence type indicated that participants assessed experimental evidence to provide stronger evidential support than anecdotal evidence, t(134) = 6.00, p < 0.001, d = 0.52, or correlational evidence, t(133) = 3.36, p = 0.001, d = 0.29. Moreover, they assessed correlational evidence to provide stronger support than anecdotal evidence, t(133) = 3.12, p = 0.002, d = 0.27. These results are not in line with H1c but corroborate H2c.

Figure 3

Figure 3. Preservice teachers’ and psychology students’ assessment of strength of evidential support by evidence type.

4.2 Exploring the individual differences in participants’ assessments

Regarding the model fit of the MLM, the results showed that, for claim agreement, both M2 [χ²(3) = 9.09, p = 0.028] and M3 [χ²(5) = 14.74, p = 0.012] significantly improved the fit over the baseline M1, while differences between M2 and M3 were not statistically significant [χ²(2) = 5.64, p = 0.060]. Concerning argument convincingness, M1 and M2 had similar fit [χ²(3) = 7.03, p = 0.071]. However, M3 yielded significant improvements over both M1 [χ²(5) = 17.79, p = 0.003] and M2 [χ²(2) = 10.77, p = 0.005]. Finally, regarding the strength of evidential support, there was no significant difference between M1 and M2 [χ²(3) = 2.75, p = 0.431]. However, M3 provided a significantly better fit than both M1 [χ²(5) = 22.36, p < 0.001] and M2 [χ²(2) = 19.60, p < 0.001].

Inspecting the results of the MLM (Table 2), including the covariates in Models 2 and 3, did not lead to substantial changes regarding the pattern of the experimental treatment effects for any of the dependent variables. Notably, the estimated parameters for the type of evidence remained numerically almost identical when comparing M1 with M2, and M1 with M3, respectively. Moreover, the MLM results were consistent with the ones from the ANOVAs reported above regarding the significant effects of correlational evidence and experimental evidence on argument convincingness and the strength of the evidential support.

Table 2

Table 2. Multilevel models of claim agreement, argument convincingness, and strength of evidential support.

Furthermore, we observed several statistically significant effects of the covariates that, however, occurred mostly differently for the respective dependent variables. Of the knowledge and topic-related variables, prior knowledge had a small negative effect on argument convincingness, and topic interest had a small positive effect on claim agreement. Both effects were stable across M2 and M3. Familiarity with research methods and statistics had a small positive effect on claim agreement; however, this was only the case for Model 2. Of the epistemic orientations added in M3, faith in intuition had small to medium positive effects on all dependent variables, whereas the need for evidence was statistically non-significant throughout.

5 Discussion

Experimental evidence, where available, is the most appropriate support for substantiating causal claims (Holland, 1986). This is also true for causal questions frequently encountered in educational topics (Kvernbekk, 2016; Shavelson and Towne, 2002). Unfortunately, general cognitive biases such as mistaking coincidence and correlation for causation (Bleske-Rechek et al., 2015; Seifert et al., 2022) alongside strong preferences for anecdotal evidence (e.g., Bråten and Ferguson, 2015; Merk et al., 2017; van Schaik et al., 2018) can lead (preservice) teachers to adopt unreliable knowledge and one-sided advice, which may subsequently influence their professional judgements, decisions and actions. Therefore, promoting evidence-informed practice in the teaching profession necessitates that future teachers be able to discriminate the quality of evidence for appropriately supporting claims (Reuter and Leuchter, 2023; Wenglein et al., 2015). The present study expands prior research by investigating this basic ability in the context of evaluating causal arguments. Specifically, we examined preservice teachers’ evaluations of anecdotal, correlational and experimental evidence in supporting causal claims about educational topics and compared their evaluations with those of psychology students as a benchmark group.

Our study revealed a notable and somewhat surprising pattern of results. Contrary to our expectations, we found no significant differences in the participants’ evaluations based on their study field. Despite indications from teacher education curricula and prior evidence suggesting that preservice teachers are less trained in the evaluation of evidence, they demonstrated comparable abilities to judge causal support as psychology students. Both groups could discriminate between the quality of the investigated evidence types in terms of argument convincingness and strength of support. Specifically, the participants perceived experimental evidence as more convincing than anecdotal evidence and as providing the strongest support for causal claims. However, these correct assessments did not translate into their personal claim agreement, which remained unaffected by the type of evidence they encountered. Notably, this pattern of results persisted even when controlling for relevant covariates, such as topic interest, prior knowledge, and epistemic orientations. These findings contrast with previous studies that have raised concerns about the skills of in-service and preservice teachers to engage adequately with research evidence and argumentation (Lytzerinou and Iordanou, 2020; Rochnia et al., 2023; van Schaik et al., 2018; Zimmermann and Mayweg-Paus, 2021).

Regarding claim agreement (H1a/H2a), we expected preservice teachers to indicate stronger agreement with arguments entailing anecdotal rather than correlational or experimental evidence and assumed the reverse response of psychology students. However, there was neither an effect of evidence type nor of study field on claim agreement. Although these findings go against our assumptions, they appear reasonable in hindsight. The research on inert knowledge has shown that people often fail to utilise their existing knowledge in relevant task situations (Cavagnetto and Kurtz, 2016; Renkl et al., 1996). Accordingly, it is possible that the participants acknowledged differences in evidential support but did not draw appropriate conclusions to determine their personal claim agreement. An alternative explanation could be that, even though it seems normatively reasonable to base one’s judgement on the evidence type, people may form their judgement based not only on this single factor. Previous research has observed similar discrepancies between what people may personally think about a scientific claim and what they consider scientifically more adequate or credible information (e.g., Scharrer et al., 2017; Thomm and Bromme, 2012, 2016). For example, Thomm and Bromme (2012) found that presenting information about scientific topics in a scientific text style (e.g., including citations and method information) compared with a factual one enhanced perceived credibility but had no effect on claim agreement. Hence, additional factors, such as prior beliefs and plausibility assessments, may affect claim agreement as well (Barzilai et al., 2020; Futterleib et al., 2022; Richter and Maier, 2017). Indeed, the results from our exploratory analysis indicated that the participants’ topic interest and faith in intuition played a role in their claim agreement. This may suggest that participants weighed the quality of the encountered evidence with their personal plausibility judgements. These interpretations should be treated with due caution, however, because the mentioned covariate effects were exploratory and of a small size. This notwithstanding, future research should investigate the interplay of such factors more closely.

For the convincingness of the arguments (H1b/H2b), we expected to find higher assessments of anecdotal evidence by preservice teachers and, in contrast, higher assessments of experimental evidence by psychology students when compared with the other types of evidence. Although the results regarding psychology students were widely in line with these assumptions, both groups found arguments entailing anecdotal evidence to be less convincing than those entailing either correlational or experimental evidence. However, the participants did not differentiate between these two types of scientific evidence. Thus, the contrast between arguments entailing anecdotal and scientific evidence might have been more salient than the additional, more nuanced difference between scientific correlational and experimental evidence. This pattern of results ties in with List’s (2024) finding that university students failed to discern the quality of correlational and causal evidence about a scientific claim, whereas they judged both to be of a higher quality than mere anecdotal evidence (see also List et al., 2022). This categorical difference might be easier to recognise, even for individuals with limited methodological knowledge.

For the strength of evidential support (H1c/H2c), we anticipated a similar pattern of assessments as observed for the convincingness of the argument. However, both preservice teachers and psychology students judged experimental evidence to be the strongest support for a causal claim. Together with the findings on convincingness, this suggests that the participants were able to distinguish not only between anecdotal and scientific evidence but also to make the more subtle distinction between the two types of scientific evidence. This may seem surprising given the frequently raised doubts about such abilities, even among university students (e.g., List, 2024). However, this apparent discrepancy may not be as pronounced as it seems at first glance. In the present study, the participants read simple, well-structured arguments designed to capture a fundamental understanding and coordination of evidence types for supporting causal claims. This setting might have facilitated discerning different types of evidence, and participants might have struggled with more complex or controversial arguments (see List, 2024; Menz et al., 2020; Münchow et al., 2023). Moreover, Reuter and Leuchter (2023) caution that, even if preservice teachers can recognise differences between more and less robust research evidence, it remains unclear whether they have a consistent understanding of the features that constitute evidence strength. That is, the existing studies offer no insights into which characteristics of the presented evidence participants were focussing on to form their judgements. Ideally, these judgements would be based on their knowledge of methodological principles and an understanding of why scientific evidence is preferable to anecdotal evidence, as well as why experimental evidence outweighs correlational evidence in supporting a causal claim. However, participants might also have referred to more superficial characteristics, such as the appearance of “scientificness” (Thomm and Bromme, 2012) in some texts, deeming them more trustworthy. Future research might delve deeper into whether participants indeed possess an adequate understanding of the principles that render different types of evidence more or less conclusive for causal hypotheses. So far, our findings demonstrate that preservice teachers possess at least a basic understanding of different kinds of evidence and their differential strength in supporting causal arguments.

Thus, taken together, our findings speak against simplistic and mainly deficit-oriented assumptions about preservice teachers’ competences in engaging with research. Although the skills investigated in the present study are certainly basic, they also address the very core of understanding research and using claim–evidence coordination in argumentation. This is remarkable given the frequent lack of methodological training in teacher education. However, the less clear distinction between the two investigated types of scientific evidence might indicate that both preservice teachers and psychology students still face challenges in understanding the crucial distinction in correlational and experimental research. As a result, there is a need to support the development of these skills to prevent correlation–causation fallacies when reasoning about causal questions in education. More generally, it seems valuable to support preservice teachers in better understanding the nature of diverse types of evidence and their qualities regarding claims about educational topics.

Limitations. Several limitations of the present study deserve attention. First, as mentioned, we presented participants with clearly structured text materials and used topics for which there is a sound research base. This allowed us to capture basic skills of claim–evidence coordination in a controlled way, without the danger of introducing extraneous variance that might have resulted from differential understanding and engagement with longer and more difficult texts. More authentic text materials that more closely resemble real research would certainly need to be more complex. For example, future studies could present participants with multiple documents that provide different types of evidence or contain inconsistent results (see Thomm et al., 2021c). Additionally, features of the addressed educational topic—for instance, to what degree it is politically controversial and emotionally laden—might play a role in what type of evidence people perceive as compelling (Aguilar et al., 2019; Darner, 2019; Sinatra et al., 2014). Hence, follow-up studies could systematically vary such topic-related characteristics.

Second, the within-participant design of the study offers several advantages, but it also introduces certain limitations. One notable advantage lies in the sequential presentation of texts featuring all three types of evidence, which enhances their contrast (cf. Birnbaum, 1999) and, thus, augments the systematic experimental variance. However, given this design, it is crucial to interpret participants’ differential judgements of evidential support as relative comparisons across texts rather than absolute evaluations. Therefore, our findings may be more applicable to situations where preservice teachers evaluate different evidence sources against each other as opposed to assessing a single piece of evidence. Nonetheless, encountering multiple information sources on a given topic is a realistic scenario, as noted earlier (Bromme and Goldman, 2014; Sinatra and Lombardi, 2020; Thomm et al., 2021c).

Third, although within-participant designs are advantageous for controlling individual-level factors, potential order effects pose a threat to internal validity. Even though the fully balanced sequence of conditions effectively eliminates position effects, controlling differential carryover effects in within-participant designs is challenging. These effects occur when the (potential) carryover effect of treatment condition A (e.g., anecdotal evidence) on condition B (e.g., experimental evidence) differs from the carryover effect of condition B on condition A (Maxwell et al., 2018). Although, in our view, there is no substantive reason to consider differential carryover as particularly likely for our three types of evidence, its potential occurrence is a concern, especially in a within-design with a small washout period between treatments, as in the present study. Evaluating the stability and strength of our results would require reconceptualising the study by employing a between-participants design.

Finally, it is an open issue to what degree our findings generalize to in-service teachers. Teachers with longer work tenure might differ in evidence evaluation based on their professional experience and larger distance to academia as compared to preservice teachers (Hillmayr et al., 2024; van Schaik et al., 2018). Moreover, in future research on claim-evidence coordination it could be interesting to include comparison groups from other fields that are unrelated to educational issues (e.g., STEM disciplines, economy; cf. Asberger et al., 2020, 2021). Doing so would aid our understanding what role of prior knowledge of the topic domain plays in causal reasoning about educational issues.

Notwithstanding these limitations, we believe that our findings provide important insights into the ability of preservice teachers to evaluate different types of evidence that they may encounter when faced with causal questions in education, such as the effectiveness of instructional methods. Building on the foundational skills that we observed would require additional learning opportunities in teacher education that promote their understanding of different qualities of evidence, also in relation to the claim it shall support. This is important because, like many other people, they are susceptible to the fallacy of misinterpreting coincidence and correlation as sufficient support for causal claims.

Author’s note

This study is preregistered with OSF preregistrations: doi: 10.17605/OSF.IO/R2TVW.

Data availability statement

The datasets presented in this study can be found online at https://osf.io/xw64c.

Ethics statement

The studies involving humans were approved by Ethics Committee of the University of Erfurt. The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Author contributions

AL: Writing – original draft, Writing – review & editing. ET: Writing – review & editing. JB: Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was funded by a scholarship awarded to Andreas Lederer by the Hanns-Seidel-Foundation e.V., Munich.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2024.1379222/full#supplementary-material

Footnotes

1. ^Because claim-evidence coordination primarily involves individual cognitive processes, we refrain from elaborating on contextual factors that might influence teachers’ engagement with research, such as constraints in availability or time or dysfunctional social pressures (see, e.g., Gold et al., 2023; Greisel et al., 2023; Thomm et al., 2021b; van Schaik et al., 2018).

2. ^https://doi.org/10.17605/OSF.IO/R2TVW

3. ^For additional exploratory purposes unrelated to the primary research questions of the present study, we gathered supplementary data concerning the ascribed trustworthiness and expertise of the evidence sources. These variables were measured at the end of each respective measurement point and, therefore, could not have influenced the primary outcomes collected before. Because these variables are beyond the scope of this paper, we do not include them herein. Aside from this omission, we confirm the reporting of all experimental conditions and dependent variables.

4. ^Compared with the interaction or random-effects models, the fixed effects model consistently demonstrated superior fit across all dependent measures. M1 formula: Y_ij = β0 _(Intercept) + β1_(Evidence)ij + β2_{(Study Field)ij +} u0_{(Random Intercept)j} + e_ij. M2 formula: Y_ji = β0 _(Intercept) + β1_(Evidence)ij + β2_{(Study Field)ij} + β3_{(Topic Prior Knowledge)ij} + β4 _{(Topic Interest)ij} + β5 _{(Knowledge Methods/Statistics)ij +} u0_{(Random Intercept)j} + e_ij. M3 formula: Y_ji = β0 _(Intercept) + β1_(Evidence)ij + β2_{(Study Field)ij} + β3_{(Topic Prior Knowledge)ij} + β4 _{(Topic Interest)ij} + β5 _{(Knowledge Methods/Statistics)ij} + β6_{(Faith Intuition)ij} + β7 _{(Need Evidence)ij} + u0_{(Random Intercept)j} + e_ij.

References

Abendroth, J., and Richter, T. (2023). Reading perspectives moderate text-belief consistency effects in eye movements and comprehension. Discourse Process. 60, 119–140. doi: 10.1080/0163853X.2023.2172300

Preservice teachers’ evaluation of evidential support in causal arguments about educational topics

1 Introduction

1.1 Causal reasoning about educational topics

1.2 (Preservice) teachers’ abilities in claim–evidence coordination

1.2.1 Lack of methodological learning opportunities in teacher education

1.2.2 Abilities for engagement with evidence and claim–evidence coordination

1.2.3 Additionally relevant factors

2 The present study

3 Methods

3.1 Data and materials availability

3.2 Design and participants

3.3 Procedure

3.4 Experimental manipulation

3.5 Measures

3.5.1 Dependent variables

3.5.2 Covariates

3.6 Analyses

4 Results

4.1 Examining the effects of evidence type and field of study

4.2 Exploring the individual differences in participants’ assessments

5 Discussion

Author’s note

Data availability statement

Ethics statement

Author contributions

Funding

Conflict of interest

Publisher’s note

Supplementary material

Footnotes

References

95% of researchers rate our articles as excellent or good