Constructing multi-theory vignettes to measure the application of knowledge in ambivalent educational situations

Lohse-Bossenz, Hendrik; Bloss, Christopher; Dörfler, Tobias

doi:10.3389/feduc.2022.996029

ORIGINAL RESEARCH article

Front. Educ., 24 November 2022

Sec. Teacher Education

Volume 7 - 2022 | https://doi.org/10.3389/feduc.2022.996029

This article is part of the Research Topic Evidence-Informed Reasoning of Pre- and In-Service Teachers

Institute for Psychology, University of Education Heidelberg, Heidelberg, Germany

Research on evidence-based argumentation shows that (pre-service) teachers have difficulties in orienting their actions to existing theories and empirical evidence. This article addresses the knowledge content needed for this and presents a vignette-based procedure. Within each vignette, two different theoretical perspectives are addressed. The behavior of a teacher can be either suitable or unsuitable from both perspectives or more or less suitable depending on the perspective. In study 1, the procedure is piloted and in study 2, an intervention on a specific area of knowledge takes place. The results show that participants differentiate the vignettes as expected. The intervention leads to corresponding increases in knowledge, which likely relates to a change in the evaluations. The presented approach is discussed with regard to possible applications in the context of research on evidence-based argumentation.

Introduction

The decisions involved in planning, delivering, and evaluating school lessons are characterized by high levels of uncertainty (Floden and Buchmann, 1993). In the face of this uncertainty, teachers may rely on a variety of sources to make their pedagogical decisions: scientific theories, scientific evidence, subjective theories, beliefs, anecdotes, recipes, or even gut feelings (Stark, 2017; Kiemer and Kollar, 2021). Given that information linked to these specific sources is acquired via specific knowledge-building processes, its epistemic status varies, for example, with respect to trustworthiness and credibility (Fenstermacher, 1994). Although the idea that scientific evidence might be valuable in solving practical problems is controversial (Brown and Rogers, 2015), the use of educational evidence to explicate the reasons for pedagogical judgments seems to be beneficial, at least in cases of classroom problems that appear to occur repeatedly.

Several sets of findings indicate that (pre-service) teachers encounter challenges on two levels of scientific reasoning (Csanadi et al., 2021). On the process level, they may struggle to engage in the inquiry process (Klahr and Dunbar, 1988) or to follow the trajectory of epistemic processes suggested by Fischer et al. (2014). This means that they might not collect enough evidence before engaging in evaluation of that evidence; as a result, the process is unsystematic and speculative. On the content level, they may not be able to relate scientific knowledge from relevant domains to actual classroom incidents, because they lack the requisite knowledge that would enable them to make this transfer, or to do so in a suitable manner (Brown and Rogers, 2015; Hetmanek et al., 2015; Hartmann et al., 2016).

To date, research has seldom addressed situations in which different lines of action (in the sense of conflicting evidence) are available. Consider, for example, a typical classroom situation in which two students become angry with one another and are arguing at a time when all the students have been asked to work quietly on their worksheets. From a classroom management perspective, the teacher should intervene immediately to enforce the classroom rule that time on task should be maximized (see Lenske et al., 2016 for evidence on the influence of classroom management on students’ learning gains). From the perspective of the development of a healthy classroom climate and peer relationships, the teacher may instead let the class stop working and use this situation to explicitly address productive ways to solve peer conflicts (Jennings and Greenberg, 2009). Here, two educational goals with unique theoretical and empirical backgrounds come into play and lead to divergence in the actions that the teacher could potentially take. Such situations are highly prevalent in educational settings, and teachers face the challenge of weighing the benefits of possible actions against one another and coming to a decision tailored to the specific situation.

Presumably, if teachers make decisions on the basis of the knowledge that is accessible to them, they may not even perceive certain cues in the situation that would have led to another decision (a reflection of insufficient knowledge). In other cases, in which the teacher does have access to knowledge from different fields, the process by which they evaluate strands of evidence which may lead to different decisions (fragmented knowledge) is not well understood.

To address these issues, this article presents the construction and validation of a vignette-based instrument, involving items presenting scenarios in which decisions may vary depending on the theoretical perspectives on which the decision is based. After describing the theoretical background and the process of constructing the instrument, we present two validation studies indicating that convergent and divergent theoretical perspectives lead to systematic differences in decision-making (Study 1) and that knowledge input influences judgments in a manner that indicates deeper evaluation of the cues corresponding to the newly-acquired knowledge (Study 2).

Theoretical background and research aims

There seems to be an increasing demand for teachers and policymakers to orient their respective educational and political decisions more towards evidence rather than relying on other sources such as subjective theories or anecdotes (Davies, 1999; Bromme et al., 2014). Following Stark (2017) and other researchers, we consider evidence to broadly consist of both theories and obtained empirical results that are valued by an individual as being of high scientific quality. That means that evidence does not have an independent existence in an objective sense, outside the judgment of individuals who attribute to it the specific property of meeting scientific standards (Bromme et al., 2014).

Research in the domain of evidence-based education often makes reference to the medical profession, where the parallel term evidence-based medicine is employed (Sackett et al., 1996). Although the feasibility of transferring theoretical perspectives from the medical to the teaching profession is under debate (Stark, 2017), the basic idea that teachers use evidence in their argumentation for or against specific decisions seems plausible. In their description of evidence-based argumentation, Csanadi et al. (2021) differentiate between content and process levels. The content level relates to knowledge which is used for evidence-based argumentation. On this level, strands of knowledge with variable epistemic status (Fenstermacher, 1994) are brought to bear. In addition to scientific theories and empirical results, subjective theories or case knowledge can also be put to use as sources in the argumentation process (Kiemer and Kollar, 2021). The process level itself can be further subdivided into the selection and the use of specific sources (Kiemer and Kollar, 2021). In turn, the use of specific sources consists of further subprocesses, including problem identification, hypothesis generation, and drawing conclusions (Fischer et al., 2014).

Recent research on the process level has provided insight into the ways in which (pre-service) teachers use or do not use evidence. For instance, Hetmanek et al. (2015) have demonstrated that pre-service teachers – despite being provided with the necessary information – do not use scientific evidence in their case analysis. Concerning the content level, recent studies have directly compared types of source to explore their specific role in the argumentation process. For example, Kiemer and Kollar (2021) have demonstrated that scientific theories are used more often than anecdotes or subjective theories in case analysis.

Research gap

To date, there has been a paucity of research concerning the comparison of sources with comparable epistemic status, such as convergent or divergent scientific theories. In such situations, heuristics like ‘scientific theories are more trustworthy than subjective theories’ provide no value. Instead, the evidence has to be evaluated with respect to the specific situation at hand, and different strands may need to be weighted differently in order to arrive at a decision. To this end, relevant information in the scenario, typically referred to as cues, must be observed and ultimately taken into account in the argumentation process. Furthermore, research on evidence-based argumentation in the domain of education has mainly focused on generic issues in teaching, such as motivation or general instruction. Subject-specific theories are seldom addressed as sources of evidence.

To address these gaps, we developed an approach using multi-theory vignettes. The basic idea is to present situations that can be perceived differently from different perspectives. By defining two perspectives and their related core principles a priori in the process of constructing the vignettes, we can explicitly model participants’ decision-making processes and formulate hypotheses concerning their reactions to the situations depicted.

Construction of multi-theory vignettes

Vignettes as a test format are becoming increasingly popular in the field of teacher education (Brovelli et al., 2014). Under this approach, each vignette consists of a scenario that presents an authentic situation from a school lesson involving specific issues that require the activation of professional knowledge. Vignettes are therefore considered a suitable tool for assessing situational knowledge or the ability of participants to access their knowledge in specific situations. In particular, research in the field of professional vision regularly employs this approach (Santagata and Angelici, 2010; Meschede et al., 2017).

Under our multi-theory vignette (MTV) approach, we constructed a set of vignettes containing cues that would be relevant from two different perspectives: the first falling primarily under the scope of a specific model of teaching games in Physical Education (PE), and the second falling primarily under the scope of self-determination theory (SDT). The core principle from the former perspective is that of complexity reduction: most teaching approaches in the domain of PE agree that sporting games need to be reduced in complexity when they are integrated into a school’s curriculum (Kolb, 2005). Therefore, the teachers’ behaviors depicted in our vignettes can be considered suitable from a PE teaching perspective if they involve a cue indicating some kind of complexity reduction. The core principle from the latter perspective is the fulfilment of basic psychological needs. If students’ basic psychological needs are fulfilled, this appears to enhance their sense that their actions are self-determined and to increase their intrinsic motivation (Ryan and Deci, 2000). Research indicates that satisfaction of basic psychological needs is associated with self-determined motivation (Chen and Jang, 2010; Goldman et al., 2017; Hu and Zhang, 2017) and positive learning outcomes (Baeten et al., 2013; McEown et al., 2014; Salmi and Thuneberg, 2019). Therefore, teachers’ behaviors depicted in our vignettes that address students’ psychological needs can be considered suitable from the SDT perspective.

As complexity reduction and need satisfaction are conceptually unrelated and are principles that arise from different theoretical perspectives, we combined both perspectives with their core principles in our vignettes. A convergent vignette would depict a pedagogical situation in which the action of the fictitious teacher is either suitable (the core principles are fulfilled) or unsuitable (the core principles are not fulfilled) according to both perspectives. We adopted a labelling scheme in which convergent vignettes depicting suitable teacher behavior were labelled SS because they suggested a suitable teacher action as seen from both perspectives. In contrast, convergent vignettes depicting unsuitable actions were labelled UU, as they suggested an unsuitable teacher action from both perspectives. A divergent vignette would depict a teacher action that is suitable from one of the perspectives and unsuitable from the other. These vignettes were labelled U_gS_m if they depicted an action which could be considered suitable or need-supporting from the perspective of motivational psychology or SDT, but an unsuitable action from the perspective of teaching games; or S_gU_m if they depicted an action which could be regarded as suitable from the perspective of teaching games but unsuitable from the perspective of SDT or motivational psychology.
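
The labelling scheme described above can be sketched in a few lines of Python. This is only an illustration of the two-by-two design; the function names and boolean arguments are hypothetical and do not come from the study materials.

```python
def vignette_type(suitable_games: bool, suitable_motivation: bool) -> str:
    """Return the label for a vignette given its a priori coding.

    suitable_games:      action fulfils complexity reduction (teaching games).
    suitable_motivation: action fulfils need satisfaction (SDT).
    Both argument names are hypothetical, chosen for this sketch.
    """
    if suitable_games and suitable_motivation:
        return "SS"        # convergent, suitable from both perspectives
    if not suitable_games and not suitable_motivation:
        return "UU"        # convergent, unsuitable from both perspectives
    if suitable_games:
        return "S_gU_m"    # divergent: suitable for games, unsuitable for SDT
    return "U_gS_m"        # divergent: unsuitable for games, suitable for SDT


def is_convergent(label: str) -> bool:
    """Convergent vignettes are those on which both perspectives agree."""
    return label in {"SS", "UU"}


# A teacher action that reduces complexity but ignores psychological needs
label = vignette_type(suitable_games=True, suitable_motivation=False)
print(label, is_convergent(label))
```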

A total of 10 experts in the field of sports science with a focus on teaching games and 11 experts in motivational psychology were asked to evaluate our categorizations of 26 drafted vignettes as illustrating suitable or unsuitable actions from their expert perspective. To this end, they were informed beforehand of which teacher actions we considered to be suitable or unsuitable in terms of complexity reduction and need satisfaction. The sports science experts were not informed of the SDT interpretation of the vignettes, nor were they asked to rate the vignettes with this perspective in mind, and vice versa. The experts were also asked to name possible alternative actions for the teacher in each vignette. In general, the experts considered the vignettes to be authentic and suitable for our research purposes. However, some disagreement emerged concerning the suitability of the actions described, and experts from both fields suggested alternative actions for the teacher in a number of cases. It became clear that sports science experts with a focus on teaching games weight motivational considerations more heavily than psychology experts weight sports science considerations. After discussing all the results, excluding 10 vignettes, and slightly reformulating some vignettes, we arrived at a final set of 16 vignettes, four of each type (UU, SS, U_gS_m, and S_gU_m).

Example multi-theory vignette

There is not much activity happening on the field where 24 fifth-graders are playing dodgeball: only a few students are actively taking part by running, dodging the ball, trying to catch it and throwing it at their opponents. One of the less active players, who has already had to leave the active zone of the field, is now outside in the passive zone (from where it is possible to return to the active zone by successfully throwing the ball at an opponent). She is standing close to the teacher and says to him: “This game is sooo boring…,” looking expectantly at the teacher.

The teacher replies that it would not be so boring if she, the girl, took part in it more actively. When the first round of the game has finished and the second round is about to begin, the teacher reduces the size of the field.

This vignette was constructed and used in our test as a divergent vignette (S_gU_m). From the perspective of teaching games, the teacher reacts rather appropriately to the lack of activity among his students by reducing the field size (complexity reduction). This lack of activity is evident in the vignette through the descriptions of the many passive players on the field and also the girl’s claim of boredom. Although it cannot be assumed that the teacher’s response here represents the ideal reaction, it is certainly a possible solution to a lack of activity during a ball game. However, from the SDT perspective, the teacher’s reaction to the girl’s complaint is inappropriate because he does not address the basic psychological needs of the student in this situation (need satisfaction). His answer makes it clear that he would prefer the girl to eliminate her negative emotions as quickly as possible. Additionally, he gives an unclear instruction by telling the girl that she should take part more actively: it can be assumed that the girl does not know what ‘taking part more actively’ means. Therefore, the student’s psychological needs are not satisfied.

General hypotheses on multi-theory vignette ratings

As described, each vignette contained a problem, a dilemma, or a challenge to which a fictitious teacher’s reaction was depicted. Each ended with a description of the teacher’s actions, which were generally verbal, but sometimes non-verbal. As part of the instrument, participants were then asked to rate the fictitious teacher’s action in relation to the statement ‘The teacher’s action is suitable’, with higher ratings indicating greater perceived suitability. Convergent vignettes (i.e., those in which the actions are either suitable or unsuitable from both perspectives) are rather clear, and we thus expected participants to provide polarized ratings: UU vignettes should receive the lowest rating and SS vignettes the highest rating, indicating high unsuitability and high suitability, respectively. In contrast, we expected ratings for divergent vignettes (i.e., those in which the suitability of the actions varied depending on the perspective adopted) to be close to the middle of the scale, as participants should be undecided. An example train of thought for the participant might be: “This is an appropriate way of dealing with the issue [complexity reduction], but the way he talks to his students does not seem right… [no need satisfaction].” However, our objective was to establish a method of identifying the type of knowledge brought to bear by different participants in providing their ratings by investigating individual differences in the ratings of divergent vignettes. Specifically, if a participant judges the actions depicted in U_gS_m vignettes to be more suitable than those depicted in S_gU_m vignettes, it can be concluded that their knowledge of SDT seems to have been of greater importance in their decision; conversely, if a participant judges the actions depicted in S_gU_m vignettes to be more suitable than those depicted in U_gS_m vignettes, it can be concluded that cues relating to the perspective of PE teaching seem to have been more salient to them. 
Furthermore, by examining changes in these differences over time, we expected to be able to measure the effects of knowledge-building and application.
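
The interpretation of individual differences in divergent-vignette ratings can be illustrated with a small sketch. It assumes that the comparison is made on a participant's mean ratings per vignette type; the function names, the optional tolerance band, and the example ratings are hypothetical and serve illustration only.

```python
from statistics import mean


def perspective_difference(ratings_sgum, ratings_ugsm):
    """Mean S_gU_m rating minus mean U_gS_m rating for one participant."""
    return mean(ratings_sgum) - mean(ratings_ugsm)


def dominant_perspective(diff, tolerance=0.0):
    """Interpret the sign of the difference score.

    A positive score suggests that teaching-games cues were more salient,
    a negative score that SDT cues were more salient. The tolerance band
    is an assumption of this sketch, not part of the instrument.
    """
    if diff > tolerance:
        return "teaching games"
    if diff < -tolerance:
        return "SDT"
    return "undecided"


# Hypothetical ratings of one participant on the four vignettes of each type
diff = perspective_difference([4, 4, 3, 5], [2, 3, 2, 3])
print(diff, dominant_perspective(diff))
```

Tracking this difference score across measurement points would then give a simple indicator of change in the knowledge applied.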

The present study

We conducted two studies to test the validity of the MTV instrument described above (Borsboom et al., 2004). In Study 1, we aimed to pilot the instrument with a sample of student teachers and a sample of sports science students. We expected the vignette ratings to exhibit the distribution described above, with SS vignettes receiving the highest ratings, UU vignettes the lowest, and US and SU vignettes receiving intermediate ratings. We also expected that the exact pattern would be dependent on the sample: specifically, we hypothesized that U_gS_m vignettes would be associated with lower suitability judgments than S_gU_m vignettes by sports science students and vice versa for student teachers. In Study 2, we tested the hypothesis that a knowledge intervention providing information on SDT would elicit an increase in the difference between participants’ U_gS_m and S_gU_m ratings.

Study 1: Pilot

Methods

Student teachers (Sample 1)

Sample 1 consisted of 153 pre-service teachers (127 female) from a university specializing in education studies. The mean age was 21.85 years (SD = 3.06); 78.5% were in semester 3 of their studies or below, and the remaining 22.5% were in semesters 4 to 13. The vast majority (79.7%) were working towards a Bachelor of Arts in either primary or secondary education; a smaller number (18.3%) were working towards a master’s degree in education.

Sports science students (Sample 2)

Sample 2 consisted of 48 sports science students (27 female), with a mean age of 21.10 years (SD = 2.15). Most were working towards a Bachelor of Science (87.5%); the remainder were working towards a Master of Science (12.5%). Approximately 77% were in either their first or their third semester of study; only two (4.2%) had advanced beyond semester five.

Instruments

Participants completed the MTV instrument, in which they were presented with 16 MTVs (four in each condition; for a sample item, see the section ‘Example multi-theory vignette’) and were asked to rate the statement ‘The teacher’s reaction is suitable’ in relation to each vignette. Ratings were given on a five-point Likert scale ranging from 1 (I completely disagree) to 5 (I completely agree). The instrument was administered online to both participant groups. The language complexity of the vignettes (measured by the LIX index; Lenhard and Lenhard, 2014-2022) appeared comparable across vignette types: LIX_UU = 42.15 (SD = 6.35), LIX_US = 49.49 (SD = 3.20), LIX_SU = 45.94 (SD = 8.53), and LIX_SS = 47.60 (SD = 6.48); Kruskal-Wallis H = 2.54, df = 3, p = 0.47. The Bayes factor for an ANOVA comparing H₁: LIX_UU = LIX_US = LIX_SU = LIX_SS with the unconstrained model H_u indicated moderate evidence for the equality assumption (BF_iu = 2.94). Subsequently, each participant provided a self-assessment of their knowledge of both SDT and the teaching of games in PE on a four-point (sample 1) or five-point (sample 2) Likert scale. To compare these ratings between scales, we transformed individual ratings to scores on a scale ranging from 0 (no knowledge) to 1 (advanced knowledge).
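
The text does not state the exact transformation used to map the four-point and five-point self-assessments onto the 0 to 1 range. A minimal sketch, assuming a simple linear (min-max) rescaling with the lowest category coded as 1, might look as follows; the function name is hypothetical.

```python
def rescale_likert(rating: int, n_points: int) -> float:
    """Map a rating on a 1..n_points Likert scale linearly onto [0, 1].

    Assumption of this sketch: categories are coded 1..n_points and the
    rescaling is linear; the original article does not specify this.
    """
    if n_points < 2:
        raise ValueError("a scale needs at least two points")
    if not 1 <= rating <= n_points:
        raise ValueError("rating lies outside the scale")
    return (rating - 1) / (n_points - 1)


# The same relative position on both scales yields comparable scores
print(rescale_likert(4, n_points=4))  # top category of the four-point scale
print(rescale_likert(5, n_points=5))  # top category of the five-point scale
print(rescale_likert(3, n_points=5))  # midpoint of the five-point scale
```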

Analyses

To explore whether participants’ ratings followed the hypothesized patterns, mean ratings for each vignette were calculated. Next, a mean score was computed for each set of vignettes in the same condition (i.e., UU, SU, US, or SS); these scores can be interpreted as representing the mean rating for vignettes within each condition or cluster. We then conducted Welch’s t-tests to compare the mean ratings given by participants in samples 1 and 2 for each condition. We were particularly interested in these comparisons for the conditions involving divergent vignettes. Finally, we conducted a Bayesian repeated measures analysis of variance (Gu et al., 2018; Hoijtink et al., 2019), in which the mean scores for the conditions represented the within-subject factor with four levels and sample was the between-subject factor with two levels (sample 1 and sample 2). We evaluated the following informed hypotheses:

H1: μ_UU1 = μ_US1 = μ_SU1 = μ_SS1 (all means are equal in sample 1).

H2: μ_UU2 = μ_US2 = μ_SU2 = μ_SS2 (all means are equal in sample 2).

H3: μ_UU1 < μ_SU1 < μ_US1 < μ_SS1 (means are ordered with SU being lower than US in sample 1).

H4: μ_UU1 < μ_US1 < μ_SU1 < μ_SS1 (means are ordered with US being lower than SU in sample 1).

H5: μ_UU2 < μ_SU2 < μ_US2 < μ_SS2 (means are ordered with SU being lower than US in sample 2).

H6: μ_UU2 < μ_US2 < μ_SU2 < μ_SS2 (means are ordered with US being lower than SU in sample 2).

H7: μ_UU1 - μ_UU2 = μ_US1 - μ_US2 = μ_SU1 - μ_SU2 = μ_SS1 - μ_SS2 (mean differences for clusters are equal across samples indicating no interaction effect).

For each hypothesis, we calculated Bayes factors compared to the unconstrained hypothesis using the R-package bain (Hoijtink et al., 2019).
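
The aggregation step that precedes these tests (one mean score per participant and condition) can be sketched as follows. The vignette identifiers follow the pattern UU1 to UU4 used in the results; extending it to US, SU, and SS is an assumption, as are the function name and the example ratings. The sketch covers only the aggregation, not the Bayesian evaluation carried out with bain.

```python
from collections import defaultdict
from statistics import mean

CONDITIONS = ("UU", "US", "SU", "SS")


def condition_means(ratings):
    """Average one participant's ratings within each vignette condition.

    ratings maps a vignette identifier such as "UU3" to a 1-5 rating.
    The condition is taken to be the identifier without trailing digits.
    """
    grouped = defaultdict(list)
    for vignette_id, rating in ratings.items():
        condition = vignette_id.rstrip("0123456789")
        if condition not in CONDITIONS:
            raise ValueError(f"unknown condition in {vignette_id!r}")
        grouped[condition].append(rating)
    return {c: mean(grouped[c]) for c in CONDITIONS if c in grouped}


# Hypothetical ratings of one participant on all 16 vignettes
participant = {
    "UU1": 3, "UU2": 3, "UU3": 1, "UU4": 2,
    "US1": 3, "US2": 2, "US3": 3, "US4": 3,
    "SU1": 3, "SU2": 4, "SU3": 3, "SU4": 3,
    "SS1": 4, "SS2": 4, "SS3": 3, "SS4": 4,
}
print(condition_means(participant))
```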

Results

Our research objective in Study 1 was to collect evidence on the validity of our proposed instrument by comparing participants’ ratings of the suitability of the behaviors depicted in the convergent and divergent vignettes to our hypotheses regarding the expected pattern of ratings.

Preliminary analysis: Comparison of groups

As access to relevant knowledge was expected to influence evidence-based argumentation via its influence on the ability to identify relevant cues, participants were asked to rate their knowledge concerning the content of the vignettes; these ratings are summarized in Table 1. With respect to knowledge of SDT, both samples (i.e., both student teachers and sports science students) gave rather low self-reports (with mean ratings being 0.11 and 0.15, respectively); there was no significant difference between the groups, t(82) = 0.94, p = 0.35, Cohen’s d = 0.15. However, as expected, sports science students reported having significantly more knowledge of teaching games (M = 0.40, SD = 0.25) than did student teachers (M = 0.06, SD = 0.18), t(64) = 8.77, p < 0.001, Cohen’s d = 1.69.

Table 1. Self-reported knowledge of self-determination theory (SDT) and teaching games in physical education.

Vignette ratings

Average ratings for each vignette are presented in Table 2. In line with the hypotheses, convergent vignettes of the UU type received the lowest ratings (sample 1: M = 2.39, SD = 0.70; sample 2: M = 2.42, SD = 0.67). In other words, both groups were in agreement on their judgments of teacher behaviors which we had constructed to represent unsuitable actions from both perspectives. Participants from each group rated the individual UU vignettes (UU1–UU4) slightly differently, but the groups were approximately in agreement on the overall ordering, with vignette UU3 receiving the lowest overall rating and vignettes UU1 and UU2 receiving the highest ratings within this condition. A similar pattern was observed for convergent vignettes of the SS type, which received the highest overall ratings (sample 1: M = 3.77, SD = 0.71; sample 2: M = 3.61, SD = 0.79).

Table 2. Vignette ratings: descriptive statistics and group comparisons.

The conditions containing divergent vignettes (US and SU) received intermediate ratings from participants in both groups, a result that was also in line with the hypotheses. Additionally, participants in both groups judged the actions in US vignettes (M₁ = 2.84; M₂ = 2.73) to be slightly less suitable than those in SU vignettes (M₁ = 3.23; M₂ = 3.35). Within these conditions, the rank order of the suitability of individual vignettes was constant across both groups, although the mean ratings varied.

Repeated measures ANOVA indicated a significant main effect for the within-subject factor “vignette condition” with F(3) = 119.334, p < 0.001, partial η² = 0.377. Post-hoc comparisons with Bonferroni correction revealed significant differences (p < 0.001) between each condition (UU, US, SU, and SS).

Comparison of ratings by student teachers and sports science students

The results of an independent-samples Welch’s t-test indicated that there was no significant difference between the two groups in their ratings of UU vignettes. The corresponding effect size (Cohen’s d = 0.04) indicated that the difference was below the threshold to be considered even a small effect. Similarly, both groups gave comparable judgments in response to the SS vignettes, representing items in which the teacher action was intended to represent a suitable response from both perspectives. There was no statistically significant difference between the ratings given by each group on this condition, and the (non-significant) standardized mean difference (Cohen’s d = 0.22) was just above the threshold of what is considered to be a small effect.

Concerning the divergent vignette conditions, once again no significant effect of group was observed. A comparison on the descriptive level of the within-group difference between ratings of the SU and US vignettes across groups indicated that there was a larger difference in the case of sports science students, who self-reported having greater knowledge of PE teaching (ΔSU–US = 0.61, SD = 0.87), compared to pre-service teachers (ΔSU–US = 0.38, SD = 0.88). However, this difference in differences was not significant, t(79) = 1.62, p = 0.11, Cohen’s d = 0.27.
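
As a plausibility check, the Welch statistic for this difference in differences can be approximated from the reported summary statistics. The sketch below is not the authors' analysis script: it assumes that the full samples (n = 48 and n = 153) entered the test, which the text does not state, and the function name is hypothetical. Because the reported means and standard deviations are rounded, the result (roughly t = 1.59 with about 79 degrees of freedom) is close to, but not identical with, the reported t(79) = 1.62.

```python
from math import sqrt


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Works from group means, standard deviations, and sample sizes,
    without assuming equal variances.
    """
    v1 = sd1 ** 2 / n1  # squared standard error of the mean, group 1
    v2 = sd2 ** 2 / n2  # squared standard error of the mean, group 2
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# SU minus US difference: sports science students vs. pre-service teachers
# (sample sizes are an assumption of this sketch)
t, df = welch_from_summary(0.61, 0.87, 48, 0.38, 0.88, 153)
print(f"t = {t:.2f}, df = {df:.1f}")
```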

The repeated measures ANOVA replicated the results of the Welch’s t-tests, yielding a non-significant main effect for sample [F(1) = 0.158, p = 0.692]. Furthermore, the non-significant interaction between condition and sample [F(3) = 1.518, p = 0.209] suggests that judgments did not depend on the sample. The results from the frequentist approach were supported by the Bayesian evaluation of informed hypotheses: Bayes factors indicated strong evidence for H4 (BF_iu = 20.98, means are ordered with US being lower than SU in sample 1), H6 (BF_iu = 21.52, means are ordered with US being lower than SU in sample 2), and H7 (BF_iu = 13.49, mean differences for clusters are equal across samples indicating no interaction effect).

Overall, our results indicated that participants were able to identify relevant cues in the vignettes in judging the suitability of specific teacher actions, and this led to a pattern of ratings that conformed to the hypotheses. However, differences between the two groups in terms of the mean ratings they gave were observed only on the level of individual vignettes, with no differences observed in the groups’ average ratings over any of the aggregated conditions (UU, US, SU, or SS). There was a tendency in the case of the divergent vignette conditions towards a difference between the groups, in the hypothesized direction, but this did not reach statistical significance. This lack of systematic differences between the groups may be attributable to the fact that the participants had not had enough opportunities to build a sufficient knowledge base in their respective fields. To address this issue, we conducted Study 2, in which a specific knowledge intervention was implemented.