Elements for understanding and fostering self-assessment of learning artifacts in higher education

Köppe, Christian; Verhoeff, Roald P.; van Joolingen, Wouter

doi:10.3389/feduc.2024.1213108

SYSTEMATIC REVIEW article

Front. Educ., 05 March 2024

Sec. Assessment, Testing and Applied Measurement

Volume 9 - 2024 | https://doi.org/10.3389/feduc.2024.1213108

Elements for understanding and fostering self-assessment of learning artifacts in higher education

Christian Köppe¹^*

Roald P. Verhoeff²

Wouter van Joolingen¹

¹Freudenthal Institute, Utrecht University, Utrecht, Netherlands
²Radboud Teachers Academy, Radboud University, Nijmegen, Netherlands

Self-assessment skills have long been identified as important graduate attributes. Educational interventions which support students with acquiring these skills are often included in higher education, which is usually the last phase of formal education. However, the literature on self-assessment in higher education still reports mixed results on its effects, particularly in terms of accuracy, but also regarding general academic performance. This indicates that how to foster self-assessment successfully and when it is effective are not yet fully understood. We propose that a better understanding of why and how self-assessment interventions work can be gained by applying a design-based research perspective. Conjecture mapping is a technique for design-based research which includes features of intervention designs, desired outcomes of the interventions, and mediating processes which are generated by the design features and produce the outcomes. When we look for concrete instances of these elements of self-assessment in the literature, then we find some variety of design features, but only a few desired outcomes related to self-assessment skills (mostly accuracy), and even less information on mediating processes. What is missing is an overview of all these elements. We therefore performed a rapid systematic literature review on self-assessment to identify elements that can help with understanding, and consequently foster an effective self-assessment of learning artifacts in higher education using conjecture mapping as analytical framework. Our review revealed 13 design features and six mediating processes, which can lead to seven desired outcomes specifically focused on self-assessment of learning artifacts. Together they form a model which describes self-assessment and can be used as construct scheme for self-assessment interventions and for research into the how and why self-assessment works.

1 Introduction

Self-assessment skills have long been identified as important attributes of graduates in higher education. They have been shown to positively influence students’ learning and academic performance in general (Panadero et al., 2013), self-regulated and lifelong learning (Burgess et al., 1999; Dochy et al., 1999; Nicol and MacFarlane-Dick, 2006), and students’ self-efficacy (Sitzmann et al., 2010; Panadero et al., 2023).

Several definitions of self-assessment have been given in the literature over the years. Table 1 provides an overview of some often cited definitions. While all of them are valid, we will use for this study the definition of Panadero et al. (2016) for following reasons:

• it includes different mechanisms and techniques for student judgments,

• it includes assessment and evaluation, and

• it includes student products.

Table 1

Table 1. Definitions of self-assessment.

Panadero et al. (2016) present and discuss different self-assessment typologies which reflect the distinctions and similarities of various self-assessment mechanisms and techniques such as self-marking, self-rating, self-grading, self-appraisal, or self-estimates. These distinctions are useful for classifications of self-assessment practices, yet for our study we prefer an approach which allows inclusion of all these mechanisms and techniques which is reflected in the self-assessment definition by Panadero et al. (2016).

Many scholars advocate that self-assessment should only be formative [see, (e.g., Brown et al., 2015; Andrade, 2019)], but there is evidence that summative forms such as self-grading or self-evaluation can be beneficial too (Edwards, 2007; Nieminen and Tuohilampi, 2020). Both forms are included in the self-assessment definition of Panadero et al. (2016) and will also be included in our work.

Products, which are created by students and used for evidencing their learning, can be termed learning artifacts (Cherner and Kokopeli, 2018). Such artifacts are often the direct results of assignments common in higher education (e.g., essays, design documents, or scientific reports), but sometimes they are especially created to facilitate assessment of learning (e.g., videos of oral presentations or audios of speaking tests).

As these artifacts are usually used by teachers for summative assessments (as evidence for learning), it is helpful for students to know how to assess their themselves and consistent with their teachers. Much self-assessment research focuses on this accuracy of self-assessment in terms of the level of consistency between student self-assessment and an external assessment, usually from a teacher (see, e.g., Brown et al., 2015; Han and Riazi, 2018; González-Betancor et al., 2019; Carroll, 2020).

Although accuracy of self-assessments is an important indicator of their validity and reliability, in our view the main goal of self-assessment is to help students with learning while producing and improving learning artifacts. This is in line with the positive effects on self-regulated learning (Panadero et al., 2017) and lifelong learning (Taranto and Buchanan, 2020) that have been related to self-assessment skills. Higher education, and specifically universities, can play an essential role in contributing to lifelong learning (Atchoarena, 2021). A study at a university is usually the last phase of formal education and therefore the last possible place for addressing the acquisition of self-assessment skills in a formal setting. In our study, we will focus on understanding, and consequently fostering, self-assessment of learning artifacts in higher education.

Elements of interventions intended to support self-assessment can be found in numerous studies. Based on findings from these studies, several authors collected and discussed these elements. Brown et al. (2015) present features that have been shown to improve self-assessment accuracy: clear criteria, models (e.g., comparable work of other students), instruction and practice in self-assessment, feedback on accuracy, rewards, and keeping self-assessments strictly formative. Nielsen (2014) provides a list of strategies for effective implementation of self-assessment methods in writing instruction, e.g., self-assessment training, models, or co-development of criteria. Tai et al. (2018) describe practices common for developing evaluative judgment: self-assessment, peer-feedback/review, feedback (on student’s judgments and as dialog), rubrics, and exemplars.

The results of self-assessment studies which used above mentioned intervention elements are providing valuable information, but are in some cases also inconsistent. In a recent meta-analysis, Yan et al. (2022) found that even though the overall effect of SA on academic performance was positive (g = 0.585), in 22.79% of the SA interventions negative effects were observed. And also accuracy highly can differ, depending on the creation of optimal conditions while avoiding the many pitfalls (Brown et al., 2015).

Observations on such inconsistencies in the results are not new, and Eva and Regehr (2008) suggested that we should stop addressing questions like “How can we improve self-assessment?,” as hundreds of studies have led to the answer “You cannot.” In 2019, Andrade confirmed this image in her literature review on student self-assessment (Andrade, 2019), but added:

“What is not yet clear is why and how self-assessment works. Those of you who like to investigate phenomena that are maddeningly difficult to measure will rejoice to hear that the cognitive and affective mechanisms of self-assessment are the next black box.” (Andrade, 2019, p. 10)

In our view, it is valid to continue trying to improve self-assessment. We agree that it is valuable to study the cognitive and affective mechanisms of self-assessment. In addition, we propose that a better understanding of when, why, and how self-assessment interventions work can be gained by a more design-based research approach. This means that empirical educational research should be blended with theory-driven design of learning environments (The Design-Based Research Collective, 2003). Instead of only looking at the successes or failures of certain intervention elements, we should also focus on the interactions and connections between designed learning environments, processes of enactment, and outcomes of interest (The Design-Based Research Collective, 2003).

Sandoval proposed a technique for design-based research in education called Conjecture Mapping (Sandoval, 2014). This framework intends to specify the “theoretically salient features of a learning environment design and map out how they are predicted to work together to produce desired outcomes” (Sandoval, 2014, p. 19). A conjecture map consists of four core elements:

• some high-level conjecture (s) about how to support a specific kind of learning in a specific context,

• embodiments (or design features) of specific designs in which that conjecture becomes reified,

• mediating processes the embodiments are expected to generate, and

• desired outcomes produced by the mediating processes.

Apart from the high-level conjecture, design conjectures describe how certain design features contribute to the generation of specific mediating processes, while theoretical conjectures describe how mediating processes lead to (or produce) desired outcomes.

If we take for example the provision of a rubric as an often implemented design element in self-assessment interventions, then the pure provision of the rubric does not in itself contribute to a higher accuracy of self-assessment. Students have to learn to distinguish between the different quality levels described in the rubric, to translate the quality descriptions of the rubric to concrete instances of products, and to apply the rubric for self-assessment of their own work. This means that, in order to achieve a higher accuracy, interaction with the criteria needs to be generated as a mediating process. A (simplified) conjecture map based on this example is shown in Figure 1. Generating this mediating process likely requires more design features, as otherwise the pure provision of a rubric would be sufficient. On the other hand, the absence of this mediating process is one of the possible explanations for the low accuracy of self-assessments in some studies where a rubric has just been provided to the students without further intervention elements.

Figure 1

Figure 1. Simple example of a conjecture map for accurate self-assessment using rubrics.

Only few studies provide detailed information about the process of self-assessment in educational practice (Andrade, 2019). When we look for elements of self-assessment in the literature, such as the ones in the previous example, then we find some variety of design features, but only a few desired outcomes related to self-assessment skills (the majority looks at accuracy), and even less information on mediating processes. What we miss is an overview of all these elements. Getting this overview could lead to a model of self-assessment, which can be used as a construct scheme for self-assessment interventions and would support a design-based research approach.

As a basis for such research, this study intends to provide such overview by identifying the most important elements of self-assessment as described in the literature. We address the following research question:

What are important elements for understanding and fostering effective self-assessment of learning artifacts in higher education?

We will use the elements of conjecture maps as analytical framework, and focus only on elements directly related to–or part of––the self-assessment intervention: which embodiments were implemented, were there descriptions of mediating processes as generated by the embodiments, and which outcomes were produced or desired? When looking at the outcomes, we will only look for outcomes related to self-assessment itself, and not outcomes such as academic performance or the quality of the products.

2 Methods

We performed a Rapid Systematic Review (Grant and Booth, 2009) as an alternative to a full systematic review, because we deemed conceptual saturation to be more important than completeness. In our study, the literature search, screening process, analysis, and synthesis were performed by the first author. All results, particularly unclear cases, were discussed and negotiated with the co-authors during all stages of the process.

We included empirical literature presenting qualitative or quantitative accounts of self-assessment of learning artifacts in higher education, as well as theoretical papers in our review, as the aim was to provide an overview of all potentially relevant elements and not a meta-analysis. The literature screening flowchart is shown in Figure 2.

Figure 2

Figure 2. Literature screening process.

2.1 Search strategy

For our literature search on self-assessment, we used two scientific databases: ERIC and Web of Science. The ERIC database is focused on research in education and is expected to contain publications from educational science addressing self-assessment. However, many publications are also released in field-specific journals, e.g., self-assessment in language learning or mathematics. To cover such publications, we used as second database the Web of Science Core Collection, as this database indexes a wide range of scholarly journals, books, and proceedings. We included literature from a period of 15 years (publication date between 2006-01-01 and 2021-12-31). As search terms, we used “self-assessment” and “self-grading”, as both are used in the literature, in combination with “higher education”: “((ALL = (self-assessment)) OR (ALL = (self-grading))) AND ALL = (“higher education”).” We did not include search terms related to self-assessment of artifacts/products in this first search in order to also select publications which do not use these terms, but mention only the concrete products, e.g., written essays, assignment solutions etc.

The Web of Science search resulted in 772 publications, and the ERIC educational research database search resulted in 36 publications. After removing the eight duplications, the full initial set contained 800 publications.

2.2 Inclusion criteria

The titles and abstracts of all publications in the full initial set were read to ascertain that they addressed self-assessments of learning artifacts in higher education. Publications with a context other than higher education (e.g., at workplaces) and publications addressing self-assessment of knowledge or skills that did not require the production of artifacts were excluded. Only publications in English were included in this study. The resulting dataset comprised 270 papers. Furthermore, only papers were included that contained at least one candidate for an embodiment, mediating process, or outcome. This resulted in a set of 92 papers that were used for our data analysis.

2.3 Data analysis

Our data analysis followed the thematical analysis approach (Braun and Clarke, 2006). From the included papers, we collected the publication source details and an extract containing the potential embodiments, processes, and outcomes. Codes were assigned to indicate the type of embodiment, process, or outcome addressed in each paper. In the initial analysis, the potential mediating processes and outcomes were collected in one column because there is often a close connection between them (e.g., understanding quality can be both a process and an outcome). The mediating processes and outcomes were distinguished during the qualitative synthesis phase. This initial analysis revealed 451 embodiments and 192 combined mediating processes and outcomes.

The next phase of the thematical analysis (searching for themes) involved inductive analysis of the initial codes to identify candidate themes for embodiments and combined mediating processes and outcomes. The first iteration of the analysis yielded 28 themes for embodiments and 19 themes for combined processes and outcomes.

Candidate themes for both embodiments and processes/outcomes were reviewed and refined in the next phase. This involved a more fine-grained classification of themes according to the types of embodiments, mediating processes, and outcomes, as described by Sandoval (2014). The results of the thematical analysis are presented in section 3.

2.4 Characteristics of included studies

A complete overview of all 92 publications included for analysis is provided in Supplementary Table S1 in Supplementary material. Covered disciplines/fields of study (where mentioned) include educational science and teacher education (N = 21), English/Language Learning (N = 12), Biology/Life Sciences (N = 7), Mathematics (N = 5), Business (N = 4), Social Sciences (N = 3), Accounting (N = 3), Chemistry (N = 2), and Engineering (N = 3). Covered only once are Design, Computer Science, Criminal Justice, Liberal Arts, Health, History, Information Literacy, and Physics. Some studies cover multiple disciplines and some (mostly reviews) do not mention the covered field.

Seventy six studies are empirical and 16 studies are reviews, model buildings or other theoretical work. Of the 76 empirical studies, 31 are quantitative, 16 are qualitative, and 29 mixed qualitative-quantitative. The sample sizes in the studies vary: 2 studies with very small samples (N < 10), 14 with small samples (N < 30), 29 with medium samples (N < 100), and 31 studies with large samples (N > =100). Most empirical studies also mentioned the country of data collection with a total of 24 different countries. Countries include Australia (N = 14), United States (N = 11), Spain (N = 9), Great Britain (N = 9), China (N = 5), Hong Kong (N = 3), New Zealand (N = 3), South Africa (N = 3) and 1 or 2 times each Taiwan, Finland, Chile, Columbia, Germany, Ireland, Iran, Israel, Lithuania, Mexico, Malaysia, Singapore, Serbia, Thailand, Turkey, and Ukraine. Regarding continent, 24 studies are from Europe, 18 from Asia (incl. Turkey), 17 from Australasia/Oceania, 12 from North-America, 3 from Africa and 2 from South-America.

Sixty-six studies have students as only data source, 2 studies use data from teachers only and 8 studies use data from students and teachers/staff. There are 40+ different types of products/learning artifacts used for self-assessment. The most frequently used ones are essays, (recordings of) oral presentations, scientific reports, and course assignments. The study year (where specified) also varies greatly: 17 studies look at 1st years only, 4 studies look at 2nd years, 14 studies look only at 3rd and/or 4th years, 8 studies look at multiple years and the rest is more unspecific (bachelor students, undergraduate, adult learners etc.). Forty-eight studies used only one source for data collection: 24 used process/performance data (the products, self-assessments and teacher assessments), 13 used surveys, 4 used interviews, 2 focus groups and 1 observation. All other studies triangulated with multiple sources (various combinations, double counts possible): 23 studies combined surveys with other sources, 12 studies combined interviews with other sources, 29 combined process/performance data with other sources, and nine studies used three or more sources for data collection.

3 Results

Our review revealed a total of 26 elements deemed relevant for fostering and understanding self-assessment of learning artifacts. These elements comprise 13 embodiments and six mediating processes, which can lead to seven desired outcomes specifically focused on self-assessment of learning artifacts. Together they form a model which describes self-assessment and can be used as construct scheme for self-assessment interventions and for research into the how and why self-assessment works. In this section, we will first present embodiment, mediating process, and outcome individually, sorted by their number of occurrences. Then the complete model will be presented.

3.1 Embodiments

Embodiments are the concrete elements of educational designs. They are classified and aggregated according to four types, following Sandoval (2014): Tools and Materials (software programs, instruments, manipulable materials, media, and other resources), Task Structures (the structure of the tasks learners are expected to do—their goals, criteria, standards, and so on), Participant Structures (how students and teachers are expected to participate in tasks, the roles and responsibilities participants take on), and Discursive Practices (practices of communication and discussion or simply ways of talking). A categorized summary of all embodiments is provided in Table 2.

Table 2

Table 2. Overview of embodiments found in the reviewed literature, incl. number of publications and short description.

3.1.1 Tools and materials

Analytic rubrics were the most frequently mentioned tool in the literature on self-assessment of learning artifacts (63 publications). They are used to define pre-set assessment criteria for (elements of) learning products, including descriptions of several quality levels per criterion. Analytic rubrics are useful for feedback (Sadler, 2009) and, consequently, for formative self-assessment. Rubrics must be complete, clear, and transparent to guide students’ self-assessments (Fastré et al., 2012; Tai et al., 2018).

While rubrics define assessment criteria and several levels of quality, exemplars (31 publications) illustrate how these dimensions of quality manifest in concrete work products (Sadler, 1989; Smyth and Carless, 2021). Exemplars can be either authentic student samples, representing a certain quality level, or teacher-constructed examples that make specific features visible to students (Smyth and Carless, 2021). Several publications recommend using a range of exemplars of different qualities (e.g., Jones et al., 2017; Knight et al., 2019). Additionally, assessments of exemplars or comments on their quality can be provided to students as models for their self-assessments (Andrade and Valtcheva, 2009; Fastré et al., 2012).

Only 11 of the reviewed publications provided a description of concrete self-assessment instruments, the actual tools to be used while performing self-assessment. The instruments mentioned mostly include generic ones such as assessment sheets/forms (Taras, 2015; Hung, 2019).

Few publications explicitly mention the technical instruments or digital systems used for self-assessment. Besides standard learning management systems such as Blackboard (e.g., Diefes-Dux, 2019), some papers also report on newly developed systems for the purpose of supporting self-assessment. Li and Zhang (2021) developed and used a computer-assisted adaptive instrument and Agost et al. (2021) introduced computer-based adaptable resources. Both include elements such as rubric annotations, complementary resources, and selectable levels of provided details. Lawson et al. (2012) described the review system designed to facilitate self-assessment.

Despite these reports, our review revealed that relatively little attention has been given to the technical support of self-assessment and how certain instruments can facilitate self-assessment. Further research in this area is warranted.

3.1.2 Task structures

Assessment training was explicitly mentioned as a way to increase the quality of students’ self-assessments (44 publications). Some papers only mention that students receive training, without further specification of the kind of training, while others provide a more detailed description of training elements, for example, analysis and discussion of criteria (Lavrysh, 2016; Hung, 2019), discussing assessment results with student peers (Chen, 2008), using multiple exemplars of varying quality to help students develop an appropriate sense of what makes good quality before introducing the rubrics (Smyth and Carless, 2021), or discussing with students how to interpret the quality levels (Tai et al., 2018). Some reviewed studies explicitly included an explanation of how the assessment process works (self-assessment and final assessment).

Twenty-two publications included at least one required self-assessment of work-in-progress. Initial self-assessments are often inaccurate, as they may be biased by (unintentional) self-deception and prior academic success (Buckelew et al., 2013). Required self-assessments of work-in-progress help to discover and address these issues. Additionally, students may perceive self-assessment as beneficial to them when they experience how it helps them close the gap between their current and desired performance.

Thirteen publications included students having to provide an explicit assessment result justification that obligates them to substantiate the quality level of (components of) an artifact and how it meets the assessment criteria. This stimulates higher-level cognitive skills (evaluation and analysis) and helps develop an understanding of what quality means (Tai et al., 2018). Some papers include assessment justification also for peer assessments or as part of assessment training. Explicit justification by students also provides information for teachers about the quality of their self-assessment and can serve as input for feedback on how students’ self-assessment might be improved (Tai et al., 2018).

Fifteen publications described an incentive for accurate self-assessment to stimulate serious and/or accurate self-assessment. In most cases, an accurate self-assessment is counted as a (small) part of the grade or resulted in students receiving extra credits (Cabedo and Maset-Llaudes, 2020). Contrary to rewarding accuracy, some publications mention punishments for inaccurate self-assessments, such as lower grades or loss of credits (Knight et al., 2019; Seifert and Feliks, 2019). Sometimes, completing a self-assessment itself was rewarded, independent of its accuracy (Davey, 2015; Wanner and Palmer, 2018).

Twenty-one publications reported on shared responsibility of grading at least partially between teachers and students to make self-assessment more valuable to students (Bourke, 2018) and to involve them actively in the assessment and grading process. This also benefits the teacher-student relationship (Edwards, 2007). Students reported putting more effort into self-assessment when graded (Jackson and Murff, 2011). Known problems of self-grading, such as grade inflation or social response bias (e.g., Brown et al., 2015), can be prevented by grade negotiation with the assessor (McDonnell and Curtis, 2014; Seifert and Feliks, 2019) and combining self-grading with assessment training or requiring justification of the grade by students (Evans, 2013; Bourke, 2018).

The relevance of self-assessments increases if there is a revision/improvement possibility (18 publications) to close the gap between the actual level of performance and the desired quality standard (Andrade and Valtcheva, 2009). Students can use their self-assessments of draft versions for revisions/improvements of the work before their final submission (Taras, 2015; Wanner and Palmer, 2018). Nielsen (2014) emphasized that it is important to provide sufficient time for revision.

3.1.3 Participant structures

The importance of teacher feedback on self-assessment results has been emphasized in 28 publications (e.g., Andrade and Valtcheva, 2009). It should focus on both the quality and accuracy of students’ judgments and not only on the quality of students’ work (Sitzmann et al., 2010; Tai et al., 2018) and should be given on time. If students provide justifications of their self-assessment, feedback can be tailored to their needs. Ideally, teachers can see whether students really understood the quality criteria and whether they were able to translate and use them for concrete evaluations. Reasons for over-assessment and under-assessment (such as unintentional self-deception or other biases) could be identified and addressed in the feedback (Fastré et al., 2012).

Peer assessment was used in 13 publications to improve self-assessment skills (Bozzkurt, 2020). Assessing the work of peers requires students to apply quality criteria to some work similar to theirs and to justify their assessment. If students have to perform assessments on the works of several peers, they are exposed to different implementations and levels of quality. This helps them gain a better understanding of the quality criteria and how they manifest in concrete artifacts. Getting their own work assessed by peers enables students to calibrate their own assessment with that of others and potentially develop more insights into multiple interpretations of quality criteria and their manifestation in artifacts.

3.1.4 Discursive practices

Involving students in co-creation of rubrics/criteria leads to a shared understanding of these criteria and increased ownership (23 publications, e.g., Birjandi and Hadidi Tamjid, 2012; Nielsen, 2014). During the process of creation, students are involved in the social construction and articulation of standards, which helps them with evaluative judgment in new fields and contexts (Tai et al., 2018). Students can also analyze and discuss exemplars to identify quality dimensions and criteria (Smyth and Carless, 2021). Boud and Falchikov (1989) stated that besides rating one’s own work, the identification of criteria or standards applied to one’s work is another key element of self-assessment.

An assessment result discussion with peers and/or teachers increases the understanding of how to accurately self-assess (eight publications, e.g., Lavrysh, 2016). Such discussions can be held after self-assessment, peer assessment, and teacher assessment, and can serve both as feedback and feedforward (Mannion, 2021). These are valuable collective calibration practices (Brown et al., 2015).

3.2 Mediating processes

Identifying the mediating processes helps to understand how and when embodiments do (not) contribute to achieving the desired outcomes. It also helps to clarify why some studies relate positive outcomes to certain embodiments (e.g., rubrics and assessment training), while other studies do not. According to Sandoval (2014), mediating processes manifest as either observable interactions or artifacts that function as proxies for learning processes, indicating the extent of learner engagement in relevant activities. Our review shows that mediating processes are often not explicitly described, although they are likely to have been generated by the embodiments. The mediating processes found in the literature are summarized in Table 3 and described in more detail below.

Table 3

Table 3. Overview of mediating processes found in the reviewed literature, incl. number of publications and short description.

3.2.1 Interactions

Students should perform interaction with criteria in order to understand what they mean and how to apply them to concrete products (3 publications, e.g., Boud et al., 2013; Bird and Yucel, 2015). This requires cognitive engagement, for example, by discussing the criteria (Cowan, 2010) or deconstructing a rubric (Jones et al., 2017).

The practice of assessment of other work is often described as essential for improving self-assessment skills (12 publications, e.g., Chen, 2008). Examining and interacting with work similar to what is expected from students serves as an instantiation and representation of good (or varying) quality (Andrade and Du, 2007; Bird and Yucel, 2015). Such similar work can be the products of peers or exemplars provided by the teacher. As the students are not personally attached to the artifacts to be assessed, the risk of biased assessment is much lower.

Students who perform regular self-assessments generate formative self-feedback on work-in-progress (drafts) and use it to inform revisions and improvements (14 publications, e.g., Andrade and Valtcheva, 2009). These self-assessments can be obligated (through embodiments) but might also be performed by students’ choice as part of monitoring their own performance (e.g., Chen, 2008; Cowan, 2010).

Assessment dialogs about different elements of assessments are key to supporting students’ self-assessments (11 publications, e.g., Nielsen, 2014). Communication between teachers and students (and between students) about self-assessment should be reciprocal and focus on learning how to self-assess. Allowing a dialog on assessment tools (Cockett and Jackson, 2018) or assessment results (Lavrysh, 2016) can be considered a democratic strategy, which is often valued by students (McDonnell and Curtis, 2014) and enhances their receptiveness to feedback (Henderson et al., 2019).

Self-reflection on self-assessments (six publications, e.g., Brown et al., 2015; To and Panadero, 2019) means that students become aware of not only the quality of their work products, but also the quality of their self-assessment performance. Therefore, such self-reflection is an important process for identifying strategies for closing the gap between the actual performance of self-assessments and the desired performance (correct and complete self-assessments). These strategies can be applied to improve self-assessment skills.

3.2.2 Artifacts

The different drafts of the learning artifacts present visible progress through repeated self-assessments (two publications). This means that self-assessments that indicate deficiencies in these artifacts can lead to traceable improvements. The learning process generated by self-assessments manifests in the progress of learning artifacts (Hung, 2019; Xiang et al., 2021).

The results of the regular self-assessments (14 publications) may also be documented as explicit artifacts by the students, which indicate the learner’s engagement with the quality of their work-in-progress.

Our review revealed seven mediating processes (counting regular self-assessments twice, both as interaction and artifact) that may be generated by the embodiments described in the previous section. However, most publications do not explicitly pay attention to mediating processes, and further research is needed to validate these processes and identify other potential processes that contribute to achieving the outcomes, as described in the next section.

3.3 Outcomes

The mediating processes, generated by embodiments, contribute to the production of desired outcomes. Although most of the literature on self-assessment focused primarily on high accuracy of self-assessments as outcome, in our review we also found other relevant outcomes.

Some outcomes are explicitly stated as such (e.g., judgment calibration or focus on quality improvement), whereas other expected outcomes remained implicit. For example, an item in a questionnaire such as “Would you like to see self-assessment applied in other courses too” indicates the outcome that students perceive self-assessments as valuable to them. All desired outcomes identified in the literature review, both explicit and implicit, are summarized in Table 4.

Table 4

Table 4. Overview of outcomes found in the reviewed literature, incl. number of publications and short description.

The outcome directly related to accuracy of self-assessments is judgment calibration (19 publications, e.g., Bozzkurt, 2020), meaning that student assessments are mostly consistent with external assessments, such as that of the teacher (Table 4).

Achieving a low impact of self-representation is one of the implicit outcomes (seven publications). Self-assessments are often influenced by self-representation and personality and not by real levels of achievement (Jansen et al., 1998). Unintentional self-deception, often present in low-achieving students, can lead to over-assessment (Agost et al., 2021). Other student characteristics, such as gender and cultural features, have also been reported to influence self-representations that impact self-assessments (González-Betancor et al., 2019; Carroll, 2020).

Gaining a quality understanding of relevant standards seems to be a key outcome of self-assessments, as it is the outcome mentioned the most (27 publications). The levels range from a more generic understanding of quality (Kearney, 2013; Carless and Chan, 2017; Scott, 2017) to distinguishing between low- and high-quality work (Lavrysh, 2016; Seifert and Feliks, 2019) or recognition of average but sufficient pieces of work (Taras, 2015). Some publications have focused on an expected standard for concrete artifacts that students should understand (McDonnell and Curtis, 2014; Bird and Yucel, 2015; Adachi et al., 2018). Sadler (1989) emphasized that students should understand that not all quality aspects can always be unambiguously formulated, that criteria may require interpretations, or that there is appraisal of work as a whole and not only of the parts.

The sustainable effect of self-assessment depends on how valuable and beneficial students see its application for their learning progress and the quality improvement of their learning artifacts. Self-assessment appreciation is an implicit outcome mentioned in ten publications, helpful for effective self-assessment to occur and leading to students’ application of it in following assignments, even though it is not required (Andrade and Valtcheva, 2009). The focus in the literature varies slightly and ranges from understanding why self-assessment is beneficial (Yan and Brown, 2017; Adachi et al., 2018) to understanding how self-assessment is beneficial (McDonnell and Curtis, 2014; Lavrysh, 2016) to becoming convinced of the benefits of self-assessment by experiencing it themselves (Panadero et al., 2016; Deeley and Bovill, 2017).

Students engage more in self-assessment when they assume assessment ownership (16 publications, e.g., Adachi et al., 2018). Commitment to, and understanding of, the assessment system helps to develop students’ competence in performing accurate and realistic self-assessments (Nielsen, 2014; González-Betancor et al., 2019) and producing better work (McDonnell and Curtis, 2014). A sense of ownership developed by self-assessment activities, such as the co-creation of criteria, motivates students (Nielsen, 2014) and cultivates responsibility and autonomy (Cassidy, 2007).

Having a focus on quality improvement helps students identify appropriate actions to close the gap between the actual level of performance and the desired standard (22 publications, e.g., Sadler, 1989; Andrade and Du, 2007). Self-assessments are used to improve work and correct mistakes to get closer to the desired standard (Bourke, 2018).

Five publications reported self-efficacy growth regarding judging quality as assessors as a result of self-assessment practices (e.g., Xiang et al., 2021). Some studies have explicitly aimed to build students’ confidence through self-assessment activities (e.g., Scott, 2017).

3.4 Model of self-assessment

The elements presented in the previous sections together form a model of self-assessment of learning artifacts in higher education based on the reviewed literature. Figure 3 gives an overview of the model. The elements in the model are structured according to the conjecture mapping approach into embodiments, mediating processes, and (desired) outcomes. The 13 embodiments identified in various self-assessment interventions can be applied in different configurations, depending on the focus of the self-assessment intervention (e.g., formative or summative) or other design decisions (e.g., providing Assessment Training versus “training-on-the-job” with multiple Required Self-Assessments). Application of the embodiments can contribute to the generation of the six mediating processes contained in the model. These mediating processes are deemed essential for the production of the desired outcomes and help to understand how and why the overall intervention design works. One mediating process, Regular Self-Assessments, can be categorized as both an interaction and an artifact, it is therefore included twice in the model. The seven desired outcomes described in the model provide an overview of what desirable results are of self-assessment interventions. These outcomes are specifically related to self-assessment itself and not to other (domain-specific) learning outcomes, which may be formulated within the context of the educational intervention.

Figure 3

Figure 3. Model of self-assessment based on results of our study.

4 Discussion

What are important elements for understanding and fostering self-assessment (of learning artifacts) in higher education? To answer this question, we performed a rapid systematic review (Grant and Booth, 2009) and used the Conjecture Mapping approach (Sandoval, 2014) as the analytical framework. Using this approach, we created a model linking 13 embodiments, six mediating processes, and seven outcomes, forming an integrated framework for understanding self-assessment, which can also serve as a basis for the design of learning environments that support self-assessment.

In this model, the 13 identified embodiments comprise a variety of design features that were all applied to support the self-assessment of learning artifacts, albeit in various configurations and with sometimes mixed results. All embodiments were applied in successful designs, indicating that they can potentially contribute to effective self-assessment. However, as with any educational intervention, the success of their application depends on context and details of implementation. The main contribution of our model is assisting researchers and practitioners in understanding of and designing for self-assessment by linking the levels of mediating processes and outcomes to the embodiments found.

According to Andrade (2019), research on these processes is essential for understanding when, how and why certain embodiment configurations lead to certain outcomes. However, our review showed only occasional mentions of these mediating processes and hardly any direct relationship with embodiments. Understanding how self-assessment can be successfully implemented requires identifying potential mediating processes and explaining how certain outcomes are related to specific (combinations of) embodiments.

The outcomes described in this paper cover various aspects of what should be achieved in order to make self-assessment effective. In addition to the outcomes explicitly mentioned in the literature on self-assessment, such as accuracy of self-assessments, we identified more implicit outcomes such as self-assessment appreciation or quality understanding. These outcomes are often student-centered and can be regarded as important when it comes to engaging students not only in comparing their work with the standard that is required, but also in appropriate action to improve their work toward that standard (Sadler, 1989) also when self-assessment is no longer required by the teacher. A more specific finding of our review is that relatively little research has been conducted on the impact of technological support on the effectiveness of self-assessments (the technical self-assessment instruments). In addition, the mediating processes generated in educational designs with students’ self-assessment of their work seem underexplored in the literature.

The framework is not prescriptive in the sense that it can be used to determine what kinds of self-assessment embodiments should be present in order to achieve specific outcomes. Instead, it provides an interpretative structure for understanding existing practices and for creating designs that explicitly link design features to expected outcomes, mediated by learning processes. The embodiments, processes and outcomes resulting from our review become building blocks for designers to make and underpin their design choices, in line with Sandoval et al.’s original ideas behind conjecture mapping.

A very relevant question in this field is what improving SA means, and the presented framework can help with an answer to this question. We propose the effectiveness of SA as level of achievement of one or more of the seven outcomes. For improving SA we argue that outcomes deemed relevant in a specific context should be improved. As argued above, the framework can be used as construct scheme for this. Following the constructive alignment approach (Biggs and Tang, 2011), these should be aligned with teaching and learning activities which can be designed using the elements of the presented model.

4.1 Limitations

The research method we applied was a Rapid Systematic Review (Grant and Booth, 2009) combined with a Thematical Analysis (Braun and Clarke, 2006). We used Conjecture Mapping as framework for the thematical analysis (Sandoval, 2014) aiming for the conceptual saturation of the identified elements. A consequence is that we might have missed publications with additional elements. This means that we cannot consider the collections of elements, processes and outcomes to be final or complete. For the model as a whole this has no consequences, as it will be robust to the addition of more elements, but such additions may be expected as a consequence of future studies.

Studies included in our review were from a variety of fields and countries and covered SA of different types of artifacts. Even though we did not evaluate per element in which field/country/artifact type it was applied/successful (as this was not our goal), we assume that our model principally is applicable for all types of artifacts in various fields. Evaluation of this assumption is subject to future work.

4.2 Future research

This model can be applied to at least three future research directions. All three can contribute to answering questions regarding how, when, and why specific embodiments contribute to the achievement of desired outcomes through certain mediating processes and consequently promote an understanding of successful self-assessment processes.

4.2.1 Research direction 1: new case studies

For educational design research, new case studies can use this framework for the design of self-assessment interventions and research on the effectiveness of these interventions. Figure 4 shows an example of a conjecture map based on our results. The four embodiments can be implemented and studied to determine to what extent they generate the three mediating processes and produce the intended outcomes.

Figure 4

Figure 4. Exemplary new conjecture map based on elements identified in our study, focusing on two specific outcomes.

4.2.2 Research direction 2: mapping onto existing research

Mapping elements of the framework as potential conjectures onto existing research can help to gain more insight into the complex phenomenon of self-assessment. For example, consider the following publication: Student self-assessment: Results from a research study in a level IV elective course in an accredited bachelor of chemical engineering (Davey, 2015).

The following embodiments can be discerned in the educational design of the study: assessment rubrics (provided in advance of course starting), required self-assessment (one after submission of assignment), shared grading (self-assessment counts for 10 % of the final grade), example (as idealized solution, provided after assignment submission), and a marking sheet. The course with these embodiments was reported to have the following self-assessment related outcomes: Accuracy was not high (self-assessments were 16% higher than tutor assessments), only 50% of participants valued self-assessment, and only 50% were confident that their sell-assessment was correct. From our conjecture mapping perspective, we note that this study lacks any mention of mediating processes. This information has been added to the conjecture map results in Figure 5.

Figure 5

Figure 5. Conjecture map based on elements as reported in Davey (2015).

It is likely that the mediating processes required to produce the outcomes were insufficiently generated. The following suggestions are formulated based on the proposed framework:

The necessary mediating processes that contribute to the production of the three outcomes are Regular Self-Assessments, Assessment Dialogs, Self-Reflection, and Visible Progress (through repeated SA). To generate these processes, the following embodiments can be adapted or added: the Required Self-Assessment should not be done after the deadline, but also required earlier (and optionally more than once) so that the results can be discussed and used for improvement (Revision/Improvement Possibility). Students should also be encouraged to perform self-assessments even if they are not required. Assessment Training would help students learn how to interpret and apply the rubrics, ideally using multiple Exemplars of varying quality (not only an idealized solution). There also should be Teacher Feedback on Self-Assessment Results.

4.2.3 Research direction 3: exploration of existing educational designs

A third research direction that we propose is more explorative than design-based. One could look for educational designs in which multiple embodiments were implemented and use our framework for analyzing the effects of these embodiments: which of the mediating processes were generated and which outcomes were produced by them. An example could be a software engineering project in which students work on several artifacts, such as software requirements documents, software design documents, program source code, and test reports. In such projects, rubrics often describe the quality criteria for all dimensions of the various artifacts. Such educational designs offer a rich research context for studying self-assessment: Students could be asked to regularly perform self-assessments of their work-in-progress, and data could be collected and analyzed to address the effects of implemented self-assessment elements.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Author contributions

CK, RV, and WJ contributed to the conception and design of the study. CK conducted the literature search and performed the analysis and wrote the manuscript in consultation with RV and WJ. All authors contributed to the article and approved the submitted version.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2024.1213108/full#supplementary-material

References

Adachi, C., Tai, J. H.-M., and Dawson, P. (2018). Academics’ perceptions of the benefits and challenges of self and peer assessment in higher education. Assess. Eval. High. Educ. 43, 294–306. doi: 10.1080/02602938.2017.1339775