Is Concept Appraisal Modulated by Procedural or Declarative Manipulations?

Thorne, Sapphira R.; Smortchkova, Joulia; Quilty-Dunn, Jake; Shea, Nicholas; Hampton, James A.

doi:10.3389/fpsyg.2022.774629

ORIGINAL RESEARCH article

Front. Psychol., 22 March 2022

Sec. Cognition

Volume 13 - 2022 | https://doi.org/10.3389/fpsyg.2022.774629

Sapphira R. Thorne¹†

Joulia Smortchkova²,³†

Jake Quilty-Dunn²,³,⁴

Nicholas Shea²,³*

James A. Hampton¹

¹Department of Psychology, City, University of London, London, United Kingdom
²Institute of Philosophy, School of Advanced Study, University of London, London, United Kingdom
³Faculty of Philosophy, University of Oxford, Oxford, United Kingdom
⁴Philosophy–Neuroscience–Psychology Program, Department of Philosophy, Washington University in St. Louis, St. Louis, MO, United States

A recent study has established that thinkers reliably engage in epistemic appraisals of concepts of natural categories. Here, five studies are reported which investigated the effects of different manipulations of category learning context on appraisal of the concepts learnt. It was predicted that dimensions of concept appraisal could be affected by manipulating either procedural factors (spacing of learning, perceptual fluency) or declarative factors (causal knowledge about categories). While known effects of these manipulations on metacognitive judgements such as category learning judgements and confidence at test were replicated, procedural factors had no reliable effects on the dimensions of concept appraisal. Effects of declarative manipulations on some forms of concept appraisal were observed.

Introduction

Questions surrounding the nature and role of concepts in thought have been at the forefront of research in psychology for many decades. A less explored aspect of concepts is the way in which thinkers assess their own concepts. In a previous empirical study, we found that people evaluate their concepts epistemically. We called this aspect of thinking about concepts “concept appraisal” (Thorne et al., 2021). To explore concept appraisal, we investigated eight dimensions of thinkers’ epistemic evaluations of their concepts (see Table 1). The first three dimensions encode how well thinkers understand the concepts they use. To explore understanding, we asked participants to evaluate the accuracy of the information they associate with a concept, how much information is contained in a concept, and how well they could explain the concept to someone else. Four other dimensions encode something about the concept itself, and more precisely something about its reliability or dependability as a tool for thinking. To explore this aspect of concept appraisal, we asked participants to evaluate how good the concept is for making inductive inferences, how informative the concept is about the objects that fall under its scope, how willing they are to defer to experts regarding its use, and finally how much they think there is to learn about the category. A final dimension of concept appraisal encodes whether thinkers consider the concept to be a useful tool when communicating with others. We have shown that these eight dimensions are reliable (people agree on how to judge different sets of concepts along the dimensions outlined above) and exist for concepts in multiple domains [from natural kinds to social groups to artefacts; for full details see Thorne et al. (2021)].

Table 1. Nine concept appraisal questions used in Studies 1–5.

The existence of dimensions of concept appraisal raises the question of whether and how these dimensions could be manipulated. This question is directly relevant to some of the issues that motivated the initial study, in particular to issues surrounding conceptual engineering in philosophy (Machery, 2017; Thomasson, 2017; Cappelen, 2018), a project that aims to improve and change our shared understanding of some socially relevant concepts, and to issues related to the mechanisms of conceptual change during development (Smortchkova and Shea, 2020). Conceptual change occurs all the time at the level of groups and of individuals, but its mechanisms are still debated, and it is notoriously difficult to achieve in a goal-directed manner. Could the dimensions of concept appraisal discovered in Thorne et al. (2021) play a role in conceptual change? As a first step in answering this question, in the present study we explored which factors could influence the epistemic evaluations of concepts, that is, factors that could influence both the subject’s assessment of how well they understand a certain concept and their assessment of the reliability of the concept as a tool for thinking about the world.

As a way of operationalising the issue, we looked at the metacognitive literature exploring the factors that influence people’s evaluations of their own judgements, beliefs, and categories. In this literature a distinction is drawn between procedural or experiential factors (such as procedural fluency in reading) and declarative or theory-based factors (such as the assessment of how well new information fits with the information already possessed by the thinker; Koriat and Levy-Sadot, 1999; Schwarz, 2010; Proust, 2013). Both kinds of factor have been shown to have an impact on various judgements (such as judgements of truth and judgements of confidence). We hypothesised that these factors could also have an impact on concept appraisals, namely on judgements of understanding and judgements of reliability. We set out to investigate whether concept appraisals are modulated either by procedural manipulations or by declarative manipulations.

Overview of the Current Research

Study 1 starts with a much-studied procedural manipulation. It explores the impact of massed versus spaced learning on metacognitive judgements (Kornell and Bjork, 2008; Logan et al., 2012). We adopted an experimental design from Wahlheim et al. (2011) and used natural categories of families of birds. We aimed to test the potential effect on some forms of concept appraisal (judgements about concepts’ understanding, reliability and usefulness for communication) and to replicate the effect of massed learning on the overestimation of one’s future performance after learning (Kornell and Bjork, 2008).

Studies 2 and 3 focus on processing fluency, another procedural factor which is regularly found to affect metacognitive judgements. We used two fluency manipulations: image size and readability of fonts. Study 2 used paintings in two different categories (expressionist and minimalist) and manipulated their size to induce experiences of fluency and disfluency. Size has been shown to influence metacognitive judgements (Undorf et al., 2017). Study 3 used verbal descriptions for fictional types of ants [adapted from Rehder and Hastie (2004)] and used font size to induce experiences of fluency and disfluency (Kaspar et al., 2015). In both studies the impact of fluency on predictions of one’s performance and on concept appraisal is explored.

Studies 4 and 5 turn to declarative factors. They test the possible impact of information about the structure of the newly learnt categories on concept appraisal. Here we adapted a design from Rehder and Hastie (2004) and explored the influence of the structure of the category (as having properties that are produced by a common cause, as having properties inducing a common effect, as having properties related in a causal chain, or as having properties with no causal relations as a control condition) on a series of questions about the categories. Do causal beliefs about the newly learnt categories influence the ways the concepts are appraised epistemically?

General Method

Inspired by research on category learning judgements (Jacoby et al., 2010), we adopted a method that has been used to manipulate various types of metacognitions associated with category learning (Wahlheim et al., 2011). An adaptation of this general method was used in Studies 1–5. These studies involved three main parts: a study phase, a concept appraisal phase, and a test phase.

The study phase was presented using either Microsoft Visual Basic version VB15x (Studies 1, 2, and 5) or Qualtrics (Qualtrics, 2018, Provo, UT, United States) (Studies 3 and 4). During the study phase, 5–12 exemplars from several categories were presented randomly to participants. All exemplars were presented on a computer monitor against a white background. Participants saw either visual (Studies 1 and 2) or written (Studies 3–5) exemplars, one at a time, together with the corresponding category name. After all exemplars had been presented, the same exemplars were presented again in another random order. Following the second presentation of exemplars, participants predicted the number of subsequent exemplars they would be able to classify correctly on a classification task, a measure known as category learning judgements (CLJs).

During the concept appraisal phase, participants completed several questions assessing dimensions of concept appraisal for each of the different categories studied. The full list of concept appraisal questions used throughout these studies is presented in Table 1. For Study 1, the induction dimension used in Thorne et al. (2021) was expanded into verbal and visual induction dimensions, leading to nine questions in all.

The test phase was presented using the same programme as the study phase. Participants were shown between 4 and 10 exemplars from each category (depending on the study), at least half of which had not been seen previously, and had to decide which category each exemplar belonged to. After each classification, participants provided a confidence judgement about their classification on a scale of 1 (very unconfident) to 5 (very confident). Participants were unable to change their classification judgements after making a selection. All exemplars appeared in a random order. Participants were given as much time as they needed to complete the concept appraisal questions and the test phase.

Study 1

In the first study we wanted to explore whether the spacing effect shown in the previous metacognition literature (Kornell et al., 2010) would influence the dimensions of concept appraisal. This is a metacognitive manipulation, where subjects presented with new categories to learn tend to be more confident in the effectiveness of their learning when the categories are presented in a massed way (Kornell and Bjork, 2008). This confidence, however, does not track their actual performance, as it has been shown that spaced learning is often more conducive to better recall (Son, 2004; Kornell and Bjork, 2008; Wahlheim et al., 2011). We wanted to test the hypothesis that the way in which categories are learned could influence subjects’ judgements about the dimensions of concept appraisal, with massed learning leading to more confident judgements, and greater ratings for understanding, reliability and communication.

Method

Participants

Forty-three participants (24 female, 18 male, and 1 unspecified), recruited as an opportunity sample from the recruitment pool at City, University of London, took part in this study in exchange for a small monetary reward. Two participants whose performance fell more than two standard deviations below the mean (20% or less correct) were excluded from the study, leaving N = 41 (Age 18–54; M_Age = 26.34). The sample size was sufficient to provide a power of 0.87 to detect a medium-sized effect (d = 0.5).¹

Design and Materials

To select stimuli for the study, we chose ten exemplars from each of the bird families used by Wahlheim et al. (2011), taken from images on www.whatbird.com. All presented images were 450 px by 450 px. Initially 12 bird families were selected (Chickadees, Finches, Flycatchers, Grosbeaks, Jays, Orioles, Sparrows, Swallows, Thrashers, Thrushes, Vireos, and Warblers). A pre-test (N = 33) obtained ratings of within-family similarity for each family of birds on scales of 0 (extremely dissimilar) to 100 (extremely similar) (see Supplementary Materials for further details). To ensure that the families of birds selected were relatively equal in terms of similarity, the six families with the most intermediate similarity ratings (Jays, Orioles, Sparrows, Swallows, Thrushes, and Vireos) were selected for use in this study. Ten exemplars from each of these bird categories were selected as stimuli for this study, five of which were presented in the study phase. There were some marginal differences between the six families of birds in terms of similarity [F(5, 160) = 2.30, p = 0.06, η² = 0.07]. The 30 exemplars used for the study phases were presented one at a time for 4 s in two blocks of 15 trials. These exemplars were then each presented a second time for 2 s each in another two blocks of 15 trials. Exemplars were randomised between blocks.

A single-factor (Study: Massed vs. Spaced) within-subjects design was implemented whereby participants learned six categories of birds, three of which were presented in a massed sequence and three in a spaced sequence. In the study phase, for massed blocks, participants were presented with study items from three categories, with five from the first, then five from the second and then five from the third. In spaced blocks, the 15 birds from the three categories were randomly ordered. The assignment of bird families to massed versus spaced conditions was balanced across participants, and the type of study presented first in the experiment was also counterbalanced. The blocks were presented in one of two orders, MSMS (N = 22) or SMSM (N = 21), where M refers to blocks in which the categories were presented in a massed fashion, and S refers to blocks in which the categories were presented in a spaced fashion. Blocks 3 and 4 used the same categories as Blocks 1 and 2, respectively. Exemplars were presented in a new random order for each participant. In the test phase, all ten exemplars from each category of birds were presented to participants for naming.
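The block structure described above can be illustrated with a short Python sketch. It is a minimal illustration and not the authors' Visual Basic implementation: the function names, the seed, the particular split of families into massed and spaced sets, and the random ordering of exemplars within a massed category are all assumptions made for the example.

```python
import random


def build_block(categories, n_per_category=5, mode="massed", rng=None):
    """Return one study block as a list of (category, exemplar_index) trials.

    Hypothetical helper: 'massed' keeps each category's exemplars together,
    'spaced' randomly interleaves all exemplars in the block.
    """
    if rng is None:
        rng = random.Random()
    if mode not in ("massed", "spaced"):
        raise ValueError("mode must be 'massed' or 'spaced'")
    trials = []
    for category in categories:
        items = [(category, i) for i in range(n_per_category)]
        rng.shuffle(items)  # assumed: random exemplar order within a category
        trials.extend(items)
    if mode == "spaced":
        rng.shuffle(trials)  # interleave the 15 exemplars at random
    return trials


def build_session(massed_families, spaced_families, order="MSMS", seed=0):
    """Build four blocks; Blocks 3 and 4 reuse the categories of Blocks 1 and 2."""
    rng = random.Random(seed)
    families = {"M": massed_families, "S": spaced_families}
    modes = {"M": "massed", "S": "spaced"}
    return [build_block(families[b], mode=modes[b], rng=rng) for b in order]


# Illustrative assignment of families to conditions (balanced across
# participants in the actual study).
session = build_session(
    ["Jays", "Orioles", "Sparrows"],
    ["Swallows", "Thrushes", "Vireos"],
    order="MSMS",
)
```

Each element of `session` is one block of 15 trials; passing `order="SMSM"` gives the counterbalanced sequence.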

Procedure

The procedure for this study followed the general method. During the study phase, exemplars were presented in either a massed or spaced fashion; exemplars from the six different categories of birds were presented in four blocks of fifteen trials. Following this study phase, participants made CLJs and answered the concept appraisal questions (see Table 1) for each category of birds. Finally, participants completed the test phase, in which the 30 old exemplars together with 30 new exemplars appeared in a random order and had to be assigned to the correct category. The test phase consisted of 60 trials with a break halfway through. Ethical approval for all studies in this manuscript was obtained from the Psychology Research Ethics Committee, City, University of London.

Results

The aim of this analysis was to determine the effect that the massed vs. spaced manipulation had on participants’ concept appraisals. First, we conducted analyses to determine the effectiveness of the manipulation on participants’ judgements. If our manipulation was effective, then, consistent with previous literature, in their CLJs participants would overestimate their future performance for massed categories but not for spaced categories.

Classification Performance

To examine the accuracy of participants’ predictions, in this and subsequent studies, CLJs (estimated performance) for massed and spaced categories were converted into a percentage and compared to the percentage of exemplars from these same categories that were correctly classified. Comparison of estimated (CLJ) and actual performance was treated as a within-subjects factor labelled Performance. The results are shown in the first panel in Figure 1. A 2 (Order: Massed first vs. Spaced first) × 2 (Performance: CLJ/Estimated vs. Actual) × 2 (Study: Massed vs. Spaced) mixed ANOVA revealed a main effect of Performance [F(1, 39) = 6.23, p = 0.02, η_p² = 0.14], indicating that participants overestimated their future performance on the task (CLJ: 57.7% vs. Actual performance: 50.4%). Further, there was a main effect of Study, with estimated and actual performance, taken together, higher for massed categories (M = 58.5%, SD = 14.9) than for spaced categories [M = 49.6%, SD = 14.3, F(1, 39) = 11.76, p = 0.001, η_p² = 0.23]. The benefits of massed categories were qualified by a significant interaction between Study and Performance [F(1, 38) = 17.68, p < 0.001, η_p² = 0.31]. Unlike in Wahlheim et al. (2011), whose method differed somewhat from ours, participants overestimated their performance only when the categories were presented in a massed fashion [t(40) = 4.62, p < 0.001, d = 0.77] and not when they were presented in a spaced fashion [t(40) = 0.27, p = 0.79, d = 0.05].
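The comparison of estimated and actual performance can be sketched in Python as follows. This is a minimal illustration rather than the authors' analysis code: it assumes that a CLJ is given as a predicted count out of the number of test exemplars per category (ten in Study 1), and the participant values and function names are hypothetical.

```python
def to_percent(count, total):
    """Convert a count out of `total` items into a percentage."""
    return 100.0 * count / total


def overestimation(clj_count, correct_count, n_test_items):
    """Estimated minus actual performance, in percentage points.

    Positive values mean the CLJ exceeded actual classification accuracy.
    """
    return to_percent(clj_count, n_test_items) - to_percent(correct_count, n_test_items)


# Hypothetical participant: predicts 7 of 10 correct, classifies 5 of 10 correctly.
bias = overestimation(clj_count=7, correct_count=5, n_test_items=10)

# Group-level gap implied by the means reported in the text (57.7% vs. 50.4%).
reported_gap = round(57.7 - 50.4, 1)
```

Here `bias` is 20.0 percentage points for the hypothetical participant, and `reported_gap` is 7.3, the overall overestimation implied by the reported means.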

Figure 1. Percent correct and estimated performance or category learning judgements (CLJ) in Studies 1–4. *Significant at p < 0.05.

Confidence Ratings

At test, participants rated confidence in each of their categorisation judgements on a scale from 1 to 5. Mean confidence was calculated for each condition. Means are shown as the first row in the top panel of Figure 2. A 2 (Order: Massed first vs. Spaced first) × 2 (Stimuli: Old vs. Novel) × 2 (Study: Massed vs. Spaced) mixed ANOVA revealed that participants were more confident classifying exemplars they had seen in the study phase (M = 3.31, SD = 0.74) than exemplars that were novel [M = 3.15, SD = 0.74; F(1, 39) = 15.93, p < 0.001, η_p² = 0.29]. However, there was no significant difference between massed and spaced categories [F(1, 39) = 0.96, p = 0.33, η_p² = 0.02], indicating that the manipulation did not have an effect on post-dictions. There were no other main effects or higher order effects from this analysis.
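The partial eta squared values reported alongside each F ratio can be recovered from the F value and its degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below is an illustrative consistency check, not part of the authors' analysis; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Old vs. novel exemplars: F(1, 39) = 15.93
eta_old_vs_novel = round(partial_eta_squared(15.93, 1, 39), 2)

# Massed vs. spaced categories: F(1, 39) = 0.96
eta_massed_vs_spaced = round(partial_eta_squared(0.96, 1, 39), 2)
```

Both values (0.29 and 0.02) match those reported for the confidence ratings.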

Figure 2. Average confidence at test, and concept appraisal for Studies 1–3. *Significant at p < 0.05.

Concept Appraisal

Having established that our manipulation was effective, we next tested whether the manipulation influenced responses to the nine concept appraisal questions. Results are also shown in the top panel of Figure 2. A 2 (Order: Massed first vs. Spaced first) × 2 (Study: Massed vs. Spaced) MANOVA was conducted with the nine questions on the metacognitive questionnaire as dependent variables. The MANOVA revealed no main effect of either Order [λ = 0.25, F(9, 31) = 1.14, p = 0.37, η_p² = 0.25] or Study [λ = 0.33, F(9, 31) = 1.72, p = 0.13, η_p² = 0.33], and no significant interaction [λ = 0.29, F(9, 31) = 1.37, p = 0.24, η_p² = 0.29] (Wilks Lambda reflects the proportion of variance not attributable to effects). Thus, we found no evidence that a massed vs. spaced manipulation that succeeded in modifying CLJs had any influence on either confidence judgements or judgements of concept appraisal.

How strong was the evidence for the null hypothesis here? Bayesian statistics for MANOVA are complex (Press, 1980), so a simplified approach was taken. Each of the nine F ratios for the univariate effects of the Study factor on the nine dependent variables was used to calculate a Bayes Factor using JASP (2021). Values below 1 indicate support for the null hypothesis. Values ranged from 0.18 to 1.6, with a median of 0.31. Seven of the nine were below 1, suggesting greater support for the null than for the alternative hypothesis.

Discussion

Contrary to Wahlheim et al. (2011), participants’ CLJs overestimated their performance for massed rather than for spaced presentations of the categories. This is likely owing to the difference in the methods used. Indeed, our result is consistent with the known influence of massed presentation on learning of new categories, where subjects think that they learn more when the stimuli are presented in a massed way (Kornell and Bjork, 2008; Kornell et al., 2010): we found that the spaced vs. massed learning manipulation leads to increased estimates of future performance (CLJ) when the items are presented in a massed way, even when these CLJs do not track actual performance. This effect is plausibly due, at least in part, to processing fluency: subjects experience more fluency in the massed learning condition than in the spaced learning condition, which is felt to be disfluent (Kornell et al., 2010; Wang and Xing, 2019). No corresponding increase was observed for postdictive confidence judgements or for the dimensions of concept appraisal. While this experiential manipulation of the way concepts are presented had an impact on the metacognitive evaluation of future performance, it appears not to have had an impact on the evaluation of the concepts themselves for their reliability or on the evaluation of the learner’s understanding of the concepts.

Study 2

The second study used image size (Undorf et al., 2017) to introduce a fluency manipulation in the classification of two schools of 20th Century paintings. As in Study 1, the aim was to assess the effect of this manipulation on metacognitive and appraisal judgements.