METHODS article

Front. Psychol., 04 August 2022
Sec. Quantitative Psychology and Measurement
This article is part of the Research Topic Persistence of Measurement Problems in Psychological Research.

The multiple indicator multiple cause model for cognitive neuroscience: An analytic tool which emphasizes the behavior in brain–behavior relationships

Adon F. G. Rosen1*, Emma Auger1, Nicholas Woodruff1, Alice Mado Proverbio2, Hairong Song1, Lauren E. Ethridge1, David Bard3
  • 1Department of Psychology, University of Oklahoma, Norman, OK, United States
  • 2Department of Psychology, University of Milan-Bicocca, Milan, Italy
  • 3Department of Pediatrics, University of Oklahoma Health Sciences Center, Oklahoma City, OK, United States

Cognitive neuroscience has inspired a number of methodological advances to extract the highest signal-to-noise ratio from neuroimaging data. Popular techniques used to summarize behavioral data include sum-scores and item response theory (IRT). While these techniques can be useful when applied appropriately, item dimensionality and the quality of information are often left unexplored, allowing poor performing items to be included in an itemset. The purpose of this study is to highlight how the application of two-stage approaches introduces parameter bias, how differential item functioning (DIF) can manifest in cognitive neuroscience data, and how techniques such as the multiple indicator multiple cause (MIMIC) model can identify and remove items with DIF and model these data with greater sensitivity for brain–behavior relationships. This was performed using a simulation and an empirical study. The simulation explores parameter bias in formative relationships across two separate techniques used to summarize behavioral data, sum-scores and IRT, compared with relationships estimated from a MIMIC model. In the empirical study, participants performed an emotional identification task while concurrent electroencephalogram data were acquired across 384 trials. Participants were asked to identify the emotion presented by a static face of a child across four categories: happy, neutral, discomfort, and distress. The primary outcomes of interest were P200 event-related potential (ERP) amplitude and latency within each emotion category. Instances of DIF related to correct emotion identification were explored with respect to an individual’s neurophysiology; specifically, an item’s difficulty and discrimination were explored with respect to an individual’s average P200 amplitude and latency using a MIMIC model. The MIMIC model’s sensitivity was then compared to popular two-stage approaches, in which cognitive performance summary scores, including sum-scores and an IRT model framework, were regressed onto the ERP characteristics. Here, sensitivity refers to the magnitude and significance of coefficients relating the brain to these behavioral outcomes. The first set of analyses displayed instances of DIF within all four emotions; the flagged items were removed from all further models. The next set of analyses compared the two-stage approaches with the MIMIC model. Only the MIMIC model identified any significant brain–behavior relationships. Taken together, these results indicate that item performance can be gleaned from subject-specific biomarkers, and that techniques such as the MIMIC model may be useful tools to derive complex item-level brain–behavior relationships.

Introduction

Obtaining the highest signal-to-noise ratio in neuroimaging data has encouraged rapid methodological development for cognitive neuroscientists, necessitated by the difficulty inherent to mapping the human brain, where a ground truth is inaccessible. In a similar vein, the quantification of cognitive traits lacks a ground truth as well. Cognitive neuroscientists typically employ workflows which minimize the influence of confounding variables in neuroimaging data; however, cognitive stimuli do not typically receive the same scrutiny. In one specific dimension of cognition, socio-emotional functioning, solutions to measuring cognition have been multipronged, such as ensuring participants are familiar with the testing environment and ensuring an adequate number of behavioral stimuli are obtained (Brooker et al., 2020). The multiple indicator multiple cause (MIMIC) model with itemset purification represents an additional step cognitive neuroscientists can employ to further ensure the highest quality of cognitive data are obtained. The MIMIC model is a systems-of-equations approach that combines both causal and measurement modeling. Causal modeling represents the end goal of most scientific endeavors, as it applies theory in a testable and strict manner (Rodgers, 2010). Measurement models are desirable because an inherent limitation of cognitive assessments is the influence of measurement error (Bollen, 1989b). Through the joint estimation of both a causal and a measurement model, the MIMIC model represents a unique analytic tool for cognitive neuroscience, as it ensures a more fine-grained assessment of behavior and a more tightly coupled brain–behavior causal model.

The application of measurement models is not novel for cognitive neuroscience. Examples exist linking intelligence to brain volume (Gignac and Bates, 2017), interlocked functional relationships across brain regions (Finn et al., 2015), and electroencephalogram characteristics (McKinney and Euler, 2019; Hakim et al., 2021). These studies typically utilize a two-stage approach where summary metrics of both behavioral and neural data are created using techniques such as sum-scores or principal components analysis, and brain–behavior relationships are then identified using a general linear model. One prominent example found within the magnetic resonance imaging literature is the FSL FEAT software, which estimates mass univariate statistics across the entire human brain using a general linear model (Woolrich et al., 2001). While the linear model has been a great success for mapping structural and functional underpinnings of behavior, techniques which jointly model both brain and behavior in a single system have become increasingly powerful for the identification of brain–behavior relationships.

Examples of techniques used to jointly model brain–behavior relationships include canonical correlation analysis (CCA; Wang et al., 2020) and partial least squares regression (PLS; Krishnan et al., 2011). These approaches seek to identify relationships across high-dimensional data by performing dimensionality reduction on one or both sets of data and then identifying components with the greatest covariance across sets of variables. However, both CCA and PLS reflect more exploratory analytic techniques, whereas the MIMIC model requires that a more confirmatory approach be applied. The confirmatory nature of the MIMIC model requires a set of theorized causal variables (i.e., brain) to be regressed onto a theorized latent trait (i.e., fluid intelligence) which is approximated by an additional set of indicator variables (i.e., behavior; see Figure 1A). Previous research has applied the MIMIC model to explore brain–behavior relationships (Kievit et al., 2011, 2012), allowing researchers to model an individual’s cognitive ability onto their brain volume. Further applications of the MIMIC model within cognitive neuroscience have allowed explorations into whether individual differences are better explained with group factors or continuous covariates (Zadelaar et al., 2019).

Figure 1. Graphical representation of a multiple indicator multiple cause (MIMIC) model and MIMIC models exploring differential item functioning (DIF). (A) Displays the MIMIC model, which is composed of a formative (causal) and a reflexive (measurement) model. (B) Displays the mechanism used to assess uniform DIF; notably, the mediator is the latent variable, which is believed to be the mechanism linking the causal and indicator variables. When the gamma path is not fully mediated, uniform DIF is present. (C) Displays the mechanism used to assess nonuniform DIF; notably, when the relationship between the latent variable and an individual indicator varies as a function of the causal variable, nonuniform DIF is present.

In order to underscore the benefits of the MIMIC model, the formative and reflexive components are first described in isolation from one another; the synthesis of these two approaches then highlights the benefit of the MIMIC model. The reflexive model will be described using a two-parameter item response theory (IRT) framework (Embretson and Reise, 2000):

$$p_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

In the above model, $p_i(\theta)$ is the probability of endorsement for an item (typically binary in nature) given an individual’s latent trait estimate $\theta$, $a_i$ is the item discrimination, and $b_i$ is the item difficulty. The formula highlights how, given a set of manifest variables, IRT estimates the probability of endorsing a binary item from that item’s discrimination and difficulty estimates. Greater discrimination values are desirable given their ability to differentiate ability more precisely; difficulty reflects the location on the latent trait at which the probability of endorsement is 50% for a binary item. These discrimination and difficulty parameters can be used to map out an item’s characteristic curve, which is a graphical representation of the amount of information (discrimination) and location (difficulty) of an individual item. When working with binary data, the logic of IRT extends beyond the formula to read as:

$$y_i = \begin{cases} 1 & \text{if } y_i^* > \tau_i \\ 0 & \text{otherwise} \end{cases}$$

where $\tau_i$ is a threshold parameter for the latent response $y_i^*$, and we assume that:

$$y_i^* = \lambda_i \eta + \epsilon_i$$

where $\lambda_i$ is a loading parameter, $\eta$ reflects an individual’s latent ability, and $\epsilon_i$ reflects the residual term. The major appeal of reflexive models for cognitive neuroscience is that these models incorporate measurement error, and they allow insights into the quality of the behavioral data in both the dimensionality and the information provided by the indicator variables.
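To make the two-parameter model concrete, the short base R sketch below traces item characteristic curves for two hypothetical items; the parameter values are purely illustrative and not drawn from this study.

```r
# Two-parameter logistic (2PL) item response function: probability of
# endorsing an item given latent trait theta, discrimination a, difficulty b.
p_2pl <- function(theta, a, b) {
  1 / (1 + exp(-a * (theta - b)))
}

# Hypothetical items: a discriminating, easy item vs. a weak, hard item
theta <- seq(-4, 4, by = 0.1)
plot(theta, p_2pl(theta, a = 2.0, b = -0.5), type = "l",
     xlab = "Latent trait (theta)", ylab = "P(endorsement)")
lines(theta, p_2pl(theta, a = 0.6, b = 1.0), lty = 2)
legend("topleft", legend = c("a = 2.0, b = -0.5", "a = 0.6, b = 1.0"),
       lty = c(1, 2))
```

The steeper curve carries more information (higher discrimination); its difficulty marks the trait value at which the endorsement probability crosses 50%.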

The formative model adheres to the following formulation:

$$\eta = \gamma' \mathbf{x} + \zeta$$

where $\gamma$ is a vector of regression coefficients, $\mathbf{x}$ is a $q \times 1$ vector of manifest random variables, with $q$ the number of observed variables, and $\zeta$ reflects the residual term. This formulation adheres to the underpinnings of most causal models, though it further implies a linear relationship (Muthén, 1985; Pearl, 2012).

The MIMIC model combines these into a system of equations resulting in the following formulation:

$$y_i^* = \lambda_i(\gamma' \mathbf{x} + \zeta) + \epsilon_i$$

The important distinction of this approach is the ability to incorporate residual error from both the formative and the measurement model, distinguishing the systems approach from either model applied in isolation. A further utility of the MIMIC model is the ability to explore the quality and consistency of the indicator variables when additional variables may be influencing the way individuals respond to items, a phenomenon referred to as differential item functioning (DIF).
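To make the systems-of-equations structure concrete, the sketch below specifies a MIMIC model in the R package lavaan on simulated toy data. The analyses in this paper were fit in MPlus; this lavaan translation, with invented variable names (y1–y6, x), is only a minimal sketch of the same structure.

```r
library(lavaan)
set.seed(4)

# Toy data: one causal variable x and six binary indicators y1-y6
# generated from a latent trait eta (all values illustrative).
n     <- 300
x     <- rnorm(n)
eta   <- 0.5 * x + rnorm(n)
ystar <- sapply(1:6, function(i) 0.8 * eta + rnorm(n))
yobs  <- as.data.frame((ystar > 0) * 1)
names(yobs) <- paste0("y", 1:6)
dat   <- cbind(yobs, x = x)

mimic_model <- '
  ability =~ y1 + y2 + y3 + y4 + y5 + y6   # reflexive (measurement) model
  ability ~ x                              # formative (causal) model
'

# `ordered` declares the indicators binary, so item thresholds
# (difficulties) are estimated alongside the loadings
fit <- sem(mimic_model, data = dat, ordered = paste0("y", 1:6))
summary(fit, standardized = TRUE)
```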

The second major benefit of the MIMIC model is the ability to isolate instances of DIF, which exist when an item’s characteristics (i.e., discrimination or difficulty) are influenced by a covariate of noninterest (e.g., gender or race). Two types of DIF exist: uniform and nonuniform. The former exists when only an item’s difficulty differs in relation to a nuisance variable; the latter describes instances where the discrimination (and possibly difficulty) varies in relation to a nuisance variable. The impacts of DIF have previously been explored using simulated data (Roznowski and Reith, 1999; Wells et al., 2002; Li and Zumbo, 2009). These findings indicate that as larger and more frequent instances of DIF arise, an individual’s latent trait estimate becomes more biased, which can have prominent impacts on downstream statistical conclusions such as inflating Type-1 error for group comparisons (Li and Zumbo, 2009). Examples of studies utilizing real data can be found for both educational (Drasgow, 1987) and cognitive assessments (Roznowski and Reith, 1999; Maller, 2001); across these studies the results are convergent, emphasizing how biased items may be present even when bias is not observable in the number of correct responses, and how these biases make it difficult to compare groups on a theorized unidimensional assessment.

The MIMIC model assesses DIF through the inclusion of a direct path from the causal variables onto the response patterns of an individual indicator variable (see Figure 1B). Allowing a direct path between the covariate of interest (i.e., brain volume) and the response patterns (i.e., correctly answering a question) permits differences in the item’s characteristics to be modeled after controlling for the latent ability. Through a mediation framework, DIF is present when this direct effect is not fully mediated (Montoya and Jeon, 2020). Another benefit of the mediation framework is that this technique allows the identification of DIF with a reduced number of observations when compared to other DIF identification techniques (Woods and Grimm, 2011; Cheng et al., 2016; Montoya and Jeon, 2020). Finally, the mediation model can be extended to incorporate a moderation to explore instances of nonuniform DIF (Figure 1C).
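Under the same toy setup, uniform DIF for a single item can be probed by adding a direct path from the covariate to that item’s responses, alongside the mediated route through the latent trait, as the following sketch (continuing the lavaan example above) illustrates:

```r
# Uniform DIF test for item y3: if the direct path y3 ~ x remains
# significant after the latent trait mediates x -> ability -> y3,
# uniform DIF is present for y3.
dif_model <- '
  ability =~ y1 + y2 + y3 + y4 + y5 + y6
  ability ~ x     # mediated (indirect) route through the latent trait
  y3 ~ x          # direct path; a nonzero estimate flags uniform DIF
'
dif_fit <- sem(dif_model, data = dat, ordered = paste0("y", 1:6))
summary(dif_fit)
```

In practice this test is repeated item by item, and a moderation term extends the same logic to nonuniform DIF.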

To outline the structure of this paper, we present a simulation study and an empirical study. The simulation study explores differences in estimated causal relationships when using two-stage approaches versus the MIMIC model. The empirical study has two goals: the first is to identify and explore instances of DIF in relation to neurophysiological data; the second is to illustrate that the MIMIC model affords greater sensitivity when trying to identify brain–behavior relationships.

Simulation study

Goals

A simulation was performed to explore the amount of bias introduced when defining formative relationships with a two-stage approach. Data were simulated using a MIMIC model drawing on characteristics similar to the empirical example found in this study. The simulated behavioral data were summarized using two methods: a two-parameter IRT and a sum-score based approach. These behavioral proxies were then regressed onto the simulated causal variables also drawn from the same MIMIC model. Differences between the population and estimated relationships are then explored.

Methods

Simulation conditions were varied in five ways, for a total of 144 conditions. The conditions included:

1. The number of examinees. This condition varied the sample size of the simulated study, ranging between a sample size which meets the minimum recommended sample size for a structural equation model exploration (n = 200) and a moderately powered exploration (n = 500). The minimum recommended sample size follows recommendations from Bollen (1989a), where it is recommended to have about five observations per freely estimated parameter. The moderately powered sample size follows more contemporary recommendations of roughly 10 observations per freely estimated parameter (Christopher Westland, 2010).

2. The strength of the indicator variables. The magnitude of the relationship between the binary indicator variables and the theorized latent variable (i.e., reflexive model) was varied between weak (Beta = 0.4) and strong (Beta = 0.8). The strength was selected separately for the even- and odd-numbered indicators, so in total four permutations of indicator strength were possible (see Figure 2A). This value represents the amount of information an indicator item shares with the latent trait. In an emotional identification setting, this can be thought of as a face which displays only a single emotion versus traits shared across multiple emotions.

3. Item intercept. This condition varied the item intercept thresholds, i.e., how high on the latent trait an examinee has to be to have a 50% probability of endorsement. Difficulties of screen items were drawn randomly from a uniform distribution ranging from [−1, 1] or [0, 2]. Note that screen item difficulties were never selected from a more difficult range (e.g., 1–3), because highly difficult screen items inevitably cause such an overwhelming loss of information that the simulations often failed for technical reasons. For example, highly difficult screen items will result in most examinees (rather than only some) endorsing none of the screens and therefore having response vectors of all 0s (non-endorsements; see Figure 2B). In an emotional identification task this can be extended to how much of an emotion is displayed: anecdotally, when an emotion is displayed with greater magnitude, more correct endorsements will be recorded, lowering the item’s intercept.

4. The magnitude of the causal relationship. The strength of the formative model included values from 0.2, 0.4, and 0.6 (see Figure 2A). The strength of the causal relationship would reflect the true relationship between the theorized brain–behavior relationship.

5. The method used to summarize the indicator variables. Indicator variables were summarized in one of three manners: sum-scores, IRT, and a MIMIC model. The sum-score approach took the sum of all endorsed items within each simulated participant. The IRT approach summarized the indicator variables with a unidimensional two-parameter IRT model trained using the “mirt” (Chalmers, 2012) package in R. The last approach used the same model with which the data were simulated: a MIMIC model.

Figure 2. Manipulation of MIMIC model for simulation component. (A) Details all possible values that can be sampled from within a single population model. These values include the relationship of the causal model indicated by 𝚪, the strength of the indicator variables indicated by 𝝠, and the intercept values indicated by 𝝡. (B) Details one example permutation with 𝚪 = 0.6, the odd 𝝠 = 0.8, the even 𝝠 = 0.4, and the 𝝡 is selected between 0:2 with a uniform distribution.

The above five conditions are summarized in Table 1. All simulated conditions used 20 indicator variables and one causal variable. All permutations were simulated 100 times. All analyses explored parameter bias (true minus estimated) using an ANOVA framework which included all main effects described above and all possible two-, three-, and four-way interactions. Parameter bias was estimated for the sum-score approach by calculating the difference between the population model’s causal estimate and the regression weight estimated when the z-scored sum-scores were regressed onto the causal variable. The parameter bias within the IRT framework was estimated by calculating the difference between the population model’s causal magnitude and the regression weight estimated when the ability estimates obtained from a two-parameter IRT model were regressed on the simulated causal variable. Finally, the parameter bias from the MIMIC model was obtained by taking the difference between the magnitude of the population causal relationship and the estimated causal relationship. All simulated datasets were created using MPlus (Muthén and Muthén, 2017); all models used for analysis were trained using R (R Core Team, 2020); all simulation code can be found online.1
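The following base R sketch reproduces the logic of one simulation cell with illustrative values (not the exact generating values used in the study): data are generated from a known formative coefficient, and the two-stage estimates are compared against it.

```r
set.seed(1)

n <- 500; n_item <- 20; gamma <- 0.6                 # illustrative cell

x   <- rnorm(n)                                      # causal variable
eta <- gamma * x + rnorm(n, sd = sqrt(1 - gamma^2))  # unit-variance latent trait

a <- runif(n_item, 0.8, 2.0)                         # discriminations
b <- runif(n_item, -1, 1)                            # difficulties
p <- plogis(sweep(outer(eta, b, "-"), 2, a, "*"))    # 2PL probabilities
y <- matrix(rbinom(n * n_item, 1, p), n, n_item)     # binary responses

# Two-stage, sum-score: regress z-scored sum-scores onto x
ss <- as.numeric(scale(rowSums(y)))
bias_ss <- gamma - coef(lm(ss ~ x))["x"]             # typically attenuated

# Two-stage, IRT: regress 2PL ability estimates onto x
library(mirt)
irt_fit <- mirt(as.data.frame(y), 1, itemtype = "2PL", verbose = FALSE)
theta_z <- as.numeric(scale(fscores(irt_fit)))
bias_irt <- gamma - coef(lm(theta_z ~ x))["x"]
```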

Table 1. Simulation conditions.

Results

Table 2 shows the results of an ANOVA relating the simulation conditions (plus all interactions) to parameter bias. All results are statistically significant, but note that statistical significance is substantially aided by the large number of simulations. Arguably, more meaning can be attached to the ANOVA results by focusing on effect sizes; Table 2 therefore includes eta squared and Cohen’s F. Among the main effects, the largest are for the method used to summarize the behavioral data (eta squared = 0.152; see Figure 3A) and the magnitude of the causal relationship (eta squared = 0.142; see Figure 3A). The smallest was for the sample size (eta squared = 0.001; see Figure 3A). The largest two-way interaction was between the method used to summarize the behavioral data and the magnitude of the causal relationship (eta squared = 0.071; see Figure 3B), indicating that all models performed similarly when the causal relationship was weaker, but bias increased much faster for both IRT and sum-scores as the causal relationship strengthened. The strongest three-way interaction extends this pattern to include the item intercept (eta squared = 0.001; see Figure 3C), indicating that bias is lower when items have difficulty values that encompass the majority of the ability distribution (−1:1) as opposed to more restricted difficulty items (0:2). Finally, the largest four-way interaction extends the three-way interaction to include the magnitude of the indicator loadings. Unsurprisingly, results indicate that a strong indicator set reduces bias across modeling techniques, but this four-way interaction also offers a cautionary note: when indicators are weak, sum-scores are used, and the causal relationship is strong, the bias was the strongest across all permutations, with the estimated effect being on average one-third lower than the population parameter (see Figure 3D).

Table 2. ANOVA results predicting by simulation condition.

Figure 3. Results from ANOVA comparing bias in parameter estimates. (A) Displays the main effects for all variables included in the ANOVA model; panels are faceted by variable, and the x-axis details the levels within each factor. (B) Displays the two-way interaction with the largest eta squared, between the method used to summarize the behavior scores (model) and the magnitude of the true formative relationship; results suggest near-equivalent performance across the models when a weak formative relationship is present, but as the relationship increases the MIMIC model’s bias remains much lower compared to that of the sum-score and item response theory (IRT) models. (C) Displays the three-way interaction with the largest eta squared, between the method used to summarize the behavior scores (model), the magnitude of the true formative relationship, and the range of difficulty of the items; results extend the logic of the two-way interaction but emphasize the reduction in bias when the difficulty parameters cover a greater majority of the range of ability estimates present in the data. (D) Displays the four-way interaction with the largest eta squared, between the method used to summarize the behavior scores (model), the magnitude of the true formative relationship, the range of the difficulty parameters, and the magnitude of the indicator variable strength.

Empirical study

Goals

The empirical study seeks to underscore how measurement issues and the techniques used to describe brain–behavior relationships can alter statistical conclusions. The first portion of the study explores whether DIF can be identified in a behavioral task in relation to neuroimaging data. The second portion seeks to identify brain–behavior relationships within this task.

Methods

Approach overview

The goals of this study were two-fold. First, DIF analyses were conducted for a set of emotional identification stimuli through a MIMIC framework. Second, brain–behavior relationships were contrasted across the two-stage approaches and the MIMIC model. These goals required several discrete tasks. First, EEG data were acquired from participants. Second, behavioral data were processed and prepared for an IRT analysis. Third, uniform and nonuniform DIF were assessed using MIMIC models. Fourth and finally, using items that did not display DIF, brain–behavior relationships were drawn between emotional identification and EEG phenotypes using the two separate two-stage approaches, and these results were compared against the MIMIC approach.

Acquisition of behavioral, demographic, and EEG data

EEG behavioral task

Data from 61 participants were acquired for this study. All participants were mothers participating in a larger study on the efficacy of home-based interventions for parent–child outcomes, including EEG. Table 3 displays demographic information for all participants. Every participant performed an emotional identification (iDemo) task which required assigning an emotion to the face of a child presented to the participant. The presented faces displayed one of four possible emotional facial expressions: happy, neutral, discomfort, or distress. Images were presented for 500 ms with a 1,000–1,500 ms inter-trial interval randomized across trials. After faces were presented, participants were instructed to answer which emotion the face displayed. Responses were recorded using a keyboard, using the A, S, K, and L keys. Participants had the entirety of the time between stimuli to select an answer; when multiple responses were recorded for a stimulus, the final selection was taken as the answer. There were a total of 24 faces shown within each emotion. The stimuli were counterbalanced for gender (2) and race (4), with 3 of each permutation included. The majority of images were from a validated infant/child database previously used in event-related potential (ERP) emotional processing tasks, with additional images selected from stock imagery to increase racial diversity commensurate with the participant sample (Proverbio et al., 2006, 2007). Additional images were matched in content, style, and luminance to database images. A total of 96 items were included in an entire cycle, and participants performed 4 cycles, so every participant had 384 possible responses. Total run time for each task was ~12.5 min. Data were treated as repeated measures, so a complete battery performance yielded a data set with four rows and 96 columns of observations.

Table 3. Demographic variables for empirical study.

EEG protocol and data processing

Event-related potential measurements were obtained continuously using a 128-channel EGI (Electrical Geodesics, Eugene, Oregon) mobile EEG system, referenced to vertex, filtered 0.01–200 Hz, and sampled at 1,000 Hz. Impedances were kept below 50 kOhm. Continuous EEG data were filtered 0.5–50 Hz and re-referenced to an average reference. Bad channels (maximum 5%) were interpolated using spherical spline interpolation available in BESA software (Brain Electrical Source Analysis, Grafelfing, Germany). Cardiac, eye movement, blink, and muscle-related artifacts were removed using Independent Component Analysis (ICA) in MATLAB (Delorme and Makeig, 2004). Artifact-free trials were then epoched from −250 to 750 ms around each face stimulus, and ERPs were produced by averaging recordings from 26 occipitotemporal electrode sites to best capture the topography of the ERP variable of interest, the P200. The P200 was defined as the largest positive deflection of the averaged waveform between 180 and 250 ms post-stimulus; amplitude and latency were measured at the peak of this deflection. The P200 ERP was chosen as the outcome for this model as it is one of the earliest ERP peaks associated with valenced emotion identification and discrimination (Han et al., 2021), and modulation of the P200 to emotional faces has been associated with emotional regulation skills in adults (Meaux et al., 2014).
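As an illustration of the peak-measurement rule, the following R sketch extracts P200 amplitude and latency from a toy averaged waveform; in practice the input would be a participant’s averaged occipitotemporal waveform rather than the simulated vector used here.

```r
set.seed(2)
times <- seq(-250, 749, by = 1)     # 1,000 Hz sampling; epoch in ms
erp   <- 5 * exp(-((times - 210)^2) / (2 * 30^2)) +  # toy P200-like bump
         rnorm(length(times), sd = 0.3)

window   <- times >= 180 & times <= 250   # P200 search window
peak_idx <- which.max(erp[window])        # largest positive deflection

p200_amplitude <- erp[window][peak_idx]   # peak amplitude (toy microvolts)
p200_latency   <- times[window][peak_idx] # latency in ms post-stimulus
```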

Preprocessing of behavioral data

Quality assurance was performed on the behavioral data to protect against missingness concerns and responses that occurred unreasonably quickly.

Missing responses

Participants who had more than half of the responses missing were marked as outliers.

Unreasonable response time

Responses that occurred <150 ms were marked as outliers and coded as NA values.

Multiple responses

If a participant provided multiple responses for a question, the last reported answer was selected as the recorded response. A sketch of these rules is given below.
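The following is a minimal sketch of these three quality-assurance rules, applied to a toy trial-level data frame with invented column names (subject, rt, response):

```r
library(dplyr)
set.seed(3)

# Toy trial-level data; in the real task each row is one stimulus
# presentation and `response` already retains the last keypress.
trials <- data.frame(
  subject  = rep(1:3, each = 4),
  rt       = c(90, 400, 620, 130, 350, 500, 410, 300, 220, 610, 140, 95),
  response = sample(c("A", "S", "K", "L"), 12, replace = TRUE)
)

qa <- trials %>%
  mutate(response = ifelse(rt < 150, NA, response)) %>%  # < 150 ms -> NA
  group_by(subject) %>%
  mutate(prop_missing = mean(is.na(response))) %>%
  ungroup()

# Participants with more than half of their responses missing are outliers
outliers <- qa %>% filter(prop_missing > 0.5) %>% distinct(subject)
```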

DIF identification

Items that exhibited DIF were identified following previously reported methodology (Montoya and Jeon, 2020). Briefly, this requires a mediation as well as a moderated mediation model to be trained for each item, across all items within an emotion. Models for uniform DIF were tested in a mediation framework using MPlus (Muthén and Muthén, 2017); analytic code can be found online (see Footnote 1). The mediation framework allows each item’s difficulty parameter to be modeled as a function of the covariates of interest; here the covariates of interest included the P200 latency, amplitude, and the interaction of these variables. This brute-force DIF analysis follows reported best-practice methodology for identifying items that exhibit DIF using a MIMIC model (Wang et al., 2009). When the association between the causal variable (e.g., P200 waveform characteristics, see Figure 4A) and the response for a single item was not fully mediated by the IRT latent factor, this suggested the presence of uniform DIF for the modeled item. The moderated mediation framework allows the path between the latent variable and the indicator variables (iDemo responses) to vary as a function of the causal variable (P200 waveform characteristics, see Figure 4B); this would suggest the information the indicator variable possesses varies systematically based on an individual’s neurophysiology. The outcome of interest for these models was the magnitude of the moderation between the latent variable and a specific indicator’s response. Any item which met statistical significance for uniform or nonuniform DIF was removed from all further analyses.

Figure 4. Results from DIF analyses. (A) Displays the mediation model used to identify all instances of uniform DIF within an emotion. (B) Displays the moderated mediation model used to identify all instances of nonuniform DIF within an emotion. (C) The resulting item characteristic curves (ICC) from all uniform DIF items, organized within emotion. The color of the ICC is determined by the individual’s interaction (amplitude by latency) magnitude. The color of the border displays the significance of the direct path from the interaction to an item’s response patterns after controlling for the latent variable. Uniform DIF results in changes in difficulty (intercept), displayed by parallel shifts of the item characteristic curve. (D) The resulting ICC from all nonuniform DIF items, organized within emotion. The color of the ICC is determined by the individual’s interaction (amplitude by latency) magnitude. The border displays the significance of the moderation between the individual’s interaction value and the latent trait onto an item’s response patterns after controlling for the latent variable. The nonuniform DIF analyses explore changes in discrimination (slope); instances where item characteristic curves are not parallel indicate nonuniform DIF. Facial images reproduced with permission from Proverbio et al. (2006).

Brain-to-behavior relationships

The focus is now on relating iDemo performance to the observed brain phenotypes. This was performed through three alternative techniques: sum-scores, IRT, and the MIMIC model. For these analyses, brain physiological estimates included the P200 ERP amplitude, latency, and the interaction between the amplitude and the latency. The indicator variables included all questions that did not exhibit any form of DIF. Within each emotion, one model was fitted; across the four fitted models, false discovery rate (FDR; Benjamini and Hochberg, 1995) correction was applied to each of the amplitude, latency, and interaction terms, respectively. Any statistical comparison highlighted has been corrected for four comparisons. Due to the nested nature of the data (i.e., multiple behavioral and neuroimaging measurements per individual), standard errors were corrected for possible residual correlations. The MIMIC model corrected for this by estimating the standard errors using the sandwich correction method implemented in MPlus; the two-stage approaches ignored the nested nature of the data, to further underscore the increased power of the MIMIC model. In order to compare the behavioral performance of the three techniques, correlations were calculated across the three iDemo performance summary values; to maintain a similar scale, the sum-scores were z-scored prior to any comparisons. Next, in order to compare the strength of the relationships drawn across these techniques, the magnitude and significance of the estimated coefficients when the iDemo performance was regressed onto the brain outputs (ERP waveform characteristics) were compared across the three techniques.
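For reference, the two-stage arms of this comparison reduce to familiar R code. The sketch below uses hypothetical stand-ins (a purified binary item matrix y for one emotion, and participant-level amp and lat for the P200 characteristics) and is not the study’s analysis script.

```r
library(mirt)
set.seed(5)

# Hypothetical stand-ins for one emotion's purified data
n   <- 61
y   <- matrix(rbinom(n * 12, 1, 0.7), n, 12)
amp <- rnorm(n); lat <- rnorm(n)

# Two-stage summaries of iDemo performance
ss      <- as.numeric(scale(rowSums(y)))              # z-scored sum-scores
irt_fit <- mirt(as.data.frame(y), 1, itemtype = "2PL", verbose = FALSE)
theta   <- as.numeric(fscores(irt_fit))               # IRT ability estimates

# Regress each summary on amplitude, latency, and their interaction
fit_ss  <- lm(ss ~ amp * lat)
fit_irt <- lm(theta ~ amp * lat)

# FDR correction of a given term's p-values across the four
# emotion-specific models (illustrative p-values shown)
p_amp <- c(0.02, 0.30, 0.11, 0.04)
q_amp <- p.adjust(p_amp, method = "fdr")
```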

Results

Missing data

The mean and SD of trial observations for the iDemo responses and complete EEG time series can be found in Table 4. The lowest average response count for the iDemo task was recorded for the neutral stimuli (mean response count = 21.97), indicating that participants on average reported nearly 22 responses out of 24 possible presentations. The EEG results suggest the emotion with the lowest average time series recording count was discomfort (mean time series count = 23.1).

Table 4. Average number of observations per participant per iDemo administration.

Uniform DIF

Uniform DIF was tested through a mediation framework; when the effect of the causal variable (P200 waveform) on the indicator responses was not fully mediated by the latent trait, uniform DIF exists (see Figure 1B). In total, seven items displayed uniform DIF, with the results ranging in both magnitude and direction. One distress item displayed uniform DIF; the direction suggested that individuals with a larger interaction between the P200 amplitude and latency had a lower difficulty than individuals with lower interaction magnitudes (βdif = 0.504, t-statistic = 2.792, p = 0.005; see Figure 4C). One discomfort item displayed uniform DIF, and the direction of the effect was opposite to that observed in the distress item (βdif = −0.304, t-statistic = −2.165, p = 0.030; see Figure 4C); individuals with lower-magnitude interaction terms displayed larger difficulty values than individuals with greater interaction terms. Two neutral items displayed uniform DIF and were incongruent in the direction of the effect: the first effect suggested individuals with smaller interaction terms had greater difficulty (βdif = −0.712, t-statistic = −2.563, p = 0.010; see Figure 4C), whereas the second suggested the opposite effect (βdif = 0.426, t-statistic = 2.089, p = 0.037; see Figure 4C). Finally, the happy items displayed three instances of uniform DIF. Of the three items, two suggested that difficulty estimates were greater in individuals who had lower interaction magnitudes (βdif = −0.503, t-statistic = −2.036, p = 0.042; βdif = −0.938, t-statistic = −3.443, p = 0.001; see Figure 4C), and the third item displayed the opposite effect (βdif = 0.511, t-statistic = 2.359, p = 0.018; see Figure 4C).

Nonuniform DIF

Nonuniform DIF was tested through a moderated mediation framework assessing whether the information an item possesses about a latent trait (discrimination) varies as a function of the causal variables (see Figure 1C). In total, nine items displayed nonuniform DIF, and results varied in both direction and magnitude. Three distress items displayed nonuniform DIF; two of the items suggested that the discrimination parameter increased for individuals with greater magnitude of the interaction term (βdis = 0.949, t-statistic = 2.440, p = 0.015; βdis = 0.738, t-statistic = 2.863, p = 0.004; see Figure 4D), whereas one item displayed the opposite effect, suggesting that as the interaction term increased, the information the item possesses (about the latent factor) decreased (βdis = −0.661, t-statistic = −2.271, p = 0.023; see Figure 4D). Three neutral items displayed nonuniform DIF: two of the items suggested that the discrimination parameter increased for individuals with greater magnitude of the interaction term (βdis = 1.051, t-statistic = 2.880, p = 0.004; βdis = 0.790, t-statistic = 2.400, p = 0.016; see Figure 4D), whereas one item displayed the opposite effect (βdis = −1.133, t-statistic = −2.992, p = 0.003; see Figure 4D). Three items from the happy paradigm displayed nonuniform DIF, with two of the items displaying negative relationships between the interaction term and the discrimination (βdis = −0.728, t-statistic = −2.829, p = 0.005; βdis = −0.504, t-statistic = −2.287, p = 0.022; see Figure 4D); the remaining item showed a positive relationship between the interaction term and the magnitude of the discrimination (βdis = 0.995, t-statistic = 3.314, p = 0.001).

Brain and behavior relationships

The final set of analyses sought to compare the separate two-stage approaches with the MIMIC model, both in differences across the summaries of the behavioral data and in the estimated brain–behavior relationships using the purified itemset. Differences across these techniques in the summary of iDemo performance are first explored using correlations (see Figure 5). The sum-scores displayed the lowest overall correlation with the IRT approach (roverall = 0.872), with the minimum correlation observed in the neutral paradigm (rneutral = 0.843) and the largest in the discomfort paradigm (rdiscomfort = 0.918). The sum-score approach displayed a greater overall relationship with the MIMIC model (roverall = 0.902); within the emotions, the lowest correlation was observed between the distress performance summary metrics (rdistress = 0.873), and the largest was again observed in the discomfort paradigm (rdiscomfort = 0.932). Finally, the largest overall relationship was observed between the MIMIC model and the IRT approach (roverall = 0.950); the lowest correlation was observed in the happy paradigm (rhappy = 0.938), and the largest in the neutral paradigm (rneutral = 0.968). All of these reported correlations are significant at p < 0.005.

Figure 5. Comparison of MIMIC model and two-stage results. (A) Displayed are correlations of iDemo battery summarization, which include the two-stage sum-scores, the two-stage IRT, and the MIMIC model. (B) The magnitude of the formative relationships is plotted (±SE), with significant effects distinguished from nonsignificant effects by the bar’s fill.

Next, the magnitude and significance of the brain–behavior relationships were explored and compared across all three methods. Two significant effects were observed after FDR correction: the P200 amplitude displayed a positive effect on neutral iDemo performance when estimated within the MIMIC model [βamp = 0.153, t(50) = 2.60, q-value = 0.04; CFI = 0.907, rmsea = 0.052], and the interaction term displayed a significant negative effect on discomfort iDemo performance when estimated within the MIMIC model [βint = −0.169, t(50) = −2.39, q-value = 0.02; CFI = 0.952, rmsea = 0.053]. When these effects were estimated using the two-stage approaches, the directions of the effects agreed but the relationships were not significant at an alpha level of 0.05.

Discussion

In this paper we present an alternative technique, the MIMIC model, which allows cognitive neuroscientists to fine-tune behavioral data toward specific anatomical or physiological neural data. Beginning with a simulation study, the ability to recover theorized formative relationships was compared across the two-stage and MIMIC approaches. Results indicate increased bias in the two-stage approaches, underscoring the loss of information when brain and behavior are summarized in isolation. An empirical study was then performed to explore two separate issues underlying the estimation of brain–behavior relationships: the first is that item sets may show undesirable behavior with respect to an individual’s neurophysiology, and the second is that the MIMIC model displays superior performance for the identification of brain–behavior relationships. Through this workflow we have highlighted differences in statistical conclusions when comparing the MIMIC model with two-stage approaches.

Greater specificity for formative relationships

The MIMIC model is a systems-of-equations approach which can perform a task similar in nature to that of CCA and PLS, but allows statistical tests to be performed on both the individual paths within a model and the entire model itself. The benefit of the systems approach is the reduction in bias, as highlighted by the simulation component of this study. One of the strongest predictors in the ANOVA was the magnitude of the formative relationship: as the theorized brain–behavior relationship increases in magnitude, the two-stage approaches accumulate bias much faster than the MIMIC model. This is important because the range of reported effect sizes within a single modality (volume) predicting general cognition is very large, ranging between 3% and greater than 30% of the total variation explained (Gur et al., 2021). Taking the most extreme instance, where an R2 explains roughly 30% of the variance, which reflects a large effect size in the behavioral sciences, reliance on typical two-stage approaches may underestimate this already large effect. Taken together, the MIMIC model is a versatile modeling technique which is potentially more resilient to the intricacies of modeling strong brain–behavior relationships.

Instances of DIF in relation to neuroimaging data

Across the field of neuroimaging, the quality of the physiological and anatomical data has received considerable attention. Approaches for identifying motion and controlling for the impact of motion on MRI images exist for anatomical (Rosen et al., 2018), functional (Ciric et al., 2018), and diffusion-based analyses (Baum et al., 2018). Similarly, techniques to control for confounding influences of motion exist for EEG data (Liu et al., 2019), as does guidance on ensuring participants are acclimated to the lab testing environment (Brooker et al., 2020). This study highlights how behavioral data can suffer from methodological confounders similar to those found in neuroimaging data. The presence of DIF with respect to an individual’s physiological characteristics highlights the importance of assessing the quality of behavioral data across a range of individual characteristics. The motivation for the exploration of DIF is to increase the precision of the latent trait estimates (Rupp and Zumbo, 2006). Such explorations depend on ensuring the dimensionality of the behavioral data is consistent with the models being imposed upon it (Millsap, 2007). That is, when DIF exists, unaccounted-for latent variables are influencing the response of an indicator; in relation to an individual’s neuroimaging data, this suggests the P200 waveform may influence more than a single domain of emotional identification in DIF items. Furthermore, a meta-analysis which explored general cognitive relationships with brain volume concluded that considerable variation in the reported effect sizes can be explained by the quality of the behavioral data (Gignac and Bates, 2017). Performing the DIF analysis in relation to the neuroimaging data ensures that the measurement component is tightly coupled to the outcome of interest.

Typically, DIF studies follow a very structured framework where purification of itemsets attempts to protect against demographic differences, historically variables such as gender, race, or age. These demographic variables are typically controlled for in cognitive neuroscience studies, but protecting against these group differences does not ensure high-quality behavioral data. Few commonly used techniques can identify DIF with respect to continuous covariates (Bauer, 2017). The methodology presented here can incorporate typical nuisance variables, but also ensures the outcome of interest is finely tuned to the independent variable of interest.

Multiple outcomes for neuroimaging and neurophysiology data

One of the issues of working with neuroimaging data is the proliferation of independent variables. Specific to EEG, and through an ERP framework, a single waveform possesses both latency and amplitude; moreover, specific to emotional identification and face expression, a number of waveforms have been used, including the N170, the P200, and others. Multiple techniques have been applied to deal with the number of possible predictors as well as the interrelationships these predictors share. For instance, techniques which have been used to explore functional relationships across neuroimaging and behavioral data include joint ICA (Calhoun et al., 2009) and joint and individual variation explained (Yu et al., 2017), both examples beyond the already mentioned CCA and PLS. While all of these techniques have their appeals and drawbacks, two major limitations consistent across all of them are the inability to test parametric relationships (path analysis) and the inability to perform model comparisons (Rodgers, 2010). The MIMIC model satisfies these two limitations, albeit requiring a more theory-driven perspective on the data than techniques such as CCA. Through the presently presented ERP framework, these techniques can be used within the typical EEG analytic workflow. We have highlighted here how, even within a single calculated ERP waveform, multiple outcomes can be used. The utilization of SEM has seen considerable interest when working with the high-dimensional data that are the hallmark of neuroimaging studies. For instance, Bolt et al. addressed the limitations of a region-of-interest based approach by incorporating the hierarchical nature of the brain into an SEM-based approach (Bolt et al., 2018). This approach is flexible to the number of ROIs possible, can account for interrelations across these regions and, most importantly, allows for the estimation of brain–behavior relationships within a single model. Similar approaches have been pursued using EEG and behavioral data (Grandy et al., 2013).

Improved statistical power of MIMIC model in relation to two-stage approach

One of the major highlights from the analyses presented in this study is the strength of the relationship drawn between brain and behavior in the MIMIC model when compared to the two-stage approaches. Emotion identification is a field of study with a strong literature backing the neural underpinnings of task performance. Relationships have been studied using EEG data (Bentin and Deouell, 2000; Schupp et al., 2006; Curtis and Cicchetti, 2011; Nemrodov et al., 2018) and functional magnetic resonance imaging (Gur et al., 2002), all of which are supported by behavioral explorations (Ekman, 1992; Erwin et al., 1992; Indersmitten and Gur, 2003; Ciarrochi et al., 2008). This study displays relationships between the P200 waveform and emotional identification capabilities. The results were specific to the lower intensity emotions (i.e., neutral and discomfort) and to various characteristics of the waveform, such as the amplitude for neutral faces and the interaction of amplitude and latency for discomfort faces. The P200 amplitude showed a positive relationship with emotional identification capabilities for neutral faces, suggesting that larger P200 waveforms relate to better identification performance. This is in line with previous reports detailing improved attention to emotional stimuli as the magnitude of the P200 waveform increases (Schupp et al., 2006). The second significant finding details an interaction between the P200 amplitude and latency, whereby smaller interaction values relate to improved identification capabilities for the discomfort paradigm. Smaller, or negative, interaction terms are produced by either large-amplitude, short-latency waveforms or small-amplitude, long-latency waveforms. The interaction between amplitude and latency is not regularly explored in ERP analyses, although distinctions between processing time and amplitude do receive attention across emotional paradigms. One example includes distinctions between angry and happy faces, where anger elicits quicker and smaller P200 characteristics compared with happy stimuli (Ding et al., 2017). The discomfort paradigm reflects a less intense negative emotion; however, composites typical of prototypical anger identification were related to improved identification. That is, while angry faces receive a short time-to-peak P200, this reflected one mechanism for successful identification of discomfort faces when paired with large amplitudes (relative to the mean of this sample); the alternative (long latency, low amplitude) reflects a relatively novel finding in the EEG literature in terms of successful emotion identification.

Given the underlying theory, it is worth noting the lack of even nominal significance from the two-stage approaches in a dataset which violates the assumptions of linear regression (correlated errors). Even with this violation, which inflates Type-1 error, the two-stage models still failed to identify a significant effect in what is a theoretically motivated relationship. Much like CCA, the MIMIC model finds the linear combinations which maximize the relationship between the manifest variables and the estimated latent trait; this increase in the magnitude of estimation is displayed by the significant (q < 0.05) effects. To further illustrate this benefit of the systems of equations, compare the component solutions derived from a PCA with those derived from a CCA. The estimation of the CCA solution requires that the correlations between the individual component solutions be maximized; accordingly, the correlations across components will be greater in the CCA framework than in the PCA framework. The formulation of the MIMIC model follows a similar logic, where the relationship between the latent variable and the causal variables is maximized. The directions of the effects derived from the MIMIC model and the two-stage approaches all agree. The major appeal comes from a reduction in the standard error and an increase in the parameter magnitude, which leads to two statistically significant effects using the MIMIC model after false discovery rate correction. Furthermore, the MIMIC model allows measurement error to be removed from the variance of the latent trait, allowing the parameters to be constrained within some sense of the true variance. Finally, it is worth noting that across the three approaches the parameter directions all agreed with one another, further underscoring that the effects were similar but statistical power is improved through the MIMIC approach.

Limitations

The limitations of the simulation study were the relatively narrow parameters used to simulate data as well as the mechanism used to simulate the data. The parameters and effect sizes sampled were drawn from the empirical portion of this study, with a specific focus on emotional identification. The second limitation is the single method used to simulate data, the MIMIC model; future studies should explore alternative techniques to simulate data.

Limitations of the empirical study include a limited sample size with repeated measures. However, the intraindividual variability of the behavioral responses remained low, whereas variability across individuals remained high. The participant sample was also drawn from a population of parents undergoing home-based parenting interventions, which may limit the generalizability of the specific brain–behavior relationships described here. The home-visit nature of the EEG acquisition also required more aggressive preprocessing techniques. The number of dependent variables was also limited to outcomes suggested by the literature to be of greatest relation to performance on emotional face identification across valence and intensity (Meaux et al., 2014; Han et al., 2021); while the MIMIC model can incorporate a larger number of causal variables, the selected variables were limited to best compare the performance of regression (across all three summary measurement approaches) with respect to a set of theoretically validated ERP components. Finally, all approaches relied upon the null hypothesis significance test, which assesses whether the parameters differ from zero; future researchers should apply a more theory-driven assessment of models, comparing model parameter estimates with models presented within the field.

Conclusion

This study sought to display the utility of the MIMIC model for cognitive neuroscientists. The simulation component underscores how formative relationships are best captured in a systems-of-equations approach when compared to a two-stage approach. An empirical study was presented to underscore two benefits of the MIMIC model: the first is the ability to explore for DIF in itemsets, and the second is its superior sensitivity to theorized brain–behavior relationships. The former point is important because the quality and consistency of cognitive data do not receive the same attention in the typical workflow as neuroimaging data, limiting the parsimony of results. The latter point highlights the increased sensitivity of the MIMIC model for identifying brain–behavior relationships even when working with a limited sample size. Moving forward, it is the authors’ recommendation that the MIMIC model be used to ensure the highest quality of behavioral data and the strongest brain–behavior relationships are obtained within cognitive neuroscience explorations.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving human participants were reviewed and approved by the University of Oklahoma Health Science Center Institutional Review Board. The patients/participants provided their written informed consent to participate in this study.

Author contributions

AR aided in study design, analyzed data, and wrote manuscript. EA analyzed data. NW acquired data. AP provided behavioral stimuli. HS aided in design and analyses. LE oversaw acquisition and processing of neuroimaging data. DB oversaw study design, data analyses, and manuscript preparation. All authors contributed to the article and approved the submitted version.

Funding

Support of this work was partly funded by the Maternal, Infant, and Early Childhood Home Visiting Grant Program by the Health Resources and Services Administration (grant numbers: UH4MC30745 and D89MC28275).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Footnotes

References

Bauer, D. J. (2017). A more general model for testing measurement invariance and differential item functioning. Psychol. Methods 22, 507–526. doi: 10.1037/met0000077

Baum, G. L., Roalf, D. R., Cook, P. A., Ciric, R., Rosen, A. F. G., Xia, C., et al. (2018). The impact of in-scanner head motion on structural connectivity derived from diffusion MRI. NeuroImage 173, 275–286. doi: 10.1016/j.neuroimage.2018.02.041

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc., B: Stat. Methodol. 57, 289–300. doi: 10.1111/j.2517-6161.1995.tb02031.x

Bentin, S., and Deouell, L. Y. (2000). Structural encoding and identification in face processing: Erp evidence for separate mechanisms. Cogn. Neuropsychol. 17, 35–55. doi: 10.1080/026432900380472

PubMed Abstract | CrossRef Full Text | Google Scholar

Bollen, K. A. (1989a). “Structural equation models with observed variables,” in Structural Equations With Latent Variables. (Wiley Online Books), 80–150.

Google Scholar

Bollen, K. A. (1989b). “The consequences of measurement error,” in Structural Equations with Latent Variables (John Wiley & Sons Ltd.), 151–178.

Google Scholar

Bolt, T., Prince, E. B., Nomi, J. S., Messinger, D., Llabre, M. M., and Uddin, L. Q. (2018). Combining region- and network-level brain-behavior relationships in a structural equation model. NeuroImage 165, 158–169. doi: 10.1016/j.neuroimage.2017.10.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Brooker, R. J., Bates, J. E., Buss, K. A., Canen, M. J., Dennis-Tiwary, T. A., Gatzke-Kopp, L. M., et al. (2020). Conducting event-related potential (ERP) research with young children: A review of components, special considerations, and recommendations for research on cognition and emotion. J. Psychophysiol. 34, 137–158. doi: 10.1027/0269-8803/a000243

PubMed Abstract | CrossRef Full Text | Google Scholar

Calhoun, V. D., Liu, J., and Adali, T. (2009). A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage 45, S163–S172. doi: 10.1016/j.neuroimage.2008.10.057

PubMed Abstract | CrossRef Full Text | Google Scholar

Chalmers, R. P. (2012). Mirt: A multidimensional item response theory package for the R environment. J. Stat. Softw. 48, 1–29. doi: 10.18637/jss.v048.i06

CrossRef Full Text | Google Scholar

Cheng, Y., Shao, C., and Lathrop, Q. N. (2016). The mediated MIMIC model for understanding the underlying mechanism of DIF. Educ. Psychol. Meas. 76, 43–63. doi: 10.1177/0013164415576187

PubMed Abstract | CrossRef Full Text | Google Scholar

Christopher Westland, J. (2010). Lower bounds on sample size in structural equation modeling. Electron. Commer. Res. Appl. 9, 476–487. doi: 10.1016/j.elerap.2010.07.003

CrossRef Full Text | Google Scholar

Ciarrochi, J., Heaven, P. C. L., and Supavadeeprasit, S. (2008). The link between emotion identification skills and socio-emotional functioning in early adolescence: A 1-year longitudinal study. J. Adolesc. 31, 565–582. doi: 10.1016/j.adolescence.2007.10.004

PubMed Abstract | CrossRef Full Text | Google Scholar

Ciric, R., Rosen, A. F. G., Erus, G., Cieslak, M., Adebimpe, A., Cook, P. A., et al. (2018). Mitigating head motion artifact in functional connectivity MRI. Nat. Protoc. 13, 2801–2826. doi: 10.1038/s41596-018-0065-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Curtis, W. J., and Cicchetti, D. (2011). Affective facial expression processing in young children who have experienced maltreatment during the first year of life: An event-related potential study. Dev. Psychopathol. 23, 373–395. doi: 10.1017/S0954579411000125

PubMed Abstract | CrossRef Full Text | Google Scholar

Delorme, A., and Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods 134, 9–21. doi: 10.1016/j.jneumeth.2003.10.009

CrossRef Full Text | Google Scholar

Ding, R., Li, P., Wang, W., and Luo, W. (2017). Emotion processing by ERP combined with development and plasticity. Neural Plast. 2017, 5282670–5282715. doi: 10.1155/2017/5282670

PubMed Abstract | CrossRef Full Text | Google Scholar

Drasgow, F. (1987). Study of the measurement bias of two standardized psychological tests. J. Appl. Psychol. 72, 19–29. doi: 10.1037/0021-9010.72.1.19

CrossRef Full Text | Google Scholar

Ekman, P. (1992). Are there basic emotions? Psychol. Rev. 99, 550–553. doi: 10.1037/0033-295X.99.3.550

CrossRef Full Text | Google Scholar

Embretson, S. E., and Reise, S. P. (2000). Item Response Theory. (1st ed.). Psychology Press.

Google Scholar

Erwin, R. J., Gur, R. C., Gur, R. E., Skolnick, B., Mawhinney-Hee, M., and Smailis, J. (1992). Facial emotion discrimination: I. task construction and behavioral findings in normal subjects. Psychiatry Res. 42, 231–240. doi: 10.1016/0165-1781(92)90115-J

PubMed Abstract | CrossRef Full Text | Google Scholar

Finn, E. S., Shen, X., Scheinost, D., Rosenberg, M. D., Huang, J., Chun, M. M., et al. (2015). Functional connectome fingerprinting: identifying individuals using patterns of brain connectivity. Nat. Neurosci. 18, 1664–1671. doi: 10.1038/nn.4135

PubMed Abstract | CrossRef Full Text | Google Scholar

Gignac, G. E., and Bates, T. C. (2017). Brain volume and intelligence: The moderating role of intelligence measurement quality. Intelligence 64, 18–29. doi: 10.1016/j.intell.2017.06.004

CrossRef Full Text | Google Scholar

Grandy, T. H., Werkle-Bergner, M., Chicherio, C., Lövdén, M., Schmiedek, F., and Lindenberger, U. (2013). Individual alpha peak frequency is related to latent factors of general cognitive abilities. NeuroImage 79, 10–18. doi: 10.1016/j.neuroimage.2013.04.059

PubMed Abstract | CrossRef Full Text | Google Scholar

Gur, R. C., Butler, E. R., Moore, T. M., Rosen, A. F. G., Ruparel, K., Satterthwaite, T. D., et al. (2021). Structural and functional brain parameters related to cognitive performance across development: replication and extension of the Parieto-frontal integration theory in a single sample. Cereb. Cortex 31, 1444–1463. doi: 10.1093/cercor/bhaa282

PubMed Abstract | CrossRef Full Text | Google Scholar

Gur, R. C., Schroeder, L., Turner, T., McGrath, C., Chan, R. M., Turetsky, B. I., et al. (2002). Brain activation during facial emotion processing. NeuroImage 16, 651–662. doi: 10.1006/nimg.2002.1097

CrossRef Full Text | Google Scholar

Hakim, N., Awh, E., Vogel, E. K., and Rosenberg, M. D. (2021). Inter-electrode correlations measured with EEG predict individual differences in cognitive ability. Curr. Biol. 31, 4998.e6–5008.e6. doi: 10.1016/j.cub.2021.09.036

PubMed Abstract | CrossRef Full Text | Google Scholar

Han, S., Hu, J., Li, W., Zhao, S., Chen, M., Xu, P., et al. (2021). From structure to concepts: The two stages of facial expression recognition. Neuropsychol 150:107700. doi: 10.1016/j.neuropsychologia.2020.107700

CrossRef Full Text | Google Scholar

Indersmitten, T., and Gur, R. C. (2003). Emotion processing in chimeric faces: hemispheric asymmetries in expression and recognition of emotions. J. Neurosci. 23, 3820–3825. doi: 10.1523/JNEUROSCI.23-09-03820.2003

PubMed Abstract | CrossRef Full Text | Google Scholar

Kievit, R. A., Romeijn, J.-W., Waldorp, L. J., Wicherts, J. M., Scholte, H. S., and Borsboom, D. (2011). Modeling mind and matter: reductionism and psychological measurement in cognitive neuroscience. Psychol. Inq. 22, 139–157. doi: 10.1080/1047840X.2011.567962

CrossRef Full Text | Google Scholar

Kievit, R. A., van Rooijen, H., Wicherts, J. M., Waldorp, L. J., Kan, K.-J., Scholte, H. S., et al. (2012). Intelligence and the brain: A model-based approach. Cogn. Neurosci. 3, 89–97. doi: 10.1080/17588928.2011.628383

CrossRef Full Text | Google Scholar

Krishnan, A., Williams, L. J., McIntosh, A. R., and Abdi, H. (2011). Partial least squares (PLS) methods for neuroimaging: A tutorial and review. NeuroImage 56, 455–475. doi: 10.1016/j.neuroimage.2010.07.034

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Z., and Zumbo, B. D. (2009). Impact of differential item functioning on subsequent statistical conclusions based on observed test score data. Psicológica 30, 343–370.

Google Scholar

Liu, D., Wang, Q., Zhang, Y., Liu, X., Lu, J., and Sun, J. (2019). A study on quality assessment of the surface EEG signal based on fuzzy comprehensive evaluation method. Comput. Assist. Surg. 24, 167–173. doi: 10.1080/24699322.2018.1557888

PubMed Abstract | CrossRef Full Text | Google Scholar

Maller, S. J. (2001). Differential item functioning in the Wisc-III: item parameters for boys and girls in the National Standardization Sample. Educ. Psychol. Meas. 61, 793–817. doi: 10.1177/00131640121971527

CrossRef Full Text | Google Scholar

McKinney, T. L., and Euler, M. J. (2019). Neural anticipatory mechanisms predict faster reaction times and higher fluid intelligence. Psychophysiology 56:e13426. doi: 10.1111/psyp.13426

PubMed Abstract | CrossRef Full Text | Google Scholar

Meaux, E., Roux, S., and Batty, M. (2014). Early visual ERPs are influenced by individual emotional skills. Soc. Cogn. Affect. Neurosci. 9, 1089–1098. doi: 10.1093/scan/nst084

CrossRef Full Text | Google Scholar

Millsap, R. E. (2007). Invariance in measurement and prediction revisited. Psychometrika 72, 461–473. doi: 10.1007/s11336-007-9039-7

CrossRef Full Text | Google Scholar

Montoya, A. K., and Jeon, M. (2020). MIMIC models for uniform and nonuniform DIF as moderated mediation models. Appl. Psychol. Meas. 44, 118–136. doi: 10.1177/0146621619835496

PubMed Abstract | CrossRef Full Text | Google Scholar

Muthén, B. (1985). A method for studying the homogeneity of test items with respect to other relevant variables. J. Educ. Stat. 10, 121–132. doi: 10.3102/10769986010002121

CrossRef Full Text | Google Scholar

Muthén, L. K., and Muthén, B. O. (2017). Mplus User’s Guide. Available at: https://www.statmodel.com/html_ug.shtml

Google Scholar

Nemrodov, D., Niemeier, M., Patel, A., and Nestor, A. (2018). The neural dynamics of facial identity processing: insights from EEG-based pattern analysis and image reconstruction. ENeuro 5:ENEURO.0358-17.2018. doi: 10.1523/ENEURO.0358-17.2018

PubMed Abstract | CrossRef Full Text | Google Scholar

Pearl, J. (2012). “The causal foundations of structural equation modeling,” in Handbook of Structural Equation Modeling. (The Guilford Press), 68–91.

Google Scholar

Proverbio, A. M., Brignone, V., Matarazzo, S., Del Zotto, M., and Zani, A. (2006). Gender and parental status affect the visual cortical response to infant facial expression. Neuropsychologia 44, 2987–2999. doi: 10.1016/j.neuropsychologia.2006.06.015

PubMed Abstract | CrossRef Full Text | Google Scholar

Proverbio, A. M., Matarazzo, S., Brignone, V., Del Zotto, M., and Zani, A. (2007). Processing valence and intensity of infant expressions: The roles of expertise and gender. Scand. J. Psychol. 48, 477–485. doi: 10.1111/j.1467-9450.2007.00616.x

PubMed Abstract | CrossRef Full Text | Google Scholar

R Core Team (2020). R: A Language and Environment for Statistical Computing (3.6.2) [Computer software]. Available at: https://www.R-project.org/

Google Scholar

Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. Am. Psychol. 65, 1–12. doi: 10.1037/a0018326

PubMed Abstract | CrossRef Full Text | Google Scholar

Rosen, A. F. G., Roalf, D. R., Ruparel, K., Blake, J., Seelaus, K., Villa, L. P., et al. (2018). Quantitative assessment of structural image quality. NeuroImage 169, 407–418. doi: 10.1016/j.neuroimage.2017.12.059

PubMed Abstract | CrossRef Full Text | Google Scholar

Roznowski, M., and Reith, J. (1999). Examining the measurement quality of tests containing differentially functioning items: do biased items result in poor measurement? Educ. Psychol. Meas. 59, 248–269. doi: 10.1177/00131649921969839

CrossRef Full Text | Google Scholar

Rupp, A. A., and Zumbo, B. D. (2006). Understanding parameter invariance in unidimensional IRT models. Educ. Psychol. Meas. 66, 63–84. doi: 10.1177/0013164404273942

CrossRef Full Text | Google Scholar

Schupp, H. T., Flaisch, T., Stockburger, J., and Junghöfer, M. (2006). Emotion and attention: event-related brain potential studies. Prog. Brain Res. 156, 31–51. doi: 10.1016/S0079-6123(06)56002-9

CrossRef Full Text | Google Scholar

Wang, W.-C., Shih, C.-L., and Yang, C.-C. (2009). The MIMIC method With scale purification for detecting differential item functioning. Educ. Psychol. Meas. 69, 713–731. doi: 10.1177/0013164409332228

CrossRef Full Text | Google Scholar

Wang, H.-T., Smallwood, J., Mourao-Miranda, J., Xia, C. H., Satterthwaite, T. D., Bassett, D. S., et al. (2020). Finding the needle in a high-dimensional haystack: canonical correlation analysis for neuroscientists. NeuroImage 216:116745. doi: 10.1016/j.neuroimage.2020.116745

PubMed Abstract | CrossRef Full Text | Google Scholar

Wells, C. S., Subkoviak, M. J., and Serlin, R. C. (2002). The effect of item parameter drift on examinee ability estimates. Appl. Psychol. Meas. 26, 77–87. doi: 10.1177/0146621602261005

CrossRef Full Text | Google Scholar

Woods, C. M., and Grimm, K. J. (2011). Testing for nonuniform differential item functioning With multiple indicator multiple cause models. Appl. Psychol. Meas. 35, 339–361. doi: 10.1177/0146621611405984

CrossRef Full Text | Google Scholar

Woolrich, M. W., Ripley, B. D., Brady, M., and Smith, S. M. (2001). Temporal autocorrelation in Univariate linear modeling of FMRI data. NeuroImage 14, 1370–1386. doi: 10.1006/nimg.2001.0931

PubMed Abstract | CrossRef Full Text | Google Scholar

Yu, Q., Risk, B. B., Zhang, K., and Marron, J. S. (2017). JIVE integration of imaging and behavioral data. NeuroImage 152, 38–49. doi: 10.1016/j.neuroimage.2017.02.072

PubMed Abstract | CrossRef Full Text | Google Scholar

Zadelaar, J. N., Weeda, W. D., Waldorp, L. J., Van Duijvenvoorde, A. C. K., Blankenstein, N. E., and Huizenga, H. M. (2019). Are individual differences quantitative or qualitative? An integrated behavioral and fMRI MIMIC approach. NeuroImage 202:116058. doi: 10.1016/j.neuroimage.2019.116058

CrossRef Full Text | Google Scholar

Keywords: cognitive neurosciences, structural equation modeling, systems of equations, power, sensitivity

Citation: Rosen AFG, Auger E, Woodruff N, Proverbio AM, Song H, Ethridge LE and Bard D (2022) The multiple indicator multiple cause model for cognitive neuroscience: An analytic tool which emphasizes the behavior in brain–behavior relationships. Front. Psychol. 13:943613. doi: 10.3389/fpsyg.2022.943613

Received: 13 May 2022; Accepted: 28 June 2022;
Published: 04 August 2022.

Edited by:

Scott Thomas Meier, University at Buffalo, United States

Reviewed by:

Thomas Covey, University at Buffalo, United States
Matthew W. Henninger, University at Buffalo, United States

Copyright © 2022 Rosen, Auger, Woodruff, Proverbio, Song, Ethridge and Bard. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Adon F. G. Rosen, adon.rosen@ou.edu

†These authors share senior authorship