Clarifying the Implicit Assumptions of Two-Wave Mediation Models via the Latent Change Score Specification: An Evaluation of Model Fit Indices

Valente, Matthew J.; Georgeson, A. R.; Gonzalez, Oscar

doi:10.3389/fpsyg.2021.709198

ORIGINAL RESEARCH article

Front. Psychol., 06 September 2021

Sec. Quantitative Psychology and Measurement

Volume 12 - 2021 | https://doi.org/10.3389/fpsyg.2021.709198

Matthew J. Valente¹*

A. R. Georgeson²

Oscar Gonzalez²

¹Center for Children and Families, Department of Psychology, Florida International University, Miami, FL, United States
²Department of Psychology and Neuroscience, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States

Statistical mediation analysis is used to investigate mechanisms through which a randomized intervention causally affects an outcome variable. Mediation analysis is often carried out in a pretest-posttest control group design because it is a common choice for evaluating experimental manipulations in the behavioral and social sciences. There are four different two-wave (i.e., pretest-posttest) mediation models that can be estimated using either linear regression or a Latent Change Score (LCS) specification in Structural Equation Modeling: Analysis of Covariance, difference and residualized change scores, and a cross-sectional model. Linear regression modeling and the LCS specification of the two-wave mediation models provide identical mediated effect estimates but the two modeling approaches differ in their assumptions of model fit. Linear regression modeling assumes each of the four two-wave mediation models fit the data perfectly whereas the LCS specification allows researchers to evaluate the model constraints implied by the difference score, residualized change score, and cross-sectional models via model fit indices. Therefore, the purpose of this paper is to provide a conceptual and statistical comparison of two-wave mediation models. Models were compared on the assumptions they make about time-lags and cross-lagged effects as well as statistically using both standard measures of model fit (χ², RMSEA, and CFI) and newly proposed T-size measures of model fit for the two-wave mediation models. Overall, the LCS specification makes clear the assumptions that are often implicitly made when fitting two-wave mediation models with regression. In a Monte Carlo simulation, the standard model fit indices and newly proposed T-size measures of model fit generally correctly identified the best fitting two-wave mediation model.

Introduction

The questions asked in analyses of randomized interventions are inherently about change. For example, Kunze et al. (2019) assessed whether imagery rescripting, a treatment for nightmare disorder, caused a change in nightmare distress via changing the participant's self-efficacy. Generally, interventionists might ask if the program was able to change a health outcome (e.g., nightmare distress), if the program components successfully changed the mechanism (e.g., self-efficacy), or how a change in a mechanism led to a change in the health outcome. Statistical mediation analysis is used to investigate mechanisms through which a randomized intervention causally affects an outcome variable (Lazarsfeld, 1955; Baron and Kenny, 1986; MacKinnon, 2008) and is often carried out in pretest-posttest control group designs to address questions of change. While there are many ways in which to investigate mediating mechanisms over time [see for example, MacKinnon (2008, Chapter 8), Vuorre and Bolger (2018), and Montoya (2019)], we focus on the pretest-posttest control group design because it is a common design for evaluating experimental manipulations in the behavioral and social sciences.

Traditionally, researchers use several ways to represent the change across time of mediators and outcomes in the statistical mediation model (MacKinnon, 2008, Chapter 8). For example, researchers could use a difference score, which is the difference between the score at pretest and the score at posttest. A second approach is to use a residualized change score, which is the residual left over after the posttest score is regressed on the pretest score. At face value, difference scores and residualized change scores directly address the question of change (they represent how a variable changed from pretest to posttest); thus, they remain popular approaches in the social sciences despite having some drawbacks [see the discussion section for more details; see MacKinnon et al. (1991), Jansen et al. (2013), Cederberg et al. (2016), Silverstein et al. (2018), Kunze et al. (2019), for difference score mediation models, and Miller et al. (2002), Slee et al. (2008), Quilty et al. (2008), Reid and Aiken (2013), for residualized change score mediation models]. A third approach is the ANCOVA model, which treats pretest measures of the mediator and outcome as covariates when analyzing the posttest mediator-outcome relation (MacKinnon, 1994; MacKinnon et al., 2001; Schmiege et al., 2009; Jang et al., 2012). Finally, it is also possible to estimate a cross-sectional model, which ignores the pretest measures. The cross-sectional model is discussed because it is one of the four possible models that can be fit with two waves of data. The difference score, residualized change score, and cross-sectional models make very stringent assumptions about the relationship between the pretest and posttest measures for both the mediator and the outcome (Valente and MacKinnon, 2017). These assumptions are rarely evaluated, and we suspect that this lack of evaluation is because researchers did not have the tools or guidance to do so.

Statistical mediation models with difference scores, residualized change scores, or a cross-sectional model can be parameterized using a Latent Change Score (LCS; McArdle, 2001, 2009) specification (Valente and MacKinnon, 2017). Using the LCS specification, it can be shown that statistical mediation models with difference scores or residualized change scores are nested within the ANCOVA model. As such, the LCS specification provides researchers with a means to assess model fit, which is not possible using traditional approaches for estimating these models (e.g., regression modeling). This is an important finding because it allows researchers to move beyond questions asked within a Null-Hypothesis Significance Testing (NHST) framework (e.g., which two-wave mediation models produced statistically significant mediated effect estimates?) and benefit from model-based thinking (e.g., which model best describes the psychological process under investigation and which model best fits the data?; Rodgers, 2010). However, despite the promise of using this framework to assess the fit of these various models, the performance of traditional (West et al., 2012) and newly developed fit statistics (Yuan et al., 2016) to assess the fit for these models of change must first be evaluated. While statistical properties (i.e., Type 1 error rates, statistical power, confidence interval coverage, and relative bias) of these models of change were investigated in previous research (Valente and MacKinnon, 2017), the models of change were not compared under conditions that explicitly match the constraints implied by each model and the performance of model fit indices was not investigated.

Therefore, the goals of this paper are to demonstrate the advantage of the LCS framework over the regression framework for two-wave mediation models, assess the performance of model fit indices for these models, and provide guidance for applied researchers on using model fit indices to assess the adequacy of the model constraints that are implied by each of these models. This paper starts with an overview of the two-wave mediation model, followed by an overview of the approach of fitting each model using an LCS specification. Next, the χ² test is described in general and for each specific nested model, followed by additional model fit statistics. Then, results are presented from a simulation study on the performance of model fit indices to evaluate the fit of different models of change. Finally, an empirical example is presented to demonstrate the advantages of fitting the models using the LCS specification compared to regression modeling.

Two-Wave Mediation Models

The simplest longitudinal mediation model that can be used to estimate the mediated effect of a randomized intervention on an outcome is the two-wave mediation model. The two-wave mediation model consists of pretest (or baseline) measures of the mediator and outcome variables collected prior to units being randomized to levels of an intervention and posttest measures of the mediator and outcome variables after units have been randomized to levels of an intervention.¹

Three popular two-wave mediation models to examine change include ANCOVA, difference scores, and residualized change scores. We refer readers to MacKinnon (2008, Chapter 8) for differences among these models. Another possible model that researchers could investigate with two waves of data is the cross-sectional mediation model, in which researchers ignore the measures at pretest altogether. In other words, the cross-sectional mediation model is not a model of change, but we describe it to understand the consequences of ignoring pretest measures. Below, we describe the typical specification of these four models.

ANCOVA

The following three equations can be used to describe the relations among the intervention (X), mediator (M), and outcome (Y) variables in the two-wave mediation model (MacKinnon, 2008; Valente and MacKinnon, 2017).

\begin{array}{ll} Y_{2} = i_{1} + c_{y2x} X + e_{1} & (1) \end{array}

\begin{array}{ll} M_{2} = i_{2} + a_{m2x} X + s_{m2m1} M_{1} + b_{m2y1} Y_{1} + e_{2} & (2) \end{array}

\begin{array}{ll} Y_{2} = i_{3} + c'_{y2x} X + s_{y2y1} Y_{1} + b_{y2m1} M_{1} + b_{y2m2} M_{2} + e_{3} & (3) \end{array}

The ANCOVA estimate of the mediated effect in this model can be equivalently estimated as the product a_m2x b_y2m2 from Equations (2, 3) or the difference c_y2x − c'_y2x from Equations (1, 3).

Difference Score Model

Equations (4, 5) represent regression equations using difference scores for the mediator variable and outcome variable, respectively. The difference score for the mediator is Δ_M = M₂ − M₁. The difference score for the outcome is Δ_Y = Y₂ − Y₁. These difference scores represent change from pretest to posttest on the mediator and the outcome, respectively.

\begin{array}{ll} \Delta_{M} = i_{6} + a_{\Delta} X + e_{6} & (4) \end{array}

\begin{array}{ll} \Delta_{Y} = i_{7} + c'_{\Delta} X + b_{\Delta} \Delta_{M} + e_{7} & (5) \end{array}

The mediated effect is estimated by computing the product of the a_Δ coefficient from Equation (4) and the b_Δ coefficient from Equation (5) (a_Δ b_Δ), which is the effect of X on change in Y through its effect on change in M.
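One way to see the constraints hidden in the difference score model is to substitute the definitions of Δ_M and Δ_Y into Equations (4, 5) and solve for the posttest scores. The rearrangement below is a worked sketch added for illustration, not a derivation taken from the cited sources:

```latex
\begin{aligned}
M_{2} &= i_{6} + a_{\Delta} X + 1 \cdot M_{1} + e_{6} \\
Y_{2} &= i_{7} + c'_{\Delta} X + 1 \cdot Y_{1} - b_{\Delta} M_{1} + b_{\Delta} M_{2} + e_{7}
\end{aligned}
```

Read against Equations (2, 3), the coefficients of M₁ in the M₂ equation and of Y₁ in the Y₂ equation are fixed at 1, Y₁ does not predict M₂, and the coefficient of M₁ in the Y₂ equation is tied to −b_Δ instead of being freely estimated.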

Residualized Change Score Model

Equations (6, 7) represent regression equations using residualized change scores for the mediator variable and the outcome variable, respectively. The residualized change score for the mediator variable is R_M = M₂ − E[M₂|M₁], which is the observed score on the mediator at posttest minus the posttest score predicted from the pretest mediator. The residualized change score for the outcome variable is R_Y = Y₂ − E[Y₂|Y₁], which is the observed score on the outcome at posttest minus the posttest score predicted from the pretest outcome.

\begin{array}{ll} R_{M} = i_{8} + a_{R} X + e_{8} & (6) \end{array}

\begin{array}{ll} R_{Y} = i_{9} + c'_{R} X + b_{R} R_{M} + e_{9} & (7) \end{array}

The mediated effect is estimated by computing the product of the a_R coefficient from Equation (6) and the b_R coefficient from Equation (7) (a_R b_R), which is the effect of X on the residual change in Y through its effect on the residual change in M.

Cross-Sectional Model

The cross-sectional model is the simplest model because it does not take into account the pretest measures of the mediator and outcome and therefore does not address a question of change across time. Equation (8) represents the relation between the treatment variable and the posttest mediator (a_m2x) and Equation (9) represents the relation between the treatment variable and the posttest outcome (c'_y2x) adjusted for the posttest mediator and the relation between the posttest mediator and the posttest outcome (b_y2m2) adjusted for the treatment.

\begin{array}{ll} M_{2} = i_{4} + a_{m2x} X + e_{4} & (8) \end{array}

\begin{array}{ll} Y_{2} = i_{5} + c'_{y2x} X + b_{y2m2} M_{2} + e_{5} & (9) \end{array}

The cross-sectional mediated effect is estimated by computing the product of the a_m2x coefficient from Equation (8) and the b_y2m2 coefficient from Equation (9) (a_m2x b_y2m2), which is the effect of X on Y₂ through its effect on M₂, not adjusted for the pretest measures M₁ and Y₁.
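The four regression-based estimates described in this section can be sketched side by side. The code below is a minimal illustration in plain Python, not the authors' SAS code: the helper names (`ols`, `residuals`, `mediated_effects`), the toy data, and the coefficient values (a = b = 0.39, stability = 1, M₁-Y₁ relation = 0.5, no direct effect) are all assumptions made for the example.

```python
import random
import statistics

def ols(y, *predictors):
    """Return [intercept, slope_1, ...] from least squares via the normal equations."""
    rows = [(1.0,) + vals for vals in zip(*predictors)]
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def residuals(y, *predictors):
    """Observed minus fitted values, used here as residualized change scores."""
    coef = ols(y, *predictors)
    return [yi - coef[0] - sum(c * v for c, v in zip(coef[1:], vals))
            for yi, vals in zip(y, zip(*predictors))]

def mediated_effects(x, m1, y1, m2, y2):
    """Product-of-coefficients estimate under each two-wave model."""
    # ANCOVA: Equations (2) and (3)
    a_anc = ols(m2, x, m1, y1)[1]
    b_anc = ols(y2, x, y1, m1, m2)[4]
    # Difference scores: Equations (4) and (5)
    dm = [p - q for p, q in zip(m2, m1)]
    dy = [p - q for p, q in zip(y2, y1)]
    a_dif = ols(dm, x)[1]
    b_dif = ols(dy, x, dm)[2]
    # Residualized change scores: Equations (6) and (7)
    rm = residuals(m2, m1)
    ry = residuals(y2, y1)
    a_res = ols(rm, x)[1]
    b_res = ols(ry, x, rm)[2]
    # Cross-sectional: Equations (8) and (9)
    a_cs = ols(m2, x)[1]
    b_cs = ols(y2, x, m2)[2]
    return {"ancova": a_anc * b_anc, "difference": a_dif * b_dif,
            "residualized": a_res * b_res, "cross_sectional": a_cs * b_cs}

# Toy data (assumed values, for illustration only)
rng = random.Random(1)
n = 4000
latent = [rng.gauss(0, 1) for _ in range(n)]
cut = statistics.median(latent)
x = [1.0 if v >= cut else 0.0 for v in latent]
m1 = [rng.gauss(0, 1) for _ in range(n)]
y1 = [0.5 * m + rng.gauss(0, 1) for m in m1]
m2 = [0.39 * xi + 1.0 * m + rng.gauss(0, 1) for xi, m in zip(x, m1)]
y2 = [0.39 * mm + 1.0 * yy + rng.gauss(0, 1) for mm, yy in zip(m2, y1)]

effects = mediated_effects(x, m1, y1, m2, y2)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

Each regression here is just-identified on its own, so nothing in this output signals whether the constraints behind the difference score, residualized change score, or cross-sectional model are reasonable for the data.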

Latent Change Score Specification for Two-Wave Mediation Models

The LCS specification is an SEM approach to modeling longitudinal data that can represent simple and dynamic change over time with either manifest or latent measures of a time-dependent outcome (McArdle, 2001, 2009; Grimm et al., 2017). For the two-wave mediation model, all four two-wave models previously mentioned can be fitted with the LCS specification (see Figure 1). The two-wave mediation model displayed in Figure 1 contains 20 free parameters across the mean and covariance structure. Figure 1A displays the full ANCOVA model, Figure 1B displays the difference score model, Figure 1C displays the residualized change score model, and Figure 1D displays the cross-sectional model. The LCS specification for each of the two-wave mediation models can be used to evaluate the assumptions encoded by the difference score, residualized change score, and cross-sectional models via model fit indices.

FIGURE 1

Figure 1. Adapted from Valente and MacKinnon (2017). (A) LCS specification of the ANCOVA two-wave mediation model. (B) LCS specification of the difference score model. (C) LCS specification of the residualized change score model. (D) LCS specification of the cross-sectional model.

The full ANCOVA model is estimated by creating a latent change score for the mediator (ΔM) which has a loading on M₂ that is fixed to 1.0 while the mean and variance of ΔM are freely estimated. Next, the path from M₁ to M₂ is fixed to 1.0, the mean and variance of M₂ are constrained to zero, and the mean and variance of M₁ are freely estimated. The same steps are followed to compute the latent change score for the outcome variable (ΔY). The covariances between M₁, Y₁, and X are freely estimated. ΔM is then regressed on X, M₁, and Y₁, and ΔY is regressed on X, ΔM, M₁, and Y₁. The full ANCOVA model is a saturated model with zero degrees of freedom (df).

The difference score model is obtained by constraining the s*_m2m1, s*_y2y1, b*_y2m1, and b_m2y1 parameters in Figure 1A to zero as shown in Figure 1B. The difference score model has four df. The residualized change score model is obtained by constraining the b*_y2m1 and b_m2y1 parameters in Figure 1A to zero and the parameters s*_m2m1 and s*_y2y1 in Figure 1A to b_m2m1 and b_y2y1, respectively, as shown in Figure 1C. b_m2m1 is the regression coefficient estimate from a linear regression of M₂ on M₁ and b_y2y1 is the regression coefficient estimate from a linear regression of Y₂ on Y₁. The residualized change score model has four df. The cross-sectional model is obtained by constraining the s*_m2m1, s*_y2y1, b*_y2m1, and b_m2y1 parameters and the paths from M₁ to M₂ and from Y₁ to Y₂ in Figure 1A to zero as shown in Figure 1D. The cross-sectional model has four df. These models can be fitted using any SEM software. Because these models are fitted using SEM, we can use model fit indices to evaluate the adequacy of each model for a given dataset.
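The degrees of freedom quoted above can be checked by counting. The tally below is a worked sketch that assumes the five observed variables are X, M₁, Y₁, M₂, and Y₂ and that both means and covariances are modeled, which matches the 20 free parameters reported for the ANCOVA model:

```latex
\begin{aligned}
\text{observed moments} &= \underbrace{5}_{\text{means}} + \underbrace{\tfrac{5 \cdot 6}{2}}_{\text{variances and covariances}} = 20 \\
df_{\text{ANCOVA}} &= 20 - 20 = 0 \\
df_{\text{restricted}} &= 20 - (20 - 4) = 4
\end{aligned}
```

Each restricted model fixes four parameters that the ANCOVA model estimates. In the cross-sectional model, the M₁ to M₂ and Y₁ to Y₂ paths were already fixed (at 1.0) in the ANCOVA specification, so fixing them at zero instead does not add further df.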

The preceding description shows how these four models can be estimated using a Latent Change Score (LCS) specification and that the difference score, residualized change score, and cross-sectional models make strict assumptions about the stability of the mediator and outcome variables and the cross-lagged paths from the pretest measures of the mediator and the outcome to the posttest measures of the mediator and the outcome. These models are therefore nested within the ANCOVA model (Valente and MacKinnon, 2017). The implication is that the difference score, residualized change score, and cross-sectional models are not fully saturated models that fit the data perfectly.

In summary, the LCS specification provides researchers with two advantages over regression modeling. First, the LCS specification helps researchers clarify the assumptions they are making regarding each model of change because these assumptions are encoded in the LCS path diagrams thus resulting in a clearer understanding of the theoretical implications of each model. Second, the LCS specification provides the added benefit of supplementing theoretical considerations of model choice with fit statistics. Below, we describe how model fit is evaluated.

Evaluating the Fit of Different Models of Change

There are several fit statistics that researchers could use to evaluate fit. Some of these fit statistics include: the χ² goodness of fit statistic, the Comparative Fit Index (CFI; Bentler, 1990), the Root Mean Square Error of Approximation (RMSEA; Steiger, 1989), and the newly proposed T-size CFI and RMSEA (Yuan et al., 2016). Performance of the standard fit indices has been investigated in LCS models in the context of measurement non-invariance (Kim et al., 2020) and when testing the performance of the fit indices in selecting the nested autoregressive cross-lagged factor model (Usami et al., 2015, 2016). The performance of these standard fit indices and the T-size fit indices for selecting alternative two-wave mediation models has never been investigated.

The χ² Test

Model χ² goodness of fit tests, or simply χ² tests, can be used to test the fit of a statistical model or to compare the fit of two competing models when one model is nested within the other (Bentler and Bonett, 1980; Bollen, 1989; West et al., 2012). A model is considered nested within a full model if it is possible to obtain the nested model by constraining parameters of the full model (e.g., fixing them to zero, which effectively removes them from the model). As demonstrated above, the difference score model in Figure 1B is nested within the ANCOVA model in Figure 1A because constraining the s*_m2m1, s*_y2y1, b*_y2m1, and b_m2y1 parameters to zero results in the difference score model with four df. Because the ANCOVA model is fully saturated and fits the data perfectly, the χ² tests of the nested models are simply the χ² goodness of fit tests. The χ² test can therefore be used to test the null hypothesis that the difference score model fits the data perfectly, assuming a χ² distribution with df equal to four. Rejecting the null hypothesis provides evidence that the difference score model does not fit the data perfectly. For example, we might fit the difference score model that is displayed in Figure 1B and observe χ² = 7.5, df = 4. The critical value for a χ² distribution with 4 degrees of freedom at α = 0.05 is 9.488. Because 7.5 is below 9.488, we fail to reject the null hypothesis that the difference score model fits the data perfectly, thus providing justification to fit the difference score model.

Failing to reject the null hypothesis provides statistical evidence that estimating the extra four parameters in the ANCOVA model does not result in a significantly better fitting model as compared to the fit of the simpler, more parsimonious difference score model (i.e., simpler in terms of fewer estimated parameters). In other words, the psychological phenomenon characterized by this two-wave mediation model can be explained equally well using a simpler model with fewer estimated parameters compared to a more complicated model with more estimated parameters. The χ² test is a test of perfect model fit, which may be unrealistic in practice (MacCallum et al., 2001). Therefore, it is important to investigate how each model fits the data by using other indexes of model fit.
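The worked example with χ² = 7.5 and df = 4 can be reproduced in a few lines. The sketch below uses only the Python standard library and relies on the closed-form upper-tail probability of a central χ² distribution that holds for even df; the function names are hypothetical, and a statistics library would normally be used instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a central chi-square (exact closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

def chi2_critical_even_df(alpha, df):
    """Right-tail critical value found by bisection on the upper-tail probability."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even_df(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

chi2_obs, df = 7.5, 4
p_value = chi2_sf_even_df(chi2_obs, df)      # about 0.112
critical = chi2_critical_even_df(0.05, df)   # about 9.488
reject = chi2_obs > critical                 # False: perfect fit is not rejected
print(round(p_value, 4), round(critical, 3), reject)
```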

CFI and RMSEA

The Comparative Fit Index (CFI) is a goodness of fit index that measures how well model-implied covariances match the observed covariances in the data (Bentler, 1990). Higher values of the CFI indicate better fit than lower values of the CFI (Bentler, 1990; West et al., 2012). The Root Mean Square Error of Approximation (RMSEA) is a badness of fit index that measures how poorly model-implied covariances match the observed covariances in the data (Steiger and Lind, 1980; Steiger, 1989, 2016). Lower values of the RMSEA indicate better fit than higher values of the RMSEA (Steiger and Lind, 1980; Steiger, 1989, 2016; West et al., 2012). Both the CFI and the RMSEA have cut-off values that are used as a rule-of-thumb to determine at which values of the respective fit indexes a model is considered to fit the data well.
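For readers who want to see how these two indices are computed from χ² statistics, the sketch below implements the commonly used sample formulas. These formulas are not spelled out in the text above and SEM programs differ in details (for example, some use N rather than N − 1 in the RMSEA); the baseline χ² and df in the usage lines are hypothetical values chosen only for illustration.

```python
import math

def rmsea(chi2, df, n):
    """Sample RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Sample CFI: 1 - max(d_model, 0) / max(d_base, d_model, 0), with d = chi2 - df."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    if d_base == 0.0:
        return 1.0  # neither model shows misfit beyond its df
    return 1.0 - d_model / d_base

# Usage with the chi-square example (chi2 = 7.5, df = 4) and assumed N = 200;
# the baseline model values (chi2 = 250, df = 10) are hypothetical.
example_rmsea = rmsea(7.5, 4, 200)
example_cfi = cfi(7.5, 4, 250.0, 10)
print(round(example_rmsea, 4), round(example_cfi, 4))
```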

Equivalence Testing and T-Size Measures

While the null hypothesis of the χ² test can provide information that a model does not fit perfectly and the CFI and RMSEA can provide information about the goodness of model fit and the badness of model fit, respectively, none of these indexes of model fit provides information about endorsing the null hypothesis for model fit. Ideally, there would be a measure that could provide some level of confidence that the model fit is within a specified range of the null hypothesis. In other words, the rejection of the standard null hypothesis of model fit will tell us the model does not fit perfectly, but failure to reject the null does not tell us that the model does fit perfectly. Recent papers by Yuan et al. (2016) and Marcoulides and Yuan (2017, 2020) provide such measures for SEMs via equivalence testing.

The goal of equivalence testing is to endorse a model rather than to reject it, as is done under the standard null hypothesis. In order to conduct equivalence testing, a minimum tolerable size of model misspecification (ε_t; i.e., the T-size) corresponding to the observed χ²-test statistic must be determined. In equivalence testing, the null hypothesis states that the model misspecification exceeds the tolerable size, and the aim is to reject this null hypothesis. This happens when the observed χ²-test statistic falls within the interval between zero and a left-tail critical value with cumulative probability equal to α from a non-central χ² distribution with a specified level of misspecification and df equal to the observed model df. Rejecting this null hypothesis at α = 0.05 implies that the model misspecification is within a tolerable size. This is opposed to standard null hypothesis significance testing, which tests if the observed χ²-test statistic falls above a right-tail critical value with cumulative probability equal to 1 − α from a central χ² distribution with df equal to the observed model df [for a complete treatment and details on how the significance regions are calculated, see Yuan et al. (2016)]. In keeping with the literature on equivalence testing in SEM, the tolerable size of misspecification can be transformed into a T-size RMSEA or CFI value. Regarding the RMSEA, the T-size measures are interpreted at α = 0.05 as “we are 95% confident that the misspecification is X-units as measured by the RMSEA” and the T-size CFI is interpreted as “we are 95% confident that the population CFI is above X” (Marcoulides and Yuan, 2017).
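The search for the T-size can be sketched numerically. The code below is one reading of the procedure described above, not the R function provided by Yuan et al. (2016): it finds the noncentrality parameter at which the observed χ² equals the left-tail α critical value of a non-central χ² distribution, then converts it to a T-size RMSEA as sqrt(ε_t / df) with ε_t = noncentrality / (N − 1). All function names, the series-based distribution functions, and the sample size N = 200 are assumptions made for the illustration.

```python
import math

def gamma_p(a, x):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def ncx2_cdf(x, df, ncp):
    """CDF of a non-central chi-square as a Poisson mixture of central chi-squares."""
    if ncp <= 0.0:
        return gamma_p(df / 2.0, x / 2.0)
    half = ncp / 2.0
    upper = int(half + 12.0 * math.sqrt(half) + 30)  # truncation point of the mixture
    total = 0.0
    for j in range(upper + 1):
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_w) * gamma_p(df / 2.0 + j, x / 2.0)
    return total

def t_size_rmsea(chi2_obs, df, n, alpha=0.05):
    """Return (T-size noncentrality, T-size RMSEA) for an observed chi-square."""
    if ncx2_cdf(chi2_obs, df, 0.0) <= alpha:
        return 0.0, 0.0  # already inside the left tail of the central distribution
    lo, hi = 0.0, 1.0
    while ncx2_cdf(chi2_obs, df, hi) > alpha:
        hi *= 2.0
    for _ in range(60):  # bisection: the CDF decreases as noncentrality grows
        mid = (lo + hi) / 2.0
        if ncx2_cdf(chi2_obs, df, mid) > alpha:
            lo = mid
        else:
            hi = mid
    ncp_t = (lo + hi) / 2.0
    return ncp_t, math.sqrt(ncp_t / (df * (n - 1)))

# Usage with the earlier example (chi2 = 7.5, df = 4) and an assumed N = 200
ncp_t, rmsea_t = t_size_rmsea(7.5, 4, 200)
print(round(ncp_t, 3), round(rmsea_t, 3))
```

Because the T-size is a bound held with 1 − α confidence rather than a point estimate, it is larger than the conventional RMSEA computed from the same χ², which is why it is compared with adjusted rather than standard cut-off values.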

Because equivalence testing results in new T-size RMSEA and CFI values that take into account a specified level of model misspecification, it is not appropriate to compare these T-size RMSEA and CFI values to the standard cut-off values. To remedy this, Yuan et al. (2016) derived adjusted cut-off values to which the T-size RMSEA and CFI values can be compared, respectively. The adjusted RMSEA cut-off values are estimated based on the observed sample size and model degrees of freedom and are therefore estimated for each sample and each model being fit to the data. The interpretation of the adjusted cut-off values is therefore conditional on our specified level of model misspecification. In other words, our model may have excellent, close, fair, mediocre, or poor fit given the specified level of misspecification.

Regarding the CFI, equivalence testing compares the fit of the observed model to the misfit of the baseline model for the CFI. The adjusted cut-off values for the CFI are a function of the sample size, model degrees of freedom, number of predictors, and baseline model degrees of freedom. The adjusted CFI cut-off values are therefore estimated for each sample and each model being fit to the data. Similar to the adjusted cut-off values for the RMSEA, the interpretation of the adjusted cut-off values for the CFI is conditional on our specified level of model misspecification. In other words, our model may have excellent, close, fair, mediocre, or poor fit given the specified level of misspecification.

Present Study

The LCS specification allows researchers to test the adequacy of the model constraints imposed by the difference score, residualized change score, and cross-sectional models. This is an advantage over the regression-based approach for these models because the regression-based approach does not have any way of evaluating how the model constraints may impact the fit of the models to the observed data. Since the model fit can be assessed for each of these models, researchers can use fit indexes in SEM to evaluate the appropriateness of these model constraints for their observed data. Therefore, it is important to know how the χ² test, CFI, and RMSEA, along with the equivalence measures, will perform when evaluating the fit of these two-wave mediation models (i.e., do the indices support or reject model fit when they should).

The purpose of the simulation study is to demonstrate which factors of the two-wave mediation model are important predictors of the Type 1 error and power of the χ² test when used to test the difference score model, the residualized change score model, and the cross-sectional model. There are three main hypotheses for the simulation study that are derived from the constraints that are made to fit each of these models: (1) When stability = 1.00 and cross-lags = 0 (i.e., the true model is the difference score model), the null hypothesis of the χ² test assessing the fit of the difference score model should not be rejected; (2) When both cross-lagged paths = 0 (i.e., the residualized change score model is the true model), the null hypothesis of the χ² test assessing the fit of the residualized change score model should not be rejected; (3) When stability = 0 and cross-lags = 0 (i.e., the cross-sectional model is the true model), the null hypothesis of the χ² test assessing the fit of the cross-sectional model should not be rejected. Further, it is expected that the CFI, T-size CFI, RMSEA, and T-size RMSEA values will indicate close or excellent fit when the respective model assumptions are met. In addition, an empirical illustration highlights the differences between the estimation of the models of change using regression and the LCS specification.

Simulation Study

Method

SAS 9.4 was used to conduct Monte Carlo simulations. The following equations represent the linear regression model used to generate the data where x is an observed value of X and $\tilde{x}$ is the sample median.

\begin{array}{ll} X \sim N(0, 1): \quad (x \geq \tilde{x}) = 1; \; (x < \tilde{x}) = 0 & (10) \end{array}

\begin{array}{ll} M_{1} \sim N(0, 1) & (11) \end{array}

\begin{array}{ll} Y_{1} = b_{y1m1} M_{1} + e_{1} & (12) \end{array}

\begin{array}{ll} M_{2} = a_{m2x} X + b_{m2y1} Y_{1} + s_{m2m1} M_{1} + e_{2} & (13) \end{array}

\begin{array}{ll} Y_{2} = c'_{y2x} X + b_{y2m1} M_{1} + b_{y2m2} M_{2} + s_{y2y1} Y_{1} + e_{3} & (14) \end{array}

The factors varied were: sample size (N = 50, 100, 200, 500); effect size of the a_m2x (0, 0.14, 0.39, 0.59), b_y2m2 (0, 0.14, 0.39, 0.59), and c'_y2x (0, 0.39) paths; effect size of the Y₂ cross-lagged path b_y2m1 (0, 0.50) and M₂ cross-lagged path b_m2y1 (0, 0.50); stability of the mediating variable (s_m2m1) and outcome variable (s_y2y1) (0, 0.30, 1.00); and relation between M₁ and Y₁ (0, 0.50). These factors were varied to test hypotheses 1–3. All residual terms (e₁, e₂, and e₃) had a standard deviation of one and were uncorrelated with each other and with the predictors. The effect sizes were chosen to reflect approximately small, medium, and large effect sizes (Cohen, 1988). A full factorial design produced 3,072 conditions, each with 1,000 replications. All models were fit using SAS PROC CALIS. The T-size measures of the CFI and RMSEA were obtained using the R function provided by Yuan et al. (2016).
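The data-generating model in Equations (10–14) and the size of the factorial design can be sketched as follows. This is an illustrative Python translation, not the SAS code used in the study; the function and variable names are hypothetical, and the two stabilities are assumed to be varied together (three levels in total), which is what reproduces the reported 3,072 conditions.

```python
import itertools
import random
import statistics

def generate(n, a, b, c_prime, b_y2m1, b_m2y1, stability, b_y1m1, rng):
    """Draw one dataset of (X, M1, Y1, M2, Y2) rows following Equations (10)-(14)."""
    latent = [rng.gauss(0, 1) for _ in range(n)]
    cut = statistics.median(latent)
    data = []
    for v in latent:
        x = 1.0 if v >= cut else 0.0                            # Equation (10): median split
        m1 = rng.gauss(0, 1)                                    # Equation (11)
        y1 = b_y1m1 * m1 + rng.gauss(0, 1)                      # Equation (12)
        m2 = a * x + b_m2y1 * y1 + stability * m1 + rng.gauss(0, 1)  # Equation (13)
        y2 = (c_prime * x + b_y2m1 * m1 + b * m2
              + stability * y1 + rng.gauss(0, 1))               # Equation (14)
        data.append((x, m1, y1, m2, y2))
    return data

# Levels of each simulation factor as listed in the text
levels = {
    "n": [50, 100, 200, 500],
    "a": [0, 0.14, 0.39, 0.59],
    "b": [0, 0.14, 0.39, 0.59],
    "c_prime": [0, 0.39],
    "b_y2m1": [0, 0.50],
    "b_m2y1": [0, 0.50],
    "stability": [0, 0.30, 1.00],   # assumed shared by s_m2m1 and s_y2y1
    "b_y1m1": [0, 0.50],
}
conditions = list(itertools.product(*levels.values()))
n_conditions = len(conditions)        # 4 * 4 * 4 * 2 * 2 * 2 * 3 * 2
n_datasets = n_conditions * 1000      # 1,000 replications per condition

sample = generate(*conditions[0], random.Random(2021))
print(n_conditions, n_datasets, len(sample))
```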

The raw data were analyzed using analysis of variance (ANOVA). The dataset contained 3,072,000 observations consisting of 1,000 replications for each of the 3,072 conditions. All significant main effects and interactions with semi-partial eta-squared values of 0.005 or greater (rounded to the third decimal place) were considered important and reported in the Supplementary Materials along with simulation results for additional fit indexes (SRMR, AIC, and BIC). The ANOVA was used to determine the pattern of results described in the following results section. The CFI and RMSEA values of the ANCOVA model were not reported because the ANCOVA model is a saturated model with zero degrees of freedom and therefore fits the data perfectly. Type 1 error rates of the χ² tests were deemed acceptable if they fell within the robustness interval [0.025, 0.075] (Bradley, 1978). Sample size was not a significant predictor of the performance of the χ² tests, CFI, or RMSEA; therefore, all results for these fit indices were collapsed across sample sizes.