Skip to main content

METHODS article

Front. Psychol., 25 April 2022
Sec. Quantitative Psychology and Measurement
This article is part of the Research Topic Methodological Issues in Psychology and Social Sciences Research View all 13 articles

Bayesian Analysis of Aberrant Response and Response Time Data

  • 1School of Mathematics and Statistics, Yili Normal University, Yining, China
  • 2Institute of Applied Mathematics, Yili Normal University, Yining, China
  • 3School of Mathematics and Statistics, Yunnan University, Kunming, China
  • 4Key Laboratory of Applied Statistics of MOE, School of Mathematics and Statistics, Northeast Normal University, Changchun, China

In this article, a highly effective Bayesian sampling algorithm based on auxiliary variables is proposed to analyze aberrant response and response time data. The new algorithm not only avoids the calculation of multidimensional integrals by the marginal maximum likelihood method but also overcomes the dependence of the traditional Metropolis–Hastings algorithm on the tuning parameter in terms of acceptance probability. A simulation study shows that the new algorithm is accurate for parameter estimation under simulation conditions with different numbers of examinees, items, and speededness levels. Based on the sampling results, the powers of the two proposed Bayesian assessment criteria are tested in the simulation study. Finally, a detailed analysis of a high-state and large-scale computerized adaptive test dataset is carried out to illustrate the proposed methodology.

1. Introduction

In educational psychological assessments, examinees often perform different types of test-taking behaviors (Bolt et al., 2002; Boughton and Yamamoto, 2007; Goegebeur et al., 2008; Chang et al., 2014; Wang and Xu, 2015; Wang et al., 2018; Man et al., 2018; Man and Harring, 2021). One is the solution behavior, in which the examinee gives a response after careful consideration to each part of an item (Schnipke and Scrams, 1997; Bolt et al., 2002; Wise and Kong, 2005; Wang and Xu, 2015). An alternative is the rapid guessing behavior, in which the examinee simply seeks to obtain an answer quickly without a deep thought process; this behavior often occurs in high-stakes tests owing to insufficient time and in low-stakes tests owing to lack of motivation. In fact, the traditional item response theory (IRT) model is based on the assumption that the correct response probability increases with the ability of the test taker under the solution behavior. The correct response probability under the rapid guessing behavior is actually rarely dependent on the measure constructed by the test (Lord and Novick, 1968; Wise and DeMars, 2006; Boughton and Yamamoto, 2007; Goegebeur et al., 2008). Numerous studies have shown that the presence of rapid guessing behavior inevitably leads to biased inferences of the item and person parameters (Bolt et al., 2002; Wise and DeMars, 2006; Boughton and Yamamoto, 2007; Goegebeur et al., 2008; Chang et al., 2014; Wang and Xu, 2015; Wang et al., 2018). Therefore, appropriate models need to be constructed to capture both solution behavior and rapid guessing behavior to reduce these biased parameter estimates. Before we analyze aberrant response behavior, we provide an explanation of the change point, which is the cut-off point at which an examinee adopts different response strategies. By considering a change point, Bolt et al. (2002) classified examinees in the speeded group before the change point and found that they were more likely to adopt the solution behavior, whereas examinees who transferred from the speeded group to the non-speeded group after the change point were more likely to choose the rapid guessing behavior. In contrast to models using fixed change point locations, Boughton and Yamamoto (2007) proposed the more flexible HYBRID model, which allowed different examinees to have change points at different locations. The model assumes that examinees' responses follow a Rasch model until a particular point in a given examinee's test, after which the responses to all items are randomly guessed. Goegebeur et al. (2008) proposed a speeded model with one change point to characterize the gradual switch between response strategies by introducing an additional examinee-specific change-rate parameter. In addition, Wise and DeMars (2006) proposed an effort-moderated IRT model to decompose the correct response probability into a mixture of two sub-models. The two sub-models were used to characterize the solution behavior and rapid guessing behavior, respectively.

In parallel with the abovementioned item response data, response time, which is an important type of important auxiliary information, has been widely used to distinguish between two different behaviors (Schnipke and Scrams, 1997; Wise and DeMars, 2006; van der Linden and Guo, 2008; Wang and Xu, 2015). van der Linden and Guo (2008) found that examinees' response times in a high-stakes achievement test showed a mixture of two different distributions. Similarly, Schnipke and Scrams (1997) verified that the distribution of response times for end-of-test items showed a bimodal pattern in a high-stakes exam. In the study of (Schnipke and Scrams, 1997), a two-state mixture model was proposed to decompose the distribution of response times for each item into two parts. The two parts of the response times quantified the solution behavior and the rapid guessing behavior, respectively. Wang and Xu (2015) proposed a mixture model to consider differences between item responses and response time patterns resulting from the solution behavior and rapid guessing behavior. The mixture model used both item response and response time information and considered multiple switch points for each examinee.

A variety of estimation methods have been proposed to estimate the parameters of the IRT and response time models. In the frequentist framework, the most common method is the marginal maximum likelihood estimation (MMLE) via expectation maximization algorithm (Bock and Aitkin, 1981; Baker and Kim, 2004). However, the main drawback of marginal maximum likelihood methods is the inevitable need for tedious approximation of the multidimensional integral using numerical integration (Bock and Schilling, 1997; Rabe-Hesketh et al., 2002, 2005) or Monte Carlo integration (Kuk, 1999; Skaug, 2002) when the latent variables are high dimensional. This is because the number of discrete quadrature points required increases exponentially as the number of latent variables increases linearly during the computation (Converse et al., 2021, p. 1465). Although the adaptive quadrature method has been used to deal with the computational deficiency by using a small number of quadrature points, the problem has not been completely solved (Jiang and Templin, 2019). In addition, the comparison method of the MMLE is simplistic; comparison methods other than the root mean square error of approximation are seldom used (Zhang et al., 2019). Compared with the MMLE method, first, the Bayesian method allows one to update knowledge by using proper informative priors based on previous studies, the posterior distribution being more precise than the likelihood or the prior alone (Jackman, 2009). The incorporation of proper informative priors into the Bayesian analysis can be used to obtain better results in the case of small to moderate sample sizes. In addition, even if weakly informative inaccurate priors are used, the performance of the Bayesian method does not deteriorate. Second, Bayesian estimation does not rely on asymptotic arguments and can give more reliable results for small samples (Lee and Song, 2004; Song and Lee, 2012). Third, another major advantage of Bayesian analysis is the ability to analyze models that are computationally heavy or impossible to estimate with MMLE. These include models with categorical outcomes with many latent variables and, thus, many dimensions of numerical integrations (Asparouhov and Muthén, 2010b; Muthén, 2010).

In the current study, an efficient Pólya–gamma Gibbs sampling algorithm (Polson et al., 2013) in a fully Bayesian framework is proposed to estimate the parameters of the mixture model of Wang and Xu (2015). Compared with traditional Bayesian sampling algorithms, e.g., the Metropolis–Hastings sampling algorithm (Metropolis et al., 1953; Hastings, 1970; Tierney, 1994; Chib and Greenberg, 1995; Chen et al., 2000), Gibbs sampling algorithm (Geman and Geman, 1984; Tanner and Wong, 1987; Gelfand and Smith, 1990; Albert, 1992; Béguin and Glas, 2001; Fox and Glas, 2001), and the advantages of the Pólya–gamma Gibbs sampling algorithm are presented from multiple perspectives. First, the Pólya–gamma Gibbs sampling algorithm avoids retrospective tuning in the Metropolis–Hastings sampling algorithm if we do not know how to choose a proper tuning parameter or if no value for the tuning parameter is appropriate. It always keeps the drawn samples accepted, thereby increasing the sampling efficiency (Zhang et al., 2020). Second, the Pólya–gamma Gibbs sampling algorithm can transform the non-conjugate model into the conjugate model by using augmented auxiliary variables. With the help of the traditional Gibbs sampling algorithm, posterior sampling is easier to implement (Polson et al., 2013). Third, in Bayesian estimation, prior distributions of model parameters and observed data likelihood produce a joint posterior distribution for the model parameters. The prior specifications and prior sensitivity are important aspects of Bayesian inference (Ghosh and A. Ghosh, 2000). In fact, the Pólya–gamma Gibbs sampling algorithm is not sensitive to the specification of the prior distribution. It can still obtain satisfactory results even if the proper or mis-specification priors are adopted (Zhang et al., 2020).

The rest of this article is organized as follows. Section 2 contains an introduction to the mixture hierarchical model and the corresponding identification restrictions. A detailed implementation of the Pólya–gamma Gibbs sampling algorithm is described in Section 3. In Section 4, two simulations focus on the parameter recovery performance of the Bayesian algorithm using the results of the model assessments. In addition, the quality of the Bayesian algorithm is investigated using high-state and large-scale computerized adaptive test data in Section 5. We conclude the article with a brief discussion in Section 6.

2. Models

Following Wang and Xu (2015), the mixture model is used to distinguish solution behavior from rapid guessing behavior. The correct response probability of examinee i on item j is assumed to follow a mixture decomposition

P(Yij=1|ηij)=(1-ηij)P(Yij=1|ηij=0)                                    +ηijP(Yij=1|ηij=1),

where ηij is a latent response behavior indicator variable, ηij = 1 denotes the case where examinee i answers item j by rapid guessing behavior, and ηij = 0 denotes the solution behavior. P(Yij = 1|ηij = 0) quantifies the probability of a correct response resulting from the solution behavior, whereas P(Yij = 1|ηij = 1) captures the probability of a correct response with the rapid guessing behavior. We use the two-parameter logistic (2PL; Birnbaum, 1968) model for the solution behavior,

P(Yij=1|ηij=0,θi,aj,bj)=exp[aj(θi-bj)]1+exp[aj(θi-bj)],

where aj and bj are the discrimination and difficulty parameters of item j, and θi denotes the ability of the ithe examinee. The probability that examinee i answers item j correctly by the rapid guessing behavior is gj; this is an item-specific probability:

P(Yij=1|ηij=1)=gj.

In parallel with the mixture item response model, the observed response time Tijobs is

Tijobs=(1-ηij)Tij+ηijCij,

where Tij represents the time required for examinee i to respond to item j using solution behavior, and Cij represents the time required for examinee i to respond to item j using rapid guessing behavior. Therefore, given latent indicator variable ηij, the density function of observed response time Tijobs can be denoted as

pij(tij|ηij)=(1-ηij)fij(tij)+ηijhij(tij),

where f and h represent corresponding density functions of Tijv and Cijv.

Response times on test items have been modeled in various families of distributions in psychometric applications, including exponential (Scheiblechner, 1979), gamma (Maris, 1993), Weibull (Rouder et al., 2003), log-normal race (Rouder et al., 2015), and semi-parametric models (Wang et al., 2013). Response time data are non-negative, and their distributions tend to be positively skewed. The log transformation would move positively skewed distributions toward symmetric shapes. We chose the log-normal distribution (van der Linden, 2006) for response times with solution behavior:

log(Tij)=λj-τi+eij, eij~N(0,σj2),

where λj is the time intensity of item j; a higher value of λj indicates that the item is expected to consume more time. τi is a speed parameter of examinee i; a higher value of τi means that the examinee works faster and a lower response time is expected. σj2 allows for differences between the variances of log-times on different items. Following the “common-guessing” (Schnipke and Scrams, 1997), the response times of the guessing behavior have a common log-normal distribution

log(Cij)~N(μc,σc2).

To capture across-person relationships between speed and accuracy, we assume that the ability and speed parameters have a bivariate normal distribution, to explore whether examinees with higher ability tend to answer items faster, i.e.,

ξi=(θi,τi) ~N(μP,ΣP),

with mean vector

μP=(μθ,μτ)

and covariance matrix

ΣP=(σθ2σθτστθστ2).

2.1. Model Identification

In the 2PL model, to eliminate the trade-off between ability θ and difficulty parameter b in location, we only need to fix the mean population level of ability to zero. That is, μθ = 0. To eliminate the trade-off between ability θ and discrimination parameter a in scale, we need to restrict the variance population level of ability to one. That is, σθ2=1. For the response time model with the solution behavior, to eliminate the trade-off between speed parameter τ and time intensity parameter λ in location, we need to fix the mean population level of speed to zero. That is, μτ = 0.

There are several widely used identification restriction methods for two-parameter IRT models. The identification restrictions of our models are based on the following methods.

(1) Fix the mean population level of ability to zero and the variance population level of ability to one (Lord and Novick, 1968; Bock and Aitkin, 1981; Fox and Glas, 2001; Fox, 2010). That is, θ ~ N(0, 1).

(2) Restrict the sum of item difficulty parameters to zero and the product of item discrimination parameters to one (Fox and Glas, 2001; Fox, 2005, 2010). That is,

j=1Jbj=0and j=1Jaj=1.

(3) Fix the item difficulty parameter to a specific value, most often zero and restrict the discrimination parameter to a specific value, most often one (Fox and Glas, 2001; Fox, 2010). That is, b1 = 0 and a1 = 1.

3. Bayesian Estimation Using MCMC Sampling

Let Ω=(ηij,,θi,aj,bj,λj,τi,σj2,μa,σa2,μb,σb2,μλ,σλ2,μc,σc2,gj,σθτ,στ2,πi); then, the full joint posterior of person and item parameters given Y, T, and η is

L(Ω|Y,T)=i=1Nj=1J[πigjh(tij;μc,σc2)]ηij.Yij[πi(1-gj)h(tij;μc,σc2)]ηij.(1-Yij)×[(1-πi)P(Yij=1|ηij=0,aj,bj,θi)f(tij;λj,τi,σj2)](1-ηij).Yij×[(1-πi)P(Yij=0|ηij=0,aj,bj,θi)f(tij;λj,τi,σj2)](1-ηij).(1-Yij)×p(θi,τi;μp,Σp)p(aj)p(bj)p(λj)p(μp,Σp),    (1)

where πi is the probability that examinee i uses the rapid guessing behavior, i.e., πi = Pij = 1).

3.1. Pólya–Gamma Gibbs Sampling Algorithm

Polson et al. (2013) proposed a new data augmentation strategy for fully Bayesian inference in logistic regression. This data augmentation approach used a new class of Pólya–gamma distribution, in contrast to the data augmentation algorithm of Albert and Chib (1993), which was based on a truncated normal distribution. Here, we introduce the Pólya–gamma distribution.

Definition: Let {Bk}k=1+ be an independent and identically distributed random variable sequence from a gamma distribution with parameters β and 1. That is, Bk~gamma(β, 1). A random variable W follows a Pólya–gamma distribution with parameters β > 0 and ϱ ∈ R, denoted W ~ PG(β, ϱ), if

W=D12πk=1+Bk(k-12)2+ϱ24π2,

where =D denotes equality in distribution. In fact, the Pólya–gamma distribution is an infinite mixture of gamma distributions, which provides the ability to sample from gamma distributions.

Based on Theorem 1 of Polson et al. (2013, page 1341, Equation 7), the likelihood contribution of the ith examinee answering the jth item under the solution behavior category ηij = 0 can be expressed as

L(aj,bj,θi)={exp[aj(θi-bj)]}Yij1+{exp[aj(θi-bj)]}exp{kij[aj(θi-bj)]}                       ×0exp{-Wij[aj(θi-bj)]22}p(Wij|1,0)dWij,    (2)

where kij=Yij-12. p(Wij|1, 0) is the conditional density of Wij. That is, Wij ~ PG(1, 0). The auxiliary variable Wij follows a Pólya–gamma distribution with parameters (1, 0). Within the solution behavior category ηij = 0, the full conditional distribution of a, b, θ given the auxiliary variables, W can be written as

p(a,b,θ|η,W,Y){i=1Nj=1J[exp{kij[aj(θi-bj)]}exp[-Wij[aj(θi-bj)]22]]}I(ηij=0)×{i=1Np(θi|τi,μP,ΣP)}I(ηij=0){j=1J[p(aj)p(bj)]}I(ηij=0),    (3)

where p(aj) and p(bj) are the prior distributions for aj and bj. It is known that there are relationships between the latent ability and speed parameter, which can be constructed by a bivariate normal prior distribution (θiτi)~N((μθμτ),ΣP). Therefore, the conditional prior distribution of θi is the normal distribution

θi|τi,μP,ΣP~N(μθ|τ,σθ|τ2),

where μθ|τ=μθ+σθτστ-2(τi-μτ) and σθ|τ2=σθ2-σθτστ-2στθ.

Step 1: Sample the auxiliary variable Wij, within the solution behavior category ηij = 0, given the item discrimination and difficulty parameters aj, bj and the ability θi. According to Equation (1), the full conditional posterior distribution of the random auxiliary variable Wij is given by

f(Wij|aj,bj,θi)exp[-Wij[aj(θi-bj)]22] p(Wij|1,0).

According to Biane et al. (2001) and Polson et al. (2013; p. 1341), the density function p(Wij|1, 0) can be written as

p(Wij|1,0)=v=0(-1)v(2k+1)2πWijexp[-(2k+1)28Wij].

Therefore, f(Wij|aj, bj, θi) is proportional to

v=0(-1)v(2k+1)2πWijexp[-(2k+1)28Wij-Wij[aj(θi-bj)]22].

Finally, the specific form of the full conditional distribution of Wij is as follows:

Wij~PG(1,|aj(θi-bj)|).

Next, Gibbs samplers are used to draw the item parameters.

Step 2: Sample the discrimination parameter aj for each item j. The prior distribution of aj is assumed to follow a truncated normal distribution, i.e., aj~N(μa,σa2)I(aj>0). Given Y, W, bj, and θ, the fully conditional posterior distribution of aj is given by

p(aj|Y,W,bj,θ)i=1N{{exp[aj(θi-bj)]}Yij1+exp[aj(θi-bj)]f(Wij|aj,bj,θi)}p(aj),

where f(Wij|aj, bj, θi) is given by the following equation (for details, refer to Polson et al., 2013; p. 1341):

f(Wij|aj,bj,θi)={cosh(2-1|aj(θi-bj)|)}20Γ(1)                                        ×v=0(-1)v(2k+1)2πWij                                        ×exp[-(2k+1)28Wij-Wij[aj(θi-bj)]22].

After rearrangement, the full conditional posterior distribution of aj can be written as follows:

p(aj|Y,W,bj,θ)i=1N{{exp[aj(θibj)]}Yij1+exp[aj(θibj)]                                        ×[cosh(21|aj(θibj)|)]                                        ×exp[[aj(θibj)]2Wij2]}p(aj).

Therefore, the fully conditional posterior distribution of aj follows a normal distribution truncated at 0 with mean

Varaj×(μaσa-2+[i=1NWij(θi-bj)2]{[i=1N(1-2Yij)(θi-bj)]2[i=1NWij(θi-bj)2]})

and variance

Varaj={σa-2+[i=1NWij(θi-bj)2]}-1.

Step 3: Sample the difficulty parameter bj for each item j. The prior distribution of bj is assumed to follow a normal distribution with mean μb and σb2. That is, bj~N(μb,σb2). Similarly, given Y, W, aj, and θ, the fully conditional posterior distribution of bj is given by

p(bj|Y,W,aj,θ)i=1N{{exp[aj(θibj)]}Yij1+exp[aj(θibj)]                                        ×[cosh(21|aj(θibj)|)]                                        ×exp[[aj(θibj)]2Wij2]}p(bj|μb,σb2).

Therefore, the fully conditional posterior distribution of bj follows a normal distribution with mean

Varbj×(μbσb-2+i=1N[aj2Wij](i=1N(2aj2θiWij-2Yijaj+aj)2i=1N[aj2Wij]))

and variance

Varbj={σb-2+i=1N[aj2Wij]}-1

Step 4: Sample the ability parameter θi for each examinee i. The conditional prior distribution of θi is assumed to follow a normal distribution with mean μθ|τ=μθ+σθτστ-2(τi-μτ) and σθ|τ2=σθ2-σθτστ-2στθ. That is, θi~N(μθ|τ,σθ|τ2). Given Y, W, a and b, the fully conditional posterior distribution of θi is given by

 p(θi|Y,W,a,b)j=1J{{exp[aj(θibj)]}Yij1+exp[aj(θibj)][cosh(21|aj(θibj)|)]×exp[[aj(θibj)]2Wij2]}p(θi|μθ|τ,σθ|τ2).

Therefore, the fully conditional posterior distribution of θi follows a normal distribution with mean

Varθi×(μθ|τσθ|τ-2+j=1J[aj2Wij](j=1J(2Yijaj+2aj2bjWij-aj)2j=1J[aj2Wij]))

and variance

Varθi={σθ|τ-2+j=1J[aj2Wij]}-1.

Step 5: Sample the response behavior variable ηij. The fully conditional posterior distribution of ηij is a Bernoulli distribution with success probability

   πigjh(Tij;μc,σc2)πigjh(Tij;μc,σc2)+(1-πi)P(Yij=1|θi,aj,bj)f(Tij;λj,τi,σj2),if Yij=1,πi(1-gj)h(Tij;μc,σc2)πi(1-gj)h(Tij;μc,σc2)+(1-πi)P(Yij=0|θi,aj,bj)f(Tij;λj,τi,σj2),if Yij=0.

Step 6: Sample πi. Given a Beta1, ι2) prior and j=1Jηij~Binomial(J,πi), the fully conditional posterior of πi is

πi~Beta(ι1+j=1Jηij, ι2+J-j=1Jηij).

Step 7: Sample gj. Given a Beta3, ι4) prior, within the guessing behavior category ηij = 1, the total number of people engaging in rapid guessing behavior on item j is i=1Nηij, and the number of correct items is i=1NηijYij; thus, i=1NηijYij~Binomial(i=1Nηij,gj). The fully conditional posterior is

gj~Beta(ι3+i=1NηijYij, ι4+i=1Nηij-i=1NηijYij).

Step 8: Sample τi. The conditional prior distribution of τi is assumed to follow a normal distribution with mean μτ|θ=μτ+στθσθ-2(θi-μθ) and στ|θ2=στ2-στθσθ-2σθτ. That is, τi~N(μτ|θ,στ|θ2). The fully conditional posterior distribution of τi given Tobs, θ, λ , σj2, μP, ΣP, η is proportional to

j=1Jf(tij;λjv,τi,σj2)(1-ηij)p(τi|μθ|τ,σθ|τ2).

The fully conditional posterior distribution of τi is

N(στi*2(σθτθiστ2-σθτ2+j=1J[(1-ηij)σj-2(λj-logtij)]), στi*2),

where στi2=((στ2σθτ2)1+j=1J[(1ηij)σj2])1.

Step 9: Sample λj. The fully conditional posterior distribution of the intensity parameter given the parameters Tobs, τ, σj2, μI, ΣI, η is

  p(λj|Tjobs,τ,σj2,μλ,σλ2,η)i=1Nf(tij;λj,τi,σj2)(1-ηij)p(λj|μλ,σλ2),

where λj~N(μλ,σλ2). The fully conditional posterior distribution of λj is

N(σλj*2(μλσλ-2+i=1N(1-ηij)(logtij+τi)σj-2),σλj*2),

where σλj2=(σλ2+σj2i=1N(1ηij))1.

Step 10: Sample σj2. A prior for σj2 is an inverse-gamma distribution, IG1, ω1). The fully conditional posterior distribution of σj2 is

IG(υ1+i=1N(1-ηij)2,ω1+i=1N[(1-ηij)(logtij-λj+τi)2]2).

Step 11: Sample μc. We assume a uniform prior pc)∝1. The fully conditional posterior distribution of μc is proportional to

p(μc|T,η)i=1Nj=1Jf(tij;μc,σc2)ηijp(μc).

The fully conditional posterior distribution of μc is

μc|T,η~N((i=1Nj=1Jηij)1(i=1Nj=1Jηijlogtij), (i=1Nj=1Jηij)1σc2).

Step 12: Sample σc2. We assume that the variance parameter follows an inverse-gamma prior distribution, IG2, ω2). The fully conditional posterior distribution of σc2 given T, μc, υ2, ω2, η is proportional to

p(σc2|T,μc,υ2,ω2,η)i=1Nj=1Jf(tij;μc,σc2)ηijp(σc2).

The fully conditional posterior distribution of σc2 is

σc2|T,μc,υ2,ω2,η~IG(υ1+i=1Nj=1Jηij2,ω1+i=1Nj=1Jηij(logtij-μc)22).

3.2. Metropolis–Hastings Sampling Algorithm

In order to estimate the constrained covariance matrix ΣP=(1σθτστθστ2)(where σθ2 is restricted to be equal to 1 owing to the model identification limitation), we need to update each element of the constrained covariance matrix separately using the Metropolis–Hastings algorithm.

Step 13: Sample the correlation σθτ between θ and τ. The identification constraints induce a restricted covariance matrix. The new value σθτ* is sampled from a truncated normal distribution N(σθτ(r-1),s012)I(-p01<σθτ*<p01), where p01=στ2,(r-1). Therefore, the probability of acceptance α(σθτ(r-1),σθτ*) can be written as

min{1,i=1Np(τi|θi (r),στ2,(r-1),σθτ*)p(σθτ*)(Φ(p01-σθτ(r-1)s01)-Φ(-p01-σθτ(r-1)s01))i=1Np(τi|θi (r),στ2,(r-1),σθτ(r-1))p(σθτ(r-1))(Φ(p01-σθτ*s01)-Φ(-p01-σθτ*s01))};

otherwise, σθτ(r-1)=σθτ*, where pii) is the conditional density function of the speed parameter, s012 is the proposal variance, and pθτ) is the density of the uniform prior.

Step 14: Sample στ2. The new value στ2,* is sampled from a truncated normal distribution N(στ2,(r-1),s022)I(στ2,*>(σθ(2)τ*)2=p0). Therefore, the probability of acceptance α(στ2,(r-1),στ2,*) can be written as

min{1,i=1Np(τi|θi (r),στ2,*,σθτ(r))p(στ2,*;κ,ϑ)(1-Φ(p0-στ2,(r-1)s02))i=1Np(τi|θi (r),στ2,(r-1),σθτ(r))p(στ2,(r-1);κ,ϑ)(1-Φ(p0-στ2,*s02))};

otherwise, στ2,*=στ2,(r-1), where s022 is the proposal variance, and p(στ2;κ,ϑ) is the density function of the scaled inverse chi-squared distribution with degrees of freedom and the scale parameter.

3.3. Bayesian Model Assessment

Two Bayesian model assessment methods were developed to evaluate the fit of the two models. The new model is a mixture model that combines responses and response times to detect rapid guessing behavior. The other model does not consider the mixture structure. Spiegelhalter et al. (2002) proposed the deviance information criterion (DIC) as a way to evaluate model fit based on Bayesian posterior estimates by considering the trade-off relationship between the adequacy of the model fitting and the number of model parameters. Write Λ = (Λij, i = 1, ..., N. j = 1, ..., J.), where Λij=(ηij,θi,aj,bj,λj,τi,σj2,μc,σc2,gj,πi) . Let {Λ(1), ..., Λ(M)}, where Λ(m)=(ηij(m),θi(m),aj(m),bj(m),λj(m),τi(m),σj2,(m),μc(m),σc2,(m),gj(m),πi(m)) for m = 1, ..., M, denotes an Markov chain Monte Carlo (MCMC) sample from the posterior distribution in (1). The logarithm of the joint likelihood function evaluated at Λ(m) is given by

logf(Y,T|Λ(m))=i=1Nj=1Jlogf(Yij,Tij|Λij(m)),    (4)

where

f(Yij,Tij|Λij)=[πigjh(Tij|μc,σc2)]ηij.Yij[πi(1-gj)h(Tij|μc,σc2)]ηij.(1-Yij)×[(1-πi)P(Yij=1|ηij=0)f(Tij|λj,τi,σj2)](1-ηij).Yij×[(1-πi)P(Yij=0|ηij=0)f(Tij|λj,τi,σj2)](1-ηij).(1-Yij).

As the log-likelihood function logf(Yij,Tij|Λij(r)), i = 1, ..., N. j = 1, ..., J, is readily available from the R outputs, logf(Y, T|Λ(r)) in (4) is easy to compute. The DIC can be calculated as follows:

DIC=Dev(Λ)^+2PD=Dev(Λ)^+2[Dev(Λ)¯Dev(Λ)^],    (5)

where

Dev(Λ)¯=-2Mm=1Mlogf(Y,T|Λ(m)) andDev(Λ)^=-2max1mMlogf(Y,T|Λ(m)).

In (5), Dev(Λ)¯ is a Monte Carlo estimate of the posterior expectation of the deviance function Dev(Λ) = −2logf(Y, T|Λ). Dev(Λ)^ is an approximation of Dev(Λ^), where Λ^ is the posterior mode, when the prior is relatively non-informative, and PD=Dev(Λ)¯-Dev(Λ)^ is the effective number of parameters. The model with a smaller DIC value fits the data better.

Another method to compare the fit of the two models is to use the logarithm of the pseudomarginal likelihood (LPML; Geisser and Eddy, 1979; Ibrahim et al., 2001) by calculating the conditional predictive ordinates (CPO) index. Next, the formulas for computing the CPO and LPML are given. Letting Uij,max=max1mM{-logf(Yij,Tij|Λij(m))}, a Monte Carlo estimate of the CPO (Gelfand et al., 1992; Chen et al., 2000) is given by

log(CPOij)^=-Uij,max-log[1Mm=1Mexp{-logf(Yij,Tij|Λij(m))-Uij,max}].    (6)

Note that the maximum value adjustment used in log(CPOij)^ plays an important part in numerical stabilization when computing exp{-logf(Yij,Tij|Λij(m))-Uij,max} in (6). A summary statistic of the CPOij^ is the sum of their logarithms, which is called the LPML and given by

LPML=i=1Nj=1Jlog(CPOij)^.

A model with a larger LPML has a better fit to the data.

4. Simulation Studies

4.1. Simulation 1

This simulation study was conducted to evaluate the recovery performance of the Pólya–gamma Gibbs sampling algorithm under different simulation conditions.

Simulation Designs

The following conditions were manipulated: (a) test length, J = 20 or 40, where the 20-item test is within 40 min, and the 40-item test is within 80 min; (b) the number of examinees, N = 1, 000 or 2, 000; and (c) the speededness level, low speededness level (LSL) or high speededness level (HSL). The speededness level is controlled by the intensity parameter λj. That is, a larger time intensity parameter corresponds to a longer average testing time. Fully crossing the different values of these four factors yielded eight conditions (two test lengths × two sample sizes × two speededness levels).

True Values and Prior Distributions

For the 2PL model, true values of item discrimination parameters aj are generated from a truncated normal distribution, i.e., aj ~ N(0, 1)I(0, +∞), j = 1, 2, ..., J, where the indicator function I(A) takes a value of 1 if A is true and 0 if A is false. The item difficulty parameters bj are generated from a standardized normal distribution. For the RT model, the response times of the rapid guessing behavior, Cij, are generated from a log-normal distribution (Wang and Xu, 2015, p. 464), i.e., logCij ~ N(−2, 0.25). The correct response probability of the rapid guessing behavior, gj, is set to 0.25 for all items (Wang and Xu, 2015). Although the variances of the RT model, σj2, can vary across items in the process of model setting and algorithm implementation, for convenience, we assume that the variance of the RT model, σj2, is set to 0.5 for all items. We controlled the speededness level by adjusting the time intensity parameter, that is, low speededness distribution λ ~ U(−0.25, 0.25) and high speededness distribution λ ~ U(0.25, 0.75). The proportion of examinees who could not finish a test within the allocated time is shown in Table 1. The proportion of items that were answered by guessing is also shown in Table 1. For the population distribution of person parameters, the ability and speed parameters (θ, τ)′ were generated from a bivariate normal distribution with mean vector (0, 0)′ and covariance matrix (10.50.50.25). The responses and response times were generated from the 2PL model and log-normal distribution. The following method was used to generate the guessing behavior indicator ηij. For all items, examinees could finish a given test within the allotted time having ηij = 0, where j = 1, ..., J. Other ηij were generated by the following two steps. Assuming that the generated response time data has no time limit for all items, then we replace Tij with Cij from the last item backward until the total response time is less than or equal to the allocated time. Therefore, given the eight simulation conditions, the RT paths for the examinees are shown in Figures 1, 2. Figures 3, 4 show the histograms of response times obtained from all item–person combinations. The non-informative priors and hyperpriors for the parameters were chosen as follows: p(aj)~N(0,105)I(0,+), p(bj)~N(0,105), p(gj) ~ Beta(5, 17), p(λj)~N(0,105), pi) ~ Beta(1, 5), p(μc)~N(-3,105), p(σc2)~Inv-Gamma(0.0001, 0.0001), p(στ2)~ Inv-Gamma(0.0001, 0.0001), and σθτ ~U(-στ2,στ2), where στ2=1. Fifty replications were considered in each simulation condition.

TABLE 1
www.frontiersin.org

Table 1. The proportions of examinees and items in the simulation study 1.

FIGURE 1
www.frontiersin.org

Figure 1. Response time paths for 1,000 examinees at different speededness levels in the simulation study 1. (A) items 20 and low speededness, (B) items 20 and high speededness, (C) items 40 and low speededness, and (D) items 40 and high speededness.

FIGURE 2
www.frontiersin.org

Figure 2. Response time paths for 2,000 examinees at different speededness levels in the simulation study 1. (A) items 20 and low speededness, (B) items 20 and high speededness, (C) items 40 and low speededness, and (D) items 40 and high speededness.

FIGURE 3
www.frontiersin.org

Figure 3. Histogram of 1,000 examinees' response times based on all item–person combinations in the simulation study 1. (A) items 20 and low speededness, (B) items 20 and high speededness, (C) items 40 and low speededness, and (D) items 40 and high speededness.

FIGURE 4
www.frontiersin.org

Figure 4. Histogram of 2,000 examinees' response times based on all item–person combinations in the simulation study 1. (A) items 20 and low speededness, (B) items 20 and high speededness, (C) items 40 and low speededness, and (D) items 40 and high speededness.

Convergence diagnostics

In order to evaluate the convergence of parameter estimates, we only considered convergence in the case of minimum sample sizes with HSLs owing to space limitations. That is, the test length was fixed at 20, and the number of examinees was 1,000. Two methods were used to check the convergence of our algorithm: the “eyeball” method to monitor the convergence by visually inspecting the history plots of the generated sequences; and the Gelman–Rubin method (Gelman and Rubin, 1992; Brooks and Gelman, 1998).

The convergence of the Bayesian algorithm was checked by monitoring the trace plots of the parameters for consecutive sequences of 20,000 iterations. The first 10,000 iterations were set as the burn-in period. As an illustration, four chains started at overdispersed starting values were run for each replication. The trace plots of item parameters randomly selected are shown in Figure 5. In addition, the potential scale reduction factor (PSRF; Brooks and Gelman, 1998) values for all item parameters are shown in Figure 6. We found that the PSRF values for all parameters were less than 1.2, which ensured that all chains converged as expected.

FIGURE 5
www.frontiersin.org

Figure 5. The trace plots of three randomly selected items for the simulation study 1.

FIGURE 6
www.frontiersin.org

Figure 6. The trace plots of R^ for item parameters in the simulation study 1.

5. Results

As shown in Table 2, the bias was between 0.0098 and 0.1411 for the discrimination parameters a, between −0.0335 and 0.0010 for the difficulty parameters b, between −0.0206 and 0.0115 for the rapid guessing parameters g, between −0.0271 and 0.0386 for the time intensity parameters λ, between −0.0105 and 0.0314 for the time discrimination parameters σ2, between 0.0196 and 0.0313 for the ability parameters θ, between 0.0058 and 0.0377 for the speed parameters τ, between −0.0259 and 0.0202 for the μc, between −0.0373 and 0.0136 for the σc2, between −0.0671 and −0.0102 for the σθτ, and between −0.0201 and 0.0056 for the στ2. In addition, the MSE was between 0.0125 and 0.0413 for the discrimination parameters a, between 0.0041 and 0.0138 for the difficulty parameters b, between 0.0009 and 0.0026 for the rapid guessing parameters g, between 0.0001 and 0.0017 for the time intensity parameters λ, between 0.0001 and 0.0005 for the time discrimination parameters σ2, between 0.0873 and 0.1920 for the ability parameters θ, between 0.0068 and 0.0693 for the speed parameters τ, between 0.0000 and 0.0007 for the μc, between 0.0000 and 0.0009 for the σc2, between 0.0010 and 0.0045 for the σθτ, and between 0.0002 and 0.0009 for the στ2. In summary, the Pólya–gamma Gibbs sampling algorithm provides accurate estimates of the parameters for various numbers of examinees and items.

TABLE 2
www.frontiersin.org

Table 2. Evaluating the accuracy of parameters based on mixture model in simulation study 1.

5.1. Simulation 2

In this simulation study, we focus on the model fitting data for the mixture model and non-mixture model based on different simulation conditions from the perspective of Bayesian model assessment. Two Bayesian model assessment tools, DIC and LPML, are used to identify the true models.

Simulation Designs

For purposes of illustration, the numbers of examinees and items were fixed at 1,000 and 40, respectively. The true value settings for the item parameters in the 2PLIRT model and response time model were the same as in simulation study 1. The first factor is the correlation coefficient. Three correction coefficients ρθτ were considered in this simulation. That is, (1) ρθτ = 0.3 (θ and τ have weak correlation; WC); (2) ρθτ = 0.8 (θ and τ have a strong correlation; SC). Furthermore, the true values of θ and τ can be drawn from a bivariate normal distribution with mean vector 0 and covariance matrix (1ρθτρθτ1). The second factor is the speededness level, which was varied by adjusting the time intensity parameter λ: (1) LSL, λ ~ U(−0.25, 0.25); (2) HSL, λ ~ U(0.25, 0.75). The third factor is the choice of fitting model: (1) mixture model; (2) non-mixture model (hierarchical structure model of van der Linden, 2007). Based on the abovementioned test conditions, the item responses and response time data were respectively generated from the 2PLIRT model and response time model. Therefore, the true models and the fitted models were designed as follows.

(i) True model, i.e., mixture model with WC (ρθτ = 0.3)⊕LSL vs. fitted model, i.e., mixture model with WC (ρθτ = 0.3)⊕LSL, and non-mixture model with WC (ρθτ = 0.3)⊕LSL.

(ii) True model, i.e., mixture model with SC (ρθτ = 0.8 )⊕LSL vs. fitted model, i.e., mixture model with SC (ρθτ = 0.8)⊕LSL, and non-mixture model with SC (ρθτ = 0.8)⊕LSL.

(iii) True model, i.e., mixture model with WC (ρθτ = 0.3)⊕HSL vs. fitted model, i.e., mixture model with WC (ρθτ = 0.3)⊕HSL, and non-mixture model with WC (ρθτ = 0.3)⊕HSL.

(iv) True model, i.e., mixture model with SC (ρθτ = 0.8 )⊕HSL vs. fitted model, i.e., mixture model with SC (ρθτ = 0.8)⊕HSL, and non-mixture model with SC (ρθτ = 0.8)⊕HSL.

(v) True model, i.e., non-mixture model with WC (ρθτ = 0.3)⊕LSL vs. fitted model, i.e., mixture model with WC (ρθτ = 0.3)⊕LSL, and non-mixture model with WC (ρθτ = 0.3)⊕LSL.

(vi) True model, i.e., non-mixture model with SC (ρθτ = 0.8 )⊕LSL vs. fitted model, i.e., mixture model with SC (ρθτ = 0.8)⊕LSL, and non-mixture model with SC (ρθτ = 0.8)⊕LSL.

(vii) True model, i.e., non-mixture model with WC (ρθτ = 0.3)⊕HSL vs. fitted model, i.e., mixture model with WC (ρθτ = 0.3)⊕HSL, and non-mixture model with WC (ρθτ = 0.3)⊕HSL.

(viii) True model, i.e., non-mixture model with SC (ρθτ = 0.8 )⊕HSL vs. fitted model, i.e., mixture model with SC (ρθτ = 0.8)⊕HSL, and non-mixture model with SC (ρθτ = 0.8)⊕HSL.

The priors of parameters were also the same as those used in simulation 1. That is, the non-informative priors were used in this simulation study. To implement the MCMC sampling algorithm, chains of length 10,000 with an initial burn-in period of 20,000 were chosen. There were 50 replications for each simulation condition. The PSRF (Brooks and Gelman, 1998) values for all item and person parameters for each simulation condition were less than 1.2.

Results

As shown in Tables 3, 4, regardless of whether the speededness levels were low or high, and whether the correlation coefficients were weak (ρθτ = 0.3) or strong (ρθτ = 0.8), both Bayesian model assessment criteria could accurately identify the true models when the data were generated from the mixture models and non-mixture models. More specifically, under the LSL and WC conditions, when the mixture model was the true model, the mixture model fitted the data better, as expected. The median DIC of the mixture model (185007.092) was smaller than that of the non-mixture model (201335.596), and the median LPML of the mixture model (−91302.451) was larger than that of the non-mixture model (−103871.796). Similarly, under the HSL and SC conditions, when the mixture model was the true model, the mixture model also fitted the data best. The differences in the medians of DIC and LPML between the mixture model and non-mixture model were −16705.267 and 6977.566, respectively. In addition, under the LSL and WC conditions, when the non-mixture model was the true model, the non-mixture model fitted the data better. The median DIC of the non-mixture model (188057.725) was smaller than that of the mixture model (192051.824), and the median LPML of the non-mixture model (−93222.498) was larger than that of the mixture model (−95204.235). Similarly, under the HSL and SC conditions, when the non-mixture model was the true model, the mixture model also fitted the data better. The differences in the medians of DIC and LPML between the non-mixture model and mixture model were −4016.200 and 1943.389, respectively. Refer to Tables 3, 4 for more detailed results of the model assessment. In summary, the Bayesian assessment criteria were effective for identifying the true models and could, thus, be used in the subsequent real data study.

TABLE 3
www.frontiersin.org

Table 3. The results of Bayesian model assessment in simulation study 2.

TABLE 4
www.frontiersin.org

Table 4. The results of Bayesian model assessment in simulation study 2.

6. Empirical Example

This section presents an application of the mixture model with an empirical example. The data set was from a high-state, large-scale, standardized computerized adaptive test that was previously analyzed by Wang and Xu (2015). The data set included 37 dichotomous items, and the test time was 75 min. The sample size was 2,106. The mixture model and non-mixture model were used to fit the item response and response time data of the 37 dichotomous items. The response time path for the examinees is shown in Figure 7. In addition, Figure 7 shows a histogram of response times obtained from all item–person combinations.

FIGURE 7
www.frontiersin.org

Figure 7. The response time path for the examinees and the histogram of response times obtained from all item-person combinations.

In the Bayesian computation, we used 20,000 MCMC samples after a burn-in of 10,000 iterations to compute all posterior estimates. The convergence of the chains was checked using the PSRF. The PSRF values of all item parameters were less than 1.2. We used the DIC and LPML to fit the mixture model and non-mixture model. The mixture model resulted in a smaller DIC value (350696.11) than the non-mixture model (365690.66), and the LPML of the mixture model (−175027.99) was larger than that of the non-mixture model (−181062.48). This indicates that the mixture model better fitted the data. Based on the results of the model assessment, we used the mixture model to analyze real data in detail.

Analysis of item parameters

The estimated results for the discrimination and difficulty parameters are shown in Table 5. As shown in the table, the expected a posteriori (EAP) estimates of the one-item discrimination parameters were greater than 1. This indicated that the items could well distinguish the differences between abilities. The three items with the lowest discrimination were items 4, 34, and 20. The EAP estimates of discrimination parameters for these three items were 0.2000, 0.2024, and 0.2638. In addition, another three items had the lowest EAP estimates of the difficulty parameters, indicating that these items were easier than the other items. These were items 7, 13, and 8. The EAP estimates of gj had a range of 0.1334 to 0.2945. The EAP estimates of λj had a range of −0.3322 to 0.7634.

TABLE 5
www.frontiersin.org

Table 5. The estimation results of discrimination and difficulty parameter for the real data.

7. Conclusion

In this article, we propose a novel and efficient Bayesian algorithm (Pólya–gamma Gibbs sampling algorithm) based on the auxiliary variables for estimating the mixture hierarchical model. The new algorithm avoids the tedious multidimensional integral operation of the MMLE. Within a fully Bayesian framework, the Pólya–gamma Gibbs sampling algorithm not only avoids the heavy reliance of the traditional Metropolis–Hastings algorithm on the tuning parameters of the proposed distributions for different data sets but also overcomes the disadvantage of the Metropolis–Hastings algorithm being sensitive to step size. However, the computational burden of the Pólya–gamma Gibbs sampling algorithm becomes excessive especially when there are a large number of examinees, the items or the abnormal response and response time data are considered, or a large number MCMC sample size is used. Therefore, it would be desirable to develop a stand-alone R package associated with Fortran software for a more extensive large-scale assessment program.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Materials, further inquiries can be directed to the corresponding authors.

Author Contributions

ZZ completed the writing of the article. JZ and JL provided original thoughts. ZZ and JL provided key technical support. All authors contributed to the article and approved the submitted version.

Funding

This work was supported by the National Natural Science Foundation of China (grant no. 12001091), China Postdoctoral Science Foundations (grant nos. 2021M690587 and 2021T140108), the Fundamental Research Funds for the Central Universities of China (grant no. 2412020QD025), Yili Normal University 2021 Annual Research Project (grant no. 2021YSBS012).

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyg.2022.841372/full#supplementary-material

References

Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibb sampling. J. Educ. Stat. 17, 251–269.

Google Scholar

Albert, J. H. S., and Chib. (1993). Bayesian analysis of binary and polychotomous response data. J. Am. Stat. Assoc. 88, 669–679.

Google Scholar

Asparouhov, T., and Muthén, B. (2010b). Bayesian analysis of latent variable models using Mplus (Technical report, Version 4). Available online at: http://www.statmodel.com.

PubMed Abstract | Google Scholar

Baker, F. B., and Kim, S. H. (2004). Item Response Theory: Parameter Estimation Techniques. 2nd Edition, Boca Raton: CRC Press. doi: 10.1201/9781482276725

CrossRef Full Text | Google Scholar

Béguin, A. A., and Glas, C. A. W. (2001). MCMC estimation of multidimensional IRT models. Psychometrika 66, 541–561. doi: 10.1007/BF02296195

CrossRef Full Text | Google Scholar

Biane, P., Pitman, J., and Yor, M. (2001). Probability laws related to the Jacobi theta and Riemann zeta functions, and brownian excursions. Bull. Am. Math. Soc. 38, 435–465. doi: 10.48550/arXiv.math/9912170

CrossRef Full Text | Google Scholar

Birnbaum, A. (1968). “Some latent trait models and their use in inferring an examinee's ability,” in Statistical Theories of Mental Test Scores, eds F. M. Lord and M. R. Novick (Reading, MA: Addison-Wesley), 397–472.

Google Scholar

Bock, R. D., and Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika 46, 443–459.

Google Scholar

Bock, R. D., and Schilling, S. G. (1997). “High-dimensional full-information item factor analysis,” in Latent Variable Modelling and Applications to Causality, ed M. Berkane (New York, NY: Springer), 164–176.

PubMed Abstract | Google Scholar

Bolt, D. M., Cohen, A. S., and Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: application of a mixture Rasch model with ordinal constraints. J. Educ. Meas. 39, 331–348. doi: 10.1111/j.1745-3984.2002.tb01146.x

CrossRef Full Text | Google Scholar

Boughton, K. A., and Yamamoto, K. (2007). “A HYBRID model for test speededness,” in Multivariate and Mixture Distribution Rasch Models, eds M. von Davier and C. H. Carstensen (New York, NY: Springer), 147–156.

PubMed Abstract | Google Scholar

Brooks, S. P., and Gelman, A. (1998). Alternative methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics, 7, 434-455.

Google Scholar

Chang, Y. W., Tsai, R. C., and Hsu, N. J. (2014). A speeded item response model: leave the harder till later. Psychometrika 79, 255–274. doi: 10.1007/s11336-013-9336-2

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, M.-H., Shao, Q.-M., and Ibrahim, J. G. (2000). Monte Carlo Methods in Bayesian Computation. New York, NY: Springer.

PubMed Abstract | Google Scholar

Chib, S., and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. Am. Stat. 49, 327–335.

Google Scholar

Converse, G., Curi, M., Oliveira, S., and Templin, J. (2021). Estimation of multidimensional item response theory models with correlated latent variables using variational autoencoders. Mach. Learn. 110, 1463–1480. doi: 10.1007/s10994-021-06005-7

CrossRef Full Text | Google Scholar

Fox, J.-P. (2010). Bayesian Item Response Modeling: Theory and Applications. New York, NY: Springer.

Google Scholar

Fox, J.-P., and Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika 66, 269–286. doi: 10.1007/BF02294839

CrossRef Full Text | Google Scholar

Fox, J. P. (2005). Multilevel IRT using dichotomous and polytomous items. Br. J. Math. Stat. Psychol. 58, 145–172. doi: 10.1348/000711005X38951

PubMed Abstract | CrossRef Full Text | Google Scholar

Geisser, S., and Eddy, W. F. (1979). A predictive approach to model selection. J. Am. Stat. Assoc. 74, 153–160.

Google Scholar

Gelfand, A. E., Dey, D. K., and Chang, H. (1992). “Model determination using predictive distributions with implementation via sampling-based methods (with discussion),” in Bayesian statistics 4, eds J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (Oxford, UK: Oxford University Press), 147–167.

Google Scholar

Gelfand, A. E., and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85, 398–409.

Google Scholar

Gelman, A., and Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statisti Sci, 7, 457–472. doi: 10.1214/ss/1177011136

CrossRef Full Text | Google Scholar

Geman, S., and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6, 721–741.

PubMed Abstract | Google Scholar

Ghosh, M. A., Ghosh, M., Chen, and A. Agresti. (2000). Noninformative priors for one parameter item response models. Journal of Statistical Planning and Inference, 88, 99-115.

Google Scholar

Goegebeur, Y., De Boeck, P., Wollack, J. A., and Cohen, A. S. (2008). A speeded item response model with gradual process change. Psychometrika 73, 65–87. doi: 10.1007/s11336-007-9031-2

CrossRef Full Text | Google Scholar

Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.

Google Scholar

Ibrahim, J. G., Chen, M.-H., and Sinha, D. (2001).Bayesian Survival Analysis. New York, NY: Springer.

Jackman, S. (2009). Bayesian Analysis for the Social Sciences. Chichester, UK: John Wiley & Sons.

Google Scholar

Jiang, Z., and Templin, J. (2019). Gibbs samplers for logistic item response models via the pólya–gamma distribution: a computationally efficient data–augmentation strategy. Psychometrika 84, 358–374. doi: 10.1007/s11336-018-9641-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Kuk, A. Y. C. (1999). Laplace importance sampling for generalized linear mixed models. J. Stat. Comput. Simulat. 63, 143–158.

Google Scholar

Lee, S.-Y., and Song, X.-Y. (2004). Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behav. Res. 39, 653–686. doi: 10.1207/s15327906mbr3904_4

PubMed Abstract | CrossRef Full Text | Google Scholar

Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison Wesley.

Google Scholar

Lord, F. M., and Novick, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Google Scholar

Man, K., and Harring, J. R. (2021). Assessing preknowledge cheating via innovative measures: A multiple-group analysis of jointly modeling item responses, response times, and visual fixation counts. Educ. Psychol. Meas. 81, 441–465. doi: 10.1177/0013164420968630

PubMed Abstract | CrossRef Full Text | Google Scholar

Man, K., Harring, J. R., Ouyang, Y., and Thomas, S. L. (2018). Response time based nonparametric Kullback-Leibler divergence measure for detecting aberrant test-taking behavior. Int. J. Testing 18, 155–177. doi: 10.1080/15305058.2018.1429446

CrossRef Full Text | Google Scholar

Maris, E. (1993). Additive and multiplicative models for gamma distributed variables, and their application as psychometric models for response times. Psychometrika 58, 445–469.

Google Scholar

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953). Equations of state calculations by fast computing machines. J. Chemi. Phys. 21, 1087–1092.

Google Scholar

Muthén, B. O. (2010). Bayesian Analysis in Mplus: A Brief Introduction (Incomplete Draft,Version 3). Available online at: http://www.statmodel.com/download/IntroBayesVersion%203.pdf.

Google Scholar

Polson, N. G., Scott, J. G., and Windle, J. (2013). Bayesian inference for logistic models using Pólya–Gamma latent variables. J. Am. Stat. Assoc. 108, 1339–1349. doi: 10.1080/01621459.2013.829001

PubMed Abstract | CrossRef Full Text | Google Scholar

Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2002). Reliable estimation of general ized linear mixed models using adaptive quadrature. Stata J. 2, 1–21. doi: 10.1177/1536867X0200200101

CrossRef Full Text | Google Scholar

Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2005). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. J. Econometr. 128, 301–323. doi: 10.1016/j.jeconom.2004.08.017

CrossRef Full Text | Google Scholar

Rouder, J. N., Province, J. M., Morey, R. D., Gomez, P., and Heathcote, A. (2015). The lognormal race: a cognitive-process model of choice and latency with desirable psychometric properties. Psychometrika 80, 491–513. doi: 10.1007/s11336-013-9396-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Rouder, J. N., Sun, D., Speckman, P. L., Lu, J., and Zhou, D. (2003). A hierarchical Bayesian statistical framework for response time distributions. Psychometrika 68, 589–606. doi: 10.1007/BF02295614

PubMed Abstract | CrossRef Full Text | Google Scholar

Scheiblechner, H. (1979). Specific objective stochastic latency mechanisms. J. Math. Psychol. 19, 18–38.

Google Scholar

Schnipke, D. L., and Scrams, D. J. (1997). Modeling response times with a two-state mixture model: a new method of measuring speededness. J. Educ. Meas. 34, 213–232.

Google Scholar

Skaug, H. J. (2002). Automatic differentiation to facilitate maximum likelihood estimation in nonlinear random effects models. J. Comput. Graphical Stat. 11, 458–470. doi: 10.1198/106186002760180617

CrossRef Full Text | Google Scholar

Song, X.-Y., and Lee, S.-Y. (2012). A tutorial on the Bayesian approach for analyzing structural equation models. J. Math. Psychol. 56, 135–148. doi: 10.1016/j.jmp.2012.02.001

CrossRef Full Text | Google Scholar

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). Bayesian measures of model complexity and fit. J. R. Stat. Soc. B 64, 583–639. doi: 10.1111/1467-9868.00353

CrossRef Full Text | Google Scholar

Tanner, M. A., and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82, 528–550.

Google Scholar

Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussions). Ann. Stat. 22, 1701–1762.

Google Scholar

van der Linden, W. J. (2006). A lognormal model for response times on test items. J. Educ. Behav. Stat. 31, 181–204. doi: 10.3102/10769986031002181

CrossRef Full Text | Google Scholar

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika 72, 287–308. doi: 10.1007/s11336-006-1478-z

CrossRef Full Text | Google Scholar

van der Linden, W. J., and Guo, F. (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika 73, 365–384. doi: 10.1007/s11336-007-9046-8

CrossRef Full Text | Google Scholar

Wang, C., Fan, Z., Chang, H.-H., and Douglas, J. (2013). A semiparametric model for jointly analyzing response times and accuracy in computerized testing. J. Educ. Behav. Stat. 38, 381–417. doi: 10.3102/1076998612461831

CrossRef Full Text | Google Scholar

Wang, C., and Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. Br. J. Math. Stat. Psychol. 68, 456–477. doi: 10.1111/bmsp.12054

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, C., Xu, G., and Shang, Z. (2018). A two-stage approach to differentiating normal and aberrant behavior in computer based testing. Psychometrika 83, 223–254. doi: 10.1007/s11336-016-9525-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Wise, S. L., and DeMars, C. E. (2006). An application of item response time: the effort-moderated IRT model. J. Educ. Meas. 43, 19–38. doi: 10.1111/j.1745-3984.2006.00002.x

CrossRef Full Text | Google Scholar

Wise, S. L., and Kong, X. (2005). Response time effort: a new measure of examinee motivation in computer-based tests. Appl. Meas. Educ. 18, 163–183. doi: 10.1207/s15324818ame1802_2

CrossRef Full Text | Google Scholar

Zhang, J. W., Lu, J., Chen, F., and Tao, J. (2019). Exploring the correlation between multiple latent variable and covariates in hierarchical data based on the multilevel multidimensional IRT model. Front Psychol, 10:2387. doi: 10.3389/fpsyg.2019.02387

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, Z. Y, Zhang, J. W., Lu, J., and Tao, J. (2020). Bayesian estimation of the DINA model with Pólya-Gamma Gibbs algorithm. Front. Psychol. 11, 384. doi: 10.3389/fpsyg.2020.00384

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: aberrant responses, Bayesian inference, mixture hierarchical model, Pólya-gamma distribution, rapid guessing behavior, Gibbs sampling algorithm

Citation: Zhang Z, Zhang J and Lu J (2022) Bayesian Analysis of Aberrant Response and Response Time Data. Front. Psychol. 13:841372. doi: 10.3389/fpsyg.2022.841372

Received: 22 December 2021; Accepted: 15 March 2022;
Published: 25 April 2022.

Edited by:

Begoña Espejo, University of Valencia, Spain

Reviewed by:

Pablo Gomez, California State University, United States
Dylan Molenaar, University of Amsterdam, Netherlands
Kaiwen Man, University of Alabama, United States

Copyright © 2022 Zhang, Zhang and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jiwei Zhang, emhhbmdqdzcxMyYjeDAwMDQwO25lbnUuZWR1LmNu; Jing Lu, bHVqMjgyJiN4MDAwNDA7bmVudS5lZHUuY24=

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.