Differential item functioning in the autism behavior checklist in children with autism spectrum disorder based on a machine learning approach

Peng, Kanglong; Chen, Meng; Zhou, Libing; Weng, Xiaofang

doi:10.3389/fpsyt.2024.1447080

ORIGINAL RESEARCH article

Front. Psychiatry, 16 September 2024

Sec. Autism

Volume 15 - 2024 | https://doi.org/10.3389/fpsyt.2024.1447080

Kanglong Peng¹*†

Meng Chen²†

Libing Zhou²

Xiaofang Weng²

¹Rehabilitation Department, Shenzhen Children’s Hospital, Shenzhen, China
²Rehabilitation Department, Luohu District Maternal and Child Health Care Hospital, Shenzhen, China

Aim: Our study utilized the Rasch analysis to examine the psychometric properties of the Autism Behavior Checklist (ABC) in children with autism spectrum disorder (ASD).

Methods: A total of 3,319 children (44.77 ± 23.52 months) were included. The Rasch model (RM) was utilized to test the reliability and validity of the ABC. The GPCMlasso model was used to test the differential item functioning (DIF).

Result: The response pattern of this sample showed acceptable fitness to the RM. The analysis supported the unidimensionality assumption of the ABC. Disordered category functions and DIF were found in all items in the ABC. The participants responded to the ABC items differently depending not only on autistic traits but also on age groups, gender, and symptom classifications.

Conclusion: The Rasch analysis produces reliable evidence to support that the ABC can precisely depict clinical ASD symptoms. Differences in population characteristics may cause unnecessary assessment bias and lead to overestimated or underestimated symptom severity. Hence, special consideration for population characteristics is needed in making an ASD diagnosis.

Introduction

Autism spectrum disorder (ASD) is a highly heritable and heterogeneous neurodevelopmental disorder whose symptoms emerge in the early developmental stage and persist along the overall lifespan (1). For now, the specific pathogenesis mechanism underlying ASD is unknown; hence, no comprehensive cure for ASD has been found (2). Timely diagnosis is needed to initiate early interventions, which can lead to more optimal developmental outcomes in individuals with definitive or suspected ASD (3, 4).

A comprehensive ASD diagnosis is established based on the detailed developmental trajectory, clinical observation, and the application of standardized diagnostic instruments (3). As no objective evidence can decide whether autistic traits fulfill the criteria to make an ASD diagnosis or not, the diagnostic decision is mainly built on the clinician’s experience or the patients’ self-perception (1, 5).

To promote diagnostic reliability and validity, clinicians tend to describe autistic symptoms in two dimensions according to the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition, Text Revision (DSM-5-TR), including social communication and restricted and repetitive behaviors (6). Studies found that autistic symptoms can be quantitatively rated on three dimensions including social interaction, communication, and restrictive and repetitive behaviors based on the DSM, 4th Edition, Text Revision (DSM-IV-TR) (1, 7–9). Findings proposed that the three-dimension rating structure is more suitable than others (e.g., DSM-5-TR) (7). Hence, individuals with ASD may present various extreme autistic traits in different dimensions, and clinicians need to decide whether autistic symptoms are merely autistic-like personalities or true autistic symptoms (10). For example, one study found that boys may display more restrictive interests in the typical examples presented by clinical assessment (e.g., trains, fans, computers, and dinosaurs), but girls may express such interests through more developmentally normative circumscribed interests (e.g., Barbie dolls and horses) (11). In fact, autistic symptoms are always heterogeneous, and children with ASD do not necessarily share the same symptomatology or the so-called core symptoms (10). Furthermore, researchers also found that individual autistic profiles based on DSM-5-TR can be continuously categorized into various subgroups (5). For example, social communication deficits are more common in individuals with ASD who are younger and present lower developmental functioning (12). In contrast, those who are older and have higher developmental functioning tend to present restricted and repetitive behaviors (12). That means the overlap among autistic profiles can be well described by current diagnostic tools, but the variability across different subsamples may jeopardize diagnostic reliability and validity (13). As previous studies report, the symptom diversity may originate from the individual developmental profiles of children with ASD including age, cognition, speech, and language (5, 14). Hence, special considerations are needed in choosing appropriate diagnostic tools to avoid potential bias caused by these latent factors (14).

To address the developmental profile in ASD diagnosis, the International Classification of Diseases 11th Revision (ICD-11) tries to describe the autistic traits starting from early childhood, and the ICD-11 defines autistic symptoms as the conflict between limited capacities and social demands (15). The Autism Behavior Checklist (ABC) was built to depict possibly all the main problematic symptoms in individuals with ASD (16). Accumulated evidence indicates that the ABC can be applied in a clinical setting to portray ASD symptomatology with acceptable psychometric properties (17–19). However, psychometric research has revealed some variability in measurement properties among different subsamples (17, 19, 20). For example, lower sensitivity and specificity were reported in a sample from China (79.31%/70.83%) (20). In contrast, higher sensitivity (92.11% and 94.7%) and specificity (92.14% and 92.63%) were reported in populations from Egypt and Brazil (17, 19). The ABC was established based on one survey form that contained possibly all autistic behaviors based on clinician experience (21). The ABC tries to depict autistic traits based on five components including relating, sensory, language, body and object use, and social and self-help (21). To validate the measurement structure, Wadden and Fredrika tried to explore the component structure underlying the ABC, and the results could not confirm the original five-component structure proposed by Krug. Wadden proposed a three-dimension structure, namely, non-responsive, aloof or repetitive, and infantile or aggressive (22). Miranda proposed another five-component structure including non-responsive behavior, infant-like behavior, aggressive behavior, stereotypical behavior, and echolalic speech (23). More carefully designed psychometric studies are needed to investigate the theoretical structure underlying the ABC to draw a more definitive and clinically useful conclusion (22, 23).

Since timely diagnosis gives individuals with ASD appropriate access to early intervention, it is critical to use diagnostic measures rigorously to obtain reliable information from individuals with a suspected diagnosis of ASD. Classical Test Theory (CTT) depends heavily on recruiting a sample with typical representative characteristics (24). The psychometric properties obtained under CTT may vary across studies because samples differ in their characteristics (25). This can be explained by the fact that CTT assumes that measurement accuracy is invariant across all individuals regardless of personal traits (e.g., gender, age, and ability), and CTT uses only the total scores to estimate the measurement error (26).

Hence, to comprehensively explore the theoretical basis underlying the ABC, this study adopted the Rasch model to elaborate on the item-level psychometric properties in detail. Additionally, this study also tried to describe the magnitude of measurement bias produced by potential variants in clinical applications. We aimed to establish a prediction model for clinicians to identify items that may demand extra consideration to be interpreted or individuals who tend to generate unexpected outcomes in the ABC.

Materials and methods

Participants

Participants were recruited from local referral programs of the government service including maternal and childcare service centers, educational institutions, and community agencies. Children with definitive or suspected diagnoses of ASD were referred for comprehensive evaluation through this program. The referred individuals underwent interdisciplinary assessment to reach a definitive diagnosis of ASD and receive tailored intervention. The comprehensive assessment routinely includes the administration of the ABC and other standardized tools. Prior to administration, all necessary consent forms were obtained from all subjects and/or their legal guardian(s).

Measure

Childhood Autism Rating Scale

The Childhood Autism Rating Scale (CARS) was built to serve as an observation rating scale to depict ASD symptoms through parent/caregiver interviews, observations, and case reviews. The CARS-2 provided an additional version to describe the symptoms of high-functioning individuals with ASD. The original version remained applicable for individuals with ASD aged under 6 years or above 6 years with lower developmental functioning. The CARS consists of 15 items including relation to people, imitation, emotional response, body use, object use, adaptation to change, visual response, listening response, sensory, emotional, verbal communication, gesture, activity status, intellectual response, and overall impressions. Each item is assigned a score from 1 to 4 points, where 1 denotes appropriate behavior and 4 denotes behavior severely deviated from normal criteria. The total score is the sum of all items, where higher scores denote more severe ASD symptoms. The CARS was administered by trained/licensed clinicians or researchers with appropriate training for necessary interviews with parents and caregivers.

One study reported that the CARS can be utilized to categorize the ASD symptom severity into three levels including non-autism, mild-to-moderate autism, and severe autism (27). This tool was utilized to display the overall symptom severity in our sample.

Autism Behavior Checklist

The ABC assessment consists of 57 items that involve possibly all typical autistic behaviors. Items are categorized into five components including relating, sensory, language, body use and object manipulation, and social and self-help. One researcher read each item description to the participants, who rated the item if their children behaved as the item described. Furthermore, each item carries its own score ranging from 1 to 4 points according to the item weights. The weighted score of each item is decided by the occurrence frequency in Krug’s study (21). For example, if item 1 occurred more than item 2, then item 1 is endowed with 4 points, and item 2 is endowed with 2 points. If one item is rated, then the participant receives the corresponding score (e.g., item 1 scores 0/4, and item 2 scores 0/2). The original cut-off score was set at 68, and a total score above 67 indicated severe symptoms or a higher possibility of being diagnosed with ASD (22).

The interrater reliability was 0.85, and the intra-rater reliability was 0.82 (17, 21).

Data analysis

Rasch model

The Rasch model is generally accepted as an extension of Classical Test Theory. The Rasch model converts the raw score summary to its natural logarithm, constructing an interval scale from dichotomous-level observations. Classical Test Theory defines the total score of a set of items as the latent trait of a person, while the Rasch model utilizes the item scores as the sufficient statistic for estimating person ability (the latent trait of a person) and item difficulty independently. The Rasch model relates the probability of a successful (or unsuccessful) response $x_{υ ι}$ to the difference between person ability $β_{υ}$ and item difficulty $δ_{ι}$; it allows us to estimate $β_{υ}$ and $δ_{ι}$ independently from the available data, and then we can examine how well these data fit the predictions calculated from the model. The equation is shown below:

$$P\{x_{υι} = 1 \mid β_{υ}, δ_{ι}\} = \frac{\exp(β_{υ} - δ_{ι})}{1 + \exp(β_{υ} - δ_{ι})}$$
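To make the equation concrete, the following minimal sketch evaluates the Rasch probability for a single person and item. The function name and the logit values are illustrative assumptions and are not taken from the study data.

```python
import math

def rasch_probability(beta: float, delta: float) -> float:
    """Probability of a positive response for person ability beta and item difficulty delta (both in logits)."""
    return math.exp(beta - delta) / (1.0 + math.exp(beta - delta))

# Hypothetical values, chosen only for illustration.
p_equal = rasch_probability(beta=0.0, delta=0.0)   # ability equals difficulty
p_above = rasch_probability(beta=1.0, delta=0.0)   # ability one logit above difficulty
print(round(p_equal, 3), round(p_above, 3))
```

When ability equals difficulty, the probability is 0.5, and it rises as ability exceeds difficulty.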

Hence, the Rasch model is widely used to investigate the properties of each item in a scale, including item difficulty, discrimination, and fitness to the hypothetical theoretical model. The Rasch model is widely utilized to test the psychometric properties of commonly used assessment tools including the Test of Infant Motor Development, Motor Proficiency 2nd Edition, and Peabody Developmental Motor Scale (28–32). In this study, the ABC adopted a dichotomous option design (e.g., yes or no). That means if a child’s behavior fulfilled the item description, then the child scored the weighted point, and vice versa. The Rasch model assumes that children with ASD with more severe symptoms may display more problematic behaviors in the ABC. In this study, the weighted points were removed, and the scoring sheet was rescored by replacing the weighted points with 0 and 1. The rescored answer sheet therefore contained only dichotomous responses (i.e., 1 and 0).
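The rescoring step can be sketched as follows. This is a minimal illustration that assumes an unrated item is stored as 0 and a rated item as its weight; the function name and the response vector are hypothetical.

```python
def rescore_dichotomous(weighted_scores):
    """Replace each weighted item score with 1 if the item was rated (score > 0), otherwise 0."""
    return [1 if score > 0 else 0 for score in weighted_scores]

# Hypothetical answer sheet: weights of 4 and 2 follow the item 1 / item 2 example in the Measure section.
weighted = [4, 0, 2, 0]
dichotomous = rescore_dichotomous(weighted)
print(dichotomous)
```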

Hence, our study chose the Rasch model (RM) to examine the construct validity of the ABC.

Data were analyzed using the WINSTEPS software package (http://www.winsteps.com). The Rasch model reveals the relationship between the probability of a specific response and the difference between person ability and item difficulty.

Item fitness

The item score and person score were transformed into logit units. Then, the Rasch model was built based on the responses of individuals and the difficulties of items. Item fitness may reflect the prior assumption in the Rasch analysis. All Rasch measurements are established based on the assumption that items should display acceptable fitness to the Rasch model.

Our study adopted the infit mean square (MNSQ) and standardized Z (Zstd) to neutralize the influence of the unexpected response by assigning weight to the calculated residual. The MNSQ describes how much the participants’ response may deviate from the model, and the Zstd denotes how possible the participants may generate unexpected responses. As previous studies suggested, the infit mean square and Zstd should fall within 0.75 to 1.33 and −2 to 2, respectively (33–35). A reasonable differential efficacy was established with a separation index over 2.0, and reliability was supported with an index beyond 0.8 (33–35).
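As a rough illustration of the infit statistic, the sketch below computes the information-weighted mean square for one dichotomous item as the sum of squared residuals divided by the sum of the model variances. It is not a reproduction of the WINSTEPS computation; the function names, person abilities, and responses are hypothetical, and the Zstd transformation is omitted.

```python
import math

def rasch_probability(beta, delta):
    """Probability of a positive response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))

def infit_mnsq(responses, abilities, difficulty):
    """Infit mean square for one item: sum of squared residuals over sum of model variances."""
    squared_residuals = 0.0
    total_variance = 0.0
    for x, beta in zip(responses, abilities):
        p = rasch_probability(beta, difficulty)
        squared_residuals += (x - p) ** 2
        total_variance += p * (1.0 - p)
    return squared_residuals / total_variance

def within_infit_range(mnsq, lower=0.75, upper=1.33):
    """Check the MNSQ criterion used in this study (0.75 to 1.33)."""
    return lower <= mnsq <= upper

# Hypothetical responses of six persons to one item with difficulty 0 logit.
responses = [1, 0, 1, 1, 0, 1]
abilities = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
value = infit_mnsq(responses, abilities, difficulty=0.0)
print(round(value, 2), within_infit_range(value))
```

Values near 1 indicate that the observed residuals are about as large as the model predicts; values above the upper bound suggest erratic responses, and values below the lower bound suggest an overly predictable pattern.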

Unidimensionality

The residuals calculated based on the difference between the actual and expected performance are used in principal component analysis (PCA). The unidimensional structure is validated if over 40% of the variance of the residual can be explained by the measurement dimension, and the distribution of the residuals that are explained by extra dimension should follow the random characteristics (eigenvalue less than 2.0) (33, 34, 36).

Differential item functioning

Further, this study mainly focused on uniform differential item functioning (DIF) to detect the potential variants that may bring bias to the interpretation of the ABC scores. The Rasch model is the most frequently utilized method to identify DIF. However, large sample sizes may limit the statistical power and more easily produce type I errors. Furthermore, the Rasch model cannot control the impact of other confounding factors. For example, DIF produced by gender may also be affected by other demographic factors (e.g., age and education). To overcome these limitations, a machine learning method was introduced in our study: the Rasch model was fitted with the least absolute shrinkage and selection operator (lasso) penalty to detect uniform DIF in the ABC. The GPCMlasso R package was utilized to calculate λ, which denotes the influence of the covariates (e.g., age group, gender, and symptom level in this article) on item response probability. Thus, uniform DIF is confirmed if this lasso coefficient is unequal to zero. In this study, DIF analysis was conducted to test the influence of covariates including gender, age group, and symptom level.

Therefore, the calculation method can be written as follows:

$$\log\left(\frac{P(Y_{pi} = r)}{P(Y_{pi} = r - 1)}\right) = β_{i}\left[θ_{p} + x_{p}^{T} - δ_{ir} - \left(γ_{i1} \cdot \mathrm{Gender} + γ_{i2} \cdot \mathrm{AgeGroup} + γ_{i3} \cdot \mathrm{SymptomLevel}\right)\right]$$

In this equation, $γ_{i n} (n = 1, 2, 3)$ denotes the influence of covariates on item i. To modify the original DIF analysis methods (e.g., Welch’s t-test), the GPCMlasso package can test multiple covariates simultaneously and eliminate the potential multicollinearity that may exist among these variables. In this article, the Bayesian information criterion was adopted to screen for the optimal parameter λ.
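As a hedged sketch of the general idea, and not necessarily the exact objective implemented in the GPCMlasso package, a lasso-penalized estimation maximizes the log-likelihood minus an L1 penalty on the item-specific covariate effects $γ_{in}$, and the tuning parameter λ is then chosen by minimizing the BIC. The symbols $l(\alpha)$ (log-likelihood of all model parameters $\alpha$), $I$ (number of items), $\mathrm{df}(\lambda)$ (number of nonzero parameters at a given λ), and $P$ (number of persons) are notational assumptions introduced here for illustration.

```latex
l_{\mathrm{pen}}(\alpha) = l(\alpha) - \lambda \sum_{i=1}^{I} \sum_{n=1}^{3} \lvert \gamma_{in} \rvert ,
\qquad
\mathrm{BIC}(\lambda) = -2\, l(\hat{\alpha}_{\lambda}) + \mathrm{df}(\lambda) \, \log P
```

A larger λ shrinks more of the $γ_{in}$ exactly to zero, so an item is flagged for uniform DIF on covariate n only if its $γ_{in}$ remains nonzero at the selected λ.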

Sample consideration

To obtain 99% confidence that the item calibration (item difficulty measure) is within ±1/2 logit of its robust value and to avoid type I errors, a sample between 250 and 500 is recommended (37–39).

Result

Demographic data

A sample consisting of 3,319 children and adolescents was involved in this study. Table 1 presents the demographic data for this sample. The mean age was 44.77 ± 23.52 months, and the gender ratio was 2,645/674 (male/female). Our study tried to recruit a sample with balance in terms of age range designed based on the Chinese Education System (kindergarten, 3–6 years; primary school, 6–12 years; junior high school, 12–15 years; and high school, 15–18 years), but we ultimately obtained a sample of 1,414 children before registration in kindergarten, 1,503 children attending kindergarten, 380 children from primary school, 19 children and adolescents from junior high school, and 3 from high school. In terms of symptom severity, we managed to obtain a sample with a balance in the CARS severity classification as shown in Table 1.

Table 1. Participant demographic data.

Person and item mapping and fit statistics

Figure 1 displays the overall view of the item occurrence frequency distribution. In Figure 1, the vertical line denotes the frequency continuum, and the upper position stands for less frequent behavior. As shown in Figure 1, item 13 (“Does not (or did not as a baby) reach out when reached for.”) was the least frequent behavior. That means an intention to reach for something may be the behavior that is only shown in those with the most severe autistic symptoms. Item 10 (“Seems not to hear (despite normal hearing tests).”) and item 38 (“Has not developed any friendships.”) were the most common behaviors, which means that under-reaction to external stimulation may be the most common symptom in this sample. As the mean item frequency was set at 0 logit, Figure 1 shows that the majority in this sample may present mild-to-moderate symptoms. Furthermore, Figure 1 shows that most of the items manage to cover almost the entire scale, and the items are nearly continuously distributed from −2 to 2 logit. That means the ABC can distinguish nearly 76% of the symptom variance (e.g., 2 logit = 12%, and −2 logit = 88% frequency).
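The 76% figure follows from converting the two logit bounds to endorsement frequencies with the logistic function. The sketch below assumes a reference person located at 0 logit; the function name is hypothetical.

```python
import math

def endorsement_frequency(item_location, person_location=0.0):
    """Expected endorsement frequency of an item for a person at the given location (logits)."""
    return 1.0 / (1.0 + math.exp(-(person_location - item_location)))

freq_rare = endorsement_frequency(2.0)     # item at +2 logit (less frequent behavior)
freq_common = endorsement_frequency(-2.0)  # item at -2 logit (more frequent behavior)
coverage = freq_common - freq_rare

print(round(freq_rare * 100), round(freq_common * 100), round(coverage * 100))
```

The items located between −2 and 2 logit therefore span endorsement frequencies from about 12% to about 88%, a range of about 76 percentage points.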

Figure 1. Person–item map of the items in the ABC. ABC, Autism Behavior Checklist. (1) Each "#" represents 29 persons, and each "." represents 1 to 28 persons; for example, a "." at the top indicates that 1 to 28 children are located above the "1" ability level. (2) The numbers on the right represent the items; for example, "10" at the bottom represents item 10. (3) The numbers on the left represent the symptom occurrence frequency continuum; for example, item 10 at the bottom is a behavior that most participants may display.

The item–person map, or Wright map, depicts the item and person distribution along the autistic trait scale constructed based on the ABC. This mapping is built by arbitrarily setting the mean item difficulty to 0. The item–person map shows that the ABC items tend to cluster around 0 logit; hence, individuals with mild-to-moderate symptoms can be depicted in more detail.

The response pattern in this sample shows reasonable fitness to the expectation of the Rasch model (Table 2). That means the following analysis results are produced based on solid prior assumptions.

Table 2. Fit statistics summary of the ABC.

The person reliability and separation index showed that the ABC is efficient enough to distinguish children with ASD with different symptom severity from each other. That means the ABC can capture the inter-person variations in symptom severity rather than other irrelevant behaviors. A value of 0.83 denotes that 83% of the personal variations captured by the ABC are caused by interindividual differences, and 17% is random error. The item reliability and separation index showed that the recruited sample was large enough to determine the ranking of items on the measurement continuum.
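The link between the reliability value and the separation index can be sketched with the standard Rasch relation, separation = sqrt(reliability / (1 − reliability)). The separation value computed below is derived from the reported reliability of 0.83 under that relation and is not a figure reported in this study; the function name is hypothetical.

```python
import math

def separation_from_reliability(reliability):
    """Separation index implied by a reliability coefficient (ratio of true to error standard deviation)."""
    return math.sqrt(reliability / (1.0 - reliability))

person_reliability = 0.83                   # value discussed in the text
error_share = 1.0 - person_reliability      # share of variance attributed to random error
separation = separation_from_reliability(person_reliability)

print(round(error_share, 2), round(separation, 2))
```

Under the same relation, a reliability of 0.8 corresponds to a separation of 2.0, which is why the two thresholds used in this study go together.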

Evaluation of item fitness

Our analysis detected 29 items that misfit (overfit or underfit) the Rasch model (Table 3). That means these items may reflect some behaviors that are not related to symptom severity, or these behaviors tend to happen randomly instead of in a pattern. Among them, only item 8 violated both the MNSQ and Zstd criteria [e.g., MNSQ (0.75–1.33) and Zstd (−2 to 2)]. Such items are more likely to reflect random behaviors than autistic traits. In general, the MNSQ denotes the unstandardized residual between the expected and real values, and the Zstd presents the standardized residual that tends to cancel the effect of a too erratic or too robust pattern in this sample. That means the MNSQ indicates the magnitude of the residual, while the Zstd indicates the possibility of the unexpected value. For the other misfitting items, the residual between expected and real performance was within the normal range, but the response pattern was too robust or erratic. For example, item 3 (“Frequently does not attend to social/environmental cues.”) displays normal MNSQ (e.g., 0.86) but unusual Zstd (e.g., −9.9), which means the reason why children do not respond to external cues may be related to autistic symptom severity, but this behavior can also happen randomly in children with ASD. As another example, item 8 (“Exhibits pronoun reversal (you for I, etc.).”) displays abnormal MNSQ and Zstd simultaneously, which means appropriate use of personal pronouns may not be a suitable behavior to calibrate the symptom severity. This could be explained by the fact that pronoun utilization may be a common problem in children as well.

Table 3. Fit statistics for unfitting (overfitting and misfitting) items.

Assessment of unidimensionality

The principal component analysis of the residuals revealed that the variance explained by the measure was 60.9%. Three contrasts were detected in the ABC with an eigenvalue over 2. That means the ABC is measuring more than one main principal component (e.g., eigenvalue greater than 2). These extra components indicated those covariates that may jeopardize the measurement accuracy in the ABC. The unexplained ratios of measured variances were 5.3%, 3%, and 2.3%. However, the variance ratio of measures to contrasts was all larger than 3:1 (57/4.92, 57/2.77, and 57/2.13).

To determine which of the ABC items load onto the residual factors, our study arbitrarily set 0.4 as the cutoff value for a meaningful factor loading (Table 4) (40, 41).

Table 4. Standardized residual loadings for items on contrasts.

Differential item functioning

The DIF analysis was conducted based on the rescored data.

According to Bayesian information criterion (BIC) methods, our results revealed that all items in the ABC display DIF differently in gender, age groups, and symptom classifications (Table 5). This implied that these behaviors occur in children and adolescents with ASD with different demographic characteristics differently. Also, these behaviors may be perceived and interpreted differently by the parents and guardians of these children. In the GPCMlasso equation, each group variable is encoded by the corresponding λ, and the predominant variables are set as reference.

Table 5. The results of DIF analysis based on lasso coefficients in the GPCMlasso model for variables in the ABC.

To simplify the original formulation, the GPCMlasso model can be written as follows:

$$\log\left(\frac{P(Y_{pi} = r)}{P(Y_{pi} = r - 1)}\right) = θ_{p} - \left(β_{i} + γ_{i1} \cdot \mathrm{Gender} + γ_{i2} \cdot \mathrm{AgeGroup} + γ_{i3} \cdot \mathrm{SymptomLevel}\right)$$

In this study, the GPCMlasso model set the minorities among the subsamples as dummy code, which means Gender/female, Age group/high school, and Symptom level/severe ASD were equal to 0 in this formulation (e.g., $γ_{i1} \cdot \mathrm{female}$, $γ_{i2} \cdot \mathrm{highschool}$, and $γ_{i3} \cdot \mathrm{severeASD}$).

For example, Table 5 shows that the lasso coefficient for gender was −0.087 in item 31 (“Hurts others by biting, hitting, kicking …”), which means boys are more likely to hurt others physically. For the same $θ_{p}$ for boys and girls, the item difficulty equals $β_{i} - 0.087$ for boys and $β_{i}$ for girls. The log-odds of endorsing the item equal the difference between $θ_{p}$ and item difficulty, so a lower item difficulty yields a higher probability. That means boys have a higher probability of presenting physically harming behaviors compared to girls.
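The size of this effect can be illustrated with a small calculation based on the simplified formulation, using the reported gender coefficient of −0.087 for item 31. The values of $θ_{p}$ and $β_{i}$ below are hypothetical (both set to 0 logit), as is the function name; only the coefficient comes from Table 5.

```python
import math

def endorsement_probability(theta, beta, gamma_gender, is_male):
    """Probability of endorsing an item when gender shifts the item difficulty by gamma_gender."""
    difficulty = beta + gamma_gender * (1 if is_male else 0)  # female is the reference category (0)
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

GAMMA_ITEM_31 = -0.087  # reported lasso coefficient for gender, item 31
theta, beta = 0.0, 0.0  # hypothetical person trait and baseline item difficulty

p_girls = endorsement_probability(theta, beta, GAMMA_ITEM_31, is_male=False)
p_boys = endorsement_probability(theta, beta, GAMMA_ITEM_31, is_male=True)
print(round(p_girls, 3), round(p_boys, 3))
```

Under these assumed values, a boy and a girl with the same trait level differ by roughly two percentage points in the probability of endorsing item 31, which shows that a nonzero coefficient can indicate DIF even when the practical difference is small.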

This study found no items displaying DIF regarding symptom severity. The ANOVA test was conducted to illustrate the comparison of component scores and total scores among subgroups (Table 6, Figure 2). Table 6 shows that overall significant differences among all the subgroups were found. To eliminate the impact brought by the unequal sample sizes among subgroups, the Bonferroni t-test was adopted for the post-hoc test.

Table 6. The ABC score comparison in DIF analysis.

Figure 2. The ABC score comparison in DIF analysis. ABC, Autism Behavior Checklist; DIF, differential item functioning.

The result shows that no significant difference was found between boys and girls.

Since we only recruited very few children from junior high school and high school, the post-hoc result will be discussed only among infant, kindergarten, and primary children.

We found that children before kindergarten have higher total scores than the other subgroups (e.g., kindergarten and primary). The children before kindergarten presented more obvious behaviors in sensory, language, and body and object use. No significant results were found in relating and social and self-help.

Discussion

The aim of this study was to examine the psychometric properties of the ABC by adopting Rasch analysis and machine learning methods. The ABC was developed as a common evaluation tool to be utilized across different populations diagnosed with ASD (e.g., age, gender, and symptom profiles) in clinical settings and research scenarios. The ABC has shown reasonable reliability and validity in previous works. That means the ABC can generate robust results to display the symptom severity in individuals with ASD. In this study, our results describe the psychometric properties of the ABC at the item level and address some limitations regarding potential measurement bias. Our findings reveal that the RM is suitable to explain the overall response pattern of individuals with ASD in the ABC evaluation. Furthermore, as a previous study reported, the ABC can explain 60.9% of the measurement variance, suggesting that the assessment outcomes are calibrated based on a unidimensional construct aiming to depict ASD traits. Our study also revealed some drawbacks regarding the items’ formulation. Differences in group variables may cause potential assessment bias, which can lead to unstable psychometric quality across different subsamples with ASD.

Measurement properties of the ABC items

The overall response pattern of individuals with ASD shows that children with more severe autistic symptoms displayed more problematic behaviors listed in the ABC. However, two items were not adequately endorsed. Item 13 (“Does not (or did not as a baby) reach out when reached for.”) and item 27 (“Is (or was as a baby) stiff and hard to hold.”) did not receive enough responses, and only five and seven persons responded to these items. The item-to-total correlations (e.g., −0.01 for item 13 and 0.03 for item 27) suggest that reaching out and body contact may not relate to symptom severity. That is the reason why these two items did not receive enough responses.

Item difficulty hierarchy and unidimensionality

This study recruited a sample with autistic symptoms ranging from non-autistic to severe autism according to the CARS classification. The item–person map displays that participants mainly represent the population with mild-to-moderate symptoms according to the severity continuum established based on the ABC. That means the ABC can be utilized to quantify the symptom severity in more detail, and the CARS is more suitable for early screening for children with suspected ASD.

For dimensionality analysis, the ABC can explain 60.9% of the measurement variance. The analysis also reveals three meaningful extra contrasts within the ABC. Hence, we support the multidimensionality assumption as previous studies reported (22, 23).

Differential item functioning

To date, this study is the first research focusing on the DIF analysis on the ABC. Furthermore, this study also used machine learning methods in the DIF analysis. According to the BIC method, our results reveal that 38 items in the ABC display different DIF in different groups. Our results reveal that the ABC items do not display symptom severity DIF. This implies that parents or caregivers of children with different symptom severities perceive and interpret these items equally. The other results indicate that age groupings and gender can alter the probability of item endorsement in the ABC. The identification of items with DIF emphasizes the need to cautiously interpret the ABC scores at the item level.

For gender DIF, the ABC displays an acceptable ability to capture autistic traits in male and female individuals equally in 54 items (54/57). These behaviors occur equally in children with ASD depending on symptom severity regardless of gender variant. This is in line with previous research that found limited clues for the significant difference in autistic symptoms between male and female individuals with ASD (42, 43).

For age grouping DIF, our finding reveals that the autistic traits captured by the ABC may differ depending on the developmental stage when individuals receive a diagnosis. For example, individuals with more severe symptoms are identified at the age before 3 years or entry into kindergarten in this sample. This is in line with a previous study that found that the Autism-Tics, ADHD, and other Comorbidities inventory (A-TAC) depicts the same performance pattern in individuals with ASD (42).

To date, studies on ASD commonly recruit samples that mostly consist of boys; hence, limited research can report the interaction between gender and autistic symptoms (44). Previous findings reveal that autistic symptoms may vary depending on gender diversity (44, 45). For example, item 25 (“Resists being touched or held.”) is more often perceived in girls with ASD (e.g., λ^female equals 0/λ^male equals 0.022). In contrast, item 31 (“Hurts others by biting, hitting, kicking …”) is more commonly seen in boys with ASD (e.g., λ^female equals 0/λ^male equals −0.087). The explanations for gender DIF may vary across different studies. In this study, we found that boys tend to display more problematic behavior (e.g., hurting others and being destructive), and girls are more rigid in body contact (e.g., being held). These gender diversities have been reported in previous studies as well (43, 46). These studies found that restricted and repetitive behaviors are more often perceived in boys with ASD, and girls experience more trouble in sensory sensitivity (43, 46). Previous studies report that female individuals with ASD could display different symptom phenotypes that cannot be comprehensively captured by clinical assessments (42, 46). For example, female individuals with ASD may display symptoms that are more easily accepted (e.g., toys and flowers compared to robots and trains). These interests may lead to misdiagnosis in female individuals with ASD. In some cases, female individuals with ASD cannot receive a timely diagnosis until the autistic traits ultimately cause inevitable problems during adolescence with increasing social demands (47). In this study, we only recruited 674 girls (2,645 boys) with ASD, and this sample may not adequately display the gender diversity in individuals with ASD. Overall, our study found that most symptom topography is shared between boys and girls, and the DIF statistics reflect meaningful variation in three items. The limited evidence indicates that special consideration is needed to interpret the ABC score regarding gender diversity. However, the possible symptom diversity between genders may be adequately captured by the ABC.

Autistic symptoms occur along the whole developmental trajectory, and measuring these symptoms is complex. The overlap of symptoms across the spectrum reflects both the homogeneous and the heterogeneous nature of autistic traits. The DIF analysis reveals multiple items that demonstrate systematically different measurement properties across age groups. In line with previous studies, autistic symptoms are more easily identified within the first 3 years of age (48). In this study, 1,414 of 3,319 individuals received an ASD diagnosis before 3 years of age. Furthermore, 16 items are more easily perceived in individuals under 3 years old than in other age groups. This can be explained by the salient developmental trajectories within the first 3 years, during which children with atypical growth rates may be more likely to be identified (48). Studies have found that atypical developmental curves may be related to ASD diagnosis at an early age (48). In this study, item 55 (“a developmental delay was identified at or before 30 months of age”) was more easily rated in individuals under 3 years old, which is in line with this assumption.

Among these atypical behaviors, gaze abnormalities, poor response to social stimuli, absence of social communication, and hypo/hypersensitivity can be early signs of autism. In this study, we found similar evidence. For example, individuals under 3 years old tend to be rated on item 6 (“Poor use of visual discrimination when learning (fixates on parts of objects such as size, color, position …)”). The behaviors mentioned above are commonly used in ASD measurement and can be noticed in early diagnostic assessments. We also found that these signs are less likely to be rated as individuals grow older. These findings suggest that individuals with ASD are more likely to be noticed early, as more items tend to be rated for them than for older subsamples (49). Hence, the ASD diagnosis may be delayed as individuals get older, leading to a worse prognosis (4, 49).

In this study, we applied the GPCMlasso package as a machine learning method to overcome the multicollinearity and covariate calibration problems that affected previous DIF research. Adding to previous findings, our study reveals that the ABC is not measurement-invariant across gender and age groups.
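As a rough sketch of how a lasso-penalized DIF model of this kind is usually written, the block below gives a generalized partial credit model with covariate-specific item shifts and an L1 penalty on those shifts. The symbols are assumed here for illustration and may not match the notation or exact parameterization used in the article or in the GPCMlasso package: $\theta_p$ is the trait level of person $p$, $\beta_i$ and $\delta_{ir}$ are the discrimination and threshold parameters of item $i$, $x_p$ is the covariate vector (e.g., age group, gender), $\gamma_i$ holds the item-specific DIF effects, and $\xi$ is the tuning parameter.

```latex
% Adjacent-category logit for item i, category r, person p (assumed notation)
\log\frac{P(Y_{pi}=r)}{P(Y_{pi}=r-1)}
  = \beta_i\left[\theta_p - \left(\delta_{ir} + x_p^{\top}\gamma_i\right)\right]

% Penalized log-likelihood: the L1 penalty shrinks DIF effects toward zero
\ell_{\text{pen}}(\cdot) = \ell(\cdot) - \xi \sum_{i}\sum_{j}\lvert\gamma_{ij}\rvert
```

Under this form, an item is flagged for DIF on covariate $j$ only when its estimated $\gamma_{ij}$ remains non-zero after penalization, which is how several covariates can be handled in a single model.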

Implications for clinical practice

These findings provide preliminary evidence on the scale fit, unidimensionality, and item DIF of the ABC. The results support the conclusion that the ABC rests on a reasonable measurement structure. However, the results also reveal several shortcomings that may jeopardize the psychometric quality of the ABC. This research reminds clinicians to apply the ABC with careful consideration of potential covariates (e.g., developmental stage) to reduce possible bias. Notably, clinical judgment is needed when an ABC score falls close to the decision threshold (e.g., 67 points).

Study limitation

From a statistical perspective, our study did not include other possible comorbidities as covariates to account for individual variance. Furthermore, the GPCMlasso model is only applicable to uniform DIF; hence, items with non-uniform DIF cannot be detected. Non-uniform DIF denotes that the occurrence frequency is not always higher in one subgroup (e.g., infants). For example, an item may be more common in infants with lower symptom severity and less likely to be observed in those with higher symptom severity, so the DIF pattern is not stable along the whole severity continuum (50). Hence, it is more complicated to use the penalized likelihood function to identify items with non-uniform DIF, even though it may be theoretically feasible.
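The difference between uniform and non-uniform DIF can be illustrated with a minimal sketch. It assumes a dichotomous two-parameter logistic item rather than the polytomous model used in the study, and all parameter values and names (`endorsement_prob`, `group_gap`, `REFERENCE`, and so on) are hypothetical, not estimates or code from the article.

```python
import math


def endorsement_prob(theta, discrimination, difficulty):
    """Probability of endorsing a dichotomous item under a 2PL logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def group_gap(theta, reference, focal):
    """Focal-minus-reference endorsement probability at trait level theta."""
    return endorsement_prob(theta, *focal) - endorsement_prob(theta, *reference)


# Hypothetical (discrimination, difficulty) pairs, chosen only for illustration.
REFERENCE = (1.0, 0.0)
UNIFORM_FOCAL = (1.0, -0.5)    # same slope, shifted difficulty: gap keeps one sign
NONUNIFORM_FOCAL = (0.5, 0.0)  # different slope: gap changes sign at theta = 0

THETAS = [-2.0, -1.0, 1.0, 2.0]
uniform_gaps = [group_gap(t, REFERENCE, UNIFORM_FOCAL) for t in THETAS]
nonuniform_gaps = [group_gap(t, REFERENCE, NONUNIFORM_FOCAL) for t in THETAS]

for theta, u, n in zip(THETAS, uniform_gaps, nonuniform_gaps):
    print(f"theta={theta:+.1f}  uniform gap={u:+.3f}  non-uniform gap={n:+.3f}")
```

In the uniform case the focal group is more likely to endorse the item at every severity level, which a single shift parameter per item and covariate can capture. In the non-uniform case the focal group is more likely to endorse the item at low severity and less likely at high severity, so a single shift cannot describe the pattern.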

Conclusion

Our results support the applicability of the ABC for measuring autistic symptoms in individuals with ASD, although several drawbacks were identified. Items in the ABC display DIF that differs across subsamples. Understanding the symptom profile of individuals with ASD in the ABC by focusing on the interaction between item difficulty and person ability can support more appropriate evaluation and intervention for this population.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

The studies involving humans were approved by the ethics committee and institutional review board of Shenzhen Children’s Hospital. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.

Author contributions

KP: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. MC: Writing – review & editing, Data curation, Conceptualization. LZ: Writing – review & editing, Methodology, Investigation, Conceptualization. XW: Writing – review & editing, Visualization, Data curation.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This study was supported by Shenzhen High-level Hospital Construction Fund and Guangdong High-level Hospital Construction Fund.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fpsyt.2024.1447080/full#supplementary-material

References

1. Lord C, Brugha TS, Charman T, Cusack J, Dumas G, Frazier T, et al. Autism spectrum disorder. Nat Rev Dis primers. (2020) 6:5. doi: 10.1038/s41572-019-0138-4

2. Ashmawi NS, Hammoda MA. Early prediction and evaluation of risk of autism spectrum disorders. Cureus. (2022) 14:e23465. doi: 10.7759/cureus.23465

3. Singhi P, Malhi P. Early diagnosis of autism spectrum disorder: what the pediatricians should know. Indian J Pediatr. (2023) 90:364–8. doi: 10.1007/s12098-022-04363-1
