AUTHOR=Graybill Emily, Barger Brian, Salmon Ashley, Lewis Scott TITLE=Gender and race measurement invariance of the Strengths and Difficulties Questionnaire in a U.S.-based sample JOURNAL=Frontiers in Education VOLUME=9 YEAR=2024 URL=https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2024.1310449 DOI=10.3389/feduc.2024.1310449 ISSN=2504-284X ABSTRACT=Introduction

The Strengths and Difficulties Questionnaire (SDQ) is one of the most widely used behavior screening tools in public schools due to its strong psychometric properties, low cost, and brief (25-item) format. However, the screener has notable limitations: it was developed primarily to identify clinical diagnostic conditions, and primarily in European populations. To date, there has been minimal comparative research on its measurement invariance in relation to important U.S. socio-demographic characteristics such as race and gender.

Method

This study used both structural equation modeling (specifically, confirmatory factor analysis; CFA) and item response theory (IRT) methods to investigate the measurement invariance of the SDQ across gender (male, female) and race (Black, White). CFA models were first fit for each SDQ subscale to identify potential misfit in loadings, thresholds, and residuals. IRT graded response models were then fit to identify and quantify between-group differences at the item and factor levels using Cohen's d-style effect sizes (d > 0.2 = small, d > 0.5 = medium, d > 0.8 = large).
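To make the IRT step concrete, the sketch below implements Samejima's graded response model for a single three-category SDQ-style item and summarizes the gap between two hypothetical sets of group parameters in Cohen's d-style units. The item parameters, latent-trait grid, and standardization are illustrative assumptions only, not the authors' estimates or their exact effect-size metric.

    import numpy as np

    def grm_category_probs(theta, a, b):
        """Samejima graded response model: P(X = k | theta) for one item.

        theta : latent trait values, shape (n,)
        a     : item discrimination
        b     : ordered thresholds (length K-1 for a K-category item)
        """
        theta = np.atleast_1d(theta).astype(float)
        b = np.asarray(b, dtype=float)
        # Cumulative curves P(X >= k), bracketed by 1 (k = 0) and 0 (k = K)
        cum = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b[None, :])))
        cum = np.column_stack([np.ones_like(theta), cum, np.zeros_like(theta)])
        return cum[:, :-1] - cum[:, 1:]              # shape (n, K)

    def expected_item_score(theta, a, b):
        """Model-implied expected item score E[X | theta] on the 0..K-1 scale."""
        probs = grm_category_probs(theta, a, b)
        return probs @ np.arange(probs.shape[1])

    # Hypothetical group-specific parameters for one 0/1/2 item (illustration only).
    theta = np.linspace(-3, 3, 121)
    ref = expected_item_score(theta, a=1.4, b=[-0.5, 0.8])   # reference group
    foc = expected_item_score(theta, a=1.0, b=[-0.1, 1.2])   # focal group

    # One simplified Cohen's d-style summary of the item-level gap: the mean
    # signed difference in expected scores, standardized by a pooled SD.
    pooled_sd = np.sqrt((ref.var(ddof=1) + foc.var(ddof=1)) / 2)
    d = (ref - foc).mean() / pooled_sd
    size = ("large" if abs(d) > 0.8 else
            "medium" if abs(d) > 0.5 else
            "small" if abs(d) > 0.2 else "negligible")
    print(f"item-level effect: d = {d:.2f} ({size})")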

Results

Analyses included 2,821 high school participants (52% male, 48% female; 88% Black, 12% White). CFA results suggested that the item-factor relationships for most subscales were invariant, but the Conduct Problems and Hyperactivity subscales did not meet strict measurement invariance. IRT analyses identified several non-invariant items, with effects ranging from small to large. Despite moderate to large effects on item scores for several subscales, the test-level effects on scale scores were negligible.
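A stylized sketch of why sizable item-level effects need not shift the scale score: signed group differences in expected item scores can partially cancel when items are summed into a subscale. All numbers below are hypothetical and chosen purely for illustration, not taken from the study.

    import numpy as np

    # Hypothetical signed group differences in expected item scores (0-2 items)
    # for a five-item SDQ-style subscale, evaluated at the same trait level.
    item_score_gaps = np.array([0.18, -0.15, 0.09, -0.12, 0.02])

    # Item level: several gaps look non-trivial relative to an assumed item SD.
    item_sd = 0.6
    print("item-level |d|:", np.round(np.abs(item_score_gaps) / item_sd, 2))

    # Scale level: signed gaps largely cancel when items are summed, so the
    # expected subscale score (0-10) barely differs between groups.
    scale_gap = item_score_gaps.sum()
    scale_sd = 2.0   # assumed subscale SD
    print("scale-level d:", round(scale_gap / scale_sd, 2))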

Discussion

These analyses suggest that the SDQ subscales display reasonably comparable item-factor relationships across groups. Several subscales showed substantive item-level misfit, but the test-level effects were minimal. Implications for the field are discussed.