Pediatric sleep problems can be detected across racial/ethnic subpopulations in primary care settings. However, the electronic health record (EHR) data documentation that describes patients' sleep problems may be inherently biased due to both historical biases and informed presence. This study assessed racial/ethnic differences in natural language processing (NLP) training data (e.g., pediatric sleep-related keywords in primary care clinical notes) prior to model training.
We used a predefined keyword features set containing 178 Peds B-SATED keywords. We then queried all the clinical notes from patients seen in pediatric primary care between the ages of 5 and 18 from January 2018 to December 2021. A least absolute shrinkage and selection operator (LASSO) regression model was used to investigate whether there were racial/ethnic differences in the documentation of Peds B-SATED keywords. Then, mixed-effects logistic regression was used to determine whether the odds of the presence of global Peds B-SATED dimensions also differed across racial/ethnic subpopulations.
Using both LASSO and multilevel modeling approaches, the current study found that there were racial/ethnic differences in providers' documentation of Peds B-SATED keywords and global dimensions. In addition, the most frequently documented Peds B-SATED keyword rankings qualitatively differed across racial/ethnic subpopulations.
This study revealed providers' differential patterns of documenting Peds B-SATED keywords and global dimensions that may account for the under-detection of pediatric sleep problems among racial/ethnic subpopulations. In research, these findings have important implications for the equitable clinical documentation of sleep problems in pediatric primary care settings and extend prior retrospective work in pediatric sleep specialty settings.