AUTHOR=Mandal Meisha, Levy Josh, Ives Cataia, Hwang Stephen, Zhou Yi-Hui, Motsinger-Reif Alison, Pan Huaqin, Huggins Wayne, Hamilton Carol, Wright Fred, Edwards Stephen TITLE=Correlation Analysis of Variables From the Atherosclerosis Risk in Communities Study JOURNAL=Frontiers in Pharmacology VOLUME=13 YEAR=2022 URL=https://www.frontiersin.org/journals/pharmacology/articles/10.3389/fphar.2022.883433 DOI=10.3389/fphar.2022.883433 ISSN=1663-9812 ABSTRACT=

The need to test chemicals in a timely and cost-effective manner has driven the development of new alternative methods (NAMs) that use in silico and in vitro approaches for toxicity prediction. A wealth of existing data from human studies can help in understanding the ability of NAMs to support chemical safety assessment. This study aims to streamline the integration of data from existing human cohorts by programmatically identifying related variables within each study. Study variables from the Atherosclerosis Risk in Communities (ARIC) study were clustered based on their correlations within the study, and cluster quality was evaluated through a combination of manual review and natural language processing (NLP). We identified 391 clusters containing 3,285 variables. Manual review of the clusters containing more than one variable found that human reviewers considered 95% of these clusters to be related to some degree. To evaluate potential bias in the human reviewers, the clusters were also scored via NLP, which showed high concordance with the human classification. Clusters were further consolidated into cluster groups using the Louvain community detection algorithm. Manual review of the cluster groups confirmed that clusters within a group were more closely related than clusters from different groups. This data-driven approach can facilitate data harmonization and curation by providing human annotators with groups of related variables that reflect the themes present in the data. Reviewing groups of related variables should make human review more efficient, and the number of variables to review can be reduced by focusing curator attention on variable groups whose theme is relevant to the topic under study.
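The abstract does not say which clustering procedure or correlation cutoff was used. The following is therefore only a minimal sketch of one way correlation-based clustering of study variables could work. It assumes complete numeric data with no missing values and uses a hypothetical |r| threshold of 0.7. Any two variables whose absolute Pearson correlation meets the threshold are linked, and linked variables are merged transitively with a union-find structure, which amounts to single-linkage grouping. The variable names and values are invented for illustration and are not ARIC data.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable has undefined r; treat it as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def cluster_by_correlation(data, threshold=0.7):
    """Group variables whose |r| >= threshold, merging links transitively.

    data: dict mapping variable name -> list of values for the same
    participants, in the same order, with no missing values (assumption).
    threshold: hypothetical cutoff, not taken from the paper.
    """
    names = list(data)
    parent = {name: name for name in names}  # union-find forest

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Link every pair of variables whose absolute correlation meets the cutoff.
    for a, b in combinations(names, 2):
        if abs(pearson(data[a], data[b])) >= threshold:
            parent[find(a)] = find(b)

    # Collect variables by their root to form the clusters.
    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical variables for five participants (invented values, not ARIC data).
example = {
    "sbp_visit1": [120, 130, 140, 150, 160],
    "sbp_visit2": [122, 129, 143, 149, 163],
    "dbp_visit1": [80, 84, 90, 95, 100],
    "shoe_size": [9, 7, 10, 8, 9],
}
print(cluster_by_correlation(example))
# [['dbp_visit1', 'sbp_visit1', 'sbp_visit2'], ['shoe_size']]
```

Single linkage is the simplest choice, but it can chain weakly related variables together through intermediates. Hierarchical clustering on a 1 - |r| distance with a cut height is a common alternative. For the later step of consolidating clusters into groups, a Louvain implementation such as `louvain_communities` in NetworkX (2.8 or later) could be run on a weighted graph of clusters. How the study weighted edges between clusters is not stated in the abstract.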