AUTHOR=Lu Lu , Qian Ying , Dong Yihang , Su Han , Deng Yunxin , Zeng Qiang , Li He TITLE=A systematic study of the performance of machine learning models on analyzing the association between semen quality and environmental pollutants JOURNAL=Frontiers in Physics VOLUME=11 YEAR=2023 URL=https://www.frontiersin.org/journals/physics/articles/10.3389/fphy.2023.1259273 DOI=10.3389/fphy.2023.1259273 ISSN=2296-424X ABSTRACT=

Human exposure to Phthalates, a family of chemicals primarily used to enhance the flexibility and durability of plastics, could lead to a decline in semen quality. Extensive studies have been performed to investigate the associations between semen quality and exposure to environmental pollutants, such as phthalates. However, these early studies mainly focus on using conventional statistical methods, such as simple and efficient multi-variable linear regression methods, to perform the analysis, which may not be effective in analyzing these complex multi-variable associations. Herein, we perform a systematic study of the performance of different machine learning methods in analyzing these associations. We will use data from a cohort of 1070 Chinese males from Hubei province who provided repeated urine samples to measure phthalate metabolites. In addition, phthalate metabolites in semen are also evaluated as a biomarker to give a more direct metric. We also incorporate patient demographics and administered medications into the analysis. Overall, six machine learning models, including linear and non-linear models, are implemented to analyze associations among thirty-one features and five metrics of the quality of the semen. The performance of the models is evaluated based on root-mean-square deviation through 10-fold cross-validation. Our investigations show that the performance of different models is varied when employed to study different metrics that represent the quality of the semen. Therefore, a systematic study of the patients’ data with various machine learning models is essential in improving the quantitative analysis in discovering the critical environmental pollutants that dictate the quality of semen. We hope this study could provide guidance of employing machine learning models in the future investigation of the impact of various pollutants on semen quality.