AUTHOR=Zhou Mingjuan , Yao Tianci , Li Jian , Hui Hui , Fan Weimin , Guan Yunfeng , Zhang Aijun , Xu Bufang TITLE=Preliminary prediction of semen quality based on modifiable lifestyle factors by using the XGBoost algorithm JOURNAL=Frontiers in Medicine VOLUME=9 YEAR=2022 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2022.811890 DOI=10.3389/fmed.2022.811890 ISSN=2296-858X ABSTRACT=Introduction

Semen quality has decreased gradually in recent years, and lifestyle changes are among the primary causes for this issue. Thus far, the specific lifestyle factors affecting semen quality remain to be elucidated.

Materials and methods

In this study, data on the following factors were collected from 5,109 men examined at our reproductive medicine center: 10 lifestyle factors that potentially affect semen quality (smoking status, alcohol consumption, staying up late, sleeplessness, consumption of pungent food, intensity of sports activity, sedentary lifestyle, working in hot conditions, sauna use in the last 3 months, and exposure to radioactivity); general factors including age, abstinence period, and season of semen examination; and comprehensive semen parameters [semen volume, sperm concentration, progressive and total sperm motility, sperm morphology, and DNA fragmentation index (DFI)]. Then, machine learning with the XGBoost algorithm was applied to establish a primary prediction model by using the collected data. Furthermore, the accuracy of the model was verified via multiple logistic regression following k-fold cross-validation analyses.

Results

The results indicated that for semen volume, sperm concentration, progressive and total sperm motility, and DFI, the area under the curve (AUC) values ranged from 0.648 to 0.697, while the AUC for sperm morphology was only 0.506. Among the 13 factors, smoking status was the major factor affecting semen volume, sperm concentration, and progressive and total sperm motility. Age was the most important factor affecting DFI. Logistic combined with cross-validation analysis revealed similar results. Furthermore, it showed that heavy smoking (>20 cigarettes/day) had an overall negative effect on semen volume and sperm concentration and progressive and total sperm motility (OR = 4.69, 6.97, 11.16, and 10.35, respectively), while age of >35 years was associated with increased DFI (OR = 5.47).

Conclusion

The preliminary lifestyle-based model developed for semen quality prediction by using the XGBoost algorithm showed potential for clinical application and further optimization with larger training datasets.