ORIGINAL RESEARCH article

Front. Endocrinol., 16 February 2024
Sec. Bone Research
This article is part of the Research Topic Vitamin D and mineral ion homeostasis: Endocrine dysregulation in chronic diseases

Machine learning-based prediction of vitamin D deficiency: NHANES 2001-2018

Jiale Guo1, Qionghan He2, Yehai Li1*
  • 1Department of Orthopedics, Chaohu Hospital of Anhui Medical University, Hefei, China
  • 2Department of Infection, Chaohu Hospital of Anhui Medical University, Hefei, China

Background: Vitamin D deficiency is strongly associated with the development of several diseases. Given the current global pandemic of vitamin D deficiency, it is critical to identify people at high risk. No tools currently exist for predicting the risk of vitamin D deficiency in the general community population. This study aims to use machine learning to predict that risk from data that can be obtained through simple interviews in the community.

Methods: The National Health and Nutrition Examination Survey 2001-2018 dataset was used for the analysis and was randomly divided into training and validation sets in a ratio of 70:30. The GBM, LR, NNet, RF, SVM, and XGBoost methods were used to construct the models, and their performance was evaluated. The best-performing model was interpreted using SHAP values and was further developed into an online web calculator.

Results: A total of 62,919 participants aged 2 years and above were enrolled in the study, of whom 20,204 (32.1%) had vitamin D deficiency. The models constructed by each method were evaluated using AUC as the primary evaluation statistic and ACC, PPV, NPV, SEN, SPE, F1 score, MCC, Kappa, and Brier score as secondary evaluation statistics. The XGBoost-based model had the best, near-perfect performance. The summary plot of SHAP values shows that the three most important features for this model are race, age, and BMI. An online web calculator based on this model can easily and quickly predict the risk of vitamin D deficiency.

Conclusion: In this study, the XGBoost-based prediction tool performs flawlessly and is highly accurate in predicting the risk of vitamin D deficiency in community populations.

1 Introduction

Vitamin D is a unique fat-soluble vitamin: it is produced primarily through exposure of human skin to sunlight, and few foods naturally contain it (1). Its main role in humans is to increase the absorption of calcium and phosphate to mineralize the bones (2). In children, vitamin D deficiency leads to growth retardation and rickets (3). In adults, vitamin D deficiency can lead to osteomalacia and osteoporosis (3). Vitamin D deficiency and its health consequences first gained attention with the industrialization of Northern Europe. As research progressed, vitamin D deficiency was also found to be strongly associated with the development of diabetes (4), sarcopenia (5), psychiatric disorders (6), autoimmune diseases (7), cardiovascular diseases (8), and tumors (9). Because of the role of vitamin D in the antiviral immune response (10, 11), vitamin D-related studies have gained more attention since the COVID-19 pandemic. Vitamin D levels have also been shown to be associated with the prevention and prognosis of COVID-19 (12–14). Vitamin D deficiency has now been described as a pandemic, and identifying it is an important part of public health. However, a single measurement of vitamin D costs £9.86, and between 70.4% and 77.5% of tests are likely to be inappropriate (15). Testing for vitamin D in all populations does not appear to be appropriate. An Endocrine Society Clinical Practice Guideline recommends screening for vitamin D deficiency in people at risk and does not recommend screening people who are not at risk (16). Prediction tools are therefore needed to identify people at high risk of vitamin D deficiency. As of now, there are no tools for predicting the risk of vitamin D deficiency in the general community population.

Machine learning is one of the fastest growing technology areas today and is widely used to enable evidence-based decision making in industries such as healthcare, manufacturing, and education (17). It relies primarily on large datasets to develop robust risk models and to classify the individuals being studied (18, 19). Prediction tools developed using machine learning may therefore predict participants’ risk of vitamin D deficiency well. The purpose of this study was to construct a tool that predicts participants’ risk of vitamin D deficiency using machine learning methods, based on data that can be easily collected in a general community population.

2 Materials and methods

2.1 Data sources and study population

Data for this study were obtained from the National Health and Nutrition Examination Survey (NHANES), a population-based, cross-sectional survey conducted in two-year cycles since 1999 to assess the health and nutritional status of adults and children in the United States. Serum 25(OH)D, a good biomarker for evaluating vitamin D status, was used in this study as the laboratory test to determine vitamin D deficiency (20). Vitamin D deficiency was defined as 25(OH)D < 50 nmol/L, as recommended by an Endocrine Society Clinical Practice Guideline (16). Data from NHANES 2001-2018 containing 25(OH)D measurements were included in this study. Serum 25(OH)D in NHANES 2001-2006 was measured by radioimmunoassay (RIA); because of excessive methodological bias and inaccuracy, this method was replaced in NHANES 2007-2018 by liquid chromatography-tandem mass spectrometry (LC-MS/MS), which has better specificity and sensitivity (21). The serum 25(OH)D data from NHANES 2001-2006 have been converted by regression to equivalent LC-MS/MS measurements.

To keep the model simple and easy to use, only information that could be obtained in the community through a simple interview was included as variables for tool development: gender, age, race, total number of people in the household (H.Size), household income to poverty income ratio (H.PIR), body mass index (BMI), whether or not someone smokes in the household (H.Smoke), past 30-day milk product consumption (Milk), and diabetes. Race is categorized as Mexican American, Non-Hispanic White, Non-Hispanic Black, Other Hispanic, or Other Race. An H.Size of 7 or more is recorded as 7, and an H.PIR of more than 5 is recorded as 5. Past 30-day milk product consumption is divided into four frequencies: never (never drinking milk), rarely (less than once a week), sometimes (once a week or more but less than once a day), and often (once a day or more).
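The variable coding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study was carried out in R and its preprocessing code is not given, so the field names (`h_size`, `h_pir`, `milk`) and the ordinal 0-3 coding of milk frequency are hypothetical choices, not the authors' implementation.

```python
# Hypothetical field names and codes; the paper does not publish its preprocessing.
MILK_LEVELS = {"never": 0, "rarely": 1, "sometimes": 2, "often": 3}
DEFICIENCY_CUTOFF_NMOL_L = 50.0  # deficiency is 25(OH)D < 50 nmol/L in this study


def encode_participant(record):
    """Apply the caps and frequency categories from the text to one interview record."""
    return {
        "h_size": min(int(record["h_size"]), 7),    # household size of 7 or more -> 7
        "h_pir": min(float(record["h_pir"]), 5.0),  # income-to-poverty ratio above 5 -> 5
        "milk": MILK_LEVELS[record["milk"]],        # assumed ordinal code for milk frequency
    }


def is_deficient(serum_25ohd_nmol_l):
    """Prediction target: serum 25(OH)D strictly below 50 nmol/L."""
    return serum_25ohd_nmol_l < DEFICIENCY_CUTOFF_NMOL_L


example = encode_participant({"h_size": 9, "h_pir": 6.2, "milk": "sometimes"})
print(example)
```

Whether the milk frequency is best treated as an ordinal number or as an unordered category depends on the learning algorithm; the ordinal coding here is just one simple option.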

The data analyzed in this study were obtained from NHANES and did not require additional ethical review by the investigator’s affiliated institution. NHANES has received approval from the National Center for Health Statistics (NCHS) Research Ethics Review Board.

2.2 Statistical analysis

Normally distributed continuous variables are expressed as mean ± standard deviation, non-normally distributed continuous variables as median (interquartile range), and categorical variables as percentages. Continuous variables were analyzed with the Independent Student’s t-test or Mann-Whitney U analysis; categorical variables were analyzed with the chi-square test or Fisher’s test. All statistical analyses were realized based on the “CBCgrps” package in R software.

2.3 Model construction, evaluation and validation

Data from the NHANES database for nine cycles from 2001-2018 were included for analysis. The included data were randomly divided into training and validation sets in a ratio of 70:30. We used the extracted variables as machine learning features for analysis. Six machine learning algorithms, Gradient Boosting Machine (GBM), Logistic Regression (LR), Neural Network (NNet), Random Forest (RF), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGBoost), were used to construct the classification models. Ten repetitions of 10-fold cross-validation resampling were used to ensure the stability and reproducibility of model performance. Receiver operating characteristic (ROC) curves were plotted to evaluate the discriminative performance of each model, and the area under the ROC curve (AUC) was calculated. The AUC value was used as the main statistical indicator of predictive performance. To evaluate predictive performance more comprehensively, this study also reports accuracy (ACC), positive predictive value (PPV), negative predictive value (NPV), sensitivity (SEN), specificity (SPE), F1 score, and Matthews correlation coefficient (MCC). The closer these statistics are to 1, the better the predictive performance of the model. Kappa values are used to determine whether the model’s results are consistent with the actual results. The Kappa value lies between -1 and +1; the closer it is to 1, the better the consistency, and a value greater than 0.75 indicates excellent consistency. The Brier score combines the discrimination and calibration of the model and is used to evaluate its overall performance; the closer the Brier score is to 0, the closer the predicted values are to the actual values (22). Decision curve analysis (DCA) is used to assess the clinical utility of models in decision making (23).
The best machine learning model was selected using the AUC value as the main statistic, combined with the other statistical indicators. SHapley Additive exPlanations (SHAP) values were used to interpret the best machine learning model (24). In addition, an online web calculator was constructed for the best model to facilitate its use.
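To make the secondary statistics concrete, the following Python sketch computes ACC, PPV, NPV, SEN, SPE, F1 score, MCC, Kappa (Cohen's kappa), and the Brier score from their standard definitions. It is not the authors' R code; the toy labels, the predicted probabilities, and the 0.5 classification cutoff are assumptions made only for illustration, and the sketch assumes both classes occur among the labels and the predictions so that no denominator is zero.

```python
import math


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics from a 2x2 confusion matrix (1 = deficient)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    f1 = 2 * ppv * sen / (ppv + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (acc - p_chance) / (1 - p_chance)
    return {"ACC": acc, "PPV": ppv, "NPV": npv, "SEN": sen,
            "SPE": spe, "F1": f1, "MCC": mcc, "Kappa": kappa}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (assumed): four deficient and six non-deficient participants.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 cutoff

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(metrics, brier)
```

On this toy example the accuracy is 0.8 and the Brier score is 0.11; the threshold-based metrics change with the chosen cutoff, whereas the Brier score uses the probabilities directly.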

All statistical analyses, model construction and validation in this study were based on R software (version 4.1.3).

3 Results

A total of 62,919 participants aged 2 years and above were enrolled in the study, of whom 20,204 (32.1%) had vitamin D deficiency. The entire flow of the analysis is shown in the flowchart (Figure 1). The included data were randomly divided into training and validation sets in a ratio of 70:30, and the characteristics of the participants in the training set are shown in Table 1. The performance of the models constructed by each method was determined by resampling with ten repetitions of 10-fold cross-validation. AUC values were calculated based on the ROC curves (Figures 2A, B). The AUC values of GBM, LR, NNet, RF, SVM, and XGBoost were 0.796, 0.76, 0.778, 0.96, 0.8, and 0.995, respectively, in the training set and 0.786, 0.767, 0.79, 0.979, 0.837, and 1, respectively, in the validation set (Table 2). The model constructed by the XGBoost method had the best, near-perfect prediction performance in both the training and validation sets. To avoid the bias caused by data imbalance, this study further calculated ACC, PPV, NPV, SEN, SPE, F1 score, and MCC to evaluate the prediction performance of the models more comprehensively, as shown in Table 2. XGBoost obtained excellent results on all of the statistical metrics used to evaluate discrimination. The Kappa values of GBM, LR, NNet, RF, SVM, and XGBoost were 0.407, 0.353, 0.382, 0.745, 0.476, and 0.928, respectively, in the training set and 0.395, 0.36, 0.38, 0.821, 0.53, and 0.997, respectively, in the validation set (Table 2). The Brier scores of GBM, LR, NNet, RF, SVM, and XGBoost were 0.165, 0.178, 0.172, 0.084, 0.166, and 0.042, respectively, in the training set and 0.168, 0.175, 0.166, 0.068, 0.154, and 0.013, respectively, in the validation set (Table 2). The XGBoost method also shows excellent consistency.
The DCA curves show that the XGBoost-based model achieves a higher net benefit than the “all intervention” or “no intervention” strategies over the full range of thresholds, in both the training set (Figure 2C) and the validation set (Figure 2D). Taken together, the model performance statistics indicate that the XGBoost-based model has the best, almost perfect performance.
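As a reminder of what the AUC values above measure, the sketch below computes the AUC through its rank interpretation: the probability that a randomly chosen deficient participant receives a higher predicted risk than a randomly chosen non-deficient one, with ties counted as one half. This is a generic illustration on assumed toy data, not the procedure used in the study, and the pairwise loop is written for clarity rather than for speed on a dataset of this size.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (assumed): 1 = vitamin D deficient, scores are predicted risks.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.1]

auc = roc_auc(y_true, y_score)
print(round(auc, 3))
```

An AUC of 1 therefore means that every deficient participant in the evaluated set was scored above every non-deficient participant.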

Figure 1 Flowchart of data screening and analysis. NHANES, National Health and Nutrition Examination Survey; GBM, Gradient Boosting Machine; LR, Logistic Regression; NNet, Neural Network; RF, Random Forest; SVM, Support Vector Machine; XGBoost, eXtreme Gradient Boosting; ACC, accuracy; PPV, positive predictive value; NPV, negative predictive value; SEN, sensitivity; SPE, specificity; MCC, Matthews correlation coefficient.

Table 1 Characterization of participants in the training set.

Figure 2 ROC and DCA curves for each method. (A) ROC in the training set. (B) ROC in the validation set. (C) DCA curves in the training set. (D) DCA curves in the validation set.

Table 2 Evaluation metrics of the models constructed by each method.

We further plotted a summary of SHAP values (Figure 3) to interpret the XGBoost model results. For each feature, a point corresponds to a participant. The position of the point on the X-axis (i.e., the actual SHAP value) indicates the effect of the feature on the model output for that particular participant. The higher the feature on the Y-axis, the more important the feature is to the model. The results show that, in order of importance, the features included in this model are Race, Age, BMI, H.PIR, Milk, H.Size, Gender, H.Smoke, and Diabetes. We also constructed an online web calculator based on the XGBoost method to facilitate the use of the model (Figure 4, https://jialeguo.shinyapps.io/vitamin_D_deficiency/).
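The idea behind a per-participant SHAP value can be illustrated with an exact Shapley computation on a toy model. The sketch below is an assumption-laden simplification: the scoring function, the feature names, and the baseline participant are hypothetical and are not the paper's XGBoost model, and the paper does not state how its SHAP values were computed (SHAP software for tree models typically uses a faster tree-specific algorithm rather than enumerating orderings).

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet added to a coalition keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]          # add this feature to the coalition
            value = model(current)
            totals[name] += value - previous  # its marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f):
    """Hypothetical risk score with an interaction term; NOT the paper's model."""
    return 0.2 * f["race_black"] + 0.01 * f["bmi"] + 0.05 * f["race_black"] * (f["bmi"] > 30)


x = {"race_black": 1, "bmi": 35}         # participant being explained (assumed)
baseline = {"race_black": 0, "bmi": 25}  # reference participant (assumed)

phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions sum to the difference between the participant's score and the baseline score, which is the additivity property that lets a summary plot place every participant's feature effects on a common axis.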

Figure 3 Summary plot of SHAP values for the model constructed by XGBoost algorithm. The horizontal position “SHAP value” indicates whether the impact of the value is associated with a higher or lower prediction, and the color of each SHAP value point indicates whether the observed value is higher (purple) or lower (yellow). The vertical coordinates show the importance of the features, sorted by the importance of the variables in descending order, with the upper variables being more important to the model.

Figure 4 Online web calculator based on XGBoost modeling. Race is categorized as Mexican American, Non-Hispanic White, Non-Hispanic Black, Other Hispanic, or Other Race. Household Size, total number of people in the household; a Household Size of 7 or more is recorded as 7. Household PIR, household income to poverty income ratio; a Household PIR of more than 5 is recorded as 5. BMI, body mass index; Household smoking, whether or not someone smokes in the household; Milk consumption, past 30-day milk product consumption. Milk consumption is divided into four frequencies: never (never drinking milk), rarely (less than once a week), sometimes (once a week or more but less than once a day), and often (once a day or more).

4 Discussion

This study uses data collected through interviews in a community-based population: gender, age, race, H.Size, H.PIR, BMI, H.Smoke, Milk, and diabetes. These nine variables were used as machine learning features to construct the models. Six machine learning methods (GBM, LR, NNet, RF, SVM, and XGBoost) were used to construct the models, which were evaluated for discrimination, fit, and clinical efficacy. Figures 2A, B show the main evaluation result for discrimination: the ROC curve. The more convex a model’s curve and the closer it lies to the upper left corner, the better the model’s discrimination. The ROC curves in this study show that the XGBoost-based model has the best discrimination performance in both the training and validation sets. This is also confirmed by the complementary evaluation metrics: ACC, PPV, NPV, SEN, SPE, F1 score, and MCC. The results of the evaluation of clinical efficacy are presented as DCA curves in Figures 2C, D. The line corresponding to “Treat All” in the DCA curves shows the net benefit of intervening on all participants, and the line corresponding to “Treat None” shows the net benefit of intervening on no participants. A model is therefore useful over the range of threshold probabilities at which its net benefit is higher than that of both “Treat All” and “Treat None”. In this study, all the models have some clinical utility within a certain range of thresholds. In particular, the model constructed by the XGBoost method has a higher net benefit than the “Treat All” or “Treat None” strategies at all thresholds. Overall, the XGBoost model has the best, near-perfect performance. This study further used SHAP values to interpret the XGBoost model; among the variables included, race, age, and BMI were the three most important features.
In addition, an online web calculator was constructed based on the XGBoost model for ease of use. Using this online web calculator, it is possible to screen community populations for vitamin D deficiency through a simple interview. The population in this study came from the American community, where the prevalence of vitamin D deficiency was 32.11%. Vitamin D deficiency is a global public health problem whose prevalence differs between regions. With vitamin D deficiency defined as 25(OH)D less than 50 nmol/L, as recommended by an Endocrine Society Clinical Practice Guideline, the prevalence is 34.22% in Africa (25), 34.76% in South America (26), and 57.69% in Asia (27).
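The comparison against “Treat All” and “Treat None” in the DCA curves discussed above rests on the standard net benefit formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below evaluates it on assumed toy data; it illustrates the general technique and is not the R code or the data behind Figures 2C, D.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on everyone whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening on every participant."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


NET_BENEFIT_TREAT_NONE = 0.0  # no true positives and no false positives

# Toy data (assumed): 1 = vitamin D deficient.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.2, 0.1, 0.1, 0.1]

for pt in (0.25, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A decision curve repeats this calculation across a grid of thresholds; the model is preferred wherever its curve lies above both reference strategies.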

Both major forms of vitamin D (vitamin D2 and vitamin D3) are rarely found in food: vitamin D2 is found in plants and mushrooms, and vitamin D3 is found in foods of animal origin, e.g., salmon, butter, and liver. Vitamin D in the body comes mainly from ultraviolet light exposure of the skin rather than from food intake. When human skin is exposed to ultraviolet light at wavelengths between 290 and 315 nm, 7-dehydrocholesterol present in the epidermis is converted to pre-vitamin D3 (28, 29). Pre-vitamin D3 is rapidly converted to vitamin D3 by thermal isomerization; vitamin D3 then binds to vitamin D-binding protein in the blood and is transported to the liver. It is converted to 1α,25(OH)2D3, the major biologically active metabolite of vitamin D, by sequential hydroxylation in the liver and then the kidney (28). Because this is the main source of vitamin D in the body, it gives rise to differences in vitamin D levels among races and populations. The risk of vitamin D deficiency is related to race (30, 31), with darker-skinned races being less able to synthesize vitamin D from sunlight (32). In addition, latitude, season, atmospheric pollution, time spent outdoors, use of sunscreen, and the habitual dress of some races, all of which can affect the skin’s exposure to ultraviolet light, contribute to differences in vitamin D levels (32). The effect of age on vitamin D deficiency differs between adults and minors. The results of a multicenter cross-sectional study of adults aged 30-75 years in Saudi Arabia suggest that older age is a protective factor against vitamin D deficiency (33). This has been confirmed in studies from other regions (34–36). In minors, by contrast, the risk of vitamin D deficiency increased with age (37, 38). Obesity increases the risk of vitamin D deficiency across regions and ages (39–41). The results of a meta-analysis showed a positive association between BMI and vitamin D deficiency (42).
Several Mendelian randomization studies have also demonstrated this relationship at the causal level (43, 44). Low vitamin D levels in the obese population may be caused by the deposition of vitamin D in adipose tissue, which reduces its bioavailability (45).

Vitamin D plays a crucial role in the maintenance of calcium and phosphate homeostasis and in normal bone growth and mineralization (46). The effect of vitamin D on mineral homeostasis is mediated by 1,25(OH)2D activation of the vitamin D receptor (VDR), which stimulates intestinal calcium and phosphate absorption, renal tubular calcium reabsorption, and skeletal calcium mobilization (47). Vitamin D deficiency leads to decreased calcium and phosphorus absorption and lower circulating blood calcium, which in turn causes secondary hyperparathyroidism. Parathyroid hormone (PTH) increases renal tubular calcium reabsorption and inhibits phosphorus reabsorption in order to maintain blood calcium levels (48); ultimately, an insufficient calcium-phosphate product leads to impaired systemic bone mineralization, causing rickets in children and osteomalacia in adults (49). Vitamin D is essential for bone health, and supplementation is essential for patients at risk of fractures and/or vitamin D deficiency (50). Beyond its roles in calcium and phosphate homeostasis and bone metabolism, vitamin D has many other roles, especially in the immune response. It can act directly on immune cells to promote an anti-inflammatory state and maintain the balance between pro- and anti-inflammatory responses (51). Although vitamin D can affect the immune system in a number of ways, its effects tend to be interconnected with the microbiome: vitamin D and the microbiome influence each other and the immune system (52). Vitamin D plays an important role in the immune response and the maintenance of intestinal homeostasis by influencing the number and pathways of innate lymphoid cells (ILCs), which affect the immune response in the gut (53, 54). Recent studies have shown that the composition of the gut microbiota is altered by vitamin D levels (55, 56). The gut microbiota also influences calcium and vitamin D absorption and regulates intestinal permeability, hormone secretion, and the immune response (57).
The intestinal epithelial VDR regulates autophagy and innate immune function through genes such as ATG16L1, which may influence the microbiota profile in the gut (58). Vitamin D deficiency also plays a key role in airway microbiome composition, as weekly oral supplementation has an impact on cystic fibrosis patients (59). Therefore, it is extremely important to use vitamin D and probiotics to regulate the immune system (60).

Prediction tools are widely used in the medical field to predict clinical disease diagnosis and prognosis. Several prediction tools have been used to predict vitamin D deficiency. However, none of them predicts the risk of vitamin D deficiency in the general community, including young people. In addition, the sample size of this study far exceeds that of similar previous studies. The machine learning prediction tools developed by Sluyter et al. (61) are similar to ours: both were developed using data that could be collected in the community through simple inspection and interviews. However, the tool of Sluyter et al. was applicable only to adults older than 50 and performed worse than the XGBoost method in this study: its best AUC value was only 0.73, whereas the AUC value for the XGBoost method in this study was 0.995. Carretero et al. (62) and Kheir et al. (63) developed prediction tools for a hypertensive population and an ICU-admitted population, respectively, with AUC values of 0.74 and 0.64. This study presents the first predictive tool that can be widely applied to predict vitamin D deficiency in community populations. The best-performing method in this study, XGBoost, had perfect predictive performance. The large number of subjects is one of the strengths of this study and contributed to the high accuracy of the results. The results show that an online web calculator using the XGBoost method can be a good predictor of vitamin D deficiency in the general population. Using this predictive tool, screening for vitamin D deficiency in community or primary care settings can be achieved at almost no cost, avoiding much of the public health expenditure on unnecessary vitamin D testing and providing an intuitive and powerful scientific tool for health education and further testing.
Based on the results of the online web calculator in this study, primary care providers can give appropriate clinical advice to their patients and make timely interventions for those at high risk of vitamin D deficiency, especially for children, pregnant women, and the elderly.

However, we need to recognize that this study still has some limitations. First, in order for the prediction tool to be widely applicable to various scenarios, the vast majority of the predictors used in this study were based on participants’ self-reports, which may be subject to some bias. On the other hand, NHANES has a strictly standardized process for data collection, and the large sample size of this study can mitigate these biases to a certain extent. Second, although internal validation was performed by dividing the entire dataset into training and validation sets, we lacked external cohorts to validate the performance of the prediction tool. All of the populations in this study were from the United States, and since vitamin D levels are related to factors such as race and latitude, the results need to be viewed with caution when applied to populations in other regions. External validation using external datasets, especially from other continents, is necessary in the future.

5 Conclusion

The machine learning model constructed by the XGBoost method in this study shows almost perfect performance. Based on this model, an online web calculator was further constructed, through which the risk of vitamin D deficiency in community populations can be predicted easily and quickly, and the public health expenditures caused by unnecessary vitamin D testing can be reduced.

Data availability statement

Publicly available datasets were analyzed in this study. This data can be found here: Centers for Disease Control and Prevention (CDC), National Center for Health Statistics (NCHS), National Health and Nutrition Examination Survey (NHANES), https://wwwn.cdc.gov/nchs/nhanes/Default.aspx, NHANES 2001-2018.

Ethics statement

The studies involving humans were approved by National Center for Health Statistics (NCHS) Research Ethics Review Board. The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation in this study was provided by the participants’ legal guardians/next of kin.

Author contributions

JG: Formal analysis, Methodology, Software, Visualization, Writing – original draft. QH: Software, Visualization, Writing – review & editing. YL: Supervision, Writing – review & editing.

Funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. JG and YL were funded by the “Postgraduate Innovation Research and Practice Program of Anhui Medical University” (No. YJS20230090).

Acknowledgments

The authors thank the participants and staff of the National Health and Nutrition Examination Survey 2001–2018 for their valuable contributions.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

1. Holick MF, Chen TC. Vitamin D deficiency: a worldwide problem with health consequences. Am J Clin Nutr (2008) 87(4):1080s–6s. doi: 10.1093/ajcn/87.4.1080S

2. Lips P. Vitamin D deficiency and secondary hyperparathyroidism in the elderly: Consequences for bone loss and fractures and therapeutic implications. Review. Endocrine Rev (2001) 22(4):477–501. doi: 10.1210/er.22.4.477

3. Holick MF. High prevalence of vitamin D inadequacy and implications for health. Mayo Clin Proc (2006) 81(3):353–73. doi: 10.4065/81.3.353

4. Sacerdote A, Dave P, Lokshin V, Bahtiyar G. Type 2 diabetes mellitus, insulin resistance, and vitamin D. Curr Diabetes Rep (2019) 19(10):101. doi: 10.1007/s11892-019-1201-y

5. Remelli F, Vitali A, Zurlo A, Volpato S. Vitamin D deficiency and sarcopenia in older persons. Nutrients (2019) 11(12). doi: 10.3390/nu11122861

6. Cui X, McGrath JJ, Burne THJ, Eyles DW. Vitamin D and schizophrenia: 20 years on. Mol Psychiatry (2021) 26(7):2708–20. doi: 10.1038/s41380-021-01025-0

7. Illescas-Montes R, Melguizo-Rodríguez L, Ruiz C, Costela-Ruiz VJ. Vitamin D and autoimmune diseases. Life Sci (2019) 233:116744. doi: 10.1016/j.lfs.2019.116744

8. de la Guía-Galipienso F, Martínez-Ferran M, Vallecillo N, Lavie CJ, Sanchis-Gomar F, Pareja-Galeano H. Vitamin D and cardiovascular health. Clin Nutr (2021) 40(5):2946–57. doi: 10.1016/j.clnu.2020.12.025

9. Carlberg C, Velleuer E. Vitamin D and the risk for cancer: A molecular analysis. Biochem Pharmacol (2022) 196:114735. doi: 10.1016/j.bcp.2021.114735

10. Charoenngam N, Holick MF. Immunologic effects of vitamin D on human health and disease. Nutrients (2020) 12(7). doi: 10.3390/nu12072097

11. Ismailova A, White JH. Vitamin D, infections and immunity. Rev Endocr Metab Disord (2022) 23(2):265–77. doi: 10.1007/s11154-021-09679-5

12. Villasis-Keever MA, López-Alarcón MG, Miranda-Novales G, Zurita-Cruz JN, Barrada-Vázquez AS, González-Ibarra J, et al. Efficacy and safety of vitamin D supplementation to prevent COVID-19 in frontline healthcare workers. A randomized clinical trial. Arch Med Res (2022) 53(4):423–30. doi: 10.1016/j.arcmed.2022.04.003

13. di Filippo L, Frara S, Nannipieri F, Cotellessa A, Locatelli M, Rovere Querini P, et al. Low vitamin D levels are associated with long COVID syndrome in COVID-19 survivors. J Clin Endocrinol Metab (2023) 108(10):e1106–16. doi: 10.1210/clinem/dgad207

14. De Niet S, Trémège M, Coffiner M, Rousseau AF, Calmes D, Frix AN, et al. Positive effects of vitamin D supplementation in patients hospitalized for COVID-19: A randomized, double-blind, placebo-controlled trial. Nutrients (2022) 14(15). doi: 10.3390/nu14153048

15. Woodford HJ, Barrett S, Pattman S. Vitamin D: too much testing and treating? Clin Med (Lond). (2018) 18(3):196–200. doi: 10.7861/clinmedicine.18-3-196

16. Holick MF, Binkley NC, Bischoff-Ferrari HA, Gordon CM, Hanley DA, Heaney RP, et al. Evaluation, treatment, and prevention of vitamin D deficiency: an Endocrine Society clinical practice guideline. J Clin Endocrinol Metab (2011) 96(7):1911–30. doi: 10.1210/jc.2011-0385

17. Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng (2022) 6(12):1330–45. doi: 10.1038/s41551-022-00898-y

18. Caruana A, Bandara M, Musial K, Catchpoole D, Kennedy PJ. Machine learning for administrative health records: A systematic review of techniques and applications. Artif Intell Med (2023) 144:102642. doi: 10.1016/j.artmed.2023.102642

19. Li QY, Tang BH, Wu YE, Yao BF, Zhang W, Zheng Y, et al. Machine learning: A new approach for dose individualization. Clin Pharmacol Ther (2023). doi: 10.1002/cpt.3049

20. Bouillon R, Carmeliet G. Vitamin D insufficiency: Definition, diagnosis and management. Best Pract Res Clin Endocrinol Metab (2018) 32(5):669–84. doi: 10.1016/j.beem.2018.09.014

21. Altieri B, Cavalier E, Bhattoa HP, Pérez-López FR, López-Baena MT, Pérez-Roncero GR, et al. Vitamin D testing: advantages and limits of the current assays. Eur J Clin Nutr (2020) 74(2):231–47. doi: 10.1038/s41430-019-0553-3

22. Redelmeier DA, Bloch DA, Hickam DH. Assessing predictive accuracy: how to compare Brier scores. J Clin Epidemiol (1991) 44(11):1141–6. doi: 10.1016/0895-4356(91)90146-z

23. Fitzgerald M, Saville BR, Lewis RJ. Decision curve analysis. JAMA (2015) 313(4):409–10. doi: 10.1001/jama.2015.37

24. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (2017) 30.

25. Mogire RM, Mutua A, Kimita W, Kamau A, Bejon P, Pettifor JM, et al. Prevalence of vitamin D deficiency in Africa: a systematic review and meta-analysis. Lancet Glob Health (2020) 8(1):e134–42. doi: 10.1016/s2214-109x(19)30457-7

26. Mendes MM, Gomes APO, Araújo MM, Coelho ASG, Carvalho KMB, Botelho PB. Prevalence of vitamin D deficiency in South America: a systematic review and meta-analysis. Nutr Rev (2023) 81(10):1290–309. doi: 10.1093/nutrit/nuad010

27. Jiang Z, Pu R, Li N, Chen C, Li J, Dai W, et al. High prevalence of vitamin D deficiency in Asia: A systematic review and meta-analysis. Crit Rev Food Sci Nutr (2023) 63(19):3602–11. doi: 10.1080/10408398.2021.1990850

28. DeLuca HF. Overview of general physiologic features and functions of vitamin D. Am J Clin Nutr (2004) 80(6 Suppl):1689s–96s. doi: 10.1093/ajcn/80.6.1689S

29. Webb AR, Kift R, Durkin MT, O'Brien SJ, Vail A, Berry JL, et al. The role of sunlight exposure in determining the vitamin D status of the U.K. white adult population. Br J Dermatol (2010) 163(5):1050–5. doi: 10.1111/j.1365-2133.2010.09975.x

30. Sutherland JP, Zhou A, Leach MJ, Hyppönen E. Differences and determinants of vitamin D deficiency among UK biobank participants: A cross-ethnic and socioeconomic study. Clin Nutr (2021) 40(5):3436–47. doi: 10.1016/j.clnu.2020.11.019

31. Hayden KE, Sandle LN, Berry JL. Ethnicity and social deprivation contribute to vitamin D deficiency in an urban UK population. J Steroid Biochem Mol Biol (2015) 148:253–5. doi: 10.1016/j.jsbmb.2014.11.014

32. Clemens TL, Adams JS, Henderson SL, Holick MF. Increased skin pigment reduces the capacity of skin to synthesise vitamin D3. Lancet (1982) 1(8263):74–6. doi: 10.1016/s0140-6736(82)90214-8

33. AlQuaiz AM, Kazi A, Fouda M, Alyousefi N. Age and gender differences in the prevalence and correlates of vitamin D deficiency. Arch Osteoporos (2018) 13(1):49. doi: 10.1007/s11657-018-0461-5

34. Mo H, Zhang J, Huo C, Zhang M, Xiao J, Peng J, et al. The association of vitamin D deficiency, age and depression in US adults: a cross-sectional analysis. BMC Psychiatry (2023) 23(1):534. doi: 10.1186/s12888-023-04685-0

35. Nguyen HT, von Schoultz B, Nguyen TV, Dzung DN, Duc PT, Thuy VT, et al. Vitamin D deficiency in northern Vietnam: prevalence, risk factors and associations with bone mineral density. Bone (2012) 51(6):1029–34. doi: 10.1016/j.bone.2012.07.023

36. Golbahar J, Al-Saffar N, Altayab Diab D, Al-Othman S, Darwish A, Al-Kafaji G. Predictors of vitamin D deficiency and insufficiency in adult Bahrainis: a cross-sectional study. Public Health Nutr (2014) 17(4):732–8. doi: 10.1017/s136898001300030x

37. Middelkoop K, Walker N, Stewart J, Delport C, Jolliffe DA, Nuttall J, et al. Prevalence and determinants of vitamin D deficiency in 1825 Cape Town primary schoolchildren: A cross-sectional study. Nutrients (2022) 14(6). doi: 10.3390/nu14061263

38. Karagol C, Duyan Camurdan A. Evaluation of vitamin D levels and affecting factors of vitamin D deficiency in healthy children 0-18 years old. Eur J Pediatr (2023) 182(9):4123–31. doi: 10.1007/s00431-023-05096-9

39. Alloubani A, Akhu-Zaheya L, Samara R, Abdulhafiz I, Saleh A, Altowijri A. Relationship between vitamin D deficiency, diabetes, and obesity. Diabetes Metab Syndr (2019) 13(2):1457–61. doi: 10.1016/j.dsx.2019.02.021

40. Wakayo T, Whiting SJ, Belachew T. Vitamin D deficiency is associated with overweight and/or obesity among schoolchildren in Central Ethiopia: A cross-sectional study. Nutrients (2016) 8(4):190. doi: 10.3390/nu8040190

41. Shafinaz IS, Moy FM. Vitamin D level and its association with adiposity among multi-ethnic adults in Kuala Lumpur, Malaysia: a cross sectional study. BMC Public Health (2016) 16:232. doi: 10.1186/s12889-016-2924-1

42. Pereira-Santos M, Costa PR, Assis AM, Santos CA, Santos DB. Obesity and vitamin D deficiency: a systematic review and meta-analysis. Obes Rev (2015) 16(4):341–9. doi: 10.1111/obr.12239

43. Huang YY, Zhang WS, Jiang CQ, Zhu F, Jin YL, Cheng KK, et al. Mendelian randomization on the association of obesity with vitamin D: Guangzhou Biobank Cohort Study. Eur J Clin Nutr (2023) 77(2):195–201. doi: 10.1038/s41430-022-01234-y

44. Vimaleswaran KS, Berry DJ, Lu C, Tikkanen E, Pilz S, Hiraki LT, et al. Causal relationship between obesity and vitamin D status: bi-directional Mendelian randomization analysis of multiple cohorts. PLoS Med (2013) 10(2):e1001383. doi: 10.1371/journal.pmed.1001383

45. Wortsman J, Matsuoka LY, Chen TC, Lu Z, Holick MF. Decreased bioavailability of vitamin D in obesity. Am J Clin Nutr (2000) 72(3):690–3. doi: 10.1093/ajcn/72.3.690

46. Bergwitz C, Jüppner H. Regulation of phosphate homeostasis by PTH, vitamin D, and FGF23. Annu Rev Med (2010) 61:91–104. doi: 10.1146/annurev.med.051308.111339

47. Dominguez LJ, Farruggia M, Veronese N, Barbagallo M. Vitamin D sources, metabolism, and deficiency: available compounds and guidelines for its treatment. Metabolites (2021) 11(4). doi: 10.3390/metabo11040255

48. Potts JT Jr. A short history of parathyroid hormone, its biological role, and pathophysiology of hormone excess. J Clin Densitom (2013) 16(1):4–7. doi: 10.1016/j.jocd.2012.11.002

49. Holick MF. Resurrection of vitamin D deficiency and rickets. J Clin Invest (2006) 116(8):2062–72. doi: 10.1172/jci29449

50. Capozzi A, Scambia G, Lello S. Calcium, vitamin D, vitamin K2, and magnesium supplementation and skeletal health. Maturitas (2020) 140:55–63. doi: 10.1016/j.maturitas.2020.05.020

51. Yamamoto E, Jørgensen TN. Immunological effects of vitamin D and their relations to autoimmunity. J Autoimmun (2019) 100:7–16. doi: 10.1016/j.jaut.2019.03.002

52. Murdaca G, Gerosa A, Paladin F, Petrocchi L, Banchero S, Gangemi S. Vitamin D and microbiota: is there a link with allergies? Int J Mol Sci (2021) 22(8). doi: 10.3390/ijms22084288

53. Konya V, Czarnewski P, Forkel M, Rao A, Kokkinou E, Villablanca EJ, et al. Vitamin D downregulates the IL-23 receptor pathway in human mucosal group 3 innate lymphoid cells. J Allergy Clin Immunol (2018) 141(1):279–92. doi: 10.1016/j.jaci.2017.01.045

54. Ignacio A, Breda CNS, Camara NOS. Innate lymphoid cells in tissue homeostasis and diseases. World J Hepatol (2017) 9(23):979–89. doi: 10.4254/wjh.v9.i23.979

55. Luthold RV, Fernandes GR, Franco-de-Moraes AC, Folchetti LG, Ferreira SR. Gut microbiota interactions with the immunomodulatory role of vitamin D in normal individuals. Metabolism (2017) 69:76–86. doi: 10.1016/j.metabol.2017.01.007

56. Ooi JH, Li Y, Rogers CJ, Cantorna MT. Vitamin D regulates the gut microbiome and protects mice from dextran sodium sulfate-induced colitis. J Nutr (2013) 143(10):1679–86. doi: 10.3945/jn.113.180794

57. Locantore P, Del Gatto V, Gelli S, Paragliola RM, Pontecorvi A. The interplay between immune system and microbiota in osteoporosis. Mediators Inflamm (2020) 2020:3686749. doi: 10.1155/2020/3686749

58. Akimbekov NS, Digel I, Sherelkhan DK, Lutfor AB, Razzaque MS. Vitamin D and the host-gut microbiome: A brief overview. Acta Histochem Cytochem (2020) 53(3):33–42. doi: 10.1267/ahc.20011

59. Sun J. Dietary vitamin D, vitamin D receptor, and microbiome. Curr Opin Clin Nutr Metab Care (2018) 21(6):471–4. doi: 10.1097/mco.0000000000000516

60. Murdaca G, Greco M, Borro M, Gangemi S. Hygiene hypothesis and autoimmune diseases: A narrative review of clinical evidences and mechanisms. Autoimmun Rev (2021) 20(7):102845. doi: 10.1016/j.autrev.2021.102845

61. Sluyter JD, Raita Y, Hasegawa K, Reid IR, Scragg R, Camargo CA. Prediction of vitamin D deficiency in older adults: the role of machine learning models. J Clin Endocrinol Metab (2022) 107(10):2737–47. doi: 10.1210/clinem/dgac432

62. Garcia Carretero R, Vigil-Medina L, Barquero-Perez O, Mora-Jimenez I, Soguero-Ruiz C, Ramos-Lopez J. Machine learning approaches to constructing predictive models of vitamin D deficiency in a hypertensive population: a comparative study. Inform Health Soc Care (2021) 46(4):355–69. doi: 10.1080/17538157.2021.1896524

63. Bou Kheir G, Khaldi A, Karam A, Duquenne L, Preiser JC. A dynamic online nomogram predicting severe vitamin D deficiency at ICU admission. Clin Nutr (2021) 40(10):5383–90. doi: 10.1016/j.clnu.2021.08.024

Keywords: machine learning, vitamin D deficiency, clinical decision rules, nutrition surveys, public health

Citation: Guo J, He Q and Li Y (2024) Machine learning-based prediction of vitamin D deficiency: NHANES 2001-2018. Front. Endocrinol. 15:1327058. doi: 10.3389/fendo.2024.1327058

Received: 24 October 2023; Accepted: 26 January 2024; Published: 16 February 2024.

Edited by:

Mohammed S. Razzaque, Lake Erie College of Osteopathic Medicine, United States

Reviewed by:

Giuseppe Murdaca, University of Genoa, Italy
Rizaldy Taslim Pinzon, Duta Wacana Christian University, Indonesia

Copyright © 2024 Guo, He and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Yehai Li

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.