Impaired glucose tolerance (IGT) is diagnosed by a standardized oral glucose tolerance test (OGTT). However, the OGTT is laborious, and when it has not been performed, glucose tolerance cannot be determined retrospectively from fasting samples. We tested whether glucose tolerance status can be reasonably predicted from a combination of demographic, anthropometric, and laboratory data assessed at a single time point in the fasting state.
Using a set of 22 variables selected for clinical feasibility, such as sex, age, height, weight, waist circumference, blood pressure, fasting glucose, HbA1c, hemoglobin, mean corpuscular volume, serum potassium, fasting levels of insulin, C-peptide, triglycerides, non-esterified fatty acids (NEFA), proinsulin, prolactin, cholesterol, low-density lipoprotein (LDL), high-density lipoprotein (HDL), uric acid, liver transaminases, and ferritin, we trained supervised machine learning classifiers to estimate glucose tolerance status using data from 2,337 participants of the TUEF study who were recruited before 2012. We then tested the performance of 10 different machine learning classifiers on a test set of 929 participants who were recruited after 2012. In addition, we analyzed the reproducibility of IGT in 78 participants who underwent 2 OGTTs within 1 year.
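The split by recruitment date described above can be sketched as follows. This is a minimal illustration, not the study's pipeline: the record fields (`year`, `fasting_glucose`, `igt`), the handling of the cutoff year itself, the one-variable threshold rule standing in for a classifier, and the toy records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    year: int               # recruitment year (hypothetical field)
    fasting_glucose: float  # mmol/L (hypothetical field)
    igt: bool               # OGTT-derived label


def temporal_split(records, cutoff_year=2012):
    """Train on participants recruited before the cutoff, test on the rest.

    Assumption: the cutoff year itself goes to the test set; the source
    does not say how that year was handled.
    """
    train = [r for r in records if r.year < cutoff_year]
    test = [r for r in records if r.year >= cutoff_year]
    return train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the OGTT-derived label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_threshold(train):
    """Stand-in classifier: fasting glucose cutoff with the best training accuracy."""
    labels = [r.igt for r in train]
    candidates = sorted({r.fasting_glucose for r in train})
    return max(
        candidates,
        key=lambda c: accuracy(labels, [r.fasting_glucose >= c for r in train]),
    )


# Toy records, invented purely for illustration (not study data).
records = [
    Participant(2008, 4.9, False),
    Participant(2009, 5.2, False),
    Participant(2010, 5.9, True),
    Participant(2011, 6.1, True),
    Participant(2013, 5.0, False),
    Participant(2014, 6.0, True),
    Participant(2015, 5.95, False),
]

train, test = temporal_split(records)
cutoff = fit_threshold(train)
test_accuracy = accuracy(
    [r.igt for r in test],
    [r.fasting_glucose >= cutoff for r in test],
)
```

Splitting by recruitment date rather than at random means the test participants are never seen during training and come from a later period, which is closer to how such a model would be used prospectively.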
The most accurate prediction of IGT was achieved with the recursive partitioning method (accuracy = 0.78). Across all classifiers, mean accuracy was 0.73 ± 0.04. Fasting glucose was the most important variable in every model. Ranked by mean variable importance across all models, fasting glucose was followed by NEFA, triglycerides, HbA1c, and C-peptide. The accuracy of predicting IGT from a previous OGTT was 0.77.
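Averaging variable importance across models, as reported above, could be sketched as below. The model names and importance scores are made-up placeholders, and scaling each model's scores to sum to 1 before averaging is an assumption; the source does not state how importances were made comparable between classifiers.

```python
from statistics import mean


def normalize(scores):
    """Scale one model's importance scores so they sum to 1."""
    total = sum(scores.values())
    return {name: value / total for name, value in scores.items()}


def mean_importance(per_model):
    """Average normalized importance per variable across all models.

    Assumes every model reports a score for the same set of variables.
    """
    normalized = [normalize(scores) for scores in per_model.values()]
    variables = normalized[0].keys()
    return {v: mean(m[v] for m in normalized) for v in variables}


# Placeholder scores on arbitrary, model-specific scales (not study results).
per_model = {
    "model_a": {"fasting_glucose": 50.0, "nefa": 30.0, "triglycerides": 20.0},
    "model_b": {"fasting_glucose": 0.6, "nefa": 0.3, "triglycerides": 0.1},
}

averaged = mean_importance(per_model)
# Variables ordered from most to least important on average.
ranking = sorted(averaged, key=averaged.get, reverse=True)
```

Normalizing first keeps a model with large raw scores from dominating the average; averaging ranks instead of scores would be an equally plausible choice.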
Machine learning methods yield moderate accuracy in predicting glucose tolerance from a wide set of clinical and laboratory variables. Replacing the OGTT does not currently seem feasible. An important constraint could be the limited reproducibility of glucose tolerance status in a subsequent OGTT.
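The reproducibility constraint can be made concrete by treating the first OGTT as a "prediction" of the second and scoring it like any classifier. The sketch below assumes paired boolean IGT labels per participant; the function name `repeat_agreement` is hypothetical and the pairs are invented for illustration.

```python
def repeat_agreement(pairs):
    """Accuracy of the first OGTT's IGT status as a predictor of the second."""
    return sum(first == second for first, second in pairs) / len(pairs)


# (IGT at first OGTT, IGT at second OGTT); invented labels, not study data.
pairs = [
    (True, True),
    (False, False),
    (True, False),
    (False, False),
]

agreement = repeat_agreement(pairs)
```

Under this framing, the agreement between repeated tests gives a rough reference point for the classifier accuracies, since the labels used for training and testing carry the same test-to-test variability.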