Systemic Lupus Erythematosus (SLE) is a complex autoimmune disease that disproportionately affects women. Early diagnosis and prevention are crucial for women’s health, and the gut microbiota has been found to be strongly associated with SLE. This study aimed to identify potential biomarkers for SLE by characterizing the gut microbiota landscape using feature selection and exploring the use of machine learning (ML) algorithms with significantly dysregulated microbiotas (SDMs) for early identification of SLE patients. Additionally, we used the SHapley Additive exPlanations (SHAP) interpretability framework to visualize the impact of SDMs on the risk of developing SLE in females.
Stool samples were collected from 54 SLE patients and 55 Negative Controls (NC) for microbiota analysis using 16S rRNA sequencing. Feature selection was performed using Elastic Net and Boruta on species-level taxonomy. Subsequently, four ML algorithms, namely logistic regression (LR), Adaptive Boosting (AdaBoost), Random Forest (RF), and eXtreme gradient boosting (XGBoost), were used to achieve early identification of SLE with SDMs. Finally, the best-performing algorithm was combined with SHAP to explore how SDMs affect the risk of developing SLE in females.
Both alpha and beta diversity were found to be different in SLE group. Following feature selection, 68 and 21 microbiota were retained in Elastic Net and Boruta, respectively, with 16 microbiota overlapping between the two, i.e., SDMs for SLE. The four ML algorithms with SDMs could effectively identify SLE patients, with XGBoost performing the best, achieving Accuracy, Sensitivity, Specificity, Positive Predictive Value, Negative Predictive Value, and AUC values of 0.844, 0.750, 0.938, 0.923, 0.790, and 0.930, respectively. The SHAP interpretability framework showed a complex non-linear relationship between the relative abundance of SDMs and the risk of SLE, with
This study revealed dysbiosis in the gut microbiota of female SLE patients. ML classifiers combined with SDMs can facilitate early identification of female patients with SLE, particularly XGBoost. The SHAP interpretability framework provides insight into the impact of SDMs on the risk of SLE and may inform future scientific treatment for SLE.