AUTHOR=Wang Haining , Cheng Wei , Hu Ping , Ling Tao , Hu Chao , Chen Yongzhen , Zheng Yanan , Wang Junqi , Zhao Ting , You Qiang TITLE=Integrative analysis identifies oxidative stress biomarkers in non-alcoholic fatty liver disease via machine learning and weighted gene co-expression network analysis JOURNAL=Frontiers in Immunology VOLUME=15 YEAR=2024 URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2024.1335112 DOI=10.3389/fimmu.2024.1335112 ISSN=1664-3224 ABSTRACT=Background

Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease globally, with the potential to progress to non-alcoholic steatohepatitis (NASH), cirrhosis, and even hepatocellular carcinoma. Given the absence of effective treatments to halt its progression, novel molecular approaches to NAFLD diagnosis and treatment are of paramount importance.

Methods

First, we downloaded oxidative stress-related genes from the GeneCards database and retrieved NAFLD-related datasets from the GEO database. Using the Limma R package and Weighted Gene Co-expression Network Analysis (WGCNA), we identified differentially expressed genes closely associated with NAFLD. Intersecting the oxidative stress (OS)-related genes, the NAFLD-related genes, and the genes closely associated with NAFLD as identified through WGCNA yielded 31 intersection genes. From these 31 genes, we identified three hub genes using three machine learning algorithms: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Support Vector Machine-Recursive Feature Elimination (SVM-RFE), and Random Forest. A nomogram was then used to predict the incidence of NAFLD. The CIBERSORT algorithm was employed for immune infiltration analysis, single-sample Gene Set Enrichment Analysis (ssGSEA) for functional enrichment analysis, and Protein-Protein Interaction (PPI) networks to explore the relationships between the three hub genes and the other intersecting genes of NAFLD and OS. The distribution of the three hub genes across six cell clusters was determined using single-cell RNA sequencing. Finally, using relevant data from the Attie Lab Diabetes Database and liver tissues from a NASH mouse model, we performed Western blot (WB) and Reverse Transcription Quantitative Polymerase Chain Reaction (RT-qPCR) assays, which further validated the significant roles of CDKN1B and TFAM in NAFLD.
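The set-intersection step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used R packages such as Limma and WGCNA. The gene symbols are hypothetical placeholders, not results from the study, and the helper name `intersect_all` is introduced here for illustration only. Treating the hub genes as the overlap of the three algorithms' selections is also an assumption, since the abstract does not state how the three outputs were combined.

```python
from functools import reduce


def intersect_all(gene_sets):
    """Return the genes present in every input set, sorted for stable output."""
    sets = [set(s) for s in gene_sets]
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))


# Hypothetical placeholder symbols standing in for the three input gene lists.
os_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}  # OS-related genes
nafld_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_F"}         # NAFLD-related genes
wgcna_genes = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}         # WGCNA-associated genes

# Candidate genes: those shared by all three lists.
candidates = intersect_all([os_genes, nafld_genes, wgcna_genes])

# Hypothetical selections from three feature-selection methods run on the candidates.
lasso_selected = {"GENE_B", "GENE_C"}
svm_rfe_selected = {"GENE_B", "GENE_C", "GENE_D"}
random_forest_selected = {"GENE_B", "GENE_C"}

# Assumed consensus rule: keep only genes chosen by every method.
hub_genes = intersect_all([lasso_selected, svm_rfe_selected, random_forest_selected])

print("candidates:", candidates)
print("hub genes:", hub_genes)
```

A strict intersection is the most conservative way to combine several selectors. A majority-vote rule would be a reasonable alternative if the overlap turned out to be empty.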

Results

In this study, we identified 31 genes strongly associated with oxidative stress in NAFLD. Subsequent machine learning analysis and external validation pinpointed two genes, CDKN1B and TFAM, as the most closely correlated with oxidative stress in NAFLD.

Conclusion

This investigation identified two hub genes, CDKN1B and TFAM, that hold potential as novel targets for the diagnosis and treatment of NAFLD, thereby offering new perspectives for its clinical management.