Intervertebral disc degeneration (IDD) is a major cause of low back pain and a significant global health issue. However, the specific mechanisms underlying IDD remain unclear. This study aims to identify key genes and pathways associated with IDD using bioinformatics and machine learning algorithms.
Gene expression profiles, including those from 35 lumbar disc herniation (LDH) patients and 43 healthy volunteers, were downloaded from the GEO database (GSE124272, GSE150408, GSE23130, GSE153761). After the four microarray datasets were merged, differentially expressed genes (DEGs) were identified and subjected to GO and KEGG pathway enrichment analysis. Weighted Gene Co-expression Network Analysis (WGCNA) was then applied to the merged dataset to identify relevant modules, which were intersected with the DEGs to obtain candidate genes with diagnostic value. A LASSO model was established to select genes from these candidates, and ROC curves were drawn to assess the diagnostic value of the selected markers. A protein-protein interaction (PPI) network was constructed and visualized to identify central genes, followed by external validation using qRT-PCR.
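The DEG selection and module intersection steps described above can be illustrated with a minimal sketch. It is not the study's pipeline: the gene names, the statistics, and the cutoffs (|log2 fold change| >= 1, adjusted p < 0.05) are all assumptions chosen for illustration, since the abstract does not state the thresholds used, and the WGCNA module membership is simply given as a toy set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffResult:
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fc: float  # log2 fold change, patients vs. controls
    adj_p: float    # multiple-testing adjusted p-value


def select_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene sets.

    The default cutoffs are assumed values, not taken from the study.
    """
    up, down = set(), set()
    for r in results:
        # Skip genes that miss either the significance or the effect-size cutoff.
        if r.adj_p >= p_cutoff or abs(r.log2_fc) < lfc_cutoff:
            continue
        (up if r.log2_fc > 0 else down).add(r.gene)
    return up, down


def candidate_genes(degs, module_genes):
    """Intersect DEGs with the genes of a co-expression module."""
    return sorted(set(degs) & set(module_genes))


# Toy data: placeholder gene names and made-up statistics.
toy_results = [
    DiffResult("GENE_A", 1.8, 0.001),
    DiffResult("GENE_B", -1.4, 0.010),
    DiffResult("GENE_C", 0.3, 0.001),  # fails the fold-change cutoff
    DiffResult("GENE_D", 2.1, 0.200),  # fails the p-value cutoff
    DiffResult("GENE_E", 1.2, 0.030),
]
toy_module = {"GENE_A", "GENE_C", "GENE_E", "GENE_F"}  # assumed module membership

up, down = select_degs(toy_results)
candidates = candidate_genes(up | down, toy_module)
print("up:", sorted(up), "down:", sorted(down), "candidates:", candidates)
```

In practice the differential statistics would come from a dedicated tool and the module membership from WGCNA itself; the sketch only shows how the two gene lists combine into a candidate set.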
Differential analysis of the preprocessed dataset identified 244 DEGs, of which 183 were upregulated and 61 were downregulated. Intersecting the most relevant WGCNA module with the DEGs yielded 9 candidate genes. LASSO-Cox regression analysis of these candidates ultimately identified 6 genes: ASPH, CDC42EP3, FOSL2, IL1R1, NFKBIZ, and TCF7L2. A PPI network built with GeneMANIA identified IL1R1 and TCF7L2 as central genes.
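To show how a LASSO-type penalty can shrink a candidate list like the nine genes above, here is a hedged sketch on synthetic data. As a stand-in for the regression named in the abstract, it uses L1-penalized logistic regression from scikit-learn, which is an assumption on our part. The expression matrix is randomly generated, the gene names are placeholders, and only the group sizes (35 and 43) mirror the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the expression matrix of 9 candidate genes.
n_cases, n_controls = 35, 43
genes = [f"GENE_{i}" for i in range(1, 10)]  # placeholder names
y = np.array([1] * n_cases + [0] * n_controls)
X = rng.normal(size=(y.size, len(genes)))
X[y == 1, :3] += 1.5  # make the first three genes informative (assumption)

# Standardize so the L1 penalty treats all genes on the same scale.
X_std = StandardScaler().fit_transform(X)

# L1-penalized logistic regression; the penalty strength is chosen by
# 5-fold cross-validation over scikit-learn's default grid of 10 values.
model = LogisticRegressionCV(
    Cs=10,
    cv=5,
    penalty="l1",
    solver="liblinear",
    scoring="roc_auc",
    random_state=0,
)
model.fit(X_std, y)

# Genes with non-zero coefficients are the ones the penalty retained.
coef = model.coef_.ravel()
selected = [g for g, c in zip(genes, coef) if c != 0.0]

# AUC on the training data; this is optimistic, so a real analysis
# would report it on held-out samples or an external cohort.
auc = roc_auc_score(y, model.predict_proba(X_std)[:, 1])
print("selected genes:", selected)
print("training AUC:", round(auc, 3))
```

Because the AUC here is computed on the same samples used for fitting, it overstates performance; that limitation is one reason a study of this kind would pair the model with external validation.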
Our study indicates that IL1R1 and TCF7L2 are core genes in IDD, offering new insights into its pathogenesis and into the development of therapies.