This study aimed to systematically investigate gene signatures for hepatoblastoma (HB) and identify potential biomarkers for its diagnosis and treatment.
The GSE131329 and GSE81928 datasets were obtained from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) between HB and normal samples were identified using the limma package in R. The similarity of network traits between the two sets of genes was then analyzed by weighted gene co-expression network analysis (WGCNA). Cytoscape was used to visualize the network, select hub genes, and construct a protein-protein interaction (PPI) network of the hub genes. Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the hub genes were carried out using ClueGO. A random forest classifier was constructed from the hub genes using the GSE131329 dataset as the training set, and its reliability was validated using the GSE81928 dataset. The resulting core hub genes were then compared against the InnateDB database to identify innate core genes.
A total of 4244 DEGs were identified in HB. WGCNA identified four modules that were significantly correlated with disease status. A total of 114 hub genes were obtained from the top 20 genes of each node ranking. The PPI network of the 114 hub genes contained 3700 nodes and 6982 relation pairs. GO enrichment and KEGG pathway analyses showed that the hub genes were mainly involved in the MAPK, cell cycle, p53, and other crucial pathways implicated in HB. A random forest classifier built with the 114 hub genes as feature genes achieved a 95.5% true positive rate when classifying HB and normal samples. A total of 35 core hub genes were obtained based on the mean decrease in accuracy and the mean decrease in Gini of the random forest model. The classification efficiency of the random forest model was 81.4%. Finally,
Our study established a random forest classifier that identified 10 core genes in HB. These findings may be beneficial for the diagnosis, prediction, and targeted therapy of HB.
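
The evaluation and gene-selection steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which was run in R. It assumes that the true positive rate is the fraction of HB samples correctly predicted as HB, and that core genes are those ranked in the top k by both importance measures; the abstract does not say how the two measures were combined, so that intersection rule is an assumption. All gene names, labels, and scores are hypothetical placeholders, not study data.

```python
from typing import Dict, Sequence, Set


def true_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true HB samples (label 1) that are predicted as HB."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Names of the k highest-scoring genes."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def core_genes(mda: Dict[str, float], mdg: Dict[str, float], k: int) -> Set[str]:
    """Assumed rule: keep genes in the top k of BOTH importance rankings
    (mean decrease in accuracy and mean decrease in Gini)."""
    return top_k(mda, k) & top_k(mdg, k)


# Toy validation labels (1 = HB, 0 = normal); placeholders only.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
tpr = true_positive_rate(y_true, y_pred)

# Toy importance scores for hypothetical genes.
mda = {"GENE_A": 0.9, "GENE_B": 0.7, "GENE_C": 0.2, "GENE_D": 0.1}
mdg = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 4.0, "GENE_D": 0.5}
core = core_genes(mda, mdg, k=2)

# Hypothetical stand-in for a gene list exported from InnateDB.
innate_reference = {"GENE_A", "GENE_D"}
innate_core = core & innate_reference

print(f"TPR = {tpr:.2f}, core = {sorted(core)}, innate core = {sorted(innate_core)}")
```

In practice the importance scores would come from the fitted random forest and the reference set from an InnateDB export; only the set logic is shown here.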