Glioblastoma (GBM) is a highly aggressive malignant tumor of the central nervous system with heterogeneous molecular and morphological profiles, which makes prognostic assessment challenging. Stratifying GBM patients by overall survival (OS) from H&E-stained whole slide images (WSI) with advanced computational methods is difficult, but has direct clinical implications.
This work focuses on GBM (IDH-wildtype, CNS WHO grade 4) cases, identified from the TCGA-GBM and TCGA-LGG collections according to the 2021 WHO classification criteria. The proposed approach starts with patch extraction from each WSI, followed by comprehensive patch-level curation to discard artifactual content, i.e., glass reflections, pen markings, dust on the slide, and tissue tearing. Each patch is then described by a feature vector extracted with a pre-trained VGG16 convolutional neural network. Principal component analysis (PCA) reduces the dimensionality of this representation, which facilitates the identification of distinct groups of morphology patterns via unsupervised k-means clustering.
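As an illustration of the feature-extraction and clustering steps, the following is a minimal sketch, not the authors' implementation. It assumes torchvision's ImageNet-pretrained `vgg16` as the feature extractor and global average pooling of its last convolutional block (the layer actually used is not stated), and scikit-learn's `PCA` and `KMeans`. The helper names `vgg16_patch_features` and `cluster_patches`, and the default of 50 principal components, are hypothetical. Patch extraction and artifact curation are not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def vgg16_patch_features(patches):
    """Hypothetical helper: 512-d descriptors from an ImageNet-pretrained VGG16.

    `patches` is assumed to be a float tensor of shape (N, 3, 224, 224) that is
    already ImageNet-normalized. Global average pooling of the last conv block
    is an assumption; the source does not state which VGG16 layer is used.
    """
    import torch  # imported lazily so the clustering part runs without torch
    from torchvision.models import VGG16_Weights, vgg16

    model = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).eval()
    with torch.no_grad():
        fmap = model.features(patches)        # (N, 512, 7, 7) for 224x224 input
        return fmap.mean(dim=(2, 3)).numpy()  # global average pool -> (N, 512)


def cluster_patches(features, n_clusters, n_components=50, seed=0):
    """Reduce patch features with PCA, then group them with k-means.

    n_components=50 is a hypothetical default; it is capped so that PCA never
    asks for more components than samples or features.
    """
    n_components = min(n_components, *features.shape)
    pca = PCA(n_components=n_components, svd_solver="full", random_state=seed)
    reduced = pca.fit_transform(features)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(reduced)  # one cluster id per patch
    return labels, reduced, pca, kmeans
```

In this sketch the number of clusters is passed in explicitly; it would come from the cluster-number selection step rather than being fixed by hand.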
The optimal number of clusters is determined automatically from cluster reproducibility and separability, measured by the Rand index and the silhouette coefficient, respectively. The proposed approach achieved a prognostic stratification accuracy of 83.33% on a multi-institutional, independent, unseen hold-out test set, with a sensitivity and a specificity of 83.33% each.
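One possible way to combine the two criteria is sketched below; the exact rule is not given here, so the scheme is an assumption. Reproducibility is estimated as the mean pairwise Rand index (scikit-learn's `rand_score`) between k-means runs with different random initializations, separability as the `silhouette_score` of one run, and the chosen k is the most separable one among candidates whose reproducibility reaches a hypothetical threshold `min_rand=0.9`. The function name `select_k` and the candidate range are also illustrative.

```python
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import rand_score, silhouette_score


def select_k(X, candidate_ks=range(2, 11), n_runs=5, min_rand=0.9, seed=0):
    """Hypothetical k selection from reproducibility and separability.

    n_runs must be >= 2 so that pairwise Rand indices can be computed.
    Returns the chosen k and a dict k -> (mean Rand index, silhouette).
    """
    scores = {}
    for k in candidate_ks:
        # Single-initialization runs, so different seeds can disagree.
        runs = [
            KMeans(n_clusters=k, n_init=1, random_state=seed + r).fit_predict(X)
            for r in range(n_runs)
        ]
        reproducibility = float(
            np.mean([rand_score(a, b) for a, b in combinations(runs, 2)])
        )
        separability = float(silhouette_score(X, runs[0]))
        scores[k] = (reproducibility, separability)

    # Keep reproducible candidates; fall back to all candidates if none qualify.
    stable = [k for k, (ri, _) in scores.items() if ri >= min_rand]
    pool = stable or list(scores)
    best_k = max(pool, key=lambda k: scores[k][1])
    return best_k, scores
```

Filtering on reproducibility first and then maximizing the silhouette keeps an unstable partition from being selected only because one lucky run happened to be well separated.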
We hypothesize that these clusters of morphology patterns reflect the tumor's spatial heterogeneity, and that their quantification yields prognostically relevant information for distinguishing short from long survivors with a decision tree classifier. Interpretability analysis of the results can contribute to a deeper, quantitative understanding of GBM and potentially improve diagnostic and prognostic predictions.
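The quantification and classification steps can be sketched as follows, under stated assumptions. Each patient is assumed to be summarized by the fraction of their curated patches falling in each morphology cluster, and those proportions feed a scikit-learn `DecisionTreeClassifier`. The helper names, the label encoding (1 = short survivor, 0 = long survivor), and `max_depth=3` are hypothetical; the survival cutoff separating the two groups is not reproduced here. Sensitivity and specificity follow their standard definitions, TP/(TP+FN) and TN/(TN+FP).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def cluster_proportions(patch_labels, patient_ids, n_clusters):
    """Hypothetical summary: fraction of each patient's patches per cluster."""
    patch_labels = np.asarray(patch_labels)
    patient_ids = np.asarray(patient_ids)
    patients = sorted(set(patient_ids.tolist()))
    props = np.zeros((len(patients), n_clusters))
    for i, patient in enumerate(patients):
        counts = np.bincount(patch_labels[patient_ids == patient], minlength=n_clusters)
        props[i] = counts / counts.sum()
    return patients, props  # rows follow the sorted patient order


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    return tp / (tp + fn), tn / (tn + fp)


def fit_survival_tree(props, survival_group, max_depth=3, seed=0):
    """survival_group: 1 = short survivor, 0 = long survivor (hypothetical encoding)."""
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=seed)
    return tree.fit(props, survival_group)
```

The tree would be fit on training patients only; accuracy, sensitivity, and specificity would then be computed on the hold-out patients, e.g. `sensitivity_specificity(y_test, tree.predict(props_test))`, with `y_test` ordered to match the sorted patient list.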