- 1Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, United States
- 2Department of Pathology, University of Illinois at Chicago, Chicago, IL, United States
- 3Department of Computational Biomedicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States
Recent advances in spatial omics technologies have enabled new approaches for analyzing tissue morphology, cell composition, and biomolecule expression patterns in situ. These advances are promoting the development of new computational tools and quantitative techniques in the emerging field of digital pathology. In this review, we survey current trends in the development of computational methods for spatially mapped omics data analysis using digitized histopathology slides and supplementary materials, with an emphasis on tools and applications relevant to genitourinary oncological research. The review contains three sections: 1) an overview of image processing approaches for histopathology slide analysis; 2) machine learning integration with spatially resolved omics data analysis; 3) a discussion of current limitations and future directions for integration of machine learning in the clinical decision-making process.
1 Introduction
Over the last decade, automation and digitization of laboratory processes have gradually transformed everyday practices in hospitals. Recent advances in computational pathology, especially in machine learning (ML), suggest an imminent revolution in the clinical decision-making process (1). Resources supporting this transition have steadily accumulated. Digital pathology is being adopted by a growing number of medical centers, and large repositories such as The Cancer Genome Atlas (TCGA) have steadily accrued digital tissue slides complemented by multi-omics profiles of each sample (https://www.cancer.gov/tcga). Research efforts to adopt ML and artificial intelligence (AI) for image analysis are ongoing, and several tools have already been approved by the Food and Drug Administration (FDA) for use with radiologic images (https://www.fda.gov/). Applying ML to digital slide image analysis could help clinicians not only with diagnosis but also with risk stratification, by predicting genetic alterations and classifying tumors based on meaningful features. Features such as microscopic morphology and expression patterns can be factored into the learning process, along with “sub-visual” image features that may not be recognizable by human pathologists. This enables more precise modeling of pathologies and, in turn, improved outcome prediction compared with traditional grading systems, contributing to precision medicine.
Spatially resolved omics is another field in which ML may help clinicians overcome the limitations of traditional molecular testing methods. Single-cell analysis has gained great popularity in recent years (2). Spatial multi-omics further augments single-cell technology by preserving the spatial information associated with each transcript or protein. The extracted expression profile is often combined with high-resolution hematoxylin and eosin (H&E) stained tissue slide images to provide complementary information on histological features (3). Computer vision can perform such cellular image analysis with minimal user intervention (4). Spatial profiling enables detection of unique features, such as the quantities of immune and stromal components associated with the tumor microenvironment (TME), in a spatial context (5). Employing such tools to identify novel biomarkers will make it easier for clinicians to select patients who will benefit from targeted therapies, notably immunotherapy. Despite the rapid growth in ML algorithm development, only a few tools are currently approved by the FDA or undergoing clinical trials.
In this review, we aim to provide the reader with an overview of currently available tools and methods in digital slide analysis and spatial multi-omics, with a focus on open-source tools. Selected studies are discussed to showcase the potential of ML in investigating urologic pathologies, to provide a reference for future research in the field of urology. Lastly, we discuss technical and practical problems that need to be addressed before clinical implementation.
2 Use of AI/ML in pathological diagnostics
2.1 Current trend of ML in digital pathology slide image analysis in urologic oncology
There has been an influx of basic and translational research studies in recent years, mostly focused on developing new workflows powered by AI and ML methods. With the rising availability of whole slide image (WSI) databases and the rapidly improving performance of computational pathology, there have been efforts to detect and quantify novel features besides the traditional histological features of nuclear and cellular morphology. Newer features often used for model development include immune cell infiltration, stromal composition, glandular architecture, and even topological features. Quantifying immune cell infiltration patterns can provide insights into the TME and potential response to immunotherapies. Analyzing stromal composition, such as the presence of cancer-associated fibroblasts or extracellular matrix components, can shed light on tumor invasiveness and metastatic potential. Additionally, assessing glandular architecture and its disruption can aid in grading and staging, especially for prostate cancer. Among the surveyed studies, we identified urology-specific features, including the cribriform glandular pattern in prostate cancer and glomerular structure along with tubular morphology in kidney cancer (Figure 1). ML can aid in the discovery and quantification of such meaningful features. Furthermore, by integrating these features into model training, researchers are now moving beyond basic cancer detection toward more comprehensive predictive models for tasks such as survival prediction and treatment stratification. This represents a paradigm shift in the field, harnessing ML and large-scale pathology slide image data to unlock new insights and improve patient outcomes.
Figure 1. The various features and machine learning approaches used in urological cancer research. Recent urological cancer research has used image features (blue), molecular features (green), and clinical features (yellow) for digital diagnosis. Solid lines indicate the machine learning models predominantly used in urological cancer research, whereas dashed lines indicate machine learning models that have not yet been widely applied in this field. CNVs, Copy Number Variations; DEGs, Differentially Expressed Genes; SVM, Support Vector Machine.
Prostate cancer has garnered a relatively higher volume of research than other malignancies of the urological system, which can be attributed to its high incidence and its substantial contribution to cancer-related morbidity and mortality. Huang et al. trained their model on randomly sampled tiles from H&E WSIs of the TCGA Prostate Adenocarcinoma (PRAD) dataset. Cell-specific features, such as nuclear detail and glandular context, and various TME elements, including immune cells and stroma, were extracted with a convolutional neural network (CNN) model to identify patterns predictive of early recurrence (6). In another study, a deep learning (DL) model was trained on the TCGA-PRAD dataset to predict TP53 mutation status from WSIs (7).
Similar efforts have also been made for bladder and renal carcinomas. Jiang et al. extracted TME features from TCGA bladder urothelial carcinoma (BLCA) WSI samples using a CNN and then clustered the images with the K-means method. The clusters correlated with prognosis and immune scores, suggesting differences in responsiveness to immune checkpoint inhibitors. The authors further quantified these TME characteristics through an AI score, where a higher AI score predicted a stronger therapeutic response to immunotherapy (8). Studies of renal cell carcinoma are fewer in number than those focused on prostate and bladder cancers. Chen et al. developed an ML-based pathomics signature for clear cell renal cell carcinoma, in which the 5 most prognostic image features were selected through least absolute shrinkage and selection operator (LASSO) analysis and incorporated into the signature formula (9).
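Tile-based training of the kind described for Huang et al. begins by cutting a WSI into fixed-size patches and discarding background. The sketch below is a minimal, hypothetical illustration, not the authors’ pipeline: it assumes the slide has already been loaded as an RGB NumPy array (for example, a downsampled level read with a library such as OpenSlide), treats near-white pixels as glass, and randomly samples grid tiles whose tissue fraction exceeds a threshold. The function name and threshold values are illustrative assumptions.

```python
import numpy as np


def sample_tissue_tiles(slide, tile_size=256, n_tiles=8, min_tissue=0.5,
                        bg_threshold=220, seed=0):
    """Randomly sample (row, col) origins of tiles that are mostly tissue.

    slide: H x W x 3 uint8 RGB array standing in for one WSI resolution level.
    A pixel counts as tissue when its mean RGB value is below bg_threshold;
    bright, near-white pixels are treated as glass background (an assumption).
    """
    rng = np.random.default_rng(seed)
    h, w = slide.shape[:2]
    tissue_mask = slide.mean(axis=2) < bg_threshold

    # Enumerate non-overlapping grid positions that fit entirely in the slide.
    grid = [(r, c)
            for r in range(0, h - tile_size + 1, tile_size)
            for c in range(0, w - tile_size + 1, tile_size)]

    # Keep only tiles whose tissue fraction meets the threshold.
    candidates = [(r, c) for r, c in grid
                  if tissue_mask[r:r + tile_size, c:c + tile_size].mean() >= min_tissue]
    if not candidates:
        return []

    # Sample without replacement so no tile is drawn twice.
    picks = rng.choice(len(candidates), size=min(n_tiles, len(candidates)),
                       replace=False)
    return [candidates[i] for i in sorted(picks)]


# Synthetic example: white 1024 x 1024 "slide" with pink tissue in the left half.
slide = np.full((1024, 1024, 3), 255, dtype=np.uint8)
slide[:, :512] = (200, 120, 160)
tiles = sample_tissue_tiles(slide, tile_size=256, n_tiles=4)
print(tiles)
```

Random sampling among tissue-rich tiles keeps the training set manageable for gigapixel slides while avoiding uninformative background patches.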
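LASSO selects features by adding an L1 penalty that drives the coefficients of uninformative predictors exactly to zero, so a small subset such as the 5 image features above survives. The following is a hedged, minimal coordinate-descent sketch on a continuous outcome; it is not Chen et al.’s implementation, and for survival endpoints a penalized Cox model would typically be used instead. The feature matrix, the `alpha` value, and the helper names are illustrative assumptions.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Proximal operator of the L1 penalty."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lasso_cd(X, y, alpha, n_iter=1000, tol=1e-8):
    """Minimize (1/2n)||y - X b||^2 + alpha * ||b||_1 by coordinate descent.

    Columns of X are standardized (mean 0, variance 1) and y is centered, so
    the returned coefficients are on the standardized scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    std = X.std(axis=0)
    std[std == 0] = 1.0                      # guard against constant features
    Xs = (X - X.mean(axis=0)) / std
    residual = y - y.mean()
    beta = np.zeros(p)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of feature j with the partial residual (feature j added back).
            rho = Xs[:, j] @ residual / n + old
            new = soft_threshold(rho, alpha)
            if new != old:
                residual -= Xs[:, j] * (new - old)
                beta[j] = new
                max_step = max(max_step, abs(new - old))
        if max_step < tol:
            break
    return beta


def top_k_features(beta, k):
    """Indices of the k non-zero coefficients with the largest magnitude."""
    nonzero = np.flatnonzero(beta)
    order = nonzero[np.argsort(-np.abs(beta[nonzero]))]
    return order[:k].tolist()


# Synthetic example: 10 candidate image features, only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)
beta = lasso_cd(X, y, alpha=0.1)
print(top_k_features(beta, 2))
```

Larger `alpha` values zero out more coefficients; in practice the penalty is usually chosen by cross-validation.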
2.2 Open-source tools for digital pathology analysis
While numerous studies have explored leveraging ML for digital pathology image analysis, many models developed by individual researchers, along with their complete methods and code, remain inaccessible to the public. This could perhaps be attributed to plans for patenting or commercializing the developed workflows. Future researchers could take advantage of several open-source tools, although optimization for individual research purposes might be necessary. Table 1 summarizes the most popular open-source tools and programs currently available. These tools are freely available to the public, and a wide range of extensions or plugins can further expand the utility of the main tool. For instance, StarDist can complement QuPath by providing a pre-trained DL algorithm for nucleus detection. As such, open-source tools offer great flexibility in functionality and in the types of stains and data they can analyze.
QuPath can use deep learning methods such as StarDist, which uses a CNN trained on annotated image data to precisely segment and identify individual nuclei, for 2D and 3D nucleus detection (2). ImageJ, with its Trainable Weka Segmentation plugin (a Fiji plugin), uses a pixel classification approach for image segmentation. In pixel classification, each pixel in the image is transformed into a feature vector capturing properties such as intensity values, edge information, and texture. The user manually annotates a subset of pixels, labeling them into distinct classes such as cells or background, and these labeled pixels serve as training examples for an ML classifier, such as a random forest (RF) or support vector machine (SVM). Once trained, the classifier predicts class labels for all remaining unlabeled pixels in the image. This tool therefore relies on supervised learning, in which users annotate training data to teach the model (18). TMarker employs ML algorithms such as random decision trees and SVM to improve tumor and cell segmentation. The tissue image is segmented into superpixels, using an algorithm that starts with a rough initial division of pixels and iteratively updates the clustering until a convergence criterion is met; these segments are then classified into foreground and background, and subsequently into malignant and benign categories (19). Orbit uses object classification through a linear SVM to differentiate objects within images. The trained SVM model can then predict the class label of every segmented object in an image based on its feature representation (20). CYTOMINE is an open-source web platform enabling collaborative analysis of multi-gigapixel biomedical images through the integration of manual annotation tools and ML algorithms. It allows multiple remote users to access shared images and create semantic annotations by labeling regions of interest with ontology terms, providing ground truth data for training models (21).
The Digital Slide Archive (DSA) is a comprehensive platform designed to handle large imaging datasets, offering capabilities for storage, management, visualization, and annotation. The DSA is composed of an analysis toolkit enabling users to perform a variety of image analysis tasks (HistomicsTK), an interface that allows users to visualize slides and create annotations (HistomicsUI), a database layer for storing and managing large amounts of image data and associated metadata (MongoDB), and a web server providing an API for interacting with the platform and managing data (Girder) (22). CellProfiler, open-source software developed by the Broad Institute, employs supervised learning to classify objects based on their properties. CellProfiler first measures various features of each cell or object. These quantitative measurements, extracted by CellProfiler pipelines, serve as the input feature vectors for the ML classifier. Once trained on user-provided labeled examples, the classifier can automatically score and classify all objects in the dataset based on their measured features. CellProfiler is compatible with various data analysis tools and is supported by a robust user community and comprehensive documentation, including tutorials and forums (23). Ilastik uses random forests for pixel classification and segmentation. ICY is an open community platform serving both applied mathematicians developing new algorithms and biologists seeking a powerful and intuitive tool for image analysis (24). Lastly, PathML applies DL models to automate and enhance histopathology image analysis. PathML integrates with DL frameworks such as PyTorch and TensorFlow, which support model development, training, and inference with CNNs, enabling users to train and evaluate DL models on standardized pathology datasets (25). Together, these tools are advancing bioimage informatics through ML.
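The pixel classification workflow described above can be illustrated with a small, self-contained sketch: each pixel becomes a feature vector (raw intensity, a local mean, and gradient magnitude), a classifier is fit on sparse user annotations, and every remaining pixel is labeled. For brevity, this sketch swaps in a nearest-centroid classifier in place of the RF or SVM used by tools such as Trainable Weka Segmentation; the feature set and function names are illustrative assumptions, not that plugin’s API.

```python
import numpy as np


def box_mean(img, radius=2):
    """Local mean over a (2r+1) x (2r+1) window, with edge padding."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)


def pixel_features(img):
    """Stack per-pixel features into an H x W x F array, z-scored per feature."""
    img = img.astype(float)
    gy, gx = np.gradient(img)                     # edge information
    feats = np.stack([img, box_mean(img), np.hypot(gx, gy)], axis=-1)
    mu = feats.mean(axis=(0, 1))
    sd = feats.std(axis=(0, 1))
    sd[sd == 0] = 1.0
    return (feats - mu) / sd


def train_nearest_centroid(feats, labels):
    """labels: H x W ints, -1 = unannotated, 0..C-1 = user-annotated class."""
    classes = np.unique(labels[labels >= 0])
    centroids = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(feats, classes, centroids):
    """Assign every pixel the class of its closest centroid in feature space."""
    d2 = ((feats[..., None, :] - centroids) ** 2).sum(axis=-1)
    return classes[np.argmin(d2, axis=-1)]


# Synthetic image: dark background with one bright "nucleus" disk.
yy, xx = np.mgrid[:40, :40]
img = np.where((yy - 20) ** 2 + (xx - 20) ** 2 <= 64, 200, 20)

labels = np.full(img.shape, -1)
labels[2:5, 2:5] = 0          # a few pixels annotated as background
labels[19:22, 19:22] = 1      # a few pixels annotated as nucleus

feats = pixel_features(img)
classes, centroids = train_nearest_centroid(feats, labels)
mask = predict(feats, classes, centroids)
```

Only 18 of 1,600 pixels are annotated, yet every pixel receives a label; richer features (texture filters at several scales) and a stronger classifier are what real tools add on top of this pattern.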
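The first step of the CellProfiler-style workflow, measuring each segmented object so that it can be passed to a classifier as a feature vector, might look like the following minimal sketch. It assumes a label image in which 0 is background and each object has a unique positive integer ID; the chosen features (area, mean intensity, boundary-pixel perimeter, centroid) and the function name are illustrative assumptions and do not reproduce CellProfiler’s own measurement modules.

```python
import numpy as np


def measure_objects(label_image, intensity_image):
    """Return {object_id: [area, mean_intensity, perimeter, centroid_row, centroid_col]}."""
    features = {}
    for obj_id in np.unique(label_image):
        if obj_id == 0:                       # skip background
            continue
        mask = label_image == obj_id
        rows, cols = np.nonzero(mask)
        # A pixel is interior if all four of its 4-connected neighbours are in the object.
        p = np.pad(mask, 1)
        interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
        perimeter = np.count_nonzero(mask & ~interior)   # boundary pixel count
        features[int(obj_id)] = np.array([
            mask.sum(),
            intensity_image[mask].mean(),
            perimeter,
            rows.mean(),
            cols.mean(),
        ], dtype=float)
    return features


labels = np.zeros((20, 20), dtype=int)
labels[2:6, 2:6] = 1        # 4 x 4 object
labels[10:13, 10:13] = 2    # 3 x 3 object
intensity = np.where(labels == 1, 100.0, np.where(labels == 2, 50.0, 0.0))
table = measure_objects(labels, intensity)
```

Stacking these per-object rows yields the feature matrix that a supervised classifier, trained on a few user-labeled objects, would then score.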
However, although many tools are based on various ML models, urologic cancer research has almost exclusively used open-source tools based on CNNs, and other ML models have not yet been extensively applied (Figure 1). Wen et al. evaluated the performance of SVM, RF, and CNN for nucleus segmentation in breast and pancreatic cancer. The area under the curve (AUC) values for breast cancer were 0.54 for SVM; 0.67 for RF; and 0.82, 0.84, and 0.86 for three runs of CNN. Similar results were seen for pancreatic cancer, with AUCs of 0.47 for SVM; 0.42 for RF; and 0.69, 0.79, and 0.80 for CNN. Three CNN runs were performed because randomness in the data augmentation process produces different classification results across runs. Although the CNN outperformed the two other algorithms in accuracy, its processing time was significantly longer than that of SVM and RF, with RF being the fastest method (26). Another study on cervical intraepithelial lesions and malignancy used a CNN for feature extraction and then compared k-nearest neighbors (KNN), RF, and SVM classification models on the extracted features. The stacked models achieved higher accuracy than CNN classification alone: 70.83% for the CNN model versus 85.83%, 80.83%, and 86.67% for the CNN-KNN, CNN-RF, and CNN-SVM models, respectively. This suggests that stacking different ML methods may achieve better performance than a single classifier (27).
QuPath’s pixel classifier has been used on immunohistochemistry (IHC) stained images of prostate cancer to quantify the TME and predict response to immune checkpoint inhibitors (10, 18). In muscle-invasive bladder cancer, the tumor-stroma ratio (TSR) quantified by ML was linked to prognosis (28). The cell classifier for TSR calculation was based on a previously developed ML algorithm (29) and QuPath. Tools other than QuPath have also been used, although less often in urology. CellProfiler was used for feature extraction from H&E images for the diagnosis of clear cell renal cell carcinoma (30). On the more practical side, QuPath was used to train ML algorithms for automated Ki-67 index quantification in prostate cancer tissue microarray (TMA) samples to assess the risk of PSA recurrence (31). This could streamline Ki-67 assessment, an important prognostic indicator along with Gleason grade in prostate cancer.
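Because the comparison above is reported as AUC values, it may help to recall what the metric computes: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. A minimal NumPy sketch of this rank-based (Mann-Whitney) formulation follows; it is a generic illustration, not the evaluation code of the cited studies.

```python
import numpy as np


def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare every positive score with every negative score.
    wins = np.count_nonzero(pos[:, None] > neg[None, :])
    ties = np.count_nonzero(pos[:, None] == neg[None, :])
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, so values at or below 0.5, such as those reported for SVM and RF in pancreatic cancer, indicate no useful discrimination.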
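Once a classifier has labeled pixels or cells, both readouts mentioned above reduce to simple ratios. The sketch below assumes a hypothetical class map with integer codes for tumor and stroma, and per-nucleus boolean flags for tumor identity and Ki-67 positivity. TSR definitions vary between studies; here it is expressed as the stroma percentage of the combined tumor and stroma area, which is an assumption rather than the definition used in the cited work.

```python
import numpy as np

TUMOR, STROMA = 1, 2   # hypothetical class codes in the classifier output


def stroma_percentage(class_map):
    """Stroma area as a percentage of tumor + stroma area (other classes ignored)."""
    tumor = np.count_nonzero(class_map == TUMOR)
    stroma = np.count_nonzero(class_map == STROMA)
    total = tumor + stroma
    return 100.0 * stroma / total if total else float("nan")


def ki67_index(is_tumor, is_positive):
    """Percentage of Ki-67-positive nuclei among tumor nuclei."""
    is_tumor = np.asarray(is_tumor, dtype=bool)
    is_positive = np.asarray(is_positive, dtype=bool)
    n_tumor = np.count_nonzero(is_tumor)
    if n_tumor == 0:
        return float("nan")
    return 100.0 * np.count_nonzero(is_tumor & is_positive) / n_tumor


class_map = np.zeros((10, 10), dtype=int)   # 0 = background / other
class_map[:3] = TUMOR                        # 30 tumor pixels
class_map[3:4] = STROMA                      # 10 stroma pixels
print(stroma_percentage(class_map))          # 25.0

is_tumor = np.array([True] * 200 + [False] * 50)
is_positive = np.zeros(250, dtype=bool)
is_positive[:30] = True                      # 30 positive tumor nuclei
is_positive[200:210] = True                  # positive non-tumor nuclei are excluded
print(ki67_index(is_tumor, is_positive))     # 15.0
```

The ML component lies entirely in producing accurate class maps and nucleus labels; the final indices are deliberately transparent arithmetic that a pathologist can audit.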
Several AI tools validated in large clinical trial cohorts are already clinically available. Paige Prostate (https://paige.ai/diagnostic-ai/) is so far the only FDA-approved AI tool in urologic pathology for whole slide image analysis (32). It uses scanned slide images to detect possible areas of prostate cancer based on morphology and p63 staining pattern. Artera AI (https://artera.ai) is another AI tool, which predicts the risk of progression and treatment response based on histomorphology and the patient’s clinical data (33). A few clinical trials are underway to evaluate and implement newly developed tools. Ramon et al. developed a deep learning algorithm for predicting FGFR alterations from H&E WSIs of bladder cancer and pan-tumor datasets including prostate cancer (34). This could reduce the tumor screening burden in determining eligibility for treatment with erdafitinib, thereby improving access to targeted therapy. Another clinical trial aims to evaluate an AI algorithm for predicting response to adjuvant BCG (Bacillus Calmette-Guerin) treatment in non-muscle invasive bladder cancer (35). However, these AI tools are not recommended for autonomous diagnosis; they are intended as assistive tools for pathologists.
3 Machine learning-guided spatial profiling
Several well-established commercial platforms are available for spatial transcriptomics (ST) analysis. Notable platforms based on image-based ST technologies include Xenium of 10X Genomics, which improves on CARTANA by combining in-situ sequencing (ISS) with in-situ hybridization (ISH). MERSCOPE (Vizgen) and NanoString’s CosMx Spatial Molecular Imager are both based on ISH. Sequencing-based technologies are used in several array-based ST platforms, such as Visium of 10X Genomics, BMKMANU S1000 of Biomarker Technologies, and Slide-seq. Stereo-seq achieves an even finer (subcellular) resolution and is available as STOmics by BGI. GeoMx Digital Spatial Profiler implements microdissection technology instead of a microarray (36).
While spatial proteomics still lags behind ST, a few tools and technologies are available. CO-Detection by indEXing (CODEX; PhenoCycler™) allows highly multiplexed protein detection through sequential rounds of antibody staining and imaging, visualizing up to 60 proteins in a single section (37). The Hyperion Imaging System combines mass cytometry with imaging for high-dimensional protein analysis (38). Multiplexed Ion Beam Imaging (MIBI) employs secondary ion mass spectrometry with metal-tagged antibodies for multiplexed detection with spatial resolution at the cellular level (39). CyteFinder combines high-content imaging with multi-parameter protein detection, allowing spatial localization and quantification of proteins in tissue sections. The Imaging Mass Spectrometry (IMS) platforms from Bruker use Matrix-Assisted Laser Desorption/Ionization (MALDI) mass spectrometry (MS) imaging to map protein distribution within tissue sections at high spatial resolution and represent a cutting-edge approach in spatial proteomics (40). This platform uses MALDI for sample ionization and Fourier Transform Ion Cyclotron Resonance (FTICR) for mass analysis, enabling the detection and spatial localization of a broad range of biomolecules with high mass accuracy and resolution. The GeoMx DSP is also used to quantify protein expression in spatially resolved sections (41). These platforms provide invaluable data for understanding the spatial and functional roles of RNAs and proteins in tissues and offer insights into cellular processes and disease mechanisms.
ML methods are increasingly being incorporated into spatially resolved transcriptomics and proteomics analysis. Deep learning models, including CNNs, are often used for automatic feature selection and pattern recognition in imaging-based spatial omics and are implemented in some of the open-source digital pathology tools discussed above. Several image analysis tools have been developed specifically for spatial omics analysis; Squidpy and Giotto use various ML-based methods for image analysis and visualization (42, 43). Spatial profiling enables measurement of several unique features that cannot be captured by conventional single-cell analysis of dissociated cells, such as cell density, cell-to-cell interaction, cell proximity, and aggregation. Various ML methods and closely related statistical models are used to contextualize these measurements and carry out the tasks at hand (44, 45). Common tasks in spatially resolved transcriptomics analysis include spatial clustering, spatially variable gene (SVG) detection, cell type deconvolution, and identification of cellular interactions.
SpaCell and stLearn extract histological features from slide images via ResNet50, a convolutional neural network pre-trained on ImageNet, and then integrate this information with gene expression data for clustering (46, 47). SpaGCN is based on a graph convolutional network that combines gene expression and spatial location with RGB values extracted from the histology image, and then separates spots into different spatial domains by unsupervised iterative clustering. SVGs can then be identified for each spatial domain (48). Supervised learning approaches can also aid in spatial cell type deconvolution, where traditional single-cell RNA-seq (scRNA-seq) deconvolution methods may not work as well. Robust Cell Type Decomposition (RCTD) is a supervised learning approach that uses maximum likelihood estimation with annotated scRNA-seq data to infer cell type proportions for each pixel (49). Cell2location uses a hierarchical Bayesian model to estimate the absolute abundance of cell types at each location based on predefined cell type signatures (50). Spatial-ID employs a deep neural network (DNN) model pretrained on scRNA-seq datasets to produce cell type probability distributions (51). To infer intercellular signaling, MISTy trains an RF model for each target feature to derive feature importance scores. NicheNet infers the underlying intercellular interactions and networks through a graph affinity algorithm based on predefined ligand-receptor pairs (52).
Spatial proteomics has also benefited from the integration of various ML methods into the data analysis pipeline (53). Like other MS-based proteomics data, MS-based spatial proteomic data often suffer from missing values. K-nearest neighbors (KNN) imputation is often used to fill in missing values before downstream analysis. Various ML classifiers may be used for protein localization prediction, including SVM, RF, KNN, neural networks, and others, as reviewed previously (54). pRoloc is an R package that provides SVM-based classification for protein localization (55). MetaMass performs K-means clustering and assigns each cluster a location based on its marker content, and is available for Excel and R (56). TRANSPIRE uses a probabilistic Gaussian process classifier trained on organelle protein markers to predict protein translocation (57) (Table 2).
While basic research focused on biomarker discovery and molecular mechanisms is relatively abundant, more translational research is needed to bridge the gap between basic and clinical studies by testing new techniques and methods in animal and patient-derived xenograft (PDX) models. Zimmerman et al. designed multiplex ISH probes targeting the protein-coding genes of the mouse and human transcriptomes to integrate the transcriptome with histological features in diabetic kidney disease (DKD) (58). The development of such whole transcriptome panels across multiple organisms enables further discoveries through translational and clinical studies. Wang et al. modeled a spatially resolved metabolic network of the prostate cancer TME to predict selective metabolic targets for cancer cells (59). Although the clinical application of spatial omics technology is still at an early stage, image analysis combined with spatial omics technology offers great potential for improving clinical practice. Well-trained deep learning algorithms could extract meaningful spatial features and molecular patterns from spatial datasets, identifying new biomarkers and aiding clinical decision-making.
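Cell proximity, one of the spatial features listed above, can be quantified and tested against a null model without any deep learning. The minimal sketch below is an illustrative assumption, not the method of Squidpy, Giotto, or the cited studies: it computes the fraction of cells of one type that have at least one cell of another type within a given radius, then estimates a one-sided p-value by permuting cell type labels while keeping cell positions fixed.

```python
import numpy as np


def neighbor_fraction(close, types, src, dst):
    """Fraction of `src` cells with at least one `dst` cell in their neighbourhood."""
    is_src = types == src
    has_dst = (close & (types == dst)[None, :]).any(axis=1)
    return has_dst[is_src].mean()


def proximity_test(coords, types, src, dst, radius, n_perm=199, seed=0):
    """Observed neighbour fraction and a label-permutation p-value (one-sided)."""
    coords = np.asarray(coords, dtype=float)
    types = np.asarray(types)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a cell is not its own neighbour
    close = dist <= radius                    # fixed spatial graph
    observed = neighbor_fraction(close, types, src, dst)
    rng = np.random.default_rng(seed)
    null = np.array([neighbor_fraction(close, rng.permutation(types), src, dst)
                     for _ in range(n_perm)])
    p_value = (1 + np.count_nonzero(null >= observed)) / (n_perm + 1)
    return observed, p_value


# Tumor cells (T) sit next to immune cells (I); stromal cells (S) are far away.
coords = ([(i, 0.0) for i in range(20)] + [(i, 0.5) for i in range(20)]
          + [(i, 100.0) for i in range(20)])
types = ["T"] * 20 + ["I"] * 20 + ["S"] * 20
obs, p = proximity_test(coords, types, "T", "I", radius=1.0)
```

Permuting labels on a fixed spatial graph separates genuine colocalization from effects of cell density alone; the all-pairs distance matrix is adequate for small examples, whereas whole-slide data would call for a spatial index.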
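Reference-based deconvolution, the task addressed by RCTD and cell2location, can be sketched as a constrained regression: a spot’s expression vector is modeled as a non-negative mixture of cell type signatures derived from annotated scRNA-seq data. The sketch below substitutes plain non-negative least squares, solved by projected gradient descent, for the Poisson likelihood of RCTD and the Bayesian model of cell2location, so it should be read as a simplified illustration under that assumption; the signature matrix and function name are hypothetical.

```python
import numpy as np


def deconvolve_spot(spot_counts, signatures, n_iter=5000):
    """Estimate cell type proportions for one spot.

    signatures: genes x cell_types matrix of mean expression per cell type
                (assumed to come from annotated scRNA-seq).
    spot_counts: length-genes expression vector for one spot.
    Solves min_w 0.5 * ||S w - x||^2 subject to w >= 0, then normalizes w.
    """
    S = np.asarray(signatures, dtype=float)
    x = np.asarray(spot_counts, dtype=float)
    w = np.zeros(S.shape[1])
    step = 1.0 / np.linalg.norm(S, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = S.T @ (S @ w - x)
        w = np.maximum(w - step * grad, 0.0)   # gradient step, then project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


# Two hypothetical cell types over four genes; the spot mixes 3 parts A with 1 part B.
signatures = np.array([[10, 0], [0, 8], [5, 1], [1, 6]])
spot = signatures @ np.array([3.0, 1.0])
print(deconvolve_spot(spot, signatures))       # close to [0.75, 0.25]
```

The non-negativity constraint is what makes the weights interpretable as cell abundances; count-based likelihoods, as in the published methods, additionally model sequencing noise and platform effects.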
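KNN imputation, mentioned above as a common pre-processing step, fills each missing entry with the average of that measurement in the most similar rows. The sketch below is a minimal NumPy illustration under stated assumptions: rows are proteins (or pixels), columns are measurements, similarity is the mean squared difference over jointly observed columns, and `k` is a user choice. It is not the implementation of any specific proteomics package.

```python
import numpy as np


def knn_impute(X, k=3):
    """Replace NaNs with the mean of the k nearest rows that observed that column."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    observed = ~np.isnan(X)
    for i in np.flatnonzero(~observed.all(axis=1)):
        # Distance to every other row over the columns observed in both rows.
        dists = np.full(X.shape[0], np.inf)
        for j in range(X.shape[0]):
            shared = observed[i] & observed[j]
            if j != i and shared.any():
                dists[j] = np.mean((X[i, shared] - X[j, shared]) ** 2)
        order = np.argsort(dists)
        for f in np.flatnonzero(~observed[i]):
            donors = [j for j in order if observed[j, f] and np.isfinite(dists[j])][:k]
            if donors:                     # leave NaN if no usable neighbour exists
                out[i, f] = X[donors, f].mean()
    return out


X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 3.1],
              [0.9, 1.9, np.nan],
              [10.0, 20.0, 30.0]])
print(knn_impute(X, k=2)[2, 2])   # mean of 3.0 and 3.1
```

Because donors are chosen by profile similarity, the distant fourth row does not distort the imputed value, which a simple column mean would.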
4 Challenges and future directions
4.1 Common challenges in adopting ML into clinical practice
The integration of AI and ML in medicine faces significant real-world challenges, particularly in quality assurance and regulatory compliance. Like other medical devices, AI/ML tools are regulated by the FDA, and developing these regulations is a meticulous process. The College of American Pathologists (CAP) Advocacy Committee is actively working with government agencies to ensure fair and effective regulation of AI in pathology.
Real-world performance and quality assurance also present notable technical hurdles. Like traditional diagnostic tests, AI tools must undergo rigorous validation and regulatory scrutiny, as incorrect results can have serious implications for patient care. Despite the promising potential of ML, there are currently few published examples in urologic cancers of ML tools tested and validated in a clinical trial setting. Key challenges from a research perspective include limited robustness, reproducibility, comparability, and interpretability. Current ML research relies heavily on retrospective data analysis, notably of the TCGA datasets. This raises concerns about dataset biases and the generalizability of developed models. Prospective multicenter validation with adequate sample sizes is crucial to assess the robustness of ML algorithms before clinical implementation.
Furthermore, the relative lack of standardized data formats and consistent outcome measures across studies makes it challenging to reproduce and objectively compare existing ML models. Making datasets publicly available, standardizing data collection protocols, and sharing developed algorithms as open-source resources could improve reproducibility and facilitate benchmarking.
Integrating ML into clinical workflows presents additional financial and operational challenges. Many models suffer from the “black box” effect, in which the decision-making process is opaque and difficult for clinicians to interpret. Improving transparency and providing clinically relevant outputs, such as diagnostic reports, could enhance clinicians’ trust in and adoption of ML tools. The cost of adopting ML-based workflows should also be taken into account. Storing and processing large volumes of high-dimensional whole slide image data imposes significant computational demands, necessitating investments in hardware and computing resources. Continuous monitoring and updating of ML models incur additional recurring costs. Dedicated resources should also be set aside for integrating ML models with existing laboratory information systems. The final barriers to clinical implementation may be regulatory and ethical requirements and the lack of clear guidelines for approving ML-based decision support tools. Although clear guidelines have yet to be established, addressing ethical concerns surrounding data privacy and liability is essential for acceptance among clinicians and patients. Overcoming these multifaceted challenges requires substantial financial investment and interdisciplinary collaboration.
4.2 Limitations for clinical application of spatial and single cell technologies
Current studies are also exploring the application of single-cell genomics to diagnostics. One study used single-cell genomic analysis to assess prostate cancer risk from prostate biopsy samples. The researchers employed single nucleus sequencing (SNS) to aid diagnosis. The sequencing examined copy number variations in individual cells, and methods were developed to identify clonal cell populations and reconstruct phylogenetic relationships among them. The genomic data were integrated with histopathology and anatomical information using a custom visualization tool named the Single Cell Genome Viewer (SCGV). The authors concluded that SNS has the potential to enhance prostate cancer diagnosis and risk assessment from biopsies by providing more detailed genomic information than standard histopathology alone (60).
Technical limitations to the clinical application of ST include complexity and repeatability. The complexity of the data and the need for sophisticated computational tools can be a barrier in clinical settings. Many spatial and single-cell omics methods require extensive sample preparation and processing, which can be time-consuming and difficult to scale to high-throughput settings. This limits their practicality for routine clinical use, where quick turnaround time is often critical. The equipment for spatial and single-cell analyses is often expensive, making it difficult to implement these technologies widely in clinical settings, particularly in resource-limited environments. Integrating and interpreting data from spatial and single-cell analyses alongside traditional clinical data (e.g., histopathological assessments, clinical imaging) may also prove challenging. Additionally, a primary challenge is that the profile obtained may not be representative of the entire tumor: because of spatial heterogeneity within tumors, sampling a small fraction of the tumor might not capture all relevant biological phenomena, leading to incomplete or skewed data. As for spatial proteomics, measuring the whole proteome is currently challenging due to technological limitations. Advances are needed in the development of more specific antibodies and of techniques beyond mass spectrometry-based methods to enhance the range and accuracy of protein detection. Although spatial technology currently shows more potential as a research tool than as a clinical diagnostic tool because of these challenges, addressing these limitations could significantly enhance its clinical applicability.
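Grouping cells into clones from their copy number profiles, as described for the SNS study, can be illustrated with a deliberately simple greedy rule. This hypothetical sketch is not the published method: it assumes integer copy number calls per genomic bin, lets a cell join the first clone whose founding profile differs in at most `max_diff` bins, and returns a bin-difference matrix between clone founders that a tree-building step could then use to reconstruct phylogenetic relationships.

```python
import numpy as np


def group_clones(cn_profiles, max_diff=1):
    """Greedy clone assignment from integer copy number profiles (cells x bins)."""
    profiles = np.asarray(cn_profiles)
    founders, assignment = [], []
    for profile in profiles:
        for clone_id, founder in enumerate(founders):
            if np.count_nonzero(profile != founder) <= max_diff:
                assignment.append(clone_id)
                break
        else:                                  # no similar clone found: start a new one
            founders.append(profile)
            assignment.append(len(founders) - 1)
    founders = np.array(founders)
    # Number of bins in which each pair of clone founders differs.
    distances = (founders[:, None, :] != founders[None, :, :]).sum(axis=-1)
    return np.array(assignment), distances


diploid = [2] * 10
gain = [2, 2, 2, 3, 3, 3, 2, 2, 2, 2]          # copy number gain in bins 3-5
noisy = [2, 2, 2, 2, 2, 2, 2, 2, 1, 2]          # diploid cell with one noisy bin
cells = [diploid, noisy, gain, gain, diploid, gain]
clones, founder_dist = group_clones(cells)
print(clones)        # [0 0 1 1 0 1]
```

The tolerance `max_diff` absorbs isolated calling errors such as the single noisy bin above; published pipelines typically rely on hierarchical clustering and formal tree inference rather than this greedy rule.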
4.3 Underrepresented urological malignancies
There is a bias in the types of urological carcinomas being investigated, with relatively more studies focusing on prostate cancer than on kidney and bladder cancers. Additionally, several other urological malignancies, such as testicular cancer, upper tract urothelial carcinoma, and penile cancer, remain understudied due to their lower incidence rates. This underrepresentation in research efforts underscores the need to address the disparity.
Author contributions
HK: Visualization, Writing – original draft, Writing – review & editing. JK: Visualization, Writing – original draft, Writing – review & editing. SY: Writing – original draft, Writing – review & editing. SY: Conceptualization, Supervision, Writing – original draft, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was supported by Department of Defense PC190482 and PC180192; Cedars-Sinai Cancer, 2022 Cancer Biology Program Discovery Fund Award.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Abbreviations
ML, Machine Learning; TCGA, The Cancer Genome Atlas; FDA, Food and Drug Administration; AI, Artificial Intelligence; WSI, Whole Slide Image; H&E, Hematoxylin and Eosin; CNN, Convolutional Neural Network; SVM, Support Vector Machine; TSR, Tumor-Stroma Ratio; IHC, Immunohistochemistry; BCG, Bacillus Calmette-Guerin; ST, Spatial Transcriptomics; ISS, In-Situ Sequencing; ISH, In-Situ Hybridization; CODEX, CO-Detection by indEXing; MIBI, Multiplexed Ion Beam Imaging; MALDI FTICR, Matrix-Assisted Laser Desorption/Ionization Fourier Transform Ion Cyclotron Resonance; IMS, Imaging Mass Spectrometry; SVG, Spatially Variable Gene; RCTD, Robust Cell Type Decomposition; DNN, Deep Neural Network; MS, Mass Spectrometry; KNN, K-Nearest Neighbors; PDX, Patient-Derived Xenograft; DKD, Diabetic Kidney Disease; SNS, Single Nucleus Sequencing; SCGV, Single Cell Genome Viewer.
References
1. Bera K, Schalper KA, Rimm DL, Velcheti V, Madabhushi A. Artificial intelligence in digital pathology - new tools for diagnosis and precision oncology. Nat Rev Clin Oncol. (2019) 16:703–15. doi: 10.1038/s41571-019-0252-y
2. Bankhead P, Loughrey MB, Fernandez JA, Dombrowski Y, McArt DG, Dunne PD, et al. QuPath: Open source software for digital pathology image analysis. Sci Rep. (2017) 7:16878. doi: 10.1038/s41598-017-17204-5
3. Stahl PL, Salmen F, Vickovic S, Lundmark A, Navarro JF, Magnusson J, et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science. (2016) 353:78–82. doi: 10.1126/science.aaf2403
4. Laubscher E, Wang X, Razin N, Dougherty T, Xu RJ, Ombelets L, et al. Accurate single-molecule spot detection for image-based spatial transcriptomics with weakly supervised deep learning. Cell Syst. (2024) 15:475–482.e6. doi: 10.1016/j.cels.2024.04.006
5. Wang N, Li X, Wang R, Ding Z. Spatial transcriptomics and proteomics technologies for deconvoluting the tumor microenvironment. Biotechnol J. (2021) 16:e2100041. doi: 10.1002/biot.202100041
6. Huang W, Randhawa R, Jain P, Hubbard S, Eickhoff J, Kummar S, et al. A novel artificial intelligence-powered method for prediction of early recurrence of prostate cancer after prostatectomy and cancer drivers. JCO Clin Cancer Inform. (2022) 6:e2100131. doi: 10.1200/CCI.21.00131
7. Pizurica M, Larmuseau M, van der Eecken K, de Schaetzen van Brienen L, Carrillo-Perez F, Isphording S, et al. Whole slide imaging-based prediction of TP53 mutations identifies an aggressive disease phenotype in prostate cancer. Cancer Res. (2023) 83:2970–84. doi: 10.1158/0008-5472.CAN-22-3113
8. Jiang Y, Huang S, Zhu X, Cheng L, Liu W, Chen Q, et al. Artificial intelligence meets whole slide images: deep learning model shapes an immune-hot tumor and guides precision therapy in bladder cancer. J Oncol. (2022) 2022:8213321. doi: 10.1155/2022/8213321
9. Chen S, Jiang L, Gao F, Zhang E, Wang T, Zhang N, et al. Machine learning-based pathomics signature could act as a novel prognostic marker for patients with clear cell renal cell carcinoma. Br J Cancer. (2022) 126:771–7. doi: 10.1038/s41416-021-01640-2
10. Erlmeier F, Klumper N, Landgraf L, Strissel PL, Strick R, Sikic D, et al. Spatial immunephenotypes of distant metastases but not matched primary urothelial carcinomas predict response to immune checkpoint inhibition. Eur Urol. (2023) 83:133–42. doi: 10.1016/j.eururo.2022.10.020
11. Yin PN, Kc K, Wei S, Yu Q, Li R, Haake AR, et al. Histopathological distinction of non-invasive and invasive bladder cancers using machine learning approaches. BMC Med Inform Decis Mak. (2020) 20:162. doi: 10.1186/s12911-020-01185-z
12. Zhong Q, Ruschoff JH, Guo T, Gabrani M, Schuffler PJ, Rechsteiner M, et al. Image-based computational quantification and visualization of genetic alterations and tumour heterogeneity. Sci Rep. (2016) 6:24146. doi: 10.1038/srep24146
13. Fitzgerald S, Wang S, Dai D, Murphree DH Jr., Pandit A, Douglas A, et al. Orbit image analysis machine learning software can be used for the histological quantification of acute ischemic stroke blood clots. PloS One. (2019) 14:e0225841. doi: 10.1371/journal.pone.0225841
14. Marée R, Dallongeville S, Olivo-Marin J-C, Meas-Yedid V. (2016). An approach for detection of glomeruli in multisite digital pathology, in: 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI), Prague, Czech Republic. New York City, US: IEEE. 1033–6.
15. Nalisnik M, Amgad M, Lee S, Halani SH, Velazquez Vega JE, Brat DJ, et al. Interactive phenotyping of large-scale histology imaging data with HistomicsML. Sci Rep. (2017) 7:14588. doi: 10.1038/s41598-017-15092-3
16. Kandemir M, Feuchtinger A, Walch A, Hamprecht FA. (2014). Digital pathology: Multiple instance learning can detect Barrett's cancer, in: 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), IEEE, Beijing, China. New York City, US: IEEE. 1348–51.
17. Pakula H, Omar M, Carelli R, Pederzoli F, Fanelli GN, Pannellini T, et al. Distinct mesenchymal cell states mediate prostate cancer progression. Nat Commun. (2024) 15:363. doi: 10.1038/s41467-023-44210-1
18. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, et al. Fiji: an open-source platform for biological-image analysis. Nat Methods. (2012) 9:676–82. doi: 10.1038/nmeth.2019
19. Schuffler PJ, Fuchs TJ, Ong CS, Wild PJ, Rupp NJ, Buhmann JM. TMARKER: A free software toolkit for histopathological cell counting and staining estimation. J Pathol Inform. (2013) 4:S2. doi: 10.4103/2153-3539.109804
20. Stritt M, Stalder AK, Vezzali E. Orbit Image Analysis: An open-source whole slide image analysis tool. PloS Comput Biol. (2020) 16:e1007313. doi: 10.1371/journal.pcbi.1007313
21. Marée R, Rollus L, Stevens B, Hoyoux R, Louppe G, Vandaele R, et al. Collaborative analysis of multi-gigapixel imaging data using Cytomine. Bioinformatics. (2016) 32:1395–401. doi: 10.1093/bioinformatics/btw013
22. Gutman DA, Khalilia M, Lee S, Nalisnik M, Mullen Z, Beezley J, et al. The digital slide archive: A software platform for management, integration, and analysis of histology for cancer research. Cancer Res. (2017) 77:e75–8. doi: 10.1158/0008-5472.CAN-17-0629
23. Stirling DR, Swain-Bowden MJ, Lucas AM, Carpenter AE, Cimini BA, Goodman A. CellProfiler 4: improvements in speed, utility and usability. BMC Bioinf. (2021) 22:433. doi: 10.1186/s12859-021-04344-9
24. Berg S, Kutra D, Kroeger T, Straehle CN, Kausler BX, Haubold C, et al. ilastik: interactive machine learning for (bio)image analysis. Nat Methods. (2019) 16:1226–32. doi: 10.1038/s41592-019-0582-9
25. Rosenthal J, Carelli R, Omar M, Brundage D, Halbert E, Nyman J, et al. Building tools for machine learning and artificial intelligence in cancer research: best practices and a case study with the PathML toolkit for computational pathology. Mol Cancer Res. (2022) 20:202–6. doi: 10.1158/1541-7786.MCR-21-0665
26. Wen S, Kurc TM, Hou L, Saltz JH, Gupta RR, Batiste R, et al. Comparison of different classifiers with active learning to support quality control in nucleus segmentation in pathology images. AMIA Jt Summits Transl Sci Proc. (2018) 2017:227–36.
27. Zhang S, Chen C, Chen C, Chen F, Li M, Yang B. Research on application of classification model based on stack generalization in staging of cervical tissue pathological images. IEEE Access. (2021) 9:48980–91. doi: 10.1109/ACCESS.2021.3064040
28. Zheng Q, Jiang Z, Ni X, Yang S, Jiao P, Wu J, et al. Machine learning quantified tumor-stroma ratio is an independent prognosticator in muscle-invasive bladder cancer. Int J Mol Sci. (2023) 24:2746. doi: 10.3390/ijms24032746
29. Zheng Q, Yang R, Ni X, Yang S, Jiao P, Wu J, et al. Quantitative assessment of tumor-infiltrating lymphocytes using machine learning predicts survival in muscle-invasive bladder cancer. J Clin Med. (2022) 11:7081. doi: 10.3390/jcm11237081
30. Chen S, Zhang N, Jiang L, Gao F, Shao J, Wang T, et al. Clinical use of a machine learning histopathological image signature in diagnosis and survival prediction of clear cell renal cell carcinoma. Int J Cancer. (2021) 148:780–90. doi: 10.1002/ijc.v148.3
31. Blessin NC, Yang C, Mandelkow T, Raedler JB, Li W, Bady E, et al. Automated Ki-67 labeling index assessment in prostate cancer using artificial intelligence and multiplex fluorescence immunohistochemistry. J Pathol. (2023) 260:5–16. doi: 10.1002/path.v260.1
32. Eloy C, Marques A, Pinto J, Pinheiro J, Campelos S, Curado M, et al. Artificial intelligence-assisted cancer diagnosis improves the efficiency of pathologists in prostatic biopsies. Virchows Arch. (2023) 482:595–604. doi: 10.1007/s00428-023-03518-5
33. Ghamande SS, Cline JK, Sayyid RK, Klaassen Z. Advancing precision oncology with artificial intelligence: ushering in the arteraAI prostate test. Urology. (2024) 188:20–3. doi: 10.1016/j.urology.2024.04.009
34. Juan Ramon A, Parmar C, Carrasco-Zevallos OM, Csiszer C, Yip SSF, Raciti P, et al. Development and deployment of a histopathology-based deep learning algorithm for patient prescreening in a clinical trial. Nat Commun. (2024) 15:4690. doi: 10.1038/s41467-024-49153-9
35. Lotan Y, Krishna V, Abuzeid WM, Launer B, Chang SS, Krishna V, et al. Predicting response to intravesical BCG in high-risk NMIBC using an artificial intelligence-powered pathology assay: development and validation in an international 12-center cohort. J Urol. (2024). doi: 10.1097/JU.0000000000004278
36. Williams CG, Lee HJ, Asatsuma T, Vento-Tormo R, Haque A. An introduction to spatial transcriptomics for biomedical research. Genome Med. (2022) 14:68. doi: 10.1186/s13073-022-01075-1
37. Black S, Phillips D, Hickey JW, Kennedy-Darling J, Venkataraaman VG, Samusik N, et al. CODEX multiplexed tissue imaging with DNA-conjugated antibodies. Nat Protoc. (2021) 16:3802–35. doi: 10.1038/s41596-021-00556-8
38. Giesen C, Wang HA, Schapiro D, Zivanovic N, Jacobs A, Hattendorf B, et al. Highly multiplexed imaging of tumor tissues with subcellular resolution by mass cytometry. Nat Methods. (2014) 11:417–22. doi: 10.1038/nmeth.2869
39. Ptacek J, Locke D, Finck R, Cvijic ME, Li Z, Tarolli JG, et al. Multiplexed ion beam imaging (MIBI) for characterization of the tumor microenvironment across tumor types. Lab Invest. (2020) 100:1111–23. doi: 10.1038/s41374-020-0417-4
40. Spraggins JM, Rizzo DG, Moore JL, Noto MJ, Skaar EP, Caprioli RM. Next-generation technologies for spatial proteomics: Integrating ultra-high speed MALDI-TOF and high mass resolution MALDI FTICR imaging mass spectrometry for protein analysis. Proteomics. (2016) 16:1678–89. doi: 10.1002/pmic.201600003
41. Merritt CR, Ong GT, Church SE, Barker K, Danaher P, Geiss G, et al. Multiplex digital spatial profiling of proteins and RNA in fixed tissue. Nat Biotechnol. (2020) 38:586–99. doi: 10.1038/s41587-020-0472-9
42. Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, et al. Squidpy: a scalable framework for spatial omics analysis. Nat Methods. (2022) 19:171–8. doi: 10.1038/s41592-021-01358-2
43. Dries R, Zhu Q, Dong R, Eng CL, Li H, Liu K, et al. Giotto: a toolbox for integrative analysis and visualization of spatial expression data. Genome Biol. (2021) 22:78. doi: 10.1186/s13059-021-02286-2
44. Hu J, Schroeder A, Coleman K, Chen C, Auerbach BJ, Li M. Statistical and machine learning methods for spatially resolved transcriptomics with histology. Comput Struct Biotechnol J. (2021) 19:3829–41. doi: 10.1016/j.csbj.2021.06.052
45. Lee AJ, Cahill R, Abbasi-Asl R. Machine learning for uncovering biological insights in spatial transcriptomics data. ArXiv. (2023). doi: 10.48550/arXiv.2303.16725
46. Tan X, Su A, Tran M, Nguyen Q. SpaCell: integrating tissue morphology and spatial gene expression to predict disease cells. Bioinformatics. (2020) 36:2293–4. doi: 10.1093/bioinformatics/btz914
47. Pham D, Tan X, Balderson B, Xu J, Grice LF, Yoon S, et al. Robust mapping of spatiotemporal trajectories and cell-cell interactions in healthy and diseased tissues. Nat Commun. (2023) 14:7739. doi: 10.1038/s41467-023-43120-6
48. Hu J, Li X, Coleman K, Schroeder A, Ma N, Irwin DJ, et al. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nat Methods. (2021) 18:1342–51. doi: 10.1038/s41592-021-01255-8
49. Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, et al. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol. (2022) 40:517–26. doi: 10.1038/s41587-021-00830-w
50. Kleshchevnikov V, Shmatko A, Dann E, Aivazidis A, King HW, Li T, et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol. (2022) 40:661–71. doi: 10.1038/s41587-021-01139-4
51. Shen R, Liu L, Wu Z, Zhang Y, Yuan Z, Guo J, et al. Spatial-ID: a cell typing method for spatially resolved transcriptomics via transfer learning and spatial embedding. Nat Commun. (2022) 13:7640. doi: 10.1038/s41467-022-35288-0
52. Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods. (2020) 17:159–62. doi: 10.1038/s41592-019-0667-5
53. Mou M, Pan Z, Lu M, Sun H, Wang Y, Luo Y, et al. Application of machine learning in spatial proteomics. J Chem Inf Model. (2022) 62:5875–95. doi: 10.1021/acs.jcim.2c01161
54. Gatto L, Breckels LM, Burger T, Nightingale DJ, Groen AJ, Campbell C, et al. A foundation for reliable spatial proteomics data analysis. Mol Cell Proteomics. (2014) 13:1937–52. doi: 10.1074/mcp.M113.036350
55. Mulvey CM, Breckels LM, Geladaki A, Britovsek NK, Nightingale DJH, Christoforou A, et al. Using hyperLOPIT to perform high-resolution mapping of the spatial proteome. Nat Protoc. (2017) 12:1110–35. doi: 10.1038/nprot.2017.026
56. Lund-Johansen F, de la Rosa Carrillo D, Mehta A, Sikorski K, Inngjerdingen M, Kalina T, et al. MetaMass, a tool for meta-analysis of subcellular proteomics data. Nat Methods. (2016) 13:837–40. doi: 10.1038/nmeth.3967
57. Kennedy MA, Hofstadter WA, Cristea IM. TRANSPIRE: A computational pipeline to elucidate intracellular protein movements from spatial proteomics data sets. J Am Soc Mass Spectrom. (2020) 31:1422–39. doi: 10.1021/jasms.0c00033
58. Zimmerman SM, Fropf R, Kulasekara BR, Griswold M, Appelbe O, Bahrami A, et al. Spatially resolved whole transcriptome profiling in human and mouse tissue using Digital Spatial Profiling. Genome Res. (2022) 32:1892–905. doi: 10.1101/gr.276206.121
59. Wang Y, Ma S, Ruzzo WL. Spatial modeling of prostate cancer metabolic gene expression reveals extensive heterogeneity and selective vulnerabilities. Sci Rep. (2020) 10:3490. doi: 10.1038/s41598-020-60384-w
Keywords: machine learning, spatial omics, digital pathology, genitourinary, oncology
Citation: Kim H, Kim J, Yeon SY and You S (2024) Machine learning approaches for spatial omics data analysis in digital pathology: tools and applications in genitourinary oncology. Front. Oncol. 14:1465098. doi: 10.3389/fonc.2024.1465098
Received: 15 July 2024; Accepted: 08 November 2024;
Published: 29 November 2024.
Edited by:
Martin King, Harvard Medical School, United States
Reviewed by:
Murat Akand, University Hospitals Leuven, Belgium
Jianing Xi, Guangzhou Medical University, China
Copyright © 2024 Kim, Kim, Yeon and You. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Sungyong You, Sungyong.You@cshs.org
†These authors have contributed equally to this work