Pancreatic cancer is one of the most lethal gastrointestinal malignancies, and early diagnosis is challenging because distinctive symptoms and specific biomarkers are lacking. The exact etiology of pancreatic cancer is unknown, which makes the development of reliable biomarkers difficult. The accumulation of patient-derived omics data, together with technological advances in artificial intelligence, is ushering in a new era in the discovery of suitable biomarkers.
We performed machine learning (ML)-based modeling using four independent transcriptomic datasets: GSE16515, GSE62165, GSE71729, and the pancreatic adenocarcinoma (PAC) dataset of The Cancer Genome Atlas. To find candidate circulating biomarkers, we extracted the expression profiles of 1,703 genes encoding secretory proteins. After assigning three transcriptomic datasets to either a training or a test set, we built an ML model distinguishing PAC from normal samples. A second ML model classifying patients with PAC as long-lived or short-lived was built to select prognosis-associated features. Finally, the circulating level of SCG5 in plasma was measured in an independent cohort (non-tumor = 25, pancreatic cancer = 25). We also investigated the effect of SCG5 on adipocyte biology using recombinant protein.
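As a rough illustration of the workflow described above (restricting expression profiles to a secretory-gene list, then training a tumor-versus-normal classifier on one set and applying it to another), the following is a minimal sketch only. The nearest-centroid classifier is a stand-in chosen for brevity, not the classifiers used in the study, and the gene names (`GENE_A`, `GENE_B`, `GENE_X`), the helper functions, and all expression values are hypothetical.

```python
from statistics import mean

def restrict_to_genes(profile, gene_set):
    """Keep only the genes of interest from one sample's expression profile."""
    return {g: v for g, v in profile.items() if g in gene_set}

def fit_centroids(samples, labels, genes):
    """Compute the mean expression of each gene within each class."""
    centroids = {}
    for label in set(labels):
        members = [s for s, l in zip(samples, labels) if l == label]
        centroids[label] = {g: mean(s[g] for s in members) for g in genes}
    return centroids

def predict(sample, centroids, genes):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((sample[g] - centroid[g]) ** 2 for g in genes)
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical stand-in for the 1,703-gene secretory list.
secretory_genes = {"GENE_A", "GENE_B"}
genes = sorted(secretory_genes)

# Toy, made-up training samples; not data from the study.
train = [
    ({"GENE_A": 8.0, "GENE_B": 2.0, "GENE_X": 5.0}, "normal"),
    ({"GENE_A": 7.5, "GENE_B": 2.5, "GENE_X": 1.0}, "normal"),
    ({"GENE_A": 3.0, "GENE_B": 6.0, "GENE_X": 5.5}, "tumor"),
    ({"GENE_A": 2.5, "GENE_B": 6.5, "GENE_X": 0.5}, "tumor"),
]
train_samples = [restrict_to_genes(p, secretory_genes) for p, _ in train]
train_labels = [label for _, label in train]
centroids = fit_centroids(train_samples, train_labels, genes)

# A held-out toy sample; GENE_X is dropped because it is not in the gene list.
test_sample = restrict_to_genes(
    {"GENE_A": 7.8, "GENE_B": 2.2, "GENE_X": 9.0}, secretory_genes
)
prediction = predict(test_sample, centroids, genes)
print(prediction)
```

In practice, a feature-selecting classifier from an established ML library would replace the nearest-centroid step, and the training and test sets would come from separate datasets rather than hand-written samples.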
Three distinct ML classifiers selected 29, 64, and 18 feature genes, respectively, recognizing the only common gene,
Circulating SCG5, which may be associated with adipopenia, is a promising diagnostic biomarker for PAC.