AUTHOR=Gehrmann Julia, Soenarto Devina Johanna, Hidayat Kevin, Beyer Maria, Quakulinski Lars, Alkarkoukly Samer, Berressem Scarlett, Gundert Anna, Butler Michael, Grönke Ana, Lennartz Simon, Persigehl Thorsten, Zander Thomas, Beyan Oya TITLE=Seeing the primary tumor because of all the trees: Cancer type prediction on low-dimensional data JOURNAL=Frontiers in Medicine VOLUME=11 YEAR=2024 URL=https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1396459 DOI=10.3389/fmed.2024.1396459 ISSN=2296-858X ABSTRACT=

Cancer of Unknown Primary (CUP) syndrome is characterized by identifiable metastases while the primary tumor remains hidden. In recent years, various data-driven approaches have been proposed to predict the location of the primary tumor (LOP) in CUP patients, promising improved diagnosis and outcomes. These LOP prediction approaches use high-dimensional input data such as images or genetic data. However, leveraging such data is challenging and resource-intensive, and is therefore a potential barrier to clinical translation. Instead of using high-dimensional data, we analyzed how well the LOP can be predicted from low-dimensional data collected in routine medical care. Our findings show that such low-dimensional routine clinical information suffices as input for tree-based LOP prediction models. The best model reached a mean accuracy of 94% and a mean Matthews correlation coefficient (MCC) of 0.92 in 10-fold nested cross-validation (NCV) when distinguishing four types of cancer. When considering eight types of cancer, this model achieved a mean accuracy of 85% and a mean MCC of 0.81. This is comparable to the performance achieved by approaches using high-dimensional input data. Additionally, the distribution pattern of metastases appears to be important information for predicting the LOP.
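
For readers unfamiliar with the MCC in a multi-class setting, the following is a minimal sketch of how the coefficient can be computed from true and predicted labels, using the standard multi-class generalization of the MCC. It is not taken from the study's code: the function name `multiclass_mcc` and the class labels "A" to "D" are hypothetical placeholders chosen only for illustration, and the toy labels are not study data.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class was predicted

    classes = set(t_k) | set(p_k)
    numerator = c * s - sum(t_k[k] * p_k[k] for k in classes)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # By convention, return 0.0 when the coefficient is undefined
    # (for example, when every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


# Toy example with four hypothetical classes and one misclassified sample.
y_true = ["A", "A", "B", "B", "C", "C", "D", "D"]
y_pred = ["A", "A", "B", "B", "C", "C", "D", "A"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
mcc = multiclass_mcc(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, MCC={mcc:.3f}")
```

Unlike accuracy, the MCC accounts for how predictions are distributed across all classes, which is one reason it is often reported alongside accuracy when class frequencies differ.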