AUTHOR=Lv Jieqin , Chen Xiaohui , Liu Xinran , Du Dongyang , Lv Wenbing , Lu Lijun , Wu Hubing TITLE=Imbalanced Data Correction Based PET/CT Radiomics Model for Predicting Lymph Node Metastasis in Clinical Stage T1 Lung Adenocarcinoma JOURNAL=Frontiers in Oncology VOLUME=Volume 12 - 2022 YEAR=2022 URL=https://www.frontiersin.org/journals/oncology/articles/10.3389/fonc.2022.788968 DOI=10.3389/fonc.2022.788968 ISSN=2234-943X ABSTRACT=Objectives To develop and validate a PET/CT radiomics model based on imbalanced data correction for predicting lymph node metastasis (LNM) in clinical stage T1 lung adenocarcinoma (LUAD). Methods A total of 183 patients (148 non-metastasis, 35 LNM) with pathologically confirmed LUAD were retrospectively included. The cohort was divided into training and validation cohorts at a ratio of 7:3. A total of 487 radiomics features were extracted from the PET and CT components separately for radiomics model construction. Four clinical features and seven PET/CT radiological features were extracted for traditional model construction. To balance the distribution of the majority (non-metastasis) class and the minority (LNM) class, imbalance-adjustment strategies using ten data re-sampling methods were adopted. Three multivariate models (denoted as Traditional, Radiomics and Combined) were constructed using multivariable logistic regression analysis, where the combined model incorporated all of the significant clinical, radiological and radiomics features. Monte Carlo cross-validation repeated 100 times was used to assess the order in which feature selection and imbalance-adjustment strategies are applied in the machine learning pipeline. The prediction performance of each model was evaluated using the area under the receiver operating characteristic curve (AUC) and the geometric mean score (G-mean). Results Two clinical parameters, two radiological features, three PET and five CT radiomics features were significantly associated with LNM. 
The combined model with the Edited Nearest Neighbours (ENN) re-sampling method showed stronger prediction performance than the traditional model or the radiomics model, with an AUC of 0.94 (95% CI = 0.86-0.97) vs. 0.89 (95% CI = 0.79-0.93) and 0.92 (95% CI = 0.85-0.97), and a G-mean of 0.88 vs. 0.82 and 0.80, respectively, in the training cohort, and an AUC of 0.75 (95% CI = 0.57-0.91) vs. 0.68 (95% CI = 0.36-0.83) and 0.71 (95% CI = 0.48-0.83), and a G-mean of 0.76 vs. 0.64 and 0.51, respectively, in the validation cohort. Performing feature selection before data re-sampling gave better results than the reverse order (AUC 0.76±0.06 vs. 0.70±0.07, p<0.001). Conclusions The combined model (consisting of age, histological type, C/T ratio, MATV and radiomics signature) integrated with the ENN re-sampling method had strong lymph node metastasis prediction performance for imbalanced cohorts in clinical stage T1 LUAD. Radiomics signatures extracted from PET/CT images could provide prediction information complementary to that of the traditional model.
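
The abstract reports G-mean alongside AUC but does not spell out its formula. The sketch below assumes the common definition, the square root of sensitivity times specificity, and uses hypothetical toy labels (not study data) with the same 148/35 class split to illustrate why such a metric is informative for imbalanced cohorts. The function name `g_mean` and the label coding (1 = LNM, 0 = non-metastasis) are assumptions made for illustration.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels.

    Assumed coding: 1 = LNM (minority class), 0 = non-metastasis (majority).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    # Guard against a class being absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy labels mirroring the 148/35 class split in the abstract.
y_true = [0] * 148 + [1] * 35

# A degenerate classifier that always predicts the majority class.
always_majority = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")  # high, despite missing every LNM case
print(f"G-mean   = {g_mean(y_true, always_majority):.2f}")  # 0.00
```

Because G-mean is zero whenever either class is missed entirely, it penalizes a model that ignores the minority class even when plain accuracy looks high.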