To develop an appropriate machine learning model for predicting anaplastic lymphoma kinase (ALK) rearrangement status in non-small cell lung cancer (NSCLC) patients using computed tomography (CT) images and clinical features.
This study included 193 patients with NSCLC (154 in the training cohort and 39 in the validation cohort), of whom 68 tested positive and 125 tested negative for ALK rearrangements. A total of 157 radiomic features were extracted from nonenhanced CT scans, and 8 clinical features were collected. Five machine learning (ML) models were compared to identify the best classifier for predicting ALK rearrangement status. A radiomic signature was developed using the least absolute shrinkage and selection operator (LASSO) algorithm. The predictive performance of models based on radiomic features, clinical features, and their combination was assessed with receiver operating characteristic (ROC) curves.
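As an illustration of the ROC-based assessment described above, the following minimal sketch computes the area under the ROC curve (AUC) from its rank interpretation and estimates a 95% confidence interval by percentile bootstrap. The bootstrap procedure, the function names `roc_auc` and `bootstrap_auc_ci`, and the toy labels and scores are assumptions made for illustration; they are not the study's data, and the source does not state how its confidence intervals were obtained.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are assumed to be 1 (ALK-positive) or 0 (ALK-negative).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (assumed method)."""
    roc_auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy example: 4 positive and 4 negative cases.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1, 0.7]
auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.3f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

With cohorts as small as a 39-patient validation set, a resampling-based interval of this kind can be wide, which is worth keeping in mind when comparing models by AUC.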
Among the five classifiers, the support vector machine (SVM) model achieved the highest AUC (0.914). The clinical features model had an AUC of 0.805 (95% CI 0.731–0.877) in the training cohort and 0.735 (95% CI 0.566–0.863) in the validation cohort. The CT image-based ML model had an AUC of 0.953 (95% CI 0.913–1.0) in the training cohort and 0.890 (95% CI 0.778–0.971) in the validation cohort. For predicting ALK rearrangement status, the ML model combining CT images and clinical features outperformed the models based on clinical information alone or CT images alone, with an AUC of 0.965 (95% CI 0.826–0.882) in the training cohort and 0.914 (95% CI 0.804–0.893) in the validation cohort.
Our findings indicate that ALK rearrangement status can be accurately predicted by an ML classification model that combines CT image features with clinical data.