A Classifier for Improving Early Lung Cancer Diagnosis Incorporating Artificial Intelligence and Liquid Biopsy

Ye, Maosong; Tong, Lin; Zheng, Xiaoxuan; Wang, Hui; Zhou, Haining; Zhu, Xiaoli; Zhou, Chengzhi; Zhao, Peige; Wang, Yan; Wang, Qi; Bai, Li; Cai, Zhigang; Kong, Feng-Ming (Spring); Wang, Yuehong; Li, Yafei; Feng, Mingxiang; Ye, Xin; Yang, Dawei; Liu, Zilong; Zhang, Quncheng; Wang, Ziqi; Han, Shuhua; Sun, Lihong; Zhao, Ningning; Yu, Zubin; Zhang, Juncheng; Zhang, Xiaoju; Katz, Ruth L.; Sun, Jiayuan; Bai, Chunxue

doi:10.3389/fonc.2022.853801

ORIGINAL RESEARCH article

Front. Oncol., 02 March 2022

Sec. Thoracic Oncology

Volume 12 - 2022 | https://doi.org/10.3389/fonc.2022.853801

This article is part of the Research Topic: Epidemiology, Screening and Diagnosis of Lung Cancer

Maosong Ye¹†

Lin Tong¹,²†

Xiaoxuan Zheng³,⁴†

Hui Wang⁵,⁶†

Haining Zhou⁷†

Xiaoli Zhu⁸†

Chengzhi Zhou⁹†

Peige Zhao¹⁰†

Yan Wang¹¹†

Qi Wang¹²†

Li Bai¹³†

Zhigang Cai¹⁴†

Feng-Ming (Spring) Kong¹⁵†

Yuehong Wang¹⁶†

Yafei Li¹⁷†

Mingxiang Feng¹⁸†

Xin Ye¹⁹,²⁰†

Dawei Yang¹

Zilong Liu¹

Quncheng Zhang⁶

Ziqi Wang⁶

Shuhua Han⁸

Lihong Sun¹¹

Ningning Zhao¹¹

Zubin Yu²¹

Juncheng Zhang¹⁹,²⁰

Xiaoju Zhang⁶

Ruth L. Katz²²

Jiayuan Sun³,⁴*

Chunxue Bai¹*

¹Department of Pulmonary and Critical Care Medicine, Zhongshan Hospital, Fudan University, Shanghai, China
²Shanghai Respiratory Research Institute, Shanghai, China
³Department of Respiratory Endoscopy, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
⁴Department of Respiratory and Critical Care Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
⁵Xinxiang Medical University, Xinxiang, China
⁶Department of Respiratory and Critical Care Medicine, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, China
⁷Department of Thoracic Surgery, Respiratory Center of Suining Central Hospital, Suining, China
⁸Department of Pulmonary and Critical Care Medicine, Zhongda Hospital, Southeast University, Nanjing, China
⁹State Key Laboratory of Respiratory Disease, National Clinical Research Center of Respiratory Disease, Guangzhou Institute of Respiratory Health, First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
¹⁰Department of Respiratory and Critical Care Medicine, Affiliated Hospital of Qingdao University, Qingdao, China
¹¹Department of Respiratory and Critical Care Medicine, Liaocheng People’s Hospital, Liaocheng, China
¹²Department of Respiratory Medicine, The Second Affiliated Hospital of Dalian Medical University, Dalian, China
¹³Department of Respiratory Disease, Xinqiao Hospital, Army Medical University, Chongqing, China
¹⁴The First Department of Pulmonary and Critical Care Medicine, The Second Hospital of Hebei Medical University, Shijiazhuang, China
¹⁵Clinical Oncology Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
¹⁶Department of Respiratory Medicine, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
¹⁷Department of Epidemiology, College of Preventive Medicine, Army Medical University, Chongqing, China
¹⁸Division of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai, China
¹⁹Joint Research Center of Liquid Biopsy in Guangdong, Hong Kong, and Macao, Zhuhai, China
²⁰Zhuhai Sanmed Biotech Ltd., Zhuhai, China
²¹Department of Thoracic Surgery, Xinqiao Hospital, Army Medical University, Chongqing, China
²²Chaim Sheba Hospital, Tel Aviv University, Ramat Gan, Israel

Lung cancer is the leading cause of cancer-related deaths worldwide and in China. Screening for lung cancer by low dose computed tomography (LDCT) can reduce mortality but has resulted in a dramatic rise in the incidence of indeterminate pulmonary nodules, which presents a major diagnostic challenge for clinicians regarding their underlying pathology and can lead to overdiagnosis. To address the significant gap in evaluating pulmonary nodules, we conducted a prospective study to develop a prediction model for individuals at intermediate to high risk of developing lung cancer. Univariate and multivariate logistic analyses were applied to the training cohort (n = 560) to develop an early lung cancer prediction model. The results indicated that a model integrating clinical characteristics (age and smoking history), radiological characteristics of pulmonary nodules (nodule diameter, nodule count, upper lobe location, malignant sign at the nodule edge, subsolid status), artificial intelligence analysis of LDCT data, and liquid biopsy achieved the best diagnostic performance in the training cohort (sensitivity 89.53%, specificity 81.31%, area under the curve [AUC] = 0.880). In the independent validation cohort (n = 168), this model had an AUC of 0.895, which was greater than that of the Mayo Clinic Model (AUC = 0.772) and Veterans’ Affairs Model (AUC = 0.740). These results were significantly better for predicting the presence of cancer than radiological features and artificial intelligence risk scores alone. Applying this classifier prospectively may lead to improved early lung cancer diagnosis and early treatment for patients with malignant nodules while sparing patients with benign entities from unnecessary and potentially harmful surgery.

Clinical Trial Registration Number: ChiCTR1900026233, URL: http://www.chictr.org.cn/showproj.aspx?proj=43370.

Introduction

Approximately 22% of the newly diagnosed cancer cases worldwide and 27% of cancer-related deaths occur in China (1). In 2018, the 5-year survival rate for lung cancer in China was 19.7% (2). Based on the results of the National Lung Screening Trial (NLST) (3, 4), low-dose computed tomography (LDCT) is the recommended test for lung cancer screening, but the high false-positive rate has diminished the benefits of the test; indeed, in a previous study, only 3.6% of the participants who had pulmonary nodules were confirmed to have lung cancer (3). Therefore, clinicians use diagnostic decision tools to stratify the malignancy risk of patients with positive LDCT results (5). The Mayo Clinic Model has been extensively validated worldwide and includes factors such as age, smoking history, extra-thoracic cancer history, spiculation, nodule diameter, and upper lobe location (6). However, because of the variation in ethnicity and environment, some risk factors might have different impacts on the Chinese population. For example, the diagnostic significance of the malignant risk factor “upper lobe location” is weakened owing to the high prevalence of tuberculosis (7).

New technologies have resulted in the emergence of several tools for early cancer diagnosis. Artificial intelligence (AI) approaches combined with deep learning technology have been adopted for image analysis in clinical settings. The use of AI can help clinicians reduce the risk of human errors caused by classifying a large number of medical images (8), which may lead to improved diagnostic efficacy of LDCT for lung cancer (9). Several studies have demonstrated that the application of deep learning technology may improve the performance of lung cancer diagnosis by the precise recognition of specific malignant features from LDCT images (10, 11). In general, AI can analyze the whole pulmonary nodule, looking for features characteristic of invasion, as opposed to histopathological evaluation of a small biopsy taken from an intermediate- or high-risk pulmonary nodule, which may not be representative (8, 11, 12). In addition, liquid biopsy can be used to test for early lung cancer, using novel, sensitive, and specific biomarkers to examine cancer-related proteins or abnormal DNA (13, 14). Liquid biopsy for early lung cancer detection has been extensively investigated with various biomarkers and platforms. Indeed, previous studies (15–17) demonstrated that a fluorescent in situ hybridization (FISH) liquid biopsy approach to detect cells with cytogenetic abnormalities may be used to rule out lung cancer in individuals with intermediate pulmonary nodules (18, 19).

Guidelines for the early diagnosis of lung cancer in China recommend that prediction models be established using data retrieved from Chinese populations (20) and a broad range of preliminary information and evidence (21, 22). We hypothesized that the integration of clinical and radiological characteristics, together with AI interpretation of LDCT images and liquid biopsy testing for cells with cytogenetic abnormalities via a 4-color FISH assay, might improve the ability to diagnose early lung cancer in individuals with intermediate and high-risk pulmonary nodules on LDCT. To this end, we conducted a prospective multicenter study in China to establish an effective early lung cancer prediction model to improve the diagnosis of pulmonary nodules with an intermediate and high risk of lung cancer detected by LDCT.

Material and Methods

Study Population

The study was approved by the Institutional Review Board of Zhongshan Hospital of Fudan University. A total of 1,663 individuals were recruited to the study from consecutive outpatients of 12 tertiary hospitals across mainland China. Pulmonary nodules detected by LDCT were identified as intermediate and high-risk for lung cancer by physicians in the usual care routine. Intermediate risk was defined as individuals requiring follow up to rule out malignancy, while high-risk was defined as individuals with a clinical suspicion of lung cancer. The flow chart in Figure 1 describes the criteria for patient recruitment in this study. Written informed consent was obtained from all participants.

Figure 1. Schematic diagram of the study design.

Eligible patients recruited from ten hospitals between September 2019 and September 2020 were enrolled in the training set to establish an early lung cancer prediction model. Subsequently, an independent validation set composed of participants evaluated between March 2020 and October 2020 from the remaining two hospitals was used to test the diagnostic performance of the comprehensive lung cancer risk prediction model. The final selection of the individuals comprising the training set (n = 560) and independent validation set (n = 168) was based on the exclusion criteria shown in Figure 1.

Data Collection

All participants completed a demographic survey to obtain clinical information. LDCT images acquired in the 6 months prior to enrollment were obtained for AI analysis. Following AI analysis of the LDCT scans and liquid biopsy, patients with intermediate and high-risk pulmonary nodules who met the inclusion criteria underwent fiberoptic bronchoscopy, fine needle biopsy, and/or surgical resection of their nodules for pathological examination. The World Health Organization classification for lung tumors was used to classify lung masses, and staging was based on the 8th edition of the TNM Classification for Lung Cancer of the Union for International Cancer Control and the American Joint Committee on Cancer staging system.

AI Analysis Tool Development

An automated diagnostic platform comprising a deep-learning-based AI algorithm with a three-stage end-to-end deep convolutional neural network (DCNN) was developed to analyze the LDCT images of the patients. First, a 3D U-Net-based DCNN was used for the patch segmentation of lung nodules to identify suspicious nodules. The labeled LDCT images were cropped in a sliding-window style and fed into a 3-layer 3D U-Net segmentation model for training. The predicted segmentation patches were then combined to generate the final segmentation results. Next, the 3D patches of the suspicious nodules were forwarded to a false positive reduction network (FPRN) to discriminate the true clinically positive nodules from the false positive nodules. Finally, the patches that were labeled positive were forwarded to a CNN-based classifier to determine whether the nodule was malignant or benign. The 3D U-Net segmentation network was initially trained with the publicly available Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset and then further trained on a dataset of approximately 20,000 samples with histopathological results from hospitals in the U.S. and China. Through further evaluation by experienced radiologists, the patches identified by the U-Net in the first stage were divided, by manual marking, into true clinically positive nodules and false positive nodules. The FPRN and malignant/benign (M/B) classifier were then trained at the patch level according to the true malignancy status confirmed by pathology results (Figure 2). All networks were trained with Python 3.6 and TensorFlow 1.10 on an NVIDIA DGX station. The LDCT data of the 728 participants were saved in DICOM format and uploaded to the AI lung nodule analysis platform for analysis. After the images were analyzed, the AI model provided a risk score for developing lung cancer (ranging from 0 to 100%) and a diagnosis statement for each participant.
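The sliding-window cropping step can be illustrated with a minimal sketch. The patch size, stride, and scan dimensions below are hypothetical (the text does not report them), and the helper names are ours; the code only computes where each 3D patch would start, not the segmentation itself.

```python
from itertools import product


def window_starts(length, patch, stride):
    """Start indices of 1D windows of size `patch` that cover [0, length)."""
    starts = list(range(0, max(length - patch, 0) + 1, stride))
    # Add a final window flush with the end so the border is not skipped.
    if starts[-1] + patch < length:
        starts.append(length - patch)
    return starts


def patch_origins(volume_shape, patch_shape, stride):
    """(z, y, x) corner of every 3D patch cropped from the volume."""
    axes = [
        window_starts(n, p, s)
        for n, p, s in zip(volume_shape, patch_shape, stride)
    ]
    return list(product(*axes))


# Hypothetical sizes: a 128 x 256 x 256 voxel scan, 64-voxel cubes, stride 32.
origins = patch_origins((128, 256, 256), (64, 64, 64), (32, 32, 32))
print(len(origins), origins[0], origins[-1])
```

Overlapping windows (stride smaller than the patch size) are a common choice because predictions near patch borders tend to be less reliable; whether the platform described here overlaps its patches is not stated.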

Figure 2. Development procedure of the end-to-end deep convolutional neural network-based artificial intelligence tool for low-dose computed tomography analysis. (A) A three-dimensional (3D) U-Net-based convolutional neural network was used for the segmentation of lung nodules to identify suspicious nodules; (B) the 3D patches of the suspicious nodules were cropped and forwarded to a false-positive reduction network to discriminate the true clinically positive nodules from the false-positive nodules; (C) the patches that were labeled as positive were forwarded to a convolutional neural network-based classifier to determine whether the nodule was malignant or benign.

Liquid Biopsy

To detect circulating genetically abnormal cells, we used a peripheral blood 4-color FISH assay developed to generate data for this study (23). This multiplex interphase FISH assay is composed of four DNA probes for regions that are universally deleted in non-small cell lung cancer (NSCLC) and have been implicated in the pathogenesis of NSCLC (14, 23). In several studies involving the detection of early lung cancer (24), this assay has shown a high degree of accuracy in detecting cells containing chromosomal abnormalities at 10q22.3 and 3p22.1 and in the internal control genes CEP 10 and 3q29 (14). Abnormal cells discovered by the 4-color FISH assay were defined as intact cells with a nucleus larger than a lymphocyte nucleus and polysomy of at least two probes per nucleus. The FISH assay was performed according to the manufacturer’s instructions as previously described (Figure 3) (25).

Figure 3. Sample processing procedure for liquid biopsy via the 4-color fluorescent in situ hybridization (FISH) assay. (A) Peripheral blood from patients with indeterminate or high-risk nodules. (B) The peripheral blood mononuclear cell layer was isolated after centrifugation. (C) The peripheral blood mononuclear cells were applied to a glass slide. (D) Hybridization with 4-color FISH probes. (E) The result of the assay, scanned with a Duet microscope system.

Statistical Analysis

Descriptive statistics are expressed as means, ranges, or numbers with percentages (%). Statistical analysis was performed using Python version 3.8.5 (Python Software Foundation, USA) and MedCalc version 19.0.4 (MedCalc Software Ltd., Ostend, Belgium). All tests were 2-sided, and statistical significance was set at p < 0.05.

Receiver operating characteristic (ROC) curves were used to determine the individual performance of AI and liquid biopsy using the 4-color FISH assay. Univariate logistic regression analyses were used to determine the individual factors associated with early lung cancer in the training cohort. Variables with p < 0.05 in the univariate analysis were included in a multivariate logistic regression analysis to examine the independent predictive factors for inclusion in the early lung cancer diagnostic models with different sets of predictors. Cohen’s kappa (κ) statistic was used to measure the agreement between individual predictors. The mean sensitivity, specificity, and area under the curve (AUC) from the 10-fold cross validation were used to determine the diagnostic power of multiple early lung cancer prediction models. Sensitivity and specificity were used to evaluate the ability of the best-performing model to classify malignancy in an independent validation cohort. AUCs were also used to compare the classification performance of Model 4, the Mayo Clinic Model, and the Veterans Affairs (VA) model in the independent validation set.
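For readers less familiar with the AUC, the following minimal sketch computes it directly from its rank interpretation: the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign case. The function name and the toy data are ours and are not taken from the study.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a malignant case outscores a benign one.

    Ties count as one half, which matches the trapezoidal ROC area.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up risk scores (percent) and pathology labels (1 = malignant).
toy_scores = [92, 85, 71, 60, 55, 40, 71, 30]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
print(toy_auc)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred participants; statistical packages typically use a rank-based formula that gives the same value.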

Results

Patient Characteristics

Table 1 describes the clinical characteristics of the training and independent validation cohorts according to whether the underlying pathology was benign or malignant.

Table 1. Clinical characteristics of the study participants.

Diagnostic Performance of the AI Risk Score and Liquid Biopsy

We evaluated the diagnostic ability of the AI risk score and liquid biopsy results to discriminate between benign and malignant nodules. According to the Youden index, the AI risk score had the best performance when the threshold value was set to >71%. This threshold was associated with a sensitivity of 73.77% (95% confidence interval [CI]: 69.81–77.47%) and a specificity of 65.15% (95% CI: 58.07–71.77%) in the overall cohort.
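Threshold selection by the Youden index (J = sensitivity + specificity − 1) can be sketched as follows. This is an illustrative implementation on made-up scores, not the MedCalc procedure used in the study; the function name is ours.

```python
def youden_threshold(scores, labels):
    """Return (threshold, J, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its score is strictly greater than the
    threshold, mirroring a ">71%" style of cutoff. Both classes must be
    present in `labels` (1 = malignant, 0 = benign).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        # Keep the lowest threshold when several share the same J.
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Made-up AI risk scores (percent) and pathology labels (1 = malignant).
toy_scores = [92, 85, 78, 60, 71, 40, 55, 30]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
best_cut = youden_threshold(toy_scores, toy_labels)
print(best_cut)
```

Because the Youden index weights sensitivity and specificity equally, a threshold chosen this way may not be ideal when missing a cancer and operating on a benign nodule carry different costs.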

Similarly, when the cutoff value for the number of abnormal cells was set to ≥3, the sensitivity and specificity were 78.11% (95% CI: 74.35–81.56%) and 73.23% (95% CI: 66.49–79.26%), respectively. Based on the ROC curves of both tools, the AUC was 0.740 (95% CI: 0.698–0.782) for the AI risk score and 0.765 (95% CI: 0.727–0.803) for liquid biopsy in the overall cohort (Figure 4). Weak agreement between the AI risk score and liquid biopsy data (κ = 0.16, 95% CI: 0.072–0.247) was observed, indicating that the two tools provide complementary information in early lung cancer diagnosis.
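Cohen's kappa compares the observed agreement between two binary calls with the agreement expected by chance. The sketch below assumes each tool has already been dichotomized (for example, AI risk score >71% and abnormal cell count ≥3); the function name and the toy calls are ours.

```python
def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two binary (0/1) calls made on the same cases."""
    n = len(calls_a)
    if n == 0 or n != len(calls_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Fraction of cases on which the two tools give the same call.
    observed = sum(1 for a, b in zip(calls_a, calls_b) if a == b) / n
    pa1 = sum(calls_a) / n
    pb1 = sum(calls_b) / n
    # Agreement expected if the two tools were statistically independent.
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters are constant")
    return (observed - expected) / (1 - expected)


# Made-up dichotomized calls for ten cases (1 = test positive).
toy_ai = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
toy_fish = [1, 0, 1, 1, 0, 0, 0, 1, 1, 0]
toy_kappa = cohens_kappa(toy_ai, toy_fish)
print(toy_kappa)
```

A kappa near zero means the two tools agree little more often than chance would predict, which is why a low value is read here as a sign that they capture different information.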

Figure 4. (A) The area under the curve (AUC) of AI was 0.740 in the overall cohort. (B) The AUC of liquid biopsy was 0.765 in the overall cohort. (C) The sensitivity was 82.86%, and the specificity was 80.95%, in the independent validation cohort for the best-performing model (model 4). (D) In the validation cohort, the areas under the curve were 0.895, 0.772, and 0.740 for model 4, the Mayo Clinic Model, and the VA model, respectively.

Relationship Between Individual Predictors and Lung Cancer

Next, individual radiological and clinical predictive factors were evaluated in a univariate logistic regression analysis using data from 560 patients in the training cohort. It was demonstrated that nodule diameter (p <0.001), nodule count (p <0.001), subsolid status (p <0.001), upper lobe location (p = 0.005), and malignant features, namely, lobulation, spiculation, vacuole sign, pleural indentation, and vessel convergence sign or other radiological malignant signs at the nodule edge (p <0.001), were independent radiological predictors of malignancy. Age (p <0.001), current smokers with 20 pack-years, or past smokers with quit time <15 years (p <0.001) were clinical characteristics that correlated with lung cancer. Both the risk score predicted by AI LDCT image analysis (p <0.001) and quantitation of abnormal cells identified by liquid biopsy (p <0.001) were strongly associated with malignancy (Table 2).

Table 2. Univariate analyses of predictors of malignancy.

Multivariate Logistic Regression Analysis to Build Early Lung Cancer Prediction Models

Before building the early lung cancer prediction models, we applied correlation analyses to check for collinearity among the individual early lung cancer risk predictors. The correlation heat map showed that the correlations among age, smoking, the AI risk score, liquid biopsy results, and the radiological predictors that were significantly associated with malignancy in the univariate analysis were very weak (Figure 5), indicating that there was no multicollinearity among the predictors.

Figure 5. Correlation heat map of individual predictors in the training cohort.

Using multivariate logistic regression analysis based on the malignancy predictors identified in the univariate analysis, we first built four models, each with a different set of predictors (Table 3). Next, we calculated the diagnostic power of the four models using 10-fold cross validation. The lowest diagnostic performance was found in model 1, which comprised only radiological characteristics (diameter, nodule count, subsolid status, upper lobe location, and malignant signs at the nodule edge), with a sensitivity, specificity, and AUC of 89.01% (95% CI: 82.00–96.03%), 62.52% (95% CI: 50.33–74.70%), and 0.769 (95% CI: 0.719–0.820), respectively. In model 2, which added the AI risk score to the radiological characteristics, there was a slight increase in the AUC to 0.791 (95% CI: 0.737–0.845), with a sensitivity of 89.18% (95% CI: 81.30–97.09%) and a specificity of 65.96% (95% CI: 53.13–78.80%). For model 3, we integrated clinical characteristics (age and smoking), radiological characteristics, and the quantitation of abnormal cells identified by the 4-color FISH test to determine the power of the risk prediction model without AI. The AUC of model 3 reached 0.872 (95% CI: 0.846–0.900), with 86.29% (95% CI: 77.32–95.25%) sensitivity and 83.25% (95% CI: 76.70–89.80%) specificity. The best diagnostic performance was achieved by model 4, which combined clinical and radiological characteristics, the AI risk score, and liquid biopsy results, with 89.53% (95% CI: 81.79–97.26%) sensitivity, 81.31% (95% CI: 76.43–86.18%) specificity, and an AUC of 0.880 (95% CI: 0.852–0.910) (Table 3).
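The 10-fold cross validation used to compare the models can be sketched in plain Python as below. This is a simplified illustration, not the study's code: `fit_midpoint` is a hypothetical stand-in for the logistic regression model, the fold assignment is our own stratified scheme, and the data are synthetic.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split case indices into k folds with similar class balance."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cross_validate(features, labels, fit, k=10, seed=0):
    """Mean sensitivity and specificity over k held-out folds.

    `fit(train_x, train_y)` must return a function mapping one feature
    row to a 0/1 call. Each class needs at least k cases so that every
    fold contains both classes.
    """
    sens, spec = [], []
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        tp = sum(1 for i in fold if labels[i] == 1 and predict(features[i]) == 1)
        tn = sum(1 for i in fold if labels[i] == 0 and predict(features[i]) == 0)
        n_pos = sum(labels[i] for i in fold)
        n_neg = len(fold) - n_pos
        sens.append(tp / n_pos)
        spec.append(tn / n_neg)
    return sum(sens) / k, sum(spec) / k


def fit_midpoint(train_x, train_y):
    """Toy stand-in model: cut one feature at the midpoint of class means."""
    pos = [x[0] for x, y in zip(train_x, train_y) if y == 1]
    neg = [x[0] for x, y in zip(train_x, train_y) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1 if row[0] > cut else 0


# Synthetic, well-separated "nodule diameters" (mm); 1 = malignant.
toy_x = [[d] for d in range(5, 25)] + [[d] for d in range(30, 50)]
toy_y = [0] * 20 + [1] * 20
mean_sens, mean_spec = cross_validate(toy_x, toy_y, fit_midpoint, k=10)
print(mean_sens, mean_spec)
```

The key property is that the model is refitted on nine folds and scored only on the tenth, so the averaged metrics reflect performance on cases the model did not see during fitting.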

Table 3. Ten-fold cross validation results of classifiers with different predictors.

Performance of the Best Model in Independent Validation Cohort & Comparison With Other Clinical Models

Based on the parameters that we developed from the training cohort, we tested the power of the best early lung cancer prediction model, which combined clinical characteristics (age and smoking), radiological characteristics (diameter, nodule count, subsolid status, upper lobe location, and malignant signs at the nodule edge), the AI risk score, and the liquid biopsy results of the 4-color FISH assay, in the independent validation cohort (n = 168) (Table 1). This model reached 82.86% (95% CI: 74.27–89.51%) sensitivity and 80.95% (95% CI: 69.09–89.75%) specificity for classifying malignant and benign nodules. ROC curves were then calculated for model 4, the Mayo Clinic Model, and the VA model. The AUC of model 4 was 0.895 (95% CI: 0.844–0.946) in the same cohort, compared to 0.772 (95% CI: 0.696–0.848) for the Mayo Clinic Model and 0.740 (95% CI: 0.663–0.817) for the VA model (Figure 4).
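Confidence intervals for sensitivity and specificity are often computed with the exact (Clopper-Pearson) binomial method. The sketch below is an illustration under two assumptions: that an exact method of this kind is appropriate here, and that the counts 87/105 and 51/63 apply. These counts are back-calculated by us from the reported percentages and the cohort size of 168; they are not taken from the paper's tables.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo, hi, iters=60):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for a proportion k / n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(k, n, p), 0.0, 1.0)
    return lower, upper


# Assumed counts (back-calculated, see lead-in): 87 of 105 malignant nodules
# called malignant, and 51 of 63 benign nodules called benign.
sens_ci = exact_ci(87, 105)
spec_ci = exact_ci(51, 63)
print(round(87 / 105 * 100, 2), sens_ci)
print(round(51 / 63 * 100, 2), spec_ci)
```

If the assumed counts and method match those used in the analysis, the printed intervals should fall close to the reported ones; any difference would point to a different interval method or different underlying counts.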

Discussion

In this prospective Chinese cohort study, clinical and radiological characteristics, together with the AI risk score of LDCT image analysis and quantitation of abnormal cells detected via a 4-color FISH-based liquid biopsy assay, were used to build an early lung cancer prediction model to diagnose malignant pulmonary nodules. The participants were individuals with newly diagnosed pulmonary nodules who were evaluated as having an intermediate or high risk of lung cancer at outpatient clinics of 12 tertiary hospitals across China. Our study was a diagnostic study and not a screening study, as the study population did not comprise a typical screening population meeting the criteria set by the NLST. Instead, we focused on detecting lung cancer in individuals with intermediate and high-risk pulmonary nodules as confirmed by pathological examination following subsequent surgical resection. The training set comprised data from 560 patients and was used to establish the model. Subsequently, the efficacy of the model was tested in a validation study using data from a different set of 168 participants. We only included patients with pulmonary nodules ≤30 mm, which means that individuals with malignant pulmonary nodules were all diagnosed with stage IA (T1N0M0) lung cancer according to the TNM classification.

To the best of our knowledge, this may be one of the first studies to integrate AI for LDCT image analysis and liquid biopsy to build a prediction model to diagnose malignant pulmonary nodules in individuals with intermediate and high risks of lung cancer in a prospective cohort. We observed an improvement in the AUC in the ability to diagnose early lung cancer when combining the AI risk score with radiological characteristics. However, when using only this information, the sensitivity of the first two models was over 80% in the two cohorts, but the specificity rates were only between 62.52% and 65.96%. As indicated by the AUCs, model 3, which included clinical characteristics, radiological characteristics, and the liquid biopsy result, performed better than models 1 and 2, which only considered information provided by LDCT with and without the assistance of AI. The highest diagnostic value was attained in a model that combined clinical and radiological characteristics, AI analysis of LDCT data, and liquid biopsy results with over 80% sensitivity and specificity. Compared to models 1 and 2, the enhancement in specificity in models 3 and 4, which combined multiple predictors, namely, liquid biopsy data and clinical data, has the potential to reduce harmful side effects such as pneumothorax and bleeding, which may be caused by invasive biopsy, suggesting that the liquid biopsy result and LDCT may complement one another. These findings provide evidence that using a classifier with a broad range of validated predictors may improve the diagnostic accuracy for early lung cancer.

The use of AI in cancer diagnosis is gaining acceptance and has been investigated for its ability to assist physicians in early lung cancer detection. AI can assist clinicians in expediting the interpretation of different pathological diagnoses and reducing the mental fatigue caused by classifying a large number of medical images (26). With the increasing incidence of lung cancer in rural China and the lack of skilled physicians (27), AI may be an excellent tool for clinicians to use as a supplement to the interpretation of LDCT images. To date, the performance metrics of AI in diagnosing lung cancer have been verified only in retrospective data, such as the NLST dataset (28–30), or in relatively small datasets (31). This prospective study evaluated the diagnostic power of AI in a large cohort of 728 patients with validated lung cancer histopathology.

We chose the 4-color FISH assay for this study as we had previously demonstrated that this assay was superior to serum protein biomarkers such as carcinoembryonic antigen, neuron-specific enolase, and cytokeratin 19 fragment (32). Furthermore, certain assays for circulating tumor cells, circulating tumor DNA, and exosomes have been measured in research studies (33, 34); however, most of these assay technologies are insensitive to early-stage lung cancer and are not commercially available for detecting early lung cancer (35–37). The FISH-based liquid biopsy assay was approved for commercial use by the China National Medical Products Administration. The performance of the test was verified in a 10-year study conducted in the USA with an accuracy rate of 94.2% in 207 participants (107 patients with lung cancer, 26 patients with benign nodules, and 80 control participants) who were at high risk of developing lung cancer (25). Additionally, in a study conducted in China, the same assay yielded sensitivities of 66.7 and 73.0% for 339 participants with pure ground-glass nodules and mixed ground-glass nodules who were diagnosed with early NSCLC (32). The results of these studies indicate that the FISH assay is a reliable tool for early lung cancer diagnosis.

According to the American College of Chest Physicians guidelines, upper lobe location is a risk factor for lung cancer, as indicated by the Mayo Clinic Model, with an odds ratio (OR) of 2.2 (38). The OR of upper lobe location in our study was 1.750 (p = 0.005). This finding may indicate that, in the Chinese population, pulmonary nodules located in the upper lobe are associated with a higher risk of malignancy than those discovered in other lobes, even when considering the high prevalence of pulmonary nodules in the upper lobe secondary to tuberculosis. In addition, the AUC of our best-performing model was 0.895 in the independent validation cohort, which was superior to that of the Mayo Clinic Model (0.772) and the VA model (0.740). These results demonstrate that it is necessary to develop an early lung cancer classifier based on data retrieved from a Chinese population.
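To make the comparison of odds ratios concrete, recall that in a logistic regression model the odds ratio of a binary predictor is the exponential of its coefficient. The short worked example below uses only the two odds ratios quoted above; the symbols (β, x, p) are generic notation and not the fitted coefficients of either model.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \sum_i \beta_i x_i,
\qquad
\mathrm{OR}_i = e^{\beta_i}
\;\Longleftrightarrow\;
\beta_i = \ln(\mathrm{OR}_i)
\]
\[
\beta_{\text{upper lobe}}^{\text{this study}} = \ln(1.750) \approx 0.560,
\qquad
\beta_{\text{upper lobe}}^{\text{Mayo}} = \ln(2.2) \approx 0.788
\]
```

On the log-odds scale, upper lobe location therefore carries a smaller weight in this cohort than the weight implied by the Mayo Clinic Model's odds ratio, while still pointing toward higher risk.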

Our study has some limitations. First, because the participants traveled from various locations in the country to our outpatient clinics to seek help in evaluating their nodule status, we were unable to calculate the disease prevalence in the general population. Patients in China are more likely to visit tertiary hospitals in big cities after pulmonary nodules have been discovered by LDCT in their hometowns. Since electronic health records are not shared between hospitals, we cannot trace how many people underwent lung cancer screening before those with an intermediate and high risk of lung cancer went to the 12 outpatient clinics in the main cities of China. Second, our study cohort was small compared to national-scale data sets, such as those derived from the NLST and the Dutch–Belgian Randomized Lung Cancer Screening Trial (NELSON), and therefore might not be representative of the early lung cancer characteristics of the entire Chinese population. However, because this is a diagnostic study and not a screening study in the general population, we included individuals with positive LDCT results who were evaluated as intermediate and high-risk for lung cancer by physicians in the usual care routine.

In the future, we hope to apply this methodology in a prospective study with a larger sample size to continue to validate and refine our classifier to improve early lung cancer diagnosis. Given the high number of pulmonary nodules discovered by LDCT scans, many patients with nodules might need to wait for a long period for physicians to interpret CT images to evaluate the significance of these lung nodules. If nodules are suspicious for malignancy, these patients may require surgical excision, biopsy, or stereotaxic radiation; however, if benign, these patients should undergo serial CT scans. The use of a multivariate lung cancer prediction model as proposed herein can help relieve the patients’ anxiety by reducing the follow-up time to a definitive diagnosis if the risk score is high or delaying the follow-up time to less frequent LDCT scans if the classifier returns a low-risk score. This will help to streamline clinical decision making by physicians for a large number of patients. We believe that a noninvasive tool such as this classifier will be a good complementary tool for physicians in the assessment of early lung cancer.

Data Availability Statement

The original contributions presented in the study are included in the article/supplementary material. Further inquiries can be directed to the corresponding authors.

Ethics Statement

The studies involving human participants were reviewed and approved by the Ethics Committee of Zhongshan Hospital, Fudan University. The patients/participants provided their written informed consent to participate in this study.

Author Contributions

Conception and design: JS and CB. Development of methodology: MY, LT, XZhe, HW, HZ, XZhu, CZ, PZ, YaW, QW, LB, ZC, FK, YuW, MF, and XY. Provision of study materials: DY, ZL, QZ, ZW, SH, LS, NZ, ZY, JZ, XZha, and JS. Statistical analysis: YL. Revised the manuscript: RK, JS, and CB. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

Funding

This study was supported by the Program for the Guangdong Introducing Innovative and Entrepreneurial Teams (2019ZT08Y297) and the Shanghai Engineering & Technology Research Center of the Internet of Things for Respiratory Medicine (20DZ2254400).

Conflict of Interest

Authors XY and JZ are employees of Zhuhai Sanmed Biotech Ltd. RK is a consultant of Sanmed Biotech Ltd.

The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Acknowledgments

We acknowledge Xianjun Fan for assistance with the liquid biopsy technology. We also wish to thank the following: Chuoji Huang for excellent editing; Xing Lu for support with artificial intelligence; Xiaozheng Yang for contributing to statistical analysis and scientific writing; Yanling Zhou, Yanci Chen, and Meng Huang for their outstanding laboratory contributions; the Sanmed image analysis team for 4-color FISH cell image analysis; and the Sanmed clinical sample team for blood sample processing.

Keywords: lung cancer, artificial intelligence, liquid biopsy, prediction model, early diagnosis

Citation: Ye M, Tong L, Zheng X, Wang H, Zhou H, Zhu X, Zhou C, Zhao P, Wang Y, Wang Q, Bai L, Cai Z, Kong F-M(S), Wang Y, Li Y, Feng M, Ye X, Yang D, Liu Z, Zhang Q, Wang Z, Han S, Sun L, Zhao N, Yu Z, Zhang J, Zhang X, Katz RL, Sun J and Bai C (2022) A Classifier for Improving Early Lung Cancer Diagnosis Incorporating Artificial Intelligence and Liquid Biopsy. Front. Oncol. 12:853801. doi: 10.3389/fonc.2022.853801

Received: 13 January 2022; Accepted: 07 February 2022;
Published: 02 March 2022.

Edited by:

Yutong He, Fourth Hospital of Hebei Medical University, China

Reviewed by:

Baishen Chen, Sun Yat-sen Memorial Hospital, China
Jianhua Gao, Chinese People’s Armed Police General Hospital, China
Wookjin Choi, Thomas Jefferson University, United States

Copyright © 2022 Ye, Tong, Zheng, Wang, Zhou, Zhu, Zhou, Zhao, Wang, Wang, Bai, Cai, Kong, Wang, Li, Feng, Ye, Yang, Liu, Zhang, Wang, Han, Sun, Zhao, Yu, Zhang, Zhang, Katz, Sun and Bai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Chunxue Bai, cxbai@fudan.edu.cn; Jiayuan Sun, xkyyjysun@163.com

^†These authors have contributed equally to this work and share first authorship
