Background: The continually rising HIV incidence among men who have sex with men (MSM), together with the low rate of HIV testing among MSM in China, demonstrates a need for innovative strategies to improve the implementation of HIV prevention. Machine learning algorithms are an emerging trend in disease diagnosis and prediction. We aimed to develop and validate machine learning models for predicting HIV infection among MSM that can identify individuals at increased risk of HIV acquisition for transmission-reduction interventions.
Methods: We extracted data from MSM sentinel surveillance in Zhejiang province from 2018 to 2020. Univariate logistic regression was used to select significant variables in the 2018–2019 data (P < 0.05). After data processing and feature selection, we divided the model development data into two groups by stratified random sampling: training data (70%) and testing data (30%). The Synthetic Minority Oversampling Technique (SMOTE) was applied to address class imbalance. Model performance was evaluated using accuracy, precision, recall, F-measure, and the area under the receiver operating characteristic curve (AUC). We then compared logistic regression (LR) with three commonly used machine learning algorithms: decision tree (DT), support vector machine (SVM), and random forest (RF). Finally, the four models were validated prospectively with 2020 data from Zhejiang province.
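The pipeline described above can be sketched end to end: a stratified 70/30 split, SMOTE applied to the training fold only, then LR and RF compared by AUC on the untouched test fold. The sketch below uses synthetic data in place of the surveillance records (the 12 real predictors are not reproduced here), and hand-rolls SMOTE's nearest-neighbour interpolation rather than depending on the imbalanced-learn package; all sizes are illustrative assumptions.

```python
# Minimal sketch: stratified split, SMOTE on the training fold, LR vs RF by AUC.
# Synthetic data stands in for the sentinel-surveillance records.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def smote(X, y, minority=1, k=5, seed=0):
    """Hand-rolled SMOTE: interpolate between each sampled minority point and
    one of its k nearest minority neighbours until the classes are balanced."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]
    n_new = (y != minority).sum() - len(X_min)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    picks = rng.integers(0, len(X_min), n_new)
    neigh = idx[picks, rng.integers(1, k + 1, n_new)]  # column 0 is the point itself
    gap = rng.random((n_new, 1))
    X_new = X_min[picks] + gap * (X_min[neigh] - X_min[picks])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])

# ~6% positives, loosely mirroring the 372/6,346 imbalance in the abstract.
X, y = make_classification(n_samples=6000, n_features=12, weights=[0.94],
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
X_bal, y_bal = smote(X_tr, y_tr)          # oversample the training fold only

for name, clf in [("LR", LogisticRegression(max_iter=1000)),
                  ("RF", RandomForestClassifier(random_state=42))]:
    clf.fit(X_bal, y_bal)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

Oversampling only the training fold matters: applying SMOTE before the split leaks synthetic copies of test-set neighbours into training and inflates the reported AUC.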
Results: A total of 6,346 MSM were included in the model development data, 372 of whom were diagnosed with HIV. In feature selection, 12 variables were retained as model predictors. On the SMOTE-processed data, DT, SVM, and RF all improved classification performance over LR, with AUCs of 0.778 (LR), 0.856 (DT), 0.887 (SVM), and 0.942 (RF). RF was the best-performing algorithm (accuracy = 0.871, precision = 0.960, recall = 0.775, F-measure = 0.858, and AUC = 0.942), and it continued to perform well on prospective validation (AUC = 0.846).
Conclusion: Machine learning models substantially outperformed the conventional LR model, and RF should be considered for HIV-infection prediction tools targeting Chinese MSM. Further studies are needed to optimize and promote these algorithms and to evaluate their impact on HIV prevention among MSM.
Many works have employed machine learning (ML) techniques to detect Diabetic Retinopathy (DR), a disease that affects the human eye. However, the accuracy of most DR detection methods still needs improvement. The Gray Wolf Optimization-Extreme Learning Machine (GWO-ELM) is a popular ML algorithm that can be considered accurate for classification, but it has not been applied to DR detection. This work therefore applies the GWO-ELM classifier together with one of the most popular feature extraction methods, Histogram of Oriented Gradients-Principal Component Analysis (HOG-PCA), to increase the accuracy of a DR detection system. Although HOG-PCA has been tested in many image processing domains, including medical domains, it has not yet been tested on DR. The GWO-ELM can prevent overfitting and solve both multi-class and binary classification problems, performing like a kernel-based Support Vector Machine with a Neural Network structure, while HOG-PCA can extract the most relevant features at low dimensionality. The combination of the GWO-ELM classifier and HOG-PCA features might therefore produce an effective technique for DR classification and feature extraction. The proposed GWO-ELM is evaluated on two datasets, APTOS-2019 and the Indian Diabetic Retinopathy Image Dataset (IDRiD), in both binary and multi-class classification. The experimental results show excellent performance of the proposed GWO-ELM model: it achieved an accuracy of 96.21% for multi-class and 99.47% for binary classification on the APTOS-2019 dataset, and 96.15% for multi-class and 99.04% for binary classification on the IDRiD dataset. This demonstrates that the combination of GWO-ELM and HOG-PCA is an effective classifier for detecting DR and might be applicable to other image data types.
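The HOG-PCA-ELM chain can be sketched on synthetic images. The real work uses retinal fundus photographs and tunes the ELM's hidden weights with Grey Wolf Optimization; in the sketch below the hidden weights stay random (a plain ELM), which is exactly the part GWO would optimize. The striped stand-in images, cell size, and layer sizes are all illustrative assumptions.

```python
# HOG -> PCA -> plain ELM on synthetic "images" (GWO weight tuning omitted).
import numpy as np
from sklearn.decomposition import PCA

def hog_features(img, cell=8, bins=9):
    """Minimal HOG: per-cell, magnitude-weighted histograms of gradient orientation."""
    gy, gx = np.gradient(img.astype(float))
    mag, ang = np.hypot(gx, gy), np.rad2deg(np.arctan2(gy, gx)) % 180
    feats = []
    for i in range(0, img.shape[0] - cell + 1, cell):
        for j in range(0, img.shape[1] - cell + 1, cell):
            hist, _ = np.histogram(ang[i:i+cell, j:j+cell], bins=bins,
                                   range=(0, 180),
                                   weights=mag[i:i+cell, j:j+cell])
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)

rng = np.random.default_rng(0)
def make_img(label):          # stand-in images: horizontal vs vertical stripes
    img = rng.random((32, 32)) * 0.2
    if label: img[::4, :] += 1.0
    else:     img[:, ::4] += 1.0
    return img

y = rng.integers(0, 2, 200)
X = np.array([hog_features(make_img(l)) for l in y])   # 144 HOG features each
pca = PCA(n_components=20, random_state=0).fit(X[:150])  # fit on train only
X_tr, X_te = pca.transform(X[:150]), pca.transform(X[150:])

# Plain ELM: random hidden layer, output weights solved by least squares.
W, b = rng.normal(size=(20, 50)), rng.normal(size=50)
H = np.tanh(X_tr @ W + b)
Y = np.eye(2)[y[:150]]                    # one-hot targets
beta = np.linalg.pinv(H) @ Y
pred = np.argmax(np.tanh(X_te @ W + b) @ beta, axis=1)
acc = (pred == y[150:]).mean()
print(f"ELM accuracy: {acc:.2f}")
```

The least-squares solve for the output weights is what makes ELM training fast; GWO's role in the paper is to search for better hidden-layer weights `W` and biases `b` than the random draw used here.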
Background: Pneumonia is an infection of the lungs characterized by high morbidity and mortality. The use of machine learning systems to detect respiratory diseases via non-invasive measures such as physical and laboratory parameters is gaining momentum and has been proposed to decrease the diagnostic uncertainty associated with bacterial pneumonia. This study conducted several experiments using eight machine learning models to predict pneumonia based on biomarkers, laboratory parameters, and physical features.
Methods: We performed machine-learning analysis on 535 patients, each described by 45 features. Data normalization was performed to rescale all real-valued features. Because this is a binary classification problem, each patient was assigned to one of two classes. We designed three experiments to evaluate the models: (1) feature selection techniques to select appropriate features for the models, (2) experiments on the imbalanced original dataset, and (3) experiments on the SMOTE-resampled data. We then compared eight machine learning models to evaluate their effectiveness in predicting pneumonia.
Results: Biomarkers such as C-reactive protein and procalcitonin demonstrated the most significant discriminating power. Ensemble machine learning models such as RF (accuracy = 92.0%, precision = 91.3%, recall = 96.0%, f1-score = 93.6%) and XGBoost (accuracy = 90.8%, precision = 92.6%, recall = 92.3%, f1-score = 92.4%) achieved the highest performance on the original dataset, with AUCs of 0.96 and 0.97, respectively. On the SMOTE dataset, RF and XGBoost achieved the highest prediction results, with f1-scores of 92.0% and 91.2%, respectively, and both models reached an AUC of 0.97.
Conclusions: Our models showed that in the diagnosis of pneumonia, individual clinical history, laboratory indicators, and symptoms do not have adequate discriminatory power. We can also conclude that the ensemble ML models performed better in this study.
Background: Artificial intelligence-based disease prediction models have greater potential to screen COVID-19 patients than conventional methods. However, their application has been restricted by their underlying black-box nature.
Objective: To address this issue, an explainable artificial intelligence (XAI) approach was developed to screen patients for COVID-19.
Methods: A retrospective study consisting of 1,737 participants (759 COVID-19 patients and 978 controls) admitted to San Raphael Hospital (OSR) from February to May 2020 was used to construct a diagnosis model. Finally, 32 key blood test indices from 1,374 participants were used for screening patients for COVID-19. Four ensemble learning algorithms were used: random forest (RF), adaptive boosting (AdaBoost), gradient boosting decision tree (GBDT), and extreme gradient boosting (XGBoost). Feature importance from the perspective of the clinical domain and visualized interpretations were illustrated by using local interpretable model-agnostic explanations (LIME) plots.
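The ensemble step in these methods can be sketched with a GBDT on stand-in blood-test data. The sketch prints per-feature importances as a rough global proxy for the paper's explanations; the actual per-patient LIME plots would use the `lime` package's `LimeTabularExplainer`, which is omitted here to keep the example self-contained. Cohort size and feature count follow the abstract; everything else is an illustrative assumption.

```python
# Sketch: GBDT classifier on synthetic "blood test" data + feature importances.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# 1,374 participants x 32 blood-test indices, as in the abstract.
X, y = make_classification(n_samples=1374, n_features=32, n_informative=6,
                           random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=7)
gbdt = GradientBoostingClassifier(random_state=7).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, gbdt.predict_proba(X_te)[:, 1])
print(f"GBDT AUC: {auc:.3f}")

# Global importances: on the real data, LDH, WBC, and eosinophil counts rank highest.
top3 = np.argsort(gbdt.feature_importances_)[::-1][:3]
print("three most important feature indices:", top3)
```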
Results: The GBDT model [area under the curve (AUC): 0.864; 95% confidence interval (CI) 0.821–0.907] outperformed the RF model (AUC: 0.857; 95% CI 0.813–0.902), the AdaBoost model (AUC: 0.854; 95% CI 0.810–0.899), and the XGBoost model (AUC: 0.849; 95% CI 0.803–0.894) in distinguishing patients with COVID-19 from those without. The cumulative feature importance of lactate dehydrogenase, white blood cell count, and eosinophil count was 0.145, 0.130, and 0.128, respectively.
Conclusions: Ensemble machine learning (ML) approaches, particularly GBDT combined with LIME plots, are efficient for screening patients with COVID-19 and might serve as a potential tool in the auxiliary diagnosis of COVID-19. Patients with a higher white blood cell count, higher lactate dehydrogenase level, or higher eosinophil count were more likely to have COVID-19.
Survival prediction is highly valued in end-of-life care clinical practice, and patient performance status evaluation stands as a predominant component of survival prognostication. While current performance status evaluation tools are limited by their subjective nature, the advent of wearable technology enables continual recording of patients' activity and has the potential to measure performance status objectively. We hypothesized that wristband actigraphy monitoring devices can predict in-hospital death of end-stage cancer patients during their hospital admissions. The objective of this study was to train and validate a long short-term memory (LSTM) deep-learning prediction model based on activity data from wearable actigraphy devices. The study recruited 60 end-stage cancer patients in a hospice care unit, with 28 deaths and 32 discharges in stable condition at the end of the hospital stay. The standard Karnofsky Performance Status score had an overall prognostic accuracy of 0.83, and the LSTM prediction model based on patients' continual actigraphy monitoring likewise had an overall prognostic accuracy of 0.83. Furthermore, model performance improved with longer input data length, up to 48 h. In conclusion, our research suggests the potential feasibility of wristband actigraphy for predicting end-of-life admission outcomes in palliative care for end-stage cancer patients.
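The model described above, an LSTM over an activity time series that emits a single admission-outcome probability, can be sketched in PyTorch. The 48-hour sequence length and 60-patient cohort follow the abstract; the layer sizes, training schedule, and the synthetic activity traces (stable patients given a rising activity trend) are assumptions for illustration, not the study's data or architecture.

```python
# Sketch: LSTM over hourly activity counts -> in-hospital-death probability.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ActigraphyLSTM(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                    # x: (batch, 48, 1) activity counts
        _, (h, _) = self.lstm(x)             # final hidden state summarizes the stay
        return self.head(h[-1]).squeeze(-1)  # one logit per patient

# Synthetic cohort of 60: label 1 = death; "stable" patients get rising activity.
n, T = 60, 48
y = torch.randint(0, 2, (n,)).float()
trend = torch.linspace(0, 1, T)
x = torch.rand(n, T, 1) + (1 - y).view(-1, 1, 1) * trend.view(1, T, 1)

model = ActigraphyLSTM()
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.BCEWithLogitsLoss()
for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()

acc = ((torch.sigmoid(model(x)) > 0.5).float() == y).float().mean().item()
print(f"training accuracy: {acc:.2f}")
```

Feeding the full 48-hour window, rather than a truncated one, is what the abstract's observation about longer input lengths corresponds to: the final hidden state can integrate activity patterns across the whole admission.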
Clinical Trial Registration: The study protocol was registered on ClinicalTrials.gov (ID: NCT04883879).