- 1 Department of Rheumatology, Shanghai Guanghua Hospital of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- 2 Guanghua Clinical Medical College, Shanghai University of Traditional Chinese Medicine, Shanghai, China
- 3 Institute of Arthritis Research in Integrative Medicine, Shanghai Academy of Traditional Chinese Medicine, Shanghai, China
- 4 Traditional Chinese Medicine Hospital of Inner Mongolia Autonomous Region, Hohhot, Inner Mongolia Autonomous Region, China
- 5 Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi Collaborative Innovation Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, Guangxi, China
- 6 Department of Urology, Affiliated Tumor Hospital of Guangxi Medical University, Guangxi Medical University, Nanning, Guangxi, China
Rheumatoid arthritis (RA) is an autoimmune disease that causes progressive joint damage. Early diagnosis and treatment are critical but remain challenging because of the complexity and heterogeneity of RA. Machine learning (ML) techniques may enhance RA management by identifying patterns within multidimensional biomedical data to improve classification, diagnosis, and treatment prediction. In this review, we summarize the applications of ML for RA management. Emerging studies have developed diagnostic and predictive models for RA that draw on a variety of data modalities, including electronic health records, imaging, and multi-omics data. High-performing supervised learning models for identifying RA patients and predicting treatment responses have reported an area under the curve (AUC) exceeding 0.85. Unsupervised learning has revealed potential RA subtypes. Ongoing research is integrating multimodal data with deep learning to further improve performance. However, key challenges remain regarding model overfitting, generalizability, validation in clinical settings, and interpretability. Small sample sizes and a lack of testing in diverse populations risk overestimating model performance. Prospective studies evaluating real-world clinical utility are lacking. Enhancing model interpretability is critical for clinician acceptance. In summary, while ML shows promise for transforming RA management through earlier diagnosis and optimized treatment, larger-scale multisite data, prospective clinical validation of interpretable models, and testing across diverse populations are still needed. As these gaps are addressed, ML may pave the way towards precision medicine in RA.
1 Introduction
Rheumatoid arthritis (RA) is a prevalent autoimmune disorder characterized by inflammation and discomfort in numerous small joints, potentially leading to joint deformity and impaired function. It ranks among the primary contributors to chronic disability (1). RA also affects other organ systems, including the cardiovascular and respiratory systems, increasing susceptibility to conditions such as myocardial infarction, stroke, and pulmonary fibrosis (2, 3). Chronic illness and persistent pain can cause psychological distress, manifesting as symptoms of depression and anxiety (4). Hence, it is imperative to promptly identify individuals with a high susceptibility to RA in order to facilitate early diagnosis and anticipate the potential severity of disease progression. Timely administration of effective medication is likewise essential to slow the advancement of the disease.
The phrase “machine learning (ML)” surged in popularity in the late 1990s in the field of artificial intelligence (5). In the past decade, ML has made significant advancements as a result of the increased availability of data and improvements in algorithms, enabling the identification of complex patterns and correlations within datasets (6). The biomedical field has experienced a significant increase in data volume, ranging from molecular details to comprehensive information on the human body system, due to advancements in high-throughput sequencing technologies, electronic health records, and medical imaging (7). Healthcare providers and researchers are currently facing a growing number of clinical challenges, leading them to explore ways to enhance decision-making effectiveness, refine personalized treatment strategies, and optimize resource allocation methods. ML is uniquely positioned to extract valuable patterns and insights from large datasets, potentially automating and enhancing the efficiency of healthcare decision-making and services. The incremental incorporation of biomedicine with various disciplines, including computational science, mathematics, and statistics, has spurred interdisciplinary partnerships, leading to accelerated progress in the application of ML in the field of biomedicine (8). In the clinical practice of RA, Rheumatoid Factor (RF) and Anti-Citrullinated Protein Antibody (ACPA) serve as crucial diagnostic biomarkers for RA, playing key roles in its diagnosis. However, approximately 20-25% of RA patients are seronegative, posing challenges to early diagnosis and potentially leading to delayed diagnosis and treatment (9). With the advent and development of biologics, significant progress has been made in the treatment of RA. 
Nevertheless, many RA patients exhibit poor responses to drug treatments, failing to achieve sustained remission (10), and currently, it is not possible to predict which treatment drugs will have the best therapeutic effect on individual patients. The accumulation of biomedical big data may provide new insights into better understanding the heterogeneity of RA (11). With the increase in data volume and complexity, traditional statistical analysis methods have become insufficient, especially when dealing with nonlinear relationships and complex interactions between variables (12). These unmet needs pose challenges to the precision medicine of RA. Using ML techniques for data processing and pattern recognition to build predictive models for RA can assist clinicians in making more accurate data-driven decisions (13). Therefore, understanding the prevalent ML algorithms in RA, their effectiveness, and potential applications is crucial. Our study is dedicated to evaluating recent literature on applications of ML in RA classification and outcome prediction, with the goal of offering a dependable benchmark for reference and guiding future research endeavors. By enhancing the utilization of sophisticated modeling in RA and advocating for precision medicine in the field, our work aims to propel advancements in RA treatment and management.
2 ML algorithms to enhance precision rheumatology
ML, a crucial component of artificial intelligence, is divided into two main categories: supervised and unsupervised learning. Supervised learning employs labeled training datasets to identify patterns and relationships. Upon training, the model can predict or classify new data inputs, yielding corresponding results. This method utilizes a range of algorithms, such as logistic regression, random forests, gradient boosting, and decision trees. Each algorithm contributes uniquely to the robustness and accuracy of predictive outcomes, making supervised learning integral to advancements in data-driven research methodologies (14). Supervised learning is divided into two principal methodologies: classification and regression (15). Classification methodologies segregate patients according to distinct characteristics (16). By employing datasets comprising genetic information, gene expression profiles, and clinical indicators from patients with RA, algorithms can be trained to identify RA patients within populations, as well as to ascertain which patients exhibit optimal responses to specific treatments. Regression models, on the other hand, are designed to predict continuous outcomes (17), such as disease activity scores and response rates to treatments in RA patients, thus facilitating personalized monitoring and management to optimize treatment efficacy. In contrast, unsupervised learning explores inherent patterns and relationships in datasets without predetermined labels (18). Clustering algorithms, an exemplary application of unsupervised learning, automatically group data into multiple clusters to maximize intra-cluster similarity and minimize inter-cluster similarity, aiding significantly in RA research by identifying potential patient subgroups who may exhibit favorable responses to specific treatments or distinct disease progression patterns. 
Deep learning, which is built on artificial neural networks (ANNs), enhances the analysis and prediction of complex data by learning sophisticated non-linear mappings (19). In particular, convolutional neural networks (CNNs) are adept at processing image data (20): they learn features automatically across multiple convolutional layers, which can assist physicians in identifying early signs of arthritis or disease progression in X-ray or magnetic resonance imaging (MRI) images of RA patients. In summary, supervised and unsupervised learning each serve specific roles, while deep learning enhances the capability of these methods to process complex data, thereby advancing the field of precision rheumatology.
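To make the clustering idea described above concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is an illustration only and is not taken from any of the reviewed studies: the function name `kmeans`, the two features (CRP and swollen joint count), and all values are hypothetical, and the starting centroids are fixed by hand so that the result is reproducible.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:  # keep an empty cluster's centroid in place
                new_centroids.append(centroids[k])
                continue
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
            )
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical, made-up values: (CRP in mg/L, swollen joint count) per patient.
patients = [(4, 2), (6, 3), (5, 1), (38, 12), (42, 15), (40, 11)]
labels, centroids = kmeans(patients, centroids=[patients[0], patients[3]])
print(labels)
print(centroids)
```

In real data the features would normally be standardized first, because a Euclidean distance is otherwise dominated by whichever variable has the largest numeric range, and the number of clusters would need to be justified rather than fixed in advance.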
In the preprocessing phase, data cleaning and organization are paramount, involving the removal of duplicates and the correction of anomalies (21). Feature engineering then identifies the predictors (x) that most strongly influence the target variable (y) through strategic selection and transformation of data, a crucial task in supervised learning. Accurate feature selection enhances not only the precision of a model but also its interpretability. Predictive modeling commonly has to contend with a large number of candidate features. Even with advanced and efficient algorithms, features that carry little predictive information, or numerous irrelevant variables, can impair model performance. Key feature selection strategies include statistical filtering, wrapper methods, and embedded techniques (22–24). For instance, random forests assess the importance of each feature by its contribution to model accuracy (25), whereas logistic regression identifies key influencing factors through the magnitude and direction of its coefficients (26). Rigorous feature selection reduces the dimensionality and complexity of the dataset, thereby enhancing the interpretability and practical application of the predictive model in clinical decision-making (22). For example, feature selection has identified RA patients with specific genetic mutations who respond more positively to methotrexate, a principal drug for RA treatment. Such insight assists physicians in devising targeted treatment plans, thereby improving therapeutic outcomes.
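As an illustration of an embedded feature selection technique, the sketch below fits a logistic regression with an L1 (LASSO-type) penalty by proximal gradient descent, so that uninformative coefficients are driven exactly to zero. This is a toy example under stated assumptions: the function names, the penalty strength `lam`, the step size, and the eight-sample dataset are all hypothetical, no intercept is fitted, and a real analysis would use an established library and tune the penalty by cross-validation.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero; values within the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def l1_logistic(X, y, lam=0.1, lr=1.0, n_iter=500):
    """Logistic regression with an L1 penalty, fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the mean logistic loss with respect to each coefficient.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            pred = 1.0 / (1.0 + math.exp(-z))
            for j in range(p):
                grad[j] += (pred - yi) * xi[j] / n
        # Gradient step followed by soft-thresholding (the L1 proximal step).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad)]
    return w

# Toy data: feature 0 tracks the label, feature 1 is unrelated to it.
X = [(1, 1), (1, -1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-1, 1), (-1, -1)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = l1_logistic(X, y)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

The sign and size of the surviving coefficient indicate the direction and strength of the association, which is the interpretability benefit the paragraph above attributes to logistic regression.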
ML algorithms are increasingly recognized as powerful analytical tools in the field of RA research. As depicted in Figure 1, they provide assistance across multiple domains, including diagnosis, disease progression forecasting, prediction of treatment responses, and identification of potential complications. These computational tools are guiding the field towards a more refined and individualized approach, allowing clinicians and researchers to explore the complexities of RA with greater accuracy.
Figure 1. Schematic overview of clinical prediction in RA using ML. The schematic illustrates the comprehensive workflow and applications of ML algorithms in the management of RA. It encapsulates the stepwise process from data collection, including electronic health records, imaging, and multi-omics data, through data preprocessing and feature engineering, to model training and validation phases. The central part of the diagram highlights the primary domains of ML application in RA: risk prediction, diagnosis and subtype classification, prediction of disease activity and progression, treatment response, and comorbidity identification for RA. It emphasizes the iterative optimization of models and the synergy between clinical and computational insights aimed at advancing early diagnosis, personalized treatments, and patient outcomes in RA management.
3 ML models in precision diagnosis and therapeutics for RA
A variety of predictive models have been built using ML algorithms in RA research. Table 1 summarizes the performance of these ML models as classifiers across data types from various sources. The classifiers are used to identify individuals at risk for RA, diagnose RA and differentiate its subtypes, discriminate disease activity levels, forecast whether treatment will be effective or ineffective, and predict the presence or absence of comorbidities.
3.1 Stratification of RA risk cohorts
Identifying individuals at risk for RA is crucial for early intervention, which has been shown to yield substantially better outcomes when applied during the preclinical stages rather than after the overt development of clinically significant arthritis (70). Specifically, by identifying individuals at high risk and conducting regular medical examinations and monitoring RA-related biomarkers, such as inflammation levels and autoantibodies, early detection of the disease can utilize the ‘window of opportunity’ for therapeutic intervention. Early interventions can help prevent severe radiographic damage and disability, thus significantly improving patient prognosis (71). The exact etiology of RA remains not fully understood; however, it is known that genetic and environmental factors, as well as their interactions, influence the onset and progression of RA (72). ML, as an effective data analysis tool, is capable of processing and interpreting large volumes of diverse data, ranging from genetic factors to lifestyle choices. ML can uncover potential risk patterns within complex genetic and environmental datasets, assisting clinicians in making more accurate disease predictions and risk assessments.
Predictive modeling that uses ML to pinpoint individuals at elevated risk for RA falls principally into two domains: forecasting incident risk in asymptomatic persons, and assessing the likelihood that symptomatic patients with undifferentiated arthritis will progress to RA. Detection of RA susceptibility in the general population relies on the analysis of genetic variants alongside common clinical risk indicators such as family history, age, and gender. One study found nine single nucleotide polymorphisms (SNPs) linked to RA; by combining these variants into a risk score and applying ML algorithms, the researchers distinguished RA patients from individuals without the condition with five-fold cross-validated AUCs above 0.9 (27). In another study, eleven risk factors for RA were identified from National Health and Nutrition Examination Survey (NHANES) data and used to build a Bayesian logistic regression model, which was refined using a genetic algorithm. The model achieved an AUC of 0.826 on the validation set (28). These findings highlight the potential of ML strategies for identifying populations at risk of RA. Genetic risk scores derived from SNPs can help identify an individual’s potential genetic risks, thereby providing a crucial foundation for personalized medicine (73). However, translating these studies into clinical decision support tools faces obstacles, chief among them ensuring that polygenic risk scores (PRS) are equally applicable across populations (74). In practice, PRS show limited transferability among populations, and their clinical utility in RA remains undetermined, necessitating substantial investment in extensive data collection across diverse ethnic groups and in methodological research to enhance genetic prediction in admixed individuals (75).
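Because AUC is the headline metric throughout this review, a small worked example may help. The sketch below builds a weighted genetic risk score from risk-allele counts and computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control. All weights and genotypes are invented for illustration and do not correspond to the SNPs in the cited study; the sketch also omits the cross-validation that the cited work used, in which weights would be estimated on training folds only.

```python
def weighted_risk_score(genotype, weights):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) times per-SNP weights."""
    return sum(count * w for count, w in zip(genotype, weights))

def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical per-SNP weights (for example, log odds ratios) and genotypes.
snp_weights = [0.5, 0.25, 1.0]
cases = [(2, 1, 1), (1, 2, 1), (0, 1, 1), (1, 0, 0)]
controls = [(0, 0, 0), (1, 1, 0), (0, 2, 0), (0, 0, 1)]

case_scores = [weighted_risk_score(g, snp_weights) for g in cases]
control_scores = [weighted_risk_score(g, snp_weights) for g in controls]
result = auc(case_scores, control_scores)
print(case_scores, control_scores, result)  # AUC is 13.5 / 16 = 0.84375
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.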
Another critical issue is the interpretability of genetic findings in participants, requiring clinicians to possess the capacity to comprehend and interpret data (76). Furthermore, privacy and security of the involved genetic data must be adequately ensured. Federated learning, as a distributed machine learning technique, aims to achieve collaborative modeling while ensuring data privacy, security, and legal compliance (77). Participants can train their local models using their proprietary data, and through iterative training, each participant contributes to the construction of a global model without sharing their data externally (78). This approach fosters collaboration among multiple medical institutions, facilitating the sharing of model learning outcomes (79).
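The federated scheme described above can be sketched with federated averaging, in which each site updates a shared model on its own data and only the model parameters are sent to a coordinating server. The example below is a deliberately small illustration under stated assumptions: a one-parameter linear model, two hypothetical sites with made-up data, and hand-picked learning rate and round counts. Production systems add secure aggregation and other privacy safeguards that are not shown here.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Run a few gradient-descent steps on one site's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(site_data, rounds=50):
    """Federated averaging: sites train locally; only weights reach the server."""
    w_global = 0.0
    total = sum(len(d) for d in site_data)
    for _ in range(rounds):
        local_weights = [local_update(w_global, d) for d in site_data]
        # Sample-size-weighted mean of the locally updated weights.
        w_global = sum(w * len(d) for w, d in zip(local_weights, site_data)) / total
    return w_global

# Hypothetical sites whose (x, y) records follow y = 2x exactly.
site_a = [(1.0, 2.0), (2.0, 4.0)]
site_b = [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)]
w_final = federated_average([site_a, site_b])
print(w_final)
```

The raw records in `site_a` and `site_b` are never pooled; the server sees only one number per site per round, yet the global model recovers the shared slope.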
The likelihood of individuals with undifferentiated arthritis (UA), who exhibit joint symptoms without fulfilling the full diagnostic criteria, subsequently progressing to RA poses a clinical conundrum. Accurate prediction of this progression can facilitate early diagnosis and intervention for those at risk, while concurrently preventing overtreatment and diminishing both the health repercussions and superfluous healthcare expenditures for those unlikely to develop RA (80). Models are increasingly geared towards the evaluation of dynamic variables, reflecting shifts correlated with disease activity, such as gene expression profiles, epigenetic modifications, and a spectrum of detailed symptomatic and clinical markers.
One notable investigation sought clinically pertinent predictive biomarkers in peripheral blood CD4 T cells from UA patients, using a support vector machine (SVM) classification model. Combining the pre-established Leiden prediction rule with a 12-gene risk signature improved prognostic performance for seronegative UA patients from the original AUC of 0.74 to an AUC of 0.84 (29). A comparative analysis of three distinct ML algorithms revealed that an SVM model, which integrated DNA methylation profiles from 40 CpG sites with clinical parameters including disease activity score (DAS) and RF, effectively distinguished individuals with UA who were predisposed to developing RA within one year, achieving AUCs ranging from 0.85 to 1 (30).
Contemporary studies report promising predictive performance both in identifying at-risk individuals within the general population and in forecasting RA development in patients with UA. During model training, these studies typically identified and retained the features with the greatest impact on the predictions, in order to simplify the model and potentially improve performance and generalizability. More important than performance, however, is the potential for practical clinical application. Future studies will need to examine the generalizability of these models by testing them in populations of multiple ethnicities and regions, and by tracking progression to RA in larger prospective cohorts to assess model accuracy.
3.2 Diagnosis and subtype classification of RA
The diagnostic framework for RA, especially in the context of seronegative RA, is intricate and often obstructed by the absence of potent biomarkers, impeding early detection and management (47). Investigations are thus aimed at the identification of new biomarkers to bridge this gap.
Non-invasive imaging techniques are pivotal in elucidating inflammatory activity and its effects on joint morphology, especially when serological markers are indistinct or inconclusive. These tools are indispensable both for diagnosis and for monitoring treatment efficacy (81). Furthermore, applying ML algorithms to imaging data offers a sophisticated approach to patient classification (82). Üreten K et al. presented a Visual Geometry Group-16 (VGG-16) neural network, adapted by transfer learning, that distinguished RA patients from non-RA patients on hand radiographs with an AUC of 0.97 (31). In another study, ultrasound images of the metacarpophalangeal joints of RA patients were classified with a DenseNet-based deep learning model applied to several regions of interest; the model proved effective in identifying synovial proliferation and in distinguishing healthy from diseased synovium, with AUCs exceeding 0.8 (32). Additionally, hand RGB images and grip force have been used as features in a random forest model that distinguished individuals with RA from control subjects with an AUC of 0.97, thereby offering a supplementary diagnostic tool for RA (33). Image-based predictive models have shown notable performance in research settings, accurately differentiating RA patients from others in various cohorts, thereby contributing to the precision and efficiency of RA diagnosis. These models may facilitate the early detection of abnormal changes within the joints, enabling timely intervention and ultimately delaying the progression of RA. However, their clinical application still faces significant challenges. A primary obstacle is the interpretability of the models.
Owing to the ‘black box’ nature of deep learning models, their decision-making processes are opaque and difficult to comprehend, which may undermine the trust of both physicians and patients in model predictions (83). Several well-known methods can address this limitation: Class Activation Mapping (CAM) highlights the image regions the model attends to (84); Shapley Additive exPlanations (SHAP) quantify the global impact of each feature on the model (85); and Local Interpretable Model-agnostic Explanations (LIME) explain the local prediction process for individual samples (86). Collectively, these tools make the model’s decision-making process easier to understand. Future studies should also involve multi-center collaborations to broaden image collection, with the intent of further refining and generalizing these diagnostic models.
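To show what a Shapley-based explanation computes, the sketch below evaluates exact Shapley values by brute force for a tiny hypothetical model. It is not the SHAP library, which uses faster approximations; the model `toy_model`, its coefficients, the feature names, and the all-zero reference patient are assumptions made purely for illustration, and the brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values: features absent from a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi

# Hypothetical risk model with an interaction between the first two features.
def toy_model(features):
    crp, acpa, age = features
    return 0.5 * crp + 2.0 * acpa + 1.0 * crp * acpa + 0.25 * age

patient = [4.0, 1.0, 2.0]
reference = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, patient, reference)
print(phi)  # approximately [4.0, 4.0, 0.5]
```

The attributions sum to the difference between the prediction for this patient (8.5) and for the reference (0.0), and the interaction term is split equally between the two features involved. Averaging the absolute per-patient values over a cohort is one common way to obtain a global ranking of features.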
In RA, both individual analyses and integrative omics studies have accumulated a vast amount of data, providing insights into the mechanisms of RA from multiple perspectives. Genomics identifies genetic variations associated with RA, revealing potential genetic mechanisms influencing gene expression (87). Epigenetic modifications, including DNA methylation, histone modifications, chromatin remodeling, and non-coding RNA, play crucial roles in maintaining normal gene expression patterns. Epigenomics studies these modifications to reveal gene expression and regulatory mechanisms in RA, offering insights into the diverse molecular processes involved (88). Transcriptomics, by analyzing the variations in gene expression under different conditions, provides a detailed elucidation of which genes are upregulated or downregulated in RA. This process not only involves the regulation at the genetic level but also directly affects the production and function of the corresponding proteins (89). Proteomics provides a comprehensive analysis of protein composition, expression levels, and modification states, elucidating the interactions and connections among proteins that may play key roles in RA inflammation and immune response processes (90). Metabolomics provides insights into the shifts in metabolic states and pathways during the progression of RA. These changes are potentially influenced by alterations in gene and protein activities. Furthermore, metabolites themselves can play a modulatory role, affecting gene transcription and protein expression, thereby forming a complex interplay that influences disease dynamics (91). Host genomic variations significantly influence the composition of the gut microbiota, which can synthesize, regulate, or degrade endogenous small molecules or macromolecules, resulting in metabolic changes. 
Utilizing metagenomics and related techniques reveals the role of gut microbiota in the development of RA by influencing metabolic pathways and modulating the host immune system (92). Omic studies are characterized by the generation of vast, high-dimensional datasets. ML algorithms are critically employed for visualization and processing such information—finding patterns, crafting predictive models, and examining large-scale, multi-omic data to identify biomarkers and pathways implicated in disease progression (93, 94). Existing research has integrated multimodal data and employed various machine learning algorithms to develop high-performance diagnostic models for RA. Key genes highly correlated with RA phenotypes have been identified through the application of weighted gene co-expression network analysis (WGCNA) and differential gene expression (DEG) analysis on RA blood sample microarray datasets. These genes have been deployed as features to assess the performance of six ML models, with five demonstrating commendable efficacy (AUC > 0.85) (34). Through the sourcing of RA patient peripheral blood sample microarray datasets from the GEO database, a platelet-related signature risk score model was formulated, comprised of six genes, using the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm. The model exhibited AUCs of 0.801 and 0.979 across the training and validation sets, respectively (35). Employing the Generalized Matrix Learning Vector Quantization (GMLVQ) method, mRNA expression profiles of cytokines and chemokines from synovial biopsies were analyzed, leading to the identification of two gene sets. These sets were instrumental in generating a model capable of differentiating between various arthritis types, with AUC scores reaching 0.996 and 0.764 for distinguishing diagnosed RA from non-inflammatory cases and early-stage RA from self-remitting arthritis, respectively (36). 
By focusing on the expression of 19 N6-methyladenosine (m6A) methylation regulators, diagnostic models have been established to separate RA from non-RA conditions. A subset of these regulators, particularly IGF2BP3 and YTHDC2, demonstrated accuracies and AUCs exceeding 0.8 across most ML models, indicating the potential diagnostic importance of m6A methylation profiles (37). A multi-variable classification model, incorporating 26 metabolites and lipids, was devised utilizing three ML algorithms. The logistic regression model, in particular, stood out for its ability to differentiate seropositive and seronegative RA from normal controls within an independent validation cohort, securing an AUC of 0.91, thus showcasing that a holistic metabolomic and lipidomic approach grounded in Liquid Chromatography-Mass Spectrometry (LC-MS) can effectively segregate RA cases (38). Serum antigens were analyzed in patient cohorts with RA, osteoarthritis (OA), and healthy controls. Subsequently, distinct biomarker sets were identified for the differentiation of RA, ACPA-positive RA, and ACPA-negative RA using feature selection through the Random Forest algorithm. The model demonstrated exceptional performance with AUC values of 0.9949, 0.9913, and 1.0, respectively, establishing a proteomics-based diagnostic model for RA (39). Furthermore, leveraging metagenomic data to predict the microbiomic characteristics of the gut in autoimmune diseases has been demonstrated to discriminate between various types of autoimmune disorders (40).
Histopathology is a fundamental pillar of diagnostic confirmation and the definitive standard for verifying numerous diseases (95). Overlapping symptoms in certain pathologies may obscure the principal etiology of articular manifestations; in such instances, tissue biopsy, particularly of synovial tissue, proves invaluable. Following total knee arthroplasty (TKA), synovial samples from 147 OA and 60 RA individuals were stained with hematoxylin and eosin (H&E). A random forest algorithm integrating pathologist-derived scores with computer vision-generated cell density measures produced an optimal model for discriminating OA from RA, with an AUC of 0.91 (42). This serves as a potent discriminative tool for RA assessment. Orange et al. used consensus clustering of gene expression data from synovial tissues of patients with RA to identify three distinct synovial subtypes: high-inflammatory, low-inflammatory, and mixed. They subsequently employed a support vector machine to distinguish between these subtypes based on histological features, achieving AUC values of 0.88, 0.71, and 0.59, respectively (43).
Despite the high performance of ML-derived predictive models for RA diagnosis, concerns about model overfitting due to limited sample sizes, which may exaggerate effect sizes, cannot be overlooked. Independent evaluation of the research methodology, data processing, and outcomes by an external party helps ensure the accuracy and reliability of the findings. Validation of these models in diverse datasets, supplemented by molecular biology experimentation, is imperative for evaluating true diagnostic merit. Predictive models relying on histopathological data encounter additional challenges, including the necessity for manual feature annotation by pathologists and the invasiveness of the procedure, compounded by technical and sample handling issues. External validation is a critical quality control measure, ensuring that model utility and accuracy in diagnosing RA reflect true clinical relevance and potential for widespread application. The diagnosis of RA extends beyond separating RA from healthy subjects or OA patients. Future investigations must address the capacity of model-derived markers to distinguish seronegative RA from other inflammatory arthritides, such as psoriatic arthritis, reactive arthritis, or spondyloarthritis. At the same time, guarding against confounding variables and maintaining diversity within patient cohorts are essential to make the models broadly applicable.
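One common route to the overestimation discussed above is performing feature selection on the full dataset before cross-validation, so that the held-out samples have already influenced the model. The sketch below illustrates this on pure noise, where no feature is truly predictive. It is a simulation under stated assumptions: a simple nearest-centroid classifier, leave-one-out accuracy in place of AUC for brevity, and arbitrary choices of 40 samples, 500 features, and 10 selected features.

```python
import random

def top_features(X, y, k):
    """Rank features by absolute difference in class means; return the top k indices."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]

    def gap(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(X[0])), key=gap, reverse=True)[:k]

def nearest_centroid_predict(X, y, features, x_new):
    """Assign x_new to the class whose mean (on the chosen features) is closer."""
    dists = {}
    for label in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dists[label] = sum((x_new[j] - c) ** 2 for j, c in zip(features, centroid))
    return min(dists, key=dists.get)

def loo_accuracy(X, y, k, select_inside_fold):
    """Leave-one-out accuracy, with feature selection inside or outside the loop."""
    leaked = top_features(X, y, k)  # uses every sample, including held-out ones
    correct = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = top_features(X_train, y_train, k) if select_inside_fold else leaked
        correct += nearest_centroid_predict(X_train, y_train, feats, X[i]) == y[i]
    return correct / len(X)

random.seed(0)
n_samples, n_features = 40, 500
# Pure noise: no feature carries any information about the label.
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [0, 1] * (n_samples // 2)

leaky = loo_accuracy(X, y, k=10, select_inside_fold=False)
honest = loo_accuracy(X, y, k=10, select_inside_fold=True)
print(leaky, honest)
```

With selection performed outside the loop, the estimate is expected to look far better than chance even though the data contain no signal; repeating the selection inside each fold removes that optimism. The same reasoning motivates the external validation called for above.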
3.3 Prediction of disease activity and imaging progression in RA
Radiographic deterioration in RA is characterized by the degree of articular damage and the presence of distinct lesions such as joint space narrowing, bone erosion, and osteoporosis, as revealed through diagnostic imaging modalities including X-rays, magnetic resonance imaging, or computed tomography scans (96). The quantification and prognostication of structural joint impairment traditionally hinge on clinical expertise, underscoring the necessity for an automated, bias-free evaluation method. A study utilizing SVM modeling on cohorts comprising 374 Korean and 399 North American patients with incipient RA identified SNPs correlated with radiographic progression. An integrated model encompassing SNPs with clinical parameters exhibited optimal performance, yielding a mean ten-fold cross-validation AUC of 0.78, providing a more satisfactory distinction between severe and non-severe progression (44).
Radiological damage bears a significant association with disease activity in RA, with heightened activity posing an increased risk for osseous impairment. CNNs trained on ultrasound imagery of RA joints, have facilitated the automatic grading of disease activity, achieving an overall classification accuracy of 83.9% (45). Vodencarevic et al. used data from 135 consultations with 41 RA patients to predict flare incidents during biologic disease-modifying antirheumatic drugs (DMARDs) tapering in remission. They combined multiple ML models to achieve an AUC of 0.81 (46). Furthermore, baseline serum proteomics from 130 stable RA patients in clinical remission was analyzed for biomarkers predictive of future disease flares, employing LASSO and eXtreme Gradient Boosting (XGBoost) algorithms to construct predictive models. The XGBoost model exhibited superior performance in differentiating between relapsed and non-relapsed patients with an AUC of 0.80 (47).
The large volume of patient and clinical information held in electronic medical records (EMR) and electronic health records (EHR) constitutes a substantial body of data ripe for investigation (97, 98). Nonetheless, hindrances such as imbalances in the number of records per patient, omissions of pivotal information, and variability in patient conditions and therapeutic outcomes over time contribute to the complex temporal nature of the data (48). Conventional ML techniques are constrained in data pre-processing, time-series analysis, and the processing of intricate relationships (99). Deep learning models applied to structured EHR data have been used to forecast disease activity at subsequent outpatient rheumatology consultations; the model trained on the UH cohort achieved an AUC of 0.91 in internal validation and 0.74 in external cohort testing (48). Feldman et al. sought to improve the precision of RA disease activity evaluation by integrating electronic medical records and claims data, achieving an AUC of 0.76 in discriminating high/moderate from low disease activity/remission (49). Chandran et al. used the prescription of biologic agents or tofacitinib as a surrogate indicator of disease severity; their model predicted both current and future disease activity and was validated across several databases with AUCs exceeding 0.7 (50).
The aforementioned results substantiate the viability of employing routinely documented clinical and laboratory data to assess and forecast disease activity in RA. With progressive advancements in information technology, an extensive array of data has become accessible, prompting researchers to explore ML methodologies for extracting RA patient records from electronic health record data, thereby enabling the study of large populations at minimal expense. Algorithms trained via ML are increasingly applied to EMR for clinical investigations. These algorithms function by detecting specifiable patterns in the data associated with RA, yet systematic disparities in EMR data quality present hurdles for model generalizability. Moreover, high-quality investigations remain limited, and the dependability and transferability of the relevant ML methods remain largely undetermined, rendering periodic evaluation of algorithm performance imperative. The current research trend involves the utilization of thousands of digitally annotated images obtained from large-scale observational studies, clinical trials, and electronic medical records, along with clinical data, to automatically classify and quantify the extent of joint damage and activity scores in RA using ML algorithms (100–102).
3.4 Prediction of RA treatment response
In the realm of RA therapeutics, a plethora of options including nonsteroidal anti-inflammatory drugs (NSAIDs), glucocorticoids, conventional synthetic DMARDs, biologic DMARDs, and oral small molecules have been made available (103). The selection of appropriate treatments continues to challenge clinicians owing to the vast range of alternatives and the prevalent trial-and-error approach in therapeutic prescription, exacerbated by a lack of comprehensive knowledge regarding drug efficacy and safety across distinct patient demographics (53).
Methotrexate (MTX) is the standard first-line therapy in RA treatment strategies (104). Artacho et al. investigated whether inter-individual differences in the gut microbiome could serve as predictive markers for MTX efficacy in new-onset RA. Fecal samples from 26 new-onset RA patients, collected prior to MTX treatment, were analyzed using 16S ribosomal RNA (16S rRNA) and shotgun sequencing. A predictive model subsequently built with random forests showed that the response to MTX treatment at 4 months could be anticipated, with an AUC of 0.84, based on characterization of the gut microbial community (51). Additional research applied ML algorithms to clinical and biological data from 493 and 239 patients across two cohorts to predict MTX treatment response at 9 months. Notably, the Light Gradient Boosting Machine (LightGBM) model achieved AUCs of 0.73 and 0.72 in the training and external validation sets, respectively (52). Lim et al. analyzed exome sequencing data from 349 RA patients and predicted treatment response to MTX using six ML algorithms. They identified 95 genetic factors and 5 non-genetic factors that influenced response. The predictions had strong performance, with AUCs between 0.776 and 0.828 in the test set (53). Plant et al. performed gene expression profiling on whole blood samples taken from RA patients before and 4 weeks after the start of MTX treatment to predict treatment response at 6 months. Application of an L2 regularized logistic regression yielded an AUC of 0.78 (54). The development of these predictive models has contributed significantly towards identifying patients who are likely to respond favorably to MTX treatment and those who may not benefit from it.
Anti-tumor necrosis factor (anti-TNF) agents have been established as pivotal second-line therapeutic agents following methotrexate. A prospective multicenter study recruited 104 RA patients and 29 healthy donors to discover predictive biomarkers for anti-TNF treatment using ML. A hybrid model combining clinical and molecular variables achieved a high AUC value of 0.91 (55). The DREAM RA Responder Challenge introduced a novel approach to predicting anti-TNF treatment response by proposing an optimal model that incorporates Gaussian Process Regression (GPR) and integrates demographic, clinical, and genetic markers. This model accurately predicts the Disease Activity Score in patients 24 months post-baseline assessment and categorizes treatment response according to the EULAR response criteria, effectively identifying non-responders to anti-TNF therapy with an AUC of 0.6 in cross-validation data (56). Kim et al. utilized 11 datasets containing 256 synovial tissue samples, integrating RA-associated pathway activation scores and four ML types, and found that the SVM model performed the best, with an AUC of 0.87 using the pathway-driven model and an AUC of 0.9 using the DEG-driven model (57).
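For readers unfamiliar with how a predicted Disease Activity Score is turned into a response category, the sketch below encodes the commonly cited DAS28-based EULAR response criteria (good, moderate, none). It is an illustration only: whether the cited challenge models applied exactly these cut-offs is an assumption, and the function and variable names are hypothetical.

```python
def eular_response(das28_baseline, das28_followup):
    """Classify response from the DAS28 improvement and the DAS28 reached at follow-up.

    Thresholds follow the commonly cited EULAR response criteria:
      improvement <= 0.6                  -> none
      improvement > 1.2                   -> good if follow-up DAS28 <= 3.2, else moderate
      improvement > 0.6 and <= 1.2        -> moderate if follow-up DAS28 <= 5.1, else none
    """
    improvement = das28_baseline - das28_followup
    if improvement <= 0.6:
        return "none"
    if improvement > 1.2:
        return "good" if das28_followup <= 3.2 else "moderate"
    return "moderate" if das28_followup <= 5.1 else "none"

def is_non_responder(das28_baseline, das28_followup):
    """Binary label of the kind used when reporting an AUC for non-response."""
    return eular_response(das28_baseline, das28_followup) == "none"

# Hypothetical patients: (baseline DAS28, follow-up DAS28)
patients = [(5.6, 3.0), (5.6, 4.0), (6.5, 5.6), (4.0, 3.8)]
for baseline, followup in patients:
    print(baseline, followup, eular_response(baseline, followup))
# prints: good, moderate, none, none (one category per patient, in order)
```

Collapsing the three categories into responder versus non-responder is what allows a regression output, such as a predicted score, to be evaluated with a classification metric like AUC.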
Recent research has emphasized the potential benefits of integrating diverse datasets for the purpose of treatment decision-making. ML algorithms have demonstrated efficacy in enhancing the precision of response prediction for TNF inhibitors and MTX. Furthermore, ML methodologies are being increasingly utilized in forecasting treatment responses to a range of other biologic therapies (61–64). Clinical data may be limited by trial design, including inclusion and exclusion criteria. Using deep learning technology for cluster analysis on RA patients has revealed the connection between patient characteristics and treatment response (105). Advancements in spatial omics technologies enable a comprehensive and spatially intact analysis of synovial tissue in RA patients. This approach allows for precise localization of cells, exploration of cellular interactions, assessment of cell type distributions, and identification of disease-associated molecular markers (106). Integrating traditional multi-omics with spatial data, spatial multi-omics elucidates the complexity and dynamics of biological processes across various levels, including their interactions and influences on each other. This approach deepens our understanding of the pathological mechanisms of RA and enhances our knowledge of its spatial heterogeneity (107). The biopsy-driven RA randomized clinical trial (R4RA), which utilizes spatial omics to create synovial biopsy gene maps, provides a paradigm for predicting drug treatment responses and refining therapeutic strategies. This is crucial for achieving personalized medicine and optimizing treatment outcomes. Despite some progress, spatial omics in RA research is still in its early stages. Numerous challenges remain, such as high costs, high demands on sample handling, patient acceptance, ethical issues, and the need for advanced computational tools for data integration (108).
Overcoming these challenges will be crucial for developing accurate, interpretable, and clinically applicable predictive models. In summary, while opportunities exist for refining the accuracy of these predictions, progress is evident in this area of study. In the future, using larger and more comprehensive datasets, appropriate algorithms, and parameter optimization methods, together with improved model features and validation against independent cohorts, may further improve the discriminative power of predictive models.
3.5 Prediction of comorbidities related to RA
ML is also gaining attention in the prediction of comorbidities associated with RA. Existing research has focused primarily on the identification of risk factors for osteoporosis (65, 66), assessment of cardiovascular risk (67, 68), and the prediction of interstitial lung disease development (69) in individuals with RA. Current models pertaining to comorbidities are limited in both quantity and accuracy, with constraints stemming from various sources, notably the scarcity of comprehensive comorbidity data within RA patient cohort datasets. Furthermore, there is significant variability in data quality across different cohorts. To overcome these obstacles, future research should prioritize the accumulation of larger, more robust datasets and improve integration among diverse data sources. At the same time, algorithms with broader applicability need to be developed to enable the use of ML in predicting complications associated with RA.
4 Conclusion and outlook
Integrating data from diverse sources allows ML models to yield more comprehensive and precise predictions for the diagnosis and treatment outcomes of RA. However, more focus and effort are needed to create predictive models for comorbidities related to RA. Recent research has demonstrated the potential of multimodal learning to improve clinical prediction accuracy. Identifying the best-performing model under specific conditions often necessitates an extensive comparative analysis. Beyond frequently used metrics such as AUC, accuracy, sensitivity, specificity, and F1 score, such comparisons should also consider the use of cross-validation, the statistical tests applied, the model’s computational cost, and data requirements and accessibility. Efforts should be made to improve the clinical operability of models, utilize external datasets from diverse origins for validation, assess the model’s generalizability, monitor its long-term performance, and evaluate its strengths and weaknesses through multidimensional approaches rather than relying on a single performance metric. Although ML models have demonstrated impressive predictive performance in research settings, it is imperative to establish their practicality and effectiveness in real-world clinical scenarios. To cultivate trust and acceptance among medical practitioners, it is essential to enhance the interpretability of these models. This can be achieved by prioritizing simplicity in experimental design or by employing tools that enhance model interpretability. Last but not least, the privacy and ethical implications of big biological data should be carefully considered, and patient privacy should be protected.
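The point about not relying on a single performance metric can be made concrete with a short calculation. The sketch below computes accuracy, sensitivity, specificity, precision, and F1 score from binary labels; the data are invented for illustration, and the helper names (`safe_div`, `classification_metrics`) are hypothetical.

```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (metric undefined)."""
    return num / den if den else 0.0

def classification_metrics(y_true, y_pred):
    """Compute common metrics from binary labels (1 = condition present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "precision": safe_div(tp, tp + fp),
        "f1": safe_div(2 * tp, 2 * tp + fp + fn),
    }

# Invented example: 5 of 100 patients have the outcome, and a trivial
# "model" predicts that nobody does.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
trivial = classification_metrics(y_true, y_pred)
print(trivial)
# accuracy is 0.95 even though sensitivity and F1 are both 0.0
```

In an imbalanced cohort such as this one, accuracy alone would suggest a strong model, whereas sensitivity and F1 show that no patient with the outcome is detected.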
Author contributions
YMS: Data curation, Visualization, Writing – original draft. MZ: Data curation, Formal analysis, Writing – review & editing. CC: Data curation, Formal analysis, Writing – review & editing. PJ: Data curation, Formal analysis, Writing – review & editing. KW: Data curation, Formal analysis, Writing – review & editing. JZ: Data curation, Formal analysis, Writing – review & editing. YS: Data curation, Formal analysis, Writing – review & editing. YZ: Data curation, Formal analysis, Writing – review & editing. FZ: Data curation, Formal analysis, Writing – review & editing. XL: Data curation, Formal analysis, Writing – review & editing. SG: Conceptualization, Writing – review & editing. FW: Supervision, Writing – review & editing. DH: Funding acquisition, Supervision, Writing – review & editing.
Funding
The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This work was funded by the National Natural Science Funds of China (82074234, 82004166 and 82071756), Shanghai Chinese Medicine Development Office, National Administration of Traditional Chinese Medicine, Regional Chinese Medicine (Specialist) Diagnosis and Treatment Center Construction Project-Rheumatology, State Administration of Traditional Chinese Medicine, Shanghai Municipal Health Commission, East China Region-based Chinese and Western Medicine Joint Disease Specialist Alliance, and Shanghai He Dongyi Famous Chinese Medicine Studio Construction Project (SHGZS-202220).
Acknowledgments
Figure 1 was created by Figdraw (www.figdraw.com).
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
References
1. Cross M, Smith E, Hoy D, Carmona L, Wolfe F, Vos T, et al. The global burden of rheumatoid arthritis: estimates from the global burden of disease 2010 study. Ann Rheum Dis. (2014) 73:1316–22. doi: 10.1136/annrheumdis-2013-204627
2. Johnson TM, Sayles HR, Baker JF, George MD, Roul P, Zheng C, et al. Investigating changes in disease activity as a mediator of cardiovascular risk reduction with methotrexate use in rheumatoid arthritis. Ann Rheum Dis. (2021) 80:1385–92. doi: 10.1136/annrheumdis-2021-220125
3. Redente EF, Aguilar MA, Black BP, Edelman BL, Bahadur AN, Humphries SM, et al. Nintedanib reduces pulmonary fibrosis in a model of rheumatoid arthritis-associated interstitial lung disease. Am J Physiol Lung Cell Mol Physiol. (2018) 314:L998–L1009. doi: 10.1152/ajplung.00304.2017
4. Ng KJ, Huang KY, Tung CH, Hsu BB, Wu CH, Koo M, et al. Modified rheumatoid arthritis impact of disease (RAID) score, a potential tool for depression and anxiety screening for rheumatoid arthritis. Joint Bone Spine. (2019) 86:805–7. doi: 10.1016/j.jbspin.2019.04.007
5. Libbrecht MW, Noble WS. Machine learning applications in genetics and genomics. Nat Rev Genet. (2015) 16:321–32. doi: 10.1038/nrg3920
6. Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. (2015) 349:255–60. doi: 10.1126/science.aaa8415
7. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst. (2014) 2:3. doi: 10.1186/2047-2501-2-3
8. Butler KT, Davies DW, Cartwright H, Isayev O, Walsh A. Machine learning for molecular and materials science. Nature. (2018) 559:547–55. doi: 10.1038/s41586-018-0337-2
9. Coffey CM, Crowson CS, Myasoedova E, Matteson EL, Davis JM 3rd. Evidence of diagnostic and treatment delay in seronegative rheumatoid arthritis: missing the window of opportunity. Mayo Clin Proc. (2019) 94:2241–8. doi: 10.1016/j.mayocp.2019.05.023
10. Conigliaro P, Triggianese P, De Martino E, Fonti GL, Chimenti MS, Sunzini F, et al. Challenges in the treatment of rheumatoid arthritis. Autoimmun Rev. (2019) 18:706–13. doi: 10.1016/j.autrev.2019.05.007
11. Zhao J, Guo S, Schrodi SJ, He D. Molecular and cellular heterogeneity in rheumatoid arthritis: mechanisms and clinical implications. Front Immunol. (2021) 12:790122. doi: 10.3389/fimmu.2021.790122
12. Lo-Ciganic WH, Huang JL, Zhang HH, Weiss JC, Wu Y, Kwoh CK, et al. Evaluation of machine-learning algorithms for predicting opioid overdose risk among medicare beneficiaries with opioid prescriptions. JAMA Netw Open. (2019) 2:e190968. doi: 10.1001/jamanetworkopen.2019.0968
13. Warnat-Herresthal S, Schultze H, Shastry KL, Manamohan S, Mukherjee S, Garg V, et al. Swarm Learning for decentralized and confidential clinical machine learning. Nature. (2021) 594:265–70. doi: 10.1038/s41586-021-03583-3
14. Goodswen SJ, Barratt JLN, Kennedy PJ, Kaufer A, Calarco L, Ellis JT. Machine learning and applications in microbiology. FEMS Microbiol Rev. (2021) 45:fuab015. doi: 10.1093/femsre/fuab015
15. Jiang T, Gradus JL, Rosellini AJ. Supervised machine learning: A brief primer. Behav Ther. (2020) 51:675–87. doi: 10.1016/j.beth.2020.05.002
16. Gitto S, Cuocolo R, Annovazzi A, Anelli V, Acquasanta M, Cincotta A, et al. CT radiomics-based machine learning classification of atypical cartilaginous tumours and appendicular chondrosarcomas. EBioMedicine. (2021) 68:103407. doi: 10.1016/j.ebiom.2021.103407
17. Kulin M, Fortuna C, De Poorter E, Deschrijver D, Moerman I. Data-driven design of intelligent wireless networks: an overview and tutorial. Sensors (Basel). (2016) 16:790. doi: 10.3390/s16060790
18. Williamson DJ, Burn GL, Simoncelli S, Griffié J, Peters R, Davis DM, et al. Machine learning for cluster analysis of localization microscopy data. Nat Commun. (2020) 11:1493. doi: 10.1038/s41467-020-15293-x
19. Gao T, Lu W. Machine learning toward advanced energy storage devices and systems. iScience. (2020) 24:101936. doi: 10.1016/j.isci.2020.101936
20. Bajić F, Orel O, Habijan M. A multi-purpose shallow convolutional neural network for chart images. Sensors (Basel). (2022) 22:7695. doi: 10.3390/s22207695
21. Schwendicke F, Samek W, Krois J. Artificial intelligence in dentistry: chances and challenges. J Dent Res. (2020) 99:769–74. doi: 10.1177/0022034520915714
22. Peng H, Fan Y. Feature selection by optimizing a lower bound of conditional mutual information. Inf Sci (N Y). (2017) 418-419:652–67. doi: 10.1016/j.ins.2017.08.036
23. Yang L, Jiang H, Ding X, Liao Z, Wei M, Li J, et al. Modulation of sleep architecture by whole-body static magnetic exposure: A study based on EEG-based automatic sleep staging. Int J Environ Res Public Health. (2022) 19:741. doi: 10.3390/ijerph19020741
24. Tasci E, Jagasia S, Zhuge Y, Sproull M, Cooley Zgela T, Mackey M, et al. RadWise: A rank-based hybrid feature weighting and selection method for proteomic categorization of chemoirradiation in patients with glioblastoma. Cancers (Basel). (2023) 15:2672. doi: 10.3390/cancers15102672
25. Liang Y, Zhang ZQ, Liu NN, Wu YN, Gu CL, Wang YL. MAGCNSE: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinf. (2022) 23:189. doi: 10.1186/s12859-022-04715-w
26. Chen Y, Luo M, Cheng Y, Huang Y, He Q. A nomogram to predict prolonged stay of obesity patients with sepsis in ICU: Relevancy for predictive, personalized, preventive, and participatory healthcare strategies. Front Public Health. (2022) 10:944790. doi: 10.3389/fpubh.2022.944790
27. Lim AJW, Tyniana CT, Lim LJ, Tan JWL, Koh ET, TTSH Rheumatoid Arthritis Study Group, et al. Robust SNP-based prediction of rheumatoid arthritis through machine-learning-optimized polygenic risk score. J Transl Med. (2023) 21:92. doi: 10.1186/s12967-023-03939-5
28. Lufkin L, Budišić M, Mondal S, Sur S. A bayesian model to analyze the association of rheumatoid arthritis with risk factors and their interactions. Front Public Health. (2021) 9:693830. doi: 10.3389/fpubh.2021.693830
29. Pratt AG, Swan DC, Richardson S, Wilson G, Hilkens CM, Young DA, et al. A CD4 T cell gene signature for early rheumatoid arthritis implicates interleukin 6-mediated STAT3 signalling, particularly in anti-citrullinated peptide antibody-negative disease. Ann Rheum Dis. (2012) 71:1374–81. doi: 10.1136/annrheumdis-2011-200968
30. de la Calle-Fabregat C, Niemantsverdriet E, Cañete JD, Li T, van der Helm-van Mil AHM, Rodríguez-Ubreva J, et al. Prediction of the progression of undifferentiated arthritis to rheumatoid arthritis using DNA methylation profiling. Arthritis Rheumatol. (2021) 73:2229–39. doi: 10.1002/art.41885
31. Üreten K, Maraş HH. Automated classification of rheumatoid arthritis, osteoarthritis, and normal hand radiographs with deep learning methods. J Digit Imaging. (2022) 35:193–9. doi: 10.1007/s10278-021-00564-w
32. Wu M, Wu H, Wu L, Cui C, Shi S, Xu J, et al. A deep learning classification of metacarpophalangeal joints synovial proliferation in rheumatoid arthritis by ultrasound images. J Clin Ultrasound. (2022) 50:296–301. doi: 10.1002/jcu.23143
33. Alarcón-Paredes A, Guzmán-Guzmán IP, Hernández-Rosales DE, Navarro-Zarza JE, Cantillo-Negrete J, Cuevas-Valencia RE, et al. Computer-aided diagnosis based on hand thermal, RGB images, and grip force using artificial intelligence as screening tool for rheumatoid arthritis in women. Med Biol Eng Comput. (2021) 59:287–300. doi: 10.1007/s11517-020-02294-7
34. Xiao J, Wang R, Cai X, Ye Z. Coupling of co-expression network analysis and machine learning validation unearthed potential key genes involved in rheumatoid arthritis. Front Genet. (2021) 12:604714. doi: 10.3389/fgene.2021.604714
35. Liu Y, Jiang H, Kang T, Shi X, Liu X, Li C, et al. Platelets-related signature based diagnostic model in rheumatoid arthritis using WGCNA and machine learning. Front Immunol. (2023) 14:1204652. doi: 10.3389/fimmu.2023.1204652
36. Yeo L, Adlard N, Biehl M, Juarez M, Smallie T, Snow M, et al. Expression of chemokines CXCL4 and CXCL7 by synovial macrophages defines an early stage of rheumatoid arthritis. Ann Rheum Dis. (2016) 75:763–71. doi: 10.1136/annrheumdis-2014-206921
37. Geng Q, Cao X, Fan D, Gu X, Zhang Q, Zhang M, et al. Diagnostic gene signatures and aberrant pathway activation based on m6A methylation regulators in rheumatoid arthritis. Front Immunol. (2022) 13:1041284. doi: 10.3389/fimmu.2022.1041284
38. Luan H, Gu W, Li H, Wang Z, Lu L, Ke M, et al. Serum metabolomic and lipidomic profiling identifies diagnostic biomarkers for seropositive and seronegative rheumatoid arthritis patients. J Transl Med. (2021) 19:500. doi: 10.1186/s12967-021-03169-7
39. Han P, Hou C, Zheng X, Cao L, Shi X, Zhang X, et al. Serum antigenome profiling reveals diagnostic models for rheumatoid arthritis. Front Immunol. (2022) 13:884462. doi: 10.3389/fimmu.2022.884462
40. Volkova A, Ruggles KV. Predictive metagenomic analysis of autoimmune disease identifies robust autoimmunity and disease specific microbial signatures. Front Microbiol. (2021) 12:621310. doi: 10.3389/fmicb.2021.621310
41. Ormseth MJ, Solus JF, Sheng Q, Ye F, Wu Q, Guo Y, et al. Development and validation of a microRNA panel to differentiate between patients with rheumatoid arthritis or systemic lupus erythematosus and controls. J Rheumatol. (2020) 47:188–96. doi: 10.3899/jrheum.181029
42. Mehta B, Goodman S, DiCarlo E, Jannat-Khah D, Gibbons JAB, Otero M, et al. Machine learning identification of thresholds to discriminate osteoarthritis and rheumatoid arthritis synovial inflammation. Arthritis Res Ther. (2023) 25:31. doi: 10.1186/s13075-023-03008-8
43. Orange DE, Agius P, DiCarlo EF, Robine N, Geiger H, Szymonifka J, et al. Identification of three rheumatoid arthritis disease subtypes by machine learning integration of synovial histologic features and RNA sequencing data. Arthritis Rheumatol. (2018) 70:690–701. doi: 10.1002/art.40428
44. Joo YB, Kim Y, Park Y, Kim K, Ryu JA, Lee S, et al. Biological function integrated prediction of severe radiographic progression in rheumatoid arthritis: a nested case control study. Arthritis Res Ther. (2017) 19:244. doi: 10.1186/s13075-017-1414-x
45. Christensen ABH, Just SA, Andersen JKH, Savarimuthu TR. Applying cascaded convolutional neural network design further enhances automatic scoring of arthritis disease activity on ultrasound images from rheumatoid arthritis patients. Ann Rheum Dis. (2020) 79:1189–93. doi: 10.1136/annrheumdis-2019-216636
46. Vodencarevic A, Tascilar K, Hartmann F, Reiser M, Hueber AJ, Haschka J, et al. Advanced machine learning for predicting individual risk of flares in rheumatoid arthritis patients tapering biologic drugs. Arthritis Res Ther. (2021) 23:67. doi: 10.1186/s13075-021-02439-5
47. O'Neil LJ, Hu P, Liu Q, Islam MM, Spicer V, Rech J, et al. Proteomic approaches to defining remission and the risk of relapse in rheumatoid arthritis. Front Immunol. (2021) 12:729681. doi: 10.3389/fimmu.2021.729681
48. Norgeot B, Glicksberg BS, Trupin L, Lituiev D, Gianfrancesco M, Oskotsky B, et al. Assessment of a deep learning model based on electronic health record data to forecast clinical outcomes in patients with rheumatoid arthritis. JAMA Netw Open. (2019) 2:e190606. doi: 10.1001/jamanetworkopen.2019.0606
49. Feldman CH, Yoshida K, Xu C, Frits ML, Shadick NA, Weinblatt ME, et al. Supplementing claims data with electronic medical records to improve estimation and classification of rheumatoid arthritis disease activity: A machine learning approach. ACR Open Rheumatol. (2019) 1:552–9. doi: 10.1002/acr2.11068
50. Chandran U, Reps J, Stang PE, Ryan PB. Inferring disease severity in rheumatoid arthritis using predictive modeling in administrative claims databases. PloS One. (2019) 14:e0226255. doi: 10.1371/journal.pone.0226255
51. Artacho A, Isaac S, Nayak R, Flor-Duro A, Alexander M, Koo I, et al. The pretreatment gut microbiome is associated with lack of response to methotrexate in new-onset rheumatoid arthritis. Arthritis Rheumatol. (2021) 73:931–42. doi: 10.1002/art.41622
52. Duquesne J, Bouget V, Cournède PH, Fautrel B, Guillemin F, de Jong PHP, et al. Machine learning identifies a profile of inadequate responder to methotrexate in rheumatoid arthritis. Rheumatol (Oxford). (2023) 62:2402–9. doi: 10.1093/rheumatology/keac645
53. Lim AJW, Lim LJ, Ooi BNS, Koh ET, Tan JWL, TTSH RA Study Group, et al. Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients. EBioMedicine. (2022) 75:103800. doi: 10.1016/j.ebiom.2021.103800
54. Plant D, Maciejewski M, Smith S, Nair N, Maximising Therapeutic Utility in Rheumatoid Arthritis Consortium, the RAMS Study Group, Hyrich K, et al. Profiling of gene expression biomarkers as a classifier of methotrexate nonresponse in patients with rheumatoid arthritis. Arthritis Rheumatol. (2019) 71:678–84. doi: 10.1002/art.40810
55. Luque-Tévar M, Perez-Sanchez C, Patiño-Trives AM, Barbarroja N, Arias de la Rosa I, Abalos-Aguilera MC, et al. Integrative clinical, molecular, and computational analysis identify novel biomarkers and differential profiles of anti-TNF response in rheumatoid arthritis. Front Immunol. (2021) 12:631662. doi: 10.3389/fimmu.2021.631662
56. Guan Y, Zhang H, Quang D, Wang Z, Parker SCJ, Pappas DA, et al. Machine learning to predict anti-tumor necrosis factor drug responses of rheumatoid arthritis patients by integrating clinical and genetic markers. Arthritis Rheumatol. (2019) 71:1987–96. doi: 10.1002/art.41056
57. Kim KJ, Kim M, Adamopoulos IE, Tagkopoulos I. Compendium of synovial signatures identifies pathologic characteristics for predicting treatment response in rheumatoid arthritis patients. Clin Immunol. (2019) 202:1–10. doi: 10.1016/j.clim.2019.03.002
58. Miyoshi F, Honne K, Minota S, Okada M, Ogawa N, Mimura T. A novel method predicting clinical response using only background clinical data in RA patients before treatment with infliximab. Mod Rheumatol. (2016) 26:813–6. doi: 10.3109/14397595.2016.1168536
59. Yoosuf N, Maciejewski M, Ziemek D, Jelinsky SA, Folkersen L, Müller M, et al. Early prediction of clinical response to anti-TNF treatment using multi-omics and machine learning in rheumatoid arthritis. Rheumatol (Oxford). (2022) 61:1680–9. doi: 10.1093/rheumatology/keab521
60. Bouget V, Duquesne J, Hassler S, Cournède PH, Fautrel B, Guillemin F, et al. Machine learning predicts response to TNF inhibitors in rheumatoid arthritis: results on the ESPOIR and ABIRISK cohorts. RMD Open. (2022) 8:e002442. doi: 10.1136/rmdopen-2022-002442
61. Rivellese F, Surace AEA, Goldmann K, Sciacca E, Çubuk C, Giorli G, et al. Rituximab versus tocilizumab in rheumatoid arthritis: synovial biopsy-based biomarker analysis of the phase 4 R4RA randomized trial. Nat Med. (2022) 28:1256–68. doi: 10.1038/s41591-022-01789-0
62. Koo BS, Eun S, Shin K, Yoon H, Hong C, Kim DH, et al. Machine learning model for identifying important clinical features for predicting remission in patients with rheumatoid arthritis treated with biologics. Arthritis Res Ther. (2021) 23:178. doi: 10.1186/s13075-021-02567-y
63. Lee S, Kang S, Eun Y, Won HH, Kim H, Lee J, et al. Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis. Arthritis Res Ther. (2021) 23:254. doi: 10.1186/s13075-021-02635-3
64. Novella-Navarro M, Benavent D, Ruiz-Esquide V, Tornero C, Díaz-Almirón M, Chacur CA, et al. Predictive model to identify multiple failure to biological therapy in patients with rheumatoid arthritis. Ther Adv Musculoskelet Dis. (2022) 14:1759720X221124028. doi: 10.1177/1759720X221124028
65. Chen R, Huang Q, Chen L. Development and validation of machine learning models for prediction of fracture risk in patients with elderly-onset rheumatoid arthritis. Int J Gen Med. (2022) 15:7817–29. doi: 10.2147/IJGM.S380197
66. Lee C, Joo G, Shin S, Im H, Moon KW. Prediction of osteoporosis in patients with rheumatoid arthritis using machine learning. Sci Rep. (2023) 13:21800. doi: 10.1038/s41598-023-48842-7
Keywords: ML, rheumatoid arthritis, precision medicine, diagnosis, treatment
Citation: Shi Y, Zhou M, Chang C, Jiang P, Wei K, Zhao J, Shan Y, Zheng Y, Zhao F, Lv X, Guo S, Wang F and He D (2024) Advancing precision rheumatology: applications of machine learning for rheumatoid arthritis management. Front. Immunol. 15:1409555. doi: 10.3389/fimmu.2024.1409555
Received: 30 March 2024; Accepted: 24 May 2024; Published: 10 June 2024.
Edited by:
Xu-jie Zhou, Peking University, China
Reviewed by:
Hiufung Yip, Hong Kong Baptist University, Hong Kong SAR, China
Miha Lavric, University of Maribor, Slovenia
Copyright © 2024 Shi, Zhou, Chang, Jiang, Wei, Zhao, Shan, Zheng, Zhao, Lv, Guo, Wang and He. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Dongyi He, dongyihe@medmail.com.cn; Fubo Wang, wangfubo@gxmu.edu.cn