ORIGINAL RESEARCH article

Front. Microbiol.

Sec. Systems Microbiology

Volume 16 - 2025 | doi: 10.3389/fmicb.2025.1584360

This article is part of the Research Topic: Artificial Intelligence and mNGS in Pathogenic Microorganism Research

Genome-wide expression in human whole blood for diagnosis of latent tuberculosis infection: a multicohort research

Provisionally accepted
  • 1 Senior Department of Tuberculosis, The 8th Medical Center of PLA General Hospital, Beijing, China
  • 2 Section of Health, No. 94804 Unit of the Chinese People's Liberation Army, Shanghai, China
  • 3 Resident standardization training cadet corps, Air Force Medical Center, Beijing, China
  • 4 Graduate School, Hebei North University, Zhangjiakou, Hebei Province, China

The final, formatted version of the article will be published soon.

Background: Tuberculosis (TB) remains a major global health challenge, and reliable biomarkers are needed to distinguish latent tuberculosis infection (LTBI) from active tuberculosis (ATB). This study aimed to identify blood-based biomarkers that differentiate LTBI from ATB through a multicohort analysis of public datasets.

Methods: We systematically screened 18 datasets from the NIH Gene Expression Omnibus (GEO) and ultimately included 11 cohorts comprising 2,758 patients across 8 countries/regions and 13 ethnicities. Cohorts were divided into a training set (8 cohorts, n=1,933) and a validation set (3 cohorts, n=825) based on functional assignment.

Results: Using UpSet analysis, LASSO (Least Absolute Shrinkage and Selection Operator), SVM-RFE (Support Vector Machine Recursive Feature Elimination), and MCL (Markov Cluster Algorithm) clustering of protein-protein interaction networks, we identified S100A12 and S100A8 as the optimal biomarkers. A Naive Bayes (NB) model incorporating these two markers demonstrated robust diagnostic performance: training set AUC, median 0.8572 (interquartile range 0.8002, 0.8708); validation AUC, 0.5719 (0.51645, 0.7078); and subgroup AUC, 0.8635 (0.8212, 0.8946).

Conclusion: Our multicohort analysis established an NB-based diagnostic model using S100A12/S100A8 that maintains diagnostic accuracy across diverse geographic, ethnic, and clinical variables (including HIV co-infection), highlighting its potential for clinical translation in LTBI/ATB differentiation.
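To illustrate how a two-marker Naive Bayes classifier and a per-cohort AUC summary of this kind can be computed, the following is a minimal sketch, not the authors' pipeline. It assumes a Gaussian Naive Bayes variant (the abstract does not specify which variant was used), and all expression values are synthetic. The function names (`fit_gaussian_nb`, `posterior_atb`, `auc`, `simulate_cohort`) and the simulated effect size are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics

def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature means/variances (Gaussian NB)."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [statistics.fmean(c) for c in cols],
            # A small constant keeps every variance strictly positive.
            "var": [statistics.pvariance(c) + 1e-9 for c in cols],
        }
    return model

def log_joint(params, x):
    """log P(class) + sum of per-feature Gaussian log densities."""
    total = math.log(params["prior"])
    for v, m, s2 in zip(x, params["mean"], params["var"]):
        total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
    return total

def posterior_atb(model, x):
    """Posterior probability of ATB (coded 1) versus LTBI (coded 0)."""
    l0, l1 = log_joint(model[0], x), log_joint(model[1], x)
    top = max(l0, l1)  # subtract the max before exponentiating for stability
    e0, e1 = math.exp(l0 - top), math.exp(l1 - top)
    return e1 / (e0 + e1)

def auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count half."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def simulate_cohort(rng, n_per_class, shift=1.0):
    """Synthetic two-marker data standing in for S100A12 and S100A8 levels.

    ASSUMPTION: ATB samples are shifted upward by `shift` on both markers.
    This is made-up data, not values from the study.
    """
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * shift, 1.0), rng.gauss(label * shift, 1.0)])
            y.append(label)
    return X, y

rng = random.Random(0)
train_X, train_y = simulate_cohort(rng, 200)
model = fit_gaussian_nb(train_X, train_y)

# Score several independent synthetic cohorts, then summarize the AUCs.
cohort_aucs = []
for _ in range(8):
    X, y = simulate_cohort(rng, 60)
    scores = [posterior_atb(model, x) for x in X]
    cohort_aucs.append(auc(scores, y))

# statistics.quantiles with n=4 returns the three quartile cut points.
q1, median_auc, q3 = statistics.quantiles(cohort_aucs, n=4)
print(f"median AUC = {median_auc:.4f} (IQR {q1:.4f}, {q3:.4f})")
```

Summarizing per-cohort AUCs by median and interquartile range, as in the abstract, makes the summary less sensitive to a single outlying cohort than a pooled or mean AUC would be.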

Keywords: active tuberculosis, latent tuberculosis infection, diagnostic model, biomarkers, multicohort analysis

Received: 27 Feb 2025; Accepted: 18 Apr 2025.

Copyright: © 2025 Jiang, Liu, Li, Ni, An, Li, Zhang and Gong. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence:
Lingxia Zhang, Senior Department of Tuberculosis, The 8th Medical Center of PLA General Hospital, Beijing, China
Wenping Gong, Senior Department of Tuberculosis, The 8th Medical Center of PLA General Hospital, Beijing, China
