AUTHOR=Emmaneel Annelies , Bogaert Delfien J. , Van Gassen Sofie , Tavernier Simon J. , Dullaers Melissa , Haerynck Filomeen , Saeys Yvan TITLE=A Computational Pipeline for the Diagnosis of CVID Patients JOURNAL=Frontiers in Immunology VOLUME=10 YEAR=2019 URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2019.02009 DOI=10.3389/fimmu.2019.02009 ISSN=1664-3224 ABSTRACT=
Common variable immunodeficiency (CVID) is one of the most frequently diagnosed primary antibody deficiencies (PADs), a group of disorders characterized by a decrease in one or more immunoglobulin (sub)classes and/or impaired antibody responses caused by inborn defects in B cells in the absence of other major immune defects. CVID patients suffer from recurrent infections and disease-related, non-infectious, complications such as autoimmune manifestations, lymphoproliferation, and malignancies. A timely diagnosis is essential for optimal follow-up and treatment. However, CVID is by definition a diagnosis of exclusion, thereby covering a heterogeneous patient population and making it difficult to establish a definite diagnosis. To aid the diagnosis of CVID patients, and distinguish them from other PADs, we developed an automated machine learning pipeline which performs automated diagnosis based on flow cytometric immunophenotyping. Using this pipeline, we analyzed the immunophenotypic profile in a pediatric and adult cohort of 28 patients with CVID, 23 patients with idiopathic primary hypogammaglobulinemia, 21 patients with IgG subclass deficiency, six patients with isolated IgA deficiency, one patient with isolated IgM deficiency, and 100 unrelated healthy controls. Flow cytometry analysis is traditionally done by manual identification of the cell populations of interest. Yet, this approach has severe limitations including subjectivity of the manual gating and bias toward known populations. To overcome these limitations, we here propose an automated computational flow cytometry pipeline that successfully distinguishes CVID phenotypes from other PADs and healthy controls. Compared to the traditional, manual analysis, our pipeline is fully automated, performing automated quality control and data pre-processing, automated population identification (gating) and deriving features from these populations to build a machine learning classifier to distinguish CVID from other PADs and healthy controls. This results in a more reproducible flow cytometry analysis, and improves the diagnosis compared to manual analysis: our pipelines achieve on average a balanced accuracy score of 0.93 (±0.07), whereas using the manually extracted populations, an averaged balanced accuracy score of 0.72 (±0.23) is achieved.