AUTHOR=Katayama Yotaro, Kobayashi Tetsuya J. TITLE=Comparative Study of Repertoire Classification Methods Reveals Data Efficiency of k-mer Feature Extraction JOURNAL=Frontiers in Immunology VOLUME=13 YEAR=2022 URL=https://www.frontiersin.org/journals/immunology/articles/10.3389/fimmu.2022.797640 DOI=10.3389/fimmu.2022.797640 ISSN=1664-3224 ABSTRACT=
The repertoire of T cell receptors encodes various types of immunological information. Machine learning is indispensable for decoding this information from repertoire datasets measured by next-generation sequencing (NGS). In particular, repertoire classification is the most basic task and is relevant to a variety of scientific and clinical problems. Supported by the recent appearance of large datasets, efficient but data-expensive methods have been proposed. However, it is unclear whether these methods still work well when the available sample size is severely restricted, as is common in practice. In this study, we demonstrate that their performance can degrade substantially when the sample size falls below a critical value. To address this drawback, we propose MotifBoost, which exploits the information of short