AUTHOR=Huang Yu-Ning , Patel Naresh Amrat , Mehta Jay Himanshu , Ginjala Srishti , Brodin Petter , Gray Clive M. , Patel Yesha M. , Cowell Lindsay G. , Burkhardt Amanda M. , Mangul Serghei TITLE=Data Availability of Open T-Cell Receptor Repertoire Data, a Systematic Assessment JOURNAL=Frontiers in Systems Biology VOLUME=2 YEAR=2022 URL=https://www.frontiersin.org/journals/systems-biology/articles/10.3389/fsysb.2022.918792 DOI=10.3389/fsysb.2022.918792 ISSN=2674-0702 ABSTRACT=

Modern data-driven research can drive novel biomedical discoveries through secondary analyses of raw data. It is therefore important that such research be reproducible and robust, so that secondary analyses of immunogenomics data are precise and accurate. Scientific research requires rigor in designing and conducting experiments, and particularly in scientific writing and the reporting of results. It is also crucial to make raw data available, discoverable, and well described or annotated so that the data can be re-analyzed in the future. To assess the data availability of published T cell receptor (TCR) repertoire data, we examined 11,918 TCR-Seq samples corresponding to 134 TCR-Seq studies spanning 2006 to 2022. Among the 134 studies, only 38.1% had raw TCR-Seq data publicly available in public repositories. We also found a statistically significant association between the presence of a data availability statement and increased raw data availability (p = 0.014). Yet 46.8% of studies with data availability statements failed to share the raw TCR-Seq data. There is a pressing need for the biomedical community to raise awareness of the importance of raw data availability in scientific research and to take immediate action to improve it, enabling cost-effective secondary analysis of existing immunogenomics data by the larger scientific community.