The final, formatted version of the article will be published soon.
TECHNOLOGY AND CODE article
Front. Bioinform.
Sec. Single Cell Bioinformatics
Volume 5 - 2025 | doi: 10.3389/fbinf.2025.1519468
Seurat function argument values in scRNA-seq data analysis: potential pitfalls and refinements for biological interpretation
Provisionally accepted
- 1 Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, Moscow Oblast, Russia
- 2 Immanuel Kant Baltic Federal University, Kaliningrad, Kaliningrad Oblast, Russia
- 3 Institute of Medicine and Life Sciences, Immanuel Kant Baltic Federal University, Kaliningrad, Kaliningrad Oblast, Russia
Processing biological data is a major challenge: the volume of accumulated data grows every year as new methods for studying biological objects emerge. Blind application of mathematical methods in biology may lead to erroneous hypotheses and conclusions. Here we focus on a small set of mathematical methods used in the standard processing of scRNA-seq data: preprocessing, dimensionality reduction, integration, and clustering (the latter using machine learning methods). Normalization and scaling are standard preprocessing steps, with LogNormalize (natural-log transformation), CLR (centered log-ratio transformation), and RC (relative counts) employed as data transformation methods. The justification for applying these methods to biological data is not discussed in methodological articles. For dimensionality reduction, the essential task is to identify stable patterns that mathematical processing deliberately removes as redundant, even though they contain minor details that are important for biological interpretation. There are no established rules for integrating datasets obtained at different sampling times or under different conditions. The use of clustering needs to be reconsidered specifically for biological data. The novelty of the present study lies in combining biology and bioinformatics to draw biological insights from data processing.
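To make the three transformations named above concrete, the following is a minimal sketch in plain Python applied to the counts of a single hypothetical cell. It is not taken from the article or from the Seurat source code. The function names, the toy counts, and the scale factor of 10,000 are assumptions made for illustration, and the CLR shown is a textbook-style variant using log(1 + x), so Seurat's own implementation may differ in detail.

```python
import math


def relative_counts(counts, scale_factor=10_000):
    """RC: divide each count by the cell total, then multiply by a scale factor."""
    total = sum(counts)
    if total == 0:
        # An empty cell has no meaningful proportions; return zeros.
        return [0.0 for _ in counts]
    return [c / total * scale_factor for c in counts]


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style: natural log of (1 + relative counts)."""
    return [math.log1p(v) for v in relative_counts(counts, scale_factor)]


def centered_log_ratio(counts):
    """Textbook-style CLR (assumed variant): log1p values centered on their mean."""
    logs = [math.log1p(c) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical raw counts for four genes in one cell.
cell = [0, 3, 7, 90]

rc = relative_counts(cell)
ln = log_normalize(cell)
clr = centered_log_ratio(cell)

print("RC:          ", rc)
print("LogNormalize:", ln)
print("CLR:         ", clr)
```

Even on this toy vector, the three methods treat the same data differently: RC preserves proportions, the log transformation compresses the gap between highly and weakly expressed genes, and CLR expresses each gene relative to the cell's own average. This is the kind of choice whose biological justification the abstract says needs to be examined.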
Keywords: biocentric mathematics, scRNA-seq, dimension reduction, cell clustering, datasets integration
Received: 29 Oct 2024; Accepted: 20 Jan 2025.
Copyright: © 2025 Arbatsky, Vasilyeva, Sysoeva, Semina, Saveliev and Rubina. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
* Correspondence:
Kseniya Rubina, Faculty of Fundamental Medicine, Lomonosov Moscow State University, Moscow, 119192, Moscow Oblast, Russia
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.