Expression quantitative trait locus studies in the era of single-cell omics

Luo, Jie; Wu, Xinyi; Cheng, Yuan; Chen, Guang; Wang, Jian; Song, Xijiao

doi:10.3389/fgene.2023.1182579

REVIEW article

Front. Genet., 22 May 2023

Sec. Statistical Genetics and Methodology

Volume 14 - 2023 | https://doi.org/10.3389/fgene.2023.1182579


Jie Luo¹*, Xinyi Wu², Yuan Cheng², Guang Chen¹, Jian Wang¹, Xijiao Song¹

1. State Key Laboratory for Managing Biotic and Chemical Threats to The Quality and Safety of Agro‐products, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
2. Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China


Abstract

Genome-wide association studies have revealed that the regulation of gene expression bridges genetic variants and complex phenotypes. Profiling of the bulk transcriptome coupled with linkage analysis (expression quantitative trait locus (eQTL) mapping) has advanced our understanding of the relationship between genetic variants and gene regulation in the context of complex phenotypes. However, bulk transcriptomics has inherent limitations, as the regulation of gene expression tends to be cell-type-specific. The advent of single-cell RNA-seq technology now enables the identification of cell-type-specific regulation of gene expression through single-cell eQTL (sc-eQTL) mapping. In this review, we first provide an overview of sc-eQTL studies, including data processing and the sc-eQTL mapping procedure. We then discuss the benefits and limitations of sc-eQTL analyses. Finally, we present an overview of the current and future applications of sc-eQTL discoveries.

1 Introduction

Over the past decades, genome-wide association studies (GWAS) have successfully identified thousands of genetic variants associated with over 100 common diseases (Visscher et al., 2017). However, the vast majority of these variants lie in non-coding regions (Brodie et al., 2016) and are thought to exert their effects by regulating gene expression. Expression quantitative trait locus (eQTL) mapping, which links genetic variants to variation in gene expression, has largely been performed on bulk transcriptomic data generated by RNA-seq and microarray technologies. However, a substantial proportion of GWAS loci cannot be explained by eQTL signals in bulk transcriptomic data, in which expression levels are averaged across all cells in a sample.

One solution to this problem is to study the regulation of gene expression at the cell-type-specific level (Knowles et al., 2017; Favé et al., 2018). Several previous studies in purified blood cell populations (Fairfax et al., 2012; Ishigaki et al., 2017; Donovan et al., 2020; Kim-Hellmuth et al., 2020; Yao et al., 2021) have already identified cell-type-specific regulation. The recent advent of scRNA-seq technology has revolutionized our ability to understand cell-type-specific gene expression by resolving complex cellular heterogeneity.

Single-cell expression quantitative trait locus (sc-eQTL) mapping is emerging as a powerful tool to identify cell-type-specific regulation of gene expression. For example, a recent study performed eQTL mapping using single-nucleus RNA-seq data from 196 individuals across eight central nervous system (CNS) cell types and identified 6,108 eGenes, 43% of which showed cell-type-specific effects. The study provided new insights into disease etiology and the genetic mechanisms influencing neurological disorders (Bryois et al., 2022), demonstrating that sc-eQTL mapping provides a powerful approach to link genetic variants to complex diseases.

In this review, we aim to provide a comprehensive overview of sc-eQTL studies. We begin with an introduction to data processing and mapping procedures used in sc-eQTL analyses and provide details of the methods used in the analysis of the cell-type-specific regulation of gene expression. We then discuss the benefits of sc-eQTL studies compared to traditional eQTL analyses using bulk transcriptomic data. The limitations and challenges of sc-eQTL analyses are also discussed. Finally, we present a comprehensive overview of the current and future applications of sc-eQTL discoveries.

2 Evolution of sc-eQTL analyses: from an early approach to recent developments

The concept of single-cell eQTLs was first introduced in 2013 in a study that measured 92 genes in 1,440 single cells from 15 individuals (Wills et al., 2013) to explore whether studying individual cells could provide greater mechanistic insight into how genetic variants quantitatively affect gene expression. The first large-scale genome-wide sc-eQTL study, however, was not performed until 2018; it examined eight major immune cell populations among 78,000 peripheral blood mononuclear cells (PBMCs) from 23 donors (Kang et al., 2018; Ma et al., 2022). This line of work was extended by a study of 25,000 PBMCs from 45 donors that identified previously undetected cell-type-specific and co-expression eQTLs (van der Wijst et al., 2018). Similar sc-eQTL studies using different single-cell transcriptomic technologies have also been reported (Sarkar et al., 2019; Cuomo et al., 2020a; Mandric et al., 2020; Van Der Wijst et al., 2020; Figure 1).

Single-cell transcriptomic technologies fall primarily into two categories: those that capture full-length transcripts (e.g., Smart-seq2, MATQ-seq2, and SUPeR-seq) and those that capture only the 3′ or 5′ ends of transcripts. Full-length transcript sequencing allows detection of the complete transcriptome and analysis of alternative splicing; however, its high cost and limited scalability make it impractical for large-scale studies. In contrast, 3′/5′-end sequencing, while less sensitive in detecting gene expression and alternative splicing, is more cost-effective and scalable and can thus accommodate more cells (Svensson et al., 2017; Chen et al., 2019). More recently, long-read sequencing technologies, such as PacBio and Oxford Nanopore, have emerged as powerful tools that enable the detection of full-length transcripts at high throughput and with high accuracy. These technologies are still in their infancy, but they hold great potential for expanding the capabilities of single-cell transcriptomic studies and can be expected to influence sc-eQTL studies.

FIGURE 1

As in bulk eQTL analyses, genetic regulation of gene expression can be classified into two types: cis-regulation (local) and trans-regulation (distant). Most sc-eQTL studies have focused on cis-regulation because detecting trans-eQTLs requires much greater statistical power. In theory, cis-eQTLs can be mapped for all genes measured in each cell. However, owing to the sparse coverage of scRNA-seq data, cis-eQTL identification is currently limited to the cell-type level. As a result, current sc-eQTL studies mainly attempt to identify cell-type-specific cis-eQTLs using single-cell transcriptomics (van der Wijst et al., 2018). To overcome the coverage limitation of single-cell transcriptomic data and exploit expression levels measured by bulk transcriptomics, many computational deconvolution methods have been developed that integrate single-cell and bulk transcriptomic data to identify cell-type-specific cis-eQTLs. However, a limitation of deconvolution methods is that the resulting cis-eQTLs can only be assigned to known cell types. Several studies have also reported that cis-eQTL analysis performed directly on single-cell transcriptomic data outperforms deconvolution methods (Perez et al., 2022; Yazar et al., 2022).
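
In practice, the cis/trans distinction is usually operationalized as a distance window around each gene. The minimal sketch below assumes a ±1 Mb window around the transcription start site (TSS), a common but not universal choice; the SNP identifiers and positions are hypothetical.

```python
def cis_snps(gene_chrom, gene_tss, snps, window=1_000_000):
    """Return IDs of SNPs on the gene's chromosome within +/- window bp of its TSS.

    snps: iterable of (snp_id, chromosome, position) tuples.
    The 1 Mb default is an assumed convention, not a fixed rule.
    """
    return [
        snp_id
        for snp_id, chrom, pos in snps
        if chrom == gene_chrom and abs(pos - gene_tss) <= window
    ]


# Hypothetical SNP positions, for illustration only
snps = [
    ("rs_a", "chr1", 1_500_000),  # 0.5 Mb from the TSS: tested in cis
    ("rs_b", "chr1", 3_000_000),  # 2 Mb away: outside the cis window
    ("rs_c", "chr2", 1_000_000),  # other chromosome: would be a trans test
]
print(cis_snps("chr1", 1_000_000, snps))
```

SNPs outside the window are not discarded from the study; they would simply enter the (much larger) set of trans tests.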

3 Data processing for sc-eQTL mapping

While substantial effort has gone into developing statistical methods for bulk transcriptomic data, most of these methods cannot be applied directly to sc-eQTL studies because single-cell transcriptomic data have unique characteristics, such as zero-inflated gene expression. As a result, several crucial processing steps must be performed before statistical methods developed for bulk RNA-seq studies can be used on single-cell transcriptomic data.

3.1 Preprocessing single-cell transcriptomic data for eQTL mapping

Preparing single-cell transcriptomic data for eQTL mapping involves several key steps, including cell-level gene expression counting, quality control (QC), mean aggregation, covariate correction, and multiple testing correction (Figure 2). Cuomo et al. (2021) have provided optimized eQTL mapping workflows for single-cell studies.
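
Multiple testing correction is needed because a single sc-eQTL scan tests many variant-gene pairs in every cell type. The exact procedure varies between studies (permutation-based schemes are also common); as one widely used option, the minimal sketch below implements the Benjamini-Hochberg false discovery rate (FDR) adjustment. The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices sorted by ascending p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping adjusted values
    # monotone (a smaller p-value never receives a larger adjusted value)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical nominal p-values for four variant-gene pairs
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [q <= 0.05 for q in qvals]  # 5% FDR threshold
print(qvals, significant)
```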

FIGURE 2

Processes for mapping cell-type-specific eQTLs.

The process starts with counting cell-level gene expression, which can be obtained using a variety of methods (Teng et al., 2016; Vieth et al., 2019; Chen et al., 2021). For digital transcript quantification, reads from tag-based (3′/5′-end) sequencing can be combined with unique molecular identifier (UMI) tags. UMIs are short sequences of random bases that are added to the ends of cDNAs during reverse transcription, so that all PCR products derived from the same original cDNA molecule carry the same UMI. UMIs can therefore distinguish PCR duplicates (technical repeats) from independent transcript molecules (biological repeats). In contrast, most full-length scRNA-seq protocols cannot incorporate UMIs, which makes transcript counting based on full-length sequencing less accurate than counting based on tag-based sequencing. An exception is MATQ-seq, which produces full-length transcripts that can be combined with UMIs (Macosko et al., 2015).
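
The UMI collapsing logic can be illustrated with a minimal sketch, assuming each read has already been assigned a cell barcode, a gene, and a UMI (all barcodes, genes, and UMIs below are hypothetical). Reads that share all three fields are treated as PCR duplicates of one molecule. Production pipelines additionally correct sequencing errors in barcodes and UMIs, which this sketch omits.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into molecule counts per (cell, gene).

    Each read is a (cell_barcode, gene, umi) tuple; reads sharing all three
    fields are counted once, as they derive from the same cDNA molecule.
    """
    molecules = defaultdict(set)
    for cell, gene, umi in reads:
        molecules[(cell, gene)].add(umi)
    # The count for each (cell, gene) is the number of distinct UMIs
    return {key: len(umis) for key, umis in molecules.items()}


reads = [
    ("AAAC", "CD3E", "GGTT"),
    ("AAAC", "CD3E", "GGTT"),   # PCR duplicate: same cell, gene, and UMI
    ("AAAC", "CD3E", "CATG"),   # a second molecule of the same gene
    ("TTTG", "MS4A1", "GGTT"),  # same UMI sequence, but a different cell
]
counts = count_umis(reads)
print(counts)
```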

QC should be performed at the cell level to remove low-quality cells, and the data should be normalized to remove technical variation in sequencing depth between cells. Batch correction should also be applied to remove technical differences between batches. Luecken and Theis (2019) provide an overview of best practices for these steps. Moreover, Xue et al. (2023) proposed a new guideline for choosing the number of latent variables in bulk-data batch-effect correction tools, such as probabilistic estimation of expression residuals (PEER) and principal component analysis (PCA), thereby improving the power of sc-eQTL discovery. Methods and tools for data transformation, scaling/normalization, and batch-effect correction are listed in Table 1 and Supplementary Table S1. Among the batch-effect correction methods in Table 1, some are linear methods (e.g., limma and ComBat), whereas others are nearest-neighbor (NN)-based methods (e.g., fastMNN, Scanorama, and Seurat). Four methods in Table 1 (ZINB-WaVE, scMerge, scVI, and LIGER) can perform normalization and batch correction together (Chu et al., 2022). Tran et al. (2020) compared 14 batch-effect correction methods in five scenarios. In general, Harmony, LIGER, and Seurat 3 performed well in batch integration. When correcting batch effects for unknown cell types, LIGER is preferred, although its runtime is comparatively long. Seurat 3 can handle large datasets but also requires a longer runtime. For downstream differential expression (DEG) analysis, scMerge is recommended.
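
To make the normalization and linear batch-correction ideas concrete, the minimal sketch below scales each cell to a common total count (10,000 here, an arbitrary choice) followed by a log transform, and then removes per-batch gene means. It is a simplified illustration in the spirit of linear methods such as limma or ComBat, not a reimplementation of either (ComBat, for example, also adjusts batch-specific variances using empirical Bayes shrinkage). Centering assumes that batches have similar cell-type composition; otherwise genuine biological differences are removed along with the batch effect. The counts and batch labels are hypothetical.

```python
import numpy as np


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to the same total count, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)


def center_batches(expr, batches):
    """Subtract per-batch gene means and add back the global gene mean.

    Assumes batch effects are additive per-gene shifts (a simplification).
    """
    corrected = expr.copy()
    global_mean = expr.mean(axis=0)
    for b in np.unique(batches):
        mask = batches == b
        corrected[mask] = expr[mask] - expr[mask].mean(axis=0) + global_mean
    return corrected


# Hypothetical counts: 4 cells x 2 genes, sequenced in two batches
counts = np.array([[1.0, 3.0], [2.0, 6.0], [4.0, 0.0], [0.0, 4.0]])
batches = np.array(["A", "A", "B", "B"])
norm = log_normalize(counts)
corrected = center_batches(norm, batches)
print(corrected)
```

The first two cells differ only in sequencing depth, so they become identical after normalization.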

TABLE 1

Name	Tools/package	Model/method	Reference	Site
Batch effect correction
limma	limma	Quantitative weighting (linear-based)	Ritchie et al., 2015	http://mirrors.nju.edu.cn/bioconductor/2.11/bioc/html/limma.html
ComBat	sva	Empirical Bayesian frameworks (linear-based)	Johnson et al., 2007	http://www.bioconductor.org/packages/release/bioc/html/sva.html
MNN	scran	Mutual nearest neighbor methods (NN-based)	Haghverdi et al., 2018	https://bioconductor.org/packages/scran
BBKNN	bbknn	Fast graph-based data integration algorithm	Polański et al., 2020	https://github.com/Teichlab/bbknn
fastMNN	batchelor	(Fast version of) mutual nearest neighbor methods (NN-based)	Haghverdi et al., 2018	https://bioconductor.org/packages/release/bioc/html/scran.html
Scanorama	scanorama	NN-based	Hie et al., 2019	https://github.com/brianhie/scanorama
Seurat	Seurat	NN-based	Hao et al. (2021)	https://satijalab.org/seurat/
Harmony	harmony	Unsupervised joint embedding (linear-based)	Korsunsky et al., 2019	https://github.com/immunogenomics/harmony
scater	scater	normaliseExprs function; svaseq; RUVSeq	McCarthy et al., 2017	http://bioconductor.org/packages/scater
DCA	DCA	Negative-binomial noise model	Eraslan et al., 2019	http://github.com/theislab/dca
scGen	scGen	Variational autoencoders; latent space-vector arithmetics	Lotfollahi et al., 2019	https://github.com/theislab/scgen
Normalization and batch effect corrections together
ZINB-WaVE	zinbwave	Extension of the RUV model	Risso et al., 2018	https://bioconductor.org/packages/zinbwave
scMerge	scMerge	MNN search and linear modeling (NN-based)	Lin et al. (2017)	https://sydneybiox.github.io/scMerge
scVI	scVI	Stochastic optimization and deep neural networks	Lopez et al., 2018	https://github.com/YosefLab/scVI
LIGER	LIGER	Integrative non-negative matrix factorization	Liu et al., 2020	https://github.com/MacoskoLab/liger

Methods/tools used for data processing in sc-eQTL mapping.

After quality control, clustering and cell-type assignment must be performed on the scRNA-seq data (Cuomo et al., 2021). Most clustering tools for scRNA-seq data combine several basic steps and methods, such as feature selection, dimensionality reduction, k-means, and hierarchical clustering. Feature selection identifies the genes with the highest variance across cells. Dimensionality reduction projects the data into a low-dimensional space while trying to preserve the original pairwise distances between points as much as possible; PCA is one of the classical dimensionality reduction methods. Many measures, including Euclidean distance, cosine similarity, Pearson's correlation, and Spearman's correlation, can be used to quantify the distance between points in the lower-dimensional space. K-means iteratively identifies k cluster centers (centroids) and assigns each cell to the closest centroid. K-means can handle large datasets but is not guaranteed to find the global minimum; in addition, it is biased toward identifying equal-sized clusters and can miss rare cell types. Another widely used algorithm is hierarchical clustering, which either merges individual cells into progressively larger clusters or divides clusters into smaller groups. A major disadvantage of hierarchical clustering is its high time and memory cost for large datasets. Community detection is a form of clustering applied to graphs that identifies groups of densely connected nodes. An advantage of graph-based methods is that the number of clusters does not need to be specified in advance.
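
The feature selection, PCA, and k-means steps described above can be sketched in a few lines of NumPy. This is a teaching sketch on simulated data (two hypothetical cell types, each with five marker genes), not the pipeline of any specific tool; tools such as Seurat and Scanpy typically cluster a k-nearest-neighbor graph by community detection rather than running k-means.

```python
import numpy as np


def select_hvgs(expr, n_genes):
    """Return column indices of the n_genes genes with the highest variance."""
    return np.argsort(expr.var(axis=0))[::-1][:n_genes]


def pca(expr, n_components):
    """Project mean-centered data onto its top principal components via SVD."""
    centered = expr - expr.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each cell to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned cells
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


# Simulated data: 100 cells x 20 genes, two hypothetical cell types
rng = np.random.default_rng(42)
expr = rng.normal(0.0, 0.5, size=(100, 20))
expr[:50, :5] += 5.0    # marker genes of cell type A
expr[50:, 5:10] += 5.0  # marker genes of cell type B
hvg = select_hvgs(expr, n_genes=10)
embedding = pca(expr[:, hvg], n_components=2)
labels = kmeans(embedding, k=2)
print(labels)
```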

Because each basic clustering method has notable disadvantages, many tools with clustering modules combine several of them. For example, the clustering modules in Scanpy (Wolf et al., 2018), Seurat (Hao et al., 2021), PhenoGraph (Levine et al., 2015), SC3 (Kiselev et al., 2017; Kiselev et al., 2019), CIDR (Lin et al., 2017), pcaReduce (Žurauskienė and Yau, 2016), and TSCAN (Ji and Ji, 2016) combine PCA with other basic clustering methods. SIMLR (Wang et al., 2018) is based on data-driven dimensionality reduction and k-means. GiniClust (Jiang et al., 2016) is based on DBSCAN; mpath and SINCERA (Guo et al., 2015) are based on hierarchical clustering; BackSPIN (Zeisel et al., 2015) is based on biclustering; RaceID3 (Grün et al., 2015) is based on k-means; and SNN-Cliq is graph-based. Thus, many user-friendly clustering tools are available today; however, each was developed to solve particular problems, and no single tool is suitable for all situations.

Choosing suitable clustering and cell-type assignment algorithms for scRNA-seq data is vital (Luecken and Theis, 2019). The identification or classification of each cell into the correct type or state is especially important (Van Der Wijst et al., 2020). For example, Tsafrir et al. (2005) developed a clustering method based on sorting points into neighborhoods (SPIN). Some methods, such as pcaReduce and SC3, identify cell types through unsupervised clustering. A major challenge in cell-type profiling is identifying rare cell types. The rare cell-type identification (RaceID) algorithm infers abundant cell types by k-means clustering and then detects rare cell types by systematic outlier screening (Grün et al., 2015). GiniClust detects rare cell types from single-cell gene expression data using the Gini index (Jiang et al., 2016), and its upgraded version, GiniClust2, is a cluster-aware weighted ensemble clustering method for cell-type detection (Tsoucas and Yuan, 2018). A more recently developed tool, CellSIUS, provides sensitive and specific detection of rare cell populations in complex scRNA-seq data (Wegmann et al., 2019). After cell-type assignment, mean aggregation is typically performed by averaging the expression profiles of all cells of a given cell type within each individual, yielding one pseudo-bulk profile per individual and cell type. Cell- or cell-type-specific eQTLs can then be mapped using eQTL mapping methods, including those developed specifically for scRNA-seq data (Figure 2).
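
Mean aggregation can be sketched as follows, assuming a cell x gene matrix of normalized expression plus per-cell donor and cell-type labels (all values hypothetical). Each (donor, cell type) pair yields one pseudo-bulk profile, which then serves as one sample in eQTL testing. Some workflows sum raw counts instead of averaging, and many require a minimum number of cells per donor and cell type; those choices are omitted here.

```python
import numpy as np


def pseudobulk_means(expr, donors, cell_types):
    """Average expression over all cells of each (donor, cell type) pair."""
    profiles = {}
    for donor in np.unique(donors):
        for ct in np.unique(cell_types):
            mask = (donors == donor) & (cell_types == ct)
            if mask.any():  # skip pairs with no cells
                profiles[(donor, ct)] = expr[mask].mean(axis=0)
    return profiles


# Hypothetical data: 4 cells x 2 genes from two donors
expr = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0], [2.0, 2.0]])
donors = np.array(["d1", "d1", "d1", "d2"])
cell_types = np.array(["T", "T", "B", "T"])
profiles = pseudobulk_means(expr, donors, cell_types)
print(profiles)
```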

3.2 Methods used for sc-eQTL mapping

After preprocessing, eQTL mapping is applied to identify genetic variants that regulate gene expression at the cell-type level. Mapping can be carried out with various methods, including sc-eQTL-specific tools (Table 2) and bulk eQTL mapping methods (Supplementary Table S2). These methods fall into two categories: parametric and non-parametric. Parametric methods, such as linear regression and ANOVA, assume that gene expression follows a particular distribution (e.g., normal, Poisson, or negative binomial) and model gene expression as the dependent variable and genotype as the independent variable (Gatti et al., 2009; Shabalin, 2012). In contrast, non-parametric methods, such as Krux, make no distributional assumptions and are considered more robust (Qi et al., 2014). Each tool in Table 2 has specific advantages. For example, the SCeQTL R package uses zero-inflated negative binomial regression for eQTL mapping in scRNA-seq data (Hu et al., 2020); eQTLsingle can discover eQTLs from scRNA-seq data alone, without separate genomic data (Ma et al., 2022); FastGxC is an efficient and powerful tool for mapping context-specific eQTLs in scRNA-seq data (Lu et al., 2021); and scTBLDA considers information shared across cell types, which is often ignored by methods that use summary statistics within each cell type (Gewirtz et al., 2022).
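
As a minimal parametric example, the sketch below regresses a gene's pseudo-bulk expression on genotype dosage (0, 1, or 2 copies of the alternative allele) across donors using ordinary least squares. The data are simulated, covariates are omitted for brevity, and the p-value uses a normal approximation to the t distribution, which is adequate only for reasonably large sample sizes. Dedicated tools such as MatrixEQTL run this kind of test efficiently for millions of variant-gene pairs.

```python
import math

import numpy as np


def eqtl_linear_test(expression, dosage):
    """OLS of expression on genotype dosage; returns (beta, t, p_value).

    The two-sided p-value uses a normal approximation to the t distribution.
    """
    x = np.column_stack([np.ones_like(dosage, dtype=float), dosage])
    coef, _, _, _ = np.linalg.lstsq(x, expression, rcond=None)
    residuals = expression - x @ coef
    n, p = x.shape
    sigma2 = residuals @ residuals / (n - p)       # residual variance
    cov = sigma2 * np.linalg.inv(x.T @ x)          # coefficient covariance
    beta, se = coef[1], math.sqrt(cov[1, 1])
    t_stat = beta / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # 2 * (1 - Phi(|t|))
    return beta, t_stat, p_value


# Simulated donors: true per-allele effect of 0.5 plus unit-variance noise
rng = np.random.default_rng(0)
dosage = rng.integers(0, 3, size=200)
expression = 0.5 * dosage + rng.normal(0.0, 1.0, size=200)
beta, t_stat, p_value = eqtl_linear_test(expression, dosage)
print(f"beta={beta:.3f} t={t_stat:.2f} p={p_value:.2e}")
```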

TABLE 2

Tool/method	Reference	Key features	Site
SCeQTL	Hu et al. (2020)	Zero-inflated generalized linear model	https://github.com/XuegongLab/SCeQTL/
eQTLsingle	Ma et al. (2022)	Discovers eQTLs using scRNA-seq data only	https://github.com/horsedayday/eQTLsingle
FastGxC	Lu et al. (2021)	Maps context-specific eQTLs by leveraging the correlation structure of multi-context studies	https://github.com/BrunildaBalliu/FastGxC
scTBLDA	Gewirtz et al. (2022)	Uses MatrixEQTL v2.3 with modelLINEAR to run eQTL testing	https://github.com/gewirtz/scTBLDA

eQTL mapping methods/tools specifically for scRNA-seq data.

As in traditional bulk eQTL mapping, the effects of covariates are typically removed in sc-eQTL analysis to improve the sensitivity and interpretability of genetic associations in population-scale expression data. For example, a recent cell-type-specific eQTL study in fibroblasts and fibroblast-derived iPSCs adjusted for several covariates and for probabilistic estimation of expression residuals (PEER) factors (Shabalin, 2012; Neavin et al., 2021). In addition, Xue et al. (2023) highlighted three key differences between bulk data and scRNA-seq pseudo-bulk data and provided a guideline for selecting the optimal number of latent variables in bulk-data batch-effect correction tools, which has the potential to substantially improve sc-eQTL discovery.
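
Latent-factor correction can be illustrated by regressing out the leading principal components of a donor x gene expression matrix, a common PCA-based alternative to PEER. The sketch below is a simplified illustration, not the PEER model; the number of components to remove is exactly the kind of parameter that guidelines such as that of Xue et al. (2023) address. The simulated hidden factor is hypothetical.

```python
import numpy as np


def regress_out_pcs(expr, n_pcs):
    """Remove the top n_pcs principal components from a donor x gene matrix.

    Hidden factors are approximated by the leading PCs, and their fitted
    contribution is subtracted from the mean-centered data.
    """
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Part of the data explained by the leading components
    explained = (u[:, :n_pcs] * s[:n_pcs]) @ vt[:n_pcs]
    return centered - explained


# Simulated data: 30 donors x 50 genes driven entirely by one hidden factor
rng = np.random.default_rng(0)
factor = rng.normal(size=30)    # hidden factor per donor (e.g., a batch effect)
loadings = rng.normal(size=50)  # how strongly each gene responds to it
expr = 5.0 + np.outer(factor, loadings)
residuals = regress_out_pcs(expr, n_pcs=1)
print(np.abs(residuals).max())
```

Because the simulated matrix is exactly rank one after centering, removing a single component leaves only floating-point residue; in real data, the residuals retain genetic and other biological signal that was not captured by the removed components.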

Methods developed specifically for sc-eQTL mapping can efficiently identify context-specific genetic variants that regulate gene expression at the cell-type level. For example, FastGxC enables the construction of context-specific eQTL maps and has the potential to increase precision in identifying GWAS variants threefold compared with conventional eQTL mapping methods (Lu et al., 2021).
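
The decomposition idea behind context-specific mapping can be sketched as follows. As we understand the FastGxC approach, each individual's expression of a gene is split into a context-shared component (the individual's mean across contexts) and context-specific components (deviations from that mean), and eQTLs are then mapped separately on each part. This is a minimal sketch of that decomposition under this reading, not code from the FastGxC package; the matrix values are hypothetical.

```python
import numpy as np


def decompose_contexts(expr):
    """Split an individuals x contexts matrix for one gene into two parts.

    shared[i]      = mean expression of individual i across contexts
    specific[i, c] = expr[i, c] - shared[i]
    NaN entries (context not measured) are ignored when averaging.
    """
    shared = np.nanmean(expr, axis=1)
    specific = expr - shared[:, None]
    return shared, specific


# Hypothetical gene: 2 individuals measured in 2 cell types (contexts)
expr = np.array([[1.0, 3.0], [2.0, 6.0]])
shared, specific = decompose_contexts(expr)
print(shared)
print(specific)
```

By construction, each individual's context-specific deviations sum to zero, so a variant associated with the shared component affects all contexts alike, whereas an association with a context-specific component points to a context-dependent effect.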

Compared with conventional eQTL mapping, sc-eQTL mapping must deal with an excess of zeros in single-cell transcriptomic data (Delmans and Hemberg, 2016; Miao et al., 2018; Hu et al., 2020). To address this challenge, the R package SCeQTL uses zero-inflated negative binomial regression to detect variation in gene expression and to distinguish between a “status difference” (whether a gene is expressed) and an “expression level difference” (how highly it is expressed) (Hu et al., 2020). Some recent approaches also account for dynamic cell states defined along pseudotime in sc-eQTL analysis (Cuomo et al., 2020b) and have been shown to uncover new eQTL variants. In addition, the eQTLsingle tool was developed to discover eQTLs solely from single-cell transcriptomic data by detecting mutations in scRNA-seq reads and using them as genotype data (Ma et al., 2022).
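
The distinction between a “status difference” and an “expression level difference” can be illustrated with a simple two-part (hurdle-style) summary: in each genotype group, the fraction of cells expressing the gene and the mean count among expressing cells. This descriptive sketch is not the zero-inflated negative binomial model used by SCeQTL; a formal test would fit a statistical model to both parts. The counts and genotypes are hypothetical, with each cell carrying the genotype of its donor.

```python
import numpy as np


def status_and_level_by_genotype(counts, dosage):
    """Summarize one gene's single-cell counts per genotype group.

    status: fraction of cells with non-zero counts (is the gene on?)
    level:  mean count among non-zero cells (how much, when on?)
    """
    summary = {}
    for g in np.unique(dosage):
        group = counts[dosage == g]
        nonzero = group[group > 0]
        summary[int(g)] = {
            "status": nonzero.size / group.size,
            "level": float(nonzero.mean()) if nonzero.size else float("nan"),
        }
    return summary


# Hypothetical cells: genotype shifts how often the gene is on, not how high
counts = np.array([0, 0, 2, 4, 0, 3, 3, 3])
dosage = np.array([0, 0, 0, 0, 2, 2, 2, 2])
summary = status_and_level_by_genotype(counts, dosage)
print(summary)
```

In this toy example, the genotype groups differ in status (50% versus 75% of cells expressing the gene) but not in level (a mean of 3 counts in expressing cells for both), a pattern that a single averaged expression value would blur together.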

4 Advantages and limitations of sc-eQTL mapping

4.1 Advantages of sc-eQTL mapping compared to bulk eQTL methods

Single-cell transcriptomic data offer several advantages for exploring the genetic architecture of gene regulation. The ability to detect cell types and cell states in an unbiased manner using single-cell transcriptomic data makes sc-eQTL mapping a powerful tool for this purpose (Grün et al., 2015; Villani et al., 2017; Hernández et al., 2018; Karamitros et al., 2018; Guerrero-Juarez et al., 2019; Umans et al., 2020). The advantages of sc-eQTL mapping include 1) discovery of cell-type-specific eQTLs, 2) identification of eQTLs regulating lowly expressed genes, and 3) detection of cell-type-specific eQTLs in different spatiotemporal states. We discuss these advantages in detail in the following sections (Figure 3).

FIGURE 3

Advantages of scRNA-seq data, including **(A)** identifying cell-type-specific eQTLs; **(B)** identifying lowly expressed genes; **(C)** identifying cell-type-specific co-expression networks; and **(D)** identifying cell-type-specific eQTLs in different spatiotemporal states.

4.1.1 Discovery of cell-type-specific eQTLs that are diluted in bulk RNA-seq

Single-cell transcriptomic data offer a powerful way to uncover cell-type-specific eQTLs that are diluted in bulk transcriptomic data. Cell-type-specific cis-eQTLs inferred from bulk RNA-seq data are restricted to known cell types, whereas those identified from scRNA-seq data can be assigned to novel cell types. Multiple studies have demonstrated this advantage. For example, one study discovered 379 cis-eQTLs (287 genes), 48 of which (38 genes) were detected only in specific cell types and not in bulk RNA-seq data (van der Wijst et al., 2018). Another study of human skin fibroblasts showed that most of the discovered eGenes were cell-type-specific and could be identified in only one fibroblast type or one iPSC type (Neavin et al., 2021). These findings suggest a high degree of cell-type-specific gene regulation that can be detected by sc-eQTL analysis but not by bulk eQTL mapping. Hence, sc-eQTL mapping can improve eQTL detection relative to bulk RNA-seq.
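
The dilution effect follows from simple arithmetic. If bulk expression is approximated as a proportion-weighted average of cell-type means, and an eQTL changes expression by β per allele only in a cell type that makes up a fraction p of the sample, the bulk per-allele effect shrinks to p × β. The sketch below checks this numerically with hypothetical values; it ignores noise, which in practice makes the smaller bulk effect even harder to detect.

```python
import numpy as np


def bulk_expression(dosage, fraction, base_a, beta_a, base_other):
    """Bulk level as a proportion-weighted average of cell-type means.

    Assumes the eQTL acts only in cell type A; other cells are unaffected.
    """
    expr_a = base_a + beta_a * dosage
    return fraction * expr_a + (1 - fraction) * base_other


dosage = np.array([0, 1, 2])
for fraction in (0.5, 0.05):
    bulk = bulk_expression(dosage, fraction, base_a=10.0, beta_a=4.0, base_other=10.0)
    slope = np.polyfit(dosage, bulk, 1)[0]  # per-allele effect seen in bulk
    print(f"cell-type fraction {fraction:.2f}: bulk effect per allele = {slope:.2f}")
# cell-type fraction 0.50: bulk effect per allele = 2.00
# cell-type fraction 0.05: bulk effect per allele = 0.20
```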

4.1.2 Identification of eQTLs regulating lowly expressed genes that are omitted by bulk data

Compared with bulk RNA-seq data, scRNA-seq data allow the variability in gene expression across individual cells to be estimated (Brennecke et al., 2013), providing a new angle on how genetics may affect disease pathogenesis. For example, because TSPAN13 is lowly expressed in the abundant CD4⁺ T cells, the cis-eQTL rs2272245 was not identified in a bulk RNA-seq dataset (Zhernakova et al., 2017); however, it significantly affected the lowly expressed TSPAN13 gene in cis (p = 2.21 × 10⁻⁶) in the scRNA-seq analysis. This shows that bulk RNA-seq-based cis-eQTL analysis loses power to identify cell-type-specific loci affecting lowly expressed genes (van der Wijst et al., 2018).

4.1.3 Detection of cell-state-specific eQTLs while bulk data lose this power

scRNA-seq data enable the simultaneous estimation of the composition and expression profiles of discrete cell populations, including their activation states (van der Wijst et al., 2018). They also provide a flexible, unbiased way to define cell states at high resolution along continuous dynamic processes in which eQTL effects manifest themselves (Cuomo et al., 2020a). In an elegant study, the authors derived 126 iPSC lines from 125 donors in the HipSci project (Kilpinen et al., 2017) and harvested cells immediately before differentiation (iPSCs) and at the mesendoderm and definitive endoderm stages of differentiation (Cuomo et al., 2020b). They found that over 30% of the identified eQTLs were specific to a single stage. Moreover, 349 eQTL variants identified during differentiation were novel, that is, not previously identified in bulk RNA-seq data from iPSCs or GTEx tissues, and the authors showed that eQTLs can modulate the timing of expression changes in response to differentiation (Cuomo et al., 2020a). Altogether, the study demonstrated that identifying eQTLs at distinct developmental time points enables the discovery of novel regulatory relationships.

Nathan et al. (2022) mapped eQTLs in memory T cells from 259 Peruvian individuals and identified more than 2,000 eQTLs whose presence and function varied with the transcriptomic state of the T cells. They thus demonstrated that DNA sequence variation at a particular location in the genome may influence the expression of a given gene in some T-cell states but not in others.

In another study, Yazar et al. (2022) identified cell-state-dependent eQTLs in B cells transitioning from naïve to memory states. In one example involving rs9927852 and MAF, MAF expression increased with a high cytotoxic cell-state score and remained relatively constant at low cell-state scores. They also showed that two independent eQTLs can have opposite effects on the expression of the same gene in different cell states. These two studies highlight the complexity of genome regulation in immune cells and show that scRNA-seq increases the resolution at which eQTLs can be identified (Yazar and Powell, 2022).
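
Cell-state-dependent eQTLs are commonly tested by adding a genotype × cell-state interaction term to the regression model. The sketch below fits expression ~ dosage + state + dosage:state by ordinary least squares on simulated cells; a non-zero interaction coefficient indicates that the genotype effect changes with the continuous cell-state score. This is a simplified illustration: published analyses use more elaborate models, for example ones that account for the correlation between cells from the same donor, which this sketch ignores.

```python
import numpy as np


def fit_interaction_model(expression, dosage, state):
    """OLS fit of expression ~ 1 + dosage + state + dosage:state.

    Returns coefficients in that order; the last one is the interaction,
    i.e., how much the per-allele effect changes per unit of state score.
    """
    x = np.column_stack([
        np.ones_like(state, dtype=float),
        dosage,
        state,
        dosage * state,
    ])
    coef, *_ = np.linalg.lstsq(x, expression, rcond=None)
    return coef


# Simulated cells: the variant has no effect at state 0 and grows with state
rng = np.random.default_rng(1)
dosage = rng.integers(0, 3, size=300).astype(float)
state = rng.uniform(0.0, 1.0, size=300)  # e.g., a hypothetical cell-state score
expression = 1.0 + 0.5 * state + 2.0 * dosage * state
coef = fit_interaction_model(expression, dosage, state)
print(np.round(coef, 3))
```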

4.2 Limitations of scRNA-seq in eQTL mapping

Despite the many benefits of sc-eQTL mapping described above, recent studies have also noted several limitations: 1) lower power to identify eQTLs, 2) the high cost of scRNA-seq, and 3) technical noise in scRNA-seq data.

4.2.1 Less power in identifying eQTLs

sc-eQTL mapping provides a detailed annotation of eQTL effects across diverse cell types and cell states, enabling a better interpretation of the context-specific role of individual genetic variants (Cuomo et al., 2020b). However, owing to increased experimental noise, sc-eQTL mapping has lower power to discover eQTLs than bulk RNA-seq, so scRNA-seq studies require larger sample sizes to identify the same number of eQTLs as bulk studies (Sarkar et al., 2019). For instance, the scRNA-seq studies of Yazar et al. (2022) and Perez et al. (2022) each resolved fewer than 15 cell types, whereas Ota et al. (2021) analyzed 28 cell types using bulk RNA-seq data. As a result, at the same sample size, fewer cis-eQTLs are detected in scRNA-seq data than in bulk data.

4.2.2 High cost of scRNA sequencing

The second limitation of sc-eQTL studies is the high cost of scRNA-seq, which is a relatively expensive method for gene expression analysis. Whereas a typical bulk RNA-seq experiment requires up to 20 million sequencing reads per sample, scRNA-seq requires much higher coverage, typically 50,000 to 150,000 reads per cell, and even a simple scRNA-seq experiment includes thousands of cells. For example, profiling one thousand cells per sample at this depth requires 50–150 million reads per sample, 2.5–7.5 times more than a bulk RNA-seq sample. Consequently, scRNA-seq experiments also require much more memory and storage space than bulk RNA-seq experiments.
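
The read-depth comparison above is straightforward arithmetic: the total number of reads per sample equals the number of cells multiplied by the reads per cell. The sketch below reproduces the figures for 1,000 cells per sample against the 20 million reads quoted for a bulk sample.

```python
BULK_READS_PER_SAMPLE = 20_000_000  # upper figure quoted for bulk RNA-seq


def reads_per_sample(n_cells, reads_per_cell):
    """Total sequencing reads needed for one scRNA-seq sample."""
    return n_cells * reads_per_cell


for reads_per_cell in (50_000, 150_000):
    total = reads_per_sample(1_000, reads_per_cell)
    ratio = total / BULK_READS_PER_SAMPLE
    print(f"{reads_per_cell:,} reads/cell -> {total / 1e6:.0f}M reads/sample ({ratio:.1f}x bulk)")
# 50,000 reads/cell -> 50M reads/sample (2.5x bulk)
# 150,000 reads/cell -> 150M reads/sample (7.5x bulk)
```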

4.2.3 Noise in the scRNA-seq dataset

scRNA-seq data are high-dimensional and complex. Compared with bulk RNA-seq, scRNA-seq requires amplification of the small amount of genetic material in each cell to meet the input requirements of sequencing platforms. Amplification introduces technical problems, such as notable amplification bias and low coverage, so the clustering and normalization strategies used for bulk RNA-seq cannot be applied directly to scRNA-seq data. As a result, data vary considerably between cells and platforms, and library sizes differ greatly from cell to cell. scRNA-seq data therefore contain much more noise, which necessitates a series of preprocessing steps before analysis.

4.3 Strategies to overcome the limitations of scRNA-seq in mapping eQTLs

4.3.1 Decreasing the cost of scRNA-seq

One of the main limitations of scRNA-seq is its high cost. However, the development of cost-effective multiplexed workflows has substantially mitigated this limitation, enabling broader adoption of population-scale scRNA-seq and cell-type-specific eQTL studies (van der Wijst et al., 2018; Zhang et al., 2018; Cuomo et al., 2020a). Through a series of simulations, Mandric et al. (2020) demonstrated that increasing the sample size and the number of cells per individual while decreasing sequencing coverage could reduce the cost of a scRNA-seq experiment by half or more while maintaining the same statistical power. They also provided a practical guideline for designing cell-type-specific eQTL studies.

4.3.2 Developing methods for deconvoluting bulk RNA-seq signals into different cell types

The high cost of single-cell transcriptomic sequencing has led to the development of several deconvolution methods that estimate cell-type proportions or cell-type-level gene expression from bulk mRNA expression data. These methods, such as DeconRNAseq (Gong and Szustakowski, 2013), CIBERSORT (Newman et al., 2015), CIBERSORTx (Newman et al., 2019), BSEQ-sc (Baron et al., 2016), TIMER (Li et al., 2016), MuSiC (Qin et al., 2021), DSA (Zhong et al., 2013), and MMAD (Liebner et al., 2014), have been compared and discussed in recent literature (Avila Cobos et al., 2020; Jin and Liu, 2020). For instance, CIBERSORTx extends CIBERSORT to infer cell-type-specific gene expression profiles without physical cell isolation. Detailed information on these deconvolution methods is listed in Table 3. These tools are highly useful for re-analyzing existing and new bulk RNA-seq datasets to identify and interpret the role of cell-type-specific eQTLs in complex diseases. In the benchmark of Avila Cobos et al. (2020), the most widely used bulk deconvolution methods (OLS, nnls, RLR, FARDEEP, and CIBERSORT) and three methods that use scRNA-seq data as a reference (DWLS, MuSiC, and SCDC) achieved median root-mean-square error (RMSE) values below 0.05.
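
Many of the methods in Table 3 model a bulk expression vector as a signature matrix (genes x cell types) multiplied by a vector of non-negative cell-type weights. The minimal sketch below solves this non-negative least squares (NNLS) problem with projected gradient descent and rescales the weights to proportions. The R nnls package listed in Table 3 uses a different, active-set algorithm (Lawson-Hanson); projected gradient is swapped in here only to keep the example dependent on NumPy alone. The signature matrix and proportions are hypothetical.

```python
import numpy as np


def nnls_projected_gradient(signature, bulk, n_iter=5000):
    """Minimize 0.5 * ||signature @ x - bulk||^2 subject to x >= 0.

    Step size 1/L, where L = largest singular value of signature, squared
    (the Lipschitz constant of the gradient).
    """
    step = 1.0 / np.linalg.norm(signature, 2) ** 2
    x = np.zeros(signature.shape[1])
    for _ in range(n_iter):
        grad = signature.T @ (signature @ x - bulk)
        x = np.maximum(x - step * grad, 0.0)  # project onto x >= 0
    return x


def estimate_proportions(signature, bulk):
    """Rescale the NNLS weights so that they sum to one."""
    weights = nnls_projected_gradient(signature, bulk)
    return weights / weights.sum()


# Hypothetical signature: 5 marker genes x 3 cell types
signature = np.array([
    [10.0, 1.0, 1.0],
    [1.0, 10.0, 1.0],
    [1.0, 1.0, 10.0],
    [5.0, 5.0, 1.0],
    [1.0, 5.0, 5.0],
])
true_props = np.array([0.5, 0.3, 0.2])
bulk = signature @ true_props  # noise-free mixture for illustration
props = estimate_proportions(signature, bulk)
print(np.round(props, 4))
```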

TABLE 3

Name	Deconvolution model	Site
Methods without scRNA-seq data as a reference
OLS	Least squares	https://link.springer.com/chapter/10.1007/978-3-642-50096-1_48
nnls	Least squares	https://cran.r-project.org/web/packages/nnls/index.html
FARDEEP	Robust regression	https://CRAN.R-project.org/package=FARDEEP
RLR	Robust regression	https://CRAN.R-project.org/package=MASS
LASSO	Penalized regression	http://xai-tools.drwhy.ai/glmnet.html
Ridge	Penalized regression	http://xai-tools.drwhy.ai/glmnet.html
Elastic net	Penalized regression	http://xai-tools.drwhy.ai/glmnet.html
DCQ	Penalized regression	http://dcq.tau.ac.il/
EPIC	Weighted least squares	http://epic.gfellerlab.org/
CIBERSORT	Support-vector regression	http://cibersort.stanford.edu/
dtangle	Model in the logarithmic scale	dtangle.github.io
DSA	Digital sorting algorithm	http://web.cbio.uct.ac.za/∼renaud/CRAN/web/CellMix
ssKL	Semi-supervised non-negative matrix factorization	http://web.cbio.uct.ac.za/∼renaud/CRAN/web/CellMix
ssFrobenius	Semi-supervised non-negative matrix factorization	http://web.cbio.uct.ac.za/∼renaud/CRAN/web/CellMix
DeconRNASeq	Quadratic programming	https://bioconductor.org/packages/DeconRNASeq/
TIMER	Monte Carlo simulation; pathological approach	http://cistrome.org/TIMER
Methods with scRNA-seq data as reference
Bisque	Regression-based approach	https://github.com/cozygene/bisque
deconvSeq	Generalized linear model	https://github.com/rosedu1/deconvSeq
DWLS	Weighted least squares	https://github.com/sistia01/DWLS
MuSiC	Weighted non-negative least squares regression (W-NNLS)	https://github.com/xuranw/MuSiC
SCDC	ENSEMBLE method	http://meichendong.github.io/SCDC
BSEQ-sc	csSAM methodology	http://github.com/shenorrlab/bseq-sc
CIBERSORTx	Support vector	https://cibersortx.stanford.edu/

Computational deconvolution methods.

Details of the deconvolution methods in Table 3: OLS (ordinary least squares (Chambers et al., 1990)), nnls (non-negative least squares (Mullen and Stokkum, 2012)), FARDEEP (Fast And Robust DEconvolution of Expression Profiles (Hao et al., 2019)), RLR (robust linear regression, in MASS (Ripley et al., 2022)), LASSO (in glmnet (Friedman et al., 2010)), Ridge (in glmnet (Friedman et al., 2010)), Elastic net (in glmnet (Friedman et al., 2010)), DCQ (digital cell quantifier (Altboum et al., 2014)), DSA (digital sorting algorithm, in CellMix (Gaujoux and Seoighe, 2013)), ssKL (in CellMix (Gaujoux and Seoighe, 2013)), ssFrobenius (in CellMix (Gaujoux and Seoighe, 2013)), EPIC (estimating the proportion of immune and cancer cells (Racle et al., 2017)), CIBERSORT (cell-type identification by estimating relative subsets of RNA transcripts (Newman et al., 2015)), dtangle (Hunt et al., 2019), DeconRNASeq (Gong and Szustakowski, 2013), TIMER (Tumor IMmune Estimation Resource (Li et al., 2016)), Bisque (Jew et al., 2020), deconvSeq (Du et al., 2019), DWLS (dampened weighted least squares (Tsoucas et al., 2019)), MuSiC (multi-subject single cell (Wang et al., 2019)), SCDC (Dong et al., 2021), BSEQ-sc (bulk sequence single-cell (Baron et al., 2016)), CIBERSORTx (Newman et al., 2019).

4.3.3 Batch effect correction and normalization to reduce high technical noise in scRNA-seq

Reducing the high technical noise in scRNA-seq data remains a challenge. This noise can arise from differences in sequencing platform and sequencing depth, as well as from amplification bias, variable RNA capture efficiency, and dropout events. Current noise-reduction strategies for scRNA-seq data include batch-effect correction and normalization of the sequencing data. A recent comprehensive study evaluated 28 noise-reduction methods and tools on 55 real and simulated datasets and concluded that no single method is suitable for all scRNA-seq experiments (Chu et al., 2022). The selection of an appropriate method therefore requires caution and depends on the study design; the advantages and pitfalls of typical methods for batch-effect correction and normalization are listed in Table 4. Additionally, increasing the sample size is a feasible strategy for reducing the impact of experimental noise in scRNA-seq.
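The two operations named above, normalization and batch-effect correction, can be illustrated with a deliberately simple sketch: library-size normalization followed by log1p (using a scale factor of 10,000, a common convention), and a location-only batch adjustment that shifts each batch's per-gene mean onto the global per-gene mean. This is not ComBat or any other method in Table 4. ComBat also adjusts per-batch variance and shrinks its estimates with empirical Bayes, and naive centering will also erase genuine biological differences whenever cell-type composition differs between batches. The counts below are toy values.

```python
import numpy as np


def log_normalize(counts, scale=1e4):
    """Library-size normalization: scale each cell (row) to `scale` total counts, then log1p.

    counts: cells x genes array of raw counts; every cell must have a non-zero total.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=1, keepdims=True)
    if np.any(lib_size == 0):
        raise ValueError("every cell needs at least one count")
    return np.log1p(counts / lib_size * scale)


def center_by_batch(x, batches):
    """Location-only adjustment: shift each batch's per-gene mean onto the global per-gene mean."""
    x = np.asarray(x, dtype=float)
    batches = np.asarray(batches)
    corrected = x.copy()
    global_mean = x.mean(axis=0)
    for b in np.unique(batches):
        in_batch = batches == b
        corrected[in_batch] += global_mean - x[in_batch].mean(axis=0)
    return corrected


# Toy data: 4 cells x 3 genes; cells 3-4 come from a second batch with inflated gene-1 capture.
counts = np.array([[10, 5, 5],
                   [12, 4, 6],
                   [30, 5, 5],
                   [28, 6, 4]])
batches = np.array([0, 0, 1, 1])

norm = log_normalize(counts)
corrected = center_by_batch(norm, batches)
print(corrected.round(3))
```

Because this toy adjustment forces every batch to the same per-gene means, it is only defensible when batches are expected to contain the same mix of cell types; methods such as Harmony, fastMNN, or Seurat integration exist precisely to avoid that assumption.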

TABLE 4

Method	Advantages	Pitfalls
ComBat	Corrects for known batch effects using empirical Bayes adjustment	May not work well with highly variable genes
fastMNN	Integrates pairs of datasets with high accuracy	Lacks explainability
Seurat 3	Integrated with clustering and downstream analyses	May introduce unwanted sources of variation
Harmony	Corrects for batch effects while preserving biological signal	Requires careful selection of parameters
scMerge	Handles batch effects and integrates data from multiple batches	Performance may depend on the number of clusters in each batch
LIGER	Handles batch effects and normalization for unknown cell types	Requires a comparatively long runtime

Advantages and pitfalls of typical methods for batch effect correction and normalization.

5 Conclusion and future directions

In conclusion, this review provided an overview of recent advances in the study of the genetic regulation of gene expression through single-cell eQTL mapping. We also discussed how to perform sc-eQTL mapping, the advantages of scRNA-seq for sc-eQTL mapping, and its challenges and limitations. Although sc-eQTL analysis is still in its infancy, it offers great potential for advancing our understanding of the genetic regulation of gene expression.

In the future, continued advances in single-cell transcriptomics are expected to deepen our understanding of the genetic regulation of gene expression. sc-eQTL studies have already revealed many previously undetected cell-type-specific eQTLs that provide new insights into disease biology. As single-cell transcriptomic sequencing costs decrease, larger sc-eQTL studies are likely to identify additional genetic variants that regulate gene expression. Furthermore, integrating QTL signals from single-cell multi-omics with spatial data could improve the resolution at which gene regulation is characterized across omics levels.

Statements

Author contributions

JL contributed to the conception and design of the manuscript and to the writing and data collection. XW and YC contributed to the writing and data collection. GC, JW, and XS contributed to the writing and editing. All authors contributed to the article and approved the submitted version.

Funding

This study was supported by the Key Research and Development Program of Zhejiang (2021C02052) and the Natural Science Foundation of Zhejiang Province (LY20C150004).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2023.1182579/full#supplementary-material

References

Altboum, Z., Steuerman, Y., David, E., Barnett-Itzhaki, Z., Valadarsky, L., Keren-Shaul, H., et al. (2014). Digital cell quantification identifies global immune cell dynamics during influenza infection. Mol. Syst. Biol. 10 (2), 720. doi:10.1002/msb.134947
Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P., and De Preter, K. (2020). Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11 (1), 5650. doi:10.1038/s41467-020-19015-1
Baron, M., Veres, A., Wolock, S. L., Faust, A. L., Gaujoux, R., Vetere, A., et al. (2016). A single-cell transcriptomic map of the human and mouse pancreas reveals inter- and intra-cell population structure. Cell Syst. 3 (4), 346–360. doi:10.1016/j.cels.2016.08.011
Brennecke, P., Anders, S., Kim, J. K., Kołodziejczyk, A. A., Zhang, X., Proserpio, V., et al. (2013). Accounting for technical noise in single-cell RNA-seq experiments. Nat. Methods 10 (11), 1093–1095. doi:10.1038/nmeth.2645
Brodie, A., Azaria, J. R., and Ofran, Y. (2016). How far from the SNP may the causative genes be? Nucleic Acids Res. 44 (13), 6046–6054. doi:10.1093/nar/gkw500
Bryois, J., Calini, D., Macnair, W., Foo, L., Urich, E., Ortmann, W., et al. (2022). Cell-type-specific cis-eQTLs in eight human brain cell types identify novel risk genes for psychiatric and neurological disorders. Nat. Neurosci. 25 (8), 1104–1112. doi:10.1038/s41593-022-01128-z
Chambers, J., Hastie, T., and Pregibon, D. (Editors) (1990). Statistical models in S. Compstat 1990 (Heidelberg: Physica-Verlag HD). doi:10.1201/9780203738535
Chen, G., Ning, B., and Shi, T. (2019). Single-cell RNA-seq technologies and related computational data analysis. Front. Genet. 10, 317. doi:10.3389/fgene.2019.00317
Chen, W., Zhao, Y., Chen, X., Yang, Z., Xu, X., Bi, Y., et al. (2021). A multicenter study benchmarking single-cell RNA sequencing technologies using reference samples. Nat. Biotechnol. 39 (9), 1103–1114. doi:10.1038/s41587-020-00748-9
Chu, S.-K., Zhao, S., Shyr, Y., and Liu, Q. (2022). Comprehensive evaluation of noise reduction methods for single-cell RNA sequencing data. Briefings Bioinforma. 23 (2), bbab565. doi:10.1093/bib/bbab565
Cuomo, A. S. E., Alvari, G., Azodi, C. B., McCarthy, D. J., and Bonder, M. J. (2021). Optimizing expression quantitative trait locus mapping workflows for single-cell studies. Genome Biol. 22 (1), 188. doi:10.1186/s13059-021-02407-x
Cuomo, A. S. E., Seaton, D. D., McCarthy, D. J., Martinez, I., Bonder, M. J., Garcia-Bernardo, J., et al. (2020b). Publisher Correction: Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11 (1), 1572. doi:10.1038/s41467-020-15098-y
Cuomo, A. S. E., Seaton, D. D., McCarthy, D. J., Martinez, I., Bonder, M. J., Garcia-Bernardo, J., et al. (2020a). Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression. Nat. Commun. 11 (1), 810. doi:10.1038/s41467-020-14457-z
Delmans, M., and Hemberg, M. (2016). Discrete distributional differential expression (D3E) - a tool for gene expression analysis of single-cell RNA-seq data. BMC Bioinforma. 17 (1), 110. doi:10.1186/s12859-016-0944-6
Dong, M., Thennavan, A., Urrutia, E., Li, Y., Perou, C. M., Zou, F., et al. (2021). SCDC: Bulk gene expression deconvolution by multiple single-cell RNA sequencing references. Briefings Bioinforma. 22 (1), 416–427. doi:10.1093/bib/bbz166
Donovan, M. K. R., D’Antonio-Chronowska, A., D’Antonio, M., and Frazer, K. A. (2020). Cellular deconvolution of GTEx tissues powers discovery of disease and cell-type associated regulatory variants. Nat. Commun. 11 (1), 955. doi:10.1038/s41467-020-14561-0
Du, R., Carey, V., and Weiss, S. T. (2019). deconvSeq: deconvolution of cell mixture distribution in sequencing data. Bioinformatics 35 (24), 5095–5102. doi:10.1093/bioinformatics/btz444
Eraslan, G., Simon, L. M., Mircea, M., Mueller, N. S., and Theis, F. J. (2019). Single-cell RNA-seq denoising using a deep count autoencoder. Nat. Commun. 10 (1), 390. doi:10.1038/s41467-018-07931-2
Fairfax, B. P., Makino, S., Radhakrishnan, J., Plant, K., Leslie, S., Dilthey, A., et al. (2012). Genetics of gene expression in primary immune cells identifies cell type–specific master regulators and roles of HLA alleles. Nat. Genet. 44 (5), 502–510. doi:10.1038/ng.2205
Favé, M.-J., Lamaze, F. C., Soave, D., Hodgkinson, A., Gauvin, H., Bruat, V., et al. (2018). Gene-by-environment interactions in urban populations modulate risk phenotypes. Nat. Commun. 9 (1), 827. doi:10.1038/s41467-018-03202-2
Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 (1), 1–22. doi:10.18637/jss.v033.i01
Gatti, D. M., Sypa, M., Rusyn, I., Wright, F. A., and Barry, W. T. (2009). SAFEGUI: Resampling-based tests of categorical significance in gene expression data made easy. Bioinformatics 25 (4), 541–542. doi:10.1093/bioinformatics/btn655
Gaujoux, R., and Seoighe, C. (2013). CellMix: A comprehensive toolbox for gene expression deconvolution. Bioinformatics 29 (17), 2211–2212. doi:10.1093/bioinformatics/btt351
Gewirtz, A. D., Townes, F. W., and Engelhardt, B. E. (2022). Expression QTLs in single-cell sequencing data. bioRxiv. doi:10.1101/2022.08.14.503915
Gong, T., and Szustakowski, J. D. (2013). DeconRNASeq: A statistical framework for deconvolution of heterogeneous tissue samples based on mRNA-seq data. Bioinformatics 29 (8), 1083–1085. doi:10.1093/bioinformatics/btt090
Grün, D., Lyubimova, A., Kester, L., Wiebrands, K., Basak, O., Sasaki, N., et al. (2015). Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525 (7568), 251–255. doi:10.1038/nature14966
Guerrero-Juarez, C. F., Dedhia, P. H., Jin, S., Ruiz-Vega, R., Ma, D., Liu, Y., et al. (2019). Single-cell analysis reveals fibroblast heterogeneity and myeloid-derived adipocyte progenitors in murine skin wounds. Nat. Commun. 10 (1), 650. doi:10.1038/s41467-018-08247-x
Guo, M., Wang, H., Potter, S. S., Whitsett, J. A., and Xu, Y. (2015). SINCERA: A pipeline for single-cell RNA-seq profiling analysis. PLoS Comput. Biol. 11 (11), e1004575. doi:10.1371/journal.pcbi.1004575
Haghverdi, L., Lun, A. T. L., Morgan, M. D., and Marioni, J. C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36 (5), 421–427. doi:10.1038/nbt.4091
Hao, Y., Hao, S., Andersen-Nissen, E., Mauck, W. M., 3rd, Zheng, S., Butler, A., et al. (2021). Integrated analysis of multimodal single-cell data. Cell 184 (13), 3573–3587.e29. doi:10.1016/j.cell.2021.04.048
Hao, Y., Yan, M., Heath, B. R., Lei, Y. L., and Xie, Y. (2019). Fast and robust deconvolution of tumor infiltrating lymphocyte from expression profiles using least trimmed squares. PLoS Comput. Biol. 15 (5), e1006976. doi:10.1371/journal.pcbi.1006976
Hernández, P. P., Strzelecka, P. M., Athanasiadis, E. I., Hall, D., Robalo, A. F., Collins, C. M., et al. (2018). Single-cell transcriptional analysis reveals ILC-like cells in zebrafish. Sci. Immunol. 3 (29), eaau5265. doi:10.1126/sciimmunol.aau5265
Hie, B., Bryson, B., and Berger, B. (2019). Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat. Biotechnol. 37 (6), 685–691. doi:10.1038/s41587-019-0113-3
Hu, Y., Xi, X., Yang, Q., and Zhang, X. (2020). SCeQTL: an R package for identifying eQTL from single-cell parallel sequencing data. BMC Bioinforma. 21 (1), 184. doi:10.1186/s12859-020-3534-651
Hunt, G. J., Freytag, S., Bahlo, M., and Gagnon-Bartsch, J. A. (2019). dtangle: accurate and robust cell type deconvolution. Bioinformatics 35 (12), 2093–2099. doi:10.1093/bioinformatics/bty926
Ishigaki, K., Kochi, Y., Suzuki, A., Tsuchida, Y., Tsuchiya, H., Sumitomo, S., et al. (2017). Polygenic burdens on cell-specific pathways underlie the risk of rheumatoid arthritis. Nat. Genet. 49 (7), 1120–1125. doi:10.1038/ng.3885
Jew, B., Alvarez, M., Rahmani, E., Miao, Z., Ko, A., Garske, K. M., et al. (2020). Accurate estimation of cell composition in bulk expression through robust integration of single-cell information. Nat. Commun. 11 (1), 1971. doi:10.1038/s41467-020-15816-6
Ji, Z., and Ji, H. (2016). TSCAN: Pseudo-time reconstruction and evaluation in single-cell RNA-seq analysis. Nucleic Acids Res. 44 (13), e117. doi:10.1093/nar/gkw430
Jiang, L., Chen, H., Pinello, L., and Yuan, G. C. (2016). GiniClust: Detecting rare cell types from single-cell gene expression data with Gini index. Genome Biol. 17 (1), 144. doi:10.1186/s13059-016-1010-4
Jin, H., and Liu, Z. (2020). A comparative study of deconvolution methods for RNA-seq data under a dynamic testing landscape. bioRxiv, 418640. doi:10.1101/2020.12.09.418640
Johnson, W. E., Li, C., and Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8 (1), 118–127. doi:10.1093/biostatistics/kxj037
Kang, H. M., Subramaniam, M., Targ, S., Nguyen, M., Maliskova, L., McCarthy, E., et al. (2018). Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat. Biotechnol. 36 (1), 89–94. doi:10.1038/nbt.4042
Karamitros, D., Stoilova, B., Aboukhalil, Z., Hamey, F., Reinisch, A., Samitsch, M., et al. (2018). Single-cell analysis reveals the continuum of human lympho-myeloid progenitor cells. Nat. Immunol. 19 (1), 85–97. doi:10.1038/s41590-017-0001-2
Kilpinen, H., Goncalves, A., Leha, A., Afzal, V., Alasoo, K., Ashford, S., et al. (2017). Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546 (7658), 370–375. doi:10.1038/nature22403
Kim-Hellmuth, S., Aguet, F., Oliva, M., Muñoz-Aguirre, M., Kasela, S., Wucher, V., et al. (2020). Cell type–specific genetic regulation of gene expression across human tissues. Science 369 (6509), eaaz8528. doi:10.1126/science.aaz8528
Kiselev, V. Y., Andrews, T. S., and Hemberg, M. (2019). Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20 (5), 273–282. doi:10.1038/s41576-018-0088-9
Kiselev, V. Y., Kirschner, K., Schaub, M. T., Andrews, T., Yiu, A., Chandra, T., et al. (2017). SC3: Consensus clustering of single-cell RNA-seq data. Nat. Methods 14 (5), 483–486. doi:10.1038/nmeth.4236
Korsunsky, I., Millard, N., Fan, J., Slowikowski, K., Zhang, F., Wei, K., et al. (2019). Fast, sensitive and accurate integration of single-cell data with Harmony. Nat. Methods 16 (12), 1289–1296. doi:10.1038/s41592-019-0619-0
Knowles, D. A., Davis, J. R., Edgington, H., Raj, A., Favé, M.-J., Zhu, X., et al. (2017). Allele-specific expression reveals interactions between genetic variation and environment. Nat. Methods 14 (7), 699–702. doi:10.1038/nmeth.4298
Levine, J. H., Simonds, E. F., Bendall, S. C., Davis, K. L., Amir, el-A. D., Tadmor, M. D., et al. (2015). Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162 (1), 184–197. doi:10.1016/j.cell.2015.05.047
Li, B., Severson, E., Pignon, J. C., Zhao, H., Li, T., Novak, J., et al. (2016). Comprehensive analyses of tumor immunity: Implications for cancer immunotherapy. Genome Biol. 17 (1), 174. doi:10.1186/s13059-016-1028-7
Liebner, D. A., Huang, K., and Parvin, J. D. (2014). MMAD: Microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics 30 (5), 682–689. doi:10.1093/bioinformatics/btt566
Lin, P., Troup, M., and Ho, J. W. (2017). CIDR: Ultrafast and accurate clustering through imputation for single-cell RNA-seq data. Genome Biol. 18 (1), 59. doi:10.1186/s13059-017-1188-0
Liu, J., Gao, C., Sodicoff, J., Kozareva, V., Macosko, E. Z., and Welch, J. D. (2020). Jointly defining cell types from multiple single-cell datasets using LIGER. Nat. Protoc. 15 (11), 3632–3662. doi:10.1038/s41596-020-0391-8
Lopez, R., Regier, J., Cole, M. B., Jordan, M. I., and Yosef, N. (2018). Deep generative modeling for single-cell transcriptomics. Nat. Methods 15 (12), 1053–1058. doi:10.1038/s41592-018-0229-2
Lotfollahi, M., Wolf, F. A., and Theis, F. J. (2019). scGen predicts single-cell perturbation responses. Nat. Methods 16 (8), 715–721. doi:10.1038/s41592-019-0494-8
Lu, A., Thompson, M., Gordon, M. G., Dahl, A., Ye, C. J., Zaitlen, N., et al. (2021). Fast and powerful statistical method for context-specific QTL mapping in multi-context genomic studies. bioRxiv. doi:10.1101/2021.06.17.448889
Luecken, M. D., and Theis, F. J. (2019). Current best practices in single-cell RNA-seq analysis: A tutorial. Mol. Syst. Biol. 15 (6), e8746. doi:10.15252/msb.20188746
Ma, T., Li, H., and Zhang, X. (2022). Discovering single-cell eQTLs from scRNA-seq data only. Gene 829, 146520. doi:10.1016/j.gene.2022.146520
Macosko, E. Z., Basu, A., Satija, R., Nemesh, J., Shekhar, K., Goldman, M., et al. (2015). Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161 (5), 1202–1214. doi:10.1016/j.cell.2015.05.002
Mandric, I., Schwarz, T., Majumdar, A., Hou, K., Briscoe, L., Perez, R., et al. (2020). Optimized design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis. Nat. Commun. 11 (1), 5504. doi:10.1038/s41467-020-19365-w
McCarthy, D. J., Campbell, K. R., Lun, A. T. L., and Wills, Q. F. (2017). Scater: Pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33 (8), 1179–1186. doi:10.1093/bioinformatics/btw777
Miao, Z., Deng, K., Wang, X., and Zhang, X. (2018). DEsingle for detecting three types of differential expression in single-cell RNA-seq data. Bioinformatics 34 (18), 3223–3224. doi:10.1093/bioinformatics/bty332
Mullen, K. M., and Stokkum, I. H. M. (2012). nnls: the Lawson-Hanson algorithm for non-negative least squares (NNLS). R package version 1.4.
Nathan, A., Asgari, S., Ishigaki, K., Valencia, C., Amariuta, T., Luo, Y., et al. (2022). Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606 (7912), 120–128. doi:10.1038/s41586-022-04713-1
Neavin, D., Nguyen, Q., Daniszewski, M. S., Liang, H. H., Chiu, H. S., Wee, Y. K., et al. (2021). Single cell eQTL analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells. Genome Biol. 22 (1), 76. doi:10.1186/s13059-021-02293-3
Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., et al. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nat. Methods 12 (5), 453–457. doi:10.1038/nmeth.3337
Newman, A. M., Steen, C. B., Liu, C. L., Gentles, A. J., Chaudhuri, A. A., Scherer, F., et al. (2019). Determining cell type abundance and expression from bulk tissues with digital cytometry. Nat. Biotechnol. 37 (7), 773–782. doi:10.1038/s41587-019-0114-2
Ota, M., Nagafuchi, Y., Hatano, H., Ishigaki, K., Terao, C., Takeshima, Y., et al. (2021). Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell 184 (11), 3006–3021.e17. doi:10.1016/j.cell.2021.03.056
Perez, R. K., Gordon, M. G., Subramaniam, M., Kim, M. C., Hartoularos, G. C., Targ, S., et al. (2022). Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus. Science 376 (6589), eabf1970. doi:10.1126/science.abf1970
Polański, K., Young, M. D., Miao, Z., Meyer, K. B., Teichmann, S. A., and Park, J. E. (2020). BBKNN: Fast batch alignment of single cell transcriptomes. Bioinformatics 36 (3), 964–965. doi:10.1093/bioinformatics/btz625
Qi, J., Foroughi Asl, H., Björkegren, J., and Michoel, T. (2014). kruX: matrix-based non-parametric eQTL discovery. BMC Bioinforma. 15 (1), 11. doi:10.1186/1471-2105-15-11
Qin, Y., Huttlin, E. L., Winsnes, C. F., Gosztyla, M. L., Wacheul, L., Kelly, M. R., et al. (2021). A multi-scale map of cell structure fusing protein images and interactions. Nature 600 (7889), 536–542. doi:10.1038/s41586-021-04115-9
Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E., and Gfeller, D. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. eLife 6, e26476. doi:10.7554/eLife.26476
Ripley, B., Venables, B., Bates, D. M., Hornik, K., Gebhardt, A., and Firth, D. (2022). Support functions and datasets for Venables and Ripley's MASS. R package MASS version 7.3-58.
Risso, D., Perraudeau, F., Gribkova, S., Dudoit, S., and Vert, J.-P. (2018). A general and flexible method for signal extraction from single-cell RNA-seq data. Nat. Commun. 9 (1), 284. doi:10.1038/s41467-017-02554-5
Ritchie, M. E., Phipson, B., Wu, D., Hu, Y., Law, C. W., Shi, W., et al. (2015). Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43 (7), e47. doi:10.1093/nar/gkv007
Sarkar, A. K., Tung, P. Y., Blischak, J. D., Burnett, J. E., Li, Y. I., Stephens, M., et al. (2019). Discovery and characterization of variance QTLs in human induced pluripotent stem cells. PLoS Genet. 15, e1008045. doi:10.1371/journal.pgen.1008045
Shabalin, A. A. (2012). Matrix eQTL: Ultra fast eQTL analysis via large matrix operations. Bioinformatics 28 (10), 1353–1358. doi:10.1093/bioinformatics/bts163
Svensson, V., Natarajan, K. N., Ly, L. H., Miragaia, R. J., Labalette, C., Macaulay, I. C., et al. (2017). Power analysis of single-cell RNA-sequencing experiments. Nat. Methods 14, 381–387. doi:10.1038/nmeth.4220
Teng, M., Love, M. I., Davis, C. A., Djebali, S., Dobin, A., Graveley, B. R., et al. (2016). A benchmark for RNA-seq quantification pipelines. Genome Biol. 17 (1), 74. doi:10.1186/s13059-016-0940-1
Tran, H. T. N., Ang, K. S., Chevrier, M., Zhang, X. M., Lee, N. Y. S., Goh, M., et al. (2020). A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol. 21, 12. doi:10.1186/s13059-019-1850-9
Tsafrir, D., Tsafrir, I., Ein-Dor, L., Zuk, O., Notterman, D. A., and Domany, E. (2005). Sorting points into neighborhoods (SPIN): Data analysis and visualization by ordering distance matrices. Bioinformatics 21 (10), 2301–2308. doi:10.1093/bioinformatics/bti329
Tsoucas, D., Dong, R., Chen, H., Zhu, Q., Guo, G., and Yuan, G.-C. (2019). Accurate estimation of cell-type composition from gene expression data. Nat. Commun. 10 (1), 2975. doi:10.1038/s41467-019-10802-z
Tsoucas, D., and Yuan, G.-C. (2018). GiniClust2: A cluster-aware, weighted ensemble clustering method for cell-type detection. Genome Biol. 19 (1), 58. doi:10.1186/s13059-018-1431-3
Umans, B. D., Battle, A., and Gilad, Y. (2020). Where are the disease-associated eQTLs? Trends Genet. 37 (2), 109–124. doi:10.1016/j.tig.2020.08.009
van der Wijst, M. G. P., Brugge, H., de Vries, D. H., Deelen, P., Swertz, M. A., Franke, L., et al. (2018). Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50 (4), 493–497. doi:10.1038/s41588-018-0089-9
van der Wijst, M. G. P., de Vries, D. H., Groot, H. E., Trynka, G., Hon, C. C., Bonder, M. J., et al. (2020). The single-cell eQTLGen consortium. eLife 9, e52155. doi:10.7554/eLife.52155
Vieth, B., Parekh, S., Ziegenhain, C., Enard, W., and Hellmann, I. (2019). A systematic evaluation of single cell RNA-seq analysis pipelines. Nat. Commun. 10 (1), 4667. doi:10.1038/s41467-019-12266-7
Villani, A.-C., Satija, R., Reynolds, G., Sarkizova, S., Shekhar, K., Fletcher, J., et al. (2017). Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors. Science 356 (6335), eaah4573. doi:10.1126/science.aah4573
Visscher, P. M., Wray, N. R., Zhang, Q., Sklar, P., McCarthy, M. I., Brown, M. A., et al. (2017). 10 Years of GWAS discovery: Biology, function, and translation. Am. J. Hum. Genet. 101 (1), 5–22. doi:10.1016/j.ajhg.2017.06.005
Wang, B., Ramazzotti, D., De Sano, L., Zhu, J., Pierson, E., and Batzoglou, S. (2018). SIMLR: A tool for large-scale genomic analyses by multi-kernel learning. Proteomics 18 (2), 1700232. doi:10.1002/pmic.201700232
Wang, X., Park, J., Susztak, K., Zhang, N. R., and Li, M. (2019). Bulk tissue cell type deconvolution with multi-subject single-cell expression reference. Nat. Commun. 10 (1), 380. doi:10.1038/s41467-018-08023-x
Wegmann, R., Neri, M., Schuierer, S., Bilican, B., Hartkopf, H., Nigsch, F., et al. (2019). CellSIUS provides sensitive and specific detection of rare cell populations from complex single-cell RNA-seq data. Genome Biol. 20 (1), 142. doi:10.1186/s13059-019-1739-7
Wills, Q. F., Livak, K. J., Tipping, A. J., Enver, T., Goldson, A. J., Sexton, D. W., et al. (2013). Single-cell gene expression analysis reveals genetic associations masked in whole-tissue experiments. Nat. Biotechnol. 31 (8), 748–752. doi:10.1038/nbt.2642
Wolf, F. A., Angerer, P., and Theis, F. J. (2018). SCANPY: Large-scale single-cell gene expression data analysis. Genome Biol. 19 (1), 15. doi:10.1186/s13059-017-1382-0
Xue, A., Yazar, S., Neavin, D., and Powell, J. E. (2023). Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses. Genome Biol. 24 (1), 33. doi:10.1186/s13059-023-02873-5
Yao, Z., van Velthoven, C. T. J., Nguyen, T. N., Goldy, J., Sedeno-Cortes, A. E., Baftizadeh, F., et al. (2021). A taxonomy of transcriptomic cell types across the isocortex and hippocampal formation. Cell 184 (12), 3222–3241.e26. doi:10.1016/j.cell.2021.04.021
Yazar, S., Alquicira-Hernandez, J., Wing, K., Senabouth, A., Gordon, M. G., Andersen, S., et al. (2022). Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science 376 (6589), eabf3041. doi:10.1126/science.abf3041
Yazar, S., and Powell, J. E. (2022). Single-cell expression quantitative trait loci: T-Cell immunology teams up with statistical genetics. Immunol. Cell Biol. 100 (8), 588–590. doi:10.1111/imcb.12577
Zeisel, A., Muñoz-Manchado, A. B., Codeluppi, S., Lönnerberg, P., La Manno, G., Juréus, A., et al. (2015). Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science 347 (6226), 1138–1142. doi:10.1126/science.aaa1934
Zhang, T., Choi, J., Kovacs, M. A., Shi, J., Xu, M., Goldstein, A. M., et al. (2018). Cell-type-specific eQTL of primary melanocytes facilitates identification of melanoma susceptibility genes. Genome Res. 28 (11), 1621–1635. doi:10.1101/gr.233304.117
Zhernakova, D. V., Deelen, P., Vermaat, M., van Iterson, M., van Galen, M., Arindrarto, W., et al. (2017). Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49 (1), 139–145. doi:10.1038/ng.3737
Zhong, Y., Wan, Y. W., Pang, K., Chow, L. M., and Liu, Z. (2013). Digital sorting of complex tissues for cell type-specific gene expression profiles. BMC Bioinforma. 14, 89. doi:10.1186/1471-2105-14-89
Žurauskienė, J., and Yau, C. (2016). pcaReduce: hierarchical clustering of single cell transcriptional profiles. BMC Bioinforma. 17, 140. doi:10.1186/s12859-016-0984-y

Summary

Keywords

sc-eQTL, cell-type-specific, genetic variants, scRNA-seq, bulk RNA-seq

Citation

Luo J, Wu X, Cheng Y, Chen G, Wang J and Song X (2023) Expression quantitative trait locus studies in the era of single-cell omics. Front. Genet. 14:1182579. doi: 10.3389/fgene.2023.1182579

Received

10 March 2023

Accepted

26 April 2023

Published

22 May 2023

Volume

14 - 2023

Edited by

Shizhong Xu, University of California, Riverside, United States

Reviewed by

Marc Jan Bonder, European Molecular Biology Laboratory Heidelberg, Germany

Maud Fagny, Institut National de recherche pour l’agriculture, l’alimentation et l’environnement (INRAE), France

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Jie Luo, luojie@mail.zaas.ac.cn

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
