Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes

Wei, Huijin; Li, Xianghua

doi:10.3389/fgene.2023.1087267

MINI REVIEW article

Front. Genet., 12 January 2023

Sec. Genomic Assay Technology

Volume 14 - 2023 | https://doi.org/10.3389/fgene.2023.1087267

Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes

Huijin Wei¹

Xianghua Li^1,2,3,4*

¹Zhejiang University—University of Edinburgh Institute, Zhejiang University, Haining, Zhejiang, China
²Deanery of Biomedical Sciences, University of Edinburgh, Edinburgh, United Kingdom
³The Second Affiliated Hospital of Zhejiang University, Hangzhou, Zhejiang, China
⁴Biomedical and Health Translational Centre of Zhejiang Province, Haining, Zhejiang, China

Unveiling how genetic variations lead to phenotypic variations is one of the key questions in evolutionary biology, genetics, and biomedical research. Deep mutational scanning (DMS) technology has allowed the mapping of tens of thousands of genetic variations to phenotypic variations efficiently and economically. Since its first systematic introduction about a decade ago, we have witnessed the use of deep mutational scanning in many research areas leading to scientific breakthroughs. Also, the methods in each step of deep mutational scanning have become much more versatile thanks to the oligo-synthesizing technology, high-throughput phenotyping methods and deep sequencing technology. However, each specific possible step of deep mutational scanning has its pros and cons, and some limitations still await further technological development. Here, we discuss recent scientific accomplishments achieved through the deep mutational scanning and describe widely used methods in each step of deep mutational scanning. We also compare these different methods and analyze their advantages and disadvantages, providing insight into how to design a deep mutational scanning study that best suits the aims of the readers’ projects.

Introduction

Since Mendel’s experiments with peas (Mendel, 1865) laid the foundation of modern genetics about 150 years ago, our ability to read, write, and rewrite genetic information has grown prominently. In comparison, our ability to understand genetic information—i.e., mapping genetic variations to phenotypic variations—is very limited. For instance, the effects of the vast majority of human genetic variations are unknown (Riesselman et al., 2018; Frazer et al., 2021; Lappalainen and MacArthur, 2021). In light of this challenge, deep mutational scanning (DMS) was developed to systematically quantify the effects of genetic variations on a large scale, with high efficiency and relatively low cost (Fowler et al., 2011; Hietpas et al., 2012). DMS, also known as massively parallel mutagenesis (Fowler et al., 2011; Fowler and Fields, 2014), involves making a comprehensive mutant library followed by high-throughput phenotyping and deep-sequencing of the mutant libraries before and after selection (Figure 1A).

FIGURE 1

FIGURE 1. An overview of the DMS procedure (A) A mutant DNA library is transformed into cell types of interest to generate a mutant cell library. Then, the mutant cell library goes through high-throughput phenotyping where cells carrying functional variants are enriched (cells filled with blue) while those with detrimental variants are depleted (cells filled with red or purple). Genetic variants are extracted and sequenced to calculate the relative enrichment changes before and after selection. Finally, the enrichment scores are analysed as the functional scores of mutations (B) Protein-Fragment Complementation Assay (PCA) (C) The underlying assumption is that the concentrations of functional DHFR are linearly related to cell survival (fitness) (D) BindingPCA captures mutational effects on both stability and protein-protein interactions without distinguishing them (E) AbundancePCA captures mutational effects on stability (F) ddPCA combines BindingPCA and AbundancePCA and enables the inference of the bbphysical effects of mutations by quantifying and comparing phenotypic effects.

DMS has been widely used in many biological systems, allowing breakthroughs in biological and biomedical research since its first introduction (Fowler et al., 2010; Fowler and Fields, 2014). For example, many human disease-related genetic variants with unknown significance have been classified as either benign or detrimental systematically (Majithia et al., 2016; Findlay et al., 2018; Matreyek et al., 2018; Mighell et al., 2018; Bridgford et al., 2020; Mighell et al., 2020; Hanna et al., 2021; Seuma et al., 2021). Genetic interaction patterns and the underlying biophysical mechanisms have been revealed for both between genes (Diss and Lehner, 2018; Lite et al., 2020; Faure et al., 2022) and within the same gene (Olson et al., 2014; Li et al., 2016; Puchta et al., 2016; Sarkisyan et al., 2016; Baeza-Centurion et al., 2019; Yoo et al., 2020; Faure et al., 2022). Also, using the positional genetic interaction scores generated from DMS experiments, protein structures can be accurately predicted (Rollins et al., 2019; Schmiedel and Lehner, 2019). The release of the DMS data on SARS-Cov2 spike protein RBD within a year of the SARS-Cov2 outbreak (Starr et al., 2020) demonstrates that DMS is a powerful technique to address pressing questions in a relatively short period. The data accurately captured some SARS-Cov2 mutations that became prevalent in the later stage of the pandemic (Starr et al., 2020; Starr et al., 2022). Furthermore, DMS data on immune-escape mutants of various SARS-Cov2 variants (Greaney et al., 2021a; Greaney et al., 2021b; Javanmardi et al., 2022) guides better vaccine design.

A typical DMS experiment involves three steps: 1) generating a genetic mutant library; 2) performing a high-throughput phenotyping assay; 3) and deep sequencing and data analysis. Several good reviews on designing DMS experiments were published (Fowler and Fields, 2014; Shin and Cho, 2015; Starita and Fields, 2015; Matuszewski et al., 2016; Cao et al., 2022) in the early days of DMS. However, many more technical options became available in DMS thanks to the fast-developing technology in gene synthesis, sequencing technologies and high-throughput phenotyping methods since the reviews. The recent reviews (Weile and Roth, 2018; Kemble et al., 2019; Kinney and McCandlish, 2019; Narayanan and Procko, 2021; Hanning et al., 2022) in light of the DMS boom mostly focus on specific biological insights—for example, how the technique enabled breakthroughs in human genetics (Weile and Roth, 2018), on transcriptional factors (TF) and cis-regulatory elements (CRE) (Kinney and McCandlish, 2019), on viral protein and receptors (Narayanan and Procko, 2021) or therapeutic antibody engineering (Hanning et al., 2022). Kemble et al. gave a comprehensive overview of genotype-phenotype mapping (Kemble et al., 2019) enabled by DMS technology. While the DMS strategy is straightforward, each step of the technique can be tricky and complicated to generate clean and meaningful data, as it involves various synthetic biology and massive parallel assays. In addition, genetic variants from DMS experiments are of low complexity but are of a big amount that needs special attention for statistical analysis. We notice a lack of such up-to-date reviews on the insights in technical aspects.

In this review, we give an up-to-date overview of the DMS experiment (Figure 1A) with a specific focus on recently developed techniques in mutation library generation, high-throughput methods, and data analysis. Our motivation is to guide the readers on selecting the most appropriate techniques for a DMS project aim. Finally, we will discuss the ongoing efforts and challenges in improving DMS accuracy and scope.

Generating a genetic mutant library

A genetic mutant library is often first synthesized as a pool of oligos and amplified as a library of linear gene blocks. Then the amplified dsDNA is ligated to the expression vector backbones to substitute the wild-type region of the gene to be mutated. The ligation mix is introduced to the cloning cell lines to be amplified and extracted as a plasmid mutant library, which will be introduced to the destination cells (i.e., via transformation) for the next step—high throughput phenotyping assay. While most of the steps mentioned above follow regular molecular cloning procedures, mutagenesis in the very first step is not trivial and requires careful design. In this section, we will discuss the most widely used methods in designing and creating mutant libraries so that the readers can determine the optimal method to suit their research needs.

Error-prone PCR

Error-prone PCR is relatively cheap and easy to perform. It uses low-fidelity DNA polymerases to incorporate mistakes during the DNA amplification, and mutation rates can be modified by PCR conditions like different concentrations of manganese chloride and dNTP (Lin-Goerke et al., 1997; Shafikhani et al., 1997). The technique has been widely used in making random mutations for directed evolution experiments (Giver et al., 1998; Moore et al., 2000) and recently for DMS studies (Sarkisyan et al., 2016; Seuma et al., 2021; Faure et al., 2022). However, mutations generated via error-prone PCR are not completely random due to mutation biases of polymerases. For example, Taq polymerase-based mutation rates from A/T are much higher than from C/G (Shafikhani et al., 1997; Wan et al., 1998). Nowadays, error-prone PCR is made easier using commercial kits with mixes of engineered polymerases (Vanhercke et al., 2005), generating reduced biases. However, judging from the DMS data (Faure et al., 2022), mutation biases are only partially removable even with commercial kits. To be noted, error-prone PCR is suitable for generating comprehensive nucleotide-level mutations but not for all possible single amino acid substitutions for each codon. To achieve all possible 19 amino acid substitutions per codon, two consecutive nucleotides of a codon must often be mutated simultaneously. But such a mutation rate will likely hit two or more codons simultaneously, creating a mutation library mixed with single amino acid substitutions and multiple amino acid substitutions.

PCR with oligonucleotides containing mutations

Another commonly used method is a DMS library with a pool of oligos containing different mutations. Compared to the error-prone PCR, it is more costly but can generate a customized library with fewer biases. Oligonucleotides containing random mutations can be synthesized as a pool of doped oligos (Matteucci and Heyneker, 1983; Araya et al., 2012; Li et al., 2016; Puchta et al., 2016; Li et al., 2019; Wu et al., 2022) or oligos containing NNN triplets (sometimes NNS or NNK) (Hietpas et al., 2012; McLaughlin et al., 2012; Stiffler et al., 2015; Starr et al., 2017; Diss and Lehner, 2018; Hartman et al., 2018; Ahler et al., 2019; Starr et al., 2020; Park et al., 2022), where N represents any of the four nucleotide bases, S for G/C and K for G/T) targeting each codon. This strategy, combined with oligo pool synthesis technology like DropSynth (Plesa et al., 2018), allows construction of user-defined, scalable, and low-cost mutant libraries with comprehensive nucleotide or amino acid substitutions.

These oligos can be designed as doped oligos with each position incorporating a defined percentage of mutations (Starita and Fields, 2015) during the oligo synthesis. The pool of the long mutant oligos (up to 300 nt) can be used as DNA templates. These long oligos need to contain flanking wild-type sequences for primer binding, so they can be amplified and replace the wild-type sequences. On the other hand, short oligos with user-defined mutations or NNN triplets serve as primers. Mutations are introduced to the gene in a manner that is similar to site-directed mutagenesis. The oligos containing NNN triplets are more suited to create mutant libraries covering all possible single amino acid substitutions, while this doped oligo method also targets nucleotide-level mutations as error-prone PCR does. The disadvantage of using oligos with NNN triplets is that it often requires at least two consecutive PCR reactions to generate double amino acid substitutions.

Another popular primer-based method is the nicking mutagenesis (Wrenbeck et al., 2016; Faure et al., 2022), which is developed from a method called Pfunkle (Firnberg and Ostermeier, 2012). Both methods use the circular dsDNA as the template and incorporate mutations using a mix of phosphorylated primers. To remove excessive wild-type template, thymidine in the template is replaced with uracil and degraded after the mutagenesis using the uracil DNA glycosylase and exonuclease III (Exo III) (Firnberg and Ostermeier, 2012). For the same purpose, nicking mutagenesis uses a pair of endonucleases (NtBbvCl and NbBbvCl) that nick one strand of the template dsDNA at a time.

Firstly, a 5’ phosphorylated mutant oligo pool as primers is applied to the NtBbvCI-treated ssDNA template to generate the second-strand DNA with mutations. Then, a second phosphorylated primer without mutations will synthesize the complementary strand for each mutated genetic variant, using the PCR-derived strand with mutations as templates. While other primer-based mutagenesis methods require a pair of primers per mutant, both Pfunkle and nicking mutagenesis requires only one primer per mutant, greatly reducing the cost of oligo synthesis. But to perform Pfunkle or nicking mutagenesis, one needs to ensure high-quality circular DNA and careful design of the primer libraries with a freshly phosphorylated state. Nevertheless, nicking mutagenesis has been rising in popularity for achieving codon-level saturation mutagenesis recently.

Generating a library with mutations at the endogenous genetic loci

For DMS studies aimed at endogenous genetic loci, CRISPR-based technologies (Doudna and Charpentier, 2014; Rees and Liu, 2018) are used. The mutant library can be designed as a sgRNA library targeting intended genetic loci (Wang et al., 2014; Hart et al., 2015; Sadhu et al., 2018; Hanna et al., 2021) or as a donor DNA mutant library for homology-directed repair (HDR) template (Findlay et al., 2014; Sharon et al., 2018; Choudhury et al., 2020; Shen et al., 2022). The donor DNA mutant library can be generated using the methods mentioned above and combined into the backbone flanked by the recombination arms and necessary components. Simultaneous use of two gRNAs also enables multiplexed mutagenesis (Campa et al., 2019). Yet, compared to the ectopic expression of a mutation library, there are much fewer DMS studies performed at the endogenous loci due to additional technical limitations—including sgRNA-dependent uneven editing efficiencies (Wang et al., 2014; Bassalo et al., 2018; Choudhury et al., 2020), low HDR efficiency (Findlay et al., 2014), and high incidences of undetected off-target mutations and editing biases (Cui and Bikard, 2016; Zerbini et al., 2017). The challenges are even more prominent when the mammalian cell genome is the target of saturation mutagenesis (Rees and Liu, 2018).

To sum up, different methods of generating mutation libraries have their own pros and cons (Table 1). The choice of the method should be determined primarily by the purpose. For instance, should the experiment target nucleotide or codon level, single or combinations of mutations? How extensive the mutant library should be, and should the mutations be ectopically expressed or integrated into the genome?

TABLE 1

TABLE 1. Mutant library construction.

High-throughput phenotyping

After obtaining the mutant plasmid library or gene blocks via various molecular cloning steps, including amplification, ligation, etc., the library is delivered (via transformation, transfection, or transduction) to the cell types of interest for high-throughput quantification of the phenotypes coupled by the deep sequencing. These phenotyping assays are usually designed to enrich functional genetic variants while depleting the detrimental variants in a bulk experiment (Fowler and Fields, 2014; Olson et al., 2014; Bandyopadhyay et al., 2020) or via reporter-based cell sorting (Starr et al., 2017; Matreyek et al., 2018; Li et al., 2019; Park et al., 2022).

Measured phenotypes can be divided into two main categories: 1) Fitness based on the reproduction rate of cells (Li et al., 2016; Puchta et al., 2016; Domingo et al., 2018) or 2) measurement of the molecular function (abundance, binding, or activity) (Araya et al., 2012; Olson et al., 2014; Sarkisyan et al., 2016; Matreyek et al., 2018; Li et al., 2019; Tack et al., 2021; Faure et al., 2022). In this section, we will describe and compare techniques used in these two categories and another recently developed method that can decompose molecular functions via the fitness-based assay.

Fitness assays

Fitness competition is the most straightforward and economical approach for a high-throughput functional selection. Its logic is that if the gene product is required for cell survival or reproduction, cells carrying functional genetic variants will enrich. In contrast, detrimental variants will deplete over time in a culture medium. As a result, frequency changes of genetic variants can be calculated as fitness scores (Domingo et al., 2018). This strategy does not require special equipment, making it easy to conduct. However, the obtained fitness scores may not necessarily be linearly related to the molecular mechanisms of the mutations, making it complicated to acquire mechanistic insight into the mutational effects on the molecular level (Stein et al., 2019). Besides, marginally detrimental effects on molecular functions may be masked due to the non-linear relationship between fitness and molecular function (Soskine and Tawfik, 2010; Stiffler et al., 2015). It also needs to be noted that mutational effects often alter in different environments (Stiffler et al., 2015; Domingo et al., 2018; Chen et al., 2022).

Therefore, it is essential to select an optimal condition that either reflects the physiological situation best (Starita et al., 2015; Braun et al., 2018; Cantor et al., 2018; Hartman et al., 2018; Staller et al., 2018; Ahler et al., 2019; Matreyek et al., 2020; Mighell et al., 2020) to unveil disease-causing mutations or evolutionary paths of mutations. On the other hand, to easily infer biophysical effects, the fitness assay conditions should be selected to be linearly related to the molecular function (Domingo et al., 2018; Li et al., 2019; Leander et al., 2020; Starr et al., 2020; Faure et al., 2022).

Functional assays

Using the protein stability or binding affinity as a phenotype (Araya et al., 2012; Olson et al., 2014; Starr et al., 2020; Faure et al., 2022) is another widely used method to evaluate mutational effects for a protein-coding gene. This approach can capture essential biophysical effects of mutations and give more mechanistic insights into mutations.

Stability assays often involve tagging the target protein to a reporter, like the green fluorescent protein (GFP) as an indicator of the protein stability (Li et al., 2019; Leander et al., 2020; Matreyek et al., 2020; Park et al., 2022). Cells can be sorted based on the fluorescence levels into several bins, followed by deep sequencing of each sorted subpopulation (Peterman and Levine, 2016; Matreyek et al., 2018). Then, each mutant’s mean fluorescence level is calculated based on the frequencies of each genetic variant in each sorted bin.

In vitro display methods, such as phage display (Araya et al., 2012), yeast display (Klesmith et al., 2017; Starr et al., 2017; Cao et al., 2022), and mRNA display (Olson et al., 2014), detect frequency changes of genetic variants based on the binding affinity of the protein to its ligands. Although the experimental results from such an approach reveal the functional effects of mutations, it does not immediately indicate whether mutations affect the function by changing the protein stability or binding affinity, which is termed biophysical ambiguity hereafter. Nevertheless, it is crucial to resolve the biophysical ambiguity of mutations if we want to predict the combined effects of mutations (Otwinowski et al., 2018; Li and Lehner, 2020) accurately. To overcome this, approaches like combining the binding affinity-based functional assay and the stability-based assay (Starr et al., 2020), or predicting mutants’ biophysical effects by analyzing how mutations combine based on a single assay (Otwinowski et al., 2018) have been shown.

Performing two sets of different experiments (Starr et al., 2020) are often troublesome while predicting folding and binding energy changes based on the protein structures (Capriotti et al., 2005; Schymkowitz et al., 2005; Zhang et al., 2020) is not as accurate as experiment results. Recently, a method called ddPCA (Faure et al., 2022) that uses a relatively simple experimental approach to solve the biophysical ambiguity has been developed, which we will discuss in the following part.

ddPCA: Untangling biophysical parameters with the fitness assays

The method called Double Deep Protein-Fragment Complementation Assay (ddPCA) (Faure et al., 2022) is based on the protein-fragment complementation (PCA) assay (Tarassov et al., 2008). In ddPCA, the expression ratios of dihydrofolate reductase (DHFR) fragments are tweaked into two sets so that one assay can detect mutational effects on the stability of the protein (AbundancePCA) while the other detects both stability and protein-protein interactions (BindingPCA) (Figures 1B–F).

BindingPCA uses the traditional PCA method in which two interacting partners are each tagged with interacting partners are each tagged with DHFR[1,2] and DHFR[3] fragments (Figure 1D). Mutations that affect the binding affinity to the ligand and/or the protein stability will reduce the functional DHFR concentration inside the cells and therefore minimize cell survival (fitness) (Figures 1B, C). AbundancePCA, on the other hand, only has one protein-coding gene tagged to one fragment of DHFR (DHFR[3]) while overexpressing the other fragment DHFR[1,2]. This allows the cellular fitness to be solely determined by the limiting concentration of the protein tagged with DHFR fragment (i.e., DHFR [1,2], which reflects the protein stability (Figure 1E). The combination of the BindingPCA and the AbundancePCA serves to determine biophysical effects and resolve biophysical ambiguities (Figure 1F). Compared to other experimental approaches, ddPCA is a much simpler approach to unveil stability and binding affinity of mutations because both AbundancePCA and BindingPCA use the same fitness selection system. ddPCA has been applied to several allosteric proteins and resolves the ‘biophysical ambiguities’, as well as pinpointing allosteric sites systematically (Faure et al., 2022; Weng et al., 2022).

Besides the methods mentioned above, enzyme kinetics can be measured in a dynamic system using microfluidics technology. For instance, the High-Throughput Microfluidic Enzyme Kinetics (HT-MEK) in a DMS experiment allows the systematic investigation of enzymes in an automatically valved microfluidics expression system (Markin et al., 2021).

Deep sequencing and data analysis

Deep-sequencing of the genetic variants for both the input (before phenotyping) and the output (after phenotyping) follows the high-throughput phenotyping. Samples from a DMS experiment are special in that there are up to tens of thousands of genetic variants. Yet, they are with a low frequency of mutations at each position (sometimes as low as 0.1%) in an overall very homogenous sequence. Considering that genotype-phenotype mapping depends on frequencies of each genetic variant that are often only one or two hamming distances away from each other, choosing a high-throughput sequencing platform with high accuracy is especially important for a DMS study.

Sequencing platforms

The most widely used platform in DMS studies has been the Illumina HiSeq platforms for their relatively lower error rates compared to the third-generation sequencing platforms (PacBio or Nanopore sequencing) and the higher cost-effectiveness (cost/base pair) compared to Illumina MiSeq (Shendure et al., 2017; Pfeiffer et al., 2018). However, the HiSeq platforms have a sequencing read length limitation to 300 nt. This makes the identification of long-range epistatic interactions challenging if the mutated region exceeds the length limit. To overcome this, barcoding genetic variants can be applied, first to associate genetic variants with barcodes using Miseq or PacBio sequencing (Puchta et al., 2016; Starr et al., 2020; Wu et al., 2022) and then to perform deep sequencing of the barcodes using HiSeq.

Recently, UMI-based Nanopore sequencing (Zurek et al., 2020) and new circular consensus sequencing (CCS) method using PacBio (Wenger et al., 2019) were developed to increase the accuracies of long-read sequencing platforms to >99.5% (Zurek et al., 2020; Karst et al., 2021). This suggests that Nanopore or PacBio may completely substitute HiSeq for DMS studies in the future.

Experiment design and library preparation for sequencing are utterly important to obtain high-quality data, regardless of the sequencing platforms used. One should always 1) start with a sufficient number of molecules per variant in each mutant library; 2) have multiple independent biological replicates; and 3) avoid experimental bottlenecks, over-sequencing or under-sequencing. Especially, it is essential not to over-sequence as that would impactfully hinder accurate prediction of the variant frequencies (Faure et al., 2020). Read-depths should not be more than that of total expected molecule numbers before generating sequencing libraries but also need to be sufficiently bigger than the expected unique counts of genetic variants.

Data analysis

The phenotype of each genetic variant is often quantified as the normalized relative enrichment scores from the aggregated count data (i.e., after versus before selection) compared to that of the wild type, as shown in Eq. 1 below.

E_{v} = \log 2 \frac{F_{v, o u t p u t}}{F_{v, i n p u t}} - \log 2 \frac{F_{w t, o u t p u t}}{F_{w t, i n p u t}} (1)

F_v,output and F_v,input are the frequencies of a given variant v after selection (output) and before selection (input) respectively, and E_v is the normalized enrichment score of the variant to the wild type. When the phenotype is based on the reporter fluorescence intensity and cell sorting (Starr et al., 2020), functional scores as the mean fluorescence signals are estimated based on the variant counts in each sorted bin and the bin fluorescence parameters (Peterman and Levine, 2016).

To estimate mutants’ phenotypes from the sequencing data accurately and to obtain enough statistical power, correct error detection and propagation are essential. However, it is not a simple task as there are many sources of errors in the typical DMS dataset, including sequencing error, Poisson error, errors from the replicates, and stochastic error (Rubin et al., 2017). Enrich2 (Rubin et al., 2017) and a more recent software DiMSum (Faure et al., 2020) are two good statistical frameworks developed for DMS sequencing data to help users reliably quantify the data and perform error estimation. Enrich2 and DiMSum are both based on the Poisson-based sequence count distribution, and they both integrate the empirical variance into account to estimate the errors. However, the way they handle the empirical variance is different. For instance, Enrich2 takes the mix-effects from the empirical variance, while DiMSum introduced replicate-specific additive and multiplicative modifier terms from empirical variance. Thus, the error estimated from the two models differs (Faure et al., 2020). The only available and direct comparisons between the two statistical software are from Faure and others who developed DiMSum. Based on the 12 datasets examined, Enrich2 and DiMSum performed similarly well on the datasets with little overdispersion, but Enrich2 underestimates errors on the dataset with a lot of overdispersion (Faure et al., 2020). Still, DiMSum is not as widely used as Enrich2 in DMS data analysis, likely because it is still a relatively newly developed pipeline. Either error models from Enrich2 or DiMSum cannot capture systematic errors arising from the experiments, which need to be identified and judged by the researchers using diagnostic plots. After this step, one could select only reliable data based on the error thresholds for further analysis.

Relative mutational effects are often presented in a 2D map, with each x- and y-axis representing mutation position and substitution respectively, and the phenotype in a gradient of filled color as a heatmap (Figure 1). With such a descriptive figure giving an overview of the data, one can quickly judge which positions are more sensitive to mutations and whether certain types of substitutes are more acceptable than others. There are also tools developed to visualize both published and own DMS data. MaveVis, developed as part of the MaveDB (Esposito et al., 2019), allows users to generate heatmaps integrating the protein structural information for each position. It is available for both web-based interfaces and as an R package. Another web-based tool called dms-view (Hilton et al., 2020) can provide a quick exploration of the DMS data to look for specific mutations per site and in the context of protein 3D structure in an interactive manner. Compared to MaveVis, the advantage of dms-view is the integration of the protein 3D structure and logo generation based on mutational effects. However, local users cannot use the software as it is only web-based.

To obtain mechanistic insights into genotype-phenotype maps, machine-learning algorithms (Wang et al., 2014; Hart et al., 2015; Majithia et al., 2016; Klesmith et al., 2017; Rocklin et al., 2017; Weile et al., 2017; Gray et al., 2018; Staller et al., 2018; Song et al., 2021; Faure et al., 2022; Hsu et al., 2022; Leander et al., 2022) and deep learning algorithms (Sarkisyan et al., 2016; Gelman et al., 2021; Faure et al., 2022; Tareen et al., 2022; Vaishnav et al., 2022) are frequently used. Especially, a recently developed python package called MAVE-NN (Tareen et al., 2022) can unveil the one-dimensional latent phenotypes (i.e., a hidden type of biophysical parameter values) that are non-linearly linking genotypes to phenotypes, based on the neural-network algorithm. MAVE-NN has its limitations. For example, it cannot analyse DMS data with only single mutations or mutations affecting more than one type of expected biophysical parameters to reveal the latent phenotypes.

Discussion

Recent years have witnessed a boom in DMS applied to various coding and non-coding genes from many organisms, including viruses, bacteria, yeast, and mammalian cells. In this review, we presented an overview of DMS that combines synthetic biology, high-throughput phenotyping methods, and deep sequencing technology. We also showed the main steps in the DMS technique and compared different choices of designing mutation libraries, phenotyping assays, and sequencing platforms. Finally, by comparing different techniques, we gave brief guidance on selecting the most appropriate strategy according to different scientific questions and experimental models.

A high-quality DMS dataset not only provides important information on genotype-phenotype mapping for biomedical research, but also guides other research fields including structural biology, biophysics, and protein engineering. For instance, the very comprehensive single and double-mutant GB1 DMS dataset (Olson et al., 2014) enabled accurate prediction of the protein 3D structure (Schmiedel and Lehner, 2019) and biophysical effects of each mutation without doing the painstaking experiments (Gelman et al., 2021; Tareen et al., 2022). Also, the technology provides mechanistic insights into understanding and predicting mutational effects (Hsu et al., 2022), contributing to protein engineering and structure prediction (Rollins et al., 2019; Schmiedel and Lehner, 2019), biomedicine (Braun et al., 2018; Baeza-Centurion et al., 2019; Bridgford et al., 2020; Livesey and Marsh, 2020; Frazer et al., 2021) and evolution (Aakre et al., 2015; Li et al., 2016; Puchta et al., 2016; Starr et al., 2017; Domingo et al., 2018; Starr et al., 2020; Park et al., 2022; Starr et al., 2022). In light of accumulating DMS datasets and the challenge of reproducibility and source-data compilation, several pioneering labs in the field of massive parallel assays made an open-source platform called MaveDB (Esposito et al., 2019) available for DMS experiment data. By now (December 2022), more than a hundred DMS datasets have been listed in MaveDB that are available for download and analyse. An alliance called Atlas of Variant Effects (https://www.varianteffect.org) is also formed to maximise collaboration, benefits, and the influence of mutational scanning.

Still, there are limitations in DMS technology. Firstly, each DMS experiment could handle up to tens of thousands of mutations but not more. One of the limiting factors in scaling mutation libraries is transformation (transfection or transduction) efficiencies, as one does not want to generate a bottleneck by randomly sampling genetic variants that go into the destination cells. The number of successfully transformed (transfected or transduced) cells should be sufficiently higher than the library sizes to minimize the loss of some genetic variants during the transformation step. Secondly, performing DMS at the endogenous genomic loci of cells is still a big challenge. Nevertheless, endogenous DMS will become available soon with improved precision in genome editing technologies and transformation/transfection efficiencies. While DMS has been applied to various organisms, including humans, viruses, bacteria and yeast, interestingly, there is no DMS research on plant genes, even though mapping genotypes to phenotypes on the plant is both important and challenging (Voichek and Weigel, 2020; Deng et al., 2021). The reason could be the technical challenges in developing a high throughput phenotyping assay with the designed mutation pools.

To sum up, our ability to interpret genotypes is still lacking due to the complication of the genotype-phenotype maps (Domingo et al., 2019; Kinney and McCandlish, 2019). While sequencing technology is becoming more advanced and economical, ‘reading’ genetic codes has become the routine of many labs. We believe that DMS will become a laboratory routine in the near future together with further development in synthetic biology and sequencing technologies.

Author contributions

HW and XL wrote the manuscript together. This work is supported by the Young Scientists Fund of the National Natural Science Foundation of China (NSFC Grant No. 32100478).

Funding

XL is supported by the Young Scientists Fund of National Natural Science Foundation of China (NSFC Grant No.32100478). The study is also supported by the department (Zhejiang University-University of Edinburgh Institute) start-up funding and the seed-fund.

Acknowledgments

We thank members of the Li lab for comments on the manuscript.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Aakre, C. D., Herrou, J., Phung, T. N., Perchuk, B. S., Crosson, S., and Laub, M. T. (2015). Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606. doi:10.1016/j.cell.2015.09.055

PubMed Abstract | CrossRef Full Text | Google Scholar

Ahler, E., Register, A. C., Chakraborty, S., Fang, L., Dieter, E. M., Sitko, K. A., et al. (2019). A combined approach reveals a regulatory mechanism coupling src’s kinase activity, localization, and phosphotransferase-independent functions. Mol. Cell 74, 393–408. e20. doi:10.1016/j.molcel.2019.02.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Araya, C. L., Fowler, D. M., Chen, W., Muniez, I., Kelly, J. W., and Fields, S. (2012). A fundamental protein property, thermodynamic stability, revealed solely from large-scale measurements of protein function. Proc. Natl. Acad. Sci. U. S. A. 109, 16858–16863. doi:10.1073/pnas.1209751109

PubMed Abstract | CrossRef Full Text | Google Scholar

Baeza-Centurion, P., Miñana, B., Schmiedel, J. M., Valcárcel, J., and Lehner, B. (2019). Combinatorial genetics reveals a scaling law for the effects of mutations on splicing. Cell 176, 549–563. doi:10.1016/j.cell.2018.12.010

PubMed Abstract | CrossRef Full Text | Google Scholar

Bandyopadhyay, S., Bhaduri, S., Örd, M., Davey, N. E., Loog, M., and Pryciak, P. M. (2020). Comprehensive analysis of G1 cyclin docking motif sequences that control CDK regulatory potency in vivo. Curr. Biol. 30, 4454–4466. doi:10.1016/j.cub.2020.08.099

PubMed Abstract | CrossRef Full Text | Google Scholar

Bassalo, M. C., Garst, A. D., Choudhury, A., Grau, W. C., Oh, E. J., Spindler, E., et al. (2018). Deep scanning lysine metabolism in Escherichia coli. Mol. Syst. Biol. 14, e8371. doi:10.15252/msb.20188371

PubMed Abstract | CrossRef Full Text | Google Scholar

Braun, S., Enculescu, M., Setty, S. T., Cortés-López, M., de Almeida, B. P., Sutandy, F. X. R., et al. (2018). Decoding a cancer-relevant splicing decision in the RON proto-oncogene using high-throughput mutagenesis. Nat. Commun. 9, 3315–3318. doi:10.1038/s41467-018-05748-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Bridgford, J. L., Lee, S. M., Lee, C. M. M., Guglielmelli, P., Rumi, E., Pietra, D., et al. (2020). Novel drivers and modifiers of MPL-dependent oncogenic transformation identified by deep mutational scanning. Blood. Am. Soc. Hematol. 135, 287–292. doi:10.1182/blood.2019002561

PubMed Abstract | CrossRef Full Text | Google Scholar

Campa, C. C., Weisbach, N. R., Santinha, A. J., Incarnato, D., and Platt, R. J. (2019). Multiplexed genome engineering by Cas12a and CRISPR arrays encoded on single transcripts. Nat. Methods 16, 887–893. doi:10.1038/s41592-019-0508-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Cantor, A. J., Shah, N. H., and Kuriyan, J. (2018). Deep mutational analysis reveals functional trade-offs in the sequences of EGFR autophosphorylation sites. Proc. Natl. Acad. Sci. 115, E7303–E7312. doi:10.1073/pnas.1803598115

PubMed Abstract | CrossRef Full Text | Google Scholar

Cao, L., Coventry, B., Goreshnik, I., Huang, B., Sheffler, W., Park, J. S., et al. (2022). Design of protein-binding proteins from the target structure alone. Nature 605, 551–560. doi:10.1038/s41586-022-04654-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Capriotti, E., Fariselli, P., and CasadioI-Mutant2, R. (2005). I-Mutant2.0: Predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 33, W306–W310. doi:10.1093/nar/gki375

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, J. Z., Fowler, D. M., and Tokuriki, N. (2022). Environmental selection and epistasis in an empirical phenotype–environment–fitness landscape. Nat. Ecol. Evol. 6, 427–438. doi:10.1038/s41559-022-01675-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Choudhury, A., Fenster, J. A., Fankhauser, R. G., Kaar, J. L., Tenaillon, O., and Gill, R. T. (2020). CRISPR/Cas9 recombineering-mediated deep mutational scanning of essential genes in Escherichia coli. Mol. Syst. Biol. 16, e9265. doi:10.15252/msb.20199265

PubMed Abstract | CrossRef Full Text | Google Scholar

Cui, L., and Bikard, D. (2016). Consequences of Cas9 cleavage in the chromosome of Escherichia coli. Nucleic Acids Res. 44, 4243–4251. doi:10.1093/nar/gkw223

PubMed Abstract | CrossRef Full Text | Google Scholar

Deng, Z., Zhang, J., Li, J., and Zhang, X. (2021). Application of deep learning in plant–microbiota association analysis. Front. Genet. 12, 697090. doi:10.3389/fgene.2021.697090

PubMed Abstract | CrossRef Full Text | Google Scholar

Diss, G., and Lehner, B. (2018). The genetic landscape of a physical interaction. Elife 7, e32472. doi:10.7554/eLife.32472

PubMed Abstract | CrossRef Full Text | Google Scholar

Domingo, J., Baeza-Centurion, P., and Lehner, B. (2019). The causes and consequences of genetic interactions (epistasis). Annu. Rev. Genomics Hum. Genet. 20, 433–460. doi:10.1146/annurev-genom-083118-014857

PubMed Abstract | CrossRef Full Text | Google Scholar

Domingo, J., Diss, G., and Lehner, B. (2018). Pairwise and higher-order genetic interactions during the evolution of a tRNA. Nature 558, 117–121. doi:10.1038/s41586-018-0170-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Doudna, J. A., and Charpentier, E. (2014). Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096. doi:10.1126/science.1258096

PubMed Abstract | CrossRef Full Text | Google Scholar

Esposito, D., Weile, J., Shendure, J., Starita, L. M., Papenfuss, A. T., Roth, F. P., et al. (2019). MaveDB: An open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223. doi:10.1186/s13059-019-1845-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Faure, A. J., Domingo, J., Schmiedel, J. M., Hidalgo-Carcedo, C., Diss, G., and Lehner, B. (2022). Mapping the energetic and allosteric landscapes of protein binding domains. Nature 604, 175–183. doi:10.1038/s41586-022-04586-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Faure, A. J., Schmiedel, J. M., Baeza-Centurion, P., and Lehner, B. (2020). DiMSum: An error model and pipeline for analyzing deep mutational scanning data and diagnosing common experimental pathologies. Genome Biol. 21, 207–223. doi:10.1186/s13059-020-02091-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Findlay, G. M., Boyle, E. A., Hause, R. J., Klein, J. C., and Shendure, J. (2014). Saturation editing of genomic regions by multiplex homology-directed repair. Nature 513, 120–123. doi:10.1038/nature13695

PubMed Abstract | CrossRef Full Text | Google Scholar

Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., et al. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222. doi:10.1038/s41586-018-0461-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Firnberg, E., and Ostermeier, M. (2012). PFunkel: Efficient, expansive, user-defined mutagenesis. PLoS One 7, e52031. doi:10.1371/journal.pone.0052031

PubMed Abstract | CrossRef Full Text | Google Scholar

Fowler, D. M., Araya, C. L., Fleishman, S. J., Kellogg, E. H., Stephany, J. J., Baker, D., et al. (2010). High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741–746. doi:10.1038/nmeth.1492

PubMed Abstract | CrossRef Full Text | Google Scholar

Fowler, D. M., Araya, C. L., Gerard, W., and Fields, S. (2011). Enrich: Software for analysis of protein function by enrichment and depletion of variants. Bioinformatics 27, 3430–3431. doi:10.1093/bioinformatics/btr577

PubMed Abstract | CrossRef Full Text | Google Scholar

Fowler, D. M., and Fields, S. (2014). Deep mutational scanning: A new style of protein science. Nat. Methods. 11, 801–807. doi:10.1038/nmeth.3027

PubMed Abstract | CrossRef Full Text | Google Scholar

Frazer, J., Notin, P., Dias, M., Gomez, A., Min, J. K., Brock, K., et al. (2021). Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95. doi:10.1038/s41586-021-04043-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Gelman, S., Fahlberg, S. A., Heinzelman, P., Romero, P. A., and Gitter, A. (2021). Neural networks to learn protein sequence–function relationships from deep mutational scanning data. Proc. Natl. Acad. Sci. 118, e2104878118. doi:10.1073/pnas.2104878118

PubMed Abstract | CrossRef Full Text | Google Scholar

Giver, L., Gershenson, A., Freskgard, P. O., and Arnold, F. H. (1998). Directed evolution of a thermostable esterase. Proc. Natl. Acad. Sci. U. S. A. 95, 12809–12813. doi:10.1073/pnas.95.22.12809

PubMed Abstract | CrossRef Full Text | Google Scholar

Gray, V. E., Hause, R. J., Luebeck, J., Shendure, J., and Fowler, D. M. (2018). Quantitative missense variant effect prediction using large-scale mutagenesis data. Cell Syst. 6, 116–124. doi:10.1016/j.cels.2017.11.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Greaney, A. J., Starr, T. N., Barnes, C. O., Weisblum, Y., Schmidt, F., Caskey, M., et al. (2021). Mapping mutations to the SARS-CoV-2 RBD that escape binding by different classes of antibodies. Nat. Commun. 12, 4196. doi:10.1038/s41467-021-24435-8

PubMed Abstract | CrossRef Full Text | Google Scholar

Greaney, A. J., Starr, T. N., Gilchuk, P., Zost, S. J., Binshtein, E., Loes, A. N., et al. (2021). Complete mapping of mutations to the SARS-CoV-2 spike receptor-binding domain that escape antibody recognition. Cell Host Microbe 29, 44–57.e9. doi:10.1016/j.chom.2020.11.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Hanna, R. E., Hegde, M., Fagre, C. R., DeWeirdt, P. C., Sangree, A. K., Szegletes, Z., et al. (2021). Massively parallel assessment of human variants with base editor screens. Cell 184, 1064–1080.e20. doi:10.1016/j.cell.2021.01.012

PubMed Abstract | CrossRef Full Text | Google Scholar

Hanning, K. R., Minot, M., Warrender, A. K., Kelton, W., and Reddy, S. T. (2022). Deep mutational scanning for therapeutic antibody engineering. Trends Pharmacol. Sci. 43, 123–135. doi:10.1016/j.tips.2021.11.010

PubMed Abstract | CrossRef Full Text | Google Scholar

Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K. R., MacLeod, G., et al. (2015). High-Resolution CRISPR screens reveal fitness genes and genotype-specific cancer liabilities. Cell 163, 1515–1526. doi:10.1016/j.cell.2015.11.015

PubMed Abstract | CrossRef Full Text | Google Scholar

Hartman, E. C., Jakobson, C. M., Favor, A. H., Lobba, M. J., Álvarez-Benedicto, E., Francis, M. B., et al. (2018). Quantitative characterization of all single amino acid variants of a viral capsid-based drug delivery vehicle. Nat. Commun. 9, 1385–1395. doi:10.1038/s41467-018-03783-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Hietpas, R., Roscoe, B., Jiang, L., and Bolon, D. N. A. (2012). Fitness analyses of all possible point mutations for regions of genes in yeast. Nat. Protoc. 7, 1382–1396. doi:10.1038/nprot.2012.069

PubMed Abstract | CrossRef Full Text | Google Scholar

Hilton, S. K., Huddleston, J., Black, A., North, K., Dingens, A. S., Bedford, T., et al. (2020). dms-view: Interactive visualization tool for deep mutational scanning data. J. open source Softw. 5, 2353. doi:10.21105/joss.02353

PubMed Abstract | CrossRef Full Text | Google Scholar

Hsu, C., Nisonoff, H., Fannjiang, C., and Listgarten, J. (2022). Learning protein fitness models from evolutionary and assay-labeled data. Nat. Biotechnol. 40, 1114–1122. doi:10.1038/s41587-021-01146-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Javanmardi, K., Segall-Shapiro, T. H., Chou, C-W., Boutz, D. R., Olsen, R. J., Xie, X., et al. (2022). Antibody escape and cryptic cross-domain stabilization in the SARS-CoV-2 Omicron spike protein. Cell Host Microbe 30 (9), 1242–1254.e6. doi:10.1016/j.chom.2022.07.016

PubMed Abstract | CrossRef Full Text | Google Scholar

Karst, S. M., Ziels, R. M., Kirkegaard, R. H., Sørensen, E. A., McDonald, D., Zhu, Q., et al. (2021). High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing. Nat. Methods 18, 165–169. doi:10.1038/s41592-020-01041-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Kemble, H., Nghe, P., and Tenaillon, O. (2019). Recent insights into the genotype–phenotype relationship from massively parallel genetic assays. Evol. Appl. 12, 1721–1742. doi:10.1111/eva.12846

PubMed Abstract | CrossRef Full Text | Google Scholar

Kinney, J. B., and McCandlish, D. M. (2019). Massively parallel assays and quantitative sequence-function relationships. Annu. Rev. Genomics Hum. Genet. Aug 31, 99–127. doi:10.1146/annurev-genom-083118-014845

PubMed Abstract | CrossRef Full Text | Google Scholar

Klesmith, J. R., Bacik, J. P., Wrenbeck, E. E., Michalczyk, R., and Whitehead, T. A. (2017). Trade-offs between enzyme fitness and solubility illuminated by deep mutational scanning. Proc. Natl. Acad. Sci. U. S. A. 114, 2265–2270. doi:10.1073/pnas.1614437114

PubMed Abstract | CrossRef Full Text | Google Scholar

Lappalainen, T., and MacArthur, D. G. (2021). From variant to function in human disease genetics. Sci. Am. Assoc. Adv. Sci. 373, 1464–1468. doi:10.1126/science.abi8207

PubMed Abstract | CrossRef Full Text | Google Scholar

Leander, M., Liu, Z., Cui, Q., and Raman, S. (2022). Deep mutational scanning and machine learning reveal structural and molecular rules governing allosteric hotspots in homologous proteins. Elife 11, e79932. doi:10.7554/eLife.79932

PubMed Abstract | CrossRef Full Text | Google Scholar

Leander, M., Yuan, Y., Meger, A., Cui, Q., and Raman, S. (2020). Functional plasticity and evolutionary adaptation of allosteric regulation. Proc. Natl. Acad. Sci. U. S. A. 117, 25445–25454. doi:10.1073/pnas.2002613117

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, C., Qian, W., Maclean, C. J., and Zhang, J. (2016). The fitness landscape of a tRNA gene. Science 352, 837–840. doi:10.1126/science.aae0568

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, X., Lalić, J., Baeza-Centurion, P., Dhar, R., and Lehner, B. (2019). Changes in gene expression predictably shift and switch genetic interactions. Nat. Commun. 10, 3886. doi:10.1038/s41467-019-11735-3

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, X., and Lehner, B. (2020). Biophysical ambiguities prevent accurate genetic prediction. Nat. Commun. 11, 4923. doi:10.1038/s41467-020-18694-0

PubMed Abstract | CrossRef Full Text | Google Scholar

Lin-Goerke, J. L., Robbins, D. J., and Burczak, J. D. (1997). PCR-based random mutagenesis using manganese and reduced dNTP concentration. Biotechniques 23, 409–412. doi:10.2144/97233bm12

PubMed Abstract | CrossRef Full Text | Google Scholar

Lite, T-L. V., Grant, R. A., Nocedal, I., Littlehale, M. L., Guo, M. S., and Laub, M. T. (2020). Uncovering the basis of protein-protein interaction specificity with a combinatorially complete library. Elife 9, e60924. doi:10.7554/eLife.60924

PubMed Abstract | CrossRef Full Text | Google Scholar

Livesey, B. J., and Marsh, J. A. (2020). Using deep mutational scanning to benchmark variant effect predictors and identify disease mutations. Mol. Syst. Biol. 16, e9380. doi:10.15252/msb.20199380

PubMed Abstract | CrossRef Full Text | Google Scholar

Majithia, A. R., Tsuda, B., Agostini, M., Gnanapradeepan, K., Rice, R., Peloso, G., et al. (2016). Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575. doi:10.1038/ng.3700

PubMed Abstract | CrossRef Full Text | Google Scholar

Markin, C. J., Mokhtari, D. A., Sunden, F., Appel, M. J., Akiva, E., Longwell, S. A., et al. (2021). Revealing enzyme functional architecture via high-throughput microfluidic enzyme kinetics. Science 373, eabf8761. doi:10.1126/science.abf8761

PubMed Abstract | CrossRef Full Text | Google Scholar

Matreyek, K. A., Starita, L. M., Stephany, J. J., Martin, B., Chiasson, M. A., Gray, V. E., et al. (2018). Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882. doi:10.1038/s41588-018-0122-z

PubMed Abstract | CrossRef Full Text | Google Scholar

Matreyek, K. A., Stephany, J. J., Chiasson, M. A., Hasle, N., and Fowler, D. M. (2020). An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 48, e1. doi:10.1093/nar/gkz910

PubMed Abstract | CrossRef Full Text | Google Scholar

Matteucci, M. D., and Heyneker, H. L. (1983). Targeted random mutagenesis: The use of ambiguously synthesized oligonucleotides to mutagenize sequences immediately 5‘ of an ATG initiation codon. Nucleic Acids Res. 11, 3113–3121. doi:10.1093/nar/11.10.3113

PubMed Abstract | CrossRef Full Text | Google Scholar

Matuszewski, S., Hildebrandt, M. E., Ghenu, A-H., Jensen, J. D., and Bank, C. (2016). A statistical guide to the design of deep mutational scanning experiments. Genetics 204, 77–87. doi:10.1534/genetics.116.190462

PubMed Abstract | CrossRef Full Text | Google Scholar

McLaughlin, R. N., Poelwijk, F. J., Raman, A., Gosal, W. S., and Ranganathan, R. (2012). The spatial architecture of protein function and adaptation. Nature 491, 138–142. doi:10.1038/nature11500

PubMed Abstract | CrossRef Full Text | Google Scholar

Mendel, G. (1865). Versuche über Plflanzen-hybriden. Verhandlungen des naturforschenden Vereines 4, 3–47.

Google Scholar

Mendel, G. (1941). Versuche über Pflanzen-Hybriden. Zauchter Z. fur Theor Angew Genet 13, 221–268. doi:10.1007/bf01804628

CrossRef Full Text | Google Scholar

Mighell, T. L., Evans-Dutson, S., and O’Roak, B. J. (2018). A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype-phenotype relationships. Am. J. Hum. Genet. 102, 943–955. doi:10.1016/j.ajhg.2018.03.018

PubMed Abstract | CrossRef Full Text | Google Scholar

Mighell, T. L., Thacker, S., Fombonne, E., Eng, C., and O’Roak, B. J. (2020). An integrated deep-mutational-scanning approach provides clinical insights on PTEN genotype-phenotype relationships. Am. J. Hum. Genet. 106, 818–829. doi:10.1016/j.ajhg.2020.04.014

PubMed Abstract | CrossRef Full Text | Google Scholar

Moore, G. L., Maranas, C. D., Gl, M., and Cd, M. (2000). Modeling DNA mutation and recombination for directed evolution experiments. J. Theor. Biol. 205 (3), 483–503. doi:10.1006/jtbi.2000.2082

PubMed Abstract | CrossRef Full Text | Google Scholar

Narayanan, K. K., and Procko, E. (2021). Deep mutational scanning of viral glycoproteins and their host receptors. Front. Mol. Biosci. 8, 636660. doi:10.3389/fmolb.2021.636660

PubMed Abstract | CrossRef Full Text | Google Scholar

Olson, C. A., Wu, N. C., and Sun, R. (2014). A comprehensive biophysical description of pairwise epistasis throughout an entire protein domain. Curr. Biol. 24, 2643–2651. doi:10.1016/j.cub.2014.09.072

PubMed Abstract | CrossRef Full Text | Google Scholar

Otwinowski, J., McCandlish, D. M., and Plotkin, J. B. (2018). Inferring the shape of global epistasis. Proc. Natl. Acad. Sci. U. S. A. 115, E7550–E7558. doi:10.1073/pnas.1804015115

PubMed Abstract | CrossRef Full Text | Google Scholar

Park, Y., Metzger, B. P. H. H., and Thornton, J. W. (2022). Epistatic drift causes gradual decay of predictability in protein evolution. Science 376, 823–830. doi:10.1126/science.abn6895

PubMed Abstract | CrossRef Full Text | Google Scholar

Peterman, N., and Levine, E. (2016). Sort-seq under the hood: Implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics 17, 206–217. doi:10.1186/s12864-016-2533-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Pfeiffer, F., Gröber, C., Blank, M., Händler, K., Beyer, M., Schultze, J. L., et al. (2018). Systematic evaluation of error rates and causes in short samples in next-generation sequencing. Sci. Rep. 8, 10950. doi:10.1038/s41598-018-29325-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Plesa, C., Sidore, A. M., Lubock, N. B., Zhang, D., and Kosuri, S. (2018). Multiplexed gene synthesis in emulsions for exploring protein functional landscapes. Sci. Am. Assoc. Adv. Sci. 359, 343–347. doi:10.1126/science.aao5167

PubMed Abstract | CrossRef Full Text | Google Scholar

Puchta, O., Cseke, B., Czaja, H., Tollervey, D., Sanguinetti, G., and Kudla, G. (2016). Network of epistatic interactions within a yeast snoRNA. Science 352, 840–844. doi:10.1126/science.aaf0965

PubMed Abstract | CrossRef Full Text | Google Scholar

Rees, H. A., and Liu, D. R. (2018). Base editing: Precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788. doi:10.1038/s41576-018-0059-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Riesselman, A. J., Ingraham, J. B., and Marks, D. S. (2018). Deep generative models of genetic variation capture the effects of mutations. Nat. Methods 15, 816–822. doi:10.1038/s41592-018-0138-4

PubMed Abstract | CrossRef Full Text | Google Scholar

Rocklin, G. J., Chidyausiku, T. M., Goreshnik, I., Ford, A., Houliston, S., Lemak, A., et al. (2017). Global analysis of protein folding using massively parallel design, synthesis, and testing. Science 357, 168–175. doi:10.1126/science.aan0693

PubMed Abstract | CrossRef Full Text | Google Scholar

Rollins, N. J., Brock, K. P., Poelwijk, F. J., Stiffler, M. A., Gauthier, N. P., Sander, C., et al. (2019). Inferring protein 3D structure from deep mutation scans. Nat. Genet. 51, 1170–1176. doi:10.1038/s41588-019-0432-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Rubin, A. F., Gelman, H., Lucas, N., Bajjalieh, S. M., Papenfuss, A. T., Speed, T. P., et al. (2017). A statistical framework for analyzing deep mutational scanning data. Genome Biol. 18, 150–164. doi:10.1186/s13059-017-1272-5

PubMed Abstract | CrossRef Full Text | Google Scholar

Sadhu, M. J., Bloom, J. S., Day, L., Siegel, J. J., Kosuri, S., and Kruglyak, L. (2018). Highly parallel genome variant engineering with CRISPR-Cas9. Nat. Genet. 50, 510–514. doi:10.1038/s41588-018-0087-y

PubMed Abstract | CrossRef Full Text | Google Scholar

Sarkisyan, K. S., Bolotin, D. A., Meer, M. V., Usmanova, D. R., Mishin, A. S., Sharonov, G. V., et al. (2016). Local fitness landscape of the green fluorescent protein. Nature 533, 397–401. doi:10.1038/nature17995

PubMed Abstract | CrossRef Full Text | Google Scholar

Schmiedel, J. M., and Lehner, B. (2019). Determining protein structures using deep mutagenesis. Nat. Genet. 51, 1177–1186. doi:10.1038/s41588-019-0431-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F., and Serrano, L. (2005). The FoldX web server: An online force field. Nucleic Acids Res. 33, W382–W388. doi:10.1093/nar/gki387

PubMed Abstract | CrossRef Full Text | Google Scholar

Seuma, M., Faure, A., Badia, M., Lehner, B., and Bolognesi, B. (2021). The genetic landscape for amyloid beta fibril nucleation accurately discriminates familial Alzheimer’s disease mutations. Elife 10, e63364–e63382. doi:10.7554/eLife.63364

PubMed Abstract | CrossRef Full Text | Google Scholar

Shafikhani, S., Siegel, R. A., Ferrari, E., and Schellenberger, V. (1997). Generation of large libraries of random mutants in Bacillus subtilis by PCR-based plasmid multimerization. Biotechniques 23, 304–310. doi:10.2144/97232rr01

PubMed Abstract | CrossRef Full Text | Google Scholar

Sharon, E., Chen, S. A. A., Khosla, N. M., Smith, J. D., Pritchard, J. K., and Fraser, H. B. (2018). Functional genetic variants revealed by massively parallel precise genome editing. Cell 175, 544–557. doi:10.1016/j.cell.2018.08.057

PubMed Abstract | CrossRef Full Text | Google Scholar

Shen, X., Song, S., Li, C., and Zhang, J. (2022). Synonymous mutations in representative yeast genes are mostly strongly non-neutral. Nature 606, 725–731. doi:10.1038/s41586-022-04823-w

PubMed Abstract | CrossRef Full Text | Google Scholar

Shendure, J., Balasubramanian, S., Church, G. M., Gilbert, W., Rogers, J., Schloss, J. A., et al. (2017). DNA sequencing at 40: Past, present and future. Nature 550, 345–353. doi:10.1038/nature24286

PubMed Abstract | CrossRef Full Text | Google Scholar

Shin, H., and Cho, B-K. (2015). Rational protein engineering guided by deep mutational scanning. Int. J. Mol. Sci. 16, 23094–23110. doi:10.3390/ijms160923094

PubMed Abstract | CrossRef Full Text | Google Scholar

Song, H., Bremer, B. J., Hinds, E. C., Raskutti, G., and Romero, P. A. (2021). Inferring protein sequence-function relationships with large-scale positive-unlabeled learning. Cell Syst. 12, 92–101.e8. doi:10.1016/j.cels.2020.10.007

PubMed Abstract | CrossRef Full Text | Google Scholar

Soskine, M., and Tawfik, D. S. (2010). Mutational effects and the evolution of new protein functions. Nat. Rev. Genet. 11, 572–582. doi:10.1038/nrg2808

PubMed Abstract | CrossRef Full Text | Google Scholar

Staller, M. V., Holehouse, A. S., Swain-Lenz, D., Das, R. K., Pappu, R. V., and Cohen, B. A. (2018). A high-throughput mutational scan of an intrinsically disordered acidic transcriptional activation domain. Cell Syst. 6, 444–455. doi:10.1016/j.cels.2018.01.015

PubMed Abstract | CrossRef Full Text | Google Scholar

Starita, L. M., and Fields, S. (2015). Deep mutational scanning: Library construction, functional selection, and high-throughput sequencing. Cold Spring Harb. Protoc. 2015, 777–780. doi:10.1101/pdb.prot085225

PubMed Abstract | CrossRef Full Text | Google Scholar

Starita, L. M., Young, D. L., Islam, M., Kitzman, J. O., Gullingsrud, J., Hause, R. J., et al. (2015). Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 200, 413–422. doi:10.1534/genetics.115.175802

PubMed Abstract | CrossRef Full Text | Google Scholar

Starr, T. N., Greaney, A. J., Hannon, W. W., Loes, A. N., Hauser, K., Dillen, J. R., et al. (2022). Shifting mutational constraints in the SARS-CoV-2 receptor-binding domain during viral evolution. Science 377, 420–424. doi:10.1126/science.abo7896

PubMed Abstract | CrossRef Full Text | Google Scholar

Starr, T. N., Greaney, A. J., Hilton, S. K., Ellis, D., Crawford, K. H. D., Dingens, A. S., et al. (2020). Deep mutational scanning of SARS-CoV-2 receptor binding domain reveals constraints on folding and ACE2 binding. Cell 182, 1295–1310. doi:10.1016/j.cell.2020.08.012

PubMed Abstract | CrossRef Full Text | Google Scholar

Starr, T. N., Picton, L. K., and Thornton, J. W. (2017). Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549, 409–413. doi:10.1038/nature23902

PubMed Abstract | CrossRef Full Text | Google Scholar

Stein, A., Fowler, D. M., Hartmann-Petersen, R., and Lindorff-Larsen, K. (2019). Biophysical and mechanistic models for disease-causing protein variants. Trends Biochem. Sci. 44, 575–588. doi:10.1016/j.tibs.2019.01.003

PubMed Abstract | CrossRef Full Text | Google Scholar

Stiffler, M. A., Hekstra, D. R., and Ranganathan, R. (2015). Evolvability as a function of purifying selection in TEM-1 β-lactamase. Cell 160, 882–892. doi:10.1016/j.cell.2015.01.035

PubMed Abstract | CrossRef Full Text | Google Scholar

Tack, D. S., Tonner, P. D., Pressman, A., Olson, N. D., Levy, S. F., Romantseva, E. F., et al. (2021). The genotype-phenotype landscape of an allosteric protein. Mol. Syst. Biol. 17, e10847. doi:10.15252/msb.202110847

PubMed Abstract | CrossRef Full Text | Google Scholar

Tarassov, K., Messier, V., Landry, C. R., Radinovic, S., Molina, M. M. S., Shames, I., et al. (2008). An in vivo map of the yeast protein interactome. Science 320, 1465–1470. doi:10.1126/science.1153878

PubMed Abstract | CrossRef Full Text | Google Scholar

Tareen, A., Kooshkbaghi, M., Posfai, A., Ireland, W. T., McCandlish, D. M., and Kinney, J. B. (2022). MAVE-NN: Learning genotype-phenotype maps from multiplex assays of variant effect. Genome Biol. 23, 98–27. doi:10.1186/s13059-022-02661-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Vaishnav, E. D., de Boer, C. G., Molinet, J., Yassour, M., Fan, L., Adiconis, X., et al. (2022). The evolution, evolvability and engineering of gene regulatory DNA. Nature 603, 455–463. doi:10.1038/s41586-022-04506-6

PubMed Abstract | CrossRef Full Text | Google Scholar

Vanhercke, T., Ampe, C., Tirry, L., and Denolf, P. (2005). Reducing mutational bias in random protein libraries. Anal. Biochem. 339, 9–14. doi:10.1016/j.ab.2004.11.032

PubMed Abstract | CrossRef Full Text | Google Scholar

Voichek, Y., and Weigel, D. (2020). Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nat. Genet. 52, 534–540. doi:10.1038/s41588-020-0612-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Wan, L., Twitchett, M. B., Eltis, L. D., Mauk, A. G., and Smith, M. (1998). In vitro evolution of horse heart myoglobin to increase peroxidase activity. Proc. Natl. Acad. Sci. U. S. A. 95, 12825–12831. doi:10.1073/pnas.95.22.12825

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, T., Wei, J. J., Sabatini, D. M., and Lander, E. S. (2014). Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84. doi:10.1126/science.1246981

PubMed Abstract | CrossRef Full Text | Google Scholar

Weile, J., and Roth, F. P. (2018). Multiplexed assays of variant effects contribute to a growing genotype–phenotype atlas. Hum. Genet. 137, 665–678. doi:10.1007/s00439-018-1916-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Weile, J., Sun, S., Cote, A. G., Knapp, J., Verby, M., Mellor, J. C., et al. (2017). A framework for exhaustively mapping functional missense variants. Mol. Syst. Biol. 13, 957. doi:10.15252/msb.20177908

PubMed Abstract | CrossRef Full Text | Google Scholar

Weng, C., Faure, A., and Lehner, B. (2022). The energetic and allosteric landscape for KRAS inhibition. bioRxiv 12, 840–844. doi:10.1101/2022.12.06.519122

CrossRef Full Text | Google Scholar

Wenger, A. M., Peluso, P., Rowell, W. J., Chang, P-C., Hall, R. J., Concepcion, G. T., et al. (2019). Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162. doi:10.1038/s41587-019-0217-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Wrenbeck, E. E., Klesmith, J. R., Stapleton, J. A., Adeniran, A., Tyo, K. E. J., and Whitehead, T. A. (2016). Plasmid-based one-pot saturation mutagenesis. Nat. Methods 13, 928–930. doi:10.1038/nmeth.4029

PubMed Abstract | CrossRef Full Text | Google Scholar

Wu, Z., Cai, X., Zhang, X., Liu, Y., Tian, G., Yang, J-R., et al. (2022). Expression level is a major modifier of the fitness landscape of a protein coding gene. Nat. Ecol. Evol. 6, 103–115. doi:10.1038/s41559-021-01578-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Yoo, J. I., Daugherty, P. S., and O’Malley, M. A. (2020). Bridging non-overlapping reads illuminates high-order epistasis between distal protein sites in a GPCR. Nat. Commun. 11, 690. doi:10.1038/s41467-020-14495-7

PubMed Abstract | CrossRef Full Text | Google Scholar

Zerbini, F., Zanella, I., Fraccascia, D., König, E., Irene, C., Frattini, L. F., et al. (2017). Large scale validation of an efficient CRISPR/Cas-based multi gene editing protocol in Escherichia coli. Microb. Cell Fact. 16, 68. doi:10.1186/s12934-017-0681-1

PubMed Abstract | CrossRef Full Text | Google Scholar

Zhang, N., Chen, Y., Lu, H., Zhao, F., Alvarez, R. V., Goncearenco, A., et al. (2020). MutaBind2: Predicting the impacts of single and multiple mutations on protein-protein interactions. iScience 23, 100939. doi:10.1016/j.isci.2020.100939

PubMed Abstract | CrossRef Full Text | Google Scholar

Zurek, P. J., Knyphausen, P., Neufeld, K., Pushpanath, A., and Hollfelder, F. (2020). UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution. Nat. Commun. 11, 6023. doi:10.1038/s41467-020-19687-9

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: deep mutational scanning, genotype-phenotype mapping, massively parallel mutagenesis, high-throughput analysis, systems biology, biotechnology

Citation: Wei H and Li X (2023) Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Front. Genet. 14:1087267. doi: 10.3389/fgene.2023.1087267

Received: 02 November 2022; Accepted: 02 January 2023;
Published: 12 January 2023.

Edited by:

Honey V. Reddi, Medical College of Wisconsin, United States

Reviewed by:

Xiujun Zhang, Wuhan Botanical Garden (CAS), China
Qian Nie, Medical College of Wisconsin, United States

Copyright © 2023 Wei and Li. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Xianghua Li, djF4bGkyMjZAZXhzZWVkLmVkLmFjLnVr

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.