- 1Institute of Molecular Physical Science, ETH Zurich, Zurich, Switzerland
- 2Institute of Biophysical Chemistry, Goethe University Frankfurt, Frankfurt, Germany
- 3Department of Chemistry, Tokyo Metropolitan University, Hachioji, Japan
Chemical shift transfer (CST) is a well-established technique in NMR spectroscopy that utilizes the chemical shift assignment of one protein (source) to identify chemical shifts of another (target). Given similarity between source and target systems (e.g., using homologs), CST allows the chemical shifts of the target system to be assigned using a limited amount of experimental data. In this study, we propose a deep-learning based workflow, ARTINA-CST, that automates this procedure, allowing CST to be carried out within minutes or hours of computational time and strictly without any human supervision. We characterize the efficacy of our method using three distinct synthetic and experimental datasets, demonstrating its effectiveness and robustness even when substantial differences exist between the source and target proteins. With its potential applications spanning a wide range of NMR projects, including drug discovery and protein interaction studies, ARTINA-CST is anticipated to be a valuable method that facilitates research in the field.
1 Introduction
Recently, ARTINA (Klukowski et al., 2022), the first workflow that automates the analysis of protein NMR data for signal identification, resonance assignment (Schmidt and Güntert, 2012), and structure determination (Güntert et al., 1997; Güntert and Buchner, 2015) was developed, demonstrating the capability of machine learning to advance biomolecular NMR. With ARTINA, the entire NMR data analysis process can be completed on a web server (Klukowski et al., 2023) without human supervision and within hours after the NMR measurements are completed, replacing weeks or months of human labor. The procedure requires spectra with an information content that is sufficient to unambiguously assign chemical shifts without any prior knowledge about the protein structure. The exact amount of experimental data required for de novo chemical shift assignment with ARTINA depends on the signals’ resolution and signal-to-noise ratio. Typically, the requirement for the measurement time on expensive NMR spectrometers scales to approximately one or two weeks.
Often, one would like to study a system by NMR that is similar to an already assigned protein (Thompson et al., 2012). Examples of this include homologous proteins (Redfield and Robertson, 1991; Bartels et al., 1996), proteins studied under different experimental conditions (e.g., temperature, pH value, ligand concentration) (Jang et al., 2012; Banelli et al., 2017; Zieba et al., 2018), or proteins with bound ligands and in apo form (Orts and Gossert, 2018; Laveglia et al., 2021; Plata et al., 2023). In such cases, it is desirable to use knowledge about the known system, in particular its assigned chemical shifts, as complementary input for the automated analysis of a related target system with ARTINA. Such complementary input typically makes it possible to assign chemical shifts or determine the protein structure using a smaller set of spectra. In this work we evaluate the accuracy of ARTINA-CST and develop guidelines to utilize our method for chemical shift transfer applications.
2 Methods
2.1 Chemical shift transfer
The interface required for chemical shift transfer (CST) is included within the FLYA algorithm used by ARTINA for chemical shift assignment (Schmidt and Güntert, 2012). The function of this algorithm has been described in detail by Schmidt and Güntert; in brief, a list of expected peaks is constructed from the protein of interest’s amino acid sequence, then this list is mapped to a list of measured peaks generated by manual or automated peak picking. The mapping is improved iteratively using global and local optimization methods (Bartels et al., 1997). This procedure is repeated for a series of replicates and the final output is determined as a majority consensus of these.
In order to initialize the assignment of expected peaks to measured peaks, the default procedure within FLYA uses statistics from the Biological Magnetic Resonance Data Bank (BMRB) (Hoch et al., 2023) to construct an initial “search space” for each expected peak, from which a measured peak is picked at random to assign it (Schmidt and Güntert, 2012). For each atom in a protein, this search space is defined by a statistical distribution parametrized by the mean and standard deviation of chemical shifts of the given atom type over all occurrences of its amino acid type within the BMRB. Due to the scale of the database and the high number of peaks in protein NMR spectra, this usually generates a large area containing also many incorrect possibilities for assignment, the center of which can deviate from the peak’s true position. By improving the initial search space for each atom, i.e., reducing its size and shifting its center closer to the true position, the accuracy of FLYA assignments can therefore be improved significantly (Bartels et al., 1996; Aeschbacher et al., 2013). The essence of the CST procedure relies on this concept, aiming to provide a better estimate of the peak’s true position and therefore also allowing for a smaller search space size (Figure 1A).
FIGURE 1. Graphical abstract of the ARTINA-CST workflow. (A) CST affects automated chemical shift assignment through the selection of peak search spaces (light red rectangle). Without CST, the search space is estimated based on statistical information extracted from the BMRB database, whereas with CST, it is determined from a previously assigned source protein (the dark red circle indicates the chemical shifts of the source protein, and the highlighted area is specified by a user-defined parameter). The resulting area is centered closer to the target peak of interest (dark green oval) and contains less incorrect peaks (light green ovals). (B) The chemical shifts of the source protein are prepared by sequence alignment to the target protein and subsequent extraction of search spaces from the chemical shifts of the aligned residues. Measured NMR data is then input to FLYA as a list of peaks extracted automatically from spectra, yielding the assigned shifts of the target protein.
CST is performed as follows (Figure 1B): Given a source protein with a known assignment of chemical shifts and a target protein whose shifts have yet to be assigned, first a mapping of source atoms to target atoms must be defined. In the trivial case where source and target protein are identical in sequence, each source atom can simply be mapped to its copy in the target. However, this need not always be the case, and transfer of assignment data between different protein samples could also be highly beneficial in many cases. In this work, we compared pairs of distinct proteins by first performing a local pairwise sequence alignment tuned to produce large aligned sub-sequences between a small number of wide gaps (Supplementary Table S1). Each identical pair of amino acids in this alignment could then be used for CST. Aligned pairs of different amino acids and amino acids aligned to gaps were simply disregarded, falling back to FLYA’s default setting.
Based on this source-target mapping, new initial search spaces can be defined for each atom in the target protein, centered around the corresponding chemical shift in the source. Since this new center is likely to be more accurate, i.e., closer to the true position of the atom’s shift than the mean value obtained from BMRB, the search space’s width can then be reduced (higher precision), further facilitating its assignment.
The determined search space centers and sizes are then entered into FLYA as a list of mean values and standard deviations (“chemical shift statistics”), with the algorithm automatically applying the default BMRB-based search space for any atom not included.
When using FLYA as part of the ARTINA pipeline, automated visual analysis of each spectrum generates the experimental peaks to be used as input (Klukowski et al., 2022). For this work, execution was stopped after obtaining an assignment from FLYA. In practice, however, the output assignments can also be passed to the next stages of ARTINA, e.g., structure determination.
2.2 NMR data
The dataset established for the training and testing of ARTINA models (Klukowski et al., 2022) was the source of raw NMR spectral data in this study. Since CST is most efficient if only a small number of spectra have to be measured for the target protein, we used a small subset of NMR spectra types from the ARTINA dataset, consisting only of the 2D HSQC ([1H, 13C]-HSQC, [1H, 15N]-HSQC) and 3D NOESY (13C-resolved [1H, 1H] NOESY, 15N-resolved [1H, 1H] NOESY) spectra. Unless specified otherwise, we used this small subset of spectra for a representative set of 15 proteins for all experiments in this work (Supplementary Table S2).
2.3 Test examples for automated chemical shift transfer
2.3.1 Random perturbation
To evaluate the capabilities of ARTINA-CST, we initially generated CST test examples from the NMR data specified in section 2.2. Each test example was composed of the chemical shift list of the target (typically BMRB deposition) and source (randomly perturbed BMRB deposition) proteins, complemented by the set of NMR spectra of the target. The procedure ensured that each target protein had a corresponding source from which to draw chemical shift information, while also enabling quantification of the difference between the source and target shifts. The source chemical shift lists were created by manually perturbing deposited chemical shifts with Gaussian additive noise.
The standard deviation
where
When applying this data for CST, the uncertainty value used to determine the search space width was set to the larger of either:
• the standard deviation used for perturbation (
• a small tolerance of 0.04 ppm for 1H nuclei, and 0.4 ppm for both 13C and 15N.
To generate test examples, perturbations were applied to a specific fraction
By evaluating all combinations of the parameters
2.3.2 Structure-based perturbation
The subsequent series of experiments incorporated protein structure information into the source shift list generation procedure. The routine was designed to replicate an experimental setting where chemical shifts change between source and target due to such factors as point mutations or ligand binding at specific sites. We employed a random approach like the one outlined in the previous section. However, the standard deviation of the perturbation was defined as a function of the distance from the “perturbation centers” (
1. For each perturbation center
2. Calculate the decay factor
where
is the step function parametrized by a distance cutoff
3. Normalize each decay factor:
4. Calculate the standard deviation (
5. Set the uncertainty value used to determine the search space width for the atom
• the standard deviation used for perturbation (
• a small tolerance of 0.03 ppm for 1H nuclei, and 0.4 ppm for both 13C and 15N.
These steps form an alternative method to define the atom-specific chemical shift noise distribution
FIGURE 2. Three-dimensional rendering of decay factors (
In the experiment, we used the 15 benchmark proteins to generate 900 randomized CST test examples by drawing
2.4 Homologous proteins
To evaluate CST with experimental data, we investigated sequence homology between proteins in the ARTINA dataset (Klukowski et al., 2022) (Supplementary Table S2) and both RefDB (Zhang et al., 2003) and BMRB using sequence alignment parameters specified in Supplementary Table S1. As sequence alignment scores had no upper bound, they were normalized by the score of aligning each protein to itself. If the homolog protein was found in both RefDB and BMRB, preference was given to RefDB. We refer to the protein from the ARTINA dataset as the target and from RefDB/BMRB as the source protein.
All pairs with normalized sequence alignment score above 80% were selected for chemical shift alignment. In this step we propagated information from the sequence alignment into the source chemical shift list. This was indispensable as differences between the source and target sequences, such as insertion or deletion, require appropriate reindexing of chemical shifts in the source before applying CST to the target.
Subsequently, each shift in the aligned source list was compared with the corresponding target shift. An aligned shift was considered “correct” if it was within a given tolerance from the target position (0.03 ppm for 1H, 0.4 ppm for 13C/15N). Then, we calculated the “fraction of correct aligned shifts” for each pair of source/target homologous proteins, which is defined as the ratio of correct aligned shifts to total shifts in the aligned list.
In this experiment, we used protein pairs with a fraction of correct aligned shifts greater than or equal to 50%. Combining with the requirement of >80% sequence alignment score, we identified 12 source-target pairs for which experimental spectra were available in the ARTINA benchmark. These pairs corresponded to 9 distinct target proteins (Supplementary Figure S2, Supplementary Table S3).
3 Results
3.1 Reference experiments
To investigate the impact of complementary input (chemical shift list of the source protein) on the accuracy of the chemical shift assignment of the target protein, we first evaluated the baseline performance of de novo automated assignments of target proteins, without any transfer information (Supplementary Figure S1).
Subsequently, we ran the chemical shift transfer in an idealized setting, where each target protein’s own chemical shift list was passed as the transfer source. This corresponds to the scenario, unlikely in practice, in which the chemical shifts do not change between the source and target proteins, or that the input shift list for CST is equal to the expected output. It is important to note that FLYA still uses peak lists prepared for a minimal set of experimental spectra ([1H, 13C]-HSQC, [1H, 15N]-HSQC, 13C-resolved [1H, 1H] NOESY, 15N-resolved [1H, 1H] NOESY) to perform chemical shift transfer. Therefore, it cannot simply copy the input shifts to the output, and some chemical shifts cannot be assigned due to missing signals in the input data. However, this experimental setting represents the best possible initial search space positions for FLYA, and therefore can be used to estimate an upper bound for CST accuracy. In this case, the width of each search space was set to 0.04 ppm for 1H nuclei and 0.4 ppm for 13C and 15N nuclei.
As expected, we observed a significant improvement in chemical shift accuracy by 23.5 percentage points (pp) in the idealized reference case, as compared to de novo assignment, where no complementary input was used. The average assignment accuracy of 68.8% for 15 proteins used in the de novo experiment increased to 92.3% using CST (Supplementary Figure S1). The fact that FLYA was unable to attain 100% accuracy in these reference experiments was likely mainly due to the minimal set of spectra used (Supplementary Table S2).
In both reference experiments (Supplementary Figure S1) the assignment errors had a tendency to accumulate in side-chains (90.6% accuracy with CST, 66.3% without), leaving the accuracy of the backbone assignment above the average (95.3% accuracy with CST, 73.4% without)—a result consistent with previous studies of FLYA (Schmidt and Güntert, 2012) and ARTINA (Klukowski et al., 2022).
Another notable observation is that the availability of source chemical shifts affects the variance of the output accuracy. In de novo experiments, the discrepancy between the most and least accurately assigned proteins is 34.9 pp, compared to only 13.8 pp in the idealized CST case.
Overall, the experiments we have conducted, ranging from scenarios with minimal to maximal information available for chemical shift transfer, indicate that source protein information provides useful guidance for the combinatorial optimization that reduces the ambiguity of the assignment, resulting in fewer errors and enhancing the overall reliability of protein assignment.
3.2 The impact of random and structure-based perturbations
To characterize ARTINA-CST under more realistic conditions, we carried out over 1,000 automated chemical shift transfers with 15 proteins using the test examples described in sections 2.3.1 and 2.3.2.
In the first series of experiments, a variable fraction of atoms was selected at random for perturbation (20%, 50% and 100%), as described in section 2.3.1, imitating a variable degree of discrepancies between known source chemical shifts and the target protein. Perturbed chemical shifts were used as input for the ARTINA-CST procedure together with 4 NMR spectra of the target protein ([1H, 13C]-HSQC, [1H, 15N]-HSQC, 13C-resolved [1H, 1H] NOESY, 15N-resolved [1H, 1H] NOESY). ARTINA-CST automatically extracted cross-peak positions from experimental data and combined them with the corresponding perturbed shift list, yielding the assignment of the target protein. This result was compared with de novo ARTINA assignment, which involved the same procedure, but without the perturbed chemical shift list as input.
The use of CST turned out to be highly beneficial in all three experimental settings (20%, 50% and 100% shift perturbation) and for almost all proteins included in the study. The relative improvement in assignment accuracy is depicted in Figures 3A–C by the ratio between the accuracy of the chemical shift assignment with the CST procedure and with the de novo approach, with a value of 1.0 corresponding to a neutral effect of CST.
FIGURE 3. Improvement of chemical shift assignment with CST using randomly and structure-based perturbed shifts. On the vertical axis, the change in accuracy is given as the ratio between the chemical shift assignment accuracy with CST and corresponding de novo assignments without information about the source protein. On the horizontal axis, the fraction of shifts perturbed refers to all shifts where perturbations were applied, regardless of the size of the perturbation. The trend of each plot is modelled using Gaussian process regression and shown in blue, with a 95% confidence interval shown in orange. “Scaling coefficient” refers to the constant
As expected, the value of the ratio depends on the parameters
When the scaling coefficient was set to a moderately higher value (
Subsequently, we repeated the above experiment using structure-based perturbations (section 2.3.2) instead of random ones. The results for these assignments show a strong resemblance to those for the fully randomized perturbations (Figures 3D–F), including correlation between CST accuracy and
Overall, we identified no major changes in the performance of the algorithm that depend on the spatial distribution of the chemical shift perturbation, and in both experimental settings the number of perturbations and corresponding variance were the primary factors affecting the accuracy of the procedure.
3.3 Case study with the lipoprotein Spr NlpC/P60 domain
For individual cases, we observed a particularly large improvement in the assignment obtained with the CST procedure. For instance, the C-terminal NlpC/P60 domain of lipoprotein Spr from Escherichia coli [PDB 2K1G (Aramini et al., 2008)], showed poor performance in de novo chemical shift assignment (48.3% accuracy), compared to 94.7% obtained in the idealized case of CST (Figure 3).
Even in the experimental setting with the strongest perturbation (standard deviation equal to BMRB standard deviation and perturbations applied to all shifts, which resulted in the large majority of shifts being moved significantly from their original positions), the fraction of correct assignments for this protein was raised by 20.4 pp upon application of the CST procedure. In all other cases the positive impact of the chemical shift transfer was even stronger (30.8–49.8 pp improvement, depending on the experimental setting).
The primary reason for such improvements was the ability of the CST method to resolve chemical shifts in the proximity of the dynamic loop of residues 16–32, 79–81, 90–93, 99–101, long positively charged side chains (37.3% without CST vs. 91.5% with idealized CST) and aromatics (54.9% vs. 93.7%).
3.4 Chemical shift transfer with homologous proteins
As described in section 2.4, we used pairs of homologous proteins identified in RefDB/BMRB and the ARTINA benchmark of NMR spectra to assess the performance of the chemical shift transfer in fully experimental setting. In this procedure, the chemical shift list deposited in RefDB/BMRB was regarded as source, and four NMR spectra ([1H, 13C]-HSQC, [1H, 15N]-HSQC, 13C-resolved [1H, 1H] NOESY, 15N-resolved [1H, 1H] NOESY) were used by ARTINA-CST as input for the transfer procedure to the target system. In this experiment all cross-peaks in the abovementioned spectra were identified automatically by deep neural network models included in the ARTINA visual spectrum analysis layer. As in the previous experiments, we performed for each target protein additional de novo assignments to assess the relative performance of the CST procedure.
The results show an overall mean improvement by CST over de novo assignment of 7.4% (0.0%–14.4%) in the chemical shift assignment accuracy of all shifts and 9.7% (1.2%–23.9%) for the backbone NH groups (Figure 4). Out of 12 homolog pairs, the impact of CST was positive both for all shifts and NH groups in 11 cases. Only in one case (2JVD) the impact of CST was neutral for all shifts and positive for NH groups.
FIGURE 4. Comparison of the accuracy of the chemical shift assignment with and without CST. In the experiment 12 pairs of 9 homologous proteins were used. Each target protein is represented in a distinct color (for 2LEA, 2LL8 and 2LRH, two source proteins from BMRB were identified). Circles represent assignment accuracy over all atom types, whereas crosses signify the accuracy in the same experiments evaluated for the backbone amide N/H atoms only.
ARTINA-CST has a single parameter,
In a second series of experiments, we therefore carried out chemical shift transfers between the 12 homolog pairs using different values of the expected perturbations,
FIGURE 5. Impact of search space size on CST accuracy. The change in accuracy is given as the ratio between the accuracies with and without chemical shift transfer. The left-hand side represents assignment accuracy over all atom types, whereas the right-hand side signifies the accuracy in the same experiments only evaluated over all amide N/H atoms. Within each set of experiments, the furthest-right case (blue) represents the “ideal” reference experiment, using the target protein shift list as source for chemical shift transfer. The neighboring case (green) used a fixed-size search space of 0.09 ppm for 1H and 1.2 ppm for 13C/15N. The other boxes (red) represent a variable initial search space size calculated as a fraction of the BMRB standard deviation of chemical shifts.
Chemical shift transfer impacts full assignment and amide N/H assignment in similar way (Figure 5). For
4 Discussion
An assigned set of chemical shifts establishes a basis for various studies in protein NMR spectroscopy. It facilitates structure elucidation, with chemical shifts being instrumental in determining hydrogen spatial contacts and providing insight into the three-dimensional architecture of the macromolecule. Beyond elucidating static structures, chemical shift assignments offer a unique window into protein dynamics, allowing for the monitoring of temporal changes, facilitating the observation of protein folding processes, conformational alterations, and molecular interactions. In the field of protein-ligand studies, the chemical shifts typically exhibit changes upon ligand binding, thereby pinpointing the site of interaction, as well as details regarding its molecular mechanism and affinity.
In this study we focused on chemical shift transfer–a technique that allows to find chemical shift assignment of a target protein of interest, given a (small) set of experimental spectra and the shift assignment of a similar source protein (e.g., a homolog). The technique is particularly suitable for studies of protein interactions, such as protein-ligand complexes, where information about the structure in apo form can be utilized to model the structure upon binding. Other applications include studies of protein mutations, where a wild-type structure with its assignment can be used as a source of the transfer for a series of mutants. Finally, chemical shift transfer finds its applications in studies of proteins under different physical conditions. In all experimental settings evaluated in this study, the goal of the chemical shift transfer was to find chemical shifts of the target protein with a small amount of experimental data, thereby reducing the measurement time from about 1 to 2 weeks to 2–3 days ([1H,13C]-HSQC, [1H,15N]-HSQC, and combined 15N,13C-resolved [1H,1H]-NOESY).
In this work we built upon our previous work with ARTINA (Klukowski et al., 2022) and FLYA (Schmidt and Güntert, 2012) to establish a fully automated workflow that performs chemical shift transfer automatically, strictly without any human involvement. Subsequently, we carried out over 1,000 automated chemical shift assignments to demonstrate the performance of ARTINA-CST approach and characterize its properties under different experimental settings.
We demonstrated the boundary performance of ARTINA-CST by carrying out automated CST with complete information about the target system and without any information from the homolog structure. Subsequently, we characterized the performance of our method, depending on such factors as the similarity of source and target protein chemical shifts, the variance of chemical shift perturbations, or the spatial distribution of chemical shift deviations. Finally, we demonstrated the performance of our approach using pairs of homolog proteins extracted from RefDB/BMRB databases, which have experimental data available in the ARTINA dataset. In all these experiments, we demonstrated the benefits of CST for the automated assignment of protein NMR spectra whenever appropriate data is available. Even in presence of large differences between the target protein and the source chemical shifts, CST allows an effective transfer of information to improve the assignment while requiring only a small set of spectra for the target protein. This experiment was carried out with a minimal set of NMR spectra for source-target pairs with at least 80% sequence homology (and correspondingly lower sequence identity). We expect CST to be possible also at even lower sequence similarity, where, however, it might be necessary to compensate for the larger number and size of the chemical shift differences by measuring one or more additional spectra (for instance, HNCO, HNCA, HNcoCA, or CBCAcoNH) for the target protein. On the other hand, we see the main practical applications of automated CST rather for source-target pairs with highly similar sequences such as orthologous proteins from different species and mutants in combination with temperature, pH, salt or other environment changes, ligand binding, etc.
Future improvements of ARTINA-CST are possible, provided that more NMR data relevant for chemical shift transfer is collected and deposited in public repositories. It would allow for the use of statistical methods or machine learning to characterize chemical shift perturbation patterns resulting from different types of transfers (e.g., changes of the physical conditions or ligand binding).
We believe that the method can find future applications in fundamental studies of proteins with NMR spectroscopy, including investigations of protein structure, dynamics, and interactions. The method can be adopted easily by the NMR community and integrated in research protocols, as CST takes only up to 2 h of computational time and the whole process can be carried out in the web browser using our cloud computing platform NMRtist (https://nmrtist.org). Although this is a technical paper presenting a solution to a common problem in biomolecular NMR spectroscopy, where approaches analogous to molecular replacement in X-ray crystallography are not in common use, it may have a more general impact on biochemical research by simplifying the use of NMR in situations where it was so far considered a (too) laborious method. Additionally, we believe that the results presented here may serve as guidance for NMR practitioners who use the NMRtist platform (Klukowski et al., 2023), helping them to assess the practical benefits of the recently proposed ARTINA method in a new context.
Data availability statement
Publicly available datasets were analyzed in this study. This data can be found here: ETH Research Collection: https://doi.org/10.3929/ethz-b-000568621.
Author contributions
HW performed research, prepared illustrations and tables, and wrote the draft manuscript. PK, RR, PG designed and supervised research. All authors contributed to the article and approved the submitted version.
Funding
The study was supported by the EUREKA Eurostars grant E! 115328 and the Grant-in-Aid for Scientific Research 23K05660 of the Japan Society for the Promotion of Science. Open access funding was provided by ETH Zurich.
Conflict of interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Supplementary material
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmolb.2023.1244029/full#supplementary-material
References
Aeschbacher, T., Schmidt, E., Blatter, M., Maris, C., Duss, O., Allain, F. H.-T., et al. (2013). Automated and assisted RNA resonance assignment using NMR chemical shift statistics. Nucleic Acids Res. 41, e172. doi:10.1093/nar/gkt665
Aramini, J. M., Rossi, P., Huang, Y. J., Zhao, L., Jiang, M., Maglaqui, M., et al. (2008). Solution NMR structure of the NlpC/P60 domain of lipoprotein Spr from Escherichia coli: structural evidence for a novel cysteine peptidase catalytic triad. Biochemistry 47, 9715–9717. doi:10.1021/bi8010779
Banelli, T., Vuano, M., Fogolari, F., Fusiello, A., Esposito, G., and Corazza, A. (2017). Automation of peak-tracking analysis of stepwise perturbed NMR spectra. J. Biomol. NMR 67, 121–134. doi:10.1007/s10858-017-0088-7
Bartels, C., Billeter, M., Güntert, P., and Wüthrich, K. (1996). Automated sequence-specific NMR assignment of homologous proteins using the program GARANT. J. Biomol. NMR 7, 207–213. doi:10.1007/BF00202037
Bartels, C., Güntert, P., Billeter, M., and Wüthrich, K. (1997). Garant - a general algorithm for resonance assignment of multidimensional nuclear magnetic resonance spectra. J. Comput. Chem. 18, 139–149. doi:10.1002/(sici)1096-987x(19970115)18:1<139::aid-jcc13>3.0.co;2-h
Güntert, P., and Buchner, L. (2015). Combined automated NOE assignment and structure calculation with CYANA. J. Biomol. NMR 62, 453–471. doi:10.1007/s10858-015-9924-9
Güntert, P., Mumenthaler, C., and Wüthrich, K. (1997). Torsion angle dynamics for NMR structure calculation with the new program DYANA. J. Mol. Biol. 273, 283–298. doi:10.1006/jmbi.1997.1284
Hoch, J. C., Baskaran, K., Burr, H., Chin, J., Eghbalnia, H. R., Fujiwara, T., et al. (2023). Biological magnetic resonance Data Bank. Nucleic Acids Res. 51, D368–D376. doi:10.1093/nar/gkac1050
Jang, R., Gao, X., and Li, M. (2012). Combining automated peak tracking in SAR by NMR with structure-based backbone assignment from 15N-NOESY. BMC Bioinforma. 13, S4. doi:10.1186/1471-2105-13-S3-S4
Klukowski, P., Riek, R., and Güntert, P. (2023). NMRtist: an online platform for automated biomolecular NMR spectra analysis. Bioinformatics 39, btad066. doi:10.1093/bioinformatics/btad066
Klukowski, P., Riek, R., and Güntert, P. (2022). Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. Nat. Commun. 13, 6151. doi:10.1038/s41467-022-33879-5
Laveglia, V., Giachetti, A., Cerofolini, L., Haubrich, K., Fragai, M., Ciulli, A., et al. (2021). Automated determination of nuclear magnetic resonance chemical shift perturbations in ligand screening experiments: the PICASSO web server. J. Chem. Inf. Model. 61, 5726–5733. doi:10.1021/acs.jcim.1c00871
Orts, J., and Gossert, A. D. (2018). Structure determination of protein-ligand complexes by NMR in solution. Methods 138, 3–25. doi:10.1016/j.ymeth.2018.01.019
Plata, M., Sharma, M., Utz, M., and Werner, J. M. (2023). Fully automated characterization of protein-peptide binding by microfluidic 2D NMR. J. Am. Chem. Soc. 145, 3204–3210. doi:10.1021/jacs.2c13052
Redfield, C., and Robertson, J. P. (1991). “Assignment of the NMR spectra of homologous proteins,” in Computational aspects of the study of biological macromolecules by nuclear magnetic resonance spectroscopy. Editor J. C. Hoch (New York: Plenum Press), 303–316.
Schmidt, E., and Güntert, P. (2012). A new algorithm for reliable and general NMR resonance assignment. J. Am. Chem. Soc. 134, 12817–12829. doi:10.1021/ja305091n
Thompson, J. M., Sgourakis, N. G., Liu, G. H., Rossi, P., Tang, Y. F., Mills, J. L., et al. (2012). Accurate protein structure modeling using sparse NMR data and homologous structure information. Proc. Natl. Acad. Sci. U. S. A. 109, 9875–9880. doi:10.1073/pnas.1202485109
Zhang, H. Y., Neal, S., and Wishart, D. S. (2003). RefDB: A database of uniformly referenced protein chemical shifts. J. Biomol. NMR 25, 173–195. doi:10.1023/a:1022836027055
Keywords: NMR, machine learning, automated spectra analysis, automated assignment, ARTINA, FLYA, protein
Citation: Wetton H, Klukowski P, Riek R and Güntert P (2023) Chemical shift transfer: an effective strategy for protein NMR assignment with ARTINA. Front. Mol. Biosci. 10:1244029. doi: 10.3389/fmolb.2023.1244029
Received: 21 June 2023; Accepted: 21 September 2023;
Published: 03 October 2023.
Edited by:
Caterina Alfano, Ri.MED Foundation, ItalyReviewed by:
Georg Kontaxis, University of Vienna, AustriaRafal Augustyniak, University of Warsaw, Poland
Copyright © 2023 Wetton, Klukowski, Riek and Güntert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Piotr Klukowski, piotr.klukowski@phys.chem.ethz.ch; Roland Riek, roland.riek@phys.chem.ethz.ch; Peter Güntert, peter.guentert@phys.chem.ethz.ch