Skip to main content

METHODS article

Front. Genet., 13 October 2021
Sec. Statistical Genetics and Methodology

A Multi-Marker Test for Analyzing Paired Genetic Data in Transplantation

  • 1Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, United States
  • 2Departments of Statistics, Central China Normal University, Wuhan, China
  • 3Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States
  • 4Department of Experimental and Clinical Pharmacology, College of Pharmacy, University of Minnesota, Minneapolis, MN, United States
  • 5Minneapolis Medical Research Foundation, Minneapolis, MN, United States
  • 6Department of Medicine, Hennepin County Medical Center, Minneapolis, MN, United States
  • 7Department of Epidemiology and Community Health, University of Minnesota, Minneapolis, MN, United States
  • 8Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States

Emerging evidence suggests that donor/recipient matching in non-HLA (human leukocyte antigen) regions of the genome may impact transplant outcomes and recognizing these matching effects may increase the power of transplant genetics studies. Most available matching scores account for either single-nucleotide polymorphism (SNP) matching only or sum these SNP matching scores across multiple gene-coding regions, which makes it challenging to interpret the association findings. We propose a multi-marker Joint Score Test (JST) to jointly test for association between recipient genotype SNP effects and a gene-based matching score with transplant outcomes. This method utilizes Eigen decomposition as a dimension reduction technique to potentially increase statistical power by decreasing the degrees of freedom for the test. In addition, JST allows for the matching effect and the recipient genotype effect to follow different biological mechanisms, which is not the case for other multi-marker methods. Extensive simulation studies show that JST is competitive when compared with existing methods, such as the sequence kernel association test (SKAT), especially under scenarios where associated SNPs are in low linkage disequilibrium with non-associated SNPs or in gene regions containing a large number of SNPs. Applying the method to paired donor/recipient genetic data from kidney transplant studies yields various gene regions that are potentially associated with incidence of acute rejection after transplant.

1 Introduction

Transplant matching usually focuses on non-genetic factors related to the donor, the recipient, or the graft itself, such as recipient age, donor sex, or organ size. Genetic matching in transplant has been limited to the human leukocyte antigen (HLA) region of the genome in the past, as this region codes for immune related genes that may lead to the recipient recognizing the allograft as non-self and mounting an immune response against it (Reddy et al., 2013; Hernandez-Fuentes et al., 2018). Although HLA matching reduces the risk of allograft rejection, it is not enough to prevent allograft rejection, even in the case of transplant between HLA-identical siblings (Grafft et al., 2010; Zanoni and Kiryluk, 2020). More recent transplant genetic studies have identified gene regions outside of the HLA region that may act as genetic modifiers for transplant outcomes (Almoguera et al., 2014; Yang and Sarwal, 2017; Steers et al., 2019; Farouk et al., 2020; Marin et al., 2020; Reindl-Schwaighofer et al., 2020). These so-called minor histocompatibility antigens may be important regions of interest to examine further in order to improve transplant outcomes.

Several studies have found evidence suggesting that donor and recipient genetic mismatch in these non-HLA regions could impact transplant outcomes. Zhang et al. (2020) showed that non-HLA donor/recipient (D/R) genetic differences were significantly associated with long-term graft survival in kidney transplant. Pineda et al. (2017) found a significantly increased number of D/R mismatched variants in the group of kidney transplant recipients with antibody-mediated rejection (AMR) compared to the group with no rejection, and they were able to identify 16 gene regions with multiple SNPs associated with AMR. Steers et al. (2019) utilized a genomic-collision model, in which a recipient who is homozygous for a deletion tagging allele obtains a transplant from a non-homozygous donor and were able to find a single polymorphism located in the LIMS1 locus with an increased hazard for rejection for D/R pairs with the collision genotype. In addition, previous work on single nucleotide polymorphism (SNP) matching in transplant found that utilizing D/R matching scores in association analyses led to discovery of SNPs potentially associated with acute rejection after liver transplant. Joint testing of these scores with the recipient genotype also suggested that the scores were measuring some aspect other than the combination of the recipient and donor genotype, since there were cases where the score was associated with transplant outcome, but the recipient and donor genotypes were not (Arthur et al., 2020). In order to improve power, increase interpretability and reproducibility of association signals, and facilitate follow-up functional studies, it is of interest to extend these single SNP methods to a multi-marker framework.

Many multi-marker methods have been proposed in the literature for assessing the association of multiple genetic markers within a gene region. Depending on how association information for individual markers is aggregated, they can largely be classified into three groups. In the first group, each test is based on combining p-values from tests of individual markers (Li et al., 2011; Chen et al., 2012; Mishra and Macgregor, 2015). Members in the second group can be seen as some quadratic form that combines statistics for testing marginal associations with each marker (Pan 2011). Well-known examples include Hotelling’s T2 statistic (Fan & Knapp, 2003), genomic-distance based regression (Wessel and Schork, 2006), variance component (VC) or kernel machine regression based tests (Tzeng and Zhang 2007; Pan 2009; Wu et al., 2010; Wu et al., 2011), the use of weighted genetic risk scores (Li et al., 2009; Iribarren et al., 2018), and the C-alpha test (Neale et al., 2011). These tests have also been extended using the framework of functional or mixed effects models (Fan et al., 2013; Chen et al., 2019; Chiu et al., 2019). In the third group, each test assesses associations between the phenotype variable and some form of aggregated genotype data. For example, the principal component regression (PCR) method (Gauderman et al., 2007; Wang and Abbot 2008; Chen and Qin, 2010; de Leeuw et al., 2015) tests the significance of top principal components of centered multi-marker genotype data, and Wang and Elston (2007) test similarly collapsed variables obtained via Fourier transformations. Similarly to tests in the second group, tests in this group have been built upon using functional data analysis methods (Luo et al., 2011; Luo et al., 2013). PCR and VC methods have been shown to generally have competitive statistical power (Ballard et al., 2010; Pan 2011; Liu et al., 2020). The connections among some of these methods have been studied in the literature (Schaid 2010; Bacanu 2012).

Only a few multi-marker D/R matching scores have been utilized for association analysis of transplant genetics data. The allogenomics mismatch score (AMS) is based on the hypothesis that observing the coding regions of both the recipient and donor genomes can help identify the number of potentially incompatible amino acids between the pair (Mesnard et al., 2016). The AMS is defined as the sum of amino acid mismatch contributions across all SNPs in the exome. A negative linear association was observed between the AMS and estimated glomerular filtration rate at 36 months post-transplant, suggesting that the AMS may be correlated with long term kidney graft function (Mesnard et al., 2016). A second method defined variant mismatch as any allele difference between the paired recipient and donor genomes (Pineda et al., 2017). This study found that the total number of D/R variant mismatches prior to transplant was significantly higher in the recipient group that developed antibody-mediated rejection versus the group with no rejection. The final method defined SNP mismatch as the donor carrying an allele not present in the recipient genome. These individual mismatches were then summed over all non-synonymous SNPs (nsSNPs) in the genome. After fitting a multivariate model that adjusted for HLA eplet mismatch, the degree of nsSNP mismatch was independently associated with graft loss (Reindl-Schwaighofer et al., 2019). While these methods were able to find some association between genome-wide mismatch and transplant outcomes, the results are difficult to interpret due to the scores spanning the entire genome. By extending existing gene-based test ideas to use both the recipient and donor genotype information, we have the potential to increase the power and interpretability over single SNP and whole genome-wide scoring methods.

Here we propose a new method called the “Joint Score Test (JST).” JST is built upon marginal likelihood scores for testing multiple SNP effects and a gene-based D/R matching score. JST uses only the most informative linear combinations of single recipient SNP marginal likelihood scores to allow for an increase in statistical power. JST also allows for flexible adjustment for covariates and therefore it can maintain nominal type-I error rates in the presence of population stratification.

We organize our paper as follows. First, we discuss the underlying model and the construction of JST. Then we present the results of extensive simulation studies. Third, we utilize JST in an association analysis of kidney transplant data. Finally, we discuss the benefits and potential limitations of the method.

2 Materials and Methods

JST jointly tests whether recipient genotype SNPs or a gene-based D/R matching score are associated with the transplant phenotype of interest.

2.1 Notation

Suppose that genotype data for m SNPs in a genomic region of interest are available for n transplant D/R pairs. Let  XijR and XijD be the numerical coding of the genotype of the jth marker for the ith recipient or donor, respectively (i=1,,n;j=1,,m), which can be the number of minor alleles or another numerical coding, and let Wik be the kth covariate of D/R pair i. We consider a regression setting with continuous or categorical outcomes, Yi, and our generalized linear model (GLM) is of the form

g(μi)= α0 + Wiα+ Xiβ+ Ziγ, 

where g () is the link function, μi=E(Yi|Wi,  Xi,  Zi), Wi=(Wi1,...,WiK) is the vector of K covariates for D/R pair i with regression coefficients α=(α1,,αk)T.  Let Xi=(Xi1R,Xi2R,...,XimR) denote the genotype vector of m SNPs for recipient i with regression coefficients β=(β1,,βm)T, and let Zi denote the single gene-based genetic matching score value for D/R pair i with regression coefficient γ. Note that may include principal components (PCs) for describing population substrata. We are interested in jointly testing the null hypotheses

H0: β=0 and γ=0

2.2 Gene-Based Scores

In general, each gene-based score can be written in the form

Zi=j=1mD(XijD , XijR )

where D(XijD , XijR ) represents a function defining a distance between the donor and recipient genotypes. We emphasize that the donor and recipient SNPs used in the distance calculations are not necessarily the same as those present in the recipient genotype main effects vector  Xi, and thus can be greater or fewer in number. We will focus on four different single SNP distance functions.

2.3 Single SNP Distance Scores

The first score, the identity-by-state (IBS) mismatch distance function, is defined as

DIBS=|XijD XijR | 

assuming diallelic SNPs. This function is based on the degree of identity-by-state between the donor and recipient genotypes, measuring the number of alleles the pair shares at a SNP. IBS has previously been used as a kernel function in the sequence kernel association test (Wu et al., 2011) and in a kernel machine approach to test multiple genetic markers in association with quantitative traits (Kwee et al., 2008).

The second score considered, the incompatibility distance function, is calculated as

DIncomp= {1 if XijDXijR  0 otherwise.

This distance metric has been utilized in a kidney transplant study, where single SNP incompatibility and a genome-wide sum of this measure were found to be associated with antibody-mediated rejection (Pineda et al., 2017). In addition, a similar score was utilized in mother/child pairs in a genetic study of pre-eclampsia (PE), where they found SNPs from three candidate gene regions to be nominally associated with PE (Parimi et al., 2008).

The third score, the Allogenomics Mismatch Score (AMS) distance function, is defined as

DAMS= a  XijD{0 if a  XijR1 otherwise,

where a denotes alleles of a genotype (Mesnard et al., 2016). The underlying hypothesis of this method states that examining the difference between transplant donor and recipient alleles in coding regions of the genome can give insight into which amino acids coded by the donor would present as non-self to the recipient immune system, potentially leading to allograft damage.

The fourth score, the binary mismatch score, is based off a simplification of the AMS which assigned a score of 1 for all SNPs where the donor genotype contained an allele that was not present in the recipient genotype and a score of 0 otherwise (Reindl-Schwaighofer et al., 2019). The single SNP distance function can be defined as

DMM= {1 if  aGdl such that aGrl0 otherwise

2.4 A New Multi-Marker Test Statistic for Paired Transplant Data

In this section we will focus on deriving our JST test statistic for a binary outcome, Y. Additional derivation for a binary and a continuous outcome is provided in the Online Resource. Let p^1(Wi)p(Yi=1|Wi;α^0;α^) denote the predicted probability of Yi=1 based on the null model

logit Pr(Yi=1|Wi;α0, α)=α0+k=1KWikαkα0+Wiα.

Here α^0 and α^ are the maximum likelihood estimates of α0 and α. Additionally, we let X^ijR denote the fitted value for the jth SNP genotype for recipient i from a weighted linear regression model XijR= θ0+ k=1KWikθk+ εX and let Z^i denote the fitted value for the gene-based genetic matching score of D/R pair i from a weighted linear regression model Zi= τ0+  k=1KWikτk+ εZ. In both cases, the weights are p^1(Wi){1 p^1(Wi)} for recipient i or D/R pair i. As derived in the Online Resource, the likelihood score for testing the marginal association with the jth recipient SNP is equivalent to

UjR= i=1n(XijRX^ijR){Yi  p^1(Wi)} i=1nQijR

Similarly, the likelihood score for testing the marginal association with the gene-based D/R SNP genetic matching score is equivalent to

US= i=1n(ZiZ^i){Yi  p^1(Wi)} i=1nQiS

Denote the vector of scores for all m recipient genotype SNPs and the gene-based genetic matching score as U, U=(U1R,...,UmR, US)T, let Bi=(Xi,Zi) and B^i=(X^i,Z^i). Then U can be written into the matrix form,

U=(BB^)T{Yp^1}

which is asymptotically distributed as a (m+1) dimensional normal random variable with variance-covariance matrix V=nQTQ where Q=(QR,QS). The element of V at position (a,b),  a=1,...,(m+1) and b=1,...,(m+1), is estimated as ni=1nQiaTQib. A Hotelling’s T2 statistic can be constructed as nUTV^1U, which asymptotically follows a Chi-squared distribution with (m+1) degrees of freedom. It is well known that the Hotelling’s T2 statistic has low power when (m+1) is large, and that eliminating V^1 from the test statistic could lead to an improvement in power (Bai and Saranadasa, 1996). Along this line, a squared score test (Pan 2009, referred to as “SSU”) and a kernel-machine based test with the linear kernel (Wu et al., 2010, referred to as “SKAT”) have increased power for testing multiple marker main effects. These methods do not distinguish between the matching score effects and recipient SNP effects, however, which may have different underlying biological mechanisms. The sequence kernel association test (SKAT) method, for example, assumes that β and γ have the same underlying variance component (Wu et al., 2011), which therefore does not distinguish recipients’ SNP main effects and the effect of the matching score.

Here we propose a new statistic as follows. V  can be decomposed as

[ VRCRSCSRVS]=[Var(UR)Cov(UR, US)Cov(US, UR)Var(US)]

Our statistic is based on Eigen decomposition of the sample variance-covariance matrix V^R.  Let A=[a1,a2,,am] denote a m × m matrix with the pth column being the pth eigenvector of V^R,  and (λ1,λ2,,λm), λ1λm, denote the corresponding eigenvalues. We extract the first s (s < m) principal components (PCs, the choice of s is discussed below). Let As=[a1,a2,,as]. Define UPR as the vector of (UR)Tal/ λl, l=1,2,,s. Our test statistic is constructed based on UP=(UPR,US).

(UPRUS)T[Is×sCôv(UPR, US)Côv(US, UPR)Vâr(US)]1(UPRUS)

where Is×s is the s by s identity matrix.

We can show that this statistic is asymptotically distributed as a Chi-squared random variable with s+1 degrees of freedom under the null. Therefore, Is×s is there under the null due to the orthogonality between the eigenvectors. Under the alternative hypothesis, this statistic is distributed as a non-central Chi-squared random variable with s+1 degrees of freedom and non-centrality parameter that is equal to its value.

2.5 Selection of s for JST

Choosing the number of PCs to retain (s) is always a difficult task in principal component analysis. One common method is to choose the number of PCs to keep based on a predefined percentage of total variance explained. We utilize this method in our simulation studies, looking at a range of 65–99% total variance explained by the retained PCs.

2.6 Simulations

Simulation studies were conducted to assess type I error and power levels of the JST, as well as to determine the number of principal components to maintain after Eigen decomposition, s.

2.6.1 Simulation Study Design

Datasets were sampled from 1,000 Genomes Phase 3 reference using HapGen2 (Su, Marchini and Donnelly, 2011). Briefly, subsets of the reference data were created based on three gene regions starting and ending positions, NAT2, CHI3L2, and ASAH1. These genes were chosen due to their differing number of SNPs and LD structures (Supplementary Figure S1). SNPs with minor allele frequency less than 0.05 were excluded from analyses. The subset reference data was then sampled with HapGen2 to generate 2n control individuals which were then paired into n donor/recipient pairs. A small sample size, n = 500, and a large sample size, n = 1,000, were considered. Recipient and donor genotype information was then extracted from these sampled datasets and used to calculate gene-based scores. A total of 5,000 simulations were conducted for each gene region and sample size combination.

For type I error analysis, null phenotypes were generated using the model

logit Pr(Yi=1|W)=α0+0.5W1+0.5W2

for binary outcome Yi, and using the model

Yi=0.5W1+0.5W2+ ε

for continuous outcome Yi, where W1 is a binary covariate taking a value of either 0 or 1 with probability 0.5, W2 is a continuous covariate drawn from a standard Normal distribution, and ε is an error term drawn from a standard Normal distribution.

For power analyses, phenotypes were generated using the model

logit Pr(Yi=1|W,X,Z)= α0 + 0.5W1+0.5W2+ Xiβ+ Ziγ, 

for binary outcome Yi, or using the model

Yi= 0.5W1+0.5W2+ Xiβ+ Ziγ + ε

for continuous outcome Yi. For both models, a variety of true associations were tested, where either recipient genotype SNPs were associated (β 0, γ=0), or D/R matching was associated (β=0, γ0). When recipient genotype SNPs were associated, we considered scenarios in which 5, 15, or 25% of the SNPs in the gene region were truly associated with outcome, and this group of SNPs was either in high linkage disequilibrium (LD) or low LD. When D/R matching was associated, we considered scenarios in which 5, 15, 25, 50, 75, or 100% of the SNPs in the gene region were important to match between D/R. For cases where less than 100% of the SNPs were important to match, we included only the associated SNPs in the summed matching score when deriving phenotypes, and then utilized the full gene-based matching score for testing. Similarly to the recipient genotype SNPs, these groups of matched SNPs were either in high or low LD with one another. We considered a small, 0.14, medium, 0.41, and a large, 0.69, effect size resulting in odds ratios of 1.25, 1.50, and 2.00. Prevalence of the binary outcome Yi ranged from 5 to 20% in order to see the effects of rare versus common outcomes. In addition, several values of s were examined, accounting for 65–99% of total variance explained by the principal components, to determine its effect on type I error and power levels. All analyses were run using R (v4.0, R Core Team, 2021). Code to run all simulations can be found online (https://github.com/arthurvickie/Multi-Marker_Method).

2.6.2 Comparison to Existing Methods

In addition to testing the type I error and power levels of JST, we compared our method with a standard GLM and the SKAT. For all comparisons, phenotype generation was the same as for JST. We used the same n x (m+1) matrix of combined recipient genotype SNPs and gene-based D/R matching score as input as was used for JST. For our standard GLM, we fit separate models under the null and alternative hypotheses respectively, and then calculated the likelihood ratio test (LRT) statistic using the lrtest function from the lmtest (v0.9-37, Zeileis and Hothorn, 2002) package in R or the score test statistic using the anova function in R. SKAT analysis was performed using the unweighted linear kernel and the unweighted IBS kernel as implemented in the SKAT (v1.3.2.1, Lee et al., 2017) package in R.

2.7 Real Data Analysis

2.7.1 Sample Information

Kidney transplant data was collected from two cohorts: Deterioration of Kidney Allograft Function (DeKAF, 2005–2011, NCT00270712) Genomics Study and Genomics of Kidney Transplantation (GEN-03, 2012–2016, NCT01714440) study. Genotypes from the DeKAF cohort (n = 784 donor-recipient pairs) were determined with the AFR-AMR Axiom chip (Affymetrix, Santa Clara, CA) (Hoffmann et al., 2011), which contains 837,930 variants. Genotyping of GEN-03 cohort (n = 404 donor-recipient pairs) was performed on a custom exome-plus Affymetrix TxArray SNP chip (Li et al., 2015), which contains approximately 782,000 variants. Genotype calling was performed in one batch on the Affymetrix Genotyping Console v4.0 using the GT1 algorithm, which is based on BRLMM-P (Affymetrix, Santa Clara, CA). Genotyping details can be found in our previous paper (Oetting et al., 2016). Non-Caucasian recipients were excluded from this study. In both data sets the outcome of interest incidence of acute rejection (AR) after transplant (161 cases in DeKAF cohort and 50 cases in GEN-03 cohort), was coded as a binary variable. AR was defined as time to first T-cell, antibody mediated, or mixed T-cell and antibody mediated rejection post-transplant as determined by the enrolling center and treating physician. Rejection was biopsy confirmed in ∼96% of the cases.

2.7.2 Statistical Analysis

The common SNPs in the two cohorts were grouped by physical locations within 23,062 genome-wide genes (GRCh38. p13) and then analyzed using both JST with s = 85% variance explained and SKAT with an unweighted IBS kernel. Covariates were included for recipients’ age, gender, PRA status, prior non-kidney transplantation, and an indicator for the cohort membership. p-values were adjusted for multiple comparisons using the Benjamini-Hochberg procedure to control false discovery rate (FDR).

3 Results

3.1 Results of Simulation Studies

A subset of the total simulation results is presented for both type I error testing and power testing. In all scenarios, results were similar for all combinations of gene, sample size, and any additional variables tested. Table 1 is based on scenarios using NAT2 with 500 D/R pairs, and all figures are based on 1000 D/R pairs, using data from either NAT2 or CHI3L2.

TABLE 1
www.frontiersin.org

TABLE 1. Results of Type I Error simulations for the gene NAT2 with 500 D/R pairs. Method refers to one of the four multi-marker methods used for testing. Score refers to the gene-based score that was fit as part of the modeling. The columns Prevalence 20, 10, and 5% give results for binary outcome Y, and the continuous column gives results for continuous outcome Y. JST stands for Joint Score Test, with s value (percent of variance explained by the PCs) indicated in parentheses. Results for s value between 70 and 95% were similar to those for s = 65%. SKAT stands for Sequence Kernel Association Test. SKAT (Linear) refers to using SKAT with an unweighted linear kernel. SKAT (IBS) refers to fitting SKAT with an unweighted IBS kernel. GLM stands for Generalized Linear Model.

3.1.1 Type I Error

Table 1 shows the results of type I error rates for joint testing. For the proposed JST method, the type I error rate was around 0.05 for all combinations of outcome prevalence and fitted gene-based score. Type I error rates are nominal for s values between 65 and 90%, with some conservative error rates seen with s = 99% for lower outcome prevalence. Similar results can be seen for the SKAT method using either the linear or IBS kernel. The standard score test based on generalized linear model tends to have inflated type I error values when the outcome is binary, ranging from around 5–11%. When the outcome is continuous, the type I error is slightly inflated, at 6%.

3.1.2 Power—Recipient Genotype SNPs Associated

Figure 1 shows results from power analyses where the recipient genotype SNPs are associated with outcome, but the gene-based matching score is not associated. Under this scenario, SKAT using the IBS kernel performs better than SKAT using the linear kernel, so we use the IBS kernel results for our comparisons. The GLM LRT tends to have the lowest power, ranging from around 50% (Figures 1B,C) to around 80% (Figure 1D). It is probable that the power levels in panel D are being artificially inflated due to inflation of type I error rate. The power difference between the proposed JST and SKAT using the unweighted IBS kernel varies. When the R genotype SNPs associated with outcome are in low LD, JST tends to have higher power than SKAT, as seen in the Figure 1A where JST with s = 95% reaches close to 90% power but SKAT does not reach 80% power. When the SNPs associated with the outcome are in high LD, SKAT and the JST method can have similar power levels. This can be seen in the Figure 1B where both SKAT and JST with s = 65–80% reach around 85% power. When the size of the gene is increased, as in panels C and D, we see that JST tends to have higher power than SKAT in both scenarios. Additionally, when a continuous outcome is examined, JST with any value of s tends to have much higher power than SKAT using either kernel (Supplementary Figure S2).

FIGURE 1
www.frontiersin.org

FIGURE 1. Power estimates from simulations using the genes NAT2 (Panels (A,B)) and CHI3L2 (Panels (C,D)) with 1,000 pairs of donors and recipients under the scenario that recipient genotype SNPs were associated with outcome. Panels (A,B) show an outcome prevalence of 10%. Panel (C) has an outcome prevalence of 5% and Panel (D), 20%. Panels (A–C) show results for a medium OR (1.25) and Panel (D) for a large OR (2.00). Panels (A, C) show results when 25% of recipient SNPs are associated, while 15% are associated in Panel (B) and only 5% in Panel (D). In panels (A, D), SNPs are in low LD while in panels (B, C) they are in high LD. The four colored bars represent which gene-based score was fit in the model, with red corresponding to the allogenomics mismatch score (AMS), orange to binary mismatch score, green to identity-by-state (IBS) score, and blue to Incompatibility score. From left to right in panels (A, C,D), the method used for model fitting was the joint score test (JST), with s values of 85, 90, 95, and 99% of variance explained by the principal components (PCs), the sequence kernel association test (SKAT) with the unweighted IBS kernel, and a generalized linear model (GLM) likelihood ratio test (LRT). Panel (B) shows results for JST with s values of 65–80%. The y-axis shows estimated power from 0 to 100%. The horizontal blue line corresponds to 65% power and the horizontal red line corresponds to 80% power.

For this scenario, there is not a singular s value that always results in the highest power. Figures 1A,D both show that JST has the highest power when s corresponds to 95% variance explained. Panel B shows that under different circumstances, retaining a smaller number of principal components may result in higher power, while the opposite is seen to be true for Figure 1D. We note that power levels for JST in all panels are around 80%, so changing the number of principal components retained does not seem to drastically affect the overall power of the method.

3.1.3 Power—Donor/Recipient Gene-Based Matching Score Associated

Figure 2 shows an example of results on power when the D/R gene-based matching score was associated with outcome and recipient genotype was not associated. Under this scenario, SKAT performance is slightly better using the linear kernel versus the IBS kernel, so we use the linear kernel results in our comparison. We can see that overall, SKAT has the highest power under these simulation conditions, with power levels reaching at least 80% for all four associated scores. The proposed JST method has the second highest power ranging from around 60% when the binary mismatch score is associated to around 90% when the IBS score is associated. The GLM LRT has the lowest power, ranging from around 25% to almost 65%. We note that JST tends to have the highest power when the percentage of variance explained by the s PCs is the smallest (65%). When the IBS gene-based score or the AMS is truly associated (Figures 2A,C), power tends to be higher overall than when the binary scores are associated, with all s values leading to over 80% power. The Binary mismatch score being truly associated (Figure 2B) tends to have the lowest overall power of the four scores, with four scenarios in which power does not reach 65%. Similar results are seen when outcome is continuous (Supplementary Figure S3).

FIGURE 2
www.frontiersin.org

FIGURE 2. Power estimates from simulations using the gene CHI3L2 and 1,000 pairs of donors and recipients under the scenario that the gene-based score was associated with outcome. All plots shown are for a binary outcome with prevalence 10%. A small odds ratio (1.25) was used for phenotype generation. For these simulations, 50% of SNPs in the gene score were associated with the outcome, and these SNPs were in low LD. From left to right, and top to bottom the true associated gene-score is the allogenomics mismatch score (AMS), the Binary Mismatch, the identity-by-state (IBS), and the Incompatibility score. The four colors represent which score was used to fit the model, where red is the AMS, yellow is the Binary Mismatch score, green is the IBS, and blue is the Incompatibility score. In each plot, the x-axis corresponds to the method used, where from left to right methods are joint score test (JST) with s values of 65, 70, 75, and 80% of variance explained by the principal components used, the sequence kernel association test (SKAT) with unweighted linear kernel, and a generalized linear model (GLM) likelihood ratio test (LRT). The y-axis shows estimated power from 0 to 100%. The horizontal blue line corresponds to 65% power and the horizontal red line corresponds to 80% power.

3.2 Data Analysis Results

Table 2 shows the top five genes from analysis of the combined GEN03 and DeKAF data sets using JST with an s value of 85% and the top five genes from SKAT analysis of the combined data sets, using an unweighted IBS kernel. Gene ranking is based on the smallest p-value for all four score models. p-values for the four different models are relatively similar for each of the five genes. The number of genotyped SNPs in the genes ranges from 3 to 15 for the JST results and from 3 to 104 in the SKAT results. Only one of the top genes found using SKAT analyses was also found in the top five for JST analysis. All five of the genes from JST analysis are significant using a FDR cutoff of less than 0.10, while none of the genes found by SKAT are significant using this method.

TABLE 2
www.frontiersin.org

TABLE 2. Results of JST and SKAT analysis of combined GEN03 and DeKAF data sets. The top 5 genes sorted by p-value from fitting a model with the any of the four gene-based score are shown. JST was calculated using p = 85% of variance explained. SKAT was run using the IBS kernel. Adj. p-value: p-value adjusted for multiple comparisons using FDR, IBS: identity-by-state, Incomp: incompatibility, AMS: allogenomics mismatch score.

4 Discussion

We propose a multi-marker test statistic designed for use with paired genetic data in transplantation. The joint score test, or JST, can be used for testing whether specific gene regions are associated with transplant outcomes, either through the recipient genotype or D/R genetic matching. Compared to other statistical tests in this setting, the JST has the potential for increased power and reproducibility as compared to single SNP tests, as well as increased interpretability as compared to multi-SNP methods that sum across large regions of the genome.

The Eigen decomposition of the recipient genotype covariance matrix allows for a potential increase in power for JST as compared to a standard GLM likelihood ratio test. By transforming the covariance matrix into its principal components, we are able to select only those components that are most likely to be associated and discard the remaining components. We chose to keep principal components with large eigenvalues in accordance with previous theoretical and empirical work by Liu et al. (2020) in order to have the greatest power to detect genetic associations.

Our simulations showed that JST is a competitive method as compared to standard GLM, and SKAT. Type I error rates were conserved for both JST and SKAT but were inflated for GLM. This inflation could be due to the larger number of covariates being fit in this model. In power simulations, JST tended to outperform SKAT when recipient genotype SNPs, in low LD with all other SNPs in the region, were truly associated with outcome. When the recipient SNPs were in high LD, SKAT and JST performed similarly for the smallest gene but increasing the gene size while keeping the percentage of associated SNPs constant led to JST surpassing SKAT in power. These findings agree with those found by Liu et al. (2020) where the SKAT method performed closer to use of the first principal component when LD between SNPs increased, due to the increase of the first eigenvalue weighting the first PC higher in the SKAT test statistic. When gene-based score was associated with outcome, SKAT tended to have higher or similar power to JST depending on which gene-based score was used in modeling. When the AMS and IBS score were associated with the outcome, they tended to have higher power than when the Binary Mismatch score and Incompatibility score were associated. Additionally, when the Binary Mismatch score or Incompatibility score were associated with outcome, the AMS or IBS score respectively maintained relatively high power. Following these observations, it is recommended that the AMS or IBS score be used for testing. Ideally, the choice of score should be determined based on prior knowledge of the genetic mechanism.

JST and SKAT behaved similarly in some of the simulation scenarios but the construction of these two test statistics is different. Using the linear kernel, the SKAT statistic is equivalent to summing the squared score statistics of the genetic data (Wu et al., 2011). Under this scenario, SKAT is similar to JST but does not involve the variance between the marginal score functions. The main difference between the two approaches lies in different modeling approaches. SKAT is also based on a generalized linear model, but the log-odds ratio parameters for both the recipient genotypes and gene score are further modeled as following a mean zero distribution. Different ways of specifying variance in such mean zero distribution correspond to different kernel functions chosen for SKAT analysis (Sun et al., 2013). It is not straightforward to derive general insights on which method may be more powerful, but it does seem that their power would differ at least according to the true underlying genetic model and the LD structures of the genetic variants. We recommend both methods be applied for analysis with adjustment to multiple testing.

To the best of our knowledge, SKAT has not previously been evaluated under the scenario of jointly testing for an association between a set of SNPs and a gene-based score and has not been utilized with paired transplant genetics data. Our evaluation of SKAT under these circumstances found that the method works well and is robust. We determined that the choice of kernel often impacted power levels, however, such as when the SKAT method using the linear kernel had minimal power as compared to the SKAT method using the IBS kernel under the scenario where R genotype SNPs were associated with outcome. Investigation into this phenomenon found that scaling the gene-based scores from between 0 and 1, before running SKAT with a linear kernel leads to an improvement in power (Supplementary Figure S4). The IBS kernel had relatively high power under both power scenarios, with power levels only about 5% less than those of the linear kernel when gene-based score was associated, so use of the IBS kernel may be preferred when the true underlying association is unknown.

Based on the simulation studies, there is no clear value of s that leads to the highest power in all scenarios. When gene-based score was associated with the outcome, the smallest s value we considered tended to have the highest power, but power tended to be similar for s values corresponding to between 65 and 80% variance explained by the PCs. When recipient genotype SNPs were associated with the outcome, changing the value of s did not tend to drastically change the power levels. Since the true association mechanism is unknown for real data analysis, it may be beneficial to run models using a few different s values, although this will increase the number of tests. An alternative is to choose a middling value of s, around 80 or 85%, which tends to have high power under either scenario of association.

Our analysis of kidney transplant data found five genes to be statistically significant after accounting for multiple comparisons at FDR<0.10. These include genes that could plausibly lead to AR after kidney transplant. Three of the five genes, IFNA5, Z98752.3, and SGK2, have been found to be associated with the immune system or specific types of immune cells which could attack a transplanted kidney if the graft is recognized as non-self (Kichaev et al., 2019; Chen et al., 2020). IFNA5, for example, is known to be involved in differentiation and proliferation of B and T cells, as well as being involved in the adaptive immune response, which involves the creation of antibodies that may attack a donor organ (Huntley et al., 2015). In previous GWAS analyses, SNPs located in SGK2 were found to be associated with leukocyte and monocyte counts which are directly associated with immune response (Kichaev et al., 2019; Chen et al., 2020). Replication of these results will be needed to verify the significance of these findings.

The JST method does have some limitations. Transplant data analysis was limited to data from paired kidney transplants. The method can be applied to other organ data as well, as long as genotype data is available for both the donor and the recipient. For our simulation studies, we only focused on unrelated D/R pairs, but it is possible that the degree of relatedness between a donor and a recipient may impact whether the recipient experiences acute rejection. We were able to look at related versus non-related pairs in our combined kidney data sets and found that these two groups had no overlap in their top five potentially associated genes. Based on these results, we believe it is important to account for relatedness between a D/R pair in analyses. We restricted our analyses to only include common SNPs, but rare variants can be used in JST analyses. If there is interest in the association of rare variants, it is possible to construct a weighted version of the distance function as,

Zi=j=1mwjD(XijD , XijR )j=1mwj

Then a simple weighting option that will help upweight rarer minor allele frequencies is the use of wj=1MAFj (Kwee et al., 2008). Additionally, the JST method gives results based on the joint null hypothesis of either recipient genotype SNP effects and/or gene-based score effect but cannot specify which effect is driving the results. Work is ongoing to determine a testing method for gene-based score effect that can account for any recipient genotype SNP effect simultaneously.

In summary, the JST is a powerful method that can be used for the analysis of paired genetic data. Use of this method could lead to the discovery of gene regions potentially important to transplant outcome, which could be further studied to try and determine the biological mechanisms behind acute rejection.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material, further inquiries can be directed to the corresponding author.

Author Contributions

JC, VA, ZL, WG, and RC contributed to statistical derivation of the proposed method. VA ran simulations. AI, WO, and PJ ran the clinical study, were responsible for data collection and oversaw the genotyping. RC and WG ran statistical analyses on the kidney transplant data. VA, JC, and ZL wrote the first draft of the manuscript. RC wrote sections of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.

Funding

This work was supported by the National Institutes of Health. Grant numbers R01-ES016626, R21-ES020811, 5U19-AI070119, and 5U01-AI-58013.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.745773/full#supplementary-material

References

Almoguera, B., Shaked, A., and Keating, B. J. (2014). Transplantation Genetics: Current Status and Prospects. Am. J. Transplant. 14 (4), 764–778. doi:10.1111/ajt.12653

CrossRef Full Text | Google Scholar

Arthur, V. L., Guan, W., Loza, B. l., Keating, B., and Chen, J. (2020). Joint Testing of Donor and Recipient Genetic Matching Scores and Recipient Genotype Has Robust Power for Finding Genes Associated with Transplant Outcomes. Genet. Epidemiol. 44, 893–907. doi:10.1002/gepi.22349

PubMed Abstract | CrossRef Full Text | Google Scholar

Bacanu, S.-A. (2012). On Optimal Gene-Based Analysis of Genome Scans. Genet. Epidemiol. 36 (4), 333–339. doi:10.1002/gepi.21625

PubMed Abstract | CrossRef Full Text | Google Scholar

Bai, Z., and Saranadasa, H. (1996). Effect of High Dimension: by an Example of a Two Sample Problem. Stat. Sinica 6 (2), 311–329.

Google Scholar

Ballard, D. H., Cho, J., and Zhao, H. (2010). Comparisons of Multi-Marker Association Methods to Detect Association between a Candidate Region and Disease. Genet. Epidemiol. 34 (3), 201–212. doi:10.1002/gepi.20448

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, H., Huffman, J. E., Brody, J. A., Wang, C., Lee, S., Li, Z., et al. (2019). Efficient Variant Set Mixed Model Association Tests for Continuous and Binary Traits in Large-Scale Whole-Genome Sequencing Studies. Am. J. Hum. Genet. 104 (2), 260–274. doi:10.1016/j.ajhg.2018.12.012

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, L. S., Hsu, L., Gamazon, E. R., Cox, N. J., and Nicolae, D. L. (2012). An Exponential Combination Procedure for Set-Based Association Tests in Sequencing Studies. Am. J. Hum. Genet. 91 (6), 977–986. doi:10.1016/j.ajhg.2012.09.017

CrossRef Full Text | Google Scholar

Chen, M.-H., Raffield, L. M., Mousas, A., Sakaue, S., Huffman, J. E., Moscati, A., et al. (2020). Trans-ethnic and Ancestry-specific Blood-Cell Genetics in 746,667 Individuals from 5 Global Populations. Cell 182 (5), 1198–1213. doi:10.1016/j.cell.2020.06.045

PubMed Abstract | CrossRef Full Text | Google Scholar

Chen, S. X., and Qin, Y.-L. (2010). A Two-Sample Test for High-Dimensional Data with Applications to Gene-Set Testing. Ann. Statist. 38 (2), 808–835. doi:10.1214/09-AOS716

CrossRef Full Text | Google Scholar

Chiu, C. y., Yuan, F., Zhang, B. s., Yuan, A., Li, X., Fang, H. B., et al. (2019). Linear Mixed Models for Association Analysis of Quantitative Traits with Next‐generation Sequencing Data. Genet. Epidemiol. 43 (2), 189–206. doi:10.1002/gepi.22177

PubMed Abstract | CrossRef Full Text | Google Scholar

de Leeuw, C. A., Mooij, J. M., Heskes, T., and Posthuma, D. (2015). MAGMA: Generalized Gene-Set Analysis of GWAS Data. Plos Comput. Biol. 11 (4), e1004219. doi:10.1371/journal.pcbi.1004219

PubMed Abstract | CrossRef Full Text | Google Scholar

Fan, R., and Knapp, M. (2003). Genome Association Studies of Complex Diseases by Case-Control Designs. Am. J. Hum. Genet. 72 (4), 850–868. doi:10.1086/373966

CrossRef Full Text | Google Scholar

Fan, R., Wang, Y., Mills, J. L., Wilson, A. F., Bailey-Wilson, J. E., and Xiong, M. (2013). Functional Linear Models for Association Analysis of Quantitative Traits. Genet. Epidemiol. 37 (7), 726–742. doi:10.1002/gepi.21757

PubMed Abstract | CrossRef Full Text | Google Scholar

Farouk, S., Zhang, Z., and Menon, M. C. (2020). Non‐HLA Donor-Recipient Mismatches in Kidney Transplantation-A Stone Left Unturned. Am. J. Transpl. 20 (1), 19–24. doi:10.1111/ajt.15612

CrossRef Full Text | Google Scholar

Gauderman, W. J., Murcray, C., Gilliland, F., and Conti, D. V. (2007). Testing Association between Disease and Multiple SNPs in a Candidate Gene. Genet. Epidemiol. 31 (5), 383–395. doi:10.1002/gepi.20219

PubMed Abstract | CrossRef Full Text | Google Scholar

Grafft, C. A., Cornell, L. D., Gloor, J. M., Cosio, F. G., Gandhi, M. J., Dean, P. G., et al. (2010). Antibody-mediated Rejection Following Transplantation from an HLA-Identical Sibling. Nephrol. Dial. Transplant. 25 (1), 307–310. doi:10.1093/ndt/gfp526

PubMed Abstract | CrossRef Full Text | Google Scholar

Hernandez‐Fuentes, M. P., Franklin, C., Rebollo‐Mesa, I., Mollon, J., Delaney, F., Perucha, E., Stapleton, C., Borrows, R., Byrne, C., Cavalleri, G., Clarke, B., Clatworthy, M., Feehally, J., Fuggle, S., Gagliano, S. A., Griffin, S., Hammad, A., Higgins, R., Jardine, A., Keogan, M., Leach, T., MacPhee, I., Mark, P. B., Marsh, J., Maxwell, P., McKane, W., McLean, A., Newstead, C., Augustine, T., Phelan, P., Powis, S., Rowe, P., Sheerin, N., Solomon, E., Stephens, H., Thuraisingham, R., Trembath, R., Topham, P., Vaughan, R., Sacks, S. H., Conlon, P., Opelz, G., Soranzo, N., Weale, M. E., and Lord, G. M. (2018). Long‐ and Short‐term Outcomes in Renal Allografts with Deceased Donors: A Large Recipient and Donor Genome‐wide Association Study. Am. J. Transpl. 18 (6), 1370–1379. doi:10.1111/ajt.14594

CrossRef Full Text | Google Scholar

Hoffmann, T. J., Kvale, M. N., Hesselson, S. E., Zhan, Y., Aquino, C., Cao, Y., et al. (2011). Next Generation Genome-wide Association Tool: Design and Coverage of a High-Throughput European-optimized SNP Array. Genomics 98 (2), 79–89. doi:10.1016/j.ygeno.2011.04.005

PubMed Abstract | CrossRef Full Text | Google Scholar

Huntley, R. P., Sawford, T., Mutowo-Meullenet, P., Shypitsyna, A., Bonilla, C., Martin, M. J., et al. (2015). The GOA Database: Gene Ontology Annotation Updates for 2015. Nucleic Acids Res. 43, D1057–D1063. doi:10.1093/nar/gku1113

PubMed Abstract | CrossRef Full Text | Google Scholar

Iribarren, C., Lu, M., Jorgenson, E., Martínez, M., Lluis-Ganella, C., Subirana, I., et al. (2018). Weighted Multi-Marker Genetic Risk Scores for Incident Coronary Heart Disease Among Individuals of African, Latino and East-Asian Ancestry. Sci. Rep. 8 (1), 6853. doi:10.1038/s41598-018-25128-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Kichaev, G., Bhatia, G., Loh, P.-R., Gazal, S., Burch, K., Freund, M. K., et al. (2019). Leveraging Polygenic Functional Enrichment to Improve GWAS Power. Am. J. Hum. Genet. 104 (1), 65–75. doi:10.1016/j.ajhg.2018.11.008

PubMed Abstract | CrossRef Full Text | Google Scholar

Kwee, L. C., Liu, D., Lin, X., Ghosh, D., and Epstein, M. P. (2008). A Powerful and Flexible Multilocus Association Test for Quantitative Traits. Am. J. Hum. Genet. 82 (2), 386–397. doi:10.1016/j.ajhg.2007.10.010

CrossRef Full Text | Google Scholar

Lee, S., Miropolsky, L., and Wu, M. (2017). SKAT: SNP-Set (Sequence) Kernel Association Test. R package version 1.3.2.1. Available at: https://CRAN.R-project.org/package=SKAT.

Google Scholar

Li, M.-X., Gui, H.-S., Kwan, J. S. H., and Sham, P. C. (2011). GATES: A Rapid and Powerful Gene-Based Association Test Using Extended Simes Procedure. Am. J. Hum. Genet. 88 (3), 283–293. doi:10.1016/j.ajhg.2011.01.019

CrossRef Full Text | Google Scholar

Li, M., Wang, K., Grant, S. F. A., Hakonarson, H., and Li, C. (2009). ATOM: A Powerful Gene-Based Association Test by Combining Optimally Weighted Markers. Bioinformatics 25 (4), 497–503. doi:10.1093/bioinformatics/btn641

PubMed Abstract | CrossRef Full Text | Google Scholar

Li, Y. R., van Setten, J., Verma, S. S., Lu, Y., Holmes, M. V., Gao, H., et al. (2015). Concept and Design of a Genome-wide Association Genotyping Array Tailored for Transplantation-specific Studies. Genome Med. 7, 90. doi:10.1186/s13073-015-0211-x

PubMed Abstract | CrossRef Full Text | Google Scholar

Liu, Z., Barnett, I., and Lin, X. (2020). A Comparison of Principal Component Methods between Multiple Phenotype Regression and Multiple SNP Regression in Genetic Association Studies. Ann. Appl. Stat. 14 (1), 433–451. doi:10.1214/19-AOAS1312

CrossRef Full Text | Google Scholar

Luo, L., Boerwinkle, E., and Xiong, M. (2011). Association Studies for Next-Generation Sequencing. Genome Res. 21 (7), 1099–1108. doi:10.1101/gr.115998.110

PubMed Abstract | CrossRef Full Text | Google Scholar

Luo, L., Zhu, Y., and Xiong, M. (2013). Smoothed Functional Principal Component Analysis for Testing Association of the Entire Allelic Spectrum of Genetic Variation. Eur. J. Hum. Genet. 21 (2), 217–224. doi:10.1038/ejhg.2012.141

CrossRef Full Text | Google Scholar

Marin, E. P., Cohen, E., and Dahl, N. (2020). Clinical Applications of Genetic Discoveries in Kidney Transplantation: A Review. Kidney360 1 (4), 300–305. doi:10.34067/KID.0000312019

CrossRef Full Text | Google Scholar

Mesnard, L., Muthukumar, T., Burbach, M., Li, C., Shang, H., Dadhania, D., et al. (2016). Exome Sequencing and Prediction of Long-Term Kidney Allograft Function. Plos Comput. Biol. 12 (9), e1005088. doi:10.1371/journal.pcbi.1005088

PubMed Abstract | CrossRef Full Text | Google Scholar

Mishra, A., and Macgregor, S. (2015). VEGAS2: Software for More Flexible Gene-Based Testing. Twin Res. Hum. Genet. 18 (1), 86–91. doi:10.1017/thg.2014.79

PubMed Abstract | CrossRef Full Text | Google Scholar

Neale, B. M., Rivas, M. A., Voight, B. F., Altshuler, D., Devlin, B., Orho-Melander, M., et al. (2011). Testing for an Unusual Distribution of Rare Variants. Plos Genet. 7 (3), e1001322. doi:10.1371/journal.pgen.1001322

PubMed Abstract | CrossRef Full Text | Google Scholar

Oetting, W. S., Schladt, D. P., Guan, W., Miller, M. B., Remmel, R. P., Dorr, C., et al. (2016). Genomewide Association Study of Tacrolimus Concentrations in African American Kidney Transplant Recipients Identifies MultipleCYP3A5Alleles. Am. J. Transplant. 16 (2), 574–582. doi:10.1111/ajt.13495

PubMed Abstract | CrossRef Full Text | Google Scholar

Pan, W. (2009). Asymptotic Tests of Association with Multiple SNPs in Linkage Disequilibrium. Genet. Epidemiol. 33 (6), 497–507. doi:10.1002/gepi.20402

PubMed Abstract | CrossRef Full Text | Google Scholar

Pan, W. (2011). Relationship between Genomic Distance-Based Regression and Kernel Machine Regression for Multi-Marker Association Testing. Genet. Epidemiol. 35 (4), a–n. doi:10.1002/gepi.20567

CrossRef Full Text | Google Scholar

Parimi, N., Tromp, G., Kuivaniemi, H., Nien, J. K., Gomez, R., Romero, R., et al. (2008). Analytical Approaches to Detect Maternal/fetal Genotype Incompatibilities that Increase Risk of Pre-eclampsia. BMC Med. Genet. 9 (1), 60. doi:10.1186/1471-2350-9-60

PubMed Abstract | CrossRef Full Text | Google Scholar

Pineda, S., Sigdel, T. K., Chen, J., Jackson, A. M., Sirota, M., and Sarwal, M. M. (2017). Novel Non-histocompatibility Antigen Mismatched Variants Improve the Ability to Predict Antibody-Mediated Rejection Risk in Kidney Transplant. Front. Immunol. 8, 1687. doi:10.3389/fimmu.2017.01687

PubMed Abstract | CrossRef Full Text | Google Scholar

R Core Team (2021). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: https://www.R-project.org/.

Google Scholar

Reddy, M. S., Varghese, J., Venkataraman, J., and Rela, M. (2013). Matching Donor to Recipient in Liver Transplantation: Relevance in Clinical Practice. Wjh 5 (11), 603–611. doi:10.4254/wjh.v5.i11.603

PubMed Abstract | CrossRef Full Text | Google Scholar

Reindl‐Schwaighofer, R., Heinzel, A., Gualdoni, G. A., Mesnard, L., Claas, F. H. J., and Oberbauer, R. (2020). Novel Insights into non‐HLA Alloimmunity in Kidney Transplantation. Transpl. Int. 33 (1), 5–17. doi:10.1111/tri.13546

CrossRef Full Text | Google Scholar

Reindl-Schwaighofer, R., Heinzel, A., Kainz, A., van Setten, J., Jelencsics, K., Hu, K., et al. (2019). Contribution of Non-HLA Incompatibility between Donor and Recipient to Kidney Allograft Survival: Genome-wide Analysis in a Prospective Cohort. The Lancet 393 (10174), 910–917. doi:10.1016/S0140-6736(18)32473-5

CrossRef Full Text | Google Scholar

Schaid, D. J. (2010). Genomic Similarity and Kernel Methods I: Advancements by Building on Mathematical and Statistical Foundations. Hum. Hered. 70 (2), 109–131. doi:10.1159/000312641

PubMed Abstract | CrossRef Full Text | Google Scholar

Steers, N. J., Li, Y., Drace, Z., D’Addario, J. A., Fischman, C., Liu, L., et al. (2019). Genomic Mismatch at LIMS1 Locus and Kidney Allograft Rejection. N. Engl. J. Med. 380 (20), 1918–1928. doi:10.1056/NEJMoa1803731

PubMed Abstract | CrossRef Full Text | Google Scholar

Su, Z., Marchini, J., and Donnelly, P. (2011). HAPGEN2: Simulation of Multiple Disease SNPs. Bioinformatics 27 (16), 2304–2305. doi:10.1093/bioinformatics/btr341

PubMed Abstract | CrossRef Full Text | Google Scholar

Sun, J., Zheng, Y., and Hsu, L. (2013). A Unified Mixed-Effects Model for Rare-Variant Association in Sequencing Studies. Genet. Epidemiol. 37 (4), 334–344. doi:10.1002/gepi.21717

PubMed Abstract | CrossRef Full Text | Google Scholar

Tzeng, J.-Y., and Zhang, D. (2007). Haplotype-based Association Analysis via Variance-Components Score Test. Am. J. Hum. Genet. 81 (5), 927–938. doi:10.1086/521558

CrossRef Full Text | Google Scholar

Wang, K., and Abbott, D. (2008). A Principal Components Regression Approach to Multilocus Genetic Association Studies. Genet. Epidemiol. 32 (2), 108–118. doi:10.1002/gepi.20266

PubMed Abstract | CrossRef Full Text | Google Scholar

Wang, T., and Elston, R. C. (2007). Improved Power by Use of a Weighted Score Test for Linkage Disequilibrium Mapping. Am. J. Hum. Genet. 80 (2), 353–360. doi:10.1086/511312

CrossRef Full Text | Google Scholar

Wessel, J., and Schork, N. J. (2006). Generalized Genomic Distance-Based Regression Methodology for Multilocus Association Analysis. Am. J. Hum. Genet. 79 (5), 792–806. doi:10.1086/508346

CrossRef Full Text | Google Scholar

Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., et al. (2010). Powerful SNP-Set Analysis for Case-Control Genome-wide Association Studies. Am. J. Hum. Genet. 86 (6), 929–942. doi:10.1016/j.ajhg.2010.05.002

CrossRef Full Text | Google Scholar

Wu, M. C., Lee, S., Cai, T., Li, Y., Boehnke, M., and Lin, X. (2011). Rare-variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. Am. J. Hum. Genet. 89 (1), 82–93. doi:10.1016/j.ajhg.2011.05.029

CrossRef Full Text | Google Scholar

Yang, J. Y. C., and Sarwal, M. M. (2017). Transplant Genetics and Genomics. Nat. Rev. Genet. 18 (5), 309–326. doi:10.1038/nrg.2017.12

PubMed Abstract | CrossRef Full Text | Google Scholar

Zanoni, F., and Kiryluk, K. (2020). Genetic Background and Transplantation Outcomes: Insights from Genome-wide Association Studies. Curr. Opin. Organ. Transplant. 25 (1), 35–41. doi:10.1097/MOT.0000000000000718

PubMed Abstract | CrossRef Full Text | Google Scholar

Zeileis, A., and Hothorn, T. (2002). Diagnostic Checking in Regression Relationships. R. News 2 (3), 7–10. Available at: https://CRAN.R-project.org/doc/Rnews/.

Google Scholar

Zhang, Z., Menon, M. C., Zhang, W., Stahl, E., Loza, B.-L., Rosales, I. A., et al. (2020). Genome-wide Non-HLA Donor-Recipient Genetic Differences Influence Renal Allograft Survival via Early Allograft Fibrosis. Kidney Int. 98 (3), 758–768. doi:10.1016/j.kint.2020.04.039

PubMed Abstract | CrossRef Full Text | Google Scholar

Keywords: transplant genetics, multi-marker testing, joint testing, genetic matching scores, paired genetic data

Citation: Arthur VL, Li Z, Cao R, Oetting WS, Israni AK, Jacobson PA, Ritchie MD, Guan W and Chen J (2021) A Multi-Marker Test for Analyzing Paired Genetic Data in Transplantation. Front. Genet. 12:745773. doi: 10.3389/fgene.2021.745773

Received: 22 July 2021; Accepted: 23 September 2021;
Published: 13 October 2021.

Edited by:

Nianjun Liu, Indiana University Bloomington, United States

Reviewed by:

Zhaoxia Yu, University of California, Irvine, United States
Shaoyu Li, University of North Carolina at Charlotte, United States

Copyright © 2021 Arthur, Li, Cao, Oetting, Israni, Jacobson, Ritchie, Guan and Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Victoria L. Arthur, vlynn@pennmedicine.upenn.edu

These authors have contributed equally to this work and share first authorship

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.